
A Parametrizable Hybrid Stack-Register Processor as Soft Intellectual Property Module

Peter Lüthi, Thomas Röwer, Manfred Stadler, Daniel Forrer, Stefan Moscibroda, Norbert Felber, Hubert Kaeslin, Wolfgang Fichtner

Integrated Systems Laboratory, Swiss Federal Institute of Technology, Zürich, Switzerland

Abstract—Hardware/software co-design usually encounters serious problems in guaranteeing strong real-time constraints while serving many interrupt routines. We present an enhanced register-based RISC processor that is capable of launching every interrupt routine within two clock cycles. This processor is implemented as a soft IP-Module and features a customizable instruction set, extensive parameterization, and a synthesis model with separate core and interfaces. Test vectors are derived automatically from the current parameter setting to verify correct functionality.

I. Introduction

The design of integrated circuits is currently subject to extensive changes. Until now, project-specific code has been written for every new design. This results in highly optimized code for the target application, but also leads to unacceptable development time, especially for large designs. Since time to market and product lifetime are shrinking while circuit complexity is growing, the traditional way of designing integrated circuits has to be altered to achieve higher efficiency. As the complexity of circuits increases, there is an urgent need for new design methodologies that allow fast development of demanding applications up to complete system-on-a-chip integrations [1][2].

One possibility to cope with this efficiency problem is the use of Intellectual Property (IP) Modules or Virtual Components (VC). Quick and easy adaptations of the reusable blocks speed up system design and leave more time for thorough testing, an important issue in cost-intensive chip design.

Another new trend in system design is the inclusion of programmable parts in the ASIC. This is commonly termed Hardware/Software Co-Design, which means that the system functionality is partitioned into hardware and software running on an embedded processor. This solution offers great flexibility by allowing fast alterations in functionality if project or system specifications change. Although software on embedded systems is very convenient, it poses numerous problems in conjunction with hard real-time requirements. Demanding telecommunication applications require, among other things, an interrupt latency that is hard to obtain from traditional microprocessors.

This paper presents an embedded processor with a new approach in processor architecture to guarantee such demanding real-time constraints. Section II elaborates on the processor-specific IP requirements and introduces the architecture of the processor and its parameterization. Section III explains the functional verification flow for the IP-Module. In Section IV, we give an impression of the area occupation for different parameter sets and the details of our test integration. In the final section, we summarize the results of our work and discuss possible future enhancements.

The authors want to thank KTI (Swiss Commission for Technology and Innovations) for funding this project.

II. Project "SILVERBIRD"

A. IP requirements

Designing an embedded processor as a soft IP-Module raises design problems that are not encountered during the design of an application-specific integrated circuit (ASIC).

First of all, the user needs to be able to decide at the beginning of a project whether the IP-Module meets the requirements. Therefore, all functionality and all limits of the processor IP must be stated clearly. Moreover, the designer of a processor IP-Module has to be aware of the following issues:

1. A microprocessor IP has to be highly adaptable to satisfy multiple application requirements. Qualitative customization of the instruction set as well as quantitative customization of hardware parameters are important prerequisites.

2. An IP-Module can be plugged into different system environments. To make this possible, either the IP itself must support various communication protocols, or the way in which the user can implement application-specific interfaces has to be clearly defined.

3. The processor IP-Module has to provide a convenient environment for fast functional verification. It is highly recommended that this feature is already available during implementation time to verify the hardware/software concept and to reveal possible conceptual errors.

4. A high-level programming language should be supported to allow quick adaptations of the software-based functionality. This feature greatly simplifies future software enhancements.

B. The "SILVERBIRD" IP-Module

During the design of our processor IP, close attention has been paid to all particularities of IP-Module design. We have realized a processor IP-Module featuring the following items:

• Qualitative adaptability: The functionality of the processor IP can be modified by customizing the instruction set.

• Quantitative parametrizability: Only the basic architecture of the processor core is fixed: two separate ALUs for data and address computations, register-based architecture, and Harvard memory organization. On the other hand, all key parameters of this architecture are adaptable to the current application's needs. Fig. 2 shows the customizable parameters in Greek letters.

• Quick interface adaptations to comply with the target specifications. To achieve a clearly structured IP, the processor core and the system interfaces have been separated.

• The register-based architecture simplifies the implementation of a high-level programming language compiler.

• An assembler is part of the IP. It is self-parameterizing based on the parameters configured for the RTL model.

Fig. 1. Architecture of the "SILVERBIRD" RISC processor

C. The processor architecture

To meet the demanding requirements of managing high interrupt loads and being parametrizable, we have decided to combine the advantages of a stack architecture with those of a register-based approach. Therefore, the general purpose registers of our processor are implemented as top-of-stack registers. In case of an interrupt, precious processing time can be saved by just pushing the current register contents on the stack. The maximum interrupt latency achieved by our architecture is two clock cycles. To obtain maximum processor performance, the pipeline is never flushed and no no-operation cycles are inserted.

A decisive argument against a pure stack processor was the need for compiler compatibility: a compiler for a stack architecture is difficult to implement because it always needs to trace the exact position of each register [3][4]. As a consequence, the entire stack has to be controlled by "push" and "pop" instructions. Our solution provides a fixed number of general purpose registers for every interrupt level. The whole stack control is done by the processor itself and requires no software-based "push" and "pop" operations. This organization is easy to support by a high-level compiler since the compiler does not have to control the stack at all.
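The idea of a fixed register set per interrupt level can be illustrated with a small behavioral sketch. This is not the authors' RTL or behavioral model: the class and method names are hypothetical, and the sketch assumes that each interrupt level simply sees its own bank of registers, with the stack depth counting the base program level as one bank.

```python
class StackOverflow(Exception):
    """Raised when more interrupt levels are entered than banks exist."""


class StackUnderflow(Exception):
    """Raised when returning from the base level."""


class RegisterStack:
    """Toy model of top-of-stack registers (hypothetical, for illustration).

    Assumption: one bank of `num_registers` registers per level, and
    `depth` banks in total, including the bank of the main program.
    """

    def __init__(self, num_registers, depth):
        self.banks = [[0] * num_registers for _ in range(depth)]
        self.level = 0  # index of the bank currently visible to software

    def read(self, index):
        return self.banks[self.level][index]

    def write(self, index, value):
        self.banks[self.level][index] = value

    def enter_interrupt(self):
        # The "push" is done by the processor: software saves nothing.
        if self.level + 1 >= len(self.banks):
            raise StackOverflow("no free register bank")
        self.level += 1

    def return_from_interrupt(self):
        # The "pop" restores the interrupted level's registers untouched.
        if self.level == 0:
            raise StackUnderflow("already at base level")
        self.level -= 1


regs = RegisterStack(num_registers=4, depth=3)
regs.write(0, 7)            # main program uses register 0
regs.enter_interrupt()      # interrupt service routine starts
regs.write(0, 99)           # the routine freely overwrites register 0
regs.return_from_interrupt()
print(regs.read(0))         # the main program's value is intact
```

In this reading, the compiler only ever addresses registers by index within the current bank, which is why it does not need to manage the stack itself.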

One slight disadvantage of our architecture is the large chip area taken by the stacks, a consequence of the traditional trade-off between speed and area. This can be mitigated by implementing an interface from the top-of-stack registers to an on-chip RAM and spilling the major part of the stack contents to the RAM. Doing so requires more control logic and may lower performance, unless the user builds complex control logic to cope with the slow RAM. This way of saving chip area is only worthwhile for large parameter values. On the other hand, decreasing costs for chip area and ever increasing integration densities seem to justify this compromise.

Key features of the hybrid stack-register architecture of our RISC processor "SILVERBIRD" are:

• Separate data and instruction memory (Harvard architecture).

• Read-after-write sequences are allowed. Being able to access the same register in consecutive instructions yields much more efficient code.

• Data and address register banks have been implemented as top-of-stack, which allows for fast interrupt launch.

• A classic four-stage pipeline. In the first stage, instructions are read from the program memory (Instruction Fetch). After decoding the instruction in stage two (Instruction Decode), it is executed in the third stage (Execute). In stage four, registers are updated and data memory access takes place (Write Back).

• An additional address ALU with reduced functionality for efficient block access operations on the data memory.

• Return addresses and condition code storage on separate stacks.

• Parametrizable instruction set of up to 40 instructions.

D. Qualitative adaptation of the processor core

Application-specific optimization of the core's functionality can be performed by selecting the number of supported instructions ν needed for the current application from a total set μ of 40 instructions. The hardware associated with the unwanted instructions will be implicitly discarded during logic synthesis. As an example, if all arithmetic data memory address instructions are disabled and only immediate memory access is retained, the address ALU will be completely removed. Otherwise, with full parameterization, the address ALU will be inferred (shaded areas I or II in Fig. 2).

E. Quantitative parameterization of the processor core

From the perspective of an IP user, the adaptations with the greatest influence on the final chip area are the choice of the number of general purpose registers and of the required stack depths. A thorough evaluation of the intended parameterization is strongly recommended since the stacks consume most of the processor area.

Fig. 2. "SILVERBIRD" IP-Module with customizable parameters shown as Greek letters

The following parameters can be varied:

1. Data Width α of the processor data path and of all data registers.

2. Address Width β of the data memory, defining the maximum addressable memory size 2^β. The address width β can range up to twice the data width (β ≤ 2α).

3. Address Width γ of the instruction memory, defining the maximum accessible instruction memory size 2^γ. The instruction width is given by [5 + 2·ceil(log2 δ) + α].

4. Number of Data Registers δ: Randomly accessible top-of-data-stack registers. There is no upper bound for this parameter, but an excessive number will result in high area occupation.

5. Number of Address Registers ε: Randomly accessible top-of-address-stack registers. The address register contents can either be incremented directly with the address ALU, or an immediate offset can be specified within the memory instruction.

6. Data and Address Stack Depth κ: This stack depth defines the limit of simultaneously launched interrupt service routines. The expression [κ·(δ·α + ε·β)] specifies the total number of register bits required in the data and address stacks.

7. Return Address Stack Depth π: The return address stack depth π depends on the number of supported interrupt priorities and subroutine calls and therefore has to be adapted to the application by hand. The total number of return address stack register bits is calculated as γ·π.
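The sizing expressions in the list above can be evaluated with a few lines of Python. This is only a sketch: the class and field names are made up for illustration, the damaged operators in the expressions are read as multiplications, and the results are interpreted as numbers of register bits. The example values are those the paper reports for its test integration, whose listed instruction width of 29 bit matches this reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreParams:
    """Quantitative parameters of the core (field names are hypothetical)."""
    alpha: int    # data width in bits
    beta: int     # data memory address width in bits
    gamma: int    # instruction memory address width in bits
    delta: int    # number of data registers
    epsilon: int  # number of address registers
    kappa: int    # data/address stack depth
    pi: int       # return address stack depth

    def __post_init__(self):
        # Constraint stated in the parameter list: beta <= 2 * alpha.
        if self.beta > 2 * self.alpha:
            raise ValueError("address width beta must not exceed 2 * alpha")

    def instruction_width(self):
        # 5 + 2 * ceil(log2(delta)) + alpha; (delta - 1).bit_length()
        # equals ceil(log2(delta)) for delta >= 1 without float rounding.
        return 5 + 2 * (self.delta - 1).bit_length() + self.alpha

    def data_address_stack_bits(self):
        # kappa * (delta * alpha + epsilon * beta)
        return self.kappa * (self.delta * self.alpha + self.epsilon * self.beta)

    def return_stack_bits(self):
        # gamma * pi
        return self.gamma * self.pi


# Parameter values of the test integration described later in the paper.
test_chip = CoreParams(alpha=16, beta=10, gamma=11, delta=12,
                       epsilon=2, kappa=4, pi=20)
print(test_chip.instruction_width())        # 29
print(test_chip.data_address_stack_bits())  # 848
print(test_chip.return_stack_bits())        # 220
```

Evaluating candidate parameter sets this way before synthesis gives a quick first estimate of how strongly the stacks will dominate the area.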

An interrupt launch is always accompanied by the start address of the corresponding interrupt service routine. The width of this address is equal to the instruction memory address width γ. Therefore, the interrupt routines can be spread over the whole program memory. The carry flag is saved on the condition code stack during an interrupt. The depth of the condition code stack equals the number of interrupt priority levels κ; its width is 1 bit.

In addition, the processor supports trap instructions with different priorities λ to allow for communication between processor and interrupt controller. When the processor executes a trap instruction, the corresponding priority is passed to the external interrupt controller. This instruction can also be viewed as a software interrupt from the processor to the interrupt controller. Finally, there are four different run-time exceptions: Data and Address Stack Underflow and Overflow, and Return Address Stack Underflow and Overflow.

F. System interfaces

The processor IP-Module provides separate core and system interfaces to permit easy adaptation to the target environment. A parametrizable data memory interface with a FIFO buffer supports asynchronous static RAMs. It allows memory burst writes from the processor core to the data memory without any processor stalls. FIFO depth, memory read latency, and write latency are individually parametrizable. For the instruction memory, an interface based on another asynchronous static RAM is available. Interaction and data exchange between the embedded processor and its system environment are handled by an external interrupt controller and memory-mapped I/O.

III. Functional verification

Functional verification of highly parameterized IP-Modules poses several problems. As described in Section II, the IP user can choose a parameter set that exactly matches the application-specific requirements. After this customization, the functional correctness of the IP-Module has to be verified. Because the designer of a parametrizable IP-Module cannot provide test vectors for all possible parameter settings, a behavioral model has to do so: it generates the expected responses for the synthesizable RTL model based on the current parameterization.

A suitable functional verification flow has already been published and comprehensively discussed in [3] and [5]. Here we only sketch this functional verification method. As Fig. 3 shows, the whole configuration flow is based on one configuration package. This guarantees consistency of the IP for synthesis and verification. An assembler source code file is used as the common starting point for the flow, which consists of the following three steps:

1. Translation of the assembler code into binary format by the parametrizable assembler (right side of Fig. 3): The assembler refers to the configuration package to check the range and validity of all parameters associated with the corresponding instructions in the assembler source code. In case of a mismatch, the assembler reports an error message. Since the assembler is entirely integrated into the verification flow, its correctness is implicitly assured as well.

2. Generation of the expected responses from the assembler source (left side of Fig. 3): A behavioral model of the processor serves as a generator of the expected responses. They are first converted into a generic format. The behavioral model processes the code and generates separate expected response files for every interface, which are used later on for the verification of the RTL model.

3. Functional verification of the customized RTL model (right side of Fig. 3): The binary code generated by the assembler is passed to the RTL model, which processes the code using the custom-specific settings. In a final step, the generated output is compared against the expected responses, and a test report and a memory log file are generated.

Fig. 3. Verification flow of the "SILVERBIRD" IP-Module
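The comparison in the final step can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the paper does not describe the file formats or the comparison tool, so responses are modeled as plain Python lists keyed by a made-up interface name, and the function name is hypothetical.

```python
from itertools import zip_longest


def compare_responses(expected, actual):
    """Compare per-interface response sequences and list all mismatches.

    `expected` stands in for the behavioral model's output and `actual`
    for the RTL model's output; both map an interface name to a list of
    response values (an assumed, simplified representation).
    Returns tuples of (interface, position, expected value, actual value).
    """
    mismatches = []
    for interface in sorted(set(expected) | set(actual)):
        exp = expected.get(interface, [])
        act = actual.get(interface, [])
        # zip_longest pads the shorter sequence with None, so missing or
        # surplus responses are reported as mismatches too.
        for position, (e, a) in enumerate(zip_longest(exp, act)):
            if e != a:
                mismatches.append((interface, position, e, a))
    return mismatches


expected = {"data_memory": [1, 2, 3], "trap": [8]}
actual = {"data_memory": [1, 5, 3], "trap": [8]}
for interface, position, e, a in compare_responses(expected, actual):
    print(f"{interface}[{position}]: expected {e}, got {a}")
```

An empty result corresponds to a passing test report; keeping one sequence per interface mirrors the separate expected response files mentioned in step 2.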

IV. Comparison of various parameterization examples and test integration

A. Configurations for the test synthesis runs

To give an impression of the area required by the processor IP-Module, we present an evaluation of 12 different parameter settings. To demonstrate the effects of both quantitative and qualitative parameterization, synthesis runs with different numeric parameters as well as with full and reduced instruction sets have been carried out. The data width α was set to 8, 16, or 32 bit and the address width β to 10 or 15 bit. The qualitative changes have been addressed by taking either the entire instruction set μ or a set reduced to unsigned instructions and immediate memory access only. As stack depths, we assigned κ = 3 to the data and address stacks and π = 8 to the return address stack. There are δ = 8 top-of-stack data registers and ε = 1 address register. The number of supported traps is λ = 8.

Fig. 4. Various synthesis runs with different parameterizations
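The evaluated settings can be enumerated with a short script. This sketch assumes that the 12 settings are the full cross product of the three data widths, the two address widths, and the two instruction set variants, which matches the stated count. The stack size uses the expression κ·(δ·α + ε·β) from Section II-E, read as a number of register bits; it is a formula value only, not a synthesis result.

```python
from itertools import product

DATA_WIDTHS = (8, 16, 32)             # alpha, in bits
ADDRESS_WIDTHS = (10, 15)             # beta, in bits
INSTRUCTION_SETS = ("full", "reduced")
KAPPA, DELTA, EPSILON = 3, 8, 1       # fixed for all synthesis runs


def stack_bits(alpha, beta):
    """Data and address stack size from kappa * (delta*alpha + epsilon*beta)."""
    return KAPPA * (DELTA * alpha + EPSILON * beta)


configs = list(product(DATA_WIDTHS, ADDRESS_WIDTHS, INSTRUCTION_SETS))
print(len(configs), "configurations")

for alpha, beta, instruction_set in configs:
    print(f"alpha={alpha:2d} beta={beta:2d} {instruction_set:8s} "
          f"stack bits={stack_bits(alpha, beta)}")
```

The formula value is the same for the full and the reduced instruction set; any area difference between the two variants comes from what logic synthesis infers or discards.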

B. Results of the various synthesis runs

While the processor version with the complete instruction set inferred both the address ALU and the data ALU, the reduced version only made use of the data ALU. This is due to the restriction to immediate memory address operations and the elimination of all instructions needing an address ALU for calculations on address registers. Although the small 8 bit processor version does not reveal big differences between the full and the reduced instruction set, a significant influence is obvious in the 32 bit implementation. The area differences increase when larger data widths are chosen. This is to be interpreted as a consequence of the timing constraints, which gain more and more impact at large parameter values. The extra area overhead of the full instruction set version is likely to come from an increased inference of parallel functional blocks needed to meet the timing requirements, owing to the higher complexity of the data ALU.

From the perspective of area efficiency, the IP-Module allows no major optimizations of the chip area. The main part of the area required by the processor is occupied by the data, address, and return address stack registers. The area taken by these registers is inherent to the process used. The only combinational part whose size can be influenced by different synthesis constraints is the execute stage. But the percentage of combinational chip area is minor and therefore negligible.

C. Test integration

For the final verification of the functionality and the qualification of speed and power consumption of our IP-Module, we had the circuit fabricated with on-chip data memory (see Fig. 5 and Table I). For the test implementation, a 0.6 μm 3-layer-metal CMOS process has been chosen. The settings of the parameters for the integration are given in Table II.

Fig. 5. Picture of the test integration with on-chip data memory

TABLE I
Key values of the test integration

Process                               0.6 μm 3LM CMOS
Supply voltage                        5 V
Chip size incl. pads                  4.6 × 4.6 mm
Chip area incl. pads                  21.16 mm²
Core area incl. data memory           13.69 mm²
Max. operating frequency              121.5 MHz
Max. throughput                       121.5 MIPS
Max. interrupt latency                2 Tclk = 16.46 ns
(without memory R/W stalls)
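The latency and area figures in Table I follow directly from the clock frequency and the chip dimensions. The short check below is a plain arithmetic sketch; the variable names are chosen here for illustration.

```python
F_MAX_HZ = 121.5e6    # maximum operating frequency from Table I
CHIP_EDGE_MM = 4.6    # chip edge length including pads

t_clk_ns = 1e9 / F_MAX_HZ            # one clock period in nanoseconds
interrupt_latency_ns = 2 * t_clk_ns  # two clock cycles
chip_area_mm2 = CHIP_EDGE_MM ** 2

print(f"Tclk              = {t_clk_ns:.2f} ns")               # 8.23 ns
print(f"interrupt latency = {interrupt_latency_ns:.2f} ns")   # 16.46 ns
print(f"chip area         = {chip_area_mm2:.2f} mm^2")        # 21.16 mm^2
```

The throughput of 121.5 MIPS at 121.5 MHz corresponds to one instruction per clock cycle, consistent with a pipeline that is never flushed.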

The program and data memory interfaces are implemented for asynchronous RAMs. While the data memory has been placed on-chip, the instruction memory is placed off-chip. Finally, all stacks (return address stack, condition code stack, data and address stacks) are implemented using registers.

V. Results and Outlook

We have developed an embedded processor IP-Module that is highly adaptable in both functionality and configuration. This was achieved by separating the processor core and the system interfaces. The hybrid stack-register processor is excellently suited for applications with high interrupt loads. A convenient functional verification flow automatically covers the configuration of the processor. It uses assembler code as the common starting point for both the synthesizable RTL model and the behavioral model. Furthermore, a high-level language compiler could be implemented easily because of the register-based architecture, the read-after-write operation support, and the fully parametrizable assembler. As the assembler is already integrated into the verification flow, no further adaptations to the flow are necessary to check the complete processor-assembler package. To compile high-level code for the current configuration of the IP-Module, a compiler needs to know the instruction set, the data width, the number of data and address registers, and the data memory address range.

TABLE II
Parameter settings for the test integration

Param.   Description                                    Value
α        Data Width                                     16 bit
β        Data Memory Address Width                      10 bit
γ        Instruction Memory Address Width               11 bit
         & Interrupt Address Range
δ        Number of Data Registers                       12
ε        Number of Address Registers                    2
κ        Data/Address Stack Depth                       4
π        Return Address Stack Depth                     20
λ        Number of available traps                      8
ν = μ    Number of supported instructions               40
         Instruction Width                              29 bit
         (5 + 2·ceil(log2 δ) + α)

In conclusion, we estimate the design effort for the IP to be about twice that of a one-time implementation. But this extra effort is easily recovered in future system designs because the extensive parameterization, the adaptable instruction set, the configuration-independent verification flow, and the self-parameterizing assembler make reuse of our IP-Module very simple. Additionally, the ability to check hardware/software concepts during implementation time allows for straightforward design and saves precious development time.

References

[1] Stefan Pees, Martin Vaupel, Vojin Živojnović, and Heinrich Meyr, "On core and more: A design perspective for systems-on-a-chip," in Proc. International Conference on Application-Specific Systems, Architectures and Processors. IEEE Press, July 1997, pp. 448–457.

[2] Rajesh K. Gupta and Yervant Zorian, "Introducing core-based system design," IEEE Design & Test of Computers, vol. 14, no. 4, pp. 15–25, October-December 1997.

[3] Thomas Röwer, Manfred Stadler, Markus Thalmann, Norbert Felber, Hubert Kaeslin, and Wolfgang Fichtner, "Intellectual property module of a highly parametrizable embedded stack processor," in Proc. Twelfth International IEEE ASIC/SOC Conference. IEEE, September 1999, pp. 399–403.

[4] John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, second edition, 1996.

[5] Manfred Stadler, Thomas Röwer, Markus Thalmann, Norbert Felber, Hubert Kaeslin, and Wolfgang Fichtner, "Functional verification of intellectual properties (IP): a simulation-based solution for an application-specific instruction-set processor," in Proc. International Test Conference. IEEE, September 1999, pp. 415–420.
