CO: The Chameleon 64-Bit Microprocessor Prototype
B. Ramanadin, F. Pogodalla
SGS-THOMSON Microelectronics 5 bis, chemin de la Dhuy 38240 Meylan FRANCE
Abstract
In the context of designing a complex chip, it is important to be able to prototype the design. Prototypes are the key to ensuring that the architecture is implementable, allowing application development to start before silicon, and validating the performance estimates (micro-architectural issues).
This paper presents the prototyping strategy chosen by the Chameleon 64-bit microprocessor development programme of SGS-THOMSON Microelectronics. We discuss the choice of making a silicon prototype, its aims (strongly related to enabling application development), its realisation, schedules and status.
verification, in-circuit emulation and system verification). Real prototype (field of application: software tools, OS and application software development without final product).
To allow software development to start as early as possible, we decided to go for a silicon prototype, for the reasons that will be detailed now.
1. Chameleon
Chameleon is a family of next-generation microprocessors developed by SGS-THOMSON Microelectronics. It is a modular, core-based 64-bit superscalar architecture. This programme involves 150 people over 3 sites (Meylan, France; Bristol, UK; and Cagliari, Italy). The team is composed of architecture, design, marketing, applications and software groups. The first Chameleon products are targeted at multimedia applications, with special emphasis placed upon software programmability. The intention is to provide not only silicon but also an OS, libraries and applications with the product. This requires a huge amount of software development in parallel with the design activity. In this context, the software must be written, developed and debugged as much as possible while the design goes on, which makes a prototype necessary. Several choices are then possible, among them:
1) an instruction set simulator in a high-level language (C)
2) virtual silicon (acceleration and/or emulation)
3) an ASIC
In fact all these 3 solutions have been fully developed within the programme, but each of them for a particular purpose. The instruction set simulator allows generation of a reference to check against VHDL (field of application: verification and software tools). Virtual silicon (field of application: functional
0-8186-7603-5/96 $5.00 © 1996 IEEE
No feasibility nor implementation studies. Far from future product. Limitations on execution environment.
3. Methodology
This section details the main steps to go from concept to working hardware. Considering the global Chameleon programme and the fact that software development must start 1 year before real silicon is available, a reduced-risk design methodology must be chosen and the architecture must be simplified.
Execution time and distance from reality are too critical to rely on this solution
CONS
Expensive (1 M$). No distribution.
The global technical and financial compromise is not satisfying enough to choose this solution
3.3. Technology
- 0.7 µm HCMOS process, triple metal level, supplied by SGS-THOMSON Microelectronics
- ISB28000 family, sea of gates (about 150 kgates)
- CPGA, 256 pins
- The design is targeted to run at 25 MHz (it actually runs at 28 MHz)
- Full scan DFT approach (97% fault coverage)
CONS
Expensive (4 man-years). Medium development time (1 year).
As the chosen operating system for use with Chameleon processors has been developed using industry standard personal computers, a PC running with a Chameleon processor prototype will be used as the development system. This proto-
Synthesis was performed using SYNOPSYS tools. This task occupied 2 engineers for 3 months. The VHDL and the gate-level netlist were compared by sampling the external inputs/outputs of the chip at each cycle. The gate-level simulator used at this point was VERILOG-XL; we used the SYNOPSYS verilog-out option to enter this simulator. Scan was introduced once the complete chip had been synthesized. We used the edif-out option from SYNOPSYS to enter MENTOR / FASTSCAN. No post-scan timing optimization was performed.
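The cycle-by-cycle comparison of external inputs/outputs can be illustrated with a minimal sketch. It assumes that each simulator run has been exported as a list of per-cycle samples mapping pin names to values; the trace format, the pin names and the function name `first_mismatch` are hypothetical and are not taken from the actual flow.

```python
from typing import Dict, List, Optional, Tuple

# One sample per clock cycle: pin name -> sampled value (hypothetical format).
Sample = Dict[str, int]


def first_mismatch(
    reference: List[Sample], candidate: List[Sample]
) -> Optional[Tuple[int, str]]:
    """Return (cycle, pin) of the first difference, or None if the traces agree."""
    for cycle, (ref, cand) in enumerate(zip(reference, candidate)):
        # Compare every pin seen in either trace at this cycle.
        for pin in sorted(ref.keys() | cand.keys()):
            if ref.get(pin) != cand.get(pin):
                return (cycle, pin)
    # The common prefix agrees; a length difference is still a mismatch.
    if len(reference) != len(candidate):
        return (min(len(reference), len(candidate)), "<trace length>")
    return None


if __name__ == "__main__":
    vhdl_trace = [{"addr": 0x0, "rw": 1}, {"addr": 0x4, "rw": 1}]
    gate_trace = [{"addr": 0x0, "rw": 1}, {"addr": 0x4, "rw": 0}]
    print(first_mismatch(vhdl_trace, gate_trace))
```

Reporting only the first divergent cycle and pin keeps the check cheap and points directly at the place where debugging should start.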
group and is used as the reference for the VHDL. It has no purpose other than being a reference at instruction level. This means that it is not relevant in terms of timing as compared to the VHDL model (no cycle-accuracy). One key point is that there is no component-level verification: each component of the design is integrated into the full chip in order to be verified. This is a chip-level verification. The design is used in a realistic context: rather than having vectors applied to it by a VHDL bench, it executes real Chameleon binary code (a so-called test-case) via a simulated memory. The following figure shows the VHDL architecture that is actually simulated.
[Figure: simulated VHDL architecture. Recoverable labels: CO prototype, Clock, Address-space memory, Timer, Halt, Interrupts irritator; addresses shown include 0x90000000, 0x903fffff and 0x00800010.]
This chip-level methodology avoids the complicated development and maintenance of VHDL test-benches, which are highly sensitive to any modification of interfaces, timing, etc. By verifying the design as a whole, we not only guarantee that the functionality of each block is correct, but we also make sure that the design acts as a complete chip within a realistic representation of its environment. In addition, debugging is easier at assembler level than at bit-vector level. Features such as interrupts or external bus arbitration are verified via external VHDL devices that can be programmed from the test-case to trigger interrupts, generate bus requests, etc. The external events that CO receives are thus controlled from the test-case itself, again without requiring any manipulation of signals at test-bench level. The VHDL test bench consists of:
1) asserting the reset line, causing the memory to be downloaded and the design to be reset
2) de-asserting the reset line to allow the program execution to start
3) waiting for a specific memory access which signals that the program is over, and that the memory can be dumped to perform the result checking
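The three test-bench steps above can be sketched in software as follows. This is only an illustration under stated assumptions: the `SimulatedMemory` class, the `HALT_ADDRESS` value and the callable standing in for the design are hypothetical, and the real bench is written in VHDL around the CO model.

```python
from typing import Callable, Dict

# Hypothetical address whose access signals "program is over".
HALT_ADDRESS = 0xFFFF0000


class SimulatedMemory:
    """Toy stand-in for the simulated memory attached to the design."""

    def __init__(self) -> None:
        self.cells: Dict[int, int] = {}
        self.halted = False

    def load(self, image: Dict[int, int]) -> None:
        self.cells = dict(image)
        self.halted = False

    def read(self, address: int) -> int:
        return self.cells.get(address, 0)

    def write(self, address: int, value: int) -> None:
        if address == HALT_ADDRESS:
            self.halted = True  # the specific access that ends the test-case
        else:
            self.cells[address] = value


def run_test_case(
    image: Dict[int, int],
    design_step: Callable[[SimulatedMemory, int], None],
    expected: Dict[int, int],
    max_steps: int = 1000,
) -> bool:
    memory = SimulatedMemory()
    # Step 1: reset asserted, memory image downloaded.
    memory.load(image)
    # Step 2: reset de-asserted, the program runs.
    step = 0
    while not memory.halted and step < max_steps:
        design_step(memory, step)
        step += 1
    # Step 3: after the halt access, dump the memory and check the results.
    if not memory.halted:
        return False
    dump = dict(memory.cells)
    return all(dump.get(addr) == value for addr, value in expected.items())


def toy_program(memory: SimulatedMemory, step: int) -> None:
    """Hypothetical test-case: add two words, store the sum, then halt."""
    if step == 0:
        memory.write(0x100, memory.read(0x10) + memory.read(0x14))
    else:
        memory.write(HALT_ADDRESS, 1)


if __name__ == "__main__":
    print(run_test_case({0x10: 2, 0x14: 3}, toy_program, {0x100: 5}))
```

In this sketch the expected memory contents play the role of the reference produced by the instruction set simulator; the bench itself never drives individual signals beyond reset.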
In summary, the following figure shows the verification flow of the model. Another advantage of this methodology is that the verification of the CO prototype requires the development of a large number of test programs. These programs constitute an important database that is used for the first part of the verification of the final design, of which CO is a prototype. Some key figures of this verification process:
- number of instructions simulated: 2818281 in test-cases + 1928224 in C programs
- number of cycles simulated: 27935309
- simulation time: 1Ms CPU (2277h)
As these figures show, the simulation time is large. The VHDL simulations are actually run concurrently over several CPUs, dividing the overall simulation time. In addition, an IKOS NSIM hardware accelerator is used to improve the turn-around time. Its performance is around 40 times that of the VHDL simulator.
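As a rough reading of these figures, the short calculation below derives the total instruction count, the average number of simulated cycles per instruction, and the simulation throughput. It assumes that the cycle count covers both the test-cases and the C programs and that the 2277 h of CPU time applies to all of those cycles; neither assumption is stated explicitly in the text.

```python
# Figures quoted in the verification summary.
TEST_CASE_INSTRUCTIONS = 2_818_281
C_PROGRAM_INSTRUCTIONS = 1_928_224
SIMULATED_CYCLES = 27_935_309
CPU_HOURS = 2277

# Assumption: cycles and CPU time cover both instruction counts.
total_instructions = TEST_CASE_INSTRUCTIONS + C_PROGRAM_INSTRUCTIONS
cycles_per_instruction = SIMULATED_CYCLES / total_instructions
cycles_per_cpu_second = SIMULATED_CYCLES / (CPU_HOURS * 3600)

if __name__ == "__main__":
    print(f"total instructions:     {total_instructions}")
    print(f"cycles per instruction: {cycles_per_instruction:.2f}")
    print(f"cycles per CPU-second:  {cycles_per_cpu_second:.2f}")
```

Under these assumptions the averages come out at roughly 5.9 cycles per instruction and 3.4 simulated cycles per CPU-second, which illustrates why the runs are spread over several CPUs and a hardware accelerator.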
functional reference
2) an assembler, which generates object code from a Chameleon assembler file
3) a linker, which builds binary programs from one or more object files
4) a C compiler, which is mainly used for high-level applications.
The tests targeted specifically at the verification of the prototype are written in assembler, whereas C is used for more general code, for example standard benchmarks such as Dhrystone, sieve and Fibonacci computations.
3.8. Daughterboard
A prototype is not useful until it is plugged into a working environment in which it can be used for development and experimentation. Thus the CO silicon is plugged into a host, resulting in a full system prototype. In order to reduce the cost of host development, and because of the first OS that is to be ported, a standard PC host has been chosen as the development platform. The aim is to use the CO prototype as the CPU of a system that has all the common capabilities such as disks, monitor, etc. But as the CO pinout is not compatible with that of the Intel 80486, a specific board has to be developed, whose role is to adapt the CO interfaces to the PC. In addition, this board provides some control capabilities as well as some memory. The next figure shows the daughterboard organisation.
[Figure: verification flow. Recoverable labels: SW toolchain, Assembler file, Generation tools, Simulation tools, Acceleration, IKOS NSIM.]
[Figure: daughterboard organisation. Recoverable labels: Motherboard, CO prototype, Control, T4, Cache.]
The remote host, connected via a link and an on-board T4 (transputer), is used to control CO operations: download boot code into local memory, access on-board control registers, etc. It provides the capability of debugging the software running on CO. The PC is the target machine to which the OS is ported. The development of this board is far from a negligible task and, as detailed later on, is done in parallel with the design of CO.
4. FAQ
4.1. Monitoring and analyzing target architecture
Analysing and monitoring the run-time behaviour of the target architecture is an issue. The big question is: do we meet our objectives? The Chameleon programme addresses this particular point by using an architecture simulator. This simulator relies on information from the design team (pipe structure, reordering, dependencies, number of cycles for an instruction to leave the pipe, ...). It was built at an early stage of the design to check that our applications can run, and it is updated periodically. The prototype does not replace the architecture simulator. For the expected application, one must consider the architecture frozen.
[Figure: programme schedule. Recoverable labels: Processor specification, CO specification, CO Si development, Daughterboard development, Basic micro-kernel software development, Daughterboard integration.]
The results of this programme are:
1) The CO processor and daughterboard are working successfully (performance: 5 MIPS).
2) The software tool chain (compiler, assembler, linker, simulator, debugger) is fully operational.
3) OS and application software development started at the end of November 95.
4) 15 systems are now being used in development.
4.3. Limitations
The only limitation is the performance. For users, the prototype is a 5 MIPS box, while the final processor is a 100 MIPS version of the same box.
6. Conclusion
The complexity of some of today's designs is such that prototypes are often required to confirm architectural choices and performance estimates, but also to allow some back-end tasks to start (e.g. software development). A prototype has to match some specific requirements, not only in terms of functionality and performance but also in terms of ease of use or development costs. Several types of prototypes are possible, depending on the project constraints: software, virtual hardware, real hardware. Each of these has specific advantages (flexibility for software, power for hardware) but also important drawbacks (power for software, flexibility for hardware). To address the particular problem of software-hardware concurrent design within the Chameleon programme, the choice
scratch and the time to market is a key factor for the success of the programme as a whole. Thus any effort for parallel software and hardware co-design must be
to go for a real ASIC prototype was mainly driven by the following reasons:
- technology is available and cheap
- performance (3 MIPS)
- wide distribution capability (for development)
- customer demonstration capabilities
- architecture implementability evaluation
Despite the rather high development cost (4 man-years for the full design flow, from VHDL to sea of gates via synthesis, verification, ...), this solution is a success today:
- it provides more performance than any emulated prototype that could have been built (5 MIPS vs 200 KIPS)
- 15 systems are available for development
- technology and methodology allowed FTSS (First Time Silicon Success)
Finally, another important advantage of this methodology is that it exercises some parts of the design flow for the final design, which is itself a useful form of prototyping.
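To put the quoted performance figures side by side, the sketch below computes the ratio between the silicon prototype (5 MIPS), an emulated prototype (200 KIPS) and the final processor (100 MIPS, as quoted in section 4.3). The workload of one billion instructions is a hypothetical value chosen only to make the run times concrete.

```python
# Instruction rates quoted in the paper, in instructions per second.
PROTOTYPE_IPS = 5_000_000      # CO silicon prototype, 5 MIPS
EMULATED_IPS = 200_000         # emulated prototype, 200 KIPS
FINAL_IPS = 100_000_000        # final processor, 100 MIPS

# Hypothetical workload size, for illustration only.
WORKLOAD_INSTRUCTIONS = 1_000_000_000


def seconds_to_run(instructions: int, instructions_per_second: int) -> float:
    """Run time of a workload at a given sustained instruction rate."""
    return instructions / instructions_per_second


speedup_over_emulation = PROTOTYPE_IPS / EMULATED_IPS
slowdown_versus_final = FINAL_IPS / PROTOTYPE_IPS

if __name__ == "__main__":
    print(f"prototype vs emulation: {speedup_over_emulation:.0f}x faster")
    print(f"prototype vs final:     {slowdown_versus_final:.0f}x slower")
    for name, rate in [("emulated", EMULATED_IPS),
                       ("prototype", PROTOTYPE_IPS),
                       ("final", FINAL_IPS)]:
        print(f"{name}: {seconds_to_run(WORKLOAD_INSTRUCTIONS, rate):.0f} s")
```

With these figures the silicon prototype is 25 times faster than emulation and 20 times slower than the final product, so a job that would occupy an emulator for well over an hour completes on the prototype in a few minutes.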
Acknowledgements
To all the people involved in the CO project (architecture, software, board design, silicon design, verification, staff, management) and more specifically to: Henry Guyot, Christian Bicais, Christian Berthet, Ariel Lasry, Andrew Betts, Jon Frosdick, Carlo Gallino, Nathan Sidwell, Chris Dunford, Jeff Wilson, Mark Debagge, and Genevieve Bartlett.