# Thursday 24 September 2009 Plenary Session 5

### Key technologies for present and future optical networks Jean-Christophe ANTONA ALCATEL, Nozay, FRANCE

Jean.Christophe.antona@alcatel-lucent.com













System Evolution in metro/core terrestrial networks

SE = Spectral Efficiency = Channel Bit Rate / Channel Spacing (b/s/Hz)

|              | 1990s       | 2000    | 2010     | 2020     |
|--------------|-------------|---------|----------|----------|
| Channel rate | 2.5-10 Gb/s | 10 Gb/s | 100 Gb/s | 1 Tb/s ! |
| Channels     | 8, 16, 40   | 100     | 100      | 100      |
| Capacity     | 20-160 Gb/s | 1 Tb/s  | 10 Tb/s  | 100 Tb/s |
| SE (b/s/Hz)  | 0.025-0.05  | 0.2     | 2.0      | 20 !     |
| Status       | History     | History | Planned  | Needed   |

→ When the channel bit-rate increases, the system capacity increases only if the spectral efficiency increases.

Even with an aggressive 2020 target, traffic growth will exceed capacity growth by a factor of 10.

8 | Bell Labs Opt. Networks | January 2009 | All Rights Reserved © Alcatel-Lucent 2009
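
The relation between the rows of the table can be checked numerically. The following is a minimal sketch, assuming that the total capacity is simply the number of channels times the channel rate and that the channel spacing follows from the SE definition; the function names are illustrative and do not come from the slides.

```python
import math


def spectral_efficiency(bit_rate_bps: float, spacing_hz: float) -> float:
    """SE in b/s/Hz, as defined on the slide: channel bit rate / channel spacing."""
    return bit_rate_bps / spacing_hz


def total_capacity(channels: int, bit_rate_bps: float) -> float:
    """Aggregate capacity, assuming every channel runs at the same bit rate."""
    return channels * bit_rate_bps


# Year 2000 column: 100 channels at 10 Gb/s.
capacity_2000 = total_capacity(100, 10e9)

# A spectral efficiency of 0.2 b/s/Hz at 10 Gb/s implies this channel spacing.
spacing_2000 = 10e9 / 0.2

print(f"capacity: {capacity_2000 / 1e12:.1f} Tb/s")
print(f"implied channel spacing: {spacing_2000 / 1e9:.0f} GHz")
print(f"SE check: {spectral_efficiency(10e9, spacing_2000):.2f} b/s/Hz")
```

With a fixed number of channels in a fixed band, the spacing is fixed as well, so a higher channel rate raises the capacity only through a higher SE.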






















































Reduction of CAPEX


- Better fiber/lambda utilization
- Reduced network cost by increasing statistical multiplexing efficiency
- Future proof systems, scalable to manage the expected demand "explosion"


100G: Applications











### Thursday 24 September 2009

## PARALLEL SESSION A5 ASICS

#### Smart Analogue Sampler for the Optical Module of a Cherenkov Neutrino Detector

L. Caponetto <sup>a</sup>, D. Lo Presti <sup>b</sup>, G.V. Russo <sup>b</sup>, N. Randazzo <sup>c</sup> on behalf of the KM3NeT Consortium

<sup>a</sup> INFN Catania/CNRS-IN2P3-CPPM Marseille
<sup>b</sup> INFN Catania/University of Catania
<sup>c</sup> INFN Catania

caponetto@cppm.in2p3.fr

#### Abstract

A transient waveform sampler/recorder IC has been developed and realized in AMS C35B4 technology. This chip has been designed to fit the needs of a proposal for a front-end architecture for the readout of the anode signal of the photomultipliers in an underwater neutrino telescope.

The design is based around a 3-channel $\times$ 32-cell switched-capacitor array unit that samples its voltage inputs at a 200 MHz external clock rate and transfers the stored analogue voltage samples to its single analogue output at 1/10th of the sampling rate. This unit is replicated inside the ASIC, providing 4 independent analogue sampling queues for signal transients up to 32 $\times$ 5 ns and a fifth unit storing transients up to 128 $\times$ 5 ns. A micro-pipelined unit, based on Muller C-gates, controls the 5 independent samplers.

This paper briefly summarizes the complete front-end architecture and discusses in more detail the internal structure of the ASIC and its first functional tests.

#### I. INTRODUCTION

The use of an analogue sampler/recorder in the front-end electronics of a Cherenkov neutrino telescope has demonstrated its effectiveness in detectors such as ANTARES, Nestor and IceCube. The readout system of those detectors provides the time-stamping of single photoelectron (SPE in the following) signals produced by photomultipliers (PMTs) while separating them from background and bioluminescence events, which are the main contributors to the acquisition dead-time [1].

The solution presented in this paper has been tailored to sustain an event rate of short pulses (namely: PMT events not longer than about 150 ns and with an amplitude not larger than about 600 mV for a SPE) in excess of 300 kHz with a negligible dead-time. At the same time the architecture must also offer the way to record PMT events longer than approximately 500 ns, as observed in physics simulations of a model detector.

This architecture may equip a single PMT and be placed either inside the same pressure-resistant container (Optical Module: OM in the following) or outside it. One key point is the choice of an analogue treatment of the PMT signals based on commercial components, which can easily be changed during prototyping; this allows the same conceptual architecture to be used even if the board containing the front-end electronics is placed a few metres from the PMT. This section introduces the main logical blocks of the proposal and describes the role played by the ASIC (SAS: Smart Analogue Sampler), which has been designed and prototyped in AMS CMOS $0.35 \,\mu m$ technology.



Figure 1: Block diagram of the proposed front-end architecture

#### A. Optical module front-end

The proposed architecture [2] is based around five main building blocks:

- a mixed-signal ASIC, providing the capabilities of a four-layer first-in first-out analogue transient recorder;
- a PMT interface, built with commercial components;
- two commercial ADCs, one digitizing the analogue signals sampled and stored by the ASIC and the other digitizing a low-pass replica of the PMT output;
- an FPGA, controlling the data transfer between the ASIC and its ADC, the digitization performed by the second ADC and the packing of the digital data into larger frames to be sent to the data acquisition system.

All these functions are implemented on the same board, which constitutes the front-end electronics of a single PMT.

Figure 1 shows a functional block diagram summarizing the architecture: it depicts a conceptual view of the PMT interface components and the simplified connection scheme between the SAS, the two ADCs and the FPGA.

Two separate prototypes – of the PMT interface and of the ASIC – have been realised: a complete prototype board hosting all the building blocks has been designed and is being produced at the time of writing of this paper.

This board receives the slow control and the synchronous clock (the broadcast distribution of a synchronous protocol for clock and data to all the OMs in the detector is described in [3]) and transmits the collected data out of the OM.

#### B. Store and forward architecture

The data acquisition provided by the architecture is of the store-and-forward type: the off-shore front-end electronics is clocked synchronously by a broadcast signal distributed to all the OMs and all the stored events are time-stamped at the moment they trigger the acquisition. The timing information is then packed together with the digital data to be sent onshore.

The digital time-stamp feature is provided by the ASIC, which contains a 17-bit counter: its content is stored in the *record* that is built each time an external trigger arrives. This record consists of two parts: a digital part, which is stored inside a 4-level digital FIFO, and an analogue part, consisting of the analogue voltage samples stored in the memory cells. When a readout is made, all this information is transferred from the ASIC to the FPGA (with the analogue samples digitized by the ADC), freeing the storage area for the next events.

The main synchronization clock runs at 200 MHz and a resynch pulse is issued from shore every 500 $\mu$s in order to verify the timing integrity of the clocks local to each OM. The ASIC provides a synchronizing feature: it issues an external pulse on one of its output pins (not shown in Figure 1) each time the synchronization input is in phase with the internal counter. A missing pulse on this output indicates that the synchronization has been lost during that particular 500 $\mu$s interval: this is monitored through the FPGA and the data collected during this time slice should be discarded.
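
The figures quoted above can be cross-checked with a few lines of arithmetic. This is only a sketch: the text does not state how the counter is related to the resynch pulse, so the code merely verifies that a 17-bit count is wide enough to label every clock tick within one resynch interval.

```python
CLOCK_HZ = 200e6          # main synchronization clock (from the text)
COUNTER_BITS = 17         # width of the time-stamp counter (from the text)
RESYNC_PERIOD_S = 500e-6  # interval between resynch pulses (from the text)

tick_s = 1.0 / CLOCK_HZ                               # one clock period
ticks_per_resync = round(RESYNC_PERIOD_S * CLOCK_HZ)  # clock ticks per interval
counter_span = 2 ** COUNTER_BITS                      # distinct counter values
rollover_s = counter_span * tick_s                    # free-running rollover time

print(f"clock period: {tick_s * 1e9:.0f} ns")
print(f"ticks per resynch interval: {ticks_per_resync}")
print(f"counter values available: {counter_span}")
print(f"free-running rollover: {rollover_s * 1e6:.2f} us")
```

The 100000 ticks of one resynch interval fit within the 131072 values of the counter, so a time-stamp is unambiguous inside each interval.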

The trigger signal for the start of the sampling is provided by the PMT interface through a threshold comparator: a transition on this signal is issued each time the PMT anode signal crosses the level-0 threshold (about 1/3 of a single photoelectron). On the rising edge of the trigger, and after a short delay produced by the ASIC internal logic, the sampler is started for a minimum time window of 160 ns. During this period the anode signal is checked again to see whether it is still over the threshold: a decision is then taken on whether or not to continue the sampling for a longer period of time. This criterion is described in section II.C.

#### C. The ASIC architecture

The analysis of the transients produced by Monte Carlo simulated events on a model detector led to the adoption of a front-end architecture based around an analogue buffer of four analogue memories, configured in a first-in first-out structure [2]. These memories provide sampling and storage for pulse transients not longer than 160 ns at a foreseen maximum rate of about 300 kHz.

The acquired voltage samples would then be serially digitized by an ADC and the digital data transferred to the onboard FPGA. The digital part of the event record, containing the time-stamp, the address of the analogue buffer used for the sampling and the signal classification code, is directly sent to the FPGA, through a serial output, during the readout of the memory content. The latency of this transfer is actually dominated by the analogue part of the record.

Other considerations required the architecture to integrate a larger analogue memory as well, in order to sample and store transients longer than 160 ns (seldom produced by the PMTs). This independent memory offers the same sampling and storing features as the others but is 4 times longer (namely, it offers storage for transients falling in a time window of $4 \times 160$ ns). The decision whether to start the sampling of this second unit is taken inside the ASIC during the acquisition of the first 160 ns as described in section II.C.

#### II. THE PROPOSED FRONT-END ARCHITECTURE

The initial considerations behind most of the design choices of the front-end will now be briefly reviewed and a general description of the architecture will be given. Furthermore, the signal chain – constituting the PMT interface and the basic signal classification scheme used inside the architecture – will be discussed.

#### A. Considerations on signal range

The analysis of the simulated high energy neutrino events in a model detector showed that events having an amplitude range of about 1000 photoelectrons and lasting less than 500 ns could be produced within the detector and provide useful information for the physics.

At the same time laboratory measurements on a 10" PMT, operating with a gain of  $5 \times 10^7$ , with an anodic load of 50 Ohm (which produces a SPE signal having an amplitude of about 50 mV with a rise time of about 2.6 ns) showed a linear characteristic curve up to about 100 photoelectrons.

Such considerations lead to the observation that, in order to increase the linearity range of the PMT front-end electronics, two strategies could be followed together: increasing the anode load and dividing the amplitude range over multiple input sub-ranges.

Increasing the anodic load also brings the benefit of allowing a lower PMT gain, leading to an increase of its average lifetime.

The second strategy is implemented by replicating the anode signal with different gains and then sampling all these replicas: a classification circuit passes to the sampler unit the information on the amplitude of the anode signal. Depending on this classification, only one of the replicas will be transferred from the sampler to the ADC for digitization.

A signal path from the PMT to the SAS consisting of three channels with three different gains is shown in Figure 1: this path is implemented with commercial components and feeds the input signals of the sampler. The electronics constituting the signal chain, together with the classification circuit, forms the PMT interface block of the front-end architecture [4].

#### B. Signal input chain

Figure 1 shows that each signal channel inside the PMT interface is made of three logical blocks: a gain block, a low pass filter and a delay block.

The three gains have been selected in order to fully exploit the linearity characteristic of the PMT up to about 500 photoelectrons, providing three sub-ranges of 1 V each. As shown in Figure 1, the first channel is a buffered replica of the anode signals, the second is reduced by a factor of 8 and the third by a factor of 64. The anode load offered by this PMT interface is 600 Ohm (see [4] for a detailed description).

A sampling rate of 200 MHz has been chosen in order to minimize the amount of data that must be transferred after digitization: a reconstruction algorithm applied to filtered and sampled anode signals already demonstrated an attainable time resolution better than 300 ps with a relative charge error better than 3%. The anode signals are filtered, right after they are attenuated by the three channel gains, in order to present an attenuation of at least 60 dB at 100 MHz (10 bits over a 1 V range is the specified resolution of the signal front-end).
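
The link between the 10-bit resolution and the 60 dB attenuation figure can be made explicit. The sketch below assumes the usual amplitude ratio in decibels, $20 \log_{10}(2^N)$, for an $N$-bit range; the variable names are illustrative.

```python
import math

SAMPLING_HZ = 200e6   # sampling rate (from the text)
RESOLUTION_BITS = 10  # specified front-end resolution (from the text)
RANGE_V = 1.0         # input range of each sub-range (from the text)


def bits_to_db(bits: int) -> float:
    """Amplitude ratio, in dB, between full scale and one LSB."""
    return 20.0 * math.log10(2 ** bits)


nyquist_hz = SAMPLING_HZ / 2.0         # highest frequency free of aliasing
lsb_v = RANGE_V / 2 ** RESOLUTION_BITS  # size of one LSB over the 1 V range
dynamic_db = bits_to_db(RESOLUTION_BITS)

print(f"Nyquist frequency: {nyquist_hz / 1e6:.0f} MHz")
print(f"LSB: {lsb_v * 1e3:.3f} mV")
print(f"full scale / LSB: {dynamic_db:.1f} dB")
```

A full-scale component at the 100 MHz Nyquist frequency attenuated by about 60 dB is therefore reduced to roughly one LSB, which is consistent with the filter requirement quoted above.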

A solid state delay line is also introduced on the signal path in order to take into account the delay needed by the electronics used for the signal classification and inside the ASIC to actually start the sampling after having received the trigger signal. This delay also takes into account the necessity for the timing reconstruction algorithm to have at least a few points of the signal baseline sampled.

#### C. Signal classification

The architecture classifies input signals following two different criteria: the first is based on the signal amplitude and the second on a time-over-threshold criterion. The classification is made partially within the ASIC (time classification) and partially by a set of comparators on the board. The outputs of those comparators are used by the ASIC to apply both the classification criteria.

Two sets of comparators are shown in Figure 1: the three comparators labelled th1 operate with the same threshold and their outputs form the 3-bit classification of the anode signal amplitude. This is the amplitude classification criterion: that code is stored in the event record built when a trigger arrives and is later used, during the readout phase, to decide which one of the three sampled channels will be transferred to the ADC. The choice to transfer only one of the three available channels to the external ADC is crucial for the transfer latency of the architecture in the presence of high rates of short PMT signals.
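
The amplitude classification can be illustrated with a small behavioural sketch. Several points are assumptions rather than facts from the text: the th1 threshold value (set here to the 1 V sub-range limit), the use of signal magnitudes, and the rule mapping the 3-bit code to the selected channel (the highest-gain replica that stays below the threshold).

```python
GAINS = (1.0, 1.0 / 8.0, 1.0 / 64.0)  # channel gains (from section II.B)
TH1_V = 1.0  # hypothetical threshold: the th1 value is not given in the text


def classify(anode_v: float) -> tuple:
    """Return one comparator bit per replica (True if above the threshold)."""
    return tuple(anode_v * gain > TH1_V for gain in GAINS)


def select_channel(code: tuple) -> int:
    """Assumed rule: pick the highest-gain channel whose comparator did not fire."""
    for index, fired in enumerate(code):
        if not fired:
            return index
    return len(code) - 1  # every comparator fired: use the lowest-gain channel


for amplitude in (0.5, 4.0, 20.0, 100.0):
    code = classify(amplitude)
    print(f"{amplitude:6.1f} V -> code {code} -> channel {select_channel(code)}")
```

Only the selected channel would then be read out, which is what keeps the transfer short for the frequent small pulses.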

Still referring to Figure 1, the th0 comparator operates with a 1/3 SPE threshold and produces the trigger pulse, starting the ASIC acquisition. 100 ns after this start, the SAS stores the amplitude classification code and the status of this comparator output: if this status bit is found high then, at the end of the 160 ns sampling period, the signal acquisition will be continued by the longer memory unit described in section I.C. If this unit is still storing samples from a previous acquisition which has not yet been read, the SAS ends its sampling after 160 ns and this condition is signalled to the FPGA. Samples acquired by the 128 cell memory are all converted by the ADC and transferred to the FPGA: in this case the analogue part of the record is stored in a total of $3 \times 128$ cells plus the 32 cells of the analogue FIFO.

#### III. THE SAS ASIC

This section discusses in more detail the design choices made for the realization of the ASIC. A basic macrocell has been first designed and used as a building block for the core of the ASIC. A complete library of asynchronous cells has also been realized and used for the design of the digital FIFO and of the ASIC control unit.



Figure 2: SAS analogue memory macrocell block diagram

#### A. Analogue memory macrocell

The core of the SAS is built around an analogue memory macrocell. A block diagram of that macrocell is shown in Figure 2. The memory cells are serially addressable using two external clocks: once the addressing units are initialized to the first cell, each pulse issued on one of the two clock inputs shifts the write or read address to the next cell. There are 96 memory cells inside each macrocell, organized in three channels: writing proceeds in parallel along each channel while the readout is always sequential.



Figure 3: SAS memory cell schematic

Voltage samples stored in the memory cells are shifted through a single output allowing the serial readout of only one channel or of the  $32 \times 3$  cells constituting the memory.

The addressing units are simple linear shift registers with a serial input and 32 parallel outputs: both of them use additional logic cells in order to properly shape the width of the output signals. In particular, the write address unit must provide two different pulses in order to separately control the opening of the memory cell write switches (see Figure 3).

The timing of the falling edge of those pulses implements the so-called bottom sampling strategy: the opening of the *writeTOP* CMOS switch slightly precedes the opening of the *writeBOTTOM* switch, making the error injected by the parasitic charge present in each switch signal independent (to first order).



Figure 4: Schematic of the SAS macrocell readout configuration

The write addressing unit uses True Single Phase Clock dynamic flip-flops to attain the sampling speed of 200 MHz while static CMOS flip-flops are used by the read addressing unit which, during the readout, must attain a readout rate of 40 MHz.

The analogue multiplexer shown in Figure 2 on the output of the macrocell controls the readout behaviour of the memory: depending on the state of the amplitude classification only one channel is switched to the single analogue output of the macrocell. The same multiplexer is also used to sequentially switch through the single analogue output all the  $3 \times 32$  cells of the memory: this solution is used during the readout of transients longer than 160 ns. Figure 4 details the readout configuration used in the macrocell [5].
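
The benefit of reading a single channel can be estimated from the figures given in the text. This is a rough sketch which assumes one cell per read clock pulse and ignores any ADC or handshake overhead; it uses the 40 MHz design readout rate and the foreseen 300 kHz event rate.

```python
READ_CLK_HZ = 40e6      # design readout rate (from the text)
CELLS_PER_CHANNEL = 32  # cells per channel in one macrocell (from the text)
CHANNELS = 3            # channels per macrocell (from the text)
EVENT_RATE_HZ = 300e3   # foreseen maximum event rate (from the text)


def readout_time_s(cells: int) -> float:
    """Time to shift out the given number of cells, one per read clock pulse."""
    return cells / READ_CLK_HZ


single_channel_s = readout_time_s(CELLS_PER_CHANNEL)
all_channels_s = readout_time_s(CELLS_PER_CHANNEL * CHANNELS)
mean_spacing_s = 1.0 / EVENT_RATE_HZ

print(f"one channel: {single_channel_s * 1e6:.1f} us")
print(f"three channels: {all_channels_s * 1e6:.1f} us")
print(f"mean event spacing at 300 kHz: {mean_spacing_s * 1e6:.2f} us")
```

Under these assumptions a single-channel readout takes 0.8 µs against 2.4 µs for the whole macrocell, to be compared with a mean spacing of about 3.33 µs between events at the maximum rate.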

All the *analogue OUT* connections of the 8 macrocells constituting the chip core are routed to a single node: when a macrocell is idle, storing data or waiting for a trigger, all the outputs of its three bank amplifiers are left open, allowing other macrocells to be read out. Each macrocell is strictly a single-port memory: writing and reading cannot happen at the same time on the same unit. The output configuration does, however, allow the simultaneous serial writing of more than one unit while another is being read out.

The ASIC uses that macrocell to implement the two analogue memory functions described above: the FIFO analogue buffer and the 128 cell unit. Figure 5 shows a block diagram of the SAS ASIC: the same macrocell is replicated 8 times and 6 different clock domains are shown. Clock domains *CLK1* to *CLK4* are switched on in sequence each time an analogue transient must be stored into the analogue FIFO. Clock domain *CLK5* is only switched on after the analogue transient has passed the time classification criterion described in section II.C. The clock domain produced by the internal LVDS clock buffer is always sent to the 17-bit synchronous counter and used by the control unit to derive the other 5 domains. The initialization logic inside each macrocell provides a means to immediately start the sampling of the analogue signals when the first *writeCLK* pulse arrives and to do the same for the readout of the macrocell when a *readCLK* pulse is sensed.



Figure 5: SAS conceptual block diagram

Still referring to Figure 5, we note how all the internally buffered input signals are permanently connected to the eight analogue memory macrocells. The four on the left implement the four level analogue FIFO buffer and are activated in a circular way: each time that the control unit dispatches the 200 MHz clock to one of the CLK1-4 domains, the sampling starts. Similarly, the four macrocells on the right are all sharing the same clock domain: their acquisition is then started when the control unit commutes CLK5 on.

The analogue samples stored by each memory are available on the single analogue output pin after a *request* action has been issued on the SAS *readout request* output pin. The serial shifting of the cell addresses by the read address unit is controlled by the external *readoutCLK* signal: each pulse triggers the readout of a single memory cell.



Figure 6: Two-wires asynchronous protocol used in the SAS

#### B. Asynchronous control unit

The 200 MHz clock is internally produced by an LVDS receiver and used to control the write address units of each sampling macrocell: it only needs to be dispatched to the active memory unit during its sampling window for the sampling to start. The control unit which dispatches this clock to the sampling macrocells therefore does not need to be clocked: a fully asynchronous design has been implemented instead. The overall synchronization strategy used within the control unit and also at the external chip communication layer is based on a simple two-wire *request-ack* protocol.

In that protocol each block transfers data with its own neighbour based on the approach depicted in Figure 6: when data are ready at the sender output, a *request* action is issued. Then the data are consumed by the receiver block and must remain valid on the sender output until an *ack* action is issued by the receiver: this action allows the production of new data by the sender and the restarting of the cycle. Both actions consist of simple level transitions on the two wires used for flow control (low to high and high to low transitions are equally valid actions).
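
The behaviour of such a transition-signalled handshake can be sketched in software. The model below is illustrative only: the class and method names are invented, and the convention that equal levels on the two wires mean "no data pending" is the usual one for two-phase signalling, assumed here rather than taken from the text.

```python
class TwoWireLink:
    """Behavioural model of a request/ack link using level transitions."""

    def __init__(self) -> None:
        self.req = 0      # level of the request wire
        self.ack = 0      # level of the acknowledge wire
        self.data = None  # value held on the sender output

    def sender_ready(self) -> bool:
        # New data may be produced only once the previous one is acknowledged.
        return self.req == self.ack

    def send(self, value) -> None:
        if not self.sender_ready():
            raise RuntimeError("previous data not yet acknowledged")
        self.data = value
        self.req ^= 1  # any transition on req is a request action

    def receive(self):
        if self.req == self.ack:
            raise RuntimeError("no request pending")
        value = self.data  # data is still valid: ack has not been issued yet
        self.ack ^= 1      # any transition on ack is an ack action
        return value


link = TwoWireLink()
link.send("first")    # req goes low to high
print(link.receive())  # ack goes low to high
link.send("second")   # req goes high to low: an equally valid request
print(link.receive())
```

Because each action is a single transition, one complete transfer costs two wire transitions in total, with no return-to-zero phase.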

The implementation of this protocol inside the chip has been accomplished using a design strategy based on transparent latches and a control flow circuit based on Muller C-gates [6]. Figure 7 shows the control flow circuit used for the realisation of the digital FIFO which stores the digital data of the output record.
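
For readers unfamiliar with the Muller C-gate, its behaviour is easy to state: the output follows the inputs when they agree and holds its previous value when they differ. A minimal two-input behavioural model, with illustrative names, is sketched below.

```python
class CGate:
    """Two-input Muller C-element: a state-holding consensus gate."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, a: int, b: int) -> int:
        if a == b:
            self.out = a  # both inputs agree: the output takes their value
        return self.out   # inputs differ: the output keeps its state


gate = CGate()
for a, b in ((1, 0), (1, 1), (0, 1), (0, 0)):
    print(f"a={a} b={b} -> out={gate.update(a, b)}")
```

This hold-until-agreement property is what lets a chain of C-gates wait for both a request from one side and an acknowledge from the other before advancing.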

Figure 7: Digital FIFO: control flow circuit

The same C-gate is at the basis of all the synchronization logic operating within the control unit. This unit is normally idle, waiting for a rising edge on the chip trigger input; while it is idle, the 200 MHz clock is only used by the 17-bit synchronous counter. A rising edge on the trigger input makes the control unit copy the counter value into the digital FIFO and launch the analogue sampling of the first available unit in the analogue buffer. If all the units contain valid data waiting for a pending readout, then the trigger event is discarded. When the write address unit of the sampling macrocell reaches its $20^{\text{th}}$ cell, the status of the input trigger is stored into the digital FIFO together with the 3-bit classification code, adding this information to the time-stamp already stored. If the status of the trigger signal is low at this point, the sampling proceeds to the $32^{\text{nd}}$ memory cell of the macrocell and the control unit issues a *request* action on the SAS output pin, signalling that a complete record is ready to be read out from the memory. If the status is high, then the sampling of all the input signals is continued by the 128 cell unit. The issuing of the *request* action is then delayed until that unit reaches its last memory cell.

The same status is checked again before that unit ends its sampling: this will eventually be communicated to the FPGA which then starts the digitization of the signal filtered by the low-pass filter (see Figure 1).
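
The decision sequence of the control unit can be summarized in a short behavioural model. This is a sketch under stated assumptions, not the chip logic: the class, field and method names are invented, timing is abstracted away, and the search for the "first available unit" is modelled as a circular scan of the four FIFO macrocells.

```python
from dataclasses import dataclass
from typing import List, Optional

FIFO_UNITS = 4                # depth of the analogue FIFO (from the text)
COUNTER_MASK = (1 << 17) - 1  # 17-bit time-stamp counter (from the text)


@dataclass
class Record:
    timestamp: int        # counter value copied when the trigger arrives
    unit: int             # index of the FIFO macrocell used for sampling
    code: int             # 3-bit amplitude classification code
    over_threshold: bool  # trigger status stored at the 20th cell
    long_unit_used: bool  # True if the 128 cell unit continued the sampling


class ControlUnitModel:
    """Behavioural model only; it does not describe the asynchronous circuit."""

    def __init__(self) -> None:
        self.fifo_busy: List[bool] = [False] * FIFO_UNITS
        self.long_busy = False
        self.next_unit = 0  # circular pointer into the analogue FIFO

    def trigger(self, counter: int, code: int, over_threshold: bool) -> Optional[Record]:
        if all(self.fifo_busy):
            return None  # every unit awaits readout: the trigger is discarded
        while self.fifo_busy[self.next_unit]:
            self.next_unit = (self.next_unit + 1) % FIFO_UNITS
        unit = self.next_unit
        self.next_unit = (unit + 1) % FIFO_UNITS
        self.fifo_busy[unit] = True
        # The long unit takes over only if the signal is still over threshold
        # and the unit is not holding samples that have not been read yet.
        use_long = over_threshold and not self.long_busy
        if use_long:
            self.long_busy = True
        return Record(counter & COUNTER_MASK, unit, code & 0b111, over_threshold, use_long)

    def readout(self, record: Record) -> None:
        """Free the storage used by a record once it has been read out."""
        self.fifo_busy[record.unit] = False
        if record.long_unit_used:
            self.long_busy = False


unit = ControlUnitModel()
print(unit.trigger(counter=1200, code=0b001, over_threshold=False))
print(unit.trigger(counter=5400, code=0b011, over_threshold=True))
```

A record with `over_threshold` set but `long_unit_used` cleared corresponds to the case in which the long unit was still occupied, the condition that the text says is signalled to the FPGA.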

#### IV. FUNCTIONALITY TEST AND CONCLUSIONS

The first samples of the SAS ASIC were received before the board containing the PMT interface, the ADCs and the FPGA was designed. The functional test of the chip was therefore carried out using a double-sided board with a socket and the power supply filters: the analogue output was observed with a scope and the digital output through a state analyser. The trigger, the ack control signal and the classification status bits were emulated using a pattern generator, while the 200 MHz LVDS clock used for the sampling was generated by a free-running signal generator. Another signal generator provided the analogue waveforms sampled by the analogue memories: controlling the timing delay between the input signal and the trigger starting the SAS acquisition allows the measurement of the actual delay between the external start and the first sample acquired by the analogue memory.

Functional tests conducted with this set-up showed that the desired readout rate of 40 MHz could not be achieved: the origin of this problem has been traced back to a wrong design of the bias network of the macrocell bank amplifiers. Tests have been performed at the degraded readout rate of 5 MHz and showed a good signal reconstruction of fast analogue pulses (50 ns input rectangular pulse with 20 ns rise time) up to a 1.8 Vp-p amplitude. The overall power consumption is less than 170 mW (less than 50 mA at 3.3 V power supply), which is somewhat larger than expected: this problem also appears to stem from the same origin.

Though more extensive tests will be performed when the board integrating the ADC and the FPGA is available, the mentioned problems clearly require another foundry run.

#### V. ACKNOWLEDGEMENTS

This work is supported through the EU-funded FP6 KM3NeT Design Study Contract No. 011937.

#### References

[1] U. Katz - "Status of the KM3NeT project", Nucl. Instr. & Meth. A 602, (2009) 40-46

[2] Lo Presti, Caponetto, Randazzo - "Low power multidynamics front-end architecture for the optical module of a neutrino underwater telescope", Nucl. Instr. & Meth. A 602 (2009) 126-128

[3] Ameli *et al.* - "The Data Acquisition and Transport Design for NEMO Phase 1", IEEE Transactions on Nuclear Science Vol. 55, No. 1, Feb. 2008 233-240

[4] Sipala, Lo Presti, Randazzo, Caponetto - "A PMT interface for the optical module front-end of a neutrino underwater telescope", proceedings of the *DDECS '07 IEEE Conference*, New York April 2007

[5] Breton, Tocut, Borgeaud, Delagnes, Parsons, Sippach - "HAMAC, a rad-hard high dynamic range analog memory for ATLAS calorimetry", proceedings of the *6th Workshop Electronics for LHC Experiments*, Cracow, Poland, Sept. 11-15, 2000

[6] Sparsø, Furber editors - "Principles of Asynchronous Circuit Design – A System Perspective", K.A.P. Boston 2001

#### PARISROC, a Photomultiplier Array Integrated Read Out Chip

<u>S. Conforti Di Lorenzo<sup>a</sup></u>, J.E. Campagne<sup>b</sup>, F. Dulucq<sup>a</sup>, C. de La Taille<sup>a</sup>, G. Martin-Chassard<sup>a</sup>, M. El Berni<sup>a</sup>, W. Wei<sup>c</sup>

<sup>a</sup> OMEGA/LAL/IN2P3, centre universitaire BP34 91898 ORSAY Cedex, France
<sup>b</sup> LAL/IN2P3, centre universitaire BP34 91898 ORSAY Cedex, France
<sup>c</sup> IHEP, Beijing, China

conforti@lal.in2p3.fr

#### Abstract

PARISROC is a complete read out chip, in AMS SiGe 0.35 µm technology [1], for photomultiplier arrays. It allows triggerless acquisition for next generation neutrino experiments and it belongs to an R&D program funded by the French national agency for research (ANR) called PMm2: "Innovative electronics for photodetectors array used in High Energy Physics and Astroparticles" [2] (ref. ANR-06-BLAN-0186). The ASIC integrates 16 independent and auto-triggered channels with variable gain and provides charge and time measurement by a Wilkinson ADC and a 24-bit counter. The charge measurement should be performed from 1 up to 300 photo-electrons (p.e.) with a good linearity. The time measurement combines a coarse time from a 24-bit counter at 10 MHz with a fine time on a 100 ns ramp to achieve a resolution of 1 ns. The ASIC sends out only the relevant data through network cables to the central data storage. This paper describes the front-end electronics ASIC called PARISROC.

#### I. INTRODUCTION

The PMm<sup>2</sup> project proposes to segment the large photodetection surface [3] into macro pixels, each consisting of an array (2 m $\times$ 2 m) of 16 photomultipliers connected to autonomous front-end electronics (Figure 1) and powered by a common high voltage. Such large detectors are used in next generation proton decay and neutrino experiments, i.e. the post-SuperKamiokande detectors such as megaton-size water Cerenkov or 100 kt-size liquid scintillator detectors. These new detectors will require very large photodetection surfaces at a moderate cost. This R&D [2] involves three French laboratories (LAL Orsay, LAPP Annecy, IPN Orsay) and ULB Brussels for the DAQ.

LAL Orsay is in charge of the design and tests of the readout chip named PARISROC, which stands for Photomultiplier ARray Integrated in Si-Ge Read Out Chip.



Figure 1: Principle of the PMm2 proposal for a megaton scale Cerenkov water tank.

#### II. PARISROC ARCHITECTURE.

#### A. Global architecture

The ASIC PARISROC (Figure 2) is composed of 16 analog channels managed by a common digital part.



Figure 2: PARISROC global schematic.

Each analog channel (Figure 3) is made of a voltage preamplifier with variable and adjustable gain. The variable gain is common to all channels and is set by the 3-bit input variable capacitance. The gain is also tuneable channel by channel, to compensate for the gain non-homogeneity of the input PMTs, by means of the 8-bit switched feedback capacitance.



Figure 3: One channel schematic.

The preamplifier is followed by a slow channel for the charge measurement, in parallel with a fast channel for the trigger output. The slow channel is made of a slow shaper followed by an analog memory with a depth of 2 to provide a linear charge measurement up to 50 pC; this charge is converted by a Wilkinson ADC (8, 9 or 12 bits). One follower OTA is added to deliver an analog multiplexed charge measurement. The fast channel consists of a fast shaper followed by 2 low offset discriminators to auto-trigger down to 50 fC. The thresholds are loaded by 2 internal 10-bit DACs common to the 16 channels and an individual 4-bit DAC for one discriminator. The 2 discriminator outputs are multiplexed to provide only 16 trigger outputs. Each output trigger is latched to hold the state of the response until the end of the clock cycle. It is also delayed to open the hold switch at the maximum of the slow shaper. An "OR" of the 16 triggers gives a 17th output. For each channel, a fine time measurement is made by an analog memory with a depth of 2 which samples a 12-bit TDC ramp of 100 ns, common to all channels, at the same time as the charge. This time is then converted by the Wilkinson ADC. The two ADC discriminators have a common ramp, of 8/10/12 bits, as threshold to convert the charge and the fine time. In addition, a bandgap block provides all voltage references.

#### B. Digital part.

An overview of the digital part is given in Figure 4. The digital block manages the track and hold system like a FIFO and starts and stops all the counters [4]. All the data are serialized before being sent out.



Figure 4: Digital part overview.

There are two clocks: one at 40 MHz for the analog-to-digital conversion and the track and hold management, and a second at 10 MHz for the timestamp and readout.
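To give a feeling for what the 40 MHz conversion clock implies, the sketch below estimates the worst-case conversion time of a Wilkinson ADC. It assumes that the counter advances by one code per clock period and that a full-scale input requires counting through all 2^N codes; the function name is hypothetical and the chip's actual sequencing may differ.

```python
# Worst-case Wilkinson conversion time.
# Assumption: one counter step per clock period, full-scale input.
ADC_CLOCK_HZ = 40e6  # conversion clock quoted in the text


def wilkinson_max_conversion_time(n_bits, clock_hz=ADC_CLOCK_HZ):
    """Return the time in seconds needed to count through all 2**n_bits codes."""
    return (2 ** n_bits) / clock_hz


for n_bits in (8, 10, 12):
    t_us = wilkinson_max_conversion_time(n_bits) * 1e6
    print(f"{n_bits}-bit ramp: {t_us:.1f} us")
```

Under these assumptions the conversion time grows by a factor of 4 for every 2 extra bits, which is the usual trade-off between resolution and dead time in this type of converter.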

The readout format is 52 bits: 4 bits for the channel number + 24 bits for the timestamp + 12 bits for the charge conversion + 12 bits for the fine time conversion. The readout is selective: only the hit channels are read, so the maximum readout time is 100 $\mu$s, reached when all channels are hit.
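The 52-bit frame can be illustrated with a short packing and unpacking sketch. The field widths come from the text; the bit ordering (fields sent most significant first, in the order listed) and the assumption of one serial bit per 10 MHz clock period are illustrative choices, not taken from the chip documentation.

```python
# Field widths from the text; ordering within the frame is an assumption.
FIELDS = (("channel", 4), ("timestamp", 24), ("charge", 12), ("fine_time", 12))
FRAME_BITS = sum(width for _, width in FIELDS)  # 52 bits
READOUT_CLOCK_HZ = 10e6  # readout clock quoted in the text


def pack_frame(**values):
    """Pack the four fields into one integer, first field in the top bits."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_frame(word):
    """Inverse of pack_frame: return a dict of field values."""
    fields = {}
    shift = FRAME_BITS
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


def readout_time(n_hit_channels, clock_hz=READOUT_CLOCK_HZ):
    """Serial readout time in seconds, assuming one bit per clock period."""
    return n_hit_channels * FRAME_BITS / clock_hz


frame = pack_frame(channel=5, timestamp=0xABCDEF, charge=0x123, fine_time=0xFFF)
print(unpack_frame(frame))
print(f"16 hit channels: {readout_time(16) * 1e6:.1f} us")
```

With these assumptions, 16 hit channels take 83.2 µs of pure serial transfer, which is compatible with the 100 µs maximum quoted above.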

#### III. MEASUREMENTS AND SIMULATION.

#### A. General tests.

A dedicated test board has been designed and built to test the ASIC (Figure 5). Its aim is to allow the characterization of the chip and of the communication between the photomultipliers and the ASIC. This is done with a dedicated LabVIEW program that sends the ASIC configuration (slow control parameters, ASIC parameters, etc.) and receives the output bits via a USB cable connected to the test board. The LabVIEW program was developed by the LAL "Tests group".



Figure 5: Test board.

#### 1) Input signal.

A signal generator is used to create the input charge injected into the ASIC. The injected signal is as similar as possible to the PMT signal. Figure 6 shows the generator input signal and its characteristics. The input signal, used in both measurements and simulation, is a triangle signal with 5 ns rise and fall times and a duration of 5 ns. This current signal is sent to an external 50 Ohm resistor and varies from 0 to 5 mA in order to simulate a PMT charge from 0 to 50 pC, which represents 0 to 300 p.e. when the PM gain is $10^6$.
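The correspondence between charge and photoelectrons follows from the elementary charge multiplied by the PM gain. The helper names in this minimal sketch are hypothetical; the only inputs are the gain of $10^6$ from the text and the value of the elementary charge.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs


def pe_to_charge(n_pe, pm_gain=1e6):
    """Anode charge in coulombs for n_pe photoelectrons at the given PM gain."""
    return n_pe * ELEMENTARY_CHARGE_C * pm_gain


def charge_to_pe(charge_c, pm_gain=1e6):
    """Number of photoelectrons corresponding to an anode charge."""
    return charge_c / (ELEMENTARY_CHARGE_C * pm_gain)


print(f"1 p.e.  -> {pe_to_charge(1) * 1e15:.0f} fC")
print(f"50 pC   -> {charge_to_pe(50e-12):.0f} p.e.")
```

One photoelectron is therefore about 160 fC at this gain, and 50 pC corresponds to roughly 310 p.e., in line with the rounded figure of 300 p.e. used in the text.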



Figure 6: Input signal used for measurements and simulations.

#### 2) Analog part tests.

|                  | Preamplifier (Gain PA = 8) | Slow shaper (RC = 50 ns) | Fast shaper     |
|------------------|----------------------------|--------------------------|-----------------|
|                  | Meas. / Sim.               | Meas. / Sim.             | Meas. / Sim.    |
| Voltage (1 p.e.) | 5 mV / 5.43 mV             | 12 mV / 19 mV            | 30 mV / 39 mV   |
| rms noise        | 1 mV / 468 µV              | 4 mV / 2.3 mV            | 2.5 mV / 2.4 mV |
| Noise (p.e.)     | 0.2 / 0.086                | 0.3 / 0.125              | 0.08 / 0.06     |
| SNR              | 5 / 12                     | 3 / 8                    | 12 / 16         |

Table 1 lists the simulation and measurements results for the three main blocks of the analog part: Preamplifier, Slow shaper and Fast shaper.

#### Table 1: Analog part results.

Measurements and simulation agree well for the analog part, except for the noise values. To characterize the noise, the Signal to Noise Ratio (SNR) is calculated with reference to the MIP (1 p.e.). The noise differences are immediately evident: an additional low-frequency noise is present in the measurements. It is still under investigation, but is thought to be tied to the power supply noise. A small difference was noticed when measuring without the USB cable that carries the communication between the test board and the LabVIEW program: the preamplifier rms noise is then $660\,\mu$V (0.132 p.e.), giving an SNR of 8.
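The noise figures of Table 1 can be cross-checked from the 1 p.e. amplitude and the rms noise alone. The sketch below assumes that the noise in p.e. is the rms noise divided by the 1 p.e. amplitude and that the SNR is its inverse; the function names are hypothetical.

```python
# (1 p.e. amplitude in mV, rms noise in mV), taken from Table 1.
TABLE_1 = {
    "preamplifier": {"meas": (5.0, 1.0), "sim": (5.43, 0.468)},
    "slow shaper": {"meas": (12.0, 4.0), "sim": (19.0, 2.3)},
    "fast shaper": {"meas": (30.0, 2.5), "sim": (39.0, 2.4)},
}


def noise_in_pe(signal_mv, noise_mv):
    """rms noise expressed as a fraction of the 1 p.e. amplitude."""
    return noise_mv / signal_mv


def snr(signal_mv, noise_mv):
    """Signal to noise ratio referred to 1 p.e."""
    return signal_mv / noise_mv


for block, columns in TABLE_1.items():
    for kind, (signal_mv, noise_mv) in columns.items():
        print(f"{block:12s} {kind:4s} "
              f"noise = {noise_in_pe(signal_mv, noise_mv):.3f} p.e., "
              f"SNR = {snr(signal_mv, noise_mv):.1f}")

# Preamplifier measured without the USB cable: 660 uV rms.
print(f"no USB: {noise_in_pe(5.0, 0.66):.3f} p.e., SNR = {snr(5.0, 0.66):.1f}")
```

The same two ratios reproduce the 0.132 p.e. and the SNR of about 8 quoted for the measurement without the USB cable.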

Another important characteristic is the linearity. Figure 7 shows the preamplifier linearity as a function of the variable feedback capacitor value, for an input charge of 10 p.e.; the residuals range from -2.5% to 1.35%. The gain adjustment is linear to about 2% over the 8 bits.



Figure 7: Preamplifier linearity vs feedback capacitor value.

Figure 8 shows the slow shaper linearity for a time constant of 50 ns and a preamplifier gain of 8. The slow shaper output voltage is plotted as a function of the injected input charge. Good linearity is obtained, with residuals better than $\pm 1\%$.



Figure 8: Slow shaper linearity;  $\tau$ =50 ns and Gpa=8.

Homogeneity across the whole chip is essential for a multichannel ASIC. To investigate it, the maximum voltage value of every channel is plotted for the different preamplifier gains. Figure 9 gives the resulting gain uniformity. Dispersions of 0.5%, 1.4% and 1.2% have been obtained for gains 8, 4 and 2 respectively. This uniformity is one of the goals for the ASIC.



Figure 9: Gain uniformity for Gpa = 8, 4, 2.

#### B. DAC Linearity

The DAC linearity has been measured by recording the DAC output voltage (Vdac) obtained for different DAC register values. Figure 10 gives the evolution of Vdac as a function of the register value, together with the residuals, which range from -0.1% to 0.1%.



#### C. Trigger Output.

The trigger output behaviour was studied by scanning the threshold for different injected charges. At first no charge was injected, which corresponds to measuring the fast shaper pedestal. The result is shown in Figure 11 for each channel. The 16 curves (called S-curves because of their shape) are superimposed, which indicates good homogeneity. The spread is one DAC count (DAC LSB = 1.78 mV), equivalent to 0.06 p.e.



Figure 11: Pedestal S-Curves for channel 1 to 16.

The trigger efficiency was then measured for a fixed injected charge of 10 p.e. Figure 12 shows the S-curves obtained with 200 measurements of the trigger, for all channels, while varying the threshold. The homogeneity is demonstrated by a spread of 7 DAC units (0.4 p.e.).
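The spreads quoted in DAC units can be converted to photoelectrons with the DAC LSB and the fast shaper amplitude for 1 p.e. The sketch below assumes the measured value of 30 mV per p.e. from Table 1; the function name is hypothetical.

```python
DAC_LSB_MV = 1.78             # threshold DAC step, from the text
FAST_SHAPER_MV_PER_PE = 30.0  # measured fast shaper amplitude for 1 p.e. (Table 1)


def dac_counts_to_pe(counts, lsb_mv=DAC_LSB_MV, mv_per_pe=FAST_SHAPER_MV_PER_PE):
    """Convert a threshold spread in DAC counts to photoelectrons."""
    return counts * lsb_mv / mv_per_pe


print(f"1 DAC count  -> {dac_counts_to_pe(1):.2f} p.e.")
print(f"7 DAC counts -> {dac_counts_to_pe(7):.2f} p.e.")
```

Both spreads quoted in this section (one count for the pedestal, 7 counts at 10 p.e.) are consistent with this conversion.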



Figure 12: S-Curve for input of 10 p.e. for channel 1 to 16.

The trigger output was also studied by scanning the threshold for a fixed channel while changing the injected charge. Figure 13 shows the trigger efficiency versus the DAC unit for injected charges from 0 to 300 p.e., and Figure 14 plots the threshold versus the injected charge, but only up to 0.5 pC.





Figure 14: Trigger efficiency vs DAC count until 3 p.e.

Figure 15 plots the 50% trigger efficiency values, extracted from the plot in Figure 14 and converted to mV, versus the injected charge. A noise of 10 fC has been extrapolated. The threshold is therefore limited to 10$\sigma$ of the noise because of the discriminator coupling.
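One simple way to extract a 50% trigger efficiency point from an S-curve is linear interpolation between the two scan points that bracket the crossing. The sketch below is only an illustration: the scan values are invented, the function name is hypothetical, and the paper does not state which extraction method was actually used.

```python
DAC_LSB_MV = 1.78  # threshold DAC step quoted earlier in the text


def threshold_at_half_efficiency(dac_counts, efficiencies, level=0.5):
    """Return the DAC value at which the S-curve first crosses `level`,
    using linear interpolation between neighbouring scan points."""
    points = list(zip(dac_counts, efficiencies))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (y0 - level) * (y1 - level) <= 0 and y0 != y1:
            return x0 + (level - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("efficiency never crosses the requested level")


# Hypothetical scan, for illustration only.
scan_dac = [100, 101, 102, 103, 104]
scan_eff = [1.00, 0.95, 0.60, 0.20, 0.00]

half_point = threshold_at_half_efficiency(scan_dac, scan_eff)
print(f"50% point: {half_point:.2f} DAC counts "
      f"= {half_point * DAC_LSB_MV:.1f} mV")
```

The function works for S-curves that rise or fall with the DAC value, since it only looks for a sign change around the requested level.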



Figure 15: Threshold vs injected charge until 500 fC.

#### D. ADC.

The ADC performance has been studied alone and with the whole chain. DC voltages from the internal DAC were injected directly into the ADC input (in order to have a voltage level as stable as possible) and the ADC values were measured for all channels. The measurement is repeated 10000 times for each channel. The first panel of the LabVIEW window (Figure 16) plots the minimum, maximum and mean values, over all acquisitions, for each channel. The second panel gives the rms charge value versus channel number, which lies in the range [0.5, 1] ADC unit. Finally, the third panel shows an example of the charge amplitude distribution for a single channel; a spread of 5 ADC counts is obtained.



Figure 16: ADC measurements with DC input 1.45V.

The ADC is intended for multichannel conversion, so its uniformity and linearity are studied in order to characterize its behaviour. Figure 17 shows the transfer function of the 10-bit ADC versus the input voltage level. All channels are plotted and the curves are superimposed.



Figure 17: 10-bit ADC transfer function vs input charge.

This plot shows the good ADC uniformity among the 16 channels. Figure 18 shows the 12-bit ADC linearity, with 25 measurements made at each input voltage level. The average ADC count value is plotted versus the input signal. The residuals range from -1.5 to 0.9 ADC units for the 12-bit ADC, from -0.5 to 0.4 for the 10-bit ADC and from -0.5 to 0.5 for the 8-bit ADC, which demonstrates the good ADC behaviour in terms of integral non-linearity.
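Residuals of this kind can be obtained by fitting a straight line to the averaged ADC counts and subtracting it. The sketch below assumes a least-squares fit, which the text does not specify; the data points are invented for illustration and the function names are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares straight line: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def inl_residuals(x, y):
    """Deviation of each point from the fitted line, in ADC counts."""
    slope, intercept = linear_fit(x, y)
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]


# Hypothetical averaged ADC counts versus input voltage (V), illustration only.
voltages = [1.0, 1.2, 1.4, 1.6, 1.8]
mean_counts = [100.0, 301.0, 499.0, 700.0, 900.0]

residuals = inl_residuals(voltages, mean_counts)
print(f"residuals from {min(residuals):.2f} to {max(residuals):.2f} ADC units")
```

Averaging the repeated measurements at each voltage before the fit, as done in the text with 25 measurements per level, reduces the contribution of noise to the residuals.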



Once the ADC performance had been tested separately, measurements were performed on the complete chain. The input signal is auto-triggered, held in the T&H and converted by the ADC; the results are illustrated in Figure 19, which plots the 10-bit ADC counts as a function of the variable input charge (up to 50 p.e.). A linearity of 1.4% and a noise of 6 ADC units are obtained.



#### IV. CONCLUSION

Good overall performance of the PARISROC chip has been obtained: the auto-trigger signal and the digitization of the data both work. Uniformity and linearity are good, although the noise performance is unexpected, with 10 MHz clock noise and a low-frequency noise that is still under investigation. A second version of the chip will be submitted in November 09 with an increased dynamic range thanks to 2 preamplifier gains (high gain and low gain), an 8/9/10-bit ADC to reduce the p.e. loss below the 1% level in case of a 5 kHz dark current per PMT, and a double fine TAC.

#### V. REFERENCES

- [1] http://asic.austriamicrosystems.com/
- [2] http://pmm2.in2p3.fr/
- [3] B. Genolini et al., PMm2: large photomultipliers and innovative electronics for next generation neutrino experiments, NDIP'08 conference. doi:10.1016/j.nima.2009.05.135.
- [4] F. Dulucq et al., Digital part of PARISROC: a photomultiplier array readout chip, TWEPP08 conference.

#### The 8 bits 100 MS/s Pipeline ADC for the INNOTEP Project – TWEPP-09

S. Crampon a,c, G. Bohner a,c, H. Chanal a,c, J. Lecoq a,c, H. Mathez b,c, P.-E. Vert a,c

<sup>a</sup> Laboratoire de Physique Corpusculaire 63177 Aubière, France
<sup>b</sup> Institut de Physique Nucléaire de Lyon 69622 Villeurbanne, France
<sup>c</sup> MICRHAU pole de MIcroélectronique RHone AUvergne http://micrhau.in2p3.fr/

#### crampon@clermont.in2p3.fr

#### Abstract

This paper describes the Analog to Digital Converter developed for the front-end electronics of the IN2P3 INNOTEP project by the "pole microelectronique Rhone-Auvergne" (a collaboration between LPC Clermont-Ferrand and IPNL Lyon). This ADC is a 4-stage pipeline with 2.5 bits per stage and open-loop track and holds and amplifiers. It runs at 100 MSamples/s and has 8 bits of resolution. Each stage uses two lines, the gain line and the comparison line, with most operators running in current mode. The main idea of the current-mode line is to take a first step toward an all-current structure. Currently, this ADC is designed in a 0.35 µm SiGe technology.

#### I. INTRODUCTION

Positron Emission Tomography (PET) scanners have been recognized as very powerful and sensitive instruments for biomedical purposes such as brain studies, cardiac imaging, early cancer diagnosis and therapy. They operate by indirect detection of the positron emitted by a radioisotope: the positron annihilates with an electron to produce a pair of 511-keV (gamma) photons emitted in opposite directions. Each escaped photon may hit a scintillator to generate a light pulse that can be detected using a photomultiplier tube (PMT) or an avalanche photodiode (APD). Via sensitive and rapid detection of the 511-keV photon pair, the positron annihilation event can be localized on a straight line of coincidence (line of response, or LOR). PET scanners should make use of low-noise and rapid electronics associated with the PMT or APD. The associated electronics may include successive charge-sensitive amplification, analog filtering, A-to-D conversion and digital signal processing, as shown in Fig. 1.



Figure 1: Architecture of the detector-associated electronics

We present in this paper the design of an Analog to Digital Converter for this application. The proposed circuit is based on a fully differential structure.

#### II. CIRCUIT DESCRIPTION

The chosen architecture is a 4-stage pipeline (Figure 2). Each stage is designed with 2.5 bits to reach a resolution of 8 bits. There are 6 comparators per stage (for 7 references). The Analog to Digital Converter consists of two parts: the gain line, which is fully differential and open loop in order to minimize stability problems, and the comparison line, which uses a current structure to limit the comparator kick-back noise and the charge injection in the 3-bit DAC. The first three stages are similar; only the 4<sup>th</sup> stage is different, as it has no gain line. The quality constraints are relaxed by a factor of 4 between stage n and stage n+1.



Figure 2: The ADC architecture

#### III. CHARACTERISTICS AND TOLERANCES

For the first version, the clock was different for the 4 stages (Clk for stages 1 and 3, and inverted Clk for stages 2 and 4). New bits are obtained every 5 ns, that is, at a real rate of 200 MHz at the outputs of the stages.

A behavioral model was built to determine the critical points of the structure studied. After simulation the following results were found:

- The comparators need a high gain.
- The comparator offset is not critical (1/16 of the dynamic range => 125 mV).
- The quality of the references and amplifiers is a key point (+/- 2 LSB) in the first stage.
- The maximum tolerable gain error is 4%.
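The tolerance on the comparator offset can be illustrated with a small behavioural sketch of a 2.5-bit stage. Everything below is a textbook assumption rather than the authors' model: the input range is taken as ±1 V (the 125 mV quoted as 1/16 of the dynamic range implies 2 V), the 6 comparator thresholds sit at odd multiples of Vref/8, the 7 DAC levels at multiples of Vref/4, and the residue is amplified by 4. The final digital code mapping is not modelled.

```python
V_REF = 1.0  # volts; assumed half of a 2 V differential dynamic range
# Six comparator thresholds at odd multiples of V_REF/8 (textbook choice).
THRESHOLDS = [V_REF * t / 8 for t in (-5, -3, -1, 1, 3, 5)]


def stage_2p5(v_in, comparator_offsets=(0.0,) * 6):
    """One 2.5-bit stage: return (k, residue) with k in -3..3.

    k counts the comparators that fire; the residue is the input minus
    the selected DAC level, multiplied by the stage gain of 4.
    """
    k = -3 + sum(v_in > th + off
                 for th, off in zip(THRESHOLDS, comparator_offsets))
    residue = 4.0 * (v_in - k * V_REF / 4.0)
    return k, residue


def convert(v_in, n_stages=4):
    """Chain the stages and rebuild an estimate of the input voltage.
    The residue of the last stage is simply discarded."""
    estimate, weight, v = 0.0, V_REF / 4.0, v_in
    for _ in range(n_stages):
        k, v = stage_2p5(v)
        estimate += k * weight
        weight /= 4.0
    return estimate


for v in (-0.9, -0.3, 0.0, 0.42, 0.8):
    print(f"v_in = {v:+.3f} V -> estimate = {convert(v):+.4f} V")

# With every comparator shifted by 120 mV the residue still stays in range.
worst = max(abs(stage_2p5(i / 500 - 1, (0.12,) * 6)[1]) for i in range(1001))
print(f"largest residue with 120 mV offsets: {worst:.2f} V")
```

In this model a comparator offset below 125 mV only moves the decision point; the residue remains within ±Vref and is corrected by the following stages, which is why the offset is not critical while the reference and gain accuracy are.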

#### IV. THE STRUCTURE OF THE ADC

#### A. Gain line

The gain line in the first version consists of a single track/hold and a subtraction block. To obtain the gain of 4 required by the 2.5-bit structure, each of these two blocks has an intrinsic gain of 2. The track/hold and the subtractor use an open-loop structure and bipolar transistors to reach the expected 100 MHz. The subtraction is done with a DAC working in current mode, which controls the current generators associated with the input differential pair.



Figure 3: The stage structure, with the comparison line (in black) and the gain line (in red)

#### B. Subtractor with gain 2

One interesting feature of this block, due to the distribution of the gain of 4 between the two blocks of the gain line, is that the output voltage is half of the dynamic range (1 V differential), which eases the voltage excursion problems. However, it calls for better precision.

The gain 2 differential multiplier is a classic open-loop structure that uses a resistor ratio. A single-stage correction is implemented to obtain the absolute accuracy requested. To perform the subtraction we use variation of currents between the two branches of the differential multiplier.



Figure 4: The subtractor structure, with the gain 2 differential amplifier and the current DAC structure. Depending on the results of the comparisons, the generators are distributed separately on one of the two branches.

In fact, the current in each branch is defined by a DAC whose switches are controlled by the outputs of the comparators of the same stage. Another key feature of this structure is that it works at constant current and avoids the classical voltage references.

#### 1) Multiplication:

A classic open-loop gain-2 differential amplifier is used for the multiplication. Because only half of the dynamic range is present at the output, the corrections have been kept as simple as possible: a fixed gain, a gain improvement for large signals through a diode that decreases part of the collector resistor, and a capacitor in parallel with the collector resistor to speed up the structure.

#### 2) Current Digital to Analog Converter:

The subtraction of references is made by implementing a current-mode DAC, using a basic structure that adds current generators to convert the input voltage. With the subtraction by modulation of current in the differential branches, we achieved:

- no external references (only one current source and w/l for all generators).
- direct use of the comparators output.
- easier control of charge injection phenomena.
- better linearity.

This structure operates with 8 current generators: 2 are fixed, and the other 6 identical generators have to be controlled. Through the DAC, which uses a system of switches, the current generators are distributed between the two branches, which allows the subtraction to be performed.

For a zero subtraction (range 0), currents should be identical in the two branches, or a distribution 3-3 for the generators. Then we have 4-2 (a difference of 2 units of current) for the range 1 and 5-1 (a difference of 4) for the range 2 and 6-0 (a difference of 6) for the range 3. We get the opposite for the other 3 ranges. (-3, -2 and -1)

We have staggered the different ranges -3, -2, -1, 0, 1, 2, 3 which will allow us to obtain the good references for the subtraction depending on the input voltages.
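The distribution of the six switched generators described above can be written down compactly. The following Python sketch is only illustrative: the helper name `generator_split` and the sign convention for the ranges are assumptions, not part of the design.

```python
# Illustrative sketch: how the 6 switched current generators are shared
# between the two branches for each range of the 2.5-bit stage.
# `generator_split` is a hypothetical helper, not taken from the design.

N_SWITCHED = 6  # identical generators steered by the DAC switches


def generator_split(range_index: int) -> tuple[int, int]:
    """Return (branch_a, branch_b) generator counts for a range in -3..3."""
    if not -3 <= range_index <= 3:
        raise ValueError("range must be between -3 and 3")
    half = N_SWITCHED // 2
    return half + range_index, half - range_index


for r in range(-3, 4):
    a, b = generator_split(r)
    print(f"range {r:+d}: {a}-{b} (difference {a - b:+d} units)")
```

Each step of one range moves a single generator from one branch to the other, so the differential current changes by 2 units while the total current stays constant.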

#### C. Track and hold (gain 2)

At the output of each stage, we place a track/hold of gain 2. The input level is identical for all blocks: 2.3 V. The design is very classic: input level adaptation, gain 2 open-loop amplifier with its corrections, level adaptation, output on switches, storage capacitors and output transistors (PMOS are needed to obtain the right common mode voltage).

The signal is held on PMOS transistors, so there is no current discharging the capacitor. The noise depends only on the capacitor value, $\sigma^2 = kT/C$; we want $\sigma$ to be less than 0.25 LSB differential, or 1 mV. For the capacitor, $C > 20 \ aF$ is needed, and finally a capacitor $C = 300 \ fF$ has been chosen.

The switches are a critical point of this structure: the errors they introduce, charge injection and clock feedthrough, need to be controlled.
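As a quick numerical check of the noise budget, the usual expression for thermal noise sampled on a capacitor gives an rms value of $\sqrt{kT/C}$. The sketch below evaluates it for the chosen 300 fF capacitor; the temperature of 300 K is an assumption, since the operating temperature is not stated here.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K
TEMPERATURE = 300.0         # K, assumed operating temperature


def ktc_noise_rms(capacitance_farad: float, temperature: float = TEMPERATURE) -> float:
    """RMS thermal noise voltage sampled on a capacitor: sqrt(kT/C)."""
    return math.sqrt(K_BOLTZMANN * temperature / capacitance_farad)


sigma = ktc_noise_rms(300e-15)  # the 300 fF storage capacitor
print(f"sigma = {sigma * 1e6:.1f} uV rms")
```

Under this assumption the sampled noise is on the order of 0.1 mV rms, comfortably below the 1 mV target.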

#### 1) Amplification by 2:

We use the same structure as for the subtractor amplifier, configured for a gain of 2, but without the acceleration capacitor.

#### 2) Switch:

The principle is to use a master transistor with a ghost (dummy) transistor on each side, controlled by the inverted clock. The most important point to control is the charge injection. We use NMOS because of the polarity of the signals. Tests show that the best results are obtained with ghost transistors of half the size of the master. The minimum transistor size is used to reduce the charge injection. The best charge injection result is achieved with w=10 $\mu$m and l=0.35 $\mu$m for the master, and w=5 $\mu$m and 0.5 $\mu$m for the ghosts.

#### D. Comparison line

The comparison line is composed of 3 parts: the voltage to current conversion block, the differential current ladder and the comparators. The main point of interest of this line is the use of a differential current ladder.

#### E. Voltage to current conversion

This block modulates a quiescent current (in fact, we control a current conveyor) according to the differential input voltage. Two signal-dependent correction structures are implemented to improve the linearity. There is therefore a main stage with a fixed modulation using a parallel resistor triggered by a pair of diodes. A second stage adds a small current, which is modulated by the same correction as the main stage (with different component sizes). All stages use the same input level of 2.3 V, which leads us to insert an input stage in front of the voltage to current block (V2C).

To check the quality of the conversion, the outputs are converted back into voltages with transistors of the same size as those of the V2C, the current measurement is a voltage generator set to the same value as the input comparison voltage, and we consider an arbitrary gain of 1000. The simulated error is less than $\pm 0.5$ mV for an LSB of 8 mV.

After the layout was completed, it appeared that this stage was not fast enough for a large comparator changeover. It was accelerated by adding a capacitor and increasing the current. A linearity error of $\pm 1/4$ LSB is obtained, which is acceptable. The position of the comparators will have to be adjusted according to the errors of this block; this correction may include the V2C error. In any case, there is a large error margin on the comparators (because of the 2.5-bit structure), and it is the gain-bandwidth product of the whole comparison line that is decisive.

#### F. The ladder

We now turn to the study of the current ladder. The first element is the comparator. In this structure, every comparator is set to operate at zero differential voltage on a slave stage and, if possible, at the same common mode voltage regardless of the comparator. Either the resistor sizes or the transistor sizes can be adjusted; we arbitrarily chose to adjust the transistor sizes.

#### 1) Differential current ladder:

Determination of the comparator switching current:

$m$: coefficient giving the current values.

In the master branches:  $I_a = I_r \times (1 + m)$  and  $I_b = I_r \times (1 - m)$, with  $k_a$  and  $k_b$  the current ratios between masters and slaves.

In the slave branches:  $I_A = k_a \times I_r \times (1 + m)$  and  $I_B = k_b \times I_r \times (1 - m)$. The switching voltages are the same in both branches:  $R_A \times k_a \times (1 + m) = R_B \times k_b \times (1 - m)$ 

If one chooses  $R_A = R_B$  (this is a possible degree of freedom):  $k_a \times (1 + m) = k_b \times (1 - m)$ 





We want to fix the comparison level of the different pairs at the same common mode voltage, so  $k_a \times (1 + m)$  must always have the same value, for example 1. Since k is set by the ratio w / l, we can write:

$$w_a = \frac{w}{1+m}$$
$$w_b = \frac{w}{1-m}$$
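The sizing rule can be checked numerically. The sketch below is only an illustration under the assumptions used in the derivation ($R_A = R_B$ and a slave current ratio proportional to the width); the reference width and the values of $m$ are made-up example numbers, and `slave_widths` is a hypothetical helper.

```python
# Illustrative sketch of the slave transistor sizing w_a = w/(1+m), w_b = w/(1-m).
# The reference width and the m values below are made-up example numbers.


def slave_widths(w: float, m: float) -> tuple[float, float]:
    """Return (w_a, w_b) so that k_a*(1+m) = k_b*(1-m) = 1 (with R_A = R_B)."""
    if not -1.0 < m < 1.0:
        raise ValueError("m must lie strictly between -1 and 1")
    return w / (1.0 + m), w / (1.0 - m)


w_ref = 10.0  # um, hypothetical reference width
for m in (0.1, 0.3, 0.5):
    w_a, w_b = slave_widths(w_ref, m)
    # Slave currents at the switching point, in units of I_r:
    i_a = (w_a / w_ref) * (1.0 + m)
    i_b = (w_b / w_ref) * (1.0 - m)
    print(f"m={m}: w_a={w_a:.2f} um, w_b={w_b:.2f} um, I_A/I_r={i_a:.3f}, I_B/I_r={i_b:.3f}")
```

For every value of $m$ both slave currents equal $I_r$ at the switching point, so all comparators switch at the same common mode voltage.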

To achieve the design, the resistors  $R_A$  and  $R_B$  must be matched, but their absolute value matters only for the operating point of the comparator. Depending on the desired accuracy, the ratio between the transistors  $w_a$  and  $w_b$  also requires careful treatment, while their ratio to the masters only affects the operating point of the comparator. In summary, matching must be ensured between the master transistors, between the resistors, and in the size ratio of the slave transistors. The absolute value of these components only affects the operating point voltage of the comparators.

Using this current ladder helps us to control the kick-back noise. If 6 different comparators were operated on one differential pair, the kick-back noise generated by all the comparators would be absorbed by that single differential pair. With this current ladder, the comparators are all identical, and the kick-back noise they generate is absorbed by 6 differential pairs.

#### G. The latched comparator



Figure 6: The diagram of the latched comparator

Following a design that is now often implemented, we have made the comparator faster in order to sustain 100 MHz. The schematic concept is illustrated in Fig 6. The gain-bandwidth product of the design needs to be improved, so the assembly has to be optimized from this point of view. Moreover, as shown in Fig 5, the signal is already in current form before the latched comparator; for the next version, we are looking at the possibility of designing this comparator entirely in current mode as well.

#### V. CORRECTIONS TO THE ADC STRUCTURE

#### A. Timing management

In the first version of the ADC, when there is a large change in the input voltage, the subtractor does not have enough time to perform the multiplication and the subtraction of the references. More precisely, the subtractor has less than 5 ns to reach the correct value. To remedy this without touching the opening time, we doubled the time during which the subtractor can work. The clock is now the same for the 4 stages. We obtain new bits every 10 ns, i.e. a real rate of 100 MHz at the output of each stage. This is possible because we chose to use two T/H on each stage. The comparison order is fixed at the end of the track period of the output T/H of the preceding stage. A possible range change therefore has a complete period (the track and the hold of the input T/H), exactly 10 ns, in which to settle. This timing is now used and allows the most critical stage (the X2 subtractor) to handle the range change.

On the input of the comparison line, we therefore place a track/hold gain 1 in order to double the working time of the subtraction block.



Figure 7: The stage structure with the T/H at the entry of the gain line

#### B. Digital Part

The digital part of the ADC is the adder, which has been synthesized and developed/routed to be implemented in parallel with the analog part. Check inputs/outputs have been built into this block, which can help us to trace possible conversion errors at the different stages.

#### VI. SIMULATIONS

Several schematic simulations were performed; the bandwidth was the critical point of this structure. The process/matching simulations gave results that meet the specifications, with gain errors and offset levels lower than the tolerances given by the behavioral simulations. After the layout was completed, parasitic simulations using capacitor extraction were run to check that the bandwidth was not degraded too much. Fig 8 shows that we obtain ideal results.



Figure 8: Results of the parasitic simulations; we obtain ideal results. A ramp was generated at the input of the ADC and the conversion results were then compared with this ramp.

#### VII. MEASUREMENTS



On the graph, a conversion error is present all along the ramp because an offset is present at the output of each stage. Levels are also visible (first stage levels); the 2.5-bit algorithm does not compensate for this offset.

For inputs from -0.91 V to 0.15 V, the ADC offers interesting results despite the offset and the levels. In fact, over this voltage range it works at 100 MHz with a precision of 9 bits (the  $9^{th}$  bit is available thanks to the 2.5-bit algorithm). The INL (when gain and offset errors are taken into account) is less than 1 LSB, as is the noise.

All the chips had this error and gave almost identical results, which pointed to an error in the layout. After a complete study to determine where this offset error came from, we found that, in the layout of the X2 subtractor, a parasitic resistance injects an error into half of the current generators.



Figure 10: The layout of the ADC

This parasitic resistor of less than 5 ohms had not appeared in the earlier (less accurate) parasitic simulations. With this resistor, we have an offset of more than 250 mV at the output of the stage. The offending tracks have been corrected in the layout, and we hope that the next foundry run will be satisfactory. Moreover, process/mismatch problems seem to be under control because we obtain the same results for all chips.

Table 1: Features of the ADC

| Parameter       | Value                           |
|-----------------|---------------------------------|
| Architecture    | 2.5-bit/stage                   |
| Technology      | 0.35 µm SiGe                    |
| Area            | 2425 µm x 2775 µm               |
| Supply Voltage  | 3.5 V (Analog), 3.3 V (Digital) |
| Resolution      | 8 bits (9 bits possible)        |
| Full Scale      | 2 V differential                |
| Conversion rate | 100 MS/s                        |
| Consumption     | 240 mW                          |
| INL             | < 1 LSB                         |

#### VIII. CONCLUSION

A pipeline ADC has been designed using the 0.35  $\mu$m BiCMOS technology of Austriamicrosystems. It has a resolution of 8 bits with a clock frequency of 100 MHz. The power consumption is 240 mW with a power supply of 3.5 V. The performance of the ADC has been measured. Currently, this first prototype does not meet the specifications, but the offset error has been corrected and a new prototype will be submitted before the end of the year. This first prototype nevertheless gives us some satisfaction:

- It works at 100 MHz
- Current driven blocks work perfectly (comparison level errors < 1 LSB)
- The yield seems to be good

#### IX. REFERENCES

[1] B. Joly, G. Montarou, J. Lecoq, G. Bohner, M. Crouau, M. Beossard, P.-E. Vert, "An Optimal Filter Based Algorithm for PET Detectors With Digital Sampling Front-end", submitted to IEEE Transactions on Nuclear Sciences, 2009

[2] H. Mathez, P. Russo, G.-N. Lu, P. Pittet, L. Quiquerez, J. Lecoq, G. Bohner, "A Charge-Sensitive Amplifier Associated with APD or PMT for Positron Emission Tomography Scanners", MIPRO 2009, Opatija

[3] B. Joly et al., "Test and Optimization of Timing Algorithms for PET Detector with Digital Sampling Front-End", conference proceedings PID771775, IEEE NSS 2008, Dresden

[4] P.-E. Vert "Etude, développement et validation d'un concept d'architecture électronique sans temps mort pour TEP de haute sensibilité" Ph.D dissertation, Université Clermont-Ferrand II – Blaise Pascal, 2007

#### A latchup topology to investigate novel particle detectors

A. Gabrielli<sup>a</sup>, M. Lolli<sup>a</sup>, D. Demarchi<sup>b</sup>, E. G. Villani<sup>c</sup>, A. Ranieri<sup>d</sup>

<sup>a</sup> INFN & Physics Department, University of Bologna, IT, <sup>b</sup> Chilab, Electronics Department, Politecnico of Torino, IT, <sup>c</sup> STFC, Rutherford Appleton Laboratory, UK, <sup>d</sup> INFN & Physics Department, University of Bari, IT

#### Abstract

Here the latchup effect is described as a novel approach to detect and read out particles by means of a solid-state device exploiting a latchup topology. The paper first describes the state of the art of the project and its development over recent years, then presents current and proposed future studies. An elementary cell composed of two transistors connected in a thyristor structure is shown. A first prototype uses MOS transistors, resulting in an even more promising and challenging configuration than that obtained with bipolar transistors. A second version of the circuit exploits a commercial SiC MESFET as the sensing device. As MOS transistors are widely used at present in microelectronics, a latchup topology is proposed as a novel structure for future applications in particle detection, amplification of sensor signals and radiation monitoring.

#### I. INTRODUCTION

This paper presents work that started just a few years ago, when two of the authors, A. Gabrielli and G. Villani, were respectively investigating logic circuits with redundancy against Single Event Effects (SEE) [1] and studying new structures to reduce the in-pixel power consumption. SEE originate when an over-threshold charge is deposited in sensitive nodes of microelectronic devices. While investigating these effects the two authors, independently of each other, had the idea of exploiting one of the most dangerous SEE: the latchup effect [2]. The topology corresponding to this effect, a thyristor, could be exploited as a powerful means of achieving the precise detection and positioning of a broad range of ionising particles. Alternatively, the proposed device can be used solely as a readout circuit for the amplification and latching of a variety of signals provided by sensors for high-energy physics experiments. In fact, the circuit takes on the function of the data acquisition chain that is currently designed into every pixel of the pixel detectors widely used, for example, in experiments [3, 4, 5] at the Large Hadron Collider. Although the principle was already proved in the past [6, 7], a novel prototype has been designed, constructed and tested, and some new results are presented below.

#### II. A FIRST PROTOTYPE

Figure 1 shows the well-known latchup circuit built with two MOS transistors instead of bipolar devices. Figure 2 shows how the circuit has been implemented with commercial MOS components. In more detail, the MOS transistors were extracted from CMOS inverters: by disconnecting the power pin of the N-MOS and the ground pin of the P-MOS, the two individual transistors became available. In this way we exploited submicron MOS transistors without fabricating an integrated version of the cell, which is in any case to be done in the near future. Figure 3



Figure 1: Latchup topology

shows a test board provided with many jumpers and variable resistors to easily configure and bias the circuit in several ways. Figure 4 shows an oscilloscope plot of the cell under test. Following the top graph from left to right, the output signal is initially at the high (supply) voltage. This indicates that the entire thyristor is off, waiting for an ignition. Then an over-threshold spike is applied to the NMOS gate (bottom graph) and, as a consequence, the output voltage drops to its steady value. Here the situation stabilizes and the circuit locks into a steady condition. Subsequently, a reset pulse, not shown in the figure, forces the circuit back into the initial turned-off condition. This pulse is provided through an additional MOS transistor that shorts the N-MOS's gate to ground. After having proved that the circuit effectively ignites depending on the input spike, we have



Figure 2: Transistors extracted from commercial inverters



Figure 3: The board



Figure 4: Oscilloscope plot at T=5µs



Figure 5: Cyclic latchup ignitions at T=5µs

measured the spike height, which is of the order of 10 mV, and the input impedance of the circuit, which is of the order of 100  $\Omega$ . As the pulse width is about 100 ns, the charge injected into the transistor's gate is of the order of 10 pC (10 mV / 100 $\Omega \times$  100 ns = 10 pC). This is a rough estimate, fully compatible with what was obtained in [6, 7].

Finally, we measured the noise figure of the circuit in terms of the spread in the ignition voltage. To do so, we accurately swept the spike height while measuring the ignition-to-non-ignition ratio over 200 cycles at a time. These measurements were repeated several times to estimate the reliability and repeatability of the system. Moreover, the tests were carried out both by increasing and by decreasing the spike height, to measure the behavior of the circuit at the rising and falling transition points. Figure 5 shows a
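The charge estimate above is a one-line calculation. As a minimal sketch, assuming a rectangular pulse into a purely resistive input (both simplifications, and `injected_charge` is a hypothetical helper), it can be reproduced as follows:

```python
def injected_charge(v_spike: float, r_in: float, t_pulse: float) -> float:
    """Charge (C) delivered by a rectangular pulse: Q = (V / R) * t."""
    return v_spike / r_in * t_pulse


# Order-of-magnitude values quoted in the text: 10 mV, 100 ohm, 100 ns.
q = injected_charge(v_spike=10e-3, r_in=100.0, t_pulse=100e-9)
print(f"Q = {q * 1e12:.1f} pC")
```

With the quoted values the pulse current is 100 µA, which over 100 ns gives the 10 pC figure.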



Figure 6: Noise curve

test configuration using a cyclic ignition of the system. Figure 6 summarizes all these measurements. It can easily be seen that the rising and falling curves of the noise transition (S-curves) differ in transition width, point and spread. However, the most significant part of the curve has been shaded to point out that the spread in the ignition point, i.e. the spread of the ignition threshold at 50% of the S-curve, is about 640 µV, and in any case lower than 1 mV. Both rising and falling transitions are thus very sharp, since the noise of any transition curve can be estimated to be of the order of 100 µV. The overall power consumption of the cell is also very low, of the order of  $1\mu$W, when it is not ignited. This can easily be understood since the number of components used, basically two transistors plus one reset switch plus some resistors, is much smaller than that of modern pixel circuits. Hence, it is reasonable to expect even better figures for integrated versions of the latchup circuit.

The authors [see G. Villani et al., 8, 9] are investigating other types of latchup detectors oriented to low-power applications and dosimetry. If one or both transistors of the latchup cell are replaced with floating gate devices, not only the over-threshold input spike is under control, but also the baseline on which this spike is superimposed. Since the charge injected into the floating gate can be removed by external radiation, the same latchup circuit could be used as a dosimeter. In more detail, once a floating gate MOS has been programmed with a certain threshold, this threshold is shifted back down according to the total absorbed radiation dose, until the latchup process ignites spontaneously. Hence, if the ratio of threshold to absorbed dose is known, the cell ignites whenever a certain dose of radiation has been absorbed: this is a dosimeter. This research is ongoing, but the principle has already been proved [8, 9].
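To make the link between an S-curve and a noise figure concrete, the sketch below models the ignition probability with a Gaussian noise source. This model is an assumption introduced here for illustration: the text does not state how the spread was extracted, and the threshold and noise values are example numbers of the same order as those quoted, not measured data.

```python
import math


def ignition_probability(v_spike: float, v_threshold: float, sigma: float) -> float:
    """Gaussian-noise S-curve: probability that a spike ignites the cell."""
    return 0.5 * (1.0 + math.erf((v_spike - v_threshold) / (sigma * math.sqrt(2.0))))


def sigma_from_scurve(v16: float, v84: float) -> float:
    """Estimate sigma from the spike heights at about 16% and 84% ignition."""
    return 0.5 * (v84 - v16)


V_TH = 10e-3    # V, example threshold (order of the measured spike height)
SIGMA = 100e-6  # V, example noise (order of magnitude quoted in the text)

for dv_uv in (-200, -100, 0, 100, 200):
    v = V_TH + dv_uv * 1e-6
    p = ignition_probability(v, V_TH, SIGMA)
    print(f"{dv_uv:+5d} uV: expected ignitions in 200 cycles = {200 * p:.0f}")
```

In this model the 50% point of the curve gives the threshold, and half the distance between the 16% and 84% points gives the noise.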

#### III. A SECOND PROTOTYPE

Figure 7 shows a circuit implementing an SCR topology using one P-channel MOS transistor and one N-channel MESFET component by CREE. In addition, the SCR is built with SiC instead of silicon. The reason for this choice lies in the high-temperature and high-radiation tolerance of SiC, which could open new applications in these fields. Hence, we have used the CREE 24010 MESFET component in place of the N-MOS. At first we used a standard JFET Spice model to describe the linear behavior of the MESFET in order to simulate the whole circuit shown in Figure 7. The topology corresponds to the circuit shown in Fig. 1. The ignition is



Figure 7: Actual circuit mounted on a test-board using the MESFET 24010



Figure 8: Monte Carlo simulation, with a temperature sweep, of the latchup ignition of the above circuit



Figure 9: Oscilloscope plot

confirmed, as was investigated in the past, provided that a different and dedicated biasing is used. Figure 8 shows a Monte Carlo Spice simulation of the circuit shown in Fig. 7, in which the temperature has been swept. From top to bottom, the first set of plots represents the output voltage on the MESFET's drain, the second set describes the current in the MESFET, and a single pulse simulates a given charge deposited at the MESFET's gate.

Figure 9 shows two oscilloscope plots of the same circuit tested on a board. The top graph represents Vout of the circuit, while the bottom plot is the MESFET's gate, or Vin pin. Even though the input spike is not visible, it is clear that the two curves cross each other, confirming the circuit ignition. The voltage swing is of the order of 2 volts and the ignition time of the order of several s. The sensitivity of the circuit will be the subject of future work. For the time being, the results confirm that a MESFET component can also be used in a latchup topology.

#### IV. CONCLUSION

This study indicates that a very simple circuit can operate like the more complicated structures used today in modern pixel detectors. An integrated device designed in a modern CMOS technology may work either as a particle detector or as a readout circuit for general sensors. The cell tested in the laboratory was built from commercial transistors connected to form a thyristor circuit. The circuit has a threshold noise spread lower than 1 mV, a power consumption due to leakage and biasing currents of the order of 1 $\mu$W, an estimated charge sensitivity of the order of 1 pC and very good repeatability.

Future applications in high-energy physics and in radiation monitoring seem to be the most suitable for this type of device. In addition, owing to its simplicity and, consequently, its very low power consumption, it is easily adaptable to a wide range of monitors, from portable devices to huge pixel detectors.

#### V. REFERENCES

[1] F.W. Sexton, "Destructive single-event effects in semiconductor devices and ICs", IEEE Trans. Nucl. Sci., 2003, 50/3, 603–621

[2] A.H. Johnston, "Latchup in integrated circuits from energetic protons", IEEE Trans. Nucl. Sci., 1997, 44/6, 2367–2377

[3] G. Gagliardi, "The ATLAS pixel detector electronics", Nucl. Instr. Meth. A, 2001, 466, 275–281

[4] R. Baur, "Readout architecture of the CMS pixel detector", Nucl. Instr. Meth. A, 2001, 465, 159–165

[5] P. Riedler et al., "Production and integration of the ALICE silicon pixel detector", Nucl. Instr. Meth. A 2006, 572, 128–131

[6] A. Gabrielli, "Proposal for solid-state particle detector based on latchup effect", El. Lett., 2005, 41/11, 641-643

[7] A. Gabrielli, "Particle detector prototype based on a discrete-cell sensitive to latchup effect", Meas. Sci. Tech., 2006, 17, 2269-2273

[8] G. Villani, A. Gabrielli, D. Demarchi, "A family of sensitive pixel devices by exploiting the latchup effect", SORMA WEST 2008, Proceedings of the Symposium on Radiation Measurements and Applications 02-05 June, 2008, Berkeley, CA, USA

[9] G. Villani et al., "Radiation detection and readout based on the latchup effect" - Proceedings of the PSD 2008 - Int. Conf. on Position Sensitive Detectors, 01-05 September, 2008, Glasgow, UK, to be published in Nucl. Instr. Meth. A

#### A 5 Gb/s Radiation Tolerant Laser Driver in CMOS 0.13 µm technology

L. Amaral <sup>a</sup>, B. Checcucci <sup>b</sup>, S. Da Silva <sup>a</sup>, G. Mazza <sup>c</sup>, S. Meroli <sup>b</sup>, P. Moreira <sup>a</sup>, A. Rivetti <sup>c</sup>, J. Troska <sup>a</sup>, K. Wyllie <sup>a</sup>

<sup>a</sup> CERN, 1211 Geneva 23, Switzerland, <sup>b</sup> INFN sez. di Perugia, Perugia, Italy, <sup>c</sup> INFN sez. di Torino, Torino, Italy

mazza@to.infn.it

#### Abstract

A laser driver for data transmission at 5 Gb/s has been developed as a part of the Giga Bit Transceiver (GBT) project. The Giga Bit Laser Driver (GBLD) targets High Energy Physics (HEP) applications for which radiation tolerance is mandatory.

The GBLD ASIC can drive both VCSELs and some types of edge-emitting lasers. It is essentially composed of two drivers, each capable of sinking up to 12 mA from the load at a maximum data rate of 5 Gb/s, and of a current sink for the laser bias current. The laser driver also includes pre-emphasis and duty cycle control capabilities.

#### I. The GBT Project

The GBT project [1] aims to design a radiation tolerant optical transceiver for High Energy Physics (HEP) experiments. The GBT will provide a bi-directional connection between the front-end electronics and the DAQ, trigger and DCS systems. Therefore experimental data, trigger, timing and control information will be transmitted over the same physical link.

Fig. 1 shows the GBT architecture. On the detector side the electronics has to be radiation tolerant for both total dose and single event upset (SEU) effects, and therefore a full custom ASIC development is required. In this case the GBT will be based on four ASICs: a photo-diode receiver (GBTIA), a laser driver (GBLD), a main chip with serialiser, deserialiser and protocol handling, and a slow control interface (GBT-SCA). On the counting room side, where radiation is not an issue, the GBT functions can be implemented either by the same chip set or by a commercial driver and receiver and by an FPGA-based implementation of the GBT functions.



Figure 1: The GBT architecture

#### II. GBLD REQUIREMENTS

The GBT chip set includes a laser driver, named GBLD, targeted at driving both VCSELs and some types of edge-emitting lasers at a maximum data rate of 5 Gb/s.

VCSELs are characterized by a dynamic impedance of the order of tens of ohms and currents of the order of a few mA, while edge-emitting lasers have a lower impedance (a few  $\Omega$ ) and require higher currents (tens of mA). Therefore a large range of modulation and bias currents is required in order to address both laser types.

The GBLD has to provide laser modulation and bias currents that are programmable in the 2 to 24 mA and 2 to 43 mA ranges, respectively, with a 0.16 mA resolution. In order to compensate for high external capacitive loads or asymmetries in the laser diode response, independently programmable pre-emphasis and de-emphasis of the rising and falling edges are also required. The emphasis current has to be in the range 0 to 12 mA with a 0.8 mA resolution.

The different requirements of the two laser types have been addressed by splitting the output stage into two identical drivers. Each driver can provide up to 12 mA to the load and has a 50  $\Omega$  internal termination. Both drivers are controlled by the same driving signals and the same control DAC. With such an arrangement a VCSEL can be driven by a single driver while the other one is switched off to reduce power consumption, whereas an edge-emitting laser is driven by the two drivers in parallel. In the latter case the input impedance is halved, thus obtaining a better impedance matching with the lower dynamic impedance of edge-emitting lasers.
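The effect of the split output stage can be illustrated with a small sketch. The selection rule used here (one driver whenever the requested modulation current fits within 12 mA, two otherwise) is an assumption made for the example; in practice the choice follows the laser type. `OutputConfig` and `configure_output` are hypothetical names.

```python
from dataclasses import dataclass

I_MAX_PER_DRIVER_MA = 12.0   # each driver sinks up to 12 mA
R_TERM_OHM = 50.0            # internal termination of each driver


@dataclass
class OutputConfig:
    drivers_enabled: int
    current_per_driver_ma: float
    termination_ohm: float


def configure_output(i_mod_ma: float) -> OutputConfig:
    """Pick one or two drivers for a requested modulation current (mA)."""
    if not 0.0 < i_mod_ma <= 2 * I_MAX_PER_DRIVER_MA:
        raise ValueError("modulation current out of range")
    n = 1 if i_mod_ma <= I_MAX_PER_DRIVER_MA else 2
    # Identical terminations in parallel: R / n
    return OutputConfig(n, i_mod_ma / n, R_TERM_OHM / n)


print(configure_output(6.0))    # VCSEL-like load: one driver
print(configure_output(20.0))   # edge-emitter-like load: both drivers
```

With both drivers enabled the two 50 Ω terminations appear in parallel, giving 25 Ω, which is the halving of the impedance mentioned above.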

The GBLD is driven by an AC-coupled differential signal. The differential input dynamic range is between 100 mV and 1.2  $V_{PP}$ . The input stage has to be internally biased and terminated.

A 2-wire  $I^2C$  protocol has been chosen as the control and configuration interface between the GBLD and the counting room. The configuration registers and the control logic must be protected against SEU.

The GBLD will be packaged in a  $4 \times 4 \text{ mm}^2$  QFN24 package. Such a small package limits the maximum die size to  $2 \times 2 \text{ mm}^2$ .

#### III. GBLD ARCHITECTURE

The most critical part of the GBLD is the modulator, depicted in fig. 2. The input stage is followed by a pulse width modulation circuit, which allows the signal duty cycle to be changed by  $\pm 15\%$  at 5 Gb/s. The output signal is then split into two paths. The first one goes to the pre-driver, which drives the two output stages *A* and *B*. In the second one, a delay stage generates a delayed signal which is used by the following differential AND gates to create the emphasis pulses for the rising and falling edges. In this prototype the emphasis driver is connected to driver B only, in order to evaluate the influence of the parasitics added by the driver itself.



Figure 2: Modulator block diagram

All the stages have been designed as nMOS-only differential pairs with resistive loads in order to maximize speed. It has been shown [2] that, independently of the technology, the maximum speed is obtained when the transistor current density is around 0.25 mA/ $\mu$m. All switching transistors have been sized following this criterion.

A critical point is the ability to switch large currents while keeping the parasitic capacitances, and therefore the transistor size, as small as possible. This implies that  $V_{GS}-V_{TH}$  has to be maximized. The resistive load allows  $V_G$  to be pulled up to the power supply voltage, while triple-well transistors with the source connected to the bulk have been used to eliminate the body effect on the threshold voltage.
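The sizing criterion translates directly into a transistor width for a given switched current. The sketch below is only an illustration: it assumes that the full current flows through one transistor of the pair when the stage is fully switched, and the actual widths used in the GBLD are not given here. `switch_width_um` is a hypothetical helper.

```python
J_OPT_MA_PER_UM = 0.25  # current density for maximum speed, from the text


def switch_width_um(i_switched_ma: float, j_ma_per_um: float = J_OPT_MA_PER_UM) -> float:
    """Transistor width (um) that carries i_switched_ma at the target density."""
    return i_switched_ma / j_ma_per_um


for i_ma in (2.0, 6.0, 12.0):
    print(f"{i_ma:4.1f} mA -> W = {switch_width_um(i_ma):.0f} um")
```

Under this assumption, switching the maximum 12 mA of one driver would call for a width of about 48 µm.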

The power supply is 1.5 V for all the stages with the exception of the two output and emphasis stages, which are powered at 2.5 V. This second power line is required in order to accommodate the voltage swing across the laser in all conditions of driving currents and laser differential impedance.

#### A. Pre-driver and output stages

The schematic of the pre-driver and of one of the two output stages is depicted in fig. 3. Here  $V_{DDmod}$  and  $V_{DDlaser}$  are the 1.5 V core and the 2.5 V output power supplies, respectively.

The pre-driver is a differential stage with a resistive load. Inductive peaking has been used in order to increase the stage bandwidth. In order to limit the jitter, the frequency dependence of the phase shift has to be minimized. It can be shown [3] that for best group delay the maximum obtainable bandwidth increase is 60%. The integrated inductors have been realized as two parallel spirals with octagonal shape. The inductors use the two uppermost metal layers for minimum series resistance and a high-resistivity substrate underneath (via a p-well implant block mask) to decrease the parasitic capacitance to the substrate.

The output stages are cascoded differential stages directly driven by the pre-driver. The two cascode transistors  $M_{2A}$  and  $M_{2B}$  in fig. 3 are thick oxide transistors. These transistors can withstand a voltage of 2.5 V and therefore can be safely connected to the higher power supply. The gates of these transistors are connected to 1.5 V, thus protecting the thin oxide transistors  $M_{1A}$ ,  $M_{1B}$  and  $M_3$  from the higher voltage.
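The quoted bandwidth increase can be reproduced with a simple shunt-peaking model. The sketch below assumes a load made of a resistor $R$ in series with an inductor $L$, in parallel with a capacitance $C$, with $L = m R^2 C$. Both this model and the value $m \approx 0.32$, commonly quoted in textbooks for the best group delay, are assumptions introduced here; the actual component values of the GBLD are not given.

```python
import math


def bandwidth_ratio(m: float) -> float:
    """-3 dB bandwidth of a shunt-peaked RC load relative to the plain RC load.

    Load model: (R + sL) in parallel with C, with L = m * R**2 * C.
    With x = (w*R*C)**2, |Z/R|**2 = 1/2 gives m^2 x^2 + (1 - 2m - 2m^2) x - 1 = 0.
    """
    if m == 0.0:
        return 1.0
    a = m * m
    b = 1.0 - 2.0 * m - 2.0 * m * m
    # The quadratic has exactly one positive root because a > 0 and c = -1 < 0.
    x = (-b + math.sqrt(b * b + 4.0 * a)) / (2.0 * a)
    return math.sqrt(x)


for m in (0.0, 0.32, 0.41):
    print(f"m = {m:.2f}: bandwidth x{bandwidth_ratio(m):.2f}")
```

In this model, $m \approx 0.32$ gives a bandwidth extension close to the 60% quoted above, while larger values of $m$ extend the bandwidth further at the cost of a less uniform group delay.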



Figure 3: Schematic of the pre-driver and output stage

#### B. Pre/de emphasis circuit

The pre-emphasis principle consists of increasing the signal amplitude for a very short time at the signal transition, in order to increase the system bandwidth. It can be used when the RC limitation comes from a component which cannot be improved (typically long wires that cannot be replaced). De-emphasis works in the opposite way, by decreasing the system bandwidth. Both cases are shown in fig. 4 for the rising edge.

An example of an application to optical transmission where both techniques are required is a laser biased below its threshold. In that case the optical response is characterized by a sharp rise followed by relaxation oscillations, while the falling edge is rather slow. It may therefore be necessary both to speed up the falling edge (via pre-emphasis) and to slow down the rising one (via de-emphasis).



Figure 4: Pre-emphasis and de-emphasis

In order to obtain a pre-emphasis on the rising edge, a current pulse synchronized with the signal transition can be simply added to the output signal, as shown on the left part of fig. 4. However, due to the fact that the driver can only sink current, the de-emphasis function cannot be implemented by subtracting a pulse. In the proposed architecture such a function has been implemented by adding a de-emphasis current in steady state and removing it during the emphasis pulse, as shown on the right part of fig. 4. Such a current has to be considered when the laser bias current is set. Pre-emphasis and de-emphasis on the falling edge use the same two techniques (just reversed).

The emphasis output driver, shown in fig. 5, is composed of two stages which are scaled versions of the main output stage, one for each signal transition. Each differential stage has two cascode transistors connecting the differential pair to the output with either direct or inverted polarity, depending on the voltage on the cascode gate terminals. This voltage is controlled by one configuration register, so that pre-emphasis or de-emphasis can be selected independently for each of the two edges.



Figure 5: Emphasis driver

The two pulses used to drive the emphasis driver are generated by a differential AND gate [4]. A delayed copy of the signal is generated via a delay line. The original signal is then ANDed with the inverted delayed signal to obtain a pulse at the rising edge. The falling edge pulse is obtained in a similar way from the inverted input signal and the direct delayed signal.
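The pulse generation and the two emphasis modes can be sketched on a sampled binary waveform. This is a behavioural illustration only: the sample-based delay, the function names and the current values (arbitrary units) are assumptions made here, and only rising-edge emphasis is composed into the output current.

```python
def delayed(bits, n):
    """Return the waveform delayed by n samples (the first sample is held)."""
    return [bits[0]] * n + bits[:len(bits) - n]

def edge_pulses(bits, n):
    """Rising- and falling-edge pulses, each n samples wide."""
    d = delayed(bits, n)
    rise = [b & (1 - x) for b, x in zip(bits, d)]  # signal AND inverted delayed copy
    fall = [(1 - b) & x for b, x in zip(bits, d)]  # inverted signal AND delayed copy
    return rise, fall

def sink_current(bits, pulse, i_mod, i_emph, mode):
    """Current sunk by the driver with emphasis on one edge (arbitrary units)."""
    if mode == "pre":
        # Pre-emphasis: add the emphasis current only during the pulse.
        return [i_mod * b + i_emph * p for b, p in zip(bits, pulse)]
    if mode == "de":
        # De-emphasis: emphasis current flows in steady state, removed during the pulse.
        return [i_mod * b + i_emph * (1 - p) for b, p in zip(bits, pulse)]
    raise ValueError("mode must be 'pre' or 'de'")

signal = [0, 0, 1, 1, 1, 1, 0, 0, 0]
rise, fall = edge_pulses(signal, 2)
print(rise)
print(fall)
print(sink_current(signal, rise, 10, 2, "pre"))
print(sink_current(signal, rise, 10, 2, "de"))
```

In the de-emphasis case the current never goes negative, which reflects the constraint that the driver can only sink current; the steady-state offset it introduces is the contribution that has to be taken into account when the laser bias current is set.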

#### C. Bias current generator

The bias current generator stage resembles the output stage, where both sides of the differential pair are connected to VDD and the two outputs are shorted together. Again, triple well transistors with the bulk connected to the source are used to avoid the bulk effect and thick oxide transistors are used to protect the rest of the circuit from the 2.5 V supply.

#### D. Control logic

The GBLD is configured via an  $I^2C$  slave interface. In the current version seven 8-bit registers are used to control the modulation, bias and emphasis currents and to disable unused circuits in order to save power. Two mask registers are provided to protect the laser diode against erroneous settings that would produce excessive modulation and bias currents.

For correct driver operation, it is important that the contents of the configuration registers are not upset by SEUs. To avoid malfunction, the  $I^2C$  controller uses Triple Modular Redundancy (TMR) logic. However, since the  $I^2C$  interface operates with a gated clock (i.e. the clock is only active during data transfers), TMR alone cannot prevent corruption of the registers: errors can accumulate during periods of inactivity and eventually corrupt the data. To avoid this problem, the scheme shown in fig. 6 is proposed. It operates as follows: when no error is present, or during a load cycle, the register behaves as a common triple-voted register. However, when a corrupted bit is detected by the error correction circuit, a clock rising edge is generated, loading the register with the output of the majority voters. Once the register content is corrected the clock signal is cleared. The circuit is thus self-timed.
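The behaviour described above can be sketched with a small software model. It is a behavioural illustration under assumptions made here (class and method names, an 8-bit value, a `settle` call standing in for the self-generated clock edge), not a description of the actual circuit of fig. 6.

```python
def majority(a, b, c):
    """Bitwise majority vote of three register copies."""
    return (a & b) | (a & c) | (b & c)

class SelfCorrectingRegister:
    """Hypothetical model of a triple-voted register with self-timed refresh."""

    def __init__(self, value=0):
        self.copies = [value & 0xFF] * 3

    def load(self, value):
        # Normal load cycle: all three copies are written together.
        self.copies = [value & 0xFF] * 3

    def upset(self, copy, bit):
        # Model an SEU flipping one bit in one copy.
        self.copies[copy] ^= 1 << bit

    def voted(self):
        return majority(*self.copies)

    def settle(self):
        """If the copies disagree, reload them from the voters (self-timed edge)."""
        if len(set(self.copies)) > 1:
            self.copies = [self.voted()] * 3
            return True   # a correcting clock edge was generated
        return False      # nothing to correct, clock stays cleared

# With refresh: two upsets at different times are each corrected.
reg = SelfCorrectingRegister()
reg.load(0xA5)
reg.upset(0, 3)
reg.settle()
reg.upset(1, 3)
reg.settle()
print(hex(reg.voted()))

# Without refresh: the same two upsets accumulate and defeat the vote.
plain = SelfCorrectingRegister()
plain.load(0xA5)
plain.upset(0, 3)
plain.upset(1, 3)
print(hex(plain.voted()))
```

The second case shows why voting alone is insufficient with a gated clock: two upsets in the same bit of different copies, separated by a long idle period, produce a wrong majority unless the first one is corrected in between.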



Figure 6: Majority voting of clock gated registers

#### E. DACs

Current mode steering DACs based on a matrix of current mirrors are used to generate the modulator and bias currents. The reference currents are generated from a 644 mV bandgap reference voltage.



Figure 7: Bias scheme

Due to the large range of currents foreseen for the output, emphasis and laser bias stages, the  $V_{DS}$  of the tail current source of the corresponding differential pairs will vary significantly. Therefore a simple diode-connected transistor cannot provide the required accuracy for these current sources. In the proposed solution an OTA compares the drain voltage of the tail current transistor with the drain voltage of the bias transistor in order to compensate for the channel length modulation effect. Fig. 7 shows the bias scheme.

#### IV. LAYOUT CONSIDERATIONS

The described GBLD prototype has been designed in a CMOS 0.13  $\mu$ m technology and tested. The adopted technology features 8 metal layers, 2.5 V compatible thick oxide transistors and triple well nMOS transistors. The die size is  $2 \times 2 \text{ mm}^2$ . Fig. 8 shows the chip layout.



Figure 8: GBLD prototype layout

On chip decoupling capacitors have been used to prevent switching noise on the supply. A combination of MOS capacitors and vertical metal capacitors has been used in order to maximize the capacitance density. Approximately 630 pF of MOS capacitors and 50 pF of metal capacitors have been placed for each power supply.

#### V. TEST RESULTS

A first group of tests has been performed with only driver A bonded to the package output pin. Figures 9 and 10 show the eye diagrams at 2.4 Gb/s and 4.8 Gb/s, respectively.



It can be observed from fig. 9 that at 2.4 Gb/s the eye is open and the jitter is quite low. However, it can be already noted that the rise and fall times are not sufficiently fast for 5 Gb/s operation. Indeed, at that frequency (fig. 10) the eye is still open but a significant jitter is present.



Figures 11 and 12 show the deterministic and random jitter, respectively. As expected, deterministic jitter is the dominant part while random jitter remains within the specifications. It can be concluded that the dominant jitter component is due to Inter-Symbol Interference (ISI) related to the bandwidth limitation.



Figure 11: Deterministic jitter vs modulation current



Figure 12: Random jitter vs modulation current

The three curves of figs. 11 and 12 correspond to three different values of bias current in the pre-driver stages. An improvement in the jitter performance can be observed when the pre-driver is biased with a 40% higher current. Subsequent simulations with a complete layout parasitics extraction have confirmed that the capacitive load of both the pre-driver and the output stages is much higher than expected and therefore limits the system bandwidth.

A second group of tests has been performed with both drivers bonded to the package output pads. The results are summarized in fig. 13, where the three rows correspond to enable bits set for driver A, driver B, and both A and B, respectively, while the three columns correspond to different pre-emphasis values.



Figure 13: Eye diagrams @ 4.8 Gb/s

As expected, the higher capacitive load at the output, due to the presence of both drivers, significantly worsens the system bandwidth; however, the pre-emphasis technique partially compensates for this effect. The high jitter observed when both drivers are on can be attributed to asymmetries between the two drivers.

The best parameter setting has been used to connect the laser driver to an 850 nm VCSEL. The corresponding optical eye diagram is shown in fig. 14.



The GBLD was also qualified against a commercial SFP+ transceiver. Fig. 15 compares the electrical eye obtained by the transceiver in loop-back configuration (top) with the eye obtained with the GBLD (bottom). It can be observed that a comparable eye was obtained in the two configurations, though more jitter can be observed when using the GBLD.

#### VI. CONCLUSIONS

A 5 Gb/s laser driver prototype in a commercial CMOS 0.13  $\mu$m technology has been designed and tested. The prototype is functional in all components but falls short of specifications in terms of bandwidth.

Test results show that 5 Gb/s operation is possible only with the pre-emphasis function active. Even in this configuration the jitter is relatively high, although fairly close to the specifications.

Accurate simulations on the full modulator layout with complete parasitic extraction showed that the bandwidth limitation is due to the parasitic capacitances introduced by the large lines required to drive the modulator current. An improved version will be submitted in the near future.



Figure 15: Optical eye patterns of commercial transmitter and GBLD @ 4.8 Gb/s

#### VII. References

[1] P. Moreira et al., "The GBT: A proposed architecture for multi-Gb/s data transmission in high energy physics", *Proc. of the 13<sup>th</sup> Workshop on Electronics for LHC and Future Experiments*, CERN-2007-007, pp. 332-336.

[2] T. Dickson et al., "The Invariance of Characteristic Current Densities in Nanoscale MOSFETs and Its Impact on Algorithmic Design Methodologies and Design Porting of Si(Ge) (Bi)CMOS High-Speed Building Blocks", *IEEE J. Solid-State Circuits*, vol. 41, no. 8, pp. 1830-1845, August 2006.

[3] T. Lee, "The Design of CMOS Radio-Frequency Integrated Circuits", 2<sup>nd</sup> Edition, Cambridge University Press, 2003.

[4] P. Westergaard et al., "A 1.5V 20/30 Gb/s CMOS backplane driver with digital pre-emphasis", Proc. of Custom Integrated Circuits Conference, 3-6 Oct. 2004, pp. 23-26.

#### The GBTIA, a 5 Gbit/s Radiation-Hard Optical Receiver for the SLHC Upgrades

M. Menouni<sup>a</sup>, P. Gui<sup>b</sup>, P. Moreira<sup>c</sup>

<sup>a</sup> CPPM, Université de la méditerranée, CNRS/IN2P3, Marseille, France
<sup>b</sup> SMU, Southern Methodist University, Dallas, Texas, USA
<sup>c</sup> CERN, European Organization for Nuclear Research, Geneva, Switzerland

#### menouni@cppm.in2p3.fr

#### Abstract

The GigaBit Transceiver (GBT) is a high-speed optical transmission system currently under development for HEP applications. This system will implement bi-directional optical links to be used in the radiation environment of the Super LHC. The GigaBit Transimpedance Amplifier (GBTIA) is the front-end optical receiver of the GBT chip set.

This paper presents the GBTIA, a 5 Gbit/s, fully differential, and highly sensitive optical receiver designed and implemented in a commercial 0.13  $\mu$m CMOS process. When connected to a PIN-diode, the GBTIA displays a sensitivity better than -19 dBm for a BER of  $10^{-12}$. The differential output across an external 50  $\Omega$  load remains constant at 400 mV<sub>pp</sub> even for signals near the sensitivity limit. The chip achieves an overall transimpedance gain of 20 k$\Omega$  with a measured bandwidth of 4 GHz. The total power consumption of the chip is less than 120 mW and the die size is 0.75 mm × 1.25 mm. Irradiation testing of the chip shows no performance degradation after a total dose of 200 Mrad.

#### I. INTRODUCTION

The GBTIA chip consists of a low-noise, high-bandwidth transimpedance amplifier (TIA) and a high performance limiting amplifier (LA) followed by a 50  $\Omega$  output stage to achieve high gain and high bandwidth. The photodiode biasing circuit is integrated in the same chip. Figure 1 shows the block diagram of the GBTIA receiver.

The TIA adopts a differential cascode structure (Figure 1) with series inductive peaking to achieve high transimpedance gain, high bandwidth, and low input referred noise. The Photo Detector (PD) current is AC coupled to the TIA using on-chip capacitors. The capacitive coupling rejects the DC component of the PD signal and allows for a fully differential structure with high power supply rejection ratio (PSRR) and common-mode rejection ratio (CMRR) to be used.

To cope with a potentially high leakage current in the PD induced by radiation, a novel PD biasing circuit is designed in the TIA to ensure the proper biasing of the PD for a leakage current ranging from 1 pA to 1 mA.

The LA is composed of a cascade of four limiting amplifier stages followed by a 50  $\Omega$  output stage to achieve high gain and high bandwidth. Each limiting stage employs a modified Cherry-Hooper structure with resistive loading and active inductive peaking to enhance the bandwidth. The four limiting stages are sized with increasing currents and transistor dimensions so that they can deliver 8 mA to the output load while maintaining a high bandwidth. The GBTIA chip has been tested with a high-frequency PD at room temperature.



Figure 1 : The Block Diagram of the GBTIA Receiver

In section II, the architecture of the transimpedance amplifier is described and analyzed. Section III presents the design of the limiting amplifier. In section IV, the effect of the leakage current is analyzed and finally section V is dedicated to the presentation of the experimental results.

#### II. TRANSIMPEDANCE AMPLIFIER DESIGN

Figure 2 shows the TIA schematic diagram. As mentioned before, a differential configuration was adopted for its high PSRR and CMRR (although at a small sensitivity penalty). This ensures low cross-talk between the first stage and subsequent stages, allowing the three functions (pre-amplifier, limiting amplifiers and 50  $\Omega$  driver) to be integrated in a single chip.

A high current level is needed for the input transistor to achieve high cut-off frequency and low noise. Consequently the input transistor size becomes large and the parasitic capacitance reaches a high value. The cascode structure eliminates the effect of Miller capacitance and enhances the bandwidth.



Figure 2 : Schematic diagram of the transimpedance amplifier

The bandwidth of the transimpedance amplifier is determined by the total capacitance at the input node, the total input resistance of the preamplifier and the open loop gain of the amplifier.

The capacitance of the input node is defined by the photodiode and the bond-pad capacitance. It is difficult to increase the open loop gain of the amplifier to a value higher than 10 because of the relatively low transconductance  $g_m$  of the MOS transistors. The input resistance can be reduced by decreasing the feedback resistor  $R_F$ , but additional thermal noise is induced due to the lower value of the feedback resistance and therefore the sensitivity of preamplifier is degraded. In order to meet the low noise and wide bandwidth characteristics simultaneously, the shunt peaking technique was used in the TIA stage.

Figure 3 shows that the bandwidth can be significantly improved by using the shunt peaking technique. However, this comes at the cost of significant gain peaking which introduces Inter Symbol Interference (ISI). For this reason, the inductance value is sized to work at optimum group delay where the bandwidth is extended by less than 40 %.



Figure 3 : Bandwidth extension with shunt peaking

#### III. LIMITING AMPLIFIER DESIGN

# *A.* Design consideration and the overall architecture of the limiting amplifier

The purpose of the limiting amplifier is to amplify the small voltage signal from the TIA so that it reaches the voltage swing required by the clock and data recovery circuit. Several design considerations follow from the overall design goals. First, given the sensitivity requirement of the overall receiver (to accommodate a photo-detector current as small as 20  $\mu$A) and the 600  $\Omega$  differential gain of the TIA, the limiting amplifier needs to have a sensitivity of 12 mV<sub>pp</sub>. The gain of the limiting amplifier should be sufficient to amplify such a small signal to a few hundred mV (400 mV<sub>pp</sub> in our design). Second, the minimum overall bandwidth must be 3.5 GHz (5 Gbit/s × 70%) to achieve an overall data rate of 5 Gbit/s [1]. Moreover, the input referred noise of the limiting amplifier must be smaller than 857  $\mu$V (12 mV<sub>pp</sub>/14) for a BER of  $10^{-12}$. Finally, the input capacitance of the limiting amplifier must be small so that it does not load the preceding TIA and reduce its performance.
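The numbers in this specification can be reproduced with a few lines of arithmetic. The sketch below assumes the usual Gaussian-noise relation BER = ½·erfc(Q/√2), for which Q ≈ 7 corresponds to a BER close to 10⁻¹², so that the peak-to-peak signal must be about 14 times the rms noise. The variable names are illustrative.

```python
import math

def ber_from_q(q):
    """Bit error rate for Gaussian noise and an optimally placed threshold."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))

i_pd_pp = 20e-6          # minimum photodiode current, A (peak-to-peak)
r_tia = 600.0            # TIA differential transimpedance gain, ohm
bit_rate = 5e9           # bit/s
q_factor = 7.0           # assumed Q for a BER near 1e-12

v_in_pp = i_pd_pp * r_tia                 # LA input sensitivity, V peak-to-peak
noise_rms_max = v_in_pp / (2 * q_factor)  # allowed input-referred noise, V rms
bandwidth_min = 0.7 * bit_rate            # rule-of-thumb minimum bandwidth, Hz

print(f"LA sensitivity  : {v_in_pp * 1e3:.1f} mVpp")
print(f"Max input noise : {noise_rms_max * 1e6:.0f} uV rms")
print(f"Min bandwidth   : {bandwidth_min / 1e9:.1f} GHz")
print(f"BER at Q = 7    : {ber_from_q(q_factor):.2e}")
```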

To meet the above design specifications, we designed the limiting amplifier using gain stages followed by an output stage to drive a 50  $\Omega$  load. The overall architecture of the limiting amplifier along with the TIA is depicted in Figure 4. The number of stages is chosen to be four to keep the overall power dissipation from being too high. To keep the input capacitance of the limiting amplifier low while still maintaining a high bandwidth and delivering sufficient current to drive the output stage, we designed the four gain stages with increasing driving capabilities. As shown in Figure 4, each stage is biased with increasing current. To minimize the input referred noise, the first stage (LA1 in Figure 4) was designed to have higher gain than the following stages so that the noise from stages LA2-LA4 is effectively suppressed. In addition, an offset cancellation circuit is added to prevent mismatch in the differential gain stages from saturating them.



Figure 4 : The overall architecture of the LA

In the following subsections, we will describe the design of each gain stage and the output driver.

# *B. The design of the limiting amplifier gain stage*

To achieve a high bandwidth for the overall limiting amplifier, each gain stage (LA1-LA4) needs a substantially higher bandwidth than the overall target. We adopted the Cherry and Hooper (CH) topology [2] as the baseline and employed resistive loading and inductive peaking techniques to further broaden the bandwidth. Figure 5 shows the modified CH stage used in our design. It consists of a transconductance stage followed by a gain stage with shunt feedback. This topology guarantees that every node in the CH circuit is low impedance, thus yielding a high bandwidth. The resistive load in the transconductance stage provides a higher bandwidth than a current source load; in addition, an active inductive peaking circuit, made of transistors M6 or M5 and resistors R8 or R7, extends the bandwidth by another 34% over the purely resistive loaded topology.
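A rough feel for how much higher the per-stage bandwidth must be comes from the standard result for a cascade of N identical first-order stages, whose overall -3 dB bandwidth shrinks by a factor √(2^(1/N) − 1). The sketch below applies it to the four-stage case. Identical single-pole stages are an assumption made here for illustration, since the actual stages are peaked and differently sized.

```python
import math

def cascade_shrink_factor(n_stages):
    """Overall / per-stage bandwidth for n identical first-order stages."""
    return math.sqrt(2.0 ** (1.0 / n_stages) - 1.0)

def required_stage_bandwidth(overall_bw_hz, n_stages):
    """Per-stage bandwidth needed to reach a given overall bandwidth."""
    return overall_bw_hz / cascade_shrink_factor(n_stages)

target = 3.5e9  # overall bandwidth target from the design considerations, Hz
print(f"shrink factor for 4 stages: {cascade_shrink_factor(4):.3f}")
print(f"per-stage bandwidth needed: {required_stage_bandwidth(target, 4) / 1e9:.1f} GHz")
```

Under these assumptions each of the four stages would need a bandwidth of about 8 GHz to reach the 3.5 GHz overall target, which is why bandwidth-extension techniques are applied in every stage.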



Figure 5 Schematic of one LA stage

#### *C.* The 50 $\Omega$ output driver

To provide a 400 mV<sub>pp</sub> output voltage, the nominal bias current in the 50  $\Omega$  output driver is chosen to be 8 mA (Figure 6). To fully switch the bias current from one arm of the differential pair to the other, the input voltage at the output driver should be large enough. This is only achieved through sufficient gain from the stages of LA1 to LA4.
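The relation between the 8 mA bias current and the 400 mV<sub>pp</sub> output can be checked with a short calculation. The sketch assumes a double 50 Ω termination (an on-chip 50 Ω load in parallel with the external 50 Ω load on each output) and a fully switched differential pair. The function name is illustrative.

```python
def differential_swing_pp(i_tail, r_on_chip, r_external):
    """Peak-to-peak differential output of a fully switched differential pair."""
    r_load = r_on_chip * r_external / (r_on_chip + r_external)  # parallel loads
    single_ended_swing = i_tail * r_load   # each output moves by I * R_load
    return 2.0 * single_ended_swing        # the two outputs move in opposition

swing = differential_swing_pp(8e-3, 50.0, 50.0)
print(f"differential output: {swing * 1e3:.0f} mVpp")
```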



Figure 6 Schematic of the 50  $\Omega$  output stage

#### D. Simulation Results

Extensive simulations have been performed on the limiting amplifier to make sure that it works across process corners, supply voltages and temperature variations. The overall limiting amplifier achieves a gain of 40 dB and a bandwidth of 4.3 GHz in the typical case (TT corner and 27 °C), and a gain of 28 dB and a bandwidth of 3.9 GHz in the worst-case scenario (SS corner and 100 °C). These simulations were done with a double 50  $\Omega$  termination as shown in Figure 6. The input referred noise in the worst-case scenario is 309  $\mu$V, lower than the maximum noise allowed by the design specification.

#### IV. PIN DIODE BIAS AND LEAKAGE CURRENT EFFECT

The pin diode leakage current increases with the radiation dose level and can reach a value of 1 mA for the dose level expected in the Super LHC upgrade. This current will increase the low cut-off frequency. The proposed biasing for the photodiode in Figure 7 is capable of keeping this frequency below 1 MHz and thus compatible with the GBT encoding.



Figure 7 Pin diode bias circuit

Additionally, the leakage current level has an effect on the noise and the sensitivity. In fact when the DC level is around 1 mA, the shot noise becomes comparable to the receiver noise. A sensitivity degradation is thus expected at the end of life of the SLHC. Simulations show a sensitivity loss of 3-4 dB.
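The order of magnitude of the shot noise can be estimated from the standard expression i² = 2qI·B. The sketch below evaluates it for a 1 mA DC current. The 3.5 GHz noise bandwidth and the receiver noise estimate (the 20 µA minimum signal divided by 14) are assumptions made here for comparison, not measured values.

```python
import math

ELECTRON_CHARGE = 1.602176634e-19  # C

def shot_noise_rms(i_dc, bandwidth_hz):
    """RMS shot-noise current of a DC current over a given noise bandwidth."""
    return math.sqrt(2.0 * ELECTRON_CHARGE * i_dc * bandwidth_hz)

i_shot = shot_noise_rms(1e-3, 3.5e9)   # 1 mA leakage, assumed 3.5 GHz bandwidth
i_receiver = 20e-6 / 14.0              # assumed input-referred receiver noise

print(f"shot noise     : {i_shot * 1e6:.2f} uA rms")
print(f"receiver noise : {i_receiver * 1e6:.2f} uA rms (assumed)")
```

Under these assumptions both contributions are close to 1 µA rms, consistent with the statement that the shot noise becomes comparable to the receiver noise.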

#### V. MEASUREMENT RESULTS

The GBTIA was designed and implemented in a 0.13 $\mu$m CMOS process. Figure 8 shows the chip photograph; the die size is 0.75 mm × 1.25 mm. The chip is wire-bonded to a high-speed photodiode with a responsivity of 0.9 A/W at a wavelength  $\lambda = 1310$  nm and a parasitic capacitance of around 240 fF.



Figure 8 The chip microphotograph

In order to minimize the wire bond effect and particularly the input parasitic capacitance, the connection between the TIA and the pin diode is made very short and does not exceed 200  $\mu$ m (Figure 9).

The power dissipation of the GBTIA is 120 mW for a power supply of 2.5 V.



Figure 9 Photodiode to the GBTIA connection

#### A. Eye diagram measurements

The differential eye diagram is measured at 5 Gbit/s and for different optical input levels. The pin diode is illuminated on the top by an optical signal coming from a high speed optical transmitter. Using a PRBS sequence length of  $2^7$ –1, we obtained a clear and well opened eye diagram for an input power of –6 dBm. The eye diagram is still acceptable when the optical input is set to –18 dBm (Figure 10).

For a -6 dBm input, the rise time is 30 ps and the total jitter is maintained below 0.15 Unit Interval (UI) for a bit error rate of  $10^{-12}$ . For a -18 dBm input, the jitter is less than 0.55 UI and the rise time around 60 ps.



Figure 10 Measured differential eye diagrams at 4.8 Gbit/s (a) -6 dBm input (b) -18 dBm input

#### B. BER estimate

A BER tester based on a commercial 10 Gbit/s optical transmitter and a high performance FPGA was used to measure the BER variation with the input optical level at a bit rate of 4.8 Gbit/s. With a PRBS sequence length of  $2^7-1$, the measured sensitivity is better than -19 dBm for a bit error rate of  $10^{-12}$  (Figure 11).
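To relate the measured sensitivity to the design target, the optical power can be converted to photocurrent using the photodiode responsivity. The sketch below assumes an ideal extinction ratio, so that the peak-to-peak current is twice the average. The function names are illustrative.

```python
def dbm_to_watts(p_dbm):
    """Convert optical power from dBm to watts."""
    return 1e-3 * 10.0 ** (p_dbm / 10.0)

def average_photocurrent(p_dbm, responsivity_a_per_w):
    """Average photodiode current for a given average optical power."""
    return dbm_to_watts(p_dbm) * responsivity_a_per_w

i_avg = average_photocurrent(-19.0, 0.9)   # 0.9 A/W as quoted for the photodiode
i_pp = 2.0 * i_avg                          # assumes an ideal extinction ratio

print(f"average photocurrent     : {i_avg * 1e6:.1f} uA")
print(f"peak-to-peak photocurrent: {i_pp * 1e6:.1f} uA")
```

With these assumptions a sensitivity of -19 dBm corresponds to a peak-to-peak photocurrent of about 23 µA, close to the 20 µA used as the design target for the limiting amplifier.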

The differential output amplitude is 400 mV and remains constant even for low optical input levels.



Figure 11: BER versus the input optical level for 2<sup>7</sup>–1 PRBS sequence

### C. BER measurements with the GBT protocol



Figure 12: BER versus optical level for the GBT data encoding sequence

An error correction system is implemented in the GBT chip. This system is based on a Reed-Solomon error-correcting encoder/decoder. Since Single Event Upsets (SEU) on the photodiodes are considered to be the main source of errors, the proposed line encoding includes an error correction scheme particularly targeted at this issue. Without the error correction system enabled, the sensitivity is around -19 dBm for a BER of  $10^{-12}$. The sensitivity is improved by 2 dB if the correction encoder is enabled.

#### D. Total Ionizing Dose effects

In order to facilitate the irradiation test, the pin diode is replaced in this case by a passive network where the input capacitance is set to 500 fF. The irradiation test was done using the CERN X-ray facility, and only the GBTIA chip was placed under the beam. As shown in Figure 13, we did not observe any degradation of the BER even after a total dose of 200 Mrad.



Figure 13: BER variation with the cumulated dose level

# E. Influence of the optical DC level on the BER



Figure 14: BER variation with the cumulated dose level

The DC current in the photodiode increases to a value higher than 1 mA after the TID irradiation. In order to measure the influence of this leakage current, the pin diode was illuminated by an additional DC laser source. In this case the integrated bias circuit ensures a sufficient voltage across the pin diode. No noticeable degradation of the BER coming from the effect of the low cut off frequency was observed. The value of this frequency was still compatible with the GBT data encoding when the DC current increased. However, we measured a sensitivity degradation coming from the DC current level. The power penalty introduced by the shot noise of the DC level is around 4 dB as shown in Figure 14.

#### VI. CONCLUSION

This paper describes the design of a 5 Gbit/s optical receiver circuit in a 0.13  $\mu$ m fully CMOS process.

The choice of a differential architecture allows the integration of the TIA and the LA in the same chip and rejects any noise propagated from power supplies.

In order to achieve a high gain, high bandwidth and low noise we used both active and passive shunt peaking techniques in the TIA and LA stages.

The GBTIA has been tested with a high speed photodiode and the most important results are summarized in Table 1.

| Bit rate                            | 5 Gbit/s                |
|-------------------------------------|-------------------------|
| Transimpedance gain                 | 20 k$\Omega$            |
| Output voltage                      | $\pm 0.2 V (50 \Omega)$ |
| Sensitivity for BER = $10^{-12}$    | -19 dBm                 |
| Supply voltage                      | 2.5 V ± 10%             |
| Power consumption                   | 120 mW                  |
| Radiation tolerance                 | > 200 Mrad              |
| Power penalty for high dark current | 4 dB                    |

Table 1 : Summary of performances

The next step consists of measuring the effects of the Single Event Upset on the receiver and integrating additional features in the final design.

#### VII. ACKNOWLEDGEMENTS

We would like to thank L. Amaral, J. Troska and C. Soos from CERN for their help with the test setup and K. Arnaud from CPPM for the test board design.

#### VIII. REFERENCES

- [1] Behzad Razavi, "Design of Integrated Circuits for Optical Communications", McGraw-Hill, 2002.
- [2] E. M. Cherry and D. E. Hooper, "The design of wide-band transistor feedback amplifiers", Proc. IEE, Vol. 110(2), pp. 375-389, 1963.
- [3] Sunderarajan S. Mohan et al, "Bandwidth Extension in CMOS with Optimized On-Chip Inductors", IEEE Journal of Solid State Circuits, VOL. 35, NO. 3, March 2000
- [4] Karl Schrödinger et al, "A Fully Integrated CMOS Receiver Front-End for Optic Gigabit Ethernet", IEEE Journal of Solid State Circuits, VOL. 37, NO. 7, July 2002
- [5] Drew Guckenberger et al, "1V, 10 mW, 10 Gb/s CMOS Optical Receiver Front-End", 2005 IEEE Radio Frequency Integrated Circuits Symposium
- [6] Ty Yoon and Bahram Jalali, "1.25 Gb/s CMOS Differential Transimpedance Amplifier For Gigabit Networks Integrated Circuits and Systems"

# Thursday 24 September 2009

# PARALLEL SESSION B5 OPTOELECTRONICS AND LINKS

# The Radiation Hardness of Certain Optical fibres for the LHC upgrades at $-25^{\circ}$ C

C. Issever<sup>a</sup>, J. Hanzlik<sup>a</sup>, B.T. Huffman<sup>a</sup>, A. Weidberg<sup>a</sup>

<sup>a</sup> Oxford University, Oxford OX1 3RH, United Kingdom

t.huffman1@physics.ox.ac.uk

### Abstract

A luminosity upgrade is planned in the future for the Large Hadron Collider at CERN (called SLHC). Two optical fibres have been tested in a bespoke cold container achieving a constant temperature of  $\simeq -25^{\circ}$ C during the entire exposure. The motivations and results of these tests are presented and two multimode and one single mode optical fibre have been identified as candidates for optical links within the joint ATLAS and CMS Versatile Link project.

#### I. INTRODUCTION

The SLHC programme aims to increase the integrated luminosity by a factor of 10 compared to that expected for the LHC [1]. The LHC studies were based on the assumption that the integrated luminosity available for physics would be  $300 \text{ fb}^{-1}$; the SLHC studies are therefore based on the assumption that the integrated luminosity delivered will be  $3000 \text{ fb}^{-1}$. Based on this scaling, the equivalent whole-lifetime dose of ionizing radiation is estimated to be in the region of 550 kGy (dose on Si at a radius of 30 cm from the beam line), using a simple scaling of levels already calculated for ATLAS [2] by the ratio of the expected integrated luminosities.

Two of the detectors in the LHC, ATLAS and CMS, intend to use optical communication systems to read out their inner detectors during the upgraded machine's operation. In order to design and build an optical data link able to withstand this environment a joint project was formed called the "Versatile Link" project between ATLAS, CMS, and CERN.[3] Our group has the responsibility, among other things, to find suitable optical fibres for use in the Versatile Link.

Optical fibres are generally damaged by ionizing radiation through the breaking of chemical bonds within the amorphous structure of silica. The doping elements used in optical fibres to alter their refractive index can sometimes be highly sensitive to ionizing radiation. It is well known, for example, that phosphorus, which is often used to aid the manufacturing process, produces severe attenuation in optical fibres even at relatively low levels of exposure to ionizing radiation. Because the damage process involves the molecular bonds, heat applied to a damaged optical fibre can help re-establish broken bonds, so the fibre anneals as heat is added.

The inner detectors of ATLAS and CMS plan to use silicon detectors as the primary tracking elements within both detectors and silicon detectors maintain higher performance in radiation environments when they are kept cold. Unfortunately, cold operation has the opposite effect on optical fibres, "freezing in" defects that form during radiation exposure.[4]

# A. Outline of this proceeding

A brief history of past radiation exposures is presented in Section 2 explaining some of our motivation for the current set of tests. In Section 3 we describe the sources, experimental setup, and procedures. Section 4 contains a description and analysis of the sensitivity of our tests. Section 5 is a description of the data and the experimental results we obtained. We explain our programme of future work in Section 6 and summarize our conclusions in Section 7.

#### **II. PAST RESULTS**



Figure 1: Shown is a plot of Radiation Induced Absorption during a previous radiation exposure. Four fibres are exposed here. The blue curve is Infinicor SX+ fibre and the black curve is Draka RHP-1 fibre. Below this is the fibre temperature showing a significant rise from room temperature during the radiation exposure.

Part of the motivation for these tests comes from fibre studies that our group conducted in August of 2008.[6] In the 2008 test we exposed 4 graded-index fibres to 630 kGy(Si) in a gamma radiation source. It was from this test that we identified the two multimode (MM) fibres and one single-mode fibre (SMF-28) which we have qualified for use in the SLHC environment for warm operations. The focus of this paper is upon the two MM fibres identified from these previous studies, Infinicor SX+ by Corning and Draka RHP-1.

During this test we observed effects that we believed were partially related to the fact that our container could not maintain a stable temperature. The relevant portion of this test is shown in Figure 1. These results indicated that the sensitivity of RIA to temperature could be very significant. Furthermore, the literature indicates that RIA increases, potentially substantially, when the fibre is cold [4]. Both the CMS and ATLAS experiments intend to run optical fibres through detector volumes that are held at temperatures near  $-25^{\circ}$ C. This motivated us to study RIA at a temperature close to this so that we might determine whether our two best candidate fibres from the August 2008 test would remain acceptable for use in the LHC upgrade.

### III. THE RADIATION SOURCES AND THE TEST PROCEDURE

All tests are performed at the Belgian nuclear reactor facility SCK-CEN [5], located near Mol. Two sources have been used for the results presented here. Both use gamma rays from the decay of  $^{60}$Co as the source of ionizing radiation. To achieve SLHC-level exposure a facility called "Brigitte" is available, which achieves a dose rate of  $\simeq 22$ kGy(Si)/hr. A much lower level source known as "Rita" achieves a dose rate of  $\simeq 0.5$ kGy(Si)/hr and was used for our recent cold fibre tests. The sources are located 8 meters underwater, the water acting as a shield. This also means that, with a properly designed container, it is possible to measure the damage taken by the optical fibre as a function of exposure in both time and dose. For optical fibre tests, this capability is superior to methods that permit damage testing only before and after exposure.
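For an online measurement of this kind, the radiation induced absorption (RIA) is commonly expressed as the extra attenuation per unit length derived from the transmitted optical power before and during exposure. The sketch below uses that standard definition. The function name and the example power values are illustrative and are not data from these tests.

```python
import math

def ria_db_per_m(p_reference, p_measured, length_m):
    """Radiation induced absorption in dB/m from transmitted optical power.

    p_reference: power transmitted before irradiation (any linear unit)
    p_measured:  power transmitted during or after irradiation (same unit)
    length_m:    length of fibre exposed to the radiation, in metres
    """
    return -10.0 * math.log10(p_measured / p_reference) / length_m

# Illustrative numbers only: transmission dropping to 10% over 50 m of fibre.
print(f"{ria_db_per_m(1.0, 0.1, 50.0):.2f} dB/m")
```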

The group at SCK-CEN can control the temperature of their radiation containers as long as this temperature is above the ambient level of the water (typically between 25°C and 30°C). Maintaining a constant temperature in Brigitte is a challenge because the number of Compton scattering electrons is so high that any material used to contain the fibres as well as the metal wall of the outer container will heat up. This process caused the temperature rise displayed in Figure 1. Previous tests by our group showed an additional 30°C rise in temperature after the fibre was lowered into the radiation zone. The lower dose rates in the Rita facility generally do not pose such a problem as long as ambient room temperature is one's desired operational point.

As a result of these limitations our group constructed a container with an active cooling system. The container is approximately 450mm long and has a 200mm inner diameter. This cold container was designed for, and used in, the Rita facility. The active cooling elements were Peltier coolers. Exposures of the coolers separately indicated that they ought to be able to withstand up to 10kGy(Si) of dose and still operate effectively. Heat exchangers dumped the heat from the interior of the container into the surrounding shielding water. The volume of water is very large, many hundreds of cubic meters, and circulated so that it has a uniform temperature and forms an ideal heat sink for our purposes.

Optical fibres of 50m length are wound one layer deep around aluminium cylinders which fit inside the container. The fibres are wound in only one layer so that every part of the fibre is in physical contact with the cylinder. In one run up to two cylinders can be irradiated. The cylinders are thermally connected to each other and the upper cylinder is thermally connected to the 4 peltier cooling devices arrayed symmetrically about the central axis of the cylinder. Each of the cylinders has its own temperature measurement so that we can measure the temperature of each fibre during radiation. Pt100 devices were used for the temperature measurements. They are calibrated to within  $0.5^{\circ}$ C of absolute temperature but relative temperature measurements are sensitive to within  $\pm 0.01^{\circ}$ C.
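As an illustration of what a Pt100 temperature measurement involves, the sketch below converts between resistance and temperature using the standard IEC 60751 (Callendar-Van Dusen) relation for platinum sensors. This is an assumption about a generic Pt100 readout, not a description of the electronics actually used in these tests; the function names are hypothetical.

```python
# Standard IEC 60751 coefficients for an industrial platinum sensor.
R0 = 100.0       # resistance in ohms at 0 degC (Pt100)
A = 3.9083e-3
B = -5.775e-7
C = -4.183e-12   # this term only applies below 0 degC


def pt100_resistance(t_c: float) -> float:
    """Resistance in ohms of an ideal Pt100 at temperature t_c (degC)."""
    r = 1.0 + A * t_c + B * t_c**2
    if t_c < 0.0:
        r += C * (t_c - 100.0) * t_c**3
    return R0 * r


def pt100_temperature(r_ohm: float, lo: float = -200.0, hi: float = 850.0) -> float:
    """Invert pt100_resistance by bisection (resistance rises with temperature)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if pt100_resistance(mid) < r_ohm:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    r_cold = pt100_resistance(-25.0)
    print(f"R(-25 degC) = {r_cold:.3f} ohm")
    print(f"T({r_cold:.3f} ohm) = {pt100_temperature(r_cold):.3f} degC")
```

Near $-25^{\circ}$ C the slope is roughly 0.39 ohm per degree, so resolving $0.01^{\circ}$ C corresponds to a resistance change of about 4 milliohms.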



Figure 2: The top figure shows the RIA for our Draka fibre in the cold container as a function of time. The lower plot is of the fibre temperature during this same period of exposure. The cold container was lowered into the radiation environment near hour 16. It was temporarily removed from the radiation environment from hours 42-46. The band at  $-25^{\circ}$ C is present because the cooling system turns on and off to maintain a constant average temperature but this causes a  $\pm 1^{\circ}$ C variation throughout the exposure.

Each channel uses a separate laser light source at 850nm wavelength. This light is launched down a 25m length of patch fibre which runs into the container, through an ST connection to 50m of optical fibre under test, back through another ST connection and then returns through 25m of patch cable to a photodiode receiver. The laser and photodiode are in a shielded area and take no radiation damage. The lasers are all part of one VCSEL array[8] and each is driven by a current source with a stability of better than one part in  $10^4$  with a nominal current of 10mA. In addition to the fibres under test, the light from one laser channel simply goes down to the chamber and straight back to a photodiode through an ST barrel connector. The reason for this is to be able to remove residual losses from the patch cables. As a result all of our measurements are quoted as attenuation figures relative to the received light level from this reference fibre.
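The normalisation to the reference channel can be sketched as follows. This is a minimal illustration that assumes the attenuation is computed from the ratio of test-channel to reference-channel photodiode signals, normalised to the same ratio before irradiation and divided by the 50m fibre length; the function and argument names are hypothetical and the actual analysis code may differ.

```python
import math

FIBRE_LENGTH_M = 50.0  # length of the fibre under test, from the text


def ria_db_per_m(p_test, p_ref, p_test0, p_ref0, length_m=FIBRE_LENGTH_M):
    """Radiation induced attenuation in dB/m relative to the reference channel.

    p_test, p_ref   : current photodiode signals (test fibre, reference path)
    p_test0, p_ref0 : the same signals recorded before irradiation
    """
    ratio_now = p_test / p_ref
    ratio_before = p_test0 / p_ref0
    return -10.0 * math.log10(ratio_now / ratio_before) / length_m


if __name__ == "__main__":
    # Test fibre transmits half the light it did before irradiation.
    print(f"{ria_db_per_m(0.5, 1.0, 1.0, 1.0):.4f} dB/m")
    # A 10% drop common to both channels (e.g. patch cable loss) cancels out.
    print(f"{ria_db_per_m(0.45, 0.9, 1.0, 1.0):.4f} dB/m")
```

Dividing by the reference channel means that any loss common to both paths, such as darkening of the patch cables, does not appear in the quoted attenuation.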

#### **IV. CURRENT RESULTS**

This summer two different radiation runs were performed in the Rita source at SCK-CEN. The first was 50m of prototype Draka RHP-1 SRH fibre held near  $-4^{\circ}$ C. During this test the cold container was operating at its maximum capacity and the cooling was essentially "best effort". Because of this, variations of up to  $2^{\circ}$ C were encountered during the exposure. (The Radiation Induced Absorption (RIA) in this test is shown later in Figure 6.)

The cold container was redesigned for the second test using a set of stacked peltier coolers and better thermal contact from the warm side of the coolers to the heat exchangers. The second test held two fibres (Infinicor SX+ and the Draka fibre) to temperatures near  $-25^{\circ}$ C. The Infinicor SX+ fibre had been previously exposed at room temperature (+30°C) in this same source during 2008.

Figure 2 shows the extent of the test. The fibres were first lowered into the water tank but out of the radiation environment so that the system could cool down. During this time no serious change to the received light was observed that was not consistent with the inherent stability of our measurement apparatus. Once cold, the container was left overnight with the Draka fibre spool at  $-25^{\circ}$ C while the lower spool holding Infinicor SX+ fibre stabilized at  $-23.7^{\circ}$ C. The temperature sensor on the Draka spool was used to control the coolers.



Figure 3: The top figure shows the RIA for the Infinicor SX+ fibre in the cold container as a function of time. The lower plot is of the fibre temperature during this same period of exposure. The cooling system turns on and off to maintain a constant average temperature but this causes a  $\pm 0.05^{\circ}$ C variation during the exposure.

Radiation exposure started just after hour 16 on the figure and continued until hour 42. At this point the container was removed from the radiation environment but maintained at the nominal temperature to allow for any photobleaching effects to become evident. After 1.5 hours the cooling system was turned off and the fibres were allowed to reach the water temperature ( $+30^{\circ}$ C) while still outside the radiation environment. The coolers were then re-engaged and once the nominal  $-25^{\circ}$ C was again achieved the container was inserted back into the radiation area for further exposure where it remained until approximately hour 66.

Figure 3 shows the equivalent plot as Figure 2 but for the Infinicor SX+ fibre spool.

#### A. Annealing and Photo-bleaching Effects

Removing and replacing the fibres was done in order to determine the relative amount of photobleaching effects compared to effects due to temperature annealing. The Draka fibre in Figure 2 shows no indication of a change in attenuation when the temperature is increased outside of the radiation volume. Furthermore, when this fibre is re-exposed to radiation the level of RIA returns directly to the value prior to removal from the gamma source.

This is in contrast to the Infinicor SX+ fibre. An expanded view of its behaviour during the time out of the radiation zone is shown in Figure 4. Here there is also a quick drop in attenuation once the container is removed from the radiation zone (the location of the blue line). Prior to turning off the coolers this reduction is beginning to stabilize. However, once the coolers are shut down (red dotted line) the attenuation again begins to drop. The level of attenuation almost returns to the baseline that existed prior to the start of any exposure in the first place. Unlike the Draka fibre, however, when the container is cooled and returned to the radiation zone (solid red line) the attenuation returns to a level between 0.02 and 0.03dB/m while the attenuation prior to removal was above 0.05dB/m.



Figure 4: An expanded view of the previous figure during the time that the container was removed from the radiation zone and allowed to warm up. The vertical lines show where the container was removed, when the coolers were turned off, and when the container was returned to the radiation zone respectively.

From these results we conclude that the level of RIA reduction seen in the Draka fibre is due mainly to photobleaching effects. However, there is a measurable amount of temperature annealing present in the Infinicor SX+ fibre.

#### B. Comparison of RIA at different temperatures; same dose rates

Infinicor SX+ fibre from the same pre-form has been exposed in the Rita zone both at room temperature and at  $-23.7^{\circ}$ C. The Draka fibre from the same pre-form has been exposed in the zone at  $-4^{\circ}$ C and  $-25.5^{\circ}$ C. Figures 5 and 6 show the results of these exposures. In both figures the red curve is the "warm" exposure while the blue curve is the "cold" exposure. In the case of the Infinicor fibre, the temperature annealing described previously means that the cold curve underestimates the total damage that would have been taken if the container had not been extracted from the radiation zone and warmed to room temperature. Accounting for this, it is clear that even in this case the Infinicor fibre would have shown greater RIA at cold temperatures than at room temperature. The Draka fibre clearly shows that, for every part of the radiation exposure, the cold fibre (at  $-25.5^{\circ}$ C) is taking more damage than the "warm" fibre (at  $-4^{\circ}$ C).



Figure 5: Plotted is the RIA for Infinicor SX+ fibre from the same spool, exposed at the same dose rate (within a factor of two), but with the fibre held at two different temperatures. The blue curve was held at  $-23.7^{\circ}$ C while the red curve was exposed at  $+30.0^{\circ}$ C.

However, the reader might note that in Figure 5, at doses less than 1kGy(Si) the cold fibre is taking *less* damage than the same fibre held at room temperature. We do not understand this result as the dose rate difference between the two experiments was not significant enough to cause a substantial difference in damage.

These tests clearly demonstrate that the RIA for SLHC doses for these two MM fibres is larger at cold temperatures, compared to warm temperatures. However the behaviour of the RIA versus dose is too complicated to allow a reliable extrapolation to the full SLHC dose. Therefore further tests using cold operation and the full SLHC dose will be required.

#### C. High Temperature sensitivity of Optical Fibres during Radiation

Looking at Figures 2 and 3 it appears that there is a great deal of noise on short time scales relative to the time axis on those plots. There are instabilities in laser systems and some of those are manifest in our measurements here. However, most of the fast variation after the radiation begins is due primarily to very small changes in the temperature inside the container. One can see from the temperature plots in Figures 2 and 3 that overall temperature stability is very good. However, because the system's temperature is controlled by turning peltier coolers on and off in response to the Draka temperature sensor, there is still some variation on a few minute time scale and this is what causes the variation in RIA during the radiation exposure.



Figure 6: Plotted is the RIA for Draka RHP-1 SRH fibre from the same spool, exposed at the same dose rate (within a factor of two), but with the fibre held at two different temperatures. The blue curve was held at  $-25.5^{\circ}$ C while the red curve was exposed at  $-4.0^{\circ}$ C.

One can see this effect much more clearly if we zoom in on a particular region around the 55 hour mark in time, which corresponds to 22.7kGy(Si) of integrated dose. A set of plots in this region is shown in Figure 7. The upper figure shows the individual attenuation measurements with sufficient resolution that one can easily see how the RIA is changing as a function of dose. Both fibre types are shown here. Below this are the temperatures of the two fibres for the same dose range. Note that the Infinicor fibre is very much more sensitive to temperature during radiation than the Draka fibre: the rms variation for the Infinicor fibre is 0.0035dB/m while for the Draka fibre it is only 0.0013dB/m, even though the temperature swing for the Draka fibre is much greater. This rather dramatic effect was unexpected but does demonstrate how sensitive Radiation Induced Absorption of fibres can be to temperature when they are irradiated in a cold environment.
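A rough way to compare the two fibres is to divide the rms variation in RIA by the full-scale temperature swing quoted in the caption of Figure 7 ($0.03^{\circ}$ C for Infinicor, $0.8^{\circ}$ C for Draka). The sketch below does only that; mixing an rms quantity with a full-scale one makes this a crude figure of merit rather than a measured temperature coefficient, and the names used are illustrative.

```python
# Values quoted for the region around the 55 hour mark (Figure 7).
FIBRES = {
    "Infinicor SX+": {"ria_rms_db_per_m": 0.0035, "temp_swing_c": 0.03},
    "Draka RHP-1": {"ria_rms_db_per_m": 0.0013, "temp_swing_c": 0.8},
}


def sensitivity(fibre: str) -> float:
    """Crude sensitivity: rms RIA variation per degC of full-scale swing."""
    d = FIBRES[fibre]
    return d["ria_rms_db_per_m"] / d["temp_swing_c"]


if __name__ == "__main__":
    for name in FIBRES:
        print(f"{name}: {sensitivity(name):.4f} dB/m per degC")
    ratio = sensitivity("Infinicor SX+") / sensitivity("Draka RHP-1")
    print(f"Infinicor / Draka sensitivity ratio: {ratio:.0f}")
```

On this crude measure the Infinicor fibre responds to temperature roughly seventy times more strongly than the Draka fibre.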

#### V. FUTURE PLANS

In order to understand the RIA for these fibres using cold operation up to the full SLHC dose, we will perform tests within the Brigitte radiation zone. The fibres will be cooled to around  $-30^{\circ}$ C by an evaporative CO<sub>2</sub> cooling system. This will be a simple "blow-off" system where the coolant is vented to the atmosphere after use. The pressure from a standard CO<sub>2</sub> bottle will provide the work needed for cooling. The design is modelled on that of systems in use in the ATLAS experiment.[9]

#### VI. CONCLUSIONS

The ultimate reason for exposing these fibres cold to radiation is to determine whether or not, at full SLHC doses, they would be acceptable candidates for use in the Versatile Link project.

We have confirmed the results in the literature showing that the RIA of MM fibres is significantly larger at low temperatures compared to warm temperature. We have observed a new effect which we have not seen discussed in the literature, that the RIA of these MM fibres is extremely sensitive to very small temperature changes, when irradiated cold. Since a reliable extrapolation of our results to the full SLHC dose is not possible, tests will be performed at low temperature to the full SLHC dose.

#### A. Acknowledgements

We would like to thank Drs. Jan Troska and Francois Vasey from CERN for their help and advice. We would also like to thank Drs. P.K. Teng and M-L. Chu (Academia Sinica, Taiwan) for providing both the radiation facility to test our peltier coolers and providing VCSEL's for these experiments. We acknowledge the financial support of the Science and Technologies Research council in the UK. The authors would also like to thank the trustees and donors of the John Fell Fund with Oxford University. The flexibility afforded by this fund enabled our further work without delay as we learned more through research.

#### REFERENCES

- [1] N. Hessey Overview and Electronics Needs for ATLAS and CMS High Luminosity Upgrades, in Proceedings of the Topical Workshop on Electronics for Particle Physics, Naxos, Greece, September 15-19 2008, CERN-2008-008, http://cdsweb.cern.ch/record/1108885?ln=en.
- [2] I. Dawson, private communication.
- [3] L. Amaral, et al., "The Versatile Link, A Common Project for Super-LHC", Submitted to Journal of Instrumentation August 2009.
- [4] There are many reports of such effects including: H. Kanamori, et al., "1986 Transmission Characteristics and Reliability of Pure Silica-Core Single-Mode Fibers", J. Lightwave Technol. 4 1144. S. Thriault, "2006 Radiation effects on COTS laser-optimized graded-index multimode fibers exposed to intense gamma radiation fields", Proc. SPIE 6343 63431Q.
- [5] http://www.sckcen.be/en/Our-Services/Irradiations/Gamma-irradiations.
- [6] B. Avridsson, et al., 2009 JINST 4 P07010.
- [7] S. Amato *et al.*, LHCb Technical Design Report, CERN/LHCC/2000-0036 (2000).
- [8] The VCSELs were supplied by Academia Sinica, Taiwan; TSA-8B12-00 Truelight.
- [9] Dr. G. Viehhauser private communication. D.Attree et al., "The evaporative cooling system for the ATLAS inner detector," 2008 JINST 3 P07003.



Figure 7: The upper plot shows the RIA for both the Infinicor SX+ fibre (pink) and the Draka RHP-1 fibre (green) where we have zoomed in on the horizontal axis scale. The lower plot shows the temperature of those two fibres for the same dose. The Infinicor fibre's rms variation is 0.0035dB/m caused by a full-scale temperature variation of  $0.03^{\circ}$ C. The Draka fibre's RIA varies by 0.0013dB/m rms, and this is caused by full-scale temperature changes of  $0.8^{\circ}$ C.

# Study of the Radiation-Hardness of VCSEL and PIN

K.K. Gan<sup>a</sup>, B. Abi<sup>c</sup>, W. Fernando<sup>a</sup>, H.P. Kagan<sup>a</sup>, R.D. Kass<sup>a</sup>, M.R.M. Lebbai<sup>b</sup>, H. Merritt<sup>a</sup>, J.R. Moore<sup>a</sup>, A. Nagarkar<sup>a</sup>, F. Rizatdinova<sup>c</sup>, P.L. Skubic<sup>b</sup>, D.S. Smith<sup>a</sup>, M. Strang<sup>a</sup>

<sup>a</sup>Department of Physics, The Ohio State University, 191 W. Woodruff Ave., Columbus, OH 43210, USA

<sup>b</sup>Department of Physics, University of Oklahoma, 440 W. Brooks St., Norman, OK 73019, USA

<sup>c</sup>Department of Physics, Oklahoma State University, Stillwater, OK 74078, USA

#### gan@mps.ohio-state.edu

#### Abstract

The silicon trackers of the ATLAS experiment at the Large Hadron Collider (LHC) at CERN (Geneva) use optical links for data transmission. An upgrade of the trackers is planned for the Super LHC (SLHC), an upgraded LHC with ten times higher luminosity. We study the radiation-hardness of VCSELs (Vertical-Cavity Surface-Emitting Laser) and GaAs and silicon PINs using 24 GeV/c protons at CERN for possible application in the data transmission upgrade. The optical power of VCSEL arrays decreases significantly after the irradiation but can be partially annealed with high drive currents. The responsivities of the PIN diodes also decrease significantly after irradiation, but can be recovered by operating at higher bias voltage. This provides a simple mechanism to recover from the radiation damage.

#### I. INTRODUCTION

The SLHC is designed to increase the luminosity of the LHC by a factor of ten to  $10^{35}$  cm<sup>-2</sup>s<sup>-1</sup>. Accordingly, the radiation level at the detector is expected to increase by a similar factor. The increased data rate and radiation level will pose new challenges for a tracker situated close to the interaction region. The silicon trackers of the ATLAS experiment at the LHC use VCSELs to generate the optical signals at 850 nm and PIN diodes to convert the signals back into electrical signals for further processing. The devices have been proven to be radiation-hard for operation at the LHC. In this paper, we present a study of the radiation hardness of PINs and VCSELs using 24 GeV/c protons at CERN to the dose expected at the SLHC.

#### II. RADIATION DAMAGE IN VCSEL AND PIN

The main effect of radiation in a VCSEL is expected to be bulk damage and in a PIN diode the displacement of atoms. We use the Non Ionizing Energy Loss (NIEL) scaling hypothesis to estimate the SLHC fluences [1-2]. The silicon trackers will consist of a pixel detector followed by a strip detector. For the pixel detector, we expect the optical links to be mounted off detector to reduce the radiation exposure and simplify the detector construction. In fact, the electric signals from the front-end electronics will be transmitted on micro-coax cables to a location ~6 m away. At this location, the radiation level is expected to be lower than that for the strip detector. The optical links for the strip detector will be mounted close to the detector which starts at a radius of ~ 37 cm. At this location, after five years of operation at the SLHC, we expect a GaAs device (VCSEL and PIN) to be exposed to a fluence [3] of 2.8 x 10<sup>15</sup> 1-MeV n<sub>eq</sub>/cm<sup>2</sup>. The corresponding fluence for a silicon device (PIN) is 7.2 x 10<sup>14</sup> 1-MeV n<sub>eq</sub>/cm<sup>2</sup>. We study the response of the optical devices to a high dose of 24 GeV/c protons. The expected equivalent fluences at SLHC are 5.4 and 12 x 10<sup>14</sup> p/cm<sup>2</sup>, respectively.
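The pairs of fluences quoted above imply a conversion factor between 24 GeV/c protons and 1-MeV equivalent neutrons for each material. The sketch below simply takes the ratio of the quoted numbers; it is an arithmetic illustration of NIEL scaling, not an independent calculation of the damage factors, and the dictionary and function names are hypothetical.

```python
# Expected SLHC fluences quoted in the text (including the safety margin).
# Each entry: (1-MeV neutron-equivalent fluence per cm^2,
#              equivalent 24 GeV/c proton fluence per cm^2)
SLHC_FLUENCE = {
    "GaAs": (2.8e15, 5.4e14),
    "Si": (7.2e14, 12e14),
}


def implied_hardness_factor(material: str) -> float:
    """1-MeV n_eq per 24 GeV/c proton, as implied by the quoted fluences."""
    n_eq, protons = SLHC_FLUENCE[material]
    return n_eq / protons


def protons_for_neq(material: str, n_eq_fluence: float) -> float:
    """24 GeV/c proton fluence delivering the given 1-MeV n_eq fluence."""
    return n_eq_fluence / implied_hardness_factor(material)


if __name__ == "__main__":
    for mat in SLHC_FLUENCE:
        print(f"{mat}: {implied_hardness_factor(mat):.2f} n_eq per proton")
```

The ratio is about 5.2 for GaAs and 0.6 for silicon, which is why the same proton beam delivers a much larger 1-MeV equivalent fluence to the GaAs devices than to the silicon ones.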

#### **III. RADIATION HARDNESS OF VCSEL**

In the past four years, we have irradiated a small sample of devices (typically 2-4 arrays per year) from three vendors, Advanced Optical Components (AOC), Optowell, and ULM Photonics with various bandwidths [4]. For the AOC, we irradiated three varieties of devices, 2.5, 5, and 10 Gb/s. For the ULM, we irradiated two varieties, 5 and 10 Gb/s. For the Optowell, we irradiated 2.5 Gb/s devices. Based on the multi-year study, we identified the AOC devices as more radiation hard and selected the 10 Gb/s device for further study with higher statistics. The original plan was to irradiate twenty 10 Gb/s AOC arrays in 2009. Unfortunately a production problem at the manufacturer reduced the irradiation sample to six devices. We packaged the VCSEL arrays at The Ohio State University for the irradiation [5].

The VCSEL arrays were mounted on a shuttle to allow the devices to be moved out of the beam for periodic annealing by passing the maximum allowable current (~11 mA per channel) through the arrays for ~12-16 hours each day. The optical power vs. dosage for a device irradiated in 2008 is shown in Fig. 1. The devices received an equivalent dose of 7.6 x 10<sup>15</sup> 1-MeV  $n_{eq}/cm^2$ . The optical powers of 14 channels from two 12-channel arrays are shown; the total number of channels that can be monitored during the irradiation was limited by the use of an older circuit board. The optical power decreased during the irradiation but increased during the annealing as expected. There was insufficient time for a complete annealing and the arrays were further annealed after returning to Ohio State. It is evident that the optical power recovery is logarithmic-like and hence slow, but the arrays recover much of the original power. However, there is a channel which has low power,  $\sim 200~\mu$W. Further measurement on a different setup after the annealing indicates that the channel does indeed have good power as shown in Fig. 2 where the power is plotted vs. the channel number for various temperatures. It is evident that power increases with decreasing temperature and hence it is important to operate the VCSEL at low temperature (room temperature or below) to maximize the power output.



Figure 1: Optical power of two 10 Gb/s VCSEL arrays of AOC as a function of time. The power decreased during the irradiation but increased during the annealing. The extended annealing started at slightly past 200 hours.



Figure 2: Optical power of two 10 Gb/s VCSEL arrays of AOC for four different temperatures.

The result from the irradiation of the six 10 Gb/s VCSEL arrays of AOC in 2009 is shown in Fig. 3. The devices received an equivalent dose of 7.6 x  $10^{15}$  1-MeV  $n_{eq}/cm^2$ , which is the same as the year before. The behaviour of the optical power as a function of time is also similar to that shown in Fig. 1. The last segment shows a linear rise in the optical power. This line is added so that the last power measurement of each channel can be differentiated from the last data point measured after the long annealing. The length of this segment, the time separating the two measurements, is arbitrary and hence not physically meaningful. The last measurements were performed without the long twisted fibres used in the irradiation and hence most of the measured power is higher. It is evident that all channels except one have optical power in excess of 300  $\mu$W. The lowest power is 145  $\mu$W. This channel had lower power (~250  $\mu$W) at the beginning of the irradiation, in contrast to the good power measurement at Ohio State prior to the shipment to CERN. We will investigate the cause of the lower power once the arrays have been returned to Ohio State after the activation has subsided. The arrays will be annealed for an extended period and we expect more recovery of the optical power. The radiation hardness of these six AOC arrays is therefore acceptable for the SLHC applications. We plan to repeat the irradiation with a much larger sample, twenty arrays, in August of 2010, to fully qualify the arrays.



Figure 3: Optical power of six 10 Gb/s VCSEL arrays of AOC as a function of time. The power decreased during the irradiation but increased during the annealing. See the text for the comment on the last segment of the measurements.

#### IV. RADIATION HARDNESS OF PIN

In 2008, we irradiated both single channel and array PIN diodes from several sources. This includes two GaAs PIN arrays from AOC, Optowell, ULM Photonics, and Hamamatsu. We packaged these arrays at The Ohio State University for the irradiation [5]. In addition, we also irradiated silicon PINs, two Taiwan arrays and eleven single-channel silicon diodes from Hamamatsu (five S5973 and six S9055). These arrays were delivered pre-packaged.

We monitored the PIN responsivities during the irradiation by illuminating the devices with light from VCSELs and measuring the PIN currents. Table 1 summarizes the responsivities before and after irradiation. The responsivity is for a dose of 4.4 x  $10^{15}$  1-MeV  $n_{eq}/cm^2$  for the GaAs devices and 7.5 x  $10^{14}$  1-MeV  $n_{eq}/cm^2$  for the silicon devices. For the GaAs arrays, Optowell and Hamamatsu have the highest responsivities after the irradiation. As expected, the silicon devices are more radiation hard, with Hamamatsu S5973 having the highest responsivities. However, it should be noted that the bandwidth of the silicon PIN diodes is somewhat low.

Table 1: Responsivities (R) of PIN diodes from various sources before and after irradiation. The bandwidth (BW) of each device is also indicated.

|                 | BW (Gb/s) | R pre (A/W) | R post (A/W) |
|-----------------|-----------|-------------|--------------|
| GaAs            |           |             |              |
| ULM             | 4.25      | 0.50        | 0.09         |
| AOC             | 2.5       | 0.60        | 0.13         |
| Optowell        | 3.125     | 0.60        | 0.17         |
| Hamamatsu G8921 | 2.5       | 0.50        | 0.28         |
| Si              |           |             |              |
| Taiwan          | 1.0       | 0.55        | 0.21         |
| Hamamatsu S5973 | 1.0       | 0.47        | 0.31         |
| Hamamatsu S9055 | 1.5/2.0   | 0.25        | 0.20         |
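To make the comparison in Table 1 easier to read, the sketch below computes the fraction of the pre-irradiation responsivity that each device retains. The numbers are copied from the table; the names are illustrative. Note that the GaAs and silicon devices were quoted at different 1-MeV equivalent doses, so the fractions are only directly comparable within each group.

```python
# (pre, post) responsivities in A/W, copied from Table 1.
TABLE_1 = {
    "ULM": (0.50, 0.09),
    "AOC": (0.60, 0.13),
    "Optowell": (0.60, 0.17),
    "Hamamatsu G8921": (0.50, 0.28),
    "Taiwan": (0.55, 0.21),
    "Hamamatsu S5973": (0.47, 0.31),
    "Hamamatsu S9055": (0.25, 0.20),
}


def retained_fraction(device: str) -> float:
    """Post-irradiation responsivity as a fraction of the initial value."""
    pre, post = TABLE_1[device]
    return post / pre


if __name__ == "__main__":
    for device in TABLE_1:
        pre, post = TABLE_1[device]
        print(f"{device:16s} {post:.2f} A/W ({retained_fraction(device):.0%} retained)")
```

The device with the highest absolute responsivity after irradiation (Hamamatsu S5973) is not the one that retains the largest fraction (Hamamatsu S9055), because the latter starts from a lower initial value.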

The PIN responsivity is expected to be a constant as a function of the bias voltage before irradiation. Figure 4 shows a typical example of the measurement for an Optowell PIN array. However, after a PIN is exposed to radiation, the responsivity increases with the bias voltage as shown in Figure 5 for the arrays from the three vendors that were exposed to a dose of 4.4 x 10<sup>15</sup> 1-MeV n<sub>eq</sub>/cm<sup>2</sup>. Figure 6 shows the responsivity as a function of the bias voltage up to the maximum of 40 V specified by the vendor. It is evident that by operating the array at this high bias voltage, the responsivity can reach the pre-irradiated value. However, the integrity of the signal at this high bias should be verified. Figure 7 shows the eye diagram of a 1 Gb/s signal at 40 V. The test is performed at this relatively low speed because of the limitation of the array carrier board. It is evident that the eye diagram is quite open, indicating the operation at this speed is quite adequate. However, the interest in the SLHC applications is for a much higher speed and the high-speed performance will be verified in the future. Nevertheless, the design of the PIN receiver for the SLHC applications should allow the operation of the PIN diode at high bias voltage to take advantage of this interesting observation.



Figure 4: Responsivity as a function of bias voltage for a 12-channel Optowell PIN array before irradiation.

We chose to irradiate a larger sample of twenty Optowell PIN arrays in 2009 based on the results of the 2008 irradiation. This allowed us to test the uniformity of the radiation-hardness in a sample.



Figure 5: Responsivity as a function of the bias voltage for a channel in a 12-channel PIN array after irradiation. The PIN arrays are from three vendors, Optowell (top), AOC (middle), and ULM (bottom).



Figure 6: Responsivity as a function of the bias voltage for a 12-channel Optowell PIN array after irradiation.

We irradiated the samples in two batches of ten arrays each. Unfortunately, the beam was not properly aligned in one of the batches, resulting in non-uniform dosage across the arrays. Consequently we will only present the results from the batch with uniform illumination. The analysis of the degradation in the responsivity of the other batch is more complicated and will be presented at a future conference.



Figure 7: Eye diagram of the response of an irradiated Optowell PIN array operating at 40 V. The speed of the incident optical signal is 1 Gb/s.



Figure 8: Responsivity of ten 12-channel Optowell PIN arrays before and after irradiation.

The responsivity of the ten arrays with a uniform proton illumination is shown in Fig. 8. The estimated dose is 8.1 x 10<sup>15</sup> 1-MeV  $n_{eq}$ /cm<sup>2</sup>. The responsivity after irradiation is ~ 0.3 A/W with a minimum of 0.15 A/W. This is certainly quite adequate for the SLHC applications. For example, with a modest incident optical power of 1 mW, the PIN current at the minimum responsivity is 150  $\mu$A. This is significantly above the expected operation threshold of 100  $\mu$A to minimize single event upset (SEU) from traversing particles. We are awaiting the return of the irradiated devices for more detailed characterization after the activation has subsided.
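The current estimate above is just the product of responsivity and incident optical power. The sketch below reproduces it and also computes the smallest optical power that would still reach the 100 $\mu$A threshold; the function names are illustrative and the calculation ignores any dark or leakage current.

```python
SEU_THRESHOLD_UA = 100.0  # expected operation threshold from the text, in microamps


def pin_current_ua(responsivity_a_per_w: float, power_mw: float) -> float:
    """Photocurrent in microamps for a given responsivity and optical power."""
    # (A/W) * (mW) gives mA; multiply by 1000 to get microamps.
    return responsivity_a_per_w * power_mw * 1000.0


def min_power_mw(responsivity_a_per_w: float,
                 threshold_ua: float = SEU_THRESHOLD_UA) -> float:
    """Smallest incident optical power (mW) that reaches the threshold current."""
    return threshold_ua / (responsivity_a_per_w * 1000.0)


if __name__ == "__main__":
    for label, r in (("typical", 0.3), ("minimum", 0.15)):
        i_ua = pin_current_ua(r, 1.0)
        print(f"{label}: R = {r} A/W -> {i_ua:.0f} uA at 1 mW, "
              f"threshold reached at {min_power_mw(r):.2f} mW")
```

Even for the worst channel, about two thirds of a milliwatt of incident light is enough to reach the threshold, so 1 mW leaves a margin of a factor of 1.5.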

#### V. SUMMARY

We have studied the radiation hardness of PINs and VCSELs up to the SLHC dose. The optical power of the VCSEL arrays decreases significantly after the irradiation but can be partially annealed with high drive currents. The responsivities of the PIN diodes also decrease significantly after irradiation, but can be recovered by operating at higher bias voltage. This provides a simple mechanism to recover from the radiation damage.

#### VI. ACKNOWLEDGEMENT

This work was supported in part by the U.S. Department of Energy under contract No. DE-FG-02-91ER-40690. The authors are indebted to M. Glaser for the assistance in the use of the T7 irradiation facility at CERN.

#### VII. REFERENCES

- 1. A. Van Ginneken, "Nonionizing Energy Deposition in Silicon for Radiation Damage Studies," FERMILAB-FN-0522, Oct. 1989.
- 2. A. Chilingarov, J.S. Meyer, T. Sloan, "Radiation Damage due to NIEL in GaAs Particle Detectors," Nucl. Instrum. Meth. A 395, 35 (1997).
- 3. The fluences include a 50% safety margin.
- 4. K.K. Gan et al., "Radiation-Hard/High-Speed Data Transmission using Optical Links", to be published in the Proc. of the 11th Topical Seminar on Innovative Particle and Radiation Detectors, Siena, Italy, 2008.
- 5. K.K. Gan, "An MT-Style Optical Package for VCSEL and PIN Arrays", Nucl. Instrum. Meth. A 607, 527 (2009).

# The GBT Project

P. Moreira<sup>a</sup>, R. Ballabriga<sup>a</sup>, S. Baron<sup>a</sup>, S. Bonacini<sup>a</sup>, O. Cobanoglu<sup>a</sup>, F. Faccio<sup>a</sup>, T. Fedorov<sup>b</sup>, R. Francisco<sup>a</sup>, P. Gui<sup>b</sup>, P. Hartin<sup>b</sup>, K. Kloukinas<sup>a</sup>, X. Llopart<sup>a</sup>, A. Marchioro<sup>a</sup>, C. Paillard<sup>a</sup>, N. Pinilla<sup>b</sup>, K. Wyllie<sup>a</sup> and B. Yu<sup>b</sup>

<sup>a</sup>CERN, 1211 Geneva 23, Switzerland

<sup>b</sup>SMU, Dallas TX 75275-0338, USA

### Paulo.Moreira@cern.ch

#### Abstract

The GigaBit Transceiver (GBT) architecture and transmission protocol has been proposed for data transmission in the physics experiments of the future upgrade of the LHC accelerator, the SLHC. Due to the high beam luminosity planned for the SLHC, the experiments will require high data rate links and electronic components capable of sustaining high radiation doses. The GBT ASICs address this issue implementing a radiation-hard bi-directional 4.8 Gb/s optical fibre link between the counting room and the experiments. The paper describes in detail the GBT-SERDES architecture and presents an overview of the various components that constitute the GBT chipset.

#### I. RADIATION HARD OPTICAL LINK ARCHITECTURE

The goal of the GBT project is to produce the electrical components of a radiation hard optical link, as shown in **Figure 1**. One half of the system resides on the detector and hence in a radiation environment, therefore requiring custom electronics. The other half of the system is free from radiation and can use commercially-available components. Optical data transmission is via a system of opto-electronics components produced by the Versatile Link project, described elsewhere in these proceedings [1]. The architecture incorporates timing and trigger signals, detector data and slow controls all into one physical link, hence providing an economic solution for all data transmission in a particle physics experiment.



Figure 1 Radiation-hard optical link architecture

The on-detector part of the system consists of the following components.

**GBTX:** a serializer-de-serializer chip receiving and transmitting serial data at 4.8 Gb/s [2]. It encodes and decodes the data into the GBT protocol and provides the interface to the detector front-end electronics. Some of the implementation aspects of this ASIC will be the subject of the following sections.

**GBTIA:** a trans-impedance amplifier receiving the 4.8 Gb/s serial input data from a photodiode [3]. This device was specially designed to cope with the performance degradation of PIN-diodes under radiation. In particular the GBTIA can handle very large photodiode leakage currents (a condition that is typical for PIN-diodes subjected to high radiation doses [1]) with only a moderate degradation of the sensitivity. The device integrates in the same die the transimpedance pre-amplifier, limiting amplifier and 50  $\Omega$  line driver. The GBTIA was fabricated and tested for performance and radiation tolerance with excellent results. A complete description of the circuit and tests can be found in [3] in these proceedings.

**GBLD:** a laser-driver ASIC to modulate 4.8 Gb/s serial data on a laser [4]. At present it is not yet clear which type of laser diodes, edge-emitters or VCSELs, will offer the best tolerance to radiation [1]. The GBLD was thus conceived to drive both types of lasers. These devices have very different characteristics, with the former type requiring high modulation and bias currents while the latter needs low bias and modulation currents. The GBLD is thus a programmable device that can handle both types of lasers. Additionally, the GBLD implements programmable pre- and de-emphasis equalization, a feature that allows its optimisation for different laser responses. The GBLD has been prototyped and is functional but displays a limited bandwidth and therefore requires a small re-design to correct for under-estimated parasitic effects in the layout. Reference [4] in these proceedings describes the laser driver circuits and discusses the experimental results.

**GBT-SCA:** a chip to provide the slow-controls interface to the front-end electronics. This device is optional in the GBT system. Its main functions are to adapt the GBT to the most commonly used control buses used in High Energy Physics (HEP) as well as the monitoring of detector environmental quantities such as temperatures and voltages. The device is still in an early phase of specification and a discussion of its architecture can be found in reference [5] in these proceedings.

The off-detector part of the GBT system consists of a Field-Programmable-Gate-Array (FPGA), programmed to be compatible with the GBT protocol and to provide the interface to off-detector systems.

To implement reliable links the on-detector components have to be tolerant to total radiation doses and to single event effects (SEE), for example transient pulses in the photodiodes and bit flips in the digital logic [6]. The chips will therefore be implemented in commercial 130 nm CMOS to benefit from its inherent resistance to ionising radiation. Tolerance to SEE is achieved by triple modular redundancy (TMR) and other architectural choices described later in this paper. One such measure is forward error correction (FEC), where the data is transmitted together with a Reed-Solomon code which allows both error detection and correction in the receiver [2], [7]. The format of the GBT data packet is shown in **Figure 2**. A fixed 4-bit header (H) is followed by 4 bits of slow control data (SC), 80 bits of user data (D) and the Reed-Solomon FEC code of 32 bits. The coding efficiency is therefore 88/120 = 73%, and the available user bandwidth is 3.2 Gb/s.

| H | SC | D  | FEC |
|---|----|----|-----|
| 4 | 4  | 80 | 32  |

Figure 2 GBT frame format
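The frame arithmetic can be checked with a short sketch. It assumes that one 120-bit frame is transmitted per 40 MHz clock period (25 ns), which is consistent with the 4.8 Gb/s line rate quoted in the text; the constant and variable names are illustrative only.

```python
# Field widths of the GBT frame, in bits (from Figure 2).
HEADER_BITS = 4
SC_BITS = 4
DATA_BITS = 80
FEC_BITS = 32

# Assumption: one frame is sent per 40 MHz clock period (25 ns).
FRAME_RATE_HZ = 40_000_000

frame_bits = HEADER_BITS + SC_BITS + DATA_BITS + FEC_BITS
line_rate_bps = frame_bits * FRAME_RATE_HZ
user_rate_bps = DATA_BITS * FRAME_RATE_HZ
# Everything except the FEC field counts as non-redundant content.
coding_efficiency = (HEADER_BITS + SC_BITS + DATA_BITS) / frame_bits

print(f"frame length      : {frame_bits} bits")
print(f"line rate         : {line_rate_bps / 1e9:.1f} Gb/s")
print(f"user bandwidth    : {user_rate_bps / 1e9:.1f} Gb/s")
print(f"coding efficiency : {coding_efficiency:.1%}")
```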

FPGA designs have been successfully implemented in both Altera and Xilinx devices, and reference firmware is available to users. Details on the FPGA design can be found in reference [8] in these proceedings.

#### II. THE GBTX PROTOTYPE: GBT-SERDES

The GBTX will be based on a 4.8 Gb/s Serializer-Deserializer (SERDES) circuit which will convert the input data received from the front-end electronics into a serial stream with the GBT format and will de-serialize the GBT frame transmitted from the counting room and feed the data to the front-end electronics.

From the point of view of manufacturability this circuit requires careful study and planning since it operates at high frequency with tight timing margins. Total dose radiation tolerance and robustness to Single Event Upsets (SEU) are major design requirements. They call for the use of circuits that have speed and power penalties when compared with those commonly used in engineering projects that target the consumer markets. An additional constraint that is specific to HEP applications is the requirement of predictable and constant latency links. To study the feasibility of a SERDES circuit that can handle all of these constraints in a commercial 130 nm CMOS technology, a prototype (the GBT-SERDES) is currently under development.



Figure 3 GBT-SERDES architecture

The architecture of the GBT-SERDES is shown in Figure 3. It is broadly composed of a transmitter (TX) and a receiver (RX) section. The TX receives parallel data through the Parallel Input (Parallel In) interface. The parallel data is then scrambled and Reed-Solomon encoded before it is fed to the Serializer (SER) where it is converted into a 4.8 Gb/s serial stream with the frame format described above. On the RX side, after serial to parallel conversion in the De-serializer circuit (DES), the data is fed to the frame aligner, then Reed-Solomon decoded and de-scrambled before it is sent to the external parallel bus through the parallel output interface. The procedures adopted for Reed-Solomon encoding/decoding and scrambling/descrambling used in this implementation were already discussed in detail in references [2] and [7] and will not be reviewed in this work. To save cost in the prototype, a time-division multiplexed parallel bus was adopted for the input and output buses. Since the ASIC is pad limited, this significantly reduces the silicon area required to fabricate the circuit.

In the receiver and transmitter data paths, switches have been inserted between the functional blocks. These switches allow routing the data, at different levels of depth down the data path, from either the RX into the TX or from the TX to the RX. This functionality can be used for evaluation testing of the ASIC but it mainly aims at providing a link diagnostics tool for field tests of the optical link that will use the GBTX. A further self-testing feature is a Pseudo Random Bit Sequence (PRBS) generator in the TX, which can also be programmed to produce constant data or a simple bit count. As shown in Figure 3, only the performance-critical blocks (shaded regions) are implemented using full-custom design techniques while the remaining circuits are based on the standard library cells provided by the foundry.
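As an illustration of how a PRBS generator works, the sketch below implements a 7-bit linear feedback shift register. The paper does not state which sequence the GBT-SERDES generator produces, so the PRBS-7 polynomial (x^7 + x^6 + 1) and the seed are assumptions chosen only because they are a common test pattern.

```python
def prbs7(seed=0x7F):
    """Yield the bits of a PRBS-7 sequence (polynomial x^7 + x^6 + 1).

    Polynomial and seed are illustrative assumptions; the sequence used
    in the GBT-SERDES is not specified in the text.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        # Feedback is the XOR of the two most significant register bits.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def take(generator, count):
    """Collect the next `count` bits from a generator."""
    return [next(generator) for _ in range(count)]


bits = take(prbs7(), 254)
period = bits[:127]
print("ones per period:", sum(period), "zeros per period:", 127 - sum(period))
```

A receiver that knows the polynomial can regenerate the same sequence locally and count mismatches, which is what makes such a generator useful for link diagnostics.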

The full custom circuits include the Serializer (SER), the de-serializer (DES) with its Clock and Data Recovery (CDR) circuit, the Clock Generator (CG) and the Phase Shifter (PS). The serializer circuit is described in detail elsewhere in these proceedings [9] and consequently will not be described here.

**De-serializer:** The de-serializer block diagram is shown in Figure 4. Its main features are a Half-rate Phase/Frequency Detector (HPFD), frequency-aided lock acquisition and a constant-latency "barrel-shifter".



Figure 4 De-serializer architecture

**CDR:** A Half-rate Alexander Phase/Frequency Detector (HPFD) is used in the GBT-SERDES since it allows the use of a lower operation frequency of the CDR PLL and hence safer timing margins in the de-serializer circuit. Although the HPFD is of the bang-bang type, it is well suited for operation with scrambled data since the phase-error information is only provided when data transitions are present on the incoming serial stream. Although the phase detector used also detects frequency, its detection range is insufficient to cover all the process, voltage and temperature variations. To ensure that the CDR can always lock to the data it is thus necessary to pre-calibrate the VCO "free-running" oscillation frequency. For that, the VCO has two control inputs: a coarse control input that allows the centring of the VCO oscillation frequency and a fine control input that is under the CDR HPFD control and allows the CDR circuit to lock to the serial data. The ASIC provides two alternative ways to centre the VCO free-running oscillation frequency. In the first method, a 9-bit voltage DAC (not shown in Figure 4) is used to control the coarse input of the VCO. When using the DAC, the calibration procedure is the following. In a first phase the oscillation frequency of the VCO is compared with the reference clock frequency and a search is made for the coarse control voltage that leads to the smallest frequency error. When that operation is complete, control is passed to the CDR HPFD, which pulls the VCO frequency to the data frequency and finally locks to the phase of the incoming serial stream. In the second method the CDR VCO coarse voltage is derived from that of a reference PLL that is locked to the reference clock (see Figure 4). The VCOs in both PLLs are replicas of each other so that for the same control voltage they should have the same oscillation frequency.
Due to statistical variations in the fabrication process this is however not exact, leading to a slight difference between the VCO frequencies. The CDR VCO fine control voltage is under control of the CDR loop and, due to the frequency detecting ability of the HPFD, will be able to pull the CDR VCO frequency to that of the incoming serial data.
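The DAC-based coarse calibration can be pictured with the following sketch. The text does not say how the search is carried out, so a plain sweep over all 9-bit codes is assumed here; the linear VCO model `toy_vco_hz` and its frequency range are hypothetical stand-ins for a real frequency measurement against the reference clock.

```python
DAC_BITS = 9
TARGET_HZ = 2.4e9   # VCO clock of the half-rate CDR for 4.8 Gb/s data


def toy_vco_hz(dac_code, f_min_hz=1.9e9, f_max_hz=3.1e9):
    """Hypothetical VCO model: frequency rises linearly with the DAC code."""
    return f_min_hz + (f_max_hz - f_min_hz) * dac_code / (2**DAC_BITS - 1)


def calibrate_coarse(measure_hz, target_hz=TARGET_HZ):
    """Return the DAC code whose free-running frequency is closest to target.

    Assumption: an exhaustive sweep; the actual search strategy of the
    ASIC is not described in the text.
    """
    codes = range(2**DAC_BITS)
    return min(codes, key=lambda code: abs(measure_hz(code) - target_hz))


best_code = calibrate_coarse(toy_vco_hz)
residual_hz = toy_vco_hz(best_code) - TARGET_HZ
print(f"coarse code = {best_code}, residual error = {residual_hz / 1e6:+.3f} MHz")
```

The residual error left after this step is what the fine control input, driven by the HPFD, has to remove.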

**Barrel-shifter:** Since a half-rate phase detector is used there is an ambiguity of 180° on the phase of the VCO clock signal in relation to the phase of the incoming data. This ambiguity is non-deterministic and will vary randomly every time the CDR circuit is started. Moreover, since the word clock (40 MHz) is generated by frequency division of the VCO clock (2.4 GHz), its phase is random in relation to the start of the frame (i.e. frame header) and consequently to the LHC bunch-crossing clock. The receiver must thus find the boundaries of the frame in order to correctly interpret the incoming data. That function is commonly implemented in de-serializers by a barrel-shifter. These devices are used to search for the position of the frame header in a shift register. When found, the following bits in the shift register are taken to be the data. In other words, the serial data is shifted until the frame header aligns with the word clock. This method has however the disadvantage of having a non-predictable latency: every time the system is restarted the phase of the word clock is random in relation to the frame header. To avoid this problem and thus to guarantee fixed latency, a novel "barrel-shifter" principle is used in the GBT-SERDES. In this circuit it is instead the clock that is shifted until the frame header is found in a definite position in the shift register. This guarantees that the clock is always aligned with the frame header. To search for the frame header, the clock phase is advanced by one VCO clock cycle at a time. This is done by forcing the counter to skip a count cycle every time the clock phase needs to be advanced. Even when the frame header has been found in the correct position there is still an uncertainty of half a clock cycle which is intrinsic to the use of the half-rate phase detector.
This final ambiguity is resolved by the header detection circuit and the codes chosen for the header that together can detect if the phase of the VCO clock is in phase or in anti-phase with the header. After this phase relationship has been determined an extra phase shift of half clock cycle can be made if necessary in order to align the word clock with the beginning of the frame header and thus ensuring predictable and fixed latency as required for trigger links in HEP applications.
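The clock-shifting alignment can be illustrated with a behavioural toy model. It is not a description of the circuit: the header pattern, the use of 16 consecutive frames to confirm a header, and the representation of the clock phase as a bit offset are all assumptions made for the sketch. What it does take from the text is that the phase advances one VCO cycle (two bits at half rate) per skipped count and that a final half-cycle shift resolves the remaining ambiguity.

```python
import random

FRAME_BITS = 120            # bits per frame
BITS_PER_VCO_CYCLE = 2      # half-rate: two bits per 2.4 GHz VCO cycle
HEADER = [0, 1, 0, 1]       # hypothetical header pattern (assumption)


def make_stream(n_frames, rng):
    """Concatenate frames: header followed by random payload bits."""
    stream = []
    for _ in range(n_frames):
        payload = [rng.randint(0, 1) for _ in range(FRAME_BITS - len(HEADER))]
        stream += HEADER + payload
    return stream


def header_seen(stream, phase, n_frames=16):
    """True if the header sits at `phase` in n_frames consecutive frames."""
    for k in range(n_frames):
        start = phase + k * FRAME_BITS
        if stream[start:start + len(HEADER)] != HEADER:
            return False
    return True


def align_word_clock(stream, start_phase):
    """Advance the word-clock phase until it coincides with the header.

    Returns (phase, vco_cycle_skips, half_cycle_shift_used).
    """
    phase = start_phase % FRAME_BITS
    for skips in range(FRAME_BITS // BITS_PER_VCO_CYCLE):
        if header_seen(stream, phase):          # clock in phase with header
            return phase, skips, False
        if header_seen(stream, phase + 1):      # clock in anti-phase
            return (phase + 1) % FRAME_BITS, skips, True
        # Skipping one count advances the word clock by one VCO cycle.
        phase = (phase + BITS_PER_VCO_CYCLE) % FRAME_BITS
    raise RuntimeError("frame header not found")


offset = 37   # unknown start-up offset of the receiver, in bits
stream = make_stream(20, random.Random(1))[offset:]
phase, skips, half = align_word_clock(stream, start_phase=0)
print(f"header found at phase {phase} after {skips} skips, half-cycle shift: {half}")
```

Because it is the clock that moves rather than the data, the word clock always ends up at the same position relative to the header, whatever the start-up offset was.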

#### **PHASE SHIFTER:**

The purpose of the phase shifter is to generate multiple clocks as local timing references that are synchronous with the accelerator clock. The frequency and phase of the output clocks are digitally programmable. The output clock frequency can be 40 MHz, 80 MHz, or 160 MHz and the phase resolution is 50 ps independent of the frequency.

To handle multiple output frequencies and a phase resolution of 50 ps in a range of 25 ns (for the 40 MHz clock), the phase shifter is designed to consist of three components: a PLL, Coarse De-skewing Logic (CDL), and Fine De-skewing Logic (FDL). Figure 5 depicts the overall system block diagram.



Figure 5 The block diagram of the phase shifter

From the 40 MHz accelerator reference, the PLL generates the FastClk of 1.28 GHz (with a period of 781 ps) for both the CDL and FDL blocks. The divider in the PLL is made of a 5-bit binary counter whose outputs are used by the CDL to produce the right output clock frequency. Since the output clocks are synchronized with FastClk, the PLL guarantees the synchronization of the output clocks with the machine reference clock.

In addition to performing frequency selection, the CDL shifts the clock by multiple periods of the FastClk according to the MSB bits of the control word (Delay[8:4] in Figure 5). The output of the CDL block is therefore a clock of the specified frequency with the phase shifted by multiples of 781 ps.

The FDL is designed to fine de-skew the clock by a fraction of 781 ps (one period of the FastClk). It is based on a modified DLL structure with a 16-stage voltage controlled delay line (VCDL). The 16 delay stages allow fine de-skewing of the clock in steps of 1/16 of one period of the FastClk to obtain the 50 ps delay resolution. This is achieved by feeding the CDL clock to the VCDL and connecting a version of the CDL clock, delayed by one clock cycle of the FastClk, to the phase detector (PD). The other input of the PD is the VCDL output. This architecture sets the delay through the VCDL to be exactly one period of FastClk, 781 ps, so that the delay through each stage is approximately 50 ps (781 ps / 16 ≈ 48.8 ps). A 16:1 Mux is used to select the appropriate delay stage output based on the FDL control word (Delay[3:0]).
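The way the 9-bit control word maps onto a delay can be written out as a short sketch. The bit fields Delay[8:4] and Delay[3:0] and the clock frequencies come from the text; the function name and the assumption that fine tap 0 corresponds to zero additional delay are illustrative.

```python
REF_CLOCK_HZ = 40_000_000      # accelerator reference clock
PLL_MULTIPLIER = 32            # 5-bit counter in the PLL: 40 MHz x 32 = 1.28 GHz
VCDL_STAGES = 16               # delay stages in the FDL delay line

FASTCLK_PERIOD_PS = 1e12 / (REF_CLOCK_HZ * PLL_MULTIPLIER)   # 781.25 ps
FINE_STEP_PS = FASTCLK_PERIOD_PS / VCDL_STAGES               # about 48.8 ps


def decode_delay(word):
    """Split a 9-bit delay word into CDL and FDL settings and total delay."""
    if not 0 <= word < 2**9:
        raise ValueError("delay word must fit in 9 bits")
    coarse = (word >> 4) & 0x1F    # Delay[8:4]: whole FastClk periods (CDL)
    fine = word & 0x0F             # Delay[3:0]: VCDL tap chosen by the mux (FDL)
    delay_ps = coarse * FASTCLK_PERIOD_PS + fine * FINE_STEP_PS
    return coarse, fine, delay_ps


for word in (0x000, 0x001, 0x010, 0x1FF):
    coarse, fine, delay_ps = decode_delay(word)
    print(f"word=0x{word:03X} coarse={coarse:2d} fine={fine:2d} delay={delay_ps:9.2f} ps")
```

With 32 coarse steps of 781.25 ps and 16 fine steps each, the 512 codes cover the full 25 ns period of the 40 MHz clock.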

To generate multiple clock outputs simultaneously using this architecture, replicas of the CDL and FDL can be employed whereas one PLL can be shared among different channels. In the first version of the GBT chip, three phase-shifting channels are implemented.

**C4 PACKAGE:** The GBT-SERDES, and even more so the future GBTX, are heavily pad-limited ASICs. Adoption of a wire bond packaging technique would result in high silicon area and thus in high silicon cost. C4 packages (flip-chip) and ASIC design techniques allow the distribution of the I/O over the full area of the ASIC and therefore reduce the wasted silicon area in pad-limited designs. C4 packages are always custom made and thus incur development costs. However, in the case of the GBT-SERDES, the cost balance is in favour of the use of a C4 package.

Due to the absence of bond-wires, C4 packages exhibit very low parasitic inductances on the chip-to-package interconnect. Moreover, since they use fabrication technologies very similar to the ones employed for the fabrication of PCBs, it is possible to design controlled impedance transmission lines directly in the package in order to optimize the high speed connections. Considering both the economical and electrical advantages that the use of a C4 package could bring it was thus chosen to package the GBT-SERDES in a  $13 \times 13$  bump-pad C4 package.

#### **III. STATUS AND FUTURE DEVELOPMENTS**

The GBT-SERDES is expected in early 2010 and will then undergo tests, including an irradiation programme. These will verify the functionality of the serializer and de-serializer blocks which will then be incorporated into the final GBTX design. This will contain a more sophisticated digital interface for coupling to the front-end systems, as illustrated in **Figure 6** and **Figure 7**. The interface will be configurable so the user can select an appropriate mode to input and output the 80 bits of data per frame. Parallel mode (**Figure 6**) uses a 40-bit bidirectional double-data-rate bus running at the system frequency. The user can also split this into 5 independent 8-bit busses. An alternative configuration uses serial data transport, known as E-link mode (**Figure 7**). The interface can provide 40, 20 or 10 bidirectional serial links running at 80 Mb/s, 160 Mb/s and 320 Mb/s respectively. Each port transmits and receives the serial data and clock using the Scalable Low Voltage Signalling (SLVS) standard. The E-link port is being implemented as a portable design macro that can be incorporated easily within the design of a front-end chip. More details of this and SLVS can be found in [11]. One E-port can be dedicated to communication with the GBT-SCA chip (although other uses are not precluded). This will provide an interface between the GBT protocol and standards such as I2C and JTAG [5].







Figure 7 E-Link interface mode
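A quick consistency check shows that each E-link configuration carries the same aggregate bandwidth as the 80 user bits per frame. The sketch assumes one frame per 40 MHz clock period; the dictionary and variable names are illustrative.

```python
FRAME_RATE_HZ = 40_000_000     # assumption: one frame per 40 MHz clock period
USER_BITS_PER_FRAME = 80

# E-link configurations quoted in the text: data rate (Mb/s) -> number of links.
ELINK_MODES = {80: 40, 160: 20, 320: 10}

user_rate_mbps = USER_BITS_PER_FRAME * FRAME_RATE_HZ // 1_000_000

for rate_mbps, n_links in ELINK_MODES.items():
    total_mbps = rate_mbps * n_links
    bits_per_link_per_frame = rate_mbps * 1_000_000 // FRAME_RATE_HZ
    print(f"{n_links:2d} links x {rate_mbps:3d} Mb/s = {total_mbps} Mb/s "
          f"({bits_per_link_per_frame} bits per link per frame)")

print(f"user bandwidth of the frame: {user_rate_mbps} Mb/s")
```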

The user will be able to operate the GBTX in one of three different data modes. In transceiver configuration, the chip will handle full bi-directional data, receiving its configuration from the link and acting as a clock source for the on-detector system. In simplex receiver configuration, the chip will receive data from the off-detector system and the transmission functions are disabled. The GBTX will provide the clock and can still be configured via the link, but the reading of its status will have to be done via a secondary link. In simplex transmitter configuration, the GBTX transmits data from the detector and the receiver functions are disabled. The chip will therefore require an external clock and configuration link. Both of these can be fulfilled by, for example, another GBTX in the transceiver configuration. These different configuration possibilities allow the user to optimise the GBT for their particular system.

#### IV. CONCLUSIONS

The GBT project is now at the prototyping stage for all components in the chipset. Measurements of the prototype GBTIA and GBLD indicate that functionality has been achieved, but some corrections are required in the case of the GBLD. The GBT-SERDES, incorporating the serializer and de-serializer blocks, has been designed with special measures to enhance radiation tolerance and will be submitted for fabrication in November 2009. Results are expected in early 2010 when the design of the final GBTX chip will start.

#### V. REFERENCES

[1] J. Troska et al., 'The Versatile Transceiver Proof of Concept', these proceedings

[2] P. Moreira et al., 'The GBT, a Proposed Architecture for Multi-Gb/s Data Transmission in High Energy Physics', Topical Workshop on Electronics for Particle Physics, Prague, Czech Republic, 3-7 Sept. 2007, pp. 332-336

[3] M. Menouni et al., 'The GBTIA, a 5 Gbit/s radiation-hard optical receiver for the SLHC upgrades', these proceedings

[4] G. Mazza et al., 'A 5 Gb/s Radiation Tolerant Laser Driver in 0.13 um CMOS technology', these proceedings

[5] A. Gabrielli et al., 'The GBT-SCA, a radiation tolerant ASIC for detector control applications in SLHC experiments', these proceedings

[6] A. Pacheco et al., 'Single-Event Upsets in Photoreceivers for Multi-Gb/s Data Transmission', IEEE Transactions on Nuclear Science, Volume 56, Issue 4, Part 2, Aug. 2009, pp. 1978-1986

[7] G. Papotti et al., 'An Error-Correcting Line Code for a HEP Rad-Hard Multi-GigaBit Optical Link', Proceedings of the 12<sup>th</sup> Workshop on Electronics for LHC and Future Experiments, Valencia, Spain, 25-29 Sept 2006, CERN-LHCC-2007-006

[8] F. Marin et al., 'Implementing the GBT data transmission protocol in FPGAs', these proceedings

[9] O. Cobanoglu et al., 'A Radiation Tolerant 4.8 Gb/s Serializer for the Giga-Bit Transceiver', these proceedings

[10] B. Razavi, 'Challenges in the Design of High-Speed Clock and Data Recovery Circuits', IEEE Communications Magazine, August 2002, pp: 94-101

[11] S. Bonacini et al., 'e-link: A Radiation-Hard Low-Power Electrical Link for Chip-to-Chip Communication', these proceedings

# The Versatile Transceiver Proof of Concept

J. Troska, S. Detraz, S. Papadopoulos, I. Papakonstantinou, S. Rui Silva, S. Seif el Nasr, C. Sigaud, P. Stejskal, C. Soos, F. Vasey

CERN, 1211 Geneva 23, Switzerland

#### jan.troska@cern.ch

#### Abstract

SLHC experiment upgrades will make substantial use of optical links to enable high-speed data readout and control. The Versatile Link project will develop and assess optical link architectures and components suitable for deployment at SLHC. The on-detector element will be a bidirectional optoelectronic module, the Versatile Transceiver, which will be based on a commercially available module type minimally customized to meet the constraints of the SLHC on-detector environment in terms of mass, volume, power consumption, operational temperature and radiation environment. We report on the first proof-of-concept phase of the development, showing the steps towards customization and first results of the radiation resistance of candidate optoelectronic components.

#### I. INTRODUCTION

The Versatile Link project [1] aims to provide a multi-gigabit per second optical physical data transmission layer for the readout and control of Super LHC (SLHC) experiments. Point-to-point bidirectional (P2P) as well as point-to-multipoint (PON) architectures are foreseen to be supported by the systems and components currently being assessed and developed. The P2P implementation and its relationship with the GBT project [2] is shown in Figure 1.



Figure 1: P2P radiation hard optical link for SLHC

The front-end component that will enable the configuration of any of the Versatile Link's supported architectures is a bi-directional module composed of both optical transmitter and receiver: the Versatile Transceiver (VTRx). Both SingleMode (SM) and MultiMode (MM) flavours of the VTRx will be developed to support the various types of installed fibre-plant in the LHC experiments.

Components situated on the detectors at the front-end must meet strict requirements imposed by the operational environment for radiation- and magnetic-field tolerance, low temperature operation (between -40 and -10°C), low mass and volume, and low power consumption. The radiation environment is particularly challenging, as any device placed at the front-end must survive the Si-equivalent of  $1.5 \times 10^{15}$  n (1MeV)/cm<sup>2</sup> fluence and 500kGy ionizing dose. Experience with optical links deployed in LHC experiments has indicated that even the opto-electronic modules situated on the detectors should be sufficiently rugged to allow handling by integration teams relatively unfamiliar with their use. For this reason the VTRx development aims to minimally customize a commercial form factor bidirectional transceiver module that features a direct optical connector interface.

In this paper we will present how we have achieved these goals by providing details of the internals of the module that we have built and showing results of the optoelectronic characterization that has been carried out. Additionally, a critical requirement for the choice of laser- and photo-diodes to be included in the VTRx is that of radiation resistance. A first survey of devices has been carried out to gauge their resistance to displacement damage (the most challenging type of radiation damage for active opto-electronic devices).

#### II. PACKAGING

The most promising commercial form factor for modification to meet the needs of operation within the SLHC detectors is the SFP+, which measures approx. 50mm long by 10mm wide by 14mm high. Such a commercial module contains a laser diode driver (LDD) and laser in the transmit path, a photodiode plus transimpedance (TIA) and limiting amplifiers (LA) in the receive path, along with a microcontroller ( $\mu$ C) for module control (Figure 2 a). The VTRx will omit the microcontroller, replace the ASICs with custom-designed radiation resistant versions, and add commercially available laser- and photo-diodes (Figure 2 b) that have been qualified to be sufficiently radiation-resistant.



Figure 2: Block diagram of (a) Standard SFP+ transceiver and (b) Versatile transceiver showing the differences between the two.

Work on packaging has been carried out on two major fronts: the investigation of suitable components for inclusion in the VTRx (custom and commercial laser drivers and TIAs, ROSAs and TOSAs); and becoming familiar with the design issues associated with transceiver packaging through the evaluation of commercial test boards and transceiver modules sourced from an industrial partner as well as the in-house design of test PCBs to evaluate the high-speed components.

We have also successfully tested modified lower-mass SFP+ modules sourced from a commercial transceiver manufacturer. These show that removing material from the metallic SFP+ housing does not adversely affect the performance of individual modules (see Section III for detailed results).

Finally, a study has been carried out to characterize laser diodes through the development of a package and device model that can be used by both ASIC and PCB designers to aid the matching to particular devices. This model [3], with the parameters extracted from the measurement of several candidate laser transmitters, has been successfully used to simulate the performance of a matching network and PCB layout for connection of a laser transmitter to a commercial laser driver. The GBLD [4] designer has also recently used this model to confirm the measured performance of his ASIC.

#### **III.** FUNCTIONAL TESTING

Two main methods for assessing the functionality of optical transceivers have been adopted: measurement of signal 'eye' diagrams and Bit Error Rate (BER) testing. Both measurement methods have been implemented in our laboratory and are used routinely to characterize the performance of components and full transceivers. They are described in detail in Reference [5] and outlined below for completeness.

Measurement of the optical output of a transmitter driven with a pseudorandom bit pattern using a sampling oscilloscope yields an optical eye diagram from which the salient characteristics can be extracted. When such an optical signal is fed back to the optical receiver the same method can be applied to the electrical output of the receiver. Attenuating the optical input to the receiver allows measurement of receiver performance under stressed conditions. We extract metrics such as amplitude, rise/fall times, noise and jitter from such eye diagrams. Figure 3 shows a typical test setup and a typical eye diagram with parameter definitions is shown in Figure 4.



Figure 3: Test setup for eye diagram measurements.



Figure 4: Typical eye diagram with parameter definitions.

Measurement of BER as a function of optical modulation amplitude at the receiver allows determination of the receiver sensitivity and thus the overall system power budget. We have implemented a custom BER tester based upon a Xilinx Virtex 5 FPGA evaluation platform that allows us to test not only the basic BER but also the performance of the proposed Forward Error Correction (FEC) code of the GBT protocol [6]. Figure 5 shows a typical test setup.



Figure 5: Test setup for Bit Error Rate measurements.
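The link between BER, optical modulation amplitude (OMA) and receiver sensitivity can be illustrated with the textbook Gaussian-noise model, in which BER = ½ erfc(Q/√2). This is a sketch under stated assumptions rather than the analysis used for the measurements: equal rms noise on both logic levels is assumed, and the noise value `NOISE_RMS_UW` is a hypothetical number, not a measured one.

```python
import math


def ber_from_q(q):
    """Bit error rate for Gaussian noise and an optimum decision threshold."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def q_from_oma(oma_uw, noise_rms_uw):
    """Q factor assuming equal rms noise on both logic levels (toy receiver)."""
    return oma_uw / (2.0 * noise_rms_uw)


def sensitivity_uw(target_ber, noise_rms_uw, oma_max_uw=1000.0):
    """Smallest OMA (in uW) that reaches target_ber, found by bisection."""
    lo, hi = 0.0, oma_max_uw
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ber_from_q(q_from_oma(mid, noise_rms_uw)) > target_ber:
            lo = mid      # still too many errors: more light is needed
        else:
            hi = mid
    return hi


NOISE_RMS_UW = 1.0   # hypothetical input-referred noise, not a measured value
for oma in (8.0, 10.0, 12.0, 14.0, 16.0):
    ber = ber_from_q(q_from_oma(oma, NOISE_RMS_UW))
    print(f"OMA = {oma:4.1f} uW -> BER = {ber:.2e}")
print(f"sensitivity at BER 1e-12: {sensitivity_uw(1e-12, NOISE_RMS_UW):.2f} uW")
```

In this model a BER of 10<sup>-12</sup> corresponds to Q ≈ 7, so the sensitivity scales directly with the receiver noise.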

We have implemented a visual method for inter-device comparison of the relatively large number of parameters produced per DUT in preparation for being able to compare the relative performance of different components and transceivers. This method creates a so-called Spider or Radar plot, where each parameter is plotted on its own axis and the plotted points on the different axes are then joined to provide a sort of fingerprint for each DUT that is easily compared visually to the others. An example Spider plot is shown in Figure 6, which shows a comparison of the overall performance of SM and MM transceivers.



Figure 6: Example Spider or Radar plot comparing the performance of several SM and MM transceiver modules operating at 5 Gb/s. Tj and Dj are Total and Deterministic Jitter, respectively.
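The construction of such a plot can be sketched as follows. Each metric is normalised onto its own axis and inverted where smaller values are better, so that points further from the centre are always better. The metric names, ranges and values below are hypothetical and serve only to exercise the function; they are not measured data.

```python
import math


def radar_vertices(values, ranges, higher_is_better):
    """Map each metric to a point on its own axis; 1.0 is the outer rim."""
    names = list(values)
    vertices = []
    for index, name in enumerate(names):
        low, high = ranges[name]
        score = (values[name] - low) / (high - low)
        if not higher_is_better[name]:
            score = 1.0 - score          # so that further out is always better
        score = min(1.0, max(0.0, score))
        angle = 2.0 * math.pi * index / len(names)
        vertices.append((name, score * math.cos(angle), score * math.sin(angle)))
    return vertices


# Hypothetical metrics and ranges, chosen only to exercise the function.
ranges = {"eye_height_mV": (0.0, 400.0), "rise_time_ps": (30.0, 90.0),
          "total_jitter_ps": (20.0, 80.0), "sensitivity_dBm": (-20.0, -10.0)}
higher_is_better = {"eye_height_mV": True, "rise_time_ps": False,
                    "total_jitter_ps": False, "sensitivity_dBm": False}
dut = {"eye_height_mV": 300.0, "rise_time_ps": 45.0,
       "total_jitter_ps": 50.0, "sensitivity_dBm": -17.0}

for name, x, y in radar_vertices(dut, ranges, higher_is_better):
    print(f"{name:16s} x={x:+.3f} y={y:+.3f}")
```

Joining the returned vertices in order gives the polygon that serves as the fingerprint of one DUT.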

The Spider plots allow an easy visual comparison between different DUTs, which makes them well suited to investigations involving changes in the transceiver packaging. Figure 7 shows the comparison of three tested generations of SM VTRx prototype: a first standard fully metallic package containing a SM VCSEL transmitter operating at 1310 nm; a second standard package containing a DFB edge-emitter; and a third containing the same active components as the second but with a significant amount of metallic shielding removed from the package. Clearly the change in transmitting laser has a large impact on several performance parameters, whereas it is very encouraging that reducing the amount of material appears to have little impact on device performance. We had been concerned that removing material would lead to crosstalk between the transmitting and receiving sides of the VTRx once the electrical shielding was removed, but this appears not to be the case. This result is confirmed by the measurements of MM VTRx prototypes shown in Figure 8, where generations 2 and 3 differ in the packaging only, as described for the SM modules.



Figure 7: Performance comparison of different SM packaging generations. Values further from the centre are better.



Figure 8: Performance comparison of different MM packaging generations. Values further from the centre are better.

#### IV. RADIATION TESTING

Two radiation tests have been carried out during the first phase of the VTRx development: a Single Event Upset (SEU) test using 60MeV protons at PSI, Villigen, CH and a total fluence test using 20MeV neutrons at the cyclotron facility of UCL, Louvain-la-Neuve, B. The goal of both tests was to survey a large number of devices from different manufacturers in order to compare their relative radiation resistance.

#### A. SEU Test

The SEU test surveyed SM and MM bare PIN photodiodes and ROSAs by operating them in the proton beam and measuring the effect of the beam on their BER curves. This showed that the passage of particles through the devices can corrupt the data, leading to an increase of BER as expected. For operation in SLHC Trackers this increase is beyond tolerable limits and thus requires the use of FEC in order to guarantee a BER below 10<sup>-12</sup>. In addition, this test showed that burst errors lasting up to ten consecutive bits can occur in photodiodes, while such bursts may last for hundreds of bits in the case of ROSAs where the receiver TIA is also in the beam. The currently proposed GBT FEC scheme can correct the former but not the latter bursts, and so to maintain the BER below 10<sup>-12</sup> the GBTIA will have to be SEU-hardened by design. Full results have been published [7].

#### B. Total Fluence Test

The total fluence test surveyed a wide spectrum of commercially available lasers and photodiodes. We have tested single-channel devices from ten different manufacturers. A total of 20 laser devices included two types of 850nm VCSEL, four types of 1310nm Fabry-Perot (FP) edge-emitting laser and three variants of long wavelength (1310/1330/1550 nm) VCSEL. A total of 28 PIN devices included three types of MM GaAs devices and four types of SM InGaAs devices.

The irradiation took place at the cyclotron facility of the Université Catholique de Louvain-la-Neuve in Belgium. Devices were mounted in groups on PCBs that were stacked in front of the neutron-producing Beryllium target. The distance from the target to the devices varied from 13 cm to 18 cm depending upon the location in the stack. Figure 9 shows the fluences reached by the DUTs during the test. There were two periods with no beam due to problems with the operation of the cyclotron.



Figure 9: The shaded area represents the range of fluences to which the DUTs were exposed. This variation is due to the distance of individual DUTs from the Beryllium target.

DC device characteristics were measured every twenty minutes during both irradiation and recovery periods. For laser devices we measured their L-I-V curves in order to extract the maximum output power, threshold current, efficiency and series resistance. The progression of the LIV curves during irradiation is shown in Figure 10. For the photodiodes we measured their response to varying levels of light input that allowed us to extract their responsivity and leakage current as a function of applied reverse bias. The typical response for an InGaAs device is shown in Figure 11.



Figure 10: Typical behaviour of a laser L-I-V curve during irradiation. The device is a 1310nm FP laser, which stops lasing after a little more than $2\times10^{15}$ n/cm<sup>2</sup>.



Figure 11: Typical measurement result for a SM PIN showing its response to varying input light power for various increasing levels of irradiation.

For lasers, we show the reduction of the maximum output optical power as a function of total fluence in Figure 12. The smallest active volume devices (MM VCSELs operating at 850nm) showed the highest resistance to radiation damage and remained functional after exposure. All of the longer wavelength SM devices stopped lasing at the highest fluences reached during the test. Of the SM devices, again the smaller active volume devices (VCSELs and Quantum Dot lasers) survived to higher fluences than standard edge-emitting FP devices. All devices showed recovery after irradiation, indicating that the lower flux exposure of the SLHC application will yield less overall damage. Given the observed increases in forward voltage and the already higher pre-irradiation values of the MM VCSELs, further analysis will be required in order to get the full picture of the system implications of these results. Only once this is done can final conclusions be drawn and device selection be carried out.



Figure 12: Laser maximum output power as a function of total fluence during irradiation (left-hand side) and then as a function of recovery time (right-hand side).

All InGaAs-based long wavelength devices showed a similar decrease in responsivity (Figure 13) and increase in leakage current (Figure 14), while the GaAs-based devices showed a larger relative drop in responsivity yet no measurable increase in leakage current. The damage in both material types did not anneal post-irradiation. From a system perspective, the lack of leakage current increase in the MM GaAs devices seems very attractive. However, these devices showed a larger relative drop in responsivity and already have a pre-irradiation responsivity value that is at least 50% lower than their SM InGaAs counterparts. So in terms of system margin the final comparison will depend upon the relative impact of increased leakage current on the receiver sensitivity, a parameter that depends entirely on the performance of the transimpedance amplifier (TIA).



Figure 13: Showing the evolution of PIN responsivity at 2V reverse bias as a function of fluence (left-hand side) and then recovery time (right-hand side).



Figure 14: Showing the evolution of PIN leakage current at 2V reverse bias as a function of fluence (left-hand side) and then recovery time (right-hand side).

The data obtained from the total fluence test for lasers are still being analysed to assess whether a shorter irradiation could be used to predict the final outcome, something that is desirable in terms of reducing the cost of future tests.

#### V. CONCLUSION

The first phase of development of the VTRx – the frontend component of the Versatile Link – has been successfully completed. We have demonstrated the concept of minimally modifying a commercial transceiver module for use in upgraded SLHC detector systems having removed a significant amount of material and measured no impact on device performance. We have carried out a survey of radiation response to SLHC fluences of a number of commercially available optoelectronic transmitters and receivers. The survey results indicate that we will be able to find a number of commercial devices that are sufficiently radiation resistant to employ in both SM and MM variants of the VTRx.

In the next phase of the project we will further investigate the radiation tolerance of the VTRx and its sub-components. We plan further SEU, total dose and total fluence tests to investigate the details of the radiation response of the components in order to be able to predict the performance of the VTRx once installed in upgraded SLHC detectors.

Further modifications to the VTRx packaging are envisaged in order to reduce the module mass to a strict minimum while ensuring the specified performance of both the VTRx and other parts of the detector systems in which it will be used. The Electromagnetic Compatibility (EMC) properties of the VTRx – that is how much it affects and is affected by its electromagnetic environment – are of particular concern, as the device will be switching relatively large currents at high speeds in the vicinity of the sensitive amplifiers of the detector front-ends.

#### VI. REFERENCES

- [1] "The Versatile Link, A Common Project For Super-LHC", F. Vasey et al., submitted to JINST
- [2] "The GBT Project", P. Moreira et al., these proceedings
- [3] "Characterization of Semiconductor Lasers for Radiation Hard High Speed Transceivers", S. Silva et al., these Proceedings
- [4] "A 5 Gb/s Radiation Tolerant Laser Driver in 0.13 um CMOS technology", G. Mazza et al., these proceedings
- [5] "Evaluation of Multi-Gbps Optical Transceivers for Use in Future HEP Experiments", L. Amaral et al., Proceedings of TWEPP 2008, CERN-2008-008, pp. 151-155
- [6] "FPGA-based Bit-Error-Ratio Tester for SEU-hardened Optical Links" C. Soos et al., these proceedings
- [7] "Single-Event Upsets in Photoreceivers for Multi-Gb/s Data Transmission", A. Jimenez Pacheco, J. Troska et al., IEEE Trans. Nucl. Sci., Vol. 56, Iss. 4, Pt. 2 (2009), pp. 1978 - 1986

# Passive Optical Networks for the Distribution of Timed Signals in Particle Physics Experiments

I. Papakonstantinou<sup>a</sup>, C. Soos<sup>a</sup>, S. Papadopoulos<sup>a</sup>, J. Troska<sup>a</sup>, F. Vasey<sup>a</sup>, S. Baron<sup>a</sup>, L. Santos<sup>a</sup>, S. Silva<sup>a</sup>, P. Stejskal<sup>a</sup>, C. Sigaud<sup>a</sup>, S. Detraz<sup>a</sup>, P. Moreira<sup>a</sup>, I. Darwazeh<sup>b</sup>

<sup>a</sup> CERN, Div. PH-ESE, 1211 Geneva 23, Switzerland <sup>b</sup> University College London, Department of Electronics Engineering, Torrington Place, London WC1E 7JE, UK

# jan.troska@cern.ch

#### Abstract

A passive optical network for timing distribution applications based on FPGAs has been successfully demonstrated. Deterministic latency was achieved in the critical downstream direction where triggers are distributed while a burst mode receiver was successfully implemented in the upstream direction. Finally, a simple and efficient protocol was introduced for the communication between the OLT and the ONUs in the network that maximizes bandwidth utilization.

#### I. INTRODUCTION

Optical links are currently deployed in a number of applications in the Large Hadron Collider (LHC), where both point-to-point (P2P) and point-to-multipoint (P2MP) topologies are exploited for data collection, timing distribution, and control and management signal transmission. P2P links are mainly used in data readout systems, as the inherent bandwidth sharing property of P2MP links makes them inadequate for such applications. However, P2MP links offer advantages when signals have to be broadcast simultaneously to a number of destinations. This is the case for the Timing, Trigger and Control (TTC) system [1], the relevant part of which is shown in Fig. 1, where triggers and commands are distributed downstream from the TTCex to a number of TTCrxs. Two variations of the TTC system are typically encountered, depending on whether the TTCrxs are installed inside the detector or in the counting room, Fig. 1. Optical links are unicast in both cases and information flows only in the downstream direction from the TTCex to the TTCrxs. A separate "busy" electrical data link is used for the TTCrxs to communicate their status back to the TTCex, but the "busy" link is usually slow to respond and it would be beneficial if the communication took place in real time. The objective of this work is to design a bidirectional optical link based on the commercial Passive Optical Network (PON) architecture that combines downstream and upstream data in the same fibre while meeting the stringent latency and jitter requirements of the bespoke optical networks used in particle physics experiments.

#### II. PASSIVE OPTICAL NETWORKS

Passive Optical Networks (PONs) are point-to-multipoint optical networks with no active elements in the signal's path from the source to the destination. A master node, the Optical Line Terminal (OLT), communicates with a number of slave terminals, the Optical Network Units (ONUs), via a long feeder optical fiber and an optical splitter, Fig. 2. In the downstream direction (OLT $\rightarrow$ ONUs), a PON is a broadcast network and so collisions cannot occur. Data are delivered to all ONUs, which decide whether to process them further or to ignore them based on an address field. However, in the upstream direction (ONUs $\rightarrow$ OLT) a number of ONUs share the same transmission medium and so a channel arbitration mechanism should be put in place to prevent collisions and to distribute bandwidth fairly among them. Time Division Multiple Access (TDMA) is the preferred multiplexing scheme in first-generation PONs as it is very cost effective, while Dynamic Bandwidth Allocation (DBA) algorithms are employed for fairness. Typical commercial PON systems operate at 1.25Gb/s or 2.5Gb/s in symmetric (downstream data rate equal to upstream) or asymmetric (downstream data rate higher than upstream) modes.

Figure 1: LHC TTC system.

Figure 2: A schematic representation of a Passive Optical Network

Figure 3: Optical link power budget diagram.

Figure 4: PON Demonstrator with one master (OLT) and two slave (ONU) nodes and 1km of fiber.

#### **III. PON DEMONSTRATOR**

The aim of this project is to construct a PON demonstrator able to distribute trigger and command data with deterministic latency and fixed jitter while allowing the ONUs to communicate with the OLT in real time.

#### A. System Requirements and Specifications

A PON for TTC applications should meet the following requirements:

- The system has to be able to deliver **synchronous** triggers and commands **continuously**
- Latency has to be fixed at both transmitting and receiving ends in the downstream direction
- A clock should be recovered from the downstream data with **low jitter**
- The system should provide the flexibility of either individually addressing ONUs or broadcasting to all of them
- ONUs have to be able to respond within a short time

| <b>Property (General)</b>             | PON Demonstrator                                       |
|---------------------------------------|--------------------------------------------------------|
| Clock rate                            | 40 MHz (i.e. LHC clock, 40.08 MHz)                     |
| Max distance                          | Up to 1000 m                                           |
| Encoding                              | NRZ 8b/10b                                             |
| Target BER                            | <10<sup>-12</sup>                                      |
| Splitting ratio                       | 64                                                     |
| Frame format                          | Commands + Trigger                                     |
| BW allocation algorithm               | Statistical multiplexing                               |
| <b>Property (Downstream / Upstream)</b> | <b>PON Demonstrator</b>                              |
| Bit rate                              | 1.6 Gb/s (down) / 800 Mb/s (up)                        |
| Latency                               | Fixed and deterministic (down) / To be determined (up) |
| Received clock jitter                 | Able to drive a high-speed SERDES                      |

The specifications of the system built are given in Table I. OLT and ONU transceivers were purchased from OESolutions [2] and were 1.25Gb/s EPON PX-20 standard compliant, while the logic of our system was implemented on a Virtex 5 FPGA by Xilinx [3]. Power budget calculations, Fig. 3, revealed that we could comfortably support 64 ONUs in our network over a 1km distance, and so we designed our protocol to be able to support that number of ONUs. However, due to restrictions on the number of evaluation boards and FPGAs we had at our disposal, we physically implemented a PON with 2 ONUs, Fig. 4, which was enough to allow us to demonstrate and test all desired features.
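The power budget argument can be illustrated with a short calculation. The sketch below is not a reproduction of Fig. 3: only the ideal 1:N splitting loss of 10·log10(N) dB is a general result, while the launch power, receiver sensitivity, fibre attenuation, splitter excess loss and connector loss are placeholder values assumed purely for illustration.

```python
import math


def splitter_loss_db(n_ports: int, excess_db: float = 0.0) -> float:
    """Ideal 1:N power-splitting loss (10*log10(N)) plus an optional excess loss."""
    return 10.0 * math.log10(n_ports) + excess_db


def link_margin_db(tx_dbm: float, rx_sens_dbm: float, n_ports: int,
                   fibre_km: float, fibre_db_per_km: float,
                   excess_db: float, connector_db: float) -> float:
    """Remaining margin after subtracting all losses from the available budget."""
    budget = tx_dbm - rx_sens_dbm
    losses = (splitter_loss_db(n_ports, excess_db)
              + fibre_km * fibre_db_per_km
              + connector_db)
    return budget - losses


# All numeric values below are hypothetical placeholders, not datasheet figures.
margin = link_margin_db(tx_dbm=2.0, rx_sens_dbm=-27.0, n_ports=64,
                        fibre_km=1.0, fibre_db_per_km=0.4,
                        excess_db=3.0, connector_db=1.0)
print(f"1:64 ideal splitting loss: {splitter_loss_db(64):.2f} dB")
print(f"margin with assumed values: {margin:.2f} dB")
```

With a 1:64 split the splitter dominates the loss over a 1 km link, which is why the splitting ratio, rather than the fibre length, sets the limit in a budget of this kind.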

#### IV. COMMUNICATION PROTOCOL

A feasibility study was conducted to evaluate the two commercial PON protocols, EPON (Ethernet-PON) and GPON (Gigabit-PON) [4]-[5], and their potential for use in our environment. The study concluded that neither of the commercial protocols would be able to deliver the triggers with the strict timing requirements of the LHC experiments, and so a custom protocol was devised that addresses the following requirements:

- Synchronous delivery of a periodic trigger with a 25ns period, (T) field in Fig. 5.
- Auxiliary field to extend or to protect the trigger field, (F) field in Fig. 5.
- Broadcast or individual commands to ONUs, (D1) and (D2) fields in Fig. 5.
- Arbitration of the upstream channel to avoid collisions due to simultaneous transmissions from multiple ONUs, (R) field in Fig. 5.

#### A. Downstream Frames

Downstream is the most important direction for the network synchronization. According to the developed custom protocol, superframes consisting of 65 subframes flow in the downstream direction, Fig. 5. The beginning of each superframe is signalled by a comma alignment character, $\langle \mathbf{K} \rangle$, which is used for synchronization and frame alignment. After the $\langle \mathbf{K} \rangle$ character, the transmission of the first subframe begins. The first field of the subframe, $\langle \mathbf{T} \rangle$, carries the trigger information and is 1 byte long to provide the flexibility of assigning different triggers. The second byte, $\langle \mathbf{F} \rangle$, is an auxiliary field that might be used either to extend the trigger field or to protect it by means of forward error correction.

Figure 5: (a) OLT→ONU downstream frame. Each field in the diagram corresponds to 1 byte. (b) Zoom into a D1 field to demonstrate how the distinction between broadcast and individually addressed ONUs is implemented.

Figure 6: Timing relationship between two successive ONU→OLT bursts

Figure 7: Upstream frame.

The last two characters in the subframe, $\langle D1 \rangle$ and $\langle D2 \rangle$, carry either commands intended for ONU1 only or commands broadcast to all ONUs. The transmission duration of the four bytes (**T**, **F**, **D1** and **D2**) in each subframe is 25ns at the 1.6Gb/s downstream rate, corresponding to exactly one trigger period. Once the first subframe is finished, the second subframe is transmitted back-to-back. The structure of the second subframe is identical to the first one, with the distinction that the D1 and D2 fields are now intended for ONU2 only, unless we operate in broadcast mode. The distinction between individually addressed commands and broadcast commands depends on the most significant bit (MSB) in the D1 field, Fig. 5 (b). If this bit is "0" then we have a broadcast command; if it is "1" then we have individual addressing. Sixty-four such subframes are sent downstream, as many as the number of supported ONUs, before the transmission of the last subframe that concludes the superframe. The 65th subframe is only 3 bytes long, to restore the symmetry in the superframe and to allow the first trigger in the next superframe to be exactly 25ns apart from the last one. An important feature of the 65<sup>th</sup> subframe is that it finishes with an $\langle \mathbf{R} \rangle$ character, which is used to arbitrate the occupation of the upstream channel, as will be explained in the next section.

Figure 8: (a) Oscilloscope traces of bursts with different power (b) burst mode *Rx* dynamic range as a function of interframe gap (IFG).

Figure 9: Waiting time between two successive transmissions from one ONU as a function of IFG.
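The frame arithmetic described above can be checked with a small sketch. It assumes that each byte occupies 10 line bits (8b/10b encoding, as listed in the specifications) and that the 3-byte 65th subframe consists of T, F and R; the text only states its length and that it ends with R, so that layout, like all function names here, is an assumption made for illustration.

```python
BIT_PERIOD_PS = 625          # one bit at 1.6 Gb/s lasts 625 ps
LINE_BITS_PER_BYTE = 10      # 8b/10b: each byte is sent as 10 line bits
BYTE_TIME_PS = BIT_PERIOD_PS * LINE_BITS_PER_BYTE
N_ONUS = 64


def build_superframe_layout(n_onus: int = N_ONUS) -> list:
    """Return one field name per byte for a single downstream superframe."""
    layout = ["K"]                                   # comma for frame alignment
    for onu in range(1, n_onus + 1):
        layout += ["T", "F", f"D1[{onu}]", f"D2[{onu}]"]
    layout += ["T", "F", "R"]                        # assumed content of the 65th subframe
    return layout


def trigger_times_ps(layout: list, n_superframes: int = 2) -> list:
    """Start time of every T byte over several back-to-back superframes."""
    stream = layout * n_superframes
    return [i * BYTE_TIME_PS for i, field in enumerate(stream) if field == "T"]


def is_broadcast(d1_byte: int) -> bool:
    """MSB of D1 is 0 for a broadcast command and 1 for individual addressing."""
    return (d1_byte & 0x80) == 0


layout = build_superframe_layout()
times = trigger_times_ps(layout)
gaps = {b - a for a, b in zip(times, times[1:])}
print(len(layout), "bytes per superframe")
print(len(layout) * BYTE_TIME_PS / 1000, "ns per superframe")
print("trigger spacings (ps):", gaps)
```

Under these assumptions the one-byte K character and the shortened last subframe together occupy exactly one 4-byte slot, which is what keeps the trigger spacing constant across the superframe boundary.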

#### B. Upstream Transmission and Frames

The  $\langle \mathbf{R} \rangle$  character carries the address of the next ONU to occupy the upstream channel. In the example shown in Fig. 6, an  $\langle \mathbf{R} \rangle$  character arrives that contains the address of ONU N1. Although the  $\langle \mathbf{R} \rangle$  character is received by all ONUs, only ONU N1 starts switching its laser on. After an initial period required for the laser power in ONU N1 to stabilise, it transmits its data in a predefined time window before switching its laser off. A gap without transmission, the interframe gap (IFG), is left between two successive transmissions from different ONUs to allow the burst mode receiver at the OLT to get ready to accept a new burst.

The upstream frame is shown in Fig. 7. It starts with a long transition-rich field (alternating 1s and 0s) to allow the burst mode receiver to recover the average transmission level and to set its decision threshold. It then contains a comma  $\langle \mathbf{K} \rangle$  character for frame alignment followed by the address of the ONU and the transmitted data. The IFG affects the amount of bandwidth that is available in the upstream for pure data transmission and is closely related to the dynamic range of the receiver, i.e. the maximum difference between the powers of two successive bursts for error-free operation. Figure 8 shows experimental results relating the IFG to the dynamic range. According to Fig. 8, the larger the power difference between two bursts arriving at the OLT Rx, the larger the IFG required to maintain errorless operation. It is therefore advisable to design a PON network whose branches are balanced in terms of optical loss, to keep the IFG as small as possible and thus to maximize the upstream bandwidth.

Figure 10: OLT transmitter implementation in FPGA.

Figure 11: ONU receiver implementation in FPGA.

Another important parameter in PON networks is the time that an ONU has to wait before it is able to occupy the transmission medium. Fig. 9 shows the waiting time between two consecutive transmissions from the same ONU as a function of the IFG and for different numbers of supported ONUs. The waiting time increases linearly with the IFG, which is another reason to prefer balanced PONs that require minimum IFGs. At the same time, as we add more ONUs to the system and increase the IFG, the available bandwidth per ONU for data transmission reduces. Figure 9 reveals an interesting trade-off: on the one hand, we want to design a network with as many client ONUs served by a single OLT as possible to reduce the cost of the system; on the other hand, the greater the number of supported ONUs, the longer the waiting time. A balance between cost and waiting time must therefore be found.
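The linear dependence on IFG and the trade-off with the number of ONUs can be reproduced with a very simple round-robin model. This is an assumed model for illustration, not the one used to produce Fig. 9: it supposes that every ONU is granted one fixed-length burst per polling cycle and that the burst length (laser turn-on, preamble, comma, address and data together) is a free parameter with a placeholder value.

```python
def cycle_time_ns(n_onus: int, burst_ns: float, ifg_ns: float) -> float:
    """Duration of one polling cycle in which every ONU transmits exactly once."""
    return n_onus * (burst_ns + ifg_ns)


def waiting_time_ns(n_onus: int, burst_ns: float, ifg_ns: float) -> float:
    """Time from the end of one burst to the start of the same ONU's next burst."""
    return cycle_time_ns(n_onus, burst_ns, ifg_ns) - burst_ns


def upstream_share(n_onus: int, burst_ns: float, ifg_ns: float) -> float:
    """Fraction of the upstream channel time during which one ONU transmits."""
    return burst_ns / cycle_time_ns(n_onus, burst_ns, ifg_ns)


BURST_NS = 1000.0  # hypothetical burst length, not a value from the paper

for n_onus in (2, 16, 64):
    for ifg_ns in (100.0, 200.0, 400.0):
        wait = waiting_time_ns(n_onus, BURST_NS, ifg_ns)
        share = upstream_share(n_onus, BURST_NS, ifg_ns)
        print(f"N={n_onus:2d} IFG={ifg_ns:5.0f} ns  wait={wait:8.0f} ns  share={share:.4f}")
```

In this model the waiting time grows with slope N per unit of IFG, so a larger IFG penalises a 64-ONU network far more than a 2-ONU one, while the per-ONU share of the channel shrinks with both parameters.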

#### V. TRANSCEIVER DESIGN IN VIRTEX 5 FPGA

This section introduces the transmitter and receiver designs for the upstream and downstream datastreams with emphasis given on the steps taken to achieve deterministic latency.



Figure 12: (a) Oscilloscope traces showing the phase difference between a reference clock (green line) and the parallel clock recovered by the Rx (blue line) for different barrel shifter values. (b) Relative delay between reference and recovered clocks as a function of the barrel shifter position for the two ONUs implemented in our system, for 200 test cases.

#### A. OLT Transmitter

The transmitter at the OLT is implemented based on the GTX transmitter of the Virtex 5, a more detailed description of which can be found in [3]. Latency issues at the Tx generally arise when data cross clock domains, such as the Tx-PCS and the Tx-PMA in our case (Fig. 10). These two domains are clocked by the TXUSRCLK and the XCLK respectively, two clocks that are not inherently phase aligned but have to be for the correct operation of the serializer block (PISO). The default method of phase aligning these two clocks is to use an elastic buffer (FIFO), which introduces a non-deterministic latency. Instead, we operate the GTX transmitter in advanced mode, where we completely bypass the elastic buffer and use the PMA PLL to adjust the phase of the XCLK so that it matches the phase of the TXUSRCLK. The total latency through the transmitter was measured to be 75ns.

#### B. ONU Receiver

The ONU receiver design is shown in Fig. 11. The 1.6Gbit/s serial datastream is presented at the input of a CDR (clock and data recovery) circuit. The CDR recovers the clock from the incoming bitstream, retimes the data and passes them on to the next stage, which is a serial-to-parallel circuit (SIPO). In addition, a divider generates the FPGA receiver parallel clock, which is also fed to the SIPO and which affects the time at which the parallel data leave the SIPO. The operation of the divider is the most vulnerable part of the receiver with regard to achieving deterministic latency. This is because the 80 MHz parallel clock can lock on any edge of the serial 800 MHz clock when the receiver is reset, introducing non-deterministic latency.

The latency issue that the divider introduces is solved by implementing a barrel shifter after the SIPO (Fig. 11). In order to identify the relative phase of the parallel clock compared to the serial clock, we take advantage of the  $\langle \mathbf{K} \rangle$ character in the downstream superframe and the fact that the order in which the parallel data exit from the parallel lines of the SIPO is also affected by the operation of the divider. To make this point more explicit, consider the hypothetical scenario where the parallel clock starts from the first edge of the serial clock: the first bit of the  $\langle \mathbf{K} \rangle$  character should then come out from the first parallel line of the SIPO, the second bit from the second line and so on. However, if the parallel clock is delayed compared to the first edge of the serial clock, then the first bit of the  $\langle \mathbf{K} \rangle$  character is transferred to a different output line of the SIPO. The job of the barrel shifter is to identify exactly which line the first bit of the  $\langle \mathbf{K} \rangle$  character came out from and to feed this information to a PLL to perform the phase correction task. This last phase correction step has not yet been implemented.

Figure 13: Burst mode receiver operating on oversampling mode.

Figure 12 demonstrates the operation of the barrel shifter concept by comparing the phase of a fixed reference clock with the phase of the clock recovered at the ONU for different barrel shifter values and for both ONUs supported by our system. The relative delay between reference and recovered clocks follows a linear trend. The slopes of the two lines are 606ps and 629ps per barrel shifter position for the two ONUs respectively, which agree, to within experimental error, with the expected value of 625ps that corresponds to the interval between two consecutive edges of the serial clock. Based on these measurements, the barrel shifter concept will allow us to correct the latency at the receiver.
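The comma-search idea behind the barrel shifter can be sketched in software. The snippet below is a behavioural illustration only, not the FPGA implementation: it treats the SIPO output as a bit string, assumes a 20-bit parallel word (1.6 Gb/s divided by the 80 MHz parallel clock) and assumes that the comma is the 8b/10b K28.5 code; the function names are hypothetical.

```python
from typing import Optional

# The two running-disparity forms of the 8b/10b K28.5 comma (assumed comma character).
K28_5_CODES = ("0011111010", "1100000101")
WORD_BITS = 20        # assumed parallel word width: 1.6 Gb/s / 80 MHz
BIT_PERIOD_PS = 625   # expected delay step per barrel shifter position


def find_comma_offset(bits: str) -> Optional[int]:
    """Return the line index (0..WORD_BITS-1) on which the first comma bit appears."""
    hits = [bits.find(code) for code in K28_5_CODES]
    hits = [h for h in hits if h >= 0]
    if not hits:
        return None
    return min(hits) % WORD_BITS


def realign(bits: str, offset: int) -> str:
    """Barrel-shift the stream so that the comma starts on line 0."""
    return bits[offset:]


def phase_correction_ps(offset: int) -> int:
    """Delay that corresponds to the detected barrel shifter position."""
    return offset * BIT_PERIOD_PS


# A stream whose comma is displaced by 7 bit positions after a receiver reset.
stream = "0101010" + K28_5_CODES[0] + "0101010101"
offset = find_comma_offset(stream)
print("comma found on line", offset)
print("required phase correction:", phase_correction_ps(offset), "ps")
```

The detected offset is the quantity that would be handed to the PLL for phase correction; each position corresponds to one 625 ps step, in line with the slopes measured in Fig. 12.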

#### C. OLT Burst Mode Receiver

The burst mode receiver in the OLT, Fig. 13 (a), requires a 5x oversampling circuit. Burst mode oversampling works by blindly sampling the incoming datastream at a multiple of the bit rate and making a decision based on the sample that is closest to the center of the bit [6]. This method is preferred over the usual implementations that use PLLs to recover the clock, since PLLs typically have a large time constant and are therefore impractical in high speed serial applications that involve bursts. The oversampling circuit generates 5 samples for each received bit (Fig. 13 (b)) and then tries to identify the transition region between bits. It is therefore important to provide a sufficient number of transitions in the datastream, a requirement satisfied by the long <5555> field transmitted in our upstream frame (Fig. 7). A decision circuit collects all samples from a predefined window of incoming bits and implements a majority voting algorithm to identify the sample which is most likely to be closest to the center of the bit. If a burst from a second ONU arrives then it will be out of phase with the previous burst, Fig. 12 (b). In this case, the decision circuit will identify the new transition regions and adjust its decision sample.
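A minimal software sketch of blind oversampling with majority voting is given below. It illustrates the principle under simplifying assumptions (noise-free samples, exactly 5 samples per bit, one vote per observed transition) and is not the FPGA decision circuit; the function names are hypothetical.

```python
from collections import Counter

OVERSAMPLING = 5


def oversample(bits, factor=OVERSAMPLING, skew=0):
    """Repeat each bit `factor` times and drop `skew` samples to mimic an unknown phase."""
    samples = [b for b in bits for _ in range(factor)]
    return samples[skew:]


def pick_sampling_phase(samples, factor=OVERSAMPLING):
    """Vote on where bit boundaries fall and return the phase nearest the bit centre."""
    votes = Counter((i + 1) % factor
                    for i in range(len(samples) - 1)
                    if samples[i] != samples[i + 1])
    if not votes:
        raise ValueError("no transitions in window; cannot locate bit boundaries")
    boundary_phase, _ = votes.most_common(1)[0]
    return (boundary_phase + factor // 2) % factor


def recover_bits(samples, factor=OVERSAMPLING):
    """Keep one sample per bit, taken at the phase chosen by the vote."""
    phase = pick_sampling_phase(samples, factor)
    return samples[phase::factor]


# Transition-rich preamble (alternating bits, as in the <5555> field) followed by data.
bits = [0, 1] * 8 + [1, 1, 0, 0, 1, 0, 1, 1]
burst_a = oversample(bits, skew=0)   # burst from one ONU
burst_b = oversample(bits, skew=3)   # burst from another ONU, arriving out of phase
print(recover_bits(burst_a) == bits)
print(recover_bits(burst_b) == bits[1:])  # first bit is only partially received
```

Because the vote is recomputed for each window, a burst that arrives with a different phase simply moves the winning boundary position and the decision sample follows it, without any PLL re-locking.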

#### VI. FUTURE DEVELOPMENTS

In order to complete our demonstrator system we will carry out the following implementations.

- The system will migrate onto two FPGA platforms, one for the OLT and one for the ONUs.
- The Barrel shifter position will be used to feed a PLL in order for the latency of the receiver at the ONU to become constant.
- The jitter currently measured on the recovered parallel clock at the ONU is 166ps pk-to-pk and 36ps RMS, which is worse than our specifications. An external PLL will be used to clean the jitter from the recovered clock.

#### VII. CONCLUSIONS

Our work has shown that bidirectional optical links based on Passive Optical Networks are excellent candidates for future TTC distribution systems. Optical links with fixed latency in the downstream direction and potentially low jitter were demonstrated, while at the same time information was allowed to flow in the opposite direction through the same optical fiber. In future systems a ranging mechanism might be implemented through which the round trip time between the OLT and each ONU can be calculated. In this case, we need to ensure that the latency in the upstream direction is deterministic as well.

#### VIII. ACKNOWLEDGEMENT

We would like to thank the European Commission and the ACEOLE project for support through a Marie Curie fellowship.

#### IX. REFERENCES

- [1] J. Troska, E. Corrin, Y. Kojevnikov, T. Rohlev, and J. Varela, "Implementation of the Timing, Trigger and Control System of the CMS Experiment," *IEEE Trans. Nucl. Sci.*, vol. 53, pp. 834-837, Jun 2006.
- [2] <u>www.oesolutions.com</u>.
- [3] Virtex 5 GTX transceiver. Available online: http://www.xilinx.com/support/documentation/data\_sheets/ds022-1.pdf.
- [4] EPON Standard is part of the Ethernet IEEE802.3 standard and is available online from IEEE. http://standards.ieee.org/getieee802/802.3.html.
- [5] GPON is the ITU-T G.984 standard available online from ITU. http://www.itu.int/rec/T-REC-G/e.
- [6] J. Kim, and D.-K. Jeong, "Multi-Gigabit-Rate Clock and Data Recovery Based on Blind Oversampling," *IEEE Comm. Magazine*, Vol. 41, pp. 68-74, Dec. 2003.

# Thursday 24 September 2009 **TOPICAL**

# Low Power SoC Design

#### **Christian Piguet**

CSEM, Neuchâtel, Switzerland

#### Abstract

The design of Low Power Systems-on-Chips (SoC) in very deep submicron technologies has become a very complex task that has to bridge very high level system description with low-level considerations due to technology defects and variations and increasing system and circuit complexity. This paper describes the major low-level issues, such as dynamic and static power consumption, temperature, technology variations, interconnect, DFM, reliability and yield, and their impact on high-level design, such as the design of multi-Vdd, fault-tolerant, redundant or adaptive chip architectures. Several very low power SoCs will be presented in three domains: wireless sensor networks, vision sensors and mobile TV.

#### I. INTRODUCTION

With the introduction of very deep submicron technologies as low as 45 nanometers and tomorrow down to 32 and even 22 nanometers, integrated circuit (IC) designers have to face two major challenges: first, they have to take into account a dramatic increase in complexity due to the number of components including multi-core processors ("More Moore") but also due to the significant increase in heterogeneity ("More than Moore"). Secondly, the significant decrease in reliability of the components needs to be taken into account, in particular with the behavior of switches that are very sensitive to technology variations, temperature effects and environmental conditions.



Figure 1. Problems in SoC Design

Figure 1 shows a list of many problems that are present today in the design of SoCs. The trend, given by the European JTI ARTEMIS platform initiative, is to describe SoC behavior at increasingly higher levels in order to reduce time to market. The gap with the low level effects inherent to very deep submicron technologies is widening, as one has to take into account more and more effects such as process variations, leakage, temperature and interconnect delays, while not impacting yield and cost, which is the focus of the European JTI ENIAC platform initiative. In addition, the "More than Moore" impact is a new low-level issue, with the introduction of MEMS and NEMS on top of chips with specific packaging constraints, extending the SoC complexity with SiP issues. Furthermore, power management and increased levels of autonomy are more than ever a main issue, and call for complex management blocks that can accommodate a variety of sources ranging from batteries to energy scavengers. For these reasons, the relationships between all these design aspects become very complex and clearly need to be addressed using interdisciplinary approaches, and this is the essence of heterogeneous SoC design.

The relationships between these design aspects are very complex, and the necessary design methodologies become extremely interdisciplinary. Designers are forced to go higher and higher in the abstraction levels, as proposed in the ARTEMIS platform. However, they are also forced to go lower and lower, as proposed in the ENIAC platform. The result is a huge and ever-widening gap between the two.

#### II. INTERDEPENDENCY FROM LOW LEVEL TOWARDS HIGH LEVEL

The interdependency between low level issues, mainly due to very deep submicron technologies, and high-level issues related to SoC design is a major design issue today. The gap between low level and high level keeps growing, with the risk that high level designers could totally ignore low level effects and produce non-working SoCs. Leakage power, technology variations, temperature effects, interconnect delay, design for manufacturability, yield, and tomorrow's unknown "beyond CMOS" devices (ENIAC) are the main low level design aspects that have to be shifted to the high level synthesis. They will impact the high level design methodologies (ARTEMIS), for instance by rethinking the clocking scheme of processor architectures, by introducing redundancy and fault-tolerance, by increasing the number of processor cores, by using multi-voltage domains or by using more dedicated techniques to reduce dynamic and static power. An example of the strong impact of the low level on high level design is interconnect delays. They increase due to the ever smaller cross-section of the wires distributing the clock. Alternative architectures are therefore clockless asynchronous architectures, multicore architectures, or architectures organized as GALS (Globally Asynchronous and Locally Synchronous) and using Networks-on-Chips.

#### A. Dynamic Power

Many techniques have been proposed (and some are widely used today) for reducing dynamic power. A non-exhaustive list includes gated clocks, logic parallelization, activity reduction, asynchronous and adiabatic logic, bus encoding, standard cell libraries, complex gate decomposition and transistor sizing. The gated clock technique, which cuts the clock when a unit is idle, is widely used. Parallelism has a strong impact on high level design. Working with many parallel cores or execution units at low supply voltage is always beneficial for the dynamic power. However, it is another story for leakage, due to the significant increase in the number of transistors.



Figure 2. Datapath Parallelization

Circuit parallelization has been proposed to maintain, at a reduced Vdd, the throughput of logic modules that are placed on the critical path [1, 2]. It can be achieved with M parallel units clocked at f/M. Results are provided at the nominal frequency f through an output multiplexer (Fig. 2). Each unit can compute its result in a time slot M times longer, and can therefore be supplied at a reduced supply voltage. If the units are data paths or processors [2], the latter have to be duplicated, resulting in an M times area and switched capacitance increase. Applying the well-known dynamic power formula, one can write:

$$P = M \cdot C \cdot \frac{f}{M} \cdot V_{dd}^2 = C \cdot f \cdot V_{dd}^2$$

So the dynamic power is reduced as Vdd can be reduced due to the M times longer clock period.
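The formula above can be checked numerically. The following is a minimal sketch, not the author's method: the capacitance, frequency and supply values are illustrative assumptions, and the reduced Vdd chosen for the parallel case is hypothetical (how far Vdd can actually be lowered depends on the delay versus voltage characteristic of the technology).

```python
def dynamic_power(c_farads, f_hz, vdd_volts, m_units=1):
    """Dynamic power of M parallel units, each clocked at f/M (P = M*C*(f/M)*Vdd^2)."""
    return m_units * c_farads * (f_hz / m_units) * vdd_volts ** 2

# Illustrative values only (assumptions, not taken from the paper).
C = 1e-9   # switched capacitance of one unit, in farads
F = 100e6  # nominal throughput frequency, in hertz

# One unit at an assumed nominal supply of 1.2 V.
p_single = dynamic_power(C, F, vdd_volts=1.2)
# Four units at f/4, with a hypothetical reduced supply of 0.8 V.
p_parallel = dynamic_power(C, F, vdd_volts=0.8, m_units=4)

print(f"single unit: {p_single * 1e3:.1f} mW")
print(f"four units : {p_parallel * 1e3:.1f} mW")
print(f"ratio      : {p_parallel / p_single:.3f}")
```

At equal Vdd the M factors cancel, so the whole saving comes from the square of the supply voltage ratio.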

#### B. Impact of Leakage on Architectures

The leakage current of switches when they are off is becoming a very serious problem regardless of the technology, be it CMOS, carbon nanotubes (CNT) or nanowires. Leakage power increases exponentially with decreasing threshold voltage VT, implying that a significant part of the total power can be leakage for large SoCs. The wasted power depends strongly on external conditions, such as the chosen technology, the values of VT, the duty cycle defined by the application, etc. There are many techniques [3] at low level and circuit level for reducing leakage, such as sleep transistors that cut the supply voltage of idle blocks, multiple VT's, stacked transistors, or body biasing.

In addition to circuit-level techniques, the total power consumption can also be reduced at the architectural level. Specific blocks can be operated at the optimal supply voltage (a reduced Vdd reduces dynamic power) and optimal VT (a larger VT reduces static power) for a given speed, in order to find the lowest total power (Ptot) for the architecture of a given logic block. Among all the combinations of Vdd/VT guaranteeing the desired speed, only one pair results in the lowest power consumption [4, 5]. The identification of this optimal working point and of its associated total power consumption is tightly related to architectural and technology parameters. This optimal point depends on activity (a) and logical depth (LD). An activity that is not too small is preferred, so that dynamic power is not negligible compared to static power. A small LD is preferred, as too many logic gates in series result in gates that do not switch sufficiently. A gate that does not switch is useless, as it is only a leaky gate. The ratio between dynamic and static power is thus an interesting figure of merit, and it is linked to the Ion/Ioff ratio of the technology. This ratio is getting smaller and smaller due to leaky transistors. In [4], this ratio is related to the activity (a) and the logical depth (LD) by the following formula:

$$I_{on}/I_{off} = k_1 \cdot LD / a$$



Figure 3. Multiplier Architectures

With a small Ion/Ioff ratio (100 to 500), it can be observed that LD has to be small and the activity quite large. This implies a clear paradigm shift, as activity has until now been a main factor to reduce, because only dynamic power was considered. When both static and dynamic power are considered, the activity should not be as small as possible, as very inactive gates or transistors are merely leaky devices. The parameter k1 is the ratio between dynamic and static power; it is roughly between 1 and 5. This optimal power has been estimated for eleven 16-bit multiplier architectures. Figure 3 shows that overly sequential multiplier architectures (at the right of the picture) present a very large total power because they are not fast enough and consequently have to be operated at high Vdd (large dynamic power) and very low VT (large static power). Conversely, reasonably parallel multiplier architectures such as the Wallace tree present the best total power. However, if the parallelism is increased too much (2 or 4 Wallace trees in parallel), leakage power and total power increase again, even at very low Vdd and high VT, due to the very large number of logic gates.
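To make the trade-off between logical depth and activity concrete, the relation Ion/Ioff = k1 · LD / a can be rearranged for LD. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the numerical values are only picked from the ranges quoted above (Ion/Ioff between 100 and 500, k1 between 1 and 5), not from the multiplier study.

```python
def required_ion_ioff(k1, logical_depth, activity):
    """Ion/Ioff ratio associated with a given logical depth and activity."""
    return k1 * logical_depth / activity

def matching_logical_depth(ion_ioff, k1, activity):
    """Logical depth compatible with a given Ion/Ioff ratio and activity."""
    return ion_ioff * activity / k1

# Illustrative values: a leaky technology (Ion/Ioff = 500) with k1 = 5.
for activity in (0.01, 0.05, 0.1):
    ld = matching_logical_depth(ion_ioff=500, k1=5, activity=activity)
    print(f"activity = {activity:4.2f} -> logical depth = {ld:.1f}")
```

For a fixed Ion/Ioff, the compatible logical depth scales linearly with the activity, which is why a very low activity forces a very shallow logic.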

#### C. Interconnect Delays

Wire delays are a main issue: for every technology node with a reduction factor S, the wire delay is increased by a factor $S^2$. This is a severe problem for busses, and an even more severe one for clock distribution.
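The compounding effect of this $S^2$ factor over several technology nodes can be illustrated as follows. This is a sketch only: the value S = 1.4 per node is an assumption made for the example, and the function name is hypothetical.

```python
def wire_delay_factor(s, nodes):
    """Cumulative wire-delay increase after `nodes` technology nodes (S^2 per node)."""
    return (s ** 2) ** nodes

# Assumed reduction factor per node (illustrative, not from the paper).
S = 1.4
for n in range(1, 5):
    print(f"after {n} node(s): wire delay x {wire_delay_factor(S, n):.2f}")
```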

Consequently, the influence on architectures is large: everything could be clockless (asynchronous) or GALS (Globally Asynchronous and Locally Synchronous). Any architecture becomes an array of N\*N zones (isochronous), so it leads naturally to multicore architectures and to massive parallelism with very difficult synchronization problems. For such architectures, it is mandatory to consider NoC (Network-on-Chip) for designing efficient complex SoCs.

#### D. Process Variations

On the same die, there are technology variations from transistor to transistor, which can be systematic or random and are due to oxide thickness variations, small differences in the W and L transistor dimensions, doping variations, temperature and Vdd variations. Many of these variations impact VT, which can change delay by a factor of 1.5 and leakage by a factor of 20. Other effects, such as soft errors, must not be neglected. Overall, these effects have a very strong impact on yield and consequently on the fabrication cost of the circuits. In addition to their low-level impact, the variations described above also affect higher levels. An interesting consequence is that multi-core architectures, at the same throughput, mitigate technology variations better than single-core architectures. With a multi-core architecture, one can work at a lower frequency for the same computation throughput. Consequently, the processor cores (at lower frequencies) are less sensitive to the effect of process variations on delay. At very high frequency, even a very small VT variation has a quite large impact on delay variation.

For technologies above 100 nm, Adaptive Body Biasing (ABB) is a good technique for compensating the variations [6, 7]. Since ABB changes the $V_T$ value directly, it can control both leakage and delay, and its overhead is small. The technique nevertheless has three important weaknesses. First, using ABB to compensate intra-die variations of NMOS transistors needs a triple-well technology. Second, the increased short-channel effect due to scaling has drastically decreased the body factor of bulk CMOS. According to foundry data, in 65-nm technology ABB can effectively change the $V_T$ value by less than 60 mV, which is much less than the effects of process variations (PV) and temperature. Third, the body factor is almost zero in emerging multi-gate devices, which are promising candidates for future electronics [9]. In multi-gate devices (double-gate FinFET, tri-gate, gate-all-around or GAA), the body factor is much smaller than in single-gate devices because of the enhanced coupling between gate and channel. Measurements in [8] show that in GAA devices the body factor is exactly zero. New compensation techniques are therefore needed to replace ABB.

Looking at standard cell libraries and digital block design, some rules can be given regarding technology variations. Resistance to technology variations is better with long critical paths, as the variations are better compensated with a large number of cells connected in series. For the same logic function, a way to have more cells in a given critical path is to provide a standard cell library with few simple cells, as shown in [10]: "It can be shown that with a small set of Boolean functions ... (and careful selection of lithography friendly patterns)...we mitigate technology variations". For designing a digital block architecture, one can ask the following question: for a full adder, which are the best architecture (ripple carry, carry look-ahead, etc.) and Vdd for reducing the effect of technology variations? A ripple carry adder at 500 mV provides the same speed and the same power as a carry look-ahead adder at 400 mV, with half the sensitivity to PV. Using low-power slow circuits at a higher Vdd is better than using high-power fast circuits at a lower Vdd!

PCMOS, or Probabilistic CMOS, is a very promising new technique [11]. It is based on the fact that each logic gate has a probability of failure. It characterizes an explicit relationship between the probability (p) that a CMOS switch computes correctly and the energy consumed by each switching step, across technology generations. Each basic logic gate (NOT, NAND, NOR) has a given probability of providing the correct result for a given input. For instance, a truth table may indicate that for input 100 (correct output "0"), the probability for the output to be "1" is 1/4 while the probability for the output to be "0" is 3/4. When such basic gates are used to synthesize more complex functions (adders, flip-flops, etc.), the schematic chosen among the many that perform the same function is the one that minimizes the probability of failure.

Logic circuits based on transistors operated in weak inversion (also called subthreshold) offer the minimum possible operating voltage [12], and thereby the minimum Pdyn for a given Pstat. This technique has been revived recently and applied to complete subsystems operated below 200 mV. It has been demonstrated that minimal-energy circuits are those operated in the subthreshold regime with Vdd below VT, resulting in lower frequencies and longer clock periods. Dynamic power and static power are both reduced, but the static energy increases because more time is required to execute the logic function, so there is an optimum in energy. This optimal energy also depends on logical depth and activity factor [13]. The minimal Vdd (and minimal energy) is smaller for small logical depth and for large activity factors. Reference [14] shows this optimum for Vdd = 0.4 V with VT at 0.4 V.
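The existence of an energy optimum, and its dependence on activity and logical depth, can be reproduced with a first-order toy model. This is a sketch under explicit assumptions and not the model of [13] or [14]: it assumes ideal weak inversion where Ion/Ioff = exp(Vdd / (n·UT)), a slope factor n = 1.5, a cycle time equal to LD gate delays, and normalized capacitance. The search is restricted to Vdd ≥ 0.12 V because the model has no notion of a minimum functional voltage. All names and values are illustrative.

```python
import math

# Assumed slope factor n = 1.5 times the thermal voltage UT (about 26 mV).
N_UT = 1.5 * 0.026

def energy_per_cycle(vdd, activity, logical_depth, c=1.0):
    """Normalized energy per gate and per clock cycle in the toy model."""
    dynamic = activity * c * vdd ** 2
    # Leakage energy = Ioff * Vdd * cycle time, with cycle time = LD * C * Vdd / Ion
    # and Ion / Ioff = exp(Vdd / (n * UT)) in weak inversion.
    leakage = logical_depth * c * vdd ** 2 * math.exp(-vdd / N_UT)
    return dynamic + leakage

def optimal_vdd(activity, logical_depth, v_min=0.12, v_max=0.60, steps=4801):
    """Grid search for the supply voltage that minimizes energy per cycle."""
    grid = [v_min + i * (v_max - v_min) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda v: energy_per_cycle(v, activity, logical_depth))

for activity, depth in ((0.1, 20), (0.2, 20), (0.1, 10)):
    v_opt = optimal_vdd(activity, depth)
    print(f"a = {activity}, LD = {depth}: optimal Vdd = {v_opt * 1e3:.0f} mV")
```

Within this model, a larger activity or a smaller logical depth moves the optimum to a lower Vdd, which is the trend stated above; the absolute voltages depend entirely on the assumed parameters.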

Another approach is to introduce spatial or timing redundancy for implementing fault-tolerant architectures. It is a paradigm shift: a system is no longer assumed to be composed of reliable units, and one has to consider that any unit could fail without causing the entire system to fail. A possible architecture is to use massive parallelism with redundant units that can take over the work of faulty units. One can have spatial redundancy (very expensive) or timing redundancy (quite expensive in terms of throughput). However, all redundant architectures face the same problem: the overhead in hardware or in throughput is huge, which runs counter to energy-efficient architectures. One way of limiting the hardware overhead is to compare the results of a given operation executed in two different time frames. But as the same operation is executed twice, the throughput is reduced by a factor of 2.
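The timing-redundancy scheme described above (execute twice, compare) can be sketched in software. This is an illustration only: `run_with_time_redundancy` and `FlakyAdder` are hypothetical names, the retry policy is an assumption, and the fault model is a single transient bit flip injected on chosen calls. A fault that corrupts both executions identically would go undetected.

```python
def run_with_time_redundancy(operation, *args, max_attempts=3):
    """Run `operation` twice per attempt and accept the result only if both runs agree."""
    for _ in range(max_attempts):
        first = operation(*args)
        second = operation(*args)
        if first == second:
            return first
    raise RuntimeError("no two consecutive executions agreed")

class FlakyAdder:
    """Hypothetical unit that returns a wrong sum on selected call numbers."""

    def __init__(self, faulty_calls):
        self.faulty_calls = set(faulty_calls)
        self.calls = 0

    def __call__(self, a, b):
        self.calls += 1
        result = a + b
        if self.calls in self.faulty_calls:
            result ^= 1  # flip the least significant bit to model a transient fault
        return result

adder = FlakyAdder(faulty_calls={2})
result = run_with_time_redundancy(adder, 20, 22)
print(result, "obtained after", adder.calls, "executions")
```

Even without any fault, every result costs two executions, which is the factor-of-2 throughput penalty mentioned in the text.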

#### E. Yield and DFM

For very deep submicron technologies, the smallest dimensions of transistor geometries on the mask set are well below the lithographic light wavelength. This yields a variety of unwanted effects, such as poor line-end extension, missing small geometries, etc. They can be corrected by OPC (Optical Proximity Correction), a technique available for DFM (Design For Manufacturability). However, to facilitate the process of mask correction by OPC, it is recommended to have a regular circuit layout. Regular arrays implementing combinational circuits, like PLA or ROM memories, are therefore more and more attractive. Figure 4 shows three examples of regular layout. A first example dating back to 1988 [15], called gate-matrix style, is shown at the right of Fig. 4 in a micron-scale technology. It was used to facilitate automatic layout generation. The two other pictures show an SRAM cell and nanowires [16], for which very regular structures are mandatory. This has a huge impact on architectures and systems: SoC architectures should be based on regular arrays and structures, such as PLAs and ROMs for combinational circuits and volatile memories such as SRAM for data storage. Consequently, SoC design should be fully dominated by memories and array structures.



Figure 4. Regular layouts

#### F. Alternative Energy Sources

SoCs used in portable devices may be powered by a variety of energy sources, and sometimes energy will be scavenged from the environment. Primary or rechargeable batteries may be used, and ultimately miniature fuel cells. To implement energy scavenging, one could use vibrations, thermoelectricity, solar cells, human energy sources, etc. Considering the SoC itself, multiple supply voltages with very diverse peak currents (some µA, some mA, up to 10 or 100 mA) have to be generated inside the chip. This requires "Power Management" circuits (DC-DC converters, regulators) that may be very complicated, in particular for the high-efficiency implementations required by low-power applications. On top of this, DVS and DVFS (Dynamic Voltage Frequency Scaling) have to be added. It turns out that the power management circuit has to manage many aspects, i.e. the energy sources, the multiple supply voltages that have to be generated, DVFS as well as idle modes, resulting in a complex control that is most of the time performed in software by the Operating System. In addition, this part of the embedded software has to interact with the application embedded software, which increases the overall complexity.

#### G. Complexity

With technology scaling, more and more low-level effects have to be taken into account. Consequently, the impact of these low-level effects on the high-level SoC synthesis process is increasingly difficult to understand and to take into account. Only the low-level effects have been presented here, but there are also high-level effects that have to be taken into account at low level, such as architectures for efficiently executing a given language, asynchronous architectures requiring special standard cell libraries, or compilers parallelizing code onto N processors, together with their constraints on the processor architectures.

#### III. HETEROGENEOUS SOC EXAMPLES

This section shows some SoC examples designed at CSEM for research projects or for industrial customers. These circuits are extremely low-power SoCs for radio communication, image recognition or mobile TV applications. The first SoC, called WiseNET [17], is a circuit designed to support radio communication; it has been leveraged and industrialized into a home security application for an industrial customer. The second SoC is a vision sensor integrated with the processor and memory on the same chip. The third SoC has been designed by a Swiss company named Abilis, using a CSEM DSP core. The fourth SoC is a radio communication circuit using a powerful CSEM processor core.

#### A. WiseNET SoC



Figure 5. WiseNET SoC

The WiseNET SoC contains an ultra-low-power dual-band radio transceiver (for the 434 MHz and 868 MHz ISM bands), a sensor interface with a signal conditioner and two analog-to-digital converters, a digital control unit based on a CoolRISC microcontroller with low-leakage SRAM memories, and a power management block. In terms of power consumption, the most critical block is the RF transceiver. In a 0.18 µm standard digital CMOS process, the radio consumes 2.3 mA at 1.0 V in receive mode and 27 mA in transmit mode for 10 dBm of emitted power. However, as the duty cycle of any WSN application is very low, a relay sensor node using the WiseNET transceiver with the WiseMAC protocol [18] consumes about 25 microwatts when forwarding 56-byte packets every 100 seconds, enabling several years of autonomy from a single 1.5 V AA alkaline cell. Figure 5 shows the integrated WiseNET SoC.
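The autonomy claim can be sanity-checked with a back-of-the-envelope energy budget. This is a rough sketch: the cell capacity of 2.5 Ah is an assumed typical figure for an AA alkaline cell at low drain and is not given in the paper, and self-discharge, voltage droop and converter losses are ignored, so the result is an optimistic upper bound.

```python
def autonomy_years(avg_power_w, cell_voltage_v, capacity_ah):
    """Ideal autonomy of a node drawing a constant average power from one cell."""
    energy_wh = cell_voltage_v * capacity_ah
    hours = energy_wh / avg_power_w
    return hours / (24 * 365)

AVG_POWER_W = 25e-6        # relay node average power, from the text
CELL_VOLTAGE_V = 1.5       # AA alkaline cell, from the text
ASSUMED_CAPACITY_AH = 2.5  # assumption: typical low-drain AA alkaline capacity

years = autonomy_years(AVG_POWER_W, CELL_VOLTAGE_V, ASSUMED_CAPACITY_AH)
print(f"ideal autonomy: about {years:.0f} years")
```

Under these assumptions the ideal figure exceeds a decade, so in practice the battery shelf life, rather than the node consumption, is likely to be the limiting factor.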

#### B. Vision Sensor SoC

Icycam is a circuit combining on the same chip a CSEM 32-bit icyflex 1 processor [19] operated at 50 MHz and a versatile high-dynamic-range pixel array, integrated in a 0.18 µm optical process.



Figure 6. icycam SoC

Icycam has been developed to address vision tasks in fields such as surveillance, automotive, optical character recognition and industrial control. It can be programmed in assembler or C code to implement vision algorithms and control tasks. The icyflex 1 processor communicates with the pixel array, the on-chip SRAM and peripherals via a 64-bit internal data bus. The pixel array has a resolution of 320 by 240 pixels (QVGA), with a pixel pitch of 14 µm. Its digital-domain pixel-level logarithmic compression makes it a low-noise logarithmic sensor with close to 7 decades of intra-scene dynamic range encoded on a 10-bit data word. The local contrast magnitude (relative change of illumination between neighbouring pixels) and direction can be extracted on the fly when data are transferred from the pixel array to the memory. It thus offers a data representation that facilitates image analysis, without overhead in terms of processing time. Data transfer between the pixel array and memory or peripherals is performed in groups of 4 (10 bits per pixel) or 8 (8 bits per pixel) pixels in parallel at the system clock rate. These image data can be processed with the icyflex's Data Processing Unit (DPU), which has been complemented with a Graphical Processing Unit (GPU) tailored for vision algorithms, able to perform simple arithmetic operations on 8- or 16-bit data grouped in a 64-bit word. As internal SRAM is area-consuming, the internal data and program memory space is limited to 128 KBytes. This memory range can be extended with an external SDRAM of up to 32 MBytes. The chip has been integrated and is pictured in Figure 6.
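The figures above allow a quick estimate of the frame size and of the time needed to move one frame out of the pixel array. This is a sketch under assumptions: the system clock is taken equal to the 50 MHz processor clock, pixels are assumed to be stored packed without padding, and one group of pixels is assumed to be transferred on every clock cycle. None of these details is confirmed by the text.

```python
WIDTH, HEIGHT = 320, 240        # pixel array resolution, from the text
ASSUMED_CLOCK_HZ = 50_000_000   # assumption: system clock = 50 MHz processor clock

def frame_bytes(bits_per_pixel):
    """Size of one frame, assuming pixels are packed without padding."""
    return WIDTH * HEIGHT * bits_per_pixel // 8

def transfer_time_s(pixels_per_cycle, clock_hz=ASSUMED_CLOCK_HZ):
    """Time to transfer one frame, assuming one pixel group per clock cycle."""
    cycles = WIDTH * HEIGHT // pixels_per_cycle
    return cycles / clock_hz

for bits, group in ((10, 4), (8, 8)):
    size_kib = frame_bytes(bits) / 1024
    time_us = transfer_time_s(group) * 1e6
    print(f"{bits} bits/pixel, {group} pixels/cycle: "
          f"{size_kib:.2f} KiB per frame, {time_us:.0f} us per transfer")
```

Under these assumptions a packed 10-bit frame occupies a large share of the 128 KBytes shared by data and program, which is consistent with the option of extending the memory with external SDRAM.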

#### C. Mobile TV SoC

CSEM has licensed a DSP core (called MACGIC [20]) to Abilis [21], a Swiss company of the Kudelski group. This DSP core has been used in a SoC for broadband communication in a wireless multipath environment using Orthogonal Frequency Division Multiplexing (OFDM).



Figure 7. Abilis SoC

The SoC developed by Abilis (Fig. 7) is an OFDM digital TV receiver for the European DVB-T/H standards. It contains a multi-band analog RF tuner, immediately followed by an analog-to-digital converter (ADC) and a digital front-end implementing time-domain filtering and I/Q channel mismatch correction. Several algorithms are executed on chip, such as mismatch correction, Fast Fourier Transform (FFT), equalization, symbol de-mapping and de-interleaving, and forward error correction (FEC) through a Viterbi decoder, a de-interleaver and a Reed-Solomon decoder. The main algorithms implemented by the software-programmable OFDM demodulator are frequency compensation, the FFT and adaptive channel estimation/equalization. Abilis has designed a 90 nm single-die digital mobile TV receiver platform (Fig. 7), from which two different chips, the AS-101 and AS-102, have been developed (for DVB-T/H applications). The programmable OFDM demodulator is implemented as a set of three CSEM MACGIC DSPs customized for OFDM applications (Fig. 8). The SoC also contains an ARC 32-bit RISC core as well as four hardware accelerators (RS decoder, Viterbi decoder, de-interleaver, PID filter).



Figure 8. CSEM MACGIC Test Chip

#### D. SoC for RF Applications

The icycom SoC has been designed for radio communication in surveillance applications and high-end wireless sensor networks.



Figure 9. icycom SoC

The chip is based on a CSEM icyflex1 DSP that runs at up to 3.2 MHz [19]. Its average dynamic power is 120 µW/MHz at 1.0 V. The radio covers the 865 to 915 MHz RF band and uses FSK-type modulation schemes, including MSK, GFSK, 4FSK, OOK and OQPSK. In transmit mode (TX), the emitted power is 10 dBm. In receive mode (RX), the sensitivity is -105 dBm at 200 kb/s.

The power management circuits provide power supplies for external devices and operate from single alkaline or lithium cells. There are many low-power and standby modes. The chip contains a 10-bit ADC, 96 KBytes of SRAM (with BIST), DMA, RTC, timers, a watchdog, I2C, SPI, I2S, GPIO, UART and JTAG. It has been integrated (Fig. 9) in a TSMC 180 nm generic technology.

#### IV. DISRUPTIVE ARCHITECTURES AND SYSTEMS?

By looking at various roadmaps, the end of CMOS "scaling" is predicted at around 11 nanometers, around 2013 to 2017. One could conclude that after 2017 we should move to "Beyond CMOS". However, today there is no clear alternative route to replace CMOS. Looking at CNT, nanowires, molecular switches, etc., it is not clear how to use these devices for architectures and systems requiring billions of switches, nor how to interconnect them with billions of wires. Nevertheless, there is an interesting approach in hybrid designs combining CMOS and nano-devices, which will be heterogeneous. With these nano-elements, one sometimes has the same problems at low level (leakage, process variations), but one could also imagine or hope that some of these effects would disappear!

It is sometimes interesting to completely revise the classical ways of thinking and to try to elaborate disruptive heterogeneous SoC architectures. A first idea could be to design a single universal SoC platform: the motivation is that all applications rely on the same hardware, and consequently the design effort and the differentiation between applications are fully concentrated on the embedded software. Such a SoC platform would be very expensive to develop, about 100 M€, and one could ask whether it remains reasonable for applications that are sensitive to power consumption or to other specific performance criteria.

A second idea is a SoC dominated by memories. Memories are automatically generated, implying that the hardware part to design is very small and yields a low development effort. It means that one has to maximize the on-chip memory part, with very small processors and peripherals. In this case, the design of a new chip mainly consists of the development of embedded software. It is therefore similar to the first idea, the difference being that a new chip is designed with the required amount of memory, but not more. A third idea is a SoC with 1'000 parallel processors. It is very different from multicore chips with 2 to 32 cores. With 1'000 cores, each core is a very small logic block of 50K gates combined with a lot of memory. A fourth idea is the design of SoC architectures with nano-elements. The design methodology will be completely different, consisting of a bottom-up methodology rather than a top-down one. This is due to the fact that the fabrication process will produce many nano-devices, of which only a few will be functional. The design methodology will therefore consist of checking whether the fabricated chip can be used for something useful. However, the applications will be completely different from those of existing microprocessors; one can think more of neural nets, biological circuits or learning circuits.

#### V. CONCLUSION

The diagnosis is clear: complexity is increasing, and so is interdisciplinarity. There are more and more interactions between all design levels, from application software down to RF-based MPSoC and even MEMS and SiP. Consequently, engineers have to design towards higher and higher design levels but also down to lower and lower design levels. This widening gap will call for design teams that are more and more heterogeneous, with increasingly challenging objectives: to perform focused research for providing outstanding and innovative blocks in a SoC, but also interdisciplinary research, which becomes the "key" to successful SoC designs.

#### VI. ACKNOWLEDGEMENTS

The author wishes to acknowledge the CSEM design teams that contributed to the SoC cases described above: C. Arm, M. Morgan, D. Séverac, S. Gyger, J-L. Nagel, F. Rampogna, S. Todeschini, R. Caseiro of the "SoC and Digital Group", E. Franzi, P-F. Ruedi, F. Kaess, E. Grenet, P. Heim, P-A. Beuchat of the "Vision Sensor Group", V. Peiris, D. Ruffieux, F. Pengg, M. Kucera, A. Vouilloz, J. Chabloz, M. Contaldo, F. Giroud, N. Raemy of the "RF and Analog IC Group", E. Le Roux, P. Volet of the "Digital Radio Group".

The author also wishes to acknowledge the partners of the EU project UltraSponder (FP7 ICT 2007-2, project 224009) for using the icycom SoC.

The author also acknowledges the industrial contributions from Hager and Semtech for the WiseNET SoC, and Abilis for the MACGIC-based SoC for mobile TV.

#### VII. REFERENCES

[1] A. P. Chandrakasan, S. Sheng, R. W. Brodersen, "Low-Power CMOS Digital Design" IEEE J. of Solid-State Circuits, Vol. 27, No 4, April 1992, pp. 473-484.

[2] C. Piguet et al. "Logic Design for Low-Voltage/Low-Power CMOS Circuits", 1995 Intl. Symposium on Low Power Design, Dana Point, CA, USA, April 23-26, 1995.

[3] K. Roy, S. Mukhopadhyay, H. Mahmoodi-Meimand, "Leakage Current Mechanisms and Leakage Reduction Techniques in Deep-Submicrometer CMOS Circuits", Proc. of the IEEE, Vol. 91, No. 2, 2003, pp. 305-327.

[4] C. Schuster, J-L. Nagel, C. Piguet, P-A. Farine, "Leakage reduction at the architectural level and its application to 16 bit multiplier architectures", PATMOS '04, Santorini Island, Greece, September 15-17, 2004

[5] C. Schuster, J-L. Nagel, C. Piguet, P-A. Farine, "Architectural and Technology Influence on the Optimal Total Power Consumption", DATE 2006, Munchen, March 6-10, 2006

[6] J. Tschanz et al., "Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage", JSSC Nov. 2002.

[7] N. Jayakumar and S.P. Khatri, "A variation-tolerant subthreshold design approach," DAC, pp. 716-719, 2005.

[8] N. Singh, et al., "High-performance fully depleted silicon nanowire (diameter ≤5 nm) gate-all-around CMOS devices," IEEE Electron Device Lett., vol. 27, no. 5, pp. 383–386, May 2006.

[9] B. Kheradmand Boroujeni, C. Piguet, Y. Leblebici, "Adaptive Vgs: A Novel Technique for Controlling Power and Delay of Logic Gates in Sub-V<sub>T</sub> Regime", VLSI-SoC'08, Rhodes, Greece, Oct. 10-12, 2008

[10] T. Jhaveri et al., "Maximization of layout printability/manufacturability by extreme layout regularity", in "Design and Process Integration for Microelectronic Manufacturing IV", edited by A. K. K. Wong and V. K. Singh, Proc. SPIE Vol. 6156, 615609-1, 2006

[11] Lakshmi N. B. Chakrapani , Krishna V. Palem, "A Probabilistic Boolean Logic and its Meaning", Report No TR08-05, June 2008, Pages 1-33, Rice University, Houston, Texas, USA

[12] E. Vittoz, "Weak Inversion for Ultimate Low-Power Logic", Chapter 16 in "Low-Power Electronics Design", CRC Press, November 2004, edited by Christian Piguet.

[13] B. Zhai, D. Blaauw, D. Sylvester, K. Flautner, "Theoretical and Practical Limits of Dynamic Voltage Scaling", Design Automation Conference, 2004, pp. 868-873.

[14] S. Hanson, B. Zhai, D. Blaauw, D. Sylvester, A. Bryant, Wang, "Energy Optimality and Variability in Subthreshold Design", International Symposium on Low Power Electronics and Design, ISLPED 2006, pp. 363-365.

[15] C. Piguet, G. Berweiler, C. Voirol, E. Dijkstra, J. Rijmenants, R. Zinszner, M. Stauffer, M. Joss, "ALADDIN : A CMOS Gate-Matrix Layout System", Proc. of ISCAS 88, p. 2427, Espoo, Helsinki, Finland, 1988.

[16] M. Haykel Ben Jamaa, Kirsten E. Moselund, David Atienza, Didier Bouvet, Adrian M. Ionescu, Yusuf Leblebici, and Giovanni De Micheli, "Fault-Tolerant Multi-Level Logic Decoder for Nanoscale Crossbar Memory Arrays", Proc. ICCAD'07, pp. 765-772

[17] V. Peiris et al., "A 1V 433/868MHz 25kb/s-FSK 2kb/s-OOK RF Transceiver SoC in Standard Digital 0.18μm CMOS," in Int. Solid-State Circ. Conf. Dig. of Tech. Papers, Feb. 2005, pp. 258–259

[18] A. El-Hoiydi, J.-D. Decotignie, C. Enz and E. Le Roux, "WiseMAC, an Ultra Low Power MAC Protocol for the WiseNET Wireless Sensor Network", SenSys'03, November 5–7, 2003, Los Angeles, California, USA.

[19] C. Arm, S. Gyger, J.-M. Masgonty, M. Morgan, J.-L. Nagel, C. Piguet, F. Rampogna, P. Volet, "Low-Power 32-bit Dual-MAC 120 uW/MHz 1.0 V icyflex DSP/MCU Core", ESSCIRC 2008, Sept. 15-19, 2008, Edinburgh, Scotland, U.K.

[20] C. Arm, J.-M. Masgonty, M. Morgan, C. Piguet, P.-D. Pfister, F. Rampogna, P. Volet; "Low-Power Quad MAC 170 uW/MHz 1.0 V MACGIC DSP Core", ESSCIRC 2006, Sept. 19-22. 2006, Montreux, Switzerland

[21] http://www.abiliss.com

# Two-Phase Cooling of Targets and Electronics for Particle Physics Experiments

J.R. Thome, J.A. Olivier, J.E. Park

Laboratory of Heat and Mass Transfer (LTCM), École Polytechnique Fédérale de Lausanne (EPFL) Station 9, CH-1015 Lausanne, Switzerland

john.thome@epfl.ch

# Abstract

An overview of the LTCM lab's decade of experience with two-phase cooling research for computer chips and power electronics is given, together with its possible beneficial application to high-energy physics experiments. Flow boiling in multi-microchannel cooling elements in silicon (or aluminium) has the potential to provide high cooling rates (up to as high as 350 W/cm<sup>2</sup>), stable and uniform temperatures of targets and electronics, and lightweight construction while also minimizing the fluid inventory. An overview of two-phase flow and boiling research in single microchannels and multi-microchannel test elements is presented together with video images of these flows. The objective is to stimulate discussion on the use of two-phase cooling in these demanding applications, including the possible use of CO<sub>2</sub>.

#### I. INTRODUCTION

Flow boiling in microchannels has become one of the "hottest" research topics in heat transfer. Numerous experimental studies on boiling in microchannels have appeared over the past decade, especially in the past few years. Most tests have been done with refrigerants but tests have also been done with water, acetone, CO<sub>2</sub>, etc.

#### A. Electronics cooling application

As an example of a two-phase cooling application, the microelectronics and power electronics industries are now facing the challenge of removing very high heat fluxes of 300 W/cm<sup>2</sup> or more while maintaining the operating temperature below a targeted value, such as 85°C for CPUs. Although conventional cooling solutions, such as air-cooled heat sinks, have been used successfully until now, no straightforward extension is expected for such high heat fluxes. Alternative solutions such as jet impingement cooling and single-phase and two-phase cooling in microchannels have been explored and have shown different advantages and drawbacks [1]. Figure 1 shows the heat sink thermal resistances for diverse cooling technologies as a function of the ratio of pumping power to dissipated thermal power. The best heat sink solution should be the one nearest the lower left axis intersection point, because it represents the lowest thermal resistance at the lowest pumping power. Recent literature on two-phase flow boiling in microchannels does not yet show a race to achieve very high heat fluxes compared to single-phase flow or jet cooling studies, although it yields much lower pressure drops and a much higher overall efficiency (dissipated power/pumping power). The fact that the fluid temperature varies very little during the vaporization process and that the heat transfer coefficient increases with heat flux are also major advantages.



Figure 1: Thermal resistance of heat sinks for diverse cooling technologies as a function of the pump to the dissipated power ratio from Agostini *et al.* [1].

#### B. Microchannel effect

It is worth noting that what happens in small channels in two-phase flows can be quite different from what happens in single-phase flows in small channels. While initial studies in the literature reported significant size effects on friction factors and heat transfer coefficients in very small channels in single-phase flows, more accurate recent tests and analyses done with very smooth internal channels have shown that macroscale methods for single-phase flows work well at least down to diameters of 5-10 microns. This is not the case for macroscale two-phase flow methods, which usually do not work very well when compared to data for channels below about 2.0 mm diameter. Thus, it is very risky to extrapolate macroscale two-phase flow pattern maps, flow boiling methods and two-phase pressure drop correlations to the microscale, except for specific documented cases. Furthermore, many of the controlling phenomena and mechanisms change when passing from macroscale two-phase flow and heat transfer to the microscale. For example, surface tension (capillary) forces become much stronger as the channel size diminishes, while gravitational forces are weakened. Therefore, it is usually not sensible to empirically refit macroscale methods to microscale data, since the underlying physics has substantially changed, which means that different dimensionless groups are now controlling and/or come into play.

Figure 2 depicts the buoyancy effect on an elongated bubble in 2.0, 0.790 and 0.509 mm horizontal channels. In the 2.0 mm channel, no stratified flow was observed, while the difference in film thickness at the top compared to that at the bottom is still quite noticeable. Similarly, the film thickness in the 0.790 mm channel is still not uniform above and below the bubble. In the 0.509 mm channel, by contrast, the film is quite uniform. Interpreting these images and many others available in the literature, one ascertains that stratified-wavy and fully stratified flows disappear (more or less completely) in small horizontal channels. This transition is thus perhaps an indication of the lower boundary of macroscale two-phase flow, in this case occurring for a diameter somewhat greater than 2.0 mm. The upper boundary of microscale two-phase flow may be interpreted as the point at which the effect of gravity becomes insignificant, such that the bubble in the 0.509 mm channel is a microscale flow, with the transition occurring at about this diameter at the present test conditions.



(a) 2.0 mm



(b) 0.790 mm



(c) 0.509 mm

Figure 2: Video images of slug (elongated bubble) flow in 2.0, 0.8 and 0.5 mm horizontal channels with R-134a at 30°C at the exit of a micro-evaporator channel of the same diameter (images by R. Revellin of LTCM).
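The shift from gravity-influenced to surface-tension-dominated flow discussed above can be gauged with a Bond number, Bo = g(ρ<sub>L</sub> − ρ<sub>G</sub>)d<sup>2</sup>/σ, which compares buoyancy with capillary forces. The sketch below evaluates it for the three channel diameters of Figure 2. The R-134a properties at 30°C are approximate values assumed for illustration (they are not given in this paper), and the function name is hypothetical.

```python
G_ACCEL = 9.81  # gravitational acceleration, m/s^2

# Approximate saturated R-134a properties at 30 °C (assumed, for illustration only).
RHO_L = 1187.0   # liquid density, kg/m^3
RHO_G = 37.5     # vapor density, kg/m^3
SIGMA = 7.4e-3   # surface tension, N/m


def bond_number(diameter_m, rho_l=RHO_L, rho_g=RHO_G, sigma=SIGMA):
    """Ratio of buoyancy to surface tension forces for a channel of given diameter."""
    return G_ACCEL * (rho_l - rho_g) * diameter_m ** 2 / sigma


if __name__ == "__main__":
    for d_mm in (2.0, 0.790, 0.509):
        bo = bond_number(d_mm * 1e-3)
        print(f"d = {d_mm:5.3f} mm  ->  Bo = {bo:.2f}")
```

Because Bo scales with the square of the diameter, going from the 2.0 mm to the 0.509 mm channel reduces it by a factor of about 15, which is consistent with the film becoming nearly uniform in the smallest channel.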

#### II. PREDICTION METHODS

Numerous applications for microscale flow boiling are emerging: high heat flux cooling of computer microprocessor chips and power electronics, cooling of micro-reactors, micro-heat pumps and micro-refrigerators, automotive evaporators with multi-port aluminium tubes, etc. All of these applications require thermal design methods that are accurate, reliable and robust (that is, methods that follow the trends of data well and work for a multitude of fluids, microchannel sizes and shapes, pressures, flow rates, applications, etc.). Presently, the state-of-the-art is only partially able to fulfil such requirements.



Figure 3: Heat transfer trends versus vapor quality documented by Agostini and Thome [2] from 13 different studies on boiling in microchannels.

Agostini and Thome [2], based on a review of 13 published studies, analyzed the numerous trends in the heat transfer data. Figure 3 shows a composite diagram of these trends in the local flow boiling heat transfer coefficient plotted versus the vapor quality (defined as the ratio of the mass flow rate of the vapor to that of the total flow). The diagram denotes whether or not the heat transfer coefficient varied with another parameter, where an arrow with the symbol shows the direction of the variation with this parameter. For instance, QX1 means that the heat transfer coefficient decreased with increasing vapor quality but at the same time increased with increasing heat flux. QX2 showed a similar trend except that the data all came together at a higher vapor quality. In contrast, QX3 describes data in which the heat transfer coefficient increased with vapor quality and with heat flux. The X1 data type decreased sharply with vapor quality but did not depend on mass velocity or heat flux, whereas X2 refers to data sets that increased with vapor quality while being insensitive to mass velocity and heat flux. The GX1, GX2 and GX3 types showed three kinds of trends with respect to mass velocity and vapor quality.

The majority of the studies found boiling heat transfer trends represented by QX1 and X1 (11 out of 13). It was thus concluded generally that:

- at very low vapor qualities (x < 0.05), the heat transfer coefficient either tends to increase with vapor quality or is insensitive to vapor quality while it increases with heat flux (not shown);
- at low to medium vapor qualities (0.05 < x < 0.5), the heat transfer coefficient increases with heat flux and decreases or is relatively constant with respect to vapor quality;
- at higher vapor qualities (x > 0.5), the heat transfer coefficient decreases sharply with vapor quality and does not depend on heat flux or mass velocity;
- the effect of heat flux is always to increase the heat transfer coefficient except at high x, where it tends to have little effect (more recent studies show, however, that at very high heat flux its effect diminishes and a further increase in heat flux may even decrease the heat transfer);
- the influence of mass velocity varies from no effect, an increasing effect or a decreasing effect.

These conflicting trends, which are different than the simple trends typically found in macroscale flow boiling, appear to point to the influence of additional phenomena, channel geometry, surface roughness and heat transfer mechanisms coming into play in microchannel boiling. Thus, heat transfer coefficients are extremely difficult to predict. Most of the existing heat transfer models are empirical correlations.

Jacobi and Thome [3] proposed the first theoretically based, elongated bubble (slug) flow boiling model for microchannels, modelling the thin film evaporation of the liquid film trapped between these bubbles and the channel wall and also accounting for the liquid-phase convection in the liquid slugs between the bubbles. The focus of their study was to demonstrate that thin film evaporation, not nucleate boiling, was the principal mechanism controlling heat transfer in slug flows in microchannels. Many experimental studies had cited nucleate boiling, but that conclusion rested solely on heat transfer coefficient vs. heat flux data plotting like a nucleate pool boiling curve, without actual observations (and on extrapolation of macroscale ideology to the microscale).

Following this initial work, a three-zone flow boiling model for slug (elongated bubble) flow in microchannels was proposed by Thome et al. [4], i.e. an updated version of the prior two-zone model of Jacobi and Thome [3]. Figure 4 shows a representation of the three-zone model, where $L_p$ is the total length of the pair or triplet, $L_L$ is the length of the liquid slug, $L_G$ is the length of the bubble including the length of the dry wall of the vapor slug $L_{dry}$, and $L_{film}$ is the length of the liquid film trapped by the bubble. The internal radius and the diameter of the tube are *R* and $d_i$, while $\delta_o$ and $\delta_{min}$ are the thicknesses of the liquid film trapped between the elongated bubble and the channel wall at its formation and at dry out of the film (only when dry out occurs). The evolution of successive bubbles is shown in the lower diagram. The local vapor quality, heat flux, microchannel internal diameter, mass flow rate and fluid physical properties at the local saturation pressure are input parameters to the model. The three-zone model predicts the heat transfer coefficient of each zone and the local time-averaged heat transfer coefficient of the cycle at a fixed location along a microchannel during evaporation of an elongated bubble at a constant, uniform heat flux boundary condition.

The elongated bubbles are assumed to nucleate and quickly grow to the channel size upstream such that successive elongated bubbles are formed that are confined by the channel and grow in axial length, trapping a thin film of liquid between the bubble and the inner tube wall as they flow along the channel. The thickness of this film plays an important role in heat transfer. At a fixed location, the process is assumed to proceed as follows: (i) a liquid slug passes (without any entrained vapor bubbles, contrary to macroscale flows which often have numerous entrained bubbles), (ii) an elongated bubble passes (whose liquid film is formed from liquid removed from the liquid slug) and (iii) a vapor slug passes if the thin evaporating film of the bubble dries out before the arrival of the next liquid slug. The cycle then repeats itself upon arrival of the next liquid slug at a frequency $f (=1/\tau)$. Thus, a liquid slug and an elongated bubble pair or a liquid slug, an elongated bubble and a vapor slug triplet pass this fixed point at a frequency f that is a function of the formation and coalescence rate of the bubbles upstream.



Figure 4: Three-zone heat transfer model of Thome *et al.* [4] for elongated bubble flow regime in microchannels. *Top*: Diagram illustrating a triplet comprised of a liquid slug, an elongated bubble and a vapor slug; *bottom*: Bubble tracking of a *triplet* with passage of a new bubble at time intervals of  $\tau$ .
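The cycle described above lends itself to a simple time-weighted average: each zone contributes its heat transfer coefficient in proportion to the time it spends at the fixed location. The sketch below shows only this averaging step. The zone coefficients and residence times are hypothetical inputs that the full model of Thome *et al.* [4] would compute from the film thickness and flow conditions, and the function name is an assumption.

```python
def time_averaged_htc(h_liquid, h_film, h_dry, t_liquid, t_film, t_dry=0.0):
    """Time-weighted mean heat transfer coefficient over one pair/triplet cycle.

    h_* are zone heat transfer coefficients (W/m^2K); t_* are the times each
    zone takes to pass the fixed location (s). t_dry = 0 means the film never
    dries out, i.e. the cycle is a pair rather than a triplet.
    """
    tau = t_liquid + t_film + t_dry          # pair (or triplet) period, s
    if tau <= 0.0:
        raise ValueError("total period must be positive")
    h_mean = (t_liquid * h_liquid + t_film * h_film + t_dry * h_dry) / tau
    frequency = 1.0 / tau                    # f = 1 / tau, Hz
    return h_mean, frequency


if __name__ == "__main__":
    # Hypothetical values chosen only to show the effect of dry out.
    print(time_averaged_htc(2000.0, 20000.0, 200.0, 0.002, 0.008))
    print(time_averaged_htc(2000.0, 20000.0, 200.0, 0.002, 0.006, 0.002))
```

With the same period, shifting part of the film residence time into the dry zone lowers the cycle average, which is why the film thickness at formation matters so much for the predicted heat transfer.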

#### B. Pressure drop model

The two principal approaches to predict frictional pressure gradients in microscale two-phase flow are the homogeneous and the separated flow models. The homogeneous model assumes that the two-phase fluid behaves as a single-phase fluid but uses pseudo-properties for the density and viscosity that are weighted relative to the vapor and liquid flow fractions. It is also assumed that the liquid and vapor flow at the same velocity, which is observed in slug flow within microchannels. The separated flow model considers that the phases are artificially segregated into two streams, one liquid and one vapor, which interact through their common interface.

An extensive comparison of prediction methods against experimental data was done by Ribatski *et al.* [5]. Among the methods compared, the homogeneous model, the simplest one, predicted the data better than other, more complicated models for a wide range of test conditions.
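A minimal sketch of the homogeneous approach follows. The density expression follows directly from the equal-velocity assumption; the quality-weighted viscosity (a McAdams-type form) and the laminar/Blasius friction factors are common choices assumed here for illustration, not necessarily the ones used in [5]. All function and parameter names are hypothetical.

```python
def homogeneous_density(x, rho_l, rho_g):
    """Two-phase density when both phases move at the same velocity (no slip)."""
    return 1.0 / (x / rho_g + (1.0 - x) / rho_l)


def homogeneous_viscosity(x, mu_l, mu_g):
    """Quality-weighted two-phase viscosity (McAdams-type form, assumed here)."""
    return 1.0 / (x / mu_g + (1.0 - x) / mu_l)


def frictional_pressure_gradient(G, d, x, rho_l, rho_g, mu_l, mu_g):
    """Frictional pressure gradient (Pa/m) in a circular channel of diameter d (m).

    G is the mass flux (kg/m^2s) and x the vapor quality (0..1).
    """
    rho_h = homogeneous_density(x, rho_l, rho_g)
    mu_h = homogeneous_viscosity(x, mu_l, mu_g)
    reynolds = G * d / mu_h
    if reynolds < 2000.0:
        fanning = 16.0 / reynolds              # laminar flow
    else:
        fanning = 0.079 * reynolds ** -0.25    # Blasius, turbulent flow
    return 2.0 * fanning * G ** 2 / (rho_h * d)
```

At x = 0 the expressions reduce to the single-phase liquid result, which gives a convenient sanity check before applying the model along an evaporating channel.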

#### III. COOLANT FLUIDS

CO<sub>2</sub> is a natural refrigerant and has been intensively investigated for automobile air-conditioning, refrigeration and heat pump systems over the past decade [6-11]. It has no ozone depletion potential (ODP = 0) and a negligible direct global warming potential (GWP = 1).

The physical and transport properties of CO<sub>2</sub> are quite different from those of conventional refrigerants at the same saturation temperatures. CO<sub>2</sub> has higher liquid and vapor thermal conductivities, a lower liquid-vapor density ratio (lower liquid and higher vapor densities), a very low surface tension, and a lower liquid-vapor viscosity ratio (lower liquid and higher vapor viscosities) than conventional refrigerants. Thus, flow boiling heat transfer, two-phase flow pattern and pressure drop characteristics are quite different from those of conventional low-pressure refrigerants. Previous experimental studies have shown that CO<sub>2</sub> has higher flow boiling heat transfer coefficients and lower pressure drops than those of common refrigerants at the same saturation temperature [6, 7, 9, 11]. However, it must be realized that the operational pressure of CO<sub>2</sub> is much higher than that of conventional refrigerants such as R134a and R245fa. Figure 5 compares the saturation pressure and temperature curves for several radiation-hard fluids. It is seen that the most likely candidates for low-temperature operation would be $CO_2$ and $C_2F_6$. These fluids will be investigated in the next section.



Figure 5: Saturation curves for radiation-hard fluids

#### IV. SIMULATION

Simulations will be performed on a rectangular multi-microchannel element as shown in the schematic in Figure 6. The simulations will compare single-phase flow to two-phase flow and are aimed at the cooling of a high-precision silicon pixel detector, also called the GigaTracKer (GTK), being developed at CERN.

Some specifications regarding the GTK are that the channels should be as small as possible, with fin heights not being greater than 50  $\mu$ m, although simulations will be run on higher fin heights. The fin width can be made as small as possible, although mechanical integrity must be maintained.

The GTK has a total width of 60 mm with a channel length of 30 mm. Channel lengths should be as short as possible so as to reduce the total inlet-to-outlet temperature difference for single-phase flow and to keep the outlet vapor quality as low as possible for two-phase flow (due to a decrease in heat transfer coefficients for qualities greater than 0.4).



Figure 6: Schematic of a multi-microchannel evaporator

In all cases, the base thickness, *e*, will be zero, thus showing a best-case scenario as any additional material added can be accounted for in separate calculations. It is also stated that the GTK should not see a temperature difference of more than 5°C while being kept as cold as possible ( $\sim$  -30°C).

Assumptions made for the present simulations are: (1) the evaporator is uniformly heated from the bottom with a base heat flux of  $q_b$ , (2) the flow through the cooler is uniformly distributed between all the channels, (3) the top of the cooler is adiabatic and (4) for two-phase flow, no inlet subcooling is used. The models used for single-phase heat transfer and pressure drop are those from Shah and London [12], while the three-zone model [4] and homogeneous model were used for the two-phase heat transfer and pressure drop, respectively. The homogeneous model was used as it predicts the pressure drops within microchannels with fair accuracy [5].

The fluids to be simulated are radiation-hard fluorocarbons and CO<sub>2</sub>. The saturation curves of these fluids were given in Figure 5. The choice of fluid depends on the desired operating condition. For temperatures below -10°C, the best choice of fluid would be between CO<sub>2</sub> and C<sub>2</sub>F<sub>6</sub> but possibly also C<sub>3</sub>F<sub>8</sub>. The most common cooling fluid used at CERN is $C_6F_{14}$, which is used in single-phase flows only. This fluid is not ideal for two-phase cooling as it is a low-pressure fluid, having a saturation temperature of 56°C at atmospheric pressure, implying that the system would need to be under vacuum for lower temperatures. This has the disadvantage that one is limited by the allowable pressure drop within the cooling device, implying that channels should be relatively large. The potential for air leaking into the system also becomes greater, with serious consequences for the performance of the cooling device. Thus, for two-phase cooling, $CO_2$, $C_2F_6$ and $C_3F_8$ will be compared.

#### A. Single-Phase Flow

Figure 7 shows the effects the channel width and height have on the maximum base temperature difference relative to the inlet and the pressure drop. In both cases the channel width was kept constant at 50  $\mu$ m while varying the fin height, and vice versa. The fin width was kept at 25  $\mu$ m. A base heat flux of 2 W/cm<sup>2</sup> was applied while maintaining the mass flux at 4500 kg/m<sup>2</sup>s. From the simulation, it is seen that the major contribution is in the increase of the fin height. By doubling the fin height the temperature difference and pressure drop are decreased by about 50%. Any further increase does not improve the performance by much. Thus, ideally the channel width should be kept at 50  $\mu$ m with the fin height at 100  $\mu$ m. For the given footprint dimensions this translates into 799 parallel channels.



Figure 7: Effect of channel width and fin height on maximum base temperature difference and pressure drop for single-phase flow using  $C_6F_{14}$  as the working fluid
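The channel count quoted above follows from the footprint width and the channel pitch. The sketch below reproduces it under the assumption (not stated in the text) that a fin closes each outer edge of the 60 mm footprint; it also evaluates the total heat load for the 60 mm by 30 mm footprint at the stated base heat flux. The function name is hypothetical, and dimensions are kept as integers in micrometres to avoid rounding issues.

```python
def channel_count(footprint_width_um, channel_width_um, fin_width_um):
    """Largest number of channels that fits, with a fin on both outer edges."""
    pitch = channel_width_um + fin_width_um
    return (footprint_width_um - fin_width_um) // pitch


if __name__ == "__main__":
    n = channel_count(60_000, 50, 25)   # 60 mm footprint, 50 um channels, 25 um fins
    heat_load_w = 2.0 * 6.0 * 3.0       # 2 W/cm^2 over a 6 cm x 3 cm footprint
    print(n, heat_load_w)               # prints: 799 36.0
```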

#### B. Two-Phase Flow

The three fluids to be used in the two-phase simulations are $CO_2$, $C_2F_6$ and $C_3F_8$. Due to the low saturation pressure of $C_{3}F_{8}$, all simulations will be performed at a saturation temperature of -1°C. The actual local temperatures and pressures for the three refrigerants are shown in Figure 8. The pressures are shown in terms of the ratio of the local pressure to the inlet pressure. As seen, although the process is two-phase, the base temperature is not always constant and is dependent on the fluid. This is due to the dry out of the liquid film of the elongated bubble (cf. Figure 4). The film thickness is directly a function of the enthalpy of vaporisation, with CO<sub>2</sub> having a value almost 3 times higher than the other refrigerants and thus a thicker film, which consequently does not dry out for the current conditions. Pressure drops are also significantly lower for CO<sub>2</sub> and C<sub>2</sub>F<sub>6</sub> since their viscosities are about half that of $C_3F_8$.



Figure 8: Local temperature and pressure drops for  $CO_2$ ,  $C_3F_8$  and  $C_2F_6$  during two-phase flow

Figure 9 shows the maximum base temperature difference relative to the inlet and the pressure drop as a function of the fin height. The fin height has a major effect with $C_3F_8$, decreasing the temperature difference and pressure drop by almost 700% by doubling the height. The temperature difference also becomes less for $C_2F_6$ with the increase in fin height, with hardly any effect on the pressure drop. These simulations show that the highest pressure fluid, $CO_2$, is best suited for small geometries as temperature gradients and pressure drops are small.



Figure 9: Effect of fin height on maximum base temperature difference and pressure drop for two-phase flow

#### C. Single-Phase vs. Two-Phase

Figure 10 shows a comparison of single-phase to two-phase cooling. The single-phase fluid used was C<sub>6</sub>F<sub>14</sub> with an inlet temperature of -30°C while the two-phase fluid was CO<sub>2</sub> with an inlet saturation temperature of -30°C. The fin height and channel width were 50 µm, while the fin thickness was 25 µm. A base heat flux of 2 W/cm<sup>2</sup> was applied. The diagram shows the actual junction/base temperature and fluid pressure versus the axial position along the channel. For both the single-phase and two-phase results the axial temperature difference is below 5°C, although the increase in temperature for the two-phase fluid is much less than for the single-phase fluid (0.14°C vs. 4.7°C). The difference in the fluids' pressure drops is even more significant. The single-phase fluid requires a mass flux of 4500 kg/m<sup>2</sup>s to obtain a temperature difference of less than 5°C and has a pressure drop of about 700 kPa. The two-phase fluid only required a mass flux of 250 kg/m<sup>2</sup>s, which resulted in a pressure drop of 60 kPa. The power required to move the two fluids is 1984 mW and 28 mW, respectively. This also implies that any advantage gained from using a single-phase fluid due to overall system pressure is negated by the high pressure drops, unless changes to the geometry are made.
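To place these two results on the horizontal axis used in Figure 1, the quoted pumping powers can be divided by the dissipated power. The sketch below does this arithmetic, assuming the whole 60 mm by 30 mm footprint is uniformly heated at 2 W/cm<sup>2</sup>; the function and variable names are hypothetical.

```python
HEAT_FLUX_W_CM2 = 2.0               # base heat flux from the text
FOOTPRINT_CM2 = 6.0 * 3.0           # 60 mm x 30 mm GTK footprint
HEAT_LOAD_W = HEAT_FLUX_W_CM2 * FOOTPRINT_CM2

# Pumping powers quoted in the text, converted to watts.
PUMP_POWER_W = {
    "single-phase C6F14": 1.984,
    "two-phase CO2": 0.028,
}


def pumping_to_dissipated_ratio(pump_power_w, heat_load_w=HEAT_LOAD_W):
    """Pumping power divided by dissipated thermal power (dimensionless)."""
    return pump_power_w / heat_load_w


if __name__ == "__main__":
    for name, power in PUMP_POWER_W.items():
        print(f"{name}: {pumping_to_dissipated_ratio(power):.2e}")
    factor = PUMP_POWER_W["single-phase C6F14"] / PUMP_POWER_W["two-phase CO2"]
    print(f"single-phase / two-phase pumping power: {factor:.0f}")
```

On these numbers the single-phase cooler spends roughly 5.5% of the dissipated power on pumping, against less than 0.1% for the two-phase cooler, about a 70-fold difference.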





#### V. DESIGN CONSIDERATIONS

Several design considerations are needed for safe operation of the cooling system. Although a large international effort is underway on two-phase heat transfer research, the physical mechanisms involved are still not fully understood, particularly instabilities and critical heat flux.

#### A. Critical heat flux

For high heat flux cooling applications using multi-microchannel cooling elements, the critical heat flux (CHF) in saturated flow boiling conditions is a very important operational limit. It signifies the maximum heat flux that can be dissipated at the particular operating conditions. Surpassing CHF means that the heated wall becomes completely and irrevocably dry, and is associated with a very rapid and sharp increase in the wall temperature due to the replacement of liquid by vapor adjacent to the heat transfer surface. For example, Figure 11 illustrates the onset of CHF in multi-microchannel tests, showing the temperature excursion that occurs during small steps of increasing heat flux. For most applications, this temperature excursion will result in irreparable damage to the device being cooled. Thus, critical heat flux is a particularly important design parameter in microchannel boiling applications, as it determines the upper operating limit of the cooling system for safe, reliable operation.



Figure 11: Flow boiling curve measured at three different positions along the channel showing the onset of critical heat flux from Park [13].

Revellin and Thome [14] have proposed a theoretically based model for predicting critical heat flux in microchannels. Their model is based on the premise that CHF is reached when local dryout occurs during evaporation in annular flow at the location where the height of the interfacial waves reaches that of the annular film's mean thickness. To implement the model, they first solve the one-dimensional conservation equations of mass, momentum and energy, assuming annular flow, to determine the variation of the annular liquid film thickness $\delta$ along the channel, ignoring any interfacial wave formation. Then, based on the slip ratio and a Kelvin-Helmholtz critical wavelength criterion (assuming the film thickness to be proportional to the critical wavelength of the interfacial waves), the wave height was modelled with the following empirical expression:

$$\Delta \delta = 0.15 \left(\frac{u_{\rm G}}{u_{\rm L}}\right)^{-\frac{3}{7}} \left(\frac{g(\rho_{\rm L} - \rho_{\rm G})(d_{\rm i}/2)^2}{\sigma}\right)^{-\frac{1}{7}}$$

Then, when  $\delta$  equals  $\Delta \delta$  at the outlet of the microchannel, CHF is reached. Refer to Figure 12 for a simulation. The leading constant and two exponents were determined with a database including three fluids (R-134a, R-245fa and R-113) and three circular channel diameters (0.509 mm, 0.790 mm and 3.15 mm) taken from the CHF data of Wojtan et al. [15] and Lazarek and Black [16]. Their model also satisfactorily predicted the R-113 data of Bowers and Mudawar [17] for circular multi-microchannels with diameters of 0.510 and 2.54 mm of 10 mm length. Furthermore, taking the channel width as the characteristic dimension to use as the diameter in their 1-d model, they were also able to predict the rectangular multi-microchannel data of Qu and Mudawar [18] for water. All together, 90% of the database was predicted within  $\pm 20\%$ . As noted above, this model also accurately predicted the R-236fa multi-microchannel data of Agostini et al. [19]. Furthermore, in a yet to be published comparison, this model also predicts CHF data of CO<sub>2</sub> in microchannels from three additional independent studies.



Figure 12: Revellin and Thome [14] CHF model showing the annular film thickness variation along the channel plotted versus the wave height. The simulation is for R-134a at a saturation temperature of 30°C in a 0.5 mm channel of 20 mm heated length without inlet subcooling for a mass velocity of 500 kg/m<sup>2</sup>s, yielding a CHF of  $396 \text{ kW/m}^2$ 
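The wave height expression and the CHF criterion can be sketched as follows. As printed, the right-hand side of the expression is dimensionless, so the first function returns that group only; scaling it by the channel radius $d_i/2$ to obtain a length comparable with $\delta$ is an assumption made here, not something stated in the text. The film thickness profile itself (from the conservation equations) is not reproduced, and all function names are hypothetical.

```python
G_ACCEL = 9.81  # gravitational acceleration, m/s^2


def wave_height_group(u_g, u_l, rho_l, rho_g, d_i, sigma):
    """Right-hand side of the wave height expression as printed (dimensionless)."""
    slip = u_g / u_l
    bond = G_ACCEL * (rho_l - rho_g) * (d_i / 2.0) ** 2 / sigma
    return 0.15 * slip ** (-3.0 / 7.0) * bond ** (-1.0 / 7.0)


def wave_height(u_g, u_l, rho_l, rho_g, d_i, sigma):
    """Wave height in metres, ASSUMING the group is scaled by the radius d_i/2."""
    return (d_i / 2.0) * wave_height_group(u_g, u_l, rho_l, rho_g, d_i, sigma)


def chf_reached(film_thickness, wave_height_m):
    """CHF criterion: the mean film thickness has shrunk to the wave height."""
    return film_thickness <= wave_height_m
```

In a full implementation, the heat flux would be increased step by step until `chf_reached` becomes true at the channel outlet, which corresponds to the crossing point of the two curves in Figure 12.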

#### B. Flow instabilities

Multi-microchannel flow boiling test sections can suffer from flow maldistribution and backflow effects, where some channels have a higher liquid flow rate than others. The fluid may in fact flow back into the inlet header, and some channels may become prematurely dry because their inlet liquid flow rate is too low. Such instabilities and maldistribution must be avoided completely to have a safe, reliable design.



Figure 13: Dramatic effect that maldistribution can have on the heat transfer process from Park et al. [20].





(b) With Orifice

Figure 14: Flow boiling in a copper multi-microchannel test section from Park [20] showing the difference in bubble distribution a) without an inlet orifice and b) with an inlet orifice.

Figure 13 shows a sequence of video images to demonstrate back flow and parallel channel instability in a multi-microchannel element. A slug bubble was observed at the inlet of the topmost channel in (a). If the flow in the channel is pushed upstream by bubble growth downstream, the bubble goes back into the inlet plenum in (b), as there is no restriction at the channel inlet to prevent this. This reversed bubble quickly moves to one of the adjacent channels, (c), and breaks down into smaller parts before entering these channels, (d). Depending on its location, the inserted bubble becomes stagnant, (e) and (f), before moving forwards or backwards again.

Using an inlet orifice at the mouth of each microchannel can prevent backflow, instabilities and maldistribution. This may also cause the fluid to flash on entering the channels and the restriction prevents any bubbles from re-entering the inlet manifold. The results of making use of such an orifice are seen in Figure 14, clearly showing the maldistribution at the top right corner in Figure 14a without inlet orifices but excellent distribution in Figure 14b with inlet orifices. Generally, the CHF measured with orifices is much higher than that measured without [20].

#### C. Split flow

As far as the geometric effect is concerned, for most studies, CHF increased when the channel diameter increased and the channel length decreased under the same mass velocity and inlet temperature, as described by Wojtan *et al.* [15].

Besides the thermal goal of achieving a high CHF for micro-evaporator cooling, the energetic goal is to operate with as low a pumping power consumption as possible; hence, minimizing the two-phase pressure drop and the fluid flow rate through the element is also of primary importance.

A method developed to minimise the heated length while having the same heat transfer footprint is to have the flow enter at the mid-section of the channels and split in two. Therefore, the evaporator would have one inlet and two outlets. The advantages of this setup are that heat transfer lengths are shorter, implying lower outlet vapour qualities, higher heat transfer rates and lower pressure drops. This also has the advantage that significantly higher critical heat fluxes are obtainable compared to conventional setups with one inlet and one outlet. Extensive experimental work was performed on such setups by Agostini *et al.* [21-23].

#### VI. CONCLUSION

Over the past decade, research into microchannel cooling has gained considerable attention. This is primarily due to the electronics industry requiring the removal of heat in excess of what can be achieved by conventional air-cooled methods. This paper presented some aspects of microchannel cooling, especially two-phase cooling. Mechanisms of heat transfer were presented, showing the main differences between macroscale and microscale boiling. The state-of-the-art heat transfer and pressure drop models were presented for simulation purposes. Simulations were run using these models to illustrate the main differences between single-phase and two-phase flows. It was shown that two-phase flow has a significant advantage over single-phase flow, having much lower temperature gradients as well as lower pressure drops, especially when channels become very small. Different radiation-hard fluids were also compared, showing that CO<sub>2</sub> and C<sub>2</sub>F<sub>6</sub> are the best for low-temperature applications (<-10°C). Of these two, CO<sub>2</sub> outperformed C<sub>2</sub>F<sub>6</sub> due to its higher latent heat of vaporisation. This had the effect of producing much more uniform temperatures along the channel lengths.

Best-practice design suggestions were given. Prediction methods were given to determine the critical heat flux of microchannel evaporators. Flow instabilities were also discussed, showing that making use of an inlet orifice at the mouth of each microchannel can prevent these instabilities as well as backflow and maldistribution. Further, by taking advantage of a split flow set-up, even higher heat transfer rates, lower pressure drops, and greater critical heat fluxes can be achieved.

#### VII. NOMENCLATURE

| Symbol    | Description                             | Unit                |
|-----------|-----------------------------------------|---------------------|
| $d_i$     | Internal diameter                       | m                   |
| *e*       | Base thickness                          | m                   |
| f         | Frequency                               | Hz                  |
| G         | Mass flux                               | kg/m<sup>2</sup>s   |
| g         | Gravitational acceleration              | $m/s^2$             |
| H         | Fin height                              | m                   |
| L         | Channel length                          | m                   |
| $L_{dry}$ | Length of vapour slug                   | m                   |
| $L_{film}$| Length of liquid film trapped by bubble | m                   |
| $L_G$     | Length of bubble including vapour slug  | m                   |
| $L_L$     | Length of liquid slug                   | m                   |
| $L_p$     | Total length of the pair or triplet     | m                   |
| P         | Pressure                                | kPa                 |
| $P_{in}$  | Inlet pressure                          | kPa                 |
| $P_x$     | Local pressure                          | kPa                 |
| $q_b$     | Base heat flux                          | W/cm<sup>2</sup>    |
| R         | Radius of tube                          | m                   |
| T         | Temperature                             | °C                  |
| $T_b$     | Base temperature                        | °C                  |
| $T_w$     | Wall temperature                        | °C                  |
| t         | Fin thickness                           | m                   |
| t         | Time                                    | s                   |
| $u_G$     | Vapour velocity                         | m/s                 |
| $u_L$     | Liquid velocity                         | m/s                 |
| W         | Channel width                           | m                   |

#### Greek Letters

| Symbol         | Description                   | Unit     |
|----------------|-------------------------------|----------|
| $\delta$       | Liquid film thickness         | m        |
| $\delta_{min}$ | Minimum liquid film thickness | m        |
| $\delta_{o}$   | Initial liquid film thickness | m        |
| $\rho_G$       | Vapour density                | $kg/m^3$ |
| $\rho_L$       | Liquid density                | $kg/m^3$ |
| $\sigma$       | Surface tension               | N/m      |
| $\tau$         | Pair period                   | s        |

#### VIII. REFERENCES

- [1] B. Agostini, M. Fabbri, J. E. Park, L. Wojtan, J. R. Thome, and B. Michel, "State-of-the-art of High Heat Flux Cooling Technologies," *Heat Transfer Engineering*, vol. 28, pp. 258-281, 2007.
- [2] B. Agostini and J. R. Thome, "Comparison of an Extended Database for Flow Boiling Heat Transfer Coefficients in Multi-Microchannels Elements with the Three-Zone Model," Castelvecchio Pascoli, Italy, 2005.

- [3] A. M. Jacobi and J. R. Thome, "Heat transfer model for evaporation of elongated bubble flows in microchannels," *Journal of Heat Transfer-Transactions of the Asme*, vol. 124, pp. 1131-1136, 2002.
- [4] J. R. Thome, V. Dupont, and A. M. Jacobi, "Heat transfer model for evaporation in microchannels. Part I: presentation of the model," *International Journal of Heat and Mass Transfer*, vol. 47, pp. 3375-3385, 2004.
- [5] G. Ribatski, L. Wojtan, and J. R. Thome, "An analysis of experimental data and prediction methods for two-phase frictional pressure drop and flow boiling heat transfer in micro-scale channels," *Experimental Thermal and Fluid Science*, vol. 31, pp. 1-19, 2006.
- [6] L. X. Cheng, G. Ribatski, J. M. Quiben, and J. R. Thome, "New prediction methods for CO2 evaporation inside tubes: Part I - A two-phase flow pattern map and a flow pattern based phenomenological model for two-phase flow frictional pressure drops," *International Journal of Heat and Mass Transfer*, vol. 51, pp. 111-124, 2008.
- [7] L. X. Cheng, G. Ribatski, and J. R. Thome, "New prediction methods for CO2 evaporation inside tubes: Part II - An updated general flow boiling heat transfer model based on flow patterns," *International Journal of Heat and Mass Transfer*, vol. 51, pp. 125-135, 2008.
- [8] L. X. Cheng, G. Ribatski, and J. R. Thome, "Analysis of supercritical CO2 cooling in macro- and micro-channels," *International Journal of Refrigeration-Revue Internationale Du Froid*, vol. 31, pp. 1301-1316, 2008.
- [9] L. X. Cheng, G. Ribatski, L. Wojtan, and J. R. Thome, "New flow boiling heat transfer model and flow pattern map for carbon dioxide evaporating inside horizontal tubes," *International Journal of Heat and Mass Transfer*, vol. 49, pp. 4082-4094, 2006.
- [10] M. H. Kim, J. Pettersen, and C. W. Bullard, "Fundamental process and system design issues in CO2 vapor compression systems," *Progress in Energy and Combustion Science*, vol. 30, pp. 119-174, 2004.
- [11] J. R. Thome and G. Ribatski, "State-of-the-art of two-phase flow and flow boiling heat transfer and pressure drop of CO2 in macro- and microchannels," *International Journal of Refrigeration-Revue Internationale Du Froid*, vol. 28, pp. 1149-1168, 2005.

- [12] R. K. Shah and A. L. London, *Laminar Flow Forced Convection in Ducts*, Academic Press, New York, 1978.
- [13] J. E. Park, *Critical Heat Flux in Multi-Microchannel Copper Elements with Low Pressure Refrigerants*, Swiss Federal Institute of Technology, Lausanne, 2008.
- [14] R. Revellin and J. R. Thome, "An analytical model for the prediction of the critical heat flux in heated microchannels," *Int. J. Heat Mass Transfer*, vol. 51, pp. 1216-1225, 2008.
- [15] L. Wojtan, R. Revellin, and J. R. Thome, "Investigation of saturated critical heat flux in a single uniformly heated microchannel," *Experimental Thermal and Fluid Science*, vol. 30, pp. 765-774, 2006.
- [16] G. M. Lazarek and S. H. Black, "Evaporating Heat Transfer, Pressure Drop and Critical Heat Flux in a Small Vertical Tube with R-113," *International Journal of Heat and Mass Transfer*, vol. 25, pp. 945-960, 1982.
- [17] M. B. Bowers and I. Mudawar, "High flux boiling in low flow rate, low pressure drop mini-channel and micro-channel heat sinks," *Int. J. Heat Mass Transfer*, vol. 37, pp. 321-332, 1994.
- [18] W. Qu and I. Mudawar, "Measurement and correlation of critical heat flux in two-phase microchannel heat sink," *Int. J. Heat Mass Transfer*, vol. 47, pp. 2045-2059, 2004.
- [19] B. Agostini, R. Revellin, J. R. Thome, M. Fabbri, B. Michel, D. Calmi, and U. Kloter, "High Heat Flux Flow Boiling in Silicon Multi-Microchannels: Part III. Saturated Critical Heat Flux of R236fa and Two-Phase Pressure Drops," *Int. J. Heat Mass Transfer*, vol. 51, pp. 5426-5442, 2008.
- [20] J. E. Park, J. R. Thome, and B. Michel, "Effect of Inlet Orifice on Saturated CHF and Flow Visualization in Multi-microchannel Heat Sinks," *Twenty-Fifth Annual IEEE Semiconductor Thermal Measurement and Management Symposium*, pp. 1-8, 2009.
- [21] B. Agostini, J. R. Thome, M. Fabbri, B. Michel, D. Calmi, and U. Kloter, "High heat flux flow boiling in silicon multi-microchannels Part I: Heat transfer characteristics of refrigerant R236fa," *International Journal of Heat and Mass Transfer*, vol. 51, pp. 5400-5414, 2008.

- [22] B. Agostini, J. R. Thome, M. Fabbri, B. Michel, D. Calmi, and U. Kloter, "High heat flux flow boiling in silicon multi-microchannels Part II: Heat transfer characteristics of refrigerant R245fa," *International Journal of Heat and Mass Transfer*, vol. 51, pp. 5415-5425, 2008.
- [23] B. Agostini, J. R. Thome, M. Fabbri, and B. Michel, "High heat flux two-phase cooling in silicon multimicrochannels," *IEEE Transactions on Components and Packaging Technologies*, vol. 31, pp. 691-701, 2008.

# Thursday 24 September 2009 **POSTERS Session**

# A Prototype Front-End Readout Chip for Silicon Microstrip Detectors Using an Advanced SiGe Technology

A. A. Grillo<sup>a</sup>, E. Spencer<sup>a</sup>, L. Daniel<sup>a</sup>, G. Horn<sup>a</sup>, A. Martchovsky<sup>a</sup>, F. Martinez-McKinney<sup>a</sup>, H.F.-W. Sadrozinski<sup>a</sup>, A. Seiden<sup>a</sup>, M. Wilder<sup>a</sup>

<sup>a</sup> Santa Cruz Institute for Particle Physics (SCIPP), University of California Santa Cruz, USA

#### grillo@scipp.ucsc.edu

#### Abstract

The upgrade of the ATLAS detector for the high luminosity upgrade of the LHC will require a rebuild of the Inner Detector as well as replacement of the readout electronics of the Liquid Argon Calorimeter and other detector components. We proposed some time ago to study silicon germanium (SiGe) BiCMOS technologies as a possible choice for the required silicon microstrip and calorimeter front-end chips given that they showed promise to provide necessary low noise at low power. Evaluation of the radiation hardness of these technologies has been under study. To validate the expected performance of these technologies, we designed and fabricated an 8-channel front-end readout chip for a silicon microstrip detector using the IBM 8WL technology, a likely choice for the ATLAS upgrade. Preliminary electrical characteristics of this chip will be presented.

#### I. INTRODUCTION

The planned upgrade of the ATLAS Inner Detector will be an all-silicon tracker consisting of several layers of pixel detectors and several layers of microstrip detectors. The inner strip layers will likely consist of short strips (~2.5 cm long) and the outer layers of long strips (~10 cm long). The capacitive load that the sensors present to the front-end amplifier circuit will be approximately 5 pF for the short strips and 15 pF for the long strips. These relatively large loads have in the past presented difficulty for CMOS front-end circuits when the required shaping time is tens of nanoseconds: the bias current of the front-end FET has to be large in order to reach a trans-conductance high enough for low noise with fast shaping time. Under these conditions, bipolar transistors can often outperform CMOS, with lower power for the same noise level. Silicon germanium (SiGe) technologies represent a modern bipolar version. They are designed to have very high $f_T$ values (e.g. 200 GHz) and achieve this by maintaining very low base resistance (tens of Ohms). The benefit for sensor readout circuits is that this low base resistance affords low noise at low bias current while keeping a fast shaping time.
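The link between bias current, trans-conductance and series noise can be illustrated with the textbook relations for an ideal bipolar transistor. The sketch below is not the authors' calculation: the 20 Ohm base resistance and the two bias currents are hypothetical values chosen only to show the trend, and the function names are illustrative.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C

def bipolar_gm(i_c, temp_k=300.0):
    """Transconductance g_m = I_C / V_T of an ideal bipolar transistor (siemens)."""
    v_t = K_B * temp_k / Q_E          # thermal voltage, about 25.9 mV at 300 K
    return i_c / v_t

def series_noise_density(i_c, r_b, temp_k=300.0):
    """Textbook series (voltage) noise density in V/sqrt(Hz):
    thermal noise of r_b plus collector shot noise referred to the input."""
    gm = bipolar_gm(i_c, temp_k)
    return math.sqrt(4.0 * K_B * temp_k * (r_b + 1.0 / (2.0 * gm)))

# Hypothetical operating points: 20 ohm base resistance, two bias currents.
for i_c in (60e-6, 120e-6):
    e_n = series_noise_density(i_c, r_b=20.0)
    print(f"I_C = {i_c * 1e6:.0f} uA: g_m = {bipolar_gm(i_c) * 1e3:.2f} mS, "
          f"e_n = {e_n * 1e9:.2f} nV/sqrt(Hz)")
```

With a base resistance of only tens of Ohms, the $1/(2 g_m)$ term dominates at these currents, which is why a low base resistance lets the noise be traded directly against bias current.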

We have been studying the radiation hardness of several SiGe technologies for several years and those results have been presented elsewhere [1], [2], [3], [4], [5], [6], [7]. In order to demonstrate the electrical performance of at least one technology, we have designed and fabricated an 8-channel prototype chip on the IBM 8WL technology. This was chosen primarily because the CMOS component of this BiCMOS technology is compatible with the IBM 8RF all-CMOS technology that is being used by other ATLAS collaborators. This would allow a future full readout chip with a bipolar front-end and a CMOS back-end to make use of digital CMOS circuits already being developed on the 8RF process. There is another similar SiGe BiCMOS process, the 8HP, which also includes a 130 nm CMOS technology. The SiGe component of that technology offers even higher performance than the 8WL, but it is also more costly. We chose the 8WL over the 8HP for this prototype primarily for cost considerations.

#### II. THE PROTOTYPE CIRCUIT

The 8-channel prototype chip is based upon the binary readout architecture used in the present ATLAS strip detector and planned for the upgraded detector, which yields only a simple hit or no-hit signal. Each of the 8 channels consists of a first stage preamp, a DC coupled second stage differential amplifier, followed by an AC coupled shaper stage, which differentially drives a comparator. There is global bias adjustment for the DC coupled differential amplifier, and control of the final shaping time using varactors. These adjustments would be controlled by on-chip programmable DACs in the final readout chip and allow optimization of performance for variations in input characteristics and radiation damage. Individual channel-by-channel adjustment of comparator threshold allows for compensation of DC matching offsets in the shaper and comparator. A block diagram of the circuit is shown in Figure 1.

Both analogue and digital supplies power the comparator. The digital signal is passed between the two sections by a differential current. The comparator CMOS output is converted to LVDS in the output stage of the actual prototype. The CMOS signal would become the input to the digital processing section (e.g. a pipeline, etc.) in a future full readout chip. This differential connection ensures negligible coupling between the analogue and digital sections, thus reducing EMI noise. In this prototype front-end only chip, the digital section includes only an LVDS driver to send the signal off chip. For testing, these LVDS signals were fed to an FPGA for processing.

The separation of analogue and digital sections, even given the minimal digital components of this chip, allows separate analysis of the analogue power consumption, one of the primary objectives of this study.



Figure 1: Block diagram of single channel with nominal bias and power settings indicated for each stage.

While the test results below show that the chip can operate successfully with the expected capacitive loads of the short and long strip silicon sensors, certain optimizations were made in the circuit design for the long strip option. Further optimization for the short strips, primarily a reduction in the front transistor size and a larger feedback resistor, would improve the noise vs. power performance for short strips.

#### III. TESTING

#### A. The test board

A test board was designed and fabricated to allow one chip to be completely tested. The board provided all the necessary power rails, the adjustable bias currents and voltages, current pulses to simulate sensor signals to each channel and connection of the LVDS output signals to an external FPGA. The response to different input loads could be tested by changing capacitors mounted on the board. A picture of the board is shown in Figure 2.



Figure 2: Test board with chip at right

The chip to be tested was not mounted directly on the test board but instead was glued and wire-bonded to a mini-board, which was then mounted in a shallow cavity in the test board. Wire bonds electrically connected the traces on the mini-board to the corresponding nodes on the test board. The mini-board was secured mechanically with Delrin clamps and covered with a plastic cap to protect the wire bonds. A close-up of this part of the test board is shown in Figure 3.

By breaking the wire bonds between mini-board and test board, the mini-board with chip can be removed and irradiated without exposing the support components on the test board to radiation damage, since most of them are not rad-hard. This strategy also minimizes the amount of material activated during irradiation to the chip and mini-board. After irradiation, the mini-board can be re-mounted and the wire bonds restored for post-radiation testing. Care was taken to keep traces on the mini-board to a minimum in order to minimize stray inductances, which might obscure the expected low noise, high performance of the chip.



Figure 3: Close-up of mini-board with Delrin clamps

While the chip is fully capable of reading out a real silicon sensor, a new test board would have to be designed which mounted the chip in a position where it could easily be attached to a sensor. As of now, we have only tested the chip on this test board with simple chip capacitor loads and externally supplied input signals. Radiation testing is also planned for the future.

#### B. Measurements

Figure 4 shows scope traces to illustrate shaper signal timewalk. Shaper signals are buffered by source followers for Picoprobe measurements. Traces are 1 fC, 1.25 fC, and 10 fC. Vertical cursors intersect the 1.25 fC and 10 fC signals at 1 fC threshold. The measured timewalk is 13.6 ns. Two amplifiers have varactor capacitors with the controlling VSHAPE = 1.000 V in this case, so that shape can be tuned for a specific timewalk.

Figure 5 and Figure 6 show how the shaper signal is affected by VSHAPE and input load capacitance. The figures show one side of the differential shaper signal with a 1 fC input and Ibias at 120 $\mu$A and two different input loads. The three signals shown correspond to VSHAPE = 1.5 V, 0.75 V and 0 V (highest amplitude to lowest). The signal peak shifts 10.8 ns over the VSHAPE range for a 3.31 pF load, and only slightly less, 8.4 ns, for a 19 pF load.



Figure 4: Shaper outputs for 1 fC, 1.25 fC and 10 fC input signals



Figure 5: Shaping range of 10.8 ns using the varactor control is illustrated. One side of differential shaper signal shown with 1 fC input, 3.31 pF load.



Figure 6: Shaping range of 8.4 ns using the varactor control is illustrated with much larger load than in Figure 5. Input signal is 1 fC, and load, 19 pF.

The timewalk measurement technique using a digital scope is illustrated in Figure 7. The scope trigger is the LVDS comparator signal at the lower right. This signal varies in width with noise in the shaper analogue signal. The calibration trigger signal on the upper traces is averaged $\sim$200 times. Since it has constant width and amplitude, the jitter makes the average appear as a signal with a slower rise for the 1.25 fC trigger on the left. The apparently faster 10 fC trigger shown on the right indicates much less jitter in the comparator LVDS signal width and timing. The cursor indicates a timewalk of 16 ns for this amplifier setting. Note that the earlier 10 fC signal appears to the right of the later 1.25 fC signal, since we are post-triggering.



Figure 7: Timewalk of 16 ns for input signals of 1.25 fC and 10 fC

For the simple binary readout architecture, the preamp and shaper circuit is characterized by varying the comparator threshold and counting the percentage of comparator firings vs. the threshold. The count rate ranges from 100% at low threshold to 0% at the highest threshold. The plot of count rate versus threshold follows the standard Error Function, where the 50% point corresponds to the mean signal amplitude and the width of the slope to the Gaussian noise. This was measured for several input charges and several settings of the front transistor bias current and input capacitance. Figure 8 shows the results for a 13 pF input load at six different front transistor currents. Using the varactor, the timewalk was adjusted to 16 ns for all points. The 50% responses shown are non-linear by design in order to minimize power, since a linear response is not required in this readout architecture. The response is linear in the region of the planned operating threshold, 0.5 fC to 1.0 fC. The small signal gain is then the derivative of the response curve. The small signal gain is used in calculating the noise referred to preamp input and is shown in Figure 9.
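The threshold-scan analysis described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the measurements: the scan data are synthetic, the 100 mV/fC gain is a hypothetical value, and the function names are invented for the example. The mean is taken at the 50% point, the noise from the 84%/16% points of the error-function curve, and the noise is then referred to the input by dividing by the small signal gain.

```python
import math

ELECTRONS_PER_FC = 1e-15 / 1.602176634e-19  # about 6241.5 electrons per fC

def s_curve(threshold, mean, sigma):
    """Expected firing fraction for a Gaussian-smeared signal of given mean."""
    return 0.5 * math.erfc((threshold - mean) / (math.sqrt(2.0) * sigma))

def crossing(thresholds, fractions, level):
    """Linearly interpolate the threshold where a falling curve crosses `level`."""
    for i in range(len(thresholds) - 1):
        f0, f1 = fractions[i], fractions[i + 1]
        if f0 >= level >= f1 and f0 != f1:
            t0, t1 = thresholds[i], thresholds[i + 1]
            return t0 + (f0 - level) * (t1 - t0) / (f0 - f1)
    raise ValueError("level not crossed in scan range")

def analyse_scan(thresholds, fractions):
    """Return (mean amplitude, Gaussian sigma) in threshold units."""
    mean = crossing(thresholds, fractions, 0.5)
    lo = crossing(thresholds, fractions, 0.8413447)  # mean - 1 sigma
    hi = crossing(thresholds, fractions, 0.1586553)  # mean + 1 sigma
    return mean, 0.5 * (hi - lo)

def noise_electrons(sigma_mv, gain_mv_per_fc):
    """Refer the output noise to the preamp input, in electrons RMS."""
    return sigma_mv / gain_mv_per_fc * ELECTRONS_PER_FC

# Synthetic scan: 100 mV mean amplitude, 20 mV noise, 1 mV threshold steps.
thresholds = [float(t) for t in range(0, 201)]
fractions = [s_curve(t, 100.0, 20.0) for t in thresholds]
mean_mv, sigma_mv = analyse_scan(thresholds, fractions)
enc = noise_electrons(sigma_mv, gain_mv_per_fc=100.0)  # hypothetical gain
print(f"mean = {mean_mv:.1f} mV, sigma = {sigma_mv:.1f} mV, ENC = {enc:.0f} e-")
```

With real data a least-squares fit of the full curve would normally replace the three interpolated points, but the quantities extracted are the same.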

The noise as referred to preamp input was measured for input loads from 3.3 pF to 17.4 pF, and front transistor bias currents from 60 $\mu$A to 180 $\mu$A. Figures 10 and 11 show the results of these tests. Load capacitance includes all strays, including the mini-board, the bond pads, and an estimate of the chip circuit and protection diodes. The noise equivalent of 640 nA sensor leakage current has been added in quadrature to the measured noise to include the effect of radiation damaged sensors. Note that the 17.4 pF curve in Figure 10 has no data points below 120 $\mu$A since the VSHAPE control is out of range for lower front currents.
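Adding in quadrature means that uncorrelated noise contributions are combined as the square root of the sum of their squares. A minimal sketch, with made-up numbers that are not taken from the measurements:

```python
import math

def add_in_quadrature(*terms):
    """Combine uncorrelated noise contributions (same units) into one RMS value."""
    return math.sqrt(sum(t * t for t in terms))

# Hypothetical numbers for illustration only (electrons RMS).
measured = 1200.0
leakage = 500.0
total = add_in_quadrature(measured, leakage)
print(f"total noise = {total:.0f} e-")
```

Because the terms add as squares, a contribution well below the dominant one changes the total only slightly.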



Figure 8: Response curve for 13 pF load and 16 ns timewalk



Figure 9: Small signal gain, the derivative of the response curve, such as in Figure 8, for 13 pF load and 16 ns timewalk



Figure 10: Noise referred to input vs. front bias current

Using results shown in Figure 11, we measure 1360 e<sup>-</sup> at a front bias current of 102 $\mu$A for a 15 pF load. When we include the additional noise due to the expected post radiation DC gain reduction as well as the irradiated sensor leakage current, a post radiation noise level of 1500 e<sup>-</sup> can still be achieved. Adding in the bias currents of the remainder of the analogue circuit at nominal rail voltage of 1.2 V, the total analogue power consumption is 197 $\mu$W per channel. This would be the expected power consumption for the long strip ATLAS upgraded detector. For the short strip option (5 pF load), the front bias current can be reduced to about 60 $\mu$A for minimal noise and 146 $\mu$W total power per channel. The noise, however, would not be optimal once the post radiation front transistor DC gain reduction is taken into account. This could be remedied by further optimization, namely a reduction in the size of the front transistor and an increase in the feedback resistance. The results of such a further optimization will be quantified in the near future.
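The two power figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the front transistor is supplied from the same 1.2 V rail and that the rest of the analogue circuit draws the same power in both cases; neither assumption is stated explicitly in the text, so the result is only a consistency check.

```python
RAIL_V = 1.2            # nominal analogue rail voltage from the text, volts

def front_power_uw(i_front_ua, rail_v=RAIL_V):
    """Power drawn by the front transistor branch, in microwatts."""
    return i_front_ua * rail_v

# Long-strip operating point quoted in the text.
long_total_uw = 197.0
long_front_uw = front_power_uw(102.0)            # about 122.4 uW
rest_uw = long_total_uw - long_front_uw          # inferred, not stated in the text

# Short-strip estimate, assuming the rest of the analogue circuit is unchanged.
short_total_uw = front_power_uw(60.0) + rest_uw
print(f"rest of analogue circuit ~ {rest_uw:.1f} uW")
print(f"short-strip total       ~ {short_total_uw:.1f} uW")
```

Under these assumptions the short-strip estimate lands within 1 $\mu$W of the 146 $\mu$W quoted for a 60 $\mu$A front bias current.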



Figure 11: Noise referred to input vs. load capacitance

#### IV. CONCLUSIONS

This 8-channel chip demonstrates that acceptable noise values can be achieved at exceptionally low power for the silicon microstrip detectors currently envisaged for the ATLAS Upgrade Detector, especially the long strip (~10 cm) sensor version. Additional optimization of the front transistor and feedback could further reduce noise and power for the short strip sensor version. The design and technology easily meet the 16 ns time-walk requirement, and faster performance could easily be achieved. The design allows for variable control of the front transistor bias current and the shaping time, thus allowing noise vs. power optimization for a range of sensor characteristics, in particular to accommodate changes in sensor characteristics due to ongoing radiation damage.

#### V. REFERENCES

- 1. Edwin Spencer et al., "Evaluation of SiGe biCMOS technology for Next Generation Strip Readout", Heidelberg, Proceedings of the 11th Workshop on Electronics for LHC Experiments, September 2005.
- 2. J. Metcalfe, et al., "Evaluation of the radiation tolerance of SiGe heterojunction bipolar transistors under 24 GeV proton exposure", IEEE Trans. Nucl. Sci., vol. 53, no. 2, pp. 3889-3893, 2006.
- 3. J. Metcalfe, "Silicon germanium heterojunction bipolar transistors: Exploration of radiation tolerance for use at SLHC", Masters Thesis, UCSC, Sept. 2006.
- 4. J. Metcalfe, et al., "Evaluation of the radiation tolerance of several generations of SiGe, heterojunction bipolar transistors under radiation exposure", Nucl. Instrum. Methods A579, 833 (2007).
- 5. M. Ullán, et al., "Evaluation of Two SiGe HBT Technologies for the ATLAS sLHC Upgrade", Proceedings of the Topical Workshop on Electronics for Particle Physics (TWEPP-08), Naxos, Greece, pp. 111-115, Sep. 2008.
- 6. J. S. Rice, et al., "Performance of the SiGe HBT 8HP and 8WL Technologies after High Dose/Fluence Radiation Exposure", Proceedings of the IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS-MIC 2008), Dresden, Germany, pp. 2206-2210, Oct. 2008.
- 7. M. Ullán, et al., "Evaluation of silicon-germanium (SiGe) bipolar technologies for use in an upgraded atlas detector", Nucl. Instrum. Methods A604, 668 (2009).

# DC-DC SWITCHING CONVERTER BASED POWER DISTRIBUTION VS SERIAL POWER DISTRIBUTION: EMC STRATEGIES

F. Arteche<sup>a</sup>, C. Esteban<sup>a</sup>, M. Iglesias<sup>a</sup>, C. Rivetta<sup>b</sup>, F.J. Arcega<sup>c</sup>, I. Vila<sup>d</sup>

<sup>a</sup> Instituto Tecnológico de Aragón, Zaragoza, Spain

<sup>b</sup> SLAC National Accelerator Laboratory, Stanford, CA 94025, USA

<sup>c</sup> Universidad de Zaragoza, Zaragoza, Spain

<sup>d</sup> IFCA (CSIC-UC), Santander, Spain

farteche@ita.es, rivetta@slac.stanford.edu

#### Abstract

This paper presents a detailed comparative analysis, from the electromagnetic compatibility point of view, of the power distributions proposed for the SLHC tracker upgrade. The main idea is to identify and quantify the noise sources, the noise distribution at the system level and the sensitive areas in the front-end electronics for both proposed topologies: the DC-DC converter based power distribution and the serial power distribution. These studies will be used to define the critical points of both systems that must be studied and prototyped to ensure the correct integration of the system, taking electromagnetic compatibility critically into account. This analysis at the system level is crucial to ensure the final performance of the detector using non-conventional power distributions, avoiding interference problems and excessive losses that can lead to catastrophic failures or to expensive and impractical solutions.

#### I. INTRODUCTION

The upgrade of the tracker sub-system in both the CMS and ATLAS detectors is based on front-end electronic (FEE) circuitry that requires ultra-low voltages to power the integrated circuits. This constraint forces the definition of new DC power distribution schemes that bias the tracker front-end electronics efficiently while reducing the volume of the power conductors. The proposed power distribution schemes can be grouped into:

- Serial Power Distribution System
- DC-DC switching converter based Power Distribution System.

Neither scheme is conventional, and each has advantages and disadvantages.

The high magnetic field in the central detector does not allow the use of magnetic materials in the switching power converter units of the proposed DC-DC converter based Power Distribution System. A large R&D effort is planned to develop unique DC-DC switching converters that operate under high magnetic field and particle radiation with minimum radiated and conducted noise emissions. The constraint imposed by a design without magnetic materials sets a floor on the conducted and radiated noise levels that is higher than the levels achieved in conventional switching converters. Additionally, in this power distribution scheme, the DC-DC converters will be located near the FEE, within the tracker volume, increasing the coupling of interference between the power switching converters and the silicon detector / front-end electronic units.

The serial power distribution system has already been used in other small subsystems and experiments. This topology is mainly characterized by floating FEE. This requirement forces a special design to keep a low-impedance connection of the FEE to ground at high frequency, which may introduce undesirable effects due to imbalances in the HF ground connection. For that purpose, a special effort should be focused on the FEE design, analyzing the parasitic effects that have an important impact on the performance of the system. Furthermore, in order to increase the efficiency, the serial power distribution plans to use DC-DC power converters as the primary power supply, which may increase the total interference in the system due to the conducted noise emission at their output. These scenarios make it necessary to conduct electromagnetic compatibility studies on the proposed systems, in order to improve the noise immunity of the front-end electronics and to assess its compatibility with the noise generated by the power supply system.

The CMS tracker power task force [1] has recommended that the baseline powering system for an upgraded CMS Tracking system should be based on DC-DC conversion, with Serial Powering maintained as a back-up solution. The ATLAS upgrade has not yet defined its final decision and keeps studying both proposals. In any case, electromagnetic compatibility between components in both the DC-DC switching converter based Power Distribution and the Serial Power Distribution topologies can only be achieved by minimizing both the radiated and conducted noise emitted by the main noise sources and by increasing the noise immunity of the FEE through a robust design.

This paper analyses the main elements that define the electromagnetic compatibility (EMC) of both power distribution systems and defines the impact of the system design and integration strategies in the compatibility of FEE. The main aspects (noise sources and FEE immunity) that define the electromagnetic compatibility of both topologies are presented. The integration aspects have strong impact in the system compatibility. However, if an EMC strategy is implemented at an early stage of the design, the compatibility between both the FEE and the proposed DC power distributions may be achieved.

#### II. EMC ELEMENTS: NOISE SOURCES & IMPEDANCES

The two main elements that define the electromagnetic compatibility of any electronic system are noise sources and the impedances of the system.

The main noise sources in any electronic system are:

- *DC-DC converters* (conducted and radiated noise)
- *Electronic systems* (their current consumption is not constant (fast signals, clocks, ...) and interacts with the impedance of the power distribution system)

These sources usually define the noise emission level [2][3] of the system.

On the other hand, the element that usually defines the immunity of the system is the impedance of the circuit. It defines the coupling strength between the noise and interference and the sensitive parts of the FEE. These impedances comprise the intrinsic impedance of the components and the equivalent impedance defined by the coupling and parasitic impedances between the system and the mechanical structure. They depend strongly on the frequency range under study, because above a certain frequency (~1-5 MHz) the stray components are a fundamental part of the impedance and a direct connection between two points may no longer be considered a low impedance connection. The impedance of cables or conducting structures is also important, because it defines the ability of such systems to conduct, radiate or receive noise from other systems.
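The frequency dependence of a "direct" connection can be illustrated by modelling it as a resistance in series with a stray inductance. The values below (10 mOhm and 100 nH) are hypothetical and only meant to show the order of magnitude; the function name is illustrative.

```python
import math

def connection_impedance(freq_hz, r_ohm, l_henry):
    """Magnitude of the series R-L impedance of a conductor, in ohms."""
    return math.hypot(r_ohm, 2.0 * math.pi * freq_hz * l_henry)

# Hypothetical strap: 10 mOhm DC resistance, 100 nH stray inductance.
R, L = 0.010, 100e-9
for f in (1e3, 1e5, 1e6, 5e6, 1e8):
    print(f"{f:>11.0f} Hz: |Z| = {connection_impedance(f, R, L):.3f} ohm")
```

For these values the inductive term already exceeds the resistance by well over an order of magnitude at 1 MHz, which is the sense in which such a connection stops being a low impedance path in the MHz range.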

#### III. NOISE DISTRIBUTION ON DC-DC CONVERTER BASED POWER DISTRIBUTION SYSTEM

A large amount of conducted noise in the tracker upgrade may be created by the switching mode power supplies (SMPS) installed close to the FEE. SMPSs generate high frequency noise due to the switching action. This noise can propagate through the power network, where it can be either radiated to other systems or conducted to the FEE, decreasing the performance of the detector. The FEE immunity will be directly related to the system topology, because the types of cables, the grounding strategy and the power network design define the ability of the system to divert noise and interference currents away from the sensitive parts of the FEE. Fig. 1 depicts a simple diagram of a possible topology of one leader of the Tracker upgrade. This topology is characterized by two important aspects that define the FEE-Power Distribution compatibility:

- The long distance between the ground of the leader and the local ground of each FEE means that they may not be considered a single ground or equipotential structure at high frequency.
- The high number of DC-DC converters located very close to the FEE and connected through a common power network.

Based on this scenario, the most important electromagnetic interference issues found in the DC-DC converter based power distribution are:

- 1. Noise emission (radiated and conducted) effects at the output of the DC-DC converters.
- 2. Noise emission (radiated and conducted) effects at the input of the DC-DC converters.
- 3. Noise emission (radiated and conducted) effects in HV & MT lines
- 4. Grounding Noise effects between FEE and overall ground of the system.



Figure 1: Noise sources on DC-DC converter based power distribution

#### A. Output emission effects of DC-DC converters

DC-DC converters emit radiated and conducted noise at the output, which can decrease the performance of the FEE. The output currents of DC-DC converters contain not only the DC components that contribute to the real power transfer, but also a large number of harmonic components of the switching frequency. These harmonic components propagate out of the power supply as conducted electromagnetic interference emitted through the input and output cables [4][5]. Each input or output is composed of two conductors (+, -) and a reference, and the interference signals can be decomposed into two modes of propagation: differential mode (DM) and common mode (CM).
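The decomposition into the two modes can be written down explicitly. The sketch below uses one common convention, which is an assumption here since the text does not fix one: both conductor currents are counted as flowing toward the load, the CM current is their sum (the net current that must return through the reference), and the DM current is half their difference. The numerical values are hypothetical.

```python
def to_modes(i_plus, i_minus):
    """Split two conductor currents (both counted flowing toward the load)
    into differential-mode and common-mode parts."""
    i_cm = i_plus + i_minus            # net current returning through the reference
    i_dm = 0.5 * (i_plus - i_minus)    # current circulating between the two conductors
    return i_dm, i_cm

def from_modes(i_dm, i_cm):
    """Inverse transform: rebuild the conductor currents."""
    return i_dm + 0.5 * i_cm, -i_dm + 0.5 * i_cm

# Hypothetical noise currents in mA on the + and - conductors.
i_dm, i_cm = to_modes(10.5, -9.5)
print(i_dm, i_cm)
```

A purely differential disturbance gives equal and opposite conductor currents and therefore zero CM current; any imbalance between the two conductors shows up as a CM component.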

The DM noise is the direct result of the fundamental operation of the switching converter, whereas the CM noise often involves parasitic capacitive or inductive coupling. By selecting an adequate filtering strategy (capacitors with low series inductance and series resistance) and SMPS topology, it is possible to decrease the DM and CM emissions as well as the radiated noise. The latter is particularly important in this distribution because of the short distance between the DC-DC converters and the FEE. Fig. 2 depicts the mutual coupling between a group of power distribution conductors and the sensitive areas of the FEE when the SMPSs and the FEE are close enough.



Figure 2: DC-DC converters output emissions

The layout and integration strategy have a strong impact on the system compatibility, and most noise problems may be solved easily. As an example, there are several ways to ensure the compatibility between the FEE and the magnetic field radiated by the DC-DC converters. They are listed below:

- It is possible to define a common ground for the DC-DC converter and the sensor to cancel the CM noise.
- It may be possible to design the inductor to avoid radiation.
- It may be possible to design the FEE-Sensor in a way that it is immune to magnetic fields.

#### B. DC-DC input noise emissions

DC-DC converters also emit conducted noise at the input. This noise emission can be coupled to other systems via direct conduction or radiated (electrically or magnetically) by the power network or any cable present inside the tracker volume. Fig. 3 shows the noise coupling mechanism associated with noise emissions at the input of a DC-DC power converter.



Figure 3: DC-DC converter input noise emissions

To solve this problem, the input of the SMPS must have filters and the power distribution network has to be properly designed. To minimize the mass in the tracker region and obtain a distribution network with low EMI emission, an integral design is necessary that satisfies both a good mechanical design and a structure that is electromagnetically compatible with the sensitive FEE. Proposals for the implementation of the power distribution network include PCBs or custom networks based on twisted-pair cables and shields. The evaluation of those structures from the EMC point of view is critical to minimize the impact of interference on the FEE. Solutions based on PCBs give compact structures in which it is easy to include filtering, but they usually use more copper. Combinations of PCBs and carbon fibre materials could give optimal designs with minimum mass and EMC with the FEE connected to the distribution network. Additionally, for the proposed power distribution, the SMPS will switch at a very high frequency, above 1 MHz, introducing harmonic components with considerable energy up to 100-200 MHz in the rod. Therefore, studies of the power network from the mechanical and EMC points of view are an important topic, in order to define the key issues that have to be addressed in the development of future power networks to increase the immunity of the tracker system.
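The reach of the switching harmonics into the 100-200 MHz range can be illustrated with the standard Fourier series of a periodic trapezoidal waveform with equal rise and fall times. This is a generic textbook model, not a model of any specific converter: the duty cycle and the 2 ns edge time are hypothetical, and 1 MHz is used only because the text states that switching will be above that value.

```python
import math

def sinc(x):
    """Normalised sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def trapezoid_harmonic(n, amplitude, f_sw, duty, t_rise):
    """One-sided amplitude of the n-th harmonic of a periodic trapezoidal
    waveform with equal rise and fall times (pulse width taken at 50 %)."""
    return 2.0 * amplitude * duty * abs(sinc(n * duty)) * abs(sinc(n * f_sw * t_rise))

F_SW = 1e6        # switching frequency, Hz (the text only says "above 1 MHz")
DUTY = 0.35       # hypothetical duty cycle
T_RISE = 2e-9     # hypothetical edge time, s
for n in (1, 11, 101, 201):
    a = trapezoid_harmonic(n, 1.0, F_SW, DUTY, T_RISE)
    print(f"{n * F_SW / 1e6:6.0f} MHz: relative amplitude {a:.2e}")
```

In this model the harmonic envelope falls as 1/f up to roughly $1/(\pi t_r)$, about 160 MHz for a 2 ns edge, and as $1/f^2$ beyond it, so faster edges push significant energy to higher frequencies.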

#### C. HV & MT lines

Experience from the previous CMS tracker detectors [6] has shown that the slow control lines (MT) and high voltage (HV) lines are able to couple noise to sensitive parts of the FEE. Slow control lines have been the strongest coupling elements of noise and interference for the tracker system. The newly proposed scheme to power the tracker system, based on DC-DC switching converters, will produce a large amount of noise inside the tracker volume. This noise can couple to the MT and HV lines and through them finally to the sensitive areas of the FEE. In particular, in the past those lines included only filters to prevent external perturbations from flowing through them, while within the volume of the detector only minimal by-pass filtering to ground was included. Fig. 4 shows a simplified scheme including the noise generation and distribution across the HV and MT lines and the coupling with the FEE.



Figure 4: MT & HV line emissions

There are several approaches to designing the MT and HV lines properly, such as good filtering and careful layout of the network. In the case of the MT lines, another solution is to replace these lines with optical fibres and optical transducers. Fibre Bragg Grating (FBG) sensors have many advantages with respect to traditional electrical probes: no need for readout near the detector or sensor; no power cables; long term stability; immunity to electromagnetic fields, high voltages, extreme temperatures, and ionising radiation; simple multiplexing; etc. This solution reduces the EMI coupled by the MT line to the FEE to a minimum.

#### D. GND noise

In large volumes and with minimum copper mass it is very difficult to achieve an equipotential ground structure. Different areas of the structure and the sensitive parts of the FEE can be at different potentials, and this potential difference will couple near-fields or CM currents into the FEE. The magnitude of the potential difference is directly related to the characteristics of the ground connection. Fig. 5 depicts a simple scheme showing noise and interference coupling between the sensitive FEE located within the tracker volume and the surrounding structure.



Figure 5: GND noise implications

A good ground connection that minimizes the impedance between the FEE and the structure is the solution to this problem, but it is generally limited by mechanical constraints. In general, ground connections should have the following characteristics:

• The ground connections should be short and flat.

• The routing path should be as close as possible to a conductive structure near the FEE.

#### IV. NOISE DISTRIBUTION ON SERIAL TOPOLOGY

Serial powering is the other option under study to power the Tracker upgrade. In the serial powering scheme a power supply, operating as a current source, biases a set of detector modules connected in series. At the FEE module level, local shunt regulators provide the voltage regulation for each module. The voltage across the total chain of modules is n times the module voltage, and the potential reference of each module is different. The serial powering scheme is composed of three main elements:

- Current sources
- Shunt regulator with distribution per module of digital and analogue power
- AC or opto-coupling of clock, command and data signals.
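The relation between the chain voltage and the module references can be sketched in a few lines. The module count and module voltage below are hypothetical values chosen only for illustration, and the choice of module 0 as the grounded end of the chain is an assumption.

```python
def serial_chain(n_modules, v_module):
    """Return the total chain voltage and the local reference of each module.

    Module 0 is taken as the one tied to ground (assumption); module i then
    floats at i * v_module with respect to that ground.
    """
    total_voltage = n_modules * v_module
    references = [i * v_module for i in range(n_modules)]
    return total_voltage, references

# Hypothetical example: 10 modules regulated at 2.5 V each
total, refs = serial_chain(10, 2.5)
print(total)      # 25.0
print(refs[-1])   # 22.5
```

Because every module sits at a different reference, clock, command and data signals cannot be DC-coupled between modules, which is why the scheme lists AC or opto-coupling as one of its elements.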

The serial powering scheme [7][8][9] is also characterized by the connection among analogue ground, digital ground and sensor bias ground, which are tied together on the module. Since the grounds or reference voltages of different modules are different, floating HV power supplies must be used. Figure 6 shows a simple diagram of a possible topology of the Tracker upgrade for the ATLAS detector. This topology raises several noise issues that must be taken into account:

1. Noise emission (radiated and conducted) effects at the FEE level.
2. Conducted noise emitted by the power supply and coupled to the electronic system through the distribution network.
3. Noise coupled (radiated and conducted) through the HV and MT lines.
4. Grounding noise effects between FEE and GND.



Figure 6: Noise emissions in the serial power distribution system (courtesy of Marc Weber [9])

#### A. FEE noise emissions

Electronics units generate noise in the power system because they operate with non-constant current consumption. The spectrum of the power supply currents of the FEE has low frequency components and high frequency components associated with the data rate and fast signal transitions. This current spectrum is mainly filtered by the LV regulators and by the high frequency capacitors by-passing the power lines and the output terminals of the regulators. The voltage developed at the input power terminal of each module is defined by that filtering. Fig. 7 shows the effect of the input terminal voltage in the serial power distribution. Due to the series connection of the modules, the input terminal voltages add up and define the common mode voltage between the i<sup>th</sup> module and the structure ground. This common mode noise on each module reference can couple noise to the FEE via the stray capacitance between the FEE/detector module and the structure.



Figure 7: Common mode voltage per module due to the module current consumption.
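The accumulation of terminal voltages along the chain can be illustrated with a running sum. This is a minimal worst-case sketch that assumes the noise voltages of all modules add in phase; the amplitudes are hypothetical and the ordering from the grounded end is an assumption.

```python
from itertools import accumulate

def common_mode_voltages(terminal_noise_v):
    """Worst-case CM voltage of each module reference w.r.t. structure ground.

    terminal_noise_v[i] is the noise voltage amplitude across the input power
    terminals of module i, ordered from the grounded end of the chain.
    The amplitudes are assumed to add in phase (worst case).
    """
    return list(accumulate(terminal_noise_v))

# Hypothetical amplitudes in millivolts for a 5-module chain
noise_mv = [2, 2, 3, 2, 1]
print(common_mode_voltages(noise_mv))  # [2, 4, 7, 9, 10]
```

Under these assumptions the modules farthest from the ground connection see the largest common mode voltage, even if their own terminal noise is small.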

#### B. Primary power source

The primary power is also able to introduce noise into the serial powering system. The two main elements that may introduce noise are:

- Current source (primary power supply)
- Power cables

Power cables connect the series module chain with the external world. Electromagnetic noise radiated by neighbouring sub-systems may couple to these cables, and the resulting interference is distributed along the serial array. The primary power supply, located outside the tracker module array, emits conducted and radiated interference that is coupled to the detector through the power cables. It is planned to develop these power supplies as current generators based on switching converters. These units will generate CM and DM noise in the same way as explained in section III, and this may degrade the performance of the FEE, as has already been studied in the previous generation of CMS [5]. The current source operation of the power supply is limited to low frequencies; above hundreds of kHz, noise generated by the switching converter is not filtered. This noise should be suppressed by CM and DM filters in the power supply (one filter for a set of N modules). In the detector, it is important to filter the noise at the input power terminal of the overall distribution network. The interference flowing through the serial power distribution network is more difficult to filter at each module because there is no GND at module level. Additionally, as depicted in Fig. 8, the interference currents flowing through the distribution network and the input impedance of the power terminal of each module develop a common mode voltage. This CM voltage affects most critically those modules located far away from the single GND connection. It is important to minimize, at high frequency, the input impedance of the modules and the common mode impedance of the power distribution network.



Figure 8: Effect of current source noise in the Serial Distribution.
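To make the role of the module input impedance concrete, the sketch below models each module input as an ideal by-pass capacitor and computes the voltage built up along the chain by a given interference current. The ideal-capacitor model is an assumption (a real capacitor is limited by its parasitic inductance at high frequency), and all numerical values are hypothetical.

```python
import math

def cm_voltage_along_chain(i_noise_a, c_bypass_f, freq_hz, n_modules):
    """CM voltage amplitude at each module for a noise current through the chain.

    Each module input is modelled as an ideal bypass capacitor (assumption),
    so |Z| = 1 / (2*pi*f*C). Voltage drops are added in phase, starting at
    the grounded end of the chain.
    """
    z_module = 1.0 / (2.0 * math.pi * freq_hz * c_bypass_f)
    return [i_noise_a * z_module * (k + 1) for k in range(n_modules)]

# Hypothetical values: 1 mA noise current at 1 MHz, 20 modules
for c in (1e-6, 10e-6):
    v = cm_voltage_along_chain(1e-3, c, 1e6, 20)
    print(f"C = {c * 1e6:.0f} uF: first {v[0] * 1e6:.1f} uV, last {v[-1] * 1e6:.1f} uV")
```

In this simplified model the voltage at the last module scales with the number of modules and inversely with the by-pass capacitance, which is the motivation for keeping the input impedance low at high frequency.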

#### C. HV & MT lines

The noise effects from the MT and HV lines are very similar to those analyzed for the DC-DC converter based power distribution topology. Noise can be coupled to these lines from outside the detector.

#### D. Grounding effects

The GND effects in the serial power distribution are very important because the design has a ground connection for only one of the modules. There is no 'explicit' ground connection between the local reference of each module and the structure. This characteristic makes it necessary to consider the high-frequency grounding of each module from the design stage, since it is almost impossible or impractical to change it during the commissioning stage. As analyzed in the previous sub-sections, interference currents flowing through the serial power distribution develop common mode voltages that drive the reference voltage of each module with respect to the structure potential. Fig. 9 depicts the ground noise distribution in the serial powering scheme. The stray components define the return path for CM currents flowing through the FEE and induced by CM ground voltages. To increase the immunity of the system to ground voltages or potential differences, it is necessary to minimize the stray capacitances between the structure and the detector and to reduce the impedance at high frequency between the modules and the structure. This requires an integral design of the grounding and of the high-frequency ground connection of the modules, while minimizing the overall mass of the structure.



Figure 9: Noise induced by GND potential differences in serial distribution

#### V. SUMMARY OF BOTH SYSTEMS

#### *A. DC-DC converter based power system topology*

The DC-DC converter based power distribution system has several noise sources:

- DC-DC power converters
- MT & HV lines
- Cables and structures

The main characteristics of the DC-DC converter based power distribution system are the short distance between the main noise source (the DC-DC power converter) and the victim (the FEE) and the high switching frequency of the DC-DC converters, around 1 MHz. These two factors make it necessary to take into account radiation effects (near and far field) that have not been considered in previous detectors. Additionally, the large number of DC-DC power converters located inside the tracker volume and connected to the same power network is critical. The interference current flowing through the power network will radiate EMI and has to be taken into account when defining the safety margins between the immunity level of the FEE and the emission level of the DC-DC power converters at their input and output ports. All these elements may be controlled if an EMC strategy is implemented from the very beginning of the design. This strategy has to focus on the grounding topology, the DC-DC converters, the FEE and sensors, and the design of the distribution network.

#### *B. Serial system topology*

Serial powering topology has also several noise sources:

- Electronics noise
- Current source (power supply)
- MT & HV lines
- Cables & structures

The two main characteristics of the serial power distribution in terms of noise are the lack of a local ground or grounding at the module level and the addition of inner noise sources, or common mode voltages, due to the serial array. The LV regulators and power filters of each module are critical elements of the serial power distribution. The internal impedance of those devices is critical to minimize the common mode voltage developed along the series distribution due to the current variations of each module and to interference flowing through the distribution network.

The grounding topology has to be designed from the beginning because it is going to be impractical or difficult to introduce changes during commissioning. As in the other distribution proposal, the grounding design is strongly limited by the need to minimize the copper material in the structure. For that reason an integral design of the grounding is required, taking into account mechanical constraints as well as electromagnetic compatibility issues.

#### VI. CONCLUSIONS

A general overview of the most important noise issues of both the DC-DC converter based power distribution system and the serial powering system has been presented. Noise issues in both systems are much more complex than in the past and introduce an important risk in terms of system performance. They cannot be solved during commissioning by means of a trial-and-error procedure. They require a systematic design approach that includes the electromagnetic compatibility of the system as a fundamental issue from the very beginning.

#### VII. ACKNOWLEDGMENT

The authors would like to thank Dr. Peter Sharp from Imperial College/CERN for helping us during the development of these studies. We would also like to thank Instituto Tecnológico de Aragón (ITA), Zaragoza, Spain, and especially Dr. J.L. Pelegay, head of Grupo de Investigación Aplicada (G.I.A.), for the support of this work. Finally, one of us (C.R.) wants to thank the US DOE, under contract DE-AC02-76SF00515, for the support of this work.

#### VIII. REFERENCES

[1] "Report of Power Task Force", January 2009

[2] F. Arteche and C. Rivetta, "EMI Filter Design and Stability Assessment of DC Voltage Distribution based on Switching Converters", Proceedings of Workshop on Electronics for LHC Experiments, LEB 2001, Vol. 1, pp. 353-357, September 2001.

[3] F. Arteche and C. Rivetta, "*Noise Susceptibility Analysis of the HF Front-End Electronics for the CMS High-Energy Experiment*", Proc. of IEEE Int. Symposium on EMC, August 2003, Boston, USA, pp. 718-723.

[4] F. Arteche and C. Rivetta "*EM Immunity studies for frontend electronics in high-energy physics experiments*", Proc. of Int. Symposium on EMC, EMC Europe 2004. September 2004, Eindhoven, The Netherlands, pp. 533-538.

[5] F. Arteche, C. Rivetta and F. Szonsco, "*Electromagnetic Compatibility Plan for the CMS Detector at CERN*", Proc. of 15th Int. Zurich Symposium on EMC, February 18-20, 2003, Zurich, Switzerland, pp. 533-538.

[6] F. Arteche and C. Rivetta, "Detector noise susceptibility issues for the future generation of High Energy Physics Experiments", Proceedings of Workshop on Electronics for LHC Experiments – TWEPP 2008, Vol. 1, pp. 533-538, September 2008.

[7] Marc Weber, Giulio Villani, Mike Tyndel, Robert Apsimon, "Serial powering of silicon strip detectors at SLHC" Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, Volume 579, Issue 2, 1 September 2007, Pages 844-847

[8] Marc Weber, "*Power distribution for SLHC trackers: Challenges and solutions*" Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, Volume 592, Issues 1-2, 11 July 2008, Pages 44-55

[9] Marc Weber, "Serial Powering discussion", Power task force meeting, December 2008

# Study of the Radiation Hardness Performance of PiN diodes for the ATLAS Pixel Detector at the SLHC upgrade

B. Abi, F. Rizatdinova

Oklahoma State University, 148 PS II, Stillwater, OK 74078, USA

babak.abi@okstate.edu

#### Abstract

We study the radiation tolerance of the silicon and GaAs PiN diodes that will be part of the readout system of the upgraded ATLAS pixel detector. The components were irradiated by 200 MeV protons up to a total accumulated fluence of  $1.2 \times 10^{15}$  p/cm<sup>2</sup> and by 24 GeV protons up to  $2.6 \times 10^{15}$  p/cm<sup>2</sup>. Based on the results obtained, we conclude that radiation hardness does not depend on the sensitive area or cut-off frequency of the PiN diodes. We identify two diodes that can be used for the SLHC upgrade.

#### I. INTRODUCTION

At the SLHC, the luminosity will be increased by a factor of ten compared to the LHC. The radiation level in the ATLAS detector is expected to increase by a similar factor. The current ATLAS pixel detector has to be upgraded to cope with the higher radiation environment of the SLHC. We use the Non Ionizing Energy Loss (NIEL) scaling hypothesis to estimate the SLHC fluences at the present optical link location (PP0) of the pixel detector of the ATLAS experiment. After five years of operation the SLHC is expected to achieve  $3 \text{ nb}^{-1}$  of integrated luminosity, which corresponds to the fluences shown in Table 1 [1-2].

| Material | 1 MeV $[n_{eq}/cm^2]$ | 200 MeV $[p/cm^2]$   | 24 GeV $[p/cm^2]$    |
|----------|-----------------------|----------------------|----------------------|
| Si       | $1.5 \times 10^{15}$  | $1.4 \times 10^{15}$ | $2.6 \times 10^{15}$ |
| GaAs     | $8.2 \times 10^{15}$  | $1.2 \times 10^{15}$ | $1.6 \times 10^{15}$ |

Table 1: Beam fluences.
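Under the NIEL scaling hypothesis, a proton fluence and its 1 MeV neutron-equivalent fluence are related by a hardness factor. The sketch below simply computes the ratios implied by the entries of Table 1; these ratios are derived here for illustration and are not values quoted by the authors.

```python
# Fluences from Table 1 (1 MeV neutron equivalent, 200 MeV p, 24 GeV p)
TABLE_1 = {
    "Si":   {"n_eq": 1.5e15, "p_200MeV": 1.4e15, "p_24GeV": 2.6e15},
    "GaAs": {"n_eq": 8.2e15, "p_200MeV": 1.2e15, "p_24GeV": 1.6e15},
}

def implied_hardness_factors(row):
    """Ratio of 1 MeV neutron-equivalent fluence to proton fluence."""
    return {beam: row["n_eq"] / fluence
            for beam, fluence in row.items() if beam != "n_eq"}

for material, row in TABLE_1.items():
    factors = implied_hardness_factors(row)
    print(material, {beam: f"{value:.2f}" for beam, value in factors.items()})
```

For silicon the implied factors are close to 1.1 at 200 MeV and 0.6 at 24 GeV, meaning that a larger 24 GeV proton fluence is needed to reach the same equivalent damage.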

Our goal is to identify PiN diode candidates that will tolerate the SLHC dose and will have sufficient speed for the optical readouts to be used in tracking detectors at the SLHC. To accomplish this goal we designed and built test stands and developed methods to study the characteristics, radiation hardness and reliability of commercially available PiN diodes as a function of irradiation dose.

#### II. PIN DIODES SELECTED FOR TESTS

We have chosen the following PiN diodes for our irradiation tests:

- A. Si PiN diodes S9055-01 and S5973-01 (single devices).
- B. GaAs PiN diode G8522-XX (single device). This family includes three types with different optical active area and cut-off frequency but the same physical structure, which provides an excellent opportunity to study the radiation hardness vs. PiN diode frequency.
- C. GaAs PiN diode G8921-01 (diode die available in 4 to 12 channel modifications). It is a potential candidate for the high speed parallel optical transceiver.

#### III. PERFORMED TESTS

We performed three major tests to study the radiation hardness characteristics of the PiN diodes. First, we performed the total ionizing dose (TID) test at BNL using gamma rays with a total dose of 10 Mrad. The purpose of this test was to confirm that the performance of the PiN diodes is not affected by gamma irradiation.

Second, we did two identical tests at the Indiana University Cyclotron Facility (IUCF). The tests were done in two phases separated by a two-week interval, which allowed us to observe a possible annealing effect. In each phase we irradiated the diodes with 200 MeV protons up to  $0.7 \times 10^{15}$  p/cm<sup>2</sup>, for a total fluence of  $1.4 \times 10^{15}$  p/cm<sup>2</sup>. These tests were done using the Open-air Optical Path approach described below. The main goal of the tests was to measure the PiN diode responsivity as a function of accumulated dose. In addition, we wanted to determine whether the degradation of the responsivity depends on the active area and cut-off frequency of the PiN diodes. For this purpose we used three PiN diodes from the G8522 family.

Finally, we did irradiation tests at the CERN T7 test beam facility using a 24 GeV proton beam and a total fluence of  $1.5 \times 10^{15} \text{ p/cm}^2$  for GaAs and  $2.6 \times 10^{15} \text{ p/cm}^2$  for Si diodes. We studied the same set of PiN diodes using a different experimental setup to cross-check the results obtained at a different beam energy.

#### IV. TID TEST AT BNL

The gamma ray source at BNL is a cylindrical cobalt-60 source with a maximum irradiation rate of 200 krad/h at 6 inches from the center of the source. The diodes chosen for the test were biased, so that we could make offline measurements of responsivity as a function of accumulated dose. For the measurement of responsivity we used a homogeneous infrared (IR) source biased by a constant current. Figure 1 shows the schematics of the setup for optical calibration and responsivity measurement. Samples were installed at the irradiation point, 6 inches from the source, to get the maximal dose. The accumulated dose is known with 20% accuracy. Responsivity was measured offline at 0 Mrad, 5.6 Mrad and 9.6 Mrad. Figure 2 shows responsivity versus total dose. No change in responsivity was observed after 9.6 Mrad.



Figure 1 : Schematics of the optical setup for BNL tests.



Figure 2 : Responsivity vs. accumulated dose obtained at BNL tests.

#### V. OPEN-AIR OPTICAL PATH TEST STAND

For the second set of tests we introduced a new concept of a test stand for optical components like PiN diodes, named Open-air Optical Path, which has many advantages. First, it allows us to avoid any parasitic effects and random mechanical attenuation from optical circuits and connectors. With this concept we can remove the effect of optical packages in final measurement and simplify the test setup. Another benefit is the ability to easily test a large number of samples. The test stand is characterized by simplified electrical and DAQ circuitry. It allows us to monitor and control the optical power of the sources with high accuracy. Figure 3 shows the motherboard which includes a high power IR power source with a wavelength of 850 nm focused precisely at the center of the daughterboard which carries the PiN diodes. Figure 4 shows the daughterboard with installed PiN diodes. The motherboard is positioned in such a way that the beam crosses the PiN diodes but does not touch the optical sources which are installed on the ring. The daughterboard is detachable from the motherboard to allow for quick and easy measurement of multiple samples.



Figure 3 : Open-air Optical Path motherboard.



Figure 4: Daughterboard with PiN diodes.

#### VI. TESTS AT IUCF

The second set of tests was performed at the Indiana University Cyclotron Facility (IUCF) using a 200 MeV proton beam. The IUCF beam is 5 cm in diameter and has a maximum irradiation rate of 5 Mrad/h. We used the Open-air Optical Path test stand. The test was performed in two separate runs. The accumulated fluence in each run was  $0.7 \times 10^{15}$  p/cm<sup>2</sup> (40 Mrad). Since the IR source was losing power during the run due to secondary radiation, its optical power was monitored, and the results of the PiN diode responsivity measurements were corrected for this loss. Figure 5 shows the results of the responsivity measurement for two types of PiN diodes (G8522-01 and G8522-03) with and without compensation. The lines marked "ON" take the compensation into account. The small plot on top shows the normalized optical power of the source.



Figure 5 : Responsivity vs. accumulated dose obtained at IUCF tests.
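The compensation for the loss of source power can be sketched as follows. Responsivity is the photocurrent divided by the incident optical power, so each reading is divided by the source power monitored at the same time. The function names and all readings below are hypothetical; this is an illustrative sketch under those assumptions, not the analysis code used for Figure 5.

```python
def compensated_responsivity(photocurrents_a, source_power_norm, initial_power_w):
    """Responsivity (A/W) corrected for the decay of the optical source.

    photocurrents_a   : PiN diode photocurrent at each measurement step
    source_power_norm : monitored source power, normalized to its initial value
    initial_power_w   : optical power on the diode before irradiation
    """
    return [i / (p * initial_power_w)
            for i, p in zip(photocurrents_a, source_power_norm)]

def normalize(values):
    """Scale a series so that its first point equals 1."""
    return [v / values[0] for v in values]

# Hypothetical readings, for illustration only
currents = [100e-6, 85e-6, 72e-6]   # A
power = [1.00, 0.95, 0.90]          # normalized source power
resp = compensated_responsivity(currents, power, 200e-6)
print([round(r, 3) for r in normalize(resp)])
```

Without the correction, the drop in photocurrent caused by the weakening source would be attributed to the diode and the degradation would be overestimated.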

Figure 6 shows the results of the first irradiation phase with normalized responsivity but without optical power variation compensation.



Figure 6 : Responsivity vs. accumulated dose obtained in the first IUCF run (no optical variation compensation).

Table 2 shows total responsivity degradation after 80 Mrad of accumulated dose using 200 MeV protons. The results are corrected for the optical power fluctuation. The results for the G8522-0X diode family show no correlation between responsivity and size of active optical area.

Table 2: Total responsivity degradation after 80 Mrad of 200 MeV protons.

| PiN diode | Total degradation [%] |
|----------|-----------------------|
| S9055-01 | 12                    |
| S5973-01 | 33                    |
| G8522-01 | 34                    |
| G8522-02 | 29                    |
| G8522-03 | 55                    |
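The fluence and dose figures quoted for the IUCF runs can be cross-checked with the usual thin-target relation, dose = fluence × mass stopping power, using 1 MeV/g = 1.602 × 10⁻⁸ rad. The sketch below backs out the stopping power implied by the quoted pair ($0.7 \times 10^{15}$ p/cm², 40 Mrad) and applies it to the total fluence. The stopping power is inferred from the quoted numbers, not taken from the paper.

```python
MEV_PER_G_TO_RAD = 1.602e-8   # 1 MeV/g = 1.602e-10 Gy = 1.602e-8 rad

def dose_rad(fluence_per_cm2, stopping_power_mev_cm2_per_g):
    """Ionizing dose deposited by a particle fluence (thin-target estimate)."""
    return fluence_per_cm2 * stopping_power_mev_cm2_per_g * MEV_PER_G_TO_RAD

def implied_stopping_power(fluence_per_cm2, dose_in_rad):
    """Mass stopping power that makes a quoted fluence and dose consistent."""
    return dose_in_rad / (fluence_per_cm2 * MEV_PER_G_TO_RAD)

# Values quoted in the text for one IUCF run
s = implied_stopping_power(0.7e15, 40e6)
print(f"implied stopping power: {s:.2f} MeV cm^2/g")
print(f"dose for 1.4e15 p/cm^2: {dose_rad(1.4e15, s) / 1e6:.0f} Mrad")
```

The implied value is roughly 3.6 MeV cm²/g, and with it the total fluence of $1.4 \times 10^{15}$ p/cm² reproduces the 80 Mrad used in Table 2.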

#### VII. TESTS AT CERN T7 FACILITY

We performed a test at the CERN T7 facility with 24 GeV protons and a beam profile of 2 cm. Figure 7 shows the test stand we designed for use at T7. The stand has 32 optical channels, but can be expanded up to 256 channels. It has a compact, portable footprint. The stand provides fine control of the optical power of each individual channel. It can also be modified into a bit error rate test stand. The test stand is controlled by a PC running LabVIEW.



Figure 7 : Test stand used at CERN T7.

At T7, the proton beam is about 20 m away from the control room and samples have to be installed in a shuttle to reach the beam. Samples in a shuttle are connected via fiber ribbons to the VCSEL modules at the test stand. Every 30 seconds, the test stand reads out the optical power sent to the PiN diodes and the currents of each individual channel of PiN diodes as a measure of responsivity.

Figures 8-10 show the degradation of the PiN diode responsivity due to irradiation with 24 GeV protons. This degradation is in good agreement with the results obtained at 200 MeV at IUCF. Figure 8 shows that the S5973-01 diodes lost about 30% of their initial responsivity, to be compared with 33% at IUCF.

Figure 9 shows that with total fluence of  $2.6 \times 10^{15}$  p/cm<sup>2</sup> the S9055-01 diodes lost about 10% of their initial responsivity which is in agreement with 12% at IUCF. Finally, Figure 10 shows responsivity plots for the GaAs PiN array with an average of 50% loss of responsivity.

The responsivity shown in these plots is calculated taking into account the optical power fluctuations of the optical sources.



Figure 8 : Responsivity vs. accumulated dose obtained at CERN T7 (three diodes S5973-01).



Figure 9 : Responsivity vs. accumulated dose obtained at CERN T7 (four diodes S9055-01).



Figure 10 : Responsivity vs. accumulated dose obtained at CERN T7 (diode G8921-01, 4 channels)

#### VIII. SUMMARY AND CONCLUSIONS

We developed and built three test stands for PiN diode responsivity studies and performed irradiation tests at three different beam facilities. Our results demonstrate that radiation hardness does not depend on the active area of PiN diodes from the same family. Results obtained at two different proton beam energies (200 MeV and 24 GeV) are found to be in good agreement with each other. Based on the results from the IUCF and CERN irradiation runs we identified the following PiN diode candidates:

- a) GaAs array G8921-01 with total responsivity degradation less than 50% (initial photosensitivity is 0.5 A/W). Its responsivity after irradiation is better than that of the S9055-01. It is available in different configurations (4 to 16 channels per array).
- b) Si PiN S9055-01 with total degradation less than 10% (initial photosensitivity is 0.25 A/W).

These candidates can be used for other applications in high radiation areas at the LHC and SLHC.
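The comparison between the two candidates follows directly from the quoted initial photosensitivities and degradation bounds, as the short calculation below shows. Since the degradations are stated as upper bounds, the results are lower bounds on the remaining responsivity.

```python
def responsivity_after(initial_a_per_w, degradation_percent):
    """Responsivity remaining after a given percentage degradation."""
    return initial_a_per_w * (1.0 - degradation_percent / 100.0)

# Initial photosensitivity and upper bounds on degradation quoted above
g8921 = responsivity_after(0.5, 50)    # GaAs array G8921-01
s9055 = responsivity_after(0.25, 10)   # Si PiN S9055-01
print(f"G8921-01: {g8921:.3f} A/W, S9055-01: {s9055:.3f} A/W")
```

Even after losing half of its responsivity, the GaAs array retains about 0.25 A/W, slightly above the roughly 0.225 A/W of the silicon diode.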

#### IX. REFERENCES

1. I. Gregor, "Optical Links for the ATLAS Pixel Detector," Ph. D. Thesis, University of Wuppertal, 2001.

2. A. Van Ginneken, "Non-ionizing Energy Deposition in Silicon for Radiation Damage Studies," FERMILAB-FN-0522, Oct 1989, 8pp.

# Interference coupling mechanisms in Silicon Strip Detectors - CMS tracker "wings": A learned lesson for SLHC -

F. Arteche<sup>a</sup>, C. Esteban<sup>a</sup>, C. Rivetta<sup>b</sup>

<sup>a</sup> Instituto Tecnológico de Aragón (ITA), Zaragoza, Spain
 <sup>b</sup> SLAC National Accelerator Laboratory, Stanford University, USA

farteche@ita.es

#### Abstract

The identification of coupling mechanisms between noise sources and sensitive areas of the front-end electronics (FEE) in the previous CMS tracker sub-system is critical to optimize the design and integration of integrated circuits, sensors and power distribution circuitry for the proposed SLHC Silicon Strip Tracker systems.

This paper presents a validated model of the noise sensitivity observed in the Silicon Strip Detector-FEE of the CMS tracker that allows quantifying both the impact of the noise coupling mechanisms and the system immunity against electromagnetic interferences. This model has been validated based on simulations using finite element models and immunity tests conducted on prototypes of the Silicon Tracker End-Caps (TEC) and Outer Barrel (TOB) systems. The results of these studies show important recommendations and criteria to be applied in the design of future detectors to increase the immunity against electromagnetic noise.

#### I. INTRODUCTION

The *Silicon Tracker* is located in the interaction region of the calorimeter and comprises two parts: the inner one based on *Pixel detectors* and the outer one built with *Silicon micro-strip detectors*. In the *silicon tracking* system, the detector module is the basic functional component. Each module consists of three main elements:

- Single or double side silicon micro-strip sensors.
- Mechanical support (Carbon fibre frame).
- Readout front-end electronics (Hybrid circuit).

These modules are grouped, partially overlapped, in leaders and petals to cover several cylinders and end-caps of the tracker's mechanical structure. The hybrid module includes the sensitive front-end amplifier APV25 [1]. Power and slow control signals are distributed to the modules by a custom interconnection board (ICB).

The analysis presented in this paper is based on data measured on the Tracker End-Caps (TEC) and Tracker Outer-Barrel (TOB) detectors. The TEC prototype used to perform the EMC tests consisted of a '*petal*' with 96 APV25 chips and associated electronics distributed along one interconnection board (ICB). The TOB prototype used in the test consisted of a '*leader*' with 6 modules (about 28 APV25) distributed along the ICB. The tracker sub-detectors (TEC, TIB, TOB) use similar detector modules; the main differences among them are the geometric arrangement of the modules and the ICB design.

Based on the measurements to characterize the electromagnetic interference (EMI) immunity of the tracker FEE, this paper presents a model that describes and quantifies the interference coupling mechanism between the near-field radiated by the ICB and the sensitive areas of the detector module. This coupling mechanism was detected during the EMI immunity tests [2] and was a limiting factor of the FEE noise performance during the tracker integration.

#### II. EMI CHARACTERIZATION OF TRACKER

Since the tracker FEE is linked to the acquisition system via optical fibres, the conductive noise is mainly coupled into the FEE through the input power cables and the slow control network. To characterize the electromagnetic susceptibility of the FEE to conductive disturbances, different tests are conducted by injecting RF currents through the FEE input power and slow control cables. The main goal of these tests is to characterize the immunity of the system to RF perturbations [3] [4].

#### A. Test set-up

The experimental set-up is designed [5][6] such that, during the test, the FEE and the auxiliary equipment are in a configuration as close as possible to the final one. The perturbing signal is injected into the FEE input power and slow control cables using a bulk injection current probe, an RF amplifier and an RF signal generator. The level of the injected signal is monitored using an inductive current clamp and a spectrum analyser. The test procedure consists of injecting a sine-wave perturbing current at different frequencies and amplitudes into the FEE through the input cables and evaluating the performance of the FEE by measuring the output noise signal. The output signal of the FEE is measured by its own acquisition system. The frequency range of the injected RF signal is between 150 kHz and 50 MHz.

The data used in this paper to model the coupling mechanism between the ICB and the detector module correspond to the common mode (CM) noise injection. In this case, the perturbation sine-wave current is injected to both the active and return power cables. The sine wave injected will perturb the FEE by adding a noise component to the intrinsic thermal noise component of the APV25. The level of the signal injected is large enough to have a good signal-to-noise ratio at the input of the ADC without affecting the linearity of the overall FEE. The coupled interference to the FEE depends on the amount of noise current induced in the sensitive areas of the FEE.
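One common way to separate the injected component from the intrinsic noise is to assume that the two are uncorrelated, so that their RMS values add in quadrature. The sketch below applies that assumption; it is an illustration of the principle with hypothetical readings, not the authors' analysis procedure.

```python
import math

def coupled_noise_rms(total_rms, thermal_rms):
    """RMS of the injected component, assuming it is uncorrelated with the
    intrinsic thermal noise so that the two add in quadrature."""
    if total_rms < thermal_rms:
        raise ValueError("total noise cannot be smaller than thermal noise")
    return math.sqrt(total_rms**2 - thermal_rms**2)

def sine_amplitude_from_rms(rms):
    """Peak amplitude of a sine wave with the given RMS value."""
    return rms * math.sqrt(2.0)

# Hypothetical channel readings in ADC counts
coupled = coupled_noise_rms(total_rms=5.0, thermal_rms=3.0)
print(coupled)
```

This also shows why the injected level has to be large enough: when the coupled component is much smaller than the thermal noise, the difference of squares becomes dominated by measurement uncertainty.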

#### III. EMI CHARACTERIZATION - RESULTS

Results from tests conducted on prototypes of the TEC and TOB detectors give insight into the coupling mechanism between the noise source and the sensitive front-end electronics.

#### A. Tracker End Cap

Signal injection tests [2] have shown that the noise is not distributed equally among all the channels in the TEC petal tested. The strip channels located in the centre of the silicon detector are more sensitive than those located near the periphery. The detector-APV modules closer to the ICB and petal connector are also more sensitive. The sensitivity to noise or interference increases with frequency, starting from zero frequency, at 20-40 dB/dec and extends above the intrinsic bandwidth of the APV. The frequency response also includes two resonances associated with parasitic coupling between the ICB and ground connections.

#### B. Tracker Outer Barrel

Results [6] from the TOB showed a non-uniform distribution of the coupled perturbation among the channels of the rod under test. In this case, the strip channels located in the centre of the silicon detector are less sensitive, and the most sensitive detector-APV modules are those located close to the rod's input power connector. The frequency response of the coupled noise is similar to that of the TEC channels, with the exception of the resonance frequencies.

Additional tests were performed in TOB in order to analyze the origin and coupling mechanism of the interference. An un-grounded cooper sheet, isolated on one side, with an area approximately equal to <sup>1</sup>/<sub>4</sub> of the silicon detector area was used to screen partially the detector. Partial screening of different detector's areas with the cooper plate gave different results. Covering the areas remote of the pitch adapter did not introduce appreciable noise reduction for all the channels. When the cooper sheet covered the areas over the pitch adapter the interference coupling was null.

This set of tests identifies the pitch adapter region of the hybrid module and silicon sensor as the area susceptible to near-field interference generated by the ICB. The coupled field should be predominantly magnetic, because the effect of the shielding is negligible when the copper sheet covers the detector in areas remote from the pitch adapter. If the electric field were the major component of the coupled field, the attenuation due to the screening would be the same for all regions of the silicon detector.

### IV. NOISE COUPLING

To investigate the coupling mechanism between the front-end electronics/silicon detector and the noise generated by the ICB when common mode currents are flowing, we separated the study into three parts: noise source or field generation, sensitive area in the receptor, and signal processing in the receptor. The last part must be included in the analysis because the signal measured by the electronic system and used in the analysis is partially processed.

#### A. Noise Source – Near-Fields

The noise source is controlled in these tests because the interference injected into the ICB is known. The electric and magnetic fields around the ICB can be calculated via finite element simulation to obtain a detailed representation of the electric and magnetic near-fields perturbing the silicon detector and the front-end electronics. Results for the magnetic field in the TEC and TOB configurations are depicted in Figs. 1 and 2.



Figure 1: *A*: Magnetic field around the ICB (lower line) and Silicon Detector (upper line) for the TEC module. *B*: Normalized vertical magnetic field component  $(By(x)/B_{max})$  around the Silicon detector

The relative position between the ICB and the silicon detector is preserved in the analysis for both cases. The upper plots show a view of the x-y plane for a given location along z in the ICB and the silicon detector. The closed lines represent curves of constant magnetic induction in x-y. Using as reference the position along the width of the silicon detector (x axis, x=0 left edge, x=100 mm right edge), the lower plots depict the normalized magnitude of the vertical component of the magnetic field By(x) intersecting the silicon detector.

Fig. 1 corresponds to the TEC case, where the vertical component of the magnetic field intersecting the silicon detector has a maximum at the centre of the device (left edge of the ICB). To the left of the centre it decreases because of the distance, whereas to the right it decreases because the horizontal component of the magnetic field starts to dominate. For the TOB (Fig. 2), because the ICB and the silicon detector are mounted one over the other sharing the x symmetry line, the vertical component of the magnetic field is odd-symmetric with respect to x=40mm. Similar results can be obtained for the other near-field components around the silicon detector.



Figure 2: *A*: Magnetic field around the ICB (lower line) and Silicon Detector (upper line) for the TOB module. *B*: Normalized vertical magnetic field component  $(By(x)/B_{max})$  around the Silicon detector

#### *B. Sensitive area in the receptor*

In general, the critical area in the front-end electronics/silicon detector is the connection between the detector and the sensitive front-end electronics. In that connection, the signal level is the lowest in the whole electronic system. Additionally, front-end amplifiers have a large gain in order to process the tiny signals delivered by the HEP detector. The connection between the strips of the silicon detector and the multichannel front-end is simple. Each 512-channel detector is read out by 4 APVs with 128 channels each. Each strip is connected to the input pin of the corresponding APV through the wire bonding between the hybrid board and the pitch adapter. The return path that closes the signal circuit, however, has no direct connection to the hybrid board or APV chips. Currents return via the silicon detector backplane, but there is no direct connection between the backplane and the hybrid circuit. From the backplane, currents find the return path through the conductive carbon fibre [7] holder of the detector and reach the hybrid board through parasitic capacitances. This signal circuit can be better understood by following the electrical and mechanical schemes shown in Figs. 3 and 4. Fig. 3 depicts a simplified electric circuit of the input signal path, showing the main components that define the loop.



Figure 3: Simplified electric circuit describing the Silicon Detector-APV25 connection.

Signal currents return to the APV flowing through the backplane, the capacitive coupling between the backplane and the carbon fibre, the carbon fibre holder (carbon fibre legs and cross-piece) and the hybrid board. This path is mainly formed by parasitic elements in the circuit, so it is not the optimal path given the circuit sensitivity at that point. The main problem associated with the return current path in the pitch adapter area can be seen in the following figures.



Figure 4: Top view of the detector module: Silicon Detector & Hybrid module



Figure 5: Partial view of the pitch adapter area

Fig. 4 shows the top view of the tracker module. From Figs. 3 and 4 it is possible to observe that the lowest impedance path for the current around the pitch adapter runs through the edges of the holder structure, as indicated by the dotted lines and arrows. This loop arises because there is no direct connection between the hybrid's 0V layer and the detector backplane. It increases the susceptibility of the circuit to vertical magnetic fields (By). A more detailed drawing of that area is shown in Fig. 5, which defines a rectangular loop whose length is equal to the silicon detector width (wd) and whose width (e) is equal to the gap between the backplane and the hybrid board (around 0.6 mm). This loop is common to the 512 channels of the strip silicon detector.

#### C. APV Signal Processing

The APV [1] is a charge amplifier followed by a shaper, able to amplify and process current signals from the strip detector up to a frequency of around 10 MHz. This processing is the same for all 128 channels included in the chip. Additionally, each chip includes a common mode (CM) subtraction to reduce the common mode noise induced in the 512 channels of the detector. Therefore, the signal recorded for each channel and used for analysis is not proportional to the input current of that channel alone; it also includes the coupling of all the other channels of the same APV due to the common mode subtraction. Defining the input current per APV channel as  $i_{APVi}(t)$  with i=1,2...128 being the channel number, the output signal  $v_{oi}(t)$  is

$$v_{oi}(t) = \int_{-\infty}^{\infty} h_{APV}(t-\tau) \left[ i_{APVi}(\tau) - \frac{1}{128} \sum_{j=1}^{128} i_{APVj}(\tau) \right] d\tau$$
(1)

where  $h_{APV}(t)$  is the impulse response of each APV channel. Each APV subtracts the common mode level corresponding to 128 channels of the detector, shifting the corresponding DC level of those 128 output signals. For a given silicon detector, the CM level for adjacent APVs can be different.
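As an illustration of the bracketed term in Eq. (1), the following minimal sketch subtracts, for each group of 128 channels, the mean of that group from every channel in it. The function name, the list-based data layout and the linear ramp used as input are assumptions made for this example only; the shaping by $h_{APV}$ is not modelled.

```python
from statistics import fmean

N_CHANNELS = 512   # strips per silicon detector (from the text)
CH_PER_APV = 128   # channels read out by each APV (from the text)

def common_mode_subtract(samples, group=CH_PER_APV):
    """Subtract from each channel the mean of the 128-channel group it belongs to.

    `samples` holds one value per strip for a single time instant.
    """
    if len(samples) % group:
        raise ValueError("number of channels must be a multiple of the group size")
    out = []
    for start in range(0, len(samples), group):
        block = samples[start:start + group]
        cm_level = fmean(block)                 # one common mode level per APV
        out.extend(x - cm_level for x in block)
    return out

# Hypothetical input: a linear ramp across the detector, for illustration only.
ramp = [k / (N_CHANNELS - 1) for k in range(N_CHANNELS)]
corrected = common_mode_subtract(ramp)
```

Because each APV removes its own mean, a smooth input such as this ramp acquires a step at every APV boundary (channels 128, 256 and 384), which is the DC shift between adjacent APVs mentioned above.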

#### V. COUPLING MECHANISM MODEL

Based on the analysis presented in the previous section, the magnetic interference coupled into the input signal loop can be quantified using the simple model depicted in Fig. 6.



Figure 6: Simplified circuit

The loop shown in Figs. 4 and 5 can be considered as a short transmission line formed by two conductors (one is the backplane, the other is the CF cross-piece) with capacitive loads at both ends (the mechanical joints between the CF legs and the CF cross-piece) and illuminated by a perpendicular magnetic field By. Between the conductors, the input impedance of the APV is connected through a series capacitor that represents the capacitance between the silicon strip and the backplane. The circuit depicted shows only one of the APV channels, and the equivalent voltage generators represent the voltage induced by the magnetic field to the left and right of the APV channel considered. These equivalent voltages can be expressed as

$$v_{i1}(t) = e \int_{1}^{i} \dot{B}_{y}(t,x)\, dx, \quad v_{i2}(t) = e \int_{i}^{512} \dot{B}_{y}(t,x)\, dx$$
 (2)

with e: the width of the loop or separation between conductive surfaces (Fig.5) and  $\dot{B}_y(t,x) = \frac{dB_y(t,x)}{dt}$ . The current flowing through the input impedance of the APV (input current) in frequency domain is

$$I_{APVi}(\omega) = \frac{V_{i1}(\omega) - V_{i2}(\omega)}{Z_{lg} + Z_{APVi}}$$
(3)

Based on the magnetic induction By(x) calculated previously for the TEC and TOB configurations, it is possible to evaluate the APV's input current for the 512 channels of the silicon detector. Solving (2) and (3) for a given time t and assuming direct proportionality between  $I_{APV}$  and the voltage difference, the input current per channel for both detectors is plotted in Fig. 7. The dotted red lines separate the channels processed by each APV.
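The evaluation described above can be sketched numerically as follows. This is a minimal illustration, assuming that the left-hand voltage integrates the field derivative over strips 1 to i and the right-hand voltage over strips i to 512, as the description of the two generators suggests, and that the integrals are replaced by sums over strips. The two field profiles are hypothetical shapes chosen only to show an even and an odd symmetry; they are not the simulated fields of Figs. 1 and 2.

```python
import math

N_CHANNELS = 512

def normalized_input_current(b_dot):
    """Return I_APVi / max|I_APV| for a sampled dB_y/dt profile at one time instant.

    b_dot[k] is dB_y/dt at strip k+1. The loop width e and the impedances of
    Eq. (3) are common factors, so they cancel in the normalisation.
    """
    total = sum(b_dot)
    currents = []
    left = 0.0
    for value in b_dot:
        left += value                    # discrete integral over strips 1..i
        right = total - left + value     # discrete integral over strips i..512
        currents.append(left - right)    # proportional to V_i1 - V_i2
    peak = max(abs(c) for c in currents)
    return [c / peak for c in currents]

# Hypothetical profiles, for illustration only.
centre = (N_CHANNELS - 1) / 2
peaked = [math.exp(-((k - centre) / 100.0) ** 2) for k in range(N_CHANNELS)]  # even about the centre
odd = [(centre - k) / centre for k in range(N_CHANNELS)]                       # odd about the centre

i_peaked = normalized_input_current(peaked)
i_odd = normalized_input_current(odd)
```

Under these assumptions, a field profile that is even about the detector centre yields a current distribution that is odd about the centre, and an odd profile yields an even one.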



Figure 7: APV normalized current distribution per channel  $I_{APVi}/I_{APV \max}$  - A: TEC Detector, B: TOB Detector

Considering the effect of the common mode subtraction included in each APV-25, the signal proportional to the output voltage is plotted in Fig 8.



Figure 8: Digitized APV output voltage distribution per channel after common mode subtraction -*A*: TEC Detector, *B*: TOB Detector

#### VI. RESULTS – COMPARISON BETWEEN MEASUREMENTS AND SIMULATIONS

In order to compare the measurements with the simulation results based on the model presented, a particular analysis of the recorded data was conducted. If a common mode current is injected into the ICB at a particular frequency, the measurements and simulation results should give the same voltage distribution for all the APV channels at any time. For a particular time instant, the injected current is constant and the APV output voltages should follow a pattern across the channels like those depicted in Fig. 8. Setting three different time instants along the injected sine-wave signal, t = t1 coincident with the positive peak, t = t2 coincident with the zero crossing and t = t3 coincident with the negative peak, the simulated output voltage for all 512 channels is depicted in Fig. 9A for the TEC detector. Fig. 9B shows the measurements of the same output voltages at the same sampling times. It is important to observe the similarity between the measurements and the simulation results. Comparing the simulation and the measurements for the TOB detector gives similar results, which are depicted in Fig. 10.

If the digitized output voltages of the APVs are further processed to obtain the root-mean-square (RMS) value of each channel, the voltage distribution over the detector channels changes. Mainly, negative values in the previous plots (Figs. 9 and 10) become positive when the RMS value is calculated. Figs. 11B and 12B depict the measured RMS output voltage of the APVs when a perturbing common mode current is injected through the ICB. The base noise in those plots is defined by the intrinsic thermal noise of the APVs. The plots showing the RMS output voltage versus the 512 channels of the silicon detector have a particular shape that the CMS collaboration calls "wings". Proceeding with the calculations based on the model and simulation, the resulting RMS output voltage is depicted in Figs. 11A and 12A.
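The effect of the RMS calculation on the sign of a channel can be illustrated with a short sketch. It assumes that each channel output is a sine wave of signed amplitude sampled uniformly over one full period, and that the thermal noise is uncorrelated with the injected signal so that the two add in quadrature; both are assumptions of this example, and the function names are hypothetical.

```python
import math

def rms(values):
    """Root-mean-square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def channel_rms(amplitude, noise_rms, n_samples=1000):
    """RMS of amplitude*sin(wt) over one period, combined in quadrature with noise.

    Assumes the noise is uncorrelated with the injected sine wave.
    """
    wave = [amplitude * math.sin(2 * math.pi * n / n_samples) for n in range(n_samples)]
    return math.sqrt(rms(wave) ** 2 + noise_rms ** 2)
```

With these assumptions a channel with amplitude -A and one with amplitude +A give the same RMS value of |A|/sqrt(2) when the noise is zero, and a channel that receives no interference shows only the noise floor.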



Figure 9: APV Digitized output voltage distribution per channel after common mode subtraction for TEC detector- t=t1 (red), t=t2 (green), t=t3 (blue) A: (simulated values) - B: (measured values).



Figure 10: Digitized APV output voltage distribution per channel after common mode subtraction for TOB detector- t=t1 (red), t=t2 (green), t=t3 (blue)A: (simulated values)- B : (measured values)


Figure 11: APV RMS output voltage distribution per channel after common mode subtraction for TOB detector -A (*upper*): (simulated values)-*B* (*lower*): (measured values)



Figure 12: APV RMS output voltage distribution per channel after common mode subtraction for TEC detector -A(upper): (simulated values)-*B* (lower): (measured values)

## VII. CONCLUSIONS – SLHC EFFECTS

A model of the coupling mechanism between the interference currents flowing in the ICB and the detector module has been analyzed. Simulation results based on the model agree with measurements on prototypes of two different CMS tracker systems. These studies and the model suggest that the immunity of the tracker module can be improved by minimizing the signal return loop around the pitch adapter. An integral mechanical design that connects the silicon detector backplane and the hybrid board reference will force the return current to flow beneath the signal current, minimizing the input signal loop. Another important point is that the basic coupling is due to near electromagnetic fields; minimizing, by design, the field radiated by the ICB will therefore improve the overall immunity of the tracker's ladder or petal. Additionally, filtering the interference currents at the input terminals of the distribution board will reduce the magnitude of the radiated fields.

## VIII. ACKNOWLEDGMENT

The authors would like to thank Dr. Peter Sharp from Imperial College/CERN for helping us during the development of these studies. Also, we would like to thank the Instituto Tecnológico de Aragón (ITA), Zaragoza, Spain, and especially Dr. J.L. Pelegay, head of Grupo de Investigación Aplicada (G.I.A.), for the support of this work. Finally, one of us (C.R.) wants to thank the US DOE, under contract DE-AC02-76SF00515, for the support of this work.

## IX. REFERENCES

[1] M. Raymond et al., "*The APV25 0.25 μm CMOS readout chip for the CMS Tracker*", Proc. IEEE Nuclear Science Conference, October 2000, Lyon, France, pp. 9/113 - 9/118.

[2] F.Arteche, C. Rivetta, C. Esteban et al "*Detector noise susceptibility issues for the future generation of High Energy Physics Experiments*", Proceedings of Workshop on Electronics for LHC Experiments – TWEPP 2008, Vol 1, pp533-538, September 2008.

[3] F. Arteche, C. Rivetta and F. Szonsco, "*Electromagnetic Compatibility Plan for the CMS Detector at CERN*", Proc. of 15th Int. Zurich Symposium on EMC, February 18-20, 2003, Zurich, Switzerland, pp. 533-538.

[4] F.Arteche and C. Rivetta, "*Electromagnetic Compatibility Test for CMS experiment*", Proceedings of Workshop on Electronics for LHC Experiments – LECC 2002, Vol 1, pp191-196, September 2002.

[5] F. Arteche and C. Rivetta "*EM Immunity studies for frontend electronics in high-energy physics experiments*", Proc. of Int. Symposium on EMC, EMC Europe 2004. September 2004, Eindhoven, The Netherlands, pp. 533-538.

[6] F.Arteche, C. Rivetta,"EMC phenomena in High Energy Physics Experiments: Prevention & Cost savings", Proceedings International Symposium on the Development of Detectors for Particle and Synchrotron Radiation Experiments – SNIC 2006: pp 0149. SLAC-R-842, April 2006

[7] M. Johnson et al., "*Electrical properties of carbon fiber support systems*", Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, Vol. 550, Issues 1-2, Sept 2005, pp. 127-138.


# Development and commissioning of the ALICE pixel detector control system

C. Bortolin<sup>a,\*</sup>, I. A. Calì<sup>b</sup>, R. Santoro<sup>b</sup>, C. Torcato Matos<sup>c</sup> on behalf of the Silicon Pixel Detector project in the ALICE collaboration

<sup>a</sup> University of Udine and I.N.F.N. of Padova (Italy) <sup>b</sup> Dipartimento di Fisica dell'Università and sezione I.N.F.N. of Bari (Italy) <sup>c</sup> CERN, Geneva, CH-1211 Geneva 23, Switzerland

\*claudio.bortolin@cern.ch

#### Abstract

The Silicon Pixel Detector (SPD) is the innermost detector of the ALICE Inner Tracking System and the closest one to the interaction point. In order to operate the detector in a safe way, a control system was developed in the framework of PVSS which allows a large number of parameters, such as temperatures, currents and voltages, to be monitored and controlled.

The control system of the SPD implements interlock features to protect the detector against overheating and to prevent its operation in case of malfunctions. The nearly 50,000 parameters required to fully configure the detector are stored in a database which automatically creates a new configuration version after a calibration run has been carried out. Several user interface panels were developed to allow experts and non-expert shifters to operate the detector in an easy and safe way.

This contribution provides an overview of the SPD control system.

#### I. THE SILICON PIXEL DETECTOR

The SPD is based on hybrid silicon pixel technology and contains around 9.8 M read-out channels. It is composed of 120 half-staves (HS) mounted on 10 carbon fibre supporting sectors (Fig. 1). Each half-stave is made of two ladders, a Multi Chip Module (MCM) and an aluminium-polyimide multilayer bus. Each ladder consists of 5 front-end chips flip-chip bonded to a 200 µm thick silicon sensor [1].



Figure 1: The Silicon Pixel Detector

The MCM constitutes the on-detector electronics and performs operations such as clock distributions, data multiplexing, etc. The multilayer bus provides the connection between the MCM and the front-end chips, while communication between the MCM and the off-detector electronics (Routers) is assured by three single-mode optical fiber links.

The SPD low voltage power supply (PS) system is based on 20 CAEN A3009 dc-dc converter modules (1 for each half sector) housed in 4 CAEN Easy3000 crates located about 40 m from the detector. The sensor bias voltage is provided by 10 CAEN A1519 modules (1 for each sector) housed in a CAEN SY1527 mainframe 100 m away from the detector.

#### II. OVERVIEW OF THE DETECTOR CONTROL SYSTEM

The DCS plays a leading role in operating the SPD and fulfils very stringent requirements. The ALICE Detector Control System (DCS), like those of all the LHC experiments, is supervised by a SCADA (Supervisory Control and Data Acquisition) system based on a software platform called PVSSII [2].

The aim of every control system is to supervise all the operations carried out in its structure and to react promptly in case of misbehaviour. The ALICE DCS group, in collaboration with every detector, defined a series of constraints to integrate the control system of each sub-detector into a single control system. The DCS of the SPD was designed according to these requirements. Standard components were mainly used to reduce maintenance efforts and, in a few cases, dedicated components were developed for specific and innovative tasks.

The block diagram shows the connections between the hardware and software components (Fig. 2).



Figure 2: Detector control system scheme

There are 4 sub-control systems: PS control, Interlock control, Cooling Control and the Front-End driver (FED) control. The first three directly communicate with the hardware via Ethernet (TCP/IP, OPC protocol), while the last one uses the same protocol to connect to the FED, which is the driver that communicates with the off-detector electronics (20 routers) via a VME bus.

PVSS provides the interface between the hardware components and the control logic units which are supervised by the Finite State Machine (FSM).

The FSM connects the defined logical states of the detector components and sends macroinstructions (e.g. Go Off, Go Ready, etc.) via PVSS: a macroinstruction is a sequence of operations addressed to the hardware. The correct sequence of actions is checked and possible errors are detected. The FSM also has the task of providing the ALICE DCS with information regarding the status of the SPD (e.g. Ready for data-taking, calibrating, etc.).

PVSS is designed for slow control applications and it is not suitable for direct control of fast front-end electronics.

The FED was built to interface the PVSS layer with the SPD electronics. It controls two Front-end device servers (C++ based) [3], it receives macroinstructions and autonomously operates the front-end/off-detector electronics. It is provided with an Oracle database interface that operates the whole SPD configuration. The FED can read and save all the required parameters in the Configuration Oracle Database (CDB).

The DCS of the SPD is divided into 4 different PVSSII projects which run on 4 computers: 3 working nodes and 1 operator node. These projects constitute a distributed system and they communicate with one another over Ethernet (TCP/IP). The working nodes are installed on 3 Windows XP machines, accessible only to expert users; they are the computers used to control the detector. The User Interface (UI) is installed on the operator node, which runs Windows Server 2003 and where all users can log in.

The UI is the graphical interface tool that allows users to monitor and manage the detector safely.

#### A. Finite state machine

The Finite State Machine is based on a State Management Interface (SMI++) [3] and it is the logical part of the DCS (Fig. 3) since it manages the starting, intermediate and final states of the subsystems and it also reacts according to their changes (Ready, McmOnly, Error, Off...).



Figure 3: SPD Finite State Machine scheme

The hardware system is represented by the Device Units (DU). It contains 862 DUs which transmit their state to 31 Control Units (CU) through 365 Logical Units (LU). Therefore, any signal is transmitted from the bottom to the top and actions are carried out according to the kind of signal.

All information passes through the Top Node which communicates with the ALICE control system.

The CUs are connected to background scripts which constantly check the system stability and allow or forbid the execution of commands sent by operators. Conversely, if a DU changes its state, the information is sent to the upper nodes through the LUs and the CUs modify their state accordingly.

The CUs allow operators to intervene only on the concerned parts of the detector without affecting its global operating state. Further details are available in the bibliography [4].
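The bottom-up propagation of states can be sketched with a small tree of nodes. This is only an illustration: the class, the state spellings and the rule that a parent reports the most severe state found below it are assumptions of this example and are not taken from SMI++ or from the SPD implementation.

```python
# Hypothetical severity ranking: a parent reports the "worst" state below it.
SEVERITY = {"READY": 0, "MCM_ONLY": 1, "OFF": 2, "ERROR": 3}

class Node:
    """A node of the control tree: a device unit (leaf) or a logical/control unit."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self._state = "OFF"   # only meaningful for leaves (device units)

    def set_state(self, state):
        if self.children:
            raise ValueError("only device units carry their own state")
        if state not in SEVERITY:
            raise ValueError(f"unknown state: {state}")
        self._state = state

    @property
    def state(self):
        if not self.children:
            return self._state
        # Derived state: the most severe state among the children.
        return max((c.state for c in self.children), key=SEVERITY.__getitem__)

# Toy tree: 2 device units -> 1 logical unit -> 1 control unit.
du_a, du_b = Node("du_a"), Node("du_b")
lu = Node("lu", [du_a, du_b])
cu = Node("cu", [lu])

du_a.set_state("READY")
du_b.set_state("READY")
state_all_ready = cu.state
du_b.set_state("ERROR")
state_after_fault = cu.state
```

In this toy tree the control unit reports READY while both device units are ready and switches to ERROR as soon as one of them reports an error.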

## B. Cooling system and interlocks

The cooling system is essential and must be fully operational for the SPD to run efficiently. Since every HS dissipates  $\approx 12.5$  W (corresponding to about 1.5 kW for the whole detector) and the detector has a very low mass, a  $C_4F_{10}$  evaporative cooling system was chosen.

The cooling plant is controlled by a PLC. An OPC server-client protocol is used to communicate via Ethernet (TCP/IP) with the control PC.

Should the cooling system be suddenly switched off, the temperature of the half-staves would rise at about 1°C/s and this would irreparably damage the detector in a few seconds. Several interlock levels (hardware and software) act in parallel to switch off the power supply in case of misbehaviour.

Two redundant chains of 5 temperature sensors (Pt1000) each, installed on the upper surface of all half-staves, provide the relevant temperature for the interlock system. One chain is directly connected to the PLC analog input modules, while the second one is read out through the MCM, which transmits the temperature values to the routers. If a half-stave exceeds a threshold temperature value (usually set at 40°C), the above-mentioned hardware interlocks act on the Low Voltage (LV) board, switching off the concerned half-sector.

The cooling plant is provided with a further interlock system composed of 11 hardware interlock channels. One of them acts on all the low voltage modules and switches off the whole detector in case of cooling plant failure (e.g. pressure instability, high temperatures in the boosters, etc.). The other ten channels instantly switch off only the corresponding single sector if the relevant cooling line is faulty.

The main difference between hardware and software interlocks is their reaction time. The hardware interlocks react instantly, switching off the power supply of the whole detector or, in case of a temperature spike in a HS, the power supply of the half-sector that hosts it. Two levels of software interlocks were implemented; their reaction time is slower than that of the hardware interlocks because they are transmitted through the communication protocols of the control system. Nonetheless they offer a great advantage since they can act on single half-staves. The first software interlock level forbids operators to switch on a HS if the measured temperature is higher than 22°C; the second level switches off single HSs if their temperature exceeds 38°C. These operations are carried out by a background script that acts individually on the channels of the LV supply module.
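The temperature thresholds quoted in this section can be summarised in a short decision sketch. The threshold values are taken from the text; the function names, the returned strings and the idea of evaluating both interlock types in a single function are simplifications made for illustration, since in the real system the hardware and software interlocks act independently and in parallel.

```python
SW_SWITCH_ON_LIMIT_C = 22.0    # first software level: no switch-on above this
SW_SWITCH_OFF_LIMIT_C = 38.0   # second software level: switch off the half-stave
HW_INTERLOCK_LIMIT_C = 40.0    # usual hardware threshold: switch off the half-sector

def may_switch_on(temperature_c):
    """First software interlock level: allow switch-on only at or below 22 C."""
    return temperature_c <= SW_SWITCH_ON_LIMIT_C

def interlock_action(temperature_c):
    """Return the most drastic action triggered by a half-stave temperature."""
    if temperature_c > HW_INTERLOCK_LIMIT_C:
        return "hardware: switch off half-sector"
    if temperature_c > SW_SWITCH_OFF_LIMIT_C:
        return "software: switch off half-stave"
    return "none"
```

For example, a half-stave at 39°C would be switched off individually by the software interlock, whereas at 41°C the hardware interlock would remove power from the whole half-sector.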

#### C. User Interface

Simplified panels, in accordance with the ALICE UI framework, are provided to monitor and operate the detector. The detector is schematically reproduced in the middle of the UI; each color is associated with a hardware condition (green = Ready state, red = Error state, ...).

Detailed panels showing the values of hardware subsystems are also available (Fig. 4).



Figure 4: User Interface panels

The temperatures, voltages and currents of the HSs, as well as the main cooling plant operation parameters are archived in a dedicated database and can be displayed in tables and plots or saved in text files for offline analysis.

#### D. PVSS-Database Connection

The system configurations are stored in an Oracle-based database named Configuration Database (CDB). The ALICE DCS group provides the infrastructure and the CDB maintenance; however, every detector can autonomously choose how to manage its own database.

The SPD configurations are stored in the CDB and each version contains 52,800 DAC values.

The database design is an optimized structure that creates new configuration versions without data duplication. When DAC values are changed, the DB generates a new configuration version number and the full configuration is retrieved by the FED through the rearrangement of the pointers (Fig. 5).
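The idea of versioning through pointers rather than copies can be sketched as follows. This is a minimal in-memory illustration, not the CDB schema: the class name, the grouping of values per half-stave and the identifiers are all assumptions of this example.

```python
class ConfigStore:
    """Versioned configuration store that shares unchanged blocks between versions."""

    def __init__(self, initial):
        # blocks: block id -> tuple of DAC values (stored once, never modified)
        self.blocks = {}
        # versions: list of {half-stave id -> block id} pointer maps
        self.versions = [{hs: self._store(v) for hs, v in initial.items()}]

    def _store(self, values):
        block_id = len(self.blocks)
        self.blocks[block_id] = tuple(values)
        return block_id

    def new_version(self, changes):
        """Create a new version; only the changed half-staves get new blocks."""
        pointers = dict(self.versions[-1])        # copy the pointers, not the data
        for hs, values in changes.items():
            pointers[hs] = self._store(values)
        self.versions.append(pointers)
        return len(self.versions) - 1

    def load(self, version):
        """Rebuild the full configuration of a version by following its pointers."""
        return {hs: self.blocks[b] for hs, b in self.versions[version].items()}

# Hypothetical two-half-stave example.
store = ConfigStore({"HS0": [10, 20], "HS1": [30, 40]})
v1 = store.new_version({"HS1": [31, 40]})
```

Here version 1 stores one new block for HS1 and reuses the block of HS0 from version 0, so three blocks serve two complete configurations.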



Figure 5: Saving a configuration structure

A direct connection between PVSS and the CDB was implemented to control the working configurations stored in the CDB in a more efficient way (Fig. 6). By using library functions in the framework of PVSS and adjusting them to the CDB structure, it is now possible to issue predefined queries to the SPD CDB and to show their results in the UI panels. In this way the various configurations can be accessed directly. In particular, some panels were implemented to compare the configurations of different HSs, or those of the same HS, in order to monitor how it worked during a given time slot.

Recently we have developed a hardware system that detects errors coming from the detector (e.g. optical connection status and data format errors, front-end and back-end errors, wrong trigger sequences, etc.) [5], even while data are being taken. The routers send such errors to the FED, which carries out an online polling operation and stores them in a dedicated table in the DB. As soon as the FED detects any errors in the routers, it generates a signal that is transmitted to the operator via PVSS. The direct connection between PVSS and the database therefore provides the operator with information concerning such errors without interfering with data taking operations.



Figure 6: Schematic of the PVSS-DB connection

This improvement made the storing procedure more user friendly. Besides, thanks to the panels containing the predefined queries operators can get real-time information regarding the detector state and therefore be able to monitor it in a more useful and efficient way.

## **III. FINAL COMMISSIONING AND TESTS**

In 2007 the DCS pre-commissioning was carried out at the CERN Departmental Silicon Facility (DSF) laboratory.

The control system was moved to the ALICE Control Room (ACR) at the end of 2007 and since then the commissioning has continued with improved automatic functionalities that allow experts and shifters to operate and monitor the detector in an increasingly effective way.

Since then, any further hardware and software developments are first tested in the setup maintained in the DSF.

The efficiency of the interlocks was proven when they promptly reacted by switching off the SPD during normal operations as a consequence of external alarms.

The SPD control system is completely integrated in the ALICE DCS from which the detector can be monitored and directly operated.

During the cosmic runs that took place in 2008, the SPD collected data for more than 200 hours. In August 2009 the SPD took part in the ALICE cosmic program with magnetic field, providing the L0 trigger signal for about 280 hours.

## **IV. REFERENCES**

[1] ALICE Collaboration, ALICE Inner Tracking System (ITS): Technical Design Report, CERN-LHCC-99-012, http://edms.cern.ch/file/398932/1. ALICE Collaboration, The ALICE experiment at the CERN LHC, JINST 3 S08002 (2008).

[2] http://itcobe.web.cern.ch/itcobe/Services/Pvss/Documents/PvssIntro.pdf

[3] I.A. Cali, The ALICE Silicon Pixel Detector Control and Calibration Systems, 2008 CERN-THESIS-2008-038.

[4] SMI++ Manual, http://smi.web.cern.ch/smi/.

[5] M. Caselle, The Online Error Control and Handling of the ALICE Pixel Detector, Proceedings of TWEPP 2009.

# Upgrade of the BOC for the ATLAS Pixel Insertable B-Layer

J. Dopke<sup>a</sup>, T. Flick<sup>a</sup>, T. Heim<sup>a</sup>, A. Kugel<sup>b</sup>, P. Mättig<sup>a</sup>, N. Schroer<sup>b</sup> and C. Zeitnitz<sup>a</sup>

<sup>a</sup> University of Wuppertal, Germany <sup>b</sup> University of Heidelberg, Germany

jens.dopke@cern.ch

## Abstract

The phase 1 upgrade of the ATLAS [1] pixel detector will be done by inserting a fourth pixel layer, together with a new beampipe, into the current three-layer detector. This new detector, the Insertable B-Layer (IBL), should be integrated into the current pixel system with as few changes in services as possible, but deliver some advantages over the current system.

One of those advantages will be a new data transmission link from the detector modules to the off-detector electronics, requiring a re-design of the electro-optical converters on the off-detector side. First ideas of how to implement those, together with some ideas to reduce cost by increasing the system's throughput, are discussed.

# I. REQUIREMENTS OF THE UPGRADE

In terms of readout, the IBL will be run as part of the current pixel detector subsystem. Hence the IBL subsystem will have to be compatible with the pixel subsystem in terms of software integration and connectivity with other ATLAS systems.

# A. Integration into the current ATLAS pixel readout structure

The off-detector side of the ATLAS pixel detector readout is a VME based system. It delivers a maximum data rate of 160 MB/s (per building block) to the higher level readout systems. 16 building blocks can be integrated into one readout crate and controlled by a Single Board Computer (SBC), using the VME bus interface.

A building block of the readout system is composed of a pixel ReadOut Driver (ROD) and a pixel Back Of Crate card (BOC). The ROD can send commands to and receive data from a maximum of 32 modules via the BOC. It is equipped with four floating point Digital Signal Processors (DSP) to reduce and evaluate calibration data. During data taking, data received from the modules is sent out through a high-speed interface on the BOC, the SLink. The VME bus is only used for configuration and calibration communication (e.g. histogram download).

The BOC is an I/O board to the ROD, carrying electro-optical converters (cf. [2]). It adds delays to the sent signals to adjust the detector's phase against the LHC bunch crossing, and to the returned data to align it with the off-detector clocks. The SLink interface is attached to the BOC as a mezzanine card and controlled by the ROD only. A special feature of the Pixel<sup>1</sup> BOC is the decoding of 80 MBit/s streams into two 40 MBit/s streams, which is an input requirement of the ROD.
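The rate reduction from one 80 MBit/s stream to two 40 MBit/s streams can be illustrated with a simple demultiplexer. The text does not state how the bits are distributed between the two outputs; the sketch below assumes, purely for illustration, that successive bits alternate between them. Function names and the sample bit pattern are hypothetical.

```python
def split_stream(bits):
    """Split one serial stream into two half-rate streams by alternating bits.

    Assumption: even-indexed bits go to the first output, odd-indexed bits
    to the second. The real BOC mapping may differ.
    """
    return bits[0::2], bits[1::2]

def merge_streams(first, second):
    """Inverse operation, used here only to check that no bit is lost."""
    merged = []
    for a, b in zip(first, second):
        merged.extend((a, b))
    return merged

stream_80 = [1, 0, 1, 1, 0, 0, 1, 0]
stream_a, stream_b = split_stream(stream_80)
```

Each output carries half of the bits, so it runs at half the input rate, and interleaving the two outputs recovers the original stream.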

Software interfaces have been written for the existing system to run calibration scans, generate histograms and start data taking. Part of this software package is the firmware running inside the ROD DSPs. It can control all readout hardware, generate configurations for the system automatically, and was checked for consistent results during a long calibration phase. Most of this software should be kept as is for the IBL system. Of particular importance is the firmware of the DSPs on the readout drivers, which does most of the calibration scan control, data analysis and histogramming. Hence the pixel RODs should not undergo changes unless they are strictly needed or can be made without changing the DSP code. From the software point of view, they should be re-used for the IBL system.

## B. Upgrades needed for the IBL

The Insertable B-Layer will suffer higher occupancy due to its smaller distance from the interaction point, hence a higher readout rate per detector area will be needed. As single frontends (two per IBL module) are read out, a single transmission line only has to transfer a quarter of the former area. Estimates for the IBL frontend data rate assume more than 80 MBit/s. This will be served with 160 MBit/s readout via a single fibre, as opposed to two 80 MBit/s links in the existing B-Layer.

Balanced encoding is foreseen for the coming system to allow for automated threshold adjustments and clock reconstruction in the off-detector electronics. 8B10B encoding will therefore be integrated into the next on-detector readout chip. This allows the use of commercial solutions for clock-data recovery (CDR) and the implementation of simple failure checks via parity control. The receiver can automatically sense the average light level per transmission line, and per-channel monitoring can give direct status information.

The BOC *must* be upgraded to handle the new data rate and, in the interest of keeping things simple, rescale it for the ROD's input. A decoder will be integrated into the BOC, as the BOC has to do the CDR in order to change the data rate. This implies a reduction of the data rate to 144 MBit/s: 10-bit words transferred at 160 MBit/s are decoded into 8 data bits and a single status bit, which marks the special k-words of the code. This needs either an adaptation on the ROD side to read data from the BOC asynchronously, or a conceptual change that moves the first registration of data into the BOC (thereby removing the redundancy of the status bit) and reads it from the ROD side as an input FIFO.
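The rate arithmetic above can be checked with a short sketch. The function and constant names are illustrative assumptions and not part of the BOC design; only the 160 MBit/s line rate, the 10-bit code words, the 9 decoded bits (8 data bits plus one status bit) and the 40 MBit/s ROD input rate are taken from the text.

```python
# Sketch of the decoded data rate after 8B10B decoding on the BOC.
# All names are hypothetical; the numbers come from the text.

LINE_RATE_MBIT = 160      # encoded rate on the fibre
CODE_WORD_BITS = 10       # bits per 8B10B code word
DECODED_BITS = 9          # 8 data bits + 1 status bit per code word
ROD_INPUT_RATE_MBIT = 40  # rate of one ROD input stream


def decoded_rate_mbit(line_rate_mbit, code_bits=CODE_WORD_BITS,
                      out_bits=DECODED_BITS):
    """Rate after each code word is replaced by its decoded bits."""
    return line_rate_mbit * out_bits / code_bits


rate = decoded_rate_mbit(LINE_RATE_MBIT)
remainder = rate % ROD_INPUT_RATE_MBIT

print(f"decoded rate: {rate:.0f} MBit/s")
print(f"remainder against 40 MBit/s: {remainder:.0f} MBit/s")
```

The non-zero remainder illustrates why the decoded stream cannot simply be clocked into the ROD synchronously at a multiple of 40 MBit/s.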

<sup>1</sup>ATLAS pixel uses the same readout system structure and hardware as the SemiConductor Tracker (SCT) with small modifications.



Figure 1: Schematic view of the proposed IBL BOC Layout

The Timing, Trigger and Control (TTC) path will use the same encoding standard as the existing ATLAS pixel detector, BiPhase Mark (BPM) encoding. The signal can either be encoded by the currently used transmission IC, the BPM-12, or the encoding can be implemented in programmable logic. In the IBL system, two frontends (one module) will share a common TTC link.
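As an illustration of the encoding principle, the following minimal sketch implements generic biphase mark coding: the line level toggles at the start of every bit cell, and toggles again in the middle of the cell when the bit is 1. It is an assumption that this textbook convention matches the behaviour of the BPM-12; the function names and the half-cell representation are hypothetical and chosen for illustration only.

```python
# Generic biphase mark (BPM) coding sketch. Each bit cell is represented
# by two half-cell line levels. Not a model of the BPM-12 chip itself.

def bpm_encode(bits, start_level=0):
    """Return the list of half-cell line levels for the given bits."""
    level = start_level
    halves = []
    for bit in bits:
        level ^= 1            # transition at the start of every bit cell
        halves.append(level)
        if bit:
            level ^= 1        # extra mid-cell transition encodes a 1
        halves.append(level)
    return halves


def bpm_decode(halves):
    """Recover the bits: a 1 is a cell whose two halves differ."""
    return [int(halves[i] != halves[i + 1]) for i in range(0, len(halves), 2)]


encoded = bpm_encode([0, 1, 1, 0])
decoded = bpm_decode(encoded)
print(encoded, decoded)
```

Because every cell starts with a transition, the receiver can recover the clock from the data stream, which is what makes a reprogrammable implementation of the same block attractive.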

#### II. CONCEPTUAL LAYOUT

Following the requirements, a first schema (cf. Figure 1) for the IBL BOC has been decided on, which allows for maximum flexibility in implementation of other components. Additional features that seemed missing in the previous system have also been included into the new schema.

## A. Fulfilling the needs

The core of the IBL BOC is a large programmable device, which connects to every data-path element of the BOC: optical receivers (RX), optical transmitters (TX) and Higher Level Trigger (HLT) connections. Additionally, most of the backplane connections will be fed into it. Final layout decisions can thereby be implemented in firmware when the IBL system goes into the production stage.

An interface FPGA is to serve firmware to the core, deliver a bus interface to the ROD and give JTAG access. This FPGA should only be programmable with manual intervention (programming cable), whilst giving easy upgrade-ability for the core of the BOC.

An ATLAS Embedded Local Monitoring Board (ELMB) will be mounted to read monitoring values from the BOC and serve as a native Detector Control System (DCS) interface. Reading and archiving of PiN currents, voltages and interlock values will thus be possible without interaction with the DAQ software system. The DCS side will also be of much greater value in debugging the IBL readout chain, which required a lot of DAQ expertise with the existing system.

The optical converter boards will be connected through the same 40-pin sockets as before, so that either the same or new plug-ins can be used. The latter is guaranteed by wiring all I/O pins of the connector to the central programmable device.

The connection to the higher level readout will be prepared as a mezzanine slot. In contrast to the existing BOC, this will be served with a reprogrammable interface and is planned to host a single ReadOut Buffer INput (ROBIN, see [3]) card. This will remove a transmission line from the readout chain, giving a faster and simpler interface between the readout building block and the HLT.

#### B. Minding the upgrade

Multiple layouts for the IBL readout, based on the currently used VME crate architecture and TTC infrastructure, are in preparation:

The simplest one is to keep the ROD as is and have the BOC be the I/O card as it is now. The BOC will have to split the data here, such that the per-line data rate goes below 40 MBit/s. As mentioned above, the ROD-BOC interface would have to change to an asynchronous one, as the raw data rate will no longer be a multiple of 40 MBit/s.

The opposite approach is to also re-design the ROD, allowing faster operation, in particular concerning data-taking throughput and VME bus performance. This comes with the complication of either rewriting software or accepting hard constraints on the design, such as usage of the same family of DSPs that is used in the current ROD.



Figure 2: Data path implementation using the BOC

Our favoured design is a reproduction of the previous ROD with some simple modifications: the data path (cf. Figure 2, dashed container) is removed from the ROD and placed into the BOC. Calibration data that needs to reach the DSPs would be passed over by the BOC via a reversed SLink interface.

Advantages are significant:

- 1. The ROD production will be very close to that of the previous ROD, implying fewer complications.
  - (a) Either full reproduction of the previous ROD with pin compatible replacement of some components
  - (b) Minor re-designs, speeding up VME bandwidth, maybe running a JTAG interface for the BOC into the Program Reset Manager (PRM) on the ROD for in-system reprogramming ability.
- 2. The former formatting and event building section of the ROD could be removed from the layout or just not equipped. Therefore a total of 9 FPGAs and multiple obsolete memories will not be needed in this ROD design, making it a lot cheaper.
- 3. The new BOC would deliver a data path during data-taking that can handle a higher throughput than the current ROD. The data rate could be increased by at least a factor of two, assuming there is a faster (yet to be defined) interface to the embedded ROBIN card. Hence the total system size could be reduced by the same factor, decreasing cost again.

To avoid changing the DSP code, the FPGAs on the ROD would be reprogrammed to map the previous ROD data-path functionality onto the BOC's data path, hence hiding most of the changes from the DSP code. Programming of the new BOC data path will include re-use of the ROD source code, as a lot of components will only need minor modifications.

## C. Programmable encoding

A first successful attempt has been made to move the functionality of the existing encoding chip (BPM-12, cf. [2]) into an FPGA. Implementing the encoding standard as a reprogrammable block would allow for later changes of the standard and re-use of hardware for other systems. In the present system there is no way of bypassing the encoding process or implementing another encoding standard (8B10B), which would help for loop-back testing of the optical transmission lines, now or in the future system. This limitation would be overcome by using a reprogrammable encoder with a standard optical transmitter.

#### **III.** CONCLUSIONS

The new data path of the IBL implies a change in the off-detector readout electronics, which will be served with a new BOC card. Servicing and the DCS interface will be simplified compared to the existing ATLAS pixel BOC. The new BOC will be kept as flexible as possible to allow later implementation of final configurations or protocols, while serving as an early prototype for system testing [4]. The BOC will fit into the existing system base, while making it possible to increase the total bandwidth per building block, shrink the IBL system and hence reduce production cost.

#### REFERENCES

- [1] G. Aad *et al.*, The ATLAS Experiment at the CERN Large Hadron Collider, *JINST* 3 (2008) S08003, http://stacks.iop.org/1748-0221/3/S08003.
- [2] M. L. Chu *et al.*, The Off-Detector Opto-Electronics for the Optical Links of the ATLAS Semiconductor Tracker and Pixel Detector, *NIM-A* 530 (2004) 293-310.
- [3] R. Cranfield *et al.*, The ATLAS ROBIN, *JINST* 3 (2008) T01002.
- [4] J. Biesiada *et al.*, The ToothPix ATLAS Pixel Detector Test Stand in SR1, https://twiki.cern.ch/twiki/pub/Atlas/ToothpixWiki/toothpix\_note.pdf

# Improved performance for the ATLAS ReadOut System with the switchbased architecture

N. Schroer<sup>f\*</sup>, G. Crone<sup>b</sup>, D. Della Volpe<sup>c</sup>, B. Gorini<sup>d</sup>, B. Green<sup>a</sup>, M. Joos<sup>d</sup>, G. Kieft<sup>e</sup>, K. Kordas<sup>h</sup>, A. Kugel<sup>f</sup>, A. Misiejuk<sup>a</sup>, P. Teixeira-Dias<sup>a</sup>, L. Tremblet<sup>d</sup>, J. Vermeulen<sup>e</sup>, F. Wickens<sup>g</sup>, P. Werner<sup>d</sup>

<sup>a</sup> Royal Holloway University of London, <sup>b</sup> University College London, <sup>c</sup> Universita & INFN Napoli
<sup>d</sup> CERN, <sup>e</sup> Nikhef Amsterdam, <sup>f</sup> Ruprecht-Karls-Universitaet Heidelberg
<sup>g</sup> Rutherford Appleton Laboratory, <sup>h</sup> University Bern

#### Abstract

About 600 custom-built ReadOut Buffer INput (ROBIN) PCI boards are used in the DataCollection system of the ATLAS experiment at CERN. They are plugged into the PCI slots of about 150 PCs of the ReadOut System (ROS). In the standard *bus-based* setup of the ROS, requests and event data are passed via the PCI interfaces. The performance meets the requirements, but may need to be enhanced for more demanding use cases. Modifications in the software and firmware of the ROBINs have made it possible to improve the performance by using the onboard Gigabit Ethernet interfaces for passing part of the requests and of the data in the so-called *switch-based* scenario. Details of these modifications as well as measurement results are presented in this paper.

#### I. INTRODUCTION

The first level trigger (L1) of the ATLAS experiment [1] at CERN reduces the event rate from 40 MHz (the bunch crossing frequency of the LHC) to at most 100 kHz. At this input frequency, fragment data is written to the buffers of custom-made circuit boards (ROBIN [2]) at about 120 GB/s (via $\sim$1600 optical links, 3 per ROBIN). Typically four ROBINs are plugged into the PCI slots of each of the 150 ReadOut System (ROS) PCs and the readout of the event data is performed on the PCI bus, hence the name *bus-based* for this setup. The connection to the Data Collection (DC) network, which manages the selection and storage of events for later analysis, uses two of the four Gigabit Ethernet ports of a quad-port NIC plugged into the ROS PC. Only two interfaces are used, as the CPU of the PC needs to handle the network protocol and its performance cannot cope with more. This is the main bottleneck of the bus-based scenario. In the standard use case the second level trigger (L2) requests data from 2-3 of the 12 links of a ROS PC (in the typical case of 4 ROBINs) at about 20 kHz and, based on the L2 trigger decision, the event builder system requests data at $\sim$3 kHz from all links. For use cases with higher L2 request rates, or for trigger types which have additional bandwidth demands such as Inner Detector or Calorimeter full scans, this setup cannot deliver sufficient performance. The ROBINs have the potential to be directly connected to the DC network with their built-in GbE ports in the so-called *switch-based* scenario, which also allows the message handling to be offloaded to the PowerPC (PPC) processor [3] on the ROBIN. For this so far unused approach the FPGA [4] and PPC code of the ROBIN needed to be modified in order to improve the performance of the network interface and to adapt the message handling to the demands of the DC network.

\*Corresponding author, Email address: nschroer@cern.ch



Figure 1: The main components and interfaces of the ROBIN.

#### A. Modifications

On a ROBIN the two main components are the FPGA and the PPC. Their original firmware is fully functional, but the built-in network interface is not optimized for the communication with the DC network. The throughput is limited to about $\frac{2}{3}$ (i.e. 80 MB/s) of the Gigabit Ethernet capacity and the maximal L1 rate is only around 60 kHz for the use case of 1 kB fragments and a request ratio of 23%.

The firmware of the ROBIN has been modified to respond to messages at the network interface in the same way as a ROS PC, to allow the integration into the DC network. However, as most of the resources of the FPGA are already in use and the remaining ones are not sufficient to implement TCP, which is the standard protocol of the DC network, only UDP is supported. In the original firmware, fragments with the same L1 ID (but from a different input channel) need to be requested individually and are sent out in one message per fragment. This has been improved to allow data from the three input links to be bundled in one message, minimizing overhead in the transmissions by reducing the number of necessary requests. Due to the bundling the messages are bigger, reach the Ethernet frame limit of 1.5 kB earlier and then need to be divided into several Ethernet packets. To avoid this and to improve the performance, support for jumbo Ethernet frames of up to 6 kB is available in the new firmware. Furthermore, the possibility to use DMA for internal data handling is now fully operational. It allows buffering of incoming fragments in parallel to message processing and speeds up packet building by managing the transfer of header and data to the output buffer. In addition to the modifications necessary to support jumbo frames, the latest FPGA firmware was improved by adding a second buffer to the transmission part of the network interface. This additional buffer allows the DMA engine to complete one packet while an already completed one is being transmitted, thus minimizing the send latency.
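The effect of bundling and jumbo frames on the number of Ethernet packets can be estimated with a small sketch. It is a simplification under stated assumptions: protocol header overhead is ignored, the 1.5 kB and 6 kB limits are taken as 1500 and 6000 bytes, and a fragment is taken as 1024 bytes. All names are hypothetical.

```python
import math

# Assumed payload limits in bytes; header overhead is ignored.
STANDARD_FRAME_BYTES = 1500
JUMBO_FRAME_BYTES = 6000
FRAGMENT_BYTES = 1024      # "1 kB" fragment, taken as 1024 bytes


def packets_needed(message_bytes, frame_limit_bytes):
    """Number of Ethernet packets a message of the given size occupies."""
    return math.ceil(message_bytes / frame_limit_bytes)


single = packets_needed(FRAGMENT_BYTES, STANDARD_FRAME_BYTES)
bundled_standard = packets_needed(3 * FRAGMENT_BYTES, STANDARD_FRAME_BYTES)
bundled_jumbo = packets_needed(3 * FRAGMENT_BYTES, JUMBO_FRAME_BYTES)

print(single, bundled_standard, bundled_jumbo)
```

Under these assumptions a bundled message from three links no longer fits into one standard frame, while a jumbo frame carries it in a single packet.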

#### B. Test Setup

For the test setup the ROBIN is housed in a ROS PC and another PC is used to run a test program to simulate the DC network. This test program requests data fragments and sends delete messages (with 100 delete commands per message) to free the ROBIN buffers. UDP is used to communicate via a direct network connection between the ROBIN's NIC and the requesting PC. Event fragments are generated by the internal data generator of the ROBIN, which has been implemented for test purposes. The size of the fragments can be programmed. The rate is throttled if the buffers of the ROBIN are full. This is the situation for the measurements described in this paper, therefore this rate is equal to the delete rate. The test program has reduced functionality compared to the ROS software environment that is usually used to request and delete the fragments via the bus interface, but it is easier to set up and suffices for performance measurements. The goal is to be able to request fragments of 1 kB at a rate of 23 kHz from all three links while the input rate (L1 rate) is 100 kHz, which corresponds to requesting 23% of the data. As it is not possible in this test setup to set the L1 rate to a given value, the request frequency and associated throughput are measured for different fractions of the events (generated by the internal data generator) requested and for different fragment sizes. Hence the maximal possible L1 frequency is determined by multiplying the measured request rate by $\frac{100}{\%\,\text{requested}}$.
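The conversion from a measured request rate to the maximal possible L1 rate can be written as a one-line helper. The function names are hypothetical; the formula and the 23 kHz, 23% and 100 kHz figures are those of the text.

```python
def max_l1_rate_khz(request_rate_khz, percent_requested):
    """Maximal L1 rate implied by a measured request rate."""
    if not 0 < percent_requested <= 100:
        raise ValueError("percent_requested must be in (0, 100]")
    return request_rate_khz * 100 / percent_requested


def required_request_rate_khz(l1_rate_khz, percent_requested):
    """Inverse relation: request rate needed for a given L1 rate."""
    return l1_rate_khz * percent_requested / 100


target_l1 = max_l1_rate_khz(23, 23)
target_request = required_request_rate_khz(100, 23)
print(target_l1, target_request)
```

A measured request rate of 23 kHz at a request ratio of 23% therefore corresponds exactly to the target L1 rate of 100 kHz.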

#### **II. MEASUREMENT RESULTS**

In figure 2 measurement results for the target request ratio of 23% are presented for both the original and the modified firmware. In contrast to the original firmware, which is not able to service an L1 rate of 100 kHz, the modified firmware is capable of doing so up to a fragment size of about 1.25 kB ($\sim$300 words of 4 bytes), which exceeds the requirement. The gain in manageable L1 rate is more than 75%.

To measure the maximal possible throughput of the network interface 100% of the fragments are requested which keeps the fraction of time spent on managing fragments (buffering & deleting) as small as possible. The results are shown in figure 3. The throughput of the modified firmware is increased by about 50% compared to the original firmware, reaching the limit of Gigabit Ethernet of ~120 MB/s.



Figure 2: Results of measurements to determine the maximum L1 rate at target request ratio of 23% from all 3 links. Throughput and calculated (actual measured request rate \* 100/23) maximal possible L1 rate as a function of fragment size. Both measurements are done in the switch-based setup, one with the original and the other with the modified firmware.



Figure 3: Results of measurements to determine maximum throughput by requesting 100% of fragment data from all 3 links.

The graphs of the measurement results show that for small fragment sizes the request rate (and thus the maximum L1 rate) does not depend on the fragment size and therefore the throughput is increasing linearly with bigger fragment sizes. Request handling and the deletion of fragments are overlapping with the data transfer which can be performed in parallel by the DMA.

As long as the data transfer time is shorter than the processing time of the requests the latter is dominating and thus results in constant event rate. For fragments larger than about 175 or 275 words, depending on the version of the firmware, the internal data transfers no longer overlap completely with processing by the processor. Therefore the event rate decreases for increasing fragment size.
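The behaviour described in the two paragraphs above can be reproduced with a deliberately simple model in which request processing and DMA transfer overlap fully, so that the time per event is the larger of the two. This is an assumed toy model, not the authors' analysis, and the processing time and transfer time per word used below are hypothetical values, not measurements.

```python
# Toy model: per-event time = max(processing time, transfer time).
# PROCESSING_NS and TRANSFER_NS_PER_WORD are hypothetical numbers.

PROCESSING_NS = 10_000
TRANSFER_NS_PER_WORD = 40
BYTES_PER_WORD = 4


def event_rate_hz(fragment_words, processing_ns=PROCESSING_NS,
                  transfer_ns_per_word=TRANSFER_NS_PER_WORD):
    """Event rate when processing and transfer run in parallel."""
    transfer_ns = fragment_words * transfer_ns_per_word
    return 1e9 / max(processing_ns, transfer_ns)


def throughput_bytes_per_s(fragment_words):
    """Throughput = event rate x fragment size."""
    return event_rate_hz(fragment_words) * fragment_words * BYTES_PER_WORD


for words in (100, 200, 500):
    print(words, event_rate_hz(words), throughput_bytes_per_s(words))
```

With these numbers the rate stays constant up to 250 words, so throughput grows linearly with fragment size, and the rate falls once the transfer time dominates.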

Finally the request rate at a fixed L1 rate of 100 kHz is calculated from the data of the prior and several other measurements (see figure 4). These are the most significant figures as they represent the use case studied. As expected from the measurements with 23% request ratio, the original firmware cannot provide the performance needed to request at 23 kHz. With the modifications, however, about three times the request rate is feasible, fulfilling the requirements for fragment sizes of up to 1.25 kB.



Figure 4: Calculated maximum request rate for a fixed L1 rate of 100 kHz.

#### **III.** CONCLUSIONS

The modifications of the ROBIN firmware result in a significant performance increase of the network interface, making it possible to request event data of up to 1.25 kB ($\sim$300 words of 4 bytes) per fragment at more than 23 kHz at a fixed L1 rate of 100 kHz, hence fulfilling all the requirements. Used in switch-based mode each ROBIN can provide more than half of the output data rate of a ROS PC in the bus-based scenario, therefore the 4 ROBINs installed in a typical ROS PC together can provide over twice the output. This opens up the possibility of supporting use cases with high L2 request rates or trigger types which have additional bandwidth demands such as Inner Detector or Calorimeter full scans. With the modification of the message handling of the network interface to the standard format used in the DC network, an integration into the system is fairly straightforward, although additional cabling is required as each ROBIN needs to be connected to a switch. This setup would be used only in those parts of the readout system with high demands, thus the amount of extra cabling and switches is modest. Tradeoffs are that the switches need to be able to handle jumbo frames and that only UDP can be used to communicate directly with the ROBIN. But it is remarkable that the hardware design of our board together with reconfigurable components could be used to optimize the performance and implement alternative data transfer solutions.

#### REFERENCES

- [1] The ATLAS Collaboration, G. Aad et al., The ATLAS Experiment at the CERN Large Hadron Collider, JINST 3 (2008) S08003.
- [2] R. Cranfield et al., The ATLAS ROBIN, JINST 3 (2008) T01002.
- [3] IBM PowerPC 440GP embedded processor 462 MHz http://www.alacron.com/downloads/vncl98076xz/440GP\_pb.pdf
- [4] Xilinx Virtex II XC2V2000 FPGA http://www.xilinx.com/support/#Virtex-II

# Development of a 1 GS/s high-resolution transient recorder

S. Bartknecht, H. Fischer, F. Herrmann, K. Königsmann, L. Lauser, C. Schill, S. Schopferer, H. Wollny

Universität Freiburg, Physikalisches Institut, Freiburg, Germany

sebastian.schopferer@cern.ch

# Abstract

With present-day detectors in high energy physics one is often faced with short analog pulses of a few nanoseconds length which may cover large dynamic ranges. In many experiments both amplitude and timing information have to be measured with high accuracy. Additionally, the data rate per readout channel can reach several MHz, which makes high demands on the separation of pile-up pulses.

For such applications we have built the GANDALF transient recorder with a resolution of 12bit@1GS/s and an analog bandwidth of 500 MHz. Signals are digitized and processed by fast algorithms to extract pulse arrival times and amplitudes in real-time and to generate experiment trigger signals. With up to 16 analog channels, deep memories and a high data rate interface, this 6U-VME64x/VXS module is not only a dead-time free digitization unit but also has huge numerical capabilities provided by the on-board FPGA. Algorithms implemented in the FPGA may be used to disentangle possible pile-up pulses and determine timing information from sampled pulse shapes with a time resolution in the picosecond range.

Recently the application spectrum has been extended by implementing a 128-channel time-to-digital converter inside the FPGA and an appropriate input mezzanine card.

#### I. INTRODUCTION

The COmmon Muon and Proton Apparatus for Structure and Spectroscopy (COMPASS) at the CERN SPS [1] is a state-of-the-art two stage magnetic spectrometer [2] with a flexible setup to allow for a rich variety of physics programs to be performed with secondary muon or hadron beams. Common to all measurements is the requirement for the highest beam intensity and interaction rates, which in turn calls for a high readout speed. Recently interest has been expressed in pursuing a dedicated measurement of Generalized Parton Distributions (GPD) [3]. For these measurements the existing COMPASS spectrometer will be extended by a new 2.4 m long liquid hydrogen target, which will be surrounded by a new recoil detector based on scintillating counters. The background induced by the passage of the beam through the target will yield rates of the order of a few MHz in the recoil detector counters. This imposes great demands on the digitization units and on a hardware trigger based on the recoil particle. For this purpose we have developed within the GANDALF framework [4] a modular high speed and high resolution transient recorder system.

# II. THE GANDALF FRAMEWORK

GANDALF (Fig. 1) is a 6U-VME64x/VXS carrier board which can host two mezzanine cards. It has been designed to cope with a variety of readout tasks in high energy and nuclear physics experiments. Two exchangeable mezzanine cards allow the system to be employed in very different applications such as analog-to-digital or time-to-digital conversions, coincidence matrix formation, fast pattern recognition or fast trigger generation. A schematic overview of the carrier board as transient recorder is provided in Figure 2. The heart of the board is a VIRTEX5-SXT FPGA which is connected to the mezzanine cards by several single ended and more than 110 differential signal interconnections. The data processing FPGA can perform complex calculations on data which have been sampled on the mezzanine cards.



Figure 1: Picture of the GANDALF carrier board equipped with two ADC mezzanine cards. The center mezzanine card hosts an optical receiver for the COMPASS trigger and clock distribution system.



Figure 2: Block diagram of GANDALF as a transient recorder.

Fast and deep memory extensions of 144-Mbit QDRII+ and 4-Gbit DDR2 RAM are connected to a second Virtex5 FPGA. Both FPGAs are linked to each other by eight bidirectional high-speed Aurora lanes.

Connected to the VXS backplane, GANDALF has 16 high-speed lanes for data transfer to a central VXS module, where the lanes of up to 18 GANDALF modules merge. This connection can be used for continuous transmission of the amplitudes and the time stamps from sampled signals to the VXS trigger processor, which then forms an input to the experiment-wide first-level trigger based on the energy loss and the time-of-flight in the recoil detector.

A dead-time free data output can either be realized by dedicated backplane link cards connected to each GANDALF P2-connector, i.e. following the 160 MByte/s SLink [5] or Ethernet protocol, or by the VME64x bus in block read mode [6] or by USB2.0 from the front panel.

#### III. ANALOG-TO-DIGITAL CONVERTER

Two models of analog-to-digital converters (ADC) can be used with the GANDALF board, depending on the desired resolution. With the Texas Instruments models ADS5463 (12bit@500MS/s) and ADS5474 (14bit@400MS/s) we chose two of the fastest pipelined high resolution ADC chips that are currently available. Their low latency of only 3.5 clock cycles gives valuable time for the signal processing and the following trigger generation with its tight timing constraints defined by the existing readout electronics.

The DC-coupled analog input circuit uses the differential amplifier LMH6552 from National Semiconductor and has a bandwidth of 500 MHz. It adapts the incoming single ended signal (e.g. from a PMT) to the dynamic range of the ADC while the baseline of each channel can be adjusted individually by 16-bit digital-to-analog converters (Fig. 3). Two adjacent channels can be interleaved to achieve an effective sampling rate of 1 GS/s (800 MS/s with the ADS5474) at the cost of halving the number of channels per mezzanine card. In this time-interleaved mode the second ADC receives a sampling clock which is phase-shifted by 180 degrees and the input signal is passively split to both channels. Thus the signal is sampled alternately by the two ADCs.



Figure 3: Schematic of the DC-coupled analog input circuit. For each channel U<sub>Offset</sub> can be set by 16-bit DACs.

On each ADC mezzanine card the high frequency sampling clock is generated by a digital clock synthesizer chip SI5326 from Silicon Labs, which comprises an integrated PLL consisting of an oscillator, a digital phase detector and a programmable loop filter. The experiment-wide 155.52-MHz clock, distributed by the COMPASS trigger and clock distribution system (TCS), is used as reference. Particular attention has been paid to the design of the clock filter networks and the board layout to reach a time interval error smaller than 730 fs (Fig. 4) [7], which is essential for high bandwidth sampling applications.



Figure 4: Time interval error of the sampling clock.

#### IV. TESTS AND SIMULATION

In experimental tests performed with a high precision function waveform generator (AFG-3252) and a selection of narrow band pass filters connected directly to the analog input we achieved an effective resolution on sample measurements of above 10.1 ENOB (ADS5463) and 10.6 ENOB (ADS5474) over an input frequency range up to 240 MHz. The result of these measurements is shown in Fig. 5 as a function of the frequency of the input analog signal and is expressed in dB as well as ENOB (effective number of bits). From a sampled pulse the FPGA can calculate the time of its occurrence using DSP-optimized numerical algorithms. With our knowledge of the sampling resolution extensive simulations aimed at the time resolution were performed. Different algorithms were tested and optimized [8]. The resolution on the time extracted from a pulse with different amplitudes and ~3 ns rise time, as expected from our detector, is shown in Figure 6.
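For readers who want to convert between the two scales, the sketch below uses the standard relation for an ideal converter driven by a full-scale sine wave, SINAD = 6.02 · ENOB + 1.76 dB. It is an assumption that this is the conversion underlying the measurements quoted above; the function names are hypothetical.

```python
def enob_to_sinad_db(enob):
    """Signal-to-noise-and-distortion ratio (dB) for a given ENOB."""
    return 6.02 * enob + 1.76


def sinad_db_to_enob(sinad_db):
    """Effective number of bits for a given SINAD in dB."""
    return (sinad_db - 1.76) / 6.02


for label, enob in (("ADS5463", 10.1), ("ADS5474", 10.6)):
    print(f"{label}: {enob} ENOB ~ {enob_to_sinad_db(enob):.1f} dB")
```

Under this relation the quoted 10.1 and 10.6 ENOB correspond to roughly 62.6 dB and 65.6 dB.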



Figure 5: Signal-to-noise ratio (full-scale) and effective resolution of the 12-bit and 14-bit digitization units. Values from the ADC datasheets are given for comparison.





# V. CONCLUSION AND OUTLOOK

A low cost VME64x system aimed at digitizing and processing detector signals has been designed and implemented to our full satisfaction. The design is modular, consisting of a carrier board on which two mezzanine boards with either analog or digital inputs can be plugged. The ADC mezzanine cards have been characterized and show excellent performance over a wide input frequency range. Recently an additional type of mezzanine card with 64 digital inputs has been designed, which accepts LVDS and LVPECL signals over a VHDCI connector. An optional high-speed serial VXS backplane offers inter-module communication for sophisticated trigger processing possibly using a large number of detector channels.

The GANDALF transient recorder was installed at the COMPASS experiment during a two-week DVCS pilot run in September 2009. Extensive data have been recorded in order to verify the performance of the hardware and the signal processing algorithms.

In a forthcoming paper we will describe the realization of GANDALF as a 128-channel time-to-digital converter module with 100 ps digitization units, comparable to the F1-TDC chip [9]. The TDC design is implemented inside the main FPGA which can host 128 channels of 500-MHz scalers at the same time.

#### VI. REFERENCES

- [1] The COMPASS Collaboration, Proposal, CERN/SPSLC/96-14, SPSLC/P297 (1996)
- [2] P. Abbon et al., COMPASS Collaboration, Nucl. Instr. and Meth. A 577 (2007) 455
- [3] The COMPASS Collaboration, Medium and Long Term Plans, CERN-SPSC-2009-003, SPSC-I-238
- [4] S. Bartknecht et al., accepted for publication in Nucl. Instr. and Meth. A
- [5] H.C. van der Bij, et al., IEEE Trans. Nucl. Sci. NS-44 (1997) 398
- [6] L. Lauser, Diploma Thesis, Universität Freiburg (2009)
- [7] S. Schopferer, Diploma Thesis, Universität Freiburg (2009)
- [8] S. Bartknecht, Research Thesis, Universität Freiburg (2009)
- [9] H. Fischer et al., Nucl. Instr. and Meth. A 461 (2001) 507

# Novel Charge Sensitive Amplifier Design Methodology suitable for Large Detector Capacitance Applications

Thomas Noulis <sup>a</sup>, Stylianos Siskos <sup>a</sup>, Gerard Sarrabayrouse <sup>b,c</sup> and Laurent Bary <sup>b,c</sup>

 <sup>a</sup> Electronics Laboratory of Physics Department, Aristotle University of Thessaloniki, Aristotle University Campus, 54124 Thessaloniki, Greece.
<sup>b</sup> CNRS; LAAS; 7 avenue du colonel Roche, F-31077 Toulouse, France
<sup>c</sup> Université de Toulouse; UPS, INSA, INP, ISAE; LAAS; F-31077 Toulouse, France

tnoul@physics.auth.gr, siskos@physics.auth.gr, sarra@laas.fr, bary@laas.gr

#### Abstract

A current mode charge sensitive amplifier (CSA) topology and related methodology for use as pre-amplification block in radiation detection read out front end IC systems is proposed<sup>1</sup>. It is based on the use of a suitably configured current conveyor topology providing advantageous noise performance characteristics in comparison to the typically used CSA structures. In the proposed architecture the noise at the output of the CSA is independent of the detector capacitance value, allowing the use of large area detectors without affecting the system noise performance. Theoretical analysis and simulation analysis are performed concerning the operation and performance of the proposed topology. Measurement results on a current mode CSA prototype fabricated with a 0.35 µm CMOS process by Austriamicrosystems are provided, supporting the theoretical and simulation results and confirming the performance, mainly in terms of the dependency of the noise performance on the detector capacitance value.

#### I. INTRODUCTION

Noise, power, volume and weight specifications are very stringent in radiation detection applications. Using CMOS technology, which can withstand the radiation dose, a fully integrated readout front-end system can be implemented at low cost. This offers all the advantages of an integrated solution, such as low power consumption, small area and low weight. However, the most crucial motivation is that implementing the readout electronics and the semiconductor detectors on the same chip offers enhanced detection sensitivity thanks to improved noise performance [1]-[6]. Placing the first stage of the front-end close to the detector electrode reduces the amount of material and complexity in the active detection area and minimizes connection-related stray capacitances.

The noise performance of the amplification stage (preamplifier) determines the overall system noise and therefore needs optimization. A folded cascode architecture is commonly used in the implementation of the preamplifier, mainly because of its low input capacitance [6]-[14]. On the other hand, a current mode structure could be an attractive alternative to the more typical voltage mode one, since the signal is processed in the current domain, avoiding high voltage swings during charging and discharging of the parasitic capacitance and keeping the internal nodes of the circuit at low impedance values. While many current mode preamplifiers have been suggested so far [15]-[19], none has provided any great advantage over the traditional voltage mode structure.

In this work an alternative implementation is presented and configured. It provides output noise that is independent of the detector capacitance, thus allowing the use of large area detectors without affecting the system noise performance, together with a high, easily adjustable dc gain and satisfactory performance regarding speed requirements.

#### II. METHODOLOGY & ARCHITECTURE

A current mode approach is used in order to implement an alternative CSA using basically a second generation current conveyor (CCII). A CCII is defined by the following relation between the terminal currents and voltages.

$$\begin{bmatrix} I_y \\ V_x \\ I_z \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & \pm 1 & 0 \end{bmatrix} \begin{bmatrix} V_y \\ I_x \\ V_z \end{bmatrix} \quad (1)$$

where the subscripts x, y and z refer to the terminals labelled X, Y and Z in Fig. 1. The CCII is defined in both a positive and a negative version (the + sign is used for the CCII+ type and the - sign for the CCII- type). The current mode preamplifier using a CMOS CCII implementation is shown in Fig. 2. The operation of a CCII cell is described by the equations below:

$$i_y = 0, \quad i_x = i_z, \quad v_x = v_y \quad (2)$$

Using the configuration of Fig. 2 a charge signal is fed to the X input node (the detector model is given in Fig. 1, Cd is the detector capacitance) and the output voltage is given by:

$$i_{in} = 0, \quad v_{out} = i_{out} Z_{out} \quad (3)$$

where the output impedance $Z_{out}$, formed by the parallel connection of $R_f$ and $C_f$, is given in the Laplace domain by:

<sup>1</sup> Patent pending



Figure 1: Second Generation Current Conveyor (CCII).



Figure 2: Proposed Current mode CSA architecture

$$Z_{out} = \frac{R_f}{R_f C_f s + 1} \quad (4)$$

From equations (3) and (4) the transfer function of the circuit is given by:

$$H(s) = \frac{V_{out}}{i_{in}} = \frac{R_f}{R_f C_f s + 1} \quad (5)$$

The DC gain and the 3-dB frequency of the architecture are $A_{DC}=R_f$ and $\omega_0=1/(R_f C_f)$ respectively. The structure implements a charge amplifier or, more generally, a trans-impedance amplifier function in which the gain and the operating bandwidth are determined by the selection of the passive elements $R_f$ and $C_f$. Very important for the radiation detection application is the fact that the detector is connected to node X, which is practically a virtual ground since the Y input is grounded. The detector capacitance therefore does not affect the transfer function of the proposed topology.

#### III. CURRENT MODE CSA OPERATION ANALYSIS AND SIMULATION RESULTS

The above alternative CSA was designed and simulated in a 0.35 $\mu$m CMOS process (3.3V/5V 2P/3M) commercially available from Austriamicrosystems (AMS), using a previously designed [20] high gain CCII cell. A high gain CCII circuit is similar to a second generation current conveyor, but it has a large current gain from X to Z rather than the unity gain of the standard CCII, hence the name high gain second generation current conveyor [21]. This amplifier consists of a negative second generation current conveyor and a transconductance output buffer. The CMOS high gain CCII circuit, with a current mirror input stage, was configured to implement the proposed architecture and is depicted in Figure 3.

The topology power supplies were VDD = -VSS = 1.65 V. The gate of MOSFET Mbias was biased at 970 mV.



All simulations were performed using the HSPICE and SPECTRE simulators and the BSIM3V3.2 MOSFET model (Level 49) at 25 °C. The simulated AC response of the charge amplifying topology of Figure 2, for $R_f$ values of 10 k$\Omega$, 32.8 k$\Omega$ and 100 k$\Omega$ and a capacitance $C_f$ of 20 pF, is depicted in Figure 4. Table 1 lists the theoretical and the respective simulated gain and 3-dB frequency values. The respective error is below 1% for all three configurations in both gain and operating bandwidth, confirming the operation analysis of the proposed architecture.

Table 1: Current mode CSA theoretical and simulated response

| $R_f$, $C_f$ elements | Gain (dB), theory | Gain (dB), simulation | 3-dB frequency (kHz), theory | 3-dB frequency (kHz), simulation |
|-----------------------|-------------------|-----------------------|------------------------------|----------------------------------|
| 10 kΩ & 20 pF         | 80.00             | 79.98                 | 796.2                        | 795.6                            |
| 32.8 kΩ & 20 pF       | 90.31             | 90.23                 | 242.8                        | 242.6                            |
| 100 kΩ & 20 pF        | 100.0             | 99.76                 | 79.6                         | 79.3                             |
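The theoretical columns of Table 1 follow directly from equation (5). The short sketch below evaluates the DC gain and the 3-dB frequency for the three feedback configurations; it assumes that the tabulated gain is $20\log_{10}(R_f/1\,\Omega)$, which is consistent with the listed values, and the function name is purely illustrative.

```python
import math

def csa_response(r_f, c_f):
    """DC gain (dB re 1 ohm) and 3-dB frequency (Hz) of H(s) = R_f / (R_f C_f s + 1)."""
    gain_db = 20.0 * math.log10(r_f)            # A_DC = R_f, expressed in dB (assumed convention)
    f_3db = 1.0 / (2.0 * math.pi * r_f * c_f)   # omega_0 = 1 / (R_f C_f), converted to Hz
    return gain_db, f_3db

C_F = 20e-12  # feedback capacitance from the text, 20 pF
for r_f in (10e3, 32.8e3, 100e3):
    gain_db, f_3db = csa_response(r_f, C_F)
    print(f"R_f = {r_f / 1e3:6.1f} kOhm: gain = {gain_db:6.2f} dB, "
          f"f_3dB = {f_3db / 1e3:6.1f} kHz")
```

The computed values agree with the theory columns to within about 0.1 %; the small residual is of the size expected from rounding.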

The above analysis confirms the advantageous operation of the proposed architecture, since the gain and the operating bandwidth can be easily adjusted by properly selecting the passive elements $R_f$ and $C_f$. In addition, the technique provides relatively high gain over a wide operating bandwidth. The most important feature of the proposed current mode CSA configuration is that the total output noise, and in particular the rms output noise, is independent of the detector capacitance value.



Figure 4: Current mode CSA architecture frequency (gain) response for different feedback resistance  $R_f$  values.



Figure 5: Current mode CSA architecture output noise voltage spectral densities, for different detector capacitance values.

The simulated noise performance of the proposed topology is shown in Figure 5. The CCII CSA noise was simulated for detector capacitances ranging from 2 pF to 40 pF. The feedback elements $R_f$ and $C_f$ were set to 32.8 k$\Omega$ and 20 pF respectively. The noise performance is the same for all capacitance values in the frequency range up to 100 kHz, which is essentially the frequency range of interest.

#### IV. MEASUREMENT RESULTS

The above alternative CSA was fabricated in a 0.35 $\mu$m CMOS process by Austriamicrosystems (AMS). A magnified photograph of the high gain current conveyor is shown in Figure 6. The measured input and output signals of the proposed structure (transient response) for $R_f = 1 \ k\Omega$ and $C_f = 20 \text{pF}$ are depicted in Figure 7 and Figure 8 respectively, for a detector capacitance value of 2 pF. Regarding the detector specifications, the input signal corresponds to a radiation signal with a charge of 875 Me<sup>-</sup> and a time duration of 400 ns (collection of 90% of the total Q). The detector leakage current is equal to 10 pA.



Figure 6: Magnification of the second generation high gain current conveyor circuit.



Figure 7: Current mode CSA measured input signal



Figure 8: Current mode CSA measured output signal.

The proposed topology can detect and amplify the input signal and implements a charge sensitive pre-amplifier stage providing easily achievable application specified charge and discharge times and high gain performance for a relatively large operating bandwidth. Respective noise measurements in relation to the above noise simulations were also performed. Measurement results are depicted in Fig. 9. These results confirm the theoretical analysis and the simulation results since the output rms noise is independent of the detector capacitance value.



Figure 9: Measurement of the Current mode CSA output noise performance.

#### V. SUMMARY

An alternative novel current mode CSA topology and a related methodology are proposed, for use in capacitive radiation detection read out front end IC systems. It is based on the use of current mode topologies and in particular on a current conveyor suitably configured. A transimpedance amplifying topology is presented, showing advantage for charge amplification, providing easily adjustable gain and operating bandwidth. The proposed structure is fully integrated and provides advantageous noise performance for large detector capacitance applications since the detector capacitance is not included in the transfer function and does not affect the bias of the input stage. The proposed topology can be implemented using a variety of current mode circuits like CCI and other current mode architectures, suitably configured.

#### References

- [1] V. Radeka, P. Rehak, S. Rescia, E. Gatti, A. Longoni, M. Sampietro, P. Holl, L. Strüder and J. Kemmer, "Design of a charge sensitive preamplifier on high resistivity silicon," IEEE Trans. Nuclear Science, vol.35, no.1, pp.155-159, Feb. 1988.
- [2] V. Radeka, P. Rehak, S. Rescia, E. Gatti, A. Longoni, M. Sampietro, G. Bertuccio, P. Holl, L. Strüder and J. Kemmer, "Implanted silicon JFET on completely depleted high-resistivity devices," IEEE Electron Devices Letters, vol.10, no.2, pp.91-94, Jan. 1989.
- [3] J. C. Lund, F. Olschner, P. Bennett and L. Rehn, "Epitaxial n-channel JFETs integrated on high resistivity silicon for X-ray detectors," IEEE Trans. Nuclear Science, vol.42, no.4, pp.820-823, Aug. 1995.
- [4] P. Lechner, S. Eckbauer, R. Hartmann, S. Krisch, D. Hauff, R. Richter, H. Soltau, L. Strüder, C. Fiorini, E. Gatti, A. Longoni and M. Sampietro, "Silicon drift detectors for high resolution room temperature X-ray spectroscopy," Nuclear Instruments and Methods, vol.A377, pp.346-351, Aug. 1996.
- [5] L. Ratti, M. Manghisoni, V. Re and V. Speziali, "Integrated front-end electronics in a detector compatible process: source-follower and charge-sensitive preamplifier configurations," in: R. B. James (ed.), Hard X-Ray and Gamma- Ray Detector Physics III, Proc. SPIE 4507, pp.141-151, Dec. 2001.
- [6] Z. Y. Chang and W. Sansen, "Effect of 1/f noise on the resolution of CMOS analog readout systems for microstrip and pixel detectors," Nuclear Instruments and Methods, vol.305, no.3, pp.553-560, Aug.1991.
- [7] W. Sansen and Z. Y. Chang, "Limits of low noise performance of detector readout front ends in CMOS technology," IEEE Trans. Circuits and Systems, vol.37, no.11, pp.1375-1382, Nov. 1990.

- [8] C. Kapnistis, K. Misiakos and N. Haralabidis, "Noise performance of pixel readout electronics using very small area devices in CMOS technology," Nuclear Instruments and Methods, vol.458, no.3, pp.729-737, Feb. 2001.
- [9] Y. Hu, G. Deptuch, R. Turchetta and C. Guo, "A low noise, low power CMOS SOI readout front-end for silicon detectors leakage current compensation capability," IEEE Trans. on Circuits and Systems I, vol.48, no.8, pp.1022-1030, Aug. 2001.
- [10] P. Grybos, A. E. Cabal Rodriguez, M. Idzik, J. Lopez Gaitan, F. Prino, L. Ramello, K. Swientek and P. Wiacek, "RX64DTH – A fully integrated 64-channel ASIC for digital X-ray imaging system with energy window selection," IEEE Trans. Nuclear Science, vol.52, no.4, pp.839-846, Aug. 2005.
- [11] N. Randazzo, G. V. Russo, C. Caligiore, D. LoPresti, C. Petta, S. Reito, L. Todaro, G. Fallica, G. Valvo, M. Lattuada, S. Romano, A. Tumino, "Integrated front-end for a large strip detector with E, ΔE and position measurements," IEEE Trans. Nuclear Science, vol.46, no.6, pp.1300-1309, Oct. 1999.
- [12] P. Grybos and W. Dabrowski, "Development of a fully integrated readout system for high count rate position-sensitive measurements of X-rays using silicon strip detectors," IEEE Trans. Nuclear Science, vol.48, no.3, pp.466-472, June 2001.
- [13] C. Kapnistis, K. Misiakos and N. Haralabidis, "A small area charge sensitive readout chain with a dual mode of operation," Analog Integrated Circuits and Signal Processing, vol. 27, pp. 39-48, 2001.
- [14] J. C. Stanton, "A low power low noise amplifier for 128 channel detector read-out chip," IEEE Trans. Nuclear Science, vol.36, no.1, pp.522-527, Feb. 1989.
- [15] J. Wulleman, "Current mode charge pulse amplifier in CMOS technology for use with particle detectors", Electronics Letters, vol. 32, no.6, pp. 515–516, 1996.
- [16] Fei Yuan, "Low voltage CMOS current-mode preamplifier: Analysis and design", IEEE Transactions on Circuits and Systems 53(1), 26–39, 2006.
- [17] F. Anghinolfi, P. Aspell, M. Campbell, E.H.M Heijne, P. Jarron, G. Meddeler, and J.C. Santiard, "ICON, a current mode preamplifier in CMOS technology for use with high rate particle detectors", IEEE Transactions on Nuclear Science 40(3), 271–274, 1993.
- [18] T. Vanisri, and C. Toumazou, "Low-noise optimization of current-mode transimpedance optical preamplifiers", IEEE, pp. 966–969.1993
- [19] Arie Arbel, "Innovative current sensitive differential low noise preamplifier in CMOS", Proc. ICECS, pp. 69–72, 1996.
- [20] T. Noulis, S. Siskos, L. Bary, G. Sarrabayrouse, "Non Inverting Voltage Amplifier noise analysis using a CCII<sup>∞</sup> based structure", IFIP/IEEE SOCVLSI 2008, Rhodos, Greece, pp.11-16, October 2006.
- [21] K. Koli, "CMOS current amplifiers: speed versus nonlinearity," Ph.D. dissertation, Dept. Electrical and Communications Engineering, Helsinki Univ. of Technology, Finland, 2000.

# Readout and Data Processing Electronics for the Belle-II Silicon Vertex Detector

M. Friedl<sup>a</sup>, C. Irmler<sup>a</sup>, M. Pernicka<sup>a</sup>

<sup>a</sup> Institute of High Energy Physics, Nikolsdorfergasse 18, A-1050 Vienna, Austria

friedl@hephy.at

# Abstract

A prototype readout system has been developed for the future Belle-II Silicon Vertex Detector at the Super-KEK-B factory in Tsukuba, Japan. It will receive raw data from double-sided sensors with a total of approximately 240,000 strips read out by APV25 chips at a trigger rate of up to 30 kHz and perform strip reordering, pedestal subtraction, a two-pass common mode correction and zero suppression in FPGA firmware.

Moreover, the APV25 will be operated in multi-peak mode, where (typically) six samples along the shaped waveform are used for precise hit-time reconstruction which will also be implemented in FPGAs using look-up tables.

## I. INTRODUCTION

The Belle Experiment [1] at the KEK Research Laboratory in Tsukuba (Japan) has successfully been observing CP violation and other phenomena in the B system for a decade. It will conclude its data taking by the end of 2009 at an integrated luminosity of about  $1 \text{ ab}^{-1}$ . The Belle Experiment and its counterpart BaBar [2] in Stanford (USA) were explicitly acknowledged in the 2008 physics Nobel Prize statement for the experimental verification of the CP violation theory [3] of Makoto Kobayashi and Toshihide Maskawa.

Already now, the KEK-B machine [4], which stores electron and positron beams that are collided in the center of the Belle Experiment, provides the highest luminosity in the world, peaking at more than  $2 \times 10^{34} \,\mathrm{cm^{-2}s^{-1}}$ . In order to study rare phenomena and increase the statistics of measurements, it is foreseen to upgrade the KEK-B machine by 2013 such that the ultimate luminosity will be 40 times higher than now. This also implies changes in the Belle Experiment, which was not designed for such an intensity; consequently, all parts of the detector need an upgrade as well.

The present Silicon Vertex Detector [5] (SVD2) of the Belle Experiment is composed of four layers of double-sided silicon sensors and read out by the VA1TA front-end chip [6], which has a shaping time of about 800 ns. Its innermost layer, located at a radius of 2 cm from the beam axis, suffers from an occupancy of about 10% at the present luminosity. Moreover, the readout speed of 5 MHz sets another limit, because the VA1TA has no pipeline memory and thus a dead time occurs after a trigger until the data are read out. This is at the percent level with the present trigger rate of about 450 Hz, but will be prohibitive at 40 times higher luminosity with a projected trigger rate of up to 30 kHz. Consequently, the present SVD2 is not suitable for Belle-II and a completely new silicon detector, together with a new readout chain, is being developed, which is described below.

## II. SILICON VERTEX DETECTOR FOR BELLE-II

The Silicon Vertex Detector for the future upgrade of the Belle Experiment, shown in fig. 1, will again consist of four layers of double-sided sensors which are arranged cylindrically around the interaction point. In contrast to the present SVD, however, a two-layer pixel detector, consisting of DEPFET sensors [7], will be placed in the innermost part at radii of 1.3 and 2.2 cm. In this sense, the future vertex detector will consist of a total of six layers, which enables robust and redundant tracking as well as precise vertex reconstruction thanks to the pixel detector.

Another striking difference to the current SVD2 is that the future detector, tentatively named SuperSVD, will cover the same angular acceptance, but with tilted (trapezoidal) sensors in the forward region. This will significantly complicate the mechanical assembly, but at the same time improve the signal-to-noise ratio in that area and also save a considerable number of readout channels and thus cost. Simulation studies are ongoing on whether to introduce such a lantern shape on the backward side as well, which would then lead to a silicon detector very similar to that of the BaBar experiment [8].

The SuperSVD will entirely be composed of double-sided silicon sensors made from 6" wafers, which are read out by four or six APV25 front-end chips on either side, depending on the position, strip pitch and overall size. In the SVD2, the strips of up to three sensors were concatenated and read out by readout chips located at the sides outside of the acceptance region. Such a concept is not possible anymore with the short shaping time of the APV25, which is necessary to reduce the occupancy, because this also implies an increased noise susceptibility related to load capacitance [9].

While those sensors that are located at the forward or backward edges can still be read out in the conventional way by placing a hybrid at the side (and thus outside of the acceptance), this is not possible for the inner sensors, for which we developed the Origami chip-on-sensor concept. As this idea is described in detail in this volume [10], we will just summarize the main features here. Thinned readout chips are placed on flex hybrids which sit on one side of the sensor and thus have very short connections to the strips on that sensor side. The opposite side of the sensor is contacted through flexible fanout pieces which are bent around the edge – hence the name Origami. Fig. 2 shows the first working prototype of such an Origami module assembly, which is described in detail in [10].



Figure 1: Conceptual design of the Silicon Vertex Detector for Super-Belle, consisting of two pixel layers surrounded by four double-sided silicon strip layers with slanted sensors in the forward region.



Figure 2: Origami chip-on-sensor prototype module on a 4" double-sided silicon detector read out by four thinned APV25 chips on either side, which are all cooled by a single cooling pipe. The fanout pieces wrapped around the edge to connect to the opposite sensor side are clearly visible. See [10] for details.

The Origami concept inevitably increases the material budget compared to conventional readout schemes, but it is the only way of maintaining a good signal-to-noise ratio with fast shaping. Moreover, it also implies that the number of readout channels will roughly double compared to the SVD2, namely 243,456 strips being read out by 1,902 APV25 chips (cf. 110,592 strips and 864 VA1TA chips for the present system).
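The strip and chip counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below only uses the four numbers from the text; the helper name is illustrative.

```python
def channels_per_chip(strips: int, chips: int) -> int:
    """Return strips / chips, insisting that the division is exact."""
    per_chip, remainder = divmod(strips, chips)
    if remainder:
        raise ValueError("strip count is not a whole multiple of the chip count")
    return per_chip

SUPERSVD = (243_456, 1_902)   # strips, APV25 chips (from the text)
SVD2 = (110_592, 864)         # strips, VA1TA chips (from the text)

supersvd_per_chip = channels_per_chip(*SUPERSVD)
svd2_per_chip = channels_per_chip(*SVD2)
channel_ratio = SUPERSVD[0] / SVD2[0]

print(f"SuperSVD: {supersvd_per_chip} channels per chip")
print(f"SVD2:     {svd2_per_chip} channels per chip")
print(f"Channel count ratio SuperSVD / SVD2: {channel_ratio:.2f}")
```

Both systems come out at 128 channels per chip, and the channel count grows by a factor of about 2.2, in line with the "roughly double" statement.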

## III. APV25 FRONT-END READOUT CHIP

The APV25 [11] readout chip was originally developed for the CMS experiment at CERN, but it also fits the needs of the SuperSVD at Belle-II. Thanks to its short shaping time of 50 ns (compared to about 800 ns of the present VA1TA), it automatically reduces the occupancy by a factor of 16. (A factor of 12.5 was found by measurement because the actual waveforms are not exactly congruent and thresholds need to be considered.) The APV25 chip also features an internal analog pipeline of 192 cells and thus allows dead time-free measurement. (Actually, there is a very short dead time of 3 clock cycles by design, but this is irrelevant for Belle-II.)

In CMS, the APV25 is operated at a 40 MHz clock which is synchronous to the bunch crossings in the experiment. This also allows the use of the so-called "deconvolution" mode [12], where a weighted sum of three consecutive samples in the pipeline is calculated for each channel upon reception of a trigger. This on-chip processing narrows down the resulting signal such that the data can unambiguously be assigned to a particular bunch crossing at the cost of a moderate increase of noise.
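The effect of such a three-sample weighted sum can be illustrated with a small numerical sketch. It assumes an ideal CR-RC shaper response $h(t) = (t/\tau)\,e^{1-t/\tau}$ and uses the weights that follow from inverting the sampled response of that idealized shape; these are not the actual on-chip coefficients of the APV25, and all function names are illustrative. The sampling interval of 25 ns corresponds to the 40 MHz clock and $\tau = 50$ ns to the shaping time quoted above.

```python
import math

def deconvolution_weights(dt, tau):
    """Weights (newest, middle, oldest sample) for an ideal CR-RC pulse sampled every dt."""
    x = dt / tau
    w_new = math.exp(x - 1.0) / x
    w_mid = -2.0 * math.exp(-1.0) / x
    w_old = math.exp(-x - 1.0) / x
    return w_new, w_mid, w_old

def crrc_samples(n, dt, tau):
    """n samples of h(t) = (t/tau) * exp(1 - t/tau), which peaks at 1 for t = tau."""
    return [(k * dt / tau) * math.exp(1.0 - k * dt / tau) for k in range(n)]

def deconvolve(samples, weights):
    """Weighted sum of each sample with its two predecessors (zero before the pulse)."""
    w_new, w_mid, w_old = weights
    out = []
    for k, s_new in enumerate(samples):
        s_mid = samples[k - 1] if k >= 1 else 0.0
        s_old = samples[k - 2] if k >= 2 else 0.0
        out.append(w_new * s_new + w_mid * s_mid + w_old * s_old)
    return out

DT, TAU = 25.0, 50.0  # ns
pulse = crrc_samples(12, DT, TAU)
narrowed = deconvolve(pulse, deconvolution_weights(DT, TAU))
print([round(v, 6) for v in narrowed])
```

For the ideal pulse the output is non-zero in a single clock cycle only, which is what allows the assignment to one bunch crossing. Because the weights are fixed for a given phase between clock and signal, the method relies on synchronous sampling.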

Unfortunately, this feature cannot be used in Belle-II, because the bunch crossings occur in a quasi-continuous fashion (the accelerator frequency is about 508 MHz) and thus the APV25 clock cannot be synchronized to the collisions. However, the APV25 chip also offers a mode where the three samples from the pipeline can be obtained in raw format without passing the deconvolution algorithm. In this mode, integer multiples of three samples can be obtained by sending two or more triggers with the correct spacing. This opens the path for off-chip data processing, which is pursued for the SuperSVD, as described in section V.

The APV25 has a differential analog output where multiplexed strip data are presented at clock frequency. Moreover, the APV25 has a slow control interface which uses the  $I^2C$  standard. Various internal bias voltages and currents as well as general settings (such as the mode of operation) can be controlled through this interface.

## IV. READOUT CHAIN

Fig. 3 shows the conceptual layout of the readout system, which follows a pretty conventional scheme that largely resembles the present situation. Repeater boxes (called "DOCK") are located a few meters away from the front-end hybrids and are used for buffering clock, trigger and control signals sent to the front-end as well as the analog data obtained from there. Moreover, the repeater has another important task. As we read out double-sided sensors, the front-end chips of each side are operated by floating LV power which is tied to the bias voltage level of each side, respectively. In the present SVD2, the sensors are biased at 80 V which means that the front-end readout chips are at  $\pm 40$  V. Consequently, the repeater box also has to translate the analog front-end signals to earth-bound levels and control signals in the opposite direction. Presently, this is done using optocouplers, but as the readout speed will be much faster in the future, a capacitive coupling scheme has been established for analog signals, clock and trigger, while optocouplers are only used for slow controls such as I<sup>2</sup>C and reset lines.



Figure 3: Schematic view of the readout chain for the SuperSVD.

A prototype readout system was built and successfully operated in the lab as well as in several beam tests. It consists of a mechanical repeater box ("DOCK", fig. 4) that contains a mother board ("MAMBO") which hosts up to six repeater boards ("REBO"). The latter are all identical, but are assigned to positive or negative bias voltages and hence readout of n- or p-sides of the detector, respectively, depending on the slot in the mother board. The actual level translation is performed on the REBO boards, each of which presently serves 16 APV25 frontend chips on four hybrids (fig. 5).



Figure 4: Prototype repeater system. An aluminum box ("DOCK") contains a mother board ("MAMBO") and up to six repeater boards ("REBO"). Each repeater board is mounted onto an aluminum bracket (shown detached to the right) which is then screwed to the water-cooled copper lid (shown to the left).



Figure 5: Prototype repeater board. The separation between earth-bound and floating voltage levels is indicated by a white line. Optocouplers (left part) translate  $I^2C$  and reset lines and capacitors with amplifiers on both sides bridge clock, trigger (center) and analog signals (right half).

The analog signals, once translated to earth-bound voltage levels, are transmitted to the back-end VME system through Ethernet cables of  $30 \,\mathrm{m}$  length. Optical links are an alternative, but they drive up the cost and normally require digitization beforehand, which would increase not only the density in the repeater boxes, but also their power consumption and thus the cooling requirements. Moreover, radiation in the location of the repeater boxes is not an issue in the present system, but may become critical in the future. The radiation dose was measured around the forward repeater boxes to be approximately  $5 \,\mathrm{kRad}\,\mathrm{ab}^{-1}$ . Although many parameters will change in the future machine, a simple scaling of this number implies a lifetime dose of about  $250 \,\mathrm{kRad}$ , which is deadly for most commercial electronic devices. An alternative approach would be to place the repeater boxes farther away from the radiation area, which is under investigation now.
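The scaling behind the lifetime dose can be made explicit. The integrated luminosity is not stated in the text; it is simply the value implied by dividing the two quoted numbers:

$$L_{int} = \frac{250 \,\mathrm{kRad}}{5 \,\mathrm{kRad}\,\mathrm{ab}^{-1}} = 50 \,\mathrm{ab}^{-1}$$

In other words, the estimate assumes that the dose per unit of integrated luminosity stays the same as today and that the upgraded machine delivers about 50 times the roughly $1 \,\mathrm{ab}^{-1}$ collected by the present experiment.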

Clock and trigger signals are also propagated through ethernet cables from the controller unit to the repeater boxes, and round twisted flat cables are used for slow controls and switch controls which establish the connection of individual hybrids to  $I^2C$  and reset buses.

The prototype readout system was originally designed for an intermediate upgrade where only the two innermost layers of SVD2 should have been replaced by APV25 readout, but with the SuperSVD system the number of readout channels will double, and thus the density of the repeaters must also increase significantly. Some improvements are planned on the REBO boards to increase the number of channels on a single board while keeping the same size. Moreover, we believe that the number of REBOs within a single box can be increased to (almost) twice the present number.

## V. FADC+PROC BACK-END DATA PROCESSING

Both the data processing boards ("FADC+PROC") and the control units for distribution of clock, trigger and slow control signals to the front-end are based on 9U VME modules located in the electronics hut. The prototype system consists of one master controller ("NECO", left side of fig. 7), one control distribution unit ("SVD3\_Buffer", right side of fig. 7) and two FADC+PROC modules (fig. 8), each of which receives the signals of 16 APV25 chips. This system is modular in the sense that in its present form it can spread over two crates with up to 32 FADC+PROC units serving 512 APV25 chips. Clearly this is not sufficient for SuperSVD, and the density is likely to increase also in the back-end. The number of channels per unit and grouping of repeater and back-end units will be reconsidered once the sensor configuration is frozen.
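To put the capacity gap in numbers, the following sketch compares the maximum size of the prototype back-end with the 1,902 APV25 chips foreseen for the SuperSVD. It assumes, purely for illustration, that the present figure of 16 chips per FADC+PROC unit were kept unchanged, which the text explicitly leaves open.

```python
import math

CHIPS_PER_UNIT = 16      # APV25 chips per FADC+PROC module (from the text)
MAX_UNITS = 32           # maximum number of modules in the present form (from the text)
SUPERSVD_CHIPS = 1_902   # APV25 chips foreseen for the SuperSVD (from the text)

prototype_capacity = CHIPS_PER_UNIT * MAX_UNITS
units_needed = math.ceil(SUPERSVD_CHIPS / CHIPS_PER_UNIT)

print(f"Prototype capacity: {prototype_capacity} chips")
print(f"Units needed at {CHIPS_PER_UNIT} chips per unit: {units_needed}")
```

At the present density, almost four times as many modules would be required as the prototype architecture supports, which motivates the planned increase in channels per unit.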



Figure 6: Block diagram of the data processing chain.



Figure 7: Control modules of the prototype system. The master control unit (NECO, left) is complemented by distribution units (SVD3\_Buffer, right).



Figure 8: FADC+PROC data processing module.

Fig. 6 shows the building blocks of the FADC+PROC devices. At the inputs, there is an adjustable equalizer (to compensate for the limited bandwidth of the 30 m long cables) and a preamplifier for each channel, followed by a 10-bit FADC and one FPGA for a group of four inputs, coinciding with one side of a silicon sensor. Inside this FPGA, each channel has its dedicated pipelined processing unit which performs channel reordering (to restore the physical strip order), pedestal subtraction, a two-pass common mode correction and zero suppression (sparsification). In the future, a hit time finder will be implemented after the processing blocks. With six samples taken per trigger, this unit will select the three points around the peak and use an internal look-up table to determine the peaking time and amplitude as well as quality indicators. Computer simulations were performed for such look-up tables, delivering results close to what can be obtained by a numeric fit.
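The sequence of processing steps can be sketched in software as follows. This is a minimal illustration, not the FPGA firmware: the interpretation of "two-pass" (a first mean over all strips, then a second mean that excludes strips far from the first estimate), the cut values, the readout order and all names are assumptions made for this example.

```python
def process_frame(raw, pedestals, noise, order, cm_cut=3.0, zs_cut=5.0):
    """Reorder, pedestal-subtract, common-mode-correct and zero-suppress one frame.

    raw       -- ADC values in readout order
    pedestals -- per-strip pedestals in physical strip order
    noise     -- per-strip noise in physical strip order
    order     -- order[i] is the readout index of physical strip i (hypothetical)
    """
    # 1. Channel reordering: restore the physical strip order.
    reordered = [raw[j] for j in order]
    # 2. Pedestal subtraction.
    sub = [r - p for r, p in zip(reordered, pedestals)]
    # 3a. Common mode, first pass: plain mean over all strips.
    cm_first = sum(sub) / len(sub)
    # 3b. Second pass (assumed scheme): repeat without strips that look like hits.
    quiet = [v for v, s in zip(sub, noise) if abs(v - cm_first) < cm_cut * s]
    cm_second = sum(quiet) / len(quiet) if quiet else cm_first
    corrected = [v - cm_second for v in sub]
    # 4. Zero suppression: keep only (strip, amplitude) pairs above threshold.
    return [(i, v) for i, (v, s) in enumerate(zip(corrected, noise)) if v > zs_cut * s]

# Toy frame: 16 strips, common shift of 10 ADC counts, one hit of 48 counts on strip 5.
n = 16
order = list(range(n - 1, -1, -1))            # hypothetical reversed readout order
pedestals = [100.0 + i for i in range(n)]
noise = [2.0] * n
physical = [p + 10.0 for p in pedestals]
physical[5] += 48.0
raw = [0.0] * n
for strip, readout_index in enumerate(order):
    raw[readout_index] = physical[strip]

hits = process_frame(raw, pedestals, noise, order)
print(hits)
```

In this toy frame the first-pass mean is pulled up by the hit, while the second pass recovers the true common shift, so that only the hit strip survives zero suppression with its full amplitude.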

The data of all front-side FPGAs are collected, formatted and buffered in a central FPGA and passed on to the data acquisition system. Presently, this is done through a common platform called COPPER/FINESSE, but in the future we could also implement a Gigabit Ethernet interface directly on the FADC+PROC.

## VI. PROTOTYPE RESULTS

The prototype system has been extensively tested in the lab and in several beam tests and demonstrated stable, reproducible results with various types of prototype detector modules. The basic functionality of hardware and firmware has been established, yet some details still need fine-tuning. The hit time finding block is being developed but not yet finalized.

So far, the hit time finding was performed off-line by numeric fitting and typically a precision of  $2 \dots 3$  ns could be obtained at a cluster signal-to-noise ratio of  $25 \dots 15$ , respectively, when measured against a reference TDC. Fig. 9 summarizes the results obtained with various different prototype modules for the Belle upgrade, depending on the measured signal-to-noise. The accumulated data can be fit by a straight line when plotted in double-logarithmic mode.
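A straight line in a double-logarithmic plot corresponds to a power law, $\sigma_t = a \cdot \mathrm{SNR}^{b}$. As a rough illustration, the sketch below derives $a$ and $b$ from the two approximate operating points quoted above (2 ns at a signal-to-noise ratio of 25 and 3 ns at 15). This is not the fit shown in Fig. 9, only a two-point estimate, and the function names are illustrative.

```python
import math

def power_law_from_points(snr1, sigma1, snr2, sigma2):
    """Return (a, b) of sigma = a * snr**b passing through two points."""
    b = math.log(sigma2 / sigma1) / math.log(snr2 / snr1)
    a = sigma1 / snr1 ** b
    return a, b

scale, exponent = power_law_from_points(25.0, 2.0, 15.0, 3.0)

def time_precision(snr):
    """Estimated hit time precision in ns for a given cluster signal-to-noise ratio."""
    return scale * snr ** exponent

print(f"exponent b = {exponent:.3f}")
for snr in (10.0, 15.0, 25.0, 40.0):
    print(f"SNR = {snr:4.0f}: sigma_t ~ {time_precision(snr):.2f} ns")
```

With these two points the exponent comes out close to -0.8, i.e. the precision improves somewhat more slowly than inversely with the signal-to-noise ratio.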



Figure 9: Measured hit time precision versus the cluster signal-to-noise ratio.

The hit time finding can be used to discard off-time background and thus, together with the shorter shaping time of the APV25, reduce the overall occupancy by a factor of up to 100 compared to the present SVD2. [13]

## VII. SUMMARY AND OUTLOOK

The new Silicon Vertex Detector (SuperSVD) for the future Belle-II is now being designed, based on R&D and experience obtained with prototypes in the past few years. On the silicon detector module level, the Origami chip-on-sensor concept ensures low-mass double-sided readout using thinned APV25 front-end chips with fast shaping and yet excellent signal-to-noise.

Moreover, we have demonstrated a prototype of a fully functional and scalable electronics readout system including voltage level translation, which is achieved by capacitive coupling for analog signals and by optocouplers for the slow control lines. In the back-end, the data are sparsified on-line and hit time finding is used to narrow the acceptance window and thus reduce the overall occupancy considerably. This prototype system yielded excellent results in the lab as well as in several beam tests and is now being scaled up to match the full-sized SuperSVD detector.

#### REFERENCES

- [1] A.Abashian *et al.* (The Belle Collaboration), The Belle Detector, Nucl. Instr. and Meth. A 479 (2002), 117–232
- [2] D.R.Marlow, B-factory detectors, Nucl. Instr. and Meth. A 478 (2002), 80–87
- [3] M.Kobayashi, T.Maskawa, CP Violation in the Renormalizable Theory of Weak Interaction, Prog. Theor. Phys. 49 (1973), 652–657
- [4] S.Kurokawa, E.Kikutani, Overview of the KEKB Accelerators, Nucl. Instr. and Meth. A 499 (2003), 1–7, and other articles in this volume
- [5] H.Aihara *et al.*, Belle SVD2 vertex detector, Nucl. Instr. and Meth. A 568 (2006), 269–273
- [6] VA1TA Chip, http://www.ideas.no/products/ASICs/pdf/Va1Ta.pdf
- [7] H.-G.Moser *et al.*, DEPFET active pixel sensors, PoS VERTEX2007:022 (2007), 1–13
- [8] V.Re *et al.*, Babar Silicon Vertex Tracker: Status and Prospects, Nucl. Instr. and Meth. A 569 (2006), 1–4
- [9] M.Friedl *et al.*, The Origami Chip-on-Sensor Concept for Low-Mass Readout of Double-Sided Silicon Detectors, CERN-2008-008 (2008), 277–281
- [10] C.Irmler *et al.*, Construction and Performance of a Double-Sided Silicon Detector Module Using the Origami Concept, this volume
- [11] M.French *et al.*, Design and Results from the APV25, a Deep Sub-micron CMOS Front-End Chip for the CMS Tracker, Nucl. Instr. and Meth. A 466 (2001), 359–365
- [12] S.Gadomski *et al.*, The Deconvolution Method of Fast Pulse Shaping at Hadron Colliders, Nucl. Instr. and Meth. A 320 (1992), 217–227
- [13] M.Friedl *et al.*, Occupancy Reduction in Silicon Strip Detectors with the APV25 Chip, Nucl. Instr. and Meth. A 569 (2006), 92–97

# e-link: A Radiation-Hard Low-Power Electrical Link for Chip-to-Chip Communication

S. Bonacini<sup>a</sup>, K. Kloukinas<sup>a</sup>, P. Moreira<sup>a</sup>

<sup>a</sup>CERN, 1211 Geneva 23, Switzerland

sandro.bonacini@cern.ch

# Abstract

The e-link, an electrical interface suitable for transmission of data over PCBs or electrical cables, within a distance of a few meters, at data rates up to 320 Mbit/s, is presented. The e-link is targeted for the connection between the GigaBit Transceiver (GBTX) chip and the Front-End (FE) integrated circuits. A commercial component complying with the Scalable Low-Voltage Signaling (SLVS) electrical standard was tested and demonstrated a performance level compatible with our application. Test results are presented. An SLVS transmitter/receiver IP block was designed in 130 nm CMOS technology. A test chip was submitted for fabrication.

## I. INTRODUCTION

With the future upgrade of the LHC and its associated experiments, the number of tracker detector channels will increase by one order of magnitude with respect to the LHC trackers just completed. Nonetheless, the design strives to reduce the total material inside the detectors, which is mainly due to cables, cooling and mechanical support, the last one being related to the other two. It is thus necessary to minimize the power consumption of the electronic devices in the front-end (FE) and the number of cables required. This can be achieved by new low-power interconnection schemes between the FE and the off-detector electronics, and among the on-detector Application-Specific Integrated Circuits (ASICs); numerous slow data links could be aggregated into fewer, faster and more efficient links.

The use of an advanced CMOS technology, which allows several supply voltage levels for different purposes, helps to minimize the power consumption of the FE ASICs.

Recent technology advancements have demonstrated serial links as fast as 10 Gbps and above implemented in 130 nm CMOS technology. The GBT project was started to design the future optical data link for the experiments, which brings together the functions of data readout, trigger and control. The GBT will be connected to up to 32 FE ASICs, each requiring a dedicated electrical link, in a star-point topology. These links target short-distance transmission (typically up to 2 meters on PCB, and up to 4 meters on cable) and shall be as insensitive as possible to common-mode voltage variations.

The front-end electronics of particle physics detectors aim to achieve high levels of performance in terms of resolution and accuracy. This performance is limited by the system intrinsic noise, therefore electrical links should be designed to minimize crosstalk and power supply noise.

For these reasons, the study of a low-power low-voltage-swing electrical link was carried out. Among the several links examined, the Scalable Low-Voltage Signaling (SLVS) industry standard was chosen and tested. The protocol is briefly described in section II. The tests are described in section III.

Since the link circuitry shall be placed in the FE, it needs to work properly in the harsh environment of the experiments, characterized by a high level of radiation (up to hundreds of Mrad) and an intense magnetic field (up to 4 T). These constraints make commercial components unsuitable and require the design of novel radiation-hard transmitter and receiver circuits.

The design of an SLVS transmitter and an SLVS receiver was carried out, as part of the GBT project, for the interconnection between the GBTX chip and the FE ASICs. The design is presented in section IV.

# II. THE SLVS STANDARD

The SLVS standard is defined in [1] and describes a differential current-steering electrical protocol with a voltage swing of 200 mV on a 100  $\Omega$  load and a common mode of 200 mV. The peak-to-peak differential voltage is therefore 400 mV, as depicted in Fig. 1.



Figure 1: SLVS standard signaling scheme.

The output current is 2 mA, with a power consumption at the load of 0.4 mW. The reduction in common-mode with respect to other standards, like LVDS, allows the use of a supply voltage as low as 0.8 V for the output driver circuitry.
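The quoted swing and load power follow directly from Ohm's law. The short check below uses only the 2 mA and 100 Ω values given in the text; the variable names are illustrative.

```python
# Worked check of the SLVS figures quoted in the text.
I_OUT_A = 2e-3       # driver output current: 2 mA
R_LOAD_OHM = 100.0   # differential termination: 100 ohm

v_swing_v = I_OUT_A * R_LOAD_OHM        # voltage developed across the load
p_load_w = I_OUT_A ** 2 * R_LOAD_OHM    # power dissipated in the load

print(f"swing across load: {v_swing_v * 1e3:.0f} mV")
print(f"power in load:     {p_load_w * 1e3:.1f} mW")
```

This reproduces the 200 mV swing and the 0.4 mW dissipated in the termination.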

A few commercial parts which comply with this standard are available, mainly from National Semiconductor; their target application is mobile/portable devices, as short (< 30 cm) communication links over PCB traces and flat cable.

#### III. BIT ERROR RATE TESTING

The test aimed to demonstrate the capability of the electrical protocol to work over longer distances and different media than in the part's normal application.

#### A. Test setup

A commercial part which uses the SLVS standard was tested with several media types and lengths (5 m Ethernet cable, 24 cm kapton, 2 m PCB and others) at two different speeds (320 Mbps and 480 Mbps). The part we used is the LM4308 from National Semiconductor.

The test setup is composed of

- two Xilinx Spartan-3E evaluation boards,
- two custom PCBs, each holding two LM4308 components,
- two link media,

arranged as in Fig. 2.



Figure 2: Test setup (clock signals are not shown).



Figure 3: Test setup picture.

The LM4308 chip is an SLVS serdes, which can be hardwire-configured to be either a serializer or a deserializer. In the test, two LM4308 chips are serializers while the other two are deserializers.

Each one of the Xilinx Spartan-3E chips generates a pseudorandom sequence, which is fed to a serializer chip, and checks the sequence coming from a deserializer chip. The link media are connected to the serdes boards through Samtec QTE/QSE connectors.

A few special PCB-type media were fabricated for this purpose: a 1-m microstrip, a 2-m microstrip and a 2-m stripline; these lines follow a serpentine path to minimize area. An Ethernet plug adapter was also fabricated in order to test Ethernet cables.

# B. Test results

The test results are given in Table 1. The eye diagram in Fig. 4 was obtained at 480 Mbps at the load of a 2-m microstrip PCB line.

It should be noted that the LM4308 uses a forwarded-clock technique, therefore the bit errors which were measured might as well come from the clock line, which in all media runs along the data line.

Table 1: SLVS test results

| Media              | 320 Mbps             | 480 Mbps             |
|--------------------|----------------------|----------------------|
| 1-m microstrip     | $< 1 \cdot 10^{-13}$ | $< 1 \cdot 10^{-13}$ |
| 2-m microstrip     | $< 1 \cdot 10^{-13}$ | $< 1 \cdot 10^{-13}$ |
| 2-m stripline      | $< 1 \cdot 10^{-13}$ | $< 1 \cdot 10^{-13}$ |
| 24-cm Kapton       | $< 3 \cdot 10^{-14}$ | $< 1 \cdot 10^{-13}$ |
| 5-m Ethernet cable | $< 1 \cdot 10^{-13}$ | $2 \cdot 10^{-11}$   |



Figure 4: Eye diagram at load, at 480 Mbps using a 2-m microstrip board medium.

The test results of the SLVS standard were encouraging and demonstrated performance compatible with our target applications.

# IV. SLVS TRANSMITTER AND RECEIVER IP BLOCKS DESIGN

Transmitter and receiver IP blocks for integration in the FE ASICs, complying with the SLVS protocol, were designed in a 130 nm technology. The e-link can operate at any speed up to 320 Mbps. The transmitter and receiver blocks are designed to be rad-hard and SEU-hard.

Though these IP blocks are targeted for the implementation of the GBTX-FE connection, they are also suitable for general chip-to-chip communication within the LHC experiments.

The transmitter and receiver circuits are designed to be powered in the range from 1.0 to 1.5 V.

Studies on the radiation tolerance of the technology used [3] suggest that thin-oxide transistors suffer limited total dose effects. Only thin-oxide transistors are used in the design, avoiding any special layout technique. SEU-robustness is assured by triplicating all low-capacitance nodes and logic elements.



Figure 5: Transmitter output stage schematic.



Figure 6: Receiver first stage schematic.

## A. Transmitter

The transmitter, whose schematic is shown in Fig. 5, is implemented by an N-over-N driver which steers the current given by the current source M1. The common mode is kept at  $V_{ref,cm}$  = 200 mV by the replica bias of the source-follower M2.

In order to minimize the power consumption, the current output is adjustable from 2 mA down to 0.5 mA, with a 60% power reduction and thus proportional lowering of crosstalk. The transmitter can also be set into a power-down state when unused. The current output is set by a 4-bit digital switch (not in the figure).

In power-down mode, all the biasing circuits are switched off and the pre-driver stops toggling the final stage. The transmitter consumes 3 mW at 320 Mbps, with 1.2 V supply voltage and 2 mA output.

## B. Receiver

The receiver is implemented by a rail-to-rail differential amplifier, shown in Fig. 6, which guarantees a wide common-mode voltage range. The receiver can also be set into a power-down state when unused.

The first stage amplifier, similar to [2], is a combination of two basic complementary amplifiers, which together can cover fully the input range from negative to positive supply. Moreover, the amplifier is self-biased through a negative feedback mechanism.

In power-down mode the biasing is switched off, which prevents toggling on the output. The receiver consumes 210  $\mu$ W at 320 Mbps, 1.2 V supply and with a 64 fF output load.

# C. Test chip

A test chip containing the SLVS receiver and the SLVS transmitter was designed and submitted for fabrication. The test chip works as an LVDS-to-SLVS translator and vice versa. A few CMOS input pins are present to control the transmitter current output setting and the receiver shutdown. A loopback control pin is also provided for testing.



Figure 7: Test chip layout.

Testing will be performed on the chip to evaluate the bit error rate in the same fashion as the commercial part.

# V. CONCLUSIONS

The SLVS electrical standard for the e-link, targeted for the connection between the GBTX chip and the FE ASICs, was tested with a commercial part and demonstrated a performance level compatible with our application. An SLVS transmitter/receiver IP block was designed in 130 nm CMOS technology and the test chip was submitted for fabrication.

Future improvements might include the implementation of programmable pre-emphasis in the transmitter and an investigation of the LVDS compatibility of the electrical levels.

#### REFERENCES

- [1] JEDEC, JESD8-13, Scalable Low-Voltage Signaling for 400 mV (SLVS-400).
- [2] M. Bazes, Two Novel Fully Complementary Self-Biased CMOS Differential Amplifiers, IEEE Journal of Solid-State Circuits, Vol. 26, No. 2, Feb. 1991.
- [3] F. Faccio & G. Cervelli, Radiation-induced edge effects in deep submicron CMOS transistors, IEEE Transactions on Nuclear Science, vol. 52, no. 6, pages 2413–2420, December 2005.

# A Zero Suppression Micro-Circuit for Binary Readout CMOS Monolithic Sensors

A. Himmi<sup>a</sup>, G.Doziere<sup>a</sup>, O. Torheim<sup>a,b</sup>, C.Hu-Guo<sup>a</sup>, M.Winter<sup>a</sup>

 <sup>a</sup> Institut Pluridisciplinaire Hubert Curien, University of Strasbourg, CNRS/IN2P3, 23 rue du Loess, BP 28, 67037 Strasbourg, France
<sup>b</sup>Dept. of Physics and Technology, University of Bergen, Norway

Abdelkader.himmi@IReS.in2p3.fr

## Abstract

The EUDET-JRA1 beam telescope and the STAR vertex detector upgrade will be equipped with CMOS pixel sensors providing high-density tracking adapted to intense particle beams. The EUDET sensor, Mimosa26, is designed and fabricated in a CMOS 0.35 µm Opto process. Its architecture is based on a matrix of 1152x576 pixels, 1152 column-level Analogue-to-Digital Conversions (ADC) by discriminators and zero suppression circuitry. This paper focuses on the data sparsification architecture, which allows a data compression factor between 10 and 1000, depending on the hit density per frame. It can be extended to the final sensor for the STAR upgrade.

# I. INTRODUCTION

CMOS Monolithic Active Pixel Sensors (MAPS) are characterized by a detection efficiency close to 100 %, high granularity (~µm), fast read-out frequency (~k frame/s), low material budget (~30  $\mu$ m Si) and radiation tolerance (~1 Mrad, ~10<sup>13</sup> n<sub>eq</sub>/cm<sup>2</sup>). They are foreseen to equip a new generation of vertex detectors in subatomic physics experiments [1]. Their first application coincides with the upgrade of the Heavy Flavor Tracker (HFT) in the STAR (Solenoidal Tracker at RHIC) experiment [2]. They will also equip the beam telescope of the European project EUDET [3]. The aim of the EUDET-JRA1 project is to support the infrastructure for detector R&D towards the International Linear Collider. One of its activities is to provide a CMOS pixel beam telescope to be operated initially at the DESY II 6 GeV electron test beam facility, near Hamburg in Germany. The high-precision beam telescope will be built with up to six measurement planes equipped with MAPS.

Both applications need sensors with digital output and an integrated zero suppression circuit in order to increase the frame read-out frequency and thereby reduce the frame occupancy. The zero suppression circuit integrated in a CMOS pixel sensor is located at the bottom of the matrix, after the analogue-to-digital conversion circuit. Mimosa26 [4], designed for the EUDET telescope, implements such an architecture. It consists of a pixel array of 576 rows and 1152 columns with a pixel pitch of 18.4 µm. Each pixel includes amplification and Correlated Double Sampling (CDS), and each column of pixels ends with a discriminator performing the analogue-to-digital conversion. The data from the 1152 discriminators are processed by the zero suppression circuit.

Before its integration into a final sensor, the concept of the zero suppression logic was validated. SUZE-01, a reduced-scale, fully digital circuit able to treat and format 128 emulated discriminator outputs, was successfully fabricated and tested in 2007. The tests showed that the hit pixel selection algorithm is fully operational. This concept is now implemented in the Mimosa26 chip.

The first part of this paper gives an overview of the sensor read-out architecture. The second part presents the zero suppression algorithm for the MAPS architecture, which is structured in 3 steps. The last section is dedicated to the test methodology for the digital output sensor.

#### II. OVERVIEW OF THE SENSOR ARCHITECTURE

#### A. Hit recognition and encoding format

The sensor is read out in a rolling shutter mode, the rows being selected sequentially by activating a multiplexer every 16 clock cycles. Figure 1 shows an example of a digital matrix frame with some hits. Their coding is performed in terms of "states", each representing a group of successive pixels in a row giving a signal above the discriminator threshold. The "state" format includes the column address of the first hit pixel, followed by 2 bits encoding the number of contiguous pixels in the group delivering a signal above threshold. The row address is represented by an 11-bit number and is common to all "states" in a row. Up to M "states" per row can be processed. This limit was derived from a statistical study based on the highest occupancy expected in the pixel array.



Figure 1: Schematic view illustrating the encoding of the pixels delivering a signal above discriminator threshold

# B. Principle of hit finding algorithm

The zero suppression logic [5] is based on sparse-scan read-out [6] in order to optimize the data bandwidth. A fast priority scan path between the first and last discriminator outputs is implemented to minimize the delay within the critical data path. The 1152 column terminations are distributed over 18 banks (see Figure 2), each bank being connected to 64 columns. The digital architecture that finds the '1's in a row of discriminator outputs is based on the "Sparse Data Scan" algorithm.



Figure 2: Block diagram of the sensor read-out architecture

# III. ZERO SUPPRESSION FOR MAPS ARCHITECTURE

# A. Fast readout architecture of MAPS

The digital part sequentially controls each line of the whole frame, which is composed of 576 lines of 1152 columns. The main sequencer, which runs at 80 MHz, provides the line addresses and all synchronization and control signals (see Figure 3).



Figure 3: Timing diagram for SUZE control signals

A JTAG controller programs the configuration information. Each row of the matrix is read in 200 ns, and the frame read-out frequency is about 10 kHz.
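The row and frame timing can be cross-checked from figures given elsewhere in the text: an 80 MHz sequencer clock, one row selected every 16 clock cycles, and 576 rows per frame. The sketch below only combines these numbers; the variable names are illustrative.

```python
# Timing cross-check from the figures quoted in the text.
CLOCK_HZ = 80e6        # main sequencer clock
CYCLES_PER_ROW = 16    # one row is selected every 16 clock cycles
N_ROWS = 576           # rows per frame

row_time_s = CYCLES_PER_ROW / CLOCK_HZ
frame_time_s = N_ROWS * row_time_s
frame_rate_hz = 1.0 / frame_time_s

print(f"row time:   {row_time_s * 1e9:.0f} ns")
print(f"frame time: {frame_time_s * 1e6:.1f} us")
print(f"frame rate: {frame_rate_hz / 1e3:.2f} kframe/s")
```

This gives 200 ns per row, 115.2 µs per frame and about 8.7 kframe/s, consistent with the rounded figure of about 10 kHz.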

## B. Readout Chain

Zero suppression is based on row by row sparse data readout and organized in pipeline mode in three steps.

#### 1) Sparse data scan

Figure 4 shows the different steps of the sparse data scan for one bank.



Figure 4: Schematic view of sparse data scan for one bank

The algorithm proceeds through four consecutive steps, summarized below:

• In the first step, the data inputs for the process are extracted from 64 discriminators;

• The second step consists in encoding groups of hit pixels. This logic provides Enable bits and Code bits for each column composing a bank. The Enable bit is set to 1 for the first hit pixel in a group. The number of Enable bits set to 1 characterizes the state;

• The third step selects the "states"; each "state" is selected successively by a sparse data scan. It uses a chain of alternated NAND and NOR gates for the priority management during the sparse-scan. The generation of "states" requires several instructions. The number of "states" (N) in a bank is related to M "states" in a row. The algorithm manages up to N=6 instructions or "states" in a bank;

• At each instruction, the column address of the "state" is decoded. The last digital step stores the N "states" and generates "status" information indicating the number of "states" per bank. Each bank has its own address encoded in 5 bits.
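The steps above can be sketched in software for a single bank. This is a behavioural illustration only, not the chip's logic: it assumes that the 2-bit code stores the run length minus one (so one to four contiguous pixels per "state"), that longer runs are split into several "states", and it replaces the NAND/NOR priority chain by a simple search for the lowest enabled column. All function names are hypothetical.

```python
BANK_WIDTH = 64   # columns per bank (from the text)
N_MAX = 6         # "states" kept per bank (from the text)

def enable_and_code(hits):
    """Step 2: per-column Enable and Code bits for one bank.
    Assumed convention: code = run length - 1, at most 4 pixels per state."""
    enable = [0] * len(hits)
    code = [0] * len(hits)
    col = 0
    while col < len(hits):
        if hits[col]:
            run = 1
            while run < 4 and col + run < len(hits) and hits[col + run]:
                run += 1
            enable[col] = 1        # first hit pixel of the group
            code[col] = run - 1    # 2-bit code
            col += run
        else:
            col += 1
    return enable, code

def sparse_scan(enable, code, n_max=N_MAX):
    """Steps 3 and 4: each pass selects the lowest enabled column
    (software stand-in for the priority chain) and stores its state."""
    pending = list(enable)
    states = []
    for _ in range(n_max):
        if 1 not in pending:
            break
        col = pending.index(1)
        states.append((col, code[col]))
        pending[col] = 0
    status = len(states)           # number of states found in this bank
    return states, status

hits = [0] * BANK_WIDTH
for c in (3, 4, 10, 20, 21, 22, 40):
    hits[c] = 1
enable, code = enable_and_code(hits)
states, status = sparse_scan(enable, code)
print(states, status)
```

In this sketch, groups beyond the sixth in a bank are simply dropped, mirroring the N=6 limit quoted above.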

#### 2) State Multiplexer

The state multiplexer reads out the outcomes of the first step in the 18 banks and keeps up to M=9 "states". For a row, each bank provides a maximum of N "states". Another logic unit, based on multiplexers, selects a maximum of M "states" among the 18xN "states" of the banks. Thus, at most M "states" are stored in memory. If more than M "states" are identified, an overflow bit is set to 1. The format of the row "states" includes the row address, the status register (number of states in the entire row), the "state" column addresses and the overflow bit.
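The selection of at most M "states" per row can be illustrated with a few lines of Python. This is a minimal behavioural sketch, not the multiplexer tree itself: it assumes that "states" are kept in increasing column order, and that a bank-local column is converted to a row-wide address as bank x 64 + column. The function name and data layout are hypothetical.

```python
M_MAX = 9         # "states" kept per row (from the text)
BANK_WIDTH = 64   # columns per bank (from the text)

def multiplex_row(bank_states, m_max=M_MAX):
    """bank_states: one list of (local_column, code) per bank, in bank order.
    Returns (selected states with row-wide columns, status, overflow bit)."""
    selected = []
    overflow = 0
    for bank, states in enumerate(bank_states):
        for local_col, code in states:
            if len(selected) < m_max:
                selected.append((bank * BANK_WIDTH + local_col, code))
            else:
                overflow = 1   # more than M states were found in this row
    return selected, len(selected), overflow

bank_states = [[] for _ in range(18)]
bank_states[0] = [(3, 1), (10, 0)]
bank_states[5] = [(0, 0)]
print(multiplex_row(bank_states))
```

A row in which two banks each deliver six "states" would keep the first nine and raise the overflow bit.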

This block consists of 3 sub-blocks:

- 2 identical Mux6x9To9 modules, each extracting 9 "states" and 1 status for a half row
- 1 Mux2x9To9 module, retaining 9 "states" and a status from these 2 modules

After the active state of RstLine, the process starts by scanning the results of the 18 banks from column 0 to column 1151. The enable signal for the CkReadPixMux clock allows doubling the clock period for this logic, which is the most critical part of the design. The algorithm of the Mux6x9To9 module (see Figure 5) reads at most 9 hit "states" in 3 steps. At each rising edge of the enabled CkReadPixMux (T=CK1, CK2, CK3), at most 3 hit "states" can be latched, and each step proceeds through 3 consecutive stages, described below:

- Cursor 1 is located at:
  - the first hit "state"
- Cursor 2 is located either:
  - at the fourth hit "state" if it belongs to the same bank pointed to by Cursor 1
  - or at the second or third hit "state" if the "state" is in a different bank from the one used by Cursor 1
  - or at the Cursor 1 location if there are no more hits
- Cursor 3 is located either:
  - at the third hit "state" after the Cursor 2 location if this "state" belongs to the same bank pointed to by Cursor 2
  - or at the first hit "state" if the "state" is in a different bank from the one pointed to by Cursor 2
  - or at the Cursor 2 location if there are no more hits

At the second (CK2) and third (CK3) rising edges, Cursor 1 points to the previous location of Cursor 2 or 3, according to the configuration of the hit "states". The Cursor 2 and Cursor 3 locations are then updated following the same processing as during the CK1 phase.



Figure 5: View of Mux6x9To9 algorithm

#### 3) Memory management

This step stores the outcomes of the *state multiplexer* step in memory.



Figure 6: Memory manager of hits "states"

The memory is composed of 2 IP buffers to ensure continuous read-out (4 SRAMs of 600 x 16 bits each, see Figure 2):

• During the current frame, the writing mode uses 2 SRAMs and the reading mode works with the 2 other SRAMs.

• The writing process writes words of 2x16 bits. In order to reduce the memory space used (see Figure 6), if the last 16-bit word is not written (in case of an even number of hit states in the current row), the status of the next row is written in that location.

• At the end of the frame, a state machine memorizes the number of written words given by the write address counter.

During the next frame, the 2 operations (reading/writing) are swapped, and this process is repeated at each frame.
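The alternation between writing and reading can be pictured as a classic double (ping-pong) buffer. The sketch below is a simplified software model under stated assumptions: each buffer is a plain Python list rather than a pair of 600 x 16-bit SRAMs, the word packing described above is ignored, and the class and method names are hypothetical.

```python
class PingPongMemory:
    """Two buffers: one is written during the current frame while the
    other, filled during the previous frame, is read out."""

    def __init__(self, depth=600):
        self.depth = depth
        self.buffers = [[], []]
        self.write_sel = 0        # index of the buffer currently written
        self.words_written = 0    # word count latched at end of frame

    def write(self, word):
        """Append a word to the write buffer; return False if it is full."""
        buf = self.buffers[self.write_sel]
        if len(buf) >= self.depth:
            return False
        buf.append(word)
        return True

    def end_of_frame(self):
        """Latch the word count, swap roles and clear the new write buffer."""
        self.words_written = len(self.buffers[self.write_sel])
        self.write_sel ^= 1
        self.buffers[self.write_sel] = []

    def read_frame(self):
        """Return the words stored during the previous frame."""
        return list(self.buffers[self.write_sel ^ 1][: self.words_written])

mem = PingPongMemory()
for word in (0x0011, 0x0022, 0x0033):
    mem.write(word)
mem.end_of_frame()
mem.write(0x0099)            # next frame is written...
print(mem.read_frame())      # ...while the previous one is read
```

The latched word count plays the role of the number of written words memorized by the state machine at the end of each frame.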

The format of the row "states" is composed of *States/Line* and *State* words. *States/Line* contains the address of the hit line, the number of "states" for this line (i.e. a number between one and nine), and an overflow flag. *State* contains the address of the first hit pixel and the number of successive hit pixels, as shown in Figure 7. Two low-voltage differential signalling (LVDS) data lines (DO0 and DO1) are used for the data transmission (the frequency is 80 MHz).

Figure 7 describes the format of the data sent by Mimosa26. The parts of the data frame are the *Header*, *Frame counter*, *Data Length*, *States/Line*, *State*, and *Trailer*. The two-word elements (i.e. *Header*, *Frame counter*, *Data Length* and *Trailer*) are divided into two parts. For instance, the header includes Header0 (the 16 LSBs) and Header1 (the 16 MSBs). The *Header* and the *Trailer* can be used together to detect loss of synchronization.

DataLength is the number of 16-bit words of useful data. The data are sent periodically at the beginning of each new frame; the number of bits sent between two headers is variable and depends on the number of words recorded during the last frame. Both data lines carry the same number of bits; consequently, Datalength0 and Datalength1 are the same. The useful data are represented by the daisy chain of States/Line and States. The maximum amount of data generated by the zero suppression is 570 x 16 bits for each output. Beyond this limit, the data frame is truncated. The data rate per output reaches around 10 Mbytes/s.
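The quoted data rate can be cross-checked against the frame timing. The sketch below assumes that each output line carries one bit per 80 MHz clock cycle and that a frame lasts 576 rows of 16 clock cycles, as stated earlier in the text; the variable names are illustrative.

```python
# Cross-check of the per-output data rate (one bit per clock cycle assumed).
CLOCK_HZ = 80e6
CYCLES_PER_FRAME = 576 * 16          # rows x clock cycles per row
MAX_WORDS_PER_OUTPUT = 570           # 16-bit words, from the text

frame_time_s = CYCLES_PER_FRAME / CLOCK_HZ
payload_bits = MAX_WORDS_PER_OUTPUT * 16
payload_rate_bytes_s = payload_bits / 8 / frame_time_s
line_rate_bytes_s = CLOCK_HZ / 8

print(f"bit slots per frame and output: {CYCLES_PER_FRAME}")
print(f"maximum payload bits:           {payload_bits}")
print(f"payload rate: {payload_rate_bytes_s / 1e6:.2f} Mbytes/s")
print(f"line rate:    {line_rate_bytes_s / 1e6:.2f} Mbytes/s")
```

Under these assumptions the 9120 payload bits fit within the 9216 bit slots of one frame, and the payload rate of about 9.9 Mbytes/s matches the quoted figure of around 10 Mbytes/s.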



Figure 7: Format of the Mimosa 26 output data : 80 MHz dual channel

## IV. DIGITAL OUTPUT SENSOR

## A. Chip architecture

Figure 8 shows the Mimosa 26 layout: the 1<sup>st</sup> sensor integrating the zero suppression feature, fabricated at the beginning of 2009. The zero suppression logic, located below the 1152 discriminators, occupies an area of 21.5 x 0.62 mm<sup>2</sup>. It is based on the SUZE-01 prototype. The propagation delay over such dimensions becomes dominant at 80 MHz and complicates the layout routing of the digital part. The layout includes 70K standard cells. An embedded JTAG controller allows communication between the core of the system and an external test structure. The fabrication process is the AMS C35B4C3 CMOS 0.35  $\mu$ m technology, already used in MIMOSA pixel sensors.



Figure 8: a) Mimosa 26 layout: 1<sup>st</sup> sensor with Integrated Zero Suppression b) SUZE-01 prototype: Zero Suppression circuit

## B. Test of the sensor with integrated zero suppression

Testing the chip requires a dedicated board. Communication through the JTAG protocol is initiated by a user interface written in C (Windows environment). This user interface configures the registers for the initialisation sequence. All parameters for the synchronization of the acquired frame are entered, together with a two-line pattern. The ASIC includes an embedded test structure, which generates a matrix made of 278 repetitions of the two-line pattern. Each part of the architecture can be tested separately or as a whole. The Mimosa26 test board, at the output of the chip, is connected to the NXI platform (National Instruments), which acquires the data stream at 160 Mbits/s (see Figure 9). For automated testing, the data patterns are selected from a source text file and sent to the chip via the JTAG interface. All the features of the architecture were tested successfully: the encoding of the hits (location and geometry) and the limits of the data compression system. Additional tests were performed for reliability and robustness:

• Reliability test: 3 patterns tested 7 million times without error.

• Robustness test: 199 frames x 10 000 random patterns tested at 80 MHz without error.

The full chain test, including the pixel matrix, discriminators and zero suppression logic, is described in reference [7].



Figure 9: Schematic view of the test set-up of Mimosa 26

# V. CONCLUSIONS

We have designed a fast read-out architecture which integrates a zero suppression circuit based on sparse data scan. The read-out speed is  $\sim 10$  kframe/s. The Mimosa26 read-out chain was validated by functional tests in the laboratory. The resulting data flow reduction will allow the EUDET telescope to run on high-intensity particle beams. The sensor for the HFT upgrade in STAR will be based on the Mimosa26 architecture and is planned to be manufactured in 2010.

## VI. REFERENCES

- [1] M. Winter et al., Vertexing based on high precision, thin CMOS sensors, in Proceedings of the 8th ICATPP, Como, Italy, October 2003.
- [2] L.C. Greiner et al., STAR vertex detector upgrade development, in Proceedings of Vertex 2007, Lake Placid, NY, U.S.A., September 23–28 2007, PoS (Vertex 2007) 041.
- [3] EUDET: Detector R&D towards the International Linear Collider; http://www.eudet.org/.
- [4] Ch Hu-Guo et al, CMOS pixel sensor development: a fast read-out architecture with integrated zero suppression, 2009 JINST 4 P04012 doi: 10.1088/1748-0221/4/04/P04012
- [5] K. Einsweiler, A. Joshi, S. Kleinfelder, L. Luo, R. Marchesini, O. Milgrome, F. Pengg, "Dead-time Free Pixel Readout Architecture for ATLAS Front-End IC", IEEE Nucl. Sci., VOL.46, NO. 3, JUNE 1999.
- [6] J.J. Jaeger, C. Boutonnet, P. Delpierre, J. Waisbard, and F. Plisson, "A Sparse Data Scan Circuit for Pixel Detector Readout", IEEE Nucl. Sci., VOL.41, NO. 3, JUNE 1994.
- [7] Ch Hu-Guo et al, "10000 frames per second readout MAPS for the EUDET beam telescope", published in the same proceeding, Paris, September 2009.

# Commissioning of the CSC Level 1 Trigger Optical Links at CMS

D.Acosta<sup>a</sup>, G.P.Di Giovanni<sup>a</sup>, D.Holmes<sup>a</sup>, A.Madorsky<sup>a</sup>, M.Matveev<sup>b</sup>, P.Padley<sup>b</sup>, L.Redjimi<sup>b</sup>, L.Uvarov<sup>c</sup>, D.Wang<sup>a</sup>

<sup>a</sup>University of Florida, Gainesville, FL 32611, USA
<sup>b</sup>Rice University, Houston, TX 77005, USA
<sup>c</sup>Petersburg Nuclear Physics Institute, St.Petersburg, Russia

matveev@rice.edu

#### Abstract

The Endcap Muon (EMU) Cathode Strip Chamber (CSC) detector at the CMS experiment at CERN has been fully installed and operational since the summer of 2008. A system of 180 optical links connects the middle and upper levels of the CSC Level 1 Trigger chain. The design and commissioning of all optical links present several challenges, including reliable clock distribution, link synchronization and alignment, status monitoring and system testing. We gained extensive experience conducting various tests and participating in local and global cosmic runs and in the initial stage of the LHC operation. In this paper we present our hardware, firmware and software solutions and the first results of the optical link commissioning.

#### I. INTRODUCTION

The CSC detector [1] comprises 468 six-layer multi-wire proportional chambers arranged in four stations in the Endcap regions of the CMS with the goal to provide muon identification, triggering and momentum measurement.

The CSC Level 1 trigger electronics consists of: (1) on-chamber anode and cathode front-end (AFEB and CFEB) and Anode Local Charged Track (ALCT) boards; (2) Trigger Motherboard (TMB) and Muon Port Card (MPC) in sixty 9U crates on the periphery of the return yoke of CMS; and (3) one Track Finder (TF) in the underground counting room (Fig.1). This system provides four trigger candidates to the CMS Muon Trigger within 80 bunch crossing (BX) latency, or 2.5 µs.



Figure 1: EMU CSC Level 1 Trigger Electronics

The AFEB amplifies and discriminates the anode signals. The CFEB (4 or 5 boards per chamber) amplifies, shapes and digitises the strip charge signals. The anode patterns provide more precise timing information than the cathode signals, and also provide the coarse radial position and angle of the passing particle for the trigger chain. The FPGA-based processing unit in the ALCT searches for patterns of hits in six planes that would be consistent with muon tracks originating from the interaction point. A pattern is considered valid if hits from at least four planes are present in it.

Two valid anode patterns, or ALCTs, are sent to the TMB. Based on comparator half-strip hits sent from the CFEBs, the TMB searches for two patterns of hits from at least four planes and then matches these two Cathode Local Charged Track (CLCT) patterns with the two ALCT ones, making a correlated two-dimensional LCT.

Up to nine TMBs, in pairs with Data Acquisition Motherboards (DMB), one Clock and Control Board (CCB), and one MPC reside in each peripheral 9U crate. 60 such crates are mounted along the outer rim of the endcap iron disks. Every bunch crossing, the MPC receives up to 18 LCTs from 9 TMB boards, sorts them and sends the three best ones via optical links to the Sector Processor (SP) residing in the TF crate in the underground counting room. There are 180 CSC synchronous trigger optical links in total. Each DMB has its own asynchronous optical link for data transmission to the CMS DAQ system using the Data Dependent Units (DDU) and Data Concentrator Cards (DCC). These reside in four custom 9U crates in the underground counting room.

The TF consists of 12 SP boards, the Muon Sorter (MS), the DDU and the CCB. Each SP receives 15 data streams with trigger primitives from five MPCs and performs track reconstruction for the  $60^{\circ}$  sector. The three selected tracks are sent to the MS via a custom backplane. The MS sorts the 36 incoming tracks and selects the four best ones and transmits them over copper links to the Global Muon Trigger receiver in the Global Trigger crate. Every SP also provides data to the DAQ system via the TF DDU module.
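The board and link counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers from the text; the constant and function names are illustrative and not part of any CSC software.

```python
# Counts quoted in the text; all names here are illustrative only.
PERIPHERAL_CRATES = 60   # one MPC per peripheral crate
LINKS_PER_MPC = 3        # three best LCTs sent per bunch crossing
TMBS_PER_CRATE = 9
LCTS_PER_TMB = 2
SP_BOARDS = 12
MPCS_PER_SP = 5
TRACKS_PER_SP = 3


def trigger_link_budget():
    """Return the link and track counts implied by the architecture."""
    return {
        "lcts_into_mpc": TMBS_PER_CRATE * LCTS_PER_TMB,           # per BX
        "links_from_mpcs": PERIPHERAL_CRATES * LINKS_PER_MPC,     # transmit side
        "streams_per_sp": MPCS_PER_SP * LINKS_PER_MPC,            # receive side
        "links_into_sps": SP_BOARDS * MPCS_PER_SP * LINKS_PER_MPC,
        "tracks_into_ms": SP_BOARDS * TRACKS_PER_SP,
    }


if __name__ == "__main__":
    for name, value in trigger_link_budget().items():
        print(f"{name}: {value}")
```

Counting from the transmit side (60 MPCs) and from the receive side (12 SPs) gives the same total of 180 links, and the 36 tracks entering the Muon Sorter follow from three tracks per SP.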

#### II. OPTICAL LINK ARCHITECTURE

The basic units of the CSC optical link are the Texas Instruments TLK2501 [2] gigabit serializer/deserializer (SERDES) and the Finisar FTRJ8519 optical transceiver (Fig.2). All links are simplex and operate at double the LHC clock frequency (~80.16MHz, referred to below as 80MHz). The source of trigger data is the MPC board, and the target is the SP. Each muon pattern (referred to below as a "muon") is sent via a separate link. The MPC transmits the three best muons in ranked order. Each SP receives up to 6 muons from the inner station ME1, and three muons from each of the stations ME2, ME3 and ME4 (Fig.2). In total, there are 15 optical receivers and 15 TLK2501 deserializers on each SP board. Due to layout constraints the length of the optical fibers varies from 59 m to 112 m, so the propagation times differ by up to 270 ns (assuming a delay of ~5 ns/m in multi-mode fiber). All optical connections are implemented through the front panels of the MPC and SP boards. A front view of the TF crate with 180 optical fibers connected is shown in Fig.3.



Figure 2: CSC Level 1 Trigger Optical Links



Figure 3: Front View of the CSC Track Finder crate
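The fiber-length spread translates into a timing skew that the receivers must absorb. The following sketch reproduces that estimate from the numbers in the text; the 25 ns bunch-crossing period is a rounded value assumed here for illustration, and the function names are hypothetical.

```python
NS_PER_METRE = 5.0   # approximate delay in multi-mode fiber, as quoted in the text
BX_NS = 25.0         # assumed nominal bunch-crossing period (rounded)


def fiber_delay_ns(length_m):
    """Propagation delay of a fiber of the given length."""
    return length_m * NS_PER_METRE


def skew_ns(shortest_m, longest_m):
    """Arrival-time difference between the shortest and longest fiber."""
    return fiber_delay_ns(longest_m) - fiber_delay_ns(shortest_m)


if __name__ == "__main__":
    skew = skew_ns(59.0, 112.0)
    print(f"skew: {skew:.0f} ns = {skew / BX_NS:.1f} BX")
```

With 59 m and 112 m fibers the skew comes out at 265 ns, consistent with the quoted figure of about 270 ns, or roughly 10 to 11 bunch crossings.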

## **III. CLOCK DISTRIBUTION**

The TLK2501 specification requires that the peak-to-peak jitter of the SERDES reference clock (80.16MHz in our case) be no more than 40 ps [2]. Since our trigger links are synchronous, we must use a derivative of the LHC clock frequency. The CCB, which is the source of the clock and control signals, includes the CERN-designed TTCrq mezzanine board [3] with the TTCrx and QPLL2 ASICs. The TTCrx provides the 40MHz clock with a relatively high jitter of a few hundred picoseconds, while the QPLL2 provides three LVDS clock outputs of 40MHz, 80MHz and 160MHz with a jitter below 50 ps [4]. It was decided to route the QPLL2 80MHz LVDS clock output via the custom peripheral backplane to the MPC and use it as the reference for the TLK2501 serializers.

On a SP board the 80MHz reference clock is obtained from the 40MHz frequency arriving from the CCB. Such a solution allows us to use the peripheral CCB board in the TF crate without any modifications. The default CCB source is the 40MHz clock from the QPLL2; and all the twelve 40MHz clocks to SP boards in the TF crate are delivered over separate LVDS backplane lines of the same length.

On the first prototype of the SP board, in 2002, the 80MHz reference clock was synthesized in the FPGA using the Digital Clock Manager (DCM). The output jitter was excessive and the link did not lock properly. It was decided to build a small daughter board (Fig.4) carrying the same QPLL2 ASIC that the TTCrq mezzanine uses. This board is installed on top of the SP main board and provides a low-jitter 80MHz LVDS clock, which is distributed via clock repeaters from the daughter board to all 15 deserializers.



Figure 4: SP Clock Daughter Board (top and bottom view)

The QPLL2 ASICs on both the CCB and SP mezzanines are set to the default operation "mode 1" [4], in which the QPLL2 calibration logic is active and a frequency calibration cycle is executed automatically after a reset and every time a loss of lock is detected, so this mode requires minimal monitoring. The locking time, including a frequency calibration cycle, is ~180 ms. The "lock" state can be monitored with the LEDs on the front panels of the SP and CCB boards as well as from status registers available via VME.

The locking range of the 22 production SP clock daughter boards as well as a couple of the TTCrq mezzanines was studied during production tests of the TF. It was shown that all the tested boards can withstand a variation of at least -84, +42 ppm of the LHC frequency and thus meet the CMS trigger requirements (Fig.5).



Figure 5: QPLL Locking Range
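To give a feeling for the size of the measured locking range, the ppm limits can be converted into absolute frequency offsets. This is a minimal sketch; it assumes a nominal clock of half the ~80.16MHz reference quoted earlier, and the function names are illustrative.

```python
# Nominal LHC clock taken as half of the ~80.16 MHz reference quoted in the text.
LHC_CLOCK_HZ = 80.16e6 / 2


def ppm_to_hz(ppm, nominal_hz=LHC_CLOCK_HZ):
    """Convert a relative offset in parts per million into hertz."""
    return nominal_hz * ppm * 1e-6


def locking_window_hz(low_ppm=-84.0, high_ppm=42.0, nominal_hz=LHC_CLOCK_HZ):
    """Absolute frequency window corresponding to the measured ppm limits."""
    return (nominal_hz + ppm_to_hz(low_ppm, nominal_hz),
            nominal_hz + ppm_to_hz(high_ppm, nominal_hz))


if __name__ == "__main__":
    low, high = locking_window_hz()
    print(f"locking window: {low:.1f} Hz to {high:.1f} Hz")
```

Under this assumption the guaranteed window spans roughly 3.4 kHz below to 1.7 kHz above the nominal 40.08 MHz clock.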

## IV. LINK SYNCHRONIZATION AND ALIGNMENT

The 16-bit parallel data in the TLK2501 transmitter is encoded into 20 bits using the 8B/10B encoding format. Two further control signals, TX\_EN and TX\_ER, select between the "normal data character", "idle", "carrier extend" and "error propagation" states. The latter three are special codes defined in the 8B/10B format. One of them, the "idle", is used as a synchronization pattern to recover the byte boundary. The decoder in the deserializer detects the K28.5 comma contained in the "idle" symbol and generates a synchronization signal that aligns the data to their 10-bit boundaries for decoding. The decoder then converts each 10-bit symbol back to 8 bits, removing the control symbols. The receiver has two status outputs, RX\_DV and RX\_ER, which indicate which of the four link states listed above is being received.
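The encoding overhead and the four link states can be summarised in a short sketch. The serial line rate follows directly from the numbers above (20 encoded bits per 80.16MHz clock). The mapping of the two control bits to states is the one commonly quoted for the TLK2501 and is an assumption here; it should be checked against the datasheet [2]. Function names are illustrative.

```python
REF_CLOCK_HZ = 80.16e6   # reference clock quoted in the text
PAYLOAD_BITS = 16        # parallel data word
ENCODED_BITS = 20        # after 8B/10B encoding

# Assumed mapping of (enable/valid bit, error bit) to link state; the same
# table is assumed for (TX_EN, TX_ER) and (RX_DV, RX_ER). Verify against [2].
LINK_STATES = {
    (0, 0): "idle",
    (0, 1): "carrier extend",
    (1, 0): "normal data",
    (1, 1): "error propagation",
}


def line_rate_bps():
    """Serial bit rate on the fiber, including 8B/10B overhead."""
    return REF_CLOCK_HZ * ENCODED_BITS


def payload_rate_bps():
    """User data rate carried by one link."""
    return REF_CLOCK_HZ * PAYLOAD_BITS


def link_state(valid_bit, error_bit):
    """Decode the two control or status bits into a state name."""
    return LINK_STATES[(valid_bit, error_bit)]


if __name__ == "__main__":
    print(f"line rate:    {line_rate_bps() / 1e9:.4f} Gb/s")
    print(f"payload rate: {payload_rate_bps() / 1e9:.4f} Gb/s")
    print(link_state(0, 0))
```

The ratio of payload to line rate is 16/20, the usual 80% efficiency of 8B/10B coding.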

The only way to synchronize (or re-synchronize) the TLK2501 chipset is to put a transmitter into the "idle" state for at least 3 clock cycles. This is done upon the arrival of the L1Reset (Resynch) command distributed by the Timing, Trigger and Control (TTC) system of CMS at the beginning of each run. Then, after the transmission latency, link propagation delay and data reception latency, every TLK2501 receiver switches into "idle" mode. Three data streams from each MPC are supplied to a front FPGA of the SP (there are five front FPGAs in total), where the input alignment FIFO buffers (one per muon) have been reset by the same L1Reset command and are waiting for valid data from the receiver. Each alignment FIFO resumes writes after the corresponding TLK2501 receiver has switched to normal data transmission. When all receivers have started getting valid data (and all their RX\_DV outputs have become "1"), the AND of all RX\_DV outputs is synchronized with the SP bunch crossing (BX) clock CLK40 and enables the FIFO reads (Fig.6); the SP is thus aligned to the latest (longest) link. In the present MPC firmware the length of the "idle" pattern is set to 128 BX, or 3.2 µs.



Figure 6: Simplified Alignment Scheme
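The alignment principle described above (write each FIFO as soon as its link delivers valid data, start reading all FIFOs once every link is valid) can be illustrated with a small behavioural model. This is a simplified sketch in units of whole bunch crossings, not the SP firmware; the function name and arguments are hypothetical.

```python
from collections import deque


def align_links(first_valid_bx, n_words=4):
    """Model per-link alignment FIFOs.

    first_valid_bx: for each link, the BX at which RX_DV first becomes 1.
    Each link then delivers words 0, 1, ..., n_words - 1 on consecutive BXs.
    Returns (BX at which reads start, FIFO depths at that BX, aligned words).
    """
    fifos = [deque() for _ in first_valid_bx]
    read_start = max(first_valid_bx)  # the AND of all RX_DV goes high here
    depth_at_start = None
    aligned = []
    for bx in range(read_start + n_words):
        # Writes: a link stores one word per BX while it has valid data.
        for fifo, start in zip(fifos, first_valid_bx):
            if start <= bx < start + n_words:
                fifo.append(bx - start)
        if bx == read_start:
            depth_at_start = [len(fifo) for fifo in fifos]
        # Reads: enabled for every FIFO once all links are valid.
        if bx >= read_start:
            aligned.append(tuple(fifo.popleft() for fifo in fifos))
    return read_start, depth_at_start, aligned


if __name__ == "__main__":
    start, depths, words = align_links([3, 5, 4], n_words=3)
    print(start, depths, words)
```

In the example the reads start at the arrival time of the slowest link, and every output tuple carries the same word index from all links: the shorter links have simply waited in their FIFOs, which is why the FIFO depth at the start of reading reflects the fiber-length differences.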

For the whole TF crate with 12 Sector Processors, the synchronization and alignment of all 180 links requires the Alignment FIFO delays to be set and adjusted individually for each SP. These delays equalize the different MPC-to-SP fiber lengths with 0.5 BX accuracy (Table 1). The procedure is described in detail in Note [5].



Table 1: Alignment FIFO Delays vs Optical Fiber Lengths

Since the TTC fiber lengths to the peripheral CCBs also vary, the system-level synchronization procedure includes appropriate coarse and fine delay settings in the TTCrx ASICs on the peripheral CCB boards and programmable delays in the TMB registers. An efficient test of the global CSC synchronization is possible using the Bunch Crossing Zero (BC0) signal coming from the TTC system. All the CSC trigger boards (TMB, MPC, SP, MS) are transparent to this signal, so isochronous clocking can be checked by comparing the BC0 arrival times at the SP level from the various peripheral crates and from the individual TMB boards in each crate. This test was conducted in May 2009. Each MPC was set into "transparent" mode, in which it can transmit any given LCT1...LCT18 to any specific optical link 1..3 without sorting. In total, 936 individual measurements were made (468 chambers x 2 LCTs per chamber) and the synchronization was verified.

#### V. LINK TESTING, MONITORING AND PERFORMANCE

The simplest data transmission test can be run from the transmitter TLK2501 to the receiver TLK2501 using the embedded  $2^{7}$ -1 Pseudo-Random Bit Stream (PRBS) generators. Within ~15 minutes a bit error rate below  $10^{-12}$  per link can be verified. This test does not involve the transmitter and receiver FPGAs. A more elaborate test transmits test patterns from the output buffer in the MPC (or even in the TMB) and verifies them in the SP spy FIFO.
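The relation between test duration and the error rate that can be excluded is simple counting, sketched below together with a generic PRBS7 generator. The line rate is derived from the 80.16MHz clock and 20 encoded bits per word. The feedback taps of the generator are a common choice for a $2^{7}-1$ sequence and are an assumption, not taken from the TLK2501 datasheet; the statistical limit is the standard zero-error Poisson bound.

```python
import math

LINE_RATE_BPS = 80.16e6 * 20  # serial bit rate derived from the text


def bits_sent(seconds):
    """Number of bits transmitted on one link in the given time."""
    return LINE_RATE_BPS * seconds


def ber_upper_limit(n_bits, confidence=0.95):
    """Upper limit on the BER when zero errors are seen in n_bits."""
    return -math.log(1.0 - confidence) / n_bits


def prbs7(n_bits, seed=0x7F):
    """Generic 7-bit LFSR sequence (assumed taps, period 127)."""
    state = seed & 0x7F
    out = []
    for _ in range(n_bits):
        out.append((state >> 6) & 1)
        feedback = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | feedback) & 0x7F
    return out


if __name__ == "__main__":
    n = bits_sent(15 * 60)
    print(f"bits in 15 min: {n:.3e}")
    print(f"one error would correspond to BER {1 / n:.2e}")
    print(f"95% CL limit with zero errors: {ber_upper_limit(n):.2e}")
```

Fifteen error-free minutes correspond to about $1.4 \times 10^{12}$ bits, so a single error would already mean a rate just below $10^{-12}$; a formal 95% confidence limit from the same run would be about twice as large.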

There are several clock and link status monitorables available from the SP registers via VME. They include the following: SP daughter board and TTCrq "lock" statuses and "Loss of Lock" counters; "signal detect" status of each optical receiver; alignment FIFO "empty flag" and word count; "signal loss", "carrier extend", "error word", "alignment FIFO underflow", "BC0 arrived later/early", "BX mismatch" and "PRBS error" counters; "valid pattern" and "valid track" counters for occupancy monitoring.

Immediately after the "idle" pattern every MPC sends to SP an 8-bit word with its unique board (1..60) and link (1..3) numbers. These numbers are stored in the SP status register and are used as a basic tool to verify the integrity of links.
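A board number up to 60 needs six bits and a link number up to 3 needs two, so both fit in the 8-bit word. The sketch below shows one possible packing and a cabling check built on it. The bit layout (board in the upper six bits, link in the lower two) is a hypothetical choice for illustration; the actual layout used by the MPC firmware is not given in the text.

```python
def pack_mpc_id(board, link):
    """Pack board (1..60) and link (1..3) into one byte (assumed layout)."""
    if not 1 <= board <= 60:
        raise ValueError("board number must be in 1..60")
    if not 1 <= link <= 3:
        raise ValueError("link number must be in 1..3")
    return (board << 2) | link


def unpack_mpc_id(word):
    """Inverse of pack_mpc_id: return (board, link)."""
    return word >> 2, word & 0x3


def find_cabling_errors(expected, received_words):
    """Return indices of SP inputs whose received ID differs from the map.

    expected: list of (board, link) pairs, one per SP input.
    received_words: list of ID bytes read from the SP status registers.
    """
    return [i for i, (exp, word) in enumerate(zip(expected, received_words))
            if unpack_mpc_id(word) != exp]


if __name__ == "__main__":
    expected = [(7, 1), (7, 2), (7, 3)]
    received = [pack_mpc_id(7, 1), pack_mpc_id(7, 3), pack_mpc_id(7, 2)]
    print(find_cabling_errors(expected, received))
```

In the usage example two fibers from the same MPC are swapped, and the check flags exactly those two SP inputs.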

Monitoring procedures include periodic (at present, every 10 seconds) data read out over VME from all the TF boards.

Most relevant quantities (any link errors, "unlocked" and "FIFO full" statuses, real-time trigger rates) are available to shifters and used for alarms. Monitoring data is periodically logged to the local file and Condition Database. An example of the link status display showing three links SP2/F1/M1/M2/M3 in error state is shown in Fig.7.



Figure 7: Link Status Monitoring Display

A long-term study of link behavior using these monitorables allowed us to detect, at an early stage of commissioning, that a random fraction of the optical links accumulated synchronization errors in certain runs. Detailed bench tests confirmed the problem, which was traced back to a minor error in the SP front FPGA firmware: the receiver control signals crossing two clock domains in the FPGA were not handled properly, resulting in occasional synchronization failures. The error was fixed and all 12 SP boards were reprogrammed; since October 2008 we have not seen any synchronization errors. Red alarms in the display above may also indicate other hardware problems, for example a peripheral crate that is not powered or not properly initialized, in which case all three links from a given MPC fail to run properly.

The correlated two-dimensional LCTs are transmitted both to the trigger chain (through the MPC to the TF DDU) and to the DAQ chain (Fig.8). The quality of data transmission via the optical links can therefore be evaluated by comparing the trigger and DAQ data streams.



Figure 8: CSC DAQ and Trigger Readout

We have done two types of data analysis. The first one is a data-to-data comparison of the LCTs in the trigger and main DAQ streams, per event and for all transmitted bits. The study compares the number of LCTs found in the DMB and in the TF DDU for each chamber on an event-by-event basis. The largest class of mismatches, 6% of the total LCTs/event, has no LCTs in the DMB but at least one LCT in the TF DDU, which should never happen. Most of this discrepancy was traced to chambers that were disabled in the main readout but kept in the trigger. In 0.4% of the total LCTs/event there are more LCTs in the main readout than in the trigger. This is expected, because the MPC selects only the three best-quality LCTs out of 18.
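The per-chamber, per-event count comparison can be sketched as a small classifier. This is an illustration of the bookkeeping only; the category names and the input format are hypothetical and do not reflect the actual analysis code.

```python
from collections import Counter


def classify(n_dmb, n_tf):
    """Classify one chamber in one event by its LCT counts.

    n_dmb: LCTs found in the main DAQ (DMB) stream.
    n_tf:  LCTs found in the trigger (TF DDU) stream.
    """
    if n_dmb == n_tf:
        return "match"
    if n_dmb == 0 and n_tf > 0:
        return "trigger_only"   # should never happen for enabled chambers
    if n_dmb > n_tf:
        return "more_in_daq"    # expected: the MPC keeps 3 of up to 18 LCTs
    return "other"


def mismatch_fractions(records):
    """records: iterable of (n_dmb, n_tf) pairs; returns fraction per class."""
    counts = Counter(classify(n_dmb, n_tf) for n_dmb, n_tf in records)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


if __name__ == "__main__":
    sample = [(1, 1), (2, 2), (0, 1), (2, 1), (1, 1)]
    print(mismatch_fractions(sample))
```

Separating the "trigger only" class from the "more in DAQ" class matters because the first points to a configuration problem while the second is a consequence of the MPC sorting.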

The second type of analysis is based on data to emulator comparison for the MPC. The CSC Trigger Primitives emulator simulates the functionalities of the ALCT, CLCT, TMB and MPC processors. Collections of the CSC wire and comparator digis are the inputs to the simulator, and ALCT, CLCT and correlated LCTs before and after the MPC sorting are its outputs. The results of this study are consistent with the previous one.

# VI. CONCLUSION AND FUTURE PLANS

The system of 180 CSC Level 1 trigger optical links has been in operation for more than a year. The receiver firmware was updated several times to fix minor bugs and to improve the ability to monitor link performance. The optical links have been running reliably since autumn 2008. The CSC TF cell of the Trigger Supervisor software gives access to all the libraries needed to control, configure, monitor and test the TF hardware, including the optical links. Several monitoring panels are available to shifters in the control room.

It was essential for successful commissioning at the CMS to have a testing stand in the CMS test area in building 904 at CERN. This stand includes one operational CSC chamber and a full chain of trigger boards, including the TF. This test stand is being used for various hardware, firmware and software checks, debugging and measurements. It is important to maintain such a stand for the lifetime of the experiment, along with simpler stands at the universities involved in hardware and firmware development.

The proposed Super-LHC upgrade, with an increased luminosity of  $10^{35}$  cm<sup>-2</sup>s<sup>-1</sup>, implies higher data volumes to be transmitted through the Trigger and DAQ systems. Preliminary estimates show that the volume of data through the EMU trigger optical links will increase by a factor of 3 to 6, so the present MPC becomes a bottleneck. It is envisaged that the CSC Muon Port Card, Sector Processor and optical links will have to be upgraded to accommodate higher throughput and more complex sorting and track reconstruction algorithms.

## VII. REFERENCES

[1] The CMS Collaboration, The CMS Experiment at the CERN LHC, Journal of Instrumentation 3 (2008) S08004.

[2] <u>http://focus.ti.com/lit/ds/symlink/tlk2501.pdf</u>

[3] TTCrq Manual. Available at: <u>http://proj-qpll.web.cern.ch/proj-qpll/images/manualTTCrq.pdf</u>

[4] <u>http://proj-qpll.web.cern.ch/proj-qpll/</u>

[5] Commissioning of the CSC Track Finder. CMS Internal Note CMS IN-2008/053.

# Upgrade of the Cold Electronics of the ATLAS HEC Calorimeter for sLHC: Generic Studies of Radiation Hardness and Temperature Dependence

A.Rudert, D.Dannheim, A.Fischer, A.Hambarzumjan, H.Oberlack, G.Pospelov, O.Reimann, P.Schacht

Max-Planck-Institut fuer Physik, D-80805 Munich, Germany

on behalf of the HECPAS Collaboration

IEP Kosice, Slovakia; Univ. Montreal, Canada; MPI Munich, Germany; IEAP Prague, Czech Republic; NPI Rez, Czech Republic

ruderta@mpp.mpg.de

# Abstract

The front-end electronics (signal amplification and summation) of the ATLAS Hadronic End-cap Calorimeter (HEC) is operated at the circumference of the HEC calorimeter wheels inside the cryostats in liquid argon (LAr). The present electronics is designed to operate at the irradiation levels expected for the LHC. For operation at the sLHC the irradiation levels are expected to be a factor of ten higher, so a new electronic system might be needed. The technological possibilities have been investigated. For different technologies, generic studies at the transistor level have been carried out to understand the radiation hardness during irradiation up to integrated n fluxes of  $2 \times 10^{16} n/cm^2$  and the behaviour during cool-down to LAr temperatures. An S-parameter technique has been used to monitor the performance during irradiation and cool-down. In addition, DC measurements before and after irradiation have been compared. Results of these investigations are reported. Conclusions are drawn, and the viability of the technologies for the design of the new HEC cold electronics for the sLHC is assessed.

#### I. INTRODUCTION

The LAr system consists of a barrel region and two end-cap / forward regions. As seen from simulation studies, the radiation levels increase with  $|\eta|$ . From the barrel to the endcap and from the endcap to the forward calorimeters, the flux and average energy of the particles from min-bias events increase, with a consequent growth of the multiplicity and density of shower particles. This results in a power density, and hence radiation flux, deposited in the calorimeter reaching levels not seen in previous collider detectors. The ATLAS calorimeters are designed to cope with the highest luminosity of  $\mathcal{L} = 10^{34}$  cm<sup>-2</sup> s<sup>-1</sup> foreseen at the LHC.

Under sLHC conditions both the peak and the integrated luminosity collected over an anticipated sLHC lifetime of ten years will typically increase by a factor of ten. One element which might be affected by the integrated luminosity is the front-end electronics of the HEC, which is located in relatively high radiation fields at the perimeter of the HEC calorimeter wheels. At this position a n fluence of  $0.2 \times 10^{14} n/cm^2$ , a  $\gamma$  dose of 5 kGy and a hadronic fluence of  $1.2 \times 10^{12} p/cm^2$  are expected after ten years of LHC operation at the highest luminosity.

# II. ACTIVE PAD CONCEPT OF THE HEC COLD ELECTRONICS



Figure 1: A HEC wheel fully assembled on the assembly table showing the 'active pad' electronics.

The signal processing of the HEC employs the notion of 'active pads' which keeps the detector capacities at the input of the amplifiers small and thereby achieves a fast rise time of the signal [1]. Short coaxial cables are used to send the signals from the read-out pads to preamplifier and summing boards (PSB) located at the perimeter of the wheels inside the LAr. The lateral pad segmentation is  $\Delta \eta \times \Delta \phi = 0.1 \times 0.1$  up to  $\eta = 2.5$  and  $\Delta \eta \times \Delta \phi = 0.2 \times 0.2$  for higher  $\eta$  while the longitudinal read-out segmentation is fourfold. The pad capacitance varies from 40 to 400 *pF* which yields a rise time variation from 5 to 25 *ns*. The signals from a set of preamplifiers from longitudinally aligned pads (2, 4, or 8 for different regions of the calorimeter) are actively summed inside the chip forming one output signal, which is transmitted to the cryostat feed-through via the PSB's.



Figure 2: Picture of a PSB board.

The PSB's carry the highly integrated amplifier and summing chips in Gallium-Arsenide (GaAs) MESFET technology. The GaAs TriQuint QED-A 1  $\mu m$  technology has been selected for the front-end ASIC because it offers excellent high frequency performance, stable operation at cryogenic temperatures and radiation hardness [2]. The front-end chip consists of 8 identical preamplifiers and two drivers. The summing scheme is implemented with external components and interconnections made on the PSB. Fig.1 shows a fully assembled HEC wheel in the horizontal position on the assembly table with the PSB boards (see Fig.2) at the outer circumference.



Figure 3: The signal amplitude measured after values of n fluence from  $1.5 \times 10^{13}$  to  $9 \times 10^{14} n/cm^2$  for four different detector capacities. Shown is the ratio to the signal amplitude before the irradiation.

It is known that GaAs is a radiation resistant semiconductor. The radiation hardness has been studied at the IBR-2 reactor in Dubna, Russia with a set of pre-production chips. Various types of tests have been performed. Seven chips were exposed to a total fluence of fast n of  $(1.11 \pm 0.15) \times 10^{15} n/cm^2$  and an integrated  $\gamma$  dose of  $(3.5 \pm 0.3) kGy$ . A second set of 8 chips was irradiated with  $\gamma$ 's up to a total dose of  $(55 \pm 8) kGy$  accompanied by a fast n fluence of  $(1.1 \pm 0.2) \times 10^{14} n/cm^2$ .

In these tests the chips were kept in a cryostat filled with liquid nitrogen. The standard set of characteristics, such as the transfer function, rise time, linearity and equivalent noise current (ENI) of the preamplifiers, was measured. The measurements show that the preamplifier characteristics start to degrade when the n fluence exceeds approximately  $3 \times 10^{14} n/cm^2$ . Fig.3 shows the degradation of the amplitude with n irradiation for four different values of the input (detector) capacitance.

Similar measurements with  $\gamma$ - irradiation show that the characteristics stay unchanged up to a dose of at least 50 kGy. Both boundary values are well above the radiation levels expected in the final ATLAS environment at LHC.

In summary, the radiation hardness of the cold HEC electronics against all three types of radiation has been studied and compared to ATLAS requirements. It has been found that n irradiation is by far the most dangerous radiation type yielding the smallest safety margin.

Another important aspect of the cold electronics is the heating of the chips that can finally result in bubbling of the liquid argon. The bubbles propagating to a LAr detector gap can cause high voltage discharges. Therefore the power consumption has to be kept low.

# III. SLHC REQUIREMENTS FOR THE HEC COLD ELECTRONICS



Figure 4: Testboard with IHP transistors with four structures bonded in one ceramic package.

The present ATLAS requirements for the HEC PSB boards have been developed for an LHC design luminosity of  $10^{34}cm^{-2}s^{-1}$ , corresponding to a n fluence of  $2 \cdot 10^{12}n/cm^2$ per year. Assuming 10 years of operation, and given that the preamplifier characteristics start to degrade at approximately  $3 \times 10^{14} n/cm^2$ , this yields a safety margin of ~ 15 for the LHC luminosity. With a ten times higher integrated luminosity at the sLHC the safety factor is essentially eliminated, i.e. the present HEC cold electronics would be operated at its limit. It is therefore planned to develop a new ASIC that will be ten times more radiation hard against n. If needed, the new chip would be used to replace the present GaAs chips at the sLHC. For an upgrade, the PSB boards at the circumference of the HEC wheel would then be replaced by new, pin-compatible PSB boards with more radiation hard IC's. This operation can be done without disassembling the HEC wheels.
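The safety-margin argument is a ratio of two fluences and can be reproduced directly. The sketch below uses the annual fluence quoted above and the degradation threshold of about $3 \times 10^{14} n/cm^2$ from the irradiation tests; the function name and its arguments are illustrative.

```python
ANNUAL_FLUENCE = 2e12         # n/cm^2 per year at LHC design luminosity
DEGRADATION_THRESHOLD = 3e14  # n/cm^2, onset of degradation in the tests


def safety_margin(years=10, luminosity_factor=1.0):
    """Ratio of the degradation threshold to the accumulated n fluence."""
    accumulated = ANNUAL_FLUENCE * years * luminosity_factor
    return DEGRADATION_THRESHOLD / accumulated


if __name__ == "__main__":
    print(f"LHC margin:  {safety_margin():.1f}")
    print(f"sLHC margin: {safety_margin(luminosity_factor=10.0):.1f}")
```

Ten years at the LHC give a margin of about 15; a tenfold increase in integrated luminosity reduces it to about 1.5, which is why a factor of ten in radiation hardness is requested for the new ASIC.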



Figure 5: Dependence of the gain of four IHP bipolar transistors on the n flux.

The requirements for the new IC's are:

- radiation hardness for n up to a factor of 10 better, i.e. up to a fluence of a few 10<sup>15</sup> n/cm<sup>2</sup>;
- low power consumption to stay safely away from the LAr boiling point at the operational Ar pressure and temperature. In consequence, the power consumption should not exceed the present level of < 0.2 W;
- as most of the QC tests have to be done at room temperature, the gain of the preamplifiers and summing amplifiers should not vary by more than a factor of two from room to LAr temperature;
- the noise has to stay low, i.e. it should not exceed the present level of 50 nA with 0 pF input load or 100 nA with 200 pF load at each preamplifier input; the maximum signal for one preamplifier input is 250  $\mu A$ , the dynamic range  $\sim 5 \cdot 10^3$ ;
- as only the full read-out channel, i.e. the summed and not the preamplifier signals, can be electronically calibrated, the gain variation of the individual preamplifiers within the IC has to be below 1 %;
- the IC has to be safe against HV discharges in the gaps of the HEC modules;
- the input impedance has to be  $50 \ \Omega \pm 2 \ \Omega$  to cope with the existing cabling scheme.
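The quantitative items in the list above can be written as a simple acceptance check, and the quoted dynamic range follows from the ratio of maximum signal to noise. This is a minimal sketch: the dictionary keys and the example values are hypothetical, and the requirements on neutron hardness and HV-discharge safety are not covered because they cannot be reduced to a single number here.

```python
def dynamic_range(max_signal_a=250e-6, noise_a=50e-9):
    """Ratio of the maximum input signal to the noise level."""
    return max_signal_a / noise_a


def check_candidate(c):
    """Check hypothetical measurements of a candidate IC against the limits."""
    gain_ratio = c["gain_cold"] / c["gain_warm"]
    return {
        "power": c["power_w"] < 0.2,
        "gain_warm_to_cold": max(gain_ratio, 1.0 / gain_ratio) <= 2.0,
        "noise_0pf": c["eni_0pf_na"] <= 50.0,
        "noise_200pf": c["eni_200pf_na"] <= 100.0,
        "preamp_gain_spread": c["gain_spread_percent"] < 1.0,
        "input_impedance": abs(c["z_in_ohm"] - 50.0) <= 2.0,
    }


if __name__ == "__main__":
    example = {  # invented numbers, for illustration only
        "power_w": 0.15, "gain_warm": 1.0, "gain_cold": 1.6,
        "eni_0pf_na": 45.0, "eni_200pf_na": 110.0,
        "gain_spread_percent": 0.5, "z_in_ohm": 51.0,
    }
    print(f"dynamic range: {dynamic_range():.0f}")
    print(check_candidate(example))
```

The invented example passes every check except the noise limit at 200 pF, showing how a single failing requirement would be reported.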

#### IV. RESULTS OF TECHNOLOGY STUDIES

The radiation hardness against n irradiation has been studied for transistors of SiGe (Table 1), Si and GaAs (Table 2) technologies.

Table 1: SiGe transistors studied for radiation hardness against n irradiation.

| Material   | SiGe           | SiGe                          | SiGe           |
|------------|----------------|-------------------------------|----------------|
| Transistor | Bipolar HBT    | Bipolar HBT                   | Bipolar HBT    |
| Foundry    | IHP            | IBM                           | AMS            |
| Process    | SGB25V, 250 nm | 8WL BiCMOS, 130 nm, MB and HB | BiCMOS, 350 nm |
| Type       | npn            | npn                           | npn            |

Typically four structures have been bonded in one ceramic package, which has been mounted on a small testboard. Up to 8 boards have been aligned in the n beam of the cyclotron at Rez/Prague. Protons of 37 MeV impinging on a D<sub>2</sub>O target generate a n flux of up to  $10^{11} n/cm^2/s$ . The energy spectrum peaks at low energies (1 MeV) with a steep decline towards higher energies. The flux falls off steeply with the distance from the target. The typical integrated flux obtained was of the order of  $2 \times 10^{16} n/cm^2$  for the closest position relative to the D<sub>2</sub>O target. The performance of the transistors has been monitored continuously with a network analyzer recording the full set of S-parameters. In addition, DC parameters (voltages and currents) have been recorded.
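The quoted peak flux and integrated flux together set the scale of the beam time needed. The estimate below assumes, for illustration only, that the closest board sees the peak flux continuously; in practice the flux at the board position and the beam duty cycle will differ.

```python
PEAK_FLUX = 1e11        # n/cm^2/s, maximum flux quoted in the text
TARGET_FLUENCE = 2e16   # n/cm^2, typical integrated flux at the closest slot


def exposure_hours(fluence=TARGET_FLUENCE, flux=PEAK_FLUX):
    """Beam time needed to reach a fluence at constant flux, in hours."""
    return fluence / flux / 3600.0


if __name__ == "__main__":
    print(f"minimum exposure: {exposure_hours():.1f} h")
```

At the peak flux the integrated value corresponds to about 56 hours of continuous beam, which should be read as a lower bound under the stated assumption.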

Fig.4 shows the testboard with IHP transistors with four structures bonded in one ceramic package. Each of the four transistors has an input and an output line connected to the network analyzer via switches.

For the four IHP bipolar transistors, Fig.5 shows the dependence of the gain on the n fluence. The two transistors in slot one, i.e. closest to the  $D_2O$  target, were exposed to a n fluence of  $2.2 \times 10^{16} n/cm^2$ ; the corresponding n fluence for the equivalent transistors located in slot seven was  $\sim 10^{15} n/cm^2$ . The results show that the gain is rather stable in the range required for the sLHC, i.e. up to  $2 \times 10^{15} n/cm^2$ , and that it is independent of the irradiation density.

Table 2: Si and GaAs technologies (transistors) studied for radiation hardness using n irradiation.

| Material   | Si             | Si             | Si             | GaAs           | GaAs    |
|------------|----------------|----------------|----------------|----------------|---------|
| Transistor | CMOS FET       | CMOS FET       | CMOS FET       | FET            | FET     |
| Foundry    | IHP            | IHP            | AMS            | Triquint       | Sirenza |
| Process    | SGB25V, 250 nm | SGB25V, 250 nm | BiCMOS, 350 nm | CFH800, 250 nm | 250 nm  |
| Type       | nmos           | pmos           | nmos           | pHEMT          | pHEMT   |

Table 3: Loss of gain of the transistors studied for a n fluence of  $2 \times 10^{15} n/cm^2$  at two different frequencies.

| Material   | SiGe    | SiGe    | SiGe    | Si       | Si       | Si       | GaAs     | GaAs    |
|------------|---------|---------|---------|----------|----------|----------|----------|---------|
| Transistor | Bipolar | Bipolar | Bipolar | CMOS FET | CMOS FET | CMOS FET | FET      | FET     |
| Foundry    | IHP     | IBM     | AMS     | IHP      | IHP      | AMS      | Triquint | Sirenza |
| Type       | npn     | npn     | npn     | nmos     | pmos     | nmos     | pHEMT    | pHEMT   |
| 10 MHz     | 5%      | 5%      | 5%      | 4%       | 4%       | 3%       | 0%       | 4%      |
| 40 MHz     | 3%      | 2%      | 5%      | 2%       | 3%       | 3%       | 2%       | 2%      |

For two different frequencies Tab. 3 shows the loss of gain for the transistors studied at a n fluence of  $2 \times 10^{15} n/cm^2$ . The errors are dominated by systematic effects and are at the few percent level. We observe that all technologies only show a small degradation of the gain up to the irradiation level expected for sLHC.

Another important aspect is the variation of the gain with temperature. This dependence has been studied for all technologies in the required range down to liquid  $N_2$  temperatures. All bipolar technologies show a strong dependence of the operation point with temperature, i.e. they require a voltage adjustment when going from room to liquid  $N_2$  temperatures. This is different for the FET's where the gain variation is rather small within the temperature range studied.

# V. CONCLUSIONS

Based on these studies, both options, SiGe Bipolar HBT as well as Si CMOS FET, are under further investigation. Presently preamplifiers are being developed for both technologies. The dynamic range and the noise performance are investigated. We plan to irradiate these prototype preamplifiers in cold in the near future. The final technology selection will be based on these results.

#### REFERENCES

- [1] ATLAS Liquid Argon Calorimeter Technical Design Report, CERN/LHCC/96-41 (1996).
- [2] J. Ban et al., Cold electronics for the liquid argon hadronic end-cap calorimeter of ATLAS, Nucl. Instr. and Meth. A556 (2006), 158.

# Radiation hardness studies of a 130 nm Silicon Germanium BiCMOS technology with a dedicated ASIC

S. Díez<sup>b</sup>, M. Wilder<sup>e</sup>, M. Ullán<sup>b</sup>, Y. Tazawa<sup>d</sup>, A. K. Sutton<sup>a</sup>, H. Spieler<sup>g</sup>, E. Spencer<sup>e</sup>, A. Seiden<sup>e</sup>, H.F.-W. Sadrozinski<sup>e</sup>, M. Ruat<sup>b</sup>, S. Rescia<sup>c</sup>, S. Phillips<sup>a</sup>, F. M. Newcomer<sup>d</sup>, G. Mayers<sup>d</sup>, F. Martinez-McKinney<sup>e</sup>, I. Mandić<sup>f</sup>, W. Kononenko<sup>d</sup>, A. A. Grillo<sup>e</sup>, V. Emerson<sup>c</sup>, N. Dressnandt<sup>d</sup>, J. D. Cressler<sup>a</sup>

<sup>a</sup> Georgia Institute of Technology, School of Electrical and Computer Engineering, USA

<sup>b</sup> Centro Nacional de Microelectrónica (CNM-CSIC), Spain

<sup>c</sup> Brookhaven National Laboratory (BNL), USA

<sup>d</sup> The University of Pennsylvania, Physics and Astronomy Department, Philadelphia, PA, USA

<sup>e</sup> Santa Cruz Institute for Particle Physics (SCIPP), University of California Santa Cruz, USA

<sup>f</sup> Jozef Stefan Institute, Slovenia

<sup>g</sup> Lawrence Berkeley National Laboratory (LBNL), Physics Division, USA

sergio.diez@imb-cnm.csic.es

# Abstract

We present the radiation hardness studies on the bipolar devices of the 130 nm 8WL Silicon Germanium (SiGe) BiCMOS technology from IBM. This technology has been proposed as one of the candidates for the Front-End (FE) readout chip of the upgraded Inner Detector (ID) and the Liquid Argon Calorimeter (LAr) of the ATLAS Upgrade experiment. After neutron irradiations, devices remain at acceptable performances at the maximum radiation levels expected in the Si tracker and LAr calorimeter.

#### I. INTRODUCTION

The Large Hadron Collider (LHC) upgrade, the Super-LHC, will imply a luminosity increase of an order of magnitude in the experiment [1]. This means a significant increase in the radiation levels inside the ATLAS detector [2]. Based on the working "strawman" layout for the silicon strip detector of the upgraded ATLAS detector, the current studies predict 30 Mrad(Si) of total ionizing dose (TID) and a 1 MeV neutron equivalent fluence of 9.8 x  $10^{14}$  cm<sup>-2</sup> in the "short-strips" region, and 8.4 Mrad(Si) and 3.5 x  $10^{14}$  cm<sup>-2</sup> in the "long-strips" region, while the radiation levels for the liquid Argon calorimeter (LAr) are expected to be of the order of 300 krad(Si) TID and a total 1 MeV neutron equivalent fluence of 9.6 x  $10^{12}$  cm<sup>-2</sup>. All these numbers include a safety factor of 2.
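For comparison with doses quoted in gray elsewhere, the levels above can be converted using 1 rad = 0.01 Gy, and the factor of 2 can be divided out to recover the underlying predictions. The sketch only restates the numbers from the paragraph; the dictionary and function names are illustrative.

```python
RAD_TO_GY = 0.01
SAFETY_FACTOR = 2.0

# (TID in rad(Si), 1 MeV neutron equivalent fluence in cm^-2), safety factor included
LEVELS = {
    "short strips": (30e6, 9.8e14),
    "long strips": (8.4e6, 3.5e14),
    "LAr": (300e3, 9.6e12),
}


def tid_kgy(region):
    """Total ionizing dose for a region in kGy, safety factor included."""
    tid_rad, _ = LEVELS[region]
    return tid_rad * RAD_TO_GY / 1e3


def without_safety_factor(region):
    """Return (TID in kGy, fluence in cm^-2) with the factor of 2 removed."""
    _, fluence = LEVELS[region]
    return tid_kgy(region) / SAFETY_FACTOR, fluence / SAFETY_FACTOR


if __name__ == "__main__":
    for region in LEVELS:
        print(region, tid_kgy(region), without_safety_factor(region))
```

The short-strips dose of 30 Mrad(Si) corresponds to 300 kGy, or 150 kGy once the safety factor is removed.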

The increased luminosity and the enhanced degradation caused by the new radiation environment will force the complete replacement of the current Inner Detector and of the readout electronics for the Liquid Argon Calorimeter (LAr). One of the technological options for these applications is the use of SiGe BiCMOS technologies. These technologies provide high amplification factors at short shaping times as well as a very low noise vs. power ratio. Nevertheless, their radiation hardness must be validated up to the high radiation levels expected in the ATLAS Upgrade experiment. After previous studies of several SiGe technologies from different foundries, and given the preliminary radiation studies [3], [4], the main option chosen by the SiGe group for this application is the 130 nm 8WL BiCMOS technology from IBM. This technology is easily portable to the 8RF IBM 130 nm CMOS technology, which is the baseline technology for the digital part of the upgraded FE readout chip of the Si Tracker. We present in this work the performance of bipolar devices from the SiGe BiCMOS 8WL technology after neutron radiation exposure, as part of the radiation hardness assurance test program of this technology. Other experiments scheduled in the test program are gamma irradiations, proton irradiations and Enhanced Low Dose Rate Sensitivity (ELDRS) studies. All of them are in progress and will be reported soon. Two prototype FE readout Test Chips (TC) have also been designed and fabricated for the Si Tracker and the LAr calorimeter, and their pre-irradiation results are also reported at this conference [3], [5], [6].

#### II. IBM 8WL SIGE BICMOS TECHNOLOGY

Fig. 1 shows a schematic cross-section of the bipolar transistors of the high-performance 130 nm 8WL SiGe BiCMOS technology (100 / 200 GHz peak $f_T / f_{max}$). Detailed information about the features of the 8WL technology is reported in [7]. In order to study the radiation resistance of the IBM 8WL technology, a dedicated TC with different test structures has been designed and fabricated in this process: the so-called SiGBiT ASIC.



Figure 1: Schematic cross-section of the 8WL SiGe BiCMOS technology

The Silicon-Germanium Bipolar Test chip (SiGBiT) consists of several test structures from the 8WL process. It includes 40 design-kit bipolar transistors of different types, geometries and emitter sizes (18 differential pairs and 4 single transistors), and several resistors of different geometries. All these devices are summarized in Table 1. The SiGBiT ASIC also includes a CMOS test structure ported from the 130 nm 8RF CMOS technology structure designed by the CERN microelectronics group. Figure 2 shows a picture of the chip layout.

Table 1: SiGBiT npn HBTs and resistor inventory.

NPN SiGe bipolar transistors (120 nm emitter width):

| Count | Pair | Single | Type | Emitter length (µm) | Stripes |
|-------|------|--------|------|---------------------|---------|
| 4     | X    |        | HP   | 20                  | 2       |
| 2     | X    |        | HB   | 20                  | 2       |
| 3     | X    |        | HP   | 8                   | 1       |
| 3     | X    |        | HB   | 8                   | 1       |
| 6     | X    |        | HP   | 1                   | 1       |
| 4     |      | X      | HP   | 4                   | 2       |

Resistors:

| Count | Pair | Single | Type | L (µm) | W (µm) | Value (kΩ) |
|-------|------|--------|------|--------|--------|------------|
| 3     |      | X      | PP   | 35     | 6      | 2          |
| 2     | X    |        | PR   | 30     | 3      | 2.3        |
| 2     | X    |        | RR   | 30     | 3      | 17         |





Figure 2: Layout of the SiGBiT test chip. Bipolar parts are located on the upper part and the sides of the chip. 8RF-ported CMOS test structure is located at the bottom of the chip.

Our study will be focused on the bipolar devices configured as differential pairs. Their emitter sizes are 4.8  $\mu$ m<sup>2</sup> for the 20x2 high-performance (HP) and high-breakdown (HB) devices, 0.96  $\mu$ m<sup>2</sup> for the 8x1 HP and HB devices, and 0.12  $\mu$ m<sup>2</sup> for the 1x1 HP devices. There are 2 k $\Omega$  polysilicon resistors placed in series between the base of the transistors and the base pads, in order to avoid high frequency oscillations of the devices during measurements. The value of these resistors to the base of the transistors, although we did not use them in our measurements.

#### III. EXPERIMENT AND MEASUREMENTS

In order to evaluate the displacement damage created in the devices by radiation, we performed neutron irradiations on several test chips. The irradiations were performed at the TRIGA nuclear research reactor of the Jozef Stefan Institute (JSI), Ljubljana, Slovenia. Five fluences were reached: $2\times10^{13}$, $2\times10^{14}$, $6\times10^{14}$, $1\times10^{15}$ and $5\times10^{15}$ cm<sup>-2</sup> (1 MeV n<sub>eq</sub>). In order to minimize the activation of the samples during irradiation, we glued the test chips on bare Si boards with no additional material. Devices were irradiated with all terminals floating. Previous studies performed on other SiGe bipolar transistors showed that the radiation behaviour of the devices does not depend on the bias configuration during neutron irradiation [8]. All irradiations were performed with cadmium (Cd) shielding surrounding the samples to reduce the effect of thermal neutrons [9]. All results shown here correspond to devices that have gone through 15 days of room temperature annealing after irradiation. Nevertheless, room temperature annealing showed no significant effect on the performance of the devices under test.

Forward Gummel plots (FGP) of the transistors were measured before and after irradiation. Measurements were performed on a CASCADE manual probe test bench with an HP4155B semiconductor parameter analyzer. FGPs were obtained in common-emitter configuration, which means sweeping $V_{BE}$ from 0 to 1 V (applying a sweep in $V_E$ from 0 to -1 V and keeping $V_B = V_S = V_C = 0$ V). In order to evaluate the degradation created in the samples by the neutron irradiations, several figures of merit were extracted from the FGPs: the common-emitter current gain of the transistors after irradiation, $\beta_f = I_{Cf}/I_{Bf}$, the change in their reciprocal gain, $\Delta(1/\beta) = 1/\beta_f - 1/\beta_0$, and their normalized current gain, $\beta_N = \beta_f / \beta_0$. All these parameters were extracted at $V_{BE} = 0.75$ V, an arbitrary selection corresponding or close to the injection levels at which these transistors are expected to operate in the real circuits.
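The three figures of merit defined above can be illustrated with a minimal sketch. The function name and the current values below are hypothetical and serve only to show the definitions; they are not measured values from this work.

```python
def gain_figures_of_merit(ic0, ib0, icf, ibf):
    """Return (beta_f, delta_inv_beta, beta_n).

    ic0, ib0: collector and base currents before irradiation.
    icf, ibf: collector and base currents after irradiation.
    All currents are assumed to be read at the same V_BE (0.75 V in the text).
    """
    beta_0 = ic0 / ib0                               # initial current gain
    beta_f = icf / ibf                               # final current gain
    delta_inv_beta = 1.0 / beta_f - 1.0 / beta_0     # change in reciprocal gain
    beta_n = beta_f / beta_0                         # normalized current gain
    return beta_f, delta_inv_beta, beta_n


# Hypothetical currents in amperes, chosen only for illustration.
beta_f, delta_inv_beta, beta_n = gain_figures_of_merit(
    ic0=1.0e-5, ib0=5.0e-8, icf=1.0e-5, ibf=1.0e-7)
print(beta_f, delta_inv_beta, beta_n)
```

In this made-up example the base current doubles while the collector current is unchanged, so the gain halves.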

#### **IV. RESULTS**

The main effect of non-ionizing radiation on the characteristics of a SiGe bipolar transistor is an increase of the base current ($I_B$), which reduces the common-emitter current gain ($\beta_f = I_{Cf} / I_{Bf}$). This effect becomes more important at lower injection levels, as can be seen in Fig. 3 for IBM 8WL 1x1 transistors.



Figure 3: Final current gain ($\beta_f$) versus collector current ($I_C$) for HP 1x1 transistors after different neutron fluences

#### A. Neutron irradiations

Fig. 4 shows the common-emitter current gain ($\beta_f$) of the different transistor types versus the neutron fluence. This parameter illustrates the absolute gain degradation of the devices. As can be observed in the figure, the transistors retain values of $\beta_f$ above 50 at the target values of neutron fluence (~1x10<sup>13</sup> cm<sup>-2</sup> for LAr and ~1x10<sup>15</sup> cm<sup>-2</sup> for the Si Tracker). These final values of $\beta_f$ are within the circuit operation specifications. A very severe degradation of the devices can nevertheless be observed at the highest fluence reached in the experiment (5x10<sup>15</sup> cm<sup>-2</sup>). This fluence value is far beyond the maximum fluence expected in the "short-strips" region of the Si Tracker.



Figure 4: Final current gain  $(\beta_f)$  versus neutron fluence for all transistor types. Filled points correspond to mean values.

The values of the change in reciprocal gain ($\Delta(1/\beta)$) versus neutron fluence are shown in Fig. 5. The figure demonstrates a very clear linear dependence of the radiation damage created in the transistors on the non-ionizing particle fluence, as expected from the literature [10]. The linear fits of the mean values for each type of transistor are also shown in the figure.
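A linear fit of this kind can be sketched with an ordinary least-squares routine. The fluences below are those used in the experiment. The damage constant `k_assumed` and the resulting $\Delta(1/\beta)$ values are hypothetical, and whether the fits in Fig. 5 include an intercept is an assumption of this sketch.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Fluences reached in the experiment (cm^-2, 1 MeV neutron equivalent).
fluences = [2e13, 2e14, 6e14, 1e15, 5e15]

# Hypothetical data generated from an assumed damage constant (cm^2),
# used only to exercise the fit; these are not measured values.
k_assumed = 2.0e-17
delta_inv_beta = [k_assumed * phi for phi in fluences]

slope, intercept = linear_fit(fluences, delta_inv_beta)
print(slope, intercept)
```

With real data, the slope plays the role of the damage constant relating $\Delta(1/\beta)$ to fluence.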



Figure 5: Variation of reciprocal current gain  $(\Delta(1/\beta))$  versus neutron fluence for all transistor types. Filled points correspond to mean values. Linear fits of the mean values are also represented.

Fig. 6 shows the normalized current gain ($\beta_N$) for the different devices under study. This figure of merit is useful for comparing the behaviour under radiation of the different transistor types, as it cancels the dependence of the damage on the value of the initial gain ($\beta_0$), which varies substantially from one transistor type to another. The figure illustrates that the degradation is very similar for all transistor types and geometries studied in this experiment.



Figure 6: Normalized current gain  $(\beta_N)$  versus neutron fluence for all transistor types. Filled points correspond to mean values.

#### B. Transistor damage variability

Preliminary radiation studies performed on bipolar devices from the IBM 8WL technology revealed high variability in the performance of irradiated transistors, especially after neutron irradiation [3]. Such variability could lead to excessive mismatch in the final circuit. At that time, we attributed this effect to possible problems in the test structure, which was not designed by the authors but obtained from the foundry as "spare" pieces. We therefore decided to repeat the experiment with design-kit transistors fabricated within process specifications, as is done in the present study.

To study the variability of the results in this experiment, we calculated the standard deviation ($\sigma$) of the base current after irradiation and normalized it to the mean value of the final base current, that is $\sigma_N = \sigma(I_{Bf})/\langle I_{Bf} \rangle$. This value is shown in Fig. 7 for the different fluences and transistor types.
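As an illustration, the normalized standard deviation can be computed as follows. This is a minimal sketch: the base-current values are hypothetical, and the use of the sample (n-1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def normalized_sigma(ibf_values):
    """Standard deviation of the final base currents divided by their mean."""
    return statistics.stdev(ibf_values) / statistics.mean(ibf_values)


# Hypothetical post-irradiation base currents (A) of nominally identical devices.
ibf = [1.0e-7, 1.2e-7, 0.8e-7, 1.1e-7, 0.9e-7]
print(normalized_sigma(ibf))
```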



Figure 7: Normalized standard deviation  $(\sigma_N)$  of the final base current  $(I_{Bf})$  versus neutron fluence for all transistor types.

The figure shows that the dispersion of the results is smaller than the one observed in the previous experiments, in which values of $\sigma_N$ were above 0.6 in all cases (these values can be calculated from the results reported in [3]). It can also be observed that the variability increases for smaller emitter geometries, as is generally expected in mismatch measurements. A small fluence dependence can also be seen in the figure. We believe this effect may be related to low-probability nuclear interactions of the neutrons with nuclei in the lattice of the devices under study, which produce high damage in the active region of the devices with low statistics. Nevertheless, the variability is not fully understood, and a deeper study of this effect is ongoing. Results from the gamma irradiations in progress, which only produce ionizing damage in the samples, will be of great help in understanding this effect. In any case, current gain values remain above 50 at the target fluence even for the worst-case transistor, as can be observed in Fig. 4.

#### V. CONCLUSIONS

We have performed neutron irradiations on bipolar devices of the 130 nm 8WL Silicon Germanium (SiGe) BiCMOS technology from IBM in order to study its radiation hardness. The devices retained sufficiently good performance at the target fluences expected in the Si tracker and the LAr calorimeter of the ATLAS Upgrade experiment. We observed some variability in the results that is still to be understood. Nevertheless, the transistors remain functional with sufficient performance even in the worst cases.

#### REFERENCES

- [1] F. Gianotti et al. "Physics potential and experimental challenges of the LHC luminosity upgrade," *The European Physical Journal C - Particles and Fields*, vol. 39(3), pp. 293-333, 2005.
- [2] G. Darbo, et al. "Outline of R&D Activities for ATLAS at an Upgraded LHC," CERN document COM-GEN-2005-002, Jan 2005.
- [3] M. Ullán et al. "Evaluation of Silicon-Germanium (SiGe) Bipolar Technologies for Use in an Upgraded ATLAS Detector," *Nuclear Instruments and Methods in Physics Research A*, vol. 604, Issue 3, pp. 668-674, 2009.
- [4] S.Díez et al. "IHP SiGe:C BiCMOS technologies as a suitable backup solution for the ATLAS Upgrade Front-End electronics," *IEEE Trans.* on Nuclear Science, vol 56 (4), pp. 2449-2456, 2009.
- [5] A. Grillo et al. "A Prototype Front-End Readout Chip for Silicon Microstrip Detectors Using an Advanced SiGe Technology," *Topical Workshop on Electronics for Particle Physics 2009 (TWEPP 09)*, Workshop Proceedings, 2009.

- [6] M. Newcomer et al., "A SiGe ASIC Prototype for the ATLAS LAr Calorimeter Front-End Upgrade," *Topical Workshop on Electronics for Particle Physics 2009 (TWEPP 09)*, Workshop Proceedings, 2009.
- [7] J. D. Cressler, A. Sutton, M. Bellini, A. Madan, S. Phillips, A. Appaswamy, T. Cheng. "Radiation Effects in SiGe Devices", *MURI Review*, Vanderbilt University, Nashville, TN, 2008.
- [8] M. Ullán, et al. "Radiation damage of SiGe HBT Technologies at different bias configurations," *Topical Workshop on Electronics for Particle Physics 2008 (TWEPP 2008)*, Workshop Proceedings, 2008.
- [9] I. Mandic et al. "Bulk damage in DMILL npn bipolar transistors caused by thermal neutrons versus protons and fast neutrons," *IEEE Trans. on Nuclear Science*, vol. 51(4), pp. 1752-1758, 2004.
- [10] G. C. Messenger and J. P. Spratt, "The effects of neutron irradiation on germanium and silicon," *Proc. IRE*, vol. 46, pp. 1038–1044, 1958.

# OMEGAPIX: 3D integrated circuit prototype dedicated to the ATLAS upgrade Super LHC pixel project

A. Lounis<sup>a</sup>, C. de La Taille<sup>a</sup>, N. Seguin-Moreau<sup>a</sup>, G. Martin-Chassard<sup>a</sup>, D. Thienpont<sup>a</sup>, Y. Guo<sup>b</sup>

<sup>a</sup> CNRS/IN2P3/LAL, Bât. 200, BP 34, 91 898, Orsay, France <sup>b</sup> CNRS/IN2P3/LPNHE, Paris (VI), France

thienpon@lal.in2p3.fr

### Abstract

In late 2008, an international consortium for development of vertically integrated (3D) readout electronics was created to explore features available from this technology.

In this paper, the OMEGAPIX circuit is presented. It is the first front-end ASIC prototype designed at LAL in 3D technology. It was submitted in May 2009.

First, a short reminder of 3D technology is given. Then the IC design is explained: analogue tier, digital tier and testability.

#### I. 3D CONSORTIUM, MULTI-PROJECT WAFER (MPW) AND PROCESS

The *Handbook of 3D Integration* [1] defines 3D integration as "an emerging, system level integration architecture wherein multiple strata (layers) of planar devices are stacked and interconnected using through silicon (or other semiconductor material) vias (TSV) in the Z direction".

The main expected benefits of this emerging technology for High Energy Physics (HEP) applications are a reduction of the insensitive area (in particular for the pixel sensor), more functionality (several CMOS technologies in the same global device) and a better form factor (less material, smaller device size).

#### A. 3D consortium: a large number of international institutes

In late 2008, Fermilab (U.S.A.) took the initiative of gathering several international laboratories and institutes with an interest in HEP in order to pool resources, investigate options and share costs [2].

Besides Fermilab, this consortium gathers six IN2P3 institutes in France, six Italian institutes, the University of Bonn and the AGH University of Science & Technology in Poland.

An MPW run was submitted in May 2009 for a device with "only" two layers.

#### B. Chartered/Tezzaron 3D process

Among the various available 3D technologies, the process from Tezzaron was chosen. This process is wafer-to-wafer and face-to-face. Since it is a via-first process (TSVs are built at the same time as the transistors), another company has to build the wafers. Tezzaron works with Chartered, which performs the wafer fabrication with TSVs as part of its foundry process.

Chartered provides a 130 nm CMOS technology with various types of transistors: 3p3, 1p5 and 1p5 low Vt. It builds TSVs with a length of 6 $\mu$m and a diameter of 1.2 $\mu$m (see the picture below, where TSVs are called Super-Vias).

Tezzaron then connects the wafers with the Cu-Cu thermocompression bonding technique, which makes both the electrical and the mechanical connections. The alignment between the two wafers is better than 2 $\mu$m. Next, the back side of one wafer is thinned until the TSV contacts are reached: this wafer is then about 12 $\mu$m thick.



Figure 1: Picture from the Tezzaron website showing a three-layer device

In the picture above, the first two wafers from the bottom have been stacked in a face-to-face process (Cu-Cu pads). The back side of one wafer is then thinned until the TSVs are reached, Cu pads are placed on each TSV, and this face becomes a new "front side" for another face-to-face stacking step. In this way, a large number of wafers can be stacked.

#### II. OMEGAPIX DESIGN

The OMEGAPIX circuit embeds 64x24 readout channels that have been developed to match very stringent requirements. The first layer, called the analogue tier, contains the analogue part of the front-end cell together with a block that performs the column selection and the biasing. The other layer, called the digital tier, contains a shift register with read logic in each channel.

#### A. Requirements

Although one of the goals of this first chip is to explore this new technology, both the 130 nm CMOS process from Chartered and the reliability and yield of 3D devices from Tezzaron, the requirements have been chosen to approach the likely future requirements of the ATLAS upgrade Super LHC pixel project.

We therefore want to explore the possibility of reducing the pixel pitch to 50 x 50 $\mu$m. To this end, a readout array matching a new MPI-HLL planar pixel sensor prototype from Munich has been designed.



Figure 2: pixel array sensor prototype

Some specifications are given below:

Channel size: 50 x 50 $\mu$m. The main limitation on the pixel size is currently the area of the readout electronics.

Dissipation: 3 $\mu$W/ch. To keep an equivalent power consumption per unit area after the pixel size shrinks, the power dissipation of each channel has to be lowered drastically. Typically, the consumption should be 2.4 $\mu$W/ch to keep the power density at 96 mW/cm<sup>2</sup>. The power density has been lowered to 80 mW/cm<sup>2</sup> (2 $\mu$W/ch) for the analogue tier and 40 mW/cm<sup>2</sup> (1 $\mu$W/ch) for the digital tier.

Noise: the IC has been designed to reduce the noise to 100 e- and to allow the threshold to be lowered to 1000 e-.
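The relation between the per-channel dissipation and the power density quoted in the dissipation specification can be checked with a short calculation. This is a sketch that assumes each channel occupies exactly one 50 x 50 $\mu$m pixel; the function name is illustrative.

```python
PIXEL_PITCH_UM = 50.0  # pixel pitch taken from the channel-size specification


def power_density_mw_per_cm2(power_per_channel_uw, pitch_um=PIXEL_PITCH_UM):
    """Power density for one channel per square pixel of the given pitch."""
    area_cm2 = (pitch_um * 1e-4) ** 2              # 1 um = 1e-4 cm
    return power_per_channel_uw * 1e-3 / area_cm2  # uW converted to mW


for p_uw in (2.4, 2.0, 1.0):
    print(p_uw, "uW/ch ->", round(power_density_mw_per_cm2(p_uw), 1), "mW/cm2")
```

The three values correspond to the 96, 80 and 40 mW/cm<sup>2</sup> figures given in the specification.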

#### B. Analogue Tier

The analogue channel is divided into three parts: the preamplifier, the shaper with threshold tuning and the discriminator.

The power voltage for all the analogue part, except for the discriminator, is 1.2 V.



Figure 3: analogue one channel schematic

#### 1) Preamplifier description

In order to meet the very low power and small channel area requirements, the design minimizes the overall capacitance.



Figure 4: preamplifier schematic

The parasitic capacitance Cgd acts as the feedback capacitance:

Cf = Cgd ≈ 1.6 fF

The ideal gain is 1/Cf = 100 mV/ke- or 625 mV/fC. In simulation, the gain is about 60 mV/ke- or 375 mV/fC. This lower value is due to the finite open-loop gain of the preamplifier.
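The two equivalent ways of quoting the ideal gain follow directly from the feedback capacitance. A minimal sketch of the unit conversion, with an illustrative function name:

```python
Q_E = 1.602e-19  # elementary charge in coulombs


def ideal_gain(cf_farad):
    """Ideal charge gain 1/Cf, returned as (mV per ke-, mV per fC)."""
    mv_per_ke = 1000.0 * Q_E / cf_farad * 1e3  # 1000 electrons injected
    mv_per_fc = 1e-15 / cf_farad * 1e3         # 1 fC injected
    return mv_per_ke, mv_per_fc


mv_per_ke, mv_per_fc = ideal_gain(1.6e-15)  # Cf = 1.6 fF
print(round(mv_per_ke, 1), "mV/ke-,", round(mv_per_fc, 1), "mV/fC")
```

The same factor of about 0.16 fC per 1000 electrons links the simulated 60 mV/ke- and 375 mV/fC values.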

The bias currents are Ib1 = 100 pA, Ib2 = 2 nA and Ib3 = 1 $\mu$A. A paraphase structure, equivalent to a non-inverting common-source stage, has been used to fix the DC points; its transconductance, gm1·gm2/(gm1+gm2), depends on the current.

$$\text{Rf} = \text{Req} \approx 180 \text{ M}\Omega \text{ if Ib1} = 100 \text{ pA}$$
$$\text{Rf} = \text{Req} = 74 \text{ M}\Omega \text{ if Ib1} = 1 \text{ nA}$$

#### 2) Shaper description

The shaper has almost the same structure as the preamplifier, with capacitive coupling, but it also has an additional variable gain and a 5-bit DAC to adjust the DC output level and thus the threshold.



Figure 5: shaper schematic

The bias currents are Ib1 = 2.5 nA, Ib2 = 5 nA and Ib3 = 60 nA.

The variable gain consists of four different NMOS transistors in parallel that can be switched in or out, which varies the overall Cgd value. The gain thus varies from 172 mV/ke- to 487 mV/ke-, or from 1.075 V/fC to 3 V/fC.

The DAC fixes the output DC voltage.

#### 3) The 5-bit DAC

Since the high-resistance poly option from Chartered was not taken, the DAC had to be designed with transistors only.



Figure 6: 5-bit DAC schematic

The principle of this DAC is unusual: two sets of diodes have been designed so that their equivalent impedances differ, and four current sources can be selected to vary the current. One bit selects the diode to be used; the other four bits adjust the current that flows through the selected diode. The DAC output can vary from 460 mV to 850 mV, which is sufficient for tuning the threshold.

#### 4) Discriminator description

The discriminator consists of three inverters. In principle, this block should be placed in the digital tier to minimize the bulk coupling between the discriminator and the preamplifier input. This will be done in a future circuit.



Figure 7: outputs after the three inverters

#### 5) Dedicated test chip

This circuit has been designed so that the signals are easy to test and measure.

Three probes have been added to each analogue channel, after the preamplifier, the shaper and the discriminator, to observe the signals with an oscilloscope.

Several column types have been designed, allowing us to study various flavours of transistor types (normal, low Vt, 3p3), noise, oscillations, etc.

- ✓ Column 1 to 10: reference channels
- ✓ Column 11 to 18: various preamplifier transistor types have been integrated
- ✓ Column 19 to 22: without variable gain
- ✓ Column 23: discriminator has been removed
- ✓ Column 24: shaper has been removed



Figure 8: Slow Control in the analogue tier

Three shift registers for the slow control have been implemented. The first configures each analogue channel: test capacitance, DAC, variable gain, masked discriminator output and the three probes. There are 14 bits for each channel and, with 1536 channels, this shift register holds 21504 bits of slow control.

Another shift register has been implemented to configure the Select Column block: one bit to power the column on or off, and another bit to select the column that contains the channel with the selected probes. This shift register holds 48 bits of slow control.
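The register lengths quoted above follow from the array dimensions. The short check below assumes that the 64x24 array is organised as 24 columns of 64 channels, which is consistent with the 24 column types listed earlier; the constant names are illustrative.

```python
N_COLUMNS = 24            # columns 1 to 24 in the column-type list
N_ROWS = 64               # assumed number of channels per column
BITS_PER_CHANNEL = 14     # slow-control bits per analogue channel
BITS_PER_COLUMN = 2       # power on/off bit + probe column select bit

n_channels = N_COLUMNS * N_ROWS
channel_register_bits = n_channels * BITS_PER_CHANNEL
column_register_bits = N_COLUMNS * BITS_PER_COLUMN

print(n_channels, channel_register_bits, column_register_bits)
# prints: 1536 21504 48
```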

#### 6) Simulations

At present, only simulation results are available.



Figure 9: simulation of the analogue channel

A very high gain, up to 3 V/fC, can be obtained after the shaper. The simulated rms noise is 16.2 mV, or 46 e-, which gives S/N = 21.

The figure below shows the linearity of the Time Over Threshold (TOT) for various injected charges.



Figure 10: TOT for different injected charge values

The TOT linearity is limited because the shaper output saturates rapidly and oscillations can be observed, which introduces errors in the effective time over threshold. The shaper has been tuned for a small injected-charge threshold, 1000 electrons or 0.16 fC, but the typical injected charge will be significantly different with a sensor thickness of about 200 $\mu$m or about 75 $\mu$m.

#### C. Digital Tier

For the digital tier, the supply voltage was fixed to 1 V.

This tier is divided into three parts: an RS flip-flop, a shift register of 24 D flip-flops and a readout structure in each digital channel, placed just above the corresponding analogue channel.

The digital tier has been designed only to read out the pulse coming from the discriminator and to create digital noise.

One of the most important goals will be to examine the coupling between the two tiers: creating activity in the digital tier will allow us to observe the behaviour of one layer while the neighbouring layer is working.



Figure 11: digital channel schematic

A shift register has been implemented in the digital tier to select the channel to be read. This shift register is 1536 bits long.

#### D. Power consumption

The power consumption for one channel, in simulation, is about 1.75  $\mu$ W/ch, below the requirement.

#### III. TEST BOARD

A test board has been designed with specific firmware to control the chip I/O. LabVIEW software manages the board.

It is possible to observe and measure the influence of coupling between the digital tier and the analogue tier.

The three probes allow us to observe the signals after the preamplifier, the shaper and the discriminator by oscilloscope.

To characterize the discriminators, S-curve measurements will be made.

### **IV. REFERENCES**

[1]: *Handbook of 3D Integration*, Technology and Applications of 3D Integrated Circuits, edited by Philip Garrou, Christopher Bower and Peter Ramm.

[2]: website of 3DIC at Fermilab, http://3dic.fnal.gov

# Design and measurements of 10 bit pipeline ADC for the Luminosity Detector at ILC

Marek Idzik<sup>a</sup>, Krzysztof Swientek<sup>a</sup>, Szymon Kulis<sup>a</sup>

<sup>a</sup> AGH University of Science and Technology, Faculty of Physics and Applied Computer Science, al. Mickiewicza 30, 30-059 Krakow, Poland

swientek@agh.edu.pl

## Abstract

The design and the preliminary measurements of a prototype 10-bit pipeline ADC based on a 1.5-bit per stage architecture, developed for the luminosity detector at the International Linear Collider (ILC), are presented. The ADC is designed in two versions, with and without a sample-and-hold circuit (S/H) at the input. The prototypes are fabricated in 0.35 $\mu$m CMOS technology. A dedicated test setup with a fast FPGA-based data acquisition system (DAQ) has been developed for the ADC testing. Measurements of static (INL, DNL) and dynamic parameters are performed to understand and quantify the circuit performance. The integral (INL) and differential (DNL) nonlinearities are below 1 LSB and 0.5 LSB respectively. The dynamic measurements show a signal to non-harmonic ratio (SNHR) of about 58 dB for sampling frequencies up to 25 MHz.

#### I. INTRODUCTION

Dedicated multichannel readout electronics are needed for the operation of the luminosity detector (LumiCal) [1] at the future ILC collider [2, 3]. The energy deposited in a silicon sensor, detected and amplified in the front-end electronics, needs to be digitised and registered for further analysis. The precision required for the measurement of the deposited energy was studied in simulations and was found to be about 10 bits [4]. Considering the number of detector channels needed ($\sim$200,000) and the limitations on area and power, the optimal choice for the analog to digital conversion seems to be a dedicated multichannel ADC.

Two schemes of analog to digital conversion are presently under study: one relatively slow ADC for each front-end channel, and one faster ADC per group of (about) 8 channels. The first option would be the simplest solution from the designer's point of view, while the second would save chip area. The first option requires an ADC with a sampling rate of around 3 Msample/s, while the second requires a sampling rate of about 24 Msample/s. One of the most efficient architectures, offering a good compromise between speed, area and power consumption, is the pipeline ADC [5, 6, 7]. This architecture was chosen for the LumiCal data conversion. Since in the ILC experiment each 1 ms long active beam time will be followed by a 200 ms pause [8], the requirements on the power dissipation of the readout electronics may be strongly relaxed if the power is switched off during the pause.
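The numbers in this paragraph can be related by a short calculation. The sketch below is idealised: it assumes the ADC draws no power during the pause and ignores any time needed to switch the power on and off, neither of which is stated in the text.

```python
RATE_PER_CHANNEL_MSPS = 3.0   # one ADC per front-end channel
CHANNELS_PER_ADC = 8          # shared-ADC option
ACTIVE_MS = 1.0               # active beam time
PAUSE_MS = 200.0              # pause between active periods

# Sampling rate needed when one ADC serves a group of channels.
shared_rate_msps = RATE_PER_CHANNEL_MSPS * CHANNELS_PER_ADC

# Fraction of time the ADC must be powered under ideal power cycling.
duty_cycle = ACTIVE_MS / (ACTIVE_MS + PAUSE_MS)

print(shared_rate_msps, "Msample/s")
print(round(100.0 * duty_cycle, 2), "% duty cycle")
```

Under these assumptions the average power would scale with the duty cycle, which is about half a percent.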

#### II. DESIGN

The pipeline ADC is built of a number of serially connected converting stages, as shown in fig. 1. In this work an architecture with 1.5-bit stages is chosen because of its simplicity and immunity to the offsets in the comparator and amplifier circuits [5]. Each 1.5-bit stage generates only three different values, coded on 2 output bits. These bits are sent to a digital correction block, where 18 input bits from 9 stages are combined into 10 bits of ADC output.
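The way nine 1.5-bit decisions combine into one output code, and why comparator offsets are tolerated, can be illustrated with a behavioural model. This is a textbook-style sketch and not the authors' implementation: the comparator thresholds at ±Vref/4, the residue equation, the shift-and-add correction and the single `offset` parameter applied to both comparators are all assumptions made for illustration.

```python
def stage_1p5bit(vin, vref=1.0, offset=0.0):
    """Ideal 1.5-bit stage: returns (d, residue) with d in {0, 1, 2}.

    d is the 2-bit stage output (00, 01 or 10). `offset` shifts both
    comparator thresholds to mimic a comparator offset error.
    """
    if vin > vref / 4 + offset:
        d = 2
    elif vin < -vref / 4 + offset:
        d = 0
    else:
        d = 1
    residue = 2.0 * vin - (d - 1) * vref  # gain of 2 minus the stage DAC level
    return d, residue


def pipeline_adc(vin, n_stages=9, vref=1.0, offset=0.0):
    """Combine the stage outputs by shifting and adding with one bit of overlap."""
    code = 0
    for _ in range(n_stages):
        d, vin = stage_1p5bit(vin, vref, offset)
        code = 2 * code + d
    return code  # 0..1022 for 9 stages, which fits in 10 bits


print(pipeline_adc(-1.0), pipeline_adc(0.0), pipeline_adc(1.0))
# prints: 0 511 1022
```

In this model, a comparator offset smaller than Vref/4 changes individual stage decisions but keeps every residue within ±Vref, so the final code stays within one LSB of the ideal value.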



Figure 1: Pipeline ADC architecture

The block diagram of a fully differential single stage is shown in fig. 2. Each 1.5-bit stage consists of two comparators, two pairs of capacitors $C_s$ and $C_f$, an operational transconductance amplifier, several switches and a small digital logic circuit. The stage gain of 2 is obtained by setting $C_s = C_f$. Since the chosen ADC architecture places very relaxed requirements on the comparator thresholds (~100 mV precision), the comparators are designed as simple dynamic latches.



Figure 2: Simplified schematic of a 1.5-bit stage

A critical block of the pipeline ADC is the differential amplifier. A telescopic cascode amplifier configuration is used here. It is the most efficient solution in terms of speed versus power when compared with commonly used configurations such as the folded cascode and the two-stage amplifier. Such a solution is possible because the considered technology, with its relatively high 3.3 V supply voltage, leaves enough headroom for the signal dynamic range, which would otherwise be a weak point of the telescopic configuration. In order to obtain high enough gain in a single-stage amplifier, a gain-boosting scheme is implemented [9, 10].

To allow power saving during the beam pause, the prototype features switches for the clock and the analog power. One can turn off the biasing currents in the differential amplifier and stop the clock in the digital blocks.

Since it has not yet been decided whether the S/H circuit will be implemented in the front-end channel or in the ADC itself, there are two versions of the ADC prototype, with and without S/H circuits.

#### **III. MEASUREMENT SETUP**

A number of specific requirements need to be fulfilled for an efficient and complete ADC testing. The most important are:

- availability of differential input signal generator
- input signal and reference voltages precision better than ADC resolution
- wide frequency range of external sampling clock and sine wave input source (for dynamic testing)
- for the ADC without a sample-and-hold input stage, the input sine waveform should be step-like to allow dynamic testing
- acquisition system able to record the digitised ADC data with the rate exceeding the ADC sampling frequency
- possibility to perform automatic scans over input signal amplitude, frequency, sampling rate etc.

The block diagram of a dedicated FPGA-based test setup fulfilling the above requirements is shown in fig. 3. The setup is controlled from a PC through GPIB and USB interfaces. The input signal and the clock are generated by a Tektronix AWG2021 Arbitrary Waveform Generator. Since this 12-bit generator produces a single-ended signal, a conversion to differential is needed. It is done by a dedicated circuit based on a fast differential amplifier (THS4505). For measurements with static signals the AWG2021 generates a slow voltage ramp over the full ADC range. For dynamic measurements of the ADC without the S/H input stage the same device generates a sine-like step signal.

The reference voltages are generated with the Agilent B1500A Semiconductor Device Analyser. Both the AWG2021 and the B1500A are controlled through the GPIB interface. Such interfacing allows the implementation of automatic scans over all AWG2021 and B1500A parameters (amplitude, frequency, biasing, etc.) and so to determine their effect on the overall circuit performance.

The core of the setup is the FPGA DAQ system, built using an Altium Nanoboard with a Xilinx Spartan-IIE FPGA. Since the data acquisition should be fast, the control logic block, which reads the data lines from the ADC and stores the data in the memory available on the Nanoboard, is written in Verilog HDL. The 8051 microcontroller (IP core) with dedicated firmware, on the other hand, manages the communication between the Nanoboard and the PC. Since there is no requirement for a very high transmission rate, the 8051 works at a lower frequency than the logic block. Several simple high-level commands are sent from the PC to the microcontroller through its UART port to control the behaviour of the DAQ. A UART-to-USB converter is used between the Nanoboard and the computer to improve the flexibility of the system. The commands received by the microcontroller configure the logic block and start the data acquisition and the data readout.

It was verified experimentally that the described configuration allows acquisition of the data with a sampling frequency up to 100 MHz.

#### **IV. PRELIMINARY MEASUREMENTS**

The ADC prototypes are fabricated in a 0.35  $\mu$ m four-metal two-poly CMOS technology. The photograph of the ASIC containing ADC channels with and without the S/H stage is shown in fig. 4. The size of the chip is  $2700 \times 3800 \ \mu$ m; it encompasses six ADC prototypes, small control logic and pads.



Figure 3: Diagram of a complete test system



Figure 4: Photograph of ADC ASIC



Figure 6: INL and DNL for ADC with sample-and-hold

#### A. Static measurements

Static measurements are performed for the input voltage ramped in the range from -1 V to 1 V. The measured ADC transfer function is shown in figure 5. It is seen that the ASIC is fully functional and linear in first approximation.



Figure 5: ADC output codes vs input voltage; single measurement typical result

To eliminate noise each data point is measured several hundred times. The magnification in the upper left corner of the figure 5 shows the most probable value (mode) calculated in each point.
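
The mode-based noise rejection described above can be sketched in a few lines. This is only an illustration: the function name and the sample readings are hypothetical and do not come from the measurement software.

```python
from collections import Counter


def most_probable_code(readings):
    """Return the mode (most frequent value) of repeated ADC readings."""
    counts = Counter(readings)
    # most_common(1) returns a one-element list [(value, count)]
    return counts.most_common(1)[0][0]


# Hypothetical repeated readings taken at a single input voltage
readings = [511, 512, 512, 513, 512, 511, 512]
print(most_probable_code(readings))
```

Taking the mode rather than the mean keeps the result on the integer code grid, which is convenient when plotting a transfer function.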

The differential (DNL) and integral (INL) nonlinearities are computed using the histogramming method [11]. The results for the ADC with and without sample-and-hold circuits are shown respectively in figure 6 and figure 7.
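
As an illustration of the histogramming idea for a linear ramp input, the sketch below estimates DNL and INL from the number of hits per output code. It is a simplified version under stated assumptions (uniform ramp, end codes discarded, INL taken as the running sum of DNL); the function name is hypothetical and the procedure in the standard [11] contains more detail.

```python
def dnl_inl_from_histogram(codes, n_bits):
    """Estimate DNL and INL (in LSB) from ADC codes recorded on a linear ramp."""
    n_codes = 1 << n_bits
    hist = [0] * n_codes
    for c in codes:
        hist[c] += 1
    # The first and last codes also collect over-range samples, so drop them
    inner = hist[1:-1]
    ideal = sum(inner) / len(inner)  # expected hits per code for an ideal ADC
    dnl = [h / ideal - 1.0 for h in inner]
    inl, running = [], 0.0
    for d in dnl:
        running += d  # INL as the cumulative sum of DNL
        inl.append(running)
    return dnl, inl


# Hypothetical 3-bit example: code 3 is 50% too wide, code 4 is 50% too narrow
hits_per_code = [1000, 1000, 1000, 1500, 500, 1000, 1000, 1000]
codes = [c for c, n in enumerate(hits_per_code) for _ in range(n)]
dnl, inl = dnl_inl_from_histogram(codes, 3)
print(dnl, inl)
```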



Figure 7: INL and DNL for ADC without sample-and-hold

It is seen that both versions show good linearity, i.e. the DNL always stays below 0.5 LSB and the INL is significantly below 1 LSB. The ENOB computed from the INL curve is 9.71 in the first case and 9.78 in the second one.

#### B. Dynamic measurements

The dynamic measurements are performed by applying a sine signal to the ADC input and measuring the frequency spectrum distribution at the output. A typical FFT spectrum distribution is shown in figure 8. It is seen that the highest harmonic components are well below 70 dB and the noise level is significantly lower at about 90 dB.



Figure 8: Sample FFT of 40 kHz signal at 30 Msps

From the obtained FFT spectrum important dynamic parameters were calculated. In particular the signal-to-noise performance was studied. The signal-to-noise ratio without harmonics (SNHR) as a function of sampling frequency is shown in figure 9. It is seen that SNHR = 58 dB up to around 25 MHz and then starts to decrease. The harmonic parameters (THD, SFDR) are not presented here since it was found that the AWG2021 itself generates spurious harmonics at the 40 dB level. To check the level of harmonics, only a few measurements were done using the Agilent 33220A arbitrary waveform generator, confirming that the harmonic components are below 70 dB.
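
A signal-to-noise ratio without harmonics can be computed from a spectrum roughly as sketched below. This is not the analysis code used for the measurements: the record length, the signal bin, the number of excluded harmonics and the ideal 10-bit quantizer used as test input are all assumptions chosen for illustration, and a plain DFT is used instead of an FFT library.

```python
import cmath
import math


def dft_power(samples, k):
    """Power |X[k]|^2 of a single DFT bin, computed directly."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
              for i, x in enumerate(samples))
    return abs(acc) ** 2


def snhr_db(samples, signal_bin, n_harmonics=10):
    """Signal power over noise power, with DC and harmonics 2..n_harmonics excluded."""
    n = len(samples)
    excluded = set()
    for h in range(2, n_harmonics + 1):
        b = (h * signal_bin) % n
        excluded.add(min(b, n - b))  # harmonic position after aliasing
    signal = dft_power(samples, signal_bin)
    noise = sum(dft_power(samples, k) for k in range(1, n // 2)
                if k != signal_bin and k not in excluded)
    return 10 * math.log10(signal / noise)


# Assumed test input: coherently sampled full-scale sine, ideal 10-bit quantizer
N, M, BITS = 1024, 37, 10
full_scale = (1 << BITS) - 1
samples = [round(full_scale / 2 * (1 + math.sin(2 * math.pi * M * i / N)))
           for i in range(N)]
print(round(snhr_db(samples, M), 1))
```

For an ideal quantizer the result should sit near the theoretical quantization limit of a 10-bit converter, which gives a reference point for the measured 58 dB.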



#### V. POWER CONSUMPTION

The first power consumption measurements were done at 30 MHz sampling frequency. For the ADC containing the S/H stage the analog and digital currents are 8.6 mA and 6.2 mA respectively. For the version without S/H the same currents are 7.1 mA and 5.5 mA respectively. The power may be reduced when a lower sampling frequency is used.

#### VI. SUMMARY

A 10 bit pipeline ADC was designed, produced and found fully functional. Preliminary static measurements show maximum DNL and INL of about 0.43 LSB and 0.64 LSB respectively. The dynamic signal-to-noise ratio measurements give around 58 dB. The performance measurements confirmed a resolution close to 9.5 bits. More detailed dynamic measurements are needed to study the harmonics better. The power-saving features also remain to be tested.

#### VII. ACKNOWLEDGEMENTS

This work was partially supported by the Commission of the European Communities under the  $6^{th}$  Framework Programme "Structuring the European Research Area", contract number RII3-026126. It was also supported in part by the Polish Ministry of Science and Higher Education under contract nr 372/6.PRUE/2007/7.

#### REFERENCES

- [1] H. Abramowicz et al., Instrumentation of the very forward region of a linear collider detector. IEEE Trans. Nucl. Sci., vol. 51, pp. 2983-2989, Dec. 2004.
- [2] M. Idzik, et al., The Concept of LumiCal Readout Electronics. EUDET-Memo-2007-13, 2007. http://www.eudet.org/e26/e28/e182/e281/eudet-memo-2007-13.pdf
- [3] M. Idzik, K. Swientek, Sz. Kulis, Development of pipeline ADC for the luminosity detector at ILC. Proceedings of 15th International Conference on Mixed Design of Integrated Circuits and Systems, MIXDES 2008, 19-21 June 2008, pp. 231–236, 2008.
- [4] H. Abramowicz, R. Ingbir, S. Kananov, A. Levy, I. Sadeh, GEANT4 Simulation of the Electronic Readout Constraints for the Luminosity Detector of the ILC. EUDET-Memo-2007-17, 2007. http://www.eudet.org/e26/e28/e182/e308/eudet-memo-2007-17.pdf
- [5] T. B. Cho, P. Gray, A 10 b, 20 Msample/s, 35 mW pipeline A/D converter. IEEE J. Solid-State Circuits, 30, 166–172, 1995.
- [6] F. Maloberti, F. Francesconi, et al., Design considerations on low-voltage low-power data converters. IEEE Trans. Circuits Syst. I, 42, 853–863, 1995.
- [7] I. Mehr, and J. Signer, A 55-mW, 10-bit, 40-Msample/s Nyquist-rate CMOS ADC. IEEE J. Solid-State Circuits, 35, 318–325, 2000.
- [8] T. Behnke, S. Bertolucci, R. D. Heuer, R. Settles, TESLA Technical Design Report, PART IV, A Detector for TESLA. 2001
- [9] K. Blut, and G. Gleen, A fast-settling CMOS op amp for SC circuits with 90-dB DC gain. IEEE J. Solid-State Circuits, 25(6), 1379–1384, 1990.
- [10] K. Gulati, and H-S Lee, A high-swing CMOS telescopic operational amplifier. IEEE J. Solid-State Circuits, 33, 2010–2019, 1998.
- [11] IEEE standard for terminology and test methods for analog-to-digital converters. IEEE-STD-1241, 2000.

# Hugo França-Santos <sup>a</sup>

<sup>a</sup>CERN, 1211 Geneva 23, Switzerland

hugo.franca.santos@cern.ch

# Abstract

This paper presents a 10-bit analogue to digital converter (ADC) that will be integrated in a general purpose charge readout ASIC that is the new generation of mixed-mode integrated circuits for Time Projection Chamber (TPC) readout. It is based on a pipelined structure with double sampling and was implemented with switched capacitor circuits in eight 1.5-bit stages followed by a 2-bit stage. The power consumption is adjustable with the conversion rate and varies between 15 and 34mW for a 15 to 40MS/s conversion speed. The ADC occupies a silicon area of 0.7mm<sup>2</sup> in a 0.13 $\mu$ m CMOS process and operates from a single 1.5V supply.

#### I. INTRODUCTION

Time Projection Chambers (TPCs) are among the most widespread particle detectors in high energy physics. The largest TPC to date (88 m<sup>3</sup> in volume) is the core of "*A Large Ion Collider Experiment*" (ALICE) [1] built at CERN for the "*Large Hadron Collider*" (LHC) particle accelerator. Future planned TPCs (e.g. LCTPC, CLIC and Panda) entail even higher spatial resolution in larger gas volumes and hence require readout electronics with unprecedented density, low power and low mass. The state-of-the-art front-end electronics for TPCs is the one developed specifically for the ALICE TPC. It is based on two ASICs, the PASA and the ALTRO [2], and is the groundwork for further technical improvements that will lead to a new generation of readout electronics that fully integrates low-noise amplifiers, analog-to-digital converters (ADCs) and digital signal processing in a single chip.

The ADC presented in this paper is one of the components of a general purpose charge readout chip that is being developed at CERN and meets these requirements providing at the same time flexibility for covering most of the upcoming TPC facilities. It offers adequate features in terms of speed and resolution with a reasonable power consumption and die area; therefore it is suited for an ASIC that incorporates 16 to 32 channels.

## II. ADC ARCHITECTURE

The pipelined analogue-to-digital conversion architecture is the one that best suits the constraints of this System-on-Chip (SoC) and is the preferred architecture for most applications that require ADCs with resolutions between 10 and 14 bits at speeds up to 300MS/s. The break-up of the conversion process, combined with several circuit techniques, enables the implementation of a high-performing structure with relatively little hardware.

#### A. Pipelined ADC Block Diagram

The pipelined ADC distributes the quantization along an analogue processing chain with multiple stages. Each stage resolves part of the information in the sampled signal, subtracts it, and passes the residue to the following stage until the last one, which contains only the sub-ADC function. The outputs of this series of high-speed low-resolution conversion stages are combined afterwards to achieve a high-speed high-resolution ADC.



Figure 1: Pipelined ADC architecture

An arrangement of eight 1.5-bit plus one 2-bit stage was chosen for this design since it offers good trade-offs for the speed and resolution required.
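
The behaviour of such a chain can be sketched with an idealized model. The code below is not the implemented circuit: it assumes an input normalized to the reference voltage, ideal residue amplifiers with a gain of exactly two, and symmetric sub-ADC thresholds given by the hypothetical parameter `vth`.

```python
import math


def pipeline_adc(x, vth=0.25, n_stages=8):
    """Convert x in [-1, 1) (in units of V_REF) to a 10-bit code.

    vth is the magnitude of the two sub-ADC thresholds, in units of V_REF.
    """
    code = 0
    for _ in range(n_stages):
        # 1.5-bit sub-ADC: one of three decisions
        if x > vth:
            d = 1
        elif x < -vth:
            d = -1
        else:
            d = 0
        # Stage code (d + 1) is added with a one-bit overlap
        code = 2 * code + (d + 1)
        # MDAC: subtract the decision and amplify the residue by two
        x = 2 * x - d
    # Final 2-bit flash on the last residue, thresholds at -1/2, 0 and +1/2
    flash = min(3, max(0, math.floor(2 * x) + 2))
    return 2 * code + flash


for x in (-0.9, -0.1, 0.3, 0.7):
    # The output does not depend on where the thresholds sit inside the allowed range
    print(x, pipeline_adc(x, vth=0.25), pipeline_adc(x, vth=0.375))
```

In this ideal model the output code is the same for any threshold between 0 and half the reference, which is the redundancy that lets the sub-ADC comparators be imprecise.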

#### B. Double Sampling

Between 90 and 95% of the power consumption of the pipelined ADC goes to the OTAs; therefore these are the most important circuits to improve in terms of power efficiency. In a standard pipelined configuration, at any given time half of the stages are in the sampling phase and the other half in the multiplication phase. Hence only half of the amplifiers are actually in use at once, although all of them consume power. Several modifications to the standard multiplying digital-to-analog converter (MDAC) that fully exploit the OTAs exist; the one used in this work is the double sampling technique, first introduced in the 1980s by Choi and Brodersen [3]. It consists of duplicating the switched-capacitor circuitry, allowing the sampling and multiplication operations to execute in parallel as shown in Figure 2.

This circuit has greater power efficiency since it allows the OTA bandwidth to be halved, but it also suffers from several drawbacks. It occupies more die area, given that it has twice the number of capacitors, which are relatively big for matching reasons. It also has a memory effect that arises from the suppression of the OTA reset phase: a fraction of each sample remains stored in the parasitic capacitance of the OTA input, due to finite gain and incomplete settling [4], and is added to the next sample. This error is negligible if the OTA has a considerably higher gain than the minimum required for the corresponding ADC resolution, which is the case in this design.



Figure 2: A double sampling MDAC

Another potential problem of double sampling is the gain error that may arise from mismatches between the ratios  $C_{12}/C_{11}$  and  $C_{22}/C_{21}$ . An additional problem is a different offset between even and odd samples, caused by a mismatch in the charge injection of the switches  $\Phi_0$  and  $\overline{\Phi}_0$ .

The clocking circuitry that divides the frequency by two and delivers it to the sampling switches is very likely to introduce a different timing skew to the parallel track-and-hold (T/H) switches. When the input is a sine wave the error turns out to be a tone at the frequency  $F_S/2 - F_{IN}$  [5]. In this design, a change to the sampling circuit removed this problem [6]. The idea was to introduce a new switch that synchronizes the two parallel T/Hs, terminating the sampling phase (turning off) just before the switches  $\Phi_1$  or  $\overline{\Phi}_1$ , depending on the phase being odd or even.
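
The position of the skew-induced tone can be checked numerically. The sketch below is an idealized model, not the circuit: a unit sine is sampled by two interleaved paths, the odd path being delayed by a hypothetical `skew` expressed in sample periods, and the strongest spurious DFT bin is located with a direct DFT.

```python
import cmath
import math


def largest_spur_bin(n, signal_bin, skew):
    """Return the strongest DFT bin in 1 .. n/2 - 1, the signal bin excluded.

    skew is the extra sampling delay of the odd path, in sample periods.
    """
    samples = [math.sin(2 * math.pi * signal_bin * (i + skew * (i % 2)) / n)
               for i in range(n)]

    def power(k):
        return abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                       for i, x in enumerate(samples))) ** 2

    candidates = [k for k in range(1, n // 2) if k != signal_bin]
    return max(candidates, key=power)


N, K = 256, 19  # assumed record length and signal bin
print(largest_spur_bin(N, K, skew=0.01), N // 2 - K)
```

With bin `N/2` corresponding to half the sampling frequency, the spur lands at bin `N/2 - K`, which is the discrete equivalent of the tone frequency quoted above.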

# C. Sub-ADC Threshold Levels

In the 1.5-bit per stage configuration, the redundancy of the sub-ADCs allows the thresholds to be set anywhere in the range  $0 \le \pm V_{TH} \le \pm \frac{1}{2} V_{REF}$ , so typically these thresholds are set to the value that maximizes the error tolerance, in the middle of the allowed range:  $\pm \frac{1}{4}V_{REF}$  [7] [8] [9]. On the other hand, the error introduced by the capacitor mismatch is proportional to the amount of charge transferred between them, as depicted in Figure 3. The figure shows a relationship between the threshold positions and the effect of the capacitor mismatch. The error increases linearly as the input signal deviates from the reference voltages and from the common mode voltage, since there is a greater charge transfer. The value of the thresholds that would minimize the error is  $\pm \frac{1}{2}V_{REF}$ ; however, it would require very accurate comparators and would not tolerate any timing disparity between the triggering of the sub-ADCs and the triggering of the T/H, which exists in this design as explained in section III-D.



Figure 3: Capacitor mismatch effect

In this design the thresholds were set to  $\pm \frac{3}{8}V_{REF}$  because the Monte Carlo simulations showed that an error margin of  $\frac{1}{8}V_{REF}$  was still large enough. In a pipelined ADC with 1.5 bits per stage, the improvement in terms of INL reduction is 12.5% in each stage. Since the capacitor mismatch contribution to the INL is divided by two as the stage number increases, the total reduction of the INL is 24.9% in a 10-bit ADC if there is no stage scaling, or even more if there is a capacitor scaling factor.
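
One way to arrive at the quoted total is to add the per-stage improvement over the eight 1.5-bit stages, halving the weight at each stage. The short calculation below makes that assumption explicit; the function name and the simple additive model are illustrative and are not taken from the design flow.

```python
def total_inl_reduction(per_stage_percent=12.5, n_stages=8):
    """Sum the per-stage INL improvement, halved at each successive stage."""
    return sum(per_stage_percent / 2 ** k for k in range(n_stages))


print(round(total_inl_reduction(), 1))
```

The geometric series approaches 25% as the number of stages grows, so with eight stages the result sits just below that limit.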

#### **III. CMOS IMPLEMENTATION**

The ADC presented here is fully differential, since this offers double the output swing, which is convenient in low-voltage designs, and superior tolerance to power supply noise, which can be critical in a mixed-signal circuit like this one. It operates from a single 1.5V power supply and occupies a die area of 0.7mm<sup>2</sup>.

# A. Operational Amplifier

The operational amplifier is the fundamental block that dictates the performance of the switched-capacitor pipelined ADC. The maximum speed and, to a large extent, the power consumption of the ADC are determined by the operational amplifier, which is at the same time the block where the limits of the technology are met [5].

The selected amplifier has two stages: a gain boosted telescopic amplifier input stage and a rail-to-rail output stage.

The common mode feedback is continuous in time being sensed with a resistor/capacitor divider.

The frequency compensation is both direct (or Miller) with a nulling resistor and indirect. This gives the best control over the phase margin of the main amplifier and the common mode loop (71° and 57° respectively), keeping the bandwidth loss reasonable. The first stage of the amplifier contributes most of the gain, providing 72.7dB; the second stage contributes 30.3dB, making a total of 104dB with a unity-gain bandwidth of 332MHz in the simulated version.



Figure 4: Operational Amplifier

After the parasitic extraction of the layout, considering resistors and capacitors, these values changed to 101dB and 326MHz. The amplifier consumes 4mW at normal speed, i.e. when operating the ADC at 40MS/s.

The biasing circuit is based on the beta-multiplier principle, which provides constant Gm over a wide range of temperature. It is externally regulated and independent of the process parameters [10].

#### *B. Track & Hold Switches*

The sampling switches were implemented with complementary low-threshold FETs. Since a low on-resistance is required, these transistors are relatively large and consequently inject a considerable amount of charge when they change their on/off state; this phenomenon is called clock feedthrough. If no measures were taken, this charge would contaminate, in a non-linear way, the sample stored in the capacitors. To reduce this effect, two dummy transistors were added on either side of each active switch, each one injecting half of the charge that the active one injects but with opposite sign, considerably reducing the amount of input-dependent charge injection [11].



Figure 5: Transmission gates with charge cancellation

Another important constraint of the sampling switches is their linearity over the input range of the ADC. The non-linear behaviour of the transmission-gate switches introduces a distortion that can affect the overall performance of the ADC, especially at high input frequencies. The next figure shows the on-resistance over the power supply range of three transmission-gate switches: minimum-length regular V<sub>T</sub>, minimum-length low-V<sub>T</sub> and low-V<sub>T</sub> with optimized length.



Figure 6: On-resistance of transmission gates

The voltage range considered in the analysis lies between 250mV and 1.25V, since it corresponds to the full range of the ADC. The maximum on-resistance is determined by the time constant of the sampling and was set to 33$\Omega$  for the switches under comparison. The on-resistance of the regular V<sub>T</sub> switch varies by about 12$\Omega$  inside this range; in the low-V<sub>T</sub> switch this variation is reduced to 8$\Omega$ , and in the optimized one it is further reduced to only 3$\Omega$ . To measure the effect of this non-linearity, a simulation with a full-amplitude sine wave at the maximum input frequency (20MHz) was done; the resulting harmonic distortion is depicted in the next figure.



Figure 7: Distortion of transmission gates

This analysis has shown a reduction of the highest spurious harmonic from -68dB to -77dB with typical process parameters; this value increased to -72dB in the fast-slow process corner. These magnitudes of distortion are still tolerated by a 10-bit ADC, given that it has an intrinsic quantization noise of -62dB.
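
The -62dB figure follows from the standard result for an ideal N-bit quantizer driven by a full-scale sine wave, as the short sketch below shows. The function name is hypothetical; the calculation assumes quantization error uniformly distributed over one LSB.

```python
import math


def ideal_sqnr_db(n_bits):
    """Signal-to-quantization-noise ratio of an ideal ADC, full-scale sine input."""
    # Signal power: (2**n_bits / 2)**2 / 2 LSB^2; noise power: 1/12 LSB^2
    # Their ratio simplifies to 1.5 * 4**n_bits
    return 10 * math.log10(1.5 * 4 ** n_bits)


print(round(ideal_sqnr_db(10), 2))
```

This is the same quantity usually written as 6.02 N + 1.76 dB, so the quantization noise of a 10-bit converter lies about 62dB below a full-scale signal.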

#### C. Comparators

The precision requirements for the comparators are not very strict, however they should be fast and should not introduce a relevant kick-back noise. The selected architecture is called resistive divider latched comparator; it was introduced by Cho and Gray [12] and became a widely-used comparator in pipelined ADCs.



Figure 8: Resistive divider latched comparator

The comparator used in this design is similar to the originally published circuit, but with the addition of one capacitor in each branch to make it less vulnerable to transistor mismatches and hence more accurate.

The setting of the thresholds is done according to the principles described in section II-C, so:

$$IN_{+} - IN_{-} = \pm \frac{3}{8} (REF_{+} - REF_{-})$$

This is translated into two comparators. One does the comparison:

$$IN_{+} + \frac{3}{8}REF_{-} = IN_{-} + \frac{3}{8}REF_{+}$$

and the other one:

$$IN_{+} + \frac{3}{8}REF_{+} = IN_{-} + \frac{3}{8}REF_{-}$$

therefore the width of the transistors connected to the reference voltages is  $\frac{3}{8}$  of the width of the ones connected to the inputs, both having the same length.
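
The two weighted comparisons can be modelled behaviourally as follows. This is only a sketch of the decision logic, not of the latched comparator itself; the function name, the use of exact fractions and the example reference levels are assumptions made for illustration.

```python
from fractions import Fraction


def sub_adc(in_p, in_n, ref_p, ref_n, weight=Fraction(3, 8)):
    """Return -1, 0 or +1 from the two weighted single-ended comparisons."""
    # Upper comparator: IN+ + w*REF- against IN- + w*REF+
    above = in_p + weight * ref_n > in_n + weight * ref_p
    # Lower comparator: IN+ + w*REF+ against IN- + w*REF-
    below = in_p + weight * ref_p < in_n + weight * ref_n
    return 1 if above else (-1 if below else 0)


# Hypothetical reference levels giving REF+ - REF- = 1
REF_P, REF_N = Fraction(5, 4), Fraction(1, 4)
print(sub_adc(Fraction(1), Fraction(1, 2), REF_P, REF_N))     # differential input +1/2
print(sub_adc(Fraction(7, 8), Fraction(5, 8), REF_P, REF_N))  # differential input +1/4
```

Rearranging either comparison shows that it trips when the differential input crosses plus or minus 3/8 of the differential reference, which is why a 3/8 width ratio sets the threshold.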

#### D. Clocking

The circuit that generates the clocks for the various blocks of the ADC is complex and for practical reasons will not be shown here, however the most important features will be described.

The main clock drives two distinct branches: one that works at half clock speed and another that operates at full clock speed. The first one is applied to a non-overlapping clocking circuit that provides the clock sequencing for the double-sampling MDACs; the second triggers the sub-ADCs and the synchronization switches  $\Phi_S$  as explained in the section II-B.

For the sampling operation, the switches of the MDACs turn off in a sequence that minimizes the charge injection and in particular the input signal dependency; this is done by the well-known bottom plate sampling technique [13].

To minimize the effect of the kick-back noise introduced into the sample by the latched comparators, the MDAC of the first stage is triggered slightly after the corresponding sub-ADC. This introduces a desynchronization between the MDAC and the sub-ADC; however, the resulting error can be seen as an additional error in the thresholds of the comparators and does not influence the performance of the ADC, since there is enough redundancy margin.

The triggering of  $\Phi_S$  is done with a small delay from the input clock, since the bigger the skew the bigger the clock jitter, which can compromise the performance of the ADC at high input frequencies.

#### E. Layout

A two-channel prototype of this ADC was built in a multiproject wafer (MPW); the layout is shown in the next figure.



Figure 9: Chip layout

For a maximum level of testability the outputs from the stages are directly connected to the output pads, so the data alignment and the redundant signed digit code blocks are implemented outside, in the test system.

The digital and the analogue domains are properly separated at the various levels: pads, power distribution, wires and also at the substrate level using high resistivity enclosures around the digital parts.

### IV. TESTING

In this prototype one channel is more focused on verifying the functionality and the other on evaluating the performance, so they have different testing capabilities; however, the tests gave similar results in both channels.

#### A. Static characterization

The static measurements were done with the output code density method [14], using a sine wave input with a frequency of 50.0488 kHz that slightly exceeded the full range of the ADC. The results are depicted in figures 10 and 11 and summarized in Table 1.
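
The text does not say how the input frequency was chosen. A common practice for code density and FFT tests is coherent sampling, where an integer number of signal periods fits the record and that number shares no factor with the record length. The sketch below illustrates this under loudly stated assumptions: a 40MS/s sampling rate, a record of 32768 samples and 41 signal periods are hypothetical values that happen to give a frequency close to the quoted one.

```python
import math


def coherent_frequency(f_sample_hz, n_samples, n_cycles):
    """Input frequency that places exactly n_cycles periods in n_samples samples."""
    if math.gcd(n_cycles, n_samples) != 1:
        raise ValueError("n_cycles and n_samples should be coprime")
    return f_sample_hz * n_cycles / n_samples


# Assumed values, not taken from the measurement setup
f_in = coherent_frequency(40e6, 32768, 41)
print(f_in / 1e3)  # in kHz
```

Choosing coprime values makes every sample fall on a different phase of the sine wave, so all codes are exercised evenly.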



Figure 10: Differential non-linearity



Figure 11: Integral non-linearity

|     | DNL   | INL   |
|-----|-------|-------|
| MAX | 0.54  | 0.62  |
| MIN | -0.58 | -0.71 |

Table 1: INL and DNL range

The maximum values of DNL and INL are below  $\pm 1LSB$  therefore they are within the specifications for this ADC. In the INL graph it is possible to recognize the influence of the capacitors mismatch of the first stage and confirm the influence of the selection of the thresholds according to the ideas explained in the section II-C.

#### B. Dynamic Characterization

A dynamic characterization was done at 20 and 40MS/s. Sine wave signals with frequencies ranging from 1 to 20MHz and amplitudes near the full scale of the ADC were applied to the inputs. The results ranged from 9.07 effective number of bits (ENOB) for the lowest frequency input signal to 8.63 ENOB at the Nyquist frequency.
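
ENOB figures are conventionally related to the signal-to-noise-and-distortion ratio (SINAD) by a linear formula. The helper functions below are an illustrative conversion using that standard definition; the names are hypothetical, and the resulting SINAD values are derived from the quoted ENOB, not separately measured.

```python
def sinad_from_enob(enob):
    """SINAD in dB corresponding to a given effective number of bits."""
    return 6.02 * enob + 1.76


def enob_from_sinad(sinad_db):
    """Effective number of bits corresponding to a given SINAD in dB."""
    return (sinad_db - 1.76) / 6.02


for enob in (9.07, 8.63):
    print(enob, round(sinad_from_enob(enob), 2))
```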

Whilst operating at 20MS/s the power consumption could be reduced from 34mW to 26mW without any significant loss in performance.

The tests are still ongoing for a complete characterization and optimization of power efficiency for a wider range of sampling frequencies.

#### V. CONCLUSION

A 10-bit pipelined ADC in a 0.13 µm CMOS technology was presented. A switched-capacitor architecture with double sampling was used. The proper design of the switching circuitry and the selection of the sub-ADC thresholds made it possible to deal with the low-voltage constraints and to reduce the sensitivity to capacitor matching. The evaluation tests revealed a performance that matched the specifications.

#### REFERENCES

- [1] ALICE Collaboration, "ALICE Time Projection Chamber: Technical Design Report," Tech. Design Report ALICE, CERN-LHCC-2000-01
- [2] R. Esteve Bosch, A. Jiménez de Parga, B. Mota and L. Musa "The ALTRO Chip: A 16-Channel A/D Converter and Digital Processor for Gas Detectors," IEEE Transactions on Nuclear Science, Vol. 50, No. 6, December 2003
- [3] Tat C. Choi and Robert W. Brodersen "Considerations for High-Frequency Switched-Capacitor Ladder Filters," IEEE Transactions on circuits and systems, Vol. CAS-27, No. 6, June 1980
- [4] Seyfi Bazarjani and W. Martin Snelgrove, "A 160-MHz Fourth-Order Double-Sampled SC Bandpass Sigma-Delta Modulator," IEEE Transactions on circuits and systems, II: Analog and Digital Processing, Vol. 45, No. 5, May 1998
- [5] Mikko Waltari and Kari Halonen, "Circuit Techniques for Low-Voltage and High-Speed A/D Converters," Kluwer Academic Publishers, 2002
- [6] Mikko Waltari and Kari Halonen, "Timing Skew Insensitive Switching for Double-Sampled Circuits," proc. IEEE ISCAS, Vol. II, pp. 61-64, May 1999
- [7] Tsung-Hsien Lu, Chun-Kuei Chiu and Chin-Cheng Tien, "A 10 Bit 40-MS/s Pipelined Analog-to-Digital Converter for IEEE 802.11a WLAN Systems," International Symposium on Communications, 2005
- [8] Hamid Charkhkar, Alireza Asadi and Reza Lotfi, "A 1.8V, 10-bit, 40MS/s MOSFET-Only Pipelined Analog-to-Digital Converter," proc. IEEE ISCAS, pp. 63-66, September 2006
- [9] Geir S. Østrem, Øystein Moldsvor and Oddvar Aaserud, "A Compact 3V, 70mW, 12-bit Video-Speed CMOS ADC," Analog Integrated Circuits and Signal Processing, 15, 27-36, Kluwer Academic Publishers, 1998
- [10] R. Jacob Baker, "CMOS Circuit design, Layout, and Simulation" IEEE Press Series on Microelectronic Systems, 2008
- [11] Phillip E. Allen and Douglas R. Holberg, "CMOS Analog Circuit Design," Oxford University Press, 2004
- [12] T. B. Cho, P. R. Gray, "A 10 b, 20 Msample/s, 35 mW Pipelined A/D Converter," IEEE J. Solid-State Circuits, vol. 30, no. 3, pp. 166-172, Mar. 1995
- [13] D. Haigh, B. Singh, "A switching scheme for switched capacitor filters which reduces the effect of parasitic capacitances associated with switch control terminals," Proc. 19h IEEE ISCAS, pp.586-589, 1983
- [14] IEEE Standard for Terminology and Test Methods for Analog-to-Digital Converters, IEEE Std 1241-2000

# A Self Triggered Amplifier/Digitizer Chip for CBM

T. Armbruster <sup>a</sup>, P. Fischer <sup>a</sup>, I. Perić <sup>a</sup> <sup>a</sup>University of Heidelberg, Germany

### tim.armbruster@ziti.uni-heidelberg.de

#### Abstract

The development of front-end electronics for the planned CBM experiment at FAIR/GSI is in full progress. For charge readout of the various sub-detectors a new self-triggered amplification and digitization chip is being designed and tested.

The mixed signal readout chip will have 32-64 channels each containing a low-power/low-noise preamplifier/shaper front-end, an 8-9 bit ADC and a digital post-processing based on a FIR/IIR-filter. The ADC has a pipeline architecture that uses a novel current-mode storage cell as a basic building block.

The current prototype provides 26 different parametrized preamplifier/shaper/discriminator channels, 8 pipeline ADCs, a readout shift register matrix and a synthesized redundant signed binary (RSD) decoder.

#### I. INTRODUCTION

The fixed target Compressed Baryonic Matter (CBM) experiment is one of several heavy-ion experiments being built within the planned accelerator expansion FAIR (Facility for Antiproton and Ion Research) at Gesellschaft für Schwerionenforschung (GSI) in Darmstadt, Germany [1]. A superconducting synchrotron double ring accelerator (SIS100/300) with 1,100 m circumference will be the heart of FAIR, whereas the existing GSI accelerators UNILAC and SIS18 will serve as an injector. The two synchrotron rings will produce pulsed beams of up to 2.7 GeV/u for U<sup>28+</sup>, 29 GeV for protons (SIS100) and 34 GeV/u for U<sup>92+</sup> (SIS300) [8]. In the photo-montage below (fig. 1), the SIS double ring structure is shown in the upper right corner; the already existing GSI facilities are on the left (white-gray structures).



Figure 1: Photo-montage of the facility expansion FAIR at GSI [1]

The physics goal of the CBM experiment is to investigate highly compressed nuclear matter produced in direct nucleus-nucleus collisions. More precisely, one aims to explore the "deconfinement" phase transition in the QCD phase diagram from hadronic matter to quark-gluon matter that takes place at temperatures of about 170 MeV [2]. The first experiments at FAIR are scheduled to start in 2014; the complete facility is expected to be finished in 2016.

The CBM detector concept comprises several different sub-detector types that must be able to deliver precise tracking and timing measurements and to allow for reliable particle identification. Among other sub-detectors, a Silicon Tracking System (STS) built of silicon strip sensors will be used as the main tracking device, and a Transition Radiation Detector (TRD) will separate electrons from pions and also track charged particles. Since simulations predict nucleus-nucleus interaction rates of about 10 MHz, with each event producing up to 1000 charged particles [9], the demands on the different sub-detectors, the front-end electronics and the data acquisition (DAQ) in terms of data rates and radiation tolerance are high.

For both STS and TRD, high-rate, low-power and low-noise readout ASICs are needed. Since the Poisson-distributed collisions between the nuclei are not correlated to a global trigger signal, the readout ASICs for both detectors as well as the complete DAQ must be self-triggered. Besides other groups that pursue different approaches (e.g. a low-power, moderate resolution, time-over-threshold front-end design from AGH [3]), we have joined the CBM collaboration and started a new front-end readout chip development in 2006.

In this paper we describe the current status of our work on the front-end readout electronics, and especially the results achieved so far with our latest prototype. Moreover we sketch our concept for the first complete readout ASIC, which will integrate on one die 32-64 channels, each performing the analog amplification, shaping and digitization as well as some digital filtering, hit detection and data reduction. We intend to submit the new readout chip at the end of 2010.

#### II. PROTOTYPE ARCHITECTURE

#### A. Design Overview

Our current prototype chip is sized  $3.2 \times 1.5 \text{ mm}^2$  and has been fabricated in the UMC  $0.18 \mu \text{m}$  1P6M technology. One die carries 26 charge sensitive amplifier channels, 8 pipeline ADCs, a shift register matrix of 5.3 kbit, two synthesized control/decoder blocks and different test and calibration circuits. 12 current DACs with 7 bit resolution allow for internal bias generation.



Figure 2: Block-diagram of latest prototype

As sketched in fig. 2, the 8 ADCs are connected to 8 different amplifier channels. The readout concept is to run the ADCs continuously; they in turn continuously write their digital output into the corresponding shift register sub-blocks. During the conversion phase, all ADC sample values older than 42 sample steps are thereby discarded, since the length of the shift registers is limited to 42 bits.

If an internal or external trigger signal occurs, all 8 shift register sub-blocks are connected in series (white arrows in fig. 2) and all data are shifted to the output decoder logic, where they are further processed (redundant binary to "normal" binary decoder [6]) and then passed off-chip. Since this oscilloscope-like methodology causes long dead times, it will of course not be feasible for the final readout chip, but in the current prototype it significantly decreases the digital logic area, the number of necessary output pins and the overall data rates.

The shift register matrix is built using two dynamic 3T D-RAM cells per register bit. To arrive at the total of 5.3 kbit, one must consider that each ADC produces  $2 \times 8$  bit (time-multiplexed) per sampling step: 8 ADCs × 16 bit per sample and ADC × 42 samples (shift register length) = 5376 bit.
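
The storage arithmetic and the "keep only the most recent samples" behaviour can be sketched in software. This is a behavioural analogy only, not a model of the dynamic register cells; the class name and the integer sample words are hypothetical.

```python
from collections import deque

N_ADCS = 8
BITS_PER_SAMPLE = 2 * 8  # 2 x 8 bit per sampling step, time-multiplexed
DEPTH = 42               # shift register length in samples

total_bits = N_ADCS * BITS_PER_SAMPLE * DEPTH
print(total_bits)


class SampleHistory:
    """Keeps the most recent DEPTH samples of one ADC, oldest first."""

    def __init__(self, depth=DEPTH):
        self._buf = deque(maxlen=depth)  # older entries fall out automatically

    def push(self, word):
        self._buf.append(word)

    def read_out(self):
        # On a trigger the stored samples are read out in order
        return list(self._buf)


history = SampleHistory()
for sample in range(100):
    history.push(sample)
print(len(history.read_out()), history.read_out()[0])
```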

# B. Layout



Figure 3: Prototype layout

Figure 3 shows the complete chip layout. The die has a total of 110 pads sitting on an  $80 \,\mu\text{m}$  pitch. The 26 preamplifier/shaper channels are marked yellow. In the red box are the metal-metal capacitors that are connected to the amplifier inputs as a replacement for a "real" external detector capacitance. The bias circuitry of the amplifier channels including the 12 DACs is highlighted green. Framed in light blue are the 8 pipeline ADCs. Their bias structures and some hand-made control logic are marked pink. The blue box surrounds the 5.3 kbit shift register matrix. Both synthesized blocks are orange-colored; the upper one provides the output control and decoder logic while the lower one switches the shift register matrix and the ADCs. The different test structures are bordered white.

# C. Amplifier/Shaper Channels



Figure 4: Simplified preamplifier/shaper schematic

As sketched above (fig. 4), each amplifier channel basically consists of a single-ended preamplifier with NMOS input and a pole-zero cancellation feedback, a 2nd order T-feedback shaper (82 ns shaping-time) and a comparator (not shown) with LVDS output. The preamplifiers of the different channels were realized with varied design parameters to find out what the lowest possible noise values are and how they can be achieved. In particular, because the input NMOS has a high impact on the overall noise characteristics of the preamplifier, 3 different types of input NMOS were used within the different channels: normal (NMOS with triwell, minimal gate length), no-triwell (NMOS without triwell) and long (NMOS with triwell, non-minimal gate length).

For both, preamplifier and shaper, a unified amplifier cell was used several times. By choosing a certain number of amplifier cells for the preamplifier during design phase, one can easily optimize the tradeoff between power consumption and amplifier noise for a given detector capacitance. In the final chip the total number of amplifier cells will be configurable/switchable to be able to individually adjust the power and noise characteristics of the preamplifier to the actual detector capacitance.

Furthermore, a special injection cell was included in every channel that provides 3 different ways of injecting test charges. The whole channel block was laid out by hand and covers about $40 \times 540 \,\mu\text{m}^2$.

# D. Pipeline ADC

The current-mode pipeline ADCs have 8 pipeline stages and therefore generate 9 bit per conversion step at a maximum speed of 24 MSamples/s. Each ADC produces a raw data stream of 400 Mbit/s that is fed into the storage matrix and decoded afterward. The ADC realizes the popular 1.5 bit algorithmic conversion technique, which adds some redundancy to the output data. This relaxes the accuracy requirements of the comparator, at the cost of an additional redundant binary to binary decoder [7].

Probably the most challenging building block of an algorithmic ADC in general is the multiplication unit. In our design a novel current storage cell, sketched below (fig. 5), was used to perform a multiplication by two. This is done by copying the input current into two different copy cells and then connecting both cells together to produce the doubled input current.



The basic principle of the current copy cell (cp. again fig. 5) is to integrate the input current onto the feedback capacitance (upper part of the loop) while concurrently reconverting the output voltage of the integrator back to a current (lower part of the loop) as long as the input current and the reconverted current are unequal. If at any time both currents exactly cancel each other, an equilibrium is reached and the primary input current can easily be stored just by opening both write switches (fig. 5). To read out the stored current again only the (lower) read switch must be closed.

The offset correction also required during each algorithmic conversion is done by enabling or disabling some additional current sources that are directly integrated into the current copy cell.
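The conversion steps described in this section (comparison, doubling, offset correction, decoding) can be illustrated numerically. The following is a behavioral sketch of a generic 1.5 bit per stage conversion, not the authors' circuit: the normalized input range, the comparator thresholds at a quarter of the reference, and the function names `convert` and `decode` are all assumptions chosen for illustration.

```python
def convert(x, n_stages=8, ref=1.0, comparator_offset=0.0):
    """Return one redundant signed digit (-1, 0, +1) per stage.

    x is a normalized input in [-ref, +ref]. comparator_offset models an
    error common to both comparators (hypothetical parameter).
    """
    digits = []
    for _ in range(n_stages):
        if x > ref / 4 + comparator_offset:
            d = 1
        elif x < -ref / 4 + comparator_offset:
            d = -1
        else:
            d = 0
        digits.append(d)
        # Multiply by two, then apply the offset correction selected by d.
        x = 2 * x - d * ref
    return digits


def decode(digits):
    """Convert redundant signed digits (MSB first) to an ordinary integer."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


digits = convert(0.3)
code = decode(digits)
print(digits, code, code / 2 ** len(digits))
```

With 8 stages the decoded value spans -255 to +255, i.e. 511 codes or roughly 9 bit. In this model a comparator offset of up to about a quarter of the reference leaves the decoded result within one LSB of the input, which is the relaxation of comparator accuracy that the redundancy buys.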

#### III. TEST SETUP

The prototype chip was directly bonded to a PCB that also carries the bias circuitry, some LVDS buffers, level shifters and different connectors. The PCB itself is mounted on a Xilinx Spartan FPGA board that provides all necessary infrastructure for FPGA programming and data exchange via USB with a PC. An impression of the test setup is shown below (fig. 6).



Figure 6: Test setup: The die is directly bonded to the PCB

## IV. PROTOTYPE RESULTS

#### A. Measurements of Amplifiers/Shapers

Well-known test charges can easily be injected into the amplifier inputs by using a calibrated internal injection capacitance. The pulse shapes of both the preamplifier and the shaper outputs can be studied qualitatively with a monitor bus; for precise noise values, the discriminators within the channels are used to perform s-curve scans.

Although not shown here, the overall pulse shapes and the general amplifier/shaper behavior match the simulation nearly perfectly (output pulse peaking-time about 95 ns) and therefore satisfy our expectations. Unfortunately the measured noise does not, as the following overview (fig. 7) shows.



To obtain these results many noise measurements have been performed. In particular, the equivalent noise charge (ENC) had to be extracted from measured s-curves of the different channels (different input NMOS types) while connecting different capacitive loads.
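As an illustration of how an ENC value can be read off an s-curve, the sketch below assumes Gaussian noise, so that the hit probability versus injected charge follows an error function whose width equals the ENC. The extraction by interpolating the 15.87 %, 50 % and 84.13 % crossing points is one simple option and is not necessarily the procedure used for the measurements reported here; all function names are hypothetical.

```python
import math


def s_curve(q, threshold, sigma):
    """Ideal hit probability for injected charge q with Gaussian noise sigma."""
    return 0.5 * (1.0 + math.erf((q - threshold) / (math.sqrt(2.0) * sigma)))


def crossing(charges, probs, level):
    """Charge at which a rising s-curve crosses `level` (linear interpolation)."""
    for i in range(1, len(charges)):
        p0, p1 = probs[i - 1], probs[i]
        if p0 <= level <= p1 and p1 > p0:
            frac = (level - p0) / (p1 - p0)
            return charges[i - 1] + frac * (charges[i] - charges[i - 1])
    raise ValueError("level not crossed by the scan")


def extract_threshold_and_enc(charges, probs):
    """Threshold = 50 % point, ENC = half the distance between -1 and +1 sigma."""
    low = crossing(charges, probs, 0.1587)
    mid = crossing(charges, probs, 0.5)
    high = crossing(charges, probs, 0.8413)
    return mid, 0.5 * (high - low)


# Synthetic scan in electrons: threshold 2000 e, noise 400 e (illustrative values).
charges = list(range(0, 4001, 50))
probs = [s_curve(q, 2000.0, 400.0) for q in charges]
threshold, enc = extract_threshold_and_enc(charges, probs)
print(round(threshold), round(enc))
```

Repeating such an extraction for several capacitive loads gives the ENC versus capacitance points from which an offset and a slope can be determined.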

From the graphic it is apparent that, although the noise offset (at 0 pF detector capacitance) is about 200 e ENC for all simulations and measurements, the measured slopes of the different channel versions differ significantly from both simulation and each other.

The most important result here is that most variations of the input NMOS type had hardly any impact on the measured noise, whereas using a longer input NMOS with a non-minimal gate length decreased the noise slope dramatically, by a factor of about 2. Even though many different theories were considered and many sophisticated simulations, including extracted post-placement simulations, were performed, neither the difference in noise between the long and the normal channels nor the absolute deviation of all noise values from simulation could be understood so far. Further investigations are ongoing.

Nevertheless, the long channel has a measured noise of about 800 e ENC for a 30 pF detector capacitance while consuming only 3.6 mW, and thus already satisfies the project requirements.
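The 800 e figure is consistent with the linear noise model listed in the summary table of this section (200 e + 20 e/pF). A minimal check, assuming that this table entry describes the long channel and that the model is linear over the whole range:

```python
def enc_electrons(c_det_pf, offset_e=200.0, slope_e_per_pf=20.0):
    """Linear noise model: ENC in electrons for a detector capacitance in pF."""
    return offset_e + slope_e_per_pf * c_det_pf


print(enc_electrons(0.0))    # noise offset without detector capacitance
print(enc_electrons(30.0))   # value for a 30 pF detector
```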

#### B. Measurement of Pipeline ADC

We have measured many ADC transfer characteristics with DC inputs at a conversion speed of 24 MSamples/s and a corresponding clock frequency of 200 MHz. An exemplary result is shown below (fig. 8).

Besides demonstrating the proper operation of the ADCs themselves, the successful measurement of the characteristic ADC curve also implicitly proves the proper operation of all involved readout components (shift register matrix, control blocks, redundant signed binary decoder, etc.).



Based on the evaluation of the differential non-linearity (DNL), the best measurements of the 9 bit design so far give an effective resolution of 7-8 bit, which is approximately what simulation predicted. The ADC consumes only 4.5 mW at a conversion speed of 24 MSamples/s and covers just about $130 \times 120 \,\mu\text{m}^2$ of chip area.

# C. System: Amplifier/Shaper + ADC

Since, as described above, some shaper outputs can be connected to ADC inputs, shaper pulses can directly be digitized on-chip. For the following plot (fig. 9) 1000 hits have been recorded in this way.



Figure 9: Overlay of 1000 shaper output pulses digitized with on-chip ADC at 24 MSamples/s

Since the measured shaper noise (at 5 pF input load) is smaller than one least significant bit (LSB) of the ADC, the observable disturbance here is only caused by ADC noise.

In general, by increasing the overall front-end gain, one could easily scale the shaper noise levels to the same scale as the ADC noise levels, at least as long as the needed dynamic range is not limiting. This will of course be considered in the final readout chip design.

From a theoretical point of view, the impulse response of the $2^{nd}$ order shaper should be of the type $(x/T^2)\,\exp(-x/T)$, and indeed fitting the digitized data in this way gave very good agreement.
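The shape of this function is easy to explore numerically. In the sketch below the time constant of 95 ns is borrowed from the measured peaking time purely as an illustrative value; identifying it with the fit parameter $T$ is an assumption, and the function names are hypothetical.

```python
import math


def shaper_response(x, T):
    """(x / T^2) * exp(-x / T) for x >= 0, zero before the pulse starts."""
    if x < 0:
        return 0.0
    return (x / T ** 2) * math.exp(-x / T)


def peak_time(T, step):
    """Locate the maximum by a brute-force scan over 0 .. 10 T."""
    times = [i * step for i in range(int(10 * T / step) + 1)]
    return max(times, key=lambda t: shaper_response(t, T))


T_NS = 95.0  # illustrative value, see lead-in
print(peak_time(T_NS, step=0.5), shaper_response(T_NS, T_NS))
```

Setting the derivative to zero shows analytically that the maximum lies at $x = T$ with height $1/(eT)$, which the scan reproduces.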

#### D. Summary Table

The following table summarizes the most important characteristics of the prototype ASIC.

| Parameter                 | Value                       |
|---------------------------|-----------------------------|
| Chip Technology           | UMC 0.18 µm, 1P6M, MiMCaps  |
| Chip Area                 | 1.5 x 3.2 mm<sup>2</sup>    |
| Channel / ADC Area        | 40 x 540 / 130 x 120 µm²    |
| Number of Channels / ADCs | 26 / 8                      |
| Power per Channel / ADC   | 3.8 / 4.5 mW                |
| Shaper Noise (ENC)        | 200 e + 20 e / pF           |
| Shaper Peaking-Time       | 95 ns                       |
| ADC Resolution            | 7-8 bit effective           |
| ADC Speed                 | 24 MSamples / s             |

#### V. Outlook: Complete v1.0 ASIC

# A. Design Concept

The next milestone is to build the complete version 1.0 chip, which will have 32 mixed-signal channels, each consisting of an amplifier, an ADC and a post-processing stage including, for example, an IIR/FIR filter, a digital hit detector and a simple data compression unit. Moreover, a 1-2 GBit LVDS transmitter cell and a simple protocol encoder are intended. A very first draft of the conceptual block diagram is shown below (fig. 10).



Figure 10: Preliminary block diagram of the first complete chip

Furthermore, a hit parameter extraction unit that evaluates information such as the hit amplitude is currently under discussion.

At present a token ring network seems to be the simplest and fairest solution for connecting the channel outputs with the digital processing and transmission unit.

To save transmission bandwidth it is intended to connect several chips with lower load (due to a lower event rate) to enable them to share one LVDS transmitter via a simple serial round-robin protocol.
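How several lightly loaded chips could share one transmitter in a round-robin fashion is sketched below. The protocol details are not specified in the text, so everything here is an assumption made for illustration: one data word per turn, chips with nothing to send being skipped, and the function name `round_robin`.

```python
from collections import deque


def round_robin(queues):
    """Yield (chip_id, word) pairs, visiting the chips in a fixed cyclic order.

    queues maps a chip identifier to the list of words waiting in that chip.
    Each chip sends at most one word per turn; empty chips drop out.
    """
    pending = deque((chip_id, deque(words)) for chip_id, words in queues.items())
    while pending:
        chip_id, words = pending.popleft()
        if words:
            yield chip_id, words.popleft()
        if words:
            pending.append((chip_id, words))


waiting = {"chip A": [1, 2, 3], "chip B": [10], "chip C": [20, 21]}
for chip_id, word in round_robin(waiting):
    print(chip_id, word)
```

A scheme of this kind gives every chip the same share of the link regardless of how much data its neighbours hold, which is the fairness property that also motivates the token ring inside the chip.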

#### B. Radiation Tolerance

Calculations estimate that the whole readout electronics will be exposed to radiation doses between 1 krad (time of flight detector) and 20 Mrad (first STS layer) [4], which in general demands special circuit techniques to increase the radiation tolerance of both digital and analog chip parts. In our case, extensive investigations by GSI have shown the UMC 0.18 µm technology to be per se sufficiently radiation-tolerant up to a certain limit while showing very good annealing characteristics [5]. For this reason, the use of special radiation hardening techniques is not yet intended.



# C. Preliminary Floor-plan

A preliminary floor-plan concept is shown in fig. 11. The die, about $3 \times 2 \text{ mm}^2$ in size, will have a separate bias circuitry, two symmetric 16-channel blocks, a centered digital processing area and a digital slow-control and I/O unit. The 32 detector channels will be wire-bonded to both the left and the right side to relax the overall pitch proportions.

#### VI. CONCLUSIONS

With the successful measurement of the low-noise and low-power analog preamplifier/shaper circuits on the one hand and the small low-power 7-8 bit ADCs on the other hand, we have conceptually finished the whole analog frontend and therefore reached an important milestone, even if several refinements certainly still have to be scheduled.

Moreover, as we have shown, we already have an overall design concept for how the first complete readout chip should be realized, and thanks to the effective cooperation with the different detector and physics groups, the final specification will soon be completed.

The submission of the complete readout ASIC is scheduled for the end of 2010.

#### VII. ACKNOWLEDGMENT

This work has been conducted using tools from Cadence Design Systems. We appreciate the support and the opportunities we got from our participation in the Cadence Academic Network.

#### VIII. REFERENCES

- [1] The Facility for Antiproton and Ion Research (FAIR) project website: http://www.gsi.de/fair
- [2] The Compressed Baryonic Matter (CBM) experiment website: http://www.gsi.de/fair/experiments/cbm
- [3] K. Kasiński, P. Gryboś, R. Szczygieł "TOTO1 ASIC First Results", 14th CBM CM Split, Croatia, October 2009
- [4] W.F.J. Müller "Radiation Doses in CBM A first estimate and an assessment of consequences", 11th CBM CM at GSI, Germany, February 2008
- [5] S. Löchner "Radiation studies on the UMC 180 nm CMOS process", CBM Progress Report 2008
- [6] S.M. Yen, C.S. Laih, C.H. Chen, and J.Y. Lee "An Efficient Redundant Binary Number to Binary Number Converter", IEEE Journal of Solid State Circuits, Vol. 27, No. 1, pp. 109-112, Jan 1992.
- [7] J.S. Wang, C.L. Wey, "A 12-bit 100-ns/bit 1.9-mW CMOS Switched-Current Cyclic A/D Converter", IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol. 46, no. 5, May 1999, pp: 507
- [8] H.H. Gutbrod et al., FAIR Baseline Technical Report 2006, Executive Summary, ISBN 3-9811298-0-6
- [9] V. Friese, W.F.J. Müller, P. Senger "The CBM experiment at FAIR", CBM Progress Report 2008

Figure 11: Preliminary floor-plan

# Measurement of the performances of a Low-Power Multi-Dynamics Front-End for Neutrino Underwater Telescope Optical Modules

D. Lo Presti<sup>a</sup>, L. Caponetto<sup>b</sup>, G.V. Russo<sup>a</sup>, N. Randazzo<sup>c</sup>, V. Sipala<sup>a</sup>, E. Leonora<sup>c</sup> On behalf of the NEMO collaboration

> a University of Catania, Department of Physics and Astronomy, Catania, Italy b INFN Catania/CNRS-IN2P3-CPPM, Marseille, France c Istituto Nazionale di Fisica Nucleare, Sezione di Catania, Italy

#### domenico.lopresti@ct.infn.it

#### Abstract

A solution for a system to capture signals in the Optical Module of an underwater neutrino telescope [1,6] is described, with a focus on power consumption and signal dynamics considerations. All the design specifications derive from considerations regarding the signals and their acquisition and start from the most general hypotheses possible, so that they are valid for any underwater Cherenkov neutrino telescope.

# I. INTRODUCTION

A front-end board, the FE-ADC [3], using a consumer ADC, has been designed and realized. It is aimed at demonstrating the advantages of the proposed architecture in meeting the specifications for power dissipation, multi-input dynamics and signal reconstruction.

The performance of this board has been accurately measured, both stand alone and coupled to the PMT foreseen by the NEMO collaboration, and is presented and discussed.

The results meet the requirements and establish the basis for the design of the definitive front-end architecture employing the SAS (Smart Analogue Sampler) chip [4] in place of an ADC.

# II. DESCRIPTION OF THE FE-ADC

In order to validate the proposed architecture, a front-end board has been designed. The block diagram of the board is shown in figure 1. All the functionalities foreseen for the final front-end board have been implemented. The sampling and A/D conversion of the PMT interface output signal are performed by a consumer 200 MHz 12 bit ADC, the AD9230 by Analog Devices. The ADC output data are stored in a FIFO inside the FPGA only when a validation signal, the SOT (Signal Over Threshold), is high, corresponding to OutA crossing a suitable threshold. In this way it is possible to perform zero suppression and minimize dead time. The data transmission mechanism is the same as the one developed, and fully working, in the front-end board used in NEMO Phase-1 [2]. If a signal exceeds 512 pe, it is possible to sample the integrator output of the PMT interface.

The sampling frequency is again 200 MHz but the stored data are taken every 10 samples. In this way using the same ADC it is possible to have samples of the integrator at 20 MHz.

The FIFO in the FPGA has a depth of 1 Ksample. This allows an instantaneous event rate up to 1 MHz without dead time, for signals in the range 1 to 512 pe and time width below 100 ns.

When the integrator signal is used, the maximum record length in time is $50\,\mu\text{s}$.
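The sampling figures quoted above fit together as shown in this short calculation. It is a sketch only: the constant names are illustrative, and the FIFO depth of "1 Ksample" is taken as 1000 samples, which is an assumption (it could equally mean 1024).

```python
F_SAMPLE_MHZ = 200     # ADC sampling frequency
DECIMATION = 10        # one stored sample out of every 10 for the integrator
FIFO_DEPTH = 1000      # "1 Ksample", taken as 1000 here (assumption)
PULSE_WIDTH_NS = 100   # upper limit on the signal time width

f_integrator_mhz = F_SAMPLE_MHZ / DECIMATION                  # integrator sample rate
record_length_us = FIFO_DEPTH / f_integrator_mhz              # longest integrator record
samples_per_pulse = PULSE_WIDTH_NS * F_SAMPLE_MHZ // 1000     # samples in a 100 ns pulse
pulses_in_fifo = FIFO_DEPTH // samples_per_pulse              # pulses buffered at once

print(f_integrator_mhz, record_length_us, samples_per_pulse, pulses_in_fifo)
```

Under these assumptions the integrator is sampled at 20 MHz, a full FIFO corresponds to 50 µs, and a 100 ns pulse occupies 20 samples, so several tens of pulses can be buffered before readout has to keep up.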



Figure 1: Block diagram of the FE-ADC board

The AD9230 has only one input channel and this input is connected, according to signal classification, to the correct PMT interface output through a fast analog multiplexer. This multiplexer is controlled by the Control Unit implemented in the FPGA.

The multiplexer changes its output according to the signal classification continuously, so, for example, if a signal crosses the over range threshold of the first input dynamics, the ADC will always sample the signal in the correct dynamics.

The samples of a signal together with the cardinal sine interpolation are shown in figure 2.



Figure 2: An example of waveform reconstructed using cardinal sine interpolation.
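For reference, cardinal sine (Whittaker-Shannon) interpolation of uniformly spaced samples can be written in a few lines. This is a generic textbook sketch and not the reconstruction code used for figure 2; the function names and the sample values are made up for illustration.

```python
import math


def sinc(x):
    """Normalized cardinal sine, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def sinc_interpolate(samples, t, sample_period):
    """Value at time t of the band-limited waveform passing through the samples.

    Sample n is assumed to have been taken at time n * sample_period.
    """
    return sum(s * sinc(t / sample_period - n) for n, s in enumerate(samples))


samples = [0.0, 1.0, 4.0, 2.0, 0.0]   # arbitrary ADC values
period_ns = 5.0                       # 200 MHz sampling
print(sinc_interpolate(samples, 10.0, period_ns))   # at a sample instant
print(sinc_interpolate(samples, 11.5, period_ns))   # between two samples
```

At the sampling instants the interpolated waveform reproduces the samples, while between them it provides the finer time grid on which charge integration and timing algorithms can operate.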

In this way, knowing the classification corresponding to each sample and the PMT interface gain calibration, it is possible to reconstruct the input signal off-line. An input linear range of 512 pe and an overall input dynamics, using the integrator output, of up to 10000 pe are achieved.

# **III. TEST RESULTS**

The PMT signal has been acquired and used as a parametric input waveform.

The FE-ADC board has been tested using an arbitrary waveform generator for the PMT emulated input signal generation. A picture of the FE-ADC board is shown in figure 3.



Figure 3: A picture of the FE-ADC board

In the following, the results of the measurement of the PMT coupled to the FE-ADC board as the front-end electronics are presented. A data acquisition firmware has been designed for this purpose.

#### A. Charge resolution

The charge of an acquired signal is calculated off-line by integrating the interpolated waveform of the signal itself.

In this way, it is possible to draw the charge spectrum and evaluate the gain and the resolution of the system, PMT and FE-ADC.

On the basis of the measurements, the gain of the PMT can be reduced to $1.36\cdot10^7$, by a factor of about 3.6, without affecting the optimal signal to noise ratio. This is a great advantage in terms of ageing and, furthermore, allows an increase of the PMT output linear range from 100 to about 1000 photo-electrons.

In figure 4, the single photo-electron charge spectrum of the system at the optimal gain is shown.

The peak to valley and the resolution of the PMT, as measured stand alone with standard setup in single photoelectron conditions, remain unchanged, respectively 2 and 20%.



Figure 4: The single photo-electron charge spectrum measured using the FE-ADC

#### B. Time resolution

The time stamp of an incoming signal consists of two terms: rough and fine time stamps.



Figure 5: An example of the CFD software output

The rough time stamp mechanism consists of a 10 bit 200 MHz counter implemented in the FPGA. The 200 MHz clock is derived from the 20 MHz Master Clock by means of a DCM inside the FPGA. The measured jitter is below 300 ps. The clock is an LVDS signal and drives the ADC. In the final front-end, the SAS chip will use the same clock.

The fine time stamp is calculated by a constant fraction discriminator (CFD) software applied to the interpolated waveform. An example of the software output is shown in figure 5.
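One common software form of constant fraction timing finds the instant at which the leading edge crosses a fixed fraction of the pulse maximum. The sketch below implements that variant; the text does not say whether this form or the classic delay-and-subtract form was used, so the choice, the 50 % fraction and the function name `cfd_time` are assumptions.

```python
def cfd_time(samples, sample_period, fraction=0.5):
    """Time at which the leading edge reaches `fraction` of the pulse maximum.

    samples are baseline-subtracted amplitudes taken every sample_period;
    the crossing is located by linear interpolation between two samples.
    """
    peak_index = max(range(len(samples)), key=lambda i: samples[i])
    level = fraction * samples[peak_index]
    for i in range(1, peak_index + 1):
        if samples[i - 1] < level <= samples[i]:
            frac = (level - samples[i - 1]) / (samples[i] - samples[i - 1])
            return (i - 1 + frac) * sample_period
    raise ValueError("no leading-edge crossing found")


pulse = [0, 0, 2, 6, 10, 7, 3, 1]               # arbitrary amplitudes
print(cfd_time(pulse, 5.0))                     # time in ns for 5 ns sampling
print(cfd_time([3 * s for s in pulse], 5.0))    # same time for a larger pulse
```

Because the threshold scales with the pulse height, the extracted time does not depend on the amplitude, which is the property that makes this kind of timing attractive for signals spanning a wide dynamic range.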

Using a laser source at a repetition frequency of 10 kHz and the time stamp of the signals acquired by the system, it was possible to measure the time resolution of the whole system. The resolution has different components: the laser pulse jitter, the PMT time resolution, and the front-end time stamp reconstruction resolution. The measured overall time resolution is about 1.4 ns. The laser pulse jitter is negligible and the PMT resolution, measured using the standard setup, is about 1.25 ns. In figure 6, the time stamp spectrum is shown.
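The front-end contribution can be estimated from these two numbers if the components are assumed to be independent and to add in quadrature, with the laser jitter neglected as stated. The quadrature assumption is ours and is not spelled out in the text.

```python
import math

total_ns = 1.4    # measured overall time resolution
pmt_ns = 1.25     # PMT resolution measured with the standard setup

# Independent contributions add in quadrature, so the remainder is:
front_end_ns = math.sqrt(total_ns ** 2 - pmt_ns ** 2)
print(round(front_end_ns, 2))
```

The result, about 0.63 ns, is consistent with the 600 ps offline time stamp resolution listed in Table 1.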



Figure 6: The time stamp spectrum for the system in single photoelectron conditions

#### C. Double hit resolution

In order to measure the double hit resolution, which is a crucial parameter for our application, a dedicated setup has been developed. Using an optical fiber splitter to obtain two optical paths with different time delays, it is possible to produce, starting from a single signal, a double hit with the desired time separation. The reconstruction performance has been measured as a function of the time separation. An example of an acquired double hit waveform is shown in figure 7.

The signal filtering and the sampling frequency have been optimized to obtain a double hit resolution of 20 ns.

#### D. Control Unit

The Control Unit has been implemented in the FPGA, a Spartan3 by XILINX, and manages all the operations in the FE-ADC board: the PMT supply voltage control and supervision, the thresholds of the classification, the ADC samples storage, filtering and transmission, the communication and the on board sensors.



Figure 7: A typical double hit waveform with 20 ns time separation

#### E. Overall performances

The emulation board has been fully tested in its overall performances and the technological and architectural choices are fully compliant with the mechanical and experimental specifications.

The main performances of the FE-ADC board coupled to the PMT are summarised in Table 1.

Table 1: main performances of the FE-ADC board coupled to the PMT

| Feature               | Details                                                                                        |
|-----------------------|------------------------------------------------------------------------------------------------|
| Multi-dynamics        | 3 linear dynamic ranges up to 512 pe; charge dynamic range up to 10000 pe; signal classification |
| Dead time             | Negligible (@ 300 kHz BG in 10" PMT)                                                           |
| Time stamp online     | 5 ns                                                                                           |
| Time stamp offline    | 600 ps                                                                                         |
| Double hit resolution | 20 ns                                                                                          |
| PMT low gain          | $1.36 \cdot 10^7$: high linear range; lower dark current; longer PMT operating life            |

The main features of the FE-ADC are summarised in Table 2.

| Feature                           | Details                                          |
|-----------------------------------|--------------------------------------------------|
| Location                          | FE-ADC inside or outside the Optical Module      |
| Power supply (analog and digital) | 290 mA @ 5 V (187 mA ADC); 1.4 W (70% yield)     |
| Dual Safe Boot                    | FPGA back-up firmware                            |
| PMT control                       | ISEG base interface                              |
| ADC                               | 200 Msps, 12 bit                                 |
| Sampling clock                    | 200 MHz LVDS, generated by DCM                   |
| Time stamp and classification     | Settable by slow control                         |
| Sensors                           | Temperature and humidity                         |
| Rate monitor                      | Instantaneous                                    |

The total power dissipation is about 1.5 W measured with a 5 V power supply. Considering that the ADC accounts for 64% of the power consumption and that the foreseen power consumption of the SAS chip is 60 mW, the definitive front-end board will have a power consumption of about 700 mW.

# IV. CONCLUSIONS

The development of the emulation board demonstrates the advantages of the proposed architecture in meeting the specifications for power dissipation, multi-input dynamics and signal reconstruction, and establishes the basis for the definitive design of the final front-end board using the SAS chip. As soon as the chip is available, the whole front-end will be tested together with the PMT.

The results of the measurements show that all the specifications for the Optical Module front-end electronics have been satisfied. The use of the final version of the SAS chip will allow the total power consumption to be further reduced.

# V. REFERENCES

[1] Nemo website, nemoweb.lns.infn.it

[2] F. Ameli, M. Bonori, C.A. Nicolau, A 200 MHz FPGA-based PMT Acquisition electronics for Nemo experiment. Proceedings of VLVnT Workshop, 231-234, Nikhef, Amsterdam 2003, http://www.vlvnt.nl

[3] D. Lo Presti, G. V. Russo, L. Caponetto, N. Randazzo, et al., A VLSI Full Custom ASIC Front End for the Optical Module of the NEMO Underwater Neutrino Detector, IEEE Transactions on Nuclear Science, Vol. 53, No. 3 (June, 2006) issue.

[4] L. Caponetto, D. Lo Presti, G.V. Russo, N. Randazzo et al., Design study of a low power, low noise front-end for multianode silicon drift detectors., Nuclear Instruments and Methods in Physics Research A, v. 552, iss. 3, pp. 489-512, 2005

[6] E. Migneco et al., Status of NEMO, Proc. Second Int. Workshop on large neutrino telescopes, Nucl. Instr. Meth. A567 (2006) 444

# The Control System for a new Pixel Detector at the sLHC

J. Boek<sup>a</sup>, K. Becker<sup>a</sup>, T. Henß<sup>a</sup>, S. Kersten<sup>a</sup>, P. Kind<sup>a</sup>, P. Mättig<sup>a</sup>, C. Zeitnitz<sup>a</sup>

<sup>a</sup> Bergische Universität Wuppertal, Gaußstr. 20, 42119 Wuppertal, Germany

boek@physik.uni-wuppertal.de

#### Abstract

For the upgrade of the LHC, the sLHC (super Large Hadron Collider), a new ATLAS Pixel Detector is planned, which will require a completely new control system. To reduce the material budget, new power distribution schemes are under investigation in which the active power conversion is located inside the detector volume. Such a new power supply system will need new control strategies. Parts of the control must be located closer to the loads. The minimization of mass, the demand for fewer cables and the re-use of the existing outer services are the main restrictions on the design of the control system. The requirements of the DCS (Detector Control System) and a first concept will be presented. We will focus on a control chip which necessarily has to be implemented in the new system. A setup of discrete components has been built to investigate and verify the chip's requirements. We report on the status of the work.

#### I. INTRODUCTION

The innermost part of the ATLAS tracking system for the sLHC upgrade is a pixel detector. The precise layout and geometric dimensions are still under discussion. However, an interesting option foresees five cylindrical shells around the interaction point in the barrel part and five disks per end cap. A support tube will divide the detector into two parts: an insertable part comprising shells 0 and 1, and a fixed part containing the rest. In the barrel, staves carry the individual detector modules (see Figure 1); in the end caps the modules are installed on disks. Staves, half staves and disk sectors form the DCS relevant groups. The smallest unit on which the DCS can act will be one detector module. Typically a detector module will be read out via four front end chips, while the innermost layer will likely have only one front end chip per detector tile. Depending on the layer, up to sixteen detector modules form a half stave.



Figure 1: Pixel stave of the outer layers (2,3 and 4)

At first the data collected by the front end chips are transferred to a controller, which is located at the EoS (End of Stave) card. A possible candidate for the EoS controller is the GBT (Giga Bit Transmission) chip [1]. Finally the opto electrical transceiver, called opto board, which is located a few meters away from the interaction point, sends the data to the control room for further processing and storage.

Besides the detector modules themselves, the End of Stave card with its electronics, and the opto board, monitoring of the environment and of the cooling are subjects of the detector control.

#### **II. REQUIREMENTS**

For each DCS subject we identified the parameters which need to be monitored or which require to be controlled (either the set value must be changeable and/or an operator must be able to switch a channel on and off), see Table 1.

For each parameter one has to define its granularity, e.g. whether a value is available per front end chip or just per stave. One has to evaluate where the processing of data takes place: locally inside the detector volume or at the power supply level, which typically will be installed in the counting rooms. In the following we will concentrate on all quantities which can't be controlled in the counting room. Furthermore the level of reliability and the lifetime (whether a piece of information is permanently available or just for special periods) must be defined for all DCS items.

The control system must operate and react safely in all use cases from the assembly of the detector and qualification tests to the commissioning phase and normal data taking. A limited operation without a working cooling system must be possible. Tuning and calibration must be supported. It might happen that just parts of the system are available and will be operated.

Table 1: Items of the Detector Control System

|                       | to be monitored      | to be controlled           |
|-----------------------|----------------------|----------------------------|
| detector module       | HV voltage & current | selectable voltage, on/off |
|                       | LV voltage & current | selectable values, on/off  |
|                       | temperature          |                            |
| end of stave card     | voltage & current    | on/off                     |
|                       | temperature          |                            |
|                       |                      | reset                      |
| opto board            | voltages & currents  | selectable voltage, on/off |
|                       | temperature          |                            |
|                       |                      | reset                      |
| environment & cooling | humidity             |                            |
|                       | temperature          |                            |

#### **III. PROPOSAL FOR A CONTROL SYSTEM**

Our starting point is the present detector, where the DCS fulfils all needs and supports the data taking in a reliable way. Therefore monitoring and control of the different functions in the new system should be provided with the same reliability and the same level of granularity as for the present detector. In particular, monitoring and control per detector module are essential, e.g. the current consumption of the low voltage tells the operator whether a module is properly configured.

While the high voltage reading and setting will take place outside the detector volume, typically even inside the HV power supplies themselves, the low voltage monitoring and control require a data processing close to the detector modules due to the voltage drops.

Currently two powering methods are under discussion: parallel powering with DC-DC converters and the serial powering scheme. As the choice of power distribution has a direct impact on the monitoring and control possibilities, the two powering schemes must be investigated separately. It would be counterproductive to study new powering schemes to reduce the material in the detector while increasing the DCS cable volume at the same time. Therefore both efforts should share the aim of avoiding additional lines.



Figure 2: DCS for DC-DC powered modules




Figure 3: DCS for serial powered modules

# A. DCS for DC-DC powered modules

In the case of the DC-DC scheme the power reduction is performed by two stages of DC-DC converters. While the first stage is located inside the front end chips, the second is foreseen near the detector.

As the voltages are supplied in parallel per detector module and the cable bundle must be routed through the End of Stave (EoS) card, the low voltage can be monitored per module at the EoS card, see Figure 2. Monitoring can be done by a DCS chip mounted on the EoS card. In this way only very short lines on the EoS card itself are necessary to provide an LV reading per detector module. The reference voltage Vref can be monitored in the same way. Because individual lines are routed to the outside, the control and current monitoring can take place at the far end. Only for first system tests might it be useful to foresee local current monitoring in order to debug the system.

Figure 2 also depicts the temperature monitoring of the detector modules. Each detector tile is equipped with an NTC (Negative Temperature Coefficient) sensor. Monitoring the NTCs will require (n+1) lines for n detector modules. These lines are routed locally between the detector modules and the EoS card. As they are only monitoring connections, very small cable diameters are sufficient.

# B. DCS for serial powered modules

In the case of serial powered modules, all modules of a chain are supplied by one power line and its return. They are connected to a current source, which will be located outside the detector, most likely in the counting room. Shunt and linear voltage regulators inside the front end chips produce the required voltage. In this way serial powering reduces the number of power lines and hence minimizes the passive material inside the detector volume. Furthermore, the power losses in the cables are reduced. The basic functionality of a serial powered pixel stave was demonstrated some years ago, see [2].

A drawback of serial powering is that individual modules can no longer be disabled from the outside. A local mechanism is required to switch single modules on and off. The MPC (module protection chip, developed by Bonn University [3]) bypasses the module and provides overvoltage protection. Its bypass circuit must be steered locally. A capacitively coupled DCS chip at the end of stave would be a good candidate. Only one line per module is required, see Figure 3.

Because the detector modules sit at different DC levels, their LV monitoring requires a voltage divider between the monitoring lines and the DCS chip inputs. Unlike the DCS chip and the MPC, which can be developed in the same deep submicron technology as the front end chips, the voltage divider must be developed in a technology which withstands higher DC levels, up to 20 V depending on the number of detector modules which are serialized.

To avoid additional sense lines between the detector modules and the EoS card, different 'spying' methods are under investigation. As the HV return line, the data lines and the bypass control line are either on the DC level or depend on the DC level of the dedicated module, they offer in principle the possibility of LV monitoring. As these methods even allow monitoring closer to the load, it must be investigated to what extent a DC-DC scheme could also benefit from these plans.

As the temperature monitoring is completely independent of the LV powering scheme, it can be the same for both powering schemes. Together with the n bypass control lines, the (n+1) temperature lines give a total of (2n+1) DCS lines between detector modules and the EoS card for n serial powered modules.
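
The role of the divider can be illustrated with a minimal sketch. The per-module voltage of 2.5 V, the chain length of 8 modules and the 1 V ADC input range are illustrative assumptions, not design values; only the upper bound of about 20 V is taken from the text. All function names are hypothetical.

```python
def dc_levels(n_modules, v_module):
    """Upper potential of each module in a serial chain, measured from the chain return."""
    return [(k + 1) * v_module for k in range(n_modules)]


def divider_ratio(v_max, adc_full_scale):
    """Attenuation factor that maps the highest tap voltage onto the ADC full scale."""
    return adc_full_scale / v_max


def module_voltage(tap_hi, tap_lo, ratio):
    """Recover one module's supply voltage from two divided tap readings."""
    return (tap_hi - tap_lo) / ratio


# Assumed example values (hypothetical): 8 modules at 2.5 V each, 1 V ADC range.
levels = dc_levels(8, 2.5)
ratio = divider_ratio(levels[-1], 1.0)
v_mod4 = module_voltage(levels[3] * ratio, levels[2] * ratio, ratio)
print(levels[-1], ratio, v_mod4)
```

The sketch only shows why the divider inputs must tolerate the full chain voltage while the DCS chip inputs see a scaled-down level.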

# C. Overview of the DCS architecture

Besides the detector modules, the EoS card, the opto board, and the monitoring of the detector volume and of the cooling are subject to the control system.

The EoS card mainly houses the GBT. Monitoring of its supply voltage and of the EoS card's temperature is necessary; in addition, a reset to the GBT should be available. These tasks can also be handled by the DCS chip, which is mounted on the EoS card, see Figure 2 and Figure 3.

The DCS needs of the opto boards are similar. Monitoring of the voltages, which supply the different components of the opto electrical transceiver, a reset and the temperature supervision can be performed by a DCS chip installed on the opto board or close to it.

In the case of the environmental and cooling monitoring, which mainly consists of temperature and a few humidity sensors, a DCS chip installed in their vicinity reduces the number of cables, which must be routed to the exterior.



Figure 4: DCS architecture

Figure 4 summarizes the supervision of all DCS items and gives an overview on the general DCS architecture. Three independent paths are built: diagnostics, control and safety.

Through the diagnostics path, detailed information can be merged into the data stream on request. A temperature and low voltage reading per front end chip could be useful additional information. Several global registers are foreseen in the front end chips, which could also be used for DCS. The DCS data will be merged into the data stream. In the off-detector readout processors, routines search for DCS data and transfer them to the correct system. This will be a powerful tool to debug, understand and tune the detector, but the information will only be available when the front end chips are correctly configured.

As the control and feedback path contains all parameters which are essential for the operation of the detector, it must be available for all use cases and therefore cannot rely on the functionality of the readout chain. It covers the setting of values and their monitoring, allows channels to be switched on or off, sends a reset to components which are stuck, and monitors temperatures. High reliability is required. Control and its feedback should obviously have the same granularity. A detector module will be the smallest unit which can be handled independently. The information will be processed either by the power supplies or directly close to the detector modules. Major parts are described in the previous sections. The core of the on-detector control will be a DCS chip, which handles the data.

The DCS master shown in Figure 4, which might act as a middleman between the front end, the DCS chip, and the DCS computers, can be defined when investigations on the protocol of the DCS chip are further advanced (see also next section).

The safety path is based on interlock circuits. Irradiated silicon sensors in particular can be irreparably damaged by heat-ups. To protect the detectors against overheating due to errors in the cooling system, delamination or thermal runaway, a hardwired interlock system is necessary. This independent interlock system should ensure the safety of the detector. As the highest level of reliability is required, its active components should be located outside the detector and act directly on the power supplies.

As neither high precision nor high granularity is required, two to four temperature sensors could be combined to create one interlock signal. While the average temperature is measured by the interlock system, the temperature per detector module can be measured by the DCS chip [4].

In summary, the level of reliability will be highest for the safety path, high for control, and can be lower for diagnostics. The required granularity follows the opposite order. This is closely related to the required availability: the highest reliability goes with permanent availability, whereas for values which require a lower level of reliability an intermittent availability is normally sufficient.

# D. Cable balance

As the control should be available for all use cases, the DCS chip should have its own powering lines. Together with the communication lines this results in three to five cable pairs which must be routed from the DCS chip to the external world. In addition, one cable pair per four detector modules should be foreseen for the interlock. Where the cables must be routed to and where they will be bundled further is still under discussion. Compared to the current detector, where three cable pairs per detector module are led towards the outside, this gives an impressive reduction in the number of DCS cables, as can be seen in Table 2.

|                  | 8 detector modules / DCS chip | 16 detector modules / DCS chip |
|------------------|-------------------------------|--------------------------------|
| current detector | 24                            | 48                             |
| DCS chip         | 3-5                           | 3-5                            |
| Interlock        | 2                             | 4                              |
| sum              | 5-7                           | 7-9                            |

Table 2: DCS cables [pairs] from the end of stave to the exterior
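
The entries of Table 2 follow directly from the counting rules stated above (three pairs per module in the current detector, three to five pairs per DCS chip, one interlock pair per four modules). The short sketch below reproduces them; the function and parameter names are illustrative only.

```python
import math


def external_pairs(modules_per_chip, chip_pairs=(3, 5),
                   modules_per_interlock=4, legacy_pairs_per_module=3):
    """Return (current-detector pairs, interlock pairs, (min, max) new total)."""
    legacy = legacy_pairs_per_module * modules_per_chip
    interlock = math.ceil(modules_per_chip / modules_per_interlock)
    total = (chip_pairs[0] + interlock, chip_pairs[1] + interlock)
    return legacy, interlock, total


for n in (8, 16):
    print(n, external_pairs(n))
```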

The future detector will be built from a much larger number of detector modules than the current detector: 5888 compared to 1744 modules. However, as the number of cables per detector module is much smaller, the re-use of the existing external cables, which is one of the boundary conditions for the system design, should be possible. A more detailed analysis should consider the DCS cables of the environmental monitoring and of the on-detector opto transceiver.

The number of internal cables from the modules to the EoS card is also smaller than in the current detector, where there are 6 lines per detector module, compared to a maximum of (2 + 1/n) lines per module if n modules form a half stave (the term 1/n comes from the common return line, which is shared by n modules). In the case of a DC-DC powering scheme even fewer cables are necessary. This results in a reduction of at least 50% for the internal DCS cabling.
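
The quoted reduction can be checked with a small calculation. The sketch below uses only the figures given in the text (6 lines per module today, 2 + 1/n in the new scheme); the function names are illustrative.

```python
def internal_lines_per_module(n):
    """Lines per module when n modules share one common return line."""
    return 2 + 1 / n


def internal_reduction(n, legacy_lines_per_module=6):
    """Fractional reduction of internal DCS lines relative to the current detector."""
    return 1 - internal_lines_per_module(n) / legacy_lines_per_module


for n in (1, 8, 16):
    print(n, internal_lines_per_module(n), round(internal_reduction(n), 3))
```

The worst case is n = 1, where the reduction is exactly 50%; sharing the return line among more modules only improves the figure.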

# IV. THE DCS CHIP

As shown in the previous sections a DCS chip would be a good tool to reduce the material inside the detector volume while it supports the new powering schemes in the best way.

# A. Requirements

From the units which are supervised by the DCS chip, like the detector modules, the EoS card, the opto board or the environment, the features of the chip follow directly. As one chip design should cover all tasks, the requirement list is a set union of the individual lists:

- about 35 differential ADC channels, 10-12 bit
- about 17 digital outputs
- local clock and Vref for the ADC
- optionally supply of NTCs
- 2 x 16 bit counters
- communication interface, which is able to drive long cables
- all input/output signals should be differential
- chip ID
- low power consumption to allow an operation without cooling
- radiation level $1.3 \times 10^{16}$ n<sub>eq</sub>/cm<sup>2</sup>, 570 Mrad

The harsh radiation environment [5] of the pixel volume is obviously the largest challenge. We aim to use the same deep submicron process as used for the front end readout chips in order to benefit from the extensive experience and know-how which already exist in this field. Besides the overall radiation hardness, special efforts will be necessary to protect the registers of the digital outputs against SEU (single event upsets). It could be fatal for the operation of the detector if a module were switched on or off in error.

To make the DCS chip as fail-safe as possible, only the minimum functionality should be foreseen. If possible, complex data processing should be done outside the detector. The smaller the number of active circuits, the more robust the design can be. This will also reduce the power consumption. The ADC accuracy and the speed of data processing should likewise be investigated further with a view to reducing power consumption.

The main design criterion for the communication interface will be its robustness, while the speed of data transfer is of low importance for slow control data. A good compromise between baud rate and cable length must be found. For the moment we consider SPI and I2C to be possible candidates. While SPI might be somewhat more robust, I2C has the advantage of fewer lines.

# B. First prototype

As a starting point we defined a DCS prototype chip, whose functional blocks can be seen in Figure 5. As the choice of the communication protocol is still open, both an I2C and an SPI interface are foreseen. To study the behaviour of the chip in detail, all inputs and outputs are routed to the outside.



Figure 5: Block diagram of DCS prototype chip

For the first prototype we concentrated on the chip interface. While we implemented a standard I2C protocol, the SPI is modified to include slave addresses. In addition, five digital outputs, two counters for the readout of capacitive humidity sensors, and the connections to an external ADC are available. Because almost all components are digital circuits, this first prototype has been submitted in a standard (non radiation hard) 350 nm CMOS technology. The tests planned with the prototype can be grouped in two categories, choice of the communication interface and system aspects, which are described further in the next section. Foreseen tests are:

- Verification of SPI and I2C protocols,
- study impact of cable length,
- build a DCS network,
- control external ADC,
- test digital outputs,
- read out of humidity sensors.

# V. THE CONTROL BOARD FOR THE STAVE EMULATOR

To investigate and understand all aspects of a complete pixel stave, a stave emulator setup has been developed at Bonn University [6]. This test bench allows the various aspects of data coding, management, and transmission to be evaluated, and questions concerning powering and detector control to be studied. The detector modules and the EoS controller are represented by emulator cards. The DCS emulator, called COBOLT (COntrol BOard for the stave emuLaTor) and developed at Wuppertal University, can be connected via an adapter card.

All functionality which should be inside a later DCS chip is placed on the small printed circuit board of COBOLT. In the first iteration individual components are used. The aim is to verify that all DCS functionality is covered. The core of the board is an ATmega640V microcontroller including a 16 channel 10 bit ADC. Several plug-ins allow further measurements to be added, for example studies of a voltage divider.

While the interaction between DCS and the overall stave system is tested continuously, the DCS emulator will be replaced step by step by more realistic and final components.

First tests concerning the steering of the bypass control, which is required for a serial powered stave, have been performed successfully. Studies of how the module's LV can be monitored for a serial powered stave are ongoing. As soon as the DCS prototype chip is delivered, it will also be inserted into the emulator system, replacing the microcontroller. A prototype of the DCS master, which will be required to establish the communication to the outer world, is also under development.

# VI. SUMMARY & OUTLOOK

The pixel detector which is planned for the sLHC will require a completely new DCS architecture. Support of the new powering schemes, serial powering or a DC-DC scheme, and the reduction of material inside the active detector volume are the main design criteria.

The DCS items have been identified and first requirements defined. We propose a new DCS architecture based on three independent paths: diagnostics provided by the readout system, control and feedback mainly performed by a DCS chip, and safety ensured by a hardwired interlock system. The necessity of a DCS chip follows from the boundary conditions of the on-detector control. Its required characteristics are presented.

A first prototype DCS chip has been submitted. The aim is mainly to evaluate the communication interface and to study a DCS network. Furthermore, a new prototype will be developed in a radiation hard CMOS technology in order to investigate strategies for SEU-safe registers and a bit-flip-resistant data transfer.

At the same time the definition of the DCS master must continue, and it should be evaluated to what extent the design of a pixel DCS chip can be merged with other developments.

#### VII. REFERENCES

[1] P. Moreira 'GBT Project Status' at: http://indico.cern.ch/getFile.py/access?contribId=22&sessionId=20&resId=0&materialId=slides&confId=45460

[2] D.B. Ta, T. Stockmanns, P. Fischer, J. Grosse-Knetter, Ö. Runolfsson, N. Wermes 'Concept, realization and characterization of serially powered pixel modules', NIM-A 2006, Volume 565, p. 113-118

[3] L. Gonella 'Module Protection Chip "MPC" for Serial Powered Pixel Module' at: http://indico.cern.ch/getFile.py/access?contribId=4&resId=3&materialId=slides&confId=52375

[4] M. Garcia-Sciveres 'Integrated Stave Concepts' at: http://indico.cern.ch/contributionDisplay.py?contribId=8&sessionId=3&confId=21398

[5] M. Garcia-Sciveres 'Phase 2 ATLAS pixel system architecture and requirements' at: http://indico.cern.ch/getFile.py/access?contribId=1&sessionId=1&resId=0&materialId=slides&confId=47853

[6] http://icwiki.physik.uni-bonn.de/twiki/bin/view/Systems/StaveEmulator

# High-Speed Serial Optical Link Test Bench Using FPGA with Embedded Transceivers

Annie C. Xiang <sup>a</sup>, Tingting Cao <sup>a</sup>, Datao Gong <sup>a</sup>, Suen Hou <sup>b</sup>, Chonghan Liu <sup>a</sup>, Tiankuan Liu <sup>a</sup>, Da-Shung Su <sup>b</sup>, Ping-Kun Teng <sup>b</sup>, Jingbo Ye <sup>a</sup>

<sup>a</sup> Department of Physics, Southern Methodist University, Dallas, TX 75275, U.S.A <sup>b</sup> Institute of Physics, Academia Sinica, Nangang 11529, Taipei, Taiwan

cxiang@smu.edu

# Abstract

We develop a custom Bit Error Rate test bench based on Altera's Stratix II GX transceiver signal integrity development kit, demonstrate it on a point-to-point serial optical link with data rates up to 5 Gbps, and compare it with a commercial stand-alone tester. The 8B/10B protocol is implemented and its effects are studied.

A variable optical attenuator is inserted in the fibre loop to induce transmission degradation and to measure receiver sensitivity. We report comparable receiver sensitivity results using the FPGA-based tester and the commercial tester. The results of the FPGA-based tester also show that there are more one-to-zero bit flips than zero-to-one bit flips at lower error rates. In 8B/10B coded transmission, there are more word errors than bit flips, and the total error rate is less than two times that of non-coded transmission. The total error rate measured agrees with simulation results, according to the protocol setup.

#### I. INTRODUCTION

High-speed serial optical data links provide a solution for the readout systems of High Energy Physics experiments, offering high bandwidth, low power, low mass and a small footprint. Many gigabit-per-second links are currently deployed at CERN's Large Hadron Collider (LHC), such as GLINK in calorimeter readout [1] and GOL in the silicon tracker [2]. The next generation of multi-gigabit-per-second links is widely proposed for operation at the Super LHC upgrade [3].

Meanwhile, commercial FPGAs with embedded multi-gigabit transceivers have become readily accessible. Altera's Stratix II GX family and Xilinx's Virtex 5 FXT family offer comprehensive data interface designs that operate up to 6.5 Gbps. The newest Stratix IV GX and Virtex 6 HXT push the serial transceiver rate up to 10 Gbps.

These reconfigurable transceivers, combined with programmable logic fabric, make it feasible to develop a custom Bit Error Rate Tester (BERT) capable of verifying and characterizing a wide range of digital communication systems. Once the performance of the transceiver is verified, it can be deployed to demonstrate link architecture at system level. Compared with traditional standalone BERT equipment, an FPGA-based BERT is much cheaper. Its portable size and accessibility also make it practical to set up for different DUTs in irradiation tests. Several groups have reported BER tests in a number of Single Event Effects (SEE) studies on optical and electrical components [4][5].

Expedient customization is another major advantage of FPGA implementation. Reconfigurable hardware and built-in IPs support flexible prototyping. Function blocks are encapsulated and pluggable; for example, the 8B/10B encoder and decoder can be enabled or bypassed to emulate different system architectures. It is important to understand how these standard communication protocols affect the transmission of event data as well as time, trigger and control information. A PC user interface through USB is also important to support real-time access to detailed error logs for studying these effects as well as link degradation due to irradiation.

A set of Bit Error Rate tests, also known as receiver sensitivity tests, is performed. Using the same physical transmission link between transceivers, we report comparable results using the FPGA-based BERT and the commercial tester. We also conduct tests using non-coded data and 8B/10B encoded data, and compare the results to those of simulation.

#### II. TEST BENCH SETUP

#### A. Optical link

We develop the test bench based on Altera's Stratix II GX transceiver signal integrity development kit and demonstrate it on a point-to-point serial optical link. A picture of the test bench setup is shown in Figure 1.

The FPGA-based BERT generates a pseudo-random binary sequence (PRBS) at 5 Gbps. Its embedded transmitter drives a differential pair of coaxial cables that is connected to an SFP+ module. The SFP+ module consists of an optical transmitter (laser diode) and an optical receiver (photo diode). They convert the serial signal from electrical to optical and from optical back to electrical. The light output of the optical transmitter is coupled into two meters of OM3 grade multimode fibre. A variable attenuator, which can be controlled manually or automatically, is inserted in the fibre loop. Fibre from the attenuator is plugged back into the optical receiver of the same SFP+ module. The SFP+ transceiver module is plugged onto a carrier board designed in house. The board is impedance matched for high speed traces, and provides power and configuration to the module. Another pair of coaxial cables loops the electrical output signal of the optical receiver back to the FPGA's embedded receiver. The Altera development kit supports communication with a PC through a USB port via an FTDI interface. A user interface panel is coded in LabVIEW to download configurations to the FPGA and upload error logs from it.

The physical media dependent portion of the data link begins and ends at the input and output coaxial cables, inclusive. A stand-alone commercial BERT is plugged in the place of the FPGA-based BERT for comparison. Data are collected on a set of SFP+ modules from various manufacturers and a set of fibre loops of different lengths. There are no discrepancies among the test results, and only the results of one scenario are detailed in Section III.



Figure 1: Setup for FPGA-based BERT driving serial optical link

#### B. FPGA with embedded transceiver

The Stratix II GX FPGA dedicates the right side banks to transceiver circuitry that transmits and receives high-speed serial data streams. Each transceiver supports a number of protocols and operation modes with embedded hardware blocks and built-in firmware IPs [6].

We instantiate the transceivers to operate in basic mode through the provided megafunction. The instantiation is illustrated in Figure 2.

The FIFO buffer decouples clock phase variations across the programmable logic device (PLD) and the transceiver domains. The byte serializer allows the PLD to run at half the clock rate in order to match the transceiver speed. The byte ordering block is used in conjunction with the byte deserializer to ensure the correct order of the least and most significant bytes. The double-width data paths of the channel serializer and channel deserializer are enabled to support a data rate of 5 Gbps. Two cascaded 8B/10B encoders and decoders can be enabled or bypassed. The channel data path is 32 bits wide for non-coded transmission and 40 bits wide for 8B/10B transmission.

The on-board 156.25 MHz oscillator is enabled as the input reference clock for the transmitter and receiver clock synthesisers to generate the required frequencies. The clock recovery unit works in automatic lock mode, i.e., it initially locks to the reference clock and then switches over to the incoming data stream.

The word aligner detects specific patterns, aligns word boundaries and flags link synchronization according to a protocol-specific or custom-defined state machine. For simplicity, we use the same specified word pattern for alignment, ordering and synchronization. Several resets may therefore be required to achieve true synchronization, where all status flags are asserted. Dynamic reconfiguration supports switching of analogue settings such as pre-emphasis, equalization and differential voltage amplitude at run-time through on-board DIP switches and push buttons.
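
The relation between serial line rate, parallel data path width and parallel clock frequency can be checked with a few lines of Python. The sketch assumes that the serial line rate is held at 5 Gbps in both the non-coded and the 8B/10B mode; the function names are illustrative.

```python
def parallel_clock_mhz(line_rate_gbps, width_bits):
    """Parallel-side clock needed to sustain a serial line rate at a given word width."""
    return line_rate_gbps * 1000.0 / width_bits


def payload_rate_gbps(line_rate_gbps, coded):
    """User data rate: 8B/10B carries 8 payload bits in every 10 line bits."""
    return line_rate_gbps * 8 / 10 if coded else line_rate_gbps


print(parallel_clock_mhz(5.0, 32), payload_rate_gbps(5.0, coded=False))
print(parallel_clock_mhz(5.0, 40), payload_rate_gbps(5.0, coded=True))
```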



Figure 2: Simplified diagram of the transceiver implementation in hardware and firmware

#### C. Pattern generator and error detector

The pattern generator and error detector are custom coded in the FPGA programmable logic fabric, in conjunction with the embedded transceiver, to generate and verify data streams that pass through the physical optical link. Pseudo-random binary sequences (PRBS) of length $2^7-1$ and $2^{23}-1$ are implemented in polynomial shifters as basic test patterns. Only results of the $2^7-1$ PRBS are reported in Section III.
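
A polynomial shifter of this kind can be sketched in software as a linear feedback shift register. The generator polynomials used below, $x^7 + x^6 + 1$ and $x^{23} + x^{18} + 1$, are the ones commonly used for these sequence lengths and are an assumption here, since the text does not state which polynomials were chosen. The function name is illustrative.

```python
PRBS7 = (7, 6)     # assumed polynomial x^7 + x^6 + 1
PRBS23 = (23, 18)  # assumed polynomial x^23 + x^18 + 1


def prbs_bits(taps, seed, count):
    """Generate `count` bits from a Fibonacci LFSR with feedback taps (n, m)."""
    n, m = taps
    mask = (1 << n) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the register stays at zero")
    out = []
    for _ in range(count):
        # New bit is the XOR of the two tapped register stages.
        new = ((state >> (n - 1)) ^ (state >> (m - 1))) & 1
        out.append(new)
        state = ((state << 1) | new) & mask
    return out


bits = prbs_bits(PRBS7, 0x7F, 254)
print(bits[:127] == bits[127:], sum(bits[:127]))
```

The sequence repeats after $2^7-1 = 127$ bits, and one period contains one more '1' than '0', as expected for a maximal-length sequence.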

The functions of the pattern generator and error detector and their interfaces with the transceiver are controlled by state machines as illustrated in Figure 3.



Figure 3: State machine of pattern generation (above) and error detection (below)

After power-on or reset, synchronization data is sent from transmitter to receiver until the frequency is locked and word alignment is achieved. The pattern generator is then enabled. The error detector uses the incoming data as a seed to generate the expected output data until pattern match is declared. The error detector then switches to an internal seed. Therefore, when the link is stable, an erroneous incoming bit cannot disturb the output generation of the error detector. Pattern match is declared when error-free incoming data is received for a specified number of consecutive clocks. Pattern match is not deasserted, however, for consecutive error cycles. The frequency lock indicator will flag if the error cycles lead to the link losing synchronization. Error injection that simulates a single bit flip is provided by XORing the least significant bit. Error types, type counters and time stamps are logged in FIFOs for user access. Error statistics are computed on the PC side.
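
The seeding behaviour of the error detector can be sketched as a bit-serial software model. This is a simplified illustration, not the FPGA implementation: it works on single bits rather than parallel words, assumes the PRBS $2^7-1$ polynomial $x^7 + x^6 + 1$, and uses a hypothetical match threshold of 32 consecutive error-free bits.

```python
def prbs7(seed, count):
    """Reference PRBS 2^7-1 generator (assumed polynomial x^7 + x^6 + 1)."""
    state, out = seed & 0x7F, []
    for _ in range(count):
        new = ((state >> 6) ^ (state >> 5)) & 1
        out.append(new)
        state = ((state << 1) | new) & 0x7F
    return out


class PrbsChecker:
    """Seeds itself from incoming data, then free-runs and counts bit errors."""

    def __init__(self, match_threshold=32):
        self.state = 0
        self.run = 0            # consecutive error-free bits while seeding
        self.matched = False    # pattern match flag
        self.errors = 0
        self.threshold = match_threshold

    def push(self, bit):
        expected = ((self.state >> 6) ^ (self.state >> 5)) & 1
        if self.matched:
            # Internal seed: shift in the prediction, so bad bits cannot disturb it.
            if bit != expected:
                self.errors += 1
            self.state = ((self.state << 1) | expected) & 0x7F
        else:
            # Incoming data acts as the seed until enough bits were predicted correctly.
            self.run = self.run + 1 if bit == expected else 0
            self.state = ((self.state << 1) | bit) & 0x7F
            if self.run >= self.threshold and self.state != 0:
                self.matched = True


stream = prbs7(0x7F, 300)
stream[100] ^= 1  # inject a single bit flip by XOR
checker = PrbsChecker()
for bit in stream:
    checker.push(bit)
print(checker.matched, checker.errors)
```

Because the checker free-runs once pattern match is declared, the injected flip is counted as a single error and does not propagate into later predictions.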

#### III. RESULTS

#### A. Signal integrity

We measure signals at several test points along the serial optical data link using oscilloscopes' electrical or optical modules. The test points and test parameters are illustrated in Figure 4. Test point 1 measures channel transmitter output. Test point 2 measures optical transmitter output. Test point 3 measures optical receiver input and channel receiver input is measured at test point 4. Example eye diagrams of the channel transmitter output and channel receiver input are shown in Figure 5. Zero pre-emphasis setting results in the best eye opening at the far end of optical receiver output. Transmitter PLL bandwidth, equalization and DC gain have no effect on the error rate performance under this test scenario.



Figure 4: Schematic of test points and test parameters along a complete serial optical data link. (RT: real-time; SA: sampling; OE: optical-electrical converter; VOA: variable optical attenuator.)

The goal of measuring waveforms at different test points is to assign power (vertical) and jitter (horizontal) budgets along these interfaces so that components that comply with these values will work together as one system. Several industrial standards, such as 4G Fibre Channel and 10GbE [7][8], provide component acceptance values. For our application, we must ensure that irradiation degradation is also accommodated when referring to these values. Jitter measured at the channel transmitter output is 45 ps, or 0.225 UI (unit interval), where the unit interval is 200 ps for 5 Gbps transmission. It is below the scaled reference values of both 4GFC and 10GbE. This validates the use of the Stratix II GX transmitter to characterize downstream components. Jitter measured at the channel receiver input is 60 ps, or 0.30 UI. This number is the convolved contribution of the channel transmitter, optical transceiver, and fibre loop. The difference between this value and the jitter acceptance value of the channel receiver is available for assignment to system degradations, such as fibre dispersion, connectors and irradiation. The rise/fall time of around 45 ps at the channel transmitter output and of around 55 ps at the channel receiver input also validates the use of the embedded transceiver to characterize link components and evaluate system bit error rate performance.
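
The conversion between jitter in picoseconds and in unit intervals used above is a simple ratio, sketched here with illustrative function names.

```python
def unit_interval_ps(rate_gbps):
    """Duration of one bit (unit interval) in picoseconds."""
    return 1000.0 / rate_gbps


def jitter_ui(jitter_ps, rate_gbps):
    """Express a jitter value as a fraction of the unit interval."""
    return jitter_ps / unit_interval_ps(rate_gbps)


print(unit_interval_ps(5.0), jitter_ui(45.0, 5.0), jitter_ui(60.0, 5.0))
```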



Figure 5: Eye diagram of the near-end transmitter (test point 1, above) and far-end receiver input (test point 4, below) of 5 Gbps, PRBS 2<sup>7</sup>-1 data pattern at room temperature.

#### B. Basic BER

A variable optical attenuator is inserted in the fibre loop of the optical data link to induce transmission degradation. The bit error rate is measured at different attenuation levels. This test is also used to characterize the receiver sensitivity, i.e. the minimum optical power for achieving a specified bit error rate, here $10^{-12}$. In the noise-dominated region, this relationship follows the general trend of the error function of a Gaussian distribution, and discrepancies are attributed to system nonlinearities as power penalties.

We compare the measurement results of the FPGA-based BERT with those of a commercial BERT. The results are shown in Figure 6. The two testers obtain the same receiver sensitivity value for the same data link. The commercial BERT result deviates from the FPGA-based BERT result as the bit error rate increases. This difference is due to the setup: the commercial BERT uses the same clock for both channel transmitter and receiver, whereas the FPGA-based BERT can use the clock recovered from the data stream as the channel receiver clock to mask out part of the system jitter.

We also observe that there are more one-to-zero bit flips than zero-to-one bit flips at lower error rates. This is due to the post-amplification circuitry design of the optical receiver, which favours one state over the other.
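
The Gaussian trend mentioned above is commonly written as $\mathrm{BER} = \tfrac{1}{2}\,\mathrm{erfc}(Q/\sqrt{2})$. The sketch below evaluates this relation and assumes, as an idealisation that is not taken from the measurements, that the Q factor scales linearly with the received optical power. Function names are illustrative.

```python
import math


def ber_from_q(q):
    """Bit error rate for Gaussian noise at a given Q factor."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def q_for_ber(target, lo=0.0, hi=10.0):
    """Find the Q factor that yields `target` BER by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ber_from_q(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ber_at_offset(offset_db, q_ref):
    """BER when the optical power is offset_db away from the reference point.

    Assumes Q is proportional to received optical power (idealised receiver).
    """
    return ber_from_q(q_ref * 10 ** (offset_db / 10.0))


q_sens = q_for_ber(1e-12)
print(round(q_sens, 2), ber_at_offset(-1.0, q_sens))
```

Under these assumptions a BER of $10^{-12}$ corresponds to a Q factor of about 7, and the error rate rises steeply as the received power drops below the sensitivity point.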



Figure 6: Bit error rate as a function of received optical power for 5Gbps, non-coded PRBS 2<sup>7</sup>-1 data transmission

#### C. 8B/10B word error

The 8B/10B coding is used by many protocols to achieve a DC-balanced data stream, sufficient level transitions, and unique code groups. Stratix II GX devices support two dedicated 8B/10B encoders in each transceiver channel. They work in cascade mode and complement the word aligner to achieve boundary synchronization.

The 8B/10B coding algorithm is implemented per the 802.3ae standard [8]. In such a setup, a single bit flip in the serial data stream can affect one code group, resulting in multiple bit errors, or affect two code groups, resulting in invalid codes. When an affected code group is diagnosed as invalid, the output of the decoder is irrelevant. It is therefore simpler to record word errors instead of bit errors in this case. When the error induced by a single bit flip spreads into multiple code groups, the propagation delay is uncertain and depends on the transmitted data. Error propagation is eventually stopped by nonzero disparity blocks, and the timing distribution of the propagation delay decreases rapidly. This knowledge is important for building criteria to evaluate coding schemes that can potentially cause inter-event interference, as in the case discussed above.

Figure 7 shows the Monte-Carlo simulation results of the error position distribution of 8B/10B coded transmission when the non-coded transmission bit error rate is $10^{-4}$. A total of 10,000 errors are injected, which is equivalent to $10^8$ bits in the serial data stream.



Figure 7: Simulation result of error rate of 8B/10B encoded data transmission when the non-coded serial data error rate is  $10^{-4}$ .

Table 1 shows the results of the simulation repeated at several levels of error rates. The majority of errors are word errors resulted from invalid codes. Most word errors occur in the first word after the bit flip (50%) as compared to the same word (18%) of the bit flip, and much less errors occur in the second word and insignificant amount occurs thereafter. Bit errors are also restricted to the same word of the bit flip.

Table 1: Monte-Carlo results of error rates of 8B/10B coded transmission with different non-coded transmission error rates

| serial err. rate                | $10^{-4}$ | $10^{-6}$ | $10^{-8}$ | $10^{-10}$ |
|---------------------------------|-------|-------|-------|-------|
| # of err. injected              | 9997  | 10000 | 10000 | 10000 |
| total bit flip err.             | 7239  | 7068  | 7175  | 7197  |
| total word err.                 | 13469 | 13618 | 13526 | 13632 |
| word err. without spread        | 1774  | 1743  | 1751  | 1673  |
| err. spread to 1st word         | 5135  | 5125  | 5094  | 5134  |
| err. spread to 2nd word         | 1393  | 1392  | 1420  | 1446  |
| err. spread to 3rd word & later | 532   | 568   | 558   | 586   |
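The percentages quoted in the text can be recovered from Table 1 if the word-error counts are normalized to the number of injected errors. That normalization is an assumption made here for illustration, as are the dictionary keys. A short Python sketch:

```python
# Counts copied from Table 1, one entry per non-coded serial error rate.
table = {
    "1e-4":  {"injected": 9997,  "same_word": 1774, "first_word": 5135, "second_word": 1393, "later": 532},
    "1e-6":  {"injected": 10000, "same_word": 1743, "first_word": 5125, "second_word": 1392, "later": 568},
    "1e-8":  {"injected": 10000, "same_word": 1751, "first_word": 5094, "second_word": 1420, "later": 558},
    "1e-10": {"injected": 10000, "same_word": 1673, "first_word": 5134, "second_word": 1446, "later": 586},
}

def spread_fractions(row):
    """Fraction of injected errors that end in each propagation category."""
    n = row["injected"]
    return {key: count / n for key, count in row.items() if key != "injected"}

fractions = {rate: spread_fractions(row) for rate, row in table.items()}
for rate, frac in fractions.items():
    print(rate, {key: f"{100 * value:.1f}%" for key, value in frac.items()})
```

Under this normalization the first-word and same-word fractions stay close to one half and one sixth at every simulated error rate.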

Error rate measurements of both non-coded and 8B/10B coded transmission are performed using the FPGA-based BERT. The results are shown in Figure 8. They confirm that there are more word errors than bit errors, and that the total number of word errors in 8B/10B transmission is less than twice that of the non-coded transmission.



Figure 8: Bit errors and word errors as a function of received optical power for 5Gbps, non-coded PRBS 2<sup>7</sup>-1 data transmission vs. 8B/10B coded data transmission

#### **IV. CONCLUSIONS**

A test bench for a high-speed serial optical link using Altera's Stratix II GX transceiver SI development kit is demonstrated. Its performance satisfies the tentative requirements for 5 Gbps point-to-point data link applications. Optical receiver sensitivity test results from the FPGA setup agree with those from a standalone commercial BER tester.

The development of a custom BER tester allows us to investigate detailed statistics of the errors. We report that there are more one-to-zero bit flips than zero-to-one bit flips at lower error rates, due to the optical receiver circuitry deployed.

Word error rate and error propagation of the 8B/10B protocol are analyzed and simulated. We implemented the 8B/10B coding block in the FPGA-BERT, and the measurement results agree with the simulation results. The timing distribution of error propagation will prove important in evaluating the coding scheme appropriate to event data acquisition in experiments adopting such links.

#### V. ACKNOWLEDGEMENTS

The authors acknowledge the US-ATLAS R&D program for the upgrade of the LHC, and the US Department of Energy grant DE-FG02-04ER41299. We would also like to acknowledge Drs. Francois Vasey, Jan Troska and Paschalis Vichoudis at CERN for beneficial discussions.

#### VI. REFERENCES

- [1] M. L. Andrieux et al., "Irradiation studies of Gb/s optical links developed from the front-end read-out of the ATLAS liquid argon calorimeter," *Nuclear Physics B – Proceedings Supplements*, 78 (1-3), pp. 719 - 724, 1999.
- [2] P. Moreira, G. Cervelli, J. Christiansen, F. Faccio, A. Kluge, A. Marchioro, T. Toifl, J. P. Cachemiche, M. Menouni, "A radiation tolerant gigabit serializer for LHC data transmission," *Workshop on Electronics for LHC Experiments*, 2001.
- [3] F. Vasey et al, "The Versatile Link common Project," to be submitted to *JINST*.
- [4] J. Troska, A. J. Pacheco, L. Amaral, S. Dris, D. Ricci, C. Sigaud, F. Vasey, and P. Vichoudis, "Single-Event Upsets in photodiodes for Multi-Gb/s data transmission," *Topical Workshop on Electronics for Particle Physics, IEEE Radiation Effects Data Workshop*, 2008.
- [5] C. Xiang, T. Liu, C. A. Yang, P. Gui, W. Chen, J. Zhang, P. Zhu, J. Ye, "Total ionizing dose and single event effect studies of a 0.25 μm CMOS serializer ASIC," *IEEE Radiation Effects Data Workshop*, 2007.
- [6] Altera, "Stratix II GX EP2SGX90 Transceiver Signal Integrity Development Board: Reference Manual," online <u>http://altrea.com/literature/manual/rm\_si\_bd\_2sgx90.pdf</u>
- [7] NCITS standard Fibre Channel Physical Interfaces-FC-PI: Rev 13, 2001.
- [8] IEEE standard 802.3ae-2005: Gigabit Ethernet, Institute of Electrical and Electronics Engineers, NY, 2005.

# The Design of a High Speed Low Power Phase Locked Loop

Tiankuan Liu<sup>a</sup>, Datao Gong<sup>a</sup>, Suen Hou<sup>b</sup>, Zhihua Liang<sup>a</sup>, Chonghan Liu<sup>a</sup>, Da-Shung Su<sup>b</sup>, Ping-Kun Teng<sup>b</sup>, Annie C. Xiang<sup>a</sup>, Jingbo Ye<sup>a</sup>

<sup>a</sup> Department of Physics, Southern Methodist University, Dallas TX 75275, U.S.A. <sup>b</sup> Institute of Physics, Academia Sinica, Nangang 11529, Taipei, Taiwan

liu@physics.smu.edu

# Abstract

The upgrade of the ATLAS Liquid Argon Calorimeter readout system calls for the development of a radiation tolerant, high speed and low power serializer ASIC. We have designed a phase locked loop using a commercial 0.25- $\mu$ m Silicon-on-Sapphire (SoS) CMOS technology. Post-layout simulation indicates that the tuning range is 3.79 – 5.01 GHz and the power consumption is 104 mW. The PLL has been submitted for fabrication. The design and simulation results are presented.

# I. INTRODUCTION

The upgrade from the Large Hadron Collider (LHC) to the super-LHC (sLHC) poses new challenges for the ATLAS Liquid Argon Calorimeter readout system. As a key part of the readout system, the optical data links must operate at a data rate of about 100 gigabits per second (Gbps) per front-end board (FEB), 60 times higher than at present, whereas power consumption must be kept the same as at present [1]. The serializers used in the present optical data link system cannot meet the upgrade requirements on data rate and power consumption. Because of the radiation tolerance requirement, no commercial serializer is available for the upgrade of the optical data links. A radiation tolerant, high speed, and low power serializer Application-Specific Integrated Circuit (ASIC) is being developed for the upgrade of the optical data links.

We have designed the first serializer prototype ASIC (LOC1) working at 2.5 Gbps with a bit error ratio (BER) of  $10^{-11}$ . The second serializer prototype (LOC2), submitted in August 2009, is designed to work at 5 Gbps with power consumption of 500 mW [2]. Our next prototype (LOC3) aims at 8 – 10 Gbps; correspondingly, we have to develop a phase locked loop (PLL) operating at 4 – 5 GHz.

In LOC2, a ring-oscillator based PLL is implemented. This PLL works at 2.5 GHz with 173 mW power consumption. It is clear from the LOC2 design that a ring-oscillator based PLL will not reach 5 GHz easily. Back in LOC1, a cross-coupled LC-tank based PLL (LCPLL) was implemented [3]. This LCPLL uses two identical LC oscillators and two coupling circuits to generate quadrature outputs, the frequency depending both on the resonant frequency of each individual oscillator and on their coupling coefficients. This LCPLL can be tuned in the range from 2.4 GHz to 3.6 GHz with a random jitter component of 2 ps (RMS). Power consumption of the LCPLL is 280 mW and the chip area is 1.64 mm<sup>2</sup>. This design was abandoned because of its high power consumption, circuit complexity, and large chip area usage.

We have designed a high speed and low power LCPLL. The design goal is to operate in the 4 - 5 GHz range, providing the clock for the future 8 - 10 Gbps serializer, with less than 1 ps (RMS) random jitter and less than 120 mW power consumption. We choose a commercial 0.25-µm SoS CMOS technology because of its high speed, low power, absence of radiation-induced latch-up, and availability of high quality analog devices like inductors [4]. We have evaluated this technology for developing radiation tolerant ASICs for particle physics front-end readout systems [5]. We apply no special design technique for radiation tolerance except that we use static logic units instead of dynamic ones and make transistors as large as possible. This design has been submitted for fabrication together with LOC2. The design and simulation results are presented in this paper.

#### II. DESIGN

The top level schematic of the PLL is shown in Figure 1. An LVDS receiver (LVDSRec in the figure) converts LVDS signals to CMOS signals. The PFD is a phase and frequency detector. The charge pump converts the up and down signals into the control voltage. The LPF is a low pass filter. The LCVCO is an LC-tank-based VCO. The divider and driver consist of a divider (divide by 16) and a CML driver. For test purposes, we add an LVDS receiver and a CML driver as the input and output interfaces; they will be removed when the PLL is used in a serializer.



Figure 1: Schematic of the PLL

The PLL layout is shown in Figure 2. The PLL is located at a corner of a 9-mm<sup>2</sup> square chip shared by the PLL and a serializer. The PLL itself is  $1.4 \times 1.7$  mm<sup>2</sup>, where most of the area is occupied by the decoupling capacitors for the power supply (about 800 pF in total), the decoupling capacitors for the voltage reference (about 200 pF in total), and the capacitors (about 220 pF in total) used in the low pass filter.



Figure 2: Layout of the PLL

The charge pump gain is programmable in four levels (20, 40, 60, and 80  $\mu$ A) through two configuration bits. The LPF is a second order passive low pass filter whose 3-dB bandwidth is programmable in three levels through three configuration bits ( $c_0c_1c_2$ ). The PLL loop bandwidth and phase margin are calculated [6], as shown in Table 1.

Table 1: The phase margin (PM) and open loop bandwidth (BW)

| Charge pump gain (µA) | BW (MHz), $c_0c_1c_2$ = 001 | PM (deg), $c_0c_1c_2$ = 001 | BW (MHz), $c_0c_1c_2$ = 010 | PM (deg), $c_0c_1c_2$ = 010 | BW (MHz), $c_0c_1c_2$ = 100 | PM (deg), $c_0c_1c_2$ = 100 |
|-----------------------|------|------|------|------|------|------|
| 20                    | 0.42 | 46.3 | 0.84 | 46.3 | 1.68 | 46.3 |
| 40                    | 0.72 | 56.3 | 1.44 | 56.3 | 2.88 | 56.3 |
| 60                    | 1.02 | 59.5 | 2.04 | 59.5 | 4.08 | 59.5 |
| 80                    | 1.31 | 60.0 | 2.63 | 60.0 | 5.25 | 60.0 |

The new designs in the LCPLL are the VCO and the CML divider that is needed to match the 5 GHz VCO output frequency. We share the LVDS receiver, the PFD, the charge pump, the LPF, the CMOS divider, and the CML driver between the LCPLL and LOC2. More details of these blocks can be found in [2].

#### A. VCO design

Two common VCO implementations are ring oscillator based and LC-tank based. We choose an LCVCO because of its high speed, low power, low jitter, and insensitivity to radiation. The schematic of the LCVCO is shown in Figure 3. NMOS transistors M2 and M3, with their source and drain terminals tied together, are used as varactors. L0 and L1 are on-chip spiral inductors. The transistors M0 and M1 are negative resistance devices that compensate the energy loss of the LC tank consisting of the inductors and varactors. Transistors M4, M5, M6, M7 and the resistor R0 form a current reference [7], and transistor M8 is used to mirror the current reference into the LC tank. Transistors M9, M10, and M11 form a startup circuit for the current reference. In order to reduce channel-length modulation effects, all transistors in the current reference circuit are much longer than the minimum length. An array of decoupling capacitors (not shown in the figure) is used to reduce noise on the voltage reference v1.



Figure 3: Schematic of the LCVCO

The voltage-capacitance (V-C) curve of an NMOS varactor is shown in Figure 4. The V-C curve is monotonic and the maximum capacitance is two times larger than the minimum capacitance. Because the Q factor of NMOS varactors is larger than that of PMOS varactors of the same size, we choose NMOS varactors.



Figure 4: V-C curve of an NMOS varactor at 5 GHz

A 2.675-nH on-chip spiral inductor [8] is chosen because its peak frequency, 5.1 GHz, is close to our desired frequency. The Q factor of this inductor at 5 GHz is simulated to be 21.2.

The voltage-frequency (V-F) curve of the VCO at typical corner and room temperature is shown in Figure 5. Tuning range is from 3.79 GHz to 5.01 GHz at typical corner and room temperature. The oscillation frequency varies less than 8.7% from corner to corner and from temperature to temperature. At all corners and at three temperatures (-40, 27, and 85 °C), the V-F curve is monotonic. At typical corner and room temperature, the phase noise at 1 MHz off the carrier frequency of 4.9 GHz is -114.1 dBc/Hz. Power consumption of the VCO is 4.5 mW.
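As a rough cross-check of the tuning range, the ideal LC resonance $f = 1/(2\pi\sqrt{LC})$ can be inverted to give the tank capacitance at each end of the range. The sketch below assumes that the effective tank inductance equals one 2.675-nH spiral inductor and ignores losses; both are simplifications for illustration, not values taken from the post-layout netlist.

```python
import math

def tank_capacitance(f_hz: float, l_henry: float) -> float:
    """Capacitance that resonates with l_henry at f_hz, from f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / ((2.0 * math.pi * f_hz) ** 2 * l_henry)

L_TANK = 2.675e-9               # assumption: effective inductance of one spiral inductor
F_MIN, F_MAX = 3.79e9, 5.01e9   # simulated tuning range at the typical corner

c_max = tank_capacitance(F_MIN, L_TANK)   # largest capacitance gives the lowest frequency
c_min = tank_capacitance(F_MAX, L_TANK)
ratio = c_max / c_min                     # equals (F_MAX / F_MIN) ** 2

print(f"C_max = {c_max * 1e12:.3f} pF, C_min = {c_min * 1e12:.3f} pF, ratio = {ratio:.2f}")
```

The required capacitance ratio comes out below the factor of two available from the varactor alone, which would be consistent with fixed parasitic capacitance diluting the tunable part of the tank.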



Figure 5: V-F curve of the VCO

#### B. Divider design

Shown in Figure 6 is the schematic of the divider and driver. The first stage of the divider chain is a CML divider (divide by 2). The output magnitude of the CML divider is not large enough to drive the CMOS divider (divide by 8) and the CML driver, so a CML to CMOS converter is used after the CML divider. This converter has two pairs of complementary outputs. One pair is connected to the CMOS divider, the other to the CML driver. The CML driver is used to drive 50  $\Omega$  transmission lines for test purposes. The bandwidth of the CML driver is not high enough to match the VCO output signals, so the CML driver is placed after the CML divider.



Figure 6: Divider and driver schematic

The CML divider schematic [9] is shown in Figure 7. The CML divider consists of a master latch and a slave latch. The clock inputs of the slave latch are inverted compared to those of the master latch. The outputs of the master latch are fed into the slave latch, whereas the outputs of the slave latch are inverted and fed into the master latch. The latch schematic is shown in Figure 8.
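The master-slave feedback arrangement can be illustrated with a purely behavioural model. The Python sketch below works on sampled logic levels and does not model the CML circuit itself; the assumption that the master is transparent while the clock is high and the slave while it is low is chosen for illustration, as are the function names.

```python
def divide_by_two(clock):
    """Behavioural model of a master-slave divide-by-2.

    clock: sequence of 0/1 samples. The master latch is transparent while the
    clock is high, the slave latch while it is low, and the inverted slave
    output is fed back to the master input.
    """
    master = 0
    slave = 0
    out = []
    for clk in clock:
        if clk:
            master = 1 - slave   # master follows the inverted slave output
        else:
            slave = master       # slave follows the master output
        out.append(slave)
    return out

def count_rising_edges(samples):
    """Number of 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a == 0 and b == 1)

clock = [0, 1] * 16              # 16 clock periods, two samples per period
divided = divide_by_two(clock)
print(count_rising_edges(clock), count_rising_edges(divided))
```

The output toggles once per input clock period, so it has half as many rising edges as the input.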



Figure 8: Schematic of the CML latch

The CML to CMOS converter consists of a differential to single-ended converter (D2S) and two stages of CMOS inverters as shown in Figure 9. The schematic of the D2S is shown in Figure 10.



Figure 9: Schematic of the CML-to-CMOS converter



Figure 10: Schematic of the D2S

The CML divider and the CML to CMOS converter are simulated together. They can work up to 5.1 GHz at all corners and at temperatures from -40 °C to 85 °C.

#### **III. PERFORMANCES**

We perform post-layout simulation of the whole PLL. We remove all decoupling capacitors to expedite the simulation. During the simulation, the charge pump gain is set to 80  $\mu$ A and the loop bandwidth is set to 2.5 MHz. Shown in Figure 11 is the time interval error (TIE) waveform calculated from the differential VCO output signal. TIE is defined as  $TIE(n) = t(n) - n \cdot T - t_0$, where t(n) (n=1, 2, 3, ...) are the instants of zero-crossing points, T is the ideal signal period, and t<sub>0</sub> is a constant. T and t<sub>0</sub> can be calculated by a linear fit of t(n) with n after acquisition. T equals the input signal period divided by the dividing factor and t<sub>0</sub> equals the mean TIE after acquisition. The phase of the VCO output signals follows that of the input signals completely after about 9  $\mu$ s. This is the PLL acquisition time.
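The TIE calculation can be sketched in a few lines of Python. The crossing instants below are synthetic (an ideal 4.9 GHz period plus an alternating 0.5 ps offset) and only serve to exercise the functions; in practice only the crossings recorded after acquisition would be passed in. All names are illustrative.

```python
def linear_fit(xs, ys):
    """Least-squares fit ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def time_interval_error(crossings):
    """TIE(n) = t(n) - n*T - t0, with T and t0 taken from a linear fit of t(n) against n."""
    ns = list(range(1, len(crossings) + 1))
    period, t0 = linear_fit(ns, crossings)
    tie = [t - n * period - t0 for n, t in zip(ns, crossings)]
    return tie, period, t0

# Synthetic rising-edge crossings: ideal period plus an alternating offset (made-up data)
T_IDEAL = 1 / 4.9e9
crossings = [n * T_IDEAL + (0.5e-12 if n % 2 else -0.5e-12) for n in range(1, 1001)]

tie, period, t0 = time_interval_error(crossings)
peak_to_peak = max(tie) - min(tie)
print(f"T = {period * 1e12:.3f} ps, TIE peak-to-peak = {peak_to_peak * 1e12:.2f} ps")
```

A histogram of `tie` then gives the jitter distribution, and `max(tie) - min(tie)` gives the peak-to-peak value.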



Figure 11: TIE waveform

Shown in Figure 12 is the histogram of the TIE after 9 µs. Transistor noise is turned off during the simulation. The jitter shown in Figure 12 therefore represents the PLL tracking error, i.e., deterministic jitter. The peak-to-peak value of this deterministic jitter is less than 2 ps.



Figure 12: TIE histogram

The phase noise usually dominates the random jitter of a PLL [10]. Figure 13 shows the phase noise of the LCVCO in the worst case. The phase noise at 1 MHz off the 4.9 GHz carrier frequency is -105.8 dBc/Hz. We convert phase noise into random jitter in the 10 kHz – 100 MHz range [11-13]. Random jitter is less than 1 ps (RMS).
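The conversion from phase noise to RMS jitter integrates the single-sideband noise over the offset range, $\sigma_t = \sqrt{2\int L(f)\,df}\,/\,(2\pi f_0)$. The Python sketch below applies this to a hypothetical profile, not to the simulated spectrum: it assumes a -20 dB/decade slope through -105.8 dBc/Hz at 1 MHz for the free-running VCO and a first-order high-pass loop response with a 2.5 MHz corner. Both assumptions, and all names, are illustrative.

```python
import math

def rms_jitter(phase_noise_dbc, f_carrier, f_start, f_stop, points=2000):
    """RMS jitter in seconds from a single-sideband phase-noise profile.

    phase_noise_dbc maps an offset frequency in Hz to L(f) in dBc/Hz. The
    function integrates 10**(L(f)/10) over [f_start, f_stop] with the
    trapezoid rule on a logarithmic grid, doubles the result for both
    sidebands, and divides the RMS phase by 2*pi*f_carrier.
    """
    log_lo, log_hi = math.log10(f_start), math.log10(f_stop)
    freqs = [10 ** (log_lo + (log_hi - log_lo) * i / (points - 1)) for i in range(points)]
    power = [10 ** (phase_noise_dbc(f) / 10) for f in freqs]
    area = sum(0.5 * (power[i] + power[i + 1]) * (freqs[i + 1] - freqs[i])
               for i in range(points - 1))
    return math.sqrt(2 * area) / (2 * math.pi * f_carrier)

def example_profile(f, l_1mhz=-105.8, f_loop=2.5e6):
    """Hypothetical profile: 1/f^2 VCO noise, high-pass filtered by the loop."""
    free_running = l_1mhz - 20 * math.log10(f / 1e6)
    loop_suppression = 10 * math.log10(f ** 2 / (f ** 2 + f_loop ** 2))
    return free_running + loop_suppression

jitter = rms_jitter(example_profile, 4.9e9, 10e3, 100e6)
print(f"RMS jitter = {jitter * 1e12:.2f} ps")
```

With these assumptions the result stays below 1 ps; a measured or simulated profile can be substituted for `example_profile` without changing `rms_jitter`.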



Figure 13: Worst case phase noise of the VCO

Power consumption of the core PLL without the CML driver in the typical corner and room temperature is 104 mW at 4.9 GHz.

Table 2 shows the major performances of the PLL in the post-layout simulation.

| Tuning range (GHz)                           | 3.79 - 5.01 |
|----------------------------------------------|-------------|
| Power consumption of core PLL (mW)           | 104         |
| Area (mm <sup>2</sup> )                      | 1.4 x 1.7   |
| Random Jitter from VCO (worst case, RMS, ps) | < 1         |
| Deterministic jitter (peak-peak, ps)         | 2           |
| Acquisition time (µs)                        | 9           |

Table 2: Major performances of the PLL

# IV. CONCLUSION

We have designed a phase locked loop using a commercial 0.25-µm Silicon-on-Sapphire (SoS) CMOS technology. The post-layout simulation indicates that we achieve the design goal. The PLL has been submitted for fabrication and will be tested after it is delivered.

#### V. ACKNOWLEDGMENTS

This work is supported by the US-ATLAS R&D program for the upgrade of the LHC, and the US Department of Energy grant DE-FG02-04ER41299. We would like to thank Jaroslav Ban at Columbia University, Paulo Moreira at CERN, Fukun Tang at University of Chicago, Mauro Citterio and Valentino Liberali at INFN, Carla Vacchi at University of Pavia, Christine Hu and Quan Sun at CNRS/IN2P3/IPHC, Sachin Junnarkar at Brookhaven National Laboratory, Mitch Newcomer at University of Pennsylvania, Peter Clarke, Jay Clementson, Yi Kang, Francis M. Rotella, John Sung, and Gary Wu at Peregrine Semiconductor Corporation for their invaluable suggestions and comments to help us complete the design work. We also would like to thank Justin Ross at Southern Methodist University for his help in setting up and maintaining the design environment.

#### VI. REFERENCES

- [1] Arno Straessner, "Development of New Readout Electronics for the ATLAS LAr Calorimeter at the sLHC", presented at the topical workshop on electronics in particle physics (TWEPP), Paris, France, Sep. 21-25, 2009.
- [2] Datao Gong, Suen Hou, Zhihua Liang, *et al*, "Development of a 16:1 serializer for data transmission at 5 Gbps", presented at the topical workshop on electronics in particle physics (TWEPP), Paris, France, Sep. 21-25, 2009.
- [3] Peiqing Zhu, "Design and characterization of phaselocked loops for radiation-tolerant applications", PhD Dissertation, Department of Electrical Engineering, Southern Methodist University, Dallas, TX, 2008.

- [4] R. Reedy, J. Cable, D. Kelly, *et al.*, "UTSi CMOS: A Complete RF SOI Solution", Analog Integrated Circuits and Signal Processing, vol. 25, pp. 171-179, 2000.
- [5] Tiankuan Liu, Wickham Chen, Ping Gui, et al., "Total Ionization Dose Effects and Single-Event Effects Studies Of a 0.25 um Silicon-On-Sapphire CMOS Technology", presented at the 9th European Conference Radiation and Its Effects on Components and Systems (RADECS), Deauville, France, Sep. 2007.
- [6] William O. Keese, "An Analysis and Performance Evaluation of a Passive Filter Design Technique for Charge Pump Phase-Locked Loops", National Semiconductor Application Note 1001, May 1996.
- [7] Behzad Razavi, Design of Analog CMOS Integrated Circuits, McGraw-Hill Science/Engineering/Math; 1st edition, August 15, 2000.
- [8] Peregrine Semiconductor Corp., "GX Rev. 1.9 UltraCMOS<sup>TM</sup> 0.25um Spice Models", Document No. 53-0023, Rev 04, 2007, San Diego, CA 92121, pp. 15-23.
- [9] Akinori Shinmyo, Masanori Hashimoto, Hidetoshi Onodera, "Design and Optimization of CMOS Current Mode Logic Dividers", 2004 IEEE Asia-Pacific Conference on Advanced System Integrated Circuits (AP-ASIC2004), Aug. 4-5, 2004.
- [10] Behzad Razavi, Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design, Wiley-IEEE Press, December 4, 2008.
- [11] Neil Roberts, "Phase Noise and Jitter A Primer for Digital Designers," *EE Design*, Jul. 14, 2003.
- [12] "Clock (CLK) Jitter and Phase Noise Conversion", Maxim Integrated Products Application Note 3359, Sep. 23, 2004.
- [13] Walt Kester, "Converting Oscillator Phase Noise to Time Jitter", MT-008 TUTORIAL, Rev. A, Oct. 2008, Analog Devices, Inc.

# Development of a 16:1 serializer for data transmission at 5 Gbps

Datao Gong<sup>a</sup>, Suen Hou<sup>b</sup>, Zhihua Liang<sup>a</sup>, Chonghan Liu<sup>a</sup>, Tiankuan Liu<sup>a</sup>, Da-Shung Su<sup>b</sup>, Ping-Kun Teng<sup>b</sup>, Annie C. Xiang<sup>a</sup>, Jingbo Ye<sup>a</sup>

<sup>a</sup> Department of Physics, Southern Methodist University, Dallas TX 75275, U.S.A. <sup>b</sup> Institute of Physics, Academia Sinica, Taipei 11529, Taiwan

dtgong@physics.smu.edu

# Abstract

A radiation tolerant, high speed and low power serializer ASIC is critical for optical link systems in particle physics experiments. Based on a commercial 0.25 µm silicon-on-sapphire CMOS technology, we design a 16:1 serializer with a 5 Gbps serial data rate. This ASIC has been submitted for fabrication. The post-layout simulation indicates that the deterministic jitter is 54 ps (pk-pk) and the random jitter is 3 ps (rms). The power consumption of the serializer is 500 mW. The design details and post-layout simulation results are presented in this paper.

#### I. INTRODUCTION

The large data volumes produced in recent high energy physics experiments require a high speed data transmission ASIC for the digital optical link between the on-detector and off-detector electronics systems. The radiation tolerance of the ASIC becomes more critical as the luminosity of the beam in the experiments increases. There are two serializer chips used in the Large Hadron Collider (LHC) experiments, GOL and G-link [1][2]. The GOL, with a serial data rate of 1.6 Gbps, is based on a 0.25 µm bulk silicon CMOS technology with radiation hardening layout. With a built-in laser driver, its power consumption is about 400 mW at 1.6 Gbps. The G-link has been identified to be radiation tolerant for the present ATLAS Liquid Argon Calorimeter (LAr) readout system. This chip consumes about 2.0 watts at 1.6 Gbps. The upgrade of the LAr readout system from LHC to super-LHC requires the optical data link to provide a 100 Gbps data rate, 60 times higher than at present, with the same power consumption budget for each front-end board (FEB) [16]. Neither GOL nor G-link can meet the power consumption budget and data rate requirement. The development of a higher speed and lower power serializer is necessary for the LAr upgrade.

A commercial 0.25  $\mu$ m silicon on sapphire (SoS) CMOS technology has been identified to be suitable for ASIC development in the radiation environment of particle physics experiments [3]. This technology has an  $f_T$  of 90 GHz, which is much higher than that of bulk silicon CMOS with the same feature size [4]. In this paper we present a design of a 16:1 serializer working at 5 Gbps based on this technology with 500 mW power consumption. This serializer can be used as a key component in a high speed transmitter for the optical data link of the LAr upgrade.

#### II. DESIGN

The serializer includes a 16:1 multiplexer, a PLL based clock generator and a CML driver as shown in figure 1. The multiplexer receives 16-bit LVDS signals and outputs CMOS level serial data at 5 Gbps. The clock generator provides the multiplexer with clock signals whose phases are locked to the input LVDS clock signal. The CML driver is used to drive high speed differential signals through transmission lines to a radiation tolerant optical laser driver [17]. To achieve good immunity to single-event effects (SEE), we use large transistor sizes and static D-flip-flops in the whole design.




Figure 1: The architecture of the serializer

#### A. LVDS receiver

An LVDS receiver is used to convert differential data and reference clock signals to CMOS signals for subsequent processing. The LVDS receiver is a differential amplifier followed by a differential pair with active load. With a minimum differential-mode level requirement of 100 mV, the receiver can work above 400 MHz with a common mode level from 0.8 to 1.7 V and consumes about 2.8 mW in the typical corner post-layout simulation.

# B. 16:1 Multiplexer

The 16:1 multiplexer has four stages of multiplexer units in series, in which the first three stages are a cascade of the same basic CMOS logic 2:1 multiplexer unit and the last stage is a specially designed 2:1 multiplexer unit that operates above 2.5 GHz. The basic 2:1 multiplexer unit is driven by a clock no faster than 2 GHz and converts two input data bits into a serial output as shown in figure 2. Two data bits are latched into D-flip-flops at the rising edge of the clock signal. One of the two latched data bits is delayed by half a clock period by a latch to ensure that the clock signal selects the data bits in the following passive multiplexer with correct timing. The width of each serial output data bit depends on the duty cycle of the clock, so the clock signal must have a 50% duty cycle.
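The four-stage structure can be illustrated with a behavioural model in which each 2:1 unit interleaves two bit streams. The pairing of inputs at each stage (stream i with stream i plus half the number of streams) is an assumption chosen so that the bits leave in index order; the actual input ordering of the ASIC is not specified here.

```python
def mux2(first_stream, second_stream):
    """2:1 multiplexer: interleave two bit streams, doubling the bit rate."""
    out = []
    for a, b in zip(first_stream, second_stream):
        out.extend((a, b))
    return out

def serialize16(word):
    """Serialize one 16-bit word (a list of 16 bits) through four 2:1 stages."""
    streams = [[bit] for bit in word]        # 16 parallel one-bit streams
    stages = 0
    while len(streams) > 1:
        half = len(streams) // 2
        # Pair stream i with stream i + half so the final order is word[0], word[1], ...
        streams = [mux2(streams[i], streams[i + half]) for i in range(half)]
        stages += 1
    return streams[0], stages

word = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0]
serial, n_stages = serialize16(word)
print(serial, n_stages)
```

With this pairing the serial output reproduces the input word in order after four stages.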



Figure 2: The basic 2:1 multiplexer unit

The traditional static transmission-gate D-flip-flop used in the multiplexer unit is fast and has good SEE immunity compared to other types [6][7]. In the regular D-flip-flop, the internal complementary clock signals of the pass gates are generated by inverters and are not very symmetric. The asymmetric complementary clock signals significantly increase the switching delay of the pass gates in the D-flip-flop. This static D-flip-flop cannot work above 2 GHz.

A high speed D-flip-flop is required to operate above 3 GHz in the last stage of the multiplexer unit and in the first divide-by-2 circuit following the VCO. A D-flip-flop with symmetrical complementary clock signal inputs meets this requirement as shown in figure 3. We use two identical differential-to-single-ended circuits with cross-coupled inputs from the differential VCO delay stage to generate symmetric complementary clock signals for this unit.



Figure 3: The static D-flip-flop with symmetrical complementary clock signals

#### C. Clock generator

The clock generator comprises a PLL and a clock divider. The clock signals distributed from the divider are 312.5 MHz, 625 MHz, 1.25 GHz and 2.5 GHz for the four stages of multiplexers respectively. The 2.5 GHz clock signal is a complementary signal, as required by the high speed 2:1 multiplexer unit. The phase frequency detector (PFD) is dead-zone free, and its maximum operating frequency is above 400 MHz. The charge pump requires complementary up and down signals to operate. Asymmetry between the complementary signals adds extra noise on the control node. To minimize the asymmetry of the complementary up and down signals, two inverter arrays are used to generate complementary clock signals as shown in figure 4. After optimization of the transistor sizes, the complementary clock signals match within 5 ps in all process corners [5].
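The four clock frequencies follow from the serial data rate, since each 2:1 stage outputs two bits per clock period. A small Python sketch of that relation, with an illustrative function name:

```python
SERIAL_RATE_HZ = 5e9   # serial data rate of the serializer

def stage_clocks(serial_rate_hz, stages=4):
    """Clock frequency for each 2:1 stage, first (slowest) stage first.

    The last stage runs at half the serial rate and each earlier stage at
    half the frequency of the stage that follows it.
    """
    return [serial_rate_hz / 2 ** (stages - k) for k in range(stages)]

clocks = stage_clocks(SERIAL_RATE_HZ)
print([f"{f / 1e6:g} MHz" for f in clocks])
```

The slowest clock, 312.5 MHz, is also the rate at which 16-bit words enter the multiplexer.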



Figure 4: Single-ended to complementary signal converter

A conventional charge pump with an active amplifier is implemented. The unity-gain voltage amplifier equalizes the voltage of the mirror node and the control voltage node of the VCO, which eliminates the charge sharing problem appearing at the instants of switching. Dummy pass-gates are added at the mirror node and control voltage node to reduce the charge injection problem. The current source of the charge pump is programmable from 20 µA to 80 µA to match the nonlinear VCO gain. The charge pump linear working range is from 0.5 to 2.0 V.



Figure 5: Charge pump with active amplifier

A multiple-pass loop architecture is used in the differential ring oscillator to boost the voltage controlled oscillator (VCO) operating frequency. The extra auxiliary feed-forward loop reduces the delay of the stages in a conventional main loop [9][10]. The five-stage oscillator is depicted in figure 6. This architecture is also called a look-ahead ring oscillator [11].



Figure 6: multiple-pass loop 5 stage differential oscillator.

Because of the two-loop structure, two pairs of inputs are needed in the delay stage as depicted in figure 7. Transistors M5 and M6 make the main loop, while M7 and M8 make the secondary loop. Compared to a common differential delay stage, the tail current source is removed, which reduces the phase noise due to the upconversion of the tail transistor low-frequency noise to near the oscillation frequency. The oscillating amplitude of this delay stage is rail-to-rail, which also reduces the jitter [13][14].



Figure 7: Differential delay stage

As shown in figure 7, transistors M1, M2, M5 and M6 form a latch. When Vctrl increases, the resistance of M3 and M4 decreases, which increases the positive feedback gain of the latch. The stronger feedback gain makes it harder for the latch to switch the output nodes. Thus the stage delay increases and the VCO oscillates at a lower frequency when the control node voltage increases. The VCO oscillates from 1.5 to 2.75 GHz, with the VCO gain varying from 0.4 to 1.1 GHz/V in the charge pump working range. The post-layout simulation indicates that the phase noise is -92 dBc/Hz at 1 MHz offset from the 2.5 GHz carrier frequency.

The PLL acts as a low pass filter for noise on the reference clock, but as a high pass filter for the phase noise generated by the VCO. Choosing the loop bandwidth is therefore a tradeoff among different noise sources. The low pass filter is a bandwidth-programmable passive second-order RC network as shown in figure 8. There is a reset pin to reset the control voltage to Vdd at the initial stage, which means the VCO starts to oscillate at its lowest frequency.



Figure 8: Bandwidth programmable low pass filter

There are three control bits, C0, C1 and C2, of which only one can be set at a time to turn on the NMOS transistor switches and enable the resistors and capacitors in the RC filter. The PLL loop bandwidth and phase margin also depend on the charge pump current. To keep the PLL stable, its phase margin is at least 45 degrees in all combinations of charge pump current and LPF configuration, as shown in table 1.

Table 1: Loop bandwidth in MHz and phase margin in degree with different CP current and the LPF configurations.

| CP current | BW (MHz), C0,C1,C2=001 | Margin (deg), C0,C1,C2=001 | BW (MHz), C0,C1,C2=010 | Margin (deg), C0,C1,C2=010 | BW (MHz), C0,C1,C2=100 | Margin (deg), C0,C1,C2=100 |
|------------|------|----|-----|----|------|----|
| 20 µA      | 1.25 | 60 | 2.5 | 60 | 5.0  | 60 |
| 40 µA      | 2.28 | 56 | 4.6 | 56 | 9.1  | 56 |
| 60 µA      | 3.14 | 50 | 6.3 | 50 | 12.5 | 50 |
| 80 µA      | 3.88 | 45 | 7.8 | 45 | 15.5 | 45 |

#### D. CML driver

The last stage multiplexer outputs CMOS signals. A CML driver following it is needed to drive the high speed serial data through transmission lines. The CML driver is designed as a four-stage CML buffer as shown in figure 9 [12]. Each differential stage has twice the current of the previous stage, and its transistors are twice as wide. The last stage amplifier has 20 mA current and 50  $\Omega$  output resistance to match the 50  $\Omega$  transmission lines outside the chip.
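The stage scaling and the resulting low-frequency output swing can be estimated with simple arithmetic. The sketch below assumes that the factor of two applies between every pair of adjacent stages and that the full tail current is steered into the on-chip resistor in parallel with the terminated line; both are illustrative assumptions, and the function names are hypothetical.

```python
def stage_currents(last_stage_ma=20.0, stages=4):
    """Tail current of each buffer stage in mA, first stage first, doubling per stage."""
    return [last_stage_ma / 2 ** (stages - 1 - k) for k in range(stages)]

def output_swing_mv(current_ma, r_on_chip=50.0, r_line=50.0):
    """Single-ended low-frequency swing in mV for a current steered into
    the on-chip load resistor in parallel with the terminated line."""
    r_parallel = r_on_chip * r_line / (r_on_chip + r_line)
    return current_ma * r_parallel

currents = stage_currents()              # [2.5, 5.0, 10.0, 20.0]
swing = output_swing_mv(currents[-1])    # 500.0
print(currents, sum(currents), swing)
```

This estimate ignores the bonding wire inductance and load capacitance, which reduce the amplitude at high frequency.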



When the bonding wire with 25 µm diameter is 1 mm long, its inductance is about 1 nH. The output load assumed in the testing is 1 nH for the bonding wire and 0.2 pF for the overall capacitive load, which includes the bonding pad and the input capacitance of the optical laser driver module. The resistive load is 50  $\Omega$  at the end of an ideal 50  $\Omega$  transmission line. The rise and fall times of the output waveform are 44 ps when we test it with a 2.5 GHz clock signal. The output signal amplitude at the far end of the transmission line depends on the input frequency as shown in figure 10. As shown in this figure, the peak-to-peak amplitude of the CML driver output signal is larger than 400 mV at a 5 Gbps rate.



Figure 10: The magnitude of the CML output signal in voltage vs input signal frequency in GHz in different process corners.

The bonding wire length may vary and cause variation in the attached inductance. The post-layout simulation shows that the inductance of the bonding wire significantly degrades the CML driver bandwidth. The bandwidth of the CML driver with the bonding wires described above is about 5.5 GHz. When the bonding wire is 5 mm long, the 3-dB bandwidth drops to 3.6 GHz. This result suggests that the high speed signal bonding wires should be kept as short as possible.

#### **III. PERFORMANCE**

The 16:1 serializer is implemented on a 3 mm x 3 mm die and occupies about half of the die area, as shown in figure 11. The gray blocks in the plot are decoupling capacitors on the power lines. A high-frequency LC-PLL is implemented on the same die for the next version of the serializer [15]. We separate the multiplexer unit power and ground lines from the noise-sensitive PLL circuits to reduce the jitter and noise coupled from the power lines. The power consumption of the three main components is shown in table 2. Since the PLL consumes about 35% of the power, the transmission power could be reduced in the future by sharing one PLL clock generator among multiple 16:1 multiplexers.

Table 2: Power consumption of serializer components

|                  | Power (mW) |
|------------------|------------|
| CML Driver       | 96         |
| PLL              | 173        |
| 16:1 multiplexer | 238        |
| Total            | 507        |
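
The remark about sharing one PLL can be checked against the Table 2 figures. This is a minimal sketch with hypothetical function names; it assumes that sharing the PLL adds no clock distribution power, which a real design would have to budget for.

```python
# Power figures in mW, copied from Table 2.
POWER_MW = {"CML driver": 96, "PLL": 173, "16:1 multiplexer": 238}


def total_power_mw(power=POWER_MW):
    return sum(power.values())


def pll_fraction(power=POWER_MW):
    """Share of the total power drawn by the PLL."""
    return power["PLL"] / total_power_mw(power)


def power_per_link_mw(n_links, power=POWER_MW):
    """Per-link power when one PLL serves n_links serializers.
    Assumption: no extra power is needed to distribute the clock."""
    return power["CML driver"] + power["16:1 multiplexer"] + power["PLL"] / n_links


print(f"PLL share: {pll_fraction():.1%}")
for n in (1, 2, 4):
    print(f"{n} link(s) per PLL: {power_per_link_mw(n):.2f} mW per link")
```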



Figure 11: The layout of the serializer. An LC PLL and other test components are included on the same die.

In the final post-layout simulation, 2<sup>7</sup>-1 PRBS data is fed through the data bus. The reference clock is 312.5 MHz without jitter. The output load is the same as described for the CML driver testing. To simulate the noise on the power lines, the inductance of the bonding wires that connect the power and ground lines is included. Because post-layout simulations are extremely slow with the actual decoupling capacitors, which are made of large transistors and metal-insulator-metal devices, the on-chip decoupling capacitor is simulated with a 0.6 nF ideal capacitor.
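
A 2<sup>7</sup>-1 PRBS stimulus of this kind can be generated with a 7-bit linear feedback shift register. The sketch below is illustrative only: the text does not state which polynomial or bit ordering was used, so the common x^7 + x^6 + 1 polynomial and a first-bit-first packing into 16-bit words are assumptions, and the function names are hypothetical.

```python
def prbs7(n_bits, seed=0x7F):
    """Return n_bits of a 2^7-1 PRBS from a Fibonacci LFSR.
    Assumed polynomial: x^7 + x^6 + 1 (not specified in the text)."""
    if not 0 < seed < 128:
        raise ValueError("seed must be a non-zero 7-bit value")
    state = seed
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # taps at bits 7 and 6
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def to_words(bits, width=16):
    """Pack the serial stream into full parallel words, first bit first
    (the bit order is an assumption)."""
    return [bits[i:i + width] for i in range(0, len(bits) - width + 1, width)]


REF_CLOCK_HZ = 312.5e6                 # from the text
LINE_RATE_BPS = REF_CLOCK_HZ * 16      # 16:1 serialization

sequence = prbs7(254)                  # two full periods of 127 bits
words = to_words(sequence)
print("line rate (Gbps):", LINE_RATE_BPS / 1e9)
print("first word:", words[0])
```

The line rate follows directly from the 16:1 ratio: a 312.5 MHz word clock corresponds to 5 Gbps on the serial output.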



Figure 12: Post-layout simulation with 1nH inductor on each power line and 600pF internal capacitor.

The simulated eye diagram is shown in figure 12. Transistor noise is not turned on in the simulation, so the jitter quoted in the figure does not include random jitter. We roughly estimate that the deterministic jitter is 54 ps peak-to-peak. From the phase noise in the post-layout simulation, we estimate that the random jitter from the VCO is less than 3 ps (RMS) in all LPF and charge pump configurations.
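
To put the two jitter figures in relation to the 200 ps unit interval at 5 Gbps, they can be combined with the widely used dual-Dirac approximation, TJ = DJ + 2 Q(BER) RJ. This combination is not made in the paper; it is a rough sketch that treats the 3 ps upper bound as the RMS random jitter and assumes a target bit error rate of 10^-12. The function names are hypothetical.

```python
from statistics import NormalDist


def q_factor(ber):
    """Gaussian tail multiplier Q such that P(Z > Q) = ber."""
    return -NormalDist().inv_cdf(ber)


def total_jitter_ps(dj_pp_ps, rj_rms_ps, ber=1e-12):
    """Dual-Dirac estimate of total jitter (an assumption, not from the paper)."""
    return dj_pp_ps + 2.0 * q_factor(ber) * rj_rms_ps


UI_PS = 1e12 / 5e9                      # unit interval at 5 Gbps
tj_ps = total_jitter_ps(54.0, 3.0)      # DJ and RJ values from the text
print(f"Q = {q_factor(1e-12):.3f}")
print(f"TJ = {tj_ps:.1f} ps, i.e. {tj_ps / UI_PS:.2f} UI")
```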

#### IV. SUMMARY

A 16:1 serializer operating at 5 Gbps is implemented in a 0.25 um SoS CMOS technology. The serializer consumes about 500 mW when running at 5 Gbps. Its deterministic jitter is estimated to be 54 ps peak-to-peak and its random jitter is about 3 ps (RMS). The design has been submitted for fabrication.

#### V. ACKNOWLEDGEMENT

This work is supported by US-ATLAS R&D program for the upgrade of the LHC, and the US Department of Energy grant DE-FG02-04ER41299. We would like to thank Jasoslav Ban at Columbia University, Paulo Moreira at CERN, Fukun Tang at University of Chicago, Mauro Citterio and Valentino Liberali at INFN, Carla Vacchi at University of Pavia, Christine Hu and Quan Sun at CNRS/IN2P3/IPHC, Sachin Junnarkar at Brookhaven National Laboratory, Mitch Newcomer at University of Pennsylvania, Jay Clementson, Yi Kang, John Sung, and Gary Wu at Peregrine Semiconductor Corporation for their invaluable suggestions and comments to help us complete the design work. We also would like to thank Justin Ross at Southern Methodist University for his help in setting up and maintaining the design environment.

#### **VI.** REFERENCES

- [1] P. Moreira, T. Toifl, A. Kluge, G. Cervelli, F. Faccio, A.Marchioro and J. Christiansen, "G-Link and GigabitEthernet compliant serializer for LHC data transmission," IEEE Nuclear Science Symposium., vol. 2, pp. 96-99, Oct. 2000.
- [2] P. Moreira, G. Cervelli, J. Christiansen, F. Faccio, A. Kluge, A. Marchioro and T. Toifl, "A Radiation Tolerant Gigabit Serializer for LHC Data Transmission," 7th Workshop on Electronics for LHC Experiments, Stockholm, Sweden, 10-14 Sep. 2001, pp. 145-149.
- [3] Tiankuan Liu, Wickham Chen, Ping Gui, et al., "Total Ionization Dose Effects and Single-Event Effects Studies Of a 0.25 um Silicon-On-Sapphire CMOS Technology", presented at the 9th European Conference Radiation and Its Effects on Components and Systems (RADECS), Deauville, France, Sep. 2007.
- [4] OKI technical Review, Oct.2004/Issue 200 Vol.71 No.4, http://www.oki.com/en/otr/200/downloads/otr-200-R18.pdf
- [5] M. Shoji, "Elimination of Process-Dependent Clock Skew in CMOS VLSI," IEEE Journal of Solid-State Circuits, vol. SC-21, no. 5, p. 875, Oct. 1986.
- [6] S. Tahmasbi Oskuii et al., "Comparative study on low-power high-performance standard-cell flip-flops," Microelectronics: Design, Technology, and Packaging, edited by Derek Abbott, Kamran Eshraghian, Charles A. Musca, Dimitris Pavlidis, Neil Weste, Proceedings of SPIE Vol. 5274 (SPIE, Bellingham, WA, 2004).

- [7] R. Ramanarayanan et al., "Analysis of soft error rate in flip-flops and scannable latches," Proceedings of the IEEE International SOC [Systems-on-Chip] Conference, 17-20 Sept. 2003, pp. 231-234.
- [8] J. Maneatis, "Low-jitter and process-independent DLL and PLL based on self-biased techniques," ISSCC Digest of Technical Papers, 1996.
- [9] L. Sun and T. Kwasniewski, "A 1.25 GHz 0.35µm monolithic CMOS PLL clock generator for data communications," Proceedings of the IEEE Custom Integrated Circuits, pp.265-268, 1999.
- [10] D.-Y. Yeong, S.-H. Chai, W.-C. Song, and G.-H. Cho, "CMOS current-controlled oscillators using multiplefeedback architectures," in ISSCC Dig. Technical Papers, pp. 386-387, 1997.
- [11] J. Maneatis and M. Horowitz, "Multiple interconnected ring oscillator circuit," US Patent 5 475 344, Dec. 1995.
- [12] Heydari et al., Design of Ultra High-Speed CMOS CML buffers and Latches, IEEE International Symposium on Circuits and Systems (ISCAS), May 2003, 208-211
- [13] B. Terlemez, "Oscillation Control in CMOS Phase-Locked Loops" Georgia Institute of Technology, PhD thesis, USA 2005
- [14] C. Park and B. Kim, "A low-noise, 900-MHz VCO in 0.6-μm CMOS," IEEE Journal of Solid-State Circuits, vol. 34, pp. 586-591, May 1999.
- [15] Tiankuan Liu, Datao Gong, Suen Hou, *et al*, "The Design of a Low Power High Speed Phase Locked Loop", presented at the topical workshop on electronics in particle physics (TWEPP), Paris, France, Sep. 21-25, 2009.
- [16] Arno Straessner, "Development of New Readout Electronics for the ATLAS LArCalorimeter at the sLHC", presented at the topical workshop on electronics in particle physics (TWEPP), Paris, France, Sep. 21-25, 2009.
- [17] Jan Troska et al, "The Versatile Transceiver Proof of Concept", presented at the topical workshop on electronics in particle physics (TWEPP), Paris, France, Sep. 21-25, 2009.

# Characterization of Semiconductor Lasers for Radiation Hard High Speed Transceivers

Sérgio Silva<sup>a,b</sup>, Luís Santos Amaral<sup>a</sup>, Stephane Detraz<sup>a</sup>, Paulo Moreira<sup>a</sup>, Spyridon Papadopoulos<sup>a</sup>, Ioannis Papakonstantinou<sup>a</sup>, Henrique M. Salgado<sup>b</sup>, Christophe Sigaud<sup>a</sup>, Csaba Soos<sup>a</sup>, Pavel Stejskal<sup>a</sup>, Jan Troska<sup>a</sup>, François Vasey<sup>a</sup>

> <sup>a</sup> CERN, 1211 Geneva 23, Switzerland <sup>b</sup>INESC Porto, Universidade do Porto, Portugal

ssilva@cern.ch

#### Abstract

In the context of the versatile link project, a set of semiconductor lasers was studied and modelled with the aim of optimizing the laser driver circuit. High-frequency measurements of the reflection and transmission characteristics of the laser diode devices were made and used to support the development of a model that can be applied to study their input impedance characteristics and light modulation properties. Furthermore, the interaction between the laser driver, the interconnect network and the laser device itself can be studied using this model. Simulation results will be compared to measured data to validate the model and methodology.

Keywords: Laser, VCSEL, model, Verilog-A, transceiver, radiation hard.

#### I. INTRODUCTION

The versatile transceiver under development for the Super Large Hadron Collider (SLHC) experiment will have to endure severe radiation conditions while providing multiple gigabit per second data transmission capability to cover the experiments' requirements [1, 2]. For this, characterization and modeling of the electro-optic components (in particular the semiconductor laser) are of utmost importance, as they will enable the correct design and optimization of the transceiver [3]. They will also make it possible to evaluate the link performance when the physical characteristics of the device change due to the environmental circumstances.

A measurement methodology will be presented whose results lead to the implementation of a model with broad validity. This model accommodates several different laser types (Fabry-Perot, Distributed Feedback, Vertical Emission) [4-7]. The laser model is implemented in Verilog-A for ease of use by integrated circuit designers, and it aims at easing the design of robust systems capable of complying with the demanding requirements of high energy physics experiments.

Since the impedance mismatch between the driver and the laser should be kept as low as possible to decrease intersymbol interference, jitter and power loss, a very accurate model of the laser chip input and parasitic network was developed. It will be shown that the theoretical model is in good agreement with experimental data and that it enables correct design of the transmitter circuitry of the laser driver. The results of the study of an impedance matching network and signal pre-emphasis will be shown. Current work is focusing on the use of the model to predict the performance degradation with environmental conditions and analyses of the system sensitivity to manufacturing parameter deviations [8].

#### II. LASER MODEL

A laser model is presented here capable of mirroring the device dynamic behavior (output light signal) and input characteristics (input impedance) of real devices. This model tries to embrace different laser structures and package types (wire leads/flex cable) of commercially available devices ([9]).

The model is divided into the intrinsic laser diode (ILD) model and the parasitic interconnection circuit. The ILD behaviour can be described by the rate equations or the Laplace transfer function obtained at the bias operating point. The parasitic interconnection circuit represents the laser assembly in a package, its interconnection and laser die structure ([3]). A schematic of the laser model plus source and test fixture is presented in Figure 1.



Figure 1: Laser model schematic including test fixture.

Typically laser package connections are kept very short and the wire bonds have a very small length and therefore two simple lumped element transmission line models can be used in cascade for the leads and wire bonds respectively. These are constituted mainly by inductances since the associated capacitances and resistances are very small. Nevertheless the contact resistance and capacitance cannot be neglected and are included in the model as the total capacitance  $C_P$  and resistance  $R_P$  respectively.

 $R_S$  represents the resistance between the electrical contacts and the active layer (including the Bragg mirror stacks in the case of the VCSEL laser).  $C_{SUB}$  and  $R_{SUB}$  are the substrate plus bond pad associated capacitance and resistance respectively.

Under operating conditions (bias current much higher than the threshold current), the laser diode depletion region is very conductive, so the ILD impedance is very small and, for the purpose of calculating the input impedance, much lower than  $R_S$  and thus negligible ([5]). Therefore the laser input impedance can be obtained to the right of the arrow pointer with the ILD short-circuited, as illustrated in Figure 1.

For the intrinsic laser diode, since we are only interested in the mean photon density, it is assumed that the laser can be modelled using a single pair of rate equations. Typical telecom lasers operate in a single longitudinal mode, which makes them suitable for high-bit-rate fiber optic communications ([3, 5]) and also makes this approximation very accurate. This is the case for the DFB and VCSEL lasers but not for the FP laser. For the FP laser, the photon density accounted for in the rate equations is the mean density over all modes. Nevertheless, if one is interested solely in the dynamics of the output light power, this approximation is still valid.

For small modulation currents ( $i_{Modulation} < I_{Bias}$ ), the intrinsic laser diode can be modelled using its transfer function. Once the steady-state solution for a specific bias current has been obtained, the rate equations can be linearized around that operating point ([5]). The laser transfer function is then easily calculated as the Laplace transform of these linearized rate equations ([5]).

For large signals ( $i_{Modulation} \approx I_{Bias}$ ), the full differential rate equations must be used which can make the simulation process slower.

#### **III. PARAMETER EXTRACTION**

The laser model parameters can be extracted using S-parameter measurements (reflection and transmission characteristics) made with a vector network analyzer ([9]) or the relative intensity noise spectrum curves measured with a spectrum analyser.

Starting from known parameters obtained for similar lasers in the literature, the optimization algorithm (a constrained parameter curve fit) searches, within pre-determined bounds, for the set of parameters that minimizes the squared error over a set of curves.

Since the input impedance measurement is decoupled from the ILD (the ILD is considered as having very low impedance), the  $S_{11}$  (reflection) measurements are used to obtain the parasitic circuit parameters. The ILD rate equation parameters, on the other hand, can be extracted using two methods: frequency subtraction ([4, 9]) and relative intensity noise spectrum fit ([10]).

The frequency subtraction method was presented in [9] and proceeds using the  $S_{21}$  (transmission) measured at different bias currents and then fitting a laser frequency response model obtained using the laser rate equations noting that ([4, 2]):

$$\frac{H_{Global}(f, I_{Bias})}{H_{Global}(f, I_{Ref})} = \frac{H_{PC}(f)H_{TF}(f)H_{ILD}(f, I_{Bias})}{H_{PC}(f)H_{TF}(f)H_{ILD}(f, I_{Ref})} = \frac{H_{ILD}(f, I_{Bias})}{H_{ILD}(f, I_{Ref})}$$
(1)

The laser transfer function is given as a function of the S-Parameters by:

$$H_{Global}(f, I_{Bias}) = S_{21}(f, I_{Bias}) (1 - S_{11}(f)),$$
(2)

So the quotient (subtraction with log operator) between the laser response ( $H_{Global}(f, I_{Bias})$ ) at different currents above threshold ( $I_{Bias}, I_{Ref}$ ) is not affected by the parasitic circuit transfer function ( $H_{PC}(f)$ ) or the laser assembly transfer function ( $H_{TF}(f)$ ) since these are not dependent on bias current, and is simply the quotient of the ILD transfer functions which is known and a function of the model parameters. The  $H_{ILD}(f, I_{Bias})$  function can be approximated by the following equation ([4, 5]):

$$\frac{H_{ILD}(f, I_{Bias})}{H_{ILD}(0, I_{Bias})} \approx \frac{f_r^2}{f_r^2 - f^2 + jff_d}$$
(3)

In this equation,  $f_r$  (resonant frequency) and  $f_d$  (damping frequency) are functions of the ILD model parameters and bias current, as dictated by the rate equations. This enables the estimation of the following ILD rate equation parameters: V, volume;  $g_0$ , gain slope constant;  $\varepsilon$ , gain compression factor;  $N_{0m}$ , carrier density at transparency;  $\beta$ , spontaneous emission factor;  $\Gamma$ , optical confinement factor;  $\tau_p$ , photon lifetime;  $\tau_n$ , electron lifetime.
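
Equation (3) is straightforward to evaluate numerically. The sketch below uses hypothetical function names and illustrative values of $f_r$ and $f_d$ that are not taken from the measured devices; it also computes the frequency at which the magnitude of (3) peaks, which follows from minimizing the magnitude of the denominator.

```python
import math


def h_ild_normalized(f, f_r, f_d):
    """Equation (3): H_ILD(f) / H_ILD(0); f, f_r and f_d share the same unit."""
    return f_r ** 2 / complex(f_r ** 2 - f ** 2, f * f_d)


def peak_frequency(f_r, f_d):
    """Frequency where |H| of Eq. (3) is largest, f^2 = f_r^2 - f_d^2 / 2.
    Returns None when the response is too damped to show a peak."""
    arg = f_r ** 2 - f_d ** 2 / 2.0
    return math.sqrt(arg) if arg > 0 else None


F_R_GHZ, F_D_GHZ = 5.0, 3.0      # illustrative values only
for f_ghz in (0.0, 2.5, 5.0, 10.0):
    mag_db = 20 * math.log10(abs(h_ild_normalized(f_ghz, F_R_GHZ, F_D_GHZ)))
    print(f"{f_ghz:5.1f} GHz: {mag_db:6.2f} dB")
print("peak near", peak_frequency(F_R_GHZ, F_D_GHZ), "GHz")
```

At $f = f_r$ the magnitude of (3) equals $f_r / f_d$, so a smaller damping frequency gives a more pronounced resonance peak.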

The ILD parameter extraction is made by adjusting a set of curves obtained for the ratio of transfer functions for two different currents to the ones obtained with measured data. Using a set of curves obtained for different pairs of operating currents enhances the fitting robustness.
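
The cancellation in (1) and the bounded fit can be illustrated on synthetic data. This is a simplified sketch, not the extraction algorithm of Figure 2: it fits only $f_r$ and $f_d$ at one bias current, assumes the reference-current values are already known, replaces the optimizer with a brute-force grid search inside the bounds, and uses a hypothetical one-pole parasitic response. All names and numbers are illustrative.

```python
import math


def h_ild(f, f_r, f_d):
    """Normalized ILD response of Eq. (3); all frequencies in GHz."""
    return f_r ** 2 / complex(f_r ** 2 - f ** 2, f * f_d)


def h_parasitic(f, f_c=4.0):
    """Hypothetical bias-independent parasitic response (one-pole low-pass)."""
    return 1.0 / complex(1.0, f / f_c)


def ratio_db(f, bias, ref, parasitic=None):
    """20*log10 |H_Global(f, bias) / H_Global(f, ref)|, as in Eq. (1)."""
    p = parasitic(f) if parasitic else 1.0
    return 20.0 * math.log10(abs((p * h_ild(f, *bias)) / (p * h_ild(f, *ref))))


def squared_error(candidate, ref, freqs, measured_db):
    return sum((ratio_db(f, candidate, ref) - m) ** 2
               for f, m in zip(freqs, measured_db))


def grid_fit(ref, freqs, measured_db, fr_bounds, fd_bounds, steps=41):
    """Search the bounded (f_r, f_d) grid for the least squared error."""
    best = None
    for i in range(steps):
        f_r = fr_bounds[0] + (fr_bounds[1] - fr_bounds[0]) * i / (steps - 1)
        for j in range(steps):
            f_d = fd_bounds[0] + (fd_bounds[1] - fd_bounds[0]) * j / (steps - 1)
            err = squared_error((f_r, f_d), ref, freqs, measured_db)
            if best is None or err < best[0]:
                best = (err, f_r, f_d)
    return best[1], best[2]


FREQS_GHZ = [0.1 * k for k in range(1, 101)]
REF = (3.0, 2.0)          # (f_r, f_d) at the reference current, assumed known
TRUE_BIAS = (5.0, 3.0)    # values used to synthesize the "measurement"
measured = [ratio_db(f, TRUE_BIAS, REF, h_parasitic) for f in FREQS_GHZ]
fitted = grid_fit(REF, FREQS_GHZ, measured, (3.0, 7.0), (1.0, 5.0))
print("fitted (f_r, f_d):", fitted)
```

The synthetic measurement includes the parasitic response while the fitted model does not, yet the fit recovers the ILD values because the parasitic term divides out of the ratio.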

A final tuning using the global model response can be carried out as the last step (Figure 2).



Figure 2: Parameter extraction algorithm.

The relative intensity noise spectrum method uses the noise spectrum curves (RIN(f)) of the laser, measured with a spectrum analyzer, together with a model for these curves obtained from the rate equations. The laser parameter estimation is then a curve fitting procedure using the equation below:

$$\frac{RIN(f)}{RIN(0)} \approx \frac{f_r^4}{f_d^2} \frac{f^2 + f_d^2}{(f_r^2 - f^2)^2 + f^2 f_d^2}$$
(4)

Again,  $f_r$  and  $f_d$  are functions of the ILD model parameters and bias current. As an initial guess, the parameters obtained using methods such as frequency subtraction should be used. Otherwise the published laser manufacturer parameters can be used if available. Both parameter extraction methods provide a way to obtain the ILD rate equation parameters. With these parameters and the rate equations, the entire steady-state and dynamic behaviour of the laser can be described mathematically and thus incorporated in simulation software.

#### **IV. RESULTS**

Several different types of laser devices were measured and their model parameters estimated by the method described here. Figure 3 presents a summary of the extracted parameters. Here device 1 is a Fabry-Perot long wavelength laser, device 2 is a VCSEL long wavelength laser and the remaining devices are short wavelength VCSEL lasers. Devices 5 and 6 are the same laser device but with wire leads and flex cable connections respectively.

The values for the parameters that are associated with the bond wire ( $C_{P1}$ ,  $L_{P1}$  and  $R_{P1}$ ) are similar for all devices. This is to be expected since they all use the same package type and are very similar at this level. The exception is the FP laser, whose parameters are higher than the mean values. It is interesting to see that the active area resistance is clearly higher in the VCSEL and even higher in the short wavelength laser, as it does not use a buried tunnel heterojunction structure. The inductance values agree with those expected for short connections, but the capacitance and resistance values are higher than expected. This might be due to capacitive effects between the wires and package that were not considered explicitly in the parasitic model. As for the resistance  $R_{P2}$ , contact and solder imperfections might be responsible for its high value.

For the ILD parameter values, lasers of the same family (3 and 4, 5 and 6) agree with each other, as expected. The agreement is closer between devices 5 and 6, since they are basically the same laser device with different electrical packages. The FP laser has the highest active volume value, which is in agreement with its internal structure. The VCSEL long wavelength laser (device 2) has an active volume that is larger than that of the short wavelength lasers, a consequence of the internal structure of this type of laser.

For simulation purposes the laser parasitic circuit and ILD model are implemented in Verilog-A, which is a suitable format for numerous simulation packages ([9]). With this model and the extracted parameters it is possible to compare the measured transfer function with the results obtained with the model (shown here for the case of a long wavelength VCSEL laser). As can be seen, the model agrees well with the measured data for the transfer function (Figure 4) at several bias currents ( $I_1 \le I_2 \le I_3 \le I_4$ ) and for the eye diagram (Figure 5; blue measured, red simulation).

With this model it is possible to study and optimize the electrical network that connects a laser driver to the laser and the way the bias current is supplied to it. Furthermore, the high magnetic field to which devices are subjected in the particle detector makes it impossible to use ferromagnetic components, so alternative configurations are needed to supply the laser bias current. These include ceramic/air core inductors or microstrip inductors. Both solutions represent a loss of performance compared with ferromagnetic core inductors and need to be studied.

Figure 3: Parasitic & ILD model parameters.

Laser drivers are typically sensitive to signal reflections caused by the impedance mismatch between the driver output and laser input load.



Figure 4: Transfer function of measured laser data and model simulations.



Figure 5: Measured eye diagram and model simulations.

It is possible to trade power transmission for reduced reflections by using a resistive matching network such as the one in Figure 6. This network aims to obtain an impedance  $Z_M \approx Z_0$  by means of a resistor network. The series damping resistors (R2 and R4) serve the dual purpose of damping reflections (which cause output distortion) and creating a stable load. Load stability is improved because the load presented by the laser alone can vary by a significant portion of its nominal value, whereas the combined load presented by the laser and the damping resistor varies by much less if the resistor accounts for the larger share of the series combination. The circuit effectively reduces the reflections into the source, but its resistors cause less power to be delivered to the laser. This is the trade-off: power transmission versus reflection reduction. By reducing the mismatch between the laser input impedance and the laser driver output impedance, we unavoidably also reduce the power that reaches the laser. A compromise is necessary to maximize the power transfer while keeping the interference due to signal reflections below a suitable maximum level.
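
The trade-off can be made concrete with a deliberately reduced example. The sketch below models only a single series damping resistor in front of a purely resistive laser, not the full network of Figure 6, and every component value in it is hypothetical. It computes the reflection coefficient seen by a 50 Ω driver and the fraction of the dissipated power that reaches the laser.

```python
Z0 = 50.0    # characteristic impedance in ohms (assumed driver and line impedance)


def reflection_coefficient(z_load, z0=Z0):
    """Voltage reflection coefficient of a resistive load on a z0 line."""
    return (z_load - z0) / (z_load + z0)


def series_damped(r_laser, r_series):
    """Return (load seen by the driver, fraction of power reaching the laser)
    for a laser resistance in series with a damping resistor."""
    z_total = r_series + r_laser
    return z_total, r_laser / z_total


R_SERIES = 35.0                        # hypothetical damping resistor
LASER_OHMS = (10.0, 15.0, 20.0)        # hypothetical spread of laser resistance

for r_laser in LASER_OHMS:
    z_total, fraction = series_damped(r_laser, R_SERIES)
    print(f"laser {r_laser:4.1f} ohm: "
          f"gamma bare {reflection_coefficient(r_laser):+.2f}, "
          f"gamma damped {reflection_coefficient(z_total):+.2f}, "
          f"power to laser {fraction:.0%}")
```

With these illustrative numbers the laser resistance varies by a factor of two, but the load seen by the driver only moves between 45 and 55 Ω, at the cost of most of the power being dissipated in the damping resistor.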

Figures 7, 8, 9 and 10 show the results of a simulation using a laser model, a Pi-resistive network to minimize the reflections back to the laser driver, and a 4.8 Gbps signal. In Figure 8, the overshoot is the pre-emphasis effect and the undershoot is caused by residual reflections.

The high-frequency behaviour of a laser is significantly affected by the electrical parasitics of the laser die and package. The role of the pre-emphasis (or current peaking), for a laser, is to charge and discharge the parasitic capacitances faster.
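
The effect of current peaking on a parasitic capacitance can be sketched with a one-pole low-pass model. This is an illustration under stated assumptions, not the Verilog-A laser model: the time constant is chosen so that the unboosted 10-90 % rise time is 80 ps, the value quoted in this paper for the case without pre-emphasis, and the boost amplitude and duration are hypothetical and not tuned to reproduce the simulated result.

```python
import math

# One-pole time constant giving an 80 ps 10-90 % rise time (t_r = tau * ln 9).
TAU_PS = 80.0 / math.log(9.0)


def step_response(boost=0.0, boost_ps=0.0, t_end_ps=400.0, dt_ps=0.05):
    """Forward-Euler response of a one-pole low-pass to a unit step.
    The drive is 1 + boost during the first boost_ps, then 1."""
    y = 0.0
    trace = []
    for k in range(int(round(t_end_ps / dt_ps))):
        t = k * dt_ps
        drive = 1.0 + boost if t < boost_ps else 1.0
        y += dt_ps / TAU_PS * (drive - y)
        trace.append((t + dt_ps, y))
    return trace


def rise_time_ps(trace, lo=0.1, hi=0.9):
    """10-90 % rise time relative to the final value of 1."""
    t_lo = next(t for t, y in trace if y >= lo)
    t_hi = next(t for t, y in trace if y >= hi)
    return t_hi - t_lo


plain = step_response()
peaked = step_response(boost=0.5, boost_ps=40.0)   # hypothetical peaking pulse
print(f"without pre-emphasis: {rise_time_ps(plain):.1f} ps")
print(f"with pre-emphasis:    {rise_time_ps(peaked):.1f} ps")
```

The boosted drive charges the capacitance faster during the edge and then returns to the nominal level, so the output settles at the same final value with a shorter rise time.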



Figure 6: Impedance matching circuit.



Figure 7: Eye diagram for the current at network input (without matching circuit).



Figure 8: Eye diagram for the current at network input (with matching circuit).



Figure 9: Eye diagram for the laser output (without preemphasis).





Current peaking allows lasers to be modulated at higher rates without the need to reduce the laser parasitics. By using signal pre-emphasis (Figure 10), the rise and fall times are decreased from 80 ps to 50 ps.

#### V. CONCLUSION

Using simple assumptions, a broadly applicable model was developed that can be used with many different types of semiconductor lasers. The model is modular in order to separate the analysis of the package parasitics from that of the laser parameters. Very good agreement between the model and the measurements was obtained, which is fundamental for a correct study of the design of a robust transceiver with demanding requirements.

The matching network matters for the behaviour of the laser driver, which might fail or lose performance if the level of the reflected signal is too high. For the laser output, these types of matching networks are not optimal, and greater improvements can be achieved with signal pre-emphasis, since resistive matching networks will always reduce the transmitted signal level.

#### REFERENCES

- [1] Hessey, N. (2008), "Overview and Electronics Needs of ATLAS and CMS High Luminosity Upgrades", Proceedings of the TWEPP2008, Naxos, Greece.
- [2] Amaral, L. et al (2008), "Evaluation of Multi-Gbps Optical Transceivers for Use in Future HEP Experiments", Proceedings of the TWEPP2008, Naxos, Greece.
- [3] Tucker, R.; Kaminow, I., "High-frequency characteristics of directly modulated InGaAsP ridge waveguide and buried heterostructure lasers", *Journal of Lightwave Technology*, vol.2, no.4, pp. 385-393, Aug 1984.
- [4] Morton, P. A.; et. al, "Frequency response subtraction for simple measurement of intrinsic laser dynamic properties," IEEE Photon. Technol. Lett., vol. 4, pp. 133–136, Feb. 1992.
- [5] Salgado, H. M.; and O'Reilly, J. J.; "Volterra series analysis of distortion in semiconductor laser diodes," Proc. Inst. Elect. Eng. J., vol. 138, no. 6, pp. 379–382, Dec. 1991.
- [6] Cartledge, J.C.; Srinivasan, R.C. (1997), "Extraction of DFB laser rate equation parameters for system simulation purposes", Journal of Lightwave Technology, vol.15, no.5, pp.852-860.
- [7] Bruensteiner, M.; Papen, G.C. (1999), "Extraction of VCSEL rate-equation parameters for low-bias system simulation," IEEE Journal of Selected Topics in Quantum Electronics, vol.5, no.3, pp.487-494.
- [8] Blokhin, S. A. et al (2006), "Experimental Study of Temperature Dependence of Threshold Characteristics in Semiconductor VCSELs Based on Submonolayer InGaAs QDs", Journal of Semiconductors, Vol. 40, pp1232-1236.
- [9] Silva, Sergio; Salgado, Henrique M. (2009), "VCSEL laser characterization and modelling for future optical transceiver at the super large hadron collider", Proceedings of the 11th International Conference on Transparent Optical Networks (ICTON '09), pp.1-5, June 28-July 2, 2009.
- [10] Fukushima, T. et al (1991 b), "Relative intensity noise reduction in InGaAs/InP multiple quantum well lasers with low nonlinear damping", IEEE Photonics Technology Letters, vol. 3, no. 8, pp.688-693, August, 1991.

# Presentation of the "ROC" Chips Readout

F. Dulucq<sup>a</sup>, C. de La Taille<sup>a</sup>, G. Martin-Chassard<sup>a</sup>

<sup>a</sup> IN2P3/LAL/OMEGA, UPS11-Bat200, Orsay, France

dulucq@lal.in2p3.fr

#### Abstract

The OMEGA group at LAL has designed 3 chips for ILC calorimeters: one analog (SPIROC) and one digital (HARDROC) for the hadronic one and also one for the electromagnetic one (SKIROC). The readout and the management of these different chips will be explained.

To minimize the number of lines between the ASICs and the DAQ, the readout uses 2 lines that are common to all the chips: Data and TransmitOn. As the chips are daisy-chained, each chip talks to the DAQ in turn. When one chip has finished its readout, it starts the readout of the next chip in the chain. Moreover, during this readout, only the chip that is talking to the DAQ is powered; this is handled by the POD (Power On Digital) module in the ASIC. In the ILC mode, the readout sequence is active during inter bunch crossing (like the ADC conversion).

Another chip, designed for the PMM2 R&D program (PARISROC), integrates a new selective readout: only hit channels are sent to the DAQ, in a completely autonomous mode.
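
The daisy-chained readout can be pictured with a small behavioural model. This is a sketch of the sequencing only, with hypothetical class, field and function names; it does not describe the actual signal levels or timing of the Data and TransmitOn lines.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    payload: list            # words waiting in the chip memory
    powered: bool = False    # state of the digital part (POD)


def read_out_chain(chips):
    """Read the chips one after the other over the shared lines.
    Each log entry records which chip drives TransmitOn, the word on Data,
    and which chips are powered at that moment."""
    bus_log = []
    for chip in chips:                      # readout passes down the chain
        chip.powered = True                 # only the talking chip is powered
        for word in chip.payload:
            powered_now = [c.name for c in chips if c.powered]
            bus_log.append({"TransmitOn": chip.name,
                            "Data": word,
                            "powered": powered_now})
        chip.payload = []
        chip.powered = False                # power down, then start the next chip
    return bus_log


chain = [Chip("chip0", [0x11, 0x12]), Chip("chip1", []), Chip("chip2", [0x31])]
log = read_out_chain(chain)
for entry in log:
    print(entry)
```

A chip with nothing to send, such as the hypothetical `chip1` above, simply hands the readout on to the next chip without occupying the shared lines.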

#### I. GENERAL OVERVIEW

#### A. Some ROC chips and their applications

#### 1) MAROC:

MAROC (Multi-Anode ReadOut Chip) is designed to read multi-anode photomultipliers [1] of the ATLAS luminometer made of Roman pots (Figure 1).



Figure 1: MAROC chip layout

#### 2) SKIROC:

SKIROC (Silikon Kalorimeter ReadOut Chip) has been designed to read-out the upcoming generation of Si-W calorimeter featuring ILC requirements (Figure 2).

| A BEASESEESE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                 |                   | 122301       | 10,000,000               | 120.000         |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------|-------------------|--------------|--------------------------|-----------------|
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                 | 888888 E          |              | 81 M -                   |                 |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                 |                   | 122          | 21 2 A                   |                 |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                 |                   |              |                          |                 |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                 |                   |              | 2018-                    |                 |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                 |                   |              | 10-14-                   |                 |
| A DESCRIPTION OF TAXABLE PARTY OF TAXABL |                   | والمراسبينية | Conception in succession |                 |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                 |                   |              |                          |                 |
| Contraction of the local division of the loc | College States    | - HOLES      | in lightin               |                 |
| dia a man                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                 |                   |              |                          | and the second  |
| Constantine and                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                 |                   |              |                          | ACCOUNT OF LALE |

Figure 2: SKIROC chip layout

#### 3) HARDROC:

HARDROC (HAdronic Rpc Detector ReadOut Chip) is the front end chip [2] designed for the readout of the RPC or GEM foreseen for the Digital HAdronic CALorimeter of the future ILC (Figure 3).



Figure 3: HARDROC chip layout

#### 4) SPIROC:

SPIROC (Silicon Photomultiplier Integrated ReadOut Chip) is a dedicated front-end electronics [3] for an ILC prototype of hadronic calorimeter with Silicon photomultiplier readout (Figure 4).



Figure 4: SPIROC chip layout

#### 5) PARISROC:

PARISROC (Photomultiplier ARray Integrated in SiGe ReadOut Chip) is the front-end ASIC designed for the PMM2 R&D project dedicated to neutrino experiments (Figure 5).



Figure 5: PARISROC chip layout

#### B. High level working

The ROC chips fall into two groups, analog and digital, depending on whether the detector signal is first stored in analog form or directly in digital form.

In analog chips, discriminated analog signals are first stored in an analog memory (the SCA, switched capacitor array) and then converted into digital words by an ADC. These digital values are stored in a RAM and read out at the end of the acquisition cycle.

In digital chips, no ADC is needed because data are saved directly into the RAM. Both cases are shown in Figure 6.



Figure 6: High level working

#### C. Main analog block

The analog chips are based on a Switched Capacitor Array (SCA) and can manage up to 64 channels. Fine time measurement is available, depending on the application and experiment. The main architecture of the analog part is given in Figure 7.



Figure 7: Main analog block

Figure 8 shows an example of the behaviour of a "Track & Hold Cell", which locks the capacitor value at the maximum of the analog signal.



Figure 8: Behaviour of Track and Hold cell

#### II. TIMING CONSIDERATIONS

#### A. Global overview

Depending on the application, the acquisition module is not active all the time. For example, with a bunch crossing train sequence such as that of the future ILC, acquisition is stopped after each train (Figure 9). This is the case for the SKIROC, SPIROC and HARDROC chips for the ILC calorimeters.



Figure 9: ILC sequences

In neutrino experiments, acquisition is never stopped. This is the case for the PARISROC chip, which can keep acquisition active all the time. During its conversion and readout phases, discriminated analog signals can still be stored in the SCA as long as it is not full.

#### B. Future ILC requirements

The future ILC is based on a 200 ms bunch crossing train period (Figure 10). For the front-end electronics, the digital part of the acquisition is active only during the bunch crossing train, while conversion and readout take place between trains.



Acquisition and conversion modules are active at the same time for all the ASICs, so their power-on can be managed by the DAQ. However, because the readout is daisy chained, power-on during readout must be managed by the daisy chain. Table 1 below gives the maximum duration of each phase.

In each 200 ms cycle, an ASIC should be powered for only 8 ms (4% of the cycle), which allows the power budget to be met. The POD module was designed to fulfil this requirement for the digital part.

Table 1: ASIC timings

| Phase       | Duration | Comments                      |
|-------------|----------|-------------------------------|
| Acquisition | 1ms      | Bunch crossing train duration |
| Conversion  | 3ms      | Worst case (32 conversions)   |
| Readout     | 4ms      | 5 MHz readout clock           |
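The 8 ms and 4% figures quoted above are simply the sum of the worst-case phase durations in Table 1 divided by the 200 ms train period. A minimal Python sketch of that arithmetic follows; the variable and function names are illustrative and do not come from the chip documentation.

```python
# Worst-case phase durations from Table 1, in milliseconds.
PHASES_MS = {"acquisition": 1.0, "conversion": 3.0, "readout": 4.0}
TRAIN_PERIOD_MS = 200.0  # ILC bunch crossing train period


def duty_cycle(phases_ms, period_ms):
    """Return (powered time in ms, fraction of the period spent powered)."""
    on_time = sum(phases_ms.values())
    return on_time, on_time / period_ms


on_ms, fraction = duty_cycle(PHASES_MS, TRAIN_PERIOD_MS)
print(f"powered {on_ms:.0f} ms per cycle, {fraction:.0%} of the period")
```

The complementary fraction, 96%, is the share of the cycle during which the clocks can stay stopped.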

#### III. POWER ON DIGITAL (POD) MODULE

#### A. Block diagram

The Power On Digital (POD) module has been designed to meet the ILC power budget for front-end electronics. It starts and stops the clocks according to the 3 phases (acquisition, conversion and readout).

In addition to starting and stopping the clocks, it manages the bias current of the LVDS receiver, that is, its power supply. The combination of clock gating and LVDS management allows the ILC power budget to be met (see Figure 11).



The POD module is divided into 3 parts. The first is activated and managed by the DAQ during the common phases, acquisition and conversion. The second is set during the readout by the daisy-chained token. The third manages the LVDS receiver.

#### B. Layout

The POD has been laid out with standard 3-b cells from AMS (Austria Micro System). The main layout features are given in Table 2.

Table 2: POD layout features

| Feature   | Value                |
|-----------|----------------------|
| Area      | 120 µm x 80 µm       |
| Flip flop | 8 FF                 |
| Layout    | 2 metals (M1 and M2) |
| Frequency | Up to 40 MHz         |

In addition, power supply pins (Vdd, Gnd and Vss) are accessible at each corner of the module. This module has been integrated in HARDROC chips from revision 2 onwards (Figure 12).



#### C. Detailed working

The timing diagram below (Figure 13) represents the complete sequence driven by the DAQ. It shows the 3 phases and how they are managed. The "clock stopped" zones correspond to the periods when the POD is off (not to scale). As mentioned above, these periods represent 96% of the time and allow the power budget requirement (25 µW per channel) to be met. For each phase, the clock is started asynchronously, then enabled and stopped synchronously (idle state at logic '0').



#### 1) Acquisition and conversion phases

PowerON is set during the reset phase before each acquisition. It starts the LVDS receiver and, consequently, the clocks. Once the clocks are established, the reset can be released; this happens after the reset startup time, which is about 200 ns. This is why the reset duration must be longer than the LVDS wakeup time (Figure 14).



PowerON is released at the end of the conversion. It is synchronized internally so that the clocks stop cleanly. The effective PowerON release occurs after a few clock ticks (Figure 15).



PowerON is set asynchronously by the DAQ during the reset state and released synchronously by the POD in each chip.
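The asynchronous set and synchronous release of PowerON can be illustrated with a small behavioural model. The sketch below is a Python toy, not the POD's actual logic: the class name, the method names and the two-stage synchronizer depth are assumptions, chosen only to reproduce a release delay of a few clock ticks.

```python
class PowerOnSync:
    """Toy model of a power-on flag with asynchronous set, synchronous release."""

    def __init__(self, stages=2):
        # 'stages' is an assumed synchronizer depth, not a value from the chip.
        self.request = 0
        self.sync = [0] * stages
        self.effective = 0

    def set_request(self, level):
        """Apply the DAQ PowerON level; a high level takes effect immediately."""
        self.request = level
        if level:
            self.sync = [1] * len(self.sync)
            self.effective = 1

    def clock_tick(self):
        """One rising clock edge: shift the request through the synchronizer."""
        self.sync = [self.request] + self.sync[:-1]
        self.effective = self.sync[-1]
        return self.effective


pod = PowerOnSync(stages=2)
pod.set_request(1)           # DAQ raises PowerON during reset
started = pod.effective      # 1: enabled without waiting for a clock edge
pod.set_request(0)           # DAQ releases PowerON after conversion
trace = [pod.clock_tick() for _ in range(3)]
print(started, trace)        # prints: 1 [1, 0, 0]
```

With two stages the flag stays high for one more edge and drops on the second, so the clocks are never cut in the middle of a cycle.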

#### 2) Readout phase

During daisy-chained readout, power-on is triggered by the previous chip through its calibrated EndReadout signal, which serves as the StartReadout of the next chip in the chain. This signal starts the LVDS receiver and then, synchronously, the clocks. Finally, it generates an internal StartReadout for the state machines (Figure 16).



At the end of the readout, the clocks and LVDS receivers are stopped synchronously (Figure 17). The effective PowerON release occurs after a few clock ticks (2-3).



#### IV. PARISROC NEW READOUT

PARISROC [4] integrates a state machine that controls the 3 phases, which makes fully autonomous operation possible. Moreover, compared to the other ROC chips, it introduces a new channel management scheme in which channels are completely independent: when a channel is hit, its ADC conversion is started, followed by the readout of that channel. The readout only processes hit channels, which is why this module tags each frame with its channel number.

During conversion and readout, acquisition is never stopped: triggers are stacked into SCA and treated as soon as possible (Figure 17).
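The idea of reading out only hit channels and tagging each frame with its channel number can be sketched as follows. This is a hypothetical Python illustration: the `Frame` layout, the ordering of frames by channel number and the example ADC values are assumptions, since the actual PARISROC frame format is not described here.

```python
from collections import namedtuple

# Hypothetical frame layout; the real PARISROC frame format is not given here.
Frame = namedtuple("Frame", ["channel", "adc_value"])


def read_out(hits):
    """Build frames for hit channels only.

    'hits' maps a channel number to its converted ADC value. Channels that
    were not hit are simply absent, so they produce no frame.
    """
    return [Frame(channel, value) for channel, value in sorted(hits.items())]


frames = read_out({5: 812, 0: 1023, 12: 77})
for frame in frames:
    print(frame.channel, frame.adc_value)
```

Because every frame carries its own channel number, the receiver can reconstruct which channels fired without a fixed-length frame covering all channels.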



#### V. REFERENCES

[1] Barrillon P. et al., "MAROC: Multi-Anode ReadOut Chip for MaPMTs", *Nuclear Science Symposium Conference Record*, 2006. *IEEE*, vol.2, pp.809-814, Oct. 29 2006-Nov. 1 2006

[2] Seguin-Moreau N. et al., "HARDROC1, Readout chip of the Digital Hadronic Calorimeter of ILC", *TWEPP 2007 Conference Proceedings*

[3] Raux L. et al., "SPIROC (SiPM Integrated Read-Out Chip): Dedicated very front-end electronics for an ILC prototype hadronic calorimeter with SiPM read-out", *TWEPP 2008 Conference Proceedings* 

[4] Dulucq F. et al., "Digital part of PARISROC: A photomultiplier array readout chip", *Nuclear Science Symposium Conference Record*, 2008. NSS 08. IEEE, pp.2002-2005, 19-25 Oct. 2008

# Position Measurements with Micro-Channel Plates and Transmission lines using Pico-second Timing and Waveform Analysis

Bernhard Adams<sup>a</sup>, Klaus Attenkofer<sup>a</sup>, Mircea Bogdan<sup>b</sup>, Karen Byrum<sup>a</sup>, Jean-Francois C. Genat<sup>b</sup>, Herve Grabas<sup>c</sup>, Henry J. Frisch<sup>b</sup>, Heejong Kim<sup>b</sup>, Mary K. Heintz<sup>b</sup>, Tyler Natoli<sup>b</sup>, Richard Northrop<sup>b</sup>, Eric Oberla<sup>b</sup>, Samuel Meehan<sup>b</sup>, Edward N. May<sup>a</sup>, Robert Stanek<sup>a</sup>, Fukun Tang<sup>b</sup>, Gary Varner<sup>d</sup> and Eugene Yurtsev<sup>a</sup>

<sup>a</sup>Argonne National Laboratory, 9700, S. Cass Avenue, Argonne, IL 60439, USA
<sup>b</sup>University of Chicago, Enrico Fermi Institute, 5640 S. Ellis Av, Chicago IL 60637, USA
<sup>c</sup>Ecole Superieure d'Electricite, Gif-sur-Yvette, 91190, France
<sup>d</sup>University of Hawai'i, 2500 Campus Road, Honolulu, HI 96822, USA

genat@hep.uchicago.edu

#### Abstract

The anodes of Micro-Channel Plate devices are coupled to fast transmission lines in order to reduce the number of electronics readout channels, and can provide two-dimensional position measurements using delay timing at the two ends of each line. Tests with a laser and digital waveform analysis show that resolutions of a few hundred microns along the transmission line can be reached, taking advantage of timing estimates at the level of a few picoseconds. This technique is planned to be used in Micro-Channel Plate devices integrating the transmission lines as anodes.

#### I. INTRODUCTION

Delay-line readout with picosecond timing resolution allows the impact position of a particle along a detector to be measured with a precision better than one millimetre. The time-distance relation is:

$$\Delta t = 2 \Delta x / v$$

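where $\Delta t$ is the change in the difference between the arrival times at the two ends of the line for a displacement $\Delta x$ of the impact point, and $v$ is the propagation velocity of the pulse along the line.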
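Inverting the relation gives $\Delta x = v \Delta t / 2$, which shows how a timing resolution of a few picoseconds translates into sub-millimetre position resolution. The Python sketch below works through one example; the function name is illustrative, the 5 ps/mm propagation delay is the value predicted by the simulations reported later in this paper, and the 2 ps timing difference is an assumed value chosen only for illustration.

```python
def position_offset_mm(delta_t_ps, delay_ps_per_mm):
    """Invert delta_t = 2 * delta_x / v, with v = 1 / delay_ps_per_mm."""
    velocity_mm_per_ps = 1.0 / delay_ps_per_mm
    return velocity_mm_per_ps * delta_t_ps / 2.0


# 5 ps/mm: propagation delay predicted by the simulations in this paper.
# 2 ps: illustrative timing difference, not a measured value.
offset = position_offset_mm(delta_t_ps=2.0, delay_ps_per_mm=5.0)
print(f"{offset * 1000:.0f} um")  # prints: 200 um
```

Under these assumptions a 2 ps timing difference corresponds to 200 µm, in line with the resolution of a few hundred microns quoted in the abstract.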

As some photo-detector applications would cover tens of square meters, it is also important to reduce the number of electronics channels. Delay lines coupled to the detector and read at their two ends can reduce this number significantly compared to fully pixelated detectors such as the regular Micro-Channel Plates available from industry. The transmission lines could also be integrated with the photo-detector itself in order to reduce the physical dimensions and power, increase the analog bandwidth, improve the readout speed, and provide all-digital data output when equipped with custom-designed readout Application Specific Integrated Circuits (ASICs).

We present in this work position measurement results obtained with Micro-Channel Plate detectors tied to 50  $\Omega$ transmission lines implemented on high-frequency printed circuit boards, read with a fast digital oscilloscope.

#### II. EXPERIMENTAL SET-UP

Micro-Channel Plate tubes XP85011 and XP85022 from Photonis (25 and 10 µm pore size, respectively), 2 inch x 2 inch in size, have been used in this work. A 25 µm MCP with 1024 anode pads is shown in Figure 1. These MCPs have been connected to 50 $\Omega$ transmission-line printed circuit cards and tested using a 408 nm laser focused on the entrance window of the MCP. The number of amplified photoelectrons is evaluated using a photo-multiplier sensitive to single photo-electrons. Once the velocity along the 10 cm-long transmission line is determined, the position along the line is derived from the difference in delays between the two ends of the card. Since the signals at the two ends originate from the same pulse at the output of the MCP, their shapes are strongly correlated. A waveform analysis using least-squares fits to a known template signal, derived from the averaged measurements, then allows the time of arrival of the pulse at the two ends of the line to be extracted.
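The template-fit step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis code used for the measurements: the Gaussian `template` stands in for the averaged measured pulse, the grid scan over candidate shifts is one simple way to minimise the least-squares residual, and the amplitudes, pulse width and 12 ps offset are synthetic. Only the 50 ps sample spacing (20 GSa/s) is taken from the text.

```python
import math


def template(t_ps, width_ps=300.0):
    """Stand-in pulse shape (Gaussian); the paper uses an averaged measured pulse."""
    return math.exp(-0.5 * (t_ps / width_ps) ** 2)


def fit_arrival_time(times_ps, samples, candidates_ps):
    """Return the candidate shift that minimises the least-squares residual.

    For a given shift the best-fit amplitude has a closed form, so only the
    arrival time needs to be scanned.
    """
    best_shift, best_chi2 = None, float("inf")
    sum_yy = sum(y * y for y in samples)
    for shift in candidates_ps:
        shape = [template(t - shift) for t in times_ps]
        sum_ss = sum(s * s for s in shape)
        sum_ys = sum(y * s for y, s in zip(samples, shape))
        chi2 = sum_yy - sum_ys ** 2 / sum_ss  # residual at the best amplitude
        if chi2 < best_chi2:
            best_shift, best_chi2 = shift, chi2
    return best_shift


# Synthetic check: 50 ps sampling (20 GSa/s), pulses arriving 12 ps apart.
times = [50.0 * i for i in range(60)]
left = [30.0 * template(t - 1500.0) for t in times]
right = [30.0 * template(t - 1512.0) for t in times]
grid = [1400.0 + i for i in range(201)]  # candidate arrival times, 1 ps steps
dt = fit_arrival_time(times, right, grid) - fit_arrival_time(times, left, grid)
print(dt)  # prints: 12.0
```

Because the whole waveform constrains the fit, the arrival time can be resolved on a grid much finer than the 50 ps sample spacing; with real, noisy data the achievable precision depends on the signal-to-noise ratio.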

A transmission-line printed circuit card has been designed using an RF ceramic substrate, allowing bandwidths of up to 3 GHz to be reached. The tubes with 32 x 32 anodes have been glued to the transmission-line card with conducting silver epoxy. Electrical tests and tests on a calibrated laser test stand have been performed. Both the 25 and 10 µm pore MCPs have been illuminated with a calibrated 408 nm laser source and characterized in terms of signal waveform, gain, and timing resolution.

#### III. MICRO-CHANNEL PLATE SIGNALS

Typical MCP signals measured at the two ends of a transmission line are shown in Figure 2 for a tube with 25 µm pores, for an input laser signal corresponding to 18 photoelectrons. The high voltage is set between 1.7 and 2.5 kV, depending mainly upon the pore size, and the tube is connected to the 50 $\Omega$ transmission lines on the printed circuit card. Each line reads a row of 32 anode pads at 1.6 mm pitch.



Figure 1: A 1024-anode tube from Photonis. Bottom view showing the anode outputs.

Each end is loaded with 50 $\Omega$ at the inputs of a fast sampling oscilloscope (Tektronix TDS6154C), which records the signals at a 20 GSa/s sampling rate with a 9 GHz analog bandwidth. In this particular case, the rise time is of the order of half a nanosecond and the amplitude of the order of a few tens of millivolts. Table 1 shows the measured amplitudes and rise times for MCPs with 25 and 10 µm pores, at voltages of 2.0 and 2.5 kV respectively, illuminated by laser light providing 18, 50, and 158 photo-electrons.

The laser test bench has been calibrated using a Quantacon Burle 8500 single electron resolution photo-multiplier. The laser was a PLP-10 from Hamamatsu, equipped with a 408 nm head. The light pulse duration is specified to be 70 ps (FWHM).

It has been previously reported that a transmission line load allows keeping the intrinsic current pulse waveform from the micro-channel plate, compared to readout where all pads would be tied together, due to a significant reduction of the capacitances and inductances seen from each anode [1, 2].

A transmission-line readout card has been implemented on a printed circuit board (4350B from Rogers) with 32 parallel transmission lines of 50  $\Omega$  impedance at a 1.6 mm pitch, each reading one row of anode pads at the back of the tube (Figures 3, 4 and 5).

Each transmission line on the readout card is glued with conducting silver epoxy to the associated row of 32 anodes. Only 6 of the 32 transmission lines were brought out to SMA connectors at the edge of the card for testing purposes. The remaining lines were terminated at each end in 50 $\Omega$. The 32 anode pads of the MCP are stub-tied evenly over 2 inches, each pad contributing a capacitance of approximately 100 fF to the line. The lossy transmission line model used in the simulation was extracted from a layout of the printed circuit board by the HyperLynx simulator (Mentor Graphics).



Figure 2: Signals from a Photonis XP85011 micro-channel plate photo-detector with 25 µm diameter pores, recorded with a Tektronix TDS6154C oscilloscope using a calibrated laser test stand. The signal corresponds to 50 photo-electrons, with a signal-to-noise ratio (average amplitude over rms noise) of 38 dB. The oscilloscope analog bandwidth is 9 GHz, the sampling rate is 20 GSa/s, and the horizontal and vertical scales are 2.5 ns/division and 5 mV/division, respectively.

Table 1. MCP signal amplitudes for 18, 50, and 158 photo-electrons, for Photonis Micro-Channel Plates with 25 and 10 $\mu$m pores, read out with a 50 $\Omega$ transmission-line card.

| Photo-electrons | 25 µm pores, 2.0 kV (mV) | 10 µm pores, 2.5 kV (mV) |
|-----------------|--------------------------|--------------------------|
| 18              | 25                       | 68                       |
| 50              | 35                       | 100                      |
| 158             | 78                       | 224                      |



Figure 3: The transmission line card equipped with a 25 µm MCP.

Figures 6 and 7 show the simulation results. An input pulse with a 100 ps rise time is applied to the center pad. The signal-integrity simulation was set up with the equivalent representation shown in Figure 5. The input signal can be applied to any of the 32 anodes. The output voltage pulses are obtained at the 50 $\Omega$ terminations at each end. The green curve is the input pulse on the center pad; the red is the output pulse at the left-end termination; the blue is the output pulse at the right-end termination. The observed reflections on the input and output pulses are due to impedance discontinuities along the transmission line caused by the 32 stub-loaded capacitances of the MCP anodes. A propagation time constant of 5 ps/mm was predicted from the simulations.

Figure 7 shows a simulation of a pulse propagated through a transmission line of 1 m length. From Figure 7, a propagation delay of 5 ps/mm is predicted: a 16 ps time delay between the left and right outputs was observed, since the line length on the left is 1.6 mm shorter than the line length on the right side.



Figure 4: The  $50\Omega$  transmission lines card.



Figure 5: Electrical equivalent of the transmission line MCP readout.

The simulation shows that the transmission-line has an analog bandwidth of 3.5 GHz, well-matched to the output bandwidth of a tube with a rise-time of 100 ps.

### IV. TIMING TECHNIQUES

There are a number of techniques to measure the arrival time of very fast electrical pulses [3-7, 13]. Typically one measures the time at which the pulse crosses a single threshold, or, for better resolution, the time at which the pulse reaches a constant fraction of its amplitude [7]. An extension of the threshold method is to measure the times at which a pulse crosses multiple thresholds. A recent development is the large-scale implementation of fast analog waveform sampling onto arrays of storage capacitors using CMOS integrated circuits at rates on the order of a few GSa/s. Most, if not all, of these circuits actually have 3-dB analog bandwidths below 1 GHz [8-11]. The steady decrease in feature size and power for custom integrated circuits now opens the possibility for multi-channel chips with multi-GHz analog bandwidths, able to sample between 10 and 100 GHz, providing both time and amplitude after processing.

Assuming that the signals are recorded over a time interval from before the pulse to after the peak of the pulse with sufficient samples, the fast waveform sampling provides the information to get the time of arrival of the first photoelectrons, the shape of the leading edge, and the amplitude and integrated charge. While other techniques can give time, amplitude, or integrated charge, fast sampling has the advantage that it collects all the information, and so can support corrections for pileup, baseline shifts before the pulse, and filtering for noisy or misshapen pulses.



Figure 6: Simulation of the transmission line using the model of Figure 5. A current signal with a 100 ps rise time is injected at the center of the transmission line (pad 16).

In addition, this method is not sensitive to base-line shifts due to 'pile-up', the overlap of a pulse with a preceding one or many, a situation common in high-rate environments such as in collider applications.



Figure 7: Simulation of the propagation of a pulse through a 1m transmission line. Red curve: the input pulse, green curve: the output on the left side, blue curve, the output on the right side.

Also, for applications in which one is searching for rare events with anomalous times, the single measured time does not give indications of possible anomalous pulse shapes due to intermittent noise, rare environmental artifacts, and other rare but real annoyances common in real experiments. In contrast, constant fraction discrimination takes into account only the pulse amplitude. The most commonly used constant fraction discriminator technique forms the difference between attenuated and delayed versions of the original signal, followed by the detection of the zero crossing of the difference signal. There are therefore three parameters: the delay, the attenuation ratio, and the threshold. These parameters have to be carefully set with respect to the pulse characteristics in order to obtain the best timing resolution.
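The constant-fraction technique described above can be illustrated on sampled data with a minimal Python sketch. The function names `cfd_time` and `triangle_pulse`, the triangular test pulse, and all parameter values are assumptions chosen for illustration; they are not taken from the measurements reported here.

```python
def cfd_time(samples, dt, delay_samples, fraction, threshold):
    """Return the constant-fraction crossing time of a positive pulse, or None.

    samples       : amplitudes sampled every `dt`
    delay_samples : delay applied to the signal, in samples
    fraction      : attenuation ratio applied to the undelayed signal
    threshold     : arming level; smaller pulses are ignored
    """
    if max(samples) < threshold:
        return None
    # Difference signal: delayed copy minus attenuated copy.
    diff = []
    for i in range(len(samples)):
        delayed = samples[i - delay_samples] if i >= delay_samples else 0.0
        diff.append(delayed - fraction * samples[i])
    # First negative-to-positive zero crossing, linearly interpolated.
    for i in range(1, len(diff)):
        if diff[i - 1] < 0.0 <= diff[i]:
            frac = -diff[i - 1] / (diff[i] - diff[i - 1])
            return (i - 1 + frac) * dt
    return None


def triangle_pulse(amplitude, n=60, start=10, peak=20, end=40):
    """Hypothetical test pulse: linear rise from `start` to `peak`, linear fall to `end`."""
    out = []
    for i in range(n):
        if start <= i <= peak:
            out.append(amplitude * (i - start) / (peak - start))
        elif peak < i < end:
            out.append(amplitude * (end - i) / (end - peak))
        else:
            out.append(0.0)
    return out


DT_NS = 0.05  # 50 ps per sample, i.e. 20 GSa/s
t_small = cfd_time(triangle_pulse(1.0), DT_NS, 4, 0.5, 0.02)
t_large = cfd_time(triangle_pulse(3.0), DT_NS, 4, 0.5, 0.02)
```

In this idealized case the two pulses differ in amplitude by a factor of three but yield the same crossing time, which is the property that motivates the method. The three parameters named in the text appear as `delay_samples`, `fraction` and `threshold`.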

Waveform sampling stores successive values of the pulse waveform. For precision time-of-arrival measurements, such as considered here, one needs to fully sample at least the waveform leading edge over the peak. The sampling method is unique among the four methods in providing the pulse amplitude, the integrated charge, and figures of merit on the pulse-shape and baseline, important for detecting pile-up or spurious pulses. An iterative least-squares fit making use of a noiseless MCP template signal is then applied to the data using an algorithm that has been implemented for high resolution calorimetry measurements with Liquid Argon detectors [12,13].

#### V. RESULTS

Using the calibrated test bench, it has been possible to illuminate MCPs mounted on transmission lines cards as shown in Figure 4 with a controlled amount of light (408 nm), at given rates, over the whole sensitive area of the MCP photocathode.

Moving the laser light spot along a transmission line and recording the signal at the two ends of the line allows measuring both the instant (average of the two times of arrival) and the position (difference of the two times of arrival) of the impact on the photocathode. The distribution of the instants allows measuring the timing resolution of the MCP, and the distribution of the differences provides the position resolution as described below.
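The reconstruction of instant and position from the two arrival times can be sketched as follows. This is a minimal illustration assuming an ideal uniform line with a constant propagation delay per unit length; the function name `reconstruct`, the 50.8 mm line length and the example values are hypothetical.

```python
def reconstruct(t_left_ps, t_right_ps, k_ps_per_mm, line_length_mm):
    """Return (event_time_ps, position_mm) from the arrival times at both ends.

    The position is measured from the left end of the line.
    Model: t_left = t0 + k*x and t_right = t0 + k*(L - x).
    """
    # The sum of the two flight times is fixed by the line length, so the
    # average gives the emission time plus a constant offset k*L/2.
    event_time = 0.5 * (t_left_ps + t_right_ps) - 0.5 * k_ps_per_mm * line_length_mm
    # The difference changes by 2*k for every mm the hit moves along the line.
    position = 0.5 * (line_length_mm + (t_left_ps - t_right_ps) / k_ps_per_mm)
    return event_time, position


# Hypothetical hit 20 mm from the left end at t0 = 100 ps, k = 7.6 ps/mm.
K = 7.6
LENGTH_MM = 50.8
t_left = 100.0 + K * 20.0
t_right = 100.0 + K * (LENGTH_MM - 20.0)
event_time, position = reconstruct(t_left, t_right, K, LENGTH_MM)
```

Because the event time enters both arrival times identically, it cancels in the difference, so any jitter common to both ends does not affect the position estimate.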

The spread (RMS) is on the order of 30 ps for each distribution, corresponding to the sum of different contributions: MCP transit time spread, laser jitter, oscilloscope trigger jitter, electronics system noise. Figure 8 shows 80 pulses from a 25  $\mu$ m MCP recorded at 20 GSa/s at the two ends of the central transmission line. The oscilloscope was triggered by a pulse synchronous with the laser, and the two pulses were recorded on the two traces of the same trigger frame.

Figure 9 shows the histogram of the difference of the times of arrival, deduced from the sampled data using the timing extraction technique described above. The spread is on the order of a few pico-seconds, as the two signals are strongly correlated, since they originate from the same current pulse injected by the MCP's anodes at the same location in the transmission line.

The electronics system noise is the only contribution to this spread, as all others cancel out. At 158 photo-electrons, the position spread has been found to be 124 $\mu$m, and the differential time spread to be 2.3 ps, for the 10 $\mu$m MCP at 2.5 kV. The propagation constants have been measured to be 7.6 ps/mm for the 25 $\mu$m MCP and 9.3 ps/mm for the 10 $\mu$m MCP, at 2.0 and 2.5 kV respectively.
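The quoted time and position spreads can be related by a short calculation. The sketch below assumes that a displacement along the line changes the time difference by twice the propagation constant per unit length (one end sees the pulse earlier, the other later); this factor of two is an interpretation and is not stated explicitly in the text, and the function name is hypothetical.

```python
def position_spread_um(sigma_dt_ps, k_ps_per_mm):
    """Convert an RMS spread of the time difference into an RMS position spread."""
    # A displacement dx changes the time difference by 2 * k * dx.
    return 1000.0 * sigma_dt_ps / (2.0 * k_ps_per_mm)


# 10 um MCP at 2.5 kV: 2.3 ps differential time spread, 9.3 ps/mm.
spread = position_spread_um(2.3, 9.3)
print(round(spread))  # prints 124
```

Under this assumption, the 2.3 ps differential time spread and the 9.3 ps/mm propagation constant reproduce the quoted position spread of 124 µm.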

Figure 10 shows the measured position resolution versus the high voltage for the 25 $\mu$m MCP and a light input corresponding to 158 photo-electrons. Figure 11 shows the position spread versus the position of the light spot along the transmission lines, for both MCPs, at 18 and 50 photo-electrons.



Figure 8: Traces from 80 measured pulses at one end of the central transmission line. The large spread in time of arrival is mainly due to the laser jitter triggering the oscilloscope.

#### VI. SUMMARY AND CONCLUSIONS

We have shown that Micro-Channel Plate detectors coupled to fast transmission lines read with waveform sampling can measure the position along the lines with accuracy well below 1 mm.

The measurements agree well with simulations based on templates derived from real signal shapes and theoretical modeling of the electrical characteristics of the transmission line. The readout scheme of transmission lines with impedance-matched waveform sampling at each end allows using MCP-based photo-detectors for large-area sensors in which several devices could be read in series, significantly reducing the number of electronics channels and, consequently, the power, the on-detector material, and the amount of data.



Figure 9: Histogram of the measured differences in times of arrival, deduced from the sampled data processed with the algorithm from [12]. The RMS is 2.3 ps, corresponding to a position spread of 125 µm.



Figure 10: Measured position resolution versus high voltage for the 25 µm MCP, for a light input corresponding to 158 photo-electrons.



Figure 11: Measured position resolution versus position at 18 and 50 Photo-electrons between 5 and 45 mm from the side of the MCP.

#### **ACKNOWLEDGEMENTS**

We are indebted to Dominique Breton, Eric Delagnes, Stefan Ritt, and Gary Varner and their groups working on fast waveform sampling. We thank John Anderson, Gary Drake, Andrew Kobach, Jon Howorth, Keith Jenkins, Patrick Le Du, Richard Northrop, Erik Ramberg, Anatoly Ronzhin, Larry Ruckman, Greg Sellberg, and Jerry Va'Vra for valuable contributions. We thank Paul Hink and Paul Mitchell of Photonis for much help with the MCP's, and Larry Haga of Tektronix Corporation for his help in acquiring the Tektronix TDS6154C.

#### VII. REFERENCES

[1] C. Ertley, J. Anderson, K. Byrum, G. Drake, E. May, http://www.hep.anl.gov/ertley/windex.html

[2] F. Tang, J. Anderson, K. Byrum, G. Drake, C. Ertley, H.J. Frisch, J-F. Genat, "Transmission line readout with Good Time and Space resolutions for Planacon MCP-PMT"; *Proceedings of the TWEPP-08 Workshop*, Naxos, Greece, 2008, pp. 579-581

[3] D.I. Porat, "Review Of Sub-nanosecond Time Interval Measurements"; IEEE Trans. Nucl. Sci. 1973, pp. 36-.

[4] J-F. Genat, "High Resolution Time to Digital Converters"; Nucl. Inst. Meth. A315, 1992, pp. 411-414.

[5] J. Kalisz, "Review of Methods for Time Interval Measurements with Picosecond Resolution"; Institute of Physics Publishing, *Metrologia*, 41, 2004, pp. 17-32.

[6] An extensive list of references on timing measurements can be found in: A. Mantyniemi, MS Thesis, Univ. of Oulu, 2004; ISBN 951-42-7460-I; ISBN 951-42-7460-X;

http://herkules.oulu.fi/isbn951427461X/isbn951427461X.pdf

[7] S. Cova et al., "Constant Fraction Circuits for Picosecond Photon Timing with Micro-channel Plate Photomultipliers"; *Review of Scientific Instruments*, 64-1, 1993, pp. 118-124.

[8] D. Breton, E. Auge, E. Delagnes, J. Parsons, W. Sippach, V. Tocut, "The HAMAC rad-hard Switched Capacitor Array"; ATLAS note, CERN 2001.

[9] E. Delagnes, Y. Degerli, P. Goret, P. Nayman, F. Toussenel, and P. Vincent, "SAM : A new GHz sampling ASIC for the HESS-II Front-End"; Cerenkov Workshop, 2005.

[10] S. Ritt, "Design and Performance of the 5 GHz Waveform Digitizer Chip DRS3"; Submitted to Nuclear Instruments and Methods, 2007.

[11] G. Varner, L.L. Ruckman, A. Wong, "The First version Buffered Large Analog Bandwidth (BLAB1) ASIC for high Luminosity Colliders and Extensive Radio Neutrino Detectors"; *Nucl. Inst. Meth.* A591, 2008, pp. 534.

[12] W.E. Cleland and E.G. Stern, "Signal Processing considerations for Liquid Ionization Calorimeters in a High Rate Environment"; *Nucl. Instr. Meth.* A338, 1994, pp. 467-497.

[13] J-F. Genat, G. Varner, F. Tang, H.J. Frisch, "Signal Processing for Picosecond Resolution Timing Measurements"; Submitted to *Nuclear Instruments and Methods*.

# Hardware studies for the upgrade of the ATLAS Central Trigger Processor

D. Berge <sup>a</sup>, J. Burdalo <sup>a</sup>, N. Ellis <sup>a</sup>, P. Farthouat <sup>a</sup>, S. Haas <sup>a</sup>, J. Lundberg <sup>a</sup>, S. Maettig <sup>a, b</sup>, A. Messina <sup>a</sup>, T. Pauly <sup>a</sup>, D. Sherman <sup>a</sup>, R. Spiwoks <sup>a</sup>

<sup>a</sup> CERN, 1211 Geneva 23, Switzerland <sup>b</sup> University of Hamburg, 20146 Hamburg, Germany

#### stefan.haas@cern.ch

#### Abstract

The ATLAS Central Trigger Processor (CTP) is the final stage of the first level trigger system which reduces the collision rate of 40 MHz to a level-1 event rate of 75 kHz. The CTP makes the Level-1 trigger decision based on multiplicity values of various transverse-momentum thresholds as well as energy information received from the calorimeter and muon trigger sub-systems using programmable selection criteria.

In order to improve the rejection rate for the first phase of the luminosity upgrade of the LHC to  $3 \cdot 10^{34}$  cm<sup>-2</sup> s<sup>-1</sup> planned for 2015, one of the options being studied consists of adding a topological trigger processor, using Region-Of-Interest information from the calorimeter and potentially also the muon trigger. This will require an upgrade of the CTP in order to accommodate the additional trigger inputs. The current CTP system consists of a 9U VME64x crate with 11 custom designed modules where the functionality is largely implemented in FPGAs. The constraint for the upgrade study presented here was to reuse the existing hardware as much as possible. This is achieved by operating the backplane at twice the design frequency and required developing new FPGA firmware for several of the CTP modules.

We present the design of the newly developed firmware for the input, monitoring and core modules of the CTP as well as results from initial tests of the upgraded system.

#### I. INTRODUCTION

The ATLAS experiment at the Large Hadron Collider (LHC) at CERN uses a three-level trigger system. The Level-1 trigger [1] is a synchronous system operating at the bunch crossing (BC) frequency of 40.08 MHz of the LHC. It uses information on clusters and global energy in the calorimeters and on tracks found in the dedicated muon trigger detectors to reduce the event rate to 75 kHz. Figure 1 shows an overview of the ATLAS Level-1 trigger system. The Muon to Central Trigger Processor Interface (MUCTPI) [2] combines the data from the trigger sectors of the two dedicated muon trigger detectors in the barrel and end-cap regions and calculates muon candidate multiplicity values. The Central Trigger Processor (CTP) [3] uses the muon multiplicities from the MUCTPI together with electron/photon, tau/hadron and jet multiplicities, as well as event energy information received from the calorimeter trigger processors to make the final Level-1 trigger decision (L1A) based on a list of programmable selection criteria (trigger menu). Trigger inputs from various other sources, such as luminosity detectors, minimum bias scintillators and beam pick-ups can also be taken into account. The CTP receives timing signals from the LHC and distributes them, along with the L1A, through the trigger, timing and control (TTC) network to the sub-detector back-end and front-end electronics. It also sends Region-of-Interest (RoI) data to the Level-2 trigger system (LVL2) and trigger summary information to the data acquisition system (DAQ). In addition the CTP provides integrated and bunch-by-bunch scaler data for monitoring of the trigger, detector and beam conditions. For a full overview see [4].



Figure 1: Overview of the ATLAS Level-1 Trigger

#### **II. CTP ARCHITECTURE**

The CTP system consists of a single 9U VME64x chassis with three dedicated backplanes and 11 custom designed modules of 6 different types. Figure 2 below shows the architecture of the CTP.



Figure 2: Architecture of the CTP

The machine interface module (CTPMI) receives the timing signals from the LHC and distributes them over the COM backplane to the other modules. Each of the 3 input modules (CTPIN) receives up to 124 trigger input signals, synchronizes and aligns them and sends selected trigger signals over the PIT bus backplane to the monitoring and core modules. The monitoring module (CTPMON) performs bunch-by-bunch monitoring of the PIT bus signals. The core module (CTPCORE) generates the Level-1 accept signal (L1A) according to programmable selection criteria and sends trigger summary information to the Level-2 trigger and DAQ systems through optical link interfaces (S-LINK). The four output modules (CTPOUT) distribute the trigger and timing signals through up to 20 cables to the sub-detector TTC partitions and receive calibration requests from the subdetectors. The calibration module (CTPCAL) time-multiplexes the calibration request signals from the CAL backplane and performs level conversion of front-panel NIM input trigger signals.

Figure 3 shows a picture of the CTP crate installed in the ATLAS underground counting room. From left to right are the CTPMI, three CTPINs, the CTPMON, the CTPCORE, four CTPOUTs, and the CTPCAL modules. Currently 9 of the 12 CTPIN input connectors are used; however, not all of the signals on the trigger cables are allocated. Another two complete CTP systems are available in the laboratory as spares as well as for firmware and software development.



Figure 3: The CTP installed in the ATLAS counting room

### III. CTP TRIGGER PATH

Figure 4 below shows the trigger path of the CTP. Each of the three CTPIN modules receives, synchronises and aligns the trigger signals from four input cables with 31 signals each. Reconfigurable switch matrices then select which of the aligned trigger inputs to drive onto the 160 PIT bus lines.



The CTPCORE module receives the 160 selected trigger signals from the PIT bus. Look-up tables (LUT) at the input generate 256 trigger conditions from the 160 PIT signals and additional internal triggers. A ternary content-addressable memory (CAM) then calculates 256 trigger items as logical combinations of these trigger conditions. Those trigger items are pre-scaled and gated with a programmable mask, preventive dead-time and the busy signal from the experiment's readout. The logical OR of all items then forms the final L1A signal, which is fanned out to the sub-detectors. Dead-time is generated in the CTPCORE to prevent the front-end buffers in the experiment from overflowing. The memory files for the LUT and CAM of the CTPCORE and the configuration files for the switch matrices of the CTPINs are automatically generated from the Level-1 trigger menu by software and are loaded when the CTP is configured. The design of the CTP has been optimized for low latency; it takes only 4 bunch crossings (BC) from the trigger signals being received at the CTPIN to the L1A signal being sent from the CTPOUT modules to the experiment's TTC partitions.
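The chain from trigger conditions to the L1A can be pictured with a small software model. The following Python sketch is a simplified illustration and not the CTP firmware: the class `TriggerItem`, the function `level1_accept`, the counter-based pre-scaler and the single global veto are assumptions made for the example.

```python
class TriggerItem:
    """One trigger item: a ternary pattern over the trigger-condition bits."""

    def __init__(self, care, value, prescale=1, enabled=True):
        self.care = care          # bit mask of the conditions that matter
        self.value = value        # required state of those conditions
        self.prescale = prescale  # accept one out of `prescale` matches
        self.enabled = enabled    # programmable mask bit
        self._count = 0

    def fires(self, conditions):
        # Ternary match: bits outside `care` are "don't care".
        if (conditions & self.care) != self.value:
            return False
        # Simple counter-based pre-scaler.
        self._count += 1
        if self._count < self.prescale:
            return False
        self._count = 0
        return self.enabled


def level1_accept(conditions, items, veto):
    """OR of all items, gated by a veto standing in for dead-time and busy."""
    # Evaluate every item so that all pre-scale counters advance.
    fired = [item.fires(conditions) for item in items]
    return any(fired) and not veto


# Hypothetical item: condition 0 set AND condition 1 clear, pre-scaled by 2.
items = [TriggerItem(care=0b011, value=0b001, prescale=2)]
first = level1_accept(0b101, items, veto=False)   # match, pre-scaled away
second = level1_accept(0b101, items, veto=False)  # second match is accepted
third = level1_accept(0b111, items, veto=False)   # condition 1 set: no match
```

The `care`/`value` pair is one way to express the three states (0, 1, don't care) of a ternary CAM entry in software.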

There are up to 372 trigger inputs in total for the full CTP system; however, the number of trigger signals usable in the L1A formation is limited to 160 by the number of PIT bus lines. In order to accommodate additional trigger input signals we have therefore doubled the PIT bus transfer rate by operating it at 80 MHz using double-data-rate (DDR) signalling. This results in an effective PIT bus width of 320 bits. This modification also required significant changes to the FPGA firmware of the CTPIN, CTPMON and CTPCORE modules.

The PIT backplane is a short multi-drop bus which connects the switch matrix outputs of the 3 CTPIN modules to the inputs of the CTPCORE and CTPMON modules. It spans 5 VME slots and uses SSTL2 signal levels with a combined series/parallel termination scheme. Although the PIT bus was originally only designed to operate at 40 MHz, the DDR operation has been shown to work reliably.

#### IV. THE CTP INPUT MODULE

The CTPIN has four identical channels, which receive 31 LVDS trigger input signals at 40 MHz each. Figure 5 below shows a picture of the CTPIN module.



Figure 5: Picture of the CTPIN module

After level conversion, an FPGA synchronizes the trigger inputs to the internal clock, aligns them with respect to each other using programmable length pipelines and optionally checks their parity. The synchronized trigger inputs can be stored in a diagnostic memory for debugging and monitoring purposes. The memory can also be used to inject data into the channel. This functionality is implemented in an Altera Stratix FPGA (EP1S20).

A second FPGA (Altera Cyclone EP1C20) is used to monitor the trigger inputs with counters that integrate over all bunches in a LHC turn. Each channel also features a TDC (CERN HPTDC ASIC) to measure the phase of every trigger input signal. Finally a configurable switching matrix implemented in a CPLD (Lattice ispXPLD) is used to select and route the aligned trigger inputs to the PIT bus. The internal clock of the CTPIN module can be adjusted using a programmable delay line (CERN DELAY25 ASIC). Figure 6 below shows a simplified block diagram of the module.



Figure 6: CTPIN architecture

The modified firmware of the synchronization and alignment FPGA features DDR output registers which drive the 31 trigger signals onto 16 DDR lines. Since the monitoring FPGA connects to the same lines, DDR input registers were added there. In addition a 90° phase shifted clock is sent to the monitoring FPGA in order to correctly latch the DDR signals. The 16 DDR output signals of each channel are sent to the switch matrix CPLD which selects and routes up to 64 signals from the CTPIN onto the 160 PIT bus lines.
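The DDR multiplexing of 31 signals onto 16 lines can be illustrated with a short Python sketch. The mapping used here (line *i* carries signal *i* on the first clock edge and signal *i*+16 on the second, with one unused slot) is an assumption made for the example; the actual assignment in the firmware is not specified in the text.

```python
N_LINES = 16  # DDR lines per CTPIN channel


def ddr_pack(signals):
    """Split up to 32 one-bit signals into two 16-bit groups, one per clock edge."""
    if len(signals) > 2 * N_LINES:
        raise ValueError("too many signals for the available DDR lines")
    # Pad with zeros so that both edges carry exactly N_LINES bits.
    padded = list(signals) + [0] * (2 * N_LINES - len(signals))
    first_edge = padded[:N_LINES]
    second_edge = padded[N_LINES:]
    return first_edge, second_edge


def ddr_unpack(first_edge, second_edge, n_signals=31):
    """Recombine the two edge samples into the original signal list."""
    return (list(first_edge) + list(second_edge))[:n_signals]


# Hypothetical pattern on the 31 trigger inputs of one channel.
pattern = [i % 2 for i in range(31)]
first_edge, second_edge = ddr_pack(pattern)
restored = ddr_unpack(first_edge, second_edge)
```

With such a scheme each physical line carries two signals per 25 ns period, which is why the receiving side needs a correctly phased clock to separate the two samples.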

#### V. THE CTP CORE MODULE

The LUT/CAM FPGA (Xilinx XC2VP50) receives the PIT bus signals on the CTPCORE and implements the LUT and CAM for the trigger formation. DDR input registers were added at the input; the clock for latching the PIT signals can be adjusted using a programmable delay line (CERN DELAY25 ASIC). Figure 7 below shows a picture of the CTPCORE module.



Figure 7: Picture of the CTPCORE module

Since there are now twice as many trigger inputs, the structure of the LUT and CAM also needed to be adapted. Figure 8 below shows a block diagram of the new LUT/CAM FPGA.



Figure 8: CTPCORE LUT/CAM FPGA

An array of 28 12-input LUTs generates 448 trigger conditions from the 320 trigger inputs. This includes the internally generated triggers, namely two random triggers, two pre-scaled clocks and eight triggers for programmable groups of bunch crossings. The width of the ternary CAM was also increased from 256 originally to 448 to accommodate all the trigger conditions. However the number of trigger items was kept at 256, because of limited FPGA resources and PCB connections.
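The LUT dimensions quoted above can be cross-checked with a few lines of Python. The sketch assumes that the twelve internal triggers are fed to the LUTs in addition to the 320 PIT inputs and that the 448 conditions are shared equally among the 28 LUTs; neither assumption is stated explicitly in the text, and the helper `lut_conditions` is purely illustrative.

```python
N_LUTS = 28
INPUTS_PER_LUT = 12
PIT_INPUTS = 320
INTERNAL_TRIGGERS = 2 + 2 + 8  # random triggers, pre-scaled clocks, bunch groups
CONDITIONS = 448

LUT_INPUTS = N_LUTS * INPUTS_PER_LUT                          # 336 input pins
OUTPUTS_PER_LUT = CONDITIONS // N_LUTS                        # 16 condition bits
SPARE_INPUTS = LUT_INPUTS - (PIT_INPUTS + INTERNAL_TRIGGERS)  # 4 unused pins


def lut_conditions(table, input_bits):
    """Look up the output word of one LUT.

    table      : list of 2**12 integers, each holding the output bits
    input_bits : 12 values (0 or 1), least significant bit first
    """
    address = 0
    for position, bit in enumerate(input_bits):
        address |= (bit & 1) << position
    return table[address]


# Hypothetical table that simply echoes its address.
identity_table = list(range(2 ** INPUTS_PER_LUT))
word = lut_conditions(identity_table, [1, 0, 1] + [0] * 9)
```

Under these assumptions each LUT would be a 4096-word memory with 16 output bits, and the 336 available inputs are just enough for 320 PIT signals plus the 12 internal triggers.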

The trigger inputs on the CTPCORE are also sent to another FPGA which implements monitoring counters and writes them into FIFO buffers for DAQ/LVL2 readout and monitoring. This functionality is implemented in an Altera Stratix FPGA (EP1S60). Since there are not enough PCB connections for the 320 trigger signals, DDR signaling was also used to transmit these signals to the monitoring/readout FPGA. In addition the number of PIT monitoring counters was doubled (324) and the readout formatting unit was adapted to accommodate the additional PIT bus data words required in the readout/monitoring event format.

#### VI. THE CTP MONITORING MODULE

The CTPMON module decodes the 160 PIT bus signals and selects trigger inputs that are to be monitored. It then builds a histogram with 3564 entries for each of the 160 decoded PIT signals, in order to monitor them on a bunch-by-bunch basis. This functionality requires a large number of on-chip memory blocks and is implemented in 4 large Altera Stratix FPGAs (EP1S80).
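The bunch-by-bunch histogramming can be modelled in software as one counter per signal and per bunch-crossing identifier. The Python sketch below is an illustration only; the class name `BunchMonitor` and the representation of the PIT bus as an integer bit field are assumptions, and 3564 is the number of bunch crossings in one LHC turn.

```python
BUNCHES_PER_TURN = 3564
N_SIGNALS = 160


class BunchMonitor:
    """One histogram of BUNCHES_PER_TURN bins for each monitored signal."""

    def __init__(self):
        self.hist = [[0] * BUNCHES_PER_TURN for _ in range(N_SIGNALS)]
        self.bcid = 0  # bunch-crossing identifier within the turn

    def clock(self, pit_word):
        """Record one bunch crossing; bit i of pit_word is the state of signal i."""
        for i in range(N_SIGNALS):
            if (pit_word >> i) & 1:
                self.hist[i][self.bcid] += 1
        # Wrap around at the end of the turn.
        self.bcid = (self.bcid + 1) % BUNCHES_PER_TURN


# Hypothetical pattern: signal 3 fires in bunch crossing 100 of every turn.
monitor = BunchMonitor()
for turn in range(2):
    for bc in range(BUNCHES_PER_TURN):
        monitor.clock(1 << 3 if bc == 100 else 0)
```

The model needs 160 x 3564 = 570,240 counters, which gives a feeling for why the memory resources limit the number of signals that can be monitored.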

The firmware of the PIT bus interface FPGA was modified to include DDR input registers, but unfortunately the memory resources of the histogramming FPGAs were not sufficient to increase the number of signals being monitored. Therefore we decided to implement a simple selection mechanism in the PIT bus interface FPGA which allows selecting 160 of the 320 trigger inputs for monitoring.

#### VII. TEST RESULTS

After an extensive verification phase using simulation and static timing analysis, the modified FPGA firmware was loaded onto one of the reference CTP systems and tested. Slightly adapted versions of the system test programs from the software framework developed for diagnostics and operation of the CTP [5] were used for this purpose. These test programs allow sending arbitrary data patterns from the test memories on the CTPIN modules and checking the various monitoring buffers and counters on the CTPIN, CTPMON and CTPCORE modules against the calculated values. In order to determine the timing margins we also measured the timing window where the PIT bus data could be safely latched on the CTPCORE and CTPMON modules. This was done by scanning the programmable clock delays available on the CTPIN and CTPCORE modules and checking the correct operation using the test programs mentioned above. The concept is illustrated in Figure 9 below.

The valid data window for latching the PIT bus signals was measured to be 65% (8 ns) of the clock half-period (12.5 ns) at the CTPMON input and 70% (9 ns) at the CTPCORE input. Timing variations between the three CTPIN modules were small, on the order of 1 ns. The trigger latency has increased to 7 BC, due to the DDR input and output registers that had to be added in the FPGAs.
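The extraction of a valid window from such a delay scan can be sketched in Python. This is a minimal illustration assuming a scan with uniform delay steps and no wrap-around; the function name `valid_window`, the 0.5 ns step and the pass/fail pattern are hypothetical and do not reproduce the measured scan data.

```python
HALF_PERIOD_NS = 12.5  # half of the 25 ns bunch-crossing period


def valid_window(scan, step_ns):
    """Return (start_ns, width_ns) of the longest run of error-free delay settings.

    scan : list of booleans, one per delay step, True when the test passed
    """
    best_start, best_len = 0, 0
    run_start = None
    # A trailing False acts as a sentinel that closes the last run.
    for i, ok in enumerate(list(scan) + [False]):
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return best_start * step_ns, best_len * step_ns


# Hypothetical scan: 25 steps of 0.5 ns, 16 of which pass.
scan = [False] * 4 + [True] * 16 + [False] * 5
start_ns, width_ns = valid_window(scan, 0.5)
fraction = width_ns / HALF_PERIOD_NS
```

For this made-up pattern the window is 8 ns wide, i.e. 64% of the half-period, comparable to the value quoted for the CTPMON input.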



Figure 9: PIT bus timing scan

#### VIII. SUMMARY

We have presented an upgrade of the ATLAS CTP which increases the number of useable trigger inputs from 160 currently to 320 by operating the PIT bus backplane at 80 MHz using DDR signaling. This was feasible because the PIT bus backplane was carefully designed, and the FPGAs on the CTPIN and CTPCORE modules are relatively recent and have sufficient spare resources.

The basic functionality of the CTP has been maintained; there are, however, some limitations:

- The CTPMON can only monitor 160 of the 320 PIT signals due to limited FPGA memory resources.
- The mapping of the trigger input signals to the LUT inputs on the CTPCORE using the switch matrices on the CTPIN modules is somewhat less flexible since trigger inputs need to be allocated in pairs.
- The latency has increased from 4 to 7 BC, although it may be possible to reduce this to 6 BC.

The CTP upgrade presented here could even be of interest before the first phase of the LHC luminosity upgrade, since already now all 160 PIT bus signals are allocated, so there is no headroom for potential additional trigger inputs.

#### IX. REFERENCES

- [1] The ATLAS Collaboration, "*The ATLAS Experiment at the CERN Large Hadron Collider*", JINST 3 (2008) S08003.
- [2] S. Haas et al., "The ATLAS Level-1 Muon to Central Trigger Processor Interface", Topical Workshop on Electronics for Particle Physics, CERN-2007-007, November 2007.
- [3] R. Spiwoks et al., "The ATLAS Level-1 Central Trigger Processor (CTP)", 11th Workshop on Electronics for LHC and Future Experiments, CERN/LHCC/2005/038 265, November 2005.
- [4] S. Ask et al., "The ATLAS Central Level-1 Trigger Logic and TTC System", JINST 3 (2008) P08002.
- [5] R. Spiwoks et al., "Framework for Testing and Operation of the ATLAS Level-1 MUCTPI and CTP", these proceedings.

# SPIROC (SiPM Integrated Read-Out Chip): Dedicated very front-end electronics for an ILC prototype hadronic calorimeter with SiPM read-out.

Michel Bouchel, Stéphane Callier, Frédéric Dulucq, Julien Fleury, Christophe de La Taille, Gisèle Martin-Chassard, Ludovic Raux

IN2P3/LAL - Orsay - France

Corresponding author: raux@lal.in2p3.fr

#### Abstract

The SPIROC chip provides dedicated very front-end electronics for an ILC prototype hadronic calorimeter with silicon photomultiplier (SiPM, or MPPC) readout. This ASIC is due to equip a 10,000-channel demonstrator in 2009. SPIROC is an evolution of FLC\_SiPM used for the ILC AHCAL physics prototype [1].

SPIROC was submitted in June 2007 and will be tested in September 2007. It embeds cutting-edge features that fulfil the ILC final detector requirements. It has been realized in 0.35 µm SiGe technology. It has been developed to match the requirements of large dynamic range, low noise, low power consumption, high precision and the large number of readout channels needed.

SPIROC is an auto-triggered, bi-gain, 36-channel ASIC which measures, on each channel, the charge from 1 to 2000 photoelectrons and the time with a TDC accurate to 100 ps. An analogue memory array with a depth of 16 for each channel is used to store the time information and the charge measurement. A 12-bit Wilkinson ADC has been embedded to digitize the analogue memory content (time and charge on 2 gains). The data are then stored in a 4 kbyte RAM. A very complex digital part has been integrated to manage all these features and to transfer the data to the DAQ, which is described in [2].

After an exhaustive description, the extensive measurement results of that new front-end chip will be presented.

### I. SECOND GENERATION SIPM READOUT: SPIROC

#### A. SPIROC: an ILC dedicated ASIC.

The SPIROC chip has been designed to meet the requirements of the ILC hadronic calorimeter with SiPM readout [4]. Figures 1 and 2 show an AHCAL scheme. One of the main constraints is to have a calorimeter as dense as possible. Therefore any space for infrastructure has to be minimized. One of the major requirements is consequently to minimize power to avoid active cooling in the detection gap. The aim is to keep the power of the DAQ electronics located inside the detection gaps as low as 25 $\mu$W per channel.



Figure 1: A half-octant of the HCAL



Figure 2: AHCAL integrated layer

#### B. SPIROC: general description

#### Table 1: SPIROC description



Figure 3: SPIROC layout

The SPIROC chip is a 36-channel input front end circuit developed to read out SiPM outputs. The block diagram of the ASIC is given in *Figure 4*. Its main characteristics are given in Table 1.



Figure 4: SPIROC general scheme

#### C. SPIROC analogue core

A low power 8-bit DAC has been added at the preamplifier input to tune the input DC voltage in order to adjust individually the SiPM high voltage (see *figure 5*).



Figure 5: SPIROC connection

Two variable preamplifiers provide the requested dynamic range (from 1 to 2000 photoelectrons) with a noise level of 1/10 photoelectron. These charge preamplifiers are followed by two variable CRRC<sup>2</sup> slow shapers (50 ns-175 ns) and two 16-deep Switched Capacitor Arrays (SCA) in which the analogue voltage is stored. A 300 ns voltage ramp gives the analogue time measurement. The time is stored in a 16-deep SCA when a trigger occurs. In parallel, trigger outputs are obtained via fast channels made of a fast shaper followed by a discriminator. The trigger discriminator threshold is given by an integrated 10-bit DAC common to the 36 channels. This threshold is finely tuneable channel by channel with 4 additional bits. The discriminator output feeds the digital part, which manages the SCA. The complete scheme of one channel is shown in *figure 6*.



Figure 6: SPIROC one channel diagram

#### D. Embedded ADC

The ADC used in SPIROC is based on a Wilkinson structure. Its resolution is 12 bits. As the default accuracy of 12 bits is not always needed, the number of bits of the counter can be adjusted from 8 to 12 bits. This type of ADC is particularly adapted to this application, which needs a common analogue voltage ramp for the 36 channels and one discriminator for each channel. The ADC is able to convert 36 analogue values (charge or time) in one run (about 100 $\mu$s at 40 MHz). If the SCA is full, 32 runs are needed (16 for charges and 16 for times).
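The quoted conversion time follows from the counter length and the clock frequency. The short Python sketch below assumes that one run lasts for a full sweep of the counter (2^n clock cycles) and neglects any overhead between runs; the function name is hypothetical.

```python
def wilkinson_conversion_time_us(n_bits, clock_mhz=40.0):
    """Duration of one Wilkinson conversion run in microseconds."""
    # The counter runs through all 2**n_bits codes while the ramp sweeps.
    return (2 ** n_bits) / clock_mhz


run_12_bits_us = wilkinson_conversion_time_us(12)  # 102.4 us
run_8_bits_us = wilkinson_conversion_time_us(8)    # 6.4 us
full_sca_ms = 32 * run_12_bits_us / 1000.0         # about 3.3 ms for a full SCA
```

At 12 bits this gives 102.4 µs per run, consistent with the "about 100 µs" quoted above; reducing the counter to 8 bits would shorten a run by a factor of 16.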

#### E. Expected analogue performance

The new analogue chain in SPIROC allows the single photo electron calibration and the signal measurement to be on the same range, simplifying greatly the absolute calibration. An analogue simulation of a whole analogue channel is shown in *figure 7*. It is obtained with an equivalent charge of 1 photoelectron (160 fC at SiPM gain  $10^6$ ).
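The equivalent charge of one photoelectron follows directly from the SiPM gain and the elementary charge. The Python sketch below assumes a perfectly linear SiPM response (no pixel saturation) when scaling to the upper end of the dynamic range; the function name is hypothetical.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19


def sipm_charge_fc(n_photoelectrons, gain=1.0e6):
    """Charge delivered by the SiPM, in femtocoulombs, for a linear response."""
    return n_photoelectrons * gain * ELEMENTARY_CHARGE_C * 1e15


one_pe_fc = sipm_charge_fc(1)                   # about 160 fC
full_range_pc = sipm_charge_fc(2000) / 1000.0   # about 320 pC
```

At a gain of 10^6 one photoelectron corresponds to about 160 fC, as stated above, and the upper end of the 2000-photoelectron range to roughly 320 pC under the linearity assumption.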

For the time measurement, the simulation shows a gain of 120 mV per photoelectron with a peaking time of 15 ns on the "fast channel" (preamplifier + fast shaper). The photoelectron to noise ratio is about 24, which is quite comfortable for triggering on half a photoelectron.

For the energy measurement, the simulation gives a gain of 10 mV per photoelectron with a peaking time of about 100 ns on the "high gain channel" (high gain preamplifier + slow shaper). The photoelectron to noise ratio is about 11 and should be sufficient for the planned application. On the "low gain channel", the photoelectron to noise ratio is about 3, which largely meets the requirement.



Figure 7: One channel simulation

#### F. SPIROC operating modes

The system on chip has been designed to match the ILC beam structure (*figure 8*). The complete readout process needs at least 3 different steps: *acquisition phase, conversion phase, readout phase,* and possibly *idle phase.* 



| Acquisition                                                                                                                                                                                                                                                   | A/D conversion                                                                                                                          | DAQ                                                                                                                                                                                                                                                                                |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| When an event occurs:<br>• Charge is stored in<br>analogue memory<br>• Time is stored in digital<br>(coarse) and analogue (fine)<br>memory<br>• Trigger is automatically<br>rearmed at next coarse time<br>flag (bunch crossing ID)<br>Depth of memory is 16 | The data (charge and time)<br>stored in the analogue<br>memory are sequentially<br>converted into digital data<br>and stored in an SRAM. | The events stored in the<br>RAM are read out through a<br>serial link when the chip gets<br>the token allowing the data<br>transmission.<br>When the transmission is<br>done, the token is transferred<br>to the next chip.<br>256 chips can be read out<br>through one serial link |

Figure 8: SPIROC running modes

#### • Acquisition mode :

During the *acquisition mode*, the valid data are stored in analogue memories in each front-end chip during the beam train. An external signal named "No\_Trigger" is available to erase the active column. It can be used to erase the column if a trigger was due to noise.



Figure 9: Operation of Track and Hold

#### • Conversion mode :

Then, during the *conversion mode*, the data are converted into digital values before being stored in the chip SRAM, following the mapping represented in *figure 10*. The 36 charges and 36 times stored in the SCA are converted for each column. When these 72 conversions are over, the data are stored in the memory and a new conversion starts for the next column.

The Bunch Crossing Identifier (BCID), hit (H) channels and gains (G) are also saved into RAM.



#### • Readout mode :

Finally, during the *readout mode*, the data are sent to the DAQ during the inter-train period (20 kbits per ASIC per bunch train). The readout is based on a daisy-chain mechanism initiated by the DAQ. A single data line, activated sequentially, is used to read out all the ASICs on the SLAB.



#### • Idle mode :

When all these operations are done, the chip goes *to idle mode* to save power. In the ILC beam structure 99 % of power can be saved.

The management of all the different steps of normal operation (acquisition, A/D conversion and read-out) requires a very complex digital part, which was integrated in the ASIC [3] (see *figure 12*).



Figure 12: Interaction between digital and analog part

#### G. Power pulsing

The new readout electronics is intended to be embedded in the detector, so reducing the power consumption is an important feature. Given the huge number of electronic channels, it is crucial to reduce the consumption to 25 $\mu$W per channel. This is made possible by the power pulsing scheme, which exploits the ILC bunch pattern: 2 ms of acquisition, conversion and data readout for 198 ms of dead time. In addition, to save more power, the unused stages are switched off during each mode.
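The 99 % saving quoted for the idle mode follows from the duty cycle of the bunch pattern. The sketch below works through the arithmetic; the peak-power estimate additionally assumes that the consumption is negligible during the dead time, which is a simplification introduced here and not a measured figure.

```python
def duty_cycle(active_ms, idle_ms):
    """Fraction of the bunch-train period during which the chip is powered."""
    return active_ms / (active_ms + idle_ms)


def peak_power_uw(average_uw, duty):
    """Power while active, assuming zero consumption when idle (simplification)."""
    return average_uw / duty


if __name__ == "__main__":
    duty = duty_cycle(2.0, 198.0)   # 2 ms active, 198 ms dead time
    saving = 1.0 - duty
    print(f"duty cycle: {duty:.2%}, power saved: {saving:.0%}")
    print(f"implied power while active: {peak_power_uw(25.0, duty) / 1e3:.1f} mW per channel")
```

With 2 ms of activity in a 200 ms period the duty cycle is 1 %, so a 25 $\mu$W average per channel would correspond to about 2.5 mW per channel while the chip is on, under the stated assumption.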

#### **II. MEASUREMENTS**

#### A. 8-bit input DAC performance

The input DAC span goes from 4.5 V down to 0.5 V with an LSB of 20 mV. The default value is 4.5 V in order to operate the SiPM at minimum over-voltage when the DAC is not loaded. The linearity is $\pm 2\%$ (5 LSB), just enough for the SiPM operation but consistent with the allocated area. The dispersion between channels, although not fundamental, could also be improved. The power dissipation is well within the specs, and the 100 nA bias current to V<sub>dd</sub> makes the chip difficult to measure without special precautions.


#### B. Trigger and gain selection 10-bit DAC measurement

The linearity for the two thresholds DAC was checked by scanning all the values and measuring the signal for each combination. The figure below gives the evolution of the signal amplitude as a function of the DAC combination. By fitting this line in the region without saturation (up to thermometer = 10), we obtained a nice linearity of  $\pm 0.2$  % on a large range.





#### C. Charge measurement

Waveforms were recorded with a fixed injected charge of 100 fC and for variable preamplifier gains, as shown in Figure 15, which represents the amplitude as a function of time for different gains.



Figure 15: High gain slow shaper waveforms for a fixed injected charge of 160 fC and different preamplifier gains.



Figure 16: 1/V<sub>out</sub> versus C<sub>f</sub> (preamplifier gain capacitance)

From these measurements the linearity of the charge output as a function of the gain was calculated to be around  $\pm 1$  % (see figure below).

The next figure represents the high gain output signal amplitude as a function of the injected charge. The fit to the linear part of the curve is better than 1%.



Figure 17: High gain slow shaper linearity

We also looked at the cross-talk on the slow shaper path. Figure 17 represents the waveforms of channel 8 and its neighbours for an injected charge of 15 pC. The amplitude of the neighbouring channels is multiplied by 100. The calculation of the maximum ratio gave a cross-talk of less than 0.3%.



The photoelectron to noise ratio of 4 allows the single-photoelectron peaks to be nicely resolved. The next figure shows the single-photoelectron spectrum.



#### D. Time measurement

The well-known S-curves were also studied. They correspond to the measurement of the trigger efficiency during a scan of the input charge or the threshold while the other parameters, such as the preamplifier gain, are kept constant. Figure 20 represents the trigger efficiency as a function of the DAC values for the 36 channels of a single chip. All channels were set at Cf = 0.2 pF and the input signal was fixed at Qinj = 50 fC. We obtained 100 % trigger efficiency for an input charge of approximately 50 fC, which corresponds to 1/3 pe as requested.
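For readers unfamiliar with S-curves, the sketch below shows the shape that is commonly assumed when a discriminator sees a fixed pulse amplitude plus Gaussian noise. It is a textbook model given for illustration only; the function name and the numerical values are hypothetical and are not taken from the SPIROC measurements.

```python
import math


def trigger_efficiency(threshold, amplitude, noise_rms):
    """Probability that amplitude plus Gaussian noise exceeds the threshold.

    All three arguments must be expressed in the same unit (for example mV).
    """
    return 0.5 * math.erfc((threshold - amplitude) / (math.sqrt(2.0) * noise_rms))


if __name__ == "__main__":
    amplitude, noise = 100.0, 5.0   # hypothetical values in arbitrary units
    for threshold in (80.0, 90.0, 95.0, 100.0, 105.0, 110.0, 120.0):
        eff = trigger_efficiency(threshold, amplitude, noise)
        print(f"threshold {threshold:6.1f}: efficiency {eff:.4f}")
```

In this model the 50 % point of the curve gives the signal amplitude and the width of the transition gives the noise, which is why a threshold scan at fixed injected charge is a convenient way to characterise each channel.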



Figure 21 represents the evolution of the 50 % trigger efficiency as a function of the injected charge.



Figure 21: 50 % trigger efficiency input charge versus applied threshold for a single channel and a fixed preamplifier gain

Figure 22, which displays the threshold as a function of the injected input charge, shows that each channel can also auto-trigger down to 40 fC, which corresponds to the $5\sigma$ limit.



The time walk and trigger jitter are shown in the next two figures. The first shows the relative trigger time as a function of the injected charge. The maximum time difference between small and large signals is about 10 ns, and the jitter decreases down to 200 ps.



Figure 23: Trigger time walk and trigger jitter

#### **III.** CONCLUSION

The SPIROC chip was submitted in June 2007 and its testing started in October 2007. It embeds cutting-edge features that fulfil the ILC final detector requirements, including ultra-low power consumption and extensive integration for SiPM readout. The system on chip is driven by a complex state machine that controls the ADC, the TDC and the memories.

The SPIROC chip is due to equip a 10,000-channel demonstrator in 2009 in the framework of EUDET.

#### **IV. REFERENCES**

[1] LC-DET-2006-007: Dedicated very front-end electronics for an ILC prototype hadronic calorimeter with SiPM readout

S. Blin<sup>4</sup>, B. Dolgoshein<sup>3</sup>, E. Garutti<sup>1</sup>, M. Groll<sup>2</sup>, C. de La Taille<sup>4</sup>, A. Karakash<sup>3</sup>, V. Korbel<sup>1</sup>, B. Lutz<sup>1</sup>, G. Martin-Chassard<sup>4</sup>, A. Pleshk<sup>3</sup>, L. Raux<sup>4</sup>, F. Sefkow<sup>1</sup>

<sup>1</sup> DESY, Hamburg, Germany

<sup>2</sup> University of Hamburg, Germany

<sup>3</sup> Moscow Engineering and Physics Institute, Moscow, Russia

<sup>4</sup> LAL/IN2P3, Orsay, France

[2] Digital part of SiPM Integrated Read-Out Chip ASIC for ILC hadronic calorimeter

F. Dulucq<sup>1</sup>, M. Bouchel<sup>1</sup>, C. de La Taille<sup>1</sup>, J. Fleury<sup>1</sup>, G.Martin-Chassard<sup>1</sup>, L. Raux<sup>1</sup>,

<sup>1</sup> IN2P3/LAL, Orsay, France

[3] FLC\_SIPM : front-end chip for SIPM readout for ILC analog HCAL

2005 International Linear Collider Workshop – Stanford, USA.

C.De La Taille<sup>1</sup>, G.Martin-Chassard<sup>1</sup>, L.Raux<sup>1</sup>

<sup>1</sup> IN2P3/LAL, Orsay, France

[4] System aspects of the ILC-electronics and power pulsing

P. Goettlicher (DESY) for the CALICE-collaboration

## An FPGA-based Emulation of the G-Link Chip-Set for the ATLAS Level-1 Barrel Muon Trigger

A. Aloisio<sup>a,b</sup>, F. Cevenini<sup>a,b</sup>, R. Giordano<sup>a,b</sup>, V. Izzo<sup>a</sup>

<sup>a</sup> INFN - Sezione di Napoli - Via Cintia, 80126, Napoli, Italy

<sup>b</sup> Università degli Studi di Napoli "Federico II" - Via Cintia, 80126, Napoli, Italy

aloisio@na.infn.it cevenini@na.infn.it rgiordano@na.infn.it izzo@na.infn.it

#### Abstract

Many High Energy Physics experiments based their serial links on the Agilent HDMP-1032/34A serializer/deserializer chip-set (or GLink). This success was mainly due to the fact that this pair of chips was able to transfer data at $\sim$ 1 Gb/s with a deterministic latency, fixed after each power-up or reset of the link. Despite this unique timing feature, Agilent discontinued the production and no compatible commercial off-the-shelf chip-sets are available. The ATLAS Level-1 Muon trigger includes some serial links based on GLink in order to transfer data from the detector to the counting room. The transmission side of the links will not be upgraded; however, a replacement for the receivers in the counting room is needed in case of failures.

In this paper, we present a solution to replace GLink transmitters and/or receivers. Our design is based on the gigabit serial IO (GTP) embedded in a Xilinx Virtex 5 Field Programmable Gate Array (FPGA). We present the architecture and we discuss parameters of the implementation such as latency and resource occupation. We compare the GLink chip-set and the GTP-based emulator in terms of latency, eye diagram and power dissipation.

#### I. INTRODUCTION

Trigger systems of High Energy Physics (HEP) experiments need data transfers to be executed with fixed latency in order to preserve the timing information. This requirement is not necessarily satisfied by Serializer-Deserializer (SerDes) chip-sets, which can have latency variations in terms of integer numbers of Unit Intervals (UIs) and/or of clock cycles of the parallel domain. For instance, the TLK2711A [1] exhibits latency variations of up to 31 UIs on the receiver data-path. The Gigabit link, or GLink, chip-set [2], produced by Agilent, was able to transfer data at data-rates up to 1 Gb/s with a fixed latency, even after a power-cycle or a loss of lock. Serial links of data acquisition systems of HEP experiments have often been based on the GLink chip-set. For instance, it has been deployed in the Alice, ATLAS, Babar [3], CDF, CMS, D0 and Nemo [4] experiments, to cite just some of them. The chip-set became so widely used that CERN produced a radiation-hard serializer compatible with it [5].

Unfortunately, a few years ago Agilent discontinued the production of the chip-set, and users needing replacements are looking for alternative solutions. The latest FPGAs include embedded multi-Gigabit SerDes, which offer a wide variety of configurable features. Integrating such a device in an FPGA brings benefits in terms of power consumption, size, board layout complexity, cost and re-programmability.

The Level-1 Barrel Muon Trigger of the ATLAS experiment includes GLink serial links in order to transfer data from the detector to the counting room. The transmission side of the links is on-detector and is unlikely to be upgraded; however, a replacement for the receivers in the counting room is needed in case of failures. We developed a replacement solution for GLink transmitters and receivers, based on the gigabit serial IO (GTP) embedded in the Xilinx Virtex 5 Field Programmable Gate Array (FPGA). Our solution preserves the fixed-latency feature of the original chip-set.

In the coming sections we introduce the present L1 Barrel Muon Trigger and the GLink chip-set, then we describe the architecture and the implementation of our design. Finally, we present some test results for our emulator and compare them with the GLink chip-set.

### II. ATLAS BARREL MUON TRIGGER AND DAQ

The ATLAS detector [6] is installed in one of the four beam-crossing sites at the Large Hadron Collider (LHC) of CERN. The detector has a cylindrical symmetry and is centered on the interaction point. ATLAS consists of several subsystems. Among them is a muon spectrometer, which in the barrel region is built in the loops of an air-core toroidal magnet and includes Resistive Plate Chambers (RPCs). RPCs are arranged in towers used for the Level-1 (L1) muon trigger (Fig. 1). The spectrometer is divided into two halves along the axis, and each half is in turn divided into 16 sectors. A physical sector is segmented into two trigger sectors, each including 6 or 7 RPC towers.

The whole trigger system is implemented as a synchronous pipeline, with a total latency of 2.0  $\mu s$ , clocked by the Timing, Trigger and Control (TTC) system [7] of the LHC. The TTC distributes timing information such as the bunch crossing clock (at about 40 MHz) and the L1 trigger.

The read-out and trigger electronics of the barrel muon spectrometer includes an on-detector part and an off-detector one. A board on the detector, the PAD [8], transfers data to a Versa Module Eurocard (VME) board in the counting room, the Sector Logic/RX (SL/RX) [9], via an 800-Mbps serial link based on the GLink chip-set. Each SL/RX board includes 8 GLink receivers and two FPGAs handling the received data and the communication with other off-detector boards.



Figure 1: Left: Cross section of the ATLAS muon spectrometer. Right: Level-1 Trigger and DAQ for the spectrometer.

During the trigger decision, data are stored by the on-detector electronics. If the event is validated, an L1 accept signal is broadcast to the PADs, which transfer data to the SL/RX. The SL/RX board, in turn, sends data to other VME boards for further processing and storage. More information about the ATLAS barrel muon trigger and Data Acquisition (DAQ) can be found in [10].

#### III. THE GLINK CHIP-SET

The GLink chip-set consists of a serializer (HDMP-1032A) and a deserializer (HDMP-1034A). The chips work with data-rates up to 1 Gb/s and encode data according to the Conditional Inversion Master Transition (CIMT) protocol. In order to read serial data, the receiver extracts a clock from the CIMT stream and locks its phase to the master transition. The recovered clock synchronizes all the internal operations of the receiver and is available as an output. Received data are transferred out of the device synchronously with the recovered clock, and the chip-set architecture is such that the overall link latency is deterministic. Moreover, by means of the dedicated Parallel Automatic Synchronization System (PASS), it is also possible to output data synchronously with a local receiver clock, provided that it has a constant phase relationship with the transmission clock (as happens in the ATLAS L1 barrel muon trigger, which is clocked by the LHC machine clock).

We now briefly introduce the CIMT encoding protocol. A CIMT stream is a sequence of 20-bit words, each containing 16 data bits (D-Field) and 4 control bits (C-Field). The C-Field flags each word as a data word, a control word or an idle word. Idle words are used in order to synchronize the link at start-up and to keep it phase-locked when no data or control words are transmitted. The protocol guarantees a transition in the middle of the C-Field, and the receiver checks for this transition in received data in order to perform word alignment and to detect errors. Two encoding modes are available: one compatible with older chip-sets and an enhanced one, which is more robust against incorrect word alignments. The DC-balance of the link is ensured by sending inverted or unaltered words in such a way as to minimize the bit disparity, defined as the difference between the total number of transmitted 1s and 0s. By reading the C-Field content, the receiver is able to determine whether the payload is inverted or not and restore its original form.
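The conditional-inversion idea can be illustrated with a simplified model. The sketch below tracks a running disparity over 16-bit payloads only and carries the inversion decision as a separate boolean; this flag is a hypothetical stand-in for the information conveyed by the C-Field and does not reproduce the actual CIMT C-Field coding of the HDMP-1032/34A.

```python
WORD_BITS = 16
MASK = (1 << WORD_BITS) - 1


def word_disparity(word):
    """Number of 1s minus number of 0s in a 16-bit word."""
    ones = bin(word & MASK).count("1")
    return 2 * ones - WORD_BITS


def encode_stream(words):
    """Invert a word whenever sending it unaltered would increase the imbalance.

    Returns the list of (payload, inverted) pairs and the final running
    disparity. The disparity of the control bits is ignored in this sketch.
    """
    running = 0
    encoded = []
    for word in words:
        disparity = word_disparity(word)
        invert = running * disparity > 0   # same sign as the accumulated imbalance
        if invert:
            word = ~word & MASK
            disparity = -disparity
        running += disparity
        encoded.append((word, invert))
    return encoded, running


def decode_stream(encoded):
    """Restore the original payloads from (payload, inverted) pairs."""
    return [(~word & MASK) if inverted else word for word, inverted in encoded]


if __name__ == "__main__":
    payloads = [0xFFFF] * 8
    encoded, running = encode_stream(payloads)
    print("running disparity after 8 words:", running)
    print("round trip ok:", decode_stream(encoded) == payloads)
```

In this simplified model the running disparity never exceeds one word (16 bits) in magnitude, even for a worst-case payload such as a long run of all-ones words.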

#### **IV. GLINK EMULATION**

We built our GLink emulator around the Xilinx GTP transceiver [11], embedded in Virtex 5 [12] FPGAs. Other FPGA vendors offer embedded SerDes, for instance Altera with the GX and Lattice with the flexiPCS. However, the fixed-latency characteristic of our emulator relies heavily on some hardware features of the GTP. For a discussion of the possibility of implementing a fixed-latency link with FPGA-embedded SerDes, see [13].

#### A. Architecture

The GTP can serialize/de-serialize words that are 8, 10, 16 or 20 bits wide. We configured it to work with 20-bit CIMT-encoded words at 40 MHz, in order to achieve an 800 Mb/s link. The receiver clock has an unknown, but fixed, phase offset with respect to the transmitter clock. In order to transfer data with minimum latency, the GTP allows its internal elastic buffers to be bypassed, one being in the data-path of the transmitter and the other in the data-path of the receiver. When the buffers are bypassed, all phase differences must be resolved between the external parallel clock domain and a clock domain internal to the device. We set up the transmitter to work without the elastic buffer, while we left two options for the receiver: the first one without the buffer and with an improved latency (Configuration1), but with some constraints on the relative phase between transmission and reception clocks, and the second one without any phase constraint, but with a higher latency (Configuration2).
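The link figures follow from the word width and the parallel clock. A short calculation, with illustrative function names, gives the line rate, the Unit Interval and the payload rate carried by the 16 data bits of each 20-bit word:

```python
def line_rate_bps(word_bits, word_clock_hz):
    """Serial line rate for a given parallel word width and word clock."""
    return word_bits * word_clock_hz


def unit_interval_s(rate_bps):
    """Duration of one serial bit (Unit Interval)."""
    return 1.0 / rate_bps


if __name__ == "__main__":
    word_clock = 40e6                       # parallel clock, Hz
    rate = line_rate_bps(20, word_clock)    # 20-bit CIMT words
    payload = line_rate_bps(16, word_clock) # 16 data bits per word
    print(f"line rate:     {rate / 1e6:.0f} Mb/s")
    print(f"unit interval: {unit_interval_s(rate) * 1e9:.2f} ns")
    print(f"payload rate:  {payload / 1e6:.0f} Mb/s")
    print(f"word period:   {1e9 / word_clock:.0f} ns")
```

One parallel clock cycle therefore lasts 25 ns and spans 20 Unit Intervals of 1.25 ns each.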



Figure 2: Simplified block diagram of the emulator.

On the transmitter, a phase control logic instructs the GTP to align the phase of the internal clock to the transmission clock and asserts the Ready signal when done. A dedicated logic encodes incoming 16-bit words into 20-bit CIMT words and transfers them to the GTP (Fig. 2). The encoder is able to send data, control or idle words and supports an input flag bit exactly like the original chip-set.

On the receiver side, when working in Configuration1, the phase align and control logic checks whether or not it is possible to retrieve data from the link with the assigned parallel clock phase. If it is not possible, the phase must be changed either in the FPGA or outside. In Configuration2 every phase offset is legal, therefore no checks are performed. In order to align received data to the correct word boundary, we added a CIMT decoder and a word align control logic to the GTP. The decoder checks the C-Field of incoming CIMT words and, if it is not valid, flags an error to the word align control logic. When errors are found, the logic activates the shifter inside the GTP, changing the word boundary alignment of parallel data. If no errors are found for a defined number of clock cycles, the align control logic assumes that parallel data are correctly aligned and asserts the Aligned signal. The decoder determines whether the received word is an idle, a control or a data word, extracts the status of the flag and activates the corresponding outputs.
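The behaviour of the word align control logic can be sketched as a small state machine. The model below is a behavioural illustration only: the class and function names are hypothetical, the lock threshold `lock_after` is an assumed value (the text only says "a defined number of clock cycles"), and the C-Field check is idealised so that a word is valid only at the correct bit offset.

```python
class WordAlignControl:
    """Requests a bit slip on every invalid word; locks after N valid words."""

    def __init__(self, lock_after=16):
        self.lock_after = lock_after   # assumed threshold, not from the paper
        self.good_words = 0
        self.aligned = False

    def update(self, c_field_valid):
        """Process one decoded word; return True if a bit slip is requested."""
        if c_field_valid:
            self.good_words += 1
            if self.good_words >= self.lock_after:
                self.aligned = True
            return False
        self.good_words = 0
        self.aligned = False
        return True


def simulate(true_offset, word_bits=20, lock_after=16, max_words=1000):
    """Run the controller against an idealised link with a fixed bit offset."""
    control = WordAlignControl(lock_after)
    shift = 0
    slips = 0
    for _ in range(max_words):
        valid = shift == true_offset   # idealised C-Field check
        if control.update(valid):
            shift = (shift + 1) % word_bits
            slips += 1
        if control.aligned:
            break
    return control.aligned, slips


if __name__ == "__main__":
    for offset in (0, 7, 19):
        aligned, slips = simulate(offset)
        print(f"offset {offset:2d}: aligned={aligned}, bit slips={slips}")
```

On a real link a misaligned word can occasionally look valid, which is one reason to require several consecutive valid words before asserting alignment.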

For the sake of completeness, we inform the reader that our emulator supports all the CIMT encoding modes of the HDMP-1032/34A chip-set, but not the 20/21-bit modes of the older HDMP-1022/24.

#### **B.** Physical Implementation

A full-duplex emulator (transmitter and receiver) requires around 500 Look Up Tables (LUTs) and 400 Flip Flops (FFs), which are 3% of the logic resources available in a Xilinx Virtex 5 LX50T FPGA (Table 1). Such a tiny resource requirement will allow us to integrate all eight GLink receivers of the SL/RX board in the FPGA, and the impact of this integration will be just 6% of the fabric resources.

The latencies of the transmitter and the receiver are 6.75 and 5.25 parallel clock cycles, respectively (6.75 for the receiver in Configuration2). Details about the contribution of internal blocks are given in Table 2. For each component we report the latency both in clock cycles and as an absolute value. For comparison, we recall that the latencies of the GLink transmitter and receiver are 1.4 and 3.0 parallel clock cycles, respectively. Hence, our emulator has a higher latency than the original chip-set; however, this is not an issue for our application.

We notice that a GLink receiver dissipates ~ 800 mW and a transmitter ~ 700 mW (typical @ 1 Gb/s). Each GTP pair (transmitter and receiver) dissipates ~ 300 mW (typical @ 3 Gb/s), hence the power dissipation of the emulator is lower than the one of the original chip-set.

Table 1: Resources used by an implementation of a GLink transmitter/receiver in a Xilinx Virtex 5 LX50T.

| Resource  | Occupied | Percentage | Available |
|-----------|----------|------------|-----------|
| LUTs      | 651      | 2.3 %     | 28,800      |
| Registers | 408      | 1.4 %     | 28,800      |
| Slices    | 265      | 3.7 %     | 7,200       |
| DCMs      | 2        | 17 %      | 12          |
| GTPs      | 1        | 8.3 %     | 12          |

Table 2: Latency of the building blocks of the link (receiver in Configuration1).

|                                 | # of clock cycles | Block latency (ns) |
|---------------------------------|-------------------|--------------------|
| Transmitter                     |                   |                    |
| Total Encoding Latency (fabric) | 4.5               | 112.5              |
| Total GTP Latency               | 2.25              | 56.25              |
| Total Transmitter Latency       | 6.75              | 168.75             |
|                                 |                   |                    |
| Receiver                        |                   |                    |
| Total GTP Latency               | 4.75              | 118.75             |
| Total Decoding Latency (fabric) | 1                 | 25                 |
| Total Receiver Latency          | 5.25              | 143.75             |
|                                 |                   |                    |
| Total Link Latency              | 12                | 312.5              |

#### V. TEST RESULTS



Figure 3: Eye diagram comparison between GLink and the GTP.

In order to test our link, we deployed two off-the-shelf boards [14] built around a Virtex 5 LX50T FPGA. The boards route the serial I/O pins of one of the GTPs on the FPGA to SubMiniature version A (SMA) connectors. We connected the transmitter and the receiver GTPs with a pair of 5 ns, 50  $\Omega$ impedance coaxial cables. Transmitted and received payloads were available on single ended test-points as well as on Low-Voltage Differential Signaling (LVDS) outputs and were monitored by an oscilloscope to observe latency variations. We used a dual channel clock generator providing two 40-MHz clock outputs with a fixed phase offset. This way, we emulated the TTC system of the ATLAS experiment, which is used to clock data in and out from the link.

We checked that our emulator is able to correctly transmit (receive) data toward (from) an Agilent GLink receiver (transmitter) chip in all the encoding modes supported by the HDMP-1032/34A chip-set. In order to perform this test, we deployed a ML-505 board and a custom board hosting a GLink transmitter and a receiver. The test showed that the emulator correctly exchanges data with a GLink chip in both the CIMT encoding modes.

We present an eye diagram comparison between the Agilent GLink transmitter and the GTP (Fig. 3). We fed the transmitters with the same payload, a 16-bit pseudo-random word sequence. We probed the signal on the positive line of the differential pair, at the far end of a 5 ns, 50 $\Omega$ coaxial cable. Between the transmitter and the cable, there was a 10 nF decoupling capacitor. We terminated the negative line on its characteristic impedance to keep the differential driver balanced. We note that the GLink eye width is 50 ps wider than the GTP's. Although the GTP has a smaller voltage swing (400 mV) than GLink (600 mV), GLink has rise and fall times that are around 30% and 15% lower, respectively. The timing jitter on the GTP's edges is $\sim 210$ ps, while for the Agilent transmitter it is $\sim 180$ ps. This difference could be due to the fact that, for GLink, generating the high-speed serial clock from the 40-MHz oscillator requires only the internal Phase Locked Loop (PLL). In our clocking scheme for the GTP, instead, we deployed a Delay Locked Loop (DLL) of the FPGA to multiply the 40-MHz clock in order to obtain the 80-MHz clock. Therefore, the total jitter on the transmitted serial stream includes the contributions of the jitter of both the PLL and the DLL. Moreover, we used a single-ended oscillator to source the PLL of the GTP, while the User Guide recommends using a differential oscillator.

We performed Bit Error Ratio (BER) measurements on the link implemented with our emulator. We deployed a custom Bit Error Ratio Tester (BERT) [15], checking the received payload against a local copy and flagging an error when a difference occurred. More than $10^{13}$ bits have been transferred and no errors have been observed, corresponding to a BER upper limit of $10^{-12}$, estimated with a 99% confidence level [16]. We did not perform BER measurements for a design integrating multiple GLink receivers in the same FPGA. However, other studies [17] have shown that the GTP has a good tolerance both to the logic activity in the FPGA fabric and to the switching activity of surrounding IOs.
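The link between the number of error-free bits and the quoted confidence level can be sketched with the standard zero-error estimate: if the true error probability is $p$, the chance of seeing no errors in $N$ bits is $(1-p)^N \approx e^{-Np}$, so the upper limit at confidence level CL is $-\ln(1-\mathrm{CL})/N$. The function names below are illustrative and the code is not the BERT implementation of [15].

```python
import math


def ber_upper_limit(n_bits, confidence):
    """BER upper limit when n_bits were transferred with zero errors."""
    return -math.log(1.0 - confidence) / n_bits


def bits_required(ber, confidence):
    """Error-free bits needed to claim a given BER limit at a confidence level."""
    return -math.log(1.0 - confidence) / ber


if __name__ == "__main__":
    print(f"limit for 1e13 error-free bits at 99% CL: {ber_upper_limit(1e13, 0.99):.2e}")
    print(f"bits needed for 1e-12 at 99% CL: {bits_required(1e-12, 0.99):.2e}")
```

Under this estimate about $4.6 \times 10^{12}$ error-free bits are enough to claim $10^{-12}$ at 99% confidence, so the $10^{13}$ bits transferred comfortably support the quoted limit.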

#### VI. CONCLUSIONS

Data-rates and transmission protocols of SerDes embedded in FPGAs can be changed by simply re-programming the device. By suitably configuring a GTP transceiver and adding a few logic resources from the FPGA fabric (~ 3% of the total), we have been able to achieve a complete replacement for the GLink chip-set. Our emulator transfers data with a fixed latency, which was a crucial feature of the original chip-set. We experimentally verified the compatibility of our emulator with GLink both in transmission and reception. Our receiver offers two configuration options: the first one with a shorter internal data-path and with minimum latency, but with some constraints on the relative phase between transmission and reception parallel clocks, and the second one without any phase constraint, but with a higher latency. Since the emulator has a tiny footprint in terms of logic resources, in a future upgrade of the SL/RX it will allow us to integrate all the GLink receivers on the board in a single FPGA, still leaving most of the device resources free for trigger and readout tasks. Hence, the layout of the upgraded board would be simplified with respect to the present one. Moreover, a GTP pair dissipates less power than the GLink chip-set, so the power dissipation due to data de-serialization will be lowered in the upgrade.

#### ACKNOWLEDGMENT

The authors are thankful to Giovanni Guasti and Francesco Contu from Xilinx Italy for their support and help in configuring the GTP transceiver. This work is partly supported as a PRIN project by the Italian Ministero dell'Istruzione, Università e Ricerca Scientifica.

#### REFERENCES

- [1] TLK2711A - 1.6 TO 2.7 GBPS TRANSCEIVER, Texas Instruments, 2007 [On-line]. Available: http://focus.ti.com/lit/ds/symlink/tlk2711a.pdf

- [2] Agilent HDMP 1032-1034 Transmitter-Receiver Chip-set Datasheet, 2001, Agilent [On-line]. Available: http://www.physics.ohiostate.edu/~cms/cfeb/datasheets/hdmp1032.pdf
- [3] P. Sanders, "The BaBar trigger, readout and event gathering system", IEEE Trans. on Nucl. Sci., Vol. 45, Issue 4, Part 1, August 1998 pp. 1894-1897
- [4] F. Ameli, "The Data Acquisition and Transport Design for NEMO Phase 1", IEEE Trans. on Nucl. Sci., Vol. 55, Issue 1, Part 1, Feb. 2008 pp. 233-240
- [5] P. Moreira, T. Toifl, A. Kluge, G. Cervelli, F. Faccio, A. Marchioro, J. Christiansen, "GLink and gigabit Ethernet compliant serializer for LHC data transmission", In Nuclear Science Symposium Conference Record, 15-20 Oct. 2000, Vol. 2, pp. 9/6 - 9/9
- [6] ATLAS Collaboration, ATLAS Detector and Physics Performance - Technical Design Report - Volume I, May 1999 [On-line]. Available: http://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/TDR/physics\_tdr/printout/Volume\_I.pdf
- [7] B.G. Taylor for the RD12 Project Collaboration, TTC Distribution for LHC Detectors, IEEE Trans. on Nucl. Sci., Vol. 45, No. 3, June 1998, pp. 821-828.
- [8] F. Pastore, E. Petrolo, R. Vari, S. Veneziano, "Performances of the Coincidence Matrix ASIC of the ATLAS Barrel Level-1 Muon Trigger", In Proc. of the 11th Workshop on Electronics for LHC Experiments, Heidelberg, Germany, 12-16 Sept 2005.
- [9] G.Chiodi, E.Gennari, E.Petrolo, F.Pastore, A.Salamon, R.Varia, S.Veneziano, "The ATLAS barrel level-1 Muon Trigger Sector-Logic/RX off-detector trigger and acquisition board", In Proc. of Topical Workshop on Electronics for Particle Physics, Prague, Czech Republic, 03 - 07 Sep 2007, pp.232-237

- [10] F. Anulli et al., The Level-1 Trigger Barrel System of the ATLAS Experiment at CERN, 2009 [On-line]. Available: http://cdsweb.cern.ch/record/1154759/files/ATL-DAQ-PUB-2009-001.pdf
- [11] Virtex-5 FPGA RocketIO GTP Transceiver User Guide - UG196 (v1.7), Xilinx, 2008 [On-line]. Available: http://www.xilinx.com/support/documentation/user\_guides/ug196.pdf
- [12] Virtex-5 FPGA User Guide UG190 (v4.3), Xilinx, 2008 [On-line]. Available: http://www.xilinx.com/support/documentation/user\_guides/ug190.pdf
- [13] A. Aloisio, F. Cevenini, R. Giordano, V. Izzo, "High-Speed, Fixed-Latency Serial Links with FPGAs for Synchronous Transfers", IEEE Trans. on Nucl. Sci., to be published
- [14] ML505/ML506/ML507 Evaluation Platform User Guide - UG347 (v3.0.1), Xilinx, 2008 [On-line]. Available: http://www.xilinx.com/support/documentation/boards\_and\_kits/ug347.pdf
- [15] A. Aloisio, F. Cevenini, R. Cicalese, R. Giordano, V. Izzo, "Beyond 320 Mbyte/s With 2eSST and Bus Invert Coding on VME64x", IEEE Trans. on Nucl. Sci., Volume 55, Issue 1, Feb. 2008, pp. 203-208
- [16] Statistical Confidence Levels for Estimating Error Probability, Maxim, 2007 [On-line]. Available: http://pdfserv.maxim-ic.com/en/an/AN1095.pdf
- [17] A. Aloisio, F. Cevenini, R. Giordano, V. Izzo, "Characterizing Jitter Performance of Multi Gigabit FPGA-Embedded Serial Transceivers", In Real Time Conference Record, Beijing, China, 10-15 May 2009

# A 40 MHz Trigger-free Readout Architecture for the LHCb Experiment

F. Alessio<sup>a</sup>, R. Jacobsson<sup>a</sup>, Z. Guzik<sup>b</sup>

<sup>a</sup> CERN, 1211 Geneva 23, Switzerland <sup>b</sup>IPJ, 05-400 Swierk/Otwock, Poland

#### Abstract

The LHCb experiment is considering an upgrade towards a trigger-free 40 MHz complete event readout in which the event selection will only be performed on a processing farm by a high-level software trigger with access to all detector information. This would allow operating LHCb at ten times the current design luminosity and improving the trigger efficiencies in order to collect more than ten times the statistics foreseen in the first phase.

In this paper we present the new architecture under consideration. In particular, we investigate new technologies and protocols for the distribution of timing and synchronous control commands, and rate control. This so-called Timing and Fast Control (TFC) system will also perform a central destination control for the events and manage the load balancing of the readout network and the event filter farm. The TFC system will be centred on a single FPGA-based multimaster allowing concurrent stand-alone operation of any subset of sub-detectors. The TFC distribution network under investigation will consist of a bidirectional optical network based on the high-speed transceivers embedded in the latest generation of FPGAs with special measures to have full control of the phase and latency of the transmitted clock and information. Since data zero-suppression will be performed at the detector front-ends, the readout is effectively asynchronous and will require that the synchronous control information carry event identifiers to allow realignment and synchronization checks.

#### I. INTRODUCTION

The LHCb experiment at the Large Hadron Collider (LHC) at CERN has submitted an Expression of Interest for an LHCb Upgrade [1] which would allow operating LHCb at ten times the current design luminosity and improving the trigger efficiencies in order to collect more than ten times the statistics foreseen in the first phase. Improving the trigger efficiencies requires in practice reading out the full detector ultimately at the LHC crossing rate of 40 MHz, with the consequence that practically all readout electronics have to be replaced.

Fig. 1 shows the upgraded LHCb readout architecture under consideration. The Front-End Electronics will record and transmit data continuously at 40 MHz. The expected non-zero suppressed event size would result in a very large number of links between the Front-End and the new Readout Boards. It has already been shown that almost a factor of ten could be gained by sending zero-suppressed data. The zero-suppression would thus have to be performed in radiation-hard Front-End chips. The consequence is that the data will be transmitted asynchronously to the Readout Boards. Therefore, the data frames must include an event identifier in order to realign the event fragments in the Readout Boards. Fig. 2 shows a logical scheme for the Front-End Electronics which we are investigating together with the new readout control.



Figure 1: The upgraded LHCb readout architecture

Optical links based on the CERN GigaBit Transceiver (GBT) are being considered for the readout between the Front-End Electronics and a set of about 400 Readout Boards. The Readout Boards will act as interfaces to the event-building 16 Terabit/s network based on IP-Over-InfiniBand. We advocate here that the Readout Boards also act as the FE interface for timing and synchronous control, as well as the bridge for configuration and monitoring. The event filter farm is to be based on COTS multi-cores.
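The figures quoted above imply some simple bandwidth budgets. The following is a rough sketch only, assuming exactly 400 Readout Boards, an evenly shared load and a fully usable 16 Tbit/s network; the function names are introduced here for illustration.

```python
# Back-of-the-envelope budgets derived from the numbers quoted in the text.
CROSSING_RATE_HZ = 40e6    # LHC crossing rate used for the trigger-free readout
NETWORK_BW_BPS = 16e12     # event-building network capacity (16 Terabit/s)
N_READOUT_BOARDS = 400     # assumption: "about 400" taken as exactly 400


def max_event_size_bytes(network_bw_bps, event_rate_hz):
    """Largest average event size the network can carry at the given rate."""
    return network_bw_bps / event_rate_hz / 8


def per_board_bandwidth_bps(network_bw_bps, n_boards):
    """Average output bandwidth per board if the load is shared evenly."""
    return network_bw_bps / n_boards


event_size = max_event_size_bytes(NETWORK_BW_BPS, CROSSING_RATE_HZ)
board_bw = per_board_bandwidth_bps(NETWORK_BW_BPS, N_READOUT_BOARDS)
print(f"{event_size / 1e3:.0f} kB per event, {board_bw / 1e9:.0f} Gbit/s per board")
```

Under these assumptions the network sustains an average event size of about 50 kB at 40 MHz, and each Readout Board outputs about 40 Gbit/s on average.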

The only exception in the replacement is the current first-level trigger electronics (Level-0 trigger), which already operates at 40 MHz and which may be used either to maintain the readout rate at the current maximum of 1.1 MHz during the time the new readout electronics is being installed, or at a rate between 1.1 MHz and 40 MHz if the installation of the Data Acquisition (DAQ) network and Event Filter Farm is staged. The use of the current Level-0 trigger system implies that the new Timing and Fast Control (TFC) system will have to support the current distribution system based on the RD12 Timing, Trigger and Control (TTC) development [2].



Figure 2: Proposed Front-End architecture

The rate control may also be achieved by implementing local trigger logic in the new Readout Boards (often referred to as "TELL40" as a follower of the current TELL1 [3]) and using the local decisions, or rather "recommendations", centrally in the new TFC system in an intelligent trigger throttle mechanism. This type of rate control may also be used to protect the output bandwidth of the new Readout Boards if data truncation is not desired.

The experience with the current Timing and Fast Control system [4] allows a critical examination of its features and the inheritance of those which are viable in the LHCb upgrade and which have evolved and matured over eight years already. In this paper we propose a new architecture based on entirely new technologies for LHCb together with an outline of the major functions of the system and their implementations. Since the schedule and logistics will probably not allow installing and commissioning the new readout electronics everywhere during only one shutdown, we aim at maintaining support for the old electronics in the new TFC system. This obviously has to be taken into consideration in the DAQ network as well.

#### II. SYSTEM AND FUNCTIONAL REQUIREMENTS

Similar to the current system, the new Timing and Fast Control system should control all stages of the data readout between the Front-End Electronics and the online Event Filter Farm by distributing the LHC beam-synchronous clock, synchronous reset and fast control commands, and at least in the intermediate phase a trigger. Below is a list of the global functions which the new TFC system must support. Since the system must be ready before the readout electronics in order to be used in the development of the sub-detector electronics and detector test beams, the ultimate requirements are obviously flexibility and versatility.

#### A. Bidirectional communication network

The TFC network must allow distributing synchronous information to all parts of the readout electronics and allow collecting buffer status and, at least initially, trigger information to be used for rate control.

#### B. Clock phase and latency control

The synchronous distribution system must allow transmitting a clock to the readout electronics with a known and stable phase at the level of $\sim$50 ps and very low jitter (<10 ps). It must also allow full control of the latency of the distributed information and keep it stable. Alignment of the individual TFC links and synchronous reset commands together with event number checks will be required to assure synchronicity of the experiment.

#### C. Partitioning

The architecture must allow partitioning, that is the possibility of running autonomously one or any ensemble of sub-detectors in a special running mode independently of all the others. In practice this means that the new TFC system should contain a set of independent TFC Masters, each of which may be invoked for local sub-detector activities or used to run the whole of LHCb in a global data taking, and a configurable switch fabric in the TFC communication network.

#### D. LHC accelerator interface

The system must be able to receive and operate directly with the LHC clock and revolution frequency, and allow full control of the exact phase of the received clock.

#### E. Rate control

The new system should allow controlling the rate, either relying on a "blind" throttle mechanism based on the buffer occupancies in the Readout Boards or on an "intelligent" throttle mechanism based on local trigger decisions computed in the Readout Boards. The local trigger decisions may then be used as "recommendations" for the TFC system to maintain the rate at a specified level.

At the simplest level, the rate control should be based on the actual LHC filling scheme. The TFC system should therefore have means to predict the bunch structure; possibly even receive information about the bunch intensities as measured with beam pickups.
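As an illustration only, the "blind" throttle combined with a veto on empty bunch slots could be sketched as below. The function name, the representation of buffer occupancies as fractions and the threshold value are assumptions made here; they are not part of the proposed firmware.

```python
def accept_crossing(bunch_filled, buffer_occupancies, threshold):
    """Decide whether a bunch crossing is accepted for readout.

    bunch_filled       -- True if the filling scheme predicts a filled bunch
    buffer_occupancies -- fractional occupancy (0..1) of each Readout Board buffer
    threshold          -- occupancy at or above which a board asserts the throttle
    """
    if not bunch_filled:
        # Filling-scheme based control: empty slots are never read out.
        return False
    # Blind throttle: a single board close to overflow blocks the crossing.
    return all(occupancy < threshold for occupancy in buffer_occupancies)


# Hypothetical example: three boards, throttle asserted at 80 % occupancy.
print(accept_crossing(True, [0.20, 0.55, 0.10], threshold=0.8))
print(accept_crossing(True, [0.20, 0.85, 0.10], threshold=0.8))
```

An "intelligent" throttle would replace the occupancy test by a combination of the local trigger recommendations; that logic is not specified in the text and is therefore left out.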

#### F. L0 Decision Unit input

As the initial rate control might be based on the old L0 Decision Unit [5], there should be means to interface it with the new TFC system.

#### G. Support for old TTC-based distribution

In order to replace the current readout electronics and commission the new electronics in steps, and make use of the L0 trigger system which is already operating at 40 MHz, the new TFC system must support the old TTC system, at least for a period of time during the upgrade phase.

#### H. Destination control for the event packets

The system should provide means to synchronously distribute the farm destination to the Readout Boards for each event. This function should also include a request mechanism by which the farm nodes declare themselves as ready to receive the next events for processing. The event transfer from the Readout Boards is thus a push scheme with a passive pull mechanism. The scheme avoids the risk of sending events to non-functional links or nodes, and produces a level of load balancing as well as a rate control in the intermediate upgrade phase with a staged farm. Ultimately this would rather be the only emergency control of the rate when the system has been fully upgraded to a 40 MHz readout.
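The push scheme with a passive pull mechanism can be sketched as a queue of outstanding requests. This is a minimal model under assumptions: the class and method names are hypothetical, requests are served in first-in first-out order, and an empty queue is interpreted as the condition for throttling.

```python
from collections import deque


class DestinationController:
    """Hands out farm destinations only to nodes that have requested events."""

    def __init__(self):
        # One entry per requested event, in order of arrival of the requests.
        self._ready = deque()

    def request(self, node_id, n_events=1):
        """A farm node declares itself ready to receive n_events more events."""
        self._ready.extend([node_id] * n_events)

    def next_destination(self):
        """Return the destination for the next event, or None if no node is ready.

        None means that no farm node can take the event, so the rate has to be
        throttled instead of pushing data to a busy or non-functional node.
        """
        return self._ready.popleft() if self._ready else None


# Hypothetical example with two farm nodes.
controller = DestinationController()
controller.request("node-A", n_events=2)
controller.request("node-B")
destinations = [controller.next_destination() for _ in range(4)]
print(destinations)
```

Because a node only appears in the queue after it has sent a request, a dead node or link never receives events, and faster nodes naturally receive a larger share of the load.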

#### I. Sub-detector calibration triggers

The system must allow generating sub-detector calibration triggers which includes transmitting synchronous calibration commands to the FE electronics.

#### J. Non-zero suppressed readout

Since the proposed Front-End Electronics would perform zero-suppression, a scheme must be envisaged which occasionally allows a non-zero suppressed readout for special purposes. As the bandwidth does not allow this at 40 MHz but there is no requirement for a high rate, the idea is to use the TFC system to synchronize a readout mode in which the readout of a non-zero suppressed event spans several consecutive crossings.

#### K. TFC data bank

A data bank containing the information about the identity of an event (Run Number, Orbit Number, Event Number, Universal Time) and trigger source information is currently produced by the TFC system and added to each event. A similar block should also be produced in the new TFC system.

#### L. Test-bench support

The system and its components must be built in a way that they can be used stand-alone in small test-benches and test beams, and they have to be made available at an early stage in the development of the readout electronics.

#### III. OLD VS A NEW ONLINE SYSTEM ARCHITECTURE

Fig. 3 shows schematically the differences between the current LHCb Readout System architecture and the proposed architecture for the LHCb upgrade as seen from the TFC system point of view.

The current TFC system [4] has a wide timing and fast control network to the Readout Boards (ROB) and to the Front-End Electronics based on the Trigger, Timing and Control (TTC) technology developed by the CERN RD12 team [2]. It also has an independent optical throttle network based on a cheap fibre technology to communicate back-pressure to the trigger rate control logic of the TFC system. In total there are four different types of TFC custom electronics modules (TFC master, partition switch, throttle switch, and throttle fan-in) and two different types of RD12 TTC modules for the distribution backbone (optical transmitter, optical fan-out). The TFC system receives the first-level trigger decisions from the Level-0 Decision Unit (L0DU), which processes decision data from the Pile-Up System, the Calorimeter and the Muon detectors at 40 MHz and is designed to maintain the rate at a maximum of 1.1 MHz.



Figure 3: Old vs New Readout System architecture

In the new architecture the many TFC links to the Front-End Electronics are eliminated by profiting from the bidirectional capability of the CERN GigaBit Transceiver (GBT) development [6] and its capability to carry detector data, timing and fast control information, and Experiment Control System (ECS) information such as configuration and monitoring. In this respect the new Readout Boards become the TFC and ECS interface to the Front-End Electronics. The synchronous TFC information would thus be relayed onto a set of GBT links together with the asynchronous ECS information. The number of links from the Readout Boards to Front-End boards (TFC information and ECS configuration data) may be significantly smaller than the number of links from the Front-End boards to the Readout Boards (detector data and ECS monitoring information), possibly by a factor of ten. The TFC and ECS information would then be fanned out locally at the Front-End boards via appropriate bus types. It should be investigated if a common backplane could be envisaged to a large extent (e.g. xTCA).

The separate TFC distribution network and the throttle network between the TFC Master and the Readout Boards in the current implementation would be replaced by high-speed bidirectional optical links based on commercial technology. Unless needed during the staged upgrade to 40 MHz, the Level-0 Decision Unit would be entirely eliminated. The readout electronics would only require a rate control based on the occupancy in the output stage of the Readout Boards.

The Event Packet Request scheme mentioned in the requirements is maintained by implementing the request protocol on the new DAQ network.

#### IV. NEW TFC ARCHITECTURE

Fig. 4 shows the proposed new TFC architecture to fulfil the requirements of the upgraded LHCb Readout System. In the upgraded scenario, a pool of TFC Masters is instantiated in one single Super Readout Supervisor (S-TFC Master, today called ODIN) based on a single large FPGA for all TFC functions. The S-TFC Master receives the LHC clocks, as well as the LHC Beam Synchronous Timing information, and distributes them to the instantiations.

The link to the sub-detector readout electronics on the S-TFC Master consists of a set of high-speed transceivers. In order to operate the sub-detectors stand-alone in tests or calibrations, the instantiations are independent from one another, and each of them contains the logic described in the requirements. The large FPGA incorporates the configurable switch fabric which allows associating any set of sub-detectors with the different optional TFC Master Instantiations.

The use of bidirectional links implies a point-to-point connection to each Readout Board. In order to have a manageable set of transceivers on the S-TFC Master, each Readout Board crate has to contain a fan-out/fan-in module. Thus, physically, each S-TFC Master transceiver is connected via a bidirectional optical link to an S-TFC Interface board to the Readout Boards. Hence there are as many S-TFC Interfaces as there are Readout Board crates<sup>1</sup>, and consequently as many optical bidirectional TFC links and S-TFC Master transceivers. With 24 TFC links, the system would support up to 480 Readout Boards. If more are required, the S-TFC Interface boards could be cascaded.
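The relation between the number of Readout Boards and the number of TFC links follows from the fan-out of one S-TFC Interface. The sketch below assumes a uniform fan-out of 20 boards per interface, which is only inferred here from the quoted figures (480 boards on 24 links), and ignores cascading.

```python
import math

# Fan-out per S-TFC Interface implied by "24 TFC links ... up to 480 Readout Boards".
BOARDS_PER_INTERFACE = 480 // 24


def tfc_links_needed(n_readout_boards, boards_per_interface=BOARDS_PER_INTERFACE):
    """Number of S-TFC Interfaces, and hence optical TFC links, without cascading."""
    return math.ceil(n_readout_boards / boards_per_interface)


print(BOARDS_PER_INTERFACE)       # fan-out of one interface
print(tfc_links_needed(400))      # links for about 400 Readout Boards
```

With about 400 Readout Boards this gives 20 links, which leaves some margin with respect to the 24 links quoted above.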

The physical connection between the S-TFC Interfaces and the Readout Boards is achieved by high-speed bidirectional copper links of maximum a meter in length. Should it be decided that the Readout Boards require backplane communication, for instance implemented in one of the lightweight xTCA technologies, the TFC communication would be implemented on the backplane. The baseline solution is otherwise using hi-cat copper cables.



Figure 4: The New TFC architecture

A TFC transceiver block in the Readout Boards performs the clock recovery and decodes the TFC information. It also relays a subset of the information onto the GBT links which go from the Readout Boards to the Front-End electronics and which are shared with the ECS configuration data. Therefore the TFC transceiver block should preferably be located in the same FPGA as the GBT transceiver block in the Readout Boards. The TFC transceiver block also transmits the trigger/throttle information over the TFC link to the S-TFC Interface.

#### V. R&D STUDIES AND RESULTS FROM SYSTEM SIMULATION

In addition to simulations, the new TFC architecture and the choices of technologies outlined in this paper contain several points requiring feasibility studies on hardware. Below is a summary of issues which need to be addressed:

- Phase and latency control and reproducibility upon power-up with the Altera GX transceivers
- Clock recovery and jitter across the GX transceivers
- Synchronous control command fan-out on the S-TFC Interface and transmission over copper between the S-TFC Interface and the Readout Boards, and effect on jitter
- Clock and synchronous control commands fan-out at the Front-End electronics
- TFC link reset sequence to establish word alignment, and phase and latency calibration across the entire TFC links, including the e-links of the GBTs
- Compounding of the TFC synchronous control information together with the asynchronous ECS information for the GBT links to Front-End electronics
- Implementation to support the old LHCb readout electronics
- Implementation of the control interface based on DIM/TCP/IP in Nios II
- Interface to the DAQ network for the Event Packet Requests and the TFC Data Bank
- Resource usage for S-TFC Master and S-TFC Interface

The use of the GBT-to-FPGA link for data transmission between the Front-End electronics and the Readout Boards is under investigation.

A full simulation framework of the new readout architecture as shown in Figures 1 and 2 has been developed. It includes a detailed, fully configurable and fully synthesizable clock-level simulation of the new TFC components as described in this paper. It also includes an emulation of the surrounding components such as the GBT links [6], the Front-End electronics and the Readout Boards. The test bench has already allowed defining a preliminary protocol for the new TFC information and has allowed developing the first version of the firmware for the S-TFC Master and the S-TFC Interfaces in their proper environment, estimating the resource usage, studying the latencies of the system, and defining the link reset sequence and timing alignment procedure.

<sup>1</sup> In the case that there are several crates filled with few Readout Boards, the S-TFC Interface would span over more than one crate to keep the number of TFC links low.

Moreover, the development of a common simulation framework allows studying and validating different sub-detector implementations of the Front-End electronics and allows identifying common solutions for the Front-End electronics and Readout Boards, as well as functional inconsistencies.



Figure 5: Schematic drawing of the system included in simulation defined as a single slice of the new Readout System.

Here first results from the simulation of a single Readout slice of the proposed architecture are presented. Figure 5 shows the system included in simulation.

A single Readout slice comprises the new Readout Supervisor (S-TFC Master), a Readout Board and one Front-End board, currently outputting one GBT link. The starting point of the S-TFC Master logic is the TFC Readout Supervisor used in the current LHCb experiment, with modifications in the protocol, in the reset sequence and in the link configuration. The implementation of the Readout Board logic concentrates on the relay of the TFC commands onto the GBT link, via an S-TFC Decoder/Encoder block, and emulation of data congestion in the Readout System in order to produce a trigger throttle signal. The Front-End block consists essentially of two parts. A Data Generator emulates the detector response, ADC and zero-suppression by producing data on a set of channels according to a Poisson PDF with a mean occupancy specific to the detector, and the LHC filling scheme. The second part implements the derandomization of the data, the packing of the data onto the GBT link, truncation handling, and emulation of the GBT link.



Figure 6: Schematic drawing of a single Front-End channel as implemented in simulation. A VHDL Poisson PDF generator generates ZS data. Data is buffered for processing and then packed onto the GBT link. The nominal LHC machine filling scheme is used in order to exploit the capability of the system during abort gaps and consecutive bunches.

The second part also contains the decoding of the new TFC commands, and applies them to the processing of the events. Figure 6 shows a logical scheme of the Front-End channel.

The system can be customized by changing four main parameters:

- Detector mean occupancy for the data generation
- Channel size in bits
- Number of channels associated to a single FE board, i.e. one GBT link
- Derandomizing buffer depth

The simulation is also prepared in a way that the first part performing the data emulation may be replaced with a different data emulation and data compression to study the requirements of different sub-detectors.

To demonstrate the simulation, Figure 7 shows the distribution of the number of channels with ZS data generated by the Poisson PDF generator for a detector mean occupancy of 30% and 21 channels of 12 bits each associated with a single GBT link. The bin of zero occupancy originates from gaps in the LHC filling scheme. Data is buffered in the 15-word-deep Derandomizing buffer before being packed and sent over the link. Figure 7 also shows the distribution of the Derandomizing buffer occupancy over almost 3 LHC turns. This particular configuration leads to a peak occupancy of 14 events, implying that the truncation mechanism will strongly affect the performance of the system. The simulation shows that in this configuration, 10.5% of incoming events are truncated because of buffer overflow. The simulation also demonstrates that the implementation does not lead to any event size bias in the truncation.

With a word size of 80 bits, 80.4% of the bandwidth of the GBT link is exploited.



Figure 7: On the left, distribution of channels filled with ZS data in agreement with a Poisson PDF. On the right, distribution of the derandomizing buffer occupancy

The usage of the GBT link can be improved by optimizing the front-end parameters. In fact, with the Derandomizing buffer configured to be 24 words deep, the simulation shows that the event loss decreases by about a factor of 2, resulting in 5.4% of truncated events and a GBT link usage of 83.2%. Figure 8 shows the trend of the percentage of truncated events as a function of the Derandomizing buffer depth.
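The interplay between occupancy, buffer depth and truncation can be explored with a toy model. The sketch below is not the VHDL simulation described in this paper. It assumes that the number of hit channels per crossing follows a Poisson distribution with mean occupancy × number of channels (capped at the number of channels), that every crossing is filled (no LHC filling scheme), that events carry no header, that the buffer depth is counted in events, and that the link drains 80 bits per crossing. All names are hypothetical, and the model is not expected to reproduce the percentages quoted above.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson-distributed integer (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def simulate(occupancy, channel_bits, n_channels, depth,
             n_crossings=20000, link_bits=80, seed=1):
    """Return (truncated fraction, link usage) for a toy derandomizing buffer."""
    rng = random.Random(seed)
    mean_hits = occupancy * n_channels
    queue = []                      # remaining bits of each buffered event
    events = truncated = sent_bits = 0
    for _ in range(n_crossings):
        hits = min(poisson(rng, mean_hits), n_channels)
        if hits > 0:
            events += 1
            if len(queue) < depth:
                queue.append(hits * channel_bits)
            else:
                truncated += 1      # buffer full: the event is lost
        budget = link_bits          # bits the link can ship in this crossing
        while queue and budget > 0:
            chunk = min(queue[0], budget)
            queue[0] -= chunk
            budget -= chunk
            sent_bits += chunk
            if queue[0] == 0:
                queue.pop(0)
    fraction = truncated / events if events else 0.0
    return fraction, sent_bits / (n_crossings * link_bits)


# The four parameters of the text: occupancy, channel size, channels, depth.
for depth in (15, 24):
    loss, usage = simulate(0.30, 12, 21, depth)
    print(f"depth {depth}: truncated {loss:.3f}, link usage {usage:.3f}")
```

The fixed seed makes runs repeatable, so two buffer depths can be compared on the same sequence of generated events.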



Figure 8: Percentage of events truncated as a function of the Derandomizing buffer depth.

#### VI. PROTOTYPING PLANS

In order to match the schedule for the Upgrade expressed in the EOI [1] and to have a system ready and robust by the time in which each sub-system will start to test their new readout electronics and validate the conformity with the common specifications, the development of the TFC system must take a lead as was done for the current TFC system. This emphasizes the importance that the system is designed with maximum flexibility and versatility in order to adapt and add functionality as the requirements of the readout system emerge.

A first prototype board is being specified. It is aimed at carrying out the feasibility studies described in Section V. It will be a hybrid S-TFC Master/Interface board with a small set of all the functionalities and I/Os of the two boards, including loopback for all links in order to perform link tests, and latency and jitter studies.

#### VII. CONCLUSION

In this paper we have outlined a 'top-down approach' to the design of a new Timing and Fast Control system for the LHCb upgrade. The new architecture relies heavily on new FPGA and link technologies which allow reducing the number of optical links and boards to provide timing and synchronous control to the entire readout chain of LHCb while adding flexibility and robustness.

A full simulation framework for the TFC components including a readout slice of Front-End electronics and Readout Boards has been implemented. It allows developing the TFC functionality and protocols, and testing the readout control in the proper environment at clock level. It also allows studying and validating different Front-End models, and optimizing latency and buffer requirements.

The choices call for several feasibility studies which will be done based on a first hybrid prototype. The R&D plan and the architecture take into account the fact that the developments of the new readout electronics will need the new TFC system and that stand-alone operation in test-benches outside the pit must be possible.

#### REFERENCES

- [1] LHCb Collaboration, "Expression of Interest for an LHCb Upgrade", CERN/LHCC/2008-007, April 22, 2008
- [2] S. Baron et al., TTC website: http://ttc.web.cern.ch/TTC/
- [3] G. Haefeli et al., "TELL1 Specification for a common read out board for LHCb", LHCb 2003-007, October 10, 2003
- [4] Z. Guzik, R. Jacobsson, B. Jost, "Driving the LHCb Front-End Readout", IEEE Trans. Nuclear Science, vol. 51, pp 508-512, 2004
- [5] R. Cornat, J. Lecoq, P. Perret, "Level-0 decision unit for LHCb", LHCb 2003-065, August 22, 2003
- [6] P. Moreira, A. Marchioro, K. Kloukinas, "The GBT : A proposed architecture for multi-Gb/s data transmission in high energy physics", Topical Workshop on Electronics for Particle Physics, Prague, Czech Republic, 03 - 07 Sep 2007, pp 332-336

# Calibration of the Prompt L0 Trigger of the Silicon Pixel Detector for the ALICE Experiment

C. Cavicchioli<sup>a</sup>, G. Aglieri Rinella<sup>a</sup>, M. Caselle<sup>a,b</sup>, C. Di Giglio<sup>a,b</sup>, C. Torcato de Matos<sup>a</sup>

<sup>a</sup> CERN, 1211 Geneva 23, Switzerland <sup>b</sup> Dipartimento di Fisica dell'Università and INFN, Bari, Italy

costanza.cavicchioli@cern.ch on behalf of the ALICE Silicon Pixel Detector project

### Abstract

The ALICE Silicon Pixel Detector (SPD) is the innermost detector of the ALICE experiment at the LHC. It includes 1200 front-end chips, with a total of $\sim 10^7$ pixel channels. The pixel size is $50 \times 425\ \mu\text{m}^2$. Each front-end chip transmits a Fast-OR signal upon registration of at least one hit in its pixel matrix. The signals are extracted every 100 ns and processed by the Pixel Trigger (PIT) system, to generate trigger primitives. Results are then sent within a latency of 800 ns to the Central Trigger Processor (CTP) to be included in the first Level 0 trigger decision.

This paper describes the commissioning of the PIT, the tuning procedure of the front-end chips Fast-OR circuit, and the results of operation with cosmic muons and in tests with LHC beam.

#### I. SYSTEM DESCRIPTION

ALICE (A Large Ion Collider Experiment) is one of the experiments at the Large Hadron Collider (LHC) at CERN, optimized to study the properties of strongly interacting matter and the quark-gluon plasma in heavy ion collisions [1][2].

The ALICE experiment is designed to identify and track particles with high precision over a wide transverse momentum range (100 MeV/c to 100 GeV/c). ALICE will also take data with proton beams, in order to collect reference data for heavy ion collisions and to address specific strong-interaction topics for which ALICE is complementary to the other LHC detectors.

The Silicon Pixel Detector (SPD) is the innermost detector of the ALICE experiment, providing vertexing and tracking capabilities [5][6][7]. As shown in Figure 1, the SPD is a barrel detector with two layers at radii of 3.9 cm and 7.6 cm, respectively, from the beam axis. The minimum distance between the beam pipe and the inner layer is ~5 mm. The SPD consists of 120 detector modules, called half-staves. Each of them includes two silicon pixel sensors, flip-chip bump-bonded to 10 front-end readout chips realized in a commercial 0.25 µm CMOS process. One front-end chip contains 8192 pixel cells organized in 32 columns and 256 rows. The pixel dimensions are  $425 \times 50 \ \mu\text{m}^2$  (z × r $\phi$ ); in total there are  $9.83 \times 10^6$  pixels in the SPD. In order to respect the material budget constraint of 1% X<sub>0</sub> per layer, the chosen sensor thickness is 200 µm and the pixel chips are thinned to 150 µm. Signal and power connections for the chips are provided by an aluminium multilayer bus, glued on top of the ladders.
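The channel counts quoted above are mutually consistent, as the short check below illustrates. The variable names are introduced here for illustration only.

```python
# Channel counts of the SPD, using only the numbers quoted in the text.
HALF_STAVES = 120
CHIPS_PER_HALF_STAVE = 10
COLUMNS_PER_CHIP = 32
ROWS_PER_CHIP = 256

chips = HALF_STAVES * CHIPS_PER_HALF_STAVE          # front-end chips in the SPD
pixels_per_chip = COLUMNS_PER_CHIP * ROWS_PER_CHIP  # pixel cells per chip
total_pixels = chips * pixels_per_chip              # pixel channels in the SPD

print(chips, pixels_per_chip, f"{total_pixels:.3g}")
```

The product gives 1200 chips of 8192 pixels each, that is 9,830,400 pixels, in agreement with the rounded value of $9.83 \times 10^6$.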

The 10 front-end chips of each half-stave are connected to a Multi Chip Module (MCM). The MCM contains 4 ASICs and one optical transceiver module: they provide timing, control and trigger signals to the chips. The MCM performs the readout of the front-end chips, sending the data to the off-detector electronics in the control room [8]. The MCM is connected to 3 single-mode optical fibers; two of them are used to receive the serial control and the LHC clock at 40.08 MHz, and the third is used to send the data to the off-detector electronics.



Figure 1: SPD (right) and one half-stave (left)

Each of the 1200 front-end chips of the SPD may activate its Fast-OR output every 100 ns when at least one pixel inside the chip is hit by a particle. The 1200 Fast-OR bits are sampled and transmitted to the off-detector electronics by the MCM. The Fast-OR generation capability is a unique feature among the vertex detectors of the LHC experiments. It allows the SPD to act also as a low latency pad detector that can be added to the first level trigger decision of the ALICE experiment.

The Pixel Trigger (PIT) system [9] was designed to process the Fast-OR bits and produce a trigger output for the Level 0 trigger decision. It is composed of 10 OPTIN boards that receive the data streams coming from the 120 modules of the SPD and extract the Fast-OR bits; the OPTIN boards are mounted on a 9U board, called BRAIN, with a large FPGA (called Processing FPGA, type Xilinx Virtex4) that can apply up to 10 algorithms in parallel on the 1200 Fast-OR bits every 100 ns.

The algorithms are based on topology and multiplicity, and they are implemented using boolean functions.
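To illustrate what multiplicity and topology conditions on the Fast-OR bits may look like, the sketch below treats the 1200 bits as one integer bit vector. The two example conditions and, in particular, the assignment of bit positions to the inner and outer layer are assumptions made for this illustration; they are not the algorithms implemented in the Processing FPGA.

```python
N_CHIPS = 1200  # one Fast-OR bit per front-end chip

# Hypothetical bit mapping: positions 0..399 inner layer, 400..1199 outer layer.
INNER_MASK = (1 << 400) - 1
OUTER_MASK = ((1 << N_CHIPS) - 1) ^ INNER_MASK


def multiplicity(fastor_bits):
    """Number of chips with an active Fast-OR bit."""
    return bin(fastor_bits).count("1")


def min_multiplicity(fastor_bits, threshold):
    """Multiplicity condition: at least `threshold` chips fired."""
    return multiplicity(fastor_bits) >= threshold


def both_layers_hit(fastor_bits, inner_mask=INNER_MASK, outer_mask=OUTER_MASK):
    """Topology condition: at least one chip fired in each layer."""
    return bool(fastor_bits & inner_mask) and bool(fastor_bits & outer_mask)


# Example: chip 3 (inner) and chip 700 (outer) fired in the same 100 ns period.
word = (1 << 3) | (1 << 700)
print(multiplicity(word), min_multiplicity(word, 2), both_layers_hit(word))
```

Each condition is a pure boolean function of the 1200 input bits, so several of them can be evaluated in parallel on the same sample.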



Figure 2: Pixel Trigger integration

The system integration is shown in Figure 2: the 120 optical fibers for the SPD data (60 per SPD side) are connected to 120 optical splitters, located in the rack next to the CTP. One output of the splitters goes to the routers, located in the control room, for the readout operations, the other output goes to the Pixel Trigger system. The outputs of the Pixel Trigger are sent to CTP of ALICE within 800 ns from the particle collision, to comply with the experimental requirements.

The SPD and the Pixel Trigger control systems are two independent systems; both of them have software drivers implemented in C++ in the Front End Device (FED) servers. There are in total two driver systems (spdFed) for the SPD, one per side of the detector, and one driver system (pitFed) for the Pixel Trigger. All the FED systems (spdFed and pitFed) have User Interfaces accessible by an operator through the PVSS II supervision layer [11].

#### II. FAST-OR TUNING

Every chip contains 42 internal DACs, 8 bits each, to provide voltage and current biases to the analog and digital circuitry of the chip. In every chip there is a dedicated Fast-OR circuitry controlled by four DACs. The DAC settings affect the efficiency, uniformity and noise immunity performances of the Fast-OR circuitry.

Table 1: Fast-OR DACs

| DAC name     | Function                                             |
|--------------|------------------------------------------------------|
| Fast_FOPOL   | Fast-OR current pulse source                         |
| Fast_CONVPOL | Current mirror voltage bias                          |
| Fast_COMPREF | Comparator reference at the end of the Fast-OR chain |
| Fast_CGPOL   | Transconductance fine tuning                         |

Tuning of all the SPD modules is required in order to maximize the sensitivity of the detector to single hits and minimize the readout noise of the Fast-OR trigger signal. This has to be done individually for each of the 1200 front-end chips of the SPD.

An initial manual tuning procedure was carried out in the laboratory, in order to study the behaviour of the circuitry responsible for the triggering and to model the impact of the DAC settings on the Fast-OR signal.

The tuning procedure makes use of the possibility to apply a test pulse in every pixel. The test pulse was sent to some pixels inside the chip to simulate the charge generated by a Minimum Ionizing Particle (MIP) going through the sensor. The tuning of the Fast-OR is based on a comparison between the number of test pulses sent to a chip, and the number of Fast-OR pulses detected at the input of the Pixel Trigger for this particular chip. The Fast-OR pulses are counted by counters implemented in each OPTIN board.

The laboratory tests have shown that changing the DAC values can strongly affect the behaviour of the Fast-OR signal. The existence of an optimum range of settings, for which the efficiency of the Fast-OR signal is high (>95%), has been verified. With different DAC settings the chip can become totally inefficient or noisy.
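The comparison between test pulses sent and Fast-OR pulses counted can be expressed as a simple ratio. The sketch below is a minimal illustration, not the code used in the experiment: the function names are hypothetical, the 95% lower bound follows the text, and the upper bound used to flag a noisy chip is an assumption.

```python
def fast_or_efficiency(fast_or_counts: int, pulses_sent: int) -> float:
    """Ratio of Fast-OR pulses counted to test pulses sent to one chip."""
    if pulses_sent <= 0:
        raise ValueError("pulses_sent must be positive")
    return fast_or_counts / pulses_sent


def classify_setting(fast_or_counts: int, pulses_sent: int,
                     min_eff: float = 0.95, max_eff: float = 1.05) -> str:
    """Classify one DAC setting; max_eff is an assumed noise bound."""
    eff = fast_or_efficiency(fast_or_counts, pulses_sent)
    if eff < min_eff:
        return "inefficient"   # too few Fast-OR pulses reach the Pixel Trigger
    if eff > max_eff:
        return "noisy"         # more Fast-OR pulses than test pulses sent
    return "good"
```

For example, `classify_setting(100, 100)` returns `"good"`, while a chip that counts five times more Fast-OR pulses than were sent is flagged as `"noisy"`.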

#### III. AUTOMATIC TUNING PROCEDURE

The automatic procedure for the Fast-OR tuning was then developed in order to:

1. reduce the time needed to calibrate all the DACs of all the 1200 front-end chips of the SPD;
2. determine values and ranges for all the readout chips with guaranteed efficiency, timing and uniformity performance.

On the basis of the experience gained with the manual calibration, some criteria are applied to optimize the automatic tuning procedure, to reduce the complexity and the time needed for the calibration.

The number of DACs to scan for optimum settings can be limited to four: one DAC that corresponds to the general threshold of the chip and 3 Fast-OR DACs. Table 2 indicates the DACs included in the automatic procedure and their effect on the Fast-OR signal.

Table 2: DACs included in the tuning procedure

| DAC name     | Effect on Fast-OR            |
|--------------|------------------------------|
| Pre_VTH      | global threshold of the chip |
| Fast_FOPOL   | efficiency and uniformity    |
| Fast_CONVPOL | efficiency and uniformity    |
| Fast_COMPREF | digital noise immunity       |

These DACs are scanned over a programmable range set by the operator: the scan can be limited to the optimum range found with the manual tuning. For every DAC setting, the Fast-OR counts in the Pixel Trigger are compared to the number of test pulses sent to the chips. The Fast-OR efficiency is verified in different operating conditions:

- when none of the pixels is activated by a test pulse (to check the noise of the Fast-OR signal during the readout);
- when only one pixel is activated by a test pulse;
- when more than one pixel is activated, without exceeding the maximum occupancy of the chip (~12%).
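The scan described above can be pictured as a nested loop over the four DACs and the three pixel configurations. The following sketch only enumerates the scan points; the DAC ranges are hypothetical placeholders for the operator-defined ranges, and the configuration labels are invented for the example.

```python
from itertools import product

# Hypothetical operator-defined ranges (8-bit DAC codes), for illustration only.
SCAN_RANGES = {
    "Pre_VTH": range(190, 200, 5),
    "Fast_FOPOL": range(0, 256, 64),
    "Fast_CONVPOL": range(0, 256, 128),
    "Fast_COMPREF": range(100, 130, 10),
}

# The three operating conditions listed in the text.
PIXEL_CONFIGS = ("none", "single", "multiple")


def scan_points(ranges, configs=PIXEL_CONFIGS):
    """Yield (dac_settings, pixel_config) for every point of the scan."""
    names = list(ranges)
    for values in product(*(ranges[name] for name in names)):
        settings = dict(zip(names, values))
        for config in configs:
            yield settings, config


n_points = sum(1 for _ in scan_points(SCAN_RANGES))
```

The number of scan points is the product of the range lengths times the number of pixel configurations, which is why restricting the ranges to the region found in the manual tuning shortens the scan considerably.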

The Fast-OR tuning procedure can be done in parallel for all the 1200 chips of the SPD.

#### A. Implementation

The components involved in the Fast-OR automatic tuning procedure are presented in Figure 3.



Figure 3: Components involved in the calibration

The Fast-OR tuning is managed by the driver system of the SPD, similarly to several other calibration scans. A C++ class has been implemented and the flow diagram of the main operations performed is shown in Figure 4.



Figure 4: Structure of the Fast-OR calibration class

The spdFed servers interact with the SPD to loop over the DAC values and to define the pixels that receive the test pulses.

They also communicate with the Pixel Trigger to retrieve the Fast-OR data, using commands already implemented in the Pixel Trigger driver.

A new communication layer has been established between the SPD and the Pixel Trigger driver systems. It is based on the Distributed Information Management (DIM) system developed at CERN. The Fast-OR counters of the Pixel Trigger corresponding to the two SPD sides are managed separately, to avoid interference during the scan.

Once the information of the Fast-OR counters is retrieved by the spdFed, a calibration header is built and then sent to the acquisition system. A Detector Algorithm, based on custom developed C++ classes within the ALICE offline framework, analyzes the data contained in the header and finds for each SPD chip a good DAC combination [12].

The Detector Algorithm analyzes every DAC combination and selects the DAC values that satisfy the efficiency requirements for all the pixel configurations activated in a chip. The final DAC settings are then chosen by taking, for each DAC, the most frequent value among those that passed the first selection.

The efficiency requirements during the data analysis can be set with tolerances of up to 5%.
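The two-step selection can be sketched as follows. This is an illustration under stated assumptions, not the Detector Algorithm itself: the data layout, the function names and the treatment of the zero-pulse configuration (no Fast-OR count allowed) are hypothetical, while the 5% tolerance comes from the text.

```python
from collections import Counter


def passes(counts: int, sent: int, tolerance: float = 0.05) -> bool:
    """Check one pixel configuration against the efficiency requirement."""
    if sent == 0:
        return counts == 0          # noise check: assumed to require zero counts
    return abs(counts - sent) <= tolerance * sent


def select_settings(results, tolerance: float = 0.05):
    """results: list of (dac_tuple, [(counts, sent), ...]) for one chip.

    Step 1 keeps the DAC combinations that pass for every configuration.
    Step 2 takes, for each DAC, the most frequent value among the survivors.
    """
    good = [dacs for dacs, measurements in results
            if all(passes(c, s, tolerance) for c, s in measurements)]
    if not good:
        return None
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*good))


# Invented example: (Pre_VTH, Fast_FOPOL, Fast_CONVPOL, Fast_COMPREF).
example = [
    ((200, 10, 20, 100), [(0, 0), (100, 100), (100, 100)]),
    ((200, 10, 30, 100), [(0, 0), (98, 100), (100, 100)]),
    ((200, 20, 30, 100), [(0, 0), (100, 100), (99, 100)]),
    ((200, 50, 30, 100), [(7, 0), (100, 100), (100, 100)]),   # noisy
    ((200, 5, 30, 100), [(0, 0), (10, 100), (20, 100)]),      # inefficient
]
chosen = select_settings(example)
```

Note that taking the most frequent value per DAC does not by itself guarantee that the resulting combination was one of the tested ones; in this invented example it happens to be.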

#### B. Results of the procedure

Figure 5 shows a typical result of the Fast-OR calibration: the procedure was applied over the full range of the DACs, and the Fast-OR count is plotted as a function of the two DACs Fast\_CONVPOL and Fast\_FOPOL. Three different regions can be identified:

- inefficiency, dark area with Fast-OR counts near zero;
- noise, bright area with very high Fast-OR counts;
- good region, area with Fast-OR counts equal to the number of test pulses sent.



Figure 5: Fast-OR plotted as a function of two Fast-OR DACs

Since the beginning of the commissioning, the Fast-OR tuning has been applied to 105 half-staves, and 1006 chips in these half-staves have been successfully calibrated. The remaining chips are masked in the trigger logic because of noise problems.

For the majority of chips (>95%) it is possible to find DAC values that give 100% efficiency: the number of Fast-OR counts at the input of the Pixel Trigger is exactly the same as the number of test pulses sent to the pixels.

Table 3 summarizes the status of the calibrated modules of the SPD. The percentage of operating chips is calculated with respect to the number of tested half-staves.

| Quantity           | Layer       | Number / total (percentage) |
|--------------------|-------------|-----------------------------|
| Tested half-staves | Inner layer | 33 / 40 (82.5%)             |
|                    | Outer layer | 72 / 80 (90.0%)             |
|                    | TOTAL       | 105 / 120 (87.5%)           |
| Operating chips    | Inner layer | 315 / 330 (95.5%)           |
|                    | Outer layer | 691 / 720 (96.0%)           |
|                    | TOTAL       | 1006 / 1050 (95.8%)         |

Table 3: Status of the Fast-OR calibration

The time needed for the tuning procedure depends on the number of half-staves included in the scan and on the ranges applied to the DACs. With DAC ranges that minimize the scan over the inefficient and noisy areas of the parameter space (see Figure 5), a tuning of the entire detector can be done in less than 4 hours. This is two orders of magnitude less than the time needed for the manual tuning.

#### IV. OPERATION OF THE PIXEL TRIGGER SYSTEM

The Pixel Trigger system has been successfully used as a trigger during the commissioning phase of the SPD and of the other ALICE detectors since May 2008. This was possible only after the tuning of the Fast-OR circuitry of all the front-end chips.

The first operation of the Pixel Trigger system was during the acquisition of cosmic rays, with a topology-based algorithm requiring a coincidence between the top and bottom halves of the outer layer: the output of the Pixel Trigger is active when a particle simultaneously activates at least two chips, one in the upper part of the SPD outer layer and the other in the lower part of the outer layer. Figure 6 shows a cosmic event with one muon track in the SPD online display.



Figure 6: Cosmic ray in the SPD online display

The flux of cosmic muons in the ALICE cavern has been measured: the rate of a single muon is  $3-4 \text{ Hz/m}^2$ , resulting in an average rate through the SPD of ~1.5 Hz. The trigger rate at the output of the Pixel Trigger system ranges from 0.09 to 0.18 Hz depending on the number of active half-staves; this confirms the efficiency of the Fast-OR tuning.

During the two periods of cosmic runs in ALICE (May – Oct 2008 and Jul – Aug 2009) nearly 110k tracks were acquired with at least 3 clusters in the detector, with the trigger provided by the Pixel Trigger. These tracks show a high purity of more than 99.6%. The runs with cosmics are very useful to study the alignment of the detector modules, and of the SPD with respect to the other detectors of the Inner Tracking System.

The Pixel Trigger system and the SPD were also operated during injection tests toward ALICE: the beam was dumped before the ALICE cavern and the muons resulting from the dump went through the ALICE detectors. Events with high occupancy were recorded using a multiplicity algorithm. Figure 7 shows an example of a recorded event during the injection test of July 2009. It is possible to see long tracks crossing the 200  $\mu$ m thick silicon sensors.



Figure 7: Event display of a particle shower during injection tests (July 2009). The two layers of the SPD are shown.

Beam-induced interactions were also observed in the ALICE Inner Tracking System during September 2008 when the first beams were circulating in the LHC.

#### V. CONCLUSIONS

The ALICE Silicon Pixel Detector can generate a Fast-OR output that contributes to the first level of trigger (Level 0) of the experiment. The Pixel Trigger System was designed and implemented to process the Fast-OR signal, and has been operating since May 2008 in cosmic acquisitions and with the first beams.

A fine tuning of the Fast-OR circuitry of all the 1200 front-end chips of the SPD is required to maximize the single hit detection and minimize the noise in the trigger signal.

After studies in the laboratory, an automatic tuning procedure for the Fast-OR signal has been designed, tested and qualified in the ALICE experiment. New code was implemented in the SPD driver system to manage the calibration. The Pixel Trigger and the SPD driver systems can interact through a new communication channel. A Detector Algorithm has been specifically designed to analyze the results of the Fast-OR tuning.

A calibration scan over the full SPD can be done in less than 4 hours, with enough statistics to determine the optimum settings of the Fast-OR DACs. The percentage of operating chips is  $\sim$ 96%.

After the Fast-OR tuning, the SPD is successfully contributing to the Level 0 trigger of the ALICE experiment; it is the only vertex detector among the LHC experiments to be included in the trigger decision.

#### VI. REFERENCES

[1] ALICE collaboration, "The ALICE experiment at the CERN LHC", *JINST 3 S08002*, 2008

[2] ALICE collaboration, "ALICE: Physics Performance Report, Volume 1", 2004, *J. Phys. G: Nucl. Part. Phys.*, 30, 1517-1763

[3] ALICE collaboration, "ALICE: Physics Performance Report, Volume 2", 2006, *J. Phys. G: Nucl. Part. Phys.*, 32, 1295-2040

[4] ALICE collaboration, "ALICE Inner Tracking System (ITS): Technical Design Report", Oct. 2001, *CERN-LHCC-99-012* 

[5] A. Kluge et al., "The ALICE Silicon Pixel Detector (SPD)", *Nucl. Instr. and Meth. Section A*, Vol. 582, Issue 3, Dec 2007, pp. 728-732

[6] P. Riedler et al., "Production and Integration of the ALICE Silicon Pixel Detector", *Nucl. Instr. and Meth. Section A*, Vol. 572, Issue 1, 2007, pp. 128-131

[7] V. Manzari et al., "Assembly, construction and testing of the ALICE silicon pixel detector", *Nucl. Instr. and Meth. Section A*, Vol. 570, Issue 2, 2007, pp. 241-247

[8] A. Kluge et al., "ALICE Silicon Pixel on detector Pilot System OPS2003 – The missing manual", ALICE Internal notes, *ALICE-INT-2004-030* 

[9] G. Aglieri Rinella et al., "The Level 0 Pixel Trigger system for the ALICE experiment", *JINST 2 P01007*, 2007

[10] G. Aglieri Rinella et al., "The Level 0 Pixel Trigger System for the ALICE Silicon Pixel Detector: implementation, testing and commissioning", *Proceedings of TWEPP 2008*, Naxos, Greece, Sep. 2008, pp. 123-128

[11] C. Torcato de Matos et al., "The ALICE Level 0 Pixel Trigger driver layer", *Proceedings of TWEPP 2008*, Naxos, Greece, Sep. 2008, pp. 516-521

[12] A. Mastroserio, "Operation of the ALICE Silicon Pixel Detector with cosmics and first beams", *Proceedings of* 11<sup>th</sup> ICATPP Conference, Como, Italy, Oct. 2009

## A programmable 10 Gigabit injector for the LHCb DAQ and its upgrade

## V. Delord<sup>a,b</sup>, J. Garnier<sup>a</sup>, N. Neufeld<sup>a</sup>

### <sup>a</sup> CERN, 1211 Geneva 23, Switzerland

<sup>b</sup> ISIMA, 63173 Aubière, France

vincent.delord@cern.ch, jean-christophe.garnier@cern.ch, niko.neufeld@cern.ch

#### Abstract

The LHCb High Level Trigger and Data Acquisition system selects about 2 kHz of events out of the 1 MHz of events previously selected by the first-level hardware trigger. The selected events are consolidated into files and then sent to permanent storage for subsequent analysis on the Grid. The goal of the upgrade of the LHCb readout is to lift the limitation to 1 MHz. This means speeding up the DAQ to 40 MHz. Such a DAQ system will certainly employ 10 Gigabit technologies and might also need new networking protocols: a customized TCP or proprietary solutions. A test module is presented, which integrates into the existing LHCb infrastructure. It is a 10-Gigabit traffic generator, flexible enough to generate LHCb's raw data packets using dummy data or simulated data. These data are seen by the DAQ as real data coming from sub-detectors. The implementation is based on an FPGA using a 10 Gigabit Ethernet interface. This module is integrated in the experiment control system. The architecture, implementation, and performance results of the solution will be presented.

#### I. INTRODUCTION

The LHCb experiment [1] currently uses a partition dedicated to tests, with a data-flow generator [2] [3]. It gets simulated data from on-site storage, formats them to the Online protocol and sends them to the High Level Trigger (HLT) [4] farm. The entire Online and Offline systems can be tested this way during LHC shutdown periods, and even in parallel with normal activities.

The project presented in this paper is related to the LHCb upgrade project, and stems mainly from two requirements. The data acquisition (DAQ) [5] system currently relies on Gigabit Ethernet. Its rate is about 35 GB/s: the average size of an event is 35 kB and the event rate is 1 MHz. The upgraded detector aims to reach the full readout speed of 40 MHz. The upgraded DAQ will likely use 10 Gigabit Ethernet (10 GbE) or InfiniBand. The HLT farm processes these events and produces an output rate of 2 kHz.
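The quoted DAQ rate follows directly from the event size and the event rate. The short calculation below reproduces it and, assuming the average event size stays at 35 kB after the upgrade (an assumption not stated in the text), estimates the throughput at 40 MHz.

```python
EVENT_SIZE_BYTES = 35_000  # 35 kB average event size (from the text)


def throughput_gb_per_s(event_rate_hz: float,
                        event_size_bytes: int = EVENT_SIZE_BYTES) -> float:
    """Aggregate DAQ throughput in GB/s (1 GB = 1e9 bytes)."""
    return event_rate_hz * event_size_bytes / 1e9


current_rate = throughput_gb_per_s(1e6)    # current 1 MHz readout
upgraded_rate = throughput_gb_per_s(40e6)  # 40 MHz readout, same event size assumed

print(f"current:  {current_rate:.0f} GB/s")
print(f"upgraded: {upgraded_rate:.0f} GB/s")
```

Under that assumption the upgraded system would have to move about 1.4 TB/s, forty times the present rate.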

The idea is to provide a new solution which would be integrated into the system like a real readout board. It would behave like a readout board, except that it would get simulated data from a storage system instead of the physics data from the detector. It is however a long term R&D project and it would be interesting to include this test device in the current DAQ configuration.

A first design is presented in this paper. Sec. II presents the study and the specifications of the project. Sec. III presents the main ideas and technologies which manage each part of the system. Sec. IV discusses the current limits and the next steps in the design.



Figure 1: The LHCb data acquisition system and its main data-flows. Both are triggered and controlled the same way. The only difference is the source of the physics events.

#### II. SPECIFICATIONS

#### A. Aims

The aims of the test device "injector" are:

- To provide a data-flow identical to the normal data-flow coming from the detector and the Readout Boards [6]. This means that it has to send network frames as if they were coming from the Readout Board layer, faking the IP addresses [7] and other information.
- To provide a data-flow complex enough to be used for trigger and Offline tests. The simulated data-flow is usually represented by several files of ten million events. The average size of an event is 35 kB.
- To be integrated into the DAQ as a Readout Board. This means being connected to the Readout Supervisor (Timing and Fast Control, TFC) [8] and being triggered by it.

- To be integrated into the Experiment Control System (ECS) [9].
- To be used in parallel with every other LHCb activity.

In the end, the architecture shown in Figure 1 would provide two identical data-flows. One will be dedicated to physics analysis, while the other one will be used for large scale tests.

As the project is in its very first stages, and is related to the ongoing LHCb upgrade, it has some specific aims. During the design period of the upgraded DAQ architecture, it would be interesting to use this injection device as a pattern generator. Since the protocol that the upgraded DAQ will use is not defined yet, it is useful to have a modular architecture for the injector, so that tests can be performed using the current Multi-Event Packet (MEP) [10] protocol or the Transmission Control Protocol (TCP) [11].

#### B. Analysis

In order to achieve a high injection data rate, this device will first be studied with a 10 GbE interface. A single 10 GbE injector would allow a rate of about 35 kHz. Our aim is to provide an input rate high enough for the HLT farm to perform event selection, i.e. greater than 2 kHz. A single injector is therefore already much faster than required, and several injection devices could be used to increase the rate towards the real one. Driving a 10 GbE network interface with commodity hardware is quite limited [12]: reaching the line rate requires at least one CPU entirely dedicated to driving the interface. Processing the events is also quite heavy.
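The 35 kHz figure can be checked with a back-of-the-envelope calculation. The sketch below ignores all protocol overhead (Ethernet, IP and MEP headers, inter-frame gaps), so it gives an upper bound; the function names are hypothetical.

```python
LINE_RATE_BITS_PER_S = 10_000_000_000  # 10 GbE line rate
EVENT_SIZE_BYTES = 35_000              # 35 kB average event size (from the text)


def max_event_rate_hz(line_rate: int = LINE_RATE_BITS_PER_S,
                      event_size: int = EVENT_SIZE_BYTES) -> float:
    """Upper bound on the event rate of one link, ignoring protocol overhead."""
    return line_rate / (8 * event_size)


def injectors_needed(target_rate_hz: int,
                     line_rate: int = LINE_RATE_BITS_PER_S,
                     event_size: int = EVENT_SIZE_BYTES) -> int:
    """Number of links needed for a target rate (integer ceiling division)."""
    bits_per_second = target_rate_hz * 8 * event_size
    return -(-bits_per_second // line_rate)


single_link_rate = max_event_rate_hz()       # roughly 35.7 kHz
links_for_1mhz = injectors_needed(1_000_000)
```

With these numbers a single link tops out at roughly 35.7 kHz, well above the 2 kHz needed by the HLT farm, and on the order of 28 links would be needed to reach the full 1 MHz rate.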

The main task is to read simulated data, process them lightly and then format them to the networking protocol, according to the trigger information coming from the Readout Supervisor. This can be achieved using a pipelined architecture, with a separate stage for each part of the processing: reading, formatting, encapsulating and sending (as shown in Figure 2).
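The four stages can be modelled in software as chained generators, each consuming the output of the previous one. This is only a conceptual sketch: in the FPGA the stages run concurrently on different events, and the 4-byte header used here is a hypothetical stand-in for the real protocol headers.

```python
def read_events(storage):
    """Stage 1: read raw simulated events from storage."""
    for raw in storage:
        yield raw


def format_events(events, triggers):
    """Stage 2: attach the trigger information to each event."""
    for raw, trigger in zip(events, triggers):
        yield {"trigger": trigger, "payload": raw}


def encapsulate(formatted):
    """Stage 3: prepend a header (hypothetical 4-byte trigger number)."""
    for item in formatted:
        header = item["trigger"].to_bytes(4, "big")
        yield header + item["payload"]


def send(frames, sink):
    """Stage 4: hand the frames to the network interface (a list here)."""
    for frame in frames:
        sink.append(frame)


storage = [b"evt0", b"evt1"]
triggers = [7, 8]
network = []
send(encapsulate(format_events(read_events(storage), triggers)), network)
```

After the call, `network` holds one frame per event, each consisting of the header followed by the event payload.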



Figure 2: 4-stage pipeline processing events independent from each other.

The Readout Supervisor uses the Timing, Trigger and Control (TTC) [13] interface to distribute information to all Readout Boards and Injection Devices. This information must be processed in real time, and the injector must always remain synchronised with the supervisor and with peer injectors. This means that we cannot tolerate a delay caused by reading the simulated event or by access to the network interface.

It has been decided, in order to meet all the requirements as well as possible, to implement the injector in hardware, based on a Field-Programmable Gate Array (FPGA). A hardware implementation is indeed the best solution for processing the Readout Supervisor triggers. It also promises better performance in processing data and in driving the 10 GbE interface.

An Altera PCI development board, based on the Arria GX FPGA, was chosen for a preliminary implementation. It features a High-Speed Mezzanine Connector (HSMC), which allows us to interface various types of connectors for the 10 GbE and TTC interfaces. This board will not reach the 10 GbE line rate; it is used for proofs of concept, preliminary implementation and tests. The next version will very likely use an Altera Stratix family FPGA, in order to drive a Small Form Factor Pluggable Transceiver (SFP+) as efficiently as possible.

Following these choices, Figure 3 shows schematically the architecture of the hardware data injector. The design has to be modular, so that one core can easily be replaced by another. This will be used mainly for the layer 4 networking core, to address the specifications, and for the storage access, as this part is still under study and it would be interesting to compare several solutions.



Figure 3: Architecture of the FPGA.

#### III. IMPLEMENTATION

The architecture of every core follows a generic scheme shown in Figure 4. It consists of a Control Unit, which is a Finite State Machine, and a Processing Unit. The Control Unit generates signals to trigger actions in the Processing Unit. The processing unit implements memories, registers and computing units in order to process the data-flow.
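The split between a Control Unit and a Processing Unit can be illustrated with a small software model. The states, the method names and the placeholder operation below are all hypothetical; the real cores are written in a hardware description language and their state machines are not described in the text.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    COMPUTE = auto()


class ProcessingUnit:
    """Holds the data path: one register and an output buffer."""

    def __init__(self):
        self.register = 0
        self.output = []

    def load(self, word: int) -> None:
        self.register = word

    def compute(self) -> None:
        # Placeholder operation standing in for the real data processing.
        self.output.append(self.register ^ 0xFF)


class ControlUnit:
    """Finite state machine that triggers actions in the Processing Unit."""

    def __init__(self, unit: ProcessingUnit):
        self.unit = unit
        self.state = State.IDLE

    def step(self, word=None) -> State:
        """Advance by one clock cycle; `word` is the input, if any."""
        if self.state is State.IDLE and word is not None:
            self.unit.load(word)
            self.state = State.COMPUTE
        elif self.state is State.COMPUTE:
            self.unit.compute()
            self.state = State.IDLE
        return self.state


pu = ProcessingUnit()
cu = ControlUnit(pu)
cu.step(0x0F)   # IDLE -> COMPUTE: the word is loaded into the register
cu.step()       # COMPUTE -> IDLE: the result is written to the output
```

Keeping the sequencing in the Control Unit and the data handling in the Processing Unit is what allows one core to be swapped for another without touching the rest of the pipeline.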



Figure 4: Generic model of a core.

This section presents the implementation of the networking layer, and the investigations for the storage access layer and the integration into the experiment control system.

#### A. Networking Implementation

The current network stack in LHCb is MEP over IP over Ethernet. MEP is a UDP-like protocol (User Datagram Protocol [14]) with a limited feature set. With this device, we would therefore like to test other protocols over IP. We are considering using TCP for the upgraded DAQ: it would provide flow control and would ensure that no data are lost over the network. The IP and Ethernet cores will always be used.

The idea is to implement one core per protocol, and to connect them in a pipeline. All modules are therefore working in parallel and producing a stream of packets on the network interface.

A licensed Intellectual Property core manages the 10 GbE Media Access Control (MAC). On top of it, the IP and MEP cores were developed. A particularity of our design is that the IP core is custom: it does not include the IP fragmentation process, and it only sends data. We can afford this only in the case of the MEP protocol, as we only need to send data, not to receive them. The fragmentation is performed at the output of the MEP module. This departure from the standard minimizes the resources used by the system in the case of the MEP transport protocol, as it requires less memory. The complete frame (header and physics data) is cut as it flows out of the MEP core, as shown in Figure 5, so the length of the IP core input frame is always lower than the maximum size. The IP core, however, requires a few additional signals to keep the IP headers of the incoming fragments consistent.
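Cutting a frame into fragments before it reaches the IP layer can be sketched as below. This is a generic illustration, not the MEP core logic: the function name is hypothetical, and the only protocol fact relied upon is that IPv4 expresses fragment offsets in units of 8 bytes (RFC 791), so every fragment except the last must carry a multiple of 8 bytes.

```python
from typing import List, Tuple


def fragment(frame: bytes, max_payload: int) -> List[Tuple[int, bool, bytes]]:
    """Split a frame into (offset_in_8_byte_units, more_fragments, chunk)."""
    if max_payload <= 0 or max_payload % 8 != 0:
        raise ValueError("max_payload must be a positive multiple of 8")
    fragments = []
    for start in range(0, len(frame), max_payload):
        chunk = frame[start:start + max_payload]
        more = start + max_payload < len(frame)   # False only for the last one
        fragments.append((start // 8, more, chunk))
    return fragments


parts = fragment(bytes(range(20)), max_payload=8)
```

The offset and the more-fragments flag are the pieces of information that every fragment header has to carry consistently, which matches the remark above about the extra signals needed by the IP core.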

In the case of the upcoming TCP integration, we will manage the IP fragmentation at the output of the TCP core. We will then need one more IP module, dedicated to the reception of data. These data will mainly consist of TCP acknowledgement packets. Indeed, receiving data, even small packets, requires implementing IP reassembly. Our network architecture will use, for each protocol, one core dedicated to sending data and one core dedicated to receiving data.



Figure 5: Processing Unit of the MEP core, with the fragmentation module.

#### B. Storage access

The FPGA injector device cannot store a large amount of physics data. In order to address our requirements, it has to read data from an external storage system. Mainly two options were studied:

- Access to a hard disk drive via the PCI interface.
- Access to a remote storage system via the iSCSI protocol [15].

The most scalable, interesting and challenging solution is the iSCSI implementation. Although it is currently provided by many industrial companies for FPGA-based storage acceleration solutions, open source IP cores are not yet available. Its implementation requires considerable development time.

Here we would use it to access a raw partition of our storage system, which would contain simulated physics events in a raw format. This partition would not be interpreted by a filesystem but would store the data directly.

#### C. Trigger and Control System

The hardware injector is triggered in the same way as a normal Readout Board. It receives the trigger and all associated information via a TTC optical signal. This signal is encoded on two channels: one carries the trigger itself, indicating whether the event is accepted; the other distributes information related to the LHCb DAQ, such as the destination HLT farm node, and information about the trigger. This information is required to write the IP and MEP headers.

There are basically two ways to implement the reception of this signal. The first one is to interface directly a PIN diode. The other one is to use a TTCRx board [16].

This part is very important for the integration of the injection device into the control system, but not for preliminary tests, for which the trigger information can be simulated. Until it is implemented, this part relies on emulation.

Nevertheless, the solution currently selected is to interface the TTCRx board. It requires the design of a routing daughter board to adapt the TTCRx interface to the HSMC of the development board.

#### IV. CONCLUSION

This project is still very young. It is integrated into the upgrade of the LHCb detector, more particularly in the upgrade of the Online Data Acquisition system.

For the first few months of development, we focused on the implementation of the networking layer. So far we have the network architecture for data transmission with the MEP protocol. Although the simulation results are correct, real performance tests are required to validate this design. The integration into the control system and the implementation of the storage access layer will follow shortly after.

#### ACKNOWLEDGMENTS

This research project has been supported by a Marie Curie Initial Training Network Fellowship of the European Community's Seventh Framework Programme under contract number PITN-GA-2008-211801-ACEOLE.

#### REFERENCES

[1] The LHCb Collaboration, A Augusto Alves Jr *et al.*, The LHCb Detector at the LHC, JINST **3** S08005 (2008).

[2] J. Garnier *et al.*, High-Speed Data-Injection for Data-Flow Verification in LHCb, *16th IEEE Real Time* (2009).

[3] M. Cattaneo, LHCb Full Experiment System Test, *CHEP* (2009).

[4] LHCb HLT homepage, http://lhcb-trig.web.cern.ch/lhcb-trig/HLT.

[5] P. R. Barbosa-Marinho *et al.*, LHCb Technical Design Report, CERN/LHCC/2001-040 (2001).

[6] A. Bay *et al.*, The LHCb DAQ interface board TELL1, *Nucl. Instrum. and Methods* A560 (2006) 494.

[7] Information Sciences Institute, University of Southern California, RFC791 - Internet Protocol.

[8] LHCb TFC homepage, http://cern.ch/lhcb-online/TFC.

[9] LHCb ECS homepage, http://cern.ch/lhcb-online/ecs.

[10] B. Jost, N. Neufeld, Raw-data Transport Format, EDMS 499933.

[11] Information Sciences Institute, University of Southern California, RFC793 - Transmission Control Protocol.

[12] Domenico Galli *et al.*, Performance of 10 Gigabit Ethernet Using Commodity Hardware, *16th IEEE Real Time* (2009).

[13] B. Taylor, Timing Distribution at the LHC, 8th Workshop on Electronics for LHC Experiments (2002).

[14] J. Postel, RFC768 - User Datagram Protocol.

[15] J. Satran *et al.*, Internet Small Computer Systems Interface (iSCSI) (2004).

[16] J. Christiansen *et al.*, TTCrx Reference Manual (2004).

## Wafer Screening of ABCN-25 readout ASIC

Peter W Phillips <sup>a</sup>, Bruce Gallop <sup>a</sup>, Richard Matson <sup>a</sup>, Richard Shaw <sup>b</sup>

<sup>a</sup> STFC Rutherford Appleton Laboratory, Didcot, Oxon, OX11 0QX UK <sup>b</sup> Cavendish Laboratory, University of Cambridge, CB3 0HE, UK

#### Peter.W.Phillips@stfc.ac.uk

#### Abstract

The ABCN-25 chip was fabricated in 2008 in the IBM 0.25 micron CMOS process. One wafer was immediately diced to make chips available for evaluation with test PCBs and hybrids, programmes which are reported separately to this conference. A second wafer was later diced untested to ensure continuity of supply. Early indications based on the first diced wafer suggested a yield of more than 95%; however, the community decided to screen the remaining wafers so that faulty die could be excluded from the module construction programme. This paper documents the test hardware, software and procedures used to perform the screening. An overview of results is also given.

#### I. INTRODUCTION

The ATLAS Binary Chip-Next (ABCN-25) readout ASIC is designed to support the R&D programme towards silicon detector modules for the ATLAS Tracker Upgrade. The chip implements pipelined binary readout for 128 silicon short strip detector channels.

Fabricated in 0.25 micron IBM CMOS technology during late 2008, the first wafer was diced immediately. Initial tests of wire-bonded chips revealed the design to be fully functional with a very high yield [1], [2]. However, as each detector module will use 40 chips, it remains important that any faulty chips are identified on-wafer such that they can be excluded from the build process.

#### II. HARDWARE AND SOFTWARE

The wafers were probed at the Rutherford Appleton Laboratory using a Cascade Microtech model S300 probe station. The machine, which has a 12" chuck, can easily accommodate the 8" ABCN-25 wafers. The custom cantilever epoxy probe card shown in figure 1, made by Rucker and Kolls, Milpitas, CA, has 122 probes. In place of the usual edge connector, 0.1" header pins are used to provide connectivity, a deliberate choice to give added clearance above the wafer surface during probing. The card has also been shortened to minimize the trace lengths and all LVDS pairs are terminated with 100 ohm resistors at the probe ring. Figure 2 shows the probe card aligned with an ABCN-25 die, probes in contact.

Commercial off the shelf (COTS) hardware from National Instruments (NI) is used to read out each ASIC. Fast test vectors are generated by the NI PCI-6562 400 Mb/s Digital Waveform Generator/Analyzer, which has 16 LVDS channels, and slow test vectors are generated by the NI PCI-6509 Low-Cost 96-channel TTL Digital I/O card.



Figure 1: ABCN-25 Probe Card



Figure 2: Probes in contact with ABCN-25 die

The single bonded chip PCB shown in figure 3 (left) was used extensively during firmware development and system commissioning. The custom driver board shown in figure 3 (right) performs level translation and implements a number of operational modes in which different combinations of ABCN-25 IO lines are mapped to fast and slow IO channels. The multiplexing is achieved by means of a Xilinx Spartan 3E FPGA with firmware written in VHDL. Analogue switches are provided to enable the chip's built-in custom power blocks to be exercised and to route the chip's analogue monitor pads to an Agilent 34401A digital multimeter. A second digital multimeter is used to monitor the analogue voltage generated by the ABCN-25's internal regulator, and a Thurlby-Thandar programmable power supply is used to provide constant voltage or constant current sources to the device under test.



Figure 3: ABCN-25 Bonded Chip and Custom Driver PCBs

The software used to control the system is a development of the SCTDAQ package used during early tests of the ABCN-25 chip. Written in C++, using ROOT for data analysis, this package originally used custom VME readout modules but has been successfully adapted to use COTS hardware from NI.

#### III. TEST METHODOLOGY

The test sequence comprises three parts: digital/power tests, DAC characterisation and analogue characterisation.

#### A. Digital/Power tests

Four test vector blocks engineered to test the complete digital functionality of the chip were supplied by the ASIC design team as Value Change Dump (.vcd) files. Each block was converted into a pair of Hierarchical Waveform Storage (.hws) files using National Instruments' Digital Waveform Editor utility as shown in figure 4: one file representing the waveform input to the chip, shown at the top, and a second file describing the expected output, shown at the bottom.



Figure 4: Vector Block A: Inputs and Expected outputs

Bearing in mind that ABCN-25 has built-in shunt regulator functionality to be used as part of a serially powered system [3], each test is performed under different powering conditions such that the basic functionality of each of the shunt blocks may be demonstrated. For a chip to be considered good, it must return no errors for any vector block. In addition, for tests executed with serial powering shunts active, the full source current must be drawn at the expected voltage.

#### B. DAC characterisation

The ABCN-25 design includes an on-chip multiplexer which enables each of a number of internal analogue signals to be routed to an external voltmeter. These nodes include the output of each of the chip's Digital to Analogue Converters (DACs). During the wafer probing, a digital voltmeter is used to record 8 points to characterise each DAC, and a single measurement for each static node available through the multiplexer. Additional measurements are made of the bandgap reference made available at the vbgtest pin and of the analogue voltage derived from the digital supply by the chip's built in regulator. Chips having DACs with anomalous single point measurements or DAC step sizes are considered as rejects.
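The step-size check described above can be sketched as a straight-line fit through the eight measured points. This is a minimal illustration only: the function names, the 2.5 mV nominal step and the 20% tolerance are assumptions, not values taken from the test system.

```python
def dac_step_size(codes, volts):
    """Least-squares slope (volts per DAC code) through the measured points."""
    n = len(codes)
    mean_c = sum(codes) / n
    mean_v = sum(volts) / n
    num = sum((c - mean_c) * (v - mean_v) for c, v in zip(codes, volts))
    den = sum((c - mean_c) ** 2 for c in codes)
    return num / den

def is_anomalous(step, nominal_step, tolerance=0.2):
    """Flag a step size that differs from nominal by more than `tolerance` (fractional)."""
    return abs(step - nominal_step) > tolerance * abs(nominal_step)

# Hypothetical 8 point scan of an 8 bit DAC with a 2.5 mV nominal step.
codes = [0, 32, 64, 96, 128, 160, 192, 224]
good_dac = [0.0025 * c for c in codes]
half_step_dac = [0.00125 * c for c in codes]   # step size half the nominal value

good_step = dac_step_size(codes, good_dac)
bad_step = dac_step_size(codes, half_step_dac)
```

A die whose fitted step is flagged by `is_anomalous` would be treated as a reject in this sketch.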

#### C. Analogue characterisation

Each ABCN-25 readout channel has a 5-bit threshold DAC, used to compensate for offset variations across the chip. In addition the step size of these DACs, known as the trim range, may be set to one of 8 possible levels. All wafer probing data is recorded using trim range 4. With all trim DACs set to zero, threshold scans are made for charges of 1.5fC, 2.0fC and 2.5fC, injected using the ABCN-25's internal calibration circuitry. A fourth threshold scan is then made for an injected charge of 2.0fC, but this time the trim DACs are set to 31. This data may be analysed to calculate the gain, offset and noise of each channel and to estimate the number of channels which may be trimmed using the selected trim range. For a chip to be considered as good, it must have no more than one bad (dead, stuck or untrimmable) channel.
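A three point gain calculation of this kind can be sketched as a linear fit of the measured threshold against the injected charge. The sketch assumes each threshold scan has already been reduced to a single 50% point and an output noise in mV; the numerical thresholds and the 6.85 mV noise are hypothetical values chosen for illustration.

```python
ELECTRONS_PER_FC = 6241.5   # 1 fC divided by the elementary charge

def three_point_gain(charges_fc, thresholds_mv):
    """Least-squares line threshold = offset + gain * charge; returns (gain, offset)."""
    n = len(charges_fc)
    mean_q = sum(charges_fc) / n
    mean_v = sum(thresholds_mv) / n
    num = sum((q - mean_q) * (v - mean_v) for q, v in zip(charges_fc, thresholds_mv))
    den = sum((q - mean_q) ** 2 for q in charges_fc)
    gain = num / den
    return gain, mean_v - gain * mean_q

def noise_electrons(output_noise_mv, gain_mv_per_fc):
    """Convert output noise in mV to equivalent input noise charge in electrons."""
    return output_noise_mv / gain_mv_per_fc * ELECTRONS_PER_FC

charges = [1.5, 2.0, 2.5]              # injected charges from the text, in fC
thresholds = [162.5, 210.0, 257.5]     # hypothetical 50% points, in mV
gain, offset = three_point_gain(charges, thresholds)
enc = noise_electrons(6.85, gain)      # hypothetical 6.85 mV output noise
```

Because the input noise is obtained by dividing the output noise by the gain, an underestimated gain directly inflates the calculated noise.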

#### **IV. OPERATIONAL EXPERIENCE**

The probe card had been stored for some months before probing began. In order to make low resistance contacts it was necessary to clean the needles by scrubbing them a few times against an alumina ceramic sheet. Once reproducible results had been demonstrated using individual cut die, probing of the four remaining wafers began.

The software was originally written to retest die which failed any of the digital vector blocks, having first dropped and raised the chuck to relocate the probe needles, and to abort the test sequence if the die still failed. As we gained experience with the system, this was modified such that the sequence would only abort if three consecutive die failed both automated test attempts. In this manner a typical wafer of 456 complete ASICs would run to completion overnight, leaving a small number of die to be investigated in the morning. In all cases, the DAC and analogue characterisations were skipped for die which failed the digital tests. The test sequence takes approximately 2 minutes per die, dominated by the DAC characterisation (60%), but for such a small number of wafers this was not considered to be an issue.
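The retest and abort policy described above can be sketched as follows. This is not the SCTDAQ code: `probe_wafer`, `run_digital_tests` and `reseat_probes` are hypothetical names, and the small fake wafer exists only to exercise the logic.

```python
def probe_wafer(die_ids, run_digital_tests, reseat_probes, max_consecutive=3):
    """Test each die, retrying once after reseating the probes.

    Returns (results, aborted); results maps die id to "pass" or "fail".
    The run aborts once `max_consecutive` die in a row fail both attempts.
    """
    results = {}
    consecutive_failures = 0
    for die in die_ids:
        passed = run_digital_tests(die)
        if not passed:
            reseat_probes()                  # drop and raise the chuck
            passed = run_digital_tests(die)
        results[die] = "pass" if passed else "fail"
        consecutive_failures = 0 if passed else consecutive_failures + 1
        if consecutive_failures >= max_consecutive:
            return results, True
    return results, False

# Hypothetical wafer: die 4 passes only after reseating, die 7 to 9 are dead.
attempts = {}

def run_digital_tests(die):
    attempts[die] = attempts.get(die, 0) + 1
    if die == 4:
        return attempts[die] >= 2
    return die not in (7, 8, 9)

results, aborted = probe_wafer(range(12), run_digital_tests, reseat_probes=lambda: None)
wafer_hours = 456 * 2 / 60                   # 456 die at about 2 minutes each
```

With the figures quoted in the text, `wafer_hours` comes to roughly 15 hours, consistent with an overnight run.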



Figure 5: Typical Three Point Gain Result.

A typical three point gain result for a probed chip is shown in figure 5. The top graph illustrates good threshold uniformity across the chip, the second plot shows the gain to be 95mV/fC and the bottom one shows the calculated input noise to be of order 450 ENC. The gain is approximately 10% lower than that of a bonded chip and the calculated noise is hence around 10% higher. The results are still of sufficient quality to screen wafers for anomalous die.

Figure 6 shows gain for all die of a single wafer and figure 7 shows the step size of one of ABCN-25's threshold DACs, also as a function of die number. Both plots show data from wafer A6GBD0X and feature the same two obvious outliers, having zero gain in one plot and a step size of half the normal value in the other. Hence the gain could not be determined due to a failure of the threshold DAC. Indeed all major DAC anomalies, both threshold and bias parameters, were found on die which would in any case have been rejected due to their limited analogue functionality. For production screening this could be an important observation, as the threshold scans complete much more quickly than the DAC characterisations.



Figure 6: Gain vs Die Number, wafer A6GBD0X



Figure 7: Threshold DAC step size, wafer A6GBD0X

#### V. RESULTS

Result maps of the four probed wafers are shown overleaf in figures 8 to 11 and summarised below in table 1. The pattern displayed by wafer AJGBDMX is perhaps the most interesting, having a cluster of digital failures near the wafer notch (top) and a cluster of analogue failures near the wafer serial number (bottom). The orientation of the die is such that the digital portion is nearest the wafer edge near the notch, and the analogue portion is nearest the wafer edge near the serial number. This pattern is therefore consistent with a processing defect affecting circuit blocks at large radii.

| Wafer   | Digital<br>Rejects | Analogue<br>Rejects | Good<br>Die | Total<br>Yield |
|---------|--------------------|---------------------|-------------|----------------|
| A6GBD0X | 10                 | 4                   | 442         | 96.9%          |
| AJGBDMX | 18                 | 12                  | 426         | 93.4%          |
| ARGBCYX | 15                 | 1                   | 440         | 96.5%          |
| AWGBDAX | 12                 | 15                  | 429         | 94.1%          |
| Overall | 55                 | 32                  | 1737        | 95.2%          |

Table 1: Yield Summary
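The yields in table 1 follow directly from the reject counts, assuming every wafer carries the 456 complete ASICs quoted earlier. A short check, with variable names chosen here for illustration:

```python
DIE_PER_WAFER = 456   # complete ASICs per wafer, as quoted in the text

rejects = {           # wafer: (digital rejects, analogue rejects), from table 1
    "A6GBD0X": (10, 4),
    "AJGBDMX": (18, 12),
    "ARGBCYX": (15, 1),
    "AWGBDAX": (12, 15),
}

def wafer_yield(digital, analogue, total=DIE_PER_WAFER):
    """Return (number of good die, yield in percent)."""
    good = total - digital - analogue
    return good, 100.0 * good / total

yields = {wafer: wafer_yield(d, a) for wafer, (d, a) in rejects.items()}
total_good = sum(good for good, _ in yields.values())
overall = 100.0 * total_good / (DIE_PER_WAFER * len(rejects))
```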

#### VI. CONCLUSION

Four ABCN-25 wafers were successfully probed. The overall yield before dicing was found to be 95.2%. Commercial hardware from National Instruments provided an appropriate platform to read out each chip.

#### VII. OUTLOOK

It is planned to order further ABCN-25 wafers in the near future, to provide continued support to the ATLAS strip tracker upgrade programme. These wafers will also be screened at RAL. Looking further ahead, most elements of the present system may also be used to test future generations of ATLAS strip tracker readout chips made in 0.13 micron technology.



Figure 8: Wafer A6GBD0X test map



Figure 9: Wafer AJGBDMX test map

#### VIII. REFERENCES

[1] "Performance of the ABCN-25 readout chip for ATLAS Inner Detector Upgrade", F. Anghinolfi, proceedings of this conference.

[2] "Prototype flex hybrid and module designs for the ATLAS Inner Detector Upgrade utilising the ABCN-25 readout chip and Hamamatsu large area Silicon sensors", A. Greenall, proceedings of this conference.

[3] "Performance and Comparison of Custom Serial Powering Regulators and Architectures for SLHC Silicon Trackers", T. Tic et al, proceedings of this conference.





#### IX. ACKNOWLEDGEMENTS

The authors wish to acknowledge the help and assistance of the ABCN-25 community, especially: Francis Anghinolfi and Jan Kaplon, CERN, Geneva; Didier Ferrere and Sergio Gonzalez Sevilla, University of Geneva; Wladek Dabrowski and Michal Dwuznik, AGH Krakow; Tony Affolder and Ashley Greenall, University of Liverpool; Matt Warren, University College London; Mitch Newcomer and Mike Reilly, University of Pennsylvania.

## A Digitally Calibrated 12 bits 25 MS/s Pipelined ADC with a 3 input multiplexer for CALICE Integrated Readout

F. Rarbi <sup>a</sup>, D. Dzahini <sup>a</sup>, L. Gallin-Martell <sup>a</sup>, J-Y Hostachy <sup>a</sup>

<sup>a</sup> IN2P3 – LPSC, 53 rue des Martyrs 38026 Grenoble, France

rarbi@lpsc.in2p3.fr

#### Abstract

The need for fully integrated readout electronics for the next ILC ECAL presents many challenges for low power mixed signal design. The analog to digital converter is a critical stage of the system, linking the very front-end stages to the digital memories. We present here a high speed converter configuration designed to multiplex 3 analog channels through one analog to digital converter. It is a first step towards a multiplexed 64 channel design. A CMOS $0.35\mu$m process is used. The dynamic range is 2V with a 3.3V power supply, and the total power dissipation at 25 MHz is approximately 40mW. Analog power management is included to allow fast switching into a standby mode that reduces the DC power dissipation by three orders of magnitude (a factor of 1000).

#### I. INTRODUCTION

For the next International Linear Collider (ILC), the front-end electronics for the electromagnetic calorimeter is very challenging. Mechanical constraints make it necessary to integrate many different critical stages of the read-out electronics in the same chip: charge preamplifiers, multi gain shapers, analog memories, ADC, and digital back-end. The average power consumption budget is limited to only $25\mu$W per channel. This is achievable by taking advantage of a power pulsing system with a 1/100 duty cycle, made possible by the beam timing of the ILC. The design of the converter must deal with the power dissipation constraint, which is one of the main concerns for the electronics. We present here a high speed converter configuration designed to multiplex many analog channels to one ADC as shown in figure 1.



Figure 1: Overview of the front-end read out in the high speed configuration

The chip includes a 12 bit ADC and a 3 to 1 analog multiplexer. This design assumes that a high speed converter helps to minimize the total cross talk and the equivalent power dissipation per channel. A pipelined architecture is used. For converters with high dynamic range (more than 10 bits) and high speed (beyond 10 MHz), this architecture is usually considered a good compromise between power dissipation and speed [1]-[4]. An overview block diagram is shown in figure 2.



Figure 2: General block diagram of a pipelined converter

The ADC is composed of a set of pipelined stages. Each stage produces a digital estimate of an incoming held signal, converts this estimate back to analog, and subtracts the result from the held input. This residue is then amplified before being transferred to the next stage. The last stage is a full flash converter which determines the least significant bits (LSBs). The successive digital results from the pipelined stages are appropriately delayed through a bit alignment network. A digital correction stage then corrects the errors due to the offsets of the comparators. Therefore, low offset comparators are not necessary and the total power consumption is reduced. The power dissipation is optimized by scaling the power down along the successive pipeline stages.

This paper summarizes the design of two prototypes of the converter and presents some test and simulation results. The first chip was implemented without any calibration or trimming [5]. The second prototype is designed with a 3 to 1 analog multiplexer and includes an ADC with a dynamic element matching algorithm to improve the linearity.

#### II. THE PIPELINE ADC

#### A. The 1.5 bit stage

The converter consists of ten 1.5 bit sub-ADCs followed by a 2 bit full flash stage (refer to figure 2). Figure 3 shows a very simplified diagram of a 1.5 bit pipeline stage. The actual implementation in our design is differential. The A/D block consists of two non critical comparators. The D/A conversion, subtraction, amplification, and S/H functions are performed by a switched capacitor circuit with a resolution of 1.5 bit per stage and an amplification gain of 2. Hence the transfer function of this stage is: $V_s = 2 V_{in} - \alpha V_{ref}$.

$\alpha$ is set to 0, 1 or -1, depending on the output codes (b0, b1); $\pm V_{ref}$ specifies the dynamic range.
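The following behavioural sketch illustrates how ten stages with this transfer function, followed by a 2 bit flash, yield a 12 bit code, and why comparator offsets are absorbed by the digital correction. It is an idealised model, not the circuit: $V_{ref}$ is normalised to 1, the comparator thresholds at $\pm V_{ref}/4$ are an assumption (the usual choice, not stated in the text), and the function names are hypothetical.

```python
import random

def stage_1p5(vin, off_lo=0.0, off_hi=0.0):
    """One 1.5 bit stage with Vref = 1: returns (alpha, residue)."""
    if vin > 0.25 + off_hi:
        alpha = 1
    elif vin < -0.25 + off_lo:
        alpha = -1
    else:
        alpha = 0
    return alpha, 2.0 * vin - alpha          # Vs = 2*Vin - alpha*Vref

def convert(vin, n_stages=10, max_offset=0.0, rng=None):
    """12 bit code for vin in [-1, 1): ten 1.5 bit stages plus an ideal 2 bit flash."""
    rng = rng or random.Random(0)
    code = 0
    for i in range(1, n_stages + 1):
        alpha, vin = stage_1p5(vin,
                               rng.uniform(-max_offset, max_offset),
                               rng.uniform(-max_offset, max_offset))
        code += alpha * 2 ** (n_stages + 1 - i)      # overlapping stage weights
    flash = min(3, max(0, int((vin + 1.0) * 2.0)))   # quantise the last residue
    return code + flash + 2 ** (n_stages + 1) - 2    # shift to the range 0..4095

# Mid-code inputs: the code is unchanged by comparator offsets up to 0.2 * Vref.
test_inputs = [(k + 0.5) / 2048.0 - 1.0 for k in range(0, 4096, 37)]
ideal_codes = [convert(v) for v in test_inputs]
offset_codes = [convert(v, max_offset=0.2, rng=random.Random(1)) for v in test_inputs]
```

In this model any offset below $V_{ref}/4$ keeps the residue within $\pm V_{ref}$, so the overlapping weights of successive stages recover the same code.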



Figure 3: Block diagram of a 1.5 bit sub-converter stage.

The prototype has been tested successfully at 25 MHz with a power supply of 3.3 V. The total power consumption was only 37mW.

Figure 4 shows the output codes for a 2 V peak-to-peak dynamic range with a 1 MHz sine wave input signal.



Figure 4: Output codes for an input 2V peak-to-peak sine wave

The Differential Non Linearity (DNL) and the Integral Non Linearity (INL) are presented in figures 5 and 6 respectively.



The DNL is almost  $\pm 1$ LSB, and the INL is  $\pm 4$ LSB.

This prototype meets the CALICE requirements and is close to the capacitor matching limits of this $0.35\mu$m process. One solution to further improve the linearity and the total power consumption is to include a multi-bit first stage. A second prototype was therefore designed. This new version uses 2.5 bits in the first stage, followed by seven 1.5 bit stages and a final 3 bit full flash. The architecture of this second prototype is illustrated in figure 7.



Figure 7: Block diagram of a pipelined converter with a multi-bit stage

#### B. The 2.5 bit stage

Increasing the number of bits in the front-end stage relaxes the matching conditions necessary for the back-end, but it makes the amplifier consume more power in order to meet the gain bandwidth product requirements. The gain errors in this first stage are digitally controlled by means of a dynamic element matching (DEM) algorithm which randomly selects the DAC capacitor cells. This algorithm helps to minimize the integral non linearity.

Figure 7 shows a simplified diagram of a 2.5 bit stage used as the front end stage of the pipeline converter. The ADC block consists of six non critical comparators. The DAC conversion, subtraction, amplification, and S/H functions are performed by a switched capacitor structure, as shown in figure 8. This block is the multiplier-DAC (MDAC). It is composed of four capacitors.



Figure 8: A 2.5 bit MDAC: a) sampling phase; b) amplifying phase

The incoming signal is sampled during phase $\Phi_1$ (figure 8 a). It is amplified by charge redistribution during phase $\Phi_2$ (figure 8 b). During this amplification phase, one plate of each sampling capacitor ($C_{si}$) is connected to a reference voltage $V_{refi}$ which is subtracted from the amplified signal. The residue resulting from this operation is transmitted to the next pipeline stage. The value selected for $V_{refi}$ is 0, $-V_{ref}$ or $+V_{ref}$ depending on the comparator outputs. The amplification gain is 4. Hence the transfer function of this stage is: $V_s = 4 V_{in} - (\alpha+\beta+\gamma) V_{ref}$, where $\alpha$, $\beta$ and $\gamma$ are set to 0, -1 or 1 depending on the output codes of the sub-ADC. $\pm V_{ref}$ specifies the dynamic range. The transfer characteristic for a 2.5 bit stage is shown in figure 9.



Figure 9: A 2.5 bit residue transfer curve

The expression "2.5 bit" is used to emphasize that only 7 combinations out of the 8 are acceptable for the output codes. The code (1, 1, 1) is avoided, thereby the amplifier will not saturate and this leaves room for the digital error correction.
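The residue characteristic of such a stage can be sketched in a few lines. The model below normalises $V_{ref}$ to 1 and assumes six comparator thresholds at $\pm 1/8$, $\pm 3/8$ and $\pm 5/8$ of $V_{ref}$; these levels are an assumption made for illustration (the text does not list them), and `stage_2p5` is a hypothetical name. The variable `k` stands for the sum $\alpha+\beta+\gamma$.

```python
def stage_2p5(vin, offsets=(0.0,) * 6):
    """2.5 bit stage with Vref = 1: returns (k, residue), k = alpha + beta + gamma."""
    thresholds = (-5 / 8, -3 / 8, -1 / 8, 1 / 8, 3 / 8, 5 / 8)   # assumed levels
    k = sum(1 for t, o in zip(thresholds, offsets) if vin > t + o) - 3
    return k, 4.0 * vin - k                   # Vs = 4*Vin - k*Vref

grid = [i / 1000.0 - 1.0 for i in range(2001)]          # inputs from -1 to +1
codes_seen = {stage_2p5(v)[0] for v in grid}
worst_ideal = max(abs(stage_2p5(v)[1]) for v in grid)
worst_shifted = max(abs(stage_2p5(v, (0.12,) * 6)[1]) for v in grid)
```

Seven values of `k` appear (from -3 to +3), and in this model the residue stays within $\pm V_{ref}$ even when every comparator is shifted by just under $V_{ref}/8$, which is the margin left for the digital error correction.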

The sub-ADC is composed of 6 low offset and low power dynamic comparators. The simplified schematic of the comparator is shown in figure 10.



Figure 10: The dynamic comparator

The maximum offset of these comparators must be limited to  $V_{ref}/8$ , where  $\pm V_{ref}$  is the full dynamic range. Our Monte Carlo simulation of the comparator's offset is shown in figure 11 where one can notice a value less than  $\pm 40$  mV.



Figure 11: Offset of dynamic comparator (Monte Carlo simulation with 50 bins)

The output codes from the comparators are used thereafter by the DAC to rebuild the analog residue. A precise amplification by 4 is performed by four equivalent capacitors as shown in figure 8. The matching of  $C_f$  with all  $C_s$  is the main issue for this amplification, and it is the main cause of non-linearity for the converter.

To achieve 12 bit resolution, the amplifier (OTA) in the first stage must have a high open loop gain (more than 72 dB). The folded-cascode architecture used is shown in figure 12. Auxiliary amplifiers are added to increase the open loop gain [6], at the expense of only a little extra power dissipation. The Bode diagram simulation results are given in figure 13.
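One way to read the 72 dB figure, assuming the usual rule of thumb that the open loop gain $A_0$ should be at least of the order of $2^N$ for an $N$ bit converter (an assumption, not a statement from the design itself):

$$A_0 \gtrsim 2^{12} = 4096 \quad\Rightarrow\quad 20\log_{10}(4096) \approx 72.2\ \text{dB}$$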



Figure 12: A regulated folded-cascode OTA



Figure 13: Bode diagram for the OTA on a 4pF load.

As can be seen in the Bode diagram (figure 13), the cut off frequency at a closed-loop gain of 4 (i.e. 12dB) is approximately 80MHz. Therefore a 25MHz sampling frequency is easily attainable.
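A rough settling estimate supports this statement. The sketch assumes a single-pole closed-loop response and that half of the 25 MHz clock period is available for the amplification phase; both are simplifying assumptions, and slewing is ignored.

```python
import math

def settling_error(bandwidth_hz, settle_time_s):
    """Relative residual error of a single-pole system after settle_time_s."""
    tau = 1.0 / (2.0 * math.pi * bandwidth_hz)
    return math.exp(-settle_time_s / tau)

f_clk = 25e6
half_period = 0.5 / f_clk                 # 20 ns assumed for amplification
error = settling_error(80e6, half_period)
lsb_12bit = 2.0 ** -12                    # one LSB relative to full scale
```

Under these assumptions about ten time constants fit into the amplification phase, leaving a relative error of a few $10^{-5}$, below one 12 bit LSB ($2.4 \times 10^{-4}$).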

The linearity simulations of our first Multiplier and DAC stage are given in Figure 14. One can notice a full range integral non linearity (INL) in the order of 1 LSB.



Figure 14: Non linearity of the 2.5 bit MDAC (plot title: Multi-bit MDAC linearity, in LSB at 12 bit).

#### C. Dynamic Element Matching (DEM)

Dynamic Element Matching improves the linearity of the ADC by converting harmonic distortion into noise. The DEM block diagram is shown in figure 15.



Figure 15: DEM Block diagram.

It consists of a random generator and a command control block which together randomly connect one capacitor as the feedback capacitor on the OTA. The "yellow" block in figure 15 links the comparator outputs to the MDAC switches through only one transistor gate, so as to be insensitive to propagation time.

Matlab simulation results of the DEM principle are shown in figure 16. When DEM is "off" (figure 16 a), harmonic distortion degrades the linearity. In figure 16 b) DEM is "on": the harmonic distortion is converted into noise, and the noise floor is slightly increased.
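The averaging effect behind DEM can be illustrated with a toy model, which is not the Matlab simulation mentioned above. It assumes a four capacitor MDAC in which all capacitors sample the input and one is then used as the feedback capacitor, so that the stage gain is the total capacitance divided by the feedback capacitance. The capacitor values are hypothetical.

```python
import random

def stage_gain(caps, feedback_index):
    """Gain of a 4 capacitor MDAC: all capacitors sample, one is the feedback."""
    return sum(caps) / caps[feedback_index]

caps = [1.002, 0.997, 1.004, 0.997]   # hypothetical mismatched unit capacitors
fixed_error = abs(stage_gain(caps, 0) - 4.0)          # feedback capacitor fixed

rng = random.Random(1)
picks = [rng.randrange(4) for _ in range(20000)]      # random choice per sample
mean_gain = sum(stage_gain(caps, i) for i in picks) / len(picks)
dem_error = abs(mean_gain - 4.0)
```

With a fixed feedback capacitor the gain error is systematic; with a random choice the error on each sample is still present but varies from sample to sample, so it appears as noise while the average gain stays much closer to 4.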



This design was submitted in a CMOS $0.35\mu$m process from Austria Micro System. The full layout photograph of the prototype is shown in figure 17. The prototype is composed of an analog multiplexer followed by a 12 bit pipeline ADC with a 2.5 bit first stage. The 3 channel analog multiplexer design and simulation results are presented in the next part.



Figure 17: Layout photograph of the full prototype: analog multiplexer+ADC

#### III. THE 3 INPUTS ANALOG MULTIPLEXER

We present in this section the architecture and some simulation results of the 3 channel analog multiplexer.

#### A. Analog Multiplexer architecture

The analog multiplexer is designed to transfer the signals from the analog memories successively to the high speed ADC. A pseudo-differential and flip-flop architecture is used to overcome the capacitor matching problem. A block diagram of the multiplexer is shown in figure 18.



Figure 18: Analog multiplexer schematic in a) write mode, and b) read mode

During the write mode a), input signals are sampled on capacitors $C_{si}$. After that, each capacitor is connected sequentially as the feedback capacitor of the amplifier. The same capacitor is used for both sampling and reading, so capacitor mismatch causes no gain error in the analog multiplexer.

#### B. Simulation results

Two full range ramps with opposite slopes are applied to the outer channels while a constant low level 2mV signal is applied to the middle channel.

The error found  $(213\mu V)$  is less than 1 LSB, and the impact on the low level signal is only  $140\mu V$ .
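For reference, these errors can be expressed in LSB, assuming the LSB is defined over the 2 V dynamic range at 12 bit resolution quoted in the abstract:

```python
full_scale_v = 2.0
n_bits = 12
lsb_uv = full_scale_v / 2 ** n_bits * 1e6    # LSB size in microvolts

error_lsb = 213 / lsb_uv                     # multiplexer error from the text
low_signal_shift_lsb = 140 / lsb_uv          # impact on the low level signal
```

This gives an LSB of about 488 µV, so the two figures correspond to roughly 0.44 LSB and 0.29 LSB.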



Figure 19: Multiplexer simulation results.

#### IV. AVERAGE POWER CONSUMPTION

We present in this section some simulation results about power consumption of this chip: the 3 inputs analog multiplexer followed by a 12-bit pipeline A/D converter using 2.5 bits in the first stage.

The total power consumption of the analog multiplexer is around 5.4mW according to our simulations up to 25MHz. This power consumption comes mainly from the amplifier. The pipeline ADC has a power consumption of about 40mW. The full chip (analog multiplexer and ADC) therefore dissipates 45.4mW at a sampling frequency of 25MHz.

For the next ILC experiment, we choose to use only one fast ADC per chip. Each chip is composed of 64 channels and the depth of the analog memory will be sixteen. The ADC and multiplexer power consumption per chip is about 4 $\mu$W when using the power pulsing concept. This leads to an equivalent power consumption of only about 125nW per channel. These results show that the multiplexer and the ADC together account for only 0.5% of the total power consumption, which was estimated at 25$\mu$W per channel.

#### V. CONCLUSION

The design of two prototypes of a 12 bit 25MS/s pipelined ADC has been reported: the second one is used with a 3 input analog multiplexer which will be extended to 64 inputs in the future. The first chip has a very reasonable power dissipation: only 37mW. A 1.5 bit/stage architecture is used for the converter in a differential configuration. It has almost $\pm$1LSB of DNL and $\pm$4LSB of INL. This converter is a high speed version for the future International Linear Collider calorimeter detector (CALICE collaboration). The second version has been designed to improve linearity and power dissipation. A 2.5 bit first stage is used in this second chip. A 3 input analog multiplexer was also designed to make the connection between 3 channels and the fast ADC. A very efficient fast power pulsing scheme is integrated with this circuit to reduce the total DC power dissipation in accordance with the low duty cycle of the beam.

#### REFERENCES

- [1] S. H. Lewis, et al., "10-b 20-Msample/s analog-to-digital converter," *IEEE J. Solid-State Circuits*, vol. 27, pp.351-358, March 1992.
- [2] T. B. Cho and P. R. Gray, "A 10-bit, 20-MS/s, 35-mW pipeline A/D converter," in Proc. IEEE Custom Integrated Circuits Conf., May 1994, pp23.2.1-23.2.4.
- [3] B.P. Brandt and J. Lutsky, "A 75-mW, 10-b, 20-MSPS CMOS subranging ADC with 9.5 effective bit at Nyquist" IEEE Journal of Solid State Circuit, pp.1788-1795 Dec. 1999.
- [4] Byung-Moo Min, et al "A 69-mW 10 bit 80 MS/s pipelined CMOS ADC" IEEE Journal of Solid State Circuit, Vol.38, N°12 Dec. 2003.
- [5] Ryu S.-T., Ray S., Song B.-S., Cho G.-H., Bacrania K. "A 14-b Linear Capacitor Self-Trimming Pipelined ADC", IEEE J. Solid-State Circuits, vol. 39, n°11, pp. 2046-2051, November 2004
- [6] Bult K., Geelen G. J. G. M., "A Fast-Settling CMOS Op Amp for SC Circuits with 90-dB DC Gain", IEEE Journal of Solid State Circuit, Vol.25, N°6, pp. 1379-1384, Dec. 1990.

## Standalone, battery powered radiation monitors for accelerator electronics

T. Wijnands<sup>a</sup>, C. Pignard<sup>a</sup>, G. Spiezia<sup>a</sup>

<sup>a</sup> CERN, 1211 Geneva 23, Switzerland

thijs.wijnands@cern.ch

#### Abstract

A technical description of the design of a new type of radiation monitor is given. The key point in the design is the low power consumption: below 17 mW in radiation sensing mode and below 0.3 mW in standby mode. The radiation monitors can operate without any external power or signal cabling, and can measure and store radiation data for a maximum period of 800 days. To read the radiation data, a standard PC can be connected via a USB interface to the device at any time. Only a few seconds are required to read out a single monitor. This makes it possible to survey a large network of monitoring devices in a short period of time, for example during a stop of the accelerator.

#### I. INTRODUCTION

The Large Hadron Collider (LHC) is a high energy, high intensity p-p collider that uses superconducting magnets to bend the two counter rotating proton beams on a circular orbit. To operate the accelerator, a large amount of electronic equipment is needed for the powering, the vacuum, the quench protection system and the beam instrumentation. To reduce overall cabling costs and to improve the S/N ratio, electronic systems have been placed under the superconducting magnets along the 27 km long underground tunnel, where they will be exposed to particle radiation caused by the interaction of protons with material. An increase in radiation levels may damage the components and systems, which may eventually stop working correctly.

An on line radiation monitoring system was installed and commissioned in the LHC in 2007 [1]. On line monitoring provides timely information on the evolution of the radiation fields but has the drawback that it requires extensive signal and power cabling. Furthermore, once the cabling has been installed, the on line measurements have to be located at most 25 m from the cable junction box, which is a constraint if uncertainties exist on the spatial distribution of the radiation.

The standalone version presented here overcomes these problems and does not need external powering or signal cables. The monitor design is based on the same principles as the on-line version but is less complex and produced at much lower cost. In addition, the circuitry for the constant current generation, voltage sources and for the AD conversion is integrated in a USB readout interface that is not exposed to radiation. The USB interface can be connected to and powered by any (portable) PC, which facilitates measurements in the field, for example during a technical stop of the accelerator. Apart from the radiation data, the number of clock cycles between the start of the measurements (initialisation) and the moment of readout is also provided, which makes it possible to compute a time averaged dose rate and the associated particle flux. The readout software is based on LabVIEW and uses the standard firmware and software delivered with the USB interface module.

The monitor has a total of 3 batteries on board. Two of these are used to power the cyclic operations during data taking when the monitor is in radiation sensing mode. One battery is required for the long term data storage when the monitor is in standby mode. The 2 main batteries provide 8.5 Ah, which is sufficient to operate for 150-220 days, depending on the setting. When the voltage from the main batteries becomes insufficient, the monitor switches automatically from radiation sensing mode to standby mode. When switching to standby mode, the data on the neutron fluences is stored in triplicate storage registers powered by the backup battery. The corresponding elapsed time is also stored, which makes it possible to compute the time averaged neutron flux. In backup mode, an additional 500-600 days of data storage is possible. At present 4 prototypes are under evaluation and being prepared for radiation tests in the field.

#### II. HARDWARE DESIGN

#### A. Radiation Sensors

The monitor presented here measures the total integrated ionising dose in Silicon in Gray, the 1 MeV equivalent neutron fluence per cm<sup>2</sup> and the fluence of nucleons (protons, neutrons) per cm<sup>2</sup> with a threshold energy of 20 MeV.

The radiation sensors consist of 2 radiation sensing MOSFETs (RADFETS<sup>®</sup>) from Tyndall Ltd [2] of different oxide thickness, 6 SIEMENS BPW 34FS Photodiodes in series [3] and 8 x 4 Mbit of Static Random Access Memory (SRAM) TC554001AF-7L from Toshiba[4].

Radfets are radiation sensing Mosfets that have been designed to measure the total ionising dose in Silicon. When exposed to ionising radiation, electron-hole pairs are created in the gate oxide which leads to a change in the threshold voltage. The change of the threshold voltage varies approximately linearly with the dose absorbed. Radfets were chosen as radiation sensors for total dose because they do not need external power when in radiation sensing mode.

The photodiodes are used in series to enhance the overall sensitivity to low energy neutrons. When exposed to particle radiation, the carrier lifetime, the resistance and the carrier density all change, and the end result is a near linear variation of the forward voltage at constant current injection. As with Radfets, the pin diodes do not need to be powered when in radiation sensing mode. The 32 Mbit of SRAM memory is used to measure the fluence of hadrons. Single hits from ionizing particles can change the logic state of the data stored in the memory. The number of logic changes (Single Event Upsets) is approximately proportional to the hadron flux. In contrast with the other sensor types, the SRAM memory is always powered and constantly accessed with a read-compare-write cycle. The cyclic scanning of the memory and byte-by-byte comparison with a reference pattern is responsible for most of the power consumption when the monitor is in radiation sensing mode.

#### B. Operating principles

In radiation sensing mode, the radfets are connected as shown in figure 1 (left) with the gate connected to the drain and the bulk to the source. In this configuration the radfet is not using any power.



Figure 1: Schematic of a RADFET in radiation sensing mode (switch closed)

To read out the radfet, an external current source is connected and the switch is opened. The variation of the threshold voltage $V_t$ at constant current is proportional to the total accumulated dose. The forward voltage $V_t$ is measured via an external signal cable connected to a USB interface which connects to any standard desktop PC (see below).

The operating scheme for the diodes is near identical to that shown in figure 1 with the Radfet replaced by 6 diodes in series. The forward voltage  $V_t$  is measured identically.



Figure 2: Schematic of a RADFET in reader mode (switch open).

The value of the threshold voltage $V_t$ has a strong dependence on temperature, which should not be interpreted as a variation of the radiation levels. Close to the radiation sensors, a platinum-chip temperature sensor (Jumo PCA 1.1501.1M) provides a temperature measurement at the moment of readout. The temperature induced variations of the threshold voltage $V_t$ can then be corrected for in software on the PC.

The readout of the SRAM memory is shown in figure 3. The 32 Mb SRAM is organized as 8 \* 512Kbytes. Every 1.7ms, a specific location in the memory is addressed and an 8 bit word is read from that location and compared to the reference word in the static buffer. If radiation has modified the contents of the word in the memory, the comparator will increase the contents of the SEU counters by one count. After the comparison the reference word is written back to the same location of the memory.

At start up, the entire SRAM memory is initialised and the reference word is written at each address location. To refresh and initialise the entire memory, the cycle time is reduced to 385 ns for a period of 1.6 seconds by means of an external switch. This operation consumes a significant amount of power and is only used at start up.
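The two cycle times can be cross-checked against the memory size, assuming that one byte is accessed per cycle and that 512 Kbytes means 512 × 1024 bytes:

```python
n_bytes = 8 * 512 * 1024              # 8 devices of 512 Kbytes = 32 Mbit

normal_scan_s = n_bytes * 1.7e-3      # one byte every 1.7 ms
fast_scan_s = n_bytes * 385e-9        # one byte every 385 ns at start up

normal_scan_hours = normal_scan_s / 3600.0
```

A full scan at the normal rate takes just under 2 hours, in line with the memory refresh time listed in table 1, and the fast initialisation takes about 1.6 s.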



Figure 3: Schematic for the cyclic readout of the memory.

There are 3 identical data storage buffer counters to minimise the possibility of radiation induced errors in the counting. Via an external signal cable the contents of these 3 buffers can be read out using the USB interface module that is connected to a PC. In the unlikely event that the data in one of the counters is corrupted, majority voting is used.
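The read-compare-write cycle and the majority voting over the three counters can be sketched in software as follows. The monitor implements this in hardware; the code is only a behavioural illustration, and the reference pattern `0x55`, the 16 byte memory and the function names are hypothetical.

```python
from collections import Counter

REFERENCE = 0x55   # hypothetical reference pattern; the text does not give its value

def scan_step(memory, address, counters):
    """Read one byte, compare it with the reference, count an upset, rewrite it."""
    if memory[address] != REFERENCE:
        for i in range(len(counters)):
            counters[i] += 1              # three redundant copies of the SEU count
    memory[address] = REFERENCE

def voted_count(counters):
    """Majority vote over the counters; None if no two of them agree."""
    value, votes = Counter(counters).most_common(1)[0]
    return value if votes >= 2 else None

memory = bytearray([REFERENCE] * 16)
memory[3] ^= 0x04                         # simulate two single event upsets
memory[9] ^= 0x80
counters = [0, 0, 0]
for address in range(len(memory)):
    scan_step(memory, address, counters)
counters[1] = 999                         # simulate corruption of one counter
seu_count = voted_count(counters)
```

Even with one corrupted counter, the vote returns the count held by the two counters that still agree.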

Table 1: Operational data for SRAM counter.

| Task                | High Sensitivity | Low Sensitivity |
|---------------------|------------------|-----------------|
| Biasing Voltage     | 3 V              | 5 V             |
| Memory Refresh time | 2 hours          | 2 hours         |
| Power Consumption   | 12 mW            | 17 mW           |
| Max. operating time | 220 days         | 148 days        |
| Max. Data storage   | 595 days         | 523 days        |

To increase the sensitivity to neutrons, the memory can be operated at 2 different bias voltages. The lowest voltage provides the highest sensitivity and the longest operating time (table 1).

Each monitor has a unique identifier which consists of a 64 Bit ROM registration number that is factory laser written into the chip (Maxim DS2433 4 kB 1-wire EEPROM). This assures absolute identity because no two parts can be identical.

#### C. Battery power

The 2 main batteries for the monitor are Lithium Thionyl Chloride batteries (Li-SOCl<sub>2</sub>). Lithium batteries were chosen because they have been used on various space missions (Mars Pathfinder rover, Deep Impact mission, etc.), because they have the highest specific energy (up to 500 Wh/kg) and because they have been shown to tolerate a total dose of up to 6 kGy. Another advantage is the low self-discharge rate of this kind of battery.

For the monitor presented here, two SL2770/T Li-SOCl<sub>2</sub> batteries of type C from TADIRAN<sup>®</sup> are used. These batteries have a nominal voltage of 3.6 V and a capacity of 8.5 Ah. The construction is cylindrical and spirally wound (power cell type).

A first radiation test consisted of exposing these batteries to a total dose of 200 Gy of gamma rays from a $^{60}$Co source at a dose rate of 360 Gy/hr. During the exposure, the batteries were connected in series to an external load (a resistance) simulating the radiation monitor. No significant variation of the output voltage or current was observed after a total dose of 200 Gy. These results are in line with the radiation data accumulated for the Galileo space mission, where batteries of this type were exposed to 6 kGy [4]. The tolerance of Lithium Thionyl Chloride batteries to neutron radiation is still largely undocumented in the literature and needs to be investigated in forthcoming radiation tests.

A single backup battery powers the data buffer counters when the main batteries are wearing out. This is a very small Li/MnO2 button cell of type CR2477NRV-LF from RENATA with a nominal voltage of 3 V and a nominal capacity of 950 mAh. The Li/MnO2 battery is relatively cheap and provides the best practical volume/capacity ratio. It also has a very low self-discharge and excellent storage properties. A disadvantage is that the slope of the discharge voltage may vary over the battery lifetime. When the cell was exposed to ionising radiation up to 200 Gy in an experiment similar to the one described above, no noticeable variation of the output voltage was observed. Forthcoming radiation tests will have to confirm this result.

#### D. USB interface and PC software

The USB interface module contains all the functionality required to digitize the radiation data from the monitor and store it permanently on a standard PC. The interface performs digital and analogue I/O, generates the constant current required to read out the RadFETs and PIN diodes, and transfers serial data to and from the USB bus. A 12-bit ADC converts the analogue data from the temperature and radiation sensors. The module is powered from the host PC via the standard USB 2.0 interface.



Figure 4: The USB interface module which connects the monitor to any standard desktop or portable PC.

The USB interface module is a commercial serial-to-USB converter of type USB I/O 24 R from Elexol<sup>®</sup>. This device is shipped with a standard set of drivers and dynamic-link libraries for Windows. The I/O is carried out via 3 ports of 8 bits each. Port A is used to communicate the status of the monitor. This includes information on the powering mode, which can be either main battery or backup mode. Port B of the interface is used to send commands to the monitor. In total there are 19 different commands to address all the sensors and buffers on the monitor. The response of the monitor to a command on port B is presented on port C. The data status is also communicated with each response. To read out the 12-bit ADC, for example, 2 commands are needed. In the second response, the 4 remaining bits communicate the status of the monitor and the validity of the response.
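As an illustration of how a 12-bit ADC value and 4 status bits can be carried by two 8-bit responses on port C, consider the sketch below. The bit layout is an assumption made for the example (low byte first, then the upper 4 data bits in the low nibble and the status in the high nibble); the actual ordering used by the monitor is not given in the text, and `decode_adc` is a hypothetical helper.

```python
def decode_adc(first_response, second_response):
    """Combine two 8-bit port C responses into a 12-bit value and 4 status bits.

    Assumed layout (hypothetical):
      first_response  = ADC bits 7..0
      second_response = status bits in 7..4, ADC bits 11..8 in 3..0
    """
    value = ((second_response & 0x0F) << 8) | (first_response & 0xFF)
    status = (second_response >> 4) & 0x0F
    return value, status

value, status = decode_adc(0xCD, 0x3A)
print(hex(value), status)
```

Whatever the real layout, two byte-wide transfers provide 16 bits, which leaves exactly 4 bits for status once the 12 data bits are accounted for.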

A LabVIEW application has been written to perform the data reading in a cyclic manner and to display the data for analysis. To generate the final data, a series of 50 measurements is made, which are then time-averaged and archived. This mitigates the influence of noise and makes it possible to provide data below the single-bit resolution of the ADC. Such a series of measurements takes approximately 1 second, so that several devices can be measured in a relatively short period of time.
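The effect of averaging on resolution can be illustrated with a deterministic example. It assumes that noise of the order of one least significant bit makes the ADC code toggle between two adjacent values; the sample values are invented for the illustration.

```python
def averaged_reading(samples):
    """Arithmetic mean of a series of raw ADC codes."""
    return sum(samples) / len(samples)

# 50 hypothetical readings of a level lying between codes 1000 and 1001:
# the noise makes the ADC return the upper code in 20 cases out of 50.
samples = [1000] * 30 + [1001] * 20
result = averaged_reading(samples)
time_per_measurement_s = 1.0 / len(samples)   # about 1 s for the whole series

print(result, time_per_measurement_s)
```

The mean of 1000.4 lies between two ADC codes, which is the sense in which the averaged data can resolve below a single bit. The series duration of about 1 second corresponds to roughly 20 ms per individual measurement.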

#### E. Assembly and final presentation

Each monitor has 2 PCBs of 4 layers each. The motherboard provides the powering of the device, the timing and the long-term data storage. The radiation sensors and the temperature sensors are located on the upper PCB, which is plugged into the motherboard. The main batteries are located underneath the motherboard and occupy approximately 80% of the available space. The entire assembly is bolted into an aluminium casing using steel fixations (see figure 5).



Figure 5: Fully Assembled Radiation Monitor

To avoid damage during transport into the accelerator tunnel, the aluminium casing can be sealed with an aluminium lid. The device can be fixed to any support using a standard DIN rail. Only once the device is in its final position is it initialised and the on-board clock started, by pressing small push buttons.



Figure 6: Readout configuration

It is important to verify the status of the device at regular intervals to ensure that the main batteries are still operational. In addition, accumulating radiation data at arbitrary intermediate intervals can provide a more precise estimate of the fluence to dose ratios at the location of the monitor. To read out the devices in the field, small portable EEPCs are used which have sufficient autonomy to read out various devices. Figure 6 shows the configuration for reading out a monitor with the USB interface module. The module is connected with a signal cable to the monitor and with a USB 2.0 connection to the portable PC.

#### III. EXAMPLES OF APPLICATIONS

Two typical applications where these monitors can be of use are found in the experimental caverns of the LHC, where the proton beams collide and the main LHC particle detectors are located. Both applications involve measuring radiation in the vicinity of electrical equipment that is often in movement, such as the access lifts and the overhead cranes that are used to take the particle detectors apart during a shutdown period.

Another example is the measurement of integrated radiation levels at specific locations where the use of electronic equipment may be envisaged at a later point in time. It is not always possible to use Monte Carlo simulations in such a situation, either because the beam operating conditions are unknown, because the geometry of the area is too complex or simply because simulations would be too time-consuming. The data from the monitor will provide a clear engineering constraint for the radiation tolerance of this equipment.

Alternatively, when electronic equipment close to a beam line is showing erratic behaviour on a regular basis, the use of a standalone monitor can be considered to rule out the impact of radiation damage.

Finally, when electronic equipment is exposed to radiation, a monitor placed at the same location can provide information on the type of radiation damage that is caused (i.e. damage from total dose or from high or low energy neutrons).

#### IV. FUTURE WORK

One of the key points in the design is the radiation tolerance of the main and backup batteries. In particular, little has been published in the literature about the tolerance of Li batteries to neutrons. So far, no degradation has been observed when these batteries are exposed to gamma radiation from a <sup>60</sup>Co source, but a more complete characterisation in different radiation fields, and especially in neutron-dominated fields, will be needed to settle this issue.

Another important issue is the cross calibration of the radiation sensors. The sensors are identical to those used in the on-line version and have been extensively tested in dedicated radiation facilities over the last 5 years. To cross-check the data from the monitors, it is planned to equip some devices with passive dosimeters. In addition, some devices will be placed around the CERN accelerators at the same locations as ionisation chambers, which provide on-line data on the total ionising dose.

Finally, the temperature coefficients of each radiation sensor will have to be measured individually as a function of accumulated dose and neutron fluence. Only a precise knowledge of the evolution of the temperature coefficients will make it possible to determine the total ionising dose and the 1 MeV equivalent neutron fluence with high accuracy.

#### V. CONCLUSIONS

The stand-alone radiation monitor design presented here is based on the successfully operated on-line radiation monitoring system which is presently in use in the CERN accelerator complex. The devices provide a cost-effective solution to survey the evolution of the time-integrated radiation levels in terms of the total ionising dose, the 1 MeV equivalent neutron fluence and the fluence of hadrons with energies above 20 MeV. Another key point is that no external signal or power cabling is required, which makes it possible to measure at practically any location at any time when there is circulating beam in the CERN accelerators.

The readout is non-destructive, fast and can be carried out at any time with the device in situ. It is thus possible to read out a large quantity of devices during a technical stop or short shutdown of the accelerator complex. The interface with a host PC uses the standard USB 2.0 protocol and the associated LabVIEW software is easy to use.

Future radiation experiments will focus on the radiation tolerance of Lithium batteries in neutron-rich environments. Li batteries have already shown excellent performance with respect to damage from total ionising dose.

#### VI. REFERENCES

- [1] T.J. Wijnands, C. Pignard, 12<sup>th</sup> Workshop on Electronics for LHC and future Experiments, 25-29 September 2006, Valencia, Spain.
- [2] www.tyndall.ie/projects/radfets/tech.html
- [3] http://www.automation.siemens.com/semiconductor/in
- [4] M. Broussely, G. Pistoia, *Industrial applications of batteries: from cars to aerospace and energy storage*, Elsevier, Holland, 2007 ISBN-13 978-0-444-51260-6.
### On-chip Phase Locked Loop (PLL) design for clock multiplier in CMOS Monolithic Active Pixel Sensors (MAPS)

Q. Sun <sup>a,b</sup>, K. Jaaskelainen <sup>a</sup>, I. Valin <sup>a</sup>, G. Claus <sup>a</sup>, Ch. Hu-Guo <sup>a</sup>, Y. Hu <sup>a</sup>

<sup>a</sup> IPHC (Institut Pluridisciplinaire Hubert Curien), 23 rue du Loess, 67037 Strasbourg Cedex 2, France <sup>b</sup> BeiHang University, Beijing, China

Corresponding author : isabelle.valin@ires.in2p3.fr

### Abstract

In a detector system, clock distribution to sensors must be controlled at a level allowing proper synchronisation. In order to meet these requirements for the HFT (Heavy Flavor Tracker) upgrade at STAR (Solenoidal Tracker at RHIC), we propose to distribute a low-frequency clock at 10 MHz, which is multiplied to 160 MHz in each sensor by a PLL. A PLL has been designed for a period jitter below 20 ps rms and low power consumption, and has been manufactured in a 0.35 $\mu$m CMOS process.

### I. INTRODUCTION

CMOS MAPS are foreseen to equip the HFT (Heavy Flavor Tracker) of the vertex detector upgrade of the STAR (Solenoidal Tracker at RHIC) experiment at RHIC (Relativistic Heavy Ion Collider) [1], [2] (Figure 1a). In order to achieve a vertex pointing resolution of about 30 $\mu$m or better, two nearly cylindrical MAPS layers with average radii of about 2.5 cm and 8 cm will be inserted in the existing detector. These two layers will consist of 10 inner ladders and 30 outer ladders, respectively. Every ladder contains 10 sensors of ~ 2 cm x 2 cm each (Figure 1b).

The MAPS named Ultimate will integrate a large-area pixel array with column-level discriminators, a zero suppression circuit and a serial data transmission [3]. The sensor readout path requires sending data over a 6-8 m LVDS link at a clock frequency of 160 MHz. Inter-sensor data skew and clock jitter have to be controlled precisely in order to ensure synchronization.

A PLL clock multiplier, which generates the 160 MHz clock frequency from a relatively low frequency input clock at 10 MHz, will be implemented on each sensor. Using a low frequency input clock reduces the problems of electromagnetic compatibility (EMC) related to the integration density, high speed transmission and coupling with the environment. The same clock will also equip an optional 8B/10B data transmission block implemented in Ultimate.

The PLL specifications in MAPS are: a period jitter of less than a few tens of ps rms, low power consumption and a specific form factor for the layout.

A first prototype of the charge-pump PLL circuit was designed and fabricated in a 0.35 $\mu$m CMOS process. In order to reduce the PLL noise coming from the supply line, two on-chip voltage regulators are implemented to provide stable power supplies for the VCO (Voltage Controlled Oscillator) and the charge-pump blocks. This technique reduces the PLL jitter as long as the voltage regulator has a very good PSNR (Power Supply Noise Rejection) performance.

The first part of this paper presents the PLL architecture (section II) and the main building blocks (sections III and IV), whereas the second part describes the measurements (section V).



Figure 1: (a) STAR tracking upgrade, (b) Ladder with 10 MAPS sensors (~ 2×2 cm each)

### II. THE PLL ARCHITECTURE AND FEATURES

The PLL clock multiplier block diagram is presented in Figure 2.



Figure 2: Clock multiplier block diagram

The loop is composed of a phase-frequency detector, a charge pump, a loop filter, a VCO, a level shifter and a frequency divider. A Power-on Reset block generates a reset signal when the power is applied to the PLL. This reset is provided to the frequency divider and the loop filter in order to ensure that the PLL starts operating in a known state. A bias circuit provides the currents to the charge pump and the VCO.

Various noise sources within the PLL contribute to the jitter and phase noise. As shown in [4] for high-frequency systems, the effect of electronic noise on the jitter is typically much less pronounced than that of substrate and power supply noise.

In a MAPS sensor, supply and substrate noise is a major noise source. It results mainly from voltage fluctuations on the supply lines due to large current transients in the digital and mixed-signal circuitry.

The VCO makes the most significant contribution to the noise, which should be minimized by choosing an architecture that is less sensitive to supply and substrate noise, such as a differential structure. Electronic noise is also minimized in the design. In addition, a regulated supply line for the VCO has been implemented to reduce the noise originating from the supply. Moreover, stable dynamics and a stable voltage control range can be obtained as long as the regulated power supply is insensitive to process, voltage and temperature variations.

The charge pump is also sensitive to supply noise. Ripple on the power supply creates ripple on the control voltage of the VCO through the charge pump, so a stable supply line is also required for the charge pump. In order to obtain better current matching and a sufficiently wide control voltage range, the charge pump needs a relatively higher voltage headroom than the VCO.

Two voltage regulators were implemented to provide stable voltage supplies to the analogue part, as shown in Figure 3. The first regulator, with a low dropout voltage, provides the supply voltage VDDP for the charge pump. The second regulator, with high PSNR performance, generates the supply voltage VDDV for the VCO and the bias circuitry. Using two linear regulators in series allows the PSNR of the second regulator to be doubled if the two are identical. VDDD is the digital power supply, which feeds the first regulator and the other sub-blocks.
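The benefit of cascading two regulators can be illustrated in decibels, where the rejections of ideal, non-interacting stages add. The sketch below assumes two identical stages with the -30 dB rejection reported later in Table 1 and a 400 mV supply perturbation like the one used in the measurements; it ignores headroom limits and any frequency dependence, so it is an idealised estimate rather than a prediction for this circuit.

```python
def cascaded_rejection_db(*stages_db):
    """Total rejection of ideal cascaded stages: values in dB simply add."""
    return sum(stages_db)

def db_to_ratio(db):
    """Convert a voltage rejection in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)

total_db = cascaded_rejection_db(-30, -30)       # two identical stages
ripple_out_v = 0.4 * db_to_ratio(total_db)       # 400 mV perturbation at the input

print(total_db, ripple_out_v)
```

With these idealised numbers the two stages together reach -60 dB, which is the sense in which the series connection doubles the rejection expressed in dB.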



Figure 3: Power supplies distribution of PLL sub-blocks

#### III. THE VOLTAGE REGULATOR

Figure 4 shows the schematic of the regulator. It consists of the voltage reference provided by a bandgap reference circuit (not shown), the error amplifier, the pass transistor, the voltage divider R1-R2, and the load capacitor. The loop ensures that the output voltage is always at the appropriate value by modulating the gate potential of the pass transistor. The regulator topology is conceptually similar to a two-stage amplifier.



Figure 4: Schematic of the voltage regulator

Several specifications determine the performances of the regulator. For use in the charge pump PLL application, a high power supply noise rejection is required.

In the closed-loop configuration, the output ripple, noted  $Vout_R$  can be estimated by:

$$\mathsf{Vout}_{\mathsf{R}} \cong \frac{1}{\beta} \left( \mathsf{Vbg}_{\mathsf{R}} + \frac{\mathsf{VDDD}_{\mathsf{R}}}{\mathsf{PSRR}} \right)$$

Where  $\beta = R_2/(R_1+R_2)$  is the feedback factor, VDDD<sub>R</sub> and Vbg<sub>R</sub> are the ripple voltages on the power supply line and the voltage reference, respectively.
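A numerical reading of this expression is sketched below. All component values are hypothetical and chosen only to make the arithmetic easy to follow: equal divider resistors, a ripple-free bandgap reference, a 400 mV supply ripple and a PSRR of 1000 (60 dB) expressed as a linear ratio.

```python
def output_ripple(vbg_ripple, vddd_ripple, psrr, r1, r2):
    """Closed-loop output ripple: (Vbg_R + VDDD_R / PSRR) / beta.

    psrr is a linear ratio (not dB); beta = R2 / (R1 + R2) is the feedback factor.
    """
    beta = r2 / (r1 + r2)
    return (vbg_ripple + vddd_ripple / psrr) / beta

# Hypothetical values for illustration only.
ripple = output_ripple(vbg_ripple=0.0, vddd_ripple=0.4, psrr=1000.0,
                       r1=10e3, r2=10e3)
print(f"{ripple * 1e3:.2f} mV")
```

The example shows two features of the formula: the supply ripple is attenuated by the PSRR, while any ripple on the reference is passed to the output with the closed-loop gain 1/β and is not attenuated at all.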

A high PSRR (Power Supply Rejection Ratio) for the two-stage amplifier reduces the output ripple of the regulator. The regulator architecture uses a current buffer in series with the Miller compensation capacitor $C_c$ to break the forward path and compensate the zero [5]. This compensation scheme appears to be very efficient both for gain-bandwidth improvement and for high-frequency PSNR. The disadvantages of this technique are a slight increase in complexity, noise and power consumption.

Figure 5 shows the simulated PSNR for the VCO supply voltage. Table 1 summarizes the voltage regulator performance. Figure 6 presents the measured PSNR for the VCO and CP blocks.



Figure 5: Simulated PSNR of VCO supply voltage

Table 1: Regulator performance for VCO supply

| Parameter                  | Value               |
|----------------------------|---------------------|
| Voltage regulator area     | $0.15 \text{ mm}^2$ |
| Static current consumption | 780 µA              |
| Maximum output current     | 14 mA               |
| PSNR (measured)            | -30 dB              |



Figure 6: Measured PSNR for the VCO supply (red curve) and CP supply (blue curve)

### IV. PLL BUILDING BLOCKS DESCRIPTION

The detailed architecture of the PLL is shown in Figure 7.



Figure 7: PLL core

### A. Phase-Frequency Detector and Charge Pump

The phase-frequency detector uses a tri-state logic block (see Figure 7). The UP and DN signals control the charge pump. Two D flip-flops with D=1 are triggered by the two clock signals that are compared. Ideally, the three states are UP, DN and high impedance. If the reference clock leads the feedback clock, the UP state is generated, while the DN state is produced in the opposite condition. At steady state, the filter sees a high impedance. In order to avoid a dead zone around zero phase error, which would lead to increased noise, the fourth state, in which the UP and DN pulses are "high" simultaneously, is lengthened by inserting a delay in the reset path. This ensures that the switches in the charge pump can be activated even if only a tiny phase error exists between the reference clock and the feedback clock.
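The role of the reset delay can be seen in a simple behavioural model of the tri-state detector. The sketch assumes ideal flip-flops and a single pair of rising edges per comparison cycle; the function name and the time values are illustrative and not taken from the design.

```python
def pfd_pulses(t_ref, t_fb, reset_delay):
    """Widths of the UP and DN pulses for one pair of rising edges.

    Each flip-flop is set by its own clock edge; both are cleared
    reset_delay after the later of the two edges, i.e. after UP and DN
    have both gone high.
    """
    t_reset = max(t_ref, t_fb) + reset_delay
    up_width = t_reset - t_ref
    dn_width = t_reset - t_fb
    return up_width, dn_width

print(pfd_pulses(0.0, 2.0, 0.5))   # reference leads: long UP, short DN
print(pfd_pulses(1.0, 1.0, 0.5))   # zero phase error: both pulses equal the delay
```

In this model the difference between the two widths always equals the phase error, while both pulses keep a minimum width equal to the reset delay. This minimum width is what lets the charge pump switches respond to arbitrarily small phase errors; a longer delay, however, lengthens the time during which both current sources act on the control voltage, which is the trade-off mentioned in the next paragraph.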

The delay time has been optimized in order to minimize the dead zone and to limit the perturbation on the control voltage in the steady state of the PLL.

The charge pump schematic, depicted in Figure 8 (b), uses a dummy switch structure to limit the charge injection and clock feedthrough mismatch [6].



Figure 8: Charge pump schematic

The simulation result for the charge pump and phase detector, presented in Figure 9, shows that the dead zone is eliminated and that a phase difference (systematic offset) of -2.5 degrees exists between the reference and feedback clocks.



Figure 9: Simulation of charge pump phase-detector

### B. Voltage controlled oscillator

The PLL uses a 4 stage differential ring oscillator for the VCO. The design of the VCO was optimized for low noise, low common-mode sensitivity and low power dissipation.

The delay cell, shown in Figure 10 (a), contains a source-coupled pair with resistive load elements called symmetric loads [7], [8]. Their I-V characteristics are symmetric around the centre of the voltage swing. Linear, controllable resistive loads are desirable for supply noise rejection in a differential delay cell, because any non-linearity of the load converts common-mode noise into differential-mode noise. This differential-mode noise affects the cell delay and thus produces timing jitter. With symmetric loads, the first-order noise coupling terms cancel out, which reduces the jitter caused by the common-mode noise present on the supply line. The cell delay changes with VBIAS because the effective resistance of the load elements varies with VBIAS. With the power supply VDDV as the upper swing limit, the lower swing limit is defined symmetrically by VBIAS. VBIAS is generated dynamically by the replica bias circuit depicted in Figure 10 (b). A controllable tail current in the delay cells and the bias circuit is used to adjust the cell delay. The output voltage swing is kept approximately constant by varying the active resistance of the loads inversely to the change in current. The voltage-to-current converter is shown in Figure 10 (c). This circuit provides a first-order linear relationship between the oscillation frequency and the control voltage. An additional current in the converter makes the tuning of the VCO more flexible.



Figure 11 shows the simulated VCO tuning range, which extends from 60 MHz to 230 MHz.



Figure 11: VCO simulation: Freq. (MHz) versus Control Voltage
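To give a feeling for the cell delays involved, the sketch below uses the textbook first-order relation for an N-stage ring oscillator, f = 1/(2·N·t_d). This relation is not stated in the paper and is used here only as an assumption to convert the quoted frequencies into an approximate delay per stage.

```python
N_STAGES = 4  # differential ring stages, from the text

def stage_delay_s(f_osc_hz, n_stages=N_STAGES):
    """Per-stage delay from f = 1 / (2 * N * t_d), a first-order textbook relation."""
    return 1.0 / (2 * n_stages * f_osc_hz)

for f_mhz in (60, 160, 230):
    print(f"{f_mhz} MHz -> {stage_delay_s(f_mhz * 1e6) * 1e12:.0f} ps per stage")
```

Under this assumption, the nominal 160 MHz output corresponds to a delay of about 0.78 ns per stage, and the simulated tuning range to delays between roughly 0.5 ns and 2.1 ns.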

#### V. PLL MEASUREMENTS

The proposed PLL with on-chip voltage regulator has been implemented in AMS (Austria-Micro-Systems)  $0.35\mu m$  CMOS process. The PLL core area is  $0.42 \text{ mm}^2$  (1900  $\mu m x$  220  $\mu m$ ) (see photograph Figure 12). The regulator's area represents about 35% of the total area.



Figure 12: Photograph of the clock multiplier

The measured PLL locking range extends from 138 to 300 MHz at room temperature. The frequency range is shifted upwards by about 80 MHz compared with the simulation results presented in Figure 11. This may result from an overestimation of the parasitic capacitances in the VCO design. Table 2 shows that the PLL locking range is relatively stable with temperature.

Table 2: PLL locking range as function of temperature

| Temperature (°C) | Lower limit (MHz) | Upper limit (MHz) |
|------------------|-------------------|-------------------|
| 0                | 130               | 295               |
| 20               | 138               | 298               |
| 45               | 140               | 297               |

As shown in Figure 13, the PLL locking time is about 60  $\mu$ s and is in good agreement with the simulation.



Figure 13: Measured PLL locking time (the reference clock jumps from 10 MHz to 16.7 MHz.) (10  $\mu$ s / div)

Figure 14 presents the period jitter measured with a digital scope in two conditions, and Table 3 summarizes the clock jitter as a function of the reference frequency at room temperature. Table 4 shows the period jitter as a function of the frequency of the perturbation. The results show that a period jitter of 13.5 ps rms was measured with a stable 3.3 V supply voltage for a 160 MHz output clock. With a 400 mV, 10 kHz square wave on the supply voltage, the period jitter is 16.24 ps rms, a slight increase compared with the stable supply voltage.



(a) period jitter with a stable 3.3 V supply voltage.



(b) period jitter with a peak amplitude of 400 mV, 10kHz square wave on the power supply line.

Figure 14: Measured period jitter at 160 MHz PLL clock with and without noise

Table 3: Jitter measurement summary

| Reference freq. (MHz)                                         | 9    | 10   | 12   | 14   | 16   | 18   |
|---------------------------------------------------------------|------|------|------|------|------|------|
| PLL clock (MHz)                                               | 144  | 160  | 192  | 224  | 256  | 288  |
| Period jitter (ps rms)                                        | 12.8 | 13.5 | 11.6 | 13.2 | 11.7 | 12.2 |
| Period peak-peak jitter (ps)                                  | 124  | 126  | 113  | 107  | 97   | 111  |
| Cycle-to-cycle jitter (ps rms)                                | 22.7 | 22.0 | 23.1 | 20.6 | 21.5 | 21.5 |
| Cycle-to-cycle peak-peak jitter at 10<sup>-12</sup> BER (ps)  | 323  | 317  | 326  | 293  | 318  | 307  |
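The values in Table 3 can be cross-checked with a few lines of arithmetic. The sketch below verifies that every PLL clock equals 16 times its reference frequency and expresses the rms period jitter as a fraction of the clock period; the variable and function names are chosen for this illustration.

```python
MULTIPLICATION_FACTOR = 16

ref_mhz = [9, 10, 12, 14, 16, 18]
pll_mhz = [144, 160, 192, 224, 256, 288]
period_jitter_ps = [12.8, 13.5, 11.6, 13.2, 11.7, 12.2]

def jitter_fraction(freq_mhz, jitter_ps_rms):
    """rms period jitter divided by the clock period."""
    period_ps = 1e6 / freq_mhz
    return jitter_ps_rms / period_ps

consistent = all(r * MULTIPLICATION_FACTOR == p for r, p in zip(ref_mhz, pll_mhz))
for f, j in zip(pll_mhz, period_jitter_ps):
    print(f"{f} MHz: {100 * jitter_fraction(f, j):.2f} % of the period")
```

At 160 MHz the period is 6.25 ns, so the measured 13.5 ps rms corresponds to about 0.2 % of the clock period.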

Table 4: Measured period jitter at 160 MHz with a 400 mV peak amplitude square wave on the supply, for different noise frequencies

| Noise frequency (kHz)        | 0.1  | 1    | 10   | 100  | 1000 | 10000 |
|------------------------------|------|------|------|------|------|-------|
| Period jitter (ps rms)       | 18.8 | 18.5 | 16.2 | 15.5 | 15.6 | 15.3  |
| Period peak-peak jitter (ps) | 148  | 131  | 140  | 113  | 132  | 127   |

Table 5 summarizes the PLL performance.

As the PLL prototype shares its power supply with the MAPS sensor, it has not been possible to measure the PLL supply current directly. The power dissipation of the PLL has been estimated at 7 mW.

Table 5: PLL performance summary

| Parameter                     | Value               |
|-------------------------------|---------------------|
| Technology                    | 0.35 µm CMOS process |
| PLL die area                  | $0.42 \text{ mm}^2$ |
| Multiplication factor         | 16                  |
| Locking range                 | 138 MHz – 300 MHz   |
| Power supply requirement      | 3.0 – 3.6 V         |
| Power consumption (estimated) | 7 mW at 160 MHz     |
| Period jitter                 | 13.5 ps rms         |
| Period jitter with noise*     | 16.2 ps rms         |
| Locking time                  | 60 µs               |

\*a 400 mV, 10 kHz square wave applied on supply power, room temperature

### VI. CONCLUSION AND PERSPECTIVE

A PLL clock multiplier designed for CMOS MAPS has been presented in this paper. On-chip voltage regulators provide two stable power supplies to the VCO and the charge pump. Using the on-chip regulators increases the area by 35% and the power consumption by 20%. The total power consumption has been estimated at 7 mW.

Experimental results showed that, for a 160 MHz output clock, the period jitter is 13.5 ps rms and increases slightly in an emulated noisy power supply environment. With this low jitter performance, the PLL can be employed as a clock multiplier in MAPS.

In the future, the same PLL clock will also equip a serial transmitter block. In order to ensure data transmission with a low error rate, the PLL should be optimized by characterizing the transmission system with cable connections and receivers. We plan to design a new prototype to improve the jitter performance by using a programmable loop bandwidth and by optimizing the VCO.

#### VII. REFERENCES

- [1] M. Winter et al., Vertexing based on high precision, thin CMOS sensors, in Proceedings of the 8<sup>th</sup> ICATPP, Como, Italy, October 2003.
- [2] L.C. Greiner et al., STAR vertex detector upgrade development, in Proceedings of Vertex 2007, Lake Placid, NY, U.S.A., September 23-28 2007, PoS(Vertex 2007)041.
- [3] Ch. Hu-Guo et al., CMOS pixel sensor development: a fast read-out architecture with zero suppression, 2009 JINST 4 P04012, doi: 10.1088/1748-0221/4/04/P04012
- [4] F. Herzel and B. Razavi, A study of Oscillator jitter due to supply and substrate noise, IEEE Transactions on Circuits And Systems-II: Analog and digital signal processing, vol. 46, N°.1, January 1999
- [5] G.Palmisano and G.Palumbo, A compensation strategy for two-stage CMOS Opamps based on current buffer, IEEE Transactions on Circuits And Systems-I: Fundamental, theory and applications, Vol. 44, N°.3, March 1997
- [6] V.Von Kaenel et al., A 320 MHz, 1.5 mW at 1.35V CMOS PLL for Microprocessor Clock Generation, IEEE Journal of Solid-State Circuits, Vol. 31, N°.11, November 1996
- [7] J.G. Maneatis and M. A. Horowitz, Precise Delay Generation Using Coupled Oscillators, ISSCC 93 / Analog Techniques/ Paper TA 7.5
- [8] J.G. Maneatis, Low-Jitter Process-Independent DLL and PLL Based on Self-Biased Techniques, IEEE Journal of Solid-State Circuits, Vol. 31, N°.11, November 1996.

### Charge Pump Clock Generation PLL for the Data Output Block of the Upgraded ATLAS Pixel Front-End in 130 nm CMOS

A. Kruth<sup>a</sup>, G. Ahluwalia<sup>a</sup>, D. Arutinov<sup>a</sup>, M. Barbero<sup>a</sup>, M. Gronewald<sup>a</sup>, T. Hemperek<sup>a</sup>, M. Karagounis<sup>a</sup>, H. Krueger<sup>a</sup>, N. Wermes<sup>a</sup>, D. Fougeron<sup>b</sup>, M. Menouni<sup>b</sup>, R. Beccherle<sup>c</sup>, S. Dube<sup>d</sup>, D. Ellege<sup>d</sup>, M. Garcia-Sciveres<sup>d</sup>, D. Gnani<sup>d</sup>, A. Mekkaoui<sup>d</sup>, V. Gromov<sup>e</sup>, R. Kluit<sup>e</sup>, J. Schipper<sup>e</sup>.

<sup>a</sup> University of Bonn, Physics Department, Nussallee 12, 53115 Bonn, Germany
<sup>b</sup> CPPM, Aix-Marseille Universite Marseille, CNRS/IN2P3, Marseille, France
<sup>c</sup> INFN, Genova via Dodecaneso 33, IT-16146 Genova, Italy
<sup>d</sup> LBNL, 1 Cyclotron Road, Berkeley, CA 94720, USA
<sup>e</sup> NIKHEF, Science Park 105, 1098 XG Amsterdam, Netherlands

kruth@physik.uni-bonn.de

### Abstract

FE-I4 is the 130 nm ATLAS pixel IC currently under development for upgraded Large Hadron Collider (LHC) luminosities. FE-I4 is based on a low-power analog pixel array and digital architecture concepts tuned to higher hit rates [1]. An integrated Phase Locked Loop (PLL) has been developed that locally generates a clock signal for the 160 Mbit/s output data stream from the 40 MHz bunch crossing reference clock. This block is designed for low power, low area consumption and recovers quickly from loss of lock related to single-event transients in the high radiation environment of the ATLAS pixel detector. After a general introduction to the new FE-I4 pixel front-end chip, this work focuses on the FE-I4 output blocks and on a first PLL prototype test chip submitted in early 2009. The PLL is nominally operated from a 1.2 V supply and consumes 3.84 mW of DC power. Under nominal operating conditions, the control voltage settles to within 2 % of its nominal value in less than 700 ns. The nominal operating frequency for the ring-oscillator based Voltage Controlled Oscillator (VCO) is  $f_{VCO} = 640 \, \text{MHz}.$ 

The last sections deal with a fabricated demonstrator that provides the option of feeding the single-ended 80 MHz output clock of the PLL as a clock signal to a digital test logic block integrated on-chip. The digital logic consists of an eight-bit pseudo-random binary sequence generator, an eight-bit to ten-bit coder and a serializer. It processes data at a speed of 160 Mbit/s. All dynamic signals are driven off-chip by custom-made pseudo-LVDS drivers.

### I. INTRODUCTION TO THE NEW PIXEL DETECTOR FRONT-END CHIP

FE-I3 is the pixel detector front-end chip of the current ATLAS experiment at the LHC. Simulations have shown that, due to the architecture of this chip, it will suffer from various sources of inefficiency and its performance will degrade significantly with increased LHC luminosities [2]. Furthermore, the sensors of the innermost pixel layers will suffer from severe performance degradation after a few years of operation in the hostile radiation environment close to the interaction point. It is for these reasons that an international collaboration is already working on a new silicon detector front-end chip called FE-I4, suitable for LHC upgrades scheduled for 2013 or later. The first upgrade will be the Insertable B-Layer (IBL). As disassembling the present detector would require a complex engineering effort, a new layer of pixels will be inserted into the present tracker at a radius of $r \approx 3.7$ cm. A second upgrade, in about 2020, will be a full replacement of the complete tracker using four to five pixel layers between $\approx 3.7$ cm and $\approx 25$ cm together with silicon strips at larger radii. FE-I4 is meant to serve for both upgrades.

Among its new features are an increased die area  $18.8 \text{ mm} \times 20.2 \text{ mm}$  but smaller individual pixels of  $50 \,\mu\text{m} \times 250 \,\mu\text{m}$ . One front-end chip consists of  $336 \times 80$  pixels. The active area of the front-end pixel chip has been increased from 75 % to 90 %. In order to fit the clustered nature of physical hits, the new architecture groups four pixels into one digital region with a five deep buffer for local hit storage. The hit processing logic works in a way that not every hit is sent to the periphery of the chip. Instead hits are stored locally in the pixel region until the decision about the relevance of the hit is made. This reduces the traffic on the double column bus by a factor of 400.
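The geometry quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the active area is simply the pixel count times the pixel area and that the 90 % figure refers to the full die; both are interpretations for illustration, not statements taken from the text.

```python
# Cross-check of the FE-I4 geometry quoted above (numbers taken from the text).
DIE_W_MM, DIE_H_MM = 18.8, 20.2      # die dimensions in mm
N_ROWS, N_COLS = 336, 80             # pixel matrix
PIX_W_MM, PIX_H_MM = 0.050, 0.250    # 50 um x 250 um pixel, in mm

n_pixels = N_ROWS * N_COLS
pixel_area_mm2 = n_pixels * PIX_W_MM * PIX_H_MM
die_area_mm2 = DIE_W_MM * DIE_H_MM

# Assumption: active fraction = total pixel area / total die area.
active_fraction = pixel_area_mm2 / die_area_mm2

print(f"pixels: {n_pixels}")
print(f"pixel area: {pixel_area_mm2:.1f} mm^2, die area: {die_area_mm2:.1f} mm^2")
print(f"active fraction: {active_fraction:.1%}")
```

Under these assumptions the ratio comes out slightly below 90 %, which is compatible with the rounded figure quoted in the text.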

FE-I4 will be manufactured in a 130 nm standard CMOS process technology. The thin  $SiO_2$  gates of the 130 nm technology node give natural radiation hardness to the transistor devices despite high radiation levels. As a result, the use of enclosed layout transistors is no longer a hard requirement, which helps to increase the packing density.

The output stages of the FE-I4 are located in the periphery of the chip. The clock signal for the data processing at 160 Mbit/s is locally generated on-chip by a single ring-oscillator based PLL and is used in the FE-I4 data output block.

### II. PHASE LOCKED LOOP

Figure 1 depicts the block diagram of the PLL with its main building blocks: Phase Frequency Detector (PFD), Charge Pump (CP), Loop Filter (LF), differential VCO, Frequency Divider (FD) and Output Buffers (BUF). The architecture is that of a classic type II charge pump PLL. The advantage of a type II PLL over a type I PLL is that it provides better correction of the PLL output for errors at the input. Additionally, the loop gain and stability properties are set independently of each other, and the PFD of a type II PLL detects not only phase mismatch but also frequency mismatch [3].



Figure 1: Schematic block diagram of the PLL.

The nominal VCO oscillation frequency is  $f_{VCO} = 640 \text{ MHz}$. At the time the design of the PLL started, it had not been decided whether the 160 Mbit/s front-end output data would be processed at 160 MHz single edge or 80 MHz double edge. The PLL prototype can provide both clock frequencies derived from  $f_{VCO}$ . Besides, the choice of a higher frequency  $f_{VCO}$  eases the task of generating lower frequency outputs with a clean 50 % duty cycle required for double edge data processing. Furthermore, the physical dimensions of the capacitive elements required in the LF are smaller (cf. Eq. 1) and the devices consume less die area. This enables an on-chip integration of the complete LF without external components. Due to synergy with other projects, the PLL also provides higher frequency clocks at  $f_{OUT} = 320 \text{ MHz}$  and  $f_{OUT} = 640 \text{ MHz}$ . The mentioned benefits come at the price of a slightly increased power consumption for the VCO and the high frequency divider stages. The PLL will be located in the periphery of the FE-I4 chip and the increased power consumption of a single PLL on the chip is negligible compared to the overall power budget.

The loop transfer function (neglecting higher order terms) is

$$H(s) = \frac{I_{CP}K_{VCO}}{2\pi C_{notch}} \frac{1 + sR_{notch}C_{notch}}{s^2 + s\frac{I_{CP}K_{VCO}R_{notch}}{2\pi N} + \frac{I_{CP}K_{VCO}}{2\pi NC_{notch}}}$$
(1)

where  $I_{CP}$  is the charge pump current (cf. Fig. 3),  $K_{VCO}$  is the VCO gain,  $R_{notch}$  and  $C_{notch}$  are loop filter elements (cf. Fig. 4) and N = 16 is the frequency division factor of the loop.
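Comparing the denominator of Eq. 1 with the standard second-order form $s^2 + 2\zeta\omega_n s + \omega_n^2$ gives $\omega_n = \sqrt{I_{CP}K_{VCO}/(2\pi N C_{notch})}$ and $\zeta = \tfrac{1}{2}R_{notch}\sqrt{I_{CP}K_{VCO}C_{notch}/(2\pi N)}$. The following sketch evaluates these two expressions. Only N = 16 comes from the text; the component values are placeholders chosen for illustration and are not the values used in the actual design.

```python
import math

def loop_parameters(i_cp, k_vco, r_notch, c_notch, n_div):
    """Natural frequency (rad/s) and damping factor from the denominator of Eq. 1.

    k_vco is in rad/s per volt; all other arguments are in SI units.
    """
    gain = i_cp * k_vco / (2 * math.pi * n_div)        # I_CP * K_VCO / (2 * pi * N)
    omega_n = math.sqrt(gain / c_notch)                # from omega_n^2 = gain / C_notch
    zeta = 0.5 * r_notch * math.sqrt(gain * c_notch)   # from 2 * zeta * omega_n = gain * R_notch
    return omega_n, zeta

# Placeholder component values, NOT taken from the paper (only N = 16 is).
omega_n, zeta = loop_parameters(
    i_cp=10e-6,                # assumed charge pump current in A
    k_vco=2 * math.pi * 1e9,   # assumed VCO gain in rad/s/V
    r_notch=20e3,              # assumed loop filter resistor in ohm
    c_notch=8e-12,             # assumed loop filter capacitor in F
    n_div=16,
)
print(f"f_n = {omega_n / (2 * math.pi) / 1e6:.2f} MHz, zeta = {zeta:.2f}")
```

Because $\omega_n$ scales with $1/\sqrt{C_{notch}}$ and $\zeta$ with $R_{notch}\sqrt{C_{notch}}$, a smaller capacitor can be compensated by a larger resistor when the damping is to be kept constant.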

### A. Phase Frequency Detector and Loss of Lock Detection

The PFD uses a classical architecture with additional loss of lock detection circuitry (see Fig. 2). The loss of lock detection latches the DN signal of the PFD output with the rising edge of the  $f_{FB}$  signal coming from the feedback branch of the control loop, delayed by a certain time T, to produce the signal Fb2Fast. Correspondingly, it latches the UP signal with the rising edge of the  $f_{REF}$  reference clock signal, delayed by T, to produce the signal Ref2Fast. This delay time T determines the sensitivity of the loss of lock detection. A loss of lock resulting in DN = high (or UP = high) for longer than T (neglecting the propagation delay of a D-flipflop) will cause the signal Fb2Fast (or Ref2Fast, respectively) to go high, indicating severe changes in  $V_{CTRL}$ . The value for T has to be chosen large enough to prevent the loss of lock detection signals from going permanently high due to process variations.



Figure 2: Schematic of the phase frequency detector and the loss of lock detection.

### B. Charge Pump

The charge pump uses a differential architecture with a complementary dummy branch (see Fig. 3). Thus the charging and the discharging current source provide an almost constant current without switching on or off. While the main branch is controlled by the  $\overline{UP}$  and the DN signal coming from the PFD, the complementary branch is controlled by UP and  $\overline{DN}$ . The inverted signals are delayed by the propagation delay of the inverters used. The switching transistors M1 to M4 in the charge pump are minimum size devices and thus the charge injected into the loop filter upon breaking the current path is minimized. As a consequence spikes on  $V_{CTRL}$  due to charge injected from the transistor channels are reduced [4].



Figure 3: Schematic of the charge pump with its dummy branch.

### C. Loop Filter

The first branch of the LF (cf. Fig. 4) with the capacitance  $C_{pole}$  gives a low-pass characteristic to the control loop. However, the control loop is unstable with the associated frequency pole. The second branch of the LF ( $R_{notch}$ ,  $C_{notch}$ ) creates a frequency notch in order to increase the phase margin of the open-loop transfer function. As a rule of thumb,  $10 \times C_{pole}$  should be less than  $C_{notch}$  in order to ensure sufficient phase margin. The third branch of the LF ( $R_{ripple}$ ,  $C_{ripple}$ ) forms another non-dominant frequency pole that filters high frequency noise on  $V_{CTRL}$ . The characteristic frequency response of the overall control loop can still be considered a second order system. The sum of all the capacitance values in the LF is  $C_{SUM} \approx 10$  pF. All capacitors are vertical natural caps fully integrated on chip. The die area consumption of the PLL core is dominated to a large extent by these capacitor devices.



Figure 4: Schematic of the loop filter.

#### D. Differential Voltage-Controlled Oscillator

The VCO consists of three inverters connected as a ring oscillator and a fourth inverter that serves as a buffer. The inverters are differential pairs loaded with PFET active loads and cross-coupled stages for rail-to-rail hard switching behavior (see Fig. 5).



Figure 5: Schematic of a VCO inverter stage.

The differential architecture guarantees an oscillator with 50% duty cycle output. Both the differential pairs and the cross-coupled stages are fed by tail current sources. The control voltage  $V_{CTRL}$  at the output of the LF controls the tail current sources directly whereas the PMOS loads are controlled by the inverted  $V_{CTRL}$ . As a result the oscillator can be tuned over a wide frequency range and an oscillation frequency of  $f_{VCO} = 640 \text{ MHz}$  is guaranteed for  $3\sigma$  process variations without additional external tuning. The implemented VCO design is a trade-off between an extended VCO tuning range and noise sensitivity.

#### E. Frequency Dividers and Output Buffers

The FDs consist of four custom-made divide-by-two toggle flip-flops. The VCO output frequency of  $f_{VCO} = 640$  MHz is consecutively divided down to 320 MHz, 160 MHz, 80 MHz and finally to 40 MHz, equaling a total frequency division factor of N = 16.
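The divider chain described above can be written down as a short calculation. The function name `divider_chain` is introduced here for illustration only.

```python
def divider_chain(f_vco_mhz, stages):
    """Return the output frequency (MHz) after each divide-by-two stage."""
    outputs = []
    f = f_vco_mhz
    for _ in range(stages):
        f /= 2                 # one toggle flip-flop halves the frequency
        outputs.append(f)
    return outputs

taps = divider_chain(640.0, 4)     # four divide-by-two stages
n_total = 640.0 / taps[-1]         # overall division factor of the loop
print(taps, n_total)
```

The last tap is the 40 MHz signal that is compared with the bunch crossing reference clock in the PFD, and the overall factor reproduces N = 16 from Eq. 1.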

In the output buffering stages, the differential clock signals from the dividing chain are converted to single-ended clock signals. Before the clock signals are sent out of the chip, the lower frequency clock signals are all gated with the 640 MHz clock for clock alignment. It is also possible to disable the lower frequency clocks in order to save dynamic power consumption.

The periphery of the test chip includes silicon proven LVDS drivers integrated into the pads that send the dynamic signals off chip [1].

### **III. INTEGRATED DIGITAL TEST LOGIC**

The digital test logic integrated on the fabricated PLL test chip consists of an eight bit pseudo-random binary sequence generator, an eight bit to ten bit coder and a serializer. The clock signal for the test logic can either be an external clock or the 80 MHz single-ended output of the PLL core. The output data of the serializer is a 160 Mbit/s double data rate bit stream. The integration of the test logic on-chip provides a built-in self-test for the PLL output signal integrity. The implemented test logic resembles a large part of the future FE-I4 data output block.
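As a rough illustration of the first stage of this test logic, the sketch below models an eight bit pseudo-random sequence generator as a linear feedback shift register and works out the data rates. The feedback taps (8, 6, 5, 4) are an assumption chosen because they give a maximal-length sequence; the polynomial used on the chip is not stated in the text. The 8b/10b coder itself is not modelled, only its 10/8 overhead.

```python
def prbs8(seed=0x01):
    """Yield successive states of an 8-bit Fibonacci LFSR (assumed taps 8, 6, 5, 4)."""
    state = seed & 0xFF
    while True:
        yield state
        fb = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
        state = ((state << 1) | fb) & 0xFF

def sequence_period(seed=0x01):
    """Count LFSR steps until the start state is reached again."""
    gen = prbs8(seed)
    first = next(gen)
    count = 1
    for value in gen:
        if value == first:
            return count
        count += 1

CLOCK_MHZ = 80            # single-ended PLL output used by the test logic
BITS_PER_CYCLE = 2        # double data rate: one bit on each clock edge
line_rate_mbps = CLOCK_MHZ * BITS_PER_CYCLE
payload_rate_mbps = line_rate_mbps * 8 / 10   # 8 data bits per 10 line bits

print(sequence_period(), line_rate_mbps, payload_rate_mbps)
```

With a maximal-length polynomial the generator cycles through all 255 non-zero states before repeating, so every non-zero input byte is presented to the coder.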

#### **IV. SIMULATION RESULTS**

Figure 6 illustrates the settling of  $V_{CTRL}$  under  $3\sigma$  process variations.



Figure 6: Settling of  $V_{CTRL}$  with  $3\sigma$  process variations.

The simulation is based on a parasitic extraction of the PLL core with layout parasitic capacitances included. The PLL control voltage  $V_{CTRL}$  settles in less than  $t_{settle} = 1.5 \,\mu s$  in all process corners. Under nominal conditions  $V_{CTRL}$  settles in  $t_{settle} \approx 650 \,\mathrm{ns}$  to an accuracy of 2 % of its final value. In order to investigate the PLL response to single-event transients, charges of 3 pC in 1.5 ns pulses [5] have been injected into various nodes of the control loop. Figure 7 shows the settling of  $V_{CTRL}$  being interrupted by a charge injection at t = 900 ns into the very same node that controls the oscillation frequency of the VCO. Furthermore, Fig. 7 sketches the reaction of the loss of lock detection. While  $V_{CTRL}$  is rising, the VCO is oscillating too slowly. Consequently the Ref2Fast signal is high, indicating that the reference clock is too fast or, equivalently, that  $f_{VCO}$  is too low. When the charge injection takes place,  $V_{CTRL}$  increases drastically and speeds up the VCO. The Fb2Fast signal therefore changes to high, indicating that the frequency of the signal coming from the feedback branch is higher than that of the input reference clock signal.



Figure 7: PLL response to a 3 pC charge injection at t = 900 ns onto the node that holds  $V_{CTRL}$ .

From noise simulations the VCO phase noise is  $-83.3 \, dBc/Hz @ 1 \, MHz$  offset and the noise is dominated by flicker noise of the bias current sources. The phase noise can be significantly improved to  $-90.0 \, dBc/Hz @ 1 \, MHz$  offset by enlarging the area of the devices in these bias circuits. The enlargement of these devices does not affect the total die area consumption of the PLL core and will be incorporated in future designs.

#### V. MEASUREMENT RESULTS

Figure 8 shows the PCB designed for the measurements of the PLL demonstrator. The trim potentiometers on the right allow for a flexible adjustment of bias currents and voltages. The input reference clock is fed to the SMA connector at the bottom. Next to the SMA connector on the right, jumpers can be used to enable or disable the different outputs of the test chip. The connection points for the probe heads are located at the top. The demonstrator itself is bonded onto the PCB close to a custom made LVDS transceiver chip that is also bonded onto the PCB in between the SMA connector and the connectors for the probes.



Figure 8: Test PCB designed for measurements on the PLL demonstrator.

For all measurements, the input reference clock has been supplied by an Agilent 81134A pulser with an rms jitter of 2 ps according to the data sheet. The oscilloscope used in the measurements is a Tektronix TDS5104B 5 GS/s, 1 GHz scope with active differential probes of 1 GHz bandwidth. The equipment used limits the measurement accuracy for signals with frequencies higher than 160 MHz. However, it needs to be kept in mind that the lower frequency clocks are internally generated from the higher frequency clocks. Thus the encouraging results for the lower frequency clocks indicate well-functioning higher frequency clocks. As the output clock measurements are performed on the PCB, these measurements always include the performance characteristics of the LVDS drivers integrated into the output pads of the test chip.

Table 1 summarizes the results obtained for the PLL demonstrator. The results have been obtained by triggering the scope on one edge and measuring the time jitter resp. frequency jitter on the consecutive edge (cycle-to-cycle jitter) with the built-in measurement functions of the scope. The duty cycle has also been acquired with the measurement functions of the scope.

Table 1: Measurement data for the PLL operated from a 1.2 V supply.

|                          | Equipm. Test | PLL  | PLL  | PLL  | PLL  | PLL  |
|--------------------------|--------------|------|------|------|------|------|
| Frequency [MHz]          | 40           | 40   | 80   | 160  | 320  | 640  |
| Jitter pk-pk [ps]        | 44           | 82   | 74   | 94   | 70   | 106  |
| $\sigma$ -Frequency [kHz] | 6.5          | 19   | 79   | 258  | 1710 | 8100 |
| $\sigma$ -Period [ps]    | 4.1          | 12   | 12   | 11   | 17   | 20   |
| Duty Cycle Deviation [%] | x            | 0.24 | 0.33 | 0.10 | x    | x    |
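The frequency and period spreads in Table 1 are not independent: for small jitter, first-order error propagation of $f = 1/T$ gives $\sigma_f \approx f^2 \sigma_T$. The sketch below applies this relation to the tabulated values as a consistency check. It is an illustration added here, not a description of how the scope computes its statistics.

```python
# Columns of Table 1: (frequency in MHz, sigma-period in ps, sigma-frequency in kHz)
TABLE_1 = [
    (40, 4.1, 6.5),      # test equipment alone
    (40, 12, 19),
    (80, 12, 79),
    (160, 11, 258),
    (320, 17, 1710),
    (640, 20, 8100),
]

def sigma_f_khz(f_mhz, sigma_t_ps):
    """First-order propagation of f = 1/T: sigma_f = f^2 * sigma_T, returned in kHz."""
    f_hz = f_mhz * 1e6
    return f_hz ** 2 * sigma_t_ps * 1e-12 / 1e3

ratios = [sigma_f_khz(f, st) / sf for f, st, sf in TABLE_1]
for (f, st, sf), ratio in zip(TABLE_1, ratios):
    print(f"{f:4d} MHz: predicted {sigma_f_khz(f, st):8.1f} kHz, "
          f"measured {sf} kHz, ratio {ratio:.2f}")
```

The predicted and tabulated frequency spreads agree to within about 10 % for every column, which is what one would expect given that the period spreads are quoted to two significant digits.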

Figure 9 shows the eye diagram of a 160 Mbit/s data stream sent out by the digital test block. The test logic uses the chip internal single-ended 80 MHz clock output of the PLL core. The shift of the crossing points indicates a deviation of the duty cycle from the ideal 50%. The deviation is attributed to an asymmetry in the behaviour of the circuits outside the PLL core.



Figure 9: 160 Mbit/s serialized output data stream of the on-chip digital test logic using the PLL 80 MHz clock output.

The opening of the eye diagram is  $\geq 6.0$  ns on the time axis and 284 mV on the voltage axis. The reduction of signal level on the voltage axis is not related to the PLL characteristics but to signal overshoot due to off-chip impedance mismatch.
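To put the horizontal eye opening into perspective, it can be compared with the unit interval of the 160 Mbit/s stream. The short calculation below uses only the two numbers quoted in the text.

```python
BIT_RATE_MBPS = 160
EYE_OPENING_NS = 6.0                        # lower bound quoted in the text

unit_interval_ns = 1e3 / BIT_RATE_MBPS      # duration of one bit in ns
eye_fraction = EYE_OPENING_NS / unit_interval_ns
print(f"UI = {unit_interval_ns:.2f} ns, eye opening >= {eye_fraction:.0%} of the UI")
```

The eye is therefore open for at least 96 % of the bit period.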

The tracking range of the VCO is 336 MHz  $\leq f_{VCO} \leq$  976 MHz. Outside of this range the Fb2Fast or Ref2Fast signal, respectively, goes permanently *high*.

### VI. CONCLUSION

A new ATLAS Front-End chip FE-I4 is being developed in a 130 nm standard CMOS technology for use at upgraded LHC luminosities, both for the Insertable B-Layer project and Super-LHC. FE-I4 is based on a low-power analog pixel array and new digital architecture concepts. After a short introduction to the new features of the FE-I4 chip, the focus is on the output stages. In order to handle the expected hit rate, the front-end will stream data out at 160 Mbit/s. A type-II PLL has been developed to generate the necessary clock signal with a well-defined duty cycle from the available 40 MHz bunch crossing reference clock. The PLL core draws a low current of 3.2 mA from a 1.2 V supply and consumes a die area of only 255  $\mu$ m × 225  $\mu$ m. The VCO of the PLL is based on a three-stage differential ring oscillator working at a nominal frequency of 640 MHz. The design trade-offs involved with the choice of a ring oscillator in terms of area, noise and locking range are discussed. Choosing an oscillation frequency higher than the output frequency for the VCO guarantees a lower area consumption of the LF capacitors and a well-defined duty cycle handling at the expense of slightly increased power consumption for the VCO and the four-stage dividing chain. In the ATLAS experiment, the PLL will be placed in a hostile radiation environment. In case of single-event transients due to severe charge injections, a short settling time to recover from a loss of lock is important. The presented PLL recovers from any given upset in less than  $1.5 \,\mu s$ .

A stand-alone PLL test chip has been submitted for fabrication early in 2009. Among its outputs are clock signals with 80 MHz for double edge data transfer and 160 MHz for single edge data stream out at 160 Mbit/s. The differential clock output lines are driven by integrated LVDS drivers. Simulation results as well as performance measurements for this test chip are presented and discussed.

The PLL is equipped with on-chip loss-of-lock detection circuits. Furthermore, the demonstrator includes a digital block for 160 Mbit/s double data rate output streaming, consisting of an eight bit pseudo random binary sequence generator, an eight bit to ten bit coder and a serializer. The integrity of the serialized 160 Mbit/s double data rate bit stream generated by the test logic has been investigated and has been found acceptable. The first prototype of the complete FE-I4 IC is scheduled for tape out at the end of 2009.

### REFERENCES

- [1] M. Karagounis *et al.*, Development of the ATLAS FE-I4 pixel readout IC for b-layer Upgrade and Super-LHC, TWEPP'08, 2008, pp.70-75.
- [2] D. Arutinov *et al.*, Digital Architecture and Interface of the new ATLAS Pixel Front-End IC for Upgraded LHC Luminosity, IEEE Transactions on Nuclear Science, Vol.56, No.2, 2009
- [3] B. Razavi, RF Microelectronics, Prentice Hall PTR, ISBN 0-13-887571-5, 1998.
- [4] S. Cheng *et al.*, Design and Analysis of an Ultrahigh-Speed Glitch-Free Fully Differential Charge Pump With Minimum Output Current Variation and Accurate Matching, IEEE Transactions on Circuits and Systems-II: Express Briefs, Vol.53, No.9, 2006.
- [5] L. Wang *et al.*, An SEU-Tolerant Programmable Frequency Divider, Proceedings of ISQED'07.

### ATLAS Silicon Microstrip Tracker Operation

P. Vankov<sup>a</sup>

<sup>a</sup> University of Liverpool, L69 7ZE, United Kingdom

peter.vankov@cern.ch

On behalf of the ATLAS collaboration

### Abstract

The ATLAS experiment at the CERN Large Hadron Collider (LHC) started taking data last autumn with the inauguration of the LHC. The SemiConductor Tracker (SCT) is the key precision tracking device in ATLAS, made up of silicon microstrip detectors. The completed SCT has been installed inside ATLAS. Since then the detector has been operated for many months under realistic conditions. Calibration data has been taken and analysed to determine the noise performance of the system. In addition, extensive commissioning with cosmic ray events has been performed both with and without magnetic field. The current status of the SCT will be reviewed, including results from the latest data-taking, and from the detector alignment.

### I. INTRODUCTION

ATLAS (A Toroidal LHC ApparatuS) [1] is an experiment designed to explore the 14 TeV, 40 MHz proton-proton collisions at the Large Hadron Collider [2] at CERN, Geneva. The unprecedentedly high collision energy and the design luminosity of  $10^{34}$  cm<sup>-2</sup>s<sup>-1</sup> at the LHC will eventually allow discovery of possible *new physics* at the TeV scale. ATLAS will exploit the full physics potential of the LHC but will mainly focus on the discovery of the Higgs boson, Supersymmetry (SUSY) and extra dimensions. A complete study of the expected ATLAS physics discovery performance can be found in Ref. [3].

ATLAS is the largest high-energy physics experiment ever built. It is cylindrical in shape, 44 m long and 25 m in diameter, and weighs  $\sim 7000$  t. It comprises three basic subsystems: the Inner Detector, housed in a solenoid creating a magnetic field of 2 T, the Calorimetry system (hadronic and electromagnetic) and the Muon Spectrometer with its associated superconducting toroidal magnets applying a magnetic field of 0.5 T. A cut-away view of the ATLAS experiment is presented in Fig. 1; the various subdetectors are labeled.

The ATLAS Inner Detector (ID) [4] has to provide excellent momentum and vertex resolution for particles with pseudorapidity  $|\eta| \leq 2.5$ . At the same time it must cope with the high interaction rates and particle fluxes at the interaction region. For this it is designed to incorporate high granularity, radiation hardness and fast response. As shown in Fig. 2, the ID is composed of three subsystems placed in the 2 T solenoid: the Pixel detector, the SemiConductor Tracker (SCT) and the Transition Radiation Tracker (TRT). The Pixel detector (silicon pixels) forms the innermost layer of the ID, closest to the interaction point, followed by the SCT (silicon microstrips) and the TRT (arrays of gaseous straw drift-tubes). Each of the three ID systems has a central barrel section and two end-caps in the forward regions.



Figure 1: The ATLAS experiment.



Figure 2: The Inner Detector of ATLAS.

### II. THE SEMICONDUCTOR TRACKER

The ATLAS SemiConductor Tracker is built of 4088 silicon modules arranged in 4 cylindrical barrel layers and 18 end-cap discs. The pseudorapidity region covered by the SCT barrel part, consisting of 2112 modules, includes  $|\eta| < 1.1$  to 1.4, depending on the layer, whereas the end-caps, with 1976 modules in total, extend this region up to  $|\eta| < 2.5$ . The barrel innermost radius is 30 cm and the outermost radius, common to both the barrel and the end-caps, is 56 cm. Along the beam axis (the *z*-direction) the barrel extends 80 cm on either side of the collision point. The two symmetrical groups of 9 end-cap discs, labeled as end-cap *A* and end-cap *B*, are positioned along *z* between 85 cm and 272 cm. In total, the SCT integrates 61 m<sup>2</sup> of silicon micro-strip sensors with 6.3 million readout channels.

The design of the barrel and end-cap SCT modules is similar. The difference is mostly in the shape. The barrel modules are completely identical [5], whereas the end-cap ones come in 4 variations [6]. A typical SCT module, see Fig. 3, is built of 2 pairs of silicon (p-on-n) microstrip sensors, glued back-to-back at an angle of 40 mrad. There are 768 silicon strips per module side (1536 per module) at a pitch of 80  $\mu$ m for the barrel and from 57  $\mu$ m to 94  $\mu$ m for the end-cap modules. This module architecture achieves a space-point resolution of 17  $\mu$ m in the  $R\phi$  direction and 580  $\mu$ m in the z direction. A nominal bias voltage of 150 V is applied to the silicon strips. The module power consumption is 5.6 W (without irradiation).

The readout is performed by six 128-channel ABCD3TA chips [7] on each side of the module, fabricated in the radiation-hard DMILL technology. The signals processed by the chips are pre-amplified, shaped, discriminated (compared to a nominal threshold of 1 fC) and finally digitized, so that a binary output is delivered. The communication of the module with the off-detector electronics is realized through optical links. The opto-electronics used for this includes the VDC chip [8], which drives the laser diodes, and the DORIC4A chip [8], which receives the clock and command data from the light-sensitive diode.
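The channel counts quoted in this section follow directly from the module layout, as the short calculation below shows. All numbers are taken from the text.

```python
CHIPS_PER_SIDE = 6
CHANNELS_PER_CHIP = 128
SIDES_PER_MODULE = 2
BARREL_MODULES = 2112
ENDCAP_MODULES = 1976

strips_per_side = CHIPS_PER_SIDE * CHANNELS_PER_CHIP       # strips read out on one side
strips_per_module = strips_per_side * SIDES_PER_MODULE
total_modules = BARREL_MODULES + ENDCAP_MODULES
total_channels = total_modules * strips_per_module

print(strips_per_side, strips_per_module, total_modules, total_channels)
```

The result of about 6.28 million channels matches the rounded figure of 6.3 million readout channels.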

For a successful 10 years of operation in the harsh radiation environment at the LHC, the SCT modules must withstand a 1 MeV neutron equivalent fluence of  $2 \times 10^{14}$  cm<sup>-2</sup>. To limit the radiation damage effects, such as reverse annealing and leakage current, and to decrease the noise levels, the SCT detector is cooled to  $-7^{\circ}$ C. The cooling is performed by an evaporative C<sub>3</sub>F<sub>8</sub>-based system.



Figure 3: A drawing of the SCT barrel module.

### III. SCT COMMISSIONING AND CALIBRATION

The installation of the SCT detector in the ATLAS setup ran in two stages. First, the SCT barrel was inserted in the ATLAS cavern in August 2006, then in April 2007 the end-caps were added. Post-installation and commissioning tests took place after the positioning of the SCT. The electrical connections (high and low voltage, temperature readings) were tested. The optical connections (p-i-n current, light from fiber at the Readout Driver, fiber connections and module mappings) and the cooling performance were examined. Finally, the SCT barrel was signed-off in April 2007 and the end-caps in February 2008, respectively.

In March 2008 the SCT joined the ATLAS combined M6 Milestone run with most other sub-detector systems and with all trigger levels. After successful integration with the central DAQ, SCT started taking cosmic data.

In May 2008 a cooling plant failure occurred, which put the SCT out of operation. The incident affected three out of the six compressors of the ID cooling plant, which is common for the Pixels and SCT. Three months later, at the end of August 2008, the damaged compressors were replaced, and since then the cooling is functional and works without problems.

For the launch of the LHC on  $10^{\text{th}}$  of September 2008 with circulating proton beams in both directions at an injection energy of 450 GeV, the SCT was calibrated again and ready for operation. First detected beam events, see Fig. 4, were caused by splashes of the protons at a collimator close to ATLAS. For safety reasons the SCT barrel was turned off and only the end-caps were left to function at a decreased voltage (20 V) and a raised threshold (1.2 fC).



Figure 4: An LHC beam splash event from 10<sup>th</sup> of September 2008 as detected by the SCT end-caps. The number of reconstructed space-points is shown.

From October until December 2008 the SCT took part in the extensive ATLAS global cosmic run. All ATLAS subsystems were on and collected data synchronously. Different magnetic field configurations were applied (solenoid - on/off, toroids - on/off). In addition, there were also dedicated ID-only runs in which the SCT worked in conjunction solely with the Pixels and the TRT. In total more than 7 million cosmic muon tracks were accumulated in ATLAS during this period, both with magnetic field on and off. Out of these, 2 million tracks (1.15 million with solenoid switched on and 0.88 million with solenoid off) crossed and were reconstructed in the SCT. An example of a cosmic ray event traversing the SCT, Pixels and the TRT is given in Fig. 5.



Figure 5: A cosmic ray event with hits in the SCT, Pixels and the TRT.

#### **IV. PERFORMANCE**

Throughout the cosmic data-taking in October - November 2008 the SCT operated with 99.6% of its barrel and 97.8% of its end-cap modules. The main reason for the inefficiency was 2 problematic cooling loops. As of today, one of the cooling loops is completely recovered; the other one is affected by a non-accessible leak and consequently 13 end-cap modules will stay permanently non-operational.

Some of the disabled modules were down due to issues with the off-detector transmitter boards. It is believed that the problem is now understood; it was caused by electro-static discharges at the VCSEL boards (used to transmit clock and commands to the modules). All broken VCSELs have now been replaced.

### A. Efficiency

Figure 6 shows the intrinsic hit efficiency of the SCT barrel measured with magnetic field. The muon tracks were required to have 10 SCT hits, 30 TRT hits and a  $\chi^2/\text{DoF} < 2$ . On average the barrel hit efficiency is found to be 99.75%. The end-caps showed lower averaged efficiency values of ~ 99% because of improper timing with respect to the trigger.



Figure 6: SCT barrel hit efficiency (4 layers, 2 sides - inner and outer).

#### B. Noise

The noise performance of the SCT is illustrated in Fig. 7, where the distribution of the  $(\log_{10} \text{ of})$  average noise occupancy per chip of the SCT barrel, middle and outer end-caps is presented for nominal values of the threshold (1 fC) and bias voltage (150 V). The dashed line indicates the TDR noise-occupancy requirement limit of  $5 \times 10^{-4}$ . It can be seen that all the measured values are well below this limit. The inner and middle short end-cap modules are not shown since for them the average noise occupancy was below the sensitivity of the performed tests.



Figure 7: Noise occupancy averaged over chips in SCT barrel, SCT middle and outer end-caps. The specification limit of  $5 \times 10^{-4}$  is indicated with dashed line at the right-hand side of the plot.

### C. Lorentz angle

Another important quantity measured during the 2008 cosmic tests was the Lorentz angle. This is the track incidence angle leading to a minimum cluster size. Cosmic muons traverse the silicon sensors at different angles, thus allowing precise determination of the Lorentz angle. The mean cluster size measured with and without magnetic field is fitted as a function of the incidence angle and plotted in Fig. 8. The Lorentz angle (for magnetic field on) is then defined by the position of the function minimum, resulting in the value of  $3.93^{\circ} \pm 0.03^{\circ} (\text{stat.}) \pm 0.10^{\circ} (\text{syst.})$ . This value is in good agreement with the value of  $3.69^{\circ} \pm 0.19^{\circ} (\text{syst.})$  predicted by Monte-Carlo simulation. When there is no magnetic field applied, the Lorentz angle is found to be in the vicinity of 0 degrees, as expected.
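The level of agreement between measurement and simulation can be quantified by dividing the difference by the combined uncertainty. The sketch below assumes that the three quoted uncertainties are independent and can be added in quadrature; this is an assumption made here for illustration and is not stated in the text.

```python
import math

measured, stat, syst = 3.93, 0.03, 0.10    # degrees, from the 2008 cosmic data
predicted, mc_syst = 3.69, 0.19            # degrees, Monte-Carlo prediction

difference = measured - predicted
# Assumption: independent uncertainties, combined in quadrature.
combined = math.sqrt(stat ** 2 + syst ** 2 + mc_syst ** 2)
pull = difference / combined

print(f"difference = {difference:.2f} deg, "
      f"combined uncertainty = {combined:.2f} deg, pull = {pull:.1f}")
```

Under this assumption the two values differ by roughly one combined standard deviation.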



[Plot: average cluster size versus incidence angle (degrees) for the SCT barrel, ATLAS Preliminary. Legend: cosmics with B-field (run 91900), cosmics without B-field (run 92057), MC cosmics with B-field, MC cosmics without B-field.]

Figure 8: Mean cluster size versus incidence angle. Both measurements (2008 cosmic data) and Monte-Carlo predictions are plotted with and without magnetic field of 2 T.

#### V. ALIGNMENT

The SCT barrel alignment was significantly improved using the collected 2008 cosmic data. This is demonstrated in Fig. 9, where the x track residual distributions are shown before the alignment (nominal geometry), after the alignment (aligned geometry) and for a perfectly aligned (MC-simulated) geometry.

The residuals are constructed as the difference between the measured x hit position and the expected x position, as the latter is obtained via track extrapolation. From the plot it is evident that the newly aligned SCT geometry approximates closely the perfect, Monte-Carlo simulated geometry.

The alignment of the end-caps (module-to-module) was not possible because of the lower end-cap track statistics.

Figure 9: Track *x*-residuals for SCT nominal, aligned and perfect (MC) geometries.

### VI. CONCLUSION

The ATLAS SemiConductor Tracker was successfully installed, commissioned and calibrated. Extensive tests with cosmic rays carried out in the autumn of 2008 proved that the detector is in excellent condition and meets the design specifications. The encountered problems were solved in due time. The SCT is now ready for the expected LHC restart in November 2009.

#### REFERENCES

- [1] G. Aad, *et al.*, The ATLAS Experiment at the CERN Large Hadron Collider, J. Instr. 3 (2008) S08003.
- [2] L. Evans and P. Bryant, LHC Machine, J. Instr. 3 (2008) S08001.
- [3] The ATLAS Collaboration, Expected Performance of the ATLAS Experiment - Detector, Trigger and Physics, arXiv:0901.0512v4 (2009).
- [4] ATLAS Collaboration, ATLAS Inner Detector Technical Design Report, Volume 2, ATLAS TDR 5, CERN/LHCC/97-17, ISBN 92-9083-103-0.
- [5] A. Abdesselam, *et al.*, The Barrel Modules of the ATLAS Semiconductor Tracker, Nucl. Instrum. Meth. A568 (2006), 642.
- [6] A. Abdesselam, et al., The ATLAS semiconductor tracker end-cap module, Nucl. Instrum. Meth. A575 (2007), 353.
- [7] F. Campabadal, *et al.*, The Design and performance of ABCD3TA ASIC for readout of silicon strip detectors in the ATLAS semiconductor tracker, *Nucl. Instrum. Meth.* A552 (2005), 292.
- [8] D.J. White, *et al.*, Radiation hardness studies of the front end ASICs for the optical links of the ATLAS SemiConductor Tracker, *Nucl. Instrum. Meth.* A457 (2001), 369.

# The GBT-SCA, a radiation tolerant ASIC for detector control applications in SLHC experiments

A. Gabrielli<sup>a,b</sup> for the GBT project

S. Bonacini<sup>a</sup>, K. Kloukinas<sup>a</sup>, A. Marchioro<sup>a</sup>, P. Moreira<sup>a</sup>, A. Ranieri<sup>c</sup>, G. De Robertis<sup>c</sup>

<sup>a</sup> CERN EP/MIC Geneva, Switzerland <sup>b</sup> Università di Bologna and INFN Bologna, Italy <sup>c</sup> INFN Bari, Italy

alessandro.gabrielli@bo.infn.it

### Abstract

This work describes the architecture of the GigaBit Transceiver – Slow Control Adapter (GBT–SCA) ASIC, suitable for the control and monitoring applications of the embedded front-end electronics in the future SLHC experiments. The GBT–SCA is part of the GBT chipset currently under development for the SLHC detector upgrades. It is designed for radiation tolerance and it will be fabricated in a commercial 130 nm CMOS technology. The paper discusses the GBT-SCA architecture, the data transfer protocol, the ASIC interfaces, and its integration with the GBT optical link.

The GBT-SCA is one of the components of the GBT system chipset. It is proposed for the future SLHC experiments and is designed to be configurable, matching different front-end system requirements. The GBT-SCA is intended for the slow control and monitoring of the embedded front-end electronics and implements a point-to-multipoint connection between one GBT optical link ASIC and several front-end ASICs. The GBT-SCA connects to a dedicated electrical port on the GBT ASIC that provides 80 Mbps of bidirectional data traffic. If needed, more than one GBT-SCA ASIC can be connected to a GBT ASIC, thus increasing the control and monitoring capabilities of the system. The GBT-SCA ASIC features several I/O ports to interface with the embedded front-end ASICs: 16 I2C buses, one JTAG controller port, four 8-bit-wide parallel ports, a memory bus controller and an ADC to monitor up to 8 external analog signals. All these ports are accessible from the counting room electronics via the GBT optical link system. Special design techniques are being employed to protect the operation of the GBT-SCA against radiation-induced Single-Event Upsets to a level that is compatible with the SLHC experiments.

The paper will present the overall architecture of the GBT-SCA ASIC describing in detail the design of the peripheral controllers for the individual I/O ports, the network controller that implements the connectivity with the GBT ASIC and will discuss the operation modes and the flow of information between the control electronics and the embedded front end ASICs.

### I. INTRODUCTION

The Gigabit Bidirectional Trigger and Slow Control Adapter (GBT-SCA) is a special purpose integrated circuit built in a standard 130 nm CMOS technology. It is used to implement a dedicated control link system for the control and monitoring of the embedded front-end electronics of a High Energy Physics experiment.

To put this GBT-SCA in the context where it will be used, a brief explanation of GBT system is provided in the next section.

### A. Overview of the GBT System

Typical High Energy Physics systems are today composed of three subsystems each of which traditionally implements its transmission system from the control room to the electronics located in the detectors. Figure 1 shows this. The subsystems are:

- a fast timing distribution system responsible for delivering the system clock and the fast trigger signals to the experiment, and sometimes some fast signals from the detector to the control room;
- a data acquisition bus carrying the collected data out of the detector into the control room;
- a slow control system carrying bidirectional traffic from and to the control room and the embedded electronics in the detectors.

The GBT project aims at providing a common bidirectional system carrying all three types of traffic mentioned above. This is achieved by sharing a common medium, which in the GBT system is expected to be a pair of unidirectional optical fibers, each one with a capacity of about 4.8 Gbit/s. An appropriate bandwidth is allocated to each of the three tasks in the GBT system.

The slow control part is one of the subsystems served by the GBT. The GBT is totally transparent to the slow control protocol. The GBT encodes slow control information in the counting room, carries it alongside the other traffic on the optical fibers, and delivers the information unmodified to the GBT-SCA in the embedded system. A block diagram of the GBT system is shown in Figure 2. The GBT system consists physically of a dedicated ASIC called GBT13 in the embedded electronics and of an FPGA containing several GBT channels in the counting room. The GBT-SCA is connected physically to the GBT13, which implements the long-haul transmission medium for it.

Figure 1: The link

As the GBT system is based on a point-to-point architecture, the slow control system consists essentially of a local area network using a point-to-point topology. The bandwidth allocated by the GBT system to the slow control function is 80 Mbit/s.
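As a rough order-of-magnitude check, and taking the quoted figures at face value (a fibre capacity of about 4.8 Gbit/s and a slow control allocation of 80 Mbit/s), the share of the link reserved for slow control is approximately:

$$\frac{80\ \text{Mbit/s}}{4.8\ \text{Gbit/s}} = \frac{80}{4800} = \frac{1}{60} \approx 1.7\,\%$$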

### B. Overview of the GBT-SCA Architecture

The communication architecture used by the GBT-SCA is based on two layers. The first layer connects the GBT to the GBT-SCAs; the protocol on this layer is message based and is implemented in a way similar to standard computer LAN networks. The second layer connects the GBT-SCA itself to other chips in the system.

The first layer is unified and common to all GBT-SCAs, and is based on a LAN architecture transporting data packets, to and from the GBT and channel controllers. The second layer is specific to the channel.

The GBT-SCA contains the following blocks as shown in Fig. 2. On the GBT side:

- One MAC Controller;
- One Network Controller (NC). The GBT-SCA control itself is seen as a special channel capable, for instance, of reporting the status of the other GBT-SCA channels;
- One SCA Monitor used to control not only the SCA logic itself, but also external front-end alarm signals;
- One Arbiter based on the round-robin technique, which enables the user ports, the monitors or the NC, one at a time, to send data backwards towards the GBT in reply to previous requests.

On the user side there are 24 I/O ports, one of which handles 8 analog inputs:

- 16 I2C master controllers;
- 1 JTAG master controller;
- 1 controller called Detector Control Unit (DCU) that includes an ADC and is used to monitor up to 8 analog signals in the front-end electronic systems;
- 4 I/O like parallel bus controllers such as the ones used in the Motorola PIA etc;
- 1 memory-like bus controller to access devices such as static memories, A/D converters etc.;
- 1 serial 1-wire bus to access simple devices such as temperature sensors and EEPROMs.

Figure 2: SCA blocks

All the blocks are synchronous with an external "clock" and have a synchronous "reset". In particular, each block of the system can be forced to return to a default state by executing a specific reset command. In addition, a hardware reset, implemented as a further asynchronous "reset", can reset the whole chip.

#### II. THE PROTOCOL

This architecture assumes that control is performed by sending data packets (messages) to the respective channels, which interpret the messages as commands, execute them on their external interfaces (for example a read or write operation on a memory bus) and return a status reply to the GBT via another message. The commands can be addressed either to registers located within the channel ports (configuration registers) or to devices located in the far front end. In the latter case the command interpretation and execution are delegated to the front-end electronics.

This protocol assumes that the remote devices controlled by the GBT-SCAs are seen from the GBT as remote independent channels, each one with a particular set of control registers and/or allocated memory locations. The channels operate independently of each other, so they can perform transfers to their end devices concurrently. The high-level network layer, being a local area network-like protocol, is controlled by software running on an appropriate microprocessor through the GBT link.

To decouple the operation of the channels from that of the GBT link, the architecture assumes that all operations on the channels are asynchronous and do not demand an immediate response. This means that all commands carried by the GBT link in the form of network messages are posted to the channel interfaces. This is easy to implement for write operations, which are simply posted to the channels. For read operations a read request is sent to the channel; the channel performs the operation on its interface and returns a request for attention to the Arbiter. The Arbiter then allows the channels to be activated one at a time through transactions opened by the Network Controller. These transactions send data backwards to the GBT, including the same transaction identifier that was previously used for the corresponding upwards read command. All upwards packets are acknowledged via either status or data words depending on the command type. Read commands send data backwards, which are auto-acknowledged; write commands send just the status of the channel as a backward reply.

Figure 3: The SCA packet

Broadcast operations to different GBT-SCA adapters are not supported, as the connection is point-to-point. Only write broadcasts to internal channels are supported. For example, a broadcast operation to several I2C ports proceeds as follows:

- a broadcast message is sent to all I2C channels in a given GBT-SCA,
- the I2C channels execute the command concurrently but do not necessarily complete it at the same time,
- if no error occurs, no acknowledgment is sent back.

I2C channels with errors report their status conditions by sending error report messages back to the command originator.
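The posted-command scheme described in this section relies on the reply carrying the same transaction identifier as the original command. A minimal sketch of how controlling software might keep track of outstanding commands is given below. The class and method names are hypothetical and are not part of the GBT-SCA design; the sketch assumes a one-byte wrap-around identifier and skips the codes 0x00 and 0xFF, which the packet description reserves for alarm packets.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PendingTable:
    """Hypothetical bookkeeping of posted commands awaiting a reply."""

    next_id: int = 1
    pending: Dict[int, str] = field(default_factory=dict)

    def post(self, description: str) -> int:
        """Assign a wrap-around transaction ID and remember the command."""
        # IDs cycle through 0x01..0xFE; 0x00 and 0xFF are left for alarms.
        for _ in range(254):
            tr = self.next_id
            self.next_id = 1 if self.next_id >= 0xFE else self.next_id + 1
            if tr not in self.pending:
                self.pending[tr] = description
                return tr
        raise RuntimeError("no free transaction identifier")

    def complete(self, tr: int) -> Optional[str]:
        """Match a reply to its command; return None for an unknown ID."""
        return self.pending.pop(tr, None)


table = PendingTable()
read_id = table.post("read register on an I2C channel")
write_id = table.post("write to the JTAG channel")
# Replies may arrive in any order because the channels work concurrently.
print(table.complete(write_id))
print(table.complete(read_id))
```

Because replies are matched by identifier rather than by arrival order, the software does not need to wait for one channel to finish before posting a command to another.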

The MAC Controller is addressed via the GBT and, if it recognizes a packet whose destination is one of the peripherals of the GBT-SCA, it routes the packet to the NC. In particular, all the packets from the GBT follow a specific protocol and, through the NC, they are routed to the peripheral (I/O interface) to which they are addressed.

### III. THE SCA BLOCKS

The MAC Controller provides two channels, both of which follow the Atlantic Interface protocol. These Atlantic Interface channels provide a 1-byte address field and a 2-byte data field. The address specifies the user port to be addressed. The Network Controller can be addressed in this way to access its internal configuration registers. The MAC Controller is at the same time a master and a slave device operating according to the Atlantic Interface protocol. It is a master when it sends data to the NC and a slave when it receives data from the NC.

The Network Controller (NC) routes the data coming from the MAC Controller to the addressed user port. Additionally, when a user port needs to send data backwards to the GBT, the NC opens a transaction after being allowed to do so by the Arbiter. The transaction closes correctly when it is acknowledged by the user port. The NC is a master and a slave device with respect to the MAC Controller, while it is always a master with respect to the user ports. The NC must request an enable from the Arbiter before opening a transaction through the WishBone bus.

The Arbiter is responsible for enabling the transactions on the WishBone bus: it allows the blocks, one at a time, to occupy the bus. If the block is the NC, the transaction is one required by the GBT, while in all other cases the user ports require attention through request signals to the Arbiter. In any case the transactions are master-to-slave, from the NC to the user ports: as soon as possible, the Arbiter allows the port to talk and the NC opens the transaction. Eventually the data are sent backwards towards the GBT, for example in reply to previous requests.

Thus, at any time, each of the user ports can assert its request signal to indicate that it has data to be delivered backwards to the GBT, for example as a consequence of a "read" command. If several ports assert their requests concurrently, the Arbiter provides a quasi-random priority to the requests, based upon the well-known round-robin technique. Each request, when served by the Arbiter via an "enable" signal, forces the NC to open a transaction to the requesting port as soon as possible. Then the data can flow backwards, and the user port keeps the backward bus busy until completion. If the user port never releases the bus, the Arbiter terminates the transaction after a predefined timeout. The Arbiter is a self-standing device and does not follow the WishBone bus protocol. The "request" and "enable" handshaking signals used by the user ports are outside the WishBone standard. On the other hand, these signals make it possible to keep a single-master, multi-slave architecture for the WishBone bus.

The SCA Monitor is used to monitor not only the SCA logic itself, but also external front-end alarm signals. It is therefore divided into an Internal part and an external Alarm part. It continuously monitors the state of the internal machines through counters. Normally these counters are reset via the NC but, in case of failure, they can reach a given count limit that corresponds to a specific programmable timeout. When this happens, the Monitor can carry out a specific task such as an auto-reset of the GBT-SCA or of the channel. A similar structure can be applied to monitor external alarm signals and, after a timeout has been reached, further tasks may be activated. In this way the Monitor can make decisions autonomously to handle faulty or abnormal system behaviour. This is particularly useful, for example, for switching off an external power supply whenever specific conditions occur. This feature can be seen as if the Network Controller continuously polled external interrupts and served them immediately, like high-priority peripherals, whenever they required attention. This structure allows alarm signals to be implemented through the WishBone bus architecture: the Monitor is operated and configured through the WishBone bus, and it also contains internal registers to program the timeouts.

The network consists of only two devices, the GBT and one embedded GBT-SCA, thus resembling a point-to-point network. Only the GBT is allowed to open a transaction to the GBT-SCA, by sending a command in the Data Packet format. If nothing needs to be sent to the GBT-SCA, the GBT to GBT-SCA line is not used except for the clock signal, which must be sustained in any case to keep the GBT-SCA internally synchronized.
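The Arbiter behaviour described in this section (one port enabled at a time, round-robin order among concurrent requests, and a forced release after a timeout) can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the class name, the tick-based timing and the timeout value are hypothetical, and the real Arbiter is a hardware block whose exact signalling is not specified here.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Behavioural sketch of a round-robin arbiter with a bus timeout."""

    def __init__(self, n_ports: int, timeout_ticks: int) -> None:
        self.n_ports = n_ports
        self.timeout_ticks = timeout_ticks
        self.last_granted = n_ports - 1  # so that port 0 is first in line
        self.owner: Optional[int] = None
        self.ticks_held = 0

    def tick(self, requests: Sequence[bool]) -> Optional[int]:
        """Advance one clock tick and return the enabled port, if any."""
        if self.owner is not None:
            self.ticks_held += 1
            released = not requests[self.owner]
            # The bus is freed on release or, failing that, on timeout.
            if released or self.ticks_held >= self.timeout_ticks:
                self.last_granted = self.owner
                self.owner = None
        if self.owner is None:
            # Search starts just after the last served port, so every
            # requesting port is eventually enabled.
            for offset in range(1, self.n_ports + 1):
                port = (self.last_granted + offset) % self.n_ports
                if requests[port]:
                    self.owner = port
                    self.ticks_held = 0
                    break
        return self.owner


arbiter = RoundRobinArbiter(n_ports=4, timeout_ticks=3)
for step in range(6):
    # Ports 0 and 2 keep requesting; port 2 never releases the bus.
    print(step, arbiter.tick([True, False, True, False]))
```

In this model a port that hits the timeout loses the bus to the next requester in line, which mirrors the statement that the Arbiter terminates the transaction in any case.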

This is a variable-length protocol with a granularity of 1 byte. The MAC might require several cycles to pass an entire packet to the NC. This depends on the LEN field which, as specified below, makes the packet length variable from a minimum of 5 bytes to a maximum of 260 bytes.

The WishBone bus standard specifies Single Read, Single Write, Block Read and Block Write commands. Figure 3 shows how each command is carried by one packet (one command per packet), which contains the following fields:

- 1 mandatory byte for the channel number (CH#),
- 1 mandatory byte for the transaction identifier (TR#),
- 1 mandatory byte for the command type (CMD),
- 1 mandatory byte for the length of the packet (LEN),
- up to 255 optional bytes (DATA) as data field.

In particular:

- CH# specifies the SCA internal port to be addressed (I2C, JTAG, NC, Monitor, etc.),
- TR# is a wrap-around byte that identifies the packet. It is reported in the backwards reply packet sent in answer to a previous packet delivered from the GBT to the SCA or to the front-end. This field uses the two dedicated codes 0x00 and 0xFF for internal and external alarm packets,
- CMD is a command code that specifies a given transaction. The operation can refer to a specific internal register of the channel (i.e. a configuration register) or to a front-end destination address. In this case an address field follows the command,
- LEN is a field ranging from 0 to 255 that specifies the DATA field length. For read commands LEN is 0,
- DATA is an optional variable-length field whose size is given by LEN.
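The field list above can be turned into a small encoder and decoder to make the layout concrete. The sketch below is an assumption-laden illustration, not the chip's implementation: it places the fields on the wire in the order in which they are listed (CH#, TR#, CMD, LEN, DATA), the class name and the example command code are hypothetical, and it models only the four listed header bytes plus DATA, so it does not attempt to reproduce the 5 to 260 byte total quoted earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaPacket:
    """Hypothetical in-memory view of one SCA packet."""

    channel: int       # CH#: addressed internal port
    transaction: int   # TR#: wrap-around identifier
    command: int       # CMD: command code
    data: bytes = b""  # DATA: 0 to 255 bytes, length carried in LEN

    def encode(self) -> bytes:
        """Serialise as CH#, TR#, CMD, LEN followed by DATA."""
        for name, value in (
            ("channel", self.channel),
            ("transaction", self.transaction),
            ("command", self.command),
        ):
            if not 0 <= value <= 0xFF:
                raise ValueError(f"{name} must fit in one byte")
        if len(self.data) > 255:
            raise ValueError("DATA is limited to 255 bytes")
        header = bytes(
            [self.channel, self.transaction, self.command, len(self.data)]
        )
        return header + self.data

    @classmethod
    def decode(cls, raw: bytes) -> "ScaPacket":
        """Parse a byte string and check LEN against the DATA size."""
        if len(raw) < 4:
            raise ValueError("packet shorter than the four mandatory bytes")
        channel, transaction, command, length = raw[:4]
        if len(raw) != 4 + length:
            raise ValueError("LEN does not match the DATA field size")
        return cls(channel, transaction, command, bytes(raw[4:]))


# 0x02 is an invented command code used purely for illustration.
packet = ScaPacket(channel=3, transaction=0x21, command=0x02, data=b"\xaa\xbb")
raw = packet.encode()
print(raw.hex())
print(ScaPacket.decode(raw) == packet)
```

Checking LEN against the received size on decode is a design choice of the sketch; it gives the receiver a cheap way to reject truncated packets in a variable-length protocol.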

### IV. CONCLUSION

The GBT and SCA project aims at proposing a high-speed general-purpose optical link for the data acquisition chains of the front-end electronics for SLHC experiments and beyond [4]. For this reason many standard user ports have been proposed along with a Link protocol.

The project is justified because embedded applications in modern large high-energy physics experiments require particular care to ensure the lowest possible power consumption and radiation tolerance, while still offering the high reliability demanded by very large particle detectors.

Within the project, the SCA chip will carry out the slow-control operations for the front-end electronics. In addition, as the SCA will be located in a radiation environment, it will include a robust design to withstand single-event effects (SEE).

The SCA will interface with the front-end electronics via common ports such as JTAG, I2C, parallel and 1-wire and, with the GBT, via a Link port.

#### V. REFERENCES

[1] P. Moreira, T. Toifl, A. Kluge, G. Cervelli, F. Faccio, A. Marchioro, J. Christiansen, "G-link and gigabit Link compliant serializer for LHC data transmission", Nucl. Sci. Symp. Conf. Record, 2, (2000), pp. 96-99, doi:10.1109/NSSMIC.2000.949860

[2] M. Rahman, "Super-radiation hard particle tracking at the CERN SLHC", IEEE Trans. Nucl. Sci., 50/6, (2003), pp. 1797-1804, doi:10.1109/TNS.2003.820769

[3] H. F.-W. Sadrozinski, A. Seiden, "Tracking detectors for the sLHC, the LHC upgrade", Nucl. Instr. Meth. A, 541, (2005), pp. 434-440, doi:10.1016/j.nima.2005.01.086

[4] A. Gabrielli, F. Loddo, A. Ranieri, G. De Robertis, "Architecture of a general purpose embedded Slow-Control-Adapter ASIC for future high-energy physics experiments", Nucl. Instr. Meth. A, 596, (2008), pp. 113-116, doi:10.1016/j.nima.2008.07.060

### A facility and a web application for real-time monitoring of the TTC backbone status

P. Jurga<sup>a</sup>, S. Baron<sup>a</sup>, M. Joos<sup>a</sup>

<sup>a</sup> CERN, 1211 Geneva 23, Switzerland

Piotr.Jurga@cern.ch

### Abstract

The Timing Trigger and Control (TTC) system distributes timing signals from the LHC Radio Frequency (RF) source to the four experiments (ATLAS, ALICE, CMS and LHCb). A copy of these signals is also transmitted to a monitoring system, installed in the CERN Control Centre, which provides continuous measurement of selected parameters. A web application has been designed to ensure real time remote monitoring and post-mortem analysis of these data. The implemented system is aimed at providing a tool for a fast detection of TTC signal abnormality and unavailability which results in reliability improvement of the whole TTC dependent infrastructure.

The paper discusses the architecture of the monitoring system including measurement setup as well as various concerns of data acquisition, storage and visualization.

#### I. LHC TIMING

The timing and synchronization of the LHC experiments is directly extracted from the timing signals used by the Radio Frequency (RF) system to capture and control the beams circulating in the accelerator. The TTC system, in charge of the distribution of these signals, is thus a key element for a successful operation of the experiments, from front-end modules to data acquisition [1,2].

### A. TTC backbone

The main source of the timing signal is strictly related to the location of the RF equipment. As the superconducting RF cavities of the LHC are located at one place only (POINT4 – Echenevex) [3], the signals are not distributed across the tunnel. Instead, they are transmitted through the optical fibre backbone shown in figure 1 [4].



Figure 1: TTC backbone

Once generated at POINT4, the signals are transmitted to the CERN Control Centre (CCC) at the Prevessin site and from there to the experiments (ALICE, ATLAS and LHCb). The only exception is CMS: as it is very close to POINT4, it receives the signals directly through the tunnel.

#### B. TTC signals

The TTC timing signals consist of three bunch clocks (BC1, BC2 and BCREF) and two orbit signals (ORB1 and ORB2).

| Signals         | Approximate frequency |
|-----------------|-----------------------|
| BC1, BC2, BCREF | ~40 MHz               |
| ORB1, ORB2      | ~11.24 kHz            |

Figure 2: RF-TTC signals

For each ring the Bunch Clock is a square wave at the RF frequency divided by 10. Its rising edge has a fixed delay with respect to bunch passage. This delay is reproducible from run to run. Each BC is always locked to its related beam.

For each ring the Orbit is a sequence of 5 ns long pulses at the Revolution Frequency. The delay of each Orbit versus its corresponding Bunch Clock is also reproducible from run to run. The Orbit signal is always locked to the revolution frequency of its related beam [5].

The parameters monitored by the system that we have implemented are extracted from these five timing signals.

#### II. TTCPAGE1 - MAIN OBJECTIVES

As the reliability of the distribution of the LHC timing signals to the experiments is of great importance, there has been a need for a global monitoring system with an accurate real time and post-mortem analysis facility. The designed system is called TTCpage1 and gathers qualitative data describing the status of the timing signals all over the accelerator and makes them available anytime to the TTC support team and the experiments.

It also allows us to ensure that the hardware responsible for the transmission of these signals is behaving as expected.

#### **III. MONITORED PARAMETERS**

To ensure a proper operation of the TTC system it is important that the signals are monitored all their way from the place where they get generated down to all of the experiments. The status of each receiver and transmitter has to be taken into account in order to present a complete backbone picture.

On the other hand, the TTC distribution is based on a passive fibre network, hence the signals received at the CCR (Control Centre Rack Zone) are a copy of the signals received by the other experiments. Performing measurements like cycle-to-cycle jitter or skew jitter at the CCR gives a valuable indication to the experiments on the quality of the signals at other reception points.

| Status of TTC backbone<br>receivers and transmitters<br>(published via DIP) | Jitter measurements of the<br>signals<br>(performed by the "jitterScope") | High precision<br>frequency<br>measurements | Continuous track of<br>registers of RF2TTC and<br>RFRx VME Modules in<br>the TTC support rack | Signal phase shift versus<br>temperature (performed<br>by the "driftScope") |
|---|---|---|---|---|
| Transmitter optical power value at: POINT4, CCR<br>Receiver average frequency value at: CCR, ALICE, ATLAS, CMS, LHCb | *Skew jitter*: BC1 vs BCREF, BC2 vs BCREF, BC1 vs ORB1, BCREF vs ORB1<br>*Period jitter*: ORB1<br>*Cycle to cycle jitter*: BC1, BC2, BCREF | Frequency value: BC1, BC2, BCREF<br>absolute precision up to 1 Hz, all synchronized with 10 MHz GPS (GMT) | Locking status of QPLLs from RF2TTC<br>Beam mode from BST<br>Frequency average at the RFRx receiver<br>ORB2 period in BC counts | ORB1 roundtrip delay from CCR to ATLAS versus outside temperature value (sensor values from the DIP) |

Table 1: Summary of the monitored parameters

Table 1 presents a summary of the monitored parameters together with the related measurement devices. They are presented in more detail in the following subsections.

#### A. TTC backbone global status - receivers and transmitters

Two pairs of transmitters and six pairs of receivers installed along the TTC backbone are being monitored. The monitoring of these modules has been simplified to general status verification.

Table 2: TTC backbone transmitters

| Transmitter name | Location                                                                              |
|------------------|---------------------------------------------------------------------------------------|
| SR4TX            | POINT4 – Echenevex, main source                                                       |
| RFCCRTX          | CCR (Prevessin), outgoing signal (to ALICE, ATLAS, LHCb and TTC support crate in CCR) |

| Receiver name | Location                                                  |
|---------------|-----------------------------------------------------------|
| RFCCRRX       | CCR, incoming signal from POINT4                          |
| CCR           | Signal received by TTC support crate - monitoring system  |
| ALICERX       | ALICE experiment                                          |
| ATLASRX       | ATLAS experiment                                          |
| CMSRX         | CMS experiment                                            |
| LHCBRX        | LHCb experiment                                           |

Table 3: TTC backbone receivers

A pair of receivers consists of two RF\_RX\_D VME modules [6]. The status of each input channel is determined by an internal frequency counting register. The values read out are not very accurate; however, they are very useful for indicating whether the signal frequency lies within an accepted range. The ranges have been defined as 40.056 - 40.114 MHz for the bunch clocks and 11.245 - 11.246 kHz for the orbits.
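The status decision described above amounts to a simple window comparison. A minimal sketch follows, using the two accepted ranges quoted in the text; the function name, the dictionary layout and the use of hertz as the unit are assumptions made for illustration and do not describe the actual TTCpage1 code.

```python
from typing import Dict, Tuple

# Accepted ranges from the text, expressed in Hz.
BC_RANGE_HZ: Tuple[float, float] = (40.056e6, 40.114e6)
ORBIT_RANGE_HZ: Tuple[float, float] = (11.245e3, 11.246e3)

# Hypothetical mapping from signal name to its accepted range.
ACCEPTED_RANGES: Dict[str, Tuple[float, float]] = {
    "BC1": BC_RANGE_HZ,
    "BC2": BC_RANGE_HZ,
    "BCREF": BC_RANGE_HZ,
    "ORB1": ORBIT_RANGE_HZ,
    "ORB2": ORBIT_RANGE_HZ,
}


def channel_ok(signal: str, frequency_hz: float) -> bool:
    """Return True if the counted frequency lies in the accepted range."""
    low, high = ACCEPTED_RANGES[signal]
    return low <= frequency_hz <= high


print(channel_ok("BC1", 40.079e6))   # inside the bunch clock window
print(channel_ok("ORB1", 11.240e3))  # below the orbit window
```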

Three frequency meters, described in more detail in the next sections, perform the task of tracking the frequencies with higher precision. The RF-TTC backbone transmitters in use (RF\_TX\_D) [7] have a different internal register structure from the receiver modules. In this case the frequency values of the transmitted signals are not calculated. However, the optical power emitted by each channel is stored in registers and monitored. As in the previous case, the values are very useful for general status validation, but not for measurements of signal quality.

The values of the parameters extracted from the registers described above are read out every 10 seconds and stored in a database.

It has to be mentioned that for the receivers the internal registers are updated with some delay with respect to the events occurring on monitored signals. This behaviour is caused by the frequency counting method based on statistics. The delay can be up to around 2 seconds for the Bunch Clock frequency and to 30 seconds for the orbits.

A full picture of the state of all these transmitters and receivers is very useful to get a first overview of the status of the full distribution network. It does not, however, provide any information on the quality of the received signals. This task is performed by accurate frequency meters and the RF2TTC module housed in the TTC support rack in the CCR, which are described in the following sections.

#### B. TTC support rack in the CCR

In addition to the global TTC status monitoring described in the previous subsection, the great majority of measurements are performed by devices connected to the TTC support crate in the CCR (figure 3).



Figure 3: Measurement equipment (VME crate with frequency meters, VMEbus controller and slave modules, two oscilloscopes, local network switch and Server PC)

#### 1) Jitter measurements – "jitterScope"

A high-end oscilloscope has been installed to provide continuous measurement of the TTC signal jitter. The eight parameters listed in table 4 have been chosen for continuous monitoring.

Table 4: "jitterScope" measurement parameters

| Group | Parameter                   |  |
|-------|-----------------------------|--|
| P1    | BC1 cycle to cycle jitter   |  |
| P2    | BC2 cycle to cycle jitter   |  |
| P3    | BCREF cycle to cycle jitter |  |
| P4    | BC2 vs BCREF skew jitter    |  |
| P5    | BC1 vs BCREF skew jitter    |  |
| P6    | BC1 vs ORB1 skew jitter     |  |
| P7    | BCREF vs ORB1 skew jitter   |  |
| P8    | ORB1 period jitter          |  |

Figure 4 presents the measurement algorithm being used.



Every 10 seconds a new measurement starts and data are collected for 8 seconds. Once the 8 seconds have passed, the statistical values such as average, peak-to-peak and standard deviation are collected and sent to the database. The statistics are then cleared and the next measurement starts at t = 10 s.
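The three statistics reported for each 8-second window can be illustrated with a short sketch. In the system described here the statistics are gathered by the oscilloscope; the function below is a hypothetical stand-in that only shows what is reported per window. Whether the instrument uses the population or the sample standard deviation is not stated in the text, so the population form is an assumption.

```python
import statistics
from typing import Dict, Sequence

MEASUREMENT_PERIOD_S = 10  # a new measurement starts every 10 s
ACQUISITION_WINDOW_S = 8   # data are collected for the first 8 s


def summarize_window(samples: Sequence[float]) -> Dict[str, float]:
    """Reduce one acquisition window to the values sent to the database."""
    if not samples:
        raise ValueError("the acquisition window contains no samples")
    return {
        "average": statistics.fmean(samples),
        "peak_to_peak": max(samples) - min(samples),
        # Population standard deviation: an assumption, see lead-in.
        "std_dev": statistics.pstdev(samples),
    }


# Invented jitter samples in picoseconds, for illustration only.
print(summarize_window([10.0, 12.0, 11.0, 13.0]))
```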

#### 2) RF2TTC module parameters

The RF2TTC VME modules, which have been installed in every LHC-TTC receiving crate, act as an interface between the RF optical receiver (RF\_RX\_D) and the experiment electronics [7]. As a part of their functions they also decode messages delivered through the Beam Synchronous Timing (BST) system [8]. TTCpage1 provides monitoring of some of the internal registers of the module. The table below presents the most important of them.

Table 5: RF2TTC converter monitored parameters

| Parameter                                   | Description                                           |
|---------------------------------------------|-------------------------------------------------------|
| BC1 QPLL, BC2 QPLL, BCREF QPLL, BCMAIN QPLL | Status (locked/unlocked) of internal QPLLs            |
| BST status                                  | Status of BST message reception                       |
| BST beam mode                               | LHC beam mode extracted from the BST                  |
| ORB1 and ORB2 period in BC counts           | Number of related bunch clock pulses per orbit period |

The values of the registers listed above are read out every 10 seconds and stored in the database. To avoid any events being missed, the values of the registers are always latched when a particular condition occurs and cleared later only after reading.

There are only two exceptions to the 10 second interval. The first one is beam mode monitoring, where any change of mode is logged.

The second one is the measurement of the Orbit period in BC counts. This task is performed with $89\,\mu$s resolution (the value of the period is saved in the modules' FIFO for every orbit pulse, which for 11.24 kHz gives one measurement every $89\,\mu$s). Consequently, any single Orbit signal with a period different from 3564 BC counts is registered and displayed.
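As an illustration of this check, the sketch below scans a list of per-orbit counts for deviations from the nominal 3564 and reproduces the quoted resolution from the 11.24 kHz orbit frequency. The FIFO contents and function names are hypothetical; the actual readout is done by the VMEbus controller software.

```python
EXPECTED_BC_COUNTS = 3564  # nominal bunch clock counts per orbit (from the text)


def find_irregular_orbits(fifo_values):
    """Return (index, value) pairs for every orbit whose length is not nominal."""
    return [(i, v) for i, v in enumerate(fifo_values) if v != EXPECTED_BC_COUNTS]


def orbit_period_us(orbit_frequency_hz=11.24e3):
    """Orbit period in microseconds for the orbit frequency quoted in the text."""
    return 1e6 / orbit_frequency_hz


# Hypothetical FIFO contents: two orbits deviate from the nominal length.
fifo = [3564, 3564, 3563, 3564, 3565]
irregular = find_irregular_orbits(fifo)
```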

#### 3) Frequency meters

Three high precision frequency meters based on the XILINX Spartan-3 evaluation kit [10] have been developed at CERN [11] to provide highly accurate frequency tracking of the bunch clocks (precision up to 1 Hz, BC ~ 40 MHz) (figure 5). These modules have a 2-slot VME form factor and have been installed in the VME crate, which supplies them with the required power. The modules are read out via an RS232 interface. Additionally, the meters have been equipped with an external reference clock input. The 10 MHz signal from the LHC Global Machine Timing (GMT) has been used for this purpose.

A measurement is performed every 10 seconds and the results are sent to the database.



Figure 5: Frequency meter

#### 4) Fibre transmission delay drift versus temperature variations

A spare set of fibres between the CCR and ATLAS is used for signal round trip delay measurements. These are performed by an oscilloscope installed in the rack ("driftScope"). The results are complemented with temperature values from the sensors of the meteorological station provided by the CERN Radiation Monitoring System for the Environment and Safety (RAMSES).

#### IV. SYSTEM ARCHITECTURE AND DATA FLOW

The heart of the system responsible for gathering measurement data consists of a rack mount PC (TTCpage1 server). The server has been equipped with two network adapters, one connected to the CERN Technical Network (TN) and the other one to the local TTCpage1 private network (PN).

In addition to collecting data, the TTCpage1 server provides a boot server service for the diskless VMEbus controller connected to the PN. The controller is used to control the RF\_RX\_D and RF2TTC modules.

The purpose of the PN is to ensure stable data transmission between the server and the measurement devices (oscilloscopes, VMEbus crate controller and frequency meters).

The location of the server within the TN is imposed by the need for Data Interchange Protocol accessibility, which is unachievable within the CERN General Public Network (GPN). Security has also been enhanced by configuring a firewall with restrictive policies.



Figure 6: System architecture

Once all the measurement data is gathered, it is sent to a database provided by CERN database services. While this database is used for data storage, it also acts as a "gateway" between the TN and the GPN. An "emergency" copy of the data is also saved on the local TTCpage1 hard disk array. The array is based on a RAID level 1 controller, which writes the data simultaneously to two hard disk drives.

Most of the system operates on a 10 second interval, which is equivalent to a data volume of ~1 GB a year. The provided database service enables the system to store all the data during the whole LHC lifetime without the need for data reduction.

A web server provided by CERN Web Services is used for data visualization. All of the webpage logic responsible for data reception from the database and graph plotting has been implemented in PHP with the JpGraph library. The user interface is based on AJAX technology provided by the GWT (Google Web Toolkit) engine.

The webpage provides data visualization of the full range of monitored parameters and additionally supports the maintenance of the service itself, e.g. by analyzing sampling intervals, error flags and other indicators.

#### V. DATA GATHERING APPLICATIONS

A set of applications written in C/C++ has been developed for collecting the data.

As part of their functions, the applications provide remote control of the two oscilloscopes, accessed via TCP/IP connections and the Versatile Instrument Control Protocol (VICP). They make use of a General Public License (GPL) based library for controlling VICP devices.

As the Data Interchange Protocol (DIP) is used as a source of some of the monitored signals (statuses of TTC receivers, beam mode and temperature value), the applications have been extended with DIP libraries and some interface classes providing the ability to act as both DIP publisher and DIP subscriber.

Emphasis has been put on ensuring the reliability and security of the system. This includes data buffering mechanisms in case of database connection problems, local data storage and automatic remote restart of the VMEbus controller through a custom-designed RS232-based interface.
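A buffering mechanism of this kind can be sketched as follows. This is a minimal illustration only, written in Python rather than the C/C++ of the actual applications; the class name, the `send` callable and the buffer size are assumptions, and the real implementation is not described in detail here.

```python
from collections import deque


class BufferedWriter:
    """Hold records locally while the database is unreachable, then flush in order."""

    def __init__(self, send, max_records=100000):
        self._send = send  # callable that raises ConnectionError on failure
        # Bounded buffer: the oldest records are dropped if it overflows.
        self._pending = deque(maxlen=max_records)

    def write(self, record):
        self._pending.append(record)
        self.flush()

    def flush(self):
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False  # keep the record and retry on the next write
            self._pending.popleft()
        return True

    def pending(self):
        return len(self._pending)
```

Records are removed from the buffer only after a successful send, so a connection loss in the middle of a flush does not lose data and preserves the time ordering of the measurements.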

Email and SMS notification procedures have been added to provide fast detection of undesirable conditions and possible system failures.

#### VI. TTCPAGE1 – DATA VISUALISATION

Figure 7 shows the web application which has been developed. The main window area has been divided into five parts. Each part consists of a plotting area and two drop down menus for parameter and time resolution selection (e.g. last 1h, last 24h etc.). The webpage is refreshed every 10 s while working in real-time monitoring mode.



Figure 7: TTCpage1 - web application http://cern.ch/ttcpage1 - CERN NICE user authentication required

The historical data can be analyzed at any time. A user who wants to see the status of the system at any given point in the past can specify the desired date, and after one click on "Update plots" the data will appear on the screen.

#### VII. DATA VISUALISATION

Two types of plots are used for data visualisation. Figure 8 presents a graph with the status of all of the TTC backbone receivers and transmitters versus time. Eight locations (as described in section III A) have been included.

This type of plot can only display a limited number of discrete values (colours such as green, orange, red, etc.), which in some cases is insufficient. The issue has been solved with a tabular form of data presentation, which has been made available for any selected graph (figure 9).



Figure 8: TTC receivers and transmitters graph



Figure 9: Tabular form of data presentation

The second kind of plot is presented in figure 10. The example shows the BC1 frequency in Hz versus time during RF ramping tests in October 2009.



Figure 10: BC1 frequency versus time

#### VIII. **CONCLUSIONS**

A full system has been designed for RF-TTC remote status monitoring. The deployed system fully complies with existing CERN infrastructure and services such as databases, networks, etc.

A web based application provides fast data visualization to the LHC experiments, in order to monitor the TTC status in real time. The application is available to the users and helps them to quickly detect unexpected conditions and cross correlate those with other events. All data being collected is time-stamped and stored in a database, which facilitates both real time and post-mortem data analysis. The implemented facility is expected to be essential for close and efficient monitoring of the timing signals and of the complete TTC backbone system.

#### **IX. REFERENCES**

- [1] B.G. Taylor, Timing Distribution at the LHC, Proc. 8th Workshop on Electronics for LHC Experiments, Colmar (2002)
- [2] B.G. Taylor for the RD12 Project Collaboration, TTC distribution for LHC Detectors, IEEE Trans. Nucl. Sci., vol. 45, no. 3, pp. 821-828, Jun. 1998.
- [3] E. Ciapala et al., Commissioning of the 400MHz LHC RF system, LHC Project Report 1147 (2008)
- [4] S. Baron, TTC upgrade project, http://cern.ch/ttc-upgrade/
- [5] S. Baron, P. Baudrenghien, RF-TTC Frequently Asked Questions (2008)
- [6] A. Monera et al., Optical to RF Digital VMEbus Interface Card User Manual, CERN PH-ESE 30-07-08-2 (2006)
- [7] M. Joos, RF to Digital Optical VMEbus Interface Card User Manual, CERN PH-ESS-02-01, (2006)
- [8] M. Joos, RF2TTC VMEbus Interface Card and S/W User Manual, CERN PH-ESS-14-07-2009, (2006)
- [9] R. Jones, The interface and structure of the machine beam synchronous timing message for the LHC experiments, LHC-BOB-ES-0001 ver.2.0 (2008)
- [10] Xilinx, Spartan-3 Generation FPGA User Guide, UG331 (2009), http://www.xilinx.com/support/documentation/user\_guides/ug331.pdf
- [11] A. Holzner, C. Sigaud, Notes on Xilinx evaluation board frequency meter, https://twiki.cern.ch/twiki/bin/view/CMS/TTCFreqmeter

### A Low-cost Multi-channel Analogue Signal Generator

### F. Müller<sup>a</sup>, K. Schmitt<sup>a</sup>, W. Shen<sup>a</sup>, R. Stamen<sup>a</sup>

<sup>a</sup> Kirchhoff Institute for Physics, University of Heidelberg, Germany

fmueller@kip.uni-heidelberg.de

Abstract



A scalable multi-channel analogue signal generator is presented. It uses a commercial low-cost graphics card with multiple outputs in a standard PC as signal source. Each color signal serves as an independent channel to generate an analogue signal. A custom-built external PCB was developed to adjust the graphics card output voltage levels for a specific task, which needed differential signals. The system furthermore comprises a software package to program the signal shape.

The implementation of the signal generator is presented as well as an application where it was successfully utilized.

### I. INTRODUCTION

The presented signal generator provides up to 12 independent analogue signals on a low-cost basis. It consists of a standard PC hosting a commercial multi-monitor graphics card that acts as source for the analogue signals. The graphics card is controlled by a dedicated software package running on the same machine. An external device was developed as part of the signal generator, being an example of how to condition the signal.

A possible application, the emulation of analogue signals of the ATLAS calorimeter trigger inputs for the Level-1 PreProcessor test rig, is described in section VI.

#### II. CONCEPT

The signal generator consists of three building blocks. In a first step, the signal is programmed either from basic pulse shapes or from pulses recorded with an oscilloscope. These signals are mapped to an 8-bit digital signal, as shown in figure 1 (left). At this point, the signal is strictly positive, featuring an artificial, non-zero baseline. The generation of negative signals, i.e. the application of an offset, is performed at a later step.
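The mapping onto the 8-bit range can be sketched as follows. This is an illustrative example, not the code of the software package: the function name, the voltage range and the sample pulse are assumptions. It shows how a bipolar pulse becomes a strictly non-negative code sequence whose zero level sits on an artificial baseline.

```python
def to_8bit(samples, v_min, v_max):
    """Map signed samples in [v_min, v_max] onto unsigned codes 0..255.

    Returns the codes and the code that represents 0 V (the artificial baseline).
    """
    if v_max <= v_min:
        raise ValueError("v_max must exceed v_min")
    scale = 255.0 / (v_max - v_min)

    def encode(v):
        clipped = min(max(v, v_min), v_max)
        # Round half up to the nearest integer code.
        return int((clipped - v_min) * scale + 0.5)

    return [encode(v) for v in samples], encode(0.0)


# Hypothetical bipolar pulse in volts with an undershoot.
pulse = [0.0, 1.0, 2.5, 1.0, -0.5, 0.0]
codes, baseline = to_8bit(pulse, v_min=-0.5, v_max=2.5)
```

The baseline code returned here is the value that a later stage has to subtract again as a global offset in order to recover negative voltages.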

The analogue signal is generated in a second step, using a commercial graphics card as signal source. The basic idea is to use the DAC of the graphics card and the already existing periphery of the card (bus, memory, control unit) to generate analogue signals. Each color channel of the graphics card thereby serves as an independent signal source, with the native properties from the graphics card specification, as given below. These can be considered sufficient for many applications, such as analogue components of the LHC experiments at CERN. The signal is unipolar, as illustrated in figure 1 (middle).

Figure 1: The desired pulse shape is created and mapped to the 8-bit output of the graphics card (left), which generates a single-ended unipolar signal (middle). To condition the signal for a specific task, gain and offset are adjusted, including the conversion to e.g. differential signals (right) [2].

Finally, the signal is conditioned to a specific application by a third building block, which is a dedicated external device. This device has to perform a calibration of the graphics card signal. In addition, the artificially introduced baseline is taken into account by shifting the signals with a global offset. This offset is applied using a dedicated channel of the graphics card. The last operation is to adjust the voltage levels to the desired range of the application.

Only this last operation is particular to the specific application. In the following, the case of a differential output and an additional fan-out of the signal is presented, which corresponds to the application given in section VI. Figure 1 (right) shows the signal after conditioning.

### **III. GRAPHICS CARD AS SIGNAL SOURCE**

Each color channel of the graphics card serves as an independent signal source. It provides a unipolar signal with an 8-bit resolution of the output voltage and a time resolution ("pixel clock") as fine as 5 ns. This can be considered sufficient to represent an analogue signal for systems operated at a lower speed, such as the many 40 MHz systems at the LHC. The signal is represented by a fixed image that encodes three signals at a time (red, green, blue). The longest possible continuous signal that can be encoded into the image is of the order of $10\,\mu$s, which corresponds to one line on the screen. This restriction is due to the need for horizontal synchronisation of analogue CRT monitors and appears as a blanking space at the end of each line and each screen, where the electrical output is zero. The blanking typically takes 20% of the total time. The total signal length is nevertheless up to 10 ms, and the minimal frequency is about 100 Hz ("monitor frequency").
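The timing figures quoted above can be related to each other with a short calculation. The constants are taken from the text; the line length is only given to within an order of magnitude, so the resulting sample count should be read as approximate.

```python
PIXEL_CLOCK_NS = 5.0      # time resolution quoted in the text
LINE_LENGTH_US = 10.0     # longest continuous signal (one screen line), approximate
SYSTEM_CLOCK_MHZ = 40.0   # typical LHC system clock

# One sample every 5 ns corresponds to 200 MHz.
sample_rate_mhz = 1e3 / PIXEL_CLOCK_NS
# Number of samples that fit into one continuous line.
samples_per_line = LINE_LENGTH_US * 1e3 / PIXEL_CLOCK_NS
# Samples available per 25 ns period of a 40 MHz system.
samples_per_clock = sample_rate_mhz / SYSTEM_CLOCK_MHZ
```

With five samples per 25 ns period, a pulse with a rise time of a few tens of nanoseconds is described by roughly ten points on its leading edge.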

In order to maximize the number of channels, graphics cards with multiple monitor outputs were tested. The Matrox QID Pro [1] was chosen as the model with the best electrical properties.



Figure 2: Noise measurement of the Matrox Millennium G400 with DAC set to zero. All outputs feature a constant, non-zero offset within $\pm 20$ mV [2].

Figure 2 shows the noise measurements of the Matrox Millennium G400, whose properties are very similar to those of the model used. The noise was determined by measuring the output with the DAC set to zero, i.e. by displaying a black image, resulting in Gaussian noise with an RMS of 3.4 mV. The linearity was also measured and found to be within a 1% deviation over the output voltage range. Furthermore, figure 2 shows a deviation of the signal from zero in spite of the DAC being set to zero. This offset was found to be constant within 20 mV on all color channels of the graphics card and has to be corrected in the subsequent calibration stage of the external conditioning device.

#### **IV. SOFTWARE PACKAGE**



Figure 3: Graphical user interface of the software package to create, modify and save signal shapes. The main view shows a 40 MHz clock (green), a constant signal (blue) and a typical pulse for the ATLAS LAr type calorimeters (red). These signals were used in the application described in section VI [2].

A software package was developed to program and create the signals. It consists of two parts. The first is a graphical tool that offers basic pulse shapes, modification tools and the possibility to import external data. The prepared pulse shapes are stored in a generic file format. Three of the signals are merged into a fixed image which corresponds to the desired signal shape at the output of the graphics card. The second tool is a console application for Linux that connects to a dedicated X window server running on the PC that hosts the graphics card. It drives the graphics card by displaying the saved signals as fixed images at full screen, resulting in a repeating signal as long as the application is running.

#### V. SIGNAL CONDITIONING



Figure 4: The Fan-out and Application Board (FAB) performs calibration, fan-out and conditioning of the signals [2].

An external device was developed to condition the output signal to the voltage levels for a specific task. It is a PCB that consists of several buffer stages to calibrate for gain and offset. Up to six monitor outputs can serve as inputs. One channel is explicitly used to apply a global offset to all other signals in order to generate negative, as well as positive, signals. The outputs are 16 differential signals, which can be configured by an upstream fan-out stage.



Figure 5: Scheme of the signal conditioning: Calibration for gain and baseline, application of a global offset and preparing of the output signal (in this case: differential signal).

Figure 5 shows a scheme of the signal conditioning for one channel. At the first stage, baseline and gain of the input signal from the graphics card are calibrated. This calibration is implemented by variable resistors of an operational amplifier in an inverting configuration and has to be performed once. At the second stage, a global, negative offset is applied to all channels in order to make negative signals possible, using a dedicated, inverted channel of the graphics card. Hence, the offset is programmable by software, taking into account the baseline artificially introduced at the creation of the signal, as described in section II. The presented version of the device for the signal conditioning was developed for a task that required multiple differential signals. Therefore, the signals are then fanned out and converted from single-ended signals to differential ones at the last stage. This last stage, of course, varies with the specific task.

Figure 6: Signal chain on the Fan-out and Amplification Board for a typical pulse of the ATLAS LAr type calorimeter. Left picture: First, the original signal (red) is calibrated for gain and baseline (blue). Middle picture: Then a global offset (blue) is applied, resulting in a continuous baseline (green). Right picture: Finally the single-ended signal is converted to a differential signal (green). Also shown are the two branches of the differential signal (blue, red) and the original signal (red overlay) [2].

Figure 6 shows the development of the signals after the several stages of the signal conditioning.
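The three conditioning stages can be summarised numerically with a minimal sketch. It is an idealised model of the signal chain under stated assumptions, not a description of the actual circuit: the function name, the symmetric split of the differential pair and the numerical values are all hypothetical.

```python
def condition(v_in, gain, baseline, global_offset):
    """Return (single_ended, positive_leg, negative_leg) for one input sample."""
    # Stage 1: calibrate gain and remove the channel's own baseline.
    calibrated = gain * (v_in - baseline)
    # Stage 2: subtract the global offset so that negative values become possible.
    single_ended = calibrated - global_offset
    # Stage 3: symmetric differential pair whose difference equals single_ended.
    return single_ended, single_ended / 2.0, -single_ended / 2.0


# Hypothetical values: 20 mV channel offset, gain of 2, 0.5 V global offset.
single, pos, neg = condition(v_in=0.52, gain=2.0, baseline=0.02, global_offset=0.5)
```

An input equal to the channel baseline yields an output of minus the global offset, which is how the artificial baseline introduced at signal creation is mapped back to a negative voltage.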

#### VI. APPLICATION

The signal generator was successfully applied in a test bed for the PreProcessor Module (PPM) of the ATLAS Level-1 Calorimeter Trigger. One of the main tasks of the PreProcessor is the digitisation of the analogue pulses from the ATLAS calorimeters at a rate of 40 MHz. These pulses are transmitted differentially with a voltage amplitude of up to 2.5 Volts. The key characteristics are a rise time of 50 ns and an undershoot of up to -0.5 Volts for signals from calorimeters based on Liquid Argon technology. The typical shape can be seen in figure 1. Given its sampling rate of 200 MHz, the presented signal generator is well suited to emulating the analogue ATLAS calorimeter pulses.

#### A. Test Bed for the PreProcessor Module

The test bed for the analogue parts of the ATLAS Level-1 Calorimeter Trigger PreProcessor is shown in figure 7. Since the PPM has 4 connectors with 16 channels each, the signal generator was set up with 8 independent signals of the graphics card that are fanned out and converted to 16 differential signals.



Figure 7: The setup of the PreProcessor test bed consists of the signal generator, which delivers 16 differential channels. Furthermore, an external device (*clock board*) uses a dedicated channel to provide a 40 MHz clock.

In order to achieve a synchronous sampling of the pulses on the PPM with respect to the signal generator, the test setup also has to provide the bunch crossing frequency of 40 MHz to the PPM. This requires an additional device, since the signal generator suffers from the blanking space that prevents the generation of such continuous signals.

### B. Clock Synchronisation Board

The Clock Synchronisation Board uses another dedicated channel of the graphics card to provide a clock synchronous to the 16 signal channels. The device features a CPLD for basic logic functions and routing, and a voltage controlled oscillator in a phase-locked loop (PLL). An incoming 40 MHz signal from the graphics card serves as reference clock, while the inhibit function of the PLL is used to bridge the intrinsic blanking space of the graphics card signal. To this end, the reference clock is analysed to detect the beginning of the blanking space. This is achieved by a monoflop that is charged by the reference signal. Once the reference clock stops, the monoflop output falls to zero. This activates the inhibit of the voltage controlled oscillator, whereby it sustains the 40 MHz clock. After the blanking space, the PLL ensures that the voltage controlled oscillator synchronises with the reference clock again. See figure 8 for illustration.
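The detection of the blanking space can be modelled with a short sketch. It treats the monoflop as a retriggerable timer that is restarted by every reference clock edge; the hold time of 30 ns, the edge times and the function name are assumptions for illustration and are not taken from the board design.

```python
def inhibit_intervals(edge_times_ns, hold_ns):
    """Return (start, end) intervals during which the monoflop output is low.

    The monoflop is retriggered by every reference edge and times out hold_ns
    after the last one; it goes high again at the next edge.
    """
    intervals = []
    for previous, current in zip(edge_times_ns, edge_times_ns[1:]):
        if current - previous > hold_ns:
            intervals.append((previous + hold_ns, current))
    return intervals


# 40 MHz reference (25 ns period) with a hypothetical 500 ns blanking gap.
edges = [0, 25, 50, 75, 575, 600, 625]
gaps = inhibit_intervals(edges, hold_ns=30)
```

During each returned interval the phase detector would be inhibited, so the oscillator keeps running at its last frequency until the reference edges resume.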



Figure 8: Scheme of the 40 MHz clock, provided by a dedicated channel for reference. A PLL with a voltage controlled oscillator is used to both synchronise and bridge the intrinsic blanking space, using the inhibit function of the phase detector of the PLL.



Figure 9: Digitised signal (yellow) and original signal (red overlay) [2].

### C. Measurement

For the measurement, the signal generator provided pulses from a test beam pulse library [3] as well as the reference clock. The PPM was configured to digitise the analogue signals without further processing. The result is shown in figure 9, with the digitised signal in black, and the original signal as an overlay in red. Both are in very good agreement. The similar digitisation levels of two consecutive pulses furthermore demonstrate the synchronisation of the generated signals and the PPM sampling frequency, provided by the Clock Synchronisation Board.

### VII. SUMMARY

The presented signal generator is applicable in all fields that need multiple analogue signals and where a blanking space is no drawback or can be compensated as described. Its advantages are multiple, easily programmable signals of acceptable quality at very low expense.

### REFERENCES

- [1] Matrox Graphics Inc., User Guide matrox QID Series, 10876-301-0221, 2008.01.11
- [2] F. Müller, Rate Metering for the ATLAS Experiment, HD-KIP 08-22, 2008
- [3] H. Stenzel, Trigger Tower Analogue Signal Library, November 2001.

### A Radiation Tolerant 4.8 Gb/s Serializer for the Giga-Bit Transceiver

Ö. Çobanoğlu<sup>a</sup>, P. Moreira<sup>a</sup>, F. Faccio<sup>a</sup>

<sup>a</sup> CERN, PH-ESE-ME, 1211 Geneva 23, Switzerland

ozgur.cobanoglu@cern.ch

### Abstract

This paper describes the design of a full-custom 120:1 data serializer for the GigaBit Transceiver (GBT) which has been under development for the LHC upgrade (SLHC). The circuit operates at 4.8 Gb/s and is implemented in a commercial 130 nm CMOS technology. The serializer occupies an area of 0.6 mm<sup>2</sup> and its power consumption is 300 mW. The paper focuses on the techniques used to achieve radiation tolerance and on the simulation method used to estimate the sensitivity to single event transients.

#### I. INTRODUCTION

The GBT project aims at developing a radiation tolerant optical transceiver operating at 4.8 Gb/s within the framework of the LHC luminosity upgrade. Links implemented using the GBT will replace the three types of communication links currently in use, namely the timing, trigger and control (TTC) links, the data acquisition (DAQ) links and the slow control (SC) links, therefore providing a single solution for all the communication needs at the SLHC.

The GBT chip set will include a radiation tolerant serializer (SER) which converts 120-bit wide data words into a 4.8 Gb/s serial stream. Operating from a single 1.5 V supply, the circuit accepts CMOS-level data and control signals. The serializer outputs a differential signal with a worst-case simulated pattern-dependent jitter smaller than 6 ps at 4.8 Gb/s.

In the following section, the serializer architecture is detailed and a brief overview of its operation is provided. Section III deals with the circuit design of the major functional blocks. Section IV introduces the method used to estimate the performance of each circuit under radiation. Some relevant simulation results are also provided within this section. Finally, Section V summarizes the work.

### II. ARCHITECTURE

Fig. 1 shows the overall architecture of the serializer. It consists of a 120-bit input register, three 40-bit shift registers, a 3:1 multiplexer (shown as three switches) and a frequency synthesizer. The synthesizer is a phase-locked loop (PLL) whose feedback divider is composed of two stages, one dividing by 3 and the other dividing by 40, giving a total division ratio of 120.

The SER architecture is based on dividing the 120-bit frame into three 40-bit words which are serialized at 1.6 Gb/s and then time division multiplexed to form the final 4.8 Gb/s serial bit stream. This architecture reduces the number of components operating at full speed.



Figure 1: The architecture of the serializer.

A fully integrated programmable charge-pump phase-locked loop (CP-PLL) synthesizes a 4.8 GHz clock from the 40 MHz LHC clock reference. To optimize the output jitter, the values of the loop filter resistor and the charge-pump current are programmable with 2-bit and 4-bit resolution, ranging from 1.5 k$\Omega$ to 6.0 k$\Omega$ and from 1 $\mu$A to 100 $\mu$A, respectively.

The CP-PLL is designed to be tolerant to SETs by a combination of techniques: i) The voltage controlled oscillator (VCO) transistors are designed as triple-well devices for better isolation, to reduce the active volume where charge is collected and finally to promote a quick drift of charge due to the electrical field established by the bias voltages of the P and N wells. ii) Triple Modular Redundancy (TMR) is used in the feedback divider of the PLL to mitigate single event upsets (SEU). The design targets the temperature range of [$-20\,^{\circ}$C, $100\,^{\circ}$C] and operates at 1.5 V, tolerating a power supply variation of 10%.

Fig. 1 and Fig. 2 sketch the overall operation of the serializer as follows: at every rising edge of the master clock (Clock40MHz) the 120-bit frame is loaded into the input register. At every rising edge of the load$\langle i \rangle$ signal, a 40-bit word is loaded into the respective 40-bit shift register.

Since the PLL locks to the 40 MHz reference clock, it generates a bit clock ($f_{BIT}$) with frequency equal to 120 times 40 MHz, that is, 4.8 GHz, from which three non-overlapping clock phases ($Q_0$, $Q_1$, and $Q_2$) are obtained. As shown in Fig. 2, these three clock phases are used to clock three shift registers and to control the fast multiplexer in order to time division multiplex the three 1.6 Gb/s serial streams into a single 4.8 Gb/s serial stream.
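A behavioural sketch of this operation is given below. It is not the circuit itself but a bit-level model under one explicit assumption that the text does not state: shift register $i$ is taken to hold every third bit of the frame starting at bit $i$, so that multiplexing in the order $Q_0$, $Q_1$, $Q_2$ restores the original bit order. Function and constant names are hypothetical.

```python
def serialize(frame_bits):
    """Behavioural model: three 40-bit shift registers multiplexed by Q0..Q2."""
    if len(frame_bits) != 120:
        raise ValueError("frame must contain exactly 120 bits")
    # Assumed partition: register i holds every third bit starting at bit i.
    registers = [frame_bits[i::3] for i in range(3)]
    stream = []
    for slot in range(40):          # one shift of all three 1.6 Gb/s registers
        for phase in range(3):      # Q0, Q1, Q2 each select one register
            stream.append(registers[phase][slot])
    return stream


REFERENCE_MHZ = 40
BIT_CLOCK_MHZ = REFERENCE_MHZ * 120   # feedback division ratio 3 x 40
WORD_RATE_MBPS = BIT_CLOCK_MHZ // 3   # rate of each 40-bit shift register

# Hypothetical test frame.
frame = [(i // 5) % 2 for i in range(120)]
stream = serialize(frame)
```

Only the multiplexer and the clock generation run at the full 4.8 GHz in this scheme; each shift register advances once per three output bits.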



Figure 2: The timing diagram of the serializer operation.

### A. Radiation Issues

In deep sub-micron technologies, the performance of high speed circuits depends on many layout-related effects to a much greater extent than in older CMOS processes. Therefore, in relatively recent technologies, the layout work should be introduced at a very early stage since it has a large impact on the final performance. For accurate simulation, some of the loop parameters which play important roles in the loop behavior, such as the VCO gain, must be extracted from the actual layout implementation.

As reported in [8, 9] and [11], the charge-pump and the VCO are the most sensitive components of PLL circuits to SETs, and their design has to take into account the increased sensitivity of modern deep submicron technologies to SETs. In such technologies the integrated devices are located closer to each other, thus an ionizing particle can affect simultaneously several devices. Additionally the response of the parasitic devices to SETs can lead to charge collection exceeding that deposited by the ionizing particle. Examples are the PNPN parasitic structures in CMOS devices which can even lead to latch-up and the parasitic bipolar junction transistors which cause enhanced charge collection [3]. In this work, such conditions are addressed in the VCO differential delay cells and the fast multiplexer (fMUX) where triple-well transistors are used. The triple-well structure is expected to better isolate the devices from radiation-induced charge collection.

Considering the registers within the serializer, a triple modular redundancy (TMR) scheme is used in the clock generator to increase SET immunity. This technique, however, limits the maximum frequency to values much lower than is otherwise achievable with this technology.

The techniques followed to minimize such penalties are summarized in the next section.

#### **III. CIRCUIT DESIGN**

The delay cell [7] chosen for the ring-oscillator is a standard differential-pair with symmetric loads as shown in Fig. 3. The low-pass filter (LPF) voltage, shown as bn in Fig. 3, is used to control the differential pair tail current and thus to control the VCO oscillation frequency. The bias of the symmetrical loads is generated through the replica-bias circuit represented in Fig. 3-B.



Figure 3: The differential delay cell (A) and its replica-bias (B).



Figure 4: The fast-multiplexer with the history clearing scheme.

The fast multiplexer shown in Fig. 4 is implemented by an 8-input nMOS logic And-Or-Inverter (AOI) structure driven both by the clock phases $Q_i$ and by the pseudo-complementary shift register signals $SR_i$. The required clock phases $Q_0$, $Q_1$, and $Q_2$ are generated within the PLL clock divider. The clock phases $Q_0$, $Q_1$ and $Q_2$ are non-overlapping, so at any time only one of the fMUX branches is active.

A straightforward AOI multiplexer has drawbacks that the circuit presented in Fig. 4 addresses. Firstly, depending on the history of logic levels of the $SR_i$ inputs driving the branches which are disabled by logic-low $Q_i$ signals, the output node experiences different amounts of charge sharing between the nodes $n_1$ to $n_3$, leading to different delays and thus pattern-dependent jitter. In order to solve this problem, relatively small transistors driven by the next $Q_i$ phase are connected in parallel with those driven by $SR_i$ to clear the effect of signal history. When a branch is selected by the corresponding $Q_i$ phase, these small transistors ensure that the node in between the two transistors is pre-discharged to ground so that all the transitions start with identical initial conditions. In this way, the pattern-dependent timing ambiguity is minimized.

#### A. SEU Tolerance

Radiation tolerance of the feedback divider is obtained by TMR techniques. Due to the extra logic employed, these techniques limit the maximum achievable operating frequency. Circuits voting the inputs or outputs of D-FFs increase the logic propagation delays and cannot be used for high speed applications. Instead, a novel voted dynamic D-FF was designed which embeds the voter. Its schematic can be seen in Fig. 5.
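The logical function of the voter can be illustrated with a minimal sketch. It models only the majority vote and a triplicated counter at the bit level; it does not represent the dynamic D-FF circuit of Fig. 5, and the class name and the choice of a divide-by-3 example are assumptions.

```python
def vote(a, b, c):
    """Bitwise majority of three redundant copies."""
    return (a & b) | (b & c) | (a & c)


class TmrCounter:
    """Divide-by-N counter kept in three copies that are voted on every clock."""

    def __init__(self, modulus):
        self.modulus = modulus
        self.copies = [0, 0, 0]

    def clock(self):
        voted = vote(*self.copies)           # a single corrupted copy is outvoted
        nxt = (voted + 1) % self.modulus
        self.copies = [nxt, nxt, nxt]        # all copies are refreshed each cycle
        return nxt

    def upset(self, index, bit):
        """Flip one bit in one copy to emulate a single event upset."""
        self.copies[index] ^= 1 << bit


counter = TmrCounter(modulus=3)
first = counter.clock()
counter.upset(index=0, bit=1)
second = counter.clock()
```

Because the voted value is written back to all three copies on every clock, an upset in one copy is corrected within one cycle and cannot accumulate with a later upset in another copy.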



Figure 5: Improved TMR dynamic D-FF.

### **IV. SIMULATION RESULTS**

The PLL architecture adopted can be modeled by a second-order type-II negative feedback loop for which an analytic model can be found in [5] and [4]. Fig. 6 shows the possible operating points (circles) of the CP-PLL on the stability map, which plots the normalized forward loop gain as a function of the normalized reference input. The overload and z-plane stability limits [4] are also shown. The desired operating points are located in the vicinity of 10% of the overload limit, which sets in at lower values.



Figure 6: CP-PLL stability plot.

A practical issue in designing PLLs is the fact that not all the loop parameters can be arbitrarily chosen, requiring some building blocks to be laid out before the actual model based simulations can take place. As examples, the time constant of the loop filter or the charge-pump current can be freely chosen, however the designer cannot set arbitrarily the VCO gain since it very much depends on the circuit topology and the semiconductor process used. The VCO gain contributes to the forward gain of the control loop and is a very important parameter for the loop behavior. The VCO gain and its variations can be known only once the circuit is laid out and the parasitics are extracted. Only then the model based simulations mentioned above can be performed. It is thus necessary to start the PLL design with the implementation of the VCO down to the layout level before realistic model based simulations of the loop itself can be done. Circuit design in these cases is thus an iterative optimization process between the schematic and the layout levels.

#### A. Single-Event Transient Simulations

In the simulation results presented in this section, the charge released within the silicon by an ionizing particle is modeled as ideal rectangular current pulses [6] of different amplitudes with a fixed duration of 10 ps. Even though a double-exponential waveform with a relatively long tail better resembles the actual waveform, it must be extracted from process simulations to correspond to real conditions. At the time of this writing, however, such process simulation results were not available. Consequently, the effect of the shape of the injected pulse was not modeled and only that of the magnitude of the injected net charge was considered.
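For a rectangular pulse the injected charge is simply the product of amplitude and duration. The short sketch below, with hypothetical helper names, converts between the two for the fixed 10 ps width; it reproduces the 30 mA, 0.3 pC design pulse quoted later in this section.

```python
PULSE_WIDTH_S = 10e-12  # fixed 10 ps pulse duration used in the simulations


def injected_charge(amplitude_a: float, width_s: float = PULSE_WIDTH_S) -> float:
    """Charge (C) carried by an ideal rectangular current pulse."""
    return amplitude_a * width_s


def pulse_amplitude(charge_c: float, width_s: float = PULSE_WIDTH_S) -> float:
    """Current amplitude (A) needed to inject a given charge in a fixed width."""
    return charge_c / width_s


q_design = injected_charge(30e-3)      # 30 mA pulse -> 0.3 pC
i_small = pulse_amplitude(0.1e-12)     # 0.1 pC -> 10 mA
```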

An incoming ionizing particle releases charge that is collected by the microelectronic devices nearby. Fig. 7 sketches how the ionizing particle passages are modeled as ideal current pulses applied to SET vulnerable nodes of the circuit under study. For the VCO differential cell shown in Fig. 7, the charge released is sensed by the drain and/or the source of the transistors causing an effective phase shift at the VCO signal. The simulation result of Fig. 8 shows the low-swing differential VCO signal and the corresponding large-swing single-ended output when an ionizing particle releases charge in the circuit at approximately t=300 ps instant from the beginning of the simulation. The injected charge is relatively small causing only a small phase shift, however in case the amount of charge released by the ionizing particle is large enough, the VCO can even cease oscillation for a while and then recover nominal operation. Such a condition is shown in Fig. 9.



Figure 7: Differential delay cell (D) and the two vulnerable points to be affected by an ionizing particle passage, denoted as A and B.



Figure 8: Moderate SET-induced disturbance: an ionizing particle strike (0.1 pC) perturbs the VCO (small-amplitude signals) and shifts the D2S phase (by about 10 ps) from its nominal evolution (large-amplitude non-perturbed signal and its shifted copy).



Figure 9: Due to excessive charge deposition (50 pC) by ionizing particles, the VCO oscillation can be temporarily interrupted.



Figure 10: The effect of the ionizing particle's arrival instant on the amount of delay it causes.

The phase shift induced on the VCO signal by the ionizing particle does not only depend on the magnitude of the charge deposited but also on the instant the charge is collected by the circuit in relation to the VCO cycle. It is possible to find the worst-case sensitivity to the collection of charge via simulation by sweeping the "arrival time" of the ionizing radiation.

Fig. 10 sketches conceptually a simulation result where the output of the differential ring oscillator is plotted at the top and the delay caused by 0.1 pC of injected charge is plotted as a function of the arrival instant at the bottom. The injection instant is swept over a single VCO cycle. Simulations show that there are two time intervals where the sensitivity is the highest: these correspond to the periods where the VCO output changes at a faster rate. The instants marked as $t_a$ and $t_b$ correspond to the maximum phase shift ($PS_{max}$), and the sensitivity is 100 s/C, or equivalently $1.6 \times 10^{-17}$ s/$e^-$. The SET performance of the VCO is evaluated based on these worst-case time instants.
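The two sensitivity figures are related by the elementary charge. The sketch below, using hypothetical names, performs the unit conversion and a small-signal estimate of the shift for the 0.1 pC injection. Treating the sensitivity as constant is an assumption that holds only for small charges: the design target of roughly 20 ps at 0.3 pC quoted below is smaller than a linear extrapolation would give.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of one electron, in coulombs

SENSITIVITY_S_PER_C = 100.0            # worst-case sensitivity quoted in the text


def sensitivity_per_electron(sens_s_per_c: float) -> float:
    """Convert a timing sensitivity from s/C to s per electron."""
    return sens_s_per_c * ELEMENTARY_CHARGE_C


def small_signal_shift(charge_c: float, sens_s_per_c: float = SENSITIVITY_S_PER_C) -> float:
    """Linear estimate of the phase shift (s); valid for small charges only."""
    return sens_s_per_c * charge_c


per_electron = sensitivity_per_electron(SENSITIVITY_S_PER_C)  # about 1.6e-17 s
shift_0p1pc = small_signal_shift(0.1e-12)                     # about 10 ps
```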

The worst-case phase error as a function of injected charge is plotted in Fig. 11. The design criterion used for the 4.8 GHz VCO was that a 30 mA current pulse with 10 ps width, corresponding to 0.3 pC of charge release, injected into or sunk from the nodes A and/or B of Fig. 7 should cause a maximum phase shift of approximately 20 ps. Intuitively, considering closed-loop PLL operation, the amount of timing error per reference clock cycle that the ionizing particles generate should not be bigger than the amount of correction that the loop can perform. This limits the maximum phase error and prevents bursts of errors that would otherwise lead to a significant increase in the serializer bit-error rate (BER). This is an issue specific to the design of radiation-tolerant PLLs. To achieve such robustness, we adopted the solution of keeping the current flowing through the transistors just large enough so that the charge released by an ionizing particle does not significantly affect the circuit biasing and oscillation cycle. To accommodate the higher currents while keeping a specific oscillation frequency, transistor widths have to be increased accordingly. This helps achieve tolerance to SETs due to the increased circuit capacitances. The disadvantage of the technique is the increased power consumption of the VCO, which might have to be biased with currents several times higher than those that would normally be required to achieve low phase noise operation at the given operating frequency. Running the VCO at high currents does not, however, impair its phase noise performance.



Figure 11: The worst-case phase error versus the injected current.

The phase shifts considered so far occur only when an ionizing particle releases charge within the VCO delay cell. However, if the charge is deposited at the charge-pump node connected to the filter, the voltage difference it causes on the VCO control signal modulates the VCO frequency. The VCO frequency difference will integrate over 120 VCO cycles until the phase-frequency detector (PFD) generates the next correction signal. In order to minimize this effect, a large filter capacitance must be employed. In the serializer PLL, segmented nMOS transistors in accumulation mode were used with a total capacitance of 300 pF. The LPF itself occupies a total area of less than 400 × 200 µm$^2$.
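The benefit of a large filter capacitance can be estimated to first order. The sketch below is an illustration under stated assumptions, not the authors' simulation: it assumes the whole deposited charge lands on the 300 pF capacitance, that the VCO is linear with a gain `kvco_hz_per_v` (the 1 GHz/V used in the example is a purely hypothetical value, since the VCO gain is not given in the text), and that the frequency error persists unchanged for 120 cycles of the 4.8 GHz VCO.

```python
def control_voltage_step(charge_c: float, filter_cap_f: float = 300e-12) -> float:
    """Voltage step (V) on the loop filter: dV = Q / C."""
    return charge_c / filter_cap_f


def accumulated_time_error(
    charge_c: float,
    kvco_hz_per_v: float,            # ASSUMPTION: linear VCO gain, not given in the text
    f_vco_hz: float = 4.8e9,
    n_cycles: int = 120,
    filter_cap_f: float = 300e-12,
) -> float:
    """Timing error (s) accumulated before the next PFD correction.

    Each VCO period changes by about df / f^2 for a small frequency
    offset df, and the error adds up over n_cycles periods.
    """
    dv = control_voltage_step(charge_c, filter_cap_f)
    df = kvco_hz_per_v * dv
    return n_cycles * df / f_vco_hz**2


dv = control_voltage_step(0.3e-12)               # 0.3 pC on 300 pF -> 1 mV
dt = accumulated_time_error(0.3e-12, 1e9)        # hypothetical Kvco of 1 GHz/V
```

Because the voltage step scales as 1/C, doubling the filter capacitance halves the accumulated error for the same deposited charge.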

### V. SUMMARY

The BER performance of high-speed links depends strongly on the jitter characteristics of the serializer and deserializer circuits. For the serializer, jitter in the transmitted signal has two main origins: random jitter, generated by the VCO phase noise and the tracking behavior of the clock-multiplying PLL, and pattern-dependent jitter, essentially due to bandwidth limitations and clock skew in the parallel-to-serial conversion circuits. For serializer circuits operating under radiation, ionizing particles can contribute significantly to the increase of the BER [13]. This can take the form of single errors or bursts of errors. Single errors can happen, for example, when one of the bits of the serializer shift registers suffers a single-event upset. However, circuits like the VCO and the PLL loop filter can, when disturbed, produce bursts of errors that adversely affect the BER and can even lead to losses of link synchronization, resulting in relatively large dead times in the data transmission system. It is thus particularly important to minimize the effects of SETs on these last two circuits since they keep a "long term" memory of the disturbing event. For the VCO, in the best case, a SET will appear as a phase jump that will stay uncorrected until the PLL action restores the steady-state conditions. In the case of the loop filter, any disturbance will be integrated, resulting in large phase errors which again need to be compensated by the PLL. Since in serializer PLLs the loop bandwidth is typically several orders of magnitude lower than the VCO oscillation frequency, the loop action alone is not fast enough to fully compensate for the effects of SETs. It is thus important to use SET-robust circuits in the PLL. This paper described the approach adopted to achieve this goal. In particular, the design criteria and simulation method used to design a SET-robust VCO were detailed. It was shown there that, for SEU tolerance, running the VCO with relatively high currents is an advantage. Although low power consumption is always desirable, our study shows that ring oscillators can only be made low power at the cost of high sensitivity to SETs.

Another critical component in a PLL working under radiation is the feedback counter. Any upset in this circuit might appear to the PLL as a large phase shift, resulting in a long settling time or even in a full locking cycle. In any case, such an event will almost certainly desynchronize the receiver PLL, resulting in a long dead time. To avoid such behavior the clock divider must use a triple modular redundancy architecture. However, due to the high-speed operation of the counter, it became evident that the common scheme of using a flip-flop preceded by a majority voter would not allow the design of a high-yield circuit for the specified range of process, temperature and voltage variations. To overcome this obstacle a new dynamic flip-flop with embedded voter was developed and is used in the ASIC for the digital circuits that operate at the highest clock frequencies.

Also with the aim of achieving high yield, the parallel-to-serial converter uses three shift registers operating at 1/3 of the bit clock frequency. The full data rate serial stream is obtained by time-division multiplexing those three serial streams using a single fast multiplexer. This multiplexer uses a special architecture to minimize pattern-dependent jitter and is described in detail in the paper. A serializer/de-serializer ASIC that contains the serializer described in this work was designed in a commercial 130 nm CMOS technology. Fig. 12 shows the serializer layout.


Figure 12: Layout of the serializer occupying 0.6 mm$^2$ of die area.

#### REFERENCES

- [1] A.I. Chumakov, *et al.*, Elsevier Radiation Measurements 30 (1999) 547-552
- [2] G. Anelli, *et al.*, IEEE transactions on nuclear science, 2002, Vol. 49, No 4
- [3] G. Bruguier *et al.*, IEEE transactions on nuclear science, Vol. 44, 522-532, April 1996
- [4] F. M. Gardner, IEEE transactions on communications, Vol. COM-28, no. 11, 1980
- [5] F. M. Gardner, Phaselock Techniques, John Wiley and Sons, 2005
- [6] H. H. Chung *et al.*, IEEE transactions on nuclear science, Vol. 53, no. 6, 2006
- [7] J. G. Maneatis, IEEE journal of solid-state circuits, Vol. 31, issue 11, November 1996, page(s):1723-1732
- [8] T. D. Loveless *et al.*, IEEE transactions on nuclear science, Vol. 53, no. 6, 2006
- [9] T. D. Loveless *et al.*, IEEE transactions on nuclear science, Vol. 54, no. 6, 2007
- [10] W. Chen *et al.*, IEEE transactions on nuclear science, Vol. 50, no. 6, 2003
- [11] Y. Boulghassoul *et al.*, IEEE transactions on nuclear science, Vol. 52, no. 6, 2005
- [12] Z. Cao *et al.*, IEEE journal of solid-state circuits, Vol. 43, no. 9, 2008
- [13] T. Toifl, P. Moreira and A. Marchioro, Proceedings of the Sixth Workshop on Electronics for LHC Experiments, Cracow, Poland, 11-15 Sept. 2000, pp. 226-30

### Detector Control System for the Electromagnetic Calorimeter of the CMS experiment

P. Adzic<sup>1</sup>, A. Brett<sup>7</sup>, D. Di Calafiori<sup>2</sup>, F. Cavallari<sup>3</sup>, Wan-Ting Chen<sup>9</sup>, G. Dissertori<sup>2</sup>, Apollo Go<sup>9</sup>, R. Gomez-Reino<sup>4</sup>, A. Inyakin<sup>5</sup>, D. Jovanovic<sup>6,1</sup>, G. Leshev<sup>8,2</sup>, A. Singovski<sup>5</sup>, Syue-Wei Li<sup>9</sup>, E. Di Marco<sup>3</sup>, P. Milenovic<sup>2,1</sup>, X. Pons<sup>4</sup>, T. Punz<sup>2</sup>, J. Puzovic<sup>6,1</sup>, S. Zelepukin<sup>8,2</sup>

<sup>1</sup>VINCA Institute, Belgrade, Serbia, <sup>2</sup>ETH Zurich, Switzerland, <sup>3</sup>INFN, Rome, Italy, <sup>4</sup>CERN, Geneva, Switzerland, <sup>5</sup>University of Minnesota, USA, <sup>6</sup>University of Belgrade, Serbia, <sup>7</sup>Fermi National Accelerator Lab., <sup>8</sup>University of Wisconsin, Madison, <sup>9</sup>National Central University, Taiwan

Peter.Adzic@cern.ch, Angela.Brett@cern.ch, Francesca.Cavallari@cern.ch, Diogo.Di.Calafiori@cern.ch, Wan-Ting.Chen@cern.ch, Apollo.Go@cern.ch, Guenther.Dissertori@cern.ch, Robert.Gomez-Reino.Garrido@cern.ch, Alexandre.Inyakin@cern.ch, Dragoslav.Jovanovic@cern.ch, Georgi.Leshev@cern.ch, Alexander.Singovski@cern.ch, Syue-Wei.Li@cern.ch, Emanuele.Di.Marco@cern.ch, Predrag.Milenovic@cern.ch, Xavier.Pons@cern.ch, Thomas.Punz@cern.ch, Jovan.Puzovic@cern.ch, Serguei.Zelepoukine@cern.ch

### Abstract

The Compact Muon Solenoid (CMS) is one of the general purpose particle detectors at the Large Hadron Collider (LHC) at CERN. The challenging constraints on the design of one of its sub-detectors, the Electromagnetic Calorimeter (ECAL), required the development of a complex Detector Control System (DCS). In this paper the general features of the CMS ECAL DCS during the period of commissioning and cosmic running will be presented. The feedback from the people involved was used for several upgrades of the system in order to achieve a robust, flexible and stable control system. A description of the newly implemented features for the CMS ECAL DCS subsystems will be given as well.

#### I. INTRODUCTION

CMS construction work has been finalised at Point 5 near Cessy (France). One of the most accurate, distinctive and important detector systems of the CMS experiment is the high precision Electromagnetic Calorimeter (ECAL). It will provide measurements of electrons and photons with an excellent energy resolution (better than 0.5% at energies above 100 GeV [1]), and thus will be essential in the search for new physics, in particular for the postulated Higgs boson.

The calorimeter is designed as a homogeneous hermetic detector based on 75848 lead-tungstate (PbWO4) scintillating crystals. The structure of the ECAL [1] is subdivided into three main parts: the Barrel (EB), the End-cap (EE) and the Preshower (ES). Avalanche Photo Diodes (APD) and Vacuum Phototriodes (VPT) are used as photodetectors in the barrel part and in the end-cap parts of the detector, respectively [1]. The barrel consists of 36 supermodules (SM) forming a cylinder around the interaction point. The EEs are the structures which close both ends of this cylinder, and each of them is formed by two half disks named DEEs. The ES follows the EE's shape and is placed in front of it. All these components and the front-end (FE) readout electronics inside the ECAL satisfy rigorous design requirements in terms of their response time, signal-to-noise ratio, immunity to high values of the magnetic field induction (up to 3.8 T in the barrel part of the ECAL) as well as in terms of radiation tolerance (expected equivalent doses of up to 5 kGy and neutron fluence of up to 10<sup>12</sup> neutrons/cm<sup>2</sup>) [1]. However, it has been shown that the light yield of PbWO4 crystals and the amplification of the APDs are highly sensitive to temperature and bias voltage fluctuations [2, 3]. Therefore, the usage of these components has directly imposed challenging constraints on the design of the ECAL, such as the need for rigorous temperature and high voltage stability. At the same time, possible changes in the crystal transparency, which can be induced by the radiation, imposed additional requirements for monitoring of the crystal transparency [1]. For all these reasons specific ECAL DCS sub-systems had to be designed.

The implemented ECAL DCS consists of both hardware systems and controls applications [4] (Figure 1). Its monitoring hardware consists of the ECAL Safety System (ESS) and the Precision Temperature and Humidity monitoring (PTHM).



Figure 1: CMS ECAL DCS block diagram (simplified)

The ECAL DCS applications are responsible for the control of systems which provide necessary services for the ECAL. These include: Supervisor, Low Voltage (LV), High Voltage (HV), ESS Air Temperatures, ESS, PTHM, Detector Control Unit (DCU) Monitoring, Cooling Monitoring, Laser Monitoring, ECAL VME Crates Control and the ES Control and Monitoring.

### II. HIGH VOLTAGE AND LOW VOLTAGE SYSTEMS

The APDs require a power supply system with a bias voltage stability of the order of a few tens of mV. For this reason, a custom HV power supply system has been designed for the CMS ECAL in collaboration with the CAEN Company [5]. The system is based on a standard control crate (SY1527) hosting eight boards especially designed for this application (A1520PE). Up to nine channels can be hosted on a single A1520PE board and each channel can give a bias voltage of up to 500 V with a maximum current of 15 mA. The operating APD gain of 50 requires a voltage between 340 and 430 V. In total, there are 18 crates and 144 boards for the barrel. The SY1527 crate communicates with a board controller via an internal bus and is operated by the ECAL DCS via an OPC server.
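The barrel HV inventory follows directly from the crate and board counts. The sketch below, with hypothetical names, reproduces the board total and gives the upper bound on barrel channels implied by "up to nine channels" per board; the number of channels actually in use is not stated in this paragraph, so the result is a capacity, not a count of active channels. It also checks a requested bias against the quoted APD operating window.

```python
BARREL_CRATES = 18
BOARDS_PER_CRATE = 8
MAX_CHANNELS_PER_BOARD = 9          # "up to nine channels" per A1520PE board

APD_BIAS_RANGE_V = (340.0, 430.0)   # window quoted for an APD gain of 50

barrel_boards = BARREL_CRATES * BOARDS_PER_CRATE                 # 144 boards
barrel_channel_capacity = barrel_boards * MAX_CHANNELS_PER_BOARD


def apd_bias_in_window(voltage_v: float) -> bool:
    """True if a requested APD bias lies inside the quoted operating window."""
    low, high = APD_BIAS_RANGE_V
    return low <= voltage_v <= high
```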

In the endcaps, by default all VPTs are operated at anode and dynode voltages of 800 and 600 V, respectively. The VPTs require a stability of the bias voltages of about 10 V. The HV system is based on standard CAEN control crates (SY1527) each hosting two off-the-shelf HV boards (A1735P). Up to six pairs of channels can be hosted on a single A1735P board and each channel can give a bias voltage of up to 1500 V with a maximum current of 7 mA. There is 1 crate for each of the 2 endcaps. The power supplies are complemented by a custom-designed 84-way distribution system [Rutherford Appleton Laboratory, DEG 547/548] which incorporates additional protection circuitry and a clean method to operate each of the 84 channels at one of three different pairs of bias voltages.

The ECAL digitization electronics located on the very front-end (VFE) electronics cards also require a very stable low voltage to maintain constant signal amplification. The system uses low voltage regulators that guarantee this stability. The power is supplied by the LV system, which is based on multichannel MARATON LV power supplies (PS) from Wiener [6]. Two types of LV PS are used: a type with six channels of 8V/110A (660 W) and a type with five channels of 8V/110A (660 W) and two channels of 8V/55A (330 W). In total there are 108 PS for the ECAL barrel and 28 PS for the ECAL end-cap. All the LV PS are water-cooled and operated by three ECAL DCS PCs via CAN-bus and an OPC server.

### **III. COOLING SYSTEM**

The ECAL Cooling system employs water flow to stabilise the detector at 18 °C within 0.05 °C. Each supermodule and each end-cap is independently supplied with water at 18 °C. The water runs through a thermal screen placed in front of the crystals, which thermally decouples them from the silicon tracker, and through pipes embedded in the aluminium grid in front of the electronics compartments. Regulation of the water temperature and the water flow, as well as the opening of valves, is performed by a dedicated Siemens PLC system. This system is operated by a PC via an S7 connection and monitored by the ECAL DCS.

### IV. PRECISION TEMPERATURE AND HUMIDITY MONITORING (PTHM)

The purpose of the temperature monitoring system is to provide precision temperature measurements and to monitor the stability of the temperature distribution in the environment of the ECAL crystals and photo-detectors. In addition, it should provide archiving of the temperature distribution history for the use in the ECAL data processing.

In order to provide this functionality, 360 high quality NTC thermistors [7] with very good long-term stability are installed in the ECAL supermodules and 80 more are installed in the ECAL end-cap Dees. Sensors are individually precalibrated by the manufacturer and then tested and sorted in the lab to ensure a relative precision better than 0.01 °C.
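Reading an NTC thermistor requires converting a measured resistance into a temperature. The sketch below uses the simple Beta model as an illustration only. The conversion actually used in the PTHM is not described here and would rely on the per-sensor calibration; the nominal resistance of 100 kΩ at 25 °C and the Beta value of 3950 K are assumptions chosen for the example.

```python
import math

T0_K = 298.15  # 25 degC reference temperature in kelvin


def ntc_temperature_c(
    resistance_ohm: float,
    r25_ohm: float = 100e3,   # ASSUMPTION: nominal resistance at 25 degC
    beta_k: float = 3950.0,   # ASSUMPTION: hypothetical Beta constant
) -> float:
    """Beta-model conversion: 1/T = 1/T0 + ln(R/R25) / Beta."""
    inv_t = 1.0 / T0_K + math.log(resistance_ohm / r25_ohm) / beta_k
    return 1.0 / inv_t - 273.15


t_nominal = ntc_temperature_c(100e3)   # nominal resistance maps to 25 degC
t_cooler = ntc_temperature_c(140e3)    # higher resistance means lower temperature
```

The steep resistance change of an NTC near room temperature is what makes a relative precision at the 0.01 °C level attainable once each sensor is individually calibrated.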

The purpose of the humidity monitoring system is to measure the relative humidity (RH) of the air inside the ECAL electronics compartments and to provide early warnings about high humidity conditions that may potentially lead to water condensation inside the detector. There are 176 HM sensors with 5-7% RH precision [8] placed inside the ECAL.
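One way to turn a relative humidity reading into a condensation warning is to compare the dew point of the air with the temperature of the coldest nearby surface. The sketch below uses the Magnus approximation for the dew point; it is an illustration and not the alarm logic of the PTHM application. The function names and the 2 °C safety margin are hypothetical.

```python
import math


def dew_point_c(air_temp_c: float, rh_percent: float,
                b: float = 17.62, c: float = 243.12) -> float:
    """Dew point (degC) from the Magnus approximation over water."""
    gamma = math.log(rh_percent / 100.0) + b * air_temp_c / (c + air_temp_c)
    return c * gamma / (b - gamma)


def condensation_risk(surface_temp_c: float, air_temp_c: float,
                      rh_percent: float, margin_c: float = 2.0) -> bool:
    """True if the surface is within margin_c of the dew point or below it."""
    return surface_temp_c <= dew_point_c(air_temp_c, rh_percent) + margin_c


# Example: an 18 degC cooled surface in 25 degC air.
dry_air = condensation_risk(18.0, 25.0, 50.0)
humid_air = condensation_risk(18.0, 25.0, 70.0)
```

With a sensor precision of 5-7 % RH, the margin would in practice need to cover the corresponding dew-point uncertainty.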

The readout system of the PTHM system is based on ELMB modules designed by the ATLAS experiment [9] (Figure 2).



Figure 2: ELMB and PTHM electronic boards with ELMB.

Samples of both the temperature and the humidity sensors were tested for their capability to work in the high radiation levels and strong magnetic field that will be present in the ECAL region of CMS. The sensors were shown to maintain their operational parameters unchanged over the expected running lifetime of the ECAL.

After the raw sensor signals are digitized with the ELMB's ADC, the data are sent by the ELMB's microcontroller via CAN bus to the DCS PC hosting the PTHM application, which is located in the CMS service cavern (USC). All ELMBs located within the crates inside one rack are connected to a single multi-point CAN bus.

The performance of the PTHM readout system in terms of resolution and noise levels has proved to be outstanding. Temperature fluctuations from the noise introduced in the system are of the order of 0.001 °C in the range of 18 - 22 °C.

### V. ECAL SAFETY SYSTEM (ESS)

The purpose of the ESS [4] is to monitor the air temperature of the ECAL electronics environment (expected to be in the range of 25 - 32 °C), to monitor water leakage sensors routed inside the electronics compartments, to control the proper functioning of the ECAL Cooling and LV Cooling systems and to automatically perform safety actions and generate interlocks in case of any alarm situation.

In order to achieve these goals, 352 EPCOS NTC thermistors [10] are positioned in redundant pairs at the centre of each module of the ECAL barrel supermodules and at four locations inside each quadrant of the ECAL end-cap Dees. In accordance with the design objectives, the ESS temperature sensors are calibrated to a precision of 0.1 °C. The water leakage detection is based on commercial water leakage sensor cables provided by RLE Technology [11].

The temperature and water leakage sensors of the ESS are read out by the front-end part of the readout system, which comprises 12 ESS Readout Units (RU) located in the CMS experimental cavern. Each ESS RU represents an electrically and logically independent entity that can support up to four supermodules or up to two end-cap Dees.

In order to provide a reliable and robust readout system, the ESS RUs have been designed in a completely redundant way. Each redundant part of one RU is equipped with a RS485 interface and based on a Microchip PIC microcontroller and a so-called RBFEMUX block of electronics. This block of electronics inside the ESS RU provides intelligent sensor information multiplexing, as well as the digital implementation of a resistance bridge (RBFE) for removal of different readout signal dependencies on voltage offsets, thermocouple effects, power supply and ambient temperature drifts etc. Information from the temperature sensors from four input ports of one RU is mixed between its two redundant parts in a way which minimizes the possibility of losing temperature information inside the ECAL due to malfunctioning of an ESS RU component.

The part of the system where sensor information is processed and interlocks are accepted/generated is based on the industrial Siemens Programmable Logic Controllers (PLCs). The ESS PLC system has been designed and built as a redundant and distributed set of modules from S7-400 and S7-300 families. Since one of the main objectives of the ESS is a very high degree of reliability, a specific ESS multi-point communication protocol that provides reliable information exchange between ESS RUs and ESS PLC also had to be designed.

Both ESS sensors and electronics of ESS RUs were tested for radiation tolerance to appropriate doses and showed no shift in any parameter, while the cross section for single-event effects was proven to be negligible [12].

The ESS performance has been tested during the ECAL integration and test-beam periods in 2006 and 2007, as well as during the ECAL commissioning in 2008 and 2009. The system has shown the expected reliability. At the same time, its temperature readout system has been shown to have a relative precision better than 0.02 °C.

#### VI. ECAL DCS SOFTWARE

ECAL DCS applications have been developed using the commercial ETM SCADA (Supervisory Control And Data Acquisition) software PVSS [13]. The version currently used is 3.8, on top of which the CERN IT Controls group has added Joint Control Project (JCOP) framework components [14]. The ECAL DCS is implemented using these technologies and is now integrated with the central CMS DCS within the Finite State Machine framework, which provides hierarchical control and monitoring of CMS and ECAL.

### A. The Supervisor

The Supervisor has been designed to be connected to all ECAL DCS subsystems and to centralize the control and the monitoring of all interactions between them. From the main Supervisor panel, the operator can monitor the status of all subsystems, instantly find the source of possible problems, issue commands to the LV, HV and ECAL VME Crates Control subsystems and manually shutdown the whole detector or parts of it in case of any problem.

The Supervisor application also handles the automatic controlled shutdown of the detector's partitions, with granularity at the level of one SM/DEE. This mechanism follows a very simple logic (Fig. 3): the shutdown of the concerned partition is triggered if any of the subsystems/applications which monitor the detector's conditions (such as air temperature, water temperature and humidity) changes into the ERROR state.



Figure 3: Automatic shutdown logic implemented in the ECAL DCS software framework.
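The shutdown rule amounts to a logical OR over the monitoring subsystems of each partition. The sketch below is a plain Python illustration of that rule, not PVSS/JCOP code; the subsystem keys, the partition names and the function name are hypothetical, and only the ERROR state name is taken from the text.

```python
from typing import Dict, List

# ASSUMPTION: illustrative keys for the monitoring subsystems named in the text.
MONITORED = ("air_temperature", "water_temperature", "humidity")


def partitions_to_shut_down(states: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the partitions (one SM/DEE each) whose shutdown must be triggered.

    A partition is selected as soon as any monitored subsystem reports ERROR.
    """
    return sorted(
        partition
        for partition, subsystems in states.items()
        if any(subsystems.get(name) == "ERROR" for name in MONITORED)
    )


example_states = {
    "EB+01": {"air_temperature": "OK", "water_temperature": "OK", "humidity": "OK"},
    "EB+02": {"air_temperature": "OK", "water_temperature": "ERROR", "humidity": "OK"},
    "EE-Far": {"air_temperature": "ERROR", "water_temperature": "OK", "humidity": "OK"},
}
affected = partitions_to_shut_down(example_states)
```

Evaluating the rule per partition is what gives the SM/DEE granularity: a fault in one supermodule leaves the others powered.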

### B. Low voltage

The full application runs on three separate computers because of the limit on the number of CAN branches per KVASER CAN adapter and in order to reduce the load per CPU, as the WIENER OPC server is rather resource intensive. A dedicated mechanism cross-checks the desired inhibit pattern against the inhibit configuration actually loaded into the crate.

### C. High Voltage

The control and monitoring of all the 1240 CAEN high voltage channels are handled by this application [1,3], which runs on four separate computers in order to reduce the load per CPU.

Because of the specific properties of the crystals used for the barrel part of the ECAL, a unique voltage should be set to each of the APDs. This functionality has been implemented in the controls software for the HV subsystem.

#### D. Safety System

The experience acquired during operations showed the need for separating the part of the application that is used for the control system from the part that represents the safety system itself. As a consequence, the "SM/DEE Air Temperatures" subsystem was created. It includes the ESS sensor information only. Its error conditions are used as a trigger for the DCS automatic controlled shutdown (via software) of the concerned partitions.

### E. Precision Temperature and Humidity Monitoring

It is fully implemented under the ECAL Supervisor. The structure of the software application was optimized several times during the detector's running period. The final software solution is used to trigger automatic shutdowns on the Supervisor's level in case of abnormal situations.

### F. Detector Control Unit Monitoring

The DCU monitoring application was re-designed in order to provide DCU data as information to the shifter, without any automatic shutdown action in case of abnormal readings.

### G. Cooling Monitoring

This application [3] only monitors all the relevant data of the ECAL SM/DEE cooling system which are provided by the dedicated system. The Cooling Monitoring application is configured to trigger an automatic software shutdown of a specific detector partition before the ESS takes any action based on the cooling water temperature.

### H. Additional Software Applications

There are several specific applications which were integrated under the ECAL Supervisor. These are:

### 1) ECAL Preshower

This part of the controls software was installed in August 2009, just before the start of the CMS global cosmics run.

### 2) ECAL DCS Laser Monitoring

It displays the relevant information which is sent by the Laser Control System.

### 3) ECAL VME Crates Control

Service implemented by the central CMS DCS and integrated under the ECAL DCS Supervisor as a tool, which provides remote control of the power of ECAL EB/EE VME crates.

### VII. ECAL DCS OPERATIONAL EXPERIENCE

The period of commissioning and cosmic running was efficiently used to test the ECAL DCS hardware in the CMS environment, as well as all its interfaces to other systems. A permanent ECAL DCS expert on-call service was provided during the detector's whole running period.

All shutdowns triggered by the CMS Safety System (DSS) and by the Magnet Safety System (MSS) were always correctly performed by the ESS.

The automatic software shutdown mechanism has proven to be very efficient. The most common triggers for such shutdowns were failures of the CMS primary cooling circuit. In all of these situations the ECAL DCS has smoothly switched off the detector power before any action of the ESS was necessary.

The ECAL DCS software components were constantly upgraded in order to fulfil all relevant user requests and consequently to move towards an optimal system. The CMS ECAL DCS has reached a fully operational and stable configuration. From the operational point of view the system can be considered ready for the LHC startup, which is foreseen for November 2009.

### VIII. REFERENCES

[1] CMS Collaboration, "The Electromagnetic Calorimeter Technical Design Report", CERN/LHCC 97-33 (1997).

[2] A.A. Annenkov, P. Lecoq and M.V. Korzhik: "Lead Tungstate scintillation material", NIM A490 (2002) 30.

[3] Z. Antunovic et al.: "Radiation hard avalanche photodiodes for the CMS detector", NIM A537 (2005) 379.

[4] P. Adzic et al., "The Detector Control System for the Electromagnetic Calorimeter of the CMS Experiment at LHC", International Conference on Accelerator and Large Experimental Physics Control Systems, Knoxville, Tennessee, USA, 15 – 19 Oct 2007, pp.190-192

[5] A. Bartoloni, "The power supply system for the CMS-ECAL APDs", Proceedings of 7<sup>th</sup> Workshop on Electronics for LHC Experiments LEB 2001, Stockholm, Sweden, CERN-2001-005 (2001), 358.

[6] WIENER MARATON highest density Power Supply for Hostile Environment, http://www.wiener-d.com/M/22/51.html

[7] NTC thermistors 100K61A from Betatherm, http://www.betatherm.com/

[8] RH sensors UPS-600 from Ohmic instruments, http://www.cweb5.com/ohmic/

[9] B. Hallgren et al., The Embedded Local Monitor Board (ELMB) in the LHC Front-end I/O Control System, presented at the 7th Workshop on Electronics for LHC Experiments, Stockholm, Sweden, September 2001.

[10] EPCOS AG, NTC thermistor 470Ω@25°C, ordering code B57211V2471J060, http://www.epcos.com/

[11] RLE's patented Water Leak Detection Cable (SC), http://www.rletech.com/products/cable.html

[12] P. Milenovic et al., "Performance of the CMS ECAL safety system for Super Modules SM0 and SM1", Nucl. Instrum. Meth. A **554** (2005) 427.

[13] ETM professional control: Prozess Visualisierungs und Steuerungs System SCADA tool, http://itcobe.web.cern.ch/itcobe/Services/Pvss/welcome.html.

[14] M. Beharrell et al., Technology Integration in the LHC experiments' Joint COntrols Project, CHEP'01, Beijing, September 2001. http://cern.ch/itco/Projects-Services/JCOP/
# An integrated DC-DC step-up charge pump and step-down converter in 130 nm technology

M. Bochenek<sup>a,b</sup>, W. Dabrowski<sup>b</sup>, F. Faccio<sup>a</sup>, J. Kaplon<sup>a</sup>

<sup>a</sup> CERN, 1211 Geneva 23, Switzerland

<sup>b</sup> Faculty of Physics and Applied Computer Science, AGH-UST, Krakow, Poland


### Abstract

After the LHC luminosity upgrade the number of readout channels in the ATLAS Inner Detector will be increased by one order of magnitude and delivering the power to the front-end electronics as well as cooling will become a critical system issue. Therefore a new solution for powering the readout electronics has to be worked out. Two main approaches for the power distribution are under development, the serial powering of a chain of modules and the parallel powering with a DC-DC conversion stage on the detector. In both cases switched-capacitor converters in the CMOS front-end chips will be used. In the paper we present the design study of a step-up charge pump and a step-down converter. In optimized designs power efficiency of 85 % for the step-up converter and 92 % for the step-down converter has been achieved.

# I. INTRODUCTION

The present design of the upgraded ATLAS Inner Detector assumes about 10 times more silicon strips than the present Semiconductor Tracker. Although the power consumption per channel is expected to be reduced significantly, the supply current will be reduced to a lesser degree, so delivering power to the front-end chips is a major challenge. Two main approaches for power distribution are under development: serial powering of a chain of modules, and independent powering of modules from DC-DC converters located on the module. In either case switched-capacitor converters in the front-end chips will be used.

In the serial powering scheme, the clean 1.2 V supply voltage for the analogue part of the front-end chip must be produced from the 0.9 V digital power supply obtained from a shunt regulator. Therefore a linear voltage regulator must follow the step-up converter. Since the current consumed by the analogue part is constant and moderate (max. 30 mA), the optimization of the converter focuses on minimizing the output ripple. In this case, the output impedance and power efficiency are not of primary importance.

In the second possible scheme the digital part of the front-end electronics will be supplied directly from the on-chip DC-DC step-down converter providing 0.9 V. Because the digital current consumption varies strongly during chip operation, and because the digital current is a substantial part of the total chip current, the main parameters to be optimized are the power efficiency and the output impedance.

# II. REALIZATION

# *A. Step-up DC-DC converter*

#### 1) Architecture and principle of the voltage pump

The developed step-up DC-DC converter is based on the concept of the voltage doubler proposed by P. Favrat et al. [1] and Y. Moisiadis et al. [2]. It consists of four building blocks: a non-overlapping clock generator, buffers, level shifters and a voltage doubler.

The core of the circuit is the voltage doubler, shown in Fig. 1. It consists of two cross-coupled, low  $V_t$  NMOS transistors (M1 and M2), four PMOS transistors with thick gate oxide working as serial switches (M3-M6), three external SMD capacitors  $(C_{\rm PUMPX}$  and  $C_{\rm HOLD})$  and one small capacitance integrated on the chip  $(C_{\rm POL})$ . For the simulation purposes a load resistance  $(R_{\rm LOAD})$  of 60  $\Omega$  was added at the output of the converter. An equivalent series resistance of SMD capacitors of 50 m $\Omega$  was taken into consideration as well. The 470 nF value of the external capacitors  $C_{\rm PUMPX}$  and  $C_{\rm HOLD}$  has been chosen making a compromise between the capacitance value and the size (0603).

Except for the level shifters, which operate from the 1.6 V output voltage, the circuit is supplied with 0.9 V. The nominal output current of the charge pump is specified to be around 25 mA. The output voltage obtained for this current is 1.6 V. The calculated power efficiency is around 84 % for an optimized clock frequency of 500 kHz.
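As a rough cross-check of these numbers, the sketch below computes the efficiency ceiling of an idealized voltage doubler. It assumes that charge conservation forces the average input current to be twice the output current and that gate-drive and switching losses are neglected; the function name is illustrative and not taken from the design. With the quoted 0.9 V input and 1.6 V output the ceiling is about 89 %, so the simulated 84 % to 85 % is plausibly a few percent below it.

```python
def ideal_doubler_efficiency(v_in: float, v_out: float) -> float:
    """Efficiency ceiling of an idealized 1:2 charge pump.

    Assumption: the average input current is twice the output current
    (charge conservation), so
        eta = (v_out * i_out) / (v_in * 2 * i_out) = v_out / (2 * v_in).
    Gate-drive and switching losses are ignored.
    """
    if v_in <= 0.0:
        raise ValueError("v_in must be positive")
    return v_out / (2.0 * v_in)


V_IN = 0.9   # V, supply of the doubler (value from the text)
V_OUT = 1.6  # V, output at the nominal 25 mA load (value from the text)

bound = ideal_doubler_efficiency(V_IN, V_OUT)
print(f"conduction-limited efficiency bound: {bound:.1%}")
```

The bound depends only on the voltage droop, which is why a lower output impedance translates directly into a higher attainable efficiency.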



Figure 1: Schematic diagram of the voltage doubler.

Although the power efficiency is not of primary importance, the circuit has been optimized in order to obtain the highest possible efficiency for its nominal output current. Special attention was paid to optimize the W/L ratio of serial PMOS (M3 and M4) switches which can have high impact on the resistive losses in the converter. In order to decrease the  $R_{ON}$  of the switches their length was set to minimum value available in IBM 0.13  $\mu$ m technology, which is 240 nm for thick gate oxide PMOS transistor (120 nm for low V<sub>t</sub> NMOS transistor with thin oxide). The width of the switches was optimized using Spectre simulations.

For further improvement of the power efficiency an auxiliary charge pump was added to the main voltage doubler. This charge pump consists of two PMOS transistors M5 and M6 and integrated capacitance  $C_{POL}$ . It shares two external capacitances ( $C_{PUMPX}$ ) with the main voltage doubler as well. Transistors M5 and M6 help to eliminate the effects of vertical bipolar parasitic structures by binding n-wells of main serial switches (M3 and M4) to the high potential. The auxiliary charge pump works without the resistive load which results in its high power efficiency.

The principle of voltage pumping is the following. When the CLK signal is high ( $\overline{\text{CLK}}$ is low) transistor M1 is turned off and M2 is turned on. At the same time M3 is turned off and M4 is turned on. Thus the top plate of the capacitor $C_{\text{PUMP}}$ is charged to the supply voltage $V_{\text{IN}}$. At the same time capacitors $C_{\text{PUMP}}$ and $C_{\text{HOLD}}$ are connected in parallel. During the second phase (CLK low and $\overline{\text{CLK}}$ high) the bottom plate of $C_{\text{PUMP}}$ remains at $V_{\text{IN}}$ while the charge $V_{\text{IN}}C_{\text{PUMP}}$ from the previous phase is still stored on $C_{\text{PUMP}}$. This charge is then transferred to the output capacitance ( $C_{\text{HOLD}}$).

#### 2) Level shifting

Because of the poor driving capability of the large PMOS serial switches (M3 and M4), two level shifters are needed. Fig. 2 shows the schematic of such a level shifter, which requires two supply domains: the input supply (0.9 V) and the output supply (1.6 V) taken from the output of the charge pump. This architecture was proposed by J. Rocha et al. [3] and Q. A. Khan et al. [4]. The circuit shifts the high state of CLK<sub>IN</sub> from V<sub>IN</sub> (0.9 V) to V<sub>OUT</sub> (1.6 V). Each level shifter consists of eight transistors. All of them, apart from the two transistors used in the inverter, are MOS transistors with thicker gate oxide. PMOS transistors M7 and M9 are added to increase the speed of the circuit. In order to reduce current injection into the bulk, triple-well NMOS transistors are used.



Figure 2: Schematic diagram of the level shifter.

#### 3) Non-overlapping clock generator and buffers.

The clock generator shown in Fig. 3 is a modification of the circuit proposed by L. Pylarinos [5]. To obtain good power efficiency it is very important to ensure that the driving clock signals do not overlap. This is achieved by using current-starved inverters, with the current limit set in this case to 120 $\mu$A. Schmitt inverters are also used. The capacitors shown in Fig. 3 are integrated. Their capacitance of 1 pF is sufficient to separate the clock signals.
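The non-overlap requirement can be illustrated with a small behavioural model. This is only a timing sketch, not a model of the current-starved inverter circuit: the 20 ns dead time is a hypothetical value chosen for illustration, while the 2000 ns period corresponds to the 500 kHz clock quoted for the charge pump.

```python
def non_overlapping_phases(period_ns: int, dead_ns: int, n_samples: int):
    """Return two 0/1 sample lists (one sample per ns) that are never both high.

    phi1 is high in [dead_ns, period/2) and phi2 in [period/2 + dead_ns, period),
    so each rising edge is delayed by dead_ns after the other phase has fallen.
    """
    if not 0 <= dead_ns < period_ns // 2:
        raise ValueError("dead time must be shorter than half a period")
    half = period_ns // 2
    phi1, phi2 = [], []
    for t_abs in range(n_samples):
        t = t_abs % period_ns
        phi1.append(1 if dead_ns <= t < half else 0)
        phi2.append(1 if half + dead_ns <= t < period_ns else 0)
    return phi1, phi2


PERIOD_NS = 2000  # 500 kHz clock (from the text)
DEAD_NS = 20      # hypothetical dead time, for illustration only

phi1, phi2 = non_overlapping_phases(PERIOD_NS, DEAD_NS, n_samples=2 * PERIOD_NS)
overlap = any(a and b for a, b in zip(phi1, phi2))
print("phases overlap:", overlap)
```

In the real circuit the dead time is set by the starved-inverter current and the 1 pF capacitors rather than by an explicit parameter.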

The clock signals are additionally buffered to drive the large switching transistors efficiently. Each buffer consists of a chain of seven scaled inverters. Triple-well NMOS transistors are used in the last five inverters.



Figure 3: Schematic diagram of the non-overlapping clock generator.

## B. Step-down DC-DC converter

#### 1) Architecture and principle of operation

The core of the DC-DC step-down converter [6], [7], [8] is shown in Fig. 4. It is built of four stacked transistors (one PMOS and three NMOS) and three external SMD capacitors with low ESR. The whole circuit is supplied with 2.0 V. The nominal output current is specified to be around 60 mA. The output voltage obtained for this current is in the range of 920 mV. The power efficiency obtained for the nominal current is up to 92 %, but the converter can operate at 100 mA with high power efficiency, even up to 87 %. All CMOS devices used in the design are transistors with thicker gate oxide (5.2 nm) allowing the maximum supply voltage of 2.5 V.



Figure 4: Schematic diagram of the switched-capacitor DC-DC step-down converter.

The optimization of the power efficiency of the converter was performed in the following steps. First, the W/L ratio of all switches was optimized. The main goal was to reduce the resistance of the switches by using the minimum transistor channel length allowed in the IBM 0.13 $\mu$m technology. For MOS transistors working with supply voltage up to 2.7 V the minimum length is 240 nm. In order to reduce the bulk effect and to obtain better power efficiency, triple-well NMOS transistors were used. This solution also allowed us to reduce transistor dimensions. The transistors used in the design are relatively large: the W/L ratio is 8000 $\mu$m/0.24 $\mu$m for M1, 4000 $\mu$m/0.24 $\mu$m for M2 and M3, and 2000 $\mu$m/0.24 $\mu$m for M4.

The principle of circuit operation is the following [9]. When CLK is high ( $\overline{\rm CLK}$  is low) transistors M1, M3 are turned off and M2, M4 are turned on. Load current charges the output capacitor C . Simultaneously it discharges parallel capacitors C and C<sub>X</sub>. In the second phase, when CLK is low ( $\overline{\rm CLK}$  is high) switches M1, M3 are off and M2, M4 are on. The top capacitor C is connected in parallel with flying capacitors C<sub>X</sub>. It means that load current charges C and C<sub>X</sub> while discharging the bottom capacitance C .

#### 2) Level shifting, buffers and clock generator

Due to the better driving capability of switches, working with lower output voltage, there is no need to use additional level shifters.

Chains of scaled inverters were used as buffers in this design as well. Each buffer consists of four inverters. As in the step-up DC-DC charge pump design, triple-well NMOS transistors are used in the inverters. This significantly reduces the current injected into the bulk. The W/L ratio of the PMOS transistor used in the last inverter is 400 $\mu$m/0.24 $\mu$m, because the buffers do not drive the large external SMD capacitors but only the internal gate capacitance of the CMOS switches.

A very simple clock generator proposed by L. Pylarinos [5] is used in the design.

# **III. SIMULATION RESULTS**

#### A. Step-up charge pump.

The performance of the charge pump was simulated and some of the results are shown in Fig. 5. For the input voltage (dotted line) ramped from 0 V up to 0.9 V within 50  $\mu$ s the output voltage (solid line) reaches its nominal value of 1.6 V after 70  $\mu$ s. In Fig. 5(b) output voltage ripple is shown. The output ripple is less than 15 mV p-p, which is acceptable assuming that a linear regulator will follow the charge pump.

The simulations were performed for several clock frequencies. The results are shown in Fig. 6. The efficiency is relatively flat for clock frequencies from 150 kHz to 500 kHz; however, for lower frequencies the output voltage ripple is higher. At 500 kHz the efficiency is still a satisfactory 85 % and the ripple is below 15 mV p-p.

Power efficiency is strongly dependent on the output current. This dependence is shown in Fig. 7(a). As mentioned before, the circuit was optimized to obtain good power efficiency for the nominal output current of 25 mA. For currents higher than 25 mA the power efficiency decreases rapidly due to losses on the resistance of serial PMOS switches M5 and M6 (Fig. 1). The plot shown in Fig. 7(b) indicates a strong dependence of the output voltage on the output current. From this chart one can easily calculate the output impedance of the designed step-up charge pump, which is about 8 $\Omega$.
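The quoted output impedance can be cross-checked from the voltage droop alone. The sketch below assumes that the unloaded doubler would deliver the ideal value of twice the input voltage, which is an idealization and not a figure from the simulations; the helper names are illustrative. The same linear model then gives a rough estimate of the output voltage at other load currents.

```python
def droop_output_impedance(v_no_load: float, v_loaded: float, i_load: float) -> float:
    """Equivalent output resistance estimated from the droop under load."""
    if i_load <= 0.0:
        raise ValueError("i_load must be positive")
    return (v_no_load - v_loaded) / i_load


def loaded_output_voltage(v_no_load: float, r_out: float, i_load: float) -> float:
    """Output voltage predicted by a linear (Thevenin) source model."""
    return v_no_load - r_out * i_load


V_IN = 0.9               # V, from the text
V_NO_LOAD = 2.0 * V_IN   # V, assumption: ideal doubler at zero load

r_out = droop_output_impedance(V_NO_LOAD, v_loaded=1.6, i_load=25e-3)
v_at_30ma = loaded_output_voltage(V_NO_LOAD, r_out, i_load=30e-3)

print(f"estimated output impedance: {r_out:.1f} ohm")
print(f"estimated output at 30 mA:  {v_at_30ma:.2f} V")
```

The estimate agrees with the value of about 8 Ω read from Fig. 7(b); the 30 mA case corresponds to the maximum analogue current mentioned in the introduction.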



Figure 5: Simulation of step-up response for voltage doubler.



Figure 6: Power efficiency versus clock frequency.



Figure 7: Power efficiency versus output current and output voltage versus output current.

The corner analysis for the designed step-up charge pump was also performed. The results are shown in Fig. 8. For the typical transistor models the power efficiency obtained from Spectre simulation was as high as 86 %. For 3 $\sigma$ fast device characteristics the power efficiency reaches 87 % with an output voltage of 1.65 V. On the other hand, for 3 $\sigma$ slow device characteristics the power efficiency is still high, 84 %. These results have been obtained by proper optimization of the level shifters, which give better driving capability of the PMOS switches.



Figure 8: Results from the corner analysis of the voltage doubler (FF – fast-fast, FFS – fast-fast functional, FS – fast-slow, SF – slow-fast, SSF – slow-slow functional).

# B. Step-down DC-DC converter.

The simulation results of transient analysis for the step-down converter are shown in Fig. 9. Input voltage (dotted line) reaches its nominal voltage of 2.0 V after 5  $\mu$ s. After about 7  $\mu$ s, the output voltage (solid line) reaches 0.92 V. The output voltage ripples (shown in Fig. 9(b)) are below 10 mV p-p. The power efficiency calculated from Spectre simulations (for nominal output current of 60 mA) is as high as 92 %.



Figure 9: Simulation of step-down response for voltage divider.

The power efficiency is strongly dependent on the output current (Fig. 10). The output impedance calculated from this characteristic is about 1  $\Omega$ .
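As a plausibility check, both figures can be related to the ideal 2:1 conversion ratio. Assuming that the unloaded converter would deliver exactly $V_{\rm IN}/2$ and that the average input current is half the output current (an idealization that neglects gate-drive and switching losses), the droop at the nominal load gives

$$
R_{\rm OUT} \approx \frac{V_{\rm IN}/2 - V_{\rm OUT}}{I_{\rm OUT}} = \frac{1.0\ \mathrm{V} - 0.92\ \mathrm{V}}{60\ \mathrm{mA}} \approx 1.3\ \Omega ,
\qquad
\eta \le \frac{V_{\rm OUT} I_{\rm OUT}}{V_{\rm IN}\, I_{\rm OUT}/2} = \frac{2 \times 0.92\ \mathrm{V}}{2.0\ \mathrm{V}} = 0.92 .
$$

Both values are consistent with the output impedance of about 1 $\Omega$ and the 92 % efficiency quoted above, bearing in mind that the output voltage is only given to two significant figures.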



Figure 10: Power efficiency versus output current and output voltage versus output current.

The corner analysis for step-down converter was performed as well. Results obtained from simulations are shown in Fig. 11. The output voltage for the typical parameters is 0.92 V. Even for the 3  $\sigma$  slow device parameters, the output voltage is above 0.9 V. For typical device models the power efficiency is 92 % but in the worst case it is still above 90 %.



Figure 11: Results from the corner analysis of the voltage divider (FF – fast-fast, FFS – fast-fast functional, FS – fast-slow, SF – slow-fast, SSF – slow-slow functional).

# **IV. CONCLUSIONS**

We have developed designs of DC-DC step-up and step-down converters that are fully compatible with the 130 nm CMOS technology. A solution has been worked out for the DC-DC step-up charge pump to overcome limitations due to the low input voltage. The charge pump uses 3 external capacitors of 470 nF each. The nominal output current is 25 mA and the output voltage is 1.6 V. The power efficiency obtained from Spectre simulations is up to 85 % at a 500 kHz clock, with output ripple below 15 mV p-p.

The switched-capacitor step-down DC-DC converter is based on the classical structure and also uses 3 external capacitors of 470 nF each. The design has been optimised for a switching frequency of 1 MHz. The power efficiency for the nominal output current of 60 mA and output voltage of 0.92 V is up to 92 %.

In both cases the corner and Monte Carlo simulations were performed. Also layouts of both circuits have been prepared.

# V. ACKNOWLEDGEMENTS

This research project has been supported by a Marie Curie Initial Training Network Fellowship of the European Community's Seventh Framework Programme under contract number (PITN-GA-2008-211801-ACEOLE) and by the European Community's Seventh Framework Programme under the Grant Agreement no 212114 (SLHC-PP).

### REFERENCES

- [1] P. Favrat *et al.*, A High-Efficiency CMOS Voltage Doubler, IEEE J. Solid-State Circuits, vol. 33, pp. 410-416, March 1998.
- [2] Y. Moisiadis *et al.*, A CMOS Charge Pump for Low-Voltage Applications, The 2000 IEEE International Symposium on Circuits and Systems, vol. 5, pp. 577-580, May 2000.
- [3] J. Rocha *et al.*, High Voltage Tolerant Level Shifters and Logic Gates in Standard Low Voltage CMOS Technologies, IEEE International Symposium on Industrial Electronics, ISIE 2007, pp. 775-780, June 2007.

- [4] Q. A. Khan *et al.*, A single supply level shifter for multivoltage systems, 19th International Conference on VLSI Design, January 2006.
- [5] L. Pylarinos, Charge Pumps: An Overview, Department of Electrical and Computer Engineering, University of Toronto, Canada, 2001.
- [6] F. Zhang *et al.*, A New Design Method for High-Power High-Efficiency Switched-Capacitor DCDC Converters, IEEE Transactions on Power Electronics, vol. 23, pp. 832-840, March 2008.
- [7] K. D. T. Ngo and R. Webster, Steady-state Analysis and Design of a Switched-capacitor DC-DC Converter, IEEE Transactions on Aerospace and Electronic Systems, vol. 30, pp. 92-101, January 1994.
- [8] W. Harris and K. D. T. Ngo, Power Switched-capacitor DC-DC Converter: Analysis and Design, IEEE Transactions on Aerospace and Electronics Systems, vol. 33, pp. 386-395, April 1997.
- [9] M. Xu *et al.*, Voltage Divider and its Application in the Two-stage Power Architecture, Twenty-First Annual IEEE Applied Power Electronics Conference and Exposition, pp. 499-504, March 2006.

# Error-Free 10.7 Gb/s Digital Transmission over 2 km Optical Link Using an Ultra-Low-Voltage Electro-Optic Modulator

D. Janner<sup>a</sup>, M. Belmonte<sup>b</sup>, M. Meschini<sup>c</sup>, G. Parrini<sup>d</sup>, G. Nunzi Conti<sup>e</sup>, S. Pelli<sup>e</sup>, V. Pruneri<sup>a,f</sup>

<sup>a</sup> ICFO-Institut de Ciencies Fotoniques, Mediterranean Technology Park, 08860 Castelldefels (Barcelona), Spain

<sup>b</sup> Oclaro Inc., Via F. Fellini, 4, I-20097 San Donato Milanese, Italy

<sup>c</sup> Istituto Nazionale di Fisica Nucleare, Sezione di Firenze, via G. Sansone 1, I-50019 Sesto Fiorentino (FI) Italy

<sup>d</sup> Phys. Dep. of the University and INFN Firenze, Via G.Sansone 1, I-50019 Sesto Fiorentino (FI) Italy

<sup>e</sup> CNR-IFAC - Istituto di Fisica Applicata "Nello Carrara", via Madonna del Piano 10, I-50019 Sesto Fiorentino (FI) Italy

<sup>f</sup>ICREA - Institució Catalana de Recerca i Estudis Avançats, 08010 Barcelona, Spain

meschini@fi.infn.it

# Abstract

We demonstrate the feasibility of 10.7 Gb/s error-free (BER <  $10^{-12}$ ) optical transmission on distances up to 2 km using a recently developed ultra-low-voltage commercial Electro-Optic Modulator (EOM) that is driven by 0.6 Vpp and with an optical input power of 1 mW. Given this low voltage operation, the modulator could be driven directly from the detectors' board signals without the need of any further amplification reducing significantly the power dissipation and the material budget.

#### I. INTRODUCTION

The Large Hadron Collider (located at CERN, Geneva, CH) is foreseen to be upgraded in the future to reach an ultimate peak luminosity of $10^{35}$ cm<sup>-2</sup> s<sup>-1</sup>: this will be the so-called Super-LHC (SLHC) stage. In the SLHC scenario, the bandwidth needed for data extraction from tracking detectors will grow significantly due to the huge particle content at high repetition rates. Increasing the bandwidth of the optical link is a key factor in allowing fast data processing and reducing latency. The current trend is focused on the development of 5 Gb/s devices (10 Gb/s in perspective) as elements of the total link architecture. However, a further increase in the transmission rate could be necessary either to reduce the number of optical links per detector, leading to volume and cost reduction, or to meet the higher rate required by possible new trigger schemes. Driving voltage and power consumption are strategic features of the data link, since the required power budget must be kept low. EOMs allow CW lasers to be used as optical sources located outside the harsh radiation environment, with a positive impact on the reliability of the system and on the overall power budget of the detector.

# II. LINK CONCEPT AND MEASUREMENTS

#### A. Electro Optic Modulators

Electro-Optic Modulators are widely employed in the telecom industry and represent a standard for 10 to 40 Gb/s transmission. Recent developments on modulators aiming at low-voltage operation are reported in [1], [2]. The EOMs used in the measurements that are the subject of the present work are off-the-shelf Lithium Niobate (LN) Mach-Zehnder modulators, with an electro-optic bandwidth (-3 dB) of 12.5 GHz and a 10.7 Gb/s transmission rate. They were fabricated by Avanex (now Oclaro). LN modulators have been shown to have excellent radiation resistance [3], [4]; they are immune to high magnetic fields and can be operated safely down to -20°C, according to the manufacturer.

#### B. Experimental Setup

The measurement setup is shown schematically in figure 1. A 2 km single-mode (SM) fibre was used between modulator and receiver, a distance which far exceeds any possible application in SLHC detectors. The CW laser light reaches the modulator via a 2 m polarization-maintaining fibre (PMF), with an optical power of 1 mW at 1550 nm wavelength, while the power at the receiver input is -7 dBm.
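The two power levels quoted above fix the total optical loss between the modulator input and the receiver. The short sketch below converts between mW and dBm and computes that loss; how the loss splits between modulator insertion loss, modulation, connectors and the 2 km of fibre is not stated in the text and is not modelled here.

```python
import math


def mw_to_dbm(p_mw: float) -> float:
    """Convert optical power from milliwatts to dBm."""
    if p_mw <= 0.0:
        raise ValueError("power must be positive")
    return 10.0 * math.log10(p_mw)


def dbm_to_mw(p_dbm: float) -> float:
    """Convert optical power from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


launch_dbm = mw_to_dbm(1.0)   # 1 mW at the modulator input (from the text)
rx_dbm = -7.0                 # power at the receiver input (from the text)

total_loss_db = launch_dbm - rx_dbm
rx_mw = dbm_to_mw(rx_dbm)

print(f"total loss, modulator input to receiver: {total_loss_db:.1f} dB")
print(f"received power: {rx_mw:.2f} mW")
```

The result is a lumped 7 dB budget, corresponding to roughly 0.2 mW at the receiver.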



Figure 1: Schematic drawing of the measurement setup

The physical dimensions of the 10Gb/s modulator are (48.0x9.3x5.0) mm<sup>3</sup>, plus fibre connections (13 mm IN and 18 mm OUT). The transmission bit rate is set to 10.7 Gb/s, with a pseudo-random bit sequence (PRBS)  $2^{31}$ -1.
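For readers unfamiliar with PRBS test patterns, the sketch below generates such a sequence with a linear-feedback shift register. The text does not state which generator polynomial the test equipment used; the code assumes the conventional PRBS31 polynomial $x^{31} + x^{28} + 1$, and checks the shift-register logic on the much shorter PRBS7 pattern ($x^7 + x^6 + 1$), whose period of 127 bits can be verified quickly.

```python
def lfsr_step(state: int, n: int, m: int):
    """Advance a Fibonacci LFSR for x^n + x^m + 1 by one bit.

    Returns (new_state, output_bit).
    """
    new_bit = ((state >> (n - 1)) ^ (state >> (m - 1))) & 1
    new_state = ((state << 1) | new_bit) & ((1 << n) - 1)
    return new_state, new_bit


def prbs(n: int, m: int, n_bits: int, seed: int = 1):
    """Return the first n_bits of the PRBS defined by x^n + x^m + 1."""
    state = seed & ((1 << n) - 1)
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(n_bits):
        state, bit = lfsr_step(state, n, m)
        bits.append(bit)
    return bits


def lfsr_period(n: int, m: int, seed: int = 1) -> int:
    """Count steps until the register returns to its seed (small n only)."""
    state, steps = seed, 0
    while True:
        state, _ = lfsr_step(state, n, m)
        steps += 1
        if state == seed:
            return steps


prbs7_period = lfsr_period(7, 6)            # short pattern, used as a sanity check
first_bits = prbs(31, 28, n_bits=64)        # start of the assumed PRBS31 pattern
repeat_time_s = (2**31 - 1) / 10.7e9        # pattern length divided by the line rate

print("PRBS7 period:", prbs7_period)
print(f"PRBS31 repetition time at 10.7 Gb/s: {repeat_time_s:.3f} s")
```

At 10.7 Gb/s a pattern of $2^{31}-1$ bits repeats roughly every 0.2 s, so the link is exercised with long runs of identical bits as well as dense transitions.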

## **III. RESULTS AND DEVELOPMENTS**

# A. Results

The Bit Error Ratio (BER) was measured varying the amplitude of the RF signal driving the modulator. As shown in figure 2, a BER equal to  $10^{-12}$  can be reached already at a driving voltage of 0.6 V<sub>pp</sub> over the 50  $\Omega$  impedance of the modulator input. Such a low driving voltage may greatly help in the use of EOMs as elements of the architecture of SLHC tracking detector readout systems; this feature avoids the need of higher voltages (usually 3 to 5 V) solely dedicated to data link operation.
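Verifying a BER at the $10^{-12}$ level takes a non-trivial measurement time. The sketch below uses the standard zero-error estimate, in which observing $N$ error-free bits supports BER < $-\ln(1-CL)/N$ at confidence level $CL$. The 95 % confidence level is an assumption made for illustration; the measurement duration actually used is not given in the text.

```python
import math


def bits_for_zero_error_test(ber_limit: float, confidence: float = 0.95) -> float:
    """Error-free bits needed to claim BER < ber_limit at the given confidence."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if ber_limit <= 0.0:
        raise ValueError("ber_limit must be positive")
    return -math.log(1.0 - confidence) / ber_limit


BIT_RATE = 10.7e9    # b/s (from the text)
BER_LIMIT = 1e-12    # target BER (from the text)

n_bits = bits_for_zero_error_test(BER_LIMIT, confidence=0.95)  # assumed 95 % CL
test_time_s = n_bits / BIT_RATE

print(f"error-free bits required: {n_bits:.2e}")
print(f"measurement time: {test_time_s:.0f} s")
```

Under these assumptions each point at the $10^{-12}$ level needs several minutes of error-free transmission, while points at higher BER in figure 2 can be measured much faster.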



Figure 2: Measurement of Bit Error Ratio vs driving voltage

The above result opens up the possibility of driving the modulators directly with buffered detection board digital signals.

#### **B.** Developments and Applications

At SLHC, in a semiconductor tracking detector, the transmission data rate can be of the order of Tb/s per barrel detection layer, depending on the type of transmitted data [5]. This figure implies the use of a few hundred transmitters per layer at a digital rate of $\leq 20$ Gb/s. The use of EOMs, which can presently reach speeds of 40 Gb/s, can avoid data transmission bottlenecks.

The modulator used in these measurements has a small package size, which is directly comparable to SFP+ standard dimensions; in any case, options to further reduce overall footprint are still open and under study. In the SLHC perspective, EOMs are not meant to be used on single detectors, but rather on the serialized output of a set of detectors, in order to efficiently use the available bandwidth and reduce the number of needed data links.

Tests with 1300 nm single mode fibre, together with a study of polarization effects over the fibre lengths relevant for SLHC detectors, are already planned.

# **IV. CONCLUSIONS**

A demonstration of 10 Gb/s transmission with BER $< 10^{-12}$ over a 2 km optical fibre, employing a driving voltage as low as 0.6 Vpp (on 50 Ohm impedance), is given. Further studies on polarization are ongoing, and form factor reduction is an option still to be explored. Integration on boards and serialization of data are necessary to cope with SLHC requirements. The use of LN EOMs may satisfy possible requests for devices faster than 10 Gb/s while staying within the power and material budget imposed by the next generation of tracking detectors for high energy physics. This is why LN EOMs are attractive for implementing the optical links of next-generation experiments.

#### REFERENCES

[1] D. Janner, D. Tulli, M. García-Granda, M. Belmonte, V. Pruneri, Laser & Photon. Rev. 3, 301-313 (2009)

[2] F. Lucchi, D. Janner, M. Belmonte, S. Balsamo, M. Villa, S. Giurgiola, P. Vergani, V. Pruneri, Opt. Express 15, 10739-10743 (2007)

[3] CERN/DRDC 93-35 RD23/Status Report

[4] Cheng-Chih Lai et al., IEEE Photon. Technol. Lett., vol. 19, no. 13, July 1, 2007

[5] G.Barbagli, F.Palla, G.Parrini, TWEPP-07 proc., 482-486

# ALICE TPC control and read-out system

D. T. Larsen<sup>a</sup> for the ALICE TPC collaboration

<sup>a</sup> Department of Physics and Technology, University of Bergen, Bergen, Norway

dagtl@ift.uib.no

# Abstract

ALICE is a dedicated heavy-ion experiment at the CERN LHC aiming to study the properties of the quark–gluon plasma. A lead–lead collision might produce several tens of thousands of new particles. Detailed study of the event requires precise measurements of the particle tracks. A 90 m<sup>3</sup> Time Projection Chamber (TPC) with more than 500 000 read-out pads was built as the main central barrel tracker. Collisions can be recorded at a rate of up to about 1 kHz. The front-end electronics, built from FPGAs and custom ASICs, perform shaping, amplification, digitisation and digital filtering of the signals. The data are forwarded to the DAQ via 216 1.25 Gb/s fibre-optical links. Configuration, control and monitoring are done by an embedded Linux system on the front-end electronics.

First results on the performance of the front-end electronics and the distributed detector control system are presented.

### I. TIME PROJECTION CHAMBER (TPC)

A Large Ion Collider Experiment (ALICE) [1] uses a TPC [2] as its main track-finding detector. A TPC is a gaseous detector. It is shaped like a horizontal barrel and aligned with the beam pipe, which passes through the centre of the barrel. The overall length is 500 cm, divided by a 100 kV Central Electrode (CE) into two identical drift volumes. The diameter is 494 cm, though the innermost 170 cm is not part of the TPC, to make room for the beam pipe and inner tracking detectors. A schematic view of the TPC can be seen in Figure 1.

Collisions will take place in the beam pipe in the centre of the TPC, allowing the particles produced to traverse the TPC and leave tracks of ionised gas along their paths. A strong electric field from the CE will make the electrons drift towards the end planes, where data read-out is performed.

Each end plane is divided into 18 azimuthal sectors, each of which is divided into two Multi-Wire Proportional Chambers (MWPCs), the Outer and Inner Read-Out Chambers (OROC/IROC). The OROC has four Read-out Partitions (RPs); the IROC has two. An RP is an electronic entity for reading out data from read-out pads. The ionisation signal is amplified by the space charge around the wires of the MWPC. The induced charge on the read-out pads is forwarded to the read-out electronics. In total, for both sides, there are 557 568 pads.

The drift volume is filled with a counting gas composed of 85.7 % Ne, 9.5 % $CO_2$ and 4.8 % N<sub>2</sub>. A cold, light gas is used to ensure low diffusion and low multiple scattering. Field distortions are minimised because of the high ion mobility and the small number of ionisation electrons per unit length. The design noise figure of the electronics is 1000 RMS $e^-$ (700 actually achieved); a signal-to-noise ratio of at least 20 is required so as not to limit the position resolution.

Apart from tracking (measuring the charged-particle momentum with good two-track separation), the TPC also provides Particle IDentification (PID). The TPC is expected to perform well at multiplicities as high as $dN_{ch}/d\eta$=8000 in the particle momentum range [0.1, 100] GeV/c and within $|\eta| < 0.9$. Tracking efficiency is required to be >90 %, and the dE/dx resolution better than 10 %. Further, the TPC alone will have a momentum resolution of about 1 % at 2 GeV/c and 10 % at 50 GeV/c. For p–p collisions a read-out rate of $\approx$1 kHz is expected, while for central Pb–Pb collisions $\approx$0.2 kHz.



Figure 1: Schematic view of the TPC. To the left a single Read-out Partition (RP) is enlarged for visibility. The support for the sectors is shown on the two end planes. Between them is the Central Electrode (CE). The TPC allows space around the centre of the length axis for beam pipe and inner silicon detectors.

#### II. DATA READ-OUT DESIGN

As already mentioned, each sector has six RPs. An RP consists of a Read-out Control Unit (RCU) with up to 25 Front-End Cards (FECs), depending on the radial location. The innermost RPs have the highest number of FECs, because smaller read-out pads are used there to increase the resolution and cope with the higher track density close to the collision point. The electronics for one RP, as well as its connection to the central systems, is shown as a block diagram in Figure 2.

Eight ALICE TPC Read-Out (ALTRO) [3] chips are mounted on each FEC, each capable of reading out 16 read-out pads. The FECs are attached to the RCU via two buses: one for data transfer and one for control/monitoring. Once on the RCU, the data is forwarded to the Data Acquisition system (DAQ) and the High Level Trigger (HLT) via a 1.25 Gb/s optical fibre. A Detector Control System (DCS) board equipped with an embedded ARM processor running Linux is attached to the RCU for control and monitoring. The board is equipped with a standard Ethernet network interface. Radiation-tolerant electronics are needed to withstand the radiation from the collisions.
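The channel counts quoted in this paper can be tied together with a few lines of arithmetic. The sketch below assumes that every ALTRO channel is connected to exactly one pad, which the text does not state explicitly; all input numbers are taken from the text.

```python
TOTAL_PADS = 557_568        # read-out pads, both sides
CHANNELS_PER_ALTRO = 16     # pads read out by one ALTRO chip
ALTROS_PER_FEC = 8          # ALTRO chips on one Front-End Card
SIDES = 2                   # two end planes
SECTORS_PER_SIDE = 18       # azimuthal sectors per end plane
RPS_PER_SECTOR = 6          # four in the OROC plus two in the IROC
LINK_RATE_GBPS = 1.25       # optical link rate per RP

channels_per_fec = CHANNELS_PER_ALTRO * ALTROS_PER_FEC
# Assumption: every ALTRO channel serves exactly one pad.
n_fecs, unused = divmod(TOTAL_PADS, channels_per_fec)
n_rps = SIDES * SECTORS_PER_SIDE * RPS_PER_SECTOR
mean_fecs_per_rp = n_fecs / n_rps
aggregate_gbps = n_rps * LINK_RATE_GBPS

print(f"channels per FEC:   {channels_per_fec}")
print(f"FECs in total:      {n_fecs} (remainder {unused})")
print(f"read-out partitions: {n_rps}")
print(f"mean FECs per RP:   {mean_fecs_per_rp:.1f}")
print(f"aggregate link capacity: {aggregate_gbps:.0f} Gb/s")
```

Under this assumption the pad count divides evenly into front-end cards, the number of partitions matches the 216 optical links mentioned in the abstract, and the average of about 20 cards per partition is compatible with the stated maximum of 25.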

On the FECs, the pad signal passes through a shaping amplifier before it is forwarded to the ALTRO, which digitises and digitally filters it. The ALTRO uses a 10-bit Analogue-to-Digital Converter (ADC) capable of 10 million samples per second. The digital filtering is performed in four stages. First, systematic effects and low-frequency perturbations are removed as part of a base-line correction. Second, tail cancellation removes the tail of the pulses within 1 $\mu$s of the peak; fully programmable filter coefficients allow for removal of a wide range of tail shapes. Next, non-systematic perturbations of the base-line superimposed on the signal are removed by applying a moving-average base-line correction filter.
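The idea behind a moving-average base-line correction can be sketched in a few lines. The code below is a simplified illustration, not the ALTRO implementation: the window length, the acceptance threshold and the rule of freezing the base-line estimate while a pulse is present are all assumptions made for this example.

```python
from collections import deque


def subtract_moving_baseline(samples, window=8, threshold=5):
    """Subtract a running base-line estimate from a list of ADC samples.

    The base-line is the mean of the last `window` samples that lie within
    `threshold` ADC counts of the current estimate, so pulses do not pull
    the estimate upwards (hypothetical scheme, for illustration only).
    """
    if not samples:
        return []
    history = deque(maxlen=window)
    baseline = float(samples[0])
    corrected = []
    for s in samples:
        if abs(s - baseline) <= threshold:
            history.append(s)
            baseline = sum(history) / len(history)
        corrected.append(s - baseline)
    return corrected


# Flat base-line at 50 counts, one pulse, then a slow shift to 52 counts.
adc = [50] * 11 + [120, 90, 60] + [52] * 10
out = subtract_moving_baseline(adc)
print([round(v, 2) for v in out])
```

In this toy input the pulse samples keep their height above the base-line, while the slow shift from 50 to 52 counts is absorbed within one window length.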

The RPs will read out data from a collision when they receive an external trigger. Before a new trigger is issued, it must be ascertained that all RPs have finished reading out data associated with the previous trigger. This is handled by a BUSY system. The Busy Box has a direct link to each of the DAQ computer nodes receiving data from a RP. Once the node has received all data from a certain RP, it will flag this to the Busy Box. When the Busy Box detects that the read-out is done, it will inform the central trigger system, which can now issue a new trigger to the RPs.
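The handshake described above can be summarised with a small behavioural model. This is a simplified sketch under stated assumptions: the class and method names are hypothetical, one event is in flight at a time, and each of the 216 RPs is represented by a single link index. It is not a description of the actual Busy Box firmware.

```python
class BusyBoxModel:
    """Toy model: BUSY stays asserted until every link reports read-out done."""

    def __init__(self, n_links: int):
        self.n_links = n_links
        self.event_id = None
        self.pending = set()

    @property
    def busy(self) -> bool:
        return bool(self.pending)

    def on_trigger(self, event_id: int) -> None:
        """Called when the trigger system issues a trigger to the RPs."""
        if self.busy:
            raise RuntimeError("trigger issued while BUSY is asserted")
        self.event_id = event_id
        self.pending = set(range(self.n_links))

    def on_readout_done(self, link: int, event_id: int) -> None:
        """Called when a DAQ node has received all data from one RP."""
        if event_id == self.event_id:
            self.pending.discard(link)


box = BusyBoxModel(n_links=216)
box.on_trigger(event_id=1)
busy_after_trigger = box.busy

for link in range(215):
    box.on_readout_done(link, event_id=1)
busy_with_one_missing = box.busy

box.on_readout_done(215, event_id=1)
busy_when_complete = box.busy
print(busy_after_trigger, busy_with_one_missing, busy_when_complete)
```

The point of the model is that a single slow partition keeps the whole detector busy, which is why the read-out time of the most heavily loaded RP limits the trigger rate.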



Figure 2: Block diagram of the TPC read-out and control electronics. The left side is embedded on the detector; the right side shows the external systems in the counting room. Data is collected from the FECs and forwarded to DAQ/HLT via the RCU. Control is achieved via the associated DCS board. The Busy Box indicates when read-out is finished and a new trigger may be fired.

# III. DETECTOR CONTROL SYSTEM (DCS)

Control and monitoring of the RPs is mainly done via dedicated software, the FeeServer (FS), running on the embedded Linux system on the DCS board. Communication is via a standard TCP/IP network. The FeeServer has two main functions: monitoring and command handling. Monitoring publishes the values of important hardware registers to external clients. Command handling allows an instruction set to be built for configuring the Front-End Electronics (FEE). The handling of the fundamental network interface and the infrastructure for monitoring and command handling are implemented in the FeeServer Core, whereas the hardware-specific access for monitoring and command handling is implemented in a module called the Control Engine (CE).

The InterComLayer (ICL) acts as a hub in the system. It maintains contact with the FeeServers of all 216 RPs, as well as with the PVSS-based GUI for the operator and a configuration database containing pre-defined configurations for the FEE.



Figure 3: Structure of the control hierarchy for the Detector Control System (DCS). From bottom: "field layer" (FEE); "control layer" (FS and lower part of ICL); "supervisory layer" (upper part of ICL and the PVSS-based GUI).

A three-layer hierarchy is defined for the DCS: "field layer" is the FEE itself; "control layer" is the FS on each RP, as well as the lower part of the InterComLayer (ICL); "supervisory layer" is the upper part of the ICL and the GUI the shifter is operating. This structure is shown in Figure 3.

Configuration of the FEE is accomplished by sending binary configuration data blocks to the FS. Values of registers of special importance, such as FEC temperatures, voltages and currents, as well as states of the state machine, are published. Upon receiving a high-level configuration command from the GUI, the ICL assembles configuration blocks for the FS by retrieving configuration parameters from the configuration database (DB). The ICL also collects data points published by the FS and forwards them to the GUI. The system is fully integrated with the Experiment Control System (ECS), enabling operation of the TPC by the ALICE shifter.

#### IV. NOISE LEVEL

The background noise level is obtained regularly from pedestal runs. Figure 4 shows the distribution for pairs of ROCs for one of the end planes. The IROC constitutes one pair, while the four chambers of the OROC are divided into two pairs. In Figure 5 the same data is plotted on the corresponding read-out pads.



Figure 4: Noise distribution from pairs of ROCs: IROC and two OROC. The peak is around 0.7 ADC counts.



Figure 5: Typical noise level (in ADC counts) for the TPC from a recent pedestal run. As in the previous figure, the noise level is around 0.7 ADC counts.

The noise figure is required to be less than $1000 \text{ e}^-$ RMS of base-line, corresponding to 1 ADC count. The noise levels from the pedestal runs show that the noise figure is $\approx 0.7$ ADC counts (700 e<sup>-</sup>), well within the requirement. This is close to the natural limit and does not change much with time. It also allows zero-suppressed empty events (noise only) to stay below 70 kB; without zero-suppression they would be 10 000 times larger.

#### V. DATA READ-OUT PERFORMANCE

RPs have a varying number of FECs depending on radial position in the sector, from 25 (innermost) to 18 (outermost). Reserving the same amount of bandwidth for each FEC regardless of radial location implies only RPs with 25 FECs can utilise the full bandwidth of the optical fibre, hence effective read-out rate per 6-RP sector is limited to 770 MB/s. Benchmark tests (Figure 6) show that this is indeed achievable for high-occupancy events where zero-suppression has been applied. Considering the case of low-occupancy events, read-out is possible at an event rate of 595 Hz (0 % occupancy) using full readout. The electronics also supports sparse read-out, in which case empty channels are entirely stripped, including headers. Applying this technique, the read-out rate increases to 1386 Hz. The respective data rates are 70 MB/s and 927 kB/s.
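As a quick cross-check of the low-occupancy figures, the mean event size follows from dividing each data rate by the corresponding event rate. The sketch below assumes decimal units (1 MB = 10<sup>6</sup> bytes, 1 kB = 10<sup>3</sup> bytes) and that each quoted data rate and event rate refer to the same read-out scope; the function name is illustrative only.

```python
def bytes_per_event(data_rate_bytes_per_s, event_rate_hz):
    """Mean event size implied by a data rate and an event rate."""
    return data_rate_bytes_per_s / event_rate_hz


# Figures quoted in the text for 0 % occupancy.
full = bytes_per_event(70e6, 595.0)       # full read-out
sparse = bytes_per_event(927e3, 1386.0)   # sparse read-out

print(f"full read-out:   {full / 1e3:.1f} kB per event")
print(f"sparse read-out: {sparse:.0f} B per event")
```

Under these assumptions an empty event shrinks from roughly 118 kB with full read-out to well under 1 kB with sparse read-out.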



Figure 6: Event rate (black, left scale) and data rate (red, right scale) as a function of occupancy, for full read-out. At 100 % occupancy the theoretical maximal data rate of 770 MB/s is reached. At 0 % occupancy the event rate is 595 Hz; applying sparse read-out increases this to 1386 Hz (not shown, as it only departs significantly at low occupancy).

# REFERENCES

- [1] The ALICE Collaboration, K. Aamodt et al., "The ALICE Experiment at the CERN LHC", JINST 3 (2008) S08002.
- [2] The ALICE Collaboration, "ALICE TPC Technical Design Report", CERN/LHCC 2000-001, ALICE TDR 7, 7 January 2000.
- [3] L. Musa et al., "The ALTRO chip: a 16-channel A/D converter and digital processor for gas detectors", IEEE Trans. Nucl. Sci., November 2003.

# Simple parallel stream to serial stream converter for Active Pixel Sensor readout.

V. Kushpil<sup>1</sup>, M. Šumbera<sup>1</sup>, M. Szelezniak<sup>2</sup>

<sup>1</sup>Nuclear Physics Institute ASCR, 25068 Řež/Prague, Czech Republic <sup>2</sup>Berkeley National Laboratory, 1 Cyclotron Rd. Berkeley, CA 94720, USA

kushpil@ujf.cas.cz

# Abstract

This paper describes a new electronics module for converting a parallel data flow to a serial stream in the USB 2.0 High Speed protocol. The system provides a connection between a PC USB port and the parallel interface of the DAQ board, which is used to investigate the performance of Active Pixel Sensor (APS) prototypes. The DAQ readout software supports Win XX OS and Linux OS. GUI examples have been prepared in the Lab Windows and Lab View environments. The module, designed using the virtual peripheral concept, can be easily adapted for many similar tasks.

#### I. INTRODUCTION

High Granularity Semiconductor Detectors (HGSD) (pixel, micro strip and drift) are a powerful tool in high-energy physics. Readout electronics for HGSD is manufactured as ASIC chips (containing preamplifiers, shapers, analog and digital memory and ADC) that can be controlled by FPGA-based circuits [1]. The Custom-build Modules Readout (CMR) is used to handle the transfer of data between the HGSD and the main computer for data storage. For example, for investigation of Active Pixel Sensors (APS), the LBL APS group uses a simple parallel data transfer protocol with readout rates of about 60 MB/s [2]. Data from the DAQ are sent to the PC synchronously with the Process Clock (PCLK) signal, and the data flow is controlled by REQ (request) and ACK (acknowledge) signals.

The main disadvantage of CMR is that the readout system requires a digital DAQ PCI card that needs to be installed inside a PC and this limits the portability of the system. Also the multi-conductor SCSI-like cable limits to some extent the portability of the system.

In this paper we describe a simple 16-bit parallel to USB 2.0 stream converter which allows readout with data rates of about 48 Mbytes per second and can be easily adapted to many different readout architectures and different OS (WinXP, Linux). The flexibility of the converter is achieved by using the virtual peripheral concept [3] for the design together with the fast 8-bit microcontroller SX28 from UBICOM [4]. By using this module, the APS DAQ can be connected to a portable computer, allowing the use of different Operating Systems (OS) with the same hardware (HW) and software (SW).

#### II. HARDWARE

As shown in Fig. 1, the Parallel to Serial Flow Converter (PSFC) consists of three main parts: MCU control, FIFO memory and Quick USB (Q-USB) module. The converter module is designed for high performance and maximum flexibility. It contains a single FIFO memory chip, a single SX28 microcontroller chip, three digital buffer chips and one Q-USB module. The converter has a 16-bit parallel input Din[15..0], input lines REQ and PCLK, and an output line nFULL (FIFO is full). The FIFO memory is a CMOS chip CY7C4506 (16KB x 18 bits) operating with a 100 MHz clock (this chip reads and writes data on the rising edge of the clock signal). The SX28 MCU control unit provides the data flow synchronization. A simple control algorithm is used to keep the conversion processing time acceptable. The stages of conversion are described below.



Figure 1: The block diagram of the Parallel to Serial Flow Converter (PSFC).

Stage 1: Read 16-bit data words from the DAQ and store the first 8KB of data in the FIFO memory. The 16-bit data from the APS are sent synchronously with the PCLK signal and the data flow is controlled by the REQ and ACK signals. The signal REQ set to '1' informs the converter that the DAQ is ready to start the data transfer. In response to ACK='1', REQ is set to '0' and a data package from the DAQ is sent synchronously with the PCLK clock signal.

Stage 2: The conversion process starts after data have been written to the FIFO for 8KB/12MHz ≈ 0.7 ms. After the converter receives the data package (which takes 16KB/48MHz ≈ 0.33 ms, or 330/125 ≈ 3 USB 2.0 High-Speed microframes), ACK is set to '0'. When REQ is set to '1' and nFULL is equal to '1', the MCU enables writing data to the FIFO memory by setting the signal WEN (Write Enable).

This enables the Q-USB module to read data from the FIFO and send it to the PC via USB. The operation of the Q-USB module is described in its tutorial and will not be explained here. We remark only that the Q-USB can be configured for data readout in different modes (master device, slave device, "data tube" mode, full handshake FIFO mode and more). The signal nFULL=0 switches the Q-USB module into a waiting mode. The FIFO memory is used as a temporary storage buffer if the PCLK frequency is higher than the IFCLK frequency. The original APS DAQ provides no signal for synchronizing the beginning of the data transfer. Synchronization is achieved by resetting the DAQ while the PSFC is in the waiting mode.
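The buffering times quoted in Stage 2 can be reproduced with a few lines of arithmetic. The sketch below assumes binary kilobytes (1 KB = 1024 bytes) and reads the quoted "MHz" figures as transfer rates in bytes per second, as the expressions 8KB/12MHz and 16KB/48MHz in the text suggest; the 125 µs value is the USB 2.0 High-Speed microframe period. Function and constant names are illustrative.

```python
import math

KB = 1024               # bytes; assumes binary kilobytes
MICROFRAME_S = 125e-6   # USB 2.0 High-Speed microframe period


def transfer_time_s(n_bytes, rate_bytes_per_s):
    """Time needed to move n_bytes at a constant rate."""
    return n_bytes / rate_bytes_per_s


t_first = transfer_time_s(8 * KB, 12e6)     # first 8 KB written to the FIFO
t_package = transfer_time_s(16 * KB, 48e6)  # full 16 KB data package

print(f"first 8 KB : {t_first * 1e3:.2f} ms")
print(f"16 KB block: {t_package * 1e3:.2f} ms")
print(f"microframes: {math.ceil(t_package / MICROFRAME_S)}")
```

The results agree with the rounded values in the text: about 0.7 ms for the initial fill and about 0.33 ms, or three microframes, for the full package.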

# **III. SOFTWARE AND RESULTS**

Two types of software were developed for the converter. The first is the converter firmware, written in SX assembler. The code is very simple and can be easily modified, for example to adjust the number of data blocks used for the transfer of multiple frames from the DAQ, or to add data processing during converter operation. For instance, the nFULL signal can be monitored during high-speed data transfer from the DAQ either through an interrupt service routine or by software polling.

The second type of software is the readout software, prepared for WinXP and Linux OS. The Quick USB has two important advantages. The first is that it can be used with different OS (Win XX, Linux and MAC). The second is that Q-USB includes libraries of standard functions for different compilers (MS Visual C, Borland CPP, Lab Windows, Lab View and GNU C). Short examples for all OS and compilers listed above were prepared. The converter was tested with the APS DAQ motherboard version V3.02 developed at LBL.

The converter module was tested by reading full frames of the APS prototype MIMOSA5 [5], which consists of four sub-arrays of 512 x 512 pixels each. The DAQ operates as the master device and the converter as a slave device. The data were sent frame by frame. The maximal input data rate $A$ can be defined as $A = M/T + B$, where $M$ is the size of the FIFO, $B$ is the data conversion rate and $T$ is the frame sending time. For $M$ = 0.16 Mbyte, $B$ = 48 MHz (about 48 Mbyte/s) and $T$ = 75 ms we obtain $A$ ≈ 50 Mbyte/s. From the real test we conclude that the converter practically does not change the output data rate. The specific delay is determined by the OS. For WinXP the readout process must be run with high priority (level 13) to achieve fast data flow conversion. The readout with the converter was reliable, and the system operated continuously for nearly 36 hours.
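The estimate of the maximal input data rate can be checked numerically. The sketch below evaluates A = M/T + B with the values quoted in the text, taking the conversion rate B as 48 Mbyte/s (an assumption based on the converter throughput stated in the introduction); the function name is illustrative.

```python
def max_input_rate_mbyte_s(fifo_mbyte, frame_time_s, conversion_rate_mbyte_s):
    """A = M/T + B: the FIFO absorbs M bytes per frame time T on top of rate B."""
    return fifo_mbyte / frame_time_s + conversion_rate_mbyte_s


a = max_input_rate_mbyte_s(fifo_mbyte=0.16, frame_time_s=75e-3,
                           conversion_rate_mbyte_s=48.0)
print(f"A = {a:.1f} Mbyte/s")
```

The FIFO term contributes only about 2 Mbyte/s here, so the sustainable input rate is dominated by the conversion rate B.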

# IV. CONCLUSIONS

A simple parallel to serial data stream converter for connecting the APS DAQ parallel interface to the USB port of a PC was developed and tested. The readout DAQ software developed for this system supports Win XX OS and Linux OS. The module, designed using the virtual peripheral concept, can be easily adapted for many similar tasks. The hardware description, Gerber files, and firmware for the converter module can be downloaded from: http://ojs.ujf.cas.cz/~kushpil/APS

# V. REFERENCES

[1] E.J. Siskind, Data acquisition system issues for large experiments, Nucl. Instr. and Meth. A 579 (2007) pp.839–843

[2] LBL readout schematic http://www.lbnl.leog.org/pdf/pixel\_dag\_motherboard.pdf

[3] SX Virtual Peripheral Methodology & Modules Rev. 1.0 © 2000 Scenix Semiconductor, Inc.

[4] SX28 User Guide 1.0 © 2000 Scenix Semiconductor, Inc.

[5] Schematic MIMOSA5 http://www.lbnl.leog.org/pdf/mimosa5\_schem.pdf

This work was supported by the Ministry of Education of the Czech Republic (grants LA08015 and LC70480).

# Total dose effects on deep-submicron SOI technology for Monolithic Pixel Sensor development

S. Mattiazzo<sup>a,b</sup>, M. Battaglia<sup>c,d</sup>, D. Bisello<sup>a,b</sup>, D. Contarato<sup>d</sup>, P. Denes<sup>d</sup>, P. Giubilato<sup>a,b,d</sup>, D. Pantano<sup>a,b</sup>, N. Pozzobon<sup>a,b</sup>, M. Tessaro<sup>a,b</sup>, J. Wyss<sup>b,e</sup>

<sup>a</sup> Università degli Studi di Padova, Dipartimento di Fisica, I-35131 Padova, Italy.

<sup>b</sup> Istituto Nazionale di Fisica Nucleare, Sezione di Padova, I-35131 Padova, Italy.

<sup>c</sup> Department of Physics, University of California, Berkeley, CA 94720, USA.

<sup>d</sup> Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.

<sup>e</sup> DiMSAT, Università degli Studi di Cassino, Cassino, Italy.

serena.mattiazzo@pd.infn.it

#### Abstract

We developed and characterized Monolithic pixel detectors in deep-submicron Fully Depleted (FD) Silicon On Insulator (SOI) technology. This paper presents the first studies of total dose effects from ionizing radiation performed on single transistor test structures. This work shows how the substrate bias condition during irradiation heavily affects the resulting radiation damage.

### I. INTRODUCTION

SOI technology employs standard CMOS integrated circuits fabricated on a thin Silicon layer, electrically insulated from the rest of the silicon wafer by means of a thick oxide layer (Buried Oxide, BOX). This approach gives several advantages over the standard bulk CMOS technology: the small active volume and the lower junction capacitance allow designs with higher latch-up immunity, higher speed and lower power consumption.

Moreover, since the electronics is insulated from the substrate, a high-resistivity substrate can be used as the sensitive volume for particle tracking and imaging. The possibility of depleting the sensor layer greatly improves the charge collection efficiency. Vias etched through the oxide connect the substrate to the electronics layer, so that pixel implants can be contacted and a reverse bias can be applied.

A monolithic pixel detector in SOI technology has several features which are appealing for its potential use in the inner volume of the CMS Tracker at SLHC. Unlike in Hybrid Pixel Detectors, both the detector and its front-end electronics are integrated in the same substrate, so there is no need for the expensive bump-bonding process. The monolithic approach also reduces the material budget of the detector and makes the detector assembly and handling much easier. Compared to other kinds of monolithic detectors (e.g. MAPS), a pixel with a depleted sensitive volume features a higher radiation tolerance to displacement damage and allows faster readout (as charge is collected by drift and not by diffusion).

However, SOI technology is well known to be prone to total dose damage due to the presence of the thick BOX, where positive charge gets trapped. In this work we study the total dose tolerance of SOI technology under the working conditions of a particle detector.

#### II. CHIP PRODUCTION

A first prototype chip, named LDRD-SOI-1, was obtained in 2007 in the OKI 0.15 $\mu$ m Fully Depleted (FD) SOI technology. This chip has been widely tested and characterized [1], [2]. As the 0.15 $\mu$ m process was not optimized for low leakage current, it was no longer adopted for the following chip productions.

A second prototype sensor, the LDRD-SOI-2 chip, was designed and fabricated in 2008 in the OKI 0.20 $\mu$ m FD-SOI process, optimized for low leakage current. This process features a full CMOS circuitry implanted on a 40nm thin Silicon layer on top of a 200nm thick BOX. The thickness of the CMOS layer is small enough for the layer to be FD at typical operational voltages. The sensor substrate is 350 $\mu$ m thick and has a resistivity of 700 $\Omega$ ·cm; it is thinned to 250 $\mu$ m and plated with a 200nm thin Al layer that allows backbiasing.

The chip is $5 \times 5 \text{mm}^2$ with an active area of $3.5 \times 3.5 \text{mm}^2$ divided into $168 \times 172$ pixels with a $20 \mu \text{m}$ pitch. The pixel matrix is subdivided into a $40 \times 172$ pixel section with a simple, analog 3T architecture, and a $128 \times 172$ pixel section with a digital architecture providing a binary output. In the latter, two capacitors are integrated in each pixel for in-pixel Correlated Double Sampling (CDS), and a digital latch is triggered by a clocked comparator with a current threshold, which is common to the whole section. The chip design has been optimized to allow readout up to a 50MHz clock frequency, and the binary section is equipped with multiple parallel outputs for high frame rate.

A potential limitation of the SOI technology comes from the transistor back-gating effect. The reverse bias of the silicon substrate, necessary to deplete the sensitive volume, increases the potential at the silicon surface, so that the BOX acts as a second gate for the CMOS electronics on top. This typically causes a shift in the transistor threshold as the depletion voltage increases. The effect was investigated with TCAD simulations, and the most effective design found to limit the back-gating problem was a floating p-type guard-ring around each pixel, which was implemented in the chip; two floating guard-rings also separate the peripheral electronics and I/O logic from the pixel matrix and from the pad area. This chip is currently under test [3].

### III. ELECTRICAL CHARACTERIZATION OF LDRD-SOI-2

I-V and C-V measurements have been performed on the LDRD-SOI-2 detector, to extract both the breakdown voltage and the depletion depth of the sensitive volume, as a function of the backside voltage.

The $I_{back}$-$V_{back}$ measurement (Figure 1), performed with the two external guard-rings kept at 0V, shows a breakdown occurring at $V_{back} \sim 85V$.



Figure 1: I<sub>back</sub>-V<sub>back</sub> measurement on LDRD-SOI-2.

The C-V measurement was performed on the chip by keeping the p-type guard-ring implant around each pixel (which forms a grid all over the sensor) at 0V and applying an increasing voltage to the backside, to deplete the whole sensitive volume under the BOX. Knowing the area of the depleted volume and the capacitance measured at a given $V_{back}$, we could calculate the corresponding depletion depth (W). In Figure 2 we compare the measured W with the value expected for a substrate with a nominal resistivity of $700\Omega$ cm. The measured value is a factor 2 lower than expected; this might indicate that the actual resistivity is lower than the nominal value.
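The two quantities being compared can be sketched with textbook relations. The code below is not the authors' analysis: it assumes a parallel-plate capacitance model, W = εA/C, for the measured depth and a one-sided abrupt junction, W = sqrt(2 ε μ ρ V), for the expected depth, neglecting the built-in voltage. The electron mobility of 1350 cm²/(V s), which presumes an n-type substrate, and the function names are assumptions.

```python
import math

EPS_SI = 11.7 * 8.854e-14  # F/cm, permittivity of silicon
MU_E = 1350.0              # cm^2/(V s); assumption: n-type substrate


def expected_depth_um(resistivity_ohm_cm, v_back, mobility=MU_E):
    """Abrupt-junction depletion depth in um, built-in voltage neglected."""
    w_cm = math.sqrt(2.0 * EPS_SI * mobility * resistivity_ohm_cm * v_back)
    return w_cm * 1e4


def depth_from_capacitance_um(area_cm2, capacitance_f):
    """Parallel-plate estimate W = eps * A / C, returned in um."""
    return EPS_SI * area_cm2 / capacitance_f * 1e4


w_nominal = expected_depth_um(700.0, 10.0)
w_quarter = expected_depth_um(700.0 / 4.0, 10.0)
print(f"700 ohm cm, 10 V: W = {w_nominal:.1f} um")
print(f"175 ohm cm, 10 V: W = {w_quarter:.1f} um")
```

In this simple model the depletion depth scales with the square root of the resistivity, so the depth is halved when the resistivity is reduced by a factor of four.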



Figure 2: Comparison between the expected and the measured values of the depletion depth as a function of  $V_{\rm back}.\,$ 

#### IV. TOTAL DOSE STUDIES

#### A. Depletion voltage and Fractional Yield

In SOI technology, the thick buried oxide is expected to be very sensitive to ionizing radiation due to positive charge trapping, with a consequent increase of the top-gate leakage current. This effect is even larger when this technology is used to build monolithic radiation detectors. In fact, when a depletion voltage is applied to the sensitive volume (substrate), a strong electric field is present across the BOX. When exposed to ionizing radiation, electron-hole pairs are created inside the thick oxide. The electric field immediately separates these charges, which do not recombine; this greatly increases the amount of positive charge trapped throughout the BOX. The number of electron-hole pairs escaping recombination ("fractional yield") depends both on the bias given to the substrate and on the stopping power of the incident particle (Figure 3): the higher the ionization density, the higher the recombination probability.



Figure 3: Fractional yield as a function of the electrical field applied throughout the oxide and for different incident particles [4], [5].

Previous work has already studied the total dose damage on monolithic pixel detectors fabricated in 0.15 $\mu$m Fully Depleted (FD) SOI technology [6] under fixed bias conditions (transistor terminals floating). In our work we study the total dose tolerance of the OKI 0.20 $\mu$m technology under different bias conditions during irradiation. This allows a better understanding of the effects of the substrate voltage in real working conditions.

#### B. Irradiations on 0.20 µm process

We performed the studies described in this paper at the total dose test facility located at the INFN National Laboratory of Legnaro (Italy). The facility is equipped with the RP-149 Semiconductor Irradiation System from Seifert (Ahrensburg, Germany) which uses a standard tube for X-ray diffraction analysis (maximum power 3000 W, maximum voltage 60 kV, tungsten anode) [7]. The irradiations are performed in air, at room temperature and with a dose rate of 165rad(SiO<sub>2</sub>)/sec.
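At the quoted dose rate, the exposure time needed for a given total dose follows directly. The sketch below assumes a constant dose rate of 165 rad(SiO<sub>2</sub>)/s and uses the maximum doses given in the captions of Figures 4 to 6; the function name is illustrative.

```python
DOSE_RATE_RAD_PER_S = 165.0  # rad(SiO2)/s, assumed constant during exposure


def exposure_time_s(total_dose_rad, dose_rate=DOSE_RATE_RAD_PER_S):
    """Irradiation time needed to accumulate a given total dose."""
    return total_dose_rad / dose_rate


for dose_rad in (62e3, 160e3, 2e6):
    t = exposure_time_s(dose_rad)
    print(f"{dose_rad / 1e3:7.0f} krad -> {t / 60.0:6.1f} min")
```

Even the 2 Mrad irradiation corresponds to only a few hours of beam time at this rate.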

The total dose studies have been carried out on test structures kindly provided by KEK (Japan). They consist of 16 NMOS and 16 PMOS transistors, with separate gates and drains and with common sources. Each transistor is surrounded by a 1 $\mu$m PSUB ring. For both NMOS and PMOS transistors, 8 are of the Body Float type and the remaining 8 have the body connected in different ways. Each set of 8 structures in turn features both core and I/O transistors with threshold voltages (V<sub>thr</sub>) and W and L values varying according to Table 1 and Table 2.

Table 1: Body Float Type.

| Tr | L (µm) | W (µm) | Comment                       |
|----|--------|--------|-------------------------------|
| M1 | 0.20   | 100    | Core, normal V <sub>thr</sub> |
| M2 | 0.50   | 250    | Core, normal V <sub>thr</sub> |
| M3 | 1.00   | 500    | Core, normal V <sub>thr</sub> |
| M4 | 0.20   | 100    | Core, low V <sub>thr</sub>    |
| M5 | 0.50   | 250    | Core, low V <sub>thr</sub>    |
| M6 | 1.00   | 500    | Core, low V <sub>thr</sub>    |
| M7 | 0.35   | 175    | I/O, high V <sub>thr</sub>    |
| M8 | 0.35   | 175    | I/O, low V <sub>thr</sub>     |

Table 2: Body Connection Type.

| Tr  | L (µm) | W (µm) | Comment                                    |
|-----|--------|--------|--------------------------------------------|
| M9  | 0.20   | 100    | Core, normal V <sub>thr</sub> , Source Tie |
| M10 | 0.50   | 250    | Core, normal V <sub>thr</sub> , Source Tie |
| M11 | 1.00   | 500    | I/O, Source Tie                            |
| M12 | 0.20   | 100    | Core, normal V <sub>thr</sub> , Body Tie   |
| M13 | 0.50   | 250    | Core, normal V <sub>thr</sub> , Body Tie   |
| M14 | 1.00   | 500    | Core, normal V <sub>thr</sub> , Body Tie   |
| M15 | 10     | 100    | I/O, D-NMOS                                |
| M16 | 10     | 100    | I/O, D-NMOS, Source Tie                    |

For Body Tie transistors M12, M13 and M14, the voltage for the body can be externally supplied by a connection pad. Each signal is directly connected to a pad without any protection diode, making the transistors very sensitive to electrostatic discharges.

During irradiation the transistors are in the ON state, corresponding to the worst-case bias condition (the drain and the source of each transistor were kept at 0V, while the gate was kept HIGH for NMOS and LOW for PMOS). We irradiated the test structures at three different values of depletion voltage: $V_{back} = 0V$, 5V, 10V. In these first tests, the PSUB guard-ring surrounding each transistor is kept floating, both during irradiation and during measurements, while the external body contact (for M12, M13 and M14) is kept grounded during irradiation and floating during measurement (no significant differences were found in the transistor characteristics if this contact is kept grounded during measurements).

Different transistors show different behaviors when exposed to X-ray radiation; some seem to be promising with regards to their total dose hardness.

The most radiation tolerant behaviour has been found in transistor M13 NMOS, whose  $I_{ds}$ -V<sub>gs</sub> characteristics are displayed in Figure 4, Figure 5 and Figure 6.





Figure 4:  $I_{ds}$ - $V_{gs}$  curve for the M13 NMOS transistor before and after irradiation at  $V_{back} = 0V$  (up to a total dose of 2Mrad).



Figure 5:  $I_{ds}$ -V<sub>gs</sub> curve for the M13 NMOS transistor before and after irradiation at  $V_{back}$  = 5V (up to a total dose of 160krad).



Figure 6:  $I_{ds}$ - $V_{gs}$  curve for the M13 NMOS transistor before and after irradiation at  $V_{back} = 10V$  (up to a total dose of 62krad).

As expected, the total dose damage is heavily dependent on the substrate bias conditions during irradiation. It is well known that the accumulation of positive charge in the BOX causes a negative shift in the threshold voltage of the back transistor. The consequent parasitic conduction induces a leakage current in the front gate transistor, which can hardly be controlled by the front gate polarization. When a positive bias (5V or 10V) is applied to the backside to deplete the detector, the leakage current of the top transistor remains at acceptable levels only up to a few tens of krad of total dose. When 0V is applied, instead, the transistor works properly up to an accumulated dose of $\sim 1$Mrad.

This observation implies that the total dose tolerance of such devices would greatly increase if the potential under the BOX is kept low.

For this reason, we studied the effectiveness of the PSUB guard-ring to limit the backgate effect. In Figure 7 we report the  $I_{ds}$ -V<sub>gs</sub> curve for one transistor (M2 NMOS) with the PSUB contact floating and in Figure 8 the same curve for the same transistor, but with the PSUB contact tied to GND.



Figure 7:  $I_{ds}$ -V<sub>gs</sub> curve for the M2 NMOS transistor before irradiation, with PSUB ring kept floating.



Figure 8:  $I_{ds}\text{-}V_{gs}$  curve for the M2 NMOS transistor before irradiation, with PSUB ring tied to GND.

With the PSUB at 0V, the leakage current is substantially unchanged, even for $V_{back}$ values which usually cause the transistors to stop working properly. Analogous behaviour is found for the other transistors. This result suggests that the presence of this PSUB guard-ring is indeed effective in keeping the voltage low under the BOX.

To verify whether this approach also helps to improve the radiation tolerance of the transistors, we performed a further X-ray irradiation at $V_{back} = 10V$ with the PSUB ring tied to GND, rather than floating as in all the previous irradiations.

In Figure 9 we report in log-log scale a summarizing plot of the leakage current values ( $I_{ds}$  when  $V_{gs} = 0V$ ) as a function of the total dose, for all the four irradiations of the previously described transistor, to better compare the effects.



Figure 9: Leakage current values as a function of the total dose accumulated for the M13 NMOS transistor for the four different backside biases during irradiation.

The irradiation at  $V_{back} = 10V$  with the PSUB guard-ring tied to GND indeed improves the radiation hardness of the transistors, and the effect of the total dose damage is comparable with the irradiation performed at  $V_{back} = 5V$  (halfway between 0V and 10V).

The threshold voltage (V<sub>th</sub>) decreases as expected as the accumulated dose increases (Figure 10); again we can see how the irradiation with PSUB guard-ring tied to GND is effective in containing the effect, even though not able to suppress it completely. The V<sub>th</sub> is calculated as the intercept value with the x axis of the I<sub>ds</sub>-V<sub>gs</sub> curve in the linear range.
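The extraction of the threshold voltage as the x-axis intercept can be sketched as a straight-line fit to the linear part of the I<sub>ds</sub>-V<sub>gs</sub> curve. This is a minimal illustration rather than the authors' procedure: the function name is hypothetical, the caller is assumed to pass only points from the linear range, and the data points are synthetic.

```python
def threshold_from_linear_fit(vgs, ids):
    """Least-squares line through (vgs, ids); returns its x-axis intercept."""
    n = len(vgs)
    mean_v = sum(vgs) / n
    mean_i = sum(ids) / n
    sxx = sum((v - mean_v) ** 2 for v in vgs)
    sxy = sum((v - mean_v) * (i - mean_i) for v, i in zip(vgs, ids))
    slope = sxy / sxx
    offset = mean_i - slope * mean_v
    # The fitted line crosses I_ds = 0 at V_gs = -offset / slope.
    return -offset / slope


# Synthetic linear-range points: I_ds = k * (V_gs - V_th), k = 2e-4 A/V, V_th = 0.45 V
vgs = [0.8, 0.9, 1.0, 1.1, 1.2]
ids = [2e-4 * (v - 0.45) for v in vgs]
print(f"extracted V_th = {threshold_from_linear_fit(vgs, ids):.3f} V")
```

Repeating the fit on the curve measured after each dose step would give the dose dependence of the threshold voltage shown in Figure 10.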



Figure 10: Threshold voltage for M13 NMOS transistor as a function of the total dose for all the four irradiation conditions.

It is interesting to note that M13 has the body externally tied to GND during irradiation, which apparently helps to keep the fields inside the oxides low, enhancing its radiation tolerance. We can compare M13 to the transistor M2, which has the same W/L and the same threshold as M13, but a floating body with no possibility of tying it to GND. This transistor shows a much lower radiation tolerance than M13 (at $V_{back} = 0V$, for example, it can sustain only up to 80krad of accumulated dose).

#### V. CONCLUSIONS

The aim of this work was to study the effect of the substrate bias conditions on the total dose damage in Monolithic Pixel Detectors fabricated in SOI technology. The investigation focused on the $0.20\mu$m OKI FD process, optimized for low leakage currents and used for the development of the latest pixel matrix (LDRD-SOI-2) and for future detectors. Results are encouraging, as we experimentally proved that the transistors are able to sustain doses up to 1Mrad when the electric field is kept low throughout the BOX. We also have experimental evidence that a PSUB guard-ring is effective in containing both the backgate effect and the total dose damage on the transistors.

Other technological solutions, like the implantation of a buried P-Well (BPW) under the BOX (and not only a PSUB guard-ring), will hopefully further suppress the backgate effect. It has been demonstrated [8] that this BPW effectively reduces the potential under the BOX and suppresses the backgate effect even at $V_{back} = 100V$. With a reduced electric field through the BOX, the radiation hardness of the chip should also improve, opening up new possibilities for its application in high-radiation environments, such as SLHC.

#### **VI. REFERENCES**

- [1] M. Battaglia *et al.*, Nucl. Instr. and Meth. A 583, 526 (2007).
- [2] M. Battaglia *et al.*, Nucl. Instr. and Meth. A 604, 380 (2009).
- [3] M. Battaglia *et al.*, Journal of Instrumentation, (2009) arXiv:0903.3205.
- [4] F. B. McLean et al., Tech. Rep. HDL-TR-2129, 1987.
- [5] M. R. Shaneyfelt *et al.*, IEEE Trans. Nucl. Sci., 38, 1187 (1991).
- [6] Y. Ikegami et al., Nucl. Instr. and Meth. A 579, 706 (2007).
- [7] D. Bisello et al. Radiation Physics and Chemistry 71, 713 (2004).
- [8] Y. Arai, oral presentation at HSTD7, Hiroshima, 2009.

# AFTER, the front end ASIC of the T2K Time Projection Chambers

P. Baron, J. Beucher, D. Calvet, X. de la Broise, E. Delagnes, A. Delbart, F. Druillole, A. Le Coguie, E. Mazzucato, E. Monmarthe, M. Zito

CEA Saclay, DSM/IRFU, 91191 Gif-sur-Yvette Cedex, France

eric.delagnes@cea.fr

# Abstract

The T2K (Tokai-to-Kamioka) experiment is a long baseline neutrino oscillation experiment in Japan. A near detector, located 280 m from the production target, is used to characterize the beam. One of its key elements is a tracker made of three Time Projection Chambers (TPC) read by Micromegas endplates. A new readout system has been developed to collect, amplify, condition and acquire the data produced by the 124,000 channels of these detectors. The front-end element of this system is a new 72-channel application specific integrated circuit. Each channel includes a low noise charge preamplifier, a pole zero compensation stage, a second order Sallen-Key low pass filter and a 511-cell Switched Capacitor Array. This electronics offers a large flexibility in sampling frequency, shaping time and gain, while taking advantage of the low physics event rate of 0.3 Hz. We detail the design and the performance of this ASIC and report on the deployment of the front-end electronics on-site.

#### I. INTRODUCTION

The T2K (Tokai-to-Kamioka) experiment [1] is dedicated to the study of neutrino oscillations. An intense artificial neutrino beam from the J-PARC (Japan Proton Accelerator Research Complex) facility in Tokai is sent 295 km across Japan towards the existing Super-Kamiokande detector [2] in Kamioka to study how neutrinos change from one type to another. The ND280 [3] detector complex is presently under construction, with completion scheduled for the end of 2009. Located 280 m from the neutrino production target, its purpose is to measure the properties of the neutrino beam at the J-PARC site before the neutrinos have had a chance to oscillate into other flavours. This near detector complex comprises an on-axis detector and off-axis detectors mounted inside a magnet formerly used in the UA1 and NOMAD experiments. Two Fine-Grain Detectors (FGDs), a pi-zero detector, an electromagnetic calorimeter and muon detectors are housed together with three large Time Projection Chambers (TPCs) inside this magnet. These TPCs (schematic view shown in Figure 1) will measure the momenta of muons produced by charged-current interactions in the detector, and will be used to reconstruct the neutrino energy spectrum. Each half-TPC (2 m x 1 m x 2 m) endplate is read out by a 1.5 m<sup>2</sup> mosaic of 12 pixelated Micromégas modules manufactured using the bulk technology [4]. The front-end electronics modules are plugged directly onto the Micromégas detectors, thus avoiding the use of fragile fine-pitch cables or expensive Kapton flex cables, to minimize noise and reduce cost.



Figure 1: Schematic view of one of the 3 TPCs and picture of a 36 cm x 34 cm Micromegas readout module.

### **II. ELECTRONICS SYSTEM OVERVIEW**

# A. Requirements and Constraints

To reach the required track reconstruction precision, the anode of each Micromégas detector is segmented into 1728 pads of 9.8 mm x 7 mm, resulting in a total of 124,416 signals to read. For each pad, the current signal is collected, shaped and recorded (synchronously for all the TPCs) for a duration corresponding to the maximum drift time in the TPC. The X and Y coordinates of the track are then reconstructed by computing the centroid of the charges recorded on the hit pads, while the Z coordinate is determined from the drift time of the electrons in the gas volume, computed from the ~500-sample waveform recorded for each pad. The maximum drift time in the TPC can vary from 10 µs to 500 µs depending on the gas used. For this reason, the sampling frequency must be adjustable from 1 MHz to 50 MHz. The charge delivered by a pad for a Minimum Ionizing Particle (MIP) is typically a few tens of fC, depending on the Micromégas high voltages. A maximum dynamic range of 10 MIP is required, with a non-linearity smaller than 1% (in the 1-3 MIP range) and a signal-to-rms-noise ratio of 100 for the MIP signal, for accurate centroid calculation.
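The channel count and the sampling-frequency range quoted above follow from simple arithmetic. The sketch below is only a cross-check: it assumes the geometry given in the introduction (3 TPCs, 2 endplates each, 12 modules per endplate) and assumes that the full 511-cell memory depth is spread over exactly one maximum drift time. The function and constant names are illustrative and not taken from the paper.

```python
# Back-of-the-envelope check of the readout requirements.
# Assumption: the 511 SCA cells cover exactly one maximum drift time.

PADS_PER_MODULE = 1728
MODULES_PER_ENDPLATE = 12
ENDPLATES_PER_TPC = 2
N_TPC = 3
SCA_CELLS = 511


def total_channels() -> int:
    """Number of pads to read in the three TPCs."""
    return PADS_PER_MODULE * MODULES_PER_ENDPLATE * ENDPLATES_PER_TPC * N_TPC


def sampling_frequency_mhz(max_drift_time_us: float) -> float:
    """Sampling frequency (MHz) that fills the SCA in one drift time."""
    return SCA_CELLS / max_drift_time_us


print(total_channels())               # 124416
print(sampling_frequency_mhz(10.0))   # about 51 MHz for the fastest gas
print(sampling_frequency_mhz(500.0))  # about 1 MHz for the slowest gas
```

Under these assumptions the two extreme drift times reproduce the 1 MHz to 50 MHz range required above.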

The neutrino beam is pulsed; there is one spill every $\sim$3.5 s. The TPCs require an external trigger signal and must be able to capture all beam spills and calibration events (cosmic rays and internal illumination by a laser) at up to 20 Hz. The maximum allowable dead time for acquiring an event is 50 ms. In addition to these functional requirements, the front-end part operates in a modest magnetic field (0.2 T) with limited space available, a low power budget and no access during operation. There is no special constraint concerning radiation.

# B. Architecture of the Electronics

After the signals from the detector have been collected, filtered and sampled, the main function of the electronics is to reduce and smooth the huge data flow coming out of the front-end, which reaches 50 Tbps during the drift time in the gas, down to values compatible with the DAQ. For this purpose, the electronics takes advantage of the low event rate.



Figure 2: TPC readout flow.

The on-detector electronics, located inside the magnet, is based on a modular electronics unit, depicted in Figure 3, reading one whole Micromégas module. This unit, connected directly to the anodes, is composed of 6 Front-End Cards (FECs) and one Front-End Mezzanine (FEM) card. Each 288-channel FEC houses input spark protections, 4 custom-made 72-channel "AFTER" front-end chips (ASIC For TPC Electronic Readout) and a commercial 12-bit quad-channel ADC. The ASIC collects and filters the detector signals and samples them continuously in an analog memory, based on a Switched Capacitor Array (SCA), until an external stop signal, tagging the end of the drift time, arrives. Then, taking advantage of the inter-spill time, the analog data from all the channels of the chip are multiplexed towards one of the four channels of the external ADC, thus achieving a first 72-to-1 data concentration.

This scheme decouples the sampling frequency (settable from 1 MHz to 100 MHz) from the digitization and digital data processing clock frequencies (which are set at a fixed value). The FEM is a digital electronics card that controls up to 6 FECs, gathers the events digitized by the FECs, optionally performs pedestal subtraction and zero suppression, and sends the data out of the detector through a full-duplex gigabit optical link. Outside the detector, 6 Data Concentrator Cards (DCCs) aggregate the data of the TPC endplates and send event fragments to a merger computer that performs a final data reduction and communicates with the experiment DAQ system via a standard network connection. At the DAQ level, the data have been reduced to less than 250 kbyte/event.
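To give a feeling for the amount of data reduction involved, the following sketch estimates the size of one unreduced event and the equivalent data rate while the analog memories are being written. It assumes that every one of the 511 cells of every channel is digitized on 12 bits; the 33 MHz sampling frequency used in the last line is our own assumption, chosen only because it gives a figure close to the 50 Tbps quoted above.

```python
# Rough data-volume estimate for one TPC event.
# Assumption: all 511 SCA cells of all channels are digitized on 12 bits.

N_CHANNELS = 124_416
SCA_CELLS = 511
ADC_BITS = 12


def raw_event_bytes() -> float:
    """Size of one unreduced event in bytes."""
    return N_CHANNELS * SCA_CELLS * ADC_BITS / 8


def instantaneous_rate_tbps(sampling_mhz: float) -> float:
    """Equivalent digital data rate (Tbit/s) while the SCA is being written."""
    return N_CHANNELS * ADC_BITS * sampling_mhz * 1e6 / 1e12


print(raw_event_bytes() / 1e6)        # about 95 MB before any reduction
print(raw_event_bytes() / 250e3)      # reduction factor of a few hundred
print(instantaneous_rate_tbps(33.0))  # about 49 Tbit/s
```

With these assumptions, reaching 250 kbyte/event corresponds to a reduction of the raw event size by a factor of roughly 380.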



Figure 3: Front-end electronics of one Micromégas module.

# III. THE AFTER CHIP

### A. Description and Architecture.

The AFTER chip is the central component of the FEC board. It performs a first concentration of the data from 72 inputs to a single analog output connected to an external ADC. Defined before the final choice of the detector, it was developed to accommodate various kinds of detectors and gas mixtures. For this reason, it is very versatile: its main parameters can be set, using a slow-control serial link, to match the detector parameters. For instance, 4 different gains are selectable to adapt the chip range to the detector gain, and its shaping time and sampling frequency can be chosen to match the drift time in the gas. Moreover, the chip can handle both signal polarities, to be compatible with wire chamber readout, and is usable with a wide range of input capacitances, even though it is optimized for 20 pF, which is the nominal value expected for the detector and routing. Several test modes are available, allowing one or several channels to be pulsed with a known charge for test or calibration purposes. The main chip specifications are summarized in Table 1.

| Parameter                   | Value                             |
|-----------------------------|-----------------------------------|
| Number of channels          | 72                                |
| Samples per channel         | 511                               |
| Dynamic range               | 2 V / 10 MIPs on 12 bits          |
| MIP charge                  | 12 fC to 60 fC                    |
| MIP/noise ratio             | 100                               |
| Gain                        | 4 values from 4 mV/fC to 18 mV/fC |
| "Detector" capacitor range  | 0 pF to 40 pF                     |
| Peaking time                | 100 ns to 2 µs (16 values)        |
| INL                         | 1% (0-3 MIPs); 5% (3-10 MIPs)     |
| Sampling frequency          | 1 MHz to 100 MHz                  |
| Readout frequency           | 20 MHz to 25 MHz                  |
| Polarity of detector signal | Negative (T2K) or positive        |
| Test                        | 1 among 72 channels or all        |

The architecture of AFTER is shown in Figure 4 and a detailed description can be found in [5]. Each of its 72 channels comprises a front-end part, dedicated to the charge collection and the shaping of the detector signal, followed by a Switched Capacitor Array (SCA) that samples and stores the analog signal.

The front-end part is made of:

- an NMOS-input Charge Sensitive Amplifier (CSA) with a folded cascode architecture, continuously reset by a resistor virtually multiplied by an attenuating current conveyor.

- a pole-zero amplifier, using a branch of the current conveyor to cancel the CSA dominant pole. It also amplifies the CSA output signal by a factor between 6 and 30, depending on the gain setting, and realises the first pole of the shaper.

- a Sallen-Key filter with two complex poles, producing a relatively narrow response with a very small undershoot (0.8%).

- an inverting voltage amplifier doubling the signal and driving the SCA.



Figure 4: Architecture of the AFTER Chip

Each channel includes a 511-cell SCA using 4-switch high dynamic range analog memory cells [6] and a read amplifier. Four extra similar channels are available for optional common-mode or fixed-pattern noise rejection (not used for T2K operation). Each SCA channel operates as a 511-cell circular analog buffer in which the signal coming out of each analog channel is continuously sampled and stored at the sampling rate $F_{wck}$ (up to 100 MHz). When a stop signal is received, the SCA state is frozen and the analog data are sequentially read and multiplexed, column by column, towards an external commercial 12-bit ADC converting at a 20 MHz rate. The SCA can be totally or partially read. Reading out the whole memory takes 2 ms, corresponding to a fixed dead time.
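The circular-buffer behaviour described above can be illustrated with a small software model. This is only a toy sketch of the principle, not the control logic of the chip: the class and method names are hypothetical, and the readout-time estimate assumes that the 72 signal channels plus the 4 extra channels are all read through a single ADC input at 20 MHz.

```python
from typing import List

# Toy model of one SCA channel: a 511-cell circular buffer frozen by a stop signal.
# Illustrative only; names and the 76-channel readout assumption are ours.

SCA_CELLS = 511


class ScaChannel:
    def __init__(self, depth: int = SCA_CELLS) -> None:
        self.cells = [0.0] * depth
        self.write_ptr = 0      # next cell to be overwritten
        self.frozen = False

    def sample(self, value: float) -> None:
        """Store one sample; ignored once the buffer is frozen."""
        if self.frozen:
            return
        self.cells[self.write_ptr] = value
        self.write_ptr = (self.write_ptr + 1) % len(self.cells)

    def stop(self) -> None:
        """Freeze the buffer content (external stop signal)."""
        self.frozen = True

    def read(self) -> List[float]:
        """Return the stored samples, oldest first."""
        return self.cells[self.write_ptr:] + self.cells[:self.write_ptr]


def readout_time_ms(n_channels: int = 76, cells: int = SCA_CELLS,
                    adc_mhz: float = 20.0) -> float:
    """Time to digitize every cell of every channel through one ADC input."""
    return n_channels * cells / (adc_mhz * 1e6) * 1e3


ch = ScaChannel()
for i in range(600):          # write more samples than there are cells
    ch.sample(float(i))
ch.stop()
ch.sample(-1.0)               # ignored: the buffer is frozen
data = ch.read()
print(data[0], data[-1])      # 89.0 599.0
print(readout_time_ms())      # about 1.9 ms
```

After the stop signal only the last 511 samples survive, and the estimated full readout time is consistent with the 2 ms dead time quoted above.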

# **B.** AFTER Chip Performances

The AFTER chip has been manufactured using the 0.35 µm CMOS technology from AMS. The chip integrates 400,000 transistors on a 58 mm<sup>2</sup> area and is packaged in a 160-pin LQFP package. 5300 chips have been produced with a parametric yield of 89%. 1728 of them are used to read the TPCs of T2K. 300 chips are also used, with different slow-control parameters, to read the Silicon Photomultipliers (MPPCs) of the T2K 280 m Fine-Grain Detectors.

All the measured characteristics fulfil the design specifications. The power consumption is 7 mW/channel. The peaking time and the shape of the signal (shown in Figure 5) correspond to our expectations, as do the dynamic range and the integral non-linearity (better than 1.2% over all four ranges).



Figure 5: 60 fC test pulses recorded by AFTER with various peaking times (120 fC range).

The chip even operates perfectly at a 100 MHz write frequency although it has been designed for a target of 50 MHz.

A complete noise characterization has been made by varying the input capacitance and the shaping time. It has been used to extract a detailed noise parameterization, reported and discussed in [5]. The parameters of a linear approximation of the ENC versus input capacitance function, valid in the 15 pF to 40 pF range, are given in Table 2.

Table 2: Parameters for the linear approximation of the ENC versus detector capacitance characteristic for the various ranges and various peaking times. Approximation is usable in the 15 pF to 40 pF range.

| Range  | Parameter | 100 ns | 200 ns | 500 ns | 2 µs | Unit  |
|--------|-----------|--------|--------|--------|------|-------|
| 120 fC | Offset    | 350    | 370    | 415    | 404  | e-    |
|        | Slope     | 22.2   | 14.6   | 7.8    | 5.3  | e-/pF |
| 240 fC | Offset    | 690    | 700    | 775    | 750  | e-    |
|        | Slope     | 13     | 8.5    | 4.5    | 3.1  | e-/pF |
| 360 fC | Offset    | 1015   | 1050   | 1135   | 1092 | e-    |
|        | Slope     | 10.7   | 5.6    | 3      | 2.8  | e-/pF |
| 600 fC | Offset    | 1700   | 1740   | 1817   | 1780 | e-    |
|        | Slope     | 6.5    | 3.2    | 3.3    | 1.8  | e-/pF |
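As an illustration of how the parameters of Table 2 can be used, the sketch below evaluates the linear approximation ENC = offset + slope × C for a given range, peaking time and input capacitance. The dictionary layout and the function name are our own choices; only the numerical values come from the table.

```python
# Linear ENC model: ENC = offset + slope * C_in, using the values of Table 2.
# The approximation is stated to be usable between 15 pF and 40 pF only.

ENC_PARAMS = {  # range (fC) -> peaking time (ns) -> (offset in e-, slope in e-/pF)
    120: {100: (350, 22.2), 200: (370, 14.6), 500: (415, 7.8), 2000: (404, 5.3)},
    240: {100: (690, 13.0), 200: (700, 8.5), 500: (775, 4.5), 2000: (750, 3.1)},
    360: {100: (1015, 10.7), 200: (1050, 5.6), 500: (1135, 3.0), 2000: (1092, 2.8)},
    600: {100: (1700, 6.5), 200: (1740, 3.2), 500: (1817, 3.3), 2000: (1780, 1.8)},
}


def enc_electrons(range_fc: int, peaking_ns: int, c_in_pf: float) -> float:
    """Equivalent noise charge (electrons rms) from the linear approximation."""
    if not 15.0 <= c_in_pf <= 40.0:
        raise ValueError("linear approximation only valid from 15 pF to 40 pF")
    offset, slope = ENC_PARAMS[range_fc][peaking_ns]
    return offset + slope * c_in_pf


print(enc_electrons(120, 200, 20.0))  # about 662 electrons rms
```

For the 120 fC range, a 200 ns peaking time and the nominal 20 pF input capacitance, the model gives about 660 electrons rms.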

Figure 6 shows this characterization for the 120 fC range. For input capacitances smaller than 30 pF and shaping times shorter than 200 ns, which are the parameters foreseen for operation with the TPCs of T2K, the noise is smaller than 1000 e- rms, which was our target.

The on-chip crosstalk has been measured. It has a derivative shape and its amplitude is less than ±0.4%, decreasing with the distance between channels. The voltage droop in the SCA is less than 1 ADC bin (164 electrons for the 120 fC range, or 1/4096 of the whole dynamic range) within 2 ms, with a mean value of 0.29 ADC bin. This effect remains negligible compared to the noise. The excellent uniformity is emphasized by the distribution of the ENC for the 41,500 channels of the first equipped TPC, shown in Figure 8. The mean ENC over the whole TPC is 720 electrons with a spread of only 28 electrons rms. As the maximum signal is 120 fC, the dynamic range is 1040, corresponding to slightly more than 10 bits rms.
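The dynamic range figure quoted above can be reproduced by converting the 120 fC full scale into electrons and dividing by the mean ENC. The short sketch below does this; the function name is illustrative.

```python
import math

ELECTRON_CHARGE_C = 1.602176634e-19  # elementary charge in coulombs


def dynamic_range(max_signal_fc: float, enc_electrons: float) -> float:
    """Ratio of the maximum signal to the rms noise, both in electrons."""
    max_signal_electrons = max_signal_fc * 1e-15 / ELECTRON_CHARGE_C
    return max_signal_electrons / enc_electrons


dr = dynamic_range(120.0, 720.0)
print(round(dr))       # 1040
print(math.log2(dr))   # slightly above 10 bits
```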



Figure 6: ENC versus input capacitance for different peaking times (120 fC range).

# C. AFTER on-Detector Performances

The performance of the AFTER chip is unchanged when it is soldered on the FEC and plugged onto the detector. In particular, the average rms noise measured for the complete chain in operating conditions on the TPC field cage is less than 800 electrons, corresponding to 5 ADC counts, for the 120 fC range and a 200 ns shaping time. Figure 7 shows a typical map of the rms noise of the 1728 channels ($48 \times 36$) of one detector module. The very small dispersion is due to differences in routing, and hence in input capacitance (which ranges from 7 pF to 17 pF), between the detector and the corresponding input of an AFTER chip.



Figure 7: Map of the ENC on a typical Micromégas detector (120 fC range, 200 ns peaking time). 1 ADC bin corresponds to 164 e-.



Figure 8: Distribution of the ENC for the 41,500 channels of the first equipped TPC (120 fC range, 200 ns peaking time). 1 ADC bin corresponds to 164 e-.

The only 2 pads exhibiting pathological noise are short-circuited on the detector. Inter-channel capacitance, due mainly to the routing, slightly increases the crosstalk to 1.2%, which is still a reasonable value. Extensive characterizations of the electronics associated with the detectors have been made using radioactive sources before their integration on the TPC at TRIUMF. A <sup>55</sup>Fe spectrum measured with an AFTER-read Micromégas is shown in Figure 9. The 8.5% rms resolution measured on the 5.9 keV line of iron is intrinsic to the detector itself and similar to that obtained with high-performance commercial preamplifiers.



Figure 9: <sup>55</sup>Fe Spectrum acquired with AFTER (200 ns peaking time, 120 fC range).

The 3 TPCs have been equipped at TRIUMF, and extensive studies with cosmic rays, the calibration laser and a test beam have been successfully carried out there before shipping them to Japan, where they will start taking data at the end of 2009.

One of the first cosmic events measured by the first TPC is displayed in Figure 10.



Figure 10: Cosmic event measured with the first TPC.

# IV. CONCLUSIONS

A new front-end ASIC has been designed to read the Micromégas endplates of the TPCs of T2K. Its low-noise performance fulfils the requirements initially defined for the experiment (10 bit rms dynamic range). Its architecture, combining 72 channels with a very low noise front-end and an SCA inside the same chip, offers a compact, reliable and low-power solution. The whole electronics based on this ASIC for the TPCs of T2K has been produced, tested and integrated on the detectors and is now ready for commissioning. In spite of its limitations (fixed 2 ms dead time in case of full readout and need for an external trigger), the AFTER chip is now routinely used to test MPGDs and even other types of detectors, because of its versatility, its ease of use and the access it gives to the signal waveform.

# V. REFERENCES

[1] Y. Itow, et al., hep-ex/0106019

[2] The Super-Kamiokande Collaboration, "The Super-Kamiokande Detector", Nucl. Instrum. Meth. A501, 2003, pp. 418-462.

[3] Y. Kudenko, "The near neutrino detector for the T2K experiment", in Proc. INSTR08, Novosibirsk, 5 March 2008. Online: http://www.nd280.org

[4] J. Bouchez, et al., "Bulk Micromegas detectors for large TPC applications", Nucl. Instrum. Methods, vol 574 pp. 425-432, 2007.

[5] P. Baron et al., "AFTER, an ASIC for the Readout of the Large T2K Time Projection Chambers", IEEE TNS vol. 55, Issue 3, Part 3, June 2008, pp. 1744 – 1752.

[6] D. Breton et al., "A 16 bit-40 MHz readout system based on dual port analog memories for LHC experiments," in Proc. 2nd Workshop on Electronics for LHC Experiments, Balatonfüred, Hungary, Sep. 23–27, 1996, pp. 88–96.

# The Online Error Control and Handling of the ALICE Pixel Detector

M. Caselle<sup>a,b</sup>, A. Kluge<sup>a</sup>, C. Torcato De Matos<sup>a</sup>

<sup>a</sup> CERN, CH-1211 Geneva 23, Switzerland <sup>b</sup> Università Degli Studi di Bari, I-70126, Bari, Italy

On behalf of the Silicon Pixel Detector Project

michele.caselle@cern.ch

## Abstract

The SPD forms the two innermost layers of the ALICE Inner Tracking System (ITS) [1]. The basic building block of the SPD is the half-stave, the whole SPD barrel being made of 120 half-staves with a total number of 9.8 x 10<sup>6</sup> readout channels. Each half-stave is connected via three optical links to the off-detector electronics made of FPGA based VME readout cards (Routers). The Routers and their mezzanine cards provide the zero-suppression, data formatting and multiplexing and the link to the DAQ [2] system. This paper presents the hardware and software tools developed to detect and process any errors, at the level of the Router, originating from either front-end electronics, trigger sequences, DAQ or the off-detector electronics. The on-line error handling system automatically transmits this information to the Detector Control System and to the dedicated ORACLE database for further analysis.

# I. INTRODUCTION

The SPD status and performance can be affected by a variety of hardware malfunctions, such as perturbations or failures in the cooling or power supply systems, Single/Multiple Event Upset or Single Event Transients, degradation of optical connections, wrong front-end or back-end configurations, faulty trigger and timing sequences from Central Trigger Processor (CTP) [3], spurious/missing signals, DAQ optical link not ready, etc.

To detect and manage these anomalous conditions, a new system named "error handling system" has been developed and fully integrated in the readout firmware and control software. It consists of hardware and software tools to detect and process, at the level of the Router, errors originating from the SPD subsystems. Errors are brought to the attention of the operator and are displayed as alarms in the Detector Control System user interface.

A statistical analysis (histograms, cross-correlations, etc.) of the different error types can be done using the ORACLE database to evaluate the main error sources in the SPD hardware. This will allow the stability of the SPD to be monitored over its lifetime in the ALICE experiment.

The error detection system was thoroughly tested in the integration lab using final system components and was then implemented in the ALICE experiment. This paper presents the hardware and software tools developed in order to recognize and process errors in the SPD. The first operation experience in the experiment is also reported.

# II. OVERVIEW OF THE SILICON PIXEL DETECTOR

The ALICE experiment at the LHC is designed to investigate high-density strongly interacting matter in nucleus-nucleus interactions. In order to provide high-granularity tracking information close to the interaction point in this high-multiplicity environment, the two innermost layers of the ALICE detector are formed by the Silicon Pixel Detector (SPD). It consists of two barrels, at radii of 3.9 cm and 7.6 cm from the interaction point, of hybrid pixel cells of dimensions 50 µm ($r\Phi$) × 425 µm ($z$) that cover a total surface of 0.24 m<sup>2</sup>. The requirements in radiation hardness and the challenging material budget and dimensional constraints have led to specific technology developments and novel solutions. The LV power supply requirements for each half-stave are 1.85 V @ 5.5 A for the front-end chips and 2.6 V @ 0.5 A for the MCM; the total power dissipation of the SPD is about 1.5 kW. The cooling system is based on an evaporative system with $C_4F_{10}$. The SPD can provide a trigger input signal to the ALICE Central Trigger Processor (CTP) using the built-in Fast-OR functionality: in each chip, an electric pulse is fired whenever a hit is detected in a cell.



Figure 1: The outer layer of the SPD and half-stave view

The following section gives an overview of the ALICE Silicon Pixel Detector, with emphasis on the on-detector and off-detector electronics.

### A. Half-Stave and on-detector electronics

The main components of each half-stave are two silicon pixel sensors (ladders), glued and wire-bonded [4] to a low-mass Al-polyimide multi-layer flex (pixel bus), which at one end is attached to a Multi-Chip Module (MCM).

The ladder [5] is an assembly of a silicon sensor matrix of 256 x 160 cells bump-bonded to five readout front-end chips. The front-end pixel chip ALICE1LHCb [6,7] is an analog/digital mixed-signal ASIC produced in a commercial 6-metal-layer 0.25 µm CMOS process and made radiation tolerant by the design layout. It contains 8192 cells, arranged in 256 rows x 32 columns.
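The total channel count quoted in the abstract can be cross-checked from the figures given in this section (8192 cells per chip, five chips per ladder, two ladders per half-stave, 120 half-staves). The short sketch below does so; the constant names are illustrative.

```python
# Cross-check of the SPD channel count from the numbers given in the text.

ROWS_PER_CHIP = 256
COLUMNS_PER_CHIP = 32
CHIPS_PER_LADDER = 5
LADDERS_PER_HALF_STAVE = 2
HALF_STAVES = 120

cells_per_chip = ROWS_PER_CHIP * COLUMNS_PER_CHIP
cells_per_ladder = cells_per_chip * CHIPS_PER_LADDER
total_channels = cells_per_ladder * LADDERS_PER_HALF_STAVE * HALF_STAVES

print(cells_per_chip)    # 8192
print(cells_per_ladder)  # 40960, i.e. the 256 x 160 sensor matrix
print(total_channels)    # 9830400, about 9.8 x 10^6
```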

The MCM contains four radiation-tolerant ASICs developed at CERN in a commercial 0.25 µm CMOS process: the Digital Pilot [8], the Analog Pilot, the RX40 [9] and the GOL (Gigabit Optical Link) [10, 11]. It also contains an ST-Microelectronics optical transceiver (a custom development) containing 2 PIN diodes and a 1300 nm laser diode. The connection between the off-detector readout electronics and each half-stave is made via three optical fiber links: one link for the 40 MHz LHC clock, one for the serial trigger, control and configuration signals and one 800 Mbit/s G-link for the data transmission from the detector. The half-stave block diagram is shown in Figure 2.



Figure 2: Half-Stave block diagram

The Digital Pilot performs the readout of the 10 ALICE1LHCb pixel chips and the formatting of the readout data. The GOL receives the readout data from the Digital Pilot on a 40 MHz, 16-bit bus and serializes them into an 800 Mbit/s G-link compatible stream. The Digital Pilot also broadcasts the clock and controls all the ASICs present on the half-stave according to the commands received from the control room via the "serial data" optical fiber. This fiber is connected to the PIN diodes in the optical package, and an RX40 chip converts the commands into LVDS signals. The Analog Pilot provides the voltage references for the ALICE1LHCb pixel chips and monitors voltages and temperatures on the half-stave.
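The relation between the bus parameters and the link rate quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text (40 MHz, 16 bits, 800 Mbit/s); reading the difference as per-word protocol overhead is our own interpretation and is not stated in the paper.

```python
# Payload versus line rate of the data link, from the figures in the text.

BUS_CLOCK_MHZ = 40
BUS_WIDTH_BITS = 16
LINE_RATE_MBPS = 800

payload_mbps = BUS_CLOCK_MHZ * BUS_WIDTH_BITS            # useful data rate
bits_per_word_on_link = LINE_RATE_MBPS // BUS_CLOCK_MHZ  # serial bits per 25 ns word
overhead_bits = bits_per_word_on_link - BUS_WIDTH_BITS   # assumed protocol overhead

print(payload_mbps)           # 640
print(bits_per_word_on_link)  # 20
print(overhead_bits)          # 4
```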

# B. Off-detector electronics (Router and LinkRX)

The off-detector electronics consists of 20 VME FPGA-based processor modules (Routers), each carrying three 2-channel link receiver (LinkRX) daughter-cards, one Detector Data Link (DDL) and a trigger/timing receiver chip (TTCRx) [12]. The main processor on the 10-layer motherboard is a 1020-pin Altera Stratix EP1S30 chip. A fully equipped Router is shown in Figure 4. Each FPGA-based mezzanine Link Receiver card (LinkRX) serves two half-staves. It receives the trigger signals and configuration patterns from the Router and propagates them to the half-staves. The readout chain of a LinkRX is shown in Figure 3. During the readout phase, the pixel data stream from the half-staves is deserialized by an Agilent HDMP1034 device [13]. The received data are checked for format errors (described in the next section) and stored in a buffer FIFO, then zero-suppressed, encoded, re-formatted in the ALICE DAQ format [14] and written to a dual-port memory.

When all data from one event are stored in the dual-port memory, the link receiver asserts an event-ready flag to be read by the Router processor.



Figure 3: Link Receiver block diagram

The Router receives the trigger control signals from the ALICE Central Trigger Processor (CTP) through the on-board TTCrx chip and forwards the trigger commands to the pixel detector. In the Router FPGA the L0 signal, L1 signal, L1 message, L2 message are decoded.



Figure 4: Router fully equipped with three LinkRXs, one DDL card and one TTCRx chip.

The ALICE trigger has three levels (L0, L1 and L2), whereas the SPD system uses the L1 and L2 triggers only. The ALICE1LHCb pixel chips provide binary hit information, which is stored in a delay line during the L1 decision time. In case of a positive L1 decision, the hit is stored in one of four multi-event buffers, where the data wait for the L2 decision to be read out or discarded. After reception of a positive L2 decision, the Router starts to check the event-ready flag in the status register of the link receivers. When an event-ready flag appears, the Router processor reads the data from the link receiver dual-port memory. The link receiver also passes to the Router processor the error flags identified in the data stream coming from the detector, as described in the next section. Each Router sequentially reads one event from each of the link receiver channels in order to merge the data coming from 6 half-staves, and labels them with trigger and status information to build one Router sub-event. The sub-events of all Routers are sent to the ALICE DAQ system through the ALICE Detector Data Link (DDL).

The data access for the control and configuration of the on-detector electronics is performed via the Router VME interface. The Router converts the data into JTAG-compatible commands, which are sent to the detector through the optical links with a maximum data rate of 5 Mbit/s.

# C. Control System

The operation of the ALICE SPD requires the on-line control and monitoring of a large number of parameters. This task is performed by the SPD Detector Control System (DCS). It is based on a commercial Supervisory Control And Data Acquisition (SCADA) system named PVSS. Five PVSS projects run independently on different working nodes: four control, respectively, the cooling system, the Power Supply (PS) system, the interlock and monitoring system and the FE electronics; the fifth links together and monitors the four subsystem projects. The interface between PVSS and the VME Router racks is provided by the Front End Device (FED) servers, which are custom standalone C++ applications.

# III. ON-LINE ERROR CONTROL AND HANDLING

A dedicated on-line error handling system, consisting of hardware and software tools, has been developed to detect and manage any anomalous conditions arising from possible malfunctions in the various SPD subsystems. Error flags and information are reported to the operator and are displayed as alarms in the Detector Control System user interface. In addition, two bits in the ALICE data format Common Data Header (CDH) [14] are used to inform the Experiment Control System (ECS) [15] that an anomalous condition is present so that, according to the ECS-DAQ policy, the data taking can be stopped when a predefined number of errors has been detected.

All error conditions are divided into classes, and each class is associated with one error level. The error levels are **fatal**, **error** and **warning**. The **fatal** level is asserted when the trigger sequence is not coherent, when the event data taking shows inconsistencies, or when a severe malfunction is detected in a half-stave. In this case a bit is set in the CDH in order to notify the ECS-DAQ system. The **error** level is asserted when a wrong condition is detected in a half-stave, or in the on-detector or off-detector electronics, but the purity of the data taking remains acceptable. The **warning** level is used to inform the operator that an error condition is likely to arise. A typical example is when the temperature of a half-stave increases towards the threshold limit.

The error message is sent in an error block, whose format is shown in Figure 5. It consists of four 32-bit words that contain all the information necessary to identify both the error type and the hardware part of the SPD that is affected. The error messages include timing reference information, such as the bunch and orbit numbers, in order to identify the events in which the errors have been detected.

| Word   | Field 1                        | Field 2          | Field 3               |
|--------|--------------------------------|------------------|-----------------------|
| Word 1 | Start error header             | Order of arrival | Bunch Crossing Number |
| Word 2 | Error Class                    | Error Details_1  |                       |
| Word 3 | Error Details_2                |                  |                       |
| Word 4 | Error Details_3 / Orbit Number | Trailer error    |                       |

Figure 5: Error data format (error block)
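To make the error block layout more concrete, the following sketch packs and unpacks such a block as four 32-bit words. Only the order of the fields is taken from Figure 5. The field widths, the bit positions and the two marker values (`START_MARKER`, `END_MARKER`) are purely hypothetical choices made for illustration, as is the `ErrorBlock` class itself.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical field widths and marker values: the text gives only the field order.
START_MARKER = 0xEB   # assumed 8-bit start-of-error header
END_MARKER = 0xEE     # assumed 8-bit trailer


@dataclass
class ErrorBlock:
    order: int            # order of arrival, assumed 12 bits
    bunch_crossing: int   # bunch crossing number, assumed 12 bits
    error_class: int      # assumed 8 bits
    details_1: int        # assumed 24 bits
    details_2: int        # assumed 32 bits
    orbit: int            # Error Details_3 / orbit number, assumed 24 bits

    def pack(self) -> List[int]:
        """Return the block as four 32-bit words."""
        w1 = (START_MARKER << 24) | ((self.order & 0xFFF) << 12) | (self.bunch_crossing & 0xFFF)
        w2 = ((self.error_class & 0xFF) << 24) | (self.details_1 & 0xFFFFFF)
        w3 = self.details_2 & 0xFFFFFFFF
        w4 = ((self.orbit & 0xFFFFFF) << 8) | END_MARKER
        return [w1, w2, w3, w4]

    @classmethod
    def unpack(cls, words: List[int]) -> "ErrorBlock":
        """Rebuild a block from four words, checking header and trailer."""
        w1, w2, w3, w4 = words
        if (w1 >> 24) != START_MARKER or (w4 & 0xFF) != END_MARKER:
            raise ValueError("missing start header or trailer")
        return cls(
            order=(w1 >> 12) & 0xFFF,
            bunch_crossing=w1 & 0xFFF,
            error_class=(w2 >> 24) & 0xFF,
            details_1=w2 & 0xFFFFFF,
            details_2=w3,
            orbit=(w4 >> 8) & 0xFFFFFF,
        )


block = ErrorBlock(order=3, bunch_crossing=1234, error_class=5,
                   details_1=0xABCDEF, details_2=0xDEADBEEF, orbit=0x123456)
words = block.pack()
print([hex(w) for w in words])
print(ErrorBlock.unpack(words) == block)  # True
```

Framing each block with a start header and a trailer, as in Figure 5, lets the reader of the memory reject truncated or misaligned blocks, which is what the check in `unpack` illustrates.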

The architecture of the new error handling subsystem integrated in the SPD system is shown in Figure 6. It consists of a software layer and a hardware layer.



Figure 6: Error handling architecture

All error conditions detected at the different hardware levels are captured and identified by additional Finite State Machines, implemented in the LinkRX and Router FPGAs, that complement the off-detector data handling. The errors are formatted as shown in Figure 5 and stored in a Single Port Memory (SPM) located on the Router board. The Router sets the "new errors present" flag on the VME bus. The Front-End Device (FED) server periodically polls the VME bus; when the error flag is detected, all error blocks are read from the Single Port Memory. A Single Port Memory is used for storing and reading out the errors in order to separate the error readout logic from the main data-taking process. The error blocks are recorded in the local ORACLE database together with the current run number and the error timestamp. The FED propagates an error flag to PVSS to warn the operator. The full error description is sent to the operator together with the corrective action needed to bring the detector back to a proper state. Storing all errors in the database keeps the entire SPD error log available, which is essential for future statistical studies.
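The polling and logging sequence described above can be sketched in a few lines. This is a schematic illustration only: the real FED server is a C++ application talking to the Routers over VME and to an ORACLE database, whereas the sketch uses a hypothetical `FakeRouter` class in place of the VME access and an in-memory SQLite database as a stand-in for ORACLE. The table layout is also an assumption.

```python
import sqlite3
import time
from typing import List


class FakeRouter:
    """Stand-in for one Router seen through the VME bus (hypothetical interface)."""

    def __init__(self) -> None:
        self._blocks: List[List[int]] = []

    def push_error(self, words: List[int]) -> None:
        """Simulate the firmware storing one 4-word error block."""
        self._blocks.append(words)

    def new_errors_present(self) -> bool:
        """Model of the 'new errors present' flag."""
        return bool(self._blocks)

    def read_error_blocks(self) -> List[List[int]]:
        """Read and clear all pending error blocks."""
        blocks, self._blocks = self._blocks, []
        return blocks


def make_db() -> sqlite3.Connection:
    """Create the (assumed) error log table in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE errors (run_number INTEGER, timestamp REAL, "
        "w1 INTEGER, w2 INTEGER, w3 INTEGER, w4 INTEGER)"
    )
    return db


def poll_once(router: FakeRouter, db: sqlite3.Connection, run_number: int) -> int:
    """Read pending error blocks, if any, and log them with run number and timestamp."""
    if not router.new_errors_present():
        return 0
    blocks = router.read_error_blocks()
    now = time.time()
    db.executemany(
        "INSERT INTO errors (run_number, timestamp, w1, w2, w3, w4) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [(run_number, now, *words) for words in blocks],
    )
    db.commit()
    return len(blocks)


db = make_db()
router = FakeRouter()
router.push_error([1, 2, 3, 4])
router.push_error([5, 6, 7, 8])
print(poll_once(router, db, run_number=42))  # 2
print(poll_once(router, db, run_number=42))  # 0, nothing new to read
```

In a real server `poll_once` would be called periodically for every Router; a warning to the operator interface would be raised whenever it returns a non-zero count.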

## A. Software layer

The software layer consists of a low tier and a high tier. The low tier is a driver written in C++ and added to the Front End Device (FED) server. It establishes the communication with the hardware units (Routers) and transmits the error information to the dedicated ORACLE database. The local database is structured so that the errors can be stored, and a first processing of the error data executed, quickly. For each error class, one action on the detector can be taken in order to re-establish the proper SPD status. This association is also implemented at the database level by means of a dedicated look-up table.

The high tier of the software layer consists of a custom application written in the ALICE Supervisory Control and Data Acquisition (SCADA) system, PVSS. This application allows the operator to receive both the error message and the error duration, since the hardware implementation is able to evaluate whether the error condition is still present or has disappeared. A statistical analysis of the different error types can be done using the database.



Figure 7: PVSS error handling User Interface

Figure 7 shows the graphical user interface developed in the PVSS SCADA environment. The database queries allow error details to be selected by run, by Router or by error class.

### B. Hardware layer

The hardware tools for error detection consist of two stages implemented in Verilog modules that were added to the standard off-detector components handling the data acquisition in the LinkRX and the Routers. All error information is processed at 40 MHz.



Figure 8: Router FPGA firmware block diagram

The first stage, the "detection stage", is used to identify the possible error types in the SPD system, e.g. optical connection status and data format errors, front-end and back-end errors/status, SEE (Single Event Effects), wrong trigger sequences or missing/spurious trigger signals, etc. The second stage is used to handle the error information and transmit it to the SPD Front-End Device (FED) server via the VME bus.

The first stage consists mainly of an ad-hoc Finite State Machine designed to capture anomalies at the different hardware levels. More than 3200 potential error topologies have been identified in the full SPD. When an error condition is found in the LinkRX modules, the module asserts to the Router processor the error flags that will be processed in the second stage. The error classes defined in the LinkRX modules and originating from the pixel chips and the MCM are: busy violation, idle violation, Glink down error, Glink transmission error, Single Event Upset (SEU), control error, control detector feedback error and control pixel error. The anomalous conditions originating from the LinkRX readout modules (see figure 3) are FIFO overflow and memory overflow.

Busy violation is asserted when a 5th L1 trigger signal has been received by the on-detector electronics although all four multi-event buffers were full and the corresponding busy signal (sent to the trigger) was active. Idle violation is asserted when an L2 signal (either L2y or L2n) has been received by the on-detector electronics although no corresponding L1 signal was received. Glink down error is asserted when the data link was down during the event readout. Glink transmission error is asserted when the Glink receiver found an error in the transmission protocol during the readout of the corresponding event. SEU error is asserted when an SEU was detected and not recovered by the on-detector electronics. The control error is asserted when the MCM has not recognized a command. All control signals sent to the detector (L1, L2y, L2n, test signal, JTAG signals) are sent back on the fast link for error detection. The control detector feedback error is asserted if one of the signals sent to the detector was not received back between the previous and the current event readout. The control pixel error is asserted when an error occurred on the ALICE1LHCb pixel chips. The FIFO overflow is asserted when at least one of the pixel converter readout FIFOs was full at least once during the data readout. The memory overflow is asserted when at least one of the pixel converter readout memories was full at least once during the data readout. All these errors are considered "fatal" and the error information is sent to the DAQ together with the event data in the DAQ header [14].
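The busy and idle violation conditions described above can be illustrated with a short Python sketch. It is not the Verilog implementation used in the LinkRX; the function name `check_trigger_sequence` and the simplified buffer model (each L1 occupies one multi-event buffer, each L2 releases the oldest one) are assumptions made only for illustration.

```python
from collections import deque

MULTI_EVENT_BUFFERS = 4  # number of multi-event buffers quoted in the text


def check_trigger_sequence(signals):
    """Return a list of (index, violation) found in a trigger sequence.

    `signals` is a list of strings: "L1", "L2y" or "L2n".
    Simplified model (assumption): every L1 fills one buffer and every
    L2 (accept or reject) frees the oldest pending buffer.
    """
    pending = deque()  # indices of L1 triggers still waiting for an L2
    violations = []
    for i, sig in enumerate(signals):
        if sig == "L1":
            if len(pending) >= MULTI_EVENT_BUFFERS:
                # A 5th L1 arrived while all buffers were full.
                violations.append((i, "busy violation"))
            else:
                pending.append(i)
        elif sig in ("L2y", "L2n"):
            if not pending:
                # An L2 arrived without a corresponding L1.
                violations.append((i, "idle violation"))
            else:
                pending.popleft()
        else:
            raise ValueError(f"unknown trigger signal: {sig!r}")
    return violations


if __name__ == "__main__":
    print(check_trigger_sequence(["L1"] * 5))
    print(check_trigger_sequence(["L1", "L2y", "L2n"]))
```

In this model the first call reports a busy violation on the fifth L1 and the second call reports an idle violation on the unmatched L2n.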

The error classes defined in the Router FPGA main processor detect anomalous conditions in the trigger signals (CTP), in the state machines inside the Router FPGA, in the alignment between the half-stave reference clock and the LHC bunch number, in the data format, in the operation or configuration during the "Start of Run" sequence, in FastOr signals that are missing, noisy or not coherent with the data format, in half-stave temperatures close to the functional threshold limit, and more. The trigger signals and messages are checked, aligned and stored in the trigger FIFO inside the Router FPGA. A trigger error occurs when the trigger levels arrive in an illogical order or with bad timing, or when a signal is spurious or missing. A trigger error is considered "fatal" and the information is sent to the DAQ system in the DAQ header [14]. The FastOr signals generated by the pixel chips are synchronous with the SPD reference clock. To keep the FastOr signals coherent with the bunch crossing and orbit number, it is important to check the alignment between the SPD clock and the bunch in orbit during the "Start of Run" ECS sequence. When this alignment is not present, an error flag is set. This condition is classified as "error": the operator receives the associated error class and details, but no information is sent to the DAQ system. The FastOr setting is also checked by a special state machine that looks at the consistency between the hits present in the pixel matrix and the corresponding FastOr signal, which makes it possible to find both missing and noisy FastOr signals. Moreover, the half-stave temperatures are constantly monitored by the Routers; if the threshold limit is reached, the interlock signal is sent to the power supply. Efficient cooling is vital for this very low mass detector: in the case of a cooling failure, the detector temperature would increase at a rate of 1 °C/s.
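The FastOr consistency check mentioned above compares the hits read out from the pixel matrix with the FastOr signal of the same chip. A minimal Python sketch of that comparison follows; the function name `classify_fastor` and the per-chip granularity are assumptions, and the real check is a state machine in the Router FPGA.

```python
def classify_fastor(hits_in_matrix, fastor_asserted):
    """Classify one chip for one event.

    hits_in_matrix: number of hit pixels found in the readout data.
    fastor_asserted: True if the chip's FastOr signal was active.
    """
    if hits_in_matrix > 0 and not fastor_asserted:
        return "missing"   # hits were recorded but no FastOr was seen
    if hits_in_matrix == 0 and fastor_asserted:
        return "noisy"     # FastOr fired without any recorded hit
    return "ok"


if __name__ == "__main__":
    for hits, fastor in [(3, True), (2, False), (0, True), (0, False)]:
        print(hits, fastor, classify_fastor(hits, fastor))
```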

The second stage, the "Handling stage", consists of several modules that handle the error signals coming from the first stage. Its logical operations are to order the errors by priority level, to format them into the error block shown in figure 5 and to store them in the error FIFO (see figure 8). A special architecture has been implemented in order to process errors arriving at 40 MHz. The error signals generated in the first stage are collected by a module called the "Error Manager". Usually one error condition generates a cascade of secondary errors in both the LinkRX and the Routers, which are also registered by the error detection hardware. The Error Manager is based on priority encoder logic that selects both the error entity and the order of arrival; in this way the hardware can distinguish between the original error and its secondary effects and flags the cause of the problem. The Error Manager also performs the error formatting. The logic diagram of the second stage is shown in figure 9.



Figure 9: Error Manager logic diagram
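The selection performed by the Error Manager can be sketched in Python as follows. The sketch only illustrates the idea of picking the original error from a cascade by order of arrival and priority. The priority table, the error names used as keys and the function `select_primary_error` are hypothetical; the paper does not give the actual priority assignment.

```python
# Hypothetical priority table: a lower number means a higher priority.
PRIORITY = {
    "glink_down": 0,
    "glink_transmission": 1,
    "seu": 2,
    "fifo_overflow": 3,
    "memory_overflow": 4,
}


def select_primary_error(flags):
    """Pick the most likely original error from a cascade.

    flags: list of (arrival_cycle, error_name) tuples, where arrival_cycle
    counts 40 MHz clock cycles. The earliest flag wins; flags raised in the
    same cycle are resolved by the priority table.
    """
    if not flags:
        return None
    return min(flags, key=lambda f: (f[0], PRIORITY[f[1]]))


if __name__ == "__main__":
    cascade = [(12, "fifo_overflow"), (10, "seu"), (10, "glink_down")]
    print(select_primary_error(cascade))
```

With this example cascade the sketch selects the `glink_down` flag raised in cycle 10 and treats the later FIFO overflow as a secondary effect.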

Once the errors are stored in the FIFO, they are transferred to the Single Port Memory. Arbitration is used to manage write and read access to the Single Port Memory during a VME access. When all error blocks are stored in the memory, the "new error present" flag is set to inform the FED server. All operations are controlled by two dedicated Finite State Machines.

# IV. INTEGRATION AND COMMISSIONING

The first prototype of the on-line error handling system described here has been intensively tested and fully qualified in the laboratory by emulating the error patterns generated at 40 MHz. The on-line error handling system has been fully integrated and tested in the experiment. Testing and integration focused on compliance with the overall ALICE system (CTP and DAQ) during both the "Start of Run" and "End of Run" ECS sequences. Off-line statistical studies are carried out in order to monitor the SPD stability during operation in the experiment.

# V. REFERENCES

[1] ALICE Collaboration, ALICE Technical Design Report of the Inner Tracking System, CERN/LHCC 99-12, ALICE TDR 4.

[2] http://ph-dep-aid.web.cern.ch/ph-dep-aid/

[3] http://epweb2.ph.bham.ac.uk/user/krivda/alice/

[4] M. Caselle et al., Nucl. Instrum. Methods A 518 (2004) 297. Proceedings of the 9th Pisa Meeting on Advanced Detectors, La Biodola, Isola d'Elba, Italy, May 25-31, 2003.

[5] P. Riedler et al. "Recent test results of the ALICE silicon pixel detector", Proceedings of the VERTEX 2003 conference, NIMA 549 (2005) 65-69.

[6] K. Wyllie et al., Front-end pixel chips for tracking in ALICE and particle identification in LHCb, Proceedings of the Pixel 2002 Conference, SLAC Electronic Conference Proceedings, Carmel, USA, September 2002.

[7] R. Dinapoli et al., An analog front-end in standard 0.25 µm CMOS for silicon pixel detectors in ALICE and LHCb, CERN-2000-010, Proceedings of the Sixth Workshop on Electronics for LHC Experiments, Krakow, Poland, September 2000, p. 110.

[8] A. Kluge et al. "The ALICE on-detector pixel PILOT system – OPS", Proceedings of the Seventh Workshop on Electronics for LHC Experiments, Stockholm, Sweden, Sept 2001, CERN/LHCC/2001-034, p. 95.

[9] F. Faccio et al. "RX40 An 80Mbit/s Optical Receiver ASIC for the CMS digital optical link", Reference and Technical Manual, CERN, October 2001.

[10] P. Moreira, et al., A 1.25 Gbit/s Serializer for LHC Data and Trigger Optical links, Fifth Workshop on Electronics for LHC Experiments, CERN/LHCC/99-33, 29 October 1999, p. 194.

[11] P. Moreira et al. "G-Link and Gigabit Ethernet compliant Serializer for LHC Data Transmission", NSS-MIC 2000, Lyon, France, 15 - 20 Oct 2000 - pages 9/6-9/9 (v.2).

[12] J. Christiansen, A. Marchioro, P. Moreira, T. Toifl, TTCrx Reference Manual, A Timing, Trigger and Control Receiver ASIC for LHC Detectors, CERN EP/MIC http://ttc.web.cern.ch/TTC/

[13] Agilent Technologies, Agilent HDMP-1032, HDMP-1034, http://www.semiconductor.agilent.com, 5968-5909E (2/00).

[14] R. Divià, P. Jovanovic, P. Vande Vyvre, Data Format over the ALICE DDL, CERN/ALICE internal note, ALICEINT- 2002-010 V 2.0.

[15] http://alice-ecs.web.cern.ch/alice-ecs/alice\_title.htm

# Low power discriminator for ATLAS pixel chip

M. Menouni<sup>d</sup>, D. Arutinov<sup>a</sup>, M. Barbero<sup>a</sup>, R. Beccherle<sup>b</sup>, S. Dube<sup>c</sup>, R. Elledge<sup>c</sup>, D. Fougeron<sup>d</sup>, M. Garcia-Sciveres<sup>c</sup>, F. Gensolen<sup>d</sup>, D. Gnani<sup>c</sup>, V. Gromov<sup>e</sup>, T. Hemperek<sup>a</sup>, M. Karagounis<sup>a</sup>, R. Kluit<sup>e</sup>, A. Kruth<sup>a</sup>, A. Mekkaoui<sup>c</sup>, A. Rozanov<sup>d</sup>, J.D. Schipper<sup>e</sup>

<sup>a</sup> Physikalisches Institut, Universität Bonn, Nussallee 12, 53115 Bonn, Germany

<sup>b</sup> INFN Genova, via Dodecaneso 33, IT - 16146 Genova, Italy

<sup>c</sup> Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, United States of America

<sup>d</sup> CPPM/IN2P3/CNRS, Université de la Méditerranée, 163 avenue de Luminy, case 902, Marseille, France

<sup>e</sup> NIKHEF, National Institute for Subatomic Physics, Kruislaan 409, 1098 SJ Amsterdam, The Netherlands

menouni@cppm.in2p3.fr

# Abstract

The design of the front-end (FE) pixel electronics requires low power, low noise and low threshold dispersion. In this work, we propose a new architecture for the discriminator circuit. It is based on the principle of dynamic biasing and developed for the FE chip of the ATLAS pixel upgrade. This paper presents two discriminator structures where the bias current depends on the presence of a signal at the input of the discriminator. Since the activity in the FE chip is very low, the power consumption is largely reduced allowing the material reduction in the B-layer.

#### I. INTRODUCTION

A pixel FE chip is under development in a 130 nm CMOS technology for the B-layer replacement. The chip contains 26,880 pixels arranged in 80 columns and 336 rows. The pixel size is set to $50 \mu \text{m} \times 250 \mu \text{m}$.

The present pixel design uses a continuously biased discriminator in which the bias current is chosen to reach the required speed by minimizing the time delay. This allows the hits to be assigned to their corresponding bunch numbers with high probability.

In the analog pixel architecture, the discriminator power consumption can reach 20% of the total pixel power budget. Since the average counting rate for one pixel is low, it is possible to greatly reduce the power consumption of the pixel if the discriminator is biased only when a hit is present. This paper proposes an efficient way to design very low power discriminators for pixel detectors.

Two different architectures based on the dynamic biasing principle are proposed. In the first one, an input differential stage controls the bias of the main comparator stage. The input voltage signal is converted to a current signal used to bias the second stage after applying a multiplicative factor. The second architecture uses two stages. An auxiliary comparator with a lower threshold value powers up selectively the main comparator stage.

A prototype test chip has been designed as an array of 322 pixels and the different discriminator architectures are implemented in this design.

Section II describes the pixel structure and gives the main specifications. Section III describes the proposed discriminator architectures as well as the present one. Section IV is dedicated to the experimental results and to the comparison between the architectures in terms of propagation delay, power consumption, noise and dispersion.

### II. THE PIXEL STRUCTURE AND SPECIFICATIONS

The analog pixel readout chain foreseen for the FEI4 chip is shown in Figure 1. The pixel contains a fast charge preamplifier, a second stage amplifier, a discriminator and a logic block that transfers the hit information to the chip periphery. It is optimized for low noise, low power and fast rise time. The output signal of the second stage is coupled to a discriminator for comparison with a global threshold. Threshold tuning is provided by dedicated local DACs. Calibration of the analog pixel electronics is performed by local charge injection circuitry.

Table 1 : Main specifications of the FEI4 pixel

| Parameter                             | Value     | Unit            |
|---------------------------------------|-----------|-----------------|
| Pixel size                            | 50 × 250  | µm²             |
| Maximum charge                        | 100,000   | electrons       |
| Normal pixel input capacitance range  | 300-500   | fF              |
| Single channel ENC sigma (400 fF)     | 300       | electrons       |
| Total analog supply current @ 400 fF  | 10        | µA/pixel        |
| Average hit rate                      | 200       | MHz/cm²         |
| Total digital supply current @ 100 kHz | 10       | µA/pixel        |
| Tuned threshold dispersion (max)      | 100       | electrons       |

The main specifications of the FEI4 are summarized in Table 1. The average hit rate is low: each pixel receives on average one hit every 1600 bunch crossings.
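The figure of one hit every 1600 bunch crossings follows from the pixel size and the average hit rate in Table 1. The short Python check below reproduces it, assuming a 40 MHz bunch-crossing rate (25 ns bunch spacing), which is not stated explicitly in this section.

```python
# Values taken from Table 1.
PIXEL_WIDTH_CM = 50e-4            # 50 µm expressed in cm
PIXEL_LENGTH_CM = 250e-4          # 250 µm expressed in cm
HIT_RATE_HZ_PER_CM2 = 200e6       # 200 MHz/cm^2

# Assumption: nominal LHC bunch-crossing rate of 40 MHz.
BUNCH_CROSSING_RATE_HZ = 40e6

pixel_area_cm2 = PIXEL_WIDTH_CM * PIXEL_LENGTH_CM
hit_rate_per_pixel_hz = HIT_RATE_HZ_PER_CM2 * pixel_area_cm2
crossings_per_hit = BUNCH_CROSSING_RATE_HZ / hit_rate_per_pixel_hz

print(f"pixel area          : {pixel_area_cm2:.3e} cm^2")
print(f"hit rate per pixel  : {hit_rate_per_pixel_hz / 1e3:.1f} kHz")
print(f"bunch crossings/hit : {crossings_per_hit:.0f}")
```

Each pixel therefore sees about 25 kHz of hits, which is why biasing the discriminator only when a signal is present saves so much power.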



Figure 1 : Architecture of the analog pixel readout

#### **III. THE NEW DISCRIMINATOR ARCHITECTURES**

# *A. The current structure of the discriminator (Version 1)*

In the current design, shown in Figure 2, the comparator uses the usual two-stage architecture.

In the front-end pixel, the comparator output drives a low capacitance composed mainly of the input capacitance of the driven logic gate plus the interconnection capacitance. Since this load capacitance is small, the propagation delay is limited by the bandwidth of the amplifier and not by the slew rate. In this case the poles of the transfer function have to be at as high a frequency as possible in order to minimise the propagation delay. In addition, the sensitivity specification for this stage requires a high DC gain. Thus, we need a design with a high gain-bandwidth product.



Figure 2 : The current structure of the comparator

Since the gain-bandwidth is proportional to the transconductance  $g_m$  of the input transistors, the bias current  $I_{B1}$  has to be set at a relatively high value. In order to assign the hits to their corresponding bunch numbers the time walk has to be maintained below 20 ns. A bias current around 4  $\mu$ A to 5  $\mu$ A is needed to meet this specification. This represents nearly 20% of the total pixel consumption.

# *B.* Discriminator with dynamic biasing based on current mirror (Version 2)

In this structure, the input differential stage composed of M11-M12 controls the bias current of the main comparator stage composed of M1-M2. The idea is to use the current flowing in one arm of the first differential pair and apply it with a multiplication factor K to the second stage as an additional bias current.



Figure 3 : Dynamic biasing based on current mirror

If the input signal is far from the threshold $V_{TH}$, the totality of the bias current $I_{B1}$ flows in the arm composed of the transistor M12. No current flows in the transistor M11 and nothing is added to the bias current $I_{B2}$ of the main comparator (Figure 4).

When the level of the input signal approaches the threshold, a fraction of $I_{B1}$ flows in the transistor M11 and is copied with a factor K. This current is added to the bias current $I_{B2}$. In effect, the input voltage signal is converted to a current signal that, after multiplication by K, biases the main comparator stage.
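The current steering described above can be illustrated with a simple behavioural model in Python. This is a sketch, not a circuit simulation: it assumes the input pair M11-M12 operates in weak inversion, so that the fraction of $I_{B1}$ flowing in M11 follows a hyperbolic tangent of the input overdrive, and it assumes that M11 conducts once the input exceeds the threshold. The mirror factor `K = 20` and the slope voltage `N_UT` are hypothetical values; K is chosen only so that the peak current is of the same order as the 8 µA switching current quoted later for this version.

```python
import math

I_B1 = 350e-9   # A, DC bias of the input stage (value quoted in the text)
I_B2 = 350e-9   # A, DC bias of the main comparator stage
K = 20          # hypothetical current-mirror multiplication factor
N_UT = 0.035    # V, hypothetical n * U_T of the weak-inversion input pair


def m11_current(v_in, v_th):
    """Current steered into M11 for a given input and threshold voltage."""
    fraction = 0.5 * (1.0 + math.tanh((v_in - v_th) / (2.0 * N_UT)))
    return I_B1 * fraction


def main_stage_bias(v_in, v_th):
    """Total bias current of the main comparator stage M1-M2."""
    return I_B2 + K * m11_current(v_in, v_th)


if __name__ == "__main__":
    v_th = 0.50
    for v_in in (0.20, 0.45, 0.50, 0.55, 0.80):
        bias_ua = main_stage_bias(v_in, v_th) * 1e6
        print(f"V_IN = {v_in:.2f} V -> main stage bias = {bias_ua:.2f} uA")
```

Far below the threshold the main stage only receives $I_{B2}$, and the additional current K·I(M11) appears as the input approaches and crosses the threshold.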



Figure 4 : Timing waveforms in the comparator

In order to reach similar performances as in the version 1, the DC bias currents  $I_{B1}$  and  $I_{B2}$  are set to 350 nA each, setting the total consumption for this discriminator to the very low level of 700 nA.

The critical point for this structure is how to speed up the current mirror response. This is required to enable high current switching in the second stage when the input signal is crossing the threshold voltage.

The only way to reduce the propagation time in the current mirror is to reduce the gate capacitance of the transistors M13, M8 and M9. This can be done easily by reducing the size of those transistors. However, this has an impact on the threshold dispersion of the pixel.

# *C. Discriminator with dynamic biasing using variable resistance (Version 3)*

This architecture also uses two stages. The auxiliary comparator composed by M11-M12 corresponding to the first stage powers up selectively the main comparator stage. This is achieved by applying a lower threshold value  $V_{THL}$  to the auxiliary stage while the true value of threshold  $V_{THT}$  is applied to the main stage composed by M1-M2.



Figure 5 : Dynamic biasing using variable resistance

When the amplitude of the input signal $V_{IN}$ coming from the amplifier is low and does not reach the threshold $V_{THL}$, the totality of the current $I_{B1}$ flows through M12 and M14. The output of this stage is low and the transistor MR is off. There is no current in the main stage. When the input signal $V_{IN}$ reaches $V_{THL}$, the output of the first stage increases and drives the transistor MR from a region of high resistance to a region of low resistance, allowing the current $I_{B2}$ to flow in the differential pair of the second stage.

Since the current in the second stage can potentially be set to a high value, the speed of this comparator is greatly improved.

In order to optimize the switching performance of this design, the threshold $V_{THL}$ has to be near the true threshold. Thus, the first stage requires a low propagation delay, but the required DC current is lower than that required by a two-stage comparator. In order to keep the same performance as in version 1, simulations show that the auxiliary stage bias current $I_{B1}$ has to be set around 1 $\mu$A.

In this prototype, the threshold  $V_{THL}$  is generated with different sizes for M11 and M12. In the final design,  $V_{THL}$  can be generated by the same DAC generating the threshold  $V_{THT}$ .

#### IV. EXPERIMENTAL RESULTS

#### A. Test chip design

A prototype test chip has been designed as an array of 322 pixels. Different discriminator architectures were implemented in this design. All discriminators are associated with the same front end. The chip was designed and implemented in a 130 nm CMOS technology. It is based on the previous prototype chip designed by the pixel collaboration [1].



Figure 6 : Test chip layout

Figure 6 shows the layout of the chip. The die size is  $3 \text{ mm} \times 2 \text{ mm}$ . It is arranged in 14 columns and 23 rows of pixels with a size of  $50 \mu \text{m} \times 250 \mu \text{m}$  each. The 3 versions were implemented in this chip. For each version of the discriminator, 3 to 4 columns of pixels were dedicated.

#### *B. Time delay*

In order to measure the resolution in time of the front end chain, we measured the propagation delay from the edge of the injected charge to the discriminator output. The level of charge is adjusted by an external calibrated voltage pulse flowing to the local charge injection circuit of each pixel. It is obvious that the total delay is not attributed only to the discriminator stage but depends also on the behaviour of the preamplifier and the amplifier stages when the injected charge varies. In this prototype, each comparator version is associated to exactly the same pixel design. Thus, the propagation delay differences between the studied structures are attributed only to the discriminator.

Figure 7 shows the propagation delay of the whole analog pixel chain when the threshold is set to 4500 e- and the charge over the threshold varies from 0 to 8000 e-.



Figure 7 : Time delay for a charge threshold = 4500 e-

Version 3 shows better switching performance than the other versions, with a time walk estimated at 12 ns. In this design, during the switching phase, the current varies from 0 to 30 $\mu$A. This high current level gives a better time delay but can be a source of crosstalk, which can propagate to the sensitive areas through the power supply lines. Measurements will be done in order to check whether the neighbouring charge amplifiers are affected during this switching phase. In version 2, the switching current is limited to 8 $\mu$A. The time walk does not exceed 15 ns with a DC bias current of only 700 nA.

# C. Noise and threshold dispersion

Measurements show that the structure of the comparator does not have any influence on the noise. The typical value of the measured Input Noise Equivalent Charge is around 90 e- when there is no input capacitance and no leakage current.



Figure 8 : Threshold dispersion

However, version 2 of the discriminator introduces more dispersion in the pixel, as shown in Figure 8. This is because the area of the input transistor is set as low as possible in order to increase the speed of the current mirror. Nevertheless, this threshold dispersion can be contained and all the pixels can be tuned by threshold adjustment.

|                                 | Current consumption | Current spike* | Time walk |
|---------------------------------|---------------------|----------------|-----------|
| Version 1 (Reference design)    | 5.3 µA              | 5.1 µA         | 20 ns     |
| Version 2 (Current mirror)      | 0.7 µA              | 8 µA           | 15 ns     |
| Version 3 (Variable resistance) | 1.2 µA              | 30 µA          | 12 ns     |

Table 2 : Performance comparison

\* Estimated from simulations

The performance of the three versions is summarized in Table 2. Version 2 of the discriminator, based on the current mirror technique, is a good candidate for the final design.
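The saving relative to the reference design can be estimated from the DC currents in Table 2. The Python sketch below assumes that the total pixel budget is the sum of the analog and digital supply currents of Table 1 (10 µA each) and ignores the short current spikes during switching, which is reasonable only because the hit rate per pixel is low.

```python
# Assumption: total pixel budget = analog + digital supply current (Table 1).
TOTAL_PIXEL_CURRENT_UA = 10.0 + 10.0

# DC current consumption of each discriminator version (Table 2).
DISCRIMINATOR_CURRENT_UA = {
    "Version 1": 5.3,
    "Version 2": 0.7,
    "Version 3": 1.2,
}


def saving_fraction(version):
    """Fraction of the total pixel current saved with respect to Version 1."""
    reference = DISCRIMINATOR_CURRENT_UA["Version 1"]
    saved = reference - DISCRIMINATOR_CURRENT_UA[version]
    return saved / TOTAL_PIXEL_CURRENT_UA


if __name__ == "__main__":
    for name in ("Version 2", "Version 3"):
        print(f"{name}: saves {100.0 * saving_fraction(name):.1f}% of the pixel current")
```

Under these assumptions both dynamic-biasing versions save a little over 20% of the pixel current, in line with the figure quoted in the conclusion.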

### V. CONCLUSION

A very low power consumption discriminator suitable for pixel chips where the average hit rate is low has been described in this paper. The architecture is based on the dynamic biasing principle.

A prototype chip containing 322 pixels has been designed in order to test the proposed architectures. We showed that the new structures achieve a faster time response and a much lower power consumption than the present design, without degrading the other important performance figures of the front-end pixel.

Using such a design in the FEI4 chip can save 20% of the total power consumption compared to the present design.

### VI. ACKNOWLEDGEMENTS

We would like to thank R. Fei for the test and K. Arnaud for the test board design.

## VII. REFERENCES

- [1] A. Mekkaoui, "FE-I4\_PROTO1", ATLAS Pixel Upgrade for SLHC Electronics, internal document (2008).
- [2] M. Garcia-Sciveres et al. "The FE-I4 Pixel Readout Integrated Circuit" Submitted to Nuclear Instruments and Methods October 13, 2009
- [3] M. Karagounis et al. "Development of the ATLAS FE-I4 pixel readout IC for b-layer Upgrade and Super-LHC" TWEPP-08 Topical Workshop on Electronics for Particle Physics
- [4] M. Barbero et al. "New ATLAS Pixel Front-End IC for Upgraded LHC Luminosity" Nuclear Instruments and Methods
- [5] R. Klinke "CMOS operational amplifier with nearly constant settling time" IEE Proceedings, Vol. 137, No. 4, August 1990
- [6] M.G. Degrauwe et al "Adaptive Biasing CMOS Amplifiers" IEEE Journal of Solid State Circuits, VOL. SC-17, NO. 3, June 1982
- [7] A. Kayssi, "Analytical transient response of MOS current mirrors", Int. J. Circ. Theor. Appl. 2003; 31:453–464

# Design of the CMS-CASTOR subdetector readout system by reusing existing designs

W. Beaumont<sup>a</sup> for the CMS Collaboration, G. Antchev<sup>b,c</sup>

<sup>a</sup> Universiteit Antwerpen, Belgium, <sup>b</sup> CERN, 1211 Geneva 23, Switzerland, <sup>c</sup> INRNE-BAS, Sofia, Bulgaria.

wim.beaumont@ua.ac.be

# Abstract

CASTOR is a cylindrical calorimeter with a length of 1.5 m and a diameter of 60 cm, located 14.3 m from the CMS interaction point and covering the pseudorapidity range $5.1 < |\eta| < 6.6$. The CASTOR project was approved in the middle of 2007. Given the limited resources and time, developing a readout system from scratch was excluded. Here the final implementation of the readout chain, the considerations behind the different choices and the performance of the installed equipment are discussed.

# I. INTRODUCTION

CASTOR is an electromagnetic and hadronic calorimeter, based on a sandwich of tungsten and quartz plates, with a 14(16)-fold longitudinal (azimuthal) segmentation, positioned symmetrically around the beam pipe. In the longitudinal direction there are 2 segments for the electromagnetic part and 12 segments for the hadronic part. In total there are 16 × 14 = 224 segments. The CASTOR detector was installed on only one side of the CMS experiment, but the readout design had to take into account the possibility of a detector on both sides. PMTs are used as sensor elements; each detects the Cherenkov light from one segment.



Figure 1: CASTOR detector installed on its support

The total integrated dose at the level of the PMTs is expected to be 20 kGy. The stray magnetic field measured near the PMTs is 0.16 T. The detector will be used to study several physics topics, ranging from QCD to exotic physics. In proton-proton collisions, it will be used to flag the absence of energy or to measure forward jets, allowing the study of diffractive scattering and of the low-x proton structure. In heavy ion collisions it will be used, for example, to search for "Centauro events" and "strangelets". All these physics studies require specific trigger conditions and different dynamic ranges. To flag the absence of energy (a rapidity gap), detection of low energies is required, while for jets, when used as a signature for discovery channels, the energy can be as high as 7 TeV.

Due to the limited time and manpower available for realisation it was clear from the start that one had to use existing designs for the readout system. To ensure active support and compatibility with the CMS readout system, it was decided to look only for designs that were used within CMS. Below we describe the systems that were evaluated in more detail.

# II. THE SENSOR SYSTEM

The choice of the PMT is limited by available space, radiation environment, magnetic field, expected signal and cost. Although enclosed by the partially iron radiation shielding, the PMT still has to cope with a magnetic field of about 0.16 T, as measured in 2008. This was higher than anticipated by the magnetic field simulation, because the model used in the simulation was not detailed enough for this region. With such a high field a mesh PMT was the only option, and the Hamamatsu type R5505 PMTs from the SPACAL calorimeter of the H1 experiment [1] at DESY fit inside the given space and could be recovered for our calorimeter. The R5505 has a limit of 10 µA on the average anode current, which limits the gain that can be applied. Because of a possible reduction in the transparency of the PMT window due to irradiation, the maximal gain, obtained with a cathode voltage of 2200 V, could be necessary. The PMT base offered by Hamamatsu did not meet the mechanical and radiation tolerance constraints, so a custom-made PMT base using two PCBs had to be designed (see Figure 2).



Figure 2: the R5505 PMT mounted on the CASTOR base

A simple bleeder and filter network is implemented with surface mounted components. An active network was not considered due to the high radiation environment. To guarantee a stable gain as a function of the activity in the detector the last dynode of the PMT has its own power supply line. The voltage step from cathode to first dynode was increased to increase the collection efficiency of the photoelectrons in a magnetic field environment.

To save space the cables were soldered to the PMT base. The cable and the base with the PMT mounted were tested before they were mounted on CASTOR. The HV power supply system from CAEN, the SY1527LC equipped with ten A1535N boards, is located in the service cavern and is connected via six ~100 m long cables to the PMT bases in the experimental hall.

#### **III.** THE FRONT END CHOICES

Two front end architectures were considered: the front end components used for the CMS electromagnetic calorimeter (ECAL) and the components for the hadronic calorimeter (HCAL).

# *A.* Evaluation of the HCAL front end architecture.

The forward hadronic calorimeter of CMS, called HF, also uses PMTs to detect Cherenkov light from relativistic particles. The occupancy of this detector is, however, lower than that of the CASTOR detector. The front end architecture is built around three chips.

The QIE chip [2] integrates the charge from the PMT over one bunch crossing time interval. This is an important property for a detector with a high occupancy. The analogue to digital conversion is also done by the QIE chip.

The CCA [3] is a control chip that decodes the command bus and takes care of combining the data from three QIE chips. These data packets are serialized by the GOL chip [4].

The GOL [4] drives an 850 nm laser that transports the data over 80 m of fibre to the data processing cards.

Due to the radiation levels inside the detector volume it is not possible to place these readout chips near the PMTs. Coax cables have to be used to transport the PMT signal to the front end chips, located in a rack about 6 m from the detector. The necessary cable length of 12 m is twice that of HF and increases the electronic noise. With such a long cable, a good match between the 50 $\Omega$ cable impedance and the QIE input impedance is important. During the initial testing of the QIE chips, those with an impedance near 50 $\Omega$ were selected and used for the HF readout cards or set apart as spares. Therefore, although enough QIE chips were available, it was not clear whether enough were left with the correct input impedance. The digital output range of the QIE chip is 10000 counts (non-linear coding), which is not sufficient to cover the full dynamic range needed for the maximal expected energy and for the detection of the halo muons used for calibration.

# *B.* Evaluation of the ECAL Front-end components

The ECAL front end architecture [5] is based on four chips: a multi gain pre-amplifier (MGPA), a four channel ADC [6], a data processing chip called FINEX [7] and the serializer chip GOL. The multi gain pre-amplifier together with the ADC provides a greater dynamic range than the QIE, and in addition the chips are able to withstand the radiation environment. Less space would be needed to transfer the signals in optical fibres compared with the QIE solution. However, operating the chips requires a well controlled cooling system, and this could not be realized in time. Placing the chips outside the CASTOR volume was also considered, implying a 12 m long cable between the PMT and the chip. In that case no changes to the existing design would be needed. The MGPA chip was, however, not designed for an application with long lines between the sensor and the chip. The shaper closely follows the function $f(t) = e^{-t/\tau}$ where $\tau$ is typically 40 ns. The input signal can be reconstructed by a FIR filter. To study the signal reconstruction, the pulse response of the MGPA was digitized with a 1 GHz digital oscilloscope just before the input of the ADC. This signal was used in a C++ program to study the effectiveness of a FIR filter. The simulation result from PYTHIA was used as input signal. For the ECAL, a method is followed to find the best precision of the energy [8] in a certain bunch crossing. For CASTOR the aim was to minimize the residuals of signals from previous bunch crossings, as the occupancy is several times higher than in the ECAL situation ...

It was not possible to find weights for a FIR filter that lower the RMS value of the residual below 5 GeV when taking into account the electronic noise, the time jitter and the non-ideal pulse response. Another risk was interference from external signals, as the input of the MGPA chip is single ended.
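The idea of FIR-based reconstruction can be illustrated with a minimal Python sketch. It assumes an ideal pulse response $f(t) = e^{-t/\tau}$ with $\tau$ = 40 ns, one sample per 25 ns bunch crossing, and no noise or jitter. Under these assumptions a two-tap filter with weights $(1, -e^{-T/\tau})$ removes the tail of earlier crossings exactly. The function names and deposit values are hypothetical, and this is not the C++ program used in the study.

```python
import math

TAU_NS = 40.0        # shaper time constant quoted in the text
BX_NS = 25.0         # assumed sampling period: one sample per bunch crossing
A = math.exp(-BX_NS / TAU_NS)   # decay of the ideal pulse between two samples


def shape(deposits):
    """Convolve per-crossing energy deposits with the ideal pulse e^{-t/tau}."""
    samples, level = [], 0.0
    for energy in deposits:
        level = level * A + energy   # tail of earlier crossings plus new deposit
        samples.append(level)
    return samples


def fir(samples, weights):
    """Apply FIR weights: y[n] = sum_k weights[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, w in enumerate(weights):
            if n - k >= 0:
                acc += w * samples[n - k]
        out.append(acc)
    return out


# Hypothetical deposits (arbitrary units) in consecutive bunch crossings.
deposits = [0.0, 10.0, 0.0, 3.0, 7.0, 0.0, 0.0, 1.0]
samples = shape(deposits)
reconstructed = fir(samples, [1.0, -A])   # two-tap inverse of the ideal pulse
residuals = [r - d for r, d in zip(reconstructed, deposits)]
```

In this idealized case the residuals vanish up to rounding. With electronic noise, time jitter and a non-ideal pulse response they do not, which is the effect the study above quantified.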

#### C. Implementation of the front end electronics

As the studies on the ECAL front end showed that measurements at low energy would be worse, the decision was made to continue with the QIE based architecture. In addition, the updated LHC schedule gave more time for selecting additional QIE chips. The limited dynamic range of the QIE has to be dealt with by a trade-off between the physics requirements. For the calibration with muons, special runs with higher gain settings for the PMTs will be done. Finally, 55 QIE cards were reproduced without changing the layout of the HF design. A new laser had to be selected and a solution was found for the different package of the laser. 39 cards are installed to read out the CASTOR detector and are placed inside three "HF crates". Six backplanes for the "HF crate" had to be reproduced, as well as ten crate control modules (CCM). The extra components were needed to extend the number of spares and will be used for test setups. Especially for the backplane and CCM cards, the production setup costs were the main cost factor due to the low quantities. The front end crates are powered by one MARATON system from Wiener. Due to space limitations in the rack it was not possible to have the same LV system as used by HCAL. The front end readout system was installed in autumn 2008.

# D. LED pulser

The LED pulser is built as a module that fits in the "HF crate". It is able to provide a light pulse of less than 20 ns in a specific bunch crossing. Amplitude and bunch crossing can be selected by software via the CCM. The light from a blue LED is guided via a system of quartz fibres to the windows of the PMTs. This signal is used for the commissioning and as a reference signal during the calibration procedure.



Figure 3: The LED monitoring system

# IV. EVALUATION OF THE READOUT AND TRIGGER ARCHITECTURE

The readout and trigger architecture provides the interface between the front end and the CMS-DAQ [9] interface called FRL [10]. It also sends trigger information to the global trigger of CMS. For the DAQ interface the data from the different front ends have to be packed together, formatted and sent via a data link (SLINK [11]) to the FRL.

As the readout units will have a high occupancy, zero suppression or other data processing will not be done. The trigger logic has to convert the digitized code to an energy per readout unit. The energy per sector has to be summed up and compared to a programmable threshold. A trigger logic card has to calculate the total energy inside CASTOR and will make a final trigger decision. Two different architectures were considered and are described below.
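The trigger chain described above can be sketched in a few lines of Python: a look-up table converts each digitized code to an energy, the energies of a sector are summed and compared to a programmable threshold, and the total energy is formed for a final decision. The table contents, the codes, the thresholds and the final condition (total energy above a threshold) are all hypothetical placeholders, as the text does not specify them.

```python
def sector_energy(codes, lut):
    """Convert the digitized codes of one sector to energies and sum them."""
    return sum(lut[code] for code in codes)


def trigger_decision(sectors, lut, sector_threshold, total_threshold):
    """Return per-sector threshold bits, the total energy and a final decision.

    The final condition used here (total energy above a threshold) is only
    an assumed example; the actual trigger condition is not given in the text.
    """
    energies = [sector_energy(codes, lut) for codes in sectors]
    sector_bits = [energy > sector_threshold for energy in energies]
    total = sum(energies)
    return sector_bits, total, total > total_threshold


# Hypothetical 3-bit code-to-energy table standing in for the non-linear QIE coding.
lut = [0.0, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
# Hypothetical codes for two sectors with four readout units each.
sectors = [[1, 2, 0, 3], [5, 0, 0, 1]]
bits, total, accept = trigger_decision(
    sectors, lut, sector_threshold=10.0, total_threshold=20.0
)
```

A table look-up is used for the conversion because a non-linear code cannot be turned into an energy by a single multiplication; in firmware the same role would typically be played by a small memory.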

# *A.* Evaluation of the HCAL readout and trigger architecture

The HCAL readout and trigger architecture [12] consists of three different 9U-VME cards called the HTR, DCC [13] and TTCf (see Figure 4). There were not enough boards available to read out two CASTOR detectors. Using this architecture implied the production of these 9U-VME boards in small quantities. In addition, the DCC board consists of different types of mezzanine cards. All the reproduction work was considered too expensive and time consuming. Furthermore, some of the components were obsolete, so small redesigns would be necessary. Also, no existing hardware could be identified that could be used as the trigger logic card. This led to the decision to search for an alternative architecture to implement the readout and trigger functionality. However, for the reasons mentioned below, it was recently decided to use this architecture after all. In the meantime the HCAL community decided to redesign and produce new DCC boards, some of which are available for CASTOR. In addition, it became clear that there will be no second CASTOR in the near future, so fewer HTR cards are needed. An interface card called the oSLB [14], developed by the HCAL community, can be used as the interface between the HTR cards and the trigger logic card. This possible solution for the trigger logic implementation has to be investigated in more detail.



Figure 4: The HCAL readout architecture components

# *B.* Evaluation of the CMS Preshower / TOTEM architecture.

The CMS Preshower collaboration and the TOTEM collaboration developed a common hardware platform for their readout and trigger architecture [15] [16] although they have different detectors with different front end architectures. The hardware is a 9U-VME host board with slots for mezzanines. The mezzanine that is used to de-serialize the optical signals from the front ends, called OptoRx [17] is used in both projects. CASTOR could join the final production of the VME host boards so minimizing the production costs and the time needed to follow up the production.



Figure 5: the CASTOR OptoRx

The OptoRx mezzanine could not be used as it was, because the optical receiver (NGK POR10M12SFP) is qualified for data rates only up to 1.25 Gb/s at a wavelength of 1310 nm, while the GOL on the QIE card sends the data at 1600 Mb/s and drives an 850 nm laser. In the CASTOR version of the OptoRx the NGK POR10M12SFP was replaced by a commercially available 12 channel optical receiver (AVAGO AFBR-742BZ) in a SNAP12 package.

To exchange the receiver only a few changes were needed in the design. In addition, the designer had already foreseen a cut-out in the VME host board that allows the use of an OptoRx mezzanine equipped with a SNAP12 optical receiver, because the SNAP12 package, even without heat sink, is 12 mm in height while the stacking height of the mezzanine is only 10 mm.

The FPGA on the OptoRx performs operations that are comparable with the ones implemented in the HTR card. The FPGAs of the VME host board take care of the final formatting and the interface to the data link, functions performed by the DCC in the case of the HCAL architecture.

The trigger logic per sector will also be implemented in the OptoRx FPGA. The trigger information will be sent via the third mezzanine slot to the trigger logic card. The plan was to "transform" the OptoRx design to a transmitter. The OptoRx + VME host board combination can also be used as a trigger logic board as shown in Figure 6. Despite the flexibility of the system it was not possible to find a combination to make efficient use of all the optical inputs and to fulfil the trigger requirements. So it was decided to leave out the information of the two last layers for the trigger decisions.



Figure 6: inter connections of the VME host boards equipped with OptoRx ( CDTC )

Parts of the firmware could be copied from the various projects. The VME interface code was copied from the TOTEM project, as well as the code for memory control and the local bus on the VME host board, with slight modifications. For the OptoRx, the de-serializer code from the Preshower firmware was used as a starting point, while for the data synchronization the HCAL firmware was used. Initially it was assumed that the firmware could be ready in one year. In the end the firmware for the project is not yet finished, although most of the functionality is implemented. The fact that the project was not finished in time is due to an underestimation of the complexity of the system aspects. More detailed system evaluation tests should have been done during the implementation phase. As the start of the LHC is a strict deadline, it was recently decided to use the HCAL readout and trigger architecture as the final system. There were no technical difficulties indicating that the VME host board with OptoRx could not fulfil the requirements. That the combination of OptoRx and VME host board could be used for our purpose is due to the modular approach of the architecture, which was designed from the beginning to be used for different applications.

# V. COMMISSIONING

In 2007 and 2008 a prototype sector was tested in the H2 SPS beam line at CERN. From these tests the resolution of the detector was obtained. In Figure 7 the muon response is compared with the pedestal signal. The pedestal in the figure is taken as an average over 4 entries per event, so the effective RMS value is $2.2 / \sqrt{4} = 1.1$ counts, slightly higher than the expected value of 0.8 counts. In 2009 the final half CASTOR structure equipped with 2 sectors was placed in the beam line to re-measure the calibration constants.



Figure 7: muon signal and pedestal obtained from the test beam 2008

After the detector was fully assembled the LED system was used to check the working of each individual PMT. It turned out that some fibres were not correctly installed. Some PMTs had short circuits between the last two dynodes. The PMT-cable test did not cover the detection of this fault. Not all the PMTs suffering from this problem could be replaced, due to a lack of spare PMTs.

CASTOR was positioned on its support in CMS at the end of June 2009. The average noise level of a readout unit is one QIE count, with no indication of a specific interference signal. There are 16 PMTs that do not respond to the LED signal; 8 of these respond to ambient light.

Since the beginning of October 2009 CASTOR has been sending data to the CMS DAQ. The trigger logic still has to be implemented.

#### VI. CONCLUSION

The initial intention for the implementation of the readout architecture was to copy everything from the HCAL architecture. However, the cost of reproducing all of the necessary components changed this intention. After some additional study the HCAL front end was nevertheless selected. For the readout and trigger architecture an alternative was proposed, worked out in detail and for a long time considered the baseline implementation. The firmware for the alternative architecture could not be finished in time, so in the end the complete HCAL architecture has been implemented, as conditions have changed over time. CASTOR is now installed inside CMS and is ready to take data.

## VII. ACKNOWLEDGEMENT

We would like to thank the HCAL community for their active support during the implementation of the readout of CASTOR especially Richard Kellogg who was always willing to explain all the details of the HCAL readout related issues.

This work is supported in part by "A. G. Leventis" Foundation (Hellas) and the Hellenic GSRT under programme EPAN

# VIII. REFERENCES

[1] H1 SpaCal Group at DESY: "The H1 Lead/Scintillating-Fibre Calorimeter DESY" Red Report 96-171, published in NIM A386 (1997) 397-408

[2] Tom Zimmerman, Alan Baumbaugh, Jim Hoff, Sergey Los, Theresa Shaw: "Specification for Production CMS QIE ASIC (QIE8)", FERMILAB

[3] R.J. Yarema, A. Baumbaugh, A. Boubekeur, J.E. Elias, T. Shaw: "Channel Control ASIC for the CMS Hadron Calorimeter Front End Readout Module", 8th Workshop on Electronics for LHC Experiments, Colmar, France 200

[4] Moreira, P et al.: "A radiation tolerant gigabit serializer for LHC data transmission", 7th Workshop on Electronics for LHC Experiments, Stockholm, Sweden, 2001

[5] Raymond, M; Crooks, J; French, M; Hall, G: "The MGPA electromagnetic calorimeter readout chip for CMS" 9th Workshop on Electronics for LHC Experiments.

[6] A. Marchioro et al: "A CMOS low power, quad channel, 12 bit, 40Ms/s pipelined ADC for applications in particle physics calorimetry", 9th Workshop on Electronics for the LHC Experiments, Amsterdam, Sept., 2003.

[7] M. Hansen for the ECAL frontend workgroup: "The New Readout Architecture for the CMS ECAL", 12th Workshop on Electronics For LHC and Future Experiments, Valencia, Spain, 2006

[8] Brunelière, R; Zabi, Alexandre: "Reconstruction of the signal amplitude of the CMS electromagnetic calorimeter" CMS-NOTE-2006-037; 2006

[9] CERN/LHCC 2002-26, CMS TDR, Data Acquisition and High Level Trigger

[10] Arcidiacono, R Bauer, G ; Boyer, V ; Brett, A ; Cano, E ; Carboni, A ; Ciganek, M ; Cittolin, S ; Erhan, S ; Gigi, D et al. : "Flexible custom designs for CMS DAQ" Nucl. Phys. B, Proc. Suppl. 172 (2007) 174-177

[11] "S-Link64" project site: http://cms-frl.home.cern.ch/

[12] CMS-HCAL collaboration,: "Design, Performance, and Calibration of CMS Hadron-Barrel Calorimeter Wedges" chapter 3, Eur.Phys.J.C55:159-171,2008.

[13] E. Hazen, J. Rohlf, S. Wu, A. Baden, T. Grassi: "The CMS HCAL data concentrator: a modular, standards-based implementation", 7th Workshop on Electronics for LHC Experiments, Stockholm, 2001

[14] oSLB project web site : http://pcephc356.cern.ch/document/CountingHouse/HTR/HT R Mezzanines/oSLB/oSLB

[15] G. Antchev, P. Aspell, D. Barney, S. Reynaud, W. Snoeys, P. Vichoudis "The TOTEM Front End Driver, its Components and Applications in the TOTEM Experiment", Topical Workshop on Electronics for Particle Physics, 2007, pp.211-214.

[16] G. Antchev, D. Barney, W. Bialas, J. C. Da Silva, P. Kokkas, N. Manthos, S. Reynaud, G. Sidiropoulos, W. Snoeys, P. Vichoudis "A VME-Based Readout System for the CMS Preshower Sub-Detector", IEEE Trans. Nucl. Sci. 54 623.

[17] S. Reynaud, P. Vichoudis "A multi-channel optical plug-in module for gigabit data reception", Proceedings of the 12th Workshop on electronics for LHC and future experiments, 2007, pp.229-231.
# Friday 25 September 2009 Plenary Session 6

# Advances in Architectures and Tools for FPGAs and their Impact on the Design of Complex Systems for Particle Physics

Anthony Gregerson<sup>a</sup>, Amin Farmahini-Farahani<sup>a</sup>, William Plishker<sup>b</sup>, Zaipeng Xie<sup>a</sup>,

Katherine Compton<sup>a</sup>, Shuvra Bhattacharyya<sup>b</sup>, Michael Schulte<sup>a</sup>

<sup>a</sup> University of Wisconsin - Madison

<sup>b</sup> University of Maryland - College Park

{agregerson, farmahinifar, zxie2}@wisc.edu
{plishker, ssb}@umd.edu {compton, schulte}@engr.wisc.edu

#### Abstract

The continual improvement of semiconductor technology has provided rapid advancements in device frequency and density. Designers of electronics systems for high-energy physics (HEP) have benefited from these advancements, transitioning many designs from fixed-function ASICs to more flexible FPGA-based platforms. Today's FPGA devices provide a significantly higher amount of resources than those available during the initial Large Hadron Collider design phase. To take advantage of the capabilities of future FPGAs in the next generation of HEP experiments, designers must not only anticipate further improvements in FPGA hardware, but must also adopt design tools and methodologies that can scale along with that hardware. In this paper, we outline the major trends in FPGA hardware, describe the design challenges these trends will present to developers of HEP electronics, and discuss a range of techniques that can be adopted to overcome these challenges.

#### I. INTRODUCTION

High-energy physics systems have a history of pushing the boundaries of technology. The electronics in HEP systems often require extremely high bandwidth and computational throughput, precise timing, and tight real-time processing constraints. These stringent performance specifications historically demanded the use of custom ASIC solutions [2], because in the past, programmable hardware such as FPGAs was inadequate to the task. Although ASICs are capable of achieving the highest possible performance, they suffer from two major shortcomings for HEP applications. First, they are very expensive to produce in low volumes because the costs of fabrication are not well amortized. Second, they are rigid, fixed-function devices that offer very limited flexibility for adjustment to new experimental parameters or algorithms. Early designers were forced to cope with these shortcomings, as ASICs were the only technology capable of meeting key performance requirements of HEP systems. However, as time has passed, continual advancements in the semiconductor industry have produced major improvements in the density and speed of electronics. Consequently, FPGAs have also improved in capacity and performance. Modern FPGAs are able to achieve performance levels suitable for many HEP applications and provide attractive properties such as reprogrammability and smaller low-volume costs. The result of these trends has been a rapid adoption of FPGAs in HEP electronics. A large proportion of the electronics in the Compact Muon Solenoid Level-1 Trigger, for example, is based on FPGAs, and many of the remaining ASICs are scheduled to be replaced with FPGAs in proposed upgrades [1].

Improvements in FPGA technology are not likely to end soon. Today's high-density FPGAs are based on a 40-nm silicon process and already contain an order of magnitude more logic than the FPGAs available at the planning stage of the Large Hadron Collider's electronics. 32-nm and 22-nm silicon process technologies have already been demonstrated to be feasible; as FPGAs migrate to these improved technologies their logic density and performance will continue to increase. With the next generation of HEP designs, the question has changed from 'When will programmable hardware be good enough to meet our needs?' to 'How can we take maximum advantage of the advancing density and performance of programmable hardware in our designs?' The answer to this question is not as simple as it may seem. Faster, higher-density devices may enable more complex algorithms, greater functionality, and higher-resolution processing, but only if the methods of designing, testing, implementing, and verifying these systems adapt to meet the needs of these new levels of complexity. As devices continue to improve, the importance of using the right combination of tools and methodologies to enhance developer productivity and create maintainable designs will become increasingly critical. Relying solely on established hardware design languages (HDLs) may not be sufficient to meet the challenges of future system design. In this paper, we examine recent trends in FPGAs and the implication these trends have on the adoption of new software tools, techniques, and methods for the design of HEP systems based on future generations of FPGAs.

The rest of this paper is organized as follows. In Section II, we cover the major trends in FPGA hardware. In Section III, we describe the problem of managing increased design complexity and describe a series of tools and techniques that can be used to create scalable design processes. In Section IV, we describe the effects of FPGA trends on the problem of hardware verification and debugging and present tools and techniques for managing this problem. Finally, in Section V, we provide our conclusions about the impacts of FPGA trends on the future of electronics design for high-energy physics applications.

## II. FPGA HARDWARE TRENDS

We divide FPGA hardware trends into two different categories: trends in performance and trends in resource capacity. In this section we examine performance and resource capacity trends for high-end FPGAs from Xilinx and Altera over the past ten years [34, 35, 36, 37, 38, 39, 40]. HEP applications often rely on cutting-edge technology to meet their stringent requirements. Therefore, we will present composite data based on the largest and highest-performance device available from either vendor at a given point in time.

#### A. Performance

There are two aspects of FPGA performance that have a strong impact on HEP designs: maximum operating frequency and I/O bandwidth.

Operating frequency is directly related to the computational capabilities of a device. Higher frequencies allow calculations to be completed faster. This is important because computational latency is one of the key constraints of many HEP designs. A graph of the maximum frequency and silicon process technology of high-end commercial FPGAs is shown in Fig. 1. Frequency has scaled linearly with time. It is also notable that at 600 MHz, modern FPGAs are still operating at relatively low frequencies compared to high-end ASIC-based chips, such as microprocessors. Whereas ASICs have experienced many challenges in continuing to scale their frequency up, such as power density concerns and pipeline scaling, FPGAs still have some headroom before they encounter these problems. As indicated in the graph, frequency is closely related to the silicon process size used to manufacture devices. Smaller processes can produce transistors with lower latencies (and correspondingly higher frequencies). As of 2009, high-end FPGAs are being manufactured on a 40-nm silicon process. Intel has already demonstrated viable 32-nm and 22-nm processes [3]. Therefore, we expect that FPGA frequencies will continue to follow increasing trends through the near future.



Figure 1: Frequency and CMOS process size trends for high-end commercial FPGAs.

A second key performance parameter of FPGA devices is their total I/O bandwidth. One of the key distinguishing characteristics of particle physics applications is the tremendous data rate produced by HEP experiments. The electronics involved in triggering, data acquisition, compression, and other real-time data processing need very high bandwidth to handle copious amounts of experimental data. Often, the amount of data that can be processed by each device in these systems is limited by device bandwidth rather than by logic resources or computational speed. Such systems require many duplicate devices to handle all the data.



Figure 2: Total serial I/O bandwidth trends for high-end commercial FPGAs.

As discussed later in Section II.B, although the total number of I/O pins on FPGAs has not experienced significant growth in recent years, total device bandwidth has rapidly improved due to the introduction of high-speed serial transceivers. Fig. 2 shows the trend in total serial I/O bandwidth over the past decade. I/O bandwidth has managed to maintain an exponential growth rate in recent years, allowing it to keep pace with the growth of logic resources (see Section II.B). The matching growth of bandwidth and logic is a key trend for HEP system designers. If FPGAs continue this trend in the future, devices will maintain consistent resource ratios, making it easier to consolidate distributed, multi-device systems into a smaller number of devices. If, on the other hand, bandwidth growth falls behind logic growth, designers will need to consider ways they can use extra logic to improve the quality of their systems. This might mean increasing the precision of computations or adding redundant error correction. Both Xilinx and Altera have recently introduced 11-Gb/s transceivers, but have not yet integrated these transceivers on all serial I/O pins. Moreover, the number of pins dedicated to high-speed serial I/O could be increased; serial I/O is currently only available on a small fraction of available pins. Therefore, it is feasible for total I/O bandwidth to continue to grow.

#### B. Resource Capacity

In addition to FPGA performance, resource capacity is also a major concern. The quantity of logic resources available to developers may determine the amount of functionality and precision of computation that can be incorporated into each device. Unlike the monolithic silicon wafers used to implement ASIC designs, modern FPGAs have a heterogeneous design substrate. They include look-up-tables (LUTs), flip-flops, high-density block RAM (BRAM), and optimized multiply and accumulate chains (DSP blocks). Since FPGAs are packaged chips, it is also worthwhile to consider the total number of I/O pins available. Graphs of the growth of each of these resource types, normalized to the resource capacity of one of the highest capacity FPGAs from 1998, are shown in Fig. 3. Note that the DSP blocks were not introduced into Xilinx FPGAs until 2001, so multiplier growth is normalized to this later device.



Figure 3: Resource capacity trends for high-end commercial FPGAs, shown on a logarithmic scale.

There are several key relationships to be observed from these trends. All logic resources (LUTs, flip-flops, BRAM, and DSP blocks) have exhibited exponential growth. This can be attributed to advancements in silicon process technology, and thus is likely to continue in the near future. In particular, sequential state (BRAM and flip-flops) makes up a larger percentage of the total logic resources. The total number of I/O pins, however, has not shown sustained growth due to physical limitations. The package size of the device, the number of pins that can fit in that space, and the feasibility of board-level routing for those pins are significant limiting factors. In fact, pin count has shown a downward trend in recent years; I/O bandwidth has only managed to increase due to the introduction of high-speed serial transceivers on a subset of the remaining pins.

Of most importance are the ways in which these trends interact to impact the design process. As the amount of logic and bandwidth to each device increases exponentially, the size and complexity of designs possible on a single device increases dramatically. We discuss design methods and software advances that can be used to manage this challenge in Section III. Logic is growing at a much faster rate than the number of I/O pins. The rapidly increasing ratio of device state to I/O pins will make it more difficult to rely on the use of external logic analyzers to perform hardware verification and debugging for complex sequential circuits. We discuss FPGA-centric verification and debugging tools in Section IV.

#### **III. DESIGN COMPLEXITY**

New FPGA generations will continue to bring increases in device resources and performance. Future architectures will have significantly more logic and bandwidth available on each chip than what is available today. Designers can leverage these improvements to enable higher levels of system integration, more nuanced algorithms, robust error correction and reliability, and higher-resolution processing. However, these design enhancements come at a price; as designs become larger and more capable, they also become more complicated. If it already takes several person-months to properly design, simulate, debug, and verify the firmware of an FPGA with tens to hundreds of thousands of logic cells, how long will it take to do the same for FPGAs with tens of millions of cells?

Before designers can take advantage of larger devices, they must ensure that they can meet three main objectives. First, we must make sure we can control the design costs. The logic density of FPGAs may double every few years, but the budgets of scientific research organizations do not. Second, to maintain a reasonable pace of advancement of HEP systems, the design time for these circuits cannot simply increase proportionally with the growth of logic capacity. Third, to ensure the collection of valid scientific results and protect the operation of critical experimental systems, the number of bugs and defects in these circuits must be held to a very low level. To achieve these three objectives, we must increase the productivity and effectiveness of the design and testing of systems. In some cases this can be achieved by adopting new software tools and technologies. In other cases it may mean that developers must transition from ad hoc design practices to more formal and rigorous methodologies.

In this section we cover three concepts for increasing productivity for complex designs: collaborative techniques, scalable design methodology, and high-level-language tools.

### A. Collaborative Design

One approach to managing larger, more complex designs is to tap into a larger pool of design talent and expertise. On a global system-wide scale, HEP projects already rely on large-scale collaborative efforts from many research and design groups. However, collaboration can also be employed at the level of individual designs. This sort of collaboration can be implemented on varying scales.

On a small scale, each design group could institute a policy of seeking peer review of their work to ensure that it is of the highest quality. Presenting design decisions for external review not only provides the benefit of outside expert experience and insight, but also helps the group to systematically explore, justify, and document their design choices. Such a review could be regularly applied at multiple levels, including specifications, major design decisions, and actual firmware code.

On a larger scale, related groups within HEP projects could implement infrastructure for sharing firmware code with each other. Although the LHC collaboration has access to a spectacular range of expertise from numerous universities and labs, teams often work in isolation until it is time to begin integrating their systems. Although each group has its own set of goals, constraints, and platforms, it is reasonable to expect that some design work could be shared between teams. For example, many systems may need to decompress zero-suppressed data transmissions, calculate parity and apply error correction, or sort sets of data. If many groups replicate the same design process needed to implement these functions, time and money are being wasted.

On the largest scale, firmware source code could be published and made open to public scrutiny after the initial internal design. Studies suggest that the average defect rate for open source software is significantly lower than that of proprietary software [5]. It may be possible to achieve similar improvements with open source firmware. Moreover, opening the code up to the public for comment could allow review by hundreds of external designers at very low cost.

#### B. Methodology

One of the most crucial components to managing complex projects is adherence to a structured design methodology. The topic of design methodology is too broad to be thoroughly covered in a single paper. Therefore, rather than attempting to provide an exhaustive summary, we focus on a few concepts that are particularly useful to the design of complex digital systems for HEP applications.

#### 1) Specification

The first step in the design of any reasonably large system is the development of the design specifications. The specifications include the performance requirements of the design - which may include aspects such as latency, throughput, I/O bandwidth, error correction capabilities, and other factors - a description of the algorithms to be implemented, and input and output data formats. At a higher level, the specifications may also include factors such as the monetary and time budgets. Development of a robust and well-documented set of specifications for each major portion of a design should be performed early in the design process, possibly before the first line of firmware code is written. Clear and early communication of these requirements helps to avoid the errors and incompatibilities that arise when teams work from a set of incomplete or unclear specifications. Moreover, the development of the specifications may itself yield insight into the strategies to take during the firmware development process, guide resources to the most challenging aspects of the design, and uncover potential problems before a major engineering investment has been made.

For systems that implement physics algorithms, such as trigger systems, the specifications for the physics and electronics components are typically kept separate. For example, the algorithms are developed in order to meet the physics requirements and verified via simulation and mathematical models. Then, these algorithms are used to develop specifications for the electronics. This methodology makes sense when considering that the physics performance is the first concern of an experiment. However, the problem with this method of developing electronics specifications is that it may constrain the ability of engineers to evaluate alternate designs. For example, making slight alterations to a triggering algorithm might have minimal impact on the triggering efficiency (the physics) but yield major savings in the complexity of the hardware (the electronics). With limited budgets and more stringent hardware requirements, it may become prudent to view the development of the hardware systems that support the experiments as a first-class concern. Efforts should be made to integrate the physics and electronics specifications and form multi-disciplinary teams to evaluate the impact of algorithmic modifications in both the physics and electronics domains.

#### 2) Design Practices

When designing firmware code for complex systems, there are a variety of techniques that can be used to help manage large projects. One of the most basic of these is the concept of modular design. Modular design uses a 'divide and conquer' approach to break up big projects into smaller parts that are easier to design, test, and verify. Systems are partitioned into a group of interconnected modules that each implement a basic function, and these can be combined (perhaps hierarchically) into the required larger structure. Ideally the system should be partitioned in such a way that modules have few interdependencies and each module's function can be analyzed and understood in isolation. Furthermore, modularity provides the benefit of module reuse. For example, a 32-bit-wide adder module could be hierarchically designed, composed of several 8-bit-wide adder modules.
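The hierarchical adder mentioned above can be illustrated with a small behavioral model. The sketch below is written in Python rather than an HDL, purely to show the structure; the function names `add8` and `add32` are hypothetical, and the ripple-carry arrangement is an assumption, since the text does not specify how the 8-bit modules are combined.

```python
def add8(a, b, carry_in=0):
    """Behavioral model of an 8-bit adder module: returns (sum, carry_out)."""
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    return total & 0xFF, total >> 8


def add32(a, b, carry_in=0):
    """32-bit adder composed of four add8 instances chained by their carries."""
    result = 0
    carry = carry_in
    for i in range(4):
        shift = 8 * i
        # Each instance sees only its own byte lane plus the incoming carry.
        partial, carry = add8(a >> shift, b >> shift, carry)
        result |= partial << shift
    return result, carry
```

Because `add8` can be checked exhaustively on its own, a test of `add32` only needs to confirm that the carries are wired correctly between instances.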

Modular design offers several important advantages over monolithic design. Building a system up by starting with smaller modules allows the developer to test and debug the firmware code in small pieces. This makes it easier to identify and isolate bugs in the code. Modular design also allows developers to perform synthesis on basic computational modules and obtain early performance estimates to guide later development. Building up a library of modules that implement basic functions also allows code re-use, avoiding duplicate coding work and reducing the testing burden. Such modules could also be shared across different projects using a collaborative firmware repository as described in Section III.A.

For the development of HEP systems, it may be beneficial to use parameterization to further increase design re-use beyond what would be possible with modularity alone. Parameterization is a powerful construct available in all major HDLs. It allows a designer to use computations on constants to determine design features such as the size of registers and width of buses. Modifying parameterized features requires simply changing the parameter value in the code, then re-compiling the HDL code into a new (modified) hardware structure. By parameterizing modules, one could, for example, use the same parameterized adder code to create multiple adders of different bit-widths without having to alter the firmware and thereby risk introducing new bugs. Parameters can also be used to rapidly explore the impact of different design decisions. For example, a developer could study the effect that varying the precision of a multiplication unit has on its maximum frequency.
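The idea of a single parameterized description yielding adders of several widths can be sketched as follows. This is a Python behavioral analogue, not HDL; `make_adder` is a hypothetical name, and the `width` argument plays the role that a generic or parameter would play in VHDL or Verilog.

```python
def make_adder(width):
    """Return an adder for `width`-bit operands; `width` acts like an HDL parameter."""
    if width < 1:
        raise ValueError("width must be a positive integer")
    mask = (1 << width) - 1  # derived constant, computed once from the parameter

    def adder(a, b, carry_in=0):
        total = (a & mask) + (b & mask) + carry_in
        return total & mask, total >> width  # (sum, carry_out)

    return adder


# The same description instantiated at two different widths.
add8 = make_adder(8)
add12 = make_adder(12)
```

Changing the width requires no edit to the adder body itself, which is the property that limits the opportunity for new bugs.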

When parameterized modules are combined with code generation constructs available in HDLs, they give designers a powerful tool for exploring large-scope considerations, such as the number of design units that can fit on a given FPGA model. This analysis can be especially useful in large-scale HEP systems where a design may need to be partitioned across multiple devices. The use of fully-parameterized designs enables a rapid evaluation of various partitioning schemes and the ability to gauge the tradeoffs of using different models. It also allows the HDL code to be quickly adapted to different FPGAs; this can be a very valuable trait in HEP designs where the long development process may mean that the target device is not finalized until well into the development cycle. Moreover, it allows the design to be gracefully adapted to larger FPGAs if the hardware is upgraded in the future.
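A first-order version of this partitioning analysis can be sketched as a resource calculation. The Python sketch below is an illustration only: the resource names, the per-unit costs, the device capacities, and the 80% utilization ceiling are all hypothetical values chosen for the example, and a real estimate would come from synthesis reports.

```python
import math


def units_per_device(unit_cost, device_capacity, utilization_pct=80):
    """Number of design units that fit on one device, limited by the scarcest resource."""
    return min(
        (device_capacity[resource] * utilization_pct // 100) // cost
        for resource, cost in unit_cost.items()
        if cost > 0
    )


def devices_needed(total_units, unit_cost, device_capacity, utilization_pct=80):
    """Number of devices required to hold `total_units` copies of the design unit."""
    per_device = units_per_device(unit_cost, device_capacity, utilization_pct)
    if per_device == 0:
        raise ValueError("a single unit does not fit on this device")
    return math.ceil(total_units / per_device)


# Hypothetical figures for illustration only.
UNIT = {"luts": 1200, "registers": 900, "dsp": 4}
SMALL_FPGA = {"luts": 50_000, "registers": 100_000, "dsp": 96}
LARGE_FPGA = {"luts": 200_000, "registers": 400_000, "dsp": 768}
```

With these assumed numbers, the small device is limited by its DSP blocks while the large one is limited by its LUTs, which shows how the binding constraint can change between FPGA models.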

#### 3) Firmware/Emulator Co-design

Designing firmware is generally a more time-consuming process than writing software using a high-level language. As such, it is common practice to first create an emulator for a HEP hardware system in software, use it to explore and test new algorithms, then design the hardware to match the function of the emulator. This approach is effective for rapidly testing algorithmic changes, but often leaves a large implementation gap between the emulator and the hardware. Algorithms that are easy to implement and achieve high performance in software do not necessarily share those properties in hardware. This may lead the software designers to describe algorithms that are very difficult for the hardware designers to implement efficiently. Also, the high-level code that implements the emulator may have a much different structure and interface than the HDL code that implements the firmware, making it difficult to share the same testing infrastructure between the two.

In the future, it may be advantageous to move to a methodology that focuses on firmware/emulator co-design rather than a sequential process of creating the emulator and then creating the firmware or vice versa. The concept of co-design is to allow systems to be developed in tandem, allowing rapid transmission of feedback and implementation of changes. Better communication between firmware designers and algorithm developers should lead to the adoption of algorithms that both meet the needs of the experiment and are well-suited to hardware. Moreover, a co-design process would encourage the use of similar structural hierarchies in the emulator and firmware. This would allow the use of a unified test framework, making it much easier to pinpoint bugs in either implementation.

One of the most important aspects of a co-design methodology is to ensure that the speed of firmware design does not impede the software design. Therefore, rather than directly going between a high-level language, such as C/C++, and an HDL, it may be beneficial to use a hardware verification language (HVL) such as SystemVerilog or SystemC [7, 8] to help bridge the gap.

#### 4) Testing Practices

As projects migrate functionality from high-level languages to firmware implementations or collaboration begins via a co-design methodology, a wide gap separates hardware system and software emulator design approaches. Each has its own programming models and development environments. The original software application description can range from general imperative languages like C, to object-oriented languages like C++ or Java, to domain-specific approaches like MATLAB. Firmware may be developed in SystemVerilog or SystemC during the early design exploration phase and in VHDL or Verilog in the implementation phase. This multitude of languages and programming environments makes the design process lengthy and error-prone, as developers must often manually transcode between different languages and environments. Many 'best practices' are utilized in industrial and academic environments to help this process, such as automatically generating documentation (e.g. Javadoc), auto-configuration, adherence to interface specifications, and unit testing.

In particular, unit testing facilitates productive design by integrating testing early into the design flow to catch erroneous or unexpected module behavior earlier in the design cycle, when it is cheaper and easier to alter the design or specifications. Such techniques have proven effective for many languages and platforms, but for design projects that involve transcoding and retooling for the final implementation, existing tools still leave many manual, error-prone steps in the process. This leads to longer design times with lower-quality implementations.

Typically when software designers employ unit testing, they use frameworks that are language-specific (e.g. see [9]). More than just a syntactic customization, such frameworks are often tied to fundamental constructs of the language, such as checking that a method exhibits the proper form of polymorphism in an object-oriented language. These language-specific approaches work well when designers are using only a single language or a single platform for both development and final implementation. But when designers must move between languages with different constructs (such as when moving between an emulator coded in C++ and firmware written in VHDL), the existing tests must be rewritten. This consumes extra design time and creates a new verification challenge to ensure that the corresponding unit tests between these two languages are, in fact, performing the same test.

A new testing approach is needed that is language and platform agnostic. Such an approach is possible by leveraging model-based design for projects that integrate heterogeneous programming languages and by applying and integrating different kinds of design and testing methodologies. With model-based development, automatic testbench creation is possible, improving the ease with which designers can create cross-platform tests.
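One way to make a unit test independent of the implementation language is to define it purely in terms of a stimulus, an expected response, and a command to run. The Python sketch below illustrates that idea under stated assumptions: `run_case` and `DOUBLER` are hypothetical names, the text-over-stdin interface is our own choice for the example, and the stand-in "implementation" is a one-line script that would in practice be replaced by an emulator binary or a wrapper around an HDL simulator.

```python
import subprocess
import sys


def run_case(command, stimulus, expected, timeout=30):
    """Feed `stimulus` to `command` on stdin and compare its stdout to `expected`.

    The comparison ignores whitespace differences so that implementations in
    different languages need not agree on line endings.
    """
    proc = subprocess.run(
        command, input=stimulus, capture_output=True,
        text=True, check=True, timeout=timeout,
    )
    return proc.stdout.split() == expected.split()


# Stand-in for the module under test (assumption: it doubles each input value).
DOUBLER = [
    sys.executable, "-c",
    "import sys\nfor line in sys.stdin: print(2 * int(line))",
]
```

Because the test case never refers to language constructs, the same stimulus and expected-response pair can be applied unchanged to the emulator and to the firmware simulation.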

One tool that has been developed to manage this aspect of the design process is the *DSPCAD Integrative Command Line Environment* (DICE) [10]. It provides a framework for facilitating efficient management of the test and development of cross-platform design projects. In order to accommodate cross-platform operation, the DICE engine provides a collection of utilities implemented as bash scripts, C programs, and Python scripts. By using free and open source command-line interfaces and languages, DICE is able to operate on different platforms, such as Windows (equipped with Cygwin), Solaris, and Linux.

#### 5) Design Verification

To improve the quality and performance of hardware designs while reducing their development time, a cross-platform design environment is needed that accommodates both early design exploration and final implementation tuning. One could make effective use of the initial higher-level application specification to create a functionally-accurate, language-independent design model. This model could be used in the development and validation of both the emulator and hardware.

The Dataflow Interchange Format (DIF) is a tool for model-based design and implementation of signal processing systems using dataflow graphs [13, 14]. A designer starts by translating the high-level design specification into a platform-independent description of the application in the DIF format. This structured, formal application description is an ideal starting point for capturing concurrency and for optimizing and analyzing the application. Because the application description in DIF exposes communication as a first-class citizen, DIF descriptions are suitable for targeting hardware design, where modules must be interconnected by wires. After creating the initial DIF description, a designer can use it to perform side-by-side development and validation of optimized hardware or software implementations. One of the main advantages of using the DIF format is that it is a dataflow-based description that allows the use of sophisticated analysis techniques that have been developed for dataflow languages.

A formal model such as dataflow can improve the test quality and provide information and tools that can be used to optimize a design. Dataflow models have proven invaluable for application areas such as digital signal processing. Their graph-based formalisms allow designers to describe applications in a natural yet semantically-rigorous way. Such a semantic foundation has permitted the development of a variety of analysis tools, including tools for balancing input and output buffers and for efficiently scheduling multiplexed operations [11]. As a result, dataflow languages are increasingly popular. Their diversity, portability, and intuitive appeal have extended them into many application areas and target platforms.
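As an example of the kind of analysis a dataflow formalism permits, consider the balance equations of a synchronous dataflow graph, in which each edge produces and consumes a fixed number of tokens per firing. Solving them gives the number of times each actor must fire per schedule iteration so that buffers return to their initial fill level. The Python sketch below shows this standard technique; it is not the DIF tool's interface, and the function name and edge representation are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Solve q[src] * produced == q[dst] * consumed for every edge.

    `edges` is a list of (src, produced, dst, consumed) tuples.
    Returns the smallest positive integer firing counts per actor.
    """
    rate = {actors[0]: Fraction(1)}
    pending = [actors[0]]
    while pending:
        node = pending.pop()
        for src, produced, dst, consumed in edges:
            if src == node:
                other, value = dst, rate[src] * produced / consumed
            elif dst == node:
                other, value = src, rate[dst] * consumed / produced
            else:
                continue
            if other not in rate:
                rate[other] = value
                pending.append(other)
            elif rate[other] != value:
                raise ValueError("inconsistent rates: buffers cannot be balanced")
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the rational solution to the smallest integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    return {actor: int(rate[actor] * scale) for actor in actors}
```

For a chain in which A produces 2 tokens per firing, B consumes 3 and produces 1, and C consumes 2, the solution is three firings of A, two of B, and one of C per iteration. An inconsistent graph is reported as an error rather than silently producing an unbounded buffer.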

A typical approach involves specifying the application in DIF. Such an application specification typically defines the underlying modules and subsystems, along with their interfaces and connections. This specification is complete in terms of ensuring a correct functional behavior and module interfaces. The DICE framework can be applied to test each of the individual modules for its correctness, or extended to a larger subsystem or the entire application.

Any transcoding or platform-specific enhancements are accommodated by DICE via its standardized build and test framework. This allows designers to utilize the same testing framework at inception as they do at final implementation. Software developed jointly with DIF and DICE uses a single, cross-platform framework to handle design validation throughout each phase of development. The amount of time required to perform validation can be reduced through the direct reuse of unit tests in DICE. Model-based development can allow automatic testbench creation, improving the ease with which designers can create cross-platform tests.

### C. High-Level-Language Tools

Several tools have been developed to enable designers to specify algorithms using high-level languages and/or graphical user interfaces and automatically map those algorithms into an HDL. The resulting HDL code can be simulated to ensure correct performance and synthesized, placed, and routed to produce an ASIC or FPGA implementation. These tools facilitate rapid design-space exploration and for certain classes of algorithms lead to efficient implementations. In addition to generating HDL, several of these tools also generate testbenches, hardware interfaces, and synthesis scripts. However, the HDL produced by these tools is often difficult to read and debug. Furthermore, for certain tools and algorithms, the original high-level language code may require significant modifications to yield acceptable results and various high-level language constructs cannot be converted to synthesizable HDL. With some tools, the generated HDL instantiates components that are specific to a particular FPGA family, which can make it difficult to port to other platforms.

#### 1) C-to-HDL Tools

Numerous companies and universities have developed tools that convert C code to Verilog or VHDL. These tools typically take a program written in C, along with a set of design constraints or guidelines, and produce functionally-equivalent Verilog or VHDL. They may also produce accompanying C code (if not all of the original C code is meant to be synthesized), testbenches, synthesis and place-and-route scripts, and interfaces to the resulting hardware designs. With many of these tools, only a subset of the C language is supported, since constructs such as library calls, dynamic memory allocation, function pointers, complex data structures, and recursive functions cannot be easily implemented using synthesizable HDL code. Some of these tools provide extensions to the C language to allow the designer to specify operand lengths, hardware interfaces, timing-related information, and the desired level of parallelism in the resulting HDL. In the remainder of this section, we provide several examples of C-to-HDL conversion tools and then discuss their strengths and weaknesses.

The Impulse CoDeveloper Toolset from Impulse Accelerated Technologies provides a C-based development framework for FPGA-based systems. It includes the CoDeveloper C-to-FPGA Tools, the CoValidator Test Bench Generator, and the CoDeveloper Platform Support Packages [15, 16]. Collectively, these tools allow designers to (1) specify their hardware designs with Impulse-C, which supports a subset of C plus some extensions, (2) profile their Impulse-C code to determine potential performance bottlenecks, (3) if desired, partition the code such that certain code sections are run on an FPGA and other portions are run on a programmable processor, (4) use interactive, graphical tools to specify design constraints and perform optimizations, (5) map selected Impulse-C code into either VHDL or Verilog, (6) generate hardware interfaces for specific FPGA platforms, and (7) create HDL testbenches and simulation scripts to test the resulting designs. The Impulse CoDeveloper Toolset can be used to generate either standalone hardware designs or hardware designs that interface with an embedded or external processor. It also provides several optimizations to improve hardware efficiency and parallelism, including common sub-expression elimination, constant folding, loop pipelining, and loop unrolling. The Impulse CoDeveloper Toolset has been used to develop FPGA-based solutions for a wide range of applications including image and video processing, security, digital signal processing, and scientific and financial computing.

PICO Express FPGA from Synfora takes an algorithm written using a subset of the C programming language and a set of design requirements, such as clock frequency and target throughput, and creates register transfer level (RTL) and SystemC implementation models [17]. It also generates testbenches and an application driver program. PICO Express FPGA includes design space exploration capabilities that, based on user-specified design parameters, create multiple implementations and provide FPGA resource and performance estimates for these implementations to allow design tradeoffs to be evaluated. To achieve efficient designs and provide accurate performance and resource estimates, PICO Express FPGA utilizes several device-independent optimizations and also optimizes the resulting RTL for a particular Xilinx FPGA family. PICO Express FPGA has been used to design FPGA-based hardware for a wide range of systems including video, audio, and image processing, wireless communication, and security.

The C2R Compiler from Cebatech provides an automated mechanism for converting structured C source code, along with a small set of compiler directives, to Verilog and SystemC [19, 20]. Internally, the C2R Compiler creates a control dataflow graph and then uses allocation and scheduling algorithms to produce Verilog that is functionally equivalent to the C source code. Consequently, the original C code can be used to perform functional verification of the resulting Verilog. The C2R design flow allows designers to instrument the C source code with various compiler directives and explore the design space of the resulting architectures. The compiler directives can be used to specify state machines for control, create interfaces to the resulting Verilog code, bind arrays to specific FPGA resources, specify the degree of pipelining to be used to implement loops, control variable bit widths, and enable clock gating of registers in the resulting design. C2R has been used to implement hardware designs for security, data compression, and floating-point arithmetic.

The Catapult C Synthesis tool from Mentor Graphics synthesizes C++ source code, without language extensions, into SystemC, Verilog, or VHDL [18]. Catapult C provides a graphical user interface that lets the designer specify area, performance, and power constraints, apply a variety of optimizations including loop merging, loop unrolling, and loop pipelining, specify operand bit widths, generate hardware interfaces, evaluate design tradeoffs, and identify bottlenecks and inefficiencies in the generated design. Catapult C also provides options for clock-gating to reduce power consumption, takes advantage of optimized FPGA resources such as block RAMs and DSP blocks, and provides automated equivalence checking to formally prove that the original C++ code and the generated HDL are functionally equivalent. Catapult C has been successfully used to generate complex hardware designs for wireless communication and image and video processing. By the end of 2008, over 100 million ASICs had shipped with hardware designed using Catapult C [18].

Several other tools for C-to-HDL conversion have been developed. These include (but are not limited to):

- 1. The Nios II C-to-Hardware Acceleration Compiler from Altera [22, 23]
- 2. The C-to-Verilog Automated Circuit Design Tool from C-to-Verilog.com [24]
- 3. The Trident Compiler from Los Alamos National Laboratory [25, 26]
- 4. The No Instruction Set Computer (NISC) Technology and Toolset from the Center for Embedded Systems at the University of California at Irvine [21]
- 5. The Riverside Optimizing Compiler for Configurable Computing (ROCCC) Toolset from the University of California at Riverside [27, 28]
- 6. The SPARK Toolset from the Microelectronic Embedded Systems Laboratory at the University of California at San Diego [29, 30]
- 7. The GAUT High-level Synthesis Tool from the Laboratory of Science and Technology Information, Communication and Knowledge [31]

In general, the C-to-HDL tools discussed in this paper help simplify the design process, especially for people not familiar with HDLs. They allow the designs to be specified using a subset of C, sometimes with extensions. These tools also facilitate design-space exploration by allowing the designer to specify design constraints, bitwidths, and desired levels of parallelism and then evaluate design tradeoffs based on these specifications. Several of the tools generate additional resources including C support code, test benches, hardware interfaces, and synthesis and place-and-route scripts.

The C-to-HDL tools, however, also have several limitations. Only a subset of the C language is generally supported, and for several tools, extensions to the C language are needed to enable correct synthesis. In order to generate efficient code, it may be necessary to rewrite the original C code to adhere to tool-specific guidelines. Furthermore, the generated code is usually difficult to read and debug. Code that is not well written or too complex can result in designs that are much less efficient than hand-coded HDL designs. On the other hand, it is expected that the tools will continue to improve so that in the future several of these limitations may not be as severe.

#### 2) AccelDSP and System Generator

Xilinx's AccelDSP Synthesis Tool is a high-level MATLAB-based development tool for designing and analyzing algorithmic blocks for Xilinx FPGAs [32]. Although MATLAB is a powerful algorithm development tool, many of its benefits are reduced when converting a floating-point algorithm into fixed-point hardware. For example, quantization errors and the potential for overflow and underflow are introduced into the algorithm due to floating-point to fixed-point conversion. Consequently, designers may need to rewrite the code to reduce the impact of these errors and analyze the results produced by the fixed-point code to ensure they are acceptable. To facilitate this, AccelDSP provides the capability to replace high-level MATLAB functions with fixed-point C++ or MATLAB models and automatically generates testbenches to facilitate fixed-point simulations. The tool automatically converts a floating-point algorithm to a fixed-point C++ or MATLAB model. It then generates synthesizable VHDL or Verilog code from the fixed-point model, and creates a testbench for verification. During the HDL generation process, it performs several optimizations including loop unrolling, pipelining, and device-specific memory mapping. A graphical user interface allows the user to specify the bitwidths used in the generated code and to guide the synthesis process.
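The quantization and overflow effects described above can be seen in a few lines. The following Python sketch is a generic model of signed fixed-point conversion with saturation; it is not AccelDSP's conversion algorithm, and the function name, the convention that `int_bits` includes the sign bit, and the choice of saturation rather than wrap-around are assumptions made for the example.

```python
def to_fixed(value, int_bits, frac_bits):
    """Quantize `value` to signed fixed point with saturation.

    `int_bits` counts the sign bit, so the word length is int_bits + frac_bits.
    Returns (quantized_value, overflowed).
    """
    scale = 1 << frac_bits
    raw = round(value * scale)  # Python rounds exact ties to the nearest even integer
    lowest = -(1 << (int_bits + frac_bits - 1))
    highest = (1 << (int_bits + frac_bits - 1)) - 1
    overflowed = raw < lowest or raw > highest
    raw = max(lowest, min(highest, raw))  # saturate instead of wrapping
    return raw / scale, overflowed


def quantization_error(value, int_bits, frac_bits):
    """Absolute error introduced by the conversion."""
    quantized, _ = to_fixed(value, int_bits, frac_bits)
    return abs(value - quantized)
```

With four integer and four fractional bits, 3.14159 is represented as 3.125, while 9.5 lies outside the representable range and is flagged as an overflow. Sweeping `frac_bits` over a set of representative inputs is one simple way to judge how much precision an algorithm needs.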

The AccelDSP Synthesis tool provides several advantages. It is a tightly integrated component of the Xilinx XtremeDSP Solution and the MATLAB toolset, which allows it to utilize MATLAB's mathematical modeling and data visualization features. To improve the design's efficiency, it automatically utilizes Xilinx IP cores and generates code blocks for use in Xilinx System Generator, which is described below. AccelDSP also provides capabilities to replace high-level MATLAB functions with fixed-point C++, MATLAB, or HDL code by specifying the target Xilinx FPGA model, intermediate data precision, and desired resource distribution. HDL testbenches are generated automatically from the corresponding fixed-point C++ or MATLAB model and these testbenches can be used to verify functional equivalence between the higher-level model and the resulting HDL. Furthermore, overflow and underflow that occur in the fixed-point code are reported by the AccelDSP simulation tool to help designers find potential errors that occur due to the floating-point to fixed-point conversion process. AccelDSP also provides a set of graphical tools, including probe functions, design reports, and plots to visualize and analyze the system. AccelDSP allows designers to define constraints and control resource usage and timing. For example, the user may choose to expand a "for loop" into multiple parallel hardware blocks or a single hardware block that is reused for several iterations. The user may also provide timing constraints that result in a pipelined design.

AccelDSP also has several limitations. For example, it cannot convert all MATLAB files. Rather, the MATLAB file has to be written in a specific way, and only a limited subset of MATLAB can be used. AccelDSP only works with Xilinx FPGA chips, so designs cannot easily be ported to FPGAs from other vendors. Furthermore, the generated HDL can be difficult to read and debug. For many algorithms, the amount of resources required by designs generated using AccelDSP is greater than the amount of resources required by designs generated using hand-coded HDLs.

Xilinx's System Generator is a high-level design tool that utilizes MATLAB Simulink and enables designers to develop DSP hardware designs for Xilinx FPGAs [33]. It provides over 90 parameterized DSP building blocks that can be used in the MATLAB Simulink graphical environment. The design process with Simulink and System Generator consists of selecting DSP blocks, dragging the blocks to their desired location, and connecting the blocks via wires. These blocks and their communication links can be converted from Simulink to Verilog, VHDL, or FPGA bit files. System Generator can also utilize blocks generated by AccelDSP.

System Generator has several strengths. In particular, it is a useful tool for designers with no previous experience with FPGAs or HDL design. In addition to directly generating VHDL and Verilog code, it also provides a resource estimator that quickly estimates the FPGA resources required by the design prior to placement and routing. System Generator can create a hardware simulation model, and integrate with a Simulink software model to evaluate complete applications including analog signals. For example, Simulink can be used to create a sine wave with pseudo-random noise that serves as an input to a System Generator hardware model, which writes its outputs to a file. The complete Simulink model, which includes the System Generator model, can then be used to simulate the entire system and generate a testbench for the hardware module.

System Generator also has several limitations. It requires experience with Simulink to create efficient designs. The Simulink tool uses an interactive graphical environment and a parameterized set of block libraries, which may not be convenient for programmers who are more familiar with high-level programming languages, such as C++ or Java. Furthermore, although the blocks provided by System Generator are very useful for certain types of signal processing applications, these blocks may not meet the needs of other types of applications. Similar to AccelDSP, the HDL code produced by System Generator only works with Xilinx FPGA chips and can be difficult to read and debug.

#### IV. HARDWARE VERIFICATION AND DEBUGGING

As we have discussed, FPGA hardware trends show rapid increases in the number of logic resources on each device. In particular, the number of registers on each device has increased at an especially fast pace recently. The growth trends in registers and in on-chip RAM contribute to an overall trend of increasing state in FPGAs. Increasing the amount of state in a device can prove particularly troublesome during hardware verification, the process of confirming that a circuit built in hardware is consistent in behavior and performance with the circuit as it performed in simulation. Differences between the simulation results and the performance of the real hardware may result from hardware defects that went undetected by the manufacturer, inaccuracies in the models used for hardware simulation, variation from nominal environmental parameters, or unexpected operating conditions such as mutual inductance or capacitive coupling from other systems, clock jitter, power supply noise, etc. Such issues become more important for high-performance systems with tight tolerances, since they are more susceptible to problems arising from variations in the timing of internal signals. Additionally, for large, interconnected systems, such as those used in HEP, full system simulation may be very costly or simply infeasible. This further motivates the importance of thoroughly testing the hardware.

Hardware verification is performed by subjecting hardware to a series of test patterns and comparing the performance to the expected results. When an error occurs, it is important to find the source of the error to determine an appropriate way of correcting it. The process of locating the source of errors becomes much more difficult as the quantity of state in a device increases. This is because faulty values may contaminate the state, propagate to different parts of it, and take many cycles before they generate an observable error. At the same time, the number of pins on FPGAs is growing at a much slower rate than the internal state. It will become more difficult to observe internal state using an external logic analyzer as the ratio of state to pins increases. This is particularly concerning because it means that designers must use much longer and more elaborate tests to verify their hardware. However, in physics applications, it is crucial to identify and eliminate any such bugs before the start of experimentation to ensure confidence in experimental results.

The problem of verifying and debugging circuits with large amounts of state is not unique to FPGAs, and has been extensively studied in the integrated circuit domain [4]. Today, engineers use a set of design techniques known as design for testability (DFT) and built-in self-test (BIST) to automatically apply tests internally and more easily probe the contents of state registers [41]. While these techniques are useful, they come with a cost; adding DFT and BIST consumes chip resources, may require extended design time, and often results in reduced operating frequency. However, because FPGAs are reprogrammable, they have the unique ability to use these techniques during testing without reducing the performance of the final design. In the remainder of this section, we will describe the software tools available for performing BIST on FPGAs in a fast and efficient manner.
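To make the BIST idea concrete, the sketch below models one common arrangement: a linear feedback shift register (LFSR) generates pseudo-random test patterns, and a second LFSR-based register compacts the responses into a single signature that is compared against a known-good value. This Python model is a generic illustration and is not taken from the tools discussed in this section; the 16-bit tap positions, the seed, and the stand-in `circuit_under_test` function are assumptions chosen for the example.

```python
def lfsr_step(state):
    """Advance a 16-bit Fibonacci LFSR (feedback taps 16, 14, 13, 11) by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def patterns(seed, count):
    """Generate `count` pseudo-random 16-bit test patterns starting from `seed`."""
    state, out = seed, []
    for _ in range(count):
        out.append(state)
        state = lfsr_step(state)
    return out


def signature(responses):
    """Compact a response stream into one 16-bit signature."""
    sig = 0
    for response in responses:
        sig = lfsr_step(sig) ^ (response & 0xFFFF)
    return sig


def circuit_under_test(x):
    """Hypothetical combinational block standing in for the real logic."""
    return (3 * x + 1) & 0xFFFF
```

Only the final signature needs to leave the device, which is what makes the approach attractive when pins are scarce. The compaction is lossy, however: a single corrupted bit always changes the signature, but some multi-bit error patterns can alias to the correct value.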

#### A. Integrated Logic Analyzers

Major FPGA vendors have provided tools to alleviate the problem of hardware verification. Xilinx's ChipScope Pro [6] and Altera's SignalTap II Embedded Logic Analyzer [12] enable designers to probe and monitor an FPGA's internal signals in real-time. These tools considerably cut the time and effort needed to eliminate hard-to-detect bugs. The tools help equip a design with embedded hardware logic analyzers that sample data and transactions on selected signals and nodes. ChipScope Pro further provides the ability to force internal signals to specified values. Any internal signals in the design can be selected for monitoring. The sampled data are stored in the FPGA's embedded Block RAMs. Data are sent to a personal computer using the JTAG interface, the same interface used in FPGA programming, to give a visual display of the internal signals. Designers can easily observe and analyze transactions on internal signals of the design in real-time by means of a Software Logic Analyzer installed on a PC. Data sampling is triggered at runtime by a set of predefined conditions that can be set using a graphical user interface. The data sampling lasts for the number of clock cycles specified by the designer.

Using integrated logic analyzers removes or reduces the need for specific external hardware. These tools provide relatively complete observability to designers. They are especially useful for large designs, which often have a myriad of signal and data variations to verify. With ChipScope Pro, designers can also control the value of internal signals. This is especially valuable in complex sequential circuits, where it may take a long sequence of external inputs to change certain internal signals. In addition, signal monitoring is done by on-chip configurable logic analyzers while the FPGA is working under standard operating conditions. This eliminates the need to purchase expensive external logic analyzers and chip testers. Hence, these tools provide an easy, yet powerful approach to FPGA design verification that lowers project costs, saves design time, and helps find bugs early in the implementation process.

Although this approach supplies the designer with new verification capabilities, it has some drawbacks and limitations. The number of observed signals and the length of the sampling window depend on the free Block RAMs available on the FPGA, and since this approach uses FPGA resources, it may have a negative impact on the timing of the design. Furthermore, defining a trigger condition that leads to bug detection can be challenging. Finally, because embedded logic analyzers sample data at the hardware's clock frequency, they cannot supersample the clock and therefore cannot capture signal glitches or test clock signals.

#### 1) ChipScope Cores

ChipScope is composed of a set of cores to promote the design verification process. The Integrated Logic Analyzer (ILA) core, as the most common core, is used for signal monitoring. The Integrated Bus Analyzer (IBA) core simplifies system bus monitoring. The Integrated Controller (ICON) core is used to set trigger conditions and send data from Block RAMs to the PC via the JTAG interface. The Agilent Trace Core 2 (ATC2) core provides an interface between the embedded logic analyzer and the Agilent FPGA trace port analyzer. The virtual input/output (VIO) core provides the designer with signal controllability along with signal monitoring. The internal bit error ratio tester (IBERT) core allows the designer to detect bugs hidden in RocketIO serial I/O designs.

## V. CONCLUSION

The trends in FPGA hardware show exponential increases in device logic, on-chip memory, and I/O bandwidth over recent years. Process technology is in place to allow FPGA manufacturers to maintain this growth in the near future. This improvement in FPGAs could allow future HEP systems to incorporate more intricate and flexible algorithms, implement higher-resolution processing, and perform system integration. Achieving these goals will require larger, more complex designs on each FPGA. To manage increasingly complex designs while still working within constrained cost and time budgets, system developers must adopt a more scalable design methodology. This methodology must extend across the entire design process, from specification and design exploration to testing and hardware verification. In this paper, we have presented core design concepts and emerging software tools that can serve as the foundation for a design methodology that can scale to meet the next generation of FPGA-based HEP systems.

#### VI. ACKNOWLEDGEMENT

This work was supported in part by the National Science Foundation, under grants EECS-0824040 and EECS-0823989.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF. We are grateful to the Topical Workshop on Electronics for Particle Physics (TWEPP) Scientific Organizing Committee for inviting us to present at the workshop and submit this paper.

#### REFERENCES

- [1] CMS Collaboration. CMS TriDaS Project: Technical Design Report; 1, The Trigger Systems, CERN (2000).
- [2] W.H. Smith, P. Chumney, S. Dasu, M. Jaworski, and J. Lackey. CMS Regional Calorimeter Trigger High-Speed ASICs, 6th Workshop on Electronics for LHC Experiments, 2000-2010 (2000).
- [3] Intel Corp. Press Release: Intel Developer Forum 22nm News Facts (2009).
- [4] A. Ghosh, S. Devadas, and A.R. Newton. Test Generation and Verification for Highly-Sequential Circuits, *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, 652-667 (1991).
- [5] Coverity Inc. Scan Open Source Report (2009).
- [6] Xilinx Inc. ChipScope Pro 11.3 Software and Cores (2009).
- [7] IEEE Std. 1666-2005. Open SystemC Reference Manual (2005).
- [8] IEEE Std. 1800-2005. System Verilog: Unified Hardware Design, Specification, and Verification Language (2005).
- [9] P. Hamill. Unit Test Frameworks (2004).
- [10] S.S. Bhattacharyya, S. Kedilaya, W. Plishker, N. Sane, C. Shen, and G. Zaki. The DSPCAD Integrative Command Line Environment: Introduction to DICE Version 1, UMIACS-TR-2009-13 (2009).
- [11] E.A. Lee and D.G. Messerschmitt. Synchronous Dataflow, *Proceedings of the IEEE*, 1235-1245 (1987).
- [12] Altera Corp. Design Debugging Using the SignalTap II Embedded Logic Analyzer (2009).
- [13] C. Hsu, M. Ko, and S.S. Bhattacharyya. Software Synthesis from the Dataflow Interchange Format, *Int. Workshop on Software and Compilers for Embedded Systems*, 37-49 (2005).
- [14] W. Plishker, N. Sane, M. Kiemb, K. Anand, and S.S. Bhattacharyya. Functional DIF for Rapid Prototyping, *Rapid System Prototyping*, 17-23 (2008).
- [15] Impulse Accelerated Technologies. Impulse CoDeveloper (2009). Available from http://www.impulseaccelerated.com/products.htm.
- [16] D. Pellerin. Impulse C-to-FPGA Workshop, Fourth Annual Reconfigurable Systems Summer Institute (2008). Available from http://www.rssi2008.org/proceedings/tutorial/Impulse.pdf.
- [17] Synfora. PICO Express FPGA (2009). Available from http://www.synfora.com/products/picoExpressFPGA.html.
- [18] Mentor Graphics. Catapult C Synthesis (2009). Available from http://www.mentor.com/products/esl/high\_level\_synthesis/catapult\_synthesis/.
- [19] Cebatech. Cebatech Technology (2009). Available from http://www.cebatech.com/technology.

- [20] S. Ahuja, S.T. Gurumani, C. Spackman, and S.K. Shukla. Hardware Coprocessor Synthesis from an ANSI C Specification, *IEEE Design & Test of Computers*, 58-67 (2009).
- [21] Center for Embedded Systems. NISC Toolset User Guide (2007). Available from http://www.ics.uci.edu/~nisc/toolset/Quick-Guide.pdf.
- [22] Altera Corporation. Nios II C-to-Hardware Acceleration Compiler (2009). Available from http://www.altera.com/products/ip/processors/nios2/tools/c2h/ni2-c2h.html.
- [23] Altera Corporation. Nios II C2H Compiler User Guide (2009). Available from http://www.altera.com/literature/ug/ug\_nios2\_c2h\_compiler.pdf.
- [24] The C-to-Verilog Automated Circuit Design Tool (2009). Available from http://www.c-to-verilog.com/.
- [25] Trident Compiler (2009). Available from http://sourceforge.net/projects/trident/.
- [26] J.L. Tripp, M.B. Gokhale, and K.D. Peterson. Trident: From High-Level Language to Hardware Circuitry, *IEEE Computer*, 28-37 (2007).
- [27] The University of California at Riverside. The Riverside Optimizing Compiler for Configurable Computing (ROCCC) Toolset (2009). Available from http://www.cs.ucr.edu/~roccc/.
- [28] Z. Guo, W. Najjar, and A.B. Buyukkurt. Efficient Hardware Code Generation for FPGAs, ACM Transactions on Architecture and Compiler Optimizations, 1-26 (2008).
- [29] Microelectronic Embedded Systems Laboratory. SPARK: A Parallelizing Approach to the High-Level Synthesis of Digital Circuits (2004). Available from http://mesl.ucsd.edu/spark/.
- [30] S. Gupta, R.K. Gupta, N.D. Dutt, and A. Nicolau. SPARK: A Parallelizing Approach to the High-Level Synthesis of Digital Circuits, Kluwer Academic Publishers, (2004).
- [31] LABSTICC. The GAUT High-level Synthesis Tool (2009). Available from http://www-labsticc.univ-ubs.fr/www-gaut/.
- [32] Xilinx, Inc. AccelDSP Synthesis Tool (2009). Available from http://www.xilinx.com/tools/acceldsp.htm.
- [33] Xilinx, Inc. System Generator for DSP (2009). Available from http://www.xilinx.com/tools/sysgen.htm.
- [34] Xilinx, Inc. Virtex 2.5 V Field Programmable Gate Arrays, DS003-1 v2.5 (2001).
- [35] Xilinx, Inc. Virtex-E 1.8V Field Programmable Gate Arrays, DS022-1 v.2.3 (2002).
- [36] Xilinx, Inc. Virtex-II Platform FPGAs: Complete Data Sheet, DS031 v3.5 (2007).
- [37] Xilinx, Inc. Virtex-II Pro and Virtex-II Pro X Platform FPGAs: Complete Data Sheet, DS083 v4.7 (2007).
- [38] Xilinx, Inc. Virtex-4 Family Overview, DS112 v3.0 (2007).
- [39] Xilinx, Inc. Virtex-5 Family Overview, DS100 v5.0 (2009).
- [40] Xilinx, Inc. Virtex-6 Family Overview, DS150 v2.0 (2009).
- [41] L.T. Wang, C.W. Wu, and X. Wen. VLSI Test Principles and Architectures: Design For Testability (2006).

## A flash high-precision Time-to-Digital Converter implemented in FPGA technology

P. Branchini<sup>a</sup>, S. Loffredo<sup>a</sup>

<sup>a</sup>Dipartimento di Fisica, Università Roma Tre and I.N.F.N. Sezione di Roma Tre, Rome, Italy

loffredo@roma3.infn.it

#### Abstract

The design and construction of a high-resolution time-interval measuring system implemented in an SRAM-based FPGA device is discussed. A flash architecture has been implemented. The architecture is virtually dead-time free. It consists of a high-precision quartz-driven coarse time counter and a two-step phase interpolator. Time intervals in 50 ps steps have been generated to characterize the TDC within the clock period (1.818 ns). The behaviour of the TDC has been tested up to a 20 $\mu$s interval in 1 $\mu$s steps. In this way we have measured a resolution on the time interval of about 50 ps on every single measurement. The results of the device in terms of resolution, differential and integral non-linearity are presented.

#### I. INTRODUCTION

Time-to-Digital Converters (TDCs) are required in many applications in High Energy and Nuclear Physics. They are also widely used in scientific equipment such as Time-Of-Flight (TOF) spectrometers and in distance measurements. Different configurations of tapped delay lines are widely used to measure sub-nanosecond time intervals in both ASIC and FPGA devices. However, the design process of an ASIC device can be expensive, especially if it is produced in small quantities, while FPGAs lower the development cost and offer high design flexibility. Rapid progress in FPGA technology has made it possible to achieve time resolutions between 50 ps and 500 ps [1], [2]. The architecture used in this device, besides being dead-time free, is multi-hit and allows for a resolution of about 50 ps. We show its performance in terms of resolution, integral and differential non-linearity.

#### II. PRINCIPLE OF OPERATIONS

Our architecture is based on the newest available Xilinx Virtex-5 FPGA [3]. We used the XC5VLX50 with -3 speed grade in order to improve the performance for high-speed design. The approach exploits the classic Nutt method [4] based on a multi-stage interpolation. The first stage is built around a coarse free-running counter used to measure long time intervals. The Virtex-5 Digital Clock Managers (DCMs) provide a wide range of clock management features and allow phase shifting. We used a DCM that gives four copies of the same clock signal shifted by 0° (clk0), 90° (clk90), 180° (clk180) and 270° (clk270). The DCM output signals synchronize a state machine that is also used to perform a first phase interpolation measurement. The dynamic range of the coarse conversion is limited by the counter output width, and the bin size of the coarse output is limited by the clock period. The DCM is used to perform a first-level phase interpolation (fine time measurement), giving a resolution of about 454 ps. The third stage performs the hyperfine time measurement, further improving on the resolution of the coarse counter. Since we also exploit the four-phase information delivered by the DCM, the delay line only has to interpolate between the four phases, i.e. over a quarter of the clock cycle. Our time converter consists of tapped delay lines.

#### III. THE TDC TESTER BOARD

The TDC Tester board we have built is shown in Fig.1. On this board we have installed two high-stability oscillators [5]. The first oscillator (VFTX140) generates an output frequency of 550 MHz. Its temperature stability is better than 0.28 ppm over a temperature range from 0°C to +70°C. The output is configured as a differential LVPECL signal. Long-term time accuracy depends on oscillator stability. To address this problem the VFOV200 oscillator has been selected. This oscillator provides an HCMOS output frequency of 250 MHz and has a temperature stability up to 5 ppb over a temperature range from -40°C to +85°C. Test points for high-bandwidth active probes are used to characterize the Virtex-5 clock signal. SMA connectors are used to send the start and stop signals to the device. They may adopt differential or single-ended signalling schemes. The Tester daughter board is hosted by a VME module which allows us to test and read out the TDC via a VME CPU.

#### IV. TDC ARCHITECTURE

The simplified circuit block diagram of the TDC architecture is shown in Fig.2. The external clock frequency we used was 550 MHz. The coarse time measurement is obtained by the TDC using the quartz clock signal. The task of the finite state machine shown in Fig.3 is to select the proper delay line and to perform a first phase interpolation using the four phases of the clock, delivering a fine time measurement. A 2-bit counter $N_c$ encodes the result of the first-step phase interpolator. Delay lines are used to interpolate the phase within a quarter of the clk period, improving the time resolution and delivering a hyperfine time measurement.

Figure 1: The TDC tester board

The building blocks of the coarse TDC are the 550 MHz synchronous binary counter and the finite state machine. The coarse counter has a 32-bit data width and is used in free-running mode. This counter is reset only at power up. When the start signal transition occurs, the current state of the counter is sampled by the start register, and the same operation occurs when the stop signal is delivered to the TDC. The difference between the stop and the start register is the coarse measurement of the time interval. The state machine samples the start and the stop signal and detects the phase difference between the start and stop rising edges. The least significant bit corresponds to a quarter of the clock period. The full clock period is recovered by the 2-bit counter $N_c[1:0]$, which labels the phase value. The binary value $N_c[1:0]$ extends the output of the coarse TDC to a 34-bit word. The state machine therefore allows us to obtain a time resolution of a quarter of the clk0 period (454 ps), and the delay line is only used to interpolate the phase within one quarter of the clock period. The sel0/1[1:0] outputs, shown in Fig.2, follow the phase difference between the start/stop and the clk0 signal. This value is "00" if the phase difference is between 0 and $\pi/2$, "01" if it is between $\pi/2$ and $\pi$, "10" if it is between $\pi$ and $3\pi/2$, and "11" if it is between $3\pi/2$ and $2\pi$. The selection of the tapped delay line for the fine time measurement reflects the phase difference between the start/stop signal and clk0. The measurement range of the coarse TDC is limited by the counter width and its resolution is limited by the clock frequency.
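As an illustration of the coarse measurement described above, the following Python sketch combines the 32-bit counter value and the 2-bit phase code into a 34-bit word whose least significant bit is a quarter of the 550 MHz clock period. The function names are hypothetical, and the modular subtraction used to handle counter wrap-around is an assumption, not something stated in the text.

```python
import math

CLOCK_HZ = 550e6
CLOCK_PERIOD_PS = 1e12 / CLOCK_HZ        # about 1818 ps
QUARTER_PERIOD_PS = CLOCK_PERIOD_PS / 4  # about 454 ps, the coarse LSB
WORD_BITS = 34                           # 32-bit counter + 2-bit phase code


def phase_code(phase_rad: float) -> int:
    """Map a phase difference in [0, 2*pi) to the 2-bit code 0..3."""
    return int(phase_rad // (math.pi / 2)) & 0b11


def coarse_word(counter: int, phase: int) -> int:
    """Concatenate the 32-bit counter value and the 2-bit phase code."""
    return ((counter & 0xFFFFFFFF) << 2) | (phase & 0b11)


def coarse_interval_ps(start_counter: int, start_phase: int,
                       stop_counter: int, stop_phase: int) -> float:
    """Stop minus start, in picoseconds, from the two 34-bit words.
    Assumption: at most one counter wrap-around between start and stop."""
    start = coarse_word(start_counter, start_phase)
    stop = coarse_word(stop_counter, stop_phase)
    return ((stop - start) % (1 << WORD_BITS)) * QUARTER_PERIOD_PS


# Start sampled at count 100 in the second quarter, stop at count 111 in
# the fourth quarter: 46 quarter-periods between the two events.
print(coarse_interval_ps(100, 1, 111, 3))
```

The residual within a quarter period is then left to the tapped delay line selected by the state machine.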

#### V. CARRY CHAIN DELAY LINE

Figure 2: Simplified circuit block diagram of the TDC architecture.

Figure 3: Simplified block diagram of the finite state machine.

The carry chain delay line is shown in Fig. 4. We have used the high-speed carry chain structures that vendors designed for general-purpose applications. In this configuration the stop signal is the 550 MHz system clock. The start signal after each delay unit is sampled by the corresponding flip-flop on the rising edge of the stop signal. The delay line consists of a set of 64 multiplexers in sequence. The selection bit of every multiplexer is set to logic value one, in order to let the start signal propagate through the line. The time quantization step of the TDC is determined by the multiplexer propagation delay time $\tau$. Because the tapped delay line is short, four delay lines are needed to cover the full clock period. The four lines are clocked by the clk0, clk90, clk180 and clk270 signals delivered by the DCM. The state machine selects the right delay line by asserting the sel0/1[1:0] bits. A simplified block diagram of the Virtex-5 slice is shown in Fig. 5. The carry chain delay lines are implemented in a small region of the device and every line uses 8 slices. We decided to use four delay lines rather than a single longer one in order to reduce the possible non-linearity introduced by the clock distribution between neighbouring slices. The output of the tapped line is converted from thermometer code into natural binary code by a priority encoder. A very short dead time (about 1 clock period) is the main advantage of using the carry chain delay line.
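The thermometer-to-binary conversion performed by the priority encoder can be modelled in a few lines of Python. This is a behavioural sketch only: the function name is hypothetical, and the choice to report the highest asserted tap (which tolerates isolated "bubbles" in the code) is an assumption about the encoder, not a description of the actual firmware.

```python
from typing import Sequence


def thermometer_to_binary(taps: Sequence[int]) -> int:
    """Behavioural model of a priority encoder on the flip-flop outputs.

    `taps[i]` is 1 if the start edge had already passed delay element i
    when the clock edge sampled the line. The result is the number of
    elements traversed, taken as the position of the highest set tap.
    """
    position = 0
    for index, bit in enumerate(taps):
        if bit:
            position = index + 1
    return position


# A 64-tap line in which the start edge has crossed 5 multiplexers.
line = [1] * 5 + [0] * 59
print(thermometer_to_binary(line))
```

Multiplying the returned count by the multiplexer delay $\tau$ gives the interpolated time within the selected quarter of the clock period.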



Figure 4: Logic block diagram: Carry chain delay line.



Figure 5: Simplified circuit block diagram of the Virtex-5 slice.

#### VI. TESTING THE TDC

To perform our tests we have used an architecture based on an off-the-shelf CPU board, the Motorola MVME6100 [6]. The CPU board is designed around the MPC7457 PowerPC processor running at 1.267 GHz. The VME board hosting the TDC Tester daughter card can handle A32/D16 VME cycles and is configured as a slave. We have used a DTG5334 [7] as a pulse generator. The DTG5334 can deliver time intervals as long as 20 µsec in 1 ps steps. Since we have operated the DTG in free-running mode, an accept signal was delivered by the MVME6100 to the VME slave board in order to start and stop



and we have measured a resolution of about 50 psec in every measured point in that time interval.

Figure 6 shows the integral non-linearity as a function of the TDC output code over a 2 ns interval. Figure 7 shows the differential non-linearity over the same interval.

#### VII. CONCLUSIONS

A TDC based on an FPGA architecture has been built. The advantages of the TDC delay line architecture implemented in an FPGA are its ease of use and flexibility. FPGA technology makes high-speed digital designs possible. This means a high-resolution digital counter and, consequently, a reduced number of delay elements in the line used for the time interpolation within the system clock cycle. The implemented architecture shows very good performance in terms of time resolution (about 50 ps up to 20 $\mu$s) and very low dead time.

#### VIII. ACKNOWLEDGMENTS

This work is partly supported as a PRIN project by the Italian Ministero dell'Istruzione, Università e Ricerca Scientifica. The authors would like to warmly thank R. Lomoro for the general electronic support.

#### IX. REFERENCES

- [1] J. Song, Q. An, and S. Liu, "A High-Resolution Time-to-Digital Converter Implemented in Field-Programmable-Gate-Arrays", *IEEE Trans. Nucl. Sci.*, vol. 53, no. 1, pp. 236-241, Feb. 2006 and references therein.
- [2] A. Aloisio, P. Branchini, R. Cicalese, R. Giordano, V. Izzo, and S. Loffredo, "FPGA implementation of High-Resolution Time-to-Digital Converter", *IEEE Nuclear Science Symposium Conference Record*, vol. 1, 2007, pp. 504-507.
- [3] Virtex-5 User Guide (2008, September). [Online]. Available: http://www.xilinx.com/support/documentation/user\_guides/ug190.pdf
- [4] J. Kalisz, "Review of methods for time interval measurements with picosecond resolution", *Metrologia*, vol. 41, pp. 17-32.
- [5] Valpey Fisher. [Online]. Available: http://www.valpeyfisher.com/
- [6] MVME6100 Series VME Single-Board Computer (2005). [Online]. Available: http://www.motorola.com/mot/doc/5/5501\_MotDoc.pdf
- [7] DTG5000. [Online]. Available: http://www2.tek.com/cmsreplive/psprep/13587/86W\_16679\_6\_2008.05.15.16.35.38\_13587\_EN.pdf

## Implementing the GBT data transmission protocol in FPGAs

S. Baron<sup>a</sup>, J.P. Cachemiche<sup>b</sup>, F. Marin<sup>b</sup>, P. Moreira<sup>a</sup>, C. Soos<sup>a</sup>

<sup>a</sup> CERN, 1211 Geneva 23, Switzerland, <sup>b</sup>CPPM, 13288 Marseille, France

sophie.baron@cern.ch, cachemi@cppm.in2p3.fr, marin@cppm.in2p3.fr, paulo.moreira@cern.ch, csaba.soos@cern.ch

#### Abstract

The GBT chip [1] is a radiation-tolerant ASIC that can be used to implement bidirectional multipurpose 4.8 Gb/s optical links for high-energy physics experiments. It will be proposed to the LHC experiments for combined transmission of physics data, trigger, timing, fast and slow control and monitoring. Although radiation hardness is required on detectors, it is not necessary for the electronics located in the counting rooms, where the GBT functionality can be realized using Commercial Off-The-Shelf (COTS) components. This paper describes an efficient physical implementation of the GBT protocol on Altera and Xilinx FPGA devices, with source code developed in Verilog and VHDL. The current platforms are based on Altera StratixIIGX and Xilinx Virtex5.

We will start by describing the GBT protocol implementation in detail. We will then focus on practical solutions to make Stratix and Virtex transceivers match the custom encoding scheme chosen for the GBT.

Results will be presented on single-channel occupancy, resource optimization when using several channels in a chip, and bit error rate measurements, with the sole aim of demonstrating the ability of both Altera and Xilinx FPGAs to host such a protocol with excellent performance. Finally, information will be given on how to use the available source code and how to integrate GBT functionality into custom FPGA applications.

#### I. GBT PROTOCOL PRESENTATION

#### A. Introduction

The general architecture of a high-speed optical link implemented using the GBT chipset and FPGA is represented in Figure 1.



Figure 1: GBT optical link implementation scheme

Logically the link provides three "distinct" data paths for: Timing and Trigger, Data Acquisition and the Slow Control. In practice, the three logical paths do not need to be physically different and are merged. The aim of such architecture is to allow a single link to be used simultaneously for data readout, timing and trigger distribution, readout and experiment control. The link establishes a point-to-point optical bidirectional connection (using two optical fibers).

The GBT chipset [2] is under development to match such architecture. It targets high-speed (3.36Gb/s) data transmission between the detectors and the counting room.

As illustrated in Figure 1, such a link is implemented by a combination of custom and Commercial Off-The-Shelf (COTS) components. In the counting room, the receivers and transmitters will be implemented using COTS components and FPGAs while, embedded on the detectors, the receivers and transmitters will be implemented by the GBT chipset and Versatile Link Components [3]. This architecture clearly distinguishes between the counting room and front-end electronics specificities: that is, the on-detector front-end electronics works in a hostile radiation environment requiring custom made components while the counting room electronics operates in a radiation free environment allowing the use of COTS components. Moreover, the availability of FPGAs with up to 48 Hard-IP serializer blocks would allow concentrating data from several front-end sources into a single module in the counting room facilitating data merging and leading to compact systems.

The study presented below will focus on proving the usability of COTS components and FPGAs to implement the GBT protocol in counting rooms [4].

#### B. GBT Protocol

Due to the beam luminosity planned for SLHC, the high speed data transmission link will be exposed to high Single Event Upset rates. SEUs are a major impairment to error free data transmission. To deal with this, the GBT line coding adopts a robust error correction scheme that will allow correction of bursts of errors caused by SEUs. A significant fraction of the channel bandwidth must therefore be assigned to the transmission of a Forward Error Correction (FEC) code.

The code to be used must provide a high level of protection, since errors occurring during transmission can also occur as burst errors and not only as isolated events. Because of this, a double interleaved Reed-Solomon correcting code was chosen. The code is built by first scrambling the input data to provide DC-balancing of the frame, and then interleaving two Reed-Solomon encoded words (using 4-bit symbols), each capable of correcting a double symbol error (Figure 2). The interleaving operation raises the error-correction capability to bursts of up to 4 consecutive symbols.



Figure 2: GBT encoding scheme

This in practice means that a sequence of up to 16 consecutive incorrectly-received bits can be corrected. This correction technique requires an extra field of 32 bits in the frame to protect the 88 transmitted bits (including data, header and slow control), resulting in a code efficiency of 73%.
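The effect of interleaving on burst errors can be checked with a short Python sketch. It assumes that the two codewords alternate symbol by symbol in the frame and that each codeword holds 15 four-bit symbols (which follows from a 120-bit frame split into two codewords); the exact symbol ordering used by the GBT is not given here, and the function names are hypothetical. The burst is taken to be aligned to symbol boundaries.

```python
from typing import List, Sequence, Tuple

SYMBOLS_PER_CODEWORD = 15  # 2 codewords x 15 symbols x 4 bits = 120 bits


def interleave(word_a: Sequence[int], word_b: Sequence[int]) -> List[int]:
    """Alternate the symbols of two codewords: a0, b0, a1, b1, ..."""
    frame: List[int] = []
    for sym_a, sym_b in zip(word_a, word_b):
        frame.extend((sym_a, sym_b))
    return frame


def deinterleave(frame: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Recover the two codewords from the interleaved frame."""
    return list(frame[0::2]), list(frame[1::2])


def burst_hits(burst_start: int, burst_symbols: int,
               frame_symbols: int = 2 * SYMBOLS_PER_CODEWORD) -> Tuple[int, int]:
    """Number of corrupted symbols that land in each codeword when
    `burst_symbols` consecutive frame symbols are corrupted."""
    end = min(burst_start + burst_symbols, frame_symbols)
    hits_a = sum(1 for i in range(burst_start, end) if i % 2 == 0)
    hits_b = sum(1 for i in range(burst_start, end) if i % 2 == 1)
    return hits_a, hits_b


# Worst case over all positions of a 4-symbol (16-bit) burst.
worst = max(max(burst_hits(start, 4)) for start in range(2 * SYMBOLS_PER_CODEWORD))
print(worst)
```

Because a 4-symbol burst never places more than two corrupted symbols in either codeword, each Reed-Solomon decoder stays within its double-symbol correction capability.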

The frame (sketched in Figure 3) is composed of 120 bits that are transmitted during a single SLHC bunch crossing interval (25 ns), resulting in a line data rate of 4.8 Gb/s. Of these, 4 bits are used for the frame Header (H) and 32 for Forward Error Correction (FEC). This leaves a total of 84 bits free for data transmission, corresponding to a user bandwidth of 3.36 Gb/s. Of these 84 bits, 4 are always reserved for the Slow Control (SC) field (see 'Slow control channel') and 80 bits are reserved for data (D) transmission. Of the 4 slow control bits, 2 are reserved for GBT control and 2 are user defined. The use of the 'D' field is not pre-assigned; it can be used equally for Data Acquisition (DAQ), Timing Trigger & Control (TTC) or Experiment Control (EC) applications [5][6].

This makes 2+80 = 82 bits of data available to the user for a frame of 120 bits, giving a payload of 68%.
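The frame budget quoted above can be verified with a few lines of Python. The variable names are chosen here for illustration; all numerical values are taken from the text.

```python
FRAME_BITS = 120          # bits per frame
HEADER_BITS = 4           # frame header (H)
FEC_BITS = 32             # forward error correction field
SC_BITS = 4               # slow control field, 2 of which are user defined
SC_USER_BITS = 2
DATA_BITS = 80            # data field (D)
BUNCH_CROSSING_NS = 25.0  # one frame per bunch crossing

# Bits per nanosecond is numerically equal to Gb/s.
line_rate_gbps = FRAME_BITS / BUNCH_CROSSING_NS

# Bits left once header and FEC are removed (SC + D).
free_bits = FRAME_BITS - HEADER_BITS - FEC_BITS
user_bandwidth_gbps = free_bits / BUNCH_CROSSING_NS

# Fraction of the frame that is protected content rather than FEC.
code_efficiency = (FRAME_BITS - FEC_BITS) / FRAME_BITS

# Fraction of the frame freely available to the user.
payload_fraction = (SC_USER_BITS + DATA_BITS) / FRAME_BITS

print(f"line rate       : {line_rate_gbps:.2f} Gb/s")
print(f"user bandwidth  : {user_bandwidth_gbps:.2f} Gb/s")
print(f"code efficiency : {code_efficiency:.0%}")
print(f"payload         : {payload_fraction:.0%}")
```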



Figure 3: GBT frame

## II. GBT PROTOCOL IMPLEMENTATION IN FPGAS

#### A. FPGAs constraints

As in the GBT ASIC, the DC balance of the data transmitted over the optical fiber is ensured in the FPGA by scrambling the data contained in the SC and D fields. For forward error correction, the scrambled data and header are Reed-Solomon encoded before nibble interleaving and serialization. The line encoding/decoding process is represented in Figure 4.

No problems were encountered in configuring the hard-IP transceivers of the FPGAs, as the portability of this protocol was carefully checked during the specification phase of the GBT. In particular, the ability of Stratix and Virtex transceivers to transmit 120 bits at a frequency of 40 MHz was verified at that time.



Figure 4: Block diagram of a full GBT link in an FPGA

However, these transceivers provide neither specific encoding schemes like the one we selected nor flexible word alignment functions. This is mainly due to the fact that they target the most common telecommunication protocols. We had thus to implement in user logic all the encoding and decoding blocks, as well as a customized pattern detection and word alignment block (see Figure 5).



Figure 5: Frame alignment procedure in FPGAs

At power on or after a loss of synchronization, the receiver starts a frame-lock acquisition cycle to find the frame boundaries, that is, to acquire frame synchronization.

The frame-lock acquisition mode operates as follows. In the StratixIIGX, the transceiver hard-IP word aligner block cannot be bypassed. It is thus configured to lock on an arbitrary pattern. Once completed the process is not repeated, except at power on or upon a command from the pattern detection state machine. For all the other devices, we bypass the word aligner inside the transceiver.

The parallel output of the receiver feeds the custom pattern detection and word aligner blocks, which take control of the frame alignment process: for each received frame the four bits in the header position are checked for header validity. Because the header pattern can also occur in the data, 23 consecutive frames must contain a valid header before the frame is considered locked (the probability of false boundary detection is then reduced below $10^{-20}$ as demonstrated in [5]). Otherwise, the frame is shifted by one bit and the valid header checking procedure is repeated. After frame-lock is achieved, the receiver switches to the frame-tracking mode, which maintains frame synchronization even in the presence of headers corrupted by noise or single event upsets.

The frame-tracking mode must thus be tolerant to a low rate of invalid headers. Provided that frame synchronization is maintained, a corrupted header will not introduce a transmission error, since the header field is also protected by the forward error correction code transmitted with the frame. A corrupted header will thus be corrected and properly identified by the Reed-Solomon decoder. The frame-tracking mode operates as follows: after a successful frame-lock acquisition cycle, the receiver enters the frame-tracking mode, in which it strives to maintain frame synchronization. It checks the validity of the headers and counts the number of invalid headers received in 64 consecutive frames after the first invalid header has been detected. If the number of invalid headers received in 64 consecutive frames is greater than 4, the receiver re-enters the frame-lock acquisition mode. Otherwise the receiver resets the count of invalid headers and remains in the frame-tracking mode.
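The acquisition and tracking rules above can be summarized as a small behavioural model in Python. This is a sketch, not the firmware: the class and attribute names are hypothetical, and it assumes that the 64-frame window opens with the frame carrying the first invalid header and that this frame is included in the count.

```python
class FrameAligner:
    """Behavioural model of frame-lock acquisition and frame tracking."""

    LOCK_THRESHOLD = 23  # consecutive valid headers needed to lock
    WINDOW_FRAMES = 64   # tracking window length, in frames
    MAX_INVALID = 4      # more invalid headers than this drops the lock

    def __init__(self) -> None:
        self.locked = False
        self.valid_run = 0      # consecutive valid headers (acquisition)
        self.window_count = 0   # frames seen in the open tracking window
        self.invalid_count = 0  # invalid headers in the open window
        self.bit_slips = 0      # one-bit shifts requested so far

    def feed(self, header_valid: bool) -> bool:
        """Process the header check of one frame; return the lock state."""
        if self.locked:
            self._track(header_valid)
        else:
            self._acquire(header_valid)
        return self.locked

    def _acquire(self, header_valid: bool) -> None:
        if header_valid:
            self.valid_run += 1
            if self.valid_run == self.LOCK_THRESHOLD:
                self.locked = True
                self.window_count = 0
                self.invalid_count = 0
        else:
            # Wrong boundary: shift the frame by one bit and start again.
            self.valid_run = 0
            self.bit_slips += 1

    def _track(self, header_valid: bool) -> None:
        if self.window_count == 0 and header_valid:
            return  # no window open and nothing to count
        self.window_count += 1
        if not header_valid:
            self.invalid_count += 1
        if self.window_count == self.WINDOW_FRAMES:
            if self.invalid_count > self.MAX_INVALID:
                self.locked = False  # back to frame-lock acquisition
                self.valid_run = 0
            self.window_count = 0
            self.invalid_count = 0


aligner = FrameAligner()
for _ in range(23):
    aligner.feed(True)
print(aligner.locked)
```

A single corrupted header inside a window leaves the model locked, while five invalid headers within 64 frames send it back to acquisition.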

#### B. Resource Usage

The full serializer-deserializer, as described above, was implemented both in a StratixIIGX and in a Virtex5FXT. Besides the transceivers and PLLs, which do not consume any logic resources as they are hard-coded, a single link consumes 1542 ALMs (Adaptive Logic Modules) in the StratixIIGX and 1481 Slices in the Virtex5.

Tables 1 and 2 show the number of links that can be implemented in a selection of StratixIIGX and Virtex5FXT devices, taking into account the available transceiver blocks and logic elements.

| Max usable / available channels | Altera Stratix II GX device | Logic cell usage |
|---------------------------------|-----------------------------|------------------|
| 8/8                                      | EP2SGX30D               | 92%                       |
| 12/12                                    | EP2SGX60D               | 78%                       |
| 16/16                                    | EP2SGX90E               | 69%                       |
| 20/20                                    | EP2SGX130G              | 59%                       |
| 24/24                                    |                         |                           |

Table 1: Maximum GBT links for StratixIIGX

| Max usable / available channels | Xilinx Virtex 5 device | Logic cell usage |
|---------------------------------|------------------------|------------------|
| 3/8                                      | XC5VFX30T          | 87%                       |
|                                          |                    |                           |
| 10/16                                    | XC5VFX100T         | 93%                       |
| 13/20                                    | XC5VFX130T         | 94%                       |
| 20/24                                    | XC5VFX200T         | 96%                       |

Table 2: Maximum GBT links for Virtex5FXT

The differences in occupancy between Table 1 and Table 2 highlight the different policies adopted by Altera and Xilinx regarding the ratio between the number of logic cells and the number of transceivers. However, these numbers should be used with care. The logic cell occupancy is clearly too high if one tries to use all the available transceivers of a chip for the GBT protocol. This is tempered by the fact that a design using GBT links will not dedicate all its transceivers to them: some must be left to output the processed data, and the occupancy will therefore be lower. Nevertheless, as a back-end FPGA has to dedicate a significant part of its logic to other tasks, optimizing the resources used by the decoding block is a must.

#### C. Optimization

An analysis of the resource usage per block for a single link (see Figure 6) quickly shows that more than half of the logic elements are used by the Reed-Solomon decoder.



Figure 6: % of ALMs/Slices of one GBT link used by each functional block

It was thus natural to study optimization schemes, particularly for designs hosting several GBT links in one device. The first possibility is to share one decoder block between several links, multiplying its operating frequency by the same factor. The Reed-Solomon decoder is a large combinatorial circuit, and the maximum operating frequency achieved was 134 MHz for the StratixIIGX with all the available timing optimization constraints applied. This made it possible to share one decoder block between 3 links.
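The sharing factor can be checked with a short calculation. The sketch below assumes (this is not stated explicitly in the text) that the shared decoder handles one frame per clock cycle and that every link delivers frames at the 40 MHz rate quoted in the measurement section; the function names are illustrative only.

```python
import math

FRAME_RATE_MHZ = 40.0     # assumed frame rate of one GBT link
DECODER_FMAX_MHZ = 134.0  # maximum decoder frequency quoted for the StratixIIGX


def max_links_per_decoder(fmax_mhz: float,
                          frame_rate_mhz: float = FRAME_RATE_MHZ) -> int:
    """Largest number of links one decoder can serve in time multiplex."""
    return math.floor(fmax_mhz / frame_rate_mhz)


def decoders_needed(n_links: int, links_per_decoder: int) -> int:
    """Number of decoder blocks required for n_links links."""
    return math.ceil(n_links / links_per_decoder)


share = max_links_per_decoder(DECODER_FMAX_MHZ)  # 134 / 40 = 3.35, so 3 links
blocks = decoders_needed(12, share)              # 12 links need 4 shared decoders
```

Under these assumptions a factor of 4 would require 160 MHz, above the 134 MHz achieved, which is consistent with the factor of 3 reported.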

An analysis of the resources used for 12 links implemented in a StratixIIGX type EP2SGX90 was carried out with and without optimization.



Figure 7: Effect of optimization by 3 on 12 links implemented on a EP2SGX90

As shown in Figure 7, the device occupancy dropped from 51% of the ALMs to 40% thanks to the optimization. The fraction of the resources used by the decoder blocks dropped from 28% to 10%, but 7% of new logic elements were added by the resource-consuming multiplexers and de-multiplexers required to share the decoder.

This implementation was tested on a PCIe SIIGX development kit with three optimized links using loopback cables mounted on the HSMC connectors. It ran several days without a single error being detected.

The next step for optimization could be to pipeline the decoder algorithm to increase the clock frequency. The drawback of this implementation, besides its complexity, is that it increases the decoding latency.

#### **III. MEASUREMENTS**

#### A. Setups and equipment

Two evaluation boards were used to implement the GBT protocol on FPGAs: the ML523 (hosting a Virtex5FXT type XC5VFX100T) for Xilinx [8] and the PCIe SIIGX Development Kit (hosting a StratixIIGX type EP2SGX90) for Altera [7]. Both were powered by the power supply provided in the kit (see Figure 8).



Figure 8: Evaluation platforms. ML523 from Xilinx (left) and PCIe SIIGX from Altera (right)

The reference clock was generated by a J-BERT N4903A from Agilent and distributed over differential SMA cables.

For all the qualitative measurements, the very same SFP+ 1300nm optical transceiver module from MergeOptics was used (mounted and dismounted from one board to another). The optical patch cords were 50cm long.

The jitter measurements were made at the optical receiver level with a LeCroy SDA100G sampling scope equipped with a 10 GHz optical sampling head.

#### B. Platform testing

Various platforms and technologies were tested by implementing the GBT protocol in both the Altera and Xilinx chips presented above. As shown in Figure 9, a generator instantiated in the Virtex5 sent parallel data (80 bits @ 40 MHz, either constant words or flying bits) to the encoder and serializer.



Figure 9: Test setup based on two platforms

The signal (which looks like a PRBS because of the scrambling) was transmitted by an SFP+ to the receiver in the StratixII over a short optical fibre (A). After full decoding (and remote monitoring of the decoded values), the data were re-encoded, serialized again and transmitted using another SFP+ module and an optical fibre (B) back to the Virtex5, where they were decoded and compared to the generated words.

The system ran for several hours without a single error being counted. Besides giving us the opportunity to implement the GBT protocol on both main technologies, this test allowed us to check the compatibility between the GBT-ASIC protocol and its VHDL translation: the Virtex5 had the Reed-Solomon encoder and decoder implemented in Verilog (a direct copy of the GBT protocol implementation in the ASIC), whereas the StratixII encoder and decoder were implemented in VHDL.

#### C. Jitter performance

Using the same setup, we measured the jitter out of the two optical fibres A and B in Figure 9. For each of the results below, the SFP+ module transmitting the optical signal was the same (it was successively mounted on A and B fibres to test Xilinx and Altera devices).

As presented in Figure 10, the Xilinx and Altera platforms both showed excellent performance. The eyes were wide open, and the total jitter was of the order of 80 ps peak-to-peak and 5 ps RMS.



Figure 10: Eye diagrams for Xilinx Virtex5 FXT (left) and Altera StratixIIGX (right)

#### IV. SOURCE CODE AVAILABILITY

Reference designs of the GBT protocol will be made available before the end of 2009 for both Altera and Xilinx FPGAs. They will be presented as a firmware-based starter kit, downloadable on request via the CERN SVN repository. This starter kit will include the source code for both implementations, and, as much as possible, for various types of devices (StratixII and IV GX, and Virtex5 and 6 FXT) and various flavors of optimization. It will also include documentation.

Basic support will be provided on how to use and optimize the implementation.

#### V. CONCLUSION

With this study, we proved that the GBT protocol can indeed be implemented with success both in Altera and Xilinx FPGA chips. The scheme proposed in the introduction where GBT ASICs are used in detector areas and FPGAs in counting rooms is thus a valid prospect, and the developed code will now be used as a basis to test the GBT serdes chip once it becomes available.

A firmware-based starter kit will be made available upon request to the users. It will be progressively completed by several implementation flavors for StratixIV and Virtex6, and new optimization techniques like a pipelined Reed-Solomon decoder are being considered.

#### VI. REFERENCES

[1] GBT project home page: https://espace.cern.ch/GBT-Project

[2] P. Moreira, GBTx specifications: https://espace.cern.ch/GBT-Project/GBTX/Specifications/gbtxSpecsV1.2.pdf

[3] F. Vasey, "Versatile Link", ACES 2009 workshop, 3-4 March 2009, CERN, Geneva: http://indico.cern.ch/contributionDisplay.py?contribId=37&sessionId=22&confId=47853

[4] GBT-FPGA project web site: https://espace.cern.ch/GBT-Project/GBT-FPGA

[5] G. Papotti, "Architectural studies of a radiation-hard transceiver ASIC in 0.13 µm CMOS for digital optical links in high energy physics applications", PhD thesis, University of Parma, Italy, January 2007: http://papotti.web.cern.ch/papotti/tesi.pdf

[6] G. Papotti, "An Error-Correcting Line Code for a HEP Rad-Hard Multi-GigaBit Optical Link", 12<sup>th</sup> Workshop for LHC and future Experiments (LECC 2006), Valencia, Spain, 25-29 September 2006, pp.258-262: http://indico.cern.ch/contributionDisplay.py?contribId=30&sessionId=19&confId=574

[7] Documentation on Altera PCI express Development Kit, StratixIIGX Edition: http://www.altera.com/products/devkits/altera/kitpciexpress\_s2gx.html

[8] Documentation on Xilinx Virtex5 FXT ML523 RocketIO GTX characterization Platform: http://www.xilinx.com/products/devkits/HW-V5-ML52X-UNI-G.htm

## FPGA-based Bit-Error-Rate Tester for SEU-hardened Optical Links

S. Detraz<sup>a</sup>, S. Silva<sup>a</sup>, P. Moreira<sup>a</sup>, S. Papadopoulos<sup>a</sup>, I. Papakonstantinou<sup>a</sup>

S. Seif El Nasr<sup>a</sup>, C. Sigaud<sup>a</sup>, C. Soos<sup>a</sup>, P. Stejskal<sup>a</sup>, J. Troska<sup>a</sup>, H. Versmissen<sup>a</sup>

<sup>a</sup> CERN, 1211 Geneva 23, Switzerland

csaba.soos@cern.ch

## Abstract

The next generation of optical links for future High-Energy Physics experiments will require components qualified for use in radiation-hard environments. To cope with radiation induced single-event upsets, the physical layer protocol will include Forward Error Correction (FEC). Bit-Error-Rate (BER) testing is a widely used method to characterize digital transmission systems. In order to measure the BER with and without the proposed FEC, simultaneously on several devices, a multi-channel BER tester has been developed. This paper describes the architecture of the tester, its implementation in a Xilinx Virtex-5 FPGA device and discusses the experimental results.

#### I. INTRODUCTION

High-speed optical links offer many advantages, which make them an attractive choice for today's communication systems. In order to reach the multi-gigabit domain, these systems have to fulfill many stringent requirements (e.g. low jitter, low noise), which is a very challenging task for both component manufacturers and system designers. In addition, using these links in future High-Energy Physics (HEP) experiments at CERN's upgraded Large Hadron Collider (super LHC or SLHC) requires special care during component selection, testing and verification. The selected components will be required to operate at 5 Gbit/s and beyond (up to 10 Gbit/s), with low power dissipation, in a high-radiation environment [1].

To address these challenges, a radiation hard optical link is being developed by CERN and collaborating institutes. The work is shared between two sub-projects: the GigaBit Transceiver (GBT) project [2] is responsible for the design of radiation-hard ASICs and the implementation of the custom physical layer protocol in FPGA devices [3]; while the Versatile Link (VL) project [4] covers the system architectures and the required link components. The proposed system architecture is shown in Figure 1.

#### II. COMPONENT TESTING

In order to qualify components for the next generation of radiation hard optical links, their performance must be evaluated in the laboratory and in a radiation environment. Laboratory evaluation based on eye diagram measurements has been implemented by our group [5] [1]. It proposes a method for a visual comparison of the different modules that provides a good insight into the performance of the transceivers. However, eye diagram measurements cannot easily be used for Single-Event Upset (SEU) tests where rarely occurring events must be captured.

#### A. Bit-error-rate testing

The bit error rate (BER) is an important characteristic of a digital communication system. During a BER test, a known bit sequence is transmitted through the system. At the output the received bits are compared with the expected ones. The BER can be calculated using the following simple equation.

$$BER = \frac{\text{number of bit errors}}{\text{total number of bits}} \tag{1}$$



Figure 1: Radiation hard optical link architecture

Although this equation is very simple, the exact BER can be obtained only if the denominator approaches infinity. Since this requirement cannot be met in practice, the BER is usually measured within a so-called confidence interval (CI), whose width is defined by the confidence level (CL). Assuming that the errors in the system are due to random noise, the time (T) required to reach the target BER can be calculated using the following equation [6] [7],

$$T = -\frac{\ln(1 - CL)}{BER \cdot R} + \frac{\ln\left(\sum_{k=0}^{N} \frac{(n \cdot BER)^{k}}{k!}\right)}{BER \cdot R} \tag{2}$$

where R is the line rate, n is the total number of bits transmitted, and N is the number of errors that occurred during the transmission. This equation represents a trade-off between test time and confidence level. When N = 0 (i.e. error-free transmission), the sum reduces to 1 and the solution of Equation 2 is trivial: $T = -\ln(1 - CL) / (BER \cdot R)$. The result is shown in Figure 2.



Figure 2: Time required to reach  $10^{-12}$  BER vs. line rate



Figure 3: Illustration of the effect of SEUs on a photodiode

The test time can be reduced by stressing the system [8]. Accelerated BER testing is based on the assumption that the errors in the system are caused by Gaussian noise. Reducing the signal level while keeping the noise constant reduces the signal-to-noise ratio (SNR), which in turn increases the error rate and leads to a shorter test. In the presence of radiation, however, there is a region where the error rate is dominated by the SEUs (see Figure 3) [9].
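The effect of stressing the system can be illustrated with the standard Gaussian-noise relation between the Q factor (signal-to-noise amplitude ratio at the decision point) and the error rate, BER = 0.5·erfc(Q/√2). This relation is textbook material rather than something stated in the paper, and the helper names below are hypothetical.

```python
import math


def ber_from_q(q: float) -> float:
    """BER of a binary link limited by Gaussian noise, for Q factor q."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def ber_after_scaling(q: float, signal_scale: float) -> float:
    """BER after scaling the signal amplitude while the noise stays constant.

    With constant noise, Q scales linearly with the signal amplitude.
    """
    return ber_from_q(q * signal_scale)


nominal = ber_from_q(7.034)               # close to 1e-12
stressed = ber_after_scaling(7.034, 0.5)  # halved signal, many orders higher
```

Because the error rate rises so steeply as Q falls, a modest reduction in signal level brings the BER into a range that can be measured in seconds, which is the basis of the accelerated method. The model no longer applies in the region where SEUs dominate.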

#### B. Custom BERT

Measuring several transceiver components in a radiation environment sequentially is not practical. A multi-channel BERT can greatly improve the overall run time and simplify the procedure. In addition, by implementing the custom physical layer protocol proposed by the GBT project, the custom BERT will be able to show the performance of the applied FEC during SEU tests. Finally, the addition of an error logging facility will help us to better understand the error propagation mechanisms in the overall system.

#### **III. IMPLEMENTATION**

The BERT is implemented on an ML523 Transceiver Characterization Platform from Xilinx (see Figure 4) [10]. The board features a Virtex-5 FPGA (XC5VFX100T) that supports up to 16 high-speed transceivers each operating at up to 6.5 Gbit/s speed. In addition, the board contains 128 MB DDR2 memory connected to the FPGA device. The transceivers as well as the clock resources are accessible through high-quality SMA connectors. For low-speed communication with the board, there is a standard serial port (RS232) available on the board. Device programming and debugging can be done using the JTAG interface. The firmware running on the platform and the software controlling the operation are detailed in the next sections.



Figure 4: ML523 transceiver characterization platform from Xilinx

#### A. Firmware

The firmware design is based on the System-on-Chip (SoC) concept. The architecture is shown in Figure 5. The system is built around one of the two embedded processor blocks available in the Virtex-5 FPGA. The processor block contains a PowerPC 440 processor, crossbar and its interfaces. The crossbar can be connected to both master and slave peripherals in the system using Processor Local Bus (PLB) interfaces. In this design we use two PLBs to improve overall system performance.

Certain peripherals can also be provided with access to external memory via the crossbar. The communication between the control software and the processor is established using the standard UART peripheral, while the BERT-specific functions are included in the BERT core. The latter is detailed in the following paragraphs.



Figure 5: Firmware architecture

The BERT core is a custom peripheral with slave and master PLB interfaces and high-speed serial terminals. The slave interface gives access to several control and status registers. The master interface is used to transfer to the external memory the messages that need to be recorded during the measurement. The high-speed ports are connected to the external device under test. Inside the module, there are two separate data paths (see Figure 6). The transmitter path contains a Generator which produces simple test patterns. The data are encoded into GBT-compatible frames by the GBT Encoder. These frames include the FEC bits, which allow the receiver to correct errors that may occur during the transmission. For debugging purposes, errors can be injected into the transmitted data path at a programmable rate. The frames are converted to a high-speed serial stream by the multi-gigabit transceiver (MGT). Upon reception, the MGT receiver converts the serial bit stream into words, which in turn are processed by the GBT Decoder. The decoder corrects the errors using the redundancy field in each frame.



Figure 6: The firmware architecture of the BERT core

Besides the main data paths, the BERT core contains two feedback paths from the transmitter to the receiver carrying data from the pattern generator and from the GBT Encoder. To compensate for the latency that occurs during the transmission, these paths are routed through delay lines which are adjusted dynamically. In the receiver, the data from the generator and from the encoder are compared with received data available before and after the GBT decoder respectively. The differences are accumulated by counters and the values are used to calculate the line and system error rates. Using these two values, the performance of the FEC can be measured.
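The two comparisons can be modelled in software as follows. This is a behavioural sketch only, not the firmware: the class `BerCounter`, its method names and the word widths passed to it are assumptions, and the delay-line alignment is taken as already done.

```python
def count_bit_errors(expected: int, received: int) -> int:
    """Number of differing bits between two words (XOR, then popcount)."""
    return bin(expected ^ received).count("1")


class BerCounter:
    """Accumulates compared bits and bit errors for one comparison point."""

    def __init__(self, word_bits: int) -> None:
        self.word_bits = word_bits
        self.bits = 0
        self.errors = 0

    def compare(self, expected: int, received: int) -> None:
        """Compare one aligned pair of words and update the counters."""
        self.bits += self.word_bits
        self.errors += count_bit_errors(expected, received)

    @property
    def ber(self) -> float:
        """Error rate so far (0.0 before any word has been compared)."""
        return self.errors / self.bits if self.bits else 0.0
```

One instance fed with the encoder output and the words received before the decoder gives the line BER; a second instance fed with the generator output and the decoded words gives the system BER.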

#### B. Software

The proper functioning of the BERT is ensured by a piece of software running on the embedded processor and a LabVIEW script which is executed on the host computer. The former is responsible for the communication between the firmware and the PC, while the latter controls the measurement and provides a graphical user interface (GUI).

The embedded software is a simple command interpreter. The commands, in the form of strings, are sent from the LabVIEW script and received by the UART. The results are sent back through the serial port following the execution of the commands. The interpreter supports register read and write, as well as more complex sequences such as the initialization. It can easily be extended or modified if new functions are needed.
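A string-based interpreter of this kind can be sketched as a dispatch table. The example below is written in Python for illustration, whereas the real interpreter runs on the embedded PowerPC; the command names (`rd`, `wr`, `init`), the hexadecimal argument format and the reply strings are invented for this sketch and do not describe the actual command set.

```python
class CommandInterpreter:
    """Hypothetical model of a register-oriented command interpreter."""

    def __init__(self) -> None:
        self.registers = {}  # address -> 32-bit value
        # New functions are added by registering another handler here.
        self.handlers = {
            "rd": self._read,
            "wr": self._write,
            "init": self._init,
        }

    def execute(self, line: str) -> str:
        """Parse one command string and return the reply string."""
        parts = line.split()
        if not parts:
            return "ERR empty command"
        handler = self.handlers.get(parts[0].lower())
        if handler is None:
            return f"ERR unknown command {parts[0]}"
        try:
            return handler(parts[1:])
        except (ValueError, IndexError):
            return "ERR bad arguments"

    def _read(self, args: list) -> str:
        address = int(args[0], 16)
        return f"OK {self.registers.get(address, 0):08x}"

    def _write(self, args: list) -> str:
        address = int(args[0], 16)
        value = int(args[1], 16)
        self.registers[address] = value & 0xFFFFFFFF
        return "OK"

    def _init(self, args: list) -> str:
        # Stand-in for a longer initialization sequence.
        self.registers.clear()
        return "OK"
```

The dispatch table keeps the parsing code unchanged when commands are added, which matches the extensibility mentioned in the text.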

The LabVIEW script is organized in two nested loops (see Figure 7). The outer loop controls the instruments (e.g. the optical attenuator) and initializes the tester. Following initialization, the script verifies the link status of the selected channels and masks inactive channels. The inner loop reads the counters of the active channels and checks whether the stop criteria are met. The values are recorded before the outer loop is restarted. The measurement is finished when a preset target BER is reached on all the active channels.
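The nested-loop flow can be expressed in a few lines of Python. This is a sketch of the control logic only: the `tester` object and its methods (`set_attenuation`, `initialize`, `channels`, `link_up`, `read_counters`) are a hypothetical interface, and the stop criteria (a maximum number of bits or of errors per channel) are assumed, since the text does not specify them.

```python
def run_scan(tester, attenuations_db, max_bits, max_errors, target_ber):
    """Run a BER scan; return a list of (attenuation, {channel: BER})."""
    records = []
    for attenuation in attenuations_db:
        # Outer loop: set up the instruments and initialize the tester.
        tester.set_attenuation(attenuation)
        tester.initialize()
        # Mask the channels whose link is not up.
        active = [ch for ch in tester.channels() if tester.link_up(ch)]

        # Inner loop: poll the counters until every active channel
        # meets a stop criterion.
        while True:
            counters = {ch: tester.read_counters(ch) for ch in active}
            if all(bits >= max_bits or errors >= max_errors
                   for bits, errors in counters.values()):
                break

        # Record the values before the outer loop restarts.
        point = {ch: (errors / bits if bits else float("nan"))
                 for ch, (bits, errors) in counters.items()}
        records.append((attenuation, point))

        # Finish when the target BER is reached on all active channels.
        if all(ber <= target_ber for ber in point.values()):
            break
    return records
```

Here `read_counters(ch)` is expected to return a `(bits, errors)` pair for the channel; a real script would also pause between polls and log the raw counters.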



Figure 7: Measurement flow



Figure 8: Lab test setup

#### **IV. MEASUREMENTS**

#### A. Test setup

The measurements in the lab are carried out with the setup described hereafter. The reference clock is generated by a high-precision clock source. The electrical interface of the optical transceiver is connected to one of the available high-speed channels on the BERT. The optical output from the transceiver is fed through an attenuator followed by a splitter. One splitter branch is used to close the optical loop while the other is connected to the optical power meter for monitoring purposes. The test instruments, as well as the BERT, are controlled by a LabVIEW script executed on a host PC (see Figure 8).

#### B. Results

Several single-mode (SM) and multi-mode (MM) transceiver modules were tested in the laboratory using the BERT. In some cases the package shielding was partially removed from the module by the manufacturer in order to reduce the mass sufficiently to allow the modules to be used inside a detector. The modules tested are summarized in Table 1.

Table 1: Summary table of the tested optical transceivers

| Type        | Laser | Package |
|-------------|-------|---------|
| Single-mode | VCSEL | closed  |
| Single-mode | DFB   | closed  |
| Single-mode | DFB   | open    |
| Multi-mode  | VCSEL | closed  |
| Multi-mode  | VCSEL | open    |

A scan of the line BER was carried out on all the devices in order to compare their performance. The BER curves measured on the SM modules are shown in Figure 9. The results show no large differences between the transceivers, which is a promising preliminary indication of the impact of the reduced shielding, ahead of the detailed EMI tests.



Figure 9: Test results of the single-mode modules



Figure 10: System and line BER



Figure 11: System and line BER, with errors injected

The coding gain can be defined as the difference between the transmit power required to send and receive error-free data without FEC and the transmit power required when FEC is used. It is usually expressed in decibels (dB). To measure the coding gain of the FEC used in the GBT protocol, the line and system BER values recorded during the tests can be used. The two curves in Figure 10 show an example. According to these results, the FEC implemented in the GBT protocol provides a coding gain of approximately 2.5 dB.
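One way to extract a coding gain from two measured curves is to compare the optical power each curve needs to reach the same BER. The sketch below interpolates linearly in log10(BER) between measured points; this interpolation scheme and the function names are assumptions made for illustration, and the paper does not say how the 2.5 dB figure was read from Figure 10.

```python
import math


def power_at_ber(points, target_ber):
    """Optical power (dBm) at which a curve crosses target_ber.

    `points` is a list of (power_dbm, ber) pairs sorted by increasing
    power, with the BER decreasing as the power increases.
    """
    target = math.log10(target_ber)
    for (p0, b0), (p1, b1) in zip(points, points[1:]):
        l0, l1 = math.log10(b0), math.log10(b1)
        if l1 <= target <= l0:
            if l0 == l1:
                return p0
            # Linear interpolation in log10(BER) between the two points.
            return p0 + (p1 - p0) * (target - l0) / (l1 - l0)
    raise ValueError("target BER is outside the measured range")


def coding_gain_db(line_curve, system_curve, target_ber):
    """Power saved by the FEC at target_ber, in dB."""
    return (power_at_ber(line_curve, target_ber)
            - power_at_ber(system_curve, target_ber))
```

Since both powers are in dBm, their difference is directly the gain in dB. The result depends on the chosen target BER when the two curves are not parallel, so the target should be quoted with the gain.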

To demonstrate the error-correcting capability of the physical layer protocol, one more measurement was made. During this test, the BERT was configured to inject bursts of errors into the encoded data, as explained in Section III.A. The result (Figure 11) shows that, as expected, the line error rate levels off at a floor set by the injected errors. However, since the burst length does not exceed the correction capability of the decoder, the errors are corrected in the receiver and the system BER continues to fall as the optical power is increased.

#### V. CONCLUSION

Optical transceiver components will be tested to verify their compliance with the requirements of next generation radiation hard optical links in High-Energy Physics experiments. In order to quantify the effects of radiation, the components will be irradiated and the impact of the SEU on the BER will be investigated.

A multi-channel BER tester supporting the measurement of several components simultaneously has been developed. The BER tester operates at multiple data rates up to a maximum of 6.5 Gbit/s.

The described BER tester was fully verified in the laboratory. It was used to measure the performance of commercial transceivers and to study the impact of different packaging solutions on the BER. In addition, by calculating the BER both before and after the error correction, the tool allowed us to evaluate the performance of FEC implemented in the GBT protocol.

#### REFERENCES

- [1] J. Troska et al., "The Versatile Transceiver Proof of Concept", these proceedings
- [2] P. Moreira et al., "The GBT project", these proceedings
- [3] S. Baron, J. P. Cachemiche, F. Marin, P. Moreira, C. Soos, "Implementing the GBT data transmission protocol in FPGAs", these proceedings
- [4] F. Vasey et al., "The Versatile Link Common Project", V4.1, 20/3/2009, submitted to JINST
- [5] L. Amaral, J. Troska, A. J. Pacheco, S. Dris, D. Ricci, C. Sigaud, F. Vasey, "Evaluation of Multi-Gbps Optical Transceivers for Use in Future HEP Experiments", *Proc. of the Topical Workshop on Electronics for Particle Physics*, pp.161-166 (2008), CERN-2008-008
- [6] Maxim, "Statistical Confidence Levels for Estimating Error Probability", Maxim Engineering Journal (Vol. 37), pp. 12 (2000), http://pdfserv.maxim-ic.com/en/ej/EJ37.pdf
- [7] M. A. Kossel and M. L. Schmatz, "Jitter Measurements of High-Speed Serial Links", *IEEE Design and Test of Computers (Vol. 21 No. 6)*, pp.536-543 (2004)
- [8] D. H. Wolaver, "Measure Error Rates Quickly and Accurately", *Electronic Design*, pp.89-98 (1995)
- [9] J. Troska, A. J. Pacheco, L. Amaral, S. Dris, D. Ricci, C. Sigaud, F. Vasey and P. Vichoudis, "Single-Event Upsets in Photodiodes for Multi-Gb/s Data Transmission", *Proc. Topical Workshop on Electronics for Particle Physics*, pp.161-166 (2008), CERN-2008-008
- [10] Xilinx, "Virtex-5 LXT/FXT FPGA ML52x RocketIO GTP/GTX Characterization Platforms". Available online: http://www.xilinx.com/products/devkits/HW-V5-ML52X-UNI-G.htm

# LIST OF PARTICIPANTS

## A

ABELLAN Carlos La Salle. Universitat Ramón Llull Ribagorza s/n 08022 Barcelona SPAIN

ABI Babak Oklahoma State University 145, PS II Bldg. 74078-3072 Stillwater UNITED STATES

ABOVYAN Sergey Max-Planck-Institute for Physics, Munich Foehringer Ring 6 80805 Munich GERMANY

AGLIERI RINELLA Gianluca CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

ALESSIO Federico CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

ALOISIO Alberto University and INFN, Napoli Via Cintia 80126 Napoli ITALY

ANGHINOLFI Francis CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

ANTONA Jean-Christophe ALCATEL-LUCENT Route de Villejust 91620 Nozay FRANCE

ARMBRUSTER Tim Heidelberg University B6, 26 68131 Mannheim GERMANY

ARNABOLDI Claudio INFN Milano Bicocca P.zza della scienza, 3 20126 Milan ITALY

ARTECHE GONZÁLEZ Fernando Instituto Tecnologico de Aragon CL Maria de Luna, 7-8 50018 Zaragoza SPAIN

ATINGABUNOR Amoah George Khomanani Business College 155 Commissioner Street Ilpa House, 4th Floor 2001 Johannesburg SOUTH AFRICA

AUGÉ Etienne CNRS/IN2P3 23 rue Michel-Ange 75016 Paris FRANCE

## B

**BARON** Sophie CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

BASCHIROTTO Andrea University of Milano-Bicocca Corso Alessandria 166 15057 Tortona ITALY

BEAUMONT Wim Universiteit Antwerpen Groenerborgerlaan 171 2020 Antwerpen BELGIUM

BECHETOILLE Edouard CNRS/IN2P3/IPNL 4 rue Enrico Fermi 69622 Villeurbanne FRANCE

BEIGBEDER Christophe CNRS/IN2P3/LAL BP 34 91898 Orsay FRANCE

BERTOLUCCI Sergio CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

BHATTACHARYYA Shuvra University of Maryland 2311 A. V. Williams Bldg. 20742 College Park UNITED STATES

**BIALAS** Wojciech CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

BLANCHOT Georges CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

**BOCHENEK** Michal CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

BOEK Jennifer Bergische Universitaet Wuppertal Gaußstr. 20 42119 Wuppertal GERMANY

**BOHM** Christian University of Stockholm AlbaNova University Center 106 91 Stockholm SWEDEN

BOHNER Gérard CNRS/IN2P3/LPC Clermont 24 rue des Landais 63177 Aubière FRANCE

BONACINI Sandro CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

BORCHERDING Fred CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

BORTOLIN Claudio University of Udine & INFN Padova Via Prepier 20 31040 Chiarano ITALY

BREKKE Njål University of Bergen Lyshovden 340 5148 Fyllingsdalen NORWAY

BRETON Dominique CNRS/IN2P3/LAL Bât. 200 91898 Orsay FRANCE

BUYTAERT Jan CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

## C

CACHEMICHE Jean-Pierre CNRS/IN2P3/CPPM 163, avenue de Luminy - case 902 13288 Marseille FRANCE

**CALLIER** Stephane CNRS/IN2P3/LAL/OMEGA Université Paris Sud, Bât .200 91898 Orsay FRANCE

CAPONETTO Luigi CNRS/IN2P3/CPPM-INFN 163, avenue de Luminy 13288 Marseille FRANCE

CASELLE Michele CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

CAVICCHIOLI Costanza CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

**CHANAL** Hervé CNRS/IN2P3/LPC Clermont 24 avenue des Landais 63177 Aubière FRANCE

CHARLET Daniel CNRS/IN2P3/LAL Bât. 200 91898 Orsay cedex FRANCE

CHECCUCCI Bruno INFN Perugia Italy Via P. Mesastris, 30 06034 Foligno ITALY

CHRISTIANSEN Jorgen CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

**CLÉMENS** Jean-Claude CNRS/IN2P3/CPPM 163 avenue de Luminy 13288 Marseille FRANCE

**COATH** Rebecca STFC - Rutherford Appleton Laboratory R76 1st Floor OX11 0QX Didcot UNITED KINGDOM

**COBANOGLU** Ozgur CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

COLLAZUOL Gianmaria Scuola Normale Superiore - Pisa Piazza dei Cavalieri 7 56126 Pisa ITALY

COLLEDANI Claude IPHC 23 rue du Loess 67037 Strasbourg FRANCE

CONFORTI DI LORENZO Selma CNRS/IN2P3/LAL/OMEGA Bât. 200 - BP34 91898 Orsay FRANCE

**COSTA** Filippo CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

**CRAMPON** Sébastien CNRS/IN2P3/LPC Clermont 24, avenue des Landais 63177 Aubière FRANCE

## D

DABROWSKI Wladyslaw AGH -UST Krakow Al. Mickiewicza 30 30-059 Krakow POLAND

**DE LA TAILLE** Christophe CNRS/IN2P3/LAL/OMEGA Bât. 200 Centre Universitaire 91898 Orsay FRANCE

DE PEDIS Daniele INFN - Roma1 P. le Aldo Moro 2 00185 Roma ITALY

DELAGNES Eric CEA/IRFU CEA Saclay bat 141 91191 Gif sur Yvette Cedex FRANCE

DELORD Vincent CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

DHAWAN Satish Yale University P.O. Box 208120 06520-8120 New Haven UNITED STATES

**DI MAIO** Gianni CAEN SpA Via Vetraia, 11 55049 Viareggio ITALY

**DÍEZ** Sergio Microelectrónica Inst. Barcelona IMB-CNM IMB-CNM, Campus UAB 08193 Bellaterra SPAIN

**DOPKE** Jens Bergische Universitaet Wuppertal Gaussstr. 20 42119 Wuppertal GERMANY

**DULINSKI** Wojciech CNRS/IN2P3/IPHC 23 rue du Loess 67037 Strasbourg FRANCE

**DWUZNIK** Michal AGH -UST Krakow Kazimierzowskie 18/11 31-841 Krakow POLAND

**DZAHINI** Daniel CNRS/IN2P3/LPSC 53 avenue des Martyrs 38026 Grenoble FRANCE

## E

**ESTEBAN LALLANA** Cristina Instituto Tecnologico de Aragon CL MARÍA DE LUNA, 7-8 50018 Zaragoza SPAIN

## F

FACCIO Federico CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

FARTHOUAT Philippe CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

FELD Lutz RWTH Aachen University Sommerfeldstr. 14 52074 Aachen GERMANY

**FERNÁNDEZ** Cristina CIEMAT Avda. Complutense, 22 28040 Madrid SPAIN

FERRO-LUZZI Massimiliano CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

FOUDAS Costas Imperial College London Prince Consort Road SW7 2BW London UNITED KINGDOM

FOUQUE Nadia CNRS/IN2P3/LAPP BP110 chemin de Bellevue 74941 Annecy le vieux FRANCE

**FRANÇA** Hugo CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

FRIEDL Markus HEPHY Vienna Nikolsdorfergasse 18 1050 Vienna AUSTRIA

FU Yunan CNRS/IN2P3/IPHC 23 rue du Loess 67037 Strasbourg FRANCE

FUKUNAGA Chikara Tokyo Metropolitan University 1-1 Minami-Osawa 192-0397 Hachioji JAPAN

## G

GABRIELLI Alessandro INFN & University of Bologna Viale Berti Pichat 6/2 40127 Bologna ITALY

**GAGLIONE** Renaud CNRS/IN2P3/LAPP LAPP 9 chemin de Bellevue BP110 74941 Annecy le vieux FRANCE

GALLIN-MARTEL Laurent CNRS/IN2P3/LPSC 53 rue des Martyrs 38026 Grenoble FRANCE

GAN K.K. The Ohio State University 191 W Woodruff Ave 43210 Columbus UNITED STATES

**GARNIER** Jean-Christophe CERN Route de Meyrin CH-1211 Geneva 23 SWITZERLAND

**GENAT** Jean-Francois University of Chicago 5640 S. Ellis Av. 60615 Chicago UNITED STATES

**GENSOLEN** Fabrice CNRS/IN2P3/CPPM 163 av de Luminy 13288 Marseille FRANCE

GHARBI Mohammed Blue Now 13 rue Paul Langevin 93270 Sevran FRANCE

**GIGI** Dominique CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

GILL Karl CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

GIORDANO Raffaele Universita' di Napoli 'Federico II' and INFN Via Cintia, Universita' Monte S. Angelo Edificio G 80126 Napoli ITALY

GIPPER Jerry Embedify LLC 929 W. Portobello Avenue 85210 Mesa UNITED STATES

GONG Datao Southern Methodist University 6520 Gold In 75023 Plano UNITED STATES

GONIDEC Allain CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

GOUSIOU Evangelia CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

GRABAS Herve University of Chicago 5640 S. Ellis Av. 60615 Chicago UNITED STATES

GRASSI Tullio FNAL/CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

**GREENALL** Ashley The University of Liverpool Department of Physics L69 7ZE Liverpool UNITED KINGDOM

**GREGERSON** Anthony University of Wisconsin - Madison 6301 Offshore Dr Apt 317 53705 Madison UNITED STATES

**GRILLO** Alexander University of California Santa Cruz 1156 High Street 95064 Santa Cruz UNITED STATES

**GROMOV** Vladimir NIKHEF Science Park 105 1098 XG Amsterdam THE NETHERLANDS

**GUO** Yixian CNRS - LPNHE Tour 43 RDC 4 place Jussieu 75252 Paris FRANCE

GUZIK Zbigniew Soltan Institute for Nuclear Studies Warsaw, Swierk 05-400 Otwock POLAND

## H

HAAS Stefan CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

HALL Geoff Imperial College London Blackett Lab SW7 2AZ London UNITED KINGDOM

HALLEWELL Greg CNRS/IN2P3/CPPM Centre de Physique des Particules de Marseille 13288 Marseille FRANCE

HANSEN Magnus CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

HAZEN Eric Boston University 590 Commonwealth Ave 02215 Boston UNITED STATES

HEGARTY Seamus CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

HERMEL Richard CNRS/IN2P3/LAPP LAPP BP110 74941 Annecy le vieux FRANCE

HIMMI Abdelkader CNRS/IN2P3/IPHC 23 rue du Loess 67037 Strasbourg FRANCE

HOLT Richard Rutherford Appleton Laboratory RAL R1 /1.29 OX11 0BX Didcot UNITED KINGDOM

HU Christine CNRS/IN2P3/IPHC 23 rue du Loess 67037 Strasbourg FRANCE

HUFFMAN Brian Todd Oxford University Denys Wilkinson Blg, Keble Road OX1 3RH Oxford UNITED KINGDOM

## I

**ICHIMIYA** Ryo KEK 1-1 Oho 305-0801 Tsukuba JAPAN

ILES Gregory Imperial College Blackett Lab SW7 2BW London UNITED KINGDOM

**INDELICATO** Paul UPMC 4 Place Jussieu 75005 Paris FRANCE

**IRMLER** Christian Hephy Vienna Nikolsdorfergasse 18 1050 Vienna AUSTRIA

ITOH Ryosuke KEK 1-1 Oho 305-0801 Tsukuba JAPAN

## J

JAEGER Jean-Jacques CNRS/IN2P3/APC 10 rue A. Domon et L. Duquet 75013 Paris FRANCE

**JEGLOT** Jimmy CNRS/IN2P3/LAL Bât. 200 91898 Orsay FRANCE

**JOHNSON** Marvin Fermilab ms 352, PO box 500 60510 Batavia UNITED STATES

JONES John Princeton University 389 Washington Street, Apt. 7A 07302 Jersey City UNITED STATES

JURAMY Claire CNRS/IN2P3/LPNHE 4 place Jussieu 75005 Paris FRANCE

JURGA Piotr CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

## K

**KARPINSKI** Waclaw RWTH Aachen Sommerfeldstrasse 14 52146 Aachen GERMANY

**KLABBERS** Pamela CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

KLEIN Katja RWTH Aachen Sommerfeldstrasse 14 52074 Aachen GERMANY

KLOUKINAS Kostas CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

**KLUGE** Alexander CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

**KRAEMER** Markus TU-Munich Steiningerweg 10 D-85748 Garching GERMANY

**KRIVDA** Marian University of Birmingham B15 2TT Birmingham UNITED KINGDOM

KRUTH Andre University of Bonn Auf dem Hügel 113 53121 Bonn GERMANY

**KUGATHASAN** Thanushan University of Turin Via Pietro Giuria 1 10100 Torino ITALY

KULIS Szymon AGH-UST Krakow al. Mickiewicza 30 PL-30059 Cracow POLAND

KUSHPIL Vasily Nuclear Physics Institute of ASCR NPI of ASCR 25068 Rez near Prague CZECH REPUBLIC

**KVASNICKA** Jiri Institute of Physics of ASCR Na Slovance 2 18221 Prague 8 CZECH REPUBLIC

## L

LARSEN Dag Toppe University of Bergen Allégaten 55 5007 Bergen NORWAY

**LEBBOLO** Hervé CNRS / IN2P3 / LPNHE Université P & M Curie 4 place Jussieu T43 RC 75252 Paris cedex 05 FRANCE

**LECOQ** Jacques CNRS/IN2P3/LPC Clermont Campus des Cézeaux 63177 Aubière FRANCE

LESHEV Georgi CERN/ETH Zurich Route de Meyrin 1211 Geneva 23 SWITZERLAND

LICHARD Peter CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

LINSSEN Lucie CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

LIU Tiankuan Southern Methodist University 3215 Daniel Avenue, Physics 75206 Dallas UNITED STATES

**LOFFREDO** Salvatore INFN and Università Roma Tre Via della Vasca Navale, 84 00146 Rome ITALY

LUSIN Sergei Fermilab/CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

## M

MAALMI Jihane CNRS/IN2P3/LAL Bat 200 Universite Paris Sud 91898 Orsay FRANCE

MACCHIOLO Anna Max-Planck-Institute for Physik Föhringer Ring 6 80805 Munich GERMANY

MADORSKY Alexander University of Florida Museum rd and Lemerand dr 32611 Gainesville UNITED STATES

MARCHIORO Alessandro CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

MARIN Frédéric CNRS/IN2P3/CPPM 163 avenue de Luminy Case 902 13288 Marseille FRANCE

MARTIN-CHASSARD Gisèle CNRS/IN2P3/LAL/OMEGA Centre Universitaire Paris 11 Bât 200 91898 Orsay FRANCE

MATHEZ Hervé CNRS/IN2P3/IPNL Batiment Paul Dirac 4 Rue Enrico Fermi 69622 Villeurbanne FRANCE

**MATRICON** Pierre CNRS/IN2P3/LAL Laboratoire LAL 91898 Orsay FRANCE

MATTIAZZO Serena INFN & University of Padova Via Marzolo 8 35131 Padova ITALY

MATVEEV Mikhail Rice University MS 315, 6100 Main Street 77005 Houston UNITED STATES

MAURO Sergio Wiener Plein & Baus GmbH Muellersbaum 20 51399 Burscheid GERMANY

MAZZA Giovanni INFN sez. di Torino Via P. Giuria 1 10125 Torino ITALY

**MENOUNI** Mohsine CNRS/IN2P3/CPPM Marseille 163, avenue de Luminy - case 902 13228 Marseille FRANCE

**MESCHINI** Marco Istituto Nazionale di Fisica Nucleare via Montecapri 23 B 50026 San Casciano in Val di Pesa ITALY

MICHELIS Stefano CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

MOINE Catherine CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

MOREIRA Paulo CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

MORRIS John Queen Mary University of London Mile End Road London E1 4 NS UNITED KINGDOM

MOSER Hans-Günther Max-Planck-Institute for Physik Otto-Hahn-Ring 6 81739 Munich GERMANY

MÜLLER Felix University of Heidelberg Im Neuenheimer Feld 227 69120 Heidelberg GERMANY

MUSA Luciano CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

## N

**NEWCOMER** F. Mitchell University of Pennsylvania 209 S. 33rd St 19104 Philadelphia UNITED STATES

NOËL Guillaume CNRS/IN2P3/IPNO Centre universitaire Paris 11 -bât. 102 91406 Orsay FRANCE

**NOULIS** Thomas Aristotle University of Thessaloniki Aristotle University Campus 54124 Thessaloniki GREECE

## O

**OLIVER** John Harvard University 18 Hammond St 02138 Cambridge UNITED STATES

**OZIOL** Christophe CNRS/IN2P3/IPNO Université Paris Sud, Bâtiment 102 91406 Orsay FRANCE

## P

PAPAKONSTANTINOU Ioannis CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

**PETROLO** Emilio INFN Roma Viale della Tecnica, 185 00144 Rome ITALY

**PHAM** Thanh Hung CNRS/IN2P3/LPNHE 4 Place Jussieu - Tour 43 - RDC 75005 Paris FRANCE

PHILLIPS Peter STFC Rutherford Appleton Laboratory HSIC OX11 0QX Didcot UNITED KINGDOM

**PIGUET** Christian CSEM Neuchâtel Jaquet-Droz 1 2000 Neuchâtel SWITZERLAND

PLACKETT Richard CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

**PROSSER** Alan Fermilab Wilson and Kirk Roads 60510 Batavia UNITED STATES

**PRYDDERCH** Mark STFC - Rutherford Appleton Laboratory HSIC OX11 0QX Didcot UNITED KINGDOM

## Q

**QUINTON** Steve STFC - Rutherford Appleton Laboratory OX11 0QX Didcot UNITED KINGDOM

## R

RACZ Attila CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

**RADEKA** Veljko Brookhaven National Laboratory North Technology Street, 535B 11973 Upton UNITED STATES

**RANIERI** Antonio INFN Bari Via Orabona, 4 70126 Bari ITALY

RARBI Fatah CNRS/IN2P3/LPSC 53 rue des Martyrs 38026 Grenoble FRANCE

**RAULY** Emmanuel CNRS/IN2P3/IPNO Université Paris-Sud bât. 102 91406 Orsay FRANCE

**RICHER** Jean-Pierre CNRS/IN2P3/LPSC 53 Avenue des Martyrs 38026 Grenoble FRANCE

**RICHTER** Robert Max-Planck-Inst. for Physics, Munich Foehringer Ring 6 80805 Muenchen GERMANY

**RUDERT** Agnes Max-Planck-Institute for Physik Föhringer Ring 6 80805 Munich GERMANY

**RUSE** Ludovic PHYSICAL Instruments 6, Impasse Ledru Rollin 94170 Le Perreux sur Marne FRANCE

**RYJOV** Vladimir CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

#### S

SALOMON Franck CNRS/IN2P3/IPNO Université Paris Sud, Bâtiment 102 91406 Orsay FRANCE

SARRABAYROUSE Gérard CNRS/IN2P3/LAAS 7, AVE COLONEL ROCHE 31077 Toulouse cedex 4 FRANCE

SASAKI Osamu KEK 1-1 Oho 305-0801 Tsukuba JAPAN

SCHEIRICH Jan Charles University in Prague Jaroslava Sipka 273 03 Stochov CZECH REPUBLIC

**SCHOPFERER** Sebastian Universität Freiburg Hermann-Herder-Str. 3 79104 Freiburg GERMANY

SCHROER Nicolai ZITI - University of Heidelberg G2, 11 68159 Mannheim GERMANY

SCHULTE Michael University of Wisconsin-Madison 1415 Engineering Dr. 53706 Madison UNITED STATES

SCRENCI Adamo Blue Now 13 rue Paul Langevin 93270 Sevran FRANCE

SEDITA Mario INFN Catania Via S. Sofia 62 95123 Catania ITALY

**SEFRI** Rachid CNRS/IN2P3/LPNHE LPNHE Tour 43 rdc 75252 Paris FRANCE

SEGUIN-MOREAU Nathalie CNRS/IN2P3/LAL Universite Paris-Sud bat 200 91898 Orsay FRANCE

SERIN Laurent CNRS/IN2P3/LAL Bât. 200 91405 Orsay FRANCE

SILVA Sérgio CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

SINGOVSKI Alexander CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

SIPALA Valeria INFN sez Catania Via S. Sofia, 64 I-95123 Catania ITALY

SMITH Wesley University of Wisconsin 1150 University Ave 53706 Madison UNITED STATES

SOOS Csaba CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

SPENCER Edwin UC Santa Cruz SCIPP, NS II, Room 307 95064 Santa Cruz UNITED STATES

SPIWOKS Ralf CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

STRAESSNER Arno TU Dresden IKTP 01062 Dresden GERMANY

SUZUKI Yu KEK 1-1 Oho 305-0801 Tsukuba JAPAN

#### T

THIENPONT Damien CNRS/IN2P3/LAL/OMEGA Bât. 200, BP 34 91898 Orsay FRANCE

**THOME** John R. EPFL Lausanne EPFL-STI-IGM-LTCM, Mail 9 1009 Lausanne SWITZERLAND

**TIC** Tomas STFC/RAL 34 Foxhall Road OX11 7AA Didcot UNITED KINGDOM

TOCUT Vanessa CNRS/IN2P3/LAL LAL BP 34 91898 Orsay FRANCE

**TORHEIM** Olav University of Bergen, Norway 5008 Bergen NORWAY

TROSKA Jan CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

#### U

ULLAN Miguel CNM-Barcelona Campus UAB, Bellaterra 08193 Barcelona SPAIN

#### V

VALIN Isabelle CNRS/IN2P3/IPHC 23 rue du Loess 67037 Strasbourg FRANCE

VANKOV Peter University of Liverpool 117 Crown Station Place L7 3LB Liverpool UNITED KINGDOM

VARI Riccardo INFN Roma Piazzale Aldo Moro 2 00185 Rome ITALY

VASEY François CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

**VENEZIANO** Stefano INFN Roma Piazzale Aldo Moro 2 00185 Rome ITALY

**VENTURINI** Guiseppe CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

**VERT** Pierre Etienne CNRS/IN2P3/LPC Clermont 24 avenue des Landais 63177 Aubière FRANCE

VICHOUDIS Paschalis CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

**VIVALDI** Franco CAEN SpA Via Vetraia, 11 55049 Viareggio ITALY

#### W

WARREN Matthew UCL London Gower Street WC1E 6BT London UNITED KINGDOM

WEBER Bradley Max-Planck-Inst. for Physics, Munich Foehringer Ring 6 80805 Munich GERMANY

WEBER Marc KIT (Karlsruhe Institute of Technology) Institute for Data Processing and Electronics Herrmann-von-Helmholtz-Platz 1 76344 Eggenstein-Leopoldshafen GERMANY

WICEK Francois CNRS/IN2P3/LAL LAL - Centre Scientifique d'Orsay 91898 Orsay cedex FRANCE

WIJNANDS Thijs CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

**WINTER** Marc CNRS/IN2P3/IPHC Strasbourg 23 rue du Loess 67037 Strasbourg FRANCE

WORMSER Guy CNRS/IN2P3/LAL BP 34 91898 Orsay cedex FRANCE

WYLLIE Kenneth CERN Route de Meyrin 1211 Geneva 23 SWITZERLAND

#### X

XIANG Annie Chu Southern Methodist University 3215 Daniel Ave 75275 Dallas UNITED STATES

XIAOSHAN Jiang IHEP P.O.Box 918/1 100049 Beijing CHINA

#### Y

YAN Xiongbo CNRS/IN2P3/LAL Bât. 200 - BP 34 91898 Orsay FRANCE

YANG Shiming University of Bergen Allegaten 55 5007 Bergen NORWAY

YAREMA Ray Fermilab 301 Montclair Ave 60137 Glen Ellyn UNITED STATES

YE Jingbo Southern Methodist University 3215 Daniel Avenue 75275 Dallas UNITED STATES

#### Z

**ZIVKOVIC** Vladimir Nikhef Science Park 105 1098 XG Amsterdam NETHERLANDS

**ZOCCARATO** Yannick CNRS/IN2P3/IPNL 4 rue Enrico Fermi 69622 Villeurbanne FRANCE

# INDEX

### A

| Abellan Beteta, C.  | 144      |
|---------------------|----------|
| Abi, B              |          |
| Acosta, D.          | 254, 431 |
| Adams, B.           |          |
| Adloff, C.          | 117      |
| Adzic, P.           |          |
| Aglieri Rinella, G. |          |
| Ahluwalia, G        |          |
| Alessio, F.         | 514      |
| Allongue, B         |          |
| Aloisio, A          |          |
| Amaral, L.          |          |
| Andricek, L.        |          |
| Angelucci, B.       | 179      |
| Anghinolfi, F       | 62       |
| Antchev, G.         | 91, 610  |
| Antona, J-C.        |          |
| Arai, Y.            | 68       |
| Arcega, F.J.        |          |
| Armbruster, T.      |          |
| Arteche, F          |          |
| Arutinov, D.        | 548, 606 |
| Attenkofer, K.      |          |
| Avanzini, C         | 179      |

#### B

| Bachtis, M.     | 191                |
|-----------------|--------------------|
| Baker, O.       |                    |
| Ballabriga, R.  |                    |
| Ballif, C.      |                    |
| Barbero, M.     |                    |
| Barney, D.      |                    |
| Baron, P.       |                    |
| Baron, S.       | 342, 352, 561, 631 |
| Bartknecht, S.  |                    |
| Bary, L.        |                    |
| Baschirotto, A. |                    |
| Battaglia, M.   |                    |
| Baudot, J.      | 47                 |
| Beaumont, W.    | 610                |
| Beccherle, R.   |                    |
| Becker, K.      |                    |
| Beimforde, M.   |                    |
| Belmonte, M     |                    |
| Ben Ami, S.     |                    |
| Benhammou, Y.   |                    |
| Berge, D.       |                    |
| Bertolone, G.   |                    |
| Bertolucci, S.  | 3                  |
| Besson, A.      | 47                 |
| Beucher, J.          |               |
| Bhattacharyya, S.    | 617           |
| Bialas, W.           |               |
| Bisello, D.          |               |
| Bizjak, I.           |               |
| Blanchot, G.         |               |
| Bochenek, M.         |               |
| Boek, J.             |               |
| Bogdan, M.           |               |
| Bohner, G.           |               |
| Bonacini, S.         | 342, 422, 557 |
| Bonilla Osorio, R.S. | 91            |
| Bortolin, C.         |               |
| Bouchel, M.          |               |
| Branchini, P.        |               |
| Brenner, R.          |               |
| Bressler, S.         |               |
| Breton, D.           | 149           |
| Brett, A.            |               |
| Brezina, C.          |               |
| Brianzi, M.          |               |
| Brogna, A.S.         | 47            |
| Bruzzi, M.           |               |
| Bucciolini, M.       |               |
| Burdalo, J.          |               |
| Byrum, K.            |               |

### C

| Cachemiche, J-P.   | 631 |
|--------------------|-----|
| Calì, I.A.         |     |
| Callier, S.        |     |
| Calvet, D.         |     |
| Calvo, D.          |     |
| Campagne, J.E.     |     |
| Campbell, M.       | 157 |
| Candiano, G.       |     |
| Cao, T             |     |
| Capineri, L.       |     |
| Caponetto, L.      |     |
| Casas Cubillos, J. |     |
| Caselle, M.        |     |
| Cavallari, F.      |     |
| Cavicchioli, C.    |     |
| Cevenini, F.       |     |
| Chanal, H.         |     |
| Chantepie, B.      |     |
| Checcucci, B.      |     |
| Chefdeville, M.    | 117 |
| Chen, H.           |     |
| Chen, KF.          | 91  |
| Chen, W-T.         |     |
| Chironi, V.        |     |
| Cirrone, G.A.P.        |     |
| Civinini, C.           |     |
| Claus, G.              |     |
| Clémens, J-C.          |     |
| Coath, R.              |     |
| Cobanoglu, O.          |     |
| Cocciolo, G.           |     |
| Cocimano, R.           |     |
| Collazuol, G.          |     |
| Colledani, C.          |     |
| Comerma, A.            |     |
| Compton, K.            | 617 |
| Conforti Di Lorenzo, S |     |
| Contarato, D.          |     |
| Crampon, S.            |     |
| Cressler, J. D.        |     |
| Crone, G.              |     |
| Crooks, J.             |     |
| Cuttone, G.            |     |

## D

| D'Amico, S.            | 103                |
|------------------------|--------------------|
| Da Silva, S.           | 321                |
| Dabrowski, W.          | 579                |
| Daniel, L.             | 379                |
| Dannheim, D.           | 435                |
| Darwazeh, I.           | 352                |
| Dasu, S.               | 191                |
| de la Broise, X.       | 596                |
| de la Gama Serrano, J. | 161                |
| de La Taille, C.       | 504                |
| De Masi, R.            | 47                 |
| De Matteis, M.         | 103                |
| De Remigis, P.         | 52                 |
| De Robertis, G.        | 557                |
| Degerli, Y.            | 47                 |
| Delagnes, E.           | 149, 596           |
| Delbart, A.            | 596                |
| De Lizia, P.           | 103                |
| Della Volpe, D.        | 407                |
| Delord, V.             | 525                |
| Demarchi, D.           | 318                |
| Denes, P.              | 591                |
| Desch, K.              | 113                |
| Deschamps, O.          | 186                |
| Despeisse, M.          | 72                 |
| Detraz, S.             | 636                |
| Dhawan, S.             | 267                |
| Di Calafiori, D.       | 575                |
| Di Giglio, C.          | 520                |
| Di Giovanni, G.P.      | 254, 431           |
| Di Marco, E.           | 575                |
| Díez, S.               | 439                |
| Dissertori, G.         | 575                |
| Dopke, J.              | 404                |
| Dorokhov, A.           | 47                 |
| Doziere, G.            |                    |
| Drancourt, C.          |                    |
| Dressnandt, N.         |                    |
| Druillole, F.          |                    |
| Dube, S.               |                    |
| Dulinski, W.           | 47                 |
| Dulucq, F.             | 122, 308, 491, 504 |
| Dwuznik, M.            |                    |
| Dzahini, D.            |                    |

## E

| Echizenya, Y |  |
|--------------|--|
| Efron, J.    |  |
| El Berni, M  |  |
| Elledge, R   |  |
| Ellege, D    |  |
| Ellis, N     |  |
| Emerson, V.  |  |
| Esteban, C.  |  |
| Etzion, E.   |  |

## F

| Faccio, F.            | 276, 342, 570, 579  |
|-----------------------|---------------------|
| Fang, X.              |                     |
| Farmahini-Farahani, A | 617                 |
| Farthouat, P.         |                     |
| Fedorov, T.           |                     |
| Fei, R.               |                     |
| Feld, L.              |                     |
| Fernandez Penacoba, G |                     |
| Fernández-Bedoya, C   |                     |
| Fernando, W.          |                     |
| Ferro-Luzzi, M.       |                     |
| Fischer, A.           |                     |
| Fischer, H.           |                     |
| Fischer, P.           |                     |
| Fisher, M.            |                     |
| Fleury, J.            |                     |
| Flick, T.             |                     |
| Fobes, R.             |                     |
| Foudas, C.            |                     |
| Fougeron, D.          |                     |
| França-Santos, H.     |                     |
| Francisco, R.         |                     |
| Friedl, M.            |                     |
| Friedrich, J.         |                     |
| Frisch, H. J.         |                     |
| Fuentes, C.           |                     |
| Fukuda, K.            |                     |
| Fukunaga, C.          |                     |
| Furic, I.             |                     |

### G

| Gabrielli, A.            | 557            |
|--------------------------|----------------|
| Gaglione, R.             | 117            |
| Galeotti, S.             |                |
| Gallin-Martell, L.       |                |
| Gallop, B.               |                |
| Gan, K.K.                |                |
| Garcia-Sciveres, M.      | 220, 548, 606  |
| Garnier, J-C.            |                |
| Gartner, J.              |                |
| Gascón, D.               | 144            |
| Gaspar de Valenzuela, A. | 144            |
| Gelin, M.                | 47             |
| Genat, J-F. C.           |                |
| Gensolen, F.             | 606            |
| Giordano, R.             |                |
| Gipper, J.               |                |
| Giubilato, P.            |                |
| Gnani, D.                |                |
| Go, A.                   |                |
| Godbeer, A.              |                |
| Godiot, S.               |                |
| Goffe, M.                | 47             |
| Gomez-Reino, R.          |                |
| Gong, D.                 | 471, 476, 481  |
| Gonzalez-Sevilla, S.     |                |
| Gorini, B.               |                |
| Gorski, T.               | 191            |
| Gousiou, E.              | 161            |
| van der Graaf, H.        |                |
| Grabas, H.               |                |
| Green, B.                | 407            |
| Gregerson, A.            | 617            |
| Grillo, A.A.             |                |
| Grogg, K.                | 191            |
| Gromov, V.               | 113, 548, 606  |
| Gronewald, M.            |                |
| Grothe, M.               | 191            |
| Gui, P.                  |                |
| Guilloux, F.             |                |
| Guo, Y.                  |                |
| Guzik, Z.                | 514            |

## H

| Haas, S          |  |
|------------------|--|
| Hall, G          |  |
| Hallewell, G.    |  |
| Hambarzumjan, A. |  |
| Hammar, A.       |  |
| Hansen, M.       |  |
| Hanzlik, J.      |  |
| Hartin, P        |  |
| Hasegawa, S.     |  |
| Hasegawa, Y.     |  |
| Hayakawa, T.     |  |
| Heim, T.         |  |
| Heintz, M.K.     |  |
| Hemperek, T.     |  |
| Henß, T.         |  |
| Herrmann, F.     |  |
| Hershenhorn, A.  |  |
| Himmi, A        |  |
| Hod, N          |  |
| Holmes, D.      |  |
| Horn, G.        |  |
| Hostachy, J-Y.  |  |
| Hou, S          |  |
| Hu, Y           |  |
| Huber, S        |  |
| Huffman, B.T.   |  |
| Hu-Guo, C       |  |

## I

| 211, 417 |
|----------|
|          |
|          |
|          |
| 14       |
|          |
|          |
|          |
|          |

### J

| Jaaskelainen, K. |     |
|------------------|-----|
| Jacobsson, R.    | 514 |
| Janner, D.       |     |
| Jarron, P.       | 72  |
| Johnson, M.      |     |
| Jones, J.        |     |
| Joos, M.         |     |
| Jovanovic, D.    |     |
| Jurga, P.        |     |
| Jussen, R.       |     |

## K

| Kagan, H.P.    |               |
|----------------|---------------|
| Kajomovitz, E. |               |
| Kaplon, J.     | 62, 72, 579   |
| Karagounis, M. | 220, 548, 606 |
| Karpinski, W.  |               |
| Kass, R.D.     |               |
| Kawamoto, T.   |               |
| Kersten, S.    |               |
| Kessoku, K.    |               |
| Ketzer, B.     |               |
| Khanna, R.     |               |
| Kieft, G.      |               |
| Kierstead, J.  |               |
| Kim, H.        |               |
| Kind, P.       | 466           |
| Kishiki, S.        |               |
| Klabbers, P.       | 191           |
| Klein, K           |               |
| Kloukinas, K.      |               |
| Kluge, A.          |               |
| Kluit, R           | 113, 548, 606 |
| Königsmann, K.     |               |
| Kononenko, W.      |               |
| Konorov, I         |               |
| Konstantinidis, N. |               |
| Kordas, K.         |               |
| Kotov, K.          | 254           |
| Koziel, M.         | 47            |
| Krämer, M.         |               |
| Krueger, H         |               |
| Kruth, A.          | 113, 548, 606 |
| Kubota, T.         |               |
| Kugathasan, T.     |               |
| Kugel, A.          |               |
| Kulis, S           |               |
| Kuo, CM            | 91            |
| Kurachi, I         | 68            |
| Kurashige, H.      |               |
| Kuriyama, N        | 68            |
| Kushpil, V         |               |

#### L

| La Marra, D.   | 62            |
|----------------|---------------|
| Lamanna, G.    | 179           |
| Lanni, F.      |               |
| Larsen, D.T.   |               |
| Lauser, L.     | 410           |
| Lazaridis, C.  | 191           |
| Le Coguie, A.  |               |
| Lebbai, M.R.M. |               |
| Lecoq, J.      |               |
| Lefèvre, R.    |               |
| Lellouch, D.   |               |
| Leonard, J.    | 191           |
| Leonora, E.    |               |
| Leshev, G.     | 575           |
| Levinson, L.   |               |
| Li, S-W.       |               |
| Liang, Z.      |               |
| Liu, C.        | 471, 476, 481 |
| Liu, T         | 471, 476, 481 |
| Llopart, X.    |               |
| Lo Presti, D.  | 86, 303, 462  |
| Loffredo, S.   |               |
| Lolli, M.      |               |
| Lounis, A.     |               |
| Lu, RS.        | 91            |
| Lundberg, J.   |               |
| Lynn, D.       |               |

#### M

| Maalmi, J.            |                                                  |
|-----------------------|--------------------------------------------------|
| Macchiolo, A.         |                                                  |
| Madorsky, A.          |                                                  |
| Maettig, S.           |                                                  |
| Magazzù, G.           |                                                  |
| Magne, M.             |                                                  |
| Mandić, I.            |                                                  |
| Mann, A.              |                                                  |
| Marchioro, A.         |                                                  |
| Marin, F.             |                                                  |
| Marrazzo, L.          |                                                  |
| Martchovsky, A.       |                                                  |
| Martin-Chassard, G.   | …, 491, 504                                      |
| Martinez-McKinney, F. |                                                  |
| Masetti, G.           |                                                  |
| Mathez, H.            |                                                  |
| Matson, R.            |                                                  |
| Matsushita, T.        |                                                  |
| Mattiazzo, S.         |                                                  |
| Mättig, P.            |                                                  |
| Matveev, M.           |                                                  |
| May, E.N.             |                                                  |
| Mayers, G.            |                                                  |
| Mazza, G.             |                                                  |
| Mazzaglia, E.         |                                                  |
| Mazzucato, E.         |                                                  |
| Meehan, S.            |                                                  |
| Mekkaoui, A.          |                                                  |
| Menichelli, D.        |                                                  |
| Menouni, M.           |                                                  |
| Meroli, S.            |                                                  |
| Merritt, H.           |                                                  |
| Merz, J.              |                                                  |
| Meschini, M.          |                                                  |
| Messina, A.           |                                                  |
| Michelis, S.          |                                                  |
| Mignone, M.           |                                                  |
| Mikenberg, G.         |                                                  |
| Milenovic, P.         |                                                  |
| Misiejuk, A.          |                                                  |
| Monmarthe, E.         |                                                  |
| Moore, J.R.           |                                                  |
| Moreira, P.           | 321, 326, 342, 352, 422, 486, 557, 570, 631, 636 |
| Morel, F.             | 47                                               |
| Morris, J.D.          |                                                  |
| Moser, HG.            |                                                  |
| Müller, F.            |                                                  |
| Musso, C.             |                                                  |

### N

| Nagarkar, A.   | 338 |
|----------------|-----|
| Natoli, T.     | 495 |
| Neufeld, N.    |     |
| Newcomer, F.M. |  |
| Newcomer, M.   |  |
| Nishiyama, T.  |  |
| Nisius, R.     |  |
| Northrop, R.   |  |
| Noulis, T.     |  |
| Nunzi Conti, G |  |

#### O

| Oberla, E.    |    |
|---------------|----|
| Oberlack, H.  |    |
| Ochi, A.      |    |
| Oda, S.       |    |
| Ohno, M.      |    |
| Okihara, M.   | 68 |
| Okumura, Y.   |    |
| Olivier, J.A. |    |
| Omachi, C.    |    |
| Orlandi, S.   |    |
| Orsini, F.    | 47 |

#### P

| Padley, P.          |                     |
|---------------------|---------------------|
| Paillard, C.        |                     |
| Pangaud, P.         |                     |
| Pantano, D.         | 591                 |
| Papadopoulos, S.    | 347, 352, 486, 636  |
| Papakonstantinou, I.| 347, 352, 486, 636  |
| Park, J.E.          |                     |
| Parrini, G.         |                     |
| Patras, V.          | 91                  |
| Paul, S.            |                     |
| Pauly, T.           |                     |
| Pelli, S.           |                     |
| Perić, I.           | 457                 |
| Pernecker, S.       | 62                  |
| Pernicka, M.        |                     |
| Perret, P.          |                     |
| Phillips, P.W.      |                     |
| Phillips, S.        |                     |
| Picatoste, E        | 144                 |
| Pignard, C.         | 539                 |
| Piguet, C.          |                     |
| Pinilla, N.         |                     |
| Plackett, R.        | 157                 |
| Plishker, W.        | 617                 |
| Poltorak, K.        |                     |
| Pons, X.            | 575                 |
| Pospelov, G.        |                     |
| Pozzobon, N.        | 591                 |
| Pruneri, V.         |                     |
| Punz, T.            | 575                 |
| Puzovic, J.         | 575                 |

#### R

| Randazzo, N              | 86, 303, 462 |
|--------------------------|--------------|
| Ranieri, A.              |              |
| Rarbi, F.                |              |
| Raux, L.                 |              |
| Redjimi, L.              |              |
| Reimann, O.              |              |
| Rescia, S.               |              |
| Reynaud, S.              | 91           |
| Richter, R.H.            |              |
| Riera-Baburés, J.        | 144          |
| Rivetta, C.              |              |
| Rivetti, A.              |              |
| Rizatdinova, F.          |              |
| Rodriguez Estupinan, J.S |              |
| Roselló, M.              | 144          |
| Rossetto, O.             |              |
| Rozanov, A.              |              |
| Ruat, M.                 |              |
| Rudert, A.               |              |
| Ruggiero, G.             |              |
| Rui Silva, S.            |              |
| Russo, G.V.              |              |

### S

| Sadrozinski, H.FW. |                    |
|--------------------|--------------------|
| Sakamoto, H.       |                    |
| Salgado, H.M.      |                    |
| Sammet, J.         |                    |
| Santoro, R.        |                    |
| Santos Amaral, L.  |                    |
| Santos, G.         |                    |
| Santos, L.         |                    |
| Sarrabayrouse, G.  |                    |
| Sasaki, O.         |                    |
| Savin, A.          |                    |
| Schacht, P.        |                    |
| Schill, C.         |                    |
| Schipper, J.D.     |                    |
| Schmitt, K.        |                    |
| Schopferer, S.     |                    |
| Schroer, N.        |                    |
| Schulte, M.        |                    |
| Sedita, M.         |                    |
| Seguin-Moreau, N.  |                    |
| Seiden, A.         |                    |
| Seif El Nasr, S.   |                    |
| Shaw, R.           |                    |
| Shen, W.           |                    |
| Sherman, D.        |                    |
| Sigaud, C.         | 347, 352, 486, 636 |
| Silva, S.          |                    |
| Silver, Y.         |                    |
| Singovski, A.      |                    |
| Sipala, V.         |                    |
| Siskos, S.         |                    |
| Skubic, P.L.       |                    |
| Smith, D.S.    |                         |
| Smith, H.      |                         |
| Smith, W.H.    |                         |
| Soos, C.       | 347, 352, 486, 631, 636 |
| Sozzi, M.      |                         |
| Specht, M.     |                         |
| Spencer, E.    |                         |
| Spieler, H.    |                         |
| Spiezia, G.    |                         |
| Spiwoks, R.    |                         |
| Stamen, R.     |                         |
| Stanek, R.     |                         |
| Stejskal, P.   |                         |
| Straessner, A. |                         |
| Strang, M.     |                         |
| Su, D-S.       |                         |
| Sugaya, Y.     |                         |
| Sugimoto, T.   |                         |
| Šumbera, M.    |                         |
| Sun, Q.        |                         |
| Sutton, A.K.   |                         |
| Sutton, M.     |                         |
| Suzuki, Y.     |                         |
| Swientek, K.   |                         |
| Szelezniak, M. |                         |

### T

| Takahashi, Y.        |                         |
|----------------------|-------------------------|
| Takeshita, T.        |                         |
| Talamonti, C.        |                         |
| Tanaka, S.           |                         |
| Tang, F.             |                         |
| Tarem, S.            |                         |
| Tazawa, Y.           |                         |
| Teixeira-Dias, P.    |                         |
| Teng, P-K.           |                         |
| Tesi, M.             |                         |
| Tessaro, M.          |                         |
| Thienpont, D.        |                         |
| Thome, J.R.          |                         |
| Tic, T.              |                         |
| Tipton, P.           |                         |
| Tlustos, L.          |                         |
| Tomoto, M.           |                         |
| Torcato de Matos, C. |                         |
| Torheim, O.          |                         |
| Tremblet, L.         |                         |
| Troska, J.           | 321, 347, 352, 486, 636 |
| Turchetta, R.        |                         |

### U

| Ullán, M.  | 439      |
|------------|----------|
| Uvarov, L. | 254, 431 |

### V

| Valentini, S.       |               |
|---------------------|---------------|
| Valin, I.           |               |
| Vankov, P.          |               |
| Varner, G.          |               |
| Vasey, F.           | 347, 352, 486 |
| Venditti, S.        |               |
| Vermeulen, J.       |               |
| Versmissen, H.      | 636           |
| Vert, PE.           |               |
| Vichoudis, P.       |               |
| Vila, I.            |               |
| Vilasís-Cardona, X. |               |
| Villani, E.G.       |               |
| Vouters, G.         |               |
| Voutsinas, Y.       | 47            |

### W

| Wang, D.     |          |
|--------------|----------|
| Warren, M.   | 239      |
| Weber, M.    |          |
| Wei, W.      |          |
| Weidberg, A. |          |
| Weinberg, M. | 191      |
| Wermes, N.   |          |
| Werner, P.   |          |
| Wheadon, R.  |          |
| Wickens, F.  |          |
| Wijnands, T. |          |
| Wilder, M.   | 379, 439 |
| Wilson, M.   |          |
| Winter, M.   |          |
| Wollny, H.   |          |
| Wong, W.     |          |
| Wyllie, K.   |          |
| Wyrsch, N.   |          |
| Wyss, J.     |          |

### X

| Xiang, A.C. | 471, 476, 481 |
|-------------|---------------|
| Xie, Ž.     | 617           |

### Y

| Ye, J.      |  |
|-------------|--|
| Yu, B.      |  |
| Yurtsev, E. |  |

### Z

| Zappon, F.              |  |
|-------------------------|--|
| Zeitnitz, C.            |  |
| Zelepukin, S.           |  |
| Zito, M.                |  |