AIDA-PUB-2013-017

## **AIDA**

Advanced European Infrastructures for Detectors at Accelerators

# **Journal Publication**

# Prototype of a gigabit data transmitter in 65 nm CMOS for DEPFET pixel detectors at Belle-II

Kishishita, T (UBONN) et al

01 August 2013



The research leading to these results has received funding from the European Commission under the FP7 Research Infrastructures project AIDA, grant agreement no. 262025.

This work is part of AIDA Work Package **3: Microelectronics and interconnection technology**.

The electronic version of this AIDA Publication is available via the AIDA web site <http://cern.ch/aida> or on the CERN Document Server at the following URL: <http://cds.cern.ch/search?p=AIDA-PUB-2013-017>

## Nuclear Instruments and Methods in Physics Research A


## Prototype of a gigabit data transmitter in 65 nm CMOS for DEPFET pixel detectors at Belle-II



### T. Kishishita<sup>\*</sup>, H. Krüger, T. Hemperek, M. Lemarenko, M. Koch, M. Gronewald, N. Wermes

Universität Bonn, Physikalisches Institut, Nussallee 12, 53115, Germany

#### ARTICLE INFO

Available online 10 November 2012

Keywords: Front end, DEPFET, Gbit link

#### ABSTRACT

This paper describes the recent development of a gigabit data transmitter for the Belle-II pixel detector (PXD). The PXD is the innermost detector currently under development for the upgraded KEK-B factory in Japan. It consists of two layers of DEPFET sensor modules located at radii of 1.8 and 2.2 cm. Each module is equipped with three different ASIC types mounted on the detector substrate with a flip-chip technique: (a) SWITCHER for generating steering signals for the DEPFET sensors, (b) DCD for digitizing the signal currents, and (c) DHP for performing data processing and sending the data off the module to the back-end data handling hybrid via $\sim 40$ cm Kapton flex and 12–15 m twisted pair (TWP) cables. To meet the requirements of the PXD data transmission, a prototype of the DHP data transmitter has been developed in a 65-nm standard CMOS technology. The transmitter test chip consists of current-mode logic (CML) drivers and a phase-locked loop (PLL), which generates a clock signal for a 1.6 Gbit/s output data stream from an 80 MHz reference clock. A programmable pre-emphasis circuit is also implemented in the CML driver to compensate for signal losses in the long cable by shaping the transmitted pulse response. The jitter was measured as 25 ps ($1\sigma$ of the distribution) with the chip connected to 38 cm flex and 10 m TWP cables.

© 2012 Elsevier B.V. All rights reserved.

#### 1. Introduction

The Belle-II pixel detector (PXD) is based on the DEPFET sensor principle [1], in which the first amplification stage is integrated into the bulk of a depleted silicon sensor. Owing to its internal amplification, the DEPFET sensor can be thinned down to 50 µm, which minimizes multiple scattering in the high-luminosity environment of the $e^+e^-$ beams (up to $\sim 8 \times 10^{35} \text{ cm}^{-2} \text{ s}^{-1}$) at the upgraded KEK-B factory in Japan. In addition, the small capacitance of the charge collection node and the large signal provided by the fully depleted bulk make it possible to achieve sufficient noise performance for an innermost tracking detector.

Fig. 1 shows the Belle-II PXD readout system. The PXD consists of two layers of DEPFET sensor modules located at radii of 1.8 and 2.2 cm. The DEPFET matrix is read out by selecting one row at a time and reading the current signals of the columns in parallel. Each module is equipped with three different ASIC types mounted on the detector substrate with a flip-chip technique [2]: the "Drain Current Digitizers" (DCDs), which digitize the drain currents from a row of pixels [3]; the "SWITCHERs", which select and clear the pixels row-wise to send the currents to the DCDs [4,5]; and the "Data Handling Processors" (DHPs), which reduce the data rates of the DCDs by zero-suppression and by reading out triggered data only [6]. While the SWITCHERs are located along the side of the DEPFET sensor, DCDs and DHPs are located at the end of the sensor.

\* Corresponding author. E-mail address: kisisita@physik.uni-bonn.de (T. Kishishita).

After data processing in the DHP, the data are sent off the module to a back-end board called the "Data Handling Hybrid" (DHH) via $\sim 40$ cm Kapton flex and 12–15 m twisted pair (TWP) cables. The DHH generates a DHP system clock from a machine interface clock and receives the data from the DHPs with a Gbit link receiver. The PXD is read out in a rolling shutter mode with a 10 MHz line frequency, resulting in a 20 µs frame time. Referring to the numbers given for one half-module (see Fig. 1), the output data stream from four DCD chips can be estimated as: ADC resolution (8 bit) $\times$ channels (256) $\times$ line frequency (10 MHz) $\times$ 4 DCD chips $\approx$ 82 Gbit/s. The DHP chips will reduce the data rate to an average of 5 Gbit/s (assuming 3% pixel occupancy and a 30 kHz trigger rate). The triggered hit data are transmitted to the DHH over high-speed links running at 1.6 Gbit/s per chip, including the 8b/10b coding overhead of 20%. The overall maximum data rate is thus 6.4 Gbit/s per half-module. To meet the requirements of the PXD data transmission, DHP test chips are currently under development. In this paper, we focus on the recent development of a DHP data transmitter chip in TSMC 65-nm CMOS technology.
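The data-rate budget above can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text; the count of four links per half-module is inferred from 6.4 Gbit/s divided by 1.6 Gbit/s per chip, and all variable names are illustrative.

```python
# Data-rate budget for one PXD half-module, using the numbers quoted in the text.
# All names are illustrative; LINKS = 4 is inferred from 6.4 / 1.6, not stated directly.

ADC_BITS = 8                     # ADC resolution per channel
CHANNELS_PER_DCD = 256           # channels per DCD chip
LINE_RATE_HZ = 10_000_000        # rolling-shutter line frequency (10 MHz)
DCD_CHIPS = 4                    # DCD chips per half-module

LINK_RATE_BPS = 1_600_000_000    # serial line rate per DHP link
LINKS = 4                        # assumed: one link per DHP chip

# Raw digitized data produced by the DCDs.
raw_rate_bps = ADC_BITS * CHANNELS_PER_DCD * LINE_RATE_HZ * DCD_CHIPS

# Aggregate line rate of all links in the half-module.
total_link_rate_bps = LINK_RATE_BPS * LINKS

# 8b/10b coding sends 10 line bits for every 8 payload bits.
payload_rate_bps = total_link_rate_bps * 8 // 10

print(f"raw DCD output  : {raw_rate_bps / 1e9:.2f} Gbit/s")
print(f"total line rate : {total_link_rate_bps / 1e9:.2f} Gbit/s")
print(f"usable payload  : {payload_rate_bps / 1e9:.2f} Gbit/s")
```

Under these assumptions the usable payload comes out slightly above the quoted 5 Gbit/s average, which suggests the links are sized with only a modest margin over the average load.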

#### 2. Design of the data transmitter test chip

#### 2.1. Overview of the chip

Fig. 2 shows the block diagram of the Gbit data transmitter chip. The overall circuit consists of current-mode logic (CML) drivers with programmable pre-emphasis circuits and a phase-locked loop (PLL), which generates a clock signal for a 1.6 Gbit/s output data stream from an 80 MHz reference clock. An 8-bit linear feedback shift register (LFSR) is also integrated on the chip; it generates a pseudo-random sequence for testing the signal integrity of the Gbit link outputs. The PLL provides three output taps: 320 MHz, 800 MHz, and 1.6 GHz. Two CML output drivers make it possible to drive the reference clock or the 800 MHz clock, and the 1.6 Gbit/s LFSR signal, off the chip.

<sup>0168-9002/\$ -</sup> see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.nima.2012.11.013
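To illustrate how an 8-bit LFSR produces a pseudo-random test pattern, here is a minimal software sketch. The feedback polynomial $x^8 + x^6 + x^5 + x^4 + 1$ is an assumption chosen because it is a well-known maximal-length polynomial; the paper does not state which polynomial or seed the chip uses, and the function names are hypothetical.

```python
def lfsr8_step(state):
    """Advance an 8-bit Fibonacci LFSR by one clock.

    Assumed polynomial: x^8 + x^6 + x^5 + x^4 + 1 (not taken from the paper).
    Returns (new_state, output_bit).
    """
    out = state & 1
    # XOR of the tapped bits forms the bit shifted in at the top.
    feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
    new_state = (state >> 1) | (feedback << 7)
    return new_state, out


def lfsr8_sequence(seed=0x01):
    """Return the output bits of one full period, starting from a non-zero seed."""
    if not 0 < seed < 256:
        raise ValueError("seed must be a non-zero 8-bit value")
    bits = []
    state = seed
    while True:
        state, out = lfsr8_step(state)
        bits.append(out)
        if state == seed:   # the register has returned to its starting state
            return bits


sequence = lfsr8_sequence()
print(len(sequence), sum(sequence))
```

With a maximal-length polynomial the register visits every non-zero state once, so the pattern repeats after 255 bits and is nearly balanced between ones and zeros, which is what makes it useful for exercising a serial link.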



Fig. 1. Belle-II PXD readout system.



Fig. 2. Block diagram of the Gbit data transmitter test chip.

#### 2.2. Phase-locked loop

The PLL architecture is based on a classical type-II charge pump PLL [7,8]. Fig. 3 shows the block diagram of the PLL with its main building blocks: a phase frequency detector (PFD), charge pump (CP), loop filter (LF), voltage-controlled oscillator (VCO), and frequency dividers (FDs). The circuit topology is inherited from the pixel front-end chip (FE-I4) for the upgraded ATLAS experiment [9].

The PFD is implemented with two flip-flops and an AND gate. It detects phase and frequency differences between the reference ( $f_{\text{REF}}$ ) and feedback ( $f_{\text{FD}}$ ) clocks and generates output signals ("UP" and "DN") that control the current paths of the charge pump. When the rising edge of  $f_{\text{REF}}$  leads the rising edge of  $f_{\text{FD}}$ , the "UP" signal goes high while "DN" remains low. Conversely, when  $f_{\text{FD}}$  leads  $f_{\text{REF}}$ , "UP" remains low while "DN" goes high. A simple loss-of-lock detection circuit is implemented with additional flip-flops. This circuit responds to the durations of the "UP" and "DN" signals: when "UP" (or "DN") remains high for longer than a certain delay time T, a flip-flop latches the signal and the "Ref2Fast" (or "Fb2Fast") node goes high, indicating that a severe change has occurred in the control voltage ( $V_{\text{ctrl}}$ ).
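The PFD and loss-of-lock behaviour described above can be mimicked with an idealized behavioural model. This is only a sketch: it assumes each reference edge is compared with exactly one feedback edge, ignores the short reset overlap of a real PFD, and uses a made-up value for the delay time T. The function name and data layout are hypothetical; only the signal names "UP", "DN", "Ref2Fast" and "Fb2Fast" come from the text.

```python
def pfd_response(ref_edges_ns, fb_edges_ns, lock_window_ns):
    """Idealized PFD: one UP/DN pulse width per compared pair of rising edges.

    lock_window_ns plays the role of the delay time T (value is an assumption).
    The Ref2Fast / Fb2Fast flags are latched: once set, they stay set.
    """
    ref2fast = False
    fb2fast = False
    cycles = []
    for t_ref, t_fb in zip(ref_edges_ns, fb_edges_ns):
        lead_ns = t_fb - t_ref               # > 0 means the reference edge came first
        up_ns = max(0.0, lead_ns)            # UP is high while waiting for the feedback edge
        dn_ns = max(0.0, -lead_ns)           # DN is high while waiting for the reference edge
        ref2fast = ref2fast or up_ns > lock_window_ns
        fb2fast = fb2fast or dn_ns > lock_window_ns
        cycles.append({"UP_ns": up_ns, "DN_ns": dn_ns,
                       "Ref2Fast": ref2fast, "Fb2Fast": fb2fast})
    return cycles


# Three compared edges at an 80 MHz reference (12.5 ns period), T = 1.5 ns (assumed).
cycles = pfd_response([0.0, 12.5, 25.0], [1.0, 12.5, 23.0], lock_window_ns=1.5)
for cycle in cycles:
    print(cycle)
```

In this example the first cycle produces a 1 ns "UP" pulse that stays below the threshold, while the third produces a 2 ns "DN" pulse that latches "Fb2Fast".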

The CP is based on a differential architecture with a complementary dummy branch. While the main branch is controlled by the "UP" and "DN" signals coming from the PFD, the complementary branch is controlled by their complements, $\overline{\text{UP}}$ and $\overline{\text{DN}}$. The current sources therefore provide a constant current without being switched on and off. Minimum-sized transistors are used as switching components, which minimizes the charge injected into the loop filter when a current path is broken. The gain of the PFD is given as

$$K_{\rm PFD} = \frac{I_{\rm CP}}{2\pi} = 1.59~\mu\mathrm{A/rad} \tag{1}$$

where  $I_{CP}$  (= 10 µA) is the charge pump current.

The output current from the CP charges up a capacitor in the LF (indicated as  $C_{\text{pole}}$  in Fig. 3) and generates the control voltage ( $V_{\text{ctrl}}$ ). High-frequency noise components included in the CP output are filtered by a low-pass characteristic of the LF. Metal-insulator-metal (MIM) structures are used for capacitors. The LF transfer function is written as

$$F(s) = \frac{\left(R_{\text{notch}} + \frac{1}{sC_{\text{notch}}}\right) \cdot \frac{1}{sC_{\text{pole}}}}{R_{\text{notch}} + \frac{1}{sC_{\text{notch}}} + \frac{1}{sC_{\text{pole}}}}. \tag{2}$$



Fig. 3. Block diagram of the phase-locked loop.



Fig. 4. Schematic of the voltage-controlled oscillator (VCO) and its inverter stage.



Fig. 5. 800 MHz output clock vs. V<sub>ctrl</sub> with different corner parameters.

Here we neglected higher-order low-pass filter components ( $R_{ripple}$  and  $C_{ripple}$  in Fig. 3).
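As a worked step, Eq. (2) can be rearranged by multiplying numerator and denominator by $sC_{\text{notch}}$ to expose one zero and one non-zero pole. Identifying these with $\omega_z$ and $\omega_p$ of Table 1 is an assumption based on the usual form of such a filter, not a statement from the text:

$$F(s) = \frac{1 + sR_{\text{notch}}C_{\text{notch}}}{s\left(C_{\text{notch}} + C_{\text{pole}}\right)\left(1 + sR_{\text{notch}}\frac{C_{\text{notch}}C_{\text{pole}}}{C_{\text{notch}} + C_{\text{pole}}}\right)}$$

so that

$$\omega_z = \frac{1}{R_{\text{notch}}C_{\text{notch}}}, \qquad \omega_p = \frac{C_{\text{notch}} + C_{\text{pole}}}{R_{\text{notch}}C_{\text{notch}}C_{\text{pole}}}.$$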

The VCO consists of three inverters connected as a ring oscillator. We implemented each inverter with differential pairs loaded with PMOS active loads and cross-coupled stages for rail-to-rail switching [10] (Fig. 4). The  $V_{\rm ctrl}$  from the LF controls the tail currents and the PMOS loads. The gain of the VCO was set to  $K_{\rm VCO} = 22\,800\,{\rm Mrad/V}$  for nominal process parameters, taking parasitic capacitances into account (see Fig. 5).

The FD consists of four custom-made dividers with two toggle flip-flops. The nominal VCO output frequency of  $f_{VCO} = 1.6$  GHz is consecutively divided down to 800 MHz, 320 MHz, and finally 80 MHz, corresponding to a total frequency division factor of $N=20$. The open-loop transfer function of the overall circuit is given as

$$A(s) = K_{\rm PFD} \cdot F(s) \cdot \frac{K_{\rm VCO}}{s} \cdot \frac{1}{N}. \tag{3}$$

The detailed design parameters referring to the transfer function are summarized in Table 1. Here  $\omega_c$  indicates the cross-over frequency,  $\omega_z$  the zero frequency,  $\omega_p$  the pole frequency, and  $\eta$  the damping factor.
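The loop parameters in Table 1 can be cross-checked numerically from the component values. The sketch below uses standard textbook approximations for a second-order charge-pump PLL, which are assumptions rather than formulas given in the paper: $\omega_z = 1/(R_{\text{notch}}C_{\text{notch}})$, $\omega_p = 1/(R_{\text{notch}}C_{\text{series}})$ with $C_{\text{series}}$ the series combination of $C_{\text{notch}}$ and $C_{\text{pole}}$, $\omega_c \approx K_{\rm PFD}K_{\rm VCO}R_{\text{notch}}/N$, and $\eta \approx (R_{\text{notch}}/2)\sqrt{K_{\rm PFD}K_{\rm VCO}C_{\text{notch}}/N}$. It also assumes that $K_{\rm VCO}$ is expressed in rad/s per volt.

```python
import math

# Component and gain values as quoted in the text and Table 1.
I_CP = 10e-6          # charge pump current [A]
K_VCO = 22_800e6      # VCO gain, read as [rad/s/V] (assumption about the unit)
N = 20                # total feedback division factor
R_NOTCH = 23.4e3      # [ohm]
C_NOTCH = 3.86e-12    # [F]
C_POLE = 198e-15      # [F]

k_pfd = I_CP / (2 * math.pi)                        # Eq. (1), [A/rad]

# Zero and pole of the loop filter, ripple filter neglected as in the text.
omega_z = 1.0 / (R_NOTCH * C_NOTCH)
c_series = C_NOTCH * C_POLE / (C_NOTCH + C_POLE)
omega_p = 1.0 / (R_NOTCH * c_series)

# Textbook approximations for cross-over frequency and damping factor.
omega_c = k_pfd * K_VCO * R_NOTCH / N
eta = 0.5 * R_NOTCH * math.sqrt(k_pfd * K_VCO * C_NOTCH / N)

print(f"K_PFD   = {k_pfd * 1e6:.2f} uA/rad")
print(f"omega_z = {omega_z / 1e6:.1f} Mrad/s")
print(f"omega_p = {omega_p / 1e6:.1f} Mrad/s")
print(f"omega_c = {omega_c / 1e6:.1f} Mrad/s")
print(f"eta     = {eta:.2f}")
```

With these assumptions the results agree with the tabulated values to within about 1% when the tabulated frequencies are read as angular frequencies; that reading is itself an assumption, since the table labels them in MHz.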

#### 2.3. CML driver with programmable pre-emphasis

The CML driver is based on a differential architecture as shown in Fig. 2. Its main components are two pull-up resistors, two NMOS transistors acting as switches, and a current source. The NMOS transistors steer the current into either side of the differential pair according to the differential input. A pre-driver circuit generates the gate voltages with precise control of the switching phases, in order to keep the tail current source in saturation at all times and to minimize the common-mode output signal. The layout was carefully designed for impedance matching by using poly-silicon resistors and dummy structures.

An additional differential pair was implemented for signal pre-emphasis. Pre-emphasis is accomplished by adding a delayed and inverted version of the signal to the output. The pre-emphasis circuit boosts the high-frequency components and thereby compensates for high-frequency losses during data transmission. The amplitude of the pre-emphasis is adjusted by the tail bias current source, while the pulse width can be adjusted up to 600 ps in four fixed delay steps.
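The principle of adding a delayed and inverted copy of the signal can be illustrated with a discrete-time, two-tap model, $y[n] = x[n] - \alpha\,x[n-d]$. This is an idealization, not the circuit itself: the chip works in continuous time with an analog delay, and the weight $\alpha = 0.25$, the one-symbol delay, and the function name are all illustrative assumptions.

```python
def pre_emphasize(symbols, alpha, delay=1):
    """Two-tap pre-emphasis: y[n] = x[n] - alpha * x[n - delay].

    symbols : sequence of +1 / -1 line symbols
    alpha   : weight of the delayed, inverted copy (illustrative value)
    delay   : delay in symbols; before the first symbol the line is assumed
              to have been idle at the level of symbols[0]
    """
    shaped = []
    for n, x in enumerate(symbols):
        delayed = symbols[n - delay] if n >= delay else symbols[0]
        shaped.append(x - alpha * delayed)
    return shaped


bits = [1, 1, 1, -1, -1, 1]
shaped = pre_emphasize(bits, alpha=0.25)
print(shaped)
```

In this model the first symbol after each transition has a magnitude of $1 + \alpha$, while repeated symbols settle to $1 - \alpha$, so the edges, which carry the high-frequency content, are boosted relative to the steady-state level.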

Table 1

Summary of PLL design parameters.

| Parameter           | Value    |
|---------------------|----------|
| $C_{\text{pole}}$   | 198 fF   |
| $R_{\text{notch}}$  | 23.4 kΩ  |
| $C_{\text{notch}}$  | 3.86 pF  |
| $R_{\text{ripple}}$ | 53.5 kΩ  |
| $C_{\text{ripple}}$ | 30.6 fF  |
| $\omega_c$          | 42.5 MHz |
| $\omega_z$          | 11.1 MHz |
| $\omega_p$          | 228 MHz  |
| $\eta$              | 0.98     |



Fig. 6. Transient response of  $V_{ctrl}$  with different corner parameters.



Fig. 7. Data stream of the 1.6 Gbit/s LFSR pseudo-random signal and the 800 MHz clock with the CML pre-emphasis off.



Fig. 8. Waveforms of the 800 MHz pre-emphasis output with different delay settings (Left) and amplitude settings (Right).



Fig. 9. Eye diagram and jitter performance of the 1.6 Gbit/s LFSR output with the CML pre-emphasis on.

#### 3. Simulation results

Fig. 5 shows the simulation result of the 800 MHz PLL clock output vs. V<sub>ctrl</sub> of the VCO. Since parasitic components easily affect the oscillation frequency, we optimized the VCO layout taking the parasitics into account. As shown in the figure, the VCO can be tuned over a wide frequency range, and the target frequency of  $f_{\rm VCO} = 1.6 \text{ GHz}$  is reached under  $3\sigma$  process variations without additional external tuning. Fig. 6 shows the transient response of  $V_{ctrl}$  with different corner parameters. The simulation is based on a parasitic extraction of the overall PLL circuit. In all corners,  $V_{ctrl}$  settles to within 2% of its final value in  $t_{\text{settle}} \sim 750 \text{ ns}$. The dominant phase noise contribution is the flicker noise from the bias current sources of the VCO. Although the phase noise of a ring oscillator is higher than that of a conventional LC-tank oscillator, a ring oscillator offers a larger frequency tuning range and a smaller layout area, which suit our application. The PLL circuit occupies an area of  $140 \,\mu\text{m} \times 55 \,\mu\text{m}$. The power consumption of the PLL is 1.25 mW at a supply voltage of 1.2 V.

#### 4. Measurement results

We connected the outputs of the transmitter to a flex cable and long TWP cables to imitate the actual PXD readout. Fig. 7 shows the data stream of the 1.6 Gbit/s LFSR output connected through a 38 cm flex cable and 10 m TWP cables, while the 800 MHz output was connected to an oscilloscope with short coax cables. The measurement was performed with the CML pre-emphasis off. As shown in the figure, the eye diagram of the 1.6 Gbit/s data stream is almost closed. We then measured the 800 MHz output with pre-emphasis on. Fig. 8 shows the waveforms with different pre-emphasis settings. Fig. 9 (upper-left panel) shows the eye diagram of the 1.6 Gbit/s LFSR output with the CML pre-emphasis on, in the same cable configuration as in Fig. 7. The eye opening is 400 ps on the time axis and 200 mV on the voltage axis, which conforms to the requirements of the FPGA receivers. The jitter was measured as 25 ps  $(1\sigma)$ . We also performed the jitter and amplitude measurements with different cable setups. The results are summarized in Table 2.

Table 2

Summary of the measurements.

| Measurement setup     | $1\sigma$ jitter (ps) | pk-pk amplitude (mV) |
|-----------------------|-----------------------|----------------------|
| 10 m TWP              | 25                    | 400                  |
| 38 cm flex + 10 m TWP | 25                    | 200                  |
| 38 cm flex + 20 m TWP | 42                    | 100                  |
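To put the measured numbers in proportion, they can be expressed as fractions of the unit interval (UI), the duration of one bit at 1.6 Gbit/s. The sketch below uses only the values quoted in the text and Table 2; the variable and function names are illustrative.

```python
BIT_RATE_BPS = 1.6e9
UI_PS = 1e12 / BIT_RATE_BPS          # one bit period in picoseconds

# (setup, 1-sigma jitter [ps], pk-pk amplitude [mV]) as listed in Table 2.
measurements = [
    ("10 m TWP", 25, 400),
    ("38 cm flex + 10 m TWP", 25, 200),
    ("38 cm flex + 20 m TWP", 42, 100),
]

EYE_OPENING_PS = 400                 # horizontal eye opening quoted for Fig. 9


def fraction_of_ui(duration_ps):
    """Express a time interval as a fraction of the unit interval."""
    return duration_ps / UI_PS


print(f"unit interval: {UI_PS:.0f} ps")
print(f"eye opening  : {fraction_of_ui(EYE_OPENING_PS):.0%} of UI")
for setup, jitter_ps, amplitude_mv in measurements:
    print(f"{setup:<24} jitter = {fraction_of_ui(jitter_ps):.1%} of UI, "
          f"amplitude = {amplitude_mv} mV")
```

Seen this way, the 25 ps jitter is a small fraction of the 625 ps bit period, and the 400 ps eye opening leaves roughly two thirds of the bit period for the receiver to sample.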

#### 5. Conclusion

A prototype Gbit data transmitter has been developed in TSMC 65-nm standard CMOS technology for the Belle-II pixel detector readout. To cope with the data rate from the DEPFET sensor in the rolling shutter readout mode, DHP chips are used to reduce the data and to send them off the sensor module to the back-end board via the high-speed data transmitter. We adopted a type-II PLL as the clock generator, which generates a 1.6 GHz clock signal from an 80 MHz reference clock. To compensate for signal losses in the long cable configuration, a programmable pre-emphasis circuit has been implemented in the output CML drivers. The jitter of the 1.6 Gbit/s LFSR output was measured as 25 ps ($1\sigma$) with 38 cm flex and 10 m TWP cables. In addition to these promising results, the inherent radiation hardness of the transistors, owing to their thin gate oxide, makes this process attractive for the front-end electronics of future high energy physics experiments.

#### References

- [1] G. Lutz, J. Kemmer, Nuclear Instruments and Methods in Physics Research Section A 273 (1988) 588.
- [2] H. Krüger, Nuclear Instruments and Methods in Physics Research Section A 617 (2010) 337.
- [3] I. Perić, IEEE Transactions on Nuclear Science NS-57 (2010) 743.
- [4] C. Sandow, et al., Nuclear Instruments and Methods in Physics Research Section A 568 (2005) 176.
- [5] P. Fischer, C. Kreidl, I. Perić, Steering and readout chips for DEPFET sensor matrices, in: Proceedings of TWEPP'07, 2007.
- [6] M. Lemarenko, et al., The data handling processor for the Belle II vertex detector: efficiency optimization, in: Proceedings of TWEPP'11, 2011.
- [7] B. Razavi, Design of Analog CMOS Integrated Circuits, McGraw-Hill Companies, 2003.
- [8] R.J. Baker, CMOS Circuit Design, Layout, and Simulation, IEEE Series on Microelectronic Systems, 2010.
- [9] A. Kruth, et al., Charge pump clock generation PLL for the data output block of the upgraded ATLAS pixel front-end in 130 nm CMOS, in: Proceedings of TWEPP'09, 2009.
- [10] T.V. Cao, et al., Low phase-noise and wide tuning-range CMOS differential VCO for frequency  $\Delta\Sigma$  modulator, in: IEEE Computer Society Annual Symposium on VLSI, 2009.