# The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

N. V. Biesuz, S. Citraro, P. Luciano, D. Magalotti, E. Rossi

Abstract – The Associative Memory (AM) system of the Fast TracKer (FTK) processor has been designed to perform pattern matching using the hit information of the ATLAS experiment silicon tracker. The AM is the heart of FTK and is mainly based on ASICs (AM chips) designed to execute pattern matching with a high degree of parallelism. The AM system finds track candidates at low resolution that serve as seeds for full resolution track fitting. To solve the very challenging data traffic problems inside FTK, multiple board and chip designs have been developed. The currently proposed solution is named the "Serial Link Processor" and is based on an extremely powerful network of 828 2 Gb/s serial links for a total in/out bandwidth of 56 Gb/s.

This paper reports on the design of the Serial Link Processor consisting of two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs.

We report on the performance of the hardware and firmware intermediate prototypes produced for the global FTK integration, an important milestone to be reached before the FTK production.

## I. INTRODUCTION

The Fast TracKer (FTK) processor [1], organized in a 2-level pipelined architecture, executes a very fast track reconstruction algorithm based on the use of a large bank of pre-stored patterns of trajectory points, the pattern bank [3]. The Associative Memory (AM) system implements the first stage by recognizing track candidates at low resolution to match the demanding task of tracking at the detector readout rate. The second stage receives track candidates and high resolution hits to perform full resolution track fitting at the AM output rate.

The AM system consists of AM chips [2], ASICs designed and optimized for this particular application, and two boards: the Local Associative Memory Board (LAMB), a mezzanine on which the AM chips are mounted, and a 9U VME board (AMBoard) that hosts the LAMBs. Both the AM chip and the boards have a long development history. We report on the latest version, built for the final AM chip, which has serialized input/output. The final system (Serial Link Processor, SLP) requires the development of:

- a new motherboard, the AMBSLP;
- a LAMBSLP mezzanine, named miniLamb, that accommodates the mini@sic [4], the first AM chip prototype with serial I/O;
- a LAMBSLP mezzanine prototype for the HS BGA 529 package, which is common to both AMchip05, the latest AM chip prototype, and AMchip06, the final version of the AM chip.

The mini@sic has been produced to test the new serialized I/O of the AM chip, while the AMchip05 is a low cost intermediate step that allows tests of the whole system (final boards and final AM chip architecture), although its number of patterns is much smaller than that of the final AMchip06.

We also report on the CERN integration tests, the power consumption measurements and the cooling tests, both executed and planned.

## II. THE AMBSLP AND LAMBSLP



Figure 1: AMBSLP and AUX card

The AMBSLP is a 9U VME board on which 4 LAMBSLPs are mounted. Figure 1 shows the AMBSLP, highlighting the LAMBSLP mezzanine positions (in white solid lines). A network of high speed serial links characterizes the bus distribution on the AMBSLP: 12 input serial links (black solid arrows) that carry the silicon hits from the P3 connector to the LAMBs, and 16 output serial links (each red dashed arrow represents 4 links) that carry the identification numbers of matched patterns (named roads) from the LAMBs to P3. These buses are connected to an auxiliary card (AUX) [4] that sits on the back of the crate in the same slot, through a high frequency ERNI P3 connector (green dashed square). The board on the back performs the full resolution track fitting, refining the AMBSLP work.

The data rate is up to 2 Gb/s on each serial link. With 12 input and 16 output links, the AMBSLP therefore has to handle a challenging total I/O traffic of 56 Gb/s. A huge number of silicon hits must be distributed at high rate with very large fan-out to all patterns (more than 8 million patterns will be located on 64 AM chips on a single AMBSLP), and a similarly large number of roads must be collected and sent back to the FTK full resolution track fitting function.

N. V. Biesuz, S. Citraro, P. Luciano, and E. Rossi are with Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Pisa and University of Pisa, Italy.

D. Magalotti is with Modena University & INFN, Sezione di Perugia, Italy.

The motherboard has flexible control logic placed inside a group of FPGA chips visible in the figure: 2 large Xilinx Artix-7 FPGAs [6] for data transfer and Xilinx Spartan-6 FPGAs [6] for control and VME logic. Each Artix-7 has 16 Low-Power Gigabit Transceivers (GTPs). Each GTP is a combined transmitter and receiver capable of operating at data rates up to 7 Gb/s. Such fast data transmission requires specialized, dedicated on-chip circuitry and differential I/O capable of coping with signal integrity issues.

The incoming hits are received by the GTPs in the input FPGA (blue box) and saved in large de-randomizing FIFOs. The Control FPGA handles the event processing and error conditions. When the Control chip starts to process an event, hits are popped in parallel from all the hit input FIFOs and are simultaneously sent to the four LAMBs. An End Event word, which includes the event tag, separates hits belonging to different events. If occasionally a FIFO becomes Almost Full, a HOLD signal is sent to the upstream board, which suspends data flow until more FIFO locations become available. The threshold at which the HOLD signal is raised and the size of the FIFOs are set to give the upstream board enough time to react and to avoid frequent backpressure.
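The HOLD mechanism described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the board firmware: the class and function names, the FIFO depth, the Almost Full threshold, the upstream reaction latency and the drain rate are all hypothetical values chosen for illustration. It shows why the threshold must leave enough free locations to absorb the words that are still in flight while the upstream board reacts to HOLD.

```python
from collections import deque


class HitFifo:
    """Behavioural model of one de-randomizing hit FIFO (illustrative only)."""

    def __init__(self, depth, almost_full):
        if not 0 < almost_full <= depth:
            raise ValueError("almost_full must be in 1..depth")
        self.depth = depth
        self.almost_full = almost_full
        self.words = deque()

    @property
    def hold(self):
        # HOLD is asserted while the occupancy is at or above the threshold.
        return len(self.words) >= self.almost_full

    def push(self, word):
        if len(self.words) >= self.depth:
            raise OverflowError("FIFO overflow: HOLD margin too small")
        self.words.append(word)

    def pop(self):
        return self.words.popleft()


def simulate(depth, almost_full, reaction_latency, n_words, drain_every=2):
    """Return the maximum occupancy reached while transferring n_words.

    The upstream sender (hypothetical) offers one word per cycle and sees
    the HOLD signal `reaction_latency` cycles late. The FIFO is drained
    one word every `drain_every` cycles.
    """
    fifo = HitFifo(depth, almost_full)
    in_flight = deque([False] * reaction_latency)  # HOLD values on their way upstream
    sent = received = cycle = max_occupancy = 0
    while received < n_words:
        in_flight.append(fifo.hold)
        hold_seen = in_flight.popleft()
        if sent < n_words and not hold_seen:
            fifo.push(sent)
            sent += 1
        max_occupancy = max(max_occupancy, len(fifo.words))
        if cycle % drain_every == 0 and fifo.words:
            assert fifo.pop() == received  # words leave in arrival order
            received += 1
        cycle += 1
    return max_occupancy


if __name__ == "__main__":
    # Threshold 12 of 16 leaves 4 free locations for a 3-cycle reaction time.
    print(simulate(depth=16, almost_full=12, reaction_latency=3, n_words=100))
```

In this model, raising the threshold to the full depth (for example `almost_full=16` with the same latency) makes `push` raise `OverflowError`, which is the situation the threshold and FIFO size are chosen to avoid.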

The motherboard also feeds all daughter boards with power generated by 3 types of DC-DC converters.

## III. THE LAMBSLP

The LAMBSLP is a daughter board that hosts the AM chips. It communicates with the AMBSLP through a high pin count, high performance connector placed in the center of the LAMBSLP (the connector is inside the 4 yellow boxes in Figure 1). Each LAMBSLP will contain 16 AM chips.

For each LAMBSLP a large current (up to ~32 A at 1 V, ~8 A at 2.5 V and ~1.76 A at 1.2 V) is provided through the high pin count connector, the same connector used for the LVDS signals. The LAMB designed in the past for the AM chips with parallel I/O had a critical and highly complex layout because of the parallel buses. Moreover, it required many FPGAs to handle the input fan-out and the readout logic. To minimize the impact of these weaknesses on the system functionality, a new approach has been developed for the LAMBSLP. This new mezzanine is designed to match the new, fully serialized AM chip I/O, so the FPGAs can be replaced by simple low-cost serial link fan-out chips. Final prototypes of this new mezzanine have been designed and tested as standalone boards to demonstrate full functionality of the network of serial connections and the compatibility between the FPGAs, the new fan-out chips and the new AM chip I/O. Complete integration inside the FTK system will take place during summer.
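As a rough cross-check of the currents quoted above, the power on each rail is simply voltage times current. The sketch below assumes that all three rails draw their quoted maximum at the same time and ignores DC-DC conversion losses; both are simplifying assumptions made for illustration, not measured values, and the names used are hypothetical.

```python
# Maximum values per LAMBSLP quoted in the text: rail -> (volts, amperes).
RAILS = {
    "1.0 V": (1.0, 32.0),
    "2.5 V": (2.5, 8.0),
    "1.2 V": (1.2, 1.76),
}
LAMBS_PER_AMBSLP = 4  # four LAMBSLPs are mounted on one AMBSLP


def rail_power_w(rails):
    """Power per rail in watts, P = V * I."""
    return {name: volts * amps for name, (volts, amps) in rails.items()}


def lamb_power_w(rails):
    """Upper bound on the power of one LAMBSLP, assuming all rails at maximum."""
    return sum(rail_power_w(rails).values())


if __name__ == "__main__":
    for name, watts in rail_power_w(RAILS).items():
        print(f"{name}: {watts:.2f} W")
    per_lamb = lamb_power_w(RAILS)
    print(f"per LAMBSLP: {per_lamb:.1f} W")
    print(f"per AMBSLP (mezzanines only): {LAMBS_PER_AMBSLP * per_lamb:.1f} W")
```

Under these assumptions the upper bound is about 54 W per LAMBSLP, or roughly 216 W for the four mezzanines of one AMBSLP, which illustrates why power distribution and cooling are a concern for this design.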

## IV. THE AMBSLP AND LAMBSLP TESTS

Figure 2 shows the miniLamb mezzanine mounted on the motherboard. The 4 small mini@sics are visible on the miniLamb; they are housed in QFN64 packages, since they are too small to use the final BGA package. At the center of this group are the low jitter oscillator and the fan-out chip that distributes the 100 MHz clock to the mini@sics. Above the group are the fan-out chips that distribute the serialized hit buses. Two FPGAs are on the board, one to program the AM chips and the other for test purposes.



Figure 2: The miniLamb with 4 mini@sics and 2 FPGAs placed on the AMBSLP

All serial links were successfully tested using the PRBS (Pseudo-Random Binary Sequence) test. The test patterns are transmitted correctly to the miniLamb, correctly received by the FPGA on the board and correctly transmitted to the mini@sics through the fan-out. The outputs of the mini@sics are correctly received by the AMBSLP Artix. Figure 3 shows, as an example, the quality of the transmission line at the input of the mini@sic. As shown, there is a signal reflection on this link, which manifests itself as a doubling of the "0" and "1" levels. This is due to an impedance mismatch inside the AM chip package that has been solved for the final version of the AM chip. Nevertheless, the estimated bit error rate is $7 \times 10^{-14}$, which is good enough to guarantee full functionality of the board.
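The principle of a PRBS link test can be sketched in a few lines. The text does not state which PRBS polynomial was used, so the 7-bit sequence below (period 127) is an assumption chosen for brevity, and all function names are hypothetical. The checker predicts each received bit from the bits that precede it, so it needs no synchronization with the transmitter. The last helper converts a bit error rate into a mean time between errors at a given line rate.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS-7 sequence with a 7-bit LFSR (period 127)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the LFSR locks up")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of the two oldest bits
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def count_mismatches(received):
    """Self-synchronizing check: predict each bit from the 7 bits before it.

    A single flipped bit is flagged up to three times: once as the
    predicted bit and twice as an input to later predictions.
    """
    return sum(
        (received[n - 7] ^ received[n - 6]) != received[n]
        for n in range(7, len(received))
    )


def mean_seconds_between_errors(ber, bit_rate):
    """Average time between bit errors for a given BER and line rate in bit/s."""
    return 1.0 / (ber * bit_rate)


if __name__ == "__main__":
    sent = prbs7(1000)
    print(count_mismatches(sent))        # clean link: no mismatches
    corrupted = list(sent)
    corrupted[100] ^= 1                  # inject one bit error
    print(count_mismatches(corrupted))
    print(mean_seconds_between_errors(7e-14, 2e9) / 3600.0, "hours")
```

With the figures quoted above, a bit error rate of $7 \times 10^{-14}$ at 2 Gb/s corresponds to about one error every two hours on a single link.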



Figure 3: Eye diagram for signals sampled at the mini@sic input

Figure 4 shows the same measurement performed on a link between two fan-out stages of the AMBoard. Here the achieved bit error rate is $8 \times 10^{-64}$, which is the worst case, since we achieved bit error rates down to $10^{-100}$ on the inputs of the LAMBSLP first fan-out stage. Similar measurements were performed with the LAMBSLP prototype, with analogous results. These results are achieved by careful control of the line impedance and the introduction of suitable guard lines that shield and separate the differential pairs.



Figure 4: Eye diagram for signals sampled on a serial link after the last fan-out stage present on the AMBoard

## V. CONCLUSIONS

We have reported on tests of the new Associative Memory system, which integrates the new motherboard AMBSLP and the daughter board LAMBSLP. These tests were made possible by the manufacturing of two AM chip prototypes, mini@sic and AMchip05. The AM chips and boards have represented a significant technological challenge owing to the high memory density and low power consumption required of the AM chip logic. Indeed, the density of chips on the LAMB mezzanine limits both the cooling power of the system and the maximum power available. The use of advanced packages and many high frequency serial links makes this application even more challenging and requires careful routing at board level.

## REFERENCES

- [1] A. Andreani et al., The FastTracker Real Time Processor and Its Impact on Muon Isolation, Tau and b-Jet Online Selections at ATLAS, *IEEE Trans. Nucl. Sci.*, vol. 59, no. 2, pp. 348–357, 2012
- [2] A. Annovi et al., A VLSI Processor for Fast Track Finding Based on Content Addressable Memories, *IEEE Trans. Nucl. Sci.*, vol. 53, p. 2428, 2006
- [3] L. Ancu et al., Associative Memory computing power and its simulation, 19th Real Time Conference, 26-30 May 2014, Nara, Japan
- [4] A. Andreani et al., Characterization of an Associative Memory chip for high-energy physics experiments, Instrumentation and Measurement Technology Conference (I2MTC) Proceedings, 2014 IEEE International, 12-15 May 2014, Montevideo, Uruguay, pp. 1478-1491
- [5] FTK group, Fast TracKer (FTK) updates to TDR for IBL, 10 Apr 2014, CERN, Geneva, ATL-COM-DAQ-2014-014
- [6] http://www.xilinx.com/