# The Selection Crate for the L0 Calorimeter Trigger

G. Balbi<sup>a</sup>, M. Bargiotti<sup>b</sup>, A. Bertin<sup>b</sup>, M. Bruschi<sup>a</sup>, A. Carbone<sup>b</sup>, I. D'Antone<sup>a</sup>, L. Fabbri<sup>b</sup>, P. Faccioli<sup>b</sup>, D. Galli<sup>b</sup>, B. Giacobbe<sup>a</sup>, F. Grimaldi<sup>b</sup>, I. Lax<sup>a</sup>, U. Marconi<sup>a</sup>, I. Massa<sup>b</sup>, M. Piccinini<sup>a</sup>, N. Semprini-Cesari<sup>b</sup>, R. Spighi<sup>a</sup>, V. Vagnoni<sup>b</sup>, S. Vecchi<sup>a</sup>, M. Villa<sup>a</sup>, A. Vitale<sup>b</sup> and A. Zoccoli<sup>b</sup>

<sup>a</sup> INFN Sezione di Bologna

<sup>b</sup> Dip. di Fisica, Università di Bologna and INFN

## 1 System overview

The purpose of the Selection Crate is to select the most energetic clusters among those validated by the Validation Cards. Clusters are validated as:

- electromagnetic clusters (gamma, electron or neutral pion);
- hadron clusters.

Event by event, the Selection Crate has to select the highest transverse energy cluster of each type among the following candidates:

- 28 candidates for electrons;
- 28 candidates for gammas;
- 28 candidates for "local" $\pi^0$;
- 28 candidates for "global" $\pi^0$;
- 80 candidates for hadrons.

It is also required to sum up the hits recorded by the SPD, since the overall hit multiplicity is planned to be used by the L0DU as a veto signal for the level-0 trigger [1].

The Selection Crate is a modular system consisting of 8 electronics boards, called Selection Boards. A single model of Selection Board, equipped with 28 optical input channels, will be implemented; the board is designed to be adaptable to both the electromagnetic and the hadron cluster selection. The electromagnetic cluster selection uses one board per cluster type (four boards in total). The hadron selection requires three boards: the selection results of two boards have to be transmitted to the third one (the hadron master board) for the final selection.
One additional board is used to sum up the SPD hit multiplicity (16 partial hit sums being given as input).

Clusters from the Validation Cards are transmitted to the Selection Crate packed in 32-bit data words. The cluster information is coded as follows:

- 8 bits represent the cluster transverse energy;
- 8 bits hold the relative cluster address;
- 8 bits code the least significant bits of the BCID;
- 8 bits label the physical connection.

Every Selection Board produces two 32-bit output messages for the level-0 trigger. The first message consists of:

- 8 bits coding the least significant bits of the BCID;
- 8 bits for the transverse energy of the highest-energy cluster;
- 14 bits for the complete cluster address.

The second output message contains:

- 8 bits coding the least significant bits of the BCID;
- 14 bits for the total transverse energy or, alternatively, the SPD hit multiplicity.

Figure 1: The Selection Board main functional blocks. The input stage hosts 28 optical receivers. Data entering the board are de-serialized and synchronized to the external reference TTC clock. They are then processed by the Processing Unit. The data output to the L0DU and to the TELL1 Board interface goes through optical links.

The total transverse energy is calculated only in the hadron case. It is set to zero for the electromagnetic cluster selections. In the case of the SPD, the hit multiplicity replaces the total transverse energy.

There will therefore be seven useful output messages going from the Selection Crate to the L0DU at 40 MHz. This is because every board sends one 32-bit word to the L0DU, except for the hadron selection, for which the hadron master board sends two 32-bit words.

Clusters belonging to an event that passes the level-0 trigger selection criteria are transmitted to the TELL1 Board, for the needs of the level-1 trigger, at a maximum frequency of 1.1 MHz.
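The word layouts listed in this section can be sketched in software as a few shift-and-mask helpers. This is only an illustration: the text gives the field widths but not the order of the fields inside a word, so the bit positions chosen here, as well as the names `InputCluster`, `pack_input`, `unpack_input` and `pack_first_output`, are assumptions.

```python
from typing import NamedTuple


class InputCluster(NamedTuple):
    """One 32-bit word as sent by a Validation Card (field order is assumed)."""
    et: int       # cluster transverse energy, 8 bits
    address: int  # relative cluster address, 8 bits
    bcid: int     # 8 least significant bits of the BCID
    link: int     # label of the physical connection, 8 bits


def pack_input(cluster: InputCluster) -> int:
    """Pack the four 8-bit fields into one 32-bit word (assumed layout)."""
    for name, value in cluster._asdict().items():
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name}={value} does not fit in 8 bits")
    return ((cluster.link << 24) | (cluster.bcid << 16)
            | (cluster.address << 8) | cluster.et)


def unpack_input(word: int) -> InputCluster:
    """Inverse of pack_input."""
    return InputCluster(
        et=word & 0xFF,
        address=(word >> 8) & 0xFF,
        bcid=(word >> 16) & 0xFF,
        link=(word >> 24) & 0xFF,
    )


def pack_first_output(bcid: int, et: int, address14: int) -> int:
    """First L0DU message: 8-bit BCID, 8-bit ET, 14-bit complete address.

    The three fields use 30 of the 32 bits; the two spare bits are left
    at zero in this sketch.
    """
    return (((bcid & 0xFF) << 22) | ((et & 0xFF) << 14)
            | (address14 & 0x3FFF))
```

For example, `unpack_input(pack_input(c))` returns `c` unchanged for any cluster whose fields fit in 8 bits.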
In case of a level-0 trigger accept, each Selection Board will transmit 32 data words of 32 bits each, corresponding to the 28 candidate clusters plus 4 control words. Messages to the L0DU and to the TELL1 Board are transmitted through optical links. Parallel LVDS signals over flat cables are planned for the board-to-board communication between the 3 boards processing the hadron clusters.

Each Selection Board will process the TTC fast control signals with an embedded TTCrx receiver chip [2]. The interface to the ECS slow control system will be provided by the Credit Card PC mezzanine board [3] and the Glue Card board [4].

The main building blocks of the Selection Board, represented in Figure 1, are:

- the 28-channel optical input interface;
- the Processing Unit;
- the output interface;
- the interface to the fast and to the slow control systems.

The following sections describe their main functionalities.

### 1.1 The Data Input Interface

A Selection Board is equipped with 28 independent input channels. Each input channel consists of the following fundamental elements:

1. an optical-to-voltage transducer;
2. a de-serializer;
3. a de-multiplexer;
4. a synchronization FIFO.

A scheme of an input channel is shown in Figure 2. The 32-bit data words enter the Selection Board as optical data streams through the 28 optical links, at a rate of about 1.6 Gbit/s per channel (32 bits at 40 MHz being the effective data rate). Each optical bit stream is converted to voltage pulses and de-serialized to 16-bit data words at a frequency of about 80 MHz. These are then converted to 32-bit data words at 40 MHz by a 2:1 de-multiplexer, which restores the original data format (as transmitted by the Validation Cards).

Figure 2: Cluster information enters the board as an optical data stream at 1.6 Gbit/s. After de-serialization and de-multiplexing, the resulting 32-bit data words are time aligned to the reference TTC clock.

Before the data enter the Processing Unit they have to be synchronized to the TTC reference clock to establish a single time domain. It is important to notice that, before synchronization, the 28 cluster data flows are driven by 28 different clocks, namely those reconstructed by the 28 DCR de-serializers. Synchronization to the external TTC clock is performed by the synchronization FIFOs: data writing is driven by the internal DCR clocks, while data reading happens at the TTC frequency.

The 32-bit data words can be propagated to the Processing Unit as smaller 24-bit patterns by dropping the 8 bits that code the cable number, once a set of 28 8-bit comparators has checked that the cabling is the expected one. The 2:1 de-multiplexing, the synchronization to the TTC clock and the cabling consistency test are all implemented in a small FPGA (one FPGA serves two input channels).

As physical support for the optical link we plan to use a ribbon of 12 parallel optical fibers [7]. The optical transducer is a 12-line parallel optical transducer [8]. Feeding one board with the planned 28 inputs therefore requires 3 ribbons of optical fibers and 3 parallel transducers. De-serialization is performed by 28 independent TLK2501 de-serializers [9]. De-multiplexing and time alignment to the TTC clock are implemented on a small FPGA (one XC2S50-7PQ208C Spartan2 FPGA serves two channels).

### 1.2 Data Processing

After the data have been synchronized to the reference TTC clock, the cluster information can be moved to the Processing Unit to be processed. The data processing is performed with a single capable FPGA, see Figure 3.

Figure 3: The I/O signal throughput to and from the Processing Unit. All the relevant information to be processed (data and control signals) is gathered at the Processing Unit. Output messages are sent to the output interfaces, where they are arranged for data transmission.

At the TTC clock frequency the Processing Unit receives the following input messages:

- 28 times 24 bits from the clusters;
- 46 bits from the TTCrx board;
- 7 bits from the ECS busses (4+1 bits for the JTAG, 2 bits for the I2C);
- 2 times 44 bits from the neighboring hadron boards.

It sends out the following output messages:

- a 44-bit message used to take the level-0 trigger decision (directed to the L0DU or, by a switch mechanism, to the hadron master board to complete the hadron selection);
- a sequence of 32 messages consisting of the 32-bit cluster data words to be used by the level-1 trigger (directed to the TELL1 Board).

Data processing is performed in three steps:

1. Clusters are re-phased to the local BCID by adjusting their relative delays, on the basis of a comparison between the cluster BCID and the local BCID.
2. Definition of the output messages. This operation requires setting the complete 14-bit cluster addresses, selecting the highest energy cluster and calculating the total transverse energy.
3. Transmission of the output messages to the board output interfaces.

It is worth mentioning that, in order to calculate the total hadron transverse energy correctly, the electronics first has to get rid of 30 artifacts among the 80 hadron clusters. The hadron front-end boards in fact select only 50 local maxima, but generate 30 copies of them (the artifacts) to allow the validation of a hadron cluster when a twofold ambiguity between overlapping HCAL and ECAL regions cannot be resolved. The hadron validation returns both the original 50 clusters and their 30 copies. Copies have the same addresses as their originals but, in general, different energy values. In each of the 30 cluster pairs, the less energetic cluster is the one that has to be removed.
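The artifact removal that precedes the total transverse energy sum can be sketched as follows. This is a minimal software illustration, not the firmware logic: it assumes the channel indices of each original/copy pair are known from the cabling, and the names `remove_artifacts` and `total_et` as well as the tie-breaking rule (the first member of a pair is kept when both energies are equal) are choices made here.

```python
from typing import List, Sequence, Tuple


def remove_artifacts(et: Sequence[int],
                     pairs: Sequence[Tuple[int, int]]) -> List[int]:
    """Zero the less energetic member of each original/copy pair.

    et    : transverse energies, one entry per input channel
    pairs : (channel_a, channel_b) indices of clusters that share the same
            address (assumed to be known from the cabling)
    """
    kept = list(et)
    for a, b in pairs:
        if kept[a] >= kept[b]:
            kept[b] = 0  # keep a (also on a tie, by assumption)
        else:
            kept[a] = 0
    return kept


def total_et(et: Sequence[int], pairs: Sequence[Tuple[int, int]]) -> int:
    """Total transverse energy after the artifacts have been removed."""
    return sum(remove_artifacts(et, pairs))
```

With energies `[10, 12, 7, 30, 25, 4]` and pairs `[(0, 1), (3, 4)]`, the clusters in channels 0 and 4 are dropped and the total is 53, instead of the 88 obtained by summing every input.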
The optical fibers of a cluster pair (the original and the copy) are planned to be plugged into two adjacent board sockets. In this way the Processing Unit can easily get rid of the artifacts by a pairwise comparison of the energies of adjacent clusters.

Data processing is performed by means of specialized electronics units. For instance, the complete cluster addresses are defined by means of 28 LUTs, whose content has to be set according to the locations of the front-end boards in the calorimeter. The highest transverse energy cluster among the 28 candidates can be determined by layers of 8-bit comparators and multiplexers.

The 44-bit data output to be delivered to the L0DU is arranged in 2 words of 32 bits. The information is planned to be distributed between the two words by putting the highest energy cluster information, i.e. the BCID (8 bits), the transverse energy (8 bits) and the cluster address (14 bits), in the first word, and the rest of the information, i.e. the BCID and the total transverse energy (14 bits) or the SPD hit multiplicity, in the second one.

While waiting for the level-0 accept signal, the clusters are stored inside each board in the L0-Buffer, which is also implemented in the Processing Unit FPGA. The L0-Buffer depth is foreseen to be at most 160 clock cycles; the actual depth is variable and can be set via the ECS control system to fit the requested delay. In case of a level-0 trigger accept, all 28 clusters are moved from the L0-Buffer to the L0-De-randomizer, from where they are transmitted to the TELL1 Board. The L0-De-randomizer has been implemented according to the specification given in [11].

Note that, since there is no need for a synchronous transfer of the Selection Board output messages to the L0DU, they are sent to the L0DU as soon as they become available.

FPGAs capable of implementing the whole Processing Unit functionality in a single component are nowadays available on the market.
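The layers of comparators and multiplexers used to find the highest transverse energy cluster can be pictured as a pairwise reduction. The sketch below is a software analogue under stated assumptions, not the FPGA implementation: each layer compares neighbouring candidates and forwards the larger one, an unpaired candidate passes through unchanged, and ties are resolved in favour of the lower channel index. The function name `select_highest` is hypothetical.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[int, int]  # (transverse energy, input channel)


def select_highest(et_per_channel: Sequence[int]) -> Tuple[Candidate, int]:
    """Return the winning (ET, channel) pair and the number of layers used."""
    if not et_per_channel:
        raise ValueError("at least one candidate is required")
    layer: List[Candidate] = [(et, ch) for ch, et in enumerate(et_per_channel)]
    n_layers = 0
    while len(layer) > 1:
        next_layer: List[Candidate] = []
        # One comparator + multiplexer per pair of neighbouring candidates.
        for i in range(0, len(layer) - 1, 2):
            a, b = layer[i], layer[i + 1]
            next_layer.append(a if a[0] >= b[0] else b)
        # With an odd count the last candidate is forwarded unchanged.
        if len(layer) % 2:
            next_layer.append(layer[-1])
        layer = next_layer
        n_layers += 1
    return layer[0], n_layers
```

With 28 candidates this reduction goes through 28, 14, 7, 4, 2 and 1 entries, i.e. five layers. How layers map onto clock cycles depends on the pipelining chosen in the firmware and is not addressed by this sketch.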
At the moment the Processing Unit has been simulated on a single XILINX XC2VP50-5FF1148C Virtex-II Pro FPGA (812 I/O pins).

### 1.3 Output interfaces

The TTC clock that drives the optical transmitters used to send data to the L0DU and to the TELL1 Board is filtered with the QPLL phase locked loop [5], which reduces the jitter to the tolerable level of about 50 ps peak-to-peak (see section 1.5). The optical transmitters planned for this purpose are of the same type as those designed to connect the Validation Cards to the Selection Crate. They are described in section 1.5.

To provide the Selection Board with the required inter-board communication we plan to use parallel LVDS connections based on twisted flat cables. Alternatively, over short distances, an external bus driving TTL signals can be effectively used to connect the boards.

Two different LVDS-based output interfaces have also been studied. The first solution foresees the use of 4 serial lines through an RJ45 cable. The chosen serializer [6] allows 10 bits to be transmitted at 40 MHz on each line, so that 40 bits can be transmitted at 40 MHz over a single cable. With two such serial connections one can transmit the 44-bit level-0 messages from the Selection Board to the L0DU at the TTC clock frequency.

### 1.4 Simulation prototypes and test results

The Selection Boards are built according to the 9U standard and the entire Selection Crate is hosted in one VME crate. Since there is no need to use the VME bus, the crate is used only to power the electronics with the standard reference bias.

The Selection Board prototype has been built as a modular system consisting of three separate units, to be tested individually and then assembled into the final prototype. The prototype consists of the Processing Unit, the 8-channel optical input interface and a Control Unit. The test board has also been provided with several I/O devices and can be controlled via the VME bus.
Figure 4: The Processing Unit test board. The Processing Unit has been implemented in a single FPGA. The board allows the cluster selection algorithm to be tested on 8 input channels.

Figure 4 shows a photograph of the Processing Unit test board. The prototype of the Processing Unit, implemented in a single FPGA<sup>1</sup>, has been extensively tested with the primary aim of checking the performance of the selection algorithms applied to 8 channels (a limit set by the resources available in the FPGA).

<sup>1</sup> XILINX XCV1000

The setup arranged to test the Processing Unit prototype is shown in Figure 5. Bit patterns of simulated clusters (as if originating from 8 neighboring front-end boards) have been injected into the board at 40 MHz by means of a Pattern Generator. The board outputs have been recorded by a Logic Analyzer and compared to the expected ones. The test results demonstrate the reliability of the FPGA approach up to the LHCb sustained rates.

Figure 5: The setup arranged to test the Processing Unit prototype. The board is fed by a Pattern Generator, which allows 32-bit patterns (pseudo-clusters) to be injected at 40 MHz. Data outputs have been recorded by a Logic Analyzer and then compared to the expected ones.

The latency to have the data ready for the L0DU has been evaluated as 9 clock cycles for the electromagnetic clusters and 14 clock cycles for the hadrons. The sub-processes take in turn: 2 cycles to de-serialize and time align the clusters to the TTC clock; 5 cycles to perform the selection; 2 cycles to transmit the data. The hadron selection requires 3 additional cycles to perform the final selection on the partial results, and 2 more cycles to transmit the data to the L0DU.

Single-line optical channels have been extensively tested (see section 1.5).
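The latency figures quoted in this section can be cross-checked by adding up the individual stages. The short sketch below does only that; the conversion to nanoseconds assumes a clock of exactly 40 MHz (25 ns per cycle), and the dictionary and function names are illustrative.

```python
CLOCK_MHZ = 40.0                 # nominal clock frequency, assumed exact here
CYCLE_NS = 1000.0 / CLOCK_MHZ    # 25 ns per clock cycle

# Stages common to every Selection Board (clock cycles, from the text).
COMMON_STAGES = {
    "deserialize_and_align": 2,
    "selection": 5,
    "transmit": 2,
}

# Additional stages needed only for the hadron selection.
HADRON_EXTRA_STAGES = {
    "final_selection_on_master": 3,
    "transmit_to_l0du": 2,
}


def latency_cycles(hadron: bool = False) -> int:
    """Total latency in clock cycles for the chosen cluster type."""
    cycles = sum(COMMON_STAGES.values())
    if hadron:
        cycles += sum(HADRON_EXTRA_STAGES.values())
    return cycles


def latency_ns(hadron: bool = False) -> float:
    """Same latency expressed in nanoseconds."""
    return latency_cycles(hadron) * CYCLE_NS
```

The sums reproduce the 9 and 14 cycles given in the text, which correspond to 225 ns and 350 ns at 40 MHz.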
The prototype of the 8-channel optical input interface, which is 1/3 of a standard 9U board wide (28 channels have to fit within an entire 9U board), has been commissioned. By testing this prototype we aim to assess possible cross-talk effects that could arise from the high density of the electronic components. It is equipped with one 12-channel parallel optical transducer and 8 TLK2501 de-serializers. The board will be fed by 8 GOL optical transmitters, of the same type as those planned for data transmission by the Validation Cards. These tests also allow the optical transmitter to be characterized on a rather large sample.

We also tested a serial-line LVDS output interface, which turns out, as expected, to work effectively over distances of the order of 10 m.

The Control Unit has still to be designed, but it is planned to rely on the existing implementation of both the Glue Card and the Credit Card PC mezzanine board.

### 1.5 The optical link

We designed and tested two different solutions for the single-line optical links, both capable of transmitting 32-bit patterns at 40 MHz. The first solution is based on the Agilent GLink serializer and GLink de-serializer devices; the second one is based on the combination of the GOL serializer and the TLK2501 de-serializer.

The bit error rate of both the GLink-GLink and the GOL-TLK optical channels has been measured by means of an instrument developed ad hoc by the Bologna INFN electronics group. It is a board equipped with a 32-bit pseudo-random pattern generator, capable of running at 40 MHz, and with two I/O interfaces. The optical channel (both the Tx and the Rx optical cards) is plugged into the board in such a way that the 32-bit data patterns can be sent through the channel at 40 MHz. The data go through the channel and re-enter the board, where they are compared to the original message.
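The principle of the bit error rate measurement can be illustrated in software. The sketch below is not a model of the test board: Python's `random.Random` stands in for the hardware pseudo-random pattern generator (whose type is not specified in the text), the optical channel is represented by an arbitrary function mapping a sent word to a received word, and the names `run_ber_test` and `bits_transferred` are hypothetical.

```python
import random
from typing import Callable, Tuple


def run_ber_test(channel: Callable[[int], int],
                 n_words: int,
                 seed: int = 1) -> Tuple[int, float]:
    """Send n_words pseudo-random 32-bit words and count the flipped bits.

    Returns (number of bit errors, measured bit error rate).
    """
    rng = random.Random(seed)  # stand-in for the hardware pattern generator
    bit_errors = 0
    for _ in range(n_words):
        sent = rng.getrandbits(32)
        received = channel(sent) & 0xFFFFFFFF
        # XOR marks the differing bits; count how many are set.
        bit_errors += bin(sent ^ received).count("1")
    return bit_errors, bit_errors / (32 * n_words)


def bits_transferred(hours: float,
                     word_bits: int = 32,
                     clock_hz: int = 40_000_000) -> int:
    """Number of bits sent through the channel in the given running time."""
    return int(hours * 3600 * clock_hz * word_bits)
```

A run of 50 hours at 40 MHz with 32-bit words corresponds to about $2.3 \times 10^{14}$ transmitted bits, so a single error would correspond to a rate of a few times $10^{-15}$, in line with a sensitivity of order $10^{-14}$.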
The system is very effective, since it allows a sensitivity at the level of $10^{-14}$ to be reached in 50 hours of continuous data taking.

Figure 6: The bit error rate test setup. The BER test board generates pseudo-random 32-bit patterns that are injected at 40 MHz into the optical transmitter. The optical channel outputs are then collected by the board at the optical receiver. The board measures the bit error rate by comparing the injected and the received bit patterns at 40 MHz. A data taking of about 50 hours allows a sensitivity of about $10^{-14}$ to be reached.

Several measurements have shown that both optical channel prototypes present a small error rate, at the level of $10^{-13}$, when they are driven by a clock with a jitter below 50 ps peak-to-peak, as is expected for the TTC clock after filtering with the QPLL. However, the GLink-GLink combination does not fit our needs because of the excessive power consumption of the de-serializer, about 2.5 W each. Moreover, the serializer is not certified as a radiation tolerant device. The GOL-TLK combination instead fits our needs: the TLK de-serializer has a low power consumption, about 300 mW per de-serializer, and is therefore suitable for packing 28 items per board; on the other hand, the GOL serializer, which has to be embedded in the Validation Card, is certified as a radiation tolerant chip.

## References

- [1] Using the SPD multiplicity in the Level-0 trigger, LHCb-2003-022, O. Callot, M. Ferro-Luzzi and P. Perret.
- [2] The TTCrx Reference Manual, A Timing, Trigger and Control System Receiver ASIC for LHC Detectors, CERN, January 2003.
- [3] The ECS Credit Card PC mezzanine board.
- [4] The ECS Glue Card mezzanine board.
- [5] The Quartz Crystal Phase Locked Loop, CERN Microelectronics Group, October 2002.
- [6] DS92LV1021 and DS92LV1210 16-40 MHz 10-Bit Bus LVDS Serializer and De-serializer, National Semiconductor.
- [7] High speed ribbon optical link for the level-0 muon trigger, LHCb-2003-008, E. Aslanides, J.P. Cachemiche, B. Dinkespiler, P.Y. Duval, R. LeGac, O. Leroy, P.L. Liotard, M. Menouni, A. Tsaragorodsev.
- [8] Agilent HFBR-712BP and HFBR-722BP Parallel Fiber Optic Modules, Transmitter and Receiver, 30 Gb/s Aggregate Bandwidth, Agilent Preliminary Data Sheet, January 2002.
- [9] TLK2501 1.6 to 2.5 Gbit/s Transceiver, Texas Instruments Data Sheet, October 2000.
- [10] GOL Reference Manual, Gigabit Optical Link Transmitter Manual, CERN, May 2002.
- [11] Requirements to the L0 front-end electronics, LHCb-2001-014, J. Christiansen.