#### **OPEN ACCESS**

## A potent approach for the development of FPGA based DAQ system for HEP experiments

To cite this article: Shuaib Ahmad Khan et al 2017 JINST 12 T10010





PUBLISHED BY IOP PUBLISHING FOR SISSA MEDIALAB



RECEIVED: August 28, 2017 ACCEPTED: October 12, 2017 PUBLISHED: October 27, 2017

#### TECHNICAL REPORT

# A potent approach for the development of FPGA based DAQ system for HEP experiments

Shuaib Ahmad Khan,<sup>*a*,1</sup> Jubin Mitra,<sup>*a*</sup> Erno David,<sup>*b*</sup> Tivadar Kiss<sup>*b*</sup> and Tapan Kumar Nayak<sup>*a*,*c*</sup>

 <sup>a</sup> Variable Energy Cyclotron Centre, HBNI, 1/AF Bidhannagar, Kolkata — 700064, India
 <sup>b</sup> Wigner Research Institute, KFKI, 1121, Budapest, Hungary
 <sup>c</sup> CERN, Geneva 23, Switzerland

*E-mail:* shuaib.ahmad.khan@cern.ch

ABSTRACT: With ever increasing particle beam energies and interaction rates in modern High Energy Physics (HEP) experiments at present and future accelerator facilities, there is a persistent demand for robust Data Acquisition (DAQ) schemes that perform in a harsh radiation environment and handle high data volumes. The scheme must be flexible enough to adapt to future detector and electronics upgrades while keeping the cost in check. To address these challenges, we discuss an efficient DAQ scheme for error resilient, high speed data communication on commercially available state-of-the-art FPGAs with optical links. The scheme uses the GigaBit Transceiver (GBT) protocol to establish a radiation tolerant communication link between the on-detector front-end electronics, situated in the harsh radiation environment, and the back-end Data Processing Unit (DPU), placed in a low radiation zone. The acquired data are reconstructed in the DPU, which reduces the data volume significantly, and are then transmitted to the computing farms through high speed optical links using 10 Gigabit Ethernet (10GbE). In this study, we focus on the implementation and testing of the GBT protocol and the 10GbE links on an Intel FPGA. Measurements of resource utilisation, critical path delays, signal integrity, eye diagram and Bit Error Rate (BER), which are indicators of efficient system performance, are presented.

KEYWORDS: Data acquisition concepts; Detector control systems (detector and experiment monitoring and slow-control systems, architecture, hardware, algorithms, databases); Optical detector readout concepts

<sup>1</sup>Corresponding author.

© 2017 CERN. Published by IOP Publishing Ltd on behalf of Sissa Medialab. Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

#### Contents

- 1 Introduction
- 2 DAQ architecture
- 3 Readout links
  - 3.1 Front end interface - link-1: GBT
  - 3.2 Back end interface - link-2: 10GbE
- 4 Test setup for the interfaces
  - 4.1 Implementation of link-1: GBT on FPGA
  - 4.2 Implementation of link-2: 10GbE on FPGA
    - 4.2.1 Model-1: Test system implementation on Quartus II platform using Qsys
    - 4.2.2 Model-2: Hardware platform and transceiver test
- 5 Performance evaluation
  - 5.1 Link-1: GBT protocol on FPGA
  - 5.2 Link-2: 10GbE protocol on FPGA
    - 5.2.1 Model-1 results
    - 5.2.2 Model-2 results
  - 5.3 Discussion
- 6 Summary

#### 1 Introduction

Nuclear and particle physics experiments at high energies, often referred to as High Energy Physics (HEP) experiments, study the constituents of matter and their fundamental interactions. Our knowledge of the fundamental particles, their interactions and their connections to the early Universe has evolved in step with the beam energies available in particle accelerators. The increase in collision energy and beam interaction rates demands sophisticated, high-tech detectors, electronics and data acquisition (DAQ) systems. In addition, the radiation levels in the proximity of the detectors have been growing, which calls for radiation tolerant systems. Depending on the type of radiation, the readout electronics in the heavily irradiated area are highly prone to damage from total dose, single event upsets and non-ionizing energy loss [1]. Traditional DAQ systems [2, 3] of the last century could handle only low data rates and fewer data errors from multi-bit upsets in the radiation zone. An efficient DAQ system today must cope with the increase in data volume by acquiring data at a high rate, and must recover from data errors caused by multi-bit upsets in radiation environments.


| Parameter | 1960-1980 | 1980-2000 | 2000 onwards |
|---|---|---|---|
| No. of Readout Channels | ~100s | $\sim 10^3 - 10^6$ | $\sim 10^6 - 10^9$ |
| Data Rate | ~1 MB/sec | ~1 GB/sec | $\sim 10$ GB/sec to few TB/sec |
| Readout Standard | Front End Electronics: non standardized | Parallelism feature of distributed computing | Heterogeneous computing |
| Technology Evolution (Year Wise) | <b>1964:</b> NIM standard (backplane bus not defined)<br><b>1969:</b> CAMAC based centralized backplane, but lacked parallelism, BW limited to 1 MB/sec<br><b>1970-1980:</b> NIM based Front End read by minicomputer and CAMAC readout bus | <b>1986:</b> FastBus, BW: 40-60 MB/sec, supports parallelism<br><b>1982-1987:</b> VME development with microprocessors, BW: 40 MB/sec<br><b>1990:</b> NIM, CAMAC, Fastbus and VME coexisted<br><b>1997:</b> VME320 with BW: 320 MB/sec | Point to point high speed links<br><b>2003:</b> PC based computing farms with Ethernet and PCIe bus<br><b>Present:</b> up to 400 Gbps Ethernet, PCIe 5.0 specifications released in June 2017. Boosting on-board and local processing with FPGAs |
| Example System | Experiments at TRIUMF, BNL | Experiments at SPS, LEP at CERN | Experiments at CERN LHC |

**Table 1**. Journey to high speed DAQ system [4].

Modern DAQ for HEP and nuclear physics experiments is the result of continuous evolution [4]. An overview of this journey and of the methodologies adopted in the field of high speed DAQ is given in table 1. In the first two decades, 1960-1980, DAQ needs were addressed by custom designed readouts tailored to the characteristics of the individual experiments. The introduction of the Nuclear Instrumentation Module (NIM) standard and of a modular computer-controlled bus named Computer Automated Measurement and Control (CAMAC) focused on the standardisation of the front end and back end respectively; however, these lacked parallelism and had limited data rate and channel count. In the decades 1980-2000, the FastBus standard supported parallelism, while the advent of microprocessors led to the Versa Module European (VME) standard and the VME Inter-Crate bus (VIC) specifications. In the current century, point to point high speed links have evolved, and the specifications for the Ethernet and Peripheral Component Interconnect Express (PCIe) protocols are continuously being extended. With the advent of the latest high density commercial Field Programmable Gate Arrays (FPGAs), we are heading towards data rates of Terabytes/sec with high channel counts and on-board, local processing. One example of the demand for modern DAQ systems is the experiments at CERN's Large Hadron Collider (LHC), the world's largest and most powerful particle accelerator, operating since the year 2009. The first phase of data taking is over and the second phase will be completed in 2018. The LHC and the HEP accelerator experiments aim for a stepwise upgrade to fully extract their scientific potential and extend the physics reach [5]. With the proposed upgrade in the coming years, the LHC beam energies will increase and the beam luminosity will progressively ramp up to six times its current design value of $1 \times 10^{34} \text{ cm}^{-2} \text{s}^{-1}$. This will increase the interaction rates, which will lead to a dramatic increase in channel occupancy, data rate and data volume [6].

In this manuscript, we address the challenges of DAQ for large HEP experiments. The key issues in the design of DAQ for HEP experiments are high data rate communication with error resilience in the harsh radiation environment, quick upgradation, easy reconfigurability and portability to other platforms. A new Field Programmable Gate Array (FPGA) based DAQ readout scheme is proposed. It has a high speed, fault tolerant readout architecture that can perform in a harsh radiation environment, yet it is flexible enough to keep up with upgrades and is instantly reconfigurable. The uniqueness of the proposed scheme is that the readout can be implemented using commercially available, non-radiation hardened [7] state-of-the-art FPGAs, which offer higher processing power than radiation hard FPGAs. The approach is based on the radiation tolerant 4.8 Gbps GBT [8–10] optical link and the 10GbE [11] link, combined in a modular way. Functional testing of the links is presented. For this development, FPGAs are considered for rapid prototyping as they are field reconfigurable and provide large resources. They allow higher parallelism and pipelining, thereby increasing the logic computation speed and minimizing the latency involved.

The paper is organised as follows. The details of the readout scheme, its internal architecture, features, constituents and advantages over the conventional approach are explored in section 2. The architecture of the readout links with their different interfaces is discussed in section 3. Section 4 illustrates the test setup for the implementation of the optical interfaces on FPGA. Performance evaluation and functional tests of the different protocols are described in section 5, along with analysis results and discussion. The paper is concluded with a summary in section 6.

#### 2 DAQ architecture

The most demanding features for DAQ in HEP experiments are error resilience, high data rate handling, compact hardware with reusable modules for portability and quick upgradation, and efficient data aggregation [12, 13]. Motivated by these requirements, a simplified hierarchical readout chain for HEP experiments is proposed, as shown in figure 1. In this scheme, the readout system is broadly divided into two parts: the Front End Electronics (FEE) and the Data Processing Unit (DPU).

Figure 1. Readout Scheme Architecture.

The FEE are located in the radiation zone, close to the detector, and therefore require custom built radiation-hard electronics. At the first stage, the FEE receive the data from the detectors. They amplify, integrate and shape the weak sensor signals over a given period, and provide robust signals to be transmitted from the detector. A FEE consists of a Front End Module (FEM) and an Optical Conversion Module (OCM) [14, 15]. The design of each FEM is unique to the individual detector requirements [16]. In general, a FEM consists of a charge sensitive preamplifier, buffer, sequencer and Analog to Digital Converter (ADC), along with other detector specific components. The FEM operates directly on the analog (charge) signals produced by the beam interactions in the particle detectors. An event is qualified by the beam interactions and in certain instances by the interaction trigger. When an event occurs, the FEM sends the digitized detector specific data to the OCM through the differential Electrical Links (E-links) [17]. The OCM converts the digitized data to an optical signal wrapped in the link format [8] and sends it to the DPU over an optical fibre. Optical conversion is needed to communicate high speed data over long distances with low channel noise and low power consumption.

The DPU aggregates the data from a large number of high bandwidth detector side optical links onto even higher bandwidth data links on the server side. This increases the throughput and also optimizes the system level cost. The DPU hardware is identical for every detector. However, detector specific functionalities, such as the number of optical links to be handled and the firmware, require the DPU to be implemented as custom designed electronics boards with programmable functionality based on FPGA technology. The digitized data sent to the DPU are multiplexed, processed and formatted depending on the detector specifications before being forwarded to the back-end computing nodes.

The physical location of the DPU in the readout chain is one of the major factors that affect the selection and design of the hardware. The DPU can be located either near the detectors, as in the conventional approach, or far from the experimental site in a controlled radiation zone. Here the counting room, far from the radiation zone, is chosen as the preferred location for the DPU. The advantages of the proposed approach over the conventional approach with respect to the critical design parameters, and their implications for the DPU technology, available ecosystem and ease of maintenance, are listed in table 2. Considering all these, the layout for the readout scheme is adopted as shown in figure 1.

| Parameter | Conventional approach | Proposed approach |
|---|---|---|
| DPU location | Experiment area: radiation environment | Counting room: controlled or no radiation |
| DPU technology | Radiation hard electronics | Commercially available components |
| FPGA | Radiation hard, such as ACTEL or MICROSEMI FPGAs; Flash memory based, low performance | Non-radiation hard Intel or XILINX; Static RAM based, high performance FPGAs |
| Logic resources | Triple Modular Redundancy or voting logic; low packing density of logic cells | Logic redundancy not required; densely packed logic cells |
| Radiation campaign | Rigorous radiation tests to qualify the components | Not required |
| Cable lengths: (a) between FEE and DPU, (b) between DPU and back-end computing | (a) Short cables, (b) long radiation tolerant link | (a) Long optical links, (b) short connection |
| Availability of ecosystem | Limited solutions, fewer choices for component selection | Ample solutions and more options for component selection |
| Impact on cavern infrastructure | High | Less or no impact |
| Accessibility, maintenance, flexibility and adaptability | No or limited access; difficult maintenance; less flexible and low adaptability for upgrades | Easy accessibility; ease of maintenance; highly flexible towards future upgrades |
| Cost | Relatively high | Advantageous over the conventional approach |

Table 2. Advantages of the Proposed approach over the Conventional approach.

Data transfer from detectors to the DAQ with high reliability and fixed latency [18] is a crucial requirement for HEP experiments. The interfaces from FEE to DPU and from DPU to the back-end computing node are marked as Link-1 and Link-2 respectively in figure 1. The commercially available protocols cannot be used as Link-1 because they lack a robust error correcting code to protect the data against upsets caused by the harsh radiation in the cavern. Hence the 4.8 Gbps radiation tolerant GBT protocol developed at CERN, a bidirectional and error resilient optical link with fixed latency support, is used as Link-1. The *4.8 Gigabit/sec* rate of the GBT protocol is composed of *40 MHz x 120 bits*. The frequency of 40 MHz is derived from the LHC bunch spacing time of 25 ns [18], and 120 bits is a technology parameter [19]. If the FPGA transceiver reference clock is fixed at 40 MHz, then the minimum input data width of the transceiver is 120 bits, so the GBT protocol with a link rate of 4.8 Gbps is chosen. Details of the GBT protocol, its implementation and its testing on FPGA are discussed in sections 3, 4.1 and 5.1 respectively.

Link-2, from the DPU to the back-end processor, requires a standard interface with large bandwidth and multi-channel support that guarantees transmission capability. A custom designed or a commercially available link could be used. Since the DPU will be connected to a commercial DAQ server, a custom defined protocol for the high speed link would be difficult to maintain and would have poor future support. Hence, different commercial off-the-shelf options that have a large ecosystem with ample solutions and reasonable cost were explored. The two latest promising technology options available for the high speed Link-2 interface are the PCIe protocol [20] and the 10GbE protocol [21]. However, since each experiment has its own distinct set of requirements, it is hard to adopt a ready-made commercial solution. Hence it is best to select a standard high speed protocol and adapt it to the requirements of HEP. The two options mentioned above are compared in table 3 with respect to design requirements such as form factor, legacy support, ease of upgradation, flexibility and cost.

| Comparison parameter | 10 Gigabit Ethernet standard | PCIe standard |
|---|---|---|
| DPU form factor | ATCA or microTCA form factor | PCIe form factor |
| Legacy support | Good. Form factor and the network components allow backward compatibility. | Poor. Backward compatibility not maintained by PCI-SIG group. |
| Ease of upgradation | Retains the key Ethernet architecture. Efforts optimized and reduced time of development. | Less flexible for upgradation, incompatible with older systems, high cost and more development time. |
| Flexibility | A larger number of cards can be installed in a single server PC using network switches. It can be utilized as a fabric, however it has substantial hardware and software protocol processing requirements. | Only a single DPU can be installed in one slot of the server PC. It is not a fabric and can only support the connectivity of small numbers of processors and/or peripherals. It serves as a bridge to fabric. |
| Connectivity solutions | Ample peer to peer connectivity solutions | Natively PCI Express does not support peer-to-peer processor connectivity. Topology limitations. |
| Routability | Ethernet switches allow packets to be routed | Not a routable protocol; it defines a single large address space that devices are mapped into. |
| Line rate | 10.3125 Gbps | 8 Gbps per lane (Gen 3 PCI Express) |
| Line coding | 64b/66b (3.125 percent overhead) | 128b/130b (1.54 percent overhead) |
| Latency | Extra stage of communication to move data to or from processor memory. Latency of tens of microseconds. | Tightly integrated with the memory subsystem in a system on chip device. Latency of sub-microseconds. |
| Cost | The low cost of the network components and their backward compatibility reduce the cost of installation and maintenance. | Complex switching devices and proprietary solutions make it costly. |

Table 3. Comparison table between PCIe interface and 10 Gigabit Ethernet interface.
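The line coding row of table 3 can be checked with a few lines of arithmetic. The sketch below is illustrative only (the function names are not from the paper): it computes the overhead of a block code both relative to the payload bits and relative to the coded block, and the payload rate that remains after coding.

```python
from fractions import Fraction


def coding_overhead(payload_bits: int, block_bits: int) -> tuple[Fraction, Fraction]:
    """Return (overhead relative to payload, overhead relative to coded block)."""
    extra = block_bits - payload_bits
    return Fraction(extra, payload_bits), Fraction(extra, block_bits)


def payload_rate_bps(line_rate_bps: int, payload_bits: int, block_bits: int) -> Fraction:
    """Payload bits per second that survive the line code at a given line rate."""
    return Fraction(line_rate_bps * payload_bits, block_bits)


if __name__ == "__main__":
    # Line rates and codes as listed in table 3.
    links = (
        ("10GbE, 64b/66b", 10_312_500_000, 64, 66),
        ("PCIe Gen 3 lane, 128b/130b", 8_000_000_000, 128, 130),
    )
    for name, rate, payload, block in links:
        vs_payload, vs_block = coding_overhead(payload, block)
        useful = payload_rate_bps(rate, payload, block)
        print(f"{name}: {float(vs_payload):.4%} of payload, "
              f"{float(vs_block):.4%} of block, "
              f"{float(useful) / 1e9:.4f} Gbps after coding")
```

Note that the two percentages in table 3 follow different conventions: 3.125 percent is 2/64 (overhead relative to the payload), whereas 1.54 percent is 2/130 (overhead relative to the coded block).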

Although many designers have migrated to PCIe [20], 10GbE still offers future proof solutions. Hence the 10GbE standard interface is chosen for the high speed link to the computing system, for the reasons summarized in table 3. 10GbE can be optimized for detector specific data, and Quality-of-Service is provided in the higher layers of the Open Systems Interconnection (OSI) model [22].

The selection of the FPGA family for the DPU hardware and DAQ firmware development is constrained by the availability of logic resources and High Speed Serial Interface (HSSI) Serializer-Deserializer (SerDes) blocks on the FPGA. Different non-radiation hardened FPGA families are compared in table 4 against crucial parameters such as the available logic resources, transceivers, Phase Locked Loops (PLLs) and market availability, to guide the choice of the FPGA chip for the DPU. The Intel Stratix-V GX FPGA [23] has been chosen because it provides sufficient resources for the prototype tests.

| FPGA Family             | Intel        | Intel       | Intel       | Xilinx       | Xilinx    | Xilinx            |
|-------------------------|--------------|-------------|-------------|--------------|-----------|-------------------|
| Name                    | Stratix-V GX | Stratix-10  | Arria-10 GX | Virtex-6     | Virtex-7  | Virtex UltraScale |
| Status                  | Available    | End of 2017 | Available   | Available    | Available | Available         |
| FPGA part number        | 5SGXEA7      | 10SG280     | 10AX115     | XC6VLX240T   | XC7VX690T | XCVU190           |
| PLLs                    | 28           | 48          | 32          | 12           | 20        | 60                |
| >=10 Gb/s Transceivers  | 48           | 144         | 96          | 24           | 80        | 60                |
| Logic Elements/cells [M] | 0.622       | 2.8         | 1.15        | 0.241        | 0.693     | 1.9               |
| LUTs [M]                | 0.235        | 1.8         | 0.425       | 0.15         | 0.433     | 1.07              |
| FFs [M]                 | 0.939        | 7.4         | 1.7         | 0.3          | 0.866     | 2.14              |
| 18/20 Kb RAM Blocks     | 2560         | 11721       | 2713        | 832          | 2940      | 7560              |
| Total Block RAM (Mb)    | 50           | 229         | 53          | 15           | 53        | 133               |
| PCIe x8, Gen3           | 4            | 6           | 4           | 2 (Gen2)     | 3         | 6                 |
| Used for developing     | AMC40 card   |             | PCIe40 card | C-RORC board | MP7 card  |                   |

Table 4. FPGA selection parameters.



Figure 2. Block diagram of a GBT link in FPGA.

### 3 Readout links

The architectures of link-1 and link-2 are discussed in this section. The radiation tolerant GBT optical link is used as Link-1. It is the front end interface for digital data transfer from the on-detector electronics to the DPU. Data are multiplexed, processed and reformatted in the DPU depending on the detector specifications, and sent to the processor over the back end interface, Link-2, using the 10GbE protocol.

#### 3.1 Front end interface — link-1: GBT

The GBT link is the interface between the FEE and the DPU. It is used for data, timing and control distribution merged on a single data channel. The GBT transmission is an asynchronous serial communication composed of a GBT transmitter, a Multi-Gigabit Transceiver (MGT) [19] and a GBT receiver, as shown in figure 2. The pattern generator and checkers are used for testing purposes only; in operation they are replaced with First In First Out (FIFO) buffers for the FEE buffered data. In the transmitter, the scrambler maintains the DC balance needed for accurate clock recovery, without additional overhead, by reducing the occurrence of long sequences of continuous 1's or 0's.
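How a scrambler breaks up long runs without adding bits can be illustrated with a small self-synchronizing scrambler. This is a sketch under stated assumptions, not the GBT implementation: the text does not give the GBT scrambler polynomial, so the polynomial used here, $1 + x^{39} + x^{58}$, is the one from the 10GbE 64b/66b PCS and is borrowed purely for illustration. The seed values and function names are hypothetical.

```python
# Illustrative polynomial 1 + x^39 + x^58 (64b/66b PCS), NOT the GBT scrambler.
TAP_A, TAP_B = 39, 58


def scramble(bits: list[int], seed: list[int]) -> list[int]:
    """Self-synchronizing scrambler: output = input XOR two earlier outputs."""
    if len(seed) != TAP_B:
        raise ValueError(f"seed must hold {TAP_B} bits")
    history = list(seed)
    out = []
    for bit in bits:
        s = bit ^ history[-TAP_A] ^ history[-TAP_B]
        out.append(s)
        history.append(s)  # feedback uses the scrambled bit
    return out


def descramble(bits: list[int], seed: list[int]) -> list[int]:
    """Inverse operation: the history is built from the received bits only."""
    if len(seed) != TAP_B:
        raise ValueError(f"seed must hold {TAP_B} bits")
    history = list(seed)
    out = []
    for s in bits:
        out.append(s ^ history[-TAP_A] ^ history[-TAP_B])
        history.append(s)
    return out


def longest_run(bits: list[int]) -> int:
    """Length of the longest run of identical consecutive bits."""
    best = run = 0
    prev = None
    for b in bits:
        run = run + 1 if b == prev else 1
        best = max(best, run)
        prev = b
    return best


if __name__ == "__main__":
    data = [0] * 512                 # worst case input: one long run of zeros
    tx_seed = [1, 0] * (TAP_B // 2)  # hypothetical non-zero start state
    line = scramble(data, tx_seed)
    print("longest run before scrambling:", longest_run(data))
    print("longest run after scrambling: ", longest_run(line))
    # A receiver starting from a different state recovers after TAP_B bits.
    recovered = descramble(line, [0] * TAP_B)
    print("resynchronized:", recovered[TAP_B:] == data[TAP_B:])
```

The output has the same length as the input, which is what "without additional overhead" means here, and the receiver needs no shared state beyond the previously received bits.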

The Reed-Solomon (RS) encoder shown in figure 3 uses two interleaved RS(15, 11) code words, each capable of correcting up to two symbol errors. The interleaving operation raises the error correction capability to 4 consecutive symbols of 4 bits each, without any additional overhead. The Gear Box shown in figure 2 translates the frequency by modifying the data bus width for Clock Domain Crossing. It consists of a dual port RAM and breaks the 120 bit frame down into three words of 40 bits each. In the transmit chain, the input of the Gear Box is 120 bit @ 40 MHz and its output is 40 bit @ 120 MHz, keeping the data rate fixed at 4.8 Gbps. The data frame is then sent from the GBT transmitter to a high speed Multi Gigabit Transceiver block.
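The width and clock translation performed by the Gear Box can be sketched in a few lines. The example below is a behavioural model only, not the dual port RAM implementation; the word ordering (most significant word first) and all names are assumptions made for illustration.

```python
FRAME_BITS = 120             # GBT frame width
WORD_BITS = 40               # transceiver word width
FRAME_CLOCK_HZ = 40_000_000  # 40 MHz frame clock
WORDS_PER_FRAME = FRAME_BITS // WORD_BITS
WORD_CLOCK_HZ = FRAME_CLOCK_HZ * WORDS_PER_FRAME  # 120 MHz


def line_rate_bps(width_bits: int, clock_hz: int) -> int:
    """Throughput of a parallel bus of the given width and clock."""
    return width_bits * clock_hz


def gearbox_tx(frame: int) -> list[int]:
    """Split one 120-bit frame into three 40-bit words (assumed MSW first)."""
    if not 0 <= frame < (1 << FRAME_BITS):
        raise ValueError("frame does not fit in 120 bits")
    mask = (1 << WORD_BITS) - 1
    return [(frame >> (WORD_BITS * i)) & mask
            for i in reversed(range(WORDS_PER_FRAME))]


def gearbox_rx(words: list[int]) -> int:
    """Reassemble three 40-bit words into one 120-bit frame."""
    frame = 0
    for word in words:
        frame = (frame << WORD_BITS) | word
    return frame


if __name__ == "__main__":
    frame = (0x5 << 116) | 0x0123456789ABCDEF  # arbitrary example frame
    words = gearbox_tx(frame)
    print([f"{w:010x}" for w in words])
    print("round trip ok:", gearbox_rx(words) == frame)
    print("input rate :", line_rate_bps(FRAME_BITS, FRAME_CLOCK_HZ), "bps")
    print("output rate:", line_rate_bps(WORD_BITS, WORD_CLOCK_HZ), "bps")
```

Both sides of the Gear Box carry 4.8 Gbps; only the bus width and the clock frequency change.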



Figure 3. GBT link encoding scheme.
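The effect of interleaving two RS(15, 11) code words can be demonstrated without implementing the RS decoder itself. The sketch below assumes a simple symbol-by-symbol interleaving (A0, B0, A1, B1, ...) and error bursts aligned to 4-bit symbol boundaries; the exact bit layout of the GBT frame is not reproduced here, and the function names are illustrative. It counts how many symbols of each code word a burst hits and compares that with the per-word correction limit t = (n - k) / 2 = 2.

```python
N, K = 15, 11              # RS(15, 11): 15 symbols per code word, 11 of them data
SYMBOL_BITS = 4
T = (N - K) // 2           # correctable symbol errors per code word
STREAM_SYMBOLS = 2 * N     # two interleaved code words = 30 symbols = 120 bits


def interleave(word_a: list[int], word_b: list[int]) -> list[int]:
    """Alternate the symbols of two code words: A0, B0, A1, B1, ..."""
    stream = []
    for a, b in zip(word_a, word_b):
        stream += [a, b]
    return stream


def deinterleave(stream: list[int]) -> tuple[list[int], list[int]]:
    """Undo interleave(): even positions belong to A, odd positions to B."""
    return stream[0::2], stream[1::2]


def errors_per_codeword(burst_start: int, burst_len: int) -> tuple[int, int]:
    """Number of corrupted symbols in each code word for one symbol burst."""
    hit = [0] * STREAM_SYMBOLS
    for i in range(burst_start, min(burst_start + burst_len, STREAM_SYMBOLS)):
        hit[i] = 1
    hits_a, hits_b = deinterleave(hit)
    return sum(hits_a), sum(hits_b)


def worst_case(burst_len: int) -> int:
    """Largest per-code-word error count over all burst positions."""
    return max(max(errors_per_codeword(start, burst_len))
               for start in range(STREAM_SYMBOLS))


if __name__ == "__main__":
    print("frame bits:", STREAM_SYMBOLS * SYMBOL_BITS)
    for burst in (2, 4, 5):
        worst = worst_case(burst)
        verdict = "correctable" if worst <= T else "not correctable"
        print(f"burst of {burst} symbols ({burst * SYMBOL_BITS} bits): "
              f"at most {worst} errors per code word, {verdict}")
```

A burst of 4 consecutive symbols (16 bits) is split evenly between the two code words, which matches the 16 bit burst error correction listed for the GBT protocol; a fifth consecutive symbol pushes one code word past its limit.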

The GBT receiver performs descrambling, decoding and deinterleaving. The Frame Aligner block [8] performs header detection and locking for frame synchronization, using an efficient pattern search algorithm to keep the transmitted and the received data synchronized. A custom developed radiation hardened GBT chipset [15], consisting of the GBT ASIC and other components, is used as the OCM in this scheme. Data from the detector is framed into the GBT standard by the GBT chipset and transmitted to the DPU via a serial optical link. Radiation hardness is required near the detectors but not for the DPU located away from the radiation zone [24]; this is what allows the GBT functionality to be realized on non-radiation hardened FPGAs. The GBT-FPGA logic core firmware [8] is implemented on the FPGA based DPU. It mimics the GBT ASIC behaviour on the FPGA, enabling the DPU to receive the GBT datagram and to transmit the control and timing signals from the control room to the detectors. The details of the GBT protocol standard are summarized in figure 4a.
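The header detection and locking step of the Frame Aligner can be pictured as a search for the bit offset at which a valid header repeats every 120 bits. The following is a minimal software sketch, not the firmware algorithm of [8]. It assumes a 4-bit header at the start of each frame with the patterns `0101` (data) and `0110` (idle), and a hypothetical lock threshold of 8 consecutive valid headers; all of these are assumptions for illustration.

```python
FRAME_BITS = 120
HEADER_BITS = 4
HEADERS = ("0101", "0110")  # assumed data and idle header patterns
LOCK_THRESHOLD = 8          # hypothetical number of headers needed to lock


def find_frame_offset(bits: str, needed: int = LOCK_THRESHOLD):
    """Return the first bit offset where `needed` consecutive frames start
    with a valid header, or None if no such offset exists."""
    for offset in range(FRAME_BITS):
        locked = True
        for n in range(needed):
            start = offset + n * FRAME_BITS
            # A slice that runs past the end is shorter than 4 bits and fails.
            if bits[start:start + HEADER_BITS] not in HEADERS:
                locked = False
                break
        if locked:
            return offset
    return None


if __name__ == "__main__":
    # Ten data frames with an all-zero payload, preceded by 37 stray bits.
    frame = "0101" + "0" * (FRAME_BITS - HEADER_BITS)
    stream = "0" * 37 + frame * 10
    print("frame boundary found at bit offset:", find_frame_offset(stream))
    print("no header in stream:", find_frame_offset("0" * 2000))
```

Requiring several consecutive headers before declaring lock reduces the chance that payload bits which happen to look like a header cause a false alignment.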

#### 3.2 Back end interface — link-2: 10GbE

The 10GbE link acts as the interface between the DPU and the back end servers. The 10GbE protocol stack consists of the Physical layer and the Data link layer of the OSI model. The Data link layer is formed of the Media Access Control (MAC) and the Logical Link Control (LLC) sublayers, as shown in figure 4b.

| Parameters                                | GBT Data Transmission Protocol                 |
|-------------------------------------------|------------------------------------------------|
| Channel Data Throughput                   | 4.8 Gbps                                       |
| Raw data Throughput                       | 3.2 Gbps                                       |
| Bandwidth Utilization (Coding Efficiency) | 73.33% (88 / 120)                              |
| Bandwidth Utilization (Data Efficiency)   | 66.67% (80 / 120)                              |
| Security                                  | Not Applicable                                 |
| Forward Error Correction                  | RS Encoding (15,11) with symbol size of 4 bits |
| Number of FEC block used                  | 2                                              |
| Error Detection                           | 32 bits                                        |
| Error Correction                          | 16 bits                                        |
| Burst Error Correction                    | 16 bits                                        |
| Latency optimized data transmission       | Supported                                      |
| Interleaver                               | Block                                          |
| Supported Data Type                       | Idle/Control and Data                          |

Figure 4. (a) Details of the GBT protocol standard (b) 10GbE position in OSI model.

The detector specific data payload is submitted to the MAC layer [11]. The MAC layer initializes, controls and manages the peer to peer connection to prevent transmission failures due to data collisions. It acts as a bridge between the Physical layer and the Data link layer. The interconnection between the MAC layer and the Physical layer (PHY) is a 72 bit wide 10-Gigabit Media Independent Interface (XGMII) [11]. The Reconciliation Sublayer (RCS) maps between the parallel data path of the XGMII and the serial data stream of the MAC. The 10GbE MAC IP core and the 10GBASE-R PHY IP core [25] from Intel are used in this scheme. The internal block diagrams of the two IP cores are shown in figure 5. Both the Physical Coding Sublayer (PCS) and the Physical Medium Attachment Sublayer (PMA) of the 10GBASE-R PHY IP are implemented as hard IP blocks in the FPGA to save resources. The 10GBASE-R PHY IP core shown in figure 5b delivers serialized data at a line rate of 10.3125 gigabits per second (Gbps) and supports optical communication. On the transmit path, the 10GbE MAC IP accepts the data frames and constructs the Ethernet frames. The data is passed to the PCS module through the RCS layer. Encoding, scrambling and rate matching are performed in the PCS sublayer, and the processed data is sent to the PMA sublayer module. The PMA serializes the encoded data into a bit stream suitable for serial bit-oriented physical devices and passes the stream to the Physical Media Dependent (PMD) [11] layer. The reverse action is performed on the receive path.



**Figure 5.** (a) 10GbE Intel MAC IP core block diagram (b) 10GBASE-R PHY with hard PCS and PMA in Intel devices.

#### **4** Test setup for the interfaces

Development of the scheme shown in figure 1 involves integrating its constituent components. The aim of the setup is to test the individual modules, which is important because the performance of the scheme depends on the interactions between them. The setup focuses on testing the interface links on the FPGA: it checks that the components transfer valid data at the correct instant across the interfaces. The transceiver test forms an important part, as the transceiver is the hardware interface that receives the data from the OCM at the DPU and transmits it to the back-end processor.

The two interface links used in the readout scheme, the GBT link and the 10GbE link, are implemented and functionally tested on an Intel FPGA. The GBT-FPGA logic core was functionally simulated to understand the behavior of each functional block and then implemented in the FPGA [26]. The 10GbE link was implemented using Intel's system integration tool Qsys [27] to adopt a modular approach. The entire test setup is divided into multiple test models to allow step-by-step debugging and rapid fault finding in the constituent modules if the scheme misbehaves or fails to function.

#### 4.1 Implementation of link-1: GBT on FPGA

An Intel Stratix-V GX FPGA development board [23] is used for the implementation and testing of the GBT-FPGA logic core firmware. The FPGA fabric clock of 156.25 MHz is driven by an on-board oscillator, and the 120 MHz clock that drives the MGT is fed by an external jitter-cleaned clock source, the CDCE62005EVM from Texas Instruments [28]. The GBT link can be operated in Standard (Std) mode or Latency-optimized (Latopt) mode [29, 30]. In Std mode, an elastic buffer is placed between the PCS and PMA blocks to compensate for the phase difference between the clocks that drive the two blocks, as shown in figure 6. The elastic buffer adds uncertainty to the latency. To remove this effect in Latopt mode, the elastic buffer is bypassed and an external phase aligner block is used to align the phases of the PCS and PMA clocks. This gives the consistent delay needed for time-critical data transfer.



Figure 6. Clock phase compensation between PCS and PMA.

The latency, in terms of the clock cycles used by the GBT protocol for data processing and transmission, is estimated for all possible combinations of the transmit-side and receive-side modes of operation [31]; the results are discussed in section 5.1. The latency is measured in the loopback condition by continuously subtracting the received counter value from the transmitted counter value of the pattern generator, as shown in figure 7. The utilization of FPGA logic resources is an important parameter for the choice of FPGA on the DPU, where a large number of links need to be handled. It is estimated from the implementation report of the Intel Quartus-II tool (the FPGA design software from Intel-Altera).



Figure 7. Test Setup for the GBT latency measurement.
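The counter-subtraction idea behind this latency measurement can be illustrated with a small software model. The sketch below is not the firmware used in this work: it assumes, purely for illustration, that the whole loopback chain behaves as a fixed delay line of `pipeline_depth` clock cycles (a hypothetical parameter), and it recovers that depth by subtracting the received counter value from the one currently being transmitted.

```python
from collections import deque

CLOCK_PERIOD_NS = 25  # one clock cycle, as stated in the text


def measure_loopback_latency(pipeline_depth, n_cycles=64):
    """Return the set of latencies (in clock cycles) seen in a simulated loopback."""
    # Hypothetical model: the whole Tx -> Rx chain is a delay line of
    # `pipeline_depth` (>= 1) clock cycles.
    delay_line = deque([None] * pipeline_depth, maxlen=pipeline_depth)
    latencies = set()
    for tx_counter in range(n_cycles):
        rx_counter = delay_line[0]       # word arriving at the receiver this cycle
        delay_line.append(tx_counter)    # word launched by the pattern generator
        if rx_counter is not None:
            latencies.add(tx_counter - rx_counter)
    return latencies


# Illustrative depth of 14 stages; the measured value is converted to ns.
cycles = measure_loopback_latency(pipeline_depth=14)
print(cycles, {c * CLOCK_PERIOD_NS for c in cycles})
```

A single value in the returned set corresponds to a fixed latency; a real link with an elastic buffer could show more than one value from one power-up to the next.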

An eye diagram [32, 33] is used to indicate the quality of signals in high-speed digital transmission. The channel performance of the transceiver link was studied by interpreting the eye diagram on a LeCroy oscilloscope. The BER analysis is done using the pattern generator and checker. The outcomes of the measurements are discussed in section 5.1.

#### 4.2 Implementation of link-2: 10GbE on FPGA

A test setup is developed for the 10GbE implementation on the FPGA using Intel IP cores, along with the associated firmware and embedded software. It is implemented in Quartus-II using the Qsys system integration tool for quick generation of the interconnect logic, and its functionality is verified using Intel's ModelSim simulation software. The implementation includes two models. *Model-1* focuses on an efficient method of high-speed data transmission with minimal processor load and performs the loopback tests. *Model-2* focuses on optical link testing. It presents an effective approach to the challenges of testing, performance monitoring and parameter tuning of optical interconnects in FPGA-based systems.



#### 4.2.1 Model-1: Test system implementation on Quartus II platform using Qsys

Figure 8. Test system implementation using Qsys system integration tool.

The architecture of the assembled system instantiated in the FPGA and the interconnections between its sub-blocks are shown in figure 8. The model consists of the 10GbE MAC IP core, a NIOS-II processor IP, Scatter-Gather Direct Memory Access (SGDMA) IPs, a JTAG UART [25], an on-chip memory, two on-chip dual-clock FIFOs and a standard XGMII interface on the network side, and it is configured to include the 10G-BASE-R PHY IP for optical communication. The implementation methodology is based on the resource-intensive soft-core NIOS-II processor [34]. Its soft-core nature allows the designer to specify and generate custom software for the NIOS-II core. NIOS-II acts as the control unit in the loopback test, coordinates the design and provides overall system control. In this design, the SGDMA controller core is used for high-speed data transfer with minimal processor hold-up [35]. It chains transfers to non-contiguous memory using a table of descriptors held in memory. SGDMA improves the overall system performance compared to the DMA cores. The on-chip memory stores the executable program and data, as well as the descriptors for the SGDMA controllers. Dual-clock FIFO buffers are used between the SGDMAs and the 10GbE MAC IP core for clock domain crossing. Avalon Memory-Mapped (Avalon-MM), Avalon Streaming (Avalon-ST) and Avalon conduit buses [36] are used as interface buses. A brief snapshot of the bus signalling is shown in figure 9. Avalon-MM interfaces are used to implement the address-based read and write interfaces for the source and sink SGDMAs. The Avalon-ST interface on the client side is used to configure the 10GbE MAC IP. Avalon-ST supports the unidirectional flow of data for components that need low-latency, high-throughput point-to-point data transfer, with options for data bursting and interleaving. All the read/write signals and data transfers are synchronized with an associated clock interface. The control lines are implemented using the Avalon-ST bus.



Figure 9. Interface of Avalon-MM and Avalon-ST with source and sink SGDMA data transfer.

Data buffers are transmitted through the system interconnect fabric following the Avalon standards, as shown in figure 10. The subsystem is programmed using the standard JTAG interface available on the Intel development board. The 10G-BASE-R PHY IP is operated in internal loopback mode. A software-based loopback test setup is developed using the NIOS-II Software Build Tools (SBT) [37]. The NIOS-II processor runs the application program that handles the data transmission.

Figure 10. User Logic.

It coordinates the design by allocating the memory that stores the transmit and receive data buffers and the descriptor pairs. The program fills the transmit buffers with incrementing test data, populates the descriptor pairs, writes the first descriptor pair to the SGDMAs to start the transfer, and waits until both SGDMAs have completed the transfer of all the data buffers. It then validates the received data against the transmitted data. The results are discussed in section 5.2.1.
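The sequence of steps in this loopback test can be sketched in software. The following is a minimal host-side simulation, not the NIOS-II application itself: the `Descriptor` class, the buffer sizes and the function names are hypothetical stand-ins chosen for illustration, and the "transfer" is a plain memory copy standing in for the Tx SGDMA, MAC, PHY loopback and Rx SGDMA path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    """Hypothetical stand-in for an SGDMA descriptor: one buffer plus a link."""
    buffer_index: int
    length: int
    next_index: Optional[int]  # None marks the end of the descriptor chain


def build_chain(n_buffers, length):
    """Build a linked chain of descriptors, one per data buffer."""
    return [Descriptor(i, length, i + 1 if i + 1 < n_buffers else None)
            for i in range(n_buffers)]


def run_loopback(n_buffers=4, length=16):
    # Allocate transmit/receive buffers and fill Tx with incrementing test data.
    tx = [bytearray((b * length + i) & 0xFF for i in range(length))
          for b in range(n_buffers)]
    rx = [bytearray(length) for _ in range(n_buffers)]
    tx_chain = build_chain(n_buffers, length)
    rx_chain = build_chain(n_buffers, length)

    # Start at the first descriptor pair and follow both chains to the end.
    t, r, transferred = 0, 0, 0
    while t is not None and r is not None:
        td, rd = tx_chain[t], rx_chain[r]
        # Loopback model: what the Tx side reads, the Rx side writes.
        rx[rd.buffer_index][:rd.length] = tx[td.buffer_index][:td.length]
        transferred += 1
        t, r = td.next_index, rd.next_index

    # Validate the received data against the transmitted data.
    return transferred, tx == rx


print(run_loopback())
```

In the real system the processor only sets up the descriptors and waits for completion, which is what keeps the processor load low; the copy loop here merely plays the role of the hardware.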


#### 4.2.2 Model-2: Hardware platform and transceiver test

Figure 11. Simplified digital communication optical link.

The optical link architecture for digital communication in the FPGA is illustrated in figure 11. The data path consists of the FPGA transceiver, comprising the PCS and PMA, an optical transmitter (laser diode circuitry) and receiver (PIN diode circuitry), and a multimode optical fibre [38]. The FPGA is connected to the transmission channel through the PMA block, which generates the required clocks and performs serialization/deserialization. The digital processing between the PMA and the FPGA core is performed by the PCS block, which performs byte serialization/deserialization, byte ordering, rate matching and 64B/66B encoding/decoding to provide a reliable digital data channel. However, we restrict the scope of the present work to performance measurements of physical layer parameters and leave aside the PCS sublayer. The transceiver parameters must be tuned for channel conditioning, which affects the signal integrity, in order to achieve the lowest possible bit error rate. The major challenge is that the various components of the link have different parameter settings, giving a wide parameter optimization space, and that large statistics are required to establish a low BER at a given confidence level [39].

The Transceiver Toolkit (TTK) [32] from Intel is used to validate the signal integrity of the transceiver link and to access and tune the transceiver settings in real time. The Auto-Sweep test is performed to identify the best PMA parameter settings [40]. Transceiver parameters such as the Voltage Output Differential (VOD), pre-emphasis pre-tap, 2nd pre-tap, 1st post-tap, 2nd post-tap, equalization, DC gain and Variable Gain Amplifier (VGA) [41] are scanned and tuned for optimal performance by the Auto-Sweep test in TTK. The TTK also reports the signal quality of the received data as an eye diagram, which helps in understanding the signal degradation mechanism. The eye diagram serves as an indicator of the link performance and is used as the target parameter for the link optimization. The test setup used to measure the BER and to tune the transceiver parameters for the high-speed optical link is shown in figure 12.



Figure 12. Test setup for Transceiver parameter optimization and BER measurements.

The setup consists of an integrated FPGA system with embedded transceivers, a Small Form-factor Pluggable (SFP+) optical transceiver module and a multimode fibre with connectors. First, the light output from the optical transmitter is coupled to the fibre and looped back without any optical attenuator. In this configuration the transceiver parameters are tuned using the TTK Auto-Sweep test. This yields the optimum values of the transceiver PMA parameters, known as the solution space [40], which give the maximum height/width of the eye diagram at the targeted BER. With these optimized PMA settings, a manually controlled in-line Variable Optical Attenuator (VOA) is then introduced in the fibre loopback to degrade the optical power. The optical power after attenuation is measured using a handheld optical power meter with an insertion loss of < 0.3 dB in the 850 nm range of operation. The output from the attenuator is looped back as shown in figure 12. A Pseudo-Random Binary Sequence (PRBS) is transmitted across the transceiver link and the BER is evaluated with the pattern checker. The BER is measured at different attenuation levels. This test characterizes the sensitivity of the receiver and the minimum optical power required to achieve a specified BER in the system. Details are discussed in section 5.2.2.
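The principle of the pattern generator and checker can be illustrated in software. The sketch below is a simplified model, not the TTK implementation: it generates PRBS31 with the standard polynomial $x^{31} + x^{28} + 1$, injects a few bit errors by hand (standing in for an attenuated link), and counts the mismatches against a reference generator started from the same seed. The seed value and the error positions are arbitrary choices.

```python
from itertools import islice


def prbs31(seed=0x7FFFFFFF):
    """Yield the PRBS31 sequence (polynomial x^31 + x^28 + 1) one bit at a time."""
    state = seed & 0x7FFFFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        new_bit = ((state >> 30) ^ (state >> 27)) & 1
        state = ((state << 1) | new_bit) & 0x7FFFFFFF
        yield new_bit


def count_bit_errors(received, seed=0x7FFFFFFF):
    """Compare received bits with a reference PRBS31 started from the same seed."""
    reference = prbs31(seed)
    return sum(rx != next(reference) for rx in received)


def bit_error_rate(received, seed=0x7FFFFFFF):
    return count_bit_errors(received, seed) / len(received)


# Transmit 10,000 bits and flip three of them to emulate link errors.
tx_bits = list(islice(prbs31(), 10_000))
rx_bits = tx_bits.copy()
for position in (10, 2_000, 9_999):
    rx_bits[position] ^= 1

print(count_bit_errors(rx_bits), bit_error_rate(rx_bits))
```

A hardware checker usually synchronizes itself to the incoming stream instead of sharing a seed; the shared-seed comparison is used here only to keep the example short.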

#### **5** Performance evaluation

Test results and the performance analysis for the implementation of the two interfacing links on FPGA are discussed in this section.

#### 5.1 Link-1: GBT protocol on FPGA

The GBT-FPGA logic core firmware reference design is implemented on the FPGA. Resource estimation is necessary to approximate the number of links that can be packed onto an FPGA and to gauge the utilization of hardware resources. It is also important that the modules consume as little power as possible, so that the power consumed by the computation stays within the rated power margin and overheating is prevented. The resource utilization is shown in table 5, and the power consumption obtained with the Intel internal power monitoring tool is summarized in table 6.

| Parameters                               | Family: StratixV 5SGXEA7N2F45C3 |
|------------------------------------------|---------------------------------|
| Logic utilization (in ALMs)              | 10,542/234,720 (4%)             |
| Registers                                | 20060                           |
| Block memory bits                        | 202,752/52,428,800 (<1%)        |
| RAM Blocks                               | 56/2,560 (2%)                   |
| HSSI PMA TX/RX Serializers/Deserializers | 4/48 (8%)                       |
| PLLs                                     | 11/92 (12%)                     |

**Table 5**. FPGA resource utilization for the GBT-FPGA reference design.

Table 6. Power consumption with the GBT Encoding Scheme.

| Encoding Scheme       | Power consumption in FPGA (mW) | Power consumption in transceiver bank (mW) |
|-----------------------|--------------------------------|--------------------------------------------|
| GBT Frame-coding mode | 4492.8                         | 1947.77                                    |
| GBT Wide-Bus mode     | 4082.31                        | 1462.93                                    |

Latency is a crucial parameter of the GBT protocol. Data transmission from the detector to the DPU has to be time synchronized, and a fixed latency is required for application in the trigger and timing system. Latency occurs in both the transmitting and receiving directions, depending on the media and path involved. The total path $\mathcal{L}_1$ in the loopback mode is shown in figure 7 and is given by equation (5.1). It consists of the GBT transmitter (GBT Tx), the multi-gigabit transmitter (MGT Tx), the multi-gigabit receiver (MGT Rx) and the GBT receiver (GBT Rx).

$$\mathcal{L}_1 = GBT \, Tx - MGT \, Tx - MGT \, Rx - GBT \, Rx \tag{5.1}$$

The numbers of clock cycles used in the GBT transmit and receive sections and in the MGT transceiver section are estimated separately. To do so, the MGT transceiver is removed from the chain and the GBT transmitter is coupled directly to the GBT receiver at the firmware stage. The two paths $\mathcal{L}_2$ and $\mathcal{L}_3$ are given by equation (5.2) and equation (5.3).

$$\mathcal{L}_2 = GBT \, Tx - GBT \, Rx \tag{5.2}$$

$$\mathcal{L}_3 = MGT \, Tx - MGT \, Rx \tag{5.3}$$

The clock cycles used are observed with the simulation models. The latency of the MGT section, $\mathcal{L}_3$, is calculated as the difference between the $\mathcal{L}_1$ and $\mathcal{L}_2$ path delays. Because a time measurement within the FPGA depends on the data rate, the delay is measured in clock cycles (1 clock cycle = 25 ns). The transmission latency is measured for all possible combinations of the modes of operation of the GBT protocol and tabulated in table 7. This information helps designers to optimize the data acquisition firmware.

The signal quality of the GBT link operating at 4.8 Gbps is measured using a LeCroy serial data analyzer. The eye diagram and the details of the jitter measurements are shown in figure 13. The eye width/height is 176.8 ps/373 mV at a BER of $5.525 \times 10^{-12}$, and the measured total jitter is 51.148 ps. These values are acceptable for the link and provide a basis for further studies.

The BER measurements for the GBT protocol with the two encoding schemes shown in figure 3 are plotted in figure 14. An exponential fit is applied to the data.

The margin of receiver sensitivity between the two schemes for the targeted BER of $10^{-12}$ is

$$(15 - 12.9) \, dBm = 2.1 \, dBm \tag{5.4}$$

| Latency for path (ns), by GBT link mode of operation        | Tx Latopt, Rx Latopt | Tx Latopt, Rx Std | Tx Std, Rx Latopt | Tx Std, Rx Std |
|-------------------------------------------------------------|----------------------|-------------------|-------------------|----------------|
| $\mathcal{L}_1 = GBT \ Tx - MGT \ Tx - MGT \ Rx - GBT \ Rx$ | 150                  | 350               | 350               | 600            |
| $\mathcal{L}_2 = GBT \ Tx - GBT \ Rx$                       | 75                   | 275               | 250               | 425            |
| $\mathcal{L}_3 = MGT \ Tx - MGT \ Rx$                       | 75                   | 75                | 100               | 175            |

Table 7. GBT link latency measurement.
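The relation between the three paths in table 7 can be checked with a few lines of arithmetic. The sketch below takes the $\mathcal{L}_1$ and $\mathcal{L}_2$ values from the table, derives the MGT contribution as their difference, and converts the total latency to clock cycles using the 25 ns period quoted in the text. The variable and function names are illustrative only.

```python
CLOCK_PERIOD_NS = 25  # 1 clock cycle = 25 ns, as stated in the text

MODES = ("Tx Latopt/Rx Latopt", "Tx Latopt/Rx Std",
         "Tx Std/Rx Latopt", "Tx Std/Rx Std")
L1_NS = (150, 350, 350, 600)  # GBT Tx - MGT Tx - MGT Rx - GBT Rx (table 7)
L2_NS = (75, 275, 250, 425)   # GBT Tx - GBT Rx (table 7)


def mgt_latency_ns(l1_ns, l2_ns):
    """Latency of the MGT section: full loopback minus the GBT-only path."""
    return l1_ns - l2_ns


def to_clock_cycles(latency_ns):
    """Convert a latency in ns to whole clock cycles."""
    return latency_ns // CLOCK_PERIOD_NS


L3_NS = tuple(mgt_latency_ns(l1, l2) for l1, l2 in zip(L1_NS, L2_NS))
for mode, l1, l3 in zip(MODES, L1_NS, L3_NS):
    print(f"{mode}: L1 = {to_clock_cycles(l1)} cycles, L3 = {l3} ns")
```

The derived values reproduce the $\mathcal{L}_3$ row of the table, and the Tx Latopt/Rx Std total of 350 ns corresponds to 14 clock cycles.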



Figure 13. (Left) Eye Diagram for GBT protocol and (Right) Jitter Measurement.



Figure 14. BER measurement for GBT Frame coding and GBT wide-Bus mode.

The margin of 2.1 dBm given in equation (5.4) is in close agreement with the measurement for the GBT protocol implemented on a Xilinx FPGA [42], which is around 2.5 dBm.

#### 5.2 Link-2: 10GbE protocol on FPGA

The evaluation of the 10GbE link comprises two models. Model-1 presents the implementation results of 10GbE on the FPGA, analysed in terms of resource utilization, the stages of the frequency translation, the format of data transmission and the latency involved. Model-2 presents the transceiver tests: tuning to obtain the solution space, the spider chart, the eye diagram and the BER measurement as a function of optical power.

#### 5.2.1 Model-1 results

The test setup shown in figure 8 is implemented on the FPGA, and the logic resources used are summarized in table 8.

| Parameters                               | Family: StratixV 5SGXEA7N2F45C3 |
|------------------------------------------|---------------------------------|
| Logic utilization (in ALMs)              | 6,685/234,720 (3%)              |
| Registers                                | 11291                           |
| Block memory bits                        | 586,560/52,428,800 (1%)         |
| RAM Blocks                               | 51/2,560 (2%)                   |
| HSSI 10G TX/RX PCSs                      | 1/48 (2%)                       |
| HSSI PMA TX/RX Serializers/Deserializers | 1/48 (2%)                       |
| PLLs                                     | 3/92 (3%)                       |

Table 8. FPGA resource utilization for the 10GbE design.

The data rate is translated from the fabric clock frequency of 156.25 MHz to 10.3125 Gbps at the transceiver. This frequency translation occurs in three stages, as shown in figure 15. Data output from the 10GbE MAC is transmitted to the 10G-BASE-R PHY over the XGMII parallel lines, each running at 156.25 MHz. These are multiplexed onto 16 parallel lines, each operating at 625 MHz, keeping the bit rate the same. After encoding in the PCS layer, the frequency of each line is raised to 644.53125 MHz. At the serializer/deserializer, the data are serialized and transmitted from the silicon to the physical medium at a data rate of 10.3125 Gbps. The reverse operation occurs at the receiver.

Figure 15. Three levels of frequency translation in 10GbE communication.

Data packet transmission in the 10GbE MAC complies with the IEEE 802.3ae Ethernet standard [11] when transmitting data frames on the XGMII interface. The frames received from the user on the Avalon-ST interface follow the big-endian format, and the 10GbE MAC transmitter performs the endian conversion [43]: transmission on the XGMII interface follows the little-endian format, sending each frame starting from the least significant byte, as shown in figure 16. In the receive data path, the 10GbE MAC receiver decodes the data lanes coming through the XGMII. For all valid frames, the 10GbE MAC receiver removes the START, preamble, SFD and EFD bytes and ensures byte-wise frame alignment. The data transfer latency in clock cycles is calculated for the user logic shown in figure 10 and summarized in table 9.
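The endian conversion between the Avalon-ST and XGMII sides amounts to reversing the byte order of each data word. The sketch below illustrates this for a 64-bit word; the word width and the function name are assumptions made for the example, and control characters, preamble and framing are ignored.

```python
WORD_BYTES = 8  # assumption: 64-bit data words


def avalon_st_to_xgmii(word):
    """Reverse the byte order of a 64-bit word (big endian -> little endian)."""
    return int.from_bytes(word.to_bytes(WORD_BYTES, "big"), "little")


# The first byte of the frame (0x01) sits in the most significant position on
# the big-endian side and moves to the least significant byte after conversion.
avalon_word = 0x0102030405060708
xgmii_word = avalon_st_to_xgmii(avalon_word)
print(hex(xgmii_word))  # 0x807060504030201
```

Applying the same function again restores the original word, which is the operation needed on the receive path.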

Table 9. Latency estimation for data transfer (clock frequency = 156.25 MHz, i.e. 1 clock cycle = 6.4 ns).

| Path                  | $\mathscr{L}_{21}$ | $\mathscr{L}_{32}$ | $\mathscr{L}_{43}$ | $\mathscr{L}_{2'1'}$ | $\mathscr{L}_{3'2'}$ | $\mathscr{L}_{4'3'}$ |
|-----------------------|--------------------|--------------------|--------------------|----------------------|----------------------|-------|
| Latency(Clock Cycles) | 0                  | 3                  | 30                 | 0                    | 9                    | 12    |
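The numbers quoted in this subsection for the three-stage frequency translation, and the clock period behind table 9, can be verified with exact rational arithmetic. In the sketch below the count of 64 XGMII data lines is an assumption (the 72-bit interface is taken as 64 data bits plus 8 control bits), and the 66/64 factor is the overhead of the 64B/66B encoding performed in the PCS.

```python
from fractions import Fraction

FABRIC_CLOCK_MHZ = Fraction(15625, 100)  # 156.25 MHz
XGMII_DATA_LINES = 64   # assumption: 64 data bits of the 72-bit wide XGMII
PCS_LINES = 16          # parallel lines inside the PHY, as stated in the text


def frequency_translation():
    """Return the rate at each stage, in MHz per line or Mbps in total."""
    mac_rate_mbps = XGMII_DATA_LINES * FABRIC_CLOCK_MHZ    # MAC to PHY
    line_clock_mhz = mac_rate_mbps / PCS_LINES             # after multiplexing
    encoded_clock_mhz = line_clock_mhz * Fraction(66, 64)  # after 64B/66B
    serial_rate_mbps = encoded_clock_mhz * PCS_LINES       # after serialization
    return mac_rate_mbps, line_clock_mhz, encoded_clock_mhz, serial_rate_mbps


def cycles_to_ns(n_cycles):
    """Convert fabric clock cycles (as in table 9) to nanoseconds."""
    return n_cycles * 1000 / FABRIC_CLOCK_MHZ


print([float(x) for x in frequency_translation()])
print(float(cycles_to_ns(1)), float(cycles_to_ns(30)))
```

With these assumptions the stages come out at 10000 Mbps, 625 MHz, 644.53125 MHz and 10312.5 Mbps, and one fabric clock cycle lasts 6.4 ns.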

#### 5.2.2 Model-2 results

Transceiver testing is done as discussed in section 4.2.2 using the Intel TTK design operating in 10 Gb mode. The transceiver is configured in far-end optical loopback mode. PRBS31 is used for optimizing the parameters, as it provides the most stressful boundary conditions and thus gives confidence in the operating margins of the design, as shown in figure 17. The Auto-Sweep test has been performed to find the best-performing case in terms of eye width/height at the targeted BER of $10^{-12}$. The solution space indicated by the Auto-Sweep test is plotted in the form of a spider chart in figure 18. The parameters are then fixed and the eye diagram is captured at the best PMA settings; it is shown in figure 19, with an eye width (horizontal phase step)/eye height (vertical step) of 45/26.

Figure 16. MAC to XGMII data payload conversion scheme.

Figure 17. Variation of Eye width and Eye Height with PRBS type.

Figure 18. Plot for tuning of transceiver parameter optimized settings at PRBS31.

Figure 19. Eye Diagram for the 10Gb Ethernet on FPGA.

Figure 20. Bit error rate as a function of received optical power at 10Gbps.

The BER is measured at different attenuation levels of the transmitted optical power [44] using the setup shown in figure 12, and the BER as a function of received optical power is shown in figure 20. An optical transmitted power of around -11 dBm is required to achieve a BER of $10^{-12}$ for the transceiver under test. The exponential fit through the data points yields an equation of the form $BER(dB) = a \times e^{b \times Power(dBm)}$, where the coefficients a and b are -144.33 and 0.22887 respectively. An exponential fit is used because the BER is approximated by the complementary error function 'erfc' when the system noise is Gaussian; on a logarithmic scale this is approximated by an exponential. The goodness-of-fit statistics, namely the Sum of Squares due to Error (SSE), R-square, adjusted R-square and Root Mean Squared Error (RMSE), are 11.71, 0.9698, 0.9686 and 0.484 respectively.
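The fitted curve can be evaluated directly from the quoted coefficients. The sketch below does so under one explicit assumption: the quantity written as $BER(dB)$ is read here as $\log_{10}(BER)$, an interpretation that is not stated in the text but is consistent with the fit returning about -11.6 at -11 dBm, close to the targeted BER of $10^{-12}$. The function names are illustrative.

```python
import math

A = -144.33   # fit coefficient a, from the text
B = 0.22887   # fit coefficient b, from the text


def fitted_log_ber(power_dbm):
    """Evaluate a * exp(b * P) for an optical power P given in dBm."""
    return A * math.exp(B * power_dbm)


def fitted_ber(power_dbm):
    # ASSUMPTION: the fitted quantity is log10(BER); the text does not say so.
    return 10.0 ** fitted_log_ber(power_dbm)


for power in (-11.0, -13.0, -15.5):
    print(f"{power:6.1f} dBm -> fit = {fitted_log_ber(power):7.2f}, "
          f"BER ~ {fitted_ber(power):.1e}")
```

Under this reading, the fit gives a BER of a few times $10^{-12}$ at -11 dBm and rises by several orders of magnitude towards -15.5 dBm.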

#### 5.3 Discussion

The links of the scheme are implemented on an Intel FPGA, and the resource utilization and power consumption are measured. The latency calculation gives a measure of the clock cycles used for data processing in the logic path and shows the contribution of the buffer (elastic buffer or external phase aligner) in the transmission path. This information is a useful reference for designers optimizing the data acquisition firmware. For the Tx Latopt and Rx Std mode of GBT operation, which is the most widely used mode for fixed-latency operation, the latency is found to be 14 clock cycles (350 ns). Tx Latopt mode is necessary to send the timing information in a deterministic way, whereas on the receiver side the data arrive padded with a time stamp; the timing constraint on the receiver side is therefore relaxed, which allows the use of Rx Std mode. The signal quality of the GBT protocol is measured using the eye diagram, with a BER of the order of 1 bit in $10^{12}$ and jitter in the picosecond range. The margin of receiver sensitivity between the two encoding schemes of GBT is found to be 2.1 dBm at the targeted BER $\sim 10^{-12}$. The measurement of BER for the GBT protocol as a function of optical power, shown in figure 14, cannot be pursued below a receiver sensitivity of -17 dBm because the recovered clock is lost. However, the plot can be extrapolated using the standard complementary error function shape of the curve, assuming Gaussian noise.

The 10GbE link is implemented using the Qsys approach, and the three levels of frequency translation from the fabric frequency to the optical transmission are discussed. The endian conversion during data packet transmission for the protocol is shown in figure 16. The number of clock cycles needed to transmit the data buffer through the system interconnect fabric is calculated. The transceiver is tuned for the high-speed link using signal conditioning circuitry, as it forms the important hardware interface for data transmission from the OCM to the DPU at 4.8 Gbps and from the DPU to the DAQ server at 10 Gbps. The Auto-Sweep test is performed using Intel TTK, and the multivariate data for the best case are displayed on a 2D spider chart in figure 18. The variation of BER at 10 Gbps as a function of optical power down to -15.5 dBm is plotted in figure 20; below this power the receiver sensitivity is lost. The deviation of the data from the exponential fit is due to various factors, for instance the opto-electronic conversion factor, gain, optical couplings, insertion losses and the accuracy of the instruments used.

#### 6 Summary

In this work we have discussed a practical approach to the development of a DAQ system for high-rate data transmission in HEP experiments. In this scheme, the FEE sends data from the harsh radiation environment of the experimental cavern to the counting room over optical links. The DPU is located in the counting room and implemented using commercially available state-of-the-art FPGAs, which offer larger resources than radiation-hardened FPGAs. The study helps in selecting the necessary components of a DAQ layout judiciously. The readout system supports high-speed optical data communication with multi-bit error correction. The interfacing links of the DAQ, *viz.* GBT and 10GbE, are implemented on silicon. Detailed testing and performance analysis of the link implementations are presented, covering FPGA resource utilization, protocol latency, signal integrity, transceiver tests, BER and functional tests. This work is useful for hardware alignment and firmware calibration and serves as a golden reference for DAQ designers.

The scheme is portable to other sets of FPGAs. Together, these factors make the present approach suitable for the current and future needs of the experiments.

#### References

- [1] C. Inguimbert et al., "Effective NIEL" in silicon: calculation using molecular dynamics simulation results, IEEE Trans. Nucl. Sci. 57 (2010) 1915.
- [2] S. Castillo and K. Ozanyan, *Field-programmable data acquisition and processing channel for optical tomography systems, Rev. Sci. Instrum.* **76** (2005) 095109.
- [3] C.C.W. Robson et al., *An FPGA-based general-purpose data acquisition controller*, *IEEE Trans. Nucl. Sci.* **53** (2006) 2092.
- [4] J. Toledo et al., Past, present and future of data acquisition systems in high energy physics experiments, Microproc. Microsyst. 27 (2003) 353.
- [5] G. Apollinari et al., *High-Luminosity Large Hadron Collider (HL-LHC): preliminary design report*, CERN Yellow Reports: Monographs CERN-2015-005, CERN, Geneva Switzerland, (2015).
- [6] ALICE collaboration, P. Antonioli, A. Kluge and W. Riegler, *Upgrade of the ALICE readout & trigger system*, CERN-LHCC-2013-019, CERN, Geneva Switzerland, (2013) [ALICE-TDR-015].
- [7] M.C. Herbordt et al., *Achieving high performance with FPGA-based computing*, *Computer* **40** (2007) 50.
- [8] S. Baron et al., Implementing the GBT data transmission protocol in FPGAs, in TWEPP-09 Topical Workshop on Electronics for Particle Physics, http://hal.in2p3.fr/in2p3-00468912, Paris France, September 2009, pg. 631.
- [9] P. Moreira et al., *The GBT project*, in *TWEPP-09 Topical Workshop on Electronics for Particle Physics*, CERN-2009-006, Paris France, September 2009, pg. 342.
- [10] P. Moreira et al., *The GBT: a proposed architecture for multi-Gb/s data transmission in high energy physics*, in *Proc. Topical Workshop on Electronics for Particle Physics*, (2007), pg. 332.
- [11] IEEE Computer Society, IEEE standard for ethernet, IEEE Std 802.3<sup>TM</sup>-2015 4 (2015) 38.
- [12] A.D. Oancea et al., A resilient, flash-free soft error mitigation concept for the CBM-ToF read-out chain via GBT-SCA, in 2015 25<sup>th</sup> International Conference on Field Programmable Logic and Applications (FPL), September 2015, pg. 1.
- [13] H.G. Essel, FutureDAQ for CBM: on-line event selection, IEEE Trans. Nucl. Sci. 53 (2006) 677.
- [14] P. Moreira et al., The GBT-SerDes ASIC prototype, 2010 JINST 5 C11022.
- [15] P. Vichoudis et al., *The Gigabit Link Interface Board (GLIB), a flexible system for the evaluation and use of GBT-based optical links,* 2010 *JINST* **5** C11007.
- [16] G.F. Knoll, Radiation detection and measurement, John Wiley & Sons, U.S.A., (2010).
- [17] G. Tambave and A. Velure, *High speed continuous DAQ system for readout of the ALICE SAMPA ASIC*, in 2016 *IEEE-NPSS Real Time Conference(RT)*, June 2016, pg. 1.
- [18] A. Aloisio, F. Cevenini, R. Giordano and V. Izzo, *High-speed, fixed-latency serial links with FPGAs for synchronous transfers*, *IEEE Trans. Nucl. Sci.* 56 (2009) 2864.
- [19] Intel Altera, Stratix V device handbook volume 2: transceiver, U.S.A., (2017).

- [20] J.P. Cachemiche, P.Y. Duval, F. Hachon, R. Le Gac and F. Réthoré, *The PCIe-based readout system for the LHCb experiment*, 2016 *JINST* 11 P02013.
- [21] F. Costa et al., The new frontier of the data acquisition using 1 and 10 Gb/s ethernet links, Phys. Proc. 37 (2012) 1956.
- [22] Y. Jiang, C.-K. Tham and C.-C. Ko, Challenges and approaches in providing QoS monitoring, Int. J. Network Manag. 10 (2000) 323.
- [23] Altera, Stratix V GX FPGA development board reference manual, U.S.A., (2015).
- [24] R. Schwemmer, J.P. Cachemiche, N. Neufeld, C. Soos, J. Troska and K. Wyllie, *Evaluation of 400 m, 4.8 Gbit/s versatile link lengths over OM3 and OM4 fibres for the LHCb upgrade*, 2014 *JINST* 9 C03030.
- [25] Intel Altera, Introduction to Intel FPGA IP cores, U.S.A., (2017).
- [26] M. Barros Marin et al., The GBT-FPGA core: features and challenges, 2015 JINST 10 C03021.
- [27] Intel Altera, *Quartus Prime standard edition handbook volume 1: design and synthesis*, U.S.A., (2017).
- [28] Texas Instruments, Low phase noise clock evaluation module up to 1.5 GHz, U.S.A., (2008).
- [29] S. Baron and J. Mendez, The GBT-FPGA user manual, tech. rep., CERN Geneva Switzerland, (2016).
- [30] FPGA working group, J. Mendez, GBT-FPGA tutorial, in TWEPP '16, (2016).
- [31] J. Mitra et al., *GBT link testing and performance measurement on PCIe40 and AMC40 custom design FPGA boards*, 2016 *JINST* 11 C03039.
- [32] Channels, Controlling Transceiver, Transceiver debugging overview, (2013).
- [33] Intel corporation, JNEye user guide, U.S.A., (2017).
- [34] Altera, NIOS-II, processor reference handbook, U.S.A., (2009).
- [35] Intel Altera, Scatter-gather DMA controller core overview, U.S.A., (2009).
- [36] Altera, Avalon interface specification, application note, U.S.A., (2006), pg. 11.
- [37] Altera, NIOS-II, getting started with NIOS-II software in Eclipse, U.S.A., (2014).
- [38] A. Kuzmin and D. Fey, *Optical link testing and parameters tuning with a test system fully integrated into FPGA*, in *The Fourth International Conference on Advances in System Testing and Validation Lifecycle*, (2012).
- [39] A.C. Xiang et al., *High-speed serial optical link test bench using FPGA with embedded transceivers*, CERN, Geneva Switzerland, (2009).
- [40] Altera, *High-speed link tuning using signal conditioning circuitry in Stratix V transceivers, white paper*, U.S.A., (2015).
- [41] Altera, Understanding the pre-emphasis and linear equalization features in Stratix IV GX devices, application note, U.S.A., (2010).
- [42] C. Soos, GBT protocol implementation on Xilinx FPGAs, LHCb meeting, (2008).
- [43] Altera, 10 Gbps ethernet MAC megacore function user guide, U.S.A., (2014).
- [44] S. Detraz et al., FPGA-based bit-error-rate tester for SEU-hardened optical links, JINST (2009).