



## The Upgrade of the ATLAS Level-1 Central Trigger Processor

#### TWEPP 2012 Oxford, 18. September 2012

G. Anders, D. Berge, H. Bertelsen, M. Dam, E. Dobson, N. Ellis,
P. Farthouat, C. Gabaldon Ruiz, M. Ghibaudi, B. Gorini, <u>S. Haas</u>,
M. Kaneda, A. Messina, T. Pauly, R. Pottgen, R. Spiwoks,
T. Wengler, S. Maettig, M. Stockton, S. Xella

### Outline

- Current ATLAS Level-1 central trigger architecture
- Phase-0 trigger upgrade (2014)
- Central Trigger Processor (CTP) functionality and architecture
- Motivation for CTP upgrade and specification
- New CTPCORE module design
  - Implementation
  - Firmware
- New CTPOUT module & backplane design
- Summary

# **ATLAS Level-1 Trigger System**

- Process reduced granularity information from calorimeter and muon detectors
- Trigger decision based on object multiplicities at different thresholds
- Generate Level-1 Accept (L1A) and send via Timing, Trigger and Control (TTC) distribution to detector front-ends to initiate readout
- Identify regions-of-interest (RoI) to seed the LVL2 trigger
- Synchronous, pipelined processing system operating at the bunch crossing (BC) rate of 40 MHz
- Maximum round-trip latency: 2.5 µs
  - Data stored in on-detector pipelines
- Maximum trigger rate: 75 kHz
- Custom built electronics
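The latency and rate limits above translate directly into a pipeline depth and an average rejection factor. The short calculation below is only a sketch using the nominal numbers from this slide (40 MHz, 2.5 µs, 75 kHz); the variable names are illustrative.

```python
# Nominal values quoted on the slide; the 25 ns period follows from 40 MHz.
BC_RATE_HZ = 40_000_000
MAX_LATENCY_NS = 2_500
MAX_L1A_RATE_HZ = 75_000

bc_period_ns = 1_000_000_000 // BC_RATE_HZ           # time between bunch crossings
pipeline_depth_bc = MAX_LATENCY_NS // bc_period_ns   # BCs the on-detector pipelines must hold
rejection_factor = BC_RATE_HZ / MAX_L1A_RATE_HZ      # bunch crossings per L1A on average

print(f"BC period: {bc_period_ns} ns")
print(f"Pipeline depth: {pipeline_depth_bc} BC")
print(f"Rejection factor: about {rejection_factor:.0f}")
```

With these nominal inputs the on-detector pipelines have to cover 100 BC, and Level-1 keeps roughly one bunch crossing in 533.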



# Phase-0 Trigger Upgrade (2014)

- Calorimeter trigger
  - Replace PreProcessor multi-chip modules (MCMs)
  - Replace merger modules (CMX)
    - Send L1Calo RoIs to topological processor
- Topological trigger (new)
  - Topological processor with calorimeter and subset of muon RoI input
    - Improve multi-object selection for increasing luminosity
- Central trigger (upgraded)
  - New CTP modules to add more inputs and extend number of menu items
- Latency and trigger rate limits remain unchanged until Phase-II
  - 2.5 µs round-trip latency
    - ~20 BC headroom
  - 75 kHz (upgradeable to 100 kHz)
    - Limited by readout



## **CTP Functionality**

- Level-1 trigger decision (L1A generation)
  - Low latency: ~4 BC
- Combine triggers from calorimeter & muon trigger processors and forward detectors
- Trigger item formation
  - Flexible logical combinations of trigger inputs as defined in the trigger menu
- Bunch group masking
  - Masking of triggers as a function of LHC bunch structure
- Trigger item pre-scaling
- Preventive dead-time generation to protect detector front-ends
  - Simple (fixed number of bunches) and complex dead-time (leaky bucket algorithm)
  - Busy from readout (busy tree)
- Receive timing signals (BC, ORBIT) from the LHC
- Fan-out trigger and timing signals to TTC partitions, receive busy
- Send RoI summary information to LVL2 and event data to DAQ
- Online monitoring of trigger, dead-time and busy
  - Provides essential information for luminosity and background monitoring
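The preventive dead-time bullet names two mechanisms: a simple dead-time (a fixed number of bunches after each L1A) and a complex dead-time based on a leaky bucket. The following is a minimal behavioural sketch of how the two could combine, not the CTP firmware. The class name `DeadTimeModel` and the parameter values (4 BC, bucket of 2, one leak every 10 BC) are hypothetical and chosen only to make the behaviour visible.

```python
class DeadTimeModel:
    """Toy model: simple dead-time plus a leaky bucket, evaluated once per BC."""

    def __init__(self, simple_bc, bucket_size, leak_bc):
        self.simple_bc = simple_bc      # BCs vetoed after each L1A (hypothetical value)
        self.bucket_size = bucket_size  # max L1As the bucket can hold (hypothetical value)
        self.leak_bc = leak_bc          # BCs needed to leak one L1A (hypothetical value)
        self.simple_left = 0            # remaining BCs of simple dead-time
        self.fill = 0                   # current bucket fill level
        self.leak_count = 0             # BCs since the last leak

    def step(self, candidate):
        """Advance one BC; return True if the candidate trigger becomes an L1A."""
        # The bucket leaks one entry every leak_bc bunch crossings.
        self.leak_count += 1
        if self.leak_count >= self.leak_bc:
            self.leak_count = 0
            self.fill = max(0, self.fill - 1)
        # Veto while the simple dead-time is running or the bucket is full.
        vetoed = self.simple_left > 0 or self.fill >= self.bucket_size
        if self.simple_left > 0:
            self.simple_left -= 1
        accept = candidate and not vetoed
        if accept:
            self.simple_left = self.simple_bc
            self.fill += 1
        return accept


model = DeadTimeModel(simple_bc=4, bucket_size=2, leak_bc=10)
# Offer a trigger candidate in every BC and record which ones are accepted.
accepted = [bc for bc in range(30) if model.step(candidate=True)]
print(accepted)
```

Under constant input the first accepts are spaced by the simple dead-time; once the bucket is full, the accept rate settles to one L1A per leak interval.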



[Figure: CTP inputs from the forward detectors, muon trigger, calorimeter trigger and LHC]
## **CTP** Architecture

- 9U VME chassis with 11 modules
- CTPMI Machine interface
  - Receives timing signals from LHC
- 3 x CTPIN Input module
  - Receives, synchronizes and aligns trigger input signals
- CTPMON Monitoring module
  - Performs bunch-by-bunch trigger monitoring

- CTPCORE Core module
  - Generates Level-1 Accept (L1A)
  - Sends event summary information to LVL2 & DAQ
- 4 x CTPOUT Output module
  - Sends timing signals to LTPs
- 3 custom backplanes
  - Trigger & timing (COM), Trigger inputs (PIT) & calibration requests (CAL)



#### **CTP Resource Utilisation**

- CTP in ATLAS is working very reliably
- No spare capacity for some of the resources
- CTP utilization from 2012 trigger menu:

|                                            | Used | Available |
|--------------------------------------------|------|-----------|
| CTPIN input cables (partially used)        | 9    | 12        |
| CTPIN input signals                        | 212  | 372       |
| CTPIN integrating monitoring counters      | 138  | 768       |
| PIT bus lines                              | 160  | 160       |
| CTPCORE trigger items                      | 241  | 256       |
| CTPCORE bunch group masks                  | 8    | 8         |
| CTPCORE maximum number of AND terms        | 6    | 256       |
| CTPCORE maximum number of bits in OR terms | 6    | 12        |
| CTPCORE per-bunch trigger item counters    | 12   | 12        |
| CTPOUT cables to TTC partitions            | 20   | 20        |
| CTPMON per-bunch monitoring counters       | 88   | 160       |

# **Motivation for Upgrade**

- Primary motivation: remove CTP resource limitations
  - Increase the number of trigger inputs
  - Increase the number of trigger items (combinations)
- Additional features
  - Allow partitioning of L1A generation for detector commissioning
  - Improved bunch group masking and per-bunch trigger item monitoring
  - Low latency direct electrical inputs from Topological processor to CTPCORE
  - Option to use optical inputs to connect to new/upgraded sub-systems (latency budget permitting)
- CTP upgrade requires complete redesign of several modules
  - CTPCORE
  - CTPOUT
  - COM backplane
- Next major CTP upgrade foreseen only for Phase-II
  - Level-1 trigger architecture changes -> 2 hardware trigger levels (L0/L1)
  - Latency budget increases

## **Phase-0 CTP Specifications**

- 320 trigger inputs on PIT bus backplane (now 160)
  - Using double data rate signalling on the existing PIT bus backplane
  - Previous study has shown proof of principle
  - Latency penalty of ~2 BC
- 512 trigger items (now 256)
- 4 trigger partitions (now 1)
  - Common trigger menu
  - Each L1A partition has a selection of trigger items (mask) and its own dead-time handling
  - Only primary L1A partition (i.e. "physics" partition) provides LVL2 and DAQ readout
  - Secondary L1A partitions used for detector commissioning, calibration, etc.
- 16 bunch groups (now 8)
  - After forming the trigger items (now part of trigger item)
- 256 per-bunch counters for trigger item monitoring (now 12)
- 64 electrical LVDS inputs for L1Topo trigger signals to CTPCORE
- 12 serial optical inputs using ribbon-fiber receiver
- Overall latency target 6-7 BC

### **CTPCORE++ Module Design**



- Schematics design in progress
- Prototype expected in Q1'2013

# **Implementation (1)**

- Two Virtex-7 FPGAs (XC7VX485T, BGA1157)
  - 600 I/O pins, 20 multi-gigabit transceivers (GTX), 300k LUTs (6-input), 600k flip-flops, 1030 RAM blocks (36 kbit)
  - Possibility to migrate to smaller or larger densities in same package for production modules
- Need to send ~2000 bit/BC from trigger path FPGA to readout/monitoring FPGA (~80 Gbit/s)
  - Requires 16 serial links @ 6.4 Gbaud
  - 8B10B encoding, 120 bit/link/BC
- DDR3 playback and diagnostics memories
  - Inject test patterns & store results
  - ~2000 bits/BC => 80 Gbit/s
  - Two DDR3 SODIMM modules
  - Requires 60% transfer efficiency @ 1066 Mb/s
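The bandwidth figures on this slide can be cross-checked with a few lines of arithmetic. This is only a sketch using the nominal 40 MHz bunch-crossing rate; the 64-bit data bus per SODIMM is an assumption (the standard width of a non-ECC module) and is not stated on the slide.

```python
BC_RATE_HZ = 40_000_000          # nominal bunch-crossing rate
BITS_PER_BC = 2000               # approximate payload between the two FPGAs

required_bps = BITS_PER_BC * BC_RATE_HZ

# Serial links: 8B10B carries 8 payload bits for every 10 line bits.
LINE_RATE_BAUD = 6_400_000_000
N_LINKS = 16
max_payload_bits_per_link_bc = LINE_RATE_BAUD * 8 // 10 // BC_RATE_HZ
quoted_bits_per_link_bc = 120    # figure quoted on the slide
link_capacity_bits_per_bc = N_LINKS * quoted_bits_per_link_bc

# DDR3: two SODIMMs at 1066 Mb/s per data pin.
N_SODIMM = 2
BUS_WIDTH_BITS = 64              # assumption: standard 64-bit SODIMM data bus
PIN_RATE_BPS = 1_066_000_000
ddr3_peak_bps = N_SODIMM * BUS_WIDTH_BITS * PIN_RATE_BPS
ddr3_efficiency = required_bps / ddr3_peak_bps

print(f"Required: {required_bps / 1e9:.0f} Gbit/s")
print(f"8B10B payload limit: {max_payload_bits_per_link_bc} bit/link/BC")
print(f"16 links at 120 bit/link/BC: {link_capacity_bits_per_bc} bit/BC")
print(f"DDR3 efficiency needed: {ddr3_efficiency:.0%}")
```

The 120 bit/link/BC quoted on the slide sits just below the 128 bit 8B10B payload limit, and under the 64-bit assumption the DDR3 efficiency comes out close to the quoted 60%.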





# **Implementation (2)**

#### Optical inputs

- Avago MiniPOD parallel ribbon fibre transmitters/receivers
- High density (~2x2cm), can be placed close to FPGA
- Eases signal integrity issues (short traces)
- Link speed (conservative): 6.4 GBd, 8B10B coded
  - Up to 128 bit/BC/fiber => more than enough
  - Using fewer bits than 128 per link can somewhat reduce link latency
  - Latency penalty ~ 3 BC
- Initially use electrical inputs, migrate to optical when required

#### Design challenges

- Signal integrity of high-speed serial links
- Power supply distribution is complex
  - Many low-voltage (1.0V, 1.2V, 1.8V, 2.5V) digital supplies for FPGAs
  - Separate low-noise supplies for transceivers (1.0V, 1.2V, 1.8V)
- Virtex-7 I/Os only support signal levels up to 1.8V: level translation required
  - PIT bus (2.5V SSTL) -> CPLD with different I/O bank voltages
  - VME -> small FPGA with 3.3V and 1.8V I/O banks
  - LVDS receivers -> NMOS pass transistors for 3.3V to 1.8V translation



### **CTPCORE++** Trigger Path

- Trigger path implemented in one FPGA to reduce latency
  - Initial trigger path firmware implementation shows a latency of ~2 BC
    - Dominated by routing delays
- Second FPGA for non-latency critical functions (readout & monitoring)

[Figure: trigger path block diagram with PIT, LUT, CAM, PSC, BGRP and DT blocks; signal widths labelled 320, 480 and 512; outputs L1Ap and L1As (3)]

Abbreviations used in the diagram:

- LUT: LookUp Tables
- CAM: Content-Addressable Memory
- BGRP: Bunch-GRouP mask
- DT: DeadTime
- PSC: PreSCaling
- PIT: Pattern In Time
- ITM: Trigger IteM
- TBP: Trigger item Before Prescale
- TAP: Trigger item After Prescale
- TAV: Trigger item After Veto
- L1Ap: L1A for physics partition
- L1As: L1A for secondary partitions
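The abbreviations TBP, TAP and TAV suggest a chain of stages from trigger item formation to the L1A of each partition. The sketch below is a toy behavioural model of such a chain, not the CTPCORE++ firmware. The `Item` class, the `trigger_path` function, the tiny item list and the exact ordering of bunch-group masking and prescaling are all assumptions made for illustration, and the veto stage (TAV) is folded into the per-partition L1A.

```python
from dataclasses import dataclass


@dataclass
class Item:
    required: frozenset     # input bits that must all be set (toy stand-in for LUT + CAM)
    bunch_group: frozenset  # BCIDs in which the item is enabled
    prescale: int           # pass 1 out of every `prescale` TBPs
    count: int = 0          # prescaler state


def trigger_path(items, partition_masks, dead, inputs, bcid):
    """Evaluate one bunch crossing; return (tbp, tap, l1a) as lists of bools."""
    tbp, tap = [], []
    for item in items:
        # Trigger item before prescale: input condition and bunch-group mask.
        fired = item.required <= inputs and bcid in item.bunch_group
        passed = False
        if fired:
            item.count += 1
            if item.count >= item.prescale:
                item.count = 0
                passed = True
        tbp.append(fired)
        tap.append(passed)
    # Each partition ORs its own selection of items and applies its own dead-time veto.
    l1a = [
        (not is_dead) and any(tap[i] for i in mask)
        for mask, is_dead in zip(partition_masks, dead)
    ]
    return tbp, tap, l1a


items = [
    Item(frozenset({0, 1}), frozenset({1, 2, 3}), prescale=1),
    Item(frozenset({2}), frozenset({1, 2, 3}), prescale=2),
]
partition_masks = [{0, 1}, {1}]  # primary partition and one secondary partition

r1 = trigger_path(items, partition_masks, [False, False], {0, 1, 2}, bcid=1)
r2 = trigger_path(items, partition_masks, [True, False], {2}, bcid=2)
r3 = trigger_path(items, partition_masks, [False, False], {0, 1, 2}, bcid=7)
print(r1, r2, r3, sep="\n")
```

In the second call the primary partition is dead while the secondary one still fires, which is the kind of independence the per-partition dead-time handling is meant to give.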

### **CTPCORE Firmware**

- Trigger menu implemented as large CAM
  - 480 data inputs, 512 match outputs
- Xilinx device architecture enables implementation of wide CAMs
  - LUT configured as shift register (SRL) for data word lookup
  - Carry chain for cascading CAM word
  - 20 bit wide CAM in one Virtex-7 SLICEM
- 480-wide, 512 deep CAM
  - 12800 SLICEs: 40% of XC7VX485T SLICEMs
- CAM content is initialized by loading shift register chain through VME
  - Values are computed from the Level-1 trigger menu
  - Upgrade will require rewrite of trigger menu compiler (TMC) software
- Firmware development and prototyping
  - Using Xilinx Virtex-7 development board (VC707)
  - Same FPGA (XC7VX485T), different package
  - Trigger path firmware ~40% utilization
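The CAM construction described above can be modelled in software to show why shift-register LUTs make wide CAMs cheap. The sketch assumes that each LUT acts as a 32-entry table addressed by 5 data bits, which is inferred from the 20 bits per SLICEM quoted on the slide and the four LUTs in a SLICEM. The function names, the ternary pattern syntax and the 10-bit toy width are hypothetical.

```python
CHUNK_BITS = 5  # assumed address width of one shift-register LUT (32 entries)


def compile_word(pattern):
    """Turn a ternary pattern ('0', '1' or 'x' per bit) into one 32-entry table per chunk."""
    assert len(pattern) % CHUNK_BITS == 0
    tables = []
    for start in range(0, len(pattern), CHUNK_BITS):
        chunk = pattern[start:start + CHUNK_BITS]
        # Entry `value` is True if that 5-bit value satisfies the chunk pattern.
        table = [
            all(p in ("x", b) for p, b in zip(chunk, format(value, "05b")))
            for value in range(2 ** CHUNK_BITS)
        ]
        tables.append(table)
    return tables


def match(tables, data):
    """Look up each data chunk in its table and AND the results, as a carry chain would."""
    chunks = [data[i:i + CHUNK_BITS] for i in range(0, len(data), CHUNK_BITS)]
    return all(table[int(chunk, 2)] for table, chunk in zip(tables, chunks))


word = compile_word("1x0xx" + "xxxx1")
hit = match(word, "1100000001")
miss = match(word, "1110000001")

# Lower bound on SLICEMs for the full-size CAM at 20 bits per SLICEM.
slicems_per_word = 480 // 20
slicem_lower_bound = slicems_per_word * 512
print(hit, miss, slicem_lower_bound)
```

Loading the tables corresponds to shifting the CAM content in through VME, and the 12800 SLICEs quoted on the slide are slightly above the lower bound computed here.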



Stefan Haas, 18. September 2012

## **CTPOUT++ Module & Backplane**

- Upgrade CTPOUT to support more than one trigger partition
- Select trigger partition for each output cable
  - Multiplex L1A & TTYPE
  - Select busy to drive
  - Fan-out BC, ORB, ECR
- Multiplexing of trigger signals implemented in CPLD
  - Low latency: ~1/2 BC for CTPOUT
- Additional features
  - Improved busy monitoring
  - Programmable pattern generator for tests
- Also need to replace COM backplane to distribute signals for 3 additional trigger partitions
  - Increase the number of outputs from CTP to the TTC partitions (20 -> 25)
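The per-cable partition selection can be pictured as a small multiplexer: each output cable forwards the L1A of its selected partition, and its busy drives that partition's busy line. The function below is a hedged sketch of this idea only; `ctpout_cycle` and its arguments are hypothetical names, and signals such as TTYPE, BC, ORB and ECR are left out.

```python
def ctpout_cycle(cable_partition, l1a, busy_in):
    """Route signals for one BC.

    cable_partition[c] is the partition index selected for cable c,
    l1a[p] is the L1A of partition p, busy_in[c] is the busy seen on cable c.
    Returns (L1A per cable, busy per partition).
    """
    # Multiplex the L1A of the selected partition onto each cable.
    l1a_out = [l1a[p] for p in cable_partition]
    # OR the busy of every cable into the partition it is assigned to.
    busy = [False] * len(l1a)
    for cable, partition in enumerate(cable_partition):
        busy[partition] = busy[partition] or busy_in[cable]
    return l1a_out, busy


# Four cables: two on the primary partition, one each on partitions 1 and 3.
l1a_out, busy = ctpout_cycle(
    cable_partition=[0, 0, 1, 3],
    l1a=[True, False, False, True],
    busy_in=[False, True, False, False],
)
print(l1a_out, busy)
```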



Electrical LVDS links to detector TTC partitions

#### Summary

- Phase-0 CTP upgrade to allow additional trigger inputs and to provide increased flexibility for the trigger menu
- Double the number of trigger inputs (160 -> 320)
  - Double data rate signalling on PIT backplane
  - Additional low-latency electrical inputs for topological processor
  - Provision for optical inputs in addition to electrical
- Double the number of trigger items (256 -> 512)
- Add 3 secondary trigger partitions for detector commissioning and calibration runs
- Redesign several modules
  - CTPCORE
  - CTPOUT
  - COM backplane
- Prototypes expected in Q1'13