## Associative Memory Design for the Fast TracKer Processor (FTK) at ATLAS

A. Stabile for the AMchip collaboration

NSS, Valencia, Spain 24 Oct. 2011





- 48 Data Formatters (DF)
  - Clustering Mezzanine
- 128 Processing Units
  - AUX Board (FPGA):
    - Data Organizer (DO)
    - Track Fitter (TF 8 layers)
    - Hit Warrior (HW)
  - AM Board with 10M patterns on AMchip04 custom CAMs
- 32 Final Boards (FPGA)
  - Final Fit (11 layers)
  - Final Hit Warrior





### The Associative Memory

- Dedicated device for maximum parallelism
- Each pattern has its own private comparator
- Track search during detector readout



| Approach                              | Technology | Patterns per chip | Layers |
|---------------------------------------|------------|-------------------|--------|
| Full custom                           | 700 nm     | 0.128 kpat/chip   | 6      |
| FPGA                                  | 350 nm     | 0.128 kpat/chip   | 6      |
| STD cells                             | 180 nm     | 5.0 kpat/chip     | 6      |
| STD cells + Full custom (new for FTK) | 65 nm      | 80 kpat/chip      | 8      |



UNIVERSITÀ DEGLI STUDI DI MILANO

# AM working principle



- 1 Flip-flop (FF) for each layer stores layer matches
- All patterns are compared in parallel with incoming data (HIT)
- Fast pattern matching and flexible input
- The AM readout is based on a modified Fischer tree<sup>1</sup>

<sup>1</sup>P. Fischer NIM A461 (2001) 499-504
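The working principle above can be illustrated with a small software model. This is a sketch only, not the chip's logic: the names `Pattern`, `search`, and `threshold` are hypothetical, the majority threshold is an assumed parameter, and the inner loop over patterns stands in for what the hardware does in parallel.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

NUM_LAYERS = 8  # the FTK AM chip uses 8 layers per pattern


@dataclass
class Pattern:
    """One stored pattern: a word per layer plus one match flip-flop per layer."""
    words: List[int]
    matched: List[bool] = field(default_factory=lambda: [False] * NUM_LAYERS)

    def compare(self, layer: int, hit: int) -> None:
        # The flip-flop latches a match and holds it until the bank is reset.
        if self.words[layer] == hit:
            self.matched[layer] = True

    def fires(self, threshold: int) -> bool:
        # Majority logic: enough layers must have matched (threshold is assumed).
        return sum(self.matched) >= threshold


def search(bank: List[Pattern], hits: List[Tuple[int, int]],
           threshold: int = NUM_LAYERS) -> List[int]:
    """Feed (layer, hit) pairs to every pattern; return indices of fired patterns."""
    for layer, hit in hits:
        for pattern in bank:  # sequential here, parallel in the real device
            pattern.compare(layer, hit)
    return [i for i, p in enumerate(bank) if p.fires(threshold)]


def make_bank() -> List[Pattern]:
    return [
        Pattern([1, 2, 3, 4, 5, 6, 7, 8]),
        Pattern([1, 2, 3, 4, 5, 6, 7, 9]),
        Pattern([9] * NUM_LAYERS),
    ]


hits = [(layer, layer + 1) for layer in range(NUM_LAYERS)]
print(search(make_bank(), hits, threshold=8))  # [0]
print(search(make_bank(), hits, threshold=7))  # [0, 1]
```

Lowering the threshold lets a pattern fire with one missing layer, which is the role a majority stage plays when a detector layer is inefficient.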


# AM Chip Memory Layer

To save power, we use two different match-line driving schemes<sup>2</sup>:

- Current race scheme
- Selective precharge scheme



- Each layer stores a word position: 12 bits + 3 "don't care" bits (values 0, 1, x)

<sup>2</sup> K. Pagiamtzis and A. Sheikholeslami, "Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey", IEEE Journal of Solid-State Circuits, vol. 41, no. 3, March 2006

NSS 2011 (Valencia, Spain)

Alberto Stabile

24 Oct. 2011 5 / 10


# CAM layer timing diagram




### The full custom cell





64 patterns stacked vertically


The AMchip has an area of 14 mm<sup>2</sup>.

The CAM is organized as 22 columns × 12 rows of full-custom macro blocks.

Each block holds 64 patterns × 2 layers.

Between two rows of blocks sit the majority logic and the Fischer tree, both built with standard (STD) cells.











#### Ternary cells: "Don't care bits"

Setting the least significant bits to "don't care" matches the pattern layer at coarse resolution; using all the bits matches it at finer resolution. The coincidence window is programmable layer by layer and pattern by pattern<sup>a</sup>.

<sup>a</sup> "A New Variable Resolution Associative Memory for High Energy Physics", ATL-UPGRADE-PROC-2011-004
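The effect of "don't care" bits on the coincidence window can be sketched in a few lines. This is an illustrative model, not the chip's circuit: `care_mask` and `ternary_match` are hypothetical names, and the 15-bit word width (12 binary plus 3 ternary positions) is an assumption drawn from the layer description in these slides.

```python
WORD_BITS = 15      # assumption: 12 binary bits + 3 ternary positions
MAX_DONT_CARE = 3   # only the 3 least significant bits can be "don't care"


def care_mask(dont_care_bits: int, width: int = WORD_BITS) -> int:
    """Mask with 1s on the bits that must match and 0s on the don't-care bits."""
    if not 0 <= dont_care_bits <= MAX_DONT_CARE:
        raise ValueError("dont_care_bits must be between 0 and 3")
    full = (1 << width) - 1
    return full & ~((1 << dont_care_bits) - 1)


def ternary_match(stored: int, mask: int, hit: int) -> bool:
    """True when hit agrees with stored on every bit selected by mask."""
    return ((stored ^ hit) & mask) == 0


stored = 0b101101000  # 360, an arbitrary example word

# Fine resolution: no don't-care bits, only the exact word matches.
print(ternary_match(stored, care_mask(0), 365))  # False

# Coarse resolution: 3 don't-care bits widen the window to 8 adjacent words.
print(ternary_match(stored, care_mask(3), 365))  # True
print(ternary_match(stored, care_mask(3), 368))  # False
```

With `k` don't-care bits the window covers `2**k` consecutive words, so each pattern and layer can choose a window of 1, 2, 4, or 8 words in this model.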




#### AM chip status

Completed:

- Full-custom memory block layout and simulation with back-annotated schematics
- Floor plan of the entire chip, including IO cells and pad ring placement
- Place and route using the Cadence Encounter Foundation Flow
- Creation of a memory block Verilog model for full-chip simulation

In progress:

- Improvement of the Verilog model to add some new features
- Logic simulations to obtain exhaustive results
- Complete AMS simulation of some critical cases

Future:

- Enlarge the bank from 8k patterns per chip to 80k patterns per chip by increasing the area
- Study how to implement a power-saving architecture and full-custom design to gain memory density

#### AM chip summary (about 1M comparisons in parallel)

$$\text{Number of comparisons} = \text{Number of patterns} \cdot \text{Number of layers} \cdot \text{Number of bits}$$

$$1179648 = 8192 \cdot 8 \cdot 18$$
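The count above can be checked with a short calculation. The function name `parallel_comparisons` is hypothetical. The comment on how 18 bits per layer might arise (12 binary bits plus 3 ternary positions stored on 2 bits each) is an assumption, not something the slides state.

```python
NUM_PATTERNS = 8192    # patterns per chip in the current design
NUM_LAYERS = 8         # layers per pattern
BITS_PER_LAYER = 18    # assumption: 12 binary bits + 3 ternary positions x 2 bits


def parallel_comparisons(patterns: int, layers: int, bits: int) -> int:
    """Number of single-bit comparisons performed in parallel."""
    return patterns * layers * bits


total = parallel_comparisons(NUM_PATTERNS, NUM_LAYERS, BITS_PER_LAYER)
print(total)  # 1179648
```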
