# A Systolic Hough Transform Processor as a Second Level Trigger for Drift Chambers\*

F. Klefenz$^{1}$, W. Conen$^{1}$, R. Zoz$^{1}$, R. Männer$^{1,2}$

1) Lehrstuhl für Informatik V, Universität Mannheim, W-6800 Mannheim, Germany

2) Interdisziplinäres Zentrum für Wissenschaftliches Rechnen, Universität Heidelberg, W-6900 Heidelberg, Germany

#### **ABSTRACT**

A systolic processor has been developed that executes a parallel Hough transform. The system has been tailored to a specific pattern recognition task, the identification of particle tracks in the $(r,\phi)$ projection of the OPAL jet chamber. For all well defined tracks the starting angle and the radius of curvature are computed in 3.3 µs. The system consists of a Hough transform processor that identifies the tracks and an Euler processor that counts them by applying the Euler relation to the thresholded result of the Hough transform. For one sector of the detector a prototype system has been realized with 21 XILINX chips. It consists of 35×32 processing elements. The full scale system will use 26,880 processing elements. The processor can easily be adapted to different generalized Hough transforms and various detector geometries. The prototype has been functionally tested with OPAL test data sets. No deviations from the offline simulation have been found. The prototype operates at a clock rate of 40 MHz.

## 1. Introduction

The systolic processor described here finds and counts all circle segments in an image that share a common vertex point. It was designed for identifying particle tracks in the OPAL jet chamber at CERN¹ that originate from the interaction zone and appear as circle segments with an arbitrary starting angle and a minimal radius of curvature. The complete operation has to be finished within 5 µs.
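As a rough consistency check on the figures quoted above, the clock budget and the processing element count can be worked out as follows. The symbols $T_\text{clk}$, $N_\text{cycles}$, $N_\text{budget}$ and $N_\text{PE}$ are shorthand introduced only for this check, and the last line assumes that the full scale system replicates the 35×32 prototype array once for each of the 24 detector sectors, which is an inference from the numbers rather than a statement in the text.

$$
\begin{aligned}
T_\text{clk} &= \frac{1}{40\ \text{MHz}} = 25\ \text{ns} \\
N_\text{cycles} &= \frac{3.3\ \mu\text{s}}{25\ \text{ns}} = 132 \\
N_\text{budget} &= \frac{5\ \mu\text{s}}{25\ \text{ns}} = 200 \\
N_\text{PE} &= 24 \times 35 \times 32 = 26{,}880
\end{aligned}
$$

Under these assumptions the 3.3 µs decision time corresponds to 132 clock cycles, inside the 200 cycles that the 5 µs limit allows at 40 MHz.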
The pattern recognition process consists of four steps: 1) preprocessing of the detector data to adapt it to the systolic architecture, 2) a Hough transform that maps the detector coordinates in the $(r,\phi)$-plane onto new coordinates in a $(1/r_c,\phi_s)$-plane, where $r_c$ is the radius of curvature and $\phi_s$ the starting angle of a track, 3) applying a global threshold to the transform result, and 4) applying the Euler relation to the resulting binary image to compute the number of tracks. The system therefore consists of a preprocessor, a Hough transform processor that determines well defined tracks, and an Euler processor that counts them.

<sup>\*</sup> This work has been supported by the Gesellschaft für Schwerionenforschung Darmstadt (GSI), Darmstadt, Germany.

## 2. Principle of Operation

The OPAL detector consists of 24 independent sectors. Each track $i$ is defined by a set of track points $(r,\phi)_i$. Because interesting tracks come from the origin, each one is determined by its starting angle $\phi_s$ and its radius of curvature $r_c$. Together with the origin, every pixel $(r,\phi)$ in the detector image specifies a class of tracks with different $\phi_s$ and $r_c$. From Fig. 1 one can derive the relation between $(\phi_s, r_c)$ and $(r,\phi)$ as

$$\frac{r/2}{r_c} = \sin(\phi - \phi_s) \tag{1}$$

The histogramming method used for identifying particle tracks<sup>2</sup> is based on accumulating the number of pixels that vote for every possible track $(\phi_s, r_c)$. Solving Eq. (1) for the curvature gives $1/r_c = (2/r)\sin(\phi - \phi_s)$, so each pixel $(r,\phi)$ is mapped onto a sine curve with amplitude $2/r$ and phase shift $\phi$. This mapping is a special type of Hough transform. Applying it to every detector pixel yields one curve per pixel in the $(1/r_c,\phi_s)$-plane (Fig. 2). The curves of all pixels belonging to a track with parameters $(1/r_c,\phi_s)$ intersect in a common point.
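The voting scheme can be illustrated with a small software sketch. This is not the hardware implementation: the function name `hough_votes`, the bin ranges, and the synthetic test track are all hypothetical choices made for illustration, and only the mapping $1/r_c = (2/r)\sin(\phi - \phi_s)$ is taken from Eq. (1).

```python
import math

def hough_votes(pixels, phi_s_bins, curv_max, n_curv):
    """Accumulate votes in a (1/r_c, phi_s) histogram.

    pixels      -- iterable of (r, phi) hits, phi in radians
    phi_s_bins  -- candidate starting angles in radians (assumed binning)
    curv_max    -- curvature axis covers [-curv_max, +curv_max) (assumed range)
    n_curv      -- number of curvature bins (assumed)
    Returns hist[j][i] = votes for phi_s_bins[j] and curvature bin i.
    """
    hist = [[0] * n_curv for _ in phi_s_bins]
    bin_width = 2.0 * curv_max / n_curv
    for r, phi in pixels:
        for j, phi_s in enumerate(phi_s_bins):
            # Eq. (1) solved for the curvature: 1/r_c = (2/r) * sin(phi - phi_s)
            curv = (2.0 / r) * math.sin(phi - phi_s)
            i = math.floor((curv + curv_max) / bin_width)
            if 0 <= i < n_curv:  # votes outside the histogram are dropped
                hist[j][i] += 1
    return hist

# Hypothetical test track: 8 hits on one circle through the origin.
CURV_MAX, N_CURV = 0.02, 32
PHI_S_BINS = [0.05 * k for k in range(-8, 8)]
true_phi_s = PHI_S_BINS[10]
# Place the true curvature at the centre of curvature bin 20.
true_curv = -CURV_MAX + (20 + 0.5) * (2.0 * CURV_MAX / N_CURV)
# Invert Eq. (1): phi = phi_s + asin(r / (2 r_c))
hits = [(r, true_phi_s + math.asin(0.5 * r * true_curv))
        for r in range(20, 161, 20)]

hist = hough_votes(hits, PHI_S_BINS, CURV_MAX, N_CURV)
# (votes, phi_s index, curvature index) of the highest bin
peak = max((hist[j][i], j, i)
           for j in range(len(PHI_S_BINS)) for i in range(N_CURV))
print(peak)
```

In this sketch all eight sine curves pass through the bin of the true parameters, while at any other starting angle they spread over several curvature bins, so the histogram has a single dominant peak.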
The result of this Hough transform therefore gives the probability for the existence of all possible tracks. Fig. 3 shows the representation of an ideal track in the $(1/r_c,\phi_s)$-plane.

Fig. 1: Mapping of $(r,\phi)$ and $(1/r_c,\phi_s)$.

Fig. 2: Hough space.

Fig. 3: A track in the $(1/r_c,\phi_s)$-plane.

## 3. The data flow architecture

In the preprocessing stage a readout unit is assigned to every detector wire $k$. Each wire directly specifies the radial coordinate $r_k$. The readout unit extracts the drift times $t_i$ from the signal distributions. In a $t/\phi$-converter unit composed of bit-serial delay lines these drift times $t_i$ are directly transformed to the corresponding values $\phi_i$.

The input to the Hough transform processor is a continuous data flow of pixels $(r_k,\phi_i)$. In each time step one pixel row of the same $\phi$ is transferred in parallel to the Hough transform processor, and one column of the $(1/r_c,\phi_s)$-histogram is output. The essential operation is to generate the corresponding sine curves for all detector pixels $(r_k,\phi_i)$ and to increment all histogram bins that lie on these curves. The sine generation for each detector pixel is serialized in time, so that in every time step each sine curve is expressed by its increment $\Delta(1/r_c(\phi_s))$. In every time step these increments are entered into the histogram in parallel for all detector pixels.

The update of the histogram is realized with two functional building blocks: a set of 32 adder trees with 35 inputs each, and a set of activation units. In every time step each adder tree sums up its activation inputs and delivers one of the $(1/r_c,\phi_s)$ histogram values, so all adder trees together output one column of the $(1/r_c,\phi_s)$ histogram in parallel every time step. The activation units form a $32\times35$ activation matrix for the adder trees. Each wire is connected to its own activation unit.
Each activation unit has one interconnect to each of the 32 adder trees in a columnwise arrangement. Each activation unit, realized as a serial shift register with parallel output, creates the right activation pattern for its interconnects to the adder trees. The pixel input streams are transformed into successive activations of the adder trees by a fixed arrangement of delay elements and a fixed mapping of adder tree activations, defined separately for each activation unit. This interconnection pattern is programmed into the XILINX chips.

The Hough transform processor outputs a 2D histogram which contains peaks at all positions in the $(1/r_c,\phi_s)$-plane that correspond to well defined tracks. By setting a global threshold in the 2D histogram, a binary image is created that contains clusters of adjacent pixels at these positions. For a trigger decision these clusters have to be counted. This can easily be done with the help of the Euler relation:

$$\text{connectivity number} = n \cdot \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} - m \cdot \begin{bmatrix} x & 1 \\ 1 & 0 \end{bmatrix}$$

The connectivity number is computed as the difference between the numbers of times the two 2×2 patterns appear in the image. Since the operation is local, this counting can again easily be done in parallel by a systolic processor. The binary image created by thresholding is shifted into the pipeline in parallel, one column every time step. The processor consists of 32 pattern matching units (PMUs) for the left pattern and 32 PMUs for the right pattern. A PMU sets the registration line "n" to high if the left pattern is detected in two sequential columns, or line "m" to high if the right pattern is detected. Subsequently an accumulator sums up the number of occurrences of n and m, which are delivered by the output signals of the PMUs. Then an integrator unit sums up the partial results and computes the difference. This difference represents the number of tracks, which is taken as the trigger criterion.
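The Euler relation above can be checked with a short software sketch. It is a sequential illustration, not the systolic PMU pipeline; the function name `euler_number`, the zero padding around the image, and the reading of $x$ as a pixel whose value is ignored are assumptions made here.

```python
def euler_number(image):
    """Return n - m for a binary image given as a list of rows of 0/1.

    n counts 2x2 windows equal to [[1, 0], [0, 0]];
    m counts 2x2 windows matching [[x, 1], [1, 0]] with x ignored.
    """
    rows, cols = len(image), len(image[0])
    # Assumption: pixels outside the image are 0, so pad one pixel all round.
    padded = [[0] * (cols + 2)]
    padded += [[0] + list(row) + [0] for row in image]
    padded += [[0] * (cols + 2)]
    n = m = 0
    for r in range(rows + 1):
        for c in range(cols + 1):
            tl, tr = padded[r][c], padded[r][c + 1]
            bl, br = padded[r + 1][c], padded[r + 1][c + 1]
            if tl == 1 and tr == 0 and bl == 0 and br == 0:
                n += 1  # left pattern
            if tr == 1 and bl == 1 and br == 0:
                m += 1  # right pattern, top-left pixel not inspected
    return n - m

# Hypothetical thresholded histogram with two clusters:
# a 2x2 block and a diagonally connected pair.
image = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
clusters = euler_number(image)
print(clusters)

# A closed ring: one cluster enclosing one hole.
ring = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
```

With these two patterns, diagonally touching pixels count as one cluster. A general property of the relation is that every enclosed hole lowers the result by one, so the difference equals the number of clusters only when the thresholded clusters contain no holes; for `ring` the function returns 0.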
## 4. Implementation by field programmable gate arrays

All processing stages (activation units, adders, comparators, Euler processor, integrators, subtractors) have been realized with XILINX field programmable gate arrays. The prototype for one detector sector consists of 21 XILINX chips (Fig. 4). It operates at a clock rate of 40 MHz.

For the Hough transform in one detector sector sixteen XC3042 chips have been used. Two 35-input adder trees have been mapped into each XC3042. The activation matrix is distributed over the sixteen XC3042 chips, and in each chip two adder trees with 35 inputs are activated. The whole activation sequence is realized by interconnecting the activation lines of adjacent XC3042 chips. Thresholding is also done in the XC3042. Each XC3042 therefore contains two adder trees, two comparators, and 1/16th of the activation matrix.

The flow patterns of the distributed serial shift registers can be considered as the program of the systolic array. Since all other parts are uniform and static components which do not have to be redefined, this flow programming has been automated. The control flow pattern extracted from the simulation is directly converted to logic equations, the XILINX programming cycle is initiated, and the resulting bit stream is downloaded to the XILINX chips. The threshold used by the comparators is reprogrammable in a similar manner.

The Euler processor has been mapped into one XC3042 chip. It consists of 32 pattern matching units analyzing one column at a time in parallel.

Fig. 4: Board layout and mapping of functional units

## 5. Status

A prototype of the trigger processor for one sector of the OPAL jet chamber is being installed at CERN. It operates at 40 MHz, corresponding to a trigger decision time of 3.3 µs. A second system with an expected clock rate of 50 MHz is being set up.

## References

1. Heuer R.D., Wagner A.: The OPAL Jet Chamber; Nucl. Instr. Meth., Vol. A265 (1988) 11-19.
2. Männer R., Gläß J., Klefenz F.: Massively Parallel Systolic Processors for High-Speed Recognition of Simple Patterns; Proc. Parallel Computing Technology '91, Novosibirsk, USSR (1991) 98-108.