# **VERSATILE SERVICE FOR THE PROTECTION OF EXPERIMENTAL AREAS AT CERN**  16th Int. Conf. on Accelerator<br>  $\overrightarrow{O}$  ISBN: 978-3-95450-193-9<br>  $\overrightarrow{O}$ <br>  $\overrightarrow{B}$ <br>  $\overrightarrow{F}$ <br> *Abstract*<br>  $\overrightarrow{B}$ <br>  $\overrightarrow{B}$ <br>  $\overrightarrow{A}$ <br>  $\overrightarrow{B}$ <br>  $\overrightarrow{A}$ <br>  $\overrightarrow{B}$ <br>  $\overrightarrow{A}$ <br>  $\overrightarrow{B}$ <br>  $\overrightarrow{B}$ <br>  $\overrightarrow{A}$ <br>  $\overrightarrow{B}$ <br>

F. Valentini, M. Munoz Codoceo, P. Ninin CERN, Geneva, Switzerland

**2017). Any distribution of this work must maintain attribution to the author(s), title of the work, publisher, and DOI.** CERN hosts a number of other experimental areas with a rich research program ranging from fundamental physics to medical applications. The risk assessments have shown a large palette of potential hazards (radiological, electrical, chemical, laser, etc.) that need to be properly mitigated in order to ensure the safety of personnel working inside these areas. A Personnel Protection System, typically, accomplishes this goal by implementing a certain number of heterogeneous functionalities as interlocks of critical elements, management of a local HMI, data monitoring and interfacing with RFID badge readers. Given those requirements, reducing system complexity maint and costs are key parameters to be optimized in the solu- $\frac{12}{2}$  tion. This paper is aimed at summarizing the findings, in  $\frac{12}{2}$  terms of costs, complexity and maintenance reduction terms of costs, complexity and maintenance reduction, offered by a technology from National Instruments® based on cRIO controllers and a new series of SIL-3 certiof thi fied safety I/O modules. A use case based on a service for the protection of Class 4 laser laboratories will be described in detail.

#### **INTRODUCTION**

Any distribution The safety management of CERN has brought to light the need to mitigate the risks in a large number of differ- $\widehat{\Sigma}$ ent experimental areas of different configurations ranging  $\alpha$  from large caverns to small bunkers of a few square me- $\mathcal Q$  ters. These experiments are often the outcome of collabo- $\frac{8}{2}$  rations between CERN and worldwide research institutes  $\frac{5}{9}$  and, therefore, access to these areas is granted to a potentially large and heterogeneous population of users.

Considering the large number of facilities we are reequested to secure and the fact that the budget for safety is,  $\bigcirc$  in most of the cases, funded by the different home insti- $\frac{3}{2}$  tutes, there is a strong need to look for solutions being: *simple*, in terms of architecture and number of devices to be installed; *compact,* in terms of space occupied inside  $\frac{1}{2}$  the facility (a huge rack 2m high cannot fit in most of the cases); *easy to operate and maintain* and *cost effective*, especially considering that they might be used only during the short lifetime of the research activity. Content from this work may be used under the terms of the CC BY 3.0 licence ( $\epsilon$ 

Another key aspect to take into account is that, behind the purely safety interlock, a wide range of access control ಹಿ functionalities [1] are necessary in order to operate these  $\frac{5}{6}$  facilities, such as: RFID identification of users, verification of required access rights, count of users in a zone, implementation of dedicated graphical display of relevant information for local users, and remote control/monitoring capabilities for control room operators.

1 Anti-Hydrogen Trap models.

In this paper we investigate the possibility to employ the NI cRIO 9030 controller [2] in conjunction with the new Functional Safety modules from National Instruments® in order to increase the global reliability and to make it suitable for safety related applications. The 903X generation of cRIO controllers are particularly interesting due to the fact that NI replaced their proprietary OS with a new version based on Real-Time Linux. This adds much more flexibility to the controller, on one hand offering the possibility to interface the cRIO with a wide palette of devices such as: USB key-board, USB mouse, video monitors, smart cameras, etc. In addition, thanks to the Lab-VIEW programming environment, it offers the possibility to easily implement IT tasks, such as big file management, database connections, OPC communications, image processing, complex vector/matrix manipulation, mathematical analysis, etc.

Another interesting aspect of the LabVIEW environment is the possibility to fully validate the entire user software even if the real hardware is not connected thanks to the emulation capabilities and testing tools of the programming environment. This allows to spot all logical bugs before the site commissioning tests.

#### **CASE STUDY**

The specific case study considered to demonstrate our technical proposal is the  $ATRAP<sup>1</sup>$  experiment [3] is has been used to apply our technical solution. ATRAP is a collaboration between CERN and Harvard University, with the intent to produce anti-hydrogen atoms and investigate their properties against their matter equivalents. To perform these measurements, a wide set of powerful class 4 lasers are employed for spectrography and for cooling the anti-electron particles down to an energy compatible with the formation of anti-hydrogen.

The ATRAP operational modes foresee to run these laser beams inside specific interlocked containment boxes that can be either closed or open. In the latter case no-one can access to the room with the exception of a restricted number of experts, who can only gain access by entering a personal pin code into a keypad.

The securing strategy for ATRAP experiment consist of a combination of two types of functionalities: access control and safety interlocks. The access control functionalities being the following:

 RFID CERN badge identification in addition to the keypad code in order to log and track all entries in the room. For this, an RFID reader has to be controlled and the access card data has to be checked against a central database storing all authorization

- A local HMI to provide detailed information to the laser operators on the powering state of all lasers and on the safety conditions of the room.
- Remote monitoring of all system I/O signals and other internal variables related to the safety conditions. The SCADA system used is a CERN inhouse software system largely employed for supervision and maintenance purposes.

The Interlock Functions (IF) were identified in accordance with the risk assessment prescriptions that recommended a factor 100 (SIL 2) of risk reduction for ATRAP and are the following:

- **IF-1**: to *detect important operational errors and dangerous situations as the opening of a door when lasers are powered on.*
- **IF-2**: to *manage transitions between different operational modes allowing to safely operate the lasers*.
- **IF-3**: *to handle all warning signs and the evacuation sirens of the room*.
- **IF-4**: *to cut off power to every laser individually if all safety conditions are not met*.
- **IF-5**: *to maintain in closed position the two access doors by controlling a locking magnet.*



Figure 1: Interlock Functions dependence schema.

The safety objective of the system is ensured against *single point of failure* by an overlap of the interlock functions. For instance, when lasers are powered and their containment boxes are opened, the access doors are kept locked by two electromagnets (IF5) but, in case of failure, other functions of the system can ensure the complementary mitigating actions: *interruption of laser power* (IF4)<sup>*,*</sup> *activation of an evacuation alarm* (IF3)*, maintaining of light on all laser warning signs* (IF3).

It is an evidence that this interlock logic coupled with access control functionalities appears too complex to be constructed with a hardwired system, based only on relays or electronic components. For this reason we moved towards a solution based on a programmable controller. After analysing and prototyping different market proposals, such as the Allan-Bradley® Guard Master and Siemens PLC 1215F, we pursue our investigation on the cRIO 9030 that offers automation and IT capabilities.

## **NI TECHNOLOGY**

#### *The CompactRIO Controller*

The main component of the proposed architecture is on the cRIO 9030 controller. It is made up of a real-time Linux OS running on a Dual-Core Intel Atom processor, an Ethernet connection embedded card, an FPGA reconfigurable chip and a wide series of modules for the I/O tasks.



Figure 2: Architecture of the CompactRIO controller.

The real-time part of the controller includes a microprocessor for implementing network communication, data logging, control algorithms, and supporting deterministic controls thanks to the presence of a real-time OS.

The FPGA user-programmable module is completely independent from the microprocessor and it is typically used to implement high-speed controls, inline data processing, or complex timing and triggering operations.

#### *The Functional Safety Modules*

The other core component of the architecture is the new NI 9035 SIL-3 Functional Safety modules. These add a safety integrity level (according to IEC 61508) to the cRIO controller and make it suitable to operate in safety critical applications.



Figure 3: Safety Module NI 9350.

under the terms of the CC BY 3.0 licence ( $\odot$  2017). The NI 9350 are not simply a SIL rated I/O device, it features a self-contai8ned logic solver that can be used to ್ಲಿ implement safety functions independently from the main cRIO controller's chassis where it can be mounted. While these modules can provide the status of the inputs to the cRIO, the safety logic inside the NI 9350 continues to run regardless of the state and condition of the controller.

The logic in each safety module can be programmed with the Functional Safety Editor, an easy-to-use standalone tool that incorporates a graphical programming technique based on state machine formalism.

Content from this work may be used under the terms of the CC BY 3.0 licence ( $@$ 

 $\mathbf{na}$ work 1

from

වී

2017). Any distribution of this work must maintain attribution to the author(s), title of the work, publisher, and DOI.

tain maint

Any distribution of this work m

attribution to the author(s), title of

publisher, and DOI.

work, the

Т In total, each module incorporates 8 digital (sink) inpublisher, puts and 8 digital (source) outputs that make them very suitable in most of the cases but that require some workarounds when the safety logic is more complex.

### **PROTOTYPING OF THE SOLUTION**

work, of the y The first approach followed in our prototype was based on a clear separation between access control functionali- $\frac{1}{2}$  ties, all implemented in the real-time cRIO Linux part, and the interlock functions, directly coded inside the NI 9350 safety modules. However, due to the I/O limitations mentioned in the previous chapter, this approach appeared quite inefficient to be put in place, both in terms of number of modules needed to cope with the whole I/O space of our use case (*5 modules were required to pilot the 10 lasers individually*), and in terms of programming effort to distribute our complex safety logic among the 5 safety modules.

 $\circ$  2017). Any distribution of this work must maintain attribution to the author(s), title of the work, publisher, and DOI.  $\circ$  and DOI.  $\circ$  and  $\circ$  and ia. For this reasons, the solution we propose consists in using the NI safety modules to perform highly reliable diag-E nostic checks of the health state of the standard I/O mod- $\frac{12}{2}$  ules and the cRIO controller, this in order to reveal dan-<br> $\frac{1}{2}$  consume hardware faults during the approximal phases. The gerous hardware faults during the operational phases. The diagnostic checks constitute, then, the Safety Instrumented Functions of the system. This choice allows to drasti-골 cally reduce the number of safety modules (only one is ď then required) and to overcome the limitations of the listribution programming tasks. Furthermore it can be a very interesting technique to make existing cRIO based control systems more reliable with minimal development efforts.

#### ÅДУ *General Architecture*

2017). The main building blocks of our system architecture are the following:

**Laser Veto Relays (x10)**. Every relay is used in failsafe manner to allow the powering of the 10 lasers inde- $\frac{5}{2}$  pendently via normally open contacts of the relays (make contact = powering allowed). ©

 $\frac{3.0}{5}$ **General Veto Relay (x1)**. This relay is used in failsafe  $\approx$  manner to switch off the main power feeding all other  $\gamma$  veto relays. This is done via normally open contact of the relay (make contact) that, when opened, causes an imme- $\frac{1}{6}$  diate shutdown of all lasers.

**Input modules NI 9425 (x2)**: sink digital input mod-E Input modules NI 9425 (x2): sink digital input mod-<br>gules used to acquire the state of the door positions (two in ATRAP), the feedback of the door locking magnets, the  $\overline{\overline{b}}$  emergency stops, the laser powering modes and the posi- $\frac{1}{2}$  tion of the 10 laser covers. The different input signals are distributed between the two input modules in a way to minimize the impact on safety objectives of a failure in a single acquisition channel. Content from this work may be used under the terms of the CC BY 3.0 licence ( $\epsilon$ 

**Output module NI 9475 (x1)**: source digital output module used to pilot independently the 10 laser powering relays, the door locking magnets, the evacuation siren and  $\frac{16}{12}$  the laser ON warning signs.

**The Controller cRIO 9030 (Real-Time part)**: the refrom al-time Linux part of the controller is used to implement all non-safety functionalities, including RFID badge read-Content

er, management of a local HMI and dispatching of monitoring data to the CERN SCADA system (TIM).

**The Controller cRIO 9030 (FPGA part)**: the FPGA integrated chip is used to implement all interlock functions described above. The FPGA controls directly the three I/O standard modules and, via the internal bus of the cRIO backplane, it dispatches the states of all I/Os and all the internal variables to the real-time part for monitoring purposes.

**Safety SIL Module NI 9350 (x1)**: digital I/O module SIL-3 rated used to perform diagnostic checks on the cRIO controller and on the three I/O modules and to pilot the General Veto Relay.

#### *Implementation of the Safety Functions*

The FPGA part of the cRIO controller is used to compute all interlock functions while a single NI 9350 safety module is employed to perform a set of four Safety Instrumented Functions (SIF) consisting in a series of periodic diagnostic checks to ensure the correct behaviour of the FPGA chip and all the I/O modules. This, as we will show in the next paragraph, allows to increase the global system reliability without the need to translate the whole set of interlock functions, easily and intuitively implementable in LabVIEW, into the language of the safety modules.

The safety instrumented functions can be summarized as follows (see figure 4):

- **SIF-1**: *Diagnostic of Input Module 1*. It sends a check signal (DO0) to the NI 9425 Input module **1**, which has to be acknowledged (on DI0 channel) within a given time otherwise a general stop (DO2) of the laser room is performed.
- **SIF-2**: *Diagnostic of Input Module 2*. It sends a check signal (DO1) to the NI 9425 Input module **2**, which has to be acknowledged on DI1 within a certain time otherwise a general stop (DO2) of the laser room is performed.
- **SIF-3**: *Alive Watchdog Check*. DI2 awaits for a periodic alive message certifying the correct execution state of the FPGA's program main cycle. The nonarrival of this signal implies a general stop (DO2) of the laser room.
- **SIF-4**: *Software Fault Check*. The SW fault signal is computed by the FPGA user program and expected on DI3. It is built by a specific LabVIEW routine running inside the main program cycle; it periodically verifies the most relevant internal variable values against a specific set of invariant properties or logical assertions [4] that must be always TRUE during each loop. In the opposite case a general stop (DO2) of the laser room is performed.

The above safety functions perform four periodic diagnostic checks capable of revealing the most common dangerous failures that can affect the standard I/O modules or the FPGA chip and automatically act on the General Veto Relay to set the entire laser facility into a safe controlled state.

0000

DIO Ack INP-1 NI-9475

DI2 DIAG-1 NI-9475

Ack INP-2 NI-9475

**DIAG-2 NI-9475** 

COM

COM

 $DI3$ 

 $D14$ 

D<sub>15</sub>

D<sub>16</sub>

COM

COM

D<sub>17</sub>

COM

 $D11$ 

Chk INP-1 DOO

Chk\_INP-2 DO1

<mark>lelay</mark><br>- Geh\_sTOP DO2

COM

COM

DO<sub>3</sub>

DO<sub>4</sub>

COM

DO<sub>5</sub>

DO6

COM

DO<sub>7</sub>

Vsup

Figure 4: NI 9350 Cabling Interface.

**RELIABILITY CALCULATION**  In order to evaluate the increase in reliability that can be obtained with the integration of a SIL module NI 9350 into a non-safety cRIO chassis, we performed a FMEA on the system (Table 1). The following assumptions were taken for the implementation of the statistical model:



- 1. **Hp-1**: the failure rates  $(\lambda)$  for dangerous failures of the different components are derived from the general National Instruments official MTBFs and we assume that all different failure modes of the components are uniformly distributed.
- 2. **Hp-2**: this study does not take into account the final elements of the studied architecture, as relays, door position sensors, etc. It is mainly focused on contributions to dangerous failures given by all NI devices in the system.
- 3. **HP**-**3**: the failure rate of the NI 9030 controller is assimilated to the failure of its FPGA (*Xilinx Kintex-7*) where the interlock functions are performed. The reliability data are taken from the Xilinx reliability manual [4] that explain how to estimate the FIT (Failure in Time per billion of hours) in function of the model of chip and the size of embedded RAM for the most redoubtable failure on an FPGA: *the single event upset (SEU)* [5]. The value used is then relative to this particular failure event and calculated on the base of our model of FPGA, hosting 5MB of RAM.
- 4. **Hp**-**04**: according to NI technical specifications the failure rate  $(\lambda)$  per hour applicable for the SIL-3 module is 1E-08.



Table 1: Failure Mode and Effect Analysis (FMEA) for the non-SIL rated part of the system

Бug

16th Int. Conf. on Accelerator and Large Experimental Control Systems ICALEPCS2017, Barcelona, Spain JACoW Publishing<br>
doi:10.18429/JACoW-ICALEPCS2017-THPHA108<br>  $\Omega$ 

and On the base of these reliability data we can easily compublisher. pute via a Fault Tree Analysis (FTA) the probability of dangerous system failure (PFH) assuming to have a mission time (proof testing interval) of 1 year and 30s time of diagnostic interval performed by the 9350 SIL module. work,

The probability of a system failure, potentially leading to the loss of safety mission, during 1 year time can then be estimated by the formula:

g to the loss of safety mission, during 1 year time can then  
\n
$$
g_{\text{tot}}^2
$$
 be estimated by the formula:  
\n
$$
g_{\text{tot}}^2 = P_r(F_{M2})P_r(D) + P_r(F_{M4}) + P_r(F_{M6})P_r(D) + P_r(F_{M8}) + P_r(F_{M9})P_r(D)
$$
 (1)  
\n
$$
g_{\text{tot}}^2
$$
 were  $P_r(D)$  is the probability of dangerous failure given by the NI 9350 module where the four safety functions

Were  $P_r(D)$  is the probability of dangerous failure given by the NI 9350 module where the four safety functions are performed. Every  $P_r(X)$  for 1 year of mission time is then given by the following exponential probability dis- $\frac{1}{2}$  are perform<br>the give<br> $\frac{1}{2}$  tribution:

$$
P_r(X) = (1 - e^{-\lambda_X * 8760})
$$
 (2)

maintain From above we can conclude that the usage of diagnosmust tic functions basically zeroed the contribution of failure rates FM.2, FM.6 and FM.9 lowering down the global work: system probability of failure from *0.0107* to *0.0037 events per year* (corresponding to a failure every 270 years) which is compatible with a **SIL-2** grade.  $\sigma$ 

The employment of the safety module allows to spot a distribution wide series of failures and to react in a safe/deterministic manner.

The main advantage of this approach is that it is possi- $\geq$  ble to reach a high SIL level in a cRIO based architecture with a minimum development effort; it allows to maintain  $\hat{\le}$  the interlock logic inside the FPGA part of the controller and it makes the development of safety related applica- $\mathcal Q$  tions much more easy (take maximum advantage from the LabVIEW environment) and cost effective (one SIL rated  $\frac{5}{2}$  module is necessary).  $\circledcirc$  2017). Any distribution of this work must maintain attribution to the author(s), title of the work, publisher, and DOI.

#### **CONCLUSION**

Content from this work may be used under the terms of the CC BY 3.0 licence ( $\epsilon$  $3.0$ The current market offers a large range of controllers  $\mathbf{N}$ and devices suitable for safety applications from many g vendors and they strongly compete with the traditional the PLC technologies. Several requests to secure small exper- $18$  of t imental facilities have led us to investigate the most suit $em$ able technology in terms of: low-cost, simplicity, maintainability, availability; and also to overcome the limitathe i tions we experienced with the safety-PLCs. As reported in under [6]: "*PLC programs do not deal with graphical interfaces, large data structures; they do not create files, do not*   $used$ *perform complex operations*". This is a limiting factor for the implementation of *access control* functionalities and it is typically resolved by integration of different COTS<br>Eproducts [11] which may are: products [1]; which may cause an unpredictable increase work<sub>1</sub> in complexity of the final system.

Our investigation highlighted that a high level of Safety this Integrity (SIL-2) is achievable using technologies that offer a wide connectivity and bring new solutions to control serial devices, to communicate with external datatent bases and to implement local HMIs. Our work was conducted in accordance to the IEC 61511 standard according to which the SIL level of a component can be increased by adding diagnostics functionalities.

In the domain of Functional Safety, the main driving principle of all systems is: *keep it simple*. The reason is that a Safety Instrumented System is intended to perform really simple but critical actions for which failures due to logical bugs are unacceptable. We believe in fact that the massive usage of software for implementing safety instrumented functions may present several potentially serious problems [7], evident when the user's software, in absence of strict and clear development guidelines, becomes difficult to fully test and to maintain. It is largely recognized that in any system the failures caused by software dominate those caused by hardware.

For the above reasons we are also investigating other kinds of technologies, mainly based on hardware, in order to minimize the impact of software on the global system reliability. Even though at the moment we do not have a complete and satisfactory simple hardware solution fitting all our requirements, we believe that our approach goes in the direction of this simplicity. The LabVIEW tool offers an ideal environment for the safety functions programming. Furthermore, it features powerful testing functionalities (as the tool *TestStand* [8]) that allow to increase significantly the quality of user's software and to save time during the system commission phase.

#### **REFERENCES**

- [1] *Integration of heterogeneous access control functionalities using the new generation of NI cRIO 303X controllers,* ICALEPCS15 proceedings, MOPGF143, F. Valentini et al. Melbourne, AUSTRALIA, October, 2015.
- [2] NI cRIO-9030, http://www.ni.com/enus/support/model.crio-9030.html
- [3] *Trapped Antihydrogen in Its Ground State*, Dissertation Thesis of HARVARD University, P. J. Richerme, Cambridge, Massachusetts, USA, 2012.
- [4] *Loop invariants: analysis, classification, and examples*, ACM Computing Surveys, Vol. 46, no. 3, Carlo A. Furia *et al.,* Feb. 2014.
- [5] *Device Reliability Report*, XILINX Report V10.6.1, USA, July. 2017.
- [6] *What is special about PLC software model-checking?* ICALEPCS17 proceedings, THPHA159, D. Darvas, E. Blanco Vinuela, Barcelona, SPAIN, Oct. 2017.
- [7] *Evaluation of Safety-Critical Software*, ACM Computing Surveys, Vol. 33, no. 6, D. L. Parnas *et al.,* Jun. 1990.
- [8] TestStand tool, http://www.ni.com/teststand/whatis/