1.
Analysis of DIRAC's behavior using model checking with process algebra
/ Remenska, Daniela (NIKHEF, Amsterdam ; Amsterdam U.) ; Templon, Jeff (NIKHEF, Amsterdam) ; Willemse, Tim (Eindhoven, Tech. U.) ; Bal, Henri (Amsterdam U.) ; Verstoep, Kees (Amsterdam U.) ; Fokkink, Wan (Amsterdam U.) ; Charpentier, Philippe (CERN) ; Diaz, Ricardo Graciani (Barcelona U.) ; Lanciotti, Elisa (CERN) ; Roiser, Stefan (CERN) et al.
DIRAC is the grid solution developed to support LHCb production activities as well as user data analysis. It consists of distributed services and agents delivering the workload to the grid resources. [...]
2012 - 10 p.
In : Computing in High Energy and Nuclear Physics 2012, New York, NY, USA, 21 - 25 May 2012, pp.052061
2.
LHCb: Analysing DIRAC's Behavior using Model Checking with Process Algebra
Reference: Poster-2012-222
Created: 2012. - 1 p.
Creator(s): Remenska, Daniela
DIRAC is the Grid solution designed to support LHCb production activities as well as user data analysis. Based on a service-oriented architecture, DIRAC consists of many cooperating distributed services and agents delivering the workload to the Grid resources. Services accept requests from agents and running jobs, while agents run as lightweight components, fulfilling specific goals. Services maintain database back-ends to store dynamic state information of entities such as jobs, queues, staging requests, etc. Agents use polling to check for changes in the service states, and react to these accordingly. A characteristic of DIRAC's architecture is the relatively low complexity in the logic of each agent; the main source of complexity lies in their cooperation. These agents run concurrently, and communicate using the services' databases as a shared memory for synchronizing the state transitions. Although much effort is invested in making DIRAC reliable, entities occasionally get into inconsistent states, leading to a potential loss of efficiency in both resource usage and manpower. Tracing and fixing the root cause of such behaviors becomes a formidable task due to the inherent parallelism present. In this paper we propose the use of rigorous methods for improving software quality. Model checking is one such technique for analysis of an abstract model of a system, and verification of certain properties of interest. Unlike conventional testing, it allows full control over the execution of parallel processes and also supports exhaustive state-space exploration. We used the mCRL2 language and toolset to model the behavior of two critical and related DIRAC subsystems: the workload management and the storage management system. mCRL2 is based on process algebra, and is able to deal with generic data types as well as user-defined functions for data transformation. This makes it particularly suitable for modeling the data manipulations made by DIRAC's agents. By visualizing the state space and replaying scenarios with the toolkit's simulator, we have detected critical race conditions and livelocks in these systems, which we have confirmed to occur in the real system. We further formalized and verified several properties that were considered relevant. Our future direction is exploring to what extent a (pseudo)automatic extraction of a formal model from DIRAC's implementation is feasible. Given the highly dynamic features of the implementation platform (Python), this is a challenging task.
Related links: Conference: CHEP 2012
© CERN Geneva
Access to files
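The abstract above contrasts model checking with conventional testing. As a rough illustration of the underlying idea, here is a self-contained Python sketch (purely hypothetical; it is neither mCRL2 nor DIRAC code, and the agent names and statuses are invented) that enumerates every interleaving of two agents which read and then update a job's status through a shared "database", exposing the kind of lost-update race that a handful of test runs can easily miss.

# Hypothetical sketch of exhaustive state-space exploration (not DIRAC code).
# Two agents each perform two atomic steps: read the shared job status, then
# write a new status if the job still looked "Waiting" when it was read.
# Exploring every interleaving finds the runs in which both agents read
# "Waiting" before either writes, so one transition is silently overwritten.

def make_agent(new_status):
    def read(shared, local):
        return shared, shared                      # copy shared status into the local view
    def write(shared, local):
        return (new_status if local == "Waiting" else shared), local
    return [read, write]                           # the agent's atomic steps, in order

AGENTS = {"StagingAgent": make_agent("Staged"),
          "MatcherAgent": make_agent("Matched")}

def explore():
    # A state is (shared status, per-agent program counter, per-agent local copy).
    init = ("Waiting", {a: 0 for a in AGENTS}, {a: None for a in AGENTS})
    stack, seen, races = [(init, [])], set(), []
    while stack:
        (shared, pcs, locs), trace = stack.pop()
        key = (shared, tuple(sorted(pcs.items())), tuple(sorted(locs.items())))
        if key in seen:
            continue
        seen.add(key)
        enabled = [a for a in AGENTS if pcs[a] < len(AGENTS[a])]
        if not enabled:
            # Terminal state: if both agents saw "Waiting", both wrote, and the
            # later write clobbered the earlier one -- a lost update.
            if all(v == "Waiting" for v in locs.values()):
                races.append((shared, trace))
            continue
        for a in enabled:                          # branch over every scheduling choice
            new_shared, new_local = AGENTS[a][pcs[a]](shared, locs[a])
            stack.append(((new_shared,
                           dict(pcs, **{a: pcs[a] + 1}),
                           dict(locs, **{a: new_local})), trace + [a]))
    return races

for final_status, trace in explore():
    print(f"race: final status {final_status!r} after interleaving {trace}")

A model checker such as mCRL2 applies the same principle at a much larger scale, checking formalized properties against every reachable state of the model rather than against a few hand-picked schedules.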
3.
Fulltext - CERN library copies
5.
The LHCb Distributed Computing Model and Operations during LHC Runs 1, 2 and 3
/ Roiser, Stefan (CERN) ; Ramo, Adria Casajus (Barcelona U.) ; Cattaneo, Marco (CERN) ; Charpentier, Philippe (PIC, Bellaterra) ; Clarke, Peter (Edinburgh U.) ; Closier, Joel (CERN) ; Corvo, Marco (INFN, Padua) ; Falabella, Antonio (INFN, Bologna) ; Molina, José Flix (Caracas, IVIC) ; Medeiros, Joao Victor De Franca Messias (Rio de Janeiro, CBPF) et al.
SISSA, 2015
- Published in : PoS ISGC2015 (2015) 005
Fulltext: PDF; External link: Published version from PoS
In : International Symposium on Grids and Clouds 2015, Taipei, Taiwan, 15-20 Mar 2015, pp.005
6.
Formalising and analysing the control software of the Compact Muon Solenoid Experiment at the Large Hadron Collider
/ Hwong, Yi Ling (CERN) ; Keiren, Jeroen J.A. (Eindhoven, Tech. U.) ; Kusters, Vincent J.J. (CERN ; Zurich, ETH) ; Leemans, Sander (CERN ; Eindhoven, Tech. U.) ; Willemse, Tim A.C. (Eindhoven, Tech. U.)
The control software of the CERN Compact Muon Solenoid experiment contains over 30,000 finite state machines. These state machines are organised hierarchically: commands are sent down the hierarchy and state changes are sent upwards. [...]
arXiv:1101.5324.-
2011 - 18 p.
- Published in : Sci. Comput. Program.
Fulltext: PDF; External link: Preprint
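The record above describes a strict hierarchy in which commands travel downwards and state changes travel upwards. A minimal Python sketch of that pattern (illustrative only; the node, command, and state names are invented and do not come from the CMS control software):

# Hypothetical command-down / state-up hierarchy (not CMS code).
class FSMNode:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.state = "OFF"

    def send_command(self, command):
        # Commands propagate down the hierarchy to the leaves.
        for child in self.children:
            child.send_command(command)
        if not self.children:                     # a leaf acts on the command itself
            self.state = {"SWITCH_ON": "ON", "SWITCH_OFF": "OFF"}.get(command, self.state)
        self.update_state()

    def update_state(self):
        # State changes propagate upwards: a parent summarises its children.
        if self.children:
            states = {child.state for child in self.children}
            self.state = states.pop() if len(states) == 1 else "MIXED"

# A tiny three-level hierarchy: one top node, two subsystems, four leaves.
leaves = [FSMNode(f"channel{i}") for i in range(4)]
subsystems = [FSMNode("sub0", leaves[:2]), FSMNode("sub1", leaves[2:])]
top = FSMNode("detector", subsystems)
top.send_command("SWITCH_ON")
print(top.state)                                  # "ON" once every leaf reports ON

In the real system there are over 30,000 such state machines, which is what makes formal analysis of the hierarchy worthwhile.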
7.
DIRAC - Distributed Infrastructure with Remote Agent Control
/ Tsaregorodtsev, A. (Marseille, CPPM) ; Garonne, V. (Marseille, CPPM) ; Closier, J. (CERN) ; Frank, M. (CERN) ; Gaspar, C. (CERN) ; van Herwijnen, E. (CERN) ; Loverre, F. (CERN) ; Ponce, S. (CERN) ; Graciani Diaz, R. (Barcelona U.) ; Galli, D. (INFN, Bologna) et al.
This paper describes DIRAC, the LHCb Monte Carlo production system. DIRAC has a client/server architecture based on: Compute elements distributed among the collaborating institutes; Databases for production management, bookkeeping (the metadata catalogue) and software configuration; Monitoring and cataloguing services for updating and accessing the databases. [...]
cs/0306060; CHEP-2003-TUAT006.-
Geneva : CERN, 2003 - 8 p.
- Published in : eConf C0303241 (2003), pp. TUAT006
Fulltext: PDF; External link: Proceedings write-up on eConf
In : 2003 Conference for Computing in High-Energy and Nuclear Physics, La Jolla, CA, USA, 24 - 28 Mar 2003, pp.TUAT006
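As a loose illustration of the client/server architecture listed in the abstract above, the sketch below invents a minimal production service that hands out queued Monte Carlo jobs from its database and records finished ones in a stand-in for the bookkeeping catalogue, with a site agent at a compute element pulling work from it. The class names and methods are hypothetical, not DIRAC's actual interfaces.

# Hypothetical client/server sketch (not the DIRAC API).
import queue

class ProductionService:
    """Server side: wraps the production-management database."""
    def __init__(self, jobs):
        self._pending = queue.Queue()
        for job in jobs:
            self._pending.put(job)
        self.bookkeeping = []                     # stand-in for the metadata catalogue

    def request_job(self):
        return None if self._pending.empty() else self._pending.get()

    def report_done(self, job, output):
        self.bookkeeping.append((job, output))

class SiteAgent:
    """Client side: runs at a compute element and pulls work from the service."""
    def __init__(self, service):
        self.service = service

    def run_once(self):
        job = self.service.request_job()
        if job is not None:
            self.service.report_done(job, f"simulated events for {job}")

service = ProductionService(["mc_job_1", "mc_job_2"])
agent = SiteAgent(service)
agent.run_once()
agent.run_once()
print(service.bookkeeping)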
8.
DIRAC Workload Management System
/ Paterson, S (CERN)
DIRAC (Distributed Infrastructure with Remote Agent Control) is the Workload and Data Management system (WMS) for the LHCb experiment. The DIRAC WMS offers a transparent way for LHCb users to submit jobs to the EGEE Grid as well as local clusters and individual PCs. [...]
2007
In : 2nd EGEE User Forum, Manchester, UK, 09 May - 11 May 2007, pp.127
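A short sketch of what submitting jobs "transparently" can look like, under the assumption of a common backend interface; the names below are invented for illustration and are not DIRAC's API. The user calls one submit function, and whether the job lands on the Grid, a local cluster, or an individual PC stays hidden behind that interface.

# Hypothetical uniform-submission sketch (not the DIRAC WMS API).
from abc import ABC, abstractmethod

class Backend(ABC):
    @abstractmethod
    def submit(self, executable: str) -> str: ...

class GridBackend(Backend):
    def submit(self, executable):
        return f"grid job id for {executable}"

class LocalClusterBackend(Backend):
    def submit(self, executable):
        return f"batch job id for {executable}"

class StandalonePCBackend(Backend):
    def submit(self, executable):
        return f"local process id for {executable}"

def submit_job(executable, backends):
    # The user never picks the resource; the first available backend is used here.
    return backends[0].submit(executable)

print(submit_job("analysis.py", [GridBackend(), LocalClusterBackend(), StandalonePCBackend()]))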