This document discusses the synthesis of monitoring and simulation processes for data storage and processing in large physics experiments such as the Large Hadron Collider. It proposes a scheme called SyMSim that uses real monitoring data from the grid computing infrastructure to generate input for simulations. This allows the simulations to model current job workflows and resource usage accurately, and to predict system performance under different scenarios in order to optimize resources and prevent problems.
Synthesis of the simulation and monitoring processes for the data storage and big data processing development in physical experiments

Presented by: Rahul Shukla (C-62), Nimit Sahai (C-67)
The LHC Computing Challenge

- The Large Hadron Collider (LHC) delivered billions of recorded collisions to the experiments in Run 1 (2010-2012); ~100 PB of data are stored at CERN on tape.
- The Worldwide LHC Computing Grid (WLCG) provides compute and storage resources for data processing, simulation and analysis in more than 200 Tier-1 and Tier-2 centers: ~300k cores, ~200 PB of disk, ~200 PB of tape.
- The computing challenge was a great success: an unprecedented data volume was analyzed in record time, delivering major scientific results (e.g. the Higgs boson discovery).
Considerable LHC renovation is under way to start Run 2 in 2015, when the event reconstruction time will double while the event rate to storage is expected to grow by a factor of 2.5. We are entering the Big Data era! This is needed for potentially new physics, but it poses great challenges for LHC computing:

- a large increase in CPU and WLCG resources;
- combined grid and cloud access;
- distributed parallel computing;
- an overhaul of most simulation and analysis software.

Such a fast evolution of LHC distributed computing demands careful dynamic simulation of all processes for data storage, transfer and analysis.
Make the simulation a continuous process: use the statistics accumulated during operation as additional input to improve the accuracy of the results, as in the sketch below.
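To make the idea of a continuous simulation process concrete, here is a minimal sketch (assuming Python and a hypothetical stream of monitored CPU times) of how a simulation input parameter can be refreshed incrementally as monitoring records arrive:

```python
# A minimal sketch of the "continuous simulation" idea: a simulation input
# parameter (here, mean CPU time per job) is refreshed from statistics
# accumulated during operation. The record values are illustrative.
from dataclasses import dataclass

@dataclass
class RunningStat:
    """Incrementally updated mean/variance (Welford's algorithm)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

# As new monitoring records arrive, the simulated job-length
# distribution parameters follow the observed system.
cpu_time = RunningStat()
for observed_seconds in [3600, 5400, 4200, 3900]:   # illustrative values
    cpu_time.update(observed_seconds)
print(cpu_time.mean, cpu_time.variance)
```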
What is simulation for?

- Identification of the system's weak points (channels, queues, network capacity and other "bottlenecks").
- Evaluation of different system configurations (grid infrastructure architecture, resource configuration, number of resource centers, etc.).
- Testing of new data-access algorithms and replication strategies.
Simulation allows one to:

- Predict grid system performance under various changes: different workloads, system configurations, and scheduling heuristics (see the sketch after this list).
- Predict and prevent a number of unexpected situations.
- Optimize the equipment needed for data transfers and storage, taking into account minimum cost and other project requirements.
- Optimize resource distribution between user groups.
- Test the functioning of the system to identify its "bottlenecks" and other weak points.
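As one illustration of this kind of prediction, the following sketch replays a fixed job mix through a pool of identical CPU slots under two queue orderings and compares the mean waiting time; the job lengths, slot count and policies are illustrative assumptions, not taken from the slides:

```python
# A minimal sketch of using simulation to compare scheduling heuristics:
# the same job mix runs on a pool of identical CPU slots under two queue
# orderings, and the mean waiting time is compared.
import heapq

def simulate(job_lengths, n_slots, order):
    """Run jobs (all submitted at t=0) on n_slots; return mean wait time."""
    queue = sorted(job_lengths) if order == "shortest-first" else list(job_lengths)
    free_at = [0.0] * n_slots          # next free time of each slot
    heapq.heapify(free_at)
    total_wait = 0.0
    for length in queue:
        start = heapq.heappop(free_at) # earliest available slot
        total_wait += start            # job waited from t=0 until 'start'
        heapq.heappush(free_at, start + length)
    return total_wait / len(queue)

jobs = [600, 30, 1200, 45, 300, 90, 2400, 60]      # CPU seconds
for policy in ("fifo", "shortest-first"):
    print(policy, simulate(jobs, n_slots=2, order=policy))
```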
Challenge

Continuous development of grid systems requires continuous adjustment of the simulation parameters, and it is necessary to predict the behavior of the system under significant changes. For this, operational statistics obtained from the monitoring tools can be used.

Approach

1. Simulation of existing computational systems: the input data are system statistics from the monitoring and accounting tools; the simulation results should match the monitoring results for the same tasks within error limits (a sketch of such a check follows the monitoring overview below).
2. Simulation of new computational systems: hypotheses are made about the input data flow parameters and the procedures for processing it; the time distributions of the events generated by processing the input data flow are analyzed; these distributions are compared with the results obtained from monitoring of the existing system.

Monitoring system

A set of hardware and software for the analysis and control of a distributed computing system. Its main tasks are:

- Continuous monitoring of the grid services, both basic (common to the entire infrastructure) and those related to individual resource centers.
- Obtaining information about the computing resources (number of compute nodes available to run tasks, the architecture of the computer system, installed software, available specialized software packages) and CPU consumption.
- Data on virtual organizations' access to the resources and their use of computing-resource quotas.
- Monitoring of the execution of computational jobs and tasks (start, status changes, exit codes, etc.).
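Approach 1 above requires that simulation results match monitoring within error limits. A minimal sketch of such a consistency check, treating agreement as the means differing by less than k combined standard errors (the tolerance k and the sample values are assumptions), might look like:

```python
# A minimal consistency check: simulated job execution times should
# agree with monitored ones within error limits.
from statistics import mean, stdev

def within_error_limits(monitored, simulated, k=2.0):
    se_m = stdev(monitored) / len(monitored) ** 0.5
    se_s = stdev(simulated) / len(simulated) ** 0.5
    return abs(mean(monitored) - mean(simulated)) < k * (se_m**2 + se_s**2) ** 0.5

monitored = [3550, 3700, 3620, 3580, 3660]   # seconds, illustrative
simulated = [3600, 3640, 3570, 3610, 3690]
print(within_error_limits(monitored, simulated))  # True -> model accepted
```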
Monitoring parameters required for simulation:
1. number of tasks (simulation, analysis, reconstruction) coming into the system;
2. RAM used;
3. CPU time used;
4. number of processed events;
5. job execution time;
6. amount of data used.
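A job-flow generator driven by these six parameters could, as a rough sketch, look like the following; the per-type profile numbers are illustrative placeholders, not ATLAS measurements:

```python
# A minimal sketch of a job-flow generator driven by the monitoring
# parameters listed above. All profile values are assumed placeholders.
import random
from dataclasses import dataclass

@dataclass
class Job:
    job_type: str      # simulation / reconstruction / analysis
    n_events: int
    cpu_seconds: float
    ram_mb: int
    data_mb: float

# (mean events, cpu s/event, RAM MB, data MB/event) per job type -- assumed
PROFILES = {
    "simulation":     (1000, 30.0, 2000, 0.1),
    "reconstruction": (2000, 10.0, 4000, 1.0),
    "analysis":       (5000,  1.0, 1000, 2.0),
}

def generate_job(rng: random.Random) -> Job:
    jt = rng.choice(list(PROFILES))
    ev_mean, cpu_per_ev, ram, data_per_ev = PROFILES[jt]
    n_ev = max(1, int(rng.gauss(ev_mean, ev_mean * 0.2)))
    return Job(jt, n_ev, n_ev * cpu_per_ev, ram, n_ev * data_per_ev)

rng = random.Random(42)
print(generate_job(rng))
```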
Synthesis of monitoring and simulation (SyMSim) scheme
Real grid:
1. Submit jobs to the server.
2. Jobs request.
3. Submit jobs for execution.
4. Write statistics to the DB.

Simulation:
5. Generate input data for the simulation based on actual monitoring data.
6. Job-flow imitation.
7. Jobs request.
8. Submit jobs for execution.
9. Analyze the simulation results and write statistical information.

Job-flow analysis (ATLAS):
a) Simulation (CPU-intensive);
b) Reconstruction (memory-intensive);
c) Analysis (intensive data replication, loading of communication channels).

Our job-flow generator produces these three types of jobs with parameters (number of events, CPU time, RAM, etc.) close to the ATLAS ones, in approximately the same ratio of job types.

Job-flow scheme:
1. If a slot and the file are available, the job is executed.
2. If the file is stored in the tape library, the job reserves a slot but waits for the necessary file to arrive on disk.
3. From tape to disk: the tape cartridge is moved to the drive, the cartridge's file system is mounted on the drive, and the file is copied to disk.

A set of these classes allows one to simulate all the processes occurring with a file copy on tape: loading and unloading of the tape by the manipulator, mounting on the drive, searching for a file on the tape, and reading/writing it. A staging-delay sketch follows.
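The tape-staging rules above can be sketched as follows; all latencies and the transfer rate are illustrative constants, not measured drive parameters:

```python
# A minimal sketch of the tape-to-disk staging logic described above:
# a job holding a slot must wait while the cartridge is moved to a drive,
# mounted, the file is located, and the file is copied to disk.
LOAD_S, MOUNT_S, SEEK_S, READ_MBPS = 20.0, 10.0, 40.0, 150.0   # assumed

def stage_in_time(file_mb: float, on_disk: bool) -> float:
    """Seconds before the job can start: 0 if the file is already on disk."""
    if on_disk:
        return 0.0
    return LOAD_S + MOUNT_S + SEEK_S + file_mb / READ_MBPS

def job_start_time(slot_free_at: float, file_mb: float, on_disk: bool) -> float:
    # Rules 1/2 above: the slot is reserved at slot_free_at, but execution
    # begins only once the input file has been copied to disk.
    return slot_free_at + stage_in_time(file_mb, on_disk)

print(job_start_time(0.0, 4000.0, on_disk=False))  # ~96.7 s staging delay
```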
Interaction model components
Conclusion

The proposed approach to the simulation and analysis of computational structures is universal. It can also be used to solve design problems and to support the subsequent development of data repositories, beyond the area of physical experiments.