mccmTutorial

The document provides an overview of the CMS Monte Carlo Production System, detailing its management, computing model, and the Worldwide LHC Computing Grid. It discusses the processes involved in generating simulated events for analysis, the roles of various groups within CMS, and the terminology used in the Monte Carlo production workflow. Additionally, it highlights the tools and platforms available for monitoring and managing production requests and datasets.
CMS Monte Carlo Production System: From Analyst Point of View

On behalf of PdmV and Generator Groups

Wajid Ali Khan | 09.04.2019

National Center for Physics, Islamabad, Pakistan


Overview:

Monte Carlo Production Management (McM): its usage and needs
CMS Computing Model: Worldwide LHC Computing Grid
MC Production and McM Terminology
Production Monitoring Platform (pMp)
Dataset Name Terminology and Datasets in DAS → Bridging DAS and McM
Finding Details:
Settings for cmsDrivers/Sequences
Finding/Using Particular Gridpacks/Configuration Files
Private Sample Production (LHEs/pLHE) and More Production Details
Links to be Bookmarked, HNs, Egroups, Twikis
Exercises

Wajid Ali Khan Monte Carlo Production System 09.04.2019 2/32


MC Production Management:

Analysis of LHC data at the CMS experiment requires the production of a large number
of simulated events
Billions of simulated events were produced through McM in different campaigns during Run I and Run II
Each campaign corresponds to a specific set of detector and LHC conditions
More than 20 different groups (PAGs, POGs, DPGs) at CMS work with
various MC samples
Hundreds of signal and background samples are needed for various studies
A robust and reliable system is required to manage the information needed for the
configuration and prioritization of event production
It ensures efficient bookkeeping and production of MC samples for the different groups
It takes input from a user in a simple way and interfaces with the CMS production
infrastructure
User → Monte Carlo Management → Tier-1/Tier-2 computing centers



CMS Computing Model:

CMS presents a challenging environment not only in terms of the physics to discover and the
detector to build and operate, but also in:
Data volume and the necessary computing resources
Computing resources and datasets are at least an order of magnitude larger than in
previous experiments
The large-scale CMS computing and storage requirements make it difficult to localize all of
them in one place (for technical and funding reasons)
Many CMS collaborators are not based at CERN and they have access to significant
computing resources (other than CERN)
It is advantageous to harness them for CMS computing
It also helps to develop local infrastructure and secure local funding



Worldwide LHC Computing Grid (WLCG):

WLCG is composed of four levels, or “Tiers”, called 0, 1, 2 and 3.

Each tier is made up of several computer centres and provides a specific set of
services → the tiers process, store and analyse data from the LHC
Tier 0 is the CERN Data Centre; it provides less than 20% of the Grid’s total capacity,
with 40% at the T1s and 40% at the T2s



Computing Resources Usage:

Major Campaigns sharing the computing resources



MC Production – Managerial Overview:

[Diagram boxes: Physics Object Group, Physics Analysis Group, Detector Performance Group; MC Production Management; Computing Operations; DAS]

Generator Contact: collects the needs for simulated datasets from
within the detector or physics group, tests the requests locally and
proposes them to the MccM for production

Generator Convener: examines the requests made by the
Generator Contact closely and thoroughly, and then approves the
particular generator configurations

Request/Production Manager: configures campaigns and flows,
performs request chaining, sets their priority and submits requests to
the production infrastructure, handles workflows during the production
phase and sends datasets to DAS



Data Tiers Most Commonly Used:

A data tier can be defined as the event content/information a dataset stores. The most
commonly used are:
RAW, RECO, AOD, AODSIM, MiniAOD, NanoAOD, USER, GEN, FEVT
RAW contains the full event information from the Tier-0 (i.e., from CERN), containing
’raw’ detector information (detector elements, hits, etc.)
RAW is not used directly for analysis
RECO & AOD
RECO (RECOnstructed data): output from the first processing by Tier-0. This layer contains
reconstructed physics objects, but it’s still very detailed (∼ 2 MB)
Used mostly for dedicated studies and detector commissioning
AOD (Analysis Object Data): distilled version of RECO data, contains ∼ 40% of the RECO
information and can be used for analysis
MiniAOD
The lightweight data tier MiniAOD is a step further in data reduction (∼ 10–15% of AOD size)
Typical event size is 30–50 kB/event; it serves the needs of ∼ 90% of CMS analyses
NanoAOD
NanoAOD is an ntuple-like format, readable with bare ROOT and containing the
per-event information that is needed in most generic analyses (30–50%)
Produced on top of MiniAOD, typical event size 1–2 kB, also fast to run: O(10–20 Hz)
Further details can be found here
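To get a feel for what the per-event sizes quoted above mean for storage, the sketch below multiplies them by a sample size. The midpoints of the quoted ranges and the sample of 100 million events are assumptions chosen for illustration only.

```python
# Typical per-event sizes quoted on this slide, in kB.
# Using the midpoint of each quoted range is an assumption.
EVENT_SIZE_KB = {
    "MiniAOD": 40.0,  # quoted as 30-50 kB/event
    "NanoAOD": 1.5,   # quoted as 1-2 kB/event
}


def sample_size_tb(n_events, tier):
    """Approximate on-disk size in TB (1 TB = 1e9 kB) of n_events in a tier."""
    return n_events * EVENT_SIZE_KB[tier] / 1e9


if __name__ == "__main__":
    n_events = 100_000_000  # hypothetical sample size
    for tier in EVENT_SIZE_KB:
        print(f"{tier:8s} {sample_size_tb(n_events, tier):8.3f} TB")
```

With these assumed numbers the same sample needs a few TB in MiniAOD but only a fraction of a TB in NanoAOD.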



McM Web Interface: 1

Production Interface: https://cms-pdmv.cern.ch/mcm
Development/Testing Interface: https://cms-pdmv-dev.cern.ch/mcm

McM Web Interface: 2

Register yourself if you are using McM for the first time
Click Users, then click the Add me! button in the lower left corner
An increase in role/access rights can be requested at any time



Technical Overview of MC Production:

GEN-SIM:
Generation: event generation, hard scattering, hadronization, validation (type of physics processes? type of generators? parameters to be modified?)
Simulation: particle-detector interaction (CMS geometry, magnetic field)

DIGI-RECO:
Digitization: DIGI, L1, DIGI2RAW, HLT (pileup situation, alignment-calibration, trigger menu, etc.)
Reconstruction: RAW2DIGI, L1Reco, RECO, VALIDATION, DQM (reconstruction algorithms)

MiniAOD

NanoAOD



McM Terminology:

Types of root requests in McM:

wmLHE: simulation of the hard event by specialised event generator programs,
resulting in events written in LHE format, produced through the Workload Management System
(WMAgent)
wmLHEGS: LHE and GEN-SIM production in a single step (default way)
pLHE: private or personal LHE files
Pythia (GEN-SIM): a generator which can do both hard scattering and hadronization;
the input can in general be an LHE file
wmLHE and pLHE are the steps that produce the hard scattering processes and then
store those events in EDM format (i.e., genParticles), which can be used later by the
hadronizer



McM Terminology and MC Processing: 1

Request is a set of instructions and configuration options for Monte Carlo event
generation prepared by the Generator and PPD groups. It may represent different
processing steps and their combinations (LHE, GEN-SIM, RECO), e.g.,
TOP-RunIIFall18pLHE-00003, TOP-RunIIFall18GS-00003
TOP-RunIIAutumn18DRPremix-00109
TOP-RunIIAutumn18MiniAOD-00116, TOP-RunIIAutumn18NanoAOD-00041
Flow is a connection between at least two (2) campaigns to produce a dataset in
more than one campaign. It can overwrite parameters in the subsequent campaign, e.g.,
flowRunIIFall18GS → flowRunIIAutumn18DRPremix →
flowRunIIAutumn18MiniAOD → flowRunIIAutumn18NanoAOD
flowPhaseIISpring17DPU200, DRNoPU, DRPU140 and similarly 0T, 38T
Prep ID is a unique identification string for a Monte Carlo request that allows it to be
tracked in different systems
Workflow is a set of tasks to be processed by the production tools. For each request
there can be multiple workflows
Campaign is a central platform in McM used to produce a set of requests sharing the
same physics goal, software release, energy and event processing configuration, e.g.,
GENonly, GEN-SIM, wmLHE, wmLHEGS, DIGI-RECO, DIGIonly, etc.
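The example prep IDs above all follow the pattern group-campaign-number. As a minimal sketch of how that structure can be taken apart, assuming this pattern holds (the function and field names are hypothetical, not part of McM):

```python
from typing import NamedTuple


class PrepId(NamedTuple):
    pwg: str       # physics working group, e.g. "TOP"
    campaign: str  # campaign (or chained campaign) name
    number: str    # zero-padded serial number, e.g. "00003"


def parse_prepid(prepid: str) -> PrepId:
    """Split a prep ID of the assumed form PWG-Campaign-Number."""
    # The group is everything before the first hyphen, the number everything
    # after the last one, so hyphens inside the campaign part are tolerated.
    pwg, sep1, rest = prepid.partition("-")
    campaign, sep2, number = rest.rpartition("-")
    if not (sep1 and sep2 and pwg and campaign and number.isdigit()):
        raise ValueError(f"unexpected prep ID format: {prepid!r}")
    return PrepId(pwg, campaign, number)


if __name__ == "__main__":
    print(parse_prepid("TOP-RunIIAutumn18DRPremix-00109"))
```

Keeping the number as a string preserves the zero padding, which matters when the pieces are joined back into a prep ID.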



McM Terminology and MC Processing: 2

[Diagram: wmLHE or pLHE → GEN + SIM (wmLHE+GS: wmLHE and GEN-SIM in one step) → DIGI, L1, DIGI2RAW, HLT → Reco, L1 Reco, VALIDATION, DQM → MiniAOD → NanoAOD]

Sequences of the campaign can be changed at the flow level, e.g.,
"magField":"0T", "pileup":"NoPileUp", "conditions":"specificGT"
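The idea that a flow changes the sequence settings of a campaign can be pictured as a dictionary update. This is only an illustrative sketch, not McM's implementation: the campaign defaults ("38T", "campaignPileup", "campaignGT") and the "step" key are hypothetical placeholders, while the flow values are the ones quoted above.

```python
def apply_flow(campaign_sequence, flow_overrides):
    """Return the campaign sequence with the flow's parameters applied on top."""
    merged = dict(campaign_sequence)  # copy, so the campaign defaults stay intact
    merged.update(flow_overrides)     # flow-level values win
    return merged


# Hypothetical campaign-level defaults.
campaign_sequence = {
    "step": "DIGI,L1,DIGI2RAW,HLT",
    "magField": "38T",
    "pileup": "campaignPileup",
    "conditions": "campaignGT",
}

# Flow-level parameters as quoted on the slide.
flow_overrides = {"magField": "0T", "pileup": "NoPileUp", "conditions": "specificGT"}

if __name__ == "__main__":
    for key, value in apply_flow(campaign_sequence, flow_overrides).items():
        print(f"{key}: {value}")
```

Parameters that the flow does not mention (here "step") are inherited unchanged from the campaign.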
McM Terminology and MC Processing: 3

Chained campaign is a sequence of campaigns connected by flows, determining the
succession of processing steps and campaigns that are needed to deliver datasets
for analysis, e.g.,
chain_RunIIWinter19PFCalib16GS_flowRunIIWinter19PFCalib16DRPU0to70_flowRunIIWinter19PFCalib16MiniAOD_flowRunIIWinter19PFCalib16NanoAOD
Chained campaigns can start from wmLHE, wmLHEGS, pLHE or GEN-SIM
Chained request is a concrete set of processing requests starting from a root
request and going through the steps of a chained campaign, e.g.,
TOP-chain_RunIIFall18wmLHEGS_flowRunIIAutumn18DRPremix_flowRunIIAutumn18MiniAOD_flowRunIIAutumn18NanoAODv4-00003
The number at the end of the request (here 00003) is always unique:
https://cms-pdmv.cern.ch/mcm/chained_requests?member_of_campaign=chain_RunIIFall18wmLHEGS_flowRunIIAutumn18DRPremix_flowRunIIAutumn18MiniAOD_flowRunIIAutumn18NanoAODv4&prepid=TOP*00003
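The chained campaign names above look like "chain", the root campaign and the flow names joined by underscores, and the search link is a plain query string. A minimal sketch that reproduces both, assuming that naming pattern (the function names are hypothetical; the base URL and query keys are taken from the link above):

```python
from urllib.parse import urlencode

MCM_CHAINED_REQUESTS = "https://cms-pdmv.cern.ch/mcm/chained_requests"


def chained_campaign_name(root_campaign, flows):
    """Join the root campaign and its flows into a chained campaign name."""
    return "_".join(["chain", root_campaign] + list(flows))


def chained_request_url(chained_campaign, prepid_pattern):
    """Build the McM search link for chained requests in a chained campaign."""
    # safe="*" keeps the wildcard readable instead of encoding it as %2A.
    query = urlencode(
        {"member_of_campaign": chained_campaign, "prepid": prepid_pattern},
        safe="*",
    )
    return f"{MCM_CHAINED_REQUESTS}?{query}"


if __name__ == "__main__":
    name = chained_campaign_name(
        "RunIIFall18wmLHEGS",
        [
            "flowRunIIAutumn18DRPremix",
            "flowRunIIAutumn18MiniAOD",
            "flowRunIIAutumn18NanoAODv4",
        ],
    )
    print(chained_request_url(name, "TOP*00003"))
```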



Production Monitoring Platform (pMp): 1

Production Monitoring Platform: https://cms-pdmv.cern.ch/pmp

pMp is developed to monitor the progress and statistics of Monte Carlo requests,
flows, campaigns and workflows using their prepIDs, with different options:
Present Statistics: total events or requests for the specific searched items
Historical Statistics: shows expected, current and done events over time, and a list of
submitted requests with their progress
Performance Statistics: shows the total time taken by a request to go from one status to
another
A CMS user can check the status of relevant samples by using the PrepID or campaign

Inset shows the growth of RunIIAutumn18MiniAOD over time since approval

Production Monitoring Platform (pMp): 2

Present statistics in announced mode for a particular campaign

Total number of requests created, DONE, SUBMITTED, APPROVED and NEW



Status of Requests in pMp:

Link to pMp plots from McM


View announced statistics for request e.g., TOP-RunIIFall17wmLHEGS-00064
View growing statistics for request
View historical statistics for request
Buttons are present under Actions:
Requests: *TOP*Autumn18*
Campaigns: *RunIIAutumn18*
Flows: *RunIIAutumn18*
Chained Campaigns: *Autumn18*
Information that can be extracted:
Total number of submitted events
Events that have appeared in DAS (statistics from running jobs of a request)
Events done in DAS (statistics from finished jobs of a request)
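The search strings above use * as a wildcard. Assuming it behaves like a shell-style glob (an assumption about pMp, not something stated on the slide), the effect of a pattern such as *TOP*Autumn18* can be checked offline with Python's fnmatch module. The prep IDs in the list are taken from earlier slides.

```python
from fnmatch import fnmatchcase


def select(names, pattern):
    """Return the names matching a shell-style wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


prepids = [
    "TOP-RunIIAutumn18DRPremix-00109",
    "TOP-RunIIAutumn18MiniAOD-00116",
    "TOP-RunIIFall18GS-00003",
    "BTV-RunIISummer17MiniAOD-00070",
]

if __name__ == "__main__":
    for prepid in select(prepids, "*TOP*Autumn18*"):
        print(prepid)
```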



Dataset Name’s Terminology:

A dataset name in CMS always follows the format (three forward slashes): /*/*/*
/DatasetName/Campaign-ProcessString-globalTag-Ext-Version/DataTier

We can get all the data tiers from DBS:

dataset=/TTToSemiLeptonic_mtop171p5_TuneCP5_PSweights_13TeV-powheg-pythia8/*/*
If additional statistics are required for a sample, an extension (ext1/2/3) of that
sample is requested in McM
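The three-part structure of a dataset name can be unpacked with a few string operations. The sketch below is an illustration under assumptions: the campaign is taken as the text before the first hyphen of the middle part, the version as the text after the last hyphen, and an optional _extN suffix as the extension. Separating the process string from the global tag is not attempted, because the boundary is not unambiguous from the name alone. All function and field names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class DatasetName(NamedTuple):
    primary: str              # /DatasetName
    campaign: str             # campaign that produced the dataset
    conditions: str           # process string and global tag, left unsplit
    extension: Optional[str]  # e.g. "ext1", or None
    version: str              # e.g. "v2"
    tier: str                 # data tier, e.g. "MINIAODSIM"


def parse_dataset_name(name: str) -> DatasetName:
    parts = name.split("/")
    if len(parts) != 4 or parts[0] != "" or not all(parts[1:]):
        raise ValueError(f"expected /primary/processed/tier, got {name!r}")
    primary, processed, tier = parts[1:]
    campaign, sep1, rest = processed.partition("-")
    conditions, sep2, version = rest.rpartition("-")
    if not (sep1 and sep2):
        raise ValueError(f"unexpected processed name: {processed!r}")
    match = re.search(r"_(ext\d+)$", conditions)
    extension = match.group(1) if match else None
    if match:
        conditions = conditions[: match.start()]
    return DatasetName(primary, campaign, conditions, extension, version, tier)


if __name__ == "__main__":
    print(parse_dataset_name(
        "/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/"
        "RunIISummer17MiniAOD-92X_upgrade2017_realistic_v10_ext1-v2/MINIAODSIM"
    ))
```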



Data Aggregation System (DAS): 1

[Screenshot annotations: CMSSW, GEN-SIM, MiniAOD, McM Prep-ID]

Search for a sample on DAS: https://cmsweb.cern.ch/das

/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/RunIISummer17MiniAOD-92X_upgrade2017_realistic_v10_ext1-v2/MINIAODSIM
Samples available in DAS are marked as VALID, INVALID or PRODUCTION

Data Aggregation System (DAS): 2

Collection of Files GT, CMSSW

Completed/announced samples are marked in DAS as: VALID

PRODUCTION: statistics are still growing and the dataset is not yet announced
Run over the available statistics by setting the allowNonValidInputDataset parameter
in the CRAB configuration file
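In a CRAB configuration file the parameter mentioned above lives in the Data section. The fragment below is a hedged sketch rather than a complete configuration: the dataset name is a placeholder taken from the previous slide, and the SimpleNamespace fallback exists only so that the snippet can be run outside a CMSSW/CRAB environment, where CRABClient is not installed.

```python
try:
    # Available only inside a CMSSW/CRAB environment.
    from CRABClient.UserUtilities import config as make_config
    config = make_config()
except ImportError:
    # Stand-in object for illustration; this is not the real CRAB configuration.
    from types import SimpleNamespace
    config = SimpleNamespace(Data=SimpleNamespace())

# Placeholder dataset; replace it with the sample that is still in PRODUCTION.
config.Data.inputDataset = (
    "/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/"
    "RunIISummer17MiniAOD-92X_upgrade2017_realistic_v10_ext1-v2/MINIAODSIM"
)
# Allow CRAB to run over a dataset that is not (yet) marked VALID.
config.Data.allowNonValidInputDataset = True
```

A real configuration also needs the usual General, JobType and Site settings, which are omitted here.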



Finding Details from DAS/McM: 1

Log on to: https://cms-pdmv.cern.ch/mcm and click the Navigation button

There are four fields which can be used to make a search:


Prep-ID: BTV-RunIISummer17MiniAOD-00070
Dataset Name: TT_TuneCUETP8M2T4_13TeV-powheg-pythia8
MccM Ticket: BTV-2017Aug09-0000*
Request Tags: PAGLHCP19



Finding Details from DAS/McM: 2

Alternatively, go to the Requests tab: https://cms-pdmv.cern.ch/mcm/requests and click the
Navigation tab

Lists the prepIDs in which BTV-RunIISummer17MiniAOD-00070 has been used

The output dataset can also be extracted along with the request status



Finding details from DAS/McM: 3

Options from the Select View tab for request: BTV-RunIISummer17DRPremix-00085

A couple of interesting things are:
Pileup dataset name: pileup dataset used
Config Id: configuration files for the DIGI and RECO steps
Sequences: cmsDriver information on the DIGI and RECO steps
Reqmgr name: shows the production status and links from McM to the dataset in DAS



Finding Details from DAS/McM: 4

Exploring some more options for the request under consideration

Click to see the DIGI ConfigFile
Click to see the RECO ConfigFile
Click on the eye to see the cmsDriver details

The available options differ between users

Click to see the full chain and the various steps of the same request



Finding Details for GS Requests:

Exploring some options for the GS: BTV-RunIISummer17wmLHEGS-00001

click to see fragment details LHE/wmLHEGS

grid pack location and other gen level info

get configuration files



CMSDriver/Scripts to Produce Events:

Create a request in any campaign e.g., RunIIFall18wmLHEGS


Always check the created request by running it locally before starting validation

click to see existing request in campaign

click to get the test command


click to trigger the validation

Select a campaign in accordance with your needs by checking the cmsDriver/sequences


For more details write us at: [email protected]



HNs, Egroups and Meetings:

Prep-ops: [email protected]
Default gateway to interact with MC GEN contacts, experts from Trigger, AlcaDB,
computing, and to post MccM announcements
Generator group: [email protected]
Main HN to discuss MC generation issues, and to post MccM announcements
CMSSW Release and data operations
[email protected], [email protected]
To discuss and integrate new algorithms in official CMS software
Monte Carlo Coordination Meetings (MccM):
This meeting is open to GEN contacts (DPGs, POGs, PAGs) and CMS analysts to discuss
MC requests/tickets with MccM core team
Twiki: https://twiki.cern.ch/twiki/bin/view/CMS/PdmVMonteCarloCoordinationMeeting
CERN Time: 3 PM - 4 PM, every Wednesday
PPD General Meeting:
https://indico.cern.ch/category/3905
CERN Time: 2 PM - 4 PM, every Thursday
ORP:
CMSSW release plan, integration of new pull requests in CMSSW, etc
CERN Time: 5 PM - 6 PM, every Tuesday
Announced at: [email protected]



Toy Example:

Suppose: /TTToSemiLeptonic_TuneCP5down_PSweights_13TeV-powheg-pythia8/RunIIFall17MiniAOD-PU2017_94X_mc2017_realistic_v11-v1/MINIAODSIM
We want to find the corresponding request in McM:

Physics Working Group-Chain Used-Unique Number

Root Request of the dataset
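One way to start this lookup is to turn the dataset name into an McM request search link. The sketch below assumes that the McM requests page accepts dataset_name and member_of_campaign as query keys (the chained-requests link shown earlier uses member_of_campaign in the same way) and that the first token of the middle part of the dataset name is the McM campaign. Both points are assumptions to verify in the web interface, and the function name is hypothetical.

```python
from urllib.parse import urlencode

MCM_REQUESTS = "https://cms-pdmv.cern.ch/mcm/requests"


def mcm_search_url(dataset):
    """Build an McM request search link from a /primary/processed/tier name."""
    parts = dataset.split("/")
    if len(parts) != 4 or parts[0] != "" or not all(parts[1:]):
        raise ValueError(f"expected /primary/processed/tier, got {dataset!r}")
    primary, processed, _tier = parts[1:]
    campaign = processed.split("-", 1)[0]  # text before the first hyphen
    query = urlencode({"dataset_name": primary, "member_of_campaign": campaign})
    return f"{MCM_REQUESTS}?{query}"


if __name__ == "__main__":
    print(mcm_search_url(
        "/TTToSemiLeptonic_TuneCP5down_PSweights_13TeV-powheg-pythia8/"
        "RunIIFall17MiniAOD-PU2017_94X_mc2017_realistic_v11-v1/MINIAODSIM"
    ))
```

From the request found this way, the chained request and the root request can be followed in the McM interface as described on the earlier slides.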



PdmV Twikis/Material:

PdmV Twiki: https://twiki.cern.ch/twiki/bin/view/CMS/PdmV
McM Twiki: https://twiki.cern.ch/twiki/bin/view/CMS/PdmVMcM
pMp Twiki: https://twiki.cern.ch/twiki/bin/viewauth/CMS/PdmVpMp
McM Glossary: https://twiki.cern.ch/twiki/bin/viewauth/CMS/PdmVMcMGlossary
Previous McM Tutorial: https://indico.cern.ch/event/674156



Exercises: 1

Select any MiniAOD dataset from your analysis and find it in DAS
Find its McM prepID, global tag and CMSSW release
Find the LHE, GEN-SIM and DIGI-RECO requests that were used to produce this MiniAOD request
Find the global tag, cross section and generator-level cuts in the GEN-SIM request that was used to
produce that MiniAOD
Check whether any extension of the sample has been produced
Find the Les Houches Event file used to produce TOP-RunIIFall18GS-00003.
Find the various requests that have been made using request B2G-RunIIWinter15wmLHE-00007
as a root request.
Find the cmsDriver settings used for request TOP-RunIIAutumn18MiniAOD-00006.
Find the gridpack used to produce request TOP-RunIIAutumn18MiniAOD-00006.
Find the PDF used by default in the sample. There are a number of ways to do this; find it in
at least two different ways.
Find the data cards used to generate the root request.



Exercises: 2

Validation of the following tt̄ FCNC request TOP-RunII-Fall18-wmLHEGS-00249 will
never succeed. Using the request checking script, patch the gridpack and check whether the
validation is successful.
What is the number of partons in the Born matrix element for the highest multiplicity?
Taking the request TOP-RunII-Fall18-wmLHEGS-00226:
Check if H→bb̄, ZZ, ττ decays have been included.
Check if all other Higgs decays have been switched off.
Check if the request includes the parton shower weights.
Find the CR tune used in the request.
List all the B2G requests from B2G*01469 to B2G*01481 in the campaign
RunIIFall17wmLHEGS.
Split the requests depending on their status: none/new, submit/submitted,
define/defined, submit/approved
Find all the SMP, B2G, TOP requests produced with the chained campaign:
chain_RunIIFall18wmLHEGS_flowRunIIAutumn18DRPremix_
flowRunIIAutumn18MiniAOD_flowRunIIAutumn18NanoAODv4



Backup



Get the Test Command:

Open a terminal window


Get test command: wget https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_test/PPD-RunIIFall18wmLHEGS-00001
Initialize your grid proxy certificate: voms-proxy-init -voms cms
Change the permission: chmod +x PPD-RunIIFall18wmLHEGS-00001
Launch the script: ./PPD-RunIIFall18wmLHEGS-00001
Request.xml, Request.py and Request.root files will be created
Read the logs and explore the generated files
You can also produce DIGI-RECO and MiniAOD in the same file by adding the appropriate
cmsDriver commands
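The download and chmod steps above can also be scripted. The following is a minimal sketch using only the Python standard library. The URL pattern is the one from the wget command above, the function names are hypothetical, and the grid proxy step (voms-proxy-init) still has to be done in the shell.

```python
import os
import stat
import urllib.request

GET_TEST_URL = "https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_test/"


def get_test_url(prepid):
    """URL of the test script for a request, following the wget example."""
    return GET_TEST_URL + prepid


def fetch_test_script(prepid, directory="."):
    """Download the test script and mark it executable (roughly chmod +x)."""
    path = os.path.join(directory, prepid)
    urllib.request.urlretrieve(get_test_url(prepid), path)
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


if __name__ == "__main__":
    # Needs network access to cms-pdmv.cern.ch.
    print(fetch_test_script("PPD-RunIIFall18wmLHEGS-00001"))
```

The downloaded script is then launched from the shell exactly as in the steps above.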



Conditions (Global Tags):

The alignment and calibration conditions needed by all stages of the data production
(SIM, DIGI: for simulated events) and processing (RECO, MiniAOD: for simulation
and reconstruction alike) in CMSSW can be retrieved using global tags
Global tag (GT):
A single entry point to retrieve all conditions consumed by a given workflow
A GT is a collection of 200–400 tags, which are sets of AlCa parameters measured by
calibration experts in the DPGs/POGs and released to a dedicated database
It’s usually identified by a string, e.g., 92X_upgrade17_realistic_v1
CMSCondDB: web portal for administration and navigation of the existing global tags

