mccmTutorial
Analysis of LHC data at the CMS experiment requires the production of a large number
of simulated events
McM produced billions of simulated events using different campaigns during Run I and Run II
Each campaign corresponds to specific detector and LHC conditions
More than 20 different groups (PAGs, POGs, DPGs) work at CMS using
various MC samples
Hundreds of signal and background samples are needed for various studies
A robust and reliable system is required to manage the information needed for the
configuration and prioritization of event production
Ensures efficient bookkeeping and production of MC samples for different groups
Takes input from a user in a simple way and interfaces with the CMS production
infrastructure
User → Monte Carlo Management → Tier-1/Tier-2 computing centers
CMS presents a challenging environment not only in terms of the physics to discover and the
detector to build and operate, but also in:
Data volume and the necessary computing resources
Computing resources and datasets are at least an order of magnitude larger than in
previous experiments
The large-scale CMS computing and storage requirements make it difficult to host all of
them in one place (for technical and funding reasons)
Many CMS collaborators are not based at CERN and have access to significant
computing resources outside CERN
It is advantageous to harness them for CMS computing
It also helps to develop local infrastructure and secure local funding
DAS
Data Tier can be defined as the event contents/information a dataset stores. Most
commonly used are:
RAW, RECO, AOD, AODSIM, MiniAOD, NanoAOD, USER, GEN, FEVT
RAW contains full event information from the Tier-0 (i.e., from CERN), containing
’raw’ detector information (detector element, hits, etc.)
RAW is not used directly for analysis
RECO & AOD
RECO (RECOnstructed data): output from the first processing by Tier-0. This layer contains
reconstructed physics objects, but it is still very detailed (∼ 2 MB)
Used mostly for dedicated studies and detector commissioning
AOD (Analysis Object Data): distilled version of RECO data; contains ∼ 40% of the RECO
information and can be used for analysis
MiniAOD
The lightweight data tier MiniAOD is a step further in data reduction (∼ 10–15% of AOD size)
With a typical event size of 30–50 kB/event, it serves the needs of ∼ 90% of CMS analyses
NanoAOD
NanoAOD is an ntuple-like format, readable with bare ROOT, containing the
per-event information that is needed in most generic analyses (30–50%)
Produced on top of MiniAOD; typical event size 1–2 kB; also fast to run: O(10–20 Hz)
Further details can be found here
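As a rough illustration of what these per-event sizes mean for storage, the sketch below estimates the footprint of a sample in MiniAOD and NanoAOD. The per-event sizes are the ranges quoted above; the sample size of one billion events, the helper name sample_size_tb, and the use of decimal units (1 TB = 10^9 kB) are assumptions made only for this example.

```python
# Per-event size ranges in kB, as quoted in the text above.
EVENT_SIZE_KB = {
    "MiniAOD": (30.0, 50.0),
    "NanoAOD": (1.0, 2.0),
}


def sample_size_tb(n_events, size_kb):
    """Return the sample size in TB, assuming decimal units (1 TB = 1e9 kB)."""
    return n_events * size_kb / 1e9


# Hypothetical sample size, chosen only for illustration.
n_events = 1_000_000_000

for tier, (low_kb, high_kb) in EVENT_SIZE_KB.items():
    low_tb = sample_size_tb(n_events, low_kb)
    high_tb = sample_size_tb(n_events, high_kb)
    print(f"{tier}: {low_tb:.0f}-{high_tb:.0f} TB")
```

Under these assumptions the same sample occupies tens of TB in MiniAOD but only a few TB in NanoAOD.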
Register yourself if you are using McM for the first time
Click Users, then click the Add me! button in the lower left corner
An increase in role/access rights can be requested at any time
Parameters to be modified?
GEN-SIM (Simulation): hadronization, validation; particle-detector interaction; CMS geometry, magnetic field
DIGI-RECO (Reconstruction): calibration, trigger menu, etc.; RAW2DIGI, L1Reco, RECO; reconstruction algorithms; VALIDATION, DQM
MiniAOD
NanoAOD
Request is a set of instructions and configuration options for Monte Carlo event
generation prepared by the Generator and PPD groups. It may represent different
processing steps and their combinations (LHE, GEN-SIM, RECO)
TOP-RunIIFall18pLHE-00003, TOP-RunIIFall18GS-00003
TOP-RunIIAutumn18DRPremix-00109
TOP-RunIIAutumn18MiniAOD-00116, TOP-RunIIAutumn18NanoAOD-00041
Flow is a connection between at least two campaigns to produce a dataset in
more than one campaign. It can overwrite parameters in the subsequent campaign, e.g.,
flowRunIIFall18GS → flowRunIIAutumn18DRPremix →
flowRunIIAutumn18MiniAOD → flowRunIIAutumn18NanoAOD
flowPhaseIISpring17DPU200, DRNoPU, DRPU140 and similarly 0T, 38T
Prep ID is a unique identification string for a Monte Carlo request that allows it to be
tracked in different systems
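The example Prep IDs above suggest a three-part layout: a group prefix, a campaign name, and a five-digit serial number, joined by hyphens. The sketch below splits a Prep ID on that assumption. The layout is inferred from the examples only, not from an McM specification, and the function name parse_prepid and the field names are hypothetical.

```python
import re

# Assumed layout, inferred from the examples above:
# <group>-<campaign>-<5-digit serial>. Campaign names may contain underscores.
PREPID_PATTERN = re.compile(
    r"^(?P<group>[A-Za-z0-9]+)-(?P<campaign>\w+)-(?P<serial>\d{5})$"
)


def parse_prepid(prepid):
    """Split a Prep ID into group, campaign and serial number."""
    match = PREPID_PATTERN.match(prepid)
    if match is None:
        raise ValueError(f"unexpected Prep ID layout: {prepid!r}")
    return match.groupdict()


print(parse_prepid("TOP-RunIIAutumn18DRPremix-00109"))
```

Because the campaign field accepts underscores, the same pattern should also match chained-request identifiers that follow this layout.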
Workflow is a set of tasks to be processed by the production tools. For each request
there can be multiple workflows
Campaign is a central platform in McM used to produce a set of requests sharing the
same physics goal, software release, energy and event processing configuration, e.g.,
GENonly, GEN-SIM, wmLHE, wmLHEGS, DIGI-RECO, DIGIonly, etc.
wmLHE+GS
chain_RunIIWinter19PFCalib16GS_flowRunIIWinter19PFCalib16DRPU0to70_flowRunIIWinter19PFCalib16MiniAOD_flowRunIIWinter19PFCalib16NanoAOD
Chained campaigns can start from wmLHE, wmLHEGS, pLHE, GEN-SIM
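The chained campaign name above reads as a root campaign followed by a sequence of flows. The sketch below splits such a name into those parts. It assumes the name starts with chain_, that parts are separated by underscores, and that every part after the first begins with flow; this is inferred from the example rather than from an McM rule, and the function name split_chained_campaign is hypothetical.

```python
def split_chained_campaign(name):
    """Return (root_campaign, [flows]) for a chained campaign name.

    Assumes the layout chain_<root>_<flow>_<flow>..., where individual
    campaign and flow names contain no underscores.
    """
    prefix = "chain_"
    if not name.startswith(prefix):
        raise ValueError(f"name does not start with {prefix!r}: {name!r}")
    parts = name[len(prefix):].split("_")
    root, flows = parts[0], parts[1:]
    if not all(flow.startswith("flow") for flow in flows):
        raise ValueError(f"unexpected flow name in: {name!r}")
    return root, flows


root, flows = split_chained_campaign(
    "chain_RunIIWinter19PFCalib16GS"
    "_flowRunIIWinter19PFCalib16DRPU0to70"
    "_flowRunIIWinter19PFCalib16MiniAOD"
    "_flowRunIIWinter19PFCalib16NanoAOD"
)
print(root)
print(flows)
```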
Chained request is a concrete set of processing requests starting from a root
request and going through the steps of a chained campaign e.g.,
TOP-chain_RunIIFall18wmLHEGS_flowRunIIAutumn18DRPremix_flowRunIIAutumn18MiniAOD_flowRunIIAutumn18NanoAODv4-00003
The number at the end of the request is always unique, e.g., 00003:
https://cms-pdmv.cern.ch/mcm/chained_requests?member_of_campaign=chain_RunIIFall18wmLHEGS_flowRunIIAutumn18DRPremix_flowRunIIAutumn18MiniAOD_flowRunIIAutumn18NanoAODv4&prepid=TOP*00003
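A search link of this form can be assembled from its two query parameters, member_of_campaign and prepid, both of which appear in the URL above. The sketch below only builds the string and does not contact McM. The base address is taken from the link above, and the function name chained_request_url is hypothetical.

```python
from urllib.parse import urlencode

# Base address as it appears in the link above.
BASE_URL = "https://cms-pdmv.cern.ch/mcm/chained_requests"


def chained_request_url(chained_campaign, prepid_pattern):
    """Build a chained-request search URL; '*' is kept as a wildcard."""
    query = urlencode(
        {"member_of_campaign": chained_campaign, "prepid": prepid_pattern},
        safe="*",  # leave the wildcard character unescaped
    )
    return f"{BASE_URL}?{query}"


url = chained_request_url(
    "chain_RunIIFall18wmLHEGS_flowRunIIAutumn18DRPremix"
    "_flowRunIIAutumn18MiniAOD_flowRunIIAutumn18NanoAODv4",
    "TOP*00003",
)
print(url)
```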
Inset shows the growth of RunIIAutumn18MiniAOD over time since approval
Wajid Ali Khan Monte Carlo Production System 09.04.2019 16/32
Production Monitoring Platform (pMp): 2
Dataset name in CMS always follows the format (three forward slashes): /*/*/*
/DatasetName/Campaign-ProcessString-globalTag-Ext-Version/DataTier
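The three-slash format can be checked and taken apart with a few string operations. The sketch below splits a dataset name into its three parts and, as a further assumption, treats the first hyphen-separated field of the middle part as the campaign and the last as the version. The fields in between are left untouched, since real names do not always separate the process string and global tag with hyphens. The function name parse_dataset_name and the dictionary keys are hypothetical.

```python
def parse_dataset_name(name):
    """Split /Primary/Processed/DataTier and pick out campaign and version."""
    parts = name.split("/")
    # A valid name starts with "/" and has exactly three slashes,
    # so splitting yields an empty string followed by three non-empty parts.
    if len(parts) != 4 or parts[0] != "" or not all(parts[1:]):
        raise ValueError(f"not of the form /*/*/*: {name!r}")
    primary, processed, data_tier = parts[1:]
    fields = processed.split("-")
    return {
        "primary": primary,
        "processed": processed,
        "campaign": fields[0],   # assumed: first hyphen-separated field
        "version": fields[-1],   # assumed: last hyphen-separated field
        "data_tier": data_tier,
    }


info = parse_dataset_name(
    "/TTToSemiLeptonic_TuneCP5down_PSweights_13TeV-powheg-pythia8"
    "/RunIIFall17MiniAOD-PU2017_94X_mc2017_realistic_v11-v1"
    "/MINIAODSIM"
)
print(info["campaign"], info["version"], info["data_tier"])
```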
Prep-ops: [email protected]
Default gateway to interact with MC GEN contacts, experts from Trigger, AlcaDB,
computing, and to post MccM announcements
Generator group: [email protected]
Main HN to discuss MC generation issues, and to post MccM announcements
CMSSW Release and data operations
[email protected], [email protected]
To discuss and integrate new algorithms in official CMS software
Monte Carlo Coordination Meetings (MccM):
This meeting is open to GEN contacts (DPGs, POGs, PAGs) and CMS analysts to discuss
MC requests/tickets with the MccM core team
Twiki: https://twiki.cern.ch/twiki/bin/view/CMS/PdmVMonteCarloCoordinationMeeting
CERN Time: 3 PM - 4 PM, every Wednesday
PPD General Meeting:
https://indico.cern.ch/category/3905
CERN Time: 2 PM - 4 PM, every Thursday
ORP:
CMSSW release plan, integration of new pull requests in CMSSW, etc
CERN Time: 5 PM - 6 PM, every Tuesday
Announced at: [email protected]
Suppose we have the dataset
/TTToSemiLeptonic_TuneCP5down_PSweights_13TeV-powheg-pythia8/RunIIFall17MiniAOD-PU2017_94X_mc2017_realistic_v11-v1/MINIAODSIM
and want to find the corresponding request in McM:
Select any MiniAOD dataset from your analysis and find it in DAS
Find its McM Prep ID, global tag and CMSSW release
Find the LHE, GEN-SIM and DIGI-RECO requests that were used to produce this MiniAOD request
Find the global tag, cross section and generator-level cuts in the GEN-SIM request that was used to
produce that MiniAOD
Check whether any extension of the sample has been produced
Find the Les Houches Event file used to produce TOP-RunIIFall18GS-00003.
Find the various requests that have been made using request B2G-RunIIWinter15wmLHE-00007
as a root request.
Find the cmsDriver settings used for request TOP-RunIIAutumn18MiniAOD-00006.
Find the gridpack used to produce request TOP-RunIIAutumn18MiniAOD-00006.
Find the PDF used by default in the sample. There are several ways to do this; use
at least two different ones.
Find the data cards used to generate the root request.
The alignment and calibration conditions needed by all stages of the data production
(SIM, DIGI: for simulated events) and processing (RECO, MiniAOD: for simulation
and reconstruction alike) in CMSSW can be retrieved using global tags
Global tag (GT):
A single entry point to retrieve all conditions consumed by a given workflow
A GT is a collection of 200–400 tags, each a set of AlCa parameters measured by
calibration experts in the DPGs/POGs and released to a dedicated database
It is usually identified by a string, e.g., 92X_upgrade17_realistic_v1
CMSCondDB: web portal for administration and navigation of the existing global tags
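A global tag string such as the example above can be split on underscores for quick inspection. The sketch below assumes, purely from the appearance of that example, that the first field names the release series and the last field is a version, with everything in between treated as a free-form label. This reading is an assumption, not a documented naming rule, and the function name split_global_tag and the key names are hypothetical.

```python
def split_global_tag(tag):
    """Split a global tag string into assumed release, label and version."""
    fields = tag.split("_")
    if len(fields) < 3:
        raise ValueError(f"too few fields in global tag: {tag!r}")
    return {
        "release_series": fields[0],      # assumed meaning of the first field
        "label": "_".join(fields[1:-1]),  # everything between first and last
        "version": fields[-1],            # assumed meaning of the last field
    }


print(split_global_tag("92X_upgrade17_realistic_v1"))
```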