DAS 2025 - Short exercise: Muons¶
Prerequisites¶
Make sure that you have done the following before starting with this short exercise:
- Request acces to the CERN analytix cluster here
Project description: MUO POG SFs derivation and DAS exercise
Source of data: /cms/muon_pog/parquet/muon
- Leave all the other fields as they are
- Set up a
CMSSW_13_0_14
area in youreos
space (i.e.cmsrel CMSSW_13_0_14
in/eos/user/U/USERNAME/
) - Clone the
spark_tnp
repository from here and install it following the instructions in theREADME
# TODO: Fix this link with the correct fork- Modify this line with the correct path to the
CMSSW_13_0_14
area you just set up (should be/eos/user/U/USERNAME/CMSSW_13_0_14/src
)
- Modify this line with the correct path to the
- Make sure you can access SWAN
Additional material¶
In this notebook we are going to heavily rely on Awkward Arrays
. We are going to use very baseline functionalities, and everything should be quite straightforward. However, if feel like you want a more gentle introduction to it beforehand (or while you work on this exercise), you can have a look at:
0. Introduction¶
We will start this exercise with a brief introduction about what muons are, what are their properties and how they interact with matter.
Afterwards you will learn how muons are reconstructed in the CMS detector and why they are such a standard candle for many physics analyses.
In the hands on part, which starts with this notebook, you will learn how to access information about muon candidates in the .root
files used in CMS.
In the first part of this notebook, we are going to use the so called nanoAOD
file format. This is the last level of skimming that you can get to in the processing of CMS data. It consists of a flat tree containing only the final and most general muon information, meaning that some selection criteria are also applied to the muons candidates (of course, also to the other particles) stored in here.
In the following, we will learn how to:
- Get familiar with
nanoAOD
files, their structure and their content; - Access the relevant kinematic features of muons from the
nanoAOD
files; - Perform an exploratory analysis, checking the distributions of muons transverse momentum, pseudo rapidity, and angular distribution;
- Compare the reconstructed and generator-level distributions of muons;
- Compare different reconstruction algorithms;
In addition, in the last part of this notebook, we will do a quick comparison between the muons stored in miniAOD
and nanoAOD
files.
Let's start by importing some relevant libraries, starting with ROOT
and DataFormats
, that are needed at the end of this exercise to handle miniAOD
files:
import ROOT
ROOT.gSystem.Load("libFWCoreFWLite.so")
ROOT.FWLiteEnabler.enable()
from DataFormats.FWLite import Handle, Events
Welcome to JupyROOT 6.26/11
Let's now import additional libraries, useful for data manipulation (uproot
, awkward
, vector
), numerical operations (numpy
) and plotting (matplotlib
and mplhep
):
# Data manipulation
import uproot
import vector
import awkward as ak
# Numerical operations
import numpy as np
# Plotting
import matplotlib.pyplot as plt
import mplhep as hep
# Make plots in CMS style
plt.style.use(hep.style.CMS)
Unfortunately, we do not have time to go through all the details of these libraries, or to give too much introduction to it, but at the bottom of this notebook you can find some additional references. Before moving on, we need one last call, i.e. vector.register_awkward()
. ROOT
automatically supports operations between four-vectors, but awkward
arrays don't. Awkward Array
is a library for nested, variable-sized data, including arbitrary-length lists, records, mixed types, and missing data, using NumPy
-like idioms. However, by default, there is no information that encodes our arrays as lorentz vectors. This call to vector.register_awkward()
is a crucial to integrate vector's functionality into Awkward Array
's global behavior, enabling Awkward Array
to recognize vector types and operations. This allows for efficient creation and manipulation of multi-dimensional Lorentz vectors (like Momentum2D
, Momentum3D
, etc.) within Awkward Array
data structures, making complex physics data analysis more streamlined.
vector.register_awkward()
In the spark_tnp
folder you have cloned for this exercise, you can find two files: DY_mini.root
and DY_nano.root
. These contain the same events, but in the two different formats of miniAOD
and nanoAOD
, respectively.
Specify in input_dir
the location of your spark_tnp
folder (if you have followed everything correctly, it should be something like /eos/user/U/USERNAME/spark_tnp
:
### Edit here: add your path
input_dir = "/eos/user/m/mbonanom/spark_tnp"
and now let's define the full path of our mini
and nanoAOD
files:
mc_nano = f"{input_dir}/DY_nano.root"
mc_mini = f"{input_dir}/DY_mini.root"
1. Inspecting nanoAOD
structure¶
First of all use use uproot
to open our file. You can find more details about the structure of nanoAOD
files at this documentation page. In general, every nanoAOD
file contains five TTree
objects:
KEY: TTree Events;1 Events
KEY: TTree LuminosityBlocks;1 LuminosityBlocks
KEY: TTree Runs;1 Runs
KEY: TTree MetaData;1 Job metadata
KEY: TTree ParameterSets;1 Parameter sets
While LuminosityBlocks
and Runs
contain the relevant information to determine when a given event was recorded (or gives unique event identificators to simulated events), the Events
TTree
is what we are interested about. It contains the information about all the objects reconstructed by the CMS detector.
Let's first open the file and extract the Events
tree:
file_nano = uproot.open(f"{mc_nano}")
mc_tree = file_nano["Events"]
As you can see, mc_tree
is a TTree
object that contains 1840 branches:
mc_tree
<TTree 'Events' (1840 branches) at 0x7f10f95a1fa0>
Task 1: How many of these branches have information associated to muons?
Hint: you can access the branches of a TTree
opened with uproot
by using .keys()
# All branches
branches = mc_tree.keys()
# Branches that have any reference to `mu`(ons)
mu_branches = [k for k in branches if 'mu' in k.lower()]
# Only `Muon_` branches
muo_branches = [k for k in branches if 'Muon_' in k]
muo_branches
['Muon_highPtId', 'Muon_highPurity', 'Muon_inTimeMuon', 'Muon_isGlobal', 'Muon_isPFcand', 'Muon_isStandalone', 'Muon_isTracker', 'Muon_jetNDauCharged', 'Muon_looseId', 'Muon_mediumId', 'Muon_mediumPromptId', 'Muon_miniIsoId', 'Muon_multiIsoId', 'Muon_mvaMuID_WP', 'Muon_nStations', 'Muon_nTrackerLayers', 'Muon_pfIsoId', 'Muon_puppiIsoId', 'Muon_softId', 'Muon_softMvaId', 'Muon_tightCharge', 'Muon_tightId', 'Muon_tkIsoId', 'Muon_triggerIdLoose', 'Muon_jetIdx', 'Muon_svIdx', 'Muon_fsrPhotonIdx', 'Muon_charge', 'Muon_pdgId', 'Muon_dxy', 'Muon_dxyErr', 'Muon_dxybs', 'Muon_dz', 'Muon_dzErr', 'Muon_eta', 'Muon_ip3d', 'Muon_jetPtRelv2', 'Muon_jetRelIso', 'Muon_mass', 'Muon_miniPFRelIso_all', 'Muon_miniPFRelIso_chg', 'Muon_mvaMuID', 'Muon_pfRelIso03_all', 'Muon_pfRelIso03_chg', 'Muon_pfRelIso04_all', 'Muon_phi', 'Muon_pt', 'Muon_ptErr', 'Muon_segmentComp', 'Muon_sip3d', 'Muon_softMva', 'Muon_tkRelIso', 'Muon_tunepRelPt', 'Muon_bsConstrainedChi2', 'Muon_bsConstrainedPt', 'Muon_bsConstrainedPtErr', 'Muon_mvaLowPt', 'Muon_mvaTTH', 'Muon_genPartFlav', 'Muon_genPartIdx', 'HLT_Dimuon0_Upsilon_Muon_L1_TM0', 'HLT_Dimuon0_Upsilon_Muon_NoL1Mass']
2. Understanding muons¶
We are now interested in looking at some kinematic distributions for the muon candidates (mass, transverse momentum, pseudorapidity, angular distribution), and we also want to compare the different algorithms that are used in CMS to reconstruct muons. nanoAOD
is very convenient when it comes to this, since it stores this information per muon (ie which reconstruction algorithm each muon passed).
Task 2: Load from the mc_tree
TTree
the branches with the information of the mass, transverse momentum, pseudorapidity, and azimuthal angular and the three reconstruction algorithms.
Hint 1: as you should have seen above, branches for the variable
for a given object (Object
) in nanoAOD
have names Object_variable
.
Hint 2: recall that the reconstructions are called: tracker muon, global muon, and standalone muon algorithm.
Hint 3: a list of branches from a TTree
opened with uproot
can be loaded as tree_name.arrays(branches, library="ak")
# Load muon-related branches
muon_branches = [
"Muon_pt",
"Muon_eta",
"Muon_phi",
"Muon_mass",
"Muon_isTracker",
"Muon_isGlobal",
"Muon_isStandalone"
]
reco_muons = mc_tree.arrays(muon_branches, library = "ak")
Now we can explore the content of the muon candidates reconstructed by the CMS detector:
reco_muons.type
4386 * {"Muon_pt": var * float32, "Muon_eta": var * float32, "Muon_phi": var * float32, "Muon_mass": var * float32, "Muon_isTracker": var * bool, "Muon_isGlobal": var * bool, "Muon_isStandalone": var * bool}
reco_muons
<Array [{Muon_pt: [], ... ] type='4386 * {"Muon_pt": var * float32, "Muon_eta": ...'>
It is worth taking a moment to understand the structure of these arrays, in case you are not familiar with it.
The reco_muons
object contains 4386
entries, meaning that the file we loaded contains this many events. Each element of reco_muons
has a variable lenght, corresponding to the number of muons in each event. For example, the first event does not have muons:
reco_muons[0]
<Record ... Muon_isStandalone: []} type='{"Muon_pt": var * float32, "Muon_eta": ...'>
but the fourth event contains two muons:
reco_muons[4]
<Record ... Muon_isStandalone: [True, False]} type='{"Muon_pt": var * float32, "...'>
we can look at their quantities by accessing the individual fields, e.g.:
reco_muons[4].Muon_pt # equivalent to reco_muons[4]["Muon_pt"]
<Array [25.7, 18.5] type='2 * float32'>
Task 3:
T3.1: make a plot of the muons mass, transverse momentum, pseudorapidity, angular distribution
T3.2: what are the limit values of each quantity? I.e. what are the minimum and maximum values of each of the observables you just plotted?
Hint: each distribution needs to end up in a histogram (e.g. use plt.hist()
). Choose a reasonable binning that makes the observables visible.
# Create a figure with one row per branch
muon_kinematics = muon_branches[:4]
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
axes = axes.flatten()
for ax, branch in zip(axes, muon_kinematics):
data = reco_muons[branch][reco_muons.Muon_pt < 1000]
ax.hist(ak.flatten(data), bins=50, alpha=0.7, color="C0")
ax.set_title(branch)
ax.set_xlabel("Value")
ax.set_ylabel("Counts")
ax.grid(True)
/eos/user/m/mbonanom/.local/lib/python3.9/site-packages/numpy/core/getlimits.py:518: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero. setattr(self, word, getattr(machar, word).flat[0]) /eos/user/m/mbonanom/.local/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero. return self._float_to_str(self.smallest_subnormal)
3. Muon reconstruction algorithms¶
Let us know move on to the understanding of muon algorithms. As we mentioned in the introductory presentation, CMS employs three different reconstruction algorithms for the muons:
- Standalone (STA) reconstruction : uses only the muon system
- Tracker algorithm: silicon tracker may be seeded by muons
- Global reconstruction: matching compatible with STA and tracks
Task 4:
- T 4.1: How many muons do we have in the file we are studying? How many muons does e ac
- T 4.2: How many muons are reconstructed by each algorithm?
- T 4.3: Plot the transverse momentum for all the muons, and overaly to it the distribution for the transverse momentum of tracker muons and global muons. What can you conclude?
Hint 1: you can use the ak.num(array)
to count the number of occurrences.
Hint 2: as you can see from the nanoAOD
documentation the branches associated to the reconstruction algorithms contain boolean variables. For example isTracker
contains either True
or False
for each muon candidate, depending on whether it passed the tracker reconstruction algorithm or not.
plt.hist(ak.num(reco_muons["Muon_pt"]))
(array([2.548e+03, 9.310e+02, 7.320e+02, 0.000e+00, 1.490e+02, 2.400e+01, 0.000e+00, 0.000e+00, 1.000e+00, 1.000e+00]), array([0. , 0.7, 1.4, 2.1, 2.8, 3.5, 4.2, 4.9, 5.6, 6.3, 7. ]), <BarContainer object of 10 artists>)
all_muons = sum(ak.num(reco_muons["Muon_pt"]))
print("Total number of muons:", all_muons)
Total number of muons: 2951
# --- Counts of different muon types ---
tracker_muons = reco_muons.Muon_isTracker
global_muons = reco_muons.Muon_isGlobal
standalone_muons = reco_muons.Muon_isStandalone
print("Tracker muons:", sum(ak.num(reco_muons["Muon_pt"][tracker_muons])))
print("Global muons:", sum(ak.num(reco_muons["Muon_pt"][global_muons])))
print("Standalone muons:", sum(ak.num(reco_muons["Muon_pt"][standalone_muons])))
Tracker muons: 2746 Global muons: 2528 Standalone muons: 2603
# baseline
pt = reco_muons.Muon_pt
eta = reco_muons.Muon_eta
phi = reco_muons.Muon_phi
pt_tracker, eta_tracker, phi_tracker = pt[tracker_muons], eta[tracker_muons], phi[tracker_muons]
pt_global, eta_global, phi_global = pt[global_muons], eta[global_muons], phi[global_muons]
# --- Plotting ---
def plot_hist(data, title, xlabel, bins=50):
plt.hist(ak.to_numpy(ak.flatten(data)), bins=bins, histtype="step")
plt.title(title)
plt.xlabel(xlabel)
plt.ylabel("Entries")
plt.show()
bins_pt = np.linspace(0, 200, 100)
t = plt.hist(ak.flatten(pt), bins_pt, label = 'All muons', histtype = 'step')
t = plt.hist(ak.flatten(pt_tracker), bins_pt, label = 'Muons tracker', histtype = 'step')
t = plt.hist(ak.flatten(pt_global), bins_pt, label = 'Muons global', histtype = 'step')
plt.legend()
<matplotlib.legend.Legend at 0x7f10d5c2d280>
4. Generator level muons¶
We now want to look at the muons defined at generator level. Since we are looking at DY, we naively would expect that all the events contain two prompt leptons at generator level.
Task 5: How many generator level prompt muons are present in each event?
gen_branches = ["GenPart_pt",
"GenPart_eta",
"GenPart_phi",
"GenPart_mass",
"GenPart_pdgId",
"GenPart_status",
"GenPart_statusFlags"]
genparts = mc_tree.arrays(gen_branches, library = "ak")
In this case, the nanoAOD
file does not contain one branch per gen-level particle, but it contains branches for all the GenPart
-icles in the event. Hence, we need to load the entire collection and then sort out only the prompt muons from there:
sel_gpt = abs(genparts.GenPart_pdgId) == 13
gen_pt = genparts.GenPart_pt[sel_gpt]
gen_status = genparts.GenPart_statusFlags
isPrompt = (gen_status & (1 << 0)) != 0
gen_pt_good = genparts.GenPart_pt[(sel_gpt) & (genparts.GenPart_status ==1) & isPrompt]
ngenmu = plt.hist(ak.num(gen_pt_good,axis=1))
ak.num(gen_pt_good,axis=1)
<Array [0, 0, 0, 0, 2, 0, ... 0, 2, 2, 0, 0, 0] type='4386 * int64'>
As expected, our events have either 0 generator level muons or exactly two.
Let us now compare the prompt muons we have defined above with the muons reconstructed with the global algorithm.
Task 6: Compare the histogram of the distribution of the transverse momentum of global muons and gen-level prompt muons. What do you note? What can you conclude about the global muons reconstruction algorithm?
t = plt.hist(ak.flatten(pt_global), bins_pt, label = 'Muons global', histtype = 'step',density=True)
t = plt.hist(ak.flatten(gen_pt_good), bins_pt, label = 'Prompt gen muons', histtype = 'step',density=True)
plt.legend()
<matplotlib.legend.Legend at 0x7f10d316fd90>
5. miniAOD
vs nanoAOD
¶
In the following we quickly explore the other most popular format in CMS analyses: miniAOD
. This format is a bit more cumbersome to handle, as it requires ROOT
libraries and dependencies to work with. However, it has a lower level of filtering compared to nanoAOD
, so it could be that it contains more information. Let's have a look at it by loading the miniAOD
file corresponding to the nanoAOD
one we worked on so far, and by checking how many muons are reconstructed by each algorithm in this case:
events = Events("/eos/user/m/mbonanom/spark_tnp/DY_mini.root")
#define the collections to read, these are the same that you obtained running edmDumpEventContent
recMuonLabel = "slimmedMuons"
recMuons = Handle("std::vector<pat::Muon>")
genMuons = Handle("std::vector<pat::PackedGenParticle>")
# setup some counters
count_nTracker=0
count_nGlobal=0
count_nStandalone=0
for i, event in enumerate(events):
event.getByLabel(recMuonLabel, recMuons)
count_per_event_nTracker=0
count_per_event_nGlobal=0
count_per_event_nStandalone=0
for j, muon in enumerate(recMuons.product(),1):
if muon.isTrackerMuon():
count_nTracker+=1
count_per_event_nTracker+=1
if muon.isGlobalMuon():
count_nGlobal+=1
count_per_event_nGlobal+=1
if muon.isStandAloneMuon() :
count_nStandalone+=1
count_per_event_nStandalone+=1
print(count_nTracker, count_nGlobal, count_nStandalone)
9993 3542 4474
What can you see? How do these numbers make sense based on what you have learnt on the three reconstruction algorithms? How does it compare with the above numbers:
Tracker muons: 2746
Global muons: 2528
Standalone muons: 2603
Task 7: Try to play around with the event
information from the miniAOD
for loop above. Can you figure out where the difference comes from?
Hint: think of which observables we looked at above and check what is their impact on the miniAOD
muons.
6. Bonus track: gen-level to reco-level matching¶
Congratulations! You have made it until now. Take some rest before we move on to the next part of this short exercise, or dive more into the muons with the following bonus exercise!
Hint: If you have picked up the higgs to four-leptons long exercise, this bonus exercise could be helpful and interesting ;)
Since we are looking at Drell-Yann events, we can try to reconstruct the Z boson. As we have seen above, while at gen-level we either have exactly two prompt leptons or none at all in each event, at reco level the situation could be a less clear. In the following we try to:
- Reconstruct the Z boson from events that contain exactly two reco-level muons;
- Reconstruct the Z boson from gen-level muons (this is easier, since we should always be able to do so);
- Reconstruct the Z boson from reco-level muons only if they can be matched to a gen-level muon, i.e. if the muon we have reconstructed is compatible with the generated one.
First of all, we need to load the reconstructed muons and the gen-level ones into four vectors. To do so, we use the following structure:
mu_vec = ak.zip({
"pt": muons.Muon_pt,
"eta": muons.Muon_eta,
"phi": muons.Muon_phi,
"mass": muons.Muon_mass
},
with_name="Momentum4D",
behavior=muons.behavior
)
Here you can read more about the ak.zip
function. The with_name="Momentum4D"
inherits from the vector
library we imported above, and ensures that our mu_vec
object has a "Momentum4D"
behavior
attached to it, and it can be used to perform operations as if it were a TLorentzVector
of ROOT
.
mu_vec = ak.zip({
"pt": reco_muons.Muon_pt,
"eta": reco_muons.Muon_eta,
"phi": reco_muons.Muon_phi,
"mass": reco_muons.Muon_mass
},
with_name="Momentum4D",
behavior=reco_muons.behavior
)
## Do the same for gen-level muons here
gen_vec = vector.awkward.zip({
"pt": genparts.GenPart_pt,
"eta": genparts.GenPart_eta,
"phi": genparts.GenPart_phi,
"mass": genparts.GenPart_mass
},
with_name="Momentum4D",
behavior=genparts.behavior
)
Now we select only events that contain exactly two muons and we build the invariant mass of these pairs.
Task: The Z boson should actually contain two opposite sign muons. Can you try to implement this check in building the Z mass? How does it change?
sel_mm = ak.num(mu_vec) == 2
mu_pairs = mu_vec[sel_mm]
mm_mass = (mu_pairs[:,0] + mu_pairs[:,1]).mass
/eos/user/m/mbonanom/.local/lib/python3.9/site-packages/numpy/core/getlimits.py:518: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero. setattr(self, word, getattr(machar, word).flat[0]) /eos/user/m/mbonanom/.local/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero. return self._float_to_str(self.smallest_subnormal)
t = plt.hist(mm_mass, histtype = 'step', bins = np.linspace(0, 200, 100))
We now do the same for gen-level muons, but first we have to select only the prompt muons, like we did above:
sel_gpt = (abs(genparts.GenPart_pdgId) == 13)
isPrompt = (genparts.GenPart_statusFlags & (1 << 0)) != 0
sel_gen = sel_gpt & (genparts.GenPart_status == 1) & isPrompt
good_gen_mu = gen_vec[sel_gen]
## Write here your code to select only events with exactly two muons at gen level
sel_mm_gen = ak.num(good_gen_mu) == 2
gen_mm_pairs = good_gen_mu[sel_mm_gen]
gen_mm = (gen_mm_pairs[:, 0] + gen_mm_pairs[:, 1]).mass
t = plt.hist(gen_mm, histtype = 'step', bins = np.linspace(0, 200, 100))
Now we perform a deltaR
based matching. We select only muons that have a deltaR
distance from the gen-level muon smaller tha 0.2.
# Exactly two reco + two gen muons
sel_evt = sel_mm & sel_mm_gen
mu_two = mu_vec[sel_evt]
gen_two = good_gen_mu[sel_evt]
# Matching reco to gen by ΔR
# Build pairings: reco vs gen
# shape: (events, nReco, nGen)
dr = mu_two[:, :, None].deltaR(gen_two[:, None, :])
best_match_idx = ak.argmin(dr, axis=2) # index of gen matched to each reco
best_match_dr = ak.min(dr, axis=2) # ΔR of that match
matched = best_match_dr < 0.2
valid_match = (ak.sum(matched, axis=1) == 2) & (best_match_idx[:,0] != best_match_idx[:,1])
mu_matched = mu_two[valid_match]
pairs = ak.combinations(mu_matched, 2, fields=["mu1", "mu2"])
m_reco_matched = (pairs["mu1"] + pairs["mu2"]).mass
plt.hist(ak.to_numpy(ak.flatten(m_reco_matched)), bins = np.linspace(0, 200, 100), histtype="step")
plt.xlabel("m(μμ) [GeV]")
plt.ylabel("Events")
plt.title("Reco dimuon mass")
Text(0.5, 1.0, 'Reco dimuon mass')