DAS 2025 - Short exercise: Muons¶

Prerequisites¶

Make sure that you have done the following before starting with this short exercise:

  • Request acces to the CERN analytix cluster here
    • Project description: MUO POG SFs derivation and DAS exercise
    • Source of data: /cms/muon_pog/parquet/muon
    • Leave all the other fields as they are
  • Set up a CMSSW_13_0_14 area in your eos space (i.e. cmsrel CMSSW_13_0_14 in /eos/user/U/USERNAME/)
  • Clone the spark_tnp repository from here and install it following the instructions in the README # TODO: Fix this link with the correct fork
    • Modify this line with the correct path to the CMSSW_13_0_14 area you just set up (should be /eos/user/U/USERNAME/CMSSW_13_0_14/src)
  • Make sure you can access SWAN

Additional material¶

In this notebook we are going to heavily rely on Awkward Arrays. We are going to use very baseline functionalities, and everything should be quite straightforward. However, if feel like you want a more gentle introduction to it beforehand (or while you work on this exercise), you can have a look at:

  • This and this tutorial
  • This introductory page and all the inputs on the left sidebar
  • Google search ;)

In this notebook¶

  1. Introduction
  2. Inspecting nanoAOD structure
  3. Understanding muons
  4. Muon reconstruction algorithms
  5. Generator level muons
  6. miniAOD vs nanoAOD
  7. Bonus exercise: gen-level matching

0. Introduction¶

We will start this exercise with a brief introduction about what muons are, what are their properties and how they interact with matter. Afterwards you will learn how muons are reconstructed in the CMS detector and why they are such a standard candle for many physics analyses. In the hands on part, which starts with this notebook, you will learn how to access information about muon candidates in the .root files used in CMS. In the first part of this notebook, we are going to use the so called nanoAOD file format. This is the last level of skimming that you can get to in the processing of CMS data. It consists of a flat tree containing only the final and most general muon information, meaning that some selection criteria are also applied to the muons candidates (of course, also to the other particles) stored in here.

In the following, we will learn how to:

  • Get familiar with nanoAOD files, their structure and their content;
  • Access the relevant kinematic features of muons from the nanoAOD files;
  • Perform an exploratory analysis, checking the distributions of muons transverse momentum, pseudo rapidity, and angular distribution;
  • Compare the reconstructed and generator-level distributions of muons;
  • Compare different reconstruction algorithms;

In addition, in the last part of this notebook, we will do a quick comparison between the muons stored in miniAOD and nanoAOD files.

Let's start by importing some relevant libraries, starting with ROOT and DataFormats, that are needed at the end of this exercise to handle miniAOD files:

In [1]:
import ROOT
ROOT.gSystem.Load("libFWCoreFWLite.so")
ROOT.FWLiteEnabler.enable()
from DataFormats.FWLite import Handle, Events
Welcome to JupyROOT 6.26/11

Let's now import additional libraries, useful for data manipulation (uproot, awkward, vector), numerical operations (numpy) and plotting (matplotlib and mplhep):

In [2]:
# Data manipulation
import uproot
import vector
import awkward as ak
# Numerical operations
import numpy as np
# Plotting
import matplotlib.pyplot as plt
import mplhep as hep
# Make plots in CMS style
plt.style.use(hep.style.CMS)

Unfortunately, we do not have time to go through all the details of these libraries, or to give too much introduction to it, but at the bottom of this notebook you can find some additional references. Before moving on, we need one last call, i.e. vector.register_awkward(). ROOT automatically supports operations between four-vectors, but awkward arrays don't. Awkward Array is a library for nested, variable-sized data, including arbitrary-length lists, records, mixed types, and missing data, using NumPy-like idioms. However, by default, there is no information that encodes our arrays as lorentz vectors. This call to vector.register_awkward() is a crucial to integrate vector's functionality into Awkward Array's global behavior, enabling Awkward Array to recognize vector types and operations. This allows for efficient creation and manipulation of multi-dimensional Lorentz vectors (like Momentum2D, Momentum3D, etc.) within Awkward Array data structures, making complex physics data analysis more streamlined.

In [3]:
vector.register_awkward()

In the spark_tnp folder you have cloned for this exercise, you can find two files: DY_mini.root and DY_nano.root. These contain the same events, but in the two different formats of miniAOD and nanoAOD, respectively.

Specify in input_dir the location of your spark_tnp folder (if you have followed everything correctly, it should be something like /eos/user/U/USERNAME/spark_tnp:

In [4]:
### Edit here: add your path
input_dir = "/eos/user/m/mbonanom/spark_tnp"

and now let's define the full path of our mini and nanoAOD files:

In [5]:
mc_nano = f"{input_dir}/DY_nano.root"
mc_mini = f"{input_dir}/DY_mini.root"

1. Inspecting nanoAOD structure¶

First of all use use uproot to open our file. You can find more details about the structure of nanoAOD files at this documentation page. In general, every nanoAOD file contains five TTree objects:

  KEY: TTree	Events;1	Events
  KEY: TTree	LuminosityBlocks;1	LuminosityBlocks
  KEY: TTree	Runs;1	Runs
  KEY: TTree	MetaData;1	Job metadata
  KEY: TTree	ParameterSets;1	Parameter sets

While LuminosityBlocks and Runs contain the relevant information to determine when a given event was recorded (or gives unique event identificators to simulated events), the Events TTree is what we are interested about. It contains the information about all the objects reconstructed by the CMS detector.

Let's first open the file and extract the Events tree:

In [6]:
file_nano = uproot.open(f"{mc_nano}")
mc_tree = file_nano["Events"]

As you can see, mc_tree is a TTree object that contains 1840 branches:

In [7]:
mc_tree
Out[7]:
<TTree 'Events' (1840 branches) at 0x7f10f95a1fa0>

Task 1: How many of these branches have information associated to muons?

Hint: you can access the branches of a TTree opened with uproot by using .keys()

In [8]:
# All branches
branches = mc_tree.keys()
# Branches that have any reference to `mu`(ons)
mu_branches = [k for k in branches if 'mu' in k.lower()]
# Only `Muon_` branches
muo_branches = [k for k in branches if 'Muon_' in k]
In [9]:
muo_branches
Out[9]:
['Muon_highPtId',
 'Muon_highPurity',
 'Muon_inTimeMuon',
 'Muon_isGlobal',
 'Muon_isPFcand',
 'Muon_isStandalone',
 'Muon_isTracker',
 'Muon_jetNDauCharged',
 'Muon_looseId',
 'Muon_mediumId',
 'Muon_mediumPromptId',
 'Muon_miniIsoId',
 'Muon_multiIsoId',
 'Muon_mvaMuID_WP',
 'Muon_nStations',
 'Muon_nTrackerLayers',
 'Muon_pfIsoId',
 'Muon_puppiIsoId',
 'Muon_softId',
 'Muon_softMvaId',
 'Muon_tightCharge',
 'Muon_tightId',
 'Muon_tkIsoId',
 'Muon_triggerIdLoose',
 'Muon_jetIdx',
 'Muon_svIdx',
 'Muon_fsrPhotonIdx',
 'Muon_charge',
 'Muon_pdgId',
 'Muon_dxy',
 'Muon_dxyErr',
 'Muon_dxybs',
 'Muon_dz',
 'Muon_dzErr',
 'Muon_eta',
 'Muon_ip3d',
 'Muon_jetPtRelv2',
 'Muon_jetRelIso',
 'Muon_mass',
 'Muon_miniPFRelIso_all',
 'Muon_miniPFRelIso_chg',
 'Muon_mvaMuID',
 'Muon_pfRelIso03_all',
 'Muon_pfRelIso03_chg',
 'Muon_pfRelIso04_all',
 'Muon_phi',
 'Muon_pt',
 'Muon_ptErr',
 'Muon_segmentComp',
 'Muon_sip3d',
 'Muon_softMva',
 'Muon_tkRelIso',
 'Muon_tunepRelPt',
 'Muon_bsConstrainedChi2',
 'Muon_bsConstrainedPt',
 'Muon_bsConstrainedPtErr',
 'Muon_mvaLowPt',
 'Muon_mvaTTH',
 'Muon_genPartFlav',
 'Muon_genPartIdx',
 'HLT_Dimuon0_Upsilon_Muon_L1_TM0',
 'HLT_Dimuon0_Upsilon_Muon_NoL1Mass']

2. Understanding muons¶

We are now interested in looking at some kinematic distributions for the muon candidates (mass, transverse momentum, pseudorapidity, angular distribution), and we also want to compare the different algorithms that are used in CMS to reconstruct muons. nanoAOD is very convenient when it comes to this, since it stores this information per muon (ie which reconstruction algorithm each muon passed).

Task 2: Load from the mc_tree TTree the branches with the information of the mass, transverse momentum, pseudorapidity, and azimuthal angular and the three reconstruction algorithms.

Hint 1: as you should have seen above, branches for the variable for a given object (Object) in nanoAOD have names Object_variable.

Hint 2: recall that the reconstructions are called: tracker muon, global muon, and standalone muon algorithm.

Hint 3: a list of branches from a TTree opened with uproot can be loaded as tree_name.arrays(branches, library="ak")

In [10]:
# Load muon-related branches
muon_branches = [
"Muon_pt",
"Muon_eta",
"Muon_phi",
"Muon_mass",
"Muon_isTracker",
"Muon_isGlobal",
"Muon_isStandalone"
]
reco_muons = mc_tree.arrays(muon_branches, library = "ak")

Now we can explore the content of the muon candidates reconstructed by the CMS detector:

In [11]:
reco_muons.type
Out[11]:
4386 * {"Muon_pt": var * float32, "Muon_eta": var * float32, "Muon_phi": var * float32, "Muon_mass": var * float32, "Muon_isTracker": var * bool, "Muon_isGlobal": var * bool, "Muon_isStandalone": var * bool}
In [12]:
reco_muons
Out[12]:
<Array [{Muon_pt: [], ... ] type='4386 * {"Muon_pt": var * float32, "Muon_eta": ...'>

It is worth taking a moment to understand the structure of these arrays, in case you are not familiar with it.

The reco_muons object contains 4386 entries, meaning that the file we loaded contains this many events. Each element of reco_muons has a variable lenght, corresponding to the number of muons in each event. For example, the first event does not have muons:

In [13]:
reco_muons[0]
Out[13]:
<Record ... Muon_isStandalone: []} type='{"Muon_pt": var * float32, "Muon_eta": ...'>

but the fourth event contains two muons:

In [14]:
reco_muons[4]
Out[14]:
<Record ... Muon_isStandalone: [True, False]} type='{"Muon_pt": var * float32, "...'>

we can look at their quantities by accessing the individual fields, e.g.:

In [15]:
reco_muons[4].Muon_pt # equivalent to reco_muons[4]["Muon_pt"]
Out[15]:
<Array [25.7, 18.5] type='2 * float32'>

Task 3:

  • T3.1: make a plot of the muons mass, transverse momentum, pseudorapidity, angular distribution

  • T3.2: what are the limit values of each quantity? I.e. what are the minimum and maximum values of each of the observables you just plotted?

Hint: each distribution needs to end up in a histogram (e.g. use plt.hist()). Choose a reasonable binning that makes the observables visible.

In [16]:
# Create a figure with one row per branch
muon_kinematics = muon_branches[:4]
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
axes = axes.flatten()

for ax, branch in zip(axes, muon_kinematics):
    data = reco_muons[branch][reco_muons.Muon_pt < 1000]
    ax.hist(ak.flatten(data), bins=50, alpha=0.7, color="C0")
    ax.set_title(branch)
    ax.set_xlabel("Value")
    ax.set_ylabel("Counts")
    ax.grid(True)
/eos/user/m/mbonanom/.local/lib/python3.9/site-packages/numpy/core/getlimits.py:518: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/eos/user/m/mbonanom/.local/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
No description has been provided for this image

3. Muon reconstruction algorithms¶

Let us know move on to the understanding of muon algorithms. As we mentioned in the introductory presentation, CMS employs three different reconstruction algorithms for the muons:

  • Standalone (STA) reconstruction : uses only the muon system
  • Tracker algorithm: silicon tracker may be seeded by muons
  • Global reconstruction: matching compatible with STA and tracks

Task 4:

  • T 4.1: How many muons do we have in the file we are studying? How many muons does e ac
  • T 4.2: How many muons are reconstructed by each algorithm?
  • T 4.3: Plot the transverse momentum for all the muons, and overaly to it the distribution for the transverse momentum of tracker muons and global muons. What can you conclude?

Hint 1: you can use the ak.num(array) to count the number of occurrences.

Hint 2: as you can see from the nanoAOD documentation the branches associated to the reconstruction algorithms contain boolean variables. For example isTracker contains either True or False for each muon candidate, depending on whether it passed the tracker reconstruction algorithm or not.

In [17]:
plt.hist(ak.num(reco_muons["Muon_pt"]))
Out[17]:
(array([2.548e+03, 9.310e+02, 7.320e+02, 0.000e+00, 1.490e+02, 2.400e+01,
        0.000e+00, 0.000e+00, 1.000e+00, 1.000e+00]),
 array([0. , 0.7, 1.4, 2.1, 2.8, 3.5, 4.2, 4.9, 5.6, 6.3, 7. ]),
 <BarContainer object of 10 artists>)
No description has been provided for this image
In [18]:
all_muons = sum(ak.num(reco_muons["Muon_pt"]))
print("Total number of muons:", all_muons)
Total number of muons: 2951
In [19]:
# --- Counts of different muon types ---
tracker_muons = reco_muons.Muon_isTracker
global_muons = reco_muons.Muon_isGlobal
standalone_muons = reco_muons.Muon_isStandalone
In [20]:
print("Tracker muons:", sum(ak.num(reco_muons["Muon_pt"][tracker_muons])))
print("Global muons:",  sum(ak.num(reco_muons["Muon_pt"][global_muons])))
print("Standalone muons:", sum(ak.num(reco_muons["Muon_pt"][standalone_muons])))
Tracker muons: 2746
Global muons: 2528
Standalone muons: 2603
In [21]:
# baseline
pt = reco_muons.Muon_pt
eta = reco_muons.Muon_eta
phi = reco_muons.Muon_phi

pt_tracker, eta_tracker, phi_tracker = pt[tracker_muons], eta[tracker_muons], phi[tracker_muons]
pt_global, eta_global, phi_global = pt[global_muons], eta[global_muons], phi[global_muons]
In [22]:
# --- Plotting ---
def plot_hist(data, title, xlabel, bins=50):
    plt.hist(ak.to_numpy(ak.flatten(data)), bins=bins, histtype="step")
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel("Entries")
    plt.show()
In [24]:
bins_pt = np.linspace(0, 200, 100)
t = plt.hist(ak.flatten(pt), bins_pt, label = 'All muons', histtype = 'step')
t = plt.hist(ak.flatten(pt_tracker), bins_pt, label = 'Muons tracker', histtype = 'step')
t = plt.hist(ak.flatten(pt_global), bins_pt, label = 'Muons global', histtype = 'step')
plt.legend()
Out[24]:
<matplotlib.legend.Legend at 0x7f10d5c2d280>
No description has been provided for this image

4. Generator level muons¶

We now want to look at the muons defined at generator level. Since we are looking at DY, we naively would expect that all the events contain two prompt leptons at generator level.

Task 5: How many generator level prompt muons are present in each event?

In [25]:
gen_branches = ["GenPart_pt",
                "GenPart_eta",
                "GenPart_phi",
                "GenPart_mass",
                "GenPart_pdgId",
                "GenPart_status",
                "GenPart_statusFlags"]
genparts = mc_tree.arrays(gen_branches, library = "ak")

In this case, the nanoAOD file does not contain one branch per gen-level particle, but it contains branches for all the GenPart-icles in the event. Hence, we need to load the entire collection and then sort out only the prompt muons from there:

In [26]:
sel_gpt = abs(genparts.GenPart_pdgId) == 13
gen_pt = genparts.GenPart_pt[sel_gpt]
gen_status = genparts.GenPart_statusFlags
isPrompt = (gen_status & (1 << 0)) != 0
gen_pt_good = genparts.GenPart_pt[(sel_gpt) & (genparts.GenPart_status ==1) & isPrompt]
In [29]:
ngenmu = plt.hist(ak.num(gen_pt_good,axis=1))
No description has been provided for this image
In [30]:
ak.num(gen_pt_good,axis=1)
Out[30]:
<Array [0, 0, 0, 0, 2, 0, ... 0, 2, 2, 0, 0, 0] type='4386 * int64'>

As expected, our events have either 0 generator level muons or exactly two.

Let us now compare the prompt muons we have defined above with the muons reconstructed with the global algorithm.

Task 6: Compare the histogram of the distribution of the transverse momentum of global muons and gen-level prompt muons. What do you note? What can you conclude about the global muons reconstruction algorithm?

In [31]:
t = plt.hist(ak.flatten(pt_global), bins_pt, label = 'Muons global', histtype = 'step',density=True)
t = plt.hist(ak.flatten(gen_pt_good), bins_pt, label = 'Prompt gen muons', histtype = 'step',density=True)
plt.legend()
Out[31]:
<matplotlib.legend.Legend at 0x7f10d316fd90>
No description has been provided for this image

5. miniAOD vs nanoAOD¶

In the following we quickly explore the other most popular format in CMS analyses: miniAOD. This format is a bit more cumbersome to handle, as it requires ROOT libraries and dependencies to work with. However, it has a lower level of filtering compared to nanoAOD, so it could be that it contains more information. Let's have a look at it by loading the miniAOD file corresponding to the nanoAOD one we worked on so far, and by checking how many muons are reconstructed by each algorithm in this case:

In [32]:
events = Events("/eos/user/m/mbonanom/spark_tnp/DY_mini.root")

#define the collections to read, these are the same that you obtained running edmDumpEventContent
recMuonLabel = "slimmedMuons"
recMuons = Handle("std::vector<pat::Muon>")
genMuons = Handle("std::vector<pat::PackedGenParticle>")

# setup some counters
count_nTracker=0
count_nGlobal=0
count_nStandalone=0

for i, event in enumerate(events):
    event.getByLabel(recMuonLabel, recMuons)
    count_per_event_nTracker=0
    count_per_event_nGlobal=0
    count_per_event_nStandalone=0

    for j, muon in enumerate(recMuons.product(),1):
        if muon.isTrackerMuon():
            count_nTracker+=1
            count_per_event_nTracker+=1
        if muon.isGlobalMuon():
            count_nGlobal+=1
            count_per_event_nGlobal+=1
        if muon.isStandAloneMuon() :
            count_nStandalone+=1
            count_per_event_nStandalone+=1
In [33]:
print(count_nTracker, count_nGlobal, count_nStandalone)
9993 3542 4474

What can you see? How do these numbers make sense based on what you have learnt on the three reconstruction algorithms? How does it compare with the above numbers:

Tracker muons: 2746
Global muons: 2528
Standalone muons: 2603

Task 7: Try to play around with the event information from the miniAOD for loop above. Can you figure out where the difference comes from?

Hint: think of which observables we looked at above and check what is their impact on the miniAOD muons.

6. Bonus track: gen-level to reco-level matching¶

Congratulations! You have made it until now. Take some rest before we move on to the next part of this short exercise, or dive more into the muons with the following bonus exercise!

Hint: If you have picked up the higgs to four-leptons long exercise, this bonus exercise could be helpful and interesting ;)

Since we are looking at Drell-Yann events, we can try to reconstruct the Z boson. As we have seen above, while at gen-level we either have exactly two prompt leptons or none at all in each event, at reco level the situation could be a less clear. In the following we try to:

  • Reconstruct the Z boson from events that contain exactly two reco-level muons;
  • Reconstruct the Z boson from gen-level muons (this is easier, since we should always be able to do so);
  • Reconstruct the Z boson from reco-level muons only if they can be matched to a gen-level muon, i.e. if the muon we have reconstructed is compatible with the generated one.

First of all, we need to load the reconstructed muons and the gen-level ones into four vectors. To do so, we use the following structure:

    mu_vec = ak.zip({
        "pt": muons.Muon_pt,
        "eta": muons.Muon_eta,
        "phi": muons.Muon_phi,
        "mass": muons.Muon_mass
    },
        with_name="Momentum4D",
        behavior=muons.behavior
    )

Here you can read more about the ak.zip function. The with_name="Momentum4D" inherits from the vector library we imported above, and ensures that our mu_vec object has a "Momentum4D" behavior attached to it, and it can be used to perform operations as if it were a TLorentzVector of ROOT.

In [35]:
mu_vec = ak.zip({
    "pt": reco_muons.Muon_pt,
    "eta": reco_muons.Muon_eta,
    "phi": reco_muons.Muon_phi,
    "mass": reco_muons.Muon_mass
},
    with_name="Momentum4D",
    behavior=reco_muons.behavior
)

## Do the same for gen-level muons here
gen_vec = vector.awkward.zip({
    "pt": genparts.GenPart_pt,
    "eta": genparts.GenPart_eta,
    "phi": genparts.GenPart_phi,
    "mass": genparts.GenPart_mass
},
    with_name="Momentum4D",
    behavior=genparts.behavior
)

Now we select only events that contain exactly two muons and we build the invariant mass of these pairs.

Task: The Z boson should actually contain two opposite sign muons. Can you try to implement this check in building the Z mass? How does it change?

In [36]:
sel_mm = ak.num(mu_vec) == 2
In [37]:
mu_pairs = mu_vec[sel_mm]
In [38]:
mm_mass = (mu_pairs[:,0] + mu_pairs[:,1]).mass
/eos/user/m/mbonanom/.local/lib/python3.9/site-packages/numpy/core/getlimits.py:518: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/eos/user/m/mbonanom/.local/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
In [39]:
t = plt.hist(mm_mass, histtype = 'step', bins = np.linspace(0, 200, 100))
No description has been provided for this image

We now do the same for gen-level muons, but first we have to select only the prompt muons, like we did above:

In [40]:
sel_gpt = (abs(genparts.GenPart_pdgId) == 13)
isPrompt = (genparts.GenPart_statusFlags & (1 << 0)) != 0
sel_gen = sel_gpt & (genparts.GenPart_status == 1) & isPrompt
good_gen_mu = gen_vec[sel_gen]
In [41]:
## Write here your code to select only events with exactly two muons at gen level
sel_mm_gen = ak.num(good_gen_mu) == 2
gen_mm_pairs = good_gen_mu[sel_mm_gen]
In [42]:
gen_mm = (gen_mm_pairs[:, 0] + gen_mm_pairs[:, 1]).mass
In [43]:
t = plt.hist(gen_mm, histtype = 'step', bins = np.linspace(0, 200, 100))
No description has been provided for this image

Now we perform a deltaR based matching. We select only muons that have a deltaR distance from the gen-level muon smaller tha 0.2.

In [44]:
# Exactly two reco + two gen muons
sel_evt = sel_mm & sel_mm_gen
mu_two  = mu_vec[sel_evt]
gen_two = good_gen_mu[sel_evt]

# Matching reco to gen by ΔR
# Build pairings: reco vs gen
# shape: (events, nReco, nGen)
dr = mu_two[:, :, None].deltaR(gen_two[:, None, :])
In [45]:
best_match_idx = ak.argmin(dr, axis=2)   # index of gen matched to each reco
best_match_dr  = ak.min(dr, axis=2)      # ΔR of that match
In [46]:
matched = best_match_dr < 0.2
In [47]:
valid_match = (ak.sum(matched, axis=1) == 2) & (best_match_idx[:,0] != best_match_idx[:,1])
In [48]:
mu_matched = mu_two[valid_match]
In [49]:
pairs = ak.combinations(mu_matched, 2, fields=["mu1", "mu2"])
m_reco_matched = (pairs["mu1"] + pairs["mu2"]).mass
In [50]:
plt.hist(ak.to_numpy(ak.flatten(m_reco_matched)),  bins = np.linspace(0, 200, 100), histtype="step")
plt.xlabel("m(μμ) [GeV]")
plt.ylabel("Events")
plt.title("Reco dimuon mass")
Out[50]:
Text(0.5, 1.0, 'Reco dimuon mass')
No description has been provided for this image
In [ ]: