=== Type A ===
1) Abstract: Only the BDT result is given - what about the cut-based analysis?
2) L. 67: "kinematically accessible" (no hyphen)
3) L. 135: Adjust the blank spaces in "gamma + jet": either put them on both sides or on neither, but not just one.
4) L. 182: Just "greater than 90%" for all; "greater than 90%" together with "about 92%" seems a bit redundant.
5) L. 203: "eightH" (missing "h")
6) L. 274: "Simulated events" instead of "Simulation events"
7) L.s 283-5: Unclear/incomprehensible sentence; one "events" appears to be one too many.
8) L.s 298-300: Is the 1% for PDFs valid for both signal and background? If so, drop it from the second sentence; if not, state that.
9) L.s 307/8: Put "as a function of kinematic distributions" directly before the comma: "...observed between prediction and data as a function..."
10) L. 344: No "." after 1.3±0.2
11) Fig. 6: "the particleS" or "decayS" (plural missing)

=== Type B ===
1) L. 54: Why LO for these particular samples and not for all? Or why not NLO for these? Some samples are used at both NLO and LO - why?
2) L. 94: Is WW decaying leptonically being disregarded?
3) L.s 158/9: What kind of pT ratio is that between the lepton and the nearest jet? What kind of "more stringent isolation" is required?
4) L.s 161-3: Why is that hadronic activity a meaningful variable? It seems quite arbitrary.
5) L. 173: What criteria are applied for b tagging? Mistag rate, efficiency?
6) L.s 182/3 (perhaps more Type A): What does that efficiency mean? Usually the offline cut is quoted w.r.t. the trigger, not the other way around, isn't it?
7) L. 185 (also perhaps Type A): Do these numbers of jets mean there are four jets overall, or do the b-tagged jets count toward both Nb and Njets?
8) L. 234: "subtracting lepton contamination based on simulation": How? Cuts? Other definitions?
9) L. 281: Why is half of the difference from unity taken as the systematic uncertainty? Why not all of it?
10) L. 305: How are these uncertainties justified, especially the doubling for pT > 50 GeV?
11) L.s 345-8: Why is it acceptable, or necessary in the first place, to scale those parameters? What is the motivation/justification? Is it to match previous analyses?
12) Fig. 1: Shouldn't there be uncertainties extending upwards from 0 where there are no counts?
13) Fig. 1: What is the reason for the systematic underestimation? Is it because this is before scaling (it seems so from Fig. 3)? And is that also the reason for the scaling (cf. 11))?
14) Fig. 3: Is it expected that the signal contribution is non-negligible in only very few SRs? Is the BDT easier to train in those regions?
15) L.s 387/8: Shouldn't 1.4 be the lower value?
16) Tab. 4: How can there be SRs with 0 expected tttt events? Is that still an SR?
17) L. 448: It hardly seems like a valid point that the cut-based analysis agrees with the BDT-based one; shouldn't the comparison be with the SM prediction?