Paper Reading (Inst. Rev. HIG-16-030)

Name: Paper Reading (Inst. Rev. HIG-16-030)
Start: 2018-02-23T11:30:00+01:00
End: 2018-02-23T12:30:00+01:00
Location: DESY

Friday 23 Feb 2018, 11:30 → 12:30 Europe/Berlin

CMS Center (DESY)

Description

Modification password: CWR

Put CADI number into meeting name

Organizer: Heiner Tholen

submitted comments: https://cds.cern.ch/record/2304495

- 1
  
  Discussion
  
  Comment from Rainer
  
  Type B comments
  ===============
  
  General
  -------
  
  - The PAS on which this paper was based is much longer than the paper. This is particularly a shame since some of the information in the PAS would have been useful in the paper too. With the target journal being JHEP (which normally takes longer papers), why was this paper draft written as a letter?
  
  - If you do not write in letter style, for a normal paper please include section headings. Otherwise there is a very hard transition in L64, going over to the CMS detector description, where a concluding sentence for the previous paragraph would round it off.
  
  - The introduction discusses various 2HDMs. Is there a reason why no interpretation of the results in (some of) these model scenarios is provided? Given that the PAS of this analysis came out 1.5 years ago this could have been a nice addition to the paper.
  
  - The analysis strategy should be laid out clearly. The description is very vague concerning how the flavors of the cb system are tagged. Is a charm tagger used? Is the btag information of the c candidate ignored?
  
  - In the light of the analysis strategy, please discuss whether/how the cs channel (previously published by CMS) is separated from the cb channel, which is the subject of this search.
  
  - In L16-17 you mention that the charge conjugate decay is also taken into account, but no further mention of this is made in the paper. Are the H+ and H- decays treated separately or combined? Do the results in those two channels agree with each other? Please mention that the charge conjugation is treated implicitly, if this is the case. Furthermore, the particle searched for should then be denoted H^\pm (plus minus) instead of H^+.
  
  - The concept of two different H+ mass ranges in this analysis is not discussed before L214. Does the H+ mass range assumption bias the candidate selection via the kinematic fits? The resulting mass distributions in Fig.1 seem to be rather different.
  
  - Regarding the >= 3 b category: the background is undershot over a large mass range, which is reflected in the ~2-sigma excess in the limit plots. The >= 3 b category seems to be the most sensitive, but you do not mention which category is predominant; this could be stated in the paper.
  
  - The paper seems to indicate the ttbb rescale uncertainty accounts for the undershoot. What is not clear is the large uncertainty in the rescale factor (line 121), for in the reference the ratio of ttbb over ttjj has an uncertainty < 15%. The authors should explain exactly what enters in the rescale factor uncertainty. Did the authors also check the proportion of other contributions, such as ttbj?
  
  - A Feynman diagram of the searched process would be nice.
  
  Abstract
  --------
  
  - Abstract L 3: "utilizes tt events": you cannot be sure that all these are tt events. -> "This analysis searches for events having ttbar pairs..."
  
  - It sounds as if the final state contains exactly four jets. Shouldn't it rather be "at least four jets" or "typically four jets"?
  
  - Last sentence: “These are … decay channel” -> Is it necessary to make this statement?
  
  Main text
  ---------
  
  L4: you cannot discover a parameter
  
  L5: "... many questions still remain unexplained within the scope of the SM, => such as ..." (please name two or more open questions).
  
  L7: how can a BSM model contain a Higgs doublet of the SM? (Remove "of the SM")
  
  L10: h(125) is identified as the light Higgs h0. Is the identification of h(125) as the heavy H0 excluded by now? Please refer to the corresponding searches.
  
  L12: all fermions w. t. s. charge are _typically_ required to couple to only one Higgs doublet (otherwise say "in X and Y models, all fermions w. t. s. c. are required ...")
  
  L13: since you refer to Branco et al in [17], it would be best to use the quite generally accepted model nomenclature from there: type-I, type-II, lepton-specific, flipped (not type-X and type-Y).
  
  L21f: Have these searches also been performed in the channel tt->Hb Wb? Maybe write: "Direct searches for the production of a light charged Higgs boson in the decay of a top quark pair"
  
  L25: Since you mention the relevance of flavour conservation in the previous paragraph, it would be good to state clearly whether H+ ->cb is flavor-violating. Maybe it is not and the final state is only due to CKM mixing.
  
  L25f: Are the limits on H->taunu mentioned in L22f also applicable to the search presented in this paper?
  
  L27f: This sentence considers quite a few steps, which makes it hard to follow. It could be clearer if this was split into multiple sentences which explain that
  a) ttbar events decay to two W bosons and 2 b-quarks.
  b) In this analysis one considers the W->ellnu +W->jets decay.
  c) In what follows this will be referred to as the lepton+jets channel.
  
  L27-36: Maybe change the order of this paragraph to make it easier to understand: First state that you perform the search in the semi-leptonic channel (missing in the introduction), then mention the dominant background processes, and afterwards explain how the SM top quark decays are reconstructed and what the signal is expected to look like instead.
  
  L30: Same as in abstract: ... of at least (or typically) four jets, where additional jets may be introduced through initial and final state radiation.
  
  L32ff: It sounds a bit odd, as if the H+ appears in the W mass spectrum. Maybe something like: "We fully reconstruct both top quarks and determine the invariant mass of the dijet system _that results from the hadronic decay of the W boson in the SM_". And then maybe: "A secondary resonance peak in the mass distribution of the dijet system could indicate the H+ to cb decay."
  
  L43: After such a long sentence it’s hard to remember that “The first uncertainty” that was quoted is the scale uncertainty. Perhaps “The uncertainties come from (description of scale uncert) and (description of pdf uncert)” would be better?
  
  L55-57: Which branching ratios are chosen for the decay of t->Hb in the signal? Please mention explicitly (if so) that only t -> Wb -> lep nu b and t -> Hb -> c b b is simulated.
  
  L100: Is |eta|<2.4 genuinely required or is that a typo? If it’s not a typo, is there a fundamental reason for not going up to |eta|<2.5?
  
  L102-105: Perhaps better to split this into multiple sentences, for someone not familiar with how relative isolation is calculated in CMS this would be quite hard to understand
  
  L106: Is the 2.5 in brackets a typo? Muons are not reconstructed beyond |eta| =2.4…
  
  L115: "SM ttbar" is what you simulated, but you may not talk of the data as if it was just SM ttbar.
  
  L115: Is this checked in a control region? Please add a sentence stating from where you get this confidence or remove the sentence.
  
  L121: I assume that the error is propagated and used as a systematic uncertainty. It would be nice to mention this here.
  
  L123: The background estimation method can be elaborated on. This is a very short paper already and the reader probably doesn’t want to look up in Ref 27 how the method was developed.
  
  L127: Is this an ABCD method using Iso_rel and MET for the normalisation? If yes, please say so.
  
  L127: The shape of the background distribution: It's not clear which distribution, please state explicitly.
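
As a point of reference for the ABCD question above, a generic ABCD estimate can be sketched as follows. This is only an illustration of the standard technique, assuming the two variables (here relative isolation and MET) are uncorrelated for the multijet background. The region labels, the function name, and the yields are hypothetical and are not taken from the analysis.

```python
import math


def abcd_estimate(n_b, n_c, n_d):
    """Estimate the background yield in signal region A.

    Hypothetical region layout (an assumption, not the paper's definition):
      A: isolated lepton, high MET (signal region)
      B: isolated lepton, low MET
      C: non-isolated lepton, high MET
      D: non-isolated lepton, low MET
    If isolation and MET are uncorrelated, N_A = N_B * N_C / N_D.
    Returns the estimate and its statistical uncertainty from
    simple Poisson error propagation on the three input yields.
    """
    if min(n_b, n_c, n_d) <= 0:
        raise ValueError("all control-region yields must be positive")
    estimate = n_b * n_c / n_d
    rel_unc = math.sqrt(1.0 / n_b + 1.0 / n_c + 1.0 / n_d)
    return estimate, estimate * rel_unc


# Made-up yields, for illustration only
est, unc = abcd_estimate(n_b=200, n_c=50, n_d=100)
print(f"N_A = {est:.1f} +- {unc:.1f}")
```

If the method in the paper differs (for example, if only the shape is taken from the non-isolated region), the text should say which of the two roles the control regions play.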
  
  L133: It sounds as if the fitter uses the mass values of t and W to do something. But doesn't it rather use these variables as proxies? I would turn the logic around and say something like: "The fitter varies the jet momenta within their uncertainties using the reconstructed t and W masses as constraints."
  
  L138: I would mention all jet corrections to the jets (that are applied before using the fitter) before describing the fitter.
  
  L141: Is the TS correction done to "train" the fitter? Otherwise, how can you work with parton level in simulation and with particle level in data? The reader needs more detail on what is actually done, it's not self-contained
  
  L142: Please state what quantities the TS correction depends on
  
  L148: What is non-clustered energy (NE)? Does it relate to MET? Please explain how it is calculated.
  
  L155: "...to quarks in the tt system, where the b quark daughters of the top quark are only assigned to b tagged jets."
  
  L156: The TS corrections seem a little out of place here. You mentioned them before. If there is a special reason to mention them again, please make that reason explicit.
  
  L160: What's the percentage of events with correct assignment on signal?
  
  L161: Remove "and the neutrino". (if it is recalculated, it is not directly returned by the fit).
  
  L162: It does not become clear why you need the neutrino momentum. Please mention what you use it for (e.g., if you do so, "The neutrino momentum is used to evaluate the performance of the fit."). If not used, you may remove it.
  
  L173-176: How is the reconstruction performed for signal samples with lower masses? (Please describe.) If you try all combinations for those, would you not also achieve an even higher efficiency by trying all combinations for the high mass points as well?
  
  L187: “And then corresponding changes are estimated for b/c-jets and light flavoured jets separately” -> it’s not entirely clear what is meant here… There are separate uncertainties for b/c-tagged jets and mis-tagged jets, but that is not necessarily clear to someone from the outside.
  
  L190-192: “The uncertainty on lepton scale factors are measured in data using a tag-and-probe technique…” -> Why describe this in so much detail when the derivation of the scale factors themselves (also using T&P) is not covered? Same comment for electron SFs.
  
  L192-193: I believe this statement is inaccurate - this is a variation of the minimum bias cross section used for generating the data pileup distribution, and then used for reweighting. The simulated samples are not actually varied themselves.
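
To make the distinction in the L192-193 comment concrete, a minimal sketch of pileup reweighting is given here. It assumes binned pileup distributions stored as plain lists; the function names and all numbers are hypothetical and not taken from the analysis. The simulated distribution stays fixed, and only the data-derived target distribution, built with a nominal or shifted minimum bias cross section, changes between the nominal weights and the systematic variation.

```python
def normalise(hist):
    """Scale a list of bin contents to unit area."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive total content")
    return [x / total for x in hist]


def pileup_weights(data_hist, mc_hist):
    """Per-bin weights that reshape the simulated pileup profile to data.

    Bins that are empty in simulation get weight 0, since no simulated
    event can populate them.
    """
    if len(data_hist) != len(mc_hist):
        raise ValueError("histograms must have the same binning")
    data = normalise(data_hist)
    mc = normalise(mc_hist)
    return [d / m if m > 0 else 0.0 for d, m in zip(data, mc)]


# Made-up three-bin profiles, for illustration only
mc_profile = [10, 20, 10]      # fixed: the simulation is not varied
data_nominal = [5, 30, 5]      # target built with the nominal cross section
data_shifted = [4, 28, 8]      # target built with a shifted cross section

nominal = pileup_weights(data_nominal, mc_profile)
shifted = pileup_weights(data_shifted, mc_profile)
print(nominal, shifted)
```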
  
  L194-196: Several comments. Firstly, this does not actually explain how the uncertainty is calculated. Is the difference taken as the uncertainty, or something else? Furthermore, what does it mean if the non-isolated region is 0.2<Isorel<0.3 and 0.12(mu)/0.1(e)<Isorel<0.3? Some information has to be missing here.
  
  L204: "Various correction and scale factors" => please list them, as it is interesting to know what is not present for the signal MC in the next line. (You can also point to table 2.)
  
  L209-210: Drop the sentence “Using this reweighting method … are correctly estimated” It is redundant as it doesn’t prove that the method is correct, and anyway, if it was deemed incorrect the paper probably wouldn’t be in CWR
  
  L210-212: “Since PYTHIA is used … MadGraph is considered” How is this taken into account?
  
  L214: Which mass range?
  
  L216: Perhaps best to drop ‘Therefore’, otherwise the paper might be accused of flip-flopping.
  
  Tables, Figures, Equation
  -------------------------
  
  Table 1:
  - pm 1 ranges from pm 0.5 to pm 1.49, therefore please use two significant digits.
  
  - Given that in the fit only the ttbar and non-ttbar background are considered (presumably everything else), how are the uncertainties on the individual process yields determined?
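
The point about significant digits in the first Table 1 comment can be illustrated with a short sketch. The helper below is hypothetical (it is not part of any CMS tool) and simply rounds a value and its uncertainty to the decimal place implied by a chosen number of significant digits in the uncertainty.

```python
import math


def round_to_sig(value, uncertainty, sig=2):
    """Round value and uncertainty to `sig` significant digits of the uncertainty.

    Note: if rounding pushes the uncertainty up to the next power of ten
    (e.g. 0.0996 -> 0.10), one more digit than requested is effectively kept.
    """
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    exponent = math.floor(math.log10(uncertainty))  # position of leading digit
    decimals = sig - 1 - exponent                   # decimal places to keep
    return round(value, decimals), round(uncertainty, decimals)


# Made-up yield, for illustration only: with two digits the uncertainty
# reads 1.5 rather than an ambiguous 1.
print(round_to_sig(12.3456, 1.49))
```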
  
  Figure 1:
  - Why do you get a second peak in the ttbar background for events with at least three b-tags? Why do the distributions for the ttbar background in the middle and bottom row differ?
  
  - There are no mjj plots for the 2 b-tag category with mH+ >= 130 GeV. Is this because the category is not considered for events with mH+>=130 GeV, or is it just that the same kinematic fit mass solution is used in this regime? This should perhaps be clearer from the text.
  
  - Are the distributions plotted post-fit?
  
  - Please add ratio or pull plots.
  
  - The caption mentions that in the top row the signal for M_H+ = 110 GeV is shown, whereas in the top right figure the legend shows H(140).
  
  - The second and third row have the same labels and just a different signal denoted. Please add the mass range explicitly to the label. (The plots should have labels stating what region is plotted.)
  
  Table 2:
  - Still assuming that backgrounds like single top and W/Z+jets are encapsulated in the non-t#bar{t} background:
  a) why are there no dedicated uncertainties for these processes? (e.g. cross section)
  b) Why was the decision made to combine multijet background with all these other backgrounds as a single process in the fit?
  c) How can it be justified to, in effect, apply the dedicated multijet estimation uncertainty to these other processes?
  
  Figure 2:
  - Is it really necessary to show the median expected with statistical uncertainties only? Of course the limit will improve when removing systematic uncertainties, but the fact that it improves is not in itself enough to state that the analysis is systematically limited. In any case to make statements about how much the sensitivity is affected by systematic uncertainties one would have to compare pre-fit expected limits with systematics vs without systematics.
  
  - It would be nice to see also 2HDM model parameter limits
  
  Type A comments
  ===============
  
  General
  -------
  
  Abstract
  --------
  
  - the data --> data
  
  - L 2: using data
  
  - L 5: The charged Higgs boson
  
  - "A search for a charged Higgs boson decaying to a charm and a bottom quark is performed in pp collisions..." (please remove "a pair")
  - Suggest to rephrase: "a pair of charm and bottom quarks" -> "a charm quark and a bottom quark"
  
  - "As the hypothetical charged Higgs boson is assumed to decay in the cb channel, the targeted final state contains at least four jets, ..."
  
  - The positioning of (H+->cb) in the sentence is a bit odd; is it a requirement that the decay is specified in the abstract or can one assume that the reader understands that when H+->cb is used later it refers to charged Higgs decaying to cb? The t->H+b decay is also not explicitly defined, so by the same logic it should be ok to drop ‘(H+->cb)’ from the sentence.
  
  - Suggest to reorder the first sentence ‘in pp collisions … 19.7 /invfb’ -> ‘using pp collision data recorded at sqrt(s) = 8 TeV corresponding to an integrated luminosity of 19.7 /fb’ or similar.
  
  - Replace "The search utilizes…" by "The analysis searches for” or similar? (strictly speaking the sentence currently reads as if t->H+b is a valid, existent decay.)
  
  Main text
  ---------
  
  line 1: Suggest to rephrase: The Higgs boson was discovered with a mass of 125 GeV -> A Higgs boson with a mass of around 125 GeV was discovered (Note explicitly: A Higgs boson, not ‘The’ Higgs boson, considering a search for additional Higgs bosons is presented here!)
  
  line 2/3 : change to "Its measured properties have been found to be in good agreement with the Standard Model prediction and the measurements support the model of spontaneous symmetry breaking."
  
  line 4: Suggest to rephrase: “Although the final missing … within the scope of the SM” -> Double use of SM in the same sentence so perhaps something like “Although the final missing parameter of the SM has been discovered, the model is incomplete and many questions still remain unanswered”
  
  line 5: "Questions remain unexplained" -> "Questions remain unanswered"
  
  line 9: "the Higgs boson discovered at the LHC can be regarded as one of the neutral CP-even bosons" -> would it be better to replace ‘can be’ by ‘is’?
  
  line 12: "to only one Higgs doublet" --> "one of the two Higgs doublets"
  
  line 16-17: Why not just state “In this paper we present a search for charged Higgs bosons (H^(+/-))”? Or would this make the decay strings too involved?
  
  line 17: "decays are also considered" --> "its decays are also considered"
  
  line 17-18: The way this sentence is written implies that the fact that the mass of the charged Higgs boson is unknown is to some extent contradictory to the large coupling of H+ to the top quark. Perhaps better would be: “In the 2HDM the mass of the charged Higgs boson is a free parameter. Regardless of its mass the H+ is expected to have a large coupling to the top quark.”
  
  L19 "smaller" than the top quark mass
  
  line 19 - 20: try to avoid comments in brackets. Would it be possible to state "If mh+ is less than the top quark mass, the top quark can decay to a charged Higgs boson and a b quark. This is the light charged Higgs boson scenario” or “In the light charged Higgs boson scenario, when mh+ is less than the top quark mass, the top quark can decay to a charged Higgs boson and a b quark”
  
  L27 "In the SM, (remove "the") top quarks decay _predominantly_ into a W boson and a b quark."
  
  L29 pp' => a pair of quarks
  
  L35 Other backgrounds are W/Z+jets, di-boson, and tt production associated with a W, Z or H boson.
  
  line 38: Suggest to rephrase “The top quark mass of 172.5 GeV is used for the simulation samples” -> “The top quark mass is set to 172.5 GeV for simulating these samples”
  
  line 47: “The ttbar simulation events” -> “The simulated ttbar events”
  
  line 47: “To match the observed pT distribution in data” -> “To match the pT distribution observed in collision data”
  
  line 54: “cross section given in [54]” -> “cross section given in Ref. [54]” ?
  
  line 55: Repetition of ‘the’ in the sentence, how about dropping the first “The”? -> “Charged Higgs boson signal events are ….”
  
  line 56-57: This can be simplified as: “The samples are normalised to the SM ttbar cross section” (the cross sections are not normalised to another cross section)
  
  L62 minimum bias and minbias are jargon, please replace them with "total inelastic cross section"
  
  L84 remove parenthesis: "Jets are reconstructed using particle candidates (identified by the PF algorithm)" -> "Jets are reconstructed using PF candidates"
  
  L85 _The_ jet momentum ...
  
  line 99: "only" --> "exactly"
  
  line 99: "one electron (or muon)": drop the ‘or’: “one electron (muon)”
  
  L100 "Leptons" Did you say somewhere that Lepton only refers to electrons and muons in this article?
  
  L101 Leptons must be isolated, satisfying an isolation criterion relative to their momentum, ...
  
  L102 Iso_{rel} => I_{rel}
  
  L102 Remove "The" at the beginning of the sentence.
  
  L105 Events with additional electrons (muons), satisfying ..., are discarded.
  
  L106 Using the parenthesis notation seems a bit awkward if only the pt requirement differs. Maybe something like "additional lepton, with pt > 20 for electrons and pt>10 for muons, abseta < 2.5, and ..."
  
  line 107: It looks like there is a space missing between ‘pTmiss' and ‘is’
  
  line 110: Remove quotation marks around b jets, the expression is already in brackets. Perhaps could just write “Selected jets are considered b-tagged if they satisfy the medium working point of this algorithm”
  
  line 112: drop hyphen from light-quark?
  
  line 112-113: "Events with two or more tagged b jets are selected" -> "Events with two or more b-tagged jets are selected" (a tagged b jet is a genuine b jet which also happens to be b-tagged…)
  
  L 120: *to* be?
  
  line 120-121: “..by requiring at least one additional jet be from …” -> “by requiring at least one additional jet to be from…”
  
  line 120-121: "one additional jet be from a b quark" --> "one additional jet to be from a b quark", "then rescaling" --> "and is rescaled"
  
  line 187: "then"--> "the"
  
  L164 "... directly emerging from the top quark."
  
  line 176: ‘based on MC studies’ -> ‘based on studies of simulated events’
  
  L177 "as well as" => "and may"
  
  L178 from _a_ top quark decay
  
  line 178-179: “originates from top quark decay” -> “originates from a top quark decay”
  
  line 186: “By varying the correction factors by pm 1 sigma” -> “By varying the correction factors within the uncertainties”
  (same comment elsewhere in this paragraph)
  
  line 191: "uncertainty on" --> "uncertainty in"
  
  L192 move the reference behind "electrons".
  
  line 193: "minbias" --> "minimum bias"
  
  line 193-194: Suggest to rephrase, e.g.: “The estimation of the multijet background depends on the non-isolated region selection”
  
  L194 "The uncertainty is calculated using a dedicated control region with ..."
  
  L195 Write out (muon) and (electron)
  
  line 198: MC is jargon; “to match the simulation to data” would be better.
  
  line 203: MC samples -> simulated samples
  
  line 206: in the signal MC -> in simulated signal events
  
  line 221: "to illustrate the analysis" --> "to illustrate that the analysis"
  
  L218,233 you've got B(xy) = 1 and B(yz) = 1.0 (trailing ".0"). In the figure 2 caption you have the trailing .0 twice. Please make it consistent.
  
  line 226: "the data" --> data
  
  Tables, Figures, Equation
  -------------------------
  
  Table 1:
  - align the second pm's as well
  
  Figure 1:
  - The expected signal line in the legend overlaps with the tick marks.
  
  - Check that the conventions are used for the figures; data points probably should not be marked ‘DATA’ in all caps. Would it be possible to use t#bar{t} instead of ttbar? For non-tt you should then also use t#bar{t}; at the moment this is inconsistent.
  
  - No line is shown in the uncertainty band plotted in the figure, so it probably also shouldn’t be present in the legend.
  
  - ‘Stat+Syst’ is not a complete description of the uncertainty band, it should probably say something like “Background uncertainty”.
  
  - Finally, having the legend split on the left and right side of the figure is confusing. If the legend doesn’t fit in the uncovered region of the figure it probably would be better to put the legend outside of the figure panel.
  
  Table 2:
  - event yields for _the_ charged Higgs signal
  
  - "Uncertainty sources marked with an asterisk use shape systematic templates in the fit." => "Uncertainties on the shape of the templates are marked with an asterisk."
  
  Equation 1:
  - use a linebreak before every "+" and align them on top of each other
