--- DESY Institutional Review of TOP-18-008 ---

We would like to congratulate the authors on this important result and well-written paper.

Below you can find the Institutional Review of TOP-18-008 from the DESY group.

No major concerns about the analysis or the paper emerged from our review; we compiled a number of questions and suggestions, which we hope will help to improve the paper even further.

Best regards,
M. Missiroli for the DESY-CMS group
(thanks to P. Asmuss, A. de Wit, J. Keaveney, J. Knolle, I. Melzer-Pellmann, G. Van Onsem and R. Walsh for providing comments)

-----------------------------------------------

======================================
  Type A
======================================

Abstract L8: 'tZq signal significance' -> 'significance of the tZq signal'

L3: 'The number [..] facilitate' -> 'The number [..] facilitates'

L4: '[..] is such a process.' -> 'One such process is [..].'

L8: 'might affect the tZq process differently than they would top quark pair production' sounds strange. Suggest 'might affect the tZq process differently than top quark pair production'

L22: analysis strategy lead[s] to

L32: present -> presented

L35-36: Two sets of MC samples [are used] (instead of end of sentence)

L36 'data taking' -> 'data-taking'

L44: 'Simulation [..] are performed' -> 'The simulation [..] is performed'

L49: ', pileup,' -> suggest to use parentheses '(pileup)', or state it more clearly, e.g. ', referred to as pileup,'

L78: the above [described/mentioned] jet finding algorithm

L87: 'prerequisite' is not necessary in this sentence.

L94: 'both ... and' instead of 'either .. or'?

L114-115: Start final sentence of paragraph with 'Thus' (currently, it is kind of disconnected from the aforementioned achievements of the BDT)

L136: remove comma before verb

L139: 'the tZq and background events' -> 'tZq and background events', or 'tZq production and background processes'

L144: 'remaining jet in the event with highest pt' -> 'remaining jet with highest pt in the event'

L148: |eta| -> the |eta| (in L146 you use 'the |eta| of the recoiling jet')

L156: maximum-likelihood fit -> maximum likelihood fit

L237: 'in the uncertainties' -> 'in uncertainties'

L249: maximum-likelihood fit -> maximum likelihood fit

L257 'data taking' -> 'data-taking'

L258: sigmaSM should be sigma^SM*BR^SM, and similar elsewhere where this is used.

L259: there is an extra space after '94.2'

L266: drop ':' before equation, no reason to interrupt the sentence

Figure 1: in the labels, gamma and mu should be bold as is the rest of the text

Figure 1: don't double 'signal regions': either 'in each of the signal regions, SR-2/3j-1b' or 'BDT distributions for events in SR-2/3j-1b'

Table 1: increase the size of the table; if it helps, consider removing the code-name of each category in the headers, e.g. '(SR-2b)'.

Figure A.1 (right plot): use small p for pT. Also don't double 'in SR-2/3j-1b events'. Could be avoided by e.g. 'recoiling jet (middle) as well as the pT of the Z boson [..] values in excess of 0.5.'

Figure A.1, Caption L3: '|eta| of the recoiling jet' -> 'the |eta| of the recoiling jet'

Figure A.1, Caption L3: 'pane' -> is this intended? consider using a more conventional term, e.g. 'plot' or 'distribution'.

======================================
  Type B
======================================

Abstract, L1: the first line of the abstract reads a bit odd. Why not 'A search for [..] is presented' or 'The observation [..] is reported'?

Abstract, L1: tZq is only described as tZ ('top quark in association with a Z boson (tZq)') -> suggest to add 'and an additional parton (tZq)'.

Abstract, L3: 'multiple jets' is quite vague -> suggest to use the minimum number of jets actually used, e.g. 'at least two jets'.

Abstract, L7: 'previous measurements of the tZq cross section' -> wouldn't it be better to say here 'previous searches for tZq production', given that this one is the paper which establishes the observation?

Abstract, L7: Possibly swap order of significance and cross section measurement? The statistical significance is inherently tied to the measurement, it doesn't exist without it.

L1-13: it would be best to add, maybe in L5, a short sentence clarifying to the reader the typical topology of a tZq signal event (most notably, the presence of a forward jet), given that it's not possible to add Feynman diagrams in this paper format.

L13-14: it sounds odd to go from '...is an important quantity to measure' to 'This Letter presents the observation ...' (rather than presenting a measurement of ...)

L21-24, Comment 1: 'redesigned analysis strategy' (also used in the abstract) reads a bit too vague in the sentence, leaving the reader wondering what exactly are all the differences with respect to previous analyses (and a detailed list of such differences obviously can't fit in this letter). We suggest the following rephrasing: 'Compared to previous publications, the analysis sensitivity is significantly increased by improvements to the lepton identification techniques and to the analysis strategy. The sensitivity is further improved by more than doubling the integrated luminosity with the addition of 2017 data.'

L21-24, Comment 2: as far as we understand, the increase in luminosity brings a smaller improvement compared to the improvements to lepton-ID and analysis strategy. If not, things should be rephrased so that the 3 effects are on equal footing.

L21-24, Comment 3: since this is still the introduction, we suggest to use the term 'sensitivity' instead of already citing 'results'.

L43-44: 'with the precision matching that used in the sample generation' is not very clear. Suggest something like 'with the perturbative order in QCD matching that used in the sample generation'.

L61: reading the rest of the paper and looking at the AN, it seems jet energy resolution corrections (in MC) are not applied in the analysis. We understand the standard JetMET prescription is to apply them, so why was this not needed for this analysis?

L66: where does the forward jet come from? The additional parton?

L68: only requirements for 2017 are listed. Have they been the same in 2016?

L70: was DeepCSV actually used in the 2016 dataset?

L77: consider adding a reference for the PV reconstruction, e.g. JINST 9 (2014) P10009, as done for the other physics objects.

L78: 'clustered using the above jet finding algorithm with the tracks assigned to the PV as inputs' -> suggest to improve readability, for example with the following rephrasing 'obtained by clustering the tracks assigned to the PV with the same jet finding algorithm mentioned above'

L91: 'which takes into account the increased particle collimation at high pT values' -> suggest to state a bit more clearly why mini-isolation works better here, for example rephrasing to 'this pT dependence improves the efficiency for the identification of leptons originating from high-pT top quarks'.
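For illustration only, the pT dependence referred to in the L91 comment can be sketched as an isolation cone that shrinks with the lepton pT. This is a minimal sketch, not the analysis code: the function name is ours, and the default values (cone radius between 0.05 and 0.2, scale of 10 GeV) are a commonly quoted mini-isolation parameterization that we assume here; the paper may use different values.

```python
def mini_isolation_cone(pt_gev, r_max=0.2, r_min=0.05, kt_scale_gev=10.0):
    """Return a pT-dependent isolation cone radius.

    The radius is kt_scale_gev / pT, clamped to [r_min, r_max].
    All default values are illustrative assumptions, not values
    taken from TOP-18-008.
    """
    if pt_gev <= 0:
        raise ValueError("pT must be positive")
    # Higher lepton pT means more collimated decay products,
    # so a smaller cone is used.
    return max(r_min, min(r_max, kt_scale_gev / pt_gev))


# Low-pT, intermediate-pT and high-pT leptons get progressively smaller cones.
for pt in (25.0, 100.0, 400.0):
    print(pt, mini_isolation_cone(pt))
```

A smaller cone at high pT keeps nearby decay products of a boosted top quark out of the isolation sum, which is the efficiency gain the suggested rephrasing points to.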

L106, Comment 1: regarding the cross-check with Keras: is one of these performing better than the other (BDT/neural network)? If so, by how much? Or is there any reason to use the BDT for the analysis and the neural network as a check and not vice versa?

L106, Comment 2: for a concise letter like this, it shouldn't be necessary to mention this cross-check anyway, given that the neural net ultimately isn't used in the analysis.

L108: These electron and muon identification BDTs are not the standard ones also used in other CMS analyses, correct? If they are, it would be good to mention this.

L108: 'stringent' seems poorly defined here. Was this cut chosen to optimise some figure of merit? Better to just state that.

L111: 'cutoff-based' -> 'cut-based'

L117: 'The combination of tight leptons and [...] 'loose leptons'' -> we think it should be made clearer that, as far as we understand, tight leptons don't necessarily pass the looser selection ('loose selection criteria on the attributes [..]'), and what you define as 'loose leptons' is the OR of tight leptons and the other looser leptons; explaining this in one sentence (as currently done) makes the sentence hard to follow, we suggest to split this explanation into two shorter sentences.
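To make our reading of the L117 definition explicit, here is a minimal sketch of the logic as we understand it: a 'loose lepton' is the OR of the tight leptons and the leptons passing the looser criteria. The class, field and function names are hypothetical and only serve to illustrate the definition; they are not taken from the analysis.

```python
from dataclasses import dataclass


@dataclass
class Lepton:
    passes_tight: bool       # hypothetical flag: passes the tight (BDT-based) selection
    passes_loose_cuts: bool  # hypothetical flag: passes the looser attribute-based criteria


def is_loose(lepton):
    """A 'loose lepton' is either a tight lepton or one passing the looser criteria."""
    return lepton.passes_tight or lepton.passes_loose_cuts


# A tight lepton that fails the looser criteria still counts as loose.
leptons = [Lepton(True, False), Lepton(False, True), Lepton(False, False)]
n_tight = sum(lep.passes_tight for lep in leptons)
n_loose = sum(is_loose(lep) for lep in leptons)
print(n_tight, n_loose)
```

If this reading is correct, stating the OR explicitly in a separate sentence would remove the ambiguity.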

L118: 'criteria on' -> 'criteria based on'

L123-126, Comment 1: 'Events must contain exactly three loose leptons'. It is unclear which sample is used to estimate the non-prompt background: is it 'three loose' or '2 tight and 1 loose'? (L127 seems to conflict with L199; if we understand correctly, you could add 'at least one' in L127)

L123-126, Comment 2: we suggest rephrasing this in a way that clarifies the selection for the signal regions first, e.g. 'exactly 3 tight leptons + 4th loose veto', and then whatever is the larger sample used to measure/validate the backgrounds, i.e. 'three loose leptons'.

L123-126, Comment 3: in this description, please specify where the veto on the 4th loose lepton applies (SRs? SRs and all CRs? SRs and some CRs?).

L125: 'world-average Z boson mass' - is really 91.1876 GeV used, or rather just 91 GeV? In the latter case, I would rather drop the 'world-average'.

L133: the purpose of having SR-2/3j-1b is briefly contextualized ('contains most tZq events'), but the same isn't done for SR-4j-1b and SR-2b and one is left wondering how these are exploited. A brief comment, as done for SR-2/3j-1b, would be useful.

L137-157: in the description of the BDT inputs, there is no mention of how/whether the BDT inputs were validated (for example, was there a requirement on good data/MC agreement for each of the BDT inputs). We suggest adding one short sentence to cover this. You could also consider using the latter sentence to introduce Figure A.1.

L139, Comment 1: The sentence starting 'Simulated...' is a little imprecise. What fraction of signal events satisfy the criteria? Can this be made more quantitative?

L139, Comment 2: suggest to improve readability as follows 'In a large fraction [see L139, Comment 1] of simulated signal events, at least one jet has a high [...]'

L141, Comment 1: 'Therefore' sounds as if 'assigning one b-jet to the top decay' is a direct consequence of 'there is a high-eta jet'. A combining connector, instead of a causal one, works better here.

L141, Comment 2: 'having' -> 'with'

L143: this is where a causal connector fits, e.g. 'Thus, the remaining jet in the event [..]'

L158: 'The background contributions to the SRs for the three event categories' -> simplify to 'The background contributions to the three SRs' (if that's not what you mean, please clarify what you mean in this sentence)

L166: would be good to add to this sentence an explanation why the MET > 50 GeV cut is done (Suppressing QCD? Reduces other background contamination?)

L169, Comment 1: 'the shape and normalisation are implicitly constrained in the final fit via the bins at low BDT values...' -> are there any unconstrained parameters in the fit that account for this? If this is not the case the wording is misleading as it implies the fit is given the full power to constrain shape and norm of this process in these bins, whereas if there are only nuisance parameters with prior constraints applied this isn't actually true. If there are unconstrained parameters that take care of this, then the constraint isn't implicit, right?

L169, Comment 2: as a general comment, it remains unclear from the paper what are the prior uncertainties, if any, on the cross-sections of the background taken from MC (WZ, ttZ, Xgamma). We understand there are different control samples used to either constrain or validate these backgrounds, but it remains unclear what are the pre-fit errors on the normalization of WZ, ttZ and Xgamma in the final fit?

L173: we suggest to invert the logic of this sentence, mentioning first the prior uncertainties and then the lack of constraint of these minor backgrounds in the final fit, e.g. 'These processes are normalized to their predicted cross sections, accounting for theoretical uncertainties, and they are not significantly constrained by data in this analysis.'

L181: 'Internal and external conversions' reads a bit vague, could it be rephrased, or made clearer to the reader what the distinction is?

L186-187: If the event is Z+gamma, with say gamma->e+e- and one of the leptons not properly reconstructed, shouldn't the mass of the dilepton system still be close to the Z mass? 

L203: Seems unnecessary to mention again the performance of the BDT-based lepton ID. One could suppress 'Owing to the performant BDT-based lepton selection used in this search...'

L214: This is the first time a range of values (0.7--5.0%) is given, and since it is not clear what the range refers to, one can mention here something like: '0.7--5.0% across the BDT bins'

L217: 'moving [..] by its estimated uncertainty' -> 'varying [..] within its uncertainty'

L217: Having looked at the additional documentation it appears that a single jet energy scale nuisance parameter is used for each dataset, which is constrained to around 50% of the input value in the case of 2017 data. Given that this nuisance parameter has a relatively large impact on the signal strength and probably would have been the systematic uncertainty with the largest impact if it had not been constrained, is there a justification for this constraint? Analyses which constrain the JEC uncertainty are often recommended to split the uncertainty into sources, the question is why this was not done in this case. Note: this is explicitly *not* a suggestion to change the approach that was used at this point, as it is highly unlikely that it would significantly change the results of the analysis.

L218: remove 'subsequently'

L219: 'propagated to all analysis variables, including the event categorization and selection' doesn't make sense as 'event categorization and selection' are not 'analysis variables', as is implied by 'including'. Perhaps you mean something like 'stages of the analysis' or 'selection criteria'. Consider just dropping 'including the event categorization and selection'.

L230: Using 'limited knowledge' here jars a little as it implies there is something statistically different about these uncertainties. Perhaps something like 'uncertainties arising from PDFs', as you write for the scale, would be better.

L233-235: see 'L169, Comment 2'; we think we understand what you mean, but the post-fit uncertainties on the associated nuisances contribute something to the final total uncertainty, right? Better to just quote what the pre-fit variations are and say that they are all constrained in the fit so these theory uncertainties are not fully manifested in the final result. If it is the case that these backgrounds are totally free, you should state that.

L241: Why is the ISR/FSR uncertainty not applied to other simulated processes?

L244, Comment 1: we assume Figures 1 (and A.1) contain the *post-fit* distributions; if so, this should be made clear, e.g. specifying 'post-fit' here where Fig.1 is introduced (or in the captions).

L244, Comment 2: if the distributions are indeed post-fit, it might be best to introduce them *after* the fit model is outlined.

L249-250: '[..] fit... to the normalisations of the WZ and ZZ backgrounds, as determined from the control samples' - Is what is meant here that the CRs are fitted simultaneously with the three signal regions? If so, this should be made clearer.

L255: from this sentence, we understand the lepton-ID efficiency (SF) uncertainties for 2016 and 2017 are fully correlated (unlike other experimental uncertainties like b-tagging, trigger and JES). Why is that the case? Are muon and electron uncertainties kept uncorrelated?

L263-264 'are found to be consistent with the combined measurement': we expect this because we've just combined the two measurements. Something really odd would have to happen for the two individual measurements not to be compatible with the combined. The more interesting point to make is whether the 2016 and 2017 measurements individually are compatible with the standard model expectation (2016 not within 1 sigma from what I can tell).
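The compatibility check we have in mind for L263-264 can be sketched as a simple pull of each measured signal strength with respect to the standard model value of 1. This is only an illustration: the function name is ours, the numbers in the example are placeholders and not the measured values, and the use of the uncertainty on the side facing the SM value is our assumption for handling asymmetric errors.

```python
def pull_from_sm(mu, sigma_up, sigma_down, mu_sm=1.0):
    """Return (mu - mu_sm) in units of the uncertainty facing the SM value.

    For mu above mu_sm the downward uncertainty is used, otherwise the
    upward one (a simple convention assumed here for asymmetric errors).
    """
    sigma = sigma_down if mu > mu_sm else sigma_up
    if sigma <= 0:
        raise ValueError("uncertainty must be positive")
    return (mu - mu_sm) / sigma


# Placeholder inputs, NOT the measured values: mu = 1.5 +0.30 -0.25.
print(pull_from_sm(1.5, 0.30, 0.25))
```

Quoting such a number for 2016 and 2017 separately would be more informative than the consistency with the combined result.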

L263: '[..] strengths measured [..] separately [..]' -> '[..] strengths measured separately [..]'

L269, Comment 1: It is relevant that the asymptotic approximation was used to obtain the significances, as it is known not to be a good approximation anymore in this 'high significance' regime. It would be good to make clear in other places where the significances are quoted that they are calculated using the asymptotic approximation.

L269, Comment 2: Since it is not the test statistic that is approximated, but rather the distribution of the test statistic, suggest to write (assuming you use the profile likelihood test statistic): '(...) using the asymptotic approximation of the distribution of the profile likelihood test statistic [42,43], and found to be (...)'
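To illustrate the distinction made in 'L269, Comment 2', here is a minimal sketch of how a significance follows from the asymptotic approximation. It assumes the profile likelihood discovery test statistic q0, whose distribution under the background-only hypothesis is approximated by a half chi-square with one degree of freedom, so that Z = sqrt(q0). The function name and the example value of q0 are ours and are not taken from the analysis.

```python
import math


def asymptotic_significance(q0):
    """Return (Z, p-value) for an observed discovery test statistic q0.

    Assumes the asymptotic regime: the test statistic itself is exact,
    only its sampling distribution is approximated, giving Z = sqrt(q0).
    """
    if q0 < 0:
        raise ValueError("q0 must be non-negative")
    z = math.sqrt(q0)
    # One-sided Gaussian tail probability beyond z standard deviations.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value


# Hypothetical input: q0 = 25 corresponds to Z = 5.
z, p = asymptotic_significance(25.0)
print(f"Z = {z:.2f}, p = {p:.2e}")
```

The approximation enters only through the mapping from q0 to a p-value, which is why the wording should refer to the distribution of the test statistic.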

L274: see comment 'Abstract L1': tZq should not only be described as tZ

L277: you should quote the fiducial m_ll requirement again here 

Figure 1, right (SR-2b): What is happening in the one bin with the very large statistical uncertainty?