====================================== Type A ======================================

Line 87: "prerequisite" is not necessary in this sentence.

Line 258: sigmaSM should be sigma^SM*BR^SM, and similarly elsewhere where this notation is used.

====================================== Type B ======================================

Abstract, Line 1: The first line of the abstract reads a bit oddly. Why not "A search for ... is presented"?

Abstract, Line 7: Possibly swap the order of the significance and the cross section measurement? The statistical significance is inherently tied to the measurement; it doesn't exist without it.

Lines 5-7: As a non-top expert I don't understand what is meant by this line. I assume ttZ and tbW both count as tZq, or is the q explicitly light? And what does the triple-gauge coupling bring into the tZq process? If this line can be clarified at all, it would help; if it's immediately obvious to top physics experts, ignore the comment.

Lines 13-14: It sounds odd to go from "...is an important quantity to measure" to "This Letter presents the observation ..." (rather than presenting a measurement of ...).

Line 70: Was DeepCSV actually used for the 2016 dataset?

Line 108: These electron and muon identification BDTs are not the standard ones also used in other CMS analyses, correct? If they are, it would be good to mention this.

Lines 117-120: The sentence implies that the tight leptons do not also have to pass the loose selection criteria. Is that correct? In addition, this is quite a long sentence; it might read better if split in two.

Line 169: "the shape and normalisation are implicitly constrained in the final fit via the bins at low BDT values...": are there any unconstrained parameters in the fit that account for this? If not, the wording is misleading, as it implies the fit is given the full power to constrain the shape and normalisation of this process in these bins, whereas with only prior-constrained nuisance parameters this is not actually true. And if there are unconstrained parameters that take care of this, then the constraint isn't implicit, right?
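To be concrete about the distinction, a schematic binned likelihood (the labels are generic placeholders, not the actual fit model of the analysis):

    L(\mu, r_b, \theta) = \prod_i \mathrm{Pois}\big( n_i \,\big|\, \mu\, s_i(\theta) + r_b\, b_i(\theta) \big) \times \prod_j \mathcal{G}\big( \tilde{\theta}_j \,\big|\, \theta_j, 1 \big)

If the background normalisation enters through a freely floating rate parameter r_b, the low-BDT bins really do determine it, and the constraint is explicit rather than implicit; if it enters only through the prior-constrained nuisance parameters \theta_j, the data can pull it only within the priors. It should be made clear which of the two is the case.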
Lines 186-187: If the event is Z+gamma, with, say, gamma -> e+e- and one of the leptons not properly reconstructed, shouldn't the mass of the dilepton system still be close to the Z mass?

Line 217: Having looked at the additional documentation, it appears that a single jet energy scale nuisance parameter is used for each dataset, and that it is constrained to around 50% of its input value in the case of the 2017 data. Given that this nuisance parameter has a relatively large impact on the signal strength, and would probably have been the systematic uncertainty with the largest impact had it not been constrained, is there a justification for this constraint? Analyses that constrain the JEC uncertainty are usually recommended to split it into sources; the question is why this was not done here (a schematic illustration of the splitting is appended at the end of these comments). Note: this is explicitly *not* a suggestion to change the approach at this point, as doing so is highly unlikely to significantly change the results of the analysis.

Line 241: Why is the ISR/FSR uncertainty not applied to the other simulated processes?

Line 250: "... as determined from the control samples": does this mean that the control regions are fitted simultaneously with the three signal regions? If so, this could perhaps be clarified.

Lines 263-264: "are found to be consistent with the combined measurement": we expect this, because we have just combined the two measurements. Something really odd would have to happen for the two individual measurements not to be compatible with the combined one. The more interesting point to make is whether the 2016 and 2017 measurements are individually compatible with the standard model expectation (the 2016 one is not within 1 sigma, from what I can tell); a schematic way to quantify this is given below.

Line 269: It is relevant that the asymptotic approximation was used to obtain the significances, as it is known not to be a good approximation anymore in this 'high significance' regime. It would be good to make clear, in the other places where the significances are quoted, that they are calculated using the asymptotic approximation (a numerical illustration of the point is given below).

Figure 1, SR-2b: What is happening in the one bin with the very large statistical uncertainty?
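Re Line 217, a schematic version of the source-splitting suggestion (generic labels, not the actual list of JEC uncertainty sources):

    n_i^{exp}(\theta_1, ..., \theta_K) = n_i^{nom} \Big( 1 + \sum_{k=1}^{K} \theta_k\, \delta_i^{(k)} \Big),  with independent \theta_k ~ \mathcal{G}(0, 1),

where \delta_i^{(k)} is the relative yield shift in bin i due to source k. With independent sources the fit constrains only the components the data are actually sensitive to, instead of a single coherent parameter whose post-fit constraint is then applied to every jet energy scale effect at once.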
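Re Lines 263-264, the per-year compatibility with the SM could be quantified simply as

    z_y = ( \hat{\mu}_y - 1 ) / \sigma_y,

with \hat{\mu}_y and \sigma_y the best-fit signal strength and its uncertainty for year y, i.e. the pull of each individual measurement with respect to \mu = 1 rather than with respect to the combination.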
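Re Line 269, a minimal numerical illustration (a single-bin toy counting experiment with invented yields, not the statistical model of the analysis):

    # Schematic only: compare the asymptotic discovery significance
    # Z = sqrt(q0) (Cowan, Cranmer, Gross, Vitells, arXiv:1007.1727)
    # with a toy-based estimate, for invented signal/background yields.
    import numpy as np
    from scipy.stats import norm

    s, b = 30.0, 100.0           # hypothetical yields
    n_obs = s + b                # take the Asimov dataset as the observation

    # Asymptotic approximation for the counting-experiment test statistic q0.
    q0 = 2.0 * (n_obs * np.log(n_obs / b) - (n_obs - b))
    z_asym = np.sqrt(q0)         # about 2.9 sigma for these yields

    # Toy-based estimate: fraction of background-only pseudo-experiments
    # that fluctuate up to at least the observed yield.
    rng = np.random.default_rng(1)
    toys = rng.poisson(b, size=2_000_000)
    p_toy = np.mean(toys >= n_obs)
    z_toy = norm.isf(p_toy)      # one-sided Gaussian significance

    print(f"asymptotic Z = {z_asym:.2f}, toy-based Z = {z_toy:.2f}")

At the ~3 sigma level of this toy the two can be compared directly and toys are cheap to generate; at the 5+ sigma level of an observation a direct toy check already requires upwards of 10^9 background-only pseudo-experiments, which is precisely the regime where only the asymptotic formula is practical and where its accuracy is least tested; hence the request to state explicitly where it was used.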