Description
In recent years, serial femtosecond crystallography (SFX) has made remarkable progress in measuring macromolecular structures and dynamics using intense, femtosecond-duration pulses from X-ray free-electron lasers (FELs). In these experiments, FEL X-ray pulses are fired at a jet of protein crystals, and a diffraction pattern is recorded for each pulse. Only pulses that hit a crystal yield a useful pattern; most of the time the beam misses the crystals and no useful information is recorded. As a result, only a small fraction of the hundreds of thousands of diffraction patterns in a typical experiment is useful, so there is tremendous potential for data reduction. Diffraction from a protein crystal produces distinctive features known as Bragg peaks. Existing methods therefore use statistical tools to find peaks, identify the diffraction patterns that contain Bragg peaks, and discard the patterns from empty shots, resulting in considerable data reduction. Typically, peak-finding methods attempt to find ‘all’ Bragg peaks in a diffraction pattern, which can be computationally expensive. In addition, existing methods require parameters carefully tuned by domain experts. In this work, our goal is to build data reduction methods for serial crystallography that are computationally cheaper and less reliant on hand-tuned parameters, leveraging the astonishing success of machine learning. We will also present a fair comparison between existing and machine learning methods, with the aim of benchmarking the SFX data reduction task. Furthermore, we observe that experimental settings can vary between experiments, leading to a domain gap for a typical machine learning model. We therefore aim to build a ‘universal’ model that can be applied across multiple experimental settings.
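To make the conventional approach concrete, the following is a minimal sketch of a threshold-based hit finder of the kind described above: it counts bright local maxima in each frame and keeps the frame only if enough are found. It is an illustration only, not the method proposed in this work and not the algorithm of any particular SFX software package. The function names, the synthetic frames, and the parameters `n_sigma` and `min_peaks` are all assumptions introduced for this example.

```python
import random
import statistics


def count_peaks(image, n_sigma=5.0):
    """Count interior pixels above a robust threshold that are strict local maxima."""
    flat = [v for row in image for v in row]
    # Robust background estimate: median and median absolute deviation (MAD).
    med = statistics.median(flat)
    mad = statistics.median([abs(v - med) for v in flat])
    sigma = 1.4826 * mad  # MAD scaled to a Gaussian standard deviation
    if sigma == 0:
        sigma = 1.0  # fallback for perfectly flat frames
    threshold = med + n_sigma * sigma

    peaks = 0
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            v = image[y][x]
            if v <= threshold:
                continue
            neighbours = [
                image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
            ]
            if all(v > n for n in neighbours):
                peaks += 1
    return peaks


def is_hit(image, min_peaks=3, n_sigma=5.0):
    """Classify a frame as a hit if it contains at least min_peaks candidate peaks."""
    return count_peaks(image, n_sigma) >= min_peaks


def synthetic_frame(rng, peak_positions=(), size=32):
    """Hypothetical test frame: Gaussian background plus optional bright spots."""
    frame = [[rng.gauss(10.0, 2.0) for _ in range(size)] for _ in range(size)]
    for y, x in peak_positions:
        frame[y][x] += 200.0
    return frame


rng = random.Random(0)
spots = [(5, 5), (5, 25), (15, 15), (25, 5), (25, 25)]
# Every fourth frame contains simulated Bragg peaks; the rest are empty shots.
frames = [synthetic_frame(rng, spots if i % 4 == 0 else ()) for i in range(8)]
kept = [f for f in frames if is_hit(f)]
print(f"kept {len(kept)} of {len(frames)} frames")
```

Even this toy version exposes two hand-tuned parameters (`n_sigma` and `min_peaks`) whose best values depend on detector noise and crystal quality, which is the kind of expert tuning the abstract seeks to reduce.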
Keywords: Serial Crystallography, Data Reduction, Machine Learning