Protein crystallography is one of the most successful methods for biological structure determination. The technique requires many diffraction snapshots to obtain 3D structural information about the protein under study. Studying fast protein dynamics, which is possible with serial crystallography (SX), requires even more patterns. Fortunately, new X-ray facilities such as fourth-generation synchrotrons and Free Electron Lasers (FELs), combined with newly developed X-ray detectors, have made it possible to carry out these experiments at rates above 1000 images per second. The drawback of this higher acquisition rate is the volume of data collected: a single experiment can easily produce up to 2 PB. New data reduction strategies therefore have to be developed and deployed. Lossless methods leave the data unchanged but usually achieve only modest compression ratios. Lossy methods, on the other hand, can reduce the amount of data significantly, but the quality of the resulting data must be evaluated carefully.
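To make the lossless/lossy trade-off concrete, here is a minimal Python sketch on a synthetic frame. The frame size, background distribution, number of peaks and the threshold are all hypothetical, and zlib plus zero-thresholding are stand-ins chosen for illustration, not the compression methods evaluated in this work. zlib reproduces the frame bit for bit, while thresholding discards every weak background count and gains a much larger ratio in exchange.

```python
import random
import zlib
from array import array

# Hypothetical detector size, far smaller than a real SX detector.
WIDTH, HEIGHT = 512, 512


def synthetic_frame(seed=0, n_peaks=50):
    """Sparse, diffraction-like 16-bit frame: weak background plus isolated bright pixels."""
    rng = random.Random(seed)
    # Assumed background distribution: mostly 0-3 counts per pixel.
    counts = rng.choices([0, 1, 2, 3], weights=[60, 25, 10, 5], k=WIDTH * HEIGHT)
    pixels = array("H", counts)
    # A few bright pixels standing in for Bragg peaks.
    for _ in range(n_peaks):
        pixels[rng.randrange(WIDTH * HEIGHT)] = rng.randint(500, 5000)
    return pixels


def compression_ratio(pixels):
    """Raw size divided by zlib-compressed size (the zlib step itself is lossless)."""
    raw = pixels.tobytes()
    return len(raw) / len(zlib.compress(raw, 9))


def threshold(pixels, t):
    """Lossy step: zero every pixel below t counts. Weak signal is lost as well."""
    return array("H", (p if p >= t else 0 for p in pixels))


frame = synthetic_frame()
lossy_frame = threshold(frame, t=4)
print(f"lossless (zlib):  {compression_ratio(frame):.1f}x")
print(f"threshold + zlib: {compression_ratio(lossy_frame):.1f}x")
```

Zeroing weak pixels also removes weak reflections and background information, which is why any lossy scheme has to be checked against downstream data-quality statistics rather than judged by its compression ratio alone.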
We tested several lossless and lossy compression approaches on SX data, proposed new lossy compression schemes, and demonstrated appropriate methods for assessing data quality. By comparing the statistics of the compressed data (such as CC*/Rsplit and Rwork/Rfree), we showed that the data volume can be reduced 10- to 100-fold while data quality remains almost unchanged. In addition, we applied the lossy compression methods to a SAD dataset (thaumatin collected at 4.57 keV at SwissFEL) and showed that even such highly sensitive data can be compressed successfully. This allowed us to determine the limits of applicability of each lossy compression method considered. Some of the proposed compression strategies, tested on SX and macromolecular crystallography (MX) datasets, can also be applied to other types of experiments, including those using different probes, such as electron and neutron diffraction.
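The half-dataset statistics mentioned above can be computed as in the following minimal Python sketch, assuming merged intensities from two random halves of the data have already been matched reflection by reflection. It uses the standard definitions CC* = sqrt(2·CC1/2 / (1 + CC1/2)) (Karplus and Diederichs) and Rsplit = √2·Σ|I_even − I_odd| / Σ(I_even + I_odd); the function names and intensity values are hypothetical and are not results from this study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cc_star(cc_half):
    """CC* estimated from CC1/2; clamped to 0 when CC1/2 <= 0."""
    if cc_half <= 0:
        return 0.0
    return math.sqrt(2 * cc_half / (1 + cc_half))


def r_split(i_even, i_odd):
    """Rsplit = sqrt(2) * sum|I_even - I_odd| / sum(I_even + I_odd)."""
    num = sum(abs(e - o) for e, o in zip(i_even, i_odd))
    den = sum(e + o for e, o in zip(i_even, i_odd))
    return math.sqrt(2) * num / den


def half_set_stats(i_even, i_odd):
    """CC1/2, CC* and Rsplit for two matched half-datasets."""
    cc_half = pearson(i_even, i_odd)
    return {"CC1/2": cc_half, "CC*": cc_star(cc_half), "Rsplit": r_split(i_even, i_odd)}


# Invented intensities for five reflections, matched by index in each half-dataset.
even = [120.0, 45.0, 300.0, 80.0, 15.0]
odd = [118.0, 50.0, 290.0, 85.0, 12.0]
print(half_set_stats(even, odd))
```

In a compression study the same function would be run on half-datasets processed from the original and from the compressed images, and the two sets of numbers compared. Rwork/Rfree, by contrast, come out of structure refinement and are not sketched here.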