The enormous data rates produced by large instruments such as the Square Kilometre Array Observatory (SKAO) must be reduced significantly during the data acquisition phase. To make measurement results understandable and reproducible, it is necessary to store the multitude of details that define the (real-time) criteria for data selection. This is accompanied by a massive increase in metadata. Traditional methods for storing metadata, such as SQL databases, are no longer suitable because they scale poorly: they struggle to store petabytes, they are inflexible with respect to schema changes, and their access times deteriorate as data volumes grow.
This talk provides an overview of selected challenges in metadata storage and analysis and discusses whether frameworks such as Lustre, Apache Parquet, Greenplum, and Delta Lake can address them.
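
As a rough illustration of one of the approaches named above, the sketch below stores a handful of real-time selection criteria in an Apache Parquet file using the pyarrow library. It is a minimal example only: the field names and values (obs_id, rfi_threshold, etc.) are invented for illustration and are not taken from the talk or from SKAO's actual metadata model.

    # Minimal sketch, assuming pyarrow is installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Hypothetical metadata records describing real-time
    # data-selection criteria for two observations.
    records = {
        "obs_id":        ["OBS-0001", "OBS-0002"],
        "start_utc":     ["2024-05-01T12:00:00Z", "2024-05-01T12:10:00Z"],
        "rfi_threshold": [4.5, 5.0],      # flagging threshold (sigma)
        "channels_kept": [3840, 3712],    # channels surviving selection
    }

    table = pa.Table.from_pydict(records)

    # Columnar, compressed, self-describing storage: the schema travels
    # with the file, and columns can later be read selectively.
    pq.write_table(table, "selection_metadata.parquet", compression="zstd")

    # Read back only the columns a given analysis needs.
    subset = pq.read_table("selection_metadata.parquet",
                           columns=["obs_id", "rfi_threshold"])
    print(subset.to_pydict())

Because Parquet is columnar and carries its own schema, an analysis can fetch only the columns it needs, and later files can add fields without rewriting earlier ones; this is the kind of scalability and schema flexibility the abstract contrasts with traditional SQL databases.
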
==============================================
Connection details:
Zoom meeting “PUNCHLunch seminar”:
https://desy.zoom.us/j/91916654877
Webinar ID: 919 1665 4877, passcode: 481572