Welcome to participants
Goals of workshop
Organizational matters
Enabling multi-threaded access to data stored in HDF5 and efficient storage of sparse and variable-length data are long-standing requests from the HDF5 user community. Lifeboat, LLC has been working closely with The HDF Group on the design and implementation of these new capabilities.
In our talk we will report on the progress we made toward multi-threaded concurrency in HDF5 since the last...
Recently, the hdf5plugin package (https://www.silx.org/doc/hdf5plugin) has gained support for the Blosc2 library. This allows HDF5/h5py to use many of the technologies that Blosc2 already supports.
In our talk, we will describe recent work on enhancing Blosc2, namely:
1) A new dynamic plugin system whose plugins can be easily installed via Python wheels.
2) A new...
HDF5 is the standard data format at most X-ray sources. The ESRF uses this format for both acquisition and processing of data. This contribution highlights the usage of direct-chunk read/write features of the HDF5 library and shows how it can be coupled with GPU processing.
For numerical analysis, GPUs have proven to be ~5 times faster than equally optimized CPU code on equivalent hardware....
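The direct-chunk features mentioned above are exposed through h5py's low-level API. A minimal sketch (file name, shapes, and the gzip codec are illustrative only; the ESRF pipeline itself is not shown here):

```python
# Minimal sketch of HDF5 direct-chunk I/O via h5py. We compress a frame
# ourselves, write the bytes straight into a chunk (bypassing the HDF5
# filter pipeline), then read the raw chunk bytes back -- which is the
# point where they could be handed to a GPU decompressor instead.
import os
import tempfile
import zlib

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "direct_chunk_demo.h5")
frame = np.arange(64 * 64, dtype="uint16").reshape(64, 64)

with h5py.File(path, "w") as f:
    dset = f.create_dataset("frames", shape=(4, 64, 64), chunks=(1, 64, 64),
                            dtype="uint16", compression="gzip")
    # Write a pre-compressed chunk directly (HDF5's gzip filter is zlib).
    dset.id.write_direct_chunk((0, 0, 0), zlib.compress(frame.tobytes()))

with h5py.File(path, "r") as f:
    # Read the raw, still-compressed chunk bytes.
    filter_mask, raw = f["frames"].id.read_direct_chunk((0, 0, 0))

restored = np.frombuffer(zlib.decompress(raw), dtype="uint16").reshape(64, 64)
assert np.array_equal(restored, frame)
```

Because the bytes never pass through the library's filter pipeline, the CPU cost per chunk is reduced to a memcpy plus metadata bookkeeping, which is what makes coupling with GPU (de)compression attractive.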
The HDF5 format is used to store experimental data from photon and neutron sources worldwide. Field-programmable gate arrays (FPGAs) have recently found applications in data acquisition at accelerator-based photon sources. FPGAs can also be used as regular compute accelerators, similarly to general-purpose graphics processing units. Options for feeding FPGA data reduction algorithms with...
New compression features have been added to the netCDF C and Fortran libraries, including lossy compression, Zstandard, and parallel I/O support.
These features will help science data producers such as NOAA, NCAR, NASA, and ESA process, store and distribute the large scientific datasets produced by higher-resolution models and instruments.
The Community Codec Repository (CCR) will be...
[hdf5plugin][1] is a Python package (1) providing a set of [HDF5][2] compression filters (namely: Blosc, Blosc2, BitShuffle, BZip2, FciDecomp, LZ4, SZ, SZ3, Zfp, ZStd) and (2) enabling their use from the Python programming language with [h5py][3], a thin, pythonic wrapper around libHDF5.
This presentation illustrates how to use hdf5plugin for reading and writing compressed datasets from...
DECTRIS X-ray detectors are utilized at synchrotrons and laboratories around the world, where they strongly contribute to a growing accumulation of data. As we move toward the introduction of next-generation detectors, we expect a rise in both frame rates and data rates. Our current pipelines rely heavily on the HDF5 data format and its corresponding software framework, whereas these pipelines...
H5wasm is a WebAssembly-based library for reading and writing HDF5 files, which can be used natively in a web browser or in a local Node.js environment. The library has no external runtime dependencies and is used in some online HDF5 viewers that don't require server-side processing: https://h5web.panosc.eu/h5wasm and https://myhdf5.hdfgroup.org/
The community has requested more...
The h5cpp library developed by DESY and ESS is a C++ wrapper for the HDF5 library. Using modern C++ features, it simplifies the creation of HDF5 files. The pninexus library adds a set of advanced tools, e.g. a file structure builder from XML configuration. The libraries, with their Python bindings, are heavily used by the PETRA III experiment at DESY in our detector Tango servers and our NeXus metadata...
The Open Standard for Particle-Mesh Data (openPMD) is a F.A.I.R. metadata standard for tabular (particle/dataframe) and structured mesh data in science and engineering.
We show the basic components of openPMD, its extensions to specific domains, applications from laser-plasma physics, particle accelerators, material physics to imaging and the ability to bridge multiple heterogeneous...
https://gitlab.com/helmholtz-berlin/nexuscreator
https://gitlab.com/helmholtz-berlin/nexuscreatorpy
The research data management group at Helmholtz-Zentrum Berlin is applying FAIR data management. Data is beginning to be moved from specific file formats into NeXus/HDF5 files. The standardization program involves the conversion of already generated data, and automation of the creation of...
The materials science beamline, ID11, at the ESRF, was upgraded in 2020 with a Dectris Eiger 4M pixel detector. This can record diffraction frames at 500 Hz while samples are rotated and scanned in a tiny (~150 nm) X-ray beam. Reconstruction of the diffraction data can eventually give detailed images of all the crystals inside the materials. The large quantities of data can be problematic to...
Serial crystallography (SX) has become an established technique for protein structure determination, especially when dealing with small or radiation-sensitive crystals and investigating fast or irreversible protein dynamics. The advent of newly developed multi-megapixel X-ray area detectors, capable of capturing over 1000 images per second, has brought about substantial benefits. However, this...
High-bandwidth instruments (data production rates of GB/s) have proliferated in photon science experimental facilities across the globe in recent years. Some of them are planned to be operated 24/7. The data volumes thus produced exceed both the budget of storage facilities and sometimes even the ingest capacities of hardware.
In this talk, I'd like to highlight key challenges when...
HSDS (Highly Scalable Data Service) is a REST-based web service that provides most of the features of the HDF5 library, but running as a service. HSDS supports the standard HDF5 compressors as well as Blosc-based compressors out of the box. In addition, HSDS supports parallel compression/decompression and supports using compression with variable-length datatypes. This talk will cover how...
Btune (https://www.blosc.org/pages/btune/) is a dynamic plugin for Blosc2 that can help you find the optimal combination of compression parameters for datasets compressed with Blosc2. Blosc2 can easily be used from HDF5/h5py via the hdf5plugin (https://www.silx.org/doc/hdf5plugin).
Depending on your needs, Btune has three different tiers of support for tuning datasets:
- **Genetic...
Hands-on tutorial for running HSDS. HSDS is a RESTful service for HDF data that can be used in cloud, desktop, or HPC environments. The tutorial will cover:
HSDS architecture
Installing HSDS
Configuration Options
HSDS command line tools
HSDS compression
Accessing HSDS with REST, Python, and C (rest-vol)
TO JOIN THE TUTORIAL, YOU NEED:
Install Anaconda Python from:...