The European XFEL is an X-ray laser research facility that produces extremely short and intense X-ray flashes, enabling investigations across a wide range of fields—from the structure of matter to the dynamic evolution of molecular systems. A typical experiment can generate petabytes of data within a day, originating from diverse detectors and in multiple formats. Managing this...
ITER (International Thermonuclear Experimental Reactor) is the largest international experiment aimed at generating nuclear fusion energy through magnetic confinement. ITER's objective is to operate in modes that come as close as possible to the conditions of a commercial fusion reactor, which implies long pulses and continuously running systems.
From the point of view of the data...
HDF5 is the de facto standard for storing large volumes of binary data in files. Blosc2, an award-winning high-performance library, excels at compressing binary data in memory. Both are widely used, making their integration natural. This talk will cover using Blosc2 as an HDF5 filter and HDF5 as a Blosc2 backend.
We will outline the current state of the Blosc2 plugin for HDF5...
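For a flavor of the filter route, here is a minimal sketch using h5py together with the hdf5plugin package, which ships a Blosc2 HDF5 filter; the codec, level, and chunk shape are illustrative choices, not recommendations:

```python
import h5py
import numpy as np
import hdf5plugin  # importing registers the Blosc2 filter with HDF5

data = np.random.rand(1000, 1000)
with h5py.File("compressed.h5", "w") as f:
    # Chunked dataset compressed through the Blosc2 HDF5 filter.
    f.create_dataset(
        "data",
        data=data,
        chunks=(100, 1000),
        **hdf5plugin.Blosc2(cname="zstd", clevel=5,
                            filters=hdf5plugin.Blosc2.SHUFFLE),
    )
```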
The integration of the HDF5 high-level I/O library, encompassing Single Writer Multiple Reader (SWMR) and parallel mode (pHDF5), with Pure Storage FlashBlade over NFS, modern Linux kernels, and modern networks offers a robust, high-performance solution. This combination significantly reduces Total Cost of Ownership (TCO) and technical debt compared to traditional parallel file systems.
HDF5,...
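For context, the canonical h5py pattern for pHDF5 collective file access looks like the sketch below; the same code runs whether the file lives on a traditional parallel file system or on an NFS mount such as FlashBlade:

```python
from mpi4py import MPI
import h5py  # must be built with parallel HDF5 support

comm = MPI.COMM_WORLD
# All ranks open the same file through the MPI-IO driver.
with h5py.File("shared.h5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("ranks", (comm.size,), dtype="i8")
    dset[comm.rank] = comm.rank  # each rank writes its own slot
```

Launched as, e.g., `mpiexec -n 4 python script.py`, all four ranks write concurrently into one shared file.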
In this talk, we will present an overview and demonstration of two HDF5 features implemented via virtual file drivers (VFDs), both currently in a prototype stage: the full single-writer/multiple-reader capability (VFD SWMR) and HDF5 versioning (also known as the Onion VFD).
The VFD SWMR feature enables file modifications during writing and provides guarantees on the maximum latency before new...
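For comparison, the existing (non-VFD) SWMR API in h5py looks like the sketch below; VFD SWMR is still a prototype, so its eventual interface may differ:

```python
import h5py

# Writer side: create an appendable dataset, then switch to SWMR mode
# so concurrent readers can safely poll the file while we append.
f = h5py.File("stream.h5", "w", libver="latest")
dset = f.create_dataset("samples", (0,), maxshape=(None,), dtype="f8")
f.swmr_mode = True
for i in range(10):
    dset.resize((i + 1,))
    dset[i] = 0.5 * i
    dset.flush()  # publish the new element to readers
f.close()

# Reader side (separate process):
#   h5py.File("stream.h5", "r", libver="latest", swmr=True)
```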
When people hear this statement, a common reaction is, "When, HDF Group, when?" In her book How to Make Sense of Any Mess, Abby Covert reminds us not to fall in love with our plans or ideas, but with the effects we can have when we communicate clearly. While I would reject the notion that "a mess" aptly describes the current state of HDF5, I want to use this short...
The [NDE File Format][1] (.nde), developed by Evident, is an open, extensible data format tailored for the non-destructive evaluation (NDE) and testing (NDT) industry. Built upon the HDF5 container and augmented with JSON-based metadata, it offers a platform-independent solution for storing inspection data, primarily for the ultrasonic modality. By adopting an open format, .nde files can be...
Version 3 of the Zarr specification includes a sharding codec that allows chunks to contain small inner chunks. The binary layout of the resulting shards is reminiscent of an HDF5 file. Both HDF5 files and Zarr v3 shards may contain compressed chunks. Furthermore, the Zarr v3 shard specification is similar to the Fixed Array Data Block structure within an HDF5 file....
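For reference, the sharding codec is declared in the Zarr v3 array metadata roughly as follows; this sketch follows the names in the Zarr v3 sharding specification, while the shapes and inner codecs are illustrative:

```python
# Fragment of Zarr v3 array metadata selecting the sharding codec.
sharding_codec = {
    "name": "sharding_indexed",
    "configuration": {
        "chunk_shape": [32, 32],   # inner chunks within each shard
        "codecs": [                # pipeline applied to each inner chunk
            {"name": "bytes"},
            {"name": "gzip", "configuration": {"level": 5}},
        ],
        "index_codecs": [{"name": "bytes"}, {"name": "crc32c"}],
        "index_location": "end",   # shard index stored at the end of the shard
    },
}
```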
GPUs and similar accelerators have become the dominant compute platform for STEM applications, from finance to space flight and beyond. However, HDF5 continues to execute exclusively on the host CPU of GPU nodes. This talk will present a design overview of moving the I/O pipeline (filters, datatype conversions, and other transforms) from the CPU to the GPU, including how to perform I/O...
This talk will provide an overview of the current state of the HDF5 software ecosystem, highlighting recent advancements, key components, and ongoing challenges. We will explore the future directions of the various tools and libraries that empower researchers and developers to manage HDF5-related workflows efficiently. Additionally, the talk will outline potential future initiatives for the...
MAX IV is an accelerator-based light source located in southern Sweden. It continuously operates 16 experimental stations using X-rays for material and life sciences. User data analysis primarily occurs on a small edge HPC cluster, while automatic scientific data processing, from around 100 data sources, runs in an edge-cloud environment. These pipelines provide rapid feedback on data acquisition and...
HDF5 is an enormously powerful and flexible file format. There are many different ways to use it, and it's difficult to provide one API that works efficiently for all the possible use cases. However, the complexity of the on-disk file format is a high barrier to alternative implementations, so with a few heroic exceptions, most code reading & writing HDF5 does so through the canonical C...
h5pydantic is a Pydantic-based library aimed at making it easier for scientists to organise their HDF5 files by writing Python models of their experiments. The library is similar to an Object Relational Mapper (ORM), but instead of targeting a relational database, it targets HDF5.
The library is inspired by the needs of the Australian Synchrotron during the commissioning of our new beamlines...
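To illustrate the ORM analogy, a model in this style might look like the following; this is a hypothetical sketch of the idea using plain Pydantic, not the actual h5pydantic API:

```python
from pydantic import BaseModel

# Hypothetical sketch: a typed model describing an experiment's HDF5 layout.
class Detector(BaseModel):
    exposure_s: float    # could map to an HDF5 attribute
    frame_count: int     # could map to an attribute or a dataset's shape

class Experiment(BaseModel):
    sample_name: str
    detector: Detector   # nested model, mapping to a nested HDF5 group

# A library in this style would then provide load/dump helpers translating
# between such models and the groups, datasets, and attributes in a file.
```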
HSDS (Highly Scalable Data Service) is a REST-based service that provides read/write access to HDF5 data stores, backed by object storage or a POSIX file system. By combining multi-processing and asynchronous I/O, HSDS can achieve remarkable performance when accessing very large datasets. On the other hand, performance lagged for clients invoking a series of smaller requests (reading or writing a...
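From the client side, HSDS is typically accessed through h5pyd, which mirrors the h5py API; a minimal read sketch follows, with the domain path and endpoint being illustrative:

```python
import h5pyd  # h5py-compatible client for HSDS

# Each selection maps to an HTTP request to the service, so batching many
# small reads into one larger selection reduces round trips.
with h5pyd.File("/home/user/data.h5", "r",
                endpoint="http://hsds.example.org") as f:
    dset = f["measurements"]
    block = dset[0:4096]  # one request instead of thousands of tiny ones
```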
This contribution presents our experience using a pure Java implementation of the HDF5 file format to support metadata collection at the P05 beamline at Hereon, DESY. Since 2016, we have relied on Java-based solutions to generate HDF5 files for hundreds of experiments with minimal maintenance overhead.
We will provide a practical overview of how to use the Java HDF5 library effectively in...
Over the past few years, Lifeboat LLC has been focused on advancing the capabilities of HDF5 by incorporating multi-threaded support, enhancing the storage of sparse and variable-length data, and implementing robust encryption mechanisms for data stored within HDF5 files. These improvements are aimed at optimizing performance, increasing flexibility, and strengthening data security.
In our...
Working with HDF5 often means navigating large datasets, verbose APIs, and boilerplate-heavy code. In this demo-heavy session, we’ll explore how AI-assisted coding tools—specifically GitHub Copilot—can accelerate common HDF5 workflows across C, C++, and Python. From auto-generating read/write boilerplate, to documenting complex structures, to scaffolding tests and data conversion routines,...
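As a taste of the kind of boilerplate such tools can scaffold in seconds, here is a representative h5py read/write pair; all names here are illustrative:

```python
import h5py
import numpy as np

def save_run(path, temperatures, metadata):
    """Write a 1-D temperature series plus free-form metadata attributes."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("temperature",
                                data=np.asarray(temperatures, dtype="f8"),
                                compression="gzip")
        for key, value in metadata.items():
            dset.attrs[key] = value

def load_run(path):
    """Return the series and its attributes as (ndarray, dict)."""
    with h5py.File(path, "r") as f:
        dset = f["temperature"]
        return dset[...], dict(dset.attrs)
```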
As object storage becomes even more prevalent, HDF5's underlying storage format needs to be updated to match the interface that cloud and on-prem object systems provide. This talk will present a design overview of a mapping of the HDF5 data model onto S3-compatible storage systems. An outline of the planned VOL connector implementation and projected performance goals will be part of the talk.
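One way to picture such a mapping is a deterministic object-key scheme per dataset chunk; the sketch below is purely hypothetical and not the connector's actual layout:

```python
# Hypothetical key scheme: one S3 object per dataset chunk.
def chunk_key(file_prefix: str, dataset_path: str, chunk_index: tuple) -> str:
    """E.g. ('run42', '/detector/frames', (3, 0)) -> 'run42/detector/frames/3.0'"""
    coords = ".".join(str(i) for i in chunk_index)
    return f"{file_prefix}{dataset_path}/{coords}"

# Metadata (group structure, attributes) could live in a small number of
# separate objects, so that chunk reads become independent GET requests.
```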
A very common need when presented with an HDF5 file (especially for non-programmers or those new to HDF5), is some way to “see” the contents. Happily, there are a variety of viewers for HDF5 data sources available: HDFView, H5Web (myhdf5), HDF Compass, etc. However, it’s not obvious how these compare or when one or another might be preferable for a particular application. In this session,...
Modern science and engineering create and accumulate huge amounts of data, which are persisted through tools like HDF5 so that they remain available for further analysis, display, and many other operations. Increasing the efficiency of this data processing is critical for today's growing data volumes, not only to save time but also to make efficient use of the available resources.
This thesis aimed...
With the growing adoption of the Zarr data format for scalable and cloud-optimized storage, NetCDF has introduced an interface to support Zarr access. This integration enables a broader range of scientific software, beyond the Python ecosystem, to interact with Zarr datasets through the familiar NetCDF API. In this presentation, we will discuss the current state of the NetCDF-Zarr...
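For a flavor of the interface, NetCDF selects Zarr storage through a URL fragment; the sketch below assumes a libnetcdf build with NCZarr support, with the URL syntax following the NetCDF NCZarr documentation:

```python
import netCDF4  # requires a libnetcdf built with NCZarr support

# Write a variable into a Zarr store through the ordinary NetCDF API.
ds = netCDF4.Dataset("file:///tmp/out.zarr#mode=nczarr,file", "w")
ds.createDimension("time", 4)
temp = ds.createVariable("temperature", "f4", ("time",))
temp[:] = [280.0, 281.5, 283.0, 284.2]
ds.close()
```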
Scientific workflows are evolving rapidly, demanding the seamless integration of simulation, experiments, analytics, and AI. This evolution is placing immense pressure on traditional data management systems. To address these challenges, we present IOwarp, a new initiative focused on building a comprehensive data management platform. IOwarp aims to streamline complex scientific workflows by...
Rapid adoption of artificial intelligence (AI) in scientific computing requires new tools to evaluate I/O performance effectively. HDF5 is one of the data formats frequently used not only in HPC but also in modern AI applications. However, existing benchmarks are insufficient to address the current challenges posed by AI workloads. This talk introduces an extension to the existing HDF5...