26–28 May 2025
Europe/Berlin timezone

HSDS v1.0 – Performance Features

27 May 2025, 10:35
25 min
FLASH seminar room

FLASH, Notkestrasse 85, 22607 Hamburg
20-minute presentation + 5-minute Q&A

Speaker

John Readey (The HDF Group)

Description

HSDS (Highly Scalable Data Service) is a REST-based service that provides read/write access to HDF5 data, using either object storage or a POSIX file system as the backing store. By combining multiprocessing with asynchronous I/O, HSDS can achieve high throughput when accessing very large datasets. However, performance has lagged for clients that issue a series of small requests (for example, reading or writing a small dataset selection, creating an attribute, or creating a new link). As is typical in client-server architectures, the per-request latency is high compared with in-process operations such as a call into the HDF5 library. To address this, the next release of HSDS and h5pyd (the HSDS client library for Python) aims to improve performance through a combination of read-ahead logic and combining multiple write operations into a single request. Clients can continue to use the same h5py API and gain these improvements without any code changes.
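The abstract does not describe how h5pyd implements read-ahead or write combining. As a rough illustration of why these techniques cut latency, here is a minimal, self-contained Python sketch of the general idea. It assumes a hypothetical in-memory stand-in for the server (`FakeServer`) in which every method call counts as one round trip, and a hypothetical client wrapper (`BatchingClient`); neither name is part of HSDS or h5pyd.

```python
from typing import Any, Dict, List, Optional, Tuple


class FakeServer:
    """Hypothetical stand-in for a REST service; each method call is one round trip."""

    def __init__(self) -> None:
        self.store: Dict[str, Any] = {}
        self.requests = 0  # number of simulated round trips

    def get_many(self, keys: List[str]) -> Dict[str, Any]:
        self.requests += 1
        return {k: self.store[k] for k in keys if k in self.store}

    def put_many(self, items: List[Tuple[str, Any]]) -> None:
        self.requests += 1
        for key, value in items:
            self.store[key] = value


class BatchingClient:
    """Hypothetical client that batches writes and reads ahead (not the h5pyd API)."""

    def __init__(self, server: FakeServer, prefetch_keys: List[str]) -> None:
        self.server = server
        self.prefetch_keys = prefetch_keys  # keys to fetch on the first read
        self.cache: Optional[Dict[str, Any]] = None
        self.pending: Dict[str, Any] = {}  # queued writes, not yet sent

    def write(self, key: str, value: Any) -> None:
        # Write batching: queue locally instead of sending a request now.
        self.pending[key] = value

    def flush(self) -> None:
        # Send all queued writes to the server in a single request.
        if self.pending:
            self.server.put_many(list(self.pending.items()))
            if self.cache is not None:
                self.cache.update(self.pending)
            self.pending.clear()

    def read(self, key: str) -> Any:
        if key in self.pending:  # read-your-writes for unflushed data
            return self.pending[key]
        if self.cache is None:
            # Read-ahead: fetch every likely-needed key in one request.
            self.cache = self.server.get_many(self.prefetch_keys)
        if key not in self.cache:
            # Cache miss: fall back to a single-key request.
            self.cache.update(self.server.get_many([key]))
        return self.cache[key]  # raises KeyError if the server lacks the key


server = FakeServer()
client = BatchingClient(server, prefetch_keys=[f"attr{i}" for i in range(10)])
for i in range(10):
    client.write(f"attr{i}", i * i)
client.flush()  # ten writes, one request
total = sum(client.read(f"attr{i}") for i in range(10))  # ten reads, one request
print(server.requests, total)  # 2 285
```

Issued one at a time, the same twenty operations would cost twenty round trips; in this sketch they cost two. The trade-off is that queued writes are invisible to other clients until `flush()` runs, so a real implementation would likely need a policy for when to flush.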

May we record your session? Yes

Primary author

John Readey (The HDF Group)
