5–7 Nov 2025
Deutsches Elektronen-Synchrotron DESY
Europe/Berlin timezone

From MongoDB to SQL: Efficient and Scalable Data Management with Tiled

6 Nov 2025, 17:20
20m
CFEL (Building 99) (Deutsches Elektronen-Synchrotron DESY)

Notkestraße 85, 22607 Hamburg
Contributed talk · Community Talks

Speaker

Dr Yevgen Matviychuk (Brookhaven National Laboratory -- NSLS-II)

Description

Tiled is a general-purpose data access service that unifies heterogeneous scientific data stores behind a structured interface. By mapping diverse storage backends (CSV, HDF5, TIFF, Zarr, Parquet, relational and NoSQL databases) onto a concise set of logical abstractions—tables, arrays, and hierarchical containers—Tiled hides backend details while enabling efficient, sliceable, chunked access. Its HTTP-based architecture supports deployment as a public or private service, modern authentication, fine-grained authorization, caching, and fast data streaming via WebSockets for real-time acquisition.
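To make the idea of sliceable, chunked access concrete, the following is a minimal sketch of how a slice request along one axis can be mapped onto the stored chunks that overlap it, so that only those chunks need to be read. It is an illustration under stated assumptions, not Tiled's implementation: the function name `chunks_for_slice` and its return format are hypothetical.

```python
from typing import List, Tuple


def chunks_for_slice(
    chunk_sizes: List[int], start: int, stop: int
) -> List[Tuple[int, int, int]]:
    """Return (chunk_index, local_start, local_stop) for every chunk
    that overlaps the half-open global range [start, stop).

    Hypothetical helper for illustration only.
    """
    needed = []
    offset = 0  # global index of the first element in the current chunk
    for index, size in enumerate(chunk_sizes):
        lo = max(start, offset)
        hi = min(stop, offset + size)
        if lo < hi:  # the requested range touches this chunk
            needed.append((index, lo - offset, hi - offset))
        offset += size
    return needed


# An axis of 10 elements stored as chunks of 4, 4, and 2 elements.
plan = chunks_for_slice([4, 4, 2], start=3, stop=9)
print(plan)  # [(0, 3, 4), (1, 0, 4), (2, 0, 1)]
```

A request that falls entirely inside one chunk yields a single entry, which is what keeps small random reads cheap regardless of the total array size.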

Developed in the context of the Bluesky project, Tiled provides first-class support for the Bluesky event document model through the TiledWriter callback. The callback ingests Bluesky run documents into a Tiled catalog, storing scalar data as tables while registering external binary data (e.g. detector images) via StreamResource/StreamDatum documents aligned to a common time index. TiledWriter includes a RunNormalizer that upgrades legacy schemas, can run asynchronously to avoid disrupting experiments, and buffers data during temporary outages.
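The buffering behaviour described above can be sketched generically as follows. This is not the TiledWriter code: the class `BufferedWriter`, the `send` callable, and the use of `ConnectionError` to signal an outage are all assumptions made for illustration. The sketch only shows the general pattern of holding documents in order while the backend is unreachable and flushing them once it recovers.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class BufferedWriter:
    """Forward documents to `send`, keeping them in order across outages.

    Hypothetical illustration; not the TiledWriter implementation.
    """

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._send = send
        self._pending: Deque[dict] = deque()

    def __call__(self, doc: dict) -> None:
        # Always enqueue first so ordering is preserved, then try to drain.
        self._pending.append(doc)
        self.flush()

    def flush(self) -> None:
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return  # backend unavailable: keep the backlog for later
            self._pending.popleft()

    @property
    def backlog(self) -> int:
        return len(self._pending)


# Simulated backend that is down for the first two documents.
state: Dict[str, bool] = {"up": False}
received: List[dict] = []


def flaky_send(doc: dict) -> None:
    if not state["up"]:
        raise ConnectionError("server unreachable")
    received.append(doc)


writer = BufferedWriter(flaky_send)
writer({"seq_num": 1})
writer({"seq_num": 2})
backlog_during_outage = writer.backlog  # both documents are held back

state["up"] = True
writer({"seq_num": 3})  # the backlog is flushed before the new document
```

Because a document is removed from the queue only after a successful send, a failure partway through a flush loses nothing and the next attempt resumes from the same document.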

We present the architecture and performance of the Bluesky–Tiled integration, emphasizing preprocessing and consolidation before data writing. These steps flatten and reindex streaming documents to accelerate later queries. We compare SQL-backed Tiled catalogs with the legacy NoSQL storage solution, demonstrating improved support for scalable, low-latency lookup, fast random access, and array slicing. Finally, we share our experience migrating existing Bluesky datasets from MongoDB to PostgreSQL, quantifying performance gains on representative beamline use cases.
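As a rough illustration of the flatten-and-reindex step, the sketch below turns nested, out-of-order event documents into rows of a flat table keyed by sequence number, then answers a range query with plain SQL. Several things here are assumptions: the standard-library `sqlite3` module stands in for PostgreSQL, the table name `primary_stream` and the columns `motor` and `det` are invented, and the schema is not the one Tiled uses.

```python
import sqlite3

# Simplified event documents, arriving out of order, with nested "data".
events = [
    {"seq_num": 2, "time": 10.2, "data": {"motor": 0.5, "det": 110}},
    {"seq_num": 1, "time": 10.1, "data": {"motor": 0.0, "det": 100}},
    {"seq_num": 3, "time": 10.3, "data": {"motor": 1.0, "det": 125}},
]


def flatten(event: dict) -> tuple:
    """Lift the nested data keys into one flat row (hypothetical layout)."""
    data = event["data"]
    return (event["seq_num"], event["time"], data["motor"], data["det"])


# Reindex: sort rows by sequence number before writing.
rows = sorted(flatten(event) for event in events)

conn = sqlite3.connect(":memory:")  # in-memory stand-in for PostgreSQL
conn.execute(
    "CREATE TABLE primary_stream ("
    "seq_num INTEGER PRIMARY KEY, time REAL, motor REAL, det INTEGER)"
)
conn.executemany("INSERT INTO primary_stream VALUES (?, ?, ?, ?)", rows)

# A row-range query becomes an indexed lookup on the primary key.
selected = conn.execute(
    "SELECT seq_num, det FROM primary_stream "
    "WHERE seq_num BETWEEN ? AND ? ORDER BY seq_num",
    (2, 3),
).fetchall()
print(selected)  # [(2, 110), (3, 125)]
conn.close()
```

Doing the flattening once at write time means a later reader can select columns and row ranges directly, rather than iterating over individual documents and unpacking each one.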

Authors

Dan Allan (Brookhaven National Laboratory)
Dr Yevgen Matviychuk (Brookhaven National Laboratory -- NSLS-II)

Co-authors

Dylan McReynolds (Lawrence Berkeley National Lab)
Garrett Bischof (Brookhaven National Laboratory)
Mr Joseph Ware (Diamond Light Source)
Dr Juan Marulanda Arias (Brookhaven National Laboratory -- NSLS-II)
Dr Jun Aishima (Brookhaven National Laboratory -- NSLS-II)
Ms Kari Barry (Brookhaven National Laboratory -- NSLS-II)
Nate Maytan (NSLS2)
Dr Seher Karakuzu (Brookhaven National Laboratory -- NSLS-II)