Name: SciCat Performance Studies
Start: 2025-06-02T09:00:00+02:00
End: 2025-06-02T10:00:00+02:00
Location: DESY, building 1, 2nd floor, SR 03a

2025-06-02 9:02 – 10:02

Topic: Performance studies with SciCat

Igor presented his performance studies for two cases, “Data-out” and “Data-in”. The setup assumed 10 parallel clients, 1 MongoDB instance, 4 API servers and 1 nginx on a VM with 8 CPUs (2 nodes), 32 GiB RAM and 4 GiB swap (a typical DESY VM).

For Data-out he applied load as GET requests on the dataset endpoint at different rates (10, 100, 1000 RPS), with and without an index on the queried quantity, and for different DB sizes (1k, 10k, 100k, 1M records). Conclusion: for an indexed query SciCat performs well (slide 3).

For the Data-in case, the ingestion part, i.e. POST requests to Dataset, he used the same setup. His tests comprised two cases: case 1 with 100 metadata fields and case 2 with 250 metadata fields. Observation: dramatic loss of performance from case 1 to case 2, probably due to the number of checks run for each metadata field (date string, unit conversion, etc.).
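
To make the two ingestion cases concrete, here is a minimal sketch of how synthetic dataset bodies with 100 and 250 metadata fields could be generated. It is not the payload Igor used: the field names, the alternation of numeric and date-string entries, and the `value`/`unit` layout are assumptions for illustration, and the body omits the other fields a real Dataset POST would need.

```python
import json


def make_dataset_payload(n_fields: int) -> dict:
    """Build a synthetic dataset body with n_fields scientific metadata entries.

    All field names and values are hypothetical placeholders.
    """
    scientific_metadata = {}
    for i in range(n_fields):
        name = f"field_{i:03d}"
        if i % 2 == 0:
            # Numeric quantity with a unit (would exercise unit handling).
            scientific_metadata[name] = {"value": float(i), "unit": "mm"}
        else:
            # Date string without a unit (would exercise date-string checks).
            scientific_metadata[name] = {
                "value": "2025-06-02T09:00:00+02:00",
                "unit": "",
            }
    return {
        "datasetName": f"synthetic-{n_fields}-fields",
        "scientificMetadata": scientific_metadata,
    }


# Case 1 and case 2 from the minutes: 100 and 250 metadata fields.
for n in (100, 250):
    body = json.dumps(make_dataset_payload(n))
    print(n, "fields ->", len(body), "bytes")
```

Varying `n_fields` in small steps between 100 and 250 would show whether the slowdown grows gradually with the number of per-field checks or sets in at a particular size.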

Neele confirmed that for P05 (tomography) they have about 60* fields at most per dataset (*corrected).

Conclusion and follow up:

None of the tests included attachments, i.e. metadata of associated files.
In the tested scenarios SciCat handles the retrieval of metadata decently well for indexed quantities.
Unindexed searches reduce performance slightly but stay below the threshold noticeable to a human reader (below 300 ms at the 50th percentile for 1000 RPS with 1M records).
Concurrent writing may become an issue when a dataset has too many scientific metadata fields; up to 100 fields is OK.
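
The 50th-percentile criterion quoted above can be checked against recorded request latencies with a few lines of Python. This is a minimal sketch: the function name is hypothetical and the sample values are made up for illustration, not measurements from the study.

```python
import statistics

HUMAN_THRESHOLD_MS = 300.0  # threshold quoted in the minutes


def p50_below_threshold(latencies_ms, threshold_ms=HUMAN_THRESHOLD_MS):
    """Return the median (50th percentile) latency and whether it is below the threshold."""
    p50 = statistics.median(latencies_ms)
    return p50, p50 < threshold_ms


# Made-up latencies in milliseconds; one slow outlier barely moves the median.
samples = [120.0, 180.0, 210.0, 250.0, 900.0]
p50, ok = p50_below_threshold(samples)
print(f"p50 = {p50} ms, below threshold: {ok}")
```

Because the median ignores the slow tail, a higher percentile would be needed to see how the slowest requests behave.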

Follow up: Regina will ask for Igor to be given access to a real cluster environment, mainly to test performance scenarios that include associated filenames in OrigDatablocks.
