2025-06-02 9:02 – 10:02
Topic: Performance studies with SciCat
Igor presented his performance studies for two cases, “Data-out” and “Data-in”. The setup assumed 10 parallel clients, 1 MongoDB, 4 API servers and 1 nginx on a VM with 8 CPUs (2 nodes), 32 GiB RAM and 4 GiB swap (a typical DESY VM).
For Data-out he applied GET requests at different rates (10, 100, 1000) to the dataset endpoint, with and without an index on the queried quantity, and for different database sizes (1k, 10k, 100k, 1M). Conclusion: for an indexed query SciCat performs well (slide 3).
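For reference, a minimal sketch of how such a Data-out test could be driven from 10 parallel clients. This is not Igor's actual test code. The function names and the example URL are assumptions, and the sketch takes a total request count rather than throttling to a target rate.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median
from urllib.request import urlopen


def http_get(url):
    """Fetch the URL and discard the body (hypothetical default client)."""
    with urlopen(url) as resp:
        resp.read()


def run_load(url, total_requests, clients=10, fetch=http_get):
    """Issue total_requests GETs from `clients` threads; return latencies in seconds."""
    def timed(_):
        start = time.perf_counter()
        fetch(url)
        return time.perf_counter() - start

    # The pool size models the number of parallel clients.
    with ThreadPoolExecutor(max_workers=clients) as pool:
        return list(pool.map(timed, range(total_requests)))


def summarize(latencies):
    """Reduce a list of latencies to count, median and maximum."""
    ordered = sorted(latencies)
    return {"n": len(ordered), "median_s": median(ordered), "max_s": ordered[-1]}


# Example call (assumed URL, not taken from the slides):
# summarize(run_load("http://localhost:3000/api/v3/datasets?limit=10", 100))
```

Passing the fetch function as a parameter keeps the timing logic testable without a running SciCat instance.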
For the Data-in case, i.e. the ingestion part with POST requests to Dataset, he used the same setup. His tests comprised 2 cases. Case 1: 100 metadata fields. Case 2: 250 metadata fields. Observation: dramatic loss of performance from case 1 to case 2, probably due to the number of checks run for each metadata field (date string, unit conversion, etc.).
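For reference, a minimal sketch of how the two ingestion payloads could be generated. The field names, the value/unit layout and the mix of value types are assumptions for illustration, not the content of Igor's tests. A real POST to Dataset would need further required fields, which are omitted here.

```python
import json


def make_dataset_payload(n_fields):
    """Build a dataset-like dict with n_fields synthetic metadata entries."""
    metadata = {}
    for i in range(n_fields):
        key = f"field_{i:03d}"  # hypothetical field name
        kind = i % 3
        if kind == 0:
            # Numeric value with a unit (would trigger a unit check).
            metadata[key] = {"value": float(i), "unit": "mm"}
        elif kind == 1:
            # Date-like string (would trigger a date-string check).
            metadata[key] = {"value": "2025-06-02T09:02:00Z", "unit": ""}
        else:
            # Plain text value.
            metadata[key] = {"value": f"text-{i}", "unit": ""}
    return {
        "datasetName": f"perf-test-{n_fields}-fields",
        "scientificMetadata": metadata,
    }


# Compare the serialized size of case 1 and case 2.
for n in (100, 250):
    body = json.dumps(make_dataset_payload(n))
    print(n, "fields ->", len(body), "bytes")
```

Varying only n_fields between runs isolates the effect of the metadata count from the rest of the request.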
Neele confirmed that for P05 (tomography) they have about 60* fields at most per dataset (*corrected).
Conclusion and follow up:
Follow-up: Regina will request access for Igor to a real cluster environment, mainly to test performance scenarios that include the associated file names in OrigDatablocks.