Speakers:
Bernardo Ary dos Santos Garcia (Aachen), Juan Pablo Arcila Maldonado (Munich)
17:30 → 18:30   LEGEND data handling (1h)
Minutes available; the presentation was given on Friday 21:
"HDF5 files: matrix like data stored. Each column is a vector of vectors.
Metadata: JSON files, used in cascade. So you can overwrite information of specific data taking periods. validity JSON are assigned to make some dataset valid, or outdated or just wrong.
To make the files being readable by everyone, goal is to press a button and make the analysis, they use Snakeman (Nextflow not optimal).
Dataflow: raw -> dsp-opt (filtering stage) -> dsp-digital signal process (extract key quantities) -> calib -> events
Data stored with a logic behind the name, so to include key info of the dataflow steps.
Snakemake is making new fils from original ones, by passing through configuration files, telling Snakemake how exactly to make the new file.
Hands-on tutorial on how the whole data production system is structured.
Mistakes and bugs are tracked in the log given as output."
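As an illustration of the "each column is a vector of vectors" layout, here is a minimal sketch with h5py. It assumes (this is an assumption for illustration, not the actual LEGEND file specification) that a ragged column is stored as a flat value array plus an array of cumulative row lengths; all names are made up.

```python
# Minimal sketch, NOT the actual LEGEND layout: a ragged ("vector of
# vectors") column stored as flat values + cumulative row lengths.
import numpy as np
import h5py

# --- write a toy column with three rows of different length ---
rows = [np.array([1.0, 2.0]), np.array([3.0]), np.array([4.0, 5.0, 6.0])]
flat = np.concatenate(rows)                         # all values end to end
cumlen = np.cumsum([len(r) for r in rows])          # where each row ends

with h5py.File("toy.h5", "w") as f:                 # file name is made up
    g = f.create_group("col_energy")
    g.create_dataset("flattened_data", data=flat)
    g.create_dataset("cumulative_length", data=cumlen)

# --- read it back and rebuild the per-row vectors ---
with h5py.File("toy.h5", "r") as f:
    flat = f["col_energy/flattened_data"][...]
    cumlen = f["col_energy/cumulative_length"][...]
    starts = np.concatenate(([0], cumlen[:-1]))
    rows_back = [flat[a:b] for a, b in zip(starts, cumlen)]

print(rows_back)   # [array([1., 2.]), array([3.]), array([4., 5., 6.])]
```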
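A minimal sketch of the "JSON in cascade" idea: a base metadata file is overlaid by period-specific files, and a separate validity file flags whole datasets. File names, keys, and the merge rule are assumptions for illustration, not the LEGEND metadata schema.

```python
# Sketch of cascaded JSON metadata; keys and file names are made up.
import json
from pathlib import Path

def merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` on top of `base` (later files win)."""
    out = dict(base)
    for key, val in override.items():
        if isinstance(val, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], val)
        else:
            out[key] = val
    return out

def load_cascade(paths: list[Path]) -> dict:
    """Apply JSON files in order, so period-specific files overwrite defaults."""
    meta: dict = {}
    for p in paths:
        meta = merge(meta, json.loads(p.read_text()))
    return meta

# Hypothetical usage:
#   defaults.json    -> {"gain": 1.0, "threshold_keV": 5}
#   period_p03.json  -> {"threshold_keV": 3}
#   meta = load_cascade([Path("defaults.json"), Path("period_p03.json")])
#   meta == {"gain": 1.0, "threshold_keV": 3}

# A "validity" file can then flag datasets as valid / outdated / wrong:
#   validity.json -> {"p03_r001": "valid", "p03_r002": "outdated"}
def is_usable(validity: dict, dataset: str) -> bool:
    return validity.get(dataset) == "valid"
```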
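To illustrate "key info of the dataflow steps encoded in the file name", here is a sketch with a purely hypothetical naming convention (it does not reproduce the real LEGEND convention): the run identifier and the dataflow tier appear in the name, so each production step can derive its output name from its input.

```python
# Hypothetical naming convention for illustration only.
TIERS = ["raw", "dsp_opt", "dsp", "calib", "events"]   # order of the dataflow

def tier_filename(run_id: str, tier: str) -> str:
    """Encode run and dataflow tier in the file name, e.g. 'run0042-tier_dsp.h5'."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier}")
    return f"{run_id}-tier_{tier}.h5"

def next_tier(filename: str) -> str:
    """Given a tier file, return the name of the next tier's output."""
    run_id, tier_part = filename.rsplit("-tier_", 1)
    tier = tier_part.removesuffix(".h5")
    nxt = TIERS[TIERS.index(tier) + 1]
    return tier_filename(run_id, nxt)

print(next_tier("run0042-tier_raw.h5"))   # run0042-tier_dsp_opt.h5
```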
External talk by CRESST/COSINUS on how to streamline the analysis flow
Raw data (what comes out of the hardware trigger, basically just an amplitude threshold) comes out of the DAQ. They then skim the samples a bit and extract key quantities, ending with physics results. They write out everything that passes the hardware trigger (as digits, not full waveforms), and the main bottleneck is having a system fast enough to acquire data continuously. They follow a partially blind analysis: they look at 10% of the data, which is then burned, but it is important to have a first look at the data and run initial checks (a minimal blinding split is sketched below).

They rely on ROOT and usually call functions from the terminal. They also have a GUI to look at the data streams, which serves as a DQM (Data Quality Monitoring) system. Ideally they would run the final analysis on a dedicated batch system (a strategy not very well followed today), but it gets stuck due to excessive accesses; they do not want to work on this because they only have problems at the unblinding stage. They have backup disks that can be used by different groups and shared widely (to make sure data is not lost).

MPCDF is a nice tool, but we can also do the analysis on a dedicated batch system. Having CERN recognition is always good, since it gives access to all the services that are already paid for; they suggest exploiting it while still keeping some "private" solutions. COSINUS uses Labfolder as an eLog system (instead of writing in a notebook what happened in the lab), while CRESST uses eLog. We can check different solutions for us.
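As a minimal illustration of the partially blind analysis (looking at only ~10% of the data, which is then excluded from the final result), here is a sketch; the split fraction, seeding, and bookkeeping are assumptions, not the CRESST/COSINUS implementation.

```python
# Sketch of a partially blind analysis split: a small "open" subset for
# first looks and data-quality checks, the rest kept blind.
import numpy as np

def blind_split(event_ids: np.ndarray, open_fraction: float = 0.1, seed: int = 0):
    """Randomly select an open subset for initial checks; the rest stays blind."""
    rng = np.random.default_rng(seed)
    mask = rng.random(len(event_ids)) < open_fraction
    return event_ids[mask], event_ids[~mask]          # (open, blind)

events = np.arange(100_000)
open_set, blind_set = blind_split(events)
print(len(open_set), len(blind_set))   # roughly 10% / 90%
# The open subset is "burned" afterwards (excluded from the final physics
# result) so that the first look does not bias the blind analysis.
```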
For the first analysis step (low-level analysis) they use two main packages: one in C++ and a newer one in Python. The Python one is mostly used in the first steps of the analysis, to help newcomers understand it. They distribute the software as containers, so that all dependencies and packages are already installed and the containers can be used without any software-related issues. Also, if you upload code to GitLab, the system automatically builds a (Docker) container so that it is available to everyone.
High-level analysis: (missed this part of the talk). They have a "Cryocluster Discourse" forum to ask the collaboration for feedback on software-related issues; questions and answers are stored online.
15:30 → 16:00   Coffee break (30m)
16:00 → 17:30   SA + VNA measurements: Noise calibration measurement @ Bonn