This is the 2nd annual workshop organized in the framework of the FH platform on scientific computing (XWIKI FH-SCiComp). The main goals of the workshop are:
The workshop is primarily intended for early-career researchers who are actively pursuing this work. Of course, interested staff are also invited.
Topics:
Please register to help us organize the breaks.
Please fill out this survey about the gathering after the session on Thursday.
More details will follow soon. Ideally, subscribe to the platform mailing list fh-scicomp@desy.de (via the DESY mailing list server).
Asapo is a streaming framework developed and deployed at DESY to support data acquisition and online data analysis on processing clusters. It efficiently enables high-bandwidth communication between state-of-the-art detectors, storage systems, and independent analysis processes. The deployed solution provides real-time feedback and is offered as a service to DESY scientists. Several data-processing pipelines based on Asapo are deployed at different beamlines at the PETRA III facility.
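As a rough illustration of the pattern such a framework enables, and not of the actual ASAP::O API, the following toy sketch decouples data acquisition from online analysis using Python's standard queue module; all names here are hypothetical.

```python
import queue
import threading

# Toy stand-in for a detector stream: a producer pushes "events",
# an independent consumer pulls and analyses them as they arrive.
stream = queue.Queue(maxsize=100)

def acquire(n_events):
    """Pretend detector readout: push raw events into the stream."""
    for i in range(n_events):
        stream.put({"id": i, "pixels": [i % 7] * 16})
    stream.put(None)  # end-of-stream marker

def analyse():
    """Online analysis: consume events independently of acquisition."""
    while (event := stream.get()) is not None:
        total = sum(event["pixels"])
        print(f"event {event['id']}: integrated intensity {total}")

producer = threading.Thread(target=acquire, args=(5,))
consumer = threading.Thread(target=analyse)
producer.start(); consumer.start()
producer.join(); consumer.join()
```

In the deployed system the queue is replaced by the high-bandwidth streaming service, but the decoupling of acquisition and analysis processes is the same idea.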
Continuous Integration (CI) has become a key ingredient of scientific software development. GitHub is one of the main platforms currently in use for hosting open-source code. It offers a CI system called "GitHub Actions", which allows composable and reusable actions to be developed. In this contribution we show some of the possibilities GitHub Actions offers. Starting from the basics of building and testing changesets, we show how to speed up that process, and also how to use GitHub Actions to build full-fledged software images for the muon collider community.
The interTwin project, funded by Horizon Europe, is building an open-source Digital Twin Engine (DTE) to support interdisciplinary scientific Digital Twins (DTs) across domains such as High Energy Physics, Astrophysics, Climate Science, and Environmental Monitoring. The platform is co-designed by infrastructure providers, technology experts, and domain scientists to streamline the creation and deployment of complex DT workflows.
The DTE enables the execution of containerized workflows on heterogeneous computing backends (HPC, HTC, Cloud) through InterLink, a component that abstracts Kubernetes pod execution to remote resources using an extended Virtual Kubelet architecture. Complementing this, the interTwin Data Lake provides a federated data layer that enables efficient data access and management across multiple storage sites, supporting FAIR-aligned data practices to facilitate interoperability and reuse. It is built on technologies such as Rucio and FTS, and integrates identity and authorization via EGI Check-in.
To support site integration and user collaboration, the project introduces Teapot, a multi-tenant WebDAV interface developed within interTwin. Teapot enables storage sites without native WebDAV support to add WebDAV access seamlessly while preserving file ownership, making it fully compatible with HPC storage systems. This approach allows sites to securely expose storage to diverse user communities. Teapot also integrates with ALISE, which lets users link local accounts with multiple external identities, facilitating seamless access across federated environments.
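As a hedged illustration of what token-based WebDAV access to such a storage endpoint can look like on the client side (the endpoint URL and token handling below are placeholders, not the actual Teapot deployment), a minimal sketch using the Python requests library:

```python
import requests

# Hypothetical WebDAV endpoint exposed by a Teapot-like gateway and an
# OIDC access token obtained e.g. via EGI Check-in; both are placeholders.
BASE_URL = "https://storage.example.org:8443/webdav/mydata"
TOKEN = "eyJ...access-token..."
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# Upload a local file (WebDAV PUT)
with open("result.h5", "rb") as f:
    r = requests.put(f"{BASE_URL}/result.h5", data=f, headers=HEADERS)
    r.raise_for_status()

# List the collection (WebDAV PROPFIND, depth 1)
r = requests.request("PROPFIND", BASE_URL,
                     headers={**HEADERS, "Depth": "1"})
print(r.status_code, len(r.text), "bytes of XML metadata")

# Download the file again (plain GET)
r = requests.get(f"{BASE_URL}/result.h5", headers=HEADERS)
open("result_copy.h5", "wb").write(r.content)
```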
The data of the ZEUS experiment at HERA and their usage were converted to "preservation mode" in 2012, and new physics results have been published continuously from these data since then.
A description will be given of the original plan, its implementation, the latest status of data, software and knowledge access, some corresponding results, and the related challenges.
I will give a quick update on the FAIR and Open Data portal public-data.desy.de, outline how we employ metadata schemata for validated ingestion before community curation, and finally show that we can mint DOIs that resolve data locations directly.
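To illustrate what schema-validated ingestion can look like in practice (the schema and record below are invented examples, not the portal's actual metadata model), a minimal sketch with the Python jsonschema package:

```python
import jsonschema

# Hypothetical minimal metadata schema for a dataset record; real portal
# schemata are richer, this only shows the validation step at ingestion.
SCHEMA = {
    "type": "object",
    "required": ["title", "creators", "doi", "data_location"],
    "properties": {
        "title": {"type": "string", "minLength": 1},
        "creators": {"type": "array", "items": {"type": "string"}, "minItems": 1},
        "doi": {"type": "string", "pattern": r"^10\.\d{4,9}/\S+$"},
        "data_location": {"type": "string", "format": "uri"},
    },
}

record = {
    "title": "Example test-beam dataset",
    "creators": ["A. Scientist"],
    "doi": "10.12345/example-2024-001",
    "data_location": "https://public-data.desy.de/datasets/example-2024-001",
}

# Raises jsonschema.ValidationError if the record does not match the schema
jsonschema.validate(instance=record, schema=SCHEMA)
print("record accepted for curation")
```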
Data access is an essential component of the scientific process, as is the storage system where that data resides. There is no one-size-fits-all solution, which makes selecting the right storage system a non-trivial task.
In this presentation, I will give a brief overview of the file and storage systems used at DESY and offer guidelines for matching different workloads to appropriate storage solutions.
The increasing data flow for LHC Run 4 requires an architecture capable of handling massive parallelism. This shift demands a heterogeneous environment, creating the need for heterogeneous programming. Ecosystems such as Kokkos and alpaka provide portability across different accelerators, such as CPUs, GPUs, and FPGAs, which are the most commonly used at the CERN experiments. They achieve this by switching backends at runtime, making them well-suited for such diverse environments.
The CMS experiment has been integrating alpaka (an abstraction library for parallel kernel acceleration) into its data analysis software, CMSSW, for several years. The current goal is to migrate as many components as suitable from traditional serial CPU processing to GPU-based processing (both CUDA and ROCm) using alpaka.
One key component is the Phase 2 Outer Tracker data unpacker, which translates RAW data from the DAQ output to the reconstruction step. This component presents an excellent opportunity for increased parallelization and performance gains. The project presented here, the porting of the Phase 2 Outer Tracker unpacker from the CPU-based CMSSW model to an alpaka-based one, marks the first step toward building hardware-portable data analysis components for the HL-LHC.
We show some examples of computer algebra used in theoretical particle physics. The examples include analytic integration and summation, as well as solving systems of linear, differential, and difference equations, and are based on the computer algebra packages Mathematica, Maple, and FORM.
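As a small, hedged taste of the kind of task such packages handle, here is an equivalent toy computation with the open-source Python package SymPy (which is not one of the packages used in the talk): an analytic integral, a symbolic sum, and a simple differential equation.

```python
from sympy import (Eq, Function, binomial, dsolve, exp, integrate,
                   oo, summation, symbols)

x = symbols("x")
k, n = symbols("k n", integer=True, nonnegative=True)

# Analytic integration: integral of x^2 * exp(-x) from 0 to infinity
print(integrate(x**2 * exp(-x), (x, 0, oo)))          # 2

# Symbolic summation: sum over k of binomial(n, k)
print(summation(binomial(n, k), (k, 0, n)))           # 2**n

# A simple differential equation: f'(x) = f(x)
f = Function("f")
print(dsolve(Eq(f(x).diff(x), f(x)), f(x)))           # Eq(f(x), C1*exp(x))
```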
We present Pepper, a general-purpose framework for CMS data analysis developed at DESY. Pepper is a Python-based framework using columnar processing and the scikit-hep ecosystem of libraries, notably awkward and coffea. This approach offers processing speeds comparable to C++ frameworks while being more approachable for recent graduates with experience in numpy and similar tools. Pepper builds on its base packages by offering helper classes and functions, a simple configuration interface, working examples for simple analyses, and easily extensible code. The framework has been used in approximately 5 published analyses, with about 15 further analyses in progress, mostly at DESY. We will discuss lessons learnt from developing this framework, as well as challenges from an analysis and computing perspective.
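To give a flavour of the columnar style Pepper builds on (this is plain awkward-array usage, not Pepper's own interface), a minimal sketch: per-event jet selections are expressed as whole-array operations instead of explicit event loops.

```python
import awkward as ak

# Jagged array: jet transverse momenta (GeV) for three events
jets_pt = ak.Array([[55.2, 31.0], [], [120.5, 40.1, 22.3]])

# Columnar selection: keep jets with pT > 30 GeV, no Python loop needed
good_jets = jets_pt[jets_pt > 30.0]

n_good = ak.num(good_jets)            # [2, 0, 2]  jets per event
ht = ak.sum(good_jets, axis=1)        # [86.2, 0.0, 160.6]  scalar sum per event
event_mask = n_good >= 2              # [True, False, True]

print(ht[event_mask])                 # HT of events passing the selection
```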
The CMS High Granularity Calorimeter (HGCAL) will be an entirely new calorimeter for the high-luminosity phase of the LHC. It comprises hexagonal silicon modules and scintillating tile modules with silicon photomultipliers for readout, i.e., SiPM-on-tile modules. At DESY, about 2,000 modules are going to be assembled, which requires rigorous and automated quality control (QC) procedures. This contribution will show the design and workflow of the developed QC framework. Its modular approach allows the QC tests to be broken down into a series of steps, which involve interfacing with specialized hardware for data acquisition, studying the module's response when changing configuration parameters, performing on-the-fly data analysis, deriving calibration parameters, and automated reporting. The framework offers a one-click solution to enable even non-expert users to execute the QC procedures and determine if an assembled SiPM-on-tile module meets the standards required for integration into the CMS HGCAL.
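A minimal sketch of what such a modular, step-based QC pipeline can look like in Python; the step names and the Step interface are invented for illustration and are not the actual framework.

```python
from dataclasses import dataclass, field

@dataclass
class QCContext:
    """Shared state passed along the QC chain (module ID, step results)."""
    module_id: str
    results: dict = field(default_factory=dict)

class Step:
    """One self-contained QC step; subclasses implement run()."""
    name = "step"
    def run(self, ctx: QCContext) -> None:
        raise NotImplementedError

class Pedestal(Step):
    name = "pedestal"
    def run(self, ctx):
        # Placeholder for DAQ readout and on-the-fly analysis
        ctx.results[self.name] = {"mean_adc": 92.4, "ok": True}

class BiasScan(Step):
    name = "bias_scan"
    def run(self, ctx):
        # Placeholder for varying a configuration parameter and refitting
        ctx.results[self.name] = {"breakdown_voltage": 38.1, "ok": True}

def run_qc(module_id: str, steps: list[Step]) -> bool:
    ctx = QCContext(module_id)
    for step in steps:
        step.run(ctx)              # acquire, analyse, store per step
    passed = all(r["ok"] for r in ctx.results.values())
    print(f"{module_id}: {'PASS' if passed else 'FAIL'}", ctx.results)
    return passed

run_qc("sipm_tile_0001", [Pedestal(), BiasScan()])
```

The one-click behaviour described above corresponds to chaining all such steps behind a single entry point, with the pass/fail decision and report generated automatically at the end.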
Flavour tagging is a critical component of the ATLAS experiment's physics programme. Existing flavour-tagging algorithms rely on several low-level taggers, which are a combination of physically informed algorithms and machine learning models. A novel approach presented here instead uses a single machine learning model based on reconstructed tracks, avoiding the need for low-level taggers based on secondary vertexing algorithms. This new approach reduces complexity and improves tagging performance. The model employs a transformer architecture to process information from a variable number of tracks and other objects in the jet in order to simultaneously predict the jet's flavour, the partitioning of tracks into vertices, and the physical origin of each track. The inclusion of auxiliary tasks aids the model's interpretability. The new approach significantly improves jet flavour identification performance compared to existing methods in both Monte Carlo simulation and collision data. Notably, the versatility of the approach is demonstrated by its successful application to boosted Higgs tagging using large-R jets.
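A schematic PyTorch sketch of the general idea: a transformer over a padded, variable-length set of tracks with a per-jet flavour head and a per-track auxiliary head. The layer sizes and head definitions are illustrative assumptions, not the actual ATLAS model.

```python
import torch
import torch.nn as nn

class TrackTagger(nn.Module):
    def __init__(self, n_feat=16, d_model=64, n_flavours=3, n_origins=5):
        super().__init__()
        self.embed = nn.Linear(n_feat, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
        self.jet_head = nn.Linear(d_model, n_flavours)    # e.g. b / c / light
        self.track_head = nn.Linear(d_model, n_origins)   # auxiliary task

    def forward(self, tracks, pad_mask):
        # tracks: (batch, max_tracks, n_feat); pad_mask: True where padded
        h = self.encoder(self.embed(tracks), src_key_padding_mask=pad_mask)
        # Masked mean-pool over real tracks for the per-jet prediction
        valid = (~pad_mask).unsqueeze(-1).float()
        pooled = (h * valid).sum(dim=1) / valid.sum(dim=1).clamp(min=1.0)
        return self.jet_head(pooled), self.track_head(h)

model = TrackTagger()
tracks = torch.randn(2, 10, 16)                    # 2 jets, up to 10 tracks
pad_mask = torch.zeros(2, 10, dtype=torch.bool)
pad_mask[0, 6:] = True                             # first jet has only 6 tracks
jet_logits, track_logits = model(tracks, pad_mask)
print(jet_logits.shape, track_logits.shape)        # (2, 3), (2, 10, 5)
```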
This contribution presents the final iteration of the CaloClouds series. Simulation of photon showers at the granularities expected in a future Higgs factory is computationally challenging. A viable simulation must capture the fine details exposed by such a detector, while also being fast enough to keep pace with the expected rate of observations. The CaloClouds model utilises point cloud diffusion and normalising flows to replicate Monte Carlo simulation with exceptional accuracy. First, we give a lightning overview of the model's objectives and constraints. To describe the upgrades in the latest version, we detail the studies on the flow model and the optimisations made, and then summarise the steps taken to generalise CaloClouds 3 for use in the whole detector. Considering some of the underlying principles of model design, we look at the impact of the data format choice on model outcomes. Finally, we compare reconstructions performed on CaloClouds 3 output with those from Geant4 simulation, demonstrating that the model provides reliable physics reproductions.
References
https://arxiv.org/pdf/2309.05704
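To illustrate the point-cloud diffusion idea behind CaloClouds in a very reduced form, the following sketch runs a generic, untrained DDPM-style reverse-diffusion loop over a toy point cloud; the network, noise schedule, and point features are placeholders and do not reflect the actual CaloClouds 3 architecture or sampler.

```python
import torch

class Denoiser(torch.nn.Module):
    """Toy stand-in for the learned point-cloud noise predictor."""
    def __init__(self, dim=4):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, dim))
    def forward(self, x, t):
        # x: (n_points, dim) noisy point cloud, t: scalar timestep in [0, 1]
        t_col = torch.full((x.shape[0], 1), float(t))
        return self.net(torch.cat([x, t_col], dim=1))  # predicted noise

# Linear beta schedule and DDPM-style ancestral sampling
T = 100
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

model = Denoiser()
x = torch.randn(200, 4)  # 200 points: (x, y, z, energy), starting from pure noise
for t in reversed(range(T)):
    eps = model(x, t / T)
    coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
    mean = (x - coef * eps) / torch.sqrt(alphas[t])
    noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
    x = mean + torch.sqrt(betas[t]) * noise
# With a trained denoiser, x would now approximate a generated shower point cloud
```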
The Helmholtz Model Zoo (HMZ) is a cloud-based platform that gives Helmholtz researchers seamless access to deep learning models via a web interface and REST API. It enables easy integration of AI into scientific workflows without moving data outside the Helmholtz Association.
Scientists across all 18 Helmholtz centers can contribute models through a streamlined GitLab submission process. HMZ automatically tests, deploys, and generates web and API interfaces based on submitted model metadata, supporting diverse use cases with minimal effort.
Inference runs on GPU nodes (4×NVIDIA L40) at DESY Hamburg via NVIDIA Triton Server, with data stored securely in HIFIS dCache InfiniteSpace and owned by the uploading user. Both open and restricted model sharing are supported.
HMZ is developed by Helmholtz Imaging at DESY, with support from HIFIS and Helmholtz AI.
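As a hedged sketch of what calling such a hosted model through a REST API can look like from a user's script (the endpoint path, payload fields, and token below are placeholders, not the documented HMZ API):

```python
import requests

# Placeholder endpoint and token; consult the HMZ documentation for the
# actual URL scheme, authentication, and per-model input format.
API_URL = "https://modelzoo.example.helmholtz.de/api/v1/models/segmenter/infer"
TOKEN = "hmz-access-token"

with open("sample_image.png", "rb") as f:
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {TOKEN}"},
        files={"input": ("sample_image.png", f, "image/png")},
        timeout=300,
    )
response.raise_for_status()
result = response.json()          # e.g. predictions or a link to the outputs
print(result)
```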