Workshop on Generative Models

Europe/Berlin
FIAS

Gregor Kasieczka (UNI/EXP (Uni Hamburg, Institut für Experimentalphysik)), Jan Steinheimer (FIAS), Kai Zhou (FIAS), Thomas Kuhr (BELLE (BELLE II Experiment))
Description

This is a 'Generative Models' workshop organized by the BDA Topic Group of DIG-UM with support from the ErUM-Data-Hub. The aim of this workshop is to have a lively and open scientific exchange on new developments in generative models for ErUM research. This workshop is addressed to scientists from all ErUM communities who have been or are currently working on new generative models.

We will have a mixture of invited talks and open contributions. If you wish to present some of your recent research at the workshop, please submit an abstract. Please note that, due to the limited number of available slots and space, participation in the workshop will be moderated, so please register as soon as possible to allow us to plan in advance.

Due to the generous support of the ErUM-Data-Hub there will be no workshop fee.

This workshop is organized by the DIG-UM Topic Group Big Data Analytics. It will be followed by the annual BDA workshop on the 29th of February and 1st of March, also at FIAS: https://indico.desy.de/event/40597/ 

Registration
Registration for the Generative Models Workshop
    • 12:00
      Arrival & registration
    • 1
      Welcome and important information
    • 2
      Deep Generative Models in Science
      Speaker: Johannes Brandstetter
    • 3
      Knowledge-Driven Generative Models for Fields

      Many scientific and technological applications require knowledge of physical fields that are functions over continuous spaces. Here, we demonstrate how to construct flexible generative models for fields that incorporate domain knowledge and do not require any previous training. We also show how the actual field configuration and its uncertainty can be inferred from the data using the Numerical Information Field Theory (NIFTy) package.
      The versatility of NIFTy is demonstrated in a number of astrophysical applications, ranging from spatio-spectral sky imaging, to 3D galactic tomography, to movies of black hole environments.

      Speaker: Vincent Eberle
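The core idea above, that a field prior can be specified by domain knowledge rather than learned from training data, can be sketched in a few lines: a Gaussian random field whose correlations are fixed a priori by a power-law power spectrum. This is a toy numpy illustration, not the NIFTy API.

```python
import numpy as np

def sample_field(n=64, slope=-3.0, seed=0):
    """Sample a Gaussian random field with power spectrum P(k) ~ k**slope.
    The correlation structure is fixed a priori (domain knowledge), so no
    training is needed.  Toy illustration only, not the NIFTy API."""
    rng = np.random.default_rng(seed)
    kx = np.fft.fftfreq(n)[:, None]
    ky = np.fft.fftfreq(n)[None, :]
    k = np.sqrt(kx**2 + ky**2)
    k_safe = np.where(k > 0, k, 1.0)                     # avoid 0**negative
    amp = np.where(k > 0, k_safe ** (slope / 2.0), 0.0)  # amplitude ~ sqrt(P)
    noise = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return np.fft.ifft2(amp * noise).real

field = sample_field()
```

Steeper slopes give smoother fields; inference then amounts to conditioning such a prior on data, which is what NIFTy automates.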
    • 15:00
      Coffee break
    • 4
      Efficient phase space sampling with Normalizing Flows

      I present a neural network based approach to phase space sampling in high-energy physics. The main idea is to use Normalizing Flows to remap physics-motivated sampling distributions in order to increase the sampling efficiency. The bijectivity of Normalizing Flows thereby guarantees full phase space coverage and an unbiased reproduction of the desired target distribution. Results for representative examples demonstrate the potential of this approach. I reflect on this in the context of recent developments and discuss possibilities for further improvements.

      Speaker: Timo Janßen (University of Göttingen)
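The remapping idea can be illustrated with a toy integrand, where a hand-picked bijection on the unit interval stands in for a trained Normalizing Flow. Because this particular map happens to match the integrand exactly, every importance weight equals one, which is the zero-variance best case that training aims for (an illustrative sketch, not the speaker's code).

```python
import numpy as np

def integrand(x):
    # Toy "matrix element" with an integrable peak at x -> 0; its
    # integral over (0, 1) is exactly 1.
    return 0.5 / np.sqrt(x)

def remap(u):
    # Hand-picked bijection on (0, 1) standing in for a trained flow;
    # returns the mapped point and the Jacobian |dx/du|.
    return u**2, 2.0 * u

rng = np.random.default_rng(1)
u = rng.uniform(1e-9, 1.0, size=10_000)   # stay off the exact endpoint
x, jac = remap(u)
weights = integrand(x) * jac              # unbiased importance weights
estimate = weights.mean()                 # Monte Carlo estimate of the integral
```

Bijectivity guarantees every point of the unit interval remains reachable, so the estimator stays unbiased even when the learned map is imperfect; only the weight variance suffers.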
    • 5
      Score-Based Generative Models for Radio Galaxy Image Simulation

      In view of the increasing amounts of data accumulating in extragalactic surveys of ever-growing extent, the ability to realistically synthesise image data takes on a significant role, proving valuable in testing data analysis methods, developing theoretical frameworks and training machine learning models. As a product of the latest advancements in the field of deep generative modelling, score-based generative models, commonly known as diffusion models, have emerged as powerful instruments for the task of realistic image generation.

      This talk will provide insight into our ongoing efforts to implement a score-based generative model dedicated to generating realistic radio galaxy images. It will cover a brief introduction to the basic working principles of diffusion models, along with a more specific description of the particular methods for implementation and training employed in this project. Further, the metrics utilised for evaluating the quality of generated images will be discussed. Finally, preliminary results from our work will be presented, offering insight into the advancements achieved thus far.

      Speaker: Tobias Martínez (U Hamburg)
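A minimal sketch of the working principle: for one-dimensional Gaussian "data" the score of every noised marginal is known in closed form, so it can stand in for the trained network in a plain DDPM-style ancestral sampler (an assumption for illustration; the project itself learns scores on images).

```python
import numpy as np

# Data distribution N(mu, sig^2) and a standard linear noise schedule.
mu, sig, T = 2.0, 0.5, 200
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
abar = np.cumprod(alphas)

def score(x, t):
    # Exact score of the noised marginal q_t -- the quantity a
    # score-based model would be trained to approximate.
    mean_t = np.sqrt(abar[t]) * mu
    var_t = abar[t] * sig**2 + (1.0 - abar[t])
    return -(x - mean_t) / var_t

rng = np.random.default_rng(0)
x = rng.normal(size=5_000)                       # start from pure noise
for t in range(T - 1, -1, -1):                   # reverse (denoising) process
    eps_hat = -np.sqrt(1.0 - abar[t]) * score(x, t)
    x = (x - betas[t] / np.sqrt(1.0 - abar[t]) * eps_hat) / np.sqrt(alphas[t])
    if t > 0:
        x += np.sqrt(betas[t]) * rng.normal(size=x.shape)
```

After the loop, `x` is distributed approximately as the original N(2, 0.25), which is the sense in which reversing the noising process generates data.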
    • 6
      Generating the medium response of jet quenching using a flow model
      Speaker: LongGang Pang (CCNU)
    • 7
      ParticleGrow: Event by event simulation of heavy-ion collisions via autoregressive point cloud generation

      The properties of hot and/or dense nuclear matter are studied in the laboratory via Heavy-Ion Collision (HIC) experiments. Of particular interest are intermediate-energy heavy-ion collisions that create strongly interacting matter of moderate temperatures and high densities, where interesting structures in the QCD phase diagram, such as a first-order phase transition from a gas of hadrons to the Quark Gluon Plasma or a critical endpoint, are conjectured. Such densities and temperatures are also expected in astrophysical phenomena such as binary neutron star mergers and supernova explosions. The experimental measurements are compared with model predictions to extract the underlying properties of the matter created in the collisions. However, the model calculations are often computationally expensive and extremely slow. Therefore, to exploit the full potential of the upcoming HIC experiments, fast simulation methods are necessary.

      In this work, we present "ParticleGrow", a novel autoregressive point cloud generator that can simulate heavy-ion collisions on an event-by-event basis. Heavy-ion collision events from the microscopic UrQMD model are used to train the generative model. The model, built on the PointGrow algorithm, generates the momenta (px, py and pz) and PID (7 different hadronic species) particle by particle in an autoregressive fashion to create a collision event. The distributions of the generated particles and different observables are compared with the UrQMD distributions. It is shown that the generative model can accurately reproduce different observables and effectively capture several underlying correlations in the training data.

      Speaker: Manjunath Omana Kuttan (Frankfurt Institute for Advanced Studies)
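The particle-by-particle generation scheme can be caricatured in a few lines: each new particle is drawn conditioned on the history, here simply through the energy budget the previous particles left behind. All distributions below are hypothetical placeholders, not the trained PointGrow model.

```python
import numpy as np

def generate_event(e_total=10.0, e_min=0.1, seed=0):
    """Grow an event particle by particle: every draw is conditioned on
    the particles generated so far via the remaining energy budget.
    Hypothetical toy distributions, not the trained generator."""
    rng = np.random.default_rng(seed)
    particles, remaining = [], e_total
    while remaining > e_min:
        e = rng.beta(2.0, 5.0) * remaining       # conditional on the history
        phi = rng.uniform(0.0, 2.0 * np.pi)
        pid = int(rng.integers(7))               # one of 7 hadron species
        particles.append((e * np.cos(phi), e * np.sin(phi), e, pid))
        remaining -= e
    return particles

event = generate_event()
```

The real model replaces the hand-written conditionals with a network that predicts each particle's momenta and PID from all previously generated particles.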
    • 11:00
      Coffee break
    • 8
      Generating Accurate Showers in Highly Granular Calorimeters Using Convolutional Normalizing Flows

      The full simulation of particle colliders incurs a significant computational cost. Among the most resource-intensive steps are detector simulations. It is expected that future developments, such as higher collider luminosities and highly granular calorimeters, will increase the computational resource requirement for simulation beyond availability. One possible solution is generative neural networks that can accelerate simulations. Normalizing flows are a promising approach. It has previously been demonstrated that such flows can generate showers in calorimeters with high accuracy. However, the main drawback of normalizing flows with fully connected sub-networks is that they scale poorly with input dimensions. We overcome this issue by using a U-Net based flow architecture and show how it can be applied to accurately simulate showers in highly granular calorimeters.

      Speaker: Mr Thorsten Lars Henrik Buss (UNI/EXP (Uni Hamburg, Institut für Experimentalphysik))
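The scaling bottleneck mentioned above sits in the coupling layers' sub-networks, which the U-Net architecture replaces. The coupling transform itself is compact; `s` and `t` below are toy stand-ins for the learned sub-networks.

```python
import numpy as np

def coupling_forward(x, s, t):
    """RealNVP-style affine coupling: scale and shift the second half of
    the input conditioned on the first half.  The log-determinant of the
    Jacobian is simply the sum of the predicted log-scales."""
    x1, x2 = np.split(x, 2)
    y2 = x2 * np.exp(s(x1)) + t(x1)
    return np.concatenate([x1, y2]), np.sum(s(x1))

def coupling_inverse(y, s, t):
    """Exact inverse -- the bijectivity that makes likelihoods tractable."""
    y1, y2 = np.split(y, 2)
    return np.concatenate([y1, (y2 - t(y1)) * np.exp(-s(y1))])

# Toy stand-ins for the learned (e.g. U-Net) sub-networks:
s = lambda h: 0.1 * h
t = lambda h: h**2
x = np.arange(4.0)
y, logdet = coupling_forward(x, s, t)
```

Because `s` and `t` are never inverted, they can be arbitrary networks; swapping fully connected layers for convolutional U-Nets changes the parameter scaling, not the math.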
    • 9
      CaloClouds: Fast Geometry-Independent Highly-Granular Calorimeter Simulation

      Simulating showers of particles in highly granular detectors is a key frontier in applying machine learning to particle physics. Achieving high accuracy and speed with generative machine learning models would enable them to augment traditional simulations and alleviate a significant computing constraint. This contribution marks a significant breakthrough in this task by directly generating a point cloud of O(1000) space points with energy depositions in the detector in 3D-space. Importantly, it achieves this without relying on the structure of the detector layers. This capability enables the generation of showers with arbitrary incident particle positions and accommodates varying sensor shapes and layouts. Two key innovations make this possible: i) leveraging recent improvements in generative modeling, we apply a diffusion model to ii) an initially even higher-resolution point cloud of up to 40,000 GEANT4 steps. These steps are subsequently down-sampled to the desired number of up to 6000 space points. We demonstrate the performance of this approach by simulating photon showers in the planned electromagnetic calorimeter of the International Large Detector (ILD), achieving overall good modeling of physically relevant distributions.

      Speaker: Anatolii Korol (FTX (FTX Fachgruppe SFT))
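The down-sampling from up to 40,000 GEANT4 steps to a few thousand space points can be sketched naively as random subsampling with an energy rescaling that conserves the total deposit (a toy stand-in; the actual CaloClouds pipeline is more involved).

```python
import numpy as np

def downsample(points, energies, k, seed=0):
    """Reduce a high-resolution cloud of simulation steps to k space
    points, rescaling the kept energies so the total deposited energy
    is conserved.  Naive random subsampling, for illustration only."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=k, replace=False)
    kept_e = energies[idx] * (energies.sum() / energies[idx].sum())
    return points[idx], kept_e

rng = np.random.default_rng(1)
steps = rng.normal(size=(40_000, 3))       # toy GEANT4-step positions
deposits = rng.exponential(size=40_000)    # toy energy per step
cloud, cloud_e = downsample(steps, deposits, k=6_000)
```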
    • 13:00
      Lunch
    • 10
      Out-of-Distribution Multi-set Generation with Context Extrapolation for Amortized Simulation and Inverse Problems

      Addressing the challenge of Out-of-Distribution (OOD) multi-set generation, we introduce YonedaVAE, a novel equivariant deep generative model inspired by Category Theory, motivating the Yoneda-Pooling mechanism. This approach presents a learnable Yoneda Embedding to encode the relationships between objects in a category, providing a dynamic and generalizable representation of complex relational data sets. YonedaVAE introduces a self-distilled multi-set generator, capable of zero-shot creation of multi-sets with variable inter-category and intra-category cardinality, facilitated by our proposed Adaptive Top-q Sampling. We demonstrate that YonedaVAE can produce new point clouds with cardinalities well beyond the training data and achieve context extrapolation. Trained on low-luminosity ultra-high-granularity data of the Pixel Vertex Detector (PXD) at Belle II with $O(100)$ cardinality, YonedaVAE can generate valid high-luminosity signatures with $O(10^5)$ cardinality and correct intra-event correlations without exposure to similar data during training. Being able to generalize to OOD samples, YonedaVAE stands as a valuable method for extrapolative multi-set generation tasks and inverse problems in scientific discovery, including de novo protein design, drug discovery, and simulating geometry-independent detector responses beyond experimental limits.

      Speaker: Hosein (Baran) Hashemi (ORIGINS Cluster)
    • 11
      Pixel Vertex Detector background generation with Generative Adversarial Network

      The Pixel Vertex Detector (PXD) is the innermost detector of the Belle II experiment. Information from the PXD, together with data from other detectors, allows for very precise vertex reconstruction. The effect of beam background on reconstruction is studied by adding measured or simulated background hit patterns to hits produced by simulated signal particles. This requires a huge sample of statistically independent PXD background noise hit patterns to avoid systematic biases, resulting in a huge amount of storage due to the high granularity of the PXD sensors. As an efficient way of producing background noise, we explore the idea of an on-demand PXD background generator realised using Generative Adversarial Networks (GANs). In order to evaluate the quality of the generated background, we measure physical quantities which are sensitive to the background in the PXD.

      Speaker: Fabio Novissimo (BELLE (BELLE II Experiment))
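For orientation, the adversarial training signal behind such a generator reduces to two cross-entropy objectives on the discriminator outputs (a generic sketch of the GAN objective, not the PXD model itself).

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-9):
    """Standard non-saturating GAN objectives, evaluated on discriminator
    outputs in (0, 1).  The discriminator minimises d_loss; the generator
    minimises g_loss by making fakes look real."""
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    g_loss = -np.mean(np.log(d_fake + eps))
    return d_loss, g_loss

# At the theoretical equilibrium the discriminator outputs 0.5 everywhere:
d_loss, g_loss = gan_losses(np.full(8, 0.5), np.full(8, 0.5))
```

The quality checks described in the abstract go beyond these losses: they compare background-sensitive physical quantities between generated and reference samples.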
    • 15:45
      Coffee break
    • 12
      Generative Unfolding with Conditional Neural Networks

      When doing analyses in particle physics, we are often faced with the task of correcting our reconstructed observables for detector effects, commonly known as unfolding. While traditional unfolding methods are restricted to binned distributions of a single observable, ML-based methods enable unbinned, high-dimensional unfolding over the entire phase space. In this talk, I will introduce generative unfolding, where a conditional neural network is used to learn the unfolded distribution conditioned on the reconstructed one.

      Speaker: Sofia Palacios Schweitzer (Universität Heidelberg)
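The conditioning idea can be made concrete in a fully tractable toy: Gaussian truth plus Gaussian detector smearing, where the conditional p(truth | reco) is known in closed form and stands in for the learned conditional network.

```python
import numpy as np

def unfold_sample(y_reco, sig_true=1.0, sig_det=1.0, n=10_000, seed=0):
    """Toy generative unfolding: truth x ~ N(0, sig_true^2) and detector
    smearing y = x + N(0, sig_det^2).  The exact Gaussian conditional
    p(x | y) stands in for the learned conditional network."""
    rng = np.random.default_rng(seed)
    w = sig_true**2 / (sig_true**2 + sig_det**2)
    # Posterior mean w*y and std sqrt(w)*sig_det follow from Bayes' theorem.
    return rng.normal(w * y_reco, np.sqrt(w) * sig_det, size=n)

unfolded = unfold_sample(y_reco=1.0)
```

A conditional generative network plays exactly this role when no closed form exists: given a reconstructed event, it samples plausible truth-level events, unbinned and in many dimensions at once.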
    • Discussion
    • 19:00
      Workshop Dinner

      TBA

    • 13
      Accelerating HEP simulations with Neural Importance Sampling

      Virtually all high-energy-physics (HEP) simulations for the LHC rely on Monte Carlo integration with importance sampling by means of the VEGAS algorithm. However, complex high-precision calculations have become a challenge for the standard toolbox.
      As a result, there has been keen interest in HEP in using modern machine learning to power adaptive sampling. Despite previous work proving that normalizing-flow-powered neural importance sampling (NIS) sometimes outperforms VEGAS, existing research has still left major questions open, which we intend to address by introducing ZüNIS, a fully automated NIS library.
      We first show how to extend the original formulation of NIS to reuse samples over multiple gradient steps, yielding a significant improvement for slow functions. We then benchmark ZüNIS over a range of problems and show high performance with limited fine-tuning. The library can be used by non-experts with minimal effort, which is crucial for it to become a mature tool for the wider HEP community.

      Speaker: Niklas Götz
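The sample-reuse extension mentioned above amounts to importance reweighting: samples drawn under an earlier proposal remain usable for expectations under the updated one, so slow integrands need not be re-evaluated at every gradient step. A self-normalised sketch with two Gaussian proposals (toy illustration, not the ZüNIS API):

```python
import numpy as np

def reweighted_mean(h, x, logq_old, logq_new):
    """Self-normalised importance weights: estimate E_{q_new}[h(x)] from
    samples x drawn under q_old, without drawing new samples."""
    w = np.exp(logq_new - logq_old)
    w /= w.sum()
    return np.sum(w * h(x))

rng = np.random.default_rng(0)
x = rng.normal(size=200_000)              # samples from q_old = N(0, 1)
logq_old = -0.5 * x**2                    # log-densities up to a constant
logq_new = -0.5 * (x - 0.5) ** 2          # updated proposal N(0.5, 1)
estimate = reweighted_mean(lambda z: z, x, logq_old, logq_new)
```

The estimate converges to 0.5, the mean under the updated proposal, even though all samples were drawn under the old one; the price is a larger variance when the two proposals drift far apart.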
    • 14
      Machine-learning off-shell effects in top quark production

      Off-shell effects in large LHC backgrounds are crucial for precision predictions and, at the same time, challenging to simulate. We show how a generative diffusion network learns off-shell kinematics given the much simpler on-shell process. The idea behind this sampling from on-shell events is that the generative network does not have to reproduce the on-shell features and can focus on the additional and relatively smooth off-shell extension. It generates off-shell configurations fast and precisely, while reproducing even challenging on-shell features.

      Speaker: Mathias Kuschick (Universität Münster)
    • 11:00
      Coffee break
    • Discussion: Closing