Anomaly Detection Mini-Workshop -- LHC Summer Olympics 2020

Europe/Berlin
Virtual

Ben Nachman, David Shih, Gregor Kasieczka (Institut fuer Experimentalphysik / UHH)
Description

Despite an impressive and extensive effort by the LHC collaborations, there is currently no convincing evidence for new particles produced in high-energy collisions. At the same time, there has been a growing interest in machine learning techniques to enhance potential signals using all of the available information.

Following the successful conclusion of the LHC Olympics 2020: Winter Games (at the ML4Jets Workshop at NYU in January 2020), we are announcing a two-day mini-workshop devoted to anomaly detection: LHC Olympics 2020: Summer Games. With this mini-workshop we hope to continue progress on the topic of model-independent discovery of new physics at the LHC.

The key goal of this workshop is to discuss progress and new methods for new physics searches at the LHC using unsupervised machine learning. We will also unblind results for the remaining LHC Olympics 2020 black-box datasets. You are encouraged to give a talk on anomaly detection regardless of your level of participation in the LHCO2020.

For more information on the LHC Olympics see the page. Please do not hesitate to ask questions: we will use the ML4Jets Slack channel to discuss technical questions related to this challenge. You are also encouraged to sign up for the mailing list lhc-olympics@cern.ch using the e-groups.cern.ch interface for infrequent announcements and communications.


The workshop is a satellite of the (virtual) BOOST conference.

For this informal and virtual workshop, there will be no registration fee. However, please register so we can plan appropriately.

The workshop will take place virtually on Thursday 16.7.2020 and Friday 17.7.2020.

The Zoom link for the workshop is at:
https://uni-hamburg.zoom.us/j/93758369770?pwd=Tzc2M3cxTU42TzV6d2FoMFZKK05Gdz09
(Password: lhco2020!!)

 

Best,
Ben Nachman (benjamin.philip.nachman@cern.ch),
David Shih (shih@physics.rutgers.edu), and
Gregor Kasieczka (gregor.kasieczka@uni-hamburg.de)

Participants
  • Abhisek Saha
  • Aditi Jaiswal
  • Alan Litke
  • Aleks Smolkovic
  • Alessandro Morandini
  • Alexandre Alves
  • Alfredo Castaneda
  • Amandeep Singh Bakshi
  • Andrea De Simone
  • Anna Ferrari
  • Annapaola de Cosa
  • Antonio Giannini
  • Anushree Ghosh
  • Arturo Sanchez
  • Baptiste Ravina
  • Barry Dillon
  • Belfkir Mohamed
  • Benjamin Nachman
  • Benjamin Tannenwald
  • Berare Göktürk
  • Bora Isildak
  • Bryan Ostdiek
  • Charanjit Kaur
  • Christina Gao
  • Christoph Obermair
  • Clara Nellist
  • Claudius Krause
  • Cosmos Dong
  • Craig Bower
  • Cristina Mantilla Suarez
  • Daniel Noel
  • Darius Faroughy
  • David Jaroslawski
  • David Shih
  • Debajyoti Sengupta
  • Dimitri Bourilkov
  • Dimitris Proios
  • Disha Bhatia
  • Edson Carquin
  • Edward Ramirez
  • Eric Kuflik
  • Erik Buhmann
  • Flavia de Almeida Dias
  • Frederic Dreyer
  • Gabriele Benelli
  • Georgios Karathanasis
  • Gregor Kasieczka
  • Grigorios Chachamis
  • Heiko Mueller
  • Honey Gupta
  • Huilin Qu
  • Ibrahim Mirza
  • Ines Ochoa
  • Ioan-Mihail Dinu
  • Ivan Sayago Galvan
  • Jacinthe Pilette
  • Jack Collins
  • James Walder
  • Jan Offermann
  • Javier Duarte
  • Jean-Francois Arguin
  • Jeremi Niedziela
  • Jesse Thaler
  • Joe Davies
  • Johan Sebastian Bonilla
  • John Rodriguez
  • Johnny Raine
  • Jose Salt
  • Joshua Isaacson
  • Julia Gonski
  • Julien Donini
  • Julien Leissner-Martin
  • Julio Lozano-Bahilo
  • Justin Tan
  • Karla Pena
  • KC Kong
  • Kees Benkendorfer
  • Kinga Anna Wozniak
  • Koji Terashi
  • Konstantinos Christoforou
  • Leonardo Cristella
  • Leonid Didukh
  • Lihan Liu
  • Liliana Teodorescu
  • Louis Vaslin
  • Luc Le Pottier
  • Luca Federici
  • Luca Fiorini
  • Luke Kreczko
  • Manimala Mitra
  • Manuel Sommerhalder
  • Marat Freytsis
  • Marcella Bona
  • Maria Clemencia Mora Herrera
  • Mark Samuel Abbott
  • Martina Fumanelli
  • Matthew Buckley
  • Matthias Schlaffer
  • Max Mihailescu
  • Maxx Richard Rahman
  • Michael Tran
  • Mihoko Nojiri
  • Mikaeel Yunus
  • Mitch Weikert
  • Mohsen Ghazi
  • Monica Dunford
  • Nicholas Carrara
  • Nicole Stefanov
  • Nilanjana Kumar
  • Nuno Castro
  • Othmane Rifki
  • Oz Amram
  • Pablo Martin
  • Patricia Rebello Teles
  • Philip Harris
  • Pirmin Berger
  • Pradeep Jasal
  • Prasanth Shyamsundar
  • Qiang Li
  • Raquel Pezoa
  • Riccardo Maganza
  • Roberto Morelli
  • Rojalin Padhan
  • Rotem Mayo
  • Rotem Ovadia
  • Roy Gusinow
  • Roy Lemmon
  • Rui Zhang
  • Sangeon Park
  • Saranya Samik Ghosh
  • Sascha Diefenbacher
  • Satyarth Praveen
  • Savannah Thais
  • Sezen Sekmen
  • Shalini Epari
  • Sijun Xu
  • Silviu-Marian Udrescu
  • Sitong An
  • Slava Voloshynovskiy
  • Sohrab Ferdowsi
  • Sotiroulla Konstantinou
  • Stephen Menary
  • Steven Tsan
  • Sung Hak Lim
  • Surbhit Sinha
  • Sven Bollweg
  • Tao Xu
  • Taoli Cheng
  • Tasnuva Chowdhury
  • Thabang Lebese
  • Tianji Cai
  • Tilman Plehn
  • Tisa Biswas
  • Tobias Golling
  • Tobias Lösche
  • Tommaso Dorigo
  • Utkarsh Nawalgaria
  • Vinicius Mikuni
  • Vitaliy Kinakh
  • Xavier Coubez
  • Xola Mapekula
  • Yuri Gershtein
    • 16:00 - 19:00
      Anomaly Detection
      • 16:00
        Introduction 20m
        Speakers: Ben Nachman (Lawrence Berkeley National Laboratory), David Shih (Rutgers University), Gregor Kasieczka (Institut fuer Experimentalphysik / UHH)
        Slides
      • 16:20
        Dijet resonance search with weak supervision using sqrt(s)=13 TeV pp collisions in the ATLAS detector 20m
        This Letter describes a search for resonant new physics using a machine-learning anomaly detection procedure that does not rely on a signal model hypothesis. Weakly supervised learning is used to train classifiers directly on data to enhance potential signals. The targeted topology is dijet events and the features used for machine learning are the masses of the two jets. The resulting analysis is essentially a three-dimensional search $A\rightarrow BC$, for $m_A\sim\mathcal{O}$(TeV), $m_B,m_C\sim\mathcal{O}$(100 GeV), where $B$ and $C$ are reconstructed as large-radius jets, without paying a penalty associated with a large trials factor in the scan of the masses of the two jets. The full Run 2 $\sqrt{s}=13$ TeV pp collision data set of 139 fb$^{-1}$ recorded by the ATLAS detector at the Large Hadron Collider is used for the search. There is no significant evidence of a localized excess in the dijet invariant mass spectrum between 1.8 and 8.2 TeV. Cross-section limits for narrow-width $A$, $B$, and $C$ particles vary with $m_A$, $m_B$, and $m_C$. For example, when $m_A=3$ TeV and $m_B\geq 200$ GeV, a production cross section between 1 and 5 fb is excluded at 95% confidence level, depending on $m_C$. For certain masses, these limits are up to 10 times more sensitive than those obtained by the inclusive dijet search.
        Speaker: Dr Flavia de Almeida Dias (Nikhef)
        Slides
      • 16:40
        Anomaly detection with convolutional autoencoders and latent space analysis 20m
        We build on the previous application of autoencoders to particle physics by including an analysis of the latent space variables.
        Speakers: David Jaroslawski (Rutgers University), Kevin Nash (Rutgers University)
        Slides
      • 17:00
        Anomaly Searches with Tag N' Train 20m
        We have previously proposed Tag N' Train as a new technique for anomaly searches that utilizes the Classification Without Labels method of training on data in a novel way. I will give an overview of the technique and discuss the lessons we learned from the results of Blackbox 1. I will also discuss the questions we would like to answer about the technique going forward, as well as other potential applications.
        Speaker: Oz Amram (Johns Hopkins University)
        Slides
      • 17:20
        Anomaly Detection with Normalizing Flows and Latent Variable Models 20m
        In this note we review model-agnostic approaches to searches for new physics signatures using normalizing flow and latent variable models. We also propose a bootstrap method that allows us to estimate the continuous densities that form the likelihood ratio between the background and signal-plus-background hypotheses with minimal a priori knowledge of the signal structure.
        Speaker: Justin Tan (Melbourne)
        Slides
      • 17:40
        Break 20m
      • 18:00
        Learning the latent structure of collider events 20m
        We describe a technique to learn the underlying structure of collider events directly from the data, without having a particular theoretical model in mind. It allows one to infer aspects of the theoretical model that may have given rise to this structure, and can be used to cluster or classify the events for analysis purposes. The unsupervised machine-learning technique is based on the probabilistic (Bayesian) generative model of Latent Dirichlet Allocation. We pair the model with an approximate inference algorithm called Variational Inference, which we then use to extract the latent probability distributions describing the learned underlying structure of collider events. We provide a detailed systematic study of the technique using two example scenarios to learn the latent structure of di-jet event samples made up of QCD background events and either a pair-produced top-quark signal or a new physics W' signal. We also present results from using this technique to infer new physics from the LHC Olympics datasets.
        Speaker: Dr Barry Dillon (Jozef Stefan Institute)
        Slides
      • 18:20
        Anomaly detection and embedding clustering 20m
        We propose a method for unsupervised multiclass classification based on the clustering of events in the embedding space. We show how the method creates unsupervised clusters for different processes and how new physics can be studied with this strategy.
        Speaker: Vinicius Mikuni (UZH)
        Slides
      • 18:40
        Deep Learning as a Tool for Generic Searches at Colliders 20m
        In a previous paper we observed that Deep Neural Networks trained on specific signals still performed well in discriminating new signals unseen during training, indicating the transferable nature of Deep Learning in HEP applications and its potential to perform model-independent searches in the LHC data. Recently, we explored semi-supervised learning techniques, both shallow and deep, and compared their performance at identifying BSM test signals to that obtained with the fully supervised counterpart. In particular, we analysed the recently proposed Deep Support Vector Data Description (DeepSVDD) algorithm, which is specifically trained for outlier identification, unlike the Autoencoder family popularly used for anomaly detection.
        Speaker: Rute Pedro (LIP - Laboratorio de Instrumentacao e Fisica Experimental de Particulas)
        Slides
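
The Classification Without Labels idea mentioned in the Tag N' Train abstract can be summarized in a short derivation. This is only a sketch under stated assumptions: the symbols below are introduced here for illustration and do not appear in the abstracts. Assume two mixed samples with densities $p_1(x)$ and $p_2(x)$, each a mixture of the same signal density $p_S(x)$ and background density $p_B(x)$ with signal fractions $f_1 > f_2$. Writing $L(x) = p_S(x)/p_B(x)$:

$$
\frac{p_1(x)}{p_2(x)} = \frac{f_1\,L(x) + (1 - f_1)}{f_2\,L(x) + (1 - f_2)},
\qquad
\frac{\partial}{\partial L}\,\frac{p_1}{p_2} = \frac{f_1 - f_2}{\left(f_2\,L + 1 - f_2\right)^2} > 0 .
$$

Under these assumptions the mixed-sample likelihood ratio is a monotonically increasing function of $L(x)$, so an ideal classifier trained only to separate the two mixed samples orders events in the same way as an ideal signal-versus-background classifier, without needing per-event labels.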
    • 15:40 - 19:00
      Anomaly Detection
      • 15:40
        Via Machinae: Anomaly Detection of Stellar Streams 20m
        The Gaia space telescope is mapping the kinematics of the nearest and brightest billion stars in the Milky Way with unprecedented precision. Structures such as streams and tidal debris in the stars' phase space can provide evidence for the assembly history of the Galaxy, and perhaps reveal information about the particle physics of dark matter. Identifying such structures in the high-dimensional data set is non-trivial. We apply the ANODE anomaly-detection technique, designed for model-independent searches at the LHC, successfully identifying stellar streams in the Gaia data. Such flexible unsupervised techniques have great potential to assist in the study of these complex astrophysical data sets.
        Speaker: Prof. Matthew Buckley
        Slides
      • 16:00
        Anomaly detection with RanBox 20m
        The search for anomalies in HEP data has to reckon with high-dimensional spaces, with features whose PDFs vary widely over their support. RanBox addresses this challenge by combining PCA with the integral transform, flattening all marginals and then searching for overdensities in the copula space. The algorithm is able to spot injected signals with fractions down to a few per mille in toy examples with a few tens of thousands of events in 20-dimensional spaces.
        Speaker: Tommaso Dorigo (INFN - sezione di Padova)
        Slides
      • 16:20
        Anomaly Awareness for new physics searches 20m
        In this talk we will present a new algorithm to search for new physics called Anomaly Awareness. By making our algorithm 'aware' of the presence of a range of different anomalies, we improve its capability to detect anomalous events even when it hasn't been exposed to them in the past. As an example, we apply this method to boosted jets and use it to uncover new resonances or EFT effects.
        Speaker: Charanjit Kaur Khosa (University of Sussex)
        Slides
      • 16:40
        QUAK: Quasi Anomalous Knowledge for Anomaly Detection 20m
        For many classes of new physics models, there is a broad set of underlying physics features we can assume about any new signal. With QUAK, we aim to embed these assumptions into our search while still preserving the model-independence of the search. The development of this approach would thus open an avenue of quasi-model-dependent searches, which we believe can build a bridge between conventional new physics searches at the LHC and fully model-independent searches. We will present our idea in the context of an ongoing data challenge, the 2020 LHC Olympics. A toy study with the MNIST dataset that showcases QUAK will also be shown.
        Speaker: Mr Sangeon Park (Massachusetts Institute of Technology)
        Slides
      • 17:00
        Event-level Anomaly Detection methods using reconstruction error and likelihood 20m
        This talk presents a summary of the anomaly detection models we have tested. We are studying the performance of two approaches on the LHC Olympics datasets, one based on Adversarial Autoencoders and the other using a Normalizing Flows method (arXiv:2003.13913). Combining those methods with traditional “bump-hunting” algorithms (https://github.com/lovaslin/pyBumpHunter), we attempt to uncover the mysteries of the LHCO black boxes.
        Speaker: Ioan Dinu (CERN)
        Slides
      • 17:20
        Break 20m
      • 17:40
        LHCO-motivated anomaly detection exploration 20m
        Motivated by the LHC Olympics, we explore a few prototype anomaly detection methods, potentially including Variational Autoencoder based anomalous jet taggers (https://arxiv.org/abs/2007.01850).
        Speaker: Taoli Cheng (Mila, University of Montreal)
        Slides
      • 18:00
        Anomaly Detection via Sequence Modeling 20m
        We present results of an anomaly detection method using a Variational Recurrent Neural Network trained on the constituent 4-vectors of large-radius jets. By training on a contaminated dataset of largely light QCD jets with a small number of signal events, we can identify potential new physics objects due to their unique substructure without the need for a pre-determined model hypothesis. We focus on the sequence modeling aspects of this approach, including considerations in pre-processing and sequence ordering of the large-radius jet constituents, and how they affect the performance of the model. We assess the improvement in performance due to these optimizations in the context of the LHC Olympics Black Box datasets from the ML4Jets2020 Workshop in January 2020.
        Speaker: Alan Kahn (Columbia University)
        Slides
      • 18:20
        Summary and Outlook 20m
        Speakers: Ben Nachman (Lawrence Berkeley National Laboratory), David Shih (Rutgers University), Gregor Kasieczka (Institut fuer Experimentalphysik / UHH)
        Slides
      • 18:40
        Discussion 20m
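
As an illustration of the integral-transform step described in the RanBox abstract, the following is a minimal sketch, not the RanBox implementation. It flattens each marginal with a rank-based empirical CDF and compares the event count in a box of the resulting unit cube with the count expected for a uniform density. The function names, the toy data, and the simple count ratio are all assumptions made for this example; the PCA step and the box search itself are omitted.

```python
import random


def to_uniform_marginals(events):
    """Map each feature to (0, 1) via its empirical CDF (rank transform)."""
    n = len(events)
    n_features = len(events[0])
    flat = [[0.0] * n_features for _ in range(n)]
    for j in range(n_features):
        # Sort event indices by feature j; the rank stands in for the CDF value.
        order = sorted(range(n), key=lambda i: events[i][j])
        for rank, i in enumerate(order):
            flat[i][j] = (rank + 0.5) / n
    return flat


def box_overdensity(flat, lower, upper):
    """Observed count in a box divided by the count expected for a
    uniform density in the unit cube (hypothetical test statistic)."""
    volume = 1.0
    for lo, hi in zip(lower, upper):
        volume *= hi - lo
    inside = sum(
        all(lo <= x < hi for x, lo, hi in zip(row, lower, upper))
        for row in flat
    )
    return inside / (len(flat) * volume)


# Toy data (assumed): smooth 2D background plus a small localized signal.
rng = random.Random(0)
background = [[rng.expovariate(1.0), rng.gauss(0.0, 1.0)] for _ in range(2000)]
signal = [[rng.gauss(2.0, 0.05), rng.gauss(1.0, 0.05)] for _ in range(40)]
flat = to_uniform_marginals(background + signal)

# Box spanning the injected events in the flattened space.
injected = flat[-len(signal):]
lower = [min(row[j] for row in injected) for j in range(2)]
upper = [max(row[j] for row in injected) + 1e-9 for j in range(2)]
print("overdensity ratio in signal box:", box_overdensity(flat, lower, upper))
```

After the transform every one-dimensional marginal is uniform by construction, so any remaining structure lives in the correlations between features. In this toy the box is placed using knowledge of the injected events; a real search would have to scan candidate boxes and account for the trials this involves.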