# CDCS Opening symposium

Apr 26 – 28, 2022
Europe/Berlin timezone
Thank you for your participation. We greatly enjoyed it.

# Sofia Vallecorsa

Researcher
CERN
Particle Physics - Deep Learning - Artificial Intelligence - Quantum Computing

Tuesday 26th April, 11:30 am - 12:30 pm, CSSB Seminar Room

Sofia is a leading researcher in Scientific Computing, Machine Learning and Quantum Computing with applications in High Energy Physics at CERN Openlab. Before joining Openlab, she developed Deep Learning-based technologies for the simulation of particle transport through detectors at CERN. She has deep expertise in Machine Learning/Deep Learning architectures, frameworks and methods for distributed training and hyperparameter optimisation. Currently, she focuses on the intersection between AI and Quantum Computing. She is also very experienced in both participating in and managing large interdisciplinary collaborations as well as public-private partnerships.

### Abstract

A new era in data science, from AI to Quantum Computing: challenges and opportunities for research and industry.

The physical and natural sciences in general have entered the domain of Big Data. The quantity, complexity and rate at which data are produced, analyzed or need to be simulated are imposing considerable stress on conventional computing when it comes to extracting relevant scientific information in a timely manner. Novel, disruptive techniques such as Artificial Intelligence and, at a different level, Quantum Computing open the window for entirely new paradigms in computing.

Artificial Intelligence (AI) makes it possible to analyze huge amounts of data, uncovering hidden correlations even without pre-existing knowledge. Among AI methods, the so-called generative models are a set of powerful techniques that learn an internal representation of a data set and use it to uncover the processes and correlations that characterize the system, offering novel insights at high precision.
Generative modelling will change the way we do science: “a third approach between observation and simulation”. The number of generative modelling applications in science, industry and society at large is increasing rapidly.

In recent years, Quantum Computing (QC) has also developed from the stage of laboratory experiments into a rapidly evolving research field. With the increasing availability of quantum computers in the range of 10-100 qubits, a large variety of practical applications is being explored, targeting both near-term noisy quantum computer hardware employing error-correcting methods and future fault-tolerant quantum computers.

This presentation will outline how techniques such as AI and QC are changing the way we formulate problems, validate solutions, and design the computing models for the natural sciences.

The rapid evolution in AI as well as in QC calls for technology synergies with industry at the earliest stage. Examples of how CERN Openlab is cooperating and jointly developing applications with partners in industry are outlined.

# Katja Hose

Professor
Aalborg Universitet
Databases - Machine Learning - Knowledge Graphs - Semantic Web

Tuesday 26th April, 1:30 pm - 2:10 pm, CSSB Seminar Room

Katja is a professor in the Department of Computer Science at Aalborg University, where she leads the Data, Knowledge, and Web Engineering group. Her work is rooted in databases and graph technologies and spans theory, algorithms, and applications of data science including knowledge management, extraction, querying, analytics, and machine learning. She brings extensive experience in interdisciplinary research gained through her collaborations with colleagues from bioscience, medicine, and sustainability assessment.

### Abstract

Data Science meets Microbial Dark Matter

In recent years, the fields of data science and bioscience have been growing closer together; many state-of-the-art approaches for DNA sequencing and metagenomics, for example, make use of machine learning. However, the two disciplines often just apply techniques or use cases from the other field without truly engaging with the details, particularities, and opportunities that the most recent developments in both fields offer. This discrepancy is what motivated the Darkmatter project at Aalborg University (https://darkmatter.aau.dk/), an interdisciplinary effort where researchers from data science and bioscience are working together to exploit the latest state of the art in their respective fields to jointly accelerate the exploration of the vast space of unexplored and unknown microbes: the microbial dark matter. In this talk, I will discuss some of the challenges and opportunities in our joint effort with a particular focus on our recent advances in metagenomic binning.

# Kai Polsterer

Researcher
Heidelberg Institute for Theoretical Studies
Astroinformatics - Machine Learning - Databases - Data Analysis

Tuesday 26th April, 4:30 pm - 5:10 pm, CSSB Seminar Room

Kai heads the first European research group in astroinformatics at the Heidelberg Institute for Theoretical Studies (HITS), which develops and applies state-of-the-art machine learning techniques to analyse astronomical datasets, which are often large and have complex structure. Before joining HITS, he was involved in the development of the control software for two of the main instruments (LUCI) at the Large Binocular Telescope (LBT). His adaptation of a self-organising maps algorithm called PINK (Parallelized Rotation/Flipping INvariant Kohonen Maps) is used for morphological analysis of, for example, radio galaxies observed by LOFAR or the SKA pathfinders. Aside from his scientific expertise, he is currently vice-president of the International AstroInformatics Association (IAIA) and brings experience in collaborative work as part of the International Virtual Observatory Alliance (IVOA) and the Working Group on Physics, Modern IT and Artificial Intelligence (AKPIG) of the DPG.

### Abstract

From Photometric Redshifts to Improved Weather Forecasts: an interdisciplinary view on machine learning in astronomy

The amount, size, and complexity of astronomical data sets have been growing rapidly in recent decades. Now, with new technologies and dedicated survey telescopes, the databases are growing even faster. Besides dealing with poly-structured and complex data, sparse data has become a field of growing scientific interest. By applying technologies from the fields of computer science, mathematics, and statistics, astronomical data can be accessed and analyzed more efficiently.

A specific field of research in astroinformatics is the estimation of the redshift of extragalactic sources, a measure of their distance, using only sparse photometric observations. Observing the full spectroscopic information that would be necessary to measure the redshift directly would be too time-consuming. Building accurate statistical models is therefore a mandatory step, especially when it comes to reflecting the uncertainty of the estimates. Statistics, and weather forecasting in particular, have introduced and utilized proper scoring rules, notably the continuous ranked probability score, to characterize the calibration as well as the sharpness of predicted probability density functions.
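To make the continuous ranked probability score (CRPS) concrete, here is a minimal sketch in Python. It assumes the predicted density is a single Gaussian, for which the CRPS has a well-known closed form; the abstract does not state which predictive distributions were used in the talk, and the function name `crps_gaussian` is purely illustrative.

```python
import math


def crps_gaussian(mu: float, sigma: float, y: float) -> float:
    """CRPS of a Gaussian forecast N(mu, sigma^2) for an observed value y.

    Closed form: sigma * ( z * (2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi) ),
    with z = (y - mu) / sigma, phi the standard normal density and
    Phi the standard normal CDF. Lower scores are better.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # phi(z)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # Phi(z)
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))
```

The score rewards sharpness only when the forecast is also right: for an observation at the predicted mean, halving `sigma` halves the score, whereas for an observation far from the mean, a narrow (overconfident) forecast scores worse than a broad one. This is the sense in which a proper scoring rule assesses calibration and sharpness together.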

This talk presents what we achieved when using proper scoring rules to train deep neural networks and to evaluate the model estimates. We present how this work led from well-calibrated redshift estimates to an improvement in the statistical post-processing of weather forecast simulations. The presented work is an example of interdisciplinarity in data science and of how methods can bridge different fields of application.

# Filipe Maia

Professor
Uppsala Universitet
X-ray imaging and diffraction - Phase Retrieval - Free-electron laser - Coherent Diffractive Imaging - Open Science

Wednesday 27th April, 09:00 am - 09:40 am, CSSB Seminar Room

Filipe is a professor at the Department of Cell and Molecular Biology, Molecular Biophysics at Uppsala University in Sweden, specializing in coherent imaging with X-ray lasers. He was involved in the development of several software packages used in coherent imaging, such as Hummingbird, Hawk, Condor and Cheetah, as well as the creation of the Coherent X-ray Imaging Data Bank. Currently, he develops lensless imaging methods, making use of X-ray free-electron lasers, to explore the complex world of structural dynamics at the nanoscale. He also brings organizational leadership experience as part of the Laboratory of Molecular Biophysics at Uppsala University, a center of excellence of the Swedish Research Council.

### Abstract

Opportunities and Challenges in the Era of Superluminous Lightsources

In the last 15 years there has been a spectacular rise in the data volumes acquired in X-ray diffraction experiments. In 2006, around the time I started my PhD, the world’s first soft X-ray free-electron laser, FLASH in Hamburg, was collecting diffraction patterns at roughly 1 Hz. Nowadays we're collecting data at the European XFEL at peak rates in the megahertz range.
This has enabled the development of new techniques which exploit this richness and were not possible before. At the same time this has brought enormous challenges to a community that had relatively little experience handling such large data quantities.
In this talk I will present the evolution of coherent X-ray imaging, and in particular ultrafast X-ray diffractive imaging experiments, and discuss what new techniques might be on the horizon and how best to make use of this wealth of data to extract as much new knowledge as possible.

# Kimberly Glass

Associate Professor
Harvard Medical School
Computational Biology - Gene Regulatory Networks - Network Modeling - Network Medicine

Wednesday 27th April, 11:00 am - 11:40 am, CSSB Seminar Room

Kimberly is an Assistant Professor and Associate Scientist in the Channing Division of Network Medicine at Brigham and Women's Hospital, a teaching hospital of Harvard Medical School. Her research group develops methods and computational tools to integrate multiple sources of ‘omics data in order to build an understanding of how biological mechanisms and contexts affect gene regulatory networks. As such, her research lies at the intersection of network analysis, biology, and translational medicine. She also brings extensive experience in working with both medical doctors and bench biologists, to both guide methodological development and give biological insight into their discoveries from network analysis.

### Abstract

Rapidly growing Omics data are providing an unprecedented opportunity to gain novel insights into biological systems and disease processes. Network modeling is a powerful approach that can be used to integrate complex information from multiple types of Omics data. In the field of network medicine, our group has developed a suite of methods that support: (1) effective integration of multi-omic data to reconstruct gene regulatory networks; (2) network analysis to identify changes in gene regulation between different biological systems or disease states; and (3) modeling of individual-specific networks in order to link regulatory alterations with heterogeneous phenotypes. In this talk, I will review several of these methods and describe specific applications in which we have used these approaches to understand the complex regulatory processes at work across different biological states, diseases, and individuals.

# Louise Travé-Massuyès

Research Director
LAAS-CNRS
Data-based diagnosis - Machine Learning - Monitoring and Health Management

Wednesday 27th April, 1:30 pm - 2:10 pm, CSSB Seminar Room

Louise is a Directeur de Recherche at the Laboratoire d'Analyse et d'Architecture des Systèmes, Centre National de la Recherche Scientifique, in Toulouse, France. Her research interests are in the supervision of dynamic systems, with a focus on qualitative, model-based methods and data mining. She has been particularly active in bridging the AI and Control Engineering Diagnosis fields.

### Abstract

Dynamic Clustering for Anomaly Detection and Diagnosis

Monitoring is a key element in guaranteeing the state of health of a system, all the more important when the system is critical, autonomous, and/or operating remotely. Anomaly detection and diagnosis are two main aspects. While model-based approaches have been around for a long time, they have been challenged in recent years by data-based approaches which proceed with an exploration of historical data to infer, by learning, a model.

Most systems are subject to multiple variations because they operate in evolving environments and may suffer ageing or unexpected situations. Evolving environments and dynamicity challenge machine learning researchers with nonstationary data flows where the concepts being tracked can change over time. In this regard, dynamic clustering algorithms have been developed to be able to perform state tracking and online anomaly detection in such contexts.

In this talk, I will present the principles of a dynamic clustering approach to track evolving environments that uses a two-stage distance-based and density-based clustering algorithm. I will explain how these principles can be used to develop an online method of double anomaly detection adapted to the requirements of on-board operation. The objective is to design a software protection component for space electronics against radiation faults, a project that we are carrying out in partnership with the French Space Agency CNES.
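The general idea of combining a distance-based stage with a density-based stage on a data stream can be illustrated with a toy sketch in Python. This is not the algorithm presented in the talk: the class names, the fixed `radius`, the `dense_threshold`, and the exponential `decay` used for forgetting are all assumptions chosen for illustration, and the sketch performs only a single, simple anomaly test rather than the double detection mentioned in the abstract.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MicroCluster:
    center: List[float]
    weight: float = 1.0  # decayed count of absorbed samples (a density proxy)

    def absorb(self, x: Sequence[float]) -> None:
        # Weighted running mean: recent samples count more once old weight has decayed.
        self.weight += 1.0
        self.center = [c + (xi - c) / self.weight for c, xi in zip(self.center, x)]


class DynamicClusterer:
    """Toy online clusterer (hypothetical; all parameters are illustrative)."""

    def __init__(self, radius: float, dense_threshold: float, decay: float = 0.99):
        self.radius = radius
        self.dense_threshold = dense_threshold
        self.decay = decay
        self.clusters: List[MicroCluster] = []

    def update(self, x: Sequence[float]) -> bool:
        """Process one sample; return True if it is flagged as anomalous."""
        # Forgetting: stale clusters lose weight, so the model can follow drift.
        for mc in self.clusters:
            mc.weight *= self.decay
        # Stage 1 (distance-based): assign to the nearest micro-cluster within radius.
        nearest = min(self.clusters, key=lambda mc: math.dist(mc.center, x), default=None)
        if nearest is not None and math.dist(nearest.center, x) <= self.radius:
            nearest.absorb(x)
            target = nearest
        else:
            target = MicroCluster(center=list(x))
            self.clusters.append(target)
        # Stage 2 (density-based): a sample landing in a sparse cluster is suspicious.
        return target.weight < self.dense_threshold


def run_demo() -> List[bool]:
    """Feed 20 nominal samples near the origin, then one far-away sample."""
    model = DynamicClusterer(radius=1.0, dense_threshold=3.0)
    stream = [(0.01 * i, -0.01 * i) for i in range(20)] + [(10.0, 10.0)]
    return [model.update(x) for x in stream]
```

In `run_demo`, the first few samples are flagged while the nominal cluster is still sparse (a warm-up effect of this simple density test), later nominal samples are not flagged, and the final far-away sample is flagged because it opens a new, low-density cluster.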