The inaugural symposium of the Center for Data and Computing in Natural Sciences (CDCS) is scheduled for 26-28 April 2022 on the Science City Hamburg Bahrenfeld (SCHB) campus in Hamburg, Germany.
The theme of the symposium is "Data Science for Cross-Disciplinary Research". It will bring together ~150 computational scientists from the fields of physics, biology and engineering to discuss how computational methods can be used in these multidisciplinary fields and to create opportunities for new collaborations.
Sofia Vallecorsa, an expert in Machine Learning and Quantum Computing at CERN openlab, will be the opening keynote speaker.
The CDCS is a new interdisciplinary joint facility of the Universität Hamburg, Deutsches Elektronen-Synchrotron (DESY), and the Hamburg University of Technology that aims to combine scientific research with state-of-the-art information technology. The CDCS initially consists of four application-focused, cross-disciplinary laboratories (CDLs), which are supported by a Computational Core Unit (CCU). The CDLs focus on the following areas:
The overall aim is to significantly strengthen the conditions for excellent research at the SCHB in the field of computation. The CDCS symposium is intended to present the latest advances of the participating research groups of the CDCS and to serve as a venue for new collaborations and unconventional, cross-disciplinary problem solving.
Greetings from Senator Katharina Fegebank (BWFGB), UHH president Hauke Heekeren, DESY director Beate Heinemann, TUHH president Andreas Timm-Giel
Welcome from CDCS spokesperson Matthias Rarey
Session Chair: Matthias Rarey
Physical and Natural sciences in general have entered the domain of Big Data, and the quantity, complexity and rate at which data are produced, analyzed or need to be simulated are imposing considerable stress on conventional computing to extract relevant scientific information in a timely manner. Novel, disruptive techniques such as Artificial Intelligence and, at a different level, Quantum Computing open the window for entirely new paradigms in computing.
Artificial Intelligence (AI) makes it possible to analyze huge amounts of data, uncovering hidden correlations even without pre-existing knowledge. Among AI methods, the so-called generative models are a set of incredibly powerful techniques that learn an internal representation of a data set and use it to uncover the processes and correlations that characterize the system, opening up novel insights with high precision.
Generative modelling will change the way we do science: “a third approach between observation and simulation”. The number of generative modelling applications in science, industry and society at large is increasing at an incredible speed.
In recent years, Quantum Computing (QC) has also developed from the stage of laboratory experiments into an amazing and rapidly evolving research field. With the increasing availability of quantum computers in the range of 10-100 qubits, a large variety of practical applications is being explored, targeting both near-term noisy quantum computer hardware employing error-correcting methods and future fault-tolerant quantum computers.
This presentation will outline how techniques such as AI and QC are changing the way we formulate problems, validate solutions, and design the computing models for Natural Sciences.
The rapid evolution in AI as well as in QC calls for technology synergies with industry at the earliest stage. Examples of how CERN openlab is cooperating and jointly developing applications with partners in industry are outlined.
In recent years, the fields of data science and bio science have been growing closer together; many state-of-the-art approaches for DNA sequencing and metagenomics, for example, make use of machine learning. However, the two disciplines often just apply techniques or use cases from the other field without truly engaging in the details, particularities, and opportunities that the most recent developments in both fields offer. This discrepancy is what motivated the Darkmatter project at Aalborg University (https://darkmatter.aau.dk/), an interdisciplinary effort where researchers from data and bio science are working together to exploit the latest state of the art in their respective fields to jointly accelerate the rate of exploration of the vast space of unexplored and unknown microbes: the microbial dark matter. In this talk, I will discuss some of the challenges and opportunities in our joint effort with a particular focus on our recent advances on metagenomic binning.
Moderated by Marc Wenskat.
A scientist introduces a topic or situation in a scientific context. Then, they present one or more statements and the audience must guess whether these statements are True or False.
The amount, size, and complexity of astronomical data sets have been growing rapidly in recent decades. Now, with new technologies and dedicated survey telescopes, the databases are growing even faster. Besides dealing with poly-structured and complex data, sparse data has become a field of growing scientific interest. By applying technologies from the fields of computer science, mathematics, and statistics, astronomical data can be accessed and analyzed more efficiently.
A specific field of research in Astroinformatics is the estimation of the redshift of extra-galactic sources, a measure of their distance, using only sparse photometric observations. Observing the full spectroscopic information that would be necessary to measure the redshift directly would be too time-consuming. Building accurate statistical models is therefore a mandatory step, especially when it comes to reflecting the uncertainty of the estimates. Statistics, and weather forecasting in particular, have introduced and utilized proper scoring rules, notably the continuous ranked probability score, to characterize the calibration as well as the sharpness of predicted probability density functions.
This talk presents what we achieved when using proper scoring rules to train deep neural networks and to evaluate the model estimates. We present how this work led from well-calibrated redshift estimates to an improvement in the statistical post-processing of weather forecast simulations. The presented work is an example of interdisciplinarity in data science and of how methods can bridge different fields of application.
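The continuous ranked probability score mentioned above has a well-known closed form when the predicted density is Gaussian. The following is a minimal sketch under that assumption, not the method used in the talk; the function name `crps_gaussian` and the numbers in the usage lines are illustrative only.

```python
import math


def crps_gaussian(mu, sigma, y):
    """CRPS of a Gaussian forecast N(mu, sigma^2) against an observed value y.

    Lower is better. The score rewards forecasts that are both calibrated
    (centred on the observation) and sharp (small sigma).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    # Standard normal density and cumulative distribution evaluated at z.
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


# Hypothetical redshift forecasts for a source whose observed redshift is 0.50.
sharp = crps_gaussian(mu=0.50, sigma=0.05, y=0.50)   # centred and narrow
wide = crps_gaussian(mu=0.50, sigma=0.30, y=0.50)    # centred but broad
biased = crps_gaussian(mu=0.80, sigma=0.05, y=0.50)  # narrow but off-centre
print(sharp, wide, biased)
```

In this sketch the centred, narrow forecast receives the lowest score, while both the broad and the off-centre forecasts are penalized, which is the sense in which the score reflects sharpness and calibration together.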
In the last 15 years there has been a spectacular rise in the volume of data acquired in X-ray diffraction experiments. In 2006, around the time I started my PhD, the world’s first soft X-ray free-electron laser, FLASH in Hamburg, was collecting diffraction patterns at roughly 1 Hz. Nowadays we are collecting data at the European XFEL at peak rates in the megahertz range.
This has enabled the development of new techniques which exploit this richness and were not possible before. At the same time this has brought enormous challenges to a community that had relatively little experience handling such large data quantities.
In this talk I will present the evolution of coherent X-ray imaging, and in particular of ultrafast X-ray diffractive imaging experiments, and discuss what new techniques might be over the horizon and how best to make use of this wealth of data to extract as much new knowledge as possible.
Rapidly growing Omics data are providing an unprecedented opportunity to gain novel insights into biological systems and disease processes. Network modeling is a powerful approach that can be used to integrate complex information from multiple types of Omics data. In the field of network medicine, our group has developed a suite of methods that support: (1) effective integration of multi-omic data to reconstruct gene regulatory networks; (2) network analysis to identify changes in gene regulation between different biological systems or disease states; and (3) modeling of individual-specific networks in order to link regulatory alterations with heterogeneous phenotypes. In this talk, I will review several of these methods and describe specific applications in which we have used these approaches to understand the complex regulatory processes at work across different biological states, diseases, and individuals.
You are invited to discuss 10 current problems from the 5 interdisciplinary units. Move freely between the standing tables in the marquee and share your thoughts.
Monitoring is a key element in guaranteeing the state of health of a system, all the more so when the system is critical, autonomous, and/or operating remotely. Anomaly detection and diagnosis are its two main aspects. While model-based approaches have been around for a long time, they have been challenged in recent years by data-based approaches, which explore historical data to learn a model.
Most systems are subject to multiple variations because they operate in evolving environments and may suffer ageing or unexpected situations. Evolving environments and dynamicity challenge machine learning researchers with nonstationary data flows where the concepts being tracked can change over time. In this regard, dynamic clustering algorithms have been developed to be able to perform state tracking and online anomaly detection in such contexts.
In this talk, I will present the principles of a dynamic clustering approach to track evolving environments that uses a two-stage distance-based and density-based clustering algorithm. I will explain how these principles can be used to develop an online method of double anomaly detection adapted to the requirements of on-board operation. The objective is to design a software protection component for space electronics against radiation faults, a project that we are carrying out in partnership with the French Space Agency CNES.
What is controlled and how does it work?
Current and future impact of Data Science/Machine Learning/Deep Learning on natural sciences
Moderated by Klaus Ehret.
Six panelists with expertise in natural sciences and/or computer science will discuss the similarities and differences between their fields, ubiquitous challenges of the digitalisation era, and the convergence and divergence they see between their fields and computer science.
EDIT: see update below
Choose your tour at the registration desk on day 1:
Update:
- CSSB tour begins 15:30 inside the CSSB lobby
- FLASH tour begins 15:30 just outside the CSSB lobby
- PETRA III tour begins 15:30 just outside the CSSB lobby
- LUX and DESY Control room tour begins 15:30 just outside the CSSB lobby
- EuXFEL tour begins 15:30 in Schenefeld (common trip to EuXFEL leaves from the CSSB lobby at 14:40)
- Photon Science general tour cancelled due to no sign-ups
- Computing center (IDAF) & Testbeam tour cancelled due to no sign-ups