Next Generation Environment for Interoperable Data Analysis - Expert Workshop

Seminarraum "Kino" (HZB Berlin)

Helmholtz-Zentrum Berlin, Gebäude 13.10, Magnusstraße 2, 12489 Berlin


Volume, velocity and variety of data have dramatically increased over the last decade, which -- at least in principle -- enables unprecedented and collaborative research. Already today, data volumes are too large to be stored and processed by individual scientists or institutional groups. Moreover, data volumes are expected to grow further with the next generation of scientific facilities. This calls for a collaborative strategy and effort from all ErUM communities in order to manage access to these large amounts of data.

This workshop addresses scientists from *all* ErUM communities in order to develop a collective understanding of how data are acquired, stored, accessed and analyzed in the different communities. We will focus on existing solutions and real-world experience, as well as on the requirements of the individual ErUM communities.

A dedicated session will analyse common requirements and design patterns, and discuss strategic synergies for the Next Generation Environment for Interoperable Data Analysis.

The workshop takes place on May 3rd/4th in Berlin, Albert-Einstein-Str. 15, 12489 Berlin.


Keynote speakers:

  • Verena Kain, CERN: "CERN's control system approach for machine learning for accelerators"
  • Mohammad Al-Turany, GSI/FAIR: "ALFA: A framework for building distributed applications"
  • Kai Polsterer: "Unsupervised ML to explore data: lessons learned from learning machines"
  • Niclas Eich, RWTH Aachen: "VISPA: Data Analysis in the Web Browser powered by JupyterLab"


World cafes: 

  • Community needs
  • Data sources
  • Real-world experience
  • Design patterns


Scientific organising committee

  • Judith Reindl
  • Harry Enke
  • Kay Graf
  • Tim Ruhe
  • Pierre Schnizer




Registration for Next Generation Environment for Interoperable Data Analysis - Expert Workshop
  • Wednesday, May 3
    • 8:00 AM
      Registration & Coffee
    • DIGUM User interface: introduction
    • Keynote speeches: VISPA (Niclas Eich) / CERN ML (Verena Kain)
    • 11:00 AM
      Coffee Break
    • Keynote speeches: ALFA (Mohammad Al-Turany)
      • ALFA: A framework for building distributed applications

        The ALFA framework is a joint development between ALICE Online-Offline and FairRoot teams. ALFA has a distributed architecture, i.e. a collection of highly maintainable, testable, loosely coupled, independently deployable processes.

        ALFA allows the developer to focus on building single-function modules with well-defined interfaces and operations. The communication between the independent processes is handled by the FairMQ transport layer. FairMQ offers multiple implementations of its abstract data transport interface: it integrates popular data transport technologies such as ZeroMQ, but also provides shared-memory and RDMA transports (based on libfabric) for high-throughput, low-latency applications. Moreover, FairMQ allows a single process to use multiple, different transports at the same time.

        FairMQ-based processes can be controlled and orchestrated via different systems by implementing the corresponding plugin. In addition, ALFA delivers the Dynamic Deployment System (DDS) as an independent set of utilities and interfaces, providing dynamic distribution of user processes on any Resource Management System (RMS) or on a laptop.

        ALFA is used by different experiments at different stages of data processing, as it offers easy integration of heterogeneous hardware and software. Examples of ALFA usage at different stages of event processing will be presented: in detector read-out, in online reconstruction, and in a purely offline world of detector simulations based on FairRoot.

        Speaker: Mohammad Al-Turany (GSI Helmholtzzentrum für Schwerionenforschung)
    • 12:30 PM
      Lunch break
    • Facility tour
    • 3:00 PM
      Coffee Break
    • Keynote speeches: Unsupervised ML to explore data (Kai Polsterer)
      • Unsupervised ML to explore data: lessons learned from learning machines

        The number and size of astronomical data sets have grown rapidly over the last decades. Now, with new technologies and dedicated survey telescopes, the databases are growing even faster. VO standards provide uniform access to these data. What is still required are new ways to analyze, and tools to deal with, these large data resources. For example, common diagnostic diagrams have proven to be good tools for solving questions in the past, but they fail for millions of objects in high-dimensional feature spaces. Besides dealing with poly-structured and complex data, the time domain has become a new field of scientific interest.

        By applying technologies from the field of computer science, astronomical data can be accessed more efficiently. Machine learning is a key tool for making use of the freely available data sets of today. This talk provides an overview of what can be achieved with unsupervised learning techniques, discussed on examples that show what we learned when applying machine learning algorithms to real astronomical data sets.

        Speaker: Kai Polsterer
    • World Cafe
  • Thursday, May 4
    • 8:30 AM
      Arrival & Coffee
    • DIGUM User interface: DIGUM Introduction
    • World Cafe: wrap-up
    • 10:30 AM
      Coffee Break
    • Close out