Next Generation Environment for Interoperable Data Analysis - Expert Workshop

Name: Next Generation Environment for Interoperable Data Analysis - Expert Workshop
Start: 2023-05-03T07:00:00+02:00
End: 2023-05-04T13:00:00+02:00
Location: HZB Berlin

3 May 2023, 07:00 → 4 May 2023, 13:00 Europe/Berlin

Seminarraum "Kino" (HZB Berlin)

Helmholtz-Zentrum Berlin Gebäude 13.10 Magnusstraße 2 12489 Berlin

Description

The volume, velocity and variety of data have increased dramatically over the last decade, which, at least in principle, enables unprecedented collaborative research. Already today, data volumes are too large to be stored and processed by individual scientists or institutional groups. Moreover, the data volume is expected to increase dramatically with the next generation of scientific facilities. This calls for a collaborative strategy and effort from all ErUM communities to manage access to these large amounts of data.

This workshop addresses scientists from *all* ErUM communities in order to reach a collective understanding of how data are acquired, stored, accessed and analyzed in the different communities. We will focus on existing solutions and real-world experience as well as on the requirements of the individual ErUM communities.

A dedicated session will analyse common requirements and design patterns and discuss strategic synergies for the Next Generation Environment for Interoperable Data Analysis.

The workshop takes place on May 3rd/4th in Berlin, Albert Einstein Str 15, 12489 Berlin (https://indico.desy.de/event/37379/)

Keynote speakers:

Verena Kain, CERN, "CERN’s control system approach for machine learning for accelerators"
Mohammad Al-Turany, GSI/FAIR, "ALFA: A framework for building distributed applications"
Kai Polsterer, "Unsupervised ML to explore data: lessons learned from learning machines"
Niclas Eich, RWTH Aachen, "VISPA: Data Analysis in the Web Browser powered by JupyterLab"

World cafes:

Community needs
Data sources
Real-world experience
Design patterns

Scientific organising committee

Judith Reindl
Harry Enke
Kay Graf
Tim Ruhe
Pierre Schnizer

Contact

info@erumdatahub.de

Registration

Registration for Next Generation Environment for Interoperable Data Analysis - Expert Workshop

Wednesday 3 May
- 08:00 → 08:30
  
  Registration & Coffee 30m
- 08:30 → 09:00
  
  DIGUM User interface: introduction
  
  Workshop_Berlin_2023_Tim_Ruhe.pdf
- 09:00 → 11:00
  Keynote speeches: VISPA (Niclas Eich) / CERN ML (Verena Kain)
  - 09:00
    
    VISPA: Data Analysis in the Web Browser powered by JupyterLab 45m
    
    Speaker: Niclas Eich (RWTH Aachen University)
    
    230503_vispa_data_analysis_in_the_browser.pdf
  - 10:00
    
    CERN’s control system approach for machine learning for accelerators 45m
    
    Speaker: Dr Verena Kain (CERN)
    
    CERN_controls4ML_Berlin_VKMay23.pdf
- 11:00 → 11:30
  
  Coffee Break 30m
- 11:30 → 12:30
  Keynote speeches: ALFA (Mohammad Al Turany)
  - 11:30
    
    ALFA: A framework for building distributed applications 45m
    
    The ALFA framework is a joint development between the ALICE Online-Offline and FairRoot teams. ALFA has a distributed architecture, i.e. it is a collection of highly maintainable, testable, loosely coupled, independently deployable processes.
    
    ALFA allows the developer to focus on building single-function modules with well-defined interfaces and operations. The communication between the independent processes is handled by the FairMQ transport layer. FairMQ offers multiple implementations of its abstract data transport interface: it integrates popular data transport technologies such as ZeroMQ, and it also provides shared memory and RDMA transport (based on libfabric) for high-throughput, low-latency applications. Moreover, FairMQ allows a single process to use multiple different transports at the same time.
    
    FairMQ-based processes can be controlled and orchestrated via different systems by implementing the corresponding plugin. In addition, ALFA delivers the Dynamic Deployment System (DDS) as an independent set of utilities and interfaces that dynamically distributes user processes on any Resource Management System (RMS) or on a laptop.
    
    ALFA is used by different experiments at different stages of data processing, as it offers easy integration of heterogeneous hardware and software. Examples of ALFA usage at different stages of event processing will be presented: in detector read-out, in online reconstruction, and in the purely offline world of detector simulations based on FairRoot.
    
    Speaker: Mohammad Al-Turany (GSI Helmholtzzentrum für Schwerionenforschung)
    
    Alfa_digum.pdf
- 12:30 → 13:30
  
  Lunch break 1h
- 13:30 → 15:00
  
  Facility tour
- 15:00 → 15:30
  
  Coffee Break 30m
- 15:30 → 16:30
  Keynote speeches: Unsupervised ML to explore data (Kai Polsterer)
  - 15:30
    
    Unsupervised ML to explore data: lessons learned from learning machines 45m
    
    The number and size of astronomical data sets have grown rapidly over the last decades. Now, with new technologies and dedicated survey telescopes, the databases are growing even faster. VO standards provide uniform access to these data. What is still required are new ways to analyze these large data resources and tools to deal with them. For example, common diagnostic diagrams have proven to be good tools for answering questions in the past, but they fail for millions of objects in high-dimensional feature spaces. Besides dealing with poly-structured and complex data, the time domain has become a new field of scientific interest.
    
    By applying technologies from the field of computer science, astronomical data can be accessed more efficiently. Machine learning is a key tool for making use of the data sets that are now freely available. This talk provides an overview of what can be achieved with unsupervised learning techniques, illustrated with examples that show what we learned when applying machine learning algorithms to real astronomical data sets.
    
    Speaker: Kai Polsterer
    
    CERN_controls4ML_Berlin_VKMay23.pdf
- 16:30 → 18:30
  
  World Cafe
Thursday 4 May
- 08:30 → 09:00
  
  Arrival & Coffee 30m
- 09:00 → 09:30
  
  DIGUM User interface: introduction
  
  Workshop_Berlin_2023_Tim_Ruhe.pdf
- 09:30 → 10:30
  
  World Cafe: wrap-up
- 10:30 → 11:00
  
  Coffee Break 30m
- 11:00 → 12:00
  
  Close out
