Next Generation Environment for Interoperable Data Analysis - 2nd Expert Workshop

Europe/Berlin
TU Dortmund

Restaurant Calla, Vogelpothsweg 85, 44227 Dortmund
Description

The volume, velocity and variety of data have increased dramatically over the last decade, which makes these data precious and, at least in principle, enables unprecedented collaborative research. Already today, data volumes are too large to be stored and processed by individual scientists or institutional groups. Moreover, data volumes are expected to increase dramatically with the next generation of scientific facilities. This calls for a collaborative strategy and effort from all ErUM communities to manage the analysis of, and access to, these large amounts of data.

Building on the insights and experiences from our previous workshop, this event addresses scientists from all ErUM communities. Our goal is to formulate a common strategy on how to manage and maintain access to the data and workflows provided by the different stakeholders, ranging from large international collaborations to individual scientists. The common strategy will also address requirements for infrastructure and research data management, with a focus on the user-facing components.

Main focus:

  • workflow engines
  • large language models

Keynote Speakers 

  • Arman Khalatyan (AIP)
  • Harry Enke (AIP)
  • Jan Lucas Uslu (RWTH Aachen)
  • Thomas Kuhr (LMU)
  • Tibor Simko (CERN)

 

Please bring a laptop to participate in the hands-on tutorial. Details are given in the tutorial description in the timetable.

 

This workshop builds on the previous expert workshop Next Generation Environment for Interoperable Data Analysis (https://indico.desy.de/event/37379/). 

This workshop is organized by the DIG-UM Topic Group User Interface with support from the ErUM-Data-Hub.

This workshop is addressed to scientists from all ErUM communities.

A fee of 50€ will be charged for participation in the workshop.

 

  • Tuesday 17 September
    • 08:00 09:00
      Registration and Coffee
    • 09:00 09:30
      Welcome
      Conveners: Pierre Schnizer (HZB), Tim Ruhe (TU Dortmund)
    • 09:30 11:00
      Invited Talks I
      • 09:30
        Preserve-to-reuse: Building REANA reproducible data analysis platform 45m

        In this talk we present the story of building REANA, the free and open
        source platform for reusable and reproducible computational data
        analyses. REANA allows researchers to structure information about
        input data, analysis code, containerised environments and
        computational workflows so that the analyses can be instantiated and
        executed on remote compute clouds. REANA supports several declarative
        workflow systems (CWL, Snakemake, Yadage), container technologies
        (Apptainer, Docker) and compute backends (HTCondor, Kubernetes,
        Slurm). We describe the REANA platform, the typical use cases from
        experimental particle physics, exchange observations on sociological
        and technological challenges inherent in facilitating reusable
        computational research, as well as discuss the future plans for the
        evolution of the platform.

        Speaker: Dr Tibor Simko (CERN)
      • 10:15
        REANA Deployment on PUNCH Infrastructure and User Experiences 45m
        Speaker: Harry Enke (AIP)
    • 11:00 11:30
      Coffee
    • 11:30 12:15
      Invited Talks II
      • 11:30
        Analysis Facilities 45m

        Large experiments usually have well-established computing
        solutions for the centrally managed processing of data, but this is
        in general not the case for the analysis of data by individuals or
        small groups. To handle the expected increase of data at the
        analysis level, a collaborative project plans to work on technologies
        that improve the turn-around time and FAIRness of (interactive) data
        analyses.

        Speaker: Thomas Kuhr (LMU, Belle II Experiment)
    • 13:00 14:00
      Lunch
    • 14:00 16:30
      Hands-On: REANA Tutorial I
      • 14:00
        REANA Tutorial I 2h 30m

        Starting from the tutorial that the PUNCH4NFDI project has created (https://gitlab-p4n.aip.de/punch_public/reana/tutorials), we will give an introduction to working with REANA. We can then work on examples you bring with you.

        Caveat: although a considerable range of different use cases has already been worked out,
        please pick an example that is of medium difficulty in your field
        and does not require terabytes of data to be processed or special resources (FPGA boards etc.).

        For preparation: please follow the instructions in the tutorial's README and install the REANA client on your laptop.

        Speakers: Dr Elena Sacchi (AIP), Harry Enke (AIP)
    • 18:00 19:00
      Bergmann Kiosk
    • 19:00 22:00
      Workshop Dinner
  • Wednesday 18 September
    • 08:30 09:00
      Coffee 30m
    • 09:00 10:30
      Invited Talks
      • 09:00
        Using Large Language Models in Scientific Research: Grounding Models in Reality using Data 45m

        Large Language Models (LLMs) have shown remarkable performance in a variety of natural language processing tasks. However, their application in scientific research is still mostly relegated to the domain of grammar correction and summarization. In this talk, I will discuss the potential of LLMs in scientific research, focusing on the challenges and opportunities of using LLMs to find, analyze and validate scientific papers and data while preventing hallucinations. For this I have created interactive elements to help the audience understand the potential of LLMs in scientific research and give them a hands-on experience of using LLMs in their research.

        Speaker: Dr Jan Lucas Uslu (RWTH Aachen)
      • 09:45
        Advancing Astronomical Research with Next-Gen LLMs and Optimized Code Workflows 45m

        In this presentation, I will highlight the work being conducted at AIP, showcasing how the adoption and utilization of open-source LLMs such as Starcoder, Llama 3.1, and DeepSeek Coder on the Ollama server infrastructure have revolutionized the code development workflows for astronomers. These sophisticated models, integrated with popular IDEs like Visual Studio Code and Neovim, offer seamless real-time code suggestions and auto-completion features tailored specifically for astronomical research. Additionally, I will discuss the extension of these capabilities to the Colab.aip.de platform, where LLMs act as dynamic coding assistants, significantly expediting the coding and debugging process. This presentation will detail the technical advancements at AIP, demonstrating how these innovations enhance productivity and foster more efficient computational research, ultimately propelling the future of astronomical studies in the LLM era.

        We will also be discussing a novel approach to data analysis by integrating Large Language Models (LLMs) with intelligent agents. This integration has the potential to accelerate the coding process by significantly reducing the time required for developing and debugging data analysis pipelines.

        Speaker: Dr Arman Khalatyan (AIP)
    • 10:30 11:00
      Coffee 30m
    • 11:00 11:45
      Invited Talks II: ML Applications for LSST
    • 11:45 12:45
      Discussion
      Convener: Prof. Lucie Flek
    • 12:45 13:30
      Lunch 45m
    • 13:30 15:30
      Hands-On: REANA Tutorial II
    • 15:30 16:00
      Farewell