LSDMA Technical Forum

FTU Aula (KIT Campus Nord)

This year's LSDMA Technical Forum is a platform for new and running projects to present their technical goals and currently open challenges. It is intended to create an environment where technical staff can exchange expertise on state-of-the-art solutions, discuss common challenges, and possibly identify future joint projects or proposals. Topics centre on storage, big data, identity management and performance.
    • 08:50 08:59
      Introduction 9m
      Speaker: Dr Marcus Hardt (KIT)
    • 09:00 09:20
      Nanoscience Foundries & Fine Analysis (NFFA) 20m
      Nanoscience Foundries & Fine Analysis (NFFA) is a research project funded by the EU's H2020 framework programme for research and innovation. The vision of the NFFA project is to provide free and transnational access to the widest range of tools for research at the nanoscale. The NFFA infrastructure is distributed all over Europe, providing synchrotron, FEL and neutron radiation sources for growth, nano-lithography, nano-characterization, theory and simulation, and fine analysis. To make research data stored in the data archives attached to the different facilities manageable, retrievable and shareable, a distributed Information and Data Repository Platform (IDRP) is being built by WP8 of the NFFA project. This talk gives a short overview of the overall architecture of the IDRP, the adopted technologies and novel approaches. Finally, challenges encountered when designing the distributed repository are presented and elaborated.
      Speaker: Mr Thomas Jejkal (KIT)
    • 09:20 09:40
      Automated Provenance Management for Enabling Scientific Data Reproducibility 20m
      Provenance traces history within workflows and enables researchers to validate and compare their results. Modelling workflows in the ProvONE standard provenance model is an arduous task and lacks an automated approach. To overcome this limitation, in this talk we present a novel graph drawing algorithm for generating ProvONE prospective provenance graphs. These graphs are then updated with the relevant retrospective provenance during the execution of the workflow. Finally, we show the provenance management architecture for a scientific data repository and present various queries for retrieving the provenance information.
      Speaker: Mr Ajinkya Prabhune (KIT)
    • 09:40 10:00
      Services for long time data storage - project bwDataArchiv 20m
      Requirements from data archives and repositories form the basis of the reliable long-term storage infrastructure built in the bwDataArchiv project. Large data sets from scientific experiments, including HPC simulations, that must be kept but do not need to occupy precious online analysis storage can be stored easily by using common storage protocols, reliably by using end-to-end checksums, and economically by using magnetic tape. By combining available technologies with developments from, e.g., other LSDMA/DSIT work packages, the infrastructure is used effectively by national and international projects.
      Speaker: Jos van Wezel (Karlsruhe Institute of Technology)
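      The end-to-end checksum idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the bwDataArchiv implementation: the function names `file_checksum` and `store_with_verification`, the choice of SHA-256, and the use of a local file copy in place of a real storage protocol are all assumptions made for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_with_verification(source, destination):
    """Copy a file and verify it against a checksum taken at the source.

    Hypothetical helper: a local copy stands in for the transfer to the
    archive. The checksum is computed before the transfer and again on
    the stored copy, so corruption anywhere in between is detected.
    """
    expected = file_checksum(source)
    shutil.copyfile(source, destination)
    actual = file_checksum(destination)
    if actual != expected:
        raise IOError(f"checksum mismatch: {expected} != {actual}")
    return expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        source = Path(workdir) / "experiment.dat"
        archived = Path(workdir) / "archived.dat"
        source.write_bytes(b"simulation output" * 1000)
        print("stored with checksum", store_with_verification(source, archived))
```

      In a real archive the checksum recorded at ingest would typically be kept with the file's metadata, so the same comparison can be repeated whenever the data is read back.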
    • 10:00 10:30
      Towards Information Infrastructures in DFG Collaborative Research Centres 30m
      In this slot we present two new SFBs (Collaborative Research Centres):

      Collaborative Research Centre "Volition and Cognitive Control": Data Management, Workflow Optimization and Science Gateway
      Richard Grunzke
      The overarching aim of the Collaborative Research Centre (CRC) is to elucidate cognitive and neural mechanisms underlying adaptive volitional control as well as impaired control in selected mental disorders. Researchers of the CRC collect a wide variety of data, such as MRI images, EEG, genetic, or behavioral data, from participants in various research projects. Due to the increasing number of participants, multimodal assessment, improved imaging technologies, and new research projects, the amount of CRC data stored in diverse files is steadily increasing. The INF project will design and build a system that manages the data including metadata, enables its analysis using HPC resources, enables data sharing, and integrates with the existing science gateway.

      Collaborative Research Center "Episteme in Motion": Data and Analysis Infrastructure
      Danah Tonne, Rainer Stotzka
      The Collaborative Research Center "Episteme in Motion" is dedicated to the examination of processes of knowledge change in European and non-European pre-modern cultures. The INF project of the CRC develops methods and practices for the digital exploitation and visualization of epistemic changes within long-term processes of transmission of premodern corpuses. It uses travelling manuscripts, codices, prints, albums and library inventories as examples of systemic transfer processes. The aim is to build a repository for the digital data objects and their metadata that serves the specific purposes of all the projects of the CRC. By cooperating closely with the project "Manuscripts in Motion: Tools for Documenting, Analysing and Visualising the Dynamics of Textual Topographies" we test forms of cooperation between the humanities and applied computer science. Given that a cooperation between three institutions is going to be established, the INF project will serve as a pilot for DARIAH-DE for the implementation of complex institutional cooperation.
      Speakers: Dr Rainer Stotzka (KIT), Mr Richard Grunzke (TU Dresden)
    • 10:30 11:00
      Coffee 30m
    • 11:00 11:20
      GeRDI - Generic Research Data Infrastructures 20m
      The new ~3 million Euro DFG project GeRDI (Generic Research Data Infrastructure) aims at building and connecting research data management systems. The project involves significant efforts in the areas of requirements analysis, implementation, pilot operation, and sustainability. Scientists across Germany will be able to store, search for, and re-use cross-disciplinary research data.
      Speaker: Mr Richard Grunzke (TU Dresden)
    • 11:20 11:40
      BigStorage 20m
      BigStorage is a European Training Network (ETN) whose main goal is to train future data scientists in order to enable them to apply holistic and interdisciplinary approaches for taking advantage of a data-overwhelmed world. Such expertise is mandatory to enable researchers to propose appropriate answers to application requirements while leveraging advanced data storage solutions unifying cloud and HPC storage facilities. Four representative big data application use cases are studied to set up the foundation for the project: the Human Brain Project (HBP), the Square Kilometre Array (SKA), climate science and smart cities. More information is available at
      Speaker: Dr Michael Kuhn (Universität Hamburg)
    • 11:40 12:00
      Thrill 20m
      Speaker: Mr Timo Bingmann (KIT)
    • 12:00 12:20
      dCache: new and exciting features 20m
      The dCache project develops and supports software for storing large volumes of scientific data in a POSIX namespace, optionally storing some of the data on tape, with scalable performance and support for many protocols. This talk will focus on recent improvements that are already available or are anticipated for the next major release. We are introducing a new token-based authorisation scheme that will allow easy sharing of data and external user management. The interface for managing the quality of service users expect for their files is being improved, along with a new web interface for managing data, which gives users an enriched view of their data. Under the hood, core services are being updated so they can be scaled horizontally and are no longer a single point of failure. We are also adding support within dCache for using clustered storage, initially targeting Ceph, a popular object store.
      Speaker: Dr Paul Millar (DESY)