This year's LSDMA Technical Forum is a platform for new and ongoing projects to present their goals, technical approaches and currently open challenges. It is intended to create an environment in which technical staff can exchange expertise on state-of-the-art solutions, discuss common challenges and possibly identify future joint projects or proposals.
Topics centre on the fields of storage, big data, identity management and performance.
Nanoscience Foundries & Fine Analysis (NFFA)20m
Nanoscience Foundries & Fine Analysis (NFFA) is a research project funded by the EU's Horizon 2020 framework programme for research and innovation.
The vision of the NFFA project is to provide free and transnational access to the widest range of tools for research at the nanoscale. The NFFA infrastructure
is distributed across Europe, providing synchrotron, FEL and neutron radiation sources together with facilities for growth, nano-lithography, nano-characterization, theory
and simulation, and fine analysis. To make the research data stored in the data archives attached to the different facilities manageable, retrievable
and sharable, a distributed Information and Data Repository Platform (IDRP) is being built by WP8 of the NFFA project. This talk gives a short overview
of the overall architecture of the IDRP, the adopted technologies and novel approaches. Finally, challenges encountered when designing the distributed repository
are presented and discussed.
Automated Provenance Management for Enabling Scientific Data Reproducibility20m
Provenance traces history within workflows and enables researchers to validate and compare their results.
Modelling workflows in the ProvONE standard provenance model is an arduous task that lacks an automated approach.
To overcome this limitation, in this talk we present a novel graph-drawing algorithm for generating ProvONE prospective provenance graphs.
These graphs are then updated with the relevant retrospective provenance during workflow execution.
Finally, we also show the provenance management architecture for a scientific data repository and present various queries for retrieving provenance information.
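To illustrate the distinction between prospective provenance (the workflow plan) and retrospective provenance (what actually happened in one run), here is a minimal Python sketch. It is hypothetical: the class and method names are invented for illustration and do not reflect the ProvONE vocabulary or the system presented in the talk.

```python
from dataclasses import dataclass, field

# Prospective provenance: the planned workflow structure
# (programs and the data ports they consume/produce).
@dataclass
class Program:
    name: str
    consumes: list = field(default_factory=list)   # input port names
    produces: list = field(default_factory=list)   # output port names

# Retrospective provenance: one recorded execution of a program,
# with the concrete data items it used and generated.
@dataclass
class Execution:
    program: str
    started: str
    ended: str
    used: dict = field(default_factory=dict)       # port -> data item
    generated: dict = field(default_factory=dict)  # port -> data item

class ProvenanceGraph:
    def __init__(self):
        self.programs = {}        # prospective part
        self.executions = []      # retrospective part

    def add_program(self, prog: Program):
        self.programs[prog.name] = prog

    def record_execution(self, exe: Execution):
        # Retrospective records are attached to the prospective plan.
        if exe.program not in self.programs:
            raise ValueError(f"unknown program: {exe.program}")
        self.executions.append(exe)

    def lineage(self, item):
        """A simple provenance query: which programs generated `item`?"""
        return [e.program for e in self.executions
                if item in e.generated.values()]

# Usage: a two-step workflow, executed once, then queried.
g = ProvenanceGraph()
g.add_program(Program("preprocess", ["raw"], ["clean"]))
g.add_program(Program("analyse", ["clean"], ["result"]))
g.record_execution(Execution("preprocess", "t0", "t1",
                             {"raw": "raw.dat"}, {"clean": "clean.dat"}))
g.record_execution(Execution("analyse", "t1", "t2",
                             {"clean": "clean.dat"}, {"result": "out.csv"}))
```

The point of the sketch is the separation: `programs` can be populated before any run (prospective), while `executions` grows during runs (retrospective), and queries such as `lineage` combine both.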
Services for long time data storage - project bwDataArchiv20m
Requirements from data archives and repositories form the basis of the
reliable long-term storage infrastructure built in the bwDataArchiv project.
Large volumes of data from scientific experiments, including HPC simulations,
that must be retained but need not occupy precious online analysis storage can
be stored easily by using common storage protocols, reliably by using
end-to-end checksums, and economically by using magnetic tape. Through
available technologies in combination with developments from, e.g., other
LSDMA/DSIT work packages, the infrastructure is effectively used by national
and international communities.
Jos van Wezel
(Karlsruhe Institute of Technology)
Towards Information Infrastructures in DFG Collaborative Research Centres30m
In this slot we present two new SFBs (Collaborative Research Centres):
Collaborative Research Centre "Volition and Cognitive Control": Data
Management, Workflow Optimization and Science Gateway
The overarching aim of the Collaborative Research Centre (CRC) is to elucidate
cognitive and neural mechanisms underlying adaptive volitional control as well
as impaired control in selected mental disorders. Researchers of the CRC
collect a wide variety of data, such as MRI images, EEG, genetic, or
behavioral data, from participants in various research projects. Due to the
increasing number of participants, multimodal assessment, improved imaging
technologies and new research projects, the amount of CRC data stored in
diverse files is steadily increasing. The INF project will design and build a
system that manages the data including metadata, enables its analysis using
HPC resources, enables data sharing, and integrates with the existing science
gateway.
Collaborative Research Center “Episteme in Motion”: Data and Analysis Infrastructure
Danah Tonne, Rainer Stotzka
The Collaborative Research Center “Episteme in Motion” is dedicated to the examination of processes of knowledge change in European and non-European pre-modern cultures.
The INF project of the CRC develops methods and practices for the digital exploitation and visualization of epistemic changes within long-term processes of transmission of premodern corpora. It uses travelling manuscripts, codices,
prints, albums and library inventories as examples of systemic transfer processes. The aim is to build a repository for the digital data objects and their metadata that is useful for the specific purposes of all projects of the CRC. By
cooperating closely with the project “Manuscripts in Motion: Tools for Documenting, Analysing and Visualising the Dynamics of Textual Topographies”, we test forms of cooperation between the humanities and applied computer science. Since
a cooperation among three institutions is to be established, the INF project will serve as a pilot for DARIAH-DE for the implementation of complex institutional cooperation.
The new ~3 million euro DFG project GeRDI (Generic Research Data
Infrastructure) aims at building and connecting research data management
systems. The project involves significant efforts in the areas of requirements
analysis, implementation, pilot operation, and sustainability. Scientists
throughout Germany will be enabled to store, search for, and re-use
cross-disciplinary research data.
BigStorage is a European Training Network (ETN) whose main goal is to train future data scientists, enabling them to apply holistic and interdisciplinary approaches to take advantage of a data-overwhelmed world. Such expertise
is essential for researchers to propose appropriate answers to application requirements while leveraging advanced data storage solutions that unify cloud and HPC storage facilities. Four representative big data application use cases
are studied to lay the foundation for the project: the Human Brain Project (HBP), the Square Kilometre Array (SKA), climate science and smart cities. More information is available at http://bigstorage-project.eu/.
dCache: new and exciting features20m
The dCache project develops and supports software for storing
large volumes of scientific data in a POSIX namespace, optionally
storing some of the data on tape, with scalable performance and
support for many protocols. This talk will focus on recent
improvements that are already available or are anticipated for
the next major release.
We are introducing a new token-based authorisation scheme that
will allow easy sharing of data and external user management.
The interface for managing the quality of service users expect
for their files is being improved, along with a new web interface
for managing data. This provides users with an enriched view of
their data. Under the hood, core services are being updated so
they can be scaled horizontally and are no longer a single point
of failure. We are also adding support within dCache for using
clustered storage, initially targeting Ceph, a popular object
store.
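The abstract does not specify the token format, and dCache's actual scheme differs in detail; as a generic illustration of why capability-style tokens make sharing easy, here is a minimal HMAC-signed token sketch. Everything in it (the secret, the claim names, the functions) is hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical server-side secret; held only by the storage service.
SECRET = b"server-side secret"

def issue_token(path, activity, lifetime_s):
    """Issue a self-describing token granting `activity` (e.g. 'read')
    on `path` until it expires. Anyone holding the token can use it;
    the owner shares data simply by handing the token over, with no
    account creation on the server side."""
    claims = {"path": path, "activity": activity,
              "exp": int(time.time()) + lifetime_s}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig

def check_token(token, path, activity, now=None):
    """Verify signature, expiry, and that the request matches the grant."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False                                  # tampered token
    claims = json.loads(base64.urlsafe_b64decode(body))
    if (now or time.time()) > claims["exp"]:
        return False                                  # expired
    return claims["path"] == path and claims["activity"] == activity
```

The external-user-management aspect follows from the same design: the server only needs its signing secret to validate a request, so the identity of whoever presents the token never has to be registered with the storage system.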