LSDMA Community Forum 2015

Europe/Berlin
Gebäude H (HTW Berlin)

Forschungszentrum Kultur und Informatik (FKI) der HTW Berlin, Wilhelminenhofstr. 75a, 12459 Berlin
Achim Streit (KIT), Christopher Jung (KIT), Patrick Fuhrmann (DESY)
Description
The Community Forum of the Helmholtz Cross-Sectional Activity 'Large-Scale Data Management and Analysis' (LSDMA) focuses on current Big Data issues and solutions in several scientific communities.
On March 25th, developer teams will demonstrate a range of data management software technologies. The next day, speakers from scientific communities will present their data issues. On both days, there will be ample opportunity for discussion.
The event will take place at the HTW Berlin. You can register until 18 March 2015; the minimum registration fee is 27 EUR and covers coffee breaks. A joint dinner on March 25th and/or lunch on March 26th can be booked in addition.
Group picture
Group picture (mosaic)
  • Wednesday, 25 March
    • 13:30 13:45
      Introduction to Community Forum Demonstrations 15m Gebäude H

      Speaker: Dr Christopher Jung (KIT)
      Slides
    • 13:45 14:15
      Publication Repository - A basic KIT Data Manager Demonstration 30m Gebäude H

      KIT Data Manager is a software architecture for building repositories for research data. The generic service stack provides enough flexibility to support various scientific communities and use cases in a modular and extensible way. Apart from vertical, community-specific repositories, KIT Data Manager can also be used to set up horizontal repositories for generic use cases. This demonstration will present how to set up a repository for managing publications and accompanying data using the KIT Data Manager service stack. It will show the underlying architecture, the necessary extensions to the KIT Data Manager basic services, how the repository can be used, and how it can be obtained and installed.
      Speaker: Mr Thomas Jejkal (KIT)
    • 14:15 14:35
      The UFTP data transfer suite 20m Gebäude H

      UFTP is a high-performance data transfer library that combines ideas from passive FTP with a flexible security layer and user mapping, providing a powerful solution to many data transfer issues. UFTP can be integrated very flexibly into existing security infrastructures and applications. This demonstrator will showcase the UFTP tools in several scenarios such as data upload/download, file synchronization and data sharing.
      Speaker: Dr Bernd Schuller (Jülich Supercomputing Centre)
    • 14:35 15:15
      New developments in UNICORE 40m Gebäude H

      UNICORE is a middleware suite that is well established as one of the major solutions for building federations and e-infrastructures, with a focus on high-performance computing. UNICORE offers secure and seamless access to heterogeneous compute and data resources such as HPC machines, compute clusters, OpenStack images and various storage systems. UNICORE offers both SOAP web services and new RESTful APIs that are combined with a flexible and open security stack. After a brief general overview, this demonstration focuses on recent developments, such as:
        * integration with Unity for flexible user authentication that avoids X.509 user certificates
        * the new REST API for job submission and data access
        * integration of S3 storage
        * the UNICORE web portal
        * data-driven use cases and metadata handling
      Speaker: Dr Bernd Schuller (Jülich Supercomputing Centre)
    • 15:15 15:45
      Coffee Break 30m Gebäude H

    • 15:45 16:15
      Performance and Power Tracing Framework 30m Gebäude H

      Our performance/power tracing framework allows analyzing the performance and energy consumption of parallel scientific applications using a combination of several tools: Vampir, VampirTrace and pmlib. The framework has a flexible and extensible design that enables easy integration of different types of power measurement devices and of modules that record resource utilization values, such as disk and network throughput. Because it covers this wide range of performance and power statistics, it is useful for both application developers and system administrators who want to find and eliminate bottlenecks.
      Speaker: Konstantinos Chasapis (Uni Hamburg)
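As a rough illustration of how power samples relate to the energy consumption mentioned in the abstract, the following minimal sketch integrates sampled power over time with the trapezoidal rule. It does not use the API of pmlib, Vampir or VampirTrace; the function name `energy_joules` and the sample values are assumptions made up for this example.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate sampled power (watts) over time (seconds) with the trapezoidal rule.

    Hypothetical helper for illustration only; not part of any tracing tool.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Area of the trapezoid between two neighbouring samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Made-up samples: one power reading per second over three seconds.
times = [0.0, 1.0, 2.0, 3.0]
watts = [100.0, 120.0, 110.0, 90.0]

energy = energy_joules(times, watts)              # energy in joules
average_power = energy / (times[-1] - times[0])   # mean power in watts
```

Dividing the energy by the elapsed time gives the average power draw over the traced interval, which is one way to compare runs of different length.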
    • 16:15 16:45
      Birdhouse ... Web Processing Services for the climate science community 30m Gebäude H

      Birdhouse is a collection of Python components related to the Web Processing Service (WPS) that support big data processing in the climate science community. WPS is an interface standard by the Open Geospatial Consortium (OGC). The aim of Birdhouse is to make WPS easy to use. It comes with supporting processes to access climate data sources (such as the Earth System Grid Federation) and to chain WPS processes with a workflow engine. There are also processes for the climate impact community and for quality assurance of climate data. Birdhouse uses the Anaconda Python distribution to install WPS packages and the dependencies of WPS processes.
      Speaker: Carsten Ehbrecht (DKRZ)
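To make the WPS interface mentioned in the abstract more concrete, here is a minimal sketch that builds the three request types of OGC WPS 1.0.0 (GetCapabilities, DescribeProcess, Execute) as key-value-pair GET URLs. It uses only the Python standard library and is not Birdhouse code; the endpoint URL, the process identifier `hello` and its input `name` are hypothetical placeholders.

```python
from urllib.parse import urlencode


def wps_url(endpoint, request, **params):
    """Build a WPS key-value-pair (KVP) GET URL for the given request type."""
    query = {"service": "WPS", "request": request}
    if request != "GetCapabilities":
        # DescribeProcess and Execute require an explicit version parameter.
        query["version"] = "1.0.0"
    query.update(params)
    return endpoint + "?" + urlencode(query)


def encode_inputs(inputs):
    """Encode process inputs as a DataInputs value: key=value pairs joined by ';'."""
    return ";".join(f"{key}={value}" for key, value in inputs.items())


# Hypothetical endpoint and process; replace with a real WPS server and process.
ENDPOINT = "http://localhost:8080/wps"

capabilities_url = wps_url(ENDPOINT, "GetCapabilities")
describe_url = wps_url(ENDPOINT, "DescribeProcess", identifier="hello")
execute_url = wps_url(
    ENDPOINT,
    "Execute",
    identifier="hello",
    DataInputs=encode_inputs({"name": "Birdhouse"}),
)
```

A client would typically fetch the capabilities document first to list the available processes, then describe one process to learn its inputs, and finally execute it.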
    • 16:45 17:15
      dCache and cloud services 30m Gebäude H

      Historically, the dCache technology was designed to provide a highly scalable multi-petabyte storage system for Grid infrastructures. It provides the necessary access and authentication protocols as well as automatic data replication and media selection, and it guarantees high data throughput even under heavy load and the chaotic access profile typical of scientific workloads. Due to the rapidly growing awareness of "Big Data" and the new Sync'n Share cloud semantics, dCache extended its access layers to provide the necessary features and to fully support the use cases of the typical scientific data life cycle. During the presentation we intend to demonstrate a typical data workflow, including ingestion, processing of the input data, wide area transfers with commonly available tools and services, as well as syncing results with mobile devices and sharing them with colleagues using standard Web 2.0 technologies. We will show that dCache offers the most optimized standard access and authentication protocols for the various steps.
      Speaker: Dr Paul Millar (DESY)
      Slides
    • 17:15 17:45
      General Discussion and Conclusion 30m Gebäude H

      Speaker: Dr Christopher Jung (KIT)
    • 19:00 22:00
      Dinner 3h ABACUS Tierpark Hotel

      Franz-Mett-Straße 3-9, 10319 Berlin
  • Thursday, 26 March
    • 09:00 09:15
      Introduction to Community Forum Presentations 15m Gebäude H

      Speaker: Prof. Achim Streit (KIT)
    • 09:15 10:00
      Challenges in data management for the IceCube and CTA observatories 45m Gebäude H

      The data management challenges for large-scale astroparticle physics instruments such as the IceCube neutrino observatory and the future gamma-ray observatory CTA differ markedly from those of particle physics experiments, even though their data are structured much like those of classical particle physics instruments. Examples of these challenges are the generation of large amounts of data in remote places (e.g. the South Pole in Antarctica) and the requirement to generate fast alerts for transient phenomena such as gamma-ray bursts. CTA will also serve a wider community, giving open data access and user support to non-expert scientists. In this talk, I will give a detailed overview of the data management activities in the IceCube and CTA observatories.
      Speaker: Dr Gernot Maier (DESY)
    • 10:00 10:45
      Big Data in Next Generation Sequencing (NGS): Requirements and Challenges 45m Gebäude H

      The DKFZ is the leading clinical sequencing center in Europe, equipped with 15 Illumina HiSeq sequencing machines and a current data storage capacity of 10,000 TB. The new Illumina X-Ten technology already enables the $1000 genome and at the same time will increase the data throughput by a factor of 10. This presentation will highlight the resources and infrastructure required to play the leading role in clinical sequencing. Furthermore, aspects of precision oncology and the related IT challenges will be discussed.
      Speaker: Jürgen Eils (DKFZ)
      Slides
    • 10:45 11:15
      Coffee Break 30m Gebäude H

    • 11:15 11:45
      ScaDS Dresden/Leipzig: A Competence Center for Big Data Research 30m Gebäude H

      The past decade has witnessed a tremendous increase in the availability of data for sophisticated scientific analysis. Large-scale research projects produce large amounts of raw data, and the availability of affordable instruments for studies of different types has also fed the scientific data deluge. To use HPC systems efficiently for scientific analysis, users from different research domains need assistance in their daily work with Big Data processing scenarios. To support this essential collaborative work on adaptable Big Data processing techniques, and to provide a working environment for domain experts and computer scientists, a national competence center for Big Data, the "Competence Center for Scalable Data Services and Solutions" (ScaDS Dresden/Leipzig), started in fall 2014. Domain experts from various research fields bring in their requirements for large-scale data processing and analysis. They work closely with data analysts from computer science to extend current methods of data reduction and of knowledge extraction from the broad base of data (data mining, machine learning, visual analytics), and they aim at a service-oriented approach that generalizes method development towards Big Data services.
      Speaker: Dr Ralph Müller-Pfefferkorn (Technische Universität Dresden, Center for Information Services and High Performance Computing (ZIH))
      Slides
    • 11:45 12:15
      The Berlin Big Data Center (BBDC) 30m Gebäude H

      The Berlin Big Data Center (BBDC) is a competence center funded by the German Federal Ministry of Education and Research and led by Technische Universität Berlin. The BBDC strives to fuse the academic disciplines of machine learning and data management into scalable data analysis. The goal of the Berlin Big Data Center is to help bridge the Big Data talent gap by researching and developing novel technology. Our starting point is the Apache Flink system. We aim to enable deep analytics of huge heterogeneous data sets with low latency by developing advanced, scalable data analysis and machine learning methods. Our goal is to specify these methods in a declarative way and to optimize and parallelize them automatically, in order to empower data scientists to focus on the analysis problem at hand. In the talk, I will highlight the challenges of processing Big Data, present the BBDC consortium, describe the goals and objectives of the BBDC, give a short introduction to Apache Flink including data processing, and finally explain the technological goal of the BBDC: "development of declarative and scalable data analysis algorithms".
      Speaker: Holmer Hemsen (TU Berlin)
    • 12:15 13:00
      Data Management in the Coastal Observing System COSYNA 45m Gebäude H

      The Coastal Observing System for Northern and Arctic Seas (COSYNA) aims to describe the physical and biogeochemical state of a regional coastal system. A multitude of measurement devices combined with modelling systems helps to gain a comprehensive picture of the state of the North Sea. The data management challenges for a complex system like COSYNA lie in combining diverse data sources, such as in-situ time series of point observations, observations from moving platforms, remote sensing platforms and model output. These challenges are met by employing a metadata system that describes instruments and datasets in a highly standardised way. A data portal based on web services allows comprehensive retrieval and analysis of such highly diverse data sets. The presentation will illustrate the concepts underlying the processing and management of the data and give examples of data visualisation within the COSYNA data portal CODM.
      Speaker: Gisbert Breitbach (Helmholtz-Zentrum Geesthacht)
      Slides
    • 13:00 14:30
      Lunch Break 1h 30m Gebäude H

    • 14:30 15:15
      Energy Data 45m Gebäude H

      Speaker: Nina Munzke (KIT)
      Slides
    • 15:15 16:00
      Medical Data for Clinical Research - Variety meets Volume 45m Gebäude H

      Speaker: Dagmar Krefting (HTW Berlin)
    • 16:00 16:30
      Discussion and Conclusion 30m Gebäude H

      Speaker: Prof. Achim Streit (KIT)