The Challenge of Big Data in Science (1st International LSDMA Symposium)

Europe/Berlin
Aula FTU (KIT)


Hermann-von-Helmholtz-Platz 1 76344 Eggenstein-Leopoldshafen Germany
Achim Streit (KIT), Christopher Jung (KIT)
Description
The amount of measured and simulated data in science has been rising quickly and will continue to do so in the foreseeable future. This development provides researchers with many opportunities but also poses new challenges. At the symposium, organized by the project Large Scale Data Management and Analysis (LSDMA) of the German Helmholtz Alliance, international experts give an up-to-date overview of applications, technology and infrastructure in big data. The keynotes will be delivered by Alex Szalay (Johns Hopkins University, Sloan Digital Sky Survey) and Bob Sinkovits (San Diego Supercomputer Center, Gordon Supercomputer). Further renowned speakers include the 2012 Leibniz Prize winner Peter Sanders from KIT. The symposium provides a common space for discussion and aims to identify new perspectives. The attendance fee is only 30 EUR and covers lunch and coffee breaks as well as a bus shuttle from downtown Karlsruhe to the conference venue and back. The symposium is kindly supported by IBM. The contact phone number on September 24th and 25th is +491723442319.
    • 09:00 09:30
      Welcome, Introduction 30m Aula FTU

      Speaker: Achim Streit
    • 09:30 10:30
      Extreme Data-Intensive Computing in Science 1h Aula FTU

      Speaker: Alex Szalay (Johns Hopkins University)
      Slides
    • 10:30 11:00
      coffee break 30m Aula FTU

    • 11:00 11:30
      Data Management in the Human Brain Project 30m Aula FTU

      Speaker: Thomas Heinis (EPFL)
      Slides
    • 11:30 12:00
      Distributed Data and Storage Management in the Worldwide LHC Computing Grid 30m Aula FTU

      The Large Hadron Collider (LHC) is the flagship project of the European Organization for Nuclear Research (CERN) and delivered more than 20 PB of data to the high-energy physics community last year. More than 8000 users worldwide use the data storage and management infrastructure to obtain new physics results. This presentation describes the requirements and boundary conditions for LHC data management and the services in place at CERN and many partner sites. It will also cover the main lessons learned during the first years of deployment and outline some of the planned improvements to the existing system.
      Speaker: Dirk Düllmann (CERN)
      Slides
    • 12:00 12:30
      Large Scale Data Processing for Light Sheet-based Fluorescence Microscopy 30m Aula FTU

      Speaker: Ernst Stelzer (Goethe University Frankfurt)
    • 12:30 14:00
      lunch break 1h 30m KIT Canteen


    • 14:00 14:30
      Engineering Algorithms for Large Data Sets 30m Aula FTU

      The talk will introduce the methodology of algorithm engineering and discuss how it can be applied to solve problems on large data sets, with examples from the work done in my group, including sorting and other fundamental algorithms for databases, full-text indices, route planning, phylogenetic tree reconstruction, and particle tracking in the CERN LHC.
      Speaker: Peter Sanders (KIT)
      Slides
    • 14:30 15:00
      EUDAT - Building a Pan-European Collaborative Data Infrastructure 30m Aula FTU

      The EU-funded EUDAT project is developing a European collaborative data infrastructure for e-science, as a layer in the overall European scientific e-infrastructure that complements the computing layer (EGI, DEISA, PRACE) and the networking layer (GÉANT). The project, which includes 23 partners from 13 countries, started on 1st October 2011 with a total budget of €16.3M. Partners include research communities, data centres, technology providers and funding agencies. The EUDAT objective is to deliver a Collaborative Data Infrastructure (CDI) with the capacity and capability to meet researchers' needs in a flexible and sustainable way, across geographical and disciplinary boundaries. The CDI will allow researchers to share data within and among communities, providing a solution that is affordable, trustworthy, robust, persistent and easy to use. This talk will describe the progress made to date and EUDAT's initial service offerings. It will also outline some of the challenges facing this ambitious project in providing a solution for cross-disciplinary research, preparing for the so-called "data tsunami" and developing a sustainable business model.
      Speaker: Alison Kennedy (EPCC)
      Slides
    • 15:00 15:30
      coffee break 30m Aula FTU

    • 15:30 16:00
      The High Data Rate Initiative (HDRI) of the Helmholtz Program 'Photons, Neutrons and Ions (PNI)' 30m Aula FTU

      Speaker: Edgar Weckert (DESY)
      Slides
    • 16:00 17:00
      Gordon: A novel high performance computing system for data and memory intensive applications 1h Aula FTU

      The Gordon system at the San Diego Supercomputer Center was designed from the ground up to solve data- and memory-intensive problems. For example, in contrast to the current trend in supercomputing of building increasingly larger machines with less memory per core, Gordon’s 1024 compute nodes each contain two Intel Sandy Bridge octo-core processors and 64 GB of memory. The nodes are connected via a dual-rail 3D torus network based on Mellanox QDR InfiniBand hardware and can access a 4 PB Lustre-based parallel file system capable of delivering up to 100 GB/s of sequential bandwidth. Two novel features, though, make Gordon particularly well suited for data-intensive problems. To bridge the large latency gap between remote memory and spinning disk, Gordon contains 300 TB of high-performance Intel 710 series solid-state storage. Gordon also deploys a number of “supernodes”, based on ScaleMP’s vSMP Foundation software, which can provide users with up to 2 TB of virtual shared memory. This talk will cover the Gordon architecture and our motivation for building the system. We will then present the results of both micro- and application-level benchmarks. The talk concludes with recent Gordon success stories spanning domains that have traditionally made use of HPC resources, such as computational chemistry and structural mechanics, as well as fields that are relatively new to supercomputing.
      Speaker: Robert Sinkovits (SDSC)
      Slides