The Challenge of Big Data in Science (1st International LSDMA Symposium)
Tuesday 25 September 2012
09:00 - 09:30
Welcome, Introduction - Achim Streit
Room: Aula FTU
09:30 - 10:30
Extreme Data-Intensive Computing in Science - Alex Szalay (Johns Hopkins University)
Room: Aula FTU
10:30 - 11:00
coffee break
Room: Aula FTU
11:00 - 11:30
Data Management in the Human Brain Project - Thomas Heinis (EPFL)
Room: Aula FTU
11:30 - 12:00
Distributed Data and Storage Management in the Worldwide LHC Computing Grid - Dirk Düllmann (CERN)
Room: Aula FTU
The Large Hadron Collider (LHC) is the flagship project of the European Organization for Nuclear Research (CERN) and last year delivered more than 20 PB of data to the high-energy physics community. More than 8000 users worldwide rely on the data storage and management infrastructure to obtain new physics results. This presentation describes the requirements and boundary conditions for LHC data management and the services in place at CERN and many partner sites. It also covers the main lessons learned during the first years of deployment and outlines some of the planned improvements to the existing system.
12:00 - 12:30
Large Scale Data Processing for Light Sheet-based Fluorescence Microscopy - Ernst Stelzer (Goethe University Frankfurt)
Room: Aula FTU
12:30 - 14:00
lunch break
Room: KIT Canteen
14:00 - 14:30
Engineering Algorithms for Large Data Sets - Peter Sanders (KIT)
Room: Aula FTU
The talk introduces the methodology of algorithm engineering and discusses how it can be applied to solve problems on large data sets, with examples from the work done in my group, including sorting and other fundamental algorithms for databases, full-text indices, route planning, phylogenetic tree reconstruction, and particle tracking in the CERN LHC.
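As a hedged illustration of one engineered building block behind sorting at this scale (a sketch of the general technique, not code from the talk): external-memory sorting first sorts memory-sized runs, then combines them with a k-way merge. The C++ sketch below shows that merge step on small in-memory runs; the function name and example data are invented for illustration.

    // Minimal sketch (not from the talk): the k-way merge step at the heart
    // of external-memory sorting. Engineered versions stream runs from disk
    // in blocks; here the runs are small in-memory vectors for illustration.
    #include <cstdio>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    // Merge k sorted runs into one sorted sequence using a min-heap of
    // (next value, run index) pairs.
    std::vector<int> kway_merge(const std::vector<std::vector<int>>& runs) {
        using Item = std::pair<int, size_t>;
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
        std::vector<size_t> pos(runs.size(), 0);
        for (size_t r = 0; r < runs.size(); ++r)
            if (!runs[r].empty()) heap.push({runs[r][0], r});
        std::vector<int> out;
        while (!heap.empty()) {
            auto [value, r] = heap.top();
            heap.pop();
            out.push_back(value);
            if (++pos[r] < runs[r].size()) heap.push({runs[r][pos[r]], r});
        }
        return out;
    }

    int main() {
        // Three pre-sorted runs standing in for memory-sized chunks of a file.
        std::vector<std::vector<int>> runs = {{1, 4, 9}, {2, 3, 8}, {5, 6, 7}};
        for (int v : kway_merge(runs)) std::printf("%d ", v);
        std::printf("\n");
    }

Merging with a min-heap costs O(n log k) comparisons for n elements in k runs, which is why a single merge pass over many runs is preferred to repeated two-way merges when the data lives on disk.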
14:30 - 15:00
EUDAT - Building a Pan-European Collaborative Data Infrastructure - Alison Kennedy (EPCC)
Room: Aula FTU
The EU-funded EUDAT project is developing a European collaborative data infrastructure for e-science, as a layer in the overall European scientific e-infrastructure complementing the computing layer (EGI, DEISA, PRACE) and the networking layer (GÉANT). The project, which includes 23 partners from 13 countries, started on 1 October 2011 with a total budget of €16.3M. Partners include research communities, data centres, technology providers and funding agencies. EUDAT's objective is to deliver a Collaborative Data Infrastructure (CDI) with the capacity and capability to meet researchers' needs in a flexible and sustainable way, across geographical and disciplinary boundaries. The CDI will allow researchers to share data within and among communities, providing a solution that is affordable, trustworthy, robust, persistent and easy to use. This talk will describe the progress made to date and EUDAT's initial service offerings. It will also outline some of the challenges facing this ambitious project in providing a solution for cross-disciplinary research, preparing for the so-called "data tsunami" and developing a sustainable business model.
15:00 - 15:30
coffee break
Room: Aula FTU
15:30 - 16:00
The High Data Rate Initiative (HDRI) of the Helmholtz Program 'Photons, Neutrons and Ions (PNI)' - Edgar Weckert (DESY)
Room: Aula FTU
16:00 - 17:00
Gordon: A novel high performance computing system for data and memory intensive applications - Robert Sinkovits (SDSC)
Room: Aula FTU
The Gordon system at the San Diego Supercomputer Center was designed from the ground up to solve data- and memory-intensive problems. For example, in contrast to the current trend in supercomputing of building increasingly larger machines with less memory per core, each of Gordon’s 1024 compute nodes contains two Intel Sandy Bridge octo-core processors and 64 GB of memory. The nodes are connected via a dual-rail 3D torus network based on Mellanox QDR InfiniBand hardware and can access a 4 PB Lustre-based parallel file system capable of delivering up to 100 GB/s of sequential bandwidth. Two novel features, though, make Gordon particularly well suited for data-intensive problems. To bridge the large latency gap between remote memory and spinning disk, Gordon contains 300 TB of high-performance Intel 710 series solid-state storage. Gordon also deploys a number of “supernodes”, based on ScaleMP’s vSMP Foundation software, which can provide users with up to 2 TB of virtual shared memory. This talk will cover the Gordon architecture and our motivation for building the system. We will then present the results of both micro- and application-level benchmarks. The talk concludes with recent Gordon success stories spanning both domains that have traditionally made use of HPC resources, such as computational chemistry and structural mechanics, and fields that are relatively new to supercomputing.
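To make the flash-versus-disk latency gap mentioned above concrete, here is a hedged C++ sketch of a tiny random-read micro-benchmark in the spirit of such measurements (not SDSC's actual benchmark code); the file path, block size, and read count are invented, and a serious measurement would defeat the OS page cache, for example by using O_DIRECT and a file much larger than RAM.

    // Illustrative sketch: average the latency of small random reads from a
    // file, the access pattern where SSDs beat spinning disks by orders of
    // magnitude. The path and parameters below are hypothetical; results are
    // only meaningful on a file larger than the OS page cache.
    #include <chrono>
    #include <cstdio>
    #include <random>

    int main(int argc, char** argv) {
        const char* path = argc > 1 ? argv[1] : "testfile.bin"; // hypothetical input
        const long block = 4096;   // 4 KiB per read
        const int reads = 1000;

        std::FILE* f = std::fopen(path, "rb");
        if (!f) { std::perror("fopen"); return 1; }
        std::fseek(f, 0, SEEK_END);
        const long blocks = std::ftell(f) / block;
        if (blocks < 1) { std::fprintf(stderr, "file too small\n"); return 1; }

        std::mt19937_64 rng(42);
        std::uniform_int_distribution<long> dist(0, blocks - 1);
        char buf[4096];

        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < reads; ++i) {
            std::fseek(f, dist(rng) * block, SEEK_SET);   // random block offset
            if (std::fread(buf, 1, block, f) != static_cast<size_t>(block)) break;
        }
        auto t1 = std::chrono::steady_clock::now();

        const double us =
            std::chrono::duration<double, std::micro>(t1 - t0).count() / reads;
        std::printf("avg %ld-byte random read: %.1f us\n", block, us);
        std::fclose(f);
    }

Run against a spinning disk, a test like this typically reports several milliseconds per read (seek-dominated); on flash it is typically tens to hundreds of microseconds, which is the gap Gordon's solid-state storage is meant to bridge.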