Monte Carlo Methods in Advanced Statistics Applications and Data Analysis
Foehringer Ring 6
(Max Planck Institute), Frank Steffen
(MPIM), Kai Schweda
(GSI), Kevin Kroeninger
(University of Goettingen), Ralf Ulrich
(KIT), Thomas Schoerner-Sadenius
This school, the first to be jointly organised by the three Helmholtz Alliances Terascale, HAP and EMMI and the Max Planck IMPRS EPP School, is aimed at physicists at all career levels working in particle physics, astroparticle physics, and hadron and nuclear physics. The programme comprises lectures and exercises on important Monte Carlo based statistics and data analysis methods.
- Basics of statistics and probability, random numbers and the Monte Carlo method
- Bayesian reasoning
- Information field theory
- Markov chain Monte Carlo
- Sampling and clustering
- Population Monte Carlo
- Nested sampling
More information on the contents of the individual presentations and exercises can be found in the respective timetable entries.
Basics 4: Information field theory - from data to images (lecture)
The problem of reconstructing an image or a function from data is generally ill-posed. The desired signal has an infinite number of degrees of freedom, whereas the data provide only a finite number of constraints. Additional statistical information and other knowledge have to be used to regularize the problem. Information field theory permits us to formulate signal inference problems rigorously, using probabilistic language to combine data and knowledge. It helps us to exploit existing methods developed for field theories to derive optimal reconstruction algorithms. This course gives an introduction to the basic principles of information field theory, illustrated by concrete examples from astrophysical applications.
NIFTY: Numerical information field theory
This tutorial introduces NIFTY, "Numerical Information Field Theory",
which allows the user to formulate and program signal inference and
image reconstruction algorithms abstractly. NIFTY is a versatile Python
library designed to enable the development of signal inference algorithms
that operate regardless of the underlying spatial grid and its resolution.
The tutorial covers the simulation of mock data from Gaussian random
processes and a Wiener filter reconstruction of the underlying signal
field from this data set. Using NIFTY, this filter can be applied to a
variety of spaces, e.g., point sets, n-dimensional regular grids,
spherical spaces, their harmonic counterparts, and product spaces
constructed as combinations of those.
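The idea of mock data from a Gaussian random process followed by a Wiener filter can be sketched without any library. The following is a minimal illustration in plain Python, not NIFTY code: it assumes a unit response, white noise, and a signal whose covariance is diagonal in the chosen (harmonic) basis, so that the filter reduces to a per-mode factor P_s(k) / (P_s(k) + N). The function names, the power-law spectrum and the noise variance are assumptions made for this example only.

```python
import random


def power_spectrum(k):
    """Assumed signal power spectrum P_s(k): a falling power law (illustrative)."""
    return 1.0 / (1.0 + k) ** 2


def simulate_and_filter(n_modes=256, noise_var=0.05, seed=1):
    """Draw a Gaussian random signal mode by mode, add white noise,
    and apply the Wiener filter, which is diagonal in this basis."""
    rng = random.Random(seed)
    signal, data, recon = [], [], []
    for k in range(n_modes):
        p_s = power_spectrum(k)
        s_k = rng.gauss(0.0, p_s ** 0.5)               # signal mode
        d_k = s_k + rng.gauss(0.0, noise_var ** 0.5)   # data = signal + noise
        m_k = p_s / (p_s + noise_var) * d_k            # Wiener filter estimate
        signal.append(s_k)
        data.append(d_k)
        recon.append(m_k)
    return signal, data, recon


def mean_squared_error(a, b):
    """Average squared difference between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

Comparing mean_squared_error(recon, signal) with mean_squared_error(data, signal) shows how strongly the filter suppresses noise-dominated modes. On a real grid the same factor would be applied after a harmonic transform, which is the bookkeeping a library such as NIFTY is designed to take over.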
Bayesian mixture modelling
Bayesian mixture modelling (Guglielmetti F., et al., 2009, MNRAS, 396, 165)
provides a method to solve the long-standing problem of disentangling
the background from the sources.
The technique employs a joint estimate of the background and detection of
the sources in astronomical images.
Bayesian probability theory is applied to gain insight into the
coexistence of background and sources through a probabilistic
two-component mixture model. Uncertainties of the background and source
signals are consistently provided. Background variations are properly
modelled and sources are detected independent of their shape. No
background subtraction is needed for the detection of sources. Poisson
statistics is rigorously applied throughout the whole algorithm.
The technique is general and applicable to count detectors.
Practical demonstrations of the method will be given using simulated
data sets and data observed in the X-ray part of the electromagnetic
spectrum by the ROSAT and Chandra satellites.
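A two-component mixture model for a single pixel can be sketched as follows. This is a simplified illustration, not the algorithm of the cited paper: it assumes a known, constant background rate, an exponential prior on the source rate, and a fixed prior probability that a pixel contains a source. All function and parameter names are hypothetical.

```python
import math


def poisson_pmf(counts, mu):
    """Poisson probability of observing `counts` given expectation mu > 0."""
    return math.exp(counts * math.log(mu) - mu - math.lgamma(counts + 1))


def source_marginal_likelihood(counts, background, mean_source, steps=4000):
    """Likelihood of a source pixel with the unknown source rate integrated
    out over an exponential prior (trapezoidal rule on [0, 20 * mean_source])."""
    upper = 20.0 * mean_source
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        s = i * h
        edge = 0.5 if i in (0, steps) else 1.0
        prior = math.exp(-s / mean_source) / mean_source
        total += edge * poisson_pmf(counts, background + s) * prior
    return total * h


def source_probability(counts, background, mean_source, prior_source):
    """Posterior probability that the pixel contains a source in addition
    to background, from the two-component mixture."""
    like_src = source_marginal_likelihood(counts, background, mean_source)
    like_bkg = poisson_pmf(counts, background)
    numerator = prior_source * like_src
    return numerator / (numerator + (1.0 - prior_source) * like_bkg)
```

For example, source_probability(30, 5.0, 20.0, 0.1) evaluates a pixel with 30 counts on an expected background of 5. No background subtraction enters the calculation: the Poisson likelihood is applied directly to the observed counts.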
BAT - a complex Markov chain Monte Carlo application
The tutorial will give an introduction to the Bayesian Analysis
Toolkit (BAT), a C++ tool for Bayesian inference. The software is
based on algorithms for sampling, optimization and integration where
the key algorithm is Markov Chain Monte Carlo. Interfaces to existing
software tools exist, e.g., the ROOT implementation of Minuit, and
the Cuba library. A simple physics example will be discussed and
formulated as a statistical model in BAT. The first steps will include
the calculation of marginal distributions and uncertainty
propagation. The example will also be used to explain the basic
functionalities of BAT.
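The two first steps named above, marginal distributions and uncertainty propagation, can be illustrated with a bare random-walk Metropolis sampler. The sketch below is plain Python and does not use the BAT interface; the straight-line model, the five data points, the flat priors and all function names are assumptions made for this example.

```python
import math
import random

# Hypothetical toy data: y = slope * x + offset with known Gaussian noise.
X = [0.0, 1.0, 2.0, 3.0, 4.0]
Y = [1.1, 2.9, 5.2, 7.1, 8.8]
SIGMA = 0.5


def log_posterior(params):
    """Log posterior for (slope, offset) with flat priors, up to a constant."""
    slope, offset = params
    chi2 = sum(((y - (slope * x + offset)) / SIGMA) ** 2 for x, y in zip(X, Y))
    return -0.5 * chi2


def metropolis(log_post, start, n_steps=22000, burn_in=2000, step=0.15, seed=7):
    """Random-walk Metropolis with a Gaussian proposal of width `step`."""
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_post(current)
    chain = []
    for i in range(n_steps):
        proposal = [p + rng.gauss(0.0, step) for p in current]
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if i >= burn_in:
            chain.append(tuple(current))
    return chain


def predict(params, x):
    """Derived quantity: model prediction at x for one posterior sample."""
    slope, offset = params
    return slope * x + offset
```

Given chain = metropolis(log_posterior, (1.0, 1.0)), the marginal distribution of the slope is simply the histogram of the first component of each sample. Evaluating predict(sample, 5.0) for every sample propagates the full parameter uncertainty, including correlations, to the prediction.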
(University of Goettingen)
The STAN package: Bayesian Inference based on Hamiltonian Monte Carlo
Basic sampling methods, convergence, variance reduction - and connections to MC event generators
We consider Monte Carlo methods specific to Monte Carlo event generators. After an introduction to Monte Carlo sampling and integration, we discuss some methods of variance reduction with phase-space integration in mind as the application. Finally, we briefly discuss multi-channel integration as the key to integrating multi-body final-state matrix elements.
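Variance reduction by importance sampling can be illustrated with a resonance-like integrand. The sketch below compares uniform sampling with sampling through a Breit-Wigner mapping for the toy integrand f(x) = x^2 / ((x - M)^2 + GAMMA^2) on [0, 1]. The integrand, the constants M and GAMMA, and the function names are assumptions chosen for this example. Only a single mapping (one channel) is shown; a multi-channel integrator would combine several such mappings with adjustable weights.

```python
import math
import random

M, GAMMA = 0.5, 0.01  # assumed peak position and width of the toy resonance


def integrand(x):
    """Toy integrand: smooth factor x^2 times a Breit-Wigner-like peak."""
    return x * x / ((x - M) ** 2 + GAMMA ** 2)


def exact_integral():
    """Closed-form value of the integral over [0, 1] for M = 0.5."""
    bw = 2.0 / GAMMA * math.atan(0.5 / GAMMA)  # integral of the bare peak
    return 1.0 + (M * M - GAMMA * GAMMA) * bw


def mean_and_error(values):
    """Sample mean and its standard error."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)


def uniform_estimate(n, seed=1):
    """Plain Monte Carlo: x uniform on [0, 1]."""
    rng = random.Random(seed)
    return mean_and_error([integrand(rng.random()) for _ in range(n)])


def importance_estimate(n, seed=1):
    """Importance sampling: x = M + GAMMA * tan(theta) with theta uniform,
    so that x follows the Breit-Wigner shape restricted to [0, 1]."""
    rng = random.Random(seed)
    t_min = math.atan((0.0 - M) / GAMMA)
    t_max = math.atan((1.0 - M) / GAMMA)
    weights = []
    for _ in range(n):
        theta = t_min + (t_max - t_min) * rng.random()
        x = M + GAMMA * math.tan(theta)
        density = GAMMA / ((t_max - t_min) * ((x - M) ** 2 + GAMMA ** 2))
        weights.append(integrand(x) / density)
    return mean_and_error(weights)
```

Both estimators are unbiased, but the mapping removes the peak from the weights, so the weight fluctuates only through the smooth factor x^2 and the standard error at equal sample size is much smaller.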
Adaptive importance sampling, or population Monte Carlo (PMC), is a
powerful technique to sample from and integrate over complicated
distributions that may include degeneracies and multiple modes in up
to roughly 40 dimensions. PMC is best for tough problems as the
costly evaluation of the target distribution can be massively
Based on a simplified global fit for new physics, the individual parts
of the algorithm ranging from the initialization over
proposal-function updates to the final results are presented step by
step in a hands-on and visual fashion. Only basic knowledge of C++ is
required in order to modify the given source-code examples for a more
rewarding learning experience.
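The sequence of initialization, proposal-function updates and final results can be sketched in one dimension. The example below is plain Python rather than the C++ code of the tutorial. It assumes a bimodal toy target with known normalisation and a two-component Gaussian mixture proposal that is updated from the importance-weighted samples. The target, the initial proposal and all names are hypothetical.

```python
import math
import random

Z_TRUE = 2.0  # known normalisation of the toy target, used only for checking


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def target(x):
    """Unnormalised bimodal toy target (stand-in for a costly posterior)."""
    return Z_TRUE * (0.3 * normal_pdf(x, -3.0, 1.0) + 0.7 * normal_pdf(x, 3.0, 0.5))


def pmc(n_samples=5000, n_iterations=10, seed=3):
    """Population Monte Carlo with a Gaussian mixture proposal.
    Returns (evidence estimate, mean estimate, final components)."""
    rng = random.Random(seed)
    # Initialization: two broad components, each [weight, mean, sigma].
    comps = [[0.5, -2.0, 2.0], [0.5, 2.0, 2.0]]
    evidence = mean = 0.0
    for _ in range(n_iterations):
        # 1. Draw a population from the current mixture proposal.
        picks = rng.choices(comps, weights=[c[0] for c in comps], k=n_samples)
        xs = [rng.gauss(mu, sigma) for _, mu, sigma in picks]
        # 2. Importance weights: target density over proposal density.
        dens = [[a * normal_pdf(x, mu, s) for a, mu, s in comps] for x in xs]
        q = [sum(d) for d in dens]
        w = [target(x) / qi for x, qi in zip(xs, q)]
        w_sum = sum(w)
        evidence = w_sum / n_samples
        mean = sum(wi * x for wi, x in zip(w, xs)) / w_sum
        # 3. Update each component from weighted responsibilities.
        updated = []
        for j in range(len(comps)):
            r = [wi / w_sum * d[j] / qi for wi, d, qi in zip(w, dens, q)]
            a = sum(r)
            mu = sum(ri * x for ri, x in zip(r, xs)) / a
            var = sum(ri * (x - mu) ** 2 for ri, x in zip(r, xs)) / a
            updated.append([a, mu, max(math.sqrt(var), 1e-3)])
        comps = updated
    return evidence, mean, comps
```

Calling pmc() returns estimates of the normalisation and the mean of the target together with the adapted proposal. Within one iteration the target evaluations are independent of each other, in contrast to the sequential steps of a Markov chain.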