participants:
Kilian Schwarz (PUNCH4NFDI, DIG-UM Federated Infrastructures, Particle Physics)
Christoph Wissing (PUNCH4NFDI TA2, Particle Physics)
Markus Demleitner (DIG-UM Federated Infrastructures, Astronomy)
Anton Barty (DAPHNE, Photon Science)
Matthias Hoeft (PUNCH4NFDI TA2, Astronomy)
synergy meeting between
PUNCH TA2
DAPHNE TA6 analysis services
DIG-UM/Federated Infrastructures
what would synergy be?
what kind of work can be identified?
how to avoid duplication of work?
at DESY IT
so far resources are separated
e.g. Maxwell used by Photon Science and T2/NAF by HEP
But there is an ongoing proposal to the DASHH graduate school to provide intelligent scheduling
which can be used by IDAF at some point.
data sources are at different locations ==> similar to astronomy
synergy in post analysis
resource sharing between communities and AAI ==> one possible topic
VISA portal (PaNOSC/ExPaNDS) and PUNCH SDP: synergy effects are seen here
also in data catalogues / potential high level overlap with astronomy observation data
different levels of sophistication at different communities
different levels of general computer literacy among users (highly variable)
Photon science – many small groups with expertise in non-computer areas, versus HEP with large collaborations containing a relatively large sub-team with computer expertise
Many photon science users would like to get the (intermediate) results without having to worry about the computing complexity; their expertise and interest lies elsewhere, computing is the tool.
difficult to map synergy effects to concrete things to do
Different funding streams always complicate this
communities work quite differently
infrastructure sharing needs to be well shielded from scientific applications
Best is if resource sharing is invisible to the user – they don’t want to have to worry about that
a mid-term goal is how to shape infrastructures
Compute4PUNCH currently presents an entry barrier that is way too high for DAPHNE to use right now
Put another way: Too much knowledge of computing infrastructure required compared to the patience of photon science users who want to spend their time on other things.
DAPHNE:
workflow is
1. Pre-analysis at the facility
2. downloading data to the home institute and then processing it via Jupyter notebooks
(it needs to be defined what "data" actually is – for some it is everything, for others it is pre-processed intermediate data)
3. quality control is required
1. access to data
2. environment to login is needed
3. data catalogue ==> SciCat (see the sketch below)
4. launch via FastX, web, or Jupyter notebooks. Maybe virtual machines for different communities (see VISA)
5. the compute resources and storage behind these can be shared, ideally transparently. We hear about it only when there are not enough resources.
This again points to
==> PUNCH Science Data Platform (PUNCH-SDP) and analysis platforms such as VISA as a potential synergy topic.
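As a first concrete touch point, a minimal sketch of how an analysis platform could do an authenticated dataset lookup against a SciCat-style catalogue. The base URL, token handling, and filter fields below are illustrative assumptions and would have to be checked against the actual SciCat deployment.

```python
import json
import requests

# Illustrative assumptions: base URL and token are placeholders, and the
# filter syntax follows the general SciCat REST API style, to be checked
# against the actual catalogue deployment.
SCICAT_URL = "https://scicat.example.org/api/v3"
TOKEN = "..."  # e.g. obtained via the federated AAI (OIDC)

def find_datasets(keyword: str, limit: int = 10):
    """Search the catalogue for datasets whose name matches the keyword."""
    query = {"where": {"datasetName": {"like": keyword}}, "limit": limit}
    resp = requests.get(
        f"{SCICAT_URL}/datasets",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"filter": json.dumps(query)},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

for ds in find_datasets("lysozyme"):
    print(ds.get("pid"), ds.get("datasetName"), ds.get("size"))
```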
We need to look very generally at what is common
Access to datasets (find datasets, authenticated)
Feed datasets into appropriate analysis pipelines (and interact with the results)
Quality control checks of intermediate data processing along the way are important (many small steps in analysis, many small things to check such as bad pixel masking, beam centering, …; see the sketch after this list)
Necessary computing resources appear, as if by magic
Download something to home for further work (reduced, pre-processed data)
Web based, rather than CLI based?
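To illustrate the kind of small quality-control step meant above (bad pixel masking), a short numpy sketch; the thresholds and the synthetic input data are made up for the example and are not a beamline recipe.

```python
import numpy as np

def bad_pixel_mask(frame: np.ndarray, dark: np.ndarray,
                   hot_sigma: float = 5.0) -> np.ndarray:
    """Flag dead and hot pixels in a single detector frame.

    Thresholds are illustrative; real pipelines tune them per detector
    and per experiment.
    """
    corrected = frame.astype(float) - dark
    dead = corrected <= 0                       # no signal at all
    hot = corrected > corrected.mean() + hot_sigma * corrected.std()
    return dead | hot

# Usage with synthetic data standing in for a real detector frame
rng = np.random.default_rng(0)
frame = rng.poisson(100, size=(512, 512)).astype(float)
dark = np.full((512, 512), 10.0)
mask = bad_pixel_mask(frame, dark)
print(f"{mask.mean():.2%} of pixels flagged")
```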
Raw Data to Science Ready data in astronomy
the debugging case is an important point
not a platform but a protocol:
"data link"
associates data sets with machine-readable semantics
synergy could be found here
such links can also lead to a platform
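The natural reference for such a protocol is IVOA DataLink. A minimal sketch with pyvo, assuming a dataset that advertises a DataLink endpoint (the URL below is a placeholder, typically obtained from a prior discovery query):

```python
from pyvo.dal.adhoc import DatalinkResults

# Placeholder URL: in practice this comes from a discovery query
# (e.g. a TAP/ObsCore result) that advertises a DataLink service.
datalink_url = "https://archive.example.org/datalink?ID=ivo://example/obs/1234"

links = DatalinkResults.from_result_url(datalink_url)
for row in links:
    # Each row ties the dataset to a machine-readable semantics term,
    # e.g. "#this" for the data itself, "#calibration", "#preview", ...
    print(row.semantics, row.content_type, row.access_url)
```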
DAPHNE:
transferring instrument information into a standard data model
standardised file formats are needed
API method of data access could be useful (streaming, rather than file based)
interoperability of file formats
Sometimes the language is confusing: what exactly does "metadata" mean?
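One place where the standard data model / metadata discussion becomes concrete in photon science is NeXus-style metadata inside HDF5 files. A minimal h5py sketch; the group names and fields are chosen only for illustration, not taken from an agreed application definition.

```python
import h5py
import numpy as np

# Illustrative only: which NeXus classes and fields are required depends on
# the application definition agreed by the instrument / community.
with h5py.File("scan_0001.nxs", "w") as f:
    entry = f.create_group("entry")
    entry.attrs["NX_class"] = "NXentry"
    entry["title"] = "example scan"

    instrument = entry.create_group("instrument")
    instrument.attrs["NX_class"] = "NXinstrument"
    instrument["name"] = "example beamline"

    data = entry.create_group("data")
    data.attrs["NX_class"] = "NXdata"
    data.create_dataset("counts", data=np.zeros(100, dtype="i4"))
    data.attrs["signal"] = "counts"
```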
why do we do this?
would synergy make our life easier? Or maybe not?
virtual observatory is also a potential area of synergy
photon science and astronomy have things in common
Between LHC and Photon Science it may be harder to find synergies
photon based observatories exist
how would we find these synergies?
e.g. someone from DAPHNE with concrete technical problems comes to Astro interop conferences
this can be worked out together with the people developing the solutions
IT department at DESY is "owned by Particle Physics"
But there are other views, too.
In some ways DESY IT is the ideal place to see what is synergistic, as they see both sides at work and it’s in their interests to avoid duplication by exploiting synergies !!
Any changes need acceptance and adoption by the user community – learning something new is initially seen as a cost rather than a benefit
documented data reduction is required
how to make archives accessible
computing needs to be put to data
analysis of raw data
access to raw data, access to infrastructure
wide area data access
This points to
PUNCH TA5 tasks on data irreversibility and real time
what data to keep and what to throw away
But also to PUNCH TA2 (wide area access)
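A sketch of what wide area data access could look like from the user side: reading a slice of a remote file over HTTPS/WebDAV (as offered e.g. by dCache) instead of copying the whole dataset home. Endpoint, path and token are placeholders, not an existing service.

```python
import requests

# Placeholders: the storage endpoint, path and token depend on the actual
# federated setup (e.g. a dCache WebDAV door with OIDC tokens or macaroons).
ENDPOINT = "https://dcache.example.org:2880"
PATH = "/punch/daphne/run_0001/frames.h5"
TOKEN = "..."

# Read only the first megabyte of a remote file via an HTTP range request,
# instead of downloading the full dataset to the home institute.
resp = requests.get(
    ENDPOINT + PATH,
    headers={"Authorization": f"Bearer {TOKEN}", "Range": "bytes=0-1048575"},
    timeout=60,
)
resp.raise_for_status()
print(f"fetched {len(resp.content)} bytes, status {resp.status_code}")
```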
radio astronomy background
so far: people do things on their own computer
future: data analysis on infrastructure
we have to learn how to operate this.
maybe synergy or an already identified solution can be found here
we need to find synergy in a more ideal world
future oriented synergies
electronic log books may also be a good point for synergy
Heads in the direction of HIFIS and ‘what is common across fields’
Sample PID server
Generic eLogs
Data DOI service
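For the Data DOI service, a hedged sketch of what minting a dataset DOI could look like against a DataCite-style REST API; endpoint, credentials, prefix and the minimal metadata payload are assumptions to be checked against whichever service is actually used.

```python
import requests

# Assumptions: DataCite-style REST API (test system), repository credentials
# and prefix are placeholders; mandatory metadata fields may differ.
API = "https://api.test.datacite.org/dois"
AUTH = ("REPOSITORY_ID", "password")

payload = {
    "data": {
        "type": "dois",
        "attributes": {
            "prefix": "10.12345",
            "titles": [{"title": "Example photon-science dataset"}],
            "creators": [{"name": "Example Collaboration"}],
            "publisher": "Example Facility",
            "publicationYear": 2023,
            "types": {"resourceTypeGeneral": "Dataset"},
            "url": "https://data.example.org/datasets/run_0001",
        },
    }
}

# Without an "event" attribute this would only create a draft DOI.
resp = requests.post(API, json=payload, auth=AUTH,
                     headers={"Content-Type": "application/vnd.api+json"})
print(resp.status_code, resp.json().get("data", {}).get("id"))
```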
Overlay Batch system and HEP Data Lake technologies are used in
PUNCH TA2 and FIDIUM
would common testbeds between PUNCH and FIDIUM be desirable?
This can help to enlarge the base.
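To make the overlay batch system idea concrete: the pattern behind COBalD/TARDIS as used in PUNCH TA2 and FIDIUM is, roughly, to start "drones" (pilot jobs) on opportunistic resources that then join a common overlay pool. The sketch below only illustrates that control loop in plain Python; it is not the COBalD/TARDIS API or configuration.

```python
# Conceptual sketch only (not the COBalD/TARDIS API): scale the number of
# "drones" (pilot jobs that join the overlay pool) to the pending demand.
class Site:
    def __init__(self, name: str, max_drones: int):
        self.name = name
        self.max_drones = max_drones
        self.running = 0

    def start_drone(self):
        # In reality: submit a pilot job to the site batch system that
        # registers itself with the overlay pool (e.g. HTCondor).
        self.running += 1

    def stop_drone(self):
        self.running -= 1

def balance(site: Site, pending_jobs: int):
    """Very naive controller: one drone per pending overlay job."""
    target = min(pending_jobs, site.max_drones)
    while site.running < target:
        site.start_drone()
    while site.running > target:
        site.stop_drone()

site = Site("example-hpc", max_drones=50)
for demand in (5, 20, 3):   # stand-in for monitoring the overlay pool
    balance(site, demand)
    print(f"{site.name}: {site.running} drones for {demand} pending jobs")
```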
In the end these topics of potential synergies are identified:
1. shared use of IT infrastructure at DESY (IDAF) and intelligent scheduling mechanisms
2. data locality aware scheduling or wide area access to data
3. resource sharing between communities and user identification via AAI
4. analysis platforms such as VISA and PUNCH-SDP
5. file catalogues
6. user shielding from underlying federated infrastructure IT mechanisms / how to use and operate future infrastructures
7. "data link" protocol and virtual observatory
8. standardised data formats, FAIR principles, and metadata, resulting in higher interoperability
9. data reduction and data irreversibility issues (in real time)
10. electronic lab books
11. Overlay batch system and Data Lake: common testbeds between PUNCH and FIDIUM
these synergy topics should be ordered according to priority and
whether they are short-term or mid-term synergy effects.
This should be clarified in a follow-up meeting.