NUC 09.06.2010
Attendance:
Wolfgang Ehrenfeld, Angela Lucaci-Timoce, Andreas Gellrich, Yves Kemp, Kai Leffhalm
News
NAF status
- high load on ATLAS WGS, requires reboot:
- comments from ATLAS: reaction time should be shortened; do downloads on batch?
- new action items: download queue, reboot procedure
- dCache performance during the last month:
- bottlenecks seen
- action item: form for storage problem reports
- new Lustre space at HH and ZN
- need feedback from experiments for resource planning
Existing Action Items
- New action items:
- NAF: present SL4 work group server usage
- plots shown
- repeat at next NUC
- ATLAS/CMS: plans for SL4 work group servers
- reduce to one shared SL4 WGS; SL4 support ends in October
- ATLAS/CMS talk to the users
- NAF: check /scratch monitoring (instance full), email notification
- all experiments want this
- still open
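The requested /scratch check could be a simple fill-level monitor that mails users when an instance is (nearly) full. A minimal sketch, assuming a plain filesystem check and a local mailhost; paths, addresses and the 90% threshold are placeholders, not the NAF tool:

```python
# Hypothetical /scratch fill-level monitor with email notification.
# All paths, addresses and the threshold are illustrative assumptions.
import shutil
import smtplib
from email.message import EmailMessage

THRESHOLD = 0.90  # assumed warning level: 90% full


def usage_fraction(path):
    """Return the used fraction of the filesystem holding `path`."""
    total, used, _free = shutil.disk_usage(path)
    return used / total


def notify_if_full(path, threshold=THRESHOLD, send=None):
    """Send a warning mail if `path` is above `threshold`; return the mail.

    `send` is injected so the check can run without an SMTP server;
    by default it would talk to a local mailhost (assumption).
    """
    frac = usage_fraction(path)
    if frac < threshold:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"[NAF] {path} at {frac:.0%} - please clean up"
    msg["From"] = "naf-monitor@example.org"  # placeholder address
    msg["To"] = "naf-users@example.org"      # placeholder address
    msg.set_content(f"{path} usage is {frac:.0%} (threshold {threshold:.0%}).")
    if send is None:
        def send(m):
            with smtplib.SMTP("localhost") as s:
                s.send_message(m)
    send(msg)
    return msg
```

Such a script would run periodically (e.g. from cron) per /scratch instance; injecting the `send` function keeps the threshold logic testable offline.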
- NAF: technical constraints for Lustre extension
- some boundary conditions, upcoming support issues
- open
- NAF: announcement of Twitter service
- NAF/NUC: expert meeting on scratch deletion tool
- no meeting yet
- a new approach was tried, but it did not work out
- open
- Still open action items:
- NAF/ATLAS: CMT problem (NAF invite experts/ATLAS)
- CMT developers looking into it
- ATLAS@DESY can contribute
- NAF: AFS scratch space creation (existing account migration for ILC/LHCb)
- no migration needed for LHCb, ILC
- new AFS servers ordered, over-subscription reduced
- closed
- NAF: deletion model for /scratch
- NAF: multicore batch job monitoring
- plots shown, maybe add to accounting in the future.
- show plot in every meeting
ATLAS report
- login/work group server problems (3-4)
- DB release installation
- install request (kcachegrind, no reply yet)
- ganga installation request
- ganga question
- lustre full (25 May, 3 June, now)
- policy question (2x)
- accounts (2x)
CMS report
- problems with access to dCache user directories (high load from Grid jobs/NAF)
- problems with access to dCache data directories (high load from Grid jobs/NAF)
- dCache user directory creation (not automatic)
- gLite 3.2 now used with CRAB
- need more Lustre space (10-20 TB)
- Lustre unavailability on WGS
- would like well defined procedure to report storage problems
- briefly tested tcx053 (no problems seen)
- should formulate CPU and storage request for NAF in 2010/2011
ILC report
LHCb report
- No changes since day 1, except:
- REAL data are there
- The total data size has increased
- We are in general happy with NAF, but...
- dCache access a bottleneck (maybe limit number of parallel jobs per user?)
- problem with very slow data transfer: NAF -> Heidelberg
- need more Lustre space
New Action Items:
- develop CPU, dCache and Lustre resource requirements
- download queue
- reboot procedure
- form for storage problem reports
- accounting ideas via e-mail or next meeting or extra meeting
- dCache upgrade time line
Next meeting
- Wednesday 14 July 2010