NUC 09.06.2010
Attendance:
Wolfgang Ehrenfeld, Angela Lucaci-Timoce, Andreas Gellrich, Yves Kemp, Kai Leffhalm
News
NAF status
- high load on ATLAS WGS, requires reboot:
- comments from ATLAS: reaction time should be shortened; do downloads on batch?
- new action items: download queue, reboot procedure
- dCache performance during the last month:
- bottlenecks seen
- action item: form for storage problem reports
- new Lustre space at HH and ZN
- need feedback from experiments for resource planning
Existing Action Items
- New action items:
- NAF: present SL4 work group server usage
- plots shown
- repeat at next NUC
- ATLAS/CMS: plans for SL4 work group servers
- reduce to one shared SL4 WGS; SL4 support ends in October
- ATLAS/CMS talk to the users
- NAF: check /scratch monitoring (instance full), email notification
- all experiments want this
- still open
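The requested /scratch check could be a simple fill-level monitor that mails users when an instance is (nearly) full. A minimal sketch, assuming a plain filesystem check and a local mailhost; paths, addresses and the 90% threshold are placeholders, not the NAF tool:

```python
# Hypothetical /scratch fill-level monitor with email notification.
# All paths, addresses and the threshold are illustrative assumptions.
import shutil
import smtplib
from email.message import EmailMessage

THRESHOLD = 0.90  # assumed warning level: 90% full


def usage_fraction(path):
    """Return the used fraction of the filesystem holding `path`."""
    total, used, _free = shutil.disk_usage(path)
    return used / total


def notify_if_full(path, threshold=THRESHOLD, send=None):
    """Send a warning mail if `path` is above `threshold`; return the mail.

    `send` is injected so the check can run without an SMTP server;
    by default it would talk to a local mailhost (assumption).
    """
    frac = usage_fraction(path)
    if frac < threshold:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"[NAF] {path} at {frac:.0%} - please clean up"
    msg["From"] = "naf-monitor@example.org"  # placeholder address
    msg["To"] = "naf-users@example.org"      # placeholder address
    msg.set_content(f"{path} usage is {frac:.0%} (threshold {threshold:.0%}).")
    if send is None:
        def send(m):
            with smtplib.SMTP("localhost") as s:
                s.send_message(m)
    send(msg)
    return msg
```

Such a script would run periodically (e.g. from cron) per /scratch instance; injecting the `send` function keeps the threshold logic testable offline.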
- NAF: technical constraints for Lustre extension
- some boundary conditions, upcoming support issues
- open
- NAF: announcement of Twitter service
- NAF/NUC: expert meeting on scratch deletion tool
- no meeting yet
- a new approach was tried, but it did not work out
- open
- Still open action items:
- NAF/ATLAS: CMT problem (NAF invite experts/ATLAS)
- CMT developers looking into it
- ATLAS@DESY can contribute
- NAF: AFS scratch space creation (existing account migration for ILC/LHCb)
- no migration needed for LHCb, ILC
- new AFS servers ordered, over-subscription reduced
- closed
- NAF: deletion model for /scratch
- NAF: multicore batch job monitoring
- plots shown, maybe add to accounting in the future.
- show plot in every meeting
ATLAS report
- login/work group server problems (3-4)
- DB release installation
- install request (kcachegrind, no reply yet)
- ganga installation request
- ganga question
- lustre full (25 May, 3 June, now)
- policy question (2x)
- accounts (2x)
CMS report
- problems with access to dCache user directories (high load from Grid jobs/NAF)
- problems with access to dCache data directories (high load from Grid jobs/NAF)
- dCache user directory creation (not automatic)
- gLite 3.2 now used with CRAB
- need more Lustre space (10-20 TB)
- Lustre unavailability on WGS
- would like well defined procedure to report storage problems
- briefly tested tcx053 (no problems seen)
- should formulate CPU and storage request for NAF in 2010/2011
ILC report
LHCb report
- No changes since day 1, except:
- REAL data are there
- The total data size has increased
- We are in general happy with NAF, but...
- dCache access a bottleneck (maybe limit number of parallel jobs per user?)
- problem with very slow data transfer: NAF -> Heidelberg
- need more Lustre space
New Action Items:
- develop CPU, dCache and Lustre resource requirements
- download queue
- reboot procedure
- form for storage problem reports
- accounting ideas via e-mail or next meeting or extra meeting
- dCache upgrade time line
Next meeting
- Wednesday 14 July 2010