NUC 14.07.2010

Attendance:

Wolfgang Ehrenfeld, Andreas Gellrich, Yves Kemp, Steve Aplin

News

NAF status

- kswapd problem: seen at other sites, but vague; waiting for new kernel

- InfiniBand errors: will upgrade firmware in blades, might need downtime for big switch upgrade

- one shared SL4 node, WN reorganization; ATLAS and CMS will discuss SL4 usage with their users

- Lustre additions

- Lustre group quotas possible; these guarantee space for a given directory (new action item: set up groups)

- template for storage problem reports presented and accepted

- downtime tomorrow

- longer downtime in September?

- request: show helpdesk tickets of the last months (new action item)

Existing Action Items

old-1 NAF/ATLAS CMT problem Wolfgang and Stephan Wiesand looked into it, problem also seen on Grid
1005-1 NAF present SL4 work group server usage show usage at next NUC, open as long as we have a need for SL4
1005-2 ATLAS/CMS plans for SL4 work group servers reduce to one shared SL4 IN (done), talk to users, open as long as we have a need for SL4
1005-3 NAF email notification for /scratch monitoring (instance full) person away, still open
1005-4 NAF technical constraints for Lustre extension might end in 1007-1
1005-5 NAF automatic Lustre space clean-up expert meeting still foreseen
1006-1 NUC CPU, dCache, Lustre resource requirements needed for next PRC
1006-2 NAF SGE queue for downloads/IO? first ideas dismissed
1006-4 NAF form for storage report on NAF web draft proposed, add to FAQ, review it
1006-3 NAF/ATLAS reboot procedure for (ATLAS) WGS done
1006-5 NUC ideas for accounting by e-mail or next NUC or extra meeting, no progress
1006-6 NAF dCache upgrade time line (10 GE in HH) 30% done

ATLAS report

- account cleanups

- new accounts

- missing .OldFiles for early ATLAS users

- Login problems (9 June, 12 July (DESY AFS server))

- Lustre problems on tcx050 (21 June)

- High load problem (tcx080 - 28 June, tcx060 - 6 July)

- Host unavailable (tcx080 - 6 July)

- Lustre full (ZN, 13 July)

- Problems with iLumiCalc and DB (2x, not working)

- ATLAS software question

- ganga problem with old release (not fixed)

- ganga setup problem with oversized sandbox (NAF config in work)

- ATLAS storage question (2x)

- gsidcap problem in athena

- software installation problems (15.6.11)

- Files written to non ST (user informed, need some follow up on ATLAS documentation)

- dCache problem (doors overloaded, user informed and advised)

- Grid: large output sandboxes at DESY-HH (9 July, affected other jobs?)

CMS report

ILC report

LHCb report

Resource planning/NAF performance

There was a lively discussion on both topics. The consensus was that we should get feedback from our users now that the LHC experiments have first data to analyze.

New Action Items:

1007-1 NAF set up Lustre groups for ATLAS, CMS, ILC
1007-2 NAF report on helpdesk tickets at every NUC
1007-3 ATLAS/CMS/LHCb ICHEP review; get feedback

Next meeting

- Wednesday, August 11 2010, 1 am