NUC 12.01.2011
Attendance:
Hartmut Stadie, Andreas Gellrich, Kai Leffhalm, Andreas Haupt, Steve Aplin, Yves Kemp, Wolfgang Ehrenfeld, Marcello Barisonzi, Alexey Zhelezov, Shaojun Lu
guests: Martin Gasthuber, Birgit Lewendel
News
NAF status
- nothing special (see discussion topics for current problems)
Existing Action Items
id | who | description | status/comments |
old-1 | NAF/ATLAS | CMT problem | need some final tests for RPM install, look into CVMFS |
1005-3 | NAF | email notification for /scratch monitoring (instance full) | nearly done |
1006-6 | NAF | dCache upgrade time line (10 GE in HH) | nearly done |
1007-1 | NAF | setup Lustre groups for ATLAS, CMS, ILC | nothing new |
1007-2 | NAF | report on help desk tickets | next time |
1010-1 | NUC | face-to-face meeting | closed |
1010-2 | NUC | resource requests | see below |
1010-4 | NAF | statistics on multicore/PROOF usage | next time |
1012-1 | NAF | evaluate different stdout/stderr handling for SGE | see below |
1012-2 | NAF | report on CERN strategy for batch and AFS | LSF setup copies log files back at the end; similar setup to NAF, but CERN uses many more file servers |
1012-3 | NAF | new fair-share after hardware upgrade | will be done, once HEPspec tests are run |
1012-4 | ILC&NAF | develop strategy to manage Lustre space | in discussion phase |
ATLAS report
- used for various activities
- concern about login and load balancing; Andreas H.: this is fixed
- work group servers: problem with high load
- AFS space sufficient
- need a version of the naf_token script for Ubuntu and Mac OS X
- Lustre administration is difficult
- dCache: work with dCache admins to optimize setup and throughput
- support: entry points, request tracker for support list
- show status of components on web?
- motd? YK: can be implemented via the group profile
- CMT problem: software can be installed on WGS via RPM; look into the number of AFS reads
- NX/VNC: will be requested by ATLAS
CMS report
- usage similar to ATLAS
- official data sets used so far; might have to switch to ntuples
- primary data set sizes (5 TB per data set)
- switch to secondary/tertiary data sets
- SGE/AFS problem
ILC/CALICE report
- perfectly happy; not much usage right now. Next big activity: prepare report for mid-2012
LHCb report
- slow scp speed between Hd and NAF: better with SL5 in Hd
- limit on number of parallel jobs per user: can be done via an SGE consumable
- also looking into CernVMFS for software
Discussions
solutions for SGE and AFS interplay
- slides from Andreas
- solutions:
  - new job submission verifier; to test: qsub -jsv ~finnern/public/jsv.sh (please test soon)
  - more AFS servers installed (HH: 2 -> 6); volumes split across different servers
- future of SGE: SGE is no longer freely supported by Oracle
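For testing the new job submission verifier, the qsub call quoted above can be wrapped in a small script. This is only a minimal sketch, assuming Python is available on the submit host. The helper names build_qsub_command and submit and the job script name my_job.sh are hypothetical; only the -jsv option and the verifier path are taken from the notes.

```python
import os
import subprocess

# Path of the test verifier as quoted in the minutes.
JSV_PATH = "~finnern/public/jsv.sh"


def build_qsub_command(job_script, jsv_path=JSV_PATH):
    """Return the qsub argument list that submits job_script through the verifier."""
    # subprocess does not start a shell, so "~user" has to be expanded here.
    return ["qsub", "-jsv", os.path.expanduser(jsv_path), job_script]


def submit(job_script, dry_run=True):
    """Print the command (dry run) or hand it to qsub on a host with SGE."""
    cmd = build_qsub_command(job_script)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # "my_job.sh" is a hypothetical job script used for illustration.
    submit("my_job.sh", dry_run=True)
```

With dry_run=True nothing is submitted; the command line is only printed so it can be checked before a real test.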
upgrade plans for 2011/12
- slides from Yves
- replace login-vm nodes; worker nodes (wn): replace 64*8 cores by new 48*12 cores
- storage: first estimate is to add 40% of the pledge on top
- Martin Gasthuber: numbers are needed, per quarter, on the required disk space and on IO rates for sequential and random access
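The effect of the worker-node replacement on the core count can be checked with a little arithmetic. This is a minimal sketch that reads "64*8" and "48*12" as number of nodes times cores per node, which is an assumption; the notes do not spell it out. It compares core counts only and says nothing about per-core performance.

```python
def total_cores(nodes, cores_per_node):
    """Total number of cores for a set of identical nodes."""
    return nodes * cores_per_node


old = total_cores(64, 8)    # current worker nodes
new = total_cores(48, 12)   # planned replacement
gain = new - old
relative = gain / old

print(f"old: {old} cores, new: {new} cores, change: {gain:+d} ({relative:+.1%})")
# prints: old: 512 cores, new: 576 cores, change: +64 (+12.5%)
```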
Lustre: status and future
- MG:
- Lustre future is unclear
- looking into alternatives
- Do we need a fast shareable FS on the NAF?
- ATLAS estimate: 250 TB at 6 GB/s (other estimate: 10-20 GB/s) from ROOT ntuples
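To put the quoted rates in perspective, one can estimate how long a single sequential pass over the full volume would take. This is a minimal sketch, assuming decimal units (1 TB = 1000 GB) and that the quoted figures are sustained aggregate read rates; neither assumption is stated in the notes.

```python
def hours_for_full_pass(volume_tb, rate_gb_per_s):
    """Hours needed to read volume_tb once at a sustained rate_gb_per_s."""
    seconds = volume_tb * 1000 / rate_gb_per_s  # 1 TB = 1000 GB (assumed)
    return seconds / 3600


for rate in (6, 10, 20):
    print(f"{rate:>2} GB/s: {hours_for_full_pass(250, rate):.1f} h")
# prints:
#  6 GB/s: 11.6 h
# 10 GB/s: 6.9 h
# 20 GB/s: 3.5 h
```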
AOB
New Action Items:
1101-1 | NAF/experiments | WGS reliability/support/monitoring | |
1101-2 | AH/WE/HS | naf_token for Ubuntu and Mac OS X | |
1101-3 | NAF | revisit: user information/Twitter | |
1101-4 | NAF/CMS/ATLAS | move support list to request tracker? | notify YK via mail |
1101-5 | ATLAS | written request for NX | |
1101-6 | NAF/experiments | do we need a faster x-connection? | |
1101-7 | NAF/LHCb | limit number of concurrently running jobs | |
1101-8 | experiments | test new job submission verifier | |
Next meeting
- Wednesday, February 9th 2011, 1 pm