Minutes of the NAF User Committee meeting, 9.12.2009
----------------------------------------------------

Present: Steve Aplin (ILC), Wolfgang Ehrenfeld (ATLAS), Andreas Gellrich (NAF),
         Yves Kemp (IT), Kai Leffhalm (NAF), Birgit Lewendel (IT),
         Angela Lucaci-Timoce (ILC), Andreas Nowack (CMS)
Excused: Jan Erik Sundermann (ATLAS), Hartmut Stadie (CMS)

1. News from the chair:

Nothing to report.

2. Status report:

The status report was given by Yves; see the agenda for the report. A few
highlights from the discussion:

- Lustre
  On the Zeuthen Lustre instance quotas are enabled. This allows a fast "du"
  even if no quota limits are set. Admins can and will see the quotas of their
  experiment, which avoids the problem of users locking out their admins via
  plain Unix ACLs. For the Hamburg Lustre instance quotas will be enabled at
  the next downtime, sometime in January.
  Automatic clean-up of scratch space, as requested by ATLAS, can be done with
  something similar to tmpwatch. Performance studies for such a large space
  can only be done in real life. For ATLAS it is not enough to know only the
  biggest users: many small users can also add up to a considerable amount of
  space, and one needs to know whether files are still in use.

- Batch monitoring plots
  The monitoring plots have been updated. For short-running jobs (below 15
  minutes) the plot of the waiting time is fixed. In addition, the ratio of
  used to requested memory is now shown.
  Monitoring of multi-core jobs is not working at the moment because of a bug
  in SGE. This might be fixed in the next version, which will be installed at
  the next downtime in January.

- Batch memory limit
  The default memory limit of the batch system will be raised to 1.5 GB. CMS
  should comment on whether this is sufficient for their needs. No swap space
  will be added to the worker nodes.

- Group profiles
  The NAF is not in favour of group profiles, as changes would be made by the
  experiments and not by the NAF administrators. The experiments are in favour
  of them as a way to customise the user environment.

- SL5 migration
  ATLAS has fixed all their problems with SL5 and therefore no longer needs
  the SL4 worker nodes. The remaining SL4 worker nodes will be migrated to SL5
  in week 51. ATLAS and CMS both need SL4 and SL5 work group servers. The
  migration of the login nodes to SL5 will be discussed in the January meeting.

- Slow network connections
  LHCb complained that the network connection between the NAF and Heidelberg
  is slow. This is due to the very small buffers in ssh/scp. Newer ssh
  versions perform better but are not available at the NAF. The NAF admins are
  investigating whether buffer-size tuning of ssh is possible. The preferred
  solution is transfer via dCache, although feedback from the NAF User Meeting
  showed that users prefer scp over dCache export because of the faster
  handling and better familiarity. LHCb should follow up if they are not
  satisfied.

- Resource monitoring
  The NAF web interface will in future contain a pull-down menu with the
  available institutes. The institute is needed for regional information about
  users. The registry should contain the same pull-down menu in order to avoid
  different spellings of one institute.

3. Action items:

Only items with updates besides the status report are listed here; see the
action item list for the complete list of items.

- Monitoring for batch jobs: used/requested memory
  Done.
- Documentation for multi-core batch jobs
  Done.
- Monitoring plots for 15-minute jobs still not perfect
  Done.

4. AOB:

- SL5/64-bit Grid UI: gLite 3.2 is installed. The NAF admins will set up ini
  and inform the experiments.
- System upgrades: ATLAS had some problems with minor system upgrades. For
  future system upgrades the NAF admins will provide a test system for the
  experiments and allow one to two weeks of testing before rolling out the
  upgrade.
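
Note: the tmpwatch-style clean-up of scratch space mentioned under the Lustre
item could be sketched roughly as below. This is a minimal illustration only,
not the actual NAF tooling; the function name, root path, and age threshold
are hypothetical, and any real sweep would be run by the admins (a dry run
first) rather than by users.

```python
import os
import time

def clean_scratch(root, max_age_days=14, dry_run=True):
    """List (and optionally remove) files under `root` whose last access
    time is older than `max_age_days` days -- a tmpwatch-like sweep.
    Hypothetical sketch; thresholds and paths are illustrative only."""
    cutoff = time.time() - max_age_days * 86400
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    if not dry_run:
                        os.remove(path)
                    candidates.append(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return candidates
```

With dry_run=True the function only reports candidates, which matches the
point raised in the discussion: before deleting anything one needs to know
whether files are still in use, and access times are only meaningful if the
filesystem actually records them.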