Minutes of the NAF User Committee meeting, 9.12.2009
----------------------------------------------------

Present: Steve Aplin (ILC), Wolfgang Ehrenfeld (ATLAS), Andreas Gellrich (NAF),
         Yves Kemp (IT), Kai Leffhalm (NAF), Birgit Lewendel (IT),
         Angela Lucaci-Timoce (ILC), Andreas Nowack (CMS)
Excused: Jan Erik Sundermann (ATLAS), Hartmut Stadie (CMS)

1. News from the chair:

Nothing to report.

2. Status report:

The status report was given by Yves; see the agenda for the report. A few
highlights from the discussion:

- Lustre
  On the Zeuthen Lustre instance quotas are enabled. This allows a fast "du"
  even if no quota limits are set. Admins can and will see the quotas of their
  experiment, which avoids the problem of users locking out their admins via
  plain Unix ACLs. For the Hamburg Lustre instance quotas will be enabled at
  the next downtime, sometime in January.
  Automatic clean-up of scratch space, as requested by ATLAS, can be done with
  something similar to tmpwatch. Performance studies for such a large space
  can only be done in real life. For ATLAS it is not enough to know only the
  biggest users: many small users can also add up to a considerable amount of
  space, and one needs to know whether files are still in use.

- Batch monitoring plots
  The monitoring plots have been updated. For short-running jobs (below 15
  minutes) the plot of the waiting time is fixed. In addition, the ratio of
  used to requested memory is now shown.
  Monitoring of multi-core jobs is not working at the moment because of a bug
  in SGE. This might be fixed in the next version, which will be installed at
  the next downtime in January.

- Batch memory limit
  The default memory limit of the batch system will be raised to 1.5 GB. CMS
  should comment on whether this is sufficient for their needs. No swap space
  will be added to the worker nodes.

- Group profiles
  The NAF is not in favour of group profiles, as changes would be made by the
  experiments and not by the NAF administrators. The experiments are in favour
  of them as a way to customise the user environment.

- SL5 migration
  ATLAS has fixed all their problems with SL5 and therefore no longer needs
  the SL4 worker nodes. The remaining SL4 worker nodes will be migrated to SL5
  in week 51. ATLAS and CMS both need SL4 and SL5 work group servers. The
  migration of the login nodes to SL5 will be discussed in the January meeting.

- Slow network connections
  LHCb complained that the network connection between the NAF and Heidelberg
  is slow. This is due to the very small buffers in ssh/scp. Newer ssh
  versions perform better but are not available at the NAF. The NAF admins are
  investigating whether buffer-size tuning of ssh is possible. The preferred
  solution is transfer via dCache, although feedback from the NAF User Meeting
  showed that users prefer scp over dCache export because of the faster
  handling and better familiarity. LHCb should follow up if they are not
  satisfied.

- Resource monitoring
  The NAF web interface will in future contain a pull-down menu with the
  available institutes. The institute is needed for regional information about
  users. The registry should contain the same pull-down menu in order to avoid
  different spellings of one institute.

3. Action items:

Only items with updates besides the status report are listed here; see the
action item list for the complete list of items.

- Monitoring for batch jobs: used/requested memory
  Done.
- Documentation for multi-core batch jobs
  Done.
- Monitoring plots for 15-minute jobs still not perfect
  Done.

4. AOB:

- SL5/64-bit Grid UI: gLite 3.2 is installed. The NAF admins will set up ini
  and inform the experiments.
- System upgrades: ATLAS had some problems with minor system upgrades. For
  future system upgrades the NAF admins will provide a test system for the
  experiments and allow one to two weeks of testing before rolling out the
  upgrade.
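
Note: the tmpwatch-style clean-up of scratch space mentioned under the Lustre
item could be sketched roughly as below. This is a minimal illustration only,
not the actual NAF tooling; the function name, root path, and age threshold
are hypothetical, and any real sweep would be run by the admins (a dry run
first) rather than by users.

```python
import os
import time

def clean_scratch(root, max_age_days=14, dry_run=True):
    """List (and optionally remove) files under `root` whose last access
    time is older than `max_age_days` days -- a tmpwatch-like sweep.
    Hypothetical sketch; thresholds and paths are illustrative only."""
    cutoff = time.time() - max_age_days * 86400
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    if not dry_run:
                        os.remove(path)
                    candidates.append(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return candidates
```

With dry_run=True the function only reports candidates, which matches the
point raised in the discussion: before deleting anything one needs to know
whether files are still in use, and access times are only meaningful if the
filesystem actually records them.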