Minutes of NAF User Committee meeting from 08.07.2009
-----------------------------------------------------

Present/Phone: Wolfgang Ehrenfeld (ATLAS), Andreas Gellrich (NAF), Andreas Haupt (IT), Kai Leffhalm (NAF), Niels Meyer (ILC), Hartmut Stadie (CMS), Jan Erik Sundermann (ATLAS)

1. News from the chair:

Last week the 'Detector Understanding first First Data' workshop took place in Hamburg. The tutorials for ATLAS and CMS ran very smoothly on the NAF. Thanks to the NAF admins for all their help in achieving this.

2. Action items:

Only items with updates are listed; see the action item list for the complete list of items.

- ATLAS cmt compile problem

  During the detector understanding workshop the ATLAS software installation was mirrored to the local disc of all ATLAS work group servers. This worked very well and scaled well for many people working and compiling at the same time. ATLAS will discuss with the NAF admins the prospects of having part of the ATLAS software permanently installed on the local disc of each work group server. CMS suggested increasing the AFS cache so that a full release fits in the cache.

- NAF SL5 migration

  ATLAS and CMS have no new updates from their testing of SL5 software compatibility. ILC testing is ongoing; some problems were discovered and fixed. CERN will migrate their lxplus and lxbatch services sometime this summer, but details are not available.

  After some discussion the NUC agreed that 90% of the batch resources will be switched to SL5 shortly after the next NUC meeting in August. If, contrary to expectations, major concerns from users arise, the migration plan might be altered at the next NUC meeting. The NUC chair will inform the users and ask them again to test their software under SL5.

  The NAF admins will inform the users when the default OS for batch job submission is changed from SL4 to SL5. This will happen when around 50% of the resources are migrated. Every experiment now has a dedicated SL5 work group server.
- batch monitoring plots: waiting time by experiment

  Some improvements have been added to the batch monitoring jobs, e.g. waiting time is now split up by experiment. See the last page of the status report. Hartmut's list of suggestions was longer, and he will discuss the details with Kai.

3. Status report:

The status report was given by Kai. See the agenda for the report. A few highlights from the discussion are listed below:

Currently the Lustre server has some problems. A reboot usually helps. The problem is under investigation by the NAF admins, and the restart of the service is sometimes delayed to allow further investigation. Hartmut pointed out that important user data is stored on Lustre, so a disruption of the Lustre service has the same impact as a disruption of the AFS service. A quick restart of the Lustre service is encouraged.

Wolfgang proposed that the NAF admins inform at least the NUC or the support teams when there is repeated loss of a service. Not all users report operational problems to the NAF helpdesk; some complain to their experiment's support instead.

4. AOB:

- SL4 and SL5 have the same sysname. A different one for SL5 is essential for proper software installation for SL4 and SL5. The NAF admins will report back on this issue well before the next meeting.

- Hartmut said that it is difficult to keep the users up to date about new software tools at the NAF and suggested a regular newsletter to explain them. An alternative is the use of motd to inform users. The NAF admins pointed to the news section on the NAF web, which covers some aspects of user information. The NAF admins should reevaluate the user information and optimise it if necessary.

- Wolfgang suggested enabling the automatic reply feature of the ticketing system (naf-helpdesk@desy.de and naf@desy.de). This would give the user/admin a ticket number to reply to in case of no response. The NAF admins will investigate whether this is possible.