Minutes of NAF User Committee meeting from 14.01.2009
-----------------------------------------------------

Present/Phone: Steve Aplin (ILC), Johan Blouw (LHCb), Wolfgang Ehrenfeld (ATLAS), Andreas Gellrich (NAF), Andreas Haupt (IT), Carsten Hof (CMS), Yves Kemp (IT), Kai Leffhalm (NAF), Birgit Lewendel (IT), Niels Meyer (ILC), Hartmut Stadie (CMS), Jan Erik Sundermann (ATLAS)

1. News from the chair

No news from the Grid Project Board or the Analysis Centre. The attendees of the Grid Project Workshop in Munich gave a short summary of its NAF-related part. A few questions arose from this summary and were discussed:

- The main focus of the interactive NAF is data analysis, but collision data from the LHC is only expected at the end of this year. It is good that users are working on the NAF to test the setup and improve their workflows, but one should not forget that working on the Grid is a main ingredient of all experiments' computing models. Still, it is up to the experiments what they do with their resources.
- If the main focus of the interactive NAF is interactive work, which needs fast response times, this contradicts a 100% workload. The batch system can be, and already is, tuned to give a good compromise between fast response and workload.
- The batch system setup is already tuned for the fast response needed for interactive and PROOF work. Further tuning can and should be done, but needs use cases and active users; the latter are not available at the moment.
- The accounting of jobs requesting large amounts of memory was discussed. Currently only CPU time enters the accounting, while the requested memory is included in the scheduling of further jobs. This means a user requesting large memory resources gets a penalty at job submission. On the other hand, if such a user blocks some cores because of insufficient memory, the blocked cores or CPU time are not accounted for.
- In case users feel they are not getting the resources they should, or that jobs are not scheduled fast enough, they should contact naf-helpdesk@desy.de with details.

2. Action items

gsidcap: When the NAF admins have a working automatic grid proxy renewal service, this action item will be reopened and a migration plan from dcap to gsidcap will be discussed. A test version might be available within one or two months. Closed.

Batch capacity monitoring: Wolfgang Friebel has added the number of available cores to the NAF batch monitoring pages. Closed.

ILC user documentation: ILC has added a top page to naf.desy.de and added some documentation. Closed.

Action items with no update:
- NAF: document software updates/changes
- Wolfgang Ehrenfeld: proposal for a software documentation web page
- ATLAS: update on the cmt compile problem (CERN)

3. Status report

The status report was given by Yves; see the agenda for the report. A few highlights from the discussion are listed below:

/tmp on work group servers: The /tmp areas of the ATLAS work group servers are filling up rapidly. This cannot be handled by the NAF admins alone. If a /tmp area is filled up to 90%, the NAF VO support should be contacted and the problem investigated together, as the NAF VO support does not have full access to all files/directories on the work group servers. Also, the grace time of files in /tmp will be reduced from 10 to 7 days. Yves will find out the details of the clean-up script and inform the NUC.

SL5: At the moment there is no automatic installation of Grid WNs for SL5. DESY will wait for this before eventually migrating the Grid WNs. When the Grid WNs are migrated, the NUC will discuss the migration of the NAF interactive machines. It looks like all experiments can run binaries on SL5, so no problem is expected in migrating the interactive batch nodes. For code compilation, SL4 might still be needed by the experiments.
This situation will be reevaluated when the Grid is ready for migration. All experiments should check whether the NAF SL5 setup works for their software.

4. LHCb requirement paper

The main parts of the LHCb requirements paper were presented by Johan. It had already been discussed with the NAF admins, who do not see any technical or financial problems. Birgit pointed out that the Lustre space is scratch space; if higher reliability for user data is needed, LHCb should consider supplying dCache user space. LHCb does not see the need at the moment.

5. Account changes

This topic was already discussed in the last NUC meeting, but not in much technical detail. Andreas H, Kai, Wolfgang and Yves will prepare instructions on how setting up additional accounts should be handled. This will be discussed in the next NUC meeting.

6. Batch monitoring pages

So far the information for the interactive batch monitoring is only collected when a job has finished. This implies that the workload plots for the last 7 days can still change. These plots are meant for monitoring the overall efficiency, not the current status. Wolfgang Friebel is working on monitoring the current status of the batch system. All experiments are fine with the current features, although LHCb has not really looked at them. The interactive batch monitoring allows users to see their own jobs. The NAF VO support should also be able to see the monitoring information of the users belonging to their VO. Wolfgang will follow up on this with Wolfgang Friebel.

7. AOB

Root access to the VO work group servers was discussed. For example, the NAF VO support cannot see all directories in /tmp and therefore cannot help to find the cause of a full /tmp. General root access will not be given, but the NAF admins should think about sudo access or other solutions.

The GridKa school will be held from 31.8. to 4.9.09 at Karlsruhe. Please inform your users.
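As a purely illustrative sketch of the /tmp grace-time policy mentioned in item 3 (the details of the actual NAF clean-up script were still to be confirmed by Yves), a plain mtime-based sweep with the reduced 7-day grace period could look like this; the `TMP_AREA` parameter and the dry-run default are assumptions, not the NAF implementation:

```shell
#!/bin/sh
# Hypothetical /tmp grace-time clean-up sketch, NOT the actual NAF script.
GRACE_DAYS=7            # grace time reduced from 10 to 7 days per the meeting
TMP_AREA="${1:-/tmp}"   # work group server /tmp area to sweep

# Dry run: list regular files not modified within the grace period,
# staying on the same filesystem (-xdev).
find "$TMP_AREA" -xdev -type f -mtime +"$GRACE_DAYS" -print

# Actual deletion would replace -print with -delete:
# find "$TMP_AREA" -xdev -type f -mtime +"$GRACE_DAYS" -delete
```

Running it without the final `-delete` first lets admins and VO support review together which files would be removed.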