Minutes of NAF User Committee meeting from 12.11.2008
-----------------------------------------------------

Face-to-face in Hamburg.

Present: Steve Aplin (ILC, first 30 minutes), Johan Blouw (LHCb), Wolfgang Ehrenfeld (ATLAS), Andreas Gellrich (NAF), Andreas Haupt (IT), Kai Leffhalm (NAF), Birgit Lewendel (IT), Hartmut Stadie (CMS), Jan Erik Sundermann (ATLAS), Alexey Zhelezov (LHCb), Yves Kemp (IT)
Phone: Carsten Hof (CMS)

1. News from the chair

General announcements:
Steve Aplin (DESY) and Niels Meyer (DESY) will represent the ILC interests in the NAF User Committee from now on.
For 2009 the meeting date and time will be re-evaluated. Wolfgang Ehrenfeld will set up a Doodle poll. Preferred days are Tuesday or Thursday.

From Grid board:
The Grid board will discuss the idea of a UserAnalysis@NAF workshop in the first half of next year at the next meeting. ATLAS and CMS showed interest, CMS in particular in more general topics such as efficient ROOT usage. LHCb has no interest. ILC has no opinion at the moment.

From analysis centre:
In the MC group a workshop on Grid Tools/Monte Carlo is being discussed. ILC showed some interest.

From experiments:
ATLAS requested another work group server. The usage of the existing ones is high and compilation is sometimes extremely slow. The causes and possible solutions were discussed extensively. More work group servers will not solve all problems. qrsh might be an option for running interactive applications on the batch system. The NAF admins should investigate whether the slow compile time is AFS related and whether this can be fixed.

2. Action items

Positive/negative list of gsidcap support in experiment-specific applications:

ATLAS: athena can use gsidcap, except for release 12, for which AODs will still be around for another few months. ganga can be changed to use gsidcap as the default protocol.
CMS: not tested.
ILC: no input.
LHCb: no problems with the software, but LHCb does not want it. It is more complex for the user, as a valid proxy is needed.
gsidcap is also slower than dcap.

It was noted that security is important, but usability is equally important. For example, LHCb does not need a valid proxy on the NAF at all at the moment, whereas one is required for gsidcap access to the SE. Automatic extension of the grid proxy is a very important requirement for a smooth and user-friendly transition from dcap to gsidcap access. The NAF admins are looking into this.

3. Operations

Report given by Kai Leffhalm. Slides are on the indico page.

All experiments should have an email list of the form naf-VO-support@desy.de, through which the NAF admins can contact the experiment admins, e.g. if /scratch is full. Most experiments have this already. For the rest it will be created.

The experiments complained that their admins are not informed about software updates. After some discussion the following policy was agreed:
- important policy changes should be announced to the users well in time
- important software updates will be announced to the experiment admins
- all software updates should be documented, preferably on a web page

For the NAF admins a monthly maintenance slot is sometimes too far away. After some discussion a weekly slot was agreed on: maintenance can be done every Thursday between 8am and 10am. It should be announced well in advance. Please note that this is an optional maintenance slot, which will usually not be used every week. Urgent matters, e.g. fixing security issues, are not affected by this and will be dealt with as soon as possible. This can also affect standard operations.

The batch monitoring web page was presented, which gives users a lot of information about their own jobs and the load in general. It is unclear whether the nominal capacity is static or dynamic. This should be checked by the NAF admins.

An automatic monitoring of the /scratch usage with email notification of the experiment admins would be useful. The NAF admins are looking into this.

The usage of grid proxies on the batch system was discussed.
In general a grid proxy is not needed for most of the work at the NAF. The grid proxy is written to the user's home directory, from where it can be read by batch jobs. Any change to the grid proxy will influence batch jobs that depend on it. On the other hand, the grid proxy can be extended in this way for a running job.

4. Batch queues

Andreas Haupt explained some internals of the SGE scheduler. For effective usage of the batch resources and fast scheduling, users should specify the needed resources as precisely as possible: CPU time and memory usage. A default of 2 GB per job is okay, but requesting less memory can result in faster scheduling. The batch monitoring pages display, among other information, the requested and used resources of user jobs.

Different options for a default CPU time (h_cpu) were discussed, but the old default behaviour is favoured: 48h. ATLAS requested 20-30 slots or 5% of all slots with a running time of 1 week.

The NAF admins will compile some documentation on how to interpret the exit and failure codes of the jobs. It was suggested that it should be made much clearer when a job is killed because of a CPU or memory limit, either on the batch monitoring pages, in the log file or in the mail notification.

5. afs_admin script

Andreas Haupt gave an introduction to the afs_admin script and handed out a printed HOWTO. This should be made available on the web for the experiment admins. An introduction to the AFS volume naming policy and recommendations should be given in one of the next meetings or supplied on the web. An automatic monitoring of the AFS volume usage would be useful, especially for volume replication.

6. Account registration web service

The account registration web service (front end) is functional. An official web address is missing and will be set up by the DESY web office.

7. LHCb status report

Alexey Zhelezov gave a detailed report. Slides are on the indico page.
Besides the two Heidelberg groups, the Dortmund group is also well integrated into the LHCb NAF activities. All data should be at FZK T1 and imported from there. This data distribution scheme is not working at the moment, hence data is imported from everywhere.

8. CMS status report

Postponed to one of the next meetings.

9. ILC status report

Postponed to one of the next meetings.

10. ATLAS status report

Postponed to one of the next meetings.

11. NAF User Meeting at Aachen

Topics and format of the NAF User Meeting at Aachen were discussed: batch system, storage (dCache: gsidcap/dcap, Lustre), support.
A group from ATLAS Bonn asked to present their use case at the meeting. There is time for two more use cases.

12. AOB

Nothing.
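
Note on item 4 (Batch queues): the recommendation to request CPU time and memory explicitly could look roughly like the sketch below. It only assembles an SGE qsub command line and submits nothing. h_cpu is the limit named in the minutes. Using h_vmem as the memory resource is an assumption about the site configuration, and the helper build_qsub_command and the script name analysis_job.sh are hypothetical.

```python
import shlex


def build_qsub_command(script, cpu_hours=48, memory_gb=2.0):
    """Build an SGE qsub command line with explicit resource requests.

    cpu_hours maps to the h_cpu limit mentioned in the minutes.
    memory_gb maps to h_vmem, which is an assumption about the site setup.
    """
    if cpu_hours <= 0 or memory_gb <= 0:
        raise ValueError("resource requests must be positive")
    # SGE accepts time limits in the form HH:MM:SS; whole hours only here.
    h_cpu = f"{int(cpu_hours)}:00:00"
    # Express memory in megabytes so that fractional gigabytes are possible.
    h_vmem = f"{int(round(memory_gb * 1024))}M"
    return ["qsub", "-l", f"h_cpu={h_cpu}", "-l", f"h_vmem={h_vmem}", script]


if __name__ == "__main__":
    # Hypothetical job script; the command is printed, not executed.
    cmd = build_qsub_command("analysis_job.sh", cpu_hours=4, memory_gb=1)
    print(shlex.join(cmd))
```

The defaults mirror the values discussed in the meeting (48h CPU time, 2 GB per job). A job that needs less would pass smaller values, which according to the discussion can lead to faster scheduling.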