Minutes of NAF User Committee meeting from 10.3.2010
----------------------------------------------------

Present: Steve Aplin (ILC), Marcello Barisonzi (ATLAS), Wolfgang Ehrenfeld (ATLAS), Andreas Haupt (IT), Yves Kemp (IT), Kai Leffhalm (NAF), Hartmut Stadie (CMS), Alexey Zhelezov (LHCb)

Excused: Johan Blouw (LHCb), Andreas Gellrich (NAF), Andreas Nowak (CMS)

1. News from the chair:

Marcello Barisonzi (Wuppertal) replaces Jan Erik Sundermann (Freiburg) from now on as one of the two ATLAS members.

In the February ATLAS-D group leader meeting, complaints were raised that the NAF downtimes in January and February seemed quite long and frequent. This needs to change, and will change, now that the LHC is running again.

2. Status report:

The status report was given by Kai. See the agenda for the report. A few highlights from the discussion are listed below:

- February downtime
  This was the last major downtime for a longer period. All infrastructure is fully deployed.
- Work group server reboot to fix Lustre bug
  All work group servers need a reboot to fix one Lustre bug. The current status of all work group servers is not fully clear. Better coordination is needed now and in the future. The NAF admins will follow up offline with the experiments.
- Batch Fairshare
  After hardware additions by individual groups, the fairshare was recalculated: ATLAS 25.7%, CMS 43.6%, ILC 4.2%, LHCb 26.5%
- ATLAS CMT Problem
  The NAF is investigating another AFS patch. Wolfgang had already suggested a different AFS client tuning (the CERN one) to the NAF and is waiting for a reply once the expert is back from vacation.
- Deletion on /scratch
  The NAF is investigating how to get file metadata (filename and last access time) directly from Lustre. At the moment there is no estimate of when a ready-to-use tool will be available. ATLAS explicitly asked for a better time estimate within one week. If the tool cannot be delivered within a short time period, ATLAS asks for another temporary solution.
- Automatic AFS scratch creation
  First, the platform adaptor needs to be extended to create the AFS scratch for every new account. Afterwards, the AFS scratch can be created for the already existing users, starting with the trivial cases and then the more complex ones. The work on the platform adaptor has not started yet. Further, new hardware needs to be ordered. The experiments have asked for a faster implementation of the whole project and would like to see more detailed planning. To start with, 2 GB of AFS scratch would be sufficient. For example, CMS has already created the space per user, but has not mounted it into the home directory. ATLAS would like to get this finished as soon as possible. For ILC this is not very urgent, but it would be a good addition to the Lustre space.
- Multicore Job Monitoring
  Multicore jobs are used only by CMS at the moment. If this changes, the given numbers should be split up by experiment.

3. Action items:

The following action items have been closed:

  NAF: recalculate batch fair share after addition of new hardware
  NAF: gLite 3.2 (documentation)
  ALL: NAF SL5 migration (running)
       SL5 migration is done for ILC and LHCb. ATLAS and CMS still have SL4 work group servers. They will contact the NAF when these are no longer needed.
  NAF: advice on user files storage (code development, small log files)

For updates on the following items, see the status report and the comments within these minutes:

  NAF: check whether SGE allows experiment admins to change user priorities within an experiment
  ALL: documentation (software updates, downtimes, ...) and user information (newsletter, motd, news section on NAF web, ...)
  NAF/ATLAS: CMT problem (NAF invite experts/ATLAS)
  NAF: AFS scratch space creation
  NAF: deletion model for /scratch
  NAF: multicore batch job monitoring

4. Lustre Usage

Again, the experiments presented their envisioned use cases for the Lustre /scratch space. Not all of the features are implemented yet.
The NAF should use this information to present some possible implementations for discussion at the next meeting.

5. Documentation

Documentation and user information need to be improved. The web pages are already quite good. One suggestion is to gain more from the problems discussed on the experiments' support lists. Every experiment should summarise the questions and problems from the previous month at the current meeting. Common questions and problems should then be added to the NAF FAQ.

Some techniques (motd, email, Twitter, ...) were discussed for better informing users of changes at the NAF. The NAF and NUC members will test Twitter and evaluate whether it is an adequate communication channel.

6. AOB:

Nothing.