This recipe briefly explains the LOFAR cosmic-ray analysis pipeline from a practical user's perspective.
This recipe assumes that you have a working installation of the LOFAR user software in the directory pointed to by the bash environment variable $LOFARSOFT. It also assumes that you have an svn account for the LOFAR repository.
On the coma cluster the software is located in /vol/optcoma/lofarsoft. This directory contains a checkout of the subversion repository http://usg.lofar.org/svn/code/trunk and was built by running make pycrtools in the build subdirectory. To update the software, simply run svn up in this directory and repeat the make.
The PyCRTools source code is located in the $LOFARSOFT/src/PyCRTools subdirectory and this recipe assumes that the $PYCRTOOLS environment variable points to this directory.
The $PYCRTOOLS directory contains several files and directories that are important for the cosmic-ray pipeline. They require a certain directory structure in your home folder; check it and adjust accordingly if needed.
The raw LOFAR data is stored in .h5 files on the coma cluster in /vol/astro3/lofar/vhecr/lora_triggered/data. When new events are added to this directory, or when existing events are updated with data from stations that were previously unavailable, they are added to the database by a cronjob executing cr_populate_database.sh from the $PYCRTOOLS/jobs directory. This script in turn calls run_crdb_populate.py and add_beam_direction.py to crawl over all files and add changes to the database. The crontab entry (edited with crontab -e) looks like:
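The core of the crawl step is comparing the files on disk against what is already registered in the database. A minimal sketch of that comparison (the filenames below and the helper name are hypothetical; the real run_crdb_populate.py queries the database instead of taking a plain list):

```python
def find_unprocessed(data_dir_listing, known_files):
    """Return .h5 files from a directory listing that are not yet
    registered in the database (here modelled as a plain list)."""
    h5_files = {f for f in data_dir_listing if f.endswith(".h5")}
    return sorted(h5_files - set(known_files))

# Example: two data files on disk, one already in the database.
listing = ["L123456_ev1.h5", "L123457_ev2.h5", "notes.txt"]
known = ["L123456_ev1.h5"]
print(find_unprocessed(listing, known))  # -> ['L123457_ev2.h5']
```

Files that reappear with updated station data would additionally need a modification-time or checksum comparison, which this sketch omits.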
0 0 * * * sbatch -p short $PYCRTOOLS/jobs/cr_populate_database.sh &> /dev/null
The software that reconstructs the LORA data into a format usable by the pipeline is located in the subversion repository hosted at https://svn.science.ru.nl/repos/lora_software. The actual analysis software is based on C++ and ROOT (CERN software) and was written by Satyendra Thoudam. To run the analysis automatically from a cron job, the script runninglora_coma.py (also in the repository) is used. The file lora.job (also in the repository) is a suggestion for what to run as a daily cronjob. When run, the script collects all relevant data from the coma cluster, generates input files, executes the analysis and copies the final data product to where the cosmic-ray pipeline can find it.
Required are (besides a working installation of ROOT and PyCRTools as they are on coma):
The first analysis step generates calibrated data files per month, such as output_201501.root (found in the analysis/LORAOutput/ subdirectory).
The second parameterization step generates data files, such as LORAdata-20150114T205644.dat, LORAdata-20150114T205644.png, LORAdata-20150114T205644.eps. The .dat file is needed to fill the pipeline. The .png file will be displayed on the webserver. The .eps file is for publication purposes, if needed.
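The event timestamp is embedded directly in these file names, which is convenient for matching them against pipeline events. A small sketch of extracting it, using only the standard library (the helper name is an assumption, not part of the LORA software):

```python
from datetime import datetime

def lora_event_time(filename):
    """Extract the event timestamp from a LORA data product name,
    e.g. 'LORAdata-20150114T205644.dat'."""
    stamp = filename.split("-", 1)[1].split(".", 1)[0]
    return datetime.strptime(stamp, "%Y%m%dT%H%M%S")

print(lora_event_time("LORAdata-20150114T205644.dat"))
# -> 2015-01-14 20:56:44
```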
To run any of the scripts, "use offline" is required. This sets the ROOT paths to the ROOT version used in Offline (software of the Pierre Auger Collaboration; in case of problems, please coordinate with whoever is currently in charge). If this needs to be updated, the software has to be recompiled with make main_analysis and make main_parameterization in the analysis folder. Note: make clean sometimes does not remove all files correctly, so make sure to check.
The software will need to be updated once:
The data files LORAdata- and LORAtime4 are read using the lora.py module in PyCRTools, which also contains all the logic for correcting time offsets etc. The script update_loradata.py in the $PYCRTOOLS/scripts folder can be used to update LORA data if the automated runs have been cancelled for a while, or if for any reason events are not filled with LORA data and are therefore skipped by the pipeline.
If a bug is found, all data needs to be regenerated, and the database then updated with the new parameters. Use the options of runninglora_coma.py (LORA svn repository) to rerun the LORA data. It is wise to first regenerate all analysis files and then run the parameterization once over all data; only after this should the database be updated.
The cosmic-ray analysis pipeline consists of one top-level Python script, cr_physics.py, and the associated modules and tasks called from it. All results are written to the directory /vol/astro3/lofar/vhecr/lora_triggered/results and to the database.
The pipeline is started on the cluster by having a cronjob execute the script cr_pipeline.sh. The crontab entry (edited with crontab -e) looks like:
0 3 * * * $PYCRTOOLS/jobs/cr_pipeline.sh > $HOME/cr_pipeline.log
This script determines how many events are marked with status NEW in the database and submits a single array job to the SLURM scheduler for which each task executes cr_physics.sh (which simply calls cr_physics.py with the correct parameters and environment). Thus, the easiest way to (re)process some or all events is to mark them as NEW in the database and run cr_pipeline.sh on coma.science.ru.nl.
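The "mark as NEW" step is a plain SQL update on the event table. The sketch below illustrates the idea with an in-memory sqlite3 database; the real database is PostgreSQL, and the table and column names ('events', 'eventID', 'status') as well as the status values other than NEW are assumptions:

```python
import sqlite3

# In-memory stand-in for the cosmic-ray PostgreSQL database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (eventID INTEGER PRIMARY KEY, status TEXT)")
db.executemany("INSERT INTO events VALUES (?, ?)",
               [(1, "PROCESSED"), (2, "ERROR"), (3, "NEW")])

# Mark all failed events for reprocessing by the next cr_pipeline.sh run.
db.execute("UPDATE events SET status = 'NEW' WHERE status = 'ERROR'")

n_new = db.execute(
    "SELECT COUNT(*) FROM events WHERE status = 'NEW'").fetchone()[0]
print(n_new)  # -> 2
```

Against the real database the same UPDATE statement would be issued through a PostgreSQL client, restricted to the events you actually want to reprocess.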
The simulation pipeline consists of one file cr_simulation.py. Simulations are first written locally into the /scratch/cr_sim_pipeline directory of the compute node where the script was executed. Then they are copied to /vol/astro3/lofar/sim/pipeline/events.
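The scratch-then-copy pattern can be sketched as follows. Temporary directories stand in for /scratch/cr_sim_pipeline and /vol/astro3/lofar/sim/pipeline/events, and the event directory name and file are made up for illustration:

```python
import shutil
import tempfile
from pathlib import Path

# Stand-ins for the node-local scratch disk and the shared events directory.
scratch = Path(tempfile.mkdtemp())
events = Path(tempfile.mkdtemp()) / "events"

# A finished simulation directory on local scratch (contents are dummies).
sim_dir = scratch / "event_123456"
sim_dir.mkdir()
(sim_dir / "steering.inp").write_text("dummy simulation output\n")

# Copy the whole event directory to shared storage, then clean up scratch.
shutil.copytree(sim_dir, events / sim_dir.name)
shutil.rmtree(sim_dir)

print((events / "event_123456" / "steering.inp").exists())  # -> True
```

Writing to local scratch first avoids hammering the shared filesystem while the simulation runs; only the finished product is moved.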
This pipeline is started on the cluster by having a cronjob execute the script cr_simulations.sh. The crontab entry (edited with crontab -e) looks like:
0 8 * * * $PYCRTOOLS/jobs/cr_simulations.sh > $HOME/cr_simulations.log
The webserver consists of a single Python script, crserver.py, which should be running on the same server that hosts the PostgreSQL database. It can be reached on port 8000 at /events on that machine, in our case http://crdb.astro.ru.nl:8000/events. General statistics (generated on the fly from the database) are available at http://crdb.astro.ru.nl:8000/statistics.
The webserver looks for figures in the /vol/astro3/lofar/vhecr/lora_triggered/results directory on the local machine and hence needs read access. It also expects write access to /vol/astro3/lofar/vhecr/lora_triggered/statistics to write the statistics plots.
When the webserver is not working or displaying things incorrectly, it probably needs to be restarted. This can only be done by C&CZ, so send a mail to the postmaster.
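Before mailing the postmaster, it is worth confirming that the server really is down. A minimal check from any machine, using only the Python standard library (the default URL is the event overview mentioned above):

```python
from urllib.request import urlopen
from urllib.error import URLError

def server_is_up(url="http://crdb.astro.ru.nl:8000/events", timeout=5):
    """Return True if the page responds with HTTP 200, False otherwise."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.getcode() == 200
    except (URLError, OSError):
        return False
```

If this returns False from several machines, a restart is likely needed; if it returns True, the problem is more likely in the figures or statistics directories the server reads from.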
This should never be needed for stable operation, but might be desired for new analysis. It is a simple procedure.
That should be it! The database will be updated automatically when it is opened by the pipeline the next time.
The http://crdb.astro.ru.nl:8000/statistics page is a good starting point to check how the events and simulations are doing. Here is a typical list of failures that might occur.