Example of a complete work session

I just downloaded and installed Monte Python, read the previous pages, and I wish to launch and analyse my first run.

I can first create a few folders in order to keep my montepython directory tidy in the future:

$ mkdir chains          # for storing all my chains
$ mkdir chains/planck   # for my first run, based on the fake Planck likelihood proposed in the example.param file
$ mkdir input           # for storing all my input files
$ mkdir scripts         # for storing all my scripts for running the code in batch mode

I then copy example.param into my input folder, under a name of my choice, e.g. lcdm.param, and edit it if needed:

$ cp example.param input/lcdm.param

I then launch a short chain with

$ montepython/Montepython.py run -p input/lcdm.param -o chains/planck/lcdm -N 5

I can see on the screen the evolution of the initialization of the code. At the end, I check that I have a chain and a log.param file written in my chains/planck/lcdm/ directory. I can immediately repeat the operation with the same command; the second chain is automatically created with number 2 instead of 1. I can also run again without the input file:

$ montepython/Montepython.py run -o chains/planck/lcdm -N 5

This works equally well because all information is taken from the log.param file.

Initially, I usually do not have a covariance matrix to pass as input [1]. But in this particular example I can try the one delivered with the Monte Python package, in the covmat/ directory:

$ montepython/Montepython.py run -p input/lcdm.param\
        -o chains/planck/lcdm -c covmat/fake_planck_lcdm.covmat -N 5

If I already had a covariance matrix of my own to pass as input, I would instead have run with

$ montepython/Montepython.py run -p input/lcdm.param -o chains/planck/lcdm -c mycovmat.covmat -N 5

I now wish to launch longer runs on my cluster or on a powerful desktop. The syntax of the script depends on the cluster. In the simplest case it will only contain some general commands concerning the job name, wall-time limit, etc., and the command line above (I can use the one without an input file, provided that I have already made one short interactive run and that the log.param already exists; but I can now increase the number of steps, e.g. to 5000 or 10000). On some clusters, the chain file is created immediately in the output directory at start-up. In this case, the automatic numbering of chains proposed by Monte Python will be satisfactory.
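As an illustration only, a minimal scripts/lcdm.sh for a PBS-type scheduler could look like the sketch below; the scheduler directives, requested resources and job name are hypothetical and must be adapted to the actual cluster:

#!/bin/bash
#PBS -N lcdm                        # job name (hypothetical settings)
#PBS -l walltime=24:00:00           # wall-time limit
#PBS -l nodes=1:ppn=8               # one node with 8 cores

cd $PBS_O_WORKDIR                   # return to the directory the job was submitted from
export OMP_NUM_THREADS=8            # number of cores used by Class for this chain

# no input file is needed, since the log.param already exists from the interactive run
python montepython/Montepython.py run -o chains/planck/lcdm -N 10000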

Warning

On some clusters, the automatic numbering will lead to conflicts when the chains are created too fast. Please look at the section on how to use mpi_run for guidance.

On other clusters, the chains are created in a temporary location and only copied at the end to the output directory. In this case, if I do nothing, there is a risk that chain names are identical and clash. I should then relate the chain name to the job number, with an additional command-line option --chain_number $JOBID. On some clusters, $JOBID is a string, but the job number can be extracted with a line like export JOBNUM="$(echo $PBS_JOBID|cut -d'.' -f1)", and passed to Monte Python as --chain_number $JOBNUM.
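On such a cluster, the relevant lines of the script could then become, as a sketch reusing the extraction command quoted above:

export JOBNUM="$(echo $PBS_JOBID|cut -d'.' -f1)"   # keep only the numeric part of the PBS job identifier
python montepython/Montepython.py run -o chains/planck/lcdm -N 10000 --chain_number $JOBNUM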

If in a future run I use the Planck likelihood, I should not forget to add to the script (before calling Monte Python) the line

source /path/to/my/plc/bin/clik_profile.sh

I then launch a chain by submitting the script, with e.g. qsub scripts/lcdm.sh. I can launch many chains in one command with

$ for i in {1..10}; do qsub scripts/lcdm.sh;done

If your cluster creates the chains too fast, there might be conflicts in the chain names. One way to work around this issue is to run with MPI, which handles the parallelization: the chains will be initialised one after the other, each one sending a go signal to the next in line.

To launch a job with MPI, the syntax is exactly the same as without, except that you will start the whole command with, depending on your installation, mpirun or mpiexec:

mpirun -np 4 python montepython/MontePython.py run -o chains/...

will simply launch 4 chains, each using the number of cores given by the environment variable $OMP_NUM_THREADS to run Class.
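For instance, to run 4 chains with 8 cores each (both numbers are arbitrary here and should match the resources requested in the job script):

$ export OMP_NUM_THREADS=8
$ mpirun -np 4 python montepython/MontePython.py run -o chains/planck/lcdm -N 10000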

When the runs have stopped, I can analyse them with

$ montepython/Montepython.py info chains/planck/lcdm

If I had been running without a covariance matrix, the results would probably be bad, with a very low acceptance rate and few points. The analysis would nevertheless have created a covariance matrix chains/planck/lcdm/lcdm.covmat. I can decide to copy it, in order to keep track of it even after analysing future runs:

cp chains/planck/lcdm/lcdm.covmat chains/planck/lcdm/lcdm_run1.covmat

I now add to my script, in the line starting with montepython/Montepython.py, the option

-c chains/planck/lcdm/lcdm_run1.covmat

and I relaunch. If this new run is launched on the same day as the previous one, it might be smart to also change the number of steps slightly (e.g. from 5000 to 5001), in order to immediately identify chains belonging to the same run, since chain file names start with the date and the number of steps.
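The command line in the script then becomes, for instance:

python montepython/Montepython.py run -o chains/planck/lcdm -c chains/planck/lcdm/lcdm_run1.covmat -N 5001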

When this second run is finished, I analyse it with e.g.

montepython/Montepython.py info chains/planck/lcdm/2012-10-27_5001*

If all R-1 numbers are small (typically \(<0.05\)) and plots look nice, I am done. If not, there can be two reasons: the covariance matrix is still bad, or I just did not get enough samples.

I can check the acceptance rate of this last run by looking at the chains/planck/lcdm/lcdm.log file. If I am in a case with a nearly Gaussian posterior (i.e. nearly ellipsoidal contours), an acceptance rate \(<0.2\) or \(>0.3\) can be considered bad. In other cases, even 0.1 might be the best that I can expect. If the acceptance rate is bad, I must rerun with an improved covariance matrix in order to converge more quickly: I copy the last covariance matrix to lcdm_run2.covmat and use this one for the next run. If the acceptance rate is good but the chains are not well converged because they are simply too short, then I should rather rerun with the same covariance matrix lcdm_run1.covmat: in this way, I know that the proposal density has been frozen since the second run, and I can safely analyse the second and third runs together.
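For example, depending on which of the two cases I am in (the date and step-number prefix of the hypothetical third run is purely illustrative):

# bad acceptance rate: freeze an improved proposal density for the next run
$ cp chains/planck/lcdm/lcdm.covmat chains/planck/lcdm/lcdm_run2.covmat

# good acceptance rate, proposal frozen since run 2: analyse runs 2 and 3 together
$ montepython/Montepython.py info chains/planck/lcdm/2012-10-27_5001* chains/planck/lcdm/2012-10-29_5002*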

If I do two or three runs in that way, I always lose some running time, because each new chain will have a new burn-in phase (i.e. a phase during which the log likelihood is very bad and slowly decreasing towards values close to the minimum). If this is a concern, I can avoid it in several ways:

  • before launching the new run, I set the input mean value of each parameter in the input file to the best-fit value found in the previous run. The runs will then start from the best-fit value, plus or minus the size of the first jump drawn from the covariance matrix, and avoid burn-in. Since I have changed the input file, I must rerun with a new output directory, e.g. chains/planck/lcdm2. This is a clean method.
  • I might prefer a less clean but slightly quicker variant: I modify the mean values, as in the previous item, but directly in the log.param file, and I rerun in the same directory without an input file. This will work, but it is advisable not to edit the log.param manually, since it is supposed to keep all the information from previous runs.
  • I may restart the new chains from the previous chains using the -r command-line option. The name of the previous chain can be written after -r manually or through a script.
  • I can also restart from the best-fit found previously, using the -bf command-line option, specifying the .bestfit file to use (the last two options are illustrated in the sketch just after this list).
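As an illustration of the last two options, the command lines could look as follows; the chain and .bestfit file names are hypothetical examples of what previous runs and analyses would have produced:

# restart from a previous chain with the -r option
$ montepython/Montepython.py run -o chains/planck/lcdm -r chains/planck/lcdm/2012-10-27_5001__1.txt -N 10000

# start from the best-fit point found by a previous analysis with the -bf option
$ montepython/Montepython.py run -o chains/planck/lcdm -bf chains/planck/lcdm/lcdm.bestfit -N 10000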

When I am pleased with the final plots and results, I can customize the plot content and labels by writing a short file plot_files/lcdm.plot, passed through the -extra command-line option, and paste the LaTeX file produced by Monte Python into my paper.

Footnotes

[1] If I am also a CosmoMC user, I might have an adequate covmat to start with, before using the covmat that Monte Python will produce. For this I just need to edit the first line, add commas between parameter names, and, for parameters that are identical to those in my run, replace CosmoMC parameter names with the equivalent Class parameter names.