4. Best Practices
The following collection of best practice guidelines is intended to prevent bugs and improve computational performance.
4.1. MPI
The general rules can be summarized as follows:
The first rule of MPI is: You do not send subsets of arrays, only complete contiguous data ranges.
The second rule of MPI is: You do not send subsets of arrays, only complete contiguous data ranges.
Third rule of MPI: Someone sends non-contiguous data, the simulation is over.
Fourth rule: Only two processors to a single send-receive message.
Fifth rule: Only one processor may access (read or write) a shared memory region.
Please also read the chapter MPI Implementation for general implementation information, e.g., the mappings used for elements, sides and nodes.
4.3. Hawk
Before running a simulation, check out the HLRS Wiki pages Batch System PBSPro (Hawk).
4.3.1. Striping
Always use user-defined striping in the simulation case folders located on the workspaces, as the default striping setting (dynamic striping) has caused massive problems in the past. Add the following code to your submit script:
# Set fixed striping to avoid problems with the progressive Lustre file layout
# - Region 1 [0, 1GiB): Stripe-Size=1 MiB, Stripe-Count=1
#lfs setstripe -c 1 -S 1M $PBS_O_WORKDIR
# - Region 2 [1GiB, 4GiB): Stripe-Size=1 MiB, Stripe-Count=4
#lfs setstripe -c 4 -S 1M $PBS_O_WORKDIR
# - Region 3 [4 GiB, EOF): Stripe-Size=4 MiB, Stripe-Count=8
lfs setstripe -c 8 -S 4M $PBS_O_WORKDIR
Uncomment the line that matches the size of your output files and keep the other two lines commented out. Also consider setting the striping for large mesh files, just to be sure.
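Instead of commenting lines in and out by hand, the choice between the three regions can be scripted. The following is only a minimal sketch, not part of the official submit script: the function name `stripe_options` and the variable `EXPECTED_OUTPUT_BYTES` are hypothetical and introduced here for illustration, and the thresholds simply mirror the three regions listed in the comments above.

```shell
#!/bin/bash
# Hypothetical helper: print the lfs setstripe options that match an
# expected output file size given in bytes (thresholds as in the regions above).
stripe_options() {
  local size_bytes=$1
  local gib=$((1024 * 1024 * 1024))
  if [ "$size_bytes" -lt "$gib" ]; then
    echo "-c 1 -S 1M"   # Region 1: [0, 1 GiB)
  elif [ "$size_bytes" -lt $((4 * gib)) ]; then
    echo "-c 4 -S 1M"   # Region 2: [1 GiB, 4 GiB)
  else
    echo "-c 8 -S 4M"   # Region 3: [4 GiB, EOF)
  fi
}

# EXPECTED_OUTPUT_BYTES is an assumed, user-set variable (default here: 2 GiB).
EXPECTED_OUTPUT_BYTES=${EXPECTED_OUTPUT_BYTES:-$((2 * 1024 * 1024 * 1024))}
STRIPE_OPTS=$(stripe_options "$EXPECTED_OUTPUT_BYTES")

# Apply the setting only inside a PBS job on a machine that provides lfs;
# otherwise just report what would be executed.
if [ -n "${PBS_O_WORKDIR:-}" ] && command -v lfs >/dev/null 2>&1; then
  # Word splitting of $STRIPE_OPTS is intended here.
  lfs setstripe $STRIPE_OPTS "$PBS_O_WORKDIR"
else
  echo "Would run: lfs setstripe $STRIPE_OPTS \$PBS_O_WORKDIR"
fi
```

Set `EXPECTED_OUTPUT_BYTES` before this block to the approximate size of a single output file. Note that a stripe setting on a directory only applies to files created afterwards, so the command should run before the simulation writes its first output.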
4.3.2. Species-zero bug
Particles with species index zero have repeatedly been produced on Hawk. This might be caused by the output to .h5 and could therefore be related to the striping settings described in the previous section, but the cause could also lie deeper in the Lustre file system itself. If this problem occurs and a restart is performed from such a corrupted file, the corrupted particles must first be removed from the .h5 file by hand in order to prevent piclas from crashing.