Implementation

I. Introduction

Smilei is a C++ code that relies on relatively simple C++ features, for modularity and for the convenience of non-advanced C++ users.

The repository is composed of the following directories:

  • Licence: contains code licence information

  • doc: contains the Sphinx doc files

  • src: contains all source files

  • happi: contains the sources of the happi Python tool for visualization

  • benchmarks: contains the benchmarks used by the validation process; these benchmarks also serve as examples for users.

  • scripts: contains multiple tool scripts for compilation and more

    • compile_tools: contains scripts and machine files used by the makefile for compilation

  • tools: contains some additional programs for Smilei

  • validation: contains the python scripts used by the validation process

The source directory is likewise composed of several sub-directories that organize the .cpp and .h files by theme. The main file is Smilei.cpp. There is always only one class definition per file, and the file name corresponds to the class name.

The general implementation is later summarized in Fig. 83.


II. General concept and vocabulary

This section presents some implicit notions to understand the philosophy of the code.

Notion of data container

Data containers are classes (or sometimes just structures) used to store a specific type of data, often raw data such as particles or fields. Some methods can be implemented in a data container to manage or access the data.

_images/data_container.png

Fig. 74 Data container.
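For illustration, a minimal data container could look like the following sketch (illustrative only, with invented names; this is not actual Smilei code):

#include <cstddef>
#include <vector>

// Minimal sketch of a data container: it stores raw data
// and only exposes methods to manage or access that data.
struct ParticleBufferSketch {
    std::vector<double> position_x;
    std::vector<double> weight;

    // Management method: pre-allocate storage for n particles
    void reserve( std::size_t n ) {
        position_x.reserve( n );
        weight.reserve( n );
    }

    // Access method: number of stored particles
    std::size_t size() const { return weight.size(); }
};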

Notion of operators

An operator is a class that processes input data to produce new information. Input data can be parameters and data containers. Output data can be processed data from data containers or updated data containers. An operator is a class functor (it overloads the () operator). Sometimes, an operator provides additional methods, called wrappers, that expose simplified or adapted interfaces. An operator does not store data, except temporarily. For instance, the particle interpolation, push and projection are operators.

_images/operator.png

Fig. 75 Operator.
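The following sketch illustrates the idea of an operator as a class functor with a wrapper method (illustrative names, not an actual Smilei class):

#include <cstddef>
#include <vector>

// Minimal sketch of an operator: a functor that transforms input
// data containers; it owns parameters, but no data.
class PositionUpdateSketch {
public:
    explicit PositionUpdateSketch( double dt ) : dt_( dt ) {}

    // Class functor: overloading of the () operator
    void operator()( std::vector<double> &position,
                     const std::vector<double> &velocity ) const {
        for( std::size_t i = 0; i < position.size(); i++ ) {
            position[i] += velocity[i] * dt_;
        }
    }

    // Wrapper: a simplified or adapted interface to the same operation
    void update( std::vector<double> &position,
                 const std::vector<double> &velocity ) const {
        ( *this )( position, velocity );
    }

private:
    double dt_; // parameter given at construction, not stored data
};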

Notion of domain parts

Domain parts are classes that represent specific levels of the domain decomposition. They can be seen as high-level data containers, or containers of data containers. They contain methods to handle, manage and access the local data. For instance, Patch and Species are domain parts:

  • Species contains the particles.

  • Patch contains Species and Fields.

Notion of factory

Some objects, such as operators or data containers, exist in several variations. For this we use inheritance: a base class holds the common parameters and methods, and derived classes implement the variations. A factory uses the user-defined input parameters to determine the right derived class and instantiate it, as shown in Fig. 76. For instance, several push operators are implemented, all derived from a base push class. The push factory determines the right one to use.

_images/factories.png

Fig. 76 Description of the factory concept.
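The pattern can be sketched as follows (a simplified illustration of the concept with invented names, not the actual PusherFactory code):

#include <stdexcept>
#include <string>

// Base class: common interface for all variations
class PusherSketch {
public:
    virtual ~PusherSketch() {}
    virtual void operator()() = 0; // the push operation
};

// Derived variations (marked final in Smilei)
class PusherBorisSketch final : public PusherSketch {
public:
    void operator()() override { /* Boris push */ }
};

class PusherVaySketch final : public PusherSketch {
public:
    void operator()() override { /* Vay push */ }
};

// The factory inspects the user-defined input parameters
// and instantiates the matching derived class.
class PusherFactorySketch {
public:
    static PusherSketch *create( const std::string &name ) {
        if( name == "boris" ) {
            return new PusherBorisSketch();
        } else if( name == "vay" ) {
            return new PusherVaySketch();
        }
        throw std::runtime_error( "unknown pusher: " + name );
    }
};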

Other

Some classes are used for specific actions in the code such as the initialization process.


III. Domain decomposition and parallelism

The simulation domain is divided multiple times following a succession of decomposition levels. The whole domain is the superimposition of different grids for each electromagnetic field component and of macro-particles. Let us represent the domain schematically as an array of cells, as in Fig. 77. Each cell contains a certain population of particles (which can differ from cell to cell).

_images/domain.png

Fig. 77 Example of a full domain with 960 cells.

In Smilei, the cells are first reorganized into small groups called patches. The domain becomes a collection of patches, as shown in Fig. 78.

_images/patch_domain_decomposition.png

Fig. 78 The domain in Smilei is a collection of patches.

A patch is an independent piece of the whole simulation domain. It therefore owns the local electromagnetic grids and the list of local macro-particles. Electromagnetic grids have ghost cells that duplicate the information located in the neighboring patches (not shown in Fig. 78). All patches have the same spatial size, i.e. the same number of cells. The size of a patch is chosen so that all local field grids (ghost cells included) fit in the L2 cache.

Patches are then distributed among MPI processes in so-called MPI patch collections. The distribution can be a regular Cartesian split, or use a load-balancing strategy based on a Hilbert space-filling curve.

_images/mpi_patch_collection.png

Fig. 79 Patches are then distributed among MPI processes in so-called MPI patch collections.

Inside an MPI patch collection, OpenMP loop directives are used to distribute the computation of the patches among the available threads. Since each patch holds a different number of particles, this approach enables dynamic scheduling, depending on the specified OpenMP scheduler. As shown in Fig. 83, a synchronization step is required to exchange grid ghost cells and particles traveling from patch to patch.

The patch granularity is used for:

  • creating more parallelism for OpenMP

  • enabling a load balancing capability through OpenMP scheduling

  • ensuring a good cache memory efficiency at L3 and L2 levels.

The patch is not the smallest decomposition grain-size. A patch can be further decomposed into bins, as shown in Fig. 80.

_images/bin_decomposition.png

Fig. 80 Bin decomposition.

Contrary to a patch, a bin is not an independent data structure with its own arrays. It represents a smaller portion of the patch grids through specific start and end indexes. For the macro-particles, a sorting algorithm ensures that macro-particles located in the same bin are grouped contiguously in memory.
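The idea can be sketched as follows (illustrative code with invented names, not actual Smilei classes):

#include <vector>

// A bin is not a container: it is only a pair of indexes delimiting
// a contiguous slice of the patch particle arrays.
struct BinSketch {
    int first; // index of the first macro-particle of the bin
    int last;  // index one past the last macro-particle of the bin
};

// Because the macro-particles of a bin are contiguous (thanks to
// sorting), an operator applied bin by bin works on a compact
// slice of memory.
void scaleWeights( const BinSketch &bin, std::vector<double> &weight, double factor )
{
    for( int ipart = bin.first; ipart < bin.last; ipart++ ) {
        weight[ipart] *= factor;
    }
}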

Finally, the decomposition levels are summarized in Fig. 81.

_images/decomposition_summary.png

Fig. 81 Domain decomposition summary.


IV. Coding and community rules

Coding style rules

Line and indentation style

  • A line should be no more than 140 characters.

  • The code should not contain characters outside [0,127] of the ASCII table.

  • There should be no more than 2 empty successive lines.

  • Only 1 statement per line is allowed.

  • An indentation block is composed of 4 spaces; no tabs are permitted.

  • If possible, keep trailing spaces in empty lines to respect indentation

For instance, this line is not correct:

a1 = b1 + c1*d1; a2 = b2 + c2*d2;

and should be replaced by:

a1 = b1 + c1*d1;
a2 = b2 + c2*d2;

Variables

  • Variable names (public, protected, private, local) should be composed of lowercase words separated by underscores.

  • The lowercase rule does not apply to acronyms and people's names.

For instance:

int number_of_elements;
int lignes_of_Smilei;
int age_of_GNU;

Classes

  • A class name starts with a capital letter

  • A capital letter is used between each word that composes the class name

  • No underscore

For instance:

class MyClassIsGreat
{
public:
    int public_integer_;
private:
    int private_integer_;
};
  • A class member variable respects the variable naming rules described previously.

  • A member variable has an underscore at the end of its name.

Functions

  • Function and member-function names should start with a lowercase letter.

  • No underscores.

  • Each subsequent word starts with a capital letter (camelCase).

For instance:

int myGreatFunction(...)
{
...
}
  • Names should be as explicit as possible.

  • Avoid shortened expressions.

For instance:

void computeParticlePosition(...)
{
...
}

instead of:

void computePartPos(...)
{
...
}

If-statement

Any if-statement should have curly brackets.

For instance:

if( condition ) {
    a = b + c*d;
}

ERROR and OUTPUT management

Developers should use the dedicated error or output macro definitions located in the file src/Tools.h.

  • ERROR: the default macro used to throw a SIGABRT error with a simple message.

  • ERROR_NAMELIST: should be used for namelist errors. It takes as arguments a simple message and a link to the documentation. It also throws a SIGABRT signal.

  • MESSAGE: should be used to output an information message (it uses std::cout).

  • DEBUG: should be used for debugging messages (in the so-called DEBUG mode).

  • WARNING: should be used to throw a warning. A warning alerts the user of a possible issue, or advises caution about some parameters, without stopping the program.
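Illustrative usage (the messages and the variable species_name are invented; this assumes the stream-style message syntax with which these macros are invoked in the code):

// Hypothetical calls to the macros defined in src/Tools.h
MESSAGE( "Initializing species " << species_name );
WARNING( "Species " << species_name << " has no particles" );
ERROR( "Unknown species " << species_name );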


V. Data structures and main classes

This section describes the main classes and the tree-like Smilei data structure. The whole picture is shown in Fig. 82.

_images/data_structure.png

Fig. 82 General view of the main tree-like data structure of Smilei.

Class VectorPatch

The class VectorPatch represents the MPI patch collection described above and is the highest structure level. The class description (VectorPatch.h and VectorPatch.cpp) is located in the directory src/Patch. Among the data components stored in this class, one of the most important is the list of patches. By definition, each MPI process therefore declares only one VectorPatch object.

class VectorPatch
std::vector<Patch*> patches_

List of patches located in this MPI patch collection.

The class VectorPatch contains the methods directly called in the PIC time loop in Smilei.cpp.

Class Patch

The class Patch is an advanced data container that represents a single patch. The base class description (Patch.h and Patch.cpp) is located in the directory src/Patch. From this base class can be derived several versions (marked as final) depending on the geometry dimension:

  • Patch1D

  • Patch2D

  • Patch3D

  • PatchAM for the AM geometry

The class Patch owns a list of Species objects called vecSpecies.

class Patch
std::vector<Species*> vecSpecies

List of species in the patch

ElectroMagn *EMfields

Electromagnetic fields and densities (E, B, J, rho) of the current Patch

class Species

The class Species is an advanced data container that represents a single species. The base class description (Species.h and Species.cpp) is located in the directory src/Species. From this base class can be derived several versions (marked as final):

  • SpeciesNorm

  • SpeciesNormV

  • SpeciesV

  • SpeciesVAdaptive

  • SpeciesVAdaptiveMixedSort

The correct species object is instantiated using the species factory implemented in the file SpeciesFactory.h.

The class Species owns the particles through the member particles of type Particles*.

class Species
Particles *particles

Vector containing all Particles of the considered Species

class Particles

The class Particles is a data container that holds the particle properties. The base class description (Particles.h and Particles.cpp) is located in the directory src/Particles. It contains several arrays storing the particle properties, such as the positions, momenta, weights and others.

class Particles
std::vector<std::vector<double>> Position

Array containing the particle positions

std::vector<std::vector<double>> Momentum

Array containing the particle momenta

std::vector<double> Weight

Array containing the particle weights (equivalent to a charge density)

std::vector<double> Chi

Array containing the particle quantum parameter

std::vector<double> Tau

Incremental optical depth for the Monte-Carlo process

std::vector<short> Charge

Charge state of the particle (multiples of e>0)

std::vector<uint64_t> Id

Id of the particle

std::vector<int> cell_keys

cell_keys of the particle

Many of the methods implemented in Particles are used to access or manage the data.
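Combining the classes above, a property of a given macro-particle is reached by walking down the tree. The following sketch illustrates the access path using the documented public members (simplified: the real code goes through dedicated methods and operators):

// Illustrative traversal from the MPI patch collection down to the
// particle arrays: VectorPatch -> Patch -> Species -> Particles.
double totalWeight( VectorPatch &vecPatches )
{
    double total = 0.;
    for( unsigned int ipatch = 0 ; ipatch < vecPatches.size() ; ipatch++ ) {
        Patch *patch = vecPatches( ipatch );
        for( unsigned int ispec = 0 ; ispec < patch->vecSpecies.size() ; ispec++ ) {
            Particles *particles = patch->vecSpecies[ispec]->particles;
            for( unsigned int ipart = 0 ; ipart < particles->Weight.size() ; ipart++ ) {
                total += particles->Weight[ipart];
            }
        }
    }
    return total;
}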

class ElectroMagn

The class ElectroMagn is a high-level data container that contains the electromagnetic fields and currents. The base class description (ElectroMagn.h and ElectroMagn.cpp) is located in the directory src/ElectroMagn. From this base class can be derived several versions (marked as final) based on the dimension:

  • ElectroMagn1D

  • ElectroMagn2D

  • ElectroMagn3D

  • ElectroMagnAM

The correct final class is determined using the factory ElectroMagnFactory.h.

class ElectroMagn
Field *Ex_

x-component of the electric field

class Field

The class Field is a data container that represents a field grid for a given component. The base class description (Field.h and Field.cpp) is located in the directory src/Field. It contains a linearized, dynamically allocated array that stores all grid nodes whatever the dimension.

class Field
double *data_

pointer to the linearized field array
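For example, in 2D, the node (i, j) of a grid with ny nodes along the second dimension would be reached as in the following sketch (assuming row-major ordering; the helper function is invented for illustration):

// Illustrative linearized access: the same 1D array stores the grid
// whatever the dimension; only the index computation changes.
inline double &node2D( double *data_, unsigned int ny,
                       unsigned int i, unsigned int j )
{
    return data_[ i*ny + j ];
}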

From this base class can be derived several versions (marked as final) based on the dimension:

  • Field1D

  • Field2D

  • Field3D

The correct final class is determined using the factory FieldFactory.h.

Smilei MPI

The class SmileiMPI is a specific class that provides interfaces and advanced methods for the custom, adapted use of MPI within Smilei. The base class description (SmileiMPI.h and SmileiMPI.cpp) is located in the directory src/SmileiMPI.

class SmileiMPI

VI. The Smilei PIC loop implementation

The initialization and the main loop are explicitly written in the main file Smilei.cpp. The time loop is schematically described in Fig. 83.

_images/smilei_main_loop.png

Fig. 83 Smilei main loop implementation (click on the figure for more details).

The time loop is explicitly written step by step in the main file Smilei.cpp through calls to different vecPatches methods.

  • Patch reconfiguration: if adaptive vectorization is activated, then the patch may be reconfigured for scalar or vectorized mode.

vecPatches.reconfiguration( params, timers, itime );
  • Collision: particle collisions are performed before the particle dynamics part

vecPatches.applyBinaryProcesses( params, itime, timers );
  • Relativistic Poisson solver: …

vecPatches.runRelativisticModule( time_prim, params, &smpi,  timers );
  • Charge: reset the global charge and current densities to zero, and compute the old charge density (rho old) before moving the particles

vecPatches.computeCharge(true);
  • Particle dynamics: this step processes the full particle dynamics including field interpolation, advanced physical operators (radiation, ionization…), boundary condition pre-processing and projection

vecPatches.dynamics( params, &smpi, simWindow, radiation_tables_,
                                 MultiphotonBreitWheelerTables,
                                 time_dual, timers, itime );
  • Envelope module: …

vecPatches.runEnvelopeModule( params, &smpi, simWindow, time_dual, timers, itime );
  • Sum of currents and densities: current and charge reduction from the different species, and synchronization step with communication

vecPatches.sumDensities( params, time_dual, timers, itime, simWindow, &smpi );
  • Management of the antennas: apply currents from antennas

vecPatches.applyAntennas( time_dual );
  • Maxwell solvers: solve Maxwell's equations

vecPatches.solveMaxwell( params, simWindow, itime, time_dual, timers, &smpi );
  • Particle communication: finalize particle exchanges and sort particles

vecPatches.finalizeExchParticlesAndSort( params, &smpi, simWindow,
                                             time_dual, timers, itime );
  • Particle merging: merging process for particles (still experimental)

vecPatches.mergeParticles(params, time_dual,timers, itime );
  • Particle injection: injection of particles from the boundaries

vecPatches.injectParticlesFromBoundaries(params, timers, itime );
  • Cleaning: Clean buffers and resize arrays

vecPatches.cleanParticlesOverhead(params, timers, itime );
  • Field synchronization: Finalize field synchronization and exchanges

vecPatches.finalizeSyncAndBCFields( params, &smpi, simWindow, time_dual, timers, itime );
  • Diagnostics: call the various diagnostics

vecPatches.runAllDiags( params, &smpi, itime, timers, simWindow );
  • Moving window: manage the shift of the moving window

timers.movWindow.restart();
simWindow->shift( vecPatches, &smpi, params, itime, time_dual, region );

if (itime == simWindow->getAdditionalShiftsIteration() ) {
    int adjust = simWindow->isMoving(time_dual)?0:1;
    for (unsigned int n=0;n < simWindow->getNumberOfAdditionalShifts()-adjust; n++)
        simWindow->shift( vecPatches, &smpi, params, itime, time_dual, region );
}
timers.movWindow.update();
  • Checkpointing: manage the checkpoints

checkpoint.dump( vecPatches, region, itime, &smpi, simWindow, params );

Particle Dynamics

Particle dynamics is the large step of the time loop that performs the macro-particle movement computation:

  • Field interpolation

  • Radiation loss

  • Ionization

  • Pair creation

  • Push

  • Detection of leaving particles at patch boundaries

  • Charge and current projection

This step is performed in the method vecPatches::dynamics. We first loop over the patches, and then over the species of each patch ipatch (their number is given by (*this)(ipatch)->vecSpecies.size()). For each species, the method Species::dynamics is called to perform the dynamics step for the respective particles. The OpenMP parallelism is explicitly applied in vecPatches::dynamics on the patch loop, as shown in the following piece of code.

#pragma omp for schedule(runtime)
for( unsigned int ipatch=0 ; ipatch<this->size() ; ipatch++ ) {

    // ...

    for( unsigned int ispec=0 ; ispec<( *this )( ipatch )->vecSpecies.size() ; ispec++ ) {
        Species *spec = species( ipatch, ispec );

        // ....

        spec->Species::dynamics( time_dual, ispec,
                                 emfields( ipatch ),
                                 params, diag_flag, partwalls( ipatch ),
                                 ( *this )( ipatch ), smpi,
                                 RadiationTables,
                                 MultiphotonBreitWheelerTables );

        // ...

    }
}

Pusher

The pusher is the operator that moves the macro-particles using the equations of movement. The operator implementation is located in the directory src/Pusher. The base class is Pusher implemented in Pusher.h and Pusher.cpp. From this base class can be derived several versions (marked as final):

  • PusherBoris: pusher of Boris

  • PusherBorisNR: non-relativistic Boris pusher

  • PusherBorisV: vectorized Boris pusher

  • PusherVay: pusher of J. L. Vay

  • PusherHigueraCary: pusher of Higuera and Cary

  • PusherPhoton: pusher of photons (ballistic movement)

A factory called PusherFactory is used to select the requested pusher for each patch and each species.

Note

Whatever the mode, the pusher is always vectorized on CPU.

Boris pusher for CPU

The pusher implementation is one of the simplest algorithms of the PIC loop. In the initialization part of the functor, pointers are used to simplify access to the different arrays. The core of the pusher is then composed of a single vectorized loop (omp simd) over a group of macro-particles.

        is_device_ptr( /* tofrom: */                                          \
                       momentum_x /* [istart:particle_number] */,             \
                       momentum_y /* [istart:particle_number] */,             \
                       momentum_z /* [istart:particle_number] */,             \
                       position_x /* [istart:particle_number] */,             \
                       position_y /* [istart:particle_number] */,             \
                       position_z /* [istart:particle_number] */ )
    #pragma omp teams distribute parallel for
#elif defined(SMILEI_ACCELERATOR_GPU_OACC)
    const int istart_offset   = istart - ipart_buffer_offset;
    const int particle_number = iend - istart;

    #pragma acc parallel present(Ex [0:nparts],    \
                                 Ey [0:nparts],    \
                                 Ez [0:nparts],    \
                                 Bx [0:nparts],    \
                                 By [0:nparts],    \
                                 Bz [0:nparts],    \
                                 invgf [0:nparts]) \
        deviceptr(position_x,                                           \
                  position_y,                                           \
                  position_z,                                           \
                  momentum_x,                                           \
                  momentum_y,                                           \
                  momentum_z,                                           \
                  charge)
    #pragma acc loop gang worker vector
#else
    #pragma omp simd
#endif
    for( int ipart=istart ; ipart<iend; ipart++ ) {

        const int ipart2 = ipart - ipart_buffer_offset;

        const double charge_over_mass_dts2 = ( double )( charge[ipart] )*one_over_mass_*dts2;

        // init Half-acceleration in the electric field
        double pxsm = charge_over_mass_dts2*( Ex[ipart2] );
        double pysm = charge_over_mass_dts2*( Ey[ipart2] );
        double pzsm = charge_over_mass_dts2*( Ez[ipart2] );

        //(*this)(particles, ipart, (*Epart)[ipart], (*Bpart)[ipart] , (*invgf)[ipart]);
        const double umx = momentum_x[ipart] + pxsm;
        const double umy = momentum_y[ipart] + pysm;
        const double umz = momentum_z[ipart] + pzsm;

        // Rotation in the magnetic field
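The excerpt stops just before the magnetic rotation. For reference, the rotation and the second half-acceleration follow the classical Boris scheme; the sketch below is a simplified reconstruction using the variable names of the excerpt, not a verbatim copy of PusherBoris.cpp:

// Rotation vector t = (q dt / 2 m gamma) B, using the
// half-accelerated momentum to evaluate gamma
const double local_invgf = charge_over_mass_dts2
                         / std::sqrt( 1.0 + umx*umx + umy*umy + umz*umz );
const double Tx = local_invgf * Bx[ipart2];
const double Ty = local_invgf * By[ipart2];
const double Tz = local_invgf * Bz[ipart2];
const double inv_det_T = 1.0/( 1.0 + Tx*Tx + Ty*Ty + Tz*Tz );

// Boris rotation written in closed form:
// u+ = um + 2/(1+t^2) * ( (um + um x t) x t )
const double upx = ( ( 1.0+Tx*Tx-Ty*Ty-Tz*Tz )*umx + 2.0*( Tx*Ty+Tz )*umy + 2.0*( Tz*Tx-Ty )*umz )*inv_det_T;
const double upy = ( 2.0*( Tx*Ty-Tz )*umx + ( 1.0-Tx*Tx+Ty*Ty-Tz*Tz )*umy + 2.0*( Ty*Tz+Tx )*umz )*inv_det_T;
const double upz = ( 2.0*( Tz*Tx+Ty )*umx + 2.0*( Ty*Tz-Tx )*umy + ( 1.0-Tx*Tx-Ty*Ty+Tz*Tz )*umz )*inv_det_T;

// Second half-acceleration in the electric field
momentum_x[ipart] = upx + pxsm;
momentum_y[ipart] = upy + pysm;
momentum_z[ipart] = upz + pzsm;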

Boris pusher for GPU

On GPU, the omp simd directive is simply replaced by #pragma acc loop gang worker vector. The macro-particles are distributed over the GPU threads (vector level).

Nonlinear Inverse Compton radiation operator

The physical details of this module are given in the page High-energy photon emission & radiation reaction. The implementation for this operator and the related files are located in the directory src/Radiation. The base class used to derive the different versions of this operator is called Radiation and is implemented in the files Radiation.h and Radiation.cpp.

A factory called RadiationFactory is used to select the requested radiation model for each patch and species.

The different radiation model implementation details are given in the following sections.

Landau-Lifshitz

The derived class that corresponds to the Landau-Lifshitz model is called RadiationLandauLifshitz and is implemented in the corresponding files. The implementation is relatively simple and similar to what is done in the Pusher, since this radiation model is an extension of the classical particle push.

    // _______________________________________________________________
    // Computation

    #ifndef SMILEI_ACCELERATOR_GPU_OACC
        #pragma omp simd aligned(rad_norm_energy:64)
    #else
        int np = iend-istart;
        #pragma acc parallel \
            present(Ex[istart:np],Ey[istart:np],Ez[istart:np],\
            Bx[istart:np],By[istart:np],Bz[istart:np]) \
            deviceptr(momentum_x,momentum_y,momentum_z,charge,weight,chi) \
            reduction(+:radiated_energy_loc)
    {
        #pragma acc loop reduction(+:radiated_energy_loc) gang worker vector private(rad_norm_energy)
    #endif
    for( int ipart=istart ; ipart<iend; ipart++ ) {

        // charge / m^2
        const double charge_over_mass_square = ( double )( charge[ipart] )*one_over_mass_square;

        // Gamma
        const double gamma = sqrt( 1.0 + momentum_x[ipart]*momentum_x[ipart]
                      + momentum_y[ipart]*momentum_y[ipart]
                      + momentum_z[ipart]*momentum_z[ipart] );

        // Computation of the Lorentz invariant quantum parameter
        const double particle_chi = Radiation::computeParticleChi( charge_over_mass_square,
                       momentum_x[ipart], momentum_y[ipart], momentum_z[ipart],
                       gamma,
                       Ex[ipart-ipart_ref], Ey[ipart-ipart_ref], Ez[ipart-ipart_ref] ,
                       Bx[ipart-ipart_ref], By[ipart-ipart_ref], Bz[ipart-ipart_ref] );

        // Effect on the momentum
        if( gamma > 1.1 && particle_chi >= minimum_chi_continuous ) {

            // Radiated energy during the time step
            const double temp =
                radiation_tables.getClassicalRadiatedEnergy( particle_chi, dt_ ) * gamma / ( gamma*gamma-1. );

It is composed of a vectorized loop over a group of particles, using the istart and iend indexes given as arguments, that performs the following steps:

  • Computation of the particle Lorentz factor

  • Computation of the particle quantum parameter (particle_chi)

  • Update of the momentum: we use the function getClassicalRadiatedEnergy implemented in the class RadiationTables to get the classical radiated energy for the given particle.

  • Computation of the radiated energy: this energy is first stored per particle in the local array rad_norm_energy

The second loop performs a vectorized reduction of the radiated energy for the current thread.

            momentum_z[ipart] -= temp*momentum_z[ipart];

    // _______________________________________________________________
    // Computation of the thread radiated energy

#ifndef SMILEI_ACCELERATOR_GPU_OACC

The third vectorized loop computes the quantum parameter again, based on the new momentum; this time the result is stored in the dedicated array chi[ipart], to be used in the following part of the time step.

                                              + momentum_z[ipart]*momentum_z[ipart] );
        }
    }

    #pragma omp simd reduction(+:radiated_energy_loc)
    for( int ipart=0 ; ipart<iend-istart; ipart++ ) {
        radiated_energy_loc += weight[ipart]*rad_norm_energy[ipart] ;
    }
#else
            radiated_energy_loc += weight[ipart]*(gamma - std::sqrt( 1.0
                                              + momentum_x[ipart]*momentum_x[ipart]
                                              + momentum_y[ipart]*momentum_y[ipart]
                                              + momentum_z[ipart]*momentum_z[ipart] ));
#endif


    // _______________________________________________________________

Corrected Landau-Lifshitz

The derived class that corresponds to the corrected Landau-Lifshitz model is called RadiationCorrLandauLifshitz and is implemented in the corresponding files.

Niel

The derived class that corresponds to the Niel model is called RadiationNiel and is implemented in the corresponding files.

Monte-Carlo

The derived class that corresponds to the Monte-Carlo model is called RadiationMonteCarlo and is implemented in the corresponding files.

Radiation Tables

The class RadiationTables is used to store and manage the different tabulated functions used by some of the radiation models.

Radiation Tools

The class RadiationTools contains some function tools for the radiation. Some methods provide fits to tabulated values obtained numerically.

Task Parallelization

As explained in the dedicated page, if OpenMP tasks are activated, the macro-particle operations, in particular those inside vecPatches::dynamics, are parallelized with the task programming paradigm.

Inside vecPatches::dynamics, instead of using a #pragma omp for schedule(dynamic) directive on the patch loop, a task is generated for each ipatch-ispec combination. Instead of the thread number, a buffer_id is given as input to the Species::dynamicsTasks method to avoid race conditions. Indeed, a buffer (indices, deltaold, etc.) is created for each ipatch-ispec combination.

The method Species::dynamicsTasks replaces the use of the usual Species::dynamics method (to be used when tasks are not activated).

Inside each call to Species::dynamicsTasks, a task is generated for each operator-ibin combination, using depend clauses to ensure the correct order of execution of the operators. For example, for the macro-particles of a given ipatch-ispec-ibin combination, the Pusher operator can be executed only after the Interpolator, and so on. The bins have a size of cluster_width cells along the x direction.
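Schematically, the dependency pattern can be sketched as follows (illustrative pseudo-structure, not verbatim Smilei code; the sentinel arrays bin_has_interpolated and bin_has_pushed and the functions interpolateBin and pushBin are invented for the example):

// For each bin, the Pusher task may start only once the
// Interpolator task of the same bin has completed.
for( unsigned int ibin = 0 ; ibin < Nbins ; ibin++ ) {

    #pragma omp task default(shared) firstprivate(ibin) \
        depend(out:bin_has_interpolated[ibin])
    {
        interpolateBin( ibin );   // Interpolator on bin ibin
    }

    #pragma omp task default(shared) firstprivate(ibin) \
        depend(in:bin_has_interpolated[ibin]) \
        depend(out:bin_has_pushed[ibin])
    {
        pushBin( ibin );          // Pusher on bin ibin, after interpolation
    }
}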

To further avoid race conditions, the Interpolators must use local variables for the temporary indices and coefficients they compute.

The Pusher and the Boundary Conditions operators do not present risks of race conditions.

Operators which create new macro-particles, such as Ionization, must create them in separate lists for each bin.

Probably the most critical operator for task parallelization is the Projector. To avoid race conditions, each ipatch-ispec-ibin combination has its own copy of the grid representing the physical space where its macro-particles belong. The Projector operator must thus project onto these independent subgrids. For this purpose, with tasks, the bin_shift argument of the operator is used; without tasks it is zero by default.

The operators acting on a given ipatch-ispec combination are enclosed in a #pragma omp taskgroup directive to ensure that they are completed before other tasks are generated in that scope.

Afterwards, in vecPatches::dynamics, tasks containing reductions are generated: the densities in the subgrids are summed into the main grid, and the lists containing the new macro-particles (e.g. from ionization) of each bin are joined to the main particle lists.

Before the synchronizations, a #pragma omp taskwait directive is used to wait until the tasks are completed.

The same concepts are used for the envelope module and for the vectorized versions of dynamics. In particular, for the envelope module, the Projector for the susceptibility uses a local subgrid.

Warning

Without tasks, the number of buffer vectors (for each buffer) is equal to the number of OpenMP threads. Each time a thread treats a Species inside a patch, the corresponding buffer is resized to the number of macro-particles of that Species in that patch. With tasks, the number of buffer vectors (for each buffer) is equal to the number of patches times the number of Species. To avoid a memory leak, at the end of Species::dynamicsTasks these vectors are resized to 1 element. This operation, as all resize operations on the macro-particle buffers, takes memory and time, but unfortunately no better solution has been found. With the envelope loop, this precaution does not seem necessary, since typically fewer macro-particles are used. This memory increase, added to the one caused by the mentioned subgrids, makes tasks memory-consuming. It is advised to use them with many computing units to avoid using too much memory per unit.

Warning

A word on debugging with tasks. The typical non-trivial bugs which arise when developing with tasks come from bad synchronizations that cause data races, resulting in segfaults. Since using a debugger (e.g. Allinea DDT) changes the execution timing and the synchronization, these bugs often disappear under the debugger, making them tricky to detect. The best strategies to avoid these bugs with tasks are: 1) use local variables as much as possible, and 2) first check that the code works with all omp task directives commented out (i.e. without tasks); then progressively uncomment these directives, one by one, introducing a barrier such as #pragma omp taskwait between them, and check which task introduces the segfault.

Particle Event Tracing

To visualize the scheduling of operations, with or without tasks, a macro-particle event tracing diagnostic saves the time at which each OpenMP thread executes one of these operations, together with the thread's number. The PIC operators (and the advanced ones like Ionization or Radiation) acting on each ipatch-ispec-ibin combination are included in this diagnostic. This way, the macro-particle operations executed by each OpenMP thread in each MPI process can be visualized. The information about which bin, Species, or patch is involved is not stored, since it would be difficult to visualize this level of detail; however, it could in principle be added to this diagnostic.

This visualization becomes clearer when a small number of patches, Species and bins is used; in other cases the plot may become unreadable.