Page 41 - Krész, Miklós, and Andrej Brodnik (eds.). MATCOS-13. Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science. Koper: University of Primorska Press, 2016.
P. 41
ure 3: The scheme of the two-level algorithm To reduce the time for the exchange of data flows we pro-
pose to distribute the data between the threads. In the case
of regular mesh by these words we mean the following pro-
cedure. The spatial grid is divided into z pieces, where z
is the number of threads used, and each thread gets control
over one of the pieces: thread number zero deals with nodes
[0, . . . , M/z], the next one – with [M/z, . . . , 2M/z] and so
on. The last one deals with nodes [(z−1)M/z, . . . , M ]. Each
thread creates its local arrays, fills them in accordance with
the difference scheme, and outputs the results on its assigned
grid nodes. On a regular grid we thus obtain practically in-
dependent tasks exchanging only boundary conditions.
In the case of adaptive grids communication becomes more
complicated because of the need to transfer information from
the nodes of the fine mesh to those of the coarse one. In this
case the data distribution algorithm has few stages. Af-
ter n time steps of general task we have data distributed
across z threads: [0, . . . , Mc/z], [Mc/z, . . . , 2Mc/z], . . . , [(z −
1)Mc/z, . . . , Mc] and shared arrays of size (Mf + 1), where
Mc, Mf are the numbers of nodes of the coarse and fine
meshes respectively. The first stage is the implementation
of one time step of coarse grid by every thread separately
(see Figure 5). The second stage prepares the data for the
Figure 4: The scheme of the movement of the adap- Figure 5: The first stage of data distribution algo-
tive fine grid rithm in the case of adaptive grid
inner problem. It is meant that the position of the fine grid
agation of the combustion front: as soon as the maximum with respect to the coarse one is known and ileft, iright are
point of W function moves by the spatial step of the coarse the numbers of the nodes being the edges of the enclosed
grid, the fine mesh moves by the same distance (see Figure grid. Initial data obtained by interpolation from the layer n
4). This approach allows us to take big spatial and time of coarse grid are entered to the shared arrays and then redis-
steps outside the chemical reaction zone. The numerical ex- tributed across z threads, forming the layer (n + 0/p) of the
periments show that to achieve the same accuracy grade one enclosed problem (Figure 6). Then the enclosed problem is
may use the coarse grid with the spatial step four times big-
ger than the one of fine grid. That implies about sixteen Figure 6: The second stage of data distribution al-
times bigger time steps. It is obvious thus that the usage gorithm in the case of adaptive grid
of such two-level time mesh is indispensable in the problem
like the one in question.
3. PARALLELIZATION
3.1 Classical way
The simplest and the most obvious way to implement the
program on a multiprocessor node is application of OpenMP
pragmas to the inner loops of the program. All the arrays
in this case are shared and all the threads work to fill each
of them. As a result data of different types are held in
the storages of different threads and the threads have to
communicate intensively. In the case of the regular mesh
it concerns to the separate computations of the exponent
values and the values of η in the scheme equations of which
they are used, as well as to the move from one time layer
to another. The situation is far more complicated for the
algorithm with the enclosed fine mesh, because in this case
there are a lot of bottlenecks besides the mentioned ones,
related to the repeated transition from coarse mesh to the
fine one and back. Such way of parallelism provides some
acceleration for the execution of regular mesh algorithm, but
it leads even to the slowdown of the computation when the
adaptive mesh is used.
3.2 Data distribution
m a t c o s -1 3 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science 41
Koper, Slovenia, 10-11 October
pose to distribute the data between the threads. In the case
of regular mesh by these words we mean the following pro-
cedure. The spatial grid is divided into z pieces, where z
is the number of threads used, and each thread gets control
over one of the pieces: thread number zero deals with nodes
[0, . . . , M/z], the next one – with [M/z, . . . , 2M/z] and so
on. The last one deals with nodes [(z−1)M/z, . . . , M ]. Each
thread creates its local arrays, fills them in accordance with
the difference scheme, and outputs the results on its assigned
grid nodes. On a regular grid we thus obtain practically in-
dependent tasks exchanging only boundary conditions.
In the case of adaptive grids communication becomes more
complicated because of the need to transfer information from
the nodes of the fine mesh to those of the coarse one. In this
case the data distribution algorithm has few stages. Af-
ter n time steps of general task we have data distributed
across z threads: [0, . . . , Mc/z], [Mc/z, . . . , 2Mc/z], . . . , [(z −
1)Mc/z, . . . , Mc] and shared arrays of size (Mf + 1), where
Mc, Mf are the numbers of nodes of the coarse and fine
meshes respectively. The first stage is the implementation
of one time step of coarse grid by every thread separately
(see Figure 5). The second stage prepares the data for the
Figure 4: The scheme of the movement of the adap- Figure 5: The first stage of data distribution algo-
tive fine grid rithm in the case of adaptive grid
inner problem. It is meant that the position of the fine grid
agation of the combustion front: as soon as the maximum with respect to the coarse one is known and ileft, iright are
point of W function moves by the spatial step of the coarse the numbers of the nodes being the edges of the enclosed
grid, the fine mesh moves by the same distance (see Figure grid. Initial data obtained by interpolation from the layer n
4). This approach allows us to take big spatial and time of coarse grid are entered to the shared arrays and then redis-
steps outside the chemical reaction zone. The numerical ex- tributed across z threads, forming the layer (n + 0/p) of the
periments show that to achieve the same accuracy grade one enclosed problem (Figure 6). Then the enclosed problem is
may use the coarse grid with the spatial step four times big-
ger than the one of fine grid. That implies about sixteen Figure 6: The second stage of data distribution al-
times bigger time steps. It is obvious thus that the usage gorithm in the case of adaptive grid
of such two-level time mesh is indispensable in the problem
like the one in question.
3. PARALLELIZATION
3.1 Classical way
The simplest and the most obvious way to implement the
program on a multiprocessor node is application of OpenMP
pragmas to the inner loops of the program. All the arrays
in this case are shared and all the threads work to fill each
of them. As a result data of different types are held in
the storages of different threads and the threads have to
communicate intensively. In the case of the regular mesh
it concerns to the separate computations of the exponent
values and the values of η in the scheme equations of which
they are used, as well as to the move from one time layer
to another. The situation is far more complicated for the
algorithm with the enclosed fine mesh, because in this case
there are a lot of bottlenecks besides the mentioned ones,
related to the repeated transition from coarse mesh to the
fine one and back. Such way of parallelism provides some
acceleration for the execution of regular mesh algorithm, but
it leads even to the slowdown of the computation when the
adaptive mesh is used.
3.2 Data distribution
m a t c o s -1 3 Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science 41
Koper, Slovenia, 10-11 October