Page 42 - Krész, Miklós, and Andrej Brodnik (eds.). MATCOS-13. Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science. Koper: University of Primorska Press, 2016.
ved in parallel by analogy with the first stage, i.e. steps (n + 1/p), ..., (n + p/p) are implemented by every thread separately (Figure 7). The last stage is the opposite of the second one. The solution obtained from the layer (n + p/p) of the inner problem is written to the shared array, and then the necessary data are transmitted to certain threads according to the data distribution on the coarse grid. In addition, the position of the adaptive grid is determined at this step.

Figure 7: The third stage of the data distribution algorithm in the case of the adaptive grid

For more detailed information about the construction of the adaptive grid and the implementation of the respective algorithm with the data distribution, see [4].

4. RESULTS

Numerical experiments were carried out for the model with the tube length L = 0.1 m for the physical time t = 15.0 s. All the calculations were performed either on the NKS-30T cluster of the Siberian Super Computer Center (SSCC), each node of which contains two Intel® Xeon® X5670 processors (12M Cache, 2.93 GHz, 6.40 GT/s Intel® QPI, 6 cores), or on MVS-10P of the Joint Supercomputer Center of the Russian Academy of Sciences (JSCC), each node of which contains two Xeon® E5-2690 processors (20M Cache, 2.90 GHz, 8.00 GT/s Intel® QPI, 8 cores). Hyper-threading is turned off, and the KMP_AFFINITY environment variable is set to "compact".

Figure 8: Computational time of all the algorithms depending on the number of threads z: A - regular mesh and direct usage of OpenMP pragmas, B - regular mesh and usage of parallelism based on the data distribution, C - adaptive mesh and direct usage of OpenMP pragmas, D - adaptive mesh and usage of parallelism based on the data distribution; obtained with NKS-30T of SSCC

Figure 9: Computational time of all the algorithms depending on the number of threads z: A - regular mesh and direct usage of OpenMP pragmas, B - regular mesh and usage of parallelism based on the data distribution; obtained with MVS-10P of JSCC RAS

Figure 8 shows a bar chart of the computational time of each algorithm executed on NKS-30T of SSCC. First, the chart makes clear how important the adaptive grid is for sequentially running programs: with a single thread, this technique reduces the computational time by a factor of 22. Second, the efficiency of the two parallelization methods can be compared directly. In the case of the regular mesh, the direct application of OpenMP directives provides some improvement when the number of threads is less than or equal to 6. As the number of threads grows, the amount of data processed by each thread decreases while the number of data transmissions increases greatly. Therefore, for more than 6 threads the parallel computation based on the direct usage of OpenMP pragmas slows down. The proposed data distribution algorithm, in contrast, scales rather well with the number of threads up to z = 12 in this case. The same data are also presented in Table 2.
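The trade-off described above (less data per thread, but more data transmissions) can be illustrated with a toy cost model. The sketch below is not taken from the paper: the formula T(z) = work / z + comm * (z - 1) and the constants `work = 36.0` and `comm = 1.0` are assumptions chosen only so that the modelled minimum falls near 6 threads, and they are not fitted to the measurements in Figure 8 or Table 2. The helper names are hypothetical.

```python
def model_time(z, work=36.0, comm=1.0):
    """Toy run time with z threads (illustrative assumption, not measured data).

    The computational part shrinks as work / z, while the cost of
    data transmissions grows linearly as comm * (z - 1).
    """
    if z < 1:
        raise ValueError("z must be a positive thread count")
    return work / z + comm * (z - 1)


def speedup(t1, tz):
    """Speedup S(z) = T(1) / T(z)."""
    return t1 / tz


def efficiency(t1, tz, z):
    """Parallel efficiency E(z) = S(z) / z."""
    return speedup(t1, tz) / z


def best_thread_count(max_threads, work=36.0, comm=1.0):
    """Thread count in 1..max_threads with the smallest modelled time."""
    return min(range(1, max_threads + 1),
               key=lambda z: model_time(z, work, comm))
```

With these illustrative constants the modelled time decreases up to 6 threads and grows afterwards, which mirrors the qualitative behaviour reported for the direct usage of OpenMP pragmas. The same `speedup` and `efficiency` helpers could be applied to measured timings such as those in Table 2.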
Analogous results for the regular mesh obtained on the MVS-10P cluster are presented in Figure 9 and in Table 3. The same calculations take slightly less time, but the general picture remains the same up to 16 threads. As for the algorithm with the enclosed problem, the first method of parallelization is unsuitable, since the more threads are used, the longer the computation takes. At the same time, for the algorithm with the enclosed problem even the proposed method does not provide good scalability, although it performs better than the direct usage of OpenMP pragmas. It is therefore possible that with a greater number of threads the usage of the adaptive grid might become ineffective.
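As a rough illustration of what parallelism based on data distribution means, the sketch below splits a one-dimensional chain of grid nodes into contiguous blocks, one per thread. This is a generic block decomposition written under the assumption of a 1D index range; it is not the algorithm of the paper, whose details are given in [4]. The function names `block_bounds` and `interface_count` are hypothetical.

```python
def block_bounds(n_nodes, n_threads, tid):
    """Half-open index range [start, stop) of the grid nodes owned by thread tid.

    Nodes are split into contiguous blocks; when n_nodes is not divisible
    by n_threads, the first (n_nodes % n_threads) threads get one extra node.
    """
    if not 0 <= tid < n_threads:
        raise ValueError("tid must be in 0..n_threads-1")
    base, extra = divmod(n_nodes, n_threads)
    start = tid * base + min(tid, extra)
    stop = start + base + (1 if tid < extra else 0)
    return start, stop


def interface_count(n_threads):
    """Number of interfaces between adjacent blocks in a 1D chain.

    Only values at these interfaces would need to be passed between
    threads (an assumption of this sketch).
    """
    return max(n_threads - 1, 0)
```

In such a scheme each thread works on its own block, and only the values at block interfaces have to be exchanged, so the volume of shared data is kept small compared with the block size.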
MATCOS-13: Proceedings of the 2013 Mini-Conference on Applied Theoretical Computer Science, Koper, Slovenia, 10-11 October
       






