TY - GEN
T1 - Conjugate Gradient Solvers with High Accuracy and Bit-wise Reproducibility between CPU and GPU using Ozaki scheme
AU - Mukunoki, Daichi
AU - Ozaki, Katsuhisa
AU - Ogita, Takeshi
AU - Iakymchuk, Roman
N1 - Funding Information:
This research was partially supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant #19K20286 and the EU H2020 research and innovation program under the Marie Skłodowska-Curie grant agreement via the Robust project No. 842528. This research used computational resources of the Cygnus supercomputer provided by the Multidisciplinary Cooperative Research Program in the Center for Computational Sciences, University of Tsukuba, and the Oakforest-PACS system operated by JCAHPC.
Publisher Copyright:
© 2021 Owner/Author.
PY - 2021/1/20
Y1 - 2021/1/20
N2 - On Krylov subspace methods such as the Conjugate Gradient (CG) method, the number of iterations until convergence may increase due to the loss of computational accuracy caused by rounding errors in floating-point computations. At the same time, because the order of the computation is nondeterministic on parallel computation, the result and the behavior of the convergence may be nonidentical in different computational environments, even for the same input. In this study, we present an accurate and reproducible implementation of the unpreconditioned CG method on x86 CPUs and NVIDIA GPUs. In our method, while all variables are stored on FP64, all inner product operations (including matrix-vector multiplications) are performed using the Ozaki scheme. The scheme delivers the correctly rounded computation as well as bit-level reproducibility among different computational environments. In this paper, we show some examples where the standard FP64 implementation of CG results in nonidentical results across different CPUs and GPUs. We then demonstrate the applicability and the effectiveness of our approach in terms of accuracy and reproducibility and their performance on both CPUs and GPUs. Furthermore, we compare the performance of our method against an existing accurate and reproducible CG implementation based on the Exact Basic Linear Algebra Subprograms (ExBLAS) on CPUs.
AB - On Krylov subspace methods such as the Conjugate Gradient (CG) method, the number of iterations until convergence may increase due to the loss of computational accuracy caused by rounding errors in floating-point computations. At the same time, because the order of the computation is nondeterministic on parallel computation, the result and the behavior of the convergence may be nonidentical in different computational environments, even for the same input. In this study, we present an accurate and reproducible implementation of the unpreconditioned CG method on x86 CPUs and NVIDIA GPUs. In our method, while all variables are stored on FP64, all inner product operations (including matrix-vector multiplications) are performed using the Ozaki scheme. The scheme delivers the correctly rounded computation as well as bit-level reproducibility among different computational environments. In this paper, we show some examples where the standard FP64 implementation of CG results in nonidentical results across different CPUs and GPUs. We then demonstrate the applicability and the effectiveness of our approach in terms of accuracy and reproducibility and their performance on both CPUs and GPUs. Furthermore, we compare the performance of our method against an existing accurate and reproducible CG implementation based on the Exact Basic Linear Algebra Subprograms (ExBLAS) on CPUs.
KW - Accuracy
KW - CPU
KW - Conjugate Gradient
KW - GPU
KW - heterogeneous computing
KW - reproducibility
UR - http://www.scopus.com/inward/record.url?scp=85099880833&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099880833&partnerID=8YFLogxK
U2 - 10.1145/3432261.3432270
DO - 10.1145/3432261.3432270
M3 - Conference contribution
AN - SCOPUS:85099880833
T3 - ACM International Conference Proceeding Series
SP - 100
EP - 109
BT - Proceedings of International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2021
PB - Association for Computing Machinery
T2 - 2021 International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2021
Y2 - 20 January 2021 through 22 January 2021
ER -