Reproducible BLAS routines with tunable accuracy using Ozaki scheme for many-core architectures

Daichi Mukunoki, Takeshi Ogita, Katsuhisa Ozaki

Research output: Conference contribution

7 citations (Scopus)

Abstract

Generally, floating-point computations comprise rounding errors; the result may be inaccurate and not identical (non-reproducible). Particularly, heterogeneous computing has many factors that affect reproducibility. The loss of accuracy and reproducibility could be a crucial issue in debugging complex codes and the reliability of computations. In this paper, we propose high-performance implementations of reproducible basic linear algebra subprograms (BLAS) routines with tunable accuracy for many-core architectures. Our approach is based on an accurate matrix-multiplication method, Ozaki scheme, which can be constructed on level-3 BLAS that performs standard floating-point operations. We demonstrate the performance of three routines: inner product (DOT), matrix-vector multiplication (GEMV), and matrix-multiplication (GEMM) on NVIDIA’s Volta GPU by comparing these with the standard routines provided by the vendor. Furthermore, we demonstrate the reproducibility between CPU and GPU and its accuracy.
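The reproducibility problem described in the abstract can be illustrated with a minimal sketch. It is not the Ozaki scheme and not the authors' implementation; it only shows that plain floating-point summation depends on evaluation order, and it uses Python's `math.fsum` (a correctly rounded sum) as a stand-in for an order-independent result. The helper `naive_sum` and the test data are assumptions made for illustration.

```python
import math
import random


def naive_sum(values):
    """Left-to-right summation; every addition may round."""
    total = 0.0
    for v in values:
        total += v
    return total


# Floating-point addition is not associative: regrouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# Hypothetical data with widely varying magnitudes, summed in two orders,
# as might happen when CPU and GPU reductions traverse data differently.
random.seed(0)
data = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8)
        for _ in range(10_000)]
shuffled = data[:]
random.shuffle(shuffled)

# The naive results may differ between the two orders.
naive_a = naive_sum(data)
naive_b = naive_sum(shuffled)

# A correctly rounded sum does not depend on the order of the inputs.
exact_a = math.fsum(data)
exact_b = math.fsum(shuffled)

print("regrouping changes result:", left != right)
print("naive sums identical:     ", naive_a == naive_b)
print("fsum results identical:   ", exact_a == exact_b)
```

A reproducible routine in the sense of the abstract aims for the behaviour of the last line: the same bits regardless of how the work is ordered or which device performs it.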

Original language: English
Title of host publication: Parallel Processing and Applied Mathematics - 13th International Conference, PPAM 2019, Revised Selected Papers
Editors: Roman Wyrzykowski, Konrad Karczewski, Ewa Deelman, Jack Dongarra
Publisher: Springer
Pages: 516-527
Number of pages: 12
ISBN (Print): 9783030432287
DOI
Publication status: Published - 2020
Event: 13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019 - Bialystok, Poland
Duration: 8 Sep 2019 to 11 Sep 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12043 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019
Country/Territory: Poland
City: Bialystok
Period: 8 Sep 2019 to 11 Sep 2019

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
