Accurate Matrix Multiplication on Binary128 Format Accelerated by Ozaki Scheme

Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)


Although IEEE 754-2008 binary128 (quadruple precision, with a 15-bit exponent and a 113-bit significand) is not currently implemented in hardware on x86, software emulation is available in some compilers. However, the emulation is significantly slower than binary64 arithmetic, which is supported natively in hardware. This study proposes a fast implementation of matrix multiplication for matrices stored in the binary128 format on x86 CPUs. The proposed implementation uses the Ozaki scheme, an accurate matrix multiplication algorithm proposed by Ozaki et al. in 2012. This scheme performs most of the computation with binary64 matrix multiplication (the DGEMM routine in Basic Linear Algebra Subprograms (BLAS)), so it can exploit the high performance of highly optimized vendor BLAS implementations. Although the achievable performance depends on the input matrices (the inner-product dimension, the absolute range, and the significand bit length), in many cases the proposed implementation achieves better performance and accuracy than naive matrix multiplication performed with GCC's binary128 emulation. In addition, we discuss GPU acceleration, performance on reduced-precision inputs, an implementation based on binary32 matrix multiplication (SGEMM), application to memory-intensive operations, and the possibility of a distributed parallel implementation.
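The splitting idea behind the Ozaki scheme can be illustrated with a small pure-Python sketch. This is not the authors' implementation and it rests on several assumptions: the inputs are binary64 Python floats rather than binary128 values, a plain triple loop (`matmul_binary64`) stands in for DGEMM, and `fractions.Fraction` stands in for the high-precision accumulation. All function names are hypothetical. The sketch splits each matrix into slices with few enough significant bits that every slice-by-slice product is computed in binary64 without rounding error, then sums those products exactly.

```python
import math
from fractions import Fraction


def transpose(M):
    return [list(col) for col in zip(*M)]


def split_rows(M, n):
    """Split M into same-shaped slices whose sum is exactly M.

    Relative to the largest entry of its row, each slice entry keeps only
    about 53 - rho leading bits, so dot products of length n between a row
    slice and a column slice incur no rounding error in binary64.
    """
    rho = math.ceil((53 + math.log2(n)) / 2)
    residual = [list(row) for row in M]
    slices = []
    while any(x != 0.0 for row in residual for x in row):
        piece = []
        for row in residual:
            mu = max(abs(x) for x in row)
            tau = math.frexp(mu)[1]             # guarantees mu < 2**tau
            sigma = math.ldexp(1.0, rho + tau)  # sigma = 2**(rho + tau)
            top = [(x + sigma) - sigma for x in row]  # leading bits of each entry
            for j, t in enumerate(top):
                row[j] -= t                     # exact: remaining low-order part
            piece.append(top)
        slices.append(piece)
    return slices


def matmul_binary64(A, B):
    """Plain binary64 matrix product; a stand-in for DGEMM in this sketch."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def ozaki_matmul(A, B):
    """Return A @ B as exact Fractions by summing error-free slice products."""
    n = len(B)  # inner-product dimension
    a_slices = split_rows(A, n)                               # split A by rows
    b_slices = [transpose(s) for s in split_rows(transpose(B), n)]  # B by columns
    C = [[Fraction(0)] * len(B[0]) for _ in A]
    for a_s in a_slices:
        for b_s in b_slices:
            P = matmul_binary64(a_s, b_s)  # no rounding error by construction
            for i, p_row in enumerate(P):
                for j, p in enumerate(p_row):
                    C[i][j] += Fraction(p)
    return C


def exact_matmul(A, B):
    """Reference product evaluated entirely in exact rational arithmetic."""
    return [[sum(Fraction(a) * Fraction(b) for a, b in zip(row, col))
             for col in zip(*B)] for row in A]


A = [[1.0, 1e-8, 3.14159], [2.5e10, -7.0, 0.1]]
B = [[0.3, 1e5], [1e-9, 2.0], [-4.0, 1.0 / 3.0]]
C = ozaki_matmul(A, B)
```

In this sketch `C` matches the exact rational product of the binary64 inputs. The number of slices, and therefore the number of binary64 products, grows with the spread of magnitudes within a row or column and with the number of significand bits in use, which is consistent with the abstract's remark that performance depends on the absolute range and significand bit length of the inputs.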

Original language: English
Title of host publication: 50th International Conference on Parallel Processing, ICPP 2021 - Main Conference Proceedings
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450390682
Publication status: Published - 2021 Aug 9
Event: 50th International Conference on Parallel Processing, ICPP 2021 - Virtual, Online, United States
Duration: 2021 Aug 9 → 2021 Aug 12

Publication series

Name: ACM International Conference Proceeding Series


Conference: 50th International Conference on Parallel Processing, ICPP 2021
Country/Territory: United States
City: Virtual, Online


Keywords

  • Accurate
  • BLAS
  • Binary128
  • FP128
  • Linear algebra
  • Matrix multiplication
  • Quadruple precision

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

