TY - GEN
T1 - Sub-band Vector Quantized Variational AutoEncoder for Spectral Envelope Quantization
AU - Srikotr, Tanasan
AU - Mano, Kazunori
PY - 2019/10
Y1 - 2019/10
N2 - Recently, many deep learning models have successfully replaced conventional methods in the speech processing field. Vector quantization is a popular technique for reducing the amount of speech data before transmission. Conventional vector quantization methods are based on mathematical models. In the last few years, the Vector Quantized Variational AutoEncoder has been proposed for end-to-end vector quantization based on deep learning techniques. In this paper, we investigate sub-band quantization in the Vector Quantized Variational AutoEncoder. This model can concentrate on specific frequency bands, assigning more bits to them and leaving less important bands with fewer bits. Experimental results show the efficiency of the proposed quantization method for the spectral envelope parameters of the WORLD vocoder, a high-quality vocoder that operates at a 48 kHz sampling frequency. At the same four target bit rates, the sub-band Vector Quantized Variational AutoEncoder reduces the Log Spectral Distortion by around 0.93 dB on average.
AB - Recently, many deep learning models have successfully replaced conventional methods in the speech processing field. Vector quantization is a popular technique for reducing the amount of speech data before transmission. Conventional vector quantization methods are based on mathematical models. In the last few years, the Vector Quantized Variational AutoEncoder has been proposed for end-to-end vector quantization based on deep learning techniques. In this paper, we investigate sub-band quantization in the Vector Quantized Variational AutoEncoder. This model can concentrate on specific frequency bands, assigning more bits to them and leaving less important bands with fewer bits. Experimental results show the efficiency of the proposed quantization method for the spectral envelope parameters of the WORLD vocoder, a high-quality vocoder that operates at a 48 kHz sampling frequency. At the same four target bit rates, the sub-band Vector Quantized Variational AutoEncoder reduces the Log Spectral Distortion by around 0.93 dB on average.
KW - autoencoder
KW - sub-band coding
KW - vector quantization
KW - vector quantized variational autoencoder
UR - http://www.scopus.com/inward/record.url?scp=85077681707&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077681707&partnerID=8YFLogxK
U2 - 10.1109/TENCON.2019.8929436
DO - 10.1109/TENCON.2019.8929436
M3 - Conference contribution
AN - SCOPUS:85077681707
T3 - IEEE Region 10 Annual International Conference, Proceedings/TENCON
SP - 296
EP - 300
BT - Proceedings of the TENCON 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE Region 10 Conference: Technology, Knowledge, and Society, TENCON 2019
Y2 - 17 October 2019 through 20 October 2019
ER -