TY - JOUR
T1 - A corpus-based speech synthesis system with emotion
AU - Iida, Akemi
AU - Campbell, Nick
AU - Higuchi, Fumito
AU - Yasumura, Michiaki
N1 - Funding Information:
This research was partially supported by JST CREST. The authors express their appreciation to Dr. Parham Mokhtari of JST CREST for preparing the steady-state vowels and the formant data. The authors also thank Mr. Shinnichi Yamaguchi of Fukuoka, Japan, Dr. Soichiro Iga of Ricoh Co., Ltd., Mr. Marc Schröder of the University of the Saarland, and Mr. Kazuya Takahashi of Keio University for valuable discussions. The authors further thank the target users, students, and colleagues who participated in the experiments.
PY - 2003/4
Y1 - 2003/4
N2 - We propose a new approach to synthesizing emotional speech with a corpus-based concatenative speech synthesis system (ATR CHATR) that uses corpora of emotional speech. In this study, neither emotion-dependent prosody prediction nor signal processing per se is performed for emotional speech. Instead, a large speech corpus is created for each emotion, and speech with the appropriate emotion is synthesized by simply switching between the emotional corpora. This is made possible by the normalization procedure incorporated in CHATR, which transforms its standard predicted prosody range according to the source database in use. We evaluate our approach by creating three emotional speech corpora (anger, joy, and sadness) from recordings of a male and a female speaker of Japanese. The acoustic characteristics of each corpus are distinct, and the emotions are identifiable. The acoustic characteristics of each emotional utterance synthesized by our method correlate clearly with those of the corresponding corpus. Perceptual experiments using the synthesized speech confirmed that our method can synthesize recognizably emotional speech. We further evaluated the method's intelligibility and the overall impression it gives listeners. The results show that the proposed method synthesizes speech with high intelligibility and makes a favorable impression. With these encouraging results, we have developed a workable text-to-speech system with emotion to support the immediate needs of nonspeaking individuals. This paper describes the proposed method, the design and acoustic characteristics of the corpora, and the results of the perceptual evaluations.
AB - We propose a new approach to synthesizing emotional speech with a corpus-based concatenative speech synthesis system (ATR CHATR) that uses corpora of emotional speech. In this study, neither emotion-dependent prosody prediction nor signal processing per se is performed for emotional speech. Instead, a large speech corpus is created for each emotion, and speech with the appropriate emotion is synthesized by simply switching between the emotional corpora. This is made possible by the normalization procedure incorporated in CHATR, which transforms its standard predicted prosody range according to the source database in use. We evaluate our approach by creating three emotional speech corpora (anger, joy, and sadness) from recordings of a male and a female speaker of Japanese. The acoustic characteristics of each corpus are distinct, and the emotions are identifiable. The acoustic characteristics of each emotional utterance synthesized by our method correlate clearly with those of the corresponding corpus. Perceptual experiments using the synthesized speech confirmed that our method can synthesize recognizably emotional speech. We further evaluated the method's intelligibility and the overall impression it gives listeners. The results show that the proposed method synthesizes speech with high intelligibility and makes a favorable impression. With these encouraging results, we have developed a workable text-to-speech system with emotion to support the immediate needs of nonspeaking individuals. This paper describes the proposed method, the design and acoustic characteristics of the corpora, and the results of the perceptual evaluations.
KW - Concatenative speech synthesis
KW - Corpus
KW - Emotion
KW - Natural speech
KW - Source database
UR - http://www.scopus.com/inward/record.url?scp=0037380318&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0037380318&partnerID=8YFLogxK
U2 - 10.1016/S0167-6393(02)00081-X
DO - 10.1016/S0167-6393(02)00081-X
M3 - Article
AN - SCOPUS:0037380318
SN - 0167-6393
VL - 40
SP - 161
EP - 187
JO - Speech Communication
JF - Speech Communication
IS - 1-2
ER -