TY - JOUR
T1 - Speech Database Design for a Concatenative Text-to-Speech Synthesis System for Individuals with Communication Disorders
AU - Iida, Akemi
AU - Campbell, Nick
N1 - Funding Information:
The authors would like to express their sincere appreciation to Mr. Shinichi Yamaguchi of Fukuoka, Japan, for his participation in the research. We are grateful for financial assistance from the Japan Science and Technology Agency via the CREST (Core Research for Evolutional Science and Technology) scheme for Advanced Media Technology. We thank Prof. Kimitoshi Fukudome of the Kyushu Institute of Design for providing excellent recording facilities. Further appreciation goes to Professors Satoshi Imaizumi and Keikichi Hirose of the University of Tokyo for their advice on recording. We also thank ATR for providing CHATR98 for the speaker, and Mr. Ken Shimomura of NTT AT for his support with the design of the text corpora. We are also grateful to the students and colleagues who participated in our perceptual experiments. Lastly, we thank Prof. Michiaki Yasumura and Dr. Fumito Higuchi of Keio University for their valuable advice and support on evaluation procedures.
PY - 2003/10
Y1 - 2003/10
AB - ATR's CHATR is a corpus-based text-to-speech (TTS) synthesis system that selects concatenation units from a natural speech database. This approach enables us to create a voice output communication aid (VOCA) using the voice of an individual who anticipates the loss of phonatory function. The advantage of CHATR is that such individuals can continue to communicate in their own voice even after vocal loss. This paper reports a case study of the development of a VOCA using recordings of Japanese read speech (i.e., oral reading) from an individual with amyotrophic lateral sclerosis (ALS). In addition to using the individual's speech, we designed a speech database that could reproduce the characteristics of natural utterances in both general and specific situations. We created three speech corpora in Japanese for synthesizing ordinary daily speech (i.e., speech in a normal speaking style): (1) a phonetically balanced sentence set, to ensure that the system could synthesize all speech sounds; (2) readings of manuscripts written by the same individual for talks he regularly gave, as a source of natural intonation, articulation, and voice quality; and (3) words and short phrases, to provide daily vocabulary entries for reproducing natural utterances in predictable situations. By combining one or more corpora, we created four kinds of source databases for CHATR synthesis. Using each source database, we synthesized speech from six test sentences. We chose which source database to use by examining the units selected for the synthesized speech and by conducting perceptual experiments in which the speech was presented to 20 native speakers of Japanese. Based on the results of both the unit analysis and the perceptual evaluations, we selected the source database compiled from all three corpora. Incorporating CHATR, the selected source database, and an input acceleration function, we developed a VOCA for the individual to use in his daily life. We also created emotional speech source databases that can be loaded into the VOCA separately, in addition to the compiled speech database.
KW - AAC
KW - Communication disorder
KW - Corpus-based TTS synthesis
KW - Speech corpus
KW - VOCA
UR - http://www.scopus.com/inward/record.url?scp=0142153901&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0142153901&partnerID=8YFLogxK
U2 - 10.1023/A:1025761017833
DO - 10.1023/A:1025761017833
M3 - Article
AN - SCOPUS:0142153901
SN - 1381-2416
VL - 6
SP - 379
EP - 392
JO - International Journal of Speech Technology
JF - International Journal of Speech Technology
IS - 4
ER -