Keynote Speech: Selected Topics from ASR Research for Asian Languages at Tokyo Tech

Sadaoki Furui, Professor Emeritus, Institute Professor, Tokyo Institute of Technology


More than 6000 living languages are spoken in the world today, and the majority of them are concentrated in Asia. Every language has its own specific acoustic as well as linguistic characteristics that require special modeling techniques. This talk presents our recent experiences in building automatic speech recognition (ASR) systems for the Indonesian, Thai and Chinese languages. For Indonesian, we are building a spoken-query information retrieval (IR) system. To handle the large variation in the pronunciation of proper nouns and English words, we have applied proper noun-specific adaptation in acoustic modeling and rule-based English-to-Indonesian phoneme mapping. For Thai, since there are no word boundaries in the written form, we have proposed a new method for automatically creating word-like units from a text corpus, and to recognize spoken-style utterances we have applied topic and speaking-style adaptation to the language model. In spoken Chinese, long organization names are often abbreviated, and abbreviated utterances cannot be recognized if the abbreviations are not included in the dictionary. We have proposed a new method for automatically generating Chinese abbreviations, and by expanding the vocabulary using the generated abbreviations, we have significantly improved the performance of voice search. This talk also covers several recent research activities for the Japanese language.

Sadaoki Furui received B.S., M.S., and Ph.D. degrees in mathematical engineering and instrumentation physics from Tokyo University, Tokyo, Japan, in 1968, 1970, and 1978, respectively.

He joined the Electrical Communications Laboratories of Nippon Telegraph and Telephone (NTT) Corporation in 1970, and later served as a Research Fellow and the Director of the Furui Research Laboratory at NTT Human Interface Laboratories from 1991 to 1997. He is currently a Professor in the Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology. He also served as Dean of the Graduate School of Information Science and Engineering from 2007 to 2009, and is now serving as Director of the Institute Library.

His research interests include analysis of speaker characterization information in speech waves and its application to speaker recognition as well as interspeaker normalization and adaptation in speech recognition. He is also interested in vector-quantization-based speech recognition algorithms, spectral dynamic features for speech recognition, speech recognition algorithms that are robust against noise and distortion, algorithms for Japanese large-vocabulary continuous-speech recognition, automatic speech summarization algorithms, multimodal human-computer interaction systems, automatic question-answering systems, and analysis of the speech perception mechanism. He has authored or coauthored over 900 published articles.

From December 1978 to December 1979, he served on the staff of the Acoustics Research Department of Bell Laboratories, Murray Hill, New Jersey, as a visiting researcher working on speaker verification. Dr. Furui is a Fellow of the IEEE, the Acoustical Society of America (ASA), the Institute of Electronics, Information and Communication Engineers of Japan (IEICE) and the International Speech Communication Association (ISCA). He served as President of the Permanent Council of International Conferences on Spoken Language Processing (PC-ICSLP) from 2000 to 2004, the ISCA from 2001 to 2005, and the Acoustical Society of Japan (ASJ) from 2001 to 2003. He served on the IEEE Technical Committees on Speech as well as Multimedia Signal Processing, and the Technical Program Committees of ICASSP86 in Tokyo as well as ICSLP90 in Kobe. He served on ICSLP94 in Yokohama as Vice Chairman of the Conference Committee. He has organized various international conferences and workshops including the 1997 IEEE Workshop on Automatic Speech Recognition and Understanding. He has also served on several international advisory boards in the US and Europe. He served as a Board member of the IEEE Signal Processing Society from 2001 to 2003. He served as Editor-in-Chief of the Journal of Speech Communication from 1997 to 2001, Chief Editor of the Journal of the ASJ from 1997 to 1999, and Chief Editor of the English Journal of IEICE from 2001 to 2003. He also served as an IEEE Press Editorial Board member from 1995 to 1999. He is now serving as an Editorial Board member of the Journal of Computer Speech and Language and the Journal of Speech Communication. He has also served as a Board member of the IEICE and the ASJ.

He supervised the five-year Japanese Science and Technology Agency Priority Program entitled "Spontaneous Speech: Corpus and Processing Technology" from 1999 to 2004. He has supervised the 21st Century Center of Excellence (COE) Program entitled "Framework for Systematization and Application of Large-scale Knowledge Resources" since its inception in 2003.

He received the Yonezawa Prize in 1975 and the Best Paper Award in 1988, 1993 and 2003 from the IEICE. He received the Sato Paper Award from the ASJ in 1985 and 1987. He received the Senior Award from the IEEE ASSP Society and the Achievement Award from the Minister of Science and Technology, both in 1989. He received the Book Award from the IEICE in 1990 and the Achievement Award from the IEICE in 2003. He received the IEEE Signal Processing Society Award, the Achievement Award from the Minister of Education, Culture, Sports, Science and Technology, and the Purple Ribbon Medal from the Japanese Emperor in 2006. He received the Distinguished Achievement and Contributions Award from the IEICE in 2008, the ISCA Medal for Scientific Achievement in 2009, and the IEEE James L. Flanagan Speech and Audio Processing Award in 2010. He also received the Mira Paul Memorial Award from the AFECT, India in 2001. He was a Distinguished Lecturer of the IEEE Signal Processing Society from 1993 to 1994.

He is the author of "Digital Speech Processing, Synthesis, and Recognition" (Marcel Dekker, 1989, revised in 2000) in English, "Digital Speech Processing" (Tokai University Press, 1985) in Japanese, "Acoustics and Speech Processing" (Kindai-Kagaku-Sha, 1992, revised in 2006) in Japanese, and "Speech Information Processing" (Morikita, 1998) in Japanese. He has authored "Building computers talking with people - Forefront of automatic speech recognition -" (Kadokawa, 2009) in Japanese. He has co-authored "Image and Speech Processing Technology" (Denpa-Shinbun-Sha, 2004) in Japanese. He has edited "Advances in Speech Signal Processing" (Marcel Dekker, 1992) jointly with Dr. M.M. Sondhi. He has translated into Japanese "Fundamentals of Speech Recognition," authored by Drs. L.R. Rabiner and B.-H. Juang (NTT Advanced Technology, 1995) and "Vector Quantization and Signal Compression," authored by Drs. A. Gersho and R. M. Gray (Corona-sha, 1998).


Invited Speech: Shifts in Information Focus between Chinese and English Abstracts of Chinese Scientific Journal Papers

Helena H. Gao, Nanyang Technological University, Singapore


Owing to China's rapid development, research papers published in Chinese have increasingly attracted the attention of researchers outside the country. However, those who do not know the Chinese language cannot refer to these publications without translation. In recent decades, more and more Chinese scientific journals have made it a norm to include an English translation of the paper abstract. This has greatly helped non-Chinese researchers to ascertain a paper's contents before they decide whether to have the complete paper translated.

Thus, it seems crucial for the English abstract of a Chinese paper to present a summary that includes all the important information. However, an abstract of a scientific paper can be no more than 200 to 500 words. Within this limited length, an abstract is expected to give a clear description of the author(s)’ research motivation, statement of the problem to be solved, approach(es) used, results or solutions found to solve the problem, and, last but not least, a conclusion. These five key points in a brief summary of a paper are assumed to be the standard norm for all authors to follow, whether the paper is written in English or Chinese.

In this talk I will present the results of an analysis of the information contents and the linguistic characteristics of scientific journal paper abstracts written in Chinese with translations into English. A sample of 60 paper abstracts published in two top scientific journals, Journal of Computer Research and Development and Chinese Journal of Computers, both published by the Institute of Computing Technology of the Chinese Academy of Sciences, was collected for the analysis. The information contents were classified based on the following five key points usually found in research paper abstracts: (1) the author(s)’ research motivation, (2) statement of the problem to be solved, (3) approach(es) used, (4) results or solutions found to solve the problem, and (5) a conclusion. The linguistic characteristics were analyzed at three levels: (1) the discourse level, to analyze the document structure; (2) the sentence and inter-clause level, to analyze the rhetorical structures that categorize the contents; (3) the phrase and word level, to analyze the rhetorical functions of the non-technical words that contribute to the reporting of research information.

The findings show that the differences found between the Chinese and English abstracts of papers published in the two top journals in China mainly come from the distinctive linguistic characteristics of the two languages. The shifts in information focus may slightly change the meanings of the contents in Chinese. However, in terms of quality and purpose, the English versions of the Chinese abstracts serve equally well as concise summaries of scientific papers. The results of this study can be useful for coaching graduate students in scientific writing. They can also be further analyzed and classified into categorical details to be used for statistical and comparative studies of scientific writing in Chinese and English.

Dr. Helena H. Gao received her doctoral degree in linguistics from Lund University, Sweden and post-doctoral training in cognitive psychology from the University of Toronto, Canada. She is currently an assistant professor at Nanyang Technological University (NTU), Singapore. Before she joined NTU in August 2006, she was the Research Director in the Cognitive Development Lab/Child Study Centre at the University of Toronto. She has also held teaching appointments at the University of Toronto, Lund University, Taiwan Fu-Jen Catholic University, and Shenyang Jianzhu University. Dr. Gao’s research interests include cognitive linguistics, psycholinguistics, corpus linguistics, and computational linguistics.

