Spoken dialogue for a human-like conversational robot ERICA
Time: Monday 14, 9:30 – 10:30
Speaker: Prof. Tatsuya Kawahara (School of Informatics, Kyoto University)
Summary: This talk introduces our symbiotic human-robot interaction project, which aims to build an autonomous android that behaves and interacts just like a human. The conversational android ERICA is designed to perform several social roles centered on spoken dialogue, such as attentive listening (similar to counseling) and job interviewing. Design principles, problems, and current solutions in developing the spoken dialogue modules are presented.
Short Biography: Prof. Tatsuya Kawahara received B.E. in 1987, M.E. in 1989, and Ph.D. in 1995, all in information science, from Kyoto University, Kyoto, Japan. From 1995 to 1996, he was a Visiting Researcher at Bell Laboratories, Murray Hill, NJ, USA. Currently, he is a Professor in the School of Informatics, Kyoto University. He has also been an Invited Researcher at ATR and NICT.
He has published more than 300 technical papers on speech recognition, spoken language processing, and spoken dialogue systems. He has led several projects, including the speech recognition software Julius and the automatic transcription system for the Japanese Parliament (Diet).
Dr. Kawahara received the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology (MEXT) in 2012. From 2003 to 2006, he was a member of the IEEE SPS Speech Technical Committee. He was a General Chair of the IEEE Automatic Speech Recognition and Understanding workshop (ASRU 2007). He also served as a Tutorial Chair of INTERSPEECH 2010 and a Local Arrangement Chair of ICASSP 2012. He has been an editorial board member of the Elsevier Journal of Computer Speech and Language and of IEEE/ACM Transactions on Audio, Speech, and Language Processing. He is an editor in chief of APSIPA Transactions on Signal and Information Processing. Dr. Kawahara is a board member of APSIPA and ISCA, and a Fellow of IEEE.
M3 Dialogs – Multimodal, Multilingual, Multiparty
Time: Tuesday 15, 9:00 – 10:00
Speaker: Prof. Alex Waibel (Carnegie Mellon University, USA & Karlsruhe Institute of Technology, Germany)
Summary: Even though great progress has been made in building (and indeed commercially deploying) speech dialog systems, they are still rather siloed and limited in scope, domain, style, language, and participants. Most systems are strictly human-machine, one language, one request at a time, usually with a clear on-off signal and identification of who wants what from whom (“Alexa, what is the weather in Singapore?”). Even though existing systems now do this rather well (a great achievement!), they fall far short of the ease, breadth, and robustness with which humans can communicate.
This talk is part review and part speculation on how systems could be different. I will take a liberal view of what constitutes a “Dialog System”. In this view, a dialog is not only human-machine, but also human-human, human-machine-human and machine-machine-human… and preferably all of the above in purposeful integration.
I will outline the flexibilities we are missing in modern dialog systems, review several of our efforts aimed at addressing them, and speculate on future directions for the research community.
Short Biography: Dr. Alexander Waibel is a Professor of Computer Science at Carnegie Mellon University, Pittsburgh, and at the Karlsruhe Institute of Technology, Germany. He is the director of the International Center for Advanced Communication Technologies (interACT). The Center works in a network with eight of the world’s top research institutions. Its mission is to develop advanced machine learning algorithms to improve human-human and human-machine communication technologies. Prof. Waibel and his team pioneered statistical and neural learning algorithms that made such communication breakthroughs possible. Most notably, the “Time-Delay Neural Network” (1987) (now also known as a “convolutional” neural network) is at the heart of many of today’s AI technologies. System breakthroughs that followed included early multimodal dialog interfaces, the first speech translation system in Europe and the USA (1990/1991), the first simultaneous lecture interpretation system (2005), and Jibbigo, the first commercial speech translator embedded on a phone (2009).
Dr. Waibel founded and served as chairman of C-STAR (Consortium for Speech Translation Advanced Research) in 1991. He directed research programs in machine perception, interpretation and machine learning in the US, Europe and Asia, including EU-Bridge (2012-2015) and CHIL (2004-2007), two large European multi-site Integrated Project initiatives on intelligent assistants and speech translation services. He was also a co-director of IMMI, a joint venture between KIT, CNRS & RWTH.
Dr. Waibel is a Fellow of the IEEE and has received numerous awards for his work on multilingual and multimodal communication and translation. He is a member of the National Academy of Sciences of Germany, and Honorary Senator of the Hochschulrektorenkonferenz (the Representation of German Universities). He has published extensively (>800 publications, >28,000 citations, h-index >80) in the field and holds many patents.
During his career, Waibel founded and built 10 successful companies. Following the acquisition of Jibbigo by Facebook, Waibel served as founding director of the FB Language Technology Group. He also deployed speech translation technologies in humanitarian and disaster relief missions. His team recently deployed the first simultaneous interpretation service for lectures at universities and supports language tools at the European Parliament.
Dr. Waibel received his BS degree from MIT and his MS and PhD degrees from CMU.
Beyond Dialogue System Dichotomies: Principles for Human-Like Dialogue
Time: Wednesday 16, 9:00 – 10:00
Speaker: Prof. David Traum (University of Southern California, USA)
Summary: Many have proposed related dichotomies contrasting two different kinds and aims of dialogue systems (or AI more generally). One of the issues is whether human-system dialogue should even be human-like at all. I will explore these dichotomies and present “role-play dialogue” as a place where these dichotomies can find a commonality of purpose and where being human-like is important even simply for effective task performance. I will attempt to define “Human-like Dialogue” (HLD) as distinct from purely human dialogue and also distinct from instrumental dialogue. In the second part of the talk, I will give guiding principles (dos and don’ts) for creating and evaluating HLD agents.
Short Biography: David Traum is a principal scientist at ICT and a research faculty member at the Department of Computer Science at USC. At ICT, Traum leads the Natural Language Dialogue Group, which consists of seven Ph.D.s, four students, and four other researchers. The group engages in research in all aspects of natural language dialogue, including dialogue management, spoken and natural language understanding and generation, and dialogue evaluation. In addition, the group collaborates with others at ICT and elsewhere on integrated virtual humans and on transitioning natural language dialogue capability for use in training and other interactive applications. Traum’s research focuses on dialogue communication between human and artificial agents. He has engaged in theoretical, implementational and empirical approaches to the problem, studying human-human natural language and multi-modal dialogue, as well as building a number of dialogue systems to communicate with human users. He has pioneered several research thrusts in computational dialogue modeling, including computational models of grounding (how common ground is established through conversation), the information state approach to dialogue, multiparty dialogue, and non-cooperative dialogue. Traum is the author of over 200 technical articles, is a founding editor of the journal Dialogue and Discourse, has chaired and served on many conference program committees, and is currently the president emeritus of SIGDIAL, the international special interest group in discourse and dialogue. He earned his Ph.D. in computer science at the University of Rochester in 1994.