Program Schedule

Local Time (GMT+8) Session Details
DAY 1: November 18
10:00-10:30 Opening Ceremony
10:30-12:15 Session 1: Speech Recognition
#12. Multi-Encoder Sequential Attention Network for Context-Aware Speech Recognition in Japanese Dialog Conversation, Nobuya Tachimori, Sakriani Sakti and Satoshi Nakamura
#29. Investigation of a Single-Channel Frequency-Domain Speech Enhancement Network to Improve End-to-End Bengali Automatic Speech Recognition Under Unseen Noisy Conditions, Md Mahbub E Noor, Yen-Ju Lu, Syu-Siang Wang, Supratip Ghose, Chia-Yu Chang, Ryandhimas E. Zezario, Shafique Ahmed, Wei-Ho Chung, Yu Tsao and Hsin-Min Wang
#32. A Study on Native American English Speech Recognition by Indian Listeners with Varying Word Familiarity Level, Abhayjeet Singh, Achuth Rao Mv, Rakesh Vaideeswaran, Chiranjeevi Yarra and Prasanta Kumar Ghosh
#34. Speech Recognition System for Writing Dentist Medical Records, Dinda Yora Islami and Dessi Puji Lestari
#42. A Multi-Genre Urdu Broadcast Speech Recognition System, Erbaz Khan, Sahar Rauf, Farah Adeeba and Sarmad Hussain
#39. An Empirical Study of Speaker Identification System for Mono and Traverse Linguistic Background Using EM and SMEM, Amita Dev, Shweta Bansal and Shyam Sunder Agrawal
#43. Self-Supervised Spoken Question Understanding and Speaking with Automatic Vocabulary Learning, Keisuke Toyoda, Yusuke Kimura, Mingxin Zhang, Kent Hino, Kosuke Mori and Takahiro Shinozaki
12:15-14:00 Lunch Break
14:00-15:00 Keynote 1: Self-Supervised Representation Learning for Pre-training Speech Systems
Laurent Besacier, Principal Scientist, Naver Labs Europe
15:15-17:00 Session 2: Emotion and Prosody
#17. Design and Basic Analysis of the TUT Emotional Storytelling Corpus, Hikaru Oishi, Mika Enomoto, Keiko Ochi and Yasunari Obuchi
#20. Construction and Analysis of Tibetan Amdo Dialect Speech Dataset for Speech Synthesis, Xinyi Zhang, Wenhuan Lu, Xinyue Zhao, Yi Zhu and Jianguo Wei
#23. INTO-CASS: A Corpus for the Study of Intonation and Prosody in Chinese Dialects and Ethnic Languages, Aijun Li and Ziyu Xiong
#37. Discourse Timing in Children’s Rhyme Speech Produced by Prelingually Deaf Mandarin-Speaking Children with Cochlear Implants, Jue Yu and Qianwen Jin
#21. Towards the Development of Segment Level Speech Overlap Detection Using Convolutional Neural Network, Ronald John Cabatic and Angelica De La Cruz
#33. On the Use of Gestures in Dialogue Breakdown Detection, Taiga Mori, Kristiina Jokinen and Yasuharu Den
#18. Aspect-Based Sentiment Analysis of User Created Game Reviews, Ian Michael Urriza and Maria Art Antonette Clariño
DAY 2: November 19
10:00-11:30 Session 3: Language Learning
#1. L2 Accent and Intelligibility by Chinese L2 Speakers of English, Yizhou Lan and Tongtong Xie
#2. A Study on English Word-final Coronal Stop Deletion by Chinese EFL Learners, Tong Li and Hui Feng
#8. The Role of High Variability Phonetic Training on Chinese EFL Learners’ Perception of English Vowels in Noisy Environment, Qianxi Yu and Ping Tang
#10. The Effect of Overnight Consolidation on English Vowel Perception by Chinese Learners After High Speaker Variability Phonetic Training, Yanan Shen and Ping Tang
#24. Mandarin Speakers’ Acquisitions and Representations of Flapping in American English in an ESL Context: A Perception and Production Study, Chuang Chia Wei
#36. Tonal Patterns of Tri-Syllabic Words in the Production of Standard Chinese of Bilingual Teachers, Yuan Jia and Bin Li
11:45-12:45 Sponsor Session
12:45-14:00 Lunch Break
14:00-15:00 Keynote 2: Environmentally Robust Speech Recognition: A Corpus-Based Perspective
Jun Du, Associate Professor, University of Science and Technology of China (USTC)
15:15-16:45 Session 4: Multimodal Databases
#3. SPIRE VCV: An Acoustic-Articulatory Corpus with Three Different Speaking Rates, Tilak Purohit, Tejas Umesh, Shankar Narayanan, S Minulakshmi and Prasanta Ghosh
#9. Khmer Speech Translation Corpus of the Extraordinary Chambers in the Courts of Cambodia (ECCC), Kak Soky, Masato Mimura, Tatsuya Kawahara, Sheng Li, Chenchen Ding, Chenhui Chu and Sethserey Sam
#11. SLoClas: A Database for Joint Sound Localization and Classification, Qian Xinyuan, Bidisha Sharma, Amine El Abridi and Haizhou Li
#16. GAMVA: A Japanese Audio-visual Multi-Angle Speech Corpus, Shinnosuke Isobe, Ryuichi Hirose, Takumi Nishiwaki, Tomohiro Hattori, Satoshi Tamura, Yuuto Gotoh and Masaki Nose
#19. M2ASR-Mongo: A Free Mongolian Speech Database and Accompanied Baselines, Tiankai Zhi, Ying Shi, Wenqiang Du, Guanyu Li and Dong Wang
#44. WSPIRE: A Parallel Multi-Device Corpus in Neutral and Whisper Speech, Abinay Reddy Naini, Bhavuk Singhal and Prasanta Kumar Ghosh
17:00-18:00 O-COCOSDA Steering Committee Meeting (Committee members only)
DAY 3: November 20
10:00-11:30 Session 5: Dialects and Accents
#4. Which Phonemes Will Distinguish the Different Regions Within the Same Dialect? Xuefei Liu, Jianhua Tao, Yurong Han, Chenglong Wang, Xueying Zheng and Zhengqi Wen
#5. Comparison of Static and Time-Sequential Features in Automatic Fluency Detection of Spontaneous Speech, Huaijin Deng, Takehito Utsuro, Akio Kobayashi and Hiromitsu Nishizaki
#6. How Do Speakers Pause and Hesitate in English and Japanese? – A Comparison Using Parallel Corpora of English and Japanese Presentation Speeches, Michiko Watanabe, Yuma Shirahata, Ralph Rose and Kikuo Maekawa
#26. Korean Dialect Identification Based on Intonation Modeling, Jooyoung Lee, Kyungwha Kim and Minhwa Chung
#35. Development of Accent Recognition Systems for Vietnamese Speech, Quang Tien Duong and Van Hai Do
#7. A Blind Method for Phone Segmentation and Its Evaluation on Vietnamese Speech Corpus, Dac-Thang Hoang and Tat-Thang Vu
11:45-12:45 Country/Region Reports
China, Aijun Li and Dong Wang
Hong Kong, Tan Lee
India, S.S. Agrawal and K. Samudravijaya
Indonesia, Dr. Ir. Hammam Riza
Japan, Satoshi Nakamura
Korea, Prof. Emeritus
Philippines, Nathaniel Oco
Singapore, Haizhou Li
Taiwan, Sin-Horng Chen and Hsin-Min Wang
Thailand, Ausdang Thangthai
Vietnam, Luong Chi Mai
Lunch Break
14:00-15:00 Virtual Tour
15:15-16:45 Session 6: Speech Synthesis and Translation
#13. Simultaneous Speech-to-speech Translation System with Transformer-based Incremental ASR, MT, and TTS, Ryo Fukuda, Sashi Novitasari, Yui Oka, Yasumasa Kano, Yuki Yano, Yuka Ko, Hirotaka Tokuyama, Kosuke Doi, Tomoya Yanagita, Sakriani Sakti, Katsuhito Sudoh and Satoshi Nakamura
#14. Using Speech Enhancement to Realize Speech Synthesis of Low-Resource Dungan Languages, Rui Jiang, Hongwu Yang, Sicheng Chen and Xin Shan
#15. A Study on Neural-Network-Based Text-to-Speech Adaptation Techniques for Vietnamese, Phuong Pham Ngoc, Chung Tran Quang, Truong Do Quoc and Mai Luong Chi
#25. Using Local Phrase Dependency Structure Information in Neural Sequence-to-Sequence Speech Synthesis, Nobuyoshi Kaiki, Sakriani Sakti and Satoshi Nakamura
#28. Text-to-Speech Systems for Filipino Using Unit Selection and Deep Learning, Edsel Jedd Renovalles and Crisron Rudolf Lucas
#31. Investigation of an Input Sequence on Thai Neural Sequence-to-Sequence Speech Synthesis, Pongsathon Janyoi and Ausdang Thangthai
16:45-17:15 Closing Ceremony