11-14 December 2022, Singapore
This page contains archives of the following presentations:
Keynote 1: Advancing end-to-end automatic speech recognition and beyond (Slides, Video)
Keynote 2: Recent progress in code-switch Singapore English+Mandarin large vocabulary continuous speech recognition (Slides, Video)
Keynote 3: Automated Assessment and Feedback: the Role of Spoken Grammatical Error Correction (Slides, Video)
Tutorial 1: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Speech Processing (Slides, Video)
Tutorial 2: TorchAudio Tutorial (Slides, Video)
Tutorial 3: Towards Solving Cocktail Party Problem with Artificial Intelligence (Slides, Video)
Tutorial 4: Quantum Machine Learning for Speech Processing: from Theoretical Foundations to Practices (Slides, Video)
Tutorial 5: Recent Advances on Automatic Dialogue Evaluation (Slides, Video)
Challenge 1: Conversational Short-Phrase Speaker Diarization Challenge (CSSD) Gaofeng Cheng, Yifan Chen, Runyan Yang, Qingxuan Li, Zehui Yang, Lingxuan Ye, Pengyuan Zhang, Qingqing Zhang, Lei Xie, Yanmin Qian, Kong Aik Lee and Yonghong Yan Kai Li Tao Liu, Xu Xiang, Zhengyang Chen, Bing Han, Kai Yu and Yanmin Qian GC1.4 132 TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge Bowen Pang, Huan Zhao, Gaosheng Zhang, Xiaoyue Yang, Yang Sun, Li Zhang, Qing Wang and Lei Xie
Challenge 2: Intelligent Cockpit Speech Recognition Challenge (ICSRC) Ao Zhang, Fan Yu, Kaixun Huang, Lei Xie, Longbiao Wang, Eng Siong Chng, Hui Bu, Binbin Zhang, Wei Chen and Xin Xu GC2.2 139 The FawAI ASR System for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge Yujia Sun, Bing Ge, Bo Chen, Zhen Fu, Jinxin He, Hongwei Gao and Xue Wang GC2.3 140 LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge Yan Jia, Mi Hong, Jingyu Hou, Kailong Ren, Sifan Ma, Jin Wang, Yinglin Ji, Fangzhen Peng, Lin Yang and Junjie Wang GC2.4 141 Efficient Conformer-Based CTC Model for Intelligent Cockpit Speech Recognition Hanzhi Guo, Yunshu Chen, Xukang Xie, Gaopeng Xu and Wei Guo
Challenge 3: Chinese-English Code-Switching Automatic Speech Recognition (CSASR) GC3.1 138 Summary on the ISCSLP 2022 Chinese-English Code-switching ASR Challenge Shuhao Deng, Chengfei Li, Jinfeng Bai, Qingqing Zhang, Wei-Qiang Zhang, Runyan Yang, Gaofeng Cheng, Pengyuan Zhang and Yonghong Yan GC3.2 135 The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge Yuhao Liang, Peikun Chen, Fan Yu, Xinfa Zhu, Tianyi Xu, Yingying Gao and Lei Xie GC3.3 136 Hybrid CTC Language Identification Structure for Mandarin-English Code-Switching ASR Hengxin Yin, Guangyu Hu, Fei Wang and Pengfei Ren
Oral 1: Speech Recognition I Chao-Han Huck Yang, Jun Qi, Sabato Marco Siniscalchi and Chin-Hui Lee OS1.2 25 Adaptive Attention Network with Domain Adversarial Training for Multi-Accent Speech Recognition Yanbing Yang, Hao Shi, Yuqin Lin, Meng Ge, Longbiao Wang, Qingzhi Hou and Jianwu Dang OS1.3 26 Multilingual Zero Resource Speech Recognition Base on Self-Supervise Pre-Trained Acoustic Models Haoyu Wang, Wei-Qiang Zhang, Hongbin Suo and Yulong Wan Song Li, Haoneng Luo, Wenxuan Hu, Yuan Liu, Shiliang Zhang, Lin Li and Qingyang Hong OS1.5 49 Sequence Distribution Matching for Unsupervised Domain Adaptation in ASR Qingxuan Li, Han Zhu, Liuping Luo, Gaofeng Cheng, Pengyuan Zhang, Jiasong Sun and Yonghong Yan HoLam Chung, Junan Li, Pengfei Liu, Wai Kim Leung, Xixin Wu and Helen Meng |
Oral 2: Speech Production and Perception I OS2.1 54 Perception and production of Mandarin vowels by teenagers--blind and sighted Moyu Chen, Jing Qi and Xiyu Wu OS2.2 70 The Production of Contrastive Focus by Children Learning Mandarin Chinese Jing Lu and Ping Tang OS2.3 77 Production Characteristics of Vowels in Standard Chinese by Preschool Bilingual Teachers Linjiao Pan and Yuan Jia OS2.4 81 Effects of Aspiration on Tone Production and Perception in Standard Chinese Chong Cao and Aijun Li Jingwen Cheng, Yingming Gao, Yuchen Yan, Xiaoli Feng, Binghuai Lin and Jinsong Zhang OS2.6 3 A preliminary ultrasonic investigation of tenseness in Northern Yi Shuwen Chen |
Oral 3: Speech Synthesis Chunyu Qiang, Peng Yang, Hao Che, Xiaorui Wang and Zhongyuan Wang Qicong Xie, Tao Li, Xinsheng Wang, Zhichao Wang, Lei Xie, Guoqiao Yu and Guanglu Wan OS3.3 51 Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS Kun Song, Jian Cong, Xinsheng Wang, Yongmao Zhang, Lei Xie, Ning Jiang and Haiying Wu OS3.4 52 AccentSpeech: Learning Accent from Crowd-sourced Data for Target Speaker TTS with Accents Yongmao Zhang, Zhichao Wang, Peiji Yang, Hongshen Sun, Zhisheng Wang and Lei Xie OS3.5 28 CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction Daxin Tan, Liqun Deng, Nianzu Zheng, Yu Ting Yeung, Xin Jiang, Xiao Chen and Tan Lee OS3.6 62 HILvoice: Human-in-the-Loop Style Selection for Elder-Facing Speech Synthesis Xueyuan Chen, Qiaochu Huang, Xixin Wu, Zhiyong Wu and Helen Meng |
Special Session 1: Data Augmentation in Speech Technologies SS1.1 103 Dynamic Thresholding on FixMatch with Weak and Strong Data Augmentations for Sound Event Detection Tanmay Khandelwal and Rohan Kumar Das SS1.2 118 Data Augmentation for Infant Cry Classification Aastha Kachhi, Shreya Chaturvedi, Hemant A. Patil and Dipesh Kumar Singh Yikang Wang, Xingming Wang, Hiromitsu Nishizaki and Ming Li SS1.4 8 Improving Speech Recognition with Augmented Synthesized Data and Conditional Model Training Shaofei Xue, Jian Tang and Yazhu Liu SS1.5 85 Speaking style compensation on synthetic audio for robust keyword spotting Houjun Huang and Yanmin Qian Qing Wang, Jun Du, Siyuan Zheng, Yunqing Li, Yajian Wang, Yuzhong Wu, Hu Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Yannan Wang and Chin-Hui Lee |
Oral 4: Voice Conversion & Spoofing Speech Detection OS4.1 38 End-to-End Voice Conversion with Information Perturbation Qicong Xie, Shan Yang, Yi Lei, Lei Xie and Dan Su OS4.2 42 Mix-Guided VC: Any-to-many Voice Conversion by Combining ASR and TTS Bottleneck Features Zeqing Zhao, Sifan Ma, Yan Jia, Jingyu Hou, Lin Yang and Junjie Wang Dengfeng Ke, Wenhan Yao, Ruixin Hu, Qi Luo, Liangjie Huang, Qi Luo and Wentao Shu OS4.4 116 The Impact of Room Acoustics on Replay Speech Signal Madhu R. Kamble and Hemant A. Patil OS4.5 124 Effect of Speaker-Microphone Proximity on Pop Noise: Continuous Wavelet Transform-Based Approach Priyanka Gupta and Hemant A. Patil OS4.6 127 Synthetic Voice Detection and Audio Splicing Detection using SE-Res2Net-Conformer Architecture Lei Wang, Benedict Yeoh and Jun Wah Ng OS4.7 144 Audio Splicing Localization: Can We Accurately Locate the Splicing Tampering? Zhiping Zeng and Zhizheng Wu |
Oral 5: Speech Enhancement and Separation OS5.1 71 Masking-based Neural Beamformer for Multichannel Speech Enhancement Shuai Nie, Shan Liang, Zhanlei Yang, Longshuai Xiao, Wenju Liu and Jianhua Tao OS5.2 36 Deep Multi-task Cascaded Acoustic Echo Cancellation and Noise Suppression Junjie Li, Meng Ge, Longbiao Wang and Jianwu Dang OS5.3 30 Boosting the Performance of SpEx+ by Attention and Contextual Mechanism Chenyi Li, Zhiyong Wu, Wei Rao, Yannan Wang and Helen Meng Shangdi Liao and Fei Chen OS5.5 6 Speech-enhanced and Noise-aware Networks for Robust Speech Recognition Hung-Shin Lee, Pin-Yuan Chen, Yao-Fei Cheng, Yu Tsao and Hsin-Min Wang Yuxiao Lin, Zhihao Du, Shiliang Zhang, Fan Yu, Zhou Zhao and Fei Wu OS5.7 133 Speech Enhancement Based on CycleGAN with Noise-informed Training Wen-Yuan Ting, Syu-Siang Wang, Hsin-Li Chang, Borching Su and Yu Tsao |
Oral 6: Speech Recognition II OS6.1 10 Incorporating VAD into ASR System by Multi-task Learning Meng Li, Yan Xia and Feng Lin OS6.2 20 Improving ASR in Reverberant Environments Yen-Lun Liao, Chi-Han Lin, Ren-Yuan Lyu and Jyh-Shing Roger Jang OS6.3 23 3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition Zhao You, Shulin Feng, Dan Su and Dong Yu OS6.4 32 Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition Yuting Yang, Binbin Du and Yuke Li Keyu An, Ji Xiao and Zhijian Ou OS6.6 73 Ensemble and Re-ranking based on Language Models to Improve ASR Shu-Fen Tsai, Shih-Chan Kuo, Ren-Yuan Lyu and Jyh-Shing Roger Jang |
Oral 7: Speech Production and Perception II OS7.1 19 Acoustic and Perceptual Study of Tones in Jin Chinese (Togtoh variety) Yue Wang and Wen Liu OS7.2 53 Acoustic-perceptual correlates of whispered Mandarin consonants Min Xu, Jing Shao, Hongwei Ding and Lan Wang Kimiko Tsukada, Yurong and Badmaavanchin Munguntsetseg OS7.4 66 Multichannel Emotional Perception in Chinese Female: Faces, Voices and Bodies Ruiqi Ge and Xiyu Wu OS7.5 128 Coda Nasal Perception in Wenzhou Wu and Rugao Mandarin by Native Speakers of Standard Mandarin Yanyang Chen, Xinya Zhang, Ying Chen and Jiazheng Wang OS7.6 22 Objective Hand Complexity Comparison between Two Mandarin Chinese Cued Speech Systems Li Liu, Gang Feng, Xiaoxi Ren and Xianping Ma |
Oral 8: Speech Synthesis & Speaker Embedding OS8.1 34 Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis Dengfeng Ke, Yayue Deng, Yukang Jia, Jinlong Xue, Qi Luo, Ya Li, Jianqing Sun, Jiaen Liang and Binghuai Lin OS8.2 74 AdaptiveFormer : A Few-shot Speaker Adaptative Speech Synthesis Model based on FastSpeech2 Dengfeng Ke, Ruixin Hu, Qi Luo, Liangjie Huang, WenHan Yao, Wentao Shu, Jinsong Zhang and Yanlu Xie OS8.3 27 ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis Jinlong Xue, Yayue Deng, Yichen Han, Ya Li, Jianqing Sun and Jiaen Liang OS8.4 78 Low-Resource Speech Synthesis with Speaker-Aware Embedding Li-Jen Yang, I-Ping Yeh and Jen-Tzung Chien Zhijunyi Yang, Mengjie Du, Rongfeng Su, Xiaokang Liu, Nan Yan and Lan Wang OS8.6 120 Shuffle is What You Need Wan Lin, Lantian Li and Dong Wang |
Special Session 2: Deep Noise Reduction SS2.1 104 On the Use of Absolute Threshold of Hearing-based Loss for Full-band Speech Enhancement Rohith Mars and Rohan Kumar Das SS2.2 106 RAT: RNN-Attention Transformer for Speech Enhancement Tailong Zhang, Shulin He, Hao Li and Xueliang Zhang SS2.3 109 A Speech-Noise-Equilibrium Loss Function for Deep Learning-Based Speech Enhancement Weitong Zhao, Fushi Xie, Kang Ouyang and Nengheng Zheng Shulin He, Hao Li and Xueliang Zhang SS2.5 105 Two-Branch Network with Selective Kernel Convolution for Time-Domain Speech Enhancement Hui Li, Zhihua Huang and Chuangjian Guo Guochen Yu, Andong Li, Wenzhe Liu, Chengshi Zheng, Yutian Wang and Hui Wang |
OS9.1 46 Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function Qing Wang, Hang Chen, Ya Jiang, Zhe Wang, Yuyang Wang, Jun Du and Chin-Hui Lee OS9.2 108 Multi-Task Joint Learning for Embedding Aware Audio-Visual Speech Enhancement Chenxi Wang, Hang Chen, Jun Du, Baocai Yin and Jia Pan Jiajun Liu, Huazhen Meng, Yunfei Shen, Linna Zheng and Aishan Wumaier OS9.4 114 Cantonese neural speech synthesis from found newscasting video data and its speaker adaptation Raymond Chung Yuan-Fu Liao, Yu-Hsuan Huang, Matus Pleva, Daniel Hládek and Ming-Hsiang Su OS9.6 18 Reconstruction of speech spectrogram based on non-invasive EEG signal Di Zhou, Masashi Unoki, Gaoyan Zhang and Jianwu Dang |
Oral 10: Speech Prosody Binbin Shen, Jian Luan, Shengyan Zhang, Quanbo Shen and Yujun Wang OS10.2 12 A Mandarin Prosodic Boundary Prediction Model Based on Multi-Source Semi-Supervision Peiyang Shi, Zengqiang Shang and Pengyuan Zhang OS10.3 59 English lexical stresses in non-native speech under adverse conditions Mosi He, Ting Zhang, Bin Li and Kin Cheung OS10.4 35 Stress Gravity of Neutral Tone Words in Different Information Structures Jingwen Huang and Aijun Li Tong Li, Hui Feng and Yuan Jia OS10.6 102 In-group Advantage for Chinese and English Emotional Prosody in Quiet and Noise Conditions Yuhan Yan, Shanpeng Li and Ying Chen |
Oral 11: Lightweight Model & Knowledge Distillation OS11.1 48 Multi-Resolution Stacked 1D-CNN for Small-Footprint keyword Spotting with Two-Stage Detection Jian Tang and Shaofei Xue OS11.2 65 Lightweight End-to-End Deep Learning Model for Music Source Separation Yao-Ting Wang, Yi-Xing Lin, Kai-Wen Liang, Tzu-Chiang Tai and Jia-Ching Wang OS11.3 97 AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation Kun Song, Heyang Xue, Xinsheng Wang, Jian Cong, Yongmao Zhang, Lei Xie, Bing Yang, Xiong Zhang and Dan Su OS11.4 43 Label-free Knowledge Distillation with Contrastive Loss for Light-weight Speaker Recognition Zhiyuan Peng, Xuanji He, Ke Ding, Tan Lee and Guanglu Wan OS11.5 121 Improving Speech Separation with Knowledge Distilled from Self-supervised Pre-trained Models Bowen Qu, Chenda Li, Jinfeng Bai and Yanmin Qian OS11.6 111 Text-Informed Knowledge Distillation for Robust Speech Enhancement and Recognition Wei Wang, Wangyou Zhang, Shaoxiong Lin and Yanmin Qian |
Oral 12: Speech Technology for Health OS12.1 94 Prediction of Depression Severity Based on Transformer Encoder and CNN Model Jiahao Lu, Bin Liu, Zheng Lian, Cong Cai, Jianhua Tao and Ziping Zhao OS12.2 5 Depressive Tendency Recognition by Fusing Speech and Text Features: A Comparative Analysis Yimin He, Xiaoyong Lu, Jingyi Yuan, Tao Pan and Yafan Wang OS12.3 17 Medical Difficult Airway Detection using Speech Technology Zhikai Zhou, Shuang Cao, Zhengyang Chen, Bei Liu, Ming Xia, Hong Jiang and Yanmin Qian OS12.4 88 CUEMPATHY: A Counseling Speech Dataset for Psychotherapy Research Dehua Tao, Harold Chui, Sarah Luk and Tan Lee Ying Qin, Tan Lee, Anthony Pak Hin Kong and Feng Lin OS12.6 39 Respiratory and laryngeal influences on voice in post-stroke dysarthria: a pilot study Tinghao Zhao, Xiaoxia Du, Juan Liu, Rongfeng Su, Nan Yan and Lan Wang |
Oral 13: Listening Comprehension of Machines and Humans OS13.1 110 End-to-end speech topic classification based on pre-trained model Wavlm Tengfei Cao, Liang He and Fangjing Niu Tsung-Hsien Yang, Matus Pleva, Daniel Hládek and Ming-Hsiang Su OS13.3 29 Dialogue scenario classification based on social factors Yuning Liu, Di Zhou, Masashi Unoki, Jianwu Dang and Aijun Li OS13.4 112 BERT-LID: Leveraging BERT to Improve Spoken Language Identification Yuting Nie, Junhong Zhao, Wei-Qiang Zhang and Jinfeng Bai Rian Bao, Linkai Peng, Yuchen Yan and Jinsong Zhang Rian Bao, Linkai Peng, Yingming Gao and Jinsong Zhang |
Oral 14: Acoustic Phonetics & Prosody OS14.1 21 An Acoustic Study on Fricative Vowel [iʑ] in Zhongwei Chinese Xinyi Zhang and Wen Liu OS14.2 69 Acoustic Features of Consonants of Standard Chinese and English by Uyghur Native Speakers Yuan Jia and Xintong Zuo OS14.3 33 A Study on Mandarin Chinese “Bu” Tone Sandhi Followed by English Words Kaige Gao and Xiyu Wu OS14.4 68 An Entropy-based Study on the Acquisition of Mandarin Initial Consonants by Korean Learners Xiaoli Feng, Yingming Gao, Jinsong Zhang and Yanchun Cao Yujie Ji, Qiqi Sun, Zhikang Peng and Xiaoming Jiang OS14.6 31 Acceptance of tonal and segmental variability correlates to inventory size in Mandarin Chinese Julie Siying Chen and Stephen Politzer-Ahles
Copyright © ISCSLP 2022