{"id":626,"date":"2022-08-23T16:15:04","date_gmt":"2022-08-23T16:15:04","guid":{"rendered":"https:\/\/www.colips.org\/conferences\/iscslp2022\/wp\/?page_id=626"},"modified":"2022-12-23T21:39:25","modified_gmt":"2022-12-23T13:39:25","slug":"tutorials","status":"publish","type":"page","link":"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/tutorials\/","title":{"rendered":"Tutorials"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-page\" data-elementor-id=\"626\" class=\"elementor elementor-626\">\n\t\t\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-610e31f elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"610e31f\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-f6e7a0f\" data-id=\"f6e7a0f\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-ab12dce elementor-widget elementor-widget-heading\" data-id=\"ab12dce\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\">Tutorial 1: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Speech Processing<\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-f69ad15 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"f69ad15\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div 
class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-a52ed7f\" data-id=\"a52ed7f\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-88d0691 elementor-widget elementor-widget-heading\" data-id=\"88d0691\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Presenters: Yu Zhang, Bo Li, Daniel Park, Google<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-5513650 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"5513650\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-c35d989\" data-id=\"c35d989\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-f73fc07 elementor-widget elementor-widget-text-editor\" data-id=\"f73fc07\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Semi-supervised learning (SSL), which uses unlabeled data to enhance the performance of labeled tasks, has recently played a crucial part in improving public automatic speech recognition (ASR) benchmarks. 
A combination of pre-training [1]\u2013[13] and self-training [14]\u2013[25] methods has enabled deep networks to advance the state of the art (SoTA) on public ASR datasets [12], [23], [26]. Despite these successes, the current setting for semi-supervised learning is limited in several respects. First, the unlabeled data is tailored to the supervised task, and models pretrained on Libri-Light have in some instances shown limited ability to generalize to other domains. Second, the Libri-Light dataset is not much larger than industrial-scale labeled datasets. Third, the supervised tasks considered are much smaller than the practical tasks on which network performance needs to be improved.<\/p>\n<p>In this tutorial, we will explore how to build a universal speech understanding model that can transcribe speech from many languages and domains and also obtain superior performance on many non-ASR speech understanding tasks, such as speech translation and non-semantic speech classification (for example, language identification and speaker identification). 
More precisely, we will cover:<\/p>\n<ul>\n<li>SSL at industrial model scale: from 600M to 8B parameters.<\/li>\n<li>SSL at industrial data scale: millions of hours of massive multilingual data.<\/li>\n<li>A new framework for both speech and text pretraining that is suitable for industrial scale.<\/li>\n<\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-0aa704e elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"0aa704e\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-b114fec\" data-id=\"b114fec\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-e1003df elementor-widget elementor-widget-text-editor\" data-id=\"e1003df\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Yu Zhang is currently a research scientist at Google Brain. He received his Ph.D. degree in computer science from the Massachusetts Institute of Technology in 2017. During his Ph.D., he worked on improving speech recognition performance. He is a fan of open-source projects and has contributed to or been involved in developing CNTK, MXNet and ESPNet to facilitate ASR research. His current research interests are improving ML model performance for various speech processing applications, with a focus on sequence-to-sequence modeling. 
Yu is a main contributor to Google&#8217;s next-generation RNN-T ASR model and Tacotron-based text-to-speech system.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-e09afcc\" data-id=\"e09afcc\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-610e878 elementor-widget elementor-widget-image\" data-id=\"610e878\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"421\" height=\"424\" src=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/08\/Yu-Zhang.jpg\" class=\"attachment-large size-large wp-image-745\" alt=\"\" srcset=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/08\/Yu-Zhang.jpg 421w, https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/08\/Yu-Zhang-298x300.jpg 298w, https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/08\/Yu-Zhang-150x150.jpg 150w\" sizes=\"(max-width: 421px) 100vw, 421px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-9e938eb elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"9e938eb\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element 
elementor-element-3a91001\" data-id=\"3a91001\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-98ba198 elementor-widget elementor-widget-text-editor\" data-id=\"98ba198\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Bo Li received his Ph.D. degree in computer science from the School of Computing, National University of Singapore, in 2014 and his B.E. degree in computer engineering from the School of Computer, Northwestern Polytechnical University, China, in 2008. He is currently a staff research scientist at Google. His research interests are mainly in massively multilingual end-to-end automatic speech recognition using semi-supervised learning, lifelong learning and transfer learning.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-98a28f7\" data-id=\"98a28f7\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-0134841 elementor-widget elementor-widget-image\" data-id=\"0134841\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"274\" height=\"274\" src=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/08\/Bo-Li-.png\" class=\"attachment-large size-large wp-image-753\" alt=\"\" srcset=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/08\/Bo-Li-.png 274w, 
https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/08\/Bo-Li--150x150.png 150w\" sizes=\"(max-width: 274px) 100vw, 274px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-f3c9132 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"f3c9132\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-9c1bc80\" data-id=\"9c1bc80\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-e764c1a elementor-widget elementor-widget-text-editor\" data-id=\"e764c1a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Daniel Park is a research scientist at Google Brain, where he started working as an AI resident in 2018. His research interests include automatic speech recognition, semi-supervised learning, neural architecture search, multimodal learning and audio generation. 
Daniel received his PhD in Physics from MIT in 2012 and has held postdoctoral positions for high energy theoretical physics research at Stony Brook University and Rutgers University.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-93e0246\" data-id=\"93e0246\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-956d4ed elementor-widget elementor-widget-image\" data-id=\"956d4ed\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"274\" height=\"284\" src=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/08\/Daniel-Park.jpg\" class=\"attachment-large size-large wp-image-754\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-a819c5c elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"a819c5c\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-9b2008f\" data-id=\"9b2008f\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-7a02050 elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"7a02050\" 
data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-ce3ea5e elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"ce3ea5e\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-f487620\" data-id=\"f487620\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-6bf07bd elementor-widget elementor-widget-heading\" data-id=\"6bf07bd\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Tutorial 2: TorchAudio Tutorial<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-5ea33d1 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"5ea33d1\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-81da05f\" data-id=\"81da05f\" 
data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-958a1de elementor-widget elementor-widget-heading\" data-id=\"958a1de\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Presenters: Xiaohui Zhang, Zhaoheng Ni, Jeff Hwang, Caroline Chen, Meta<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-ff9aade elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"ff9aade\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-6b123e3\" data-id=\"6b123e3\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-ccb5449 elementor-widget elementor-widget-text-editor\" data-id=\"ccb5449\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>This session will give an overview of the TorchAudio library and three tutorials covering advanced usages of its components. 
In the <strong>Source Separation and Speech Enhancement<\/strong> tutorial, we will demonstrate how to 1) perform speech separation using a ConvTasNet model trained on the Libri2Mix dataset, 2) perform music source separation using a Hybrid Demucs model trained on the MUSDB18-HQ dataset, and 3) build and run an end-to-end training pipeline for a multi-channel speech enhancement model. In <strong>Streaming Automatic Speech Recognition (ASR)<\/strong>, we will walk participants through loading speech streams, applying transforms, extracting features, and passing features to a pre-trained streaming ASR model to generate real-time transcriptions. Finally, in <strong>Self-Supervised Learning (SSL) Pipeline<\/strong>, we will demonstrate TorchAudio\u2019s SSL support by showcasing pre-trained SSL models (wav2vec 2.0, HuBERT, VoxPopuli), end-to-end training recipes (HuBERT pre-training and fine-tuning), downstream datasets used in the SUPERB benchmark, and a highly efficient CTC decoder for evaluating fine-tuned SSL models.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-48b90d0 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"48b90d0\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-66 elementor-top-column elementor-element elementor-element-b5628ca\" data-id=\"b5628ca\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-bf45378 elementor-widget elementor-widget-text-editor\" data-id=\"bf45378\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Xiaohui Zhang is currently a research scientist in the PyTorch Audio team of Meta. He obtained his PhD in Electrical Engineering and a Master\u2019s degree in Applied Math and Stats from the Center for Language and Speech Processing (CLSP) at Johns Hopkins University (JHU), supervised by Dan Povey and Sanjeev Khudanpur, and then joined Meta as a research scientist in 2018. His contributions to the ASR community span acoustic modeling, discriminative training, optimization, pronunciation learning, and OOV recovery, among other areas. He was one of the main contributors to Kaldi.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-5c0695d\" data-id=\"5c0695d\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-cc35041 elementor-widget elementor-widget-image\" data-id=\"cc35041\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"201\" height=\"220\" src=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/12\/xiaohui-zhang.png\" class=\"attachment-large size-large wp-image-1800\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-71afd22 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"71afd22\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div 
class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-78d5b9f\" data-id=\"78d5b9f\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-b9028ad elementor-widget elementor-widget-text-editor\" data-id=\"b9028ad\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Zhaoheng Ni is currently a research scientist in the PyTorch Audio team of Meta. He graduated from the City University of New York, where he was supervised by Michael Mandel, and then joined Meta AI as a research scientist in 2021. His research interests are single-channel and multi-channel speech enhancement, speech separation, and robust ASR.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-065b836\" data-id=\"065b836\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-29038e4 elementor-widget elementor-widget-image\" data-id=\"29038e4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"265\" height=\"273\" src=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/12\/zhaoheng-ni.png\" class=\"attachment-large size-large wp-image-1801\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section 
class=\"elementor-section elementor-top-section elementor-element elementor-element-dddc0f6 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"dddc0f6\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-66 elementor-top-column elementor-element elementor-element-4324ccc\" data-id=\"4324ccc\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-607b14e elementor-widget elementor-widget-text-editor\" data-id=\"607b14e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Caroline Chen is currently a software engineer in the PyTorch Audio team of Meta. She graduated from MIT with a Bachelor\u2019s degree in Computer Science and Engineering, and joined Meta AI in 2021.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-60655f5\" data-id=\"60655f5\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-298a25a elementor-widget elementor-widget-image\" data-id=\"298a25a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"215\" height=\"230\" src=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/12\/caroline-chen.png\" class=\"attachment-large size-large wp-image-1802\" 
alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-efb24c5 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"efb24c5\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-66 elementor-top-column elementor-element elementor-element-4543f7b\" data-id=\"4543f7b\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-979672d elementor-widget elementor-widget-text-editor\" data-id=\"979672d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Jeff Hwang is an engineer on Meta\u2019s PyTorch Audio team.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-33 elementor-top-column elementor-element elementor-element-7ec4053\" data-id=\"7ec4053\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-0547f8d elementor-widget elementor-widget-image\" data-id=\"0547f8d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"248\" height=\"284\" src=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/12\/jeff-hwang.png\" class=\"attachment-large size-large 
wp-image-1803\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-ab2f0f8 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"ab2f0f8\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-4c2c6da\" data-id=\"4c2c6da\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-51c6cd4 elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"51c6cd4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-53e13e8 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"53e13e8\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-f428b07\" data-id=\"f428b07\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div 
class=\"elementor-element elementor-element-799567e elementor-widget elementor-widget-heading\" data-id=\"799567e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Tutorial 3: Towards Solving Cocktail Party Problem with Artificial Intelligence<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-6f448d2 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6f448d2\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-739e942\" data-id=\"739e942\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-2d6c65c elementor-widget elementor-widget-heading\" data-id=\"2d6c65c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Presenter: Dr. 
Chenglin Xu, Kuaishou Technology<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-4a3e928 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"4a3e928\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-676d9b8\" data-id=\"676d9b8\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-8df408d elementor-widget elementor-widget-text-editor\" data-id=\"8df408d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Humans have a remarkable ability to focus on a single attended voice at a cocktail party. How to give machines the same ability has been studied for decades. With the recent artificial intelligence revolution, speech separation and extraction techniques have achieved breakthroughs towards solving the cocktail party problem. Solving it would make many speech tasks in human communication and human-machine interaction feasible in a cocktail party environment.<\/p>\n<p>This tutorial will cover speech separation and extraction from the basic concepts up to recent developments. Blind source separation methods, which mimic the human bottom-up process, will be summarized first. Speaker extraction techniques, which mimic the human top-down process, are then introduced to relax some of the limitations of blind source separation. 
Audio and visual cues used as references to assist the extraction process will then be studied. Downstream applications in a cocktail party environment will be further reviewed. Finally, challenges and opportunities will be discussed.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-e7b3cab elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"e7b3cab\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-a9d8190\" data-id=\"a9d8190\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-3d26373 elementor-widget elementor-widget-text-editor\" data-id=\"3d26373\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Chenglin Xu is currently with the Audio and Video Technology Group at Kuaishou Technology, China. Dr. Xu received his B.Sc. and M.Sc. from Northwestern Polytechnical University, China, in 2012 and 2015, and his PhD degree from Nanyang Technological University, Singapore, in 2020. He then worked at the National University of Singapore (2020-2021) as a research fellow. His research interests include source separation, speaker extraction, speech enhancement, speaker verification, speech recognition and deep learning. He has published over 40 papers in prestigious journals and conferences, including IEEE TPAMI, IEEE\/ACM TASLP, Neural Networks, ICASSP and INTERSPEECH. 
He served as an Area Chair and Technical Program Chair in O-COCOSDA 2020 and 2021.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-9169fc9\" data-id=\"9169fc9\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-bbab49e elementor-widget elementor-widget-image\" data-id=\"bbab49e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"308\" height=\"421\" src=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/08\/Chenglin-Xu-.jpg\" class=\"attachment-large size-large wp-image-805\" alt=\"\" srcset=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/08\/Chenglin-Xu-.jpg 308w, https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/08\/Chenglin-Xu--219x300.jpg 219w\" sizes=\"(max-width: 308px) 100vw, 308px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-bc2d955 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"bc2d955\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-7ba3995\" data-id=\"7ba3995\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap 
elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-533e9e3 elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"533e9e3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-79ed3f5 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"79ed3f5\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-e74be70\" data-id=\"e74be70\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-06774b6 elementor-widget elementor-widget-heading\" data-id=\"06774b6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\"><p style=\"font-weight: 400;white-space: normal\"><span style=\"color: var( --e-global-color-primary );font-family: var( --e-global-typography-primary-font-family ), Sans-serif;font-size: 18px;font-weight: var( --e-global-typography-primary-font-weight );white-space: pre-wrap\">Tutorial 4: Quantum Machine Learning for Speech Processing: from Theoretical Foundations to Practices<\/span><span style=\"color: var( --e-global-color-primary 
);font-family: var( --e-global-typography-primary-font-family ), Sans-serif;font-size: 18px;font-weight: var( --e-global-typography-primary-font-weight );white-space: pre-wrap\"><\/span><\/p><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8ec8333 elementor-widget elementor-widget-heading\" data-id=\"8ec8333\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Presenters: Prof. Jun Qi, Fudan University, Shanghai, China; Huck Yang, Ph.D. candidate, Georgia Institute of Technology, Atlanta, GA, USA\n\n<div><br><\/div><\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-9eaacbe elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"9eaacbe\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-431c48d\" data-id=\"431c48d\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-e674e82 elementor-widget elementor-widget-text-editor\" data-id=\"e674e82\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>State-of-the-art machine learning (ML), particularly based on deep neural networks (DNN), has enabled a wide spectrum of successful applications ranging from the everyday deployment of speech recognition and computer vision to the 
frontier of scientific research in synthetic biology. Despite rapid theoretical and empirical progress in DNN-based regression and classification, DNN training algorithms are computationally expensive, even beyond the physical limits of classical hardware. The imminent advent of quantum computing devices opens up new possibilities for exploiting quantum machine learning (QML) to improve the computational efficiency of ML algorithms in new domains. In particular, advances in quantum hardware enable QML algorithms to run on noisy intermediate-scale quantum (NISQ) devices. Furthermore, we could employ hybrid quantum-classical models that rely on optimizing parametric quantum circuits, which are resilient to quantum noise errors and admit many practical QML implementations on NISQ devices. In this tutorial, we discuss how to set up quantum neural networks and put forth the related applications in speech and acoustics processing. The tutorial includes sections on an introduction to quantum machine learning, the optimization of quantum neural networks, and the use of variational quantum circuits for speech and acoustic processing.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-e3f3646 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"e3f3646\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-bf5f9db\" data-id=\"bf5f9db\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-fde38f5 elementor-widget elementor-widget-text-editor\" 
data-id=\"fde38f5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p><span style=\"color: #000000;\">Dr. Jun Qi is now an Assistant Professor in Electronic Engineering at Fudan University, Shanghai, China. He received his Ph.D. from the School of Electrical and Computer Engineering at Georgia Institute of Technology, Atlanta, GA, in 2022, advised by Prof. Chin-Hui Lee and Prof. Xiaoli Ma. Previously, he obtained two Master's degrees in Electrical Engineering from the University of Washington, Seattle, and Tsinghua University, Beijing, in 2013 and 2017, respectively. He was also a research intern in the Deep Learning Technology Center at Microsoft Research, Redmond, WA, Tencent AI Lab, WA, and MERL, MA, USA. Dr. Qi was the recipient of the 1st prize in the Xanadu AI Quantum Machine Learning Competition 2019, and his ICASSP paper on quantum speech recognition was nominated as a best paper candidate in 2022. 
He also gave two tutorials on Quantum Neural Networks for Speech and Language Processing at IJCAI\u201921 and ICASSP\u201922.<\/span><\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-d381a90\" data-id=\"d381a90\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-cc8d0d4 elementor-widget elementor-widget-image\" data-id=\"cc8d0d4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"418\" height=\"415\" src=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/09\/jun-qi.png\" class=\"attachment-large size-large wp-image-819\" alt=\"\" srcset=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/09\/jun-qi.png 418w, https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/09\/jun-qi-300x298.png 300w, https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/09\/jun-qi-150x150.png 150w\" sizes=\"(max-width: 418px) 100vw, 418px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-59f8109 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"59f8109\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element 
elementor-element-88cff32\" data-id=\"88cff32\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-b8eb777 elementor-widget elementor-widget-text-editor\" data-id=\"b8eb777\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Huck Yang is a final-year Ph.D. candidate working on robust and privacy-preserving speech recognition and sequence modeling, advised by Prof. Chin-Hui Lee at Georgia Tech, GA, USA. He received his B.Sc. from National Taiwan University in 2016. He has worked on large-scale ASR-LM with Dr. Ivan Bulyko and Dr. Andreas Stolcke at Amazon Alexa AI, WA, USA, in 2020 and 2021; multilingual speech recognition at Google Research, CA, USA, with Dr. Bo Li and Dr. Yu Zhang in 2022. He received the Judges\u2019 award at DCASE 2021, the best reproducible system award at DCASE 2020, the Xanadu AI Quantum ML Research award 1st Place in 2019, the EPFL summer research fellowship in 2018, and Wallace H. 
Coulter Fellowship in 2017.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-a6edb2a\" data-id=\"a6edb2a\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-a59eaf5 elementor-widget elementor-widget-image\" data-id=\"a59eaf5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"418\" height=\"411\" src=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/09\/huck-yang.jpg\" class=\"attachment-large size-large wp-image-818\" alt=\"\" srcset=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/09\/huck-yang.jpg 418w, https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/09\/huck-yang-300x295.jpg 300w\" sizes=\"(max-width: 418px) 100vw, 418px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-67cf384 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"67cf384\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-5ef4084\" data-id=\"5ef4084\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element 
elementor-element-d3e04a4 elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"d3e04a4\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-97beb29 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"97beb29\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-dfea61b\" data-id=\"dfea61b\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-ed6ec53 elementor-widget elementor-widget-heading\" data-id=\"ed6ec53\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Tutorial 5: Recent Advances on Automatic Dialogue Evaluation<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-6511936 elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"6511936\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container 
elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-830bd77\" data-id=\"830bd77\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-9e85875 elementor-widget elementor-widget-heading\" data-id=\"9e85875\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h3 class=\"elementor-heading-title elementor-size-default\">Presenters: Luis Fernando D'Haro, Universidad Polit\u00e9cnica de Madrid; Chen Zhang, National University of Singapore<\/h3>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-5656f3c elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"5656f3c\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-100 elementor-top-column elementor-element elementor-element-81641a9\" data-id=\"81641a9\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-6a5e91c elementor-widget elementor-widget-text-editor\" data-id=\"6a5e91c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>In recent years, dialogue systems have attracted significant interest from both academia and industry. 
Open-domain dialogue systems, a.k.a. chatbots, have gained especially great momentum. Since 2016, data-driven generative models have become popular in open-domain dialogue systems research. Assessing the performance of such models involves extensive human evaluation, which is both time- and cost-intensive. Hence, during the model development phase, researchers and practitioners must rely on automatic evaluation metrics. Yet a long-standing challenge is the lack of meaningful metrics that correlate well with human evaluation. Over the past three years, there has been considerable progress towards meaningful automatic evaluation metrics for dialogue. Taxonomies of dialogue evaluation have been defined, a growing number of standard benchmarks for meta-evaluation of the metrics have been created, and various meaningful, reference-free, model-based metrics have been proposed.<\/p>\n<p>Our tutorial covers recent advances in automatic dialogue evaluation research, focusing on developments in the field from 2016 to 2022. 
We will discuss (1) common NLG metrics that are used in dialogue evaluation and the problems associated with them; (2) taxonomy of dialogue evaluation; (3) the newly established dialogue evaluation benchmarks and metrics; (4) future research directions.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-c5e724c elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"c5e724c\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-4f6002e\" data-id=\"4f6002e\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-621b5bf elementor-widget elementor-widget-text-editor\" data-id=\"621b5bf\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Luis Fernando D\u2019Haro is an Electronic Engineer (2000) from Universidad Aut\u00f3noma de Occidente (Colombia) and holds a PhD from Universidad Polit\u00e9cnica de Madrid (2009). During his PhD, he was a Visiting Researcher at the I6 HLT-PR Group in Aachen, Germany (2005) and AT&amp;T Research labs in NJ, USA (2006). Later he completed a postdoctoral research stay at the Speech Processing Group in Brno, Czech Republic (2011). From 2014 to 2018 he worked at I2R, A*STAR in Singapore. 
Since 2018 he has been an Associate Professor at Universidad Polit\u00e9cnica de Madrid (Spain), and he is currently a member of the Speech Technology and Machine Learning group.<\/p>\n<p>He has participated in more than 40 research projects (2 European, 18 national, 9 private, 12 institutional), has authored more than 23 top journal papers and more than 140 international conference papers, and is editor of 2 books with Springer and invited editor for 2 special issues at Computer Speech and Language and IEEE-ACM TASLP. He is a regular reviewer for more than 5 top journals and 10 top conferences (including area chair at ACL2020), and for national research programs such as PRELUDIUM (Poland) and RGC (Hong Kong). He co-organized DSTC challenges in 2015, 2016, 2021 and 2022, JSALT2020 (organized by Johns Hopkins University), WoChat2016-2018 and DBDC4-5, which have the common goal of advancing dialogue systems and their automatic evaluation. He also helped to organize Interspeech2014, HAI2016 (where he participated as presenter for the tutorial: \u201cNatural Language in Human-Robot Interaction\u201d) and IWSDS2018, and was general chair for IWSDS2020, which was held in Madrid with more than 180 registered participants. In 2021, he was the faculty advisor for the Spanish team Genuine2 at the Alexa Socialbot Grand Challenge (SGC4).<\/p>\n<p>His current research focuses on spoken dialogue and NLP systems. 
This includes automatic evaluation and controlled multimodal and multilingual generation for open-domain dialogue systems, language and speaker recognition, as well as automatic evaluation for machine translation.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-0a1bc38\" data-id=\"0a1bc38\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-b425286 elementor-widget elementor-widget-image\" data-id=\"b425286\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"220\" height=\"257\" src=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/08\/Luis-Fernando-DHaro-e1671802730950.jpg\" class=\"attachment-large size-large wp-image-809\" alt=\"\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<section class=\"elementor-section elementor-top-section elementor-element elementor-element-f833c1e elementor-section-boxed elementor-section-height-default elementor-section-height-default\" data-id=\"f833c1e\" data-element_type=\"section\" data-e-type=\"section\">\n\t\t\t\t\t\t<div class=\"elementor-container elementor-column-gap-default\">\n\t\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-2b6586b\" data-id=\"2b6586b\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-e87e62d elementor-widget elementor-widget-text-editor\" 
data-id=\"e87e62d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Chen Zhang is a final-year PhD candidate in the Electrical &amp; Computer Engineering (ECE) department of the National University of Singapore (NUS). He is also associated with Robert Bosch (SEA) under the NUS-Bosch Industrial PhD Programme. His main research interests include dialogue systems, especially automatic dialogue evaluation and open-domain dialogue generation. His work on \u201cInvestigating the Impact of Pre-trained Language Models on Dialog Evaluation\u201d received the best paper award at IWSDS-2021. He was one of the main organizers of the DSTC10 track 5 challenge on \u201cAutomatic Evaluation and Moderation of Open-domain Dialogue Systems\u201d. Currently, he is part of the organizing committee of DSTC11.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t<div class=\"elementor-column elementor-col-50 elementor-top-column elementor-element elementor-element-5a87aa8\" data-id=\"5a87aa8\" data-element_type=\"column\" data-e-type=\"column\">\n\t\t\t<div class=\"elementor-widget-wrap elementor-element-populated\">\n\t\t\t\t\t\t<div class=\"elementor-element elementor-element-1176e5f elementor-widget elementor-widget-image\" data-id=\"1176e5f\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"234\" height=\"234\" src=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/08\/Chen-Zhang.jpg\" class=\"attachment-large size-large wp-image-795\" alt=\"\" srcset=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/08\/Chen-Zhang.jpg 234w, 
https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-content\/uploads\/2022\/08\/Chen-Zhang-150x150.jpg 150w\" sizes=\"(max-width: 234px) 100vw, 234px\" \/>\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t<\/section>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Tutorial 1: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Speech Processing Presenters: Yu Zhang, Bo Li, Daniel Park, Google Semi-supervised learning (SSL), which uses unlabeled data to enhance the performance of labeled tasks, has recently played a crucial part&#8230;<br \/><a class=\"read-more-button\" href=\"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/tutorials\/\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-626","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-json\/wp\/v2\/pages\/626","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-json\/wp\/v2\/comments?post=626"}],"version-history":[{"count":118,"href":"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-json\/wp\/v2\/pages\/626\/revisions"}],"predecessor-version":[{"id":1819,"href":"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-json\/wp\/v2\/pages\/626\/revisions\/1819"}],"wp:attachment":[{"href":"https:\/\/www.colips.org\/conferences\/iscslp2022\/web\/wp-json\/
wp\/v2\/media?parent=626"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}