Keynote Speech 1

Title: Generative Adversarial Networks (GANs) for Speech Technology (ISCA Distinguished Lecture)

Speaker: Prof. Hemant A. Patil, DA-IICT, India.


Adversarial training, in the form of Generative Adversarial Networks (GANs), is among the most interesting and technologically challenging ideas in the field of machine learning (pioneered by I. J. Goodfellow in 2014). A GAN is a recent framework for estimating generative models via an adversarial training mechanism in which we simultaneously train two models, namely, a generator G that captures the (true) data distribution and a discriminator D that estimates the probability that a sample came from the training data rather than from G. The training procedure for G (which is challenging w.r.t. convergence, mode collapse, etc.) is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game (such as a thief-and-police game). In the function space of arbitrary differentiable functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to ½ everywhere (that is, D is fooled by the generator). When G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation.
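
For readers who want the two-player game described above in symbols, the following is a short sketch of the standard minimax objective from the original 2014 GAN formulation. The notation is an assumption added here and does not appear in the abstract: p_data denotes the training data distribution, p_z a noise prior fed to G, p_g the distribution of samples produced by G, and V the value function.

```latex
% Minimax value function: D maximizes V, G minimizes it.
% Assumed notation: p_data (data distribution), p_z (noise prior), p_g (generator distribution).
\min_{G} \max_{D} V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\left[\log\bigl(1 - D(G(z))\bigr)\right]

% For a fixed generator G, the discriminator that maximizes V is
D^{*}_{G}(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_{g}(x)}

% At the solution p_g = p_data, this gives D^{*}_{G}(x) = 1/2 for every x.
```

The last line is the "D equal to ½ everywhere" statement: once the generator matches the data distribution, the discriminator can do no better than guessing.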

GANs are widely used in various applications, first in image processing and computer vision and more recently in speech. Examples include image (sample) generation, single-image super-resolution, text-to-image synthesis, and several speech technology applications (mostly after 2017), such as voice conversion, Non-Audible Murmur (NAM)-to-whisper conversion, whisper-to-normal conversion, voice imitation, speech enhancement, and Text-to-Speech (TTS) synthesis, as well as very recent applications to speaker recognition, natural language generation, data augmentation (for Automatic Speech Recognition (ASR) and low-resource languages), and domain adaptation. The objective of this talk is first to cover the fundamentals of GANs: motivation, applications, and various GAN architectures, along with future research directions. The talk will present a case study on how GANs could be useful for improving the performance of cross-lingual speaker recognition for Indian and other Asian languages. Finally, the talk will bring out several open research problems (the relationship with variational autoencoders (VAEs) and their asymptotic consistency, and the convergence of GANs) that need immediate attention to fully realize the potential of GANs in several technological applications.

This talk focuses on the application of this technology to Asian language processing.


Hemant A. Patil received the Ph.D. degree from the Indian Institute of Technology (IIT), Kharagpur, India, in July 2006. Since 2007, he has been a faculty member at DA-IICT Gandhinagar, India, where he developed the Speech Research Lab, which is recognized as one of the ISCA speech labs. Dr. Patil is a member of ISCA, IEEE, IEEE Signal Processing Society, IEEE Circuits and Systems Society, EURASIP, and APSIPA, and an affiliate member of IEEE SLTC. He is a regular reviewer for ICASSP and INTERSPEECH, and for the journals Speech Communication (Elsevier), Computer Speech and Language (Elsevier), Int. J. Speech Tech (Springer), and Circuits, Systems and Signal Processing (Springer). He has more than 250 research publications in national and international conferences, journals, and book chapters. He visited the Department of ECE, University of Minnesota, Minneapolis, USA (May-July, 2009) as a short-term scholar. He has been associated (as PI) with three MeitY-sponsored projects in ASR, TTS, and QbESTD. He was co-PI for the DST-sponsored project on India-Digital Heritage (IDH)-Hampi. His research interests include speech and speaker recognition, analysis of spoofing attacks, TTS, and infant cry analysis. He received the DST Fast Track Award for Young Scientists for infant cry analysis. He has coedited four books with Dr. Amy Neustein (EIC, IJST Springer): Forensic Speaker Recognition (Springer, 2011), Signal and Acoustic Modeling for Speech and Communication Disorders (DE GRUYTER, 2018), Voice Technologies for Speech Reconstruction and Enhancement (DE GRUYTER, 2020), and Acoustic Analysis of Pathologies from Infant to Young Adulthood (DE GRUYTER, 2020).

Dr. Patil has taken a lead role in organizing several ISCA-supported events, such as summer/winter schools and CEP workshops (on themes such as speaker and language recognition, speech source modeling, text-to-speech synthesis, the speech production-perception link, and advances in speech processing) and progress review meetings for two MeitY consortia projects, all at DA-IICT Gandhinagar. Dr. Patil has supervised 5 doctoral and 42 M.Tech. theses (all in the speech processing area). Presently, he is supervising 3 doctoral and 3 master's students. He offered a joint tutorial with Prof. Haizhou Li during the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2017 and INTERSPEECH 2018. He offered a joint tutorial with H. Kawahara on the topic "Voice Conversion: Challenges and Opportunities" during APSIPA ASC 2018, Honolulu, USA. He was selected as an APSIPA Distinguished Lecturer (DL) for 2018-2019 and has delivered 20 APSIPA DLs in four countries, namely, India, Singapore, China, and Canada. He was recently selected as an ISCA Distinguished Lecturer (DL) for 2020-2021 and has delivered 5 ISCA DLs in India. He has also been invited to deliver an ISCA DL during the overview session of APSIPA ASC 2020, New Zealand, Dec. 7-10, 2020.

Homepage of Prof. Hemant A. Patil:

Speech Research Lab @ DA-IICT Gandhinagar:

Keynote Speech 2

Title: Malay NLP: Current Trends, Challenges and Future Directions

Speaker: Nazlia Omar, Universiti Kebangsaan Malaysia (UKM), Malaysia


Malay is one of the Southeast Asian languages and is spoken by around 290 million native speakers. Malay NLP has recently gained much attention from both industry and academia as a means of representing and analysing the language computationally. The field has made significant progress even though Malay is a low-resource language. The talk will highlight some of the research efforts under way in Malay NLP, the challenges in dealing with both formal and informal language, and future directions in this area.


Nazlia Omar is currently an Associate Professor at the Center for AI Technology (CAIT), Faculty of Information Science and Technology (FTSM), Universiti Kebangsaan Malaysia (UKM). She holds a PhD from the University of Ulster, UK. Her main research interests are in Natural Language Processing and Computational Linguistics. She is a member of the Asian Language Processing Lab (ASLAN) at CAIT, FTSM, UKM.