We are honored to welcome the following keynote speakers to present at the conference:

Plenary Speaker

Ahmad Khalil bin Mohamad Nooh (Khalil Nooh)

Short Biography:

Khalil Nooh has over 22 years of professional experience in the tech industry, with projects in Malaysia, Indonesia, the Philippines and Ukraine. In 2016, he founded his first tech startup, Dropshipking.co, which earned him a place in the Stanford University Go2Market program at MaGIC Cyberjaya.

Today, he is the Co-Founder and CEO of Mesolitica, a deep-tech startup that trains narrow AIs to understand Bahasa Melayu. Currently, Mesolitica is developing a business chatbot called “Nous” (pronounced “nows”) powered by MaLLaM 🌙 (Malaysia Large Language Model), providing conversational AI capabilities for SMEs and enterprises through task-oriented chatbots, voice assistants and digital avatars, in preparation for the Metaverse.

Mesolitica is working closely with both NVIDIA and AWS to bring localized Large Language Models to enterprise production systems in Malaysia.

Presentation Title:

Mesolitica’s Journey in Developing Malaysia’s Large Language Model (MaLLaM 🌙)

Abstract:

This presentation explores the groundbreaking journey of Mesolitica, a Malaysian AI startup, in developing MaLLaM 🌙, Malaysia’s first culturally relevant Large Language Model (LLM). Originating from Malaya, an open-source natural language toolkit for Bahasa Melayu, Mesolitica has evolved into a leading player in local AI innovation, leveraging open-source technologies in the Large Language Model space and a partnership with Amazon Web Services (AWS). Mesolitica’s continued engagement with the Ministry of Digital Malaysia underscores its commitment to AI sovereignty, focusing on data privacy, technological independence, and ethical AI adoption. The presentation will also discuss Mesolitica’s open-source contributions, including models and datasets hosted on Hugging Face, and its vision for empowering Malaysia’s AI ecosystem through talent development and applied research. It aims to inspire academic audiences by showcasing the transformative potential of localized AI models in fostering innovation, cultural relevance, and technological independence in emerging markets.
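
For readers who want to try these open-source contributions, the sketch below shows how a model published under the Mesolitica organization could be loaded with the Hugging Face transformers library. This is a minimal illustration: the repository ID is a hypothetical placeholder, not an actual model name; the real models and datasets are listed on Mesolitica’s Hugging Face page.

```python
# Minimal sketch, assuming the Hugging Face `transformers` library is installed.
# The repository ID below is a hypothetical placeholder, not a published model;
# see Mesolitica's Hugging Face organization page for the actual releases.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mesolitica/example-bahasa-llm"  # hypothetical placeholder ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short Bahasa Melayu completion.
inputs = tokenizer("Apa khabar?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```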

Keynote Speaker

Professor Wu Zhizheng

Short Biography:

Professor Wu Zhizheng is currently an Associate Professor at The Chinese University of Hong Kong, Shenzhen, where he also serves as Deputy Director of the Shenzhen Key Laboratory of Cross-Modal Cognitive Computing. He has been consistently named to Stanford University’s “World’s Top 2% Scientists” list and has received multiple Best Paper Awards. He earned his Ph.D. from Nanyang Technological University and has held research and leadership roles at internationally renowned institutions, including Meta (formerly Facebook), Apple, the University of Edinburgh, and Microsoft Research Asia. Professor Wu has initiated several influential open-source projects, such as Merlin, Amphion, and Emilia, which have been adopted by over 700 organizations worldwide, including OpenAI. Notably, Amphion has topped GitHub’s trending list multiple times, while Emilia has become the most liked audio dataset on Hugging Face. He also initiated and organized the first ASVspoof Challenge and the first Voice Conversion Challenge, and served as the organizer of the Blizzard Challenge 2019, a prestigious international speech synthesis competition. Currently, Professor Wu serves on the editorial boards of IEEE/ACM Transactions on Audio, Speech, and Language Processing and IEEE Signal Processing Letters, and is the General Chair of the IEEE Spoken Language Technology Workshop 2024.

Presentation Title:

Reinforcement Learning for Text-to-Speech Synthesis

Abstract:

Reinforcement Learning (RL) is emerging as a powerful paradigm for refining Text-to-Speech (TTS) systems, moving beyond standard supervised training to fine-tune nuanced qualities like naturalness and expressiveness. While these methods improve general performance, they often fail to address specific, critical flaws. One of the most significant challenges for modern zero-shot TTS is maintaining intelligibility, especially for difficult inputs like tongue twisters, repeated words, cross-lingual prompts, or code-switching. This talk will explore how targeted preference alignment, a form of RL, can solve these issues. We present a case study based on the paper “Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment.” The approach introduces a new Intelligibility Preference Speech Dataset (INTP) and extends the Direct Preference Optimization (DPO) framework for TTS. Alignment with INTP significantly boosts intelligibility and enhances overall speech quality. We also demonstrate weak-to-strong generalization, improving even state-of-the-art models, and show the potential for iterative refinement. This work highlights a strategic shift from general enhancement to precisely correcting flaws, charting a path toward truly robust and reliable TTS.
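
For readers unfamiliar with preference alignment, the sketch below illustrates the standard Direct Preference Optimization (DPO) objective that the described approach extends to TTS. The function name, tensor shapes, and β value are illustrative assumptions for exposition, not the paper’s implementation; in the described setup, the “win”/“lose” pairs would come from the Intelligibility Preference Speech Dataset (INTP).

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_win: torch.Tensor,
             policy_logp_lose: torch.Tensor,
             ref_logp_win: torch.Tensor,
             ref_logp_lose: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss over a batch of intelligibility preference pairs.

    Each tensor holds the summed token log-probability of a synthesized
    speech-token sequence under the trainable policy or the frozen
    reference TTS model; "win" is the intelligible (preferred) sample,
    "lose" the rejected one.
    """
    # How far the policy has moved from the reference on each sample.
    win_logratio = policy_logp_win - ref_logp_win
    lose_logratio = policy_logp_lose - ref_logp_lose
    # Push the preferred sample's log-ratio above the rejected one's.
    return -F.logsigmoid(beta * (win_logratio - lose_logratio)).mean()
```

Training then reduces to scoring each preference pair under the trainable policy and a frozen reference TTS model and minimizing this loss, which raises the likelihood of the intelligible synthesis relative to the rejected one without requiring an explicit reward model.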