ISCA COLIPS I2R


Technical Program

Keynote 1: ISCA Medalist
Monday 15 September 2014 09:30-10:30, Garnet 213-218
     
  Keynote 1 ISCA Medalist
   

Oral Session 1 (Mon-O-1): Multi-lingual ASR
Monday 15 September 2014 11:00-13:00, Garnet 213-218
     
  Mon-O-1-1 Language ID-based training of multilingual stacked bottleneck features
    Yu Zhang, Ekapol Chuangsuwanich and James Glass
  Mon-O-1-2 Kernel Density-based Acoustic Model with Cross-lingual Bottleneck Features for Resource Limited LVCSR
    Van Hai Do, Xiong Xiao, Eng Siong Chng and Haizhou Li
  Mon-O-1-3 Improving ASR Performance On Non-native Speech Using Multilingual and Crosslingual Information
    Ngoc Thang Vu, Yuanfan Wang, Marten Klose, Zlatka Mihaylova and Tanja Schultz
  Mon-O-1-4 Language Independent and Unsupervised Acoustic Models for Speech Recognition and Keyword Spotting
    Kate Knill, Mark Gales, Anton Ragni and Shakti P. Rath
  Mon-O-1-5 Cross-lingual adaptation with multi-task adaptive networks
    Peter Bell, Joris Driesen and Steve Renals
  Mon-O-1-6 On Recognition of Non-Native Speech Using Probabilistic Lexical Model
    Marzieh Razavi and Mathew Magimai Doss

Oral Session 2 (Mon-O-2): Prosody Processing
Monday 15 September 2014 11:00-13:00, Peridot 202-203
     
  Mon-O-2-1 Direct F0 Control of an Electrolarynx based on Statistical Excitation Feature Prediction and its Evaluation through Simulation
    Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti and Satoshi Nakamura
  Mon-O-2-2 A target approximation intonation model for Yorùbá TTS
    Daniel Van Niekerk and Etienne Barnard
  Mon-O-2-3 Learning continuous-valued representations for phrase break prediction
    Anandaswarup Vadapalli and Kishore Prahallad
  Mon-O-2-4 Improving Mandarin Prosodic Boundary Prediction with Rich Syntactic Features
    Hao Che, Jianhua Tao and Ya Li
  Mon-O-2-5 Investigating Automatic & Human Filled Pause Insertion for Synthetic Speech
    Rasmus Dall, Marcus Tomalin, Mirjam Wester, Bill Byrne and Simon King
  Mon-O-2-6 The Effect of Filled Pauses and Speaking Rate on Speech Comprehension in Natural, Vocoded and Synthetic Speech
    Rasmus Dall, Mirjam Wester and Martin Corley

Oral Session 3 (Mon-O-3): Speaker Recognition - Applications
Monday 15 September 2014 11:00-13:00, Peridot 204-205
     
  Mon-O-3-1 Introducing I-Vectors for Joint Anti-spoofing and Speaker Verification
    Elie Khoury, Tomi Kinnunen, Aleksandr Sizov, Zhizheng Wu and Sebastien Marcel
  Mon-O-3-2 Random Projections for Large-Scale Speaker Search
    Ryan Leary and Walter Andrews
  Mon-O-3-3 Analysis of I-Vector framework for Speaker Identification in TV-shows
    Corinne Fredouille and Delphine Charlet
  Mon-O-3-4 Boosting bonsai trees for efficient features combination: application to speaker role identification
    Antoine Laurent, Nathalie Camelin and Christian Raymond
  Mon-O-3-5 Identifying contributors in the BBC World Service Archive
    Yves Raimond and Thomas Nixon
  Mon-O-3-6 Effect of long-term ageing on i-vector speaker verification
    Finnian Kelly, Rahim Saeidi, Naomi Harte and David van Leeuwen

Oral Session 4 (Mon-O-4): Phonetics and Phonology
Monday 15 September 2014 11:00-13:00, Peridot 201
     
  Mon-O-4-1 Acoustic correlates of phonological status
    Maarten Versteegh, Alejandrina Cristia and Amanda Seidl
  Mon-O-4-2 Parameterization of the glottal source with the phase plane plot
    Manu Airaksinen and Paavo Alku
  Mon-O-4-3 Transcribing Tone – A likelihood-based quantitative evaluation of Chao’s "tone letters"
    Philip Rose
  Mon-O-4-4 Intonational Phonology and Prosodic Hierarchy in Malay
    Diyana Hamzah and James Sneed German
  Mon-O-4-5 Comparing parameterizations of pitch register and its discontinuities at prosodic boundaries for Hungarian
    Uwe Reichel
  Mon-O-4-6 An Evaluation of Machine Learning Methods for Prominence Detection in French
    George Christodoulides and Mathieu Avanzi

Special Session 1 (Mon-SP1): Open Domain Situated Conversational Interaction
Monday 15 September 2014 11:00-13:00, Peridot 206
     
  Mon-SP1-1 Learning Situated Knowledge Bases through Dialog
    Aasish Pappu and Alexander Rudnicky
  Mon-SP1-2 Crowdsourcing for Situated Dialog Systems in a Moving Car
    Teruhisa Misu
  Mon-SP1-3 Evaluating Coherence in Open Domain Conversational Systems
    Ryuichiro Higashinaka, Toyomi Meguro, Kenji Imamura, Hiroaki Sugiyama, Toshiro Makino and Yoshihiro Matsuo
  Mon-SP1-4 Adapting dependency parsing to spontaneous speech for open domain spoken language understanding
    Frederic Bechet, Alexis Nasr and Benoit Favre
  Mon-SP1-5 Incremental on-line adaptation of POMDP-based dialogue managers to extended domains
    Milica Gasic, Dongho Kim, Pirros Tsiakoulis, Catherine Breslin, Matthew Henderson, Martin Szummer, Blaise Thomson and Steve Young
  Mon-SP1-6 Hypotheses Ranking for Robust Domain Classification And Tracking in Dialogue Systems
    Jean-Philippe Robichaud, Paul Crook, Puyang Xu, Omar Khan and Ruhi Sarikaya

Poster Session 1 (Mon-P-1): Speech Production: Models and Acoustics
Monday 15 September 2014 11:00-13:00, Max Atria Gallery
     
  Mon-P-1-1 Motor control primitives arising from a learned dynamical systems model of speech articulation
    Vikram Ramanarayanan, Louis Goldstein and Shrikanth Narayanan
  Mon-P-1-2 Nonword Repetition of Taiwanese Disyllabic Tonal Sequences in Adults with Language Attrition
    Chiahsin Yeh, Chiung-Yao Wang and Jung-Yueh Tu
  Mon-P-1-3 A Unified Account of Prominence Effects in an Optimization-Based Model of Speech Timing
    Andreas Windmann, Juraj Simko and Petra Wagner
  Mon-P-1-4 Estimation of the movement trajectories of non-crucial articulators based on the detection of crucial moments and physiological constraints
    Jangwon Kim, Sungbok Lee and Shrikanth Narayanan
  Mon-P-1-5 Sparse smoothing of articulatory features from Gaussian mixture model based acoustic-to-articulatory inversion: Benefit to speech recognition
    Prasad Sudhakar and Prasanta Ghosh
  Mon-P-1-6 Contribution of Tongue Lateral to Consonant Production
    Jun Wang, William Katz and Thomas Campbell
  Mon-P-1-7 A Preliminary Study on Acoustic Correlates of Tone2+Tone2 Disyllabic Word Stress in Mandarin
    Min Liu, Shuju Shi and Jinsong Zhang
  Mon-P-1-8 Vowel length impact on locus equation parameters: An investigation on Jordanian Arabic
    Mohammad Abuoudeh and Olivier Crouzet
  Mon-P-1-9 Corpus-testing a fricative discriminator; or, just how invariant is this invariant?
    Philip Roberts, Henning Reetz and Aditi Lahiri
  Mon-P-1-10 Modeling Coarticulation in Continuous Speech
    Brian Bush and Alexander Kain
  Mon-P-1-11 On classification between normal and pathological voices using the MEEI-KayPENTAX database: Issues and consequences
    Khalid Daoudi and Blaise Bertrac
  Mon-P-1-12 Synchronic variation in the articulation and the acoustics of the Polish three-way place distinction in sibilants and its implications for diachronic change
    Véronique Bukmaier, Jonathan Harrington, Ulrich Reubold and Felicitas Kleber

Poster Session 2 (Mon-P-2): Extraction of Para-Linguistic Information
Monday 15 September 2014 11:00-13:00, Max Atria Gallery
     
  Mon-P-2-1 Predicting Client’s inclination towards Target Behavior Change in Motivational Interviewing and investigating the role of laughter
    Rahul Gupta, Panayiotis Georgiou, David Atkins and Shrikanth Narayanan
  Mon-P-2-2 Modeling Therapist Empathy through Prosody in Drug Addiction Counseling
    Bo Xiao, Daniel Bone, Maarten Van Segbroeck, Zac Imel, David Atkins, Panayiotis Georgiou and Shrikanth Narayanan
  Mon-P-2-3 An Investigation of Vocal Arousal Dynamics in Child-Psychologist Interactions using Synchrony Measures and a Conversation-based Model
    Daniel Bone, Chi-Chun Lee, Alexandros Potamianos and Shrikanth Narayanan
  Mon-P-2-4 Speech Emotion Recognition Using Deep Neural Network and Extreme Learning Machine
    Kun Han, Dong Yu and Ivan Tashev
  Mon-P-2-5 An annotation scheme for sighs in spontaneous dialogue
    Khiet Truong, Gerben Westerhof, Franciska de Jong and Dirk Heylen
  Mon-P-2-6 Speaker Idiosyncratic Variability of Intensity across Syllables
    Lei He and Volker Dellwo
  Mon-P-2-7 Building A Naturalistic Emotional Speech Corpus by Retrieving Expressive Behaviors From Existing Speech Corpora
    Soroosh Mariooryad, Reza Lotfian and Carlos Busso
  Mon-P-2-8 Identification of Age-Group from Children's Speech by Computers and Humans
    Saeid Safavi, Martin Russell and Peter Jancovic

Poster Session 3 (Mon-P-3): Spoken Language Understanding
Monday 15 September 2014 11:00-13:00, Max Atria Gallery
     
  Mon-P-3-1 Theme Identification in Human-Human Conversations with Features from Specific Speaker Type Hidden Spaces
    Mohamed Morchid, Richard Dufour, Mohamed Bouallegue, Georges Linares and Renato de Mori
  Mon-P-3-2 Learning Phrase Patterns for Text Classification Using a Knowledge Graph and Unlabeled Data
    Alex Marin, Roman Holenstein, Ruhi Sarikaya and Mari Ostendorf
  Mon-P-3-3 Targeted Feature Dropout for Robust Slot Filling in Natural Language Understanding
    Puyang Xu and Ruhi Sarikaya
  Mon-P-3-4 Spoken Question Answering Using Tree-structured Conditional Random Fields and Two-layer Random Walk
    Sz-Rung Shiang, Hung-yi Lee and Lin-shan Lee
  Mon-P-3-5 Shrinkage Based Features for Slot Tagging with Conditional Random Fields
    Ruhi Sarikaya, Asli Celikyilmaz, Anoop Deoras and Minwoo Jeong
  Mon-P-3-6 Cluster based Chinese Abbreviation Modeling
    Yangyang Shi, Yi-Cheng Pan and Mei-Yuh Hwang
  Mon-P-3-7 Parsing Named Entity as Syntactic Structure
    Xiantao Zhang, Dongchen Li and Xihong Wu
  Mon-P-3-8 Detecting Out-Of-Domain Utterances Addressed to a Virtual Personal Assistant
    Gokhan Tur, Anoop Deoras and Dilek Hakkani-Tur
  Mon-P-3-9 Fusion of knowledge-based and data-driven approaches to grammar induction
    Spiros Georgiladakis, Christina Unger, Elias Iosif, Sebastian Walter, Philipp Cimiano, Euripides Petrakis and Alexandros Potamianos
  Mon-P-3-10 Improving Named Entity Recognition with Prosodic Features
    Denys Katerenchuk and Andrew Rosenberg
  Mon-P-3-11 Lexical Addressee Detection: A Neural Network Approach
    Suman Ravuri and Andreas Stolcke
  Mon-P-3-12 Manipulating stance and involvement using collaborative tasks: an exploratory comparison
    Valerie Freeman, Julian Chan, Gina-Anne Levow, Richard Wright, Mari Ostendorf and Victoria Zayats

Oral Session 5 (Mon-O-5): Spoken Dialogue Systems
Monday 15 September 2014 14:30-16:30, Garnet 213-218
     
  Mon-O-5-1 Incremental Dialog Processing in a Task-Oriented Dialog
    Fabrizio Ghigi, Maxine Eskenazi, María Inés Torres and Sungjin Lee
  Mon-O-5-2 Detecting Incorrectly-Segmented Utterances for Posteriori Restoration of Turn-Taking and ASR Results
    Naoki Hotta, Kazunori Komatani, Satoshi Sato and Mikio Nakano
  Mon-O-5-3 Segmentation and Disfluency Removal for Conversational Speech Translation
    Hany Hassan, Lee Schwartz, Dilek Hakkani-Tur and Gokhan Tur
  Mon-O-5-4 Cost-level integration of statistical and rule-based dialog managers
    Shinji Watanabe, John Hershey, Tim Marks, Youichi Fujii and Yusuke Koji
  Mon-O-5-5 Inverse Reinforcement Learning for Micro-Turn Management
    Dongho Kim, Catherine Breslin, Pirros Tsiakoulis, Milica Gasic, Matthew Henderson and Steve Young
  Mon-O-5-6 Analysing the prosodic characteristics of speech-chunks preceding silences in task-based interactions
    John Kane, Irena Yanushevskaya, Céline de Looze, Brian Vaughan and Ailbhe Ní Chasaide

Oral Session 6 (Mon-O-6): DNN Architectures and Robust Recognition
Monday 15 September 2014 14:30-16:30, Peridot 202-203
     
  Mon-O-6-1 Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling
    Hasim Sak, Andrew Senior and Francoise Beaufays
  Mon-O-6-2 Unfolded Recurrent Neural Networks for Speech Recognition
    George Saon, Hagen Soltau, Ahmad Emami and Michael Picheny
  Mon-O-6-3 Manifold Regularized Deep Neural Networks
    Vikrant Tomar and Richard Rose
  Mon-O-6-4 Modeling Long Temporal Contexts for Robust DNN-based Speech Recognition
    Bo Li and Khe Chai Sim
  Mon-O-6-5 A long, deep and wide artificial neural net for robust speech recognition in unknown noise
    Feipeng Li, Phani Nidadavolu and Hynek Hermansky
  Mon-O-6-6 Investigation of Deep Neural Networks for Robust Recognition of Nonlinearly Distorted Speech
    Ladislav Seps, Jiri Malek, Petr Cerva and Jan Nouza

Oral Session 7 (Mon-O-7): Speaker Recognition - Evaluation and Forensics
Monday 15 September 2014 14:30-16:30, Peridot 204-205
     
  Mon-O-7-1 Summary and Initial Results of the 2013-2014 Speaker Recognition i-vector Machine Learning Challenge
    Désiré Bansé, George Doddington, Daniel Garcia-Romero, John Godfrey, Craig Greenberg, Alvin Martin, Alan McCree, Mark Przybocki and Douglas Reynolds
  Mon-O-7-2 Constrained speaker linking
    David A. van Leeuwen and Niko Brummer
  Mon-O-7-3 RBM-PLDA subsystem for the NIST i-Vector Challenge
    Sergey Novoselov, Timur Pekhovsky, Konstantin Simonchik and Andrey Shulipa
  Mon-O-7-4 Limited Labels for Unlimited Data: Active Learning for Speaker Recognition
    Stephen Shum, Najim Dehak and Jim Glass
  Mon-O-7-5 Bayesian calibration for forensic evidence reporting
    Niko Brummer and Albert Swart
  Mon-O-7-6 Replicate Mismatch between Test/Background and Development Databases: The Impact on the Performance of Likelihood Ratio-based Forensic Voice Comparison
    Shunichi Ishihara

Oral Session 8 (Mon-O-8): Speech Production I
Monday 15 September 2014 14:30-16:30, Peridot 201
     
  Mon-O-8-1 Automatic estimation of the lip radiation effect in glottal inverse filtering
    Manu Airaksinen, Tom Bäckström and Paavo Alku
  Mon-O-8-2 Simulation of 3D larynges with asymmetric distribution of viscoelastic properties in their vocal folds
    Marcelo Rosa
  Mon-O-8-3 Comparison of vocal tract transfer functions calculated using one-dimensional and three-dimensional acoustic simulation methods
    Hironori Takemoto, Parham Mokhtari and Tatsuya Kitamura
  Mon-O-8-4 A study of invariant properties and variation patterns in the converter/distributor model for emotional speech
    Jangwon Kim, Donna Erickson, Sungbok Lee and Shrikanth Narayanan
  Mon-O-8-5 A hybrid approach to 3D tongue modeling from vocal tract MRI using unsupervised image segmentation and mesh deformation
    Alexander Hewer, Ingmar Steiner and Stefanie Wuhrer
  Mon-O-8-6 Estimation of Vocal-Tract Shape from Speech Spectrum and Speech Resynthesis Based on a Generative Model
    Tokihiko Kaburagi

Special Session 2 (a) (Mon-SP2a): INTERSPEECH 2014 Computational Paralinguistics ChallengE (ComParE) I
Monday 15 September 2014 14:30-16:30, Peridot 206
     
  Mon-SP2a-1 The INTERSPEECH 2014 Computational Paralinguistics Challenge: Cognitive & Physical Load
    Björn Schuller, Stefan Steidl, Anton Batliner, Julien Epps, Florian Eyben, Fabien Ringeval and Erik Marchi
  Mon-SP2a-2 Filtering and Subspace Selection for Spectral Features in Detecting Speech Under Physical Stress
    Jouni Pohjalainen and Paavo Alku
  Mon-SP2a-3 Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens
    Ming Li
  Mon-SP2a-4 Canonical Correlation Analysis and Local Fisher Discriminant Analysis based Multi-View Acoustic Feature Reduction for Physical Load Prediction
    Heysem Kaya, Tuğçe Özkaptan, Albert Ali Salah and Sadık Fikret Gürgen
  Mon-SP2a-5 Ensemble of Machine Learning Algorithms for Cognitive and Physical Speaker Load Detection
    How Jing, Ting-Yao Hu, Hung-Shin Lee, Wei-Chen Chen, Chi-Chun Lee, Yu Tsao and Hsin-Min Wang
  Mon-SP2a-6 Detecting the Intensity of Cognitive and Physical Load Using AdaBoost and Deep Rectifier Neural Networks
    Gábor Gosztolya, Tamás Grósz, Róbert Busa-Fekete and László Tóth

Poster Session 4 (Mon-P-4): Hearing and Perception
Monday 15 September 2014 14:30-16:30, Max Atria Gallery
     
  Mon-P-4-1 Revisiting the right-ear advantage for speech: Implications for speech displays
    Nandini Iyer, Eric Thompson, Brian Simpson and Griffin Romigh
  Mon-P-4-2 Comparing Reaction Time Sequences from Human Participants and Computational Models
    Louis ten Bosch, Mirjam Ernestus and Lou Boves
  Mon-P-4-3 Detecting the number of competing speakers – human selective hearing versus spectrogram distance based estimator
    Valentin Andrei, Horia Cucu, Andi Buzo and Corneliu Burileanu
  Mon-P-4-4 The Influence of Sensory Memory and Attention on the Context Effect in Talker Normalization
    Guo Li and Gang Peng
  Mon-P-4-5 Automatic Speech Recognition with Primarily Temporal Envelope Information
    Payton Lin, Fei Chen, Syu-Siang Wang, Yu Tsao and Ying Hui Lai
  Mon-P-4-6 An Adaptive Envelope Compression Strategy for Speech Processing in Cochlear Implants
    Ying Hui Lai, Fei Chen and Yu Tsao
  Mon-P-4-7 Articulatory Dynamics and Coordination in Classifying Cognitive Change with Preclinical mTBI
    Brian Helfer, Thomas Quatieri, James Williamson, Laurel Keyes, Benjamin Evans, W. Nicholas Greene, Trina Vian, Joseph Lacirignola, Trey Shenk, Thomas Talavage, Jeff Palmer and Kristin Heaton
  Mon-P-4-8 A Hearing Impairment Simulation Method Using Audiogram-based Approximation of Auditory Characteristics
    Nozomi Jinbo, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti and Satoshi Nakamura
  Mon-P-4-9 Investigation of the relative perceptual importance for temporal envelope and temporal fine structure between tonal and non-tonal language
    Dongmei Wang, James M. Kates and John H.L. Hansen
  Mon-P-4-10 Vowel Spectral Contributions to English and Mandarin Sentence Intelligibility
    Daniel Fogerty and Fei Chen
  Mon-P-4-11 Significance of Aperiodicity in the Pitch Perception of Expressive Voices
    Vinay Kumar Mittal and B. Yegnanarayana

Poster Session 5 (Mon-P-5): Cross-linguistic Studies
Monday 15 September 2014 14:30-16:30, Max Atria Gallery
     
  Mon-P-5-1 DIAPIX-FL: A symmetric corpus of problem-solving dialogues in first and second languages
    Mirjam Wester, Maria Luisa Garcia Lecumberri and Martin Cooke
  Mon-P-5-2 Cross-linguistic investigations of oral and silent reading
    Christophe Coupe, Yoon Mi Oh, François Pellegrino and Egidio Marsico
  Mon-P-5-3 Non-native Word Recognition in Noise: The Role of Word-initial and Word-final Information
    Juul Coumans, Roeland van Hout and Odette Scharenborg
  Mon-P-5-5 The Effects of High and Low Variability Phonetic Training on the Perception and Production of English Vowels /e/-/æ/ by Cantonese ESL Learners with High and Low L2 Proficiency Levels
    Janice Wing Sze Wong
  Mon-P-5-6 Dutch vowel production by Spanish learners: duration and spectral features
    Pepi Burgos, Mátyás Jani, Catia Cucchiarini, Roeland van Hout and Helmer Strik
  Mon-P-5-7 English consonant confusions by Greek listeners in quiet and noise and the role of phonological short-term memory
    Angelos Lengeris and Katerina Nicolaidis
  Mon-P-5-8 Corpus-based L2 phonological data and semi-automatic perceptual analysis: the case of nasal vowels produced by beginner Japanese learners of French
    Sylvain Detey, Isabelle Racine, Julien Eychenne and Yuji Kawaguchi
  Mon-P-5-9 Perception of Prosodic Prominence and Boundaries by L1 and L2 Speakers of English
    Gabor Pinter, Shinobu Mizuguchi and Koichi Tateishi
  Mon-P-5-10 Prosody perception, reading accuracy, nonliteral language comprehension, and music and tonal pitch discrimination in school aged children
    Rose Thomas Kalathottukaren, Suzanne C. Purdy and Elaine Ballard
  Mon-P-5-11 Phoneme Category Retuning in a Non-native Language
    Polina Drozdova, Roeland van Hout and Odette Scharenborg
  Mon-P-5-12 Speech Emotion Recognition with Cross-lingual Databases
    Bo-Chang Chiou and Chia-Ping Chen

Poster Session 6 (Mon-P-6): Speaker Diarization
Monday 15 September 2014 14:30-16:30, Max Atria Gallery
     
  Mon-P-6-1 Speaker Diarization using Eye-gaze Information in Multi-party Conversations
    Koji Inoue, Yukoh Wakabayashi, Hiromasa Yoshimoto and Tatsuya Kawahara
  Mon-P-6-2 Unsupervised Speaker Diarization Using Riemannian Manifold Clustering
    Che-Wei Huang, Bo Xiao, Panayiotis Georgiou and Shrikanth Narayanan
  Mon-P-6-3 Towards a complete Binary Key System for the Speaker Diarization Task
    Héctor Delgado, Corinne Fredouille and Javier Serrano
  Mon-P-6-4 An Iterative Speaker Re-Diarization Scheme for Improving Speaker-Based Entity Extraction in Multimedia Archives
    Houman Ghaemmaghami, David Dean and Sridha Sridharan
  Mon-P-6-5 Speaker Diarization Using Gesture and Speech
    Binyam Gebrekidan Gebre, Peter Wittenburg, Sebastian Drude, Marijn Huijbregts and Tom Heskes
  Mon-P-6-6 Is Incremental Cross-Show Speaker Diarization Efficient For Processing Large Volumes of Data?
    Grégor Dupuy, Sylvain Meignier and Yannick Estève
  Mon-P-6-7 Detecting and Labeling Speakers on Overlapping Speech using Vector Taylor Series
    Pranay Dighe, Marc Ferras and Herve Bourlard
  Mon-P-6-8 Phoneme Background Model for Information Bottleneck based Speaker Diarization
    Sree Harsha Yella, Petr Motlicek and Herve Bourlard
  Mon-P-6-9 Diarizing Large Corpora using Multi-modal Speaker Linking
    Marc Ferras, Stefano Masneri, Oliver Schreer and Hervé Bourlard
  Mon-P-6-10 Multimodal understanding for person recognition in video broadcasts
    Frederic Bechet, Meriem Bendris, Delphine Charlet, Geraldine Damnati, Benoit Favre, Mickael Rouvier, Remi Auguste, Benjamin Bigot, Richard Dufour, Corinne Fredouille, Georges Linares, Grégory Senay, Pierre Tirilly and Jean Martinet

Oral Session 9 (Mon-O-9): Robust ASR
Monday 15 September 2014 17:00-19:00, Garnet 213-218
     
  Mon-O-9-1 Comparing Time-Frequency Representations for Directional Derivative Features
    James Gibson, Maarten Van Segbroeck and Shrikanth Narayanan
  Mon-O-9-2 Robust Speech Recognition with Speech Enhanced Deep Neural Networks
    Jun Du, Qing Wang, Tian Gao, Yong Xu, Lirong Dai and Chin-Hui Lee
  Mon-O-9-3 An investigation of likelihood normalization for robust ASR
    Emmanuel Vincent, Aggelos Gkiokas, Dominik Schnitzer and Arthur Flexer
  Mon-O-9-4 Identifying the human-machine differences in complex binaural scenes: What can be learned from our auditory system
    Constantin Spille and Bernd T. Meyer
  Mon-O-9-5 Robust Speech Recognition using Long Short-Term Memory Recurrent Neural Networks for Hybrid Acoustic Modelling
    Jürgen T. Geiger, Zixing Zhang, Felix Weninger, Björn Schuller and Gerhard Rigoll
  Mon-O-9-6 Joint Adaptation and Adaptive Training of TVWR for Robust Automatic Speech Recognition
    Shilin Liu and Khe Chai Sim

Oral Session 10 (Mon-O-10): Implementation of Language Model Algorithms
Monday 15 September 2014 17:00-19:00, Peridot 202-203
     
  Mon-O-10-1 Efficient GPU-based Training of Recurrent Neural Network Language Models Using Spliced Sentence Bunch
    Xie Chen, Yongqiang Wang, Xunying (Andrew) Liu, Mark Gales and Phil Woodland
  Mon-O-10-2 Word Pair Approximation for More Efficient Decoding with High-Order Language Models
    David Nolden, Ralf Schlüter and Hermann Ney
  Mon-O-10-3 Comparing Approaches to Convert Recurrent Neural Networks into Backoff Language Models For Efficient Decoding
    Heike Adel, Katrin Kirchhoff, Ngoc Thang Vu, Dominic Telaar and Tanja Schultz
  Mon-O-10-4 Removing Redundancy from Lattices
    David Nolden, Hagen Soltau, Dan Povey, Pegah Ghahremani, Lidia Mangu and Hermann Ney
  Mon-O-10-5 Lattice Decoding and Rescoring with Long-Span Neural Network Language Models
    Martin Sundermeyer, Zoltán Tüske, Ralf Schlüter and Hermann Ney
  Mon-O-10-6 Word-Phrase-Entity Language Models: Getting More Mileage out of N-grams
    Michael Levit, Sarangarajan Parthasarathy, Shuangyu Chang, Andreas Stolcke and Benoît Dumoulin

Oral Session 11 (Mon-O-11): Speaker Recognition - Noise and Channel Robustness
Monday 15 September 2014 17:00-19:00, Peridot 204-205
     
  Mon-O-11-1 A Novel Boosting Algorithm for Improved i-Vector based Speaker Verification in Noisy Environments
    Sourjya Sarkar and K. Sreenivasa Rao
  Mon-O-11-2 Using Deep Belief Networks for Vector-Based Speaker Recognition
    William Campbell
  Mon-O-11-3 A deep neural network speaker verification system targeting microphone speech
    Yun Lei, Luciana Ferrer, Mitchell McLaren and Nicolas Scheffer
  Mon-O-11-4 Application of Convolutional Neural Networks to Speaker Recognition in Noisy Conditions
    Mitchell McLaren, Yun Lei, Nicolas Scheffer and Luciana Ferrer
  Mon-O-11-5 SVM based Speaker Recognition: Harnessing Trials with Multiple Enrollment Sessions
    Jason Pelecanos, Weizhong Zhu and Sibel Yaman
  Mon-O-11-6 I-vector Speaker Verification based on Phonetic Information under Transmission Channel Effects
    Laura Fernández Gallardo, Michael Wagner and Sebastian Möller

Oral Session 12 (Mon-O-12): Speech Production II
Monday 15 September 2014 17:00-19:00, Peridot 201
     
  Mon-O-12-1 A Real-Time MRI Study of Articulatory Setting in Second Language Speech
    Andrés Benítez, Vikram Ramanarayanan, Louis Goldstein and Shrikanth Narayanan
  Mon-O-12-2 Retroflex and Bunched English /r/ with Physical Models of the Human Vocal Tract
    Takayuki Arai
  Mon-O-12-3 Parameterization of articulatory pattern in speakers with ALS
    Panying Rong, Yana Yunusova, James Berry, Lorne Zinman and Jordan Green
  Mon-O-12-4 Missing samples estimation in electromagnetic articulography data using equality constrained Kalman smoother
    Sujith P and Prasanta Ghosh
  Mon-O-12-5 Palate-referenced Articulatory Features for Acoustic-to-Articulator Inversion
    An Ji, Michael Johnson and Jeffrey Berry
  Mon-O-12-6 A Study on the Improvement of Measurement Accuracy of the Three-dimensional Electromagnetic Articulography
    Hidetsugu Uchida, Kohei Wakamiya and Tokihiko Kaburagi

Special Session 2 (b) (Mon-SP2b): INTERSPEECH 2014 Computational Paralinguistics ChallengE (ComParE) II
Monday 15 September 2014 17:00-19:00, Peridot 206
     
  Mon-SP2b-1 High-Level Speech Event Analysis for Cognitive Load Classification
    Claude Montacié and Marie-José Caraty
  Mon-SP2b-2 On the Use of Bhattacharyya based GMM Distance and Neural Net Features for Identification of Cognitive Load Levels
    Tin Lay Nwe, Trung Hieu Nguyen and Bin Ma
  Mon-SP2b-3 Prediction of Cognitive Load from Speech with the VOQAL Voice Quality Toolbox for the InterSpeech 2014 Computational Paralinguistics Challenge
    Mark Huckvale
  Mon-SP2b-4 The UNSW Submission to INTERSPEECH 2014 ComParE Cognitive Load Challenge
    Jia Min Karen Kua, Vidhyasaharan Sethu, Phu Le and Eliathamby Ambikairajah
  Mon-SP2b-5 Classification of Cognitive Load from Speech using an i-vector Framework
    Maarten Van Segbroeck, Ruchir Travadi, Colin Vaz, Jangwon Kim, Matthew Black, Alexandros Potamianos and Shrikanth Narayanan

Poster Session 7 (Mon-P-7): Speech Synthesis I
Monday 15 September 2014 17:00-19:00, Max Atria Gallery
     
  Mon-P-7-1 Using Conditional Random Fields to Predict Focus Word Pair in Spontaneous Spoken English
    Xiao Zang, Zhiyong Wu, Helen Meng, Jia Jia and Lianhong Cai
  Mon-P-7-2 Applications of Maximum Entropy Rankers to Problems in Spoken Language Processing
    Richard Sproat and Keith Hall
  Mon-P-7-3 Text-To-Speech with cross-lingual Neural Network-based grapheme-to-phoneme models
    Javier Gonzalvo and Monika Podsiadło
  Mon-P-7-4 Transform Mapping Using Shared Decision Tree Context Clustering for HMM-Based Cross-Lingual Speech Synthesis
    Daiki Nagahama, Takashi Nose, Tomoki Koriyama and Takao Kobayashi
  Mon-P-7-5 Cross-lingual Voice Conversion-Based Polyglot Speech Synthesizer for Indian Languages
    Ramani Boothalingam, Actlin Jeeva M P, Vijayalakshmi P and Nagarajan T
  Mon-P-7-6 An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis
    Qiong Hu, Yannis Stylianou, Ranniery Maia, Korin Richmond, Junichi Yamagishi and Javier Latorre
  Mon-P-7-7 Chaotic Mixed Excitation Source for Speech Synthesis
    Hemant Patil and Tanvina Patel
  Mon-P-7-8 Refined Inter-segment Joining in Multi-Form Speech Synthesis
    Alex Sorin, Slava Shechtman and Vincent Pollet
  Mon-P-7-9 A Hierarchical Viterbi Algorithm for Hybrid Mandarin Speech Synthesis System
    Ran Zhang, Zhengqi Wen, Jianhua Tao, Ya Li, Bin Liu and Xiaoyan Lou

Poster Session 8 (Mon-P-8): Multi-lingual, Cross-lingual and Low-resource ASR
Monday 15 September 2014 17:00-19:00, Max Atria Gallery
     
  Mon-P-8-1 Improving Language-Universal Feature Extraction with Deep Maxout and Convolutional Neural Networks
    Yajie Miao and Florian Metze
  Mon-P-8-2 Exploiting Vocal-Source Features to Improve ASR Accuracy for Low-Resource Languages
    Raul Fernandez, Jia Cui, Andrew Rosenberg, Bhuvana Ramabhadran and Xiaodong Cui
  Mon-P-8-3 Data augmentation for low resource languages
    Anton Ragni, Kate Knill, Shakti Rath and Mark Gales
  Mon-P-8-4 About Combining Forward and Backward-Based Decoders for Selecting Data for Unsupervised Training of Acoustic Models
    Denis Jouvet and Dominique Fohr
  Mon-P-8-5 Combination of Multilingual and Semi-Supervised Training for Under-Resourced Languages
    Martin Karafiat and Frantisek Grezl
  Mon-P-8-6 Investigating the Learning Effect of Multilingual Bottle-Neck Features for ASR
    Ngoc Thang Vu, Jochen Weiner and Tanja Schultz
  Mon-P-8-7 Distributed Learning of Multilingual DNN Feature Extractors using GPUs
    Yajie Miao, Hao Zhang and Florian Metze
  Mon-P-8-8 Combining Tandem and Hybrid Systems for Improved Speech Recognition and Keyword Spotting on Low Resource Languages
    Shakti P. Rath, Kate Knill, Anton Ragni and Mark Gales
  Mon-P-8-9 Recent Improvements in Neural Network Acoustic Modeling for LVCSR in low-resource languages
    Jia Cui, Bhuvana Ramabhadran, Xiaodong Cui, Andrew Rosenberg, Brian Kingsbury and Abhinav Sethy
  Mon-P-8-10 Towards Better Performance with Heterogeneous Training Data in Acoustic Modeling Using Deep Neural Networks
    Yan Huang, Malcolm Slaney, Michael L. Seltzer and Yifan Gong

Poster Session 9 (Mon-P-9): Speech Estimation and Sound Source Separation
Monday 15 September 2014 17:00-19:00, Max Atria Gallery
     
  Mon-P-9-1 A unified approach for underdetermined blind signal separation and source activity detection by multichannel factorial hidden Markov models
    Takuya Higuchi, Hirofumi Takeda, Tomohiko Nakamura and Hirokazu Kameoka
  Mon-P-9-2 Enhancing Audio Source Separability Using Spectro-Temporal Regularization with NMF
    Colin Vaz, Dimitrios Dimitriadis and Shrikanth Narayanan
  Mon-P-9-3 Blind Speech Source Localization, Counting and Separation for 2-channel Convolutive Mixture in a Reverberant Environment
    Sayeh Mirzaei, Hugo Van Hamme and Yaser Norouzi
  Mon-P-9-4 Discriminative NMF and its application to single-channel source separation
    Felix Weninger, Jonathan Le Roux, John Hershey and Shinji Watanabe
  Mon-P-9-5 Vocal tract length estimation based on vowels using a database consisting of 385 speakers and a database with MRI-based vocal tract shape information
    Hideki Kawahara, Tatsuya Kitamura, Hironori Takemoto, Ryuichi Nisimura and Toshio Irino
  Mon-P-9-6 A Graph-based Gaussian Component Clustering Approach to Unsupervised Acoustic Modeling
    Haipeng Wang, Tan Lee, Cheung-Chi Leung, Bin Ma and Haizhou Li
  Mon-P-9-7 A Speech System for Estimating Daily Word Counts
    Ali Ziaei, Abhijeet Sangwan and John H.L. Hansen
  Mon-P-9-8 Ensemble Modeling of Denoising Autoencoder for Speech Spectrum Restoration
    Xugang Lu, Yu Tsao, Shigeki Matsuda and Chiori Hori

Keynote 2: K. J. Ray Liu
Tuesday 16 September 2014 08:30-09:30, Garnet 213-218
     
  Keynote 2 Decision Learning in Data Science: Where John Nash Meets Social Media
   

Oral Session 13 (Tue-O-13): Feature Extraction and Modeling for ASR
Tuesday 16 September 2014 10:00-12:00, Garnet 213-218
     
  Tue-O-13-1 Acoustic Modeling with Deep Neural Networks Using Raw Time Signal for LVCSR
    Zoltán Tüske, Pavel Golik, Ralf Schlüter and Hermann Ney
  Tue-O-13-2 Evaluating robust features on Deep Neural Networks for speech recognition in noisy and channel mismatched conditions
    Vikramjit Mitra, Wen Wang, Horacio Franco, Yun Lei, Chris Bartels and Martin Graciarena
  Tue-O-13-3 Deep Scattering Spectra with Deep Neural Networks for LVCSR Tasks
    Tara Sainath, Vijayaditya Peddinti, Brian Kingsbury, Petr Fousek, Bhuvana Ramabhadran and David Nahamoo
  Tue-O-13-4 Robust CNN-based Speech Recognition With Gabor Filter Kernels
    Shuo-Yiin Chang and Nelson Morgan
  Tue-O-13-5 Probabilistic Linear Discriminant Analysis with Bottleneck Features for Speech Recognition
    Liang Lu and Steve Renals
  Tue-O-13-6 Evaluating speech features with the Minimal-Pair ABX task (II): Resistance to noise
    Thomas Schatz, Vijayaditya Peddinti, Xuan-Nga Cao, Francis Bach, Hynek Hermansky and Emmanuel Dupoux

Oral Session 14 (Tue-O-14): Speech Analysis I
Tuesday 16 September 2014 10:00-12:00, Peridot 202-203
     
  Tue-O-14-1 Lateral formants in three Central Australian languages
    Marija Tabain, Andrew Butcher, Gavan Breen and Richard Beare
  Tue-O-14-2 Detecting articulatory compensation in acoustic data through linear regression modeling
    Alina Khasanova, Jennifer Cole and Mark Hasegawa-Johnson
  Tue-O-14-3 The relationship between the second subglottal resonance and vowel class, standing height, trunk length, and F0 variation for Mandarin speakers
    Jinxi Guo, Angli Liu, Harish Arsikere, Abeer Alwan and Steven Lulich
  Tue-O-14-4 Comparison of speech quality with and without sensors in electromagnetic articulograph AG 501 recording
    Nisha Meenakshi, Chiranjeevi Yarra, B. K. Yamini and Prasanta Ghosh
  Tue-O-14-5 Impact of age in the production of European Portuguese vowels
    Luciana Albuquerque, Catarina Oliveira, António Teixeira, Pedro Sá-Couto, João Freitas and Miguel Sales Dias
  Tue-O-14-6 'Houston, We have a solution': A Case study of the Analysis of Astronaut Speech during NASA Apollo 11 for Long-term Speaker Modeling
    Chengzhu Yu, John Hansen and Douglas Oard

Oral Session 15 (Tue-O-15): Speech Technologies and Applications
Tuesday 16 September 2014 10:00-12:00, Peridot 204-205
     
  Tue-O-15-1 Choosing Useful Word Alternates for Automatic Speech Recognition Correction Interfaces
    David Harwath, Alexander Gruenstein and Ian McGraw
  Tue-O-15-2 An Initial Investigation of Long-Term Adaptation for Meeting Transcription
    Xie Chen, Mark Gales, Kate Knill, Catherine Breslin, Langzhou Chen, K.K. Chin and Vincent Wan
  Tue-O-15-3 Progress in the BBN Keyword Search System for the DARPA RATS Program
    Tim Ng, Roger Hsiao, Le Zhang, Damianos Karakos, Sri Harish Mallidi, Martin Karafiat, Karel Vesely, Igor Szoke, Bing Zhang, Long Nguyen and Richard Schwartz
  Tue-O-15-4 Speech-To-Text Technology to Transcribe and Disclose 100,000+ Hours of Bilingual Documents from Historical Czech and Czechoslovak Radio Archive
    Jan Nouza, Petr Cerva, Jindrich Zdansky, Karel Blavka, Marek Bohac, Jan Silovsky, Josef Chaloupka, Michaela Kucharova, Ladislav Seps, Jiri Malek and Michal Rott
  Tue-O-15-5 Automatic Assessment of Children's Reading with the FLaVoR Decoding Using a Phone Confusion Model
    Emre Yilmaz, Joris Pelemans and Hugo Van Hamme
  Tue-O-15-6 RWTH LVCSR Systems for Quaero and EU-Bridge: German, Polish, Spanish and Portuguese
    Mahaboob Ali Basha Shaik, Zoltán Tüske, Mohammed Ali Tahir, Markus Nussbaum-Thom, Ralf Schlüter and Hermann Ney

Oral Session 16 (Tue-O-16): Source Separation and Computational Auditory Scene Analysis
Tuesday 16 September 2014 10:00-12:00, Peridot 201
     
  Tue-O-16-1 Single Channel Source Separation with General Stochastic Networks
    Matthias Zöhrer and Franz Pernkopf
  Tue-O-16-2 Large-margin Conditional Random Fields for Single-microphone Speech Separation
    Yu Ting Yeung, Tan Lee and Cheung-Chi Leung
  Tue-O-16-3 On the use of the Watson mixture model for clustering-based under-determined blind source separation
    Ingrid Jafari, Roberto Togneri and Sven Nordholm
  Tue-O-16-4 Binary Mask Estimation Based on Frequency Modulations
    Chung-Chien Hsu, Jen-Tzung Chien and Tai-Shih Chi
  Tue-O-16-5 Bayesian Factorization and Selection for Speech and Music Separation
    Po-Kai Yang, Chung-Chien Hsu and Jen-Tzung Chien
  Tue-O-16-6 Self-Adaptation in Single-Channel Source Separation
    Michael Wohlmayr, Ludwig Mohr and Franz Pernkopf

Special Session 3 (Tue-SP3): Speech Technologies for Ambient Assisted Living
Tuesday 16 September 2014 10:00-12:00, Peridot 206
     
  Tue-SP3-1 Multichannel Automatic Recognition of Voice Command in a Multi-Room Smart Home: an Experiment involving Seniors and Users with Visual Impairment
    Michel Vacher, Benjamin Lecouteux and François Portet
  Tue-SP3-2 An Evaluation of Unsupervised Acoustic Model Training for a Dysarthric Speech Interface
    Oliver Walter, Jort F. Gemmeke, Vladimir Despotovic, Bart Ons, Reinhold Haeb-Umbach and Hugo Van Hamme
  Tue-SP3-3 Analysis of Phonetic Similarity in a Silent Speech Interface based on Permanent Magnetic Articulography
    Jose A. Gonzalez, Lam Cheah, Jie Bai, Stephen Ell, James Gilbert, Roger Moore and Phil Green
  Tue-SP3-4 Audio-Visual Signal Processing in a Multimodal Assisted Living Environment
    Alexey Karpov, Lale Akarun, Hülya Yalçın, Alexander Ronzhin, Barış Evrim Demiröz, Aysun Çoban and Milos Zelezny
  Tue-SP3-5 On the selection of the impulse responses for distant-speech recognition based on contaminated speech training
    Mirco Ravanelli and Maurizio Omologo
  Tue-SP3-6 Adaptive Speech Recognition and Dialogue Management for Users with Speech Disorders
    Iñigo Casanueva, Heidi Christensen, Thomas Hain and Phil Green
  Tue-SP3-7 Prediction of cognitive performance in an animal fluency task based on rate and articulatory markers
    Bea Yu, Thomas Quatieri, James Williamson and James Mundt
  Tue-SP3-8 Analysis of laughter events in real science classes by using multiple environment sensor data
    Carlos Ishi, Hiroaki Hatano and Norihiro Hagita

Poster Session 10 (Tue-P-10): DNN for ASR
Tuesday 16 September 2014 10:00-12:00, Max Atria Gallery
     
  Tue-P-10-1 Parallel Deep Neural Network Training for LVCSR Tasks using Blue Gene/Q
    Tara Sainath, I-shin Chung, Bhuvana Ramabhadran, Michael Picheny, John Gunnels, Vernon Austel, Upendra Chaudhari, Brian Kingsbury and George Saon
  Tue-P-10-2 Word Embeddings for Speech Recognition
    Samy Bengio and Georg Heigold
  Tue-P-10-3 1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs
    Frank Seide, Hao Fu, Jasha Droppo, Gang Li and Dong Yu
  Tue-P-10-4 Boundary Contraction Training for Acoustic Models based on Discrete Deep Neural Networks
    Ryu Takeda, Naoyuki Kanda and Nobuo Nukaga
  Tue-P-10-5 Restructuring Output Layers of Deep Neural Networks using Minimum Risk Parameter Clustering
    Yotaro Kubo, Jun Suzuki, Takaaki Hori and Atsushi Nakamura
  Tue-P-10-6 Distributed Asynchronous Optimization of Convolutional Neural Networks
    William Chan and Ian Lane
  Tue-P-10-7 Convolutional Deep Maxout Networks for Phone Recognition
    Laszlo Toth
  Tue-P-10-8 Joint Sequence Training of Phone and Grapheme Acoustic Model based on Multi-task Learning Deep Neural Networks
    Dongpeng Chen, Brian Mak and Sunil Sivadas
  Tue-P-10-9 Improving Semi-supervised Deep Neural Network for Keyword Search in Low Resource Languages
    Roger Hsiao, Tim Ng, Le Zhang, Shivesh Ranjan, Stavros Tsakalidis, Long Nguyen and Rich Schwartz
  Tue-P-10-10 Pruning Deep Neural Networks by Optimal Brain Damage
    Chao Liu, Dong Wang and Zhiyong Zhang

Poster Session 11 (Tue-P-11): Speaker Recognition - General Topics
Tuesday 16 September 2014 10:00-12:00, Max Atria Gallery
     
  Tue-P-11-1 Improving the Performance of Far-Field Speaker Verification Using Multi-Condition Training: The Case of GMM-UBM and i-vector Systems
    Anderson R. Avila, Milton Sarria-Paja, Francisco J. Fraga, Douglas O'Shaughnessy and Tiago Falk
  Tue-P-11-2 Clustering-Based I-Vector Formulation for Speaker Recognition
    Hung-Shin Lee, Yu Tsao, Hsin-Min Wang and Shyh-Kang Jen
  Tue-P-11-3 Speaker recognition via fusion of subglottal features and MFCCs
    Harish Arsikere, Hitesh Anand Gupta and Abeer Alwan
  Tue-P-11-4 The NIST SRE Summed Channel Speaker Recognition System
    Hanwu Sun and Bin Ma
  Tue-P-11-5 Advantages of Wideband over Narrowband Channels for Speaker Verification Employing MFCCs and LFCCs
    Laura Fernández Gallardo, Michael Wagner and Sebastian Möller
  Tue-P-11-6 Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenizations and tandem features
    Ming Li and Wenbo Liu
  Tue-P-11-7 Feature Switching in the i-vector Framework for Speaker Verification
    Asha Talambedu, Saranya M S, Karthik Pandia D S, Srikanth Madikeri and Hema Murthy
  Tue-P-11-8 PLDA Modeling in the Fishervoice Subspace for Speaker Verification
    Jinghua Zhong, Weiwu Jiang, Wei Rao, Man-Wai Mak and Helen Meng
  Tue-P-11-9 Performance Factor Analysis for the 2012 NIST Speaker Recognition Evaluation
    Alvin Martin, Craig Greenberg, Vincent Stanford, John Howard, George Doddington and John Godfrey
  Tue-P-11-10 Simultaneous Gender Classification and Voice Activity Detection Using Deep Neural Networks
    Hiroshi Fujimura

Poster Session 12 (Tue-P-12): Speech Processing with Multi-modalities
Tuesday 16 September 2014 10:00-12:00, Max Atria Gallery
     
  Tue-P-12-1 Dynamic Stream Weight Estimation in Coupled-HMM-based Audio-visual Speech Recognition Using Multilayer Perceptrons
    Ahmed Hussen Abdelaziz and Dorothea Kolossa
  Tue-P-12-2 Lipreading using Convolutional Neural Network
    Kuniaki Noda, Yuki Yamaguchi, Kazuhiro Nakadai, Hiroshi G. Okuno and Tetsuya Ogata
  Tue-P-12-3 Lipreading Approach for Isolated Digits Recognition under Whisper and Neutral Speech
    Fei Tao and Carlos Busso
  Tue-P-12-4 Multimodal Exemplar-based Voice Conversion using Lip Features in Noisy Environments
    Kenta Masaka, Ryo Aihara, Tetsuya Takiguchi and Yasuo Ariki
  Tue-P-12-5 Towards a Practical Silent Speech Recognition System
    Yunbin Deng, Geoffrey Meltzner and James Heaton
  Tue-P-12-6 Enhancing Multimodal Silent Speech Interfaces with Feature Selection
    João Freitas, Artur Ferreira, Mário Figueiredo, António Teixeira and Miguel Sales Dias
  Tue-P-12-7 Opti-speech: A real-time, 3D visual feedback system for speech training
    William Katz, Thomas Campbell, Jun Wang, Eric Farrar, J. Coleman Eubanks, Arvind Balasubramanian, Balakrishnan Prabhakaran and Robert Rennaker
  Tue-P-12-8 Across-speaker Articulatory Normalization for Speaker-independent Silent Speech Recognition
    Jun Wang, Ashok Samal and Jordan Green
  Tue-P-12-9 Conversion from Facial Myoelectric Signals to Speech: A Unit Selection Approach
    Marlene Zahner, Matthias Janke, Michael Wand and Tanja Schultz
  Tue-P-12-10 Towards Real-life Application of EMG-based Speech Recognition by using Unsupervised Adaptation
    Michael Wand and Tanja Schultz
  Tue-P-12-11 Simple Gesture-based Error Correction Interface for Smartphone Speech Recognition
    Yuan Liang, Koji Iwano and Koichi Shinoda

Keynote 3: Lori Lamel
Tuesday 16 September 2014 13:30-14:30, Garnet 213-218
     
  Keynote 3 Language Diversity: Speech Processing In A Multi-Lingual Context
   

Oral Session 17 (Tue-O-17): Normalization and Discriminative Training Methods
Tuesday 16 September 2014 15:00-17:00, Garnet 213-218
     
  Tue-O-17-1 Normalization of ASR Confidence Classifier Scores via Confidence Mapping
    Kshitiz Kumar, Chaojun Liu and Yifan Gong
  Tue-O-17-2 Neural Network Phone Duration Model for Speech Recognition
    Tanel Alumäe
  Tue-O-17-3 Sequence Discriminative Distributed Training of Long Short-Term Memory Recurrent Neural Networks
    Hasim Sak, Oriol Vinyals, Georg Heigold, Andrew Senior, Erik McDermott, Rajat Monga and Mark Mao
  Tue-O-17-4 Beyond Cross-entropy: Towards Better Frame-level Objective Functions For Deep Neural Network Training In Automatic Speech Recognition
    Zhen Huang, Jinyu Li, Chao Weng and Chin-Hui Lee
  Tue-O-17-5 A comparison of training approaches for discriminative segmental models
    Hao Tang, Kevin Gimpel and Karen Livescu
  Tue-O-17-6 Asynchronous Stochastic Optimization for Sequence Training of Deep Neural Networks: Towards Big Data
    Erik McDermott, Georg Heigold, Pedro Moreno, Andrew Senior and Michiel Bacchiani

Oral Session 18 (Tue-O-18): Paralinguistic and Extralinguistic Information
Tuesday 16 September 2014 15:00-17:00, Peridot 202-203
     
  Tue-O-18-1 Detection of Children's Paralinguistic Events in Interaction with Caregivers
    Hrishikesh Rao, Jonathan Kim, Mark Clements, Agata Rozga and Daniel Messinger
  Tue-O-18-2 Age and Rhythmic Variations. A Study on Italian
    Massimo Pettorino and Elisa Pellegrino
  Tue-O-18-3 Probabilistic Acoustic Volume Analysis for Speech Affected by Depression
    Nicholas Cummins, Vidhyasaharan Sethu, Julien Epps and Jarek Krajewski
  Tue-O-18-4 Exploring Modulation Spectrum Features for Speech-Based Depression Level Classification
    Elif Bozkurt, Orith Toledo-Ronen, Alexander Sorin and Ron Hoory
  Tue-O-18-5 Automatic modeling of depressed speech: Relevant Features and Relevance of Gender
    Florian Hönig, Anton Batliner, Elmar Nöth, Sebastian Schnieder and Jarek Krajewski
  Tue-O-18-6 Excitation Source Features for Discrimination of Anger and Happy Emotions
    Gangamohan Paidi, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty and Bayya Yegnanarayana

Oral Session 19 (Tue-O-19): Text Processing for Speech Synthesis
Tuesday 16 September 2014 15:00-17:00, Peridot 204-205
     
  Tue-O-19-1 Encoding Linear Models As Weighted Finite-State Transducers
    Ke Wu, Cyril Allauzen, Keith Hall, Michael Riley and Brian Roark
  Tue-O-19-2 Structured Soft Margin Confidence Weighted Learning for Grapheme-to-Phoneme Conversion
    Keigo Kubo, Sakriani Sakti, Graham Neubig, Tomoki Toda and Satoshi Nakamura
  Tue-O-19-3 Unsupervised Language Filtering using the Latent Dirichlet Allocation
    Wei Zhang, Robert Clark and Yongyuan Wang
  Tue-O-19-4 Generating multiple-accent pronunciations for TTS using joint sequence model interpolation
    BalaKrishna Kolluru, Vincent Wan, Javier Latorre, Kayoko Yanagisawa and Mark Gales
  Tue-O-19-5 Using a hybrid approach to build a pronunciation dictionary for Brazilian Portuguese
    Gustavo Mendonça and Sandra Aluisio
  Tue-O-19-6 A Flexible Front-End for HTS
    Matthew Aylett, Rasmus Dall, Arnab Ghoshal, Gustav Eje Henter and Thomas Merritt

Oral Session 20 (Tue-O-20): Cross-language Perception and Production
Tuesday 16 September 2014 15:00-17:00, Peridot 201
     
  Tue-O-20-1 Cross-language Perception of Japanese Singleton and Geminate Consonants: Preliminary Data from Non-native Learners of Japanese and Native Speakers of Italian and Australian English
    Kimiko Tsukada, Felicity Cox and John Hajek
  Tue-O-20-2 Difficulty in discriminating non-native vowels: Are Dutch vowels easier for Australian English than Spanish listeners?
    Samra Alispahic, Paola Escudero and Karen Mulak
  Tue-O-20-3 Acoustic properties of shared vowels in bilingual Mandarin-English children
    Jing Yang and Robert A. Fox
  Tue-O-20-4 Generating segmental foreign accent
    Maria Luisa Garcia Lecumberri, Roberto Barra-Chicote, Ruben Pérez Ramón, Junichi Yamagishi and Martin Cooke
  Tue-O-20-5 Differences of Pitch Profiles in Germanic and Slavic Languages
    Bistra Andreeva, Grażyna Demenko, Bernd Möbius, Frank Zimmerer, Jeanin Jügler and Magdalena Oleskowicz-Popiel
  Tue-O-20-6 The Obligatory Contour Principle in African and European Varieties of French
    Mathieu Avanzi, Guri Bordal and Gélase Nimbona

Special Session 4 (Tue-SP4): Text-dependent Speaker Verification With Short Utterances
Tuesday 16 September 2014 15:00-17:00, Peridot 206
     
  Tue-SP4-1 Content matching for short duration speaker recognition
    Nicolas Scheffer and Yun Lei
  Tue-SP4-2 Extended RSR2015 for text-dependent speaker verification over VHF channel
    Anthony Larcher, Kong Aik Lee, Pablo L Sordo Martinez, Bin Ma and Haizhou Li
  Tue-SP4-3 Tandem Deep Features for Text-Dependent Speaker verification
    Tianfan Fu, Yanmin Qian, Yuan Liu and Kai Yu
  Tue-SP4-4 In-Domain versus Out-of-Domain training for Text-Dependent JFA
    Patrick Kenny, Themos Stafylakis, Md Jahangir Alam, Pierre Ouellet and Marcel Kockmann
  Tue-SP4-5 Domain Adaptation for Text Dependent Speaker Verification
    Hagai Aronowitz and Asaf Rendel
  Tue-SP4-6 Factor Analysis with Sampling Methods for Text Dependent Speaker Recognition
    Antonio Miguel, Jesus Villalba, Alfonso Ortega, Eduardo Lleida and Carlos Vaquero

Poster Session 13 (Tue-P-13): Speech and Audio Analysis
Tuesday 16 September 2014 15:00-17:00, Max Atria Gallery
     
  Tue-P-13-1 Dictionary-based pitch tracking with dynamic programming
    Ewout van den Berg and Bhuvana Ramabhadran
  Tue-P-13-2 Acoustic Features for Robust classification of Mandarin tones
    Hongbing Hu, Stephen Zahorian, Peter Guzewich and Jiang Wu
  Tue-P-13-3 Preservation of lexical tones in singing in a tone language
    Anastasia Karlsson, Håkan Lundström and Jan-Olof Svantesson
  Tue-P-13-4 Emotional Speech Classification using adaptive Sinusoidal Modelling
    Theodora Yakoumaki, George Kafentzis and Yannis Stylianou
  Tue-P-13-5 Formant Enhancement based Speech Watermarking for Tampering Detection
    Shengbei Wang, Masashi Unoki and Nam Soo Kim
  Tue-P-13-6 Modelling Primitive Streaming of Simple Tone Sequences Through Factorisation of Modulation Pattern Tensors
    Tom Barker, Hugo Van Hamme and Tuomas Virtanen
  Tue-P-13-7 Detection of Vowel Onset Points in Voiced Aspirated Sounds of Indian Languages
    Biswajit Dev Sarma and S R Mahadeva Prasanna
  Tue-P-13-8 Accuracy Evaluation of Esophageal Voice Analysis Based on Automatic Topology Generated-Voicing Source HMM
    Akira Sasou
  Tue-P-13-9 Audio Watermarking Based on Multiple Echo Hiding for FM Radio
    Xuejun Zhang and Xiang Xie

Poster Session 14 (Tue-P-14): Cross-lingual and Adaptive Language Modeling
Tuesday 16 September 2014 15:00-17:00, Max Atria Gallery
     
  Tue-P-14-1 Development of Bilingual ASR System for MediaParl Corpus
    Petr Motlicek, David Imseng, Milos Cernak and Namhoon Kim
  Tue-P-14-2 Investigation of Cross-lingual Bottleneck Features in Hybrid ASR Systems
    Jie Li, Rong Zheng and Bo Xu
  Tue-P-14-3 Language identification of individual words with Joint Sequence Models
    Oluwapelumi Giwa and Marelie Davel
  Tue-P-14-4 Audio-to-text Alignment for speech recognition with very limited resources
    Xavier Anguera, Jordi Luque and Ciro Gracia
  Tue-P-14-5 A Minimal-Resource Transliteration Framework for Vietnamese
    Hoang Gia Ngo, Nancy Chen, Sunil Sivadas, Bin Ma and Haizhou Li
  Tue-P-14-6 Combining Recurrent Neural Networks and Factored Language Models During Decoding of Code-Switching Speech
    Heike Adel, Dominic Telaar, Ngoc Thang Vu, Katrin Kirchhoff and Tanja Schultz
  Tue-P-14-7 Data Augmentation, Feature Combination, and Multilingual Neural Networks to Improve ASR and KWS Performance for Low-resource Languages
    Zoltán Tüske, Pavel Golik, David Nolden, Ralf Schlüter and Hermann Ney
  Tue-P-14-8 Mixture of Latent Words Language Models for Domain Adaptation
    Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki and Sumitaka Sakauchi
  Tue-P-14-9 Improving Spoken Document Retrieval by Unsupervised Language Model Adaptation Using Utterance-based Web Search
    Robert Herms, Marc Ritter, Thomas Wilhelm-Stein and Maximilian Eibl
  Tue-P-14-10 The Nested Indian Buffet Process for Flexible Topic Modeling
    Jen-Tzung Chien and Ying-Lan Chang
  Tue-P-14-11 Automated closed captioning for Russian live broadcasting
    Kirill Levin, Irina Ponomareva, Anna Bulusheva, German Chernykh, Ivan Medennikov, Nikolay Merkin, Aleksey Prudnikov and Natalia Tomashenko

Poster Session 15 (Tue-P-15): Pronunciation Modeling and Learning
Tuesday 16 September 2014 15:00-17:00, Max Atria Gallery
     
  Tue-P-15-1 Pronunciation Modeling of Foreign Words for Mandarin ASR by Considering the Effect of Language Transfer
    Lei Wang and Rong Tong
  Tue-P-15-2 Pronunciation Learning for Named-Entities through Crowd-Sourcing
    Attapol Rutherford, Fuchun Peng and Francoise Beaufays
  Tue-P-15-3 Pronunciation variation in read and conversational Austrian German
    Barbara Schuppler, Martine Adda-Decker and Juan A. Morales-Cordovilla
  Tue-P-15-4 Discriminative pronunciation modeling for dialectal speech recognition
    Maider Lehr, Kyle Gorman and Izhak Shafran
  Tue-P-15-5 The Goodness of Pronunciation algorithm applied to disordered speech
    Thomas Pellegrini, Lionel Fontan, Julie Mauclair, Jérôme Farinas and Marina Robert
  Tue-P-15-6 Using Deep Neural Networks to Improve Proficiency Assessment for Children English Language Learners
    Angeliki Metallinou and Jian Cheng
  Tue-P-15-7 Alignment of Spoken Utterances with Slide Content for Easier Learning with Recorded Lectures using Structured Support Vector Machine (SVM)
    Han Lu, Sheng-syun Shen, Sz-Rung Shiang, Hung-yi Lee and Lin-shan Lee
  Tue-P-15-8 A Preliminary study on ASR-based detection of Chinese mispronunciation by Japanese learners
    Duan Richeng, Zhang Jinsong, Cao Wen and Xie Yanlu

Show and Tell Session 1
Tuesday 16 September 2014 15:00-17:00, Garnet Foyer
     
  Show&Tell-1-1 3D tongue motion visualization based on ultrasound image sequences
    Kele Xu, Yin Yang, A. Jaumard-Hakoun, M. Adda-Decker, A. Amelot, S. K. Al Kork, L. Crevier-Buchman, P. Chawah, G. Dreyfus, T. Fux, C. Pillot-Loiseau, P. Roussel, M. Stone, B. Denby
  Show&Tell-1-2 Listen with your skin: Aerotak speech perception enhancement system
    Donald Derrick, Tom De Rybel, Greg O'Beirne, Jennifer Hay
  Show&Tell-1-3 Speech Assistant System
    László Czap
  Show&Tell-1-4 Spoken Dialogue System for Restaurant Recommendation and Reservation
    Rafael E. Banchs, Seokhwan Kim
  Show&Tell-1-5 Interlingual Map Task Corpus Collection
    Hayakawa Akira, Nick Campbell, Saturnino Luz
  Show&Tell-1-6 Translation mobile application for Chinese and Spanish travellers
    Jordi Centelles, Marta R. Costa-jussa, Rafael E. Banchs
  Show&Tell-1-7 LuciaWebGL: A New WebGL-Based Talking Head
    Alberto Benin, Piero Cosi, Giuseppe Riccardo Leone, Giulio Paci
  Show&Tell-1-8 Crowdee: Mobile Crowdsourcing Micro-task Platform for Celebrating the Diversity of Languages
    Babak Naderi, Tim Polzehl, André Beyer, Tibor Pilz, Sebastian Möller
  Show&Tell-1-9 On the Use of the ‘Pure Data’ Programming Language for Teaching and Public Outreach in Speech Processing
    Roger K. Moore
  Show&Tell-1-10 SyncWords: A Platform for Semi-Automated Closed Captioning and Subtitles
    Aleksandr Dubinsky
  Show&Tell-1-11 Simple4All Show and Tell
    Rob Clark

Keynote 4: William Shi-Yuan Wang
Wednesday 17 September 2014 08:30-09:30, Garnet 213-218
     
  Keynote 4 Sound Patterns in Language
   

Oral Session 21 (Wed-O-21): Statistical Parametric Speech Synthesis
Wednesday 17 September 2014 10:00-12:00, Garnet 213-218
     
  Wed-O-21-1 Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech
    Gustav Eje Henter, Thomas Merritt, Matt Shannon, Catherine Mayo and Simon King
  Wed-O-21-2 Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesis
    Thomas Merritt, Tuomo Raitio and Simon King
  Wed-O-21-3 Voice expression conversion with factorised HMM-TTS models
    Javier Latorre, Vincent Wan and Kayoko Yanagisawa
  Wed-O-21-4 Noise-robust TTS speaker adaptation with statistics smoothing
    Kayoko Yanagisawa, Langzhou Chen and Mark J. F. Gales
  Wed-O-21-5 Speech synthesis in various communicative situations: Impact of pronunciation variations
    Sandrine Brognaux, Benjamin Picart and Thomas Drugman
  Wed-O-21-6 Formant-Controlled Speech Synthesis Using Hidden Trajectory Model
    Ming-Qi Cai, Zhen-Hua Ling and Lirong Dai

Oral Session 22 (Wed-O-22): Voice Activity Detection
Wednesday 17 September 2014 10:00-12:00, Peridot 202-203
     
  Wed-O-22-1 Boosted Deep Neural Networks and Multi-resolution Cochleagram Features for Voice Activity Detection
    Xiao-Lei Zhang and DeLiang Wang
  Wed-O-22-2 Selection of optimal vocal tract regions using real-time magnetic resonance imaging for robust voice activity detection
    Abhay Prasad, Prasanta Ghosh and Shrikanth Narayanan
  Wed-O-22-3 Speech Activity Detection for NASA Apollo Space Missions: Challenges and Solutions
    Ali Ziaei, Lakshmish Kaushik, Abhijeet Sangwan, John H.L. Hansen and Douglas Oard
  Wed-O-22-4 Towards Improving Statistical Model Based Voice Activity Detection
    Ming Tu, Xiang Xie and Yishan Jiao
  Wed-O-22-5 The Use of Low-Frequency Ultrasound for Voice Activity Detection
    Ian McLoughlin
  Wed-O-22-6 Improving the Speech Activity Detection for the DARPA RATS Phase-3 Evaluation
    Jeff Ma

Oral Session 23 (Wed-O-23): Disordered Speech
Wednesday 17 September 2014 10:00-12:00, Peridot 204-205
     
  Wed-O-23-1 Modeling Pronunciation, Rhythm, and Intonation for Automatic Assessment of Speech Quality in Aphasia Rehabilitation
    Duc Le and Emily Mower Provost
  Wed-O-23-2 Ranking severity of speech errors by their phonological impact in context
    Sofia Strömbergsson, Christina Tånnander and Jens Edlund
  Wed-O-23-3 Automatic Detection of Parkinson's Disease from Words Uttered in Three Different Languages
    Juan Rafael Orozco-Arroyave, Florian Hönig, Julián David Arias-Londoño, Jesús Francisco Vargas-Bonilla, Sabine Skodda, Jan Rusz and Elmar Nöth
  Wed-O-23-4 Automating an Objective Measure of Pediatric Speech Intelligibility
    Jason Lilley, Susan Nittrouer and H Timothy Bunnell
  Wed-O-23-5 A Comparison of GMM-HMM and DNN-HMM Based Pronunciation Verification Techniques for Use in the Assessment of Childhood Apraxia of Speech
    Mostafa Shahin, Beena Ahmed, Jacqueline McKechnie, Kirrie Ballard and Ricardo Gutierrez-Osuna
  Wed-O-23-6 Acoustic and Kinematic Characteristics of Vowel Production through a Virtual Vocal Tract in Dysarthria
    Jeff Berry, Andrew Kolb, Cassandra North and Michael Johnson

Oral Session 24 (Wed-O-24): Speech and Multimodal Resources
Wednesday 17 September 2014 10:00-12:00, Peridot 201
     
  Wed-O-24-1 The EMG-UKA Corpus for Electromyographic Speech Processing
    Michael Wand, Matthias Janke and Tanja Schultz
  Wed-O-24-2 A Whispered Mandarin Corpus for Speech Technology Applications
    Pei Xuan Lee, Darren Wee, Hilary Toh, Boon Pang Lim, Nancy Chen and Bin Ma
  Wed-O-24-3 Euronews: a multilingual benchmark for ASR and LID
    Roberto Gretter
  Wed-O-24-4 ATHENA: A Greek Multi-Sensory Database for Home Automation Control
    Antigoni Tsiami, Isidoros Rodomagoulakis, Panagiotis Giannoulis, Athanasios Katsamanis, Gerasimos Potamianos and Petros Maragos
  Wed-O-24-5 The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones
    Marco Matassoni, Ramón Fernandez Astudillo, Athanasios Katsamanis and Mirco Ravanelli
  Wed-O-24-6 Verbal description of LEGO blocks
    Diogo Henriques, Isabel Trancoso, Daniel Mendes and Alfredo Ferreira

Special Session 5 (Wed-SP5): Phase Importance in Speech Processing Applications
Wednesday 17 September 2014 10:00-12:00, Peridot 206
     
  Wed-SP5-1 INTERSPEECH 2014 Special Session: Phase Importance in Speech Processing Applications
    Pejman Mowlaee, Rahim Saeidi and Yannis Stylianou
  Wed-SP5-2 Phase-based harmonic/percussive separation
    Estefania Cano, Mark Plumbley and Christian Dittmar
  Wed-SP5-3 Phase Distortion Statistics as a Representation of the Glottal Source: Application to the Classification of Voice Qualities
    Gilles Degottex and Nicolas Obin
  Wed-SP5-4 A measure of phase randomness for the harmonic model in speech synthesis
    Gilles Degottex and Daniel Erro
  Wed-SP5-5 Enhancement of speech intelligibility in near-end noise conditions with phase modification
    Emma Jokinen, Marko Takanen, Hannu Pulakka and Paavo Alku
  Wed-SP5-6 A Hybrid Approach to Segmentation of Speech Using Group Delay Processing and HMM Based Embedded Reestimation
    S Aswin Shanmugam and Hema Murthy
  Wed-SP5-7 The Importance of Phase on Voice Quality Assessment
    Maria Koutsogiannaki, Olympia Simantiraki, Gilles Degottex and Yannis Stylianou
  Wed-SP5-8 Feature Extraction from Analytic Phase of Speech Signals for Speaker Verification
    Karthika Vijayan, Vinay Kumar and K. Sri Rama Murty
  Wed-SP5-9 A Cross-vocoder Study of Speaker Independent Synthetic Speech Detection using Phase Information
    Jon Sanchez, Ibon Saratxaga, Inma Hernaez, Eva Navas and Daniel Erro

Poster Session 16 (Wed-P-16): Phonetics and Phonology
Wednesday 17 September 2014 10:00-12:00, Max Atria Gallery
     
  Wed-P-16-1 Investigating the effect of F0 and vocal intensity on harmonic magnitudes: Data from laryngeal high-speed videoendoscopy
    Gang Chen, Soo Jin Park, Jody Kreiman and Abeer Alwan
  Wed-P-16-2 Adapting prosodic chunking algorithm and synthesis system to specific style: the case of dictation
    Elisabeth Delais-Roussarie, Damien Lolive, Hiyon Yoo, Nelly Barbot and Olivier Rosec
  Wed-P-16-3 The Articulation of Lexical and Post-lexical Palatalization in Korean
    Jae-Hyun Sung
  Wed-P-16-4 Articulation and Neutralization: A Preliminary Study of Lenition in Scottish Gaelic
    Diana Archangeli, Samuel Johnston, Jae-Hyun Sung, Muriel Fisher, Michael Hammond and Andrew Carnie
  Wed-P-16-5 Nasality in Speech and Its Contribution to Speaker Individuality
    Kanae Amino, Hisanori Makinae and Tatsuya Kitamura
  Wed-P-16-6 Is Speech Rhythm an Intrinsic Property of Language?
    Jason Brown and Eden Matene
  Wed-P-16-7 Where /aR/ the /R/s in Standard Austrian German?
    Anke Jackschina, Barbara Schuppler and Rudolf Muhr
  Wed-P-16-8 Diphthongized Vowels in the Yi County Hui Chinese Dialect
    Fang Hu and Minghui Zhang
  Wed-P-16-9 Rhythmic variability between some Asian Languages: Results from an automatic analysis of temporal characteristics
    Volker Dellwo, Peggy Mok and Mathias Jenny
  Wed-P-16-10 Listener estimation of speaker age based on whispered speech
    Angelika Braun and Daniela Decker
  Wed-P-16-11 The Lombard Effect with Thai Lexical Tones: An acoustic analysis of articulatory modifications in noise
    Benjawan Kasisopa, Virginie Attina and Denis Burnham

Poster Session 17 (Wed-P-17): Spoken Term Detection and Document Retrieval
Wednesday 17 September 2014 10:00-12:00, Max Atria Gallery
     
  Wed-P-17-1 Intrinsic Spectral Analysis Based on Temporal Context Features for Query-by-Example Spoken Term Detection
    Peng Yang, Cheung-Chi Leung, Lei Xie, Bin Ma and Haizhou Li
  Wed-P-17-2 Recent improvements in SRI’s Keyword Detection System for Noisy Audio
    Julien van Hout, Vikramjit Mitra, Yun Lei, Dimitra Vergyri, Martin Graciarena, Arindam Mandal and Horacio Franco
  Wed-P-17-3 Utilizing State-level Distance Vector Representation for Improved Spoken Term Detection by Text and Spoken Queries
    Mitsuaki Makino, Naoki Yamamoto and Atsuhiko Kai
  Wed-P-17-5 Unsupervised Spoken Word Retrieval using Gaussian-Bernoulli Restricted Boltzmann Machines
    Pappagari RaghavendraReddy, Shekhar Nayak and K Sri Rama Murty
  Wed-P-17-6 Unsupervised Query-by-Example Spoken Term Detection using Bag of Acoustic Words and Non-Segmental Dynamic Time Warping
    Basil George, Abhijeet Saxena, Gautam Mantena, Kishore Prahallad and Bayya Yegnanarayana
  Wed-P-17-7 An Empirical Study of Multilingual and Low-resource Spoken Term Detection Using Deep Neural Networks
    Jie Li, Xiaorui Wang and Bo Xu
  Wed-P-17-8 Diagnostic Techniques for Spoken Keyword Discovery
    Peter Schulam and Murat Akbacak
  Wed-P-17-9 Robust Retrieval Models for False Positive Errors in Spoken Documents
    Sho Kawasaki and Tomoyosi Akiba
  Wed-P-17-10 Semantic Retrieval of Personal Photos using Matrix Factorization and Two-layer Random Walk Fusing Sparse Speech Annotations with Visual Features
    Yung-ming Liou, Yi-sheng Fu, Hung-yi Lee and Lin-shan Lee
  Wed-P-17-11 Audio thumbnails for spoken content without transcription based on a maximum motif coverage criterion
    Guillaume Gravier, Nathan Souviraà-Labastie, Sébastien Campion and Frédéric Bimbot
  Wed-P-17-12 Semantically Based Search in a Social Speech Task
    Fernando García, Emilio Sanchis and Ferran Pla

Poster Session 18 (Wed-P-18): Prosody and Paralinguistic Information
Wednesday 17 September 2014 10:00-12:00, Max Atria Gallery
     
  Wed-P-18-1 Study of Changes in Glottal Vibration Characteristics During Laughter
    Vinay Kumar Mittal and B. Yegnanarayana
  Wed-P-18-2 On Predicting the Unpleasantness Level of a Sound Event
    Stavros Ntalampiras and Ilyas Potamitis
  Wed-P-18-3 Predicting when to laugh with structured classification
    Bilal Piot, Olivier Pietquin and Matthieu Geist
  Wed-P-18-4 Conversational structures affecting auditory likeability
    Benjamin Weiss and Katrin Schoenenberg
  Wed-P-18-5 Towards the Adaptation of Prosodic Models for Expressive Text-To-Speech Synthesis
    Mathieu Avanzi, George Christodoulides, Damien Lolive, Elisabeth Delais-Roussarie and Nelly Barbot
  Wed-P-18-6 Data-Driven Generation of Text Balloons based on Linguistic and Acoustic Features of a Comics-Anime Corpus
    Sho Matsumiya, Sakriani Sakti, Graham Neubig, Tomoki Toda and Satoshi Nakamura
  Wed-P-18-7 Learning L2 Prosody Is More Difficult than You Realize – F0 Characteristics and Chunking Size of L1 English, TW L2 English and TW L1 Mandarin
    Chiu-yu Tseng and Chao-yu Su
  Wed-P-18-8 Investigating prosodic relations between initiating and responding laughs
    Khiet Truong and Jürgen Trouvain
  Wed-P-18-9 Application of Image Processing Methods to Filled Pauses Detection from Spontaneous Speech
    Dmytro Prylipko, Olga Egorow, Ingo Siegert and Andreas Wendemuth
  Wed-P-18-10 Perception of Sentence Stress in English Infant Directed Speech
    Sofoklis Kakouros and Okko Räsänen
  Wed-P-18-11 Automatic Recognition of Attitudes in Video Blogs - Prosodic and Visual Feature Analysis
    Noor Alhusna Madzlan, JingGuang Han, Francesca Bonin and Nick Campbell
  Wed-P-18-12 “Was that your mother on the phone?”: Classifying Interpersonal Relationships between Dialog Participants with Lexical and Acoustic Properties
    Denys Katerenchuk, David Guy Brizan and Andrew Rosenberg

Oral Session 25 (Wed-O-25): Features and Robustness in Speaker and Language Recognition
Wednesday 17 September 2014 13:30-15:30, Garnet 213-218
     
  Wed-O-25-1 Combining Source and System Information for Limited Data Speaker Verification
    Rohan Kumar Das, Abhiram B, S R Mahadeva Prasanna and A G Ramakrishnan
  Wed-O-25-2 New Insight into the Use of Phone Log-Likelihood Ratios as Features for Language Recognition
    Mireia Diez, Amparo Varona, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes and German Bordel
  Wed-O-25-3 Robust Language Identification Using Convolutional Neural Network Features
    Sriram Ganapathy, Kyu Han, Samuel Thomas, Mohamed Omar, Maarten Van Segbroeck and Shrikanth Narayanan
  Wed-O-25-4 Acoustic Feature Transformation using UBM-based LDA for Speaker Recognition
    Chengzhu Yu, Gang Liu and John Hansen
  Wed-O-25-5 SNR-Dependent Mixture of PLDA for Noise Robust Speaker Verification
    Manwai Mak
  Wed-O-25-6 Nearest Neighbor Discriminant Analysis for Robust Speaker Recognition
    Seyed Omid Sadjadi, Jason Pelecanos and Weizhong Zhu

Oral Session 26 (Wed-O-26): Topic Spotting and Summarization of Spoken Documents
Wednesday 17 September 2014 13:30-15:30, Peridot 202-203
     
  Wed-O-26-1 Enhanced Language Modeling for Extractive Speech Summarization with Sentence Relatedness Information
    Shih-Hung Liu, Kuan-Yu Chen, Yu-lun Hsieh, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen and Wen-Lian Hsu
  Wed-O-26-2 I-vector based Representation of Highly Imperfect Automatic Transcriptions
    Mohamed Morchid, Mohamed Bouallegue, Richard Dufour, Georges Linares, Driss Matrouf and Renato de Mori
  Wed-O-26-3 Incorporating Lexical and Prosodic Information at Different Levels for Meeting Summarization
    Catherine Lai and Steve Renals
  Wed-O-26-4 Subspace Gaussian Mixture Models for Dialogues Classification
    Mohamed Bouallegue, Mohamed Morchid, Richard Dufour, Driss Matrouf, Georges Linares and Renato de Mori
  Wed-O-26-5 Factor Analysis Based Semantic Variability Compensation for Automatic Conversation Representation
    Mohamed Bouallegue, Mohamed Morchid, Richard Dufour, Driss Matrouf, Georges Linares and Renato de Mori
  Wed-O-26-6 Speech Cohesion for Topic Segmentation of Spoken Contents
    Abdessalam Bouchekif, Geraldine Damnati and Delphine Charlet

Oral Session 27 (Wed-O-27): DNN Learning
Wednesday 17 September 2014 13:30-15:30, Peridot 204-205
     
  Wed-O-27-1 A Comparative Analytic Study on the Gaussian Mixture and Context Dependent Deep Neural Network Hidden Markov Models
    Yan Huang, Dong Yu, Chaojun Liu and Yifan Gong
  Wed-O-27-2 Asynchronous, Online, GMM-free Training of a Context Dependent Acoustic Model for Speech Recognition
    Michiel Bacchiani, Andrew Senior and Georg Heigold
  Wed-O-27-3 Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models
    Navdeep Jaitly, Vincent Vanhoucke and Geoffrey Hinton
  Wed-O-27-4 Learning Small-Size DNN with Output-Distribution-Based Criteria
    Jinyu Li, Rui Zhao, Jui-Ting Huang and Yifan Gong
  Wed-O-27-5 Ensemble Deep Learning for Speech Recognition
    Li Deng and John Platt
  Wed-O-27-6 Learning Conditional Random Field with Hierarchical Representations for Dialogue Act Recognition
    Yucan Zhou, Qinghua Hu, Jie Liu and Yuan Jia

Oral Session 28 (Wed-O-28): Perception of Emotion and Prosody
Wednesday 17 September 2014 13:30-15:30, Peridot 201
     
  Wed-O-28-1 Can adolescents with autism perceive emotional prosody?
    Cristiane Hsu and Yi Xu
  Wed-O-28-2 Age, Hearing Loss and the Perception of Affective Utterances in Conversational Speech
    Juliane Schmidt, Esther Janse and Odette Scharenborg
  Wed-O-28-3 Analysis of Emotional Effect on Speech-Body Gesture Interplay
    Zhaojun Yang and Shrikanth Narayanan
  Wed-O-28-4 When voices get emotional: A study of emotion-enhanced memory and impairment during emotional prosody exposure
    Cyrielle Chappuis and Didier Grandjean
  Wed-O-28-5 Perception of pitch tails at potential turn boundaries in Swedish
    Margaret Zellers
  Wed-O-28-6 Towards a perceptual model of speech rhythm: Integrating the influence of f0 on perceived duration
    Robert Fuchs

Special Session 6 (a) (Wed-SP6a): Deep Neural Networks for Speech Generation and Synthesis I
Wednesday 17 September 2014 13:30-15:30, Peridot 206
     
  Wed-SP6a-1 DNN-based stochastic postfilter for HMM-based speech synthesis
    Ling-Hui Chen, Tuomo Raitio, Cassia Valentini-Botinhao, Junichi Yamagishi and Zhen-Hua Ling
  Wed-SP6a-2 Statistical Parametric Speech Synthesis using Weighted Multi-distribution Deep Belief Network
    Shiyin Kang and Helen Meng
  Wed-SP6a-3 TTS Synthesis with Bidirectional LSTM based Recurrent Neural Networks
    Yuchen Fan, Yao Qian, Fenglong Xie and Frank K. Soong
  Wed-SP6a-4 Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort
    Tuomo Raitio, Antti Suni, Lauri Juvela, Martti Vainio and Paavo Alku
  Wed-SP6a-5 An Introduction to Computational Networks and the Computational Network Toolkit
    Dong Yu, Adam Eversole, Mike Seltzer, Kaisheng Yao, Brian Guenter, Oleksii Kuchaiev, Frank Seide, Huaming Wang, Jasha Droppo, Zhiheng Huang, Geoffrey Zweig, Chris Rossbach and Jon Currey

Poster Session 19 (Wed-P-19): Speech Analysis and Perception
Wednesday 17 September 2014 13:30-15:30, Max Atria Gallery
     
  Wed-P-19-1 Acoustic investigation of /th/ lenition in Brunei Mandarin
    Shufang Xu
  Wed-P-19-2 Mapping Emotions into Acoustic Space: the Role of Voice Quality
    Ting Wang, Hongwei Ding, Jianjing Kuang and Qiuwu Ma
  Wed-P-19-3 Principal Components of Auditory Spectro-Temporal Receptive Fields
    Nagaraj Mahajan, Nima Mesgarani and Hynek Hermansky
  Wed-P-19-4 Segmentation in singer turns with the Bayesian Information Criterion
    Marwa Thlithi, Thomas Pellegrini, Julien Pinquier and Régine André-Obrecht
  Wed-P-19-5 Mappings between vocal tract area functions, vocal tract resonances and speech formants for multiple speakers
    Catherine Watson
  Wed-P-19-6 A Next Step Towards Measuring Perceived Quality of Speech Through Physiology
    Sebastian Arndt, Markus Wenzel, Jan-Niklas Antons, Friedemann Köster, Sebastian Möller and Gabriel Curio
  Wed-P-19-7 Effect of Spectral Degradation to the Intelligibility of Vowel Sentences
    Fei Chen, Sharon W.K. Wong and Lena L.N. Wong
  Wed-P-19-8 Consonant Context Effects on Vowel Sensorimotor Adaptation
    Jeff Berry, John Jaeger, Melissa Wiedenhoeft, Brittany Bernal and Michael Johnson
  Wed-P-19-9 Assessing objective characterizations of phonetic convergence
    Gerard Bailly and Amelie Martin
  Wed-P-19-10 Generalizing time-frequency importance functions across noises, talkers, and phonemes
    Michael Mandel, Sarah Yoho and Eric Healy
  Wed-P-19-11 Does elderly speech recognition in noise benefit from spectral and visual cues?
    Yatin Mahajan, Jeesun Kim and Chris Davis
  Wed-P-19-12 On the conversant-specificity of stochastic turn-taking models
    Kornel Laskowski

Poster Session 20 (Wed-P-20): Intelligibility Enhancement and Predictive Measures
Wednesday 17 September 2014 13:30-15:30, Max Atria Gallery
     
  Wed-P-20-1 Single-Ended Estimation of Speech Intelligibility using the ITU P.563 Feature Set
    Toshihiro Sakano, Yosuke Kobayashi and Kazuhiro Kondo
  Wed-P-20-2 Spectral tilt modelling with GMMs for intelligibility enhancement of narrowband telephone speech
    Emma Jokinen, Ulpu Remes, Marko Takanen, Kalle Palomäki, Mikko Kurimo and Paavo Alku
  Wed-P-20-3 Analyzing Perceptual Dimensions of Conversational Speech Quality
    Friedemann Köster and Sebastian Möller
  Wed-P-20-4 Interplay of informational content and energetic masking in speech perception in noise
    Vincent Aubanel, Chris Davis and Jeesun Kim
  Wed-P-20-5 On spectral and time domain energy reallocation for speech-in-noise intelligibility enhancement
    Catalin Zorila and Yannis Stylianou
  Wed-P-20-6 Objective Quality Evaluation of Noise-suppressed Speech: Effects of Temporal Envelope and Fine-structure Cues
    Fei Chen and Yi Hu
  Wed-P-20-7 Noisy Speech Enhancement Based on Long Term Harmonic Model to Improve Speech Intelligibility for Hearing Impaired Listeners
    Dongmei Wang, Philipos Loizou and John H.L. Hansen
  Wed-P-20-8 Using linguistic predictability and the Lombard effect to increase the intelligibility of synthetic speech in noise
    Cassia Valentini-Botinhao and Mirjam Wester
  Wed-P-20-9 Speech pre-enhancement using a discriminative microscopic intelligibility model
    Maryam Al Dabel and Jon Barker
  Wed-P-20-10 Least Squares Signal Declipping for Robust Speech Recognition
    Mark Harvilla and Richard Stern

Poster Session 21 (Wed-P-21): Speech and Language Processing - General Topics
Wednesday 17 September 2014 13:30-15:30, Max Atria Gallery
     
  Wed-P-21-1 Semi-supervised Training for Bottle-neck Feature based DNN-HMM Hybrid Systems
    Haihua Xu, Hang Su, Eng Siong Chng and Haizhou Li
  Wed-P-21-2 A big data approach to acoustic model training corpus selection
    Olga Kapralova, John Alex, Eugene Weinstein, Pedro Moreno and Olivier Siohan
  Wed-P-21-3 Recent Advances in ASR Applied to an Arabic Transcription System for Al-Jazeera
    Patrick Cardinal, Ahmed Ali, Najim Dehak, Yu Zhang, Tuka Al Hanai, Yifan Zhang, Jim Glass and Stephan Vogel
  Wed-P-21-4 rwthlm – The RWTH Aachen University Neural Network Language Modeling Toolkit
    Martin Sundermeyer, Ralf Schlüter and Hermann Ney
  Wed-P-21-5 Language Modeling with Sum-Product Networks
    Wei-Chen Cheng, Stanley Kok, Hoai Vu Pham, Hai Leong Chieu and Kian Ming A. Chai
  Wed-P-21-6 Improving Deep Neural Network Acoustic Modeling For Audio Corpus Indexing Under The IARPA Babel Program
    Xiaodong Cui, Brian Kingsbury, Jia Cui, Bhuvana Ramabhadran, Andrew Rosenberg, Mohammed Sadegh Rasooli, Owen Rambow, Nizar Habash and Vaibhava Goel
  Wed-P-21-7 Cross-language transfer of semantic annotation via targeted crowdsourcing
    Shammur Absar Chowdhury, Arindam Ghosh, Evgeny Stepanov, Ali Orkan Bayer, Giuseppe Riccardi and Ioannis Klasinas
  Wed-P-21-8 Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding
    Dilek Hakkani-Tur, Asli Celikyilmaz, Larry Heck, Gokhan Tur and Geoffrey Zweig
  Wed-P-21-9 Automatic Speech Recognition and Translation of a Swiss German Dialect: Walliserdeutsch
    Philip N. Garner, David Imseng and Thomas Meyer
  Wed-P-21-10 Building Resources for Algerian Arabic Dialects
    Salima Harrat, Karima Meftouh, Mourad Abbas and Kamel Smaili

Show and Tell Session 2
Wednesday 17 September 2014 13:30-15:30, Garnet Foyer
     
  Show&Tell-2-1 An educational platform to capture, visualize and analyze rare singing
    P. Chawah, S. K. Al Kork, T. Fux, M. Adda-Decker, A. Amelot, N. Audibert, B. Denby, G. Dreyfus, A. Jaumard-Hakoun, C. Pillot-Loiseau, P. Roussel, M. Stone, K. Xu and L. Buchman
  Show&Tell-2-2 Single-Channel Speech Enhancement Based on Non-negative Matrix Factorization and Online Noise Adaptation
    Kwang Myung Jeon, Chan Jun Chun, Woo Kyeong Seong, Hong Kook Kim, and Myung Kyu Choi
  Show&Tell-2-3 Intelligibility of high-pitched vowel sounds in the singing and speaking of a female Cantonese Opera singer
    Dieter Maurer, Peggy Mok, Daniel Friedrichs and Volker Dellwo
  Show&Tell-2-4 Show & Tell: Iterative Refinement of Amplitude and Phase in Single-channel Speech Enhancement
    Pejman Mowlaee, Mario Kaoru Watanabe and Rahim Saeidi
  Show&Tell-2-5 eLite-HTS: a NLP tool for French HMM-based speech synthesis
    Sophie Roekhaut, Sandrine Brognaux, Richard Beaufort and Thierry Dutoit
  Show&Tell-2-6 SARA – Singapore’s Automated Responsive Assistant for the Touristic Domain
    Andreea I. Niculescu, Rafael E Banchs, Ridong Jiang, Seokhwan Kim, Kheng Hui Yeo and Arthur Niswar
  Show&Tell-2-7 The Speech Recognition Virtual Kitchen: Launch Party
    Andrew Plummer, Eric Riebling, Anuj Kumar, Florian Metze, Eric Fosler-Lussier, and Rebecca Bates
  Show&Tell-2-8 System for Automated Speech and Language Analysis (SALSA)
    Kyle Marek-Spartz, Benjamin Knoll, Robert Bill, Thomas Christie and Serguei Pakhomov
  Show&Tell-2-9 Pronunciation Practice Support System for Children who Have Difficulty Correctly Pronouncing Words
    Ikuyo Masuda-Katsuse
  Show&Tell-2-10 Automated Production of True-cased Punctuated Subtitles for Weather and News Broadcasts
    Joris Driesen, Alexandra Birch, Simon Grimsey, Saeid Safarfashandi, Juliet Gauthier, Matt Simpson and Steve Renals
  Show&Tell-2-11 I2R Speech2Singing Perfects Everyone’s Singing
    Minghui Dong, S. W. Lee, Haizhou Li, Paul Chan, Xuejian Peng, Jochen Walter Ehnes and Dongyan Huang

Oral Session 29 (Wed-O-29): Language, Dialect and Accent Recognition
Wednesday 17 September 2014 16:00-18:00, Garnet 213-218
     
  Wed-O-29-1 Spoken Language Recognition Based on Senone Posteriors
    Luciana Ferrer, Yun Lei, Mitchell McLaren and Nicolas Scheffer
  Wed-O-29-2 Automatic Language Identification using Long Short-Term Memory Recurrent Neural Networks
    Javier Gonzalez, Ignacio Lopez-Moreno, Hasim Sak, Joaquin Gonzalez-Rodriguez and Pedro Moreno
  Wed-O-29-3 Robust Language Recognition via Adaptive Language Factor Extraction
    Brecht Desplanques, Kris Demuynck and Jean-Pierre Martens
  Wed-O-29-4 Dialect Levelling in Finnish: A Universal Speech Attribute Approach
    Hamid Behravan, Ville Hautamaki, Sabato Marco Siniscalchi, Elie Khoury, Tommi Kurki, Tomi Kinnunen and Chin-Hui Lee
  Wed-O-29-5 Improving native accent identification using deep neural networks
    Mingming Chen, Zhanlei Yang, Hao Zheng and Wenju Liu
  Wed-O-29-6 Foreign accent recognition based on temporal information contained in lowpass-filtered speech
    Marie-José Kolly, Adrian Leemann and Volker Dellwo

Oral Session 30 (Wed-O-30): Adaptation
Wednesday 17 September 2014 16:00-18:00, Peridot 202-203
     
  Wed-O-30-1 Adaptation of Deep Neural Network Acoustic Models Using Factorised I-Vectors
    Penny Karanasou, Yongqiang Wang, Mark Gales and Phil Woodland
  Wed-O-30-2 Regularized feature-space discriminative adaptation for robust ASR
    Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura, Steven J. Rennie and Vaibhava Goel
  Wed-O-30-3 Towards Speaker Adaptive Training of Deep Neural Network Acoustic Models
    Yajie Miao, Hao Zhang and Florian Metze
  Wed-O-30-4 Component Structuring and Trajectory Modeling for Speech Recognition
    Arseniy Gorin and Denis Jouvet
  Wed-O-30-5 Speaker Dependent Bottleneck Layer Training for Speaker Adaptation in Automatic Speech Recognition
    Rama Sanand Doddipatla, Madina Hasan and Thomas Hain
  Wed-O-30-6 Improving Wideband Acoustic Models Using Mixed-bandwidth Training Data via DNN Adaptation
    Zhao You and Bo Xu

Oral Session 31 (Wed-O-31): Speaker Localization
Wednesday 17 September 2014 16:00-18:00, Peridot 204-205
     
  Wed-O-31-1 A Sparse Reconstruction Method for Speech Source Localization using Partial Dictionaries over a Spherical Microphone Array
    Kushagra Singhal and Rajesh M Hegde
  Wed-O-31-2 A robust TDOA estimation method for in-car-noise environments
    Weiwei Cui, Jaeyoun Cho and Seungyeol Lee
  Wed-O-31-3 Robust Low-Resource Sound Localization in Correlated Noise
    Lorin Netsch and Jacek Stachurski
  Wed-O-31-4 Direction-of-Arrival Estimation of Multiple Speakers Using a Planar Array
    Dongwen Ying, Ruohua Zhou, Junfeng Li, Jielin Pan and Yonghong Yan
  Wed-O-31-5 Weighted Spatial Bispectrum Correlation Matrix for DOA Estimation in the Presence of Interferences
    Wei Xue, Shan Liang and Wenju Liu
  Wed-O-31-6 Multi-Sources Separation for Sound Source Localization
    Mariem Bouafif and Zied Lachiri

Oral Session 32 (Wed-O-32): Speech Analysis II
Wednesday 17 September 2014 16:00-18:00, Peridot 201
     
  Wed-O-32-1 Relating automatic vowel space estimates to talker intelligibility
    Yi Luan, Richard Wright, Mari Ostendorf and Gina-Anne Levow
  Wed-O-32-2 Excitation source analysis for high-quality speech manipulation systems based on an interference-free representation of group delay with minimum phase response compensation
    Hideki Kawahara, Masanori Morise, Tomoki Toda, Hideki Banno, Ryuichi Nisimura and Toshio Irino
  Wed-O-32-3 Sparse Time-Frequency Representation of Speech by the Vandermonde Transform
    Christian Fischer Pedersen and Tom Bäckström
  Wed-O-32-4 Analysis and Identification of Human Scream: Implications for Speaker Recognition
    Mahesh Kumar Nandwana and John H.L. Hansen
  Wed-O-32-5 F0 estimation in noisy speech based on long-term harmonic feature analysis combined with neural network classification
    Dongmei Wang, Philipos C. Loizou and John H.L. Hansen
  Wed-O-32-6 The Influence of Pitch and Noise on the Discriminability of Filterbank Features
    Malcolm Slaney and Michael L. Seltzer

Special Session 6 (b) (Wed-SP6b): Deep Neural Networks for Speech Generation and Synthesis II
Wednesday 17 September 2014 16:00-18:00, Peridot 206
     
  Wed-SP6b-1 Prosody Contour Prediction with Long Short-Term Memory, Bi-Directional, Deep Recurrent Neural Networks
    Raul Fernandez, Asaf Rendel, Bhuvana Ramabhadran and Ron Hoory
  Wed-SP6b-2 Modeling DCT Parameterized F0 Trajectory at Intonation Phrase Level with DNN or Decision Tree
    Xiang Yin, Ming Lei, Yao Qian, Frank K. Soong, Lei He, Zhen-Hua Ling and Li-Rong Dai
  Wed-SP6b-3 High-Order Sequence Modeling Using Speaker-Dependent Recurrent Temporal Restricted Boltzmann Machines for Voice Conversion
    Toru Nakashika, Tetsuya Takiguchi and Yasuo Ariki
  Wed-SP6b-4 Sequence Error (SE) Minimization Training of Neural Network for Voice Conversion
    Feng-Long Xie, Yao Qian, Frank Soong and Haifeng Li
  Wed-SP6b-5 Robust Articulatory Speech Synthesis using Deep Neural Networks for BCI Applications
    Florent Bocquelet, Thomas Hueber, Laurent Girin, Pierre Badin and Blaise Yvert

Poster Session 22 (Wed-P-22): Speech Synthesis II
Wednesday 17 September 2014 16:00-18:00, Max Atria Gallery
     
  Wed-P-22-1 Automatic animation of an articulatory tongue model from ultrasound images using Gaussian mixture regression
    Diandra Fabre, Thomas Hueber and Pierre Badin
  Wed-P-22-2 Articulatory Controllable Speech Modification Based on Statistical Feature Mapping with Gaussian Mixture Models
    Patrick Lumban Tobing, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura and Ayu Purwarianti
  Wed-P-22-3 Speech-Driven Head Motion Synthesis Using Neural Networks
    Ding Chuang, Zhu Pengcheng, Xie Lei, Jiang Dongmei and Fu Zhonghua
  Wed-P-22-4 Text-independent voice conversion using speaker model alignment method from non-parallel speech
    Peng Song, Yun Jin, Wenming Zheng and Li Zhao
  Wed-P-22-5 Voice Conversion Using Generative Trained Deep Neural Networks with Multiple Frame Spectral Envelopes
    Ling-Hui Chen, Zhen-Hua Ling and Lirong Dai
  Wed-P-22-6 Hierarchical modeling of F0 contours for voice conversion
    Gerard Sanchez, Hanna Silen, Jani Nurminen and Moncef Gabbouj
  Wed-P-22-7 Speech Prosody Generation for Text-to-Speech Synthesis Based on Generative Model of F0 Contours
    Kento Kadowaki, Tatsuma Ishihara, Nobukatsu Hojo and Hirokazu Kameoka
  Wed-P-22-8 An Iterative Approach to Decision Tree Training for Context Dependent Speech Synthesis
    Xiayu Chen, Yang Zhang and Mark Hasegawa-Johnson
  Wed-P-22-9 Prosodic phrasing modeling for Vietnamese TTS using syntactic information
    Thi Thu Trang Nguyen, Albert Rilliard, Do-Dat Tran and Christophe d'Alessandro
  Wed-P-22-10 Accent Type and Phrase Boundary Estimation Using Acoustic and Language Models for Automatic Prosodic Labeling
    Tomoki Koriyama, Hiroshi Suzuki, Takashi Nose, Takahiro Shinozaki and Takao Kobayashi
  Wed-P-22-11 Reconstruction of mistracked articulatory trajectories
    Qiang Fang, Jianguo Wei and Fang Hu

Poster Session 23 (Wed-P-23): Speech Representation, Detection and Classification
Wednesday 17 September 2014 16:00-18:00, Max Atria Gallery
     
  Wed-P-23-1 Phone Classification by a Hierarchy of Invariant Representation Layers
    Chiyuan Zhang, Stephen Voinea, Georgios Evangelopoulos, Lorenzo Rosasco and Tomaso Poggio
  Wed-P-23-2 A semi-Markov model for speech segmentation with an utterance-break prior
    Mark Sinclair, Peter Bell, Alexandra Birch and Fergus McInnes
  Wed-P-23-3 Speech detection in transient noises
    Gunnam Aneeja and Bayya Yegnanarayana
  Wed-P-23-4 Evaluation of dictionary for sparse coding in speech processing
    Yongjun He, Guanglu Sun, Guibin Zheng and Jiqing Han
  Wed-P-23-5 Joint Filtering and Factorization for Recovering Latent Structure from Noisy Speech Data
    Colin Vaz, Vikram Ramanarayanan and Shrikanth Narayanan
  Wed-P-23-6 A Comparison of Open-Source Segmentation Architectures for Dealing with Imperfect Data from the Media in Speech Synthesis
    Ascension Gallardo-Antolin, Juan M Montero and Simon King
  Wed-P-23-7 Read and spontaneous speech classification based on variance of GMM supervectors
    Taichi Asami, Ryo Masumura, Hirokazu Masataki and Sumitaka Sakauchi
  Wed-P-23-8 Co-channel Speech Detection via Spectral Analysis of Frequency Modulated Sub-bands
    Navid Shokouhi, Seyed Omid Sadjadi and John H.L. Hansen
  Wed-P-23-9 Word-level Invariant Representations From Acoustic Waveforms
    Stephen Voinea, Chiyuan Zhang, Georgios Evangelopoulos, Lorenzo Rosasco and Tomaso Poggio
  Wed-P-23-10 On Closed Form Calculation of Line Spectral Frequencies (LSF)
    Paul Dalsgaard and Ove Andersen
  Wed-P-23-11 Robust Features for Content-Based Audio Copy Detection
    Chahid Ouali, Pierre Dumouchel and Vishwa Gupta
  Wed-P-23-12 Binaural Deep Neural Network Classification for Reverberant Speech Segregation
    Yi Jiang, DeLiang Wang and RunSheng Liu

Poster Session 24 (Wed-P-24): Feature Extraction and Modeling for ASR
Wednesday 17 September 2014 16:00-18:00, Max Atria Gallery
     
  Wed-P-24-1 Investigating NMF Speech Enhancement for Neural Network based Acoustic Models
    Jürgen T. Geiger, Jort F. Gemmeke, Björn Schuller and Gerhard Rigoll
  Wed-P-24-2 Automatic Speech Feature Classification for Children with Cochlear Implants
    Jason Lilley, James Mahshie and H Timothy Bunnell
  Wed-P-24-3 Sequential Maximum Mutual Information Linear Discriminant Analysis for Speech Recognition
    Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux and John Hershey
  Wed-P-24-4 Model and Feature Based Compensation for Whispered Speech Recognition
    Shabnam Ghaffarzadegan, Hynek Boril and John H.L. Hansen
  Wed-P-24-5 Post-masking: A Hybrid Approach to Array Processing for Speech Recognition
    Amir Moghimi, Bhiksha Raj and Richard Stern
  Wed-P-24-6 ASR Feature Extraction with Morphologically-Filtered Power-Normalized Cochleograms
    Fernando de la Calle Silos, Francisco J. Valverde Albacete, Ascensión Gallardo Antolín and Carmen Peláez Moreno
  Wed-P-24-7 Should deep neural nets have ears? The role of auditory features in deep learning approaches
    Angel Mario Castro Martinez, Niko Moritz and Bernd T. Meyer
  Wed-P-24-8 Extending Limabeam with discrimination and coarse gradients
    Charles Fox and Thomas Hain
  Wed-P-24-9 Generation of F0 Contour Using Deep Boltzmann Machine and Twin Gaussian Process Hybrid Model for Bengali Language
    Sankar Mukherjee and Shyamal Kumar Das Mandal
  Wed-P-24-10 Room Localization for Distant Speech Recognition
    Juan A. Morales Cordovilla, Hannes Pessentheiner, Martin Hagmüller and Gernot Kubin
  Wed-P-24-11 Posterior-based Sparse Representation for Automatic Speech Recognition
    Sara Bahaadini, Afsaneh Asaei, David Imseng and Herve Bourlard

Keynote 5: Li Deng
Thursday 18 September 2014 08:30-09:30, Garnet 213-218
     
  Keynote 5 Achievements and Challenges of Deep Learning - From Speech Analysis And Recognition To Language And Multimodal Processing
   

Oral Session 33 (Thu-O-33): Spoken Term Detection for Low-Resource Languages I
Thursday 18 September 2014 10:00-12:00, Garnet 213-218
     
  Thu-O-33-1 Query-by-Example Spoken Term Detection on Multilingual Unconstrained Speech
    Xavier Anguera, Luis Javier Rodriguez-Fuentes, Igor Szoke, Andi Buzo, Florian Metze and Mikel Penagarikano
  Thu-O-33-2 A Comparison of Multiple Methods for Rescoring Keyword Search Lists for Low Resource Languages
    Victor Soto, Lidia Mangu, Andrew Rosenberg and Julia Hirschberg
  Thu-O-33-3 Subword and Phonetic Search for Detecting Out-of-Vocabulary Keywords
    Damianos Karakos and Richard Schwartz
  Thu-O-33-4 An In-Depth Comparison of Keyword Specific Thresholding and Sum-to-One Score Normalization
    Yun Wang and Florian Metze
  Thu-O-33-5 Graph-based Re-ranking using Acoustic Feature Similarity between Search Results for Spoken Term Detection on Low-resource Languages
    Hung-yi Lee, Yu Zhang, Ekapol Chuangsuwanich and Jim Glass
  Thu-O-33-6 Developing STT and KWS systems using limited language resources
    Viet Bac Le, Lori Lamel, Abdel Messaoudi, William Hartmann, Jean-Luc Gauvain, Cécile Woehrling, Julien Despres and Anindya Roy

Oral Session 34 (Thu-O-34): Voice Conversion
Thursday 18 September 2014 10:00-12:00, Peridot 202-203
     
  Thu-O-34-1 GMM-based bandwidth extension using sub-band basis spectrum model
    Yamato Ohtani, Masatsune Tamura, Masahiro Morita and Masami Akamine
  Thu-O-34-2 A mel-cepstral analysis technique restoring high frequency components from low-sampling-rate speech
    Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku and Keiichi Tokuda
  Thu-O-34-3 A Comparative Study of Spectral Transformation Techniques for Singing Voice Synthesis
    Siu-Wa Lee, Zhizheng Wu, Minghui Dong, Xiaohai Tian and Haizhou Li
  Thu-O-34-4 Application of Matrix Variate Gaussian Mixture Model to Statistical Voice Conversion
    Daisuke Saito, Hidenobu Doi, Nobuaki Minematsu and Keikichi Hirose
  Thu-O-34-5 Joint nonnegative matrix factorization for exemplar-based voice conversion
    Zhizheng Wu, Eng Siong Chng and Haizhou Li
  Thu-O-34-6 Statistical Singing Voice Conversion with Direct Waveform Modification based on the Spectrum Differential
    Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti and Satoshi Nakamura

Oral Session 35 (Thu-O-35): Speech and Audio Segmentation and Classification
Thursday 18 September 2014 10:00-12:00, Peridot 204-205
     
  Thu-O-35-1 Detecting proximity from personal audio recordings
    Daniel Ellis, Hiroyuki Satoh and Zhuo Chen
  Thu-O-35-2 Acoustic Event Detection and Localization with Regression Forests
    Huy Phan, Marco Maaß, Radoslaw Mazur and Alfred Mertins
  Thu-O-35-3 Multi-source Posteriors for Speech Activity Detection on Public Talks
    Marc Ferras and Herve Bourlard
  Thu-O-35-4 Analysis of Spectrogram Image Methods for Sound Event Classification
    Jonathan Dennis, Tran-Huy Dat and Eng Siong Chng
  Thu-O-35-5 Speech-based Automatic and Robust Detection of Very Early Dementia
    Aharon Satt, Ron Hoory, Alexandra Konig, Pauline Aalten and Philippe Robert
  Thu-O-35-6 On the Acoustic Environment of a Neonatal Intensive Care Unit: Initial Description, and Detection of Equipment Alarms
    Ganna Raboshchuk, Climent Nadeu, Omid Ghahabi, Sergi Solvez, Blanca Muñoz Mahamud, Ana Riverola de Veciana and Santiago Navarro Hervas

Oral Session 36 (Thu-O-36): Language Acquisition
Thursday 18 September 2014 10:00-12:00, Peridot 201
     
  Thu-O-36-1 Non-native perception of regionally accented speech in a multitalker context
    Robert A. Fox, Ewa Jacewicz and Florence Hardjono
  Thu-O-36-2 A crosslinguistic and acquisitional perspective on intonational rises in French
    Giuseppina Turco and Elisabeth Delais-Roussarie
  Thu-O-36-3 Error patterns of Mandarin disyllabic tones by Japanese learners
    Jung-Yueh Tu, Yuwen Hsiung, Ming-Da Wu and Yao-Ting Sung
  Thu-O-36-4 Infant-Directed Speech Enhances Temporal Rhythmic Structure in the Envelope
    Victoria Leong, Marina Kalashnikova, Denis Burnham and Usha Goswami
  Thu-O-36-5 Influences of Tone Sandhi on Word Recognition in Preschool Children
    Dilu Wewalaarachchi and Leher Singh
  Thu-O-36-6 Lexical Representation of Consonant, Vowels and Tones in Early Childhood
    Hwee Hwee Goh, Charlene Hu, Kheng Hui Yeo and Leher Singh

Oral Session 37 (Thu-O-37): Speech Perception
Thursday 18 September 2014 10:00-12:00, Peridot 206
     
  Thu-O-37-1 Audiovisual temporal sensitivity in typical and dyslexic adult readers
    Ana Francisco, Alexandra Jesse, Margriet Groen and James McQueen
  Thu-O-37-2 Aero-tactile integration in fricatives: Converting audio to air flow information for speech perception enhancement
    Donald Derrick, Greg A. O'Beirne, Tom De Rybel and Jennifer Hay
  Thu-O-37-3 Relative importance of AM and FM cues for speech comprehension: Effects of speaking rate and their implications for neurophysiological processing of speech
    Guangting Mai
  Thu-O-37-4 The effect of regional and non-native accents on word recognition processes: A comparison of EEG responses in quiet to speech recognition in noise
    Louise Stringer and Paul Iverson
  Thu-O-37-5 Towards a Neural Measure of Perceptual Distance - Classification of Electroencephalographic Responses to Synthetic Vowels
    Manson Cheuk-Man Fong, James William Minett, Thierry Blu and William Shi-Yuan Wang
  Thu-O-37-6 Collecting a Corpus of Dutch Noise-induced ‘Slips of the Ear’
    Odette Scharenborg, Eric Sanders and Bert Cranen

Poster Session 25 (Thu-P-25): Language and Lexical Modeling
Thursday 18 September 2014 10:00-12:00, Max Atria Gallery
     
  Thu-P-25-1 Lexical Modeling for Arabic ASR: A Systematic Approach
    Tuka Al Hanai and James Glass
  Thu-P-25-2 Hybrid language models for speech transcription
    Luiza Orosanu and Denis Jouvet
  Thu-P-25-3 Neural Network Language Models for Low Resource Languages
    Ankur Gandhe, Florian Metze and Ian Lane
  Thu-P-25-4 Feed Forward Pre-training for Recurrent Neural Network Language Models
    Siva Reddy Gangireddy, Fergus McInnes and Steve Renals
  Thu-P-25-5 Grounding language models in spatiotemporal context
    Brandon Roy, Soroush Vosoughi and Deb Roy
  Thu-P-25-6 Direct Word Graph Rescoring Using A* Search and RNNLM
    Shahab Jalalvand and Daniele Falavigna
  Thu-P-25-7 One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling
    Ciprian Chelba, Tomas Mikolov, Thorsten Brants, Philipp Koehn, Tony Robinson, Qi Ge and Mike Schuster
  Thu-P-25-8 Integrating Sequence Information in the Audio-visual Detection of Word Prominence in a Human-machine Interaction Scenario
    Andrea Schnall and Martin Heckmann
  Thu-P-25-9 Backoff Inspired Features for Maximum Entropy Language Models
    Fadi Biadsy, Keith Hall, Pedro Moreno and Brian Roark
  Thu-P-25-10 BioKIT - Real-time decoder for biosignal processing
    Dominic Telaar, Michael Wand, Dirk Gehrig, Felix Putze, Christoph Amma, Dominic Heger, Ngoc Thang Vu, Mark Erhardt, Tim Schlippe, Matthias Janke, Christian Herff and Tanja Schultz
  Thu-P-25-11 Speech Recognition without a Lexicon - Bridging the Gap between Graphemic and Phonetic Systems
    David Harwath and James Glass

Poster Session 26 (Thu-P-26): Speech Enhancement (Single- and Multi-channel)
Thursday 18 September 2014 10:00-12:00, Max Atria Gallery
     
  Thu-P-26-1 A New Auxiliary-Vector Algorithm with Conjugate Orthogonality for Speech Enhancement
    Shengkui Zhao and Douglas Jones
  Thu-P-26-2 Acoustic characteristics of critical message utterances in noise applied to speech intelligibility enhancement
    Neehar Jathar and Preeti Rao
  Thu-P-26-3 Dynamic Noise Aware Training for Speech Enhancement Based on Deep Neural Networks
    Yong Xu, Jun Du, Lirong Dai and Chin-Hui Lee
  Thu-P-26-4 Microphone Array Post-Filtering Using Supervised Machine Learning for Speech Enhancement
    Pasi Pertilä and Joonas Nikunen
  Thu-P-26-5 Novel Speech Duration Modifier for Packet Based Communication System
    SenthilKumar Mani, Jitendra Kumar Dhiman and Sri Rama Murty K.
  Thu-P-26-6 Experiments on Deep Learning for Speech Denoising
    Ding Liu, Paris Smaragdis and Minje Kim
  Thu-P-26-7 Single-channel Dynamic Exemplar-based Speech Enhancement
    Nasser Mohammadiha and Simon Doclo
  Thu-P-26-8 Using Hidden Markov Models for Speech Enhancement
    Akihiro Kato and Ben Milner
  Thu-P-26-9 Blind source extraction based on a direction-dependent a-priori SNR
    Lukas Pfeifenberger and Franz Pernkopf
  Thu-P-26-10 Least Squares Phase Estimation of Mixed Signals
    Carlos Eduardo Cancino Chacon and Pejman Mowlaee
  Thu-P-26-11 Speech Enhancement from Additive Noise and Channel Distortion - a Corpus-Based Approach
    Ming Ji and Danny Crookes

Poster Session 27 (Thu-P-27): Robust ASR
Thursday 18 September 2014 10:00-12:00, Max Atria Gallery
     
  Thu-P-27-1 Robust speech recognition in reverberant environments using subband-based steady-state monaural and binaural suppression
    Hyung-Min Park, Matthew Maciejewski, Chanwoo Kim and Richard Stern
  Thu-P-27-2 Variable-Component Deep Neural Network for Robust Speech Recognition
    Rui Zhao, Jinyu Li and Yifan Gong
  Thu-P-27-3 Effective Modulation Spectrum Factorization for Robust Speech Recognition
    Yu-chen Kao, Yi-Ting Wang and Berlin Chen
  Thu-P-27-4 Hybrid MLP/Structured-SVM Tandem Systems for Large Vocabulary and Robust ASR
    Suman Ravuri
  Thu-P-27-5 Robust speech recognition using temporal masking and thresholding algorithm
    Chanwoo Kim, Kean Chin, Michiel Bacchiani and Richard Stern
  Thu-P-27-6 Deep Neural Network Bottleneck Features For Generalized Variable Parameter HMMs
    Xurong Xie, Rongfeng Su, Xunying Liu and Lan Wang
  Thu-P-27-7 A Novel Dynamic Parameters Calculation Approach For Model Compensation
    Suliang Bu, Yanmin Qian and Kai Yu
  Thu-P-27-8 Speech recognition based on Itakura-Saito divergence and dynamics / sparseness constraints from mixed sound of speech and music by non-negative matrix factorization
    Naoaki Hashimoto, Shoichi Nakano, Kazumasa Yamamoto and Seiichi Nakagawa
  Thu-P-27-9 Noise Robust Speech Recognition Based on Noise-adapted HMMs Using Speech Feature Compensation
    Yongjoo Chung
  Thu-P-27-10 Noise Spectrum Estimation using Gaussian Mixture Model-based Speech Presence Probability for Robust Speech Recognition
    Md Jahangir Alam, Patrick Kenny, Pierre Dumouchel and Douglas O'Shaughnessy

Oral Session 38 (Thu-O-38): Spoken Term Detection for Low-Resource Languages II
Thursday 18 September 2014 13:30-15:30, Garnet 213-218
     
  Thu-O-38-1 Comparing Decoding Strategies for Subword-based Keyword Spotting in Low-Resourced Languages
    William Hartmann, Viet-Bac Le, Abdel Messaoudi, Lori Lamel and Jean-Luc Gauvain
  Thu-O-38-2 Strategies for Rescoring Keyword Search Results Using Word-Burst and Acoustic Features
    Min Ma, Justin Richards, Victor Soto, Julia Hirschberg and Andrew Rosenberg
  Thu-O-38-3 Word-based Probabilistic Phonetic Retrieval for Low-resource Spoken Term Detection
    Di Xu and Florian Metze
  Thu-O-38-4 A Keyword-Boosted sMBR Criterion to Enhance Keyword Search Performance in Deep Neural Network Based Acoustic Modeling
    I-Fan Chen, Nancy Chen and Chin-Hui Lee
  Thu-O-38-5 Combination of FST and CN Search in Spoken Term Detection
    Justin Chiu, Yun Wang, Jan Trmal, Dan Povey, Guoguo Chen and Alexander Rudnicky
  Thu-O-38-6 Low-Resource Open Vocabulary Keyword Search Using Point Process Models
    Chunxi Liu, Aren Jansen, Guoguo Chen, Keith Kintzley, Jan Trmal and Sanjeev Khudanpur

Oral Session 39 (Thu-O-39): Speech Coding and Transmission
Thursday 18 September 2014 13:30-15:30, Peridot 202-203
     
  Thu-O-39-1 Decorrelated Innovative Codebooks for ACELP Using Factorization of Autocorrelation Matrix
    Tom Bäckström and Christian R. Helmrich
  Thu-O-39-2 Stress and Accent Transmission In HMM-Based Syllable-Context Very Low Bit Rate Speech Coding
    Milos Cernak, Alexandros Lazaridis, Philip N. Garner and Petr Motlicek
  Thu-O-39-3 Subjective Voice Quality Evaluation of Artificial Bandwidth Extension: Comparing Different Audio Bandwidths and Speech Codecs
    Hannu Pulakka, Anssi Rämö, Ville Myllylä, Henri Toukomaa and Paavo Alku
  Thu-O-39-4 Stereo Acoustic Echo Suppression Using Widely Linear Filtering in the Frequency Domain
    Zhong-Hua Fu and Lei Xie
  Thu-O-39-5 Enhanced Muting Method in Packet Loss Concealment of ITU-T G.722 Using Sigmoid Function with On-line Optimized Parameters
    Bong-Ki Lee, Inyoung Hwang, Jihwan Park and Joon-Hyuk Chang
  Thu-O-39-6 A Robust Step-Size Control Algorithm for Frequency Domain Acoustic Echo Cancellation
    Chao Wu, Kai Yu Jiang, Yanmeng Guo, Qiang Fu and Yonghong Yan

Oral Session 40 (Thu-O-40): Speech Enhancement
Thursday 18 September 2014 13:30-15:30, Peridot 204-205
     
  Thu-O-40-1 Multi-channel speech enhancement using sparse coding on local time-frequency structures
    Zhiyuan Zhou, Zhaogui Ding, Weifeng Li, Zhiyong Wu, Longbiao Wang and Qingmin Liao
  Thu-O-40-2 Multichannel Speech Dereverberation Based on Convolutive Nonnegative Tensor Factorization for ASR Applications
    Seyedmahdad Mirsamadi and John H.L. Hansen
  Thu-O-40-3 Speech enhancement by low-rank and convolutive dictionary spectrogram decomposition
    Zhuo Chen, Brian McFee and Daniel Ellis
  Thu-O-40-4 Multiple-order non-negative matrix factorization for speech enhancement
    Xabier Jaureguiberry, Emmanuel Vincent and Gaël Richard
  Thu-O-40-5 NMF-based Speech Enhancement Incorporating Deep Neural Network
    Tae Gyoon Kang, Kisoo Kwon, Jong Won Shin and Nam Soo Kim
  Thu-O-40-6 A Data-Driven Approach to Speech Enhancement using Gaussian Process
    Sukanya Sonowal, Kisoo Kwon, Nam Soo Kim and Jong Won Shin

Oral Session 41 (Thu-O-41): Unsupervised or Corrective Lexical Modeling
Thursday 18 September 2014 13:30-15:30, Peridot 201
     
  Thu-O-41-1 Error Correction of Automatic Speech Recognition based on Normalized Web Distance
    Enkhbolor Byambakhishig, Katsuyuki Tanaka, Ryo Aihara, Toru Nakashika, Tetsuya Takiguchi and Yasuo Ariki
  Thu-O-41-2 Unsupervised Training Methods for Discriminative Language Modeling
    Erinç Dikici and Murat Saraclar
  Thu-O-41-3 Building A Vocabulary Self-Learning Speech Recognition System
    Long Qin and Alexander Rudnicky
  Thu-O-41-4 Methods for Efficient Semi-Automatic Pronunciation Dictionary Bootstrapping
    Tim Schlippe, Matthias Merz and Tanja Schultz
  Thu-O-41-5 Rapidly Building Domain-Specific Entity-Centric Language Models Using Semantic Web Knowledge Sources
    Murat Akbacak, Dilek Hakkani-Tur and Gokhan Tur
  Thu-O-41-6 Context-dependent Pronunciation Error Pattern Discovery with Limited Annotations
    Ann Lee and Jim Glass

Oral Session 42 (Thu-O-42): Meta Data
Thursday 18 September 2014 13:30-15:30, Peridot 206
     
  Thu-O-42-1 Detecting speaker roles and topic changes in multiparty conversations using latent topic models
    Ashtosh Sapru and Herve Bourlard
  Thu-O-42-2 A Deep Neural Network Approach for Sentence Boundary Detection in Broadcast News
    Chenglin Xu, Lei Xie, Guangpu Huang, Xiong Xiao, Eng Siong Chng and Haizhou Li
  Thu-O-42-3 Variable Span Disfluency Detection in ASR Transcripts
    Rahul Gupta, Sankaranarayanan Ananthakrishnan, Zhaojun Yang and Shrikanth Narayanan
  Thu-O-42-4 A CRF-based Approach to Automatic Disfluency Detection in a French Call-Centre Corpus
    Camille Dutrey, Chloé Clavel, Sophie Rosset, Ioana Vasilescu and Martine Adda-Decker
  Thu-O-42-5 Multi-pass sentence-end detection of lecture speech
    Madina Hasan, Rama Sanand Doddipatla and Thomas Hain
  Thu-O-42-6 Multi-Domain Disfluency and Repair Detection
    Victoria Zayats, Mari Ostendorf and Hannaneh Hajishirzi

Poster Session 28 (Thu-P-28): Speech Synthesis III
Thursday 18 September 2014 13:30-15:30, Max Atria Gallery
     
  Thu-P-28-1 Enabling Controllability for Continuous Expression Space
    Langzhou Chen and Norbert Braunschweiler
  Thu-P-28-2 Analysis of Spectral Enhancement Using Global Variance in HMM-Based Speech Synthesis
    Takashi Nose and Akinori Ito
  Thu-P-28-3 Intelligibility analysis of fast synthesized speech
    Cassia Valentini-Botinhao, Markus Toman, Michael Pucher, Dietmar Schabus and Junichi Yamagishi
  Thu-P-28-4 Speech synthesis reactive to dynamic noise environmental conditions
    Susana Palmaz López-Peláez and Robert Clark
  Thu-P-28-5 Partial Representations Improve the Prosody of Incremental Speech Synthesis
    Timo Baumann
  Thu-P-28-6 Dialogue Context Sensitive Speech Synthesis using Factorized Decision Trees
    Pirros Tsiakoulis, Catherine Breslin, Milica Gasic, Matthew Henderson, Dongho Kim and Steve Young
  Thu-P-28-7 Concept-to-Speech Generation by Integrating Syntagmatic Features into HMM-Based Speech Synthesis
    Xin Wang, Zhen-Hua Ling and Li-Rong Dai
  Thu-P-28-8 On the role of missing data imputation and NMF feature enhancement in building synthetic voices using reverberant speech
    Dhananjaya Gowda, Heikki Kallasjoki, Kalle Palomaki, Reima Karhila, Mikko Kurimo, Cristian Contan and Mircea Giurgiu
  Thu-P-28-9 Objective Evaluation of HMM-based Speech Synthesis System Using Kullback-Leibler Divergence
    Cong-Thanh Do, Marc Evrard, Adrien Leman, Christophe d'Alessandro, Albert Rilliard and Jean-Luc Crebouw
  Thu-P-28-10 Speech intonation for TTS: Study on evaluation methodology
    Javier Latorre, Kayoko Yanagisawa, Vincent Wan, BalaKrishna Kolluru and Mark J.F. Gales

Poster Session 29 (Thu-P-29): Adaptation
Thursday 18 September 2014 13:30-15:30, Max Atria Gallery
     
  Thu-P-29-1 Speaker age estimation for elderly speech recognition in European Portuguese
    Thomas Pellegrini, Vahid Hedayati, Isabel Trancoso, Annika Hämäläinen and Miguel Sales Dias
  Thu-P-29-2 Unsupervised Model Selection for Recognition of Regional Accented Speech
    Maryam Najafian, Andrea DeMarco, Stephen Cox and Martin Russell
  Thu-P-29-3 Speaker Adaptation Based on Sparse and Low-rank Eigenphone Matrix Estimation
    Wen-Lin Zhang, Dan Qu, Wei-Qiang Zhang and Bi-Cheng Li
  Thu-P-29-4 Multi-Accent Deep Neural Network Acoustic Model with Accent Specific Top Layer Using the KLD-Regularized Model Adaptation
    Yan Huang, Dong Yu, Chaojun Liu and Yifan Gong
  Thu-P-29-5 A Low Complexity Model Adaptation Approach involving Sparse Coding over Multiple Dictionaries
    S Shahnawazuddin and Rohit Sinha
  Thu-P-29-6 Effect of frequency weighting on MLP-based speaker canonicalization
    Yuichi Kubota, Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi and Tsuneo Nitta
  Thu-P-29-7 Feature Space Maximum A Posteriori Linear Regression for Adaptation of Deep Neural Networks
    Zhen Huang, Jinyu Li, Sabato Marco Siniscalchi, I-Fan Chen, Chao Weng and Chin-Hui Lee
  Thu-P-29-8 Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processing
    Natalia Tomashenko and Yuri Khokhlov
  Thu-P-29-9 BUT 2014 Babel System: Analysis of adaptation in NN based systems
    Martin Karafiat, Frantisek Grezl, Karel Vesely, Mirko Hannemann, Igor Szoke and Honza Cernocky
  Thu-P-29-10 Speaker adaptation of DNN-based ASR with i-vectors: Does it actually adapt models to speakers?
    Mickael Rouvier and Benoit Favre

Poster Session 30 (Thu-P-30): Language Recognition
Thursday 18 September 2014 13:30-15:30, Max Atria Gallery
     
  Thu-P-30-1 Task-aware Deep Bottleneck Features for Spoken Language Identification
    Bing Jiang, Yan Song, Si Wei, Ian McLoughlin and Li-Rong Dai
  Thu-P-30-2 Virtual Example for Phonotactic Language Recognition
    Rong Tong, Bin Ma and Haizhou Li
  Thu-P-30-3 Phonotactic Language Identification Based on Time-Gap-Weighted Lattice Kernels
    Wei-wei Liu, Wei-Qiang Zhang and Jia Liu
  Thu-P-30-4 UBM Fused Total Variability Modeling for Language Identification
    Maarten Van Segbroeck, Ruchir Travadi and Shrikanth Narayanan
  Thu-P-30-5 On the Complementarity of Short-Time Fourier Analysis Windows of Different Lengths for Improved Language Recognition
    Mireia Diez, Mikel Penagarikano, German Bordel, Amparo Varona and Luis Javier Rodriguez-Fuentes
  Thu-P-30-6 Modified-prior i-Vector Estimation for Language Identification of Short Duration Utterances
    Ruchir Travadi, Maarten Van Segbroeck and Shrikanth Narayanan
  Thu-P-30-7 Language Recognition using Phonotactic-based Shifted Delta Coefficients and Multiple Phone Recognizers
    Luis Fernando D’Haro, Ricardo Cordoba, Christian Salamea and Javier Ferreiros
  Thu-P-30-8 PLLR Features in Language Recognition System for RATS
    Oldrich Plchot, Mireia Diez, Mehdi Soufifar and Lukas Burget
  Thu-P-30-9 Language Identification of Code Switching Sentences and Multilingual Sentences of Under-Resourced Languages by Using Multi Structural Word Information
    Yin-Lai Yeong and Tien-Ping Tan





Copyright © 2013-2015 Chinese and Oriental Languages Information Processing Society
Conference managed by Meeting Matters International