SLS :: Publications

MIT Computer Science and Artificial Intelligence Laboratory

SLS PUBLICATIONS

Papers (1998 - present)

Many of our papers are available below in Adobe Acrobat (PDF) format and, in some cases, gzip-compressed PostScript format.

Stephanie Seneff's health-related publications are listed on her Computer Science and Artificial Intelligence Laboratory home page.

2025

H. Chang, S. Bhati, J. Glass, and A. Liu, "USAD: Universal Speech and Audio Representation via Distillation," Proc. ASRU, Honolulu, Hawaii, USA, December 2025. (PDF)

A. Rouditchenko, S. Bhati, E. Araujo, S. Thomas, H. Kuehne, R. Feris, and J. Glass, "Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?," Proc. ASRU, Honolulu, Hawaii, USA, December 2025. (PDF)

L. Wang, S. Bhati, C. Karjadi, R. Au, and J. Glass, "Recognizing Dementia from Neuropsychological Tests with State Space Models," Proc. ASRU, Honolulu, Hawaii, USA, December 2025. (PDF)

K. Li, Y. Li, T. Zhang, H. Luo, X. Wu, J. Glass, and H. Meng, "RAG-Zeval: Enhancing RAG Responses Evaluator through End-to-End Reasoning and Ranking-Based Reinforcement Learning," Proc. EMNLP, pp. 24925-24943, Suzhou, China, November 2025. (PDF)

A. Ben-Kish, I. Zimerman, M. J. Mirza, L. Wolf, J. Glass, L. Karlinsky, and R. Giryes, "Overflow Prevention Enhances Long-Context Recurrent LLMs," Proc. COLM, Montreal, Canada, October 2025. (PDF)

H. Chang, H. Gong, C. Wang, J. Glass, and Y. Chung, "DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models," Proc. Interspeech, Rotterdam, The Netherlands, August 2025. (PDF)

W. Fang, Y. Zhang, K. Qian, J. Glass, and Y. Zhu, "PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play," Proc. ACL, pp. 26274-26290, Vienna, Austria, July 2025. (PDF)

Y. Chuang, B. Cohen-Wang, S. Shen, Z. Wu, H. Xu, X. Lin, J. Glass, S. Li, and W. Yih, "SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models," Proc. ICML, Vancouver, Canada, July 2025. (PDF)

A. Rouditchenko, S. Thomas, H. Kuehne, R. Feris, and J. Glass, "mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition," IEEE Signal Processing Letters, vol. 32, pp. 2144-2148, 2025, doi: 10.1109/LSP.2025.3569210. (PDF)

E. Araujo, A. Rouditchenko, Y. Gong, S. Bhati, S. Thomas, B. Kingsbury, L. Karlinsky, R. Feris, J. Glass, and H. Kuehne, "CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment," Proc. CVPR, pp. 18794-18803, Nashville, Tennessee, USA, June 2025. (PDF)

P. Schroeder, N. Morgan, H. Luo, and J. Glass, "THREAD: Thinking Deeper with Recursive Spawning," Proc. NAACL-HLT, pp. 8418-8442, Albuquerque, New Mexico, USA, April 2025. (PDF)

2024

Y. Chuang, L. Qiu, C. Hsieh, R. Krishna, Y. Kim, and J. Glass, "Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps," Proc. EMNLP, pp. 1419-1436, Miami, Florida, USA, November 2024. (PDF)

A. Rouditchenko, Y. Gong, S. Thomas, L. Karlinsky, H. Kuehne, R. Feris, and J. Glass, "Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation," Proc. Interspeech, pp. 2420-2424, Kos Island, Greece, September 2024. (PDF)

H. Chang and J. Glass, "R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces," Proc. NAACL, pp. 642-662, Mexico City, Mexico, June 2024. (PDF)

Y. Chuang, Y. Xie, H. Luo, J. Glass, and P. He, "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models," Proc. ICLR, Vienna, Austria, May 2024. (PDF)

Z. Hong, I. Shenfeld, T. Wang, Y. Chuang, A. Pareja, J. Glass, A. Srivastava, and P. Agrawal, "Curiosity-driven Red-teaming for Large Language Models," Proc. ICLR, Vienna, Austria, May 2024. (PDF)

Y. Gong, H. Luo, A. Liu, L. Karlinsky, and J. Glass, "Listen, Think, and Understand," Proc. ICLR, Vienna, Austria, May 2024. (PDF)

H. Chang, N. Dong, R. Mavlyutov, S. Popuri, and Y. Chung, "COLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders," Proc. ICASSP, pp. 10801-10805, Seoul, Korea, April 2024. (PDF)

A. Liu, S. Yeh, and J. Glass, "Revisiting Self-Supervised Learning of Speech Representation from a Mutual Information Perspective," Proc. ICASSP, pp. 12051-12055, Seoul, Korea, April 2024. (PDF)

S. Yang, H. Chang, Z. Huang, A. Liu, C. Lai, H. Wu, J. Shi, X. Chang, H. Tsai, W. Huang, T. Feng, P. Chi, Y. Lin, Y. Chuang, T. Huang, W. Tseng, K. Lakhotia, A. Mohamed, S. Li, S. Watanabe, and H. Lee, "A Large-Scale Evaluation of Speech Foundation Models," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 2884-2899, April 2024. (PDF)

L. Wang, M. Hasegawa-Johnson, and C. Yoo, "Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching," Proc. ICASSP, pp. 10936-10940, Seoul, Korea, April 2024. (PDF)

W. Fang, Y. Chuang, and J. Glass, "Joint Inference of Retrieval and Generation for Passage Re-ranking," Proc. EACL, pp. 2289-2298, St. Julian's, Malta, March 2024. (PDF)

2023

Y. Gong, A. Liu, H. Luo, L. Karlinsky, and J. Glass, "Joint Audio and Speech Understanding," Proc. ASRU, Taipei, Taiwan, December 2023. (PDF)

C. Lai, F. Shi, P. Peng, Y. Kim, K. Gimpel, S. Chang, Y. Chuang, S. Bhati, D. Cox, D. Harwath, Y. Zhang, K. Livescu, and J. Glass, "Audio-Visual Neural Syntax Acquisition," Proc. ASRU, Taipei, Taiwan, December 2023. (PDF)

C. Lai, Z. Lu, L. Cao, and R. Pang, "Training Speech Recognition Models to Follow Instructions," Proc. NeurIPS Workshop on Instruction Tuning and Instruction Following, New Orleans, USA, December 2023. (PDF)

A. Liu, H. Chang, M. Auli, W. Hsu, and J. Glass, "DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning," Proc. NeurIPS, New Orleans, USA, December 2023. (PDF)

H. Luo, T. Zhang, Y. Chuang, Y. Gong, Y. Kim, X. Wu, H. Meng, and J. Glass, "Search Augmented Instruction Learning," Proc. EMNLP, pp. 3717-3729, Singapore, December 2023. (PDF)

H. Chang, A. Liu, and J. Glass, "Self-Supervised Fine-Tuning for Improved Content Representations by Speaker-Invariant Clustering," Proc. Interspeech, pp. 2983-2987, Dublin, Ireland, August 2023. (PDF)

Y. Gong, S. Khurana, L. Karlinsky, and J. Glass, "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers," Proc. Interspeech, pp. 2798-2802, Dublin, Ireland, August 2023. (PDF)

A. Rouditchenko, S. Khurana, S. Thomas, R. Feris, L. Karlinsky, H. Kuehne, D. Harwath, B. Kingsbury, and J. Glass, "Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages," Proc. Interspeech, pp. 2268-2272, Dublin, Ireland, August 2023. (PDF)

Y. Chuang, W. Fang, S. Li, W. Yih, and J. Glass, "Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering," Proc. ACL, pp. 12131-12147, Toronto, Canada, July 2023. (PDF)

N. Dawalatabad, S. Khurana, A. Laurent, and J. Glass, "On Unsupervised Uncertainty-Driven Speech Pseudo-Label Filtering and Model Calibration," Proc. ICASSP, Rhodes, Greece, June 2023. (PDF)

A. Rouditchenko, Y. Chuang, N. Shvetsova, S. Thomas, R. Feris, B. Kingsbury, L. Karlinsky, D. Harwath, H. Kuehne, and J. Glass, "C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval," Proc. ICASSP, Rhodes, Greece, June 2023. (PDF)

Y. Gong, A. Rouditchenko, A. Liu, D. Harwath, L. Karlinsky, H. Kuehne, and J. Glass, "Contrastive Audio-Visual Masked Autoencoder," Proc. ICLR, Kigali, Rwanda, May 2023. (PDF)

2022

N. Dawalatabad, Y. Gong, S. Khurana, R. Au, and J. Glass, "Detecting Dementia from Long Neuropsychological Interviews," Proc. EMNLP, pp. 5299-5312, Abu Dhabi, December 2022. (PDF)

Y. Gong, A. Liu, A. Rouditchenko, and J. Glass, "UAVM: Towards Unifying Audio and Visual Models," IEEE Signal Processing Letters, vol. 29, pp. 2437-2441, 2022. (PDF)

Y. Fu, Y. Zhang, K. Qian, Z. Ye, Z. Yu, C. Lai, and Y. Lin, "Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing," Proc. NeurIPS, New Orleans, Louisiana, USA, November 2022. (PDF)

A. Liu, C. Lai, W. Hsu, M. Auli, A. Baevski, and J. Glass, "Simple and Effective Unsupervised Speech Synthesis," Proc. Interspeech, pp. 843-847, Incheon, Korea, September 2022. (PDF)

K. Qian, Y. Zhang, H. Gao, J. Ni, C. Lai, D. Cox, M. Hasegawa-Johnson, and S. Chang, "ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers," Proc. ICML, Baltimore, Maryland, USA, July 2022. (PDF)

Y. Chuang, R. Dangovski, H. Luo, Y. Zhang, S. Chang, M. Soljacic, S. Li, S. Yih, Y. Kim, and J. Glass, "DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings," Proc. NAACL-HLT, pp. 4207-4218, Seattle, Washington, USA, July 2022. (PDF)

H. Tsai, H. Chang, W. Huang, Z. Huang, K. Lakhotia, S. Yang, S. Dong, A. Liu, C. Lai, J. Shi, X. Chang, P. Hall, H. Chen, S. Li, S. Watanabe, A. Mohamed, and H. Lee, "SUPERB-SG: Enhanced Speech Processing Universal PERformance Benchmark for Semantic and Generative Capabilities," Proc. of the 60th Annual Meeting of the Association for Computational Linguistics, pp. 8479-8492, Dublin, Ireland, May 2022. (PDF)

A. Liu, S. Jin, C. Lai, A. Rouditchenko, A. Oliva, and J. Glass, "Cross-Modal Discrete Representation Learning," Proc. of the 60th Annual Meeting of the Association for Computational Linguistics, pp. 3013-3035, Dublin, Ireland, May 2022. (PDF)

C. Lai, E. Cooper, Y. Zhang, S. Chang, K. Qian, Y. Liao, Y. Chuang, A. Liu, J. Yamagishi, D. Cox, and J. Glass, "On the Interplay between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis," Proc. ICASSP, pp. 8447-8451, Singapore, May 2022. (PDF)

R. Haulcy, K. Placek, B. Tracey, A. Vogel, and J. Glass, "Repetition Assessment for Speech and Language Disorders: A Study of the Logopenic Variant of Primary Progressive Aphasia," Proc. ICASSP, pp. 6932-6936, Singapore, May 2022. (PDF)

Y. Gong, J. Yu, and J. Glass, "Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition," Proc. ICASSP, pp. 151-155, Singapore, May 2022. (PDF)

Y. Gong, Z. Chen, I. Chu, P. Chang, and J. Glass, "Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment," Proc. ICASSP, pp. 7262-7266, Singapore, May 2022. (PDF)

Y. Gong, C. Lai, Y. Chung, and J. Glass, "SSAST: Self-Supervised Audio Spectrogram Transformer," Proc. 36th AAAI Conference on Artificial Intelligence (AAAI-22), pp. 10699-10709, February 2022. (PDF)

2021

C. Lai, Y. Zhang, A. Liu, S. Chang, Y. Liao, Y. Chuang, K. Qian, S. Khurana, D. Cox, and J. Glass, "PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition," Proc. of 35th Conference on Neural Information Processing Systems (NeurIPS), December 2021. (PDF)

Y. Gong, Y. Chung, and J. Glass, "PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3292-3306, October 2021. (PDF)

S. Yang, P. Chi, Y. Chuang, C. Lai, K. Lakhotia, Y. Lin, A. Liu, J. Shi, X. Chang, G. Lin, T. Huang, W. Tseng, K. Lee, D. Liu, Z. Huang, S. Dong, S. Li, S. Watanabe, A. Mohamed, and H. Lee, "SUPERB: Speech Processing Universal PERformance Benchmark," Proc. Interspeech, pp. 1194-1198, September 2021. (PDF)

R. Haulcy and J. Glass, "CLAC: A Speech Corpus of Healthy English Speakers," Proc. Interspeech, pp. 2966-2970, September 2021. (PDF)

Y. Gong, Y. Chung, and J. Glass, "AST: Audio Spectrogram Transformer," Proc. Interspeech, pp. 571-575, September 2021. (PDF)

A. Liu, Y. Chung, and J. Glass, "Non-autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies," Proc. Interspeech, pp. 3730-3734, September 2021. (PDF)

I. Palmer, A. Rouditchenko, A. Barbu, B. Katz, and J. Glass, "Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset," Proc. Interspeech, pp. 3650-3654, September 2021. (PDF)

A. Rouditchenko, A. Boggust, D. Harwath, S. Thomas, H. Kuehne, B. Chen, R. Panda, R. Feris, B. Kingsbury, M. Picheny, and J. Glass, "Cascaded Multilingual Audio-Visual Learning from Videos," Proc. Interspeech, pp. 3006-3010, September 2021. (PDF)

A. Rouditchenko, A. Boggust, D. Harwath, B. Chen, D. Joshi, S. Thomas, K. Audhkhasi, H. Kuehne, R. Panda, R. Feris, B. Kingsbury, M. Picheny, A. Torralba, and J. Glass, "AVLnet: Learning Audio-Visual Language Representations from Instructional Videos," Proc. Interspeech, pp. 1584-1588, September 2021. (PDF)

W. Hsu, D. Harwath, T. Miller, C. Song, and J. Glass, "Text-Free Image-to-Speech Synthesis Using Learned Segmental Units," Proc. ACL-IJCNLP, pp. 5284-5300, August 2021. (PDF)

Y. Chung, C. Zhu, and M. Zeng, "SPLAT: Speech-language Joint Pre-training for Spoken Language Understanding," Proc. NAACL-HLT, pp. 1897-1907, June 2021. (PDF)

Y. Chung, Y. Belinkov, and J. Glass, "Similarity Analysis of Self-Supervised Speech Representations," Proc. ICASSP, pp. 3040-3044, June 2021. (PDF)

C. Lai, Y. Chuang, H. Lee, S. Li, and J. Glass, "Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining," Proc. ICASSP, pp. 7468-7472, June 2021. (PDF)

T. He, B. McCann, C. Xiong, and E. Hosseini-Asl, "Joint Energy-based Model Training for Better Calibrated Natural Language Understanding Models," Proc. EACL, pp. 1754-1761, April 2021. (PDF)

T. He, J. Liu, K. Cho, M. Ott, B. Liu, J. Glass, and F. Peng, "Analyzing the Forgetting Problem in the Pretrain-Finetuning of Dialogue Response Models," Proc. EACL, pp. 1121-1133, April 2021. (PDF)

R. Haulcy and J. Glass, "Classifying Alzheimer's Disease Using Audio and Text-Based Representations of Speech," Frontiers in Psychology, 15 January 2021. (PDF)

G. Da San Martino, S. Cresci, A. Barrón-Cedeño, S. Yu, R. Di Pietro, and P. Nakov, "A Survey on Computational Propaganda Detection," Proc. IJCAI, pp. 4826-4832, January 2021. (PDF)

2020

H. Lin, C. Karjadi, T. F. A. Ang, J. Prajakta, C. McManus, T. Alhanai, J. Glass, and R. Au, "Identification of Digital Voice Biomarkers for Cognitive Health," Explore Med 2020; 1: 406-417. doi:10.37349/emed.2020.00028. (PDF)

M. Nadeem, T. He, K. Cho, and J. Glass, "A Systematic Characterization of Sampling Algorithms for Open-Ended Language Generation," Proc. 1st Conference of the Asia-Pacific Chapter of the ACL and the 10th International Joint Conference on Natural Language Processing, pp. 334-346, December 2020. (PDF)

Y. Chung and H. Lin, "Cost-Sensitive Deep Learning with Layer-Wise Cost Estimation," Proc. International Conference on Technologies and Applications of Artificial Intelligence (TAAI), pp. 108-113, December 2020. (PDF)

W. Hsu, D. Harwath, C. Song, and J. Glass, "Text-Free Image-to-Speech Synthesis Using Learned Segmental Units," Proc. NeurIPS Workshop on Self-Supervised Learning for Speech and Audio Processing, December 2020. (PDF)

C. Lai, J. Cao, S. Bodapati, and S. Li, "Towards Semi-Supervised Semantics Understanding from Speech," Proc. NeurIPS Workshop on Self-Supervised Learning for Speech and Audio Processing, December 2020. (PDF)

E. Cooper, C. Lai, Y. Yasuda, and J. Yamagishi, "Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?," Proc. Interspeech, pp. 3979-3983, October 2020. (PDF)

Y. Zhao, H. Li, C. Lai, J. Williams, E. Cooper, and J. Yamagishi, "Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction," Proc. Interspeech, pp. 4417-4421, October 2020. (PDF)

Y. Chung, H. Tang, and J. Glass, "Vector-quantized Autoregressive Predictive Coding," Proc. Interspeech, pp. 3760-3764, October 2020. (PDF)

S. Khurana, A. Laurent, W. Hsu, J. Chorowski, A. Lancucki, R. Marxer, and J. Glass, "A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning," Proc. Interspeech, pp. 3790-3794, October 2020. (PDF)

A. Lancucki, J. Chorowski, G. Sanchez, R. Marxer, N. Chen, H. J. G. A. Dolfing, S. Khurana, T. Alumae, and A. Laurent, "Robust Training of Vector Quantized Bottleneck Models," Proc. International Joint Conference on Neural Networks (IJCNN), July 2020. (PDF)

T. He and J. Glass, "Negative Training for Neural Dialogue Response Generation," Proc. ACL, pp. 2044-2058, July 2020. (PDF)

Y. Chung and J. Glass, "Improved Speech Representations with Multi-Target Autoregressive Predictive Coding," Proc. ACL, pp. 2353-2358, July 2020. (PDF)

G. Da San Martino, S. Shaar, Y. Zhang, S. Yu, A. Barrón-Cedeño, and P. Nakov, "Prta: A System to Support the Analysis of Propaganda Techniques in the News," Proc. ACL, pp. 287-293, July 2020. (PDF)

E. Cooper, C. Lai, Y. Yasuda, F. Fang, X. Wang, N. Chen, and J. Yamagishi, "Zero-Shot Multi-Speaker Text-to-Speech with State-of-the-Art Neural Speaker Embeddings," Proc. ICASSP, pp. 6184-6188, Barcelona, Spain, May 2020. (PDF)

Y. Chung and J. Glass, "Generative Pre-Training for Speech with Autoregressive Predictive Coding," Proc. ICASSP, pp. 3497-3501, Barcelona, Spain, May 2020. (PDF)

D. Harwath, W. Hsu, and J. Glass, "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech," Proc. ICLR, April 2020. (PDF)

J. Zhang, T. He, S. Sra, and A. Jadbabaie, "Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity," Proc. ICLR, April 2020. (PDF)

2019

J. Drexler and J. Glass, "Explicit Alignment of Text and Speech Encodings for Attention-Based End-to-End Speech Recognition," Proc. ASRU, pp. 913-919, Sentosa, Singapore, December 2019. (PDF)

A. Ali, S. Shon, Y. Samih, H. Mubarak, A. Abdelali, J. Glass, S. Renals, and K. Choukri, "The MGB-5 Challenge: Recognition and Dialect Identification of Dialectal Arabic Speech," Proc. ASRU, pp. 1026-1033, Sentosa, Singapore, December 2019. (PDF)

S. Yu, G. Da San Martino, and P. Nakov, "Experiments in Detecting Persuasion Techniques in the News," NeurIPS 2019 Workshop on AI for Social Good (AISG), Vancouver, Canada, December 2019. (PDF)

M. Mohtarami, J. Glass, and P. Nakov, "Contrastive Language Adaptation for Cross-Lingual Stance Detection," Proc. EMNLP, pp. 4441-4451, Hong Kong, China, November 2019. (PDF)

G. Da San Martino, S. Yu, A. Barrón-Cedeño, R. Petrov, and P. Nakov, "Fine-Grained Analysis of Propaganda in News Articles," Proc. EMNLP, pp. 5635-5645, Hong Kong, China, November 2019. (PDF)

F. Grondin and J. Glass, "Fast and Robust 3-D Sound Source Localization with DSVD-PHAT," Proc. International Conference on Intelligent Robots and Systems (IROS), pp. 5352-5357, Macau, China, November 2019. (PDF)

W. Fang, M. Nadeem, M. Mohtarami, and J. Glass, "Neural Multi-Task Learning for Stance Prediction," Proc. of 2nd Workshop on Fact Extraction and Verification (FEVER), pp. 13-19, Hong Kong, November 2019. (PDF)

F. Grondin and J. Glass, "Multiple Sound Localization with SVD-PHAT," Proc. Interspeech, pp. 2698-2702, Graz, Austria, September 2019. (PDF)

L. Ford, H. Tang, F. Grondin, and J. Glass, "A Deep Residual Network for Large-Scale Acoustic Scene Analysis," Proc. Interspeech, pp. 2568-2572, Graz, Austria, September 2019. (PDF)

Y. Belinkov, A. Ali, and J. Glass, "Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition," Proc. Interspeech, pp. 81-85, Graz, Austria, September 2019. (PDF)

H. Luo, M. Mohtarami, J. Glass, K. Krishnamurthy, and B. Richardson, "Integrating Video Retrieval and Moment Detection in a Unified Corpus for Video Question Answering," Proc. Interspeech, pp. 599-603, Graz, Austria, September 2019. (PDF)

E. Azuh, D. Harwath, and J. Glass, "Towards Bilingual Lexicon Discovery From Visually Grounded Speech Audio," Proc. Interspeech, pp. 276-280, Graz, Austria, September 2019. (PDF)

S. Shon, N. Dehak, D. Reynolds, and J. Glass, "MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation," Proc. Interspeech, pp. 356-360, Graz, Austria, September 2019. (PDF)

S. Shon, Y. Lee, and T. Kim, "Large-scale Speaker Retrieval on Random Speaker Variability Subspace," Proc. Interspeech, pp. 2963-2967, Graz, Austria, September 2019. (PDF)

S. Shon, H. Tang, and J. Glass, "VoiceID Loss: Speech Enhancement for Speaker Verification," Proc. Interspeech, pp. 2888-2892, Graz, Austria, September 2019. (PDF)

M. Korpusik, Z. Liu, and J. Glass, "A Comparison of Deep Learning Methods for Language Understanding," Proc. Interspeech, pp. 849-853, Graz, Austria, September 2019. (PDF)

W. Hsu, D. Harwath, and J. Glass, "Transfer Learning from Audio-Visual Grounding to Speech Recognition," Proc. Interspeech, pp. 3243-3246, Graz, Austria, September 2019. (PDF)

Y. Chung, W. Hsu, H. Tang, and J. Glass, "An Unsupervised Autoregressive Model for Speech Representation Learning," Proc. Interspeech, pp. 146-150, Graz, Austria, September 2019. (PDF)

J. Villalba, N. Chen, D. Snyder, D. Garcia-Romero, A. McCree, G. Sell, J. Borgstrom, F. Richardson, S. Shon, F. Grondin, R. Dehak, L. Garcia-Perera, D. Povey, P. Torres-Carrasquillo, S. Khudanpur, and N. Dehak, "State-of-the-art Speaker Recognition for Telephone and Video Speech: the JHU-MIT Submission for NIST SRE18," Proc. Interspeech, pp. 1488-1492, Graz, Austria, September 2019. (PDF)

P. Atanasova, P. Nakov, G. Karadzhov, M. Mohtarami, and G. Da San Martino, "Overview of the CLEF-2019 CheckThat! Lab: Automatic Identification and Verification of Claims. Task 1: Check-Worthiness," Proc. CLEF, Lugano, Switzerland, September 2019. (PDF)

A. Sarkar, Z. Tan, H. Tang, S. Shon, and J. Glass, "Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 8, pp. 1267-1279, August 2019. (PDF)

W. Weng, Y. Chung, and P. Szolovits, "Unsupervised Clinical Language Translation," Proc. KDD, pp. 3121-3131, Anchorage, Alaska, United States, August 2019. (PDF)

M. Korpusik and J. Glass, "Deep Learning for Database Mapping and Asking Clarification Questions in Dialogue Systems," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 8, pp. 1321-1334, August 2019. (PDF)

D. Harwath, A. Recasens, D. Suris, G. Chuang, A. Torralba, and J. Glass, "Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input," International Journal of Computer Vision, August 5, 2019. (PDF)

X. Li, P. Michel, A. Anastasopoulos, Y. Belinkov, N. Durrani, O. Firat, P. Koehn, G. Neubig, J. Pino, and H. Sajjad, "Findings of the First Shared Task on Machine Translation Robustness," Proc. WMT, pp. 91-102, Florence, Italy, August 2019. (PDF)

H. Luo, L. Jiang, Y. Belinkov, and J. Glass, "Improving Neural Language Models by Segmenting, Attending, and Predicting the Future," Proc. ACL, pp. 1483-1493, Florence, Italy, July 2019. (PDF)

A. Boggust, K. Audhkhasi, D. Joshi, D. Harwath, S. Thomas, R. Feris, D. Gutfreund, Y. Zhang, A. Torralba, M. Picheny, and J. Glass, "Grounding Spoken Words in Unlabeled Video," Proc. CVPR, pp. 29-32, Long Beach, California, USA, June 2019. (PDF)

I. Schwartz, S. Yu, T. Hazan, and A. G. Schwing, "Factor Graph Attention," Proc. CVPR, pp. 2039-2048, Long Beach, California, USA, June 2019. (PDF)

A. Saleh, R. Baly, A. Barrón-Cedeño, G. Da San Martino, M. Mohtarami, P. Nakov, and J. Glass, "Team QCRI-MIT at SemEval-2019 Task 4: Propaganda Analysis Meets Hyperpartisan News Detection," Proc. SemEval 2019, pp. 1041-1046, Minneapolis, Minnesota, USA, June 2019. (PDF)

T. Mihaylova, G. Karadzhov, P. Atanasova, R. Baly, M. Mohtarami, and P. Nakov, "SemEval-2019 Task 8: Fact Checking in Community Question Answering Forums," Proc. SemEval 2019, pp. 860-869, Minneapolis, Minnesota, USA, June 2019. (PDF)

R. Baly, G. Karadzhov, A. Saleh, J. Glass, and P. Nakov, "Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media," Proc. NAACL-HLT, pp. 2109-2116, Minneapolis, Minnesota, USA, June 2019. (PDF)

N. Durrani, F. Dalvi, H. Sajjad, Y. Belinkov, and P. Nakov, "One Size Does Not Fit All: Comparing NMT Representations of Different Granularities," Proc. NAACL-HLT, pp. 1504-1516, Minneapolis, Minnesota, USA, June 2019. (PDF)

H. Amiri and M. Mohtarami, "Vector of Locally Aggregated Embeddings for Text Representation," Proc. NAACL-HLT, pp. 1408-1414, Minneapolis, Minnesota, USA, June 2019. (PDF)

M. Nadeem, W. Fang, B. Xu, M. Mohtarami, and J. Glass, "FAKTA: An Automatic End-to-End Fact Checking System," Proc. NAACL-HLT, pp. 78-83, Minneapolis, Minnesota, USA, June 2019. (PDF)

P. Atanasova, P. Nakov, L. Màrquez, A. Barrón-Cedeño, G. Karadzhov, T. Mihaylova, M. Mohtarami, and J. Glass, "Automatic Fact-Checking Using Context and Discourse Information," ACM Journal of Data and Information Quality, Vol. 11, No. 3, Article 12, May 2019. (PDF)

Y. Chung, Y. Wang, W. Hsu, Y. Zhang, and RJ Skerry-Ryan, "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis," Proc. ICASSP, pp. 6940-6944, Brighton, United Kingdom, May 2019. (PDF)

Y. Chung, W. Weng, S. Tong, and J. Glass, "Towards Unsupervised Speech-to-Text Translation," Proc. ICASSP, pp. 7170-7174, Brighton, United Kingdom, May 2019. (PDF)

J. Drexler and J. Glass, "Subword Regularization and Beam Search Decoding for End-to-End Automatic Speech Recognition," Proc. ICASSP, pp. 6266-6270, Brighton, United Kingdom, May 2019. (PDF)

F. Grondin and J. Glass, "SVD-PHAT: A Fast Sound Source Localization Method," Proc. ICASSP, pp. 4140-4144, Brighton, United Kingdom, May 2019. (PDF)

D. Harwath and J. Glass, "Towards Visually Grounded Sub-Word Speech Unit Discovery," Proc. ICASSP, pp. 3017-3021, Brighton, United Kingdom, May 2019. (PDF)

W. Hsu, Y. Zhang, R. J. Weiss, Y. Chung, Y. Wang, Y. Wu, and J. Glass, "Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization," Proc. ICASSP, pp. 5901-5905, Brighton, United Kingdom, May 2019. (PDF)

S. Khurana, S. R. Joty, A. Ali, and J. Glass, "A Factorial Deep Markov Model for Unsupervised Disentangled Representation Learning from Speech," Proc. ICASSP, pp. 6540-6544, Brighton, United Kingdom, May 2019. (PDF)

M. Korpusik and J. Glass, "Dialogue State Tracking with Convolutional Semantic Taggers," Proc. ICASSP, pp. 7220-7224, Brighton, United Kingdom, May 2019. (PDF)

S. Shon, A. Ali, and J. Glass, "Domain Attentive Fusion for End-to-End Dialect Identification with Unknown Target Domain," Proc. ICASSP, pp. 5951-5955, Brighton, United Kingdom, May 2019. (PDF)

S. Shon, T. Oh, and J. Glass, "Noise-Tolerant Audio-Visual Online Person Verification Using an Attention-based Neural Network Fusion," Proc. ICASSP, pp. 3995-3999, Brighton, United Kingdom, May 2019. (PDF)

S. Mun and S. Shon, "Domain Mismatch Robust Acoustic Scene Classification Using Channel Information Conversion," Proc. ICASSP, pp. 845-849, Brighton, United Kingdom, May 2019. (PDF)

A. Bau, Y. Belinkov, H. Sajjad, N. Durrani, F. Dalvi, and J. Glass, "Identifying and Controlling Important Neurons in Neural Machine Translation," Proc. ICLR, New Orleans, Louisiana, USA, May 2019. (PDF)

T. He and J. Glass, "Detecting Egregious Responses in Neural Sequence-to-Sequence Models," Proc. ICLR, New Orleans, Louisiana, USA, May 2019. (PDF)

W. Hsu, Y. Zhang, R. J. Weiss, H. Zen, Y. Wu, Y. Wang, Y. Cao, Y. Jia, Z. Chen, J. Shen, P. Nguyen, and R. Pang, "Hierarchical Generative Modeling for Controllable Speech Synthesis," Proc. ICLR, New Orleans, Louisiana, USA, May 2019. (PDF)

F. Dalvi, N. Durrani, H. Sajjad, Y. Belinkov, A. Bau, and J. Glass, "What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models," Proc. AAAI, Honolulu, Hawaii, USA, January 2019. (PDF)

F. Dalvi, A. Nortonsmith, A. Bau, Y. Belinkov, H. Sajjad, N. Durrani, and J. Glass, "NeuroX: A Toolkit for Analyzing Individual Neurons in Neural Networks," Proc. AAAI, Honolulu, Hawaii, USA, January 2019. (PDF)

M. Korpusik and J. Glass, "Convolutional Neural Encoder for the 7th Dialogue System Technology Challenge," Proc. AAAI Dialog System Technology Challenges (DSTC7) Workshop, Honolulu, Hawaii, USA, January 2019. (PDF)

Y. Belinkov and J. Glass, "Analysis Methods in Neural Language Processing: A Survey," Transactions of the Association for Computational Linguistics (TACL), 2019. (PDF)

Y. Belinkov, A. Magidow, A. Barrón-Cedeño, A. Shmidman, and M. Romanov, "Studying the History of the Arabic Language: Language Technology and a Large-Scale Historical Corpus," Language Resources and Evaluation, 2019. (PDF)

S. Romeo, G. Da San Martino, Y. Belinkov, A. Barrón-Cedeño, M. Eldesouki, K. Darwish, H. Mubarak, J. Glass, and A. Moschitti, "Language Processing and Learning Models for Community Question Answering in Arabic," Information Processing and Management 56 (2019), pp. 274-290. (PDF)

2018

J. Villalba, N. Chen, D. Snyder, D. Garcia-Romero, A. McCree, G. Sell, J. Borgstrom, F. Richardson, S. Shon, F. Grondin, R. Dehak, L. P. Garcia-Perera, P. A. Torres-Carrasquillo, and N. Dehak, "The JHU-MIT System Description for NIST SRE18," Proc. NIST Speaker Recognition Evaluation Workshop, Athens, Greece, December 2018. (PDF)

J. Drexler and J. Glass, "Combining End-to-End and Adversarial Training for Low-Resource Speech Recognition," Proc. SLT, pp. 361-368, Athens, Greece, December 2018. (PDF)

M. Korpusik and J. Glass, "Convolutional Neural Networks for Dialogue State Tracking without Pre-Trained Word Vectors or Semantic Dictionaries," Proc. SLT, pp. 884-891, Athens, Greece, December 2018. (PDF)

S. Shon, W. Hsu, and J. Glass, "Unsupervised Representation Learning of Speech for Dialect Identification," Proc. SLT, pp. 105-111, Athens, Greece, December 2018. (PDF)

S. Shon, H. Tang, and J. Glass, "Frame-Level Speaker Embeddings for Text-Independent Speaker Recognition and Analysis of End-to-End Model," Proc. SLT, pp. 1007-1013, Athens, Greece, December 2018. (PDF)

H. Tang and J. Glass, "On Training Recurrent Networks with Truncated Backpropagation Through Time in Speech Recognition," Proc. SLT, pp. 48-55, Athens, Greece, December 2018. (PDF)

W. Hsu, Y. Zhang, R. Weiss, Y. Chung, Y. Wang, Y. Wu, and J. Glass, "Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization," Proc. NeurIPS Workshop on Interpretability and Robustness in Audio, Speech, and Language (IRASL), Montreal, Canada, December 2018. (PDF)

Y. Chung, W. Weng, S. Tong, and J. Glass, "Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces," Proc. NeurIPS, Montreal, Canada, December 2018. (PDF)

B. Xu, M. Mohtarami, and J. Glass, "Adversarial Domain Adaptation for Stance Detection," Proc. NeurIPS, Montreal, Canada, December 2018. (PDF)

H. Luo and J. Glass, "Learning Word Representations with Cross-Sentence Dependency for End-to-End Co-reference Resolution," Proc. EMNLP, pp. 4829-4833, Brussels, Belgium, November 2018. (PDF)

R. Baly, G. Karadzhov, D. Alexandrov, J. Glass, and P. Nakov, "Predicting Factuality of Reporting and Bias of News Media Sources," Proc. EMNLP, pp. 3528-3539, Brussels, Belgium, November 2018. (PDF)

D. Harwath, A. Recasens, D. Suris, G. Chuang, A. Torralba, and J. Glass, "Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input," Proc. ECCV, pp. 659-677, Munich, Germany, September 2018. (PDF)

W. Hsu and J. Glass, "Scalable Factorized Hierarchical Variational Autoencoder Training," Proc. Interspeech, pp. 1462-1466, Hyderabad, India, September 2018. (PDF)

T. Alhanai, M. M. Ghassemi, and J. Glass, "Detecting Depression with Audio/Text Sequence Modeling of Interviews," Proc. Interspeech, pp. 1716-1720, Hyderabad, India, September 2018. (PDF) (best student paper award)

Y. Chung and J. Glass, "Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech," Proc. Interspeech, pp. 811-815, Hyderabad, India, September 2018. (PDF)

H. Tang, W. Hsu, F. Grondin, and J. Glass, "A Study of Enhancement, Augmentation, and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition," Proc. Interspeech, Hyderabad, India, September 2018. (PDF)

W. Hsu, H. Tang, and J. Glass, "Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition," Proc. Interspeech, Hyderabad, India, September 2018. (PDF)

M. Zampieri, S. Malmasi, P. Nakov, A. Ali, S. Shon, J. Glass, Y. Scherrer, T. Samardžić, N. Ljubešić, J. Tiedemann, C. van der Lee, S. Grondelaers, N. Oostdijk, D. Speelman, A. van den Bosch, R. Kumar, B. Lahiri, and M. Jain, "Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign," Proc. of Fifth Workshop on NLP for Similar Languages, Varieties, and Dialects, pp. 1-17, Santa Fe, New Mexico, USA, August 2018. (PDF)

S. Shon, A. Ali, and J. Glass, "Convolutional Neural Networks and Language Embeddings for End-to-End Dialect Recognition," Proc. Odyssey, pp. 98-104, Les Sables D'Olonne, France, June 2018. (PDF)

M. Mohtarami, R. Baly, J. Glass, P. Nakov, L. Màrquez, and A. Moschitti, "Automatic Stance Detection Using End-to-End Memory Networks," Proc. NAACL, pp. 767-776, New Orleans, Louisiana, USA, June 2018. (PDF)

R. Baly, M. Mohtarami, J. Glass, L. Màrquez, A. Moschitti, and P. Nakov, "Integrating Stance Detection and Fact Checking in a Unified Corpus," Proc. NAACL, New Orleans, Louisiana, USA, June 2018. (PDF)

T. Alhanai, R. Au, and J. Glass, "Role-specific Language Models for Processing Neuropsychological Exams," Proc. NAACL, New Orleans, Louisiana, USA, June 2018. (PDF)

Y. Chung, H. Lee, and J. Glass, "Supervised and Unsupervised Transfer Learning for Question Answering," Proc. NAACL, New Orleans, Louisiana, USA, June 2018. (PDF)

D. Harwath, G. Chuang, and J. Glass, "Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech," Proc. ICASSP, pp. 4969-4973, Calgary, Alberta, Canada, April 2018. (PDF)

M. Korpusik and J. Glass, "Convolutional Neural Networks and Multitask Strategies for Semantic Mapping of Natural Language Input to a Structured Database," Proc. ICASSP, pp. 6174-6177, Calgary, Alberta, Canada, April 2018. (PDF)

W. Hsu and J. Glass, "Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition," Proc. ICASSP, pp. 5614-5618, Calgary, Alberta, Canada, April 2018. (PDF)

M. Najafian, S. Khurana, S. Shon, A. Ali, and J. Glass, "Exploiting Convolutional Neural Networks for Phonotactic based Dialect Identification," Proc. ICASSP, pp. 5174-5175, Calgary, Alberta, Canada, April 2018. (PDF)

S. Koppula, J. Glass, and A. Chandrakasan, "Energy-Efficient Speaker Identification with Low-Precision Networks," Proc. ICASSP, pp. 2246-2250, Calgary, Alberta, Canada, April 2018. (PDF)

S. Koppula, K. C. Sim, and K. Chin, "Understanding Recurrent Neural State Using Memory Signatures," Proc. ICASSP, pp. 2396-2400, Calgary, Alberta, Canada, April 2018. (PDF)

T. Mihaylova, P. Nakov, L. Màrquez, A. Barrón-Cedeño, M. Mohtarami, G. Karadzhov, and J. Glass, "Fact Checking in Community Forums," Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, February 2018. (PDF)

M. Price, J. Glass, and A. P. Chandrakasan, "A Low-Power Speech Recognizer and Voice Activity Detector Using Deep Neural Networks," IEEE Journal of Solid-State Circuits, Vol. 53, No. 1, pp. 66-75, January 2018. (PDF)

2017

T. Alhanai, R. Au, and J. Glass, "Spoken Language Biomarkers for Detecting Cognitive Impairment," Proc. IEEE ASRU, pp. 409-416, Okinawa, Japan, December 2017. (PDF)

S. Shon, A. Ali, and J. Glass, "MIT-QCRI Arabic Dialect Identification System for the 2017 Multi-Genre Broadcast Challenge," Proc. IEEE ASRU, pp. 374-380, Okinawa, Japan, December 2017. (PDF)

W. Hsu, Y. Zhang, and J. Glass, "Unsupervised Domain Adaptation for Robust Speech Recognition via Variational Autoencoder-based Data Augmentation," Proc. IEEE ASRU, pp. 16-23, Okinawa, Japan, December 2017. (PDF)

M. Najafian, W. Hsu, A. Ali, and J. Glass, "Automatic Speech Recognition of Arabic Multi-Genre Broadcast Media," Proc. IEEE ASRU, pp. 353-359, Okinawa, Japan, December 2017. (PDF)

K. Leidal, D. Harwath, and J. Glass, "Learning Modality-Invariant Representations for Speech and Images," Proc. IEEE ASRU, pp. 424-429, Okinawa, Japan, December 2017. (PDF)

Y. Belinkov and J. Glass, "Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems," Proc. NIPS, pp. 1-13, Long Beach, California, USA, December 2017. (PDF)

W. Hsu, Y. Zhang, and J. Glass, "Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data," Proc. NIPS, Long Beach, California, USA, December 2017. (PDF)

Y. Chung and J. Glass, "Learning Word Embeddings from Speech," Proc. NIPS Workshop on Machine Learning for Audio Signal Processing, Long Beach, California, USA, December 2017. (PDF)

Y. Belinkov, L. Màrquez, H. Sajjad, N. Durrani, F. Dalvi, and J. Glass, "Evaluating Layers of Representation in Neural Machine Translation on Part-of-Speech and Semantic Tagging Tasks," Proc. IJCNLP, pp. 1-10, Taipei, Taiwan, November 2017. (PDF)

F. Dalvi, N. Durrani, H. Sajjad, Y. Belinkov, and S. Vogel, "Understanding and Improving Morphological Learning in the Neural Machine Translation Decoder," Proc. IJCNLP, pp. 142-151, Taipei, Taiwan, November 2017. (PDF)

W. Hsu, Y. Zhang, and J. Glass, "Learning Latent Representations for Speech Generation and Transformation," Proc. Interspeech, pp. 1273-1277, Stockholm, Sweden, August 2017. (PDF)

S. Khurana, M. Najafian, A. Ali, T. Al Hanai, Y. Belinkov, and J. Glass, "QMDIS: QCRI-MIT Advanced Dialect Identification System," Proc. Interspeech, pp. 2591-2595, Stockholm, Sweden, August 2017. (PDF)

X. Feng, B. Richardson, S. Amman, and J. Glass, "An Environmental Feature Representation for Robust Speech Recognition and for Environment Identification," Proc. Interspeech, pp. 3078-3082, Stockholm, Sweden, August 2017. (PDF)

M. Korpusik, Z. Collins, and J. Glass, "Character-based Embedding Models and Reranking Strategies for Understanding Natural Language Meal Descriptions," Proc. Interspeech, pp. 3320-3324, Stockholm, Sweden, August 2017. (PDF)

S. Shon, S. Mun, W. Kim, and H. Ko, "Autoencoder based Domain Adaptation for Speaker Recognition under Insufficient Channel Information," Proc. Interspeech, pp. 1014-1018, Stockholm, Sweden, August 2017. (PDF)

S. Shon, S. Mun, and H. Ko, "Recursive Whitening Transformation for Speaker Recognition on Language Mismatched Condition," Proc. Interspeech, pp. 2869-2873, Stockholm, Sweden, August 2017. (PDF)

J. Drexler and J. Glass, "Analysis of Audio-Visual Features for Unsupervised Speech Recognition," Proc. Grounded Language Understanding Workshop, Stockholm, Sweden, August 2017. (PDF)

M. Korpusik and J. Glass, "Spoken Language Understanding for a Nutrition Dialogue System," IEEE/ACM Transactions on Audio, Speech, and Language Processing, July 2017, Vol. 25, No. 7, pp. 1450-1461. (PDF)

D. Harwath and J. Glass, "Learning Word-like Units from Joint Audio-visual Analysis," Proc. ACL, pp. 506-517, Vancouver, Canada, July 2017. (PDF)

Y. Belinkov, N. Durrani, F. Dalvi, H. Sajjad, and J. Glass, "What do Neural Machine Translation Models Learn about Morphology?," Proc. ACL, pp. 861-872, Vancouver, Canada, July 2017. (PDF)

H. Sajjad, F. Dalvi, A. Abdelali, Y. Belinkov, and S. Vogel, "Challenging Language-Dependent Segmentation for Arabic: An Application to Machine Translation and Part-of-Speech Tagging," Proc. ACL, pp. 601-607, Vancouver, Canada, July 2017. (PDF)

Y. Adi, E. Kermany, Y. Belinkov, O. Lavi, and Y. Goldberg, "Fine-Grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks," Proc. ICLR, Toulon, France, April 2017. (PDF)

M. Korpusik, Z. Collins, and J. Glass, "Semantic Mapping of Natural Language Input to Database Entries via Convolutional Neural Networks," Proc. ICASSP, pp. 5685-5689, New Orleans, Louisiana, USA, March 2017. (PDF)

M. Najafian and J. H. L. Hansen, "Environment Aware Speaker Diarization for Moving Targets Using Parallel DNN-based Recognizers," Proc. ICASSP, pp. 5450-5454, New Orleans, Louisiana, USA, March 2017. (PDF)

M. Price, J. Glass, and A. Chandrakasan, "A Scalable Speech Recognizer with Deep-Neural-Network Acoustic Models and Voice-Activated Power Gating," Proc. ISSCC, San Francisco, February 2017. (PDF)

T. AlHanai and M. Ghassemi, "Predicting Latent Narrative Mood using Audio and Physiologic Data," Proc. AAAI, San Francisco, February 2017. (PDF)

2016

A. Ali, P. Bell, J. Glass, Y. Messaoui, H. Mubarak, S. Renals, and Y. Zhang, "The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition," Proc. Spoken Language Technologies Workshop (SLT), San Diego, California, USA, December 2016. (PDF)

T. AlHanai, W. Hsu, and J. Glass, "Development of the MIT ASR System for the 2016 Arabic Multi-Genre Broadcast Challenge," Proc. Spoken Language Technologies Workshop (SLT), San Diego, California, USA, December 2016. (PDF)

F. Sun, D. Harwath, and J. Glass, "Look, Listen, and Decode: Multimodal Speech Recognition with Images," Proc. Spoken Language Technologies Workshop (SLT), San Diego, California, USA, December 2016. (PDF)

W. Hsu, Y. Zhang, and J. Glass, "A Prioritized Grid Long Short-Term Memory RNN for Speech Recognition," Proc. Spoken Language Technologies Workshop (SLT), San Diego, California, USA, December 2016. (PDF)

Y. Belinkov and J. Glass, "A Character-level Convolutional Neural Network for Distinguishing Similar Languages and Dialects," Proc. Coling VarDial Workshop, pp. 145-152, Osaka, Japan, December 2016. (PDF)

S. Romeo, G. Da San Martino, A. Barrón-Cedeño, A. Moschitti, Y. Belinkov, W. Hsu, Y. Zhang, M. Mohtarami, and J. Glass, "Neural Attention for Learning to Rank Questions in Community Question Answering," Proc. COLING, pp. 1734-1745, Osaka, Japan, December 2016. (PDF)

Y. Belinkov, A. Magidow, M. Romanov, A. Shmidman, and M. Koppel, "Shamela: A Large-Scale Historical Arabic Corpus," Proc. COLING Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH), pp. 45-53, Osaka, Japan, December 2016. (PDF)

D. Harwath, A. Torralba, and J. Glass, "Unsupervised Learning of Spoken Language with Visual Context," Proc. of Neural Information Processing Systems (NIPS), Barcelona, Spain, December 2016. (PDF)

Y. Belinkov and J. Glass, "Large-Scale Machine Translation between Arabic and Hebrew: Available Corpora and Initial Results," Proc. AMTA Workshop on Semitic Machine Translation (SeMaT), pp. 7-12, Austin, Texas, USA, November 2016. (PDF)

W. Hsu, Y. Zhang, A. Lee, and J. Glass, "Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition," Proc. Interspeech, pp. 395-399, San Francisco, USA, September 2016. (PDF)

M. Price, A. Chandrakasan, and J. Glass, "Memory-efficient Modeling and Search Techniques for Hardware ASR Decoders," Proc. Interspeech, pp. 1893-1897, San Francisco, USA, September 2016. (PDF)

A. Ali, N. Dehak, P. Cardinal, S. Khurana, S. Yella, J. Glass, P. Bell, and S. Renals, "Automatic Dialect Detection in Arabic Broadcast Speech," Proc. Interspeech, pp. 2934-2938, San Francisco, USA, September 2016. (PDF)

S. Shum, D. Harwath, N. Dehak, and J. Glass, "On the Use of Acoustic Unit Discovery for Language Recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, September 2016, Vol. 24, No. 9, pp. 1665-1676. (PDF)

H. Erdogan, T. Hayashi, J. R. Hershey, T. Hori, C. Hori, W. Hsu, S. Kim, J. Le Roux, Z. Meng, and S. Watanabe, "Multi-Channel Speech Recognition: LSTMs All the Way Through," Proc. CHiME-4 Workshop, San Francisco, California, USA, September 2016. (PDF)

C. Sun, S. Li, and L. Lin, "Thread Structure Prediction for MOOC Discussion Forum," Proc. International Conference of Young Computer Scientists, Engineers and Educators (ICYCSEE), pp. 92-101, Harbin, China, 2016. (PDF)

R. Aharoni, Y. Goldberg, and Y. Belinkov, "Improving Sequence to Sequence Learning for Morphological Inflection Generation: The BIU-MIT Systems for the SIGMORPHON 2016 Shared Task for Morphological Reinflection," Proc. SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pp. 41-48, Berlin, Germany, August 2016. (PDF)

H. Nassif, M. Mohtarami, and J. Glass, "Learning Semantic Relatedness in Community Question Answering Using Neural Models," Proceedings of ACL Workshop on Representation Learning for NLP, pp. 137-147, Berlin, Germany, August 2016. (PDF)

X. Zhang, C. Li, S. Li, and V. Zue, "Automated Segmentation of MOOC Lectures Towards Customized Learning," Proc. IEEE 16th International Conference on Advanced Learning Technologies (ICALT), pp. 20-22, Austin, Texas, USA, July 2016. (PDF)

M. Mohtarami, Y. Belinkov, W. Hsu, Y. Zhang, T. Lei, K. Bar, S. Cyphers, and J. Glass, "SLS at SemEval-2016 Task 3: Neural-based Approaches for Ranking in Community Question Answering," Proceedings of NAACL-HLT Workshop on Semantic Evaluation, pp. 753-760, San Diego, California, USA, June 2016. (PDF)

P. Nakov, L. Màrquez, A. Moschitti, W. Magdy, H. Mubarak, A. Freihat, J. Glass, and B. Randeree, "SemEval-2016 Task 3: Community Question Answering," Proceedings of NAACL-HLT Workshop on Semantic Evaluation, pp. 537-557, San Diego, California, USA, June 2016. (PDF)

E. Chuangsuwanich, Y. Zhang, and J. Glass, "Multilingual Data Selection for Training Stacked Bottleneck Features," Proc. ICASSP, pp. 5410-5414, Shanghai, China, March 2016. (PDF)

M. Korpusik, C. Huang, M. Price, and J. Glass, "Distributional Semantics for Understanding Spoken Meal Descriptions," Proc. ICASSP, pp. 6070-6074, Shanghai, China, March 2016. (PDF)

A. Lee, N. F. Chen, and J. Glass, "Personalized Mispronunciation Detection and Diagnosis Based on Unsupervised Error Pattern Discovery," Proc. ICASSP, pp. 6145-6149, Shanghai, China, March 2016. (PDF)

Y. Zhang, E. Chuangsuwanich, J. Glass, and D. Yu, "Prediction-Adaptation-Correction Recurrent Neural Networks for Low-Resource Language Speech Recognition," Proc. ICASSP, pp. 5415-5419, Shanghai, China, March 2016. (PDF)

Y. Zhang, G. Chen, D. Yu, K. Yao, S. Khudanpur, and J. Glass, "Highway Long Short-Term Memory RNNs for Distant Speech Recognition," Proc. ICASSP, pp. 5755-5759, Shanghai, China, March 2016. (PDF)

X. Xiao, S. Watanabe, H. Erdogan, L. Lu, J. Hershey, M. L. Seltzer, G. Chen, Y. Zhang, M. Mandel, and D. Yu, "Deep Beamforming Networks for Multi-Channel Speech Recognition," Proc. ICASSP, pp. 5745-5749, Shanghai, China, March 2016. (PDF)

Y. Qian, T. Tan, D. Yu, and Y. Zhang, "Integrated Adaptation with Multi-Factor Joint-Learning for Far-Field Speech Recognition," Proc. ICASSP, pp. 5770-5774, Shanghai, China, March 2016. (PDF)

T. Tan, Y. Qian, D. Yu, S. Kundu, L. Lu, K. C. Sim, X. Xiao, and Y. Zhang, "Speaker-Aware Training of LSTM-RNNs for Acoustic Modeling," Proc. ICASSP, pp. 5280-5284, Shanghai, China, March 2016. (PDF)

W. Hsu, Y. Zhang, and J. Glass, "Recurrent Neural Network Encoder with Attention for Community Question Answering," arXiv: 1603.07044, March 2016. (PDF)

2015

D. Harwath and J. Glass, "Deep Multimodal Semantic Embeddings for Speech and Images," Proc. ASRU, pp. 237-244, Scottsdale, Arizona, USA, December 2015. (PDF)

F. Richardson, D. Reynolds, and N. Dehak, "Deep Neural Network Approaches to Speaker and Language Recognition," IEEE Signal Processing Letters, October 2015, Vol. 22, No. 10, pp. 1671-1675. (PDF)

L. Lee, J. Glass, H. Lee, and C. Chan, "Spoken Content Retrieval - Beyond Cascading Speech Recognition with Text Retrieval," IEEE/ACM Transactions on Audio, Speech, and Language Processing, September 2015, Vol. 23, No. 9, pp. 1389-1420. (PDF)

Y. Belinkov and J. Glass, "Arabic Diacritization with Recurrent Neural Networks," Proc. EMNLP, pp. 2281-2285, Lisbon, Portugal, September 2015. (PDF)

S. Li and V. Zue, "Linking MOOC Courseware to Accommodate Diverse Learner Backgrounds," Proc. SLaTE, pp. 155-160, Leipzig, Germany, September 2015. (PDF)

S. Shen, H. Lee, S. Li, V. Zue, and L. Lee, "Structuring Lectures in Massive Open Online Courses (MOOCs) for Efficient Learning by Linking Similar Sections and Predicting Prerequisites," Proc. Interspeech, pp. 1363-1367, Dresden, Germany, September 2015. (PDF)

A. Lee and J. Glass, "Mispronunciation Detection without Nonnative Training Data," Proc. Interspeech, pp. 643-647, Dresden, Germany, September 2015. (PDF)

P. Cardinal, N. Dehak, Y. Zhang, and J. Glass, "Speaker Adaptation Using the I-Vector Technique for Bottleneck Features," Proc. Interspeech, pp. 2867-2871, Dresden, Germany, September 2015. (PDF)

F. Richardson, D. Reynolds, and N. Dehak, "A Unified Deep Neural Network for Speaker and Language Recognition," Proc. Interspeech, pp. 1146-1150, Dresden, Germany, September 2015. (PDF)

S. Li and V. Zue, "Would Linked MOOC Courseware Enhance Information Search?," Proc. ICALT, pp. 397-399, Hualien, Taiwan, July 2015. (PDF)

Y. Belinkov, A. Barrón-Cedeño, and H. Mubarak, "Answer Selection in Arabic Community Question Answering: A Feature-Rich Approach," Proc. ACL, pp. 183-190, Beijing, China, July 2015. (PDF)

C. Lee, T. J. O'Donnell, and J. Glass, "Unsupervised Lexicon Discovery from Acoustic Input," Transactions of the Association for Computational Linguistics, July 2015, Vol. 3, pp. 389-403. (PDF)

M. Walter, M. Antone, E. Chuangsuwanich, A. Correa, R. Davis, L. Fletcher, E. Frazzoli, Y. Friedman, J. Glass, J. How, J. Jeon, S. Karaman, B. Luders, N. Roy, S. Tellex, and S. Teller, "A Situationally Aware Voice-commandable Robotic Forklift Working Alongside People in Unstructured Outdoor Environments," J. Field Robotics, 32(4) 590-628, 2015. (PDF)

A. Alghunaim, M. Mohtarami, S. Cyphers, and J. Glass, "A Vector Space Approach for Aspect Based Sentiment Analysis," Proc. NAACL-HLT, pp. 116-122, Denver, Colorado, USA, June 2015. (PDF)

Y. Belinkov, M. Mohtarami, S. Cyphers, and J. Glass, "VectorSLU: A Continuous Word Vector Approach to Answer Selection in Community Question Answering Systems," Proc. SemEval, pp. 282-287, Denver, Colorado, USA, June 2015. (PDF)

P. Nakov, L. Màrquez, W. Magdy, A. Moschitti, and J. Glass, "SemEval-2015 Task 3: Answer Selection in Community Question Answering," Proc. SemEval, pp. 269-281, Denver, Colorado, USA, June 2015. (PDF)

C. Cai, P. Guo, J. Glass, and R. Miller, "Wait-Learning: Leveraging Wait Time for Second Language Education," Proc. CHI, pp. 3701-3710, Seoul, Republic of Korea, April 2015. (PDF)

X. Feng, B. Richardson, S. Amman, and J. Glass, "On Using Heterogeneous Data for Vehicle-Based Speech Recognition: A DNN-Based Approach," Proc. ICASSP, pp. 4385-4389, Brisbane, Australia, April 2015. (PDF)

Y. Zhang, D. Yu, M. L. Seltzer, and J. Droppo, "Speech Recognition with Prediction-Adaptation-Correction Recurrent Neural Networks," Proc. ICASSP, pp. 5004-5008, Brisbane, Australia, April 2015. (PDF)

M. Price, J. Glass, and A. P. Chandrakasan, "A 6mW, 5,000-Word Real-Time Speech Recognizer Using WFST Models," IEEE Journal of Solid-State Circuits, January 2015, Vol. 50, No. 1, pp. 102-112. (PDF)

2014

K. Yao, B. Peng, Y. Zhang, D. Yu, G. Zweig, and Y. Shi, "Spoken Language Understanding Using Long Short-Term Memory Neural Networks," Proc. SLT, pp. 189-194, South Lake Tahoe, Nevada, USA, December 2014. (PDF)

M. Korpusik, N. Schmidt, J. Drexler, S. Cyphers, and J. Glass, "Data Collection and Language Understanding of Food Descriptions," Proc. SLT, pp. 560-565, South Lake Tahoe, Nevada, USA, December 2014. (PDF)

A. Ali, Y. Zhang, P. Cardinal, N. Dehak, S. Vogel, and J. Glass, "A Complete Kaldi Recipe for Building Arabic Speech Recognition Systems," Proc. SLT, pp. 525-529, South Lake Tahoe, Nevada, USA, December 2014. (PDF)

I. Saleh, S. Joty, L. Màrquez, A. Moschitti, P. Nakov, S. Cyphers, and J. Glass, "A Study of Using Syntactic and Semantic Structures for Concept Segmentation and Labeling," Proc. COLING, pp. 193-202, Dublin, Ireland, August 2014. (PDF)

T. Al Hanai and J. Glass, "Lexical Modeling for Arabic ASR: A Systematic Approach," Proc. Interspeech, pp. 2605-2609, Singapore, September 2014. (PDF)

D. Harwath and J. Glass, "Speech Recognition without a Lexicon - Bridging the Gap between Graphemic and Phonetic Systems," Proc. Interspeech, pp. 2655-2659, Singapore, September 2014. (PDF)

D. Harwath, A. Gruenstein, and I. McGraw, "Choosing Useful Word Alternates for Automatic Speech Recognition Correction Interfaces," Proc. Interspeech, pp. 949-953, Singapore, September 2014. (PDF)

A. Lee and J. Glass, "Context-dependent Pronunciation Error Pattern Discovery with Limited Annotations," Proc. Interspeech, pp. 2877-2881, Singapore, September 2014. (PDF)

H. Lee, Y. Zhang, E. Chuangsuwanich, and J. Glass, "Graph-based Re-ranking using Acoustic Feature Similarity between Search Results for Spoken Term Detection on Low-resource Languages," Proc. Interspeech, pp. 2479-2483, Singapore, September 2014. (PDF)

S. Shum, N. Dehak, and J. Glass, "Limited Labels for Unlimited Data: Active Learning for Speaker Recognition," Proc. Interspeech, pp. 383-387, Singapore, September 2014. (PDF)

Y. Zhang, E. Chuangsuwanich, and J. Glass, "Language ID-based Training of Multilingual Stacked Bottleneck Features," Proc. Interspeech, pp. 1-5, Singapore, September 2014. (PDF)

P. Cardinal, A. Ali, N. Dehak, Y. Zhang, T. Al Hanai, Y. Zhang, J. Glass, and S. Vogel, "Recent Advances in ASR Applied to an Arabic Transcription System for Al-Jazeera," Proc. Interspeech, pp. 2088-2092, Singapore, September 2014. (PDF)

B. Lake, C. Lee, J. Glass, and J. Tenenbaum, "One-Shot Learning of Generative Speech Concepts," Proc. CogSci, pp. 803-808, Quebec City, July 2014. (PDF)

M. H. Bahari, N. Dehak, H. Van hamme, L. Burget, A. M. Ali, and J. Glass, "Non-Negative Factor Analysis of Gaussian Mixture Model Weight Adaptation for Language and Dialect Recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, July 2014, Vol. 22, No. 7, pp. 1117-1129. (PDF)

S. Shum, D. A. Reynolds, D. Garcia-Romero, and A. McCree, "Unsupervised Clustering Approaches for Domain Adaptation in Speaker Recognition Systems," Proc. Odyssey, pp. 265-272, Joensuu, Finland, June 2014. (PDF) (best student paper award)

N. Dehak, O. Plchot, M. H. Bahari, L. Burget, H. Van hamme, and R. Dehak, "GMM Weights Adaptation Based on Subspace Approaches for Speaker Verification," Proc. Odyssey, pp. 48-53, Joensuu, Finland, June 2014. (PDF)

D. Garcia-Romero, A. McCree, S. Shum, N. Brümmer, and C. Vaquero, "Unsupervised Domain Adaptation for I-Vector Speaker Recognition," Proc. Odyssey, pp. 260-264, Joensuu, Finland, June 2014. (PDF)

X. Feng, Y. Zhang, and J. Glass, "Speech Feature Denoising and Dereverberation via Deep Autoencoders for Noisy Reverberant Speech Recognition," Proc. ICASSP, pp. 1778-1782, Florence, Italy, May 2014. (PDF)

Y. Zhang, E. Chuangsuwanich, and J. Glass, "Extracting Deep Neural Network Bottleneck Features Using Low-Rank Matrix Factorization," Proc. ICASSP, pp. 185-189, Florence, Italy, May 2014. (PDF)

X. Feng, K. Kumatani, and J. McDonough, "The CMU-MIT Reverb Challenge 2014 System: Description and Results," Proc. REVERB Workshop, pp. 1-7, Florence, Italy, May 2014. (PDF)

C. Cai, P. Guo, J. Glass, and R. Miller, "Wait-Learning: Leveraging Conversational Dead Time for Second Language Education," Proc. CHI, Toronto, Canada, April 2014. (PDF)

M. Price, J. Glass, and A. Chandrakasan, "A 6mW 5K-Word Real-Time Speech Recognizer Using WFST Models," Proc. ISSCC, pp. 454-455, San Francisco, California, USA, February 2014. (PDF)

2013

J. Liu, P. Pasupat, Y. Wang, S. Cyphers, and J. Glass, "Query Understanding Enhanced by Hierarchical Parsing Structures," Proc. ASRU, pp. 72-77, Olomouc, Czech Republic, December 2013. (PDF)

C. Lee, Y. Zhang, and J. Glass, "Joint Learning of Phonetic Units and Word Pronunciations for ASR," Proc. EMNLP, pp. 182-192, Seattle, Washington, USA, October 2013. (PDF)

C. Cai, R. Miller, and S. Seneff, "Enhancing Speech Recognition in Fast-Paced Educational Games using Contextual Cues," Proc. SLaTE, pp. 54-59, Grenoble, France, August 2013. (PDF)

A. Lee and J. Glass, "Pronunciation Assessment via a Comparison-based System," Proc. SLaTE, pp. 122-126, Grenoble, France, August 2013. (PDF)

M. Senoussaoui, P. Kenny, P. Dumouchel, and N. Dehak, "New Cosine Similarity Scorings to Implement Gender-independent Speaker Verification," Proc. Interspeech, pp. 2773-2777, Lyon, France, August 2013. (PDF)

X. Fang, N. Dehak, and J. Glass, "Bayesian Distance Metric Learning on i-vector for Speaker Verification," Proc. Interspeech, pp. 2514-2518, Lyon, France, August 2013. (PDF)

W. Li, J. Glass, N. Roy, and S. Teller, "Probabilistic Dialogue Modeling for Speech-Enabled Assistive Technology," Proc. SLPAT, pp. 67-72, Grenoble, France, August 2013. (PDF)

S. Shum, N. Dehak, R. Dehak, and J. Glass, "Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach," IEEE Transactions on Audio, Speech, and Language Processing, October 2013, Vol. 21, No. 10, pp. 2015-2028. (PDF)

E. Hill, D. Han, P. Dumouchel, N. Dehak, T. Quatieri, C. Moehs, M. Oscar-Berman, J. Giordano, T. Simpatico, and K. Blum, "Long Term Suboxone™ Emotional Reactivity As Measured by Automatic Detection in Speech," PLOS ONE, July 2013, Volume 8, Issue 7, pp. 1-14. (PDF)

M. H. Bahari, N. Dehak, and H. Van hamme, "Gaussian Mixture Model Weight Supervector Decomposition and Adaptation," Technical Report, June 12, 2013. (PDF)

A. Jansen, E. Dupoux, S. Goldwater, M. Johnson, S. Khudanpur, K. Church, N. Feldman, H. Hermansky, F. Metze, R. Rose, M. Seltzer, P. Clark, I. McGraw, B. Varadarajan, E. Bennett, B. Borschinger, J. Chiu, E. Dunbar, A. Fourtassi, D. Harwath, C. Lee, K. Levin, A. Norouzian, V. Peddinti, R. Richardson, T. Schatz, and S. Thomas, "A Summary of the 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition," Proc. ICASSP, pp. 8111-8115, Vancouver, Canada, May 2013. (PDF)

O. Plchot, S. Matsoukas, P. Matějka, N. Dehak, J. Ma, S. Cumani, O. Glembek, H. Hermansky, S. H. Mallidi, N. Mesgarani, R. Schwartz, M. Soufifar, Z. H. Tan, S. Thomas, B. Zhang, and X. Zhou, "Developing a Speaker Identification System for the DARPA RATS Project," Proc. ICASSP, pp. 6768-6772, Vancouver, Canada, May 2013. (PDF)

D. Harwath, T. J. Hazen, and J. Glass, "Zero Resource Spoken Audio Corpus Analysis," Proc. ICASSP, pp. 8555-8559, Vancouver, Canada, May 2013. (PDF)

A. Lee, Y. Zhang, and J. Glass, "Mispronunciation Detection via Dynamic Time Warping on Deep Belief Network-Based Posteriorgrams," Proc. ICASSP, pp. 8227-8231, Vancouver, Canada, May 2013. (PDF) (student paper award)

J. Liu, P. Pasupat, S. Cyphers, and J. Glass, "Asgard: A Portable Architecture for Multilingual Dialogue Systems," Proc. ICASSP, pp. 8386-8390, Vancouver, Canada, May 2013. (PDF)

S. Shum, W. Campbell, and D. Reynolds, "Large-Scale Community Detection on Speaker Content Graphs," Proc. ICASSP, pp. 7716-7720, Vancouver, Canada, May 2013. (PDF)

I. McGraw, I. Badr, and J. Glass, "Learning Lexicons From Speech Using a Pronunciation Mixture Model," IEEE Transactions on Audio, Speech, and Language Processing, February 2013, Volume 21, Issue 2, pp. 357-366. (PDF)

2012

J. Liu, S. Seneff, and V. Zue, "Harvesting and Summarizing User-Generated Content for Advanced Speech-Based HCI," IEEE Journal of Selected Topics in Signal Processing, Vol. 6, No. 8, pp. 982-992, December 2012. (PDF)

A. Lee and J. Glass, "A Comparison-based Approach to Mispronunciation Detection," Proc. Spoken Language Technologies Workshop, pp. 382-387, Miami, Florida, December 2012. (PDF)

A. Lee and J. Glass, "Sentence Detection Using Multiple Annotations," Proc. Interspeech, Portland, Oregon, September 2012. (PDF)

J. Liu, S. Cyphers, P. Pasupat, I. McGraw, and J. Glass, "A Conversational Movie Search System Based on Conditional Random Fields," Proc. Interspeech, Portland, Oregon, September 2012. (PDF)

I. McGraw, S. Cyphers, P. Pasupat, J. Liu, and J. Glass, "Automating Crowd-supervised Learning for Spoken Language Systems," Proc. Interspeech, Portland, Oregon, September 2012. (PDF)

I. McGraw and A. Gruenstein, "Estimating Word-Stability During Incremental Speech Recognition," Proc. Interspeech, Portland, Oregon, September 2012. (PDF)

S. Shum, N. Dehak, and J. Glass, "On the Use of Spectral and Iterative Methods for Speaker Diarization," Proc. Interspeech, Portland, Oregon, September 2012. (PDF)

P. Matějka, O. Plchot, M. Soufifar, O. Glembek, L. D'Haro, K. Veselý, F. Grézl, J. Ma, S. Matsoukas, and N. Dehak, "Patrol Team Language Identification System for DARPA RATS P1 Evaluation," Proc. Interspeech, Portland, Oregon, September 2012. (PDF)

J. Glass, "Towards Unsupervised Speech Processing," Keynote, Proc. ISSPA, Montreal, July 2012. (PDF)

C. Lee and J. Glass, "A Nonparametric Bayesian Approach to Acoustic Model Discovery," Proc. ACL, pp. 40-49, Jeju, Republic of Korea, July 2012. (PDF)

M. Senoussaoui, N. Dehak, P. Kenny, R. Dehak, and P. Dumouchel, "First Attempt of Boltzmann Machines for Speaker Verification," Proc. Odyssey, pp. 117-121, Singapore, June 2012. (PDF)

E. Singer, P. Torres-Carrasquillo, D. Reynolds, A. McCree, F. Richardson, N. Dehak, and D. Sturim, "The MITLL NIST LRE 2011 Language Recognition System," Proc. Odyssey, pp. 209-215, Singapore, June 2012. (PDF)

H. Chang and J. Glass, "Evaluation of Multi-level Context-Dependent Acoustic Model for Large Vocabulary Speaker Adaptation Tasks," Proc. ICASSP, pp. 4313-4316, Kyoto, Japan, March 2012. (PDF)

E. Chuangsuwanich, S. Watanabe, T. Hori, T. Iwata, and J. Glass, "Handling Uncertain Observations in Unsupervised Topic-Mixture Language Model Adaptation," Proc. ICASSP, pp. 5033-5036, Kyoto, Japan, March 2012. (PDF)

D. Harwath and T. J. Hazen, "Topic Identification Based Extrinsic Evaluation of Summarization Techniques Applied to Conversational Speech," Proc. ICASSP, pp. 5073-5076, Kyoto, Japan, March 2012. (PDF)

Y. Xu and S. Seneff, "Improving Nonnative Speech Understanding Using Context and N-Best Meaning Fusion," Proc. ICASSP, pp. 4977-4980, Kyoto, Japan, March 2012. (PDF)

Y. Zhang, K. Adl, and J. Glass, "Fast Spoken Query Detection Using Lower-Bound Dynamic Time Warping on Graphical Processing Units," Proc. ICASSP, pp. 5173-5176, Kyoto, Japan, March 2012. (PDF)

Y. Zhang, R. Salakhutdinov, H. Chang, and J. Glass, "Resource Configurable Spoken Query Detection Using Deep Boltzmann Machines," Proc. ICASSP, pp. 5161-5164, Kyoto, Japan, March 2012. (PDF)

2011

H. Chang and J. Glass, "Multi-level Context-dependent Acoustic Modeling for Automatic Speech Recognition," Proc. ASRU, Waikoloa, Hawaii, December 2011. (PDF)

J. Liu, "A Dialogue System for Accessing Drug Reviews," Proc. ASRU, Waikoloa, Hawaii, December 2011. (PDF)

T. Mertens and S. Seneff, "Subword-based Automatic Lexicon Learning for ASR," Proc. ASRU, Waikoloa, Hawaii, December 2011. (PDF)

J. Liu, A. Li, and S. Seneff, "Automatic Drug Side Effect Discovery from Online Patient-Submitted Reviews: Focus on Statin Drugs," Proc. IMMM, Barcelona, Spain, October 2011. (PDF)

I. Badr, I. McGraw, and J. Glass, "Pronunciation Learning from Continuous Speech," Proc. Interspeech, pp. 549-552, Florence, Italy, August 2011. (PDF)

E. Chuangsuwanich and J. Glass, "Robust Voice Activity Detector for Real World Applications Using Harmonicity and Modulation Frequency," Proc. Interspeech, pp. 2645-2648, Florence, Italy, August 2011. (PDF)

N. Dehak, P. Torres-Carrasquillo, D. Reynolds, and R. Dehak, "Language Recognition via Ivectors and Dimensionality Reduction," Proc. Interspeech, pp. 857-860, Florence, Italy, August 2011. (PDF)

C. Lee and J. Glass, "A Transcription Task for Crowdsourcing with Automatic Quality Control," Proc. Interspeech, pp. 3041-3044, Florence, Italy, August 2011. (PDF)

C. Lee, J. Glass, and O. Ghitza, "An Efferent-Inspired Auditory Model Front-End for Speech Recognition," Proc. Interspeech, pp. 49-52, Florence, Italy, August 2011. (PDF)

I. McGraw, J. Glass, and S. Seneff, "Growing a Spoken Language Interface on Amazon Mechanical Turk," Proc. Interspeech, pp. 3057-3060, Florence, Italy, August 2011. (PDF)

S. Shum, N. Dehak, E. Chuangsuwanich, D. Reynolds, and J. Glass, "Exploiting Intra-Conversation Variability for Speaker Diarization," Proc. Interspeech, pp. 945-948, Florence, Italy, August 2011. (PDF)

Y. Zhang and J. Glass, "A Piecewise Aggregate Approximation Lower-Bound Estimate for Posteriorgram-based Dynamic Time Warping," Proc. Interspeech, pp. 1909-1912, Florence, Italy, August 2011. (PDF)

Y. Xu and S. Seneff, "A Generic Framework for Building Dialogue Games for Language Learning: Application in the Flight Domain," Proc. SLaTE, Venice, Italy, August 2011. (PDF)

Z. Karam, W. Campbell, and N. Dehak, "Graph Relational Features for Speaker Recognition and Mining," Proc. Statistical Signal Processing Workshop (SSP 2011), pp. 525-528, Nice, France, June 2011. (PDF)

H. Chang, Y. Sung, B. Strope, and F. Beaufays, "Recognizing English Queries in Mandarin Voice Search," Proc. ICASSP, pp. 5016-5019, Prague, Czech Republic, May 2011. (PDF)

N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, "Front-End Factor Analysis for Speaker Verification," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 19, No. 4, May 2011, pp. 788-798. (PDF) (Selected for the IEEE Signal Processing Society Young Author Best Paper Award.)

N. Dehak, Z. Karam, D. Reynolds, R. Dehak, W. Campbell, and J. Glass, "A Channel-Blind System for Speaker Verification," Proc. ICASSP, pp. 4536-4539, Prague, Czech Republic, May 2011. (PDF)

Z. Karam, W. Campbell, and N. Dehak, "Towards Reduced False-Alarms Using Cohorts," Proc. ICASSP, pp. 4512-4515, Prague, Czech Republic, May 2011. (PDF)

J. Liu, X. Li, A. Acero, and Y. Wang, "Lexicon Modeling for Query Understanding," Proc. ICASSP, pp. 5604-5607, Prague, Czech Republic, May 2011. (PDF)

D. Sturim, W. Campbell, N. Dehak, Z. Karam, A. McCree, D. Reynolds, F. Richardson, P. Torres-Carrasquillo, and S. Shum, "The MIT LL 2010 Speaker Recognition Evaluation System: Scalable Language-Independent Speaker Recognition," Proc. ICASSP, pp. 5272-5275, Prague, Czech Republic, May 2011. (PDF)

Y. Zhang and J. Glass, "An Inner-Product Lower-Bound Estimate for Dynamic Time Warping," Proc. ICASSP, pp. 5660-5663, Prague, Czech Republic, May 2011. (PDF)

Y. Zhang, L. Deng, X. He, and A. Acero, "A Novel Decision Function and the Associated Decision-Feedback Learning for Speech Translation," Proc. ICASSP, pp. 5608-5611, Prague, Czech Republic, May 2011. (PDF)

2010

Y. Xu, S. Seneff, A. Li, J. Polifroni, "Semantic Understanding by Combining Extended CFG Parser with HMM Model," Proc. Spoken Language Technologies Workshop, Berkeley, California, USA, December 2010. (PDF)

S. Liu, S. Seneff, J. Glass, "A Collective Data Generation Method for Speech Language Models," Proc. Spoken Language Technologies Workshop, Berkeley, California, USA, December 2010. (PDF)

E. Chuangsuwanich, S. Cyphers, J. Glass, and S. Teller, "Spoken Command of Large Mobile Robots in Outdoor Environments," Proc. Spoken Language Technologies Workshop, Berkeley, California, USA, December 2010. (PDF)

J. Polifroni, S. Seneff, S.R.K. Branavan, C. Wang, and R. Barzilay, "Good Grief, I Can Speak It! Preliminary Experiments in Audio Restaurant Reviews," Proc. Spoken Language Technologies Workshop, Berkeley, California, USA, December 2010. (PDF)

I. Badr, I. McGraw, and J. Glass, "Learning New Word Pronunciations from Spoken Examples," Proc. Interspeech, Chiba, Japan, September 2010. (PDF)

J. Liu, S. Seneff, and V. Zue, "Utilizing Review Summarization in a Spoken Recommendation System," Proc. SIGDIAL, Tokyo, Japan, September 2010. (PDF)

Y. Xu and S. Seneff, "Dialogue Management Based on Entities and Constraints," Proc. SIGDIAL, Tokyo, Japan, September 2010. (PDF)

M. Peabody and S. Seneff, "A Simple Feature Normalization Scheme for Non-native Vowel Assessment," Satellite Workshop on Second Language Studies: Acquisition, Learning, Education and Technology, Tokyo, Japan, September 2010. (PDF)

J. Polifroni, I. Kiss, S. Seneff, "Speech for Content Creation," Proc. SiMPE, Lisbon, Portugal, September 2010. (PDF)

R. Zbib, S. Matsoukas, R. Schwartz, and J. Makhoul, "Decision Trees for Lexical Smoothing in Statistical Machine Translation," Proc. ACL Joint 5th Workshop on Statistical Machine Translation, Uppsala, Sweden, July 2010. (PDF)

N. Dehak, R. Dehak, J. Glass, D. Reynolds, and P. Kenny, "Cosine Similarity Scoring without Score Normalization Techniques," Proc. IEEE Odyssey Workshop, Brno, Czech Republic, June 2010. (PDF)

S. Shum, N. Dehak, R. Dehak, and J. Glass, "Unsupervised Speaker Adaptation Based on the Cosine Similarity for Text-Independent Speaker Verification," Proc. IEEE Odyssey Workshop, Brno, Czech Republic, June 2010. (PDF)

M. Senoussaoui, P. Kenny, N. Dehak, and P. Dumouchel, "An i-Vector Extractor Suitable for Speaker Recognition with Both Microphone and Telephone Speech," Proc. IEEE Odyssey Workshop, Brno, Czech Republic, June 2010. (PDF)

I. McGraw, C. Lee, L. Hetherington, S. Seneff, and J. Glass, "Collecting Voices from the Cloud," Proc. LREC, Malta, May 2010. (PDF)

S. Teller, M. Walker, M. Antone, A. Correa, R. Davis, L. Fletcher, E. Frazzoli, J. Glass, J. How, A. S. Huang, J. Jeon, S. Karaman, B. Luders, N. Roy, T. Sainath, "A Voice-Commandable Robotic Forklift Working Alongside Humans in Minimally-Prepared Outdoor Environments," Proc. ICRA, Anchorage, Alaska, United States, May 2010. (PDF)

J. Liu, S. Seneff, and V. Zue, "Dialogue-Oriented Review Summary Generation for Spoken Dialogue Recommendation Systems," Proc. NAACL-HLT, Los Angeles, California, United States, March 2010. (PDF)

Y. Zhang and J. Glass, "Towards Multi-Speaker Unsupervised Speech Pattern Discovery," Proc. ICASSP, pp. 4366-4369, Dallas, Texas, United States, March 2010. (PDF)

J. Ming, T. J. Hazen, J. R. Glass, "Combining Missing-Feature Theory, Speech Enhancement, and Speaker-Dependent/-Independent Modeling for Speech Separation," Computer Speech and Language 24, January 2010, pp. 67-76. (PDF)

2009

Y. Xu and S. Seneff, "Speech-Based Interactive Games for Language Learning: Reading, Translation, and Question-Answering," International Journal of Computational Linguistics and Chinese Language Processing, vol. 14, no. 2, 2009. (PDF)

T. N. Sainath, "Island-Driven Search Using Broad Phonetic Classes," Proc. ASRU, Merano, Italy, December 2009. (PDF)

Y. Zhang and J. Glass, "Unsupervised Spoken Keyword Spotting via Segmental DTW on Gaussian Posteriorgrams," Proc. ASRU, Merano, Italy, December 2009. (PDF)

K. Saenko, K. Livescu, J. Glass, and T. Darrell, "Multistream Articulatory Feature-Based Models for Visual Speech Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 9, pp. 1700-1701, September 2009. (PDF)

I. McGraw, A. Gruenstein, and A. Sutherland, "A Self-Labeling Speech Corpus: Collecting Spoken Words with an Online Educational Game," Proc. Interspeech, Brighton, UK, September 2009. (PDF)

H. Chang and J. Glass, "A Back-off Discriminative Acoustic Model for Automatic Speech Recognition," Proc. Interspeech, Brighton, UK, September 2009. (PDF)

M. Peabody and S. Seneff, "Annotation and Features of Non-native Mandarin Tone Quality," Proc. Interspeech, Brighton, UK, September 2009. (PDF)

A. Gruenstein, I. McGraw, and A. Sutherland, "A Self-Transcribing Speech Corpus: Collecting Continuous Speech with an Online Educational Game," Proc. SIGSLaTE, Warwickshire, England, September 2009. (PDF)

B. Yoshimoto, I. McGraw, and S. Seneff, "Rainbow Rummy: A Web-based Game for Vocabulary Acquisition using Computer-directed Speech," Proc. SIGSLaTE, Warwickshire, England, September 2009. (PDF)

Y. Xu, A. Goldie, and S. Seneff, "Automatic Question Generation and Answer Judging: A Q&A Game for Language Learning," Proc. SIGSLaTE, Warwickshire, England, September 2009. (PDF)

J. Liu and S. Seneff, "Review Sentiment Scoring via a Parse-and-Paraphrase Paradigm," Proc. EMNLP, Singapore, August 2009. (PDF)

J. M. Baker, L. Deng, S. Khudanpur, C. Lee, J. Glass, N. Morgan, and D. O'Shaughnessy, "Updated MINDS Report on Speech Recognition and Understanding, Part 2," IEEE Signal Processing Magazine, pp. 79-86, July 2009. (PDF)

J. M. Baker, L. Deng, J. Glass, S. Khudanpur, C. Lee, N. Morgan, and D. O'Shaughnessy, "Research Developments and Directions in Speech Recognition and Understanding, Part 1," IEEE Signal Processing Magazine, pp. 75-80, May 2009. (PDF)

A. Gruenstein, J. Orszulak, S. Liu, S. Roberts, J. Zabel, B. Reimer, B. Mehler, S. Seneff, J. Glass, J. Coughlin, "City Browser: Developing a Conversational Automotive HMI," Proc. CHI, 4291-4296, Boston, April 2009. (PDF)

I. Badr, R. Zbib, and J. Glass, "Syntactic Phrase Reordering for English-to-Arabic Statistical Machine Translation," Proc. EACL, 86-93, Athens, April 2009. (PDF)

H. Chang and J. Glass, "Discriminative Training of Hierarchical Acoustic Models for Large Vocabulary Continuous Speech Recognition," Proc. ICASSP, Taipei, Taiwan, April 2009. (PDF)

B. Hsu and J. Glass, "Language Model Parameter Estimation Using User Transcriptions," Proc. ICASSP, Taipei, Taiwan, April 2009. (PDF)

Y. Zhang and J. Glass, "Speech Rhythm Guided Syllable Nuclei Detection," Proc. ICASSP, Taipei, Taiwan, April 2009. (PDF)

K. Livescu, B. Zhu, and J. Glass, "On the Phonetic Information in Ultrasonic Microphone Signals," Proc. ICASSP, Taipei, Taiwan, April 2009. (PDF)

D. Kanevsky, T. N. Sainath, and B. Ramabhadran, "A Generalized Family of Parameter Estimation Techniques," Proc. ICASSP, Taipei, Taiwan, April 2009. (PDF)

2008

I. McGraw, B. Yoshimoto, and S. Seneff, "Speech-enabled Card Games for Incidental Vocabulary Acquisition in a Foreign Language," Speech Communication 2008. (PDF)

J. Liu, Y. Xu, S. Seneff, and V. Zue, "CityBrowser II: A Multimodal Restaurant Guide in Mandarin," Proc. ISCSLP, Kunming, China, December 2008. (PDF)

Y. Xu and S. Seneff, "Mandarin Learning Using Speech and Language Technologies: A Translation Game in the Travel Domain," Proc. ISCSLP, Kunming, China, December 2008. (PDF)

Y. Xu, J. Liu, and S. Seneff, "Mandarin Language Understanding in Dialogue Context," Proc. ISCSLP, Kunming, China, December 2008. (PDF)

Y. Xu and S. Seneff, "Two-Stage Translation: A Combined Linguistic and Statistical Machine Translation Framework," Proc. AMTA, Waikiki, Hawaii, USA, October 2008. (PDF)

A. Gruenstein, I. McGraw, and I. Badr, "The WAMI Toolkit for Developing, Deploying, and Evaluating Web-Accessible Multimodal Interfaces," Proc. ICMI, Chania, Crete, Greece, October 2008. (PDF)

B. Hsu and J. Glass, "N-gram Weighting: Reducing Training Data Mismatch in Cross-Domain Language Model Estimation," Proc. EMNLP, Honolulu, Hawaii, USA, October 2008. (PDF)

B. Hsu and J. Glass, "Iterative Language Model Estimation: Efficient Data Structure & Algorithms," Proc. Interspeech, Brisbane, Australia, September 2008. (PDF)

T. N. Sainath and V. Zue, "A Comparison of Broad Phonetic and Acoustic Units for Noise Robust Segment-Based Speech Recognition," Proc. Interspeech, Brisbane, Australia, September 2008. (PDF)

D. Kanevsky, T. N. Sainath, B. Ramabhadran, and D. Nahamoo, "Generalization of Extended Baum-Welch Parameter Estimation for Discriminative Training and Decoding," Proc. Interspeech, Brisbane, Australia, September 2008. (PDF)

I. McGraw and S. Seneff, "Speech-enabled Card Games for Language Learners," Proc. AAAI, Chicago, Illinois, USA, July 2008. (PDF)

A. Gruenstein, B. Hsu, J. Glass, S. Seneff, I. Hetherington, S. Cyphers, I. Badr, C. Wang, and S. Liu, "A Multimodal Home Entertainment Interface via a Mobile Device", Proc. of ACL Workshop on Mobile Language Processing, Columbus, Ohio, USA, June 2008. (PDF)

A. Gruenstein, "Response-Based Confidence Annotation for Spoken Dialogue Systems", Proc. of SIGdial Workshop on Discourse and Dialogue, Columbus, Ohio, USA, June 2008. (PDF)

J. Lee and S. Seneff, "Correcting Misuse of Verb Forms," Proc. ACL, Columbus, Ohio, USA, June 2008. (PDF)

I. Badr, R. Zbib, and J. Glass, "Segmentation for English-to-Arabic Statistical Machine Translation", Proc. ACL, Columbus, Ohio, USA, June 2008. (PDF)

T. N. Sainath, D. Kanevsky, and B. Ramabhadran, "Gradient Steepness Metrics using Extended Baum-Welch Transformations for Universal Pattern Recognition Tasks," Proc. ICASSP, Las Vegas, Nevada, USA, April 2008. (PDF)

G. Choueiter, M. Ohannessian, S. Seneff, and J. Glass, "A Turbo-Style Algorithm for Lexical Baseforms Estimation", Proc. ICASSP, Las Vegas, Nevada, USA, April 2008. (PDF)

G. Choueiter, G. Zweig, and P. Nguyen, "An Empirical Study of Automatic Accent Classification", Proc. ICASSP, Las Vegas, Nevada, USA, April 2008. (PDF)

J. Lee and O. Knutsson, "The Role of PP Attachment in Preposition Generation", Proc. CICLing, Haifa, Israel, February 2008. (PDF)

A. Park and J. Glass, "Unsupervised Pattern Discovery in Speech", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 1, January 2008. (PDF)

2007

K. Schutte and J. Glass, "Speech Recognition with Localized Time-Frequency Pattern Detectors," Proc. ASRU, Kyoto, Japan, December 2007. (PDF)

G. Choueiter, S. Seneff, and J. Glass, "Automatic Lexical Pronunciations Generation and Update", Proc. ASRU, Kyoto, Japan, December 2007. (PDF)

T. Sainath, D. Kanevsky, and B. Ramabhadran, "Broad Phonetic Class Recognition in a Hidden Markov Model Framework Using Extended Baum-Welch Transformations", Proc. ASRU, Kyoto, Japan, December 2007. (PDF)

B. Hsu, "Generalized Linear Interpolation of Language Models", Proc. ASRU, 136-140, Kyoto, Japan, December 2007. (PDF)

H. Chang and J. Glass, "Hierarchical Large-Margin Gaussian Mixture Models for Phonetic Classification", Proc. ASRU, Kyoto, Japan, December 2007. (PDF)

I. McGraw and S. Seneff, "Immersive Second Language Acquisition in Narrow Domains: A Prototype ISLAND Dialogue System", Proc. of the Speech and Language Technology in Education (SLaTE) Workshop, Farmington, Pennsylvania, October 2007. (PDF)

C. Chao, S. Seneff, and C. Wang, "An Interactive Interpretation Game for Learning Chinese", Proc. of the Speech and Language Technology in Education (SLaTE) Workshop, Farmington, Pennsylvania, October 2007. (PDF)

S. Seneff, "Web-based Dialogue and Translation Games for Spoken Language Learning", Proc. of the Speech and Language Technology in Education (SLaTE) Workshop, Farmington, Pennsylvania, October 2007. (PDF)

A. Gruenstein and S. Seneff, "Releasing a Multimodal Dialogue System into the Wild: User Support Mechanisms", Proc. of the 8th SIGdial Workshop on Discourse and Dialogue, Antwerp, Belgium, pp. 111-119, September 2007. (PDF)

V. Zue, "On Organic Interfaces", Proc. Interspeech, Antwerp, Belgium, August 2007. (PDF)

G. Choueiter, S. Seneff, and J. Glass, "New Word Acquisition Using Subword Modeling", Proc. Interspeech, Antwerp, Belgium, August 2007. (PDF)

J. Frankel, M. Magimai-Doss, S. King, K. Livescu and O. Cetin, "Articulatory Feature Classifiers Trained on 2000 hours of Telephone Speech", Proc. Interspeech, Antwerp, Belgium, August 2007. (PDF)

J. Glass, T. J. Hazen, S. Cyphers, I. Malioutov, D. Huynh, and R. Barzilay, "Recent Progress in the MIT Spoken Lecture Processing Project", Proc. Interspeech, Antwerp, Belgium, August 2007. (PDF)

T. J. Hazen, B. Sherry, and M. Adler, "Speech-Based Annotation and Retrieval of Digital Photographs", Proc. Interspeech, Antwerp, Belgium, August 2007. (PDF)

T. J. Hazen and D. Schultz, "Multi-Modal User Authentication from Video for Mobile or Variable-Environment Applications," Proc. Interspeech, Antwerp, Belgium, August 2007. (PDF)

T. J. Hazen and E. McDermott, "Discriminative MCE-Based Speaker Adaptation of Acoustic Models for a Spoken Lecture Processing Task", Proc. Interspeech, Antwerp, Belgium, August 2007. (PDF)

I. Hetherington, "PocketSUMMIT: Small-Footprint Continuous Speech Recognition", Proc. Interspeech, Antwerp, Belgium, August 2007. (PDF)

J. Lee and S. Seneff, "Automatic Generation of Cloze Items for Prepositions", Proc. Interspeech, Antwerp, Belgium, August 2007. (PDF)

T. Sainath, V. Zue, and D. Kanevsky, "Audio Classification using the Extended Baum-Welch Transformations", Proc. Interspeech, Antwerp, Belgium, August 2007. (PDF)

N. Singh-Miller, M. Collins, and T. J. Hazen, "Dimensionality Reduction for Speech Recognition Using Neighborhood Components Analysis", Proc. Interspeech, Antwerp, Belgium, August 2007. (PDF)

H. Wu and S. Seneff, "Reducing Recognition Error Rate based on Context Relationships among Dialogue Turns", Proc. Interspeech, Antwerp, Belgium, August 2007. (PDF)

B. Zhu, T. J. Hazen, and J. Glass, "Multimodal Speech Recognition with Ultrasonic Sensors", Proc. Interspeech, Antwerp, Belgium, August 2007. (PDF)

M. Hasegawa, K. Livescu, P. Lal, and K. Saenko, "Audiovisual Speech Recognition with Articulator Positions as Hidden Variables", Proc. International Congress of Phonetic Sciences, Saarbruecken, Germany, August 2007. (PDF)

C. Wang and S. Seneff, "A Spoken Translation Game for Second Language Learning", Proc. AIED, Marina del Rey, California, July 2007. (PDF)

I. Malioutov, A. Park, R. Barzilay, and J. Glass, "Making Sense of Sound: Unsupervised Topic Segmentation over Acoustic Input", Proc. ACL, Prague, Czech Republic, June 2007. (PDF)

J. Lee, "A Computational Model of Text Reuse in Ancient Literary Texts", Proc. ACL, Prague, Czech Republic, June 2007. (PDF)

G. Sun, X. Liu, G. Cong, M. Zhou, Z. Xiong, J. Lee, and C. Lin, "Detecting Erroneous Sentences using Automatically Mined Sequential Patterns", Proc. ACL, Prague, Czech Republic, June 2007. (PDF)

C. Wang, M. Collins, and P. Koehn, "Chinese Syntactic Reordering for Statistical Machine Translation", Proc. EMNLP, Prague, Czech Republic, June 2007. (PDF)

S. Seneff, M. Adler, J. Glass, B. Sherry, T. J. Hazen, C. Wang, and T. Wu, "Exploiting Context Information in Spoken Dialogue Interaction with Mobile Devices", Proc. International Workshop on Improved Mobile User Experience (IMUx), Toronto, Canada, May 2007. (PDF)

T. Sainath, D. Kanevsky, and G. Iyengar, "Unsupervised Audio Segmentation Using Extended Baum-Welch Transformations", Proc. ICASSP, Honolulu, Hawaii, April 2007. (PDF)

T. Hori, I. L. Hetherington, T. J. Hazen, and J. Glass, "Open-Vocabulary Spoken Utterance Retrieval Using Confusion Networks", Proc. ICASSP, Honolulu, Hawaii, April 2007. (PDF)

R. Rifkin, K. Schutte, M. Saad, J. Bouvrie, and J. Glass, "Noise Robust Phonetic Classification with Linear Regularized Least Squares and Second-Order Features", Proc. ICASSP, Honolulu, Hawaii, April 2007. (PDF)

K. Livescu, O. Cetin, M. Hasegawa-Johnson, S. King, C. Bartels, N. Borges, A. Kantor, P. Lal, L. Yung, A. Bezman, S. Dawson-Haggerty, B. Woods, J. Frankel, M. Magimai-Doss, K. Saenko, "Articulatory Feature-based Methods for Acoustic and Audio-visual Speech Recognition: Summary from the 2006 JHU Summer Workshop", Proc. ICASSP, Honolulu, Hawaii, April 2007. (PDF)

K. Livescu, A. Bezman, N. Borges, L. Yung, O. Cetin, J. Frankel, S. King, M. Magimai-Doss, X. Chi, and L. Lavoie, "Manual Transcription of Conversational Speech at the Articulatory Feature Level", Proc. ICASSP, Honolulu, Hawaii, April 2007. (PDF)

O. Cetin, A. Kantor, S. King, C. Bartels, M. Magimai-Doss, J. Frankel, and K. Livescu, "An Articulatory Feature-based Tandem Approach and Factored Observation Modeling", Proc. ICASSP, Honolulu, Hawaii, April 2007. (PDF)

C. Wang and S. Seneff, "Automatic Assessment of Student Translations for Foreign Language Tutoring", Proc. HLT-NAACL, Rochester, NY, April 2007. (PDF)

J. Lee, M. Zhou, and X. Liu, "Detection of Non-native Sentences using Machine-translated Training Data", Proc. HLT-NAACL (Short Papers), Rochester, NY, April 2007. (PDF)

S. Seneff, C. Wang, and C. Chao, "Spoken Dialogue Systems for Language Learning", Proc. HLT-NAACL, Rochester, NY, April 2007. (PDF)

2006

B. Hsu and J. Glass, "Spoken Correction for Chinese Text Entry," Proc. 5th International Symposium on Chinese Spoken Language Processing (ISCSLP), Kent Ridge, Singapore, December 2006. (PDF)

M. Peabody and S. Seneff, "Towards Automatic Tone Correction in Non-native Mandarin," Proc. 5th International Symposium on Chinese Spoken Language Processing (ISCSLP), Kent Ridge, Singapore, December 2006. (PDF)

A. Gruenstein and S. Seneff, "Context-Sensitive Language Modeling for Large Sets of Proper Nouns in Multimodal Dialogue Systems," Proc. IEEE/ACL 2006 Workshop on Spoken Language Technology, Palm Beach, Aruba, December 2006. (PDF)

A. Park and J. Glass, "A Novel DTW-based Distance Measure for Speaker Segmentation," Proc. IEEE/ACL 2006 Workshop on Spoken Language Technology, Palm Beach, Aruba, December 2006. (PDF)

K. Saenko and K. Livescu, "An Asynchronous DBN for Audio-Visual Speech Recognition," Proc. IEEE/ACL 2006 Workshop on Spoken Language Technology, Palm Beach, Aruba, December 2006. (PDF)

A. Gruenstein, S. Seneff, and C. Wang, "Scalable and Portable Web-Based Multimodal Dialogue Interaction with Geographical Database," Proc. Interspeech, Pittsburgh, Pennsylvania, September 2006. (PDF)

T. J. Hazen, "Automatic Alignment and Error Correction of Human Generated Transcripts for Long Speech Recordings," Proc. Interspeech, Pittsburgh, Pennsylvania, September 2006. (PDF)

J. Lee and S. Seneff, "Automatic Grammar Correction for Second-Language Learners," Proc. Interspeech, Pittsburgh, Pennsylvania, September 2006. (PDF)

J. Ming, T. J. Hazen, and J. Glass, "Combining Missing-Feature Theory, Speech Enhancement and Speaker-Dependent/-Independent Modeling for Speech Separation," Proc. Interspeech, Pittsburgh, Pennsylvania, September 2006. (PDF)

C. Wang and S. Seneff, "High-Quality Speech Translation in the Flight Domain," Proc. Interspeech, Pittsburgh, Pennsylvania, September 2006. (PDF)

Y. Wang, A. Acero, M. Mahajan, and J. Lee, "Combining Statistical and Knowledge-Based Spoken Language Understanding in Conditional Models," Proc. COLING/ACL, Sydney, Australia, July 2006. (PDF)

B. Hsu and J. Glass, "Style & Topic Language Model Adaptation Using HMM-LDA," Proc. EMNLP, Sydney, Australia, July 2006. (PDF)

E. Filisko and S. Seneff, "Learning Decision Models in Spoken Dialogue Systems via User Simulation," Proc. AAAI Workshop on Statistical and Empirical Approaches for Spoken Dialog Systems, Boston, Massachusetts, July 2006. (PDF)

R. Woo, A. Park, and T. J. Hazen, "The MIT Mobile Device Speaker Verification Corpus: Data collection and preliminary experiments," Proceedings of Odyssey 2006, The Speaker and Language Recognition Workshop, June 2006. (PDF)

G. Choueiter, D. Povey, S.F. Chen, and G. Zweig, "Morpheme-Based Language Modeling for Arabic LVCSR," Proc. ICASSP 2006, Toulouse, France, May 2006. (PDF)

I. L. Hetherington, H. Shu, and J. Glass, "Flexible Multi-Stream Framework for Speech Recognition Using Multi-Tape Finite-State Transducers," Proc. ICASSP 2006, Toulouse, France, May 2006. (PDF)

J. Ming, T. J. Hazen, and J. Glass, "Speaker Verification Over Handheld Devices with Realistic Noisy Speech Data," Proc. ICASSP 2006, Toulouse, France, May 2006. (PDF)

A. Park and J. Glass, "Unsupervised Word Acquisition from Speech Using Pattern Discovery," Proc. ICASSP 2006, Toulouse, France, May 2006. (PDF)

T. N. Sainath and T. J. Hazen, "A Sinusoidal Model Approach to Acoustic Landmark Detection and Segmentation for Robust Segment-Based Speech Recognition," Proc. ICASSP 2006, Toulouse, France, May 2006. (PDF)

Y. Wang, J. Lee, and A. Acero, "Speech Utterance Classification Model Training Without Manual Transcriptions," Proc. ICASSP 2006, Toulouse, France, May 2006. (PDF)

2005

J. Lee and S. Seneff, "Interlingua-Based Translation for Language Learning Systems," Proc. ASRU, 133-138, San Juan, Puerto Rico, December 2005. (PDF)

A. Park and J. Glass, "Towards Unsupervised Pattern Discovery in Speech," Proc. ASRU, 53-58, San Juan, Puerto Rico, December 2005. (PDF)

A. Gruenstein, C. Wang, and S. Seneff, "Context-Sensitive Statistical Language Modeling," Proc. Interspeech, 17-20, Lisbon, Portugal, September 2005. (PDF)

I. Lee Hetherington, "A Multi-Pass, Dynamic-Vocabulary Approach to Real-Time, Large-Vocabulary Speech Recognition," Proc. Interspeech, 545-548, Lisbon, Portugal, September 2005. (PDF)

K. Schutte and J. Glass, "Robust Detection of Sonorant Landmarks," Proc. Interspeech, 1005-1008, Lisbon, Portugal, September 2005. (PDF)

C. Wang, S. Seneff, and G. Chung, "Language Model Data Filtering via User Simulation and Dialogue Resynthesis," Proc. Interspeech, 21-24, Lisbon, Portugal, September 2005. (PDF)

O. Scharenborg and S. Seneff, "A Two-Pass Strategy for Handling OOVs in a Large Vocabulary Recognition Task," Proc. Interspeech, 1669-1672, Lisbon, Portugal, September 2005. (PDF)

G. Chung, S. Seneff, and C. Wang, "Automatic Induction of Language Model Data for a Spoken Dialogue System," Proc. SIGDIAL, Lisbon, Portugal, September 2005. (PDF)

A. Gruenstein, J. Niekrasz, and M. Purver, "Meeting Structure Annotation: Data and Tools," Proc. SIGDIAL, Lisbon, Portugal, September 2005. (PDF)

E. Filisko and S. Seneff, "Developing City Name Acquisition Strategies in Spoken Dialogue Systems Via User Simulation," Proc. SIGDIAL, Lisbon, Portugal, September 2005. (PDF)

T. Hazen, L. Hetherington, H. Shu, and K. Livescu, "Pronunciation modeling using a finite-state transducer representation," Speech Communication. Vol. 46, No. 2, pp. 189-203, June, 2005. (Preprint PDF) (Speech Communication Home Page)

G. Choueiter and J. Glass, "A Wavelet and Filter Bank Framework for Phonetic Classification," Proc. ICASSP, Philadelphia, March 2005. (PDF)

M. Hasegawa-Johnson, J. Baker, S. Borys, K. Chen, E. Coogan, S. Greenberg, A. Juneja, K. Kirchhoff, K. Livescu, K. Sonmez, S. Mohan, J. Muller, and T. Wang, "Landmark-based speech recognition: Report of the 2004 Johns Hopkins Summer Workshop," Proc. ICASSP, Philadelphia, March 2005. (PDF)

A. Park, T. Hazen, and J. Glass, "Automatic Processing of Audio Lectures for Information Retrieval: Vocabulary Selection and Language Modeling," Proc. ICASSP, Philadelphia, March 2005. (PDF)

K. Saenko, K. Livescu, J. Glass, and T. Darrell, "Production Domain Modeling of Pronunciation for Visual Speech Recognition," Proc. ICASSP, Philadelphia, March 2005. (PDF)

S. Sakai, "Additive Modeling of English F0 Contour for Speech Synthesis," Proc. ICASSP, Philadelphia, March 2005. (PDF)

S. Sakai, "Fundamental Frequency Modeling for Speech Synthesis Based on a Statistical Learning Technique," IEICE Transactions on Information and Systems, pp. 489-495, March 2005. (PDF)

2004

J. Glass, E. Weinstein, S. Cyphers, J. Polifroni, G. Chung, and M. Nakano, "A Framework for Developing Conversational User Interfaces," Proc. CADUI, 354-365, Funchal, Portugal, January 2004. (PDF)

S. Seneff, "The Use of Subword Linguistic Modelling for Multiple Tasks in Speech Recognition," Speech Communication, Vol. 42, No. 3-4, pp. 373-390, April 2004. (PDF)

J. Glass, T. Hazen, L. Hetherington and C. Wang, "Analysis and processing of lecture audio data: Preliminary investigations", Proc. HLT-NAACL 2004 Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval, 9-12, Boston, MA, May, 2004. (PDF)

E. Filisko and S. Seneff, "Error Detection and Recovery in Spoken Dialogue Systems," Proc. HLT-NAACL 2004 Workshop on Spoken Language Understanding for Conversational Systems, 31-38, Boston, MA, May, 2004. (PDF)

P. Boda and E. Filisko, "Virtual Modality: a Framework for Testing and Building Multimodal Applications," Proc. HLT-NAACL 2004 Workshop on Spoken Language Understanding for Conversational Systems, 17-24, Boston, MA, May, 2004. (PDF)

K. Livescu and J. Glass, "Feature-based pronunciation modeling for speech recognition," Proc. HLT/NAACL, Boston, MA, May 2004. (PDF)

J. Lee, "Automatic Article Restoration," Proc. HLT-NAACL 2004 Student Research Workshop, Boston, MA, 195-200, May, 2004. (PDF)

E. McDermott and T. Hazen, "Minimum classification error training of landmark models for real-time continuous speech recognition," Proc. ICASSP, 937-940, Montreal, Quebec, May, 2004. (PDF)

S. Sakai, "F0 Modeling with Multi-Layer Additive Modeling Based on a Statistical Learning Technique," Proc. ISCA Speech Synthesis Workshop, 151-154, Pittsburgh, PA, June 2004. (PDF)

C. Wang and S. Seneff, "High-quality Speech Translation for Language Learning," Proc. InSTIL Symposium on Computer Assisted Language Learning, 99-102, Venice, Italy, 2004. (PDF)

S. Seneff, C. Wang, and J. Zhang, "Spoken Conversational Interaction for Language Learning," Proc. InSTIL Symposium on Computer Assisted Language Learning, 151-154, Venice, Italy, 2004. (PDF)

M. Peabody, S. Seneff, and C. Wang, "Mandarin Tone Acquisition through Typed Dialogues," Proc. InSTIL Symposium on Computer Assisted Language Learning, 173-176, Venice, Italy, 2004. (PDF)

J. Lee and S. Seneff, "Translingual Grammar Induction," Proc. Interspeech, 724-727, Jeju, South Korea, October 2004. (PDF)

J-M Kim, C. Wang, M. Peabody, and S. Seneff, "An Interactive English Pronunciation Dictionary for Korean Learners," Proc. Interspeech, 1145-1148, Jeju, South Korea, October 2004. (PDF)

G. Chung, C. Wang, S. Seneff, E. Filisko, and M. Tang, "Combining Linguistic Knowledge and Acoustic Information in Automatic Pronunciation Lexicon Generation," Proc. Interspeech, 328-332, Jeju, South Korea, October 2004. (PDF)

G. Chung, S. Seneff, C. Wang, and L. Hetherington, "A Dynamic Vocabulary Spoken Dialogue Interface," Proc. Interspeech, 327-330, Jeju, South Korea, October 2004. (PDF)

K. Livescu and J. Glass, "Feature-based pronunciation modeling with trainable asynchrony probabilities," Proc. ICSLP, Jeju, South Korea, October 2004. (PDF)

L. Hetherington, "The MIT Finite-State Transducer Toolkit for Speech and Language Processing," Proc. ICSLP, Jeju, South Korea, October 2004. (PDF)

A. Park and T. J. Hazen, "A comparison of normalization and training approaches for ASR-dependent speaker identification," Proc. Interspeech, Jeju, South Korea, October, 2004. (PDF)

T.J. Hazen, E. Saenko, C.H. La and J. Glass, "A segment-based audio-visual speech recognizer: Data collection, development and initial experiments," Proc. ICMI, State College, PA, October 2004. (PDF)

E. Saenko, T. Darrell, and J. Glass, "Articulatory Features for Robust Visual Speech Recognition," Proc. ICMI, State College, PA, October 2004. (PDF)

2003

J. Glass, "A Probabilistic Framework for Segment-Based Speech Recognition," Computer Speech and Language 17, 137-152, 2003. (PDF)

J. Glass and S. Seneff, "Flexible and Personalizable Mixed-Initiative Dialogue Systems," Proc. HLT-NAACL Workshop on Research Directions in Dialogue Processing, Edmonton, Canada, May 2003. (PDF)

G. Chung, S. Seneff, and C. Wang, "Automatic Acquisition of Names Using Speak and Spell Mode in Spoken Dialogue Systems," Proc. HLT-NAACL 2003, Edmonton, Canada, May, 2003, pp. 197-200. (PDF)

E. Filisko and S. Seneff, "A Context Resolution Server for the Galaxy Conversational Systems," Proc. Eurospeech, 197-200, Geneva, Switzerland, September 2003. (PDF)

T. J. Hazen, D. A. Jones, A. Park, L. C. Kukolich, and D. A. Reynolds, "Integration of Speaker Recognition into Conversational Spoken Dialogue Systems," Proc. Eurospeech, 1961-1964, Geneva, Switzerland, September 2003. (PDF)

J. Schalkwyk, I. Lee Hetherington, and E. Story, "Speech Recognition with Dynamic Grammars Using Finite-State Transducers," Proc. Eurospeech, 1969-1972, Geneva, Switzerland, September 2003. (PDF)

K. Livescu, J. Glass, and J. Bilmes, "Hidden Feature Models for Speech Recognition Using Dynamic Bayesian Networks," Proc. Eurospeech, 2529-2532, Geneva, Switzerland, September 2003. (PDF)

M. Nakano and T. J. Hazen, "Using Untranscribed User Utterances for Improving Language Models based on Confidence Scoring," Proc. Eurospeech, 417-420, Geneva, Switzerland, September 2003. (PDF)

J. Polifroni, G. Chung, and S. Seneff, "Towards the Automatic Generation of Mixed-Initiative Dialogue Systems from Web Content," Proc. Eurospeech, 193-196, Geneva, Switzerland, September 2003. (PDF)

S. Seneff, G. Chung, and C. Wang, "Empowering End Users to Personalize Dialogue Systems through Spoken Interaction," Proc. Eurospeech, 749-752, Geneva, Switzerland, September 2003. (PDF)

S. Seneff, C. Wang, and T. J. Hazen, "Automatic Induction of N-Gram Language Models from a Natural Language Grammar," Proc. Eurospeech, 641-644, Geneva, Switzerland, September 2003. (PDF)

M. Tang, S. Seneff, and V. Zue, "Modeling Linguistic Features in Speech Recognition," Proc. Eurospeech, 2585-2588, Geneva, Switzerland, September 2003. (PDF)

T. J. Hazen, E. Weinstein, and A. Park, "Towards Robust Person Recognition on Handheld Devices Using Face and Speaker Identification Technologies," Proc. ICMI, Vancouver, Canada, November 2003. (PDF)

T. J. Hazen, E. Weinstein, R. Kabir, A. Park, and B. Heisele, "Multi-Modal Face and Speaker Identification on a Handheld Device," Proc. Workshop on Multimodal User Authentication, 113-120, Santa Barbara, California, December 2003. (PDF)

S. Sakai and J. Glass, "Fundamental Frequency Modeling for Corpus-Based Speech Synthesis Based on a Statistical Learning Technique," Proc. ASRU, 712-717, St. Thomas, U. S. Virgin Islands, December 2003. (PDF)

H. Shu, I. Lee Hetherington, and J. Glass, "Baum-Welch Training for Segment-Based Speech Recognition," Proc. ASRU, 43-48, St. Thomas, U. S. Virgin Islands, December 2003. (PDF)

M. Tang, S. Seneff, and V. Zue, "Two-Stage Continuous Speech Recognition Using Feature-Based Models: A Preliminary Study," Proc. ASRU, 49-54, St. Thomas, U. S. Virgin Islands, December 2003. (PDF)

2002

G. Zweig, J. Bilmes, T. Richardson, K. Filali, K. Livescu, P. Xu, K. Jackson, Y. Brandman, E. Sandness, E. Holtz, J. Torres, B. Byrne, "Structurally Discriminative Graphical Models for Automatic Speech Recognition: Results from the 2001 Johns Hopkins Summer Workshop," Proc. ICASSP, Orlando, Florida, June 2002. (PDF)

M. Tang, X. Luo, and S. Roukos, "Active Learning for Statistical Natural Language Parsing," Proc. ACL, Philadelphia, PA, July 2002. (PDF)

T. J. Hazen, S. Seneff, and J. Polifroni, "Recognition confidence scoring and its use in speech understanding systems," Computer Speech and Language, 16, 49-67, 2002. (PDF)

S. Seneff, "Response Planning and Generation in the MERCURY Flight Reservation System," Computer Speech and Language, 16, 283-312, 2002. (PDF)

I. Bazzi and J. Glass, "A Multi-Class Approach for Modelling Out-of-Vocabulary Words," Proc. ICSLP, 1613-1616, Denver, CO, September 2002. (PDF)

G. Chung and S. Seneff, "Integrating Speech with Keypad Input for Automatic Entry of Spelling and Pronunciation of New Words," Proc. ICSLP, 2061-2064, Denver, CO, September 2002. (PDF)

T. J. Hazen, I. Lee Hetherington, H. Shu, and K. Livescu, "Pronunciation Modeling Using a Finite-State Transducer Representation," Proc. ISCA Workshop on Pronunciation Modeling and Lexicon Adaptation, 99-104, Estes Park, CO, September 2002. (PDF)

X. Mou, S. Seneff, and V. Zue, "Integration of Supra-Lexical Linguistic Models with Speech Recognition Using Shallow Parsing and Finite State Transducers," Proc. ICSLP, 1289-1292, Denver, CO, September 2002. (PDF)

A. Park and T. J. Hazen, "ASR Dependent Techniques for Speaker Identification," Proc. ICSLP, 1337-1340, Denver, CO, September 2002. (PDF)

J. Polifroni and G. Chung, "Promoting Portability in Dialogue Management," Proc. ICSLP, 2721-2724, Denver, CO, September 2002. (PDF)

E. Pusateri and T. J. Hazen, "Rapid Speaker Adaptation Using Speaker Clustering," Proc. ICSLP, 61-64, Denver, CO, September 2002. (PDF)

H. Shu and I. Lee Hetherington, "EM Training of Finite-State Transducers and Its Application to Pronunciation Modeling," Proc. ICSLP, 1293-1296, Denver, CO, September 2002. (PDF)

J. Yi and J. Glass, "Information-Theoretic Criteria for Unit Selection Synthesis," Proc. ICSLP, 2617-2620, Denver, CO, September 2002. (PDF)

2001

T.J. Hazen and I. Bazzi, "A Comparison and Combination of Methods for OOV Word Detection and Word Confidence Scoring," Proc. ICASSP, Salt Lake City, UT, May 2001. (PDF)

X. Mou and V. Zue, "Sublexical Modelling Using a Finite State Transducer Framework," Proc. ICASSP, Salt Lake City, UT, May 2001. (PDF)

I. Bazzi and J. Glass, "Learning Units for Domain-Independent Out-of-Vocabulary Word Modelling," Proc. Eurospeech, Aalborg, Denmark, September 2001. (PDF)

J. Glass and E. Weinstein, "SPEECHBUILDER: Facilitating Spoken Dialogue Systems Development," Proc. Eurospeech, Aalborg, Denmark, September 2001. (PDF)

M. Nakano, T. Minami, S. Seneff, T. J. Hazen, D. Scott Cyphers, J. Glass, J. Polifroni, V. Zue, "Mokusei: A Telephone-based Japanese Conversational System in the Weather Domain," Proc. Eurospeech, Aalborg, Denmark, September 2001. (PDF)

T. J. Hazen, I. Lee Hetherington and A. Park, "FST-Based Recognition Techniques for Multi-Lingual and Multi-Domain Spontaneous Speech," Proc. Eurospeech, Aalborg, Denmark, September 2001. (PDF)

I. Lee Hetherington, "An Efficient Implementation of Phonological Rules using Finite-State Transducers," Proc. Eurospeech, Aalborg, Denmark, September 2001. (PDF)

K. Livescu and J. Glass, "Segment-Based Recognition on the PhoneBook Task: Initial Results and Observations on Duration Modeling," Proc. Eurospeech, Aalborg, Denmark, September 2001. (PDF)

X. Mou, S. Seneff and V. Zue, "Context-dependent Probabilistic Hierarchical Sub-lexical Modelling Using Finite State Transducers," Proc. Eurospeech, Aalborg, Denmark, September 2001. (PDF)

M. Tang, C. Wang, and S. Seneff, "Voice Transformations: From Speech Synthesis to Mammalian Vocalizations," Proc. Eurospeech, Aalborg, Denmark, September 2001. (PDF)

C. Wang and S. Seneff, "Lexical Stress Modeling for Improved Speech Recognition of Spontaneous Telephone Speech in the JUPITER Domain," Proc. Eurospeech, Aalborg, Denmark, September 2001. (PDF)

H. Dolfing and L. Hetherington, "Incremental Language Models for Speech Recognition using Finite-State Transducers," Proc. ASRU, Madonna di Campiglio, Italy, December 2001. (PDF)

2000

V. Zue, et al., "JUPITER: A Telephone-Based Conversational Interface for Weather Information," IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 1, January 2000. (PDF)

S. Seneff and J. Polifroni, "Dialogue Management in the Mercury Flight Reservation System," Proc. Dialogue Workshop, ANLP-NAACL, Seattle, April 2000. (PDF)

T. J. Hazen, "A comparison of novel techniques for rapid speaker adaptation," Speech Communication, Vol. 31 (2000), pp. 15-33, May 2000. (gzip'd PS) (PDF)

J. Polifroni and S. Seneff, "Galaxy-II as an Architecture for Spoken Dialogue Evaluation," Proc. LREC, Athens, Greece, May 2000. (PDF)

I. Bazzi and J. Glass, "Heterogeneous Lexical Units for Automatic Speech Recognition: Preliminary Investigations," Proc. ICASSP, Istanbul, Turkey, June 2000. (PDF)

S. Kamppari and T.J. Hazen, "Word and Phone Level Acoustic Confidence Scoring," Proc. ICASSP, Istanbul, Turkey, June 2000. (PDF)

K. Livescu and J. Glass, "Lexical Modeling of Non-native Speech for Automatic Speech Recognition," Proc. ICASSP, Istanbul, Turkey, June 2000. (PDF)

K. Ng, "Information Fusion for Spoken Document Retrieval," Proc. ICASSP, Istanbul, Turkey, June 2000. (PDF)

C. Wang and S. Seneff, "Robust Pitch Tracking for Prosodic Modeling in Telephone Speech," Proc. ICASSP, Istanbul, Turkey, June 2000. (PDF)

S. Seneff, J. Glass, T.J. Hazen, Y. Minami, J. Polifroni, and V. Zue, "MOKUSEI: A Japanese Spoken Dialogue System in the Weather Domain," NTT R&D Vol. 49, No. 7, 2000.

V. Zue and J. Glass, "Conversational Interfaces: Advances and Challenges," Proceedings of the IEEE, Special Issue on Spoken Language Processing, Vol. 88, August 2000. (PDF)

T. Hazen, T. Burianek, J. Polifroni and S. Seneff, "Recognition Confidence Scoring for Use in Speech Understanding Systems," Proc. ISCA Tutorial and Research Workshop: ASR2000, Paris, France, September 2000. (PDF)

V. Zue, et al., "From JUPITER to MOKUSEI: Multilingual Conversational Systems in the Weather Domain," Proc. Workshop on Multilingual Speech Communications (MSC2000), Kyoto, Japan, October 2000. (gzip'd PS)

L. Baptist and S. Seneff, "Genesis-II: A Versatile System for Language Generation in Conversational System Applications," Proc. ICSLP, Beijing, China, October 2000. (PDF)

I. Bazzi and J. Glass, "Modeling Out-of-Vocabulary Words for Robust Speech Recognition," Proc. ICSLP, Beijing, China, October 2000. (PDF)

I. Bazzi and D. Katabi, "Using Support Vector Machines for Spoken Digit Recognition," Proc. ICSLP, Beijing, China, October 2000. (PDF)

G. Chung, "A Three-stage Solution for Flexible Vocabulary Speech Understanding," Proc. ICSLP, Beijing, China, October 2000. (PDF)

G. Chung, "Automatically Incorporating Unknown Words in Jupiter," Proc. ICSLP, Beijing, China, October 2000. (PDF)

J. Glass, J. Polifroni, S. Seneff and V. Zue, "Data Collection and Performance Evaluation of Spoken Dialogue Systems: The MIT Experience," Proc. ICSLP, Beijing, China, October 2000. (PDF)

T.J. Hazen, T. Burianek, J. Polifroni and S. Seneff, "Integrating Recognition Confidence Scoring with Language Understanding and Dialogue Modeling," Proc. ICSLP, Beijing, China, October 2000. (PDF)

X. Mou and V. Zue, "The Use of Dynamic Reliability Scoring in Speech Recognition," Proc. ICSLP, Beijing, China, October 2000. (PDF)

E. Sandness and I.L. Hetherington, "Keyword-based Discriminative Training of Acoustic Models," Proc. ICSLP, Beijing, China, October 2000. (PDF)

S. Seneff, C. Chuu, and D. S. Cyphers, "Orion: From On-line Interaction to Off-line Delegation," Proc. ICSLP, Beijing, China, October 2000. (PDF)

S. Seneff and J. Polifroni, "Formal and Natural Language Generation in the Mercury Conversational System," Proc. ICSLP, Beijing, China, October 2000. (PDF)

N. Ström and S. Seneff, "Intelligent Barge-in in Conversational Systems," Proc. ICSLP, Beijing, China, October 2000. (PDF)

C. Wang and S. Seneff, "Improved Tone Recognition by Normalizing For Coarticulation and Intonation Effects," Proc. ICSLP, Beijing, China, October 2000. (PDF)

C. Wang, S. Cyphers, X. Mou, J. Polifroni, S. Seneff, J. Yi and V. Zue, "Muxing: A Telephone-Access Mandarin Conversational System," Proc. ICSLP, Beijing, China, October 2000. (PDF)

J. Yi, J. Glass and L. Hetherington, "A Flexible, Scalable Finite-State Transducer Architecture for Corpus-Based Concatenative Speech Synthesis," Proc. ICSLP, Beijing, China, October 2000. (PDF)

1999

J. Glass, T.J. Hazen and L. Hetherington, "Real-time telephone-based speech recognition in the JUPITER domain," Proc. ICASSP, Phoenix, AZ, March 1999. (PDF)

G. Chung and S. Seneff, "A Hierarchical Duration Model for Speech Recognition Based on the ANGIE Framework," Speech Communication, 27, 113-134, 1999. (gzip'd PS)

G. Chung, S. Seneff and I.L. Hetherington, "Towards Multi-Domain Speech Understanding Using a Two-Stage Recognizer," Proc. Eurospeech, Budapest, Hungary, September 1999. (PDF)

S. Seneff, R. Lau and J. Polifroni, "Organization, Communication, and Control in the GALAXY-II Conversational System," Proc. Eurospeech, Budapest, Hungary, September 1999. (PDF)

J. Glass, "Challenges for Spoken Dialogue Systems," Proc. ASRU, Keystone, CO, December 1999. (PDF)

N. Ström, L. Hetherington, T.J. Hazen, E. Sandness and J. Glass, "Acoustic Modeling Improvements in a Segment-Based Speech Recognizer," Proc. ASRU, Keystone, CO, December 1999. (PDF)

1998

T.J. Hazen and A. Halberstadt, "Using Aggregation to Improve the Performance of Mixture Gaussian Acoustic Models," Proc. ICASSP, Seattle, WA, May 1998. (PDF)

K. Ng and V. Zue, "Phonetic Recognition for Spoken Document Retrieval," Proc. ICASSP, Seattle, WA, May 1998. (PDF)

J. Polifroni, S. Seneff, J. Glass, and T.J. Hazen, "Evaluation Methodology for a Telephone-based Conversational System," Proc. LREC, 42-50, Granada, Spain, May 1998. (PDF)

G. Chung and S. Seneff, "Improvements in Speech Understanding Accuracy through the Integration of Hierarchical Linguistic, Prosodic, and Phonological Constraints in the Jupiter Domain," Proc. ICSLP, Sydney, Australia, November 1998. (PDF)

J. Glass and T.J. Hazen, "Telephone-Based Conversational Speech Recognition in the Jupiter Domain," Proc. ICSLP, Sydney, Australia, November 1998. (PDF)

A. Halberstadt and J. Glass, "Heterogeneous Measurements and Multiple Classifiers for Speech Recognition," Proc. ICSLP, Sydney, Australia, November 1998. (PDF)

R. Lau and S. Seneff, "A Unified System for Sublexical and Linguistic Modelling Supporting Flexible Vocabulary Speech Understanding," Proc. ICSLP, Sydney, Australia, November 1998. (PDF)

S. Lee and J. Glass, "Real-Time Probabilistic Segmentation for Segment-Based Speech Recognition," Proc. ICSLP, Sydney, Australia, November 1998. (PDF)

C. Pao, P. Schmid, and J. Glass, "Confidence Scoring for Speech Understanding Systems," Proc. ICSLP, Sydney, Australia, November 1998. (PDF)

S. Seneff, "The Use of Linguistic Hierarchies in Speech Understanding," Proc. ICSLP, Sydney, Australia, November 1998. (PDF)

S. Seneff, E. Hurley, R. Lau, C. Pao, P. Schmid, and V. Zue, "Galaxy-II: A Reference Architecture for Conversational System Development," Proc. ICSLP, Sydney, Australia, November 1998. (PDF)

C. Wang and S. Seneff, "A Study of Tones and Tempo in Continuous Mandarin Digit Strings and their Application in Telephone Quality Speech Recognition," Proc. ICSLP, Sydney, Australia, November 1998. (PDF)

J. Yi and J. Glass, "Natural-Sounding Speech Synthesis Using Variable-Length Units," Proc. ICSLP, Sydney, Australia, November 1998. (PDF)


	32 Vassar Street Cambridge, MA 02139 USA (+1) 617.253.3049