SLS RESEARCH
Language is our primary means of communication, and speech is one of
its most convenient and efficient modes of conveyance. In the Spoken
Language Systems group we endeavor to create the technologies that
enable advanced spoken language interaction between humans and
machines. For someone in this line of research, these are exciting
times. After many decades of laboratory research, speech technology
has reached a tipping point in our society whereby the notion of
talking with a computer has become an everyday occurrence via
smartphones and other devices that are becoming commercially
available. People want to talk to their devices in all aspects of
their lives, whether at home, at work, at play, or somewhere in
between.
When people think about speech technology, they usually mean much more
than just speech recognition, which is the process of identifying what
words have been spoken. To do something useful, a machine typically
needs to understand the underlying meaning, often in the larger
context of a multi-turn interaction, and generate some kind of
response to hold up its side of the conversation. Thus, a suite of
technologies is necessary to enable these capabilities.
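To make the recognize-understand-respond loop concrete, here is a minimal sketch of such a pipeline. Everything in it is a hypothetical placeholder: the function names, the toy keyword-based intent logic, and the canned replies stand in for what would really be an acoustic model, a semantic parser, and a dialogue manager.

```python
# Minimal sketch of a spoken-dialogue pipeline: recognize -> understand -> respond.
# All names and logic are illustrative placeholders, not an actual SLS system.

def recognize(audio: str) -> str:
    """Stand-in for speech recognition; here the 'audio' is already text."""
    return audio.lower()

def understand(words: str, context: list) -> dict:
    """Toy intent extraction via keyword matching, consulting dialogue context."""
    if "weather" in words:
        return {"intent": "weather_query"}
    if "yes" in words and context and context[-1]["intent"] == "confirm":
        return {"intent": "confirmed"}
    return {"intent": "unknown"}

def respond(frame: dict) -> str:
    """Generate the system's side of the conversation from the intent frame."""
    replies = {
        "weather_query": "Which city would you like the forecast for?",
        "unknown": "Sorry, could you rephrase that?",
    }
    return replies.get(frame["intent"], "OK.")

# Multi-turn interaction: each turn updates the shared context.
context = []
for utterance in ["What's the weather like?", "Play some music"]:
    frame = understand(recognize(utterance), context)
    context.append(frame)
    print(respond(frame))
```

The point of the sketch is the shared `context` list: understanding a turn may depend on earlier turns, which is what distinguishes multi-turn interaction from isolated utterance recognition.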
Speech is more than language, however. When we speak, the resulting
waveform contains information about our identity, emotional state,
and health, in addition to all the qualities associated with the
linguistic message such as which language, dialect, and speaking style
we use. Technologies that are capable of extracting relevant
information about these different facets of the speech signal will
also play useful roles in our lives. Finally, speech recordings also
contain information about the local environment, so, ultimately,
speech is but one component of a larger audio tapestry that needs to
be understood, perhaps jointly with other perceptual modalities such
as vision.
The SLS group addresses a broad range of research topics, but they can
generally be grouped according to three basic questions: 1) who is
talking, 2) what is said, and 3) what is meant. The first area
focuses on paralinguistic issues like speaker verification, language
and dialect identification, and speaker diarization (i.e., who spoke
when). However, we are also beginning to examine health-related
issues as they are manifested in the speech signal. The second
research area covers core speech recognition capabilities and
addresses challenges related to noise robustness, limited linguistic
resources, and unsupervised language acquisition. The third and final
area focuses more on the boundary between speech and natural language
processing, and includes topics related to speech understanding, but
also related areas such as sentiment analysis and dialogue. Some of
this research focuses more on open-ended, user-generated text content,
such as posts in social forums.
Research in speech and language processing is highly experimental,
typically involving large quantities of either annotated or
unannotated data. The mathematical models we create draw heavily from
machine learning techniques such as graphical models and deep neural
networks. While recent advances are remarkable, they are only a first
step toward the truly natural spoken-language human-machine
interaction depicted in science fiction movies and literature.