ABOUT SLS
What We Do
Provide Universal Access
As computers increasingly permeate our daily lives, our demand for
online information is skyrocketing. Growing numbers of us turn to the
Internet to catch up on the latest news, sports, and weather, obtain
stock quotes, reserve airline flights, conduct research, or check out
what's playing at local theaters. Unfortunately, navigating through
vast amounts of data to obtain useful information can require a
time-consuming series of keyboard entries and mouse clicks, and
technical savvy. But there is a more efficient, more flexible tool
available for human-computer interaction, something that even the most
technically challenged of us could use anywhere, any time: spoken
language.
To make it possible for humans to speak to computers, a
conversational interface is needed. A conversational interface enables
humans to converse with machines (in much the same way we communicate
with one another) in order to create, access, and manage information
and to solve problems. It is what Hollywood and every "vision of the
future" tell us we must have. Since 1989, getting computers to
communicate the way people do -- by speaking and listening -- has been
the objective of the Spoken Language Systems (SLS) Group at MIT's
Computer Science and Artificial Intelligence Laboratory.
How does it work?
Imagine talking to a computer to find a needle-in-the-haystack job
listing, or showtimes of a movie premiere at the closest theater. Today,
obtaining such information online requires a programmed transaction between
the user, who clicks through a pre-determined sequence of options and views
results, and the computer, which retrieves user-selected data. With spoken
language systems, however, user and machine can engage in a spontaneous,
interactive conversation, incrementally arriving at the desired information
in far fewer steps. A case in point is the following excerpt from a conversation
between a user and JUPITER,
an SLS-based weather forecast system:
User: Yes, I would like the weather forecast for London, England, please.
JUPITER: In London in England Wednesday, partly cloudy skies with periods of sunshine. High 82 and low 63. Is there something else?
User: What is that in degrees Celsius, please?
JUPITER: In London in England Wednesday, high 28 Celsius and low 17 Celsius. What else?
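A quick check of the arithmetic behind the second reply: converting the Fahrenheit figures with the standard formula C = (F - 32) × 5/9 and rounding to whole degrees reproduces the numbers the system speaks. The short Python snippet below is purely illustrative and is not part of any SLS code.

    # Convert the Fahrenheit forecast to Celsius, rounded to whole degrees.
    def fahrenheit_to_celsius(f):
        return round((f - 32) * 5 / 9)

    print(fahrenheit_to_celsius(82), fahrenheit_to_celsius(63))  # prints: 28 17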
SLS researchers make this kind of dialogue look easy by empowering the
computer to perform five main functions in real time: speech recognition --
converting the user's speech to a text sentence of distinct words; language
understanding -- breaking down the recognized sentence grammatically and
systematically representing its meaning; information retrieval -- obtaining
targeted data, based on that meaning representation, from the appropriate
online source; language generation -- building a text sentence that presents
the retrieved data in the user's preferred language; and speech synthesis
-- converting that text sentence into computer-generated speech. Throughout
the conversation, the computer also remembers previous exchanges. In this
example, JUPITER
can respond to "What is that in degrees Celsius, please?" because
the user has just asked about weather conditions in London. Otherwise,
the system would ask the user to clarify the question.
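The division of labor just described can be pictured as a simple processing chain in which the output of each stage feeds the next, with a discourse memory consulted along the way. The Python sketch below is a minimal, hypothetical skeleton of such a chain; every function name and data structure is invented for illustration and is not the SLS implementation.

    # Hypothetical five-stage conversational pipeline with a discourse memory.
    discourse_context = {}                 # remembers earlier turns, e.g. the last city mentioned

    def recognize(audio):                  # speech recognition: audio -> text sentence
        return "what is that in degrees celsius please"

    def understand(sentence, context):     # language understanding: text -> meaning representation
        meaning = {"request": "weather", "units": "celsius"}
        # "that" names no city, so fall back on the discourse context;
        # if the context were empty, a real system would ask the user to clarify.
        meaning["city"] = context.get("city")
        return meaning

    def retrieve(meaning):                 # information retrieval: meaning -> data from an online source
        return {"city": meaning["city"], "high_c": 28, "low_c": 17}

    def generate(data):                    # language generation: data -> text reply
        return "In %s Wednesday, high %d Celsius and low %d Celsius." % (
            data["city"], data["high_c"], data["low_c"])

    def synthesize(text):                  # speech synthesis: text -> audio (stubbed as a print)
        print(text)

    # One turn of the JUPITER exchange, assuming London was established earlier.
    discourse_context["city"] = "London"
    meaning = understand(recognize(b"..."), discourse_context)
    synthesize(generate(retrieve(meaning)))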
Many speech-based interfaces can be considered conversational, and
they may be differentiated by the degree to which the system
maintains an active role in the conversation, or by the complexity of the
potential dialogue. At one extreme are system-initiative, or
"directed-dialogue" transactions where the computer takes complete
control of the interaction by requiring that the user answer a set of
prescribed questions, much like the touch-tone implementation of
interactive voice response (IVR) systems. In the case of air travel
planning, for example, a directed-dialogue system could ask the user
to "Please say just the departure city." Since the user's options are
severely restricted, successful completion of such transactions is
easier to attain, and indeed some successful demonstrations and
commercial deployments of such systems have been made. At the other
extreme are user-initiative systems in which the user has complete
freedom in what they say to the system (e.g., "I want to visit my
grandmother"), while the system remains relatively passive, asking only
for clarification when necessary. In this case, the user may feel
uncertain as to what capabilities exist, and may, as a consequence,
stray quite far from the domain of competence of the system, leading
to great frustration because nothing is understood. Lying between
these two extremes are systems that incorporate a "mixed-initiative",
goal-oriented dialogue, in which both the user and the computer
participate actively to solve a problem interactively using a
conversational paradigm. It is this latter mode of interaction that
is the primary focus of our research.
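One way to make these interaction styles concrete is to think of the system as filling a set of slots (departure city, destination, date, and so on). The toy Python sketch below, whose slot names and prompts are invented for illustration, contrasts a directed dialogue, which asks one prescribed question per slot, with a mixed-initiative turn, which absorbs whatever the user volunteers and prompts only for what is still missing.

    # Toy contrast between directed and mixed-initiative slot filling (illustrative only).
    REQUIRED_SLOTS = ["departure city", "destination city", "travel date"]

    def directed_dialogue(get_answer):
        """System initiative: ask for each slot in a fixed order, one per turn."""
        slots = {}
        for slot in REQUIRED_SLOTS:
            print("Please say just the " + slot + ".")
            slots[slot] = get_answer(slot)
        return slots

    def mixed_initiative_turn(slots, volunteered):
        """Mixed initiative: accept any slots the user volunteered this turn,
        then ask only about what is still missing."""
        slots.update(volunteered)
        missing = [s for s in REQUIRED_SLOTS if s not in slots]
        if missing:
            print("Could you tell me the " + missing[0] + "?")
        else:
            print("Looking up flights...")
        return slots

    # "I want to fly from Boston to Denver" fills two slots in a single turn;
    # the system then needs to ask only for the travel date.
    slots = mixed_initiative_turn({}, {"departure city": "Boston", "destination city": "Denver"})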
In 1994, we developed a conversational architecture called GALAXY
that incorporates the necessary human language technologies (i.e.,
speech understanding and generation, discourse and dialogue) to enable
advanced research in mixed-initiative interaction. Since then, the
open source architecture has been adopted by many researchers around
the world as a framework for conducting their research on advanced
spoken dialogue systems. Here at MIT, we have developed many
prototype conversational systems, several of which are deployed on
toll-free telephone numbers, that enable users to access information
about weather forecasts (JUPITER), airline
scheduling (PEGASUS) and
flight planning (MERCURY), Cambridge city
locations (VOYAGER), and
selected Web-based information (WebGALAXY).
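In its open-source form, GALAXY coordinates a collection of specialized servers (recognition, parsing, dialogue management, and so on) through a central hub that routes message frames among them. The miniature Python sketch below imitates only that routing idea; the server names, message format, and control flow are simplifications invented for illustration, not the GALAXY implementation.

    # Caricature of a hub-and-spoke architecture: a central hub passes a message
    # frame from one specialized server to the next according to a simple plan.
    class Hub:
        def __init__(self):
            self.servers = {}

        def register(self, name, handler):
            self.servers[name] = handler

        def route(self, frame, plan):
            for name in plan:                 # forward the frame to each server in turn
                frame = self.servers[name](frame)
            return frame

    hub = Hub()
    hub.register("recognizer", lambda f: dict(f, text="weather in london"))
    hub.register("parser",     lambda f: dict(f, meaning={"request": "weather", "city": "London"}))
    hub.register("dialogue",   lambda f: dict(f, reply="In London Wednesday, partly cloudy."))

    result = hub.route({"audio": "..."}, ["recognizer", "parser", "dialogue"])
    print(result["reply"])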
Raising the Level of Human-to-Computer Conversation
Although tremendous progress has been made over the last decade in
developing advanced conversational spoken language technology, much
additional progress must be achieved before conversational interfaces
approach the level of naturalness of human-human conversations. Today
SLS researchers are refining core human language technologies and are
incorporating speech with other kinds of natural input modalities such
as pen and gesture. They are working to upgrade the efficiency and
naturalness of application-specific conversations, improve new-word
detection and learning capability during speech recognition, increase
the portability of core technologies, and develop new applications. As
the SLS Group continues to address these issues, it brings us closer
to the day when anyone, anywhere, any time, can interact easily with
computers.
Further Reading:
V. Zue and J. Glass, "Conversational Interfaces: Advances and Challenges," Proceedings of the IEEE, Special Issue on Spoken Language Processing, Vol. 88, August 2000. (PDF)
J. Glass and S. Seneff, "Flexible and Personalizable Mixed-Initiative Dialogue Systems," presented at the HLT-NAACL 2003 Workshop on Research Directions in Dialogue Processing, Edmonton, Canada, May 2003. (PDF)
V. Zue, et al., "JUPITER: A Telephone-Based Conversational Interface for Weather Information," IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 1, January 2000. (PDF)