Random Newsbytes: Next: the conversational interface?: Challenges remain
in improving how a computer handles a dialogue
Tony Waltham*
12/10/97
Bangkok Post
Copyright 1997 The Bangkok Post
Computer users have largely shifted from a command line interface to a graphical user interface,
using a mouse to move around the screen and clicking on icons to "launch" programs.
Making this possible have been the rapid advances in processing power and the steady drop in the
cost of computer memory and storage. Ten years ago, Windows/286 showed the direction, but Windows
ran like treacle on 286 machines of that era.
Even the Intel 80386 processor, the newest chip on the block a decade back, did not really have
enough power for Windows, and it took the 486-based machines to enable Windows 3.x to really gain
its commanding footprint on the desktop.
Similarly, today you can talk to your PC.
Dragon Dictate is one popular PC application, while IBM offers comparable products. These systems can
be trained to improve their recognition of a user's commands, but they are sensitive to the
microphone used and sometimes require a specific sound card.
We are still at the pioneering stage with PC-based voice recognition software: if the user
cooperates and is patient, it can usually be "made" to work.
However, Intel and Microsoft have both stated that the "human interface" will drive sales of
future generations of microprocessors and software applications, and that the continuing miniaturisation
giving us faster chips and more memory per square inch will make this possible.
MIT Computer Science Laboratory's Associate Director Victor Zue explained to delegates at the
Fourth Natural Language Processing Pacific Rim Symposium in Phuket last week that this and two
other factors were driving what he called the "conversational interface," which he said was "inevitable."
One underlying fundamental is "the human desire to communicate," while the other -- in addition
to miniaturisation -- is the increasing connectivity and networking, such as over the Internet.
He demonstrated live to the audience one such manifestation of the conversational interface, using
a telephone handset as the "client". Dialling up MIT in Cambridge, Massachusetts, Prof Zue spoke to
a computer called "Jupiter".
He asked for a weather forecast for the Cambridge area. Then he asked it how many cities it knew in
Thailand. The response was "I know one city in Thailand ... Bangkok." Prof Zue then asked for a forecast
for Bangkok, and got an immediate response over the telephone from the MIT computer.
This demonstration of conversational access to online information or services clearly shows that it
can be done today, although Prof Zue explained that the "expertise" demonstrated was limited to a
narrow domain of knowledge, that is, one where both the expected questions and the answers are
fairly limited in scope.
Other such "domains" that have been developed today can be to handle other on-line enquiries such as
about movies that are showing, restaurants or job opportunities.
Also demonstrated was "virtual browsing" using a conversational interface, whereby information was
requested verbally and a computer looked it up on the Web and presented the responses on a computer
screen.
Queries could be made regarding the information displayed, and the responses again appeared on the
screen in text mode.
There are clear opportunities for such services once the technology -- "where much research remains
to be done" -- becomes more sophisticated, and then computers will appear in places where they are not
found today.
They will also be accessible with a telephone call, so the ability to process and understand words
spoken over a telephone, with its lower sound quality, was important, he said.
The model that MIT foresaw was a distributed one, with servers each focussed on a domain of
knowledge, which could be added to a system incrementally. The ability to deal with continuous speech
from unknown users was important, while the vocabulary should be 1,000 words or more.
Importantly, processing should be in real time, meaning effectively instantaneous, and the system
should operate on standard platforms, he said.
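For readers who like to tinker, a very rough sketch of that distributed idea might look like the
following. The names, matching logic and example forecast are purely illustrative, not MIT's actual
software: independent domain servers are registered one at a time, and a dispatcher hands each
recognised utterance to the server whose domain it matches.

    # Purely illustrative sketch of the distributed, domain-server model
    # described above; none of these names come from MIT's own system.

    class DomainServer:
        """One narrow domain of knowledge, e.g. weather or cinema listings."""
        def __init__(self, name, keywords, answer_fn):
            self.name = name
            self.keywords = keywords    # crude trigger words, for the sketch only
            self.answer_fn = answer_fn  # looks up the actual information

        def matches(self, utterance):
            return any(word in utterance.lower() for word in self.keywords)

    class Dispatcher:
        """Holds the growing collection of domain servers."""
        def __init__(self):
            self.servers = []

        def add_server(self, server):
            # New domains can be added incrementally, as described above.
            self.servers.append(server)

        def handle(self, utterance):
            # Route the recognised sentence to the first matching domain.
            for server in self.servers:
                if server.matches(utterance):
                    return server.answer_fn(utterance)
            return "Sorry, I don't know about that yet."

    # Example: register a weather domain, then ask it something.
    dispatcher = Dispatcher()
    dispatcher.add_server(DomainServer(
        "weather", ["weather", "forecast"],
        lambda utterance: "Bangkok: hot, with a chance of thunderstorms."))
    print(dispatcher.handle("What is the forecast for Bangkok?"))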
Multiple languages can be catered for, both for the questions and the responses -- and such systems
can even be used for language learning, Prof Stephanie Seneff, also of the MIT Computer Science Lab,
explained.
And, if PC users who dabbled in Microsoft Windows on a 286 machine 10 years ago spent more time
looking at the hourglass symbol than they cared to, speech processing and recognition was even
further behind back then.
Prof Seneff recalled how you could input a sentence and go away to make a pot of tea or go off to
lunch, as a computer a decade ago could take up to 20 minutes to process the sentence and respond
to it.
Challenges remain: they include designing a system that can learn new words, since languages
are dynamic, and improving how a computer handles a dialogue. Such a system should be neither
completely passive nor too assertive.
In a conversation such as a telephone enquiry there is a lot of "back channel" or very short dialogue
(such as "yes" or "umm"), and 80 percent of remarks contain fewer than 12 words, Prof Zue explained.
It is reassuring that research institutes around the world are addressing these issues, including
several universities here in Thailand as well as the event's organisers: the National Electronics and
Computer Technology Centre (Nectec), Kasetsart University and the Asian Institute of Technology.
And so I can be fairly confident in predicting that 10 years from now we will be talking with
computers a lot. The key question is, if the "conversation" is over a telephone, will we be able to
tell whether it is a computer or a real person at the other end?
*Tony Waltham is Editor of Database.