Sophia: computers can hear you

Jean-Pierre Largillet · 28 August 2001 · 3 min read

Voice recognition specialists meet on August 29th and 30th at the CICA for the workshop of the International Speech Communication Association. Professor Wellekens explains.

Voice recognition is a major technological theme. The world's leading specialists in the field will be in Sophia Antipolis, at the CICA, on August 29th and 30th for the ISCA (International Speech Communication Association) workshop on "Adaptation methods for voice recognition". The colloquium, organised by the Eurécom Institute under Professor Christian J. Wellekens, has drawn well-known figures from all over the world because it takes place just before the "Eurospeech" world congress, to be held in Denmark from September 3rd to 7th. Many specialists will therefore make a detour via Sophia before the field's most important event of the year.

The two-day meeting brings together around fifty contributions from university researchers and from the laboratories of companies and organisations such as Nokia, Panasonic, Intel, Apple, Sony, France Telecom, INRIA, Lucent, Nuance, Compaq and Swisscom (see the full programme on the Eurécom website). Christian J. Wellekens, a voice recognition specialist and professor at the Eurécom Institute, explains how the voice recognition process works and what adaptation methods bring.

The basics of voice recognition

"Modern voice recognition is based on building statistical phoneme models from large databases of read or spontaneous multi-speaker speech. These databases are labelled by linguists, which means that the exact content of the sentences is known.

Training these models consists in estimating their parameters from this data. This phase is very slow and can require more than 30 hours of computing time on a very powerful machine. Once the phoneme models are known, a phonetic dictionary makes it possible to build a model of any word from its phonetic transcription.

During recognition, we search for the sequence of word models that best explains the spoken signal, that is to say the most probable sequence given the speech signal received. Recognition requires sophisticated programming methods to run in real time. Recognition rates are improved by using grammars that rule out arbitrary word sequences. They deteriorate if the utterances to be analysed are corrupted by noise.
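The two steps described above, building word models from a phonetic dictionary and then picking the most probable word, can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions, not the method used by any system mentioned at the workshop: the phoneme models, the dictionary entries and the one-dimensional "acoustic feature" are all hypothetical, each phoneme is reduced to a single Gaussian state, and transition probabilities are ignored. The `allowed_words` argument stands in for a grammar that restricts which words may be recognised.

```python
import math

# Hypothetical phoneme models: one state per phoneme, each a 1-D Gaussian
# (mean, variance) over a single made-up acoustic feature.
PHONEME_MODELS = {
    "k": (1.0, 0.5),
    "a": (4.0, 0.5),
    "t": (7.0, 0.5),
    "o": (10.0, 0.5),
}

# Hypothetical phonetic dictionary: word -> phoneme sequence.
LEXICON = {
    "cat": ["k", "a", "t"],
    "coat": ["k", "o", "t"],
    "at": ["a", "t"],
}


def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def word_model(word):
    """Build a word model by chaining the phoneme models from the dictionary."""
    return [PHONEME_MODELS[p] for p in LEXICON[word]]


def viterbi_score(frames, states):
    """Best-path log-likelihood through a left-to-right chain of states.

    Each frame either stays in the current state or moves to the next one.
    The path must start in the first state and end in the last one.
    Transition probabilities are ignored in this sketch.
    """
    neg_inf = float("-inf")
    n = len(states)
    best = [neg_inf] * n
    best[0] = log_gauss(frames[0], *states[0])
    for x in frames[1:]:
        new = [neg_inf] * n
        for s in range(n):
            prev = best[s]
            if s > 0:
                prev = max(prev, best[s - 1])
            if prev > neg_inf:
                new[s] = prev + log_gauss(x, *states[s])
        best = new
    return best[-1]


def recognise(frames, allowed_words):
    """Return the allowed word whose model best explains the frames."""
    scores = {w: viterbi_score(frames, word_model(w)) for w in allowed_words}
    return max(scores, key=scores.get), scores


# Six feature frames that sit close to the means of "k", "a" and "t".
frames = [1.1, 0.9, 4.2, 3.8, 7.1, 6.9]
word, scores = recognise(frames, ["cat", "coat", "at"])
print(word)  # cat
```

Shifting the frames away from the model means, as noise or a different speaker would, lowers every score and can change which word wins. Adaptation is meant to address that degradation.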

Improving recognition rates

The language, accents and transmission channels (telephone line or GSM) also seriously affect the results. To improve them, the phoneme models could be retrained under the conditions of the application. But this would require the user to go through the tiresome task of pronouncing many known sentences for training, followed by a long retraining time.

Adaptation consists in modifying the model parameters to improve recognition rates using only a small number of sentences or words pronounced by the user, for whom the recognition rate then becomes higher than the one obtained with speaker-independent recognition."
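The article does not say which adaptation methods were presented at the workshop. As one illustration of the general idea, the sketch below uses a MAP-style (maximum a posteriori) update of a single Gaussian mean, a commonly used adaptation technique: the speaker-independent mean is pulled towards the user's own samples, and the more samples there are, the further it moves. The function name, the `tau` weight and the numbers are assumptions chosen for the example.

```python
def map_adapt_mean(prior_mean, samples, tau=10.0):
    """MAP-style update of a Gaussian mean.

    prior_mean: mean from the speaker-independent model.
    samples:    feature values observed from the new speaker.
    tau:        hypothetical weight given to the prior; larger values
                mean more user data is needed to move the mean.
    """
    n = len(samples)
    return (tau * prior_mean + sum(samples)) / (tau + n)


# Speaker-independent mean is 4.0; this user's samples centre on 5.0.
few = map_adapt_mean(4.0, [5.0, 5.0], tau=2.0)
many = map_adapt_mean(4.0, [5.0] * 18, tau=2.0)
print(few, many)  # 4.5 4.9
```

With two samples the mean moves halfway towards the user's value, and with eighteen it moves most of the way, which reflects the trade-off described above: a handful of utterances already helps, without retraining the models from scratch.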

Information and contacts
-Professor Christian J. Wellekens, Tel: +33 (0) 4 93 00 26 28; Multimedia Communications Department, Eurécom, Office: +33 (0) 4 93 00 26 33; Fax: +33 (0) 4 93 00 26 28;
E-mail: Christian.Wellekens@eurecom.fr
