Real-Time Integration of Dynamic Context Information for Improving Automatic Speech Recognition

Publication
In Proceedings of the 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015).

Abstract

Using prior situational or contextual knowledge about a given task can significantly improve Automatic Speech Recognition (ASR) performance. This is typically done by adapting acoustic or language models when data are available, or through knowledge-based rescoring. The main adaptation techniques, however, are either domain-specific, which makes them unsuitable for other tasks, or static and offline, and therefore unable to handle dynamic knowledge. To address this problem, we propose a real-time system that dynamically integrates situational context into ASR.

Context integration is performed either post-recognition or pre-recognition. In the post-recognition case, we propose a weighted Levenshtein distance between the ASR hypotheses and the context information, with weights based on the ASR confidence scores, to extract the most likely sequence of spoken words. In the pre-recognition case, the search space is adjusted to the new situational knowledge by adapting the finite state machine that models the spoken language. Experiments conducted on 3 hours of Air Traffic Control (ATC) data reduced the Command Error Rate (CmdER), the evaluation metric used in the ATC domain, by a factor of 4 compared to using no contextual knowledge.
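To make the post-recognition step concrete, here is a minimal sketch of a confidence-weighted, word-level Levenshtein distance used to pick the context sequence closest to an ASR hypothesis. The abstract does not give the exact weighting, so the scheme below is an assumption for illustration only: deleting or substituting a hypothesis word costs its confidence score (low-confidence words are cheap to correct), and inserting a context word costs a fixed, hypothetical `insert_cost`. The function names and the callsign example are likewise hypothetical.

```python
from typing import List, Sequence, Tuple

# A hypothesis word paired with its ASR confidence score in [0, 1].
HypWord = Tuple[str, float]


def weighted_levenshtein(hyp: Sequence[HypWord], ref: Sequence[str],
                         insert_cost: float = 1.0) -> float:
    """Word-level edit distance from an ASR hypothesis to a context sequence.

    Assumed (illustrative) weighting: deleting or substituting a hypothesis
    word costs its confidence; inserting a context word the recognizer did
    not output costs a fixed `insert_cost`.
    """
    n, m = len(hyp), len(ref)
    # d[i][j] = cheapest cost to turn the first i hypothesis words
    # into the first j context words
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + hyp[i - 1][1]  # delete every hypothesis word
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + insert_cost  # insert every context word
    for i in range(1, n + 1):
        word, conf = hyp[i - 1]
        for j in range(1, m + 1):
            sub = 0.0 if word == ref[j - 1] else conf
            d[i][j] = min(d[i - 1][j] + conf,         # delete hypothesis word
                          d[i][j - 1] + insert_cost,  # insert context word
                          d[i - 1][j - 1] + sub)      # match or substitute
    return d[n][m]


def best_context(hyp: Sequence[HypWord],
                 candidates: List[Sequence[str]]) -> Tuple[Sequence[str], float]:
    """Return the context sequence closest to the hypothesis, with its distance."""
    scored = [(c, weighted_levenshtein(hyp, c)) for c in candidates]
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical ATC example: the recognizer is unsure about "two".
    hyp = [("lufthansa", 0.9), ("two", 0.3), ("one", 0.8)]
    # Hypothetical context: callsign sequences currently expected in the sector.
    candidates = [["lufthansa", "three", "one"], ["speedbird", "two", "one"]]
    print(best_context(hyp, candidates))
```

In this toy run the first candidate wins with distance 0.3, because only the low-confidence word "two" must change, whereas matching the second candidate would require replacing the high-confidence word "lufthansa" (distance 0.9). This illustrates the intended effect of confidence weighting: the context corrects words the recognizer was unsure about while respecting words it recognized confidently.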

Youssef Oualil
Machine Learning Research Engineer