LIUM LIA IRISA

ASH project (2009-2012): research on speech recognition

  • The ASH project is a research project that started in October 2009 and is funded by the French National Research Agency (ANR) for 36 months.
  • This project focuses on automatic speech recognition (ASR).
  • The ASH project proposes a new approach to combining several ASR systems. In this approach, the ASR systems exchange information on the fly, during the decoding process, whereas the classical approach combines only their final outputs.

Context

  • Automatic speech recognition (ASR) requires the integration of various knowledge sources in order to convert a signal into the corresponding sequence of words. The knowledge sources involved lie at various levels: cognitive, semantic, linguistic, phonetic, prosodic, etc. Most state-of-the-art systems are built on the same technological ground, namely Hidden Markov Models (HMM) for acoustic modeling and N-gram language models for linguistic modeling.
  • Over the last 10 years, the scientific community has concentrated on optimizing these techniques, focusing on decreasing word error rates with little regard for the trade-off between the performance gain and the increase in computation time. As a result, the performance of ASR systems has improved at the expense of the time and resources required to train them (hundreds of hours of training data, decoding of the training corpus, several hundred million words for the language model, etc.).
  • Another consequence of this research strategy, driven by a single performance measure, has been the standardization of ASR systems: alternative techniques have been set aside because they could not compete with classical methods optimized over the last 10 years.
  • The current techniques and the strategy of the scientific community seem to be reaching their limits: each technical improvement brings only marginal gains, at the expense of an increased computational load. The main reason is probably that current techniques integrate the various knowledge sources poorly, such integration being a particularly difficult problem.
  • Two recent research directions seem promising and will be the application focus of the project: system combination and landmark-driven decoders. Both techniques aim at combining different sources of knowledge and modeling paradigms, be they different systems operating on the same representation or a single system operating on multiple representations of the speech material.

Position of the ASH project

  • The ASH project proposes a new approach to combining several ASR systems. In this approach, the ASR systems exchange information on the fly, during the decoding process, whereas the classical approach combines only their final outputs. This requires the design of a specific framework and allows different techniques to be used together. In the ASH project, the landmark-driven technique is one of the techniques that can be integrated into this new framework.
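
The difference between combining final outputs and exchanging information during decoding can be illustrated with a small Python sketch. This is only a toy under strong assumptions, not the ASH framework: the three "systems", their per-segment word scores, and the function names `late_combination` and `on_the_fly_combination` are all hypothetical, and pooling scores at each segment is just one simple way to stand in for an exchange of information.

```python
from collections import Counter

# Hypothetical per-segment candidate scores for three toy systems.
# Each list entry is one audio segment, given as {word: score}.
SYSTEMS = {
    "A": [{"the": 0.9, "a": 0.1}, {"cat": 0.9, "cap": 0.1}, {"sat": 0.8, "sad": 0.2}],
    "B": [{"the": 0.7, "a": 0.3}, {"cap": 0.5, "cat": 0.45}, {"sat": 0.6, "sad": 0.4}],
    "C": [{"a": 0.6, "the": 0.4}, {"cap": 0.5, "cat": 0.45}, {"sad": 0.55, "sat": 0.45}],
}


def late_combination(systems):
    """Each system decodes alone; only the final word sequences are merged by vote."""
    finals = [[max(seg, key=seg.get) for seg in segs] for segs in systems.values()]
    return [Counter(words).most_common(1)[0][0] for words in zip(*finals)]


def on_the_fly_combination(systems):
    """At every segment, the systems pool their scores before a word is chosen."""
    result = []
    for segs in zip(*systems.values()):
        pooled = Counter()
        for seg in segs:
            pooled.update(seg)  # adds the scores word by word
        result.append(max(pooled, key=pooled.get))
    return result


print(late_combination(SYSTEMS))
print(on_the_fly_combination(SYSTEMS))
```

On this invented data the two strategies disagree on the second segment, because system A's strong preference is visible when scores are shared but lost once only its final word is voted on. The numbers are made up, so this says nothing about which strategy is more accurate in practice.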

Hardware technological evolution and the ASH project

  • In the last few years, microprocessor manufacturers have developed multi-core CPUs. This technology is now very affordable and is integrated into standard personal computers.
  • Multi-core CPUs offer new opportunities to accelerate software by parallelizing computation. At present, however, it is difficult to parallelize the execution of an ASR system, as its different steps are usually sequential.
  • In the ASH project, we expect to take advantage of the multi-core CPU architecture by developing an approach that runs several ASR systems in parallel on the same audio data, working together to obtain the best recognition hypotheses in real time.
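
As a rough illustration of running several systems on the same audio at once, the following Python sketch submits each decoder to a worker pool and keeps the best-scoring hypothesis. Everything in it is an assumption made for the example: the decoders are stand-ins that return fixed hypotheses, and the names `make_toy_decoder`, `decode_in_parallel`, `hmm_ngram` and `landmark` are hypothetical. Threads keep the sketch short; in standard CPython, CPU-bound decoders would typically need separate processes or native code to actually occupy several cores.

```python
from concurrent.futures import ThreadPoolExecutor


def make_toy_decoder(name, hypothesis, score):
    """Build a stand-in decoder; a real one would run acoustic and language models."""
    def decode(audio):
        # The toy ignores the audio content and returns a fixed hypothesis.
        return {"system": name, "text": hypothesis, "score": score}
    return decode


def decode_in_parallel(decoders, audio):
    """Run every decoder on the same audio concurrently; return the best and all results."""
    with ThreadPoolExecutor(max_workers=len(decoders)) as pool:
        # pool.map preserves the order of the decoders in its results.
        results = list(pool.map(lambda decoder: decoder(audio), decoders))
    best = max(results, key=lambda r: r["score"])
    return best, results


DECODERS = [
    make_toy_decoder("hmm_ngram", "the cat sat", 0.72),
    make_toy_decoder("landmark", "the cap sat", 0.65),
]

best, all_results = decode_in_parallel(DECODERS, audio=b"\x00\x01")
print(best["system"], best["text"])
```

Here the decoders only meet at the end, once each has finished. The approach described above goes further by letting the systems cooperate while they decode, which this sketch does not attempt to model.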
home.txt · Last modified: 12/09/2010 23:25 by Yannick Estève