Publications

4 November 2010

Speaker recognition, or speaker identification, is becoming increasingly important in the digital world. Most law enforcement organizations use automatic or manual speaker identification tools in their investigations. In either case, before carrying out the identification analysis, they usually need to record a voice sample from the suspect, either for one-to-one comparison or to add to a database.

17 June 2010

Typically, a speaker identification examination requires two audio recordings: a voice sample and a questionable recording. In most cases the questionable recording is an intercepted or recorded phone call. Since mobile phones have become the most popular means of communication, most questionable recordings come from GSM channels. GSM networks use special algorithms and devices to transmit the speech signal, but these alter the original signal, which casts doubt on whether such a recording can be used for speaker identification.

20 October 2009

This paper outlines a project to develop a new hybrid unit-selection and concatenative Russian TTS system. The project is carried out within the Federal Research and Development Program in Priority Directions of Development of the Scientific and Technological Complex of Russia in 2007-2012. Its main aim is a new-generation Russian TTS system that makes use of syntactic and semantic analysis and can be implemented in various types of electronic devices.

21 October 2008

In this paper we show that the Random Forest (RF) approach can be successfully applied to language modeling of an inflectional language for Automatic Speech Recognition (ASR) tasks. While Decision Trees (DTs) perform worse than a conventional trigram language model (LM), RFs outperform it. Morphological RFs reduce word error rate (WER) by up to 3.4% relative and perplexity by 10% compared with the trigram model. Further improvement is obtained by interpolating the DT and RF LMs with the trigram LM (up to 15.6% perplexity and 4.8% relative WER reduction).
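
As a rough illustration of the quantities mentioned in this abstract, the sketch below shows how two language models might be linearly interpolated and how perplexity and a relative reduction can be computed. It is not the authors' implementation: the function names, the interpolation weight `lam`, and the per-word probabilities are all hypothetical values chosen only for demonstration, and the abstract does not state which interpolation scheme or weights were used.

```python
import math


def interpolate(p_a, p_b, lam):
    """Linearly interpolate two per-word probability sequences.

    lam is the weight given to the first model (assumed, 0 <= lam <= 1).
    """
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_a, p_b)]


def perplexity(probs):
    """Perplexity: exp of the negative mean log-probability per word."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


def relative_reduction(baseline, improved):
    """Relative reduction of a metric, in percent of the baseline."""
    return 100.0 * (baseline - improved) / baseline


# Invented per-word probabilities for a four-word test text
# (illustrative only, not data from the paper).
p_trigram = [0.10, 0.02, 0.30, 0.05]
p_rf = [0.08, 0.06, 0.25, 0.09]

p_mix = interpolate(p_rf, p_trigram, lam=0.5)

ppl_trigram = perplexity(p_trigram)
ppl_mix = perplexity(p_mix)
print(f"trigram perplexity:      {ppl_trigram:.2f}")
print(f"interpolated perplexity: {ppl_mix:.2f}")
print(f"relative reduction:      {relative_reduction(ppl_trigram, ppl_mix):.1f}%")
```

The same `relative_reduction` helper applies to WER: a "relative" figure is measured as a fraction of the baseline value, not as a difference in absolute percentage points.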