In this paper we investigate speech recordings made before and after the speakers' drug-abuse treatment, and show that the distortions of a speaker's fundamental frequency and formants have no statistically significant dependence on either the group of drugs or the degree of drug intoxication. Changes in the fundamental frequency are irregular and show no general pattern. They are caused mainly by changes in the speakers' emotional state rather than by the drug addiction treatment.
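The abstract does not name the significance test. As a hedged illustration only, a permutation test is one generic way to check whether the per-speaker F0 shift (after minus before treatment) depends on the drug group: shuffle the group labels many times and see how often the shuffled data show at least as much between-group spread as the real data. The function names, the spread statistic and the placeholder values below are assumptions, not data or methods from the study.

```python
import random
from statistics import mean


def between_group_spread(groups):
    """Size-weighted squared deviation of each group's mean from the grand mean."""
    values = [v for g in groups.values() for v in g]
    grand = mean(values)
    return sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())


def permutation_pvalue(groups, n_perm=5000, seed=0):
    """Estimate how often randomly relabelled data show at least the observed spread."""
    rng = random.Random(seed)
    observed = between_group_spread(groups)
    labels = [name for name, g in groups.items() for _ in g]
    values = [v for g in groups.values() for v in g]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between group label and F0 shift
        shuffled = {}
        for name, v in zip(labels, values):
            shuffled.setdefault(name, []).append(v)
        # small tolerance so an identical partition is not lost to rounding
        if between_group_spread(shuffled) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one keeps the estimate above zero


if __name__ == "__main__":
    # Hypothetical F0 shifts in Hz per drug group (placeholders, not study data).
    shifts = {
        "group_A": [3.1, -2.4, 0.8, 5.0, -1.2],
        "group_B": [-0.5, 2.2, -3.3, 1.9, 0.4],
        "group_C": [1.5, -4.1, 2.7, -0.9, 0.3],
    }
    print(permutation_pvalue(shifts))
```

A large p-value means the observed differences between groups are what random relabelling routinely produces, i.e. no evidence of dependence on the drug group.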
This paper extends previous research by P. Kenny on using a supervised PLDA mixture of two gender-dependent speaker verification systems under gender uncertainty. We propose using PLDA mixtures for speaker verification across different channels. However, unlike the gender-independent case, the best channel-independent mixture for the two channels in our task was obtained by mixing three channel-dependent PLDA systems.
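The abstract does not give the scoring formula. As a hedged sketch, assume each mixture component corresponds to a latent trial condition (for example a channel combination; how the three systems map to conditions is not stated, so it is left abstract) with a prior weight. The mixture log-likelihood ratio then combines the per-component likelihoods of the same-speaker and different-speaker hypotheses with a log-sum-exp. The per-component log-likelihoods would come from the individual PLDA systems and are plain inputs here; this is not necessarily the exact formulation of Kenny's gender mixture or of this paper.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_llr(priors, ll_target, ll_nontarget):
    """Trial-level LLR of a K-component PLDA mixture.

    priors[k]       : prior (> 0) of latent trial condition k; priors sum to 1
    ll_target[k]    : log p(x_enrol, x_test | same speaker, condition k) from system k
    ll_nontarget[k] : log p(x_enrol, x_test | different speakers, condition k)
    """
    num = logsumexp([math.log(p) + l for p, l in zip(priors, ll_target)])
    den = logsumexp([math.log(p) + l for p, l in zip(priors, ll_nontarget)])
    return num - den
```

With a single component of prior 1 the function reduces to that system's LLR, and if all components agree on the LLR the mixture returns the same value; the mixture only matters when components disagree about which condition explains the trial.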
The paper considers some peculiarities of training and using N-gram language models with an open vocabulary. We demonstrate that in this case the probability distribution of out-of-model (unknown) words must be modeled explicitly. Two existing techniques for this modeling are considered, and a new technique with several advantages is proposed. Experiments demonstrate the consistency of the proposed approach.
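As a hedged illustration of why explicit modeling matters, the sketch below (a unigram model, for brevity) gives each unknown word the probability P(<unk>) × P_oov(w), where P_oov is a simple spelling model: word length is geometric with stop probability `p_stop` and characters are uniform over an assumed alphabet. The spelling model is renormalized over out-of-vocabulary strings only, so the probabilities of all possible words sum to one. This is one generic scheme, not the new technique proposed in the paper; the class name, add-one smoothing and parameters are assumptions.

```python
from collections import Counter


class OpenVocabUnigram:
    """Unigram LM over a closed vocabulary plus an explicit model of unknown words."""

    def __init__(self, tokens, vocab, alphabet, p_stop=0.2):
        self.vocab = set(vocab)
        self.alphabet = set(alphabet)
        self.p_stop = p_stop
        # All out-of-vocabulary training tokens count towards one <unk> class.
        counts = Counter(t if t in self.vocab else "<unk>" for t in tokens)
        # Add-one smoothing over vocab + <unk>, so these probabilities sum to 1.
        denom = len(tokens) + len(self.vocab) + 1
        self.p = {w: (counts[w] + 1) / denom for w in self.vocab}
        self.p_unk = (counts["<unk>"] + 1) / denom
        # Spelling-model mass that lands on in-vocabulary strings; removing it
        # makes P_oov a proper distribution over out-of-vocabulary strings only.
        self.iv_mass = sum(self._spell(v) for v in self.vocab)

    def _spell(self, word):
        """Geometric length, uniform characters; sums to 1 over non-empty strings."""
        if not word or any(ch not in self.alphabet for ch in word):
            return 0.0
        n = len(word)
        return self.p_stop * (1 - self.p_stop) ** (n - 1) / len(self.alphabet) ** n

    def prob(self, word):
        if word in self.vocab:
            return self.p[word]
        # P(w) = P(<unk>) * P_oov(w): the <unk> mass is spread over concrete strings.
        return self.p_unk * self._spell(word) / (1.0 - self.iv_mass)
```

Without the P_oov factor, every unknown word would receive the full P(<unk>) mass, and perplexities of models with different vocabularies would not be comparable.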
In some applications of speaker recognition, for example in forensics or in access control systems, an important task is to estimate an absolute measure of speaker identity. Automatic speaker recognition methods appear to be the fastest and simplest speaker identification tools for this purpose [1-2]. However, the applicability of automatic speaker recognition systems (ASRS) to individual cases, e.g. in forensics, and the evaluation of their reliability remain widely disputed [3-7]. The output of state-of-the-art ASRS is based on statistical data analysis.
In this paper we propose to use Variational Bayesian Analysis (VBA) instead of Maximum Likelihood (ML) estimation for building the Universal Background Model (UBM) in text-independent GMM speaker verification systems. VBA estimation solves the problem of choosing the optimal number of UBM mixture components for a given training set, as well as the problem of noise Gaussians typical of ML estimation.
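As a toy illustration (not the paper's UBM trainer), the sketch below fits a one-dimensional variational Bayesian GMM using the standard mean-field updates for a Dirichlet prior on the weights and a Normal-Gamma prior on each component's mean and precision, i.e. the variational mixture of Gaussians of Bishop's *Pattern Recognition and Machine Learning*, section 10.2, specialised to one dimension. A real UBM is a multivariate, usually diagonal-covariance GMM on cepstral features. The prior settings, the quantile initialisation and the function names are assumptions. With a small Dirichlet concentration `alpha0`, components that explain little data tend to receive weights near zero instead of surviving as noise Gaussians.

```python
import math
import random


def digamma(x):
    """Digamma function via upward recurrence and an asymptotic series (x > 0)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result


def vb_gmm_1d(x, k, iters=100, alpha0=1e-3, beta0=1.0, nu0=1.0):
    """Variational Bayesian GMM for 1-D data; returns (expected weights, means)."""
    n = len(x)
    m0 = sum(x) / n
    var = sum((v - m0) ** 2 for v in x) / n
    w0_inv = nu0 * var  # prior scale chosen so that E[precision] = 1 / var
    # Deterministic init: component j starts at the (2j+1)/(2k) quantile of the data.
    xs = sorted(x)
    centres = [xs[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    r = []
    for v in x:
        best = min(range(k), key=lambda j: abs(v - centres[j]))
        r.append([1.0 if j == best else 0.0 for j in range(k)])
    for _ in range(iters):
        # M-step: variational posteriors of weights, means and precisions.
        nk = [sum(ri[j] for ri in r) + 1e-10 for j in range(k)]
        xbar = [sum(ri[j] * v for ri, v in zip(r, x)) / nk[j] for j in range(k)]
        sk = [sum(ri[j] * (v - xbar[j]) ** 2 for ri, v in zip(r, x)) / nk[j] for j in range(k)]
        alpha = [alpha0 + nk[j] for j in range(k)]
        beta = [beta0 + nk[j] for j in range(k)]
        m = [(beta0 * m0 + nk[j] * xbar[j]) / beta[j] for j in range(k)]
        w = [1.0 / (w0_inv + nk[j] * sk[j] + beta0 * nk[j] / (beta0 + nk[j]) * (xbar[j] - m0) ** 2)
             for j in range(k)]
        nu = [nu0 + nk[j] for j in range(k)]
        # E-step: responsibilities from expected log weights and log densities.
        psi_total = digamma(sum(alpha))
        e_log_pi = [digamma(a) - psi_total for a in alpha]
        e_log_lam = [digamma(nu[j] / 2) + math.log(2 * w[j]) for j in range(k)]
        for i, v in enumerate(x):
            log_rho = [e_log_pi[j] + 0.5 * e_log_lam[j] - 0.5 * math.log(2 * math.pi)
                       - 0.5 * (1.0 / beta[j] + nu[j] * w[j] * (v - m[j]) ** 2) for j in range(k)]
            top = max(log_rho)
            e = [math.exp(lr - top) for lr in log_rho]
            s = sum(e)
            r[i] = [ej / s for ej in e]
    total = sum(alpha)
    return [a / total for a in alpha], m


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(c, 0.5) for c in (-5.0, 0.0, 5.0) for _ in range(100)]
    weights, means = vb_gmm_1d(data, k=8)
    for wt, mu in sorted(zip(weights, means)):
        print(f"weight={wt:.3f} mean={mu:+.2f}")
```

The model starts with more components than the data need; inspecting the printed weights shows how much mass each component keeps, which is the mechanism VBA offers for choosing the mixture size.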
This paper deals with topic segmentation of continuous speech. We propose an online segmentation method that relies on sentence boundaries obtained from an automatic sentence boundary detection system. We show that using sentence boundaries to divide continuous speech into fragments for topic classification increases classification accuracy by about 25-30% compared with dividing the speech using only a threshold on the number of words.
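The abstract does not say exactly how fragments are formed from sentences. A minimal sketch, assuming a hypothetical rule: close a fragment at the first detected sentence boundary after it reaches `min_words` words. Sentences arrive one at a time from the (external) boundary detector, shown here simply as lists of words; the topic classifier itself is omitted. The word-count baseline from the abstract is included for comparison.

```python
def fragments_by_word_threshold(words, max_words):
    """Baseline: cut the word stream every max_words words, ignoring sentences."""
    return [words[i:i + max_words] for i in range(0, len(words), max_words)]


def fragments_by_sentences(sentences, min_words):
    """Online variant: emit a fragment only at a sentence boundary,
    once it holds at least min_words words."""
    current = []
    for sentence in sentences:  # sentences arrive one at a time
        current.extend(sentence)
        if len(current) >= min_words:
            yield current
            current = []
    if current:  # flush the tail when the stream ends
        yield current
```

Both functions cover the same words; the difference is that the sentence-based variant never cuts a sentence in half, so each fragment passed to the topic classifier consists of whole sentences.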
Word pronunciation variability is one of the basic problems in automatic speech recognition (ASR). A system trained non-discriminatively on pooled data of all types degrades significantly when the test signal differs noticeably from the speaker-independent (SI) acoustic model (AM). We explore a class-based recognition system with minimum phone error (MPE) training. Class models improve recognition quality compared with SI models.
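MPE training maximizes the expected phone accuracy of the recognizer's hypotheses, each weighted by its scaled posterior probability. The sketch computes this objective for one utterance over an N-best list, assuming accuracy = number of reference phones minus Levenshtein phone errors, and posteriors from `kappa` × acoustic log-likelihood + LM log-probability. Practical MPE operates on lattices with an approximate per-phone accuracy; the value of `kappa`, the scaling convention and the function names are assumptions, and the paper's class models are not shown.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (pa != pb)))
        prev = cur
    return prev[-1]


def phone_accuracy(hyp, ref):
    """Raw phone accuracy: reference length minus phone errors."""
    return len(ref) - edit_distance(hyp, ref)


def mpe_objective(nbest, ref, kappa=1.0 / 12):
    """Expected phone accuracy over an N-best list.

    nbest: list of (phones, acoustic_loglik, lm_logprob) tuples
    """
    scores = [kappa * ac + lm for _, ac, lm in nbest]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # unnormalized scaled posteriors
    z = sum(weights)
    return sum(w / z * phone_accuracy(ph, ref) for w, (ph, _, _) in zip(weights, nbest))
```

Raising this expectation shifts posterior mass towards hypotheses with fewer phone errors, rather than only towards the single correct transcription as in MMI-style training.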
In this paper we investigate speech recordings made before and after the speakers' drug-abuse treatment, and show that the distortions of a speaker's fundamental frequency have no statistically significant dependence on either the group of drugs or the degree of drug intoxication. Changes in the fundamental frequency are irregular and show no general pattern. The main cause of these changes is a change in the speaker's emotional state rather than the drug addiction treatment.
This paper reports some of the results obtained by computer-aided processing of a large Russian text corpus. Frequencies of occurrence of Russian monophones and diphones are presented, and several notable observations are discussed.
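A minimal sketch of how monophone and diphone frequencies can be counted once the corpus has been converted to phone sequences. The grapheme-to-phoneme step for Russian is not shown, the phone symbols are hypothetical placeholders, and counting diphones only within each transcription unit (not across unit boundaries) is an assumption.

```python
from collections import Counter


def phone_ngram_frequencies(transcriptions):
    """Relative frequencies of monophones and diphones (adjacent phone pairs)."""
    mono, di = Counter(), Counter()
    for phones in transcriptions:
        mono.update(phones)
        di.update(zip(phones, phones[1:]))  # pairs within this unit only
    n_mono, n_di = sum(mono.values()), sum(di.values())
    return ({p: c / n_mono for p, c in mono.items()},
            {pair: c / n_di for pair, c in di.items()})


if __name__ == "__main__":
    # Hypothetical phone transcriptions (placeholder symbols).
    mono, di = phone_ngram_frequencies([["m", "a", "m", "a"], ["d", "a"]])
    print(sorted(mono.items()))
    print(sorted(di.items()))
```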
This paper deals with new front-end feature improvements that make Automatic Speech Recognition (ASR) robust to changes in speech loudness. Our experiments show that applying a RASTA-like filter significantly improves robustness to changes in speech loudness, reducing the phone error rate (PER) by up to 4%.
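Why a RASTA-type filter helps with loudness: a gain change multiplies the spectrum, so it adds a constant to every log-spectral trajectory, and the RASTA numerator coefficients sum to zero, giving zero gain at DC. The constant offset therefore dies out after a transient. The paper describes its filter only as "RASTA-like"; the sketch below uses the classic RASTA filter with pole 0.98 in causal form, applied to one log-energy trajectory (one value per frame), which is an assumption about where the filter sits in the front end.

```python
def rasta_filter(trajectory, pole=0.98):
    """Band-pass filter one log-energy trajectory along time.

    Causal form of the classic RASTA filter:
        y[n] = pole * y[n-1] + 0.1 * (2 x[n] + x[n-1] - x[n-3] - 2 x[n-4])
    Samples before the start of the trajectory are taken as zero.
    """
    taps = (0.2, 0.1, 0.0, -0.1, -0.2)  # 0.1 * (2, 1, 0, -1, -2); sums to zero
    out, prev = [], 0.0
    for n in range(len(trajectory)):
        fir = sum(c * trajectory[n - i] for i, c in enumerate(taps) if n - i >= 0)
        prev = pole * prev + fir  # single pole smooths the differentiated signal
        out.append(prev)
    return out
```

Filtering a trajectory and the same trajectory shifted by a constant gives outputs that differ strongly in the first frames but converge as `pole ** n`, so after a few hundred frames the loudness offset has practically no effect.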