Text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution

Citation
C. Miyajima et al., Text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution, IEICE T INF, E84D(7), 2001, pp. 847-855
Number of citations
23
Language
English
Article type
Article
Subject Categories
Information Technology & Communication Systems
Journal title
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
ISSN journal
0916-8532
Volume
E84D
Issue
7
Year of publication
2001
Pages
847 - 855
Database
ISI
SICI code
0916-8532(200107)E84D:7<847:TSIUGM>2.0.ZU;2-M
Abstract
This paper presents a new approach to modeling speech spectra and pitch for text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution (MSD-GMM). MSD-GMM allows us to model continuous pitch values of voiced frames and discrete symbols for unvoiced frames in a unified framework. Spectral and pitch features are jointly modeled by a two-stream MSD-GMM. We derive maximum likelihood (ML) estimation formulae and minimum classification error (MCE) training procedure for MSD-GMM parameters. The MSD-GMM speaker models are evaluated for text-independent speaker identification tasks. The experimental results show that the MSD-GMM can efficiently model spectral and pitch features of each speaker and outperforms conventional speaker models. The results also demonstrate the utility of the MCE training of the MSD-GMM parameters and the robustness for the inter-session variability.
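
Illustrative sketch
The idea of scoring voiced and unvoiced frames in one model can be illustrated with a small Python sketch. It is not the authors' implementation, and the paper's exact formulae are not reproduced in this record. It assumes a pitch stream only (the spectral stream is omitted), a one-dimensional "voiced" space with Gaussian mixture components, and a zero-dimensional "unvoiced" space that contributes only its weight. The names `msd_pitch_loglik` and `identify`, the use of `None` to mark unvoiced frames, and all parameter values are hypothetical.

```python
import math

def msd_pitch_loglik(frames, voiced_mix, unvoiced_weight):
    """Log-likelihood of a pitch sequence under a simplified MSD-style model.

    frames: list of floats (e.g. log F0 of voiced frames) or None (unvoiced).
    voiced_mix: list of (weight, mean, variance) for the 1-D voiced space.
    unvoiced_weight: weight of the 0-D unvoiced space.
    Assumption: the voiced weights plus unvoiced_weight sum to 1.
    """
    total = 0.0
    for x in frames:
        if x is None:
            # Unvoiced frame: the 0-D space has no density, only a weight.
            p = unvoiced_weight
        else:
            # Voiced frame: weighted sum of 1-D Gaussian densities.
            p = sum(
                w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in voiced_mix
            )
        total += math.log(p)
    return total

def identify(frames, speakers):
    """Return the speaker whose model gives the highest log-likelihood."""
    return max(speakers, key=lambda name: msd_pitch_loglik(frames, *speakers[name]))

# Hypothetical speaker models: (voiced mixture, unvoiced weight).
speakers = {
    "low": ([(0.7, 4.7, 0.01)], 0.3),
    "high": ([(0.7, 5.3, 0.01)], 0.3),
}
frames = [4.7, None, 4.75, 4.65]  # None marks an unvoiced frame
best = identify(frames, speakers)
```

In this toy setup the unvoiced frame contributes the same weight to both models, so the decision is driven by the voiced frames. A two-stream model as described in the abstract would additionally combine a spectral-stream likelihood with the pitch-stream likelihood for each frame.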