Speaker age and gender classification using GMM supervector and NAP channel compensation method

Yucesoy, Ergun

dc.contributor.author	Yucesoy, Ergun
dc.date.accessioned	2024-03-15T06:47:09Z
dc.date.available	2024-03-15T06:47:09Z
dc.date.issued	2020
dc.identifier.citation	Yücesoy, E. (2020). Speaker age and gender classification using GMM supervector and NAP channel compensation method. J. Ambient Intell. Humaniz. Comput. https://doi.org/10.1007/s12652-020-02045-4	en_US
dc.identifier.issn	1868-5137
dc.identifier.issn	1868-5145
dc.identifier.uri	http://dx.doi.org/10.1007/s12652-020-02045-4
dc.identifier.uri	https://www.webofscience.com/wos/woscc/full-record/WOS:000532649300003
dc.identifier.uri	http://earsiv.odu.edu.tr:8080/xmlui/handle/11489/4016
dc.description	WoS Categories: Computer Science, Artificial Intelligence; Computer Science, Information Systems; Telecommunications	en_US
dc.description	Web of Science Index: Science Citation Index Expanded (SCI-EXPANDED)	en_US
dc.description	Research Areas: Computer Science; Telecommunications	en_US
dc.description.abstract	One of the most important factors affecting the performance of speech-based recognition systems is the mismatch between training and test conditions. Nuisance attribute projection (NAP) is an effective method for eliminating this mismatch, known as channel effects. In this study, the effect of NAP on determining age and gender groups is investigated. Mel-frequency cepstral coefficients and delta coefficients are used as features, and Gaussian mixture models (GMMs) adapted from the universal background model by the maximum a posteriori (MAP) method are used to model the age and gender classes. After the GMM corresponding to each utterance is converted into a mean supervector, the supervectors are fed to a support vector machine (SVM), and utterances are classified according to the age and gender group of the speaker. A linear GMM kernel based on Kullback-Leibler divergence is used instead of standard SVM kernels. To determine the optimum parameter values, the NAP channel subspace size is varied between 20 and 200 and the number of GMM components between 32 and 512. In tests on the aGender database, the optimum number of components is found to be 128 and the optimum NAP channel subspace size 45. With these optimum parameters, the use of NAP increases age and gender classification accuracy from 60.52% to 62.03%, age classification accuracy from 60.23% to 61.82%, and gender classification accuracy from 91.71% to 92.30%.	en_US
dc.language.iso	eng	en_US
dc.publisher	SPRINGER HEIDELBERG-HEIDELBERG	en_US
dc.relation.isversionof	10.1007/s12652-020-02045-4	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Speaker age and gender classification, Gaussian mixture model (GMM), Nuisance attribute projection (NAP), Support vector machine (SVM), Maximum a posteriori (MAP)	en_US
dc.subject	AUTOMATIC SPEAKER, FORECAST ENGINE, VERIFICATION	en_US
dc.title	Speaker age and gender classification using GMM supervector and NAP channel compensation method	en_US
dc.type	article	en_US
dc.relation.journal	JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING	en_US
dc.contributor.department	Ordu Üniversitesi	en_US
dc.contributor.authorID	0000-0003-1707-384X	en_US
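
The abstract describes converting each utterance's MAP-adapted GMM into a mean supervector, using a linear kernel based on Kullback-Leibler divergence, and removing channel effects with NAP. The following is a minimal NumPy sketch of those two steps, not the author's implementation. The function names (`kl_supervector`, `nap_basis`, `nap_project`) are hypothetical, and estimating the nuisance subspace from within-speaker variability is an assumption: the record does not say how the subspace was estimated. The supervector scaling shown is the commonly used one under which a plain dot product corresponds to the KL-based linear GMM kernel with diagonal covariances.

```python
import numpy as np


def kl_supervector(means, weights, variances):
    """Stack adapted GMM means into a KL-normalized supervector.

    means: (C, D) MAP-adapted component means
    weights: (C,) UBM mixture weights
    variances: (C, D) diagonal UBM covariances
    Each mean is scaled by sqrt(weight) / sqrt(variance), so a dot product
    of two supervectors acts as the linear GMM kernel.
    """
    scaled = np.sqrt(weights)[:, None] * means / np.sqrt(variances)
    return scaled.reshape(-1)


def nap_basis(supervectors, speaker_ids, k):
    """Estimate a rank-k nuisance basis U (assumption: within-speaker scatter)."""
    X = np.asarray(supervectors, dtype=float)
    ids = np.asarray(speaker_ids)
    centered = np.empty_like(X)
    for s in np.unique(ids):
        mask = ids == s
        # Remove each speaker's mean so only session/channel variation remains.
        centered[mask] = X[mask] - X[mask].mean(axis=0)
    # Right singular vectors are the eigenvectors of the within-speaker scatter.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T  # shape (dim, k), orthonormal columns


def nap_project(X, U):
    """Apply P = I - U U^T without forming the dim x dim matrix."""
    X = np.asarray(X, dtype=float)
    return X - (X @ U) @ U.T
```

In the setting reported in the abstract, `k` would correspond to the NAP channel subspace size (45 at the optimum) and `C` to the number of GMM components (128). The projected supervectors could then be passed to any linear SVM.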

Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)

Makale Koleksiyonu (Article Collection)
