
[1] Xu Xinzhou, Huang Chengwei, Jin Yun, Wu Chen, et al. Speech emotion recognition using semi-supervised discriminant analysis [J]. Journal of Southeast University (English Edition), 2014, 30 (1): 7-12. [doi:10.3969/j.issn.1003-7985.2014.01.002]

Speech emotion recognition using semi-supervised discriminant analysis

Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

Volume:
30
Issue:
2014, 1
Page:
7-12
Research Field:
Information and Communication Engineering
Publishing date:
2014-03-31

Info

Title:
Speech emotion recognition using semi-supervised discriminant analysis
Author(s):
Xu Xinzhou1, Huang Chengwei2, Jin Yun1, Wu Chen1, Zhao Li1, 3
1Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096, China
2School of Physical Science and Technology, Soochow University, Suzhou 215006, China
3Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing 210096, China
Keywords:
speech emotion recognition; speech emotion feature; semi-supervised discriminant analysis; dimensionality reduction
PACS:
TN912.3
DOI:
10.3969/j.issn.1003-7985.2014.01.002
Abstract:
Semi-supervised discriminant analysis (SDA), which combines multiple embedding graphs, and kernel SDA (KSDA) are adopted for supervised speech emotion recognition. After the speech signal samples are preprocessed, different categories of features, including pitch, zero-crossing rate, energy, duration, formants and Mel-frequency cepstral coefficients (MFCC), as well as their statistical parameters, are extracted from the utterances. In the dimensionality reduction stage, before the feature vectors are sent into the classifiers, parameter-optimized SDA and KSDA are performed. Experiments on the Berlin speech emotion database show that, when multi-class support vector machine (SVM) classifiers are used, SDA outperforms other state-of-the-art dimensionality reduction methods based on spectral graph learning, such as linear discriminant analysis (LDA), locality preserving projections (LPP) and marginal Fisher analysis (MFA). Additionally, KSDA achieves better recognition performance than all of the above methods, including SDA, through its kernelized data mapping.
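The pipeline the abstract describes, a graph-regularized discriminant projection followed by an SVM, can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes NumPy/SciPy/scikit-learn, uses a k-NN connectivity graph as the single embedding graph (the paper combines multiple graphs), and all data shapes and parameter values are made up for the demo.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph
from sklearn.svm import SVC

def sda_fit(X_lab, y, X_unlab, alpha=0.1, k=5, n_components=2):
    """Sketch of semi-supervised discriminant analysis (SDA).

    Maximizes between-class scatter of the labeled samples while
    regularizing with a k-NN graph Laplacian built over labeled AND
    unlabeled samples, then solves a generalized eigenproblem.
    """
    X_all = np.vstack([X_lab, X_unlab])
    d = X_all.shape[1]
    mu = X_lab.mean(axis=0)

    # Between-class scatter S_b from the labeled samples only
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X_lab[y == c]
        mc = Xc.mean(axis=0)
        S_b += len(Xc) * np.outer(mc - mu, mc - mu)

    # Total scatter S_t of the labeled samples
    X0 = X_lab - mu
    S_t = X0.T @ X0

    # Graph Laplacian L = D - W over all (labeled + unlabeled) samples
    W = kneighbors_graph(X_all, k, mode="connectivity").toarray()
    W = 0.5 * (W + W.T)                       # symmetrize
    L = np.diag(W.sum(axis=1)) - W

    # Generalized eigenproblem: S_b v = lambda (S_t + alpha X^T L X) v
    reg = S_t + alpha * X_all.T @ L @ X_all + 1e-6 * np.eye(d)
    _, vecs = eigh(S_b, reg)                  # eigenvalues ascending
    return vecs[:, ::-1][:, :n_components]    # keep top eigenvectors

# Toy demo: two Gaussian "emotion classes" in a 10-D feature space
rng = np.random.default_rng(0)
X0 = rng.normal(size=(40, 10)); X0[:, 0] += 4.0
X1 = rng.normal(size=(40, 10)); X1[:, 0] -= 4.0
X_lab = np.vstack([X0[:20], X1[:20]])
y = np.array([0] * 20 + [1] * 20)
X_unlab = np.vstack([X0[20:], X1[20:]])

P = sda_fit(X_lab, y, X_unlab)        # (10, 2) projection matrix
clf = SVC().fit(X_lab @ P, y)         # multi-class SVM on reduced features
```

The kernel variant (KSDA) would replace the linear scatter matrices with their counterparts in a kernel-induced feature space; that step is omitted here for brevity.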

References:

[1] Dellaert F, Polzin T, Waibel A. Recognizing emotion in speech[C]//International Conference on Spoken Language. Philadelphia, PA, USA, 1996, 3: 1970-1973.
[2] Ververidis D, Kotropoulos C. Emotional speech recognition: resources, features, and methods[J]. Speech Communication, 2006, 48(9): 1162-1181.
[3] Schuller B, Rigoll G. Timing levels in segment-based speech emotion recognition[C]//International Conference on Spoken Language. Pittsburgh, PA, USA, 2006: 1818-1821.
[4] Oudeyer P. The production and recognition of emotions in speech: features and algorithms[J]. International Journal of Human-Computer Studies, 2003, 59(1/2): 157-183.
[5] Tato R, Santos R, Kompe R, et al. Emotional space improves emotion recognition[C]//International Conference on Spoken Language. Denver, CO, USA, 2002: 2029-2032.
[6] Zhang S Q, Zhao X M, Lei B C. Speech emotion recognition using an enhanced kernel Isomap for human-robot interaction[J]. International Journal of Advanced Robotic Systems, 2013, 10: 114-01-114-07.
[7] You M Y, Chen C, Bu J J, et al. Emotional speech analysis on nonlinear manifold[C]//International Conference on Pattern Recognition. Hong Kong, China, 2006, 3: 91-94.
[8] Ayadi M, Kamel M, Karray F. Survey on speech emotion recognition: features, classification schemes, and databases[J]. Pattern Recognition, 2011, 44(3): 572-587.
[9] Roweis S, Saul L. Nonlinear dimensionality reduction by locally linear embedding[J]. Science, 2000, 290(5500): 2323-2326.
[10] Tenenbaum J, de Silva V, Langford J. A global geometric framework for nonlinear dimensionality reduction[J]. Science, 2000, 290(5500): 2319-2323.
[11] Belkin M, Niyogi P. Laplacian eigenmaps and spectral techniques for embedding and clustering[C]//Advances in Neural Information Processing Systems 14. Whistler, British Columbia, Canada, 2002: 585-591.
[12] He X F, Niyogi P. Locality preserving projections[C]//Advances in Neural Information Processing Systems 15. Whistler, British Columbia, Canada, 2003: 153-160.
[13] Wang R P, Shan S G, Chen X L, et al. Maximal linear embedding for dimensionality reduction[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(9): 1776-1792.
[14] Cai H P, Mikolajczyk K, Matas J. Learning linear discriminant projections for dimensionality reduction of image descriptors[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(2): 338-352.
[15] Yan S C, Xu D, Zhang B Y, et al. Graph embedding and extensions: a general framework for dimensionality reduction[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(1): 40-51.
[16] De la Torre F. A least-squares framework for component analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(6): 1041-1055.
[17] Cai D, He X F. Semi-supervised discriminant analysis[C]//International Conference on Computer Vision. Rio de Janeiro, Brazil, 2007: 1-7.
[18] Shawe-Taylor J, Cristianini N. Kernel methods for pattern analysis[M]. Cambridge, UK: Cambridge University Press, 2004.

Memo

Memo:
Biographies: Xu Xinzhou(1987—), male, graduate; Zhao Li(corresponding author), male, doctor, professor, zhaoli@seu.edu.cn.
Foundation items: The National Natural Science Foundation of China (Nos. 61231002, 61273266), the Ph.D. Programs Foundation of Ministry of Education of China (No. 20110092130004).
Citation: Xu Xinzhou, Huang Chengwei, Jin Yun, et al. Speech emotion recognition using semi-supervised discriminant analysis[J]. Journal of Southeast University (English Edition), 2014, 30(1): 7-12. [doi:10.3969/j.issn.1003-7985.2014.01.002]
Last Update: 2014-03-20