
Xu Xinzhou, Huang Chengwei, Jin Yun, Wu Chen, et al. Speech emotion recognition using semi-supervised discriminant analysis[J]. Journal of Southeast University (English Edition), 2014, 30(1): 7-12. [doi:10.3969/j.issn.1003-7985.2014.01.002]

Speech emotion recognition using semi-supervised discriminant analysis

Chinese title (translated): Speech emotion recognition based on semi-supervised discriminant analysis


Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

Volume:: 30
Issue:: 1 (2014)

Page:: 7-12

Research Field:: Information and Communication Engineering

Publishing date:: 2014-03-31

Info

Title:: Speech emotion recognition using semi-supervised discriminant analysis

Title (Chinese, translated):: Speech emotion recognition based on semi-supervised discriminant analysis

Author(s):: Xu Xinzhou¹, Huang Chengwei², Jin Yun¹, Wu Chen¹, Zhao Li¹,³; ¹Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096, China
²School of Physical Science and Technology, Soochow University, Suzhou 215006, China
³Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing 210096, China

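Author(s) (Chinese, translated):: Xu Xinzhou¹, Huang Chengwei², Jin Yun¹, Wu Chen¹, Zhao Li¹,³; ¹Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096; ²School of Physical Science and Technology, Soochow University, Suzhou 215006; ³Key Laboratory of Child Development and Learning Science of Ministry of Education, Southeast University, Nanjing 210096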

Keywords:: speech emotion recognition; speech emotion feature; semi-supervised discriminant analysis; dimensionality reduction

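Keywords (Chinese, translated):: speech emotion recognition; speech emotion feature; semi-supervised discriminant analysis; dimensionality reduction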

CLC number:: TN912.3

DOI:: 10.3969/j.issn.1003-7985.2014.01.002

Abstract:: Semi-supervised discriminant analysis (SDA), which combines multiple embedding graphs, and its kernel extension, kernel SDA (KSDA), are applied to supervised speech emotion recognition. In the preprocessing stage, several categories of features, including pitch, zero-crossing rate, energy, duration, formants and Mel-frequency cepstral coefficients (MFCC), together with their statistical parameters, are extracted from each utterance. In the dimensionality reduction stage, before the feature vectors are fed into the classifiers, SDA and KSDA with optimized parameters are used to reduce the dimensionality. Experiments on the Berlin speech emotion database show that, when multi-class support vector machine (SVM) classifiers are used, SDA outperforms several other state-of-the-art dimensionality reduction methods based on spectral graph learning, such as linear discriminant analysis (LDA), locality preserving projections (LPP) and marginal Fisher analysis (MFA). Moreover, through kernelized data mapping, KSDA can achieve better recognition performance than all of the above methods, including SDA.
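The preprocessing stage summarized above turns frame-level contours into utterance-level statistics. The NumPy sketch below illustrates only that idea and is not the authors' feature set: it assumes a 256-sample frame, a 128-sample hop and five functionals (mean, standard deviation, minimum, maximum, range), computes only zero-crossing rate and log energy, and omits pitch, formant and MFCC contours. All helper names and parameter values are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (shape: n_frames x frame_len)."""
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def zero_crossing_rate(frames):
    """Per-frame fraction of adjacent sample pairs whose np.sign values differ."""
    s = np.sign(frames)
    return np.mean(s[:, 1:] != s[:, :-1], axis=1)


def log_energy(frames):
    """Per-frame log mean-square amplitude; the epsilon avoids log(0) on silence."""
    return np.log(np.mean(frames ** 2, axis=1) + 1e-10)


def functionals(contour):
    """Utterance-level statistics of one frame-level contour."""
    return np.array([contour.mean(), contour.std(), contour.min(),
                     contour.max(), contour.max() - contour.min()])


def utterance_features(x, frame_len=256, hop=128):
    """Concatenate functionals of the ZCR and energy contours plus a length feature."""
    x = np.asarray(x, dtype=float)
    frames = frame_signal(x, frame_len, hop)
    stats = [functionals(c) for c in (zero_crossing_rate(frames), log_energy(frames))]
    # Utterance length in samples, used here as a crude duration feature.
    return np.concatenate(stats + [np.array([len(x)], dtype=float)])
```

For a 1 s, 500 Hz sine sampled at 16 kHz, this returns an 11-dimensional vector whose mean zero-crossing rate is close to 2 × 500 / 16000 = 0.0625 and whose mean log energy is close to log(0.5).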

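Abstract (Chinese, translated):: Semi-supervised discriminant analysis (SDA), formulated as a combination of multiple embedding graphs, and kernel SDA (KSDA) are applied to fully supervised speech emotion recognition. In the preprocessing of the emotional components of the speech samples, a range of features and their statistical parameters are extracted from the utterances, including pitch, zero-crossing rate, energy, duration, formants and MFCC (Mel-frequency cepstral coefficients). In the dimensionality reduction stage, before the sample features are fed into the classifier, parameter-optimized SDA or KSDA is used to reduce the dimensionality. Experiments on the Berlin speech emotion database show that, in fully supervised speech emotion recognition with multi-class SVM classifiers, SDA outperforms several other state-of-the-art dimensionality reduction algorithms based on spectral graph learning, such as LDA, LPP and MFA, while KSDA, through kernelized data mapping, can achieve better recognition results than all of the above algorithms.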
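The dimensionality reduction stage can be illustrated with a minimal linear SDA sketch in NumPy/SciPy. It assumes the common SDA formulation: maximize the between-class scatter S_b relative to the total scatter S_t plus a graph-smoothness term α XᵀLX from a k-nearest-neighbour graph and a ridge term βI, which leads to the generalized eigenproblem S_b a = λ(S_t + α XᵀLX + βI)a. The 0/1 neighbour weighting and the values of `k`, `alpha` and `beta` are hypothetical, not the paper's optimized parameters, and the kernel variant (KSDA) is not shown. In a fully supervised setting, `X_unlab` is simply omitted, so the graph is built over the labeled samples only.

```python
import numpy as np
from scipy.linalg import eigh


def knn_graph(X, k=5):
    """Symmetric 0/1 k-nearest-neighbour adjacency matrix over the rows of X.

    The 0/1 weighting is an assumption; heat-kernel weights are a common alternative.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances; exclude self-matches.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    # Symmetrize: i and j are linked if either is among the other's k neighbours.
    return np.maximum(W, W.T)


def sda(X_lab, y, X_unlab=None, n_components=None, alpha=0.1, beta=0.01, k=5):
    """Return a projection matrix A (d x m); project samples (rows) with X @ A.

    alpha weights the graph regularizer; beta adds a ridge term and must be > 0
    so that the right-hand matrix is positive definite. Defaults are hypothetical.
    """
    X_lab = np.asarray(X_lab, dtype=float)
    y = np.asarray(y)
    X_all = X_lab if X_unlab is None else np.vstack([X_lab, np.asarray(X_unlab, dtype=float)])
    classes = np.unique(y)
    d = X_lab.shape[1]
    mu = X_lab.mean(axis=0)

    # Between-class scatter over the labeled samples.
    S_b = np.zeros((d, d))
    for c in classes:
        Xc = X_lab[y == c]
        diff = (Xc.mean(axis=0) - mu)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)

    # Total scatter over the labeled samples.
    Xl = X_lab - mu
    S_t = Xl.T @ Xl

    # Graph-smoothness term X^T L X with Laplacian L = D - W over all samples.
    W = knn_graph(X_all, k)
    L = np.diag(W.sum(axis=1)) - W
    J = X_all.T @ L @ X_all

    # Generalized eigenproblem S_b a = lambda (S_t + alpha J + beta I) a;
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, vecs = eigh(S_b, S_t + alpha * J + beta * np.eye(d))
    m = n_components if n_components is not None else len(classes) - 1
    return vecs[:, ::-1][:, :m]
```

With c classes, the default keeps c − 1 directions (the rank of S_b); the projected features `X @ A` would then be passed to a multi-class classifier such as an SVM.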


Memo

Memo:: Biographies: Xu Xinzhou (born 1987), male, graduate student; Zhao Li (corresponding author), male, Ph.D., professor, zhaoli@seu.edu.cn.
Foundation items: The National Natural Science Foundation of China (No. 61231002, 61273266), the Ph.D. Programs Foundation of Ministry of Education of China (No. 20110092130004).
Citation: Xu Xinzhou, Huang Chengwei, Jin Yun, et al. Speech emotion recognition using semi-supervised discriminant analysis[J]. Journal of Southeast University (English Edition), 2014, 30(1): 7-12. [doi:10.3969/j.issn.1003-7985.2014.01.002]

Last Update: 2014-03-20
