
[1] Wang Rugang, Xu Xinzhou, Huang Chengwei, et al. Speech emotion recognition via discriminant-cascading dimensionality reduction [J]. Journal of Southeast University (English Edition), 2016, 32(2): 151-157. [doi:10.3969/j.issn.1003-7985.2016.02.004]

Speech emotion recognition via discriminant-cascading dimensionality reduction
Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:
32
Issue:
2
Page:
151-157
Research Field:
Information and Communication Engineering
Publishing date:
2016-06-20

Info

Title:
Speech emotion recognition via discriminant-cascading dimensionality reduction
Author(s):
Wang Rugang1,2, Xu Xinzhou1, Huang Chengwei1, Wu Chen1, Zhang Xinran1, Zhao Li1
1Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096, China
2 College of Information Engineering, Yancheng Institute of Technology, Yancheng 224051, China
王如刚1 2 徐新洲1 黄程韦1 吴尘1 张昕然1 赵力1
1东南大学水声信号处理教育部重点实验室, 南京210096; 2盐城工学院信息工程学院, 盐城224051
Keywords:
speech emotion recognition; discriminant-cascading locality preserving projections; discriminant analysis; dimensionality reduction
CLC number:
TN911.72
DOI:
10.3969/j.issn.1003-7985.2016.02.004
Abstract:
In order to accurately identify speech emotion information, the discriminant-cascading effect in dimensionality reduction for speech emotion recognition is investigated. Based on the existing locality preserving projections and the graph embedding framework, a novel discriminant-cascading dimensionality reduction method, named discriminant-cascading locality preserving projections(DCLPP), is proposed. The proposed method utilizes supervised embedding graphs and preserves the inner products of samples in the original space, so as to retain enough information for speech emotion recognition. Then, kernel DCLPP(KDCLPP)is also proposed to extend the mapping form. Validated by experiments on the EMO-DB and eNTERFACE’05 corpora, the proposed methods clearly outperform common dimensionality reduction methods, such as principal component analysis(PCA), linear discriminant analysis(LDA), locality preserving projections(LPP), local discriminant embedding(LDE)and graph-based Fisher analysis(GbFA), with different categories of classifiers.
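For readers unfamiliar with the baseline that DCLPP builds on, the plain LPP step can be sketched as follows. This is a generic illustration of locality preserving projections(He and Niyogi, ref.[12]), not the authors’ discriminant-cascading variant; the function name, the heat-kernel parameter t, and the regularization term are our own assumptions.

```python
import numpy as np

def lpp(X, n_components=2, n_neighbors=5, t=1.0, reg=1e-6):
    """Locality preserving projections: a linear map that keeps samples
    that are close in the input space close in the reduced space."""
    n = X.shape[0]
    # Pairwise squared distances and a kNN heat-kernel affinity graph W.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:n_neighbors + 1]  # nearest neighbors, skip self
        W[i, idx] = np.exp(-d2[i, idx] / t)
    W = np.maximum(W, W.T)                          # symmetrize the graph
    D = np.diag(W.sum(1))                           # degree matrix
    L = D - W                                       # graph Laplacian
    # Generalized eigenproblem  X^T L X a = lambda X^T D X a;
    # eigenvectors with the smallest eigenvalues preserve locality best.
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(X.shape[1])      # small ridge for stability
    vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
    order = np.argsort(vals.real)
    P = vecs[:, order[:n_components]].real          # projection matrix (d x k)
    return P

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))   # 60 toy feature vectors of dimension 10
P = lpp(X, n_components=2)
Y = X @ P                       # embedded samples, shape (60, 2)
```

DCLPP departs from this sketch by using supervised(label-aware)embedding graphs and by constraining the projection so that sample inner products from the original space are preserved; KDCLPP further replaces the linear map X @ P with a kernel-induced nonlinear one.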

References:

[1] Alonso J B, Cabrera J, Medina M, et al. New approach in quantification of emotional intensity from the speech signal: Emotional temperature[J]. Expert Systems with Applications, 2015, 42(24): 9554-9564. DOI:10.1016/j.eswa.2015.07.062.
[2] Raptis S, Karabetsos S, Chalamandaris A, et al. A framework towards expressive speech analysis and synthesis with preliminary results[J]. Journal on Multimodal User Interfaces, 2015, 9(4):387-394.
[3] Kantrowitz J T, Hoptman M J, Leitman D I, et al. Neural substrates of auditory emotion recognition deficits in schizophrenia[J]. Journal of Neuroscience, 2015, 35(44):14909-14921. DOI:10.1523/JNEUROSCI.4603-14.2015.
[4] Mao Q, Dong M, Huang Z, et al. Learning salient features for speech emotion recognition using convolutional neural networks[J]. IEEE Transactions on Multimedia, 2014, 16(8):2203-2213. DOI:10.1109/tmm.2014.2360798.
[5] Arruti A, Cearreta I, Alvarez A, et al. Feature selection for speech emotion recognition in Spanish and Basque: On the use of machine learning to improve human-computer interaction[J]. PLoS ONE, 2014, 9(10):e108975. DOI:10.1371/journal.pone.0108975.
[6] Ooi C S, Seng K P, Ang L M, et al. A new approach of audio emotion recognition[J]. Expert Systems with Applications, 2014, 41(13):5858-5869. DOI:10.1016/j.eswa.2014.03.026.
[7] Yan J. Speech emotion recognition based on sparse representation[J]. Archives of Acoustics, 2013, 38(4):465-470. DOI:10.2478/aoa-2013-0055.
[8] Xu X, Huang C, Wu C, et al. Graph learning based speaker independent speech emotion recognition[J]. Advances in Electrical & Computer Engineering, 2014, 14(2):17-22. DOI:10.4316/aece.2014.02003.
[9] Xu X, Deng J, Zheng W, et al. Dimensionality reduction for speech emotion features by multiscale kernels[C]//Annual Conference of International Speech Communication Association. Dresden, Germany, 2015:1532-1536.
[10] Zha C, Zhang X R, Zhao L, et al. Speaker-independent speech emotion recognition based multiple kernel learning of collaborative representation[J]. IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences, 2016, 99(3):756-759. DOI:10.1587/transfun.e99.a.756.
[11] Roweis S, Saul L. Nonlinear dimensionality reduction by locally linear embedding[J]. Science, 2000, 290: 2323-2326. DOI:10.1126/science.290.5500.2323.
[12] He X, Niyogi P. Locality preserving projections[C]//Advances in Neural Information Processing Systems 16(NIPS 2003). Vancouver and Whistler, Canada, 2003.
[13] Cui Y, Fan L. A novel supervised dimensionality reduction algorithm: Graph-based Fisher analysis[J]. Pattern Recognition, 2012, 45(4):1471-1481. DOI:10.1016/j.patcog.2011.10.006.
[14] Belkin M, Niyogi P. Laplacian eigenmaps and spectral techniques for embedding and clustering[C]// Advances in Neural Information Processing Systems 14(NIPS 2001). Vancouver, Canada, 2001.
[15] Yu X, Wang X, Liu B. Supervised kernel neighborhood preserving projections for radar target recognition[J]. Signal Processing, 2008, 88(9): 2335-2339. DOI:10.1016/j.sigpro.2007.11.015.
[16] Burkhardt F, Paeschke A, Rolfes M, et al. A database of German emotional speech[C]//Eurospeech, European Conference on Speech Communication and Technology. Lisbon, Portugal, 2005:1517-1520.
[17] Martin O, Kotsia I, Macq B. The eNTERFACE’05 audio-visual emotion database[C]//22nd International Conference on Data Engineering Workshops. Atlanta, GA, USA, 2006.

Memo

Memo:
Biographies: Wang Rugang(1975—), male, Ph.D., associate professor; Zhao Li(corresponding author), male, Ph.D., professor, zhaoli@seu.edu.cn.
Foundation items: The National Natural Science Foundation of China(No. 61231002, 61273266), the Ph.D. Program Foundation of Ministry of Education of China(No. 20110092130004), China Postdoctoral Science Foundation(No. 2015M571637).
Citation: Wang Rugang, Xu Xinzhou, Huang Chengwei, et al. Speech emotion recognition via discriminant-cascading dimensionality reduction[J]. Journal of Southeast University(English Edition), 2016, 32(2):151-157. doi:10.3969/j.issn.1003-7985.2016.02.004.
Last Update: 2016-06-20