Jin Yun, Song Peng, Zheng Wenming, et al. Novel feature fusion method for speech emotion recognition based on multiple kernel learning [J]. Journal of Southeast University (English Edition), 2013, 29(2): 129-133. [doi:10.3969/j.issn.1003-7985.2013.02.004]

Novel feature fusion method for speech emotion recognition based on multiple kernel learning

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:
29
Issue:
2
Page:
129-133
Research Field:
Information and Communication Engineering
Publishing date:
2013-06-20

Info

Title:
Novel feature fusion method for speech emotion recognition based on multiple kernel learning
Author(s):
Jin Yun1 2 Song Peng1 Zheng Wenming3 Zhao Li1
1School of Information Science and Engineering, Southeast University, Nanjing 210096, China
2School of Physics and Electronic Engineering, Jiangsu Normal University, Xuzhou 221116, China
3Research Center for Learning Science, Southeast University, Nanjing 210096, China
Keywords:
speech emotion recognition; multiple kernel learning; feature fusion; support vector machine
CLC number:
TN912.3
DOI:
10.3969/j.issn.1003-7985.2013.02.004
Abstract:
In order to improve the performance of speech emotion recognition, a novel feature fusion method is proposed. Building on the global features, the local information of different kinds of features is exploited, and the global and local features are combined. Multiple kernel learning is then adopted: the global features and each kind of local feature are each associated with a kernel, and these kernels are summed with different weights to obtain a mixed kernel for nonlinear mapping, so that different classes of emotional features become easier to separate in the reproducing kernel Hilbert space. In the experiments, the popular Berlin dataset is used, and the optimal parameters of the global and local kernels are determined by cross-validation. After training with multiple kernel learning, the weights of all the kernels are obtained, which shows that the formant and intensity features play a key role in speech emotion recognition. The recognition rate is 78.74% when using the global kernel alone, and 81.10% when using the proposed method, which demonstrates the effectiveness of the proposed method.
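The kernel combination described in the abstract can be sketched as follows. This is a minimal illustration in NumPy, not the authors' implementation: the feature blocks, kernel widths, and weights below are placeholder values (in the paper, the weights are learned by multiple kernel learning and the feature groups come from real acoustic features such as formants and intensity).

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
# Placeholder feature blocks standing in for the paper's feature groups:
# one block of global statistics plus several local feature groups
# (e.g. formant, intensity, pitch).
X_global = rng.normal(size=(60, 20))
X_locals = [rng.normal(size=(60, 5)) for _ in range(3)]

# One kernel per feature group. In the paper the weights are learned by MKL;
# here they are fixed placeholders that sum to 1.
weights = [0.4, 0.3, 0.2, 0.1]
kernels = [rbf_kernel(X_global, X_global, gamma=0.05)]
kernels += [rbf_kernel(Xl, Xl, gamma=0.1) for Xl in X_locals]

# Weighted sum of the per-group kernels gives the mixed kernel.
K_mixed = sum(w * K for w, K in zip(weights, kernels))
```

A nonnegative weighted sum of valid kernels is itself a valid (symmetric, positive semidefinite) kernel, so `K_mixed` can be passed directly to a precomputed-kernel SVM for the final classification step.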

References:

[1] Cowie R, Douglas-Cowie E, Tsapatsoulis N, et al. Emotion recognition in human-computer interaction [J]. IEEE Signal Processing Magazine, 2001, 18(1):32-80.
[2] Ververidis D, Kotropoulos C, Pitas I. Automatic emotional speech classification[C]//Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing. Montreal, Canada, 2004:593-596.
[3] Zeng Z, Pantic M, Roisman G I, et al. A survey of affect recognition methods: audio, visual, and spontaneous expressions [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(1):39-58.
[4] Lee C M, Yildirim S, Bulut M, et al. Emotion recognition based on phoneme classes [C]//8th International Conference on Spoken Language Processing. Jeju Island, Korea, 2004:889-892.
[5] Banse R, Scherer K. Acoustic profiles in vocal emotion expression [J]. J Personality Social Psych, 1996, 70(3):614-636.
[6] Schuller B, Steidl S, Batliner A. The INTERSPEECH 2009 emotion challenge [C]//Proceedings of the Annual Conference of the International Speech Communication Association. Brighton, UK, 2009:312-315.
[7] Schuller B, Valstar M, Eyben F, et al. AVEC 2011—the first international audio/visual emotion challenge[C]//Lecture Notes in Computer Science. Springer, 2011, 6975:415-424.
[8] Bach F R, Lanckriet G R G, Jordan M I. Multiple kernel learning, conic duality, and the SMO algorithm [C]//Proceedings of the Twenty-First International Conference on Machine Learning. Banff, Canada, 2004:41-48.
[9] Lin Yen-Yu, Liu Tyng-Luh, Fuh Chiou-Shann. Multiple kernel learning for dimensionality reduction [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(6):1147-1160.
[10] Burkhardt F, Paeschke A, Rolfes M, et al. A database of German emotional speech [C]//9th European Conference on Speech Communication and Technology. Lisbon, Portugal, 2005:1517-1520.
[11] Bitouk D, Verma R, Nenkova A. Class-level spectral features for emotion recognition [J]. Speech Communication, 2010, 52(7/8):613-625.
[12] Boersma P. Praat, a system for doing phonetics by computer [J]. Glot International, 2001, 5(9/10): 341-345.

Memo

Biographies: Jin Yun (1979—), male, graduate student; Zhao Li (corresponding author), Ph.D., professor, zhaoli@seu.edu.cn.
Foundation items: The National Natural Science Foundation of China (No. 61231002, 61273266), the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).
Citation: Jin Yun, Song Peng, Zheng Wenming, et al. Novel feature fusion method for speech emotion recognition based on multiple kernel learning [J]. Journal of Southeast University (English Edition), 2013, 29(2): 129-133. [doi:10.3969/j.issn.1003-7985.2013.02.004]
Last Update: 2013-06-20