Novel feature fusion method for speech emotion recognition based on multiple kernel learning

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

2013, 29(2)
Research Field:
Information and Communication Engineering
Publishing date:


Novel feature fusion method for speech emotion recognition based on multiple kernel learning
Jin Yun 1,2, Song Peng 1, Zheng Wenming 3, Zhao Li 1
1School of Information Science and Engineering, Southeast University, Nanjing 210096, China
2School of Physics and Electronic Engineering, Jiangsu Normal University, Xuzhou 221116, China
3Research Center for Learning Science, Southeast University, Nanjing 210096, China
Keywords: speech emotion recognition; multiple kernel learning; feature fusion; support vector machine
Abstract: To improve the performance of speech emotion recognition, a novel feature fusion method is proposed. In addition to the global features, the local information of different kinds of features is exploited, and the global and local features are combined. A multiple kernel learning method is then adopted: the global features and each kind of local feature are each associated with a kernel, and these kernels are summed with different weights to obtain a mixed kernel for nonlinear mapping, so that different kinds of emotional features can be easily separated in the reproducing kernel Hilbert space. In the experiments on the popular Berlin dataset, the optimal parameters of the global and local kernels are determined by cross-validation. The kernel weights obtained by multiple kernel learning show that the formant and intensity features play a key role in speech emotion recognition. The recognition rate is 78.74% with the global kernel alone and 81.10% with the proposed method, which demonstrates the effectiveness of the proposed method.
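The kernel-combination step described above can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the weight optimization of multiple kernel learning itself is not shown, the RBF kernel choice, the function names, the toy feature groups, and the fixed weights are all illustrative assumptions; the sketch only shows how one kernel per feature group is built and summed with non-negative weights into a single mixed Gram matrix.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # Gram matrix of the Gaussian (RBF) kernel between the rows of X and Y.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def mixed_kernel(feature_groups, gammas, weights):
    # Weighted sum of one RBF kernel per feature group (e.g. global, formant,
    # intensity features of the same utterances). As in standard multiple
    # kernel learning formulations, the weights are non-negative and sum to 1.
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return sum(w * rbf_kernel(X, X, g)
               for X, g, w in zip(feature_groups, gammas, weights))

# Toy stand-ins for two feature groups extracted from the same 6 utterances
# (hypothetical dimensions; real features would come from a front end such
# as an acoustic feature extractor).
rng = np.random.default_rng(0)
groups = [rng.normal(size=(6, 4)), rng.normal(size=(6, 3))]
K = mixed_kernel(groups, gammas=[0.5, 0.5], weights=[0.6, 0.4])
```

Because each summand is a valid (positive semidefinite) kernel matrix and the weights are non-negative, the mixed matrix K is itself a valid kernel and can be fed to any kernel classifier, such as an SVM accepting a precomputed Gram matrix.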


[1] Cowie R, Douglas-Cowie E, Tsapatsoulis N, et al. Emotion recognition in human-computer interaction [J]. IEEE Signal Processing Magazine, 2001, 18(1):32-80.
[2] Ververidis D, Kotropoulos C, Pitas I. Automatic emotional speech classification[C]//Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing. Montreal, Canada, 2004:593-596.
[3] Zeng Z, Pantic M, Roisman G I, et al. A survey of affect recognition methods: audio, visual, and spontaneous expressions [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(1):39-58.
[4] Lee C M, Yildirim S, Bulut M, et al. Emotion recognition based on phoneme classes [C]//8th International Conference on Spoken Language Processing. Jeju Island, Korea, 2004:889-892.
[5] Banse R, Scherer K. Acoustic profiles in vocal emotion expression [J]. J Personality Social Psych, 1996, 70(3):614-636.
[6] Schuller B, Steidl S, Batliner A. The INTERSPEECH 2009 emotion challenge [C]//Proceedings of the Annual Conference of the International Speech Communication Association. Brighton, UK, 2009:312-315.
[7] Schuller B, Valstar M, Eyben F, et al. AVEC 2011—the first international audio/visual emotion challenge[C]//Lecture Notes in Computer Science. Springer, 2011, 6975:415-424.
[8] Bach F R, Lanckriet G R G, Jordan M I. Multiple kernel learning, conic duality, and the SMO algorithm [C]//Proceedings of the Twenty-First International Conference on Machine Learning. Banff, Canada, 2004:41-48.
[9] Lin Yen-Yu, Liu Tyng-Luh, Fuh Chiou-Shann. Multiple kernel learning for dimensionality reduction [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(6):1147-1160.
[10] Burkhardt F, Paeschke A, Rolfes M, et al. A database of German emotional speech [C]//9th European Conference on Speech Communication and Technology. Lisbon, Portugal, 2005:1517-1520.
[11] Bitouk D, Verma R, Nenkova A. Class-level spectral features for emotion recognition [J]. Speech Communication, 2010, 52(7/8):613-625.
[12] Boersma P. Praat, a system for doing phonetics by computer [J]. Glot International, 2001, 5(9/10): 341-345.


Biographies: Jin Yun (1979—), male, graduate; Zhao Li (corresponding author), doctor, professor, zhaoli@seu.edu.cn.
Foundation items: The National Natural Science Foundation of China (No. 61231002, 61273266), the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).
Citation: Jin Yun, Song Peng, Zheng Wenming, et al. Novel feature fusion method for speech emotion recognition based on multiple kernel learning [J]. Journal of Southeast University (English Edition), 2013, 29(2):129-133. [doi:10.3969/j.issn.1003-7985.2013.02.004]
Last Update: 2013-06-20