
[1] Wu Chenjian, Huang Chengwei, Chen Hong. Dimensional emotion recognition in whispered speech signal based on cognitive performance evaluation[J]. Journal of Southeast University (English Edition), 2015, 31(3): 311-319. [doi:10.3969/j.issn.1003-7985.2015.03.003]

Dimensional emotion recognition in whispered speech signal based on cognitive performance evaluation

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:
31
Issue:
2015, 3
Page:
311-319
Research Field:
Computer Science and Engineering
Publishing date:
2015-09-20

Info

Title:
Dimensional emotion recognition in whispered speech signal based on cognitive performance evaluation
Author(s):
Wu Chenjian 1, Huang Chengwei 2, Chen Hong 3
1 School of Electronic and Information Engineering, Soochow University, Suzhou 215006, China
2 College of Physics, Optoelectronics and Energy, Soochow University, Suzhou 215006, China
3 School of Mathematical Sciences, Soochow University, Suzhou 215006, China
Keywords:
whispered speech; emotion recognition; emotion dimensional space
PACS:
TP391.4
DOI:
10.3969/j.issn.1003-7985.2015.03.003
Abstract:
Cognitive performance-based dimensional emotion recognition in whispered speech is studied. First, whispered speech emotion databases and data collection methods are compared, and the characteristics of emotion expression in whispered speech are studied, especially the basic types of emotions. Secondly, the emotion features for whispered speech are analyzed, and by reviewing the latest literature, the related valence features and arousal features are provided. The effectiveness of valence and arousal features in whispered speech emotion classification is studied. Finally, the Gaussian mixture model is studied and applied to whispered speech emotion recognition. Cognitive performance is also considered in emotion recognition so that the recognition errors of whispered speech emotion can be corrected. Based on the cognitive scores, the emotion recognition results can be improved. The results show that the formant features are not significantly related to the arousal dimension, while the short-term energy features are related to emotion changes in the arousal dimension. Using the cognitive scores, the recognition results can be improved.
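
As an illustration of the modeling approach outlined in the abstract, the following is a minimal Python sketch (not the authors' implementation) of class-conditional Gaussian mixture model scoring with a hypothetical cognitive-score correction: one GMM is fitted per emotion class, a test utterance is assigned to the class with the highest log-likelihood, and per-class cognitive scores are folded in as a simple additive log-bias. The scikit-learn GaussianMixture estimator, the synthetic feature vectors, the emotion labels, and the bias rule are all assumptions made for demonstration; the paper does not specify these implementation details here.

# Minimal sketch (assumptions only, not the authors' method): per-class GMMs
# over utterance-level acoustic features, with a hypothetical cognitive-score
# re-weighting of the class log-likelihoods.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical training data: rows are utterance-level feature vectors
# (e.g. short-term energy and formant statistics); keys are emotion classes.
emotions = ["anger", "fear", "neutral"]
X_train = {e: rng.normal(loc=float(i), scale=1.0, size=(60, 6))
           for i, e in enumerate(emotions)}

# Fit one diagonal-covariance GMM per emotion class.
models = {
    e: GaussianMixture(n_components=4, covariance_type="diag",
                       random_state=0).fit(X)
    for e, X in X_train.items()
}

def classify(x, cognitive_scores=None):
    """Return the emotion whose GMM gives the highest (optionally re-weighted)
    log-likelihood for feature vector x. cognitive_scores is a hypothetical
    per-class weight derived from listeners' cognitive performance; an additive
    log-bias is used here purely for illustration."""
    scores = {e: float(m.score_samples(x.reshape(1, -1))[0])
              for e, m in models.items()}
    if cognitive_scores is not None:
        scores = {e: s + np.log(cognitive_scores.get(e, 1.0))
                  for e, s in scores.items()}
    return max(scores, key=scores.get)

# Usage: classify one test vector, with and without cognitive-score correction.
x_test = rng.normal(loc=1.0, scale=1.0, size=6)
print(classify(x_test))
print(classify(x_test, cognitive_scores={"anger": 0.8, "fear": 1.2, "neutral": 1.0}))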

References:

[1] Liang R, Xi J, Zhao L, et al. Experimental study and improvement of frequency lowering algorithm in Chinese digital hearing aids[J]. Acta Physica Sinica, 2012, 61(13):134305-1-134305-11.
[2] Hultsch H, Todt D, Züblke K. Einsatz und soziale Interpretation geflüsterter Signale, Umwelt und Verhalten[M]. Bern, Switzerland: Huber Verlag, 1992: 391-406.
[3] Tartter V C, Braun D. Hearing smiles and frowns in normal and whisper registers[J]. Journal of the Acoustical Society of America, 1994, 96(4): 2101-2107.
[4] Cirillo J, Todt D. Decoding whispered vocalizations: Relationships between social and emotional variables[C]//Proceedings of the 9th International Conference on Neural Information Processing. Singapore, 2002:1559-1563.
[6] Gong C, Zhao H, Wang Y, et al. Development of Chinese whispered database for speaker verification[C]//2009 Asia Pacific Conference on Postgraduate Research in Microelectronics & Electronics. Shanghai, China, 2009: 197-200.
[6] Gong C, Zhao H, Wang Y, et al. Development of Chinese whispered database for speaker verification[C]//2009 Asia Pacific Conference on Postgraduate Research, Microelectronics & Electronics. Shanghai, China, 2009:197-200.
[7] Gong C, Zhao H. Tone recognition of Chinese whispered speech[C]//2008 Pacific-Asia Workshop on Computational Intelligence and Industrial Application. Wuhan, China, 2008:418-422.
[8] Tartter V C. Identifiability of vowels and speakers from whispered syllables[J]. Perception and Psychophysics, 1991, 49(4):365-372.
[9] Takeda T K, Itakura F. Acoustic analysis and recognition of whispered speech[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Orlando, FL, USA, 2002: 389-392.
[10] Yang L, Li Y, Xu B. The establishment of a Chinese whisper database and perceptual experiment [J]. Journal of Nanjing University:Natural Science, 2005, 41(3):311-317.
[11] Huang C, Jin Y, Zhao L, et al. Speech emotion recognition based on decomposition of feature space and information fusion [J]. Signal Processing, 2010, 26(6): 835-842.
[12] Huang C, Jin Y, Zhao Y, et al. Recognition of practical emotion from elicited speech [C]//Proceedings of ICISE. Nanjing, China, 2009:639-642.
[13] Huang C, Jin Y, Zhao Y, et al. Speech emotion recognition based on re-composition of two-class classifiers[C]//Proceedings of ACII. Amsterdam, the Netherlands, 2009: 1-3.
[14] Schwartz M F, Rine M F. Identification of speaker sex from isolated, whispered vowels[J]. Journal of the Acoustical Society of America, 1968, 44(6): 1736-1737.
[15] Tartter V C. Identifiability of vowels and speakers from whispered syllables[J]. Perception and Psychophysics, 1991, 49(4): 365-372.
[16] Higashikawa M, Minifie F D. Acoustical-perceptual correlates of “whisper pitch” in synthetically generated vowels[J]. Journal of Speech, Language, and Hearing Research, 1999, 42(3): 583-591.
[17] Morris R W. Enhancement and recognition of whispered speech[D]. Atlanta, USA:School of Electrical and Computer Engineering, Georgia Institute of Technology, 2002.
[18] Gao M. Tones in whispered Chinese: articulatory and perceptual cues[D]. Victoria, Canada: Department of Linguistics, University of Victoria, 2002.
[19] Huang C, Jin Y, Bao Y, et al. Whispered speech emotion recognition embedded with Markov networks and multi-scale decision fusion[J]. Signal Processing, 2013, 29(1): 98-106.
[20] Jin Y, Zhao Y, Huang C, et al. The design and establishment of a Chinese whispered speech emotion database [J]. Technical Acoustics, 2010, 29(1): 63-68.
[21] Zhao Y. Research on several key technologies in speech emotion recognition and feature analysis[D]. Nanjing:School of Information Science and Engineering, Southeast University, 2010.
[22] Nwe T L, Foo S W, Silva L C D. Speech emotion recognition using hidden Markov models[J]. Speech Communication, 2003, 41(4): 603-623.
[23] Huang C, Jin Y, Zhao Y, et al. Design and establishment of practical speech emotion database[J]. Technical Acoustics, 2010, 29(4): 396-399.
[24] Schuller B, Arsic D, Wallhoff F, et al. Emotion recognition in the noise applying large acoustic feature sets[C]//The 3rd International Conference on Speech Prosody. Dresden, Germany, 2006: 276-289.
[25] Tawari A, Trivedi M M. Speech emotion analysis in noisy real-world environment[C]//Proceedings of the 20th International Conference on Pattern Recognition. Washington DC, USA, 2010:4605-4608.
[26] Johnstone T, van Reekum C M, Hird K, et al. Affective speech elicited with a computer game[J]. Emotion, 2005, 5(4): 513-518.
[27] Zou C, Huang C, Han D, et al. Detecting practical speech emotion in a cognitive task[C]//20th International Conference on Computer Communications and Networks. Hawaii, USA, 2011:1-5.
[28] Kockmann M, Burget L, Cernocky J H. Application of speaker-and language identification state-of-the-art techniques for emotion recognition[J]. Speech Communication, 2011, 53(9/10):1172-1185.
[29] Lin Y, Wei G. Speech emotion recognition based on HMM and SVM[C]//Proceedings of 2005 International Conference on Machine Learning and Cybernetics. Bonn, Germany, 2005:4898-4901.
[30] Jin Y, Huang C, Zhao L. A semi-supervised learning algorithm based on modified self-training SVM[J]. Journal of Computers, 2011, 6(7): 1438-1443.
[31] Dellaert F, Polzin T, Waibel A. Recognizing emotion in speech[C]//The Fourth International Conference on Spoken Language. Pittsburgh, PA, USA, 1996:1970-1973.
[32] Lee C, Mower E, Busso C, et al. Emotion recognition using a hierarchical binary decision tree approach[J]. Speech Communication, 2011, 53(9/10):1162-1171.
[33] Nicholson J, Takahashi K, Nakatsu R. Emotion recognition in speech using neural networks[J]. Neural Computing & Applications, 2000, 9(4):290-296.
[34] Yu H, Huang C, Zhang X, et al. Shuffled frog-leaping algorithm based neural network and its application in speech emotion recognition[J]. Journal of Nanjing University of Science and Technology, 2011, 35(5):659-663.
[35] Wang Z. Feature analysis and emotion recognition in emotional speech[D]. Nanjing: School of Information Science and Engineering, Southeast University, 2004.
[36] Yu H, Huang C, Jin Y, et al. Speech emotion recognition based on modified shuffled frog leaping algorithm neural network[J]. Signal Processing, 2010, 26(9): 1294-1299.

Memo

Memo:
Biography: Wu Chenjian (1983—), male, Ph.D., lecturer, cjwu@suda.edu.cn.
Foundation item: The National Natural Science Foundation of China(No.11401412).
Citation: Wu Chenjian, Huang Chengwei, Chen Hong. Dimensional emotion recognition in whispered speech signal based on cognitive performance evaluation[J]. Journal of Southeast University (English Edition), 2015, 31(3): 311-319. [doi:10.3969/j.issn.1003-7985.2015.03.003]
Last Update: 2015-09-20