
[1] Tashpolat Nizamidin, Zhao Li, Zhang Mingyang, et al. Emotion recognition of Uyghur speech using uncertain linear discriminant analysis [J]. Journal of Southeast University (English Edition), 2017, 33(4): 437-443. [doi:10.3969/j.issn.1003-7985.2017.04.008]

Emotion recognition of Uyghur speech using uncertain linear discriminant analysis

Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

Volume:
33
Issue:
4
Page:
437-443
Research Field:
Computer Science and Engineering
Publishing date:
2017-12-30

Info

Title:
Emotion recognition of Uyghur speech using uncertain linear discriminant analysis
Author(s):
Tashpolat Nizamidin1, 2, Zhao Li1, Zhang Mingyang1, Xu Xinzhou1, Askar Hamdulla2
1Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096, China
2School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
Keywords:
Uyghur language; speech emotion corpus; pitch; formant; uncertain linear discriminant analysis (ULDA)
PACS:
TP391
DOI:
10.3969/j.issn.1003-7985.2017.04.008
Abstract:
To achieve efficient and compact low-dimensional features for speech emotion recognition, a novel feature reduction method using uncertain linear discriminant analysis is proposed. Following the same principles as conventional linear discriminant analysis (LDA), uncertainties of the noisy or distorted input data are employed in order to estimate maximally discriminant directions. The effectiveness of the proposed uncertain LDA (ULDA) is demonstrated on the Uyghur speech emotion recognition task. The emotional features of Uyghur speech, especially the fundamental frequency and formants, are analyzed in the collected emotional data. Then, ULDA is employed for dimensionality reduction of the emotional features, and better performance is achieved compared with other dimensionality reduction techniques. Speech emotion recognition for Uyghur is implemented by feeding the low-dimensional data produced by the proposed ULDA to a support vector machine (SVM). The experimental results show that, when an appropriate uncertainty estimation algorithm is employed, uncertain LDA outperforms its conventional LDA counterpart on Uyghur speech emotion recognition.
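The pipeline described in the abstract — project emotional speech features onto at most (number of classes − 1) discriminant directions, then classify with an SVM — can be sketched with conventional LDA as a stand-in, since ULDA itself is not available in standard libraries. This is a minimal illustration using scikit-learn and synthetic data in place of real pitch/formant features; it shows the baseline against which the paper's ULDA is compared, not the authors' implementation.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for emotional speech features (e.g., pitch/formant
# statistics): 4 emotion classes, 20-dimensional feature vectors.
rng = np.random.default_rng(0)
n_per_class, n_features, n_classes = 50, 20, 4
X = np.vstack([
    rng.normal(loc=3.0 * c, scale=2.0, size=(n_per_class, n_features))
    for c in range(n_classes)
])
y = np.repeat(np.arange(n_classes), n_per_class)

# LDA reduces to at most (n_classes - 1) discriminant directions,
# then an RBF-kernel SVM classifies in the low-dimensional space.
clf = make_pipeline(
    LinearDiscriminantAnalysis(n_components=n_classes - 1),
    SVC(kernel="rbf", gamma="scale"),
)
scores = cross_val_score(clf, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.3f}")
```

ULDA would replace the `LinearDiscriminantAnalysis` step, incorporating per-observation uncertainty estimates into the scatter matrices before solving for the discriminant directions.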

References:

[1] El Ayadi M, Kamel M S, Karray F. Survey on speech emotion recognition: Features, classification schemes, and databases [J]. Pattern Recognition, 2011, 44(3): 572-587. DOI:10.1016/j.patcog.2010.09.020.
[2] Chu D, Liao L Z, Ng M K, et al. Incremental linear discriminant analysis: A fast algorithm and comparisons [J]. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(11): 2716-2735. DOI:10.1109/TNNLS.2015.2391201.
[3] Quan C, Wan D, Zhang B, et al. Reduce the dimensions of emotional features by principal component analysis for speech emotion recognition [C]//Proceedings of the 2013 IEEE/SICE International Symposium on System Integration. Kobe, Japan, 2013: 222-226. DOI:10.1109/sii.2013.6776653.
[4] Saeidi R, Astudillo R F, Kolossa D. Uncertain LDA: Including observation uncertainties in discriminative transforms [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(7): 1479-1488.
[5] Soldera J, Behaine C A R, Scharcanski J. Customized orthogonal locality preserving projections with soft-margin maximization for face recognition [J]. IEEE Transactions on Instrumentation and Measurement, 2015, 64(9): 2417-2426. DOI:10.1109/TIM.2015.2415012.
[6] Zhou Y, Peng J, Chen C L P. Dimension reduction using spatial and spectral regularized local discriminant embedding for hyperspectral image classification [J]. IEEE Transactions on Geoscience and Remote Sensing, 2015, 53(2): 1082-1095.
[7] Li W, Du Q. Laplacian regularized collaborative graph for discriminant analysis of hyperspectral imagery [J]. IEEE Transactions on Geoscience and Remote Sensing, 2016, 54(12): 7066-7076. DOI:10.1109/tgrs.2016.2594848.
[8] Burkhardt F, Paeschke A, Rolfes M, et al. A database of German emotional speech [C]//Proceedings of the 2005 INTERSPEECH. Lisbon, Portugal, 2005:1517-1520.
[9] McGilloway S, Cowie R, Douglas-Cowie E, et al. Approaching automatic recognition of emotion from voice: A rough benchmark [C]//Proceedings of the 2000 ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research. Newcastle, Northern Ireland, UK, 2000:207-212.
[10] Ablimit M, Eli M, Kawahara T. Partly supervised Uyghur morpheme segmentation [C]//Oriental Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques Workshop. Kyoto, Japan, 2008: 71-76.
[11] Pan S, Tao J, Li Y. The CASIA audio emotion recognition method for audio/visual emotion challenge 2011 [C]//Affective Computing and Intelligent Interaction Fourth International Conference. Memphis, USA, 2011:388-395. DOI:10.1007/978-3-642-24571-8_50.
[12] Eyben F, Wöllmer M, Schuller B. openSMILE: The Munich versatile and fast open-source audio feature extractor [C]//ACM International Conference on Multimedia. Firenze, Italy, 2010: 1459-1462.
[13] Xu X Z, Deng J, Zheng W M, et al. Dimensionality reduction for speech emotion features by multiscale kernels [C]//Proceedings of Annual Conference of the International Speech Communication Association. Dresden, Germany, 2015:1532-1536.
[14] Wu S, Falk T H, Chan W Y. Automatic speech emotion recognition using modulation spectral features [J]. Speech Communication, 2011, 53(5): 768-785. DOI:10.1016/j.specom.2010.08.013.

Memo
Biographies: Tashpolat Nizamidin (1988—), male, graduate; Zhao Li (corresponding author), male, doctor, professor, zhaoli@seu.edu.cn.
Foundation item: The National Natural Science Foundation of China (No. 61673108, 61231002).
Citation: Tashpolat Nizamidin, Zhao Li, Zhang Mingyang, et al. Emotion recognition of Uyghur speech using uncertain linear discriminant analysis [J]. Journal of Southeast University (English Edition), 2017, 33(4): 437-443. DOI: 10.3969/j.issn.1003-7985.2017.04.008.
Last Update: 2017-12-20