[1] Liang Zhenlin, Liang Ruiyu, Tang Manting, et al. Transfer learning with deep sparse auto-encoder for speech emotion recognition [J]. Journal of Southeast University (English Edition), 2019, 35(2): 160-167. [doi:10.3969/j.issn.1003-7985.2019.02.003]

Transfer learning with deep sparse auto-encoder for speech emotion recognition

Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

2019, Issue 2
Research Field:
Information and Communication Engineering

Transfer learning with deep sparse auto-encoder for speech emotion recognition
Liang Zhenlin¹, Liang Ruiyu¹,², Tang Manting³, Xie Yue¹, Zhao Li¹, Wang Shijia¹
¹ School of Information Science and Engineering, Southeast University, Nanjing 210096, China
² School of Communication Engineering, Nanjing Institute of Technology, Nanjing 211167, China
³ School of Computer Engineering, Jinling Institute of Technology, Nanjing 211169, China
Keywords: sparse auto-encoder; transfer learning; speech emotion recognition
Abstract: To improve the efficiency of cross-corpus speech emotion recognition, a speech emotion transfer learning method based on the deep sparse auto-encoder is proposed. First, the deep sparse auto-encoder is trained to reconstruct a small amount of target-domain data, so that the encoder learns a low-dimensional structural representation of the target domain. Then, both the source-domain data and the target-domain data are encoded by the trained deep sparse auto-encoder, yielding reconstructed data whose low-dimensional structural representation is close to that of the target domain. Finally, a portion of the reconstructed labeled target-domain data is mixed with the reconstructed source-domain data to jointly train the classifier; this portion of the target-domain data guides the source-domain data. Experiments on the CASIA and SoutheastLab corpora show that, after transfer with a small amount of data, the recognition rate of the model with the DNN reaches 89.2% and 72.4%, respectively. Compared with training on the complete original corpus, the recognition rate decreases by only 2% on the CASIA corpus and by only 3.4% on the SoutheastLab corpus. The experiments show that, in the extreme case where only a small part of the data set is labeled, the algorithm approaches the effect of labeling all the data.
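The three stages described in the abstract can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it uses a single hidden layer rather than a deep (stacked) auto-encoder, assumes features scaled to [0, 1], runs on synthetic random data instead of CASIA or SoutheastLab features, and stops at the mixed training set rather than training a DNN. The names `SparseAutoEncoder` and `build_mixed_training_set`, and the hyperparameters `rho`, `beta`, and `lr`, are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class SparseAutoEncoder:
    """Single-hidden-layer sparse auto-encoder (a simplification of the
    deep version in the abstract): squared error + KL sparsity penalty."""

    def __init__(self, n_in, n_hidden, rho=0.05, beta=0.1, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)
        # rho: target mean activation, beta: penalty weight (assumed values)
        self.rho, self.beta, self.lr = rho, beta, lr

    def encode(self, X):
        return sigmoid(X @ self.W1 + self.b1)

    def reconstruct(self, X):
        return sigmoid(self.encode(X) @ self.W2 + self.b2)

    def fit(self, X, epochs=200):
        n = X.shape[0]
        for _ in range(epochs):
            H = self.encode(X)
            Y = sigmoid(H @ self.W2 + self.b2)
            rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)
            # Gradient of the mean squared error w.r.t. output pre-activation.
            d_out = (Y - X) * Y * (1 - Y) / n
            # Gradient of beta * sum_j KL(rho || rho_hat_j) w.r.t. H.
            d_kl = self.beta * (-self.rho / rho_hat
                                + (1 - self.rho) / (1 - rho_hat)) / n
            d_hid = (d_out @ self.W2.T + d_kl) * H * (1 - H)
            # Plain full-batch gradient descent step.
            self.W2 -= self.lr * (H.T @ d_out)
            self.b2 -= self.lr * d_out.sum(axis=0)
            self.W1 -= self.lr * (X.T @ d_hid)
            self.b1 -= self.lr * d_hid.sum(axis=0)
        return self


def build_mixed_training_set(ae, X_src, y_src, X_tgt_lab, y_tgt_lab):
    """Stages 2 and 3: pass both domains through the trained auto-encoder,
    then mix reconstructed source data with reconstructed labeled target data."""
    R_src = ae.reconstruct(X_src)
    R_tgt = ae.reconstruct(X_tgt_lab)
    return np.vstack([R_src, R_tgt]), np.concatenate([y_src, y_tgt_lab])


# Synthetic stand-ins for acoustic feature vectors (assumption: 8 features,
# 3 emotion classes; the real corpora and feature sets are not reproduced).
rng = np.random.default_rng(1)
X_src = rng.random((40, 8))          # "source corpus" features in [0, 1]
y_src = rng.integers(0, 3, 40)
X_tgt = rng.random((10, 8)) * 0.5    # small labeled "target corpus" subset
y_tgt = rng.integers(0, 3, 10)

ae = SparseAutoEncoder(n_in=8, n_hidden=4)
err_before = np.mean((ae.reconstruct(X_tgt) - X_tgt) ** 2)
ae.fit(X_tgt)                        # stage 1: train on target data only
err_after = np.mean((ae.reconstruct(X_tgt) - X_tgt) ** 2)

X_train, y_train = build_mixed_training_set(ae, X_src, y_src, X_tgt, y_tgt)
print(X_train.shape, y_train.shape)
```

Because the auto-encoder is fitted only on target-domain data, reconstructing the source data pulls it toward the target-domain structure before the two sets are mixed. `X_train` and `y_train` would then be passed to whichever classifier is used; the abstract reports results with a DNN.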




Biographies: Liang Zhenlin (1995-), male, graduate; Zhao Li (corresponding author), male, doctor, professor, zhaoli@seu.edu.cn.
Foundation items: The National Natural Science Foundation of China (No. 61871213, 61673108, 61571106), Six Talent Peaks Project in Jiangsu Province (No. 2016-DZXX-023).
Citation: Liang Zhenlin, Liang Ruiyu, Tang Manting, et al. Transfer learning with deep sparse auto-encoder for speech emotion recognition[J]. Journal of Southeast University (English Edition), 2019, 35(2): 160-167. DOI: 10.3969/j.issn.1003-7985.2019.02.003.
Last Update: 2019-06-20