[1] Akçay M B, Oğuz K. Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers[J]. Speech Communication, 2020, 116: 56-76. DOI: 10.1016/j.specom.2019.12.001.
[2] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. DOI: 10.1162/neco.1997.9.8.1735.
[3] Chung J, Gulcehre C, Cho K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[EB/OL]. (2014)[2020-08-01]. https://arxiv.org/abs/1412.3555.
[4] Mirsamadi S, Barsoum E, Zhang C. Automatic speech emotion recognition using recurrent neural networks with local attention[C]//2017 IEEE International Conference on Acoustics, Speech and Signal Processing. New Orleans, LA, USA, 2017: 2227-2231. DOI: 10.1109/ICASSP.2017.7952552.
[5] Greff K, Srivastava R K, Koutník J, et al. LSTM: A search space odyssey[J]. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28(10): 2222-2232. DOI: 10.1109/TNNLS.2016.2582924.
[6] Thakker U, Dasika G, Beu J, et al. Measuring scheduling efficiency of RNNs for NLP applications[EB/OL]. (2019)[2020-08-01]. https://arxiv.org/abs/1904.03302.
[7] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. Long Beach, CA, USA, 2017: 5998-6008.
[8] India M, Safari P, Hernando J. Self multi-head attention for speaker recognition[C]//Interspeech 2019. Graz, Austria, 2019: 4305-4309. DOI: 10.21437/interspeech.2019-2616.
[9] Busso C, Bulut M, Lee C C, et al. IEMOCAP: Interactive emotional dyadic motion capture database[J]. Language Resources and Evaluation, 2008, 42(4): 335-359. DOI: 10.1007/s10579-008-9076-6.
[10] Lian Z, Tao J H, Liu B, et al. Conversational emotion analysis via attention mechanisms[C]//Interspeech 2019. Graz, Austria, 2019: 1936-1940. DOI: 10.21437/interspeech.2019-1577.
[11] Li R N, Wu Z Y, Jia J, et al. Dilated residual network with multi-head self-attention for speech emotion recognition[C]//2019 IEEE International Conference on Acoustics, Speech and Signal Processing. Brighton, UK, 2019: 6675-6679. DOI: 10.1109/ICASSP.2019.8682154.
[12] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[EB/OL]. (2019)[2020-08-01]. https://arxiv.org/abs/1810.04805.
[13] Hendrycks D, Gimpel K. Gaussian error linear units (GELUs)[EB/OL]. (2016)[2020-08-01]. https://arxiv.org/abs/1606.08415.
[14] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA, 2016: 770-778. DOI: 10.1109/CVPR.2016.90.
[15] Burkhardt F, Paeschke A, Rolfes M, et al. A database of German emotional speech[C]//Interspeech 2005. Lisbon, Portugal, 2005: 1517-1520.
[16] Latif S, Qayyum A, Usman M, et al. Cross lingual speech emotion recognition: Urdu vs. western languages[C]//2018 International Conference on Frontiers of Information Technology (FIT). Islamabad, Pakistan, 2018: 88-93. DOI: 10.1109/FIT.2018.00023.
[17] Nediyanchath A, Paramasivam P, Yenigalla P. Multi-head attention for speech emotion recognition with auxiliary learning of gender recognition[C]//2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Barcelona, Spain, 2020: 7179-7183. DOI: 10.1109/ICASSP40776.2020.9054073.
[18] Chavan V M, Gohokar V V. Speech emotion recognition by using SVM-classifier[J]. International Journal of Engineering & Advanced Technology, 2012(5): 11-15.
[19] Xi Y X, Li P C, Song Y, et al. Speaker to emotion: Domain adaptation for speech emotion recognition with residual adapters[C]//2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. Lanzhou, China, 2019: 513-518. DOI: 10.1109/APSIPAASC47483.2019.9023339.