Information and Communication Engineering
Transformer-like model with linear attention for speech emotion recognition
Du Jing¹, Tang Manting², Zhao Li¹
¹School of Information Science and Engineering, Southeast University, Nanjing 210096, China
²School of Computational Engineering, Jinling Institute of Technology, Nanjing 211169, China
Key words: transformer; attention mechanism; speech emotion recognition; fast softmax
Motivated by the strong performance of the Transformer in sequence learning tasks such as natural language processing, an improved Transformer-like model suited to speech emotion recognition is proposed. To alleviate the prohibitive time consumption and memory footprint caused by the softmax inside the multihead attention unit of the Transformer, a new linear self-attention algorithm is proposed. The original exponential function is replaced by a Taylor series expansion. By exploiting the associative property of matrix products, the time and space complexity of the softmax operation with respect to the input length is reduced from O(N^2) to O(N), where N is the sequence length. Experimental results on emotional corpora in two languages show that the proposed linear attention algorithm achieves performance similar to that of the original scaled dot-product attention, while the training time and memory cost are reduced by half. Furthermore, the improved model is more robust on speech emotion recognition than the original Transformer.
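The complexity reduction described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a first-order Taylor expansion, exp(x) ≈ 1 + x (the abstract does not state the order), and assumes that queries and keys are L2-normalized so that every weight 1 + q·k is non-negative. The function names are hypothetical. The quadratic version builds the full N×N weight matrix; the linear version regroups the product as Q(KᵀV), so no N×N matrix is ever formed.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Assumption: unit-norm rows keep 1 + q.k within [0, 2].
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def taylor_attention_quadratic(Q, K, V):
    """Reference form: explicit N x N weights, O(N^2) time and memory."""
    Q, K = l2_normalize(Q), l2_normalize(K)
    W = 1.0 + Q @ K.T                      # (N, N), first-order Taylor of exp
    return (W @ V) / W.sum(axis=1, keepdims=True)

def taylor_attention_linear(Q, K, V):
    """Same result via associativity: Q @ (K.T @ V), O(N) in sequence length."""
    Q, K = l2_normalize(Q), l2_normalize(K)
    n = K.shape[0]
    kv = K.T @ V                           # (d, d_v), independent of N in size
    numer = V.sum(axis=0, keepdims=True) + Q @ kv   # (N, d_v)
    denom = n + Q @ K.sum(axis=0)                   # (N,)
    return numer / denom[:, None]

rng = np.random.default_rng(0)
Q = rng.standard_normal((50, 8))
K = rng.standard_normal((50, 8))
V = rng.standard_normal((50, 4))
same = np.allclose(taylor_attention_quadratic(Q, K, V),
                   taylor_attention_linear(Q, K, V))
```

Both functions compute the same normalized weighted sum of the value vectors; only the order of the matrix products differs, which is where the saving in N comes from.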


Biographies: Du Jing (born 1997), female, graduate; Zhao Li (corresponding author), male, doctor, professor, zhaoli@seu.edu.cn.
Foundation items: The National Key Research and Development Program of China (No. 2020YFC2004002, 2020YFC2004003), the National Natural Science Foundation of China (No. 61871213, 61673108, 61571106).
