[1] Zeng Z, Pantic M, Huang T S. Emotion recognition based on multimodal information [M]//Affective Information Processing. Springer, 2009: 241-265. DOI:10.1007/978-1-84800-306-4_14.
[2] Zeng Z, Pantic M, Roisman G I, et al. A survey of affect recognition methods: Audio, visual, and spontaneous expressions [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(1): 39-58. DOI:10.1109/TPAMI.2008.52.
[3] El Ayadi M, Kamel M S, Karray F. Survey on speech emotion recognition: Features, classification schemes, and databases [J]. Pattern Recognition, 2011, 44(3): 572-587. DOI:10.1016/j.patcog.2010.09.020.
[4] Chen L, Mao X, Xue Y, et al. Speech emotion recognition: Features and classification models [J]. Digital Signal Processing, 2012, 22(6): 1154-1160. DOI:10.1016/j.dsp.2012.05.007.
[5] Yan J, Wang X, Gu W, et al. Speech emotion recognition based on sparse representation [J]. Archives of Acoustics, 2013, 38(4): 465-470. DOI:10.2478/aoa-2013-0055.
[6] Zhao G, Pietikainen M. Dynamic texture recognition using local binary patterns with an application to facial expressions [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(6): 915-928. DOI:10.1109/TPAMI.2007.1110.
[7] Petridis S, Gunes H, Kaltwang S, et al. Static vs. dynamic modeling of human nonverbal behavior from multiple cues and modalities [C]//Proceedings of the 2009 International Conference on Multimodal Interfaces. Cambridge, MA, USA, 2009: 23-30. DOI:10.1145/1647314.1647321.
[8] Wang Y, Guan L, Venetsanopoulos A N. Audiovisual emotion recognition via cross-modal association in kernel space [C]//2011 IEEE International Conference on Multimedia and Expo (ICME). Barcelona, Spain, 2011: 6011949-1-6011949-6. DOI:10.1109/icme.2011.6011949.
[9] Metallinou A, Lee S, Narayanan S. Audio-visual emotion recognition using Gaussian mixture models for face and voice [C]//Tenth IEEE International Symposium on Multimedia. Berkeley, CA, USA, 2008: 250-257. DOI:10.1109/ism.2008.40.
[10] Li Y, Tao J, Schuller B, et al. MEC 2016: The multimodal emotion recognition challenge of CCPR 2016 [M]//Pattern Recognition. Springer, 2016: 667-678.
[11] Eyben F, Wöllmer M, Schuller B. openSMILE: The Munich versatile and fast open-source audio feature extractor [C]//Proceedings of the 18th ACM International Conference on Multimedia. Firenze, Italy, 2010: 1459-1462. DOI:10.1145/1873951.1874246.
[12] Schuller B, Steidl S, Batliner A, et al. The INTERSPEECH 2010 paralinguistic challenge [C]//11th Annual Conference of the International Speech Communication Association. Makuhari, Chiba, Japan, 2010: 2795-2798.
[13] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. DOI:10.1109/TPAMI.2016.2577031.
[14] He G, Chen J, Liu X, et al. The SYSU system for CCPR 2016 multimodal emotion recognition challenge [M]//Pattern Recognition. Springer, 2016: 707-720.
[15] Sun B, Xu Q, He J, et al. Audio-video based multimodal emotion recognition using SVMs and deep learning [M]//Communications in Computer and Information Science. Springer, 2016: 621-631. DOI:10.1007/978-981-10-3005-5_51.