[1] Kovashka A, Grauman K. Learning a hierarchy of discriminative space-time neighborhood features for human action recognition [C]//Proc of the International Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA, 2010:2046-2053.
[2] Lan T, Wang Y, Mori G. Discriminative figure-centric models for joint action localization and recognition [C]//Proc of the International Conference on Computer Vision. Barcelona, Spain, 2011:2003-2010.
[3] Hu Q, Qin L, Huang Q, et al. Action recognition using spatial-temporal context [C]//Proc of the 20th International Conference on Pattern Recognition. Istanbul, Turkey, 2010:1521-1524.
[4] Yuan C, Hu W, Wang H, et al. Spatio-temporal proximity distribution kernels for action recognition [C]//Proc of the International Conference on Acoustics, Speech and Signal Processing. Dallas, TX, USA, 2010:1126-1129.
[5] Song Y, Morency L P, Davis R. Action recognition by hierarchical sequence summarization [C]//Proc of the IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA, 2013:3562-3569.
[6] Jain M, Jegou H, Bouthemy P. Better exploiting motion for better action recognition [C]//Proc of the International Conference on Computer Vision and Pattern Recognition. Portland, OR, USA, 2013:2555-2562.
[7] Chakraborty B, Holte M B, Moeslund T B, et al. Selective spatio-temporal interest points [J]. Computer Vision and Image Understanding, 2012, 116(3):396-410.
[8] Zhou T C, Chen X, Wu Z Y. Action recognition using hierarchically tree-structured dictionary encoding [J]. Journal of Image and Graphics, 2014, 19(7):1054-1061. (in Chinese)
[9] Chao Y W, Yeh Y R, Chen Y W, et al. Locality-constrained group sparse representation for robust face recognition [C]//Proc of the International Conference on Image Processing. Brussels, Belgium, 2011:761-764.
[10] Xiao W H, Wang B, Liu Y, et al. Action recognition using feature position constrained linear coding [C]//Proc of the International Conference on Multimedia and Expo. San Jose, CA, USA, 2013:1-6.
[11] Vedaldi A, Zisserman A. Efficient additive kernels via explicit feature maps [C]//Proc of the International Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA, 2010:3539-3546.
[12] Chapelle O, Haffner P, Vapnik V N. Support vector machines for histogram-based image classification [J]. IEEE Transactions on Neural Networks, 1999, 10(5):1055-1064.
[13] Castrodad A, Sapiro G. Sparse modeling of human actions from motion imagery [J]. International Journal of Computer Vision, 2012, 100(1):1-15.
[14] Raptis M, Kokkinos I, Soatto S. Discovering discriminative action parts from mid-level video representations [C]//Proc of the International Conference on Computer Vision and Pattern Recognition. Providence, RI, USA, 2012:1242-1249.
[15] Sanin A, Sanderson C, Harandi M T, et al. Spatio-temporal covariance descriptors for action and gesture recognition [C]//Proc of the IEEE Workshop on Applications of Computer Vision. Clearwater Beach, FL, USA, 2013:103-110.