[1] Viola P, Jones M. Rapid object detection using a boosted cascade of simple features[C]//Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Kauai, HI, USA, 2001: 511-518. DOI:10.1109/CVPR.2001.990517.
[2] Viola P, Jones M J. Robust real-time face detection[J]. International Journal of Computer Vision, 2004, 57(2): 137-154. DOI:10.1023/b:visi.0000013087.49260.fb.
[3] Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, CA, USA, 2005: 886-893. DOI:10.1109/CVPR.2005.177.
[4] Felzenszwalb P F, Girshick R B, McAllester D. Cascade object detection with deformable part models[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA, 2010: 2241-2248. DOI:10.1109/CVPR.2010.5539906.
Fig. 9 Qualitative results for small objects on COCO test-dev set. (a) Input images; (b) Results of SSD300; (c) Results of NFPN-SSD300
[5] Uijlings J R R, Sande K, Gevers T, et al. Selective search for object recognition[J]. International Journal of Computer Vision, 2013, 104(2): 154-171. DOI:10.1007/s11263-013-0620-5.
[6] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. DOI:10.1109/TPAMI.2016.2577031.
[7] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA, 2014: 580-587. DOI:10.1109/CVPR.2014.81.
[8] He K M, Zhang X Y, Ren S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916. DOI:10.1109/TPAMI.2015.2389824.
[9] Everingham M, van Gool L, Williams C K I, et al. The pascal visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338. DOI:10.1007/s11263-009-0275-4.
[10] Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: common objects in context[M]//Computer Vision—ECCV 2014. Cham: Springer International Publishing, 2014: 740-755. DOI:10.1007/978-3-319-10602-1_48.
[11] Girshick R. Fast R-CNN[C]//2015 IEEE International Conference on Computer Vision. Santiago, Chile, 2015: 1440-1448. DOI:10.1109/ICCV.2015.169.
[12] Gidaris S, Komodakis N. Object detection via a multi-region and semantic segmentation-aware CNN model[C]//2015 IEEE International Conference on Computer Vision. Santiago, Chile, 2015: 1134-1142. DOI:10.1109/ICCV.2015.135.
[13] Dai J F, Li Y, He K M, et al. R-FCN: Object detection via region-based fully convolutional networks [J/OL]. arXiv preprint arXiv:1605.06409, 2016. https://arxiv.org/abs/1605.06409.
[14] He K M, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//2017 IEEE International Conference on Computer Vision. Venice, Italy, 2017: 2980-2988. DOI:10.1109/ICCV.2017.322.
[15] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA, 2017: 936-944. DOI:10.1109/CVPR.2017.106.
[16] Zhu Y S, Zhao C Y, Wang J Q, et al. CoupleNet: Coupling global structure with local parts for object detection[C]//2017 IEEE International Conference on Computer Vision. Venice, Italy, 2017: 4146-4154. DOI:10.1109/ICCV.2017.444.
[17] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[M]//Computer Vision—ECCV 2016. Cham: Springer International Publishing, 2016: 21-37. DOI:10.1007/978-3-319-46448-0_2.
[18] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA, 2016: 779-788. DOI:10.1109/CVPR.2016.91.
[19] Fu C Y, Liu W, Ranga A, et al. DSSD: Deconvolutional single shot detector [J/OL]. arXiv preprint arXiv:1701.06659, 2017. https://arxiv.org/abs/1701.06659.
[20] Jeong J, Park H, Kwak N. Enhancement of SSD by concatenating feature maps for object detection[C]//British Machine Vision Conference. London, UK, 2017. DOI:10.5244/C.31.76.
[21] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//2017 IEEE International Conference on Computer Vision. Venice, Italy, 2017: 2999-3007. DOI:10.1109/ICCV.2017.324.
[22] Redmon J, Farhadi A. YOLO9000: Better, faster, stronger[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA, 2017: 6517-6525. DOI:10.1109/CVPR.2017.690.
[23] Shen Z Q, Liu Z, Li J G, et al. DSOD: Learning deeply supervised object detectors from scratch[C]//2017 IEEE International Conference on Computer Vision. Venice, Italy, 2017: 1937-1945. DOI:10.1109/ICCV.2017.212.
[24] Redmon J, Farhadi A. YOLOv3: An incremental improvement [J/OL]. arXiv preprint arXiv:1804.02767, 2018. https://arxiv.org/abs/1804.02767.
[25] Zhang S F, Wen L Y, Bian X, et al. Single-shot refinement neural network for object detection[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA, 2018: 4203-4212. DOI:10.1109/CVPR.2018.00442.
[26] Huang G, Liu Z, van der Maaten L, et al. Densely connected convolutional networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA, 2017: 2261-2269. DOI:10.1109/CVPR.2017.243.
[27] Tian Z, Shen C H, Chen H, et al. FCOS: fully convolutional one-stage object detection[C]//2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea, 2019: 9626-9635. DOI:10.1109/ICCV.2019.00972.
[28] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition [J/OL]. arXiv preprint arXiv:1409.1556, 2014. https://arxiv.org/abs/1409.1556.
[29] Liu S, Qi L, Qin H F, et al. Path aggregation network for instance segmentation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA, 2018: 8759-8768. DOI:10.1109/CVPR.2018.00913.
[30] Ghiasi G, Lin T Y, Le Q V. NAS-FPN: Learning scalable feature pyramid architecture for object detection[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA, 2019: 7029-7038. DOI:10.1109/CVPR.2019.00720.
[31] Jia Y Q, Shelhamer E, Donahue J, et al. Caffe: convolutional architecture for fast feature embedding[C]//Proceedings of the ACM International Conference on Multimedia. Orlando, FL, USA, 2014: 675-678. DOI:10.1145/2647868.2654889.
[32] Liu W, Rabinovich A, Berg A C. ParseNet: Looking wider to see better [J/OL]. arXiv preprint arXiv:1506.04579, 2015. https://arxiv.org/abs/1506.04579.
[33] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA, 2016: 770-778. DOI:10.1109/CVPR.2016.90.
[34] Li Z, Zhou F. FSSD: Feature fusion single shot multibox detector [J/OL]. arXiv preprint arXiv:1712.00960, 2017. https://arxiv.org/abs/1712.00960.
[35] Bell S, Zitnick C L, Bala K, et al. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA, 2016: 2874-2883. DOI:10.1109/CVPR.2016.314.
[36] Chen K, Wang J, Pang J, et al. MMDetection: Open MMLab detection toolbox and benchmark [J/OL]. arXiv preprint arXiv:1906.07155, 2019. https://arxiv.org/abs/1906.07155v1.