[1] Mo Lingfei, Hu Shuming. Neighborhood fusion-based hierarchical parallel feature pyramid network for object detection [J]. Journal of Southeast University (English Edition), 2020, 36(3): 252-263. [doi:10.3969/j.issn.1003-7985.2020.03.002]

Neighborhood fusion-based hierarchical parallel feature pyramid network for object detection
用于目标检测的邻域融合与分层并行特征金字塔网络

Journal of Southeast University (English Edition)[ISSN:1003-7985/CN:32-1325/N]

Volume:
36
Issue:
2020 03
Page:
252-263
Research Field:
Computer Science and Engineering
Publishing date:
2020-09-20

Info

Title:
Neighborhood fusion-based hierarchical parallel feature pyramid network for object detection
用于目标检测的邻域融合与分层并行特征金字塔网络
Author(s):
Mo Lingfei; Hu Shuming
School of Instrument Science and Engineering, Southeast University, Nanjing 210096, China
莫凌飞 胡书铭
东南大学仪器科学与工程学院, 南京 210096
Keywords:
computer vision; deep convolutional neural network; object detection; hierarchical parallel feature pyramid network; multi-scale feature fusion
计算机视觉 深度卷积神经网络 目标检测 分层并行 特征金字塔网络 多尺度特征融合
PACS:
TP391.4
DOI:
10.3969/j.issn.1003-7985.2020.03.002
Abstract:
In order to improve the detection accuracy of small objects, a neighborhood fusion-based hierarchical parallel feature pyramid network (NFPN) is proposed. Unlike the layer-by-layer structure adopted in the feature pyramid network (FPN) and the deconvolutional single shot detector (DSSD), where the bottom layer of the feature pyramid relies on the top layer, NFPN builds the feature pyramid with no connections between the upper and lower layers; that is, it only fuses shallow features on similar scales. NFPN is highly portable and can be embedded in many models to further boost performance. Extensive experiments on the PASCAL VOC 2007, 2012, and COCO datasets demonstrate that the NFPN-based SSD, without intricate tricks, can exceed the DSSD model in both detection accuracy and inference speed, especially for small objects, e.g., 4% to 5% higher mAP (mean average precision) than SSD and 2% to 3% higher mAP than DSSD. On the VOC 2007 test set, the NFPN-based SSD with 300×300 input reaches 79.4% mAP at 34.6 frame/s, and the mAP rises to 82.9% with the multi-scale testing strategy.
为了提升对小目标的检测精度, 提出了一种基于邻域融合的分层并行特征金字塔网络(NFPN).与特征金字塔网络(FPN)和反卷积单次多框检测器(DSSD)中采用的逐层递进融合方式(特征金字塔网络的底层特征依赖于顶层特征)不同, NFPN仅对具有相似尺度的浅层特征进行融合, 所构建的特征金字塔网络上下层之间没有依赖关系.NFPN具有高度的可移植性, 可嵌入到许多检测模型中来进一步提升性能.在PASCAL VOC 2007、2012和COCO数据集上进行的大量实验表明, 基于NFPN的SSD模型在检测精度和推理速度方面均优于DSSD模型, 尤其是对于小目标而言, NFPN-SSD300精度比SSD300高4%~5%, 比DSSD321高2%~3%.在VOC 2007测试集上, 输入分辨率为300×300的基于NFPN的SSD模型可以实现79.4%的检测精度和34.6 frame/s的推理速度, 在使用多尺度测试方法后, 其精度可提升至82.9%.
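The neighborhood-fusion idea described in the abstract (each pyramid level fused only with features of similar, neighboring scales, with no top-down chain) can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's actual implementation, which uses learned convolutional fusion; the names `upsample2x`, `downsample2x`, and `nfpn_fuse` are hypothetical.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of an (H, W, C) feature map.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def downsample2x(x):
    # 2x2 average pooling of an (H, W, C) feature map.
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def nfpn_fuse(feats):
    """Fuse each level only with its immediate scale neighbors.

    Unlike FPN/DSSD, no level depends on the top of the pyramid:
    level i sees only levels i-1 (downsampled) and i+1 (upsampled).
    """
    fused = []
    for i, f in enumerate(feats):
        parts = [f]
        if i > 0:
            parts.append(downsample2x(feats[i - 1]))
        if i < len(feats) - 1:
            parts.append(upsample2x(feats[i + 1]))
        fused.append(np.mean(parts, axis=0))  # simple average in place of learned fusion
    return fused
```

Because every output level is computed independently from its neighbors, the levels can be fused in parallel, which is the "hierarchical parallel" property the abstract contrasts with the sequential top-down pathway of FPN and DSSD.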

References:

[1] Viola P, Jones M. Rapid object detection using a boosted cascade of simple features[C]//Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Kauai, HI, USA, 2001: 511-518. DOI:10.1109/CVPR.2001.990517.
[2] Viola P, Jones M J. Robust real-time face detection[J].International Journal of Computer Vision, 2004, 57(2): 137-154. DOI:10.1023/b:visi.0000013087.49260.fb.
[3] Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, CA, USA, 2005: 886-893. DOI:10.1109/CVPR.2005.177.
[4] Felzenszwalb P F, Girshick R B, McAllester D. Cascade object detection with deformable part models[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA, 2010: 2241-2248. DOI:10.1109/CVPR.2010.5539906.
[Fig.9 Qualitative results for small objects on COCO test-dev set: (a) input images; (b) results of SSD300; (c) results of NFPN-SSD300]
[5] Uijlings J R R, Sande K, Gevers T, et al. Selective search for object recognition[J]. International Journal of Computer Vision, 2013, 104(2): 154-171. DOI:10.1007/s11263-013-0620-5.
[6] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J].IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149. DOI:10.1109/TPAMI.2016.2577031.
[7] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA, 2014: 580-587. DOI:10.1109/CVPR.2014.81.
[8] He K M, Zhang X Y, Ren S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J].IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916. DOI:10.1109/TPAMI.2015.2389824.
[9] Everingham M, van Gool L, Williams C K I, et al. The pascal visual object classes(VOC)challenge[J].International Journal of Computer Vision, 2010, 88(2): 303-338. DOI:10.1007/s11263-009-0275-4.
[10] Lin T Y, Maire M, Belongie S, et al. Microsoft COCO: common objects in context[M]//Computer Vision—ECCV 2014. Cham: Springer International Publishing, 2014: 740-755. DOI:10.1007/978-3-319-10602-1_48.
[11] Girshick R. Fast R-CNN[C]//2015 IEEE International Conference on Computer Vision.Santiago, Chile, 2015: 1440-1448. DOI:10.1109/ICCV.2015.169.
[12] Gidaris S, Komodakis N. Object detection via a multi-region and semantic segmentation-aware CNN model[C]//2015 IEEE International Conference on Computer Vision. Santiago, Chile, 2015: 1134-1142. DOI:10.1109/ICCV.2015.135.
[13] Dai J F, Li Y, He K M, et al. R-FCN: Object detection via region-based fully convolutional networks [J/OL]. arXiv preprint arXiv:1605.06409, 2016. https://arxiv.org/abs/1605.06409.
[14] He K M, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//2017 IEEE International Conference on Computer Vision. Venice, Italy, 2017: 2980-2988. DOI:10.1109/ICCV.2017.322.
[15] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA, 2017: 936-944. DOI:10.1109/CVPR.2017.106.
[16] Zhu Y S, Zhao C Y, Wang J Q, et al. CoupleNet: Coupling global structure with local parts for object detection[C]//2017 IEEE International Conference on Computer Vision. Venice, Italy, 2017: 4146-4154. DOI:10.1109/ICCV.2017.444.
[17] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[M]//Computer Vision—ECCV 2016. Cham: Springer International Publishing, 2016: 21-37. DOI:10.1007/978-3-319-46448-0_2.
[18] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA, 2016: 779-788. DOI:10.1109/CVPR.2016.91.
[19] Fu C Y, Liu W, Ranga A, et al. DSSD: Deconvolutional single shot detector [J/OL]. arXiv preprint arXiv:1701.06659, 2017. https://arxiv.org/abs/1701.06659.
[20] Jeong J, Park H, Kwak N. Enhancement of SSD by concatenating feature maps for object detection[C]//British Machine Vision Conference. London, UK, 2017. DOI:10.5244/C.31.76.
[21] Lin T Y, Goyal P, Girshick R, et al. Focal loss for dense object detection[C]//2017 IEEE International Conference on Computer Vision. Venice, Italy, 2017: 2999-3007. DOI:10.1109/ICCV.2017.324.
[22] Redmon J, Farhadi A. YOLO9000: Better, faster, stronger[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA, 2017: 6517-6525. DOI:10.1109/CVPR.2017.690.
[23] Shen Z Q, Liu Z, Li J G, et al. DSOD: Learning deeply supervised object detectors from scratch[C]//2017 IEEE International Conference on Computer Vision. Venice, Italy, 2017: 1937-1945. DOI:10.1109/ICCV.2017.212.
[24] Redmon J, Farhadi A. YOLOv3: An incremental improvement [J/OL]. arXiv preprint arXiv:1804.02767, 2018. https://arxiv.org/abs/1804.02767.
[25] Zhang S F, Wen L Y, Bian X, et al. Single-shot refinement neural network for object detection[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA, 2018: 4203-4212. DOI:10.1109/CVPR.2018.00442.
[26] Huang G, Liu Z, van der Maaten L, et al. Densely connected convolutional networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA, 2017: 2261-2269. DOI:10.1109/CVPR.2017.243.
[27] Tian Z, Shen C H, Chen H, et al. FCOS: fully convolutional one-stage object detection[C]//2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea, 2019: 9626-9635. DOI:10.1109/ICCV.2019.00972.
[28] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition [J/OL]. arXiv preprint arXiv:1409.1556, 2014. https://arxiv.org/abs/1409.1556.
[29] Liu S, Qi L, Qin H F, et al. Path aggregation network for instance segmentation[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA, 2018: 8759-8768. DOI:10.1109/CVPR.2018.00913.
[30] Ghiasi G, Lin T Y, Le Q V. NAS-FPN: Learning scalable feature pyramid architecture for object detection[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA, 2019: 7029-7038. DOI:10.1109/CVPR.2019.00720.
[31] Jia Y Q, Shelhamer E, Donahue J, et al. Caffe: convolutional architecture for fast feature embedding[C]//Proceedings of the ACM International Conference on Multimedia. Orlando, FL, USA, 2014: 675-678. DOI:10.1145/2647868.2654889.
[32] Liu W, Rabinovich A, Berg A C. ParseNet: Looking wider to see better [J/OL]. arXiv preprint arXiv:1506.04579, 2015. https://arxiv.org/abs/1506.04579.
[33] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA, 2016: 770-778. DOI:10.1109/CVPR.2016.90.
[34] Li Z, Zhou F. FSSD: Feature fusion single shot multibox detector [J/OL]. arXiv preprint arXiv:1712.00960, 2017. https://arxiv.org/abs/1712.00960.
[35] Bell S, Zitnick C L, Bala K, et al. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA, 2016: 2874-2883. DOI:10.1109/CVPR.2016.314.
[36] Chen K, Wang J, Pang J, et al. MMDetection: Open MMLab detection toolbox and benchmark [J/OL]. arXiv preprint arXiv:1906.07155, 2019. https://arxiv.org/abs/1906.07155v1.

Memo

Memo:
Biography: Mo Lingfei (1981—), male, Ph.D., associate professor, lfmo@seu.edu.cn.
Foundation item: The National Natural Science Foundation of China (No. 61603091).
Citation: Mo Lingfei, Hu Shuming. Neighborhood fusion-based hierarchical parallel feature pyramid network for object detection[J]. Journal of Southeast University (English Edition), 2020, 36(3): 252-263. DOI:10.3969/j.issn.1003-7985.2020.03.002.
Last Update: 2020-09-20