[1] Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision[C]//IEEE Computer Vision and Pattern Recognition. Las Vegas, CA, USA, 2016:2818-2826. DOI:10.1109/CVPR.2016.308.
[2] Wang E, Davis J J, Zhao R, et al. Deep neural network approximation for custom hardware: Where we’ve been, where we’re going[J]. ACM Computing Surveys, 2019, 52(2):1-39. DOI:10.1145/3309551.
[3] Wang E, Davis J J, Cheung P, et al. LUTNet: Rethinking inference in FPGA soft logic[C]//IEEE Annual International Symposium on Field-Programmable Custom Computing Machines. San Diego, CA, USA, 2019:26-34. DOI:10.1109/FCCM.2019.00014.
[4] Hardieck M, Kumm M, Möller K, et al. Reconfigurable convolutional kernels for neural networks on FPGAs[C]//ACM International Symposium on Field-Programmable Gate Arrays. San Diego, CA, USA, 2019:43-52. DOI:10.1145/3289602.3293905.
[5] Cao Y, Wang C, Tang Y. Explore efficient LUT-based architecture for quantized convolutional neural networks on FPGA[C]//IEEE Annual International Symposium on Field-Programmable Custom Computing Machines. Fayetteville, AR, USA, 2020:232-232. DOI:10.1109/FCCM48280.2020.00065.
[6] Hormigo J, Caffarena G, Oliver J P, et al. Self-reconfigurable constant multiplier for FPGA[J]. ACM Transactions on Reconfigurable Technology and Systems, 2013, 6(3):1-17. DOI:10.1145/2490830.
[7] Liang Y, Lu L, Xiao Q, et al. Evaluating fast algorithms for convolutional neural networks on FPGAs[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2019:1-10. DOI:10.1109/TCAD.2019.2897701.
[8] Xiao Q, Liang Y, Lu L, et al. Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs[C]//ACM Annual Design Automation Conference. Austin, TX, USA, 2017:1-6. DOI:10.1145/3061639.3062244.
[9] Yu J, Hu Y, Ning X, et al. Instruction driven cross-layer CNN accelerator with Winograd transformation on FPGA[C]//IEEE International Conference on Field Programmable Technology. Melbourne, Australia, 2017:227-230. DOI:10.1109/FPT.2017.8280147.
[10] Lu L, Liang Y. SpWA: An efficient sparse Winograd convolutional neural networks accelerator on FPGAs[C]//IEEE Design Automation Conference. San Francisco, CA, USA, 2018:1-6. DOI:10.1109/DAC.2018.8465842.
[11] Yao C, He J, Zhang X, et al. Cloud-DNN: An open framework for mapping DNN models to cloud FPGAs[C]//ACM International Symposium on Field-Programmable Gate Arrays. San Diego, CA, USA, 2019:73-82. DOI:10.1145/3289602.3293915.
[12] Yepez J, Ko S B. Stride 2 1-D, 2-D, and 3-D Winograd for convolutional neural networks[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2020, 28(99):853-863. DOI:10.1109/TVLSI.2019.2961602.
[13] Deng H, Wang J, Ye H, et al. 3D-VNPU: A flexible accelerator for 2D/3D CNNs on FPGA[C]//IEEE Annual International Symposium on Field-Programmable Custom Computing Machines. Orlando, FL, USA, 2021:181-185. DOI:10.1109/FCCM51124.2021.00029.
[14] Niu Y, Kannan R, Srivastava A, et al. Reuse kernels or activations: A flexible dataflow for low-latency spectral CNN acceleration[C]//ACM International Symposium on Field-Programmable Gate Arrays. San Diego, CA, USA, 2020:266-276. DOI:10.1145/3373087.3375302.
[15] Zhang X, Wang J, Chao Z, et al. DNNBuilder: An automated tool for building high-performance DNN hardware accelerators for FPGAs[C]//IEEE International Conference on Computer Aided Design. San Diego, CA, USA, 2018:1-8. DOI:10.1145/3240765.3240801.
[16] Lian X, Liu Z, Song Z, et al. High-performance FPGA-based CNN accelerator with block-floating-point arithmetic[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2019, 27(99):1874-1885. DOI:10.1109/TVLSI.2019.2913958.