
WinoNet: Reconfigurable look-up table-based Winograd accelerator for arbitrary precision convolutional neural network inference

Journal of Southeast University (English Edition) [ISSN: 1003-7985 / CN: 32-1325/N]

Volume:
38
Issue:
2022, 4
Page:
332-339
Research Field:
Circuit and System
Publishing date:
2022-12-20

Info

Title:
WinoNet: Reconfigurable look-up table-based Winograd accelerator for arbitrary precision convolutional neural network inference
Author(s):
Wang Chengcheng, Li He, Cao Yanpeng, Song Changjun, Yu Feng, Tang Yongming
School of Electronic Science and Engineering, Southeast University, Nanjing 210096, China
Keywords:
quantized neural networks; look-up table (LUT)-based multiplier; Winograd algorithm; arbitrary precision
PACS:
TN492
DOI:
10.3969/j.issn.1003-7985.2022.04.002
Abstract:
To solve the hardware deployment problem caused by the demanding computational complexity of convolutional layers and the limited hardware resources available for network inference, a look-up table (LUT)-based convolution architecture built on a field-programmable gate array using integer multipliers and addition trees is used. With the help of the Winograd algorithm, convolution and multiplication are optimized to reduce the computational complexity. The LUT-based operator is further optimized to construct a processing unit (PE). Simultaneously, optimized storage streams improve memory access efficiency and relieve bandwidth constraints, and the data toggle rate is reduced to lower power consumption. The experimental results show that building basic processing units with the Winograd algorithm significantly reduces the number of multipliers and accelerates hardware deployment, while the time-division multiplexing of processing units improves resource utilization. Under the experimental conditions, compared with the traditional convolution method, the architecture reduces computing resource usage by 2.25 times and improves the peak throughput by 19.3 times. The LUT-based Winograd accelerator can therefore effectively solve the deployment problem caused by limited hardware resources.
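For readers unfamiliar with the Winograd minimal-filtering idea the abstract relies on, the sketch below is a minimal NumPy illustration of the classic F(2, 3) case, not the paper's hardware implementation: two outputs of a 3-tap convolution are produced with 4 element-wise multiplications instead of 6, which is the multiplier reduction a Winograd-based PE exploits. The matrices BT, G, and AT are the standard F(2, 3) transforms; tile size, quantization, and the LUT-based multiplier mapping described in the paper are deliberately left out.

```python
# Minimal sketch of the Winograd F(2, 3) minimal-filtering algorithm:
# two 1-D convolution outputs from a 3-tap filter using 4 multiplications.
import numpy as np

# Standard F(2, 3) input, filter, and output transform matrices.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f2_3(d, g):
    """Return two outputs of the 3-tap correlation of a length-4 tile d with filter g."""
    U = G @ g      # transformed filter (can be precomputed once per layer)
    V = BT @ d     # transformed input tile
    M = U * V      # 4 element-wise multiplications (direct convolution needs 6)
    return AT @ M  # inverse transform back to 2 outputs

d = np.array([1.0, 2.0, 3.0, 4.0])   # one input tile
g = np.array([0.5, 1.0, -1.0])       # one 3-tap filter
print(winograd_f2_3(d, g))                       # Winograd result
print([np.dot(d[i:i + 3], g) for i in range(2)]) # direct reference, same values
```

The 2-D version used in convolution accelerators nests the same transforms over both spatial dimensions, which is where the larger multiplier savings reported in the paper come from.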

References:

[1] Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision[C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA, 2016: 2818-2826. DOI: 10.1109/CVPR.2016.308.
[2] Wang E, Davis J J, Zhao R, et al. Deep neural network approximation for custom hardware: Where we've been, where we're going[J]. ACM Computing Surveys, 2019, 52(2): 1-39. DOI: 10.1145/3309551.
[3] Wang E, Davis J J, Cheung P, et al. LUTNet: Rethinking inference in FPGA soft logic[C]//IEEE Annual International Symposium on Field-Programmable Custom Computing Machines. San Diego, CA, USA, 2019: 26-34. DOI: 10.1109/FCCM.2019.00014.
[4] Hardieck M, Kumm M, Möller K, et al. Reconfigurable convolutional kernels for neural networks on FPGAs[C]//ACM International Symposium on Field-Programmable Gate Arrays. San Diego, CA, USA, 2019: 43-52. DOI: 10.1145/3289602.3293905.
[5] Cao Y, Wang C, Tang Y. Explore efficient LUT-based architecture for quantized convolutional neural networks on FPGA[C]//IEEE Annual International Symposium on Field-Programmable Custom Computing Machines. Fayetteville, AR, USA, 2020: 232-232. DOI: 10.1109/FCCM48280.2020.00065.
[6] Hormigo J, Caffarena G, Oliver J P, et al. Self-reconfigurable constant multiplier for FPGA[J]. ACM Transactions on Reconfigurable Technology and Systems, 2013, 6(3): 1-17. DOI: 10.1145/2490830.
[7] Liang Y, Lu L, Xiao Q, et al. Evaluating fast algorithms for convolutional neural networks on FPGAs[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2019: 1-10. DOI: 10.1109/TCAD.2019.2897701.
[8] Xiao Q, Liang Y, Lu L, et al. Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs[C]//ACM Annual Design Automation Conference. Austin, TX, USA, 2017: 1-6. DOI: 10.1145/3061639.3062244.
[9] Yu J, Hu Y, Ning X, et al. Instruction driven cross-layer CNN accelerator with Winograd transformation on FPGA[C]//IEEE International Conference on Field Programmable Technology. Melbourne, Australia, 2017: 227-230. DOI: 10.1109/FPT.2017.8280147.
[10] Lu L, Liang Y. SpWA: An efficient sparse Winograd convolutional neural networks accelerator on FPGAs[C]//IEEE Design Automation Conference. San Francisco, CA, USA, 2018: 1-6. DOI: 10.1109/DAC.2018.8465842.
[11] Yao C, He J, Zhang X, et al. Cloud-DNN: An open framework for mapping DNN models to cloud FPGAs[C]//ACM International Symposium on Field-Programmable Gate Arrays. San Diego, CA, USA, 2019: 73-82. DOI: 10.1145/3289602.3293915.
[12] Yepez J, Ko S B. Stride 2 1-D, 2-D, and 3-D Winograd for convolutional neural networks[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2020, 28(99): 853-863. DOI: 10.1109/TVLSI.2019.2961602.
[13] Deng H, Wang J, Ye H, et al. 3D-VNPU: A flexible accelerator for 2D/3D CNNs on FPGA[C]//IEEE Annual International Symposium on Field-Programmable Custom Computing Machines. Orlando, FL, USA, 2021: 181-185. DOI: 10.1109/FCCM51124.2021.00029.
[14] Niu Y, Kannan R, Srivastava A, et al. Reuse kernels or activations: A flexible dataflow for low-latency spectral CNN acceleration[C]//ACM International Symposium on Field-Programmable Gate Arrays. San Diego, CA, USA, 2020: 266-276. DOI: 10.1145/3373087.3375302.
[15] Zhang X, Wang J, Chao Z, et al. DNNBuilder: An automated tool for building high-performance DNN hardware accelerators for FPGAs[C]//IEEE International Conference on Computer Aided Design. San Diego, CA, USA, 2018: 1-8. DOI: 10.1145/3240765.3240801.
[16] Lian X, Liu Z, Song Z, et al. High-performance FPGA-based CNN accelerator with block-floating-point arithmetic[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2019, 27(99): 1874-1885. DOI: 10.1109/TVLSI.2019.2913958.

Memo

Memo:
Biographies: Wang Chengcheng (1999—), male, graduate; Tang Yongming (corresponding author), male, doctor, professor, tym@seu.edu.cn
Foundation item: The Academic Colleges and Universities Innovation Program 2.0 (No. BP0719013).
Citation: Wang Chengcheng, Li He, Cao Yanpeng, et al. WinoNet: Reconfigurable look-up table-based Winograd accelerator for arbitrary precision convolutional neural network inference[J]. Journal of Southeast University (English Edition), 2022, 38(4): 332-339. DOI: 10.3969/j.issn.1003-7985.2022.04.002.
Last Update: 2022-12-20