
Design of a universal convolutional layer IP core based on FPGA

2021-07-25 03:04

AN Guochen, YUAN Hongtuo, HAN Xiulu, WANG Xiaojun, HOU Yujia

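Abstract: To address the insufficient computing speed and poor portability that convolutional neural networks encounter during miniaturization and parallelization, a design method for a parameterized, high-speed, universal convolutional layer IP core written in VHDL is proposed, based on the characteristics of convolutional neural networks and FPGA devices. Following the way a convolutional layer computes, the convolution core is designed as a fully parallel, pipelined computing module. A FIFO connected to each row of the convolution core improves how data flows in and reduces address-jump operations, and an added control core adjusts the layer parameters to the image and convolution window sizes so that different convolutional layers can be generated. Finally, the convolutional layer is combined with the AXIS protocol and packaged as an IP core. The results show that, at a working frequency of 50 MHz, convolving a 100×100 image with a 2×2 convolution kernel uses no more than 1% of each resource and takes 204 μs, and the calculation speed can theoretically reach a maximum of 5 MF/s. The design therefore increases the portability of the convolution module while maintaining computing speed, and provides a feasible method for implementing convolutional neural networks on miniaturized devices.

Keywords: integrated circuit technology; convolutional neural network; FPGA; convolutional layer; parameterized design

CLC number: TP274; TP391  Document code: A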

doi:10.7535/hbkd.2021yx03005

Design of universal convolutional layer IP core based on FPGA

AN Guochen¹, YUAN Hongtuo¹, HAN Xiulu¹, WANG Xiaojun¹, HOU Yujia²

(1. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China; 2. Shijiazhuang Foreign Education Group, Shijiazhuang, Hebei 050022, China)

Abstract: Aiming at the insufficient computing speed and poor portability encountered in the miniaturization and parallelization of convolutional neural networks, this paper proposes a design for a high-speed universal convolutional layer IP core, parameterized in VHDL, based on the characteristics of convolutional neural networks and FPGA devices. Following the computation pattern of the convolutional layer, the convolution core is designed as a fully parallel, pipelined calculation module. Connecting a FIFO to each row of the convolution core improves the data inflow and reduces address-jump operations, and a control core is added so that the layer parameters can be adjusted to the image and convolution window sizes to generate different convolutional layers. Finally, the convolutional layer is combined with the AXIS protocol and encapsulated into an IP core. At a working frequency of 50 MHz, a 100×100 image is convolved with a 2×2 convolution kernel: the utilization of each resource is less than 1%, the time taken is 204 μs, and the theoretical calculation speed can reach a maximum of 5 MF/s. The IP core structure not only increases the portability of the convolution module but also maintains computing speed, providing a feasible way to implement convolutional neural networks on miniaturized devices.

Keywords: integrated circuit technology; convolutional neural network; FPGA; convolution layer; design parameterization

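With the development of deep learning research, convolutional neural networks (CNNs) play an important role in speech recognition[1], image understanding[2], object tracking[3-4] and other fields. CNNs need further study of their principles and structure to reach better performance and transferability, and existing networks also need to be adapted better to real-time, miniaturized settings[5]. A growing number of neural network acceleration structures have been proposed, such as binarizing the convolutional layer[6] to reduce the amount of computation on the FPGA, and a dedicated acceleration SoC[7] that brings peak CNN computing performance to 42.13 GFLOPS at 100 MHz. On this basis, a parametrically designed universal convolutional layer IP core acceleration structure is proposed, which has a wider range of application.

The proposed universal convolutional layer IP core belongs to the generalization of accelerator design, an important direction in the research and development of CNN accelerators[8]. Using parameterized design in VHDL[9], the design restructures an existing convolutional layer acceleration structure[10-11], turning a fixed-length convolution accelerator into a universal convolution core that can generate convolution windows of different sizes. For portability, the IP core is configured through parameters: by presetting the window size and the size of the signal or image to be convolved, one-dimensional and two-dimensional convolutional layers of different sizes can be generated to meet different convolution needs. For speed, the data inflow is optimized to avoid address jumps, and parallelism and pipelining are introduced so that, as long as data transfer is uninterrupted, one valid result is output every clock cycle. The design supports the AXI4-Stream protocol, so hardware developers can instantiate the IP core to build a convolutional layer quickly without spending effort on the internal convolution structure.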
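As a rough plausibility check of the one-result-per-clock claim against the figures quoted in the abstract (50 MHz, a 100×100 image, 204 μs), the following Python sketch counts streaming cycles. It assumes one input pixel per clock and treats whatever remains as start-up and protocol overhead; that split is an assumption of this sketch, not a figure reported in the paper, and the function name is hypothetical.

```python
def streaming_time_ns(width, height, clock_mhz):
    """Time to stream every pixel once at one pixel per clock, in nanoseconds."""
    pixels = width * height
    # 1000 / clock_mhz is the clock period in ns; integer math is exact for these values
    return pixels * 1000 // clock_mhz


ideal_ns = streaming_time_ns(100, 100, 50)  # pure streaming time
reported_ns = 204_000                       # 204 us, as quoted in the abstract
overhead_ns = reported_ns - ideal_ns        # assumed: pipeline fill plus handshaking
print(ideal_ns, overhead_ns)
```

Under these assumptions the pixel stream itself accounts for 200 μs of the reported 204 μs.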

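1 Principle of the convolutional layer

Convolutional layers may be one-dimensional, two-dimensional or multi-dimensional, and layers of the same kind also differ by convolution window size. This section mainly introduces one-dimensional and two-dimensional convolutional layers.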

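1.1 One-dimensional signal convolution

One-dimensional convolution is usually used to process time series. One-dimensional convolutional neural networks (1DCNN) are mostly used where time series must be processed, such as industrial fault diagnosis[12] and medical diagnosis[13].

Fig. 1 is a schematic diagram of one-dimensional convolution. A k-point time series is convolved with an n-point convolution kernel: the time series slides from left to right, and at each step the corresponding samples are multiplied and summed to output one convolution result, which serves as the input of the next convolutional or pooling layer. One-dimensional convolution is also equivalent to the direct-form structure of an FIR filter.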

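The one-dimensional convolution is given by Eq. (1):

$$y(k)=\sum_{i=0}^{n-1}x(i)\,h(k-i) \qquad (1)$$

where x(i) is the input data, of length k; h(n) is the convolution kernel, of length n; and y(k) is the output obtained by convolving the input data with the kernel. The output has length k, or k-n+1 when the borders are not zero-padded.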
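The sliding multiply-accumulate described above can be sketched in a few lines of Python. This is a minimal software illustration of the unpadded case, not the hardware implementation; the function name `conv1d_valid` and the sample data are hypothetical.

```python
def conv1d_valid(x, h):
    """Convolve a k-point sequence x with an n-point kernel h, without zero padding."""
    k, n = len(x), len(h)
    if n > k:
        raise ValueError("kernel must not be longer than the input")
    out = []
    for start in range(k - n + 1):
        acc = 0
        for i in range(n):
            # the kernel is flipped, as in true convolution (FIR direct form)
            acc += x[start + i] * h[n - 1 - i]
        out.append(acc)
    return out


y = conv1d_valid([1, 2, 3, 4, 5], [1, 0, -1])
print(y)  # [2, 2, 2]
```

With k = 5 and n = 3 the output has k-n+1 = 3 points, matching the unpadded length stated above.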

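1.2 Two-dimensional image convolution

Two-dimensional convolutional neural networks (2DCNN) are mostly used in computer vision[14-15]. The convolution kernel imitates a neuron of the human brain: it perceives the input of the convolutional layer locally, mimicking how a neuron responds to a bioelectric signal, and outputs the result of that perception.

Fig. 2 is a schematic diagram of a two-dimensional convolutional layer. The neuron modeled by a convolution kernel slides over the input image and traverses all image data; at each step, the corresponding image data are multiplied by the kernel weights and summed to produce an output.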

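The two-dimensional convolution is given by Eq. (2):

$$y(p,q)=\sum_{j=0}^{m-1}\sum_{i=0}^{n-1}x(i,j)\,h\big((p-i),(q-j)\big) \qquad (2)$$

where x(i,j) is the input data, generally image data, with image size p×q; h((p-i),(q-j)) is the convolution kernel, with window size m×n; and y(p,q) is the output data. The output has size p×q, or (p-m+1)×(q-n+1) when the borders are not zero-padded.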
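The two-dimensional case can be sketched the same way. The Python below is a minimal illustration of unpadded 2-D convolution on nested lists; the function name `conv2d_valid` and the test image are hypothetical, and rows are paired with m and columns with n.

```python
def conv2d_valid(image, kernel):
    """Unpadded 2-D convolution of a rows x cols image with an m x n kernel."""
    rows, cols = len(image), len(image[0])
    m, n = len(kernel), len(kernel[0])
    out = []
    for r in range(rows - m + 1):
        out_row = []
        for c in range(cols - n + 1):
            acc = 0
            for j in range(m):
                for i in range(n):
                    # flipped kernel indices give convolution rather than correlation
                    acc += image[r + j][c + i] * kernel[m - 1 - j][n - 1 - i]
            out_row.append(acc)
        out.append(out_row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d_valid(image, kernel))  # [[4, 4], [4, 4]]
```

A 3×3 image and a 2×2 window give a (3-2+1)×(3-2+1) = 2×2 output.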

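2 FPGA architecture analysis

From the LeNet-5 model proposed in 1998 to the classic VGG-16 model, the number of convolutional layers grew from 2 to 13, and it reaches 50 in the 152-layer ResNet network. More convolutional layers mean more computation, and convolutional layers account for as much as 90% of the computation in a CNN[16]. Because neurons are only locally connected to their receptive fields, the convolution kernels within one convolutional layer can be computed in parallel, and so can all the receptive regions of a single kernel. Convolutional layers therefore offer a high degree of parallelism.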

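To balance speed against resources, convolutional layers generally combine parallelism with pipelining. Computation within one receptive field is parallel, that is, one convolution result is computed at a time. The sliding of a convolution window over the input image is pipelined, that is, all data enter the convolution window in turn to be computed. This raises parallelism and computing speed while remaining relatively economical in resources.

Consider a single convolution within one receptive field, taking a 5×5 convolution window as an example. An FPGA convolution core with a parallel structure can, when pipelined, output one convolution result per operation cycle. A general-purpose central processor based on the von Neumann or Harvard architecture must perform 25 multiplications and 24 additions for one convolution, 49 operations in total; adding the address jumps and other operations each convolution requires, it needs at least 50 operation cycles to output one convolution result. The comparison shows that, at the same frequency and for a 5×5 window, an FPGA-based convolution core computes more than 50 times faster than a general-purpose central processor.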
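The operation count above generalizes to any m×n window. The Python sketch below reproduces the arithmetic under the same simplifying assumptions as the text (one operation per processor cycle, one result per cycle from the pipelined FPGA core); the single overhead cycle is an assumed lower bound, and the function names are hypothetical.

```python
def cpu_ops_per_result(m, n):
    """Multiplications, additions and total operations for one m x n window."""
    mults = m * n
    adds = m * n - 1
    return mults, adds, mults + adds


def min_speedup(m, n, overhead_cycles=1):
    """Lower bound on CPU cycles per result, i.e. the FPGA speed-up at equal clock."""
    _, _, total = cpu_ops_per_result(m, n)
    # overhead_cycles stands in for address jumps and similar bookkeeping (assumed >= 1)
    return total + overhead_cycles


print(cpu_ops_per_result(5, 5))  # (25, 24, 49)
print(min_speedup(5, 5))         # 50
```

For a 3×3 window the same reasoning gives 17 operations and a bound of 18 cycles per result.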

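Because traditional FPGA circuits are customized, a convolution kernel designed for a fixed structure is hard to port to other algorithm structures[17-20]; even within the same algorithm structure, a 3×3 kernel is hard to extend to 5×5. To overcome the poor portability of convolutional layer acceleration structures, the universal convolutional layer acceleration structure shown in Fig. 3 is proposed. As Fig. 3 shows, the structure is universal in that configuring a few simple parameters when the layer is generated is enough to produce a convolutional layer with a convolution window of the specified size. 3×3, 5×5 and 7×7 windows, non-square windows such as 3×5, and 1×N one-dimensional windows can all be generated by simple configuration.

The acceleration structure supports the AXI4-Stream protocol and obtains data over the AXI4 bus. It is divided into three main parts: the control core area, the data buffering and preprocessing area, and the parallel computing area. The control core area receives and generates status signals, interacts with the outside over the AXI4 bus, and controls data preprocessing and computation through the enable signals of the internal FIFOs. Under these control signals, the data buffering and preprocessing area stores the incoming data stream into separate FIFOs, one FIFO per row. The parallel computing area is not controlled by the control core area; it performs multiply-accumulate operations independently under clock, enable and reset.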
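The row-buffered data flow can be illustrated with a behavioural Python model. It is a sketch under assumptions, not the wiring of Fig. 3: it uses the classic line-buffer arrangement in which each delay line holds one image row and feeds the next window row, so m-1 delay lines serve an m-row window (the paper gives each row its own FIFO). All names are hypothetical, and the model is not cycle-accurate.

```python
from collections import deque


def stream_conv2d(pixels, width, kernel):
    """Consume a row-major pixel stream; emit one result per pixel once the window is full."""
    m, n = len(kernel), len(kernel[0])
    # each delay line postpones the stream by one image row (`width` pixels)
    lines = [deque([0] * width, maxlen=width) for _ in range(m - 1)]
    # one n-tap shift register per window row; taps[0] holds the newest row
    taps = [deque([0] * n, maxlen=n) for _ in range(m)]
    results = []
    for idx, px in enumerate(pixels):
        row, col = divmod(idx, width)
        value = px
        for j in range(m):
            taps[j].append(value)
            if j < m - 1:
                delayed = lines[j][0]   # sample that entered `width` pixels ago
                lines[j].append(value)  # the oldest sample drops out automatically
                value = delayed
        # results are valid only when the window lies fully inside the image
        if row >= m - 1 and col >= n - 1:
            acc = 0
            for j in range(m):
                for i in range(n):
                    acc += taps[j][i] * kernel[j][n - 1 - i]
            results.append(acc)
    return results


stream = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # 3x3 image in row-major order
print(stream_conv2d(stream, 3, [[1, 0], [0, -1]]))  # [4, 4, 4, 4]
```

Every pixel is read exactly once and in arrival order, which is how this kind of buffering avoids address jumps; the gating on `row` and `col` plays the role of a valid signal.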

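3 Design of the universal convolutional layer IP core

The universal convolutional layer IP core module is parameterized by combining the generic statement with the generate statement. Parameters configured when the IP core is generated pass the description of the core into the generate statements, which produce a parallel computing circuit of the specified size. During computation, the image size is fed to the control core module, and the control core switches the operating state of the convolution core.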
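The generic-plus-generate idea can be mimicked in software. The Python sketch below is only an analogy: a frozen dataclass plays the role of the generics fixed at generation time, and list comprehensions play the role of for-generate loops that replicate one multiplier per kernel tap and one FIFO per window row. The parameter and instance names are hypothetical and do not come from the paper's VHDL source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvCoreParams:
    """Stand-in for VHDL generics: fixed once the core is generated."""
    kernel_rows: int
    kernel_cols: int


def elaborate(params):
    """Stand-in for generate statements: replicate units according to the parameters."""
    multipliers = [
        f"mult_r{r}_c{c}"
        for r in range(params.kernel_rows)
        for c in range(params.kernel_cols)
    ]
    row_fifos = [f"fifo_row{r}" for r in range(params.kernel_rows)]
    return {"multipliers": multipliers, "row_fifos": row_fifos}


square = elaborate(ConvCoreParams(kernel_rows=3, kernel_cols=3))
one_d = elaborate(ConvCoreParams(kernel_rows=1, kernel_cols=5))
print(len(square["multipliers"]), len(square["row_fifos"]))  # 9 3
print(len(one_d["multipliers"]), len(one_d["row_fifos"]))    # 5 1
```

Changing only the two parameters yields a 3×3 core or a 1×5 one-dimensional core from the same description, which is the portability argument made above.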

[4] PEI Xia, LI Dong, WANG Lijun, et al. Deep visual tracking: Review and experimental comparison[J]. Pattern Recognition, 2018, 76: 323-338.

[5] LI Yandong, HAO Zongbo, LEI Hang. A review of convolutional neural networks[J]. Acta Automatica Sinica, 2016, 36(9): 2508-2515.

[6] JIANG Peiqing, WU Lijun. Design of improved binarization convolutional layer based on FPGA[J]. Electric Switchgear, 2019(6): 8-13.

[7] ZHAO Shuo, FAN Jun, HE Hu. Design of CNN acceleration SoC system based on FPGA[J]. Computer Engineering and Design, 2020, 41(4): 939-944.

[8] CHEN Guilin, MA Sheng, GUO Yang. A review of hardware-accelerated neural networks[J]. Journal of Computer Research and Development, 2018, 56(2): 240-253.

[9] SUN Yanteng, WU Yanxia, GU Guochang. Parametric design method based on VHDL language[J]. Computer Engineering and Applications, 2010, 46(31): 68-71.

[10] LIU Zhicheng, ZHU Yongxin, WANG Hui, et al. Design of convolutional neural network parallel acceleration structure based on FPGA[J]. Microelectronics & Computer, 2018, 35(10): 80-84.

[11] CHEN Huang, ZHU Yongxin, TIAN Li, et al. Design of convolutional layer parallel acceleration structure of convolutional neural network based on FPGA[J]. Microelectronics & Computer, 2018, 35(10): 85-88.

[12] AN Jing, AI Ping, XU Sen, et al. An intelligent fault diagnosis method for rotating machinery based on one-dimensional convolutional neural network[J]. Journal of Nanjing University (Natural Science), 2019(1): 133-142.

[13] HUANG Jiao, BIN Guangyu, WU Shuicai. Patient-specific heartbeat classification based on one-dimensional convolutional neural network[J]. China Medical Devices, 2018(3): 11-14.

[14] WANG Lihe, YANG Dezhen, LI Jiangyong, et al. Application of convolutional neural network in target detection and FPGA implementation[J]. Laser and Infrared, 2020, 50(2): 252-256.

[15] JIANG Zetao, LIU Xiaoyan, HU Shuo. Scene recognition of infrared and visible fusion images based on CNN[J]. Computer Engineering and Design, 2019, 40(8): 2289-2294.

[16] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2016: 770-778.

[17] WANG Ting, CHEN Binyue, ZHANG Fuhai. Design of convolutional neural network parallel accelerator based on FPGA[J]. Application of Electronic Technique, 2021, 47(2): 81-84.

[18] ZHANG Xuxin, ZHANG Jia, LI Xinzeng, et al. Accelerator optimization design of binary VGG convolutional neural network[J]. Application of Electronic Technique, 2021, 47(2): 20-23.

[19] ZHANG Fan. FPGA implementation of image convolution real-time computation[J]. International Electronic Elements, 2021, 29(1): 132-137.

[20] FAN Jun, GONG Jie, WU Xifeng, et al. Design and implementation of RNN accelerated SoC based on FPGA[J]. Microelectronics & Computer, 2020, 37(11): 1-5.
