7 articles found
Reconfigurable Communication Processor: A New Approach for Network Processor
1
Authors: 孙华, 陈青山, 张文渊. Journal of Shanghai Jiaotong University (Science) (EI), 2003, No. 1, pp. 43-47, 5 pages
As the traditional RISC+ASIC/ASSP approach to network processor design cannot meet today's requirements, this paper describes an alternative approach, the Reconfigurable Processing Architecture, which boosts performance to ASIC level while preserving the programmability of a traditional RISC-based system. The paper covers both the hardware architecture and the software development environment architecture.
Keywords: network processor; reconfigurable processor; run-time reconfiguration; field programmable gate array (FPGA); reduced instruction set computer (RISC); application specific integrated circuit (ASIC)
A Reconfigurable Block Cryptographic Processor Based on VLIW Architecture (cited by: 11)
2
Authors: LI Wei, ZENG Xiaoyang, NAN Longmei, CHEN Tao, DAI Zibin. China Communications (SCIE, CSCD), 2016, No. 1, pp. 91-99, 9 pages
An efficient and flexible implementation of block ciphers is critical to information security processing. Existing implementation methods such as GPP, FPGA and cryptographic application-specific ASIC provide a broad range of support. However, these methods cannot achieve a good tradeoff between high-speed processing and flexibility. In this paper, we present a reconfigurable VLIW processor architecture targeted at block cipher processing, analyze basic operations and storage characteristics, and propose a multi-cluster register-file structure for block ciphers. For operation elements shared across block ciphers, we adopt reconfigurable technology for the multiple cryptographic processing units and the interconnection scheme. The proposed processor not only flexibly accomplishes the combination of multiple basic cryptographic operations, but also realizes dynamic configuration of the cryptographic processing units. It has been implemented in 0.18 μm CMOS technology; test results show that the frequency can reach 350 MHz and that power consumption is 420 mW. Ten kinds of block and hash ciphers were realized on the processor. The encryption throughput of the AES, DES, IDEA, and SHA-1 algorithms is 1554 Mbps, 448 Mbps, 785 Mbps, and 424 Mbps respectively, and the test results show that the processor's encryption performance is significantly higher than that of other designs.
Keywords: block cipher; VLIW processor; reconfigurable; application-specific instruction-set
Image processing algorithm acceleration using reconfigurable macro processor model (cited by: 2)
3
Authors: SunGuanKfu, ChenHuaming, LuHuanzhang. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2004, No. 2, pp. 110-114, 5 pages
The concept and advantages of reconfigurable technology are introduced. A processor architecture, the reconfigurable macro processor (RMP) model based on an FPGA array and a DSP, is put forward and has been implemented. Two image algorithms are developed: template-based automatic target recognition and zone labeling. One estimates motion direction in the infrared image background; the other is a line picking-up algorithm based on image zone labeling and the phase grouping technique. It is a kind of 'hardware' function that the DSP can call from a high-level algorithm, and thus also a kind of hardware algorithm of the DSP. Experimental results show that reconfigurable computing technology based on the RMP is an ideal means of accelerating high-speed image processing tasks. High real-time performance is obtained in both applications on the RMP.
Keywords: real-time image processing; reconfigurable computing technology; reconfigurable macro processor model; template matching; image zone labeling
BAR:a branch-alternation-resorting algorithm for locality exploration in graph processing
4
Authors: 邓军勇, WANG Junjie, JIANG Lin, XIE Xiaoyan, ZHOU Kai. High Technology Letters (EI, CAS), 2024, No. 1, pp. 31-42, 12 pages
Unstructured and irregular graph data causes strong randomness and poor locality of data accesses in graph processing. This paper optimizes the depth-branch-resorting algorithm (DBR) and proposes a branch-alternation-resorting algorithm (BAR). To run the algorithm in parallel and improve its efficiency, the BAR algorithm is mapped onto the reconfigurable array processor (APR-16) to achieve vertex reordering, effectively improving the locality of graph data. The paper validates the BAR algorithm on the GraphBIG framework by running breadth-first search (BFS), single source shortest path (SSSP) and betweenness centrality (BC) traversals on the dataset reordered with BAR. The results show that, compared with the DBR and Corder algorithms, BAR can reduce execution time by up to 33.00% and 51.00% respectively. In terms of data movement, the BAR algorithm achieves a maximum reduction of 39.00% compared with the DBR algorithm and 29.66% compared with the Corder algorithm. In terms of computational complexity, the BAR algorithm achieves a maximum reduction of 32.56% compared with the DBR algorithm and 53.05% compared with the Corder algorithm.
Keywords: graph processing; vertex reordering; branch-alternation-resorting algorithm (BAR); reconfigurable array processor
Performance characterization of illumination algorithms for reconfigurable graphics processor (cited by: 2)
5
Authors: Deng Junyong, Liu Yang, Xie Xiaoyan. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2019, No. 5, pp. 60-71, 12 pages
Graphics processing is an increasingly important application domain, driven by the demand for real-time rendering, video streaming, virtual reality, and so on. Illumination is a critical module in graphics rendering and is typically compute-bound, memory-bound, or power-bound in different application cases. It is crucial to decide how to schedule illumination algorithms with different features according to practical requirements in reconfigurable graphics hardware. This paper analyzes the performance characteristics of four mainstream lighting algorithms, namely the Lambert, Phong, Blinn-Phong, and Cook-Torrance illumination algorithms, using hardware performance counters on the x86 processor platform Kaby Lake (KBL). Data movement, computation, power consumption, and memory accessing are evaluated over a range of application scenarios. Further, by analyzing the system-level behavior of these illumination algorithms, the pros and cons of each algorithm are obtained. The relationship between performance/energy and the evaluated metrics is analyzed through Pearson correlation coefficient (PCC) analysis. Based on these performance characterization data, the paper presents some reconfiguration suggestions for reconfigurable graphics processors.
Keywords: performance characterization; illumination algorithms; reconfigurable graphics processor; correlation analysis; computer architecture
Design and implementation of near-memory computing array architecture based on shared buffer (cited by: 1)
6
Authors: SHAN Rui, GAO Xu, FENG Yani, HUI Chao, CUI Xinyue, CHAI Miaomiao. High Technology Letters (EI, CAS), 2022, No. 4, pp. 345-353, 9 pages
Deep learning algorithms have been widely used in computer vision, natural language processing and other fields. However, as the scale of deep learning models keeps increasing, the requirements for storage and computing performance keep rising, and processors based on the von Neumann architecture have gradually exposed significant shortcomings such as consumption and long latency. To alleviate this problem, large-scale processing systems are shifting from a traditional computing-centric model to a data-centric model. A near-memory computing array architecture based on a shared buffer is proposed in this paper to improve system performance. It supports instructions with the characteristics of store-calculation integration, reducing data movement between the processor and main memory. Through data reuse, the processing speed of the algorithm is further improved. The proposed architecture is verified and tested through a parallel realization of the convolutional neural network (CNN) algorithm. The experimental results show that, at a frequency of 110 MHz, the calculation speed of a single convolution operation is increased by 66.64% on average compared with the CNN architecture that performs parallel calculations on a field programmable gate array (FPGA). The processing speed of the whole convolution layer is improved by 8.81% compared with the reconfigurable array processor that does not support near-memory computing.
Keywords: near-memory computing; shared buffer; reconfigurable array processor; convolutional neural network (CNN)
A simplified hardware-friendly contour prediction algorithm in 3D-HEVC and parallelization design
7
Authors: JIANG Lin, DUAN Xueyao, XIE Xiaoyan. High Technology Letters (EI, CAS), 2022, No. 4, pp. 392-400, 9 pages
After the extension of depth modeling mode 4 (DMM-4) in 3D high efficiency video coding (3D-HEVC), the computational complexity increases sharply, which degrades the real-time performance of video coding. To reduce the computational complexity of DMM-4, a simplified hardware-friendly contour prediction algorithm is proposed in this paper. Based on the similarity between the texture and depth maps, the proposed algorithm directly codes depth blocks to calculate edge regions, reducing the number of reference blocks. Verified with the test sequence on HTM16.1, the coding time of the proposed algorithm is reduced by 9.42% compared with the original algorithm. To avoid the time-consuming serial coding on HTM, a parallelization design of the proposed algorithm based on a reconfigurable array processor (DPR-CODEC) is proposed. The parallelization design reduces storage access time and configuration time and saves storage cost. Verified on a Xilinx Virtex 6 FPGA, experimental results show that the parallelization design is capable of processing HD 1080p at a speed above 30 frames per second. Compared with related work, the scheme reduces LUTs by 42.3%, REGs by 85.5% and hardware resources by 66.7%. The data loading speedup ratio of the parallel scheme can reach 3.4539. On average, the serial/parallel speedup ratio of encoding time across different-sized templates can reach 2.446.
Keywords: depth modeling mode 4 (DMM-4); contour prediction; 3D high efficiency video coding (3D-HEVC); parallelization; reconfigurable array processor