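Abstract: Traditional IC design methods focus on how to create a brand-new design and verify it effectively. Complex IP blocks, embedded software, and ever-growing transistor counts have become an increasingly heavy burden on the traditional approach. In system-on-chip (SoC) design, assembling functionality from IP blocks is gradually replacing traditional functional design as the mainstream design method.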
Abstract: A hardware optimization technique for mono similarity system generation is presented, based on hardware/software (HW/SW) co-design. First, a coarse structure for subgraph matching based on fully customized HW/SW co-design is put forward. Then, a universal subgraph combination method is discussed. Next, a more advanced vertex compression algorithm built on the subgraph combination method is discussed in detail. Experiments were carried out successfully, and the results verify all the formulas and methods above.
Abstract: Human detection is important in many applications and has attracted significant attention over the last decade. Histograms of Oriented Gradients (HOG) are effective local descriptors and are used with a binary sliding-window mechanism to achieve good detection performance. However, under such a framework HOG must be computed roughly a billion times, and a pure software implementation of the HOG computation can hardly meet the real-time requirement. This study proposes a hardware architecture called the One-HOG accelerator, running on a Xilinx Spartan-6 LX-150T FPGA, that computes HOG efficiently and thereby yields an embedded real-time HW/SW co-design platform for crowd estimation and analysis. The One-HOG accelerator consists mainly of a gradient module and a histogram module. The gradient module computes the gradient magnitude and orientation; the histogram module generates a 36-D HOG feature vector. In addition to the hardware realization, a new human classifier called Histograms-of-Oriented-Gradients AdaBoost Long-Feature-Vector (HOG-AdaBoost-LFV) is proposed to significantly reduce the number of HOG computations without sacrificing detection performance. Experimental results on three static-image datasets and four video datasets demonstrate that the proposed SW/HW (software/hardware) co-design system is 13.14 times faster than a pure software implementation of the Dalal algorithm.
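The split between a gradient stage and a histogram stage can be illustrated in software. The following is a minimal sketch, not the One-HOG design itself: it assumes the 36-D vector comes from one 16x16 block made of 2x2 cells with 9 unsigned orientation bins each (the usual Dalal-Triggs layout, which the abstract does not confirm), and the names `gradients` and `block_hog` are hypothetical.

```python
import math

def gradients(img):
    """Gradient stage: per-pixel magnitude and unsigned orientation in [0, 180)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Centered differences, clamping indices at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang

def block_hog(img, cell=8, bins=9):
    """Histogram stage: 2x2 cells x 9 bins = 36 values (assumed layout), L2-normalized."""
    mag, ang = gradients(img)
    bin_width = 180.0 / bins
    feat = []
    for cy in range(2):
        for cx in range(2):
            hist = [0.0] * bins
            for y in range(cy * cell, (cy + 1) * cell):
                for x in range(cx * cell, (cx + 1) * cell):
                    # Hard assignment to one bin, weighted by gradient magnitude.
                    b = int(ang[y][x] / bin_width) % bins
                    hist[b] += mag[y][x]
            feat.extend(hist)
    norm = math.sqrt(sum(v * v for v in feat)) + 1e-6
    return [v / norm for v in feat]
```

For a 16x16 grayscale block given as a list of rows, `block_hog(block)` returns a list of 36 floats. A sliding-window detector repeats this for every block of every window, which is where the very large number of HOG evaluations comes from.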
Funding: The authors extend their appreciation to the Deanship of Scientific Research at Jouf University (Kingdom of Saudi Arabia) for funding this work through research Grant No. DSR2020-06-3663.
Abstract: In this paper, a software/hardware High-Level Synthesis (HLS) design is proposed to compute the Adaptive Vector Median Filter (AVMF) in real time. This filter is known for its excellent impulsive noise suppression and chromaticity conservation. A software (SW) study of the filter shows that its implementation is too complex. The purpose of this work is threefold: to study the impact of using an HLS tool to design an ideal floating-point hardware (HW) architecture, which uses a square root function, and an optimized fixed-point HW architecture, which uses ROM memory, for the AVMF; to select the best HLS architectures; and to build an efficient embedded HLS software/hardware (SW/HW) AVMF design that achieves a trade-off between processing time, power consumption and hardware cost. To that end, approximations based on ROM memory are proposed to perform the square root, and a fixed-point AVMF algorithm is developed. The best solution generated for each HLS design is then integrated into the SW/HW environment and evaluated on the ZC702 FPGA platform. The experimental results show that, for the same image quality, the ideal SW/HW implementation reduces power consumption by about 65% and processing time by about 98% relative to the ideal SW implementation. Moreover, the power consumption and processing time of the optimized SW/HW design are 70% and 97% lower, respectively, than those of the optimized SW implementation. In addition, the look-up table (LUT) utilization, power consumption and processing time of the optimized SW/HW design are improved by nearly 45%, 18% and 61%, respectively, compared with the ideal SW/HW design, with a slight decrease in image quality.
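The idea of replacing the square root with a ROM lookup can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's architecture: the table size, the `SHIFT` value and the names `SQRT_ROM`, `rom_sqrt` and `vector_median` are hypothetical, and only the plain vector median step is shown (the adaptive switching rule of the AVMF is not described in the abstract and is omitted).

```python
import math

SHIFT = 6                 # hypothetical: drop 6 low bits of the squared distance
MAX_D2 = 3 * 255 * 255    # largest squared distance between two 8-bit RGB pixels

# Table contents are computed once, offline; on an FPGA they would live in ROM.
SQRT_ROM = [round(math.sqrt(i << SHIFT)) for i in range((MAX_D2 >> SHIFT) + 1)]

def rom_sqrt(d2):
    """Approximate integer square root using only a shift and a table read."""
    return SQRT_ROM[d2 >> SHIFT]

def vector_median(window):
    """Return the pixel whose summed distance to all other pixels is smallest."""
    best, best_cost = None, None
    for p in window:
        cost = 0
        for q in window:
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))  # integer arithmetic only
            cost += rom_sqrt(d2)
        if best_cost is None or cost < best_cost:
            best, best_cost = p, cost
    return best
```

With a 3x3 window passed as a list of nine `(r, g, b)` tuples, `vector_median` selects one of the input pixels, so no new colors are introduced. A coarser table (larger `SHIFT`) saves memory at the cost of accuracy, which mirrors the hardware-cost versus image-quality trade-off reported above.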
Abstract: With the aim of building an embedded system that helps visually impaired people interpret text, this paper proposes an efficient High-Level Synthesis (HLS) hardware/software (HW/SW) design for text extraction using the Gamma Correction Method (GCM). The GCM is a common method for extracting text from complex color images and video. The purpose of this work is to study the complexity of the GCM on the Xilinx ZCU102 FPGA board and to propose an HW implementation, as an Intellectual Property (IP) block, of the critical blocks of the method using the HLS flow, while taking the quality of the text extraction into account. This IP is integrated and connected to the ARM Cortex-A53 as a coprocessor in an HW/SW co-design context. The experimental results show that the HLS HW/SW implementation of the GCM on the ZCU102 FPGA board reduces processing time by about 89% compared with the SW implementation, with text extraction as effective and robust as that of the SW implementation.
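The pixel-wise gamma correction that the method is named after can be sketched with a 256-entry lookup table, a form that maps naturally onto hardware. This sketch covers only the standard correction `out = 255 * (in / 255) ** gamma`; how the GCM chooses gamma values and separates text from background is not stated in the abstract and is not modeled here. The names `gamma_lut` and `apply_gamma` are hypothetical.

```python
def gamma_lut(gamma):
    """Precompute the corrected value for every possible 8-bit input level."""
    return [round(255 * (v / 255) ** gamma) for v in range(256)]

def apply_gamma(gray, gamma):
    """Apply gamma correction to a grayscale image given as a list of rows."""
    lut = gamma_lut(gamma)
    return [[lut[v] for v in row] for row in gray]
```

Because the table depends only on `gamma`, it can be built once per gamma value and reused for every pixel, so the per-pixel cost is a single memory read.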
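Abstract: "同芯Ⅳ" is a heterogeneous multi-core processor designed by the Communication and Multimedia SoC Laboratory of the Institute of Microelectronics, Chinese Academy of Sciences. This paper successfully applies the Electronic System Level (ESL) design methodology to the SoC design of this processor: the MIPS processor, a key unit of the system, is modeled in SystemC, and tools such as Visual Studio and ModelSim are used for HW/SW co-design and co-verification. Practice shows that HW/SW co-design with SystemC models effectively increases development parallelism, shortens the development cycle, provides detailed reference data for verification and performance optimization, and simplifies debugging.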