An equation relating the subdifferential of convex functionals defined on real Banach spaces to the metric projections onto their level sets is established. The equation is compared with the resolvents of general monotone operators and clarifies the geometric properties of differential equations expressed through subdifferentials. It is therefore expected to be useful for obtaining the steepest descents defined by convex functionals in Banach spaces. Under certain conditions, it also yields a result similar to the Lagrange multiplier method.
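The indefinite stochastic linear-quadratic (LQ) optimal control problem with linear terminal state constraints is studied. First, the Lagrange multiplier theorem is used to obtain necessary conditions for the existence of an optimal linear state-feedback solution; under strengthened conditions, sufficient conditions for the existence of an optimal control are also obtained. In a certain sense, some earlier results on unconstrained stochastic LQ optimal control can be regarded as corollaries of the main theorem of this paper.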
A global optimization algorithm (GOA) for the parallel Chien search circuit in a Reed-Solomon (RS) (255,239) decoder is presented. By identifying the common modulo-2 additions within groups of Galois field (GF) multipliers and precomputing the shared terms, the GOA efficiently reduces the number of XOR gates and thus the circuit area. Unlike local optimization algorithms, the GOA is global: when more than one maximum match exists at a time, it chooses the pair with the smallest relational value instead of a random pair, so that the chosen match has the least impact on the final result. The results show that the area of parallel Chien search circuits can be reduced by 51% compared with the direct implementation when the group-based GOA is applied to the GF multipliers, and by 26% when the GOA is applied to each GF multiplier separately. This optimization scheme can be widely used in general parallel architectures that involve many GF multipliers.
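To make the data path concrete, the sketch below evaluates an error-locator polynomial at every nonzero field element, which is the computation a Chien search circuit performs. It assumes the primitive polynomial 0x11D with α = 2 (a common choice for RS(255,239), but not stated in the abstract) and is a plain software model; the XOR-sharing optimization of the GOA itself is not reproduced. Each multiplication by a constant α^j is what becomes a fixed XOR network in a parallel hardware implementation.

```python
# Software model of a Chien search over GF(2^8).
# Assumption: primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), alpha = 2.

PRIM_POLY = 0x11D
ALPHA = 0x02


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements by shift-and-add (addition is XOR)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:          # reduce modulo the primitive polynomial
            a ^= PRIM_POLY
    return result


def gf_pow(a: int, n: int) -> int:
    """a**n in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def chien_search(locator: list[int]) -> list[int]:
    """Return every i in [0, 255) with Lambda(alpha^i) == 0.

    locator[j] is the coefficient of x^j. Register j is multiplied by the
    constant alpha^j once per step, so after i steps it holds
    locator[j] * alpha^(i*j) and the XOR of all registers is Lambda(alpha^i).
    """
    terms = list(locator)
    steps = [gf_pow(ALPHA, j) for j in range(len(locator))]
    roots = []
    for i in range(255):
        acc = 0
        for t in terms:
            acc ^= t                       # GF(2^8) addition
        if acc == 0:
            roots.append(i)
        terms = [gf_mul(t, s) for t, s in zip(terms, steps)]
    return roots
```

For Λ(x) = (1 + α^3 x)(1 + α^10 x), the search reports i = 245 and i = 252, the exponents at which α^(i+10) = 1 and α^(i+3) = 1 respectively.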
A capacity model for multi-phase signalized intersections is derived using a stopping-line method. It is simplified for two common situations: one with one through lane and one left-turn lane, and the other with two through lanes and one left-turn lane. The results show that capacity depends mainly on the signal cycle length, the phase lengths, the intersection layout, and the following time. Taking vehicle arrival rates into account, an optimization model is derived by balancing the remaining time of each phase and is solved with Lagrange multipliers. From it, calculation models for the optimal signal cycle length and phase lengths are derived and simplified. Compared with existing models, the proposed model is more convenient and practical. Finally, the signal cycle and phase lengths of a real intersection are calculated with the proposed model.
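The abstract does not spell out its Lagrangian formulation, so the sketch below is not the proposed model. Purely as a familiar point of comparison, it implements Webster's classical method, which sets the cycle length to C0 = (1.5L + 5)/(1 - Y) and splits the effective green time in proportion to each phase's critical flow ratio y_i, so that every phase reaches the same degree of saturation. The total lost time L (in seconds) and the flow ratios are hypothetical inputs.

```python
def webster_timing(lost_time_s: float, flow_ratios: list[float]) -> tuple[float, list[float]]:
    """Webster's optimal cycle length and green split (classical baseline).

    lost_time_s : total lost time per cycle, in seconds
    flow_ratios : critical flow ratio y_i (flow / saturation flow) per phase
    """
    total_y = sum(flow_ratios)
    if total_y >= 1.0:
        raise ValueError("oversaturated: sum of flow ratios must be below 1")
    cycle = (1.5 * lost_time_s + 5.0) / (1.0 - total_y)
    effective_green = cycle - lost_time_s
    # Green time proportional to y_i gives every phase the same degree of saturation.
    greens = [effective_green * y / total_y for y in flow_ratios]
    return cycle, greens
```

With 12 s of lost time and flow ratios of 0.3 and 0.25, the cycle is about 51.1 s and the two phases receive about 21.3 s and 17.8 s of effective green.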
An approach to identifying fuzzy models that considers both interpretability and precision is proposed. First, interpretability issues of fuzzy models are analyzed. Then, input variables are selected with a heuristic strategy that increases the number of input variables step by step, and the fuzzy model is identified with the Gustafson-Kessel fuzzy clustering algorithm combined with the least-squares method. Next, interpretability is measured by the product of the number of input variables and the number of rules, precision is measured by the root mean square error, and a selection objective function combining interpretability and precision is defined. Given the maximum and minimum numbers of input variables and rules, a set of fuzzy models is constructed. Finally, the optimal fuzzy model is selected by the objective function and further optimized by a genetic algorithm to achieve a good tradeoff between interpretability and precision. The performance of the proposed method is illustrated on the well-known Box-Jenkins gas furnace benchmark, and the results demonstrate its validity.
A novel algorithm, the fast alternating direction method of multipliers (ADMM), is applied to solve the classical total variation (TV) based model for image reconstruction. First, the TV-based model is reformulated as a linear-equality-constrained problem with a separable objective function. Then, by introducing the augmented Lagrangian function, the two variables are minimized alternately in a Gauss-Seidel manner. Finally, the dual variable is updated. Because the approach exploits the special structure of the problem and decomposes the original problem into several low-dimensional subproblems, its per-iteration computational cost is dominated by two fast Fourier transforms. Preliminary experimental results indicate that the proposed approach is more stable and efficient than some state-of-the-art algorithms.
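A one-dimensional sketch of the ADMM splitting described above, assuming the denoising form min_u ½‖u - f‖² + λ‖Du‖₁ with D the forward-difference operator (a simplification of the 2D reconstruction model). It is plain scaled ADMM rather than the accelerated "fast" variant, and because DᵀD is tridiagonal in 1D, the u-subproblem is solved with the Thomas algorithm instead of the two FFTs mentioned in the abstract. All function names are hypothetical.

```python
import math


def forward_diff(u):
    """(D u)_i = u[i+1] - u[i]; maps n values to n-1 differences."""
    return [u[i + 1] - u[i] for i in range(len(u) - 1)]


def forward_diff_adjoint(v):
    """D^T v, the adjoint of forward_diff."""
    out = [0.0] * (len(v) + 1)
    for i, vi in enumerate(v):
        out[i] -= vi
        out[i + 1] += vi
    return out


def soft_threshold(x, t):
    """Proximal operator of t*|x| (scalar shrinkage)."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i-1] = A[i][i-1], upper[i] = A[i][i+1]."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        denom = diag[i] - (lower[i - 1] * c[i - 1] if i > 0 else 0.0)
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - (lower[i - 1] * d[i - 1] if i > 0 else 0.0)) / denom
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = d[i] - (c[i] * x[i + 1] if i < n - 1 else 0.0)
    return x


def tv_denoise_admm(f, lam, rho=1.0, iters=3000):
    """Scaled-form ADMM for min_u 0.5*||u - f||^2 + lam*||D u||_1 (len(f) >= 2)."""
    n = len(f)
    diag = [1.0 + 2.0 * rho] * n          # A = I + rho * D^T D is tridiagonal
    diag[0] = diag[-1] = 1.0 + rho
    off = [-rho] * (n - 1)
    z = [0.0] * (n - 1)                    # splitting variable z = D u
    w = [0.0] * (n - 1)                    # scaled dual variable
    u = list(f)
    for _ in range(iters):
        # u-step: (I + rho D^T D) u = f + rho D^T (z - w)
        back = forward_diff_adjoint([zi - wi for zi, wi in zip(z, w)])
        u = solve_tridiagonal(off, diag, off, [fi + rho * bi for fi, bi in zip(f, back)])
        du = forward_diff(u)
        # z-step: shrinkage; w-step: dual update on the constraint D u = z
        z = [soft_threshold(a + b, lam / rho) for a, b in zip(du, w)]
        w = [b + a - zi for a, b, zi in zip(du, w, z)]
    return u
```

For f = [0, 0.1, -0.1, 0, 1, 1.1, 0.9, 1] and λ = 0.2, the iterates approach the piecewise-constant minimizer with value 0.05 on the first four samples and 0.95 on the last four, and the sum of u stays equal to the sum of f (up to rounding) because A preserves constants.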
In this article, we give the definition of the relative double multiplier (quasi-multiplier) on a ternary algebra and study the isomorphism problem for the multiplier algebra M(A,e) of a ternary algebra A.
Abstract: In this paper, X is a locally compact Hausdorff space and A is a Banach algebra. First, we study some basic features of C0(X,A) related to the BSE concept that are inherited from A. In particular, we prove that if C0(X,A) has the BSE property, then so does A. We also establish the converse of this result whenever X is discrete and A has the BSE-norm property. Furthermore, we prove the same result for the BSE property of type I. Finally, we prove that C0(X,A) has the BSE-norm property if and only if A does.
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 62172268 and 62302289) and the Shanghai Science and Technology Project (Grant Nos. 21JC1402800 and 23YF1416200).
Abstract: As part of quantum image processing, quantum image filtering is a crucial technology in the development of quantum computing. Low-pass filtering can effectively suppress aliasing in images. Currently, most quantum image filtering schemes are based on classical-domain methods and grayscale images, and relatively few studies address anti-aliasing in the quantum domain. This paper proposes a spatial-domain anti-aliasing filtering scheme based on quantum grayscale and color image scaling, which applies anti-aliasing filtering to quantum images during the scaling process. First, the novel enhanced quantum representation (NEQR) and the improved quantum representation of color images (INCQI) are used to represent classical images. Since aliasing is more pronounced when images are scaled down, this paper focuses only on anti-aliasing in the case of reduction. Next, anti-aliasing filtering is applied to the quantum representation of the original image, and bilinear interpolation is then used to scale the image down. A pyramid model is constructed and used to select an appropriate image for upscaling back to the original size. Finally, the circuit complexity is analyzed. Compared with images that show aliasing from scaling alone, the filtered images are smoother and clearer. In addition, the anti-aliasing filter allows manual adjustment of the desired level of image smoothness.
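The quantum circuits are beyond a short example, but the classical pipeline they emulate (low-pass filter first, then bilinear reduction) can be sketched in a few lines. This is a classical analogue under assumed choices: a 3×3 box filter with edge clamping stands in for the paper's anti-aliasing filter, and output pixels are mapped with aligned corners. The function names are hypothetical.

```python
def box_blur(img):
    """3x3 mean filter with edge clamping (a simple low-pass filter)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    total += img[rr][cc]
            out[r][c] = total / 9.0
    return out


def bilinear_resize(img, out_h, out_w):
    """Bilinear interpolation with corner-aligned sample positions."""
    h, w = len(img), len(img[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        for j in range(out_w):
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            out[i][j] = top * (1 - fy) + bottom * fy
    return out


def downscale(img, out_h, out_w, anti_alias=True):
    """Optionally low-pass filter, then reduce with bilinear interpolation."""
    src = box_blur(img) if anti_alias else img
    return bilinear_resize(src, out_h, out_w)
```

On a 4×4 checkerboard of 0 and 255, plain bilinear reduction to 2×2 keeps the full 0 to 255 contrast (an aliased result), while filtering first reduces the spread between output pixels to roughly 28.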
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 52305127 and 52075414) and the China Postdoctoral Science Foundation (Grant No. 2021M702595).
Abstract: In practice, simultaneous impact localization and time-history reconstruction are hard to achieve because the constrained and harsh measurement conditions make the problem ill-posed and under-determined. Although l_1 regularization can yield sparse solutions, it is a biased estimator and tends to underestimate solution amplitudes. To address this issue, this paper proposes a novel impact force identification method with l_p regularization, solved by the alternating direction method of multipliers (ADMM). ADMM handles the problem effectively by decomposing the complex primal problem into subproblems that can be solved in parallel via proximal operators. To reduce sensitivity to the regularization parameter, an adaptive regularization parameter is derived from a K-sparsity strategy. On this basis, an ADMM-based sparse regularization method is developed that handles l_p regularization for arbitrary p using adaptively updated parameters. The effectiveness and performance of the proposed method are validated on an aircraft-skin-like composite structure. In addition, the optimal value of p for achieving high-accuracy solutions via l_p regularization is investigated. It turns out that l_{0.6} regularization consistently yields sparser and more accurate impact force identifications than the classic l_1 regularization method. The proposed method can simultaneously reconstruct the impact time history with high accuracy and accurately localize the impact using an under-determined sensor configuration.
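The proximal step for the l_p penalty has no simple closed form when 0 < p < 1. One published option, used here purely as an illustration (the abstract does not say which solver the authors use), is generalized soft-thresholding (GST) from Zuo et al. (2013), which solves min_x ½(x - y)² + λ|x|^p entry by entry. The function name and the fixed iteration count are assumptions, and the adaptive K-sparsity parameter rule is not reproduced.

```python
import math


def gst(y: float, lam: float, p: float, iters: int = 50) -> float:
    """Generalized soft-thresholding for min_x 0.5*(x - y)**2 + lam*|x|**p, 0 < p <= 1."""
    base = 2.0 * lam * (1.0 - p)
    # Threshold below which the minimizer is exactly zero.
    tau = base ** (1.0 / (2.0 - p)) + lam * p * base ** ((p - 1.0) / (2.0 - p))
    ay = abs(y)
    if ay <= tau:
        return 0.0
    # Fixed-point iteration on the stationarity condition x - |y| + lam*p*x**(p-1) = 0.
    x = ay
    for _ in range(iters):
        x = ay - lam * p * x ** (p - 1.0)
    return math.copysign(x, y)
```

With p = 1 it reduces to ordinary soft-thresholding (y = 2, λ = 0.5 gives 1.5). With p = 0.6, y = 3 and λ = 0.5 it returns about 2.80 rather than the l_1 value of 2.5, which illustrates the smaller amplitude bias mentioned above.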
Funding: Partially supported by NSF grants DMS-1854434, DMS-1952644, DMS-2151235, DMS-2219904, and CAREER 1846690.
Abstract: In this paper, we design an efficient multi-stage image segmentation framework that incorporates a weighted difference of anisotropic and isotropic total variation (AITV). The framework consists of two stages, smoothing and thresholding, and is therefore referred to as smoothing-and-thresholding (SaT). In the first stage, a smoothed image is obtained from an AITV-regularized Mumford-Shah (MS) model, which can be solved efficiently by the alternating direction method of multipliers (ADMM) using a closed-form proximal operator of the l_1 - αl_2 regularizer. The convergence of the ADMM algorithm is analyzed. In the second stage, the smoothed image is thresholded by K-means clustering to obtain the final segmentation result. Numerical experiments demonstrate that the proposed framework is versatile for both grayscale and color images, efficient in producing high-quality segmentation results within a few seconds, and robust to input images corrupted by noise, blur, or both. We compare the AITV method with its convex TV and nonconvex TV^p (0<p<1) counterparts, showing the qualitative and quantitative advantages of the proposed method.
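The closed-form proximal operator mentioned above can be written out for a single vector, following the l_1 - αl_2 proximal formula of Lou and Yan (2018) for 0 ≤ α ≤ 1. In an AITV solver it would be applied per pixel to the gradient vector (∂x u, ∂y u); the surrounding ADMM loop and the K-means thresholding stage are not shown, and the function name is hypothetical.

```python
import math


def prox_l1_minus_l2(y, lam, alpha):
    """argmin_x 0.5*||x - y||^2 + lam*(||x||_1 - alpha*||x||_2), with 0 <= alpha <= 1."""
    ymax = max(abs(v) for v in y)
    if ymax > lam:
        # Case 1: soft-threshold, then rescale so that ||x||_2 = ||z||_2 + alpha*lam.
        z = [math.copysign(max(abs(v) - lam, 0.0), v) for v in y]
        znorm = math.sqrt(sum(v * v for v in z))
        scale = (znorm + alpha * lam) / znorm
        return [scale * v for v in z]
    if ymax > (1.0 - alpha) * lam:
        # Case 2: the minimizer is 1-sparse, supported on a largest-magnitude entry.
        k = max(range(len(y)), key=lambda i: abs(y[i]))
        x = [0.0] * len(y)
        x[k] = math.copysign(ymax - (1.0 - alpha) * lam, y[k])
        return x
    # Case 3: every entry is small enough that the minimizer is zero.
    return [0.0] * len(y)
```

With α = 0 the first case reduces to ordinary soft-thresholding, which is the convex (anisotropic) TV proximal step.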
Abstract: In this paper, a modified version of the classical Lagrange multiplier method is developed for convex quadratic optimization problems. The method, derived from the first-order derivative test for optimality of the Lagrangian with respect to the primary variables, decomposes the solution process into two independent steps: the primary variables are solved for first, and the secondary variables (the Lagrange multipliers) are solved for afterward. This innovation leads to two simpler systems of equations that are solved independently, one involving only the primary variables and the other only the secondary ones. Solutions obtained for small problems, as a preliminary test of the method, show that the new method is generally effective in producing the required solutions.
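For reference, the classical approach that the method modifies solves the full first-order (KKT) system for x and the multipliers jointly. The sketch below does that for min ½xᵀQx + cᵀx subject to Ax = b, with the sign convention Qx + c + Aᵀλ = 0. It is the baseline, not the decoupled scheme proposed in the paper, and the helper names are hypothetical.

```python
def solve_linear(m, rhs):
    """Gaussian elimination with partial pivoting on a dense square system."""
    n = len(m)
    a = [row[:] + [b] for row, b in zip(m, rhs)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("singular system")
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        s = sum(a[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x


def qp_kkt(Q, c, A, b):
    """Solve [[Q, A^T], [A, 0]] [x; lam] = [-c; b] for an equality-constrained QP."""
    n, m = len(Q), len(A)
    kkt = [Q[i][:] + [A[j][i] for j in range(m)] for i in range(n)]
    kkt += [A[j][:] + [0.0] * m for j in range(m)]
    sol = solve_linear(kkt, [-ci for ci in c] + list(b))
    return sol[:n], sol[n:]                         # primary variables, multipliers
```

For Q = 2I, c = 0 and the single constraint x1 + x2 = 1, it returns x = (0.5, 0.5) and λ = -1.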
Abstract: The proliferation of information technology has led to an unprecedented surge in data generation, with the data dispersed across many mobile devices. Given this situation, and because training deep learning models requires substantial computing power, distributed algorithms that support multi-party joint modeling have attracted wide attention. Distributed training relieves the heavy computing and communication burden that a centralized model places on a single machine. However, most current distributed algorithms work in a master-slave mode with a central server for coordination, which can cause communication bottlenecks, data leakage, privacy violations, and other issues. To solve these problems, a decentralized, fully distributed algorithm based on deep neural networks with random weights is proposed. The algorithm decomposes the original objective function into several subproblems under consensus constraints, combines decentralized average consensus (DAC) with the alternating direction method of multipliers (ADMM), and achieves joint modeling and training through local computation and communication at each node. Finally, the proposed decentralized algorithm is compared with several centralized deep neural networks with random weights, and the experimental results demonstrate its effectiveness.
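The DAC building block can be illustrated on its own: each node repeatedly moves toward its neighbors' values, and on a connected undirected graph with a step size below 1/(maximum degree), every node converges to the global average without a central server. This is a hypothetical sketch of plain average consensus only; the ADMM coupling and the random-weight network are not modeled.

```python
def average_consensus(values, neighbors, step=0.3, iters=200):
    """Synchronous decentralized average consensus.

    values    : initial local value at each node
    neighbors : neighbors[i] lists the nodes that node i can talk to
    step      : must be below 1 / (maximum degree) for convergence
    """
    x = list(values)
    for _ in range(iters):
        # Each node uses only its own value and its neighbors' values.
        x = [xi + step * sum(x[j] - xi for j in neighbors[i]) for i, xi in enumerate(x)]
    return x
```

On a five-node ring with initial values [1, 2, 3, 4, 10], every node converges to the average, 4. The symmetric update preserves the sum of the values at each iteration, which is why the limit is the average rather than some other consensus value.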
Funding: Supported by the NSF of China (12171266, 12171062) and the NSF of Chongqing (CSTB2022NSCQ-JQX0004).
Abstract: Let X be a complex Banach space, let B and C be two closed linear operators on X satisfying D(B) ⊂ D(C), let d ∈ L^1(R_+), and let 0 ≤ β < α ≤ 2. We characterize the well-posedness of the fractional integro-differential equation D^α u(t) + C D^β u(t) = B u(t) + ∫_{-∞}^{t} d(t-s) B u(s) ds + f(t), 0 ≤ t ≤ 2π, on the periodic Lebesgue-Bochner spaces L^p(T; X) and the periodic Besov spaces B^s_{p,q}(T; X).
Funding: Supported in part by the Major Program of the Ministry of Science and Technology of China under Grant 2019YFB2205102, and in part by the National Natural Science Foundation of China under Grants 61974164, 62074166, 61804181, 62004219, 62004220, and 62104256.
Abstract: With the continuous development of deep learning, deep convolutional neural networks (DCNNs) have attracted wide attention in industry owing to their high accuracy in image classification. Compared with other DCNN hardware deployment platforms, the field programmable gate array (FPGA) offers programmability, low power consumption, parallelism, and low cost. However, the enormous computational load of DCNNs and the limited logic capacity of FPGAs restrict the energy efficiency of DCNN accelerators. The traditional sequential sliding window method can improve the throughput of a DCNN accelerator through data reuse, but its data reuse rate is low because it repeatedly reads the data shared between rows. This paper proposes a fast data readout strategy based on a circular sliding window reading method, which improves the reuse rate of data between rows by optimizing the memory access order of the input data. In addition, the multiplication bit width of the DCNN accelerator is much smaller than that of the digital signal processing (DSP) blocks on the FPGA, so resources are wasted if each multiplication occupies a whole DSP. A multiplier sharing strategy is therefore proposed: the accelerator's multiplier is customized so that a single DSP block can complete multiple groups of 4-, 6-, and 8-bit signed multiplications in parallel. Finally, based on these two strategies, an optimized FPGA accelerator is proposed. The accelerator is written in Verilog and deployed on a Xilinx VCU118. On the CIFAR-10 dataset, its energy efficiency is 39.98 GOPS/W, a 1.73× improvement over previous DCNN FPGA accelerators. On the ImageNet dataset, its energy efficiency is 41.12 GOPS/W, 1.28× to 3.14× that of other accelerators.
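One common way to let a single wide multiplier serve several narrow signed products is operand packing: two multiplicands that share a multiplier are placed in one wide word, multiplied once, and the two products are unpacked from the result with a sign correction. The abstract does not describe its customized multiplier, so the sketch below only illustrates this general idea in Python integer arithmetic. The field width `shift` is an assumed parameter that must be at least twice the operand width; a real DSP block would additionally bound the total width of the packed word and the product.

```python
def packed_dual_multiply(a: int, b: int, c: int, shift: int) -> tuple[int, int]:
    """Compute (a*c, b*c) with one wide multiplication.

    a, b, c are signed n-bit values; shift >= 2*n keeps b*c inside the low field.
    """
    packed = (a << shift) + b          # one wide operand carrying two multiplicands
    p = packed * c                     # single multiply: a*c*2**shift + b*c
    low = p & ((1 << shift) - 1)       # low field holds b*c modulo 2**shift
    if low >= 1 << (shift - 1):        # reinterpret the low field as signed
        low -= 1 << shift
    high = (p - low) >> shift          # exact: the remaining part is a*c*2**shift
    return high, low
```

The sign correction in the last step matters: when b·c is negative it borrows from the upper field, so reading the upper bits directly would give a·c - 1.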
Funding: Supported by the National Natural Science Foundation of China (11901495), the Hunan Provincial NSF Project (2019JJ50573), and the Scientific Research Fund of Hunan Provincial Education Department (22B0155).
Abstract: In this paper, using inhomogeneous Calderón reproducing formulas and the space of test functions associated with a para-accretive function, the inhomogeneous Besov and Triebel-Lizorkin spaces are established. As applications, pointwise multiplier theorems are also obtained.
Abstract: Multiplication is one of the elementary operations in computing systems, so high-speed, low-power multiplier design is essential for efficient computing systems. For low-energy-dissipation circuits, reversible logic is more efficient than irreversible logic, but at the cost of higher complexity. This paper introduces an efficient signed/unsigned 4×4 reversible Vedic multiplier with minimum quantum cost. The Vedic multiplier is considered fast because it generates all partial products and their sum in one step. This paper proposes two reversible Vedic multipliers with optimized quantum cost and garbage output. First, the unsigned Vedic multiplier is designed based on the Urdhava Tiryakbhyam (UT) Sutra. This multiplier consists of bitwise multiplication and adder compressors. Compared with Vedic multipliers in the literature, the proposed design has a quantum cost of 111, a reduction of 94% compared to the previous design. It has a garbage output of 30, an improvement on the best compared design. Second, the proposed unsigned multiplier is extended to multiply signed as well as unsigned numbers. Two signed Vedic multipliers are presented with the aim of further optimizing the performance parameters. Design I has separate binary two's complement (B2C) and MUX circuits, while Design II combines the binary two's complement and MUX circuits into one circuit. Design I achieves a quantum cost of 231, the lowest among state-of-the-art designs. Design II has a quantum cost of 199, 86.14% of that of Design I. The functionality of the proposed multipliers is simulated and verified using XILINX ISE 14.2.
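A behavioural Python model of the Urdhava Tiryakbhyam ("vertically and crosswise") scheme may help: each output column k sums the crosswise partial products a_i·b_j with i + j = k and passes the excess to the next column as a carry. The signed wrapper mirrors the two's-complement-plus-MUX idea at the arithmetic level only; it does not model reversible gates, quantum cost, or garbage outputs, and the function names are hypothetical.

```python
def ut_multiply_unsigned(a, b, n=4):
    """Urdhava Tiryakbhyam n x n unsigned multiply, one column per weight 2^k."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    result, carry = 0, 0
    for k in range(2 * n - 1):
        # Crosswise partial products whose bit positions add up to k.
        column = sum(a_bits[i] & b_bits[k - i]
                     for i in range(max(0, k - n + 1), min(k, n - 1) + 1))
        total = column + carry
        result |= (total & 1) << k        # bit k of the product
        carry = total >> 1                # excess moves to the next column
    return result | (carry << (2 * n - 1))


def signed_vedic_multiply(a, b, n=4):
    """Signed n-bit multiply: multiply magnitudes, then conditionally negate.

    abs() plays the role of the two's-complement stage and the final
    conditional negation the role of the MUX selected by the sign XOR.
    """
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operands out of signed range")
    negative = (a < 0) != (b < 0)
    magnitude = ut_multiply_unsigned(abs(a), abs(b), n)
    return -magnitude if negative else magnitude


print(ut_multiply_unsigned(13, 11), signed_vedic_multiply(-8, 7))
```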
Abstract: Approximate computing is a popular approach to low power consumption that is used in several applications such as image processing, video processing, multimedia, and data mining. Approximate computing is mainly performed with arithmetic circuits, in particular multipliers. The multiplier is the most essential element in approximate computing, and the power consumption largely depends on its performance. Several researchers have worked on approximate multipliers for power reduction over a few decades, but designing a low-power approximate multiplier is not easy. It remains a major challenge for the digital industry to design an approximate multiplier with low power, a minimum error rate, and high accuracy. To overcome these issues, Deep Learning (DL) approaches are applied to digital circuit design for higher accuracy. In recent times, DL has been used to achieve high learning and prediction accuracy in several fields. Therefore, Long Short-Term Memory (LSTM), a popular time-series DL method, is used in this work for approximate computing. To provide an optimal solution, the LSTM is combined with the meta-heuristic Jellyfish search optimisation technique to design an input-aware deep learning-based approximate multiplier (DLAM). In this work, the jellyfish-optimised LSTM model is used to improve the error metrics of the approximate multiplier. The optimal hyperparameters of the LSTM model are identified by jellyfish search optimisation, and this fine-tuning yields an LSTM with higher accuracy. The proposed pre-trained LSTM model is used to generate approximate design libraries for different truncation levels as a function of area, delay, power, and error metrics. Experimental results on an 8-bit multiplier in an image processing application show that the proposed approximate multiplier achieves superior area and power reduction with very good error rates.
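To make "truncation levels" and "error metrics" concrete, the sketch below evaluates a plain truncated 8-bit multiplier exhaustively. It is not the LSTM/jellyfish-based DLAM: it assumes the simple scheme of zeroing the k least-significant bits of each operand and reports the error rate (ER) and mean relative error distance (MRED), two common approximate-multiplier metrics. Area, delay, and power are out of scope, and the function names are hypothetical.

```python
def truncated_mul(a, b, k):
    """Approximate product: zero the k least-significant bits of each operand."""
    mask = ~((1 << k) - 1)
    return (a & mask) * (b & mask)


def error_metrics(k, bits=8):
    """Exhaustive error rate (ER) and mean relative error distance (MRED)."""
    n = 1 << bits
    wrong, red_sum, red_count = 0, 0.0, 0
    for a in range(n):
        for b in range(n):
            exact = a * b
            approx = truncated_mul(a, b, k)
            if approx != exact:
                wrong += 1
            if exact:                      # relative error undefined for 0
                red_sum += abs(exact - approx) / exact
                red_count += 1
    return wrong / (n * n), red_sum / red_count


for k in (0, 2, 4):
    print(k, error_metrics(k))
```

Because truncation only removes bits, the approximate product never exceeds the exact one, so both metrics grow monotonically with k; a design library would pair each k with its synthesized area, delay, and power.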
Abstract: An equation concerning the subdifferential of convex functionals defined on real Banach spaces and the metric projections onto their level sets is shown. The equation is compared with the resolvents of general monotone operators, and it clarifies the geometric properties of differential equations expressed by subdifferentials. Hence, it can be expected to be useful in obtaining the steepest descents defined by convex functionals in Banach spaces. It also gives a result similar to the Lagrange multiplier method under certain conditions.
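For orientation only, the Hilbert-space special case of such a relation is the standard Lagrange-multiplier (KKT) characterization of the projection onto a sublevel set; it is not the Banach-space equation of the paper. Assume $H$ is a real Hilbert space, $f\colon H\to\mathbb{R}$ is convex and continuous, $c>\inf f$, $C=\{y\in H : f(y)\le c\}$, and $x\notin C$. Then $y=P_C(x)$ if and only if $f(y)=c$ and, for some $\lambda>0$,

```latex
x - y \in \lambda\,\partial f(y)
\quad\Longleftrightarrow\quad
y = (I + \lambda\,\partial f)^{-1}(x) = J_\lambda(x).
```

So the projection onto the level set coincides with the resolvent $J_\lambda$ of $\partial f$ at one particular step size $\lambda$, and $\lambda$ plays the role of the Lagrange multiplier.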
Abstract: A global optimization algorithm (GOA) for the parallel Chien search circuit in a Reed-Solomon (RS) (255,239) decoder is presented. By finding the common modulo-2 additions within groups of Galois field (GF) multipliers and pre-computing the common terms, the GOA efficiently reduces the number of XOR gates and thus the circuit area. Unlike local optimization algorithms, the GOA is a global one. When there is more than one maximum match at a time, the GOA chooses the pair with the smallest relational value instead of a random pair, so that the choice has the least impact on the final result. The results show that the area of parallel Chien search circuits can be reduced by 51% compared to the direct implementation when the group-based GOA is used for the GF multipliers, and by 26% when the GOA is applied to each GF multiplier separately. This optimization scheme can be widely used in general parallel architectures involving many GF multipliers.
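The XOR-sharing idea behind the GOA can be sketched as greedy common-pair extraction over a group of constant GF(2^8) multipliers driven by the same register. Assumptions: the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), the constants α^1 to α^8 as a hypothetical parallel group, and ties broken by first occurrence. The GOA's own tie-breaking rule (smallest relational value) is not reproduced, so any gate counts printed here are illustrative only.

```python
from collections import Counter
from itertools import combinations

PRIM_POLY = 0x11D  # assumed: x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Multiply two GF(2^8) elements (shift-and-add with reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
    return r


def const_mult_rows(c):
    """Row i lists the input bits XORed into output bit i of y = c * x."""
    cols = [gf_mul(c, 1 << j) for j in range(8)]
    return [{j for j in range(8) if (cols[j] >> i) & 1} for i in range(8)]


def xor_count(rows):
    """XOR gates needed when every row is summed independently."""
    return sum(max(len(r) - 1, 0) for r in rows)


def greedy_share(rows_in):
    """Repeatedly factor out the signal pair shared by the most rows.

    Signals 0..7 are input bits; each extracted pair becomes a new signal
    costing one XOR gate. Returns the rewritten rows, each signal's mask
    over the input bits, and the total XOR count.
    """
    rows = [set(r) for r in rows_in]
    masks = {j: 1 << j for j in range(8)}
    next_sig, shared = 8, 0
    while True:
        pairs = Counter(p for r in rows for p in combinations(sorted(r), 2))
        if not pairs:
            break
        (a, b), freq = pairs.most_common(1)[0]  # ties: first encountered
        if freq < 2:
            break
        masks[next_sig] = masks[a] ^ masks[b]
        for r in rows:
            if a in r and b in r:
                r -= {a, b}
                r.add(next_sig)
        next_sig += 1
        shared += 1
    return rows, masks, shared + xor_count(rows)


alpha = [1]
for _ in range(8):
    alpha.append(gf_mul(alpha[-1], 2))
group = [row for c in alpha[1:9] for row in const_mult_rows(c)]
print("direct:", xor_count(group), "shared:", greedy_share(group)[2])
```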
Funding: China Postdoctoral Science Foundation (No. 2004035208) and Jiangsu Communication Science Foundation (No. 06Y36).
Abstract: A capacity model of multi-phase signalized intersections is derived by a stopping-line method. It is simplified for two common situations: one with one straight lane and one left-turn lane, and the other with two straight lanes and one left-turn lane. The results show that the capacity mainly depends on the signal cycle length, phase length, intersection layout, and following time. With regard to vehicle arrival rates, an optimization model is derived by balancing each phase's remaining time, and it is solved with Lagrange multipliers. From this, calculation models for the optimal signal cycle length and phase lengths are derived and simplified. Compared to existing models, the proposed model is more convenient and practical. Finally, a real intersection is chosen, and its signal cycle and phase lengths are calculated with the proposed model.
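As a point of comparison (not the stopping-line model or the Lagrange-multiplier solution of this paper), the classical Webster method computes an optimal cycle length C0 = (1.5L + 5) / (1 - Y) from the total lost time L and the sum Y of critical flow ratios, splits the effective green time in proportion to each phase's flow ratio y_i, and gives lane-group capacity as s·g/C. A minimal sketch; the function names are hypothetical.

```python
def webster_timing(flow_ratios, lost_time):
    """Webster's optimal cycle length and green splits.

    flow_ratios: critical flow ratio y_i = q_i / s_i of each phase.
    lost_time: total lost time L per cycle, in seconds.
    """
    Y = sum(flow_ratios)
    if Y >= 1:
        raise ValueError("intersection is oversaturated (Y >= 1)")
    cycle = (1.5 * lost_time + 5) / (1 - Y)
    effective_green = cycle - lost_time
    greens = [effective_green * y / Y for y in flow_ratios]
    return cycle, greens


def phase_capacity(saturation_flow, green, cycle):
    """Capacity in veh/h of a lane group: s * g / C."""
    return saturation_flow * green / cycle


cycle, greens = webster_timing([0.30, 0.25], lost_time=12)
print(round(cycle, 1), [round(g, 1) for g in greens])
```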
Abstract: An approach to identifying fuzzy models that considers both interpretability and precision was proposed. First, interpretability issues of fuzzy models were analyzed. Then, a heuristic strategy was used to select input variables by incrementally increasing their number, and the Gustafson-Kessel fuzzy clustering algorithm, combined with the least squares method, was used to identify the fuzzy model. Subsequently, interpretability was measured by the product of the number of input variables and the number of rules, precision was measured by the root mean square error, and a selection objective function combining interpretability and precision was defined. Given the maximum and minimum numbers of input variables and rules, a set of fuzzy models was constructed. Finally, the optimal fuzzy model was selected by the objective function and further optimized by a genetic algorithm to achieve a good tradeoff between interpretability and precision. The performance of the proposed method was illustrated on the well-known Box-Jenkins gas furnace benchmark, and the results demonstrate its validity.
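The model-selection step can be sketched as follows. The interpretability measure (number of inputs × number of rules) and the RMSE follow the text; the min-max normalisation and the weighted sum with a hypothetical `weight` parameter are assumptions, since the exact objective function is not given here, and the RMSE values in the usage line are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def select_model(candidates, weight=0.5):
    """Pick the candidate minimising an assumed weighted, normalised objective.

    candidates: dict mapping (n_inputs, n_rules) -> RMSE of that fuzzy model.
    weight: hypothetical trade-off; 1.0 favours interpretability only,
    0.0 favours precision only.
    """
    complexity = {k: k[0] * k[1] for k in candidates}  # interpretability measure
    c_lo, c_hi = min(complexity.values()), max(complexity.values())
    e_lo, e_hi = min(candidates.values()), max(candidates.values())

    def norm(v, lo, hi):
        return 0.0 if hi == lo else (v - lo) / (hi - lo)

    def objective(k):
        return (weight * norm(complexity[k], c_lo, c_hi)
                + (1 - weight) * norm(candidates[k], e_lo, e_hi))

    return min(candidates, key=objective)


# Hypothetical RMSE values, for illustration only.
print(select_model({(1, 2): 0.90, (2, 3): 0.40, (3, 5): 0.38, (4, 8): 0.37}))
```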
Funding: The Scientific Research Foundation of Nanjing University of Posts and Telecommunications (No. NY210049).
Abstract: A novel algorithm, the fast alternating direction method of multipliers (ADMM), is applied to solve the classical total-variation (TV)-based model for image reconstruction. First, the TV-based model is reformulated as a linear equality-constrained problem with a separable objective function. Then, by introducing the augmented Lagrangian function, the two variables are minimized alternately in the Gauss-Seidel manner. Finally, the dual variable is updated. Because the approach makes full use of the special structure of the problem and decomposes the original problem into several low-dimensional sub-problems, its per-iteration computational complexity is dominated by two fast Fourier transforms. Preliminary experimental results indicate that the proposed approach is more stable and efficient than some state-of-the-art algorithms.
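A minimal NumPy sketch of the same splitting, under stated assumptions: denoising (identity forward operator) instead of general reconstruction, anisotropic TV, periodic boundaries, and plain rather than accelerated ADMM; `lam`, `rho`, and `n_iter` are hypothetical parameter choices. With periodic boundaries, D^T D is diagonalised by the 2-D DFT, so the u-subproblem costs one forward and one inverse FFT per iteration, consistent with the two-FFT per-iteration cost described in the abstract.

```python
import numpy as np


def grad(u):
    """Forward differences (x, y) with periodic boundary conditions."""
    return np.roll(u, -1, axis=1) - u, np.roll(u, -1, axis=0) - u


def grad_adjoint(px, py):
    """Adjoint D^T of `grad` under periodic boundaries."""
    return (np.roll(px, 1, axis=1) - px) + (np.roll(py, 1, axis=0) - py)


def shrink(v, t):
    """Soft thresholding, the proximal operator of t * |.|."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def tv_denoise_admm(f, lam=0.1, rho=1.0, n_iter=100):
    """min_u 0.5*||u - f||^2 + lam*||Du||_1 via ADMM with the split d = Du."""
    # Eigenvalues of D^T D from the impulse responses of the difference operators.
    delta = np.zeros_like(f)
    delta[0, 0] = 1.0
    hx, hy = grad(delta)
    denom = 1.0 + rho * (np.abs(np.fft.fft2(hx)) ** 2 + np.abs(np.fft.fft2(hy)) ** 2)

    dx, dy = grad(f)
    bx, by = np.zeros_like(f), np.zeros_like(f)
    u = f.copy()
    for _ in range(n_iter):
        # u-step: (I + rho D^T D) u = f + rho D^T (d - b), one FFT pair.
        rhs = f + rho * grad_adjoint(dx - bx, dy - by)
        u = np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))
        # d-step: elementwise shrinkage of Du + b.
        ux, uy = grad(u)
        dx, dy = shrink(ux + bx, lam / rho), shrink(uy + by, lam / rho)
        # Scaled dual (multiplier) update.
        bx += ux - dx
        by += uy - dy
    return u


rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[16:48, 16:48] = 1.0
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
restored = tv_denoise_admm(noisy)
```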
Abstract: In this article, we give the definition of the relative double multiplier (quasi-multiplier) on a ternary algebra and study the isomorphism problem for the multiplier algebra M(A,e) of a ternary algebra A.