The perception module of advanced driver assistance systems plays a vital role.Perception schemes often use a single sensor for data processing and environmental perception or adopt the information processing results ...The perception module of advanced driver assistance systems plays a vital role.Perception schemes often use a single sensor for data processing and environmental perception or adopt the information processing results of various sensors for the fusion of the detection layer.This paper proposes a multi-scale and multi-sensor data fusion strategy in the front end of perception and accomplishes a multi-sensor function disparity map generation scheme.A binocular stereo vision sensor composed of two cameras and a light deterction and ranging(LiDAR)sensor is used to jointly perceive the environment,and a multi-scale fusion scheme is employed to improve the accuracy of the disparity map.This solution not only has the advantages of dense perception of binocular stereo vision sensors but also considers the perception accuracy of LiDAR sensors.Experiments demonstrate that the multi-scale multi-sensor scheme proposed in this paper significantly improves disparity map estimation.展开更多
Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware reso...Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses parallel multi-resolution branches architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.展开更多
Automatic modulation classification(AMC) technology is one of the cutting-edge technologies in cognitive radio communications. AMC based on deep learning has recently attracted much attention due to its superior perfo...Automatic modulation classification(AMC) technology is one of the cutting-edge technologies in cognitive radio communications. AMC based on deep learning has recently attracted much attention due to its superior performances in classification accuracy and robustness. In this paper, we propose a novel, high resolution and multi-scale feature fusion convolutional neural network model with a squeeze-excitation block, referred to as HRSENet,to classify different kinds of modulation signals.The proposed model establishes a parallel computing mechanism of multi-resolution feature maps through the multi-layer convolution operation, which effectively reduces the information loss caused by downsampling convolution. Moreover, through dense skipconnecting at the same resolution and up-sampling or down-sampling connection at different resolutions, the low resolution representation of the deep feature maps and the high resolution representation of the shallow feature maps are simultaneously extracted and fully integrated, which is benificial to mine signal multilevel features. Finally, the feature squeeze and excitation module embedded in the decoder is used to adjust the response weights between channels, further improving classification accuracy of proposed model.The proposed HRSENet significantly outperforms existing methods in terms of classification accuracy on the public dataset “Over the Air” in signal-to-noise(SNR) ranging from-2dB to 20dB. The classification accuracy in the proposed model achieves 85.36% and97.30% at 4dB and 10dB, respectively, with the improvement by 9.71% and 5.82% compared to LWNet.Furthermore, the model also has a moderate computation complexity compared with several state-of-the-art methods.展开更多
Accurate diagnosis of apple leaf diseases is crucial for improving the quality of apple production and promoting the development of the apple industry. However, apple leaf diseases do not differ significantly from ima...Accurate diagnosis of apple leaf diseases is crucial for improving the quality of apple production and promoting the development of the apple industry. However, apple leaf diseases do not differ significantly from image texture and structural information. The difficulties in disease feature extraction in complex backgrounds slow the related research progress. To address the problems, this paper proposes an improved multi-scale inverse bottleneck residual network model based on a triplet parallel attention mechanism, which is built upon ResNet-50, while improving and combining the inception module and ResNext inverse bottleneck blocks, to recognize seven types of apple leaf(including six diseases of alternaria leaf spot, brown spot, grey spot, mosaic, rust, scab, and one healthy). First, the 3×3 convolutions in some of the residual modules are replaced by multi-scale residual convolutions, the convolution kernels of different sizes contained in each branch of the multi-scale convolution are applied to extract feature maps of different sizes, and the outputs of these branches are multi-scale fused by summing to enrich the output features of the images. Second, the global layer-wise dynamic coordinated inverse bottleneck structure is used to reduce the network feature loss. The inverse bottleneck structure makes the image information less lossy when transforming from different dimensional feature spaces. The fusion of multi-scale and layer-wise dynamic coordinated inverse bottlenecks makes the model effectively balances computational efficiency and feature representation capability, and more robust with a combination of horizontal and vertical features in the fine identification of apple leaf diseases. Finally, after each improved module, a triplet parallel attention module is integrated with cross-dimensional interactions among channels through rotations and residual transformations, which improves the parallel search efficiency of important features and the recognition rate of the network with relatively small computational costs while the dimensional dependencies are improved. To verify the validity of the model in this paper, we uniformly enhance apple leaf disease images screened from the public data sets of Plant Village, Baidu Flying Paddle, and the Internet. The final processed image count is 14,000. The ablation study, pre-processing comparison, and method comparison are conducted on the processed datasets. The experimental results demonstrate that the proposed method reaches 98.73% accuracy on the adopted datasets, which is 1.82% higher than the classical ResNet-50 model, and 0.29% better than the apple leaf disease datasets before preprocessing. It also achieves competitive results in apple leaf disease identification compared to some state-ofthe-art methods.展开更多
In order to extract the richer feature information of ship targets from sea clutter, and address the high dimensional data problem, a method termed as multi-scale fusion kernel sparse preserving projection(MSFKSPP) ba...In order to extract the richer feature information of ship targets from sea clutter, and address the high dimensional data problem, a method termed as multi-scale fusion kernel sparse preserving projection(MSFKSPP) based on the maximum margin criterion(MMC) is proposed for recognizing the class of ship targets utilizing the high-resolution range profile(HRRP). Multi-scale fusion is introduced to capture the local and detailed information in small-scale features, and the global and contour information in large-scale features, offering help to extract the edge information from sea clutter and further improving the target recognition accuracy. The proposed method can maximally preserve the multi-scale fusion sparse of data and maximize the class separability in the reduced dimensionality by reproducing kernel Hilbert space. Experimental results on the measured radar data show that the proposed method can effectively extract the features of ship target from sea clutter, further reduce the feature dimensionality, and improve target recognition performance.展开更多
Due to the lack of long-range association and spatial location information,fine details and accurate boundaries of complex clothing images cannot always be obtained by using the existing deep learning-based methods.Th...Due to the lack of long-range association and spatial location information,fine details and accurate boundaries of complex clothing images cannot always be obtained by using the existing deep learning-based methods.This paper presents a convolutional structure with multi-scale fusion to optimize the step of clothing feature extraction and a self-attention module to capture long-range association information.The structure enables the self-attention mechanism to directly participate in the process of information exchange through the down-scaling projection operation of the multi-scale framework.In addition,the improved self-attention module introduces the extraction of 2-dimensional relative position information to make up for its lack of ability to extract spatial position features from clothing images.The experimental results based on the colorful fashion parsing dataset(CFPD)show that the proposed network structure achieves 53.68%mean intersection over union(mIoU)and has better performance on the clothing parsing task.展开更多
Infrared-visible image fusion plays an important role in multi-source data fusion,which has the advantage of integrating useful information from multi-source sensors.However,there are still challenges in target enhanc...Infrared-visible image fusion plays an important role in multi-source data fusion,which has the advantage of integrating useful information from multi-source sensors.However,there are still challenges in target enhancement and visual improvement.To deal with these problems,a sub-regional infrared-visible image fusion method(SRF)is proposed.First,morphology and threshold segmentation is applied to extract targets interested in infrared images.Second,the infrared back-ground is reconstructed based on extracted targets and the visible image.Finally,target and back-ground regions are fused using a multi-scale transform.Experimental results are obtained using public data for comparison and evaluation,which demonstrate that the proposed SRF has poten-tial benefits over other methods.展开更多
The high-frequency components in the traditional multi-scale transform method are approximately sparse, which can represent different information of the details. But in the low-frequency component, the coefficients ar...The high-frequency components in the traditional multi-scale transform method are approximately sparse, which can represent different information of the details. But in the low-frequency component, the coefficients around the zero value are very few, so we cannot sparsely represent low-frequency image information. The low-frequency component contains the main energy of the image and depicts the profile of the image. Direct fusion of the low-frequency component will not be conducive to obtain highly accurate fusion result. Therefore, this paper presents an infrared and visible image fusion method combining the multi-scale and top-hat transforms. On one hand, the new top-hat-transform can effectively extract the salient features of the low-frequency component. On the other hand, the multi-scale transform can extract highfrequency detailed information in multiple scales and from diverse directions. The combination of the two methods is conducive to the acquisition of more characteristics and more accurate fusion results. Among them, for the low-frequency component, a new type of top-hat transform is used to extract low-frequency features, and then different fusion rules are applied to fuse the low-frequency features and low-frequency background; for high-frequency components, the product of characteristics method is used to integrate the detailed information in high-frequency. Experimental results show that the proposed algorithm can obtain more detailed information and clearer infrared target fusion results than the traditional multiscale transform methods. Compared with the state-of-the-art fusion methods based on sparse representation, the proposed algorithm is simple and efficacious, and the time consumption is significantly reduced.展开更多
The precise and automatic segmentation of prostate magnetic resonance imaging(MRI)images is vital for assisting doctors in diagnosing prostate diseases.In recent years,many advanced methods have been applied to prosta...The precise and automatic segmentation of prostate magnetic resonance imaging(MRI)images is vital for assisting doctors in diagnosing prostate diseases.In recent years,many advanced methods have been applied to prostate segmentation,but due to the variability caused by prostate diseases,automatic segmentation of the prostate presents significant challenges.In this paper,we propose an attention-guided multi-scale feature fusion network(AGMSF-Net)to segment prostate MRI images.We propose an attention mechanism for extracting multi-scale features,and introduce a 3D transformer module to enhance global feature representation by adding it during the transition phase from encoder to decoder.In the decoder stage,a feature fusion module is proposed to obtain global context information.We evaluate our model on MRI images of the prostate acquired from a local hospital.The relative volume difference(RVD)and dice similarity coefficient(DSC)between the results of automatic prostate segmentation and ground truth were 1.21%and 93.68%,respectively.To quantitatively evaluate prostate volume on MRI,which is of significant clinical significance,we propose a unique AGMSF-Net.The essential performance evaluation and validation experiments have demonstrated the effectiveness of our method in automatic prostate segmentation.展开更多
With the remarkable advancements in machine vision research and its ever-expanding applications,scholars have increasingly focused on harnessing various vision methodologies within the industrial realm.Specifically,de...With the remarkable advancements in machine vision research and its ever-expanding applications,scholars have increasingly focused on harnessing various vision methodologies within the industrial realm.Specifically,detecting vehicle floor welding points poses unique challenges,including high operational costs and limited portability in practical settings.To address these challenges,this paper innovatively integrates template matching and the Faster RCNN algorithm,presenting an industrial fusion cascaded solder joint detection algorithm that seamlessly blends template matching with deep learning techniques.This algorithm meticulously weights and fuses the optimized features of both methodologies,enhancing the overall detection capabilities.Furthermore,it introduces an optimized multi-scale and multi-template matching approach,leveraging a diverse array of templates and image pyramid algorithms to bolster the accuracy and resilience of object detection.By integrating deep learning algorithms with this multi-scale and multi-template matching strategy,the cascaded target matching algorithm effectively accurately identifies solder joint types and positions.A comprehensive welding point dataset,labeled by experts specifically for vehicle detection,was constructed based on images from authentic industrial environments to validate the algorithm’s performance.Experiments demonstrate the algorithm’s compelling performance in industrial scenarios,outperforming the single-template matching algorithm by 21.3%,the multi-scale and multitemplate matching algorithm by 3.4%,the Faster RCNN algorithm by 19.7%,and the YOLOv9 algorithm by 17.3%in terms of solder joint detection accuracy.This optimized algorithm exhibits remarkable robustness and portability,ideally suited for detecting solder joints across diverse vehicle workpieces.Notably,this study’s dataset and feature fusion approach can be a valuable resource for other algorithms seeking to enhance their solder joint detection capabilities.This work thus not only presents a novel and effective solution for industrial solder joint detection but lays the groundwork for future advancements in this critical area.展开更多
Human posture estimation is a prominent research topic in the fields of human-com-puter interaction,motion recognition,and other intelligent applications.However,achieving highaccuracy in key point localization,which ...Human posture estimation is a prominent research topic in the fields of human-com-puter interaction,motion recognition,and other intelligent applications.However,achieving highaccuracy in key point localization,which is crucial for intelligent applications,contradicts the lowdetection accuracy of human posture detection models in practical scenarios.To address this issue,a human pose estimation network called AT-HRNet has been proposed,which combines convolu-tional self-attention and cross-dimensional feature transformation.AT-HRNet captures significantfeature information from various regions in an adaptive manner,aggregating them through convolu-tional operations within the local receptive domain.The residual structures TripNeck and Trip-Block of the high-resolution network are designed to further refine the key point locations,wherethe attention weight is adjusted by a cross-dimensional interaction to obtain more features.To vali-date the effectiveness of this network,AT-HRNet was evaluated using the COCO2017 dataset.Theresults show that AT-HRNet outperforms HRNet by improving 3.2%in mAP,4.0%in AP75,and3.9%in AP^(M).This suggests that AT-HRNet can offer more beneficial solutions for human posture estimation.展开更多
Imaging quality is a critical component of compressive imaging in real applications. In this study, we propose a compressive imaging method based on multi-scale modulation and reconstruction in the spatial frequency d...Imaging quality is a critical component of compressive imaging in real applications. In this study, we propose a compressive imaging method based on multi-scale modulation and reconstruction in the spatial frequency domain. Theoretical analysis and simulation show the relation between the measurement matrix resolution and compressive sensing(CS)imaging quality. The matrix design is improved to provide multi-scale modulations, followed by individual reconstruction of images of different spatial frequencies. Compared with traditional single-scale CS imaging, the multi-scale method provides high quality imaging in both high and low frequencies, and effectively decreases the overall reconstruction error.Experimental results confirm the feasibility of this technique, especially at low sampling rate. The method may thus be helpful in promoting the implementation of compressive imaging in real applications.展开更多
To fully extract and mine the multi-scale features of reservoirs and geologic structures in time/depth and space dimensions, a new 3D multi-scale volumetric curvature (MSVC) methodology is presented in this paper. W...To fully extract and mine the multi-scale features of reservoirs and geologic structures in time/depth and space dimensions, a new 3D multi-scale volumetric curvature (MSVC) methodology is presented in this paper. We also propose a fast algorithm for computing 3D volumetric curvature. In comparison to conventional volumetric curvature attributes, its main improvements and key algorithms introduce multi-frequency components expansion in time-frequency domain and the corresponding multi-scale adaptive differential operator in the wavenumber domain, into the volumetric curvature calculation. This methodology can simultaneously depict seismic multi-scale features in both time and space. Additionally, we use data fusion of volumetric curvatures at various scales to take full advantage of the geologic features and anomalies extracted by curvature measurements at different scales. The 3D MSVC can highlight geologic anomalies and reduce noise at the same time. Thus, it improves the interpretation efficiency of curvature attributes analysis. The 3D MSVC is applied to both land and marine 3D seismic data. The results demonstrate that it can indicate the spatial distribution of reservoirs, detect faults and fracture zones, and identify their multi-scale properties.展开更多
Accurate prediction of the state-of-charge(SOC)of battery energy storage system(BESS)is critical for its safety and lifespan in electric vehicles.To overcome the imbalance of existing methods between multi-scale featu...Accurate prediction of the state-of-charge(SOC)of battery energy storage system(BESS)is critical for its safety and lifespan in electric vehicles.To overcome the imbalance of existing methods between multi-scale feature fusion and global feature extraction,this paper introduces a novel multi-scale fusion(MSF)model based on gated recurrent unit(GRU),which is specifically designed for complex multi-step SOC prediction in practical BESSs.Pearson correlation analysis is first employed to identify SOC-related parameters.These parameters are then input into a multi-layer GRU for point-wise feature extraction.Concurrently,the parameters undergo patching before entering a dual-stage multi-layer GRU,thus enabling the model to capture nuanced information across varying time intervals.Ultimately,by means of adaptive weight fusion and a fully connected network,multi-step SOC predictions are rendered.Following extensive validation over multiple days,it is illustrated that the proposed model achieves an absolute error of less than 1.5%in real-time SOC prediction.展开更多
Grasp detection plays a critical role for robot manipulation.Mainstream pixel-wise grasp detection networks with encoder-decoder structure receive much attention due to good accuracy and efficiency.However,they usuall...Grasp detection plays a critical role for robot manipulation.Mainstream pixel-wise grasp detection networks with encoder-decoder structure receive much attention due to good accuracy and efficiency.However,they usually transmit the high-level feature in the encoder to the decoder,and low-level features are neglected.It is noted that low-level features contain abundant detail information,and how to fully exploit low-level features remains unsolved.Meanwhile,the channel information in high-level feature is also not well mined.Inevitably,the performance of grasp detection is degraded.To solve these problems,we propose a grasp detection network with hierarchical multi-scale feature fusion and inverted shuffle residual.Both low-level and high-level features in the encoder are firstly fused by the designed skip connections with attention module,and the fused information is then propagated to corresponding layers of the decoder for in-depth feature fusion.Such a hierarchical fusion guarantees the quality of grasp prediction.Furthermore,an inverted shuffle residual module is created,where the high-level feature from encoder is split in channel and the resultant split features are processed in their respective branches.By such differentiation processing,more high-dimensional channel information is kept,which enhances the representation ability of the network.Besides,an information enhancement module is added before the encoder to reinforce input information.The proposed method attains 98.9%and 97.8%in image-wise and object-wise accuracy on the Cornell grasping dataset,respectively,and the experimental results verify the effectiveness of the method.展开更多
Although the Faster Region-based Convolutional Neural Network(Faster R-CNN)model has obvious advantages in defect recognition,it still cannot overcome challenging problems,such as time-consuming,small targets,irregula...Although the Faster Region-based Convolutional Neural Network(Faster R-CNN)model has obvious advantages in defect recognition,it still cannot overcome challenging problems,such as time-consuming,small targets,irregular shapes,and strong noise interference in bridge defect detection.To deal with these issues,this paper proposes a novel Multi-scale Feature Fusion(MFF)model for bridge appearance disease detection.First,the Faster R-CNN model adopts Region Of Interest(ROl)pooling,which omits the edge information of the target area,resulting in some missed detections and inaccuracies in both detecting and localizing bridge defects.Therefore,this paper proposes an MFF based on regional feature Aggregation(MFF-A),which reduces the missed detection rate of bridge defect detection and improves the positioning accuracy of the target area.Second,the Faster R-CNN model is insensitive to small targets,irregular shapes,and strong noises in bridge defect detection,which results in a long training time and low recognition accuracy.Accordingly,a novel Lightweight MFF(namely MFF-L)model for bridge appearance defect detection using a lightweight network EfficientNetV2 and a feature pyramid network is proposed,which fuses multi-scale features to shorten the training speed and improve recognition accuracy.Finally,the effectiveness of the proposed method is evaluated on the bridge disease dataset and public computational fluid dynamic dataset.展开更多
To solve the problem of difficulty in identifying apple diseases in the natural environment and the low application rate of deep learning recognition networks,a lightweight ResNet(LW-ResNet)model for apple disease rec...To solve the problem of difficulty in identifying apple diseases in the natural environment and the low application rate of deep learning recognition networks,a lightweight ResNet(LW-ResNet)model for apple disease recognition is proposed.Based on the deep residual network(ResNet18),the multi-scale feature extraction layer is constructed by group convolution to realize the compression model and improve the extraction ability of different sizes of lesion features.By improving the identity mapping structure to reduce information loss.By introducing the efficient channel attention module(ECANet)to suppress noise from a complex background.The experimental results show that the average precision,recall and F1-score of the LW-ResNet on the test set are 97.80%,97.92%and 97.85%,respectively.The parameter memory is 2.32 MB,which is 94%less than that of ResNet18.Compared with the classic lightweight networks SqueezeNet and MobileNetV2,LW-ResNet has obvious advantages in recognition performance,speed,parameter memory requirement and time complexity.The proposed model has the advantages of low computational cost,low storage cost,strong real-time performance,high identification accuracy,and strong practicability,which can meet the needs of real-time identification task of apple leaf disease on resource-constrained devices.展开更多
Whole brain functional connectivity(FC)patterns obtained from resting-state functional magnetic resonance imaging(rs-fMRI)have been widely used in the diagnosis of brain disorders such as autism spectrum disorder(ASD)...Whole brain functional connectivity(FC)patterns obtained from resting-state functional magnetic resonance imaging(rs-fMRI)have been widely used in the diagnosis of brain disorders such as autism spectrum disorder(ASD).Recently,an increasing number of studies have focused on employing deep learning techniques to analyze FC patterns for brain disease classification.However,the high dimensionality of the FC features and the interpretation of deep learning results are issues that need to be addressed in the FC-based brain disease classification.In this paper,we proposed a multi-scale attention-based deep neural network(MSA-DNN)model to classify FC patterns for the ASD diagnosis.The model was implemented by adding a flexible multi-scale attention(MSA)module to the auto-encoder based backbone DNN,which can extract multi-scale features of the FC patterns and change the level of attention for different FCs by continuous learning.Our model will reinforce the weights of important FC features while suppress the unimportant FCs to ensure the sparsity of the model weights and enhance the model interpretability.We performed systematic experiments on the large multi-sites ASD dataset with both ten-fold and leaveone-site-out cross-validations.Results showed that our model outperformed classical methods in brain disease classification and revealed robust intersite prediction performance.We also localized important FC features and brain regions associated with ASD classification.Overall,our study further promotes the biomarker detection and computer-aided classification for ASD diagnosis,and the proposed MSA module is flexible and easy to implement in other classification networks.展开更多
Onboard visual object tracking in unmanned aerial vehicles(UAVs)has attractedmuch interest due to its versatility.Meanwhile,due to high precision,Siamese networks are becoming hot spots in visual object tracking.Howev...Onboard visual object tracking in unmanned aerial vehicles(UAVs)has attractedmuch interest due to its versatility.Meanwhile,due to high precision,Siamese networks are becoming hot spots in visual object tracking.However,most Siamese trackers fail to balance the tracking accuracy and time within onboard limited computational resources of UAVs.To meet the tracking precision and real-time requirements,this paper proposes a Siamese dense pixel-level network for UAV object tracking named SiamDPL.Specifically,the Siamese network extracts features of the search region and the template region through a parameter-shared backbone network,then performs correlationmatching to obtain the candidate regionwith high similarity.To improve the matching effect of template and search features,this paper designs a dense pixel-level feature fusion module to enhance the matching ability by pixel-wise correlation and enrich the feature diversity by dense connection.An attention module composed of self-attention and channel attention is introduced to learn global context information and selectively emphasize the target feature region in the spatial and channel dimensions.In addition,a target localization module is designed to improve target location accuracy.Compared with other advanced trackers,experiments on two public benchmarks,which are UAV123@10fps and UAV20L fromthe unmanned air vehicle123(UAV123)dataset,show that SiamDPL can achieve superior performance and low complexity with a running speed of 100.1 fps on NVIDIA TITAN RTX.展开更多
Sputum smear tests are critical for the diagnosis of respiratory diseases. Automatic segmentation of bacteria from spu-tum smear images is important for improving diagnostic efficiency. However, this remains a challen...Sputum smear tests are critical for the diagnosis of respiratory diseases. Automatic segmentation of bacteria from spu-tum smear images is important for improving diagnostic efficiency. However, this remains a challenging task owing to the high interclass similarity among different categories of bacteria and the low contrast of the bacterial edges. To explore more levels of global pattern features to promote the distinguishing ability of bacterial categories and main-tain sufficient local fine-grained features to ensure accurate localization of ambiguous bacteria simultaneously, we propose a novel dual-branch deformable cross-attention fusion network (DB-DCAFN) for accurate bacterial segmen-tation. Specifically, we first designed a dual-branch encoder consisting of multiple convolution and transformer blocks in parallel to simultaneously extract multilevel local and global features. We then designed a sparse and deformable cross-attention module to capture the semantic dependencies between local and global features, which can bridge the semantic gap and fuse features effectively. Furthermore, we designed a feature assignment fusion module to enhance meaningful features using an adaptive feature weighting strategy to obtain more accurate segmentation. We conducted extensive experiments to evaluate the effectiveness of DB-DCAFN on a clinical dataset comprising three bacterial categories: Acinetobacter baumannii, Klebsiella pneumoniae, and Pseudomonas aeruginosa. The experi-mental results demonstrate that the proposed DB-DCAFN outperforms other state-of-the-art methods and is effective at segmenting bacteria from sputum smear images.展开更多
基金the National Key R&D Program of China(2018AAA0103103).
文摘The perception module of advanced driver assistance systems plays a vital role.Perception schemes often use a single sensor for data processing and environmental perception or adopt the information processing results of various sensors for the fusion of the detection layer.This paper proposes a multi-scale and multi-sensor data fusion strategy in the front end of perception and accomplishes a multi-sensor function disparity map generation scheme.A binocular stereo vision sensor composed of two cameras and a light deterction and ranging(LiDAR)sensor is used to jointly perceive the environment,and a multi-scale fusion scheme is employed to improve the accuracy of the disparity map.This solution not only has the advantages of dense perception of binocular stereo vision sensors but also considers the perception accuracy of LiDAR sensors.Experiments demonstrate that the multi-scale multi-sensor scheme proposed in this paper significantly improves disparity map estimation.
文摘Convolutional neural networks (CNNs) are widely used in image classification tasks, but their increasing model size and computation make them challenging to implement on embedded systems with constrained hardware resources. To address this issue, the MobileNetV1 network was developed, which employs depthwise convolution to reduce network complexity. MobileNetV1 employs a stride of 2 in several convolutional layers to decrease the spatial resolution of feature maps, thereby lowering computational costs. However, this stride setting can lead to a loss of spatial information, particularly affecting the detection and representation of smaller objects or finer details in images. To maintain the trade-off between complexity and model performance, a lightweight convolutional neural network with hierarchical multi-scale feature fusion based on the MobileNetV1 network is proposed. The network consists of two main subnetworks. The first subnetwork uses a depthwise dilated separable convolution (DDSC) layer to learn imaging features with fewer parameters, which results in a lightweight and computationally inexpensive network. Furthermore, depthwise dilated convolution in DDSC layer effectively expands the field of view of filters, allowing them to incorporate a larger context. The second subnetwork is a hierarchical multi-scale feature fusion (HMFF) module that uses parallel multi-resolution branches architecture to process the input feature map in order to extract the multi-scale feature information of the input image. Experimental results on the CIFAR-10, Malaria, and KvasirV1 datasets demonstrate that the proposed method is efficient, reducing the network parameters and computational cost by 65.02% and 39.78%, respectively, while maintaining the network performance compared to the MobileNetV1 baseline.
基金supported by the Beijing Natural Science Foundation (L202003)National Natural Science Foundation of China (No. 31700479)。
文摘Automatic modulation classification(AMC) technology is one of the cutting-edge technologies in cognitive radio communications. AMC based on deep learning has recently attracted much attention due to its superior performances in classification accuracy and robustness. In this paper, we propose a novel, high resolution and multi-scale feature fusion convolutional neural network model with a squeeze-excitation block, referred to as HRSENet,to classify different kinds of modulation signals.The proposed model establishes a parallel computing mechanism of multi-resolution feature maps through the multi-layer convolution operation, which effectively reduces the information loss caused by downsampling convolution. Moreover, through dense skipconnecting at the same resolution and up-sampling or down-sampling connection at different resolutions, the low resolution representation of the deep feature maps and the high resolution representation of the shallow feature maps are simultaneously extracted and fully integrated, which is benificial to mine signal multilevel features. Finally, the feature squeeze and excitation module embedded in the decoder is used to adjust the response weights between channels, further improving classification accuracy of proposed model.The proposed HRSENet significantly outperforms existing methods in terms of classification accuracy on the public dataset “Over the Air” in signal-to-noise(SNR) ranging from-2dB to 20dB. The classification accuracy in the proposed model achieves 85.36% and97.30% at 4dB and 10dB, respectively, with the improvement by 9.71% and 5.82% compared to LWNet.Furthermore, the model also has a moderate computation complexity compared with several state-of-the-art methods.
基金supported in part by the General Program Hunan Provincial Natural Science Foundation of 2022,China(2022JJ31022)the Undergraduate Education Reform Project of Hunan Province,China(HNJG-20210532)the National Natural Science Foundation of China(62276276)。
文摘Accurate diagnosis of apple leaf diseases is crucial for improving the quality of apple production and promoting the development of the apple industry. However, apple leaf diseases do not differ significantly from image texture and structural information. The difficulties in disease feature extraction in complex backgrounds slow the related research progress. To address the problems, this paper proposes an improved multi-scale inverse bottleneck residual network model based on a triplet parallel attention mechanism, which is built upon ResNet-50, while improving and combining the inception module and ResNext inverse bottleneck blocks, to recognize seven types of apple leaf(including six diseases of alternaria leaf spot, brown spot, grey spot, mosaic, rust, scab, and one healthy). First, the 3×3 convolutions in some of the residual modules are replaced by multi-scale residual convolutions, the convolution kernels of different sizes contained in each branch of the multi-scale convolution are applied to extract feature maps of different sizes, and the outputs of these branches are multi-scale fused by summing to enrich the output features of the images. Second, the global layer-wise dynamic coordinated inverse bottleneck structure is used to reduce the network feature loss. The inverse bottleneck structure makes the image information less lossy when transforming from different dimensional feature spaces. The fusion of multi-scale and layer-wise dynamic coordinated inverse bottlenecks makes the model effectively balances computational efficiency and feature representation capability, and more robust with a combination of horizontal and vertical features in the fine identification of apple leaf diseases. Finally, after each improved module, a triplet parallel attention module is integrated with cross-dimensional interactions among channels through rotations and residual transformations, which improves the parallel search efficiency of important features and the recognition rate of the network with relatively small computational costs while the dimensional dependencies are improved. To verify the validity of the model in this paper, we uniformly enhance apple leaf disease images screened from the public data sets of Plant Village, Baidu Flying Paddle, and the Internet. The final processed image count is 14,000. The ablation study, pre-processing comparison, and method comparison are conducted on the processed datasets. The experimental results demonstrate that the proposed method reaches 98.73% accuracy on the adopted datasets, which is 1.82% higher than the classical ResNet-50 model, and 0.29% better than the apple leaf disease datasets before preprocessing. It also achieves competitive results in apple leaf disease identification compared to some state-ofthe-art methods.
基金supported by the National Natural Science Foundation of China (62271255,61871218)the Fundamental Research Funds for the Central University (3082019NC2019002)+1 种基金the Aeronautical Science Foundation (ASFC-201920007002)the Program of Remote Sensing Intelligent Monitoring and Emergency Services for Regional Security Elements。
文摘In order to extract the richer feature information of ship targets from sea clutter, and address the high dimensional data problem, a method termed as multi-scale fusion kernel sparse preserving projection(MSFKSPP) based on the maximum margin criterion(MMC) is proposed for recognizing the class of ship targets utilizing the high-resolution range profile(HRRP). Multi-scale fusion is introduced to capture the local and detailed information in small-scale features, and the global and contour information in large-scale features, offering help to extract the edge information from sea clutter and further improving the target recognition accuracy. The proposed method can maximally preserve the multi-scale fusion sparse of data and maximize the class separability in the reduced dimensionality by reproducing kernel Hilbert space. Experimental results on the measured radar data show that the proposed method can effectively extract the features of ship target from sea clutter, further reduce the feature dimensionality, and improve target recognition performance.
文摘Due to the lack of long-range association and spatial location information,fine details and accurate boundaries of complex clothing images cannot always be obtained by using the existing deep learning-based methods.This paper presents a convolutional structure with multi-scale fusion to optimize the step of clothing feature extraction and a self-attention module to capture long-range association information.The structure enables the self-attention mechanism to directly participate in the process of information exchange through the down-scaling projection operation of the multi-scale framework.In addition,the improved self-attention module introduces the extraction of 2-dimensional relative position information to make up for its lack of ability to extract spatial position features from clothing images.The experimental results based on the colorful fashion parsing dataset(CFPD)show that the proposed network structure achieves 53.68%mean intersection over union(mIoU)and has better performance on the clothing parsing task.
基金supported by the China Postdoctoral Science Foundation Funded Project(No.2021M690385)the National Natural Science Foundation of China(No.62101045).
文摘Infrared-visible image fusion plays an important role in multi-source data fusion,which has the advantage of integrating useful information from multi-source sensors.However,there are still challenges in target enhancement and visual improvement.To deal with these problems,a sub-regional infrared-visible image fusion method(SRF)is proposed.First,morphology and threshold segmentation is applied to extract targets interested in infrared images.Second,the infrared back-ground is reconstructed based on extracted targets and the visible image.Finally,target and back-ground regions are fused using a multi-scale transform.Experimental results are obtained using public data for comparison and evaluation,which demonstrate that the proposed SRF has poten-tial benefits over other methods.
基金Project supported by the National Natural Science Foundation of China(Grant No.61402368)Aerospace Support Fund,China(Grant No.2017-HT-XGD)Aerospace Science and Technology Innovation Foundation,China(Grant No.2017 ZD 53047)
文摘The high-frequency components in the traditional multi-scale transform method are approximately sparse, which can represent different information of the details. But in the low-frequency component, the coefficients around the zero value are very few, so we cannot sparsely represent low-frequency image information. The low-frequency component contains the main energy of the image and depicts the profile of the image. Direct fusion of the low-frequency component will not be conducive to obtain highly accurate fusion result. Therefore, this paper presents an infrared and visible image fusion method combining the multi-scale and top-hat transforms. On one hand, the new top-hat-transform can effectively extract the salient features of the low-frequency component. On the other hand, the multi-scale transform can extract highfrequency detailed information in multiple scales and from diverse directions. The combination of the two methods is conducive to the acquisition of more characteristics and more accurate fusion results. Among them, for the low-frequency component, a new type of top-hat transform is used to extract low-frequency features, and then different fusion rules are applied to fuse the low-frequency features and low-frequency background; for high-frequency components, the product of characteristics method is used to integrate the detailed information in high-frequency. Experimental results show that the proposed algorithm can obtain more detailed information and clearer infrared target fusion results than the traditional multiscale transform methods. Compared with the state-of-the-art fusion methods based on sparse representation, the proposed algorithm is simple and efficacious, and the time consumption is significantly reduced.
基金This work was supported in part by the National Natural Science Foundation of China(Grant#:82260362)in part by the National Key R&D Program of China(Grant#:2021ZD0111000)+1 种基金in part by the Key R&D Project of Hainan Province(Grant#:ZDYF2021SHFZ243)in part by the Major Science and Technology Project of Haikou(Grant#:2020-009).
文摘The precise and automatic segmentation of prostate magnetic resonance imaging(MRI)images is vital for assisting doctors in diagnosing prostate diseases.In recent years,many advanced methods have been applied to prostate segmentation,but due to the variability caused by prostate diseases,automatic segmentation of the prostate presents significant challenges.In this paper,we propose an attention-guided multi-scale feature fusion network(AGMSF-Net)to segment prostate MRI images.We propose an attention mechanism for extracting multi-scale features,and introduce a 3D transformer module to enhance global feature representation by adding it during the transition phase from encoder to decoder.In the decoder stage,a feature fusion module is proposed to obtain global context information.We evaluate our model on MRI images of the prostate acquired from a local hospital.The relative volume difference(RVD)and dice similarity coefficient(DSC)between the results of automatic prostate segmentation and ground truth were 1.21%and 93.68%,respectively.To quantitatively evaluate prostate volume on MRI,which is of significant clinical significance,we propose a unique AGMSF-Net.The essential performance evaluation and validation experiments have demonstrated the effectiveness of our method in automatic prostate segmentation.
基金supported in part by the National Key Research Project of China under Grant No.2023YFA1009402General Science and Technology Plan Items in Zhejiang Province ZJKJT-2023-02.
文摘With the remarkable advancements in machine vision research and its ever-expanding applications,scholars have increasingly focused on harnessing various vision methodologies within the industrial realm.Specifically,detecting vehicle floor welding points poses unique challenges,including high operational costs and limited portability in practical settings.To address these challenges,this paper innovatively integrates template matching and the Faster RCNN algorithm,presenting an industrial fusion cascaded solder joint detection algorithm that seamlessly blends template matching with deep learning techniques.This algorithm meticulously weights and fuses the optimized features of both methodologies,enhancing the overall detection capabilities.Furthermore,it introduces an optimized multi-scale and multi-template matching approach,leveraging a diverse array of templates and image pyramid algorithms to bolster the accuracy and resilience of object detection.By integrating deep learning algorithms with this multi-scale and multi-template matching strategy,the cascaded target matching algorithm effectively accurately identifies solder joint types and positions.A comprehensive welding point dataset,labeled by experts specifically for vehicle detection,was constructed based on images from authentic industrial environments to validate the algorithm’s performance.Experiments demonstrate the algorithm’s compelling performance in industrial scenarios,outperforming the single-template matching algorithm by 21.3%,the multi-scale and multitemplate matching algorithm by 3.4%,the Faster RCNN algorithm by 19.7%,and the YOLOv9 algorithm by 17.3%in terms of solder joint detection accuracy.This optimized algorithm exhibits remarkable robustness and portability,ideally suited for detecting solder joints across diverse vehicle workpieces.Notably,this study’s dataset and feature fusion approach can be a valuable resource for other algorithms seeking to enhance their solder joint detection capabilities.This work thus not only presents a novel and effective solution for industrial solder joint detection but lays the groundwork for future advancements in this critical area.
基金the National Natural Science Foundation of China(No.61975015)the Research and Innovation Project for Graduate Students at Zhongyuan University of Technology(No.YKY2024ZK14).
文摘Human posture estimation is a prominent research topic in the fields of human-com-puter interaction,motion recognition,and other intelligent applications.However,achieving highaccuracy in key point localization,which is crucial for intelligent applications,contradicts the lowdetection accuracy of human posture detection models in practical scenarios.To address this issue,a human pose estimation network called AT-HRNet has been proposed,which combines convolu-tional self-attention and cross-dimensional feature transformation.AT-HRNet captures significantfeature information from various regions in an adaptive manner,aggregating them through convolu-tional operations within the local receptive domain.The residual structures TripNeck and Trip-Block of the high-resolution network are designed to further refine the key point locations,wherethe attention weight is adjusted by a cross-dimensional interaction to obtain more features.To vali-date the effectiveness of this network,AT-HRNet was evaluated using the COCO2017 dataset.Theresults show that AT-HRNet outperforms HRNet by improving 3.2%in mAP,4.0%in AP75,and3.9%in AP^(M).This suggests that AT-HRNet can offer more beneficial solutions for human posture estimation.
基金Project supported by the National Natural Science Foundation of China(Grant Nos.61601442,61605218,and 61575207)the National Key Research and Development Program of China(Grant No.2018YFB0504302)the Youth Innovation Promotion Association of the Chinese Academy of Sciences(Grant Nos.2015124 and 2019154)。
文摘Imaging quality is a critical component of compressive imaging in real applications. In this study, we propose a compressive imaging method based on multi-scale modulation and reconstruction in the spatial frequency domain. Theoretical analysis and simulation show the relation between the measurement matrix resolution and compressive sensing(CS)imaging quality. The matrix design is improved to provide multi-scale modulations, followed by individual reconstruction of images of different spatial frequencies. Compared with traditional single-scale CS imaging, the multi-scale method provides high quality imaging in both high and low frequencies, and effectively decreases the overall reconstruction error.Experimental results confirm the feasibility of this technique, especially at low sampling rate. The method may thus be helpful in promoting the implementation of compressive imaging in real applications.
基金supported by the National Natural Science Foundation of China (No. 41004054) Research Fund for the Doctoral Program of Higher Education of China (No. 20105122120002)Natural Science Key Project, Sichuan Provincial Department of Education (No. 092A011)
文摘To fully extract and mine the multi-scale features of reservoirs and geologic structures in time/depth and space dimensions, a new 3D multi-scale volumetric curvature (MSVC) methodology is presented in this paper. We also propose a fast algorithm for computing 3D volumetric curvature. In comparison to conventional volumetric curvature attributes, its main improvements and key algorithms introduce multi-frequency components expansion in time-frequency domain and the corresponding multi-scale adaptive differential operator in the wavenumber domain, into the volumetric curvature calculation. This methodology can simultaneously depict seismic multi-scale features in both time and space. Additionally, we use data fusion of volumetric curvatures at various scales to take full advantage of the geologic features and anomalies extracted by curvature measurements at different scales. The 3D MSVC can highlight geologic anomalies and reduce noise at the same time. Thus, it improves the interpretation efficiency of curvature attributes analysis. The 3D MSVC is applied to both land and marine 3D seismic data. The results demonstrate that it can indicate the spatial distribution of reservoirs, detect faults and fracture zones, and identify their multi-scale properties.
基金supported in part by the National Natural Science Foundation of China(No.62172036).
文摘Accurate prediction of the state-of-charge(SOC)of battery energy storage system(BESS)is critical for its safety and lifespan in electric vehicles.To overcome the imbalance of existing methods between multi-scale feature fusion and global feature extraction,this paper introduces a novel multi-scale fusion(MSF)model based on gated recurrent unit(GRU),which is specifically designed for complex multi-step SOC prediction in practical BESSs.Pearson correlation analysis is first employed to identify SOC-related parameters.These parameters are then input into a multi-layer GRU for point-wise feature extraction.Concurrently,the parameters undergo patching before entering a dual-stage multi-layer GRU,thus enabling the model to capture nuanced information across varying time intervals.Ultimately,by means of adaptive weight fusion and a fully connected network,multi-step SOC predictions are rendered.Following extensive validation over multiple days,it is illustrated that the proposed model achieves an absolute error of less than 1.5%in real-time SOC prediction.
基金This work was supported by the National Natural Science Foundation of China(Nos.62073322 and 61633020)the CIE-Tencent Robotics X Rhino-Bird Focused Research Program(No.2022-07)the Beijing Natural Science Foundation(No.2022MQ05).
文摘Grasp detection plays a critical role for robot manipulation.Mainstream pixel-wise grasp detection networks with encoder-decoder structure receive much attention due to good accuracy and efficiency.However,they usually transmit the high-level feature in the encoder to the decoder,and low-level features are neglected.It is noted that low-level features contain abundant detail information,and how to fully exploit low-level features remains unsolved.Meanwhile,the channel information in high-level feature is also not well mined.Inevitably,the performance of grasp detection is degraded.To solve these problems,we propose a grasp detection network with hierarchical multi-scale feature fusion and inverted shuffle residual.Both low-level and high-level features in the encoder are firstly fused by the designed skip connections with attention module,and the fused information is then propagated to corresponding layers of the decoder for in-depth feature fusion.Such a hierarchical fusion guarantees the quality of grasp prediction.Furthermore,an inverted shuffle residual module is created,where the high-level feature from encoder is split in channel and the resultant split features are processed in their respective branches.By such differentiation processing,more high-dimensional channel information is kept,which enhances the representation ability of the network.Besides,an information enhancement module is added before the encoder to reinforce input information.The proposed method attains 98.9%and 97.8%in image-wise and object-wise accuracy on the Cornell grasping dataset,respectively,and the experimental results verify the effectiveness of the method.
基金This work was supported by the National Natural Science Foundation of China(No.61976247)the Major R&D Programs of China(No.2019YFB-1310400).
文摘Although the Faster Region-based Convolutional Neural Network(Faster R-CNN)model has obvious advantages in defect recognition,it still cannot overcome challenging problems,such as time-consuming,small targets,irregular shapes,and strong noise interference in bridge defect detection.To deal with these issues,this paper proposes a novel Multi-scale Feature Fusion(MFF)model for bridge appearance disease detection.First,the Faster R-CNN model adopts Region Of Interest(ROl)pooling,which omits the edge information of the target area,resulting in some missed detections and inaccuracies in both detecting and localizing bridge defects.Therefore,this paper proposes an MFF based on regional feature Aggregation(MFF-A),which reduces the missed detection rate of bridge defect detection and improves the positioning accuracy of the target area.Second,the Faster R-CNN model is insensitive to small targets,irregular shapes,and strong noises in bridge defect detection,which results in a long training time and low recognition accuracy.Accordingly,a novel Lightweight MFF(namely MFF-L)model for bridge appearance defect detection using a lightweight network EfficientNetV2 and a feature pyramid network is proposed,which fuses multi-scale features to shorten the training speed and improve recognition accuracy.Finally,the effectiveness of the proposed method is evaluated on the bridge disease dataset and public computational fluid dynamic dataset.
基金funded by the Science and Technology Development Program of Jilin Province(20190301024NY)the Precision Agriculture and Big Data Engineering Research Center of Jilin Province(2020C005).
文摘To solve the problem of difficulty in identifying apple diseases in the natural environment and the low application rate of deep learning recognition networks,a lightweight ResNet(LW-ResNet)model for apple disease recognition is proposed.Based on the deep residual network(ResNet18),the multi-scale feature extraction layer is constructed by group convolution to realize the compression model and improve the extraction ability of different sizes of lesion features.By improving the identity mapping structure to reduce information loss.By introducing the efficient channel attention module(ECANet)to suppress noise from a complex background.The experimental results show that the average precision,recall and F1-score of the LW-ResNet on the test set are 97.80%,97.92%and 97.85%,respectively.The parameter memory is 2.32 MB,which is 94%less than that of ResNet18.Compared with the classic lightweight networks SqueezeNet and MobileNetV2,LW-ResNet has obvious advantages in recognition performance,speed,parameter memory requirement and time complexity.The proposed model has the advantages of low computational cost,low storage cost,strong real-time performance,high identification accuracy,and strong practicability,which can meet the needs of real-time identification task of apple leaf disease on resource-constrained devices.
基金This work was supported by the National Natural Science Foundation of China(No.61906006).
文摘Whole brain functional connectivity(FC)patterns obtained from resting-state functional magnetic resonance imaging(rs-fMRI)have been widely used in the diagnosis of brain disorders such as autism spectrum disorder(ASD).Recently,an increasing number of studies have focused on employing deep learning techniques to analyze FC patterns for brain disease classification.However,the high dimensionality of the FC features and the interpretation of deep learning results are issues that need to be addressed in the FC-based brain disease classification.In this paper,we proposed a multi-scale attention-based deep neural network(MSA-DNN)model to classify FC patterns for the ASD diagnosis.The model was implemented by adding a flexible multi-scale attention(MSA)module to the auto-encoder based backbone DNN,which can extract multi-scale features of the FC patterns and change the level of attention for different FCs by continuous learning.Our model will reinforce the weights of important FC features while suppress the unimportant FCs to ensure the sparsity of the model weights and enhance the model interpretability.We performed systematic experiments on the large multi-sites ASD dataset with both ten-fold and leaveone-site-out cross-validations.Results showed that our model outperformed classical methods in brain disease classification and revealed robust intersite prediction performance.We also localized important FC features and brain regions associated with ASD classification.Overall,our study further promotes the biomarker detection and computer-aided classification for ASD diagnosis,and the proposed MSA module is flexible and easy to implement in other classification networks.
基金funded by the National Natural Science Foundation of China(Grant No.52072408),author Y.C.
文摘Onboard visual object tracking in unmanned aerial vehicles(UAVs)has attractedmuch interest due to its versatility.Meanwhile,due to high precision,Siamese networks are becoming hot spots in visual object tracking.However,most Siamese trackers fail to balance the tracking accuracy and time within onboard limited computational resources of UAVs.To meet the tracking precision and real-time requirements,this paper proposes a Siamese dense pixel-level network for UAV object tracking named SiamDPL.Specifically,the Siamese network extracts features of the search region and the template region through a parameter-shared backbone network,then performs correlationmatching to obtain the candidate regionwith high similarity.To improve the matching effect of template and search features,this paper designs a dense pixel-level feature fusion module to enhance the matching ability by pixel-wise correlation and enrich the feature diversity by dense connection.An attention module composed of self-attention and channel attention is introduced to learn global context information and selectively emphasize the target feature region in the spatial and channel dimensions.In addition,a target localization module is designed to improve target location accuracy.Compared with other advanced trackers,experiments on two public benchmarks,which are UAV123@10fps and UAV20L fromthe unmanned air vehicle123(UAV123)dataset,show that SiamDPL can achieve superior performance and low complexity with a running speed of 100.1 fps on NVIDIA TITAN RTX.
基金the Natural Science Foundation of Shandong Province,No.ZR2021MH213and in part by the Suzhou Science and Technology Bureau,No.SJC2021023.
文摘Sputum smear tests are critical for the diagnosis of respiratory diseases. Automatic segmentation of bacteria from spu-tum smear images is important for improving diagnostic efficiency. However, this remains a challenging task owing to the high interclass similarity among different categories of bacteria and the low contrast of the bacterial edges. To explore more levels of global pattern features to promote the distinguishing ability of bacterial categories and main-tain sufficient local fine-grained features to ensure accurate localization of ambiguous bacteria simultaneously, we propose a novel dual-branch deformable cross-attention fusion network (DB-DCAFN) for accurate bacterial segmen-tation. Specifically, we first designed a dual-branch encoder consisting of multiple convolution and transformer blocks in parallel to simultaneously extract multilevel local and global features. We then designed a sparse and deformable cross-attention module to capture the semantic dependencies between local and global features, which can bridge the semantic gap and fuse features effectively. Furthermore, we designed a feature assignment fusion module to enhance meaningful features using an adaptive feature weighting strategy to obtain more accurate segmentation. We conducted extensive experiments to evaluate the effectiveness of DB-DCAFN on a clinical dataset comprising three bacterial categories: Acinetobacter baumannii, Klebsiella pneumoniae, and Pseudomonas aeruginosa. The experi-mental results demonstrate that the proposed DB-DCAFN outperforms other state-of-the-art methods and is effective at segmenting bacteria from sputum smear images.