For underwater robots in the process of performing target detection tasks,the color distortion and the uneven quality of underwater images lead to great difficulties in the feature extraction process of the model,whic...For underwater robots in the process of performing target detection tasks,the color distortion and the uneven quality of underwater images lead to great difficulties in the feature extraction process of the model,which is prone to issues like error detection,omission detection,and poor accuracy.Therefore,this paper proposed the CER-YOLOv7(CBAM-EIOU-RepVGG-YOLOv7)underwater target detection algorithm.To improve the algorithm’s capability to retain valid features from both spatial and channel perspectives during the feature extraction phase,we have added a Convolutional Block Attention Module(CBAM)to the backbone network.The Reparameterization Visual Geometry Group(RepVGG)module is inserted into the backbone to improve the training and inference capabilities.The Efficient Intersection over Union(EIoU)loss is also used as the localization loss function,which reduces the error detection rate and missed detection rate of the algorithm.The experimental results of the CER-YOLOv7 algorithm on the UPRC(Underwater Robot Prototype Competition)dataset show that the mAP(mean Average Precision)score of the algorithm is 86.1%,which is a 2.2%improvement compared to the YOLOv7.The feasibility and validity of the CER-YOLOv7 are proved through ablation and comparison experiments,and it is more suitable for underwater target detection.展开更多
Traditional feature-based image stitching techniques often encounter obstacles when dealing with images lackingunique attributes or suffering from quality degradation. The scarcity of annotated datasets in real-life s...Traditional feature-based image stitching techniques often encounter obstacles when dealing with images lackingunique attributes or suffering from quality degradation. The scarcity of annotated datasets in real-life scenesseverely undermines the reliability of supervised learning methods in image stitching. Furthermore, existing deeplearning architectures designed for image stitching are often too bulky to be deployed on mobile and peripheralcomputing devices. To address these challenges, this study proposes a novel unsupervised image stitching methodbased on the YOLOv8 (You Only Look Once version 8) framework that introduces deep homography networksand attentionmechanisms. Themethodology is partitioned into three distinct stages. The initial stage combines theattention mechanism with a pooling pyramid model to enhance the detection and recognition of compact objectsin images, the task of the deep homography networks module is to estimate the global homography of the inputimages consideringmultiple viewpoints. The second stage involves preliminary stitching of the masks generated inthe initial stage and further enhancement through weighted computation to eliminate common stitching artifacts.The final stage is characterized by adaptive reconstruction and careful refinement of the initial stitching results.Comprehensive experiments acrossmultiple datasets are executed tometiculously assess the proposed model. Ourmethod’s Peak Signal-to-Noise Ratio (PSNR) and Structure Similarity Index Measure (SSIM) improved by 10.6%and 6%. These experimental results confirm the efficacy and utility of the presented model in this paper.展开更多
Landslide disasters comprise the majority of geological incidents on slopes,posing severe threats to the safety of human lives and property while exerting a significant impact on the geological environment.The rapid i...Landslide disasters comprise the majority of geological incidents on slopes,posing severe threats to the safety of human lives and property while exerting a significant impact on the geological environment.The rapid identification of landslides is important for disaster prevention and control;however,currently,landslide identification relies mainly on the manual interpretation of remote sensing images.Manual interpretation and feature recognition methods are time-consuming,labor-intensive,and challenging when confronted with complex scenarios.Consequently,automatic landslide recognition has emerged as a pivotal avenue for future development.In this study,a dataset comprising 2000 landslide images was constructed using open-source remote sensing images and datasets.The YOLOv7 model was enhanced using data augmentation algorithms and attention mechanisms.Three optimization models were formulated to realize automatic landslide recognition.The findings demonstrate the commendable performance of the optimized model in automatic landslide recognition,achieving a peak accuracy of 95.92%.Subsequently,the optimized model was applied to regional landslide identification,co-seismic landslide identification,and landslide recognition at various scales,all of which showed robust recognition capabilities.Nevertheless,the model exhibits limitations in detecting small targets,indicating areas for refining the deep-learning algorithms.The results of this research offer valuable technical support for the swift identification,prevention,and mitigation of landslide disasters.展开更多
AIM:To evaluate the application of an intelligent diagnostic model for pterygium.METHODS:For intelligent diagnosis of pterygium,the attention mechanisms—SENet,ECANet,CBAM,and Self-Attention—were fused with the light...AIM:To evaluate the application of an intelligent diagnostic model for pterygium.METHODS:For intelligent diagnosis of pterygium,the attention mechanisms—SENet,ECANet,CBAM,and Self-Attention—were fused with the lightweight MobileNetV2 model structure to construct a tri-classification model.The study used 1220 images of three types of anterior ocular segments of the pterygium provided by the Eye Hospital of Nanjing Medical University.Conventional classification models—VGG16,ResNet50,MobileNetV2,and EfficientNetB7—were trained on the same dataset for comparison.To evaluate model performance in terms of accuracy,Kappa value,test time,sensitivity,specificity,the area under curve(AUC),and visual heat map,470 test images of the anterior segment of the pterygium were used.RESULTS:The accuracy of the MobileNetV2+Self-Attention model with 281 MB in model size was 92.77%,and the Kappa value of the model was 88.92%.The testing time using the model was 9ms/image in the server and 138ms/image in the local computer.The sensitivity,specificity,and AUC for the diagnosis of pterygium using normal anterior segment images were 99.47%,100%,and 100%,respectively;using anterior segment images in the observation period were 88.30%,95.32%,and 96.70%,respectively;and using the anterior segment images in the surgery period were 88.18%,94.44%,and 97.30%,respectively.CONCLUSION:The developed model is lightweight and can be used not only for detection but also for assessing the severity of pterygium.展开更多
With the rapid development of electric power systems,load estimation plays an important role in system operation and planning.Usually,load estimation techniques contain traditional,time series,regression analysis-base...With the rapid development of electric power systems,load estimation plays an important role in system operation and planning.Usually,load estimation techniques contain traditional,time series,regression analysis-based,and machine learning-based estimation.Since the machine learning-based method can lead to better performance,in this paper,a deep learning-based load estimation algorithm using image fingerprint and attention mechanism is proposed.First,an image fingerprint construction is proposed for training data.After the data preprocessing,the training data matrix is constructed by the cyclic shift and cubic spline interpolation.Then,the linear mapping and the gray-color transformation method are proposed to form the color image fingerprint.Second,a convolutional neural network(CNN)combined with an attentionmechanism is proposed for training performance improvement.At last,an experiment is carried out to evaluate the estimation performance.Compared with the support vector machine method,CNN method and long short-term memory method,the proposed algorithm has the best load estimation performance.展开更多
Recently,deep image-hiding techniques have attracted considerable attention in covert communication and high-capacity information hiding.However,these approaches have some limitations.For example,a cover image lacks s...Recently,deep image-hiding techniques have attracted considerable attention in covert communication and high-capacity information hiding.However,these approaches have some limitations.For example,a cover image lacks self-adaptability,information leakage,or weak concealment.To address these issues,this study proposes a universal and adaptable image-hiding method.First,a domain attention mechanism is designed by combining the Atrous convolution,which makes better use of the relationship between the secret image domain and the cover image domain.Second,to improve perceived human similarity,perceptual loss is incorporated into the training process.The experimental results are promising,with the proposed method achieving an average pixel discrepancy(APD)of 1.83 and a peak signal-to-noise ratio(PSNR)value of 40.72 dB between the cover and stego images,indicative of its high-quality output.Furthermore,the structural similarity index measure(SSIM)reaches 0.985 while the learned perceptual image patch similarity(LPIPS)remarkably registers at 0.0001.Moreover,self-testing and cross-experiments demonstrate the model’s adaptability and generalization in unknown hidden spaces,making it suitable for diverse computer vision tasks.展开更多
Quantum error correction, a technique that relies on the principle of redundancy to encode logical information into additional qubits to better protect the system from noise, is necessary to design a viable quantum co...Quantum error correction, a technique that relies on the principle of redundancy to encode logical information into additional qubits to better protect the system from noise, is necessary to design a viable quantum computer. For this new topological stabilizer code-XYZ^(2) code defined on the cellular lattice, it is implemented on a hexagonal lattice of qubits and it encodes the logical qubits with the help of stabilizer measurements of weight six and weight two. However topological stabilizer codes in cellular lattice quantum systems suffer from the detrimental effects of noise due to interaction with the environment. Several decoding approaches have been proposed to address this problem. Here, we propose the use of a state-attention based reinforcement learning decoder to decode XYZ^(2) codes, which enables the decoder to more accurately focus on the information related to the current decoding position, and the error correction accuracy of our reinforcement learning decoder model under the optimisation conditions can reach 83.27% under the depolarizing noise model, and we have measured thresholds of 0.18856 and 0.19043 for XYZ^(2) codes at code spacing of 3–7 and 7–11, respectively. our study provides directions and ideas for applications of decoding schemes combining reinforcement learning attention mechanisms to other topological quantum error-correcting codes.展开更多
The current existing problem of deep learning framework for the detection and segmentation of electrical equipment is dominantly related to low precision.Because of the reliable,safe and easy-to-operate technology pro...The current existing problem of deep learning framework for the detection and segmentation of electrical equipment is dominantly related to low precision.Because of the reliable,safe and easy-to-operate technology provided by deep learning-based video surveillance for unmanned inspection of electrical equipment,this paper uses the bottleneck attention module(BAM)attention mechanism to improve the Solov2 model and proposes a new electrical equipment segmentation mode.Firstly,the BAM attention mechanism is integrated into the feature extraction network to adaptively learn the correlation between feature channels,thereby improving the expression ability of the feature map;secondly,the weighted sum of CrossEntropy Loss and Dice loss is designed as the mask loss to improve the segmentation accuracy and robustness of the model;finally,the non-maximal suppression(NMS)algorithm to better handle the overlap problem in instance segmentation.Experimental results show that the proposed method achieves an average segmentation accuracy of mAP of 80.4% on three types of electrical equipment datasets,including transformers,insulators and voltage transformers,which improve the detection accuracy by more than 5.7% compared with the original Solov2 model.The segmentation model proposed can provide a focusing technical means for the intelligent management of power systems.展开更多
Multimodal sentiment analysis aims to understand people’s emotions and opinions from diverse data.Concate-nating or multiplying various modalities is a traditional multi-modal sentiment analysis fusion method.This fu...Multimodal sentiment analysis aims to understand people’s emotions and opinions from diverse data.Concate-nating or multiplying various modalities is a traditional multi-modal sentiment analysis fusion method.This fusion method does not utilize the correlation information between modalities.To solve this problem,this paper proposes amodel based on amulti-head attention mechanism.First,after preprocessing the original data.Then,the feature representation is converted into a sequence of word vectors and positional encoding is introduced to better understand the semantic and sequential information in the input sequence.Next,the input coding sequence is fed into the transformer model for further processing and learning.At the transformer layer,a cross-modal attention consisting of a pair of multi-head attention modules is employed to reflect the correlation between modalities.Finally,the processed results are input into the feedforward neural network to obtain the emotional output through the classification layer.Through the above processing flow,the model can capture semantic information and contextual relationships and achieve good results in various natural language processing tasks.Our model was tested on the CMU Multimodal Opinion Sentiment and Emotion Intensity(CMU-MOSEI)and Multimodal EmotionLines Dataset(MELD),achieving an accuracy of 82.04% and F1 parameters reached 80.59% on the former dataset.展开更多
To improve the prediction accuracy of chaotic time series and reconstruct a more reasonable phase space structure of the prediction network,we propose a convolutional neural network-long short-term memory(CNN-LSTM)pre...To improve the prediction accuracy of chaotic time series and reconstruct a more reasonable phase space structure of the prediction network,we propose a convolutional neural network-long short-term memory(CNN-LSTM)prediction model based on the incremental attention mechanism.Firstly,a traversal search is conducted through the traversal layer for finite parameters in the phase space.Then,an incremental attention layer is utilized for parameter judgment based on the dimension weight criteria(DWC).The phase space parameters that best meet DWC are selected and fed into the input layer.Finally,the constructed CNN-LSTM network extracts spatio-temporal features and provides the final prediction results.The model is verified using Logistic,Lorenz,and sunspot chaotic time series,and the performance is compared from the two dimensions of prediction accuracy and network phase space structure.Additionally,the CNN-LSTM network based on incremental attention is compared with long short-term memory(LSTM),convolutional neural network(CNN),recurrent neural network(RNN),and support vector regression(SVR)for prediction accuracy.The experiment results indicate that the proposed composite network model possesses enhanced capability in extracting temporal features and achieves higher prediction accuracy.Also,the algorithm to estimate the phase space parameter is compared with the traditional CAO,false nearest neighbor,and C-C,three typical methods for determining the chaotic phase space parameters.The experiments reveal that the phase space parameter estimation algorithm based on the incremental attention mechanism is superior in prediction accuracy compared with the traditional phase space reconstruction method in five networks,including CNN-LSTM,LSTM,CNN,RNN,and SVR.展开更多
The intensive application of deep learning in medical image processing has facilitated the advancement of automatic retinal vessel segmentation research.To overcome the limitation that traditional U-shaped vessel segm...The intensive application of deep learning in medical image processing has facilitated the advancement of automatic retinal vessel segmentation research.To overcome the limitation that traditional U-shaped vessel segmentation networks fail to extract features in fundus image sufficiently,we propose a novel network(DSeU-net)based on deformable convolution and squeeze excitation residual module.The deformable convolution is utilized to dynamically adjust the receptive field for the feature extraction of retinal vessel.And the squeeze excitation residual module is used to scale the weights of the low-level features so that the network learns the complex relationships of the different feature layers efficiently.We validate the DSeU-net on three public retinal vessel segmentation datasets including DRIVE,CHASEDB1,and STARE,and the experimental results demonstrate the satisfactory segmentation performance of the network.展开更多
Nano-computed tomography(Nano-CT)is an emerging,high-resolution imaging technique.However,due to their low-light properties,tabletop Nano-CT has to be scanned under long exposure conditions,which the scanning process ...Nano-computed tomography(Nano-CT)is an emerging,high-resolution imaging technique.However,due to their low-light properties,tabletop Nano-CT has to be scanned under long exposure conditions,which the scanning process is time-consuming.For 3D reconstruction data,this paper proposed a lightweight 3D noise reduction method for desktop-level Nano-CT called AAD-ResNet(Axial Attention DeNoise ResNet).The network is framed by theU-net structure.The encoder and decoder are incorporated with the proposed 3D axial attention mechanism and residual dense block.Each layer of the residual dense block can directly access the features of the previous layer,which reduces the redundancy of parameters and improves the efficiency of network training.The 3D axial attention mechanism enhances the correlation between 3D information in the training process and captures the long-distance dependence.It can improve the noise reduction effect and avoid the loss of image structure details.Experimental results show that the network can effectively improve the image quality of a 0.1-s exposure scan to a level close to a 3-s exposure,significantly shortening the sample scanning time.展开更多
Background Magnetic resonance imaging(MRI)has played an important role in the rapid growth of medical imaging diagnostic technology,especially in the diagnosis and treatment of brain tumors owing to its non invasive c...Background Magnetic resonance imaging(MRI)has played an important role in the rapid growth of medical imaging diagnostic technology,especially in the diagnosis and treatment of brain tumors owing to its non invasive characteristics and superior soft tissue contrast.However,brain tumors are characterized by high non uniformity and non-obvious boundaries in MRI images because of their invasive and highly heterogeneous nature.In addition,the labeling of tumor areas is time-consuming and laborious.Methods To address these issues,this study uses a residual grouped convolution module,convolutional block attention module,and bilinear interpolation upsampling method to improve the classical segmentation network U-net.The influence of network normalization,loss function,and network depth on segmentation performance is further considered.Results In the experiments,the Dice score of the proposed segmentation model reached 97.581%,which is 12.438%higher than that of traditional U-net,demonstrating the effective segmentation of MRI brain tumor images.Conclusions In conclusion,we use the improved U-net network to achieve a good segmentation effect of brain tumor MRI images.展开更多
Multispectral pedestrian detection technology leverages infrared images to provide reliable information for visible light images, demonstrating significant advantages in low-light conditions and background occlusion s...Multispectral pedestrian detection technology leverages infrared images to provide reliable information for visible light images, demonstrating significant advantages in low-light conditions and background occlusion scenarios. However, while continuously improving cross-modal feature extraction and fusion, ensuring the model’s detection speed is also a challenging issue. We have devised a deep learning network model for cross-modal pedestrian detection based on Resnet50, aiming to focus on more reliable features and enhance the model’s detection efficiency. This model employs a spatial attention mechanism to reweight the input visible light and infrared image data, enhancing the model’s focus on different spatial positions and sharing the weighted feature data across different modalities, thereby reducing the interference of multi-modal features. Subsequently, lightweight modules with depthwise separable convolution are incorporated to reduce the model’s parameter count and computational load through channel-wise and point-wise convolutions. The network model algorithm proposed in this paper was experimentally validated on the publicly available KAIST dataset and compared with other existing methods. The experimental results demonstrate that our approach achieves favorable performance in various complex environments, affirming the effectiveness of the multispectral pedestrian detection technology proposed in this paper.展开更多
Class Title:Radiological imaging method a comprehensive overview purpose.This GPT paper provides an overview of the different forms of radiological imaging and the potential diagnosis capabilities they offer as well a...Class Title:Radiological imaging method a comprehensive overview purpose.This GPT paper provides an overview of the different forms of radiological imaging and the potential diagnosis capabilities they offer as well as recent advances in the field.Materials and Methods:This paper provides an overview of conventional radiography digital radiography panoramic radiography computed tomography and cone-beam computed tomography.Additionally recent advances in radiological imaging are discussed such as imaging diagnosis and modern computer-aided diagnosis systems.Results:This paper details the differences between the imaging techniques the benefits of each and the current advances in the field to aid in the diagnosis of medical conditions.Conclusion:Radiological imaging is an extremely important tool in modern medicine to assist in medical diagnosis.This work provides an overview of the types of imaging techniques used the recent advances made and their potential applications.展开更多
Considering the nonlinear structure and spatial-temporal correlation of traffic network,and the influence of potential correlation between nodes of traffic network on the spatial features,this paper proposes a traffic...Considering the nonlinear structure and spatial-temporal correlation of traffic network,and the influence of potential correlation between nodes of traffic network on the spatial features,this paper proposes a traffic speed prediction model based on the combination of graph attention network with self-adaptive adjacency matrix(SAdpGAT)and bidirectional gated recurrent unit(BiGRU).First-ly,the model introduces graph attention network(GAT)to extract the spatial features of real road network and potential road network respectively in spatial dimension.Secondly,the spatial features are input into BiGRU to extract the time series features.Finally,the prediction results of the real road network and the potential road network are connected to generate the final prediction results of the model.The experimental results show that the prediction accuracy of the proposed model is im-proved obviously on METR-LA and PEMS-BAY datasets,which proves the advantages of the pro-posed spatial-temporal model in traffic speed prediction.展开更多
Accurate load forecasting forms a crucial foundation for implementing household demand response plans andoptimizing load scheduling. When dealing with short-term load data characterized by substantial fluctuations,a s...Accurate load forecasting forms a crucial foundation for implementing household demand response plans andoptimizing load scheduling. When dealing with short-term load data characterized by substantial fluctuations,a single prediction model is hard to capture temporal features effectively, resulting in diminished predictionaccuracy. In this study, a hybrid deep learning framework that integrates attention mechanism, convolution neuralnetwork (CNN), improved chaotic particle swarm optimization (ICPSO), and long short-term memory (LSTM), isproposed for short-term household load forecasting. Firstly, the CNN model is employed to extract features fromthe original data, enhancing the quality of data features. Subsequently, the moving average method is used for datapreprocessing, followed by the application of the LSTM network to predict the processed data. Moreover, the ICPSOalgorithm is introduced to optimize the parameters of LSTM, aimed at boosting the model’s running speed andaccuracy. Finally, the attention mechanism is employed to optimize the output value of LSTM, effectively addressinginformation loss in LSTM induced by lengthy sequences and further elevating prediction accuracy. According tothe numerical analysis, the accuracy and effectiveness of the proposed hybrid model have been verified. It canexplore data features adeptly, achieving superior prediction accuracy compared to other forecasting methods forthe household load exhibiting significant fluctuations across different seasons.展开更多
The unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain.However,the multi-genera...The unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain.However,the multi-generator mechanism is employed among the advanced approaches available to model different domain mappings,which results in inefficient training of neural networks and pattern collapse,leading to inefficient generation of image diversity.To address this issue,this paper introduces a multi-modal unsupervised image translation framework that uses a generator to perform multi-modal image translation.Specifically,firstly,the domain code is introduced in this paper to explicitly control the different generation tasks.Secondly,this paper brings in the squeeze-and-excitation(SE)mechanism and feature attention(FA)module.Finally,the model integrates multiple optimization objectives to ensure efficient multi-modal translation.This paper performs qualitative and quantitative experiments on multiple non-paired benchmark image translation datasets while demonstrating the benefits of the proposed method over existing technologies.Overall,experimental results have shown that the proposed method is versatile and scalable.展开更多
Aiming at the problem that the existing models have a poor segmentation effect on imbalanced data sets with small-scale samples,a bilateral U-Net network model with a spatial attention mechanism is designed.The model ...Aiming at the problem that the existing models have a poor segmentation effect on imbalanced data sets with small-scale samples,a bilateral U-Net network model with a spatial attention mechanism is designed.The model uses the lightweight MobileNetV2 as the backbone network for feature hierarchical extraction and proposes an Attentive Pyramid Spatial Attention(APSA)module compared to the Attenuated Spatial Pyramid module,which can increase the receptive field and enhance the information,and finally adds the context fusion prediction branch that fuses high-semantic and low-semantic prediction results,and the model effectively improves the segmentation accuracy of small data sets.The experimental results on the CamVid data set show that compared with some existing semantic segmentation networks,the algorithm has a better segmentation effect and segmentation accuracy,and its mIOU reaches 75.85%.Moreover,to verify the generality of the model and the effectiveness of the APSA module,experiments were conducted on the VOC 2012 data set,and the APSA module improved mIOU by about 12.2%.展开更多
Regular inspection of bridge cracks is crucial to bridge maintenance and repair.The traditional manual crack detection methods are timeconsuming,dangerous and subjective.At the same time,for the existing mainstream vi...Regular inspection of bridge cracks is crucial to bridge maintenance and repair.The traditional manual crack detection methods are timeconsuming,dangerous and subjective.At the same time,for the existing mainstream vision-based automatic crack detection algorithms,it is challenging to detect fine cracks and balance the detection accuracy and speed.Therefore,this paper proposes a new bridge crack segmentationmethod based on parallel attention mechanism and multi-scale features fusion on top of the DeeplabV3+network framework.First,the improved lightweight MobileNetv2 network and dilated separable convolution are integrated into the original DeeplabV3+network to improve the original backbone network Xception and atrous spatial pyramid pooling(ASPP)module,respectively,dramatically reducing the number of parameters in the network and accelerates the training and prediction speed of the model.Moreover,we introduce the parallel attention mechanism into the encoding and decoding stages.The attention to the crack regions can be enhanced from the aspects of both channel and spatial parts and significantly suppress the interference of various noises.Finally,we further improve the detection performance of the model for fine cracks by introducing a multi-scale features fusion module.Our research results are validated on the self-made dataset.The experiments show that our method is more accurate than other methods.Its intersection of union(IoU)and F1-score(F1)are increased to 77.96%and 87.57%,respectively.In addition,the number of parameters is only 4.10M,which is much smaller than the original network;also,the frames per second(FPS)is increased to 15 frames/s.The results prove that the proposed method fits well the requirements of rapid and accurate detection of bridge cracks and is superior to other methods.展开更多
基金Scientific Research Fund of Liaoning Provincial Education Department(No.JGLX2021030):Research on Vision-Based Intelligent Perception Technology for the Survival of Benthic Organisms.
文摘For underwater robots in the process of performing target detection tasks,the color distortion and the uneven quality of underwater images lead to great difficulties in the feature extraction process of the model,which is prone to issues like error detection,omission detection,and poor accuracy.Therefore,this paper proposed the CER-YOLOv7(CBAM-EIOU-RepVGG-YOLOv7)underwater target detection algorithm.To improve the algorithm’s capability to retain valid features from both spatial and channel perspectives during the feature extraction phase,we have added a Convolutional Block Attention Module(CBAM)to the backbone network.The Reparameterization Visual Geometry Group(RepVGG)module is inserted into the backbone to improve the training and inference capabilities.The Efficient Intersection over Union(EIoU)loss is also used as the localization loss function,which reduces the error detection rate and missed detection rate of the algorithm.The experimental results of the CER-YOLOv7 algorithm on the UPRC(Underwater Robot Prototype Competition)dataset show that the mAP(mean Average Precision)score of the algorithm is 86.1%,which is a 2.2%improvement compared to the YOLOv7.The feasibility and validity of the CER-YOLOv7 are proved through ablation and comparison experiments,and it is more suitable for underwater target detection.
基金Science and Technology Research Project of the Henan Province(222102240014).
文摘Traditional feature-based image stitching techniques often encounter obstacles when dealing with images lackingunique attributes or suffering from quality degradation. The scarcity of annotated datasets in real-life scenesseverely undermines the reliability of supervised learning methods in image stitching. Furthermore, existing deeplearning architectures designed for image stitching are often too bulky to be deployed on mobile and peripheralcomputing devices. To address these challenges, this study proposes a novel unsupervised image stitching methodbased on the YOLOv8 (You Only Look Once version 8) framework that introduces deep homography networksand attentionmechanisms. Themethodology is partitioned into three distinct stages. The initial stage combines theattention mechanism with a pooling pyramid model to enhance the detection and recognition of compact objectsin images, the task of the deep homography networks module is to estimate the global homography of the inputimages consideringmultiple viewpoints. The second stage involves preliminary stitching of the masks generated inthe initial stage and further enhancement through weighted computation to eliminate common stitching artifacts.The final stage is characterized by adaptive reconstruction and careful refinement of the initial stitching results.Comprehensive experiments acrossmultiple datasets are executed tometiculously assess the proposed model. Ourmethod’s Peak Signal-to-Noise Ratio (PSNR) and Structure Similarity Index Measure (SSIM) improved by 10.6%and 6%. These experimental results confirm the efficacy and utility of the presented model in this paper.
基金The authors sincerely appreciate the valuable comments from the anonymous reviewers.The team of Jishunping from Wuhan University is acknowledged for supplying open-source remote sensing data.This research was supported by the Second Tibetan Plateau Scientific Expedition and Research Program(Grant No.2019QZKK0904)the National Natural Science Foundation of China(Grant No.U22A20597).
文摘Landslide disasters comprise the majority of geological incidents on slopes,posing severe threats to the safety of human lives and property while exerting a significant impact on the geological environment.The rapid identification of landslides is important for disaster prevention and control;however,currently,landslide identification relies mainly on the manual interpretation of remote sensing images.Manual interpretation and feature recognition methods are time-consuming,labor-intensive,and challenging when confronted with complex scenarios.Consequently,automatic landslide recognition has emerged as a pivotal avenue for future development.In this study,a dataset comprising 2000 landslide images was constructed using open-source remote sensing images and datasets.The YOLOv7 model was enhanced using data augmentation algorithms and attention mechanisms.Three optimization models were formulated to realize automatic landslide recognition.The findings demonstrate the commendable performance of the optimized model in automatic landslide recognition,achieving a peak accuracy of 95.92%.Subsequently,the optimized model was applied to regional landslide identification,co-seismic landslide identification,and landslide recognition at various scales,all of which showed robust recognition capabilities.Nevertheless,the model exhibits limitations in detecting small targets,indicating areas for refining the deep-learning algorithms.The results of this research offer valuable technical support for the swift identification,prevention,and mitigation of landslide disasters.
基金Supported by the National Natural Science Foundation of China(No.61906066)Scientific Research Fund of Zhejiang Provincial Education Department(No.Y202147191)+2 种基金Huzhou University Graduate Research Innovation Project(No.2020KYCX21)Sanming Project of Medicine in Shenzhen(SZSM202311012)Shenzhen Science and Technology Program(No.JCYJ20220530153604010).
文摘AIM:To evaluate the application of an intelligent diagnostic model for pterygium.METHODS:For intelligent diagnosis of pterygium,the attention mechanisms—SENet,ECANet,CBAM,and Self-Attention—were fused with the lightweight MobileNetV2 model structure to construct a tri-classification model.The study used 1220 images of three types of anterior ocular segments of the pterygium provided by the Eye Hospital of Nanjing Medical University.Conventional classification models—VGG16,ResNet50,MobileNetV2,and EfficientNetB7—were trained on the same dataset for comparison.To evaluate model performance in terms of accuracy,Kappa value,test time,sensitivity,specificity,the area under curve(AUC),and visual heat map,470 test images of the anterior segment of the pterygium were used.RESULTS:The accuracy of the MobileNetV2+Self-Attention model with 281 MB in model size was 92.77%,and the Kappa value of the model was 88.92%.The testing time using the model was 9ms/image in the server and 138ms/image in the local computer.The sensitivity,specificity,and AUC for the diagnosis of pterygium using normal anterior segment images were 99.47%,100%,and 100%,respectively;using anterior segment images in the observation period were 88.30%,95.32%,and 96.70%,respectively;and using the anterior segment images in the surgery period were 88.18%,94.44%,and 97.30%,respectively.CONCLUSION:The developed model is lightweight and can be used not only for detection but also for assessing the severity of pterygium.
文摘With the rapid development of electric power systems,load estimation plays an important role in system operation and planning.Usually,load estimation techniques contain traditional,time series,regression analysis-based,and machine learning-based estimation.Since the machine learning-based method can lead to better performance,in this paper,a deep learning-based load estimation algorithm using image fingerprint and attention mechanism is proposed.First,an image fingerprint construction is proposed for training data.After the data preprocessing,the training data matrix is constructed by the cyclic shift and cubic spline interpolation.Then,the linear mapping and the gray-color transformation method are proposed to form the color image fingerprint.Second,a convolutional neural network(CNN)combined with an attentionmechanism is proposed for training performance improvement.At last,an experiment is carried out to evaluate the estimation performance.Compared with the support vector machine method,CNN method and long short-term memory method,the proposed algorithm has the best load estimation performance.
基金supported by the National Key R&D Program of China(Grant Number 2021YFB2700900)the National Natural Science Foundation of China(Grant Numbers 62172232,62172233)the Jiangsu Basic Research Program Natural Science Foundation(Grant Number BK20200039).
文摘Recently,deep image-hiding techniques have attracted considerable attention in covert communication and high-capacity information hiding.However,these approaches have some limitations.For example,a cover image lacks self-adaptability,information leakage,or weak concealment.To address these issues,this study proposes a universal and adaptable image-hiding method.First,a domain attention mechanism is designed by combining the Atrous convolution,which makes better use of the relationship between the secret image domain and the cover image domain.Second,to improve perceived human similarity,perceptual loss is incorporated into the training process.The experimental results are promising,with the proposed method achieving an average pixel discrepancy(APD)of 1.83 and a peak signal-to-noise ratio(PSNR)value of 40.72 dB between the cover and stego images,indicative of its high-quality output.Furthermore,the structural similarity index measure(SSIM)reaches 0.985 while the learned perceptual image patch similarity(LPIPS)remarkably registers at 0.0001.Moreover,self-testing and cross-experiments demonstrate the model’s adaptability and generalization in unknown hidden spaces,making it suitable for diverse computer vision tasks.
基金supported by the Natural Science Foundation of Shandong Province,China (Grant No. ZR2021MF049)Joint Fund of Natural Science Foundation of Shandong Province (Grant Nos. ZR2022LLZ012 and ZR2021LLZ001)。
文摘Quantum error correction, a technique that relies on the principle of redundancy to encode logical information into additional qubits to better protect the system from noise, is necessary to design a viable quantum computer. For this new topological stabilizer code-XYZ^(2) code defined on the cellular lattice, it is implemented on a hexagonal lattice of qubits and it encodes the logical qubits with the help of stabilizer measurements of weight six and weight two. However topological stabilizer codes in cellular lattice quantum systems suffer from the detrimental effects of noise due to interaction with the environment. Several decoding approaches have been proposed to address this problem. Here, we propose the use of a state-attention based reinforcement learning decoder to decode XYZ^(2) codes, which enables the decoder to more accurately focus on the information related to the current decoding position, and the error correction accuracy of our reinforcement learning decoder model under the optimisation conditions can reach 83.27% under the depolarizing noise model, and we have measured thresholds of 0.18856 and 0.19043 for XYZ^(2) codes at code spacing of 3–7 and 7–11, respectively. our study provides directions and ideas for applications of decoding schemes combining reinforcement learning attention mechanisms to other topological quantum error-correcting codes.
基金Jilin Science and Technology Development Plan Project(No.20200403075SF)Doctoral Research Start-Up Fund of Northeast Electric Power University(No.BSJXM-2018202).
文摘The current existing problem of deep learning framework for the detection and segmentation of electrical equipment is dominantly related to low precision.Because of the reliable,safe and easy-to-operate technology provided by deep learning-based video surveillance for unmanned inspection of electrical equipment,this paper uses the bottleneck attention module(BAM)attention mechanism to improve the Solov2 model and proposes a new electrical equipment segmentation mode.Firstly,the BAM attention mechanism is integrated into the feature extraction network to adaptively learn the correlation between feature channels,thereby improving the expression ability of the feature map;secondly,the weighted sum of CrossEntropy Loss and Dice loss is designed as the mask loss to improve the segmentation accuracy and robustness of the model;finally,the non-maximal suppression(NMS)algorithm to better handle the overlap problem in instance segmentation.Experimental results show that the proposed method achieves an average segmentation accuracy of mAP of 80.4% on three types of electrical equipment datasets,including transformers,insulators and voltage transformers,which improve the detection accuracy by more than 5.7% compared with the original Solov2 model.The segmentation model proposed can provide a focusing technical means for the intelligent management of power systems.
基金supported by the National Natural Science Foundation of China under Grant 61702462the Henan Provincial Science and Technology Research Project under Grants 222102210010 and 222102210064+2 种基金the Research and Practice Project of Higher Education Teaching Reform in Henan Province under Grants 2019SJGLX320 and 2019SJGLX020the Undergraduate Universities Smart Teaching Special Research Project of Henan Province under Grant JiaoGao[2021]No.489-29the Academic Degrees&Graduate Education Reform Project of Henan Province under Grant 2021SJGLX115Y.
文摘Multimodal sentiment analysis aims to understand people’s emotions and opinions from diverse data.Concate-nating or multiplying various modalities is a traditional multi-modal sentiment analysis fusion method.This fusion method does not utilize the correlation information between modalities.To solve this problem,this paper proposes amodel based on amulti-head attention mechanism.First,after preprocessing the original data.Then,the feature representation is converted into a sequence of word vectors and positional encoding is introduced to better understand the semantic and sequential information in the input sequence.Next,the input coding sequence is fed into the transformer model for further processing and learning.At the transformer layer,a cross-modal attention consisting of a pair of multi-head attention modules is employed to reflect the correlation between modalities.Finally,the processed results are input into the feedforward neural network to obtain the emotional output through the classification layer.Through the above processing flow,the model can capture semantic information and contextual relationships and achieve good results in various natural language processing tasks.Our model was tested on the CMU Multimodal Opinion Sentiment and Emotion Intensity(CMU-MOSEI)and Multimodal EmotionLines Dataset(MELD),achieving an accuracy of 82.04% and F1 parameters reached 80.59% on the former dataset.
文摘To improve the prediction accuracy of chaotic time series and reconstruct a more reasonable phase space structure of the prediction network,we propose a convolutional neural network-long short-term memory(CNN-LSTM)prediction model based on the incremental attention mechanism.Firstly,a traversal search is conducted through the traversal layer for finite parameters in the phase space.Then,an incremental attention layer is utilized for parameter judgment based on the dimension weight criteria(DWC).The phase space parameters that best meet DWC are selected and fed into the input layer.Finally,the constructed CNN-LSTM network extracts spatio-temporal features and provides the final prediction results.The model is verified using Logistic,Lorenz,and sunspot chaotic time series,and the performance is compared from the two dimensions of prediction accuracy and network phase space structure.Additionally,the CNN-LSTM network based on incremental attention is compared with long short-term memory(LSTM),convolutional neural network(CNN),recurrent neural network(RNN),and support vector regression(SVR)for prediction accuracy.The experiment results indicate that the proposed composite network model possesses enhanced capability in extracting temporal features and achieves higher prediction accuracy.Also,the algorithm to estimate the phase space parameter is compared with the traditional CAO,false nearest neighbor,and C-C,three typical methods for determining the chaotic phase space parameters.The experiments reveal that the phase space parameter estimation algorithm based on the incremental attention mechanism is superior in prediction accuracy compared with the traditional phase space reconstruction method in five networks,including CNN-LSTM,LSTM,CNN,RNN,and SVR.
基金Beijing Natural Science Foundation(No.IS23112)Beijing Institute of Technology Research Fund Program for Young Scholars(No.6120220236)。
文摘The intensive application of deep learning in medical image processing has facilitated the advancement of automatic retinal vessel segmentation research.To overcome the limitation that traditional U-shaped vessel segmentation networks fail to extract features in fundus image sufficiently,we propose a novel network(DSeU-net)based on deformable convolution and squeeze excitation residual module.The deformable convolution is utilized to dynamically adjust the receptive field for the feature extraction of retinal vessel.And the squeeze excitation residual module is used to scale the weights of the low-level features so that the network learns the complex relationships of the different feature layers efficiently.We validate the DSeU-net on three public retinal vessel segmentation datasets including DRIVE,CHASEDB1,and STARE,and the experimental results demonstrate the satisfactory segmentation performance of the network.
基金supported by the National Natural Science Foundation of China(62201618).
文摘Nano-computed tomography(Nano-CT)is an emerging,high-resolution imaging technique.However,due to their low-light properties,tabletop Nano-CT has to be scanned under long exposure conditions,which the scanning process is time-consuming.For 3D reconstruction data,this paper proposed a lightweight 3D noise reduction method for desktop-level Nano-CT called AAD-ResNet(Axial Attention DeNoise ResNet).The network is framed by theU-net structure.The encoder and decoder are incorporated with the proposed 3D axial attention mechanism and residual dense block.Each layer of the residual dense block can directly access the features of the previous layer,which reduces the redundancy of parameters and improves the efficiency of network training.The 3D axial attention mechanism enhances the correlation between 3D information in the training process and captures the long-distance dependence.It can improve the noise reduction effect and avoid the loss of image structure details.Experimental results show that the network can effectively improve the image quality of a 0.1-s exposure scan to a level close to a 3-s exposure,significantly shortening the sample scanning time.
基金Research Fund of Macao Polytechnic University(RP/FCSD-01/2022).
文摘Background Magnetic resonance imaging(MRI)has played an important role in the rapid growth of medical imaging diagnostic technology,especially in the diagnosis and treatment of brain tumors owing to its non invasive characteristics and superior soft tissue contrast.However,brain tumors are characterized by high non uniformity and non-obvious boundaries in MRI images because of their invasive and highly heterogeneous nature.In addition,the labeling of tumor areas is time-consuming and laborious.Methods To address these issues,this study uses a residual grouped convolution module,convolutional block attention module,and bilinear interpolation upsampling method to improve the classical segmentation network U-net.The influence of network normalization,loss function,and network depth on segmentation performance is further considered.Results In the experiments,the Dice score of the proposed segmentation model reached 97.581%,which is 12.438%higher than that of traditional U-net,demonstrating the effective segmentation of MRI brain tumor images.Conclusions In conclusion,we use the improved U-net network to achieve a good segmentation effect of brain tumor MRI images.
基金supported by the Henan Provincial Science and Technology Research Project under Grants 232102211006,232102210044,232102211017,232102210055 and 222102210214the Science and Technology Innovation Project of Zhengzhou University of Light Industry under Grant 23XNKJTD0205+1 种基金the Undergraduate Universities Smart Teaching Special Research Project of Henan Province under Grant Jiao Gao[2021]No.489-29the Doctor Natural Science Foundation of Zhengzhou University of Light Industry under Grants 2021BSJJ025 and 2022BSJJZK13.
文摘Multispectral pedestrian detection technology leverages infrared images to provide reliable information for visible light images, demonstrating significant advantages in low-light conditions and background occlusion scenarios. However, while continuously improving cross-modal feature extraction and fusion, ensuring the model’s detection speed is also a challenging issue. We have devised a deep learning network model for cross-modal pedestrian detection based on Resnet50, aiming to focus on more reliable features and enhance the model’s detection efficiency. This model employs a spatial attention mechanism to reweight the input visible light and infrared image data, enhancing the model’s focus on different spatial positions and sharing the weighted feature data across different modalities, thereby reducing the interference of multi-modal features. Subsequently, lightweight modules with depthwise separable convolution are incorporated to reduce the model’s parameter count and computational load through channel-wise and point-wise convolutions. The network model algorithm proposed in this paper was experimentally validated on the publicly available KAIST dataset and compared with other existing methods. The experimental results demonstrate that our approach achieves favorable performance in various complex environments, affirming the effectiveness of the multispectral pedestrian detection technology proposed in this paper.
文摘Class Title:Radiological imaging method a comprehensive overview purpose.This GPT paper provides an overview of the different forms of radiological imaging and the potential diagnosis capabilities they offer as well as recent advances in the field.Materials and Methods:This paper provides an overview of conventional radiography digital radiography panoramic radiography computed tomography and cone-beam computed tomography.Additionally recent advances in radiological imaging are discussed such as imaging diagnosis and modern computer-aided diagnosis systems.Results:This paper details the differences between the imaging techniques the benefits of each and the current advances in the field to aid in the diagnosis of medical conditions.Conclusion:Radiological imaging is an extremely important tool in modern medicine to assist in medical diagnosis.This work provides an overview of the types of imaging techniques used the recent advances made and their potential applications.
基金the National Natural Science Foundation of China(No.61461027,61762059)the Provincial Science and Technology Program supported the Key Project of Natural Science Foundation of Gansu Province(No.22JR5RA226)。
文摘Considering the nonlinear structure and spatial-temporal correlation of traffic network,and the influence of potential correlation between nodes of traffic network on the spatial features,this paper proposes a traffic speed prediction model based on the combination of graph attention network with self-adaptive adjacency matrix(SAdpGAT)and bidirectional gated recurrent unit(BiGRU).First-ly,the model introduces graph attention network(GAT)to extract the spatial features of real road network and potential road network respectively in spatial dimension.Secondly,the spatial features are input into BiGRU to extract the time series features.Finally,the prediction results of the real road network and the potential road network are connected to generate the final prediction results of the model.The experimental results show that the prediction accuracy of the proposed model is im-proved obviously on METR-LA and PEMS-BAY datasets,which proves the advantages of the pro-posed spatial-temporal model in traffic speed prediction.
基金the Shanghai Rising-Star Program(No.22QA1403900)the National Natural Science Foundation of China(No.71804106)the Noncarbon Energy Conversion and Utilization Institute under the Shanghai Class IV Peak Disciplinary Development Program.
文摘Accurate load forecasting forms a crucial foundation for implementing household demand response plans andoptimizing load scheduling. When dealing with short-term load data characterized by substantial fluctuations,a single prediction model is hard to capture temporal features effectively, resulting in diminished predictionaccuracy. In this study, a hybrid deep learning framework that integrates attention mechanism, convolution neuralnetwork (CNN), improved chaotic particle swarm optimization (ICPSO), and long short-term memory (LSTM), isproposed for short-term household load forecasting. Firstly, the CNN model is employed to extract features fromthe original data, enhancing the quality of data features. Subsequently, the moving average method is used for datapreprocessing, followed by the application of the LSTM network to predict the processed data. Moreover, the ICPSOalgorithm is introduced to optimize the parameters of LSTM, aimed at boosting the model’s running speed andaccuracy. Finally, the attention mechanism is employed to optimize the output value of LSTM, effectively addressinginformation loss in LSTM induced by lengthy sequences and further elevating prediction accuracy. According tothe numerical analysis, the accuracy and effectiveness of the proposed hybrid model have been verified. It canexplore data features adeptly, achieving superior prediction accuracy compared to other forecasting methods forthe household load exhibiting significant fluctuations across different seasons.
基金the National Natural Science Foundation of China(No.61976080)the Academic Degrees&Graduate Education Reform Project of Henan Province(No.2021SJGLX195Y)+1 种基金the Teaching Reform Research and Practice Project of Henan Undergraduate Universities(No.2022SYJXLX008)the Key Project on Research and Practice of Henan University Graduate Education and Teaching Reform(No.YJSJG2023XJ006)。
文摘The unsupervised multi-modal image translation is an emerging domain of computer vision whose goal is to transform an image from the source domain into many diverse styles in the target domain.However,the multi-generator mechanism is employed among the advanced approaches available to model different domain mappings,which results in inefficient training of neural networks and pattern collapse,leading to inefficient generation of image diversity.To address this issue,this paper introduces a multi-modal unsupervised image translation framework that uses a generator to perform multi-modal image translation.Specifically,firstly,the domain code is introduced in this paper to explicitly control the different generation tasks.Secondly,this paper brings in the squeeze-and-excitation(SE)mechanism and feature attention(FA)module.Finally,the model integrates multiple optimization objectives to ensure efficient multi-modal translation.This paper performs qualitative and quantitative experiments on multiple non-paired benchmark image translation datasets while demonstrating the benefits of the proposed method over existing technologies.Overall,experimental results have shown that the proposed method is versatile and scalable.
基金Ministry of Science and Technology Basic Resources Survey Special Project,Grant/Award Number:2019FY100900High-level Hospital Construction Project,Grant/Award Number:DFJH2019015+2 种基金National Natural Science Foundation of China,Grant/Award Number:61871021Guangdong Natural Science Foundation,Grant/Award Number:2019A1515011676Beijing Key Laboratory of Robotics Bionic and Functional Research。
文摘Aiming at the problem that the existing models have a poor segmentation effect on imbalanced data sets with small-scale samples,a bilateral U-Net network model with a spatial attention mechanism is designed.The model uses the lightweight MobileNetV2 as the backbone network for feature hierarchical extraction and proposes an Attentive Pyramid Spatial Attention(APSA)module compared to the Attenuated Spatial Pyramid module,which can increase the receptive field and enhance the information,and finally adds the context fusion prediction branch that fuses high-semantic and low-semantic prediction results,and the model effectively improves the segmentation accuracy of small data sets.The experimental results on the CamVid data set show that compared with some existing semantic segmentation networks,the algorithm has a better segmentation effect and segmentation accuracy,and its mIOU reaches 75.85%.Moreover,to verify the generality of the model and the effectiveness of the APSA module,experiments were conducted on the VOC 2012 data set,and the APSA module improved mIOU by about 12.2%.
基金This work was supported by the High-Tech Industry Science and Technology Innovation Leading Plan Project of Hunan Provincial under Grant 2020GK2026,author B.Y,http://kjt.hunan.gov.cn/.
文摘Regular inspection of bridge cracks is crucial to bridge maintenance and repair.The traditional manual crack detection methods are timeconsuming,dangerous and subjective.At the same time,for the existing mainstream vision-based automatic crack detection algorithms,it is challenging to detect fine cracks and balance the detection accuracy and speed.Therefore,this paper proposes a new bridge crack segmentationmethod based on parallel attention mechanism and multi-scale features fusion on top of the DeeplabV3+network framework.First,the improved lightweight MobileNetv2 network and dilated separable convolution are integrated into the original DeeplabV3+network to improve the original backbone network Xception and atrous spatial pyramid pooling(ASPP)module,respectively,dramatically reducing the number of parameters in the network and accelerates the training and prediction speed of the model.Moreover,we introduce the parallel attention mechanism into the encoding and decoding stages.The attention to the crack regions can be enhanced from the aspects of both channel and spatial parts and significantly suppress the interference of various noises.Finally,we further improve the detection performance of the model for fine cracks by introducing a multi-scale features fusion module.Our research results are validated on the self-made dataset.The experiments show that our method is more accurate than other methods.Its intersection of union(IoU)and F1-score(F1)are increased to 77.96%and 87.57%,respectively.In addition,the number of parameters is only 4.10M,which is much smaller than the original network;also,the frames per second(FPS)is increased to 15 frames/s.The results prove that the proposed method fits well the requirements of rapid and accurate detection of bridge cracks and is superior to other methods.