Quantum error correction, a technique that uses redundancy to encode logical information into additional qubits and thereby better protect the system from noise, is necessary for designing a viable quantum computer. The XYZ^(2) code, a new topological stabilizer code defined on the cellular lattice, is implemented on a hexagonal lattice of qubits and encodes the logical qubits with the help of stabilizer measurements of weight six and weight two. However, topological stabilizer codes in cellular lattice quantum systems suffer from the detrimental effects of noise due to interaction with the environment. Several decoding approaches have been proposed to address this problem. Here, we propose a state-attention-based reinforcement learning decoder for XYZ^(2) codes, which enables the decoder to focus more accurately on the information related to the current decoding position. Under the optimised conditions, the error correction accuracy of our reinforcement learning decoder reaches 83.27% under the depolarizing noise model, and we measured thresholds of 0.18856 and 0.19043 for XYZ^(2) codes at code distances of 3–7 and 7–11, respectively. Our study provides directions and ideas for applying decoding schemes that combine reinforcement learning with attention mechanisms to other topological quantum error-correcting codes.
Recently, Deep Learning (DL) techniques have emerged as a transformative force in machine learning, artificial intelligence, computer vision, and related fields. Researchers and industry have since embraced them in the medical field for predicting and controlling diverse diseases at specific intervals. Liver tumor prediction is a vital task in analyzing and treating liver diseases. This paper proposes a novel approach for predicting liver tumors using Convolutional Neural Networks (CNN) and a depth-based variant search algorithm with advanced attention mechanisms (CNN-DS-AM). The proposed work aims to improve accuracy and robustness in diagnosing and treating liver diseases. The model is assessed on a Computed Tomography (CT) scan dataset containing both benign and malignant liver tumors. Advanced attention mechanisms were incorporated into the CNN model to identify and highlight the regions of the CT scans most relevant to predicting liver tumors. The proposed system achieved an accuracy of 95.5% in predicting liver tumors, outperforming other state-of-the-art methods. The results suggest that incorporating attention mechanisms and a depth-based variant search algorithm into the CNN model is a promising approach for improving the accuracy and robustness of liver tumor prediction, and that it can assist radiologists in their diagnosis and treatment planning.
Neurodegeneration is the gradual deterioration and eventual death of brain cells, leading to progressive loss of structure and function of neurons in the brain and nervous system. Neurodegenerative disorders, such as Alzheimer’s, Huntington’s, Parkinson’s, amyotrophic lateral sclerosis, multiple system atrophy, and multiple sclerosis, are characterized by progressive deterioration of brain function, resulting in symptoms such as memory impairment, movement difficulties, and cognitive decline. Early diagnosis of these conditions is crucial to slowing down cell degeneration and reducing the severity of the diseases. Magnetic resonance imaging (MRI) is widely used by neurologists for diagnosing brain abnormalities. Most research in this field focuses on processing 2D images extracted from 3D MRI volumetric scans, which can lose the volumetric information available in the whole-brain MRI. To address this problem, a novel 3D-CNN architecture with an attention mechanism is proposed to classify whole-brain MRI images for Alzheimer’s disease (AD) detection. The pipeline takes pre-processed MRI volumetric scans as input, and the 3D-CNN model uses both channel and spatial attention mechanisms to focus on specific regions of the brain and extract precise feature representations of the input volume for accurate classification. The present study uses the publicly available Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, which has three image classes: Mild Cognitive Impairment (MCI), Cognitively Normal (CN), and AD affected. The proposed approach achieves an overall accuracy of 79% when classifying the three classes and an average accuracy of 87% when distinguishing AD from the other two classes. The findings reveal that 3D-CNN models with an attention mechanism exhibit significantly higher classification performance than other models, highlighting the potential of deep learning algorithms to aid in the early detection and prediction of AD.
Spam emails pose a threat to individuals. The daily proliferation of spam emails has rendered traditional machine learning and deep learning methods for screening them ineffective and inefficient. In our research, we employ deep neural networks such as RNN, LSTM, and GRU, incorporating attention mechanisms such as Bahdanau, scaled dot product (SDP), and Luong scaled dot product self-attention for spam email filtering. We evaluate our approach on various datasets, including Trec spam, Enron spam emails, SMS spam collections, the Ling spam dataset, and a substantial custom dataset. All these datasets are publicly available. For the Enron dataset, we attain an accuracy of 99.97% using LSTM with SDP self-attention. Our custom dataset exhibits its highest accuracy of 99.01% when employing GRU with SDP self-attention. The SMS spam collection dataset yields a peak accuracy of 99.61% with LSTM and SDP attention. On the Ling spam dataset, the Gated Recurrent Unit (GRU) with Luong and SDP attention mechanisms reaches a peak accuracy of 99.89%. For the Trec spam dataset, the most accurate results are achieved using LSTM with Luong attention, with an accuracy of 99.01%. Our performance analyses consistently indicate that employing the scaled dot product attention mechanism in conjunction with gated recurrent units (GRU) delivers the most effective results. In summary, our research underscores the efficacy of advanced deep learning techniques and attention mechanisms for spam email filtering, with high accuracy across multiple datasets. This approach presents a promising solution to the ever-growing problem of spam emails.
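To make the scaled dot product (SDP) attention named in this abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names and the toy vectors are illustrative assumptions, and the Bahdanau and Luong variants are not shown. Each query is compared with every key by a dot product, the scores are divided by the square root of the key dimension, and a softmax turns them into weights over the values.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return (contexts, weights) for lists of equal-length vectors.

    Illustrative helper (hypothetical name): weights = softmax(q . k / sqrt(d_k)),
    context = weighted sum of the value vectors.
    """
    d_k = len(keys[0])
    contexts, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        context = [sum(w * v[j] for w, v in zip(weights, values))
                   for j in range(len(values[0]))]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights

# Toy example: one query that matches the first key more closely.
contexts, weights = scaled_dot_product_attention(
    [[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]
)
```

In a recurrent spam classifier, the queries, keys, and values would typically all be derived from the LSTM or GRU hidden states, which is what makes the mechanism a self-attention; that wiring is an assumption here, since the abstract does not describe it.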
Dimensional data directly reflects the growth rate of individual fish, an important economic trait of interest to fish researchers. Efficiently obtaining large-scale fish dimension data would be valuable for both selective breeding and production. To address this, our study proposes a custom dimension measurement method for fish using the YOLOv5-keypoint framework with multi-attention mechanisms. We optimized the YOLOv5 framework, incorporated the SimAM attention mechanism to achieve more accurate and faster fish detection, and added customizable landmarks to the network structure, enabling flexible configuration of the number and location of feature points in the training dataset. This method is applicable to various aquacultural species and other objects. We tested the effectiveness of the method using the economically important grass carp (Ctenopharyngodon idella). The proposed method outperforms pure YOLOv5, Faster R-CNN, and SSD in terms of precision and recall, achieving an average precision of 0.9781. Field trials confirmed the method's measurement accuracy, with over 97% agreement with manual measurements, at a real-time speed of 38 frames per second on the NVIDIA RTX A4000. This enables efficient and accurate large-scale surface dimension measurements of economic fish. To facilitate massive measurements in agricultural research, we have implemented this method as an online platform called Mode-recognition Ruler (MrRuler, http://bioinfo.ihb.ac.cn/mrruler). The platform identifies objects in a single image at an average speed of 0.486±0.005 s, based on a dataset of 10,000 images. MrRuler includes two preset carp models and allows users to upload training datasets for custom models of their targets of interest.
Crowdsourcing technology is widely recognized for its effectiveness in task scheduling and resource allocation. While traditional methods for task allocation can help reduce costs and improve efficiency, they may encounter challenges when dealing with abnormal data flow nodes, leading to decreased allocation accuracy and efficiency. To address these issues, this study proposes a novel two-part invalid detection task allocation framework. In the first step, an anomaly detection model is developed using a dynamic self-attentive GAN to identify anomalous data. Compared to the baseline method, the model achieves an approximately 4% increase in the F1 value on the public dataset. In the second step, task allocation is modeled using a bipartite graph matching method. This phase introduces a P-queue KM algorithm that implements a more efficient optimization strategy. The allocation efficiency is improved by approximately 23.83% compared to the baseline method. Empirical results confirm the effectiveness of the proposed framework in detecting abnormal data nodes, enhancing allocation precision, and achieving efficient allocation.
The advent of self-attention mechanisms within Transformer models has significantly propelled the advancement of deep learning algorithms, yielding outstanding achievements across diverse domains. Nonetheless, self-attention mechanisms falter when applied to datasets with intricate semantic content and extensive dependency structures. In response, this paper introduces a Diffusion Sampling and Label-Driven Co-attention Neural Network (DSLD), which adopts a diffusion sampling method to capture more comprehensive semantic information from the data. Additionally, the model introduces the joint correlation information of labels and data into the computation of the text representation, correcting semantic representation biases in the data and increasing the accuracy of the semantic representation. Ultimately, the model computes the corresponding classification results by synthesizing these rich semantic representations. Experiments on seven benchmark datasets show that our proposed model achieves competitive results compared to state-of-the-art methods.
How to classify defects from only a few defect samples is a key challenge in the production of mobile phone screens. This paper proposes an attention-relation network for mobile phone screen defect classification. The architecture of the attention-relation network contains two modules: a feature extraction module and a feature metric module. Unlike other few-shot models, our model applies an attention mechanism to metric learning when measuring the distance between features, so as to attend to the correlation between features and suppress unwanted information. In addition, we combine dilated convolution and skip connections to extract more feature information for follow-up processing. We validate the attention-relation network on the mobile phone screen defect dataset. The experimental results show that the classification accuracy of the attention-relation network is 0.9486 under the 5-way 1-shot training strategy and 0.9039 under the 5-way 5-shot setting. It classifies mobile phone screen defects effectively and outperforms the compared methods by a clear margin.
Traditional feature-based image stitching techniques often encounter obstacles when dealing with images lacking unique attributes or suffering from quality degradation. The scarcity of annotated datasets in real-life scenes severely undermines the reliability of supervised learning methods in image stitching. Furthermore, existing deep learning architectures designed for image stitching are often too bulky to be deployed on mobile and peripheral computing devices. To address these challenges, this study proposes a novel unsupervised image stitching method based on the YOLOv8 (You Only Look Once version 8) framework that introduces deep homography networks and attention mechanisms. The methodology is partitioned into three distinct stages. The initial stage combines the attention mechanism with a pooling pyramid model to enhance the detection and recognition of compact objects in images, while the deep homography networks module estimates the global homography of the input images considering multiple viewpoints. The second stage involves preliminary stitching of the masks generated in the initial stage and further enhancement through weighted computation to eliminate common stitching artifacts. The final stage is characterized by adaptive reconstruction and careful refinement of the initial stitching results. Comprehensive experiments across multiple datasets are executed to assess the proposed model. Our method’s Peak Signal-to-Noise Ratio (PSNR) and Structure Similarity Index Measure (SSIM) improved by 10.6% and 6%, respectively. These experimental results confirm the efficacy and utility of the model presented in this paper.
When underwater robots perform target detection tasks, the color distortion and uneven quality of underwater images make feature extraction difficult for the model, leading to issues such as false detections, missed detections, and poor accuracy. Therefore, this paper proposes the CER-YOLOv7 (CBAM-EIOU-RepVGG-YOLOv7) underwater target detection algorithm. To improve the algorithm’s capability to retain valid features from both spatial and channel perspectives during the feature extraction phase, we add a Convolutional Block Attention Module (CBAM) to the backbone network. The Reparameterization Visual Geometry Group (RepVGG) module is inserted into the backbone to improve the training and inference capabilities. The Efficient Intersection over Union (EIoU) loss is used as the localization loss function, which reduces the false detection rate and missed detection rate of the algorithm. The experimental results of the CER-YOLOv7 algorithm on the UPRC (Underwater Robot Prototype Competition) dataset show that the mAP (mean Average Precision) score of the algorithm is 86.1%, a 2.2% improvement over YOLOv7. Ablation and comparison experiments demonstrate the feasibility and validity of CER-YOLOv7 and show that it is better suited to underwater target detection.
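The EIoU localization loss mentioned in this abstract can be sketched as follows. This is a minimal illustration based on the commonly published formulation (1 − IoU, plus a center-distance penalty, plus separate width and height penalties, each normalized by the smallest enclosing box); the abstract does not give the authors' exact implementation, and the function name, the `(x1, y1, x2, y2)` box format, and the `eps` constant are assumptions.

```python
def eiou_loss(box_a, box_b, eps=1e-9):
    """EIoU loss between two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; boxes are assumed to have positive
    width and height.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection over union.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter + eps)

    # Width and height of the smallest box enclosing both inputs.
    enc_w = max(ax2, bx2) - min(ax1, bx1)
    enc_h = max(ay2, by2) - min(ay1, by1)

    # Squared distance between box centers, normalized by the enclosing diagonal.
    dx = (ax1 + ax2) / 2.0 - (bx1 + bx2) / 2.0
    dy = (ay1 + ay2) / 2.0 - (by1 + by2) / 2.0
    center_term = (dx * dx + dy * dy) / (enc_w ** 2 + enc_h ** 2 + eps)

    # Separate width and height penalties, which distinguish EIoU from CIoU.
    w_term = ((ax2 - ax1) - (bx2 - bx1)) ** 2 / (enc_w ** 2 + eps)
    h_term = ((ay2 - ay1) - (by2 - by1)) ** 2 / (enc_h ** 2 + eps)

    return 1.0 - iou + center_term + w_term + h_term

# A perfect prediction gives a loss near zero; disjoint boxes give a loss above one.
perfect = eiou_loss((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
disjoint = eiou_loss((0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0))
```

Because the center and size penalties remain non-zero even when the boxes do not overlap, the loss still provides a training signal for disjoint predictions, which plain IoU loss does not.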
Fault detection and diagnosis (FDD) plays a significant role in ensuring the safety and stability of chemical processes. With the development of artificial intelligence (AI) and big data technologies, data-driven approaches with excellent performance are widely used for FDD in chemical processes. However, improved predictive accuracy has often been achieved through increased model complexity, which turns models into black-box methods and causes uncertainty regarding their decisions. In this study, a causal temporal graph attention network (CTGAN) is proposed for fault diagnosis of chemical processes. A chemical causal graph is built by causal inference to represent the propagation path of faults. The attention mechanism and the chemical causal graph are combined to highlight the key variables related to fault fluctuations. Experiments on the Tennessee Eastman (TE) process and the green ammonia (GA) process showed that CTGAN achieved high performance and good explainability.
Predicting landslide displacement is of utmost practical importance, as landslides can pose serious threats to both human life and property. However, traditional methods choose the sliding window arbitrarily and seldom incorporate weather forecast data for displacement prediction, while a single structural model cannot handle input sequences of different lengths at the same time. To overcome these limitations, this study proposes a new approach that uses weather forecast data and combines the maximum information coefficient (MIC), a long short-term memory network (LSTM), and an attention mechanism to establish a teacher-student coupling model with a parallel structure for short-term landslide displacement prediction. Through MIC, a suitable input sequence length is selected for the LSTM model. To investigate the influence of rainfall on landslides during different seasons, a parallel teacher-student coupling model is developed that can learn sequential information from time series of different lengths. The teacher model learns sequence information from rainfall intensity time series while incorporating reliable short-term weather forecast data from platforms such as the China Meteorological Administration (CMA) and Reliable Prognosis (https://rp5.ru) to improve the model’s expression capability, and the student model learns sequence information from other time series. An attention module is then designed to integrate the different sequence information into a context vector representing the seasonal temporal attention mode. Finally, the predicted displacement is obtained through a linear layer. The proposed method demonstrates superior prediction accuracy, surpassing the support vector machine (SVM), LSTM, recurrent neural network (RNN), temporal convolutional network (TCN), and LSTM-Attention models. It achieves a mean absolute error (MAE) of 0.072 mm, a root mean square error (RMSE) of 0.096 mm, and a Pearson correlation coefficient (PCCS) of 0.85. Additionally, it exhibits enhanced prediction stability and interpretability, making it a valuable tool for landslide disaster prevention and mitigation.
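The three evaluation metrics reported above (MAE, RMSE, and the Pearson correlation coefficient) follow standard definitions, sketched here in plain Python. The function names and the sample displacement values are illustrative assumptions and are not data from the study.

```python
import math

def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the prediction errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean square error: penalizes large errors more heavily than MAE.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson(x, y):
    # Pearson correlation: covariance divided by the product of the spreads.
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up displacement values in millimetres, for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
scores = (mae(observed, predicted), rmse(observed, predicted), pearson(observed, predicted))
```

MAE and RMSE carry the unit of the target (millimetres here), whereas the Pearson coefficient is dimensionless and only measures how linearly the predictions track the observations.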
Landfill leaks pose a serious threat to environmental health, risking the contamination of both groundwater and soil resources. Accurate investigation of these sites is essential for implementing effective prevention and control measures. The self-potential (SP) method stands out for its sensitivity to contamination plumes, offering a solution for monitoring and detecting the movement and seepage of subsurface pollutants. However, traditional SP inversion techniques rely heavily on precise subsurface resistivity information. In this study, we propose the Attention U-Net deep learning network for rapid SP inversion. By incorporating an attention mechanism, this algorithm effectively learns the relationship between array-style SP data and the location and extent of subsurface contaminated sources. We designed a synthetic landfill model with a heterogeneous resistivity structure to assess the performance of the Attention U-Net deep learning network. Additionally, we conducted further validation using a laboratory model to assess its practical applicability. The results demonstrate that the algorithm is not solely dependent on resistivity information, enabling effective location of the source distribution even in models with intricate subsurface structures. Our work provides a promising tool for SP data processing, enhancing the applicability of this method in the field of near-subsurface environmental monitoring.
Landslide disasters comprise the majority of geological incidents on slopes, posing severe threats to the safety of human lives and property while exerting a significant impact on the geological environment. The rapid identification of landslides is important for disaster prevention and control; however, landslide identification currently relies mainly on the manual interpretation of remote sensing images. Manual interpretation and feature recognition methods are time-consuming, labor-intensive, and challenging when confronted with complex scenarios. Consequently, automatic landslide recognition has emerged as a pivotal avenue for future development. In this study, a dataset comprising 2000 landslide images was constructed using open-source remote sensing images and datasets. The YOLOv7 model was enhanced using data augmentation algorithms and attention mechanisms. Three optimization models were formulated to realize automatic landslide recognition. The findings demonstrate the commendable performance of the optimized model in automatic landslide recognition, achieving a peak accuracy of 95.92%. Subsequently, the optimized model was applied to regional landslide identification, co-seismic landslide identification, and landslide recognition at various scales, all of which showed robust recognition capabilities. Nevertheless, the model exhibits limitations in detecting small targets, indicating areas for refining the deep-learning algorithms. The results of this research offer valuable technical support for the swift identification, prevention, and mitigation of landslide disasters.
The dominance of Android in the global mobile market and the open development characteristics of this platform have resulted in a significant increase in malware. These malicious applications have become a serious concern for the security of Android systems. To address this problem, researchers have proposed several machine-learning models to detect and classify Android malware by analyzing features extracted from Android samples. However, most existing studies have focused on the classification task and overlooked the feature selection process, which is crucial for reducing the training time while maintaining or improving the classification results. The current paper proposes a new Android malware detection and classification approach that identifies the most important features to improve classification performance and reduce training time. The proposed approach consists of two main steps. First, a feature selection method based on the Attention mechanism is used to select the most important features. Then, an optimized Light Gradient Boosting Machine (LightGBM) classifier is applied to classify the Android samples and identify the malware. The proposed feature selection method integrates an Attention layer into a multilayer perceptron neural network. The role of the Attention layer is to compute a weight for each feature based on its importance for the classification process. Experimental evaluation has shown that combining the Attention-based technique with an optimized classification algorithm for Android malware detection improved the accuracy from 98.64% to 98.71% while reducing the training time from 80 to 28 s.
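The attention-based feature selection step can be pictured with the following minimal sketch. It assumes that the Attention layer has already produced one raw score per feature (the scores below are made up), normalizes them with a softmax, and keeps the top-k features; the function names and the top-k rule are illustrative assumptions, since the abstract does not say how the final subset is chosen, and the downstream LightGBM classifier is not shown.

```python
import math

def attention_weights(scores):
    # Softmax over per-feature scores so the weights are positive and sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_top_k(scores, k):
    """Return the indices of the k features with the largest attention weight.

    Hypothetical selection rule for illustration only.
    """
    weights = attention_weights(scores)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:k])

# Made-up scores for four features (for example, four requested permissions).
learned_scores = [0.1, 2.0, -1.0, 1.5]
selected = select_top_k(learned_scores, 2)  # -> [1, 3]
```

Training the classifier on only the selected columns is what would reduce training time, in line with the reduction the abstract reports.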
To address the challenges in detecting surface floating litter in artificial lakes, including complex environments, uneven illumination, and susceptibility to noise and weather, this paper proposes an efficient and lightweight Ghost-YOLO (You Only Look Once) v8 algorithm. The algorithm integrates advanced attention mechanisms and a small-target detection head to significantly enhance detection performance and efficiency. Firstly, an SE (Squeeze-and-Excitation) mechanism is incorporated into the backbone network to strengthen the extraction of robust features and precise target localization. This mechanism models feature channel dependencies, enabling adaptive adjustment of channel importance and thereby improving recognition of floating litter targets. Secondly, a 160×160 small-target detection layer is designed in the feature fusion neck to mitigate semantic information loss due to varying target scales. This design enhances the fusion of deep and shallow semantic information, improving small-target feature representation and enabling better capture and identification of tiny floating litter. Thirdly, to balance performance and efficiency, the GhostConv module replaces part of the conventional convolutions in the feature fusion neck. Additionally, a novel C2fGhost (CSPDarknet53 to 2-Stage Feature Pyramid Networks Ghost) module is introduced to further reduce network parameters. Lastly, to address the challenge of occlusion, a new loss function, WIoU (Wise Intersection over Union) v3, which incorporates a flexible and non-monotonic concentration approach, is adopted to improve detection rates for surface floating litter. The experiments demonstrate that the proposed Ghost-YOLO v8 model performs well on the Marine dataset: compared with the base model, precision and recall improve by 3.3 and 7.6 percentage points, respectively, mAP@0.5 and mAP@0.5:0.95 improve by 5.3 and 4.4 percentage points, and the computational volume is reduced by 1.88 MB, while the FPS value hardly decreases. Efficient real-time identification of floating debris on the water’s surface can thus be achieved cost-effectively.
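The SE (Squeeze-and-Excitation) mechanism described in this abstract can be sketched in plain Python as three steps: squeeze each channel to its mean, pass the result through a small two-layer bottleneck ending in a sigmoid, and rescale each channel by its gate. The weight matrices `w1` and `w2`, the absence of biases, and the toy feature map are assumptions made for illustration; this is not the Ghost-YOLO v8 code.

```python
import math

def se_block(feature_map, w1, w2):
    """Apply squeeze-and-excitation to a feature map.

    feature_map: list of C channels, each a 2D list (H x W).
    w1: (C/r) x C weights of the reduction layer (hypothetical values).
    w2: C x (C/r) weights of the expansion layer (hypothetical values).
    Returns (rescaled_feature_map, channel_gates).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_map]
    # Excitation, part 1: reduce the channel dimension and apply ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, part 2: expand back to C channels and apply a sigmoid gate.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in ch]
              for g, ch in zip(gates, feature_map)]
    return scaled, gates

# Two channels of size 1x2; all-zero weights give a gate of 0.5 per channel.
fm = [[[2.0, 4.0]], [[6.0, 8.0]]]
scaled, gates = se_block(fm, [[0.0, 0.0]], [[0.0], [0.0]])
```

With trained weights the gates differ per channel, so informative channels are amplified relative to uninformative ones, which is the "adaptive adjustment of channel importance" the abstract refers to.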
Recently, deep image-hiding techniques have attracted considerable attention in covert communication and high-capacity information hiding. However, these approaches have some limitations, such as a lack of cover-image self-adaptability, information leakage, and weak concealment. To address these issues, this study proposes a universal and adaptable image-hiding method. First, a domain attention mechanism is designed by incorporating atrous convolution, which makes better use of the relationship between the secret image domain and the cover image domain. Second, to improve similarity as perceived by humans, a perceptual loss is incorporated into the training process. The experimental results are promising, with the proposed method achieving an average pixel discrepancy (APD) of 1.83 and a peak signal-to-noise ratio (PSNR) of 40.72 dB between the cover and stego images, indicative of its high-quality output. Furthermore, the structural similarity index measure (SSIM) reaches 0.985, while the learned perceptual image patch similarity (LPIPS) registers at 0.0001. Moreover, self-testing and cross-experiments demonstrate the model’s adaptability and generalization in unknown hidden spaces, making it suitable for diverse computer vision tasks.
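Two of the image-quality measures quoted above are simple enough to sketch directly. PSNR follows its standard definition, 10·log10(MAX² / MSE). For APD, the sketch assumes it means the mean absolute per-pixel difference, which the abstract does not spell out. Images are represented as flat lists of 8-bit intensities, and the function names and pixel values are illustrative.

```python
import math

def average_pixel_discrepancy(cover, stego):
    # Assumed definition: mean absolute difference between matching pixels.
    return sum(abs(c - s) for c, s in zip(cover, stego)) / len(cover)

def psnr(cover, stego, max_value=255.0):
    # Peak signal-to-noise ratio in dB, from the mean squared error.
    mse = sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Made-up 4-pixel images that differ by one intensity level everywhere.
cover = [10, 20, 30, 40]
stego = [11, 19, 31, 39]
apd = average_pixel_discrepancy(cover, stego)
quality_db = psnr(cover, stego)
```

A lower APD and a higher PSNR both indicate that the stego image stays close to the cover image; SSIM and LPIPS need structural or learned features and are not shown here.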
With the rapid development of electric power systems, load estimation plays an important role in system operation and planning. Load estimation techniques are usually divided into traditional, time series, regression analysis-based, and machine learning-based estimation. Since machine learning-based methods can deliver better performance, this paper proposes a deep learning-based load estimation algorithm using an image fingerprint and an attention mechanism. First, an image fingerprint construction is proposed for the training data. After data preprocessing, the training data matrix is constructed by cyclic shift and cubic spline interpolation. Then, a linear mapping and a gray-color transformation method are proposed to form the color image fingerprint. Second, a convolutional neural network (CNN) combined with an attention mechanism is proposed to improve training performance. Finally, an experiment is carried out to evaluate the estimation performance. Compared with the support vector machine method, the CNN method, and the long short-term memory method, the proposed algorithm has the best load estimation performance.
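One way to picture the fingerprint construction is the sketch below, which builds a square matrix from a load series by cyclic shifts and then linearly maps the values to 8-bit gray levels. It is an interpretation under stated assumptions: the shift direction, the 0 to 255 range, and the function names are illustrative, and the cubic spline interpolation and gray-to-color transformation mentioned in the abstract are omitted.

```python
def cyclic_shift_matrix(series):
    # Row i is the series rotated left by i positions (assumed shift direction).
    n = len(series)
    return [[series[(i + j) % n] for j in range(n)] for i in range(n)]

def to_gray_levels(matrix):
    # Linear mapping of the value range [min, max] onto integer levels 0..255.
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0 for _ in row] for row in matrix]  # constant input
    return [[int(255 * (v - lo) / (hi - lo)) for v in row] for row in matrix]

# Made-up load samples, for illustration only.
load = [1.0, 2.0, 3.0]
matrix = cyclic_shift_matrix(load)   # [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
fingerprint = to_gray_levels(matrix)
```

Turning a one-dimensional series into a two-dimensional image in this way is what allows an image-oriented CNN with attention to be applied to load data.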
Early screening of diabetic retinopathy (DR) plays an important role in preventing irreversible blindness. Existing research has failed to fully explore effective DR lesion information in fundus images. In addition, traditional attention schemes have not considered the impact of lesion type differences on grading, resulting in unreasonable extraction of important lesion features. Therefore, this paper proposes a DR diagnosis scheme that integrates a multi-level patch attention generator (MPAG) and a lesion localization module (LLM). Firstly, MPAG is used to predict patches of different sizes and generate a weighted attention map based on the prediction score and the types of lesions contained in the patches. This fully considers the impact of lesion type differences on grading and solves the problem that lesion attention maps cannot be further refined and adapted to the final DR diagnosis task. Secondly, the LLM generates a global attention map based on localization. Finally, the weighted attention map and the global attention map are weighted with the fundus image to fully explore effective DR lesion information and increase the attention of the classification network to lesion details. This paper demonstrates the effectiveness of the proposed method through extensive experiments on the public DDR dataset, obtaining an accuracy of 0.8064.
The task of food image recognition,a nuanced subset of fine-grained image recognition,grapples with substantial intra-class variation and minimal inter-class differences.These challenges are compounded by the irregula...The task of food image recognition,a nuanced subset of fine-grained image recognition,grapples with substantial intra-class variation and minimal inter-class differences.These challenges are compounded by the irregular and multi-scale nature of food images.Addressing these complexities,our study introduces an advanced model that leverages multiple attention mechanisms and multi-stage local fusion,grounded in the ConvNeXt architecture.Our model employs hybrid attention(HA)mechanisms to pinpoint critical discriminative regions within images,substantially mitigating the influence of background noise.Furthermore,it introduces a multi-stage local fusion(MSLF)module,fostering long-distance dependencies between feature maps at varying stages.This approach facilitates the assimilation of complementary features across scales,significantly bolstering the model’s capacity for feature extraction.Furthermore,we constructed a dataset named Roushi60,which consists of 60 different categories of common meat dishes.Empirical evaluation of the ETH Food-101,ChineseFoodNet,and Roushi60 datasets reveals that our model achieves recognition accuracies of 91.12%,82.86%,and 92.50%,respectively.These figures not only mark an improvement of 1.04%,3.42%,and 1.36%over the foundational ConvNeXt network but also surpass the performance of most contemporary food image recognition methods.Such advancements underscore the efficacy of our proposed model in navigating the intricate landscape of food image recognition,setting a new benchmark for the field.展开更多
Funding: Supported by the Natural Science Foundation of Shandong Province, China (Grant No. ZR2021MF049) and the Joint Fund of the Natural Science Foundation of Shandong Province (Grant Nos. ZR2022LLZ012 and ZR2021LLZ001).
Abstract: Quantum error correction, a technique that uses redundancy to encode logical information into additional qubits and so better protect the system from noise, is necessary for designing a viable quantum computer. The XYZ^(2) code, a new topological stabilizer code defined on the cellular lattice, is implemented on a hexagonal lattice of qubits and encodes the logical qubits with the help of stabilizer measurements of weight six and weight two. However, topological stabilizer codes in cellular lattice quantum systems suffer from the detrimental effects of noise caused by interaction with the environment, and several decoding approaches have been proposed to address this problem. Here, we propose a state-attention-based reinforcement learning decoder for XYZ^(2) codes, which enables the decoder to focus more accurately on the information related to the current decoding position. Under the optimized conditions, the error correction accuracy of our reinforcement learning decoder reaches 83.27% under the depolarizing noise model, and we measured thresholds of 0.18856 and 0.19043 for XYZ^(2) codes at code distances of 3–7 and 7–11, respectively. Our study provides directions and ideas for applying decoding schemes that combine reinforcement learning and attention mechanisms to other topological quantum error-correcting codes.
Abstract: Recently, Deep Learning (DL) techniques have become a transformative force in machine learning, artificial intelligence, computer vision, and related fields. Researchers and industry have consequently adopted them widely in the medical field for predicting and controlling diverse diseases at specific intervals. Liver tumor prediction is a vital task in analyzing and treating liver diseases. This paper proposes a novel approach for predicting liver tumors using Convolutional Neural Networks (CNN) and a depth-based variant search algorithm with advanced attention mechanisms (CNN-DS-AM). The proposed work aims to improve accuracy and robustness in diagnosing and treating liver diseases. The model is assessed on a Computed Tomography (CT) scan dataset containing both benign and malignant liver tumors. Advanced attention mechanisms were incorporated into the CNN model to identify and highlight the regions of the CT scans most relevant to predicting liver tumors. The results suggest that incorporating attention mechanisms and a depth-based variant search algorithm into the CNN model is a promising approach for improving the accuracy and robustness of liver tumor prediction, and that it can assist radiologists in diagnosis and treatment planning. The proposed system achieved an accuracy of 95.5% in predicting liver tumors, outperforming other state-of-the-art methods.
Abstract: Neurodegeneration is the gradual deterioration and eventual death of brain cells, leading to progressive loss of structure and function of neurons in the brain and nervous system. Neurodegenerative disorders, such as Alzheimer’s, Huntington’s, Parkinson’s, amyotrophic lateral sclerosis, multiple system atrophy, and multiple sclerosis, are characterized by progressive deterioration of brain function, resulting in symptoms such as memory impairment, movement difficulties, and cognitive decline. Early diagnosis of these conditions is crucial to slowing down cell degeneration and reducing the severity of the diseases. Magnetic resonance imaging (MRI) is widely used by neurologists for diagnosing brain abnormalities. Most research in this field processes 2D images extracted from 3D MRI volumetric scans for disease diagnosis, which can lose the volumetric information available in the whole-brain MRI. To address this problem, a novel 3D-CNN architecture with an attention mechanism is proposed to classify whole-brain MRI images for Alzheimer’s disease (AD) detection. The pipeline takes pre-processed MRI volumetric scans as input, and the 3D-CNN model uses channel and spatial attention mechanisms to focus on specific regions of the brain and extract precise feature representations of the input volume for accurate classification. The present study uses the publicly available Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, which has three image classes: Mild Cognitive Impairment (MCI), Cognitive Normal (CN), and AD affected. The proposed approach achieves an overall accuracy of 79% when classifying the three classes and an average accuracy of 87% when distinguishing AD from the other two classes. The findings reveal that 3D-CNN models with an attention mechanism exhibit significantly higher classification performance than other models, highlighting the potential of deep learning algorithms to aid in the early detection and prediction of AD.
Abstract: Spam emails pose a threat to individuals. The daily proliferation of spam emails has rendered traditional machine learning and deep learning methods for screening them ineffective and inefficient. In our research, we employ deep neural networks such as RNN, LSTM, and GRU, incorporating attention mechanisms such as Bahdanau, scaled dot product (SDP), and Luong scaled dot product self-attention, for spam email filtering. We evaluate our approach on various datasets, including Trec spam, Enron spam emails, SMS spam collections, the Ling spam dataset, and a substantial custom dataset. All these datasets are publicly available. For the Enron dataset, we attain an accuracy of 99.97% using LSTM with SDP self-attention. Our custom dataset exhibits its highest accuracy of 99.01% when employing GRU with SDP self-attention. The SMS spam collection dataset yields a peak accuracy of 99.61% with LSTM and SDP attention. On the Ling spam dataset, the GRU (Gated Recurrent Unit) with Luong and SDP attention mechanisms reaches a peak accuracy of 99.89%. For the Trec spam dataset, the most accurate results are achieved using Luong attention with LSTM, with an accuracy of 99.01%. Our performance analyses consistently indicate that employing the scaled dot product attention mechanism in conjunction with gated recurrent units (GRU) delivers the most effective results. In summary, our research underscores the efficacy of advanced deep learning techniques and attention mechanisms for spam email filtering, with high accuracy across multiple datasets. This approach presents a promising solution to the ever-growing problem of spam emails.
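For readers unfamiliar with the scaled dot product (SDP) attention named in this abstract, the following is a minimal sketch of the standard formulation, softmax(QK^T / sqrt(d_k)) V, written with the Python standard library only. It is not the authors' implementation; the function names and the list-of-vectors input layout are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors.

    queries: list of query vectors, each of length d_k
    keys:    list of key vectors, each of length d_k
    values:  list of value vectors, one per key
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs
```

In a recurrent spam classifier of the kind described, the queries, keys, and values would typically all be derived from the LSTM or GRU hidden states (self-attention); that wiring is an assumption here, since the abstract does not spell it out.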
Funding: Supported by the National Key R&D Program of China (Grant No. 2021YFD1200804) and the Strategic Priority Research Program of the Chinese Academy of Sciences (Precision Seed Design and Breeding, Grant No. XDA24010206).
Abstract: Dimensional data directly reflect the growth rate of individual fish, an important economic trait of interest to fish researchers. Efficiently obtaining large-scale fish dimension data would be valuable for both selective breeding and production. To address this, our study proposes a custom dimension measurement method for fish using the YOLOv5-keypoint framework with multi-attention mechanisms. We optimized the YOLOv5 framework, incorporated the SimAM attention mechanism to achieve more accurate and faster fish detection, and added customizable landmarks to the network structure, enabling flexible configuration of the number and location of feature points in the training dataset. This method is applicable to various aquacultural species and other objects. We tested the effectiveness of the method using the economically important grass carp (Ctenopharyngodon idella). The proposed method outperforms pure YOLOv5, Faster R-CNN, and SSD in terms of precision and recall, achieving an average precision of 0.9781. Field trials confirmed the method's measurement accuracy, with more than 97% agreement with manual measurements, at a real-time speed of 38 frames per second on the NVIDIA RTX A4000. This enables efficient and accurate large-scale surface dimension measurements of economic fish. To facilitate massive measurements in agricultural research, we have implemented this method as an online platform called Mode-recognition Ruler (MrRuler, http://bioinfo.ihb.ac.cn/mrruler). The platform identifies objects in a single image at an average speed of 0.486±0.005 s, based on a dataset of 10,000 images. MrRuler includes two preset carp models and allows users to upload training datasets for custom models of their targets of interest.
Funding: National Natural Science Foundation of China (62072392).
Abstract: Crowdsourcing technology is widely recognized for its effectiveness in task scheduling and resource allocation. While traditional methods for task allocation can help reduce costs and improve efficiency, they may encounter challenges when dealing with abnormal data flow nodes, leading to decreased allocation accuracy and efficiency. To address these issues, this study proposes a novel two-part invalid detection task allocation framework. In the first step, an anomaly detection model is developed using a dynamic self-attentive GAN to identify anomalous data. Compared to the baseline method, the model achieves an approximately 4% increase in the F1 value on the public dataset. In the second step, task allocation is modeled using a two-part graph matching method. This phase introduces a P-queue KM algorithm that implements a more efficient optimization strategy. The allocation efficiency is improved by approximately 23.83% compared to the baseline method. Empirical results confirm the effectiveness of the proposed framework in detecting abnormal data nodes, enhancing allocation precision, and achieving efficient allocation.
Funding: Communication University of China (CUC230A013) and the Fundamental Research Funds for the Central Universities.
Abstract: The advent of self-attention mechanisms within Transformer models has significantly propelled the advancement of deep learning algorithms, yielding outstanding achievements across diverse domains. Nonetheless, self-attention mechanisms falter when applied to datasets with intricate semantic content and extensive dependency structures. In response, this paper introduces a Diffusion Sampling and Label-Driven Co-attention Neural Network (DSLD), which adopts a diffusion sampling method to capture more comprehensive semantic information from the data. Additionally, the model uses the joint correlation information of labels and data in computing the text representation, correcting semantic representation biases in the data and increasing the accuracy of the semantic representation. Ultimately, the model computes the corresponding classification results by synthesizing these rich semantic representations. Experiments on seven benchmark datasets show that our proposed model achieves competitive results compared to state-of-the-art methods.
Abstract: Classifying defects from only a few defect samples is a key challenge in the production of mobile phone screens. This paper proposes an attention-relation network for mobile phone screen defect classification. The architecture of the attention-relation network contains two modules: a feature extraction module and a feature metric module. Unlike other few-shot models, our model applies an attention mechanism to metric learning when measuring the distance between features, so as to attend to the correlation between features and suppress unwanted information. In addition, we combine dilated convolution and skip connections to extract more feature information for follow-up processing. We validate the attention-relation network on the mobile phone screen defect dataset. The experimental results show that the classification accuracy of the attention-relation network is 0.9486 under the 5-way 1-shot training strategy and 0.9039 under the 5-way 5-shot setting. It classifies mobile phone screen defects effectively and clearly outperforms the compared methods.
Funding: Science and Technology Research Project of Henan Province (222102240014).
Abstract: Traditional feature-based image stitching techniques often encounter obstacles when dealing with images that lack unique attributes or suffer from quality degradation. The scarcity of annotated datasets for real-life scenes severely undermines the reliability of supervised learning methods in image stitching. Furthermore, existing deep learning architectures designed for image stitching are often too bulky to be deployed on mobile and peripheral computing devices. To address these challenges, this study proposes a novel unsupervised image stitching method based on the YOLOv8 (You Only Look Once version 8) framework that introduces deep homography networks and attention mechanisms. The methodology is partitioned into three stages. The initial stage combines the attention mechanism with a pooling pyramid model to enhance the detection and recognition of compact objects in images, while the deep homography network module estimates the global homography of the input images considering multiple viewpoints. The second stage involves preliminary stitching of the masks generated in the initial stage and further enhancement through weighted computation to eliminate common stitching artifacts. The final stage performs adaptive reconstruction and careful refinement of the initial stitching results. Comprehensive experiments across multiple datasets are carried out to assess the proposed model. Our method’s Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) improved by 10.6% and 6%, respectively. These experimental results confirm the efficacy and utility of the model presented in this paper.
Funding: Scientific Research Fund of Liaoning Provincial Education Department (No. JGLX2021030): Research on Vision-Based Intelligent Perception Technology for the Survival of Benthic Organisms.
Abstract: When underwater robots perform target detection tasks, the color distortion and uneven quality of underwater images make feature extraction difficult for the model, which is then prone to false detections, missed detections, and poor accuracy. Therefore, this paper proposes the CER-YOLOv7 (CBAM-EIOU-RepVGG-YOLOv7) underwater target detection algorithm. To improve the algorithm’s ability to retain valid features from both spatial and channel perspectives during feature extraction, we add a Convolutional Block Attention Module (CBAM) to the backbone network. The Reparameterization Visual Geometry Group (RepVGG) module is inserted into the backbone to improve the training and inference capabilities. The Efficient Intersection over Union (EIoU) loss is used as the localization loss function, which reduces the false detection rate and missed detection rate of the algorithm. Experimental results on the UPRC (Underwater Robot Prototype Competition) dataset show that the mAP (mean Average Precision) score of the CER-YOLOv7 algorithm is 86.1%, a 2.2% improvement over YOLOv7. Ablation and comparison experiments demonstrate the feasibility and validity of CER-YOLOv7 and show that it is better suited to underwater target detection.
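As an illustration of the EIoU localization loss mentioned in this abstract, here is a minimal sketch for a single pair of axis-aligned boxes, following the commonly cited definition: 1 - IoU, plus a centre-distance penalty and separate width and height penalties, each normalized by the smallest enclosing box. The `(x1, y1, x2, y2)` box layout, the function name, and the `eps` stabilizer are assumptions for this example; it is not taken from the CER-YOLOv7 code.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss for two boxes given as (x1, y1, x2, y2) with x2 > x1, y2 > y1."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared distance between box centres, normalized by the enclosing diagonal.
    dx = (px1 + px2) / 2.0 - (tx1 + tx2) / 2.0
    dy = (py1 + py2) / 2.0 - (ty1 + ty2) / 2.0
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Separate penalties for width and height mismatch.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term
```

Unlike plain IoU loss, the extra terms keep the loss informative when the boxes do not overlap at all, which is one reason such variants are used for small or hard-to-localize targets.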
Funding: Supported by the National Key Research and Development Program of China (2021YFB4000505).
Abstract: Fault detection and diagnosis (FDD) plays a significant role in ensuring the safety and stability of chemical processes. With the development of artificial intelligence (AI) and big data technologies, data-driven approaches with excellent performance are widely used for FDD in chemical processes. However, improved predictive accuracy has often been achieved through increased model complexity, which turns models into black boxes and causes uncertainty regarding their decisions. In this study, a causal temporal graph attention network (CTGAN) is proposed for fault diagnosis of chemical processes. A chemical causal graph is built by causal inference to represent the propagation path of faults. The attention mechanism and the chemical causal graph are combined to highlight the key variables related to fault fluctuations. Experiments on the Tennessee Eastman (TE) process and the green ammonia (GA) process showed that CTGAN achieved high performance and good explainability.
Funding: This research work is supported by the Sichuan Science and Technology Program (Grant No. 2022YFS0586), the National Key R&D Program of China (Grant No. 2019YFC1509301), and the National Natural Science Foundation of China (Grant No. 61976046).
Abstract: Predicting landslide displacement is of utmost practical importance because landslides can pose serious threats to both human life and property. However, traditional methods select the sliding window arbitrarily and seldom incorporate weather forecast data for displacement prediction, while a single structural model cannot handle input sequences of different lengths at the same time. To overcome these limitations, this study proposes a new approach that uses weather forecast data and incorporates the maximum information coefficient (MIC), long short-term memory network (LSTM), and attention mechanism to establish a teacher-student coupling model with a parallel structure for short-term landslide displacement prediction. Through MIC, a suitable input sequence length is selected for the LSTM model. To investigate the influence of rainfall on landslides during different seasons, a parallel teacher-student coupling model is developed that can learn sequential information from time series of different lengths. The teacher model learns sequence information from the rainfall intensity time series while incorporating reliable short-term weather forecast data from platforms such as the China Meteorological Administration (CMA) and Reliable Prognosis (https://rp5.ru) to improve the model’s expressive capability, and the student model learns sequence information from the other time series. An attention module is then designed to integrate the different sequence information into a context vector that represents the seasonal temporal attention mode. Finally, the predicted displacement is obtained through a linear layer. The proposed method achieves higher prediction accuracy than the support vector machine (SVM), LSTM, recurrent neural network (RNN), temporal convolutional network (TCN), and LSTM-Attention models. It achieves a mean absolute error (MAE) of 0.072 mm, a root mean square error (RMSE) of 0.096 mm, and a Pearson correlation coefficient (PCCS) of 0.85. Additionally, it exhibits enhanced prediction stability and interpretability, making it a useful tool for landslide disaster prevention and mitigation.
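The three evaluation metrics reported in this abstract (MAE, RMSE, and the Pearson correlation coefficient) have standard definitions, sketched below with the Python standard library. The function names are chosen for this example, and the sample values in the usage note are made up for illustration; they are not data from the study.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

For example, `mae([1, 2, 3], [1, 2, 5])` averages the absolute errors 0, 0, and 2, while `rmse` on the same inputs penalizes the single large error more heavily. MAE and RMSE carry the unit of the displacement (mm in the abstract), whereas the Pearson coefficient is dimensionless.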
Funding: Projects (42174170, 41874145, 72088101) supported by the National Natural Science Foundation of China; Project (CX20200228) supported by the Hunan Provincial Innovation Foundation for Postgraduate, China.
Abstract: Landfill leaks pose a serious threat to environmental health, risking the contamination of both groundwater and soil resources. Accurate investigation of these sites is essential for implementing effective prevention and control measures. The self-potential (SP) method stands out for its sensitivity to contamination plumes, offering a way to monitor and detect the movement and seepage of subsurface pollutants. However, traditional SP inversion techniques rely heavily on precise subsurface resistivity information. In this study, we propose the Attention U-Net deep learning network for rapid SP inversion. By incorporating an attention mechanism, this algorithm effectively learns the relationship between array-style SP data and the location and extent of subsurface contaminated sources. We designed a synthetic landfill model with a heterogeneous resistivity structure to assess the performance of the Attention U-Net network, and we conducted further validation using a laboratory model to assess its practical applicability. The results demonstrate that the algorithm is not solely dependent on resistivity information and can effectively locate the source distribution, even in models with intricate subsurface structures. Our work provides a promising tool for SP data processing, enhancing the applicability of this method in near-subsurface environmental monitoring.
Funding: The authors sincerely appreciate the valuable comments from the anonymous reviewers. The team of Jishunping from Wuhan University is acknowledged for supplying open-source remote sensing data. This research was supported by the Second Tibetan Plateau Scientific Expedition and Research Program (Grant No. 2019QZKK0904) and the National Natural Science Foundation of China (Grant No. U22A20597).
Abstract: Landslide disasters comprise the majority of geological incidents on slopes, posing severe threats to the safety of human lives and property while exerting a significant impact on the geological environment. The rapid identification of landslides is important for disaster prevention and control; however, landslide identification currently relies mainly on the manual interpretation of remote sensing images. Manual interpretation and feature recognition methods are time-consuming, labor-intensive, and challenging when confronted with complex scenarios. Consequently, automatic landslide recognition has emerged as a pivotal avenue for future development. In this study, a dataset comprising 2000 landslide images was constructed using open-source remote sensing images and datasets. The YOLOv7 model was enhanced using data augmentation algorithms and attention mechanisms, and three optimized models were formulated to realize automatic landslide recognition. The findings demonstrate the strong performance of the optimized model in automatic landslide recognition, with a peak accuracy of 95.92%. Subsequently, the optimized model was applied to regional landslide identification, co-seismic landslide identification, and landslide recognition at various scales, all of which showed robust recognition capabilities. Nevertheless, the model exhibits limitations in detecting small targets, indicating areas for refining the deep-learning algorithms. The results of this research offer valuable technical support for the swift identification, prevention, and mitigation of landslide disasters.
Funding: This work was funded by the Deanship of Graduate Studies and Scientific Research at Jouf University under Grant No. DGSSR-2023-02-02178.
Abstract: The dominance of Android in the global mobile market and the open development characteristics of this platform have resulted in a significant increase in malware. These malicious applications have become a serious concern for the security of Android systems. To address this problem, researchers have proposed several machine-learning models to detect and classify Android malware based on features extracted from Android samples. However, most existing studies have focused on the classification task and overlooked the feature selection process, which is crucial for reducing the training time and maintaining or improving the classification results. This paper proposes a new Android malware detection and classification approach that identifies the most important features to improve classification performance and reduce training time. The proposed approach consists of two main steps. First, a feature selection method based on the Attention mechanism is used to select the most important features. Then, an optimized Light Gradient Boosting Machine (LightGBM) classifier is applied to classify the Android samples and identify the malware. The proposed feature selection method integrates an Attention layer into a multilayer perceptron neural network. The role of the Attention layer is to compute a weight for each feature based on its importance to the classification process. Experimental evaluation shows that combining the Attention-based technique with an optimized classification algorithm for Android malware detection improved the accuracy from 98.64% to 98.71% while reducing the training time from 80 to 28 s.
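The selection step described above can be pictured with a small sketch: per-feature scores are normalized into attention weights with a softmax, and the features with the largest weights are kept. This is only one plausible reading of the abstract. In the paper the scores come from an Attention layer trained inside a multilayer perceptron; here they are simply passed in, and the function names and example feature names are hypothetical.

```python
import math


def attention_weights(scores):
    """Turn raw per-feature scores into weights that sum to 1 (softmax)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_top_k(feature_names, scores, k):
    """Return the k feature names with the largest attention weights.

    `scores` stands in for the learned output of an attention layer;
    how those scores are trained is outside the scope of this sketch.
    """
    weights = attention_weights(scores)
    ranked = sorted(zip(feature_names, weights), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]
```

The reduced feature set would then be passed to the downstream classifier (LightGBM in the abstract), which is where the reported saving in training time would come from.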
Funding: Supported by the Henan Province Science and Technology Research Project (No. 242102210213).
Abstract: Detecting surface floating litter in artificial lakes is challenging because of complex environments, uneven illumination, and susceptibility to noise and weather. To address these challenges, this paper proposes an efficient and lightweight Ghost-YOLO (You Only Look Once) v8 algorithm that integrates advanced attention mechanisms and a small-target detection head to significantly enhance detection performance and efficiency. Firstly, an SE (Squeeze-and-Excitation) mechanism is incorporated into the backbone network to strengthen the extraction of robust features and precise target localization. This mechanism models feature channel dependencies, enabling adaptive adjustment of channel importance and thereby improving recognition of floating litter targets. Secondly, a 160×160 small-target detection layer is designed in the feature fusion neck to mitigate semantic information loss due to varying target scales. This design enhances the fusion of deep and shallow semantic information, improving small-target feature representation and enabling better capture and identification of tiny floating litter. Thirdly, to balance performance and efficiency, the GhostConv module replaces part of the conventional convolutions in the feature fusion neck. Additionally, a novel C2fGhost (CSPDarknet53 to 2-Stage Feature Pyramid Networks Ghost) module is introduced to further reduce network parameters. Lastly, to address the challenge of occlusion, a new loss function, WIoU (Wise Intersection over Union) v3, which incorporates a flexible and non-monotonic concentration approach, is adopted to improve detection rates for surface floating litter. The experiments demonstrate that the proposed Ghost-YOLO v8 model performs well on the Marine dataset: compared with the base model, precision and recall improve by 3.3 and 7.6 percentage points, respectively, mAP@0.5 and mAP@0.5:0.95 improve by 5.3 and 4.4 percentage points, the computational volume is reduced by 1.88 MB, and the FPS value hardly decreases, so efficient real-time identification of floating debris on the water’s surface can be achieved cost-effectively.
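To make the Squeeze-and-Excitation (SE) mechanism mentioned above concrete, the sketch below applies the three standard SE steps to a small feature map: squeeze (global average pooling per channel), excitation (a two-layer bottleneck with ReLU and sigmoid), and channel-wise rescaling. It uses plain nested lists and omits bias terms for brevity; the weight matrices `w1` and `w2` are placeholders that a real network would learn, and none of this is taken from the Ghost-YOLO v8 code.

```python
import math


def se_block(feature_map, w1, w2):
    """Apply a Squeeze-and-Excitation block to a C x H x W feature map.

    feature_map: list of C channels, each an H x W nested list of floats
    w1: (C // r) x C weight matrix of the reduction layer (r = reduction ratio)
    w2: C x (C // r) weight matrix of the expansion layer
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    return [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
```

Each gate lies between 0 and 1, so the block can only attenuate channels relative to one another; this is the "adaptive adjustment of channel importance" the abstract refers to.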
Funding: Supported by the National Key R&D Program of China (Grant Number 2021YFB2700900), the National Natural Science Foundation of China (Grant Numbers 62172232, 62172233), and the Jiangsu Basic Research Program Natural Science Foundation (Grant Number BK20200039).
Abstract: Recently, deep image-hiding techniques have attracted considerable attention in covert communication and high-capacity information hiding. However, these approaches have some limitations, such as cover images that lack self-adaptability, information leakage, and weak concealment. To address these issues, this study proposes a universal and adaptable image-hiding method. First, a domain attention mechanism is designed by incorporating atrous convolution, which makes better use of the relationship between the secret image domain and the cover image domain. Second, to improve perceived similarity for human viewers, a perceptual loss is incorporated into the training process. The experimental results are promising: the proposed method achieves an average pixel discrepancy (APD) of 1.83 and a peak signal-to-noise ratio (PSNR) of 40.72 dB between the cover and stego images, indicating high-quality output. Furthermore, the structural similarity index measure (SSIM) reaches 0.985, while the learned perceptual image patch similarity (LPIPS) registers at 0.0001. Moreover, self-testing and cross-experiments demonstrate the model’s adaptability and generalization in unknown hidden spaces, making it suitable for diverse computer vision tasks.
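The PSNR figure quoted in this abstract follows the standard definition PSNR = 10 log10(MAX^2 / MSE), where MSE is the mean squared pixel difference between the cover and stego images. A minimal sketch for grayscale images stored as nested lists is shown below; the function name, the 8-bit default of 255, and the convention of returning infinity for identical images are assumptions for this example.

```python
import math


def psnr(cover, stego, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized grayscale images.

    cover, stego: H x W nested lists of pixel intensities
    max_value: largest possible pixel value (255 for 8-bit images)
    """
    squared_errors = [
        (c - s) ** 2
        for cover_row, stego_row in zip(cover, stego)
        for c, s in zip(cover_row, stego_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the stego image is closer to the cover image; because the scale is logarithmic, each additional 10 dB corresponds to a tenfold reduction in mean squared error.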
Abstract: With the rapid development of electric power systems, load estimation plays an important role in system operation and planning. Load estimation techniques usually include traditional, time series, regression analysis-based, and machine learning-based estimation. Since machine learning-based methods can achieve better performance, this paper proposes a deep learning-based load estimation algorithm using an image fingerprint and an attention mechanism. First, an image fingerprint construction is proposed for the training data. After data preprocessing, the training data matrix is constructed by cyclic shift and cubic spline interpolation. Then, linear mapping and a gray-color transformation method are proposed to form the color image fingerprint. Second, a convolutional neural network (CNN) combined with an attention mechanism is proposed to improve training performance. Finally, an experiment is carried out to evaluate the estimation performance. Compared with the support vector machine method, the CNN method, and the long short-term memory method, the proposed algorithm has the best load estimation performance.
Funding: Supported in part by the Research on the Application of Multimodal Artificial Intelligence in Diagnosis and Treatment of Type 2 Diabetes under Grant No. 2020SK50910, and in part by the Hunan Provincial Natural Science Foundation of China under Grant 2023JJ60020.
Abstract: Early screening of diabetic retinopathy (DR) plays an important role in preventing irreversible blindness. Existing research has failed to fully explore effective DR lesion information in fundus maps. In addition, traditional attention schemes have not considered the impact of lesion type differences on grading, resulting in unreasonable extraction of important lesion features. Therefore, this paper proposes a DR diagnosis scheme that integrates a multi-level patch attention generator (MPAG) and a lesion localization module (LLM). Firstly, the MPAG is used to predict patches of different sizes and generate a weighted attention map based on the prediction score and the types of lesions contained in the patches. This fully considers the impact of lesion type differences on grading and solves the problem that the attention maps of lesions cannot be further refined and then adapted to the final DR diagnosis task. Secondly, the LLM generates a global attention map based on localization. Finally, the weighted attention map and the global attention map are weighted with the fundus map to fully explore effective DR lesion information and increase the attention of the classification network to lesion details. This paper demonstrates the effectiveness of the proposed method through extensive experiments on the public DDR dataset, obtaining an accuracy of 0.8064.
Funding: The support of this research by the Hubei Provincial Natural Science Foundation (2022CFB449) and the Science Research Foundation of the Education Department of Hubei Province (B2020061) is gratefully acknowledged.
Abstract: Food image recognition, a subset of fine-grained image recognition, must cope with substantial intra-class variation and minimal inter-class differences. These challenges are compounded by the irregular and multi-scale nature of food images. To address these complexities, our study introduces a model that leverages multiple attention mechanisms and multi-stage local fusion, built on the ConvNeXt architecture. Our model employs hybrid attention (HA) mechanisms to pinpoint critical discriminative regions within images, substantially mitigating the influence of background noise. It also introduces a multi-stage local fusion (MSLF) module, which establishes long-distance dependencies between feature maps at different stages. This approach facilitates the assimilation of complementary features across scales, significantly strengthening the model’s capacity for feature extraction. In addition, we constructed a dataset named Roushi60, which consists of 60 different categories of common meat dishes. Empirical evaluation on the ETH Food-101, ChineseFoodNet, and Roushi60 datasets shows that our model achieves recognition accuracies of 91.12%, 82.86%, and 92.50%, respectively. These figures mark improvements of 1.04%, 3.42%, and 1.36% over the baseline ConvNeXt network and surpass the performance of most contemporary food image recognition methods, underscoring the efficacy of the proposed model for food image recognition.