The satellite-terrestrial networks possess the ability to transcend geographical constraints inherent in traditional communication networks,enabling global coverage and offering users ubiquitous computing power suppor...The satellite-terrestrial networks possess the ability to transcend geographical constraints inherent in traditional communication networks,enabling global coverage and offering users ubiquitous computing power support,which is an important development direction of future communications.In this paper,we take into account a multi-scenario network model under the coverage of low earth orbit(LEO)satellite,which can provide computing resources to users in faraway areas to improve task processing efficiency.However,LEO satellites experience limitations in computing and communication resources and the channels are time-varying and complex,which makes the extraction of state information a daunting task.Therefore,we explore the dynamic resource management issue pertaining to joint computing,communication resource allocation and power control for multi-access edge computing(MEC).In order to tackle this formidable issue,we undertake the task of transforming the issue into a Markov decision process(MDP)problem and propose the self-attention based dynamic resource management(SABDRM)algorithm,which effectively extracts state information features to enhance the training process.Simulation results show that the proposed algorithm is capable of effectively reducing the long-term average delay and energy consumption of the tasks.展开更多
The frequent missing values in radar-derived time-series tracks of aerial targets(RTT-AT)lead to significant challenges in subsequent data-driven tasks.However,the majority of imputation research focuses on random mis...The frequent missing values in radar-derived time-series tracks of aerial targets(RTT-AT)lead to significant challenges in subsequent data-driven tasks.However,the majority of imputation research focuses on random missing(RM)that differs significantly from common missing patterns of RTT-AT.The method for solving the RM may experience performance degradation or failure when applied to RTT-AT imputation.Conventional autoregressive deep learning methods are prone to error accumulation and long-term dependency loss.In this paper,a non-autoregressive imputation model that addresses the issue of missing value imputation for two common missing patterns in RTT-AT is proposed.Our model consists of two probabilistic sparse diagonal masking self-attention(PSDMSA)units and a weight fusion unit.It learns missing values by combining the representations outputted by the two units,aiming to minimize the difference between the missing values and their actual values.The PSDMSA units effectively capture temporal dependencies and attribute correlations between time steps,improving imputation quality.The weight fusion unit automatically updates the weights of the output representations from the two units to obtain a more accurate final representation.The experimental results indicate that,despite varying missing rates in the two missing patterns,our model consistently outperforms other methods in imputation performance and exhibits a low frequency of deviations in estimates for specific missing entries.Compared to the state-of-the-art autoregressive deep learning imputation model Bidirectional Recurrent Imputation for Time Series(BRITS),our proposed model reduces mean absolute error(MAE)by 31%~50%.Additionally,the model attains a training speed that is 4 to 8 times faster when compared to both BRITS and a standard Transformer model when trained on the same dataset.Finally,the findings from the ablation experiments demonstrate that the PSDMSA,the weight fusion unit,cascade network design,and imputation loss enhance imputation performance and confirm the efficacy of our design.展开更多
Aerial threat assessment is a crucial link in modern air combat, whose result counts a great deal for commanders to make decisions. With the consideration that the existing threat assessment methods have difficulties ...Aerial threat assessment is a crucial link in modern air combat, whose result counts a great deal for commanders to make decisions. With the consideration that the existing threat assessment methods have difficulties in dealing with high dimensional time series target data, a threat assessment method based on self-attention mechanism and gated recurrent unit(SAGRU) is proposed. Firstly, a threat feature system including air combat situations and capability features is established. Moreover, a data augmentation process based on fractional Fourier transform(FRFT) is applied to extract more valuable information from time series situation features. Furthermore, aiming to capture key characteristics of battlefield evolution, a bidirectional GRU and SA mechanisms are designed for enhanced features.Subsequently, after the concatenation of the processed air combat situation and capability features, the target threat level will be predicted by fully connected neural layers and the softmax classifier. Finally, in order to validate this model, an air combat dataset generated by a combat simulation system is introduced for model training and testing. The comparison experiments show the proposed model has structural rationality and can perform threat assessment faster and more accurately than the other existing models based on deep learning.展开更多
Traditional models for semantic segmentation in point clouds primarily focus on smaller scales.However,in real-world applications,point clouds often exhibit larger scales,leading to heavy computational and memory requ...Traditional models for semantic segmentation in point clouds primarily focus on smaller scales.However,in real-world applications,point clouds often exhibit larger scales,leading to heavy computational and memory requirements.The key to handling large-scale point clouds lies in leveraging random sampling,which offers higher computational efficiency and lower memory consumption compared to other sampling methods.Nevertheless,the use of random sampling can potentially result in the loss of crucial points during the encoding stage.To address these issues,this paper proposes cross-fusion self-attention network(CFSA-Net),a lightweight and efficient network architecture specifically designed for directly processing large-scale point clouds.At the core of this network is the incorporation of random sampling alongside a local feature extraction module based on cross-fusion self-attention(CFSA).This module effectively integrates long-range contextual dependencies between points by employing hierarchical position encoding(HPC).Furthermore,it enhances the interaction between each point’s coordinates and feature information through cross-fusion self-attention pooling,enabling the acquisition of more comprehensive geometric information.Finally,a residual optimization(RO)structure is introduced to extend the receptive field of individual points by stacking hierarchical position encoding and cross-fusion self-attention pooling,thereby reducing the impact of information loss caused by random sampling.Experimental results on the Stanford Large-Scale 3D Indoor Spaces(S3DIS),Semantic3D,and SemanticKITTI datasets demonstrate the superiority of this algorithm over advanced approaches such as RandLA-Net and KPConv.These findings underscore the excellent performance of CFSA-Net in large-scale 3D semantic segmentation.展开更多
Due to the lack of long-range association and spatial location information,fine details and accurate boundaries of complex clothing images cannot always be obtained by using the existing deep learning-based methods.Th...Due to the lack of long-range association and spatial location information,fine details and accurate boundaries of complex clothing images cannot always be obtained by using the existing deep learning-based methods.This paper presents a convolutional structure with multi-scale fusion to optimize the step of clothing feature extraction and a self-attention module to capture long-range association information.The structure enables the self-attention mechanism to directly participate in the process of information exchange through the down-scaling projection operation of the multi-scale framework.In addition,the improved self-attention module introduces the extraction of 2-dimensional relative position information to make up for its lack of ability to extract spatial position features from clothing images.The experimental results based on the colorful fashion parsing dataset(CFPD)show that the proposed network structure achieves 53.68%mean intersection over union(mIoU)and has better performance on the clothing parsing task.展开更多
Background A crucial element of human-machine interaction,the automatic detection of emotional states from human speech has long been regarded as a challenging task for machine learning models.One vital challenge in s...Background A crucial element of human-machine interaction,the automatic detection of emotional states from human speech has long been regarded as a challenging task for machine learning models.One vital challenge in speech emotion recognition(SER)is learning robust and discriminative representations from speech.Although machine learning methods have been widely applied in SER research,the inadequate amount of available annotated data has become a bottleneck impeding the extended application of such techniques(e.g.,deep neural networks).To address this issue,we present a deep learning method that combines knowledge transfer and self-attention for SER tasks.Herein,we apply the log-Mel spectrogram with deltas and delta-deltas as inputs.Moreover,given that emotions are time dependent,we apply temporal convolutional neural networks to model the variations in emotions.We further introduce an attention transfer mechanism,which is based on a self-attention algorithm to learn long-term dependencies.The self-attention transfer network(SATN)in our proposed approach takes advantage of attention transfer to learn attention from speech recognition,followed by transferring this knowledge into SER.An evaluation built on Interactive Emotional Dyadic Motion Capture(IEMOCAP)dataset demonstrates the effectiveness of the proposed model.展开更多
Keyphrase greatly provides summarized and valuable information.This information can help us not only understand text semantics,but also organize and retrieve text content effectively.The task of automatically generati...Keyphrase greatly provides summarized and valuable information.This information can help us not only understand text semantics,but also organize and retrieve text content effectively.The task of automatically generating it has received considerable attention in recent decades.From the previous studies,we can see many workable solutions for obtaining keyphrases.One method is to divide the content to be summarized into multiple blocks of text,then we rank and select the most important content.The disadvantage of this method is that it cannot identify keyphrase that does not include in the text,let alone get the real semantic meaning hidden in the text.Another approach uses recurrent neural networks to generate keyphrases from the semantic aspects of the text,but the inherently sequential nature precludes parallelization within training examples,and distances have limitations on context dependencies.Previous works have demonstrated the benefits of the self-attention mechanism,which can learn global text dependency features and can be parallelized.Inspired by the above observation,we propose a keyphrase generation model,which is based entirely on the self-attention mechanism.It is an encoder-decoder model that can make up the above disadvantage effectively.In addition,we also consider the semantic similarity between keyphrases,and add semantic similarity processing module into the model.This proposed model,which is demonstrated by empirical analysis on five datasets,can achieve competitive performance compared to baseline methods.展开更多
On Twitter,people often use hashtags to mark the subject of a tweet.Tweets have specific themes or content that are easy for people to manage.With the increase in the number of tweets,how to automatically recommend ha...On Twitter,people often use hashtags to mark the subject of a tweet.Tweets have specific themes or content that are easy for people to manage.With the increase in the number of tweets,how to automatically recommend hashtags for tweets has received wide attention.The previous hashtag recommendation methods were to convert the task into a multi-class classification problem.However,these methods can only recommend hashtags that appeared in historical information,and cannot recommend the new ones.In this work,we extend the self-attention mechanism to turn the hashtag recommendation task into a sequence labeling task.To train and evaluate the proposed method,we used the real tweet data which is collected from Twitter.Experimental results show that the proposed method can be significantly better than the most advanced method.Compared with the state-of-the-art methods,the accuracy of our method has been increased 4%.展开更多
Infrared image recognition plays an important role in the inspection of power equipment.Existing technologies dedicated to this purpose often require manually selected features,which are not transferable and interpret...Infrared image recognition plays an important role in the inspection of power equipment.Existing technologies dedicated to this purpose often require manually selected features,which are not transferable and interpretable,and have limited training data.To address these limitations,this paper proposes an automatic infrared image recognition framework,which includes an object recognition module based on a deep self-attention network and a temperature distribution identification module based on a multi-factor similarity calculation.First,the features of an input image are extracted and embedded using a multi-head attention encoding-decoding mechanism.Thereafter,the embedded features are used to predict the equipment component category and location.In the located area,preliminary segmentation is performed.Finally,similar areas are gradually merged,and the temperature distribution of the equipment is obtained to identify a fault.Our experiments indicate that the proposed method demonstrates significantly improved accuracy compared with other related methods and,hence,provides a good reference for the automation of power equipment inspection.展开更多
Relation extraction is an important task in NLP community.However,some models often fail in capturing Long-distance dependence on semantics,and the interaction between semantics of two entities is ignored.In this pape...Relation extraction is an important task in NLP community.However,some models often fail in capturing Long-distance dependence on semantics,and the interaction between semantics of two entities is ignored.In this paper,we propose a novel neural network model for semantic relation classification called joint self-attention bi-LSTM(SA-Bi-LSTM)to model the internal structure of the sentence to obtain the importance of each word of the sentence without relying on additional information,and capture Long-distance dependence on semantics.We conduct experiments using the SemEval-2010 Task 8 dataset.Extensive experiments and the results demonstrated that the proposed method is effective against relation classification,which can obtain state-ofthe-art classification accuracy just with minimal feature engineering.展开更多
Visual Question Answering(VQA)has attracted extensive research focus and has become a hot topic in deep learning recently.The development of computer vision and natural language processing technology has contributed t...Visual Question Answering(VQA)has attracted extensive research focus and has become a hot topic in deep learning recently.The development of computer vision and natural language processing technology has contributed to the advancement of this research area.Key solutions to improve the performance of VQA system exist in feature extraction,multimodal fusion,and answer prediction modules.There exists an unsolved issue in the popular VQA image feature extraction module that extracts the fine-grained features from objects of different scale difficultly.In this paper,a novel feature extraction network that combines multi-scale convolution and self-attention branches to solve the above problem is designed.Our approach achieves the state-of-the-art performance of a single model on Pascal VOC 2012,VQA 1.0,and VQA 2.0 datasets.展开更多
Traditional based deep learning intrusion detection methods face problems such as insufficient cloud storage,data privacy leaks,high com-munication costs,unsatisfactory detection rates,and false positive rate.To addre...Traditional based deep learning intrusion detection methods face problems such as insufficient cloud storage,data privacy leaks,high com-munication costs,unsatisfactory detection rates,and false positive rate.To address existing issues in intrusion detection,this paper presents a novel approach called CS-FL,which combines Federated Learning and a Self-Attention Fusion Convolutional Neural Network.Federated Learning is a new distributed computing model that enables individual training of client data without uploading local data to a central server.at the same time,local training results are uploaded and integrated across all participating clients to produce a global model.The sharing model reduces communication costs,protects data privacy,and solves problems such as insufficient cloud storage and“data islands”for each client.In the proposed method,a hybrid model is formed by integrating the self-Attention and similar parts of the Convolutional Neural Network in the local data processing.This approach not only enhances the performance of the hybrid model but also reduces computational overhead compared to pure hybrid neural networks.Results from experiments on the NSL-KDD dataset show that the proposed method outperforms other intrusion detection techniques,resulting in a significant improvement in performance.This demonstrates the effectiveness of the proposed approach in improving intrusion detection accuracy.展开更多
基金supported by the National Key Research and Development Plan(No.2022YFB2902701)the key Natural Science Foundation of Shenzhen(No.JCYJ20220818102209020).
文摘The satellite-terrestrial networks possess the ability to transcend geographical constraints inherent in traditional communication networks,enabling global coverage and offering users ubiquitous computing power support,which is an important development direction of future communications.In this paper,we take into account a multi-scenario network model under the coverage of low earth orbit(LEO)satellite,which can provide computing resources to users in faraway areas to improve task processing efficiency.However,LEO satellites experience limitations in computing and communication resources and the channels are time-varying and complex,which makes the extraction of state information a daunting task.Therefore,we explore the dynamic resource management issue pertaining to joint computing,communication resource allocation and power control for multi-access edge computing(MEC).In order to tackle this formidable issue,we undertake the task of transforming the issue into a Markov decision process(MDP)problem and propose the self-attention based dynamic resource management(SABDRM)algorithm,which effectively extracts state information features to enhance the training process.Simulation results show that the proposed algorithm is capable of effectively reducing the long-term average delay and energy consumption of the tasks.
基金supported by Graduate Funded Project(No.JY2022A017).
文摘The frequent missing values in radar-derived time-series tracks of aerial targets(RTT-AT)lead to significant challenges in subsequent data-driven tasks.However,the majority of imputation research focuses on random missing(RM)that differs significantly from common missing patterns of RTT-AT.The method for solving the RM may experience performance degradation or failure when applied to RTT-AT imputation.Conventional autoregressive deep learning methods are prone to error accumulation and long-term dependency loss.In this paper,a non-autoregressive imputation model that addresses the issue of missing value imputation for two common missing patterns in RTT-AT is proposed.Our model consists of two probabilistic sparse diagonal masking self-attention(PSDMSA)units and a weight fusion unit.It learns missing values by combining the representations outputted by the two units,aiming to minimize the difference between the missing values and their actual values.The PSDMSA units effectively capture temporal dependencies and attribute correlations between time steps,improving imputation quality.The weight fusion unit automatically updates the weights of the output representations from the two units to obtain a more accurate final representation.The experimental results indicate that,despite varying missing rates in the two missing patterns,our model consistently outperforms other methods in imputation performance and exhibits a low frequency of deviations in estimates for specific missing entries.Compared to the state-of-the-art autoregressive deep learning imputation model Bidirectional Recurrent Imputation for Time Series(BRITS),our proposed model reduces mean absolute error(MAE)by 31%~50%.Additionally,the model attains a training speed that is 4 to 8 times faster when compared to both BRITS and a standard Transformer model when trained on the same dataset.Finally,the findings from the ablation experiments demonstrate that the PSDMSA,the weight fusion unit,cascade network design,and imputation loss enhance imputation performance and confirm the efficacy of our design.
基金supported by the National Natural Science Foundation of China (6202201562088101)+1 种基金Shanghai Municipal Science and Technology Major Project (2021SHZDZX0100)Shanghai Municip al Commission of Science and Technology Project (19511132101)。
文摘Aerial threat assessment is a crucial link in modern air combat, whose result counts a great deal for commanders to make decisions. With the consideration that the existing threat assessment methods have difficulties in dealing with high dimensional time series target data, a threat assessment method based on self-attention mechanism and gated recurrent unit(SAGRU) is proposed. Firstly, a threat feature system including air combat situations and capability features is established. Moreover, a data augmentation process based on fractional Fourier transform(FRFT) is applied to extract more valuable information from time series situation features. Furthermore, aiming to capture key characteristics of battlefield evolution, a bidirectional GRU and SA mechanisms are designed for enhanced features.Subsequently, after the concatenation of the processed air combat situation and capability features, the target threat level will be predicted by fully connected neural layers and the softmax classifier. Finally, in order to validate this model, an air combat dataset generated by a combat simulation system is introduced for model training and testing. The comparison experiments show the proposed model has structural rationality and can perform threat assessment faster and more accurately than the other existing models based on deep learning.
基金funded by the National Natural Science Foundation of China Youth Project(61603127).
文摘Traditional models for semantic segmentation in point clouds primarily focus on smaller scales.However,in real-world applications,point clouds often exhibit larger scales,leading to heavy computational and memory requirements.The key to handling large-scale point clouds lies in leveraging random sampling,which offers higher computational efficiency and lower memory consumption compared to other sampling methods.Nevertheless,the use of random sampling can potentially result in the loss of crucial points during the encoding stage.To address these issues,this paper proposes cross-fusion self-attention network(CFSA-Net),a lightweight and efficient network architecture specifically designed for directly processing large-scale point clouds.At the core of this network is the incorporation of random sampling alongside a local feature extraction module based on cross-fusion self-attention(CFSA).This module effectively integrates long-range contextual dependencies between points by employing hierarchical position encoding(HPC).Furthermore,it enhances the interaction between each point’s coordinates and feature information through cross-fusion self-attention pooling,enabling the acquisition of more comprehensive geometric information.Finally,a residual optimization(RO)structure is introduced to extend the receptive field of individual points by stacking hierarchical position encoding and cross-fusion self-attention pooling,thereby reducing the impact of information loss caused by random sampling.Experimental results on the Stanford Large-Scale 3D Indoor Spaces(S3DIS),Semantic3D,and SemanticKITTI datasets demonstrate the superiority of this algorithm over advanced approaches such as RandLA-Net and KPConv.These findings underscore the excellent performance of CFSA-Net in large-scale 3D semantic segmentation.
文摘Due to the lack of long-range association and spatial location information,fine details and accurate boundaries of complex clothing images cannot always be obtained by using the existing deep learning-based methods.This paper presents a convolutional structure with multi-scale fusion to optimize the step of clothing feature extraction and a self-attention module to capture long-range association information.The structure enables the self-attention mechanism to directly participate in the process of information exchange through the down-scaling projection operation of the multi-scale framework.In addition,the improved self-attention module introduces the extraction of 2-dimensional relative position information to make up for its lack of ability to extract spatial position features from clothing images.The experimental results based on the colorful fashion parsing dataset(CFPD)show that the proposed network structure achieves 53.68%mean intersection over union(mIoU)and has better performance on the clothing parsing task.
文摘随着手机短信成为人们日常生活交往的重要手段,垃圾短信的识别具有重要的现实意义.针对此提出一种结合TFIDF的self-attention-based Bi-LSTM的神经网络模型.该模型首先将短信文本以词向量的方式输入到Bi-LSTM层,经过特征提取并结合TFIDF和self-attention层的信息聚焦获得最后的特征向量,最后将特征向量通过Softmax分类器进行分类得到短信文本分类结果.实验结果表明,结合TFIDF的self-attention-based Bi-LSTM模型相比于传统分类模型的短信文本识别准确率提高了2.1%–4.6%,运行时间减少了0.6 s–10.2 s.
基金the National Natural Science Foundation of China(62071330)the National Science Fund for Distinguished Young Scholars(61425017)+3 种基金the Key Program of the National Natural Science Foundation(61831022)the Key Program of the Natural Science Foundation of Tianjin(18JCZDJC36300)the Open Projects Program of the National Laboratory of Pattern Recognition and the Senior Visiting Scholar Program of Tianjin Normal Universitythe Innovative Medicines Initiative 2 Joint Undertaking(115902),which receives support from the European Union's Horizon 2020 research and innovation program and EFPIA.
文摘Background A crucial element of human-machine interaction,the automatic detection of emotional states from human speech has long been regarded as a challenging task for machine learning models.One vital challenge in speech emotion recognition(SER)is learning robust and discriminative representations from speech.Although machine learning methods have been widely applied in SER research,the inadequate amount of available annotated data has become a bottleneck impeding the extended application of such techniques(e.g.,deep neural networks).To address this issue,we present a deep learning method that combines knowledge transfer and self-attention for SER tasks.Herein,we apply the log-Mel spectrogram with deltas and delta-deltas as inputs.Moreover,given that emotions are time dependent,we apply temporal convolutional neural networks to model the variations in emotions.We further introduce an attention transfer mechanism,which is based on a self-attention algorithm to learn long-term dependencies.The self-attention transfer network(SATN)in our proposed approach takes advantage of attention transfer to learn attention from speech recognition,followed by transferring this knowledge into SER.An evaluation built on Interactive Emotional Dyadic Motion Capture(IEMOCAP)dataset demonstrates the effectiveness of the proposed model.
文摘Keyphrase greatly provides summarized and valuable information.This information can help us not only understand text semantics,but also organize and retrieve text content effectively.The task of automatically generating it has received considerable attention in recent decades.From the previous studies,we can see many workable solutions for obtaining keyphrases.One method is to divide the content to be summarized into multiple blocks of text,then we rank and select the most important content.The disadvantage of this method is that it cannot identify keyphrase that does not include in the text,let alone get the real semantic meaning hidden in the text.Another approach uses recurrent neural networks to generate keyphrases from the semantic aspects of the text,but the inherently sequential nature precludes parallelization within training examples,and distances have limitations on context dependencies.Previous works have demonstrated the benefits of the self-attention mechanism,which can learn global text dependency features and can be parallelized.Inspired by the above observation,we propose a keyphrase generation model,which is based entirely on the self-attention mechanism.It is an encoder-decoder model that can make up the above disadvantage effectively.In addition,we also consider the semantic similarity between keyphrases,and add semantic similarity processing module into the model.This proposed model,which is demonstrated by empirical analysis on five datasets,can achieve competitive performance compared to baseline methods.
文摘On Twitter,people often use hashtags to mark the subject of a tweet.Tweets have specific themes or content that are easy for people to manage.With the increase in the number of tweets,how to automatically recommend hashtags for tweets has received wide attention.The previous hashtag recommendation methods were to convert the task into a multi-class classification problem.However,these methods can only recommend hashtags that appeared in historical information,and cannot recommend the new ones.In this work,we extend the self-attention mechanism to turn the hashtag recommendation task into a sequence labeling task.To train and evaluate the proposed method,we used the real tweet data which is collected from Twitter.Experimental results show that the proposed method can be significantly better than the most advanced method.Compared with the state-of-the-art methods,the accuracy of our method has been increased 4%.
基金This work was supported by National Key R&D Program of China(2019YFE0102900).
文摘Infrared image recognition plays an important role in the inspection of power equipment.Existing technologies dedicated to this purpose often require manually selected features,which are not transferable and interpretable,and have limited training data.To address these limitations,this paper proposes an automatic infrared image recognition framework,which includes an object recognition module based on a deep self-attention network and a temperature distribution identification module based on a multi-factor similarity calculation.First,the features of an input image are extracted and embedded using a multi-head attention encoding-decoding mechanism.Thereafter,the embedded features are used to predict the equipment component category and location.In the located area,preliminary segmentation is performed.Finally,similar areas are gradually merged,and the temperature distribution of the equipment is obtained to identify a fault.Our experiments indicate that the proposed method demonstrates significantly improved accuracy compared with other related methods and,hence,provides a good reference for the automation of power equipment inspection.
文摘Relation extraction is an important task in NLP community.However,some models often fail in capturing Long-distance dependence on semantics,and the interaction between semantics of two entities is ignored.In this paper,we propose a novel neural network model for semantic relation classification called joint self-attention bi-LSTM(SA-Bi-LSTM)to model the internal structure of the sentence to obtain the importance of each word of the sentence without relying on additional information,and capture Long-distance dependence on semantics.We conduct experiments using the SemEval-2010 Task 8 dataset.Extensive experiments and the results demonstrated that the proposed method is effective against relation classification,which can obtain state-ofthe-art classification accuracy just with minimal feature engineering.
基金This work is supported by the National Natural Science Foundation of China(61872231,61701297).
文摘Visual Question Answering(VQA)has attracted extensive research focus and has become a hot topic in deep learning recently.The development of computer vision and natural language processing technology has contributed to the advancement of this research area.Key solutions to improve the performance of VQA system exist in feature extraction,multimodal fusion,and answer prediction modules.There exists an unsolved issue in the popular VQA image feature extraction module that extracts the fine-grained features from objects of different scale difficultly.In this paper,a novel feature extraction network that combines multi-scale convolution and self-attention branches to solve the above problem is designed.Our approach achieves the state-of-the-art performance of a single model on Pascal VOC 2012,VQA 1.0,and VQA 2.0 datasets.
基金sponsored by the National Natural Science Foundation of China under Grants 62271264,61972207,and 42175194the Project through the Priority Academic Program Development(PAPD)of Jiangsu Higher Education Institution.
文摘Traditional based deep learning intrusion detection methods face problems such as insufficient cloud storage,data privacy leaks,high com-munication costs,unsatisfactory detection rates,and false positive rate.To address existing issues in intrusion detection,this paper presents a novel approach called CS-FL,which combines Federated Learning and a Self-Attention Fusion Convolutional Neural Network.Federated Learning is a new distributed computing model that enables individual training of client data without uploading local data to a central server.at the same time,local training results are uploaded and integrated across all participating clients to produce a global model.The sharing model reduces communication costs,protects data privacy,and solves problems such as insufficient cloud storage and“data islands”for each client.In the proposed method,a hybrid model is formed by integrating the self-Attention and similar parts of the Convolutional Neural Network in the local data processing.This approach not only enhances the performance of the hybrid model but also reduces computational overhead compared to pure hybrid neural networks.Results from experiments on the NSL-KDD dataset show that the proposed method outperforms other intrusion detection techniques,resulting in a significant improvement in performance.This demonstrates the effectiveness of the proposed approach in improving intrusion detection accuracy.