Due to the lack of long-range association and spatial location information, existing deep learning-based methods cannot always obtain fine details and accurate boundaries in complex clothing images. This paper presents a convolutional structure with multi-scale fusion to optimize clothing feature extraction, and a self-attention module to capture long-range association information. The structure lets the self-attention mechanism participate directly in information exchange through the down-scaling projection operation of the multi-scale framework. In addition, the improved self-attention module extracts 2-dimensional relative position information to compensate for its inability to extract spatial position features from clothing images. Experimental results on the Colorful Fashion Parsing Dataset (CFPD) show that the proposed network achieves 53.68% mean intersection over union (mIoU) and performs better on the clothing parsing task.
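For reference, mIoU averages the per-class ratio of intersection to union between predicted and ground-truth label maps. The sketch below is a minimal pure-Python version over flattened label lists; `mean_iou` is a hypothetical helper, and skipping classes that appear in neither map is one common convention, assumed here rather than taken from the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over flattened label lists.

    Assumption: classes absent from both pred and target are skipped,
    so they neither help nor hurt the average.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Class 0: IoU = 1/2, class 1: IoU = 2/3, so mIoU = 7/12.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```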
Keyphrases provide concise and valuable summary information. This information helps us not only understand text semantics but also organize and retrieve text content effectively. Automatic keyphrase generation has received considerable attention in recent decades, and previous studies offer many workable solutions for obtaining keyphrases. One method divides the content to be summarized into multiple text blocks and then ranks them and selects the most important content. The disadvantage of this method is that it cannot identify keyphrases that do not appear in the text, let alone capture the semantic meaning hidden in the text. Another approach uses recurrent neural networks to generate keyphrases from the semantics of the text, but their inherently sequential nature precludes parallelization within training examples, and long distances limit the context dependencies they can capture. Previous work has demonstrated the benefits of the self-attention mechanism, which can learn global text dependency features and can be parallelized. Inspired by this observation, we propose a keyphrase generation model based entirely on the self-attention mechanism. It is an encoder-decoder model that effectively overcomes the above disadvantages. In addition, we consider the semantic similarity between keyphrases and add a semantic similarity processing module to the model. Empirical analysis on five datasets demonstrates that the proposed model achieves competitive performance compared with baseline methods.
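The parallelism argument can be illustrated with a minimal single-head scaled dot-product self-attention in pure Python. For brevity it assumes Q = K = V = the input vectors (no learned projections, no multi-head split, no masking), so it sketches the mechanism only, not the paper's encoder-decoder. Each output row depends only on its own query and the shared keys and values, which is why rows can be computed independently.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x):
    """x: list of n vectors of dimension d.

    Assumption: Q = K = V = x (no learned projection matrices).
    """
    d = len(x[0])
    out = []
    for q in x:  # each query row is independent of the others
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        out.append([sum(wj * v[i] for wj, v in zip(w, x)) for i in range(d)])
    return out


print(self_attention([[1.0, 0.0], [0.0, 1.0]]))
```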
Due to the rapid evolution of Advanced Persistent Threat (APT) attacks, new and rare attack samples, including ones never seen before, keep emerging, which makes it difficult for traditional rule-based detection methods to extract universal rules for effective detection. With progress in techniques such as transfer learning and meta-learning, few-shot network attack detection has advanced. However, it still faces three challenges: time-sequence flow features cannot adapt to the fixed-length input requirement of deep learning; rich information is hard to capture from the original flows when samples are insufficient; and high-level abstract representation is difficult to obtain. To address these challenges, a few-shot network attack detection method based on NFHP (Network Flow Holographic Picture)-RN (ResNet) is proposed. Specifically, leveraging inherent properties of images such as translation, rotation, scale, and illumination invariance, network attack traffic features and their contextual relationships are intuitively represented in NFHP. In addition, an improved RN model is employed for high-level abstract feature extraction, ensuring that the extracted features maintain the detailed characteristics of the original traffic behavior regardless of changes in background traffic. Finally, a meta-learning model based on the self-attention mechanism is constructed, which detects novel APT few-shot network attacks by generalizing from the high-level abstract feature representations of known-class network attack behaviors. Experimental results demonstrate that the proposed method can learn high-level abstract features of network attacks across different traffic detail granularities. Compared with state-of-the-art methods, it achieves favorable accuracy, precision, recall, and F1 scores for identifying unknown-class network attacks in cross-validation on multiple datasets.
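The abstract does not describe how an NFHP is constructed. As a hypothetical illustration of the fixed-length input problem it mentions, the sketch below truncates or zero-pads a flow's raw bytes to a fixed size and reshapes them into a small grayscale grid scaled to [0, 1]. `flow_to_grid`, the byte-level representation, and the grid size are all assumptions, not the NFHP method.

```python
def flow_to_grid(payload: bytes, side: int = 4):
    """Hypothetical: map a variable-length flow to a fixed side x side grid.

    Bytes beyond side*side are truncated; short flows are zero-padded.
    Each byte is scaled to [0, 1] like a grayscale pixel.
    """
    n = side * side
    buf = list(payload[:n]) + [0] * max(0, n - len(payload))
    return [[buf[r * side + c] / 255.0 for c in range(side)] for r in range(side)]


print(flow_to_grid(b"\xff\x00", side=2))
```

Whatever the actual encoding, the point is that every flow ends up with the same shape, which is what a CNN such as ResNet expects as input.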
Due to their strong ability to learn and express complex features, deep learning (DL) models play a vital role in bearing fault diagnosis. However, because labeled samples are scarce in fault diagnosis, DL models in this field are generally shallower than those in other fields, which limits diagnostic performance. To solve this problem, this paper proposes a novel transfer residual Swin Transformer (RST) for rolling bearings. RST has 24 residual self-attention layers, which use a hierarchical design and shifted window-based residual self-attention. Combined with transfer learning, the transfer RST model uses parameters pre-trained on ImageNet. On this basis, a new end-to-end fault diagnosis method based on deep transfer RST is proposed. First, a wavelet transform converts the vibration signal into a wavelet time-frequency diagram, which represents the signal in the time and frequency domains simultaneously. Second, the wavelet time-frequency diagram is fed into the RST model to obtain the fault type. Finally, the method is verified on public and self-built datasets. Experimental results show that our method outperforms a shallow neural network.
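To make the shifted window idea concrete, here is a minimal pure-Python sketch of two data-layout steps used in Swin-style attention: partitioning a feature map into non-overlapping windows, and cyclically shifting the map before the next layer's partition so that neighboring windows exchange information. The function names are illustrative; the attention mask Swin applies after shifting, the attention itself, and the residual connections are omitted.

```python
def partition_windows(grid, win):
    """Split an H x W grid into non-overlapping win x win windows.

    Assumes H and W are divisible by win; windows are returned row-major.
    """
    H, W = len(grid), len(grid[0])
    return [
        [row[c:c + win] for row in grid[r:r + win]]
        for r in range(0, H, win)
        for c in range(0, W, win)
    ]


def cyclic_shift(grid, s):
    """Roll the grid up and left by s cells (like a roll by -s on both axes)."""
    H, W = len(grid), len(grid[0])
    return [[grid[(r + s) % H][(c + s) % W] for c in range(W)] for r in range(H)]


grid = [[r * 4 + c for c in range(4)] for r in range(4)]
print(partition_windows(grid, 2)[0])                   # top-left window
print(partition_windows(cyclic_shift(grid, 1), 2)[0])  # window after the shift
```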
Emotional electroencephalography (EEG) signals are a primary means of recording emotional brain activity. Currently, the most effective methods for analyzing emotional EEG signals combine feature engineering with neural networks. However, neural networks have a strong capacity for automatic feature extraction. Is it possible to discard feature engineering and employ neural networks directly for end-to-end recognition? Based on the characteristics of EEG signals, this paper proposes an end-to-end feature extraction and classification method based on a dynamic self-attention network (DySAT). The study reveals significant differences in the brain activity patterns associated with different emotions across subjects and time periods. The results of this experiment can provide insights into the reasons behind these differences.
Circular RNAs (circRNAs) are RNAs with a closed circular structure that take part in many biological processes through key interactions with RNA-binding proteins (RBPs). Existing methods for predicting these interactions are limited in feature learning. In view of this, we propose a method named circ2CBA, which uses only the sequence information of circRNAs to predict circRNA-RBP binding sites. We constructed a dataset comprising eight sub-datasets. First, circ2CBA encodes circRNA sequences with one-hot encoding. Next, a two-layer convolutional neural network (CNN) extracts initial features. After the CNN, circ2CBA uses a bidirectional long short-term memory network (BiLSTM) layer and the self-attention mechanism to learn the features. The AUC of circ2CBA reaches 0.8987. A comparison of circ2CBA with three other methods on our dataset, together with an ablation experiment, confirms that circ2CBA is an effective method for predicting the binding sites between circRNAs and RBPs.
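The first step of circ2CBA, one-hot encoding, turns each nucleotide into a vector with a single 1. A minimal sketch follows; the alphabet order A, C, G, U and the all-zero vector for unknown bases such as N are assumptions, since the abstract does not specify them.

```python
def one_hot_rna(seq: str):
    """One-hot encode an RNA sequence over the assumed alphabet A, C, G, U.

    Unknown symbols (e.g. N) map to an all-zero row.
    """
    alphabet = "ACGU"
    return [[1 if base == a else 0 for a in alphabet] for base in seq.upper()]


print(one_hot_rna("ACGUN"))
```

The resulting L x 4 matrix is the kind of input a 1-D CNN layer consumes before the BiLSTM and self-attention stages.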
LIDAR point cloud-based 3D object detection aims to sense the surrounding environment by anchoring objects with bounding boxes (BBoxes). However, in the three-dimensional space of autonomous driving scenes, previous object detection methods pre-process the original LIDAR point cloud into voxels or pillars, which loses the coordinate information of the original point cloud, slows detection, and yields inaccurate bounding box positioning. To address these issues, this study proposes a new two-stage network that extracts point cloud features directly with PointNet++, effectively preserving the original point cloud coordinate information. To improve detection accuracy, a shell-based modeling method is proposed: it first roughly determines which spherical shell the coordinates belong to and then refines the result toward the ground truth, thereby narrowing the localization range and improving detection accuracy. To improve the recall of 3D object detection, this paper designs a self-attention module with a skip connection structure. The module highlights selected features by weighting them along the feature dimension; after training, the weights of features that favor object detection become larger, so the extracted features are better adapted to the detection task. Extensive comparison and ablation experiments on the KITTI dataset verify the effectiveness of the proposed method in improving recall and precision.
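The coarse-then-refine idea behind shell-based modeling can be sketched as bin classification plus residual regression on the radial distance. In the hypothetical sketch below, `shell_bin`, the shell width of 2.0, and the 40-shell limit are illustrative assumptions; the abstract does not give the actual shell parameters or say exactly which quantity is binned.

```python
import math


def shell_bin(point, shell_width=2.0, num_shells=40):
    """Coarse step: index of the spherical shell containing the point.

    Fine-step target: residual from that shell's centre radius.
    shell_width and num_shells are illustrative assumptions.
    """
    r = math.sqrt(sum(c * c for c in point))
    idx = min(int(r // shell_width), num_shells - 1)
    center = (idx + 0.5) * shell_width
    return idx, r - center


def decode(idx, residual, shell_width=2.0):
    """Recover the radial distance from a shell index and a refined residual."""
    return (idx + 0.5) * shell_width + residual


idx, res = shell_bin((3.0, 4.0, 0.0))  # radius 5.0 falls in shell 2 (4.0 to 6.0)
print(idx, res, decode(idx, res))
```

A network would classify `idx` and regress the small residual, so the regression target stays within half a shell width.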
Production data in the industrial field are multimodal and high-dimensional, with large differences in correlation between attributes. Existing data prediction methods cannot effectively capture time series and modal features, which leads to prediction hysteresis and poor prediction stability. To address these problems, this paper proposes a time-series and modal feature enhancement method based on a dual-stage self-attention mechanism (DATT), and a time series prediction method based on a gated feedforward recurrent unit (GFRU). On this basis, the DATT-GFRU neural network, which combines a gated feedforward recurrent neural network with the dual-stage self-attention mechanism, is designed and implemented. Experiments show that the DATT-based neural network prediction model significantly improves prediction. Compared with traditional prediction models, the DATT-GFRU neural network has a smaller average prediction error, stable prediction performance, and strong generalization ability on three datasets with different numbers of attributes and different training sample sizes.
Purpose - Clothing patterns play a dominant role in costume design and have become an important link in the perception of costume art. Conventional clothing pattern design relies on experienced designers. Although conventionally designed clothing patterns are of very high quality, the ratio of output to input time is relatively low. To break through this bottleneck, this paper proposes a novel approach to automatic clothing pattern generation based on a generative adversarial network (GAN) model, which not only reduces dependence on experienced designers but also improves the input-output ratio. Design/methodology/approach - Since clothing patterns place high demands on both global artistic perception and local texture details, this paper improves the conventional GAN model in two respects: a multi-scale discriminator strategy is introduced to handle local texture details, and the self-attention mechanism is introduced to improve global artistic perception. The resulting model, called the multi-scale self-attention improved generative adversarial network (MS-SA-GAN), is used for high-resolution clothing pattern generation. Findings - To verify the feasibility and effectiveness of the proposed MS-SA-GAN model, a crawler is designed to acquire a standard clothing pattern dataset from Baidu Pictures, and a comparative experiment is conducted on this dataset. In the experiments, different parameters of the MS-SA-GAN model are adjusted, and the global artistic perception and local texture details of the generated clothing patterns are compared. Originality/value - Experimental results show that the clothing patterns generated by the proposed MS-SA-GAN model are superior to those of conventional algorithms on some local texture detail indexes. In addition, a group of clothing design professionals was invited to evaluate global artistic perception using a valence-arousal scale. The scale results show that the proposed MS-SA-GAN model achieves better global artistic perception.
Pedestrian attribute recognition is often treated as a multi-label image classification task. To make full use of attribute-related location information, a saliency guided self-attention network (SGSA-Net) was proposed to weakly supervise attribute localization without annotations of attribute-related regions. Saliency priors were integrated into the spatial attention module (SAM). Meanwhile, channel-wise attention and spatial attention were introduced into the network. Moreover, a weighted binary cross-entropy loss (WCEL) function was employed to handle the imbalance of the training data. Extensive experiments on the richly annotated pedestrian (RAP) and pedestrian attribute (PETA) datasets demonstrated that SGSA-Net outperformed other state-of-the-art methods.
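A weighted binary cross-entropy counters imbalance by making errors on rare attributes cost more. The abstract does not give SGSA-Net's weighting, so the sketch below assumes one scheme used in pedestrian attribute work: positives are weighted by exp(1 - r_j) and negatives by exp(r_j), where r_j is attribute j's positive ratio in the training set. `weighted_bce` is a hypothetical helper.

```python
import math


def weighted_bce(probs, labels, pos_ratios, eps=1e-7):
    """Weighted binary cross-entropy averaged over attributes.

    Assumed weighting: positive labels get exp(1 - r_j), negative labels
    get exp(r_j), where r_j is the training-set positive ratio of attribute j.
    """
    total = 0.0
    for p, y, r in zip(probs, labels, pos_ratios):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -math.exp(1 - r) * math.log(p)
        else:
            total += -math.exp(r) * math.log(1 - p)
    return total / len(probs)


# Missing a rare positive (r = 0.1) costs more than missing a common one (r = 0.9).
print(weighted_bce([0.5], [1], [0.1]), weighted_bce([0.5], [1], [0.9]))
```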
With rapid economic development, per capita car ownership in our country has risen year by year, and more researchers have turned to scientific methods to solve traffic flow problems. Traffic flow is affected not only by the number of vehicles but also by various complex factors, such as time, road conditions, and pedestrian flow. However, existing methods ignore the complexity of road conditions and the correlation between individual nodes, which leads to poor performance. In this study, a deep learning model, SAMGCN, is proposed to effectively capture the correlation between individual nodes and thereby improve traffic flow prediction. First, spatiotemporal decoupling is used to divide each time step of each node into finer-grained particles. Second, multi-module fusion is used to mine potential periodic relationships in the data. Finally, a GRU is used to obtain the potential temporal relationships of the three modules. Extensive experiments on two traffic flow datasets from the Caltrans Performance Measurement System, PeMS04 and PeMS08, demonstrate the validity of the proposed model.
Shield tunneling machines are paramount underground engineering equipment and play a key role in tunnel construction. During shield construction, the “mud cake” formed by hard-to-remove clay adhering to the cutterhead severely reduces construction efficiency and harms the healthy operation of the shield tunneling machine. In this study, we propose an enhanced transformer-based model for detecting the cutterhead clogging status of shield tunneling machines. First, working-state data are selected from historical excavation data, and a long short-term memory autoencoder neural network module is constructed to remove outliers. Next, variational mode decomposition and wavelet transform are employed to denoise the data. After preprocessing, non-overlapping rectangular windows are used to segment the working-state data into time slices for analysis, and several time-domain features are extracted from each slice. Because the original dataset is imbalanced, the k-means synthetic minority oversampling technique (k-means SMOTE) is adopted to oversample the extracted time-domain features of the clogging data in the training set, balancing the dataset and improving model performance. Finally, an enhanced transformer-based neural network is constructed to extract essential implicit features and detect cutterhead clogging status. Data collected from actual tunnel construction projects are used to verify the proposed model. The results show that the model accurately detects cutterhead clogging status, with 98.85% accuracy and an F1 score of 0.9786, and significantly outperforms the comparison models.
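The windowing and feature extraction step can be sketched as follows. The abstract does not list which time-domain features were used, so the mean, root mean square, standard deviation, and peak-to-peak value below are common choices assumed for illustration, and dropping an incomplete trailing window is also an assumption. `window_features` is a hypothetical helper.

```python
import math


def window_features(signal, win):
    """Split a 1-D signal into non-overlapping windows of length win.

    Any incomplete trailing window is dropped (an assumption). Returns a
    few common time-domain features per window.
    """
    feats = []
    for start in range(0, len(signal) - win + 1, win):
        w = signal[start:start + win]
        mean = sum(w) / win
        rms = math.sqrt(sum(x * x for x in w) / win)
        std = math.sqrt(sum((x - mean) ** 2 for x in w) / win)
        feats.append({"mean": mean, "rms": rms, "std": std, "p2p": max(w) - min(w)})
    return feats


print(window_features([1, 1, 3, 3, 5], 2))  # the trailing 5 is dropped
```

Each window then becomes one sample for the oversampling and classification stages.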
An intelligent single radar image de-raining method based on unsupervised self-attention generative adversarial networks is proposed to improve the accuracy of wave height parameter inversion. The method builds a trainable end-to-end de-raining model on an unsupervised cycle-consistent adversarial network as its AI framework, so it does not require pairs of rain-contaminated images and corresponding ground-truth rain-free images for training. The model is trained by feeding it rain-contaminated and clean radar images in an unpaired manner, and atmospheric scattering model parameters are not required as a prior. Additionally, a self-attention mechanism is introduced into the model so that it can focus on rain clutter when processing radar images. This combines global and local rain clutter context information to output more accurate and clearer de-rained radar images. The proposed method is validated on actual field test data: compared with the wave height derived from the original rain-contaminated data, de-raining reduces the root-mean-square error by 0.11 m and increases the correlation coefficient of the wave height by 14%. These results demonstrate that the method effectively reduces the impact of rain on the accuracy of wave height estimation from marine X-band radar images.
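The two evaluation metrics quoted above, root-mean-square error and the (Pearson) correlation coefficient between derived and reference wave heights, can be computed as in this minimal sketch. The helper names are illustrative, and the toy inputs are not the paper's data.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```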
A deep neural network model generally consists of different modules that play essential roles in performing a task, and the optimal design of a module for modeling a physical problem is directly related to the success of the model. In this work, the effectiveness of several special modules is studied numerically: the self-attention mechanism, for recognizing the importance of molecular sequence information in a polymer, and the big-stride representation and conditional random field, for enhancing the network's ability to produce desired local configurations. Network models containing these modules are trained on the well-documented native structures of the HP model and assessed by their ability to make structural predictions on unseen data. The specific self-attention design adopted here is modified from a similar idea in natural language processing. The big-stride representation module introduced in this work is shown to drastically improve the network's capability to model polymer segments with strong lattice position correlations.
Accurate long-term power forecasting is important for decision-making in power grid operation and for customers' power consumption management, ensuring a reliable power supply and economical grid operation. However, most time-series forecasting models perform poorly on long-time-series prediction tasks with large amounts of data. To address this challenge, we propose a parallel time-series prediction model called LDformer. First, we combine Informer with long short-term memory (LSTM) to obtain deep representation ability for time series. Then, we propose a parallel encoder module to improve the robustness of the model and combine convolutional layers with an attention mechanism to avoid value redundancy in the attention mechanism. Finally, we propose a probabilistic sparse (ProbSparse) self-attention mechanism combined with UniDrop to reduce computational overhead and mitigate the risk of losing key connections in the sequence. Experimental results on five datasets show that LDformer outperforms state-of-the-art methods in most cases across different long-time-series prediction tasks.
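ProbSparse self-attention, as introduced in Informer, scores each query with a max-mean measurement, the largest scaled dot product minus the average one, and keeps only the top-u queries, since a near-uniform attention row contributes little. The sketch below computes that measurement over all keys for clarity; Informer samples keys to make it cheap, and how LDformer combines this step with UniDrop is not described in the abstract, so neither is shown. Function names are illustrative.

```python
import math


def sparsity_scores(Q, K):
    """Max-mean measurement per query: max(scores) - mean(scores).

    Larger values indicate a more 'peaked' attention distribution.
    Computed over all keys here; Informer samples keys instead.
    """
    d = len(Q[0])
    out = []
    for q in Q:
        s = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        out.append(max(s) - sum(s) / len(s))
    return out


def top_queries(Q, K, u):
    """Indices of the u queries with the highest sparsity measurement."""
    m = sparsity_scores(Q, K)
    return sorted(range(len(Q)), key=lambda i: m[i], reverse=True)[:u]


# Query 0 attends uniformly (score 0 everywhere); query 1 is peaked.
print(top_queries([[0.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], 1))
```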
Recent change detection (CD) methods focus on extracting deep semantic change features. However, existing methods overlook fine-grained features and capture long-range space-time information poorly, which causes small changes to be missed and the edges of change types to be smoothed. In this paper, a transformer-based semantic change detection (SCD) model, Pyramid-SCDFormer, is proposed, which precisely recognizes small changes and the fine edge details of changes. The model selectively merges different semantic tokens in the multi-head self-attention block to obtain multiscale features, which is crucial for extracting information from remote sensing images (RSIs) containing multiple changes at different scales. Moreover, we create a well-annotated SCD dataset, Landsat-SCD, with an unprecedented time series length and range of change types in complex scenarios. Compared with three convolutional neural network-based, one attention-based, and two transformer-based networks, experimental results demonstrate that Pyramid-SCDFormer stably outperforms existing state-of-the-art CD models, improving MIoU/F1 by 1.11/0.76%, 0.57/0.50%, and 8.75/8.59% on the LEVIR-CD, WHU_CD, and Landsat-SCD datasets, respectively. For change classes that make up less than 1% of the data, the proposed model improves MIoU by 7.17–19.53% on the Landsat-SCD dataset. Recognition of small-scale changes and fine edges of change types is greatly improved.
Action recognition and localization in untrimmed videos are important for many applications and have attracted much attention. Since full supervision with frame-level annotation places an overwhelming burden on manual labeling, learning with weak video-level supervision is a promising solution. In this paper, we propose a novel weakly supervised framework that simultaneously recognizes actions and locates the corresponding frames in untrimmed videos. Because abundant trimmed videos are publicly available and well segmented with semantic descriptions, the instructive knowledge learned from trimmed videos can be fully leveraged to analyze untrimmed videos. We present an effective knowledge transfer strategy based on inter-class semantic relevance. We also use the self-attention mechanism to obtain a compact video representation, so that the influence of background frames can be effectively eliminated. A learning architecture with twin networks for trimmed and untrimmed videos is designed to facilitate transferable self-attentive representation learning. Extensive experiments on three untrimmed benchmark datasets (i.e., THUMOS14, ActivityNet1.3, and MEXaction2) clearly corroborate the efficacy of our method. Encouragingly, the proposed weakly supervised method even achieves results comparable to some fully supervised methods.
Video summarization has established itself as a fundamental technique for generating compact and concise videos, which eases the management and browsing of large-scale video data. Existing methods fail to fully consider the local and global relations among video frames, which degrades summarization performance. To address this problem, we propose a graph convolutional attention network (GCAN) for video summarization. GCAN consists of two parts, embedding learning and context fusion, where embedding learning includes a temporal branch and a graph branch. In particular, GCAN uses dilated temporal convolution to model local cues and temporal self-attention to exploit global cues for video frames. It learns graph embeddings via a multi-layer graph convolutional network to reveal the intrinsic structure of frame samples. The context fusion part combines the output streams of the temporal and graph branches to create a context-aware representation of frames, from which importance scores are computed to select representative frames for the video summary. Experiments on two benchmark datasets, SumMe and TVSum, show that the proposed GCAN approach outperforms several state-of-the-art alternatives in three evaluation settings.
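Dilated temporal convolution widens the receptive field over frames without adding kernel weights by sampling inputs `dilation` steps apart. The pure-Python sketch below uses 'valid' padding and the cross-correlation form used by deep learning frameworks; the padding choice and the helper name are assumptions, since the abstract does not specify GCAN's layer configuration.

```python
def dilated_conv1d(x, kernel, dilation):
    """'Valid' 1-D dilated convolution (cross-correlation form).

    Output t = sum_i kernel[i] * x[t + i * dilation]; the receptive field
    spans (len(kernel) - 1) * dilation + 1 input steps.
    """
    span = (len(kernel) - 1) * dilation
    return [
        sum(k * x[t + i * dilation] for i, k in enumerate(kernel))
        for t in range(len(x) - span)
    ]


print(dilated_conv1d([0, 1, 2, 3, 4, 5], [1, 1], dilation=2))
```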
Fast and accurate fault diagnosis of strongly coupled, time-varying, multivariable complex industrial processes remains a challenging problem. We propose an industrial fault diagnosis model built on the temporal convolutional network (TCN) and the one-dimensional convolutional neural network (1DCNN). We add a batch normalization layer before the TCN layer and replace the TCN's original ReLU activation function with LeakyReLU. To extract local correlations of features, a 1D convolution layer is added after the TCN layer, followed by a multi-head self-attention mechanism before the fully connected layer to enhance the model's diagnostic ability. The extended Tennessee Eastman Process (TEP) dataset is used as the benchmark to evaluate the model's performance. The experimental results show high fault recognition accuracy and better generalization performance, which demonstrates the model's effectiveness. Additionally, applying the model to the diesel engine failure dataset from our partner's project validates its effectiveness in industrial scenarios.
Methanol-to-olefins, a promising non-oil pathway for synthesizing light olefins, has been successfully industrialized. Accurate prediction of process variables can yield significant benefits for advanced process control and optimization. The difficulty of this task is underscored by the failure of traditional methods to capture the complex characteristics of industrial processes, such as strong nonlinearity, dynamics, and the data distribution shift caused by diverse operating conditions. In this paper, we propose a novel hybrid spatial-temporal deep learning prediction model to address these issues. First, a data normalization technique called reversible instance normalization is employed to handle differing data distributions. Subsequently, a convolutional neural network integrated with the self-attention mechanism is used to extract temporal patterns. Meanwhile, a multi-graph convolutional network is leveraged to model spatial interactions. Afterward, the extracted temporal and spatial features are fused and fed into a fully connected neural network to produce the prediction. Finally, the outputs are denormalized to obtain the final results. Monitoring of the dynamic trends of process variables in an actual industrial methanol-to-olefins process demonstrates that our model not only achieves superior prediction performance but also reveals complex spatial-temporal relationships through the learned attention and adjacency matrices, making the model more interpretable. Lastly, the model is deployed on an end-to-end Industrial Internet Platform, where it achieves effective practical results.
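Reversible instance normalization normalizes each input window by its own statistics before the network and applies the inverse transform to the network's outputs, so each window is handled on a common scale while predictions return to the original units. The minimal sketch below handles a single variable and omits the learnable affine parameters that the original RevIN formulation includes; the class layout is an assumption for illustration.

```python
import math


class RevIN:
    """Minimal reversible instance normalization for one variable.

    The learnable affine weight and bias of the original formulation are omitted.
    """

    def __init__(self, eps=1e-5):
        self.eps = eps

    def normalize(self, series):
        # Statistics come from this instance (input window) only.
        self.mean = sum(series) / len(series)
        var = sum((x - self.mean) ** 2 for x in series) / len(series)
        self.std = math.sqrt(var + self.eps)
        return [(x - self.mean) / self.std for x in series]

    def denormalize(self, outputs):
        # Reuse the stored statistics to map predictions back to original units.
        return [y * self.std + self.mean for y in outputs]


norm = RevIN()
z = norm.normalize([10.0, 20.0, 30.0])
print(z, norm.denormalize(z))
```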
文摘Due to the lack of long-range association and spatial location information,fine details and accurate boundaries of complex clothing images cannot always be obtained by using the existing deep learning-based methods.This paper presents a convolutional structure with multi-scale fusion to optimize the step of clothing feature extraction and a self-attention module to capture long-range association information.The structure enables the self-attention mechanism to directly participate in the process of information exchange through the down-scaling projection operation of the multi-scale framework.In addition,the improved self-attention module introduces the extraction of 2-dimensional relative position information to make up for its lack of ability to extract spatial position features from clothing images.The experimental results based on the colorful fashion parsing dataset(CFPD)show that the proposed network structure achieves 53.68%mean intersection over union(mIoU)and has better performance on the clothing parsing task.
Abstract: Keyphrases provide summarized and valuable information that helps us not only understand text semantics but also organize and retrieve text content effectively. Automatic keyphrase generation has therefore received considerable attention in recent decades, and previous studies offer many workable solutions. One approach divides the content to be summarized into multiple text blocks and then ranks and selects the most important content. Its disadvantage is that it cannot identify keyphrases that do not appear in the text, let alone capture the semantic meaning hidden in it. Another approach uses recurrent neural networks to generate keyphrases from the semantics of the text, but their inherently sequential nature precludes parallelization within training examples, and long distances limit their ability to model context dependencies. Previous work has demonstrated the benefits of the self-attention mechanism, which can learn global text dependencies and can be parallelized. Inspired by this observation, we propose a keyphrase generation model based entirely on the self-attention mechanism. It is an encoder-decoder model that effectively addresses the above shortcomings. In addition, we consider the semantic similarity between keyphrases and add a semantic similarity processing module to the model. Empirical analysis on five datasets shows that the proposed model achieves competitive performance compared with baseline methods.
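The claim that self-attention learns global dependencies and parallelizes, unlike an RNN, can be seen in a single-head scaled dot-product attention layer. The pure-Python sketch below is the generic Transformer formulation, not the authors' model: masking, multiple heads, the decoder, and the semantic similarity module are all omitted, and the toy embeddings and identity projections are assumptions for illustration.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain-list matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x of shape (n x d_model)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    weights = []
    for q_i in q:
        # Each query scores every key directly, so distant tokens interact in one step,
        # and every row is independent of the others (hence parallelizable).
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k) for k_j in k]
        weights.append(softmax(scores))
    return matmul(weights, v), weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d token embeddings
identity = [[1.0, 0.0], [0.0, 1.0]]
out, attn = self_attention(tokens, identity, identity, identity)
```

Unlike a recurrent cell, no row of `attn` depends on a previous row, which is the property the abstract contrasts with sequential RNN training.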
Funding: Supported by the National Natural Science Foundation of China (Nos. U19A2081, 62202320), the Fundamental Research Funds for the Central Universities (No. SCU2023D008), the Science and Engineering Connotation Development Project of Sichuan University (No. 2020SCUNG129), and the Key Laboratory of Data Protection and Intelligent Management (Sichuan University), Ministry of Education.
Abstract: Due to the rapid evolution of Advanced Persistent Threat (APT) attacks, new and rare attack samples, including ones never seen before, keep emerging, which makes it difficult for traditional rule-based detection methods to extract universal rules for effective detection. With progress in techniques such as transfer learning and meta-learning, few-shot network attack detection has advanced. However, it still faces three challenges: time-sequence flow features cannot adapt to the fixed-length input required by deep learning; rich information is hard to capture from the original flows when samples are insufficient; and high-level abstract representations are difficult to learn. To address these challenges, a few-shot network attack detection method based on NFHP (Network Flow Holographic Picture)-RN (ResNet) is proposed. Specifically, leveraging inherent properties of images such as translation, rotation, scale, and illumination invariance, network attack traffic features and their contextual relationships are represented intuitively in the NFHP. In addition, an improved RN model is employed for high-level abstract feature extraction, ensuring that the extracted features retain the detailed characteristics of the original traffic behavior regardless of changes in background traffic. Finally, a meta-learning model based on the self-attention mechanism is constructed, which detects novel few-shot APT network attacks by generalizing from the high-level abstract feature representations of known-class network attack behaviors. Experimental results demonstrate that the proposed method can learn high-level abstract features of network attacks across different granularities of traffic detail. Compared with state-of-the-art methods, it achieves favorable accuracy, precision, recall, and F1 scores in identifying unknown-class network attacks under cross-validation on multiple datasets.
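The abstract names the fixed-length input problem but does not describe how an NFHP is constructed. The sketch below only illustrates the generic idea of turning a variable-length flow into a fixed-size 2D "picture" by truncating and zero-padding packets; the layout (one row per packet, raw bytes scaled to [0, 1]) and the function name are hypothetical and should not be read as the NFHP method.

```python
def flow_to_picture(packets, n_packets=4, n_bytes=8):
    """Map a variable-length flow to a fixed n_packets x n_bytes grid of values in [0, 1].

    Hypothetical layout: one row per packet (truncated or zero-padded to n_bytes),
    with the flow itself truncated or zero-padded to n_packets rows.
    """
    rows = []
    for pkt in packets[:n_packets]:
        row = list(pkt[:n_bytes]) + [0] * max(0, n_bytes - len(pkt))
        rows.append([b / 255.0 for b in row])  # byte values scaled to [0, 1]
    while len(rows) < n_packets:
        rows.append([0.0] * n_bytes)  # pad short flows with empty rows
    return rows


picture = flow_to_picture([b"\x00\xff", bytes(range(20))])  # 2 packets of different lengths
```

Whatever the real encoding is, the output shape no longer depends on flow length, which is what lets an image backbone such as ResNet consume it.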
Funding: Supported in part by the National Natural Science Foundation of China (General Program) under Grants 62073193 and 61873333, in part by the National Key Research and Development Project (General Program) under Grant 2020YFE0204900, and in part by the Key Research and Development Plan of Shandong Province (General Program) under Grant 2021CXGC010204.
Abstract: Owing to their robust ability to learn and express complex features, deep learning (DL) models play a vital role in bearing fault diagnosis. However, because labeled samples are scarce in fault diagnosis, the DL models used there are generally shallower than those in other fields, which limits diagnostic performance. To solve this problem, this paper proposes a novel transfer residual Swin Transformer (RST) for rolling bearings. RST has 24 residual self-attention layers, which use a hierarchical design and shifted window-based residual self-attention. Combined with transfer learning, the transfer RST model uses parameters pre-trained on ImageNet. On this basis, a new end-to-end fault diagnosis method based on deep transfer RST is proposed. First, the wavelet transform converts the vibration signal into a wavelet time-frequency diagram, so that the signal's time-domain and frequency-domain information are represented simultaneously. Second, the wavelet time-frequency diagram is fed into the RST model to obtain the fault type. Finally, the method is verified on public and self-built datasets. Experimental results show that our method outperforms a shallow neural network.
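Shifted window-based self-attention, as introduced in Swin Transformer, alternates between regular windows and windows displaced by half a window, implemented as a cyclic shift of the feature map before partitioning. The pure-Python sketch below shows only that shift and partition on a toy grid; the attention inside each window and the mask for wrapped-around regions are omitted, and the function names are hypothetical.

```python
def cyclic_shift(grid, s):
    """Roll a 2D grid up and left by s positions (out[i][j] = grid[(i+s) % H][(j+s) % W])."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + s) % h][(j + s) % w] for j in range(w)] for i in range(h)]


def window_partition(grid, m):
    """Split an H x W grid into non-overlapping m x m windows (H and W divisible by m)."""
    h, w = len(grid), len(grid[0])
    windows = []
    for bi in range(0, h, m):
        for bj in range(0, w, m):
            windows.append([row[bj:bj + m] for row in grid[bi:bi + m]])
    return windows


grid = [[4 * i + j for j in range(4)] for i in range(4)]  # toy 4 x 4 feature map
regular = window_partition(grid, 2)                       # layer l: plain windows
shifted = window_partition(cyclic_shift(grid, 1), 2)      # layer l+1: shift by window_size // 2
```

After the shift, the first window holds positions that straddled four different windows in the previous layer, which is how information crosses window boundaries without global attention.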
Abstract: Emotional electroencephalography (EEG) signals are a primary means of recording emotional brain activity. Currently, the most effective methods for analyzing emotional EEG signals combine feature engineering with neural networks. However, neural networks possess a strong ability for automatic feature extraction, which raises a question: can feature engineering be discarded and neural networks be employed directly for end-to-end recognition? Based on the characteristics of EEG signals, this paper proposes an end-to-end feature extraction and classification method based on a dynamic self-attention network (DySAT). The study reveals significant differences in the brain activity patterns associated with different emotions across subjects and time periods, and the experimental results can provide insights into the reasons behind these differences.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61972451, 61902230) and the Fundamental Research Funds for the Central Universities, Shaanxi Normal University (GK202103091).
Abstract: Circular RNAs (circRNAs) are RNAs with a closed circular structure that take part in many biological processes through key interactions with RNA-binding proteins (RBPs). Existing methods for predicting these interactions are limited in feature learning. In view of this, we propose a method named circ2CBA, which uses only the sequence information of circRNAs to predict circRNA-RBP binding sites. We constructed a dataset comprising eight sub-datasets. First, circ2CBA encodes circRNA sequences using one-hot encoding. Next, a two-layer convolutional neural network (CNN) extracts initial features. After the CNN, circ2CBA uses a bidirectional long short-term memory network (BiLSTM) layer and the self-attention mechanism to learn the features. circ2CBA reaches an AUC of 0.8987. Comparisons with three other methods on our dataset, together with an ablation experiment, confirm that circ2CBA is an effective method for predicting binding sites between circRNAs and RBPs.
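The first step of circ2CBA, one-hot encoding of circRNA sequences, turns each nucleotide into a 4-element indicator vector that a CNN can consume. A minimal sketch follows; the channel order A, C, G, U, the mapping of T to U, and all-zero rows for ambiguous bases such as N are assumptions, since the abstract does not specify them.

```python
def one_hot_rna(seq, alphabet="ACGU"):
    """One-hot encode an RNA sequence; characters outside the alphabet (e.g. N) become all-zero rows."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    encoded = []
    for ch in seq.upper().replace("T", "U"):  # accept DNA-style input as well
        row = [0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1
        encoded.append(row)
    return encoded


print(one_hot_rna("AUGN"))  # [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 0, 0]]
```

The result is a (sequence length x 4) matrix, the usual input layout for the two-layer CNN described above.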
Funding: This work was supported in part by the National Natural Science Foundation of China under grant number 62272236, in part by the Natural Science Foundation of Jiangsu Province under grant numbers BK20201136 and BK20191401, and in part by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) fund.
Abstract: LiDAR point cloud-based 3D object detection aims to sense the surrounding environment by anchoring objects with bounding boxes (BBoxes). However, in the three-dimensional space of autonomous driving scenes, previous object detection methods pre-process the original LiDAR point cloud into voxels or pillars, which loses the coordinate information of the original points, slows detection, and yields inaccurate bounding box localization. To address these issues, this study proposes a new two-stage network that extracts point cloud features directly with PointNet++, effectively preserving the original point coordinates. To improve detection accuracy, a shell-based modeling method is proposed: it first roughly determines which spherical shell the coordinates belong to and then refines the result toward the ground truth, thereby narrowing the localization range. To improve the recall of 3D object detection, this paper designs a self-attention module with a skip connection structure. The module highlights some features by weighting them along the feature dimension; after training, the weights of features that favor object detection become larger, so the extracted features are better adapted to the detection task. Extensive comparison and ablation experiments on the KITTI dataset verify the effectiveness of the proposed method in improving recall and precision.
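The coarse-then-refine idea behind the shell-based modeling can be read as a bin-plus-residual encoding of radial distance: classify the shell, then regress an offset from the shell center. The sketch below is an interpretation under stated assumptions (shells centered at the sensor origin with a uniform, hypothetical width); it is not the paper's exact formulation. In a detector the shell index and residual would be predicted, while here they are computed from a known point to show that the encoding is lossless.

```python
import math


def encode_radius(point, shell_width):
    """Coarse step: which spherical shell the point falls in; fine step: residual to the shell center."""
    r = math.sqrt(sum(c * c for c in point))
    shell = int(r // shell_width)            # shell index (classification target)
    center = (shell + 0.5) * shell_width     # radius of the shell's mid-surface
    return shell, r - center                 # residual (regression target)


def decode_radius(shell, residual, shell_width):
    """Recover the radial distance from a shell index and a residual."""
    return (shell + 0.5) * shell_width + residual


shell, residual = encode_radius((3.0, 4.0, 0.0), shell_width=2.0)  # r = 5.0 -> shell 2, residual 0.0
```

Because the residual is bounded by half a shell width, the regression target stays small no matter how far the object is, which is one plausible reading of "narrowing the localization range".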
Funding: This work is financially supported by the National Key R&D Program of China (No. 2020YFB1712600), the Fundamental Research Funds for the Central Universities (No. 3072022QBZ0601), and the National Natural Science Foundation of China (Nos. 62272126 and 61872104).
Abstract: Production data in the industrial field are multimodal and high-dimensional, with large differences in correlation between attributes. Existing data prediction methods cannot effectively capture time-series and modal features, which leads to prediction lag and poor prediction stability. To address these problems, this paper proposes a time-series and modal feature enhancement method based on a dual-stage self-attention mechanism (DATT), and a time-series prediction method based on a gated feedforward recurrent unit (GFRU). On this basis, the DATT-GFRU neural network, which combines a gated feedforward recurrent neural network with the dual-stage self-attention mechanism, is designed and implemented. Experiments show that DATT significantly improves the prediction performance of the neural network model. Compared with traditional prediction models, the DATT-GFRU network has a smaller average prediction error, stable prediction performance, and strong generalization ability on three datasets with different numbers of attributes and different training sample sizes.
Funding: This paper is supported by a university fund project of Hubei Institute of Fine Arts, "The construction of blended teaching mode based on flipped classroom: Taking the course of 'Fashion Painting Illustration' as an example" (No. 202028).
Abstract: Purpose: Clothing patterns play a dominant role in costume design and have become an important link in the perception of costume art. Conventional clothing pattern design relies on experienced designers; although it produces high-quality patterns, its output relative to the time invested is low. To break through this bottleneck, this paper proposes a novel approach based on a generative adversarial network (GAN) for automatic clothing pattern generation, which not only reduces dependence on experienced designers but also improves the input-output ratio. Design/methodology/approach: Because clothing patterns place high demands on both global artistic perception and local texture details, this paper improves the conventional GAN in two respects: a multi-scale discriminator strategy is introduced to handle local texture details, and the self-attention mechanism is introduced to improve global artistic perception. The resulting model, called the multi-scale self-attention improved generative adversarial network (MS-SA-GAN), is used to generate high-resolution clothing patterns. Findings: To verify the feasibility and effectiveness of MS-SA-GAN, a crawler is designed to collect a standard clothing pattern dataset from Baidu Images, and a comparative experiment is conducted on this dataset. In the experiments, different parameters of MS-SA-GAN are adjusted, and the global artistic perception and local texture details of the generated clothing patterns are compared. Originality/value: Experimental results show that the clothing patterns generated by MS-SA-GAN are superior to those of conventional algorithms on some local texture detail indexes. In addition, a group of clothing design professionals was invited to evaluate global artistic perception through a valence-arousal scale, and the results show that MS-SA-GAN achieves better global artistic perception.
Funding: Supported by the National Natural Science Foundation of China (41874173).
Abstract: Pedestrian attribute recognition is often treated as a multi-label image classification task. To make full use of attribute-related location information, a saliency-guided self-attention network (SGSA-Net) was proposed to localize attributes under weak supervision, without annotations of attribute-related regions. Saliency priors were integrated into the spatial attention module (SAM), and both channel-wise attention and spatial attention were introduced into the network. Moreover, a weighted binary cross-entropy loss (WCEL) function was employed to handle the imbalance of the training data. Extensive experiments on the Richly Annotated Pedestrian (RAP) and PEdesTrian Attribute (PETA) datasets demonstrated that SGSA-Net outperformed other state-of-the-art methods.
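The abstract does not give the WCEL weights. One common choice in pedestrian attribute recognition weights each attribute by its positive ratio p_j in the training set, so that positives of rare attributes cost more when missed. The sketch below uses the weights exp(1 - p_j) for positives and exp(p_j) for negatives purely as an assumption; treat the weighting, not just the names, as illustrative.

```python
import math


def weighted_bce(probs, labels, pos_ratios, eps=1e-7):
    """Weighted binary cross-entropy averaged over the attributes of one sample.

    Assumed weighting: positives get exp(1 - p_j) and negatives exp(p_j), where
    p_j is the fraction of training samples that are positive for attribute j.
    """
    total = 0.0
    for p_hat, y, p in zip(probs, labels, pos_ratios):
        p_hat = min(max(p_hat, eps), 1 - eps)  # clip to avoid log(0)
        w = math.exp(1 - p) if y == 1 else math.exp(p)
        total += -w * (y * math.log(p_hat) + (1 - y) * math.log(1 - p_hat))
    return total / len(probs)


# Same prediction error, but the rarer attribute (5% positives) is penalized more.
rare = weighted_bce([0.3], [1], [0.05])
common = weighted_bce([0.3], [1], [0.9])
```

The exponential keeps every weight within [1, e], so rare attributes are emphasized without letting any single term dominate the loss.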
Funding: Supported by the National Key R&D Program of China under Grant No. 2020YFB1710200 and the National Natural Science Foundation of China under Grant Nos. 61872105 and 62072136.
Abstract: With rapid economic development, per capita car ownership in China has risen year by year, and more researchers have turned to scientific methods to solve traffic flow problems. Traffic flow is affected not only by the number of vehicles but also by various complex factors such as time, road conditions, and pedestrian flow. However, existing methods ignore the complexity of road conditions and the correlations between individual nodes, which leads to poor performance. In this study, a deep learning model, SAMGCN, is proposed to capture the correlations between individual nodes effectively and thereby improve traffic flow prediction. First, spatiotemporal decoupling is used to divide each time step of each node into finer-grained units. Second, multi-module fusion is used to mine potential periodic relationships in the data. Finally, a GRU is used to capture the potential temporal relationships among the three modules. Extensive experiments on two traffic flow datasets from the Caltrans Performance Measurement System, PeMS04 and PeMS08, demonstrate the validity of the proposed model.
Funding: Supported by the National Key R&D Program of China (Grant No. 2018YFB1702503), the Shanghai Municipal Science and Technology Major Project (Grant No. 2021SHZDZX0102), and the State Key Laboratory of Mechanical System and Vibration (Grant No. MSVZD202103).
Abstract: Shield tunneling machines are key underground engineering equipment and play a central role in tunnel construction. During shield construction, the "mud cake" formed by hard-to-remove clay adhering to the cutterhead severely reduces construction efficiency and threatens the healthy operation of the machine. In this study, we propose an enhanced transformer-based model for detecting the cutterhead clogging status of shield tunneling machines. First, working-state data are selected from historical excavation data, and a long short-term memory autoencoder module is constructed to remove outliers. Next, variational mode decomposition and the wavelet transform are employed to denoise the data. After preprocessing, non-overlapping rectangular windows are used to segment the working-state data into time slices for analysis, and several time-domain features are extracted from each slice. Because the original dataset is imbalanced, the k-means synthetic minority oversampling technique (k-means SMOTE) is adopted to oversample the time-domain features of the clogging data in the training set, balancing the dataset and improving model performance. Finally, an enhanced transformer-based neural network is constructed to extract essential implicit features and detect cutterhead clogging status. Data collected from actual tunnel construction projects are used to verify the model. The results show that it detects cutterhead clogging status accurately, with 98.85% accuracy and an F1 score of 0.9786, and significantly outperforms the comparison models.
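The segmentation step (non-overlapping windows, then time-domain features per slice) can be sketched as below. The abstract says only "several time-domain features", so the feature set here (mean, standard deviation, RMS, peak-to-peak, kurtosis) and the window size in the usage line are assumptions chosen because they are common in condition monitoring.

```python
import math


def non_overlapping_windows(signal, size):
    """Cut a 1D signal into consecutive windows of `size` samples, dropping an incomplete tail."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def time_domain_features(window):
    """A few common time-domain statistics of one window (population moments)."""
    n = len(window)
    mean = sum(window) / n
    var = sum((x - mean) ** 2 for x in window) / n
    std = math.sqrt(var)
    rms = math.sqrt(sum(x * x for x in window) / n)
    peak_to_peak = max(window) - min(window)
    # Non-excess kurtosis; defined as 0.0 here for a constant window to avoid division by zero.
    kurtosis = sum((x - mean) ** 4 for x in window) / (n * var ** 2) if var > 0 else 0.0
    return {"mean": mean, "std": std, "rms": rms, "p2p": peak_to_peak, "kurtosis": kurtosis}


features = [time_domain_features(w) for w in non_overlapping_windows(list(range(10)), 4)]
```

Each window yields one fixed-length feature vector, which is also the form an oversampler such as k-means SMOTE expects as input.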
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2021YFF0602104-1).
Abstract: To improve the accuracy of wave height parameter inversion, an intelligent single-image de-raining method for radar images based on unsupervised self-attention generative adversarial networks is proposed. The method builds a trainable end-to-end de-raining model on an unsupervised cycle-consistent adversarial network framework, which does not require pairs of rain-contaminated and corresponding ground-truth rain-free images for training. The model is trained on rain-contaminated and clean radar images in an unpaired manner, and the atmospheric scattering model parameters are not required as a prior. Additionally, a self-attention mechanism is introduced into the model so that it focuses on rain clutter when processing radar images, combining global and local rain clutter context to produce more accurate and clearer de-rained radar images. The method is validated on actual field test data: compared with the wave height derived from the original rain-contaminated data, the de-raining method reduces the root-mean-square error by 0.11 m and increases the correlation coefficient of the wave height by 14%. These results demonstrate that the method effectively reduces the impact of rain on the accuracy of wave height estimation from marine X-band radar images.
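Training without rainy/clean pairs relies on the cycle-consistency loss of CycleGAN: mapping a rainy image to clean and back should reproduce it, and likewise for a clean image. The sketch below computes that L1 term on flattened toy "images"; the two lambdas are hypothetical stand-ins for the generators, and the adversarial losses, the loss weight, and the self-attention layers are omitted.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(rainy, clean, g_derain, f_addrain):
    """||F(G(x)) - x||_1 + ||G(F(y)) - y||_1 for a rainy image x and an unpaired clean image y."""
    forward_cycle = l1_distance(f_addrain(g_derain(rainy)), rainy)
    backward_cycle = l1_distance(g_derain(f_addrain(clean)), clean)
    return forward_cycle + backward_cycle


# Toy generators: a constant "rain" offset removed (G) and added back (F).
g_toy = lambda img: [p - 0.2 for p in img]
f_toy = lambda img: [p + 0.2 for p in img]

rainy_image = [0.5, 0.7, 0.9]
clean_image = [0.1, 0.3]  # unrelated size and content: the two images need not be a pair
loss = cycle_consistency_loss(rainy_image, clean_image, g_toy, f_toy)  # ~0, the cycles close
```

Note that nothing in the loss compares a de-rained image with a ground-truth clean version of the same scene, which is exactly why paired data are unnecessary.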
Funding: Financially supported by the National Natural Science Foundation of China (Nos. 21973018 and 21534002) and the Natural Sciences and Engineering Research Council (NSERC) of Canada.
Abstract: A deep neural network model generally consists of different modules, each playing an essential role in performing a task. The optimal design of a module for modeling a physical problem is directly related to the success of the model. In this work, we numerically study the effectiveness of several special modules: the self-attention mechanism, for recognizing the importance of molecular sequence information in a polymer, and the big-stride representation and the conditional random field, for enhancing the network's ability to produce desired local configurations. Network models containing these modules are trained on well-documented data of the native structures of the HP model and assessed by their ability to predict the structures of unseen data. The specific self-attention design adopted here is modified from a similar idea in natural language processing. The big-stride representation module introduced in this work is shown to drastically improve the network's ability to model polymer segments with strong lattice position correlations.
Funding: Project supported by the National Natural Science Foundation of China (No. 71961028), the Key Research and Development Program of Gansu Province, China (No. 22YF7GA171), the University Industry Support Program of Gansu Province, China (No. 2023QB-115), the Innovation Fund for Science and Technology-based Small and Medium Enterprises of Gansu Province, China (No. 23CXGA0136), and the Scientific Research Project of the Lanzhou Science and Technology Program, China (No. 2018-01-58).
Abstract: Accurate long-term power forecasting is important for grid decision-making and customer power consumption management, ensuring a reliable power supply and the economical, reliable operation of the grid. However, most time-series forecasting models perform poorly on long-sequence prediction tasks with large amounts of data. To address this challenge, we propose a parallel time-series prediction model called LDformer. First, we combine Informer with long short-term memory (LSTM) to obtain deep representations of the time series. Then, we propose a parallel encoder module to improve the robustness of the model and combine convolutional layers with an attention mechanism to avoid value redundancy in the attention mechanism. Finally, we propose a probabilistic sparse (ProbSparse) self-attention mechanism combined with UniDrop to reduce computational overhead and mitigate the risk of losing key connections in the sequence. Experimental results on five datasets show that LDformer outperforms state-of-the-art methods in most cases across different long-sequence prediction tasks.
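ProbSparse self-attention, introduced in Informer, ranks queries by a max-minus-mean sparsity measurement and computes full attention only for the top u of them, with u on the order of c·ln L_Q. The sketch below computes the measurement exactly over all keys for clarity (Informer samples keys to reach its O(L log L) cost) and only selects the active queries; the treatment of the remaining "lazy" queries and the UniDrop combination used in LDformer are not shown.

```python
import math


def sparsity_scores(queries, keys):
    """Max-mean measurement: M(q, K) = max_j(q.k_j / sqrt(d)) - mean_j(q.k_j / sqrt(d))."""
    d = len(keys[0])
    scores = []
    for q in queries:
        dots = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        scores.append(max(dots) - sum(dots) / len(dots))
    return scores


def select_active_queries(queries, keys, c=1.0):
    """Indices of the top-u queries, u = ceil(c * ln L_Q), that would receive full attention."""
    u = max(1, min(len(queries), math.ceil(c * math.log(len(queries)))))
    scores = sparsity_scores(queries, keys)
    return sorted(range(len(queries)), key=lambda i: scores[i], reverse=True)[:u]


queries = [[0.0, 0.0], [5.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
keys = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
active = select_active_queries(queries, keys)  # the two most "peaked" queries: indices 1 and 3
```

A query whose scores are nearly uniform over the keys has M close to 0 and contributes little beyond an average of the values, which is the intuition for skipping it.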
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2017YFB0504203) and the Xinjiang Production and Construction Corps Science and Technology Project (Grant No. 2017DB005).
Abstract: Recent change detection (CD) methods focus on extracting deep change semantic features. However, existing methods overlook fine-grained features and capture long-range space-time information poorly, which causes micro changes to be missed and the edges of change types to be smoothed. In this paper, a transformer-based semantic change detection (SCD) model, Pyramid-SCDFormer, is proposed that precisely recognizes small changes and the fine edge details of changes. The model selectively merges different semantic tokens in the multi-head self-attention block to obtain multiscale features, which is crucial for extracting information from remote sensing images (RSIs) with multiple changes at different scales. Moreover, we create a well-annotated SCD dataset, Landsat-SCD, with an unprecedented time series and change types in complex scenarios. Compared with three convolutional neural network-based, one attention-based, and two transformer-based networks, Pyramid-SCDFormer stably outperforms existing state-of-the-art CD models, improving MIoU/F1 by 1.11/0.76%, 0.57/0.50%, and 8.75/8.59% on the LEVIR-CD, WHU_CD, and Landsat-SCD datasets, respectively. For change classes that account for less than 1% of the data, the model improves MIoU by 7.17% to 19.53% on Landsat-SCD. Recognition of small-scale changes and of the fine edges of change types is greatly improved.
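The MIoU and F1 figures quoted above are per-class metrics averaged over classes. The sketch below uses the standard definitions, IoU = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN), computed from a confusion matrix; the paper's exact averaging (for example, whether absent classes are skipped rather than scored 0 as here) is not stated, and the toy matrix is made up.

```python
def iou_and_f1(confusion):
    """Per-class IoU and F1 from a square confusion matrix (rows = ground truth, cols = prediction)."""
    k = len(confusion)
    ious, f1s = [], []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                          # class c pixels predicted as something else
        fp = sum(confusion[r][c] for r in range(k)) - tp     # other pixels predicted as class c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 0.0)
        f1s.append(2 * tp / (2 * tp + fp + fn) if denom else 0.0)
    return ious, f1s


def mean(xs):
    return sum(xs) / len(xs)


# Toy 2-class (unchanged / changed) confusion matrix.
ious, f1s = iou_and_f1([[50, 10], [5, 35]])
miou, mean_f1 = mean(ious), mean(f1s)
```

Since IoU penalizes both false positives and false negatives in one ratio, it drops sharply for tiny change regions, which is why the less-than-1% classes are reported separately.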
Funding: Supported by the National Natural Science Foundation of China (Nos. 61871378, U2003111, 62122013, and U2001211).
Abstract: Action recognition and localization in untrimmed videos are important for many applications and have attracted considerable attention. Since full supervision with frame-level annotation places an overwhelming burden on manual labeling, learning with weak video-level supervision is a promising alternative. In this paper, we propose a novel weakly supervised framework that simultaneously recognizes actions and locates the corresponding frames in untrimmed videos. Because abundant trimmed videos are publicly available and well segmented with semantic descriptions, the instructive knowledge learned from trimmed videos can be fully leveraged to analyze untrimmed videos. We present an effective knowledge transfer strategy based on inter-class semantic relevance. We also exploit the self-attention mechanism to obtain a compact video representation that effectively suppresses the influence of background frames. A learning architecture with twin networks for trimmed and untrimmed videos is designed to facilitate transferable self-attentive representation learning. Extensive experiments on three untrimmed benchmark datasets (THUMOS14, ActivityNet1.3, and MEXaction2) clearly corroborate the efficacy of our method. Notably, the proposed weakly supervised method even achieves results comparable to some fully supervised methods.
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61872122 and 61502131), the Zhejiang Provincial Natural Science Foundation of China (No. LY18F020015), the Open Project Program of the State Key Lab of CAD&CG, China (No. 1802), and the Zhejiang Provincial Key Research and Development Program, China (No. 2020C01067).
Abstract: Video summarization has established itself as a fundamental technique for generating compact and concise videos, which eases the management and browsing of large-scale video data. Existing methods fail to fully consider the local and global relations among video frames, which degrades summarization performance. To address this problem, we propose a graph convolutional attention network (GCAN) for video summarization. GCAN consists of two parts, embedding learning and context fusion, where embedding learning includes a temporal branch and a graph branch. In particular, GCAN uses dilated temporal convolution to model local cues and temporal self-attention to exploit global cues for video frames. It learns graph embeddings via a multi-layer graph convolutional network to reveal the intrinsic structure of frame samples. The context fusion part combines the output streams of the temporal and graph branches to create a context-aware representation of frames, from which importance scores are computed to select representative frames for the video summary. Experiments on two benchmark databases, SumMe and TVSum, show that GCAN outperforms several state-of-the-art alternatives in three evaluation settings.
Funding: Supported by the Scientific and Technological Innovation 2030 Major Project of "New Generation Artificial Intelligence" (2020AAA0109300).
Abstract: Fast and accurate fault diagnosis of strongly coupled, time-varying, multivariable complex industrial processes remains a challenging problem. We propose an industrial fault diagnosis model based on the temporal convolutional network (TCN) and the one-dimensional convolutional neural network (1DCNN). We add a batch normalization layer before the TCN layer and replace the TCN activation function, originally ReLU, with LeakyReLU. To extract local correlations among features, a 1D convolution layer is added after the TCN layer, followed by a multi-head self-attention mechanism before the fully connected layer to enhance the model's diagnostic ability. The extended Tennessee Eastman Process (TEP) dataset is used as the benchmark to evaluate model performance. The experimental results show high fault recognition accuracy and better generalization performance, which demonstrates the model's effectiveness. Additionally, applying the model to a diesel engine failure dataset from our partner's project validates its effectiveness in industrial scenarios.
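The core TCN operation is a dilated causal 1D convolution: each output depends only on the current and earlier samples, spaced `dilation` steps apart. The single-channel sketch below pairs it with LeakyReLU as described in the abstract; the 0.01 slope matches the PyTorch default but is an assumption about the paper's setting, and batch normalization, residual connections, and multi-channel filters are omitted.

```python
def leaky_relu(x, slope=0.01):
    """LeakyReLU: identity for positive inputs, a small linear slope for negative ones."""
    return x if x > 0 else slope * x


def dilated_causal_conv1d(x, kernel, dilation):
    """y[t] = sum_i kernel[i] * x[t - i*dilation], treating samples before t = 0 as zeros."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:  # causal: never read future samples
                acc += w * x[idx]
        y.append(acc)
    return y


def tcn_layer(x, kernel, dilation):
    """One simplified TCN layer: dilated causal convolution followed by LeakyReLU."""
    return [leaky_relu(v) for v in dilated_causal_conv1d(x, kernel, dilation)]


y = dilated_causal_conv1d([1, 2, 3, 4], kernel=[1, 1], dilation=2)  # [1.0, 2.0, 4.0, 6.0]
```

Stacking layers with dilations 1, 2, 4, ... grows the receptive field exponentially, and unlike ReLU, LeakyReLU keeps a nonzero gradient for negative pre-activations.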
Funding: Supported by the National Natural Science Foundation of China (Grant No. 21991093), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA29050200), the Dalian Institute of Chemical Physics (DICP I202135), and the Energy Science and Technology Revolution Project (Grant No. E2010412).
Abstract: Methanol-to-olefins, a promising non-oil pathway for synthesizing light olefins, has been successfully industrialized. Accurate prediction of process variables can bring significant benefits to advanced process control and optimization. The difficulty of this task is underscored by the failure of traditional methods to capture the complex characteristics of industrial processes, such as strong nonlinearity, dynamics, and data distribution shift caused by diverse operating conditions. In this paper, we propose a novel hybrid spatial-temporal deep learning prediction model to address these issues. First, a data normalization technique called reversible instance normalization is employed to handle the differing data distributions. Next, a convolutional neural network integrated with the self-attention mechanism extracts temporal patterns, while a multi-graph convolutional network models the spatial interactions. The extracted temporal and spatial features are then fused and fed into a fully connected neural network to complete the prediction. Finally, the outputs are denormalized to obtain the final results. Monitoring of the dynamic trends of process variables in an actual industrial methanol-to-olefins process demonstrates that our model not only achieves superior prediction performance but also reveals complex spatial-temporal relationships through the learned attention and adjacency matrices, making the model more interpretable. Lastly, the model has been deployed on an end-to-end Industrial Internet Platform with effective practical results.
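Reversible instance normalization (RevIN) normalizes each input window by its own mean and standard deviation before the network and restores those statistics on the outputs, so distribution shift between operating conditions is removed on the way in and put back on the way out. The single-variable sketch below follows the commonly published RevIN formulation; `gamma` and `beta` stand in for the learnable per-variable affine parameters, and the prediction network between the two calls is omitted.

```python
import math


def revin_normalize(series, eps=1e-5, gamma=1.0, beta=0.0):
    """Normalize one input window by its own mean/std, then apply the affine map (gamma, beta)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n + eps)
    normed = [(x - mean) / std * gamma + beta for x in series]
    return normed, (mean, std)  # statistics are kept for the reverse step


def revin_denormalize(outputs, stats, eps=1e-5, gamma=1.0, beta=0.0):
    """Undo the affine map, then restore the statistics stored for this instance."""
    mean, std = stats
    return [(y - beta) / (gamma + eps * eps) * std + mean for y in outputs]


window = [10.0, 12.0, 14.0, 20.0]          # toy history of one process variable
normed, stats = revin_normalize(window)    # this is what the forecaster would see
restored = revin_denormalize(normed, stats)
```

In the full pipeline the forecaster's predictions, not the inputs, are passed to `revin_denormalize` with the stored statistics, so forecasts come back on the original scale of that operating condition.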