Image description task is the intersection of computer vision and natural language processing,and it has important prospects,including helping computers understand images and obtaining information for the visually imp...Image description task is the intersection of computer vision and natural language processing,and it has important prospects,including helping computers understand images and obtaining information for the visually impaired.This study presents an innovative approach employing deep reinforcement learning to enhance the accuracy of natural language descriptions of images.Our method focuses on refining the reward function in deep reinforcement learning,facilitating the generation of precise descriptions by aligning visual and textual features more closely.Our approach comprises three key architectures.Firstly,it utilizes Residual Network 101(ResNet-101)and Faster Region-based Convolutional Neural Network(Faster R-CNN)to extract average and local image features,respectively,followed by the implementation of a dual attention mechanism for intricate feature fusion.Secondly,the Transformer model is engaged to derive contextual semantic features from textual data.Finally,the generation of descriptive text is executed through a two-layer long short-term memory network(LSTM),directed by the value and reward functions.Compared with the image description method that relies on deep learning,the score of Bilingual Evaluation Understudy(BLEU-1)is 0.762,which is 1.6%higher,and the score of BLEU-4 is 0.299.Consensus-based Image Description Evaluation(CIDEr)scored 0.998,Recall-Oriented Understudy for Gisting Evaluation(ROUGE)scored 0.552,the latter improved by 0.36%.These results not only attest to the viability of our approach but also highlight its superiority in the realm of image description.Future research can explore the integration of our method with other artificial intelligence(AI)domains,such as emotional AI,to create more nuanced and context-aware systems.展开更多
In recent days,Deep Learning(DL)techniques have become an emerging transformation in the field of machine learning,artificial intelligence,computer vision,and so on.Subsequently,researchers and industries have been hi...In recent days,Deep Learning(DL)techniques have become an emerging transformation in the field of machine learning,artificial intelligence,computer vision,and so on.Subsequently,researchers and industries have been highly endorsed in the medical field,predicting and controlling diverse diseases at specific intervals.Liver tumor prediction is a vital chore in analyzing and treating liver diseases.This paper proposes a novel approach for predicting liver tumors using Convolutional Neural Networks(CNN)and a depth-based variant search algorithm with advanced attention mechanisms(CNN-DS-AM).The proposed work aims to improve accuracy and robustness in diagnosing and treating liver diseases.The anticipated model is assessed on a Computed Tomography(CT)scan dataset containing both benign and malignant liver tumors.The proposed approach achieved high accuracy in predicting liver tumors,outperforming other state-of-the-art methods.Additionally,advanced attention mechanisms were incorporated into the CNN model to enable the identification and highlighting of regions of the CT scans most relevant to predicting liver tumors.The results suggest that incorporating attention mechanisms and a depth-based variant search algorithm into the CNN model is a promising approach for improving the accuracy and robustness of liver tumor prediction.It can assist radiologists in their diagnosis and treatment planning.The proposed system achieved a high accuracy of 95.5%in predicting liver tumors,outperforming other state-of-the-art methods.展开更多
Neurodegeneration is the gradual deterioration and eventual death of brain cells,leading to progressive loss of structure and function of neurons in the brain and nervous system.Neurodegenerative disorders,such as Alz...Neurodegeneration is the gradual deterioration and eventual death of brain cells,leading to progressive loss of structure and function of neurons in the brain and nervous system.Neurodegenerative disorders,such as Alzheimer’s,Huntington’s,Parkinson’s,amyotrophic lateral sclerosis,multiple system atrophy,and multiple sclerosis,are characterized by progressive deterioration of brain function,resulting in symptoms such as memory impairment,movement difficulties,and cognitive decline.Early diagnosis of these conditions is crucial to slowing down cell degeneration and reducing the severity of the diseases.Magnetic resonance imaging(MRI)is widely used by neurologists for diagnosing brain abnormalities.The majority of the research in this field focuses on processing the 2D images extracted from the 3D MRI volumetric scans for disease diagnosis.This might result in losing the volumetric information obtained from the whole brain MRI.To address this problem,a novel 3D-CNN architecture with an attention mechanism is proposed to classify whole-brain MRI images for Alzheimer’s disease(AD)detection.The 3D-CNN model uses channel and spatial attention mechanisms to extract relevant features and improve accuracy in identifying brain dysfunctions by focusing on specific regions of the brain.The pipeline takes pre-processed MRI volumetric scans as input,and the 3D-CNN model leverages both channel and spatial attention mechanisms to extract precise feature representations of the input MRI volume for accurate classification.The present study utilizes the publicly available Alzheimer’s disease Neuroimaging Initiative(ADNI)dataset,which has three image classes:Mild Cognitive Impairment(MCI),Cognitive Normal(CN),and AD affected.The proposed approach achieves an overall accuracy of 79%when classifying three classes and an average accuracy of 87%when identifying AD and the other two classes.The findings reveal that 3D-CNN models with an attention mechanism exhibit significantly higher classification performance compared to other models,highlighting the potential of deep learning algorithms to aid in the early detection and prediction of AD.展开更多
Spam emails pose a threat to individuals. The proliferation of spam emails daily has rendered traditional machine learning and deep learning methods for screening them ineffective and inefficient. In our research, we ...Spam emails pose a threat to individuals. The proliferation of spam emails daily has rendered traditional machine learning and deep learning methods for screening them ineffective and inefficient. In our research, we employ deep neural networks like RNN, LSTM, and GRU, incorporating attention mechanisms such as Bahdanua, scaled dot product (SDP), and Luong scaled dot product self-attention for spam email filtering. We evaluate our approach on various datasets, including Trec spam, Enron spam emails, SMS spam collections, and the Ling spam dataset, which constitutes a substantial custom dataset. All these datasets are publicly available. For the Enron dataset, we attain an accuracy of 99.97% using LSTM with SDP self-attention. Our custom dataset exhibits the highest accuracy of 99.01% when employing GRU with SDP self-attention. The SMS spam collection dataset yields a peak accuracy of 99.61% with LSTM and SDP attention. Using the GRU (Gated Recurrent Unit) alongside Luong and SDP (Structured Self-Attention) attention mechanisms, the peak accuracy of 99.89% in the Ling spam dataset. For the Trec spam dataset, the most accurate results are achieved using Luong attention LSTM, with an accuracy rate of 99.01%. Our performance analyses consistently indicate that employing the scaled dot product attention mechanism in conjunction with gated recurrent neural networks (GRU) delivers the most effective results. In summary, our research underscores the efficacy of employing advanced deep learning techniques and attention mechanisms for spam email filtering, with remarkable accuracy across multiple datasets. This approach presents a promising solution to the ever-growing problem of spam emails.展开更多
Aiming at the problem that existing models in aspect-level sentiment analysis cannot fully and effectively utilize sentence semantic and syntactic structure information, this paper proposes a graph neural network-base...Aiming at the problem that existing models in aspect-level sentiment analysis cannot fully and effectively utilize sentence semantic and syntactic structure information, this paper proposes a graph neural network-based aspect-level sentiment classification model. Self-attention, aspectual word multi-head attention and dependent syntactic relations are fused and the node representations are enhanced with graph convolutional networks to enable the model to fully learn the global semantic and syntactic structural information of sentences. Experimental results show that the model performs well on three public benchmark datasets Rest14, Lap14, and Twitter, improving the accuracy of sentiment classification.展开更多
Unmanned aerial vehicles(UAVs) have gained significant attention in practical applications, especially the low-altitude aerial(LAA) object detection imposes stringent requirements on recognition accuracy and computati...Unmanned aerial vehicles(UAVs) have gained significant attention in practical applications, especially the low-altitude aerial(LAA) object detection imposes stringent requirements on recognition accuracy and computational resources. In this paper, the LAA images-oriented tensor decomposition and knowledge distillation-based network(TDKD-Net) is proposed,where the TT-format TD(tensor decomposition) and equalweighted response-based KD(knowledge distillation) methods are designed to minimize redundant parameters while ensuring comparable performance. Moreover, some robust network structures are developed, including the small object detection head and the dual-domain attention mechanism, which enable the model to leverage the learned knowledge from small-scale targets and selectively focus on salient features. Considering the imbalance of bounding box regression samples and the inaccuracy of regression geometric factors, the focal and efficient IoU(intersection of union) loss with optimal transport assignment(F-EIoU-OTA)mechanism is proposed to improve the detection accuracy. The proposed TDKD-Net is comprehensively evaluated through extensive experiments, and the results have demonstrated the effectiveness and superiority of the developed methods in comparison to other advanced detection algorithms, which also present high generalization and strong robustness. As a resource-efficient precise network, the complex detection of small and occluded LAA objects is also well addressed by TDKD-Net, which provides useful insights on handling imbalanced issues and realizing domain adaptation.展开更多
The real-time detection and instance segmentation of strawberries constitute fundamental components in the development of strawberry harvesting robots.Real-time identification of strawberries in an unstructured envi-r...The real-time detection and instance segmentation of strawberries constitute fundamental components in the development of strawberry harvesting robots.Real-time identification of strawberries in an unstructured envi-ronment is a challenging task.Current instance segmentation algorithms for strawberries suffer from issues such as poor real-time performance and low accuracy.To this end,the present study proposes an Efficient YOLACT(E-YOLACT)algorithm for strawberry detection and segmentation based on the YOLACT framework.The key enhancements of the E-YOLACT encompass the development of a lightweight attention mechanism,pyramid squeeze shuffle attention(PSSA),for efficient feature extraction.Additionally,an attention-guided context-feature pyramid network(AC-FPN)is employed instead of FPN to optimize the architecture’s performance.Furthermore,a feature-enhanced model(FEM)is introduced to enhance the prediction head’s capabilities,while efficient fast non-maximum suppression(EF-NMS)is devised to improve non-maximum suppression.The experimental results demonstrate that the E-YOLACT achieves a Box-mAP and Mask-mAP of 77.9 and 76.6,respectively,on the custom dataset.Moreover,it exhibits an impressive category accuracy of 93.5%.Notably,the E-YOLACT also demonstrates a remarkable real-time detection capability with a speed of 34.8 FPS.The method proposed in this article presents an efficient approach for the vision system of a strawberry-picking robot.展开更多
In pursuit of cost-effective manufacturing,enterprises are increasingly adopting the practice of utilizing recycled semiconductor chips.To ensure consistent chip orientation during packaging,a circular marker on the f...In pursuit of cost-effective manufacturing,enterprises are increasingly adopting the practice of utilizing recycled semiconductor chips.To ensure consistent chip orientation during packaging,a circular marker on the front side is employed for pin alignment following successful functional testing.However,recycled chips often exhibit substantial surface wear,and the identification of the relatively small marker proves challenging.Moreover,the complexity of generic target detection algorithms hampers seamless deployment.Addressing these issues,this paper introduces a lightweight YOLOv8s-based network tailored for detecting markings on recycled chips,termed Van-YOLOv8.Initially,to alleviate the influence of diminutive,low-resolution markings on the precision of deep learning models,we utilize an upscaling approach for enhanced resolution.This technique relies on the Super-Resolution Generative Adversarial Network with Extended Training(SRGANext)network,facilitating the reconstruction of high-fidelity images that align with input specifications.Subsequently,we replace the original YOLOv8smodel’s backbone feature extraction network with the lightweight VanillaNetwork(VanillaNet),simplifying the branch structure to reduce network parameters.Finally,a Hybrid Attention Mechanism(HAM)is implemented to capture essential details from input images,improving feature representation while concurrently expediting model inference speed.Experimental results demonstrate that the Van-YOLOv8 network outperforms the original YOLOv8s on a recycled chip dataset in various aspects.Significantly,it demonstrates superiority in parameter count,computational intricacy,precision in identifying targets,and speed when compared to certain prevalent algorithms in the current landscape.The proposed approach proves promising for real-time detection of recycled chips in practical factory settings.展开更多
The attention mechanism can extract salient features in images,which has been proved to be effective in improving the performance of person re-identification(Re-ID).However,most of the existing attention modules have ...The attention mechanism can extract salient features in images,which has been proved to be effective in improving the performance of person re-identification(Re-ID).However,most of the existing attention modules have the following two shortcomings:On the one hand,they mostly use global average pooling to generate context descriptors,without highlighting the guiding role of salient information on descriptor generation,resulting in insufficient ability of the final generated attention mask representation;On the other hand,the design of most attention modules is complicated,which greatly increases the computational cost of the model.To solve these problems,this paper proposes an attention module called self-supervised recalibration(SR)block,which introduces both global and local information through adaptive weighted fusion to generate a more refined attention mask.In particular,a special"Squeeze-Excitation"(SE)unit is designed in the SR block to further process the generated intermediate masks,both for nonlinearizations of the features and for constraint of the resulting computation by controlling the number of channels.Furthermore,we combine the most commonly used Res Net-50 to construct the instantiation model of the SR block,and verify its effectiveness on multiple Re-ID datasets,especially the mean Average Precision(m AP)on the Occluded-Duke dataset exceeds the state-of-the-art(SOTA)algorithm by 4.49%.展开更多
Early screening of diabetes retinopathy(DR)plays an important role in preventing irreversible blindness.Existing research has failed to fully explore effective DR lesion information in fundus maps.Besides,traditional ...Early screening of diabetes retinopathy(DR)plays an important role in preventing irreversible blindness.Existing research has failed to fully explore effective DR lesion information in fundus maps.Besides,traditional attention schemes have not considered the impact of lesion type differences on grading,resulting in unreasonable extraction of important lesion features.Therefore,this paper proposes a DR diagnosis scheme that integrates a multi-level patch attention generator(MPAG)and a lesion localization module(LLM).Firstly,MPAGis used to predict patches of different sizes and generate a weighted attention map based on the prediction score and the types of lesions contained in the patches,fully considering the impact of lesion type differences on grading,solving the problem that the attention maps of lesions cannot be further refined and then adapted to the final DR diagnosis task.Secondly,the LLM generates a global attention map based on localization.Finally,the weighted attention map and global attention map are weighted with the fundus map to fully explore effective DR lesion information and increase the attention of the classification network to lesion details.This paper demonstrates the effectiveness of the proposed method through extensive experiments on the public DDR dataset,obtaining an accuracy of 0.8064.展开更多
Financial time series prediction,whether for classification or regression,has been a heated research topic over the last decade.While traditional machine learning algorithms have experienced mediocre results,deep lear...Financial time series prediction,whether for classification or regression,has been a heated research topic over the last decade.While traditional machine learning algorithms have experienced mediocre results,deep learning has largely contributed to the elevation of the prediction performance.Currently,the most up-to-date review of advanced machine learning techniques for financial time series prediction is still lacking,making it challenging for finance domain experts and relevant practitioners to determine which model potentially performs better,what techniques and components are involved,and how themodel can be designed and implemented.This review article provides an overview of techniques,components and frameworks for financial time series prediction,with an emphasis on state-of-the-art deep learning models in the literature from2015 to 2023,including standalonemodels like convolutional neural networks(CNN)that are capable of extracting spatial dependencies within data,and long short-term memory(LSTM)that is designed for handling temporal dependencies;and hybrid models integrating CNN,LSTM,attention mechanism(AM)and other techniques.For illustration and comparison purposes,models proposed in recent studies are mapped to relevant elements of a generalized framework comprised of input,output,feature extraction,prediction,and related processes.Among the state-of-the-artmodels,hybrid models like CNNLSTMand CNN-LSTM-AM in general have been reported superior in performance to stand-alone models like the CNN-only model.Some remaining challenges have been discussed,including non-friendliness for finance domain experts,delayed prediction,domain knowledge negligence,lack of standards,and inability of real-time and highfrequency predictions.The principal contributions of this paper are to provide a one-stop guide for both academia and industry to review,compare and summarize technologies and recent advances in this area,to facilitate smooth and informed implementation,and to highlight future research directions.展开更多
With the increasing proportion of encrypted traffic in cyberspace, the classification of encrypted traffic has becomea core key technology in network supervision. In recent years, many different solutions have emerged...With the increasing proportion of encrypted traffic in cyberspace, the classification of encrypted traffic has becomea core key technology in network supervision. In recent years, many different solutions have emerged in this field.Most methods identify and classify traffic by extracting spatiotemporal characteristics of data flows or byte-levelfeatures of packets. However, due to changes in data transmission mediums, such as fiber optics and satellites,temporal features can exhibit significant variations due to changes in communication links and transmissionquality. Additionally, partial spatial features can change due to reasons like data reordering and retransmission.Faced with these challenges, identifying encrypted traffic solely based on packet byte-level features is significantlydifficult. To address this, we propose a universal packet-level encrypted traffic identification method, ComboPacket. This method utilizes convolutional neural networks to extract deep features of the current packet andits contextual information and employs spatial and channel attention mechanisms to select and locate effectivefeatures. Experimental data shows that Combo Packet can effectively distinguish between encrypted traffic servicecategories (e.g., File Transfer Protocol, FTP, and Peer-to-Peer, P2P) and encrypted traffic application categories (e.g.,BitTorrent and Skype). Validated on the ISCX VPN-non VPN dataset, it achieves classification accuracies of 97.0%and 97.1% for service and application categories, respectively. It also provides shorter training times and higherrecognition speeds. The performance and recognition capabilities of Combo Packet are significantly superior tothe existing classification methods mentioned.展开更多
Crowdsourcing technology is widely recognized for its effectiveness in task scheduling and resource allocation.While traditional methods for task allocation can help reduce costs and improve efficiency,they may encoun...Crowdsourcing technology is widely recognized for its effectiveness in task scheduling and resource allocation.While traditional methods for task allocation can help reduce costs and improve efficiency,they may encounter challenges when dealing with abnormal data flow nodes,leading to decreased allocation accuracy and efficiency.To address these issues,this study proposes a novel two-part invalid detection task allocation framework.In the first step,an anomaly detection model is developed using a dynamic self-attentive GAN to identify anomalous data.Compared to the baseline method,the model achieves an approximately 4%increase in the F1 value on the public dataset.In the second step of the framework,task allocation modeling is performed using a twopart graph matching method.This phase introduces a P-queue KM algorithm that implements a more efficient optimization strategy.The allocation efficiency is improved by approximately 23.83%compared to the baseline method.Empirical results confirm the effectiveness of the proposed framework in detecting abnormal data nodes,enhancing allocation precision,and achieving efficient allocation.展开更多
Remote sensing imagery,due to its high altitude,presents inherent challenges characterized by multiple scales,limited target areas,and intricate backgrounds.These inherent traits often lead to increased miss and false...Remote sensing imagery,due to its high altitude,presents inherent challenges characterized by multiple scales,limited target areas,and intricate backgrounds.These inherent traits often lead to increased miss and false detection rates when applying object recognition algorithms tailored for remote sensing imagery.Additionally,these complexities contribute to inaccuracies in target localization and hinder precise target categorization.This paper addresses these challenges by proposing a solution:The YOLO-MFD model(YOLO-MFD:Remote Sensing Image Object Detection withMulti-scale Fusion Dynamic Head).Before presenting our method,we delve into the prevalent issues faced in remote sensing imagery analysis.Specifically,we emphasize the struggles of existing object recognition algorithms in comprehensively capturing critical image features amidst varying scales and complex backgrounds.To resolve these issues,we introduce a novel approach.First,we propose the implementation of a lightweight multi-scale module called CEF.This module significantly improves the model’s ability to comprehensively capture important image features by merging multi-scale feature information.It effectively addresses the issues of missed detection and mistaken alarms that are common in remote sensing imagery.Second,an additional layer of small target detection heads is added,and a residual link is established with the higher-level feature extraction module in the backbone section.This allows the model to incorporate shallower information,significantly improving the accuracy of target localization in remotely sensed images.Finally,a dynamic head attentionmechanism is introduced.This allows themodel to exhibit greater flexibility and accuracy in recognizing shapes and targets of different sizes.Consequently,the precision of object detection is significantly improved.The trial results show that the YOLO-MFD model shows improvements of 6.3%,3.5%,and 2.5%over the original YOLOv8 model in Precision,map@0.5 and map@0.5:0.95,separately.These results illustrate the clear advantages of the method.展开更多
Traffic flow prediction plays a key role in the construction of intelligent transportation system.However,due to its complex spatio-temporal dependence and its uncertainty,the research becomes very challenging.Most of...Traffic flow prediction plays a key role in the construction of intelligent transportation system.However,due to its complex spatio-temporal dependence and its uncertainty,the research becomes very challenging.Most of the existing studies are based on graph neural networks that model traffic flow graphs and try to use fixed graph structure to deal with the relationship between nodes.However,due to the time-varying spatial correlation of the traffic network,there is no fixed node relationship,and these methods cannot effectively integrate the temporal and spatial features.This paper proposes a novel temporal-spatial dynamic graph convolutional network(TSADGCN).The dynamic time warping algorithm(DTW)is introduced to calculate the similarity of traffic flow sequence among network nodes in the time dimension,and the spatiotemporal graph of traffic flow is constructed to capture the spatiotemporal characteristics and dependencies of traffic flow.By combining graph attention network and time attention network,a spatiotemporal convolution block is constructed to capture spatiotemporal characteristics of traffic data.Experiments on open data sets PEMSD4 and PEMSD8 show that TSADGCN has higher prediction accuracy than well-known traffic flow prediction algorithms.展开更多
Recently,deep image-hiding techniques have attracted considerable attention in covert communication and high-capacity information hiding.However,these approaches have some limitations.For example,a cover image lacks s...Recently,deep image-hiding techniques have attracted considerable attention in covert communication and high-capacity information hiding.However,these approaches have some limitations.For example,a cover image lacks self-adaptability,information leakage,or weak concealment.To address these issues,this study proposes a universal and adaptable image-hiding method.First,a domain attention mechanism is designed by combining the Atrous convolution,which makes better use of the relationship between the secret image domain and the cover image domain.Second,to improve perceived human similarity,perceptual loss is incorporated into the training process.The experimental results are promising,with the proposed method achieving an average pixel discrepancy(APD)of 1.83 and a peak signal-to-noise ratio(PSNR)value of 40.72 dB between the cover and stego images,indicative of its high-quality output.Furthermore,the structural similarity index measure(SSIM)reaches 0.985 while the learned perceptual image patch similarity(LPIPS)remarkably registers at 0.0001.Moreover,self-testing and cross-experiments demonstrate the model’s adaptability and generalization in unknown hidden spaces,making it suitable for diverse computer vision tasks.展开更多
In recent years,how to efficiently and accurately identify multi-model fake news has become more challenging.First,multi-model data provides more evidence but not all are equally important.Secondly,social structure in...In recent years,how to efficiently and accurately identify multi-model fake news has become more challenging.First,multi-model data provides more evidence but not all are equally important.Secondly,social structure information has proven to be effective in fake news detection and how to combine it while reducing the noise information is critical.Unfortunately,existing approaches fail to handle these problems.This paper proposes a multi-model fake news detection framework based on Tex-modal Dominance and fusing Multiple Multi-model Cues(TD-MMC),which utilizes three valuable multi-model clues:text-model importance,text-image complementary,and text-image inconsistency.TD-MMC is dominated by textural content and assisted by image information while using social network information to enhance text representation.To reduce the irrelevant social structure’s information interference,we use a unidirectional cross-modal attention mechanism to selectively learn the social structure’s features.A cross-modal attention mechanism is adopted to obtain text-image cross-modal features while retaining textual features to reduce the loss of important information.In addition,TD-MMC employs a new multi-model loss to improve the model’s generalization ability.Extensive experiments have been conducted on two public real-world English and Chinese datasets,and the results show that our proposed model outperforms the state-of-the-art methods on classification evaluation metrics.展开更多
Long-term urban traffic flow prediction is an important task in the field of intelligent transportation,as it can help optimize traffic management and improve travel efficiency.To improve prediction accuracy,a crucial...Long-term urban traffic flow prediction is an important task in the field of intelligent transportation,as it can help optimize traffic management and improve travel efficiency.To improve prediction accuracy,a crucial issue is how to model spatiotemporal dependency in urban traffic data.In recent years,many studies have adopted spatiotemporal neural networks to extract key information from traffic data.However,most models ignore the semantic spatial similarity between long-distance areas when mining spatial dependency.They also ignore the impact of predicted time steps on the next unpredicted time step for making long-term predictions.Moreover,these models lack a comprehensive data embedding process to represent complex spatiotemporal dependency.This paper proposes a multi-scale persistent spatiotemporal transformer(MSPSTT)model to perform accurate long-term traffic flow prediction in cities.MSPSTT adopts an encoder-decoder structure and incorporates temporal,periodic,and spatial features to fully embed urban traffic data to address these issues.The model consists of a spatiotemporal encoder and a spatiotemporal decoder,which rely on temporal,geospatial,and semantic space multi-head attention modules to dynamically extract temporal,geospatial,and semantic characteristics.The spatiotemporal decoder combines the context information provided by the encoder,integrates the predicted time step information,and is iteratively updated to learn the correlation between different time steps in the broader time range to improve the model’s accuracy for long-term prediction.Experiments on four public transportation datasets demonstrate that MSPSTT outperforms the existing models by up to 9.5%on three common metrics.展开更多
Predictive Business Process Monitoring(PBPM)is a significant research area in Business Process Management(BPM)aimed at accurately forecasting future behavioral events.At present,deep learning methods are widely cited ...Predictive Business Process Monitoring(PBPM)is a significant research area in Business Process Management(BPM)aimed at accurately forecasting future behavioral events.At present,deep learning methods are widely cited in PBPM research,but no method has been effective in fusing data information into the control flow for multi-perspective process prediction.Therefore,this paper proposes a process prediction method based on the hierarchical BERT and multi-perspective data fusion.Firstly,the first layer BERT network learns the correlations between different category attribute data.Then,the attribute data is integrated into a weighted event-level feature vector and input into the second layer BERT network to learn the impact and priority relationship of each event on future predicted events.Next,the multi-head attention mechanism within the framework is visualized for analysis,helping to understand the decision-making logic of the framework and providing visual predictions.Finally,experimental results show that the predictive accuracy of the framework surpasses the current state-of-the-art research methods and significantly enhances the predictive performance of BPM.展开更多
Aiming at the challenges associated with the absence of a labeled dataset for Yi characters and the complexity of Yi character detection and recognition,we present a deep learning-based approach for Yi character detec...Aiming at the challenges associated with the absence of a labeled dataset for Yi characters and the complexity of Yi character detection and recognition,we present a deep learning-based approach for Yi character detection and recognition.In the detection stage,an improved Differentiable Binarization Network(DBNet)framework is introduced to detect Yi characters,in which the Omni-dimensional Dynamic Convolution(ODConv)is combined with the ResNet-18 feature extraction module to obtain multi-dimensional complementary features,thereby improving the accuracy of Yi character detection.Then,the feature pyramid network fusion module is used to further extract Yi character image features,improving target recognition at different scales.Further,the previously generated feature map is passed through a head network to produce two maps:a probability map and an adaptive threshold map of the same size as the original map.These maps are then subjected to a differentiable binarization process,resulting in an approximate binarization map.This map helps to identify the boundaries of the text boxes.Finally,the text detection box is generated after the post-processing stage.In the recognition stage,an improved lightweight MobileNetV3 framework is used to recognize the detect character regions,where the original Squeeze-and-Excitation(SE)block is replaced by the efficient Shuffle Attention(SA)that integrates spatial and channel attention,improving the accuracy of Yi characters recognition.Meanwhile,the use of depth separable convolution and reversible residual structure can reduce the number of parameters and computation of the model,so that the model can better understand the contextual information and improve the accuracy of text recognition.The experimental results illustrate that the proposed method achieves good results in detecting and recognizing Yi characters,with detection and recognition accuracy rates of 97.5%and 96.8%,respectively.And also,we have compared the detection and recognition algorithms proposed in this paper with other typical algorithms.In these comparisons,the proposed model achieves better detection and recognition results with a certain reliability.展开更多
基金This research was funded by the Natural Science Foundation of Gansu Province with Approval Numbers 20JR10RA334 and 21JR7RA570Funding is provided for the 2021 Longyuan Youth Innovation and Entrepreneurship Talent Project with Approval Number 2021LQGR20+1 种基金the University Level Innovation Project with Approval NumbersGZF2020XZD18jbzxyb2018-01 of Gansu University of Political Science and Law.
文摘Image description task is the intersection of computer vision and natural language processing,and it has important prospects,including helping computers understand images and obtaining information for the visually impaired.This study presents an innovative approach employing deep reinforcement learning to enhance the accuracy of natural language descriptions of images.Our method focuses on refining the reward function in deep reinforcement learning,facilitating the generation of precise descriptions by aligning visual and textual features more closely.Our approach comprises three key architectures.Firstly,it utilizes Residual Network 101(ResNet-101)and Faster Region-based Convolutional Neural Network(Faster R-CNN)to extract average and local image features,respectively,followed by the implementation of a dual attention mechanism for intricate feature fusion.Secondly,the Transformer model is engaged to derive contextual semantic features from textual data.Finally,the generation of descriptive text is executed through a two-layer long short-term memory network(LSTM),directed by the value and reward functions.Compared with the image description method that relies on deep learning,the score of Bilingual Evaluation Understudy(BLEU-1)is 0.762,which is 1.6%higher,and the score of BLEU-4 is 0.299.Consensus-based Image Description Evaluation(CIDEr)scored 0.998,Recall-Oriented Understudy for Gisting Evaluation(ROUGE)scored 0.552,the latter improved by 0.36%.These results not only attest to the viability of our approach but also highlight its superiority in the realm of image description.Future research can explore the integration of our method with other artificial intelligence(AI)domains,such as emotional AI,to create more nuanced and context-aware systems.
文摘In recent days,Deep Learning(DL)techniques have become an emerging transformation in the field of machine learning,artificial intelligence,computer vision,and so on.Subsequently,researchers and industries have been highly endorsed in the medical field,predicting and controlling diverse diseases at specific intervals.Liver tumor prediction is a vital chore in analyzing and treating liver diseases.This paper proposes a novel approach for predicting liver tumors using Convolutional Neural Networks(CNN)and a depth-based variant search algorithm with advanced attention mechanisms(CNN-DS-AM).The proposed work aims to improve accuracy and robustness in diagnosing and treating liver diseases.The anticipated model is assessed on a Computed Tomography(CT)scan dataset containing both benign and malignant liver tumors.The proposed approach achieved high accuracy in predicting liver tumors,outperforming other state-of-the-art methods.Additionally,advanced attention mechanisms were incorporated into the CNN model to enable the identification and highlighting of regions of the CT scans most relevant to predicting liver tumors.The results suggest that incorporating attention mechanisms and a depth-based variant search algorithm into the CNN model is a promising approach for improving the accuracy and robustness of liver tumor prediction.It can assist radiologists in their diagnosis and treatment planning.The proposed system achieved a high accuracy of 95.5%in predicting liver tumors,outperforming other state-of-the-art methods.
文摘Neurodegeneration is the gradual deterioration and eventual death of brain cells,leading to progressive loss of structure and function of neurons in the brain and nervous system.Neurodegenerative disorders,such as Alzheimer’s,Huntington’s,Parkinson’s,amyotrophic lateral sclerosis,multiple system atrophy,and multiple sclerosis,are characterized by progressive deterioration of brain function,resulting in symptoms such as memory impairment,movement difficulties,and cognitive decline.Early diagnosis of these conditions is crucial to slowing down cell degeneration and reducing the severity of the diseases.Magnetic resonance imaging(MRI)is widely used by neurologists for diagnosing brain abnormalities.The majority of the research in this field focuses on processing the 2D images extracted from the 3D MRI volumetric scans for disease diagnosis.This might result in losing the volumetric information obtained from the whole brain MRI.To address this problem,a novel 3D-CNN architecture with an attention mechanism is proposed to classify whole-brain MRI images for Alzheimer’s disease(AD)detection.The 3D-CNN model uses channel and spatial attention mechanisms to extract relevant features and improve accuracy in identifying brain dysfunctions by focusing on specific regions of the brain.The pipeline takes pre-processed MRI volumetric scans as input,and the 3D-CNN model leverages both channel and spatial attention mechanisms to extract precise feature representations of the input MRI volume for accurate classification.The present study utilizes the publicly available Alzheimer’s disease Neuroimaging Initiative(ADNI)dataset,which has three image classes:Mild Cognitive Impairment(MCI),Cognitive Normal(CN),and AD affected.The proposed approach achieves an overall accuracy of 79%when classifying three classes and an average accuracy of 87%when identifying AD and the other two classes.The findings reveal that 3D-CNN models with an attention mechanism exhibit significantly higher classification performance compared to other models,highlighting the potential of deep learning algorithms to aid in the early detection and prediction of AD.
文摘Spam emails pose a threat to individuals. The proliferation of spam emails daily has rendered traditional machine learning and deep learning methods for screening them ineffective and inefficient. In our research, we employ deep neural networks like RNN, LSTM, and GRU, incorporating attention mechanisms such as Bahdanua, scaled dot product (SDP), and Luong scaled dot product self-attention for spam email filtering. We evaluate our approach on various datasets, including Trec spam, Enron spam emails, SMS spam collections, and the Ling spam dataset, which constitutes a substantial custom dataset. All these datasets are publicly available. For the Enron dataset, we attain an accuracy of 99.97% using LSTM with SDP self-attention. Our custom dataset exhibits the highest accuracy of 99.01% when employing GRU with SDP self-attention. The SMS spam collection dataset yields a peak accuracy of 99.61% with LSTM and SDP attention. Using the GRU (Gated Recurrent Unit) alongside Luong and SDP (Structured Self-Attention) attention mechanisms, the peak accuracy of 99.89% in the Ling spam dataset. For the Trec spam dataset, the most accurate results are achieved using Luong attention LSTM, with an accuracy rate of 99.01%. Our performance analyses consistently indicate that employing the scaled dot product attention mechanism in conjunction with gated recurrent neural networks (GRU) delivers the most effective results. In summary, our research underscores the efficacy of employing advanced deep learning techniques and attention mechanisms for spam email filtering, with remarkable accuracy across multiple datasets. This approach presents a promising solution to the ever-growing problem of spam emails.
文摘Aiming at the problem that existing models in aspect-level sentiment analysis cannot fully and effectively utilize sentence semantic and syntactic structure information, this paper proposes a graph neural network-based aspect-level sentiment classification model. Self-attention, aspectual word multi-head attention and dependent syntactic relations are fused and the node representations are enhanced with graph convolutional networks to enable the model to fully learn the global semantic and syntactic structural information of sentences. Experimental results show that the model performs well on three public benchmark datasets Rest14, Lap14, and Twitter, improving the accuracy of sentiment classification.
基金supported in part by the National Natural Science Foundation of China (62073271)the Natural Science Foundation for Distinguished Young Scholars of the Fujian Province of China (2023J06010)the Fundamental Research Funds for the Central Universities of China(20720220076)。
文摘Unmanned aerial vehicles(UAVs) have gained significant attention in practical applications, especially the low-altitude aerial(LAA) object detection imposes stringent requirements on recognition accuracy and computational resources. In this paper, the LAA images-oriented tensor decomposition and knowledge distillation-based network(TDKD-Net) is proposed,where the TT-format TD(tensor decomposition) and equalweighted response-based KD(knowledge distillation) methods are designed to minimize redundant parameters while ensuring comparable performance. Moreover, some robust network structures are developed, including the small object detection head and the dual-domain attention mechanism, which enable the model to leverage the learned knowledge from small-scale targets and selectively focus on salient features. Considering the imbalance of bounding box regression samples and the inaccuracy of regression geometric factors, the focal and efficient IoU(intersection of union) loss with optimal transport assignment(F-EIoU-OTA)mechanism is proposed to improve the detection accuracy. The proposed TDKD-Net is comprehensively evaluated through extensive experiments, and the results have demonstrated the effectiveness and superiority of the developed methods in comparison to other advanced detection algorithms, which also present high generalization and strong robustness. As a resource-efficient precise network, the complex detection of small and occluded LAA objects is also well addressed by TDKD-Net, which provides useful insights on handling imbalanced issues and realizing domain adaptation.
基金funded by Anhui Provincial Natural Science Foundation(No.2208085ME128)the Anhui University-Level Special Project of Anhui University of Science and Technology(No.XCZX2021-01)+1 种基金the Research and the Development Fund of the Institute of Environmental Friendly Materials and Occupational Health,Anhui University of Science and Technology(No.ALW2022YF06)Anhui Province New Era Education Quality Project(Graduate Education)(No.2022xscx073).
文摘The real-time detection and instance segmentation of strawberries constitute fundamental components in the development of strawberry harvesting robots.Real-time identification of strawberries in an unstructured envi-ronment is a challenging task.Current instance segmentation algorithms for strawberries suffer from issues such as poor real-time performance and low accuracy.To this end,the present study proposes an Efficient YOLACT(E-YOLACT)algorithm for strawberry detection and segmentation based on the YOLACT framework.The key enhancements of the E-YOLACT encompass the development of a lightweight attention mechanism,pyramid squeeze shuffle attention(PSSA),for efficient feature extraction.Additionally,an attention-guided context-feature pyramid network(AC-FPN)is employed instead of FPN to optimize the architecture’s performance.Furthermore,a feature-enhanced model(FEM)is introduced to enhance the prediction head’s capabilities,while efficient fast non-maximum suppression(EF-NMS)is devised to improve non-maximum suppression.The experimental results demonstrate that the E-YOLACT achieves a Box-mAP and Mask-mAP of 77.9 and 76.6,respectively,on the custom dataset.Moreover,it exhibits an impressive category accuracy of 93.5%.Notably,the E-YOLACT also demonstrates a remarkable real-time detection capability with a speed of 34.8 FPS.The method proposed in this article presents an efficient approach for the vision system of a strawberry-picking robot.
基金the Liaoning Provincial Department of Education 2021 Annual Scientific Research Funding Program(Grant Numbers LJKZ0535,LJKZ0526)the 2021 Annual Comprehensive Reform of Undergraduate Education Teaching(Grant Numbers JGLX2021020,JCLX2021008)Graduate Innovation Fund of Dalian Polytechnic University(Grant Number 2023CXYJ13).
文摘In pursuit of cost-effective manufacturing,enterprises are increasingly adopting the practice of utilizing recycled semiconductor chips.To ensure consistent chip orientation during packaging,a circular marker on the front side is employed for pin alignment following successful functional testing.However,recycled chips often exhibit substantial surface wear,and the identification of the relatively small marker proves challenging.Moreover,the complexity of generic target detection algorithms hampers seamless deployment.Addressing these issues,this paper introduces a lightweight YOLOv8s-based network tailored for detecting markings on recycled chips,termed Van-YOLOv8.Initially,to alleviate the influence of diminutive,low-resolution markings on the precision of deep learning models,we utilize an upscaling approach for enhanced resolution.This technique relies on the Super-Resolution Generative Adversarial Network with Extended Training(SRGANext)network,facilitating the reconstruction of high-fidelity images that align with input specifications.Subsequently,we replace the original YOLOv8smodel’s backbone feature extraction network with the lightweight VanillaNetwork(VanillaNet),simplifying the branch structure to reduce network parameters.Finally,a Hybrid Attention Mechanism(HAM)is implemented to capture essential details from input images,improving feature representation while concurrently expediting model inference speed.Experimental results demonstrate that the Van-YOLOv8 network outperforms the original YOLOv8s on a recycled chip dataset in various aspects.Significantly,it demonstrates superiority in parameter count,computational intricacy,precision in identifying targets,and speed when compared to certain prevalent algorithms in the current landscape.The proposed approach proves promising for real-time detection of recycled chips in practical factory settings.
基金supported in part by the Natural Science Foundation of Xinjiang Uygur Autonomous Region(Grant No.2022D01B186 and No.2022D01B05)。
文摘The attention mechanism can extract salient features in images,which has been proved to be effective in improving the performance of person re-identification(Re-ID).However,most of the existing attention modules have the following two shortcomings:On the one hand,they mostly use global average pooling to generate context descriptors,without highlighting the guiding role of salient information on descriptor generation,resulting in insufficient ability of the final generated attention mask representation;On the other hand,the design of most attention modules is complicated,which greatly increases the computational cost of the model.To solve these problems,this paper proposes an attention module called self-supervised recalibration(SR)block,which introduces both global and local information through adaptive weighted fusion to generate a more refined attention mask.In particular,a special"Squeeze-Excitation"(SE)unit is designed in the SR block to further process the generated intermediate masks,both for nonlinearizations of the features and for constraint of the resulting computation by controlling the number of channels.Furthermore,we combine the most commonly used Res Net-50 to construct the instantiation model of the SR block,and verify its effectiveness on multiple Re-ID datasets,especially the mean Average Precision(m AP)on the Occluded-Duke dataset exceeds the state-of-the-art(SOTA)algorithm by 4.49%.
基金supported in part by the Research on the Application of Multimodal Artificial Intelligence in Diagnosis and Treatment of Type 2 Diabetes under Grant No.2020SK50910in part by the Hunan Provincial Natural Science Foundation of China under Grant 2023JJ60020.
文摘Early screening of diabetes retinopathy(DR)plays an important role in preventing irreversible blindness.Existing research has failed to fully explore effective DR lesion information in fundus maps.Besides,traditional attention schemes have not considered the impact of lesion type differences on grading,resulting in unreasonable extraction of important lesion features.Therefore,this paper proposes a DR diagnosis scheme that integrates a multi-level patch attention generator(MPAG)and a lesion localization module(LLM).Firstly,MPAGis used to predict patches of different sizes and generate a weighted attention map based on the prediction score and the types of lesions contained in the patches,fully considering the impact of lesion type differences on grading,solving the problem that the attention maps of lesions cannot be further refined and then adapted to the final DR diagnosis task.Secondly,the LLM generates a global attention map based on localization.Finally,the weighted attention map and global attention map are weighted with the fundus map to fully explore effective DR lesion information and increase the attention of the classification network to lesion details.This paper demonstrates the effectiveness of the proposed method through extensive experiments on the public DDR dataset,obtaining an accuracy of 0.8064.
基金funded by the Natural Science Foundation of Fujian Province,China (Grant No.2022J05291)Xiamen Scientific Research Funding for Overseas Chinese Scholars.
文摘Financial time series prediction,whether for classification or regression,has been a heated research topic over the last decade.While traditional machine learning algorithms have experienced mediocre results,deep learning has largely contributed to the elevation of the prediction performance.Currently,the most up-to-date review of advanced machine learning techniques for financial time series prediction is still lacking,making it challenging for finance domain experts and relevant practitioners to determine which model potentially performs better,what techniques and components are involved,and how themodel can be designed and implemented.This review article provides an overview of techniques,components and frameworks for financial time series prediction,with an emphasis on state-of-the-art deep learning models in the literature from2015 to 2023,including standalonemodels like convolutional neural networks(CNN)that are capable of extracting spatial dependencies within data,and long short-term memory(LSTM)that is designed for handling temporal dependencies;and hybrid models integrating CNN,LSTM,attention mechanism(AM)and other techniques.For illustration and comparison purposes,models proposed in recent studies are mapped to relevant elements of a generalized framework comprised of input,output,feature extraction,prediction,and related processes.Among the state-of-the-artmodels,hybrid models like CNNLSTMand CNN-LSTM-AM in general have been reported superior in performance to stand-alone models like the CNN-only model.Some remaining challenges have been discussed,including non-friendliness for finance domain experts,delayed prediction,domain knowledge negligence,lack of standards,and inability of real-time and highfrequency predictions.The principal contributions of this paper are to provide a one-stop guide for both academia and industry to review,compare and summarize technologies and recent advances in this area,to facilitate smooth and informed implementation,and to highlight future research directions.
基金the National Natural Science Foundation of China Youth Project(62302520).
文摘With the increasing proportion of encrypted traffic in cyberspace, the classification of encrypted traffic has becomea core key technology in network supervision. In recent years, many different solutions have emerged in this field.Most methods identify and classify traffic by extracting spatiotemporal characteristics of data flows or byte-levelfeatures of packets. However, due to changes in data transmission mediums, such as fiber optics and satellites,temporal features can exhibit significant variations due to changes in communication links and transmissionquality. Additionally, partial spatial features can change due to reasons like data reordering and retransmission.Faced with these challenges, identifying encrypted traffic solely based on packet byte-level features is significantlydifficult. To address this, we propose a universal packet-level encrypted traffic identification method, ComboPacket. This method utilizes convolutional neural networks to extract deep features of the current packet andits contextual information and employs spatial and channel attention mechanisms to select and locate effectivefeatures. Experimental data shows that Combo Packet can effectively distinguish between encrypted traffic servicecategories (e.g., File Transfer Protocol, FTP, and Peer-to-Peer, P2P) and encrypted traffic application categories (e.g.,BitTorrent and Skype). Validated on the ISCX VPN-non VPN dataset, it achieves classification accuracies of 97.0%and 97.1% for service and application categories, respectively. It also provides shorter training times and higherrecognition speeds. The performance and recognition capabilities of Combo Packet are significantly superior tothe existing classification methods mentioned.
基金National Natural Science Foundation of China(62072392).
文摘Crowdsourcing technology is widely recognized for its effectiveness in task scheduling and resource allocation.While traditional methods for task allocation can help reduce costs and improve efficiency,they may encounter challenges when dealing with abnormal data flow nodes,leading to decreased allocation accuracy and efficiency.To address these issues,this study proposes a novel two-part invalid detection task allocation framework.In the first step,an anomaly detection model is developed using a dynamic self-attentive GAN to identify anomalous data.Compared to the baseline method,the model achieves an approximately 4%increase in the F1 value on the public dataset.In the second step of the framework,task allocation modeling is performed using a twopart graph matching method.This phase introduces a P-queue KM algorithm that implements a more efficient optimization strategy.The allocation efficiency is improved by approximately 23.83%compared to the baseline method.Empirical results confirm the effectiveness of the proposed framework in detecting abnormal data nodes,enhancing allocation precision,and achieving efficient allocation.
基金the Scientific Research Fund of Hunan Provincial Education Department(23A0423).
文摘Remote sensing imagery,due to its high altitude,presents inherent challenges characterized by multiple scales,limited target areas,and intricate backgrounds.These inherent traits often lead to increased miss and false detection rates when applying object recognition algorithms tailored for remote sensing imagery.Additionally,these complexities contribute to inaccuracies in target localization and hinder precise target categorization.This paper addresses these challenges by proposing a solution:The YOLO-MFD model(YOLO-MFD:Remote Sensing Image Object Detection withMulti-scale Fusion Dynamic Head).Before presenting our method,we delve into the prevalent issues faced in remote sensing imagery analysis.Specifically,we emphasize the struggles of existing object recognition algorithms in comprehensively capturing critical image features amidst varying scales and complex backgrounds.To resolve these issues,we introduce a novel approach.First,we propose the implementation of a lightweight multi-scale module called CEF.This module significantly improves the model’s ability to comprehensively capture important image features by merging multi-scale feature information.It effectively addresses the issues of missed detection and mistaken alarms that are common in remote sensing imagery.Second,an additional layer of small target detection heads is added,and a residual link is established with the higher-level feature extraction module in the backbone section.This allows the model to incorporate shallower information,significantly improving the accuracy of target localization in remotely sensed images.Finally,a dynamic head attentionmechanism is introduced.This allows themodel to exhibit greater flexibility and accuracy in recognizing shapes and targets of different sizes.Consequently,the precision of object detection is significantly improved.The trial results show that the YOLO-MFD model shows improvements of 6.3%,3.5%,and 2.5%over the original YOLOv8 model in Precision,map@0.5 and map@0.5:0.95,separately.These results illustrate the clear advantages of the method.
基金supported by the National Natural Science Foundation of China(Grant:62176086).
文摘Traffic flow prediction plays a key role in the construction of intelligent transportation system.However,due to its complex spatio-temporal dependence and its uncertainty,the research becomes very challenging.Most of the existing studies are based on graph neural networks that model traffic flow graphs and try to use fixed graph structure to deal with the relationship between nodes.However,due to the time-varying spatial correlation of the traffic network,there is no fixed node relationship,and these methods cannot effectively integrate the temporal and spatial features.This paper proposes a novel temporal-spatial dynamic graph convolutional network(TSADGCN).The dynamic time warping algorithm(DTW)is introduced to calculate the similarity of traffic flow sequence among network nodes in the time dimension,and the spatiotemporal graph of traffic flow is constructed to capture the spatiotemporal characteristics and dependencies of traffic flow.By combining graph attention network and time attention network,a spatiotemporal convolution block is constructed to capture spatiotemporal characteristics of traffic data.Experiments on open data sets PEMSD4 and PEMSD8 show that TSADGCN has higher prediction accuracy than well-known traffic flow prediction algorithms.
基金supported by the National Key R&D Program of China(Grant Number 2021YFB2700900)the National Natural Science Foundation of China(Grant Numbers 62172232,62172233)the Jiangsu Basic Research Program Natural Science Foundation(Grant Number BK20200039).
文摘Recently,deep image-hiding techniques have attracted considerable attention in covert communication and high-capacity information hiding.However,these approaches have some limitations.For example,a cover image lacks self-adaptability,information leakage,or weak concealment.To address these issues,this study proposes a universal and adaptable image-hiding method.First,a domain attention mechanism is designed by combining the Atrous convolution,which makes better use of the relationship between the secret image domain and the cover image domain.Second,to improve perceived human similarity,perceptual loss is incorporated into the training process.The experimental results are promising,with the proposed method achieving an average pixel discrepancy(APD)of 1.83 and a peak signal-to-noise ratio(PSNR)value of 40.72 dB between the cover and stego images,indicative of its high-quality output.Furthermore,the structural similarity index measure(SSIM)reaches 0.985 while the learned perceptual image patch similarity(LPIPS)remarkably registers at 0.0001.Moreover,self-testing and cross-experiments demonstrate the model’s adaptability and generalization in unknown hidden spaces,making it suitable for diverse computer vision tasks.
基金This research was funded by the General Project of Philosophy and Social Science of Heilongjiang Province,Grant Number:20SHB080.
文摘In recent years,how to efficiently and accurately identify multi-model fake news has become more challenging.First,multi-model data provides more evidence but not all are equally important.Secondly,social structure information has proven to be effective in fake news detection and how to combine it while reducing the noise information is critical.Unfortunately,existing approaches fail to handle these problems.This paper proposes a multi-model fake news detection framework based on Tex-modal Dominance and fusing Multiple Multi-model Cues(TD-MMC),which utilizes three valuable multi-model clues:text-model importance,text-image complementary,and text-image inconsistency.TD-MMC is dominated by textural content and assisted by image information while using social network information to enhance text representation.To reduce the irrelevant social structure’s information interference,we use a unidirectional cross-modal attention mechanism to selectively learn the social structure’s features.A cross-modal attention mechanism is adopted to obtain text-image cross-modal features while retaining textual features to reduce the loss of important information.In addition,TD-MMC employs a new multi-model loss to improve the model’s generalization ability.Extensive experiments have been conducted on two public real-world English and Chinese datasets,and the results show that our proposed model outperforms the state-of-the-art methods on classification evaluation metrics.
基金the National Natural Science Foundation of China under Grant No.62272087Science and Technology Planning Project of Sichuan Province under Grant No.2023YFG0161.
文摘Long-term urban traffic flow prediction is an important task in the field of intelligent transportation,as it can help optimize traffic management and improve travel efficiency.To improve prediction accuracy,a crucial issue is how to model spatiotemporal dependency in urban traffic data.In recent years,many studies have adopted spatiotemporal neural networks to extract key information from traffic data.However,most models ignore the semantic spatial similarity between long-distance areas when mining spatial dependency.They also ignore the impact of predicted time steps on the next unpredicted time step for making long-term predictions.Moreover,these models lack a comprehensive data embedding process to represent complex spatiotemporal dependency.This paper proposes a multi-scale persistent spatiotemporal transformer(MSPSTT)model to perform accurate long-term traffic flow prediction in cities.MSPSTT adopts an encoder-decoder structure and incorporates temporal,periodic,and spatial features to fully embed urban traffic data to address these issues.The model consists of a spatiotemporal encoder and a spatiotemporal decoder,which rely on temporal,geospatial,and semantic space multi-head attention modules to dynamically extract temporal,geospatial,and semantic characteristics.The spatiotemporal decoder combines the context information provided by the encoder,integrates the predicted time step information,and is iteratively updated to learn the correlation between different time steps in the broader time range to improve the model’s accuracy for long-term prediction.Experiments on four public transportation datasets demonstrate that MSPSTT outperforms the existing models by up to 9.5%on three common metrics.
基金Supported by the National Natural Science Foundation,China(No.61402011)the Open Project Program of the Key Laboratory of Embedded System and Service Computing of Ministry of Education(No.ESSCKF2021-05).
文摘Predictive Business Process Monitoring(PBPM)is a significant research area in Business Process Management(BPM)aimed at accurately forecasting future behavioral events.At present,deep learning methods are widely cited in PBPM research,but no method has been effective in fusing data information into the control flow for multi-perspective process prediction.Therefore,this paper proposes a process prediction method based on the hierarchical BERT and multi-perspective data fusion.Firstly,the first layer BERT network learns the correlations between different category attribute data.Then,the attribute data is integrated into a weighted event-level feature vector and input into the second layer BERT network to learn the impact and priority relationship of each event on future predicted events.Next,the multi-head attention mechanism within the framework is visualized for analysis,helping to understand the decision-making logic of the framework and providing visual predictions.Finally,experimental results show that the predictive accuracy of the framework surpasses the current state-of-the-art research methods and significantly enhances the predictive performance of BPM.
基金The work was supported by the National Natural Science Foundation of China(61972062,62306060)the Basic Research Project of Liaoning Province(2023JH2/101300191)+1 种基金the Liaoning Doctoral Research Start-Up Fund Project(2023-BS-078)the Dalian Academy of Social Sciences(2023dlsky028).
文摘Aiming at the challenges associated with the absence of a labeled dataset for Yi characters and the complexity of Yi character detection and recognition,we present a deep learning-based approach for Yi character detection and recognition.In the detection stage,an improved Differentiable Binarization Network(DBNet)framework is introduced to detect Yi characters,in which the Omni-dimensional Dynamic Convolution(ODConv)is combined with the ResNet-18 feature extraction module to obtain multi-dimensional complementary features,thereby improving the accuracy of Yi character detection.Then,the feature pyramid network fusion module is used to further extract Yi character image features,improving target recognition at different scales.Further,the previously generated feature map is passed through a head network to produce two maps:a probability map and an adaptive threshold map of the same size as the original map.These maps are then subjected to a differentiable binarization process,resulting in an approximate binarization map.This map helps to identify the boundaries of the text boxes.Finally,the text detection box is generated after the post-processing stage.In the recognition stage,an improved lightweight MobileNetV3 framework is used to recognize the detect character regions,where the original Squeeze-and-Excitation(SE)block is replaced by the efficient Shuffle Attention(SA)that integrates spatial and channel attention,improving the accuracy of Yi characters recognition.Meanwhile,the use of depth separable convolution and reversible residual structure can reduce the number of parameters and computation of the model,so that the model can better understand the contextual information and improve the accuracy of text recognition.The experimental results illustrate that the proposed method achieves good results in detecting and recognizing Yi characters,with detection and recognition accuracy rates of 97.5%and 96.8%,respectively.And also,we have compared the detection and recognition algorithms proposed in this paper with other typical algorithms.In these comparisons,the proposed model achieves better detection and recognition results with a certain reliability.