Funding: Supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Grant No. 3418].
Abstract: Aspect-based sentiment analysis aims to detect and classify sentiment polarities as negative, positive, or neutral while associating them with their identified aspects in the corresponding context. Prior methodologies widely utilize either word embeddings or tree-based representations, and the separate use of these deep features has become a significant cause of information loss. Word embeddings preserve the syntactic and semantic relations between terms in a sentence, while tree-based structures conserve the grammatical and logical dependencies of the context. In addition, the position of a word within a sentence is a critical factor influencing the contextual information of the targeted sentence, so position-oriented information is also considered significant. In this study, we combine word embeddings, tree-based representations, and contextual position information to evaluate whether their combination improves effectiveness. Their joint utilization enhances the accurate identification and extraction of targeted aspect terms, which in turn benefits the classification process. We propose a method named Attention-Based Multi-Channel Convolutional Neural Network (Att-MC-CNN) that jointly utilizes these three deep features. The three feature channels are delivered to a Multi-Channel Convolutional Neural Network (MC-CNN) that identifies and extracts potential terms and classifies their polarities. These terms are further filtered with an attention mechanism, which determines the most significant words. Empirical analysis on standard datasets demonstrates the proposed approach's effectiveness compared to existing techniques: it achieves an overall F1 measure of 94% in aspect identification and 92% in sentiment classification.
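As an illustrative sketch only (the paper's Att-MC-CNN internals are not reproduced here), the final attention step that "determines the most significant words" can be pictured as softmax weighting over per-term relevance scores. The term names and scores below are invented for demonstration:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(term_scores):
    """Rank candidate aspect terms by softmax attention weight."""
    terms = [t for t, _ in term_scores]
    weights = softmax([s for _, s in term_scores])
    return sorted(zip(terms, weights), key=lambda tw: tw[1], reverse=True)

# Hypothetical relevance scores a CNN channel might emit per token.
ranked = attend([("battery", 2.0), ("screen", 0.5), ("the", -1.0)])
```

High-scoring content words end up with most of the attention mass, while function words such as "the" are effectively filtered out.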
Abstract: One of the issues in Computer Vision is the automatic generation of descriptions for images, commonly known as image captioning. Deep Learning techniques have made significant progress in this area. The typical architecture of an image captioning system consists mainly of an image feature extractor subsystem followed by a caption-generation lingual subsystem. This paper aims to find optimized models for these two subsystems. For the image feature extraction subsystem, the research tested eight different concatenations of pairs of vision models to determine which yields the most expressive extracted feature vector for the image. For the caption-generation lingual subsystem, the paper tested three different pre-trained language embedding models: GloVe (Global Vectors for Word Representation), BERT (Bidirectional Encoder Representations from Transformers), and TaCL (Token-aware Contrastive Learning), to select the most accurate one. Our experiments showed that an image captioning system using a concatenation of the two Transformer-based models Swin (Shifted Window) and PVT (Pyramid Vision Transformer) as the image feature extractor, combined with the TaCL language embedding model, yields the best results among the tested combinations.
Abstract: One of the critical hurdles, and breakthroughs, in the field of Natural Language Processing (NLP) in the last two decades has been the development of techniques for text representation that solve the so-called curse of dimensionality, a problem which plagues NLP in general given that the feature set for learning starts as a function of the size of the language in question, typically upwards of hundreds of thousands of terms. As such, much of the research and development in NLP in the last two decades has been devoted to finding and optimizing solutions to this problem, that is, to feature selection in NLP. This paper looks at the development of these various techniques, which leverage a variety of statistical methods resting on linguistic theories advanced in the middle of the last century, namely the distributional hypothesis, which suggests that words found in similar contexts generally have similar meanings. In this survey paper we look at the development of some of the most popular of these techniques from a mathematical as well as a data-structure perspective, from Latent Semantic Analysis to Vector Space Models to their more modern variants, typically referred to as word embeddings. In this review of algorithms such as Word2Vec, GloVe, ELMo, and BERT, we explore the idea of semantic spaces more generally, beyond their applicability to NLP.
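The distributional hypothesis can be made concrete with a toy example: build co-occurrence count vectors from a tiny corpus and compare words by cosine similarity. This is a minimal sketch under invented data, not any specific algorithm from the survey:

```python
from collections import Counter
import math

def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, the words appearing within +/- window positions."""
    vecs = {}
    for sent in sentences:
        for i, w in enumerate(sent):
            ctx = vecs.setdefault(w, Counter())
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    ctx[sent[j]] += 1
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases the dog".split(),
]
vecs = cooccurrence_vectors(corpus)
```

Because "cat" and "dog" occur in similar contexts, their count vectors are far more similar than those of "cat" and "milk", which is exactly the effect that LSA and word embeddings exploit at scale.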
Funding: Supported by the National Natural Science Foundation of China (61903345) and the Knowledge Innovation Program of Wuhan-Shuguang Project (2022010801020208).
Abstract: Dear Editor, This letter proposes a new pattern matching method based on word embedding and dynamic time warping (DTW) to identify groups of similar alarm floods. First, alarm messages are transformed into numeric values that represent alarms and also reflect the relationships between alarm occurrences. Then, similarities between the numerically encoded alarm flood sequences are calculated by DTW, and groups of similar floods are identified via clustering. The effectiveness of the proposed method is demonstrated by a case study with alarm & event data obtained from a public industrial simulation model.
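The DTW step can be sketched as a standard dynamic-programming recurrence. The snippet below computes the DTW distance between two plain numeric sequences; the word-embedding encoding of alarms described in the letter is assumed to have happened already:

```python
def dtw(a, b, cost=lambda x, y: abs(x - y)):
    """Dynamic time warping distance between two numeric sequences.

    D[i][j] holds the minimal accumulated cost of aligning the first
    i elements of a with the first j elements of b.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(a[i - 1], b[j - 1])
            # Allowed moves: insertion, deletion, or match (diagonal).
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Warping lets a repeated value absorb at no extra cost, so `dtw([1, 2, 3], [1, 2, 2, 3])` is 0 even though the sequences have different lengths, which is why DTW suits alarm floods whose events unfold at varying speeds.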
Funding: Supported by the National Natural Science Foundation of China (61771051, 61675025).
Abstract: Two learning models, Zolu-continuous bags of words (ZL-CBOW) and Zolu-skip-grams (ZL-SG), based on the Zolu function are proposed. The slope of ReLU in word2vec is changed by the Zolu function. The proposed models can process extremely large data sets, as word2vec does, without increased complexity. The models also outperform several word embedding methods in both word similarity and syntactic accuracy. ZL-CBOW outperforms CBOW in accuracy by 8.43% on the capital-world training set and by 1.24% on the plural-verbs training set. Moreover, experimental simulations on word similarity and syntactic accuracy show that ZL-CBOW and ZL-SG are superior to LL-CBOW and LL-SG, respectively.
Funding: National Natural Science Foundation of China, Grant/Award Number: 61562054; the Fund of China Scholarship Council, Grant/Award Number: 201908530036; Talents Introduction Project of Guangxi University for Nationalities, Grant/Award Number: 2014MDQD020.
Abstract: Word embedding has been widely used in word sense disambiguation (WSD) and many other tasks in recent years because it can effectively represent the semantics of words. However, existing word embedding methods mostly represent each word as a single vector, without considering the homonymy and polysemy of the word; thus, their performance is limited. To address this problem, an effective topical word embedding (TWE)-based WSD method, named TWE-WSD, is proposed, which integrates Latent Dirichlet Allocation (LDA) and word embedding. Instead of generating a single word vector (WV) for each word, TWE-WSD generates a topical WV for each word under each topic. Effective integration strategies are designed to obtain high-quality contextual vectors. Extensive experiments on the SemEval-2013 and SemEval-2015 English all-words tasks show that TWE-WSD outperforms other state-of-the-art WSD methods, especially on nouns.
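A minimal sketch of disambiguation with per-topic word vectors (with invented two-dimensional vectors, not TWE-WSD's actual integration strategies): pick the sense whose topical vector is closest, by cosine similarity, to a vector summarizing the context:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical topical vectors: one embedding per (word, topic) pair.
topical = {
    ("bank", "finance"):   (0.9, 0.1),
    ("bank", "geography"): (0.1, 0.9),
}

def disambiguate(word, context_vec, topical_vecs):
    """Pick the topic whose topical word vector best matches the context."""
    senses = [(topic, cosine(vec, context_vec))
              for (w, topic), vec in topical_vecs.items() if w == word]
    return max(senses, key=lambda s: s[1])[0]

fin_sense = disambiguate("bank", (0.95, 0.05), topical)
geo_sense = disambiguate("bank", (0.0, 1.0), topical)
```

The same surface word resolves to different senses depending on the context vector, which is precisely what a single-vector embedding cannot express.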
Funding: National Statistical Science Research Project of China, Grant/Award Number: 2016LY98; Science and Technology Department of Guangdong Province, China, Grant/Award Numbers: 2016A010101020, 2016A010101021, 2016A010101022; Characteristic Innovation Projects of Guangdong Colleges and Universities, Grant/Award Numbers: 2018KTSCX049, 2018GKTSCX069; Science and Technology Plan Project of Guangzhou, Grant/Award Numbers: 201802010033, 201903010013; Bidding Project of the Laboratory of Language Engineering and Computing of Guangdong University of Foreign Studies, Grant/Award Number: LEC2019ZBKT005.
Abstract: Sentiment word embedding has been extensively studied and used in sentiment analysis tasks. However, most existing models fail to differentiate high-frequency and low-frequency words. Accordingly, the sentiment information of low-frequency words is insufficiently captured, resulting in inaccurate sentiment word embeddings and degraded overall performance of sentiment analysis. A Bayesian estimation-based sentiment word embedding (BESWE) model, which aims to precisely extract the sentiment information of low-frequency words, is proposed. In the model, a Bayesian estimator is constructed from the co-occurrence probabilities and sentiment probabilities of words, and a novel loss function is defined for sentiment word embedding learning. Experimental results on sentiment lexicons and the Movie Review dataset show that BESWE outperforms many state-of-the-art methods, for example, C&W, CBOW, GloVe, SE-HyRank, and DLJT1, in sentiment analysis tasks, which demonstrates that Bayesian estimation can effectively capture the sentiment information of low-frequency words and integrate it into the word embedding through the loss function. In addition, replacing the embeddings of low-frequency words in the state-of-the-art methods with BESWE embeddings significantly improves the performance of those methods in sentiment analysis tasks.
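The core idea of stabilizing sentiment estimates for low-frequency words can be illustrated with a simple Beta-prior posterior mean. This is a hedged stand-in for, not a reproduction of, BESWE's estimator and loss function:

```python
def sentiment_estimate(pos_count, neg_count, alpha=1.0, beta=1.0):
    """Posterior-mean estimate of P(positive | word) under a Beta(alpha, beta)
    prior: rare-word estimates are shrunk toward the prior mean instead of
    trusting a handful of raw counts."""
    return (pos_count + alpha) / (pos_count + neg_count + alpha + beta)

# A low-frequency word seen once in a positive context is not declared
# fully positive; a frequent word's estimate stays close to its raw rate.
rare = sentiment_estimate(1, 0)        # 2/3 rather than 1.0
frequent = sentiment_estimate(90, 10)  # close to 0.9
```

This is the same mechanism (in miniature) by which Bayesian estimation keeps one noisy observation of a low-frequency word from dominating its sentiment signal.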
Abstract: As a key technology for rapid and low-cost drug development, drug repositioning is gaining popularity. In this study, a text mining approach to the discovery of unknown drug-disease relations was tested. Using a word embedding algorithm, the senses of over 1.7 million words were well represented in sufficiently short feature vectors. The feasibility of our approach was tested through various analyses, including clustering and classification. Finally, our trained classification model achieved 87.6% accuracy in predicting drug-disease relations in cancer treatment and succeeded in discovering novel drug-disease relations that were actually reported in recent studies.
Abstract: The statute recommendation problem is a subproblem of automated decision systems, which can help legal staff process cases in an intelligent and automated way. In this paper, an improved common-word similarity algorithm is proposed for normalization. The word mover's distance (WMD) algorithm is applied to similarity measurement and statute recommendation, extending a problem setting originally used for classification. Finally, several recommendation strategies that differ from traditional collaborative filtering methods are proposed. The experimental results show that the approach achieves a best F-measure of 0.799, and a comparative experiment shows that the WMD algorithm achieves better results than the TF-IDF and LDA algorithms.
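Full WMD solves an optimal-transport problem; a common cheap lower bound, the relaxed WMD, lets each word in one document travel only to its nearest embedded word in the other. The sketch below uses invented 2-D embeddings and uniform word weights, not the paper's trained vectors:

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def relaxed_wmd(doc1, doc2, emb):
    """Relaxed word mover's distance: each word of doc1 moves, at full
    weight, to its nearest embedded word in doc2 (a lower bound on WMD)."""
    return sum(
        min(euclidean(emb[w1], emb[w2]) for w2 in doc2)
        for w1 in doc1
    ) / len(doc1)

# Toy embeddings standing in for trained word vectors (assumed values).
emb = {
    "court":    (1.0, 0.0),
    "tribunal": (0.9, 0.1),
    "statute":  (0.0, 1.0),
    "law":      (0.1, 0.9),
    "banana":   (5.0, 5.0),
}
near = relaxed_wmd(["court", "statute"], ["tribunal", "law"], emb)
far = relaxed_wmd(["court", "statute"], ["banana"], emb)
```

Documents using different words with close embeddings ("court" vs. "tribunal") score as near, which is what lets WMD beat bag-of-words measures like TF-IDF on paraphrased legal text.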
Funding: Supported by the Key Area R&D Program of Guangdong Province (Grant No. 2022B0701180001), the National Natural Science Foundation of China (Grant No. 61801127), the Science and Technology Planning Project of Guangdong Province, China (Grant Nos. 2019B010140002 and 2020B111110002), and the Guangdong-Hong Kong-Macao Joint Innovation Field Project (Grant No. 2021A0505080006).
Abstract: A novel image encryption scheme based on parallel compressive sensing and edge detection embedding technology is proposed to improve visual security. Firstly, the plain image is sparsely represented using the discrete wavelet transform. Then, the coefficient matrix is scrambled and compressed to obtain a size-reduced image using the Fisher-Yates shuffle and parallel compressive sensing. Subsequently, to increase the security of the proposed algorithm, the compressed image is re-encrypted through permutation and diffusion to obtain a noise-like secret image. Finally, an adaptive embedding method based on edge detection for different carrier images is proposed to generate a visually meaningful cipher image. To improve the plaintext sensitivity of the algorithm, the counter mode is combined with a hash function to generate keys for the chaotic systems. Additionally, an effective permutation method is designed to scramble the pixels of the compressed image in the re-encryption stage. The simulation results and analyses demonstrate that the proposed algorithm performs well in terms of visual security and decryption quality.
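The Fisher-Yates scrambling step can be sketched as a key-driven, invertible pixel permutation. This toy version uses Python's seeded PRNG in place of the chaotic key streams the paper derives from the hash-based counter mode:

```python
import random

def permutation(n, key):
    """Key-driven Fisher-Yates permutation of indices 0..n-1."""
    idx = list(range(n))
    rng = random.Random(key)  # stand-in for a chaotic key stream
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)
        idx[i], idx[j] = idx[j], idx[i]
    return idx

def scramble(pixels, key):
    perm = permutation(len(pixels), key)
    return [pixels[p] for p in perm]

def unscramble(pixels, key):
    """Invert scramble() by regenerating the same permutation from the key."""
    perm = permutation(len(pixels), key)
    out = [None] * len(pixels)
    for i, p in enumerate(perm):
        out[p] = pixels[i]
    return out

data = list(range(16))  # a flattened 4x4 "image"
enc = scramble(data, key=42)
dec = unscramble(enc, key=42)
```

Because the permutation is regenerated deterministically from the key, the receiver recovers the original pixel order without transmitting the permutation itself.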
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 62061014) and the Natural Science Foundation of Liaoning Province of China (Grant No. 2020-MS-274).
Abstract: Security during remote transmission has been an important concern for researchers in recent years. In this paper, a hierarchical multi-image encryption scheme for users with different security levels is designed, adopting a multi-image encryption (MIE) algorithm with row-and-column confusion and closed-loop bi-directional diffusion. While ensuring secure communication of medical image information, users with different security levels hold different levels of decryption keys, and differentiated visual effects can be obtained by exploiting the strong sensitivity of chaotic keys. The highest security level can obtain decrypted images without watermarks, while patient information and copyright attribution can be verified from the extracted watermark images. The experimental results show that the scheme is sufficiently secure as an MIE scheme with differentiated visual effects, and its encryption and decryption efficiency is significantly improved compared to other works.
Abstract: [Purpose/Significance] Against the background of the rapid development and profound transformation of artificial intelligence technologies and applications, new research topics and methods continually emerge in machine learning, and deep learning and reinforcement learning continue to advance. It is therefore necessary to explore the evolution of machine learning research topics across fields and to identify hot and emerging topics. [Method/Process] Taking machine learning research papers in library and information science from the Web of Science database (2011-2022) as an example, this paper combines LDA and Word2vec for topic modeling and topic evolution analysis, and introduces topic strength, topic influence, topic attention, and topic novelty indicators to identify hot topics and emerging hot topics. [Results/Conclusions] The results show that: (1) combining Word2vec's semantic processing capability with LDA's topic evolution capability identifies research topics more accurately and visually reveals their staged evolution; (2) machine learning research topics in library and information science fall into three broad categories: natural language processing and text analysis, data mining and analysis, and information and knowledge services, with strong associations among topics and co-evolving topic relationships; (3) the designed topic strength, topic influence, and topic attention indicators, together with a composite indicator, effectively identify hot topics in the three periods 2011-2014, 2015-2018, and 2019-2022.
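A topic-strength indicator of the kind described can be sketched as the average topic weight per time window. The documents, topics, weights, and periods below are invented for illustration, not taken from the study:

```python
from collections import defaultdict

def topic_strength(doc_topics, periods):
    """Average per-period topic weight: a simple 'topic strength'
    indicator for spotting hot topics in each time window."""
    acc = defaultdict(lambda: defaultdict(list))
    for year, weights in doc_topics:
        for lo, hi in periods:
            if lo <= year <= hi:
                for topic, w in weights.items():
                    acc[(lo, hi)][topic].append(w)
    return {p: {t: sum(ws) / len(ws) for t, ws in topics.items()}
            for p, topics in acc.items()}

# Hypothetical per-document topic weights from an LDA-style model.
docs = [
    (2012, {"nlp": 0.7, "mining": 0.3}),
    (2013, {"nlp": 0.5, "mining": 0.5}),
    (2020, {"nlp": 0.2, "mining": 0.8}),
]
strength = topic_strength(docs, [(2011, 2014), (2019, 2022)])
```

Comparing the per-period averages shows which topics dominate each window, the basic signal that the study's richer strength, influence, and attention indicators build on.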