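Abstract: [Purpose/Significance] As artificial intelligence technologies and applications develop rapidly and undergo profound change, new research topics and methods keep emerging in machine learning, and deep learning and reinforcement learning continue to advance. It is therefore necessary to trace how machine learning research topics evolve in different fields and to identify hot and emerging topics. [Method/Process] Using machine learning research papers in library and information science (LIS) indexed in the Web of Science database from 2011 to 2022 as a case, this study combines LDA and Word2vec for topic modeling and topic evolution analysis, and introduces topic strength, topic influence, topic attention, and topic novelty indicators to identify hot topics and emerging hot topics. [Results/Conclusions] The results show that (1) combining the semantic processing capability of Word2vec with the topic evolution capability of LDA identifies research topics more accurately and visually presents how they evolve stage by stage; (2) machine learning research topics in LIS fall into three broad categories: natural language processing and text analysis, data mining and analysis, and information and knowledge services, and topics across these categories are strongly interrelated and evolve in a linked manner; (3) the proposed topic strength, topic influence, and topic attention indicators, together with a composite indicator, effectively identify the hot topics of three periods: 2011-2014, 2015-2018, and 2019-2022.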
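The abstract names the indicators but does not define them. As a minimal sketch, assuming topic strength is the mean LDA document-topic probability of a topic over the documents of a period, and assuming topics in adjacent periods are linked by the cosine similarity of Word2vec-based topic vectors (each topic embedded as the probability-weighted mean of its top words' vectors), the two building blocks could look like this. The names `topic_strength`, `topic_vector`, and `cosine` are hypothetical, and the inputs are assumed to come from an already fitted LDA model and Word2vec model.

```python
import math
from collections import defaultdict


def topic_strength(doc_topics, doc_periods):
    """Assumed definition: mean document-topic probability of each topic per period.

    doc_topics[d][k] is theta_{d,k} from a fitted LDA model;
    doc_periods[d] is the period label of document d.
    Returns {period: [strength of topic k for each k]}.
    """
    sums = {}
    counts = defaultdict(int)
    for theta, period in zip(doc_topics, doc_periods):
        if period not in sums:
            sums[period] = [0.0] * len(theta)
        for k, p in enumerate(theta):
            sums[period][k] += p
        counts[period] += 1
    return {t: [s / counts[t] for s in vec] for t, vec in sums.items()}


def topic_vector(top_words, word_vectors):
    """Embed a topic as the probability-weighted mean of its top words' Word2vec vectors.

    top_words: list of (word, probability) pairs from one LDA topic.
    Words missing from word_vectors are skipped.
    """
    dim = len(next(iter(word_vectors.values())))
    acc = [0.0] * dim
    total = 0.0
    for word, weight in top_words:
        if word in word_vectors:
            for i, x in enumerate(word_vectors[word]):
                acc[i] += weight * x
            total += weight
    return [x / total for x in acc] if total else acc


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

A topic in one period could then be linked to its most similar topic in the next period when `cosine` exceeds a chosen threshold; that threshold, like the rest of the sketch, is an illustrative assumption rather than the authors' setting.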
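Abstract: Safety is the central concern of the civil aviation industry. Because the analysis of unplanned events in civil aviation currently relies heavily on expert experience and is inefficient, this paper proposes an analysis method for civil aviation unplanned events that combines Word2vec with a bidirectional long short-term memory (BiLSTM) neural network. First, a Word2vec model is trained on the event text corpus to obtain word vectors, reducing the dimensionality of the vector space; next, the BiLSTM model automatically extracts features, capturing the complete sequence information and contextual feature vectors of each event text; finally, a softmax function classifies the unplanned events. Experimental results show that the proposed method classifies more effectively, achieving higher accuracy and F1 scores, and performs stably on imbalanced data, demonstrating its applicability and effectiveness for analyzing civil aviation unplanned events.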
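The final steps of this pipeline, concatenating the forward and backward LSTM states and classifying them with softmax, can be sketched in plain Python. The sketch assumes a single linear layer between the BiLSTM output and the softmax; `classify`, `softmax`, and the toy weights are hypothetical. Because the abstract stresses imbalanced data, a macro-averaged F1 helper is included too; the abstract does not say which F1 averaging the authors used, so macro-averaging is only one option.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h_forward, h_backward, weights, bias):
    """Hypothetical BiLSTM classification head.

    h_forward / h_backward: final hidden states of the two LSTM directions.
    weights: one row per event class, each of length len(h_forward) + len(h_backward).
    bias: one value per class.
    Returns (predicted class index, class probabilities).
    """
    h = h_forward + h_backward  # list concatenation of the two directions
    logits = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, bias)]
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__), probs


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN), so rare classes count equally."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

On a skewed test set, a classifier that always predicts the majority class can score high accuracy but a low macro-F1, which is why the two metrics are usually reported together.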
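Abstract: Weibo, a popular social media platform, hosts a large number of highly subjective user comments. To mine the latent information in Weibo comments, and to address the loss of semantic information and the heavy reliance on manual annotation in traditional sentiment analysis models, this paper proposes a deep learning sentiment analysis model based on LSTM and Word2vec. The continuous bag-of-words (CBOW) model of Word2vec uses the contextual structure and semantic relations of the surrounding text to map each word into a vector space, producing dense word vectors; a long short-term memory (LSTM) network then processes the contextual sequence of the text in order and outputs the classification prediction. The model reaches an accuracy of 95.9%, while in comparative experiments a sentiment lexicon, an RNN, and an SVM reach 52.3%, 92.7%, and 85.7%, respectively. The LSTM+Word2vec model is therefore more accurate and shows a degree of robustness and generalization, which makes it useful for personalized content recommendation and online public opinion monitoring.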
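The CBOW model described above predicts each word from its surrounding context. A minimal sketch of the two pieces specific to CBOW, generating (context, target) training pairs and averaging the context embeddings into the hidden representation, follows. It assumes the Weibo comments have already been segmented into words, and `cbow_pairs` and `context_vector` are hypothetical names; a full model would also need the output layer and a training loop (for example with negative sampling), which are omitted.

```python
def cbow_pairs(tokens, window=2):
    """Build (context_words, target_word) pairs for CBOW training.

    The context is up to `window` words on each side of the target.
    """
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:
            pairs.append((context, target))
    return pairs


def context_vector(context, embeddings):
    """CBOW hidden layer: element-wise average of the context words' input embeddings."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for w in context:
        for j, x in enumerate(embeddings[w]):
            vec[j] += x
    return [x / len(context) for x in vec]


# Usage on a toy, pre-segmented sentence.
pairs = cbow_pairs(["a", "b", "c"], window=1)
```

For example, with `window=1` the middle token `"b"` is predicted from `["a", "c"]`, and its hidden vector is the mean of those two embeddings.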
Funding: supported by the National Natural Science Foundation of China under Grant Nos. 61672195 and 61872372, the Open Foundation of State Key Laboratory of Cryptology under Grant No. MMKFKT201617, and the National University of Defense Technology under Grant No. ZK19-38.
Abstract: The Internet of Things (IoT) is now widely deployed and brings great opportunities to change people's daily lives. To realize more effective human-computer interaction in IoT applications, the Question Answering (QA) systems embedded in IoT services need a better ability to understand natural language. Distributed word representations, which carry more semantic and syntactic information, therefore play an increasingly important role in QA systems. However, learning high-quality distributed word vectors requires substantial storage and computing resources, so it cannot be deployed on resource-constrained IoT devices. Outsourcing the data and computation to cloud servers is a good option, but uploading private data directly to an untrusted cloud creates privacy risks. Realizing word vector learning over untrusted cloud servers without privacy leakage is therefore an urgent and challenging task. In this paper, we present a novel, efficient word vector learning scheme over encrypted data. We first design a series of arithmetic computation protocols. We then use two non-colluding cloud servers to learn high-quality word vectors over encrypted data. The proposed scheme allows word vectors to be trained on remote cloud servers while protecting privacy. Security analysis and experiments on real data sets demonstrate that our scheme is more secure and efficient than existing privacy-preserving word vector learning schemes.
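The abstract does not describe the authors' arithmetic protocols. For orientation only, the sketch below shows a standard building block for two non-colluding servers, two-party additive secret sharing with Beaver-triple multiplication; it is not the paper's scheme. It assumes integers in the ring Z_{2^32} and a trusted dealer that supplies the multiplication triples, and every function name is hypothetical.

```python
import secrets

MOD = 2 ** 32  # assumed ring Z_{2^32}; real schemes choose their own modulus


def share(x):
    """Split x into two additive shares; each share alone is uniformly random."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD


def reconstruct(s0, s1):
    return (s0 + s1) % MOD


def add_shares(x_sh, y_sh):
    """Addition is local: each server adds its own two shares, with no communication."""
    return tuple((a + b) % MOD for a, b in zip(x_sh, y_sh))


def dealer_triple():
    """Hypothetical trusted dealer: shares of random a, b and of c = a * b."""
    a = secrets.randbelow(MOD)
    b = secrets.randbelow(MOD)
    return share(a), share(b), share((a * b) % MOD)


def mul_shares(x_sh, y_sh, triple):
    """Beaver multiplication of shared x and y.

    The servers open e = x - a and f = y - b; these reveal nothing about x and y
    because a and b are uniformly random masks. Then
    x*y = c + e*b + f*a + e*f, and each server computes its share of that sum.
    """
    (a0, a1), (b0, b1), (c0, c1) = triple
    e = reconstruct((x_sh[0] - a0) % MOD, (x_sh[1] - a1) % MOD)
    f = reconstruct((y_sh[0] - b0) % MOD, (y_sh[1] - b1) % MOD)
    z0 = (c0 + e * b0 + f * a0 + e * f) % MOD  # only server 0 adds the public e*f term
    z1 = (c1 + e * b1 + f * a1) % MOD
    return z0, z1


# Usage: the data owner shares two values, the servers multiply them blindly.
x_sh, y_sh = share(7), share(6)
product_sh = mul_shares(x_sh, y_sh, dealer_triple())
```

Real-valued word vectors would first need a fixed-point encoding (scaling by a power of two and truncating after each multiplication), which this sketch leaves out.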
Funding: supported in part by the Guangzhou Science and Technology Plan Project under Grants 2024B03J1361, 2023B03J1327, and 2023A04J0361; in part by the Open Fund Project of Hubei Province Key Laboratory of Occupational Hazard Identification and Control under Grant OHIC2023Y10; in part by the Guangdong Province Ordinary Colleges and Universities Young Innovative Talents Project under Grant 2023KQNCX036; in part by the Special Fund for Science and Technology Innovation Strategy of Guangdong Province (Climbing Plan) under Grant pdjh2024a226; in part by the Key Discipline Improvement Project of Guangdong Province under Grant 2022ZDJS015; and in part by the Research Fund of Guangdong Polytechnic Normal University under Grants 22GPNUZDJS17 and 2022SDKYA015.
Abstract: With the accelerating pace of daily life and the development of e-commerce, online shopping has become a mainstream way for consumers to access products and services. To understand consumers' emotional expressions across different shopping experience scenarios, this paper presents a sentiment analysis method that combines images generated from e-commerce review keywords with a hybrid machine learning-based model, in which Word2Vec-TextRank extracts keywords that serve as inputs for generating related images with generative Artificial Intelligence (AI). A hybrid Convolutional Neural Network and Support Vector Machine (CNN-SVM) model is then applied to classify the sentiment of those keyword-generated images. For method validation, a random sample of 5000 Amazon reviews was analyzed. Supported by its keyword extraction capability, the proposed method reaches a sentiment classification accuracy of up to 97.13%. This result demonstrates the advantages of the text-to-image approach, which offers a distinctive perspective on sentiment analysis of e-commerce review data compared with existing works. The proposed method can thus enhance the reliability and insight of customer feedback surveys and may open a new direction for similar tasks, such as social media monitoring and market trend research.
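One common reading of Word2Vec-TextRank, assumed here rather than taken from the paper, is weighted TextRank in which the edge weight between two candidate words is the cosine similarity of their Word2vec vectors. The minimal sketch below uses the damping factor d = 0.85 from the original TextRank formulation, keeps only positive similarities as edges, and assumes the vectors come from a trained Word2vec model; `textrank_keywords` is a hypothetical function.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def textrank_keywords(words, vectors, top_n=3, d=0.85, iters=50, min_sim=0.0):
    """Rank candidate words by weighted TextRank over a Word2vec similarity graph.

    Score update (weighted PageRank):
    WS(i) = (1 - d) + d * sum_j [ w_ji / sum_k w_jk ] * WS(j)
    """
    # Deduplicate while keeping order, and drop words without a vector.
    words = [w for w in dict.fromkeys(words) if w in vectors]
    n = len(words)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                s = cosine(vectors[words[i]], vectors[words[j]])
                if s > min_sim:  # keep only sufficiently similar pairs as edges
                    weights[i][j] = s
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iters):
        # The right-hand side uses the previous iteration's scores.
        scores = [
            (1 - d) + d * sum(weights[j][i] / out_sum[j] * scores[j]
                              for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [words[i] for i in ranked[:top_n]]


# Usage with toy 2-D vectors: "box" is only weakly similar to the other words.
toy_vectors = {"battery": [1.0, 0.0], "charge": [0.9, 0.1], "power": [0.8, 0.2], "box": [0.0, 1.0]}
top = textrank_keywords(["battery", "charge", "power", "box", "battery", "unknown"], toy_vectors)
```

Words that are semantically close to many other words in the review accumulate higher scores, which is the intuition behind using embeddings instead of raw co-occurrence windows.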
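Abstract: Using WordSmith 8.0, this article analyzes the keywords and distinctive word clusters in Alice Walker's novel The Color Purple, reveals the novel's overall lexical distribution, and shows that its vocabulary and sentence patterns fit the protagonist's identity as an African American woman. Searches in WordSmith 8.0 show that the keywords and cluster collocations in Walker's novel play an important role in advancing the plot and shaping the characters. The results indicate that corpus stylistics helps scholars uncover deeper textual meanings overlooked in earlier studies: it re-verifies the findings of previous qualitative literary studies of The Color Purple, productively combines qualitative and quantitative research, and answers the scholarly call for "rereading the classics".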
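Keyword lists in corpus tools are usually ranked by a keyness statistic that compares a word's frequency in the text under study with its frequency in a reference corpus. The abstract does not say which statistic was used; the sketch below assumes Dunning's log-likelihood (G2), one of the measures commonly used for keyness, and `log_likelihood` and `keywords` are hypothetical helpers, not WordSmith's implementation.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness (G2) of one word.

    Expected frequencies assume the word is equally common in both corpora:
    E_study = size_study * (freq_study + freq_ref) / (size_study + size_ref), likewise E_ref.
    G2 = 2 * sum(O * ln(O / E)); zero observed counts contribute nothing.
    """
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


def keywords(study_counts, ref_counts, top_n=10):
    """Return positive keywords (relatively more frequent in the study text), ranked by G2."""
    n_study = sum(study_counts.values())
    n_ref = sum(ref_counts.values())
    scored = []
    for word, a in study_counts.items():
        b = ref_counts.get(word, 0)
        if a / n_study > b / n_ref:
            scored.append((log_likelihood(a, n_study, b, n_ref), word))
    scored.sort(reverse=True)
    return [w for _, w in scored[:top_n]]
```

A G2 value above 3.84 corresponds to p < 0.05 with one degree of freedom, a cut-off often used to decide whether a word counts as key; the threshold the authors applied is not stated in the abstract.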