Purpose: We present an analytical, open source and flexible natural language processing and text mining method for topic evolution, emerging topic detection and research trend forecasting for all kinds of data-tagged text. Design/methodology/approach: We make full use of the functions provided by the open source VOSviewer and Microsoft Office, including a thesaurus for data clean-up and a LOOKUP function for comparative analysis. Findings: Through application and verification in the domain of perovskite solar cells research, this method proves to be effective. Research limitations: A certain amount of manual data processing and a specific research domain background are required for better, more illustrative analysis results. Adequate time for analysis is also necessary. Practical implications: We try to set up an easy, useful, and flexible interdisciplinary text analyzing procedure for researchers, especially those without solid computer programming skills or who cannot easily access complex software. This procedure can also serve as a wonderful example for teaching information literacy. Originality/value: This text analysis approach has not been reported before.
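The two steps named in this abstract, a thesaurus for clean-up and a lookup for comparing periods, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thesaurus entries, the toy keyword records and the function names (`clean`, `term_counts`, `compare_periods`) are invented here and are not taken from the paper, which works with VOSviewer and Microsoft Office rather than code.

```python
from collections import Counter

# Hypothetical thesaurus: maps variant spellings to one preferred label.
THESAURUS = {
    "perovskite solar cells": "perovskite solar cell",
    "pscs": "perovskite solar cell",
}


def clean(term):
    """Lower-case a term and replace it with its thesaurus label, if any."""
    key = term.strip().lower()
    return THESAURUS.get(key, key)


def term_counts(records):
    """Count cleaned terms over keyword lists, once per record."""
    counts = Counter()
    for keywords in records:
        counts.update({clean(k) for k in keywords})
    return counts


def compare_periods(earlier, later):
    """Look up every later-period term in the earlier period's counts."""
    rows = []
    for term, n_later in sorted(later.items()):
        n_earlier = earlier.get(term, 0)  # 0 plays the role of a failed lookup
        status = "emerging" if n_earlier == 0 else "continuing"
        rows.append((term, n_earlier, n_later, status))
    return rows


# Toy, invented keyword records for two time periods.
period_a = [["Perovskite solar cells", "stability"], ["PSCs", "efficiency"]]
period_b = [["perovskite solar cell", "tandem"], ["stability", "tandem"]]

rows = compare_periods(term_counts(period_a), term_counts(period_b))
for row in rows:
    print(row)
```

A term that is absent from the earlier period but present in the later one is flagged as "emerging"; a real analysis would likely also want a minimum-frequency cut-off, which is omitted here.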
As a major strategic technology for reducing greenhouse gas emissions and ensuring energy security, carbon capture, utilization, and storage (CCUS) is of great significance to large-scale emission reduction. From the perspective of knowledge discovery, it is important to analyse study progress based on existing achievements, uncover how study topics have evolved over time, review stage-specific findings, and construct a CCUS domain knowledge map. This will help researchers gain an overall understanding of CCUS studies and promote industry-college-research cooperation on CCUS. Based on the Web of Science (WOS) database platform and the CitNetExplorer software, the present study explores the international research progress, topic evolution track, research hotspots and research trends of CCUS technology since its birth nearly 30 years ago, using bibliometric, citation network visualization and cluster analysis methods. Analysis of the literature citation network shows that 16 CCUS topics and 6 hotspots have been studied in the last three decades. The topics of CCUS studies follow an evolution path from CCUS technology security and economic feasibility analysis to CCUS technological popularization, and then to CCUS technological improvement and development. Cutting-edge CCUS research looks at process and infrastructure construction, cost effectiveness and development prospect analysis. CCUS focuses on the improvement of process technologies and related infrastructure.
The technology innovation management (TIM) field attracts an increasing amount of attention. This paper takes a retrospective look at high-quality publication output in the TIM field over the 55 years from 1968 to 2022, revealing topics, their evolutions, and research trends. A total of 31,498 articles and proceedings papers published during this period are analyzed. The paper first extracts fine-grained topic words using the tool ITGInsight. Then the LinLog algorithm is used to cluster topics based on the co-occurrence of the topic words. Time is integrated into the topic cluster results so that topic evolutions and research trends can be analyzed. The TIM field has four main topic clusters: technology research, product research, firm research, and future research. Every topic cluster contains many finely sorted macro-topics and micro-topics. There is an obvious increase in diversity in the topic clusters of technology research and firm research. In particular, the evolution of technology research has been closely connected with society. In contrast, product research has declined in topic size. At the same time, future research maintains a stable level of scientific publications. The research predicts that all four topics will retain their popularity and play an important role in the TIM field. Among them, technology research will continue to expand and enrich the TIM field. The other three topics will deepen their research for a better development of the TIM field. The paper also proposes some advice for industry professionals, policymakers, and researchers.
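Clustering "based on the co-occurrence of the topic words" starts from a weighted list of word pairs. The sketch below builds only that edge list with the Python standard library; it does not implement LinLog or ITGInsight, and the sample documents and the function name `cooccurrence` are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count how many documents each unordered pair of topic words shares."""
    pairs = Counter()
    for words in docs:
        # sorted(set(...)) gives each pair one canonical order per document
        for a, b in combinations(sorted(set(words)), 2):
            pairs[(a, b)] += 1
    return pairs


# Toy, invented topic-word lists (one list per publication).
docs = [
    ["technology transfer", "patent", "firm performance"],
    ["patent", "firm performance"],
    ["technology transfer", "patent"],
]

edges = cooccurrence(docs)
for (a, b), weight in edges.most_common():
    print(a, "--", b, weight)
```

The resulting weights are the kind of input a graph clustering or layout algorithm would consume. Splitting `docs` by publication year before counting would give one network per period, which is one possible way to bring time into the picture.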
Accurately representing the quantity and characteristics of users' interest in certain topics is an important problem facing topic evolution researchers, particularly as it applies to modern online environments. Search engines can provide information retrieval for a specified topic from archived data, but fail to reflect changes in interest toward the topic over time in a structured way. This paper reviews notable research on topic evolution based on the probabilistic topic model from multiple aspects over the past decade. First, we introduce notations, terminology, and the basic topic model explored in the survey, then we summarize three categories of topic evolution based on the probabilistic topic model: the discrete time topic evolution model, the continuous time topic evolution model, and the online topic evolution model. Next, we describe applications of the topic evolution model and attempt to summarize model generalization performance evaluation and topic evolution evaluation methods, as well as providing comparative experimental results for different models. To conclude the review, we pose some open questions and discuss possible future research directions.
Purpose: This paper introduces an analysis framework for tracking the evolution of research topics at the selected topics level, covering a research topic's evolution trend, evolution path and content changes over time. Design/methodology/approach: After the topics were recovered by the author-topic model, we first built the keyword-topic co-occurrence network to track the dynamics of topic trends. Then a single-mode network was constructed, with each node representing a topic and each edge indicating the relationship between topics. It was used to illustrate the evolution path and content changes of research topics. A case study was conducted on digital library research in China to verify the effectiveness of the analysis framework. Findings: The experimental results show that this analysis framework can be used to track the evolution of research topics at a micro level, and that the social network analysis method can help in understanding research topics' evolution paths and content changes over time. Research limitations: The analysis framework will produce limited results when examining unstructured data such as social media data. In addition, the effectiveness of the framework introduced in this paper needs to be verified with more research topics in information science and in more scientific fields. Practical implications: This analysis framework can help scholars and researchers map the evolution process of research topics and gain insights into how a field's topics have evolved over time. Originality/value: The analysis framework used in this study can help reveal more micro-level evolution details. The index defined in this paper to measure topic association strength reflects both similarity and dissimilarity between topics, which helps in better understanding research topics' evolution paths and content changes.
The problem of "rich topics get richer" (RTGR) is common in topic models and leads to a wrong topic distribution if the allocation process is not corrected. In the standard LDA (Latent Dirichlet Allocation) model, each word in all the documents has the same statistical weight. In fact, words have different impacts on different topics. Following this idea, we extend ILDA (Infinite LDA) by considering the bias role of words in dividing the topics. We propose a self-adaptive topic model specifically to overcome the RTGR problem. The model addresses three issues: (1) the number of topics changes with the document collection, which suits dynamic data; (2) words have discriminating attributes with respect to the topic distribution; (3) a self-adaptive method is used to realize automatic re-sampling. To verify our model, we design a topic evolution analysis system that provides the following functions: topic classification in each cycle, topic correlation between adjacent cycles, and strength calculation of the sub-topics in order. Experiments on both the NIPS corpus and our self-built news collections show that the system meets the given requirements and that the results are feasible.
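The abstract mentions computing "the topic correlation in the adjacent cycles" but does not say which measure is used. As a minimal sketch, assuming cosine similarity between topic-word distributions and an arbitrary threshold of 0.5 (both are assumptions made here, not the paper's choices), topics in consecutive cycles could be linked as follows. The word probabilities are invented.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse word-probability dicts."""
    words = set(p) | set(q)
    dot = sum(p.get(w, 0.0) * q.get(w, 0.0) for w in words)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def link_topics(prev_topics, next_topics, threshold=0.5):
    """Return (i, j, score) for topic pairs whose similarity passes the threshold."""
    links = []
    for i, p in enumerate(prev_topics):
        for j, q in enumerate(next_topics):
            score = cosine(p, q)
            if score >= threshold:
                links.append((i, j, round(score, 3)))
    return links


# Toy, invented topic-word distributions for two adjacent cycles.
cycle_1 = [{"network": 0.6, "neuron": 0.4}, {"kernel": 0.7, "margin": 0.3}]
cycle_2 = [{"network": 0.5, "layer": 0.5}, {"bayes": 1.0}]

links = link_topics(cycle_1, cycle_2)
print(links)
```

A topic with no link to the next cycle can be read as fading, and a topic with no link from the previous cycle as newly appearing. Other measures, such as Jensen-Shannon divergence, would fit the same structure.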
With the increasing importance of computer intelligence in the new round of the industrial revolution, administrative, regulatory, or design (ARD) green technology contributes to improving national technological competitiveness and promoting the transformation of green technology, and is becoming an important field under the sustainable development goals. The U.S. and China rank in the top two in terms of paper influence and patent applications in the field of ARD green technology. However, few comparative studies of these two countries have been conducted. This study presents the evolution and landscapes of ARD green technology in China and the U.S., focusing on comparing development priorities and technical layouts in each five-year plan period. According to the “International Patent Classification (IPC) Green Inventory” launched by the World Intellectual Property Organization (WIPO), we retrieved 69,412 patents published between 2001 and 2020 from the PatSnap database. Descriptive, content, and thematic network analyses were conducted using latent Dirichlet allocation (LDA) and community detection algorithms. The results show that both China and the U.S. strategically focus on ARD green technology development. The technical topics in this field can be divided into three themes: data processing systems, traffic control systems, and building designs. The emphasis on technology research and development (R&D) differs between China and the U.S. There is also evidence that the U.S. has advantages in terms of technological innovation and capabilities. However, China has an advantage in terms of data volume, and the gap between China and the U.S. is gradually narrowing. We also highlight the contributions and limitations of this study.
Researchers across the globe have been increasingly interested in the manner in which important research topics evolve over time within the corpus of scientific literature. In a dataset of scientific articles, each document can be considered to comprise both the words of the document itself and its citations of other documents. In this paper, we propose a citation-content-latent Dirichlet allocation (LDA) topic discovery method that accounts for both document citation relations and the content of the document itself via a probabilistic generative model. The citation-content-LDA topic model exploits a two-level topic model that includes the citation information for 'father' topics and text information for sub-topics. The model parameters are estimated by a collapsed Gibbs sampling algorithm. We also propose a topic evolution algorithm that runs in two steps: topic segmentation and topic dependency relation calculation. We have tested the proposed citation-content-LDA model and topic evolution algorithm on two online datasets, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and IEEE Computer Society (CS), to demonstrate that our algorithm effectively discovers important topics and reflects the topic evolution of important research themes. According to our evaluation metrics, citation-content-LDA outperforms both content-LDA and citation-LDA.
Purpose: This article aims to describe the global research profile and the development trends of single cell research from the perspective of bibliometric analysis and semantic mining. Design/methodology/approach: The literature on single cell research was extracted from Clarivate Analytics' Web of Science Core Collection for the years 2009 to 2019. Firstly, bibliometric analyses were performed with Thomson Data Analyzer (TDA). Secondly, topic identification and evolution trend analysis of single cell research were conducted with the LDA topic model. Thirdly, taking as reference the post-discretized method used for topic evolution analysis, the topics were also distributed across countries to detect their spatial distribution. Findings: Publications on single cell research show a significantly increasing tendency over the last decade. The topics of the single cell research field can be divided into three categories: single cell research methods, mechanisms of biological processes, and clinical applications of single cell technologies. The different trends of these categories indicate that technological innovation drives the development of applied research. The continuous and rapid growth of topic strength in the field of cancer diagnosis and treatment indicates that this research topic has received extensive attention in recent years. The topic distributions of some countries are relatively balanced, while in other countries several topics show significant superiority. Research limitations: The data analyzed in this study contain only records included in the Web of Science Core Collection. Practical implications: This study provides insights into research progress in the single cell field and identifies the topics of greatest concern, which reflect potential opportunities and challenges. The national topic distribution analysis based on the post-discretized analysis method extends topic analysis from the time dimension to the space dimension. Originality/value: This paper combines bibliometric analysis and the LDA model to analyze the evolution trends of the single cell research field. The method of extending post-discretized analysis from the time dimension to the space dimension is distinctive and insightful.
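In a post-discretized analysis, topics are fitted once on the whole corpus and the documents are then grouped afterwards, for example by year. A minimal sketch of that grouping step, assuming topic strength is the mean document-topic proportion within each group (an assumption made here; the abstract does not define the measure), could look like this. The proportions are invented, and fitting the LDA model itself is not shown.

```python
from collections import defaultdict


def topic_strength(docs):
    """Average document-topic proportions within each group label.

    docs is a list of (label, proportions) pairs, where label is a year
    (or a country) and proportions is one document's topic mixture.
    """
    sums = {}
    counts = defaultdict(int)
    for label, theta in docs:
        if label not in sums:
            sums[label] = [0.0] * len(theta)
        for k, value in enumerate(theta):
            sums[label][k] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in sums[label]]
        for label in sorted(sums)
    }


# Toy, invented document-topic proportions for a two-topic model.
docs = [(2018, [0.8, 0.2]), (2018, [0.6, 0.4]), (2019, [0.3, 0.7])]

strength = topic_strength(docs)
for label, values in strength.items():
    print(label, [round(v, 2) for v in values])
```

Passing country names instead of years as the group label gives the spatial variant described in the abstract, with no other change to the function.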
[Purpose]: To reveal and compare the evolution of hot topics in the field of Digital Library in China and abroad. [Methods]: Using data in the field of Digital Library from core journals in CNKI and Web of Science from the 1990s to 2020, topics are extracted with the LDA model and hot topics are selected based on life cycle theory. Topic evolution paths are generated to contrast the evolution of hot topics in China and abroad, grouped into the dimensions of technology and application. The study does not analyze the extent of, or the reasons for, the lag in hot research topics in the field of Digital Library in China and abroad. In the technological dimension of Digital Library, research content in China lags behind that abroad. In the application dimension, Chinese applications tend to focus on the social sciences, while applications abroad tend to focus on the natural sciences. The overall research focus has evolved in a U shape: it gradually shifted from technological research to application research and has now turned back to the technological dimension. Many emerging topics combined with big data technology have also appeared.
Purpose: This study aims to reveal the landscape and trends of graphene research in the world by using data from Chemical Abstracts Service (CAS). Design/methodology/approach: Index data from CAS were retrieved on 78,756 papers and 23,057 patents on graphene from 1985 to March 2016, and scientometric methods were used to analyze the growth and distribution of R&D output, topic distribution and evolution, and the distribution and evolution of substance properties and roles. Findings: In recent years R&D in graphene has kept growing rapidly. China, South Korea and the United States are the largest producers of research, but China is relatively weak in patent applications in other countries. Research topics in graphene are continuously expanding from mechanical, material, and electrical properties to a diverse range of application areas such as batteries, capacitors, semiconductors, and sensor devices. The roles of emerging substances are increasing in Preparation and Biological Study. More techniques have been included to improve the preparation processes and applications of graphene in various fields. Research limitations: Only data from CAS are used, and some R&D activities reported solely through other channels may be missed. More detailed analysis is also needed to reveal the impact of research on development or vice versa, development dynamics among the players, and the impact of emerging terms or substance roles on research and technology development. Practical implications: This will provide a valuable reference for scientists and developers, R&D managers, R&D policy makers, and industrial and business investors to understand the landscape and trends of graphene research. Its methodologies can be applied to other fields or with data from other similar sources. Originality/value: The integrative use of CAS indexing data on papers and patents and the systematic exploration of the distribution trends in output, topics and substance roles are distinctive and insightful.
Many social events spread fast through the Internet and arouse wide community discussions. These on-line public opinions develop into diverse topics over time. Moreover, the strength of the topics fluctuates. Capturing both the primary topics and the trends of topics in shifting on-line discussions is not only of theoretical importance for scientific research, but also of practical importance for societal management, especially in current China. Trying cutting-edge text analytic technologies to deal with unstructured on-line public opinions and to support social problem-solving in the big data era is a worthwhile endeavour. This paper applies the dynamic topic model (DTM) to explore the changing topics of new posts collected from the Tianya Zatan Board of Tianya Club, the most influential Chinese BBS in mainland China. By analyzing the trends of hot and cold terms, we capture the topic shifts of the main on-line concerns, illustrated with the topics of school bus and environment in December 2011. An algorithm is proposed to compute the strength fluctuation of each topic. With visualized analysis of the respective main topics in several months of 2012, some patterns of topic fluctuation on the board are summarized.
With the progress and development of computer technology, applying machine learning methods to cancer research has become an important research field. To analyze the most recent research status and trends, main research topics, topic evolutions, research collaborations, and potential directions of this research field, this study conducts a bibliometric analysis on 6206 research articles worldwide collected from PubMed between 2011 and 2021 concerning cancer research using machine learning methods. Python is used as a tool for bibliometric analysis, Gephi is used for social network analysis, and the Latent Dirichlet Allocation model is used for topic modeling. The trend analysis of articles not only reflects the innovative research at the intersection of machine learning and cancer but also demonstrates its vigorous development and increasing impacts. In terms of journals, Nature Communications is the most influential journal and Scientific Reports is the most prolific one. The United States and Harvard University have contributed the most to cancer research using machine learning methods. As for the research topic, “Support Vector Machine,” “classification,” and “deep learning” have been the core focuses of the research field. Findings are helpful for scholars and related practitioners to better understand the development status and trends of cancer research using machine learning methods, as well as to have a deeper understanding of research hotspots.
The Product Sensitive Online Dirichlet Allocation model (PSOLDA) proposed in this paper mainly uses the sentiment polarity of topic words in the review text to improve the accuracy of topic evolution. First, we use Latent Dirichlet Allocation (LDA) to obtain the distribution of topic words in the current time window. Second, the word2vec word vector is used as auxiliary information to determine the sentiment polarity and obtain the sentiment polarity distribution of the current topic. Finally, the sentiment polarity changes of the topics in the previous and next time windows are mapped to sentiment factors, which are used to control the distribution of topic words in the next time window. The experimental results show that the PSOLDA model decreases the probability distribution by 0.1601, while Online Twitter LDA only increases by 0.0699. The topic evolution method proposed in this paper, which integrates the sentiment information of topic words, is better than the traditional model.
Funding: Supported by the Fundamental Research Funds for the Central Universities of China, project “CCUS topic detection and evolution analysis based on CitNetExplorer” (Grant No. JBK2002042).
Funding (technology innovation management study): Supported by the General Program of the National Natural Science Foundation of China (Grant No. 72074020) and the Young Scientists Fund of the National Natural Science Foundation of China (Grant Nos. 72004009, 72304074).
Funding and acknowledgements (survey of probabilistic topic evolution models): The authors would like to thank the anonymous reviewers for their constructive comments and suggestions, which significantly contributed to improving the manuscript. This work was supported by the National Key Basic Research Project of China (973 Program) (2012CB316400), the National Natural Science Foundation of China (Grant Nos. 61471321, 61202400, 31300539, and 31570629), the Zhejiang Provincial Natural Science Foundation of China (LY15C140005, LY16F010004), the Science and Technology Department of Zhejiang Province Public Welfare Project (2016C31G2010057, 2015C31004), the Fundamental Research Funds for the Central Universities (172210261) and the Zhejiang Provincial Key Laboratory of Forestry Intelligent Monitoring and Information Technology Research.
Funding: This work is supported by grants from the National 973 Project (No. 2013CB29606), the Natural Science Foundation of China (No. 61202244), and the research fund of ShangQiu Normal College (No. 2013GGJS013). The NIPS corpus is supported by SourceForge. We thank the anonymous reviewers for their helpful comments.
Abstract: The "rich topics get richer" (RTGR) problem is common in topic models and produces a wrong topic distribution unless the distribution process is intervened in. In the standard LDA (Latent Dirichlet Allocation) model, every word in every document has the same statistical weight. In fact, words have different impacts on different topics. Guided by this idea, we extend ILDA (Infinite LDA) by considering the bias role of words in dividing the topics. We propose a self-adaptive topic model specifically to overcome the RTGR problem. The model addresses three points: (1) the number of topics changes with the document collection, which suits dynamic data; (2) words have discriminating attributes with respect to the topic distribution; (3) a self-adaptive method is used to realize automatic re-sampling. To verify our model, we design a topic evolution analysis system that provides the following functions: topic classification in each cycle, topic correlation between adjacent cycles, and strength calculation of the sub-topics in order. Experiments on both the NIPS corpus and our self-built news collections show that the system meets the given requirements and that the results are feasible.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 71774130) and China Huaneng Group (Grant No. HNKJ20-H87).
Abstract: With the increasing importance of computer intelligence in the new round of the industrial revolution, administrative, regulatory, or design (ARD) green technology contributes to improving national technological competitiveness and promoting the transformation of green technology, and it is becoming an important field under the sustainable development goals. The U.S. and China rank in the top two in terms of paper influence and patent applications in the field of ARD green technology. However, few comparative studies of these two countries have been conducted. This study presents the evolution and landscapes of ARD green technology in China and the U.S., focusing on comparing development priorities and technical layouts in each five-year plan period. According to the "International Patent Classification (IPC) Green Inventory" launched by the World Intellectual Property Organization (WIPO), we retrieved 69,412 patents published between 2001 and 2020 from the PatSnap database. Descriptive, content, and thematic network analyses were conducted using latent Dirichlet allocation (LDA) and community detection algorithms. The results show that both China and the U.S. strategically focus on ARD green technology development. The technical topics in this field can be divided into three themes: data processing systems, traffic control systems, and building designs. The emphasis on technology research and development (R&D) differs between China and the U.S. There is also evidence that the U.S. has advantages in terms of technological innovation and capabilities. However, China has an advantage in terms of data volume, and the gap between China and the U.S. is gradually narrowing. We also highlight the contributions and limitations of this study.
Funding: Supported by the National Basic Research Program (973) of China (No. 2012CB316400).
Abstract: Researchers across the globe have been increasingly interested in the manner in which important research topics evolve over time within the corpus of scientific literature. In a dataset of scientific articles, each document can be considered to comprise both the words of the document itself and its citations of other documents. In this paper, we propose a citation-content-latent Dirichlet allocation (LDA) topic discovery method that accounts for both document citation relations and the content of the document itself via a probabilistic generative model. The citation-content-LDA topic model exploits a two-level topic model that includes the citation information for 'father' topics and text information for sub-topics. The model parameters are estimated by a collapsed Gibbs sampling algorithm. We also propose a topic evolution algorithm that runs in two steps: topic segmentation and topic dependency relation calculation. We have tested the proposed citation-content-LDA model and topic evolution algorithm on two online datasets, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) and IEEE Computer Society (CS), to demonstrate that our algorithm effectively discovers important topics and reflects the topic evolution of important research themes. According to our evaluation metrics, citation-content-LDA outperforms both content-LDA and citation-LDA.
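The abstract does not say how the topic dependency relations are calculated. As a rough illustration only, and not the authors' algorithm, the sketch below links topics from two adjacent time segments when their word distributions are similar. Representing topics as word-probability dictionaries, using cosine similarity, the `threshold` value, and the toy topics are all assumptions made for this example.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse word-probability dicts."""
    dot = sum(w * q.get(term, 0.0) for term, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def topic_dependencies(prev_topics, next_topics, threshold=0.5):
    """Link each topic in one segment to similar topics in the next segment.

    Returns (prev_id, next_id, similarity) triples whose similarity reaches
    the (hypothetical) threshold, strongest link first.
    """
    links = []
    for prev_id, p in prev_topics.items():
        for next_id, q in next_topics.items():
            sim = cosine(p, q)
            if sim >= threshold:
                links.append((prev_id, next_id, sim))
    return sorted(links, key=lambda link: -link[2])


# Toy topics for two adjacent time segments (made-up numbers).
segment_1 = {
    "vision": {"image": 0.6, "feature": 0.4},
    "learning": {"kernel": 0.7, "margin": 0.3},
}
segment_2 = {
    "recognition": {"image": 0.5, "feature": 0.3, "face": 0.2},
    "boosting": {"margin": 0.5, "classifier": 0.5},
}

links = topic_dependencies(segment_1, segment_2)
for prev_id, next_id, sim in links:
    print(f"{prev_id} -> {next_id}: {sim:.3f}")
```

With these toy inputs only the "vision" topic is linked forward, because "learning" and "boosting" share a single low-weight word. A divergence measure such as Jensen-Shannon could be swapped in for the cosine function without changing the rest of the sketch.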
Funding: The Chinese Academy of Sciences literature information capability construction project of 2020, "Construction of strategic information research and consultation system in science and technology field" (Grant No. E290001).
Abstract: Purpose: This article aims to describe the global research profile and the development trends of single cell research from the perspective of bibliometric analysis and semantic mining. Design/methodology/approach: The literature on single cell research published between 2009 and 2019 was extracted from Clarivate Analytics' Web of Science Core Collection. Firstly, bibliometric analyses were performed with Thomson Data Analyzer (TDA). Secondly, topic identification and evolution trend analysis of single cell research were conducted through the LDA topic model. Thirdly, taking the post-discretized method used for topic evolution analysis as a reference, the topics were also dispersed to countries to detect their spatial distribution. Findings: Publications on single cell research show a significantly increasing tendency in the last decade. The topics of the single cell research field can be divided into three categories: single cell research methods, mechanisms of biological processes, and clinical applications of single cell technologies. The different trends of these categories indicate that technological innovation drives the development of applied research. The continuous and rapid growth of topic strength in the field of cancer diagnosis and treatment indicates that this research topic has received extensive attention in recent years. The topic distributions of some countries are relatively balanced, while in other countries several topics show significant superiority. Research limitations: The data analyzed in this study contain only records included in the Web of Science Core Collection. Practical implications: This study provides insights into research progress in the single cell field and identifies the topics of greatest concern, which reflect potential opportunities and challenges. The national topic distribution analysis based on the post-discretized analysis method extends topic analysis from the time dimension to the space dimension. Originality/value: This paper combines bibliometric analysis and the LDA model to analyze the evolution trends of the single cell research field. The method of extending post-discretized analysis from the time dimension to the space dimension is distinctive and insightful.
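The post-discretized idea can be illustrated with a minimal sketch: a single topic model is fitted on the whole corpus first, and the resulting document-topic proportions are only afterwards grouped by year (time dimension) or by country (space dimension). The code below assumes the document-topic proportions have already been estimated and defines topic strength as the mean proportion within a group; the record layout, field names, and numbers are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def topic_strength(docs, key):
    """Average each topic's proportion over the documents sharing `key`.

    `docs` is a list of dicts holding a grouping field (e.g. "year" or
    "country") and a "topics" dict of document-topic proportions.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for doc in docs:
        group = doc[key]
        counts[group] += 1
        for topic, share in doc["topics"].items():
            sums[group][topic] += share
    return {
        group: {topic: total / counts[group] for topic, total in topics.items()}
        for group, topics in sums.items()
    }


# Made-up document-topic proportions, as if produced by one LDA run.
docs = [
    {"year": 2018, "country": "US", "topics": {"methods": 0.7, "clinical": 0.3}},
    {"year": 2018, "country": "CN", "topics": {"methods": 0.5, "clinical": 0.5}},
    {"year": 2019, "country": "US", "topics": {"methods": 0.2, "clinical": 0.8}},
]

by_year = topic_strength(docs, "year")        # time dimension
by_country = topic_strength(docs, "country")  # space dimension
print(by_year)
print(by_country)
```

Because the grouping happens after model fitting, the same topics are comparable across every year and country, which is what makes the extension from time to space straightforward.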
Abstract: This study reveals and compares the evolution process of hot topics in the field of Digital Library in China and abroad. [Methods]: Taking data in the field of Digital Library from core journals in CNKI and Web of Science from the 1990s to 2020, topics are extracted by the LDA model and hot topics are selected based on life cycle theory. Topic evolution paths are generated to contrast the evolution of hot topics at home and abroad, grouped into the dimensions of technology and application. The study does not analyze the lagging performance of research hot topics in the field of Digital Library at home and abroad or the reasons for it. In the technological dimension of Digital Library, research content in China lags behind that abroad. In the application dimension, Chinese applications tend to focus on social sciences, while applications abroad tend to focus on natural sciences. The evolution of the overall research focus is U-shaped: it gradually shifted from technological research to application research and has now turned back to the technological dimension. There are also many emerging topics combined with big data technology.
Abstract: Purpose: This study aims to reveal the landscape and trends of graphene research in the world by using data from Chemical Abstracts Service (CAS). Design/methodology/approach: Index data from CAS were retrieved on 78,756 papers and 23,057 patents on graphene from 1985 to March 2016, and scientometric methods were used to analyze the growth and distribution of R&D output, topic distribution and evolution, and distribution and evolution of substance properties and roles. Findings: In recent years R&D in graphene has kept growing rapidly. China, South Korea and the United States are the largest producers of research, but China is relatively weak in patent applications in other countries. Research topics in graphene are continuously expanding from mechanical, material, and electrical properties to a diverse range of application areas such as batteries, capacitors, semiconductors, and sensor devices. The roles of emerging substances are increasing in Preparation and Biological Study. More techniques have been included to improve the preparation processes and applications of graphene in various fields. Research limitations: Only data from CAS are used, and some R&D activities reported solely through other channels may be missed. More detailed analysis is also needed to reveal the impact of research on development or vice versa, development dynamics among the players, and the impact of emerging terms or substance roles on research and technology development. Practical implications: This study provides a valuable reference for scientists and developers, R&D managers, R&D policy makers, and industrial and business investors to understand the landscape and trends of graphene research. Its methodologies can be applied to other fields or to data from other similar sources. Originality/value: The integrative use of CAS indexing data on papers and patents and the systematic exploration of the distribution trends in output, topics, and substance roles are distinctive and insightful.
Funding: Supported by the National Basic Research Program of China under Grant No. 2010CB731405 and the National Natural Science Foundation of China under Grant Nos. 71171187 and 71371107.
Abstract: Many social events spread fast through the Internet and arouse wide community discussion. These online public opinions develop into diverse topics over time, and the strength of the topics fluctuates. Capturing both the primary topics and the trends of topics in shifting online discussions is not only of theoretical importance for scientific research, but also of practical importance for societal management, especially in China today. Applying cutting-edge text analytic technologies to unstructured online public opinions to support social problem-solving in the big data era is a worthwhile endeavour. This paper applies the dynamic topic model (DTM) to explore the changing topics of new posts collected from the Tianya Zatan Board of Tianya Club, the most influential Chinese BBS in China's Mainland. By analyzing the trends of hot and cold terms, we capture the shifts in the main online concerns, illustrated with the topics of school buses and the environment in December 2011. An algorithm is proposed to compute the strength fluctuation of each topic. With visualized analysis of the respective main topics in several months of 2012, some patterns of topic fluctuation on the board are summarized.
Funding: Natural Science Foundation of Guangdong Province, Grant/Award Number: 2021A1515011339.
Abstract: With the progress and development of computer technology, applying machine learning methods to cancer research has become an important research field. To analyze the most recent research status and trends, main research topics, topic evolutions, research collaborations, and potential directions of this research field, this study conducts a bibliometric analysis of 6206 research articles worldwide collected from PubMed between 2011 and 2021 concerning cancer research using machine learning methods. Python is used as a tool for bibliometric analysis, Gephi is used for social network analysis, and the Latent Dirichlet Allocation model is used for topic modeling. The trend analysis of articles not only reflects the innovative research at the intersection of machine learning and cancer but also demonstrates its vigorous development and increasing impact. In terms of journals, Nature Communications is the most influential journal and Scientific Reports is the most prolific one. The United States and Harvard University have contributed the most to cancer research using machine learning methods. As for research topics, "Support Vector Machine," "classification," and "deep learning" have been the core focuses of the research field. The findings help scholars and related practitioners better understand the development status and trends of cancer research using machine learning methods, and gain a deeper understanding of research hotspots.
Funding: Supported by the Opening Project of Shanghai Key Laboratory of Integrated Administration Technologies for Information Security (AGK2019004), Songjiang District Science and Technology Research Project (19SJKJGG83), and National Natural Science Foundation of China (61802251).
Abstract: The Product Sensitive Online Dirichlet Allocation model (PSOLDA) proposed in this paper mainly uses the sentiment polarity of topic words in the review text to improve the accuracy of topic evolution. First, we use Latent Dirichlet Allocation (LDA) to obtain the distribution of topic words in the current time window. Second, the word2vec word vector is used as auxiliary information to determine the sentiment polarity and obtain the sentiment polarity distribution of the current topic. Finally, the sentiment polarity changes of the topics in the previous and next time windows are mapped to sentiment factors, which are used to control the distribution of topic words in the next time window. The experimental results show that the PSOLDA model decreases the probability distribution by 0.1601, while Online Twitter LDA only increases by 0.0699. The topic evolution method proposed in this paper, which integrates the sentiment information of topic words, is better than the traditional model.