Exploring the Potentialities of Automatic Extraction of University Webometric Information (cited by: 2)

Abstract:

Purpose: The main objective of this work is to show the potentialities of recently developed approaches for automatic knowledge extraction directly from universities' websites. The automatically extracted information can potentially be updated more often than once per year, and is safe from manipulation or misinterpretation. Moreover, this approach gives us flexibility in collecting indicators of the efficiency of universities' websites and of their effectiveness in disseminating key contents. These new indicators can complement traditional indicators of scientific research (e.g., number of articles and number of citations) and teaching (e.g., number of students and graduates) by introducing further dimensions that allow new insights for "profiling" the analyzed universities.

Design/methodology/approach: Webometrics relies on web mining methods and techniques to perform quantitative analyses of the web. This study implements an advanced application of the webometric approach, exploiting all three categories of web mining: web content mining, web structure mining, and web usage mining. The information used to compute our indicators has been extracted from the universities' websites using web scraping and text mining techniques. The scraped information has been stored in a NoSQL DB in semistructured form so that it can be retrieved efficiently by text mining techniques. This provides increased flexibility in the design of new indicators, opening the door to new types of analyses. Some data have also been collected by means of batch queries to a search engine (Bing, www.bing.com) or from a leading provider of Web analytics (SimilarWeb, http://www.similarweb.com). The information extracted from the Web has been combined with the university structural information taken from the European Tertiary Education Register (https://eter.joanneum.at/#/home), a database collecting information on Higher Education Institutions (HEIs) at European level. All of the above was used to perform a clustering of 79 Italian universities based on structural and digital indicators.

Findings: The main findings of this study concern the evaluation of the digitalization potential of universities, in particular by presenting techniques for the automatic extraction of information from the web to build indicators of the quality and impact of universities' websites. These indicators can complement traditional indicators and can be used, through clustering techniques, to identify groups of universities with common features.

Research limitations: The results reported in this study refer to Italian universities only, but the approach could be extended to other university systems abroad.

Practical implications: The approach proposed in this study, and its illustration on Italian universities, show the usefulness of recently introduced automatic data extraction and web scraping approaches and their practical relevance for characterizing and profiling the activities of universities on the basis of their websites. The approach could be applied to other university systems.

Originality/value: This work applies to university websites, for the first time, some recently introduced techniques for automatic knowledge extraction based on web scraping, optical character recognition, and nontrivial text mining operations (Bruni & Bianchi, 2020).
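As a rough illustration of the web content mining step described in the abstract, the sketch below turns one scraped page into a semistructured record of the kind a document-oriented NoSQL store could hold. The abstract does not name the tools or the record layout used by the authors, so everything here is an assumption: the `page_record` helper, the field names, the keyword list, and the sample page are all hypothetical, and only the Python standard library is used.

```python
from html.parser import HTMLParser
import json
import re


class VisibleText(HTMLParser):
    """Collect page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_record(university, url, html, keywords):
    """Build one semistructured record (a plain dict) for a scraped page.

    The field names are illustrative, not taken from the paper.
    """
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    words = re.findall(r"[a-z]+", text.lower())
    counts = {k: words.count(k.lower()) for k in keywords}
    return {
        "university": university,
        "url": url,
        "n_words": len(words),
        "keyword_counts": counts,
    }


# Hypothetical page content standing in for a scraped university homepage.
SAMPLE_HTML = """
<html><head><style>p {color: red}</style></head>
<body><h1>Research at Example University</h1>
<p>Our research groups publish open research data.</p>
<script>var research = 1;</script></body></html>
"""

record = page_record(
    "Example University",        # hypothetical institution
    "https://www.example.edu/",  # hypothetical URL
    SAMPLE_HTML,
    ["research", "teaching"],    # hypothetical indicator keywords
)
print(json.dumps(record, indent=2))
```

Because each record is a plain dictionary, new indicators can be added as extra fields without changing a fixed schema, which is the kind of flexibility the abstract attributes to semistructured storage.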

Authors: Gianpiero Bianchi, Renato Bruni, Cinzia Daraio, Antonio Laureti Palma, Giulio Perani, Francesco Scalfati

Affiliations: ISTAT; DIAG

Source: Journal of Data and Information Science (数据与情报科学学报（英文版）), CSCD, 2020, Issue 4, pp. 43-55 (13 pages)

Funding: This work was developed with the support of the H2020 RISIS 2 Project (No. 824091) and of the "Sapienza" Research Awards No. RM1161550376E40E of 2016 and RM11916B8853C925 of 2019. This article is a largely extended version of Bianchi et al. (2019), presented at the ISSI 2019 Conference held in Rome, 2–5 September 2019.

Keywords: Development of data and information services; Webometrics indicators; Higher education institutions; Automatic extraction; Machine learning; Optimization

Classification code: G353.1 [Cultural Science - Information Science]


