A Research on Visualization of Underlying Topics Based on MDS Model (cited by: 5)
Abstract: Database Tomography analysis applies co-word analysis to topic discovery in full texts, but because it requires seed terms to be selected in advance and co-occurrence counts to be tallied, it misses many important term combinations and underlying topics. This paper proposes lexical cohesion theory as the theoretical basis for visualizing underlying topics. The method skips seed-term selection and co-occurrence matrix construction: terms are represented in a transposed vector space, a multidimensional scaling (MDS) algorithm projects their proximity relations in that space onto a three-dimensional map, and underlying topics are discovered and presented through the spatial clustering of related terms. Data coding is introduced to overcome the limited capacity of the MDS visual space. A term proximity matrix, a centroid proximity matrix, and an attribute-accumulative proximity matrix are designed together with their corresponding procedures, forming a three-layer method system. Finally, the three-layer procedure is applied successfully to risk identification for listed companies in the computer application services industry, using the risk-factor text in their prospectuses as the text collection.
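Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), indexed in CSSCI and the Peking University Core Journals list, 2014, No. 1, pp. 45-54 (10 pages)
Funding: National Government-Sponsored Graduate Student Program for Building High-Level Universities (Liujinfa [2011] No. 3005); National Natural Science Foundation of China (71173249)
Keywords: underlying topics, visualization, multidimensional scaling, data coding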
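The core pipeline described in the abstract (transpose the document-term matrix, measure proximity between term vectors, project to three dimensions with MDS) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the record does not say which MDS variant or distance measure the paper uses, so classical (Torgerson) MDS over Euclidean distances is swapped in here, and the terms, counts, and function names are all hypothetical.

```python
import math

def term_vectors(doc_term):
    """Transpose a document-by-term count matrix so that each row is one term."""
    return [list(col) for col in zip(*doc_term)]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def proximity_matrix(vectors):
    """Pairwise distances between term vectors (assumed measure: Euclidean)."""
    return [[euclidean(u, v) for v in vectors] for u in vectors]

def top_eigenpair(B, iters=500):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    n = len(B)
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract
            return 0.0, v
        v = [x / norm for x in w]
    Bv = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, Bv)), v  # Rayleigh quotient, eigenvector

def classical_mds(dist, dims=3):
    """Classical MDS: double-center the squared distances, keep the top eigenpairs."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        lam, v = top_eigenpair(B)
        if lam <= 1e-9:
            break
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next-largest eigenpair.
        for i in range(n):
            for j in range(n):
                B[i][j] -= lam * v[i] * v[j]
    return coords

# Hypothetical toy data: rows are documents, columns are terms.
terms = ["credit", "default", "server", "outage"]
doc_term = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 3],
    [0, 1, 3, 4],
]

coords = classical_mds(proximity_matrix(term_vectors(doc_term)), dims=3)
for term, point in zip(terms, coords):
    print(term, [round(x, 3) for x in point])
```

In this toy input, terms that appear in the same documents end up close together in the 3D layout, which is the kind of spatial grouping the abstract reads as an underlying topic. The data coding step and the centroid and attribute-accumulative matrices mentioned in the abstract are not covered by this sketch.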