Abstract
Using R language tools, a multi-level clustering method combining the Pamk and Kmeans algorithms was applied to cluster and mine the subtopics of library microblog data from Sina Weibo. The method identifies and discards useless data in large data sets and mines hidden information, improving the efficiency and level of microblog information use and bringing it into full play in library work.
Source
Chinese Journal of Medical Library and Information Science (《中华医学图书情报杂志》)
CAS
2014, Issue 4, pp. 46-49 (4 pages)