Abstract
LDA topic modeling, widely used in recent years in historical and literary research, can do more than uncover the overall meaning structure of a large corpus. Applied to a philosophical corpus and then focused on specific texts, it can also reveal the meaning features of small texts and the meaning relationships between texts, supporting philosophical discovery and providing empirical evidence for some philosophical propositions. As a new method, its effectiveness and objectivity need to be tested. First, based on the Xīn Hàn diǎn (新汉典) corpus, this paper takes LDA modeling of the Analects (《论语》), the Mencius (《孟子》), and the Xunzi (《荀子》) as an example to show the whole research process from modeling to drawing inferences. Second, it compares the model data and philosophical inferences with SN's digital humanities research on the same three texts, analyzes why the inferences differ, and argues for the effectiveness of using LDA to discover the meaning features of small texts and the relationships between texts. Finally, it uses the control variable method to model the Ctext corpus, compares the topic distributions of the Analects under different parameters, and argues that the condition for the method's objectivity depends only on the setting of k: when k is taken from an interval that converges toward the "ideal state", LDA avoids the interference of subjective factors to the greatest extent and so ensures the objectivity of the research method.
Authors
Gao Yuanhao (高元昊)
Wang Xiaohong (王小红)
Colin Allen (科林·艾伦)
Yang Zhao (杨钊)
Source
Digital Humanities Research (《数字人文研究》)
2021, No. 2, pp. 36-50 (15 pages)
Funding
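Sub-project of the National Philosophy and Social Science Fund of China Major Project "Philosophical Research on the Intelligent Revolution and the Prospects of Humanity's Deep Technologization" (17ZDA028)
Ministry of Science and Technology Foreign Expert Program "Computational Analysis of Ancient Chinese Philosophy through Topic Modeling" (BG20190027003)
One of the research outputs of the Shaanxi Provincial Social Science Fund project "Machine Learning and Innovation in Philosophical Research Methods: Topic Modeling Based on the Handian Corpus" (2019C006)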