A method of gender discrimination based on character feature of Chinese names (cited by 2)
Abstract: The characters used in Chinese personal names carry strong gender cues. Building on this, the paper presents a method that uses a naive Bayes classifier to determine the gender of a Chinese name. The method extracts three distinguishing features from each name: the first character (Zi1), the second character (Zi2), and the combination of the first and second characters (Zi1Zi2). Naive Bayes classification is then applied to decide which gender the name belongs to. Training and testing were carried out on a corpus of 412,775 Chinese names using 10-fold cross-validation, and the accuracy obtained with different feature combinations was compared. The average accuracies were: Zi1, 72.75%; Zi2, 86.92%; Zi1+Zi2, 88.84%; Zi1+Zi1Zi2, 87.37%; Zi2+Zi1Zi2, 89.35%; and Zi1+Zi2+Zi1Zi2 (all distinguishing features), 90.06%. The best average accuracy, 90.06%, was obtained with all three features combined.
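The abstract describes the classifier only at a high level. The Python sketch below shows one way such a naive Bayes gender classifier over the Zi1, Zi2 and Zi1Zi2 features could be implemented, with k-fold cross-validation. It picks the gender g maximizing P(g) multiplied by P(f | g) over the active features f. Several details are assumptions not stated in the source: inputs are given names with the surname already removed, probabilities use add-one (Laplace) smoothing, labels are "M"/"F", and Zi2 and Zi1Zi2 are simply omitted for one-character given names. The names `extract_features`, `NaiveBayesGender`, `cross_validate` and `FEATURE_SETS` are hypothetical. This is not the authors' code, and the toy names are illustrative rather than data from the paper.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical feature-set names mirroring the abstract's combinations.
FEATURE_SETS = {
    "Zi1": ("z1",),
    "Zi2": ("z2",),
    "Zi1+Zi2": ("z1", "z2"),
    "Zi1+Zi1Zi2": ("z1", "z12"),
    "Zi2+Zi1Zi2": ("z2", "z12"),
    "all": ("z1", "z2", "z12"),
}


def extract_features(given_name, kinds):
    """Return (kind, value) pairs; assumes a non-empty given name without surname."""
    feats = {"z1": given_name[0]}
    if len(given_name) >= 2:  # Zi2 and Zi1Zi2 exist only for 2+ characters
        feats["z2"] = given_name[1]
        feats["z12"] = given_name[:2]
    return [(k, feats[k]) for k in kinds if k in feats]


class NaiveBayesGender:
    def __init__(self, kinds, alpha=1.0):
        self.kinds = kinds
        self.alpha = alpha  # add-alpha smoothing (an assumption)

    def fit(self, names, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)  # (label, kind) -> value counts
        self.feat_totals = Counter()             # (label, kind) -> total count
        self.vocab = defaultdict(set)            # kind -> observed values
        for name, label in zip(names, labels):
            for kind, value in extract_features(name, self.kinds):
                self.feat_counts[(label, kind)][value] += 1
                self.feat_totals[(label, kind)] += 1
                self.vocab[kind].add(value)
        return self

    def log_score(self, name, label):
        """log P(label) + sum of log P(feature | label), summed to avoid underflow."""
        n = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n)
        for kind, value in extract_features(name, self.kinds):
            v = len(self.vocab[kind]) + 1  # one extra slot for unseen values
            count = self.feat_counts[(label, kind)][value]
            total = self.feat_totals[(label, kind)]
            score += math.log((count + self.alpha) / (total + self.alpha * v))
        return score

    def predict(self, name):
        return max(self.class_counts, key=lambda g: self.log_score(name, g))


def cross_validate(names, labels, kinds, k=10, seed=0):
    """Average accuracy over k shuffled folds."""
    if len(names) < k:
        raise ValueError("need at least k examples")
    idx = list(range(len(names)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for test in folds:
        held_out = set(test)
        train = [j for j in idx if j not in held_out]
        model = NaiveBayesGender(kinds).fit(
            [names[j] for j in train], [labels[j] for j in train])
        correct = sum(model.predict(names[j]) == labels[j] for j in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k


if __name__ == "__main__":
    # Toy data for illustration only; not the paper's corpus.
    names = ["建国", "志强", "伟", "建华", "秀英", "丽娟", "芳", "美玲"]
    labels = ["M", "M", "M", "M", "F", "F", "F", "F"]
    model = NaiveBayesGender(FEATURE_SETS["all"]).fit(names, labels)
    print(model.predict("志伟"))  # → M
```

In the toy run, "志伟" is labeled M because 志 appears as a first character only in a male training name. Its second character and two-character combination are unseen, so they receive equal smoothed probability for both genders. Comparing feature combinations amounts to calling `cross_validate` once per entry of `FEATURE_SETS`.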
Source: Journal of Shandong University (Engineering Science), 2014, No. 1, pp. 13-18, 23 (7 pages). Indexed in CAS; Peking University core journal.
Funding: National Natural Science Foundation of China (60863011); Henan Province Basic and Frontier Technology Research Program (112300410182).
Keywords: Chinese names; gender discrimination; naive Bayes classification; character feature; feature combination; distinguishing feature
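Related literature: references (26); secondary references (144); documents sharing references (293); co-cited documents (8); citing documents (2); secondary citing documents (4).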