Abstract
This paper examines how data mining techniques can be applied to the analysis of the features of Japanese compositions. Analysis of lexical density and text features shows that the composition is a distinct text type that differs markedly from other native-speaker corpora. It is characterized by low lexical density, relatively infrequent use of nouns and numerals, a high proportion of verbs and adjectives, short sentences, and a low degree of written-language formality. Compositions produced by learners also differ clearly from those produced by native speakers: learner compositions contain more descriptions of states and fewer descriptions of actions, with a lower proportion of verbs and auxiliary verbs. By comparison, Level 8 compositions are closer to native-speaker compositions, especially those of the upper-grade group, although learners' command of some vocabulary remains insufficient. Word co-occurrence networks show that as proficiency level rises, learners' descriptions become more detailed and specific, their vocabulary gradually approaches that of native speakers, and errors decrease markedly. Even so, learners never fully escape interference from their mother tongue.
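The two measures named in the abstract, lexical density and word co-occurrence, can be illustrated with a minimal sketch. It is not the author's procedure: it assumes one common definition of lexical density (content words divided by all tokens), assumes that tokens have already been segmented and POS-tagged, and uses hypothetical coarse tag names and a hand-tagged toy example.

```python
from collections import Counter
from itertools import combinations

# Assumption: these coarse POS labels are treated as content words.
CONTENT_POS = {"noun", "verb", "adjective", "adverb"}

def lexical_density(tagged_tokens):
    """Return the share of content words among all tokens (0.0 if empty)."""
    if not tagged_tokens:
        return 0.0
    content = sum(1 for _, pos in tagged_tokens if pos in CONTENT_POS)
    return content / len(tagged_tokens)

def cooccurrence_counts(sentences):
    """Count unordered pairs of distinct content words that share a sentence."""
    counts = Counter()
    for sent in sentences:
        # Deduplicate and sort so each pair has one canonical order.
        words = sorted({w for w, pos in sent if pos in CONTENT_POS})
        counts.update(combinations(words, 2))
    return counts

# Hypothetical hand-tagged sentences, for illustration only.
sentences = [
    [("私", "noun"), ("は", "particle"), ("本", "noun"),
     ("を", "particle"), ("読む", "verb")],
    [("本", "noun"), ("は", "particle"), ("面白い", "adjective")],
]

densities = [lexical_density(s) for s in sentences]
pairs = cooccurrence_counts(sentences)
```

The pair counts can serve as edge weights of a co-occurrence network, with words as nodes; real Japanese text would first need a morphological analyzer to produce the tagged tokens.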
Author
Mao Wenwei (毛文伟), Shanghai International Studies University, China
Source
《日语学习与研究》
CSSCI
2022, No. 2, pp. 72-81 (10 pages)
Journal of Japanese Language Study and Research
Funding
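An interim result of the 2019 National Social Science Fund of China project "Research on the Cognitive Mechanisms of Chinese Learners of Japanese Based on Data Mining Techniques" (基于数据挖掘技术的中国日语学习者认知机制研究; Project No. 19BYY201). Principal investigator: Mao Wenwei.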