Abstract
Starting from the characteristics of the Chinese language and the practical needs of second language (L2) teaching, this article proposes four properties and defines nine types of Chinese collocations. Building on theoretical linguistic research, it combines methods based on surface-level and deeper-level linguistic knowledge with statistical methods to achieve efficient, high-precision automatic collocation extraction. 22,298 collocations are extracted from a 2.4-million-word L2 textbook corpus, and 219,451 collocations are extracted from a 138-million-word Internet (Wikipedia) corpus, yielding two large-scale collocation knowledge bases covering different domains. Finally, using word discrimination, quantifier (measure word) learning and automatic grammatical error detection as examples, the article describes how the two collocation knowledge bases are applied in L2 teaching.
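The abstract does not say which linguistic filters or association measures the authors actually use. As a rough illustration of the general hybrid idea only (a linguistic filter followed by a statistical score), here is a minimal Python sketch assuming a POS-tagged corpus, a hypothetical set of admissible POS patterns (`ALLOWED_PATTERNS`), and pointwise mutual information (PMI) as the statistical measure. The function name `extract_collocations` and the tag set are illustrative assumptions, and the sketch covers only surface (adjacent-word, POS-based) knowledge, not the deeper linguistic knowledge the article mentions.

```python
import math
from collections import Counter

# Hypothetical POS patterns treated as admissible collocation shapes
# (verb+noun, adjective+noun, measure word+noun). These are assumptions
# for illustration; the article's nine collocation types are not reproduced.
ALLOWED_PATTERNS = {("v", "n"), ("a", "n"), ("q", "n")}


def extract_collocations(sentences, min_count=2, min_pmi=0.0):
    """Rank adjacent word pairs by PMI after a POS-pattern filter.

    sentences: list of sentences, each a list of (word, pos) tuples.
    Returns a list of ((w1, w2), pmi) sorted by descending PMI.
    """
    word_counts = Counter()
    pair_counts = Counter()
    total_words = 0
    for sent in sentences:
        total_words += len(sent)
        word_counts.update(w for w, _ in sent)
        for (w1, p1), (w2, p2) in zip(sent, sent[1:]):
            # Linguistic filter: count only pairs with an admissible POS shape.
            if (p1, p2) in ALLOWED_PATTERNS:
                pair_counts[(w1, w2)] += 1

    results = []
    for (w1, w2), c12 in pair_counts.items():
        # Frequency filter: drop rare pairs, whose PMI is unreliable.
        if c12 < min_count:
            continue
        # PMI = log2(P(w1,w2) / (P(w1) * P(w2))), with all probabilities
        # estimated against the same token total (a common simplification).
        pmi = math.log2(c12 * total_words / (word_counts[w1] * word_counts[w2]))
        if pmi >= min_pmi:
            results.append(((w1, w2), pmi))
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

On a toy corpus such as `[[("一", "m"), ("本", "q"), ("书", "n")], [("看", "v"), ("书", "n")]]` repeated a few times, the measure-word pair 本+书 and the verb-object pair 看+书 survive the filter, while the numeral+measure-word pair 一+本 is never counted because its POS shape is not in `ALLOWED_PATTERNS`. A real system would swap PMI for whichever measure suits its data (log-likelihood ratio is a frequent alternative for sparse counts).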
Authors
HU Renfen (胡韧奋)
XIAO Hang (肖航)
Source
Applied Linguistics (《语言文字应用》)
CSSCI
Peking University Core Journal (北大核心)
2019, Issue 1, pp. 135-144 (10 pages)
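Funding
National Social Science Fund of China, Youth Project "Research on Intelligent Testing Technology for International Chinese Language Education" (18CYY029)
China Postdoctoral Science Foundation, General Program "Research on Intelligent Testing Technology for Chinese as a Second Language" (2018M630095)
Fundamental Research Funds for the Central Universities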
Keywords
collocation
quantitative analysis
automatic extraction
knowledge base
corpus