
POS tagging reliability on EFL learners' written data
(Original title: 学习者英语书面语料自动词性赋码的信度研究)
Cited by: 13
Abstract: POS tagging can bring “added value” to learner corpora and thus enable more in-depth studies of interlanguage. This study investigates the feasibility of using two automatic POS taggers on the written English of Chinese EFL learners. The Brill tagger and the CLAWS7 tagger were each used to tag a group of high-scoring essays and a group of low-scoring essays, and tagging accuracy was then calculated. The study aims 1) to compare how well a rule-based tagger and a probability-based tagger suit Chinese EFL learners' written language; 2) to determine whether the quality of learner essays has a significant effect on tagging accuracy; and 3) to identify the weaknesses that both types of tagger reveal when processing learner language. The results indicate that CLAWS7, a probability-based tagger, performs reliably, with an accuracy that essentially matches the level it achieves on texts by native speakers of English. The rule-based Brill tagger, by contrast, is not stable in its accuracy, which is strongly affected by the quality of learner language, and by language errors in particular. It is concluded that learner written corpora tagged with CLAWS7 can serve as reliable data for studies of the syntactic features of Chinese EFL learners' written language.
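The abstract reports tagging accuracy but does not say how it was computed. The following is a minimal sketch of one common approach, assuming accuracy is measured per token against manually verified tags. The function name `tagging_accuracy`, the generic tag labels, and the toy sentence are all hypothetical illustrations, not the study's procedure or actual output from either tagger.

```python
from typing import Sequence


def tagging_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Return the share of tokens whose predicted tag equals the verified tag.

    Assumption: both sequences are aligned token by token.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same tokens")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sample")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical toy data for the learner sentence "He go to school".
# Generic tag labels are used here; they are not CLAWS7 or Brill tags.
gold = ["PRON", "VERB", "PREP", "NOUN"]
predicted = ["PRON", "NOUN", "PREP", "NOUN"]  # "go" mis-tagged as a noun

print(tagging_accuracy(predicted, gold))  # 0.75
```

Computing this figure separately for the high-scoring and low-scoring essay groups would give the per-group accuracies that the second research question compares.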
Author: Liang Maocheng (梁茂成)
Source: Foreign Language Teaching and Research (《外语教学与研究》), 2006, No. 4, pp. 279-286 (8 pages). Indexed in CSSCI and the Peking University Core Journals list.
