Validation of the New TEM-4 Writing Analytic Rating Scale: Multi-facet Rasch Measurement
Li Qinghua, Kong Wen
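Abstract:
To examine the rating quality of the new analytic rating scale for TEM-4 writing, 18 raters independently scored 35 authentic TEM-4 essays using the new scale. We analyzed the ratings with the many-facet Rasch model from item response theory. The overall and facet-level analyses show that the new analytic scale effectively discriminates among examinees of different writing ability. Rater severity differs significantly, but inter-rater consistency and intra-rater stability both fall within acceptable ranges. The dimensions of the scale differ significantly in difficulty, and the use of score points is satisfactory overall. Some significant biases appear in the rater-by-examinee interaction and in the rater-by-dimension interaction. Overall, the ratings produced with the new scale fit the model reasonably well, and the ratings that raters assign with this scale are reliable.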
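The abstract names the many-facet Rasch model without spelling it out. The following is a minimal sketch, assuming the common three-facet rating scale formulation, in which the log-odds of receiving score category k rather than k-1 equal examinee ability minus rater severity minus dimension difficulty minus the threshold of category k. Whether the study used exactly this specification is an assumption; the function names and all numeric values are hypothetical and are not taken from the study.

```python
import math


def category_probabilities(ability, severity, difficulty, thresholds):
    """Return P(score = k) for k = 0..m under a rating scale
    many-facet Rasch model (assumed formulation, not the paper's code).

    ability    -- examinee ability in logits
    severity   -- rater severity in logits
    difficulty -- difficulty of the rating dimension in logits
    thresholds -- category thresholds F_1..F_m in logits
    """
    step = ability - severity - difficulty
    # Log-numerator of category k is the running sum of (step - F_h), h <= k;
    # category 0 has log-numerator 0 by convention.
    log_num = [0.0]
    for f in thresholds:
        log_num.append(log_num[-1] + step - f)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_num)
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Expected score given the category probabilities."""
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical values: one examinee, one dimension, a lenient and a severe rater.
lenient = category_probabilities(0.5, -0.4, 0.2, [-1.0, 0.0, 1.0])
severe = category_probabilities(0.5, 0.8, 0.2, [-1.0, 0.0, 1.0])
print(expected_score(lenient), expected_score(severe))
```

Under this formulation a more severe rater shifts probability toward the lower score categories for the same essay, which is why severity differences can be estimated and adjusted for separately from examinee ability.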
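Keywords: TEM-4 writing test; analytic rating scale; validation; many-facet Rasch model
Foundation: Part of the results of the project "Construction and Study of a Corpus for the Teaching of English-Major Writing" (No. SISU211-3-1-1-032), funded under the Shanghai International Studies University "211" Phase III Key Discipline Construction Program
Authors: Li Qinghua, Kong Wen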