Validation of a Translation Exam Based on Rasch Measurement Model (基于Rasch模型的翻译测试效度研究)
Jiang Jinlin (江进林), Wen Qiufang (文秋芳)
Abstract:
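This paper applies the many-facet Rasch model to examine the validity of an English passage translation test from three facets: examinees, raters, and rating items. The results show that: (1) examinee ability differed significantly, but some examinees responded inconsistently: 1.33% performed below their actual level, 7.34% performed above it, and 3% may not have finished translating; (2) raters differed significantly in severity, but all of them rated with good internal consistency; (3) rating items differed significantly in difficulty, and their discrimination was reasonable. Overall, the test showed good validity, but examinee performance merits further study.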
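As a rough illustration of how the three facets named in the abstract combine, the sketch below computes score-category probabilities under a rating-scale form of the many-facet Rasch model, in which the log-odds of adjacent categories equal examinee ability minus rater severity minus item difficulty minus a category threshold. The abstract does not say which formulation the authors used, so the rating-scale form, the function name `category_probabilities`, and all numeric values are assumptions made only for illustration.

```python
import math


def category_probabilities(ability, severity, difficulty, thresholds):
    """Return probabilities of score categories 0..K (hypothetical helper).

    Assumed model: log(P_k / P_{k-1}) = ability - severity - difficulty - thresholds[k-1]
    """
    # Cumulative sums of adjacent-category logits; category 0 has a sum of 0.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (ability - severity - difficulty - tau))
    # Normalize with a max shift for numerical stability.
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probabilities):
    """Expected rating given the category probabilities."""
    return sum(k * p for k, p in enumerate(probabilities))


# Illustrative values in logits; none of these come from the study.
probs = category_probabilities(
    ability=1.0, severity=0.5, difficulty=-0.2, thresholds=[-1.0, 0.0, 1.0]
)
lenient = expected_score(category_probabilities(1.0, -0.5, -0.2, [-1.0, 0.0, 1.0]))
severe = expected_score(category_probabilities(1.0, 1.5, -0.2, [-1.0, 0.0, 1.0]))
```

In this sketch a more severe rater lowers the expected score for the same examinee and item, which is the kind of rater effect the severity estimates in such an analysis are meant to capture.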
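Keywords: many-facet Rasch model; translation; validity
Funding: Supported by a Major Project of the Key Research Institutes of Humanities and Social Sciences, Ministry of Education, "Development of an Automated Scoring System for Subjective Items (English-Chinese and Chinese-English Translation) in Large-Scale Examinations" (Grant No. 07JJD740070)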