Quality Control for Ratings in a Performance Test
Liu Jianda, Yang Manzhen
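Abstract:
This paper discusses a core concern in performance testing: rating quality. Raters are often influenced by their own preferences, habits, and expectations, and these biases tend to produce rating errors that lower rating quality. Rating is a complex, error-prone cognitive activity, and rating error arises mainly from three sources: the rater, the rating process, and the rating criteria. Taking a writing test as an example, this study shows how the many-facet Rasch model can be used to monitor rating quality. It focuses on four questions: whether each rater's severity stays consistent over time, whether raters show bias, whether raters apply the rating criteria consistently and effectively, and whether raters discriminate well among examinees of differing ability.
Keywords: rating quality; many-facet Rasch model; performance test; reliability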