Reducing Rater Bias in Scoring Performance Assessments

Presenter(s):

Robert Johnson, University of South Carolina, rjohnson@mailbox.sc.edu
Min Zhu, University of South Carolina, helen970114@gmail.com
Brandon Loudermilk, University of South Carolina, loudermb@mailbox.sc.edu
Xiaofang Jae, University of South Carolina, jae2008@gmail.com
Ashlee Lewis, University of South Carolina, lewisaa2@mailbox.sc.edu

Abstract:
This study examines the use of visual representations of scoring bias in training raters to score arts assessments. Initial evidence is mixed as to whether the quality of raters' scores improves when training incorporates visual representations of bias rather than verbal descriptions alone. Two of the three treatment groups (i.e., raters trained with visual representations of bias) agreed more closely with validation scores than did the controls, whereas two of the control groups showed higher interrater reliability than the treatment groups. The potential utility of visual representations is suggested by the finding that a majority of raters in the treatment groups recalled more of the bias types that were presented visually than of those presented verbally.
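
The abstract evaluates raters on two criteria: agreement with validation scores and interrater reliability. As a minimal illustration only (the study's actual indices are not stated in the abstract, so the function names and the choice of exact agreement and Pearson correlation are assumptions), these criteria might be computed as follows:

```python
import numpy as np

def exact_agreement_rate(rater_scores, validation_scores):
    # Proportion of a rater's scores that exactly match the
    # validation (expert-assigned) scores for the same responses.
    rater = np.asarray(rater_scores)
    valid = np.asarray(validation_scores)
    return float((rater == valid).mean())

def interrater_correlation(scores_a, scores_b):
    # Pearson correlation between two raters scoring the same
    # responses, one simple index of interrater reliability.
    return float(np.corrcoef(scores_a, scores_b)[0, 1])
```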

Bootstrap Reliability

Presenter(s):

Cristian Gugiu, Western Michigan University, crisgugiu@yahoo.com

Abstract:
All measures are imperfect: no test or survey can measure with perfect accuracy or precision. Over the past century, psychometricians have developed a multitude of reliability measures to provide researchers with a simple index that quantifies the reliability of a test. Undoubtedly, the most popular index continues to be coefficient (Cronbach's) alpha. However, this estimator is plagued by a number of issues: it (a) provides only a lower-bound estimate of the true reliability, (b) is interpretable only if the test is unidimensional, (c) has no lower bound (i.e., negative estimates are possible), and (d) is appropriate only for continuous data. This paper introduces a new method for estimating the reliability of a test based on the bootstrap. The bootstrap reliability method is simple to understand and versatile enough to work with continuous, ordinal, and nonlinear data.
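
The paper's specific estimator is not detailed in the abstract, so the following is only a generic sketch of the bootstrap idea applied to reliability, using Cronbach's alpha as a stand-in statistic; the function names, the percentile interval, and the resampling-over-examinees design are assumptions, not the author's method:

```python
import numpy as np

def cronbach_alpha(scores):
    # scores: (n_persons, n_items) matrix of item scores.
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

def bootstrap_reliability(scores, n_boot=2000, seed=0):
    # Resample examinees with replacement and recompute the
    # reliability statistic on each bootstrap sample, yielding a
    # point estimate and a 95% percentile confidence interval.
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    estimates = np.array([
        cronbach_alpha(scores[rng.integers(0, n, size=n)])
        for _ in range(n_boot)
    ])
    low, high = np.percentile(estimates, [2.5, 97.5])
    return estimates.mean(), (low, high)
```

Resampling rows (examinees) rather than items preserves the item structure of the test while letting the sampling variability of the reliability estimate be assessed empirically.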

Employing Generalizability Theory to Assess the Reliability of Peer Evaluation

Presenter(s):

Mihaiela Gugiu, Central Michigan University, gugiu1mr@cmich.edu

Abstract:
Since the early 1990s, a growing movement in education has advocated for the implementation of peer evaluation in teaching. Naturally, this movement has not gone unchallenged by those who question the reliability and validity of peer evaluation. The purpose of this paper is to demonstrate the applicability of generalizability theory to estimating the reliability and validity of student peer evaluation. I draw on my experience teaching introductory courses in political behavior, where group projects were used as a means of enhancing student knowledge. College students were asked to evaluate the group presentations of their peers and the written papers of two other groups. Generalizability theory was used to examine the reliability of peer evaluations of the written group projects and oral presentations. Moreover, the validity of the peer evaluations was examined by comparing the grades produced by this method with those assigned by the instructor and a graduate teaching assistant.
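
As a rough illustration of the kind of analysis involved (the author's actual design and software are not stated, so everything below, including the one-facet crossed persons-by-raters layout and the function names, is an assumed minimal sketch), variance components for a G study can be estimated from the two-way ANOVA mean squares:

```python
import numpy as np

def one_facet_g_study(scores):
    # scores: (n_persons, n_raters) matrix from a fully crossed
    # persons x raters design.
    scores = np.asarray(scores, dtype=float)
    n_p, n_r = scores.shape
    grand = scores.mean()
    # Sums of squares from the two-way ANOVA decomposition.
    ss_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_r = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_pr = ((scores - grand) ** 2).sum() - ss_p - ss_r
    # Mean squares.
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))
    # Expected-mean-square solutions (negative estimates set to zero).
    var_pr = ms_pr                          # interaction + error
    var_p = max((ms_p - ms_pr) / n_r, 0.0)  # universe-score variance
    var_r = max((ms_r - ms_pr) / n_p, 0.0)  # rater main effect
    return var_p, var_r, var_pr

def relative_g_coefficient(var_p, var_pr, n_raters):
    # Generalizability coefficient for relative decisions based on
    # the average of n_raters raters.
    return var_p / (var_p + var_pr / n_raters)
```

For example, relative_g_coefficient(var_p, var_pr, n_raters=2) would estimate the dependability of averaging the scores of two peer raters.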