Session Title: Measurement to Improve Precision and Validity of Evaluation Outcomes
Multipaper Session 403 to be held in Baltimore Theater on Thursday, November 8, 1:55 PM to 3:25 PM
Sponsored by the Quantitative Methods: Theory and Design TIG
Chair(s):
Ya-Fen Chan, Chestnut Health Systems, ychan@chestnut.org
Discussant(s):
Barth Riley, University of Illinois, Chicago, bbriley@chestnut.org
Abstract: This panel presents a variety of applications of measurement, including classical test theory and the Rasch measurement model, a type of item response theory that provides linear, interval measures of psychological constructs. The end product will be measures that are more reliable, more valid, and more convenient to use.
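For reference, the dichotomous Rasch model expresses the probability that person n endorses item i as a function of the difference between person ability \theta_n and item difficulty b_i on a common logit scale:

P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}

Because abilities and difficulties are estimated in logits, differences between them are interval-level quantities, which is the basis for describing the resulting scores as linear, interval measures.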
Measurement Equivalence: Validity Implications for Program Evaluation
Susan Hutchinson, University of Northern Colorado, susan.hutchinson@unco.edu
Understanding differences between groups is a fundamental aim of most evaluation research. Such differences might be based on extant demographic characteristics, program participation status, or exposure to some type of intervention. However, between-group comparisons are only valid to the extent that the underlying constructs on which the groups are being compared demonstrate measurement equivalence. Without evidence of equivalent measurement, a researcher cannot unambiguously determine whether differences are truly due to the group characteristics of interest or are merely an artifact of differential measurement across groups. While measurement equivalence has received considerable attention across many disciplines, it has been largely ignored within the context of evaluation research. Therefore, the purpose of this presentation is to define measurement equivalence from the perspective of validity generalization, describe analytical methods for assessing measurement equivalence, and discuss implications for evaluation research.
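One widely used analytical method of this kind (offered here purely as an illustration, not necessarily one of the methods the presenter will cover) is the Mantel-Haenszel procedure, which tests whether a dichotomous item behaves the same way in two groups after matching respondents on overall standing. A minimal sketch in Python, assuming 0/1 item responses, a 0/1 group indicator, and a total score used for matching (all names are illustrative):

import numpy as np
from scipy.stats import chi2

def mantel_haenszel_dif(item, group, matching_score):
    """Mantel-Haenszel DIF check for one dichotomous item.

    item           : 0/1 responses to the studied item
    group          : 0 = reference group, 1 = focal group
    matching_score : total score (or other matching variable) used to
                     stratify respondents of comparable overall standing
    Returns the MH chi-square, its p-value, and the common odds ratio.
    """
    item = np.asarray(item)
    group = np.asarray(group)
    score = np.asarray(matching_score)

    sum_a, sum_ea, sum_var = 0.0, 0.0, 0.0
    num_or, den_or = 0.0, 0.0

    for k in np.unique(score):
        in_k = score == k
        ref, foc = in_k & (group == 0), in_k & (group == 1)
        a = np.sum(item[ref] == 1)      # reference group, endorsed
        b = np.sum(item[ref] == 0)      # reference group, not endorsed
        c = np.sum(item[foc] == 1)      # focal group, endorsed
        d = np.sum(item[foc] == 0)      # focal group, not endorsed
        n = a + b + c + d
        if n < 2 or (a + c) == 0 or (b + d) == 0:
            continue                    # stratum carries no information
        sum_a += a
        sum_ea += (a + b) * (a + c) / n
        sum_var += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
        num_or += a * d / n
        den_or += b * c / n

    chi_sq = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var
    p_value = chi2.sf(chi_sq, df=1)
    odds_ratio = num_or / den_or
    return chi_sq, p_value, odds_ratio

A significant chi-square with a common odds ratio far from 1.0 suggests that the item functions differently across groups even for respondents with comparable total scores, that is, a lack of measurement equivalence for that item.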
Assessing Outcomes Across Time: Testing Measurement Assumptions
Ann Doucette, George Washington University, doucette@gwu.edu
There are several assumptions in longitudinal repeated-measures designs. We presume that the measures we use are invariant across time, tapping the same latent construct at each assessment, and that the measures are continuous. Some of the constructs assessed may be highly skewed, as would be the case with severe depression and suicidality, where the probability of positive responses is likely to be low. This presentation builds on secondary analyses of longitudinal data from a behavioral healthcare outcomes study and illustrates the use and advantages of a multilevel Rasch model in which items are nested within persons and persons are nested within therapists. In addition, the multilevel Rasch model allows us to calibrate item change on a true interval basis, an assumption of Likert-type scaling that is seldom realized. Results indicate that a change score of 10 points may mean something markedly different depending on where along the construct continuum the change occurs.
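One schematic way to write such a multilevel Rasch model (a sketch of the general form, not necessarily the exact specification used in the study) is

\operatorname{logit} P(X_{ip} = 1) = \theta_p - \delta_i, \qquad \theta_p = \gamma_{00} + u_{j[p]} + r_p,

where \delta_i is the difficulty of item i, \theta_p is the measure for person p, u_{j[p]} \sim N(0, \tau^2) is the random effect for the therapist treating person p, and r_p \sim N(0, \sigma^2) is the person-level residual. Because every term is expressed in logits, change estimated from the model is on an interval scale rather than in raw Likert sum-score units.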
I Think the People Changed, or Was It the Test?
Kendon Conrad, University of Illinois, Chicago, kjconrad@uic.edu
Barth Riley, University of Illinois, Chicago, bbriley@chestnut.org
Ya-Fen Chan, Chestnut Health Systems, ychan@chestnut.org
Michael Dennis, Chestnut Health Systems, mdennis@chestnut.org
If the items on a measure change in difficulty from pretest to posttest, it is problematic to infer that the people changed; we could be observing item change rather than person change. This presentation will examine whether there is change in item calibrations on the 16-item Substance Problem Scale of the Global Appraisal of Individual Need (Dennis et al., 2003) over five time points, i.e., baseline and four three-month posttests. Once the magnitude of item change has been estimated using Rasch differential item functioning analysis in Winsteps software (Linacre, 2006), the measures will be adjusted to correct for item change over time using Facets software (Linacre, 2006). Statistical significance and clinical effect size criteria will be used to estimate the magnitude of differences between adjusted and unadjusted posttest measures. The implications for improved outcome evaluations will be discussed.
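Outside of the Winsteps and Facets analyses named above, the underlying logic can be sketched in a few lines of Python: estimate item difficulties separately at baseline and at a posttest, flag items whose calibration drifts by more than a chosen amount, and hold flagged items to their baseline (anchor) values before re-estimating person measures. The variable names and the 0.5-logit drift threshold below are illustrative assumptions, not the session's actual criteria.

import numpy as np

LOGIT_DRIFT_THRESHOLD = 0.5  # illustrative cutoff for flagging item drift

def flag_drifting_items(baseline_difficulties, posttest_difficulties,
                        threshold=LOGIT_DRIFT_THRESHOLD):
    """Return indices of items whose calibration shifted by more than `threshold` logits, plus the drift values."""
    drift = np.asarray(posttest_difficulties) - np.asarray(baseline_difficulties)
    return np.flatnonzero(np.abs(drift) > threshold), drift

def anchored_calibrations(baseline_difficulties, posttest_difficulties, flagged):
    """Replace drifting posttest calibrations with their baseline (anchor) values."""
    anchored = np.asarray(posttest_difficulties, dtype=float).copy()
    anchored[flagged] = np.asarray(baseline_difficulties)[flagged]
    return anchored

# Example with hypothetical calibrations (in logits) for a 16-item scale
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=16)
posttest = baseline + rng.normal(0.0, 0.3, size=16)
flagged, drift = flag_drifting_items(baseline, posttest)
print("Items flagged for drift:", flagged, "drift (logits):", np.round(drift[flagged], 2))
print("Anchored posttest calibrations:", np.round(anchored_calibrations(baseline, posttest, flagged), 2))

Comparing person measures computed with and without the anchoring step is one simple way to gauge how much apparent person change is actually attributable to item drift, which is the contrast between adjusted and unadjusted posttests described above.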