Session Title: What Constitutes Quality Evaluation of Development and Social Change: Values, Standards, Tradeoffs, and Consequences
Panel Session 202 to be held in Lone Star A on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Presidential Strand
Chair(s):
Indran Naidoo, Office of the South African Public Service Commission, indrann@opsc.gov.za
Discussant(s):
Patricia Rogers, Royal Melbourne Institute of Technology, patricia.rogers@rmit.edu.au
Abstract: How we view the processes of social change and development and what we consider quality evaluation of interventions to achieve these are inextricably linked. There is a risk that efforts to improve the quality of development evaluations will only support the evaluation of standardized, simple interventions. This may divert attention and resources from innovations that are more complex, i.e., emergent and unpredictable, and therefore inherently risky, yet critical if development is to be sustained and respond to urgent issues that require innovative responses. We need to move beyond the ongoing paradigmatic ‘tug-of-war’ about methodology to a deeper understanding of how change happens, better ways of evaluating ‘the complex’ without compromising quality standards, and a better understanding of the ways in which evaluation itself has an impact on the processes of social change. This issue has implications for other interventions that seek to bring about sustainable structural change.
Impact Evaluation: Serving Development?
Zenda Ofir, Independent Consultant, zenda@evalnet.co.za
Recent studies note that development thinking makes use of too narrow a range of approaches to change, limiting creativity and contributing to insufficient interrogation of how change happens in complex development environments. At the same time the ongoing impact evaluation debates tend to focus on methodology rather than on the influences, values and standards that shape development and how its effectiveness is assessed. This paper highlights the intensifying struggle to make space for a diversity of impact evaluation approaches that engage more deeply with how change is effected in challenging development contexts. Experiences during the process of establishing impact evaluation guidelines by the Network of Networks on Impact Evaluation (NONIE) reinforce arguments for dissecting our underlying assumptions when making choices, and support calls for reform of the development evaluation system.
The Social Transformation of Evaluation?
Thomas Schwandt, University of Illinois at Urbana-Champaign, tschwand@illinois.edu
There is a growing literature critical of extant models of evaluation, and social science more generally, for their inability to successfully contribute to our understanding, appraisal, and capacity to solve the kinds of complex and dynamic social problems and desired social transformations that characterize the arenas in which evaluation (and social science) ought to have its greatest foothold (e.g. health-related fields, development, education). This paper first briefly rehearses this criticism and then points to the kinds of alternatives to 'standard' ways of thinking of social science and evaluation that are being proposed including systems thinking and complexity science; implementation science; practice-based evidence; calls for 'public social science'; and proposals to revisit the theory-practice gap. The paper then turns to the question of whether different criteria might be needed to judge these new ways of practicing evaluation and social science.
Values, Standards and Tradeoffs in the Evaluation of (Complex) Change Processes
Irene Guijt, Learning by Design, iguijt@learningbydesign
While some discussions continue to centre on methodological superiority, with (quasi) experimental methods claiming the ‘rigor’ high ground, the need to understand how to assess ‘the complex’ has been fostering methodological innovations that capture change and performance from a broader perspective. All methodologies have their use, and all choices imply privileging some values over others. We therefore need to understand what values and quality standards are being upheld when making these choices. Tradeoffs are inevitable. Thinking through the consequences of choice is crucial. Choices determine what is considered valid to see, define what is valued, and shape future thinking on development policies, processes and priorities, and eventually how development is understood. This paper will summarize the main lines of argument from recent conferences and highlight practical examples that reconcile an understanding of complex societal change processes with quality standards and rigor, ethical concerns, appropriateness and feasibility.

Session Title: Implementation From the Ground Up: Defining, Promoting, and Sustaining Fidelity at All Levels of a State Program
Panel Session 203 to be held in Lone Star B on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Program Theory and Theory-driven Evaluation TIG
Chair(s):
Elizabeth Oyer, EvalSolutions Inc, eoyer@evalsolutions.com
Abstract: Fidelity is the extent to which the intervention, as realized, is “faithful” to the pre-stated model. Measuring implementation fidelity provides data for understanding the overall impact of the program. Presenters will discuss the issues around developing a state framework for evaluating the Illinois Mathematics and Science Program (IMSP) and the policies and resources that are needed to sustain and scale up the initiative. Site evaluators will discuss tools and analyses for formative and summative evaluation of progress toward state goals for the IMSP, which employs a comprehensive site visit protocol to create profiles of the local grants and develop themes across grants for understanding implementation of the program. Evaluators will discuss the tools for the site visit as well as the results from the 2007-2008 and 2008-2009 program evaluations. Finally, the George Williams College of Aurora University MSP project evaluator will discuss the local evaluation of implementation for the IMSP.
Understanding State-level Program Impact: Leveraging State Policies and Resources for Effective Implementation
Elizabeth Oyer, EvalSolutions Inc, eoyer@evalsolutions.com
Marica Cullen, Illinois State Board of Education, mcullen@isbe.net
Gilbert Downey, Illinois State Board of Education, gdowney@isbe.net
At the state program level, measuring implementation fidelity provides data for understanding the overall impact of the program as well as for developing a framework for planning the policies and resources that are needed to sustain and scale up the initiative. Measuring implementation fidelity at the state level requires sensitivity to local evaluations as well as attention to the core elements of adherence to the broad guidelines of the state program. The Illinois Mathematics and Science Program (IMSP) Implementation Evaluation framework balances the needs of the state with local program implementation needs. Multiple data sources at both the local project and state levels provide rich material for understanding the influence of adherence to implementation on synthesized outcomes. Panelists will discuss tools and analyses for formative and summative evaluation of progress toward state goals for the IMSP as well as how these analyses have shaped state policies for the program.
Understanding the Forest by Examining the Trees: Creating Profiles of Local Grantees to Develop Themes in Implementation of a State Mathematics and Science Partnership Program
Tania Jarosewich, Censeo Group, tania@censeogroup.com
Debra Greaney, Area V Learning Technology Center, dgreaney@lth5.k12.il.us
The Illinois Mathematics and Science Partnership program evaluation employs a comprehensive site visit protocol to collect data about the degree to which program components are delivered as prescribed; exposure, or the amount of program content received by participants; and the quality of program delivery in terms of the theoretical base of processes and content, participants’ responsiveness, and unique features of the program that distinguish it from other programs. Based on the data collected in the visits, the team creates profiles of the local grants and describes themes across grants to contribute to understanding the implementation of the program across the state. The two members of the evaluation team who visit the project sites to collect the implementation data will discuss the tools for the site visits, the challenges of collecting cross-site data, and the benefits of in-depth site visits, and will summarize key results from the 2007-2008 and 2008-2009 program evaluations.
Measuring Implementation and Building Adherence: Assessing Fidelity and Improving Understanding in an Illinois Mathematics and Science Program
James Salzman, Ohio University, salzman@ohio.edu
At the local level, implementation fidelity must be measured comprehensively to align with local project objectives and provide a foundation for measuring progress. There are many data collection considerations for establishing a plan for monitoring and assessing fidelity. The evaluator must identify measures for necessary preconditions and align all measures with current evaluation data sources for efficiency and to eliminate overlap. Finally, mixed methods using multiple data sources are needed to triangulate evidence of implementation. The George Williams College of Aurora University MSP project evaluator will discuss the local evaluation of implementation for the IMSP. The Framework for Teaching (Danielson, 1996) has been modified to identify specific classroom practices commensurate with the inquiry methods being taught. The protocol incorporates training for observers, including facilitated conversations about how teaching episodes aligned with the levels of a rubric, which is also used as a self-reflection tool that allows participants to better understand implementation of the specific strategies in classrooms.

Session Title: The Essential Features of Collaborative, Participatory, and Empowerment Evaluation
Multipaper Session 204 to be held in Lone Star C on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Collaborative, Participatory & Empowerment Evaluation TIG
Chair(s):
Abraham Wandersman,  University of South Carolina, wandersman@sc.edu
Collaborative Evaluation Essentials: Highlighting the Essential Features of Collaborative Evaluation
Presenter(s):
Liliana Rodríguez-Campos, University of South Florida, liliana@usf.edu
Rita O'Sullivan, University of North Carolina at Chapel-Hill, ritao@email.unc.edu
Abstract: Collaborative evaluators are in charge of the evaluation, but they create an on-going engagement between evaluators and program staff, resulting in stronger evaluation designs, enhanced data collection and analysis, and results that stakeholders understand and use.

Session Title: Nonprofit Rating Systems and Implications for Evaluation
Think Tank Session 205 to be held in Lone Star D on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Non-profit and Foundations Evaluation TIG
Presenter(s):
Debra Natenshon, The Center for What Works, debra@whatworks.org
Johanna Morariu, Innovation Network, jmorariu@innonet.org
Abstract: In the past few years efforts to use common measures to assess and compare nonprofit performance seem to have multiplied. Interest in comparing nonprofit performance is in a dramatic upswing, and new/different sets of common measures seem to emerge frequently. Some sets of measures have been developed for niche fields, while others seek to compare across the entire sector. As evaluators, we should be aware of these efforts and aware of their possible implications. This think tank seeks to explore a number of questions related to the topic of nonprofit rating systems and common measures, e.g., Is it possible to develop meaningful common measures for a field as diverse as the nonprofit sector? What can we learn from the experiences of fairly well-known, sector-wide approaches such as Charity Navigator, GreatNonprofits, etc.? Considering what we know about existing approaches, what is the effect on traditional program evaluation?

Session Title: Comparative Effectiveness Research in Program Evaluation
Panel Session 206 to be held in Lone Star E on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Quantitative Methods: Theory and Design TIG
Chair(s):
James Michael Menke, University of Arizona, menke@email.arizona.edu
Abstract: Interest in comparative effectiveness research (CER) is increasing rapidly. Although the main focus of interest is in medicine, pressure toward, perhaps even demand for, CER in other areas will almost certainly follow, probably led by health services research and quickly followed by education and then other policy areas. CER poses particular problems in all research, and program evaluation will be no exception. It will eventually not be sufficient simply to conclude that some intervention has positive effects; it will be required that those effects be shown to be as good as or better than those of alternative interventions or even alternative policy strategies. CER poses special challenges with respect to conceptual and design issues and for appropriate statistical analysis and interpretation of findings. Some of these challenges are described, along with proposed solutions, and illustrations of their applications are presented.
Epistemological and Methodological Issues in Comparative Effectiveness Research
Lee Sechrest, University of Arizona, sechrest@email.arizona.edu
Comparing the effectiveness of two different methods of intervention is often not as simple as it might seem. A first question, related to the common distinction between efficacy and effectiveness research, is whether one intends to compare the interventions at some maximum, ideal level or as they might be expected to occur under ordinary conditions. A second issue sometimes arises when interventions are to be compared that may be differentially effective depending on characteristics of the population(s) in which they are tested. A critical problem, both theoretically and practically, is how we may know whether the interventions are administered at equivalent strengths (doses). If interventions “take hold” in different time frames or their effects become apparent in different ways, direct comparisons may be jeopardized. These and other issues need careful consideration and explication, for if they are overlooked, comparisons may not be legitimate and, when made, may be misleading.
Comparisons and Second Order Comparisons of Comparisons
James Michael Menke, University of Arizona, menke@email.arizona.edu
Direct statistical comparisons of different interventions are not always easy, particularly if the effects of the interventions are not thought to be directly comparable, i.e., the interventions have somewhat different effects. Differential attrition rates may also jeopardize direct comparisons. These and other statistical problems need to be considered and dealt with. Statistical problems are even more acute when direct comparisons are either not possible or do not suffice, and it is necessary to make comparisons of two interventions by inferences from their effects relative to their own comparison groups, e.g., inferences must be made by comparing differences between each intervention and its own separate comparison group. Methods for facilitating such indirect comparisons can be described even though they have not yet often been used.
Comparative Effectiveness in Educational Settings
Katherine McKnight, Pearson Corporation, kathy.mcknight@gmail.com
Educational interventions are common, but their direct head-to-head comparison is not so common. Most interventions are evaluated by comparisons to common or standard practices. Nonetheless, there are instructive instances of comparative effectiveness of interventions in education. Some of these have been implicit, comparing special educational arrangements with ongoing “regular” education, the Coleman report being a prime example. Other interventions have been compared more directly, sometimes by dint of coincidence, with two or more interventions happening to occur in the same time frame or the same social/situational frame. And still other, more recent efforts have made direct and deliberate comparisons. The different conceptual, methodological, and statistical issues involved in these efforts are illustrated, and the lessons are instructive.
Comparing Tobacco Control Interventions
Frederic Malter, University of Arizona, fmalter@email.arizona.edu
Interventions aimed at reducing tobacco use, particularly cigarette smoking, have been numerous (an understatement). Relatively infrequent, however, have been attempts to make direct comparisons between different methods of intervention. Hence, comparative effectiveness research in this important social area must rely heavily on inferences involving populations, equivalence of interventions within classes, legitimacy of comparisons between classes, and statistical analyses resting on sometimes dubious assumptions. Nonetheless, the importance and the difficulty of the problem require making the best that one can of the data that do exist and of the comparisons that can be made, even if simulated. Data on the variability of results of putatively similar intervention programs are helpful in developing expectations (norms) against which new interventions may be judged. Statistical models may also be useful in identifying interventions that are unusually effective (or ineffective). These are not easy solutions but “Nature is never embarrassed by difficulties in analysis.”

Session Title: Serving Two Masters: Local Evaluators Trying to Maintain Evaluation Quality and Use While Participating in a National Multi-site Evaluation
Panel Session 207 to be held in Lone Star F on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Cluster, Multi-site and Multi-level Evaluation TIG
Chair(s):
Tom Kelly, Annie E Casey Foundation, tkelly@aecf.org
Discussant(s):
Mary Achatz, Westat, maryachatz@westat.com
Abstract: Multi-site initiatives are complex programs to implement, and evaluating them can be an even more complex undertaking. Making Connections is a community change initiative (CCI) in 10 urban neighborhoods across the U.S. A key component of its national cross-site evaluation has been the implementation of local evaluations that are relevant to and integrated in the neighborhood work on the ground. These local evaluations have been responsible not only for collecting data on the implementation and outcomes of the initiative but also for building community capacity to understand and use data for learning and accountability, while also contributing to the cross-site evaluation. The local evaluators have had to navigate the multiple challenges, demands, and differential capacities of local evaluation clients, the national funder, and the cross-site evaluators. This panel will identify lessons learned in strengthening local evaluation quality and relevance.
Two Bosses, Two Clients, Two Timetables: Local Evaluators’ Challenges in Meeting Expectations of Community Partners and National Funder and Evaluators
Sue Tripathi, Making Connections, Denver, sue.tripathi@unitedwaydenver.org
Throughout the implementation of Making Connections, there has been a tension between the needs and theories of the local community and organizational partners and the interests of the national funder. In addition, the local evaluation teams have had to address these tensions in order to develop working partnerships with community and national evaluators, build local evaluation capacity concurrent with the needs of both local and national evaluations, and participate in cross-site evaluation data collection and analyses. This has required much skill and art from local evaluators across the 10 cities, and balancing local relevance and utility against cross-site uniformity and comparability has been a constant challenge. The experiences and lessons learned together by the local and national evaluators contribute much insight into establishing and maintaining evaluation quality, rigor, and relevance.
Building Local Evaluation Capacity of Coalition Partners
Sebastian Schreiner, City of San Antonio, sebastian.schreiner@sanantonio.gov
Creating and maintaining evaluation quality in coalition and community initiative settings brings its own specific set of necessary upfront considerations, preparatory work, and execution challenges. These challenges will be explored based on the evaluation capacity building efforts and experiences of Making Connections, both from the perspective of individual sites and from that of the cross-site evaluation. This local evaluation capacity cannot be built within only a few individuals or organizations; it must extend across the community and coalition as a whole. Building self-evaluation, learning, and accountability capacity and infrastructure among and inside organizations, systems, and the community is a necessary task of the evaluation. But it faces challenges of data quality in multi-partner/multi-site settings, data access and confidentiality, continuously evolving local goals, differences among community partners in levels of sophistication and tolerance for data topics, and differences in evaluation philosophy among partners that must be resolved.
Defining and Measuring Community Engagement at Local and National Levels
Tanja Kubas-Meyer, Independent Consultant, tkubasmeyer@cox.net
Community and family engagement within social service programs and community change initiatives is a critical issue, including community participation in evaluation. The definition and vision of engagement varies widely across a spectrum, however, from residents as program participants and/or informants to residents mobilized for political action. The Providence Making Connections initiative is working with multiple partners and at least three evaluative frames, including the national evaluation, to identify a shared vision and set of common or complementary metrics for family engagement in its work to have "children healthy and prepared to succeed in school." The presentation will focus on the learning and challenges in this work and how the evaluation dealt with different definitions and frames of resident and family engagement used by local and national evaluations.

In a 90-minute Roundtable session, the first rotation uses the first 45 minutes and the second rotation uses the last 45 minutes.
Roundtable Rotation I: Truth, Beauty, And Justice For All: A Conversation With Graduate Students Examining Issues of Power, Control, and Evaluation Quality Within the Realm of Graduate Assistantships
Roundtable Presentation 208 to be held in MISSION A on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Graduate Student and New Evaluator TIG
Presenter(s):
Ayesha Boyce, University of Illinois at Urbana-Champaign, boyce3@illinois.edu
Maria Jimenez, University of Illinois at Urbana-Champaign, mjimene2@illinois.edu
Jeehae Ahn, University of Illinois at Urbana-Champaign, jahn1@illinois.edu
Holly Downs, University of Illinois at Urbana-Champaign, hadowns@illinois.edu
Abstract: Graduate students are often limited in their capacity to create effective change due to role limitations and boundaries. Thus, how do graduate students conduct high quality evaluations within the power constraints of their assistantships? This roundtable seeks to begin a conversation starting with four evaluation graduate students of different ethnicities (Black, Latina, Asian, and White), school statuses, evaluation experiences, and perspectives on the matter. With the understanding that most graduate students lack power and control over the design and implementation of evaluations, we invite other graduate students, novice evaluators, and experts to join the conversation. The roundtable participants will attempt to clarify how to navigate various graduate student roles and values, all while being responsive to stakeholder and audience needs in order to conduct evaluations of high quality.
Roundtable Rotation II: The Role of Evaluation and Research Support in Ensuring Evaluation Quality
Roundtable Presentation 208 to be held in MISSION A on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Graduate Student and New Evaluator TIG
Presenter(s):
Laricia Longworth-Reed, University of Denver, laricia.longworth-reed@du.edu
Kathryn Schroeder, University of Denver, kathryn.schroeder@du.edu
Anna de Guzman, University of Denver, anna.deguzman@du.edu
Abstract: The impact of research and evaluation assistants on evaluation quality is an important topic for the evaluation field. The purpose of the current presentation is to explore how research and evaluation assistants contribute to quality evaluation through their skill sets, to discuss how skills can be developed to mutually benefit assistants and evaluators, to explore the role of research and evaluation assistants as new evaluators, and to further explore the roles of support staff in guaranteeing quality evaluation.

Session Title: Evaluating the Intervention: A Look at Clinical Treatments and Client Implications
Multipaper Session 209 to be held in MISSION B on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Social Work TIG
Chair(s):
Derrick Gervin,  The Evaluation Group, derrick@evaluationgroup.com
Jenny Jones,  Virginia Commonwealth University, jljones@vcu.edu
Community Collaboration for Children in Kentucky
Presenter(s):
Ramona Stone, University of Louisville, ramona.stone@louisville.edu
Gerard Barber, University of Louisville, rod.barber@louisville.edu
Ruth Huebner, Kentucky Cabinet for Health and Family Services, rutha.huebner@ky.gov
Audrey Brock, Kentucky Department for Community Based Services, audrey.brock@ky.gov
Abstract: This is an evaluation of a secondary prevention program for families with children at risk for abuse and/or neglect. We report on the number and characteristics of participants in the intensive in-home services, the amount and intensity of services provided, and the outcomes of the program. The data were collected quarterly from the CCC agencies located across the state of Kentucky between July 1, 2006 and June 30, 2010. Data items include information on each member of the household. The outcomes are measured with 1) the North Carolina Family Assessment Scale (NCFAS) and 2) new referrals for abuse/neglect after graduating from the CCC program. We report descriptive statistics and propose a multivariate longitudinal model to explain the variation in the NCFAS outcomes.
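As a rough illustration of the kind of multivariate longitudinal model proposed above, the sketch below fits a linear mixed model with random family intercepts to synthetic quarterly NCFAS scores. All variable names, effect sizes, and data are hypothetical; this is not the authors' data or specification.

```python
# Minimal sketch of a longitudinal model for quarterly NCFAS outcomes:
# random-intercept linear mixed model on synthetic data only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
rows = []
for fam in range(400):                          # 400 hypothetical families
    fam_effect = rng.normal(scale=0.5)          # family-level random intercept
    intensity = rng.uniform(1, 10)              # hypothetical weekly service hours
    for quarter in range(4):                    # four quarterly assessments
        ncfas = 0.10 * quarter + 0.05 * intensity + fam_effect + rng.normal(scale=0.3)
        rows.append({"family_id": fam, "quarter": quarter,
                     "service_intensity": intensity, "ncfas": ncfas})
df = pd.DataFrame(rows)

# The random intercept for each family accounts for repeated quarterly measures.
model = smf.mixedlm("ncfas ~ quarter + service_intensity", df, groups="family_id")
print(model.fit().summary())
```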
Evaluating the Implementation of a Trauma-Informed and Human Rights Curriculum in a School of Social Work
Presenter(s):
Thomas H Nochajski, State University of New York at Buffalo, thn@buffalo.edu
Bincy Wilson, State University of New York at Buffalo, bincywil@buffalo.edu
Abstract: As part of the re-accreditation process for the Council on Social Work Education (CSWE), the University at Buffalo School of Social Work has undertaken an alternative project. The alternative project focuses on the development and evaluation of infusing the school’s curriculum with a trauma-informed and human rights perspective (TI-HR). This approach fits well in the Buffalo-Niagara Region given the pressing social issues and needs stemming from the region’s high poverty rate and slow economic growth. The evaluation procedure for the curriculum implementation utilizes a mixed-methods action research orientation to help build the necessary collaborative process with the community, staff, students, and faculty. Within the context of the action research approach, information from the literature, focus groups, and interviews was used to build and refine student assessments. Additionally, staff from various community organizations have assisted with the refinement of a survey on the application of TI-HR for organizations.
Exploring Behavioral Health Perceptions of U.S. Army Chaplains: Incorporating Quality From Conceptualization to Conclusion
Presenter(s):
Kimberly Farris, United States Army, kimberlydfarris@gmail.com
Abstract: The purpose of this project is to explore Army Chaplains’ perceptions of behavioral health issues, including severity and causal attribution of a presented issue, along with the probable course of action if approached for assistance. Examining the Chaplains’ beliefs about causation, such as biological, environmental, psychological, social, or other causes, may provide insight into their attitudes about Soldiers with serious behavioral health issues and the potential effects on their decision-making processes. The bio-psycho-social-spiritual model is the conceptual model used because of its addition of the spirituality concept. This area warrants further investigation within the military, not only given the rise in suicidal behaviors and unrecognized behavioral health issues but also because of the increased acknowledgment of the importance of spirituality in individuals’ lives. While this project is currently in the implementation phase, the presentation will highlight the significance of operationalizing evaluation quality from conceptualization.
Integrating Realist Evaluation in Social Work Practice
Presenter(s):
Mansoor Kazi, State University of New York at Buffalo, mkazi@buffalo.edu
Abstract: This paper presents a new approach to evidence-based practice based on realist evaluation, with the central aim of investigating what interventions work and in what circumstances. As the research designs unfold naturally, data analysis methods are applied to investigate the patterns between client-specific factors, intervention variables, and outcomes. This analysis can be repeated at regular intervals and helps social workers to better target their interventions and to develop new strategies for users in the circumstances where the interventions are less successful. The paper will include examples from the UK and USA of how services that require the repeated use of a reliable outcome measure and the regular updating of information in an electronic database can address the twin problems of applying evidence-based practice and evaluating practice to investigate what works and in what contexts, providing regular analyses to inform practice as it unfolds.
Evaluating Mental Health Instruments Within a Cultural Context
Presenter(s):
Maureen Rubin, University of Texas, San Antonio, maureen.rubin@utsa.edu
Goutham Menon, University of Texas, San Antonio, goutham.menon@utsa.edu
Abstract: Using the person-in-environment model, the social worker uses his/her clinical and social work knowledge and skills to make a comprehensive assessment of the individual in the context of his/her family and the community to which they belong. One area in which social workers engage is selecting and using valid and reliable instruments that facilitate measurement of certain behaviors in a client over a period of time. The purpose of this paper is to focus on the process of selecting instruments for clinical and research purposes when working with diverse populations. The paper will address specific areas related to the importance of reliability, validity, norming procedures, and the cultural sensitivity of instruments when working with diverse populations. The author will select three instruments predominantly used by social workers in the field of mental health and highlight the pros and cons of using each instrument when working with culturally diverse populations.

Session Title: Using a Multi-stage, Mixed Methods Approach to Improve the Design of System Change Evaluations
Multipaper Session 210 to be held in BOWIE A on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Systems in Evaluation TIG
Chair(s):
Beth Stevens,  Mathematica Policy Research, bstevens@mathematica-mpr.com
Using Qualitative Methods to Identify System Dynamics and Inform System Evaluation Design
Presenter(s):
Margaret Hargreaves, Mathematica Policy Research, mhargreaves@mathematica-mpr.com
Abstract: Incorporating systems theory and dynamics into early evaluation planning and data gathering can improve an evaluation’s design by capturing key system conditions, dynamics, and points of influence that affect the operation and impact of the system change initiative. At the start of an evaluation, qualitative methods are useful tools for gathering this kind of preliminary information about a system and its dynamics. Rapid assessments, environmental scans, case studies, and key informant interviews are all methods that are well-suited for identifying and describing system dynamics. The information can then be used to inform the rest of the evaluation’s design and methods. But, when system conditions and dynamics are not incorporated into an evaluation’s design, the evaluation will inevitably miss critical aspects of the initiative and its environment, affecting its operation and success.

Session Title: Third Annual Asa G. Hilliard III Think Tank on Culture and Evaluation
Think Tank Session 211 to be held in BOWIE B on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Multiethnic Issues in Evaluation TIG
Presenter(s):
Cindy A Crusto, Yale University, cindy.crusto@yale.edu
Discussant(s):
Julie Nielsen, NorthPoint Health and Wellness Center Inc, niels048@umn.edu
Katherine A Tibbetts, Kamehameha Schools, katibbet@ksbe.edu
Joanne Farley, Human Development Institute, joanne.farley@uky.edu
Abstract: This annual session recognizes the important and relevant contributions of Dr. Asa G. Hilliard III, an African American professor of educational psychology and African history, to the field of evaluation generally, and to culturally competent and responsive evaluation specifically. We will first provide a brief overview of Dr. Hilliard's lifeworks and second will describe an evaluation that elevated and exemplified his understanding of the importance of “carrying oneself with a deep historical consciousness” and understanding of cultural values and socialization. We will then work in small groups to analyze evaluation scenarios to explore how Afrocentric approaches to research and evaluation (truth, commitment, justice, community, and harmony) might guide evaluation from the beginning to end of the process. Finally, we will reconvene as a large group for a facilitated discussion to translate the learning derived across all of the small groups to explore how these constructs impact evaluation quality in theory and practice.

Session Title: Maintaining Quality in Challenging Contexts
Multipaper Session 212 to be held in BOWIE C on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Advocacy and Policy Change TIG
Chair(s):
Julia Coffman,  Center for Evaluation Innovation, jcoffman@evaluationexchange.org
Quality Seems to Be the Hardest Word: How One United Kingdom (UK) Funder Uses Evaluation to Achieve Policy Change
Presenter(s):
Andrew Cooper, Diana, Princess of Wales Memorial Fund, andrew.cooper@memfund.org.uk
Abstract: The Diana, Princess of Wales Memorial Fund is spending out its remaining capital and will close at the end of 2012. During this period, an ambitious set of policy change objectives has been set. As an advocacy funder, we find evaluation vital in helping us to articulate campaign tactics, interim outcomes, and broader lessons. However, we have faced considerable challenges in setting an appropriate level of quality for our research and evaluation work. There is a lack of evaluators who are experienced in advocacy work in the UK, and evaluation methods are not always suitable for assessing a contribution to policy change. We have had many debates about what quality means when evaluating policy change. Quality means different things depending on the intended audience of the evaluation, so we prefer to focus on issues such as the flexibility, timing, and potential uses of evidence when working with evaluators.
Building the Advocacy Evaluation Capacity of Community Coalitions: Lessons Learned From the Northwest Community Changes Initiative
Presenter(s):
Ronda Zakocs, Independent Consultant, rzakocs@bu.edu
Christopher Kabel, Northwest Health Foundation, chris@nwhf.org
Noelle Dobson, Community Health Partnership, noelle@communityhealthpartnership.org
Susan Briggs, Independent Consultant, sbriggs@att.net
Abstract: Assisting community-based organizations to evaluate their advocacy efforts continues to be a challenge. The presentation’s objectives are to describe and share lessons learned from the Northwest Health Foundation’s Northwest Community Changes Initiative, designed to build six community coalitions’ capacity to evaluate the progress of their advocacy efforts targeting local policies promoting healthy eating and active living in Oregon and Southwest Washington. Inspired by a community of practice model, the Initiative facilitated several forums for collective learning: in-person workshops; peer-to-peer telephone conference calls; shared electronic work space; and tailored technical assistance. Over an 18-month period, all coalitions diagrammed strategy maps, developed evaluation matrices, collected indicator data for at least one selected milestone, and drafted documents communicating milestones for targeted stakeholders. Data will be presented on the extent to which coalitions improved their advocacy evaluation capacities as well as the challenges experienced and insights gained by coalition members and consultants.
Agent-based Modeling as a Tool for Evidence-based Public Policy Analysis
Presenter(s):
Andrea Hegedus, Northrop Grumman Corporation, ahegedus2@cdc.gov
Jay Schindler, Northrop Grumman Corporation, jay.schindler@ngc.com
Abstract: Public policy analysis is a powerful method to help evaluate and compare the impact of public policies. As the evidence base for public health interventions grows, linking programmatic and outcomes data to the creation and implementation of public policies provides solid grounding for these policies. Evaluation of evidence-based policies can be time consuming and costly. One tool to assess a policy’s potential outcomes and effects prior to implementation is agent-based modeling (ABM). ABM is an approach in the computational social sciences that allows modelers to simulate the actions of “agents within environments” to assess their impact on various systems. This presentation uses concrete examples to show how ABM combines empirical data to compare policy alternatives by manipulating the characteristics and interactions of subpopulations; the nature and extent of public health interventions; social and environmental variables; cost data; and other factors that exist within a system, thereby improving policy makers’ decisions.
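As a minimal sketch of the agent-based approach described above (entirely hypothetical parameters and behavior rules, not the presenters' model), the example below lets agents adopt a healthy behavior at a rate that depends on peer prevalence plus a policy "boost," so that runs with and without the policy can be compared before any real-world implementation.

```python
# Toy agent-based model: compare adoption of a healthy behavior with and
# without a hypothetical policy that raises each agent's adoption probability.
import random

def run_model(n_agents=500, steps=50, policy_boost=0.0, seed=1):
    random.seed(seed)
    adopted = [random.random() < 0.05 for _ in range(n_agents)]  # 5% adopt at baseline
    for _ in range(steps):
        prevalence = sum(adopted) / n_agents
        for i in range(n_agents):
            if not adopted[i]:
                # Adoption probability rises with peer prevalence and the policy boost.
                p_adopt = 0.02 + 0.10 * prevalence + policy_boost
                if random.random() < p_adopt:
                    adopted[i] = True
    return sum(adopted) / n_agents

print(f"Adoption without policy: {run_model(policy_boost=0.00):.1%}")
print(f"Adoption with policy:    {run_model(policy_boost=0.03):.1%}")
```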

In a 90-minute Roundtable session, the first rotation uses the first 45 minutes and the second rotation uses the last 45 minutes.
Roundtable Rotation I: Engaging Social Justice in a Graduate Course on Program Evaluation
Roundtable Presentation 213 to be held in GOLIAD on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Teaching of Evaluation TIG and the Multiethnic Issues in Evaluation TIG
Presenter(s):
Leanne Kallemeyn, Loyola University, Chicago, lkallemeyn@luc.edu
Abstract: The purpose of this roundtable will be to share how issues of social justice are intentionally woven into a graduate course on program evaluation. The substance of what is taught, as well as the pedagogy of how it is taught, will aim to expose students to social justice as it relates to evaluation practice. I will share a syllabus that incorporates readings from evaluation theorists who address social justice, as well as course assignments to engage these readings. I will also consider how a course project doing an evaluation may serve as an experiential component for learning about social justice and evaluation.
Roundtable Rotation II: Issues of Quality: Guiding Principles for Culturally Competent Teaching and Practice
Roundtable Presentation 213 to be held in GOLIAD on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Teaching of Evaluation TIG and the Multiethnic Issues in Evaluation TIG
Presenter(s):
Arthur Hernandez, Texas A&M University, art.hernandez@tamucc.edu
JoAnn Yuen, University of Hawaii, Manoa, joyuen@hawaii.edu
Abstract: This roundtable will provide an opportunity for individuals interested in promoting and teaching about cultural competence as part of a general preparation in evaluation methods to share philosophy, practices, and challenges. The presenters will discuss evaluation theory and cultural competence from the perspectives of teaching (Hernandez) and practice (Yuen), providing general guidelines and suggested practices drawn from both successful and not-so-successful experience. Participants will discuss, elaborate, and offer suggestions related to their own interests and expertise, and the presenters will collect and organize the proceedings and email the results to interested participants in an effort to facilitate self-examination and further the development of evaluation skills related to teaching and practice for all involved.

In a 90-minute Roundtable session, the first rotation uses the first 45 minutes and the second rotation uses the last 45 minutes.
Roundtable Rotation I: Integrating Website Use Analytics into a Mixed Method Evaluation of a Professional Development Website
Roundtable Presentation 214 to be held in SAN JACINTO on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Integrating Technology Into Evaluation TIG
Presenter(s):
Randahl Kirkendall, Carleton College, rkirkend@carleton.edu
Ellen Iverson, Carleton College, eiverson@carleton.edu
Monica Bruckner, Carleton College, mbruckne@carleton.edu
Abstract: On the Cutting Edge is a comprehensive program of workshops and related web-based resources that support professional development for geoscience faculty at all stages of their careers. The collection of online resources is referenced by those who attend the workshops, provides a venue for sharing teaching materials, and extends the reach of the program beyond those attending workshops. The evaluation of the program’s first five years involved integrating data from various surveys (large and small), interviews, focus groups, and web use statistics into a comprehensive report of the progress and outcomes of the program. The authors will describe how their team analyzed and integrated Google Analytics, server-based website statistics, and web page visit logs into the report. Additionally, they will pose several questions to the group regarding their experiences and ideas for using web analytics in program evaluation.
Roundtable Rotation II: Using Technology for Efficiency in Evaluation
Roundtable Presentation 214 to be held in SAN JACINTO on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Integrating Technology Into Evaluation TIG
Presenter(s):
Marycruz Diaz, WestEd, mdiaz@wested.org
Donna Winston, WestEd, dwinsto@wested.org
Abstract: The roundtable discussion explores WestEd evaluators’ experiences evaluating educational programs in California using technology solutions and examines potential technology platforms that save time and money and could improve our work. In a time of economic constraints, when reduced education funding has severely impacted how educators do their work, we have adapted to clients’ strategies to make do with fewer resources. One such strategy is the use of technology to replace in-person meetings and overcome travel freezes on school districts in California. Technology solutions have cut travel costs and introduced new ways of working efficiently. As advances are made in technology, we must advance our arsenal of evaluation tools. We will discuss the ways we have used technology to evaluate our clients’ work and the technology solutions that have unveiled promising practices for evaluating programs. These technologies have also raised questions about how we improve our evaluations.

Session Title: Project Management Software: An Important Multi-Purpose Tool in an Evaluation Unit’s Toolbox
Demonstration Session 216 to be held in TRAVIS B on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Evaluation Managers and Supervisors TIG
Presenter(s):
Stacey Farber, Cincinnati Children's Hospital Medical Center, stacey.farber@cchmc.org
Tracy Gnadinger, Cincinnati Children's Hospital Medical Center, tracy.gnadinger@cchmc.org
Janet Matulis, Cincinnati Children's Hospital Medical Center, janet.matulis@cchmc.org
Abstract: Managers of evaluation units or complex, multi-resourced projects require data to properly run their business, ensure its success, and clearly articulate its value. Project management (PM) software can be an extremely valuable tool in a manager’s or staff person’s toolbox. The long-term benefits of using PM software may include improved resource allocation and distribution, overall service quality, team communication, and articulation of business value. However, success in utilizing and maximizing benefits from PM software is grounded in an evaluation unit’s ability to operationalize its work and establish management norms. Through this demonstration presentation, one evaluation unit will share (a) the business needs that led to the adoption of a PM tool (specifically @task), (b) implementation, use, and functions of the tool, and (c) lessons learned. This demonstration will be ideal for managers and staff who are interested in, are implementing, or are looking to improve their use of a software solution for business management.

Session Title: Assessing Evaluation Capacity: Using the Evaluation Capacity Diagnostic Tool
Demonstration Session 217 to be held in TRAVIS C on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Organizational Learning and Evaluation Capacity Building TIG
Presenter(s):
Lande Ajose, BTW Informing Change, lajose@informingchange.com
Kristi Kimball, William and Flora Hewlett Foundation, kkimball@hewlett.org
Abstract: This workshop will help evaluators determine a nonprofit’s readiness for evaluation. After presenting several stories of working with nonprofits with vastly different capacities, this session will share how to use the recently developed Evaluation Capacity Diagnostic Tool. Intended for nonprofits, this tool is designed to help organizations assess their readiness to take on many types of evaluation activities. It captures information on organizational context and the evaluation experience of staff, and can be used in various ways. The tool can pinpoint particularly strong areas of capacity as well as areas for improvement, and can calibrate changes over time in an organization’s evaluation capacity. In addition, this diagnostic can encourage staff to brainstorm about how their organization can enhance evaluation capacity by building on existing evaluation experience and skills. Finally, the tool can serve as a precursor to evaluation activities with an external evaluation consultant. This workshop is designed as a practicum that builds on the conference session Measuring the Immeasurable: Lessons for Building Grantee Capacity to Evaluate Hard-To-Assess Efforts.

Session Title: Improving Evaluation Quality Through and In the Arts
Multipaper Session 218 to be held in TRAVIS D on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Evaluating the Arts and Culture TIG
Chair(s):
Ashlee Lewis,  University of South Carolina, ashleealewis@hotmail.com
Discussant(s):
Debra Smith,  Lesley University, dsmith22@lesley.edu
Arts Integration to Enhance Learning Outcomes
Presenter(s):
Dawson Hancock, University of North Carolina at Charlotte, dhancock@uncc.edu
Abstract: In 2009-2010, the ArtStart project in Charlotte-Mecklenburg Schools in Charlotte, North Carolina helped classroom teachers, arts teachers, and teaching artists infuse the arts into 3rd and 4th grade curriculum. Conducted in 39 elementary classrooms, the project focused on arts infusion, classroom teacher/arts teacher collaboration, and arts application practices by fourteen visual and performing artists or arts organizations. Evaluators collected quantitative and qualitative data through direct observation of classroom activities, review of existing documents and information collected by classroom and arts teachers, surveys of ArtStart participants, and individual and group interviews of students, teachers, administrators, and teaching artists. This presentation's purpose is to discuss the evaluation findings related to the ArtStart project’s goals and to examine implications and lessons learned.
The Potential for Arts-Informed Inquiry in Educational Program Evaluation
Presenter(s):
Michelle Searle, Queen's University at Kingston, michellesearle@yahoo.com
Abstract: This qualitative case study examines the value of arts-informed inquiry within an evaluation of a school-based program in one school board. In this study I adopted the dual roles of evaluator and researcher to gather empirical evidence about the power of including the arts in the field of evaluation. Arts-informed inquiry draws from creative strategies in the arts but is not rooted in the arts. This research had three goals: (1) to understand in what contexts, and for what purposes, it is appropriate to intentionally craft arts-informed inquiry as a feature of evaluation; (2) to consider what the crafting of arts-informed inquiry in evaluation looks like; and (3) to explore what evidence there is that arts-informed inquiry adds value to evaluative processes and outcomes. This study contributes theoretical implications for the field of evaluation as well as growing documentation about the role of qualitatively oriented, arts-informed inquiry.
State of the Arts: Evaluation of a K-12 Arts Education Program
Presenter(s):
Janet Mahowski, University of South Florida, mahowskij@pcsb.org
Abstract: The 2001 No Child Left Behind Act identified the arts among the core academic subjects, requiring schools to enable all students to achieve in the arts and to reap the full benefits of arts education. Shortly after, the state of Florida followed suit and approved legislation to include arts as a core credit required for high school graduation, and grade level expectations for the arts were adopted. However, efforts to assess and reform our schools have focused on only four “core” subjects: reading, writing, math, and science. Pressure on Florida’s schools to improve student performance in these areas is intense. As school districts respond to these challenges, it becomes necessary to evaluate how these demands are affecting arts education. This is an evaluation of the K-12 arts education program within one Florida school district. It was conducted by capturing data from each school site and utilizing data analysis techniques to gain an overview of the arts program.

Session Title: Mixed-Method Evaluation in Human Services Settings
Multipaper Session 219 to be held in INDEPENDENCE on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Human Services Evaluation TIG
Chair(s):
Cheryl Meyer,  Wright State University, cheryl.meyer@wright.edu
Program Evaluation of Multidisciplinary Training on Trauma Informed Care
Presenter(s):
Jessica Heschel, Wright State University, heschel.2@wright.edu
Anne Willis, Wright State University, willis.54@wright.edu
Abstract: This evaluation was conducted for a child advocacy center that was attempting to encourage the implementation of Trauma Informed Care (TIC) within county organizations and create a learning community to support the model. It consisted of an assessment of both the perceived need for TIC within the agencies and the barriers to implementation of the model. In an effort to accomplish this goal, the agency designed three trainings on TIC. The purpose of the trainings was to inform non-mental health agencies about the signs and effects of trauma. Prior to the training, participants took an internet-based survey constructed by the evaluators. In addition, evaluators conducted a focus group following the training, thus using a mixed-methods design to enhance the quality and validity of the evaluation. Results indicated a high need for TIC within the community; however, participants felt they lacked sufficient information to implement and fund TIC.

Session Title: Back to the Basics and Beyond
Multipaper Session 220 to be held in PRESIDIO A on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Quantitative Methods: Theory and Design TIG
Chair(s):
Raymond Hart,  Georgia State University, rhart@gsu.edu
Consequences of Violating Fundamental Assumptions of the Central Limit Theorem in Evaluation and Research Practice
Presenter(s):
Raymond Hart, Georgia State University, rhart@gsu.edu
Abstract: Recent trends in national evaluations and research projects have used the central limit theorem as a basis for identifying classrooms, schools, and school districts as statistical outliers on various dependent variables. These studies often ignore or overlook a fundamental assumption in the application of the Central Limit Theorem: that the distribution of students to classrooms, schools, and school districts must be random. In practice, the distribution of students across educational entities and programs is systematic, based on socioeconomic status, ability, or other variables. This paper uses simulated data to illustrate the increased likelihood of identifying classrooms, schools, or school districts as statistical outliers when the distribution of students is systematically drawn from a restricted range of the normal distribution. The paper also provides practical examples of the social, political, and financial implications when statistical conclusions are based on inaccurate application of the Central Limit Theorem.
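A minimal simulation sketch of the argument above (not the author's code; school counts, sizes, and the outlier rule are assumptions): school means are flagged as outliers using the standard error the Central Limit Theorem implies under random assignment, first with students shuffled across schools and then with students sorted so that each school draws from a restricted range of the distribution.

```python
# Contrast outlier-flagging rates under random vs. systematic (range-restricted)
# assignment of students to schools, using a naive CLT-based z-test on school means.
import numpy as np

rng = np.random.default_rng(42)
n_schools, n_students = 500, 30
scores = rng.normal(0, 1, n_schools * n_students)   # student scores ~ N(0, 1)

def flag_rate(school_scores):
    """Share of schools flagged as outliers by |z| > 1.96 on the school mean."""
    means = school_scores.mean(axis=1)
    se = 1 / np.sqrt(n_students)                    # SE implied by random assignment
    return np.mean(np.abs(means / se) > 1.96)

# Random assignment: students shuffled across schools (the CLT assumption holds).
random_assign = rng.permutation(scores).reshape(n_schools, n_students)

# Systematic assignment: students sorted (e.g., by SES or prior ability), so each
# school's students come from a restricted range of the normal distribution.
systematic_assign = np.sort(scores).reshape(n_schools, n_students)

print(f"Flagged under random assignment:     {flag_rate(random_assign):.1%}")      # about 5%
print(f"Flagged under systematic assignment: {flag_rate(systematic_assign):.1%}")  # far above 5%
```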
The Chi-Square Test: Often Used and More Often Misinterpreted
Presenter(s):
Todd Franke, University of California, Los Angeles, tfranke@ucla.edu
Christina Christie, University of California. Los Angeles, tina.christie@ucla.edu
Abstract: The “chi-square test,” or more appropriately the three or possibly four tests that get referred to as the “chi-square test,” represents one of the most common statistical procedures utilized by evaluators for examining categorical data. While the calculations are identical, the circumstances under which the chi-square test of independence is appropriate compared to the chi-square test of homogeneity are often misunderstood by evaluators, leading to a diminished quality of evaluation reports. This proposal will examine the use of the family of chi-square based tests across several of the evaluation journals (e.g., Evaluation Review, Journal of the American Evaluation Association, New Directions), identify examples of use and misuse, and present information to clarify the correct usage of each of these chi-square tests and the subsequent interpretation of the results. Finally, the presentation will discuss the appropriate use of post hoc comparison procedures for the chi-square test of homogeneity.
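As a brief illustration of the distinction the presentation draws (the counts below are hypothetical; only the use of a standard contingency-table routine is assumed), the same calculation serves both tests; what differs is the sampling design and therefore the conclusion that may be drawn.

```python
# The chi-square tests of independence and of homogeneity share one computation.
from scipy.stats import chi2_contingency

# Test of independence: ONE sample of 200 clients, each cross-classified on two
# variables (e.g., program completion x employment status).
observed = [[45, 35],
            [30, 90]]
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
# Interpretation: evidence about whether the two variables are associated
# in the single sampled population.

# Test of homogeneity: TWO samples drawn separately (e.g., 80 treatment clients
# and 120 comparison clients), each classified on ONE variable (employment).
# The identical table and identical chi2_contingency call apply, but the
# conclusion concerns whether the employment distribution is the same across
# the two separately sampled groups.
```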
Data Reduction and Classification Decisions: Using a Factor Analytic Approach to Examine Exemplary Teaching Characteristics
Presenter(s):
Sheryl Hodge, Kansas State University, shodge@ksu.edu
Jan Middendorf, Kansas State University, jmiddend@ksu.edu
Linda Thurston, National Science Foundation, lthursto@nsf.gov
Cindi Dunn, Kansas State University, ckdunn@ksu.edu
Abstract: Utilizing data gathered from a previously administered electronic data collection effort, evaluators at the Office of Educational Innovation and Evaluation (OEIE) sought to test whether substantive underlying themes were being masked within the larger Exemplary Teaching Characteristics instrument. Following Dillman’s tailored design method (Dillman, 2007), OEIE administered the Web-based Exemplary Teacher Characteristics Survey to 6,044 professional educators. Soon after, a previously identified cadre of expert teachers convened to disaggregate survey items into distinguishable professional development levels. Using three distinct exploratory factor analysis approaches, OEIE framed the decision-making parameters used to improve the overall quality of the measure. One of these strategies emerged, in corroboration with the experts, to provide further statistical evidence of construct validity. The decision-making processes, as well as the interpretations and identification of the rotated pattern matrices, frame the lessons learned for evaluation practice.
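A small sketch of the kind of exploratory factor analysis and rotated-loading inspection described above (synthetic responses and invented item names; not the OEIE analysis): two latent themes drive six items, and the rotated pattern matrix is examined to name the factors.

```python
# Exploratory factor analysis on synthetic survey items, with varimax rotation.
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(7)
n = 1000
theme_a = rng.normal(size=n)    # e.g., a "content knowledge" theme (hypothetical)
theme_b = rng.normal(size=n)    # e.g., a "classroom practice" theme (hypothetical)

data = {}
for i in range(3):
    data[f"item{i + 1}"] = 0.8 * theme_a + rng.normal(scale=0.5, size=n)
    data[f"item{i + 4}"] = 0.8 * theme_b + rng.normal(scale=0.5, size=n)
items = pd.DataFrame(data)

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(items)
loadings = pd.DataFrame(fa.components_.T, index=items.columns,
                        columns=["Factor 1", "Factor 2"])
print(loadings.round(2))        # inspect the rotated pattern matrix to name the factors
```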
Estimating Program Impact Using the Bloom Adjustment for Treatment No-Shows: Evaluation of a Literacy Intervention With Hierarchical Linear Modeling
Presenter(s):
Jing Zhu, Metis Associates, jzhu@metisassoc.com
Jonathan Tunik, Metis Associates, jtunik@metisassoc.com
Alan Simon, Metis Associates, asimon@metisassoc.com
Abstract: This study estimates program impact in a randomized controlled trial (RCT) study in which a substantial proportion of treatment students with outcome data do not actually receive the intervention. In RCT studies, researchers typically analyze intention-to-treat (ITT) samples to preserve randomization. Because of treatment “no-shows”, however, ITT analyses tend to underestimate the treatment effect on those who do receive the intervention as intended. The Bloom adjustment is generally considered a useful approach to convert an ITT estimate into a treatment-on-the-treated (TOT) estimate based on a key assumption that no-shows experience zero impact from the intervention. The present study applies the Bloom adjustment to adjust both the impact estimate and its standard error in an impact study of a literacy program using hierarchical linear modeling. The adjusted TOT estimate is compared to the ITT estimate regarding magnitude and statistical significance. General issues in applying the Bloom adjustment are also discussed.
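A minimal numerical sketch of the Bloom adjustment described above (the numbers are illustrative, not results from the study):

```python
# Assumption behind the adjustment: treatment-group members who never received the
# intervention ("no-shows") experienced zero impact, so the ITT effect is diluted
# by exactly the no-show share.
itt_estimate = 0.12     # hypothetical ITT impact estimate from the multilevel model
itt_se = 0.05           # its standard error
receipt_rate = 0.70     # proportion of the treatment group that actually received services

tot_estimate = itt_estimate / receipt_rate
tot_se = itt_se / receipt_rate   # estimate and SE are rescaled by the same factor

print(f"ITT: {itt_estimate:.3f} (SE {itt_se:.3f})")
print(f"TOT: {tot_estimate:.3f} (SE {tot_se:.3f})")
# Because the estimate and SE scale together, the t-ratio and statistical significance
# match the ITT result; only the magnitude changes.
```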

Session Title: The Power of Metaphor: Using Images for Organizational Analysis
Skill-Building Workshop 221 to be held in PRESIDIO B on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Organizational Learning and Evaluation Capacity Building TIG
Presenter(s):
Maggie Huff-Rousselle, Social Sectors Development Strategies, mhuffrousselle@ssds.net
Bonnie Shepard, Social Sectors Development Strategies, bshepard@ssds.net
Abstract: Using the American Evaluation Association as a practical unit of analysis, participants will practice an imaging technique that can be used as a qualitative method for highly participatory organizational evaluations. This technique has been used over the past 12 years with very different organizations working in Africa, Asia, and North America, and examples of the purposes served by the technique and the insights gained will be provided as part of the workshop. The imaging exercise provides rapid insights, via contrasting or similar themes, and can be used as an ideal icebreaker to launch an organizational self-analysis and strategic planning process, where the evaluation approaches and techniques merge into the design of interventions. The technique was inspired by Gareth Morgan’s classic book, “Images of Organization,” considered a breakthrough in thinking because of the ways in which it used metaphors to analyze and explain organizations.

Session Title: Evaluation of Youth Mental Health and Substance Abuse Interventions
Multipaper Session 222 to be held in PRESIDIO C on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Alcohol, Drug Abuse, and Mental Health TIG
Chair(s):
Melissa Rivera,  National Center for Prevention and Research Solutions, mrivera@ncprs.org
Out-of-Home Treatment for Youth With Mental Health Needs and Juvenile Justice Involvement
Presenter(s):
John Robst, University of South Florida, jrobst@fmhi.usf.edu
Mary Armstrong, University of South Florida, marmstrong@fmhi.usf.edu
Norin Dollard, University of South Florida, ndollard@fmhi.usf.edu
Abstract: Objective: To examine juvenile justice recidivism for youth with mental health needs who have been removed from the home and have previous contact with the juvenile justice system. Data: Florida Medicaid claims, Child Welfare, and Juvenile Justice data from 2002-2008. The unit of observation was a juvenile justice contact; thus youth can have repeated observations. Methods: Generalized estimating equations for repeated observations and individual fixed effects specifications were used to examine the relationship between juvenile justice recidivism and the receipt of out-of-home mental health treatment within 90 days of the base arrest. Results: Youth with out-of-home treatment had lower rates of rearrest. However, while youth treated in inpatient settings and therapeutic foster care had reduced rates of rearrest, youth treated in therapeutic group care did not. Conclusion: Out-of-home mental health treatment is associated with lower rearrest rates. Inpatient care and therapeutic foster care were more beneficial than group home care.
Establishing Quality Evaluation Methods in New Terrain: Lessons Learned From a Social Host Ordinance Impact Evaluation
Presenter(s):
Kristen Donovan, EVALCORP, kdonovan@evalcorp.com
Shanelle Boyle, EVALCORP, sboyle@evalcorp.com
Julie Slay, EVALCORP, jslay@evalcorp.com
Dan Hicks, Ventura County Health Department, daniel.hicks@ventura.org
Abstract: Social Host Ordinances (SHO) are a rapidly growing approach for reducing underage drinking at “home parties”, and decreasing associated harms to young people, incidences of community disturbance, and other issues stemming from underage drinking in private settings. Across the U.S., numerous cities and counties are implementing SHOs, yet little is known about their actual impacts. To help fill this gap, we conducted what is thought to be one of the first SHO impact evaluations. This presentation will provide: (1) an overview of the impact evaluation methodology, including transferable evaluation approaches and tools, (2) evidence about the impact that the SHOs have made thus far, and (3) evaluation lessons learned and recommendations for practice. Attendees will leave this session better equipped to conduct policy evaluation studies relevant to their own communities.
Reducing School Violence to Improve Mental Health and Scholastic Achievement in New Orleans High Schools
Presenter(s):
Marsha Broussard, Louisiana Public Health Institute, mbroussard@lphi.org
Lisanne Brown, Louisiana Public Health Institute, lbrown@lphi.org
Paul Hutchinson, Tulane University, phutchin@tulane.edu
Nathalie Ferrell, Tulane University, natferrell@gmail.com
Sarah Kohler, Louisiana Public Health Institute, skohler@lphi.org
Abstract: The prevalence of school violence – both its actualization and its threat – can significantly affect student well-being, influencing mental health, scholastic achievement, risk behaviors, and numerous other aspects of teenagers’ lives. This study examines data from the 2009 School Health Connection Survey, which collected information on experiences with violence among male and female public high school students in New Orleans, as well as information on drug and alcohol use, sexual behaviors, socioeconomic background, and scholastic achievement. Multilevel multivariate regression analyses incorporating relationship-, school-, and neighborhood-level contextual influences are used to provide an understanding of the relationships between school-based violence and concurrent measures of mental health status (e.g., depression, antisocial behaviors) and scholastic achievement (e.g., attendance, performance).
Evaluation of School Counseling Programs in Rural Middle Tennessee
Presenter(s):
Randall Reiserer, Centerstone Research Institute, randall.reiserer@centerstone.org
Ajanta Roy, Centerstone Research Institute, ajanta.roy@centerstone.org
Brad Martin, Centerstone Research Institute, brad.martin@centerstone.org
Abstract: We present evaluation results for a school-based therapy program implemented in two rural counties of Middle Tennessee for elementary schoolchildren with behavioral and emotional problems. School counselors from a community mental health center used three evidence-informed models (the Circle of Courage, Wrap-around Care Principles, and the 12 Principles of Re-Education) in their work with children. Program counselors also worked with parents and teachers to establish and maintain behavioral goals for the students. Our evaluation measured whether these mental health services and interventions enhanced personal growth, progress at school, and emotional well-being by decreasing office referrals and problem severity, and by improving behavioral functioning. Results from the Ohio Scales showed significant decreases in problem severity and improvement in behavioral functioning. Achenbach teacher report data showed improvements in mental health indicators and academic performance. School disciplinary data showed that the counseling program improved school attendance and reduced disciplinary referrals.

In a 90 minute Roundtable session, the first rotation uses the first 45 minutes and the second rotation uses the last 45 minutes.
Roundtable Rotation I: Learning About Educational Reform From a Seven-Year Math-Science Partnership
Roundtable Presentation 223 to be held in BONHAM A on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Pre-K - 12 Educational Evaluation TIG and the Independent Consulting TIG
Presenter(s):
Cynthia Tananis, University of Pittsburgh, tananis@pitt.edu
Cara Ciminillo, University of Pittsburgh, ciminill@pitt.edu
Tracy Pelkowski, University of Pittsburgh, ceac@pitt.edu
Keith Trahan, University of Pittsburgh, ceac@pitt.edu
Yuanyuan Wang, University of Pittsburgh, ceac@pitt.edu
Gail Yamnitzky, University of Pittsburgh, ceac@pitt.edu
Rebecca Price, University of Pittsburgh, ceac@pitt.edu
Abstract: Bring together 53 school districts, four IHEs, four intermediate units (local agencies of the state department of education), three evaluation groups, thousands of teachers and administrators --- focus on changing culture, professional development, teaching and learning --- add millions of dollars, and what do you get? The NSF and Education Department have funded Math-Science Partnerships (MSP) for a number of years, collectively designed to impact the math-science pipeline of qualified students through the PK-16 system while simultaneously increasing the quality of the math-science teacher workforce. This presentation and discussion summarize the extensive, collaborative research and evaluation efforts in the Southwest Pennsylvania MSP across seven years and present a summary of what we have learned, how we learned it, and, importantly, what we were unable to learn from the evaluation and project. The session focuses on the findings of the evaluation but also offers insights about conducting longer-term, collaborative evaluation in the area of educational reform across complex and evolving systems.
Roundtable Rotation II: When Quality and Policy Collide in Evaluating Math-Science Partnership Programs: Strategies for Resolution
Roundtable Presentation 223 to be held in BONHAM A on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Pre-K - 12 Educational Evaluation TIG and the Independent Consulting TIG
Presenter(s):
MaryLynn Quartaroli, Professional Evaluation & Assessment Consultants, marylynn.quartaroli@nau.edu
Hollace Bristol, Coconino County Education Services Agency, hbristol@coconino.az.gov
Abstract: The US Department of Education’s Mathematics and Science Partnership (MSP) competitive grant programs encourage partnerships between local school districts and universities to collaboratively engage in professional development activities aimed at increasing teachers’ content knowledge and improving pedagogical practices. However, determining the quality of these programs can be a contested area, in terms of what constitutes meaningful evidence for the stakeholders: federal agency, state departments of education, local educational agency, higher education instructional teams, and participating teachers and administrators. This conflict most often arises because the contexts for the evaluation are so different: for the source(s) of funding, “quality-as-measured” is likely sufficient; for the local agency, instructional team, and teachers, “quality-as-experienced” may be more important and useful. In these circumstances, the independent evaluator can find it problematic to provide high quality evaluations and to maintain high quality relationships with both levels of funding agencies. This session will examine these critical issues.

Session Title: How Did You Do It? Implementing Performance Measurement and Monitoring Systems
Multipaper Session 224 to be held in BONHAM B on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Non-profit and Foundations Evaluation TIG
Chair(s):
Eric Barela,  Partners in School Innovation, ebarela@partnersinschools.org
Monitoring and Evaluation (M&E) at Room to Read: Concepts and Practice in an International Non-Profit Educational Organization
Presenter(s):
Michael Wallace, Room to Read, michael.wallace@roomtoread.org
Rebecca Dorman, Room to Read, rebecca.dorman@roomtoread.org
Peter Cooper, Room to Read, peter.cooper@roomtoread.org
Abstract: Room to Read is a nonprofit organization committed to transforming the lives of children in developing countries by focusing on literacy and gender equality in education. Our M&E system is based on program-specific logical frameworks that include goals, objectives, and indicators. M&E is part of a performance management system that helps us improve planning and report results. This paper describes the challenges of matching practical activities with conceptual frameworks, including: • Monitoring: choosing indicators (outputs and outcomes); developing a system for collecting and analyzing data; determining how much data is enough (census or sample); data collection frequency; and ensuring buy-in from stakeholders in data collection and analysis. • Evaluation: choosing the type (outcome/summative or process/formative); developing questions and hypotheses; choosing an evaluator (internal or external); design decisions (number of countries, level of significance for program decisions); and working with the evaluator. The paper concludes with lessons learned and challenges ahead.
Approach to Performance Measurement and Effectiveness of the Children’s Investment Fund Foundation
Presenter(s):
Nalini Tarakeshwar, Children's Investment Fund Foundation, nalini@ciff.org
Peter McDermott, Children's Investment Fund Foundation, pmcdermott@ciff.org
Anna Hakobyan, Children's Investment Fund Foundation, ahokobyan@ciff.org
Tomohiro Hamakawa, Children's Investment Fund Foundation, thamakawa@ciff.org
Abstract: The Children’s Investment Fund Foundation (CIFF) invests in programmes that demonstrably improve the lives of children living in poverty in developing countries by achieving large scale and sustainable impact in areas of children’s health, nutrition and education. Robust Performance Measurement and Effectiveness (PME) systems ensure ongoing course correction and independent evaluation through the life of the programme. Evaluations adopt a fit-for-purpose approach wherein the methodology is shaped by feasibility and programmatic needs. Emphasis is placed on making relevant, timely and high quality data available for course correction and to leverage findings for potential policy and practice change. CIFF attempts to take evidence-based, evaluation-centred programming to a new level in international development. This paper discusses the challenges and benefits of such an approach by presenting three cases: an HIV/AIDS initiative in India, a rural healthcare initiative in Ethiopia, and a last-mile health delivery model in Uganda.
A Strategic Planning Case Study: Implementing a Data Dashboard for a Nonprofit Board’s Self-Evaluation, Monitoring, and Evidence-based Decision Making
Presenter(s):
Veronica Smith, data2insight, veronicasmith@data2insight.com
Abstract: As strategic planning committee co-chair, I led the effort to more systematically monitor and evaluate a public radio station’s progress against its strategic plan, resulting in the design and implementation of a strategic planning dashboard. This paper serves as a practicum project report, summarizing the initiative process and offering lessons learned. The process began with a facilitated board discussion, which resulted in a list of committee action items, including creating and organizing metrics for more frequent board review. We partnered with management and board to identify metrics for the organization’s strategic priorities. We also thoughtfully designed the dashboard to ensure effective and efficient communication. Finally, we created practices and protocols to sustain data quality, use, and accuracy. The dashboard rollout represented a successful strategic planning initiative that 1) resulted in improved management and board partnership and 2) increased the quality of management’s and board’s self-evaluation, monitoring, and evidence-based decision making.
Building and Implementing a Performance Management System to Inform Evaluation: Lessons Learned
Presenter(s):
Eric Barela, Partners in School Innovation, ebarela@partnersinschools.org
Abstract: This paper documents an education nonprofit’s efforts to build and implement a performance management system designed to provide data on both individual and organizational accountability and to inform internal and external evaluation efforts. In 2009, experts from the corporate world were hired to design the system’s architecture and roll out a prototype. Around this time, the nonprofit also experienced substantial changes in its leadership structure and began to consider scaling up. As such, the system needed to also inform evaluation efforts so the nonprofit could provide evidence of its effectiveness to attract potential new funders. This paper will explore lessons learned from designing the system, ensuring that the generated data could inform both internal and external evaluation efforts, and changing the organization’s culture to understand the need to link individual performance to programmatic effectiveness.

Session Title: Evaluating K-8 Literacy Programs: Methods and Models
Multipaper Session 225 to be held in BONHAM C on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Pre-K - 12 Educational Evaluation TIG
Chair(s):
Andrea Beesley,  Mid-continent Research for Education and Learning, abeesley@mcrel.org
Discussant(s):
Lizanne Destefano,  University of Illinois at Urbana-Champaign, destefan@illinois.edu
Challenges to the Utility of Evaluation of Early Elementary Tutoring and Learning Support Services
Presenter(s):
Magdalena Rood, Third Coast R&D Inc, mrood@thirdcoastresearch.com
Cindy Roberts-Gray, Third Coast R&D Inc, croberts@thirdcoastresearch.com
Abstract: A key problem for agencies providing tutoring and learning support services for young children in school is to demonstrate improved child performance in the classroom. As experienced in two agencies under contract with the Austin Independent School District in Austin, Texas, in-house measures consistently, year after year, demonstrated improvements, but often these results were not replicated on district measures. Whereas both program and school measures are diagnostic in nature, the mismatch between measures appears to be due to philosophical differences between test developers about factors such as age-appropriate skills and standards, testing method, and test administrator role. The proposed paper will review findings of early literacy tutoring programs conducted from 2004 through 2009, and will discuss the implications of the misalignment between learning support and classroom measures for demonstrating convincing results for children.
Evaluation of Reading Intervention Effectiveness Using Growth Models
Presenter(s):
Tammiee Dickenson, University of South Carolina, tsdicken@mailbox.sc.edu
Jennifer Young, South Carolina Department of Education, youngjey@gmail.com
Abstract: This study compared achievement growth of students who received supplemental reading intervention services with that of students who did not receive services. Students were selected for intervention based on academic need. To study intervention effectiveness, the differential growth rate of students who received services was of interest. The sample consisted of approximately 1,500 students in 43 schools that participated in the South Carolina Reading First (SCRF) Initiative during three consecutive years. The Stanford Reading First assessment was administered to SCRF students in grades 1-3 in the fall and spring of each school year. A three-level hierarchical linear model was used to model growth in achievement for students who participated in all three years. A quadratic term was included to account for change in growth rate over time. Comparisons were made according to whether students received intervention, with three intervention types compared for grade 1. Results indicate significant gains when intervention was provided in the early grades.
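A hedged sketch of a growth model in the spirit of the one described (not the authors' code; the synthetic data, variable names, and the simplification to random intercepts at the school and student levels are assumptions):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic long-format data: one row per student per assessment occasion.
rng = np.random.default_rng(2)
rows = []
for school in range(20):
    school_eff = rng.normal(0, 2)
    for student in range(30):                       # student IDs repeat across schools;
        student_eff = rng.normal(0, 3)              # nesting is implied by the school grouping
        treated = int(rng.integers(0, 2))
        for t in range(6):
            score = (50 + school_eff + student_eff
                     + 2.0 * t - 0.15 * t ** 2      # decelerating growth over time
                     + 0.8 * treated * t            # differential growth for intervention students
                     + rng.normal(0, 4))
            rows.append({"school_id": school, "student_id": student, "time": t,
                         "intervention": treated, "reading_score": score})
df = pd.DataFrame(rows)

# Three-level structure approximated with a school-level random intercept plus a
# variance component for students nested within schools.
model = smf.mixedlm(
    "reading_score ~ time + I(time ** 2) + intervention + intervention:time",
    data=df,
    groups="school_id",
    re_formula="1",
    vc_formula={"student": "0 + C(student_id)"},
)
result = model.fit()
print(result.summary())   # intervention:time estimates the differential growth rate
```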
Impact of READ180 on At-Risk Middle School Students’ Literacy Outcomes
Presenter(s):
Margaret Gheen, Montgomery County Public Schools, mhgheen@hotmail.com
Shahpar Modarresi, Montgomery County Public Schools, shahpar_modarresi@mcpsmd.org
Abstract: This evaluation examines literacy achievement of middle school students with a history of low performance who were enrolled and not enrolled in READ 180, a program designed to accelerate reading achievement. End-of-year literacy scores on the Measures of Academic Progress-Reading (MAP-R) and the Maryland School Assessment (MSA) in reading were compared among three groups: students enrolled in classes implementing READ 180 with higher versus lower fidelity and students not enrolled in the program. Findings across grade levels, literacy measures, and statistical methods yielded small and sometimes inconsistent patterns of differences among groups. However, overall, READ 180 students had slightly higher end-of-year reading scores than nonparticipants. The greatest differences among groups were observed in Grade 6: students enrolled in READ 180 classes scheduled for 90 minutes daily and with higher program fidelity demonstrated the highest end-of-year performance.
Evaluating Professional Development Training in Early Literacy: Alternatives for Measuring Participant Use of New Skills Post Training – Use of Action Plans and Levels of Use Indicators
Presenter(s):
Ann Zukoski, Rainbow Research Inc, azukoski@rainbowresearch.org
Joanne Knapp-Philo, National Head Start Family Literacy Center, joanne.knapp-philo@sonoma.edu
Kim Stice, National Head Start Family Literacy Center, kim.stice@sonoma.edu
Abstract: A key challenge of evaluating professional development training is measuring how participants use and apply new knowledge and skills on the job post-training (Guskey, 2000; Phillips & Stone, 2002). Research shows that adoption of new skills is not likely to be universal or complete, and participants need time to reflect and adapt new concepts to their own context. Therefore, gathering and analyzing information about whether new practices are used and how well they are used are essential evaluation activities for formative and summative purposes. Action plans and measures of levels of use represent important mechanisms for measuring participants’ use of new knowledge and skills, the degree of implementation, and the quality of implementation. In this presentation, we will share examples of action plans and applications of levels-of-use indicators to assess a national training program for early childhood educators. Challenges and opportunities will be discussed.

Session Title: Methods Leading to Higher Quality Evaluations in Education Evaluation
Multipaper Session 226 to be held in BONHAM D on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Pre-K - 12 Educational Evaluation TIG
Chair(s):
Tom McKlin,  The Findings Group LLC, tom.mcklin@gmail.com
Discussant(s):
Anane Olatunji,  Fairfax County Public Schools, aolatunji@fcps.edu
Maturation Effects of Multi-cycle Initiatives: The Importance of Time Sensitive Indicators When Evaluating Impact
Presenter(s):
Lindsey Rosecrans, State University of New York at Albany, lr759168@albany.edu
Kathy Gullie, State University of New York at Albany, kp9854@albany.edu
Abstract: The purpose of this paper is to present the importance of the maturity and continuity of grant initiatives for evaluation strategies and findings related to student achievement. The paper compares the outcomes of two initiatives, taking into account the differences in program maturity levels. Using a drill-down process, the projects assessed student-related outcomes via a series of data sources that included statewide tests, classroom observations, individual student classroom performance, and student portfolios. Evaluation data were used to support a series of analyses that traced students across grades, teachers, and instructional modes. Analysis of student work and its relationship with the maturity and continuity of grant initiatives reflected the increasingly complex nature of evaluating student achievement. Techniques for considering maturation as a covariate will also be discussed.
Enhancing Evaluation Quality of Programs Serving Youth: Data Collection Strategies and Ethical Issues
Presenter(s):
Katherine Byrd, Claremont Graduate University, katherine.byrd@cgu.edu
Tiffany Berry, Claremont Graduate University, tiffany.berry@cgu.edu
Susan Menkes, Claremont Graduate University, susan.menkes@cgu.edu
Krista Collins, Claremont Graduate University, krista.collins@cgu.edu
Abstract: Program evaluators face unique challenges collecting data with youth, yet including children in the evaluation process has been shown to enhance the quality of inquiry (Walker, 2007). Understanding developmentally appropriate strategies for collecting data among the youth population will not only enhance the quality of the data collected, but will also increase the sensitivity for detecting program effects. The purpose of this paper is three-fold. First, we will discuss the importance of considering age (i.e., chronological age, developmental level, etc.) and developmental domains (i.e., cognitive, social, physical, etc.) when evaluating programs serving children and youth. Second, quantitative and qualitative strategies for enhancing the quality of responses/data among children will be discussed, especially in relation to changes in age and domain across time. Third, we discuss how evaluation quality improves with respect to ethics when one considers the salient developmental issues pertinent to conducting evaluations with children.
Impacts of Comprehensive Teacher Induction: Results From the Second Year of a Randomized Controlled Study
Presenter(s):
Eric Isenberg, Mathematica Policy Research, eisenberg@mathematica-mpr.com
Steven Glazerman, Mathematica Policy Research, sglazerman@mathematica-mpr.com
Martha Bleeker, Mathematica Policy Research, mbleeker@mathematica-mpr.com
Amy Johnson, Mathematica Policy Research, ajohnson@mathematica-mpr.com
Julieta Lugo-Gil, Mathematica Policy Research, jlugo-gil@mathematica-mpr.com
Mary Grider, Mathematica Policy Research, mgrider@mathematica-mpr.com
Sarah Dolfin, Mathematica Policy Research, sdolfin@mathematica-mpr.com
Edward Britton, WestEd, tbritto@wested.org
Abstract: This project evaluates the impacts of comprehensive teacher induction on teacher instructional practices, attitudes, retention, and student achievement. We use an experimental design in which 252 elementary schools with 561 beginning teachers in 10 school districts were randomly assigned to a treatment group receiving one year of comprehensive induction or a control group, and 166 elementary schools with 448 beginning teachers in 7 other school districts were randomly assigned to a treatment group receiving two years of comprehensive induction or a control group. Teachers in the control group took part in the induction programs provided by the district. We collected survey and administrative data for four years after random assignment in summer 2005, and conducted classroom observations during the spring of teachers’ first year. While treatment teachers received significantly more support than control teachers during the intervention, this did not translate into significant impacts on the outcomes after two years.
Using a Mixed Methods Design to Identify Exemplary College Access Centers in Texas High Schools
Presenter(s):
Jacqueline Stillisano, Texas A&M University, jstillisano@tamu.edu
Hersh Waxman, Texas A&M University, hwaxman@tamu.edu
Yuan-Hsuan Lee, Texas A&M University, jasviwl@neo.tamu.edu
Kayla Braziel Rollins, Texas A&M University, kaylarollins@gmail.com
Rhonda Goolsby, Texas A&M University, rhonda2000@tamu.edu
Chyllis Scott, Texas A&M University, chyllisscott@neo.tamu.edu
Abstract: This paper reports on an evaluation study commissioned to examine the impact of college access centers, called GO Centers, established in Texas high schools with the goal of assisting students with college preparation activities and increasing college applications and college enrollment in the schools in which the Centers were implemented. Using a mixed-methods design, the evaluation team examined factors that contributed to a GO Center being successful at creating a college going culture and identified practices and program components that were prevalent across exemplary centers. Quantitative data were used to identify 30 GO Centers that demonstrated exemplary outcomes on at least one of several different variables, and 6 of these 30 sites were selected to participate in in-depth case studies designed to provide a comprehensive picture of each Center's specific experiences, challenges, and opportunities related to developing and implementing an exemplary college access program.

Session Title: Enhancing the Quality of Evaluation Through Collaboration Among Funders, Programs, and Evaluators: The Example of the New York City Health Bucks Program Evaluation
Panel Session 227 to be held in BONHAM E on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Collaborative, Participatory & Empowerment Evaluation TIG
Chair(s):
Jan Jernigan, Centers for Disease Control and Prevention, ddq8@cdc.gov
Abstract: In response to the growing public health crisis of childhood obesity, the Division of Nutrition, Physical Activity, and Obesity at the Centers for Disease Control and Prevention (CDC), is working to identify promising local obesity prevention and control interventions. In support of this activity, CDC has funded Abt Associates Inc. to conduct a process and outcome evaluation of one such program, New York City Health Bucks, an innovative financial incentive program operated by the New York City Department of Health and Mental Hygiene (DOHMH) to increase access to and purchase of fresh fruits and vegetables in three high-need, underserved NYC neighborhoods. Panel presenters from CDC, DOHMH, and Abt will discuss how collaboration among these key entities has led to the design and implementation of a high-quality, methodologically sound evaluation, which builds on DOHMH’s prior evaluation efforts, and which will inform both CDC and other localities interested in implementing similar initiatives.
The Intersection of Quality and Collaboration: From a Funder's Perspective
Gayle Payne, Centers for Disease Control and Prevention, hfn5@cdc.gov
Jan Jernigan, Centers for Disease Control and Prevention, ddq8@cdc.gov
Improving nutrition through increased consumption of fruits and vegetables is an important objective of the CDC Division of Nutrition, Physical Activity, and Obesity (DNPAO). One recommended strategy is to utilize farm-to-where-you-are programs that promote the delivery of regionally grown produce to farmers’ markets. DNPAO collaborated with the New York City Department of Health and Mental Hygiene (NYC DOHMH) and Abt Associates, the selected contractor, to evaluate the NYC Health Bucks Program, designed to increase access to fruits and vegetables for low-income families. This presentation will describe the ways that DNPAO, as Technical Monitor of the evaluation, collaborated with DOHMH and Abt to facilitate common goals and promote an effective evaluation, highlighting, in particular, how collaborative efforts addressed evaluation challenges related to OMB submission and different organizational guidelines and regulations, and describing the benefits of collaboration on evaluation efforts related to developing evaluation questions, data collection plans, and analytic techniques.
The Intersection of Quality and Collaboration: From a Program's Perspective
Sabrina Baronberg, New York City Department of Health and Mental Hygiene, sbaronbe@health.nyc.gov
The New York City Health Department's Health Bucks (HB) are designed to improve access to and consumption of fresh fruits and vegetables in underserved neighborhoods while supporting local farmers and farmers' markets. In order to ensure the success of the program, the Health Department has collaborated with over 150 community groups/sites and over 50 farmers markets run by more than 10 different organizations. This presentation will describe how collaboration has been key to high HB coupon redemption rates, accountability, and overall successful program implementation. The presenter will also discuss the Department’s own evaluation efforts and findings to date, including how the program has benefited both farmers and city residents, as well as how the commitment to collaborating with partnering agencies such as the CDC and Abt is helping to ensure accurate and reliable evaluation measures, augmenting the Department’s own evaluation efforts, and serving to disseminate best practices to the field.
The Intersection of Quality and Collaboration: From an Evaluator's Perspective
Yvonne Abel, Abt Associates Inc, yvonne_abel@abtassoc.com
Lauren Olsho, Abt Associates Inc, lauren_olsho@abtassoc.com
Debbie Walker, Abt Associates Inc, debbie_walker@abtassoc.com
Cristina Booker, Abt Associates Inc, cristina_booker@abtassoc.com
Jacey Greece, Abt Associates Inc, jacey_greece@abtassoc.com
Cheryl Hewitt, Abt Associates Inc, cheryl_hewitt@abtassoc.com
Leah Staub-DeLong, Abt Associates Inc, leah_staub-delong@abtassoc.com
Abt Associates has been funded by the CDC to conduct an evaluation of the New York City Health Bucks Program operated by the NYC Department of Health and Mental Hygiene (DOHMH). Key staff from the Abt evaluation team will discuss how collaboration with the CDC and DOHMH in the formative phase of the evaluation has informed and improved the quality of the evaluation plan, data collection instruments, and dissemination efforts. For example, DOHMH provided pre-existing surveys and detailed information about local evaluation and monitoring activities, thus avoiding duplication of effort and excess burden on respondents. Similarly, the CDC’s technical feedback and facilitation of contact with content experts and researchers ensured a robust and technically sound evaluation design. Finally, we will describe lessons learned, including the need for coordination across multiple agencies' standards and guidelines and efforts to disseminate information to wider audiences.

Session Title: Methodological Choices in Assessing the Quality and Strength of Evidence on Effectiveness
Panel Session 228 to be held in Texas A on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Government Evaluation TIG
Chair(s):
Valerie J Caracelli, United States Government Accountability Office, caracelliv@gao.gov
Abstract: This panel aims to explore the methodological choices evaluators face in attempting to review a body of evaluation evidence to learn “what works”, i.e., what interventions or approaches are effective in trying to achieve a given outcome. When asked to assess a new initiative to identify effective social interventions, GAO discovered that 6 federally-supported efforts with the same basic purpose had been operating in diverse content areas for several years. While all 7 evaluation reviews assess evaluation quality on similar social science research standards, some reviews included additional criteria or gave greater emphasis to some issues than others. They also differed prominently in the approaches they took to the next step - synthesizing credible evaluation evidence to draw conclusions about whether an intervention was effective or not. This panel will explore the methodological choices such efforts face, and what features of the evaluations or context influenced their decisions.
Comparing the Top Tier Evidence Approach to Other Systematic Evidence Reviews
Stephanie Shipman, United States Government Accountability Office, shipmans@gao.gov
Valerie J Caracelli, United States Government Accountability Office, caracelliv@gao.gov
This paper will provide an introduction to the issues by discussing federal policy interest in identifying high quality evidence on effective interventions, and briefly describing the Top Tier Evidence initiative – a systematic evidence review conducted by the private, nonprofit Coalition for Evidence-Based Policy – and the congressional request for GAO to assess the validity of that effort. We will then describe the scope of the six federally-supported systematic evidence reviews that we selected for comparison and the general steps they take to first assess the quality of evaluation evidence and then synthesize the credible evidence to draw conclusions about intervention effectiveness. We will point out areas of similarity and difference between their approaches to set up the other panelists’ in-depth discussions of the issues they considered and the rationales for their methodological and analytic choices.
Methodological Considerations in Selecting Programs for the Model Programs Guide
Marcia Cohen, Development Services Group Inc, mcohen@dsgonline.com
The Office of Juvenile Justice and Delinquency Prevention’s Model Programs Guide (MPG) is a searchable database that allows practitioners and researchers to search for information and research on more than 200 evidence-based prevention and intervention programs. The MPG conducts reviews to identify effective programs on the topics of delinquency; aggression and violent behavior; children exposed to violence; gang involvement; alcohol, tobacco, and other drug use; academic problems; family functioning; sexual activity and exploitation; and mental health issues. This paper reviews the evidence requirements that must be met for programs to be included in the MPG, and discusses the review process as well as the four dimensions of the program review criteria—conceptual framework, program fidelity, design quality, and outcome evidence. Components of the rating instrument and rating system will also be discussed. Evaluation quality standards that can be used by researchers to assess the evidence on program effectiveness are proposed.
National Registry of Evidence-based Programs and Practices Approach to Evaluating the Evidence
Kevin Hennessy, United States Department of Health and Human Services, kevin.hennessy@samhsa.hhs.gov
The Substance Abuse and Mental Health Services Administration (SAMHSA) within the U.S. Department of Health and Human Services developed the National Registry of Evidence-based Programs and Practices (NREPP – www.nrepp.samhsa.gov) as a searchable on-line tool to assist States and community-based organizations in identifying and assessing both the evidence strength and the dissemination support for interventions and approaches to preventing and treating mental and/or substance use disorders. NREPP is one way that SAMHSA works to improve access to information on tested interventions and thereby reduce the lag time between the creation of scientific knowledge and its practical application in the field. With this in mind, the presentation will highlight NREPP’s approach to evaluating behavioral health interventions, and discuss the methodological, practical and political considerations and decisions behind SAMHSA’s choice of this approach.
The Evidence-based Practice Centers: A National Approach to Systematically Assessing Evidence of Effectiveness of Health Care Interventions
Jean Slutsky, United States Department of Health and Human Services, jean.slutsky@ahrq.hhs.gov
In 1997 the Agency for Health Care Policy and Research (AHCPR), now known as the Agency for Healthcare Research and Quality (AHRQ), launched its initiative to promote evidence-based practice in everyday care through establishment of Evidence-based Practice Centers (EPCs). The EPCs develop evidence reports and technology assessments on topics relevant to clinical, social science/behavioral, economic, and other health care organization and delivery issues—specifically those that are common, expensive, and/or significant for the Medicare and Medicaid populations. With this program, AHRQ became a "science partner" with private and public organizations in their efforts to improve the quality, effectiveness, and appropriateness of health care by synthesizing the evidence and facilitating the translation of evidence-based research findings. The EPCs now number 14 and have become the cornerstone of AHRQ’s program of comparative effectiveness research.

Session Title: Evaluating With Validity: Truth, Justice, and the Beautiful Way
Panel Session 229 to be held in Texas B on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Theories of Evaluation TIG
Chair(s):
James Griffith, Claremont Graduate University, james.griffith@cgu.edu
Discussant(s):
Ernest House, University of Colorado, ernie.house@colorado.edu
Abstract: In Evaluating with Validity (1980), House proposed three standards for evaluation: truth, justice, and beauty. He argued that, if one must choose between these standards, justice always comes first, then truth, and finally beauty. In discussing these standards, House draws both on contemporary social scientists and on philosophers contemporary and classic. Three evaluators with differing perspectives will consider how the contemporary historical context and current theoretical notions support or undermine House’s perspective.
Truth, Beauty, and Justice: Conceptualizing House’s Framework for Evaluation in Community-based Settings
Katrina Bledsoe, Walter R McDonald and Associates Inc, katrina.bledsoe@gmail.com
House’s theoretical framework of beauty, truth, and justice in evaluation continues to inspire the field. Yet, while one might agree with House that, if a choice must be made among these three standards, justice must always come first, then truth, then beauty, that trade-off might not necessarily need to be made. I contend that justice can be accomplished even when the other standards are primarily emphasized. That is, in striving for beauty, the story that is most important to the community and will best serve it is told; in striving for truth, the kinds of questions that need to be asked, and the methods used by and for the community, will ultimately lead to justice. To illustrate these points, I discuss my experiences conducting community-based evaluations at the national and local levels, particularly in communities that are hard to reach, underserved, etc.
Truth, Beauty, and Justice: The Best Way to Spot a Real, Genuine, Authentic Evaluation?
Jane Davidson, Real Evaluation Ltd, jane@realevaluation.co.nz
A great deal of money and effort is wasted every year around the world on evaluations (and other projects that call themselves "evaluations" but are not). A very important cause of this waste is that evaluators fail to step back, understand the big picture, or make the right trade-offs among competing concerns (such as truth, beauty, and justice). Even when the need for trade-offs and balance is acknowledged, we often see examples of false dichotomies being promoted. A common example is the belief that an in-depth quantitative or qualitative study [with high truth value] cannot possibly be summarized and presented in a [high beauty value] "sound bite". Another is the belief that "justice" is a value that has no place in a scientific "truth-seeking" endeavor such as evaluation. Jane will comment on the appropriateness, sufficiency, and relative importance of truth, beauty, and justice as standards for identifying whether an evaluation is real, genuine, authentic, and practical.
Whose Roots? Pushing the Justice Envelope in Evaluation
Rodney Hopson, Duquesne University, hopson@duq.edu
The second part of Ernie House’s seminal book, Evaluating with Validity, ends with an important sentence following a discussion of justice. And while House is credited with contributing significantly to notions of democracy and social justice in evaluation over the last 25 years (House, 2002; Kushner, 2005), there is a clear need for more developments that push the justice envelope in evaluation. This paper provides an overview of the justice-related approaches in evaluation from a Housian point of view, while integrating evaluation, democracy, and social change in a larger perspective (Greene, 2006). In providing this overview of justice approaches in evaluation, the paper extends ways of retracing evaluation roots, building branches of justice, and advancing an important and core attribute for the field (Alkin & Christie, 2004; Hopson & Hood, 2005; Ibrahim, 2003; Yarbrough, Shulha, Hopson, & Caruthers, 2010 forthcoming).

Session Title: Dealing With Technical Challenges in Mixed Methods Evaluation
Multipaper Session 230 to be held in Texas C on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the
Chair(s):
Virginia Dick,  University of Georgia, vdick@cviog.uga.edu
Discussant(s):
Susan Labin,  Independent Consultant, susan@susanlabin.com
Using Mixed Methods to Understand Evaluation Influence: Challenges and Opportunities
Presenter(s):
Sarah Appleton-Dyer, University of Auckland, sk.appleton@auckland.ac.nz
Janet Clinton, University of Auckland, j.clinton@auckland.ac.nz
Rob McNeill, University of Auckland, r.mcneill@auckland.ac.nz
Abstract: Mixed methods is receiving increased attention in the literature, with much discussion surrounding the capacity to mix paradigms and the recognition of pragmatism as an alternative paradigm. The literature also acknowledges our need to understand more about mixed methods in practice. For example, there is not one accepted framework to guide data analysis and integration. While this offers many benefits and opportunities, it also presents some challenges. This paper seeks to contribute to understanding of some of these challenges and opportunities by presenting a mixed methods study that aims to understand evaluation influence within population health partnerships in New Zealand. The study is part of a doctoral thesis, and the paper will draw on a range of experiences to highlight the theoretical and practical challenges and opportunities experienced so far. Specifically, the paper will identify challenges and opportunities relating to study development, design, implementation, and the initial stages of analysis.
A Mixed Methods Toolkit for Evaluating Translational Science Education Programs
Presenter(s):
Julie Rainwater, University of California, Davis, julie.rainwater@ucdmc.ucdavis.edu
Stuart Henderson, University of California, Davis, stuart.henderson@ucdmc.ucdavis.edu
Abstract: Translational research and the training of translational researchers have generated significant attention in the past ten years. This attention has led to the emergence of a number of training programs specifically devoted to recruiting and training translational researchers. Translational training programs’ emphasis on team science, focus on interdisciplinary research, and acceptance of a range of career trajectories all present unique challenges to program evaluation and the development of evaluation metrics. This presentation describes the evaluation of pre-doctoral and postdoctoral Clinical & Translational Science training programs and the Howard Hughes Medical Institute’s Integrating Medicine into Basic Science training program at the University of California, Davis. We will share the mixed methods evaluation toolkit that we developed for these programs as well as describe some of the challenges of evaluating translational training programs.
The Tale of Two Mixed Methods Projects
Presenter(s):
Jori Hall, University of Georgia, jorihall@uga.edu
Katherine Ryan, University of Illinois at Urbana-Champaign, k-ryan6@illinois.edu
Abstract: While current and historical definitions of evaluation quality (House, 1980) provide an important foundation for determining quality in evaluation, we believe quality is best understood in terms of the inquiry process itself. Making aspects of the inquiry process transparent enables judgments about how suitable an approach is in relation to the purpose of the inquiry and the context within which it is embedded. With this in mind, this presentation explores the quality of two mixed methods projects by examining different approaches, including a project conducted by a sole researcher in one context as well as a collaborative mixed methods evaluation in another context. Using these projects, we further consider mixed methods decision points during the evaluation design and implementation process. These include the sequence of data collection; the priority given to qualitative and quantitative data collection and analysis; and integration, or where the mixing occurred (Creswell, 2003).

Session Title: Analysis and Evaluation of Research Portfolios Using Quantitative Science Metrics: Practice
Panel Session 231 to be held in Texas D on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Research, Technology, and Development Evaluation TIG
Chair(s):
Laurel Haak, Discovery Logic, laurel.haak@thomsonreuters.com
Abstract: Increasingly, organizations involved in research and technology development are interested in applying quantitative approaches to evaluate research program impact on participants and to assess whether programs are achieving their stated mission. Science metrics can be leveraged to complement qualitative evaluation methodologies; they include bibliometrics, or the use of publication and citation information to derive measures of performance and quality, as well as other direct measures such as funding amounts and public health impact. In this panel, practical applications of applying metrics to the evaluation of research programs will be discussed. In particular, we will discuss the use of bibliometrics in different evaluation settings, the development of novel metrics to address evaluation goals, and the use of metrics that accommodate differences in the temporal aspect of research portfolio outcomes.
Practical Applications of Bibliometrics: What Makes Sense in Different Contexts?
Frédéric Bertrand, Science-Metrix Corp, frederic.bertrand@science-metrix.com
David Campbell, Science-Metrix Corp, david.campbell@science-metrix.com
The production of performance measures using bibliometric methods has proven effective in helping to answer important research evaluation and science policy questions. The design of bibliometric methods is critical and must be carefully adapted both to the organizational and research contexts and to the overall evaluation analytical framework. There is also an opportunity to better combine and integrate bibliometrics with other evaluation methods. It is therefore important for the evaluation community to better understand the range of applications and limitations associated with bibliometric methods in different contexts. This paper presents bibliometric methods and associated analytical frameworks used to support evaluation processes in three contexts: 1) research funding organizations, 2) academic institutions, and 3) science-based governmental organizations (mandated/policy-driven research). These contexts are exemplified using sample bibliometric analyses covering various scientific areas such as genomics, cancer research, environmental science, and natural resources.
Beyond Bibliometrics: An International Program Evaluation for Building Research Capacity
Liudmila Mikhailova, United States Army, lmikhailova@crdf.org
The paper focuses on the findings from an eight-country impact evaluation study of Regional Experimental Support Centers (RESC) in Eurasia and discusses quantitative metrics to measure R&D programs in an international context. With funding from the U.S. Department of State, CRDF has established 21 RESCs since 1997 to build research capabilities at scientific institutions and integrate research into university systems. We will discuss the development and application of research capacity metrics featuring results at three levels: 1) the institutional level, measuring value to building research capacity; 2) the regional level, measuring the extent to which RESCs created a more attractive climate for economic activities (e.g., a food and drug testing facility in Yerevan, Armenia that facilitates the country’s imports and exports, and an environmental testing center in Baku, Azerbaijan that encourages responsible development of the country’s oil resources); and 3) the knowledge production level, which includes national and international grants and publications in peer-reviewed journals.
Applying Metrics to Evaluate the Continuum of Research Outputs: Near to Long-term Impact
Joshua Schnell, Discovery Logic, joshua.schnell@thomsonreuters.com
Beth Masimore, Discovery Logic, beth.masimore@discoverylogic.com
Laurel Haak, Discovery Logic, laurel.haak@thomsonreuters.com
Matt Probus, Discovery Logic, matt.probus@thomsonreuters.com
Michael Pollard, Discovery Logic, michael.pollard@thomsonreuters.com
The evaluation of research programs relies on the use of varied measures of research outputs, and critical to the success of these evaluations are the quality and comprehensiveness of the data available to capture these outputs. ScienceWire® is a data and software platform that integrates and interlinks public and proprietary data on research and its outcomes, including: publications indexed by MEDLINE and Thomson Reuters’ Web of Knowledge; federal research grants from the National Institutes of Health, the National Science Foundation, the Department of Energy, the Department of Defense, and the US Department of Agriculture; patent applications and issued patents from the US Patent and Trademark Office; and approved drug products from the FDA Orange Book. We will demonstrate the practical applications of this integrated data platform in evaluating near- and long-term research outputs, from grant applications to research outputs to impacts on patient care.

Session Title: Why Evaluators Need Graphic Design Skills
Demonstration Session 232 to be held in Texas E on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Evaluation Use TIG
Presenter(s):
Stephanie Evergreen, Western Michigan University, stephanie.evergreen@wmich.edu
Abstract: Evaluators need graphic design skills so that findings can be made into more than doorstops or dust collectors. Building on Jane Davidson’s report formatting suggestions and bringing in best practices from the field of graphic design, the demonstration will illustrate the power of presenting findings in ways that are specifically intended to resonate with the audience. Page layout, use of high quality images, and decluttered slideshows will help evaluators move out of the “death by PowerPoint” business as usual and into presentations of findings and recommendations that make a lasting impression. Both presentations and written reports will be addressed, particularly how their tandem, non-redundant use can strengthen both forms of communication. The demonstration platform will allow for ample examples of before/after work, orientation to web tools that support this endeavor, and comparisons to existing guidelines for increasing use of evaluation reports.

Session Title: New Directions for Research on Evaluation
Panel Session 233 to be held in Texas F on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Research on Evaluation TIG
Chair(s):
John LaVelle, Claremont Graduate University, john.lavelle@cgu.edu
Abstract: In recent years the practice of evaluation has grown substantially, as evidenced by a rise in the number of professional evaluation organizations, AEA membership, the demand for evaluation services, and the number of universities offering training in evaluation (LaVelle & Donaldson, 2010). In tandem, a growing emphasis has been placed on the importance of research on evaluation. Mark (2007) proposed an organizing taxonomy that suggests four major areas that researchers of evaluation might explore: Context, Activities, Consequences, and Professional Issues. This panel will provide an introduction to Mark’s taxonomy, highlight current research in each section of the taxonomy, and provide an opportunity for the audience to brainstorm with the presenters to generate specific research ideas to guide future efforts.
Research on Evaluation Context: Examples and Ideas
Michael Szanyi, Claremont Graduate University, michael.szanyi@cgu.edu
Mark (2007) defines evaluation context as the circumstances within which evaluation occurs and identifies systematically assessing the effect of context on evaluation practice as one focus in research on evaluation. This presentation will provide an introduction to the contextual piece of Mark’s (2007) framework. Examples will be given to illustrate research on evaluation context, such as Azzam’s (2010) study on evaluator responsiveness to stakeholder opinions about an evaluation’s design. Ideas for future research on evaluation context will be empirically derived from a study the presenter conducted, in which a needs assessment/interest survey was sent out to AEA members in fall 2009 and over 300 respondents suggested topics and questions related to evaluation context. The major themes and specific questions will be presented to offer research avenues on evaluation context so that we may improve our practice as well as help answer the calls for more research on evaluation.
Research on Evaluation Consequences: A Meta-analysis of Evaluation Use
Mark Hansen, University of California, Los Angeles, markhansen@ucla.edu
Anne Vo, University of California, Los Angeles, annevo@ucla.edu
One of the consequences of evaluation that has received the greatest attention among theorists, practitioners, and researchers is evaluation utilization, which has been described as “the way in which an evaluation and information from the evaluation impacts the program that is being evaluated” (Alkin & Taut, 2003). Although there is a rich body of literature on this topic, there have been only a few efforts to synthesize this work (e.g., Cousins & Leithwood, 1986; Johnson et al., 2009). Such reviews have been enormously helpful in summarizing the research and generating insights concerning the conditions that contribute to use. However, a quantitative synthesis has not yet been conducted. Here, we seek to address this gap through a meta-analysis of studies that examined the relationship between evaluation characteristics and perceived usefulness. We describe the results of this synthesis and discuss the usefulness of meta-analysis as a tool for research on evaluation.
Research on Professional Issues in Evaluation: Next Steps
John LaVelle, Claremont Graduate University, john.lavelle@cgu.edu
The idea of professionalization and professional development is not new, and various professional identity topics have been discussed over the years. These discussions, however, have lacked a unifying framework to help the audience see the relationships among the various professional topics. The presenter will draw from sociology to share Forsyth & Danisiewicz’s (1985) process model of professionalization, which will then be used to frame a discussion of recent research on professional topics. Potential topics include the presenter’s research on the recruitment & selection of evaluators and the preservice preparation of evaluators. The discussion will then be expanded to include topics such as competencies, professional regulation, and public awareness of evaluators.

Session Title: Respecting and Protecting Boundaries: Social Evaluation Competencies
Think Tank Session 234 to be held in Crockett A on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the
Presenter(s):
Phyllis Clay, Albuquerque Public Schools, phyllis.clay@aps.edu
Discussant(s):
Ranjana Damle, Albuquerque Public Schools, damle@aps.edu
River Dunavin, Albuquerque Public Schools, dunivan_r@aps.edu
Debra Heath, Albuquerque Public Schools, heath_d@aps.edu
Nancy Carillo, Albuquerque Public Schools, carrillo_n@aps.edu
Abstract: Have you ever wondered if you’ve over-extended your boundaries and crossed the line into the evaluand’s territory? On the other hand, have you felt taken advantage of by an evaluee who expects you to do just one more small thing at the last minute? The purpose of this think tank is to provide participants with an opportunity to become more aware of their own approach(es) to boundary setting within their evaluation responsibilities and to explore alternatives in a collegial setting. Facilitators will briefly introduce the topic by highlighting situations in their own work in which they have struggled to keep boundaries clear. Groups will form for participants to discuss personal evaluation boundary situations and potential alternatives for protecting our own boundaries and respecting the boundaries of the programs we evaluate as well as those of the people within those programs. Highlights of the group discussions will be reported.

Session Title: Evaluation Utilization and the Story of the Federal Railroad Administration’s 10 Year Research and Development Effort to Change Safety Culture in the United States Rail Industry
Panel Session 235 to be held in Crockett B on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Business and Industry TIG
Chair(s):
Michael Quinn Patton, Utilization-Focused Evaluation, mqpatton@prodigy.net
Discussant(s):
Deborah Bonnet, Fulcrum Corporation, dbonnet@fulcrum-corp.com
Abstract: Over the last 10 years, the Human Factors group within the FRA’s R&D Division has been implementing and evaluating a series of innovative projects to change safety culture in the railroad industry from one focused on blame to one focused on cooperative problem solving. The motivation was to improve safety beyond the limits of what could be achieved through technological and rule-based procedural changes alone. Each of six programs approached the challenge in a different manner, but all focused on safety culture and the engagement process needed to support cooperative problem solving and root cause analysis. The collective finding was that safety culture can be changed, that cooperative problem solving can be done, that root causes can be identified, and ultimately, that safety can be improved. As a result of these findings, the FRA has embarked on a deliberate effort to promote these kinds of programs in the railroad industry.
High Quality Evaluation Utilization as a Method to Improve Safety Culture Change in the United States Railroad Industry: Challenges, Opportunities, Failures, and Success
Michael Coplen, United States Department of Transportation, michael.coplen@dot.gov
Quality program evaluation often begins with an understanding of the context within which the program operates, proceeding with the development of a program logic model. Program performance measures are then identified and implementation activities begin. As the program evolves through its lifecycle, so too do the implementation activities, the data collected and analyzed, and the logic model depicting the program’s theory of change. In this presentation, a program manager from the Federal Railroad Administration’s Office of Research and Development will present a high-level summary of its decade-long program of safety culture change in the rail industry, including the context, initial failures, implementation challenges, and major successes, highlighting both the quantitative and qualitative measures that support quality program evaluation.
Overview of Safety Culture Evaluation Designs
Jonathan Morell, Vector Research Center, jonny.morell@newvectors.net
Joyce Ranney, Volpe National Transportation Systems Center, joyce.ranney@dot.gov
Michael Zuschlag, Volpe National Transportation Systems Center, michael.zuschlag@volpe.dot.gov
This presentation will provide an overview of the methodologies employed in each of the evaluations of the six programs that comprise the FRA’s efforts to date. Issues covered will include logic models and design (time series and control groups), stakeholder and key informant interviewing, case study analyses, and the interplay of qualitative and quantitative data. The purpose of the presentation is to provide an understanding of the designs, in preparation for the next paper, which will present results aggregated across all of the evaluations.
Empirical Findings of Safety Culture Initiatives: Data Sources, Analyses, and Findings
Joyce Ranney, Volpe National Transportation Systems Center, joyce.ranney@dot.gov
Michael Zuschlag, Volpe National Transportation Systems Center, michael.zuschlag@volpe.dot.gov
Jonathan Morell, Vector Research Center, jonny.morell@newvectors.net
This presentation will provide an overview of the data that were collected, how they were analyzed, and the results that were observed. Safety data collected include injuries, derailments, and locomotive engineer decertifications. Survey data on safety culture were also collected and analyzed. Formative and summative interviews were conducted and analyzed at all of the demonstration pilot sites. Significant improvements were observed in all of the data sets just mentioned.
Promoting Industry-wide Impact: Safety Culture Policy, Current Accomplishments, and Plans for the Future
Stephanie Morrow, Volpe National Transportation Systems Center, stephanie.morrow@dot.gov
Michael Coplen, United States Department of Transportation, michael.coplen@dot.gov
Joyce Ranney, Volpe National Transportation Systems Center, joyce.ranney@dot.gov
One of the unanticipated outcomes of these demonstration pilots was that, even though the diffusion of these efforts was largely informal, they began to garner broad industry support. The railroad carrier where one demonstration pilot was located launched a system-wide intervention similar to the CSA method. The national unions and carrier management all received awards from FRA commending their participation. Anecdotal evidence suggested acceptance and a general readiness for change within the industry. As a result, the development of a systematic engagement plan is underway that will include all industry stakeholders. This engagement plan aims to distribute the methods more broadly across the industry while continuing to evaluate the changes that have occurred as a result of these programs, including policy changes at industry and organizational levels. This presentation will outline the plan and invite suggestions for making the next ten years of rail safety culture improvements even more successful.

Session Title: Assessing Impacts in Real World Evaluations: Alternatives to the Conventional Statistical Counterfactual
Think Tank Session 236 to be held in Crockett C on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the International and Cross-cultural Evaluation TIG
Presenter(s):
Michael Bamberger, Independent Consultant, jmichaelbamberger@gmail.com
Discussant(s):
Jim Rugh, Independent Consultant, jimrugh@mindspring.com
Megan Steinke, Save the Children, msteinke@savechildren.org
J Bradley Cousins, University of Ottawa, bcousins@uottawa.ca
Abstract: Only a small fraction of program evaluations can estimate impacts using a statistically defined counterfactual. However, it is widely recognized that the absence of a methodology for defining and testing alternative possible explanations (rival hypotheses) of the observed changes in the project population increases the risk of biased or unreliable estimates of project effects. So what advice can we offer evaluators on alternatives to the conventional statistical counterfactual? The proposed session is a follow-up to a 2009 AEA think tank, attended by over 50 participants, in which a range of quantitative, mixed-methods, and theory-based approaches to defining alternative counterfactuals was identified, based on participants’ own experience in the field. There has been active follow-up, resulting in the documentation of these alternative approaches. The 2010 think tank will build on the approaches and challenges identified in Orlando and will explore methodological questions relating to these innovative approaches.

Session Title: Evaluation and Quality: Examples From Government
Multipaper Session 237 to be held in Crockett D on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Government Evaluation TIG
Chair(s):
Sam Held, Oak Ridge Institute for Science and Education, sam.held@orau.org
The Long and Winding Road: How the Integration of Evaluation Performance Measures and Results Can Lead to Better Quality Evaluations
Presenter(s):
Gale Mentzer, University of Toledo, gale.mentzer@utoledo.edu
Abstract: Evaluation plans and logic models depict inputs, activities, outputs, outcomes, outcome indicators, and their accompanying performance measures. When evaluation models are created, the evaluator uses research and theory, experience, and conjecture to determine the most appropriate indicators and data collection methods or performance measures for the stated outcomes. These designs, based on best practices, are well intentioned, but often an evaluator cannot predict precisely what the best manifestation of an outcome is or when and where it might occur. This presentation follows the path that the evaluation plan of a large, federally funded project took as it attempted to uncover why one outcome was not being achieved when all the inputs suggested it should be. It demonstrates that by cross-referencing evaluation findings from other outcomes within the project, the evaluator was able to identify confounding variables and a more effective method of measuring the construct.
The External Reviewer's Role in Helping Promote Evaluation Quality: Examples From the Government Accountability Office's (GAO) Recent Experience
Presenter(s):
Martin de Alteriis, United States Government Accountability Office, dealteriism@gao.gov
Abstract: One way in which the U.S. Government Accountability Office (GAO) improves the quality of federal government evaluation is by reviewing agencies’ evaluation practices, and making recommendations to increase efficiency and effectiveness. While some GAO reviews have been meta-evaluations, most have examined the ways in which the agencies evaluated their own programs or activities. Typically, these reviews took a “good government” perspective, and focused on elements such as evaluation planning, data collection, and the use of results. A few reviews, however, used criteria specific to the evaluation profession; for example, two recent reviews relied on AEA’s Evaluation Policy Taskforce’s (EPTF) criteria for integrating evaluation into program management. The recommendations GAO made were accepted by the majority of the agencies, which subsequently took actions to improve quality. This presentation will discuss and illustrate how GAO reviews agency evaluation practices and the measures GAO recommends to improve quality.
Evaluating Data Quality in the Veterans Health Administration All Employee Survey
Presenter(s):
Katerine Osatuke, United States Department of Veterans Affairs, katerine.osatuke@va.gov
Scott C Moore, United States Department of Veterans Affairs, scott.moore@va.gov
Boris Yanovsky, United States Department of Veterans Affairs, boris.yanovsky@va.gov
Sue R Dyrenforth, United States Department of Veterans Affairs, sue.dyrenforth@va.gov
Abstract: The Veterans Health Administration (VHA) All Employee Survey (AES) is a voluntary annual survey of workplace perceptions (2008: N=164,502, 72.8% response rate; 2004: N=107,576, 51.75% response rate). AES results are included in action plans at VHA facilities, at the regional level, and nationally. The dissemination of results and the implementation of improvements are included in the performance standards for managers and executives. Such broad use of AES results underscores the importance of data quality. We examined data quality issues in two years of the survey, analyzing survey response and item nonresponse rates as a function of respondents’ demographics, selected scores (e.g., satisfaction), and facility-level factors (incentives, organizational complexity). Variation in survey response rates and in rates of unanswered questions and survey breakoffs was unrelated to significant differences in mean survey scores for VHA facilities. Variation in demographics was significantly related to individual-level item nonresponse rates, but effect sizes were small.
Employment and Training Administration: Increased Authority and Accountability Could Improve Evaluation and Research Program
Presenter(s):
Kathleen White, United States Government Accountability Office, whitek@gao.gov
Ashanta Williams, United States Government Accountability Office, williamsa@gao.gov
Abstract: This paper presents findings of the U.S. Government Accountability Office’s (GAO) evaluation of the research structure and processes of the Employment and Training Administration’s (ETA) research and evaluation center at the Department of Labor. Using key elements identified in the American Evaluation Association’s (AEA) Roadmap for a More Effective Government and in the National Research Council’s assessments of federal research and evaluation branches, GAO researchers examined: 1) how ETA’s organizational structure provided for research independence; 2) what steps ETA took to promote transparency and accountability in its research program; and 3) how ETA ensured that its research is relevant to workforce development policy and practice. Overall, GAO found that ETA’s research center lacks independent authority for research, has limitations in its transparency and accountability processes, has not routinely involved stakeholders in developing its research agenda, and has been slow to address key policy issues.

Session Title: Navigating the Intricacies of Culture and Context in International Program Evaluation
Multipaper Session 238 to be held in Republic A on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the International and Cross-cultural Evaluation TIG
Chair(s):
Mary Crave, University of Wisconsin, crave@conted.uwex.edu
Programme Evaluation in the African Context: Negotiating Your Way Through the Political, Language, and Cultural Maze
Presenter(s):
Octavia Mkhabela, Unleash Potential Unlimited, octavia@uphr.co.za
Abstract: Programme evaluation is in the best of times not politically neutral. The African context presents a unique set of challenges brought to bear by the intersection of culture and politics, which in some instances is diametrically opposed to evaluation ethics. Political expediency often leads to the parachuting in of social programmes that are not carefully thought through and planned. The needs of the beneficiaries are often not assessed, and the effects of nuisance variables are not properly anticipated and mitigated. Evaluators in this context have the unenviable task of pointing out these shortcomings at the end of programmes whose impact is negligible due to sometimes poor conceptualisation, inadequate design, and shoddy implementation, in a cultural environment that puts a premium on respect for authority and sometimes unquestioning loyalty to those in positions of authority. It becomes important to bring these shortcomings to the fore while keeping to the ethics of evaluation despite pressure from the commissioners of evaluation. This requires objectivity and sensitivity to avoid discontinuation of programmes meant to benefit the poor.
Twaweza: Evaluating an Ecosystem of Change
Presenter(s):
Gretchen Rossman, Center for International Education, gretchen@educ.umass.edu
Abstract: The Center for International Education (CIE) is serving as the independent evaluation entity for the Twaweza initiative based in Dar es Salaam, Tanzania. Twaweza ("we can make it happen" in Swahili) is a ten-year initiative, funded by a consortium of five donors and hosted by the Dutch development organization Hivos. Its overall goal is to foster citizen-driven change and to empower East African citizens (in Tanzania, Kenya, and Uganda) to advocate for access to and the quality of basic services (particularly basic education, clean water, and health services). The evaluation components consist of national household and facilities surveys and annual case studies throughout Tanzania, Kenya, and Uganda. CIE has formed partnerships with three universities in East Africa to implement the evaluation. This paper reports on the evaluation's conceptual framework and the results of the baseline studies.
Personalizing Outcomes: Feasibility and Utilization of Individual Goal-Setting in an International Context
Presenter(s):
Melissa Velazquez, Christian Foundation for Children and Aging, melissav@cfcausa.org
Abstract: Nonprofits working internationally in development and poverty reduction face complex and dynamic contexts that present challenges to establishing universal outcome measures and traditional needs assessments. Responsive programs call for adaptable, yet credible, ways to engage participants in shaping program development toward the outcomes most relevant to diverse stakeholders. This paper opens a discussion on creatively addressing evaluation and responsive program improvement in an organization where individuals and families are the units and loci of long-term, comprehensive development. Personal program goals set by program participants and monitored over time present a potential complement to geographic or sector-based indicators as markers for measuring program outcomes, assessing fidelity to diverse participant needs, and guiding improvement. The author examines issues of logistical and cultural feasibility of goal-setting and documentation, explores scaled utilization within an international organization, and provides practical insight from an ongoing, multi-country pilot initiative.
Do Evaluation Frameworks Support Equity and Social Justice? Lessons Learned in East Central and South-Eastern Europe
Presenter(s):
Linda E Lee, Proactive Information Services Inc, linda@proactive.mb.ca
Larry K Bremner, Proactive Information Services Inc, larry@proactive.mb.ca
Abstract: The authors have worked on education-related evaluations in numerous countries in Central, Eastern and South-eastern Europe. The paper is based on the evaluations of four programs designed to support marginalized and disenfranchised populations, including Roma youth who experience racism and discrimination and children vulnerable to being trafficked for labour and sexual exploitation. Beginning with an examination of the definitions and benefits of evaluation, the paper then compares the four evaluation frameworks on six dimensions (evaluation focus, purpose/intended benefits, audience, evaluator role, methods, timeframe). The paper then describes the utilization of results and levels of client and beneficiary participation in the evaluation process. The authors address the challenges of creating evaluations that retain their rigour, credibility and quality, while promoting participation and equity. The paper concludes with the authors’ reflections regarding ways to relinquish power and engage communities in creating the evaluation frameworks and approaches that can indeed support equity and social justice.

Session Title: Evaluating Health Program Sustainability: Improving the Quality of Methods and Measures
Think Tank Session 239 to be held in Republic B on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Health Evaluation TIG and the Non-profit and Foundations Evaluation TIG
Presenter(s):
Mary Ann Scheirer, Scheirer Consulting, maryann@scheirerconsulting.com
Abstract: This think tank will discuss methods and measures to evaluate health program sustainability in the context of program life cycles. What happens to programs funded by foundations and governmental entities after their initial funding has ended? We will briefly review the state of recent evaluations addressing sustainability, then facilitate discussion among participants around key topics concerning the quality of sustainability evaluation. What methods are most appropriate and feasible for collecting data about sustainability? What sustainability outcomes should be measured and how can we assess the predictors or facilitators of sustainability? This session will help to further develop the paradigms for addressing the quality of evaluation in this relatively new content area.

Session Title: Using a Practical Lens to Develop National-Level Participatory Projects
Demonstration Session 240 to be held in Republic C on Thursday, Nov 11, 9:15 AM to 10:45 AM
Sponsored by the Collaborative, Participatory & Empowerment Evaluation TIG
Presenter(s):
Tobi Lippin, New Perspectives Consulting Group, tobi@newperspectivesinc.org
Thomas McQuiston, Tony Mazzocchi Center for Health, Safety and Environment, tmcquiston@uswtmc.org
Kristin Bradley-Bull, New Perspectives Consulting Group, kristin@newperspectivesinc.org
Abstract: How can participatory evaluators successfully facilitate high-quality evaluation of programs that are national in scope, including ensuring meaningful participation by frontline program staff and program participants? Join this session for a practical look at some of the key strategies we have developed over a decade of facilitating these kinds of evaluations and assessments with teams composed of labor union staff, rank-and-file workers, and external consultants. We will discuss some of the broader participatory strategies we use and the types of evaluation projects we conduct (national in scope and designed to leverage change at worksite, industry, and national policy levels). Then we will walk through the specifics of how we cultivate and maintain a “representative” team from various areas of the country, often rely on an organizational unit of analysis, and tap additional opportunities to gain broader input during the evaluation process.
