Harrell, A. (n.d.). Evaluation strategies for human services programs. Washington, DC: The Urban Institute.
Assessing Readiness for Evaluation
Evaluability assessment is a systematic procedure for deciding whether program evaluation is justified, feasible, and likely to provide useful information. Questions to be considered in an evaluability assessment include: (3)
Is the program's logic model plausible given the resources available and guidance from the relevant literature? If program goals are unrealistic or the intervention strategies are not well grounded in theory and/or prior evidence, then evaluation is not a good investment.
What kinds of data will be needed, from how many subjects, and what data are likely to be available already? Evaluations should be designed to make maximum use of available data, as long as these data are reliable and valid indicators of important concepts. Available data may include, for example, government statistics, individual and summary agency records and statistics, and information collected by researchers for other studies. If crucial data needs cannot be met with existing data, resources must be available to collect the requisite new data.
Are adequate resources and assets available: money, time, expertise, and community and government support? Are there any factors that limit or constrain access to these resources?
Can the evaluation be achieved in a time frame that will permit the findings to be useful in making program and policy decisions by federal, state, and local officials?
To what extent does evaluation information already exist somewhere on the same or a closely related intervention? The answer to this question can have important implications for action. Any successful previous attempts may yield promising models for replication. Lessons learned from previous unsuccessful attempts may inform the current effort. If sufficient evidence already exists from previous efforts, the value of a new evaluation may be marginal.
To what extent are the findings from an evaluation likely to be generalizable to other communities, and therefore useful in assessing whether the program should be expanded to other settings or areas? Are there unique characteristics of the projects to be evaluated that might not apply to most other projects? Program characteristics that are not generalizable reduce the value of any findings.
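The question of "from how many subjects" can often be roughed out during an evaluability assessment with a standard power calculation. The sketch below is our illustration, not a method from the source: it assumes two equal-sized groups, a continuous outcome, and a two-sided test of the difference in means using the normal approximation. The function name `subjects_per_group` and its defaults (5% significance level, 80% power) are hypothetical, conventional choices.

```python
import math
from statistics import NormalDist


def subjects_per_group(min_effect, outcome_sd, alpha=0.05, power=0.80):
    """Approximate number of subjects needed in EACH of two equal groups.

    Hypothetical helper. Assumes a continuous outcome, a two-sided test of a
    difference in means, and the normal approximation:
        n = 2 * ((z_{1 - alpha/2} + z_{power}) * sd / effect) ** 2
    `min_effect` is the smallest program impact worth detecting, in the
    outcome's own units; `outcome_sd` is the outcome's standard deviation.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = std_normal.inv_cdf(power)          # about 0.84 for 80% power
    n = 2 * ((z_alpha + z_power) * outcome_sd / min_effect) ** 2
    return math.ceil(n)  # round up: subjects come in whole numbers


# Hypothetical planning figures: an outcome with a standard deviation of 10 points.
print(subjects_per_group(min_effect=5, outcome_sd=10))  # 63
print(subjects_per_group(min_effect=2, outcome_sd=10))  # 393
```

Because the required sample grows with the inverse square of the smallest effect worth detecting, halving that effect roughly quadruples the number of subjects, which can decide whether an evaluation is feasible with the resources available.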
Selecting an Evaluation Design
Selection of the evaluation design follows the systematic consideration of these questions. As noted, there are four major types of evaluation: impact, performance monitoring, process, and cost. We discuss each in turn.
Impact Evaluation Designs
Three designs are possible for impact evaluations: experimental, quasi-experimental, and non-experimental. All three share the strategy of comparing program outcomes with some measure of what would have happened without the program. Experimental designs are the most powerful and produce the strongest evidence. They are not always possible, however, in which case one of the other two alternatives must be chosen. (A later section discusses how to make the choice.)
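The shared logic of these designs can be written as a simple difference (the notation here is ours, not the source's): the estimated impact is the average outcome observed for program participants minus an estimate of the average outcome they would have had without the program. The designs differ mainly in how convincingly that second, unobservable term can be estimated; in an experimental design, it is the mean outcome of a randomly assigned control group.

```latex
\widehat{\text{Impact}} = \bar{Y}_{\text{program}} - \widehat{\bar{Y}}_{\text{no program}}
```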
EXPERIMENTAL DESIGNS
Key elements. Experimental designs are considered the "gold standard" in impact evaluation. Experiments require that individuals or groups, such as classrooms or schools, be assigned at random (by the flip of a coin or an equivalent randomizing procedure) to two or more groups before services begin. The "treatment" group or groups are designated to receive particular services designed to achieve clearly specified outcomes. If multiple treatment groups are designated, their outcomes may be compared with one another to estimate the relative impact of the different services, or each may be compared with a control group to estimate its impact. A "control" group receives no services. Treatment group outcomes are compared with control group outcomes to estimate impact. Because chance alone determines who receives the program services, the groups can be assumed to be similar on all characteristics that might affect the outcome measures except the program. Any differences in outcomes between the treatment and control groups can therefore be attributed with confidence to the program.
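The assignment-and-comparison logic of an experiment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure from the source: the function names `randomly_assign` and `estimate_impact` are hypothetical, a single treatment group is compared with a single control group, and the randomizing procedure shuffles the participant list and splits it in half (equivalent in spirit to a coin flip, while keeping the two groups the same size). The impact estimate is the simple difference in mean outcomes.

```python
import random
from statistics import mean


def randomly_assign(participants, seed=None):
    """Randomly split participants into equal-sized treatment and control groups.

    Hypothetical helper: shuffling and halving is one randomizing procedure
    equivalent to a coin flip per person. With an odd number of participants,
    the control group gets the extra person.
    """
    people = list(participants)
    rng = random.Random(seed)  # seeded so the assignment can be reproduced and audited
    rng.shuffle(people)
    half = len(people) // 2
    return {"treatment": people[:half], "control": people[half:]}


def estimate_impact(outcomes, groups):
    """Estimate impact as mean(treatment outcomes) - mean(control outcomes).

    `outcomes` maps each participant to an outcome measured after services.
    """
    treated = [outcomes[p] for p in groups["treatment"]]
    controls = [outcomes[p] for p in groups["control"]]
    return mean(treated) - mean(controls)


if __name__ == "__main__":
    # Simulated (made-up) data: outcomes average 50 with random noise, and the
    # program adds 5 points for anyone assigned to the treatment group.
    groups = randomly_assign(range(2000), seed=7)
    treated_ids = set(groups["treatment"])
    noise = random.Random(42)
    outcomes = {
        p: noise.gauss(50, 10) + (5 if p in treated_ids else 0)
        for p in range(2000)
    }
    # Because assignment was random, the estimate should land close to 5.
    print(f"Estimated impact: {estimate_impact(outcomes, groups):.2f}")
```

Seeding the random number generator makes the assignment reproducible, so others can verify that it really was left to chance; in a real study the outcomes would be measured after services rather than simulated.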