National Institute of Justice. Learning from Demonstration Programs. Washington, DC: Paper
prepared for the U.S. Department of Justice, National Institute of Justice by Abt Associates Inc., pp. 13-16.

VI. Allocate sufficient funds for an impact evaluation; if controlled experimentation is infeasible, approach less rigorous designs with caution and imagination.

The cornerstone of a demonstration project is an impact evaluation--designed to determine whether and how well the program works and what might be done to improve its operation. In order to determine how well a particular strategy works, some basis for comparison is essential, for the real question is "What difference does the program make when compared to the status quo or any other feasible approach?" Certainly, the most rigorous answers will come from experimental designs that call for the random assignment of subjects, organizations, or target areas to experimental and control groups that are identical in all respects--except for the application of the test strategy to the experimental group. Long considered the ideal in evaluation research, random assignment of equally eligible subjects ensures that any differences in the outcomes of the two groups can be attributed to the experimental treatment--not to existing differences between subjects or to chance. While a true experiment is a straightforward evaluation strategy, maintaining the integrity of an experimental design is hardly a simple matter. Ensuring that controls are not contaminated (by exposure to project services), avoiding the so-called Hawthorne effect (whereby experimentals may perform differently merely because they are under observation), and dealing with the problems of attrition from both groups, or the elimination or redirection of planned project efforts, are some of the issues that impose the need for constant vigilance in the execution of an experimental design.

The practicality of implementing a true experiment will vary depending on the nature of the program and the circumstances of its implementation. Obviously, when a program is oversubscribed (and there is an excess of potentially eligible targets), conditions may favor a true experiment, provided that all potentially eligible targets can be subjected to the same formal and informal screening criteria. If sufficient program capacity exists, and outright denial of service to controls is a troublesome issue, the experiment might be structured to delay the participation of control group members. Alternatively, instead of an untreated control group, eligibles might be randomly assigned to different levels of treatment. Thus, the surveillance or support services provided by a drug treatment program, or the intensity of police action in an enforcement program, might be systematically varied for different groups of randomly assigned subjects. In general, in considering the practicality of a true experiment, it may be wise to recall that if neither of two treatments is known to be superior, random assignment may not only be feasible, but the only fair method of allocation.
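
As a minimal sketch of the allocation schemes described above, the following Python fragment randomly assigns a pool of equally eligible subjects either to two groups (immediate service and delayed participation) or to several levels of treatment. The function name `random_assign`, the applicant identifiers, and the group labels are hypothetical and chosen only for illustration; the sketch assumes that screening has already been applied uniformly before assignment.

```python
import random


def random_assign(subject_ids, groups, seed=None):
    """Randomly split subjects into groups of (nearly) equal size.

    Hypothetical helper for illustration; not taken from the source paper.
    """
    rng = random.Random(seed)      # a fixed seed makes the assignment reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)          # every ordering is equally likely
    assignment = {group: [] for group in groups}
    # Deal subjects out in turn so group sizes differ by at most one.
    for position, subject in enumerate(shuffled):
        assignment[groups[position % len(groups)]].append(subject)
    return assignment


# Hypothetical pool of screened, equally eligible applicants.
eligible = [f"applicant-{n:03d}" for n in range(1, 41)]

# Oversubscribed program: immediate service versus delayed participation.
two_arm = random_assign(eligible, ["immediate", "delayed"], seed=1987)

# No untreated controls: vary the intensity of supervision instead.
three_level = random_assign(eligible, ["low", "medium", "high"], seed=1987)

for label, members in three_level.items():
    print(label, len(members))
```

Recording the seed alongside the list of eligible subjects is one way to let an outside reviewer confirm that the allocation was not altered after the fact.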

Comparison Groups

When controlled experiments are infeasible, a common alternative is to use comparison groups or areas selected to have similar characteristics to those of experimental targets. This form of quasi-experimental approach is often dubbed a queasy-experiment, since any number of variables unaccounted for in deciding the comparison group is "similar" may explain any differences observed between the two groups. This does not necessarily imply, however, that any design short of a true experiment is a foolhardy venture. Advances in statistical methodology no longer require the kinds of "matching" exercises that were prevalent in the early 1970s; the major requirement is only that a particular characteristic not fall uniquely in one group. While the analytic problems presented by a comparison group study are undeniably more difficult, with sufficient creativity in the selection of a comparison group, studious efforts to identify and collect data about relevant exogenous differences between the two groups, and application of suitable statistical methods to control for the differences, this type of design may yield a credible result.
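
One simple way to "control for the differences" is to compare the two groups within strata of a recorded characteristic and then average the within-stratum differences. The sketch below uses that approach (subclassification), which is only one of many possible adjustment methods and is not necessarily the one the authors had in mind. The function `stratified_difference`, the single stratifying characteristic (prior record), and all of the data are invented for illustration.

```python
from collections import defaultdict


def stratified_difference(records):
    """Estimate a program effect by comparing groups within strata.

    `records` is an iterable of (group, stratum, outcome) tuples, where
    group is "program" or "comparison". Hypothetical helper.
    """
    by_stratum = defaultdict(lambda: {"program": [], "comparison": []})
    for group, stratum, outcome in records:
        by_stratum[stratum][group].append(outcome)

    weighted_sum = 0.0
    total_weight = 0
    for stratum, groups in by_stratum.items():
        # Overlap check: a characteristic must not fall uniquely in one group.
        if not groups["program"] or not groups["comparison"]:
            raise ValueError(f"stratum {stratum!r} appears in only one group")
        mean_program = sum(groups["program"]) / len(groups["program"])
        mean_comparison = sum(groups["comparison"]) / len(groups["comparison"])
        weight = len(groups["program"]) + len(groups["comparison"])
        weighted_sum += weight * (mean_program - mean_comparison)
        total_weight += weight
    return weighted_sum / total_weight


def group_mean(records, group):
    """Unadjusted mean outcome for one group."""
    outcomes = [outcome for g, _, outcome in records if g == group]
    return sum(outcomes) / len(outcomes)


# Invented data: outcome 1 = rearrested, 0 = not rearrested.
records = (
    [("program", "prior record", 1)] * 3
    + [("program", "prior record", 0)] * 3
    + [("program", "no prior record", 0)] * 2
    + [("comparison", "prior record", 1)] * 2
    + [("comparison", "no prior record", 1)] * 3
    + [("comparison", "no prior record", 0)] * 3
)

raw = group_mean(records, "program") - group_mean(records, "comparison")
adjusted = stratified_difference(records)
print(f"raw difference in rearrest rate:        {raw:+.3f}")
print(f"stratified difference in rearrest rate: {adjusted:+.3f}")
```

In this invented data set the program group contains more subjects with prior records, so the raw comparison and the stratified comparison disagree. Stratifying adjusts only for the characteristic that was recorded; differences that were never measured can still bias the result.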

Pre-Post Comparisons

In addition to matched comparison groups, another common alternative to a true experiment is the use of baseline data to construct before-and-after comparisons. Typically used to assess the results of introducing new or more intensive law enforcement policies, some of the more obvious threats to the validity of this design are summarized below:

       Reporting Biases. First, reported crime statistics are inherently unreliable measures of crimes actually committed, not only because not all crime is reported to the police, but also because law enforcement personnel have some latitude in deciding which crimes are recorded and how they are classified. In dealing with drug-related crime, this threat is particularly serious, since reported offenses may depend on the intensity of investigative scrutiny. A report has a greater chance of being considered unfounded if it is closely examined than if it is routinely processed. This variability has been estimated as high as ten percent of all reports. Bias may also be introduced if a successful program increases citizen willingness to report offenses.

       Displacement. Police efforts to suppress a particular crime may simply displace that crime outside the range of project (or evaluator) observation. The displacement may be geographic (to other jurisdictions) or temporal (to other days or times). Offenders may also change the type of crime they commit or the targets of their activity, so that street muggers, for instance, may become burglars, or bus robbers may hold up liquor stores.

       Regression Toward the Mean. The fact that new enforcement strategies are naturally applied to areas that are experiencing the highest incidence of the targeted crime poses a particular measurement problem related to the component of random fluctuation in crime rates. The geneticist Sir Francis Galton is usually credited with having discovered "regression toward the mean" when he noticed that tall fathers tended to have sons shorter than they, and short fathers, sons taller than they. The idea behind this phenomenon is that members of a population which are chosen on the basis of being far from the average on some dimension will naturally tend, over time, to fall back toward the population average. In Galton's case, the population was adult males in a family line. In criminal justice evaluation, the population might be burglary rates in a certain area over time, drug-abusing males awaiting trial, or juveniles contacted by police. Suppose, for instance, that an anti-burglary project allocates its resources by watching monthly burglary rates in each precinct and goes into a particular precinct when its burglary rate has risen two months in a row. The project is very gratified to notice that almost invariably the burglary rate drops during the months of intervention. An analyst might ask, however, "How do we know that the precincts in which the project operated were actually at the beginning of a surge in burglary, which the project arrested?" Since we know that burglary rates fluctuate from month to month within precincts, it could be that the particular precincts the project entered were simply experiencing random fluctuations upward in their burglary rates, which would naturally be followed by fluctuations downward--back toward their mean or average burglary rate.

       Duration of Study. The short period of observation typically afforded by most evaluations makes the problem of random variation particularly acute. If one is confronted with a total number of drug-related crimes in 1986, a program introduced in 1987, and a total number of drug-related crimes in 1988, there is little room for separating the effects of the program from year-to-year variation. In general, the shorter the duration of study, the greater the need for more observations--specifically monthly or even weekly reported crime counts. A short period of observation is potentially difficult in another important respect. To the extent that the effects of an anti-crime strategy change over time, early measurements may be misleading. Some strategies may only produce results a year or more after their installation. As an extreme example, refer to the sidebar on the following page, which discusses the time lags that were involved in processing defendants under the New York State Drug Law passed in 1973. The effectiveness of other strategies may diminish over time as potential offenders become aware of the new efforts and adapt their modus operandi to minimize the enforcement threat. As an example, refer to the concluding sidebar, which discusses a major change in the deployment of transit police to counter subway robbery.

       Exogenous Influences. Finally, and most obviously, crime rates show long-term trends that are almost certainly more dependent on demographic, social and economic influences than on any police activities. An increase or decrease in those rates may therefore be unrelated to the program under review. As with many of the issues discussed in this section, both statistical and qualitative observation will be required to judge the extent to which broader social trends adequately explain changes in particular criminal activities.
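
The regression-toward-the-mean threat described above can be made concrete with a small simulation. The sketch below generates monthly burglary figures for many precincts whose true rate never changes, selects the precincts whose rate rose two months in a row (the selection rule in the anti-burglary example), and counts how often the rate falls in the following month even though no program is operating. The function name, the number of precincts, the average rate, and the size of the random fluctuation are all hypothetical, and the figures are treated as independent draws around a constant mean, which real crime data need not satisfy.

```python
import random


def simulate_precincts(n_precincts=2000, mean_rate=50.0, noise_sd=8.0, seed=0):
    """Simulate monthly burglary figures with a constant mean and no program.

    All parameters are hypothetical. Each precinct's four monthly figures
    are independent draws around the same long-run average.
    """
    rng = random.Random(seed)
    selected = 0
    dropped = 0
    for _ in range(n_precincts):
        months = [rng.gauss(mean_rate, noise_sd) for _ in range(4)]
        # Selection rule from the text: the rate has risen two months in a row.
        if months[0] < months[1] < months[2]:
            selected += 1
            # "Intervention" month: nothing is done, yet the rate often falls.
            if months[3] < months[2]:
                dropped += 1
    return selected, dropped


selected, dropped = simulate_precincts()
print(f"precincts selected after two consecutive rises: {selected}")
print(f"share whose rate then fell with no intervention: {dropped / selected:.0%}")
```

Under these assumptions, roughly three quarters of the selected precincts should show a decline, because the month that triggered selection is by construction the highest of the three observed. A pre-post comparison limited to the selected precincts would credit that decline to the program.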

References cited in the bibliography provide more comprehensive discussion of these and other issues to be considered in measuring the effects of crime control strategies. The central point to be made here is the need to anticipate and compensate for these problems with creative design and analysis plans. Indeed, much of the art of designing evaluations lies in recognizing all possible confounding threats to the interpretation of results and selecting measures which (a) minimize the effects of external changes, and (b) allow some sort of estimate of their possible influence. Generalities are of only limited usefulness in this task, since the threats to interpretation vary with the situation. As one example, however, refer to the study of crime on the New York City Subway System described in the concluding sidebar.

A Note on Evaluation Costs

With careful planning, the monitoring activities discussed in this paper can be performed as a routine function of the units implementing a demonstration project. Assistance with a process evaluation can often be obtained from local colleges or universities where professors who teach in the social or political sciences may welcome opportunities to give students applied research experience. In the absence of an in-house research capacity, however, a thorough impact study is likely to require a clearly identified financial commitment. Exactly how much of a commitment is required will depend on the nature of the program, the level of ambition reflected in the study design, and the difficulties involved in data collection and analysis. A modest allocation (for the sake of illustration, defined as up to $50,000) may be sufficient if the needed data are readily accessible and clearly usable, the data files are modest in size, and the analysis plan is relatively straightforward. On the other hand, as Weidman et al. have noted, "a careful evaluation of a demonstration program can cost more than the program itself. For example, [a] District of Columbia study of policewomen on patrol comparing the performance of 86 female with 86 male patrol officers over a year, cost about $300,000 [in the early 1970s]."*

In considering an investment in this range, it is important to recall that the funds are not buying a simple yes or no answer indicating whether or not a program works. With sufficient force (ample funds, manpower, or technology), almost any credible strategy will have some salutary effects. But the question is not only "Does it work?" but also "Given what we know about how it works, how can the strategy be changed to provide better results at the same or lower cost?" The challenge that faces demonstration program planners is to learn enough about strategies funded by the Anti-Drug Abuse Act to make significant improvements in the nation's capabilities for drug abuse control.

*Donald R. Weidman, et al., Intensive Evaluation for Criminal Justice Planning Agencies, p. 7.