National Institute of Justice. Learning from Demonstration Programs. Washington, DC:
Paper prepared for the U.S. Department of Justice, National Institute of Justice by Abt
Associates Inc. pp. 13-16.
VI. Allocate sufficient funds for an impact evaluation; if controlled experimentation is infeasible, approach less rigorous designs with caution and imagination.
The cornerstone of a demonstration project is an impact evaluation--designed to
determine whether and how well the program works and what might be done to improve its
operation. In order to determine how well a particular strategy works, some basis for
comparison is essential, for the real question is "What difference does the program
make when compared to the status quo or any other feasible approach?" Certainly, the
most rigorous answers will come from experimental designs that call for the random
assignment of subjects, organizations, or target areas to experimental and control groups
that are identical in all respects--except for the application of the test strategy to the
experimental group. Long considered the ideal in evaluation research, random assignment of
equally eligible subjects ensures that any differences in the outcomes of the two groups
can be attributed to the experimental treatment--not to existing differences between
subjects or to chance. While a true experiment is a straightforward evaluation strategy,
maintaining the integrity of an experimental design is hardly a simple matter. Ensuring
that controls are not contaminated (by exposure to project services), avoiding the
so-called Hawthorne effect (whereby experimentals may perform differently merely because
they are under observation), and dealing with the problems of attrition from both groups,
or the elimination or redirection of planned project efforts, are some of the issues that
impose the need for constant vigilance in the execution of an experimental design.
The practicality of implementing a true experiment will vary depending on the nature of
the program and the circumstances of its implementation. Obviously, when a program is
oversubscribed (and there is an excess of potentially eligible targets), conditions may
favor a true experiment, provided that all potentially eligible targets can be subjected
to the same formal and informal screening criteria. If sufficient program capacity exists,
and outright denial of service to controls is a troublesome issue, the experiment might be
structured to delay the participation of control group members. Alternatively, instead of
an untreated control group, eligibles might be randomly assigned to different levels of
treatment. Thus, the surveillance or support services provided by a drug treatment
program, or the intensity of police action in an enforcement program might be
systematically varied for different groups of randomly assigned subjects. In general, in
considering the practicality of a true experiment, it may be wise to recall that if
neither of two treatments is known to be superior, random assignment may not only be
feasible, but the only fair method of allocation.
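The random assignment of eligibles to different levels of treatment can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the function name `assign_randomly`, the subject identifiers, and the three arm labels are all hypothetical, and a real study would also need to document screening criteria and safeguard the assignment list.

```python
import random

def assign_randomly(subject_ids, arms, seed=None):
    """Shuffle eligible subjects, then deal them round-robin into study arms.

    Dealing after a shuffle keeps the arms equal in size (to within one
    subject) while leaving each subject's arm to chance alone.
    """
    rng = random.Random(seed)          # a recorded seed makes the draw auditable
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    assignment = {arm: [] for arm in arms}
    for position, subject in enumerate(shuffled):
        assignment[arms[position % len(arms)]].append(subject)
    return assignment

# Hypothetical pool of 30 subjects who have all passed the same screening.
eligible = [f"subject-{n:03d}" for n in range(1, 31)]

# Hypothetical arms: two treatment intensities plus a delayed-entry group.
arms = ["standard services", "intensive services", "delayed entry"]

groups = assign_randomly(eligible, arms, seed=1987)
for arm, members in groups.items():
    print(f"{arm}: {len(members)} subjects")
```

Recording the seed is one possible way to let an outside reviewer reproduce the allocation and confirm that no one was hand-placed into a group.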
Comparison Groups
When controlled experiments are infeasible, a common alternative is to use comparison
groups or areas selected to have similar characteristics to those of experimental targets.
This form of quasi-experimental approach is often dubbed a queasy-experiment, since any
number of variables unaccounted for in deciding that the comparison group is
"similar" may explain any differences observed between the two groups. This does
not necessarily imply, however, that any design short of a true experiment is a foolhardy
venture. Advances in statistical methodology have made unnecessary the kinds of
"matching" exercises that were prevalent in the early 1970s; the major requirement is
only that a particular characteristic not fall uniquely in one group. While the analytic
problems presented by a comparison group study are undeniably more difficult, with
sufficient creativity in the selection of a comparison group, studious efforts to identify
and collect data about relevant exogenous differences between the two groups, and
application of suitable statistical methods to control for the differences, this type of
design may yield a credible result.
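One simple way to "control for the differences" between a program group and a comparison group is to compare outcomes within strata of a known characteristic and then average the within-stratum differences. The sketch below assumes a single two-level risk characteristic and uses invented counts purely for illustration; the names `counts`, `overall_rate`, and `adjusted_difference` are hypothetical, and a real analysis would usually involve more characteristics and a regression model.

```python
# (group, risk stratum) -> (number of subjects, number rearrested).
# All counts are invented for illustration only.
counts = {
    ("program", "low"): (80, 16),
    ("program", "high"): (20, 10),
    ("comparison", "low"): (40, 8),
    ("comparison", "high"): (60, 30),
}
strata = ["low", "high"]

def overall_rate(group):
    """Rearrest rate for a whole group, ignoring risk level."""
    subjects = sum(counts[(group, s)][0] for s in strata)
    events = sum(counts[(group, s)][1] for s in strata)
    return events / subjects

def stratum_rate(group, stratum):
    """Rearrest rate within one risk stratum of one group."""
    subjects, events = counts[(group, stratum)]
    return events / subjects

def adjusted_difference():
    """Average the within-stratum differences, weighted by the program
    group's own mix of low- and high-risk subjects."""
    program_total = sum(counts[("program", s)][0] for s in strata)
    total = 0.0
    for s in strata:
        # The characteristic must not fall uniquely in one group.
        if counts[("program", s)][0] == 0 or counts[("comparison", s)][0] == 0:
            raise ValueError(f"stratum {s!r} is missing from one group")
        weight = counts[("program", s)][0] / program_total
        total += weight * (stratum_rate("program", s) - stratum_rate("comparison", s))
    return total

raw = overall_rate("program") - overall_rate("comparison")
adjusted = adjusted_difference()
print(f"raw difference in rearrest rate:      {raw:+.2f}")
print(f"adjusted difference in rearrest rate: {adjusted:+.2f}")
```

With these invented counts the program group looks 12 percentage points better in the raw comparison, yet the rates are identical within each risk stratum, so the adjusted difference is zero. The apparent effect comes entirely from the program group containing more low-risk subjects.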
Pre-Post Comparisons
In addition to matched comparison groups, another common alternative to a true
experiment is the use of baseline data to construct before-and-after comparisons.
Typically used to assess the results of introducing new or more intensive law enforcement
policies, some of the more obvious threats to the validity of this design are summarized
below:
Reporting Biases. First, reported
crime statistics are inherently unreliable measures of crimes actually committed not only
because not all crime is reported to the police, but because law enforcement personnel
have some latitude in deciding which crimes are recorded and how they are classified. In
dealing with drug-related crime, this threat is particularly serious, since reported
offenses may depend on the intensity of investigative scrutiny. A report has a greater
chance of being considered unfounded if closely examined than if routinely
processed. This variability has been estimated at as much as ten percent of all
reports. Bias may also be introduced if a successful program increases citizen willingness
to report offenses.
Displacement. Police efforts to
suppress a particular crime may simply displace that crime outside the range of project
(or evaluator) observation. The displacement may be geographic (to other jurisdictions) or
temporal (to other days or
times). Offenders may also change the type of crime they commit or the targets of their
activity, so that street muggers, for instance, may become burglars or bus robbers may
hold up liquor stores.
Regression Toward the Mean. The fact that new enforcement strategies are naturally
applied to areas that are experiencing the highest incidence of the targeted crime poses a
particular measurement problem related to the component of random fluctuation in crime
rates. The geneticist Sir Francis Galton is usually credited with having discovered
"regression toward the mean" when he noticed that tall fathers tended to have sons shorter
than they, and short fathers, sons taller than they. The idea behind this phenomenon is
that members of a population that are chosen on the basis of being far from the average on
some dimension will naturally tend, over time, to fall back toward the population average.
In Galton's case, the population was adult males in a family line. In criminal justice
evaluation, the population might be burglary rates in a certain area over time,
drug-abusing males awaiting trial, or juveniles contacted by police. Suppose, for
instance, that an anti-burglary project allocates its resources by watching monthly
burglary rates in each precinct and goes into a particular precinct when its burglary rate
has risen two months in a row. The project is very gratified to notice that almost
invariably the burglary rate drops during the months of intervention. An analyst might
ask, however, "How do we know that the precincts in which the project operated were
actually at the beginning of a surge in burglary, which the project arrested?" Since we
know that burglary rates fluctuate from month to month within precincts, it could be that
the particular precincts the project entered were simply experiencing random fluctuations
upward in their burglary rates, which would naturally be followed by fluctuations
downward--back toward their mean or average burglary rate.
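The anti-burglary example can be checked with a small simulation in which the project has no effect at all. The sketch below makes strong simplifying assumptions that are not in the text: every precinct has the same constant average rate, monthly rates are independent normal draws around it, and the function name and parameter values are hypothetical.

```python
import random

def simulate_no_effect_program(n_precincts=200, n_months=60,
                               mean_rate=50.0, sd=8.0, seed=0):
    """Return the month-to-month change that follows each 'entry' month.

    A precinct is 'entered' in any month whose burglary rate has risen two
    months in a row. Nothing is done to the precinct: the next month is
    just another random draw around the same long-run average.
    """
    rng = random.Random(seed)
    changes = []
    for _ in range(n_precincts):
        series = [rng.gauss(mean_rate, sd) for _ in range(n_months)]
        for t in range(2, n_months - 1):
            if series[t - 2] < series[t - 1] < series[t]:
                changes.append(series[t + 1] - series[t])
    return changes

changes = simulate_no_effect_program()
share_dropping = sum(change < 0 for change in changes) / len(changes)
mean_change = sum(changes) / len(changes)

print(f"entry months examined: {len(changes)}")
print(f"share followed by a drop: {share_dropping:.2f}")
print(f"average change after entry: {mean_change:+.1f}")
```

Under these assumptions the rate falls after roughly three entries in four, because a month that caps two successive rises is the highest of three draws and the following draw falls below it with probability 3/4. An evaluation that credited the project with those drops would be measuring the selection rule, not the program.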
Duration of Study. The short
period of observation typically afforded by most evaluations makes the problem of random
variation particularly acute. If one is confronted with a total number of drug-related
crimes in 1986, a program introduced in 1987, and a total number of drug-related crimes in
1988, there is little room for separating the effects of the program from year-to-year
variation. In general, the shorter the duration of study, the greater the need for more
observations--specifically monthly or even weekly reported crime counts. A short period of
observation is potentially difficult in another important respect. To the extent that the
effects of an anti-crime strategy change over time, early measurements may be misleading.
Some strategies may only produce results a year or more after their installation. As an
extreme example, refer to the sidebar on the following page, which discusses the time lags
that were involved in processing defendants under the New York State Drug Law passed in
1973. The effectiveness of other strategies may diminish over time as potential offenders
become aware of the new efforts and adapt their modus operandi to minimize the enforcement
threat. As an example, refer to the concluding sidebar, which discusses a major change in
the deployment of transit police to counter subway robbery.
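The value of monthly rather than annual counts can be illustrated with a rough before-and-after comparison. The sketch below uses invented monthly counts and a hypothetical helper, `before_after_summary`; it treats months as independent observations, which ignores seasonality and serial correlation, so it is a first look rather than a full time-series analysis.

```python
from math import sqrt
from statistics import mean, stdev

def before_after_summary(before, after):
    """Return the change in the monthly mean and its rough standard error."""
    difference = mean(after) - mean(before)
    standard_error = sqrt(stdev(before) ** 2 / len(before)
                          + stdev(after) ** 2 / len(after))
    return difference, standard_error

# Invented monthly counts of reported offenses for the year before and
# the year after a program begins (illustration only).
before = [41, 38, 45, 50, 36, 44, 39, 47, 42, 40, 48, 43]
after = [40, 35, 44, 46, 37, 41, 36, 45, 39, 38, 44, 42]

difference, standard_error = before_after_summary(before, after)
print(f"change in monthly mean: {difference:+.2f}")
print(f"rough standard error:   {standard_error:.2f}")

if abs(difference) < 2 * standard_error:
    print("The change is within the range of ordinary month-to-month variation.")
else:
    print("The change is larger than ordinary month-to-month variation suggests.")
```

Two annual totals would show only that the second year was lower. The monthly series also shows how much the counts move on their own, and in this invented case the drop is smaller than twice its standard error.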
Exogenous Influences.
Finally, and most obviously, crime rates show long-term trends that are
almost certainly more dependent on demographic, social and economic influences than on any
police activities. Obviously, then, an increase or decrease in those rates may be
unrelated to the program under review. As with many of the issues discussed in this section,
both statistical and qualitative observation will be required to judge the extent to which
broader social trends adequately explain changes in particular criminal activities.
References cited in the bibliography provide more comprehensive discussion of these and
other issues to be considered in measuring the effects of crime control strategies. The
central point to be made here is the need to anticipate and compensate for these problems
with creative design and analysis plans. Indeed, much of the art of designing evaluations
lies in recognizing all possible confounding threats to the interpretation of results and
selecting measures which (a) minimize the effects of external changes, and (b) allow some
sort of estimate of their possible influence. Generalities are of only limited usefulness
in this task, since the threats to interpretation vary with the situation. As one example,
however, refer to the study of crime on the New York City Subway System described in the
concluding sidebar.
A Note on Evaluation Costs
With careful planning, the monitoring activities discussed in this paper can be
performed as a routine function of the units implementing a demonstration project.
Assistance with a process evaluation can often be obtained from local colleges or
universities where professors who teach in the social or political sciences may welcome
opportunities to give students applied research experience. In the absence of an in-house
research capacity, however, a thorough impact study is likely to require a clearly
identified financial commitment. Exactly how much of a commitment is required will depend
on the nature of the program, the level of ambition reflected in the study design, and the
difficulties involved in data collection and analysis. A modest allocation (for the sake
of illustration, defined as up to $50,000) may be sufficient if the needed data are
readily accessible and clearly usable, the data files are modest in size, and the analysis
plan is relatively straightforward. On the other hand, as Weidman et al. have noted,
"a careful evaluation of a demonstration program can cost more than the program
itself. For example, [a] District of Columbia study of policewomen on patrol comparing the
performance of 86 female with 86 male patrol officers over a year, cost about $300,000 [in
the early 1970s]."*
In considering an investment in this range, it is important to recall that the funds are
not buying a simple yes or no answer indicating whether or not a program works. With
sufficient force (ample funds, manpower, or technology), almost any credible strategy will
have some salutary effects. But the question is not only "Does it work?" but
"Given what we know about how it works, how can the strategy be changed to provide
better results at the same or lower cost?" The challenge that faces demonstration
program planners is to learn enough about strategies funded by the Anti-Drug Abuse Act to
make significant improvements in the nation's capabilities for drug abuse control.
*Donald R. Weidman et al., Intensive Evaluation for Criminal Justice Planning Agencies, p. 7.