Office of Juvenile Justice and Delinquency Prevention. Evaluating Juvenile Justice Programs: A Design Monograph for State Planners. Washington, DC: Prepared for the U.S. Department of Justice, Office of Juvenile Justice and Delinquency Prevention by Community Research Associates, Inc.; 1989. pp. 33-38.
Measurement Issues
Perhaps there is no more critical issue in evaluation than defining and measuring the variables to be used. The validity of a study depends on appropriate measures of project activities and outcomes. Indeed, the final judgment of the program may depend upon how the program operations are conceptualized and measured. The choice of measures and the design of the evaluation will, to a large degree, determine whether the evaluation is believed by its consumers.
There are no hard and fast rules in the choice of measurement; what is most critical is that the measures are appropriate for the context for which they are intended. For example, in evaluating a juvenile diversion program, one may want to measure a youth's family relationships as a factor that might influence his or her ability to avoid further contact with the court. Obviously, in assessing the impact of a positive peer culture program in a juvenile institution, this type of information would be less relevant. Other considerations involve the definition of measures of success. If recidivism is defined as a police contact, a different rate of project success will be obtained than if it is defined as adjudication or incarceration. None of these measures is wrong; they simply measure different things. Thus it is important to be clear about the meaning of the measures chosen for the evaluation.
There are three categories of measurement integral to any juvenile justice evaluation: measures of program input, program processes, and program outcomes. The discussion of each of these will frame the remainder of this chapter.
Measurement of Program Input
While the popular conception of evaluation focuses upon program outcomes, remember that there is considerable variation in program inputs which can affect the results of the intervention. Often in juvenile justice there are broad program types such as diversion, education, and family therapy. While there are commonalities among programs within each of these categories, programs with a similar overall description may involve distinct intervention strategies with dissimilar clients. For example, two diversion programs may be oriented to reducing the level of commitment to juvenile court. However, one may involve diversion at the police level for status offenses, while the other may involve screening at the court or prosecution stage to divert minor property offenders to a restitution program. Before we can say what works, it is necessary to say what we are doing.
There are several aspects to measuring program input. First, consider how the goals and objectives of the program are translated into practice. What facets are to be emphasized through commitment of resources toward the project objectives? What are the major project activities and how do these relate to the anticipated outcomes? Why and how is the program supposed to work? What are the underlying reasons the intervention is presumed to be effective?
These questions all relate to the theory of the program. Although it is often presumed that theory is irrelevant to juvenile justice practice, nothing could be further from the truth. Indeed, the program theory is a statement of the mechanisms through which the intervention is to work. Most importantly from an evaluation standpoint, it tells us what variables and concepts to measure.
Thus, one of the first tasks of evaluation is to obtain a clear explication of the theory behind the program. What should the program change that will result in reduced delinquency? As a prevention program, is it oriented to improving the individual's self-concept, attachments to family, school performance, or opportunities? Each potential area of program emphasis implies a different causal process designed to reduce delinquency. Although program designers and administrators may not always claim to have employed theory in the creation of the program, theory is implicit in all forms of delinquency intervention. It may remain the evaluator's task to clarify the reasoning behind the intervention.
While it may appear quite straightforward to specify what the program is, e.g., the reduction of probation officer caseload size, specifying the content, what is actually going on, may be more difficult. There may be substantial variation in operational procedures, and among program personnel. The more complex the project and varied the components, the more difficult and crucial this task is.
Why be concerned about theory and program content? After all, isn't the issue to measure the effect and impact of the intervention? True, but if the evaluator doesn't know what the program is, he or she may fail to ask the appropriate questions regarding program impact, the wrong variables may be measured, or appropriate measures may be omitted. Most importantly, the evaluator will not be able to attribute observed changes to program components or activities. Since a principal reason for evaluation is to replicate successful programs, it is imperative to know precisely what was done so that the desired components, procedures, and activities can subsequently be implemented.
Another important virtue of these input measures is that they clarify project activities beyond those presented in the funding application. Often at the time of application the specifics of program operation are not finalized, yet many evaluations use the wording of funding proposals as a reflection of what goes on in the program. If the evaluator simply accepts the program statement as fact, the danger exists of making incorrect statements about its effects. In fact, one could wind up evaluating a program that does not actually exist.
For example, although the planners of a juvenile diversion program may have designed an intervention for youths who commit criminal offenses, the staff operating the program might decide that a more appropriate intervention, given staff resources and expertise, is to divert youths who have family problems and are principally status offenders. Without an understanding of such a shift, the evaluator may draw conclusions about a different form of intervention than the one that actually took place. In the words of Carol Weiss (1972:44), "the evaluator has to discover the reality of the program rather than its illusion."
There are two forms of data which can be considered as input measures: data on the program itself, and data about the program participants. Program-specific data would include the purposes of the program taken from program statements as well as staff descriptions, resource allocation, methods of operation, day-to-day procedures, staffing patterns, location, size of program, management structure, and inter-organizational relationships. In addition, every evaluation should carefully document the content, duration, and intensity of treatment involved in the intervention.
The second type of input data concerns characteristics of program clients. This would include demographic and personal characteristics such as age, gender, education, employment, and family economic status. In addition, depending upon their relevance to the program, theoretical significance, or use in prior research, one may also wish to collect data on the attitudes and perspectives of program participants on a variety of issues that may be related to their performance in the project. These areas might include the youth's relationship to family and peers, attitude about the program, motivation for participation, perception of sanctions and deterrence, social responsibility, and self-concept. While the project may be reasonably expected to alter some of these factors, others, such as gender, are quite impervious to change. It is useful to collect data on these and other control variables to determine the types of clients who are more likely to be successful in the program.
Measuring Intermediate Program Effects
Beyond an accurate reflection of program inputs and content, a thorough evaluation should contain an analysis of the attainment of mid-range goals. Almost every juvenile justice program contains both mid-range and long-range objectives. For example, a juvenile diversion program may have the ultimate objective of keeping youth referred to juvenile court from committing subsequent offenses. But there are probably intermediate steps that are believed to lead to this goal. It may be that those diverted are to make restitution; if so, intermediate measurements need to determine whether the youth in fact do so. While one could collect outcome data on subsequent offenses and proclaim the program a success or failure, this process would obviously be in error if one could not ascertain that this intermediate, and presumably causal, step had taken place.
Although it seems obvious that one cannot say a restitution program was successful unless restitution was made, this type of error is quite common in less obvious situations. A program may involve drug treatment as a method of reducing delinquency. While subsequent delinquency may or may not be affected by program participation, it is imperative to consider the impact that the counseling program has upon drug use independent of delinquent activity. It is certainly conceivable that the program may be effective in reducing drug use even though delinquent activity remains unchanged. On the other hand, it may be that the program has no effect on drug use, and thus any conclusions about the program's impact on delinquency through drug treatment are inappropriate.
These two types of outcomes are referred to as theory failure and program failure. If the program works but the outcome criterion is unchanged, i.e., if drug use is reduced but there is no effect on delinquency, then theory failure has occurred. While the program has achieved the desired intermediate effect, our theory about delinquency being a result of drug use may be flawed. Such a situation would require a reformulation of the theory and restructuring of the intervention.
On the other hand, if the program is not observed to affect the intermediate goal, i.e., if drug use is not affected by program participation, then no conclusions can be made regarding the overall impact of the program on delinquency. The relationship between drug use and delinquency has not been adequately tested. Since drug use has not been altered, any changes in delinquency cannot be attributed to drug use patterns. This is an important distinction because although drug use has not changed, delinquency involvement may have changed. If drug use is not measured, then changes in delinquency may be falsely attributed to the treatment program.
This situation also confirms the necessity for adequate input data. Although the changes in delinquency may not be a result of reduction in drug use, there may be other aspects of the program that have resulted in this change. For example, the establishment of a positive relationship with the counselor may have resulted in delinquency reduction independent of drug use. Having these data can aid in the redesign of the program to focus on relationships that may be more productive in reducing delinquency.
The consideration of intermediate effects has an additional benefit. The statement of intermediate steps forces a clarification of the project, and forms a conceptual model of the processes through which the effects are presumed to be caused. Such explication not only clarifies what is expected to occur, but may serve as a guide for replication and revision of the project after evaluation results have been obtained.
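The distinction between theory failure and program failure can be sketched as a small decision rule. This is only an illustration: the function name and the two yes/no inputs are assumptions made here, and deciding whether drug use or delinquency actually "changed" is itself an analytic judgment that the sketch takes as given.

```python
def classify_result(intermediate_changed: bool, outcome_changed: bool) -> str:
    """Label an evaluation finding (hypothetical helper, not from the monograph).

    intermediate_changed: did the program affect its mid-range goal
        (e.g., was drug use reduced)?
    outcome_changed: did the ultimate outcome change
        (e.g., was delinquency reduced)?
    """
    if intermediate_changed and outcome_changed:
        # Both steps moved as the program theory predicts.
        return "consistent with program theory"
    if intermediate_changed:
        # The program did its job, but the outcome did not follow:
        # the theory linking the two may be flawed.
        return "theory failure"
    if outcome_changed:
        # The outcome moved, but not through the intended mechanism,
        # so the change cannot be attributed to that mechanism.
        return "program failure (outcome change not attributable to the intermediate step)"
    # Neither step moved; the theory has not been adequately tested.
    return "program failure"


# Drug use was reduced, but delinquency was unchanged.
result = classify_result(intermediate_changed=True, outcome_changed=False)
```

The third branch is the case the text warns about: without a measure of drug use, a drop in delinquency could be credited to the treatment even though the treatment never affected drug use.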
Measuring Program Outcomes
Measuring outcomes is popularly viewed as the essence of evaluation. In spite of the critical nature of measuring inputs and intermediate effects, program planners and administrators still need to address the question: "Did it work?" As we have observed, there is often not a direct answer to this question, and in many cases the most straightforward answer may be "It depends." How success is defined and measured will often determine the degree to which the program is viewed as effective.
At first glance determining the success of juvenile justice programs would not appear to be problematic: program participants either commit new offenses or they don't. Unfortunately, program success is generally not so directly determined.
Rather, juvenile justice success is commonly measured through the concept of recidivism. While this concept has a universal meaning of correctional failure, its operational definition is anything but universal. There are a number of dimensions of the concept that must be specified before a working definition is obtained. First, the threshold of recidivism must be established. What are the specific criteria that indicate program failure? Is subsequent police contact sufficient, or is it more appropriate to count arrests? Should there be a formal referral to juvenile court, or must there be a formal adjudication to indicate failure? Some may argue that commitment to an institutional program after release is the appropriate measure of recidivism.
Obviously, the statistics measuring program outcome will be greatly influenced by the criterion chosen. If adjudication is the criterion, then youths who have committed subsequent delinquencies will not be counted as failures unless the system responds with a formal adjudication. On the other hand, if one defines program success as a lack of police contact, then youths who have not committed subsequent delinquent behaviors could be counted as failures, since they may have contact as a result of being known to the police.
In each of these situations the real outcome of the program is the same, but it will appear much different due to the variation in definition. This difference may be substantial. Waldo and Chiricos (1977) in evaluating a work release program noted that program success may vary from 20% to 70% depending on the definition of recidivism.
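To make the effect of the threshold concrete, the sketch below computes a recidivism rate for the same group of youths under each of the definitions discussed above. The record layout, the field names, and the ten sample records are hypothetical and invented purely for illustration; they are not data from any study.

```python
from dataclasses import dataclass


@dataclass
class FollowUpRecord:
    """Follow-up status for one participant (hypothetical fields)."""
    police_contact: bool
    arrest: bool
    court_referral: bool
    adjudication: bool
    commitment: bool


# Candidate thresholds, from closest to the behavior to furthest from it.
THRESHOLDS = ("police_contact", "arrest", "court_referral",
              "adjudication", "commitment")


def recidivism_rates(records):
    """Return the share of participants counted as failures under each threshold."""
    if not records:
        raise ValueError("at least one follow-up record is required")
    n = len(records)
    return {t: sum(getattr(r, t) for r in records) / n for t in THRESHOLDS}


# Ten invented participants: the same behavior, five different "failure" rates.
records = [
    FollowUpRecord(True, True, True, True, True),
    FollowUpRecord(True, True, True, True, False),
    FollowUpRecord(True, True, True, False, False),
    FollowUpRecord(True, True, False, False, False),
    FollowUpRecord(True, False, False, False, False),
    FollowUpRecord(True, False, False, False, False),
] + [FollowUpRecord(False, False, False, False, False) for _ in range(4)]

rates = recidivism_rates(records)
for threshold, rate in rates.items():
    print(f"{threshold:15s} {rate:.0%}")
```

With these invented records the failure rate runs from 60 percent (police contact) down to 10 percent (commitment), even though nothing about the youths' behavior differs between the rows of output.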
In defining recidivism, the evaluator must choose the measure that is most appropriate given the scope and objective of the intervention. Generally, the best measure of recidivism is the one that is closest to the behavior itself, either self-reported delinquency or police contacts. The further a measure is from the individual's behavior, the more it is measuring the influence of organizational behavior and decision-making rather than the commission of delinquent acts. The police decision to refer to court and the court decision regarding adjudication may be influenced by a number of factors other than the youth's behavior. This is most apparent in the use of measures of recidivism involving return to the program. If one is evaluating an institutional program and an area of concern is post-institutional behavior, the focus should be on the individual's performance in the community, not on return to the institution. The offender's return to the institution may be due to a range of factors unrelated to subsequent behavior or delinquency. Thus measures of program return should be avoided in recidivism studies.
Another important question in defining recidivism is how serious delinquent behavior must be to constitute failure. In evaluating an intervention program aimed at violent juvenile offenders, should a subsequent court referral for a status or minor property offense be considered a failure? There are no hard and fast rules to govern this. However, in many of these decisions greater information can be collected at little or no additional cost. Where possible, data should be presented on the type of subsequent offenses rather than forcing a dichotomous success-or-failure decision.
Another issue in defining recidivism is the length of the follow-up period. One correct but somewhat unhelpful maxim is "the longer the better." Longer follow-up periods have the obvious advantage of better testing the lasting effects of the intervention. However, given the need for timely feedback in a public policy environment, long follow-up periods are often not feasible. The time span will also affect the appearance of success. The longer an individual is followed, the more likely we are to discover some wrongdoing (except for the most saintly clients). Thus major differences in the impact of the program may be observed from a 3-6 month follow-up compared to a 2-3 year follow-up period. Another complicating factor involves the need for continuing follow-up in adult records for youths reaching their age of majority. For longer follow-up periods this becomes a critical issue. Generally, in the evaluation of juvenile justice programs, a six-month follow-up would be viewed as minimal, with a one-year period desirable.
One of the most common evaluation errors concerns the use of this follow-up period. A one-year follow-up period means that data on the legal status of each participant during the 12 months following program completion will be collected. The important aspect of this definition is that every person has the same time at risk after the program. Too often, the status of offenders is reviewed as of a certain date, e.g., one year after the program began.
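As a brief illustration of reporting offense type rather than a single success-or-failure figure, the sketch below tabulates the most serious subsequent offense for each participant. The category labels and the ten sample values are hypothetical and chosen only for the example.

```python
from collections import Counter


def offense_breakdown(most_serious_offense):
    """Return {category: (count, share)} for each participant's most serious
    subsequent offense. Categories are whatever labels the evaluator records."""
    total = len(most_serious_offense)
    if total == 0:
        raise ValueError("at least one participant is required")
    counts = Counter(most_serious_offense)
    return {category: (n, n / total) for category, n in counts.items()}


# Ten invented participants from a program aimed at violent offenders.
followup = ["none"] * 6 + ["status"] * 2 + ["minor property", "violent"]

breakdown = offense_breakdown(followup)

# The dichotomous figure hides what the breakdown shows.
dichotomous_failure_rate = sum(1 for o in followup if o != "none") / len(followup)
```

Here the dichotomous rate counts four of ten youths as failures, while the breakdown shows that only one of the four had a subsequent violent offense, which is the behavior the hypothetical program targets.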
Results may then show that after a year of program operation, a certain number of youth have completed the program and a percentage, presumably small, have been rearrested. In this situation some offenders have had lengthy periods at risk while others would have had only a few days in which to fail. While it is not necessary that each participant have the same period of follow-up, it is imperative that the evaluator collect these data and consider them in the analysis.
Although these are the most common problem areas in the definition of recidivism, there are a number of other issues that deserve careful consideration. Included in these are the concerns of revocation policy for those on community supervision status, i.e., technical violations versus new offenses as reasons for failure. Also, recidivism studies are based on the assumption that the intervention will be effective in influencing the participants' delinquent activities. Thus, a complete offense history, including the dates, offense type, and disposition, should be obtained. From this information the evaluator can control for the seriousness of prior delinquent behavior. It is important that the pre- and post-program data be collected from the same source, since different processes and definitions may be used in collecting various data sets. If multiple data sources are used for the pre- and post-program measures, then a finding regarding program effect may actually be an artifact of differences in the manner in which data were collected.
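The time-at-risk problem can be shown with a short sketch that compares a review on a single date with a fixed follow-up window. The function names, the 365-day window, and the six sample participants are assumptions made for illustration only; restricting the analysis to participants with a full window is one simple way of taking time at risk into account, not the only one.

```python
from datetime import date, timedelta


def days_at_risk(completed, review_date):
    """Days between program completion and the review date (never negative)."""
    return max((review_date - completed).days, 0)


def naive_rate(participants, review_date):
    """Share of all completers rearrested by the review date, ignoring time at risk."""
    failed = sum(1 for _, rearrested in participants
                 if rearrested is not None and rearrested <= review_date)
    return failed / len(participants)


def fixed_window_rate(participants, review_date, window_days=365):
    """Share rearrested within window_days of completion, counting only
    participants who have been at risk for the full window."""
    window = timedelta(days=window_days)
    eligible = failures = 0
    for completed, rearrested in participants:
        if completed + window > review_date:
            continue  # not yet followed for the full window
        eligible += 1
        if rearrested is not None and rearrested <= completed + window:
            failures += 1
    return failures / eligible if eligible else None


# (completion date, date of first rearrest or None); all values invented.
participants = [
    (date(2020, 1, 15), date(2020, 6, 1)),
    (date(2020, 3, 1), date(2020, 12, 10)),
    (date(2020, 5, 1), None),
    (date(2020, 7, 1), None),
    (date(2021, 11, 1), None),   # only 60 days at risk at review
    (date(2021, 12, 20), None),  # only 11 days at risk at review
]
review_date = date(2021, 12, 31)

naive = naive_rate(participants, review_date)
fixed = fixed_window_rate(participants, review_date)
```

In this invented example the single-date review reports two failures among six completers, while the fixed one-year window reports two failures among the four youths who were actually at risk for a year. The two recent completers lower the first figure simply because they have had almost no time in which to fail.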
Alternative Measures of Program Impact
Although recidivism is often a measure of outcome, you should not overlook alternative measures of program impact and effectiveness. For example, in measuring the impact of a juvenile diversion program, changes in the level of court referrals would be a valid outcome measure. Similarly in evaluating a community service program, one might measure the hours worked to compute public cost savings, as well as the attitudes and opinions of participants regarding their responsibility to the community.
You would be well advised to create multiple outcome measures for several reasons. First, it is quite rare that the impact of intervention is observed in only one area. Almost all juvenile justice programs have a range of goals and potential effects. Many purport to benefit clients (better treatment), the organization (greater efficiency), as well as the larger community (lower crime). Measuring recidivism alone does not include the multidimensional aspects of these programs. Multiple measures increase the reliability of the evaluation, and may increase the acceptability of the findings, thereby adding to overall validity and credibility.
Second, it's not wise to place all the outcome eggs in one basket. When judgements are being made regarding program continuation, it is better to have a greater amount of information on performance than to simply rely on one measure such as recidivism, which may be greatly influenced by factors beyond the program's control.
Finally, attention should be paid to less tangible measures of program outcome, such as the consensus building and knowledge production aspects mentioned in Chapter Three. A service delivery or client-oriented program, which might appropriately be evaluated using traditional outcome measures, will have other products, or by-products, worth measuring or assessing in a qualitative way. Consider, for example, the LRE program in Chapter Three. Interviewing participants revealed that the police handled youths differently after participating in the LRE program. This was recognized as a valuable outcome, and may be considered both a consensus building and a knowledge production outcome.
A comprehensive juvenile justice program plan may contain other programs for which recidivism or other quantitative measures are inappropriate as evaluation tools. Legislative initiatives and standard-setting programs fall in this category, as does the creation and implementation of policy or issue review boards. These programs are often directly aimed at consensus building and knowledge production, or some other system-oriented (versus client-oriented) goal. Such programs are worthy of and amenable to evaluation, and should get serious consideration. Evaluating them will provide alternative measures of programs and initiatives.