OJJDP
Office of Juvenile Justice and Delinquency Prevention
Evaluating Juvenile Justice Programs: A Design Monograph for State Planners
Technical Reference and Information Series
Contents
Chapter 1: Purpose and Overview
Benefits to the Reader
Monograph Organization
Monograph Guide
Chapter 2: Getting Started
Why Evaluate?
The Logic of Evaluation
Chapter 3: Approaches to Evaluation
Introduction
Choosing Programs for Evaluation
The Time Frame for Evaluation
Scope of the Evaluation
Varieties of Outcome Measures
Knowledge Production Outcomes
Consensus Building Outcomes
Instrumental Outcomes
A Typology of Evaluation Levels
Dimensions of the Typology
The Law Related Education Program
Basic Monitoring
Comparative Process Evaluation
Basic Outcome Evaluation
Comparative Outcome Evaluation
Summary
Chapter 4: Critical Issues
Introduction
Measurement Issues
Measurement of Program Input
Measuring Intermediate Program Effects
Measuring Program Outcomes
Alternative Measures of Program Impact
Threats to Validity
Threats to Internal Validity
Maturation Effects
History Effects
Selection Effects
Mortality Threats
Threats to External Validity
Special Topics in Program Evaluation
Sources of Data
Official Sources of Data
Self-Reported Data
Interviews and Questionnaires
Surveys
Basic Guidelines for the Development of Survey Items
Use of Random Assignment
The Use of Observation in Program Evaluation
Common Errors in Evaluation
The Evaluation Imagination
Using Evaluation: Audiences and Products
Chapter 5: Other Considerations
Introduction
Building Evaluation Into a Program RFP
Preparing an Evaluation RFP
Objectives of the RFP
General Information for Applicants
Specifications
Information Required in the Proposal
Evaluation and Selection
Choosing an Evaluator
Inside Versus Outside Evaluators
Recognizing a Good Evaluator
Chapter 6: References and Resources
Juvenile Justice Program Evaluation Resources
Finding Agencies and Organizations
Related Publications
Evaluation Issues
Survey and Research Design
General Statistics and Guides to Statistical Programs for the Computer
Journals and Periodicals
Monograph References
Glossary
Index
Evaluating Juvenile Justice Programs
A Design Monograph for State Planners
Prepared by
James R. "Chip" Coldren, Jr.
Timothy Bynum
Joe Thome, Project Coordinator
Community Research Associates, Inc.
115 North Neil, Suite 302
Champaign, IL 61820
August, 1989
June, 1991 (Second Printing)
This monograph was originally published by Community Research Associates under contract with the U.S. Department of Justice, Office of Juvenile Justice and Delinquency Prevention, contract number OJP-85-C-007, and has been reprinted by the Criminal Justice Statistics Association under a grant from the Bureau of Justice Assistance, grant number 90-DD-CX-K002. Opinions stated herein are those of the authors and do not necessarily represent the official position of OJJDP, BJA, CRA, or CJSA.
The Assistant Attorney General, Office of Justice Programs, coordinates the criminal and juvenile justice activities of the following program offices and bureaus: National Institute of Justice, Bureau of Justice Assistance, Bureau of Justice Statistics, Office of Juvenile Justice and Delinquency Prevention, and Office for Victims of Crime.
PREFACE
Quality and efficiency in programming for young people are priorities for the Office of Juvenile Justice and Delinquency Prevention. The evaluation of programs designed to assist troubled youth and youth at risk is therefore a primary concern as well.
Recently, OJJDP conducted a survey to assess State involvement in evaluating programs funded under the Juvenile Justice and Delinquency Prevention Act's Formula Grants Program. The Office found great interest in evaluation, a strong indication that federal assistance would be extremely helpful in improving quality and consistency in program development.
This monograph represents one aspect of OJJDP's effort to assist the states with program evaluation. We share the states' concern that programs be responsive to the needs of young people and be designed to make efficient use of the dollars allocated to address youths' needs. Sound evaluation research will help planners identify exemplary programs and share their findings with OJJDP and others interested in addressing similar problems.
Accountability, effectiveness, quality control, and the ability to solve problems are all benefits of evaluation. With those goals in mind, programs aimed at helping young people should continue to improve, and this monograph is intended to help fulfill that mission.
ACKNOWLEDGMENTS
The assembly of Evaluating Juvenile Justice Programs: A Design Monograph for State Planners would not have been possible without the valuable assistance of many contributors. Deborah Wysinger of the Office of Juvenile Justice and Delinquency Prevention's (OJJDP) State Relations and Assistance Division (SRAD) headed a Task Team charged with designing and preparing the monograph. She was supported by the following Task Team members: Timothy Bynum, Ph.D., of Michigan State University School of Criminal Justice; James R. "Chip" Coldren, Jr., of the Criminal Justice Statistics Association; Anne Schneider, Ph.D., of Oklahoma State University Department of Political Science; Barbara Seljan, Oregon Juvenile Services Commission; Joe Thome, Community Research Associates; and Ruth Williams, Pennsylvania Commission on Crime and Delinquency.
Additional editorial comments and suggestions were provided by Russ Carpenter, Russ Carpenter Associates; Terry Edwards, New Jersey State Law Enforcement Planning Agency; and Cheryl McNair, Oklahoma Commission on Children and Youth.
The assistance and support of all those involved in the project are greatly appreciated.
Evaluating Juvenile Justice Programs:
A Design Monograph for State Planners
1 Purpose and Overview
"What works?" That simple question seems at times to be the most common query of juvenile justice system administrators and planners. For today's system professionals, who face resource shortfalls daily, the question might also be posed as, "Is my program working?" or "Are our clients getting what they need?"
To answer these types of questions, state juvenile justice specialists and other system administrators must turn to evaluation research. The problems associated with juvenile delinquency are alarmingly complex and intimidating. Doing what is best for the youth of a community and developing programs that take maximum advantage of limited budgets remain priorities, and evaluation research can play an important part in meeting them.
The purpose of this monograph is to offer evaluation strategies to state juvenile justice specialists, state advisory groups, juvenile program administrators, and others interested in learning more about the processes and outcomes produced by Formula Grants projects. It is a primer designed to provide practical advice on creating or enhancing evaluation programs within budgetary, staffing, time, and other administrative constraints.
The Office of Juvenile Justice and Delinquency Prevention (OJJDP) State Relations and Assistance Division (SRAD) sponsored this monograph in response to a 1988 survey of juvenile justice specialists that revealed a strong interest in how evaluation research can provide insight into program activities and outcomes. Respondents asked for assistance in developing and/or refining their evaluation processes.
The U.S. Congress formalized its interest in learning more about the characteristics of state and local programs created with Juvenile Justice and Delinquency Prevention (JJDP) Act Formula Grants Program funds by directing OJJDP to assist state efforts at evaluating, replicating, and marketing JJDP Act programs. Specifically, the 1984 amended version of the JJDP Act states that the Act is intended "to provide for thorough and ongoing evaluation of all federally assisted juvenile delinquency prevention programs" [Sec. 102(a)(1)].
Furthermore, each state that participates in the Act's Formula Grants Program is required to submit a three-year comprehensive plan, with annual updates, to OJJDP. Among its other requirements, the plan details how the state will "provide for the development of an adequate research, training, and evaluation capability within the State" [Section 223(a)(11)].
For these reasons, OJJDP's State Relations and Assistance Division (SRAD) commissioned this evaluation monograph.
Benefits to the Reader
This monograph serves as an evaluation primer for State Planning Agency (SPA) staff and for the agencies with which they contract for criminal justice program evaluation. While the OJJDP evaluation survey indicates that juvenile justice specialists are generally well acquainted with evaluation knowledge and skills, they are not uniformly supportive of evaluation activities, and some do not feel they are in a position to conduct meaningful evaluations. Based on these findings, a less academic and more practical approach to juvenile justice program evaluation issues was judged to be warranted.
This monograph is intended to provide an array of benefits to the reader:
Although this manual covers dozens of subjects, it is not comprehensive. No single document can cover all there is to understand about evaluation research, and this monograph is no exception. There are dozens of excellent reference books and articles on the topics included in this monograph (see the final chapter for examples), and the Office's State Relations and Assistance Division provides periodic training on the topic. Additionally, OJJDP's technical assistance programs exist to help persons interested in pursuing some aspects of evaluation. The monograph is therefore intended to be a primer on the topic of evaluation. The best way to use it is to supplement the information with additional readings or with the assistance made available by OJJDP.
Monograph Organization
The monograph is divided into seven major sections. The remainder of this introductory chapter provides a reference guide to issues and sections which comprise the majority of the monograph. Chapter Two reviews the incentives and logic of evaluation. Chapter Three discusses approaches to evaluation. Chapter Four addresses issues central to conducting accurate evaluations and reviews their uses. Chapter Five focuses on issues related to organizing or establishing an evaluation program. Chapter Six offers references and resources for those interested in further reading.
The monograph format allows the reader to identify and select for review only those chapters or sections which address issues of greatest concern. Where appropriate, sidebars highlight examples of techniques described in the text or further define or explain specific concepts or methods. Definitions are highlighted on the page where an issue or topic is introduced.
Monograph Guide
Because the monograph contains information, tips, and practical advice on dozens of topics, a handy reference guide is presented here. For each of Chapters Two through Five, important points are highlighted and references to the relevant pages are provided in parentheses. When the reference guide is used in conjunction with the index at the back of the monograph, selecting and finding special topics should be quite easy.
Why Evaluate? (Chapter 2)
(1) There are a number of sound reasons to develop a formal evaluation process. A few examples include
(2) Developing a flowchart of the state juvenile justice program and goals is crucial to understanding the role which evaluation can play.
Approaches to Evaluation (Chapter 3)
(1) The following factors should influence the choice of programs selected for evaluation.
(2) Two factors relating to the time frame for evaluation will also influence the decision regarding which program(s) to evaluate.
(3) A good evaluator will first ask, "What do I want to learn?" before embarking on an evaluation project. Careful consideration of this question and its ramifications will facilitate research design and prevent problems from arising in mid-evaluation.
(4) The distinction among three classes of evaluation (program monitoring, process evaluation, and outcome evaluation) is clarified.
(5) Three types of outcome measures are identified.
(6) Six types of evaluation studies are identified; each combines a level of evaluation with a comparative perspective. The six types are program-only and program/comparison versions of monitoring, process evaluation, and outcome evaluation.
This typology is explained using a juvenile justice Law Related Education evaluation example.
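The six study types in item (6) are every pairing of the three levels of evaluation with the two comparative perspectives. As an illustration only (the labels below paraphrase the text and are not an official OJJDP scheme), a few lines of Python can enumerate the grid, which a planner might use as a checklist when classifying proposed evaluations:

```python
from itertools import product

# The three levels of evaluation named in the monograph guide.
LEVELS = ["monitoring", "process evaluation", "outcome evaluation"]
# The two comparative perspectives: the program alone, or the program plus a comparison group.
PERSPECTIVES = ["program-only", "program/comparison"]


def evaluation_types():
    """Return one label for each (level, perspective) combination."""
    return [f"{perspective} {level}" for level, perspective in product(LEVELS, PERSPECTIVES)]


for label in evaluation_types():
    print(label)
```

Three levels times two perspectives gives the six types; adding a level or a perspective would expand the grid accordingly.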
Critical Issues (Chapter 4)
(1) Measurement in program evaluation covers three general areas: measurement of input, measurement of intermediate program effects, and measurement of program output. Special attention is given to alternative measures of program impact, e.g., savings to other social service areas resulting from a program. Each area carries a special set of concerns regarding the context in which the program operates and the validity of the measures chosen.
(2) The validity issue has two aspects: threats to internal validity and threats to external validity. Threats to internal validity are factors other than program participation that could have produced the program results, such as client maturation, program history, and staff learning curves. Threats to external validity are factors that prevent the evaluation findings from being applied to other programs or clients.
(3) Common sources of data for program evaluators are reviewed, covering official records, self-reported data, interviews and questionnaires, and surveys. Special attention is given to the development and utilization of surveys.
(4) The use of random assignment in evaluation studies, that is, assigning research subjects to treatment/intervention and control groups on a random basis, is also given special attention because it is the best way to resolve threats to internal validity, which is usually the evaluator's chief concern.
(5) A special section is devoted to the use of observation in program evaluations. Periodic visits to programs under evaluation to observe operations firsthand, and perhaps talk to staff and participants, will provide rich anecdotal information that supplements the quantitative data collected.
(6) Cost benefit evaluation issues are reviewed. This includes discussion of who benefits from the program, who incurs the program cost, and enumeration of cost and benefit types such as direct versus indirect, alternative costs, and fixed versus variable program costs.
(7) Three common errors in program evaluation are reviewed.
(8) Imagination is recommended as an important component in any evaluation program. In program evaluation there are always obstacles, problems, and distractions, both political and research-related. The good evaluator learns to be creative in the research design process, perhaps by finding proxy measures for important variables, and in the political process, by finding ways to make evaluation results useful. There is no better resource than experience. The new evaluator must consult others in the field; documented examples of program evaluations abound, as do other resources.
(9) Using Evaluations
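Item (4) refers to assigning research subjects to treatment/intervention and control groups on a random basis. As a minimal sketch, assuming a simple list of eligible youth identifiers and an even split (both hypothetical), the procedure could look like this in Python; the optional seed lets the evaluator document and reproduce the assignment:

```python
import random


def randomly_assign(client_ids, seed=None):
    """Randomly split eligible clients into treatment and control groups.

    client_ids: identifiers for youth eligible for the program (hypothetical).
    seed: optional value recorded so the assignment can be reproduced and audited.
    """
    rng = random.Random(seed)
    pool = list(client_ids)  # copy so the caller's list is left unchanged
    rng.shuffle(pool)
    half = len(pool) // 2
    control = pool[:half]
    treatment = pool[half:]  # with an odd count, the extra client goes to treatment
    return treatment, control


ids = [f"Y{n:03d}" for n in range(1, 21)]  # 20 hypothetical youth
treatment, control = randomly_assign(ids, seed=1989)
print(len(treatment), len(control))  # 10 10
```

Because chance alone decides who receives the intervention, characteristics such as age, maturity, or prior record should balance between the groups on average, which is why random assignment is treated as the strongest answer to threats to internal validity.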
Other Considerations (Chapter 5)
Three important topics are covered in this chapter: building evaluation requirements into program RFPs, preparing evaluation RFPs, and choosing evaluators. These sections include advice from experts who have worked in the evaluation field for many years and should be considered guidelines that will help planners and juvenile justice specialists make good choices.
(1) Building evaluation into a program RFP. This is a good way to publicly and firmly state your intent to evaluate. The following suggestions for inclusion in juvenile justice program RFPs are offered.
(2) Preparing an evaluation Request for Proposals (RFP). A good RFP for evaluation of a juvenile justice program will have five key sections.
Providing this information to potential evaluators, as specifically as possible, will help ensure that the evaluation products meet your needs.
(3) Choosing an evaluator. This can be a frustrating experience. This section reviews the pros and cons of employing outside evaluators, and provides hints for recognizing talent. Good evaluators have three basic characteristics.
NOTES
1. The juvenile justice specialist is the staff person for the state agency charged with administering that state's participation in the Juvenile Justice and Delinquency Prevention Act's Formula Grants Program.
2. See the Survey of State Planning Agency Evaluation Capabilities, prepared for the Office of Juvenile Justice and Delinquency Prevention by Community Research Associates, Inc., 1988.
2 Getting Started: Incentives and Early Steps
Why Evaluate?
Program administrators evaluate on a daily basis. Decisions to develop a new policy and procedures manual, hire new staff, purchase new computer equipment, or search for program alternatives all occur because a program manager believes there is a problem which must be addressed: in essence, the program is not accomplishing all that it was designed to accomplish. Evaluation research techniques can help the juvenile justice program manager make those decisions, but they may be underutilized.
Evaluation research can take a variety of forms, cover a range of issues and activities, and be conducted in response to diverse concerns. One should not assume that the only meaningful evaluation is one that is time consuming or involves sophisticated data collection, nor think of evaluation only in terms of traditional, restrictive definitions. As long as the results are meaningful and accurate, evaluation designs can encompass a variety of formats and approaches.
Regardless of the methods, the purpose of evaluating a program remains basically consistent: to provide timely and useful information for decision-making by staff, funders, and others. Is a program making progress, or creating unintended problems? Is a particular program cost effective? Could a program improve if certain characteristics were changed? Are the appropriate clients being referred to the program? Should the program be continued into a future funding cycle? Depending on the resources available to an evaluator and the design of the evaluation program, questions such as these and many others can be answered.
Evaluation research should address concerns of effect as well as concerns of understanding. A juvenile justice program evaluation might be conducted to measure outcomes, determining as accurately as possible what happened as the result of implementing a particular program. Such research might also be conducted to enhance our understanding of the relationships among the many factors which combine to characterize that program: why do things happen the way they do?
The incentives for building evaluation into program management are numerous. For the juvenile justice specialist, those incentives might include the objectives of the Act, those of the agency administering the Formula Grants Program, those of the state advisory groups, and those of the local program grantees.
For example, the reader may recall that, as participants in the JJDP Act's Formula Grants Program, state planning agencies are obligated to develop an "adequate...evaluation capacity (for) the State" [Sec. 223(a)(11)]. Congress intended that participating states demonstrate some type of evaluation capability. This concern may be directly attributable to the large amount of Formula Grants Program funds distributed since the program's inception. Participating states and territories have received more than $350 million in this decade alone, which justifies congressional interest in the impact that money has had at the state and local levels.
An annual Performance Report of program successes and problems is also a requirement of participation in the JJDP Act [Section 223(a)(22)]. The ability to provide accurate and informative Performance Reports can reflect the extent of a State's formal evaluation activities.
Beyond Congressional and regulatory requirements, there are a number of ways in which evaluation can benefit the SPA, the state advisory group, and local program personnel. Generally, evaluations allow an administrator to:
The incentives and benefits must be weighed against the perceived or actual problems with evaluations.
The survey of State juvenile justice specialists revealed that the following were considered problems which hindered attempts to evaluate:
Many others believe, however, that the incentives and benefits override the problems associated with evaluation and that with careful design, many of these problems can be overcome or circumvented. As a result, processes are developed to incorporate data collection and analysis into routine program management. In the end, some gauge of whether a program is accomplishing its objectives is achieved.
As resources allow and the advantages which evaluation can provide are understood, evaluation research increasingly becomes a standardized aspect of program development. Justice system problems, as measured by delinquency rates, probation staff caseloads, training school system overcrowding, and the sophistication of delinquent activity, remain a serious concern. It is, therefore, not adequate to identify a problem, develop a responsive program, and let it run its course. The program must be monitored and evaluated to identify lessons and program characteristics which can be extracted and shared with others.
The Logic of Evaluation
Developing an effective, formalized evaluation program at the state level requires proper preparation. Before anything else is done, an evaluator must prepare a model, or map, of the state's juvenile justice program plan. This model is laid out in a manner similar to an organizational chart or flowchart, but differs in that it is a systematic diagram outlining factors which define the direction of the state evaluation process. The process of developing such a model is termed mapping by evaluators, and the final product is referred to as a system map.
To begin, the map should identify the state's juvenile justice system policies. At the executive level the governor's office and the state advisory group on juvenile justice issues combine to establish state direction and mandates. The state's juvenile justice agency, human service/welfare agency, and corrections agency must work together to apply those policies. The legislature, litigation, and other factors further influence policy direction. The map must account for those directions and mandates.
Identification of state and local goals for service delivery and system improvements is the second factor to be mapped. The goals developed for the system define the types of data to be collected and the methods used to interpret that data. Goal development is discussed at greater length below.
Third, the targets of the state strategy should be identified, preferably through problem definition and goal development. Those targets may consist of a special client population (e.g., status offenders in need of improved services), a part of the system itself (e.g., an improved monitoring capacity or a service provision organization), or legislation.
Finally, the organizations and individuals responsible for implementing the strategy, and the methods for effecting change, must also be identified.
This brief description of a justice system map may sound familiar to state juvenile justice specialists. The map is essentially a diagrammatic portrayal of the three-year state plan and annual plan update submitted to OJJDP as a requirement of participating in the JJDP Act. The format is different, but the groundwork for an evaluation system map has been laid with the development of the annual state plan. A generic illustration of a mapped state system is provided in Exhibit A. Exhibit B illustrates how such a map might look for a hypothetical state. The map should guide all evaluation activities, at least in a general manner. It is also a useful tool for explaining evaluation activities and for ensuring that the process is focused and productive.
The goal setting described as a part of the map is integral to overall evaluation program success, since it is the procedure that leads to the desired outcome. Once the goals have been further defined into specific objectives, they become the criteria against which successes can be measured.
Goals and objectives are quite distinct. A goal is a broad statement of anticipated accomplishment. The statement "To reduce reliance on secure detention" is an example of a broad program goal. This goal could be refined further into a series of measurable objectives. "To decrease the use of secure detention by 50 percent" or "to increase the use of nonsecure alternatives for the supervision of juveniles by 80 percent" are but two examples of specific, measurable objectives.
Goal establishment implies that persons involved in a program or system recognize that shortcomings exist and that steps must be taken to rectify them. Such recognition is usually a product of a needs assessment conducted to locate deficiencies. (The development of a needs assessment is beyond the scope of this monograph. There are many excellent references which detail the specifics of the process; see Chapter Six.) However, it should be remembered that the annual state plan alluded to above is a type of needs assessment and can be a critical tool for problem identification.
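Because an objective such as "to decrease the use of secure detention by 50 percent" names both a direction and a size, checking it is simple arithmetic once a baseline count is known. The Python sketch below uses invented admission counts purely for illustration; the function names and the signed-target convention are assumptions, not part of any OJJDP reporting format:

```python
def percent_change(baseline, current):
    """Percent change from a baseline count (negative values are decreases)."""
    if baseline == 0:
        raise ValueError("baseline must be nonzero")
    return (current - baseline) * 100 / baseline


def objective_met(baseline, current, target_pct):
    """Check a signed percentage objective against observed counts.

    target_pct = -50 reads "decrease by 50 percent";
    target_pct = 80 reads "increase by 80 percent".
    """
    change = percent_change(baseline, current)
    if target_pct < 0:
        return change <= target_pct
    return change >= target_pct


# Invented counts, for illustration only.
print(percent_change(400, 220))       # -45.0
print(objective_met(400, 220, -50))   # False: a 45 percent drop misses the 50 percent target
print(objective_met(150, 280, 80))    # True: roughly an 87 percent increase
```

The point of the example is that a goal alone ("reduce reliance on secure detention") cannot be checked this way, while each measurable objective can.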
The process for goal development can be as sophisticated as time allows. The ideal would be to interview or survey all groups of system professionals to identify their objectives. The data obtained could be ranked and organized according to issues of greatest concern, and thus establish priorities.
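One simple way to rank survey data "according to issues of greatest concern" is to count how many respondents named each issue. The Python sketch below assumes each respondent lists the issues they consider most pressing; the issue labels and responses are invented, and a frequency count is only one possible ranking rule (weighted or rank-order scoring are alternatives):

```python
from collections import Counter


def rank_priorities(responses):
    """Rank issues by how many respondents named them.

    responses: one list of issue labels per respondent (invented data below).
    Each issue counts once per respondent; ties are broken alphabetically.
    """
    counts = Counter()
    for issues in responses:
        counts.update(set(issues))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


survey = [
    ["training school overcrowding", "status offender services"],
    ["training school overcrowding", "probation caseloads"],
    ["status offender services", "training school overcrowding"],
    ["probation caseloads"],
]
for issue, n in rank_priorities(survey):
    print(n, issue)
```

With these responses, training school overcrowding ranks first (named by three respondents), followed by the two issues named by two respondents each.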
At the other end of the spectrum, the evaluator can simply examine existing plans or needs assessments to identify goals and establish priorities. This goal definition procedure saves time, but it does not allow for detailed input from all persons or groups affected.
Goal development is important not only to define what officials expect to occur through a program initiative, but also to offer the applicant, grantee, or evaluator some definition of what a program is intended to accomplish.
Both the State (represented in the Formula Grants Program by the Juvenile Justice Specialist) and the grantee (local program administrator) have responsibilities in the evaluation process. For the State, it is the clear delineation of program goals: what it hopes to accomplish by funding an initiative. For the grantee, it is to ensure that the program moves toward those goals and that sincere attempts are made to achieve the desired effects. The evaluation process becomes a central tool for assessing effects, so a shared understanding of program goals between the State and the grantee must be developed early in the funding and development process.
Exhibit A
Generic Design for Mapping a Juvenile Justice Strategy
| AREA | Title 1 | Title 2 | Title 3 | Title 4 | Title n |
|---|---|---|---|---|---|
| PROGRAMS | Program A, Program B | Program A, Program B | Program A, Program B | Program A, Program B | Program A, Program B |
| PROGRAM GOALS AND OBJECTIVES | Goal 1, Goal 2 | Goal 1, Goal 2 | Goal 1, Goal 2 | Goal 1, Goal 2 | Goal 1, Goal 2 |
| | Objective a, Objective b | Objective a, Objective b | Objective a, Objective b | Objective a, Objective b | Objective a, Objective b |
| TARGETS | Target a, Target b | Target a, Target b | Target a, Target b | Target a, Target b | Target a, Target b |
| AGENTS | Agent a, Agent b | Agent a, Agent b | Agent a, Agent b | Agent a, Agent b | Agent a, Agent b |
| METHODS | Method a, Method b | Method a, Method b | Method a, Method b | Method a, Method b | Method a, Method b |

(AREA: Each area should be listed by title, e.g., Legislation, Corrections.)
(PROGRAMS: Each program within a strategy should be listed.)
(GOALS AND OBJECTIVES: All goals and all objectives should be listed for each program area.)
(TARGETS: Each target should be listed. Example targets might be legislators, youth programs, JJ staff, and others, depending on the strategy.)
(AGENTS: Each agent involved should be listed. Agents are those responsible for fulfilling project goals and objectives. Examples include the juvenile justice staff, consultants, SAG committees, etc.)
(METHODS: Each method to be employed by the agents to reach goals and objectives should be listed.)
Exhibit B
Hypothetical State Juvenile Justice Strategy Map
Legislative Strategy
| PROGRAM | (A) Create Monitoring Inspection Unit | (B) Create a Law Related Education Program |
|---|---|---|
| GOALS AND OBJECTIVES | (G1) Create a monitoring inspection unit which will inspect all public and private residential facilities | (G1) Establish an LRE program within the Department of Youth Services |
| | (G2) Establish an annual appropriation | (G2) Set program mission statement focused upon delinquency prevention and education of young students |
| | (Obj. 1) Place unit in Department of Corrections so that enforcement mechanism can be built in automatically | (Obj. 1) Set program mission statement focused upon delinquency prevention and education of young students |
| | (Obj. 2) Establish with 1 director, 1 assistant, 4 staff | (Obj. 2) Establish a funding level of $200,000 |
| | (Obj. 3) Seek and obtain judicial, legislative, and executive endorsement prior to committee votes | (Obj. 3) Direct DYS to administer and design. Money will provide for 1 director, 2 staff, transportation and printing expenses. |
| | (Obj. 4) Authorize as inspection arm of Department of Corrections | (Obj. 4) Establish additional $100,000 appropriation for an evaluation component designed to examine program success |
| | (Obj. 5) Establish a facility inspections universe and definitions of which facilities to inspect | |
| | (Obj. 6) $350,000 annual appropriation | |
| TARGETS | | |
| AGENTS | | |
| METHODS | | |
Note: This is an example of how a state staff might prepare a justice system map for an upcoming legislative effort.
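For planners who keep a strategy map in a spreadsheet or small database, the rows of Exhibit A translate directly into a nested record. The Python sketch below is one hypothetical way to hold a map (the class names, field names, and the completeness check are illustrative assumptions, not a prescribed format); it flags programs whose targets, agents, methods, or other rows are still empty, as in the partly completed Exhibit B:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MappedProgram:
    """One program in a strategy map, with the rows used in Exhibit A."""
    name: str
    goals: List[str] = field(default_factory=list)
    objectives: List[str] = field(default_factory=list)
    targets: List[str] = field(default_factory=list)
    agents: List[str] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)


@dataclass
class StrategyArea:
    """A strategy area (for example, Legislation) and the programs within it."""
    title: str
    programs: List[MappedProgram] = field(default_factory=list)


MAP_ROWS = ("goals", "objectives", "targets", "agents", "methods")


def incomplete_entries(areas: List[StrategyArea]) -> List[Tuple[str, str, List[str]]]:
    """Report each program that still has empty rows in the map."""
    report = []
    for area in areas:
        for program in area.programs:
            missing = [row for row in MAP_ROWS if not getattr(program, row)]
            if missing:
                report.append((area.title, program.name, missing))
    return report


# Partly filled entry modeled on program (A) in Exhibit B,
# whose targets, agents, and methods rows are still blank.
legislation = StrategyArea("Legislative Strategy", [
    MappedProgram(
        "Create Monitoring Inspection Unit",
        goals=["Inspect all public and private residential facilities"],
        objectives=["$350,000 annual appropriation"],
    ),
])
print(incomplete_entries([legislation]))
```

Running the check on this entry reports the three blank rows, mirroring the empty TARGETS, AGENTS, and METHODS rows in Exhibit B.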
To help ensure that grantee and state expectations are identical, the Request for Proposal (RFP) announcement should delineate the goals and establish what is to be accomplished by the program (the evaluable goals). Data collection and evaluation expectations can also be set out in this process. In a nutshell, the RFP plays a critical role, since it can be used to establish measurable expectations. Because of the importance of RFPs for evaluation, their development is reviewed in greater detail in Chapter Five.
So far we have addressed why one may wish to evaluate, the importance of mapping evaluation programs, and the need to clearly develop goals and objectives. Once the decision is made to evaluate and a map is created, it is time to begin thinking about the approach to be taken.
Notes
1. The problems which interfere with accomplishing those goals should be considered in the mapping process. Resource allocation problems, service delivery problems, overlapping responsibilities between agencies, limited data collection efforts, and conflicting goals between state and local agencies are all examples of very real problems which can impede the pursuit of overall system improvements. Often the mapping process will help identify system or planning problems. For example, a map may make it clear that separate programs are performing (or claiming to perform) very similar tasks without coordination. Overlaps or gaps in service delivery become clearer when they are mapped as we suggest.
3 Approaches to Evaluation
Introduction
Any approach to juvenile justice program evaluation will depend on a number of important factors which the specialist must consider.
Once these issues are seriously considered by the specialist and evaluator, the next steps involve the details of evaluation design, data collection, measurement, and analysis. These are covered in Chapter Four. Our concern here is to define and review general issues in approaching, or moving toward, evaluation. A number of concepts and suggestions are provided, and a typology of evaluation levels is defined using a Law Related Education project as an illustrative example. After finishing this chapter, the reader will see that there are numerous evaluation options, and that there is always room for evaluation in a juvenile justice program plan. The real issue is to decide to what extent evaluation activities are appropriate and can be realistically carried out. The specialist can make those decisions confidently by resolving these few general issues.
Choosing Programs for Evaluation
Assume that you have identified juvenile justice programs of various types: public awareness programs, legislative initiatives, outreach, and intervention or treatment programs. Having decided that you wish to conduct evaluation, and presuming that not all programs can or will be evaluated, the problem becomes deciding which program(s) to evaluate. There are practical and political considerations in this selection process, as the following series of questions shows:
A program needn't be controversial to be a good candidate for evaluation, however. Some programs may have been operating for years under the assumption that they are working, at least as intended. Such assumptions should always be questioned, and evaluated if possible.
Consideration of these issues, program by program and for the program plan in general, can suggest which programs you wish to evaluate. Combining such factors as accessibility, practicality, and the need for evaluation information will push some programs to the forefront. But there is more thinking to be done. The next two sections consider two issues in detail: the time frame and the scope of evaluation. Thinking along these lines will further focus your evaluation plans.
The Time Frame for Evaluation
Time is a critical aspect of any evaluation plan, and it plays a role in a number of different respects. First and foremost, perhaps, is the decisionmaker's need for evaluation information: how the program works and how it is doing. Decisionmakers here include politicians, funders, program administrators, clients or their parents, judges, probation officers, and anyone else whose job decisions relate to juvenile justice programs. It is almost banal to say that such information never arrives in time, and that this makes evaluation efforts meaningless, but that is not really true.
A well-conceived and well-implemented evaluation project will be of value to program executives, administrators, and evaluators. Any program information gathered and presented in an objective manner will be of some value. Quite often, such information is available in the absence of formal evaluation research. Archives, internal program reviews, client summaries, and other records provide the information; the task is one of locating, organizing, and presenting it.
Evaluation information is most useful when it is timely. As you will see in our discussions of outcome measures and levels of evaluation, timely evaluation information is usually available (or can be obtained quickly and cost-effectively), though it may not be the most current, the most objective, or the most scientifically rigorous. In some instances, however, even an intensive, expensive evaluation effort will not produce the necessary information in time. In those instances, the specialist/evaluator may decide not to evaluate.
Closely related to the issue of timeliness is the time span of the program under consideration. How long will it be in operation? If only for a year or less, evaluation (especially if it is costly) should be given a low priority. How long will the intervention or treatment take to administer, and how long will it take to observe its effects? Some programs administer their treatments (education, training, therapy, exposure to various stimuli) in small doses over long periods of time, while others provide heavy doses in short time spans. Neither case should automatically rule out evaluation, but different time frames for administration of treatment will certainly suggest different approaches to evaluation. In the same vein, different programs will expect to produce impact (behavior changes, attitude changes, test scores) in various time frames, or one program may produce different impacts over time. These considerations, too, will affect evaluation plans and methods.
Consideration of a program's history may affect your evaluation decisions. If a program has been in operation for many years and has been ignored by evaluators (it may be unexciting, it may be controversial and protected, it may be difficult to evaluate), the time may be ripe to approach the subject. It is conceivable also that even thorough evaluation will not change much in the way of program operations.
A brand new program, on the other hand, may be too young for evaluation. It may be in a learning phase, or a state of flux, making evaluation difficult. Under some circumstances this situation might call for evaluation, but generally it would not. You might not consider a young program important enough to expend scarce evaluation resources, if you are not sure of its future. On the other hand, evaluation efforts, even at a low level, should start early on in a promising program to generate valuable information for later, more comprehensive, efforts. As you can see, there are no easy answers. If you understand that consideration of program history is important in making evaluation decisions, you are thinking appropriately, even if you don't have all the answers!
Scope of the Evaluation
The critical question when thinking about the scope of an evaluation effort is: "What do I want to learn?" Answering this question will make your decisions easier and will help make choices regarding data and methods. It is not a simple question. As with other questions we have reviewed, the answer depends on practical and political considerations, and the trick is to find a reasonable course of action so you can get on with the business of evaluating.
Your total program plan probably includes many programs of different kinds, funded at different levels, with varying goals and methods. Presumably you will not attempt to evaluate them all unless you have vast resources. The scope of your evaluation, then, will be smaller: maybe a few isolated evaluation efforts, maybe a coordinated evaluation of similar programs, or perhaps an evaluation of only one program.
You may also choose to evaluate aspects of one or more large programs. A treatment program might include multiple facets or components (therapy, training, community activities), and you could decide to focus on only the training or therapy components, or you may choose to evaluate the training components of a number of programs.
Again, you must ask, "What do I, or what does my evaluation audience, really want to know about?" The legislature or state educational association may want to know about all of your training efforts. Your criminal justice constituency may only ask about a particular program's recidivism or failure rate, while you may feel there is more to be considered.
Evaluation need not always consider a single, total program. It also may cover more than one program, a component or components of a single program, a single component of many programs, and so on.
There is an important distinction in the evaluation literature that bears reviewing here, though it will be covered further on: program monitoring versus process and outcome evaluations. The following definitions for these concepts are offered:
Program Monitoring: Developing and analyzing data for the purpose of counting specific program activities and operations.
Process Evaluation: Developing and analyzing data to assess program processes and procedures; to assess the connections between various program activities.
Outcome Evaluation: Developing and analyzing data to assess program impact and effectiveness.
These definitions lend a false simplicity to these concepts, but provide the correct impression that evaluation activities can be distinguished by levels of complexity, difficulty, and cost. In reality, most evaluations comprise some of each of these activities.
When thinking about what you want to learn through evaluation, think in the context of monitoring, process, and outcome evaluation. Evaluations simply cannot proceed without monitoring information, which means answers to such questions as:
The volume of work done must be counted, sometimes in very detailed fashion. This is the general nature of program monitoring, and it must be done if other evaluation activities are to take place.
Monitoring is, by itself, an evaluation activity. The process will yield information to answer a question such as, "How, or what, is the program doing?" Activity levels can be compared to goals and objectives and monitoring them over time can provide important feedback to program staff, clients, administrators, and funders.
As soon as you begin asking questions about the relationship between different activity levels, or the sequence of activities and systemic issues (how the activities are related in program procedures), you enter the realm of process evaluation. Sometimes referred to as "formative evaluation," process evaluation is concerned with providing feedback to staff and management to help avoid problems and adapt to changes in the program's internal or external environment.
In monitoring, you may keep track of the number of incoming clients, the staff workloads, and the provision of services to clients. Process evaluation takes these data a step further by analyzing the effect of trends in new clients on existing caseloads (and perhaps the external processes that are affecting program referrals), and on the time required to provide services. You are really building a model of program operations: identifying the relevant variables, measuring them, and then analyzing their interrelationships. This must be accomplished in some fashion to make the move to outcome evaluation.
An outcome evaluation assesses the success or effectiveness of a program or program component. Having achieved an analytical understanding of how a program operates, through monitoring and process evaluation, the next step is to assess program products, or outcomes. Consider a simple example involving a training program.
The outcome evaluation issues concern how the intended training was provided, the extent to which program activities deviated from the original design, and whether the desired effects were achieved (better grades, higher self-esteem, better employment, less involvement in crime, etc.). Outcome evaluations may address efficiency or cost-effectiveness issues. They may also uncover unanticipated outcomes.
Conducting outcome evaluation requires adequate monitoring and process evaluation, for you cannot be sure an outcome was achieved by a program unless you can demonstrate a link between program activities (process) and results (outcome).
Varieties of Outcome Measures
In reality, outcome is not an "end process" issue; that is, outcome evaluations need not, and probably should not, be concerned only with what occurs at the end point of a particular process. All program activities have outcomes, whether intended or not. Some are quantifiable and some are not. Some are easily observable and comprehensible and some are not. Additionally, all different types of programs have outcomes, and these same issues apply as much to legislative initiatives as they do to direct service provision programs. In this section, we review types of outcomes to dramatize this important point. It is important to think of outcome issues in a more comprehensive context. This will enhance evaluation activities generally, and improve the information that goes to policymakers.
Knowledge Production Outcomes
In many instances, for particular programs or for program plans in general, a major goal is the production of knowledge about juvenile justice issues such as juvenile crime, effective prevention and treatment strategies, current legislation, available services, and so on. Evaluation of knowledge production efforts, or identification of knowledge production outcomes, generally receives low priority from evaluators. The frequent assumption is that more instrumental outcome measures, such as criminal behavior and test scores, are more desirable. This assumption is not necessarily appropriate. Knowledge production is a stated goal of the federal Juvenile Justice and Delinquency Prevention Act. Acceptance of programs at the state and local levels depends on the availability of information about them, and that acceptance cannot be achieved without producing new knowledge in the general and criminal justice communities.
There are ways to monitor and evaluate knowledge production, and they are addressed later in this monograph. The important point here is that knowledge production be considered a valid process and outcome, worthy of evaluation from the start.
Consensus Building Outcomes
A similar argument applies in the case of consensus building outcomes: the need to produce common understanding and shared efforts among various constituents in the juvenile justice arena, especially where these have not existed before. Attaining instrumental goals (see below) often depends on significant consensus building around an issue. Jail removal and de-institutionalization are two primary examples. Such programs cannot guarantee success even if they are well-conceived, well-managed, and adequately funded; they also must enjoy the support of various components of the state and local criminal justice and general communities.
Evaluation efforts tend to ignore this consideration. Consensus building efforts and achievements do not lend themselves to measurement and scientific analysis. However, they provide significant and valid qualitative, or contextual, information, which is worthy of more serious consideration in evaluation efforts. Information about the political and social-psychological aspects of program implementation would be invaluable.
Instrumental Outcomes
These are the most commonly discussed outcomes in evaluation research. Being directly or indirectly related to a funded program's goals and objectives, they will, if observed and measured properly, indicate the program's level of success or effectiveness. Typical areas of measurement include recidivism, educational attainment, self-esteem, and community values or citizenship. These are usually given the highest priority by evaluators and decisionmakers, and for good reason. If a program is not producing the promised results, and evaluation confirms this, then it is time to reconsider program goals, objectives, and methods. Instrumental outcomes are important. They are even more valuable if they are presented, where appropriate, with evaluation information about knowledge production and consensus building, to give decisionmakers the maximum amount of useful information in the proper doses.
A Typology of Evaluation Levels
This chapter has presented many concepts and ideas for the specialist/evaluator planning evaluation. To reinforce these issues, this chapter offers a typology of evaluation levels using an actual juvenile justice program example: a Law Related Education program implemented and evaluated in Colorado. The multitude of issues and questions raised is not intended to confuse the reader, but to convey the notion that there are many options for evaluating, and many good reasons for making the decision to evaluate. Once the reader accepts the following:
Then the decision to evaluate will be readily made. This section demonstrates even more clearly the various types of possible, and useful, evaluations.
Dimensions of the Typology
The typology of evaluation levels relies on the distinction drawn between monitoring, process, and outcome evaluation, and on the level of comparison you intend to achieve, or are able to achieve, given data and resource limitations. Generally, evaluations will be specific to a program or program component, or they will use comparisons with other programs or client groups. Combining three levels of evaluation with two general comparative perspectives yields six evaluation types, as Exhibit C illustrates. As this typology unfolds below, it will become clear that each of these evaluation types has a useful purpose. Choosing one or another is not a matter of right versus wrong. In some instances the resources available and the demand for information will dictate that no more than a basic evaluation should be attempted. In other instances, a basic or comparative process evaluation will satisfy decisionmakers. Few evaluation efforts can support a true comparative outcome evaluation, or can achieve one even when the financial resources are available. Comparative outcome evaluations should be used, however, for the most critical, long-term problems faced by juvenile justice, and for the most promising strategies for addressing those problems.
It is important that the range of possibilities be given serious consideration as you make evaluation plans.
The Law Related Education Program
This program will be used to explain the evaluation typology presented above. It is important to understand that this example, a Law Related Education program, was actually implemented. The evaluation type employed was a comparative outcome evaluation. In this section, five hypothetical evaluations are described to define the other components in the typology, and the comparative outcome evaluation is described from the information produced by the program.
With any evaluation effort it is important to understand a program's goals, objectives, and operations before selecting an evaluation approach and methods. These are reviewed here for the Law Related Education (LRE) program.
LRE Goal:
Provide instruction to students to build a conceptual and practical understanding of law, enforcement, and judicial processes, leading to improved citizenship skills, a desire to work within the legal system to settle grievances and deal with criminal problems, an understanding of the basis for rules and favorable attitudes towards enforcement and justice.
LRE Objective:
Provide 30 to 40 semester hours of LRE to school age children (middle to junior high school age) in seven schools.
LRE Procedures:
Each of the seven schools selected a teaching team that was trained in the LRE curriculum.
Throughout a semester, the team taught law related topics, including mock judicial procedures, and used legal and law enforcement professionals in class exercises, visits to courts, rides in patrol cars, and home security audits.
LRE Rationale:
The educational activities offered as the LRE program are expected to increase understanding of law enforcement and judicial processes because the standard school curricula do not cover such topics.
Exhibit C
AN EVALUATION TYPOLOGY COMBINING LEVELS OF EVALUATION AND COMPARATIVE PERSPECTIVES
Comparative Perspective
| LEVEL OF EVALUATION | Program Only | Program and Comparison |
| Monitoring | Basic Monitoring | Comparative Monitoring |
| Process Evaluation | Basic Process Evaluation | Comparative Process Evaluation |
| Outcome Evaluation | Basic Outcome Evaluation | Comparative Outcome Evaluation |
The new educational material will challenge perceptions of these phenomena among school children that are based on television portrayals and popular perceptions among peers. If it is carefully and thoughtfully presented, the LRE curriculum will change these conceptions among the students and foster respect for law-abiding behavior.
Basic Monitoring
Monitoring is the process of developing and analyzing data to count and/or identify specific program activities and operations.
A basic monitoring evaluation is concerned with answering simple questions about program activities and rationale. A good way to approach the problem is to ask the following question: "Who is doing what, when, where, how often, and with what resources?" For the LRE program the following answers are typical:
Answering the "How often" question might entail collecting data in the following areas:
-- total number of students taught, and subtotals for various student characteristics: age, sex, ethnicity, family characteristics;
-- number of students per class (or the average number, if class sizes vary), and absences per class.
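As an illustration only, the "how often" tallies above can be computed from simple student records. The Python sketch below assumes a hypothetical record layout (`class_id`, `sex`, `absences`) with invented values; a real program would draw these fields from its own enrollment and attendance records, and could subtotal by age, ethnicity, or family characteristics in the same way.

```python
from collections import Counter
from statistics import mean

# Hypothetical enrollment records, one dict per student (illustrative values only).
students = [
    {"class_id": "A", "sex": "F", "absences": 2},
    {"class_id": "A", "sex": "M", "absences": 0},
    {"class_id": "A", "sex": "F", "absences": 1},
    {"class_id": "B", "sex": "M", "absences": 3},
    {"class_id": "B", "sex": "M", "absences": 1},
]


def monitoring_summary(records, subtotal_field):
    """Tally basic monitoring counts for a list of student records."""
    # Total number of students taught.
    total = len(records)
    # Subtotals for one student characteristic (e.g., sex, age group, ethnicity).
    subtotals = Counter(r[subtotal_field] for r in records)
    # Number of students in each class, then the average across classes.
    class_sizes = Counter(r["class_id"] for r in records)
    avg_class_size = mean(list(class_sizes.values()))
    # Total absences recorded in each class.
    absences_per_class = Counter()
    for r in records:
        absences_per_class[r["class_id"]] += r["absences"]
    return {
        "total": total,
        "subtotals": dict(subtotals),
        "class_sizes": dict(class_sizes),
        "avg_class_size": avg_class_size,
        "absences_per_class": dict(absences_per_class),
    }


summary = monitoring_summary(students, "sex")
print(summary)
```

Keeping the tally in one function means the same summary can be rerun each reporting period and compared against the program's objectives.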
Basic monitoring information for a program has the following utilities:
Most program leaders or administrators collect this basic information, or at least some subset of it. Collecting the data, if it is done carefully and reviewed periodically, is evaluation of a basic sort. Comparisons are made between expectations and observed results, or at least data are relied on to set expectations. This is evaluation activity.
It is, of course, not sufficient to simply collect and analyze program data to complete a successful monitoring program. The data and findings should be integrated into the decisionmaking process at the program and/or higher levels.
Comparative Monitoring
In a comparative evaluation effort, basic monitoring data are collected for other similar programs, or for subjects in control groups who do not receive the intervention but are monitored for comparison purposes. The LRE program used both types of comparison in its evaluation. LRE programs were implemented and monitored in seven different schools, and in five of the seven schools, control groups consisting of students randomly assigned to traditional civics or social science classes were also monitored. In this manner, comparisons were made across schools that implemented an LRE program under slightly different circumstances (reflecting slight variations in student ages and racial mixtures), and also within schools, between students who did and did not receive the LRE program.
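The source does not describe exactly how students were randomly assigned. The Python sketch below shows one common approach, under that assumption: shuffle a school's roster and split it into an LRE group and a control group. The roster and the seed value are hypothetical; fixing the seed simply lets someone else reproduce and audit the assignment.

```python
import random


def assign_groups(roster, seed):
    """Randomly split a roster into an LRE group and a control group.

    Every student has the same chance of landing in either group, so
    pre-existing differences should balance out on average.
    """
    rng = random.Random(seed)  # seeded generator so the split is reproducible
    shuffled = list(roster)    # copy; the original roster is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"lre": shuffled[:half], "control": shuffled[half:]}


# Hypothetical roster for one school.
roster = [f"student_{i:02d}" for i in range(1, 21)]
groups = assign_groups(roster, seed=1987)
print(len(groups["lre"]), len(groups["control"]))  # 10 10
```

Because assignment depends only on chance, differences between the two groups at the outset should be small on average, which is what makes the control group a fair comparison.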
The value of comparative monitoring information lies in the comparative perspective it provides. With basic monitoring information, the only comparison possible is internal to the program, i.e., comparison with program goals and objectives. Comparative monitoring allows such comparisons, but also allows performance to be compared with that of other programs. For example, if one program attains 90% of its planned instruction hours while other programs achieve more (95%-100%) or less (70%-85%), more has been learned than a simple internal check against program objectives would reveal.
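The comparison in the example above can be made concrete by converting delivered instruction hours into a percentage of planned hours for each program and ranking the programs. The school names and hour figures in this Python sketch are hypothetical, chosen only to echo the percentages in the text.

```python
# Hypothetical planned vs. delivered instruction hours for each program.
programs = {
    "School 1": {"planned": 40, "delivered": 36},
    "School 2": {"planned": 40, "delivered": 39},
    "School 3": {"planned": 30, "delivered": 30},
    "School 4": {"planned": 40, "delivered": 28},
}


def attainment(planned, delivered):
    """Percentage of planned instruction hours actually delivered."""
    return 100.0 * delivered / planned


# Internal check: each program against its own objective.
rates = {name: attainment(**hours) for name, hours in programs.items()}

# Comparative check: rank programs against one another.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
for name, pct in ranked:
    print(f"{name}: {pct:.1f}% of planned hours")
```

Expressing delivery as a percentage lets programs with different planned hours (here, 30 versus 40) be compared on a common scale.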
More important, however, is the confidence in interpreting findings that the comparative approach provides. When describing and evaluating programs, especially when short or long term outcomes are discussed, it is valuable to consider alternative possible explanations of your findings. It is important to know whether an increase in clients, a change in client behavior, or a response to a program initiative is really the product of the program under study or some other factor, such as increased arrests, client education level, or outside influences. The comparative perspective brings more information to the analyst, and allows control or analysis of factors outside the program. In this manner it increases the ability to distinguish program effects from other influences, and thus gives the evaluator more confidence. Chapter Four addresses this issue in more detail under the section "Threats to Validity."
Basic Process Evaluation
If the planned outcome for the LRE program consists of certain attitudes and behaviors among the students, or changes in attitudes and behaviors, then there must be some process by which the program activities, as measured by monitoring, produce the expected outcome. By considering the program activities in combination or in some sequence, and by considering the mechanism by which the activities produce the result, you enter the realm of process evaluation.
Process evaluation involves developing and analyzing data to assess program processes and procedures, especially determining the connections between various program activities.
For example, qualitative and quantitative measures of student-teacher relations or interactions might provide a process measure that is more predictive of program success than simply counting the number of hours spent in class. The effect of time on the program, as indicated by turnover or absenteeism, is another valuable process measure. Qualitative assessments of the program's links with other school activities (for example, did the field trips interfere with other classes or extracurricular activities?), or of the program's acceptance by the school administration, may also prove valuable. Test and quiz grades may also be good interim measures of program success.
It is these kinds of information that help explain how the various program activities operate together. They provide short-term outcome measures which, if positive, can be expected to produce positive results in the long run as well. Basic process evaluation is valuable for other reasons, including:
Comparative Process Evaluation
Comparative process evaluation employs the same measurements and assessments used in basic process evaluation, but for comparable or control programs. When comparisons are used for process evaluation, the benefit for evaluators is even greater. Principally, the comparative perspective at the process level provides much more confidence in the findings because the number of cases (programs, or students within programs) increases, and because information from different programs introduces different perspectives and controls into the evaluation.
Consider the LRE program. Process evaluations were conducted in six of the seven schools (one school received minimal administrative support). The LRE programs varied in their student types: grade levels ranged from middle school to junior high, and one was a multi-level grade school; one school was more racially mixed than the others. The schools also varied in how they implemented the LRE program. This produced variations in quantitative and qualitative information that broadened the entire project's understanding of LRE and its potential, more than a single case study would have done.
Additionally, comparative process measures provide relative measures, that is, short-term performance measures that can be compared with measures from other programs. Relative measures permit comparisons of marginal performance differences across programs, and also allow evaluators to address a variety of policy-related questions: What if we taught more hours? What if the class were more diverse? What if fewer field trips were taken? These questions can be answered with more confidence when enough cases are present to produce variation in the variables of interest.
Basic Outcome Evaluation
With basic outcome evaluation, the logical sequence from program activities, to program processes, to program outcomes is traced for a single program. Such analysis cannot be attempted without the antecedent monitoring and process evaluation on which outcome evaluation depends. In the case of the LRE program, two outcome measures were chosen, and they were taken before and after the implementation of the program. The measures were:
(1) Student scores on scales measuring correlates of law abiding behavior.
(2) A self-report survey on various criminal and delinquent behaviors.
Outcome evaluation involves developing and analyzing data to assess program impact and effectiveness.
For this evaluation within a single school, program success is defined as whether, or to what extent, student attitudes and behavior changed in the expected directions, as determined by comparisons of time series data. Do the scores and other indicators change in expected directions following (and perhaps during) the LRE program? Are these short-term outcomes, measured soon after classes ended, predictive of longer-term behavior, which might be measured by follow-up studies?
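For a single school, the pre/post comparison described above comes down to computing each student's change in score and summarizing those changes. The Python sketch below assumes hypothetical scale scores for the same students before and after the program, with higher values indicating attitudes more favorable to law-abiding behavior; the actual LRE scales and their scoring are not reproduced here.

```python
from statistics import mean, stdev

# Hypothetical scale scores for the same six students, before and after LRE
# (higher = attitudes more favorable to law-abiding behavior).
pre = [52, 47, 60, 55, 49, 58]
post = [56, 50, 61, 59, 48, 65]


def pre_post_change(pre_scores, post_scores):
    """Summarize paired pre/post changes for one group of students."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("pre and post scores must be paired by student")
    # Positive change = movement in the expected direction.
    changes = [after - before for before, after in zip(pre_scores, post_scores)]
    return {
        "mean_change": mean(changes),
        "sd_change": stdev(changes),
        "share_improved": sum(c > 0 for c in changes) / len(changes),
    }


result = pre_post_change(pre, post)
print(result)
```

A positive mean change is consistent with the expected direction, but by itself it cannot rule out other explanations such as maturation or outside events; those threats to validity are taken up in Chapter Four.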
Other approaches to basic outcome evaluation might include comparisons within a program. Such activities might involve comparing the LRE curriculum to other curricula, or comparing different LRE curricula; predicting "high risk" students at the outset of the program and focusing follow-up efforts on them; studying early and intermediate indicators of successful outcome; or measuring outcome at various points during the program.
The value of a basic outcome evaluation such as this lies in providing the best information possible about program performance. In a single school, with this evaluation design, a finding that LRE students scored about the same or worse on post-program measures than on pre-program measures would have hurt the overall program. At the very least, such findings would have stimulated reconsideration of the program's goals and procedures, since there would have been no evidence that the program made a difference. In this case, though, differences were observed in the expected directions. Had the outcome findings been inconclusive, the evaluators would have turned to the process and monitoring data to explore the reasons, and probably would have found some helpful clues. The LRE program did examine these data and found other benefits, such as favorable feedback from parents and improvements in police officers' handling of juveniles.
A basic outcome evaluation that uses comparisons within a program, as the LRE project did, allows the researcher and program administrator to address the question "Did the program make a difference?" While it doesn't explain what would happen if the program were not implemented, it provides information regarding program impact and program effects.
Comparative Outcome Evaluation
In a comparative outcome evaluation, long-term outcome measures are collected for more than one program, usually for the program under inquiry and a control group of programs, but they may be collected for multiple programs and control groups. This was the research design for the LRE program evaluation. The outcome measures described above were collected for LRE students and control student groups, before and after the program was implemented in five different schools. With the exception of collecting even longer-term measures, e.g., follow-up examination of attitudes and criminal behaviors after one or more years, replicating the research design for one program over multiple programs provides valuable evaluation information. This is especially true when comprehensive monitoring and process evaluation data have been collected in the course of program implementation.
The benefits of comparative outcome evaluation often include all of the benefits of lower levels of evaluation since those kinds of information are necessary to support it. The benefits of comparative outcome evaluation also include:
Comparative outcome evaluations, then, deserve the highest degree of confidence, especially if both a pre/post comparison and a comparison with controls are employed. Often it will not be possible to design and implement a thorough comparative outcome evaluation. Pre/post-only designs, or comparisons with controls only, can still provide evaluators with good information regarding program performance.
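When both a pre/post comparison and a control group are available, one standard way to combine them is a difference-in-differences calculation: the program group's average change minus the control group's average change, with the control group's change standing in for what would have happened without the program. This is a general technique, not necessarily the analysis used in the Colorado evaluation, and the school-level means in the Python sketch below are hypothetical.

```python
from statistics import mean

# Hypothetical mean scale scores by school, as (pre, post) pairs for each group.
schools = {
    "School 1": {"lre": (51.0, 56.0), "control": (50.0, 52.0)},
    "School 2": {"lre": (48.0, 52.0), "control": (49.0, 50.0)},
    "School 3": {"lre": (55.0, 56.0), "control": (54.0, 55.0)},
}


def difference_in_differences(lre, control):
    """Program group's pre/post change minus the control group's change.

    The control group's change approximates what would have happened to the
    program group without the program (e.g., maturation or outside events).
    """
    lre_pre, lre_post = lre
    control_pre, control_post = control
    return (lre_post - lre_pre) - (control_post - control_pre)


estimates = {name: difference_in_differences(**groups) for name, groups in schools.items()}
overall = mean(list(estimates.values()))

for name, effect in estimates.items():
    print(f"{name}: estimated effect {effect:+.1f}")
print(f"Average across schools: {overall:+.1f}")
```

Computing the estimate school by school also shows whether the effect holds up across sites; the hypothetical School 3 illustrates a site where the LRE and control groups changed by the same amount.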
Summary
This chapter has reviewed various approaches to evaluation. Before actual research planning takes place, a number of issues must be considered to help focus the evaluation and to prepare the research design. Good program candidates for evaluation should be identified: any program can be evaluated, but available resources will not allow every program to be. The program issues of accessibility, length of operation, history, expense, nature of the controversy surrounding it, and external pressures to evaluate should be considered in making the selection.
If you give careful consideration to these and other issues, you will usually find that (1) evaluation is not as difficult or esoteric as it seems, (2) you have already been doing it in some fashion and may as well take credit for the good work, (3) there may be a broader audience for the information being produced, especially if the manner and format of presentation are adjusted a bit, and (4) objective data about juvenile justice programs will be appreciated by many in the field.
Now, having decided that evaluation can be accomplished, you face decisions about how to conduct it. The next chapter reviews the basics of evaluation design and other relevant research issues.
Notes
1. The evaluation examples presented in this section were derived from a report entitled "Using School-Based Programs to Improve Students' Citizenship in Colorado," by Grant Johnson and Robert M. Hunter of the Action Research Project at the University of Colorado. It was published by the Colorado Juvenile Justice and Delinquency Prevention Council in October 1987. Their permission to use their materials is gratefully acknowledged.
4 Critical Issues
Introduction
In this chapter, some of the technical and research design aspects of evaluation are addressed. In this context, "critical issues" means concepts and problem areas you should understand well enough to distinguish a good evaluation, or evaluation proposal, from a bad one. Some of these ideas or approaches to evaluation research may be new to you, while others may need refreshing. The chapter concludes with a discussion of how evaluations are used.
Before that concluding discussion, we address five critical issue areas in evaluation research, based on the experiences of seasoned program evaluators. They are:
(1) Measurement: measures of juvenile justice program input, intermediate effects, and outcome.
(2) Validity: definitions of, and threats to, internal and external validity.
(3) Special Topics in Program Evaluation: including the varieties of data sources to consider, random assignment in evaluation research design, cost-benefit analysis, and the use of observation in evaluation research.
(4) Common Errors in Evaluation: pitfalls in interpreting data.
(5) Creativity in Evaluation: a section that stresses the need for resourcefulness and ingenuity in conducting evaluations.
Throughout the presentation of these issues, please remember the overall role and purpose of evaluation. It is common to assume that evaluation can address questions such as "Does the program work?", "Has the project been successful?", and "Should the project be continued?" However, it is inappropriate to expect an evaluation, or the evaluator, to answer such questions. They involve value judgments about project and policy worthiness, which are beyond the role of the evaluation. Determining project success and continuation requires weighing resources, priorities, and politics, a process outside the evaluator's task. The role of the evaluation is to provide objective information about project activities and their outcomes to the program administrators, policy makers, and funding agencies who will judge the worthiness of the project and decide its future.
Measurement Issues
Perhaps no issue in evaluation is more critical than defining and measuring the variables to be used. The validity of a study depends on appropriate measures of project activities and outcomes. Indeed, the final judgment of the program may depend on how program operations are conceptualized and measured. The choice of measures and the design of the evaluation will, to a large degree, determine whether the evaluation is believed by its consumers.
There are no hard and fast rules in the choice of measures; what is most critical is that the measures are appropriate for the context in which they are used. For example, in evaluating a juvenile diversion program, one may want to measure a youth's family relationships as a factor that might influence his or her ability to avoid further contact with the court. In assessing the impact of a positive peer culture program in a juvenile institution, by contrast, this type of information would be less relevant. Other considerations involve how success is defined. If recidivism is defined as a police contact, a different rate of project success will be obtained than if it is defined as adjudication or incarceration. None of these measures is wrong; they simply measure different things. Thus it is important to be clear about the meaning of the measures chosen for the evaluation.
There are three categories of measurement integral to any juvenile justice evaluation: measures of program input, program processes (intermediate effects), and program outcomes. The discussion of each of these frames the material that follows.
Measurement of Program Input
While the popular conception of evaluation focuses on program outcomes, remember that there is considerable variation in program inputs, which can affect the results of the intervention. Juvenile justice programs often fall into broad types such as diversion, education, and family therapy. While there are commonalities among programs within each of these categories, programs with a similar overall description may involve distinct intervention strategies with dissimilar clients. For example, two diversion programs may both aim to reduce the level of commitment to juvenile court. However, one may involve diversion at the police level for status offenses, while the other may involve screening at the court or prosecution stage to divert minor property offenders to a restitution program. Before we can say what works, it is necessary to say what we are doing.
There are several aspects to measuring program input. First consider how the goals and objectives of the program are translated into practice. What facets are to be emphasized through commitment of resources toward the project objectives? What are the major project activities and how do these relate to the anticipated outcomes? Why and how is the program supposed to work? What are the underlying reasons the intervention is presumed to be effective?
These questions all relate to the theory of the program. Although it is often presumed that theory is irrelevant to juvenile justice practice, nothing could be further from the truth. Indeed, the program theory is a statement of the mechanisms through which the intervention is to work. Most importantly from an evaluation standpoint, it tells us what variables and concepts to measure.
Thus, one of the first tasks of evaluation is to obtain a clear explication of the theory behind the program. What should the program change that will result in reduced delinquency? If it is a prevention program, is it oriented to improving the individual's self-concept, attachment to family, school performance, or opportunities? Each potential area of program emphasis implies a different causal process for reducing delinquency. Although program designers and administrators may not claim to have employed theory in creating the program, theory is implicit in all forms of delinquency intervention. It often falls to the evaluator to clarify the reasoning behind the intervention.
While it may appear straightforward to specify what the program is (e.g., a reduction in probation officer caseload size), specifying its content, what is actually going on, may be more difficult. There may be substantial variation in operational procedures and among program personnel. The more complex the project and the more varied its components, the more difficult and crucial this task becomes.
Why be concerned about theory and program content? After all, isn't the point to measure the effect and impact of the intervention? True, but if the evaluator doesn't know what the program is, he or she may fail to ask the appropriate questions about program impact, the wrong variables may be measured, or appropriate measures may be omitted. Most importantly, the evaluator will not be able to attribute observed changes to program components or activities. Since a principal reason for evaluation is to replicate successful programs, it is imperative to know precisely what was done so that the desired components, procedures, and activities can subsequently be implemented.
Another important virtue of input measures is that they clarify project activities beyond those presented in the funding application. At the time of application, the specifics of program operation are often not finalized, yet many evaluations treat the wording of funding proposals as a reflection of what goes on in the program. If the evaluator simply accepts the program statement as fact, there is a danger of making incorrect statements about its effects. In fact, one could wind up evaluating a program that does not actually exist.
For example, although the planners of a juvenile diversion program may have designed an intervention for youths who commit criminal offenses, the staff operating the program might decide that, given staff resources and expertise, a more appropriate intervention is to divert youths who have family problems and are principally status offenders. Without an understanding of such a shift, the evaluator may draw conclusions about a different form of intervention than the one that actually took place. In the words of Carol Weiss (1972:44), "the evaluator has to discover the reality of the program rather than its illusion."
There are two forms of data that can be considered input measures: data on the program itself, and data about the program participants. Program-specific data include the purposes of the program, taken from program statements as well as staff descriptions, along with resource allocation, methods of operation, day-to-day procedures, staffing patterns, location, program size, management structure, and inter-organizational relationships. In addition, every evaluation should carefully document the content, duration, and intensity of the treatment involved in the intervention.
The second type of input data concerns characteristics of program clients. These include demographic and personal characteristics such as age, gender, education, employment, and family economic status. In addition, depending on their relevance to the program, theoretical significance, or use in prior research, one may also wish to collect data on participants' attitudes and perspectives on a variety of issues that may be related to their performance in the project. These areas might include the youth's relationship to family and peers, attitude about the program, motivation for participation, perception of sanctions and deterrence, social responsibility, and self-concept. While the project may reasonably be expected to alter some of these factors, others, such as gender, are impervious to change. It is useful to collect data on these and other control variables to determine the types of clients who are more likely to be successful in the program.
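To make the last point concrete, here is a minimal Python sketch of tabulating success rates by one client characteristic. The records, the field names (`gender`, `family_relationship`, `success`), and the helper `success_rate_by` are hypothetical illustrations, not data from any actual program; with only a handful of cases, such rates would be far too unstable to interpret.

```python
from collections import defaultdict

# Hypothetical client records: input characteristics plus a success flag.
# Field names and values are illustrative only.
clients = [
    {"id": 1, "gender": "F", "family_relationship": "positive", "success": True},
    {"id": 2, "gender": "M", "family_relationship": "strained", "success": False},
    {"id": 3, "gender": "M", "family_relationship": "positive", "success": True},
    {"id": 4, "gender": "F", "family_relationship": "strained", "success": True},
    {"id": 5, "gender": "M", "family_relationship": "strained", "success": False},
]

def success_rate_by(records, variable):
    """Return {category: proportion successful} for one input variable."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for r in records:
        key = r[variable]
        totals[key] += 1
        successes[key] += r["success"]  # True counts as 1, False as 0
    return {k: successes[k] / totals[k] for k in totals}

print(success_rate_by(clients, "family_relationship"))
print(success_rate_by(clients, "gender"))
```

The same function can be run on any input variable that was collected; in a real evaluation, differences between subgroups would also need to be tested before being treated as meaningful.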
Measuring Intermediate Program Effects
Beyond an accurate reflection of program inputs and content, a thorough evaluation should analyze the attainment of mid-range goals. Almost every juvenile justice program has both mid-range and long-range objectives. For example, a juvenile diversion program may have the ultimate objective of keeping youths referred to juvenile court from committing subsequent offenses. But there are probably intermediate steps that are believed to lead to this goal. It may be that those diverted are to make restitution; if so, intermediate measures need to determine whether the youths in fact do so. While one could collect outcome data on subsequent offenses and proclaim the program a success or failure, this conclusion would obviously be in error if one could not ascertain that this intermediate, and presumably causal, step had taken place.
Although it seems obvious that one cannot call a restitution program successful unless restitution was made, this type of error is quite common in less obvious situations. A program may involve drug treatment as a method of reducing delinquency. Whether or not subsequent delinquency is affected by program participation, it is imperative to consider the impact of the treatment program on drug use independently of delinquent activity. It is certainly conceivable that the program may be effective in reducing drug use even though delinquent activity remains unchanged. On the other hand, the program may have no effect on drug use, and thus any conclusions about the program's impact on delinquency through drug treatment would be inappropriate.
These two types of outcomes are referred to as theory failure and program failure. If the program works but the outcome criterion is unchanged, i.e., if drug use is reduced but there is no effect on delinquency, then theory failure has occurred. While the program has achieved the desired intermediate effect, our theory that delinquency results from drug use may be flawed. Such a situation would require reformulating the theory and restructuring the intervention.
On the other hand, if the program is not observed to affect the intermediate goal, i.e., if drug use is not affected by program participation, then program failure has occurred, and no conclusions can be drawn about the program's overall impact on delinquency. The relationship between drug use and delinquency has not been adequately tested. Since drug use has not been altered, any changes in delinquency cannot be attributed to changes in drug use patterns. This is an important distinction because although drug use has not changed, delinquency involvement may have. If drug use is not measured, changes in delinquency may be falsely attributed to the treatment program.
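The distinction between theory failure and program failure can be expressed as a small decision rule. The Python sketch below is a hypothetical illustration: it assumes the evaluator has already judged, from the data, whether the intermediate goal (e.g., reduced drug use) was achieved and whether the outcome (e.g., delinquency) improved, and it encodes only the interpretive logic described above. The function name `interpret_results` is invented for this example.

```python
def interpret_results(intermediate_achieved: bool, outcome_improved: bool) -> str:
    """Hypothetical helper encoding the theory/program failure distinction."""
    if not intermediate_achieved:
        # Program failure: the intended mechanism was never set in motion,
        # so the theory linking it to delinquency has not been tested.
        if outcome_improved:
            return "program failure: outcome change cannot be credited to the intended mechanism"
        return "program failure: theory not tested"
    if not outcome_improved:
        # The intermediate effect occurred, but delinquency did not change.
        return "theory failure: intermediate goal met, outcome unchanged"
    # Both changed; a comparison group is still needed to rule out other explanations.
    return "results consistent with the program theory"

print(interpret_results(True, False))
```

The third branch matters most in practice: an improvement in delinquency without the intermediate change points the evaluator back to the input data to look for some other active ingredient.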
This situation also confirms the need for adequate input data. Although the changes in delinquency may not result from reduced drug use, other aspects of the program may have produced this change. For example, a positive relationship with the counselor may have reduced delinquency independently of drug use. Having these data can aid in redesigning the program to focus on relationships that may be more productive in reducing delinquency.
The consideration of intermediate effects has an additional benefit. Stating the intermediate steps forces a clarification of the project and forms a conceptual model of the processes through which the effects are presumed to be caused. Such explication not only clarifies what is expected to occur, but may also serve as a guide for replicating and revising the project after evaluation results have been obtained.
Measuring Program Outcomes
Measuring outcomes is popularly viewed as the essence of evaluation. In spite of the critical nature of measuring inputs and intermediate effects, program planners and administrators still need to address the question: "Did it work?" As we have observed, there is often no direct answer to this question, and in many cases the most straightforward answer may be "It depends." How success is defined and measured will often determine the degree to which the program is viewed as effective.
At first glance determining the success of juvenile justice programs would not appear to be problematic: program participants either commit new offenses or they don't. Unfortunately, program success is generally not so directly determined.
Rather, juvenile justice success is commonly measured through the concept of recidivism. While the concept universally denotes correctional failure, its operational definition is anything but universal. A number of dimensions of the concept must be specified before a working definition is obtained. First, the threshold of recidivism must be established. What specific criteria indicate program failure? Is a subsequent police contact sufficient, or is it more appropriate to count arrests? Should a formal referral to juvenile court count, or must there be a formal adjudication to indicate failure? Some may argue that commitment to an institutional program after release is the appropriate measure of recidivism.
Obviously, the statistics measuring program outcome will be greatly influenced by this choice of criterion. If adjudication is the criterion, youths who have committed subsequent delinquent acts will not be counted as failures unless the system responds with a formal adjudication. On the other hand, if program success is defined as the absence of police contact, youths who have not committed subsequent delinquent acts could be counted as failures, since they may have contact with the police simply as a result of being known to them.
In each of these situations the real outcome of the program is the same, but it will appear very different because of the variation in definition. This difference may be substantial. Waldo and Chiricos (1977), in evaluating a work release program, noted that program success may vary from 20% to 70% depending on the definition of recidivism.
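The effect of the definition can be illustrated with a short Python sketch. It assumes a hypothetical ordering of system responses, from police contact through commitment, and invented follow-up data recording the most formal response each of ten youths reached. The same ten outcomes yield failure rates from 70% down to 10% depending only on the threshold chosen.

```python
# Assumed ordering of system responses, least to most formal. This treats each
# stage as implying the earlier ones, a simplification for illustration.
STAGES = ["none", "police_contact", "arrest", "court_referral", "adjudication", "commitment"]

# Hypothetical follow-up data: the most formal response each youth reached.
outcomes = ["none", "police_contact", "arrest", "arrest", "court_referral",
            "adjudication", "none", "police_contact", "commitment", "none"]

def failure_rate(outcomes, threshold):
    """Share of youths whose most formal response reached `threshold` or beyond."""
    cutoff = STAGES.index(threshold)
    failures = sum(1 for o in outcomes if STAGES.index(o) >= cutoff)
    return failures / len(outcomes)

for threshold in STAGES[1:]:
    print(f"{threshold:15s} {failure_rate(outcomes, threshold):.0%}")
```

Nothing about the youths' behavior differs between the rows; only the definition of failure changes.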
In defining recidivism, the evaluator must choose the measure that is most appropriate given the scope and objective of the intervention. Generally, the best measure of recidivism is the one closest to the behavior itself, either self-reported delinquency or police contacts. The further a measure is from the individual's behavior, the more it measures organizational behavior and decision-making rather than the commission of delinquent acts. The police decision to refer to court and the court decision regarding adjudication may be influenced by a number of factors other than the youth's behavior. This is most apparent in measures of recidivism involving return to the program. If one is evaluating an institutional program and an area of concern is post-institutional behavior, the focus should be on the individual's performance in the community, not on return to the institution. The offender's return to the institution may be due to a range of factors unrelated to subsequent behavior or delinquency. Thus, measures of program return should be avoided in recidivism studies.
Another important question in defining recidivism is how serious delinquent behavior must be to constitute failure. In evaluating an intervention program aimed at violent juvenile offenders, should a subsequent court referral for a status or minor property offense be considered a failure? There are no hard and fast rules to govern this. However, for many of these decisions, more detailed information can be collected at little or no additional cost. Where possible, data should be presented on the type of subsequent offenses rather than forcing a dichotomous success-or-failure decision.
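Presenting the type of subsequent offense rather than a single success/failure figure can be sketched in Python with invented data. The offense categories are hypothetical labels, and `None` marks a youth with no subsequent referral during follow-up.

```python
from collections import Counter

# Hypothetical most serious subsequent offense for ten youths in a
# violent-offender program; None means no subsequent referral.
subsequent = [None, "status", "minor_property", None, "violent",
              "status", None, "serious_property", "minor_property", None]

counts = Counter(o if o is not None else "no_referral" for o in subsequent)
for category, n in counts.most_common():
    print(f"{category:18s} {n:2d}  {n / len(subsequent):.0%}")

# The single dichotomous figure that the table above breaks down:
any_referral_rate = sum(o is not None for o in subsequent) / len(subsequent)
print(f"any subsequent referral: {any_referral_rate:.0%}")
```

A 60% "failure" rate reads very differently once the reader can see that only one of the six referrals was for a violent offense.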
Another issue in defining recidivism is the length of the follow-up period. One correct but somewhat unhelpful maxim is "the longer the better." Longer follow-up periods have the obvious advantage of better testing the lasting effects of the intervention. However, given the need for timely feedback in a public policy environment, long follow-up periods are often not feasible. The time span will also affect the appearance of success: the longer an individual is followed, the more likely we are to discover some wrongdoing (except for the most saintly clients). Thus, major differences in the apparent impact of the program may be observed between a 3-to-6-month follow-up and a 2-to-3-year follow-up. Another complicating factor is the need to continue follow-up in adult records for youths who reach the age of majority, which becomes a critical issue for longer follow-up periods. Generally, in the evaluation of juvenile justice programs, a six-month follow-up would be viewed as minimal, with a one-year period desirable.
One of the most common evaluation errors concerns the use of the follow-up period. A one-year follow-up period means that data on the legal status of each participant will be collected for the 12 months following that participant's program completion. The important aspect of this definition is that every person has the same time at risk after the program. Too often, the status of offenders is instead reviewed as of a certain date, e.g., one year after the program began.
Results may then show that after a year of program operation, a certain number of youths have completed the program and a percentage, presumably small, have been rearrested. In this situation some offenders will have had lengthy periods at risk, while others will have had only a few days in which to fail. While it is not strictly necessary that each participant have the same follow-up period, it is imperative that the evaluator collect data on each participant's time at risk and consider it in the analysis.
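The fixed-window approach can be sketched in Python using the standard `datetime` module. The participant records and the helpers `fixed_window_status` and `days_at_risk` are hypothetical. Each participant is followed for 365 days from his or her own completion date; anyone whose window has not yet closed is flagged as incomplete rather than counted as a success.

```python
from datetime import date, timedelta

FOLLOW_UP = timedelta(days=365)  # one-year window from each youth's own completion date

# Hypothetical participants: completion date and first rearrest date (None if none).
participants = [
    {"id": "A", "completed": date(2023, 1, 15), "rearrest": date(2023, 9, 1)},
    {"id": "B", "completed": date(2023, 3, 1), "rearrest": None},
    {"id": "C", "completed": date(2023, 11, 20), "rearrest": None},
    {"id": "D", "completed": date(2023, 6, 10), "rearrest": date(2024, 6, 20)},
]

def fixed_window_status(p, as_of):
    """Classify a participant using only the 365 days after his or her completion."""
    window_end = p["completed"] + FOLLOW_UP
    rearrest = p["rearrest"]
    if rearrest is not None and rearrest <= min(window_end, as_of):
        return "failed"
    if as_of < window_end:
        return "incomplete"  # full window not yet observed; not a success
    return "succeeded"  # a rearrest after the window (youth D) does not count

def days_at_risk(p, as_of):
    """Time at risk under a snapshot review on `as_of`; unequal across youths."""
    return (as_of - p["completed"]).days

review_date = date(2024, 6, 30)
for p in participants:
    print(p["id"], fixed_window_status(p, review_date), days_at_risk(p, review_date))
```

Reporting `days_at_risk` next to each status makes visible how unequal the exposure would be under a simple snapshot review.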
Although these are the most common problem areas in the definition of recidivism, a number of other issues deserve careful consideration. These include revocation policy for those under community supervision, e.g., technical violations versus new offenses as reasons for failure. Also, recidivism studies assume that the intervention will change the participants' delinquent activities, so post-program behavior must be judged against prior behavior. Thus, a complete offense history, including dates, offense types, and dispositions, should be obtained. From this information the evaluator can control for the seriousness of prior delinquent behavior. It is important that pre- and post-program data be collected from the same source, since different processes and definitions may be used in collecting various data sets. If different data sources are used for the pre- and post-program measures, a finding regarding program effect may actually be an artifact of differences in the manner in which the data were collected.
Alternative Measures of Program Impact
Although recidivism is often used as the measure of outcome, you should not overlook alternative measures of program impact and effectiveness. For example, in measuring the impact of a juvenile diversion program, changes in the level of court referrals would be a valid outcome measure. Similarly, in evaluating a community service program, one might measure the hours worked to compute public cost savings, as well as participants' attitudes and opinions about their responsibility to the community.
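The community service example involves simple arithmetic, sketched here in Python. The hours and the dollar value assigned to an hour of work are hypothetical placeholders; an actual evaluation would need to justify its valuation rate, for example from local wage data.

```python
# Hypothetical hours logged by five participants.
hours_worked = [40, 25, 60, 32, 18]

# Assumed dollar value of one hour of community service labor (placeholder).
VALUE_PER_HOUR = 12.50

total_hours = sum(hours_worked)
estimated_savings = total_hours * VALUE_PER_HOUR
print(f"{total_hours} hours, estimated value ${estimated_savings:,.2f}")
```

Because the result scales directly with the assumed rate, the rate and its source should be reported alongside the savings figure.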
You would be well advised to create multiple outcome measures for several reasons. First, it is quite rare that the impact of an intervention is observed in only one area. Almost all juvenile justice programs have a range of goals and potential effects. Many purport to benefit clients (better treatment), the organization (greater efficiency), and the larger community (lower crime). Measuring recidivism alone does not capture the multidimensional aspects of these programs. Multiple measures increase the reliability of the evaluation and may increase the acceptability of the findings, thereby adding to overall validity and credibility.
Second, it's not wise to put all the outcome eggs in one basket. When judgments are being made about program continuation, it is better to have more information on performance than to rely solely on one measure such as recidivism, which may be greatly influenced by factors beyond the program's control.
Finally, attention should be paid to less tangible measures of program outcome, such as the consensus building and knowledge production outcomes discussed in Chapter Three. A service delivery or client-oriented program, which might appropriately be evaluated using traditional outcome measures, will have other products, or by-products, worth measuring or assessing qualitatively. Consider, for example, the LRE program in Chapter Three. Interviews with participants revealed that the police handled youths differently after participating in the LRE program. This was recognized as a valuable outcome, and may be considered both a consensus building and a knowledge production outcome.
A comprehensive juvenile justice program plan may contain other programs for which recidivism or other quantitative measures are inappropriate evaluation tools. Legislative initiatives and standard-setting programs fall into this category, as does the creation and implementation of policy or issue review boards. These programs are often aimed directly at consensus building and knowledge production, or at some other system-oriented (rather than client-oriented) goal. Such programs are worthy of and amenable to evaluation, and should get serious consideration. Evaluating them will provide alternative measures of programs and initiatives.
Threats to Validity
The evaluation design section noted the importance of constructing the evaluation so as to rule out alternative explanations of the findings. The validity of a study reflects the accuracy of its results. How confident are we that what we have seen is what is really happening? Can we actually attribute the observed changes to participation in the program?
While the issue of validity can be technical and highly complex, its principal concerns are straightforward and must be considered in every evaluation. The issue is often broken down into two questions: what else, other than program participation, may have caused these results (internal validity), and how general or representative are these findings for other groups or jurisdictions (external validity)? Although in some situations the validity question must be handled empirically, there are several well-known threats to validity of which you should be aware.
Threats to Internal Validity
Different research designs are susceptible to different types of validity threats. For example, the common pre/post design, in which measures are taken before treatment begins and similar measures are taken at the conclusion of the treatment or follow-up period, is vulnerable to changes in the subjects during the project that are unrelated to the program. Similarly, evaluations based on a comparison group design, i.e., a nonrandomly selected "similar" group, face validity problems due to potential selection biases.
Maturation Effects
Pre/post designs are often invalid because of what is known as a "maturation" effect. Maturation effects are changes that occur naturally with the passage of time, such as becoming older, smarter, or more experienced. If these changes are related to the variable under study, a false, or invalid, picture is obtained. This is a particular problem in evaluating juvenile justice programs, since many youths stop committing delinquent acts as they grow older, independent of any formal intervention. Without an adequate comparison group, presumably maturing at the same rate, these changes may mistakenly be attributed to the intervention project.
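One common way to use a comparison group against maturation effects is to subtract the comparison group's pre/post change from the program group's change, often called a difference-in-differences estimate. The Python sketch below uses invented mean offense counts. It illustrates the arithmetic only, and it assumes the two groups would otherwise have changed at the same rate, which is exactly what a well-chosen comparison group is meant to ensure.

```python
# Hypothetical mean offenses per youth in the six months before and after the program period.
program = {"pre": 4.0, "post": 2.5}
comparison = {"pre": 4.2, "post": 3.4}  # similar youths, not in the program, maturing over the same period

program_change = program["post"] - program["pre"]           # -1.5
comparison_change = comparison["post"] - comparison["pre"]  # about -0.8 (maturation, history, etc.)
estimated_effect = program_change - comparison_change       # about -0.7

print(f"naive pre/post change: {program_change:+.1f}")
print(f"change net of comparison group: {estimated_effect:+.1f}")
```

A pre/post design alone would credit the program with the full drop of 1.5 offenses; in this invented example, roughly half of that drop also appears among youths who never entered the program.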
History Effects
Another common threat to validity is known as a history effect. While maturation refers to natural changes in the participants, history refers to changes in the environment outside the project that could produce changes in the variable under study. For example, suppose that during a diversion project aimed at high-risk youths, a juvenile offender commits a heinous crime, prompting an outcry for tougher responses to juveniles. This situation may alter the types of youths referred to the program and thereby affect the results. Attitude surveys are particularly subject to this influence, since opinions may be largely shaped by recent events and media presentation of topical issues.
Selection Effects
A third threat to validity is the effect of how program participants are selected. There is an understandable desire to choose individuals who are the most amenable to treatment, who would most likely benefit from participation, and who are the best risks for community treatment. After all, program continuation may be based on the performance of the initial participants, which creates a natural desire to select those who have the best chance of succeeding.
However, this group may consist of offenders who will do well regardless of program participation. Furthermore, in many cases this hand-picked group is not drawn from the project's stated target population, producing biased results regarding the effectiveness and impact of the program.
For example, a program may be created to divert property offenders from adjudication. In screening potential clients, the staff selects very minor offenders (petty shoplifters) who have a positive home situation, since these offenders are most likely to be good risks for diversion. However, it is unlikely that these offenders were being adjudicated before the program was implemented, and they are less likely to commit subsequent offenses than the more serious delinquent population that would formerly have been adjudicated. Compiling such superficial results and comparing them with regular court probation programs makes the program appear very effective. But this appearance is likely the result of selection bias, not the effect of the program. Given how common this situation is, monitoring should be conducted in all juvenile justice evaluations to make sure the appropriate target population is being reached. An adequate comparison group is necessary to indicate whether the effect would have occurred in a similar population without the program.
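Monitoring whether the target population is being reached can be as simple as checking each participant against the program's stated eligibility criteria. In this hypothetical Python sketch, the target is assumed to be property offenders with a seriousness rating of 3 or more on an invented 1-to-5 scale; both the criteria and the records are illustrative.

```python
def meets_target(youth):
    """Hypothetical eligibility rule: property offenders at seriousness 3 or above."""
    return youth["offense_type"] == "property" and youth["seriousness"] >= 3

# Hypothetical participant records (seriousness on an invented 1-5 scale).
participants = [
    {"id": 1, "offense_type": "property", "seriousness": 1},  # petty shoplifting
    {"id": 2, "offense_type": "property", "seriousness": 4},
    {"id": 3, "offense_type": "property", "seriousness": 1},
    {"id": 4, "offense_type": "property", "seriousness": 3},
    {"id": 5, "offense_type": "status", "seriousness": 1},
]

on_target = [y["id"] for y in participants if meets_target(y)]
share_on_target = len(on_target) / len(participants)
print(f"{share_on_target:.0%} of participants are in the target population: {on_target}")
```

Run periodically during the program, a check like this can flag drift toward low-risk clients before outcome data are collected.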
Mortality Threats
Just as the selection of program clients can be a source of bias, so can differential dropout, or mortality, among participants. While there is a strong temptation to present results only for youths who successfully complete the program, this also results in a biased comparison group. While program completion and obtaining the full treatment effect are important inputs to an evaluation, they should not cloud the comparison with the performance of a control group. Many evaluations take pains to ensure that equivalent groups are available for comparison. If a comparison is made with only those who complete the program, these groups are no longer equivalent. There are most likely qualities that distinguish those who complete the program from those who drop out. To the degree that these qualities are related to delinquency, this inappropriate comparison will introduce significant bias.
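The distortion can be seen in a small invented example, sketched in Python. Counting everyone assigned to the program, its rearrest rate is compared with a hypothetical comparison-group rate; counting only completers makes the same program look much better, because the dropouts, who in this example were all rearrested, have been removed.

```python
# Hypothetical program group: completion status and rearrest during follow-up.
program_group = [
    {"completed": True, "rearrested": False},
    {"completed": True, "rearrested": False},
    {"completed": True, "rearrested": True},
    {"completed": False, "rearrested": True},
    {"completed": False, "rearrested": True},
]
COMPARISON_RATE = 0.40  # hypothetical rearrest rate in an equivalent comparison group

def rearrest_rate(group):
    """Proportion of a group rearrested during follow-up."""
    return sum(y["rearrested"] for y in group) / len(group)

all_assigned = rearrest_rate(program_group)                                     # 3 of 5
completers_only = rearrest_rate([y for y in program_group if y["completed"]])  # 1 of 3

print(f"all assigned: {all_assigned:.0%}, completers only: {completers_only:.0%}, "
      f"comparison: {COMPARISON_RATE:.0%}")
```

Only the first figure compares like with like, since the equivalent comparison group still contains youths who would have dropped out had they been in the program.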
Threats to External Validity
The threats to validity of history, maturation, selection, and mortality concern internal validity, that is, the validity of the findings of the evaluation itself. External validity refers to the degree to which the findings can be generalized to other groups or jurisdictions. If there is a relationship between the kinds of youths in the program and their performance, or if characteristics of the jurisdiction distinguish it from other jurisdictions in ways that may be related to delinquency, then the ability to replicate the program successfully is limited. For example, if a status offender intervention program is found to be effective in an upper-middle-class jurisdiction, there is little reason to believe that a similar program would be effective in a lower-class area, given differences in the dynamics of status offending and the referral process to juvenile court in these areas.
There are a number of additional threats to the "accuracy" of program evaluations. Think through the program procedure and research design and ask what else could produce biased results and threaten the validity of the evaluation findings. Make modifications in the evaluation design to address as many of these pitfalls as practically possible.
Special Topics in Program Evaluation
Sources of Data
Official Sources of Data
In many situations the most convenient sources of data are the official records maintained by juvenile or criminal justice agencies. Arrests, juvenile court referrals, and adjudications are examples of official data frequently used in evaluating juvenile justice programs. When recidivism is employed as an outcome criterion, it is most often measured from official sources. Although these data are useful and often readily available, you must exercise caution in using them. Remember that they were collected principally to account for the activities of criminal justice agencies; arrests, for instance, are more a measure of police activity than of criminal behavior. Even so, arrest data are the preferred official measure of recidivism and the best data available from official sources, since they are the least affected by the filtering process of the juvenile justice system.
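As a sketch of how recidivism might be computed from official arrest records, the Python fragment below counts the youths who have at least one arrest within a fixed follow-up period after leaving the program. The 12-month window, the youth identifiers, and all dates are hypothetical assumptions; an actual evaluation would define the window and the qualifying events (arrest, referral, or adjudication) in advance.

```python
from datetime import date, timedelta

# Assumed follow-up window; the monograph does not prescribe one.
FOLLOW_UP = timedelta(days=365)

# Hypothetical program exit dates for four youths.
exit_dates = {
    "A": date(2020, 1, 15),
    "B": date(2020, 3, 1),
    "C": date(2020, 6, 30),
    "D": date(2020, 9, 10),
}
# Hypothetical arrest dates drawn from police records.
arrest_dates = {
    "A": [date(2020, 8, 2)],
    "B": [date(2021, 5, 20)],                    # falls outside the window
    "C": [],
    "D": [date(2020, 9, 1), date(2021, 2, 14)],  # first arrest predates exit
}


def rearrested(youth_id):
    """True if any arrest falls after program exit and within the follow-up window."""
    start = exit_dates[youth_id]
    end = start + FOLLOW_UP
    return any(start < d <= end for d in arrest_dates.get(youth_id, []))


recidivists = [y for y in exit_dates if rearrested(y)]
recidivism_rate = len(recidivists) / len(exit_dates)
print(f"Rearrested within follow-up: {recidivists} ({recidivism_rate:.0%})")
```

Fixing the window and the starting event for every youth matters: arrests that occur before exit, or long after it, would otherwise inflate or blur the measure.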
In addition to measuring subsequent delinquent behavior, official data also reflect the behavior of agency members themselves. If a juvenile offender is arrested, the arrest will be represented in official police records. However, if a youth engages in subsequent delinquent behavior, it may not come to the attention of the police or, if it does, the officer may choose not to make an arrest. In either case, the behavior would not be reflected in official police records. For this reason, official justice records should always be viewed as an underestimate of the actual amount of criminal or delinquent behavior. Also, changes in organizational activities or policy can affect official data, and such changes should not be mistaken for changes in crime and delinquency. As long as the evaluator is aware of these potential pitfalls and acknowledges them in the report, official records are a valuable source of evaluation data.
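The undercount can be expressed as simple arithmetic. The Python sketch below assumes, purely for illustration, that a youth commits 40 offenses, that one in four comes to police attention, and that half of those lead to arrest; none of these rates comes from the monograph, and the function name is hypothetical. It also shows how a change in arrest policy alone changes the official count while the underlying behavior stays the same.

```python
def expected_arrests(offenses, p_detected, p_arrest):
    """Expected number of offenses that surface as arrests in official records.

    p_detected: assumed probability that an offense comes to police attention.
    p_arrest:   assumed probability that an officer arrests, given attention.
    """
    return offenses * p_detected * p_arrest


actual_offenses = 40  # hypothetical true level of delinquent behavior

before_policy = expected_arrests(actual_offenses, 0.25, 0.5)  # 5.0 arrests
after_policy = expected_arrests(actual_offenses, 0.25, 0.8)   # about 8 arrests

print(f"Official records capture {before_policy / actual_offenses:.1%} of offenses")
print(f"Arrests before policy change: {before_policy:.1f}")
print(f"Arrests after policy change:  {after_policy:.1f} (same behavior)")
```

Because every factor in the product is at most one, the official count can never exceed actual behavior, and any change in the arrest factor shows up as an apparent change in delinquency.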
Self Reported Data
Instead of relying on criminal or juvenile justice agencies to tell us about the behavior of youth, many researchers advocate asking the youths themselves about their delinquent activities. While this approach may seem implausible to some, self-report procedures have repeatedly been found to be valuable and reliable in delinquency research (see Hindelang et al., 1981). In this procedure, the youth is generally asked to complete a questionnaire indicating how often he or she has engaged in specific types of delinquent activities. While self-reported data and official statistics generally agree in identifying the most serious and persistent offenders, self-report instruments offer more accurate and precise measurement of the number and variety of delinquent activities.
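Responses to self-report questionnaires of this kind are often summarized with two scores: a frequency score (the total number of acts reported) and a variety score (the number of distinct types of acts reported). The Python sketch below computes both. The item list, the reference period, and the responses are hypothetical and are not drawn from any particular instrument.

```python
# Hypothetical item list; a real study would use an established instrument.
ITEMS = [
    "skipped school without an excuse",
    "took something worth less than $50",
    "damaged property on purpose",
    "got into a physical fight",
    "drank alcohol",
]


def score(responses):
    """Return (frequency, variety) from a mapping of item -> times reported."""
    counts = [responses.get(item, 0) for item in ITEMS]
    frequency = sum(counts)                    # total number of acts reported
    variety = sum(1 for c in counts if c > 0)  # number of distinct act types
    return frequency, variety


# One youth's hypothetical answers for an assumed 12-month reference period.
answers = {
    "skipped school without an excuse": 6,
    "took something worth less than $50": 2,
    "damaged property on purpose": 0,
    "drank alcohol": 3,
}
frequency, variety = score(answers)
print(f"Frequency: {frequency}, variety: {variety}")
```

Items left unanswered are treated here as zero, which is itself an assumption; an evaluator would need to decide how missing responses should be handled.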
Interviews and Questionnaires
One of the most valuable sources of data is asking program participants, staff, or other individuals directly about matters pertinent to the evaluation. While these approaches may involve somewhat different methods, they are similar in that the evaluator is attempting to elicit information from those knowledgeable about or involved with program activities.
Measurement of self reported delinquency