Chapters 5 - 9

 

CHAPTER 5
USING THE SAR AND THE SSS

 

What Are the SAR and the SSS?

The Subgrant Award Report (SAR) is the form you complete when subgrant awards are made, to inform VAWGO how states are allocating their funds and what the subgrant projects plan to accomplish. An SAR is completed for each new or continuation subgrant awarded with each year's funds. Reports on subgrants funded with FY 1995 funds were done in the fall of 1996, and reports on FY 1996 subgrants are due in fall 1997. Beginning with reports for FY 1996 funds, the SAR can be completed and submitted to VAWGO on the computer.

The Subgrant Statistical Summary (SSS) is the follow-up to the SAR, and provides VAWGO with information on any changes in each subgrant's time period, funding, goals, or activities. It also provides feedback on what the project has accomplished during the reporting period. The SSS is a very useful tool for performance monitoring of immediate results. However, it does not give any information about the ultimate impact of a project. More and different data would be needed to determine how a specific training activity changed officers' attitudes, knowledge, and practices; whether a particular policy has been put into practice and with what effects; how a victim service program has helped women improve their lives; and so on.

The SSS is completed once a year for subgrants which were in operation during the previous calendar year. The first round of SSS reports is due in fall 1997, and will provide information on all subgrants active at any time during calendar year 1996. These might be subgrants funded with FY 1995 funds, FY 1996 funds, or both. They may have started before 1996, they may end during 1996, or they may continue into 1997. You will report on all activities from the project's start date through the end of 1996 (or the project's end date, whichever comes first). The reports will be completed on paper this year but will be available electronically beginning next year.
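The reporting rule above can be sketched in a few lines of Python. This is only an illustration and is not part of the SSS itself; the function names and the sample dates are hypothetical.

```python
from datetime import date

# Calendar year covered by the first round of SSS reports (from the text).
YEAR_START = date(1996, 1, 1)
YEAR_END = date(1996, 12, 31)

def active_in_1996(start, end):
    """True if the subgrant was in operation at any time during 1996."""
    return start <= YEAR_END and end >= YEAR_START

def reporting_period(start, end):
    """Report from the project's start date through the end of 1996
    or the project's end date, whichever comes first."""
    return start, min(end, YEAR_END)

# Hypothetical subgrant that started in 1995 and continues into 1997.
period = reporting_period(date(1995, 10, 1), date(1997, 3, 31))
```

For this hypothetical subgrant, the report would cover October 1, 1995 through December 31, 1996.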

Getting Information for the SAR and the SSS

Your first question might be, How do I get the information I need to complete these forms? Both forms ask for fairly fundamental information on the wide range of subgrant projects which might be funded, and many project directors should already use record keeping systems which capture this information. However, in some cases you may need to collect additional information. This section describes the basic types of information needed for these forms and where you are likely to find them.

The SAR may be completed by either the state administrator or the subgrantee. It asks for information which was very likely provided in the subgrant application materials. This includes fundamental descriptive information such as the amount and type of STOP funding and match; project start and end dates; geographical target area; type of violence against women to be addressed; type of subgrantee agency; and subgrant purpose areas. It also asks for information on other important concerns expressed in the STOP legislation, including efforts to address full faith and credit and underserved populations. If any information requested in the SAR was not provided in the application materials or other existing sources, state administrators and subgrantees should take the opportunity to discuss how the subgrantee will provide the information, and also what information should be included in future subgrant applications.

The new electronic version provided for the FY 1996 subgrants asks for much of the same information as the earlier paper version used for the FY 1995 subgrants, so state administrators are likely to have already established procedures for getting this information. The major difference between the forms over these two years is that where the 1995 form asked for narrative write-in answers, the 1996 form provides answers to be checked off, based on the most common types of answers received in the earlier year. This should make it easier and quicker for you to complete the new SAR.

The SSS will most likely be completed by the subgrantee and forwarded for review by the state administrator, unless the state agency has established procedures for subgrantees to report this type of information to the state, and for the state agency to maintain the data. The SSS is divided into nine parts (I through IX). Which parts and how many parts each subgrantee will complete depends on the project's purpose areas.

Part I of the SSS must be completed for all subgrants. It provides updates on some of the fundamental descriptive information which may have changed over the course of the project. Each of the next seven parts (Part II through Part VIII) relates to one of the seven VAWA purpose areas and asks for very basic information for each purpose area the subgrant addressed.

The number of parts you complete depends on how many purpose areas your subgrant addressed. Many subgrantees will only complete one part, as their subgrant addresses only one purpose area. So, if your subgrant was focused entirely on delivering direct victim services, you would complete only Part VI; if it focused exclusively on stalking, you would complete only Part VII.

However, some of you may have to complete more than one part if your subgrant covered two or more purpose areas (e.g., if your subgrant includes training, policy development, and data system development, you would complete Parts II, IV, and V).

Much of the information you will need to complete the SSS should be available from records routinely kept by projects, such as sign-in sheets for training sessions or logs of calls to a hotline. Other information may be obtained from knowledgeable project staff, such as what other training-related activities were done, how many people staff the special unit, what topics are addressed in the policies you developed, the type of database you have created, and so on.

Part IX is very important. Everyone who serves victims directly should complete Part IX. Part IX asks you to describe the demographic and background characteristics of the victims you served, and is information that VAWGO is required to collect by the VAWA. Subgrants that focus on or have components that focus on special units, victim services, data systems, stalking, or serving Indian populations will all need to complete Part IX if they provided any services directly to victims.
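The rules for deciding which parts of the SSS to complete can be summarized as a small lookup. The sketch below is an illustration only. The part numbers for training, policy development, data systems, victim services, and stalking come from the examples in this chapter; the numbers for special units (Part III) and Indian populations (Part VIII) are inferred from the surrounding text and should be checked against the form itself. All names in the code are hypothetical.

```python
# Purpose area -> SSS part. Entries marked "inferred" are assumptions,
# not statements taken directly from this chapter.
PART_FOR_PURPOSE_AREA = {
    "training": "II",
    "special units": "III",        # inferred from the special unit example
    "policy development": "IV",
    "data systems": "V",
    "victim services": "VI",
    "stalking": "VII",
    "Indian populations": "VIII",  # inferred by elimination
}

PART_ORDER = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]

def parts_to_complete(purpose_areas, serves_victims_directly):
    """Return the SSS parts a subgrantee would complete, in form order."""
    parts = {"I"}  # Part I must be completed for all subgrants
    for area in purpose_areas:
        parts.add(PART_FOR_PURPOSE_AREA[area])
    if serves_victims_directly:
        parts.add("IX")  # everyone who serves victims directly
    return [p for p in PART_ORDER if p in parts]

example = parts_to_complete(
    ["training", "policy development", "data systems"],
    serves_victims_directly=False,
)
```

Here `example` lists Parts I, II, IV, and V, which matches the training, policy, and data system example given earlier.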

Using the SAR and SSS Information

Information from these sources is invaluable to VAWGO and the Urban Institute as they monitor, evaluate, and report to Congress on how the STOP program is working. But what can it do for you? If you are a state administrator, it can help you keep track of how funds are being distributed across the state, what types of projects are working particularly well, and useful directions for future funding. If you are a subgrantee, this information can help you monitor the implementation and accomplishments of your project, make improvements as needed, and document achievements to justify future funding.

State-Level Monitoring and Planning

SAR and SSS information can be used to identify statewide patterns in STOP funding; what needs are being addressed; and what needs are remaining unmet. For example, you may be interested in knowing how many subgrants or how much funding is devoted to sexual assault versus domestic violence versus stalking. You can aggregate the information from these forms to get total spending for each type of crime for a single year, or identify spending trends over time as the STOP program continues. You may also want to do the same analysis of which underserved populations are being targeted for services, or which areas of the state have not received funding. Knowing how the funding has been allocated according to whatever factors are most important in your state can be very useful in deciding future funding priorities and strategies for soliciting and receiving the proposals you are most interested in.
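If your SAR information is stored electronically, the aggregation described above might look something like the following sketch. The records, field names, and dollar amounts are invented for illustration, and the sketch assumes each subgrant addresses a single type of crime; a real analysis would need a rule for subgrants that address more than one.

```python
from collections import defaultdict

# Hypothetical SAR records (not real data).
subgrants = [
    {"fiscal_year": 1995, "crime_type": "domestic violence", "stop_funds": 40000},
    {"fiscal_year": 1995, "crime_type": "sexual assault", "stop_funds": 25000},
    {"fiscal_year": 1996, "crime_type": "domestic violence", "stop_funds": 50000},
    {"fiscal_year": 1996, "crime_type": "stalking", "stop_funds": 10000},
    {"fiscal_year": 1996, "crime_type": "sexual assault", "stop_funds": 30000},
    {"fiscal_year": 1996, "crime_type": "domestic violence", "stop_funds": 15000},
]

def total_by_year_and_crime(records):
    """Sum STOP funds for each (fiscal year, crime type) pair."""
    totals = defaultdict(int)
    for record in records:
        key = (record["fiscal_year"], record["crime_type"])
        totals[key] += record["stop_funds"]
    return dict(totals)

totals = total_by_year_and_crime(subgrants)

# Print one line per year and crime type to show the trend over time.
for (year, crime), amount in sorted(totals.items()):
    print(f"FY {year}  {crime:<18} ${amount:,}")
```

The same approach works for other groupings, such as underserved populations or regions of the state, by changing the fields used in the key.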

You can also gain very valuable information by comparing SARs and SSSs on a subgrant-by-subgrant basis. This comparison will let you answer questions about how subgrant goals or activities change over time, and to what extent the projects are achieving their objectives. Suppose, for example, you compare the SARs and the SSSs and you find that the prosecution training projects are having more difficulty staying on time or reaching potential trainees than the law enforcement training projects. This may lead you to look for differences between law enforcement and prosecution in their training projects (e.g., what type of agency receives the funds, how the training is structured) or the general training environment (e.g., requirements, coordinating agencies, available curricula). You may then be able to identify what it is about the law enforcement training projects or environment that is more conducive to project success, and how these features can be borrowed or adapted to benefit the prosecution training projects. This may help you select future prosecution training proposals with a higher likelihood of success, and may indicate special projects needed (e.g., you may wish to fund a project to establish a coordinating agency for prosecution training).

Monitoring and Modifying the Subgrant

The information needed for the SAR and SSS can also be very useful to administrators of subgrants. The SAR provides a record of the activities and goals that you and the state STOP administrator agreed your project would address when the award was first made. The SSS you complete every year (for multiyear projects) provides a record of progress and milestones you have achieved. They can be used to improve how your program is implemented or renegotiate activities and goals if any have proven beyond the scope of the project. Of course, you need not assess progress only once a year when the SSS is due; you can monitor the project more often by getting quarterly or even monthly reports from the record keeping system you have established to provide the information requested in the SSS.

The SSS also documents what your project has accomplished—how many victims you have served, how many staff you have trained, or what policies you have developed. This information can be very useful in helping you assess what still needs to be done and how best to do it based on your experience working in the area. Being able to show that you have done successful work in the past and can identify and address unmet needs is very impressive to potential funders.

Providing Feedback on the Forms

Another very important type of information that you can provide is your opinion on the SAR and SSS forms themselves. Let your VAWGO Program Manager know what parts were difficult to use or interpret; what parts were not useful to you or did not really capture what your project is about; and what additional information you would like to provide to give a better understanding of what your project is about, or that would be more helpful in meeting your goals as you monitor, improve, and seek additional support for your statewide strategy or individual project. These forms can be modified from year to year, and your experience with them would provide very valuable insights about how they can be improved to meet everyone's needs for information.

How to Handle Special Circumstances

In some cases it might not be that easy to determine program outputs—how many personnel were trained, victims served, and so on—under STOP funds. This might be because the project's support came from more than one source, because the project is giving more or different services to the same victims it would have served without STOP funds, or because the project enhances service quality but does not increase the numbers of victims served or services offered. How do you report project activities and accomplishments on the SSS then?

Projects with Several Funding Sources

Suppose, for example, STOP funds supported a project which established a sexual assault special unit in a law enforcement agency and developed a policy on investigation and charging. This project is being supported with equal amounts of STOP, VOCA, and Byrne funds. The unit has five full-time staff who investigated 300 cases of sexual assault through 1996. The question is whether (1) to report on the SSS that the STOP funds were used to develop one-third of a policy, employ one and two-thirds staff, and investigate 100 cases; (2) to report some other allocation (perhaps the STOP funds were in fact used exclusively for policy development); or (3) to report the total outputs of the project without allocating them among funding sources.

You will provide the clearest and most comprehensive picture of what the project has accomplished by reporting in Parts III, IV, and IX the total outputs of the project (e.g., 5 staff, 1 policy, and 300 cases), even though it was supported by three sources of funding. To reflect the funding situation, make sure that your answer to Question 4 in Part I shows ALL the non-STOP funding sources being used to support this activity, including the dollar values assigned for in-kind support. Analyses of projects with multiple sources of funding done by state coordinators or national evaluators will then combine your answers to Parts III, IV, and IX with Part I, Question 4 to assess cost per unit of service, efficiency, or productivity. That is, they will compare the total inputs (STOP, other federal, state, local, and private funds and in-kind cash equivalents) to the total outputs (300 cases, 1 policy, and 5 staff). In some cases this may tell us how much was spent per staff, or per case, or per training session, or per policy, and so on. We may or may not be able to allocate costs to specific funding sources, but since you don't need to know where the funds came from to know whether a service is efficient or productive, this is a useful approach and provides the most comprehensive information on the projects STOP funds are supporting.
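The arithmetic behind the two reporting choices can be sketched as follows. The output figures (5 staff, 1 policy, 300 cases) and the equal three-way split come from the example above; the dollar amounts are hypothetical, and the cost-per-case figure is a deliberate simplification that ignores the share of funds spent on policy development.

```python
from fractions import Fraction

# Hypothetical dollar amounts; the example states only that the three
# sources contribute equal amounts.
funding = {"STOP": 60000, "VOCA": 60000, "Byrne": 60000}

# Total project outputs from the example.
outputs = {"staff": 5, "policies": 1, "cases": 300}

total_funding = sum(funding.values())
stop_share = Fraction(funding["STOP"], total_funding)  # one-third

# Option 1: allocate outputs to STOP in proportion to its funding share.
stop_allocated = {name: count * stop_share for name, count in outputs.items()}

# Option 3 (recommended): report total outputs, then compare total inputs
# to total outputs to get a cost per unit of service.
cost_per_case = total_funding / outputs["cases"]
```

Under option 1 the STOP share works out to one and two-thirds staff, one-third of a policy, and 100 cases. Under option 3, with these hypothetical amounts, the project spent $600 per case from all sources combined.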

Projects that Enhance Services to the Same Victims

When STOP funds support a project that lets you serve victims you would not otherwise have reached, it is clear that those are the victims you should report on in Part IX. But suppose you were running a counseling service for sexual assault and domestic violence victims, and (Case A.1) you are using your new STOP funding to add court accompaniment for about 20 percent of your current clients, or (Case A.2) you are now able to offer victims up to 10 free counseling sessions whereas before you had to limit them to 5 free sessions. How do you characterize your services, and what victims do you include in Part IX?

CASE A.1: In Part VI, indicate on Questions 28 and 29 that your STOP subgrant gave new types of service to the same victims you would have served even without the STOP funds. Then, in Part IX, provide the characteristics of ONLY the victims who received court accompaniment (in addition to the counseling services you were already offering before STOP funding).

CASE A.2: In Part VI, indicate on Questions 28 and 29 that your STOP subgrant gave more of the same types of service to the same victims you would have served even without the STOP funds. Then, in Part IX, provide the characteristics of the victims who received the increased number (6 or more) of free sessions.

Projects that Enhance Service Quality but NOT Numbers

Questions might also arise when STOP funds are supporting goals and activities which are not easily translated into numbers of services or numbers of victims. Suppose (Case B.1) you are using your STOP money to add a nighttime staff person to your shelter to make it more secure and make women feel safer, but you will not be serving any more women, unless (Case B.2) more women are now willing to stay in the shelter because they know it is safer. How do you report on your project and the victims it served?

CASE B.1: In Part VI, indicate on Questions 28 and 29 that your STOP subgrant gave enhanced or improved services to the same victims you would have served even without the STOP funds. Then, in Part IX, provide the characteristics of ALL the victims who stayed in your shelter once the nighttime staff person was on board.

CASE B.2: In Part VI, indicate on Questions 28 and 29 that your STOP subgrant gave enhanced or improved services to the same victims that you would have served even without STOP funding, AND ALSO to different victims than you would have served without STOP. Then, in Part IX, provide the characteristics of ALL the victims who stayed in your shelter once the nighttime staff person was on board.

However, this leaves you dissatisfied because what you really want to show is the impact of your project—that women feel safer when they stay in your shelter. You want to know whether women feel safer under the new arrangements because if they don't, you want to do something different. This is because you want them to feel safer—you are not doing this just to fill out forms for VAWGO. To answer this question you will need to go beyond the performance monitoring data you provided in the SSS and gather some impact data. You can ask around informally and stop there. Or, you can use one of the outcome measures described in Chapter 7 to assess the level of safety your clients feel and whether it has changed from before to after the new security measures were put in place. In fact, as you will usually have some time between notification of award and actually getting the money, you can use this time to collect data from the women who stay in your shelter without the enhanced nighttime security to make your "before" assessment. Then use the same measures to ask the clients who stay there with the enhanced security how they feel, and you will have your "after" assessment. Compare the two to see whether you have achieved your goal.

Other Ways to Get Useful Information

The SSS is not the only way you can find out what is going on across your state or how your project is coming along. Your clients and colleagues have many insights and suggestions they would no doubt be delighted to share if you asked them.

State administrators can turn to the statewide planning group or advisory board for advice and assistance. Members who represent law enforcement, prosecution, and victim services agencies and associations should have a good idea how projects are working in their areas and what additional work is still needed. State administrators can also consider convening regional or statewide subgrantee conferences to get their feedback directly and to promote cross-fertilization of ideas from one community to another. A model emphasizing multiagency community teams, similar to the model used in the VAWGO conference in July 1995, might be a useful approach. State administrators may also wish to borrow another page from the federal approach and support a state-level technical assistance project. Not only would this enhance project activities, but monitoring the types and sources of questions and issues may indicate areas for further work.

Subgrantees may also gain valuable insights by consulting other members of their community. You may find that your project is having unintended side effects, positive or negative, on other agencies, and can then work with those personnel to identify appropriate responses to spillover effects. You may also find that other agencies could contribute to or benefit from expansion of your project to include them; either way your project accomplishes more. And don't overlook the people who participate in your project—you can gain many valuable insights that would otherwise have never occurred to you by directly asking prosecutors how the special unit is coming along, asking court personnel what makes your database easy or difficult to use, asking law enforcement officers what they thought of the training you provided, or asking victims whether the project's services were helpful and what else they needed. STOP project staff may be able to hold a small number of informal interviews, but larger or more structured focus groups or surveys may be best undertaken by local or state evaluators involved with the project.

 

CHAPTER 6
CHOOSING AN EVALUATION DESIGN

This chapter describes the major evaluation choices available to you and discusses the factors you need to consider in picking an approach for your evaluation. It covers the three types of evaluation described in Chapter 1: impact evaluation, process evaluation, and performance monitoring. Following an overview of evaluation designs, the chapter briefly describes a variety of data sources and data collection strategies, including quantitative and qualitative ones. You may use one or more types of evaluation, and one or more data collection strategies, in evaluating your program. Once you have an overview of options in evaluation design, the chapter helps you figure out which level of evaluation might be right for your program, what evaluation activities to select, and how to proceed when your project is already operating and now you want to introduce an evaluation. The chapter also includes a discussion of how to safeguard victim rights and well-being when you are thinking of collecting data from the individuals you serve. The chapter ends with a list of additional reading about evaluation design and basic research methods.

Parts of this chapter are important for everyone to read. These include the section on choosing the right level of evaluation for your program (among impact and process evaluations, and performance monitoring), and the section on informed consent and data security. Other sections of this chapter may seem quite technical to some readers in their discussion of evaluation design choices and the data collection methods that might go along with them. Therefore, read those sections if you intend to get seriously involved in working with an evaluator to shape the design of an evaluation, or if you intend to design your own evaluation (in the latter case, you probably will also want to consult some of the books listed in the Addendum at the end of this chapter).

Readers who are not going to be deeply involved in evaluation design can still interact effectively with evaluators without committing the more technical parts of this chapter to memory. The most important thing for these readers is to become familiar with the general evaluation options and choices (impact or process evaluation, performance monitoring, and what is meant by a comparison group and what makes a good one) by skimming the more technical sections to get a general idea of design options. Exhibit 6.1, which will be found at the end of the material on evaluation design options, gives you a quick graphic overview and summary of the technical material described in the impact evaluation design portions of this chapter. The exhibit is a "decision tree" showing the design options for impact evaluations available to you under different conditions.

Impact Evaluation Designs

Our description of impact evaluations begins with the least demanding design and moves to more elaborate designs. The following sections present the key elements of each design and variations you can consider. The strengths and limitations of each are summarized, as are the general requirements of each in terms of resources such as budget and staff. As you move through these choices, the budget increases, as does the extent to which you produce scientifically convincing results. However, as noted below, the best choice is often driven by a consideration of the audience for your results: who wants to know, when do they need to know, what issues do they care about, and what types of information will convince them?

We have tried to avoid unnecessary jargon in describing evaluation methods, but we do use the traditional evaluation terms to describe the people from whom you will be collecting data. Project participants are called "the treatment group" and the services they receive are called "the treatment." Those who do not receive services are called "the control group" (if people are randomly assigned to treatment and control groups) or "the comparison group" (if some method other than random assignment is used to select this group).

Non-Experimental Impact Evaluations

Key Elements. Non-experimental impact evaluations examine changes in levels of risk or outcomes for project participants, or groups that may include project participants (e.g., all women in a particular neighborhood). Non-experimental designs do not compare the outcomes for participants to individuals or groups who do not get services.

Design Variations. You can choose from four primary types of non-experimental design: (1) comparisons of groups before and after treatment; (2) time series designs; (3) panel studies; and (4) cross-sectional comparisons after a treatment has been delivered.

The first two designs are based on analysis of aggregate data—that is, data for groups, not for individuals. In a before and after comparison, outcomes for groups of participants that enter the project at a specific time and progress through it over the same time frame are measured before and after an intervention. Your assessment of program impact is inferred from the differences in the average score for the group before and after the services. This simple design is often used to assess whether knowledge, attitudes, or behavior of the group changed after exposure to an intervention. For example, a project focused on training might ask whether the average score on knowledge about domestic violence policies increased for your group of participating police, prosecutors, or others after the training compared to the baseline score measured at the start of training. Similarly, you could measure public attitudes or beliefs before and after a public safety campaign.
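A before and after comparison of group averages involves very little computation. The sketch below uses invented knowledge test scores for a small training group; the numbers and variable names are hypothetical, and a real evaluation would also want to test whether the change is larger than chance variation.

```python
from statistics import mean

# Hypothetical knowledge test scores for the same training group.
scores_before = [55, 60, 48, 70, 62]  # baseline, measured at the start of training
scores_after = [68, 72, 60, 75, 70]   # measured after training

average_before = mean(scores_before)
average_after = mean(scores_after)

# Program impact is inferred from the difference in group averages.
average_change = average_after - average_before
```

With these invented scores the group average rises from 59 to 69, a change of 10 points.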

A time series design is an extension of the before and after design that takes measures of the outcome variables several times before an intervention begins (e.g., once a month for the six months before an intervention starts) and continues to take measures several times after the intervention is in place (e.g., once a month for six months after the intervention). The evaluation tests whether a statistically significant change in direction or level of the outcome occurs at or shortly after the time of the intervention. For example, a project trying to increase community collaboration could begin collecting information on the number of cross-agency referrals and other collaborative actions every month for the six months before intensive collaboration development efforts begin, and for every month of the two years following the initiation of collaborative work. You could then trace the development of collaborative activity and tie it to events in the community (including the timing of stepped-up efforts to promote collaboration).
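As a rough illustration of what a time series comparison looks at, the sketch below computes the change in level (average monthly count) and the change in direction (trend) around the start of an intervention. The monthly referral counts are invented, the function name is hypothetical, and this descriptive summary is not itself a test of statistical significance.

```python
from statistics import mean

def slope(values):
    """Least-squares slope of values against month index 0, 1, 2, ..."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    denominator = sum((x - x_bar) ** 2 for x in range(n))
    return numerator / denominator

# Hypothetical monthly counts of cross-agency referrals.
referrals_before = [4, 5, 4, 6, 5, 6]     # six months before the intervention
referrals_after = [8, 9, 11, 12, 14, 15]  # six months after it began

level_change = mean(referrals_after) - mean(referrals_before)
trend_before = slope(referrals_before)
trend_after = slope(referrals_after)
```

An evaluator would normally follow a summary like this with a formal statistical test and would check whether other community events could explain the change.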

Time series measures may be collected directly from project participants. However, people also use a time series design based on information from larger groups or units that include but are not restricted to project participants. For example, rates of reported violent offenses against women for neighborhoods in which special police patrols are introduced might be used to assess reductions in violence. A time series design using publicly available data (such as the rate of violent offenses just suggested) should be considered when it is difficult to identify who receives project services, or when the evaluation budget does not support collection of detailed data from project participants. Although new statistical techniques have strengthened the statistical power of these designs, it is still difficult to rule out the potential impact of non-project events using this approach.

The next two designs examine data at the individual level (that is, data come from individuals, not just from groups). Cross-sectional comparisons are based on surveys of project participants that you conduct after the project is completed. You can use the data collected with this design to estimate correlations between the outcomes experienced by individuals and differences in the duration, type, and intensity of services they received. This will let you draw some conclusions about plausible links between outcomes and services within your treatment group. However, you cannot draw definitive conclusions about what caused what, because you do not have any type of comparison group that would let you say "it happened for those who got services, but not for those who did not get services."

Panel designs use repeated measures of the outcome variables for individual participants in a treatment. In this design, outcomes are measured for the same group of project participants, often starting at the time they enter the project and continuing at intervals over time. The design is similar to the "time series" design described earlier, but the data come from individuals, not from groups, and data collection rarely starts before the individuals enter the program or receive the intervention.
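For the cross-sectional comparison, the correlation between the amount of service received and an outcome can be estimated with a standard Pearson coefficient. The sketch below is illustrative only: the survey values are invented, the function name is hypothetical, and a correlation of this kind shows an association, not a cause.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    spread_x = sum((x - x_bar) ** 2 for x in xs)
    spread_y = sum((y - y_bar) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)

# Hypothetical survey data, one entry per participant.
counseling_sessions = [1, 2, 3, 4, 5]  # intensity of service received
wellbeing_score = [2, 4, 5, 4, 5]      # outcome reported after the project

r = pearson(counseling_sessions, wellbeing_score)
```

A positive value of `r` suggests that participants who received more sessions tended to report better outcomes, which is a plausible link but not proof that the sessions caused the improvement.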

Considerations/Limitations. Correctly measuring the services received by project participants is critical in non-experimental evaluations. Because the inferences about project impact are based on response to services, differences in the type and amount of service received are critical. The key variations in services need to be spelled out carefully in developing your logic model. Several limitations to non-experimental designs should be noted:

Practical Issues/Data Collection. Non-experimental designs have several practical advantages. They are relatively easy and inexpensive to conduct. Data from individuals for cross-sectional or panel analyses are often collected routinely by the project at the end (and sometimes beginning) of project participation. When relying on project records, the evaluator needs to review the available data against the logic model to be sure that adequate information on key variables is already included. If some key data are missing, the evaluator needs to set up procedures for collecting additional data items.

When individual project records are not available, aggregate statistics may be obtained from the project or from other community agencies that have information on the outcomes you care about. The primary problem encountered in using such statistics for assessing impacts is that they may not be available for the specific population or geographic area targeted by the project. Often these routinely collected statistics are based on the general population or geographic areas served by the agency (e.g., the police precinct or the clinic catchment area). The rates of negative outcomes for the entire set of cases included may well be lower than rates for your target group, if you are trying to serve those with the most severe cases or history of violence. The larger the population or geographical area covered by the statistics, the greater the risk that any effects on program participants will be swamped by the vastly larger number of nonparticipants included in the statistics.
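A short worked example shows how easily a real effect on participants can be swamped in area-wide statistics. All of the numbers below are invented for illustration, and the function name is hypothetical.

```python
def rate(events, population):
    """Share of the population experiencing the negative outcome."""
    return events / population

# Hypothetical precinct: 200 project participants among 10,000 women.
participants = 200
nonparticipants = 9800

# Suppose the project cuts repeat victimization among participants in half
# (60 events down to 30), while nothing changes for nonparticipants.
participant_events_before, participant_events_after = 60, 30
nonparticipant_events = 490

precinct_rate_before = rate(participant_events_before + nonparticipant_events,
                            participants + nonparticipants)
precinct_rate_after = rate(participant_events_after + nonparticipant_events,
                           participants + nonparticipants)

participant_rate_before = rate(participant_events_before, participants)
participant_rate_after = rate(participant_events_after, participants)
```

In this example the participants' rate falls from 30 percent to 15 percent, but the precinct-wide rate moves only from 5.5 percent to 5.2 percent, a change that could easily be lost in ordinary year-to-year variation.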

A more expensive form of data collection for non-experimental evaluations is a survey of participants some time after the end of the project. These surveys can provide much needed information on longer term outcomes such as rates of employment or earnings for battered women after leaving the battering situation, or psychological health for sexual assault victims one or more years after the assault. As in any survey research, the quality of the results is determined by response rate rather than by overall sample size, and by careful attention to the validity and reliability of the questionnaire items.

There are a variety of data collection strategies for use in non-experimental and other types of evaluation designs. We describe a number of them later in this chapter, after the review of evaluation designs is completed.

Quasi-Experimental Designs

Key Elements. Quasi-experimental evaluations compare outcomes from project participants to outcomes for comparison groups that do not receive project services. The critical difference between quasi-experimental and experimental designs is that the decision on who participates in the program is not random. Comparison groups are made up of individuals as similar as possible to project participants on factors that could affect the selected outcomes you want to measure. Statistical techniques are then used to control for remaining differences between the groups.

Usually, evaluators use existing groups for comparison—victims (or police officers) in the same or similar neighborhoods of the city who did not receive services (or training), or those who have similar cases in other neighborhoods. In some situations, selected staff (or precincts or court dockets) try a new "treatment" (approach to services) while others do not. When selecting a comparison group, you need to be sure that the comparison group is indeed similar to the treatment group on critical factors. If victims are to be served or officers are to be trained, those receiving new services should be similar to those who get the existing services.

Design Variations. As just described, the main way to define a comparison group is to find an existing group as similar as possible to the treatment group. The most common variation on the "whole group" approach is called "matching." In matching, the researcher constructs a comparison "group" by matching individuals who do not receive treatment to individuals in the treatment group on a selected set of characteristics. This process for constructing a comparison group faces two relatively serious threats to validity. The first is that the groups, while similar at the time of selection, may change over time due to pre-existing characteristics. As a result, changes over time may reflect factors other than the "treatment." The second is that the researcher may have failed to use key variables influencing outcomes in the matching process. These variables, which differed between the two groups at the outset, may still cause matched groups to differ on outcomes for reasons other than the treatment. To do the best possible job of selecting critical variables for matching, you should refer to the background factors which your logic model identifies as likely to influence outcomes. These factors should be used in the match.
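
The matching idea can be sketched in a few lines. This is only an illustration under stated assumptions: the background factors (`age`, `prior_incidents`, `children`), the sample records, and the simple "closest available person" rule are all hypothetical, and a real evaluation would take its matching factors from the logic model and would normally put them on a common scale before comparing them.

```python
def distance(a, b, factors):
    """Sum of absolute differences on the matching factors (unscaled)."""
    return sum(abs(a[f] - b[f]) for f in factors)

def match(treated, pool, factors):
    """Greedy one-to-one matching without replacement.

    Each treatment group member is paired with the closest comparison
    candidate still available. Returns (treated_id, comparison_id) pairs.
    """
    available = list(pool)
    pairs = []
    for person in treated:
        if not available:
            break  # more treated members than comparison candidates
        best = min(available, key=lambda c: distance(person, c, factors))
        pairs.append((person["id"], best["id"]))
        available.remove(best)
    return pairs

FACTORS = ["age", "prior_incidents", "children"]  # hypothetical factors

treated = [
    {"id": "T1", "age": 25, "prior_incidents": 3, "children": 1},
    {"id": "T2", "age": 40, "prior_incidents": 1, "children": 2},
]
pool = [
    {"id": "C1", "age": 41, "prior_incidents": 1, "children": 2},
    {"id": "C2", "age": 26, "prior_incidents": 3, "children": 1},
    {"id": "C3", "age": 60, "prior_incidents": 0, "children": 0},
]

pairs = match(treated, pool, FACTORS)
print(pairs)
```

Note that the sketch can only match on what it is given. A factor left out of `FACTORS` is exactly the second threat described above: the matched groups may still differ on it.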

Quasi-experimental designs vary in the frequency and timing of collecting data on outcome measures. One makes decisions about the frequency and timing of measurements after assessing the potential threats posed by competing hypotheses that cannot be ruled out by the comparison methodology. In many situations, the strongest designs are those that collect pre-project measures of outcomes and risk factors and use these in the analysis to focus on within-individual changes that occur during the project period. These variables are also used to identify groups of participants who benefit most from the services. One design variation involves additional measurement points (in addition to simple before and after) to measure trends more precisely. Another variation is useful when pre-project data collection (such as administering a test on knowledge or attitudes) might "teach" a sample member about the questions to be asked after the project to measure change, and thus distort the measurement of project impact. This variation involves limiting data collection to the end of the project period for some groups, allowing their post-project answers to be compared with the post-project answers of those who also participated in the pre-project testing.

Considerations/Limitations. Use of non-equivalent control group designs requires careful attention to procedures that rule out competing hypotheses regarding what caused any observed differences on the outcomes of interest.

A major threat in STOP evaluations may be that known as "history" —the risk that unrelated events may affect outcomes. The rapid change in laws, services, and public awareness of violence against women may affect the policies and services available to treatment and comparison groups alike. Changes may occur suddenly in large or small geographic areas, jurisdictions, or service catchment areas. For example, if one court begins using a victim advocate successfully, other nearby courts may adopt the practice or even undertake a more comprehensive project with similar goals. The same is true of prosecution strategies or law enforcement approaches. If your comparison group came from the courts or offices that leapt on the bandwagon shortly after you drew your sample, your "comparison group" has just become a treatment group.

A second threat to validity is the process of "selection" —the factors that determine who is eligible for, or who chooses to use, services. Some of these factors are readily identified and could be used in selecting the comparison sample, or could be included in the statistical models estimating project impact. For example, if victims who do not speak English are excluded from services either formally or informally, the comparison of outcomes needs to consider English language proficiency as a control variable. Such differences may not be easy to measure during the evaluation.

Practical Issues/Data Collection. It is a challenge to build defenses or "controls" for threats to validity into evaluation designs through the selection of comparison groups and the timing of outcome observations. Even when the comparison group is carefully selected, the researcher cannot be sure that all relevant group differences have been identified and measured accurately. Statistical methods can adjust for such problems and increase the precision with which project effects can be estimated, but they do not fully compensate for the non-random design. Findings need to be interpreted extremely cautiously, and untested alternative hypotheses need to be considered carefully.

Plans for quasi-experimental evaluations need to pay close attention to the problem of collecting comparable information on control group members and developing procedures for tracking them. You may be able to collect data and provide contact information for treatment group members relatively easily because the program and cooperating agencies have continuing contacts with clients, other agencies, and the community, and have a stake in the outcome of your evaluation. Collecting comparable data and contact information on comparison groups can be difficult. If you collect more complete information for your treatment group than for your comparison group or lose track altogether of more comparison than treatment group members, not only will the evaluation data be incomplete, it will be biased—that is, it will provide distorted and therefore misleading information on project impact. The best way to avoid bias from this problem is to plan tracking procedures and data collection at the start of the evaluation, gathering information from the comparison group members on how they can be located, and developing agreements with other community agencies, preferably in writing, for assistance in data collection and sample member tracking. These agreements are helpful in maintaining continuing contact with your sample in the face of staff turnover at the agencies involved.
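
One simple check on this kind of bias is to compare follow-up rates for the two groups as data collection proceeds. The sketch below assumes a hypothetical tracking record with a `located` flag for each sample member and an arbitrary 10-point warning threshold; both are illustrative choices, not standards.

```python
def follow_up_rate(sample):
    """Share of sample members located for follow-up data collection."""
    located = sum(1 for member in sample if member["located"])
    return located / len(sample)

def attrition_gap(treatment, comparison):
    """Difference in follow-up rates (treatment minus comparison)."""
    return follow_up_rate(treatment) - follow_up_rate(comparison)

# Invented tracking records: 4 of 5 treatment members located, 2 of 5 comparison.
treatment = [{"id": f"T{i}", "located": i != 5} for i in range(1, 6)]
comparison = [{"id": f"C{i}", "located": i <= 2} for i in range(1, 6)]

WARNING_THRESHOLD = 0.10  # hypothetical cut-off for flagging a problem

gap = attrition_gap(treatment, comparison)
if abs(gap) > WARNING_THRESHOLD:
    print(f"Follow-up rates differ by {gap:.0%}; results may be biased.")
```

A large gap does not by itself show that findings are wrong, but it signals that the people you can still reach in one group may differ from those you can reach in the other.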

Quasi-experimental designs may employ a variety of quantitative and qualitative approaches to gather the data needed to draw conclusions about a project and its impact. Data collection strategies are described below, once we have reviewed all of the options for evaluation design.

Experimental Designs

Key Elements. Experimental designs are considered the "gold standard" in impact evaluation. Experiments require that individuals or groups (e.g., trainees, police precincts, courtrooms, or victims) be assigned at random (by the flip of a coin or equivalent randomizing procedure) to one or more groups prior to the start of project activities. A "treatment" group receives particular services designed to achieve clearly specified outcomes. If several new services are introduced, the experiment can compare multiple treatment groups. A "control" group continues to receive the services in existence prior to the introduction of the new project (either no services or already existing services). The treatment group outcomes are compared to outcomes for alternative treatment groups and/or to a control group to estimate impact. Because chance alone determines who receives the project services, the groups can be assumed to be similar on all characteristics that might affect the outcome measures. Any differences between treatment and control groups, therefore, can be attributed with confidence to the effects of the project.
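
The "equivalent randomizing procedure" mentioned above can be as simple as shuffling a list of case identifiers. The following is a minimal sketch, assuming hypothetical case IDs and an even split into two groups; the function name and the seed value are illustrative. Recording the seed lets the evaluator reproduce and document the assignment later.

```python
import random

def assign(case_ids, seed):
    """Randomly split cases into treatment and control groups.

    A seeded generator makes the assignment reproducible for audit purposes.
    With an odd number of cases, the control group receives the extra case.
    """
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "treatment": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }

case_ids = [f"case-{n:03d}" for n in range(1, 21)]  # hypothetical identifiers
groups = assign(case_ids, seed=1997)

print(len(groups["treatment"]), "cases assigned to treatment")
print(len(groups["control"]), "cases assigned to control")
```

Because chance alone decides which list a case lands on, no one's judgment about who "needs" the project can influence the groups.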

Design Variations. One design variation is based on a random selection of time periods during which services are provided. For example, new services may be offered on randomly chosen weeks or days. A version of this approach is to use "week on/week off" assignment procedures. Although not truly random, this approach closely approximates random assignment if client characteristics do not vary systematically from week to week. It has the major advantage that project staff often find it easier to implement than making decisions on project entry by the flip of a coin on a case-by-case basis. A second design variation is a staggered start approach in which some members of the target group are randomly selected to receive services with the understanding that the remainder will receive services at a later time (in the case of a school or classroom, the next month, semester, or year). One disadvantage of the staggered start design is that the observations of outcomes are limited to the period between the time the first group completes the project and the second group begins. As a result, it is generally restricted to assessing gains made during participation in relatively short-term projects.

Limitations/Considerations. Although experiments are the preferred design for an impact evaluation on scientific grounds, random assignment evaluations are not always the ideal choice in real life settings. Some interventions are inherently impossible to study through randomized experiments for legal, ethical, or practical reasons. Laws cannot be enforced selectively against a randomly selected subset of offenders or areas in a community. Access to legal protections cannot be curtailed. For example, protection orders cannot be issued to victims only during selected weeks. Essential services should not be withheld. However, it may be possible to randomly assign alternative services or responses if the relative merits of the alternatives are unknown.

You need to ask yourself whether the results that are likely to be obtained justify the investment. Experiments typically require high levels of resources—money, time, expertise, and support from project staff, government agencies, funders, and the community. Could the answers to evaluation questions—and subsequent decisions on project continuation, expansion, or modification—be based on less costly, less definitive, but still acceptable evaluation strategies? The answer is often "yes."

Practical Issues/Data Collection. Experimental designs run the most risk of being contaminated because of deliberate or accidental mistakes made in the field. To minimize this danger, there must be close collaboration between the evaluation team and the project staff in identifying objectives, setting schedules, dividing responsibilities for record-keeping and data collection, making decisions regarding client contact, and sharing information on progress and problems. Active support of the key project administrators, ongoing staff training, and communication via meetings, conference calls, or e-mail are essential.

Failure to adhere to the plan for random assignment is a common problem. Staff are often intensely committed to their clients and will want to base project entry decisions on their perceptions of who needs or will benefit most from the project—although these judgments may not be supported by later research. Thus it is important that the evaluator, not project staff, remain in charge of the allocation to treatment or control group.

As in quasi-experimental evaluations, lack of comparable information for treatment and control group members can be a problem. Experiments generally use both agency records and data collected from individuals through questionnaires and surveys. To assure access to these individuals, experimental evaluations need to plan for data collection and tracking of sample members at the start of the project and get agreements with agencies and consent procedures with individuals in place early in the process.

Along with all other types of impact evaluation, experimental designs often combine quantitative data with qualitative information gathered through process evaluation in order to understand more about the program when interpreting impacts on participants. Another issue is documenting what parts of the program each participant received. If the project services and content change over time, it may be difficult to determine what level or type of services produced the outcomes. The best strategy is to identify key changes in the project and the timing of changes as part of a process evaluation and use this information to define "types of project" variations in the project experience of different participants for the impact analysis.


The Impact Evaluation Design "Decision Tree"

Exhibit 6.1 is a "decision tree" taken from Harrell (1996), organized around a set of questions to which the program wanting to conduct an impact evaluation answers "yes" or "no." With each answer the program advances closer to a decision about the type of impact evaluation most appropriate for its circumstances and resources. This decision tree is a quick graphic way of summarizing the foregoing discussion about alternative impact evaluation designs and their requirements. If your program is ready for impact evaluation, the "decision tree" may help you to think about the type of evaluation that would best suit your program.

[Exhibit 6.1. Impact evaluation design "decision tree" (image in two panels: IMTP6_1l.jpg, IMTP6_1r.jpg)]

Process Analysis

Key Elements

Process evaluations rarely vary in basic design. Most involve a thorough documentation and analysis of activities of the program. A good process analysis design is guided by a set of core questions: Is the project model being implemented as specified and, if not, how do operations differ from those initially planned? Does the program have unintended consequences and unanticipated outcomes and, if so, what are they and who is affected? What is the view of the project from the perspectives of staff, participants, and the community? The answers to these questions are useful in providing guidance to policy makers and project planners interested in identifying key project elements and in generating hypotheses about project impact that can be tested in impact analyses.

Design Variations

Process evaluations vary in the number of projects or sites included. Most process evaluations focus on a single project or site. However, some undertake comparative process analysis. Comparative process analysis requires that observations, interviews, and other data collection strategies be structured in advance around a set of questions or hypotheses about elements of implementation believed to be critical to project success. Comparative process analysis allows the evaluation to make assessments about alternative strategies and is useful in generalizing the findings to other settings or jurisdictions. This strategy is used to assess which approach is most successful in attaining goals shared by all when competing models have emerged in different locations. It requires purposely selecting sites to represent variations in elements or types of projects, careful analysis of potential causal models, and the collection of qualitative data to elaborate the causal links at each site.

Most design uncertainties in process evaluation involve deciding what information will be collected, from whom and how. Process evaluation can be based solely on qualitative data. However, qualitative data are usually combined with quantitative data on services produced, resources used, and outcomes achieved. Qualitative data collection strategies used in process evaluation include semi-structured interviews with those involved in project planning and operations; focus groups with project planners, staff, or participants; and researcher observations of project activities. Data collection strategies for use with all types of evaluation are described below, following the presentation of performance monitoring.

Practical Issues

In a process evaluation, it is often difficult to decide on what information is truly key to describing program operations and what information is simply extraneous detail. In selecting relevant data and posing questions about program operations, the evaluator needs to refer carefully to the logic model prepared at the start of the project, although it is permissible and important in process evaluation to revise the original logic model in light of findings during the evaluation.

Analysis of qualitative data requires considerable substantive knowledge on the part of the evaluator. The evaluator needs to be familiar with similar projects, respondents, and responses, and the context in which the project is operating. Your evaluator will need to be able to understand the project's historical and political context as well as the organizational setting and culture in which services are delivered. At the same time, the evaluator needs to maintain some objectivity and separation from project management in order to be able to make an unbiased assessment of whether responses support or refute hypotheses about the way the project works and the effects it has.

Collecting qualitative data also requires skilled researchers who are experienced in interviewing and observing. Data must be carefully recorded or taped. Notes on contextual factors and interim hypotheses need to be recorded as soon as possible after data collection. When using interview guides or semi-structured interview protocols, interviewers must be trained to understand the intent of each question, the possible variety of answers that respondents might give, and ways to probe to ensure that full information about the issues under investigation is obtained.

Performance Monitoring

Key Elements

Performance monitoring is used to provide information on (1) key aspects of how a system or project is operating; (2) whether, and to what extent, pre-specified project objectives are being attained (e.g., numbers of women served by a shelter, increases in cases prosecuted, improved evidence collection); and (3) identification of failures to produce project outputs (this kind of data can be used in managing or redesigning project operations). Performance indicators can also be developed to (4) monitor service quality by collecting data on the satisfaction of those served; and (5) report on project efficiency, effectiveness, and productivity by assessing the relationship between the resources used (project costs and other inputs) and the output and outcome indicators.

If conducted frequently enough and in a timely way, performance monitoring can provide managers with regular feedback that will allow them to identify problems, take timely action, and subsequently assess whether their actions have led to the improvements sought. Performance measures can also stimulate communication about project goals, progress, obstacles, and results among project staff and managers, the public, and other stakeholders. They focus attention on the specific outcomes desired and better ways to achieve them, and can promote credibility by highlighting the accomplishments and value of the project.

Performance monitoring involves identification and collection of specific data on project outputs, outcomes, and accomplishments. Although they may measure subjective factors such as client satisfaction, the data are numeric, consisting of frequency counts, statistical averages, ratios, or percentages. Output measures reflect internal activities: the amount of work done within the project or organization. Outcome measures (immediate and longer term) reflect progress towards project goals. Often the same measurements (e.g., number/percent of women who filed for a protection order) may be used for both performance monitoring and impact evaluation. However, unlike impact evaluation, performance monitoring does not make any rigorous effort to determine whether these outcomes were caused by project efforts or by other external events.
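
As a sketch of what such numeric indicators might look like, the example below computes a frequency count, a percentage, and an average from a handful of invented service records. The field names (`filed_protection_order`, `satisfaction`) and the 1-to-5 satisfaction scale are assumptions made for illustration only.

```python
from statistics import mean

# Invented service records; satisfaction is on an assumed 1-5 scale.
records = [
    {"client": 1, "filed_protection_order": True, "satisfaction": 4},
    {"client": 2, "filed_protection_order": False, "satisfaction": 5},
    {"client": 3, "filed_protection_order": True, "satisfaction": 3},
    {"client": 4, "filed_protection_order": True, "satisfaction": 4},
]

def indicators(rows):
    """Summarize service records as simple performance indicators."""
    served = len(rows)
    filed = sum(1 for r in rows if r["filed_protection_order"])
    return {
        "women_served": served,                    # frequency count (output)
        "pct_filed": 100 * filed / served,         # percentage (outcome)
        "mean_satisfaction": mean(r["satisfaction"] for r in rows),
    }

summary = indicators(records)
for name, value in summary.items():
    print(f"{name}: {value}")
```

These figures describe what happened. As noted above, they say nothing about whether the project caused it.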

Design Variations

When projects operate in a number of communities, the sites are likely to vary in mission, structure, the nature and extent of project implementation, primary clients/targets, and timeliness. They may offer somewhat different sets of services, or have identified somewhat different goals. In such situations, it is advisable to construct a "core" set of performance measures to be used by all, and to supplement these with "local" performance indicators that reflect differences. For example, some victim service projects will collect detailed data on the needs of women or the history of domestic violence, while others will simply have data on the number provided with specific services. Performance indicators need to be constructed so that results can be compared across projects in multi-site projects.

Considerations/Limitations

Indicators of outcomes should be clearly differentiated from elaborate descriptions of the population served. For example, there is a tendency of funders to ask for nitty-gritty details about program clients, when they should be asking what the program has done for these women. Take the case of victim services programs under VAWA. The governing legislation specifies only a few victim characteristics as the information that must be reported. This is quite different from the more important information of what the programs did for victims (services provided), and whether the victims benefited from the services. We probably need only basic information about victims, and might do better to concentrate our evaluation effort on understanding the short- and long-term outcomes that the program has helped them achieve. Chapter 7 lays out what these might be, from a sense of being heard and understood, to living in safety and peace of mind.

In selecting performance indicators, evaluators and service providers need to consider:

Practical Issues

The set of performance indicators should be simple, limited to a few key indicators of priority outcomes. Too many indicators burden the data collection and analysis and make it less likely that managers will understand and use reported information. At the same time, the set of indicators should be constructed to reflect the informational needs of stakeholders at all levels—community members, agency directors, and national funders. Most importantly, the performance indicators should reflect key activities defined as central to the project in the logic model.

Regular measurement, at least quarterly, is important so that the system provides the information in time to make shifts in project operations and to capture changes over time. However, pressures for timely reporting should not be allowed to sacrifice data quality. For performance monitoring to take place in a reliable and timely way, the evaluation should include adequate support and plans for training and technical assistance for data collection. Routine quality control procedures should be established to check on data entry accuracy and missing information. At the point of analysis, procedures for verifying trends should be in place, particularly if the results are unexpected.

The costs of performance monitoring are modest relative to impact evaluations, but still vary widely depending on the data used. Most performance indicator data come from records maintained by service providers. The added expense involves regularly collecting and analyzing these records, as well as preparing and disseminating reports to those concerned. This is typically a part-time work assignment for a supervisor within the agency. The expense will be greater if client satisfaction surveys are used to measure outcomes. An outside survey organization may be required for a large-scale survey of past clients; alternatively, a self-administered exit questionnaire can be given to clients at the end of services. In either case, the assistance of professional researchers is needed in preparing data sets, analyses, and reports.

Data Collection Strategies

Quantitative Strategies

There are many types of quantitative data collection strategies and sources. The interested reader can pursue more details through the references provided at the end of this chapter. Here we present only the briefest descriptions of the most common types of data:

Qualitative Strategies

Qualitative data collection strategies are extremely useful. They can stand by themselves, as they do in certain types of process evaluation or case studies. Or, they can be used in combination with quantitative methods as part of virtually any of the designs described in this chapter. As with the quantitative strategies just described, the interested reader can pursue more details through the references provided at the end of this chapter. Qualitative strategies include:

The Planner's Questions

1. Who is the audience for the evaluation? Who wants to know, what do they want to know, when do they need the information, and what types of data will they believe?

2. What kinds of evaluation should be included? Impact evaluation, process evaluation, performance monitoring, or all three?

3. What does the logic model indicate about the key questions to be asked?

4. What kinds of data can be collected, from whom, by whom, and when?

5. What levels of resources—budget, time, staff expertise—are required? What are available?

Additional Considerations in Planning your Evaluation

What Level of Evaluation to Use

Every project can do performance monitoring, regardless of whether you can find a good comparison group or whether you undertake a full-fledged process or impact evaluation. Collecting data to describe clients served can show you changes over time, and whether you are meeting certain goals, such as increasing the proportion of your clients who come from underserved populations or reaching clients earlier in their process of deciding to leave a batterer. Routinely collecting data on which services you have given people and who gets them allows you to track whether everyone who needs certain services gets them, which are your most and least frequently used services, what types of services are least likely to be available, and so on. For police and prosecution agencies, such tracking can also document where people get "stuck" in the system, and perhaps help you unplug important bottlenecks.

In addition to performance monitoring, most projects can benefit from some level of process evaluation, in which you compare your processes to your logic model and see where the problems lie. A good process evaluation can help you improve your program, and can also get you to the point where conducting an impact evaluation will be worth the investment.

Designs for Projects Already in Progress

Many projects cannot begin evaluating "at the beginning," because they are already operating at full strength when the evaluation begins. You need not let this stop you. You can still construct meaningful comparison groups in a number of ways, and you can certainly begin collecting data on your own clients as soon as you know the evaluation is going to proceed.

For comparison groups, you can use participants in other programs that do not have the type of intervention you are doing (i.e., collect data from participants in a program similar to yours but across town, or in the next county, which does not have the upgraded services your STOP grant provides), or you can use participants in your own program who predate the enhanced services (i.e., collect follow-up data on women who went through your program before your STOP grant started). If necessary when doing this, you can also collect information on participant characteristics that your old intake forms did not include.

Even without a comparison group, you can do performance monitoring for clients beginning as soon as (or even before) you get your STOP money. As described above, you can learn a lot from performance monitoring, and it can be of great help by providing feedback to help you improve your program. In addition, you can institute "exit interviews" or other exit data collection, through which you can get important feedback about client perceptions of and satisfaction with services. Chapter 7 offers some ideas about what to measure at these interviews, and how to do it.

Informed Consent, Follow-Up Arrangements, and Confidentiality/Data Security

Ethical considerations dictate a careful review of the risks and benefits of any evaluation design. The risks to victims, project staff, and offenders need to be enumerated and strategies to minimize them should be developed. Studies of violence against women need to be particularly sensitive to avoiding "secondary victimization" through data collection procedures that could cause psychological or emotional trauma, place the victim (particularly in family violence cases) at risk from the offender, or reveal private information including the woman's status as a victim.

A review of whether the evaluation procedures meet acceptable standards for the protection of the individuals and agencies being studied should be conducted before work begins. Many funders require a formal review of the research design by a panel trained in guidelines developed to protect research participants. Even when such review is not required, explicit consideration of this issue is essential. Two considerations should be part of this review—informed consent, and confidentiality/data security.

Informed consent refers to what you tell people about what you want from them, the risks to them of participating in the research/evaluation, the benefits that might accrue to them from participating, and what you intend to do to protect them from the risks. With respect to women victims of violence from whom you wish to collect data on impacts at some later time, it also involves establishing permission for follow-up and procedures for recontact that will safeguard the woman. You owe it to your evaluation participants to think these matters through and write out a clear and complete statement of risks and protections. Then, before you gather any information from women, share this disclosure with them and get their consent to continue. Some funders will require that you get this consent in writing. Informed consent is relevant not only with evaluation participants who have been victims, but is also necessary with anyone you gather information from, including agency employees, volunteers, and members of the general public.

If you want to collect follow-up information on impacts, you will need permission to recontact, and will also need to set up safe procedures for doing so. Even if you have no immediate plans to conduct follow-up data collection, if you are beginning to think about doing an evaluation and realize that you might need to recontact women in the future, consider setting up a permission procedure now. Set up a form for this that includes the text of the appeal you will make (see below), plus spaces to note agreement or refusal and, if agreement, the woman's name and contact information. With every woman who receives help from your agency, say something like:

"We are very interested in improving our services to help women more. To do this, it would be very helpful if we could contact you at some future time to learn about what has happened to you, whether you think our efforts helped, and what more we could have done to assist you. Would you be willing to have us contact you again, if we can work out a safe way to do so? [if yes...] What would be the safest way for us to contact you in the future?

Work out acceptable arrangements with the woman and write down the particulars on the form. If possible, it would also be best to have her sign the form to indicate affirmative consent to follow-up.

How you will assure the confidentiality and data security of the information they give you is the final thing you need to tell people as part of informed consent. You need to think through risks to victims of telling their stories, and risks to agency employees of answering questions about how things "really" work. Then you need to develop procedures to guard their data so the risks do not materialize (that is, you need to ensure that they do not suffer repercussions should they be identified as a source of information that reflects negatively on an agency). Above all, you should tell research participants up front how you intend to handle the data they give you—will you cite them by name, will you disguise the source of your information, or will you report only grouped data that does not identify individuals. They can then decide for themselves how much they want to reveal. Whatever you tell them, that's what you must do, or you will have violated the understanding under which they were willing to share their perceptions, opinions, and information with you. If you promised not to cite them in any way that would make them identifiable, then don't break your promise. If you want to be able to cite them, then tell them so up front. If you have data with people's names attached to it and you have promised to keep their information confidential, then you will have to develop security procedures to maintain that confidentiality (e.g., keeping it in locked file cabinets, putting only an ID number on the data, and keeping the key that links ID number and name in a separate, locked drawer, limiting access to the data to those people who have committed themselves to respect the conditions of confidentiality you have promised).
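The ID-number procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed procedure: the field names (name, contact, satisfaction), the function name split_identifiers, and the ID format are all hypothetical. The sketch separates each record into a data row that carries only a random ID and a key row that links the ID to the person, so the two can be stored in different, secured places.

```python
import secrets


def split_identifiers(records, id_fields=("name", "contact")):
    """Separate identifying fields from the answers people gave.

    Returns (data_rows, key_rows). data_rows carry only a random ID plus
    the answers; key_rows link that ID to the identifying fields and are
    meant to be stored separately, with restricted access.
    """
    data_rows, key_rows = [], []
    used_ids = set()
    for record in records:
        # Draw a random ID that reveals nothing about the person.
        new_id = secrets.token_hex(4)
        while new_id in used_ids:
            new_id = secrets.token_hex(4)
        used_ids.add(new_id)

        key_rows.append({"id": new_id, **{f: record.get(f) for f in id_fields}})
        answers = {k: v for k, v in record.items() if k not in id_fields}
        data_rows.append({"id": new_id, **answers})
    return data_rows, key_rows


# Hypothetical interview records, for illustration only.
interviews = [
    {"name": "Participant A", "contact": "via advocate", "satisfaction": 3},
    {"name": "Participant B", "contact": "work phone", "satisfaction": 2},
]
data_rows, key_rows = split_identifiers(interviews)
```

Only data_rows would be used for analysis. The key_rows list plays the role of the key kept in a separate, locked drawer.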

 

Addendum: Evaluation and Basic Research Methods

 

General Issues

 

Defining Your Research Issue

A Review of Research Designs

Case Study, Implementation Assessment, and Other Qualitative Methods

 

Surveys and Questionnaires

 

Experimental and Quasi-Experimental Design

 

Causal Modeling—Regression as a General System, for (Almost) Any Type of Data

 

Cost Analyses

Performance Monitoring

 

INTRODUCTION TO THE RESOURCE CHAPTERS

The remaining chapters in this Guidebook provide resources to help you measure and evaluate your program(s). The first six chapters (Chapters 7 through 12) focus on the types of outcomes you may need to measure. As explained below, any given program may need to draw on resources from several chapters to get a complete picture of what the program has accomplished. The next two chapters (Chapter 13 on training and Chapter 14 on data system development) describe evaluation issues and measurement approaches to two fairly complex activities that can be funded with STOP grants. The last chapter offers some background and critical contextual information about conducting evaluations of programs on Indian tribal lands, as these pose some unique challenges of both program development and evaluation.

Most STOP-funded projects will need to draw on at least one of these resource chapters; many projects will need to incorporate the suggestions of several chapters into their evaluation design. The following brief chapter descriptions repeat some information from the Preface and Chapter 1, but augment it by indicating the types of projects that would benefit from reading each chapter:

The remainder of this introduction presents several logic models, starting with simpler ones and progressing to the more complex. For each element in these logic models, we refer to one or more of the chapters in this resource section, to give you an idea how you might use the material in these chapters to construct a full evaluation design. For more examples of complex program models, you might also want to look at the logic models for training and for data system development included in Chapters 13 and 14, respectively.

Example 1: Counseling Services

Exhibit LM.1 shows the logic underlying an evaluation of a relatively simple counseling program. The basic service, counseling, is shown in Column B, which also indicates that client case records are expected to be the source of data to document the types and amounts of counseling provided to clients. A variety of possible immediate and longer-term outcomes for clients (effects of counseling) appear in Column D; the primary data source would be client interviews, constructed to include some of the measures found in Chapter 7. The simplest evaluation of this program would involve collecting data relevant only to Columns B and D, on services received and victim outcomes. Client interviews would be required to obtain these outcome data.

You can make this simple evaluation considerably more complex, which will mean more work but probably also more knowledge about what really makes a difference. You can measure background factors (in this case, pertinent characteristics of the women coming for counseling), which appear in Column A. You would use information about client characteristics to help you understand what types of women are helped most by which types of counseling. Another complication is external factors that might increase or decrease the likelihood that counseling will produce the desired outcomes. These are shown in Column C, and include the availability of other supportive services from the same agency or from other agencies in the community (see Chapters 8, 9, and 10); other stressors in each woman's life (see Chapter 7); and justice system actions in each woman's case. The expected sources for both background and external factors in this model include intake forms, assessment forms, and/or research interviews with clients.

Remember that whether or not you measure them, background and external factors are always present. You may leave them out of your data collection plans, but you can't leave them out of your thinking. They should be represented in your logic model so you will be sure to consider (1) what you will miss if you leave them out, and (2) the limits on your ability to interpret findings because you have limited the scope of the variables available for analysis.
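One way to keep unmeasured factors visible is to write the logic model down as a small data structure and ask it which columns have no data source. The sketch below does this for the simplest version of Example 1, where only Columns B and D are measured. The class name Column, the function unmeasured, and the short element labels are assumptions made for illustration; they are not part of Exhibit LM.1.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """One column of a logic model (hypothetical structure for illustration)."""
    label: str
    elements: list
    data_sources: list = field(default_factory=list)

    @property
    def measured(self):
        # A column counts as measured once at least one data source is planned.
        return bool(self.data_sources)


# Simplest evaluation of the counseling program: only B and D have data sources.
logic_model = {
    "A": Column("Background factors", ["client characteristics"]),
    "B": Column("Program activities", ["counseling"], ["client case records"]),
    "C": Column(
        "External factors",
        ["other supportive services", "other stressors", "justice system actions"],
    ),
    "D": Column("Outcomes", ["effects of counseling"], ["client interviews"]),
}


def unmeasured(model):
    """Return the keys of columns that are in the model but have no data source."""
    return [key for key, column in model.items() if not column.measured]


gaps = unmeasured(logic_model)  # columns to discuss as limits on interpretation
```

Here gaps lists Columns A and C, which is a reminder to state what the evaluation cannot say because those factors were left out of data collection.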

Example 2: Special Prosecution Unit

 

Exhibit LM.2 shows the logic underlying an evaluation of a special prosecution unit. The activities of the unit itself are shown in Column B, while the immediate and longer-term outcomes are shown in Column D. Outcomes include some that apply to women victims of violence (see Chapter 7) and some that apply to criminal justice system changes (see Chapter 9). As with Example 1, the simplest evaluation one could do on a special prosecution unit would examine outcomes (Column D) in relation to inputs (Column B). Greater complexity could (should) be introduced by including background factors (Column A, in this case characteristics of the sexual assault and/or domestic violence cases handled by the unit, and their similarity to or difference from the totality of cases handled before the unit began operations, such as whether the unit gets only the tough cases). One might also include external factors in the analysis; Column C suggests a number of external factors pertinent to a woman's decision to pursue a case (e.g., her level of danger, the quality of her support system, or other stresses in her life). Measures for these factors can be found in Chapter 7. Other external factors may also be relevant to particular evaluation situations. Exhibit LM.2 indicates that quite a variety of data sources may be necessary for a full treatment of this logic model, and that one might want to look at Chapters 7, 9, 13, and 14 in the process of planning the data collection.

Example 3: Court Advocacy Program

 

Exhibit LM.3 describes a logic model for a court advocacy program to help women coming to civil court for a protection/restraining/stay away order. Column B shows the direct activities of the program, which one would document through case records and process analysis, including observations. Column D shows a variety of outcomes, including some that are personal to the woman and some that would also be considered system outcomes (following through to a completed permanent order). Column A suggests some background factors of the women seeking orders that might affect both their use of the advocacy services and their own ultimate outcomes. Column C indicates some external realities of court accommodation to the program (or failure to do so) that might make a difference for service efficacy. Data sources would be case records, possibly data system data, process analysis, and client interviews.

These brief examples, which serve as guides to the resource chapters and their connection to logic models, should give you a practical basis for jumping into the resource chapters themselves. They do not answer all questions; for example, they do not give you variables to use in describing clients (although you could use the SSS Part IX data fields as a start for this, and add much more that your own program wants to know). Nor do the chapters give you a system for describing and counting services, as this is much too variable across programs. However, the resource chapters do offer a rich array of ideas and measures covering topics and issues that most programs will need to include in an evaluation. Their specifics should be helpful to you. In addition, just reading the chapters may give you some practice in the way that an evaluator might think about measuring things, and this practice may help you develop the details of your own evaluation.

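[Exhibit LM.1, Counseling Services: image IMTPexLM_1.jpg not reproduced]

[Exhibit LM.2, Special Prosecution Unit: image IMTPexLM_2.jpg not reproduced]

[Exhibit LM.3, Court Advocacy Program: image IMTPexLM_3.jpg not reproduced]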

 

CHAPTER 7
VICTIM SAFETY AND WELL-BEING:
MEASURES OF SHORT-TERM AND LONG-TERM CHANGE

By Cris M. Sullivan 1

Many STOP projects have goals that involve making changes in victims' lives. This chapter offers suggestions about ways to document such changes. It discusses the need to identify and measure changes that occur in the short run, and also changes that are more likely to take a significant period of time to develop. Once this distinction between short- and long-term impacts has been described, the chapter offers specific instruments (scales, questions, formats) that measure both short- and long-term changes.

What Is Short-Term and What Is Long-Term Change?

Short-term changes are those more immediate and/or incremental outcomes one would expect to see quickly, and that will eventually lead to desired long-term changes. Optimal long-term changes might include (1) freedom from violence, (2) decreased trauma symptoms, and/or (3) increased physical, psychological, economic, and/or spiritual well-being. However, we would not expect these outcomes to occur quickly as the direct or sole result of a new or improved community-based program. Rather, effective programs would ideally result in some degree of measurable, immediate, positive change in women's lives, with this change ultimately contributing to long-term safety and well-being. For example, a hospital-based medical advocacy project for battered women might be expected to result in more women being correctly identified by the hospital, more women receiving support and information about their options, and increased sensitivity being displayed by hospital personnel in contact with abused women. Or, a SANE (sexual assault nurse examiner) program for treating sexual assault victims might be expected to produce many of these same outcomes, as well as better evidence collection. These short-term changes might then be expected to result in more women accessing whatever community resources they might need to maximize their safety (e.g., shelter, personal protection order) and/or help them cope with emotional issues (e.g., counseling, hotlines), which ultimately would be expected to lead to reduced violence and/or increased well-being (long-term outcomes). However, it would be unrealistic to expect to see a change in the level of violence in women's lives or their full psychological healing immediately or even shortly after receipt of medical advocacy offered to battered women or women who have been sexually assaulted. Rather, programs should measure the short-term changes they expect to produce. In these examples, that might include (1) the number of women correctly identified in the hospital as survivors of domestic abuse or sexual assault; (2) the number of women with full, complete, and secure evidence collected; (3) women's satisfaction with information and support received from the program; (4) women's increased knowledge of available resources post-intervention; (5) victims' perceptions of the effectiveness of the intervention in meeting their needs; and (6) hospital personnel's attitudes toward victims of domestic violence and sexual assault.

There are two critical points to make here:

Once you have decided which program outcomes you want to measure, you will need to choose or develop measuring instruments that are sensitive enough to detect whether desired changes have occurred. It is preferable to use a well-established instrument whenever possible to maximize the odds of detecting change and to increase confidence in your findings. We include in this chapter a number of instruments measuring victim safety and well-being over time, all of which are currently being used in research related to violence against women. You should not try to develop a new instrument specifically for your project unless you cannot find any existing instruments in the literature.

Measures of Short-Term Change

In order to measure short-term change, answers must be provided to questions such as:

Unlike constructs such as depression or social support, which can be measured by standardized instruments, short-term changes are generally assessed using questions you create yourself, such as:

While it is often important to ask open-ended questions, such as "What did you like?" and "What can we improve?" these types of questions should be asked in addition to, not instead of, more quantitative questions (closed-ended questions with forced options, such as those just presented). It is much easier to describe effects and to detect change with closed-ended questions, which assign a number to each answer and force the respondent to select one and only one answer. Thus, if you want to know what clients liked, ask specific questions and use specific answer categories. For example, you could ask:

For these questions, you could use answer categories such as: 1=did not like at all, 2=liked somewhat, 3=liked a lot. Or the categories could be 1=not at all satisfied/helpful, 2=somewhat satisfied/helpful, 3=very satisfied/helpful.
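Because each closed-ended answer is a number, the responses can be counted and turned into percentages directly. The following is a minimal sketch of that tabulation, using the three "helpful" answer categories suggested above. The function name tabulate and the ten sample answers are hypothetical, invented for illustration.

```python
from collections import Counter

# Answer categories for one closed-ended question (codes taken from the text).
LABELS = {1: "not at all helpful", 2: "somewhat helpful", 3: "very helpful"}


def tabulate(responses, labels=LABELS):
    """Return {label: (count, percent)} for one question.

    Codes that are not in `labels` (for example, skipped questions recorded
    with some other number) are left out of the counts and the percentages.
    """
    valid = [r for r in responses if r in labels]
    counts = Counter(valid)
    total = len(valid)
    table = {}
    for code, label in labels.items():
        n = counts.get(code, 0)
        percent = round(100 * n / total, 1) if total else 0.0
        table[label] = (n, percent)
    return table


# Hypothetical answers from ten clients to a single question.
answers = [3, 2, 3, 1, 3, 2, 3, 3, 2, 3]
summary = tabulate(answers)
```

With these made-up answers, six of the ten clients (60 percent) chose "very helpful," three chose "somewhat helpful," and one chose "not at all helpful." The same table computed at a later point gives a simple way to see whether the distribution has shifted.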

It is important to remember that the wording of questions influences the responses received. If, for example, a legal advocacy project is designed to help survivors make the best informed legal decisions they can for themselves, based on complete and accurate information, it would be inappropriate to ask women whether they did or did not participate in pressing charges against their assailants to determine program outcome, as pressing charges may not be the best option for all women. Rather, questions might be used such as the following:

Again, short-term change is generally measured by examining what the consumer received, how much she received, how effective she found the service, how satisfied she was with the service, and whether short-term, incremental change occurred.

Measures of Long-Term Change: Victim Safety and Well-Being

Although the majority of programs receiving STOP grant funding will not be in a position to evaluate longer-term outcomes, it is sometimes feasible and appropriate to measure whether victims' level of safety and/or quality of life improves over time as a result of an intervention. This level of evaluation generally requires additional time and financial resources, but when such an effort is warranted, there are a number of standardized instruments available from which to choose. The remainder of this chapter presents brief critiques of various measures previously used in research pertaining to violence against women. The first section describes eight instruments developed to measure physical, psychological, and/or sexual abuse. These measures can be used to examine whether such violence increases, decreases, or remains the same over time. The second section pertains to measures of well-being, including depression, post-traumatic stress, overall quality of life, and self-esteem. The last section describes instruments that measure correlates of well-being—access to community resources, stressful life events, and level of social support.
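To examine whether violence increases, decreases, or remains the same over time, the same women must be scored on the same instrument at two points, and the results summarized for the group. The sketch below assumes paired total scores from a first and a follow-up interview. The function name summarize_change and all of the scores are hypothetical. It describes change only and says nothing about whether the program caused it.

```python
from statistics import mean


def summarize_change(before, after):
    """Summarize group-level change between two paired sets of scale scores.

    `before` and `after` hold one score per woman, in the same order.
    """
    if len(before) != len(after):
        raise ValueError("before and after must contain paired scores")
    diffs = [a - b for b, a in zip(before, after)]
    return {
        "mean_before": mean(before),
        "mean_after": mean(after),
        # Number of women whose score went down, stayed level, or went up.
        "decreased": sum(d < 0 for d in diffs),
        "same": sum(d == 0 for d in diffs),
        "increased": sum(d > 0 for d in diffs),
    }


# Hypothetical abuse-scale totals for five women at intake and at follow-up.
before = [12, 8, 15, 6, 10]
after = [7, 8, 9, 8, 4]
change = summarize_change(before, after)
```

In this invented example the group mean falls from 10.2 to 7.2, with three scores lower at follow-up, one unchanged, and one higher. Reporting the counts alongside the means keeps the women whose situation got worse from disappearing into an improved average.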

Criteria Used to Select Instruments

Numerous instruments have been developed that measure some aspect of safety and/or well-being, and not all could be included in this Guidebook. The instruments described in this chapter were chosen because they met the following criteria:

Each scale presented has demonstrated at least some degree of adequate reliability and validity as a field instrument; therefore we do not detail the properties of each instrument unless they are noted as a concern. 2 Sample items follow each scale description, as do references to articles where you can find detailed information about how each instrument behaves. Complete instruments can be ordered from the STOP TA Project (800 256-6883 or 202 265-0967 in the Washington, D.C. area) unless they are under copyright. Copyrighted instruments that must be obtained from the publisher or author are noted. 3

A Note of Caution

The majority of the instruments that follow were developed for research purposes. Their strength lies in their ability to characterize groups of people, and they should not be used by laypersons as individualized assessment instruments or as diagnostic tools for individual clients. For example, although the Sexual Experiences Survey classifies respondents into one of four categories (nonvictimized, sexually coerced, sexually abused, or sexually assaulted), such classifications are made based on large aggregated data sets. It would be problematic and unethical to inform a particular woman of her "classification," based on her responses to this survey. A woman who did not self-identify in the same manner as her classification on this measure could lose faith in the intervention program designed to assist her and/or could feel misunderstood or even revictimized. The following instruments should be used for descriptive purposes of the sample as a whole and to examine group, not individual, differences.

Measures of Victim Safety

The following measures of physical, psychological, and/or sexual violence are presented alphabetically in this section.

Measures of Physical Abuse by Intimates

Abusive Behavior Inventory [also measures psychological abuse] (Shepard & Campbell, 1992)
Conflict Tactics Scales (Revised) (Straus et al., 1996)
Danger Assessment (Campbell, 1986)
Index of Spouse Abuse [also measures psychological abuse] (Hudson & McIntosh, 1981)
Severity of Violence Against Women Scales (Marshall, 1992)

Measures of Psychological Abuse

Index of Psychological Abuse (Sullivan, Parisian, & Davidson, 1991)
Psychological Maltreatment of Women Inventory (Tolman, 1989)

Measure of Sexual Violence

Sexual Experiences Survey (Koss & Oros, 1982)


Intimate Physical Abuse

ABUSIVE BEHAVIOR INVENTORY - PARTNER FORM

Citation Shepard, M.F., & Campbell, J.A. (1992). The Abusive Behavior Inventory: A measure of psychological and physical abuse. Journal of Interpersonal Violence, 7(3), 291-305. Copyright © 1992 by Sage Publications, Inc. Reprinted by Permission of Sage Publications, Inc.
Description Drawing from both feminist theory and educational curriculum with batterers, the authors designed this 29-item measure of psychological (18 items) and physical (11 items) types of abuse. A self-report measure, it is simple to administer and would generally take no more than 5 minutes to complete. The scale's strengths are that it was found to successfully differentiate between abusers and non-abusers, and it was designed to tap power and control issues within the relationship (for example: "checked up on you," and "stopped you or tried to stop you from going to work or school"). Its weaknesses, conceded by the authors, are that (1) there is no attention to injuries or to medical attention needed, which could approximate severity of the violence; and (2) the reliability and validity of the measure were based on a sample of inpatient, chemically-dependent men and women.
Sample items: Copyright restrictions prohibit electronic distribution of scale contents.
     
Reference Petrik, N.D. (1994). The reduction of male abusiveness as a result of treatment: Reality or myth? Journal of Family Violence, 9(4), 307-316.

Intimate Physical Abuse

THE REVISED CONFLICT TACTICS SCALES (CTS2)

Citation Straus, M.A., Hamby, S.L., Boney-McCoy, S., & Sugarman, D.B. (1996). The Revised Conflict Tactics Scales (CTS2): Development and preliminary psychometric data. Journal of Family Issues, 17, 283-316. Copyright © 1996 by Sage Publications, Inc. Reprinted by Permission of Sage Publications, Inc.
Description The first instrument designed to measure conflict and violence between intimate partners was the original version of this scale (CTS1). The CTS1 has been both widely used and widely criticized. It was an 18-item scale of relationship conflict tactics, with the latter 10 items measuring violent strategies. Its strengths were that it was used successfully in many settings and with many populations, and that it was short and easy to administer. Its weaknesses included (1) measuring partner violence only within the context of conflict, while domestic violence is about power and control, (2) ignoring many common types of woman-battering, including symbolic gestures of violence as well as tactics of power, control, and intimidation, (3) rating some acts of violence as more severe than others outside of the context of the event (i.e., slapping is rated as "mild," although a hard slap can cause severe injury), (4) ignoring whether an act was committed in self-defense, and (5) excluding injuries sustained or medical attention needed to approximate severity of the violence.

The revised CTS2 has added more items to include some additional types of conflict tactics, has added a section on injuries, and now has 39 items. The other weaknesses remain even with the CTS2. Both the CTS and CTS2 are still very good instruments to use because a great deal is known about their strengths and weaknesses. However, you may want to compensate for remaining weaknesses with additional questions. There is no standardized instrument in general use to help you in making these compensations, so this is one place where you may have to make up some questions of your own to cover the issues that the CTS2 omits.

Sample items: Copyright restrictions prohibit electronic distribution of scale contents.
     
Reference Straus, M.A. (1979). Measuring intrafamily conflict and violence: The Conflict Tactics (CT) Scales. Journal of Marriage and the Family, 75-88.

Straus, M.A., & Gelles, R.J. (1986). Societal change and change in family violence: Violence from 1975 to 1985 as revealed by two national surveys. Journal of Marriage and the Family, 48, 465-479.


Intimate Physical Abuse

THE DANGER ASSESSMENT

Citation Campbell, J.C. (1986). Nursing assessment for risk of homicide with battered women. Advances in Nursing Science, 8, 36-51. Copyright © 1981 by the National Council on Family Relations, 3989 Central Ave., NE, Suite 550, Minneapolis, MN 55421. Used by permission. The instrument is available from Jacquelyn Campbell on request.
Description This 11-item assessment tool was created to assist women with abusive partners in assessing their danger of homicide. The author recommends this tool be used as part of a nursing assessment of domestic violence, and that nurses and patients complete the tool together. This instrument was created with the input of battered women, shelter workers, law enforcement officials, and other experts on battering. Due to the singularities of each woman's situation, however, the author stresses that no actual prediction of lethality be made based upon a woman's score. The score (summed affirmative responses) should be shared with the woman who has completed the Danger Assessment so she can determine her own risk.
Sample items: Response categories:

0=no
1=yes

1. Has the physical violence increased in frequency over the past year?
4. Is there a gun in the house?
7. Does he threaten to kill you and/or do you believe he is capable of killing you?

Reference McFarlane, J., & Parker, B. (1994). Preventing abuse during pregnancy: An assessment and intervention protocol. American Journal of Maternal/Child Nursing, 19, 321-324.

McFarlane, J., Parker, B., & Soeken, J. (1995). Abuse during pregnancy: Frequency, severity, perpetrator and risk factors of homicide. Public Health Nursing, 12(5), 284-289.


Intimate Physical Abuse

INDEX OF SPOUSE ABUSE (ISA)

Citation Hudson, W.W., & McIntosh, S.R. (1981). The assessment of spouse abuse: Two quantifiable dimensions. Journal of Marriage and the Family, 43, 873-888.
Description This 30-item self-report instrument takes about 5 minutes to administer, and measures both physical (15 items) and non-physical (15 items) types of intimate abuse. The primary drawback of this scale is related to one of its strengths. Because the authors note that some types of violence are more severe than other types, they have assigned weights to the items, which results in slightly more complexity in computing scale scores for respondents. However, Hudson & McIntosh (1981) describe, in simple terms, the computation required to obtain the two scores.

The 15 items measuring non-physical abuse are quite inclusive of numerous psychologically and emotionally abusive behaviors. The "physical abuse" items, however, do not include such behaviors as kicking, restraining, burning, choking, pushing, or shoving. Women who have experienced these types of abuse without experiencing punches would receive artificially minimized scores on this measure. Further, 7 of the 15 "physical abuse" items only imply physical abuse. An example of such an item is: 'My partner becomes abusive when he drinks.' An additional drawback is that two items refer to abuse occurring when the perpetrator has been drinking, which could also artificially minimize the scores of women whose abusers do not drink. To compensate for aspects of domestic violence that this scale omits, you might want to make up some questions of your own.

Sample items:
Response categories:

1=never
2=rarely
3=occasionally
4=frequently
5=very frequently

[physical abuse]
7. My partner punches me with his fists.

[non-physical abuse]
1. My partner belittles me.

Reference Campbell, D.W., Campbell, J., King, C., Parker, B., & Ryan, J. (1994). The reliability and factor structure of the Index of Spouse Abuse with African-American women. Violence and Victims, 9(3), 259-274.

Intimate Physical Abuse

SEVERITY OF VIOLENCE AGAINST WOMEN SCALES

Citation Marshall, L.L. (1992). Development of the Severity of Violence Against Women Scales. Journal of Family Violence, 7(2), 103-121. Copyright © 1992 by Plenum Publishing Corp. Used with permission.
Description This 46-item instrument was specifically created to measure "threatened, attempted, and completed behaviors likely to cause injury or pain" (Marshall, 1992: 105). The nine dimensions of violence measured by this scale are: symbolic violence; threats of mild, moderate, and serious violence; acts of mild, minor, moderate, and severe violence; and sexual violence. A strength of this scale is that it captures symbolic violence—behaviors often used by perpetrators to frighten and intimidate women. Weaknesses of this instrument are its weighting system, and therefore the categories of threats and violence that it produces. Items were weighted for severity based on ratings provided by samples of women who were not necessarily abused themselves. For example, the item "held her down, pinning her in place" was rated as "mild," and the item "bit her" was rated as "minor." This illustrates the difficulty of rating behaviors out of context, as these acts can of course also be very serious. However, the 46 empirically derived items are excellent examples of intimate male violence against women and can be used without the weighting system discussed in Marshall (1992).
Sample items:
Response categories (referring to the prior 12 months):

1=never
2=once
3=a few times
4=many times

[symbolic violence]
1. Hit or kicked a wall, door or furniture.

[threats]
7. Shook a fist at you.

[physical violence]
35. Choked you.

[sexual violence]
41. Demanded sex whether you wanted to or not.

Reference Vitanza, S., Vogel, L.C.M., & Marshall, L.L. (1995). Distress and symptoms of posttraumatic stress disorder in abused women. Violence and Victims, 10(1), 23-34.

Psychological Abuse

INDEX OF PSYCHOLOGICAL ABUSE

Citation Sullivan, C.M., Parisian, J.A., & Davidson, W.S. (1991). Index of Psychological Abuse: Development of a measure. Presented at the 99th annual convention of the American Psychological Association, San Francisco, CA.
Description This 33-item instrument was designed to measure the common types of psychological abuse reported by battered women: criticism, ridicule, isolation, withdrawal, and control. It is easy to administer and generally takes less than 5 minutes to complete. This measure was originally developed on two samples: women who were exiting a domestic violence shelter, and dating college students. It has since been validated with a Korean sample of abused and non-abused women (Morash et al., in preparation).
Sample items:
Response categories:

1=never
2=rarely
3=sometimes
4=often
8=not applicable (i.e., no children, no pets)

2. Accused you of having or wanting other sexual relationship(s).
7. Tried to control your activities.
20. Criticized your intelligence.
Reference Morash, M., Hoffman, V., Lee, Y.H., & Shim, Y.H. (in preparation). Wife abuse in South Korea.

Sullivan, C.M., Tan, C., Basta, J., Rumptz, M., & Davidson W.S. (1992). An advocacy intervention program for women with abusive partners: Initial evaluation. American Journal of Community Psychology, 20(3), 309-332.
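When a scale's response categories include a code such as 8=not applicable, that code has to be set aside before any score is computed, or it will be treated as a very high frequency of abuse. The sketch below shows one way to do this. The scoring rule used here (the mean of the items a woman actually answered) is an assumption for illustration and is not taken from the published scoring instructions for this or any other instrument; the function name and the sample responses are also hypothetical.

```python
NOT_APPLICABLE = 8  # code used for items that do not apply (e.g., no children)


def mean_item_score(item_responses, valid_range=(1, 4)):
    """Average the answered items, ignoring 'not applicable' codes.

    Returns None when no item applies, so that an empty record is not
    mistaken for a score of zero.
    """
    low, high = valid_range
    answered = [r for r in item_responses if r != NOT_APPLICABLE]
    for r in answered:
        if not low <= r <= high:
            raise ValueError(f"unexpected response code: {r}")
    if not answered:
        return None
    return sum(answered) / len(answered)


# Hypothetical responses to six items; two items did not apply to this woman.
score = mean_item_score([1, 3, 8, 4, 2, 8])
```

For these invented responses the four answered items average 2.5. Had the two 8s been included, the average would have been about 4.3, which is above the highest real response category.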