| United States General Accounting Office |
| GAO | Program Evaluation and Methodology Division |
| November 1990 | Case Study Evaluations |
Preface
GAO assists congressional decisionmakers in their deliberative process by furnishing
analytical information on issues and options under consideration. Many diverse
methodologies are needed to develop sound and timely answers to the questions that are
posed by the Congress. To provide GAO evaluators with basic information about the more
commonly used methodologies, GAO's policy guidance includes documents such as methodology
transfer papers and technical guidelines.
This methodology transfer paper on case study evaluations describes how GAO evaluators could use case study methods in performing our work. It describes six applications of case study methods, including the purposes and pitfalls of each, and explains similarities and differences among the six. This paper presents an evaluation perspective on case studies, defines them, and determines their appropriateness in terms of the type of evaluation question posed. The original report was authored by Lois-ellin Datta in April 1987. This reissued (1990) version supersedes the earlier edition.
Case Study Evaluations is one of a series of papers issued by the Program
Evaluation and Methodology Division (PEMD). The purpose of the series is to provide GAO
evaluators with guides to various aspects of audit and evaluation methodology, to
illustrate applications, and to indicate where more detailed information is available.
We look forward to receiving comments from the readers of this paper. They should be
addressed to Eleanor Chelimsky at 202-276-1854.
Werner Grosshans
Assistant Comptroller General
Office of Policy
Eleanor Chelimsky Assistant Comptroller General for Program Evaluation and Methodology
Contents
Preface
Chapter 1
Introduction
Chapter 2
What Are Case Studies?
What Is Meant by "A Case Study"?
Some Common Benefits Expected From Case Study Evaluations
Instance Selection in Case Studies
Chapter 3
Case Study Applications
Illustrative
Exploratory
Critical Instance
Program Implementation
Program Effects
Cumulative
Design Decisions and Case Study Applications
Chapter 4
Data Collection and Analysis
Data Collection
Data Analysis
Handling Multisite Data Sets
Basic Models for Data Analysis
Pitfalls and Booby Traps
Where to Go for More Information
Chapter 5
Summary
What Are Case Studies?
When Are Case Studies Appropriately Used in Evaluation?
What Distinguishes a Good From a Not-Good Case Study?
Impartiality and Generalizability
Appendixes
Appendix I: Theory and History
Appendix II: Site Selection Example
Appendix III: Guidelines for Reviewing Case Study Reports
Bibliography
Glossary
Papers in This Series
Tables
Table 2.1: What Is a Case Study? Exercise
Table 2.2: Complexity of Questions
Table 2.3: Methods of Obtaining Description and Analysis in Case Studies
Table 2.4: Some Common Benefits Expected From Case Study Evaluations
Table 2.5: Instance Selection in Case Studies
Table 2.6: Hypothetical Data on Instance Selection
Table 3.1: Illustrative Case Studies
Table 3.2: Exploratory Case Studies
Table 3.3: Critical Instance Case Studies
Table 3.4: Program Implementation Case Studies
Table 3.5: Illustration of Differences in Note-Taking
Table 3.6: Program Effects Case Studies
Table 3.7: Cumulative Case Studies
Table 3.8: Some Design Decisions in Case Study Methods
Table 4.1: Ways of Analyzing Case Study Data
Table 5.1: Some Common Pitfalls in Case Study Evaluation
Table I.1: Criteria of Good Research
Table I.2: Evaluation Adaptations of the Research Case Study
Table II.1: Hypothetical Data on Unfiled Corporate Income Tax Returns for 1986
State Income Tax Returns
Table III.1: Checklist for Reviewing Case Study Reports
Abbreviations
GAO General Accounting Office
OTTR Observe, think, test, and revise
PEMD Program Evaluation and Methodology Division
SSA Social Security Administration
Chapter 1
Introduction
At his government-required anti-terrorist training session recently, a captain for a major
airline said,
"The bits of information were so few and far between that people weren't even paying attention. My instructor for the eight-hour course entered the room only to change videotapes. People were talking; they were doing other things, including reading the paper." (Philadelphia Inquirer, 1986)
This is a case instance. It is an effective way of drawing attention to a problem such as training quality. Such anecdotes are remembered and they are convincing. What they are not, however, is generalizable: that is, an anecdote doesn't tell whether it is the only such instance or whether the problem is widespread. And anecdotes usually don't show the reasons for a situation, and thus are of limited value in suggesting solutions.
The challenge for evaluators is how to use those aspects of an anecdote that are effective for our work (the immediacy, the convincingness, the attention-getting quality) and, at the same time, fulfill other informational requirements for our jobs, such as generalizability and reliability. Case study methods, while not without their limitations in this regard, can help us answer this challenge.
GAO already does a lot of case studies, or at least what we ourselves call case studies in describing our methods. There are GAO case studies in many areas (urban housing, weapon systems testing, community development, military procurement contracts, influences on the Brazilian export-import balances, how programs aimed at improving water quality are working, and the implementation of block grants), to name only a few.
Most of these case studies are either "illustrative" or "critical instance"
applications. The first type of application illustrates findings established by
other techniques, supplementing, for example, national findings on clean air from
administrative records and other sources, with in-depth description of how funds have
been used and with what results in selected cities. The second type of application is
in-depth analysis of a case of unique interest, such as whether funds have been awarded
and managed properly in a specific community health center or whether a certain former
government official had done anything improper before or after leaving the government.
There are, however, four other applications of case studies that are less often used at
present but that could be appropriate for our jobs. In brief, the six types of case
study, which we examine in chapter 3, are illustrative, exploratory, critical instance,
program implementation, program effects, and cumulative.
Case Study Evaluations is a review of methodological issues involved in using case study evaluations. It is not a detailed guide to case study design. It does, however, explain the similarities and differences among the six kinds of case study and discusses ideas for successfully designing them. It also gives guidance to the manager who, in reviewing completed case studies, wants to assess their strengths. Finally, it presents an evaluation perspective on case studies, defining them and determining their appropriateness in terms of the type of evaluation question posed.
The methods and types of case studies outlined here are not definitive. The case study as a research method has evolved over many years of experience but evaluative use of the method has been more limited. Indeed, the history of the case study as an evaluation method is little older than a decade. Therefore, discussion of some of the applications described here is based on relatively extensive field experience (with questions in such domains as justice, education, welfare, environment, housing, and foreign aid), while the discussion of some of the other applications is based on more constrained experience.
We have paid particular attention to the conventional wisdom that case studies are
always subjective and nongeneralizable. In many uses of case studies, there is no need to
generalize. Nonetheless, we find that there are steps that can be taken to generalize from
case studies when this is desired. We did not, however, devote any particular emphasis to
the popular idea that case studies are inexpensive to conduct (issues of research
management common to all designs were outside the scope of our work). Even so, one thing
that should emerge quite clearly from the discussion of design features intrinsic to the
case study is that it can be a rather costly endeavor, given the time required, the rich
in-depth nature of the information sought, and the need to achieve credibility. This
reinforces the importance of weighing carefully the decision to employ the case study
method in program evaluation.
In this paper, we have taken positions on many issues, expecting to revise these as
experience accumulates and as we receive reactions from evaluators and researchers. This
paper is intended to transfer what we believe to be good practice in case studies and to
help establish the principles of applying case studies to evaluation. Thus, while the
document offers preliminary guidance, it is also a point of departure. For example, we are
developing the variation that we call the "cumulative" case study. It can entail
prospective and retrospective designs and it permits synthesis of many individual case
studies undertaken at different times and in different sites.
The quality of case studies can be variable. Some score high on reasonable tests of quality; others have lower scores. Three problems often encountered have to do with matching the question the evaluator set out to answer and the method for selecting the instances examined, reporting the basis for selecting the instances, and integrating findings across several instances when the findings in one were inconsistent with those in another.
The next sections of this paper will first present some new ways of thinking about a
familiar method, the case study, and then introduce the six applications, describing what
is required, in terms of methodology, to get the benefits case studies can offer. In the
last chapter, we turn to two basic questions: What do we need to take into account with
regard to the objectivity of case studies and their generalizability?
Chapter 2
What Are Case Studies?
Almost everyone in GAO probably has worked on a case study at one time or another yet may
be unfamiliar with what is meant, methodologically, by a case study. The methodological
meaning is important in understanding what differentiates a case study from a noncase
study and a good case study from a not-so-good case study.
What is a case study? The exercise in table 2.1 describes a job we might be asked to do and a design for it and asks you to decide whether or not this is a case study. Take about 10 minutes to think through this example and write out your answer. It is important that you try this out yourself, so please do it before continuing.
Table 2.1: What Is a Case Study? Exercise
| Item | Writing assignment |
| Exercise | Suppose GAO has been asked whether the informed consent requirements for experimentation with human subjects are being properly implemented. Suppose further that we visit three sites where humans are used as subjects for research-a hospital, a university, and a clinic-and that we review the informed consent procedures at each site. |
| Question 1 | Is this an application of the case study method? Why? |
| Question 2 | If not, would case studies be appropriate for answering the question we were asked? Why? |
| Question 3 | What is your definition of "case study"? |
The answers some GAO evaluators gave may illustrate the range of definitions surrounding
case study methods.
To some GAO evaluators, the instance was an application of the case study
method, because we were looking at only a few sites or because we could not
generalize or because "actual subjects are being used for analysis of a specific
question." To some, the instance was clearly not an application of the case study
method, because "we do not know if the instances are representative of the
universe," and "there doesn't appear to be enough done at each site." To
still others, it was not possible to tell whether this was a case study because looking at
instances was what we do in all our methods, and there was no differentiation between this
job and a compliance audit.
The definitions given also varied greatly. To one person, a case study involves looking
at individual people. To another, a case study examines a clearly defined site and reports
on that one site, so that multiple site studies would not be case studies. To another,
case studies involve getting a great deal of information about a single site or
circumstance, when generalizability isn't important. To others, "a random sample is
necessary for a case study," "case studies are nonnormative research that
investigate a situation without prejudice," "where we could look at a limited
number of cases that would represent the universe overall," and "a review of
relevant conditions in a specific environment with no attempt to project to a larger
universe." There were almost as many definitions as people, and few of them had
elements in common. While exact uniformity isn't expected or perhaps even possible when
people are asked to recall a definition, the extreme variability illustrates that we could
be talking about very different things in a proposal or report when we discuss case study
methods. Thus a decision to "do case studies" could lead to the collection of
irreconcilably dissimilar information from groups working on the same job.
What Is Meant by "A Case Study"?
We have developed a definition of case studies that leads to appropriate uses and says
something about how a good case study is conducted. It is somewhat technical, so we turn
next to giving this definition and to discussing each of its elements.
"A case study is a method for learning about a complex instance, based on a comprehensive understanding of that instance obtained by extensive description and analysis of that instance taken as a whole and in its context."
For example, if we were asked to study what caused the Three Mile Island disaster and scoped the job to describe whether required safeguards were complied with, this would not be a case study. If, however, we scoped the job to examine in depth events leading up to the disaster, what went wrong, and why it went wrong, this would be a case study. For a second example, if we were asked to study the safety of nuclear plants in general, we might select as our method a survey of self-reported compliance with safeguards in all existing plants. This would not be a case study. If, however, we scoped the job to examine in depth recent problems in appropriately selected nuclear plants including among others Three Mile Island, seeking to understand why the safeguards either were not complied with or were not sufficient, then we would have selected the case study method to answer the question.
As we will discuss later, several methods can be used in one job; these examples are only intended to highlight what is not, and what is, a case study. Examining the elements of the definition also may help make this distinction clear.
"A complex instance" means that input and output cannot be readily or very
accurately related. There are several reasons why such a relationship might be difficult.
There could be many influences on what is happening and these influences could interact in
nonlinear ways such that a unit of change in the input can be associated with quite
different changes in the output, sometimes increasing it, sometimes decreasing it, and
sometimes having no discernible effect.
Table 2.2 gives an example of a less and a more complex instance. "Are U.S. airports
following required U.S. and international security procedures for passengers?" is a
less complex question because the criterion is fairly clear, the focus is narrow, the
influences on compliance are likely to be relatively few, and the relation of input and
output is likely to be fairly direct. Staff knowledge of procedures ought to play some
role in following these procedures, for instance.
Some questions are more complex, however, such as the question: "Are security procedures in U.S. airports sufficient to protect the safety of passengers and equipment?" This is more complex because the criterion of "sufficient protection" is much less certain; the focus is broader; the influences on actual achievement of sufficient procedures are likely to be many; and the relation of input and output is not only likely to be both direct and indirect but also difficult to measure.
The second key element in our definition is "a comprehensive understanding."
Here the situation is more straightforward. This means that the goal of a case study is to
obtain as complete a picture as possible of what is going on in an instance, and why.
The third key element, "obtained by extensive description and analysis," has
three components. These are summarized in table 2.3. Case studies involve what
methodologists call "thick" descriptions: rich, full information that should
come from multiple data sources, particularly from firsthand observations. The analysis
also is extensive, and the method compares information from different types of data
sources through a technique called "triangulation." That is, reliability of the
findings is developed through the multiple data sources within each type. This is
akin to corroboration as discussed in the General Policy Manual, chapter 8.0. The
validity of the findings, particularly validity with regard to cause and effect, is
derived from agreement among the types of data sources, together with the systematic
ruling out of alternative explanations and the explanation of "outlier" results.
Examining consistency of evidence across different types of data sources is akin to
verification. There are specialized strategies for making these comparisons, namely,
pattern matching, explanation building, and thematic review. The technical how-tos for
these three strategies will be summarized later in this paper. They involve techniques
such as graphic data displays, tabulations of event frequencies, and chronological or time
series orderings. Generally, data collection and analysis are concurrent and
interactive (that is, "yoked") in case study methods.
Table 2.2: Complexity of Questions
| Example | Characteristics |
| A less complex question: "Are U.S. airports following required U.S. and international security procedures for passengers?" | Criterion is fairly clear: "required U.S. and international security procedures." Focus is narrow: "passengers." Influences on compliance are likely to be relatively few: staff knowledge of procedures and training in implementation equipment, number of staff compared to workflow, degree of supervision, staff screening and selection. Relation of input (influences on compliance) to output (that required security procedures are followed) is fairly direct. |
| A more complex question: "Are security procedures in U.S. airports sufficient to protect the safety of passengers and equipment?" | Criterion is less clear: what would be sufficient under present conditions and with existing and possible technologies? Focus is broader: passengers and equipment (although still fairly well specified). Influences on achievement of sufficient procedures likely to be many, including the state of the art of detection technologies, number and militancy of potential threats to security, and the willingness of passengers, airline personnel, and airport personnel to accept different costs and forms of protection. Relations of input (influences on security) and output (safety) likely to be difficult to measure and to be both indirect and direct. |
Table 2.3: Methods of Obtaining Description and Analysis in Case Studies a
| Technique | Methodology |
| Extensive or "thick" analysis | Analysis of multiple types of data sources such as --interviews with all relevant persons |
| Analysis via triangulation of data | Analysis through --Pattern matching |
| Comparison of evidence for consistency | Analysis through techniques such as --Matrix of categories |
a Different types of evidence and standards for them are discussed in General Policy Manual, chapter 8.0.
The next element of the definition is "taken as a whole." The size of the instance
can be as small as one individual or as large as a nation. As the examples that follow
indicate, the instance as a whole can be an individual, an event, a region, a nation, or
an organization.
| 1 These instances have been the subject of case studies. (See U.S. General Accounting Office, February 22, 1984, and Allison, 1971.) Others are general illustrations. |
One example of a GAO case study that examines an individual is our examination of whether or not a senior official behaved improperly with regard to influence and accepting money before and since leaving the White House (U.S. General Accounting Office, July 11, 1986). Another example would be a request to examine in detail ex-President Marcos' use of funds intended by the United States for military or civilian purposes for his personal benefit. At the other extreme, an instance may be as large as an event, such as the Cuban missile crisis (Allison, 1971) and the swine flu vaccine (Neustadt and Fineberg, 1978), which have been the subjects of two well-known case studies, or the Challenger tragedy. It can be a region (Chesapeake Bay water cleanup programs), a nation (democracy in the Philippines), or an organization (UNESCO). Moreover, it is possible to have questions that require nested case studies. For example, to answer a question about how programs to serve handicapped children are working, we might select the cases of preschool and elementary programs; we might further select, within preschool programs, those for the hearing impaired and those for the orthopedically impaired. Each of these nested studies is treated, in terms of specification of the unit of study and collection of data appropriate to it, as any other case study would be.
The last key element of the definition is "and in its context." Context means
all factors that could affect what is happening in an instance. As an example, in the
Challenger tragedy, inquiry began with trying to locate the technology that failed as the
reason for the explosion. The right-hand booster rocket was identified as the source of
the explosion and, within the rocket, technological attention focused on the O-rings. The
inquiry expanded very quickly, however, from asking what technology failed to an
examination of contextual influences, such as decisionmaking on that flight and NASA
management generally.
That is, the Challenger inquiry could be seen as similar to a case study in some ways. The rapid spread of inquiry from an examination of the technology to an investigation of decisionmaking on that flight, to inquiry about NASA management as it affected the Challenger disaster generally, is what "taking the context into account" means. In case study methods, to understand what happened and why, context always is considered, and it is this consideration that gives the case study its strength as a way of understanding cause and effect.
Some Common Benefits Expected From Case Study Evaluations
Doing a good case study is more than just looking at what is happening in a few instances.
It is a special systematic way of looking at what is happening, of selecting the
instances, collecting the data, analyzing the information, and reporting the results.
There are ten features of case study evaluations that merit special discussion. Each
of these features, if carried out, confers certain benefits in terms of the product. Three
of the features relate to design, three to data collection, three to analysis, and one to
reporting. These features and their benefits are shown in table 2.4. For example, with
regard to design, information over time (the longitudinal feature of the design) provides
assurance that the final product represents what is happening and not an atypical
situation.
Table 2.4: Some Common Benefits Expected From Case Study Evaluations
| Study feature | Benefits expected | |
| Design | ||
| Longitudinal | Assurance that a short-term situation that may be unrepresentative of what is happening isn't inflated in importance | |
| Triangulation | Assurance that reasons given for events properly reflect influences from many different sources | |
| Purposive instance | Ability to match questions asked and later generalization of findings at level appropriate to the questions | |
| Data Collection | ||
| Comprehensive | Assurance that important conditions, consequences, and reasons for these have not been overlooked | |
| Flexible | Broader perspectives, increased assurance that what is important on the scene rather than centrally will be examined | |
| Multiple data sources | Assurance that a full picture will be obtained and that bias associated with self-protection or self-interests will be reduced | |
| Analysis | ||
| "Yoked" or concurrent with data collection | Assurance of the ability to collect data needed to test alternative interpretations and to make rapid adjustments in design | |
| Search for disproving - proving evidence | Assurance that alternative interpretations have been thoroughly searched for and checked; thorough identification of instances that don't fit the general pattern; and, often, understanding of the reasons for the outliers | |
| Chain-of-evidence and pattern matching techniques | Permit fairly direct assessment of how convincingly the evidence and conclusions are related | |
| Reporting | ||
| Actual instances | Assurance of authenticity through persuasiveness and ease of recall; use of the tendency to generalize from personal experience but via the substitution of more objective experience for anecdotes of unknown credibility | |
These features are the price of admission to the expected benefits. One frequent question about case study methods is how rigorously these features have to be followed. Obviously, the more closely the requirements are followed, the more benefits can be expected. It is a judgment call as to how much the features can be compromised before the "case study" becomes a site visit or turns into a survey. Probably the most critical features are appropriate instance selection, triangulation, and the search for disproving evidence. And of these three, probably the most critical is appropriate instance selection.
Instance Selection in Case Studies
There are three general bases for selecting instances: convenience, purpose, and
probability. Each has its function and can be used to answer certain questions. A good
case study will use a basis for instance selection that is appropriate for the question to
be answered. Using the wrong basis for selecting an instance is a fatal error in case
study designs, as in all designs. Such a case study is a not-good case study, and it is
irredeemably flawed despite any methodological virtues it may have in terms of data
collection, analysis, and reporting.
Table 2.5 summarizes the three general bases for selecting instances and the questions each basis can answer. Of particular interest may be the seven varieties of purposive site selection: bracketing, best cases, worst cases, cluster, representative, typical, and special interest.
Instance selection is crucial to generalizability and to answering the evaluation
questions appropriately. Only rarely will convenience be a sound basis for instance
selection; only rarely will probability sampling be feasible. Thus, instance selection on
the basis of the purpose of the study is the most appropriate method in many designs.
Table 2.5: Instance Selection in Case Studies
| Selection basis | When to use and what questions it can answer | |
| Convenience | "In this site, selected because it was expedient for data collection purposes, what is happening, and why?" | |
| Purpose | ||
| Bracketing | "What is happening at extremes? What explains such differences?" | |
| Best cases | "What accounts for an effective program?" | |
| Worst cases | "Why isn't the program working?" | |
| Cluster | "How do different types of programs compare with each other?" | |
| Representative | "In instances chosen to represent important variations, what is the program like and why?" | |
| Typical | "In a typical site, what is happening and why?" | |
| Special Interest | "In this particular circumstance, what is happening and why?" | |
| Probability | "What is happening in the program as a whole, and why?" | |
The match between the question asked and the method of purposive sampling chosen can be
tricky. For example, studies that attain "representativeness" by conducting a
few case studies in a rural setting, a few in a suburban setting, and a few in an urban
setting will produce a report in which the three settings receive more or less equal
weight. If, however, 90 percent of the clients or sites for the program are rural, such
"representativeness" may appropriately capture the range of site experiences but
be rather unrepresentative of the program as a whole, and care will be needed to
generalize only to the range of settings and not to the program as a whole.
Table 2.6: Hypothetical Data on Instance Selection
| Location | Operated by | Number of beds | Clientele served | Years in operation | Funded by | Costs (dollars per person day) a | Problems (percent) b |
| 1. San Diego, CA | CAIM, Inc. | 800 | Men and boys | 2 | INS | 25 | 4 |
| 2. Amarillo, TX | CAIM, Inc. | 130 | Men and boys | 1 | INS | 30 | 4 |
| 3. El Paso, TX | PIC | 75 | Families | 3 | INS | 15 | 7 |
| 4. El Paso, TX | CAIM, Inc. | 350 | Men and boys | 1 | BOP/INS | 60 | 7 |
| 5. Miami, FL | Security | 100 | Men and boys | 1 | BOP/INS | 150 | 15 |
| 6. Clearwater, FL | CAIM, Inc. | 300 | Men and boys | 5 | BOP/INS | 100 | 10 |
| 7. Pensacola, FL | Security | 100 | Families | 5 | INS/State | 70 | 6 |
| 8. Denver, CO | PIC | 100 | Families | 3 | INS/State | 20 | 3 |
| 9. Salida, CO | Security | 200 | Men and boys | 4 | INS | 70 | 9 |
| 10. Salinas, CO | CAIM, Inc. | 100 | Men and boys | 2 | INS | 30 | 3 |
| 11. Los Angeles, CA | Security | 300 | Men and boys | 3 | INS | 75 | 5 |
| 12. San Francisco, CA | Security | 250 | Men and boys | 3 | INS/State | 70 | 7 |
| 13. San Francisco, CA | PIC | 100 | Men and boys | 3 | INS | 25 | 4 |
| 14. New York, NY | ARIVA, Inc. | 100 | Men and boys | 2 | INS | 55 | 6 |
| 15. Washington, DC | ARIVA, Inc. | 300 | Families | 2 | INS | 85 | 5 |
| 16. Seattle, WA | Security | 100 | Men and boys | 3 | INS/State | 60 | 7 |
To illustrate what each variety means, and how it might be operationalized, consider the information in table 2.6. This gives hypothetical data about a real situation in designing a study: selecting instances (in this study, sites or locations) for an assessment of the costs and operations of federal detention facilities managed by private contractors under OMB Circular A-76. There are not many such facilities, so the 16 hypothetical facilities represent what we might actually find in such a study. The following paragraphs describe what a sample would look like if it were chosen according to the bases in table 2.5, using the data in table 2.6.
Convenience Samples
If our location were the Denver Regional Office, a convenience sample would be sites 8
(Denver) and 9 (Salida). That is, ease of collecting data and minimizing resources
required would have driven our choice.
Purposive Sample
Bracketing
If our interests were extreme costs, numbers 3 (El Paso, at $15 per person day) and 5
(Miami, at $150 per person day) would bracket the cost extremes. If we wanted the three
least expensive and the three most expensive, we could select 3 (El Paso), 8 (Denver, at
$20), and 13 (San Francisco, at $25) in comparison to 5 (Miami, at $150), 6 (Clearwater,
at $100), and 15 (Washington, D.C., at $85). Such an addition would also give us a better
basis for analysis because it includes not only high-cost and low-cost sites but also
services to men and boys and to families, a difference that in itself might be expected to
lead to cost variations.
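The bracketing selection above can be sketched in a few lines of Python. This is a minimal illustration, not part of GAO's method: the `Site` record and the `bracket` function are hypothetical names introduced here, and the data are the hypothetical cost figures transcribed from table 2.6. The sketch keeps every site tied at a cutoff, so that the evaluator, not the sort order, decides among ties.

```python
from typing import NamedTuple


class Site(NamedTuple):
    number: int      # site number in table 2.6
    city: str
    clientele: str
    cost: int        # dollars per person day


# Hypothetical data transcribed from table 2.6 (only the columns used here).
SITES = [
    Site(1, "San Diego", "Men and boys", 25),
    Site(2, "Amarillo", "Men and boys", 30),
    Site(3, "El Paso", "Families", 15),
    Site(4, "El Paso", "Men and boys", 60),
    Site(5, "Miami", "Men and boys", 150),
    Site(6, "Clearwater", "Men and boys", 100),
    Site(7, "Pensacola", "Families", 70),
    Site(8, "Denver", "Families", 20),
    Site(9, "Salida", "Men and boys", 70),
    Site(10, "Salinas", "Men and boys", 30),
    Site(11, "Los Angeles", "Men and boys", 75),
    Site(12, "San Francisco", "Men and boys", 70),
    Site(13, "San Francisco", "Men and boys", 25),
    Site(14, "New York", "Men and boys", 55),
    Site(15, "Washington", "Families", 85),
    Site(16, "Seattle", "Men and boys", 60),
]


def bracket(sites, k):
    """Return (low, high): every site at or below the k-th lowest cost and
    every site at or above the k-th highest cost. Ties at a cutoff are all
    kept, so either list can be longer than k."""
    costs = sorted(s.cost for s in sites)
    low_cut, high_cut = costs[k - 1], costs[-k]
    low = [s for s in sites if s.cost <= low_cut]
    high = [s for s in sites if s.cost >= high_cut]
    return low, high


low, high = bracket(SITES, 3)
print("low-cost candidates: ", [(s.number, s.cost) for s in low])
print("high-cost candidates:", [(s.number, s.cost) for s in high])
```

With `k = 3`, the low-cost list contains four candidates because sites 1 and 13 are tied at $25. Choosing between them (the text above uses site 13) remains a judgment call that the evaluator should document.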
Best Cases
If our interests were in operating centers with the fewest problems, we might examine
numbers 8 (Denver, 3 percent) and 10 (Salinas, 3 percent). Since both are in Colorado
(although operated by different firms and serving different groups), we might want to add
sites. Such an addition could show whether we were looking at something about Colorado
rather than about low-problem centers. We could do this by selecting 1 (San Diego, 4
percent), 2 (Amarillo, 4 percent), and 13 (San Francisco, 4 percent).
Worst Cases
Sites 5 (Miami, 15 percent problems) and 6 (Clearwater, 10 percent) stand out as worst
cases.
Selecting an out-of-state comparison, if we wanted it, is harder here. The next highest
problem rate (9, Salida, at 9 percent) is run by a different company and costs much less.
Security has a site in San Francisco, for men and boys, which costs $70 daily with a
7-percent problem rate. The costs of site 15 (Washington, D.C.) are higher, but this site
serves families and has a low problem rate. The best choice probably is 12 (San
Francisco): it serves the same group (men and boys) and is run by the same company
(Security).
Cluster
We might be interested in administrative arrangements: for example, how administration
works out when INS alone is the contractor, when responsibility is shared with another
federal agency (Bureau of Prisons), and when responsibility is shared with the state. One
cluster of sites (1, 2, 3, 8, 9, 10, 11, 13, 14, and 15) is administered by INS alone.
Another cluster (4, 5, and 6) is shared between BOP and INS, and the last cluster (7, 12,
and 16) is run by INS and the state. We could pick one or two sites from each cluster to
get a sense of how agency auspices may affect program operations.
Representative
One issue we might need to examine could be efficiencies of operation, particularly in
terms of facility size. Here we might select numbers 1 (San Diego, 800 beds), 6
(Clearwater, 300 beds), and 10 (Salines, 100 beds). All are run by CAIM, and all serve men
and boys. We would have to limit our generalizations to facilities for men and boys, but
these three sites should give a good sense of the size and operations issue.
Typical
This would be a challenge. In terms of size, there is a "typical" bed size (100
beds); in terms of people served, there is a "typical" population (men and
boys); and in terms of years of operation, 3 years is "typical," with 2 years a
close runner-up. In terms of costs, however, the distribution is trimodal (that is, three
values appear about equally often), and for percent of problems, it is almost flat with two
outliers. Also, there is not a single site that matches all three "typical"
characteristics well. Miami, for example, has 100 beds and serves men and boys, but it has
been in operation only 1 year, costs $150 per person per day, and has a 15-percent problem
rate. The best approach would be to indicate that it is not possible to pick one site that
is "typical" of such distributions.
Special Interest
Any one of the 16 sites might be examined as a result of special congressional interest.
Such interest usually would be based on information extraneous to the data in the table: a
complaint might be received, for example, about conditions in the San Diego site, or
allegations might be made that the high costs of the Miami site were due to mismanagement.
Probability Samples
Probabilistic sampling is the method of choice for answering questions about "how
much," or how extensive a problem is in a population. Properly carried out, it provides
strong generalizability and assurance of representativeness. A probability sample is one
in which all members of the
population have a known and equal chance of being selected. If we used a table of random
numbers, and selected as the first two sites those corresponding to the first two numbers
between 1 and 16 in the table, we would have selected a probability sample. Each site
would have a 1-in-16 chance of selection, and that chance would be equal among sites. A
fair objection to this statement is that the laws of probability operate on large numbers,
and selecting fewer than 30 instances does not always provide the generalizability to the
population as a whole that probability samples promise. However, in terms of actual
operations, which we want to illustrate here, the method just sketched is a probabilistic
one, and some case studies have involved 30 or more sites selected on a probabilistic
basis. (See PEMD's transfer paper entitled Using Statistical Sampling (U.S. General
Accounting Office, May 15, 1986) for more information.)
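The random-number-table procedure sketched above has a direct software equivalent. The following Python sketch is illustrative only: it substitutes the standard library's pseudorandom generator for a printed table of random numbers, and the function name probability_sample and the seed value are hypothetical choices made for this example.

```python
import random


def probability_sample(population, n, seed=None):
    """Draw n distinct members of the population without replacement.

    Every member has the same chance of ending up in the sample. A fixed
    seed (an assumption of this sketch) makes the draw repeatable so the
    selection can be documented and re-checked.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(list(population), n))


sites = range(1, 17)  # the 16 hypothetical facilities, numbered 1 to 16
chosen = probability_sample(sites, 2, seed=1990)
print("Sites selected:", chosen)
```

Recording the seed alongside the selected site numbers serves the same purpose as noting the starting row and column used in a random number table.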
For readers who want to check out their skills in applying different types of purposive selection, appendix II gives information for a job involving the 50 states (a fairly common situation for GAO), a form for indicating which you would select for each of the seven kinds of purposive selection, and our answers, for comparison against yours.
In many jobs, what is a "case" and what dimensions are important to consider
in selection will be clear. For example, the population of detention facilities supported
by INS contracts can be defined legally (by the contract awarded), and the relevant
dimensions (length of time in operation, facility size, detainee mix) are straightforward.
There are, however, more problematic circumstances. An example would be a study of the
extent to which voluntary organizations have taken up any slack in welfare supports. What
is a voluntary organization can be defined broadly, as "any nonprofit
organization," or narrowly, as "a service-oriented group whose members do not
receive payment for their work."
Dimensions of potential relevance for the outcome of interest are many, and the empirical
bases for selecting any one dimension over others are few. In such situations, the evaluator
can turn to past experience, a search of the appropriate theoretical as well as empirical
literature, the advice of knowledgeable persons, an examination of key issues in proposed
or pending legislation, customer guidance, and similar techniques. That is, while it is
important to recognize the difficulties, there are ways of dealing with them in case
definition.
Chapter 3
Case Study Applications
As noted earlier, there are six types of applications for case study methods-illustrative,
exploratory, critical instance, implementation, program effects, and cumulative. But case
study reports commonly use only two of the six applications: illustrative and critical
instance. Greater use could be made of the four others in selecting alternative ways of
answering questions, because these may be able to give information that is more valuable
to customers than other techniques. Also, improvements can always be made in how even the
two approaches already used frequently are carried out, especially in the area of
selecting instances for study. The next sections summarize, for each of the six types, the
evaluation questions they can answer, the functions they perform, their design features,
and their pitfalls. The last section shows what basis for selecting sites is appropriate
for each of the six applications.
Illustrative
As table 3.1 indicates, illustrative case studies primarily describe what is happening and
why, in one or two instances, to show what a situation is like. This can help in the
interpretation of other data, particularly if we have reason to believe most readers know
too little about a program or situation to understand fully the information from surveys
or other methods.
Table 3.1: Illustrative Case Studies
| Aspect examined | Characteristics |
| Evaluation questions | Help interpret other data when there is reason to believe that readers know too little about a program; descriptive, often used in conjunction with other methods. |
| Functions | Make the unfamiliar familiar; provide surrogate experience; avoid over-simplification of reality; and give reader a common language about the topic. |
| Design features | Site selected as typical or representative of important variations; small number of cases to keep reader's interest; data often include visual evidence; analysis concerned with data quality and meaning; and reports use self-contained, separate narratives or descriptions. |
| Pitfalls | May be difficult to hold reader's interest while presenting in-depth information on each illustration; may not adequately represent situations where considerable diversity exists (in such situations it may be impossible to represent variety well enough to use illustrative case studies); and may not have time on-site for in-depth examination |
GAO has many examples of such illustrative use. In 1982, for instance, CED examined
housing block grants through a survey supplemented by case studies. The results of the
survey were published in the main report (U.S. General Accounting Office, December 13,
1982). For three of the sites (Pittsburgh, Seattle, and Dallas), individual reports
described what each city was like with regard to housing and housing-related activities
and how the money was used in that city and included before-and-after pictures of what
rehabilitation meant for individual neighborhoods and houses (U.S. General Accounting
Office, March 24, 1982; March 30, 1982; April 30, 1982). In a similar application, HRD
described the projects funded under the Emergency Jobs Appropriations Act of 1983 in
communities in Texas, Alabama, California, Georgia, and
Massachusetts (U.S. General Accounting Office, March 26, 1985; August 27, 1985; September
25, 1985; December 6, 1985).
Illustrative case studies are used by evaluators in other agencies. When the Department of Health and Human Services was trying out delivery of Head Start services to parents and children in their own homes, called Home Start, the Department supplemented a formal assessment of the development of the children before and after the program with case studies (High/Scope Educational Research Foundation, 1972). These case studies described what services were delivered, the conditions in rural as well as urban areas, and what the Home Start teachers did during the home visits, and they generally provided a surrogate or vicarious experience for readers who might never have visited a Head Start or a Home Start center. The case studies told, too, of the development of the program over time and helped give a realistic sense of problems in start-up and implementation, how changes in staffing were accommodated, and the impact of shifting federal guidance on efforts to carry out the program in the field.
Case studies such as these are well accepted as a valid way of amplifying a more systematic presentation via the realism and vividness of anecdotal information. There are, however, pitfalls in presenting illustrative case studies. The most serious is selecting the instances. The case or cases must adequately represent the situation or program. This is relatively easy if the program is small and homogeneous. Where considerable diversity exists, it may not be possible to select a "typical" site, and the diversity may be so great that to represent it adequately would require more case studies than most people would want to read for illustrative purposes. In the example of privately operated detention facilities, an illustrative case study might run the risk of oversimplifying a more complex situation. The example was contrived to illustrate exactly this point: that sometimes we cannot select a site that fits our needs and thus the method is not appropriate.
However, in many real-world situations, it is possible to represent diversity adequately for illustrative purposes and to obtain the benefits of this application: helping readers feel, hear, see, "be there" when this kind of surrogate site experience is necessary to undo stereotypes or explain a situation otherwise inaccessible for most people.
Such a situation might be a bilingual education class, about which stereotypes can
abound, or life aboard a nuclear-weapon-equipped submarine, a situation few readers will
ever experience themselves but may need to get a feel for in order to understand staff
selection, training, and management on modern submarines.
Exploratory
The exploratory case study is a shortened case study, undertaken before launching into a
large scale investigation. Its function is to develop the evaluation questions, measures,
designs, and analytic strategy for the bigger study. As table 3.2 indicates, it is most
helpful where considerable uncertainty exists about program operations, goals, and
results. Also, when we do not have an adequate on-the-shelf set of designs and measures,
conducting an exploratory case study before initiating a job requiring 1,000 staff days or
more can save time and money in implementation and improve the confidence we have in
our results. We can aim more precisely and hit the target more often.
Table 3.2: Exploratory Case Studies
| Aspect examined | Characteristic |
| Evaluation questions | Usually cause and effect |
| Functions | Where considerable uncertainty exists about program operations, goals, and results, exploratory case studies help identify questions, select important measurement constructs, develop actual measures for these, which can be used later in larger-scale tests; formulate expectations; safeguard investment in larger studies (for problems or programs that are not well-developed) |
| Design features | Site selected: needs at least one site that represents each important variation to make a convenience sample acceptable; number of cases sufficient to cover diversity; data focus on program operations and on-site observation, are not longitudinal but need enough time to find out what is going on; analysis is closely concurrent with field work but does not require strong chain of evidence or audit trail; reports are usually internal or parts of larger, longer reports |
| Pitfalls | Temptation to prolong the exploratory phase; site selection only for convenience, inadequate coverage of diversity; prematurity-exploratory findings released as conclusions; over-involvement in evaluator's own hunches so that initial findings are confirmed rather than tested |
Some of our scoping work already may involve exploratory case studies. For example, in GGD, a design study was done as a separate job, culminating in a briefing, prior to an in-depth study of the implementation of the Bail Reform Act of 1984. The methodology included 90 interviews, observations, and data analysis from the population of 94 court districts selected purposively for their characteristics on significant variables. Researchers and experts in the field were also interviewed. An expert panel was used to give feedback at various points to make sure we had a comprehensive picture of the situation. The product of this exploratory case study was a briefing, with the study design choices described, including detailed research questions, outlines of data sources, significant variables, extant data bases, and site selection criteria. From this, a larger study was designed to meet the needs of the requester. Other jobs may involve similar efforts that are not, however, reported as separate jobs and thus are less visible as exploratory case studies.
GAO has also issued reports that include some features of exploratory case studies. In 1985, for example, NSIAD examined emerging issues in export competition through a case study of the Brazilian market (U.S. General Accounting Office, September 26, 1985). Combining site visits to Brazil, Japan, West Germany, and France, interviews with many officials of appropriate agencies and from the private sector, examination of official government files, and a questionnaire survey of high technology firms active in the Brazilian market, the evaluators amassed a rich array of contextual and focal information and identified four trade practices considered to be key factors in export competitiveness in Brazilian markets. These were bilateral trade accords, countertrade, export financing, and compliance with trade-related industrial policy. Although NSIAD did not need to test these factors for generalizability to other countries through a later study in order to meet the requirements of the job, the product would permit such testing. NSIAD is using the findings in this way, as part of its ongoing work on bilateral initiatives. Of particular methodological note in this report is the detailed explanation of why export competitiveness in Brazilian markets (the instance) was selected for the case study.
The exploratory case study has been used by agencies outside GAO. The Department of Justice, for example, supported an exploratory case study of the career criminal program (Chelimsky and Dahmann, 1980). The career criminal program aimed at "swift and certain" justice by trying to expedite and strengthen processing of individuals who had long criminal histories at the time of apprehension. The exploratory study looked in depth at four of the nine demonstration sites prior to conducting a program effects evaluation. The evaluators identified the key elements of the programs as implemented and what measurable changes were likely to occur and developed measures of the outcomes, as well as designs for testing cause and effect in the subsequent larger study (Chelimsky and Sasfy, 1976).
The greatest pitfall in the exploratory study is prematurity: that is, the findings may seem so convincing that it can be difficult to resist pressures to report on these as if they had the strength of the larger study. Also, care must be taken to scope and sequence the exploratory study so that it yields enough information to be worthwhile and in time for use in the larger study but does not unduly delay answering the questions through the larger study. In addition, it is inappropriate to use the scoping phase as an ad hoc exploratory case study accompanied by an urge to issue the product at the end of scoping, when the necessary procedures for an exploratory case study with regard to such issues as instance selection have not been followed.
Critical Instance
The critical instance is the most frequent application of the case study
method in GAO, so much so that it may be seen as a "usual GAO review" rather
than recognized as what it can be: a case study (U.S. General Accounting Office, January
22, 1981; April 23, 1982; October 30, 1985). The advantage of recognizing the approach as
an application of case study methods is that some aspects of the method that may not be
widely used now, such as the close yoking of data collection and analysis, could be applied
in a way that increases timeliness without reducing quality. (This technique, discussed in
more detail in the section on analysis, can increase efficiency by reducing collection of
data and large-scale analyses of these data that subsequently do not prove useful.)
The critical instance case study examines one, or very few, sites for one of two purposes. First, a very frequent application is the examination of a situation of unique interest, such as Three Mile Island, the Challenger disaster, or allegations concerning funding for a specific presidential campaign. There is little or no interest in generalizability. The instance is not "selected" by us; rather, we are called to it.
GAO conducts many critical instance studies. One example, already mentioned, was our review of the representation of foreign interests by former very high government officials (U.S. General Accounting Office, July 11, 1986). Another is PEMD's review of the readiness of the Big Eye Bomb for production (U.S. General Accounting Office, May 23, 1986). Yet another is RCED's review of a construction contract award at Jean Lafitte National Historical Park (U.S. General Accounting Office, September 26, 1987), along with its examination, in a separate report, of the park service actions at Delaware Water Gap National Recreation Area in awarding a lease, closing a campground, and raising a house rent (U.S. General Accounting Office, October 28, 1987).
A second, rare, application is where a highly generalized or universal assertion is being called into question, and we are able to test it through examining one instance.
In one such study, GGD examined whether national policies, procedures, and practices with regard to cargo imports were causing problems in port operations (U.S. General Accounting Office, December 1986). The Port of New York offered a critical test because, given the diversity of imports and the volume of work, if problems were occurring, they would be likely to show up clearly in this site. If no problems were observed, problems in other sites were unlikely. GGD used observations, interviews, and document analysis at three sites in the Port of New York and supplemented these with a small number of less intensive observations at other sites. The method, in this instance, was sufficient to permit recommendations that were systemwide and generalizable with the single case.
Table 3.3 summarizes the features of the critical instance case study. As noted, the method is particularly suited for answering cause-and-effect questions about the instance of concern. It provides assurance that we have not prematurely overlooked important factors, that we have not been swayed by information from limited or perhaps biased sources, and that we have taken context into account, thus giving a fair and balanced picture of the situation.
Perhaps the biggest pitfall in this application is insufficient specification of the
customer's question. That is, the job may be presented to us as if only that situation is
of concern, but the underlying question may call for a broader look at the issue. A
request to investigate the reasons for the bank failures in Ohio, for example, may reflect
an interest only in Ohio, but it could be a "tip of the iceberg" question. What
the customer may really want to know is whether other states are likely to have similar
problems. In such a situation, Ohio might be selected as a site to examine but we would
also need to look at other states or use other approaches to achieve the generalizability
needed. This then rules out the critical instance method as appropriate for this job. The
importance of probing the underlying questions in a request to achieve good specification
of the evaluation question is not unique, of course, to the critical instance case study
but it is crucial in its appropriate application.
Table 3.3: Critical Instance Case Studies
| Aspect examined | Characteristic |
| Evaluation questions | Cause and effect, usually stand alone |
| Functions | Investigation of specific problem (frequently encountered at GAO), decisive testing of universal assertion; cause-and-effect questions |
| Design features | Site selects itself in specific problem-for decisive testing, have to assume uniform system with regard to issue and so convenience sample acceptable; number of cases is usually one instance; comprehensive data for specific problem-for decisive testing, need more modeling, hypotheses, and targeting to know what to study; data analysis and collection concurrent and interactive: data feed new collection, and emphasis on ruling out alternative causes; report describes instances, presents conclusions about cause, gives evidence |
| Pitfalls | Inappropriate selection of this technique as real issue may not be specific problem (e.g. Ohio bank failure) but more general questions; premature closure may narrow causal search too early; overgeneralization from evidence |
Program Implementation
We frequently are asked whether a program has been implemented and, often,
whether implementation is in compliance with congressional intent. The program
implementation case study is helpful where enabling legislation offers considerable
flexibility. In such cases, a wide variety of expenditures or actions could be consistent
with legislation and compliance with intent may be a matter of understanding the process
by which decisions were made, who was involved, and whether the actions are meeting local
needs. One example is the 1981 legislation consolidating many small categorical grants
into larger block grants, the funds for which could be spent very flexibly.
Another situation where program implementation case studies may be called for is when
concern exists about implementation problems. In-depth, longitudinal reports of what has
happened over time and why can set a context for interpreting a finding of implementation
variability: that is, whether there seem to be basic structural problems or if the program
understandably requires time for installment, adaptations, and building an infrastructure.
In some instances, GAO has been able to follow fairly intensively the implementation of
programs or activities. One example is GGD's series of reports on how the 1980 census was
conducted. GAO evaluators, in addition to being "on the scene" due to their
location at the major audit site, accompanied enumerators into the field and examined, in
depth, Census procedures at field offices. In other instances, we have spent somewhat less
elapsed time in the field, with less direct observation, and with greater reliance on
interview and documentary evidence. In 1985, for example, RCED was asked how the
Department of Interior was implementing the Office of Management and Budget's Circular
A-76, dealing with privatization of all appropriate services. The request overlapped with
a similar one that reflected a senator's special interest in Glacier National Park in
Montana. The evaluators were able to combine the jobs in a review
that eventually involved information from 8 of 17 National Park Service regional offices
and 19 of 402 field offices. The report aggregates findings across these sites and
concludes that agencies have been slow to implement the circular, although progress has
been made since 1982 (U.S. General Accounting Office, March 15, 1985).
Another example is GAO's review of 23 federal agencies' efforts to implement the Federal
Managers' Financial Integrity Act of 1982. A series of case studies, together with an
overview report, was produced. Among these, RCED's review of the Department of Commerce
implementation, to take one report, examined the actions Commerce took that were intended
to improve internal controls, such as training senior financial analysts in evaluating
applicants and borrowers in the troubled EDA business loan program and overhauling the way
in which computer resources were used for the National Weather Service. RCED also examined
the results of these efforts and highlighted priority areas for further improvement, such
as better information on results for internal management purposes.
Table 3.4 summarizes the design, data collection, analysis, and reporting features of
program implementation case studies. Usually, in such studies, generalization is wanted
and care is required to negotiate the question with the customer (best situations? worst?
typical?) and to match instance selection carefully with the questions. Unless the program
is small and homogeneous, the evaluator faces two possibilities. The first possibility is
that the number of instances will need to be fairly large in order to achieve the
generalizability wanted, and, as a consequence, skill will be needed to manage data
collection with sufficient flexibility to obtain the insights case studies offer and
sufficient structure to permit cross-site aggregation of findings. The second possibility
is that the diversity will be so great that it would be impossible to have enough
instances to meet needs for generalizability and still manage the data collection and
analysis.
Table 3.4: Program Implementation Case Studies
| Aspect examined | Characteristic |
| Evaluation questions | Descriptive, normative |
| Functions | Learn what implementation has been achieved, understand unexpected aspects; understand reasons why implementation looks the way it does; useful when enabling legislation has given flexibility |
| Design features | Site selection cannot be convenience because usually generalization wanted, and purposive sample can be typical and representative of diversity and best and worst cases; number of cases depends on program diversity since generalization usually wanted; data rely on common instruments, published documents, and observation; reports are varied in theme, site, chronology, and narration |
| Pitfalls | Bias detection methods may be inadequate; may fail to take into account diverse views about program goals and purposes; competence of all on-site observers may not be sufficiently high; can be costly due to management, data quality control, validation procedures, and analytic model (within site, cross site, etc.), which may lead to cutting too many corners to maintain quality |
An important requirement for good program implementation case studies is investment of enough time on site to get longitudinal data and to obtain breadth of information. If the purpose is to report what is happening in a descriptive sense only, short site visits together with administrative records may provide adequate bases for findings. If, however, the evaluation question requires GAO to report on how satisfactory progress is or the reasons for problems in implementation, the more staff who can be on site over time, with the richest or "thickest" base for examining the situation as the many people involved see it, the sounder our causal conclusions and subsequent recommendations will be.
The multiple sites usually required for program implementation questions impose demands
on training and supervision needed for quality control. Because of tight resources, lack
of travel funds, and the need to use staff with uneven experience and skills, this becomes
critical in situations involving many evaluators working in different regions. That is,
time is needed to train staff adequately in such case study techniques as the note-taking
required for thick descriptions, which is in turn required for the content analysis of
themes in the instance. It is possible, for example, for two persons to interview the same
informant and find that one has used a one-sentence summary for a detailed, rich, 5-minute
discourse while the other captured much more of the complexity and essence of what was
said and what was happening. Table 3.5 illustrates such a difference.
Table 3.5: Illustration of Differences in Note-Taking
| Situation | Technique | Characteristic |
| In an interview with the Director of the National Science Foundation program for grants to small colleges, the following question is asked: "How does your program inform the eligible colleges of the opportunity to apply for grants?" | Rich notes | "The Director indicated that procedures have changed three times since the inception of the program. In the first 4 years, announcements were mailed to the individual named as president in the listing, for the same year, of the American Association of Small Colleges. Because applications were very sparse, with about 30% of eligible colleges applying, the procedure was changed to a two-stage mailing, first to the president to find out the name of the official in charge of federal programs and then to the official. This worked well for a 5-year period, in terms of receipt of applications from over 80% of the eligible colleges, but when overall federal funding for research was reduced, the federal programs positions were abolished and applications fell to about 49% of eligible institutions responding. Two years ago, the decision was made to mail copies to the persons listed as chairs of the relevant science departments in each college in appropriate professional association listings. This has increased the cost of outreach by about $15,000 or about 25% more than the prior system. To date, returns are at the 80% rate again." |
| | Thin notes | "The current system is to mail copies of the announcements to the chairs of relevant science departments, such as chemistry, biology, physics, and computer science." |
Program Effects
Case studies can determine the effects of programs and reasons for successes (or failures).
In 1982, for example, RCED examined the progress made since the 1970's in cleaning up the
nation's air, water, and land, finding that while strides had been made toward meeting the
established goals (cleaner air, properly treated wastewater, more drinkable water),
deadlines had been extended and unresolved issues made meeting even these deadlines
difficult (U.S. General Accounting Office, July 21, 1982). We pointed to lack of
flexibility as a source of cascading problems and delays. The bases for these conclusions
were in-depth case studies of three sites (Cleveland, Dallas, and New York City) together
with information from reports prepared by six federal agencies and by environmental
organizations and public interest groups and interviews with Environmental Protection
Agency officials. Particularly notable methodologically in this report is the integration
of case study findings with other sources of information throughout the first volume.
A PEMD report has focused on water quality: the effectiveness of efforts to improve water quality and the reasons for successes and failures. In-depth, very extensive case studies of several water catchment areas were conducted, and the final report is based on a synthesis of the findings from the case studies, another example of integration of findings across diverse sites (U.S. General Accounting Office, December 17, 1986a, b; September 19, 1986). This series of reports also is useful for illustrating the way in which causality is established in case studies: through development of internally consistent explanations of what led to what and the conscientious use of information from within the site and from contrasting sites to rule out alternative explanations.
For another example, to determine whether actions taken by the states since the mid-1970's to address medical malpractice insurance reduced insurance costs, the number of claims filed, and the average amount paid per claim, HRD conducted case studies in six selected states (Arkansas, California, Florida, Indiana, New York, and North Carolina). Work included obtaining views of organizations representing physicians, hospitals, insurers, and lawyers on perceived problems, actions taken to deal with them, results of these actions, and the need for federal involvement. Other information came from surveys of nonfederal hospitals about the sources, coverage limits, and costs and claims from leading insurers in each state and, for comparison, the same type of information from a nationwide company. The results are presented separately in six case study reports and aggregated in the overall report (U.S. General Accounting Office, December 31, 1986).
Other federal agencies have used the case study method successfully in answering program effects questions. The National Science Foundation, for example, assessed the effectiveness of a cooperative science program aimed at increasing innovation and knowledge transfer between university and industry researchers. Ten case studies were undertaken of a carefully selected group of projects that ranged from computer language systems through nuclear science to fisheries biology and chemical engineering. Of note is the methodological detail given on project selection, data collection, analysis, and case format. In a companion report, results from a survey of grant recipients are analyzed, giving both a quantitative and a qualitative sense of how the program was working. Results from the two methods were not integrated; both suggested, however, that the program was generally working well (National Science Foundation, 1984).
Table 3.6 summarizes key features of program effects case studies. Like the program
implementation case study, the evaluative question often requires generalizability and,
for a highly diverse program, it may not be possible to answer the questions adequately
and still have a manageable number of sites.
Table 3.6: Program Effects
| Aspect examined | Characteristic |
| Evaluation questions | Cause and effect; can be stand-alone or multimethod and can be conducted before, during, or after other methods |
| Functions | Determine impact and give strong inference about reasons for effects |
| Design features | Site selection depends on program diversity, cannot be used with highly diverse programs; best, worst, representative, typical, or cluster bases appropriate; must keep number of cases manageable or risk becoming minisurvey, can use survey before or after to check generalizability or mix survey with concurrent case studies selected for special purposes; data rely on observation and structured materials, often combine qualitative and quantitative data; analysis uses varying degrees of formalization around emergent or predetermined themes; reports are usually thematic and describe site differences and explain these; variation in degree of integration of data across sites and of findings from different methods |
| Pitfalls | Not collecting the right amount of data; not examining the right number of sites; insufficient supply of well-trained evaluators; difficulties in giving evaluators enough data collection latitude to obtain insight without risking bias |
There are some methodological solutions to this problem. One solution would be to conduct the case studies first in a set of sites chosen for representativeness and to verify the findings from the case study through targeted examination of administrative data, prior reports, or a survey. A second solution would be to use these other methods first. After identifying the findings of particular interest, case studies would be conducted in sites selected to maximize the ability to get the specific understanding required. Both of these approaches have been used with good effect in program evaluation.
Cumulative
This relatively new and not as yet widely used application of case study methods brings
together the findings from case studies done at different times. The applications
previously discussed that involved multisite case studies are cross-sectional: that is,
information from several sites is collected at the same time. In contrast, the cumulative
case study aggregates information from several sites collected at different and even quite
extended times.
The cumulative case study can be retrospective, aggregating information across studies done in the past, or prospective, structuring a series of investigations for different times in the future. The techniques for ensuring sufficient comparability and quality and for aggregating the information are what constitute the "cumulative" part of the methodology.
That is, the cumulative case study is similar to an evaluation synthesis, in that it is
a method for aggregating the findings of several studies. It differs from an evaluation
synthesis in that special techniques are required to aggregate the qualitative information
that often is a feature of case studies and to maintain the sense of the "instance as
a whole" in its complexity that distinguishes case studies from surveys of several
sites. For some jobs, both case study and noncase study reports can be
aggregated, each using the appropriate techniques, in order to produce capping reports or
similar products.
GAO does not appear to have done a cumulative case study using our own case study reports or other case studies. GAO reports have been used with good results, however, in cumulative case studies published by others outside GAO. One example is a book on bureaucratic failures, which is based entirely on GAO reports of management problems in different agencies over a considerable period of time (Pierce, 1981). The author began with a set of hunches or hypotheses about what can go wrong in agency management, and what would be evidence supporting, or contradicting, these hypotheses. He reviewed the GAO reports in detail, analyzed the data from each one in terms of his framework, and aggregated the results in his final chapter.
Other examples of cumulative case studies come from two international agencies. A
retrospective cumulative case study was conducted by the World Bank in its examination of
four in-depth case studies of the effectiveness of educational programs. These case
studies were intended initially as stand-alone assessments of the programs but were
brought together to learn about the effectiveness of the evaluations themselves in the
context of educational programs (Searle, 1985). A prospective cumulative case study was
commissioned by the U.S. Agency for International Development. The purpose was to identify
input and process components of economic assistance that could be quantitatively
associated with differences in outcome measures. The method was the specification of a
common set of data (both qualitative and quantitative) to be collected over a 5-year
period as projects were initiated, together with a means of coding the data across the 47
studies eventually completed. The coded results were analyzed quantitatively in the final
report (Finsterbusch, 1984).
Table 3.7: Cumulative Case Studies
| Aspect examined | Characteristic |
| Evaluation questions | Cause and effect |
| Functions | Retrospective cumulation allows generalization without cost and time of conducting numerous new case studies; prospective cumulation also allows generalization without unmanageably large numbers of cases in process at any one time; strengthens inference from new studies by combining with results from older studies |
| Design features | Uses site selection and usually a large number of cases; data as reported (retrospective); usually on-site observation (prospective); backfill techniques; analysis uses case survey method to cumulate findings; possible to examine interactions directly since number of instances is large; reports may resemble evaluation syntheses |
| Pitfalls | Publication basis may severely limit generalization; inadequate or uncertain quality of original data; quality of data-reduction procedures may be very difficult to determine; the effects of changes in many contextual factors over time may be difficult to separate from effects of the programs |
Two features of the cumulative case study, shown in table 3.7, are the case survey
method just described as a means of aggregating findings (Lucas, 1974; Yin and Heald,
1975; Yin et al., 1976) and backfill techniques (Berger, 1983). The latter are helpful in
retrospective cumulation as a means of obtaining information from the authors that permits
an otherwise unusable case study to be included in the aggregation. Knowing the basis on
which the case instances were selected, for example, is crucial in cumulation; otherwise
it is not possible to know whether best case, worst case, typical, or the like instances
are being aggregated. Some published case studies do not provide sufficient detail on
this. In backfilling, the evaluator might call the author, visit the author to review the
original data, or contact others who were knowledgeable about the design decisions in
order to get adequate information on instance selection.
Opinion varies as to the credibility of cumulative case studies for answering program
implementation and effects questions. One authority notes that publication biases may
favor programs that seem to work, which could lead to a misleadingly positive view
(Berger, 1983). Other experts are concerned about the quality of the original data and
analyses and problems in verifying their quality (Hoaglin et al., 1982; Yin, 1989). For
the cumulative use of GAO reports, these concerns are less important, since we already use
the "audit trail" procedures recommended in the policy and other manuals for
verification of data collection and analysis quality. We do, however, have the opposite
concern: that is, we would need to be sure there was not "bad news" selectivity
in a particular area, associated with killing jobs that did not identify problems during
scoping.
Table 3.8: Some Design Decisions in Case Study Methods
Type of question
| Design decision | Illustrative, exploratory | Critical instance | Implementation, program effects, cumulative |
| Basis for site selection | Typical, representative, cluster | Convenience, unique interest | Best-worst case, bracketing, typical, representative, cluster, probability |
| If multimethod | Concurrent | Concurrent | Before, concurrent, after |
| Prestructuring | Low, moderate | Low, moderate | Moderate, high |
| Type of data | Qualitative only, qualitative-quantitative | Qualitative only, qualitative-quantitative | Qualitative only, qualitative-quantitative, quantitative only |
| Sequence of analysis | Within sites, then across | Within sites, then across | Within sites, then across; across sites, then within; concurrent |
| Reporting | Narrative, thematic | Narrative, thematic | Thematic |
Design Decisions and Case Study Applications
In earlier sections, we discussed seven bases for purposive selection of instances and six
applications of the case study method, each of which was associated with a different
evaluation purpose or question. Bringing this information together, table 3.8 shows the
relations among case study applications and design decisions. For example, if the purpose
of the study is illustrative, an appropriate basis for site selection could be typical,
representative, or cluster; the case studies would be conducted concurrently with other
methods used in the main study; prestructuring or guidance to the evaluators in the field
would be low to moderate to permit the thickness and richness of insights needed; data
could be qualitative only or both qualitative and quantitative; the case studies probably
would be analyzed within sites only; and the reporting would probably be narrative.
Chapter 4
Data Collection and Analysis
We have said that the features distinguishing case studies from other methods are how
sites are selected, how the data are collected, and how they are analyzed. In the last
chapter, we covered instance selection. We turn now to other elements that distinguish a
case study from a not-case study and a good case study from a not-good case study. The
discussion is an introduction to the approaches.
Data Collection
In other transfer papers on program evaluation, we have emphasized the importance of
validity. Validity involves measurement and also design. A valid measure (that is, one with
construct validity) reflects what it claims to reflect and not something else. For example,
whether or not there are active opposition parties may be a more valid measure of whether
a country is a democracy than how many people vote in an election. A valid
cause-and-effect design (that is, one with internal validity) rules out alternative
explanations of results by comparing what happened with an intervention to what happened
in the absence of the intervention. For example, in a study of the effects of an
employment training program, greater employment of participants after the training than
before must be shown to be due to the training and not simply to better economic
conditions, which also could increase employment.
Measurement Validity
Case study methods can use two tactics for achieving measurement validity: multiple
sources of evidence and using the chain-of-evidence technique in data reduction.
Multiple Sources of Evidence
Turning first to multiple data sources: case studies require "thick"
description in order to get enough information to check for trends, to rule out competing
explanations, and to corroborate findings. Eight techniques are used (sometimes all of them
in the same study) to collect information (Neustadt and Fineberg, 1978; Yin, 1989).
Many of the eight techniques are discussed in the General Policy Manual, chapter 8.0. Of these ways, the approaches that most differentiate case studies from other techniques are direct observation and participant observation.
GAO has used both approaches in its jobs. For example, in NSIAD's study of conditions on submarines, auditors spent time aboard submarines in a variety of situations, getting firsthand knowledge of life in these vessels. Their direct observations form the primary data source for our report. We went to sea in this instance, however, in our GAO role, as auditors and evaluators, and so, it could be argued, might have seen what special guests see and not what life would be like for the average sailor.
To get more authentic information, evaluators have sometimes become participants in
situations, not identified to the other persons involved as GAO staff. One example of how
we have adapted this participant-observer approach was in GGD's study of the services available to taxpayers
from IRS after IRS reduced the number of public information agents (U.S. General
Accounting Office, April 5, 1984). We developed a set of standard income tax questions
about which citizens typically would call IRS, obtained IRS agreement on the correct
answers to these questions, and then, on a probabilistic sampling basis, called IRS
offices around the country to seek help. We used names such as Gerald A. Office in these
conversations but did not say we were from GAO. We were able to report how long it took to
get the phone answered, how long it took to get information, the consistency of
information, and general helpfulness of the responding agent. Such an approach gave more
authentic information than relying only on IRS records of calls received, or a survey of
taxpayers. In the first instance, IRS would have no record of time before the person could
get through to an agent and of "discouraged callers." In the second, a survey of
taxpayers would have to be very large to get a good "hit" rate of individuals
who sought assistance, and the diversity of individual questions would have blurred
ability to interpret variation in IRS responsiveness. HRD used a similar approach in
reviewing the Social Security Administration's telephone inquiry program; over 4,000 calls
were made, with GAO personnel taking the role of ordinary citizens in asking the randomly
selected, prepared questions (U.S. General Accounting Office, August 29, 1986).
One element of data collection that distinguishes case studies from other techniques is that comprehensiveness of interviewing is very important. In order to learn the meaning of events to those involved in them, a key element of case studies, the views of more senior officials are not given greater weight than views of less highly placed persons. In fact, a case study where the only people interviewed were senior officials would be seen as a not-good case study, in contrast to one where the views of individuals at all levels affected were obtained. For example, if we wanted to learn about how noncompetitive awards were reviewed in an agency, a good case study would obtain information from the agency head, the head of the procurement division, the inspector general's office, the contracts officer responsible for selected awards, staff involved in the reviews for these awards, counterpart persons from the contractors' procurement and program operations staff, and the legal divisions within the agency and the contractors. We might shadow several noncompetitive procurements, following their life history from initiation through actual awards, sitting in on meetings, and studying, over time, how the awards were handled.
Chain of Evidence
A chain of evidence is the sequence from observation to conclusions. In a strong chain of
evidence, an independent second evaluator could follow the first evaluator from original
observations, the "raw" or unreduced data, through all the steps of data
aggregation and analysis, and conclude that the first evaluator's findings were justified
by the evidence and fairly represented it. This requires careful organization of the files
of original observations, complete documentation of the conditions of data collection that
are relevant to the trustworthiness and credibility of the information, and making
transparent and reproducible the manner in which the evaluator moved from phase to phase
of the analysis. Some evaluators call such a procedure "building an audit trail"
and use procedures similar to indexing and referencing to establish both the construct
validity of the measures reported and the convincingness of the causal explanations
developed in the case study (Halpern, 1983). That is, they have an independent evaluator
review the equivalent of their workpapers rather than providing so much detail in the
report itself that a reader can come to the same conclusion.
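As a minimal sketch of what following a chain of evidence could involve, assume the workpapers are indexed so that each finding or summary lists the items it was derived from. The item names and the index structure below are hypothetical and are not taken from any GAO manual; the sketch also assumes the index contains no circular references.

```python
# Hypothetical workpaper index: each item lists the items it was derived from.
derived_from = {
    "finding-1": ["summary-A", "summary-B"],
    "summary-A": ["interview-03", "observation-07"],
    "summary-B": ["record-12"],
    "finding-2": ["summary-C"],
    "summary-C": [],  # no supporting raw data was indexed
}

# Hypothetical identifiers of raw, unreduced data in the files.
raw_items = {"interview-03", "observation-07", "record-12"}


def trace_to_raw(item, index, raw):
    """Return the set of raw items that an item ultimately rests on."""
    if item in raw:
        return {item}
    found = set()
    for source in index.get(item, []):
        found |= trace_to_raw(source, index, raw)
    return found


# A second evaluator could list findings that cannot be traced to raw data.
untraceable = sorted(
    item for item in derived_from
    if item.startswith("finding") and not trace_to_raw(item, derived_from, raw_items)
)
print(untraceable)
```

A finding that traces to no raw observation is a signal to go back to the files, not proof that the finding is wrong.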
Some information in a case study is likely to be judgmental, particularly when observer
and participant-observer modes of data collection are used. And the collection process
involves judgment calls of promising leads and the meaning of initial information. While
documenting the basis for judgments can be more difficult than documenting nonjudgmental
information, overall the chain of evidence or audit trail techniques should not pose any
greater difficulty for GAO evaluators than our documentation procedures for other
evaluation methods.
Data Analysis
Case studies, obviously, can generate a great deal of data, data that need
to be analyzed sufficiently and with appropriate techniques in order to be useful. Much is
qualitative. As table 4.1 indicates, there are six general features of data analysis. Four
are essential to case study methods: iteration, OTTR, triangulation, and ruling out rival
explanations.
A unique feature of case studies is that data collection and analysis are concurrent. In most methods, we plan for data collection, then we collect the information, then we analyze it, and then we write the report. In case studies, the data coming in are analyzed as they become available, and the emerging results are used to shape the next set of observations.
Table 4.1: Ways of Analyzing Case Study Data
| Feature | Methodology |
| Iterative | Data collection and concurrent analysis |
| OTTR | Observe, think, test, and revise |
| Triangulation | Comparison of multiple, independent sources of evidence before deciding there is a finding |
| Rival explanations | Developing alternative interpretations of findings and testing through search for confirming and disconfirming evidence until one hypothesis is confirmed and others ruled out |
| Reproducibility of findings | Establish through analysis of multiple sites and data over time |
| Plausible and complete | Data analysis ends when a plausible explanation has been developed, considering completely all the evidence |
| Specific techniques for handling multisite data sets | Matrix of categories, graphic data displays, tabulating frequency of different events, developing complex tabulations to check for relationships, and ordering information chronologically for time series analysis |
The sequence in which this takes place is the OTTR, which stands for "observe,
think, test, revise." After observations have been made in the first phase (and
during the observations, because that is a natural way for our minds to work), the
evaluators think about the meaning of the information: what does it suggest about what is
happening and why? What else could explain what is going on? The
second, or "think," phase ends with specification of what new information
would be needed to rule out alternative explanations or confirm interpretations. This
triggers the third phase: test. In this phase, the evaluator collects more information, as
required by the specifications from the "think" cycle. The data collected in the
third phase are not specified before the first phase: they emerge, often with surprises,
from the initial observations. The fourth phase is examination of the second round of data
collection and a revision of initial interpretations and expectations: the
"revise" phase. The revise phase may lead to another test phase, if information
from the second round of data collection was insufficient to rule out alternatives, or if,
during revision, new interpretations emerged. This iterative process ends when a plausible
explanation has been developed and, at the end of a "revise" phase, there are no
outlier or unexplained data, no further interpretations possible, or it is clear that
despite the most diligent search for information, more is not available to further refine
description and explanation.
In case study methods, causality is established through the internal consistency and
plausibility of explanation, derived additively through the OTTR sequence. This is in
considerable contrast to other evaluation methods, where control and comparison groups are
used subtractively to rule out other reasons for a finding and establish firm attribution.
Handling Multisite Data Sets
Several techniques have been developed recently for handling multisite case
study data sets. These include setting up a matrix of categories, graphic data displays,
tabulating frequencies, developing cross-tabulations, and time series analysis.
Matrix of Categories
In this technique, a coding scheme is developed prior to data collection. It is modified
during data collection and the OTTR process and finalized after the evaluation team has
read through all the case materials. The categories are related to the evaluation
subquestions; for example, if a subquestion was "How does the Immigration and
Naturalization Service monitor the conditions of confinement in privately contracted
detention facilities," coding categories might include who is responsible, how these
persons get information, what they do with information received, evidence that minimum
standards are met, evidence of shortfalls, changes over time in monitoring, and
conflicting guidance or responsibilities. These categories might be put into a matrix by
facility size or groups served. The approach is similar to content analysis, and the PEMD
transfer paper on content analysis gives further how-to information (U.S. General
Accounting Office, June 1982).
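To make the idea concrete, the following is a minimal sketch of a category matrix, assuming each coded excerpt from the case materials is recorded with the facility, its size group, and the coding category applied. The facilities, size groups, and counts are hypothetical illustrations, not data from a GAO job.

```python
from collections import defaultdict

# Hypothetical coded excerpts: (facility, size group, coding category).
coded_excerpts = [
    ("Facility A", "small", "who is responsible"),
    ("Facility A", "small", "evidence of shortfalls"),
    ("Facility B", "large", "who is responsible"),
    ("Facility B", "large", "evidence that minimum standards are met"),
    ("Facility C", "large", "evidence of shortfalls"),
    ("Facility C", "large", "evidence of shortfalls"),
]


def build_matrix(excerpts):
    """Count coded excerpts in each (category, size group) cell."""
    matrix = defaultdict(lambda: defaultdict(int))
    for _facility, size, category in excerpts:
        matrix[category][size] += 1
    return matrix


matrix = build_matrix(coded_excerpts)

# Print one row per category, with a count for each size group that has evidence.
for category in sorted(matrix):
    cells = ", ".join(f"{size}: {n}" for size, n in sorted(matrix[category].items()))
    print(f"{category} -> {cells}")
```

In practice each cell would point back to the excerpts themselves, so that the counts remain tied to the underlying evidence.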
Graphic Data Displays
This is a family of techniques, some of which have been adapted for computers and some of
which use wall-space. The evaluators immerse themselves in information on a site,
following OTTR. Their initial story of what is happening and why is displayed as a
flowchart with a series of critical paths for action. Evidence supporting the story is
arrayed in the display. The materials then are searched for counter evidence and
subsidiary or branching paths are laid out. As a satisfactory graphic is developed for one
site, the evaluators turn to the next site. The evaluators could at this point either
modify the first graphic, based on information from the second site, or prepare an
independent flowchart. In the second approach, aggregation would come after all the sites
had been charted, and the charts would be used as the data base for aggregation.
The graphic techniques can be applied to an instance as a whole or to subcomponents.
For example, if an analysis of life-threatening or fatal incidents at national parks were
needed, the evaluators might develop separate graphics for events leading up to the
incidents, the incidents themselves, and postincident actions. More complex case studies
might need several "layers" or graphics; less complex, few.
Tabulating Event Frequencies
Another technique for analyzing multisite case data is identifying events within
each case study ("meeting between Jones and Smith"; "Smith staff prepares
recommendations") and tabulating their frequency of occurrence. Such a simple
tabulation can draw the evaluator's attention to events that may be significant or to
informal networks and give a sense of actual (as contrasted to on-paper) organizational
relationships. Divergences between observed and expected patterns can be examined further
to see what happens as a result of these meetings and identify potential problem nodes:
for example, when an expected high-communication node turns out to be, relatively
speaking, a low-communication spot.
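A simple tabulation of this kind could be sketched as follows. The event log and the expected counts are hypothetical; the expected counts stand in for whatever pattern the on-paper organization would lead an evaluator to anticipate.

```python
from collections import Counter

# Hypothetical event log drawn from field notes: one entry per observed event.
observed_events = [
    "meeting between Jones and Smith",
    "Smith staff prepares recommendations",
    "meeting between Jones and Smith",
    "meeting between Jones and Lee",
    "meeting between Jones and Smith",
]

# Hypothetical expectations based on the on-paper organization.
expected_counts = {
    "meeting between Jones and Smith": 3,
    "meeting between Jones and Lee": 4,
    "Smith staff prepares recommendations": 1,
}

observed_counts = Counter(observed_events)


def divergences(observed, expected):
    """Return observed-minus-expected counts for events that diverge."""
    return {
        event: observed.get(event, 0) - expected_n
        for event, expected_n in expected.items()
        if observed.get(event, 0) != expected_n
    }


gaps = divergences(observed_counts, expected_counts)
print(gaps)
```

A negative value marks a node where communication was lower than expected and may merit a closer look in the next round of observation.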
Complex Tabulations
Cross-tabulations of events can identify interactions and check the developing
story more formally. For example, service coordination is a popular remedy for limited
funds. An evaluator in the field may observe that coordination among local agencies funded
through the same federal agency is more frequent than coordination among local agencies
funded by different federal departments. Tabulations of actual meetings and of consequent
actions for same-agency funded and different-agency funded services can help check out
whether this impression is reliable.
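The check described above could be sketched as a small cross-tabulation, assuming each observed meeting is recorded with the funding relationship of the agencies involved and whether any action followed. The records are hypothetical and far too few to support a real conclusion; they only show the mechanics.

```python
from collections import Counter

# Hypothetical records: (funding relationship, whether the meeting led to action).
meetings = [
    ("same agency", True),
    ("same agency", True),
    ("same agency", False),
    ("different agency", False),
    ("different agency", True),
]

# Each cell of the cross-tabulation is a count of meetings.
crosstab = Counter(meetings)


def action_rate(table, funding):
    """Share of meetings for a funding relationship that led to action."""
    total = table[(funding, True)] + table[(funding, False)]
    return table[(funding, True)] / total if total else None


for funding in ("same agency", "different agency"):
    print(funding, crosstab[(funding, True)], crosstab[(funding, False)],
          action_rate(crosstab, funding))
```

Comparing the two rates gives the evaluator a more formal check on the field impression before it is carried into the report.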
Time Series Analysis
Organization of information within each site by time of occurrence, coupled with a
systematic analysis of contextual influences on events, permits a nonquantitative time
series analysis for case study data. The flow of events over time for each significant
actor and for significant points in the series of events forms the organizing framework
for data analysis within each site. Such comparisons of when key actions occurred, how
well (or poorly) they were carried out, and what influenced both timing and quality of
performance can be particularly helpful in case studies of program implementation.
In some instances, only one component of a case study may be analyzed in this way. For
example, a case study of the effectiveness of a job training program might need to take
into account general economic trends, such as unemployment rates in the community. A time
series comparing local unemployment rates with placement rates for job training program
participants could be computed quantitatively and changes interpreted through the more
qualitative time series data about the program.
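One way such a comparison might be set up is sketched below. The quarterly figures are hypothetical, and the rule used to flag a quarter (placement and unemployment moving in the same direction) is an assumption chosen for illustration, not a method prescribed in this paper.

```python
# Hypothetical quarterly figures, in percent.
quarters = ["Q1", "Q2", "Q3", "Q4"]
local_unemployment = [9.0, 8.5, 8.0, 8.2]
placement_rate = [40.0, 44.0, 47.0, 49.0]


def changes(series):
    """Quarter-to-quarter change for each pair of adjacent observations."""
    return [round(later - earlier, 2) for earlier, later in zip(series, series[1:])]


unemployment_change = changes(local_unemployment)
placement_change = changes(placement_rate)

# Flag quarters in which placement and unemployment moved in the same direction,
# so that the local economy alone does not obviously account for the change.
needs_qualitative_review = [
    quarters[i + 1]
    for i, (u, p) in enumerate(zip(unemployment_change, placement_change))
    if u * p >= 0
]
print(needs_qualitative_review)
```

For flagged quarters, the evaluator would turn to the qualitative chronology of program events to look for an explanation.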
Basic Models for Data Analysis
Two basic models of data analysis are pattern matching and explanation
building. Pattern matching requires using past experience, logic, or theory before the job
begins to specify what we expect to find. The analysis then compares actual findings to
expectations. When the findings fit, the pattern is confirmed. When the findings don't
fit, the evaluator adjusts the expectations or elaborates them, building a subroutine that
can explain the unexpected findings. Explanation building is the inverse procedure:
starting with the observations, the evaluator develops a picture of what is happening and
why. Data are used to fill in the initial hunches, to change them, to elaborate on them.
The first strategy matches findings to hypotheses or assumptions. The second uses the data
to structure the hypotheses or assumptions.
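Pattern matching, the first of these strategies, could be sketched as a comparison of expectations written down before the job begins with what is later observed. The themes and findings below are hypothetical.

```python
# Hypothetical expectations specified before the job begins, keyed by theme.
expected = {
    "selection for training": "supervisors nominate employees",
    "course quality": "monitored through end-of-course ratings",
}

# Hypothetical findings recorded during fieldwork, keyed by theme.
observed = {
    "selection for training": "supervisors nominate employees",
    "course quality": "not monitored",
    "purpose of training": "viewed as a reward",
}


def match_patterns(expected, observed):
    """Split themes into confirmed expectations and unexpected findings."""
    confirmed = sorted(t for t in expected if observed.get(t) == expected[t])
    unexpected = sorted(t for t in observed if observed[t] != expected.get(t))
    return confirmed, unexpected


confirmed, unexpected = match_patterns(expected, observed)
print("confirmed:", confirmed)
print("to be explained:", unexpected)
```

The themes listed as "to be explained" are the ones for which the evaluator would adjust or elaborate the expectations. In explanation building, by contrast, the `expected` entries would be drafted from the observations instead of being fixed in advance.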
In either strategy, the evaluator needs to search the full data base thoroughly for disconfirming evidence, in order to avoid the pitfall of premature conclusions. Data analysis ends when the best fit possible has been reached between the observations and a statement about what they mean.
In either strategy, expectations and explanations can be expressed as themes: a job dealing with bank failures, for example, might have as themes decisions about credit risks, procedures for reviewing decisions, or controls over the accuracy and recency of information on bank solvency. A job dealing with employee training might have as themes decisions about training needs, how employees are selected for training, how course quality is monitored, or how employees and supervisors view the purpose of training.
Themes, in turn, can be analyzed within individual sites first, then findings on each
theme aggregated across sites. Alternatively, all themes within one site can be analyzed
first; then data from the second (and subsequent) sites can be examined. Theme analysis
also can proceed in matrix fashion. On the PEMD AFDC study, for example, evaluators were
assigned as site managers, responsible for understanding across themes all there was to
know about the issues for their site. They also were assigned to individual themes, such
as health and employment, responsible concurrently for looking across all sites for
information on their topic. This organization proved helpful in ensuring that reasons why
a site showed up as an outlier for a given theme could be discussed by someone who knew
the site as a whole.
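The matrix organization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the site names, theme names, ratings, and the outlier cutoff are hypothetical placeholders, not data or procedures from the AFDC study. The sketch shows how one set of findings can be read by site (the site manager's view) or by theme (the theme analyst's view).

```python
from collections import defaultdict
from statistics import median

# Hypothetical findings: (site, theme) -> analyst's rating, 1 (poor) to 5 (good).
# These values are invented for illustration only.
findings = {
    ("Site A", "health"): 4, ("Site A", "employment"): 4,
    ("Site B", "health"): 4, ("Site B", "employment"): 1,
    ("Site C", "health"): 5, ("Site C", "employment"): 4,
}

def by_site(findings):
    """Site manager's view: every theme for each site."""
    view = defaultdict(dict)
    for (site, theme), rating in findings.items():
        view[site][theme] = rating
    return dict(view)

def by_theme(findings):
    """Theme analyst's view: every site for each theme."""
    view = defaultdict(dict)
    for (site, theme), rating in findings.items():
        view[theme][site] = rating
    return dict(view)

def outliers(findings, gap=2):
    """Flag (site, theme) pairs whose rating is at least `gap` points away
    from the median rating for that theme. The cutoff is an assumption."""
    flagged = []
    for theme, ratings in by_theme(findings).items():
        typical = median(ratings.values())
        for site, rating in ratings.items():
            if abs(rating - typical) >= gap:
                flagged.append((site, theme))
    return flagged

# An outlier found by theme is handed back to the person who knows the site.
for site, theme in outliers(findings):
    print(site, "is an outlier on", theme)
    print("  whole-site context:", by_site(findings)[site])
```

In practice the cells of such a matrix would hold narrative findings rather than single ratings; the numbers here only make the two readings of the matrix easy to see.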
Pitfalls and Booby Traps
Case study methods, like any other method, offer plenty of opportunity to
go awry. Two frequent concerns are the risks in using other people's studies and in
generalizability.
Impartiality
The biggest risk when we use other people's case studies is that GAO standards of
impartiality may not have been met. There are three meanings of impartiality, one of which
does not create problems. Case studies use as data the impressions and judgments of the
evaluator, which are inherently subjective. For a case study methodologist and for GAO, if
proper care is taken, this should not be a problem. If we want to illustrate, for example,
working conditions for immigrant laborers, we can report what the thermometers registered
and we can also report, firsthand, how people were sweating and what it felt like to be
out in the fields. Such observation is part of the richness, immediacy, and
"thick" description of a case study. However, case studies, like any other
method GAO uses, have to meet two other criteria of impartiality: accuracy and lack of
bias, in the sense that the evaluator's personal, preconceived opinions about a situation
do not distort reporting and that the evaluator is scrupulously evenhanded in examining
all sides of a situation.
Some authorities on evaluation methods believe that case studies reflect the author's
values in ways that can be difficult to detect. Other experts conclude that three actions,
taken together, are sufficient safeguards for lack of bias and adequate accuracy. These
are (1) submitting reports to people from whom data were collected and printing their
critiques with the report, (2) using multiple data collection methods within case
studies, and (3) adopting the audit trail or chain-of-evidence technique. Adequate
supervisory controls also are recommended. Complying with these safeguards should give us
no major problems in our own jobs. The guidance would mainly expand the range of
reviewers. We already conduct exit conferences and, following the "Yellow Book"
and Communications Manual, submit draft reports for agency comments. We often use
multiple methods, and the audit trail technique now recommended for case study use was
itself adopted from such auditing procedures as workpapers and referencing, which are
standard practice with GAO. We also require adequate supervisory control through such
means as prompt review of workpapers. We would need to assure ourselves, however, that
case studies whose results we are going to use have adopted the same procedures for
ensuring impartiality. (Appendix III gives a checklist for reviewing proposed or completed
case studies for quality.)
Generalizability
We often are asked questions where the customer wants in-depth information that is
nationally generalizable, but frequently the issue may not yet be ripe for a national
study or we do not have the resources to collect in-depth data from nationally
representative samples. Using 4, 10, or 15 sites as case studies might be feasible, but we
would still need to be concerned about the risks in generalizability. A main point of
this paper is that generalizability depends less on the number of sites and more on the
right match between the purpose of the study and how the instances were selected, taking
into account the diversity of the programs.
An example of an efficient combination of careful specification of the purpose of the study matched with appropriate site selection is the GGD study of the productivity of the Social Security Administration's (SSA's) regional operations. This review examined in depth only one SSA region (U.S. General Accounting Office, September 11, 1985). Atlanta was selected because it had the best productivity among the 10 regions; if GAO could demonstrate opportunities for improvement in the most productive SSA region, then similar improvements might be possible in the less productive regions. Following the case study, an inexpensive (25 staff day) check was made on productivity data and trends from other SSA regions, and similarities were noted. While other problems might be affecting these less productive regions, the findings from the single site plus the trends were so convincing that SSA concluded the single instance examination had national implications. Subsequent analysis of regional office plans for productivity improvement led to the conclusion that their implementation could save about $60 million annually.
Sometimes, however, it is not possible to answer the evaluation question using case
studies, if the program is diverse and the user needs national generalizability. The user
may prefer to sacrifice in-depth information for generalizability and we will have to use
other methods, such as surveys or secondary analysis of existing data. However, it often
is possible, with appropriate instance selection, to obtain adequate generalizability with a
manageable number of instances. In addition, the evaluator can apply the case survey
method to increase the generalizability of findings and can combine case studies with
other methods. Taken together, these strategies can permit the use of the case study
technique with enough generalizability for many users' purposes. That is, for the first
three types of case studies (illustrative, exploratory, and critical instances)
generalizability, if needed, cannot be achieved unless they are combined with other
methods. Generalizability can be achieved for the three other types, however, even when
they are used independently, as long as they are carefully designed in terms of case
selection and analytic strategies.
Where to Go for More Information
More detail on data collection and analysis can be found in two books on
case study methods: Case Study Research by Yin (1989) and Analyzing Qualitative
Data by Miles and Huberman (1984). More detail on applicable GAO guidance can be found
in the Communications Manual.
Chapter 5
Summary
We can summarize this paper in the answers to three questions: What are case studies? When
are they appropriately used in evaluation? What distinguishes a good case study from a
not-good case study?
What Are Case Studies?
The case study is a method of learning about a complex instance, based on a
comprehensive understanding of that instance obtained by extensive description and
analysis of the instance taken as a whole and in its context. Applying this definition
means learning virtually everything about the instance being studied, including how it
operates and what it does, in relation to the extrinsic or contextual events it is part
of.
Case studies often use one or only a few instances, because collecting and analyzing comprehensive data are prohibitively difficult for large numbers of sites. However, not all studies of a small number of instances are case studies. Some studies collect data from a small number of sites but have no other features in common with case studies and offer none of their advantages. Thus, the fact that a study involves only one or a few sites does not automatically make it a case study. For example, the evaluators may not have selected the sites appropriately for the generalizability needed or they may have collected minimal information with little depth of inquiry.
When Are Case Studies Appropriately Used in Evaluation?
We discussed six types of case study that differ considerably in their
requirements for site selection, data collection, and analysis, among other things. The
six types are illustrative, critical instance, exploratory, program implementation,
program effects, and cumulative. Together, they cover a wide range of evaluation
questions, although clearly not all evaluation questions. For example, case studies are
not well suited for answering the question, How often does something happen?
Some applications of the case study to evaluation purposes have been tried fairly
extensively (for example, program implementation case studies). Others are relatively
untried (for example, cumulative case studies). The latter is a particularly promising
method for GAO, because it can capitalize on the large number of case-study-like reports
that are available, on the quality of the documentation that supports their findings, and
on the general methodological framework that GAO's standards provide. We have not
stressed, in our analysis, the costs, feasibility, and timeliness of case studies, since
these are management criteria that are considered in all designs rather than issues of
particular concern to case studies. However, the implications of the design features
discussed here are that, contrary to what many people think, the case study is not
necessarily inexpensive, easy to conduct, or quick. It may require in-depth data
collection dependent on sensitivity to the setting that takes time to acquire and involve
extended periods for data analysis, interpretation, and reporting.
What Distinguishes a Good From a Not Good Case Study?
We have addressed quality in two ways. One is prospective and intended to
help those who plan evaluation to know the minimum features of the various case study
applications. The other is retrospective and intended to help those who review case study
reports to assess the quality of completed case studies. Table 5.1 summarizes common
pitfalls that we have mentioned throughout this paper.
Table 5.1: Some Common Pitfalls in Case Study Evaluation
| Study stage | Common pitfalls |
| Design | Mismatch between criteria for the specific job and what the case study application can do; insufficient attention to contrasts and comparisons needed for purposes of the study |
| Site selection | More sites selected than needed; fewer sites selected than needed; inappropriate basis for site selection, for the particular job and evaluation question |
| Data collection | Reliability jeopardized by lack of common guidance in data collection; findings noncomparable; lack of quality control in data collector roles and responsibilities: impartiality threatened; overly loose relationship between data collected and the evaluation question; inadequacy of information |
| Data analysis | Insufficient attention to requirements of analytic plan chosen: low plausibility of results; insufficient attention to management and data reduction: inefficiency, lateness, incomplete methods of relating findings across sites; inadequate methods for relating qualitative and quantitative data within sites |
| Reporting | Overgeneralization, compared to actual basis for site selection, number of sites studied, and requirements for inference in the design; inadequate interpretation, unintegrated narrative, results not adequately related to user questions; inadequate attention to threats to impartiality and the extent to which these have been avoided |
Quality and Evaluation Design: Planning
We have presented six types of case study evaluations and for each one described
features such as number of sites, site selection, data collection, data analysis, and
reporting. Our descriptions represent a "floor" of quality for each evaluation
application. The features of the six types of case study are not interchangeable. That is,
the features of a case study that are appropriate for answering one kind of evaluation
question are not necessarily appropriate for answering another kind of evaluation
question. Evaluators considering the case study as a design for evaluation must first
decide what type of evaluation question their specific question is and then examine the
strengths and limitations of each type of case study for answering it. The crucial next
step is to look at the features of each type and decide whether it will be possible to
meet these methodological requirements in the specific situation.
For example, the basis on which instances can be selected differs for the different case study applications. Usually, an illustrative case study site should be typical of the program being examined while exploratory case study sites should bracket the diversity that is likely to be encountered in the program, population, and setting of a larger study. Usually, sites for program effects case studies should be selected with great care for criteria such as whether there is evidence that the program has been implemented at the site, whether the site has been subjected to changes that could have the same effects as the program or that could mask its effects, and how the addition of this site to the group of sites being studied supports the generalizability of the findings.
Quality of Evaluation Design: Reviewing
Turning to ways of assessing the quality of completed case studies, we have provided
guidelines for reviewing case study reports in appendix III. These guidelines are intended
to apply to all types of case study applications.
On matters of design, the guidelines discuss the clarity of issues, the relationship of
the evaluation question to the case study application selected, the basis for case study
selection, and the time span of the study. The data-collection guidelines emphasize
appropriateness of data-collection methods, evaluator training, and information sources.
Guidelines for data-base formation and analysis deal with explicitness of procedures and
techniques, interpretation differences, and the relationship of the findings to those of
similar studies. With regard to reporting, the guidelines emphasize constraints on the
study, arguments for and against various resolutions of the issues, and the role of
judgment in reaching conclusions. With regard to impartiality and generalizability, the
guidelines emphasize that a good case study report (or, for GAO purposes, job
documentation) describes both the evaluators' training and work on related studies,
presents comments on the draft report, and supplies adequate information for judging
generalizability. Reviewers will need to refer in addition to the features of each special
type of case study application for supplementary guidance on what to look for in
individual case studies.
Impartiality and Generalizability
Partiality and, in some instances, the inability to generalize from the findings can limit
the utility of case study methods for evaluation. There are three main threats to
impartiality: subjectivity, inaccuracy, and bias. The case study method inherently
requires subjective and judgmental elements. When proper procedural safeguards are used,
these elements alone do not diminish the value of case study methods. However,
inaccuracy and bias are unacceptable in any case study. Some ways of detecting and
preventing bias, such as the audit trail, have been well developed. Their applicability to
case study evaluations outside of settings such as GAO is being explored.
Many evaluation questions do not require a high degree of generalizability. Certain
case study applications provide high degrees of generalizability with small numbers of
instances. When both broad generalizations and in-depth understanding are required,
designs that cumulate case studies over a wide number of sites and settings, or that
combine case study methods and other methods in one concurrent effort may meet this dual
need (U.S. General Accounting Office, April 2, 1984). However, the diversity of the
population to which generalization is required is a limiting factor in case study
applications. It is also true that without careful attention to standards, case studies
are prone to the kind of overgeneralization that comes from selecting a few instances, assuming
without evidence that they are typical or representative of the population, and then
offering national projections. The GAO Project Manual cautions against
overgeneralization from any method. For case studies in particular, there must be an
empirical basis for instance selection and assurance of adequate population homogeneity.
Appendix I
Theory and History
As a research method, the case study originates in the social sciences, particularly in
the fieldwork of anthropology and sociology. Within these disciplines, researchers have
defined the case study and discussed its critical elements in a variety of ways. In
general, six elements appear frequently: purpose, type of data collected, method of data
collection, design, method of data analysis, and reporting.
Purpose
The research case study has been defined as a method for learning the "right"
questions to ask (Hoaglin et al., 1982). That is, the purpose of case studies is said by
some researchers to be to generate hypotheses rather than to test or confirm them. The
method involves an in-depth, longitudinal examination of a single instance. The product is
a sharpened understanding of what might be important to look at further in similar
situations and what explains why the instance happened as it did. Because such inquiry
explores only one situation, it is argued that it cannot contribute directly to the
testing of general propositions, although it can contribute powerfully to the invention of
hypotheses.
Some other methods have a similar purpose. Exploratory analyses of large data bases are often used to select a smaller number of variables for additional testing, on the basis of interesting patterns that emerged from various combinations of the elements of the large data base. Promising relationships are singled out and those that seem uninteresting are set aside. Like findings from case studies, the result is considered as contributing not answers but a better understanding of what questions to ask and how to ask them.
An analogy might be drawn also to "evaluability assessment." Such assessment
may provide information valuable in itself about how completely a program has been
implemented. It is undertaken prior to testing the effects of the program, chiefly,
however, as an indicator of appropriate evaluation design.
Other researchers regard case studies as not only a different way of knowing but as a better way (Rist, 1981a; 1982a). More specifically, they emphasize the method's unique value in many complex situations of importance, such as studies of school desegregation, or economically distressed communities, or the Challenger tragedy. One reason they give is that skilled observers and interviewers can make judgments and valuations about factors that are otherwise very difficult to assess, such as how much effort a manager made to get information before a key decision was made or how much that person knew about what was going on. Also, these researchers believe, in complex situations the many persons who are significantly involved have different "realities" in their explanation of events and even in their perceptions of what happened, and this is best matched with a method that gradually represents and reconstructs these multiple realities, rather than a method that assumes a single "truth" exists.
As seen by such scholars,
"there is no single reality on which inquiry may converge, but rather there are multiple realities that are socially constructed, and that, when known more fully, tend to produce diverging reality. These multiple and constructed realities can not be studied in pieces (as variables, for example), but only holistically, since the pieces are interrelated in such a way as to influence all other pieces. Moreover, the pieces themselves are sharply influenced by the nature of the immediate context."
Methodologists who focus on case studies express their criteria of good research in different language, although they may deal with underlying concerns similar to those of researchers from more experiment-oriented traditions. Some criteria, however, are seen as unique to case studies and qualitative approaches. One example is fairness or assurance that a study has presented a balanced view of the many constructions of reality and the values underlying these. Another example is authenticity; another, realism. Each of these criteria is associated, in the literature on case study methods, with performance standards, such as triangulation, believed useful in ensuring, if they are carried out, that the study will be a good one.
Table 1.1: Criteria of Good Research
| Case study | Other approaches |
| Truth value; trustworthiness; credibility | Internal validity |
| Applicability; transferability | External validity; generalizability |
| Confirmability of data | Objectivity of observer |
| Consistency, dependability of data; explainable instabilities | Replicability; stability or reliability of data |
Type of Data
In some textbooks on evaluation, case studies are synonymous with qualitative data, that
is, data that are subjective or judgmental. Such data include narratives of events written
by participant observers, accounts of what the participants understood about an event,
reports of what was said at a meeting or an interview, observational records of how an
event took place, and statements of impressions about what was going on, why it was
happening, and how people felt about it, themselves, or each other.
To illustrate differences among types of information, we might base the conclusion that "the day was hot" on data from an instrument that records the room temperature (numerical and objective), a record of the atmospheric temperature as written down by an observer checking a thermometer (numerical and relatively nonsubjective), a survey asking people how hot they felt (nonnumerical and subjective), and a "thick" description of what clothes people were wearing, how much they perspired or shivered, whether they turned up the furnace or the air conditioner, and how much energy they seemed to have for work (nonnumerical and judgmental). When researchers describe case studies as using qualitative data, they usually mean the "thick" description. If the evaluation question involved an understanding of working conditions for migrant laborers or workers in heavy industry, a thick description, even including information on how exhausted the evaluator felt in the heat, would be more appropriate, according to some case study methodologists, than only recording that the thermometer registered 95 degrees.
Suppose we needed to know about the availability of housing for low-income people. If official records were adequate, good quantitative measures of availability might be the number of low-income persons applying for housing relative to the number of units that met minimum standards and cost within 30 percent of household income or the number of persons on waiting lists for such housing and how long they had to wait. We might also be able to report the number of applications for housing construction permits and how many units suitable for low-income housing were coming on the market within 12 months. Often, however, the records are not adequate. Here we might rely on qualitative information, such as the estimates of knowledgeable officials of demand and supply (judgmental, numerical) or of severity of the problem (judgmental, nonnumerical). We might also interview selected low-income families with regard to their experience in seeking housing or we might, as participant-observers, pose as low-income applicants and report our own experiences in finding housing for families of different sizes and within different payment ranges (judgmental, numerical, and nonnumerical).
Many researchers who write case studies use qualitative data because they believe them
to be richer, more insightful, and more flexible than quantitative data. They believe that
the meaning of an event is more likely to be caught in the qualitative net than on the
quantitative hook. For example, qualitative data permit dealing fairly directly with
values, politics, and factors that may be an important part of many situations. A
frequency distribution of events (such as a table showing the number of decision points in
a community economic development program and a decrease in the probability of action as
the number of decision points increases) is about as numerical as qualitative data are
likely to be in a research case study, according to some experts.
Method of Data Collection
To some researchers, case studies are synonymous with methods of data
collection deriving from anthropology, psychology, and sociology. The techniques include
fieldwork, ethnography, observation, and participant observation and have in common that
an observer is physically present at a site, stays at the site for a fairly long time, has
flexibility in deciding what data to collect from whom and under what circumstances, and
can organize the inquiry according to the meaning of events to the participants rather
than having to decide beforehand on a closed set of constructs or data elements. In most
instances, the observer is the senior investigator and the only researcher: Margaret Mead
in Samoa and Oscar Lewis in Puerto Rico are famous examples.
The greatest difference, to some experts, between other methods and case studies is the distinction between the researcher's (1) beginning by presuming, a priori, to know the relevant constructs and variables, measuring their incidence, and finding out how changes in them may be influenced by other events and (2) entering into an event to learn what is significant in it to the participants. As this implies, to researchers, the case study is an intensely personal method, dependent on the investigator's sensitivity, insights, and skill in noticing many things, recording them, and producing a narrative that suggests a pattern of the elements-or that recognizes the pattern that is there in the culture in its own terms. It is a demanding method, requiring specific skills (such as fluency in the language of the participants) and general self-awareness to maintain the fine balance between seeing things as others see them and identifying their perspective wholly with one's own. The researcher must weigh the value of experiencing what it is like to be part of the culture against the hazard of internalizing the experience too fully, which can jeopardize the capacity to see the culture from many perspectives. Nonetheless, some of the best reports have come from observers who entered as fully as possible as participants in the event being investigated.
The case study method is further distinguished by the researcher's self-conscious effort to understand what the observed events mean to the participants. No observer can enter a scene without preconceived ideas, but they can be set aside. Thus, a study of how a group is organized economically might begin with finding out what is valued in that group and how items of value are exchanged. They might not be goods or services, and exchanges might not be equal.
For example, in a basic research study of 40 low-income women, Belle and her colleagues lived for many months among them as observers, confidantes, and friends, listening to what they said and noting what they did. The researchers found that turning to someone for even modest help (like minding a child for an hour) had the cost of later demands for a return of the favor and that this cost was nearly intolerable. The researchers found expected stresses like the loss of a check in the mail and the illness of loved ones. They found also, unexpectedly, that any change at all was stressful: being promoted to a higher-paying job, the graduation of a child from school, falling in love, even the restoration to health of a loved one who had been ill. In terms of the purposes of the study (finding out what was stressful to the women and why the incidence of mental health problems among them was so high), the case study method disclosed the importance of any change in life circumstances as a source of stress rather than merely confirming change that the observers might have thought stressful a priori.
Design
Case study methods have been defined by some researchers as designs that focus on a single
instance or a few instances. They also are identified with designs that are
nonexperimental in the sense that the investigator is not deliberately manipulating some
variable to see its possible effects on the system being studied. Two classic aims of
inquiry are to understand the nature of events and to understand their causes. Since case
study designs center on one or a few cases and lack the controls usually thought necessary
to an understanding of causal relationships, knowledge that results from case studies is
controversial with regard to generalizability and causality.
With regard to generalizability, some methodologists see case studies as above all particular, seeking to describe and understand the aspects of an instance without much concern for knowing whether they arise in or are characteristic of a larger population. The focus is on this school, this emergency room, this military base, or this nuclear power plant. Researchers can choose relatively freely which instance to study on any one of several bases, depending on the questions to be examined.
Thus, in a case study design, an instance may be selected because it is a unique event
of national interest, such as the Cuban missile crisis or the distribution of the swine
flu vaccine, both subjects of research case studies. The instance may be selected because
it has been affected by events
of interest, such as the desegregation of schools. It may be selected as an unusually
effective or unusually ineffective instance. However, whenever the purpose is an
understanding of the particular, the relationship of the instance to the various
populations that it is part of is less important than the assurance that the selected
instance can be fully examined.
With regard to causality, researchers using case study methods cannot rely on familiar ways of ruling out alternative explanations. Case studies do not compare individuals or groups to others randomly assigned to different treatments. Case studies do not use statistical adjustments to facilitate comparison. Case studies do not estimate statistically the influence of the many variables on the instance being examined. To understand a single case, the researcher must develop hunches about what is happening in the instance under study and systematically seek within it evidence consistent or inconsistent with the hunches. As evidence accumulates, a second tier of evidence is looked for that would be consistent or inconsistent with alternative explanations for why the hunches did or did not take the shape of a coherent pattern. That is, a very high standard of inferential logic is needed.
When this method produces a coherent, plausible story, the researcher can assert a relationship between cause and effect. When conflicting evidence cannot be resolved, the careful investigator indicates that causality cannot be established. The standard for making this judgment requires the diligence of the investigator in formulating alternative explanations of what is happening, in specifying the kind of evidence that would be supportive or nonsupportive, in searching intensively for evidence that would rule out initial hypotheses, and in thoroughly considering the reasons for inconsistent patterns of evidence in the second tier. These techniques, of course, have parallels in other research traditions.
The ability of the case study to capitalize on insight, to shift focus as the data
demand, and to let disparate pieces of evidence fall into place in ways that are not
always easy to describe or command is believed to yield a richer, fuller, and truer
explanation of why things look the way they do than the more limited number of tests of a
priori hypotheses that other methods use. In case studies, the criterion for deciding
whether causality has been established is the coherence of the evidence, its consistency
with the patterns ascribed to it, and its inconsistency with other explanations. In
research designs based on statistical inference, the criterion for establishing causality
is whether the findings are likely to have occurred by chance following appropriate
comparisons to eliminate alternative interpretations. In both instances, comparisons must
be appropriate if alternative explanations are to be ruled out.
Method of Data Analysis
Still another distinguishing feature of case studies, according to some
researchers, is a nonstatistical approach to data analysis. The elements of analysis are
familiar: the identification of regularities, patterns, and relationships and the
assessment of their importance or meaning. In quantitative methods, the regularities are
identified by manipulating numbers to produce indicators agreed on as sensible
descriptions of the patterns. For example, an average is a convention that creates a
single number to represent the collection of all the numbers in a set. Importance or
meaning is assessed in part by estimating the variability within the set of numbers to
obtain a probability that the regularity represents the characteristics of the population
of instances.
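As a minimal sketch of these two conventions, the following Python fragment computes an average and then a measure of its variability. The scores are hypothetical values chosen only for illustration, not data from this paper, and the standard error is just one of several ways of expressing how far the average can be trusted to describe a wider population.

```python
import math
import statistics

# Hypothetical measurements for seven instances (illustrative values only,
# not data from this paper).
scores = [4, 7, 5, 6, 8, 5, 7]

# The average: one number standing for the whole set.
average = statistics.mean(scores)

# Variability within the set: the sample standard deviation.
spread = statistics.stdev(scores)

# Standard error of the average: the smaller it is, the more confidently
# the average can be taken to describe the wider population of instances.
standard_error = spread / math.sqrt(len(scores))

print(f"average = {average}")
print(f"standard deviation = {spread:.2f}")
print(f"standard error = {standard_error:.2f}")
```

The case study analyst has no comparable single indicator, which is why the judgmental rules discussed in this section carry so much weight.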
The logic of analysis in case studies is the same; the techniques for identifying regularities and assessing their meaning are different. Consider first the regularities. The case study analyst is trying to build an accurate description and explanation of events as both the observer and participants frame them. There would be little point in trying to identify a single number as an accurate representation of something this complex.
The analyst searches for clusters or paths in the data, using verbal notes and graphic aids, reviewing field data and other records of observations, until a pattern is evident. Then the data base is searched for further evidence that confirms or conflicts with the pattern. When the evidence is more inconsistent than consistent, the pattern is rejected. When the evidence is more consistent than not, the analyst confirms the pattern and looks for others related to it. When all the evidence is consistent, no further examination is needed. An array of techniques such as graphic displays has been developed to help standardize case study analysis.
A key element of case study analysis is the selection and organization of material to account for the complexities and interactions of the events. The rules are judgmental, not probabilistic. Have all the pieces of pertinent information been considered? Has adequate attention been given to the outliers? Does the pattern seem plausible or forced? Have inconsistencies been sensibly resolved?
Using an analogy, we can say that the case study analyst seeks to explain 100 percent of the variance by relying on a data base that includes more variables than most quantitative studies can accommodate, over more points in time, and on a method that draws on the integrative powers of the mind, which computers do not have. The method inherited from sociology and anthropology entails early immersion in the setting, recorded observations, reflections on the spot, and analysis that occur throughout fieldwork, as data are being collected. Analysis is infused throughout the research process in case studies; it is not a step after the completion of data collection.
Reporting
Case studies are usually reported as narratives that read like chronologies of what led up
to an event and what happened during and after it. They have been called
"then-they-did-this" studies. In order to be comprehensive and cohesive, the
researchers provide a great deal of detail and description and quote directly from the
participants' own words and vignettes in the observers' field records.
To some proponents of case studies, the credibility of the method depends on what they call "naturalistic generalizability." By this they mean that readers compare their own observations, experience, and belief to the narrative and regard the parts of the investigation that are consistent with these as confirmed. What is inconsistent tends to be examined more closely and may be rejected as less credible, unless, so the argument runs, there is enough detail that readers can "see it with their own eyes." For readers who know a handicapped child or have an aging parent, for example, a case study report of conditions in residential care for the severely handicapped or the aged may compare dramatically with vivid personal experiences. The case study report can provide an organizing framework for thinking about these and other experiences.
The usefulness of case study reports, therefore, depends to some degree on how well the investigator has portrayed the participants' ways of thinking about what happened and on how divergent the investigator's analysis is from the reader's ways of thinking about the subject. The credibility and authenticity of the case study report may depend on the writer's having provided extensive detail and description, making unexpected conclusions as difficult to deny as if the reader had been part of the event. In this respect, the narrative mode is not a stylistic choice; it is inherent in the purpose of case studies and the nature of their inquiry. It becomes an obstacle only when authors seek publication through outlets whose customers generally ask for brief details.
The Case Study Adapted for Evaluation
The traditional case study belongs to research, not evaluation. To apply case study
methods to evaluation, evaluators have had to adapt what derived largely from sociological
and anthropological fieldwork (Patton, 1980). Before 1970, however, evaluation case
studies were similar to research case studies. They were longitudinal, were made by onsite
observers who sought participant-observer roles, and constituted an inquiry structured from
an evolving understanding of events and their meaning to the persons involved in them.
There was usually only one research investigator, and the data consisted of descriptions,
observations, impressions, unstructured interviews, and existing materials gathered at the
site that were organized intuitively and informally. The case study report was a narrative
whose purpose was to illustrate or portray what a program was like, how it was being
implemented, and how those who were part of it both affected it and were affected by it.
In these early uses of the case study method, evaluators wrote their reports to stand alone. Little effort was made to integrate two or more sources of data, even when the evaluation design included them, although simple references might be made to the number of times a feature of other sites was also characteristic of the site reported in the study. The case study was useful for readers interested in what a particular program was like or what happened to a typical beneficiary.
Early in the 1970's, when evaluators wanted to design studies that would capture some implementation or outcome features that were difficult or expensive to measure reliably on a large scale (for example, the way a large organization handled a complex innovation or the effect of an education program on motivation to learn or growth in self-confidence), they felt caught between risking considerable effort in trying to quantify qualitative variables and risking the criticism that they were ignoring really important things because they could not be measured. The case study seemed a way out. It offered a relatively inexpensive, low-risk supplement to the large-scale collection of other information that could be measured more cheaply and reliably than with large-scale studies. By and large, investigators commissioned to carry out the early case study evaluations had been trained in the academic disciplines with the strongest fieldwork traditions, and they had to struggle with the extent to which their research method could be adapted to evaluation, retain its integrity, and yield positive benefits. The struggle is not yet fully over.
The Study Questions
The first adaptation was a shift of the specification of study questions
from the principal investigator during the period of study performance to the persons who
commissioned the study in advance of data collection. For example, those who supported an
evaluation of a training program might want the researchers to find out whether the
development of the participants' self-concepts, self-esteem, task orientation, work
habits, and personal and social traits seemed associated with the program or with
something else.
Researchers from fieldwork traditions would have argued that they could provide the
most useful information by spending some time at the site of the training program, trying
to understand what the program meant to those who were involved in it, and reporting on
what was happening from the perspective of those who were making it happen. Since this
might have everything to do with the participants' chance to socialize with friends in a
nonthreatening environment and nothing to do with self-esteem or work habits, it seemed
to these researchers that it was therefore logical not to decide on the evaluation
questions until their appropriateness could be determined.
The compromise that has developed is to include in the evaluation the questions of interest to the customer and to permit the researchers to determine what data elements are relevant to these questions and from whom and how they should be collected. It allows the evaluator to remain alert to other questions that might prove more salient if allowed to emerge.
How satisfactory is the compromise? The final reports of some non-GAO case studies show little or no resemblance between the final questions and those in the approved study proposal, and a number of issues about this have not been resolved. We do not know whether the discrepancy is more frequent in case studies than in other methods. It may be that the final questions are the ones the investigators wanted to look at all along, so that the methodology is vulnerable to subterfuge. Possibly the emergent questions were those that should reasonably have been expected to come into focus-and whose emergence may be why case studies rather than surveys are used. From the perspective of the authenticity and integrity of results, the larger public interest may have been served. If the method is highly susceptible to this kind of internal change, the appropriate scope for case studies should be examined. When the case study involves one site and modest expense, the price for identifying better questions early may seem affordable.
This is not to say that all case study evaluations show divergence between the questions that were asked and those that were answered or that an appropriate balance between the evaluator's and the customer's needs is never reached. However, applying the case study methods of research to evaluation requires dealing with matters of control, power, and responsibility that were less visible in the work of academic researchers before their methods were adapted to evaluation.
The Number of Sites
The demands of evaluation led to other adjustments in ethnographic methods.
One such demand was that a method developed for understanding the particular had to be
modified for learning about the general. Another was the need for something more adequate
than "naturalistic generalization" for evaluation purposes. A third was the
problem of site variation, which in the mid-1970's was identified in quantitative studies
as an ill-understood source of greater differences in a program's outcomes than the
program itself.
The case study method seemed born to help, but the forces of time and cost associated with making multisite evaluations led to considerable adaptation. First, since evaluators often needed simultaneous study at several sites, they needed several observers, which created issues of coordination and interpretation. Second, the cost of maintaining a trained full-time field worker at a site runs high, so that evaluators had to settle for shorter observations or untrained field workers or both.
All these changes (to multiple observers, to professionally supervised but not professionally trained observers, and to shorter observation times) led to others. The across-sites data base got much larger as the number of sites in a study rose. The within-site data bases became less extensive as observation times were shortened. It became a challenge to integrate the work of different observers if they focused their attention on different topics from site to site. And this much larger, much less extensive, probably less reliable data base had to be analyzed and reported in a much shorter time than that of fairly leisurely academic research. Not surprisingly, analysis has become a major methodological concern, and more structured and perhaps more efficient approaches to analysis have been developed.
Quantitative Methods
To these adaptations, another was added. The case study was given a
purpose, program evaluation, beyond that of illustration, exploration, or generation of
hypotheses. As the examination of program implementation and program effectiveness became
more central to the case study, so did the ability to generalize findings. In turn,
quantitative methods in case studies expanded.
Quantitative methods were incorporated in the case study in two ways. The first was in
triangulation: the use of several forms of data within a single case study in order to
give many reference points for verifying patterns and ruling out alternative explanations
in order to achieve what evaluators call "internal validity." The second was in
the combination of case study methods with other methods, particularly surveys, in order
to achieve the generalizability that evaluators called "external validity."
These adaptations created the need for a better understanding of the relationship between
case study techniques and other techniques and between quantitative and qualitative
approaches within case studies.
Summary
Table I.2 shows the changes that have been made to adapt the research case study to
evaluators' needs. Less time is spent at sites. Information is collected by junior staff
working under the supervision of an investigator trained in case study methods. More time
is allowed for training and monitoring quality. Data are combined from several sites to
allow generalization, and data collection has been given greater structure. Methods of
assessing the reliability of observations, techniques for transforming very large amounts
of qualitative data, and methods for aggregating qualitative data or findings from several
sites have been developed. The ability to generalize has become a matter of design and
analysis. Reporting methods have changed.
Table I.2: Evaluation Adaptations of the Research Case Study
| Case study elements | Research | Evaluation |
| Design specifications | | |
| Study questions | Researcher asks | Sponsor asks |
| Variables | Emerge from observation | Sponsor specifies |
| Site selection | Of specific interest | Representative |
| Instances | One | Many |
| Data | Researcher specifies | Sponsor or sponsor and researcher specify |
| Design | Trends at one site | Comparison of many sites |
| Methods | One | Several |
| Costs | Usually inexpensive, time-consuming | May be very expensive and time-consuming, particularly in studies with many sites |
| Data collection | | |
| Type of data | Qualitative | Quantitative and qualitative |
| Time span studied | Long | Short; may be cross-sectional |
| Time at sites | Long | Short |
| Sources | Informants, observation | Informants, documents, administrative data |
| Collection method | Researcher specifies | Sponsor or sponsor and researcher specify |
| Role of sight | Central | Supplementary |
| Collector | Researcher | Staff |
| Analysis | | |
| Analyst | Researcher | Staff |
| Researcher's role | Comprehension | Supervisory |
| Data reduction | Minimal, original data | Considerable; codification, content analysis |
| Multiple data | Triangulate within site | Triangulate across and within sites |
| Analysis techniques | Nonformalistic, pattern recognition | Formalistic; graphic and content analysis techniques |
| Procedures | Intuitive, thematic | Formal, comparative, thematic |
| Establishing causality | Coherent, plausible story | Greater emphasis on design elements in addition to internal coherence |
| Reporting | Narrative, descriptive, detailed building of coherent story | Conclusion-oriented, use of vignettes for examples |
Case studies in evaluation today have made these adaptations in different degrees. Some
studies have not only generalized but also tested hypotheses. Some case studies rely
wholly on quantitative data. Some rely wholly on information collected by others, not
trained as sociologists or anthropologists, rather than on firsthand observation. Some aim
for uniformity or comparability of data both within a site with multiple observers and
across several sites. Some use inferential statistics as well as descriptive statistics.
Some present findings and conclusions in forms closely resembling those of other methods.
These adaptations are not uniformly valued. Some case study methodologists work with structured evaluation questions, structured data collection, and observers untrained as anthropologists or sociologists, but they believe that case studies offer a qualitative way of knowing that should not be merged with quantitative results. Others believe that case studies cannot be used for making the kind of generalizations that probabilistic models are used for, so that little is to be gained and so much is to be lost from increasing the number of sites. Still others believe in using many sites in case studies for evaluations and see the next step as establishing more explicit procedures for analyzing data and reviewing quality.
"Case study" means different things to different methodologists, who reach
different conclusions about how to do case studies, how to report them, and their overall
appropriateness for answering a specific question. If case studies can vary so greatly,
how can we assess their usefulness for evaluation? One way is to develop a working
definition of the case study that embodies its essential methodological features and then
to examine the strengths and limitations of case studies for different evaluation
questions. This is the approach taken in this paper in developing our initial definition.
Appendix II
Site Selection Example
Imagine that in 1987, within an effort to estimate the extent of tax revenues lost or
delayed from the failure of businesses to file returns, the General Accounting Office
examined revenue shortfalls to individual states. Imagine we found 170,076 such instances
(a national projection based on a sample) and estimated that, cumulatively, over $500
million was lost to the states. Our report attracted much congressional interest.
Variation among states in the rate of such "missing returns" was of particular
concern. Imagine we now have been asked to examine in more detail what explains
differences among states in "missing returns," since cumulatively the effect is
to make states look poorer than they actually would be if they collected revenues
authorized by their own legislatures. (Hypothetical data for this example are given in
table II.1.)
Question 1: Instance Selection
Using the hypothetical data in table II.1, identify states for each type of purposive
selection that we might consider.
Bracketing
Best case
Worst case
Cluster
Representative
Typical
Special interest
Table II.1: Hypothetical Data on Unfiled Corporate Income Tax Returns for 1986
State Income Tax Returns
| State | Number unfiled | Rate unfiled (percent) |
| Alabama | 6,100 | 5 |
| Alaska | 610 | 2 |
| Arizona | 3,475 | 9 |
| Arkansas | 4,391 | 2 |
| California | 28,841 | 3 |
| Colorado | 3,012 | 2 |
| Connecticut | 2,738 | 3 |
| Delaware | 995 | 5 |
| District of Columbia | 1,562 | 3 |
| Florida | 13,372 | 4 |
| Georgia | 8,887 | 5 |
| Hawaii | 1,197 | 1 |
| Idaho | 732 | 2 |
| Illinois | 16,103 | 3 |
| Indiana | 6,077 | 3 |
| Iowa | 2,096 | 1 |
| Kansas | 2,125 | 1 |
| Kentucky | 3,724 | 3 |
| Louisiana | 8,462 | 4 |
| Maine | 1,032 | 1 |
| Maryland | 6,292 | 3 |
| Massachusetts | 4,427 | 2 |
| Michigan | 8,849 | 3 |
| Minnesota | 3,074 | 2 |
| Mississippi | 6,002 | 5 |
| Missouri | 5,886 | 3 |
| Montana | 770 | 1 |
| Nebraska | 1,324 | 2 |
| Nevada | 781 | 5 |
| New Jersey | 7,985 | 3 |
| New Mexico | 2,394 | 3 |
| New York a | 19,349 | 1 |
| North Carolina | 7,460 | 10 |
| North Dakota | 539 | 1 |
| Ohio | 12,088 | 6 |
| Oklahoma | 3,593 | 6 |
| Oregon | 2,246 | 3 |
| Pennsylvania | 11,774 | 2 |
| Rhode Island | 856 | 3 |
| South Carolina | 5,529 | 4 |
| South Dakota | 736 | 1 |
| Tennessee | 5,734 | 15 |
| Texas | 18,061 | 2 |
| Utah | 1,152 | 2 |
| Vermont | 463 | 2 |
| Virginia | 8,032 | 4 |
| Washington | 3,806 | 2 |
| West Virginia | 1,760 | 3 |
| Wisconsin | 4,559 | 2 |
| Wyoming | 442 | 3 |
Answer to Question 1
Bracketing
Given the size differences between states, a double bracket might be considered. New York
and Texas might form one pair; Kansas and Arizona a second pair.
Best Case
Three states have missing returns (unfiled) rates of less than 1 percent. These are
Hawaii, Kansas, and New York. Hawaii and Kansas are relatively small states and New York
has implemented a special initiative. Adding states with 1 percent unfiled rates to the
pool would not add larger states, however, since these are Iowa, Maine, Montana, North
Dakota, and South Dakota. It may be that the correlation between "smaller"
states and very low rates of unfiled returns is a "real" phenomenon that should
be examined and the initial cut of less than 1 percent should stand.
Worst Case
Texas is an outlier, with a 15-percent unfiled rate. North Carolina has a 10-percent
rate, Arizona 9 percent. The next closest states are Ohio and Oklahoma, with 6 percent
each. Selecting Texas, North Carolina, and Arizona would be a reasonable worst-case
choice.
Cluster
Except for New York, no information is given about programs or state initiatives. Using
only the data in the table, several bases for clustering could be considered. One
frequently used basis is "size of the problem": that is, 7 states account for
about 45 percent of all unfiled returns (California, Florida, Illinois, New York,
Ohio, Pennsylvania, and Texas). This basis for selection should be ruled out for this
job, however, because there is no meaningful cluster from the group, except that the
states are all among the larger states. The rationale for the job is bolstering each
individual state's revenue, not the national pooled aggregate. Since there are more
smaller, semirural states than big states, the wellbeing of individual states would not
necessarily be best served by examining what happens in the few larger states. Another
basis might be a crosstabulation of state size and rate of unfiled returns; here selection
of six states could give a reasonable fix on reasons for the problem but would essentially
reproduce the strategy used in the representative sample. We would conclude that the data
in the table are not sufficient for drawing a cluster sample.
Representative
The distribution of unfiled rates is positively skewed, which means that instances are
piled up at the low end and scattered out over the high end. With such a distribution,
"representative" in terms of unfiled rates would sensibly mean at the low (1 and less
than 1 percent), lower middle (2 and 3 percent), upper middle (4, 5, and 6 percent), and
high (9, 10, and 15 percent) points. Assuming state size would be a "second cut"
variable, New York (1 percent), California (3 percent), Ohio (6 percent), and Texas
(15 percent) could be one group to study, while Kansas (less than 1 percent),
Massachusetts (2 percent), Oklahoma (6 percent), and Arizona (9 percent) could form a
second group of smaller states. Together, the eight states also would provide reasonable
geographic representativeness, as well as industrialized versus more rural spreads.
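The banding described above can be checked with a short tally. The sketch below, in Python, uses the rates as they are printed in the table of hypothetical data (entries that the text treats as less than 1 percent are printed there as 1) and counts how many of the 50 table entries fall into each band. The band cutoffs follow the text; the function and variable names are illustrative assumptions. A mean that exceeds the median is used here only as a rough sign of positive skew, not as a formal test.

```python
import statistics
from collections import Counter

# Rates as printed in the table, in table order (Alabama through Wyoming).
rates = [5, 2, 9, 2, 3, 2, 3, 5, 3, 4, 5, 1, 2, 3, 3, 1, 1, 3, 4, 1,
         3, 2, 3, 2, 5, 3, 1, 2, 5, 3, 3, 1, 10, 1, 6, 6, 3, 2, 3, 4,
         1, 15, 2, 2, 2, 4, 2, 3, 2, 3]


def band(rate):
    """Assign a rate to one of the four bands named in the text."""
    if rate <= 1:
        return "low"
    if rate <= 3:
        return "lower middle"
    if rate <= 6:
        return "upper middle"
    return "high"


# How many table entries fall into each band.
band_counts = Counter(band(rate) for rate in rates)

# Rough skew check: a long upper tail pulls the mean above the median.
mean_rate = statistics.mean(rates)
median_rate = statistics.median(rates)
looks_positively_skewed = mean_rate > median_rate

for name in ("low", "lower middle", "upper middle", "high"):
    print(f"{name}: {band_counts[name]}")
print(f"mean = {mean_rate:.2f}, median = {median_rate}")
```

Most entries sit in the two lower bands and only three in the high band, which is the pattern that makes selection at several points along the distribution more sensible than selection around the mean.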
Typical
A frequency distribution of unfiled rates shows that 14 states had rates of 3 percent,
which turns out to be both the mode and the median for this distribution. States in this
category include California, Connecticut, Illinois, Indiana, Kentucky, Maryland, Michigan,
Missouri, New Jersey, New Mexico, Oregon, Rhode Island, West Virginia, and Wyoming. With
no other information (for the purpose of this exercise), if fewer than 14 case studies
were to be made, selecting states typical in size such as Maryland, Michigan, New Jersey,
and Indiana would make sense.
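A frequency distribution of this kind is simple to tabulate. The following Python sketch uses the rates as printed in the table of hypothetical data and finds the modal rate, the median, and the entries at the mode. The table also lists the District of Columbia at 3, so the tally shows 15 table entries at the mode, 14 of them states. The variable names are illustrative.

```python
import statistics
from collections import Counter

# Rate unfiled for each table entry, as printed in the table.
rate_by_place = {
    "Alabama": 5, "Alaska": 2, "Arizona": 9, "Arkansas": 2,
    "California": 3, "Colorado": 2, "Connecticut": 3, "Delaware": 5,
    "District of Columbia": 3, "Florida": 4, "Georgia": 5, "Hawaii": 1,
    "Idaho": 2, "Illinois": 3, "Indiana": 3, "Iowa": 1, "Kansas": 1,
    "Kentucky": 3, "Louisiana": 4, "Maine": 1, "Maryland": 3,
    "Massachusetts": 2, "Michigan": 3, "Minnesota": 2, "Mississippi": 5,
    "Missouri": 3, "Montana": 1, "Nebraska": 2, "Nevada": 5,
    "New Jersey": 3, "New Mexico": 3, "New York": 1,
    "North Carolina": 10, "North Dakota": 1, "Ohio": 6, "Oklahoma": 6,
    "Oregon": 3, "Pennsylvania": 2, "Rhode Island": 3,
    "South Carolina": 4, "South Dakota": 1, "Tennessee": 15, "Texas": 2,
    "Utah": 2, "Vermont": 2, "Virginia": 4, "Washington": 2,
    "West Virginia": 3, "Wisconsin": 2, "Wyoming": 3,
}

# Frequency distribution: how many entries share each rate.
frequency = Counter(rate_by_place.values())
modal_rate, modal_count = frequency.most_common(1)[0]
median_rate = statistics.median(rate_by_place.values())

# States (excluding the District of Columbia) at the modal rate.
typical_states = sorted(
    place for place, rate in rate_by_place.items()
    if rate == modal_rate and place != "District of Columbia"
)

print(f"mode = {modal_rate} ({modal_count} entries), median = {median_rate}")
print(", ".join(typical_states))
```

A tally like this only identifies the pool of typical instances; choosing among them, for example by state size, remains a judgment for the evaluator.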
Special Interest
New York would be of special interest as a large state with a very low rate of unfiled
returns. New York also is unique in implementing relevant legislation that might have
some national potential.
Question 2
While it might be possible, given the data in table II.1, to select states on six of the
seven purposive bases, would the evaluation question itself present a situation in which
we would want to consider case studies at all?
Answer
Yes, but not as a stand-alone method. We have been asked to examine the reasons for state
variation in unfiled returns. One plausible reason is that the differences are the result
of how states solicit returns, monitor compliance, and penalize failure to file. We could
obtain tax codes and procedures for each state, examine these, interview selected
officials, and generate some plausible patterns. However, understanding reasons for
behavior as complex as not filing is well suited for case studies. Explanations could
range from (for example) failures of managing returns actually filed, which are quite
susceptible to improvement, to economic cycles that affect business circumstances and that
may be less susceptible to change. Since the underlying concern is that many states may be
asking for federal assistance when they would have resources to handle more of their own
needs if they collected revenues owing to them, case studies of a representative sample of
states coupled with examination of the special interest state could be an efficient
strategy for ensuring that we had a comprehensive understanding of what was happening and
why. To provide the generalization desirable, the case studies could be followed by a
national survey of state officials, checking out the findings from the in-depth studies.
Such a sequence could be quite efficient, since the national survey would not be a fishing
expedition but targeted to verify initial findings. It also would offer considerable
assurance that we had accurately determined reasons affecting most states.
Appendix III
Guidelines for Reviewing Case Study Reports
There are at least six different types of case study application in evaluation, and their strengths and limitations are different. Choosing an appropriate method depends on understanding the evaluation question. What is technically right for one question is not necessarily right for another. However, there are standards that can be applied to all case studies in evaluation. Studies that fail to meet them have questionable merit. These guidelines present the minimum standard of quality in case study evaluation, taken in conjunction with the guidance in the "Yellow Book," Policy Manual, and Communications Manual.¹
Design
1. Are the evaluation questions stated clearly and explicitly? A good study informs
the reader early in the report about the questions that were answered and the issues that
were investigated.
2. Is the case study application clearly described? Is it appropriate? A good case study describes the case study application that was used. It explains why this application is appropriate for the kind of evaluation questions that were answered (descriptive, normative, cause-and-effect). Where several methods were used, the relationship of the case study to the other methods is clear and appropriate.
3. Was the time span of the study long enough to address the core issues fairly? A good case study reports how much time the investigation covered in relation to the history of the instance or program. Case studies aiming at a comprehensive analysis of an event as a whole begin as early as possible in its history and continue through its completion or stabilization. Evaluation case studies have covered shorter periods and involved less on-site investigation than research case studies characteristically do. Readers should recognize, however, that as time shortens, so may the value of the method as a way of presenting a comprehensive understanding of the event as a whole.
4. Is the basis for case selection presented? Is it appropriate for the purpose of the case study? A good case study presents the reasons for selecting the instances that were examined. The reasons are appropriate for the case study application, an issue of particular concern if a generalization of the findings is intended. For assessing the study's adequacy, the kind of site selected is as important as the number of sites selected. Attention should be paid to the physical setting, to the people who are served by the program, and to variations in treatment.
¹ These guidelines have been adapted from "Guidelines for Reporting Large Case Studies" by John R. Gilbert in David C. Hoaglin et al., Data for Decisions: Information Strategies for Decision makers (Cambridge, Mass.: Abt Books, 1982), pp. 138-39, and Robert K. Yin, Case Study Research: Design and Methods (Beverly Hills, Calif.: Sage, 1984), pp. 140-45.
Data Collection
1. Are the methods of data collection presented? Are they appropriate
for the purpose of the case study? Unstructured methods may be appropriate for
illustrative and exploratory applications. Semi-structured approaches may be appropriate
for critical instance case studies involving multiple sites, particularly if more than one
investigator was responsible for collecting data for several sites.
2. If more than one investigator collected the data, how were the other evaluators selected, trained, and supervised? There is considerable agreement that the consequence of the many variants in data collection for multiple sites is uncertain, but providing detailed information on the procedures that are used and an explanation of the reasons for the approach are essential to a good case study.
3. Are information sources described clearly and fully? Are they appropriate?
A good case study presents in detail the sources of evidence. The detail is greater than
that required in other methods. A good case study report gives the numbers and positions
of the persons interviewed and the
evidence that they were appropriate for the evaluation. The reader should be able to judge
from the information that is given in the case study report how credible the conclusions
are in terms of the appropriateness and completeness of information sources.
Data Base Formation and Data Analysis Techniques
1. Are the procedures for the formation of the data base described? A good case
study describes how the data bases were formed and presents a justification for decisions
that were made about the qualification, precision, and detail of information in the data
base at each site.
2. Are the techniques of data-gathering and data processing explicitly described? Readers of a good case study should know how the data were collected and, step by step, how they were analyzed. If semistructured packets of directions were used to guide field workers through the issues, a good case study describes them or includes them in technical appendixes. All the steps of data reduction and coding are described, along with the basis for transformations in these steps. The analytic techniques are explicitly described. What data sources were used in triangulation? In what order? How were discrepant findings resolved? The validity of case study methods partly depends on the resolution process. At each step, safeguards should have been taken for completeness and the reduction of the threat of bias.
3. Were there interpretation differences, and if so how were they resolved? A good case study is explicit about differences in the interpretation of evidence and events between members of the investigative team and the reviewers of the draft report. The case study method often uses data that are more judgmental, interpretive, and subjective than other methods. The data are often less accessible to secondary analysis. Thus, a good case study states the argument and evidence more plainly than most reports have to.
4. If other studies, investigations, or experiments relevant to the issue are available, have their results been presented and reconciled with the case study findings? A good case study presents the findings and conclusions for other studies on the same issue. When the findings do not converge, the case study reconciles or explains the differences as far as possible. Completeness of information requires this step.
Reporting
1. Are methodological strengths and limitations identified clearly? A good case
study reports methodological strengths and limitations for answering the evaluation
questions and explains the tradeoffs that were considered and who influenced the
decisions. When several decisionmakers were involved, a good case study describes the
types of decisions each one made and the constraints on those decisions.
2. Are the arguments for various resolutions of the evaluation question presented? Most case studies are on topics about which some kind of opinion has been formed. In a good case study, the conceptual framework for organizing the inquiry is quite explicit about expectations. A good case study identifies the elements of the issue that was examined and presents the initial arguments in favor of the various resolutions and the findings of the study that support these resolutions.
3. Are the arguments against various resolutions of the issue presented? A good case study presents the initial arguments against the various resolutions of the issue that was considered. Case study investigators are supposed to seek evidence that confirms and evidence that contradicts the observations and conclusions. Explicitly stating the initial arguments for and against various resolutions helps readers know how thoroughly the investigators considered the issues and how thoroughly they sought evidence on both sides.
4. Does the case study identify the factors explaining the phenomena that were observed and state clearly whether the identification of these factors was based on insight and recognition or on quantitative techniques? Case studies are undertaken for their explanatory power and their superior ability to identify the reasons for problems and the nature of events. A good case study explicitly identifies alternative explanations, lays out the chain of reasoning, and makes clear which conclusions rest primarily on the investigators' insightful recognition of patterns of evidence and which have been recognized in other ways.
Impartiality and Generalizability
1. What is known about the competence and impartiality of the investigators? A good
case study provides information about the experience of the investigators with case study
methods and what they have written previously about the questions that were answered. The
more evidence there is that the investigators have had appropriate training in case study
methods, and that they have addressed related issues in ways that seem impartial and are
intended to reduce bias, the greater confidence the reader can have in the quality of the
work. For GAO reports, the job documentation should contain evidence that the evaluation
team as a group possessed the skills required and assurance that there were no impediments
to impartiality among individual team members. For others' reports we plan to use in our
studies, we should seek similar assurance in a report itself or from knowledgeable
persons.
2. Are comments on the draft report available? Perhaps because case studies require more detail than other methods, case study reports are sometimes criticized for failing to be convincing about their impartiality. One way that a good case study counters this criticism is by the inclusion of a technical appendix that gives the full comments of the informants who reviewed the draft.
3. Is there adequate information for judging generalizability? The basis for claiming generalizability is explicit in a good case study. It provides the evidence, of whatever type and detail, that is needed for assessing this claim. In a good case study, generalizations do not exceed their basis, considering program diversity and how the case studies were selected.
We provide a checklist of the guidelines discussed in this appendix in table III.1.
Table III.1: Checklist for Reviewing Case Study Reports
| | Yes | No |
| Design | | |
| Data collection | | |
| Data base formation and data analysis techniques | | |
| Reporting | | |
| Impartiality and generalizability | | |
Bibliography
For readers with an interest in further information, but limited time, a few key
references are starred (*).
Abert, James G., ed. Program Evaluation at HEW: Research vs. Reality, parts 1-3. New York: Marcel Dekker, 1979.
Abt, Wendy P., T. Cerva, and T. J. Marx. Why So Little Change? The Effects on Pupils of the Experimental Schools Program. Cambridge, Mass.: 1978.
Abt Associates. First Annual Substantive Report for a Study of Experimental Schools Projects in Small Schools Serving Rural Areas. Cambridge, Mass.: 1975.
Acland, Henry. "Are Randomized Experiments the Cadillacs of Design?" Policy Analysis, 5 (Spring 1979), pp. 223-41.
Allison, Graham T. The Essence of Decision: Explaining the Cuban Missile Crisis. Boston: Little, Brown, 1971.
Anderson, Scarvia B., et al. "Case Study Method." Encyclopedia of Educational Evaluation, pp. 46-47. San Francisco: Jossey-Bass, 1976.
Arrow, Kenneth J. Social Choice and Individual Values, 2nd ed. New York: John Wiley and Sons,
Barzun, Jacques, and Henry F. Graff. The Modern Researcher, 3rd ed. New York: Harcourt Brace Jovanovich, 1977.
Becker, Howard S. "Problems of Inference and Proof in Participant Observation." American Sociological Review, 23 (1958), 652-59.
Belle, Deborah. Lives in Stress: Women and Depression. Beverly Hills, Calif.:
Sage, 1982.
Berger, Michael A. "Studying Enrollment Decline (and Other Timely Issues) via the
Case Survey." Educational Evaluation and Policy Analysis, 5:3 (1983), 307-17.
Berman, P., et al. How Schools View and Use the School Improvement Program. Berkeley, Calif.: Manifest International, 1981.
Blalock, Hubert M., Jr. Causal Inferences in Nonexperimental Research. Chapel Hill, N.C.: University of North Carolina Press, 1964.
Bloor, M. "On the Analysis of Observational Data: A Discussion of the Worth and Use of Inductive Techniques and Respondent Validation." Sociology: The Journal of the British Sociological Association, 12 (1978), 545-52.
Bock, Edwin A., ed. Essays on the Case Study Method in Public Administration. Brussels, Belgium: International Institute of Administrative Sciences, 1962.
Bock, Edwin A., ed. Essays on the Case Study Method. Syracuse, N.Y.: International Institute of Administration Sciences, The Inter-University Case Program, November 1971.
Bogdan, Robert. Participant Observation in Organizational Settings. Syracuse, N.Y.: Syracuse University Press, 1972.
Brandt, R. Studying Behavior in Natural Settings. New York: Holt, Rinehart and Winston, 1972.
Broadhead, R., and Ray C. Rist. "Gatekeepers and the Social Control of Social Research." Social Problems, 23 (1976), 325-26.
Bulmer, M. "Concepts in the Analysis of Qualitative Data." Sociological
Review, 27 (1979), 651-77.
Burger, R., and M. Massaglia. RANN Utilization Experience: Case Studies 22 Through 31,
vol. 2. Research Triangle Park, N.C.: Research Triangle Institute, August 1976.
Campbell, Donald T. "Degrees of Freedom and the Case Study." Comparative Political Studies, 8 (1975), 178-93.*
Campbell, Donald T., and Julian C. Stanley. Experimental and Quasi-experimental Designs for Research. Chicago: Rand McNally, 1963.
Chelimsky, Eleanor. "GAO's Institute for Program Evaluation." State Evaluation Network Newsletter, 1 (1981), 2-5.
Chelimsky, Eleanor, and J. Dahmann. Final Report of the Career Criminal Program National Evaluation: Case Studies of Four Jurisdictions, 1976-79. McLean, Va.: MITRE Corp., June 1980.
Chelimsky, Eleanor, and J. Sasfy. The National Level Evaluation of the Career Criminal Program: Concept and Plan. McLean, Va.: MITRE Corp., May 1976.
Christoph, James B., ed. Cases in Comparative Politics. Boston: Little, Brown, 1965.
Connelly, W. L. Continuity and Change in Rural Schooling. Cambridge, Mass.: Abt Associates, 1979.
Cook, Thomas D., and Charles S. Reichardt, eds. Qualitative and Quantitative Methods in Evaluation Research. Beverly Hills, Calif.: Sage, 1979.
Cook, Thomas D., and Donald T. Campbell. Quasi-Experimentation: Design and Analysis
Issues for Field Settings. Chicago: Rand McNally, 1979.*
Cronbach, Lee. "Remarks to the New Society." Evaluation Research Society
Newsletter, 1 (1977), 4.
Cronbach, Lee, et al. Toward Reform of Program Evaluation. San Francisco: Jossey-Bass, 1980.
Datta, Lois-ellin. "Strange Bedfellows." American Behavioral Scientist, 26:1 (1982), 133-44.
David, J. L., and D. A. Greene. Research Design for Generalizing from Multiple Case Studies. Palo Alto, Calif.: Bay Area Research Group, 1981.
Dawson, Judith. "The Validity of Qualitative Research." Paper presented at the American Educational Research Association meeting, San Francisco, Calif., April 1979.
Dawson, Judith A. "Qualitative Research Findings: What Do We Do to Improve and Estimate Their Validity?" Paper presented at the Annual American Educational Research Association meeting, New York, March 1982.
Denzin, Norman K., ed. Sociological Methods: A Sourcebook. New York: McGraw-Hill, 1978a.
Denzin, Norman K. "The Logic of Naturalistic Inquiry." Social Forces, 50 (1971), 166-82.
Denzin, Norman K. The Research Act, 2nd ed. New York: McGraw-Hill, 1978b.
Derthick, Martha A. New Towns In-Town: Why a Federal Program Failed. Washington, D.C.: The Urban Institute, 1972.
Dittman, Laura, et al. "Study of Selected Children in Head Start Planned
Variation, 1969-70. First Year Report. Case Studies of Children." University of
Maryland, College of Education, College Park, Md., 1971.
Dobbert, Marion Lundy. Ethnographic Research: Theory and Applications for Modern
Schools and Societies. New York: Praeger, 1982.
Downey, H. Kirk, and Duane R. Ireland. "Quantitative Versus Qualitative: The Case of Environmental Assessment in Organizational Studies." Administrative Science Quarterly, 24 (1979), 630-37.
Farley, Joanne. "Combining Quantitative and Qualitative Methods in Evaluation Research." State Evaluation Network Newsletter, 1 (1981), 3-5.
Farrar, Eleanor, John DeSanctis, and David Cohen. "Views from Below: Implementation Research in Education." Teachers College Record, 82:1 (1980), 77-100.
Fiedler, Judith. Field Research: A Manual for Logistics and Management of Scientific Studies in Natural Settings. San Francisco: Jossey-Bass, 1978.
Fienberg, S. E. "The Collection and Analysis of Ethnographic Data in Educational
Research." Anthropology and Education Quarterly, 8 (1977), 50-57.
Filstead, William J., ed. Qualitative Methodology. Chicago: Markham, 1970.
Finsterbush, Kurt. "Statistical Summary of 52 AID Projects: Lessons on Project Effectiveness." University of Maryland, College Park, Md., 1984.
Glaser, Barney G. Theoretical Sensitivity: Advances in the Methodology of Grounded
Theory. Mill Valley, Calif.: Sociology Press, 1978.
Glaser, Barney G., and Anselm L. Strauss. The Discovery of Grounded Theory:
Strategies for Qualitative Research. Chicago: Aldine, 1967.*
Glazer, Myron. The Research Adventure: Promise and Problems of Fieldwork. New York:
Random House, 1972.
Goetz, J. P., and M. D. LeCompte. "Ethnographic Research and the Problem of Data Reduction." Anthropology and Education Quarterly, 12 (1981),
Guba, Egon G. Toward a Methodology of Naturalistic Inquiry in Educational Evaluation. Los Angeles: University of California, Center for the Study of Evaluation, 1978.
Guba, Egon G. "Criteria for Assessing the Trustworthiness of Naturalistic Inquiries." Educational Communications and Technology Journal, 8 (1981), 42
Guba, Egon G., and Yvonna S. Lincoln. Effective Evaluation: Improving the Usefulness of Results Through Responsive and Naturalistic Approaches. San Francisco: Jossey-Bass, 1981.
Halpern, Edward S. "Auditing Naturalistic Inquiries: Some Preliminary Applications." Paper presented at the American Educational Research Association meeting, Toronto, Canada, April 1983.
Hamilton, D., et al. Beyond the Numbers Game. Berkeley, Calif.: McCutchan, 1978.
Hargrove, Erwin C. "The Bureaucratic Politics of Evaluation: A Case Study of the Department of Labor." Evaluation Studies Review Annual, vol. 6, Howard E. Freeman and Marian A. Solomon, eds., pp. 179-288. Beverly Hills, Calif.: Sage, 1981.
Hedrick, Terry E., Robert F. Boruch, and K. J. Ross. "On Ensuring the Availability
of Evaluation Data for Secondary Analysis." Evaluation Studies Review Annual, vol. 4,
L. Sechrest et al., eds. Beverly Hills, Calif.: Sage, 1979.
Herriott, Robert E. "Ethnographic Case Studies in Federally Funded Multi-disciplinary
Policy Research: Some Design and Implementation Issues." Anthropology and
Education Quarterly, 9 (1977), 106-15.
Herriott, Robert E. Federal Initiatives and Rural School Improvement. Cambridge, Mass.: Abt Associates, 1980.
Herriott, Robert E. Case Study Methods in School Evaluation and Research: A Synthesis of Experience. Part I. Final Report. Washington, D.C.: National Institute of Education, June 15, 1982.
Hersen, Michel, and David H. Barlow. Single-Case Experimental Designs: Strategies for Studying Behavior Change. New York: Pergamon Press, 1976.
High/Scope Educational Research Foundation. National Home Start Evaluation Study, Interim Reports IA and IB, Case Studies. Ypsilanti, Mich.: 1972.
Hoaglin, David C., et al. Data for Decisions: Information Strategies for Policy Makers. Cambridge, Mass.: Abt Books, 1982.
Holt, Robert T., and John E. Turner, eds. "The Methodology of Comparative Research." The Methodology of Comparative Research, pp. 1-20. New York: Free Press, 1970.
House, Ernest R. The Logic of Evaluative Argument. Los Angeles: University of California, Center for the Study of Evaluation, 1977.
Huberman, A. M., and M. B. Miles. "Drawing Valid Meaning from Qualitative Data:
Some Techniques of Data Reduction and Display." Quality and Quantity, 17
(1983), 283-339.
Jauch, L., R. Osborn, and T. Martin. "Structured Content Analysis of Cases." Academy
of Management Review, 5 (1980), 517-25.
Jerome, Chris H. National Home Start Evaluation Study, Interim Report III, Case Study Summaries. Ypsilanti, Mich.: High/Scope Educational Research Foundation, 1973.
Jerome, Chris H. National Home Start Evaluation Study, Interim Report V, Case Studies. Ypsilanti, Mich.: High/Scope Educational Research Foundation, 1974.
Jick, Todd D. "Mixing Qualitative and Quantitative Methods: Triangulation in Action." Administrative Science Quarterly, 24 (1979), 602-11.*
Johnson, John M. Doing Field Research. New York: Free Press, 1975.
Johnson, Steven D. "On the Use of Qualitative Methods in Policy Research: A Review of Three Multi-site Studies." Mimeograph, Cornell University, Ithaca, N.Y., February 1980.
Kendall, Patricia L., and Katherine M. Wolf. "The Analysis of Deviant Cases in Communications Research." Communications Research: 1948-1949, eds. Paul Lazarsfeld and Frank Stanton, pp. 152-57. New York: Harper & Row, 1949.
Kennedy, Mary M. "Generalizing from Single Case Studies." Evaluation Quarterly, 3 (1979), 661-78.*
Khadduri, Jill, and Raymond J. Struyk. "Improving Section 8 Rental Assistance: Translating Evaluation into Policy." Evaluation Review, 5 (1981), 189-206.
Kidder, Louise H. Research Methods in Social Relations, 4th ed. New York: Holt,
Rinehart and Winston, 1981.
Kirschen, Etienne S., and Lucien Morissens. "The Objectives and Instruments of
Economic Policy." Qualitative Planning of Economic Policy, ed. Bert G.
Hickman, pp. 111-13. Washington, D.C.: The Brookings Institution, 1965.
Kirschen, Etienne S., et al. Economic Policy in Our Time. Amsterdam, Netherlands: North Holland Pub. Co., 1964.
Kraft, Richard H. P., et al. Four Evaluation Examples: Anthropological, Economic, Narrative and Portrayal. Chicago: Rand McNally, 1974.
Kratochwill, Thomas R. Single Subject Research. New York: Academic Press, 1978.
Kyle, Diane Wells, and Dorene D. Ross. "Evaluating Qualitative Research: Criteria and Their Application." Paper presented at the Evaluation Network annual meeting, Chicago, Ill., October 1983.
Lazarsfeld, Paul F., and Allen H. Barton. "Qualitative Measurement in the Social Sciences: Classification, Typologies and Indices." The Policy Sciences: Recent Developments in Scope and Methods, eds. Daniel Lerner and Harold D. Lasswell, pp. 155-92. Stanford, Calif.: Stanford University Press, 1951.
LeCompte, Margaret D., and Judith P. Goetz. "Problems of Reliability and Validity in Ethnographic Research." Review of Educational Research, 52 (Spring 1982), 31-60.
Levine, Harold G. "Principles of Data Storage and Retrieval for Use in Qualitative Evaluations." Paper presented at the American Educational Research Association meeting, Montreal, Canada, April 1983.
Lewy, Arieh, and Marvin Alkin. The Impact of a Major National Evaluation Study: Israel's Van Leer Report. Los Angeles: University of California, Center for the Study of Evaluation, International Monograph Series in Evaluation, April 1983.
Light, Richard J. "Six Evaluation Issues That Synthesis Can Resolve Better Than Single Studies." Issues in Data Synthesis, eds. William H. Yeaton and Paul M. Wortman, pp. 57-74. San Francisco: Jossey-Bass, 1984.
Light, Richard J., and David B. Pillemer. Summing Up: The Science of Reviewing Research. Cambridge, Mass.: Harvard University Press, 1984.*
Lijphart, Arend. "Comparative Politics and the Comparative Method." American Political Science Review, 65 (1971), 682-93.
Lincoln, Yvonna S. "Strategies for Inquiring About the Dependability (Reliability) of Naturalistic Studies." Paper presented at the Evaluation Research Society meeting, Austin, Texas, September 30-October 3, 1981.
Lipset, Seymour Martin, Martin A. Trow, and James S. Coleman. Union Democracy. Glencoe, Ill.: Free Press, 1956.
Lofland, J. Analyzing Social Settings. Belmont, Calif.: Wadsworth, 1971.
Lucas, W. The Case Survey Method of Aggregating Case Experience. Santa Monica, Calif.: Rand, 1974.
McCall, George J., and J. L. Simmons, eds. Issues in Participant Observation: A Text and Reader. Reading, Mass.: Addison-Wesley, 1969.
McClintock, Charles C., Diane Brannon, and Steven Maynard-Moody. "Applying the
Logic of Sample Surveys to Qualitative Case Studies: The Case Cluster Method." Administrative
Science Quarterly, 24 (1979), 612-29.
McDaniels, Garry, et al. "Case Studies of Children in Head Start Planned Variation,
1970-71." University of Maryland, College of Education, College Park, Md., 1972.
McGowan, Eleanor, and David Cohen. "Rational Fantasies." Policy Sciences Journal, 1 (1979), 439-54.
Marsh, Robert M. "The Bearing of Comparative Analysis on Sociological Theory." Social Forces, 43 (1964), 191-96.
Miles, Matthew B. "Qualitative Data as an Attractive Nuisance." Administrative Science Quarterly, 24 (1979), 590-601.
Miles, Matthew B. "A Mini-Cross-Site Analysis." American Behavioral Scientist, 26:1 (1982), 121-32.
Miles, Matthew B., and A. M. Huberman. Qualitative Data Analysis: A Sourcebook of New Methods. Beverly Hills, Calif.: Sage, 1984.*
Mill, John Stuart. A System of Logic, 8th ed. London, England: Longmans Green, 1972.
Mintzberg, Henry. "The Emerging Strategy of 'Direct' Research." Administrative Science Quarterly, 24 (1979), 582-89.
Moberg, P. "The Collection and Analysis of Qualitative Data in Evaluation Research." Paper presented at the National Conference on Evaluation in Alcohol, Drug Abuse, and Mental Health Programs, Washington, D.C., 1974.
Mosteller, F., and D. L. Wallace. Inference and Disputed Authorship: The Federalist. Reading, Mass.: Addison-Wesley, 1964.
Mosteller, F., et al. "The Pre-Election Polls of 1948." Social Science
Research Council Bulletin, 60 (1949).
Mulhauser, Frederick. "Ethnography and Policy Making: The Case of Education." Human
Organization, 3 (1975), 311-15.
National Science Foundation. Case Studies in Science Education, vols. 1 and 2. Washington, D.C.: U.S. Government Printing Office, 1978.
National Science Foundation. Cooperative Science: A National Study of University and Industry Researchers. Assessment of the Industry/University Cooperative Research Projects Program, vols. 1 and 2. Washington, D.C.: November 1984.
Neustadt, R. E., and H. V. Fineberg. The Swine Flu Affair: Decision Making on a
Slippery Disease. Washington, D.C.: U.S. Government Printing Office, 1978.
Office of Technology Assessment. Assessing the Efficacy and Safety of Medical
Technologies. Washington, D.C.: U.S. Government Printing Office, 1978.
Paige, Glenn. "Problems and Use of the Single Case in Political Research." Ph.D. diss., Northwestern University, Evanston, Ill., 1959.
Patton, Michael Quinn. Qualitative Evaluation Methods. Beverly Hills, Calif.: Sage, 1980.*
Pelto, Pertti J., and Gretel H. Pelto. Anthropological Research: The Structure of Inquiry, 2nd ed. Cambridge, England: Cambridge University Press, 1978.
Philadelphia Inquirer. "Anti-Terrorism Videos: Airline Personnel Say FAA-Ordered Training Program Is Boring, Ineffective." April 29, 1986.
Pierce, William Spangar. Bureaucratic Failure and Public Expenditure. New York:
Academic Press, 1981.
Piore, Michael J. "Qualitative Research Techniques in Economics." Administrative
Science Quarterly, 24 (1979), 560-69.
Platt, J. R. "Strong Inference." Science, 146 (1964), 347-53.
Plog, Michael. "The Use of Case Study Methodology." State Evaluation Network Newsletter, 1:2 (1980), 5-6.
Popkewitz, Thomas S., and Robert B. Tabachnick. The Study of Schooling: Field-Based Methodologies in Educational Research and Evaluation. New York: Praeger, 1981.
Pressman, Jeffrey L., and Aaron Wildavsky. Implementation. Berkeley, Calif.: University of California Press, 1973.*
Przeworski, Adam, and Henry Teune. The Logic of Comparative Social Inquiry. New York: John Wiley and Sons, 1970.
Rist, Ray C. "On the Relations Between Educational Research Paradigms: From Disdain to Detente." Anthropology and Education Quarterly, 8 (1977),
Rist, Ray C. Earning and Learning: Youth Employment Policies and Programs. Beverly Hills, Calif.: Sage, 1981a.
Rist, Ray C. "On the Utility of Ethnographic Research for the Policy Process." Urban Education, 15 (1981b), 485-94.
Rist, Ray C. "Mandating Collaboration Through Federal Legislation: YEDPA and the
CETA-School Linkage." Research in Sociology of Education and Socialization,
vol. 3, ed. R. Corwin, pp. 187-205. New York: JAI Press, 1982a.
Rist, Ray C. "On the Application of Ethnographic Inquiry to Education: Procedures and
Possibilities." Journal of Research in Science Teaching, 19:6 (1982b), 439-50.
Rist, Ray C. "Beyond the Quantitative Cul-de-Sac: A Qualitative Perspective on Youth Employment Programs." Applied Poverty Research, eds. Richard Goldstein and Stephen M. Sachs, pp. 123-38. Totowa, N.J.: Rowman and Allanheld, 1984.
Roncek, Dennis W., and Gail Weinberger. "Neighborhoods of Leased Public Housing." Evaluation Review, 5 (1981), 231-44.
Sanday, Peggy R. "The Ethnographic Paradigm(s)." Administrative Science Quarterly, 24 (1979), 527-38.
Scheirer, Mary Ann, and Eva L. Rezmovic. "Measuring the Degree of Program Implementation." Evaluation Review, 7 (1983), 599-633.
Scriven, Michael. "Objectivity and Subjectivity in Educational Research." Philosophical Redirection in Educational Research, ed. L. G. Thomas. Chicago: University of Chicago Press, 1972.
Searle, Barbara (ed.). Evaluation in World Bank Education Projects: Lessons from Three Case Studies, Report EDT5. Washington, D.C.: World Bank, July 1985.
Sechrest, Lee, ed. Unobtrusive Measurement Today. San Francisco, Calif.: Jossey-Bass, 1979.
Shapiro, E. "Educational Evaluation: Rethinking the Criteria of Competence." School Review, 81 (1973), 523-49.
Sieber, Sam D. "The Integration of Fieldwork and Survey Methods." American
Journal of Sociology, 78:6 (1973), 1335-59.
Sjoberg, Gideon. "The Comparative Method in the Social Sciences." Philosophy
of Science, 22 (1955), 106-17.
Smith, Allen G., and Karen S. Louis, eds. "Multimethod Policy Research: Issues and Applications." American Behavioral Scientist, 26:1 (1982), 1-144.
Smith, Louis M. "An Aesthetic Education Workshop for Administrators: Some Implications for a Theory of Case Studies." Paper presented at the American Educational Research Association meeting, Chicago, Ill., 1974.
Smith, Louis M. "An Evolving Logic of Participant Observation, Educational Ethnography and Other Case Studies." Review of Research in Education, 6 (1979), 316-77.
Smith, Louis M., and S. Schumacher. Extended Pilot Trials of the Aesthetic Education
Program: A Qualitative Description. St. Louis: CEMREL, 1972.
Snow, Richard E. "Representative and Quasi-representative Designs for Research on
Teaching." Review of Educational Research, 44 (1974), 265-91.
Spirer, Janet E. The Case Study Method: Guidelines, Practices and Applications for Vocational Education. Columbus, Ohio: National Center for Research in Vocational Education, 1980.
Stake, Robert E. "The Case Study Method in Social Inquiry." Educational Researcher, 7 (1978), 5-8.
Stake, Robert E., and J. Easley, eds. Case Studies in Science Education. Urbana, Ill.: Center for Instructional Research and Curriculum Evaluation, 1978.
Stein, Harold. Public Administration and Policy Development: A Case Book. New
York: Harcourt Brace Jovanovich, 1952.
Stenhouse, Lawrence. "Case Study in Comparative Education: Particularity and
Generalizability." Comparative Education, 15:1 (1979), 5-10.
Trend, M. G. "On the Reconciliation of Qualitative and Quantitative Analyses: A Case Study." Human Organization, 37 (1978), 345-54.
Trow, Martin. "Comment on Participant Observation and Interviewing: A Comparison." Human Organization, 16 (1957), 33-35.
Turner, B. A. "Some Practical Aspects of Qualitative Data Analysis: One Way of Organizing the Cognitive Processes Associated with the Generation of Grounded Theory." Quality and Quantity, 15 (1981), 225-47.
University of Sussex, Social Science Research Policy Unit. Success and Failure in Industrial Innovation. London, England: Center for the Study of Industrial Innovation, 1972.
U.S. General Accounting Office. Lands in the Lake Chelan National Recreation Area Should Be Returned to Private Ownership, GAO/CED-81-10. Washington, D.C.: January 22, 1981.
U.S. General Accounting Office. Housing Block Grant Activity in Pittsburgh: A Case Study, CED-82-52. Washington, D.C.: March 24, 1982.
U.S. General Accounting Office. Housing Block Grant Activity in Seattle: A Case Study, CED-82-60. Washington, D.C.: March 30, 1982.
U.S. General Accounting Office. Review of the Operations of the Sea Island
Comprehensive Health Care Corporation and the Franklin C. Fetter Family Health Center,
HRD-82-69. Washington, D.C.: April 23, 1982.
U.S. General Accounting Office. Housing Block Grant Activity in Dallas: A Case Study,
CED-82-75. Washington, D.C.: April 30, 1982.
U.S. General Accounting Office. Content Analysis: A Methodology for Structuring and Analyzing Written Material. Methodology transfer paper 3. Washington, D.C.: June 1982.
U.S. General Accounting Office. Cleaning Up the Environment: Progress Achieved but Major Unresolved Issues Remain, vols. 1 and 2, CED-82-72. Washington, D.C.: July 21, 1982.
U.S. General Accounting Office. Block Grants for Housing: A Study of Local Experiences and Attitudes, GAO/RCED-83-21. Washington, D.C.: December 13, 1982.
U.S. General Accounting Office. How Well Do the Military Services Perform Jointly in Combat? DOD's Joint Test-and-Evaluation Program Provides Few Credible Answers, GAO/PEMD-84-3. Washington, D.C.: February 22, 1984.
U.S. General Accounting Office. An Evaluation of the 1981 AFDC Changes: Initial Analyses, GAO/PEMD-84-6. Washington, D.C.: April 2, 1984.
U.S. General Accounting Office. Need to Better Assess Consequences Before Reducing Taxpayer Assistance, GAO/GGD-84-13. Washington, D.C.: April 5, 1984.
U.S. General Accounting Office. Implementation Status of the Office of Management
and Budget Circular A-76 Program at the Department of the Interior's National Park Service
and Bureau of Reclamation, GAO/RCED-85-56. Washington, D.C.: March 15, 1985.
U.S. General Accounting Office. Projects Funded in Northeast Texas by the Emergency
Jobs Appropriations Act of 1983, GAO/HRD-85-42. Washington, D.C.: March 26, 1985.
U.S. General Accounting Office. Projects Funded in the Montgomery, Alabama, Metropolitan Area by the Emergency Jobs Appropriations Act of 1983, GAO/HRD-85-59. Washington, D.C.: May 7, 1985.
U.S. General Accounting Office. Projects Funded in Fresno County, California by the Emergency Jobs Appropriations Act of 1983, GAO/HRD-85-90. Washington, D.C.: August 27, 1985.
U.S. General Accounting Office. Projects Funded in South Central Georgia by the Emergency Jobs Appropriations Act of 1983, GAO/HRD-85-98. Washington, D.C.: September 25, 1985.
U.S. General Accounting Office. Improving Operating and Staffing Practice Can Increase Productivity and Reduce Costs in SSA's Atlanta Region, GAO/GGD-85-85. Washington, D.C.: September 11, 1985.
U.S. General Accounting Office. Emerging Issues in Export Competition: A Case Study of the Brazilian Market, GAO/NSIAD-85-121. Washington, D.C.: September 26, 1985.
U.S. General Accounting Office. Information on the Forest Service's Efforts to Control the Spread of the Western Spruce Budworm in the Carson National Forest, GAO/RCED-86-6. Washington, D.C.: October 30, 1985.
U.S. General Accounting Office. Department of Commerce's Second-Year Efforts to
Implement the Federal Manager's Financial Integrity Act, GAO/RCED-86-21. Washington,
D.C.: November 5, 1985.
U.S. General Accounting Office. Emergency Jobs Act of 1983: Projects Funded in the
Lawrence-Haverhill, Massachusetts, Area, GAO/HRD-86-30. Washington, D.C.: December 6,
1985.
U.S. General Accounting Office. Using Statistical Sampling. Methodology transfer paper 6. Washington, D.C.: May 15, 1986.
U.S. General Accounting Office. Bigeye Bomb: An Evaluation of DOD Chemical and
Development Tests, GAO/PEMD-86-12BR. Washington, D.C.: May 23, 1986.
U.S. General Accounting Office. Foreign Representation: Former High-Level Federal
Officials Representing Foreign Interests, GAO/NSIAD-86-175BR. Washington, D.C.: July
11, 1986.
U.S. General Accounting Office. Social Security: Improved Telephone Accessibility Would Better Serve the Public, GAO/HRD-85-86. Washington, D.C.: August 29, 1986.
U.S. General Accounting Office. The Nation's Water Quality: Key Unanswered Questions About the Quality of Rivers and Streams, GAO/PEMD-86-6. Washington, D.C.: September 19, 1986.
U.S. General Accounting Office. Cargo Imports: Customs Need to Better Assure Compliance with Trade Laws and Regulations, GAO/GGD-86-136. Washington, D.C.: December 1986.
U.S. General Accounting Office. Water Quality: An Evaluation Method for the Construction Grants Program-Methodology, GAO/PEMD-87-4A. Washington, D.C.: December 17, 1986a.
U.S. General Accounting Office. Water Quality: An Evaluation Method for the
Construction Grants Program-Case Studies, GAO/PEMD-87-4B, vol. 2. Washington, D.C.:
December 17, 1986b.
U.S. General Accounting Office. Medical Malpractice: Six Case Studies Show Claims and
Insurance Costs Still Rise Despite Reforms, GAO/HRD-87-21. Washington, D.C.: December
31, 1986.
U.S. General Accounting Office. Parks and Recreation: Construction Contract at Jean LaFitte National Historical Park, GAO/RCED-86-232FS. Washington, D.C.: September 26, 1987.
U.S. General Accounting Office. Parks and Recreation: Concerns Raised About National Park Service Actions at Delaware Water Gap, GAO/RCED-87-24BR. Washington, D.C.: October 28, 1987.
VanderPutten, Elizabeth. "Toward a Theory of Loosely Coupled Systems." Ph.D. diss., George Washington University, Washington, D.C., 1983.
Van Maanen, John. "The Fact of Fiction in Organizational Ethnography." Administrative Science Quarterly, 24 (1979), 539-50.
Van Maanen, John. Qualitative Methodology. Beverly Hills, Calif.: Sage, 1983.
Vidich, Arthur J., and Gilbert Shapiro. "A Comparison of Participant Observation and Survey Data." American Sociological Review, 20 (1955), 28-33.
Voss, Harwin L. "Pitfalls in Social Research: A Case Study." American Sociologist, 1 (1966), 136-40.
Wax, Rosalie. Doing Field Work: Warnings and Advice. Chicago: University of Chicago Press, 1971.
Weatherly, R., and M. Lipsky. "Street Level Bureaucrats and Institutional Innovation: Implementing Special-Education Reforms." Harvard Educational Review, 47 (May 1977), 171-97.
Webb, Eugene, and Karl E. Weick. "Unobtrusive Measures in Organizational Theory: A Reminder." Administrative Science Quarterly, 24 (1979), 650-59.
Weiss, Robert S., and M. Rein. "The Evaluation of Broad Aim Programs: Experimental Design, Its Difficulties and an Alternative." Administrative Science Quarterly, 15 (1970), 97-109.
Wilson, Steve. "The Use of Ethnographic Methods in Educational Evaluation." Human Organization, 36 (1977), 2.
Wilson, Steve. "Explorations of the Usefulness of Case Study Evaluations." Evaluation Quarterly, 3 (1979), 446-59.
Yeaton, William H. "The Case Study Crisis: Some Answers." Administrative Science Quarterly, 26 (1981b), 58-66.
Yeaton, William H., Eveleen Bingham, and Karen A. Heald. "The Difference That Quality Makes: The Case of Literature Reviews." Sociological Methods and Research, 5 (1976), 139-56.
Yeaton, William H., and Paul M. Wortman. Issues in Data Synthesis. San Francisco: Jossey-Bass, 1984.
Yin, Robert K. Case Study Research: Design and Methods, rev. ed. Beverly Hills, Calif.: Sage, 1989.*
Yin, Robert K., et al. A Review of Case Studies of Technological Innovation in State and Local Services. Santa Monica, Calif.: Rand, 1976.
Yin, Robert K., and Karen A. Heald. "Using the Case Survey Method to Analyze Policy Studies." Administrative Science Quarterly, 20 (1975), 371-81.
Yin, Robert K., and Ingrid Heinsohn. Using the Research Sponsored by the AOA.
Case Study No. 1. Transportation Service for the Elderly. Washington, D.C.:
American Institute for Research, 1980.
Zelditch, Morris J. "Some Methodological Problems of Field Studies." American
Journal of Sociology, 67 (1962), 566-76.
Glossary
Backfill Techniques
Techniques used in cumulative case studies to collect information needed if the study is
to be usable for aggregation; these techniques include, for example, obtaining missing
information from the authors on how instances studied were identified and on the bases for
instance selection.
Bias
The extent to which a measurement, sampling, or analytic method systematically
underestimates or overestimates the true value of an attribute.
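The definition can be illustrated numerically. The following is a minimal sketch, not a procedure taken from this paper: it assumes a hypothetical attribute whose true value is known to be 50 and two hypothetical measurement methods, one with random error only and one that reads 2 units low. Bias is estimated as the average of many measurements minus the true value.

```python
import random
import statistics

TRUE_VALUE = 50.0  # hypothetical true value of the attribute


def estimate_bias(measure, true_value, trials=10_000, seed=0):
    """Return the mean measurement minus the true value (the systematic error)."""
    rng = random.Random(seed)
    estimates = [measure(rng) for _ in range(trials)]
    return statistics.fmean(estimates) - true_value


def unbiased_measure(rng):
    # Random error only: readings scatter around the true value.
    return TRUE_VALUE + rng.gauss(0, 5)


def biased_measure(rng):
    # Hypothetical method that systematically reads 2 units low,
    # in addition to the same random error.
    return TRUE_VALUE - 2.0 + rng.gauss(0, 5)


unbiased_result = estimate_bias(unbiased_measure, TRUE_VALUE)
biased_result = estimate_bias(biased_measure, TRUE_VALUE)
print(f"estimated bias, method 1: {unbiased_result:+.2f}")
print(f"estimated bias, method 2: {biased_result:+.2f}")
```

Random error averages out over repeated measurements, so the first estimate is close to zero; the systematic underestimate does not, so the second is close to -2.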
Case Study
A method for learning about a complex instance, based on a comprehensive
understanding of that instance, obtained by extensive description and analysis of the
instance, taken as a whole and in its context.
Convenience Sample
A sample in which instances are selected solely on the basis of feasibility or ease of
data collection. Such a sample is rarely useful in evaluation and is usually hazardous.
Construct
An attribute, usually unobservable, such as educational attainment or socioeconomic
status, that is represented by an observable measure.
Construct Validity
The extent to which a measurement method accurately represents a construct and
produces an observation distinct from that produced by a measure of another construct.
External Validity
The extent to which a finding applies (or can be generalized) to persons,
objects, settings, or times other than those that were the subject of study.
Focused Interview
An interview organized around several predetermined questions or topics but
providing some flexibility in the sequencing of the questions and without a predetermined
set of response categories or specific data elements to be obtained.
Generalizability
Used interchangeably with "external validity."
Internal Validity
The extent to which the causes of an effect are established by an inquiry.
Longitudinal Data
Sometimes called "time series data," observations collected over a
period of time; the sample (instances or cases) may or may not be the same each time.
Matrix of Categories
A method of displaying relationships among themes in analyzing case study data
that shows whether changes in categories or degrees along one dimension are associated
with changes in the categories of another dimension.
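As an illustration only, a matrix of categories can be built by cross-tabulating two coded dimensions across instances. The sketch below assumes nine hypothetical sites, each coded by an evaluator for staffing level and implementation quality; the site names, dimensions, and codes are invented for the example.

```python
from collections import Counter

# Hypothetical coded case study data: (site, staffing level, implementation quality).
cases = [
    ("Site 1", "low", "poor"),
    ("Site 2", "low", "poor"),
    ("Site 3", "low", "fair"),
    ("Site 4", "medium", "fair"),
    ("Site 5", "medium", "fair"),
    ("Site 6", "medium", "good"),
    ("Site 7", "high", "good"),
    ("Site 8", "high", "good"),
    ("Site 9", "high", "fair"),
]

row_categories = ["low", "medium", "high"]   # staffing level
col_categories = ["poor", "fair", "good"]    # implementation quality

# Count how many sites fall in each (staffing, quality) cell.
cell_counts = Counter((staffing, quality) for _, staffing, quality in cases)
matrix = [[cell_counts[(r, c)] for c in col_categories] for r in row_categories]

# Display the matrix with row and column labels.
print(f"{'':8}" + "".join(f"{c:>6}" for c in col_categories))
for label, row in zip(row_categories, matrix):
    print(f"{label:8}" + "".join(f"{n:>6}" for n in row))
```

In this invented data set the counts cluster along the diagonal, which is the pattern the definition describes: changes in degree along one dimension are associated with changes in the categories of the other.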
Normative Question
A type of evaluation question requiring comparison between what is happening (the
condition) to norms and expectations or standards for what should be happening (the
criterion).
Open-Ended Interview
An interview in which, after an initial or lead question, subsequent questions
are determined by topics brought up by the person being interviewed; the concerns
discussed, their sequence, and specific information obtained are not predetermined and the
discussion is unconstrained, able to move in unexpected directions.
Outliers
Instances that are aberrant or do not fit with other instances; instances that, compared
to other members of a population, are at the extremes on relevant dimensions.
Program Effectiveness Evaluation
The application of scientific research methods to estimate how much observed
results, intended or not, are caused by program activities. Effect is linked to cause by
design and analyses that compare observed results with estimates of what might have been
observed in the absence of the program.
Program Evaluation
The application of scientific research methods to assess program concepts,
implementation, and effectiveness.
Purposive Sample
Instances selected on various systematic bases, such as best or worst practices, as
appropriate to the evaluation question being answered; a judgmental sample. If
conducted systematically, purposive sampling can be widely useful in evaluation.
Qualitative Data
Information based on judgments (such as the estimated speed of a UFO), which may
be expressed in numerical or nonnumerical ways, and data that may not be based on judgments
(such as state of birth) but are not meaningfully expressed numerically. The data sources
are often textual and observational, and the data are often expressed in words.
Quantitative Data
Information based on measures that do not rely on judgments and that are
meaningfully measured. These are usually expressed numerically and often use continuous
rather than discrete or categorical levels of measurement and scales with interval or
ratio properties.
Reliability
The extent to which a measurement process produces similar results on repeated
observations of the same condition or event.
Representative Sample
A sample that has approximately the same distribution of characteristics as the
population from which it was drawn.
Simple Random Sample
A method for drawing a sample from a population such that all samples of a given
size have equal probability of being drawn.
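The defining property, that every possible sample of a given size is equally likely, can be checked empirically on a small population. The sketch below is illustrative only; the four-instance population, the sample size of 2, and the number of repetitions are assumptions chosen to keep the example small.

```python
import random
from collections import Counter
from itertools import combinations

population = ["A", "B", "C", "D"]  # hypothetical instances
sample_size = 2
rng = random.Random(1990)          # fixed seed so the run is repeatable

# Every distinct sample of size 2 that could be drawn (6 in all).
possible_samples = list(combinations(population, sample_size))

# Draw many simple random samples without replacement and tally each one.
draws = 60_000
counts = Counter()
for _ in range(draws):
    drawn = tuple(sorted(rng.sample(population, sample_size)))
    counts[drawn] += 1

# Under simple random sampling, each share should be close to 1/6.
shares = {s: counts[s] / draws for s in possible_samples}
for sample, share in shares.items():
    print(sample, f"{share:.3f}")
```

Each of the six possible samples turns up in roughly one-sixth of the draws, which is what distinguishes a simple random sample from a convenience or purposive sample.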
Structured Interview
An interview in which questions to be asked, their sequence, and the detailed
information to be gathered are all predetermined; used where maximum consistency across
interviews and interviewees is needed.
Triangulation
The combination of methodologies in the study of the same phenomenon or construct; a
method of establishing the accuracy of information by comparing three or more types of
independent points of view on data sources (for example, interviews, observation, and
documentation; different investigations; different times) bearing on the same findings.
Akin to corroboration and an essential methodological feature of case studies.
Yoked
Concurrent with. For example, data collection and analyses in case studies are iterative
and concurrent; that is, they are yoked.
Papers in This Series
This is a flexible series continually being added to and updated. The interested reader
should inquire about the possibility of additional papers in the series.
The Evaluation Synthesis. Transfer paper 10.1.2, formerly methods paper I.
Content Analysis: A Methodology for Structuring and Analyzing Written Material. Transfer paper 10.1.3, formerly methodology transfer paper 3.
Designing Evaluations. Transfer paper 10.1.4, formerly methodology transfer paper 4.
Using Structured Interviewing Techniques. Transfer paper 10.1.5, formerly methodology transfer paper 5.
Using Statistical Sampling. Transfer paper 10.1.6, formerly methodology transfer paper 6.
Developing and Using Questionnaires. Transfer paper 10.1.7, formerly methodology transfer paper 7.
Case Study Evaluations. Transfer paper 10.1.9, formerly methodology transfer paper 9.
Prospective Evaluation Methods: The Prospective Evaluation Synthesis. Transfer
paper 10.1.10, formerly methodology transfer paper 10.
U.S. GOVERNMENT PRINTING OFFICE: 1995 404-741/20012