U.S. General Accounting Office, Program Evaluation and Methodology Division. Developing and Using Questionnaires. Washington, DC; 1993.
Chapter 3
Designing the Sample or Population for Data Collection
Along with deciding what to ask, evaluators must decide whom to ask. The people questioned must have the information the evaluators want; they must be readily identifiable and accessible, they must be willing and able to answer, and they must be representative of the population being measured. They can be migrant workers, prisoners, police, scientists, medical doctors, commanders or soldiers, inner-city African American youths, or government officials.
Ideally, everyone in the population should be questioned, and sometimes this is done if
the population is very small. But usually the best that can be done is to take a sample of
these people and generalize the findings to the population they come from.
In theory, to generalize findings, evaluators must first define the population. Then they
should enumerate every unit in the population in such a way that every unit has an equal
chance of being selected for the sample. In practice, it may be unrealistic to expect to
enumerate every unit in a real population (for example, all persons who participated in a
government program such as Head Start), but the enumeration must be reasonably complete
and accurate and be reasonably representative of the actual population. The evaluators
must then draw a representative sample from this population.
Survey Population
However, the sample cannot be determined or drawn until the evaluators have studied the
size and characteristics of the population they want to know about. All too often, this
step in questionnaire development is overlooked or assumed to be routine. Then, when the
questionnaire is complete and ready to be mailed, the team is faced with weeks of hard
research, or a major redesign, because the sample was not well founded.
The first step in defining the survey population is to learn about the population
distribution: the major categories of units and the number of units in each category. For example, if the
evaluators want to sample banks, they should learn the differences between county,
regional, statewide, branch, and unit banks; they should know geographic location factors
and understand the basis for classifying banks as very large, large, medium, and small. If
they are studying unit commanders in the armed services, they should know the unit sizes
and types and the variations among the services. This research will help in designing
sampling factors, such as stratification and stratum sizes, and will help ensure a
representative sample.
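Once the strata are known, one common way to size them is proportional allocation, in which each stratum gets a share of the total sample equal to its share of the population. The sketch below assumes that approach; the bank counts are hypothetical and the function name `proportional_allocation` is ours, not a standard library call.

```python
import math

def proportional_allocation(stratum_sizes: dict, n: int) -> dict:
    """Split a total sample of n units across strata in proportion to stratum size.

    Largest-remainder rounding keeps the allocations summing exactly to n.
    """
    total = sum(stratum_sizes.values())
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the population size")
    raw = {h: n * size / total for h, size in stratum_sizes.items()}
    alloc = {h: math.floor(x) for h, x in raw.items()}
    leftover = n - sum(alloc.values())
    # Hand the leftover units to the strata with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda h: raw[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc

# Hypothetical counts of banks by size class (illustrative only, not real data).
banks = {"very large": 50, "large": 200, "medium": 1_500, "small": 8_250}
print(proportional_allocation(banks, 400))
# {'very large': 2, 'large': 8, 'medium': 60, 'small': 330}
```

Note that proportional allocation puts only 2 very large banks in the sample, too few to say much about that class. Where a small stratum matters, evaluators often oversample it (or take all of it) and weight the results back; a sampling statistician can advise on the trade-off.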
Once the evaluators are familiar with the characteristics of the population, they can look
for sources that enumerate each unit in the population or develop a reasonable theory for
selecting the sampling units. The enumeration should be accurate, up-to-date, and
organized to reflect the distribution characteristics. Sometimes this task is relatively
easy. For example, in one project we needed to assess the effect that the Foreign Corrupt
Practices Act had on U.S. business. The act prohibits payments to foreign officials if the
purpose is to influence business. The population was U.S. companies that conduct most of
the foreign business. These companies were readily identified because they were among the
Fortune 1,000 companies. All we had to do was buy this list from Fortune magazine. The
list ranked the companies by sales volume and provided information on each company's
activities and the name and
address of both the chief executive officer and the chairman of the board. However, for
many other projects, considerable effort is needed to document the survey population.
In practice, evaluators rarely have a list of the real population; at best they have a
list that was accurate when the source material was compiled. By the time the questionnaire is
administered, some units will have left the population and others will have joined it. For
example, in the Fortune 1,000 evaluation, 6 percent of the firms left the population and
we do not know how many may have joined it. The sample analysis must evaluate and make
statistical adjustments for the losses. Whenever possible, the effect of the additions
should also be considered.
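A simple version of the adjustment for losses can be sketched as follows, assuming a simple random sample was drawn from the list so that the share of sampled units found to have left estimates the share for the whole list. The figures are hypothetical (chosen to match a 6 percent loss rate), and the function name is ours. Additions to the population are not handled by this adjustment.

```python
def adjust_for_departures(list_size: int, sampled: int, departed: int):
    """Estimate how many listed units are still in the population.

    Assumes a simple random sample was drawn from the list, so the share of
    sampled units found to have left estimates the share for the whole list.
    Units that joined the population after the list was compiled are not
    captured by this adjustment.
    """
    still_in = sampled - departed
    estimated_population = list_size * still_in / sampled
    base_weight = list_size / sampled  # listed units each sampled unit stands for
    return estimated_population, base_weight

# Hypothetical figures: 15 of 250 sampled firms (6 percent) have left.
print(adjust_for_departures(1_000, 250, 15))  # (940.0, 4.0)
```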
The best way to start enumerating a population is to talk to experts in the field and
search out likely organizations, archives, directories, libraries, and management
information systems until a reliable source has been discovered. Then the sampling units
or population elements are organized, reorganized, or indexed into groups or frames, so
they can be reached by a random, systematic, or prescribed process. For example, in one
evaluation, we had to locate retired military users of military medical facilities. From a
Department of Defense archival database, we were able to get the names and addresses of
all the retired military personnel, but we had no way of knowing whether they were users of a
particular medical facility. Our field work showed that retired military were likely to
travel up to 40 miles to use hospital services; if they lived farther away, they usually
made other arrangements. So we developed a computer program, based on zip codes, that
matched persons to the hospitals that were within 40 miles of their homes.
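The original program is not described in detail. The sketch below shows one way such zip-code matching could work, assuming each ZIP code is represented by a single centroid and that straight-line (great-circle) distance is an acceptable stand-in for travel distance. The ZIP keys and coordinates are hypothetical approximations, and `match_to_hospitals` is an illustrative name.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def match_to_hospitals(persons, hospitals, zip_centroids, max_miles=40.0):
    """For each person, list the hospitals whose ZIP centroid is within max_miles."""
    matches = {}
    for person_id, person_zip in persons.items():
        plat, plon = zip_centroids[person_zip]
        matches[person_id] = [
            name
            for name, hosp_zip in hospitals.items()
            if haversine_miles(plat, plon, *zip_centroids[hosp_zip]) <= max_miles
        ]
    return matches

# Hypothetical ZIP keys with approximate centroids (roughly Washington, D.C.,
# Baltimore, and Philadelphia); a real run would use a full ZIP centroid file.
zip_centroids = {"Z1": (38.90, -77.04), "Z2": (39.29, -76.61), "Z3": (39.95, -75.17)}
persons = {"p1": "Z1", "p2": "Z3"}
hospitals = {"Hospital A": "Z2", "Hospital B": "Z3"}
print(match_to_hospitals(persons, hospitals, zip_centroids))
# {'p1': ['Hospital A'], 'p2': ['Hospital B']}
```

Road distances would usually be longer than great-circle distances, so a 40-mile straight-line cutoff is somewhat generous; the field work finding, not the code, should set the threshold.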
In a study of zoning problems encountered by group homes for the mentally disabled, we
discovered that there was no national register of group homes. Since this was a study to
see whether restrictive zoning practices were geographically widespread, we sampled
catchment areas. We then called up the catchment area directors and got the names and
addresses of every group home in each catchment area and sent the group home directors a
questionnaire asking about their zoning problems.
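Sampling catchment areas and then surveying every group home within each sampled area is a form of one-stage cluster sampling. A minimal sketch, assuming the areas are drawn by simple random sampling; the area and home names and the problem counts are invented for illustration.

```python
import random
from typing import Optional

def cluster_sample(clusters: dict, n_clusters: int, seed: Optional[int] = None) -> dict:
    """One-stage cluster sample: pick clusters at random, then keep every unit in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: list(clusters[c]) for c in chosen}

def estimate_total(cluster_totals: list, total_clusters: int) -> float:
    """Expansion estimate of a population total when clusters were drawn by simple
    random sampling: (M / m) times the sum of the sampled cluster totals."""
    m = len(cluster_totals)
    return total_clusters * sum(cluster_totals) / m

# Hypothetical catchment areas and the group homes their directors report.
areas = {
    "area-01": ["home-a", "home-b"],
    "area-02": ["home-c"],
    "area-03": ["home-d", "home-e", "home-f"],
    "area-04": [],
    "area-05": ["home-g"],
    "area-06": ["home-h", "home-i"],
}
sample = cluster_sample(areas, 3, seed=7)
# Suppose the homes in the sampled areas report 2, 0, and 4 zoning problems
# (invented numbers); with 6 areas in all, the estimated total is 6 * 6 / 3 = 12.
print(estimate_total([2, 0, 4], total_clusters=len(areas)))  # 12.0
```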
Sometimes, no matter how hard the search, archival data or records cannot be found from
which to develop a population. When this happens, the best thing to do is to look for
groups, sections, or clusters of files or lists that contain the information. Or the
evaluators may want to look at existing data to surmise some ratio or relationship
associated with the population. For example, if they want to define the population of
general aviation flight-service airport specialists, they may be able to use previous work
or pilot or survey studies. From previous experience, they may find that the average number
of specialists per airport is 16; multiplying 16 specialists by the 316 airports gives an
estimated population of about 5,000.
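The ratio estimate above is a single multiplication, but it is worth seeing how sensitive it is to the assumed ratio. The plus-or-minus 2 range below is a hypothetical choice for illustration, not a figure from earlier studies.

```python
def ratio_estimate(avg_per_site: float, n_sites: int) -> float:
    """Population size estimated as (average units per site) x (number of sites)."""
    return avg_per_site * n_sites

print(ratio_estimate(16, 316))  # 5056, i.e., "about 5,000"

# Hypothetical sensitivity check: how far the estimate moves if the
# per-airport average taken from earlier studies is off by 2 either way.
for avg in (14, 16, 18):
    print(avg, ratio_estimate(avg, 316))
# 14 4424
# 16 5056
# 18 5688
```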
Unfortunately, in a great many cases, there is neither a population enumeration nor a
way to get cluster, unit, or ratio figures. In these cases, the evaluators must try to
document the biggest possible portion of the most important and most representative cases,
or they must develop some reasonable theory for selecting the sampling units. For example,
to get a representative list of internal auditors, the evaluators might use the membership
list for the Institute of Internal Auditors plus a list of the internal audit departments
for the Fortune 1,000 companies. The latter would be included because most of them have
internal audit departments.
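When two lists are combined, an organization that appears on both would have twice the chance of selection unless duplicates are removed. The sketch below assumes the sampling unit is the organization (so membership entries are keyed by employer name) and uses crude name matching; real list matching usually also compares addresses or identifiers. All names are hypothetical.

```python
import re

# Crude matching key; real list matching usually also compares addresses or IDs.
SUFFIXES = re.compile(r"\b(inc|corp|corporation|co|company)\b")

def normalize(name: str) -> str:
    """Lowercase, drop punctuation and common corporate suffixes."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(SUFFIXES.sub(" ", name).split())

def combine_lists(*sources: list) -> list:
    """Union of several source lists, keeping the first spelling of each organization
    so that no organization appears twice (which would double its selection chance)."""
    seen = {}
    for source in sources:
        for entry in source:
            seen.setdefault(normalize(entry), entry)
    return list(seen.values())

# Hypothetical entries: employers from a membership list and Fortune 1,000 names.
members = ["Acme Corp.", "Beta Inc", "Delta Credit Union"]
fortune = ["ACME Corporation", "Gamma Co"]
print(combine_lists(members, fortune))
# ['Acme Corp.', 'Beta Inc', 'Delta Credit Union', 'Gamma Co']
```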
In one situation, we had to sample major importers and exporters. The available list had
over 10,000 entries, almost all of which were too small to be considered major. So we used
a combination of a "small world network" and a "snowball" approach. We
found an association on the East Coast to which most major mid-Atlantic shippers
belonged. We contacted the association and obtained a list of the major shippers and their
business volume. This association identified two other shippers' associations, which
provided their lists and the names of six more associations. We continued until we had
identified all associations and had a list of most of the major shippers. The shippers'
associations reviewed our list and estimated that it accounted for 82 percent of the
import-export business.
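The snowball process described above is essentially a breadth-first search over referrals: contact each known association, add any new associations it names, and stop when no new names turn up. A minimal sketch, in which `get_referrals` stands in (hypothetically) for asking each association and the referral pattern is invented to mirror the account above.

```python
from collections import deque

def snowball(seeds, get_referrals):
    """Breadth-first snowball discovery: start from known seeds and keep
    following referrals until no new names turn up."""
    found = list(seeds)
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for ref in get_referrals(current):
            if ref not in seen:
                seen.add(ref)
                found.append(ref)
                queue.append(ref)
    return found

# Hypothetical referral pattern: the first association names two others,
# which together name six more; later contacts only name known groups.
referrals = {"A": ["B", "C"], "B": ["D", "E", "F", "A"], "C": ["G", "H", "I"], "D": ["B"]}
print(snowball(["A"], lambda assoc: referrals.get(assoc, [])))
# ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']
```

A snowball list is not a probability sample of its own; its coverage still has to be judged externally, as the shippers' associations did here.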
Many other sources of specialized lists are available, but their reliability varies
considerably. For example, major organizations such as the American Medical Association,
the National Education Association, and the National Association of Home Builders can
provide detailed address lists and population descriptions of their members. However,
their cooperation varies with their interest in the purpose of the study. The cost of lists
can range from nothing to a few hundred or several thousand dollars. Although the
Bureau of the Census sometimes has useful lists, such as the census of manufactures and
the census of governments, these sources may be out of date. Many commercial sources, such
as Reuben H. Donnelley, Polk, and Thomas, sell population lists. Also, some commercial
firms such as Dun & Bradstreet sell specialized lists for various users, such as mail
order companies. Care must be taken in using these lists because their quality varies
considerably and very little may be known about the bias built into them, how they were
developed, or what they include and, more importantly, exclude.
Before using a list, it is a good idea to review and perhaps test it. For example, in a
sample survey of farmers, the address list was developed from a list of subscribers to the
Farm Home Journal. The list turned out to be several years old, and many of the subscribers
were not farmers in the technical sense but people who sold or bought agricultural
equipment or products or who were interested in rural living.
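One way to test a list before relying on it is to verify a small random subsample of entries by hand and estimate the share that actually belong to the target population. The sketch below uses a Wilson score interval to show how uncertain that share is; the interval choice, the subsample size of 50, and the `is_eligible` check (which in practice would be a phone call or record check) are all assumptions for illustration, as is the subscriber data.

```python
import math
import random
from typing import Optional

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return center - half, center + half

def check_list(entries, is_eligible, k=50, seed: Optional[int] = None):
    """Verify a random subsample of k entries and estimate the eligible share."""
    checked = random.Random(seed).sample(entries, k)
    hits = sum(1 for e in checked if is_eligible(e))
    return hits / k, wilson_interval(hits, k)

# Hypothetical subscriber list in which roughly a third are working farmers.
subscribers = [{"id": i, "farmer": i % 3 == 0} for i in range(300)]
share, (low, high) = check_list(subscribers, lambda s: s["farmer"], k=50, seed=1)
print(f"eligible share {share:.2f}, 95% interval {low:.2f} to {high:.2f}")
```

A low eligible share is a signal to look for a better source, or at least to plan for screening out ineligible entries and enlarging the sample accordingly.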
Selecting the Sample
Once the population has been enumerated and the evaluators are sure that it represents the
population to which they want to generalize, they are ready to draw the sample.
The sample must be drawn in accordance with a procedure that ensures random selection. The sample size must be large enough to provide the degree of measurement precision and accuracy generally accepted by the scientific community, and the sample must be designed and drawn efficiently and cost-effectively. In many instances, accomplishing this will require the assistance of a sampling statistician who has the appropriate technical skills and practical experience.1
1 See U.S. General Accounting Office, Using Statistical Sampling, GAO/PEMD-10.1.6 (Washington, D.C.: May 1992). This paper provides a thorough treatment of this topic.
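As a rough illustration of sizing and drawing a random sample, the sketch below uses the textbook sample-size formula for estimating a proportion under simple random sampling, with a finite population correction, and then draws the sample with Python's `random.Random.sample`. It assumes 95 percent confidence, a margin of plus or minus 5 percentage points, the conservative p = 0.5, and a population of 5,056 (16 x 316, from the airport example); the frame itself is hypothetical. Real designs (stratified, clustered, with expected nonresponse) need more than this, which is where the sampling statistician comes in.

```python
import math
import random
from typing import Optional

def sample_size_for_proportion(N: int, margin: float = 0.05, z: float = 1.96, p: float = 0.5) -> int:
    """Simple-random-sample size to estimate a proportion within +/- margin.

    n0 = z^2 p (1 - p) / margin^2, then the finite population correction
    n = n0 / (1 + (n0 - 1) / N). p = 0.5 is the most conservative choice.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / N))

def draw_srs(frame: list, n: int, seed: Optional[int] = None) -> list:
    """Draw n distinct units from the frame, each with the same chance of selection."""
    return random.Random(seed).sample(frame, n)

n = sample_size_for_proportion(N=5_056)
print(n)  # 358
frame = [f"specialist-{i}" for i in range(5_056)]  # hypothetical frame
chosen = draw_srs(frame, n, seed=2024)
```

Without the finite population correction the same settings call for 385 cases; the correction matters more as the sample becomes a larger fraction of the population.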
Nonstatistical Sampling
Questionnaires may be used on projects in which statistical sampling is not used, so we
need to consider briefly other ways in which evaluators select cases (Deming, 1960).
Either all the cases can be studied (that is, a census can be taken), or part of the
population can be selected in a nonstatistical manner. When evaluators take part of the
population, they usually do so for a reason. It may be that they are doing a case study,
so they select one or more cases that provide the best opportunity to observe the
phenomena or relationships of interest, and they do not need to generalize their findings
to the population. In other situations, the evaluators know very little about the
population and cannot draw a statistical sample, so they arbitrarily select as many cases
as they can and report the findings. However, in many situations, evaluators want to
generalize and they know something about the population but it is just not feasible to
draw statistical samples. So they pick a sample that they hope will correspond, in its
features, to the population, even though they know they will not be able to use the
powerful reasoning associated with statistical samples. An important category of
nonstatistical sampling is "judgment sampling."
A judgment sample takes its name from the fact that, in the evaluator's judgment, the cases chosen correspond to certain aspects of the population. The cases may be selected because they are judged most typical, because they represent the extreme ranges, because they represent a known part of the population, or because they simulate or act as a proxy for a representative sample from the population. For example, we could interview all the Fortune 500 chief executive officers in New York and Chicago because we believe that this sample is typical of chief executive officers in large companies. We could study selected group homes for the mentally disabled in California, Mississippi, New York, and Texas, because these states represent the extremes of the laws and practices. We could study 60 prime contractors with the Department of Defense in California and New York, because these contractors account for 82 percent of all defense contracts. We might pick 15 airports in 11 states so that the sample would be similar to the population of airports with respect to size, geographic coverage, and weather conditions.
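When a judgment sample is picked to resemble the population on certain features, as in the airport example, the resemblance can at least be checked feature by feature. The sketch below compares category shares in the population and the sample and reports the largest gap. The size-class counts are hypothetical, and a small gap shows resemblance on that one characteristic only; it does not give a judgment sample the properties of a statistical sample.

```python
from collections import Counter

def composition_gap(population, sample, key) -> float:
    """Largest absolute difference between population and sample shares across
    the categories produced by key(). A small gap shows resemblance on this one
    characteristic only; it does not make a judgment sample statistically valid."""
    pop = Counter(key(u) for u in population)
    smp = Counter(key(u) for u in sample)
    return max(
        abs(pop[c] / len(population) - smp[c] / len(sample))
        for c in set(pop) | set(smp)
    )

# Hypothetical airport size classes: 300 airports vs. a judgment sample of 15.
population = ["large"] * 30 + ["medium"] * 90 + ["small"] * 180
judgment_sample = ["large"] * 2 + ["medium"] * 4 + ["small"] * 9
print(round(composition_gap(population, judgment_sample, key=lambda a: a), 3))  # 0.033
```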
As a rule, the use of judgment sampling in a project in which the intent is to
generalize is ill-advised, because arguments to support generalization cannot be nearly as
persuasive as with statistical samples. However, occasions may arise (as with a very
homogeneous population) in which the situation is not altogether bleak.
When the validity of the findings depends on the extent to which they can be generalized
to the population, and when there is no statistical sample, it may help to have a rule of
thumb for comparing judgment samples with statistical samples. One way to
picture the relationship between statistical samples and judgment samples with respect to
representativeness might be to imagine a credibility scale from 1 to 10. Assume that a
score of 1 is the value given to a single case study designed with no intent whatsoever to
generalize, and 10 is the credibility associated with studying the whole population. A
very large, statistically valid random sample might yield a value of 9. A large, medium,
and very small but statistically valid random sample might yield respective scores of 8,
7, and 6. If we made many case studies but did not take a random sample, we might get a
value of 4. We might extend this value to 5 if the groups were large enough to provide
statistical certainty within their limited area of selection or if the population was very
homogeneous. We might get the same score of 5 if we selected a number of cases that
represented the range of conditions and circumstances that apply to the population.
(Incidentally, this is how pretest candidates are selected, because there is neither time
nor resources to draw a statistically valid sample.) However, the score would drop to 3 or
even 2 if we selected many or few cases without considering whether they represented the
expected range of conditions.
A few years ago, we did a review of the elderly in which we selected thousands of cases at
random from a single city. This might have been acceptable, from a generalization
viewpoint, if we were measuring the conditions associated with cholesterol levels; these
levels could be presumed similar for most U.S. city-dwellers. However, in this review, we
were concerned about programs and their effects, which may have varied from city to city.
Thus, limiting the sample to one city prohibited generalizations beyond the city that was
studied. Another example involved a population of 132 health maintenance organizations. We
arbitrarily picked 16 of these organizations and collected data from hundreds of people in
each one. In the end, what we came up with was a set of 16 case studies. Although the
sample for each case study was representative of the population of people in one of the
132 health maintenance organizations, the 16 case studies together permitted only very
careful and limited findings. We might have had a much more powerful evaluation at a
fraction of the cost if we had taken a random sample of organizations and looked at fewer
cases within each organization.
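Why fewer cases in more organizations can be the better buy is often explained with the usual approximate design effect for equal-size clusters, 1 + (m - 1) times rho, where m is the number of people interviewed per organization and rho is the intraclass correlation (how alike people in the same organization are). The sketch assumes rho = 0.05 and 300 interviews per organization, both hypothetical stand-ins for "hundreds of people." The formula presumes the organizations were drawn at random; for 16 organizations picked arbitrarily, no such calculation supports generalizing to all 132.

```python
def design_effect(cluster_size: int, icc: float) -> float:
    """Approximate design effect for equal-size clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc

def effective_n(n_clusters: int, cluster_size: int, icc: float) -> float:
    """Number of independent observations the clustered sample is roughly worth."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)

# Hypothetical comparison, assuming rho = 0.05 and a random sample of organizations.
print(round(effective_n(16, 300, 0.05)))  # 301: 16 orgs x 300 people = 4,800 interviews
print(round(effective_n(60, 40, 0.05)))   # 814: 60 orgs x 40 people = 2,400 interviews
```

Under these assumptions, half as many interviews spread over more organizations yields more than twice the effective sample, which is the point the example above makes in words.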