U.S. General Accounting Office, Program Evaluation and Methodology Division. Developing and Using Questionnaires. Washington, DC; 1993.

Chapter 3
Designing the Sample or Population for Data Collection

Along with deciding what to ask, evaluators must decide whom to ask. The people questioned must have the information the evaluators want, they must be readily identifiable and accessible, they must be willing and able to answer, and they must be representative of the population being measured. They can be migrant workers, prisoners, police, scientists, medical doctors, commanders or soldiers, inner-city African American youths, or government officials.

Ideally, everyone in the population should be questioned, and this is sometimes done when the population is very small. But usually the best that can be done is to take a sample of these people and generalize the findings to the population they come from.

In theory, to generalize findings, evaluators must first define the population. Then they should enumerate every unit in the population so that every unit has an equal chance of being selected for the sample. In practice, it may be unrealistic to expect to enumerate every unit in a real population (for example, all persons who participated in a government program such as Head Start), but the enumeration must be reasonably complete and accurate and reasonably representative of the actual population. The evaluators must then draw a representative sample from this population.

Survey Population

However, the sample cannot be determined or drawn until the evaluators have studied the size and characteristics of the population they want to know about. All too often, this step in questionnaire development is overlooked or assumed to be routine. Then, when the questionnaire is complete and ready to be mailed, the team is faced with weeks of hard research, or a major redesign, because the sample was not well founded.

The first step in defining the survey population is to learn about the population distribution: the major categories of units and the number of units in each category. For example, if the evaluators want to sample banks, they should learn the differences between county, regional, statewide, branch, and unit banks; they should know geographic location factors and understand the basis for classifying banks as very large, large, medium, and small. If they are studying unit commanders in the armed services, they should know the unit sizes and types and the variations among the services. This research will help in designing sampling factors, such as the choice of strata and the size of each stratum, and will help ensure a representative sample.
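Knowing the number of units in each category is what makes stratified allocation possible. The sketch below shows one common rule, proportional allocation, under stated assumptions: the bank counts per size class are invented for illustration, and the function name proportional_allocation is hypothetical, not a GAO procedure.

```python
def proportional_allocation(stratum_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Split a total sample of n units across strata in proportion to stratum size.

    Uses the largest-remainder method so the allocations sum exactly to n.
    """
    total = sum(stratum_sizes.values())
    raw = {h: n * size / total for h, size in stratum_sizes.items()}
    alloc = {h: int(r) for h, r in raw.items()}  # floor of each stratum's share
    shortfall = n - sum(alloc.values())
    # Hand the leftover units to the strata with the largest fractional remainders.
    for h in sorted(raw, key=lambda h: raw[h] - alloc[h], reverse=True)[:shortfall]:
        alloc[h] += 1
    return alloc

# Hypothetical counts of banks by size class (illustrative only).
banks = {"very large": 50, "large": 200, "medium": 1_250, "small": 8_500}
print(proportional_allocation(banks, 400))
# {'very large': 2, 'large': 8, 'medium': 50, 'small': 340}
```

With these made-up counts only 2 very large banks would be drawn, which is one reason evaluators often oversample small but important strata (disproportionate allocation) and weight the results back afterward.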

Once the evaluators are familiar with the characteristics of the population, they can look for sources that enumerate each unit in the population or develop a reasonable theory for selecting the sampling units. The enumeration should be accurate, up to date, and organized to reflect the distribution characteristics. Sometimes this task is relatively easy. For example, in one project we needed to assess the effect that the Foreign Corrupt Practices Act had on U.S. business. The act prohibits payments to foreign officials if the purpose is to influence business. The population was the U.S. companies that conduct most of the country's foreign business. These companies were readily identified because they were among the Fortune 1,000 companies. All we had to do was buy this list from Fortune magazine. The list ranked the companies by sales volume and provided information on each company's activities and the name and address of both the chief executive officer and the chairman of the board. However, for many other projects, considerable effort is needed to document the survey population.

In practice, evaluators rarely have a list of the current population; at best they have a list that was accurate when the source material was compiled. By the time the questionnaire is administered, some units will have left the population and others will have joined it. For example, in the Fortune 1,000 evaluation, 6 percent of the firms had left the population, and we did not know how many had joined it. The sample analysis must evaluate, and make statistical adjustments for, the losses. Whenever possible, the effect of the additions should also be considered.
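One simple way to adjust for losses is sketched below, assuming a simple random sample from the list and assuming that units found to have left are treated as out of scope. The function name and all figures are hypothetical; the chapter does not say which adjustment GAO used.

```python
def adjust_for_frame_losses(frame_size: int, sample_size: int, out_of_scope: int) -> tuple[float, float]:
    """Estimate the in-scope population when some sampled units have left it.

    Assumes a simple random sample from the list. Returns
    (estimated in-scope population size, base weight per in-scope respondent).
    """
    in_scope = sample_size - out_of_scope
    est_population = frame_size * in_scope / sample_size  # list size times in-scope share
    base_weight = frame_size / sample_size  # each sampled unit stands for N / n list units
    return est_population, base_weight

# Hypothetical: 1,000 firms on the list, 200 sampled, 12 (6 percent) found to have left.
pop, w = adjust_for_frame_losses(1_000, 200, 12)
print(pop, w)  # 940.0 5.0
```

This handles only losses. Units that joined after the list was compiled never appear in the sample, so their effect has to be assessed from some outside source.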

The best way to start enumerating a population is to talk to experts in the field and search likely organizations, archives, directories, libraries, and management information systems until a reliable source has been found. Then the sampling units or population elements are organized, reorganized, or indexed into groups or frames so that they can be reached by a random, systematic, or prescribed process. For example, in one evaluation, we had to locate retired military users of military medical facilities. From a Department of Defense archival database we were able to get the names and addresses of all the retired military personnel, but we had no way of knowing whether they were users of a particular medical facility. Our fieldwork showed that retired military personnel were likely to travel up to 40 miles to use hospital services; if they lived farther away, they usually made other arrangements. So we developed a computer program, based on ZIP codes, that matched persons to the hospitals within 40 miles of their homes.
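The ZIP-code matching step might look roughly like the sketch below. It assumes each ZIP code is represented by a single centroid latitude and longitude and uses straight-line (haversine) distance; the original program may well have worked differently. ZIP labels, coordinates, and hospital names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def match_to_hospitals(person_zips, zip_centroids, hospitals, max_miles=40.0):
    """Map each person to every hospital within max_miles of their ZIP code centroid."""
    matches = {}
    for person, zip_code in person_zips.items():
        lat, lon = zip_centroids[zip_code]
        matches[person] = [
            name for name, (hlat, hlon) in hospitals.items()
            if miles_between(lat, lon, hlat, hlon) <= max_miles
        ]
    return matches

# Hypothetical data: placeholder ZIP labels and made-up coordinates.
zip_centroids = {"zip_a": (38.9, -77.0), "zip_b": (41.5, -73.0)}
hospitals = {"H1": (39.0, -77.1), "H2": (40.0, -75.2)}
retirees = {"retiree_1": "zip_a", "retiree_2": "zip_b"}
print(match_to_hospitals(retirees, zip_centroids, hospitals))
# {'retiree_1': ['H1'], 'retiree_2': []}
```

Straight-line distance understates road distance, so a real application might use a slightly smaller radius or driving distances; that choice is an assumption, not something the chapter specifies.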

In a study of zoning problems encountered by group homes for the mentally disabled, we discovered that there was no national register of group homes. Because the study was meant to determine whether restrictive zoning practices were geographically widespread, we sampled catchment areas. We then called the directors of the sampled catchment areas, obtained the names and addresses of every group home in each area, and sent the group home directors a questionnaire asking about their zoning problems.
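Sampling catchment areas and then taking every group home within them is a form of one-stage cluster sampling. The sketch below assumes the areas were drawn by simple random sampling and shows the standard expansion estimator of a population total; the area names, home lists, and counts are invented, and the helper names are hypothetical.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """One-stage cluster sample: draw m clusters at random, then take every unit in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # sorted() gives a reproducible list to sample from
    units = [unit for c in chosen for unit in clusters[c]]
    return chosen, units

def estimate_total(sampled_cluster_totals, num_clusters):
    """Expansion estimate of a population total: (M / m) times the sum over sampled clusters."""
    return num_clusters / len(sampled_cluster_totals) * sum(sampled_cluster_totals)

# Hypothetical catchment areas and the group homes each director reported.
areas = {
    "area_01": ["home_1", "home_2"],
    "area_02": ["home_3"],
    "area_03": ["home_4", "home_5", "home_6"],
    "area_04": [],
}
chosen, homes = cluster_sample(areas, 2, seed=7)

# If 4 of 100 areas were sampled and their homes reported 3, 0, 5, and 2 zoning problems:
print(estimate_total([3, 0, 5, 2], num_clusters=100))  # 250.0
```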

Sometimes, no matter how hard the search, no archival data or records can be found from which to develop a population list. When this happens, the best thing to do is to look for groups, sections, or clusters of files or lists that contain the information. Alternatively, the evaluators may use existing data to infer some ratio or relationship associated with the population. For example, if they want to define the population of general aviation flight-service airport specialists, they may be able to draw on previous work or on pilot or survey studies. From previous experience, they may estimate that the average number of specialists per airport is 16, multiply 16 specialists by the 316 airports, and estimate the population at about 5,000.
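Written out, the ratio estimate in the airport example is the calculation below, where r-bar is the average number of specialists per airport taken from earlier studies and M is the number of airports. The estimate is only as good as the assumption that the earlier average still applies to all 316 airports.

```latex
\hat{N} = \bar{r} \times M = 16 \times 316 = 5{,}056 \approx 5{,}000
```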

Unfortunately, in a great many cases, there is neither a population enumeration nor a way to get cluster, unit, or ratio figures. In these cases, the evaluators must try to document the largest possible portion of the most important and most representative cases, or they must develop some reasonable theory for selecting the sampling units. For example, to get a representative list of internal auditors, the evaluators might use the membership list of the Institute of Internal Auditors plus a list of the internal audit departments of the Fortune 1,000 companies. The latter list would be included because most of these companies have internal audit departments.

In one situation, we had to sample major importers and exporters. The available list had over 10,000 entries, almost all of which were too small to be considered major. So we used a combination of a "small world network" and a "snowball" approach. We found an association on the East Coast to which most major mid-Atlantic shippers belonged. We contacted the association and obtained a list of the major shippers and their business volume. This association identified two other shippers' associations, which provided their lists and the names of six more associations. We continued until we had identified all associations and had a list of most of the major shippers. The shippers' associations reviewed our list and estimated that it accounted for 82 percent of the import-export business.
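The snowball procedure is essentially a breadth-first search over referrals: each association contributes its member list and points to further associations, and the search stops when no new associations turn up. A minimal sketch, assuming member lists and referrals are already in hand as dictionaries; all association and shipper names are hypothetical.

```python
from collections import deque

def snowball(seeds, members, referrals):
    """Breadth-first snowball enumeration.

    Starts from the seed associations, collects each one's member shippers, and
    follows its referrals to other associations until no new ones appear.
    """
    seen = set(seeds)
    queue = deque(seeds)
    shippers = set()
    while queue:
        assoc = queue.popleft()
        shippers.update(members.get(assoc, []))
        for ref in referrals.get(assoc, []):
            if ref not in seen:  # contact each association only once
                seen.add(ref)
                queue.append(ref)
    return seen, shippers

# Hypothetical associations, their member shippers, and the associations they named.
members = {"assoc_A": ["s1", "s2"], "assoc_B": ["s2", "s3"], "assoc_C": ["s4"], "assoc_D": ["s5"]}
referrals = {"assoc_A": ["assoc_B", "assoc_C"], "assoc_B": ["assoc_A", "assoc_D"]}
found, shippers = snowball(["assoc_A"], members, referrals)
print(sorted(found), sorted(shippers))
# ['assoc_A', 'assoc_B', 'assoc_C', 'assoc_D'] ['s1', 's2', 's3', 's4', 's5']
```

Using a set for shippers removes firms that belong to more than one association, so no shipper enters the final list twice.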

Many other sources of specialized lists are available, but their reliability varies considerably. For example, major organizations such as the American Medical Association, the National Education Association, and the National Association of Home Builders can provide detailed address lists and population descriptions of their members. However, their cooperation varies with their interest in what the job is about. The cost for lists can range from nothing to a few hundred or several thousand dollars. Although the Bureau of the Census sometimes has useful lists, such as the Census of Manufactures and the Census of Governments, these sources may be out of date. Many commercial sources, such as Reuben H. Donnelley, Polk, and Thomas, sell population lists. Also, some commercial firms such as Dun & Bradstreet sell specialized lists for various users, such as mail order companies. Care must be taken in using these lists because their quality varies considerably and very little may be known about how they were developed, what bias is built into them, or what they include and, more importantly, exclude.

Before using a list, it is a good idea to review and perhaps test it. For example, in a sample survey of farmers, the address list was developed from a list of subscribers to the Farm Home Journal. The list turned out to be several years old, and many of the subscribers were not farmers in the technical sense but people who sold or bought agricultural equipment or products or who were interested in rural living.
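One way to test a list is to draw a small random pilot sample from it, verify each entry by hand, and estimate what share of the list is actually in the target population. The sketch below assumes a list of 5,000 entries and a pilot of 100, both hypothetical, and uses a Wilson score interval for the share; it ignores the finite population correction, which is reasonable when the pilot is a small fraction of the list.

```python
import math
import random

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Observed share plus a Wilson score interval (z = 1.96 gives roughly 95 percent confidence)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, center - half, center + half

# Hypothetical list; in practice each pilot entry is checked by telephone or mail.
entries = [f"entry_{i}" for i in range(5000)]
pilot = random.Random(11).sample(entries, 100)

# Suppose checking the 100 pilot entries confirms that 58 are working farmers.
p, lo, hi = wilson_interval(58, len(pilot))
print(f"{p:.2f} ({lo:.2f} to {hi:.2f})")  # 0.58 (0.48 to 0.67)
```

A result like this would signal that roughly 4 in 10 entries are out of scope, which affects both the number of questionnaires to mail and whether the list should be used at all.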

Selecting the Sample

Once the population has been enumerated and the evaluators are sure that it represents the population to which they want to generalize, they are ready to draw the sample.

The sample must be drawn by a procedure that ensures random selection. The sample size must be large enough to provide the degree of measurement precision and accuracy generally accepted by the scientific community, and this must be done efficiently and cost-effectively. In many instances, accomplishing this will require the assistance of a sampling statistician who has the appropriate technical skills and practical experience.1
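As a rough illustration of these two requirements, the sketch below computes a sample size for estimating a proportion using Cochran's formula with a finite population correction, then draws a simple random sample from the list. The 5 percentage point margin, 95 percent confidence (z = 1.96), worst-case p = 0.5, and the 1,000-unit list are all assumptions for illustration; actual designs should come from a sampling statistician.

```python
import math
import random

def sample_size_for_proportion(N: int, margin: float = 0.05, p: float = 0.5, z: float = 1.96) -> int:
    """Cochran's sample size for a proportion, with a finite population correction."""
    n0 = z**2 * p * (1 - p) / margin**2  # size needed for an effectively infinite population
    n = n0 / (1 + (n0 - 1) / N)          # shrink it for a population of N units
    return math.ceil(n)

def draw_srs(frame: list, n: int, seed=None) -> list:
    """Simple random sample without replacement: every unit has the same chance of selection."""
    return random.Random(seed).sample(frame, n)

frame = [f"unit_{i:04d}" for i in range(1000)]  # hypothetical list of 1,000 units
n = sample_size_for_proportion(len(frame))
sample = draw_srs(frame, n, seed=2024)
print(n, len(set(sample)))  # 278 278
```

Recording the seed makes the draw reproducible, so reviewers can confirm which units were selected.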

1 See U.S. General Accounting Office, Using Statistical Sampling, GAO/PEMD-10.1.6 (Washington, D.C.: May 1992). This paper provides a thorough treatment of this topic.

Nonstatistical Sampling

Questionnaires may be used on projects in which statistical sampling is not used, so we need to consider briefly other ways in which evaluators select cases (Deming, 1960). Either all the cases can be studied (that is, a census can be taken), or part of the population can be selected in a nonstatistical manner. When evaluators take part of the population, they usually do so for a reason. They may be doing a case study, in which case they select one or more cases that provide the best opportunity to observe the phenomena or relationships of interest, and they do not need to generalize their findings to the population. In other situations, the evaluators know very little about the population and cannot draw a statistical sample, so they arbitrarily select as many cases as they can and report the findings. In many situations, however, evaluators want to generalize and know something about the population, but it is simply not feasible to draw a statistical sample. So they pick a sample that they hope will correspond, in its features, to the population, even though they know they will not be able to use the powerful reasoning associated with statistical samples. An important category of nonstatistical sampling is "judgment sampling."

A judgment sample draws its name from the fact that, in the judgment of the evaluator, the cases chosen correspond to certain aspects of the population. The cases may be selected because they are judged most typical, because they represent the extremes, because they represent a known part of the population, or because they simulate or act as a proxy for a representative sample from the population. For example, we could interview all the chief executive officers of Fortune 500 companies in New York and Chicago because we believe that this sample is typical of chief executive officers in large companies. We could study selected group homes for the mentally disabled in California, Mississippi, New York, and Texas because these states represent the extremes of the laws and practices. We could study 60 prime contractors with the Department of Defense in California and New York because these contractors account for 82 percent of all defense contracts. We might pick 15 airports in 11 states such that the sample would be similar to the population of airports with respect to size, geographic coverage, and weather conditions.

As a rule, judgment sampling is ill-advised in a project whose intent is to generalize, because the arguments supporting generalization are far less persuasive than those for statistical samples. However, occasions may arise (as with a very homogeneous population) in which the situation is not altogether bleak.

When the validity of the findings depends on the extent to which they can be generalized to the population, and there is no statistical sample, a rule of thumb for comparing judgment samples with statistical samples may help. One way to picture the relationship between statistical samples and judgment samples with respect to representativeness is to imagine a credibility scale from 1 to 10. Assume that a score of 1 is the value given to a single case study designed with no intent whatsoever to generalize, and 10 is the credibility associated with studying the whole population. A very large, statistically valid random sample might yield a value of 9. Large, medium, and very small but statistically valid random samples might yield scores of 8, 7, and 6, respectively. If we conducted many case studies but did not take a random sample, we might get a value of 4. We might raise this value to 5 if the groups were large enough to provide statistical certainty within their limited area of selection or if the population was very homogeneous. We might get the same score of 5 if we selected a number of cases that represented the range of conditions and circumstances that apply to the population. (Incidentally, this is how pretest candidates are selected, because there is neither time nor resources to draw a statistically valid sample.) However, the score would drop to 3 or even 2 if we selected many or few cases without considering whether they represented the expected range of conditions.

A few years ago, we did a review of the elderly in which we selected thousands of cases at random from the same city. This might have been acceptable, from a generalization viewpoint, if we had been measuring conditions associated with cholesterol levels, since these levels could be presumed similar for most U.S. city dwellers. However, in this review we were concerned about programs and their effects, which may have varied from city to city. Thus, limiting the sample to one city precluded generalizations beyond the city that was studied. Another example involved a population of 132 health maintenance organizations. We arbitrarily picked 16 of these organizations and collected data from hundreds of people in each one. In the end, what we came up with was a set of 16 case studies. Although the sample for each case study was representative of the people in that organization, the 16 case studies together permitted only very cautious and limited findings. We might have had a much more powerful evaluation at a fraction of the cost if we had taken a random sample of organizations and looked at fewer cases within each organization.
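Part of the reason fewer cases in more organizations can be more powerful is the design effect of clustering. The sketch below uses Kish's approximation for equal-sized clusters, deff = 1 + (m - 1) x ICC, where m is the number of people per organization and ICC is the intraclass correlation. The ICC of 0.05 and both designs are assumptions for illustration, and the formula presumes the organizations were chosen at random; an arbitrary selection such as the 16 organizations above adds bias that no variance formula captures.

```python
def effective_sample_size(clusters: int, per_cluster: int, icc: float) -> float:
    """Kish's approximation: n_eff = n / deff, with deff = 1 + (m - 1) * icc for equal cluster sizes m."""
    n = clusters * per_cluster
    deff = 1 + (per_cluster - 1) * icc
    return n / deff

# Two hypothetical designs with the same 4,800 interviews and an assumed ICC of 0.05.
for k, m in [(16, 300), (64, 75)]:
    print(k, m, round(effective_sample_size(k, m, icc=0.05)))
# 16 300 301
# 64 75 1021
```

Under these assumptions, spreading the same interviews across 64 randomly chosen organizations yields more than three times the effective sample size of concentrating them in 16.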