United States General Accounting Office
GAO
Program Evaluation and Methodology Division
October 1993
Developing and Using Questionnaires
Preface
GAO assists congressional decisionmakers in their decisionmaking process by furnishing
analytical information on issues and options under consideration. Many diverse
methodologies are needed to develop sound and timely answers to the questions that are
posed by the Congress. To provide GAO evaluators with basic information about the more
commonly used methodologies, GAO's policy guidance includes documents such as methodology
transfer papers and technical guidelines.
The purpose of this methodology transfer paper is to provide evaluators with a background that is of sufficient depth to use questionnaires in their evaluations. Specifically, this paper provides rationales for determining when questionnaires should be used to accomplish assignment objectives. It also describes how to plan, design, and use a questionnaire in conducting a population survey. We do not expect GAO evaluators to become experts after reading this paper. But we do hope that they will become familiar enough with questionnaire design guidelines to plan and use a questionnaire; to make preliminary designs and assist in many development and testing tasks; to communicate the questionnaire requirements to the measurement, sampling, and statistical analysis experts; and to ensure the quality of the final questionnaire and the resulting data collection.
The present document is a revision. An earlier version was authored by Brian Keenan and Marilyn Mauch in 1986. This revision, authored by Brian Keenan, includes new material on cognition as well as on a number of developments in pretesting that have occurred since then. As such, the present document supersedes the 1986 version.
Developing and Using Questionnaires is one of a series of papers prepared and
issued by the Program Evaluation and Methodology Division (PEMD). The purpose of the
series is to provide GAO evaluators with guides to various aspects of audit and evaluation
methodology, to illustrate applications, and to indicate where more detailed information
is available.
We look forward to receiving comments from the readers of this paper. They should be
addressed to Eleanor Chelimsky at 202-612-2900.
Werner Grosshans
Assistant Comptroller General
Office of Policy
Eleanor Chelimsky
Assistant Comptroller General
for Program Evaluation and Methodology
Contents
Chapter 1 Using Questionnaires
Overview of Tasks in Using Questionnaires
Deciding to Use Structured Questionnaires
Planning the Questionnaire
Developing the Measures
Designing the Sample
Developing and Testing the Questionnaire
Producing the Questionnaire
Preparing for and Collecting Data
Analyzing Data
Telephone Surveys
Chapter 2 Developing the Measures to Get the Questions
The Questionnaire Framework
Operationalizing the Constructs
Developing Measures From Operationalized Constructs
Specify the Key Variable Relationships
Chapter 3 Designing the Sample or Population for Data Collection
Survey Population
Selecting the Sample
Nonstatistical Sampling
Chapter 4 Formatting the Questions
Open-Ended Questions
Fill-in-the-Blank Questions
Yes-No Questions
"Implied No" Choices
Single-Item Choices
Expanded Yes-No Questions
Free Choices
Multiple-Choice Questions
Ranking and Rating Questions
Guttman Format
Intensity Scale Questions
Semantic Differential Intensity Scales
Intensity Paired-Comparison Scales
Chapter 5 Avoiding Inappropriate Questions
Questions That Are Not Relevant to the Evaluation Goals
Unbalanced Line of Inquiry
Questions That Cannot or Will Not Be Answered Accurately
Questions That Are Not Geared to Respondent's Depth and Range of Information, Knowledge, and Perceptions
Questions That Respondents Perceive as Illogical or Unnecessary
Questions That Require Unreasonable Effort to Answer
Threatening or Embarrassing Questions
Vague or Ambiguous Questions
Unfair Questions
Chapter 6 Writing Clear Questions
Simplify the Word Structure
Be Careful About Words With Several Specific Meanings and Other Problem Words
Do Not Use Abstract Words
Reduce the Complexity of Ideas and Present Them One at a Time in Logical Order
Reduce the Sentence Length
Simplify the Sentence Structure
Use Active and Passive Voice Appropriately
Use Direct, Periodic, and Balanced Styles Appropriately
Avoid Writing Styles That Inhibit Comprehension
Chapter 7 Developing Unscaled Response Lists
Developing Comprehensive Lists
Presenting Mutually Exclusive Categories
Using Relevant and Appropriate Categories
Keeping the Response List Reasonably Short
Using Categories of Appropriate Specificity
Listing Categories in the Logical Order Expected by Respondents
Using a Screening Question
Chapter 8 Minimizing Question Bias and Memory Error
Question Bias
Memory Error
Remembering Frequency and Time of Occurrence
Chapter 9 Minimizing Respondent Bias
Response Styles
Highly Sensitive Items
Chapter 10 Measurement Error and Measurement Scales in Brief
Measurement Scales
Equal-Appearing Intervals
Chapter 11 Organizing the Line of Inquiry
Setting Expectations
Sequencing Questions
Using Subtitles as Cues
Choosing an Opening Question
Obtaining Complex Data
Using Transitional Phrases
Putting Specific Questions Before Overall Judgment Questions
Chapter 12 Following Quality Assurance Procedures
Pretesting
Expert Review
Validation and Verification
Analysis of Questionnaire Nonresponses
Chapter 13 Designing the Questionnaire Graphics and Layout
Instructions
Questionnaire Format Preparation
Typographic Style
Chapter 14 Preparing the Mail Out Package and Collecting and Reducing the Data
Preparation of the Mail-Out Package
Data Collection
Data Reduction
Chapter 15 Analyzing Questionnaire Results
Analysis Plan
Item Responses and Univariate Analysis
Bivariate Analysis and Comparison of Two Groups
Multivariate Analysis and Comparison of Multiple Groups
Choice of Analysis Methods
Chapter 16 Adaptations for the Design and Use of Telephone Surveys
Advantages and Disadvantages of Telephone Surveys
Design Guidelines
Administration
Bibliography
Glossary
Papers in This Series
Table
Table 14.1: The Percentage of Questionnaires That Should Be Randomly Sampled to Determine the Keypunch Error Rate
Figures
Figure 1.1: Typical Completion Times for Major Questionnaire Tasks
Figure 2.1: Operationalized Variable in Question Response Format
Figure 4.1: Fill-in-the-Blank Questions
Figure 4.2: Fill-in-the-Blank Row, Column, and Matrix Formats
Figure 4.3: Yes-No Filter Question
Figure 4.4: Mixed Yes-No and Multiple Choice Question
Figure 4.5: Balanced and Unambiguous Yes-No Question
Figure 4.6: "Implied No" Question
Figure 4.7: Emphasized-No Question
Figure 4.8: Single-Item Choice Question
Figure 4.9: Expanded Yes-No Format
Figure 4.10: Expanded Yes-No Format With Middle Category
Figure 4.11: Expanded Yes-No Format With Escape Choice
Figure 4.12: Multiple-Choice Question
Figure 4.13: Ranking Question
Figure 4.14: Rating Questions
Figure 4.15: Guttman Question
Figure 4.16: Extent Scale and the Expanded Yes-No Scale Questions
Figure 4.17: Extent Scale Converted to Likert Scale Question
Figure 4.18: Likert Question Used to Evaluate Policy
Figure 4.19: Amount Intensity Scale
Figure 4.20: Frequency Intensity Scale
Figure 4.21: Frequency and Amount Intensity Scales With Proportional and Verbal Descriptive Anchors in Addition to the Conventional Adjective and Scale Number Anchors
Figure 4.22: Branching Intensity Scale Format
Figure 4.23: Number-of-Occurrences and Time Interval Formats
Figure 4.24: Semantic Differential Question
Figure 4.25: Intensity Paired Comparison Scale
Figure 5.1: Skip Question
Figure 5.2: Behavior-Oriented Question
Figure 7.1: Question With Comprehensive List of Categories
Figure 7.2: Question With Overlapping Categories
Figure 7.3: Question With Nonoverlapping Categories
Figure 7.4: Tailored Question With Comprehensive Nonoverlapping Categories
Figure 8.1: Biased Question
Figure 8.2: List Divided Into Subgroups to Counter Primacy and Recency Biases
Figure 8.3: "Check All That Apply" Response Format Changed to "Check Yes or No" Format
Figure 8.4: Using Presentation Order to Counteract Expected Bias
Figure 8.5: Complex Question Broken Into Sequence of Questions
Figure 9.1: Question to Reduce Overreporting
Figure 9.2: Question With List of Ranges
Figure 9.3: Series of Indirect Questions
Figure 11.1: Sequence of Questions Obtaining Complex Data
Figure 13.1: Partial Questionnaire
Figure 14.1: Initial Questionnaire Transmittal Letter
Figure 14.2: Questionnaire Follow-Up Letter
Chapter 1
Using Questionnaires
This paper describes how to design and use questionnaires. Such information is important for GAO evaluators for two reasons. First, GAO frequently uses questionnaires to collect data. Second, the questionnaire is a method with a high potential for error if not designed and used properly.
GAO employs questionnaires to ask people for figures, statistics, amounts, and other
facts. We ask them to describe conditions and procedures that affect the work,
organizations, and systems with which they are involved, and we ask for their judgments
and views about processes, performance, adequacy, efficiency, and effectiveness. We ask
people to report past events and to make forecasts, to tell us about their attitudes and
opinions, and to describe their behavior and the behavior of others.
Questionnaires are popular because they can be a relatively inexpensive way of getting
people to provide information. But because they rely on people to provide answers, a
benefit-risk consideration is associated with their use. People with the ability to
observe, select, acquire, process, evaluate, interpret, store, retrieve, and report can be
a valuable and versatile source of information under the right circumstances. However, the
human mind is a very complex and vulnerable observation instrument. And if we do not ask
the right people the right questions in the right way, we will not get high-quality
answers.
This holds true for even the simplest of questions. An easy way to demonstrate this is to
do a simple straw poll, like asking co-workers how they came to work. One may answer
"By way of New York Avenue" or give some other route description. Another answer
to the same question may be "by car pool." If you continued this straw poll,
many of the answers would be unusable if your intent was to learn modes of transportation
to work.
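The straw poll can be made concrete with a short sketch that codes free-text answers into modes of transportation and counts the answers that cannot be coded. The keyword lists, the function names, and the sample answers other than the two quoted above are hypothetical and serve only as an illustration; they are not a GAO coding scheme.

```python
from collections import Counter

# Hypothetical keyword lists for coding free-text answers into modes of
# transportation. A real coding scheme would be developed and pretested.
MODE_KEYWORDS = {
    "car": ("car pool", "carpool", "drove", "car"),
    "transit": ("bus", "subway", "metro", "train"),
    "walk_or_bike": ("walk", "bike", "bicycle"),
}


def code_answer(answer: str) -> str:
    """Return the first mode whose keyword appears in the answer, else 'unusable'."""
    text = answer.lower()
    for mode, keywords in MODE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return mode
    return "unusable"


def tally(answers):
    """Count how many answers fall into each mode, including 'unusable'."""
    return Counter(code_answer(answer) for answer in answers)


if __name__ == "__main__":
    straw_poll = [
        "By way of New York Avenue",  # a route, not a mode
        "by car pool",
        "I took the bus",
    ]
    print(tally(straw_poll))
```

An answer that describes a route rather than a mode lands in the "unusable" count, which is the kind of loss an unstructured question invites.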
[Figure 1.1: Typical Completion Times for Major Questionnaire Tasks]
After describing important factors to consider when deciding to use a questionnaire, we
briefly cover, in the remaining sections of this chapter, the major tasks listed in figure
1.1 and refer to subsequent chapters that provide detailed instructions. We do this to
give an overview of the scope of work required to plan, develop, and implement a
questionnaire and to show what the reader can expect to find in each of the subsequent
chapters. Overall, the organization of this paper parallels the logical sequence of tasks
undertaken when developing and using questionnaires.
Asking good questions in the right way, which is the focus of this paper, is both a science and an art. It is a science in that it uses many scientific principles developed from various fields of applied psychology, sociology, cognitive research, and evaluation research. It is an art because it requires clear and interesting writing and the ability to trade off or accommodate many competing requirements. For example, a precisely worded, well-qualified, unambiguous question may be stilted and hard to read. Questions must be clear, interesting, and easy to understand and answer. In addition to asking the right questions, evaluators need to be aware of other principles dealing with questionnaire design and administration that are also covered in this paper.
Overview of Tasks in Using Questionnaires
Using even a simple questionnaire is not always simple. Numerous major tasks to develop
and use a questionnaire must be completed in a logical sequence. After deciding to use a
questionnaire, evaluators must plan the questionnaire, develop measures, design the
sample, develop and test the questionnaire, produce the questionnaire, prepare and
distribute the mailout or interview packages, collect the data and follow up with
nonrespondents, perform checks to ensure the quality of responses, and reduce and analyze
the data. Figure 1.1 reviews these major tasks. Except for the data collection, these
processes are very similar regardless of whether the questionnaire is to be designed for
the mail or a telephone or face-to-face interview. When interviewers are used, however,
they must also be trained, which adds another major task.
Deciding to Use Structured Questionnaires
One of the first decisions evaluators have to make is whether to use a questionnaire or
some other method to collect the data for the job. In many situations, other data
collection techniques may be superior. In fact, over the past years other techniques were
recommended by technical design teams for about one of every three proposed GAO
questionnaires. The decision to use questionnaires should be made only after carefully
considering the comparative advantages and disadvantages of the various ways of
administering questionnaires over other data collection techniques.
Data Considerations
Data can be collected in a variety of ways, such as field observations; reviews of records
or published reports; interviews; and standardized mail, face-to-face, or telephone
questionnaires. The selection of one technique over another involves trade-offs between
staff requirements, costs, time constraints, and most importantly the depth and type of
information needed. For example, if the objective of the assignment is to determine the
average per acre charge and the income derived from public grazing-land permit fees, the
evaluator might consider using structured data collection forms or pro forma work papers
to manually retrieve data from the case files in record storage. However, if the objective
is to determine how much land the ranchers are willing to lease and how much per acre they
are willing to pay, a mail, telephone, or face-to-face survey of ranchers would be
necessary.
Questionnaires are frequently used with sample survey strategies to answer descriptive and
normative audit or evaluation questions. They are often less central in studies answering
impact, or cause-and-effect, questions. While operational audits and impact, or
cause-and-effect, studies are often not large-scale efforts, questionnaires can be used to
confirm or expand their scope.
Questionnaires can be useful when the evaluator needs a cost-effective way to collect a large amount of standardized information, when the information to be collected varies in complexity, when a large number of respondents are needed, when different populations are involved, and when the people in those populations are in widely separated locations.
Furthermore, questionnaires are usually more versatile than other methods. They can be
used to collect more types of information from a wider variety of sources than other
methods because they use people, who can report facts, figures, amounts, statistics,
dates, attitudes, opinions, experiences, events, assessments, and judgments during a
single contact. People can answer for a specific type of source, such as members of a
health maintenance organization, or for a variety of types of sources, such as local,
state, and federal government officials.
Questionnaires are difficult to use if the respondent population cannot be readily
identified or if the information being sought is not widely distributed among the
population of those who hold the knowledge. Furthermore, questionnaires should not be used
if the respondents are likely to be unable or unwilling to answer or to provide accurate
and unbiased answers or if the questions are inappropriate or compromising.
In general, questionnaires should not be used to gather information that taxes the
limitations of the respondent. Sometimes people are not knowledgeable or accurate
reporters of certain kinds of information. They remember recent events much better than
long-past events. They remember salient and routine events and meaningful facts but do not
remember details, dates, and incidental events very well. For example, veterans might
accurately report that doctors made medical examinations for Agent Orange effects on their
eyes, ears, nose, throat, genitals, and pelvis but might substantially underreport skin
examinations. If the information were needed on skin examinations, other sources, such as
medical records, might be more useful. However, there are exceptions, particularly when
the respondents are highly motivated.
Structured questionnaires are also not particularly well suited for broad, global, or
exploratory questions. Because respondents have many different frames of reference, levels
of knowledge, and question interpretations, the structured methodology limits the
evaluators' ability to vary the focus, scope, depth, and direction of the line of inquiry.
Such flexibility is necessary to accommodate variances in the respondents' perceptions and
understanding that result from such questions.
Most of the people from whom GAO evaluators seek information are members of special
populations, such as federal and state government employees, welfare recipients, or
company executives. Unlike pollsters and market researchers, GAO evaluators rarely do a
national population survey. Consequently, some of the mass survey techniques like
random-digit dialing seldom apply to GAO work.1 Also, GAO evaluators very
rarely go back to the same population, and when they do, the time periods between surveys
are so long that they usually have to redocument the population.
| 1 Random-digit dialing refers to a telephone interview method that contacts people by dialing numbers at random. In some situations, usually when the population is hidden or not easily identified (for example, heads of households older than 66), this method may provide better access than other methods. |
Administration Considerations
If after considering the pros and cons of using questionnaires, a questionnaire is still
the method of choice for data collection, the evaluators need to consider the most
appropriate method of administration. The appropriateness of the method of administration,
whether it be mail, face-to-face interview, or telephone, varies with the resources and
constraints of the job, the abilities and motivation of the respondent population, and the
requirements of the evaluation. All three methods have comparative advantages and
disadvantages, depending on the time and cost constraints of the job, the characteristics
of the respondent population, and the nature of the inquiry.
Mail questionnaires are usually more cost effective but require longer time periods than
personal or telephone interviews. While mail questionnaires usually have higher
development costs than telephone or face-to-face interviews, this is generally offset by
the relatively inexpensive data collection costs. Mail questionnaires are the least labor
intensive of the alternatives, with the labor costs limited to the effort needed to mail
the questionnaire and track, follow up on, and edit the returns. Generally staff can mail
hundreds of letters or edit scores of returns in a given day. Workers are not so
productive with telephone and face-to-face interviews.
Because of the difficulty in establishing telephone or personal interview contacts and the
one-on-one nature of interviews, these alternatives require more staff time. Interviewers
usually do not complete more than 10 or 12 telephone interviews or two or three
face-to-face interviews in a day. Furthermore, the travel requirements for personal
interviews can be very expensive when compared to postage or telephone charges.
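The staffing implication of these daily rates can be worked out with simple arithmetic. The sketch below uses the midpoints of the ranges quoted above (10 to 12 telephone interviews, two to three face-to-face interviews per day); the midpoints, the function name, and the example of 500 completed interviews are illustrative assumptions, not GAO planning factors.

```python
import math

# Completed interviews per interviewer per day. These are the midpoints of the
# ranges quoted in the text and are assumed here only for illustration.
INTERVIEWS_PER_DAY = {"telephone": 11.0, "face_to_face": 2.5}


def interviewer_days(completed_interviews: int, mode: str) -> int:
    """Return the whole interviewer-days needed to complete the interviews."""
    rate = INTERVIEWS_PER_DAY[mode]
    return math.ceil(completed_interviews / rate)


if __name__ == "__main__":
    for mode in INTERVIEWS_PER_DAY:
        print(mode, interviewer_days(500, mode))
```

Under these assumptions, 500 completed interviews would take about 46 interviewer-days by telephone and 200 face to face, before any allowance for travel, callbacks, or training.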
But mail questionnaires take longer to design and require longer periods for collecting
and editing data than other choices. Extra care must be taken with the mail questionnaires
because, unlike the other choices, there is no interviewer to help the respondent. Also,
mail is a slow means of transmission, and mail questionnaires take two or three
follow-ups. In summary, if money is tight and the subject matter can be phrased
intelligibly for the respondent population, use the mail; if time is tight and staff time
is not, use the face-to-face or telephone interview methods.
In addition to subject matter, respondent characteristics play a key role in the method of choice. For example, if the respondents are motivated and literate and have normal vision, the mail is often the best option; otherwise, use the telephone or an interviewer. If respondents cannot be readily located by address or telephone number but gather at particular places (such as restaurants, parks, or hospitals), then a face-to-face interview is the only option.
If the contact people are likely to conceal the identity of the intended respondent, and this is likely to make a difference, or if the evaluator is not sure that the intended respondent will get the questionnaire, then personal contact is better than telephone and telephone is better than mail. Also, if the respondent has a vested interest in giving biased reports that can readily be verified by inspection, then the face-to-face interview is the obvious choice.
However, if the contact has a likely chance of temporarily inconveniencing the respondent or the respondent has privacy concerns, then a mail survey has the advantage over the remaining choices.
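Some of the rules of thumb above can be written down as a small decision function. This is only a sketch: the parameter names, the order in which the rules are checked, and the reduction of each consideration to a yes-or-no flag are assumptions made for illustration, and the function leaves out the other factors discussed in this section, such as privacy concerns and the risk of biased reports.

```python
def suggest_administration_mode(
    found_only_at_gathering_places: bool,
    literate_and_motivated: bool,
    time_is_tight: bool,
    staff_time_is_tight: bool,
) -> str:
    """Suggest a starting point for the method of administration.

    The precedence of the checks is an assumption; the text does not rank them.
    """
    # Respondents who cannot be located by address or telephone number
    # can be reached only in person.
    if found_only_at_gathering_places:
        return "face-to-face"
    # Mail depends on respondents who can and will read and answer unaided.
    if not literate_and_motivated:
        return "telephone or face-to-face"
    # Interviews shorten turnaround only if staff are available to do them.
    if time_is_tight and not staff_time_is_tight:
        return "telephone or face-to-face"
    return "mail"


if __name__ == "__main__":
    print(suggest_administration_mode(False, True, False, True))
```

The result is a first suggestion to be weighed against the remaining considerations, not a substitute for the review described in this section.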
Questionnaire characteristics also determine choice. Long, complex questionnaires
designed to be answered by simple checks or short fill-in-the-blanks are better suited for
self-administered questionnaires than the interview method. However, the converse is often
true if the questions require the composition of responses that are other than very short
answers (most people would rather speak than write). Also, if the questionnaire has many
complex and confusing skips that frequently require respondents to answer some questions
but not others, then one of the interview methods is preferable to a mail or
self-administered questionnaire.
In summary, evaluators should review the conditions and requirements of the data
collection before deciding to use questionnaires and again before deciding the methods for
administering the questionnaire. Mail questionnaires are a versatile, low-cost method of
collecting detailed data. They are particularly adaptable to survey methods when the
population is big, difficult to contact, likely to be inconvenienced, concerned about
privacy, and widely dispersed. But mail questionnaires usually have a long turnaround
time. The evaluators must be willing to invest the time required to carefully craft and
test these questions. And the respondent must be willing and able and sufficiently
literate and unbiased to accurately answer the queries. Interview methods, while much more
expensive and more prone to bias, help insure against respondent error, have less
turnaround time if sufficient staff is provided, and can be used to provide some
interviewer verifications.
Planning the Questionnaire
Once evaluators decide to use a questionnaire, planning starts with this paper, which
provides information on the procedures necessary to do each of the major tasks to design
and use questionnaires. The next step is to review the evaluation design and audit plan
and then mentally walk the job through each procedure necessary to design and implement a
questionnaire: developing the measures, designing the sample, developing and testing the
questionnaire, producing the questionnaire, preparing the mailout or interview materials,
and conducting the data collection, reduction, and analysis. A write-up of this mental
walkthrough, evaluated for comprehensiveness and feasibility, can serve as a basis for
writing the implementation plan.
Developing the Measures
As evaluators do their planning, they will find that the scope of the effort is greatly
influenced by information developed in the next two tasks, developing the measures and
designing the sample, which ensure that the right questions are being asked of the right people.
Remember that the questionnaire is an instrument used to take measures. To be sure it can
do this, evaluators must first identify all the variables or conditions, criteria, causes,
and effects that they want to measure. Next, evaluators analyze these variables and
describe them so scientifically and precisely that they can be qualified, quantified,
manipulated, and related. As explained in chapter 2, "Developing the Measures to Get
the Questions," these measures define the requirements for the questionnaire.
Questionnaires are designed by establishing a framework and sets of related questions that
provide these measures.
Designing the Sample
Questionnaires are a way of asking the right people to take the measures needed to
complete an evaluation. Before evaluators begin to write a question, it makes good sense
to be sure they can find the people. The right people are representatives of a population
who share the experiences the evaluators are interested in and who have or can get, will
get, and will give them the information they need. Furthermore, evaluators must select
these people scientifically, so the population these people represent can be talked about
rather than just the individuals contacted. This is called a population survey, and how to
do a population survey with questionnaires is explained in chapter 3, "Designing the
Sample or Population for Data Collection."
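As a minimal illustration of selecting people scientifically, the sketch below draws a simple random sample, without replacement, from a numbered list of population members. Simple random sampling is assumed here only as the most basic case; chapter 3 covers the actual design choices. The function name, the population of 1,000 numbered members, the sample size, and the seed are all hypothetical.

```python
import random


def draw_simple_random_sample(population_ids, sample_size, seed):
    """Draw a simple random sample without replacement.

    Every member has the same chance of selection. Recording the seed lets
    the draw be reproduced and documented. Raises ValueError if sample_size
    is larger than the population.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(list(population_ids), sample_size))


if __name__ == "__main__":
    # Hypothetical population: members numbered 1 through 1,000.
    population = range(1, 1001)
    sample = draw_simple_random_sample(population, sample_size=50, seed=1993)
    print(len(sample), sample[:5])
```

Because each member of the listed population has a known chance of selection, results from the sample can be generalized to that population rather than only to the individuals contacted.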
Developing and Testing the Questionnaire
Once the evaluators have established what to measure and who to ask to take the measures,
they are ready to ask people to take these measures. Asking questions in the right way
requires the evaluators to write sets of questions so that the answerer can easily
understand precisely what information must be provided and, with little or no error, can
easily provide this information. This means writing questions in a way that facilitates
rather than interferes with the respondents' ability to understand the question and report
the answer to the best of their ability. This simply stated task is deceptively
complicated. To write good questions, evaluators must first understand something about the
very complicated mental or cognitive process people use to answer questions. If evaluators
access this cognitive process properly, the questionnaire can become a highly versatile
and powerful instrument for observation and recall. If not, it can become a source of
confusion and error.
The sets of inquiries or questions must then be organized into a draft instrument. This
questionnaire is then tested, reviewed, and revised until it is proven that as an
instrument it takes the required measures. Since completing these tasks is perhaps the
most difficult part of the job and consumes the most resources, we devote nine chapters
(chapters 4-12) to explaining some of the many known and tested ways to do this work.
Chapters 4-7 show how to facilitate the perception, acceptance, and understanding of
the questions and how to help respondents recall their mentally stored information. In
chapter 4, "Formatting the Questions," we show how to present the question in
the precise format best suited to get the specific type of information requested. We
demonstrate what respondents are likely to consider as fair and unfair questions in
chapter 5, "Avoiding Inappropriate Questions." In chapter 6, "Writing Clear
Questions," we explain how to write a question that can be quickly, easily, and
precisely understood by all respondents in the same way. And in chapter 7,
"Developing Unscaled Response Lists," we explain how to write in a way that aids
respondents as they cognitively search their minds to select the answers to questions.
Chapters 8 and 9 deal with the problem of bias and error. This problem has two sources:
the question writer and the question answerer. Chapter 8, "Minimizing Question Bias
and Memory Error," illustrates many of the typical mistakes question writers make and
how to avoid them. Chapter 9, "Minimizing Respondent Bias," explains the ranges
of capacities and limitations that respondents have in answering questions and how to make
the most of the respondents' abilities and minimize the risk and compromise of their
shortcomings.
Chapter 10, "Measurement Error and Measurement Scales in Brief," explains how
to translate the question answers into qualitative and quantitative measures for use in
GAO reports. Through chapter 10, we deal with how to write individual questions.
However, when we put these individual questions together into a single questionnaire, they
often interact with one another in a context that affects the meaning of the questions.
Chapter 11, "Organizing the Line of Inquiry," shows how to organize these
questions into a line of inquiry that can enhance the
quality of the answers and minimize unintended and interfering effects.
After finishing the first 11 chapters of this paper, evaluators should be able to help
write the first draft of a questionnaire. But there is still much more to be done before
evaluators can use this draft as a survey instrument. They should go through a
quality-assurance procedure, which requires that the draft questionnaire be tested and
validated. The methods for this task, and other quality assurance tasks carried out during
data collection and analysis, are described in chapter 12, "Following Quality
Assurance Procedures."
Producing the Questionnaire
Once the questionnaire has been tested and validated, and probably revised, the evaluators
can put it in final form and use it to collect and analyze data to answer the assignment
questions. Good questionnaires can be seriously compromised if they are not presented in a
format that is easy to read and administer. Chapter 13, "Designing the Questionnaire Graphics
and Layout," addresses this issue and shows the evaluator how to design the questionnaire
type, format, and layout in a manner that greatly facilitates the user's ability to
perceive and respond.
Preparing for and Collecting Data
Several administrative procedures, such as preparing the transmittal or contact letters or
mail piece or interviewers' kits, must precede data collection. Data collection methods
then involve such activities as mailing, contacting, interviewing, tracking, and following
up on nonresponses. Poor quality in the execution of these fundamental and very important
activities can cut the response rate by as much as 50 percent. To avoid this problem, we
have documented procedures shown to be highly effective for mail surveys in chapter 14,
"Preparing the Mail Out Package and Collecting and Reducing the Data." Activities needed
to check, edit, and
prepare the data for computer processing are also covered in this chapter.
Analyzing Data
Chapter 15, "Analyzing Questionnaire Results," discusses some of the initial
thinking and conceptualization that are important to the data analysis, including the
development of a strategy and a plan for the data analysis. We do not describe data
analysis methods since they are covered in Quantitative Data Analysis: An Introduction.2
Chapter 15 concludes the discussion on using mail and self-administered questionnaires.
| 2 U.S. General Accounting Office, Quantitative Data Analysis: An Introduction, GAO/PEMD-10.1.11 (Washington, D.C.: June 1992). |
Telephone Surveys
Personal or telephone interviews are also important and useful methods for collecting
structured data for GAO assignments. While the methodology for asking good questions
developed in this paper applies regardless of whether the questions are asked in a
self-administered mode, such as by mail, or in some other mode, such as a face-to-face or
telephone interview, certain limitations are specific to each administration method. Those
that apply to conducting telephone surveys are discussed in the concluding chapter 16,
"Adaptations for the Design and Use of Telephone Surveys." Further details on
personal interviews are presented in Using Structured Interviewing Techniques.3
| 3 U.S. General Accounting Office, Using Structured Interviewing Techniques, GAO/PEMD-10.1.5 (Washington, D.C.: July 1991). Some information relevant to conducting face-to-face interviews is presented in chapter 12 of this paper, in a section dealing with pretesting techniques. |
Chapter 2
Developing the Measures to Get the Questions
Deciding what and whom to ask appears to be a straightforward task. But appearances can be
deceiving. And as we shall see in the next two chapters, this initial step must be thought
through with careful consideration and structured to an elemental level of detail. The
what and whom to ask decision lays the foundation for the focus and scope, the level of
difficulty and complexity, the risk, completion times, data collection, analysis, and
processing requirements and resources needed for the job. Hence, all the job plans are
based on this decision. Furthermore, the three major sources of error--misspecification of
variables, measurement error, and sampling error--are often introduced at this stage.
In this chapter, we discuss methods for documenting what a questionnaire should ask. This
documentation will be used to develop a framework for writing the questions, describing
the variables in scientific terms necessary for measurement, developing the measures, and
specifying the variable relationships in order to check for misspecification of variables
and measurement errors. In the next chapter, we discuss protocols for selecting the target
population in ways that maintain the integrity of the design and minimize sampling error.
Because deciding what to ask and deciding whom to ask it of are complex, we have described
them in two chapters. However, in actual practice, deciding what and whom to ask go hand
in hand and are among the few tasks in survey research that must be done interactively and
iteratively. This is because the questions we ask are determined by both the need for
information and the respondent's ability to provide this information.
To document the questionnaire framework, variable operationalizations, measures, and
variable relationships, it is best to start with what we know about the requirements
of the job and mentally work in two directions, by thinking, first, in the abstract to
integrate and conceptualize and, then, shifting to more concrete logic to define and
analyze. At the start, evaluators usually find that some of the information they will need
is very global, general, and abstract and other information is highly specific. However,
most of the information they have gathered is at a middle level of detail, and they can
begin by working with what they have. Information should be available from the job design,
audit plan, evaluation framework, and previously gathered background material. Evaluators
should conceptualize and organize this information into a framework of inquiry or types of
questions that can be developed to yield answers to the evaluation questions. Often they
may have to do additional research or additional thinking through to fill knowledge gaps.
Next, they must go in the other direction and think more concretely and analytically. They
must specifically describe or operationalize these information requirements and develop
measures that will satisfy these requirements. Finally, they should integrate these
conceptualizations and analysis into a format that presents the key relationships of the
measurement variables. The process needed to develop each document product is described in
the following sections.
The Questionnaire Framework
Initially, the evaluators decide what constructs, traits, conditions, or variables are to
be measured and how to measure them. The documentation for this task is sometimes referred
to as a questionnaire framework. The framework is usually depicted as a taxonomical
classification. It is a scheme that lays out the evaluation questions and all the
information required to answer each question with ordered and specified relationships. In
essence, the framework provides a roadmap to identify and track the kind of data needed to
answer the evaluation questions.
A relatively uncomplicated example might be structured in response to the evaluation question "Is the size of the 4-year college associated with student performance?" The constructs (or the things evaluators want to measure) for college size and student performance and their relationships are identified for measurement development.
The identification of these constructs and their relationships influences the choice of data collection sources, methods, and measures. For instance, in the example above, we can readily see that there are alternatives: the use of extant data from various national graduate record achievement score data bases, surveys of administrative and academic deans, and so on. And just as the choice of methods and sources will force a choice of measures, so will the choice of measures determine the methods and sources.
Hence, these choices must be made interactively and iteratively. The relationship of college size to student performance was a simple example. In this case, evaluators might have been able to proceed without committing measurement considerations to paper, but it is nearly impossible to plan complex questionnaires without documentation. For example, consider the following evaluation question: "What are the needs of earth-orbiting satellite image users?" The answer to this question requires many complex considerations, constructs, and measures, such as the identification of the different types of users (national and international scientists, political administrators, disaster managers, and earth resource managers); the identification of the national and international, geopolitical, and socioeconomic considerations that determine the type of use; the measures of the quality of the information displays of the satellite; and the relationships among the variables and constructs. This is a level of complexity that requires documentation.
As we can see by the example, the framework identifies, specifies, and justifies the need for the information, constructs, variables, measures, and variable relationships that the evaluator wishes to collect data on. It is a scheme for documenting the information requirements. It is not a questionnaire but rather the basis for the questionnaire.
Operationalizing the Constructs
So far we have talked in broad terms about ideas or concepts, traits or properties, and
characteristics evaluators often like to measure, usually referred to as constructs. These
constructs are not measures until the terms are specific enough to standardize. By
"standardize," we mean that questions are designed and asked so that each recipient
will understand and answer the same question in the same way. Different people reading the
same questions need to have a common understanding. For example, one survey asked
congresspersons about the "timeliness" of reports. Some respondents interpreted
the construct "timeliness" as turnaround time while others interpreted it as
getting the report information in time to use it for legislative decisions. As we can see,
standardizing is very important because it enhances the objectivity of the resulting
measure.
The first step toward standardization is to operationalize or to define the construct in
concrete, specific, unambiguous, and contextual terms that reduce the measure to a single
trait or characteristic. Failure to do this in the example citing the size of the college
resulted in a misspecification of this variable. The respondents variously interpreted
size of college as spring enrollment, fall enrollment, total spring and fall enrollment, total full-time
plus part-time spring enrollment, total full-time and part-time fall enrollment, full-time
equivalent enrollment, and so on. The construct should have been operationalized as the
enumeration of both the total full-time enrollment and the total part-time enrollment as
of the close of the spring 1992 semester or quarter.
Developing Measures From Operationalized Constructs
Measures are developed by giving operationalized constructs a dimension. Measures qualify
and sometimes quantify the trait in a single dimension such as presence or absence or the
amount, intensity, value, frequency of occurrence, or the ranking or rating or some other
form of comparative valuation or quantification. The next few paragraphs will help
familiarize the reader with some of the requirements of a measure. Although this
familiarization will proceed in other chapters of this paper through discussion and
example, evaluators should consult a text specifically devoted to measurement or consult a
specialist when complex measures are required.
Measures must be accurate, precise, valid, reliable, relevant, realistic, meaningful,
comprehensive, and in some cases complementary, sensitive, and properly anchored. While
evaluators may readily understand the meaning of precision and accuracy, some of the other
terms may need to be defined, because in measurement they are used in a very special way.
For instance, measures are considered valid if they are logical and they measure what they
say they are measuring. They must adequately represent the trait in question. They must
consistently predict outcomes, vary as expected in a variety of situations, and hold up
against rigorous attempts to prove them invalid. We have all seen valid and questionable
measures. Positive examples might be found in well-executed polls that predict voter
outcome to a reasonable degree of accuracy. A negative example might be found in the logic of using complaints as
a measure of discrimination, because the cost, time to resolve a case, difficulty in
proving discrimination, difficulty in filing, fear of retaliation, and other reasons
discourage the aggrieved from filing a complaint.
Next, consider reliability, which is different from and independent of validity. To be reliable, a measure must give consistent results when repeated under similar situations. For example, IQ tests and employee attitude surveys usually give consistent results when repeated under similar circumstances with the same people.
Measures should be relevant, meaningful, and realistic. For example, some very valid measures like IQ and grade-point average are used to hire employees. These are not relevant measures if the new employee is expected to be creative and inventive and generate new ideas, because the traits of IQ, grade-point average, and creativity are not correlated. Also, the labels given to the measure should correctly describe and communicate its meaning. For example, managers frequently measure things like costs, staff time, and number of reports under the term "quality measures." These measures may index effectiveness or productivity but not quality. The measure should be realistic or practical. For example, if a reader's pupils are dilated, this might be a good measure of his or her interest, but these observations are very hard to make. Therefore, under certain circumstances, the respondents' recall and self-reports, while not as accurate, are more useful because answers are easy to obtain.
Ideally, measures ought to be comprehensive and, in some cases, complementary. Comprehensive measures span the entire range of values that are of interest with equal precision. A single measure usually refers to a single trait, but sometimes, if the construct is multidimensional or has several traits, we develop a measure that captures these multitrait effects, either for reasons of economy or because of the need to capture two or more traits as they work together. For example, asking the respondent if the text was easy to read and readily understandable might be considered a comprehensive measure. In contrast, complementary measures are measures that are distinct and must be taken together to reflect the construct. For example, the number of contrast shades and the sharpness of the contour lines are needed to measure photographic image quality.
Other features of measures that are also important are sensitivity and anchoring. Sensitivity refers to a measure's ability to detect (1) the presence or absence of the trait, (2) levels of intensity, or (3) changes in the level of intensity with sufficient precision at sufficiently low levels to meet the needs of the evaluation. Anchoring refers to the establishment of clear, concrete points on the measurement scale that are meaningful to the respondent. That is, the scale should have meaningful starting, interim, mid, and end points. For example, we might anchor estimations of lighting quality as too dim (not bright enough to read a newspaper), appropriate (could comfortably read a newspaper), or too bright (too much glare to comfortably read a newspaper).
An example of a complex measure taken from one of the cases cited in the preceding part
of this chapter is presented in figure 2.1. The measure was developed from a construct
identified with a questionnaire framework: the user's perception of the quality of an
earth-orbiting satellite image. The construct was operationalized and developed into a
measure of image quality. During this process, particular attention was given to accuracy,
precision, validity, reliability, realism of application, meaningfulness of concept, the
comprehensiveness and complementary requirements, measure sensitivity, and anchoring of
the measure.
*insert figure 2.1
Specify the Key Variable Relationships
We conclude this chapter with a brief but important discussion on specifying the variable
relationships to be evaluated. (The two remaining sets of documentation needed to initiate
the planning--the identification of and the selection of the target population--are discussed
in the next chapter.) This task is important because, as we shall see, errors or omissions
in specifying the variable relationship can either invalidate or weaken the evaluation. In
this task, evaluators document and review all variables to ensure that all key variable
relationships are included and specified with common units of analysis and for appropriate
functional relationships and in the appropriate measurement stratification and time
periods so as to permit statistical, temporal, and cross-sectional observations and
comparability. These variable relationships should be documented down to the level of
measurement specification.
Then the evaluation design, the evaluation framework, and the questionnaire framework
should be checked against this documentation to make sure nothing important is left out
and that nothing unnecessary is included. A review should ensure that the sample or
population measurements are to be taken on--and generalized to--common units of analysis.
For example, in one case we found that one measure was to be taken on contractors, while
its comparison measures applied to contracts. A review should be made for changes that
would facilitate statistical comparability. For instance, the evaluators may find that one
of the measures to be related is unnecessarily categorized while the other is continuous,
or that some measures are inappropriately categorized for the intended cross-sectional
comparisons, thus weakening the statistical power of the analysis or, worse yet, rendering
the analysis invalid.
Further, review should make sure the specified categories in the comparison variables
are not likely to confound cross-sectional comparisons. For instance, suppose we know from
past studies that the effects of training are not likely to be noticed until 9
months later, there is less bias against the mentally disabled in the city than in the
suburbs, or treatment for violence exposure is most effective soon after the incident. If
evaluators test for training effect soon after the training, they may not see an influence
because the trainees did not have enough development time to assimilate their experience.
If the test is for bias against the mentally disabled only in the inner city rather than
in both the inner city and the suburbs, the evaluators may not find the effect because
this bias is less noticeable in the inner city. If they test for the effects of treatment
for exposure to violence on only those who waited a year before receiving treatment, they
may not see the effect because the treatment was given too late to do much good. Hence,
evaluators must make sure that the cross-sectional comparison categories are structured to
capture, not hide, the effects under study.
Another point is to make sure the temporal comparisons are appropriate. For example, it
is not unusual to find that the data for the different variables in the relationship are
to be collected during different years. Finally, it is important to be sure important
categories were not left out. This is because the sampling specialists will use this
documentation to design the sample. For instance, in one case the evaluator was
disappointed to find that the sample did not have enough power to compare important city,
race, and educational stratifications because the sampling specialist had not been aware
of these stratifications.
Chapter 3
Designing the Sample or Population for Data Collection
Along with deciding what to ask, evaluators must decide whom to ask. The people questioned must have the information the evaluators need, they must be readily identifiable and accessible, they must be willing and able to answer, and they must be representative of the population being measured. They can be migrant workers, prisoners, police, scientists, medical doctors, commanders or soldiers, inner city African American youths, or government officials.
Ideally, everyone in the population should be questioned, and sometimes this is done if
the population is very small. But usually the best that can be done is to take a sample of
these people and generalize the findings to the population they come from.
In theory, to generalize findings, evaluators must first define the population. Then they
should enumerate every unit in the population in a way such that every unit has an equal
chance of being selected for the sample. In practice, it may be unrealistic to expect to
enumerate every unit in a real population (for example, all persons who participated in a
government program such as Head Start), but the enumeration must be reasonably complete
and accurate and be reasonably representative of the actual population. The evaluators
must then draw a representative sample from this population.
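The equal-chance selection described above can be sketched in a few lines of code. This is only an illustration, not a procedure prescribed by this paper: the frame of 1,000 participants, the sample size of 50, and the function name are all hypothetical, and a real draw should be planned with a sampling specialist.

```python
import random


def draw_simple_random_sample(frame, sample_size, seed=None):
    """Draw sample_size units without replacement; every unit is equally likely."""
    if sample_size > len(frame):
        raise ValueError("sample size cannot exceed the number of enumerated units")
    # A recorded seed lets the team reproduce and document the draw later.
    rng = random.Random(seed)
    return rng.sample(frame, sample_size)


# Hypothetical enumeration of 1,000 program participants.
frame = [f"participant_{i:04d}" for i in range(1, 1001)]
sample = draw_simple_random_sample(frame, 50, seed=1993)
```

Recording the seed alongside the enumeration makes it possible to show afterward exactly how the sample was selected.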
Survey Population
However, the sample cannot be determined or drawn until the evaluators have studied the
size and characteristics of the population they want to know about. All too often, this
step in questionnaire development is overlooked or assumed to be routine. Then, when the
questionnaire is complete and ready to be mailed, the team is faced with weeks of hard
research, or a major redesign, because the sample was not well founded.
The first step in defining the survey population is to learn about the population
distribution--the major categories of units and the numbers in each category. For example, if the
evaluators want to sample banks, they should learn the differences between county,
regional, statewide, branch, and unit banks; they should know geographic location factors
and understand the basis for classifying banks as very large, large, medium, and small. If
they are studying unit commanders in the armed services, they should know the unit sizes
and types and the variations among the services. This research will help in designing
sampling factors, such as stratification and stratification size, and will ensure a
representative sample.
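To show how knowledge of the population distribution feeds the sample design, the sketch below splits a total sample across strata in proportion to their size. Proportional allocation is only one common option and is an assumption here, as are the bank counts and the function name; a sampling specialist may well choose a different allocation so that small but important strata receive enough cases.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to stratum size.

    Fractional shares are rounded with the largest-remainder rule so the
    allocations always add up to total_sample.
    """
    population = sum(stratum_sizes.values())
    exact = {name: total_sample * size / population
             for name, size in stratum_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    leftover = total_sample - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical counts of banks in each size class.
bank_strata = {"very large": 50, "large": 200, "medium": 750, "small": 4000}
allocation = proportional_allocation(bank_strata, 400)
```

With these made-up counts, the very large banks receive only a handful of cases, which illustrates why the stratum sizes need to be known before the allocation is settled.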
Once the evaluators are familiar with the characteristics of the population, they can look
for sources that enumerate each unit in the population or develop a reasonable theory for
selecting the sampling units. The enumeration should be accurate, up-to-date, and
organized to reflect the distribution characteristics. Sometimes this task is relatively
easy. For example, in one project we needed to assess the effect that the Foreign Corrupt
Practices Act had on U.S. business. The act prohibits payments to foreign officials if the
purpose is to influence business. The population was U.S. companies that conduct most of
the foreign business. These companies were readily identified because they were among the
Fortune 1,000 companies, which conduct most of the foreign business. All we had to do was
buy this list from Fortune magazine. The list gave the order of the companies by
sales volume and provided information on each company's activities and the name and
address of both the chief executive officer and the chairman of the board. However, for
many other projects, considerable effort is needed to document the survey population.
In practice, evaluators rarely have a list of the real population; at best they have only
a list at the time the source material was current. By the time the questionnaire is
administered, some units will have left the population and others will have joined it. For
example, in the Fortune 1,000 evaluation, 6 percent of the firms left the population and
we do not know how many may have joined it. The sample analysis must evaluate and make
statistical adjustments for the losses. Whenever possible, the effect of the additions
should also be considered.
The best way to start enumerating a population is to talk to experts in the field and
search out likely organizations, archives, directories, libraries, and management
information systems until a reliable source has been discovered. Then the sampling units
or population elements are organized, reorganized, or indexed into groups or frames, so
they can be reached by a random, systematic, or prescribed process. For example, in one
evaluation, we had to locate retired military users of military medical facilities. From a
Department of Defense archival data base we were able to get the names and addresses of
all the retired military personnel but we had no way of knowing if they were users of a
particular medical facility. Our field work showed that retired military were likely to
travel up to 40 miles to use hospital services; if they lived farther away, they usually
made other arrangements. So we developed a computer program, based on zip codes, that
matched persons to the hospitals that were within 40 miles of their homes.
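A matching step of this kind might look like the sketch below. The original program is not described in detail, so everything here is an assumption: we suppose each zip code can be reduced to a latitude and longitude for its center, we measure straight-line (great-circle) distance rather than travel distance, and the zip codes, coordinates, and hospital names are invented for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def hospitals_within_range(person_zip, zip_centroids, hospitals, max_miles=40):
    """List hospitals whose zip-code center lies within max_miles of the person's."""
    lat, lon = zip_centroids[person_zip]
    return [
        name
        for name, hospital_zip in hospitals.items()
        if miles_between(lat, lon, *zip_centroids[hospital_zip]) <= max_miles
    ]


# Hypothetical zip-code centers (latitude, longitude) and facilities.
zip_centroids = {
    "00001": (38.90, -77.04),
    "00002": (39.29, -76.61),
    "00003": (40.71, -74.01),
}
hospitals = {"Hospital A": "00002", "Hospital B": "00003"}
nearby = hospitals_within_range("00001", zip_centroids, hospitals)
```

The 40-mile default comes from the field work described above; the rest of the design, including the use of zip-code centers, is illustrative only.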
In a study of zoning problems encountered by group homes for the mentally disabled, we
discovered that there was no national register of group homes. Since this was a study to
see if this restrictive zoning practice was geographically widespread, we sampled
catchment areas. We then called up the catchment area directors and got the names and
addresses of every group home in each catchment area and sent the group home directors a
questionnaire asking about their zoning problems.
Sometimes, no matter how hard the search, archival data or records cannot be found from
which to develop a population. When this happens, the best thing to do is to look for
groups, sections, or clusters of files or lists that contain the information. Or the
evaluators may want to look at existing data to surmise some ratio or relationship
associated with the population. For example, if they want to define the population of
general aviation flight-service airport specialists, they may be able to use previous work
or pilot or survey studies. From previous experience, they may find that they
can estimate that the average number of specialists per airport is 16, multiply 16
specialists by the 316 airports, and estimate the population at about 5,000.
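The arithmetic behind this ratio estimate is simple enough to write out. The function name below is hypothetical; the figures are the ones given in the example.

```python
def estimate_population(average_per_site, number_of_sites):
    """Estimate a population size from an average count per site."""
    return average_per_site * number_of_sites


estimate = estimate_population(16, 316)  # 16 specialists x 316 airports
rounded_estimate = round(estimate, -3)   # rounded to the nearest thousand
```

The product is 5,056, which rounds to the figure of about 5,000 used above. An estimate of this kind is only as good as the average it rests on, so the source of that average should be documented.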
Unfortunately, in a great many cases, there is neither a population enumeration nor a
way to get cluster, unit, or ratio figures. In these cases, the evaluators must try to
document the biggest possible portion of the most important and most representative cases,
or they must develop some reasonable theory for selecting the sampling units. For example,
to get a representative list of internal auditors, the evaluators might use the membership
list for the Institute of Internal Auditors plus a list of the internal audit departments
for the Fortune 1,000 companies. The latter would be included because most of them have
internal audit departments.
In one situation, we had to sample major importers and exporters. The available list had
over 10,000 entries, almost all of which were too small to be considered major. So we used
a combination of a "small world network" and a "snowball" approach. We
found an association on the eastern coast to which most major mid-Atlantic shippers
belonged. We contacted the association and obtained a list of the major shippers and their
business volume. This association identified two other shippers' associations, which
provided their lists and the names of six more associations. We continued until we had
identified all associations and had a list of most of the major shippers. The shippers'
associations reviewed our list and estimated that it accounted for 82 percent of the
import-export business.
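The referral chain in this example can be pictured as a search that starts with one association and follows each new referral until no unvisited associations remain. The sketch below is a simplified illustration under that assumption; the association names, referrals, and member lists are invented, and in practice each step is a contact with an organization rather than a table lookup.

```python
from collections import deque


def snowball(seed_association, referrals, member_lists):
    """Follow referrals breadth-first from a seed and pool the members found."""
    visited = {seed_association}
    queue = deque([seed_association])
    shippers = set()
    while queue:
        association = queue.popleft()
        shippers.update(member_lists.get(association, []))
        for referred in referrals.get(association, []):
            if referred not in visited:  # contact each association only once
                visited.add(referred)
                queue.append(referred)
    return visited, shippers


# Hypothetical referrals and member lists.
referrals = {
    "Mid-Atlantic Association": ["Association B", "Association C"],
    "Association B": ["Association C", "Association D"],
    "Association D": ["Mid-Atlantic Association"],
}
member_lists = {
    "Mid-Atlantic Association": ["Shipper 1", "Shipper 2"],
    "Association B": ["Shipper 2", "Shipper 3"],
    "Association C": ["Shipper 4"],
    "Association D": ["Shipper 5"],
}
associations, shippers = snowball("Mid-Atlantic Association", referrals, member_lists)
```

Pooling the members in a set removes shippers that belong to more than one association, which matters when the pooled list becomes the sampling frame.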
Many other sources of specialized lists are available, but their reliability varies
considerably. For example, major organizations such as the American Medical Association,
the National Education Association, and the National Association of Home Builders can
provide detailed address lists and population descriptions of their members. However,
their cooperation varies with their interest in what the job is about. The cost for lists
can be anything from nothing to a few hundred to several thousand dollars. Although the
Bureau of the Census sometimes has useful lists, such as the census of manufactures and
the census of governments, these sources may be out of date. Many commercial sources, such
as Reuben H. Donnelley, Polk, and Thomas, sell population lists. Also, some commercial
firms such as Dun and Bradstreet sell specialized lists for various users, such as mail
order companies. Care must be taken in using these lists because their quality varies
considerably and very little may be known about the bias built into them, how they were
developed, or what they include and, more importantly, exclude.
Before using a list, it is a good idea to review and perhaps test it. For example, in a
sample survey of farmers, the address list was developed from a list of subscribers to the
Farm Home Journal. The list turned out to be several years old, and many of the subscribers
were not farmers in the technical sense but people who sold or bought agricultural
equipment or products or who were interested in rural living.
Selecting the Sample
Once the population has been enumerated and the evaluators are sure that it represents the
population to which they want to generalize, they are ready to draw the sample.
The sample must be drawn in accordance with a procedure that ensures a random selection. The sample size must be large enough to provide the degree of measurement precision and accuracy generally accepted by the scientific community. This must be done very efficiently and cost effectively. In many instances, accomplishing this will require the assistance of a sampling statistician who has the appropriate technical skills and practical experience.1
| 1 See U.S. General Accounting Office, Using Statistical Sampling, GAO/PEMD-10.1.6 (Washington, D.C.: May 1992). This paper provides a thorough treatment of this topic. |
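To give a feel for how precision requirements drive sample size, the sketch below uses a common textbook approximation for estimating a proportion from a simple random sample, with a finite population correction. The formula is not taken from this paper, and the defaults (a 5-percentage-point margin of error, a 95-percent confidence level with z = 1.96, and an expected proportion of 0.5) are assumptions. Stratified or clustered designs need different calculations, which is one reason to consult a sampling statistician.

```python
from math import ceil


def sample_size_for_proportion(population_size, margin_of_error=0.05,
                               z=1.96, expected_proportion=0.5):
    """Approximate sample size needed to estimate a proportion.

    Starts from the large-population formula z^2 * p * (1 - p) / e^2 and then
    applies the finite population correction for a population of known size.
    """
    p = expected_proportion
    n_large_population = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    n_corrected = n_large_population / (1 + (n_large_population - 1) / population_size)
    return ceil(n_corrected)


# Hypothetical population of 1,000 units, such as a list of 1,000 companies.
needed = sample_size_for_proportion(1000)
```

The result is the number of completed responses required, so the number of questionnaires mailed must be larger to allow for nonresponse.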
Nonstatistical Sampling
Questionnaires may be used on projects in which statistical sampling is not used, so we
need to consider briefly other ways in which evaluators select cases (Deming, 1960).
Either all the cases can be studied-that is, a census can be taken-or part of the
population can be selected in a nonstatistical manner. When evaluators take part of the
population, they usually do so for a reason. It may be that they are doing a case study,
so they select one or more cases that provide the best opportunity to observe the
phenomena or relationships of interest, and they do not need to generalize their findings
to the population. In other situations, the evaluators know very little about the
population and cannot draw a statistical sample, so they arbitrarily select as many cases
as they can and report the findings. However, in many situations, evaluators want to
generalize and they know something about the population but it is just not feasible to
draw statistical samples. So they pick a sample that they hope will correspond, in its
features, to the population, even though they know they will not be able to use the
powerful reasoning associated with statistical samples. An important category of
nonstatistical sampling is "judgment sampling."
A judgment sample draws its name from the fact that in the judgment of the evaluator, the cases chosen correspond to certain aspects of the population. The cases may be selected because they are judged most typical, because they represent the extreme ranges, because they represent a known part of the population, or because they simulate or act as a proxy for a representative sample from the population. For example, we could interview all the Fortune 500 chief executive officers in New York and Chicago because we believe that this sample is typical of chief executive officers in large companies. We could study selected group homes for the mentally disabled in California, Mississippi, New York, and Texas, because these states represent the extremes of the laws and practices. We could study 60 prime contractors with the Department of Defense in California and New York, because these contractors account for 82 percent of all defense contracts. We might pick 15 airports in 11 states, such that the sample would be similar to the population of airports with respect to size, geographic coverage, and weather conditions.
As a rule, the use of judgment sampling in a project in which the intent is to
generalize is ill advised, because arguments to support generalization cannot be nearly as
persuasive as with statistical samples. However, occasions may arise (as with a very
homogeneous population) in which the situation is not altogether bleak.
When the validity of the findings depends on the extent to which they can be generalized
to the population, and when there is no statistical sample, it might help to have some
rule of thumb that might compare judgment samples to statistical samples. One way to
picture the relationship between statistical samples and judgment samples with respect to
representativeness might be to imagine a credibility scale from 1 to 10. Assume that a
score of 1 is the value given to a single case study designed without intent whatsoever to
generalize, and 10 is the credibility associated with studying the whole population. A
very large, statistically valid random sample might yield a value of 9. A large, medium,
and very small but statistically valid random sample might yield respective scores of 8,
7, and 6. If we made many case studies but did not take a random sample, we might get a
value of 4. We might extend this value to 5 if the groups were large enough to provide
statistical certainty within their limited area of selection or if the population was very
homogeneous. We might get the same score of 5 if we selected a number of cases that
represented the range of conditions and circumstances that apply to the population.
(Incidentally, this is how pretest candidates are selected, because there is neither time
nor resources to draw a statistically valid sample.) However, the score would drop to 3 or
even 2 if we selected many or fewer cases without giving consideration to representing the
expected range of conditions.
A few years ago, we did a review of the elderly in which we selected thousands of cases at
random from the same city. This might have been acceptable, from a generalization
viewpoint, if we were measuring the conditions associated with cholesterol levels; these
levels could be presumed similar for most U.S. city-dwellers. However, in this review, we
were concerned about programs and their effects, which may have varied from city to city.
Thus, limiting the sample to one city prohibited generalizations beyond the city that was
studied. Another example involved a population of 132 health maintenance organizations. We
arbitrarily picked 16 of these organizations and collected data from hundreds of people in
each one. In the end, what we came up with was a set of 16 case studies. Although the
sample for each case study was representative of the population of people in one of the
132 health maintenance organizations, the 16 case studies together permitted only very
careful and limited findings. We might have had a much more powerful evaluation at a
fraction of the cost if we had taken a random sample of organizations and looked at fewer
cases within each organization.
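The alternative design mentioned in the last sentence, a random sample of organizations with fewer cases drawn inside each one, can be pictured as a two-stage sample. The following is only a minimal sketch: the function name, the roster layout, and the stage sizes (40 organizations, 25 people each) are illustrative assumptions, not figures from the review described above.

```python
import random

def two_stage_sample(rosters, n_orgs, n_per_org, seed=0):
    """Draw a random sample of organizations, then of people within each.

    rosters: dict mapping organization id -> list of member ids
             (a hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)
    # Stage 1: every organization has the same chance of selection.
    chosen = rng.sample(sorted(rosters), n_orgs)
    # Stage 2: a simple random sample of members inside each chosen organization.
    return {
        org: rng.sample(rosters[org], min(n_per_org, len(rosters[org])))
        for org in chosen
    }

# Hypothetical population: 132 organizations with 300 members each.
rosters = {
    f"hmo{i:03d}": [f"hmo{i:03d}-p{j}" for j in range(300)] for i in range(132)
}
sample = two_stage_sample(rosters, n_orgs=40, n_per_org=25)
```

Because the organizations themselves are chosen at random, findings can be argued to apply to all 132 organizations, which an arbitrary pick of 16 does not allow.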
Chapter 4
Formatting the Questions
Before writing the questionnaire, the evaluators need to choose the format for each
question. Each format presented in this chapter serves a specific purpose that should
coincide with the available information and data analysis needs.
Open-Ended Questions
Open-ended questions are easy to write and require very little knowledge of the subject.
All the evaluators have to do is ask a question, such as "What factors do you
consider when you pick a carrier?" But this type of question provides a very
unstandardized, often incomplete, and ambiguous answer, and it is very difficult to use
such answers in a quantitative analysis. Respondents will write some salient factors that
they happen to think of (for example, lower rates and faster transit time) but will leave
out some important factors because at that moment they did not think of them. Open-ended
questions do not help respondents consider a range of factors; rather, they depend on the
respondents' unaided recall. There is no way of knowing what was important but not
recalled, and because not all respondents consider the same set of factors, it may be
extremely difficult or impossible to aggregate the responses.
Also, the evaluators may not know how to interpret the answers. For example, people
might say they choose a carrier because it is more convenient or less trouble. There is no
way of knowing what this means. It may mean anything from faster transit time to easier
documentation.
Another problem is that open-ended questions cannot easily be tabulated. Rather, a
complicated process called "content analysis" must be used, in which someone
reads and rereads a substantial number of the written responses, identifies the major
categories of themes, and develops rules for assigning responses to these categories. Then
the entire sample has to be gone through to categorize each answer. Because people
interpret differently, three or four people have to categorize the answers independently.
Furthermore, rules must be developed to handle disagreements.1 Even at the conclusion of
the data reduction phase, only very low levels of qualitative analysis can be performed.
| 1 Interrater reliability is a measure of the consistency among the people categorizing the answers. |
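One common way to put a number on interrater reliability is Cohen's kappa, which compares the agreement observed between two coders with the agreement expected by chance. The sketch below is an illustration under stated assumptions: it handles only two coders (the text calls for three or four, which would need pairwise comparison or a different statistic), and the category labels are hypothetical.

```python
from collections import Counter

def cohen_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders over the same answers."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must categorize the same non-empty set of answers")
    n = len(codes_a)
    # Share of answers that both coders put in the same category.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected if each coder assigned categories independently
    # at his or her own overall rates.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical categories assigned to six written answers about carrier choice.
coder_1 = ["rates", "transit", "rates", "convenience", "rates", "transit"]
coder_2 = ["rates", "transit", "convenience", "convenience", "rates", "rates"]
kappa = cohen_kappa(coder_1, coder_2)
```

A kappa near 1 indicates that the categorization rules are being applied consistently; a value near 0 indicates agreement no better than chance, which suggests the rules need revision.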
Still another problem is that open-ended questions substantially increase response burden. They usually take several minutes to answer, rather than a few seconds. Because respondents must compose and organize their thoughts and then try to express them in concise English, they are much less likely to answer.
However, open-ended questions do sometimes have advantages. It may happen that they are
unavoidable when, for example, we are uncertain about criteria or we are engaged in
exploratory work. If we ask enough people an open-ended question, we can develop a list of
alternatives for closed-ended questions. We can also use open-ended questions to make sure
our list of structured alternatives did not omit an important item or qualification. We
can also ask open-ended questions to obtain responses that might further clarify the
meaning of answers to closed-ended questions or to gather respondent examples that can be
used to illustrate points. The rest of this chapter details closed-ended questions,
because they are the meat and potatoes of our work.
Fill-in-the-Blank Questions
Each questionnaire usually has some fill-in-the-blank questions. They are not open-ended
because the blanks are accompanied by parenthetical directions that specify the units in
which the respondent is to answer. Some examples are shown in figure 4.1.
Figure 4.1 Fill-in-the-Blank Questions
Fill-in-the-blank questions should be reserved for very specific requests. The
instructions should be explicit and should specify the answer units. Sometimes, several
fill-in-the-blank questions are asked at once in a row, column, or matrix format, as shown
in the examples presented in figure 4.2.
Figure 4.2
Yes-No Questions
Unfortunately, yes-no questions are very popular. Although they have some advantages, they
have many problems and few uses. Yes-no questions are ideal for dichotomous variables,
such as black and white, because they measure whether the condition or trait is present or
absent. They are therefore very good for filters in the line of questioning and can be
used to move respondents to the questions that apply to them, as in figure 4.3.
Figure 4.3 Yes-No Filter Question
However, most of the questions GAO asks deal with measures that are not absolute or measures that span a range of values and conditions. Consider the question: "Were the terms of the contracts clear?" Most people would have trouble with this question because it involves several different considerations. First, some contracts may have been clear and others may not have been. Second, some contracts may have been neither clear nor unclear or of marginal clarity. Third, parts of some contracts may have been clear and others not clear.
Because so little information is obtained from each yes-no question, several rounds of individual questions have to be administered to get the information needed. "Did you have a plan?" "Was the plan in writing?" "Was it a formal plan?" "Was it approved?" This method of inquiry is usually so boring as to discourage respondents.
Sometimes, question writers try to compress their line of inquiry and cause serious
item-construction flaws. They ask for two things at once, producing a double-barreled
question. For instance, a yes-no answer to "Did you get mission and site support
training?" is imprecise. How do respondents answer if they got mission but not site
support training?
A related question-writing mistake is mixing yes-no and multiple choice. See figure 4.4.
Figure 4.4 Mixed Yes-No and Multiple Choice Question
The example in figure 4.4 has several problems. The question and the response space do
not agree. This slows up the cognitive processing because the question prepares the reader
for a simple yes-no answer. But in reality the reader gets not a yes-no answer space but,
rather, a list of qualified alternatives. The response alternatives are biased toward
"yes" because most of the choices have "yes" in them. Furthermore,
"no" in the last item cannot be used with the correlative conjunction
"neither nor," because this is an unintended double negative. Such questions
make a simple inquiry difficult because they are counter to the cognitive process,
burdensome, and cause errors.
Yes-no questions are prone to bias and misinterpretation for several reasons. First, many
people like to say "yes." Some have the opposite bias and like to say
"no." Second, questions such as "Do you submit reports?" have what is
called an "inferred bias" toward the "yes" response. The most common
way to counter this bias is to add the negative alternative, for example, "Do you
submit reports or not?" However, if this is done, the use of yes-no choices in the
answer must be qualified or avoided. Without this precaution, a simple "yes"
answer may be read as applying to both parts of the question, "Yes, I submit"
and "Yes, I do not submit." A simple "no" might also be read as
"No, I do not submit," a double negative. To prevent confusion, qualify the answer
choices or avoid yes-no answers. See figure 4.5.
Figure 4.5 Balanced and Unambiguous Yes-No Question
"Implied No" Choices
In figure 4.6, failure to check an item implies "no." The implied-no choice
format is used because it is easy to read and quick to answer.
Figure 4.6 "Implied No" Question
When evaluators want to emphasize the "no" alternative, they can expand the
implied-no format to include one column for "yes" answers and one for
"no." "No" is listed as an option when the respondent might not answer
or might overlook part of the question, as when the choices are difficult, the list of
items is long, or the respondent's recollection is taxed. If "no" is not
included as an alternative, no's will be overreported, because the analysts will not be
able to differentiate real no's from omissions and nonresponses. An example appears in
figure 4.7.
Figure 4.7
| Questions asked | Yes (1) | No (2) |
| 1. Nervousness | ||
| 2. Headaches | ||
| 3. Numbness in arms, hands, legs, feet | ||
| 4. Infections | ||
| 5. Liver problems | ||
| 6. Weight loss | ||
| 7. Fatigue | ||
| 8. Skin Problems | ||
| 9. Lung Problems | ||
| 10. Change in sex drive | ||
| 11. Sterility | ||
| 12. Birth defects in children | ||
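The overreporting problem described above can be seen in a small tabulation. This is a minimal sketch with made-up answers; the function name and the use of `None` for a blank item are assumptions made for illustration.

```python
def tabulate_item(responses):
    """Count yes, no, and blank answers for one checklist item.

    responses: list of "yes", "no", or None (item left blank).
    """
    counts = {"yes": 0, "no": 0, "missing": 0}
    for r in responses:
        counts[r if r in ("yes", "no") else "missing"] += 1
    return counts

# Hypothetical answers from eight respondents to a single item.
answers = ["yes", None, "no", "no", None, "yes", "no", None]
explicit = tabulate_item(answers)

# Under an implied-no format, blanks cannot be separated from real no's,
# so every unchecked item would be counted as "no".
implied_no_count = explicit["no"] + explicit["missing"]
```

With separate yes and no columns, the three blanks stay visible as nonresponse; with the implied-no format, the same data would be read as six no's instead of three.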
Single-Item Choices
In single-item choices, respondents choose not "yes" or "no" but one
of two or more alternatives. See figure 4.8 for an example. Since yes-no and single-item
choices are similar, they have the same types of problems, but the difficulties are less
pronounced in some respects and accentuated in others.
Figure 4.8 Single-Item Choice Question
On the positive side, the differences between the choices are usually clear, and the
writer can set up a truly dichotomous question. If used carefully, the single-item choice
can be efficient. It often serves to filter people out or to skip them through parts of
the questionnaire. It is not likely to be overused and cause excessive cycles of
repetition. Furthermore, the question writer is not likely to compress the question into a
double-barreled item. The single-choice format is also not subject to bias from yea sayers
or nay sayers. And eliminating the negative alternative reduces misinterpretation.
But there are problems. In the single-choice format, the writer is more apt to bias one of
the choices by understating or overstating it. Some writers may not properly emphasize the
second alternative; others, aware of this tendency, overcompensate.
Expanded Yes-No Questions
One way around the yes-no constraints is to use an expanded yes-no format like that shown
in figure 4.9. The expanded yes-no format gives a measure of intensity, avoids some of the
biases common to yes-no, implied-no, and single-choice questions, and resolves the problem
of quibbling. Consider the question, "Could you have gotten through college without a
loan or not?" In the expanded format, more students will answer in the negative
than otherwise.
Figure 4.9 Expanded Yes-No Format
The expanded alternatives can have qualifiers other than "probably yes" and "probably no." Qualifiers can be changed to meet the situation: "generally yes" and "generally no" or "for the most part yes" and "for the most part no."
Free Choices
Yes-no, implied-no, single-choice, and expanded formats are forced choices in that
respondents must answer one way or the other. Forced-choice items generally simplify
measurement and analysis because they divide the population clearly into those who do and
those who do not or those who have and those who have not. Unfortunately, putting the
population into just two camps may also oversimplify the picture and yield error, bias,
and unreliable answers. To avoid this problem and to reduce the respondent's burden, a
middle category can be added, as in the question in figure 4.10.
Figure 4.10 Expanded Yes-No Format With Middle Category
Even though the proportion of yes's to no's will not change, the evaluators will have a better measure of the yes-no polarization, because the middle category absorbs those who are uncertain. A good rule of thumb is that if we are not certain that nearly everyone can make a clear choice, we include a middle category.
Usually, the question asker will also put in an "escape choice" to filter out
those for whom the question is not relevant. Examples are "not applicable,"
"no basis to judge," "have not considered the issue," and "can't
recall." See figure 4.11.
Figure 4.11
Multiple-Choice Questions
The most efficient format, and the most difficult to design, is the multiple-choice
question. The respondent is exposed to a range of choices and must pick one or more, as in
the example in figure 4.12.
Figure 4.12 Multiple-Choice Question
Multiple-choice questions are difficult to write because the writer must provide a
comprehensive range of nonoverlapping choices. They must be a logical and reasonable
grouping of the types of experience the respondents are likely to have encountered.
The example in figure 4.12 turned out to be flawed in practice. We learned during the
pretest that we had left out some important choices. We detected this error because many
respondents wrote answers in the "other" category.
Because this format is very important and requires the most research, field work, and
testing, and because the analysis and interpretation can be
complex, we discuss multiple-choice question design in chapter 7 in considerably more
detail.
Ranking and Rating Questions
Ranking questions are used to make very difficult distinctions between things that are of
nearly equal value. The question forces the respondent to value one alternative over
another no matter how close they are. The value that is assigned is a relative value.
Rating questions are used when the alternatives are likely to vary somewhat in value and
when evaluators want to know how valuable the alternative is rather than if it is a little
more or less valuable than the next alternative. First consider ranking. In ranking, the
respondents are asked to tell which alternative has the highest value, which has the
second highest, and so on. They rank the choices with respect to one another, but their
answers tell little about the intrinsic value of their choices. For example, suppose we
asked respondents to rank the importance of the following services for institutionalized
children: education, health care, lawn care, telephones, and choir practice. They would be
hard put to choose between education and health care, because both are essential to the
children's development. But they would have to rank one first and one second. Telephones
would probably be ranked third. Compared to health care and education, telephones are much
less important, yet they are ranked third just behind two services that are so important
that it is difficult to choose between them.
Ranking starts to get hard for people when there are more than seven categories. This is
because they can usually pick the first and second and third and then the last and next to
the last and the next to the next to last, so that what is left is the middle. But for
more than seven items, respondents begin to lose track of where they are with respect to
the first, last, and middle positions. When this happens, they make mistakes. For more
than seven items, respondents can be given special task-taking procedures to counter this
problem. But this procedure is rather burdensome.
Also, ranking questions have to be written very carefully. The slightest lapse in
clarity in the question or the instruction given will cause some people to rank in the
reverse order or to assign two alternatives the same rank or to forget to rank every
alternative. Nonetheless, ranking must sometimes be used. The example in figure 4.13 is
one that has worked reasonably well. Respondents will make a few errors, but statistical
procedures are available to handle them.
Figure 4.13
Consider each of the following types of findings, which are often used to assess programs. FROM YOUR EXPERIENCE, which do you think are more likely to impress the state education agency program (SEA) officers? Indicate your answer by rank ordering each of the following alternatives from the most to the least impressive. Select the type of result you think is most likely to impress the SEA officials. Rank this 1st by checking. Do the same for all the remaining categories, ranking them 2nd, 3rd, 4th, 5th, 6th, and 7th.
| 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | |
| 1. Improvement in educational management or accountability | |||||||
| 2. Improvement in school or facilities | |||||||
| 3. Student improvement through gain scores on grades or teacher rating | |||||||
| 4. Student improvement through gain scores on standardized norm referenced tests | |||||||
| 5. Student improvement through gain scores on criterion referenced tests | |||||||
| 6. Student improvement through gain scores in the affective domain (e.g., likes, dislikes) | |||||||
| 7. Improvement in curriculum and instruction | |||||||
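The ranking mistakes described above (tied ranks and unranked alternatives) can be screened for before analysis. The sketch below is a simple data-cleaning check, not the statistical procedures the text refers to; the function name and the dict layout are assumptions. It cannot detect a respondent who ranked in reverse order, because a reversed ranking is still internally complete.

```python
def ranking_errors(ranks, n_items=7):
    """Return a list of problems found in one respondent's ranking.

    ranks: dict mapping item number -> assigned rank (None if left blank).
    """
    problems = []
    # Items with no rank at all, whether blank or absent from the record.
    missing = [item for item in range(1, n_items + 1) if ranks.get(item) is None]
    if missing:
        problems.append(f"unranked items: {missing}")
    given = [r for r in ranks.values() if r is not None]
    # The same rank assigned to more than one alternative.
    ties = sorted({r for r in given if given.count(r) > 1})
    if ties:
        problems.append(f"tied ranks: {ties}")
    # Ranks that fall outside the allowed 1..n_items range.
    out_of_range = sorted({r for r in given if not 1 <= r <= n_items})
    if out_of_range:
        problems.append(f"ranks outside 1-{n_items}: {out_of_range}")
    return problems

clean = {item: item for item in range(1, 8)}
flawed = {1: 1, 2: 2, 3: 2, 4: 4, 5: None, 6: 6, 7: 7}  # hypothetical respondent
```

Here `ranking_errors(clean)` reports nothing, while `ranking_errors(flawed)` flags item 5 as unranked and rank 2 as tied.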
Rating questions are perhaps our most useful format because we usually want to know the
actual or absolute value of the trait we are measuring. Ratings are assigned solely on the
basis of the score's absolute position within a range of possible values. For example, a
rating scale might be assigned the following categories: of little importance, somewhat
important, moderately important, and so on. In writing rating questions, we should try to
categorize the scales in equal intervals and anchor the scale positions whenever possible.
Aside from the scaling, rating questions are easier to write properly and cause less error
than ranking questions. We can see from the two examples of the rating format shown in
figure 4.14 that ratings provide an adequate level of quantification for most purposes. We
can also see by comparing the examples in figures 4.13 and 4.14 that rating formats are
far less cumbersome than ranking formats.
Figure 4.14 Rating Questions
Guttman Format
In questions written in the Guttman format, the alternatives increase in
comprehensiveness; that is, the higher-valued alternatives include the lower-valued
alternatives.
Applying this principle in one job, we asked state resource officials how they benefited
from an earth-orbiting satellite. The question is given in figure 4.15. Here we assumed
that if respondents had measured the benefit, they had identified it, and if they had
determined the cost-benefit ratio, they had measured the primary and secondary benefits
and lack of benefits as well as the worth or dollar value of these benefits and lack of
benefits.
Figure 4.15
Consider the benefits, if any, your state government may have received from participating in the LANDSAT program. Identify the benefit areas and the degree to which you can qualify and/or quantify these benefits. (Check column 1 if particular benefit not identified; otherwise check one of the columns 2-5).
Qualification of Benefits
| Benefit area | No benefit identified (1) | Identified benefits (2) | Measured some or all benefits (3) | Assessed worth and/or dollar value of benefits (4) | Made cost-benefit analysis (5) |
| 1. Agriculture/forestry, range resources | |||||
| 2. Land use survey and mapping | |||||
| 3. Mineral resources, geostructural, and land form surveys | |||||
| 4. Water resources | |||||
| 5. Marine resources and ocean surveys | |||||
| 6. Meteorology | |||||
| 7. Environment | |||||
| 8. Other | |||||
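Because the Guttman alternatives are cumulative, a single checked column can be read as implying every lower step. The sketch below illustrates that reading for the five columns of figure 4.15; the function names and the sample answers are hypothetical, and the code simply takes the cumulative assumption stated in the text as given rather than testing it.

```python
LEVELS = {
    1: "No benefit identified",
    2: "Identified benefits",
    3: "Measured some or all benefits",
    4: "Assessed worth and/or dollar value of benefits",
    5: "Made cost-benefit analysis",
}

def implied_steps(column):
    """Steps implied by checking a column: column 2 up through the one checked."""
    if column not in LEVELS:
        raise ValueError(f"column must be one of {sorted(LEVELS)}")
    return [LEVELS[c] for c in range(2, column + 1)]

def share_at_or_above(columns, level):
    """Proportion of respondents whose checked column is at least `level`."""
    return sum(c >= level for c in columns) / len(columns)

# Hypothetical columns checked by five states for one benefit area.
water_resources = [1, 2, 3, 3, 5]
measured_share = share_at_or_above(water_resources, 3)
```

In this made-up example, three of the five states checked column 3 or higher, so 60 percent are treated as having at least measured the benefit.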
Intensity Scale Questions
The intensity scale format is usually used to measure the strength of an attitude or an
opinion. Two popular versions, the extent and expanded yes-no scales, are presented in
figure 4.16.
Figure 4.16 Extent Scale and the Expanded Yes-No Scale Questions
Likert Scale
Another frequently used intensity scale format is the Likert or agree-or-disagree scale.
The Likert scale is easy to construct. Consider the extent-scale example of figure 4.16.
As shown in figure 4.17, all the question writer has to do is convert the question into a
statement and follow it with agree-or-disagree choices.
Figure 4.17 Extent Scale Converted to Likert Scale Question
However, if the writer is not careful, the simplicity and adaptability of the Likert
scale format are often paid for by greater error and threats to validity.
First, there is bias. The Likert scale presents only one side of an argument, and some
people have a natural tendency to agree with the "status quo" or the argument
presented. Writers of Likert scale questions could attempt to counter this bias error by
presenting the converse statement also. For example, they would first ask for a response
to "My boss does not let me participate in decisions (agree or disagree)." Then
in a subsequent part of the questionnaire, they have to ask their questions in reverse:
"My boss lets me participate in decisions (agree or disagree)."
But now the line of inquiry is no longer concise or simple. The questions are doubled in
number with a serial repetitive format that interferes with the cognitive recall process,
aside from inhibiting motivation because these formats quickly become boring. Furthermore,
developing precise converse statements of counterbalancing intensity can be difficult and
complex. For example, "not satisfied" is not necessarily the opposite of
"satisfied." And in the example above, the phrase "My boss does not let me
participate" is much more negative than the phrase "My boss lets me
participate" is positive.
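When a statement and its converse are both asked, the two answers are usually combined by reverse-scoring the negatively worded item. The sketch below assumes a 5-point agree-or-disagree scale coded 1 (strongly disagree) to 5 (strongly agree); that coding and the function names are assumptions. The arithmetic treats the two statements as exact opposites, which, as noted above, they often are not.

```python
def reverse_score(score, points=5):
    """Flip a score on a 1..points scale so that agreement becomes disagreement."""
    if not 1 <= score <= points:
        raise ValueError(f"score must be between 1 and {points}")
    return points + 1 - score

def combined_score(positive_item, negative_item, points=5):
    """Average the positive item with the reverse-scored negative item."""
    return (positive_item + reverse_score(negative_item, points)) / 2

# A respondent who agrees (4) that "My boss lets me participate in decisions"
# and disagrees (2) that "My boss does not let me participate in decisions".
consistent = combined_score(4, 2)

# A respondent who strongly agrees with both statements.
yea_sayer = combined_score(5, 5)
```

The consistent respondent keeps a score of 4.0, while the respondent who agrees with everything is pulled to the midpoint of 3.0, which is how the paired items counter the agreement bias.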
Another problem is that the extent of the respondent's agreement or disagreement with a statement may not correspond directly to the strength of the respondent's attitude about the Likert statement posed in the question. The respondent may consider the statement either true or false and respond as if the question were in an "either or" format rather than a graduated scale measuring the intensity of a belief.
The Likert question uses the statement as a reference point or anchor. Hence, what is measured may be not the strength of the respondent's attitude over the complete range of intensities but, rather, the range of intensities bounded or referenced by the position of the anchoring statement at one end of the range and unbounded at the other end of the range. To complicate things even more, the single-bounding anchor may not be at the extreme end of the range; this makes comparisons among items very difficult.
The point is that the indirect approach in the Likert scale may produce misleading
results for a variety of reasons. It is usually better to use a direct approach that
measures the strength of the respondent's actual attitude over a complete range of
intensities. For example, it is better to reformulate the item from "My boss never
lets me participate" to "To what extent, if at all, do you participate?"
However, one situation in which the Likert scale is very useful is when extent of
agreement or disagreement is closely and directly related to the statement. For instance,
the respondent may be asked about the extent to which he or she agrees or disagrees with a
policy, as in figure 4.18.
Figure 4.18 Likert Question Used to Evaluate Policy
Amount and Frequency Intensity Scales
Many questions ask the respondent to "quantify" either amounts or frequencies.
These are relatively simple. They use certain descriptive words to characterize the
amount, frequency, or number of items being measured. For example, traits like
"help," "hindrance," "effect," "increase," or
"decrease" can be quantified by adding "little," "some,"
"moderate,"
"great," or "very great." Certain adjectives like "some" and "great" have
a stable and relatively precise level of quantification. For instance, "some" is usually
considered to be about 25 percent of the amount shown on the scale and a "great" amount is
usually considered to be about 75 percent. Sometimes such adverbs as "very" and
"extremely" are used. Quantities can also be implied by the sequence of numbered
alternatives ordered with respect to increasing or decreasing intensity. See figure 4.19,
which uses both methods together, as is the common practice.
Figure 4.19 Amount Intensity Scale
Frequencies or occurrences of events are treated the same way. Question writers know
that words like "sometimes" and "great many" or "very often"
mean about one fourth of the amount or 25 percent of the time and three fourths or 75
percent of the time, respectively, to most people. Similarly, words like "about
half" and "moderate" anchor the midpoints. As
with amount intensity scales, it is important to use both numbered, ordered scalar
presentations and words to quantify the scale intervals. See figure 4.20.
Figure 4.20 Frequency Intensity Scale
In many amount and frequency measures, where ambiguities are likely to occur, it is
also important to use proportional anchors such as fractions and percents or verbal
descriptive anchors such as once a day or once a month in addition to the adjective and
scale number anchors. Examples are shown in figure 4.21.
Figure 4.21 Frequency and Amount Intensity Scales with Proportional and Verbal Descriptive
Anchors in addition to the Conventional Adjective and Scale Number Anchors
Branching Intensity Scale Formats
So far, all the examples have illustrated nonbranching formats. However, even more precise
measures can be obtained with branching formats. An example is shown in figure 4.22.
Figure 4.22 Branching Intensity Scale Format
Fill-in-the-Blank Frequency Formats
Sometimes when evaluators have to be really precise and the range of frequency choices is
very wide, such as in the study of repetitive behaviors, they can use a fill-in-the-blank
format. What is asked for is the number of occurrences in a given time period or the
interval between events to be counted. Examples are shown in figure 4.23.
Figure 4.23 Number-of-Occurrences and Time Interval Formats
Here are some guidelines for using intensity scales.
Semantic Differential Intensity Scales
In a semantic differential question, frequencies or values that span the range of possible
choices are not completely identified; only the extreme value or frequency categories are
labeled. An example is shown in figure 4.24. The respondent must infer that the range is
divided into equal intervals. The range seems to work much better with seven categories
than five. The reasons for this are complicated, but seven categories provide a closer
approximation to the normal distribution.
Figure 4.24 Semantic Differential Question
Semantic differentials are very useful when the evaluators do not have enough
information to anchor the intervals between the poles. However, three major problems
detract from this format. First, if the questions are not written with great care, many
respondents will not answer or will answer with errors. Second, respondents may flounder
and make judgment errors because the semantic differential has no midrange or intermediate
anchors. Third, the results lack a certain amount of credibility because they are not tied
to a factual observation. For example, compare a factually anchored scale point with a
simple enumerated scale point. We find there is a big difference between saying that 70
percent of the respondents said their streams were polluted to the point at which most
aquatic life was declining and saying that 70 percent checked 6 on a scale of 1 to 7.
Intensity Paired Comparison Scales
Intensity scales are very versatile and are sometimes combined with other types of scales.
One such combination of scales is sometimes used in establishing priorities. Here an
intensity scale is combined with a paired comparison scale. As its name implies, a paired
comparison scale compares all the question options by pairs by asking the respondent to
rank one item of the pair over the other. An intensity paired comparison scale asks the
respondents to scale the amount of the difference between the two pair items. See figure
4.25.
Figure 4.25 Intensity Paired Comparison Scale
| Comparison activities | Much less important | Somewhat less important | Equally important | Somewhat more important | Much more important |
| 1. Biotechnology vs. Acquisition | |||||
| 2. Description vs. Breeding | |||||
| 3. Enhancement vs. Preservation | |||||
| 4. Acquisition vs. Description | |||||
| 5. Preservation vs. Biotechnology | |||||
| 6. Breeding vs. Enhancement | |||||
| 7. Biotechnology vs. Breeding | |||||
| 8. Description vs. Preservation | |||||
| 9. Enhancement vs. Acquisition | |||||
| 10. Acquisition vs. Breeding | |||||
| 11. Preservation vs. Acquisition | |||||
| 12. Breeding vs. Preservation | |||||
| 13. Biotechnology vs. Enhancement | |||||
| 14. Description vs. Biotechnology | |||||
| 15. Enhancement vs. Description | |||||
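The 15 rows in figure 4.25 are every possible pairing of its six activities. The sketch below generates those pairs and shows one simple way the intensity answers might be turned into priorities. The coding of the five columns from -2 (much less important) to +2 (much more important) and the summing rule are illustrative assumptions, not a scoring method described in this paper.

```python
from itertools import combinations

ACTIVITIES = [
    "Acquisition", "Biotechnology", "Breeding",
    "Description", "Enhancement", "Preservation",
]

# Every unordered pair of the six activities: 6 * 5 / 2 = 15 comparisons.
pairs = list(combinations(ACTIVITIES, 2))

def priority_scores(ratings):
    """Sum intensity ratings into one score per activity.

    ratings: dict mapping (first, second) -> integer from -2 to 2 giving the
             importance of `first` relative to `second` (assumed coding).
    """
    totals = {activity: 0 for activity in ACTIVITIES}
    for (first, second), score in ratings.items():
        totals[first] += score   # credit to the first item of the pair
        totals[second] -= score  # equal and opposite debit to the second
    return totals

# Hypothetical respondent: everything equal except one comparison.
ratings = {pair: 0 for pair in pairs}
ratings[("Acquisition", "Biotechnology")] = -2  # Acquisition much less important
scores = priority_scores(ratings)
```

The number of pairs grows quickly with the number of options (10 options would require 45 comparisons), so this format is practical only for short lists.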
Chapter 5
Avoiding Inappropriate Questions
To make sure questions are appropriate, the evaluators must become familiar with respondent groups-their knowledge of certain areas, the terms they use, and their perceptions and sensitivities. What may be an excessive burden for one group may not be for another. And what may be a fair question for some may not be for others. For example, in a survey of the handicapped, those who were not obviously handicapped were very sensitive about answering questions.
This chapter discusses nine types of inappropriate questions and ways to avoid them. Questions are inappropriate if they
The best way to avoid inappropriate questions is to learn about the respondent group,
design and field test for this group, and not rely on preconceptions or stereotypes. An
anecdote may bring this point home. A researcher was pretesting a questionnaire on people
who used mental health services. During the test, the researcher expressed surprise that
the respondents could handle certain difficult concepts. Annoyed, one of the respondents
rejoined, "I may be crazy, but I'm not stupid."
Questions That Are Not Relevant to the Evaluation Goals
A questionnaire should contain no more questions than necessary. Questions that are not
related to the goals of the evaluation or that are not likely to be used in the final
report should be avoided. They require unnecessary time and effort from respondents. And
questions that they view as irrelevant to the evaluation are less likely to be answered.
This is the single biggest cause of nonparticipation. However, there are occasions when
questions that are indeed very important appear to be irrelevant. If this is expected, the
author should be very careful to explain why they were included.
Occasionally, however, someone asks the evaluators to include what is called a "rider," an unrelated question for use in another evaluation. Including riders creates three problems. First, the evaluation now has a dual purpose that has to be explained to readers. Second, the riders have to be woven into the questionnaire so that they do not seem irrelevant. Third, the use of the rider changes the context and hence the meaning of the questions.
Aside from riders, there are three other ways in which irrelevant questions typically find their way into evaluations:
Not one of these reasons is acceptable because the use of evaluations for such purposes wastes the agency's and the respondents' time and money.
Unbalanced Line of Inquiry
Evaluators should not write questions that could be seen as developing a line of inquiry
to support a particular position or preconceived idea, possibly at the expense of evidence
to the contrary. The purpose of questionnaires is to develop information for an objective
evaluation. To seem to do otherwise threatens a study's reputation for objectivity,
commitment to balance, and integrity.
Questions That Cannot or Will Not Be Answered Accurately
Perhaps the most frequent source of error is asking questions that cannot or will not be
answered correctly. For example, we asked companies for 4 years of data, when they kept
records for only 3 years.
A more difficult problem occurs when respondents either purposely or unconsciously give
biased answers. For example, unit commanders had a favorable bias when reporting on the
performance of their units, whereas enlisted personnel were more likely to "tell it
like it is." Similarly, physicians in certain hospitals rated the quality of their
own medical practice very high but were objective in their judgment of peers. In these
instances, it was inappropriate to ask unit commanders and physicians to rate themselves,
because they were understandably biased in their answers. We obtained much more accurate
observations from other sources (enlisted members and physician peer and nurse reports).
Sometimes respondents provide misinformation because they make a random guess, because they
do not like to admit that they do not know something, or because they want to please the question
asker by responding "yes." But it is better to have no information than false
information. So it is important to screen out those not qualified to answer by using
socially acceptable skip questions (see figure 5.1) or to direct the questionnaire only to
those the evaluators know are knowledgeable. For example, in one project we evaluated the
usefulness of a congressional report that analyzed federal funding by program and
geographic location. We did not know which congressional staff used this report. So we
analyzed staffing patterns to identify the likely users and sent the questionnaire to them.
Figure 5.1 Skip Question
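The skip logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: figure 5.1 is not reproduced here, so the item names `used_report` and `usefulness_rating` and the answer wording are hypothetical. The sketch shows the principle that an unqualified respondent's follow-up answer is recorded as missing rather than kept as a possible guess.

```python
from typing import Dict, List, Optional

# Hypothetical item names; the actual wording of figure 5.1 is not shown in the text.
SCREENER = "used_report"
FOLLOW_UP = "usefulness_rating"


def apply_skip(raw: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Blank the follow-up answer unless the screener was answered 'yes'."""
    cleaned = dict(raw)
    if raw.get(SCREENER) != "yes":
        # Treat as "not applicable": no information is better than false information.
        cleaned[FOLLOW_UP] = None
    return cleaned


def usable_ratings(responses: List[Dict[str, Optional[str]]]) -> List[str]:
    """Return follow-up answers from qualified respondents only."""
    cleaned = (apply_skip(r) for r in responses)
    return [r[FOLLOW_UP] for r in cleaned if r.get(FOLLOW_UP) is not None]


if __name__ == "__main__":
    sample = [
        {"used_report": "yes", "usefulness_rating": "very useful"},
        {"used_report": "no", "usefulness_rating": "somewhat useful"},  # a guess
    ]
    print(usable_ratings(sample))
```

In this sketch the second respondent's rating is dropped because that respondent reported never using the report.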
Another means of selection is to ask people to rate their expertise. For example, in a
study of the feasibility of a national health plan, we asked people to rate their
expertise in the various knowledge areas such as the health care industry, insurance,
education, manufacturing, and preventive medicine.
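Selection by self-rated expertise can be sketched as a simple filter. The 1-to-5 rating scale and the cutoff of 3 are assumptions made for illustration; the text does not say what scale or threshold was used in the health plan study.

```python
from typing import Dict, List

# Hypothetical cutoff on a hypothetical 1-5 self-rating scale.
MIN_RATING = 3


def qualified_areas(self_ratings: Dict[str, int], min_rating: int = MIN_RATING) -> List[str]:
    """List the knowledge areas in which a respondent rates at or above the cutoff."""
    return sorted(area for area, rating in self_ratings.items() if rating >= min_rating)


def filter_answers(
    answers_by_area: Dict[str, str],
    self_ratings: Dict[str, int],
    min_rating: int = MIN_RATING,
) -> Dict[str, str]:
    """Keep only the answers given in areas where the respondent claims expertise."""
    keep = set(qualified_areas(self_ratings, min_rating))
    return {area: ans for area, ans in answers_by_area.items() if area in keep}


if __name__ == "__main__":
    ratings = {"insurance": 4, "education": 1, "preventive medicine": 3}
    answers = {"insurance": "feasible", "education": "not feasible"}
    print(filter_answers(answers, ratings))
```

An area the respondent did not rate at all is treated as unqualified, which errs on the side of having no information rather than unreliable information.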
Questions That Are Not Geared to Respondents' Depth and Range of Information, Knowledge, and Perceptions
To avoid questions not properly geared to the respondents, it is important not to use
words or terms they do not understand. It is very easy to assume that respondents know the
same words we do. Some terms and abbreviations that have caused problems in past surveys
are "detoxification," "EEO," "DCASR," "peer
group," "net sales," and "adjusted gross income." We could have
saved time and money had we provided a few words of explanation, such as
"detoxification, or drying out"; "peer group, or the people you work with
who have similar rank or status"; and "net sales, or the profit on sales after
all expenses have been deducted."
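For questionnaires assembled electronically, adding a few words of explanation to troublesome terms could be automated along the following lines. This is a minimal sketch, not a description of any tool we used: the `GLOSSES` table and the `add_glosses` function are hypothetical, and only two of the explanations quoted above are included. The parenthetical format is also a choice made for the sketch.

```python
import re
from typing import Dict

# Plain-language explanations taken from the examples in the text.
GLOSSES: Dict[str, str] = {
    "detoxification": "drying out",
    "peer group": "the people you work with who have similar rank or status",
}


def add_glosses(question: str, glosses: Dict[str, str] = GLOSSES) -> str:
    """Append an explanation after the first use of each listed term."""
    for term, gloss in glosses.items():
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        # A function replacement avoids backslash handling in the gloss text.
        question = pattern.sub(
            lambda m, g=gloss: f"{m.group(0)} (or {g})", question, count=1
        )
    return question


if __name__ == "__main__":
    print(add_glosses("Have you completed detoxification?"))
```

Only the first occurrence of each term is explained, so a repeated term does not clutter later questions.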
Evaluators must also use terms in the same context and sense in which respondents are used to
seeing them. To students at a state college, the student union was a place where people
hung out, watched television, and bought coffee and doughnuts; however, to military academy
cadets, it was a subversive organization. In another survey, the term "margin"
had different meanings to different respondents. It meant barely adequate to consumers,
the amount of collateral required for stock purchases to bankers and brokers, the benefits
of building or buying additional units to businessmen, and a cross-tabulation calculation
to statisticians.
Question writers must be familiar with their population, and they cannot assume too much
or too little. For instance, we were worried about using two technical terms in surveying
ranchers: "actual grazing capacity" and "forage productive capacity."
However, our pretests showed the ranchers uniformly understood the terms. In another
survey, we asked users to rate the quality of the computer image tapes from the LANDSAT
earth-orbiting satellite. (The tapes provide data used to make computer maps of the
earth's surface.) In general, the users could not answer this question because it was too
broad. They wanted us to be much more specific and ask about the quality of the calibration, striping,
formatting, wavelength bands, pixel number of original amplitude steps used in digital
conservation, corrections for geometric errors and distortions, and threshold settings. In
yet anothe