U.S. Department of Housing and Urban Development. Program Evaluation and Analysis: A
Technical Guide for State and Local Governments. Washington, DC: Prepared for the U.S.
Department of Housing and Urban Development by Public Technology, Inc.; 1978. pp.
20-25.
TASK 6 - DATA COLLECTION
The sixth task in the evaluation process is usually the most time-consuming and
expensive: collecting the data needed to conduct the evaluation. There are four major
steps in this task: (1) identify the necessary data, (2) determine data availability, (3)
collect the data, and (4) verify the accuracy of the data.
Step 1- Identifying the Data
Identifying the data involves determining what statistics or indicators are required to
measure the criteria identified earlier in the evaluation process. In many cases, the
criteria themselves will be statistical measures. An illustration of this can be seen
using the example of fire service criteria presented in Chapter II where the objective was
a 50 percent increase in public awareness of fire dangers this year. The associated
criteria were:
(a) Number of fire safety demonstrations performed.
(b) Public response to fire safety questionnaire.
(c) Number of fire hazards reported by the public.
Criteria (a) and (c) are specific statistical measures. Criterion (b) actually represents several statistics, since analysis of the survey questionnaire responses would probably yield separate figures on overall awareness of hazards, and on awareness of specific types of hazards. The analyst should study each criterion and ask what data would be needed to quantify the criterion. The analyst should not be concerned at this point with whether the data are easily available, since a thorough check of this point is the next step. If no single data source seems sufficient, it may be necessary to identify several data sources that indirectly measure aspects of the criteria.
Step 2- Determining Data Availability
Once the analyst has determined what data are necessary, the second step is to determine
which of those data are available. At least a preliminary survey of data availability should have
been done during the project selection process to ensure the feasibility of the project.
The methodology outlined here for determining data availability is considerably more
detailed than that used for preliminary data surveys.
As a matter of practicality, for small evaluations the analyst may well determine data availability and begin collection at the same time. For most evaluations, it will be desirable to keep these steps separate since the absence of required data may cause the analyst to formulate a new strategy for data collection. It is not always necessary to obtain data for every criterion of a multiple-criteria objective. Using the fire prevention example, it would not be absolutely necessary to obtain data for all three of the criteria to be able to make a sound evaluation of program effectiveness. Each piece of data would provide an additional indicator of program effectiveness, but even without all of the data, valid conclusions could still be drawn about the program.
The analyst would be well advised to prepare a worksheet to use during data identification and collection. Such a worksheet would have the specific program objective at the top of the page, a list of the applicable criteria, and the data required to measure each. Additional information could be added indicating the availability and specific location of the data. A sample of such a form using the fire prevention example is shown in Figure 10.
There are numerous types of data, but for our purposes only three will be discussed in detail: (1) existing records and statistics, (2) client perception surveys, and (3) special data collection techniques.
1. Existing Records and Statistics. The analyst should begin the data search by examining the existing records of the jurisdiction, starting with those of the program agency. The partially completed data availability worksheets with the data requirements identified should be shown to the program agency liaison person. The agency liaison should be able to determine quickly whether the agency has the required data and help the analyst figure out the best way to collect them.
Some evaluations will require data from several agencies since the program being evaluated involves more than one agency. For example, an evaluation of police effectiveness would probably require records from the courts. Obtaining the cooperation of several agencies can be quite difficult, especially if the evaluation effort does not affect or benefit them directly. Such situations require experience and skill on the part of the evaluation team leader and underscore the importance of top-level management support for the evaluation. It is the analyst's job to locate the necessary data, but the team leader's help will often be needed to gain access to them. Some general suggestions that may prove helpful in locating data are presented in Figure 11.
2. Client Perception Surveys. If the data identification process revealed a need for data on citizen perceptions of service delivery, the analyst will probably have to turn to sources other than existing records. The analyst should determine whether a survey has recently been completed either on a jurisdiction-wide basis or in the specific program area of the evaluation. A survey conducted within the past year can be considered current. The analyst should examine the questions and responses to determine if the necessary data can be obtained from the survey. If the survey is too old or none has been conducted, then consideration must be given to initiating a new survey.
The experience of several jurisdictions that have used surveys in program analysis indicates that small, narrowly defined surveys yield the most productive results. For example, a short (3-6 questions) survey on citizen satisfaction with plastic trash bags, or a specific recreation program, yields results that are easy to interpret and involves relatively little effort to prepare and administer. Such surveys are also easier for citizens to respond to than a long survey that asks their perception on a wide range of government programs or issues. The analyst may be able to use statistics on citizen complaints or service requests to gauge citizen perceptions on specific services.
3. Special Data Collection Techniques. Once the data availability worksheet has been completed, the analyst must study it carefully to see if sufficient data are available to make a valid evaluation. This will be a particularly sensitive decision for objectives that can only be measured by one or two criteria. As a rule of thumb, data should be available on more than half of the criteria to ensure the validity of the evaluation. This rule of thumb must be used very cautiously, for some criteria can be more vital to an evaluation than others; therefore, it also matters which criteria can be measured. To retain the community impact emphasis of the evaluation, it is necessary to give most weight to those criteria that measure citizen perceptions and direct effects on the program clientele groups.
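The worksheet and the more-than-half rule of thumb can be sketched in code. The following is a minimal Python illustration and is not part of the original Guide: the class names, the `weight` field, and the availability and location entries are hypothetical, supplied only so that the example runs.

```python
from dataclasses import dataclass, field


@dataclass
class Criterion:
    """One worksheet row: a criterion and the data needed to measure it."""
    name: str
    data_required: str
    available: bool = False
    location: str = ""    # where the data are kept, if known
    weight: float = 1.0   # hypothetical importance weight (assumption)


@dataclass
class Worksheet:
    """Data availability worksheet for a single program objective."""
    objective: str
    criteria: list[Criterion] = field(default_factory=list)

    def share_available(self) -> float:
        """Fraction of criteria for which data are available."""
        if not self.criteria:
            return 0.0
        return sum(c.available for c in self.criteria) / len(self.criteria)

    def weighted_share_available(self) -> float:
        """Same fraction, but counting each criterion by its weight."""
        total = sum(c.weight for c in self.criteria)
        if total == 0:
            return 0.0
        return sum(c.weight for c in self.criteria if c.available) / total

    def meets_rule_of_thumb(self) -> bool:
        """True when data exist for more than half of the criteria."""
        return self.share_available() > 0.5


# Illustrative entries only; availability and locations are invented.
worksheet = Worksheet(
    objective="50 percent increase in public awareness of fire dangers this year",
    criteria=[
        Criterion("Number of fire safety demonstrations performed",
                  "Count of demonstrations", available=True,
                  location="Program agency records"),
        Criterion("Public response to fire safety questionnaire",
                  "Survey responses", available=False, weight=2.0),
        Criterion("Number of fire hazards reported by the public",
                  "Count of hazard reports", available=True,
                  location="Program agency records"),
    ],
)

print(worksheet.meets_rule_of_thumb())
print(worksheet.weighted_share_available())
```

With these illustrative entries, data are available for two of the three criteria, which satisfies the simple rule of thumb, yet the weighted share is only one half because the citizen perception criterion was given more weight. This mirrors the caution that it matters which criteria can be measured.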
Figure 10. DATA AVAILABILITY WORKSHEET. This is a suggested form to be prepared by the analyst to determine the availability of the data needed to conduct an evaluation. The information shown in the sample applies to the fire prevention example originally presented in Chapter II.
Figure 11. DATA LOCATION. Below are some suggested sources for the types of data often required for program evaluations.
If the jurisdiction has an active records management program, it may be
valuable to spend some time becoming familiar with the records inventory. A properly
maintained inventory will quickly tell what information is kept by each agency, how far
back the records go, how they are accessed, and where they are kept. Very few
jurisdictions have such a complete system, but if the jurisdiction is fortunate enough to
have one, it can be valuable to evaluators.
Demographic data (population characteristics, geographic dispersion, etc.) are necessary for many evaluations. In many instances, a regional planning agency or State planning department should be able to supply census information that fits the requirements. Keep in mind, however, that the census data for many localities may be out of date. If the community is a rapidly growing or decreasing one, or if it routinely has a high percentage of transients, then the census data must be used with caution. One of the most frequent uses of census data is to draw a profile of the community so that an accurate sample may be selected for survey purposes.
Cost data are, of course, usually available from the accounting function of the finance agency. Depending on the level of detail needed and the type of financial reporting system the jurisdiction uses, it may be necessary for an account clerk to work with agency personnel to extract and total detailed records. Many operating agencies maintain some type of internal manual accounting system in addition to whatever type of centralized accounting system the jurisdiction uses. Such "satellite" accounting systems can be useful to the analyst since they are often easier to access for program costs than are central records. A possible problem in using data from such satellite systems is that agency personnel may classify expenditures differently than the central accounting office would. This can create discrepancies if the analyst is trying to compare expenditures with budgeted amounts for specific categories. It is usually possible to reconcile such discrepancies, but it will mean locating and examining the specific vouchers in question.
All health departments routinely record births, deaths, and causes of death and code these data by census tract. Aggregations of these data on a State and national level are available. Such statistics can be used to evaluate health programs by comparing the statistics for a neighborhood with other neighborhoods similar to it in demographic characteristics either within the jurisdiction or in other jurisdictions. Naturally, such comparisons should be made with care since many other factors are involved.
Data for evaluating manpower and employment programs are available from the State employment service or from county or city manpower offices. Statistics on employment by age, profession, race, education, and other factors are available by labor area. A "labor area" is a central city and the surrounding region within easy commuting distance. Data on more specific geographic areas, such as neighborhoods, can sometimes be obtained from the State employment service, or can be determined by survey.
The analysis described above will enable the analyst to determine whether the evaluation can be completed with the available data. There will be many instances when additional data will be necessary, and even more instances when additional data can add greatly to the validity and utility of the evaluation. This is a key decision point in an evaluation because, if some of the necessary data are lacking, a determination must be made whether to: (1) continue the evaluation with available data, (2) take the necessary time and effort to gather additional data from scratch, or (3) scrap the evaluation for lack of sufficient data.
If the first decision is reached, the analyst may conclude that the lack of data
requires limiting the scope of the evaluation. If this limitation is deemed significant by
the team leader, then management and/or elected officials should be apprised of the
specifics and asked to approve the new scope or to direct that additional data be
generated to perform the evaluation as originally planned. If the analyst and team leader
decide there is insufficient information and that it is impractical to gather the needed
data, they should document their findings and present them to management.
When a reduced evaluation scope will not provide management with the type of information
needed for decision making, it is necessary to generate data from scratch. The specific
data should already have been identified, so the first job is to determine
exactly how to go about collecting them. The analyst and team leader should decide whether
the data can be collected (1) by adding one or more data items to records routinely kept
by the government, (2) by establishing new records and procedures, or (3) by using a
special technique, such as a citizen survey. After this decision is made, the analyst
should prepare a work plan that clearly states the specific data needed, the methodology
to be employed, the time period to be covered, the calendar time required, the personnel
time required, the estimated cost of data collection, and the impact the collection effort
will have on the schedule for the evaluation as a whole. Once the impact on the project is
known, the new work plan should be submitted to top management and elected officials for
their consideration to ensure that all understand and approve the scope of the evaluation.
The main point to keep in mind is that the need to collect data from scratch, whatever the
reason, will have a significant impact on the duration and cost of the evaluation.
Step 3- Physically Collecting the Data
Once the data requirements have been identified and availability ascertained, the team
leader, analyst, and agency liaison person should meet to decide the best way to
collect the data. As mentioned earlier, there are three main sources for evaluation data:
(1) existing records and statistics, (2) client perception surveys, and (3) special data
collection techniques.
1. Existing Records and Statistics. Data from existing records and
statistics can usually be collected most efficiently by program agency personnel. The
people who handle the records on a day-to-day basis are able to extract the data quickly,
since they do not need a "get acquainted" period. Using program agency personnel
to do the time-consuming physical work of data collection can also free the evaluation
analyst for involvement in several evaluation projects simultaneously.
Several things must be done, however, before program agency personnel can be turned loose
on a data collection problem. First, the analyst should spot-check the accuracy of the
data. The source of the data is important to accuracy. If the data are guesses or estimates by field personnel
rather than "hard" data provided by program clients or some reliable form of
measurement, then the validity of the data may be seriously questioned. A full discussion
of data accuracy will be presented in Step 4 of this task.
Second, the analyst must provide the agency personnel with clear, concise directions. The
analyst must be able to tell agency personnel exactly what data are needed and the
specific time span to be covered. The analyst should also provide worksheets for recording
the data so that they are collected in a consistent manner. It may also be possible to lay
out the worksheets so as to facilitate later analysis of the data. The analyst and the
agency liaison person should meet with the employees who will be doing the actual data
collection and discuss the reason for the data collection, the significance of the
evaluation, and the collection worksheet and any special instructions. After answering
questions, the analyst may find it beneficial to spend a few minutes working with
employees as they put the worksheets to use for the first time.
It is also wise for the analyst to spot-check data accuracy during the data collection by
examining a sample of the source records and comparing them with the worksheets prepared
by the agency personnel. To facilitate these checks, analysts should have the agency
personnel forward worksheets to them on an "as completed" basis, perhaps once a
week.
Evaluations that require data from several agencies can cause the analyst difficulty in
actually collecting the data and/or in coordinating the efforts of several groups. The
example of a police effectiveness evaluation used earlier will help illustrate the point.
To get a complete picture of police performance, data are likely to be needed from the
prosecutor and/or court system on indictment and conviction rates, and perhaps accident
statistics from the traffic engineering department. The prosecutor's office may not
perceive any immediate benefit to that agency from the evaluation and therefore may be
reluctant to take an active part in the project. The experience, tact, and political
expertise of the evaluation team leader can often greatly improve cooperation. The team
leader may be able to persuade the agency to cooperate by showing the agency head how his
or her agency will benefit.
In the above instance, the team leader may be able to convince the prosecutor that the
evaluation may produce results pointing to the need for police officers to build cases on
more solid evidence, thus making the prosecutor's job easier. If such a line of reasoning
fails to persuade the agency head, the team leader may be able to gain cooperation by
offering the resources of the team to help the agency head solve an operational problem in
return for voluntary assistance with the evaluation. A management mandate ordering the
agency to cooperate should be sought only as a last resort, since the resulting hard
feelings often lead to unfavorable agency perceptions of the evaluation process.
An evaluation such as the one outlined above also raises the issue of confidentiality of
personnel data. Some agencies may refuse the evaluation team access to individual records
on this basis. In such instances the evaluation team may be able to examine the records in
question by limiting access to a single designated analyst who will work solely on the
agency premises. In other cases, the agency may be willing to aggregate key information
about a group of individuals so that no one person can be identified. While it is always
preferable to examine the data first-hand, there may be instances in which aggregation of
data must be accepted.
2. Client Perception Surveys. Client perceptions are becoming
increasingly important data sources for evaluations as governments seek to measure various
program impacts on the people they serve. The most prevalent tool for measuring client
(citizen) perceptions is the survey.
Surveys are tools for questioning selected samples of the general public. They may involve
mailing questionnaires to respondents, leaving questionnaires at respondents' homes and
retrieving them at a later date, interviewing respondents in person, or interviewing respondents
over the telephone. Surveys provide feedback on respondent perceptions, desires, needs,
preferences, priorities, opinions, and experiences.
The primary benefit that surveys offer is the capacity to elicit the views of numerous
individuals, many of whom would not otherwise participate in the program evaluation
process. Thus, survey information can be more representative of the public at large than
information obtained through other kinds of public involvement efforts. Surveys also offer
the following benefits:
Survey responses can be readily analyzed to determine underlying patterns and
relationships, including trends over time.
Surveys can focus on specific respondent groups and/or specific issues or objectives of
interest to the user jurisdiction.
Surveys can identify the rationale behind respondent answers.
Surveys can gather information about people's perceptions, desires, and opinions
unavailable from other sources.
Surveys can reduce the sense of isolation or alienation felt by many respondents.
It is important for the analyst to realize that a properly prepared and analyzed survey
is a very useful and powerful tool, but one that requires a considerable amount of
calendar time. A simple, reliable survey may take several weeks to complete, and several
months is a more realistic estimate for many surveys. Although a detailed explanation of
the conduct of sample surveys is beyond the scope of this Guide, an appendix has been
included to provide guidance. Appendix B contains information to help the analyst (1) decide when a survey is appropriate, (2)
prepare and administer the survey instrument (questionnaire), and (3) analyze and present
the results. References are also provided to jurisdictions that have practical experience
in the use of sample surveys.
Methods other than surveys are available to measure citizen perceptions. Regular meetings
of improvement associations, service clubs, and other service organizations can provide a
forum for the airing of perceived problems. While such input is not necessarily
representative of the entire community, the analyst can discern useful information through
careful questioning and listening. Such techniques may be necessary when the time or
resources are not available to conduct a survey. Extreme caution is urged in the use of
information obtained in this way, because of its lack of precision and objectivity.
Citizen complaints or service requests are not normally used as indicators of citizen
perceptions because few jurisdictions have made any effort to handle them systematically.
Additionally, such information is obviously selective since only dissatisfied clients use
this avenue of communication. However, at least one jurisdiction, Kansas City, Missouri,
has made an effort to use complaint data. Through the city's "action
center," service requests are recorded, channeled to the appropriate department for
action, and followed up with a postcard to the citizen asking for an evaluation of the
city's response to the request. Complaint data and citizen ratings are summarized monthly
for the operating departments and the city manager's office. Such a system allows the
administration to get a rough barometer of feeling toward specific services by tracking
complaints. City council members also use the monthly summary figures of complaints as
rough performance indicators for the various departments.
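A monthly summary of this kind can be sketched as a simple grouping of request records. The Python fragment below is only an illustration under stated assumptions: the record layout, the department names, the month labels, and the 1-to-5 rating scale are all hypothetical, since this Guide does not describe the format the city actually uses.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (month, department, citizen rating or None).
# None stands for a follow-up postcard that was not returned.
requests = [
    ("01", "Public Works", 4),
    ("01", "Public Works", 2),
    ("01", "Parks", None),
    ("02", "Public Works", 5),
    ("02", "Parks", 3),
]


def monthly_summary(records):
    """Count requests and average returned ratings per month and department."""
    grouped = defaultdict(list)
    for month, department, rating in records:
        grouped[(month, department)].append(rating)

    summary = {}
    for key, ratings in sorted(grouped.items()):
        returned = [r for r in ratings if r is not None]
        summary[key] = {
            "requests": len(ratings),
            "ratings_returned": len(returned),
            "mean_rating": mean(returned) if returned else None,
        }
    return summary


summary = monthly_summary(requests)
for (month, department), row in summary.items():
    print(month, department, row)
```

Keeping the count of returned postcards next to the mean rating makes it visible when an average rests on very few responses, which matters because such data are selective to begin with.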
3. Special Data Collection Techniques. Often, data must be generated from
scratch. Perhaps the most common way to do this is to add one or more data items to forms
currently in use by the agency. Such efforts are usually relatively low-cost, since the
only additional expense is redesigning and reprinting the appropriate forms. The chief
drawback is that collection of the information will require at least one program interval
(a month, a quarter, or a year), thereby delaying the evaluation for that time period.
A more involved variation of the above is when information must be added later to records
already on hand, as in the example of collecting additional information on clients already
served by a program. Such activities are very difficult to conduct because the
participants must first be located and then persuaded to cooperate. Such after-the-fact
data collection techniques should be used only when alternatives are exhausted.
In some situations, subjective ratings by professionals may be appropriate for evaluating
program effects. This approach may be most useful in social service fields. For example,
professional social workers could use subjective ratings to measure changes in family and
community functioning attributed to social welfare programs. Rating scales might cover:
family relationships and family unity, individual behavior and adjustment, care and
training of children, economic practices, social activities, home and household practices,
health conditions and practices, relationship to social workers, and community resource
use. Explicit directions must be provided for use of each rating on a scale. Ideally, the
rating system should enable a group of professionals, observing the same conditions, to
arrive at the same rating.
A pretest is highly desirable to see if different professionals using specified procedures
would in fact give reasonably similar ratings. When using such a rating scale, individuals
should not be asked to rate themselves on their own effectiveness in providing a service.
Raters should be selected who do not have a personal interest in the outcome.
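One simple way to read the results of such a pretest is to compute how often pairs of raters give the same rating to the same case. The Python sketch below is illustrative only: the rater names and ratings are invented, the grade labels are borrowed from the family functioning illustration in this Guide, and plain percent agreement is an assumption of this example rather than a method the Guide prescribes. Percent agreement does not correct for agreement that would occur by chance.

```python
from itertools import combinations


def pairwise_agreement(ratings_by_rater):
    """Share of (rater pair, case) comparisons in which both ratings match.

    ratings_by_rater maps each rater to a list of ratings, with every list
    covering the same cases in the same order.
    """
    agree = 0
    total = 0
    for rater_a, rater_b in combinations(ratings_by_rater, 2):
        pairs = zip(ratings_by_rater[rater_a], ratings_by_rater[rater_b])
        for rating_a, rating_b in pairs:
            total += 1
            if rating_a == rating_b:
                agree += 1
    return agree / total if total else 0.0


# Invented pretest: three professionals rate the same four families.
pretest = {
    "rater_1": ["Adequate", "Marginal", "Inadequate", "Adequate"],
    "rater_2": ["Adequate", "Marginal", "Marginal", "Adequate"],
    "rater_3": ["Adequate", "Adequate", "Inadequate", "Adequate"],
}

agreement = pairwise_agreement(pretest)
print(f"Pairwise agreement: {agreement:.2f}")
```

A low figure suggests that the directions for the scale need to be made more explicit before the ratings are used in the evaluation.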
For meaningful program evaluation, three factors should be standardized: the
characteristics evaluated by professionals, the rating scale applied to these
characteristics, and the conditions under which the ratings are made. In the family
functioning example, the professionals are given guidance on the aspects of family
functioning to be rated. Each aspect is rated according to a standard descriptive scale.
For instance, one aspect, "sibling relationships," would be assessed on the basis
of criteria for each grade on the scale:
Inadequate: There is conflict between children resulting in physical violence or cruelty
which warrants intervention....
Marginal: Emotional ties among children weak... rarely play together...
Adequate: Positive emotional ties and mutual identification....
The actual rating is made by first-hand observation of the family by the social workers.
This method requires professionals who are competent to make judgments about the
particular situations and who can be impartial in their appraisals. Also, if a grading
scale is not readily available, considerable time and effort will be needed to establish
an acceptable rating system. The costs of making ratings could be large because of the
time required for each observation and the specialized personnel involved. However, if
such ratings can be provided as part of the regular jobs of employed professionals, the
actual out-of-pocket costs to a government may be small.
In some situations, as time passes, raters may deviate from the rating scale. Periodic
checks and retraining in the use of the scale can alleviate this. For example, during the
Washington, D.C. "Operation Clean Sweep," checks of a sample of inspector
ratings using the street cleanliness rating scales indicated that inspectors tended after
a time to compress the scale, i.e., to give fewer extreme ratings. To correct the problem,
the inspectors were exposed to the photographic rating scale. While the tendency to
compress the scale may not be as pronounced with more highly trained professionals, it is
still a situation that the analyst must guard against.
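A periodic check for scale compression can be as simple as comparing the share of extreme ratings in an early period with the share in a later one. The Python sketch below is a hypothetical illustration: the 1-to-4 scale, the sample ratings, and the `max_drop` threshold are assumptions made for this example and are not taken from the Washington, D.C. program.

```python
def extreme_share(ratings, low, high):
    """Fraction of ratings that fall on either end point of the scale."""
    if not ratings:
        return 0.0
    return sum(1 for r in ratings if r in (low, high)) / len(ratings)


def is_compressed(baseline, recent, low, high, max_drop=0.2):
    """Flag a drop in the share of extreme ratings larger than max_drop.

    max_drop is an arbitrary illustrative threshold, not a standard.
    """
    drop = extreme_share(baseline, low, high) - extreme_share(recent, low, high)
    return drop > max_drop


# Invented ratings on an assumed 1-to-4 scale.
baseline_ratings = [1, 2, 4, 3, 1, 4, 2, 3]   # early in the program
recent_ratings = [2, 2, 3, 3, 2, 3, 1, 3]     # after extended use

print(extreme_share(baseline_ratings, 1, 4))
print(extreme_share(recent_ratings, 1, 4))
print(is_compressed(baseline_ratings, recent_ratings, 1, 4))
```

A flag from a check like this does not prove that raters have drifted, since conditions may really have changed; it only indicates that a review of the scale with the raters is worth scheduling.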
This method of data collection is basically subjective and normally should be used in
conjunction with more objective measurements. For example, the number of reported
difficulties in school for client-family children could supplement professional ratings to
measure child adjustment.
If none of the data sources above seems to fit the evaluation, the analysts are free to
develop measures and sources of their own as long as the accuracy of the approach can be
verified. As an example, an evaluation of the Fairfax County, Virginia road maintenance
program was aided by the use of a device called a "roughometer" that measured
inches of roughness per mile. The evaluation team verified the accuracy of this approach
by showing a high correlation between citizen perceptions of roughness and readings taken
by the roughometer on the same sample of streets. There are many less dramatic examples of
analysts making creative uses of field observation techniques by measuring emergency
equipment response time or making special counts of participants in recreational
activities. The point is that the evaluation team should not restrict itself to the
approaches presented in this Guide.
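Verifying a new measure against citizen perceptions, as in the roughometer example, amounts to computing a correlation over the same sample of streets. The Python sketch below uses the Pearson correlation coefficient as one reasonable choice; the Guide does not say which statistic the Fairfax County team used, and the readings and ratings shown here are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Invented data for five streets: instrument readings (inches of roughness
# per mile) and citizen ratings of roughness on an assumed 1-to-5 scale.
instrument_readings = [50, 80, 120, 200, 260]
citizen_ratings = [1, 2, 2, 4, 5]

r = pearson_r(instrument_readings, citizen_ratings)
print(f"Correlation between readings and perceptions: {r:.2f}")
```

A coefficient near 1 indicates that the instrument ranks streets much as citizens do; a value near 0 would suggest the device measures something citizens do not perceive as roughness.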
Step 4- Verifying the Accuracy of the Data
One of the most frequently overlooked aspects of program evaluation is verifying
the accuracy of the data. While treated here as a separate step for emphasis, the
discussion of the previous step correctly suggests that data accuracy should be verified
during data collection. In this way, the analyst can take actions to correct or improve
the data immediately, rather than initiate a second collection effort later. There are
three major types of data inaccuracies: clerical errors, subjective errors, and
methodological errors.
1. Clerical Errors. Clerical errors are one of the most common sources of
inaccuracy. Such errors (transposed digits, recording the wrong figure, etc.) frequently
occur when data are transferred from original source documents to summary reports or data
collection worksheets. Clerical errors can be detected by checking a sampling of the data
collection worksheets against the original source documents. If more than 10 percent of
the sample entries are incorrect, the analyst can take one of several remedial actions.
If more than one person has been recording the data in question, the analyst should try to
determine whether the high error rate is uniform among all collectors or is found only in
the work done by one or more individuals. The employee completing each worksheet can be
identified by a code on the sheet itself. If the high error rate is restricted to one or
more individuals, the analyst can either review collection procedures with those
individuals and stress the importance of accuracy to the employees and their supervisor,
or request that a more accurate employee be assigned to recollect the same data. Should
the high error rate prove to be uniform among all collectors, the analyst should review
the collection procedures with all employees and appropriate supervisors to determine
whether the worksheets are poorly designed or the data collection procedures incomplete or
confusing.
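The spot-check described above can be sketched as a comparison of sampled worksheet entries with their source documents, broken down by the collector code on each sheet. The Python fragment below is illustrative only: the record layout and collector codes are hypothetical, and treating the error rate as "uniform" when every collector exceeds the 10 percent threshold is a simplifying assumption of this example.

```python
from collections import defaultdict

ERROR_THRESHOLD = 0.10  # the 10 percent figure used in this Guide


def error_rates(sample):
    """Return (overall rate, rate by collector) for a checked sample.

    sample is a list of (collector_code, worksheet_value, source_value).
    """
    counts = defaultdict(lambda: [0, 0])  # collector -> [errors, checked]
    for collector, recorded, source in sample:
        counts[collector][1] += 1
        if recorded != source:
            counts[collector][0] += 1
    by_collector = {c: errors / n for c, (errors, n) in counts.items()}
    checked = sum(n for _, n in counts.values())
    errors = sum(e for e, _ in counts.values())
    overall = errors / checked if checked else 0.0
    return overall, by_collector


def diagnose(sample):
    """Suggest which remedial path the error pattern points to."""
    overall, by_collector = error_rates(sample)
    if overall <= ERROR_THRESHOLD:
        return "acceptable"
    high = sorted(c for c, rate in by_collector.items() if rate > ERROR_THRESHOLD)
    if len(high) == len(by_collector):
        return "uniform: review worksheets and procedures with all collectors"
    return "concentrated: review procedures with collectors " + ", ".join(high)


# Invented sample: collector A copied 10 entries correctly; collector B
# copied 6 correctly and transposed digits in 4.
checked_sample = (
    [("A", 125, 125)] * 10
    + [("B", 340, 340)] * 6
    + [("B", 152, 125)] * 4
)

overall_rate, rates_by_collector = error_rates(checked_sample)
print(overall_rate, rates_by_collector)
print(diagnose(checked_sample))
```

Here the overall rate exceeds the threshold, but the errors come from a single collector, which points to reviewing procedures with that individual instead of redesigning the worksheet.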
If data collection accuracy does not improve, analysts may want to consider collecting the
data themselves or finding another way to measure the criterion in question. Another
remedial course is to postpone the evaluation while improved data collection procedures
are developed. This will usually mean postponing the evaluation for one program period
(one month to one year). Naturally, the earlier in the evaluation process this
determination can be made, the fewer dollar and personnel resources will be wasted on an
incomplete effort.
2. Subjective Judgment Errors. Data involving subjective judgments will
require more involved accuracy checks than outlined above. When dealing with subjective
ratings such as those provided by inspectors or social services counselors, the analyst
must make an effort to determine the accuracy of the rating system. This is accomplished
by examining the rating scale to determine how clear and comprehensive the descriptions
are of the various rating categories. In addition, the analyst should attempt to determine
how much training the field personnel have had in the use of the scale and how often the
training is reviewed. The review question can be significant, since experience has shown
that extended use of a subjective scale often results in "compressed" ratings,
i.e., fewer ratings toward the extremes of the scale. Periodic reviews of the scale with
supervisors can help alleviate this tendency.
The analyst may also find it useful to examine the turnover rate among field personnel,
since high turnover often results in inconsistent ratings over the evaluation period. The analyst can check the consistency of the
ratings by getting several people independently to apply the rating scale to the same
situation or site at the same time.
3. Methodological Errors. Of the data collection techniques mentioned, surveys
are most prone to methodological error. The analyst should review the survey instrument
(questionnaire) for possible bias, the sample selection method, the size of the sample,
the degree of training given to surveyors, and the methods used to analyze responses. The
references found in Appendix B should provide the information needed to make most of these
determinations. No survey can be 100 percent accurate. What the analyst should watch for
are instances in which opinions or results are not clear-cut on a specific question and
there is some evidence of significant inaccuracy in the survey. Management should be
cautioned about relying on data that do not have a high degree of reliability. Data from flawed surveys can
still be used, but with due caution.
Another type of methodological error can sometimes be avoided by double-checking of the
analyst's thought processes. It is very easy to get so involved in what you are doing that
relatively simple errors go unnoticed. For example, an evaluation director reported that
one of his associates was deeply involved in establishing criteria and collecting data on
the effectiveness of fire suppression services. The analyst hit on the idea of using the
percentage of a building that was consumed by fire as a criterion for effectiveness of the
fire department. The evaluation director hastened to point out that, since the fire
department had no control over how long a building had been burning before an alarm was
turned in, and since a building might well be fully engulfed before the department was
notified, the proposed criterion was neither fair nor valid. There is a much better chance
of avoiding such errors if the work of an analyst is checked by at least one other
analyst.
It is generally inadvisable to continue the evaluation with data errors greater than 10
percent. If an evaluation is continued under such circumstances, the analyst should be
sure to clearly identify the resulting distortions in the evaluation report. Managers must
understand that they cannot place the same degree of confidence in evaluations with
questionable data as in evaluations with highly reliable data.
In summary, five major options can be pursued if key data are discovered to be inaccurate:
(1) The evaluation team can seek other, perhaps less direct, ways of getting acceptable
data. (2) Improved procedures can be adopted for collecting the data and the evaluation
postponed until new, reliable data can be gathered. (3) The evaluation team can seek to
improve the quality of the data by such methods as closer supervision of the collection
effort, or the use of better collection forms. (4) The evaluation can be continued with
the clear warning that management should be cautious in using the data in question for
decision making. (5) The evaluation can be canceled as infeasible. While the most suitable
option will depend on the specifics of the situation, analysts will probably feel the most
confident with the second option, where practical. The important point is to recognize
that inaccurate data can badly undermine the credibility of an evaluation, and the analyst
should guard against this problem.