U.S. Department of Housing and Urban Development. Program Evaluation and Analysis: A
Technical Guide for State and Local Governments. Washington, DC: Prepared for the U.S.
Department of Housing and Urban Development by Public Technology, Inc.; 1978.  pp. 20-25.

TASK 6 - DATA COLLECTION
The sixth task in the evaluation process is usually the most time-consuming and expensive: collecting the data needed to conduct the evaluation. There are four major steps in this task: (1) identify the necessary data, (2) determine data availability, (3) physically collect the data, and (4) verify the accuracy of the data.

Step 1- Identifying the Data
Identifying the data involves determining what statistics or indicators are required to measure the criteria identified earlier in the evaluation process. In many cases, the criteria themselves will be statistical measures. An illustration of this can be seen using the example of fire service criteria presented in Chapter II where the objective was a 50 percent increase in public awareness of fire dangers this year. The associated criteria were:
(a) Number of fire safety demonstrations performed.
(b) Public response to fire safety questionnaire.
(c) Number of fire hazards reported by the public.

Criteria (a) and (c) are specific statistical measures. Criterion (b) actually represents several statistics, since analysis of the survey questionnaire responses would probably yield separate figures on overall awareness of hazards and on awareness of specific types of hazards. The analyst should study each criterion and ask what data would be needed to quantify it. The analyst should not be concerned at this point with whether the data are easily available, since a thorough check of this point is the next step. If no single data source seems sufficient, it may be necessary to identify several data sources that indirectly measure aspects of the criteria.

Step 2- Determining Data Availability
Once the analyst has determined what data are necessary, the second step is to determine how much of the data are available. At least a preliminary survey of data availability should have been done during the project selection process to ensure the feasibility of the project. The methodology outlined here for determining data availability is considerably more detailed than that used for preliminary data surveys.

As a matter of practicality, for small evaluations the analyst may well determine data availability and begin collection at the same time. For most evaluations, it will be desirable to keep these steps separate since the absence of required data may cause the analyst to formulate a new strategy for data collection. It is not always necessary to obtain data for every criterion of a multiple-criteria objective. Using the fire prevention example, it would not be absolutely necessary to obtain data for all three of the criteria to be able to make a sound evaluation of program effectiveness. Each piece of data would provide an additional indicator of program effectiveness, but even without all of the data, valid conclusions could still be drawn about the program.

The analyst would be well advised to prepare a worksheet to use during data identification and collection. Such a worksheet would have the specific program objective at the top of the page, a list of the applicable criteria, and the data required to measure each. Additional information could be added indicating the availability and specific location of the data. A sample of such a form using the fire prevention example is shown in Figure 10. There are numerous types of data, but for our purposes only three will be discussed in detail: (1) existing records and statistics, (2) client perception surveys, and (3) special data collection techniques.

1. Existing Records and Statistics. The analyst should begin the data search by examining the existing records of the jurisdiction, starting with those of the program agency. The partially completed data availability worksheets with the data requirements identified should be shown to the program agency liaison person. The agency liaison should be able to determine quickly whether the agency has the required data and help the analyst figure out the best way to collect them.

Some evaluations will require data from several agencies since the program being evaluated involves more than one agency. For example, an evaluation of police effectiveness would probably require records from the courts. Obtaining the cooperation of several agencies can be quite difficult, especially if the evaluation effort does not affect or benefit them directly. Such situations require experience and skill on the part of the evaluation team leader and underscore the importance of top-level management support for the evaluation. It is the analyst's job to locate the necessary data, but the team leader's help will often be needed to gain access to them. Some general suggestions that may prove helpful in locating data are presented in Figure 11.

2. Client Perception Surveys. If the data identification process revealed a need for data on citizen perceptions of service delivery, the analyst will probably have to turn to sources other than existing records. The analyst should determine whether a survey has recently been completed either on a jurisdiction-wide basis or in the specific program area of the evaluation. A survey conducted within the past year can be considered current. The analyst should examine the questions and responses to determine if the necessary data can be obtained from the survey. If the survey is too old or none has been conducted, then consideration must be given to initiating a new survey.

The experience of several jurisdictions that have used surveys in program analysis indicates that small, narrowly defined surveys yield the most productive results. For example, a short (3-6 questions) survey on citizen satisfaction with plastic trash bags, or a specific recreation program, yields results that are easy to interpret and involves relatively little effort to prepare and administer. Such surveys are also easier for citizens to respond to than a long survey that asks their perception on a wide range of government programs or issues. The analyst may be able to use statistics on citizen complaints or service requests to gauge citizen perceptions on specific services.

3. Special Data Collection Techniques. Once the data availability worksheet has been completed, the analyst must study it carefully to see if sufficient data are available to make a valid evaluation. This will be a particularly sensitive decision for objectives that can only be measured by one or two criteria. As a rule of thumb, data should be available on more than half of the criteria to ensure the validity of the evaluation. This rule of thumb must be used very cautiously, for some criteria can be more vital to an evaluation than others; therefore, it also matters which criteria can be measured. To retain the community impact emphasis of the evaluation, it is necessary to give most weight to those criteria that measure citizen perceptions and direct effects on the program clientele groups.

Figure 10. DATA AVAILABILITY WORKSHEET. This is a suggested form to be prepared by the analyst to determine the availability of the data needed to conduct an evaluation. The information shown in the sample applies to the fire prevention example originally presented in Chapter II.

DATA AVAILABILITY WORKSHEET
PROJECT: Fire Prevention Program Evaluation
GOAL: Reduction in incidence of fires
OBJECTIVE: 50% increase in public awareness of fire dangers this year.
TIME PERIOD COVERED: Fiscal Year 1976
CRITERIA
1. Number of fire safety demonstrations performed.
     Data Required: Statistics on number of fire safety demonstrations performed.
     Availability: Fire department incident reports (headquarters central file room).
2. Public response to fire safety questionnaire.
     Data Required: Statistics on percentage of population showing awareness of various types of fire hazards.
     Availability: Not immediately available; sample survey required.
3. Number of fire hazards reported by the public.
     Data Required: Statistics on number and type of fire hazards reported by the public.
     Availability: (1) Fire department dispatching records (headquarters central file room).
                   (2) Mayor's "Citizen Service Line" complaint data (Mayor's office files).
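
The worksheet and the "more than half of the criteria" rule of thumb can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the class name Criterion, the function names, and the choice to treat a criterion as "available" whenever at least one source is listed are hypothetical and do not come from the Guide. The rule of thumb still has to be applied with judgment, since some criteria matter more than others.

```python
from dataclasses import dataclass, field


@dataclass
class Criterion:
    """One row of a data availability worksheet (hypothetical structure)."""
    name: str
    data_required: str
    sources: list = field(default_factory=list)  # empty list = no data located yet

    @property
    def available(self) -> bool:
        # Assumption: any listed source counts as "data available".
        return bool(self.sources)


def coverage(criteria):
    """Fraction of criteria for which at least one data source was located."""
    return sum(c.available for c in criteria) / len(criteria)


def passes_rule_of_thumb(criteria):
    """Rule of thumb from the text: data on MORE than half of the criteria."""
    return coverage(criteria) > 0.5


# The fire prevention example from Figure 10.
worksheet = [
    Criterion(
        "Number of fire safety demonstrations performed",
        "Statistics on number of fire safety demonstrations performed",
        ["Fire department incident reports"],
    ),
    Criterion(
        "Public response to fire safety questionnaire",
        "Percentage of population showing awareness of fire hazards",
        [],  # not immediately available; sample survey required
    ),
    Criterion(
        "Number of fire hazards reported by the public",
        "Statistics on number and type of fire hazards reported",
        ["Fire department dispatching records",
         "Mayor's Citizen Service Line complaint data"],
    ),
]

missing = [c.name for c in worksheet if not c.available]
print(f"coverage: {coverage(worksheet):.2f}, still missing: {missing}")
```

In this example two of the three criteria have located sources, so the rule of thumb is met, but the missing criterion is the one that measures citizen perceptions, which the text says should carry the most weight.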

Figure 11. DATA LOCATION. Below are some suggested sources for the types of data often required for program evaluations.

If the jurisdiction has an active records management program, it may be valuable to spend some time becoming familiar with the records inventory. A properly maintained inventory will quickly tell what information is kept by each agency, how far back the records go, how they are accessed, and where they are kept. Very few jurisdictions have such a complete system, but if the jurisdiction is fortunate enough to have one, it can be valuable to evaluators.
Demographic data (population characteristics, geographic dispersion, etc.) are necessary for many evaluations. In many instances, a regional planning agency or State planning department should be able to supply census information that fits the requirements. Keep in mind, however, that the census data for many localities may be out of date. If the community is a rapidly growing or decreasing one, or if it routinely has a high percentage of transients, then the census data must be used with caution. One of the most frequent uses of census data is to draw a profile of the community so that an accurate sample may be selected for survey purposes.
Cost data are, of course, usually available from the accounting function of the finance agency. Depending on the level of detail needed and the type of financial reporting system the jurisdiction uses, it may be necessary for an account clerk to work with agency personnel to extract and total detailed records. Many operating agencies maintain some type of internal manual accounting system in addition to whatever type of centralized accounting system the jurisdiction uses. Such "satellite" accounting systems can be useful to the analyst since they are often easier to access for program costs than are central records. A possible problem in using data from such satellite systems is that agency personnel may classify expenditures differently than the central accounting office would. This can create discrepancies if the analyst is trying to compare expenditures with budgeted amounts for specific categories. It is usually possible to reconcile such discrepancies, but it will mean locating and examining the specific vouchers in question.
All health departments routinely record births, deaths, and causes of death and code these data by census tract. Aggregations of these data on a State and national level are available. Such statistics can be used to evaluate health programs by comparing the statistics for a neighborhood with other neighborhoods similar to it in demographic characteristics either within the jurisdiction or in other jurisdictions. Naturally, such comparisons should be made with care since many other factors are involved.
Data for evaluating manpower and employment programs are available from the State employment service or from county or city manpower offices. Statistics on employment by age, profession, race, education, and other factors are available by labor area. A "labor area" is a central city and the surrounding region within easy commuting distance. Data on more specific geographic areas, such as neighborhoods, can sometimes be obtained from the State employment service or can be determined by survey.

The analysis described above will enable the analyst to determine whether the evaluation can be completed with the available data. There will be many instances when additional data will be necessary, and even more instances when additional data can add greatly to the validity and utility of the evaluation. This is a key decision point in an evaluation because, if some of the necessary data are lacking, a determination must be made whether to: (1) continue the evaluation with available data, (2) take the necessary time and effort to gather additional data from scratch, or (3) scrap the evaluation for lack of sufficient data.

If the first decision is reached, the analyst may conclude that the lack of data requires limiting the scope of the evaluation. If this limitation is deemed significant by the team leader, then management and/or elected officials should be apprised of the specifics and asked to approve the new scope or to direct that additional data be generated to perform the evaluation as originally planned. If the analyst and team leader decide there is insufficient information and that it is impractical to gather the needed data, they should document their findings and present them to management.

When a reduced evaluation scope will not provide management with the type of information needed for decision making, it is necessary to generate data from scratch. The specific data should already have been identified, so the first job should be to determine exactly how to go about collecting them. The analyst and team leader should decide whether the data can be collected: (1) by adding one or more data items to records routinely kept by the government, (2) by establishing new records and procedures, or (3) by using a special technique, such as a citizen survey. After this decision is made, the analyst should prepare a work plan that clearly states the specific data needed, the methodology to be employed, the time period to be covered, the calendar time required, the personnel time required, the estimated cost of data collection, and the impact the collection effort will have on the schedule for the evaluation as a whole. Once the impact on the project is known, the new work plan should be submitted to top management and elected officials for their consideration to ensure that all understand and approve the scope of the evaluation. The main point to keep in mind is that the need to collect data from scratch, whatever the reason, will have a significant impact on the duration and cost of the evaluation.

Step 3- Physically Collecting the Data
Once the data requirements have been identified and availability ascertained, the team leader, analyst, and agency liaison person should meet to decide the best way to actually collect the data. As mentioned earlier, there are three main sources for evaluation data: (1) existing records and statistics, (2) client perception surveys, and (3) special data collection techniques.

1. Existing Records and Statistics. Data from existing records and statistics can usually be collected most efficiently by program agency personnel. The people who handle the records on a day-to-day basis are able to extract the data quickly, since they do not need a "get acquainted" period. Using program agency personnel to do the time-consuming physical work of data collection can also free the evaluation analyst for involvement in several evaluation projects simultaneously.

Several things must be done, however, before program agency personnel can be turned loose on a data collection problem. First, the analyst should spot-check the accuracy of the data. The source of the data is important to accuracy. If the data are guesses or estimates by field personnel rather than "hard" data provided by program clients or some reliable form of measurement, then the validity of the data may be seriously questioned. A full discussion of data accuracy will be presented in Step 4 of this task.

Second, the analyst must provide the agency personnel with clear, concise directions. The analyst must be able to tell agency personnel exactly what data are needed and the specific time span to be covered. The analyst should also provide worksheets for recording the data so that they are collected in a consistent manner. It may also be possible to lay out the worksheets so as to facilitate later analysis of the data. The analyst and the agency liaison person should meet with the employees who will be doing the actual data collection and discuss the reason for the data collection, the significance of the evaluation, and the collection worksheet and any special instructions. After answering questions, the analyst may find it beneficial to spend a few minutes working with employees as they put the worksheets to use for the first time.

It is also wise for the analyst to spot-check data accuracy during the data collection by examining a sample of the source records and comparing them with the worksheets prepared by the agency personnel. To facilitate these checks, analysts should have the agency personnel forward worksheets to them on an "as completed" basis, perhaps once a week.

Evaluations that require data from several agencies can cause the analyst difficulty in actually collecting the data and/or in coordinating the efforts of several groups. The example of a police effectiveness evaluation used earlier will help illustrate the point. To get a complete picture of police performance, data are likely to be needed from the prosecutor and/or court system on indictment and conviction rates, and perhaps accident statistics from the traffic engineering department. The prosecutor's office may not perceive any immediate benefit to that agency from the evaluation and therefore may be reluctant to take an active part in the project. The experience, tact, and political expertise of the evaluation team leader can often greatly improve cooperation. The team leader may be able to persuade the agency to cooperate by showing the agency head how his or her agency will benefit.

In the above instance, the team leader may be able to convince the prosecutor that the evaluation may produce results pointing to the need for police officers to build cases on more solid evidence, thus making the prosecutor's job easier. If such a line of reasoning fails to persuade the agency head, the team leader may be able to gain cooperation by offering the resources of the team to help the agency head solve an operational problem in return for voluntary assistance with the evaluation. A management mandate ordering the agency to cooperate should be sought only as a last resort, since the resulting hard feelings often lead to unfavorable agency perceptions of the evaluation process.
An evaluation such as the one outlined above also raises the issue of confidentiality of personnel data. Some agencies may refuse the evaluation team access to individual records on this basis. In such instances the evaluation team may be able to examine the records in question by limiting access to a single designated analyst who will work solely on the agency premises. In other cases, the agency may be willing to aggregate key information about a group of individuals so that no one person can be identified. While it is always preferable to examine the data first-hand, there may be instances in which aggregation of data must be accepted.

2. Client Perception Surveys. Client perceptions are becoming increasingly important data sources for evaluations as governments seek to measure various program impacts on the people they serve. The most prevalent tool for measuring client (citizen) perceptions is the survey.
Surveys are tools for questioning selected samples of the general public. They may involve mailing questionnaires to respondents, leaving questionnaires at respondents' homes and retrieving them at a later date, interviewing respondents in person, or interviewing respondents over the telephone. Surveys provide feedback on respondent perceptions, desires, needs, preferences, priorities, opinions, and experiences.

The primary benefit that surveys offer is the capacity to elicit the views of numerous individuals many of whom would not otherwise participate in the program evaluation process. Thus, survey information can be more representative of the public at large than information obtained through other kinds of public involvement efforts. Surveys also offer the following benefits:

Survey responses can be readily analyzed to determine underlying patterns and relationships, including trends over time.
Surveys can focus on specific respondent groups and/or specific issues or objectives of interest to the user jurisdiction.
Surveys can identify the rationale behind respondent answers.
Surveys can gather information about people's perceptions, desires, and opinions unavailable from other sources.
Surveys can reduce the sense of isolation or alienation felt by many respondents.

It is important for the analyst to realize that a properly prepared and analyzed survey is a very useful and powerful tool, but one that requires a considerable amount of calendar time. A simple, reliable survey may take several weeks to complete, and several months is a more realistic estimate for many surveys. Although a detailed explanation of the conduct of sample surveys is beyond the scope of this Guide, an appendix has been included to provide guidance. Appendix B explains how to (1) decide when a survey is appropriate, (2) prepare and administer the survey instrument (questionnaire), and (3) analyze and present the results. References are also provided to jurisdictions that have practical experience in the use of sample surveys.

Methods other than surveys are available to measure citizen perceptions. Regular meetings of improvement associations, service clubs, and other service organizations can provide a forum for the airing of perceived problems. While such input is not necessarily representative of the entire community, the analyst can discern useful information through careful questioning and listening. Such techniques may be necessary when the time or resources are not available to conduct a survey. Extreme caution is urged in the use of information obtained in this way, because of its lack of precision and objectivity.

Citizen complaints or service requests are not normally used as indicators of citizen perceptions because few jurisdictions have made any effort to handle them systematically. Additionally, such information is obviously selective since only dissatisfied clients use this avenue of communication. However, at least one jurisdiction, Kansas City, Missouri, has made an effort to use complaint data. Through the city's "action center," service requests are recorded, channeled to the appropriate department for action, and followed up with a postcard to the citizen asking for an evaluation of the city's response to the request. Complaint data and citizen ratings are summarized monthly for the operating departments and the city manager's office. Such a system allows the administration to get a rough barometer of feeling toward specific services by tracking complaints. City council members also use the monthly summary figures of complaints as rough performance indicators for the various departments.

3. Special Data Collection Techniques. Often, data must be generated from scratch. Perhaps the most common way to do this is to add one or more data items to forms currently in use by the agency. Such efforts are usually relatively low-cost, since the only additional expense is redesigning and reprinting the appropriate forms. The chief drawback is that collection of the information will require at least one program interval (a month, a quarter, or a year), thereby delaying the evaluation for that time period.

A more involved variation of the above is when information must be added later to records already on hand, as in the example of collecting additional information on clients already served by a program. Such activities are very difficult to conduct because the participants must first be located and then persuaded to cooperate. Such after-the-fact data collection techniques should be used only when alternatives are exhausted.

In some situations, subjective ratings by professionals may be appropriate for evaluating program effects. This approach may be most useful in social service fields. For example, professional social workers could use subjective ratings to measure changes in family and community functioning attributed to social welfare programs. Rating scales might cover: family relationships and family unity, individual behavior and adjustment, care and training of children, economic practices, social activities, home and household practices, health conditions and practices, relationship to social workers, and community resource use. Explicit directions must be provided for use of each rating on a scale. Ideally, the rating system should enable a group of professionals, observing the same conditions, to arrive at the same rating.

A pretest is highly desirable to see if different professionals using specified procedures would in fact give reasonably similar ratings. When using such a rating scale, individuals should not be asked to rate themselves on their own effectiveness in providing a service. Raters should be selected who do not have a personal interest in the outcome.

For meaningful program evaluation, three factors should be standardized: the characteristics evaluated by professionals, the rating scale applied to these characteristics, and the conditions under which the ratings are made. In the family functioning example, the professionals are given guidance on the aspects of family functioning to be rated. Each aspect is rated according to a standard descriptive scale. For instance, one aspect, "sibling relationships," would be assessed on the basis of criteria for each grade on the scale:

Inadequate: There is conflict between children resulting in physical violence or cruelty which warrants intervention....
Marginal: Emotional ties among children weak... rarely play together...
Adequate: Positive emotional ties and mutual identification....

The actual rating is made by first-hand observation of the family by the social workers.
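
A pretest of the kind recommended above can be summarized with a simple agreement figure. The Python sketch below is a minimal illustration, not a method prescribed by the Guide: the function name pairwise_agreement, the rater labels, and the pretest ratings are all hypothetical, and plain percent agreement is only one of several possible measures.

```python
from itertools import combinations


def pairwise_agreement(ratings_by_rater):
    """Share of (rater pair, case) comparisons in which both raters chose
    the same grade. Every rater must rate the same cases in the same order."""
    raters = list(ratings_by_rater.values())
    matches = 0
    total = 0
    for first, second in combinations(raters, 2):
        for grade_a, grade_b in zip(first, second):
            total += 1
            matches += grade_a == grade_b
    return matches / total


# Hypothetical pretest: three social workers independently rate
# "sibling relationships" for the same four families.
pretest = {
    "worker_1": ["Inadequate", "Marginal", "Adequate", "Adequate"],
    "worker_2": ["Inadequate", "Marginal", "Adequate", "Marginal"],
    "worker_3": ["Inadequate", "Adequate", "Adequate", "Adequate"],
}

print(f"pairwise agreement: {pairwise_agreement(pretest):.2f}")
```

What level of agreement counts as "reasonably similar" is a judgment the evaluation team would have to make; the Guide does not give a figure.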

This method requires professionals who are competent to make judgments about the particular situations and who can be impartial in their appraisals. Also, if a grading scale is not readily available, considerable time and effort will be needed to establish an acceptable rating system. The costs of making ratings could be large because of the time required for each observation and the specialized personnel involved. However, if such ratings can be provided as part of the regular jobs of employed professionals, the actual out-of-pocket costs to a government may be small.

In some situations, as time passes, raters may deviate from the rating scale. Periodic checks and retraining in the use of the scale can alleviate this. For example, during the Washington, D.C. "Operation Clean Sweep," checks of a sample of inspector ratings using the street cleanliness rating scales indicated that inspectors tended after a time to compress the scale, i.e., to give fewer extreme ratings. To correct the problem, the inspectors were exposed to the photographic rating scale. While the tendency to compress the scale may not be as pronounced with more highly trained professionals, it is still a situation that the analyst must guard against.
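
One way to run the periodic check described above is to compare how often raters use the two ends of the scale early and late in the rating period. The Python sketch below assumes a hypothetical 1-to-4 scale and invented ratings; neither the scale range nor the numbers are taken from Operation Clean Sweep.

```python
def extreme_share(ratings, low, high):
    """Fraction of ratings that fall on either end point of the scale."""
    return sum(r in (low, high) for r in ratings) / len(ratings)


# Hypothetical samples of one inspector's ratings on an assumed 1-4 scale.
early_ratings = [1, 2, 4, 3, 1, 4, 2, 3, 4, 1]
late_ratings = [2, 2, 3, 3, 2, 3, 2, 3, 4, 2]

early = extreme_share(early_ratings, low=1, high=4)
late = extreme_share(late_ratings, low=1, high=4)

# A marked drop in the share of extreme ratings suggests the scale is
# being compressed and that retraining may be needed.
print(f"extreme ratings: early {early:.0%}, late {late:.0%}")
```

A drop of this kind is only a warning sign. Conditions may actually have become less extreme, so the comparison is best paired with a re-rating of a sample of sites against the reference scale.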

This method of data collection is basically subjective and normally should be used in conjunction with more objective measurements. For example, the number of reported difficulties in school for client-family children could supplement professional ratings to measure child adjustment.

If none of the data sources above seems to fit the evaluation, the analysts are free to develop measures and sources of their own as long as the accuracy of the approach can be verified. As an example, an evaluation of the Fairfax County, Virginia road maintenance program was aided by the use of a device called a "roughometer" that measured inches of roughness per mile. The evaluation team verified the accuracy of this approach by showing a high correlation between citizen perceptions of roughness and readings taken by the roughometer on the same sample of streets. There are many less dramatic examples of analysts making creative uses of field observation techniques by measuring emergency equipment response time or making special counts of participants in recreational activities. The point is that the evaluation team should not restrict itself to the approaches presented in this Guide.
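
The kind of verification described for the roughometer can be sketched as a correlation between instrument readings and citizen ratings for the same streets. The Python example below is illustrative only: the readings, the 1-to-5 citizen rating scale, and the choice of the Pearson coefficient are assumptions, since the Guide does not say which statistic the Fairfax County team used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical data for five sample streets.
roughness_inches_per_mile = [60, 85, 110, 140, 175]
# Hypothetical mean citizen rating per street (1 = smooth, 5 = very rough).
citizen_roughness_rating = [1.2, 2.0, 2.9, 3.8, 4.6]

r = pearson(roughness_inches_per_mile, citizen_roughness_rating)
print(f"correlation between readings and citizen ratings: {r:.3f}")
```

A coefficient close to 1 would support using the instrument as a stand-in for citizen perceptions; a weak correlation would suggest the new measure is not capturing what citizens experience.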

Step 4- Verifying the Accuracy of the Data
One of the most frequently overlooked aspects of program evaluation is verifying the accuracy of the data. While treated here as a separate step for emphasis, the discussion of the previous step correctly suggests that data accuracy should be verified during data collection. In this way, the analyst can take actions to correct or improve the data immediately, rather than initiate a second collection effort later. There are three major types of data inaccuracies: clerical errors, subjective errors, and methodological errors.

1. Clerical Errors. Clerical errors are one of the most common sources of inaccuracy. Such errors (transposed digits, recording the wrong figure, etc.) frequently occur when data are transferred from original source documents to summary reports or data collection worksheets. Clerical errors can be detected by checking a sampling of the data collection worksheets against the original source documents. If more than 10 percent of the sample entries are incorrect, the analyst can take one of several remedial actions.

If more than one person has been recording the data in question, the analyst should try to determine whether the high error rate is uniform among all collectors or is found only in the work done by one or more individuals. The employee completing each worksheet can be identified by a code on the sheet itself. If the high error rate is restricted to one or more individuals, the analyst can either review collection procedures with those individuals and stress the importance of accuracy to the employees and their supervisor, or request that a more accurate employee be assigned to recollect the same data. Should the high error rate prove to be uniform among all collectors, the analyst should review the collection procedures with all employees and appropriate supervisors to determine whether the worksheets are poorly designed or the data collection procedures incomplete or confusing.
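
The spot-check described above, including the breakdown by collector, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the tuple layout, the function names, the sample data, and the reading of "uniform" as "every collector exceeds the 10 percent threshold" are hypothetical; only the 10 percent figure and the collector code on each worksheet come from the text.

```python
from collections import defaultdict

ERROR_THRESHOLD = 0.10  # "more than 10 percent of the sample entries"


def error_rates(sample):
    """sample: (collector_code, worksheet_value, source_value) tuples.
    Returns the overall error rate and the error rate per collector."""
    checked = defaultdict(int)
    wrong = defaultdict(int)
    for code, recorded, source in sample:
        checked[code] += 1
        if recorded != source:
            wrong[code] += 1
    overall = sum(wrong.values()) / sum(checked.values())
    by_collector = {code: wrong[code] / checked[code] for code in checked}
    return overall, by_collector


def diagnose(sample):
    """Classify the spot-check result along the lines described in the text."""
    overall, by_collector = error_rates(sample)
    if overall <= ERROR_THRESHOLD:
        return "acceptable"
    over = [c for c, rate in by_collector.items() if rate > ERROR_THRESHOLD]
    # Assumption: "uniform" means every collector is over the threshold.
    return "uniform" if len(over) == len(by_collector) else "individual"


# Hypothetical sample: collector A copied 10 entries correctly;
# collector B transposed digits (14 recorded as 41) in 4 of 10 entries.
sample = (
    [("A", n, n) for n in range(10)]
    + [("B", 41, 14)] * 4
    + [("B", n, n) for n in range(6)]
)

overall, by_collector = error_rates(sample)
print(overall, by_collector, diagnose(sample))
```

An "individual" result points toward reviewing procedures with the collectors concerned, while a "uniform" result points toward the worksheet design or the instructions themselves.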

If data collection accuracy does not improve, analysts may want to consider collecting the data themselves or finding another way to measure the criterion in question. Another remedial course is to postpone the evaluation while improved data collection procedures are developed. This will usually mean postponing the evaluation for one program period (one month to one year). Naturally, the earlier in the evaluation process this determination can be made, the fewer dollar and personnel resources will be wasted on an incomplete effort.

2. Subjective Judgment Errors. Data involving subjective judgments will require more involved accuracy checks than those outlined above. When dealing with subjective ratings such as those provided by inspectors or social services counselors, the analyst must make an effort to determine the accuracy of the rating system. This is accomplished by examining the rating scale to determine how clear and comprehensive the descriptions are of the various rating categories. In addition, the analyst should attempt to determine how much training the field personnel have had in the use of the scale and how often the training is reviewed. The review question can be significant, since experience has shown that extended use of a subjective scale often results in "compressed" ratings, i.e., fewer ratings toward the extremes of the scale. Periodic reviews of the scale with supervisors can help alleviate this tendency.

The analyst may also find it useful to examine the turnover rate among field personnel, since high turnover often results in inconsistent ratings over the evaluation period. The analyst can also check the consistency of the ratings by getting several people independently to apply the rating scale to the same situation or site at the same time.

3. Methodological Errors.
Of the data collection techniques mentioned, surveys are most prone to methodological error. The analyst should review the survey instrument (questionnaire) for possible bias, the sample selection method, the size of the sample, the degree of training given to surveyors, and the methods used to analyze responses. The references found in Appendix B should provide the information needed to make most of these determinations. No survey can be 100 percent accurate. What the analyst should watch for are instances in which opinions or results are not clear-cut on a specific question and there is some evidence of significant inaccuracy in the survey. Management should be cautioned about any data that do not have a high degree of reliability. Data from flawed surveys can still be used, but with due caution.

Another type of methodological error can sometimes be avoided by double-checking of the analyst's thought processes. It is very easy to get so involved in what you are doing that relatively simple errors go unnoticed. For example, an evaluation director reported that one of his associates was deeply involved in establishing criteria and collecting data on the effectiveness of fire suppression services. The analyst hit on the idea of using the percentage of a building that was consumed by fire as a criterion for effectiveness of the fire department. The evaluation director hastened to point out that since the fire department had no control over how long a building had been burning before an alarm was turned in, and that a building might well be fully engulfed before the department was notified, the proposed criterion was neither fair nor valid. There is a much better chance of avoiding such errors if the work of an analyst is checked by at least one other analyst.

It is generally inadvisable to continue the evaluation with data errors greater than 10 percent. If an evaluation is continued under such circumstances, the analyst should be sure to clearly identify the resulting distortions in the evaluation report. Managers must understand that they cannot place the same degree of confidence in evaluations with questionable data as in evaluations with highly reliable data.

In summary, five major options can be pursued if key data are discovered to be inaccurate: (1) The evaluation team can seek other, perhaps less direct, ways of getting acceptable data. (2) Improved procedures can be adopted for collecting the data and the evaluation postponed until new, reliable data can be gathered. (3) The evaluation team can seek to improve the quality of the data by such methods as clear supervision of the collection effort, or the use of better collection forms. (4) The evaluation can be continued with the clear warning that management should be cautious in using the data in question for decision making. (5) The evaluation can be canceled as infeasible. While the most suitable option will depend on the specifics of the situation, analysts will probably feel the most confident with the second option, where practical. The important point is to recognize that inaccurate data can badly undermine the credibility of an evaluation, and the analyst should guard against this problem.