Monitoring and Evaluation of Labor Programs

EVALUATING LABOR MARKET PROGRAMS

Constructing the Counterfactual Analysis

Evaluation is the periodic assessment of the relevance, performance, efficiency, and impact of the project in relation to stated goals. It differs from monitoring in that it is not an essential task for the implementing agency. Evaluation is mainly concerned with impact, which may only be measurable toward the end of implementation or in later years and so is often better done by a separate agency independent from implementation.

A central requirement of any evaluation is separation of the effects that would have happened anyway from those that resulted from the intervention. Before-and-after comparisons alone are not sufficient. If earnings rise after training, for example, that may be the result not of the training but of changes in the macroeconomy or local changes in labor demand or of such worker-specific attributes as life-cycle changes in earnings.

Evaluation therefore requires a counterfactual test, which is normally provided by a control or comparison group of workers who did not participate in the severance or redeployment program. Box 7.2 illustrates the importance of creating such groups for a hypothetical redeployment training program.

Counterfactual analysis can use either:

Control groups, which consist of individuals who are selected at random within a well-defined population from which the members of the treatment group are also selected, or
Comparison groups, which consist of individuals who are purposively matched to the participants in the treatment group.

Counterfactual analysis demands careful choice of the scenarios against which the outcomes of labor programs are compared. For example, if the introduction of a PPI project in a declining public sector port leads to the loss of 1,000 cargo-handling jobs out of a total of 3,000 jobs through voluntary departure over three years, which of the following is the appropriate counterfactual comparison?

The before-and-after calculation of 1,000 job losses
A comparison with trends in other similar public sector ports (which might suggest an annual decline of 10 percent in cargo handlers as mechanization is introduced)
A comparison with normal annual rates of job loss and job creation in private sector ports (a private sector counterfactual test)
A comparison with cargo-handling labor benchmarks in the most efficient ports internationally against which the port can reasonably be compared.

Box 7.2: The Importance of Control Groups-A Hypothetical Example

In the town of Abca, 1,000 workers were laid off as a result of the closure of the ABC Gas Company. Based on random selection, 500 workers were given a severance package and the other 500 were put through an intensive retraining program in computer skills. All 1,000 people were monitored over time. Three months after the completion of the training, 400 trainees were employed. This employment rate of 80 percent for the treatment group was touted by many as the impact of the training program.

However, Abcan evaluators cautioned against using only this figure to judge the success of the program. They wanted to compare this employment percentage to that of the control group, those who did not go through training. It was found that 375 of the control group of 500 were also employed three months after the treatment group completed its training-an employment rate of 75 percent. Hence, Abcan evaluators judged that the true impact of the training program was 5 percentage points, not 80 percent.

Although this example makes many simplifying assumptions (there was no selection bias or randomization bias, those who got a severance package did not enroll in any training or other related labor programs, and so forth), it serves to illustrate the importance of using control groups when evaluating the impact of labor programs.

Source: Adapted from World Bank, no date.
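The arithmetic in Box 7.2 can be written out as a short calculation. The sketch below is illustrative only: the figures come from the box, while the function and variable names are assumptions introduced here.

```python
# Figures taken from Box 7.2 (hypothetical town of Abca).
TRAINEES_TOTAL = 500       # treatment group: received retraining
TRAINEES_EMPLOYED = 400
CONTROL_TOTAL = 500        # control group: received severance only
CONTROL_EMPLOYED = 375


def employment_rate(employed: int, total: int) -> float:
    """Share of a group that is employed, as a fraction between 0 and 1."""
    return employed / total


treatment_rate = employment_rate(TRAINEES_EMPLOYED, TRAINEES_TOTAL)
control_rate = employment_rate(CONTROL_EMPLOYED, CONTROL_TOTAL)

# Net impact = treatment outcome minus the counterfactual (control) outcome,
# expressed in percentage points.
net_impact_points = (treatment_rate - control_rate) * 100

print(f"Gross employment rate of trainees: {treatment_rate:.0%}")
print(f"Employment rate of control group:  {control_rate:.0%}")
print(f"Net impact of training: {net_impact_points:.0f} percentage points")
```

The gross figure describes what happened to trainees; only the difference from the control group estimates what the training itself changed.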

In each case the counterfactual alternative provides the "what-would-have-happened-if" comparison (in this case, what would have happened if PPI had not happened). Each comparison is, however, likely to give rather different answers, and small differences in the assumptions and comparisons being made can lead to very different conclusions. This is why the use of a counterfactual test with a very clear definition of the assumptions being used is so important.
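To see how much the choice of counterfactual matters, the port example can be worked through for two of the comparisons listed earlier. This is a minimal sketch: the 3,000 jobs, 1,000 losses, three years, and 10 percent annual decline come from the text, while the assumption that the decline compounds each year, and all names, are introduced here for illustration.

```python
INITIAL_JOBS = 3000
OBSERVED_LOSSES = 1000   # job losses over the three years of the PPI project
YEARS = 3
ANNUAL_DECLINE = 0.10    # trend suggested by other similar public sector ports

# Counterfactual 1: before-and-after comparison.
# Every observed job loss is attributed to the PPI project.
before_after_losses = OBSERVED_LOSSES

# Counterfactual 2: trend in similar ports.
# Assumption: the 10 percent decline compounds annually.
jobs_without_ppi = INITIAL_JOBS * (1 - ANNUAL_DECLINE) ** YEARS
trend_losses = INITIAL_JOBS - jobs_without_ppi
losses_attributable_to_ppi = OBSERVED_LOSSES - trend_losses

print(f"Before-and-after estimate: {before_after_losses} jobs lost")
print(f"Losses expected anyway from the trend: {trend_losses:.0f}")
print(f"Losses attributable to PPI under the trend counterfactual: "
      f"{losses_attributable_to_ppi:.0f}")
```

Under these assumptions the trend alone accounts for roughly 813 of the 1,000 losses, so the two comparisons give very different pictures of the same project.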

Assessing the Impact of Redeployment

In developing countries the evaluation of redeployment programs, social safety net programs, and active labor market programs has generally been inadequate.

One reason for the neglect of evaluation might be that properly assessing the impact of redeployment presents a significant technical challenge to evaluators everywhere. Undertaking a robust counterfactual analysis is particularly difficult because participants in, for example, a training program may be selected or may self-select. Such selection biases can distort policy conclusions, and redeployment programs are especially prone to these biases. Evaluators use two methodological approaches to tackle these selection problems:

An experimental approach randomly assigns individuals to enter a program. This approach avoids many (though not all) of the selection problems of statistical methodologies. It is difficult to implement, however, for practical reasons (that is, cost and the fact that only current or future programs can be evaluated by this approach) and for ethical and political reasons (for example, some workers are refused entry to the program). Few developing countries are likely to implement such an approach. A recent illustration of such an evaluation is that of a job-search assistance program for unemployed workers in the United Kingdom, where the evaluation identified a 6 percent lower unemployment rate among participants five years after the initial program (Dolton and O'Neill 2002).
A statistical approach allows selection of the participant and nonparticipant groups after the redeployment program has started. Complex econometric techniques are needed to reduce selection biases (eliminating them is not possible). Regression techniques and matched-pair comparisons are the principal statistical tools. The main advantage of statistical approaches is that the evaluation can be done at any time, provided that adequate longitudinal data are available.

Dar and Gill (1998) summarized these alternative methodological approaches in the context of retraining programs. Theoretically, experimental techniques are more robust. However, the statistical approach is more practical, although it can be subject to large biases that risk offering false conclusions to policymakers. Matched-pair statistical techniques are preferable to regression-based techniques because (a) they offer the greatest potential for reducing differences between the participant and nonparticipant groups (other than the redeployment program), and (b) the results are easier for nonstatisticians and policymakers to interpret.
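A matched-pair comparison can be illustrated with a very small sketch. Everything below is hypothetical: the worker records are invented, and the choice to match each participant to the single nearest nonparticipant (with replacement) on years of education and experience is an assumption made for illustration, not the method of any evaluation cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    education_years: float
    experience_years: float
    employed: int  # 1 if employed at follow-up, otherwise 0


def distance(a: Worker, b: Worker) -> float:
    """Simple unweighted distance on observable attributes (an assumption)."""
    return (abs(a.education_years - b.education_years)
            + abs(a.experience_years - b.experience_years))


def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


def naive_effect(participants, nonparticipants) -> float:
    """Difference in employment rates, ignoring who the groups are."""
    return (mean(w.employed for w in participants)
            - mean(w.employed for w in nonparticipants))


def matched_pair_effect(participants, nonparticipants) -> float:
    """Mean outcome difference between each participant and the most
    similar nonparticipant (nearest neighbor, with replacement)."""
    differences = []
    for p in participants:
        match = min(nonparticipants, key=lambda n: distance(p, n))
        differences.append(p.employed - match.employed)
    return mean(differences)


# Hypothetical data: participants are, on average, better qualified.
participants = [Worker(12, 10, 1), Worker(9, 5, 1),
                Worker(6, 20, 0), Worker(12, 3, 1)]
nonparticipants = [Worker(12, 9, 1), Worker(9, 6, 0), Worker(6, 18, 0),
                   Worker(11, 3, 1), Worker(4, 2, 0), Worker(5, 1, 0)]

naive = naive_effect(participants, nonparticipants)
matched = matched_pair_effect(participants, nonparticipants)
print(f"Naive difference in employment rates: {naive:.2f}")
print(f"Matched-pair estimate:                {matched:.2f}")
```

In this invented data set the naive comparison overstates the effect because the nonparticipant pool includes less qualified workers who have no counterpart among the participants. Matching only reduces differences in observed attributes; unobserved differences can still bias the estimate.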

The example of the evaluation of Mexico's retraining program for unemployed and displaced workers (box 7.3) illustrates some of the challenges of conducting evaluations. Problems for evaluators include:

Creaming: If program managers are evaluated on the percentages of trainees who find employment, then creaming may occur. In this situation managers actively select the best or most-qualified trainees to inflate the program's apparent success rate:

The analogy is to whole milk where the richest part, the cream, floats to the top and can be skimmed off. Creaming is an issue in labor market programs because if only the most able people get reemployment assistance, then the benefit to society of the programs is not as great as it might be otherwise. Highly qualified program entrants have a good chance of becoming reemployed even without the services offered in the program, while for less qualified applicants the program services might be the only realistic path to employment (O'Leary 1999, p. 3).

To tackle this problem in evaluations, the right counterfactual test is needed. If the employment rate of participants is compared with that of all displaced workers, then the apparent success of the program will be inflated. Control or comparison groups should therefore consist of other displaced workers whose qualifications and other observable attributes are similar to those of the trainees.
Creating matched control or comparison groups: If evaluations are undertaken some time after the program is completed, it becomes increasingly difficult to ensure matching between the treatment and the control or comparison groups. In Mexico's PROBECAT evaluation, for example, the comparison group was taken from a separate data set of an urban household survey of workers who were also unemployed at the time that unemployed trainees entered the PROBECAT program. Such different data sets may not be fully comparable. Ideally, control or comparison groups should be individually matched, but adequate data for such matching are not always available.
Self-selection: In training programs where individuals choose (self-select) whether to enter the program, the problem of constructing the counterfactual comparison becomes more difficult because those who attend the program will be different from those who do not. If trainees volunteer for the program because it offers a stipend, for example, this can lead to selection biases when evaluating the program.
Dropouts: This is a related problem. If trainees drop out of the program when they find jobs, is that counted as a program success? Or does it show that trainees participated only for the stipend?

Dead-weight loss: In later phases of PROBECAT, in-service training was provided by local employers. Government provided the workers' stipend, and the employers were required to hire at least 70 percent of the trainees. "Dead-weight loss" refers to the fact that firms participating in the in-service training would have hired some of the same workers anyway.
Influence of the very existence of the program: The evaluation approach outlined in box 7.3 used a conventional "differences-in-differences" approach, where the before-and-after earnings or employment changes for participants in the redeployment program are compared with the before-and-after change for a similar group of nonparticipants at a similar time. The approach assumes that the existence of the program itself is an external variable. Training programs may function as a form of job search for many of their participants (Heckman and Smith 1998). The decision to participate therefore also needs to be controlled for in the design of the evaluation.
Displacement effects: If a program participant improves his or her reemployment chances at the expense of nonparticipants, then one person's job may merely be taken by another. If this happens the program's overall benefit to the economy as a whole may be less than intended.
Changes in program design: Programs often change their design and approach during implementation. This is a problem for evaluators because (a) it compounds selection problems and (b) what they are evaluating may be seen as the "old" approach and therefore not relevant.
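The "differences-in-differences" calculation mentioned in the list above can be sketched as follows. The earnings figures are hypothetical and the function name is an assumption; the sketch shows only the basic arithmetic, not the full statistical treatment used in the PROBECAT evaluation.

```python
def difference_in_differences(treated_before: float, treated_after: float,
                              comparison_before: float,
                              comparison_after: float) -> float:
    """Change for participants minus change for nonparticipants."""
    treated_change = treated_after - treated_before
    comparison_change = comparison_after - comparison_before
    return treated_change - comparison_change


# Hypothetical mean monthly earnings (in dollars) before and after the program.
estimate = difference_in_differences(
    treated_before=200.0, treated_after=260.0,        # participants: +60
    comparison_before=205.0, comparison_after=240.0,  # nonparticipants: +35
)
print(f"Estimated program effect on monthly earnings: ${estimate:.0f}")
```

A before-and-after comparison of participants alone would report a gain of $60; subtracting the $35 gain that similar nonparticipants experienced over the same period leaves $25. The estimate is only as good as the assumption that both groups would have followed the same trend without the program.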

Box 7.3: Example of a Redeployment Evaluation-PROBECAT, Mexico

In 1984, as a response to a growing economic crisis, the government of Mexico established a labor retraining program for unemployed and displaced workers-Programa de Becas de Capacitación para Trabajadores, or PROBECAT. Revenga, Riboud, and Tan (1994) reported an impact evaluation analysis. The evaluation set itself four clear questions. First, what is the impact of training on the subsequent employment experiences of trainees? Second, does training increase the speed with which trainees move from unemployment to employment? Third, conditional upon finding employment, what effect does training have on the monthly earnings, work hours per week, and hourly wages of trainees? Fourth, do the monetary benefits from program participation outweigh the costs of providing retraining for the unemployed?

PROBECAT was a large program. At the time of the evaluation it had trained 251,181 unemployed people and provided 9,268 courses since 1987. During the training period (usually three months), program participants received a stipend equal to the minimum wage. Vocational courses were organized to respond to the needs of the local labor market and were designed to redress local shortages of workers with particular skills. These needs were determined through periodic studies of local labor market conditions.

Not everyone was eligible to participate in PROBECAT. The selection procedure gave variable weights to different criteria, including the number of economic dependents, attainment of certain levels of basic education, prior work experience, and unemployment of less than three months. The weighting scheme was quite complex and nonlinear, and only individuals with a total composite score exceeding a threshold level were eligible to join the program. In addition, participants had (in theory) to be between the ages of 20 and 55 and be registered as job seekers at the local state employment office. This nonrandom selection of individuals into PROBECAT posed potentially serious measurement problems for the evaluation of the training program.

The evaluation approach taken was to adopt a statistical methodology to account for the selection bias in the program, and to compare the post-training labor market experiences of PROBECAT trainees with those of a comparison group, a matched sample of unemployed people who were eligible for but did not participate in the training program. The evaluation found that participation in the training program decreased the period of unemployment for men and women trainees and increased the monthly earnings of men but not of women.

Source: Revenga, Riboud, and Tan 1994.

Given the importance often attached by governments to redeployment, there is a good case for better evaluation. In a review of active labor market programs, Fay (1996) concluded that evaluation will be improved if:

Evaluation is made compulsory in the program design phase. Most donor-funded programs are subject, or potentially subject, to postevaluation. Although the benefits of evaluations may not accrue to the government, they will improve the quality of the database for other countries.
Evaluations are more rigorous. Evaluation of the overall effects of labor programs is complex. The design of the evaluation methodology requires specialist economic and evaluation skills.
Evaluations are undertaken by nongovernment agencies. This has two benefits: governments do not need to use scarce professional resources; and if the results come from an independent organization, they will probably carry more weight.
The period of evaluation is extended. Impacts on workers of, for example, retraining may not be observable shortly after the end of training. It may be valuable to wait longer after the program before beginning the evaluation.

Evaluation studies of active labor programs have been conducted in middle-income countries at costs of about US$150,000 (Fretwell 2002), which is a relatively modest amount compared with the levels of expenditure incurred.

In addition, the costs and benefits of redeployment need particular attention because they are often neglected in evaluations. Although evidence is patchy (Dar and Gill 1998), cost-benefit assessments indicate that:

Large-scale retraining programs may not be as effective as other measures, such as job-search assistance.
Targeting programs may improve their relevance and effectiveness. In some cases (for example, in Hungary) redeployment training is better focused on relatively disadvantaged job seekers, whereas other evaluations (such as PROBECAT in Mexico) suggest that the program is more cost-effective if focused on better-educated and more experienced job seekers.

Box 7.4 offers some key indicators that could be used both for interim (gross impact) monitoring of redeployment programs (where usually there will be batches of trainees) and for subsequent net impact evaluations.

Box 7.4: Possible Cost and Benefit Indicators for Redeployment Programs

Average cost per entrant into counseling or training, disaggregated among different types of counseling or training
Average cost per trainee employed
Percentage of trainees employed or self-employed
Percentage of trainees engaged in the vocation of training
Average monthly wages/net incomes of trainees (absolute and relative to preprogram incomes)
Average household incomes (to assess effects on other family members and the household as a whole)
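Several of the indicators in Box 7.4 are simple ratios that can be computed for each batch of trainees. The sketch below uses invented figures for a single batch; the field names and the function are assumptions introduced for illustration.

```python
# Hypothetical monitoring data for one batch of trainees.
batch = {
    "total_cost": 120_000.0,                # total program cost for the batch
    "entrants": 200,                        # workers entering training
    "employed_or_self_employed": 150,       # at follow-up
    "working_in_training_vocation": 90,     # at follow-up
    "mean_monthly_income_before": 250.0,    # preprogram income
    "mean_monthly_income_after": 220.0,     # income at follow-up
}


def redeployment_indicators(b: dict) -> dict:
    """Gross (unadjusted) indicators; no comparison group is involved."""
    return {
        "cost_per_entrant": b["total_cost"] / b["entrants"],
        "cost_per_trainee_employed":
            b["total_cost"] / b["employed_or_self_employed"],
        "percent_employed":
            100 * b["employed_or_self_employed"] / b["entrants"],
        "percent_in_training_vocation":
            100 * b["working_in_training_vocation"] / b["entrants"],
        "income_relative_to_preprogram":
            b["mean_monthly_income_after"] / b["mean_monthly_income_before"],
    }


indicators = redeployment_indicators(batch)
for name, value in indicators.items():
    print(f"{name}: {value:,.2f}")
```

These are gross indicators suitable for interim monitoring. A net impact evaluation would compare the same measures with those of a control or comparison group.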

Assessing the Effects on Workers' Welfare

One of the simplest approaches to estimating worker welfare loss was that of Galal and others (1994) in their evaluation of the impact on workers of one form of private participation (privatization). As Birdsall and Nellis (2002, p. 31) pointed out, that approach was "simple, completely open in noting the short cuts taken and derives a usable, quantified answer to a most complex question" (in their case, whether workers had been worse or better off following severance). To illustrate their approach they simply assumed an average wage in the economy (for example, $250 per month), and if it took workers 10 months to find a job, then workers receiving a severance package of $2,500 would be no worse or better off.

Birdsall and Nellis 2002.
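The break-even illustration ($250 per month, 10 months, $2,500 severance) amounts to comparing the severance payment with the wages forgone while searching for a job. A minimal sketch, with function names that are assumptions introduced here:

```python
def net_gain_from_severance(severance: float, monthly_wage: float,
                            months_to_new_job: float) -> float:
    """Severance received minus wages forgone during the job search.

    Assumes, as in the simple illustration, that the worker earns nothing
    while searching and then earns the same average wage as before.
    """
    forgone_wages = monthly_wage * months_to_new_job
    return severance - forgone_wages


def break_even_months(severance: float, monthly_wage: float) -> float:
    """Length of job search at which the worker is neither better nor worse off."""
    return severance / monthly_wage


print(net_gain_from_severance(2500, 250, 10))  # figures from the text
print(break_even_months(2500, 250))
```

With the figures from the text the net gain is zero. A shorter search leaves the worker better off and a longer one worse off, which is why the assumed search duration drives the result.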

This simple approach does not provide a robust counterfactual test. In practice, however, it can be difficult to eliminate other factors influencing the impact on workers. As Rama and MacIsaac (1999, pp. 101-2) noted:

The most straightforward indicator of the loss experienced by displaced employees is the change in their annual earnings, excluding returns from invested compensation. This indicator could be criticized on the grounds that earnings before separation do not provide an appropriate counterfactual. A case could be made that the appropriate comparison is with the earnings these employees would have received had there been no downsizing. If the situation prior to downsizing was unsustainable, it could be argued, earnings would have declined in any event. Alternatively, if the situation was sustainable, some of these employees would have gotten pay raises and promotions in the 15 months elapsed since separation. More generally, the appropriate comparison would be between the lifetime earnings profile after separation from [the enterprise] and the corresponding earnings profile in the case of no separation. But this comparison would require heroic assumptions, so that it is safer to stick to observed earnings before and after separation.

More robust evaluations of effects on worker welfare, however, can use the same experimental and statistical methodologies described above for evaluating redeployment impacts. Unfortunately, there have been few longitudinal tracer studies on what actually happens to displaced employees over time.

A further difficulty in assessing impacts on workers' welfare is the wide range of workers' outcomes following displacement:

They retire and cease looking for paid employment or income-earning opportunities.
They cannot find work or incomes and remain in long-term unemployment.
They find alternative permanent employment.
They find alternative short-term, contractual, or informal employment.
They choose to become self-employed as individuals or start a microenterprise.
They launch a formal small business with potential for growth.
They expand existing income-earning activities that they were already running while employed in the enterprise (either "moonlighting" or "daylighting").
They migrate out of the region to find jobs or to return to rural communities where the family home is based.

If workers migrate to find new employment, follow-up evaluation is more difficult and more costly, and it is likely that such workers will not be captured in subsequent evaluations. In a survey of 5,334 workers from Brazil's railway, 1,217 workers could not be found because they moved without a trace (Estache, Schmitt de Azevedo, and Sydenstricker 2000).

In a follow-up survey of 675 former workers in Brazil that was conducted two to three years after retrenchment, it was found that although 53 percent were earning less than when they were at the state enterprise, 23 percent were making a better living. In general the dispersion of wages was greater in this survey than in one conducted nine months earlier. (There is no information on the sampling methods in these surveys, however.)

Most cost-benefit assessments make fairly simple assumptions regarding the consequences of displacement: a number of months of unemployment at a wage of zero, followed by a wage of, for example, 60 percent of the previous wage. Without more detailed tracer studies the impacts of more complex outcomes may not be well known. Nonetheless, it is clear that many displaced workers move into self-employment. Table 7.5 shows one example from Turkey, where nearly one-fifth of workers displaced during privatization used their severance money to enter into self-employment. A January 1998 survey of displaced workers in Brazil's federal railway found that "over half work on their own and 20 percent have opened their own business. Only 18 percent are employees and four percent are civil servants" (Estache, Schmitt de Azevedo, and Sydenstricker 2000, p. 18). A tracer study in Ghana found that nearly 70 percent of displaced civil servants went into self-employment (Table 7.6). In addition, there may be impacts on others. Although SOE workers are relatively well paid, those benefits may be shared with households and extended families.
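The simple displacement assumption described at the start of the paragraph above (some months at a wage of zero, then reemployment at a fraction of the previous wage) can be made explicit. In this sketch the 60 percent ratio comes from the text; the wage, the months of unemployment, the 24-month horizon, and the function name are all hypothetical.

```python
def earnings_loss(previous_wage: float, months_unemployed: int,
                  reemployment_ratio: float, horizon_months: int) -> float:
    """Earnings lost over a fixed horizon relative to staying in the old job.

    Assumes the old wage would have continued unchanged, which is itself
    a counterfactual assumption.
    """
    if months_unemployed > horizon_months:
        raise ValueError("months_unemployed cannot exceed horizon_months")
    months_working = horizon_months - months_unemployed
    counterfactual_earnings = previous_wage * horizon_months
    actual_earnings = previous_wage * reemployment_ratio * months_working
    return counterfactual_earnings - actual_earnings


loss = earnings_loss(previous_wage=250.0, months_unemployed=10,
                     reemployment_ratio=0.60, horizon_months=24)
print(f"Estimated earnings loss over 24 months: ${loss:,.0f}")
```

The calculation treats every displaced worker alike. The outcomes listed earlier, such as self-employment, informal work, or migration, would each need their own earnings path, which is what tracer studies can supply.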

Table 7.5: Turkey-How Workers Used Severance Compensation
(percentage of workers)
Use of severance money	Petrochemical workers (n = 682)	Cement workers (n = 563)	Total (n = 1,245)
Established own business	12.8	22.0	17.0
Used for daily expenses	31.4	28.2	30.0
Lent money	6.0	1.7	4.0
Placed time deposit in a bank	22.1	5.4	11.2
Bought a house	40.0	36.1	38.2
Bought gold or foreign exchange	9.4	18.3	13.3
Bought treasury bills	1.6	0.4	1.1
Bought securities	0.9	0.7	0.9
Used interest income for daily expenses	7.1	5.0	6.2
Used rental income for daily expenses	4.2	3.4	3.8
Bought a car	18.1	10.0	16.7
Bought land	3.5	1.4	2.6

Note: Based on an interview where respondents were faced with a number of possible choices and asked to choose as many as applicable. So, for example, around 40 percent of petrochemical workers stated that they bought a house with their severance compensation.
Source: Tansel 1996.

Table 7.6: Preferred Employment Status of Redeployed Civil Servants-Ghana
	Agriculture		Nonagriculture		Total
Preferred employment status	Number	Percent	Number	Percent	Number	Percent
Self-employment	1,124	72	782	65	1,906	69
Cooperative	420	27	260	21	680	24
Private wage employment	20	1	164	14	184	7
Government	2	0	6	0	8	0
Subtotal	1,566	100	1,212	100	2,778	100
No preference (number)	95	n.a.	837	n.a.	932	n.a.
Total sample size (number)	1,661	n.a.	2,049	n.a.	3,710	n.a.

n.a. Not applicable.
Note: Based on sample of workers opting for retraining.
Source: Alderman, Canagarajah, and Younger 1994.

Assessing Overall PPI Benefits

Evaluating the success of a labor program requires that initial objectives be revisited. It asks two questions: What were the initial objectives? Did the program meet those objectives?

An evaluation can be made at two levels:

Evaluation against the specific objectives of the labor program itself: This is the focus here, especially the impact of redeployment programs on workers' incomes and the period of unemployment, and the impact of labor adjustment on workers' welfare.
Evaluation of the contribution of the labor program to achieving the wider policy objectives of PPI: The effects of a labor program may go well beyond the consequences for individual displaced workers. Table 7.7 indicates the scope of a comprehensive assessment, which would include impacts on government, consumers, investors, and labor unions, as well as workers themselves. Box 7.5 summarizes information on the extent to which labor variables influence privatization prices.

Table 7.7: Assessing Labor Programs-A Checklist of Potential Effects on Different Stakeholders
Type of impact	For government	For unions	For employees	For consumers and customers	For investors
Positive effects	Reduced subsidies or net costs of providing PPI services; time advantage from faster completion of PPI transaction, and faster implementation of investment or service improvements; revenues from PPI transaction (concession or privatization receipts); increases in tax revenue from private operators; no disruption of service (power supplies, port operations)	Greater job security (but for fewer employees); stronger role, if consulted and participate in preparation of the labor program	Salary improvements for retained workers; changes in labor contracts that affect (improve or reduce) nonwage benefits	Faster access to improved PPI services; evidence of growth in supply of services (e.g., access to water) or demand for services (number of passengers on trains); reduced costs to business (e.g., telecommunications, transportation); reduced tariffs to consumers (services); service quality improvements	Improved financial performance; reduced costs; more flexible labor contracts; improved labor productivity
Negative (adverse) effects	Political costs (if disputes); new incremental costs; financial loss if rehiring takes place	Loss of membership numbers; reduced bargaining power	Loss of salary and other tangible and intangible benefits for displaced workers	Increased tariffs (if there are labor cost increases)	Higher wage or benefits costs if negotiated upward by labor prior to PPI; less flexible labor contracts; loss of valuable skills if adverse selection

Box 7.5: Impact of Downsizing on PPI Prices

The very high levels of downsizing in infrastructure enterprises described in module 2 suggest that infrastructure enterprises are qualitatively different from other privatizations or PPI schemes. Unfortunately, there is little evidence of the impact of downsizing on PPI prices received from investors. In part this is because investor behavior is inherently complex, and in part because downsizing is often part of a wider package and investor responses to downsizing may be confounded by other changes (such as revised labor contracts or relations). In some cases, without work force restructuring it is unlikely that any investor will be found. The dramatic work force restructuring in Mexico's airlines, described in module 4 (box 4.9) is one example. Another is Argentina's railway, which Ramamurti (1997) characterized prior to privatization as a "lemon"-an enterprise not attractive to a private investor because it was in a stagnant or declining market with poor profit prospects-but noted that:

The government did several things to turn FA [Ferrocarriles Argentinos] from a lemon-not into a plum-but into a much sweeter proposition than before (a grapefruit?) Four government steps in that direction were: breaking the FA's unions, picking up the tab for downsizing its work force, splitting up the company into smaller parts and then leasing rather than selling its assets. In addition, the unions agreed to greater flexibility in the deployment of workers, and to negotiate contracts with private owners that would increase rail's competitiveness (p. 1982).

One attempt to review alternative factors influencing privatization prices is that of López-de-Silanes (1997). Using a database of all 236 Mexican privatizations between 1983 and 1992, he assessed the factors that influenced privatization price (as measured by a privatization quotient [PQ]). He found that "labor issues play a central role in explaining privatization prices" (p. 997), and that after accounting for endogeneity, reduction in the labor force increased privatization prices: "the net effect of a 20 percent reduction in the labor force before privatization is a 24 percent increase in PQ, evaluated at the predicted mean" (p. 1015). Moreover, union relationships were important: union contract renegotiation improved privatization prices (although this was not statistically significant), and industrial disputes strongly depressed privatization prices: "one of the strongest results...is that an additional strike in an SOE leads to a 19 percent reduction in the net price evaluated at the mean predicted PQ" (p. 997).

A more recent assessment by Chong and López-de-Silanes (2002) undertook follow-up surveys of 308 privatized enterprises taken from a global database of privatizations. Using dummy variables for various labor downsizing policies, they found that labor downsizing did little for net privatization prices. The analysis, however, was unable to differentiate between large levels of downsizing, as occur in most infrastructure privatizations, and more modest levels of downsizing (only a quarter of the survey respondents provided any numerical information).

Evidence of the complexity of investor behavior comes from private sector work force restructuring. The common assumption is that stock market prices will rise following downsizing. Abraham and Kim (1999) reviewed a number of studies on the effects of downsizing on investor behavior and found that the evidence is inconclusive. Their own study of 381 firms found that both layoff announcements and employment guarantee announcements lead to reductions in stock market share prices. They suggested that investor response depended on the net result of four possible effects on investor behavior following downsizing:

A positive cost-saving effect (downsizing reduces cost of production)
A positive efficiency effect (downsizing improves overall firm efficiency)
A negative industrial relations effect (downsizing leads to poorer labor relations)
An ambiguous signaling effect (for a firm in good shape, layoffs indicate a positive response to changing circumstances; for firms in poor shape, layoffs confirm that firm performance is poor or even worse than expected).

Note: PQ is government's net privatization price after restructuring, adjusted by the percentage of company shares sold plus total liabilities at the time of privatization, divided by the total assets of the company at the time of privatization.
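The verbal definition of PQ in the note can be read in more than one way. The sketch below shows one possible reading, in which the net price is scaled up by the share of the company sold, total liabilities are added, and the sum is divided by total assets. This interpretation, the function name, and the example figures are all assumptions; the original study should be consulted for the exact definition.

```python
def privatization_quotient(net_price: float, share_sold: float,
                           total_liabilities: float,
                           total_assets: float) -> float:
    """One possible reading of PQ (an assumption, not a confirmed formula).

    net_price: government's net privatization price after restructuring
    share_sold: fraction of company shares sold, between 0 and 1
    """
    if not 0 < share_sold <= 1:
        raise ValueError("share_sold must be in the interval (0, 1]")
    if total_assets <= 0:
        raise ValueError("total_assets must be positive")
    implied_value_of_all_shares = net_price / share_sold
    return (implied_value_of_all_shares + total_liabilities) / total_assets


# Hypothetical figures, in millions of dollars.
pq = privatization_quotient(net_price=50.0, share_sold=0.5,
                            total_liabilities=50.0, total_assets=200.0)
print(f"PQ = {pq:.2f}")
```

Under this reading, a PQ below 1 would mean that investors valued the enterprise at less than the book value of its assets.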
