Manipulation of a Poverty Index?


Adriana Camacho∗

Brown University

[email protected]

Emily Conover†

UC Berkeley

[email protected]

October 5, 2007

Work in progress. Please do not distribute or cite without our consent. We welcome comments.

Abstract

In the early 1990s laws were passed in Colombia that made spending on social programs a priority for the government. These laws also indicated that spending should be targeted to the poor and vulnerable population. For this purpose, the government designed a Census of the Poor. This census collects information at the individual and household level to construct a poverty index score for each family. Urban families with scores of 47 or less are eligible for many social programs. Despite the safeguards built into the system we see strange patterns in the data that suggest manipulation. Specifically, if we include municipality fixed effects, from 1994 to 1997, the number of interviews increases during the period prior to the elections for mayors. And from 1998 to 2003 the score distribution exhibits an increasing discontinuity of the density exactly at 47, the eligibility threshold. We also identified municipalities with relatively high proportions of families that have almost identical answers in a given month. Manipulation occurs, we believe, because the system rewards cheating by individuals and politicians, and because not enough controls are in place. For individuals, the rewards of having a score below the eligibility threshold are high, since it qualifies them for a broad range of social programs. For politicians, it provides an additional tool for increasing their political support. Newspaper articles indicate that the census of the poor was used to buy votes. We think that the mechanism through which politicians misuse the program depends on the relative costs and benefits. Using electoral data for mayors we find that when the elections are more competitive, and thus the benefits of an additional vote are higher, the amount of cheating is higher. Manipulation is a problem if it interferes with the democratic process. It also diverts the government’s resources and is costly. This study’s findings may prompt policy makers in Colombia and in other countries with similar social programs to incorporate additional safeguards that reduce the possibility of cheating.

JEL No. D72, D73, H39, I32.

∗ This is for making an acknowledgment.

† Many thanks will go here.


1 Introduction

In the early 1990s laws were passed in Colombia that made spending on social programs a priority for the government. These laws also indicated that social spending should be targeted to the most vulnerable population.[1] In response to these requirements a remarkable and unprecedented system was put in place. To identify the poor the government designed what we will call the Census of the Poor.[2] This Census collects information on dwelling characteristics, demographics, income, and employment at the individual and household level and uses it to assign each family a poverty index score that ranges from 0 (poorest) to 100 (richest).[3] The motivation behind the creation of this score was to generate a comprehensive measure of long term living conditions that would properly identify the population most in need and go beyond reflecting transitory income shocks. Eligibility rules for many social programs use a specific and known threshold from the poverty index score. The most common was a score of 47 for urban families. People with scores below the threshold can apply for a broad range of programs including educational subsidies,[4] housing improvement programs, aid to elderly homeless, food-aid for school aged children, aid to mothers head of household with school aged children, Familias en Accion, a conditional cash transfer program,[5] and the most extensive program, the subsidized health insurance program (Cardenas, 2006).

Implementation of the census of the poor was done at the municipal level and at different points in time.[6] The period which we study goes from 1994 to 2003. Enumerators were designated to visit a pre-determined number of residence units. In order to determine cross-subsidies of charges for public utilities, Colombia’s neighborhoods are officially stratified into six levels (strata). Stratum level 1 is the poorest and level 6 the wealthiest. Since the objective was to identify the poor population, municipal officials were instructed to conduct door-to-door interviews in neighborhoods of strata below level 4.[7] The enumerators who conducted the door-to-door interviews recorded the information by hand in a questionnaire. These questionnaires were taken to a central office where the data were entered into computers that operated with a program written and distributed by the central government. This program processed the information from the survey and calculated the poverty index score. Because a system like this had never been implemented in Colombia, and because social programs were not as extensive before the 1990s, in the early years people were not aware of the range of potential benefits that they could get from this system.[8] Initially, there was also the belief that having an enumerator come to a house for the interview was a sufficient condition for eligibility, whereas, in fact, the requirement for eligibility was having a score below the threshold.

[1] Specifically, law 60 of 1993 indicated that spending on social programs should be targeted.

[2] In Colombia the Census of the Poor is known as the SISBEN (in Spanish: system of beneficiary selection). In this paper we will be using the “old” SISBEN, which was conducted between 1994 and 2003.

[3] In Colombia this poverty index score is known as the SISBEN score.

[4] See Barrera-Osorio and Urquiola (2007) for an evaluation of this program in Bogota using the “new” Census of the Poor.

[5] See Attanasio et al. (2006) for an evaluation of this program. This program uses a 37 score threshold.

[6] A departamento is the Colombian jurisdiction equivalent to a state. It contains several municipios. A municipio is the Colombian jurisdiction most similar to a county. Colombia has 32 departamentos and 1120 municipios. In this paper we will refer to a municipio as a municipality.

[7] People living in neighborhoods of strata levels 4 or above could request an interview. There is an unofficial strata level 0 which corresponds to neighborhoods without access to any type of utilities. It can also be used for domestic workers or people who rent a room from another household.

Despite the safeguards built into the system, we see strange patterns in the data that suggest there was manipulation. Figure 1 shows that if we include municipality fixed effects, from 1994 to 1997, the number of interviews increases during the periods prior to the elections for mayors. Figure 3 shows that from 1998 to 2003 the score distribution exhibits an increasing discontinuity of the density exactly at 47, the eligibility threshold. Manipulation of the census of the poor might occur at different stages and by different agents. The person being interviewed might be dishonest in order to increase his chances of being eligible for government programs.[9] The enumerator, during the interview, might indicate to the respondent that certain answers will improve his chances of obtaining a lower score. The data entry person, finding patterns in the program, might decide to change some answers that will affect the score. The supervisor, a politician or someone in a position of power, might decide to change, or tell someone to change, some answers to make people eligible and gain political support from them. We can detect only certain types of manipulation in the data. Using the answers to the questions, and the score algorithm, we first checked that the coded score corresponds to the score the algorithm would have calculated. For the most part these scores match, indicating that most of the manipulation was not due to overwriting the final score. Further evidence in support of the manipulation hypothesis comes from looking at patterns in the data. In the spirit of Jacob and Levitt (2003) we identified municipalities with relatively high proportions of families that had almost identical answers in a given month. We found that 97% of those individuals had scores below 47, the eligibility threshold, and 91% were interviewed after 1997.

We believe that manipulation occurred because the rules were not complete, and because the system generated incentives to cheat for individuals and politicians. For individuals the rewards of having a score below the eligibility threshold are high, since it qualifies them for a broad range of social programs. For politicians, it provides an additional tool for increasing their political support. Newspaper articles indicate that manipulation took place at the government level.[10] Colombian Senators have denounced irregularities in the system. One Senator said that the census of the poor has been used as a vote buying mechanism (Robledo, 2006) and another Senator proposed a reform to the system to improve transparency (Moreno, 2005). Alternative survey data, and the results from the previous section, support the hypothesis that one way of cheating was to lower the true score to below the eligibility threshold. We think that the mechanism through which some politicians misused the program, either by conducting a high number of surveys before elections or by changing scores, depended on the relative costs and benefits of each at a particular point in time. Before the score algorithm was made public to municipal officials, we see that politicians were conducting a relatively high number of surveys in months before elections. After the score algorithm was released, and thus the costs of cheating lowered, we see the emerging pattern of the discontinuity of the score density at the eligibility threshold. Using data from elections for mayors we find that when the elections are more competitive, and thus the benefits of an additional vote higher, the discontinuity at the threshold is larger. Conversely, we see in the data that the discontinuity at the threshold is smaller when the presence of monitoring institutions is stronger, and thus the marginal costs of cheating are higher.

[8] During the period that we are studying government spending on social programs increased by approximately 30% (Cardenas, 2006).

[9] Throughout this paper we consider an individual as a “he”, but this individual could be either male or female.

[10] For example in the newspaper “El Pais” an article from October 13, 2000 is entitled “Políticos ofrecen cupos en el SISBEN a cambio de votos”, which translates to “Politicians offer Census of the Poor interviews in exchange for votes.”

In the last part of the paper we assess whether alternative explanations to manipulation could generate the pattern that we observe in the score distribution over time. We looked at the score algorithm to see if, “by construction”, there is a higher number of possible combinations that yield scores below the threshold. There are approximately 24 questions used to calculate the score, and there could be approximately 16 billion different combinations of answers to generate all the scores from 0 to 100. We calculated the number of possible combinations to generate each score and plotted the distribution. The distribution does not exhibit a discontinuity at the eligibility threshold. Next, to rule out that the changes in the distribution are due to changes in macro-economic conditions, we used survey data for 1993, 1997 and 2003 to construct the poverty index score. We found that using survey data the score distribution for each of these years is smooth. Finally, to address the possibility of selection at the national and municipal level, we looked at the number of interviews conducted by stratum over time and found that this number remains relatively constant if we include municipality fixed effects.
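The counting exercise described above can be sketched as a simple convolution over questions. This is only an illustration under stated assumptions: the function name `combination_counts` and the toy answer weights are hypothetical, and the real weights from Tables 2 and 3 are not reproduced here. Weights with four decimals could be scaled to integers (for example, multiplied by 10,000) so that totals compare exactly.

```python
from collections import Counter

def combination_counts(questions):
    """Count how many answer combinations produce each total score.

    `questions` is a list with one entry per question; each entry lists the
    points that each possible answer contributes (hypothetical toy weights).
    """
    totals = Counter({0: 1})  # one way to reach a score of 0 before any question
    for answer_points in questions:
        updated = Counter()
        for subtotal, ways in totals.items():
            for points in answer_points:
                updated[subtotal + points] += ways
        totals = updated
    return totals

# Toy example: two questions with 2 and 3 possible answers (2 * 3 = 6 combinations).
counts = combination_counts([[0, 1], [0, 1, 2]])
for score in sorted(counts):
    print(score, counts[score])
```

Plotting the resulting counts against the score would show whether the algorithm alone produces a jump at the threshold; a smooth curve points away from the “by construction” explanation.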

Manipulation is a problem when it interferes with the democratic process. Furthermore, if we assume that the design of the poverty index score is properly identifying the population in most need, cheating will take away resources from potential alternative welfare enhancing programs and is costly for the government. Developing countries besides Colombia have adopted similar systems to identify the poor; these countries can benefit from the lessons learned from Colombia’s experience when designing or implementing their own programs.[11] Finally, evaluations of social programs that used the Census of the Poor could theoretically use empirical methodologies that exploit the discontinuous assignment of treatment based on the score index variable. However, the results presented here suggest that the assumption of “no sorting at the threshold” that is required for a valid causal interpretation does not hold. More broadly, this emphasizes the importance of checking whether the necessary assumptions for identification hold prior to proceeding with a particular methodology which “by design” may seem appropriate.

[11] Some examples include: the SelBen in Ecuador and Chile’s CAS program.

The paper is structured as follows: in section 2 we describe the Census of the Poor dataset, the survey data, the election data and the institutional data used in the paper. In section 3 we present evidence in support of the manipulation hypothesis. In section 4 we use a simple model to explain what could be generating the discontinuity at the threshold which takes into account the incentives to the politicians for manipulation. We also test some of the predictions of the model with election data. In section 5 we present results to support the hypothesis that the changes in the distribution are most likely not driven by alternative explanations such as the score algorithm, changes in economic conditions or selection. We conclude in section 6.

2 Data

The Census of the Poor data we use come from the “old” Census of the Poor conducted by each municipality between 1994 and 2003.[12] They were provided by the Colombian National Planning Agency (DNP).[13] The dataset contains 25.8 million individual records. In our working sample we exclude the rural population because approximately 70% of Colombia’s population is urban and the eligibility thresholds are different for rural areas. We also exclude people living in neighborhood strata level four or above because we believe that people who requested an interview are different from those who were targeted. This leaves approximately 18 million individuals, who represent roughly 40% of the total Colombian population. Of 1120 municipalities 785 have Census of the Poor records; these municipalities account for 86.5% of the Colombian population. The Census of the Poor dataset is not a panel dataset despite the fact that we have information over a 10 year period. Implementation dates varied by municipality. Some municipalities spread out their interviews over several years, while others conducted their interviews in a few waves. Most municipalities conducted more than one round of interviews.

The poverty index score is a weighted average of answers to the Census of the Poor. Table 1 shows the information collected and which variables were used to calculate the score. The score is calculated at the family level. It uses information from the unit of residence, the family and individuals.[14] The poverty index score has four components: utilities, housing, demographic and education. These components are divided into subcomponents and are added to calculate the overall score. Tables 2 and 3 show the algorithm for calculating the poverty index score, as well as the components and sub-components. Using this algorithm we reconstructed the score, and show the cumulative distribution of answers for each of the subcomponents in the last column of Tables 2 and 3.

[12] The Census of the Poor was redesigned and a new Census of the Poor started being implemented in 2003.

[13] DNP stands for Departamento Nacional de Planeacion in Spanish.

[14] Within a unit of residence there could be more than one household, and within a household there could be more than one family.

Survey data for 1993 come from the Socio-economic characterization survey implemented by the Colombian national planning agency, DNP, the same agency that designed the Census of the Poor.[15] This survey includes approximately 20,000 households in urban areas. Survey data for 1997 and 2003 come from the Quality of life surveys, collected by the Colombian national statistical agency, DANE.[16] The 1997 survey includes approximately 9,000 households and the 2003 survey includes approximately 18,500 households in urban areas. The surveys are representative at the national level. In our analysis we restricted the sample to people living in urban areas and strata levels below 4 to make it comparable with our working dataset of the Census of the Poor.

To get a sense of what a score of 47 corresponds to, we use information from the survey data collected in 1993. On average, people with a threshold score of 47, and older than 18 years, have 4.8 years of schooling, while people with scores from 0 to 25 have 2.4 years, and people with scores from 75 to 100 have 12.6 years of schooling. The average normalized per capita income for someone with a threshold score of 47 is 0.15.[17] The corresponding number for someone with a score between 0 and 25 is 0.11, and 2.08 for people with scores from 75 to 100. Table 4 shows more detail on these figures.

Election data were provided by Colombia’s electoral agency.[18] We use electoral information for mayoral elections in 1990, 1992, 1994, 1997, 2000 and 2003. The data before 1997 have information on the political party, the number of votes the winner received, and the total votes in the municipality. After 1997 there is information for the party and number of votes for each candidate. From these data we created variables to indicate turnout and incumbency from 1990 to 2003, and political competition after 1997.

The data on the number of community organizations come from a non-profit civil institution, the Social Foundation (1998).[19] These data were used by Rosas and Mendoza (2005) as a proxy for better governance and institutions. The newspaper circulation data correspond to certified circulation data for 2004 for Colombia’s main national newspaper.

3 Manipulation Hypothesis

3.1 Patterns in the Data

The poverty index score could have been manipulated at different stages and by different agents: during the interview by the interviewee or the enumerator; at the data entry point by the data entry person or someone with access to the data; or after the surveys were recorded by someone with access to the data or who could influence someone with access to the data, for example, a municipal official. For many of the questions it is difficult to detect whether an interviewee is lying given that there are no right or wrong answers per se, but rather a combination of answers that yield a score. Further, each person could randomly be misrepresenting himself. If the enumerator is the one who is changing the answers, we can check whether a particular combination of answers appears with a higher than expected frequency for a given enumerator. Manipulation during or after the data entry stage involves changes to the answers in the questionnaire, to a specific component, or to the final score. In the introduction we noted that the rewards of having a low poverty index score are high for individuals since it gives them access to a broad range of social programs.[20] We also quoted articles from newspapers that indicated that the Census of the Poor was used to buy votes. In this section we present information in support of the claims that manipulation took place in the process of conducting the Census of the Poor. Later, in section 5, we look at whether alternative explanations could be generating the trends we observe in the data.

[15] This survey is known in Colombia as the CASEN survey.

[16] These surveys are known in Colombia as Encuestas de Calidad de Vida (ECV) in Spanish.

[17] This number translates into each person in the family receiving 0.15 of the monthly minimum wage equivalent in that year.

[18] Registraduría General de la Nacion.

[19] Fundacion social: http://www.fundacion-social.com.co/default.htm

Some patterns in the data that give rise to initial suspicion are shown in Figures 1 and 3. The first Figure shows that, if we include municipality fixed effects, there are spikes in the number of interviews conducted during periods of elections for mayors from 1994 to 1997. Figure 3 shows that from 1998 to 2003 the score distribution exhibits an increasing discontinuity of the density exactly at the eligibility threshold. Since implementation was done at the municipal level, rather than looking at the aggregate national level, it is more appropriate to look at the size of the discontinuity at the threshold within municipalities. We do this in Figure 2 by using the following equation:

$$\text{jump}_{jts} = \alpha + \beta_1\,\text{year\_month}_t + \beta_2\,\text{muni}_j + \beta_3\,\text{stratum}_{jts} + \varepsilon_{jts} \qquad (1)$$

where $\text{jump}_{jts}$ corresponds to the fraction of interviews 3 points below the threshold minus the fraction of interviews 3 points above the threshold of 47 for each municipality in each month, controlling for stratum.[21] In Figure 2 we plot the coefficients for $\beta_1$. Figure 2 shows that from 1994 to 1998 the discontinuity at the threshold is noisy, and we cannot reject the hypothesis that it is centered around zero. From 1998 onwards, however, the discontinuity at the threshold increases over time and it is not centered around zero. It is important to note that the algorithm for the score was made available to the municipal administrators sometime after July 1997 in an instructional presentation that was also distributed as a pamphlet (DNP, 1997). The timing of this release coincides with the appearance of the discontinuity at the threshold in 1998 at the aggregate level.
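To make the dependent variable concrete, the following minimal sketch computes a jump measure for each municipality and month. It is an illustration only: the function name `jump_by_cell`, the record layout, and the choice to count a score of exactly 47 as “below” (since families with scores of 47 or less are eligible) are assumptions, and the sketch omits the stratum dimension and the regression itself.

```python
from collections import defaultdict

THRESHOLD = 47  # urban eligibility cutoff stated in the text
WINDOW = 3      # points on each side of the cutoff

def jump_by_cell(records, threshold=THRESHOLD, window=WINDOW):
    """Return {(municipality, year_month): jump}.

    `records` is an iterable of (municipality, year_month, score) tuples.
    jump = share of interviews within `window` points at or below the
    threshold minus the share within `window` points above it.
    """
    below = defaultdict(int)
    above = defaultdict(int)
    total = defaultdict(int)
    for muni, year_month, score in records:
        cell = (muni, year_month)
        total[cell] += 1
        if threshold - window < score <= threshold:
            below[cell] += 1
        elif threshold < score <= threshold + window:
            above[cell] += 1
    return {cell: (below[cell] - above[cell]) / total[cell] for cell in total}

# Toy data: two interviews just below the cutoff, one just above, one far away.
sample = [("A", "1999-01", 46), ("A", "1999-01", 47),
          ("A", "1999-01", 48), ("A", "1999-01", 60)]
print(jump_by_cell(sample))
```

A value near zero indicates a locally balanced density around the cutoff, while a persistently positive value is the kind of pattern that Figure 2 tracks over time.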

[20] The Minister of social protection indicated that people’s valuation of one of the social programs, the public health insurance program, was so high as to be creating incentives to discourage formal employment, see: http://www.fcm.org.co/es/noticia.php?uid=0&todo=0&det=3837&leng=es

[21] We also calculated this regression for 2 points away from the threshold, and the increasing pattern after 1998 remains.

3.2 Evidence of Manipulation

One way to change the score is by overwriting the real score with a hard coded score below the threshold. Using the score algorithm and the individual answers from the survey we reconstructed the poverty index score and compared it to the one recorded in the data. By doing this we were able to identify whether the given overall score, or a specific component, is different from what the algorithm would have generated. Table 5 shows that the housing, utility and education components match almost perfectly. The observations that did not match in the housing and utility components came mostly from four municipalities where the total given score for a component was zero, despite the fact that the constructed score was non-zero (not reported in the table). Approximately 11% of individuals do not match in the demographic component. By using 1008 possible combinations for the demographic component we were able to determine where the differences for the non-matching families come from. Table 6 shows a break-down of the non-matching demographic families. Most of the discrepancies come from the income per capita in minimum wage units subcomponent, where 720 municipalities have a difference. An explanation for the discrepancy is that at a certain point in the data entry stage the program used to calculate the score asked the data entry person to enter a value for that year’s minimum wage. If the municipality entered (by accident or on purpose) the wrong minimum wage, then our minimum wage component is different. Fifty percent of the difference in this minimum wage units subcomponent comes from one municipality, where in approximately 47% of the cases the reconstructed score is higher than the given score. The second highest concentration comes from a municipality with 12% of the differences, in which in approximately 58% of the cases the reconstructed score is higher than the given score. Across all municipalities in 45% of the cases the reconstructed score is higher than the given score.

The overall results are presented in Figure 4. This figure shows the given poverty index score distribution and the reconstructed score at the individual level and for people living below strata level 4. The Figure shows that, with some exceptions at the zero score, the reconstructed score follows relatively closely the given score distribution.[22] Importantly, at the aggregate level, the reconstructed score also changes discontinuously at the threshold, indicating that for most of the municipalities the manipulation did not occur at the point of overwriting the true score with a new score.
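The component-by-component comparison can be sketched as follows. This is a hypothetical illustration, not the procedure actually used: the row layout, the function name `mismatch_rates`, and the tolerance (half a unit in the fourth decimal, chosen because subcomponents carry four decimal digits) are all assumptions.

```python
COMPONENTS = ("utilities", "housing", "demographic", "education")

def mismatch_rates(rows, tol=5e-5):
    """Share of records whose recorded component differs from the rebuilt one.

    Each row is a dict with two mappings, 'given' and 'rebuilt', from
    component name to value (a hypothetical layout for this sketch).
    """
    mismatches = {component: 0 for component in COMPONENTS}
    n = 0
    for row in rows:
        n += 1
        for component in COMPONENTS:
            if abs(row["given"][component] - row["rebuilt"][component]) > tol:
                mismatches[component] += 1
    if n == 0:
        return {}
    return {component: mismatches[component] / n for component in COMPONENTS}

# Toy example: the second record disagrees only in the demographic component.
rows = [
    {"given": {"utilities": 13.0, "housing": 10.0, "demographic": 5.5, "education": 8.0},
     "rebuilt": {"utilities": 13.0, "housing": 10.0, "demographic": 5.5, "education": 8.0}},
    {"given": {"utilities": 13.0, "housing": 10.0, "demographic": 5.5, "education": 8.0},
     "rebuilt": {"utilities": 13.0, "housing": 10.0, "demographic": 6.5, "education": 8.0}},
]
print(mismatch_rates(rows))
```

Grouping the same comparison by municipality would show whether mismatches are concentrated in a few places, as the text reports for the minimum wage subcomponent.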

In the data we also identify values of the score that do not exist. Tables 2 and 3 show that most of the subcomponents of the poverty index score have four decimal digits. The only whole-number values that a component can take are zero, or 13 for the utilities component.[23] All other possible values for the components have at least two decimal places. We find that 14 municipalities within a departamento have whole number values in the components. Moreover the average of these scores is 20 and all of them are below the eligibility threshold. The next case that we identified, which is highly unlikely, is that all components sum to zero. We found that the majority of these cases appear in 8 municipalities for 14,354 families and after 1998.

[22] Most of the differences for the zero score come from one municipality in July 1997, a pre-electoral period.

[23] Recall that the subcomponents are added up to generate a component score, which in turn is added to the other components to get the total score.

Another way to change the scores, besides hard coding different answers, would be to learn a combination of answers that yield a score below the threshold and use this combination repeatedly. We first selected the families that have almost exactly the same answers as at least another family interviewed in a given municipality and month.[24] We counted the number of families that we saw with repeated answers and we divided that by the total number of families interviewed in that municipality and month. This gives us a ratio between 0 and 1. If for example, everyone in that municipality and month had the same answers, the ratio would be 1. We ranked that ratio and flagged everyone above the 80th percentile.[25] With this methodology we were able, for example, to identify a municipality that conducted interviews on a single day in 2002, where approximately 45,000 individuals from different neighborhoods all had a score of 31. These individuals had the same answers for schooling, earnings and possessions, the same supervisor, coordinator and data entry person, and very little variation in dwelling characteristics. Overall we identified with this methodology around 415,000 individuals distributed as it appears in Figure 5. It is worth noting that 97% of the people identified fall below the threshold, 91% of them were interviewed after 1997, and there is a high concentration of people with scores between 35 and 47.
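The repeated-answer screen can be sketched in a few lines. The sketch assumes, as in the matching condition described in the footnote, that two families count as repeats when their four component values are identical; the function names, the record layout, and the nearest-rank percentile rule are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

def repeat_ratios(families):
    """Return {(municipality, year_month): share of families with repeated profiles}.

    `families` is an iterable of (municipality, year_month, components), where
    `components` holds the four component values for one family.
    """
    by_cell = defaultdict(list)
    for muni, year_month, components in families:
        by_cell[(muni, year_month)].append(tuple(components))
    ratios = {}
    for cell, profiles in by_cell.items():
        counts = Counter(profiles)
        repeated = sum(n for n in counts.values() if n > 1)
        ratios[cell] = repeated / len(profiles)
    return ratios

def flag_cells(ratios, percentile=80):
    """Return the cells whose ratio lies strictly above the given percentile."""
    ordered = sorted(ratios.values())
    if not ordered:
        return set()
    # Nearest-rank percentile, computed with integer ceiling division.
    rank = max(1, -(-percentile * len(ordered) // 100))
    cutoff = ordered[rank - 1]
    return {cell for cell, ratio in ratios.items() if ratio > cutoff}

families = [("A", "2002-05", (1, 2, 3, 4)), ("A", "2002-05", (1, 2, 3, 4)),
            ("A", "2002-05", (9, 9, 9, 9)), ("B", "2002-05", (1, 2, 3, 4))]
print(repeat_ratios(families))
```

Because the cutoff is relative, some cells are flagged even in clean data, which is why the flagged group is then compared against the eligibility threshold and the interview dates.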

To summarize, in this section we showed patterns in the data that suggest there was manipulation in the implementation of the Census of the Poor. We also found some evidence of manipulation by using the answers to the questionnaires and the score algorithm, and by identifying repeated answers within a municipality and month.

4 Possible Mechanism for Manipulation

4.1 A Simple Framework

We see a pattern in the score distribution that suggests there was cheating during the implementation of the Census of the Poor. This pattern appears after 1998 and after the algorithm for the score was made public to municipal officials. We also see that in the mayor electoral periods before 1998 there are spikes in the relative number of surveys conducted. We do not see any spikes in electoral periods after 1998. We think one way in which cheating occurred was to have some people’s score lowered as illustrated by Figure 9. This Figure portrays the Census of the Poor distribution for all years, as well as the 1993 survey data distribution, which is representative at the national level. We believe that the survey data distribution is a good approximation of what the Census of the Poor distribution would look like without manipulation. The differences between the distributions serve as a guide as to where the people who had their scores changed come from. A simplifying assumption is that some constant fraction of the people with scores above the threshold had their score lowered. Perhaps a more realistic assumption, that requires that the marginal cost of cheating is increasing with the score as portrayed in Figures 10 and 11, would indicate that the people who had their scores lowered were closer to the eligibility threshold.[26] We also know that cheating does not require being surveyed at that particular point in time. For instance, people surveyed in the past could have their scores changed. People value surveys because in order to become eligible for many social programs they need to be surveyed and to have a score below a government imposed threshold s0. When the program started, there was confusion among the population as to whether being surveyed was a sufficient condition for eligibility. Newspaper articles imply that the program was used as a vote buying mechanism. In this section we provide a simple framework where the politician has two tools to increase his electoral support: conduct a high number of surveys before an election or cheat (i.e. lower someone’s score). The framework presented here shows that the mechanism through which politicians misused the program, either by conducting a high number of surveys before elections or by changing people’s scores, depended on the relative costs and benefits of each at a particular point in time.

[24] We write “almost exactly” because the condition we used is that the value for the four components of the score (education, housing, demographics, and utilities) should be exactly the same.

[25] The 80th percentile is an arbitrary value that we picked because we think it is sufficiently high. A more rigorous way of doing this would be to find non-manipulated data, representative at the municipal level, with a sufficiently large sample size and enough variables to be able to construct the poverty index score that would allow us to calculate the probability of picking someone with almost exactly the same characteristics within a municipality. We would then use this probability as a guide for determining the threshold. We have this on our “to do” list for future work, if we are able to secure these data.

We use a probabilistic voting model based on the one developed by Lindbeck and Weibull (1987) and Persson and Tabellini (2000). Let the cumulative distribution of the score $s$ be given by $F(s)$, so that $F(s_0)$ is the fraction of the population with scores at or below $s_0$. Assume that the politician chooses a constant fraction $p$ of people above the official threshold $s_0$ to have their scores lowered. We will call this cheating. Voters support the incumbent, $I$, if the expected utility they would get from him exceeds the expected utility they would get from the challenger, $C$:

$$G_I + n_I \mu b_s I[s \le s_0] + p\, b_s I[s > s_0] + \delta_i + \theta > G_C \tag{2}$$

Where:

• $G$ represents the candidate’s proposed policy variable, which we assume is exogenous.

• Let $n_I$ be the proportion of surveys conducted before the election over the total number of surveys conducted in a municipality, thus $0 \le n_I \le 1$. $\mu b_s$ represents the benefit to the voter of being surveyed, taking into account the probability that he is surveyed. So $n_I \mu b_s I[s \le s_0]$ represents the expected benefit to the voter if the incumbent conducts a relatively large number of surveys before the election. From the voter’s perspective, this term is only beneficial if his score is below the official threshold $s_0$.

26 In the appendix we develop a preliminary version of a model which assumes that the people who had their scores lowered are closer to the eligibility threshold.


• Let $p$ be the proportion of people with scores above $s_0$ for whom the politician lowers the score to some score below $s_0$. Let $b_s$ be the benefit to the individual from having his score lowered. So $p\, b_s I[s > s_0]$ represents the expected benefit to a voter with a score above the threshold. From the perspective of the voter, this term is only beneficial if his score is above the threshold $s_0$.

• $\delta_i$ is an individual-specific measure of the voter’s bias toward the candidate. $\delta_i \sim U\left[-\frac{1}{2\phi}, \frac{1}{2\phi}\right]$ and is distributed independently of the score, with a density of $\phi$.

• $\theta$ is an aggregate shock to the population’s preferences, realized after the parties commit to policies. $\theta \sim U\left[-\frac{1}{2\psi}, \frac{1}{2\psi}\right]$, with a density of $\psi$.

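Rewriting (2) for the swing voter, we get:

$$G_I - G_C + n_I \mu b_s I[s \le s_0] + p\, b_s I[s > s_0] + \theta = -\delta_i \tag{3}$$

From (3), and using the fact that $\delta_i \sim U\left[-\frac{1}{2\phi}, \frac{1}{2\phi}\right]$, we get the vote share for the incumbent:

$$V_I = \frac{1}{2} + \phi\Big[(G_I - G_C + n_I \mu b_s + \theta)\,F(s_0) + (G_I - G_C + p\, b_s + \theta)\,(1 - F(s_0))\Big] \tag{4}$$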

The first term represents the fraction of people who benefit from the number of surveys. The second term corresponds to the fraction of people who benefit from cheating. The incumbent wants to maximize the probability of winning the next election, $\Pr(V_I > 0.5)$. Using (4) and information on the density of $\theta$, we get the incumbent’s problem:

$$\max_{p,\, n_I} \; P_I R - c(p, n_I) \tag{5}$$

where $R$ is the value to the incumbent of winning the election and $P_I$ is his probability of winning. This is equivalent to:

$$\max_{p,\, n_I} \; \frac{1}{2} + \psi\phi R\Big[(G_I - G_C + n_I \mu b_s)\,F(s_0) + (G_I - G_C + p\, b_s)\,(1 - F(s_0))\Big] - c(p, n_I) \tag{6}$$

The challenger cannot conduct surveys or cheat before the election.27 Let $c(p, n_I) = \eta n_I + \frac{c}{2} p^2$ be the personal costs incurred by the politician; these are increasing in the number of surveys conducted and in the amount of cheating.

27 A question that arises here is how this can be a credible way to buy votes, since in a secret ballot system voters should not feel obliged to vote for the incumbent once they receive their benefits. One way to think about why this framework might hold is that, for instance, people who had their scores lowered might fear that if the opposition comes to power there would be a higher chance of officials reverting their lowered score to their true score. And people who were surveyed might take surveying early (prior to elections as opposed to after elections) as a signal that the politician will favor policies that are beneficial to the poor.


Solving for $p$:

$$p = \frac{\psi\phi R\, b_s\,[1 - F(s_0)]}{c} \tag{7}$$
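As an illustration of the comparative statics, the following Python sketch evaluates equation (7) and checks it against a brute-force maximization of the objective in (6) over a grid of values of p. The first-order condition behind (7) is ψφR b_s (1 − F(s0)) = c p. All parameter values below are hypothetical and chosen only for illustration, and the function names are ours. Because both the benefit and the cost are linear in n_I, the terms involving n_I do not affect the optimal p, so the sketch holds n_I fixed.

```python
def optimal_cheating(psi, phi, R, b_s, F_s0, c):
    """Closed-form p from equation (7), clipped to the unit interval."""
    p = psi * phi * R * b_s * (1.0 - F_s0) / c
    return min(max(p, 0.0), 1.0)


def objective(p, n_I, psi, phi, R, b_s, mu, F_s0, c, eta, dG=0.0):
    """Incumbent's objective from equation (6); dG stands for G_I - G_C."""
    benefit = 0.5 + psi * phi * R * (
        (dG + n_I * mu * b_s) * F_s0 + (dG + p * b_s) * (1.0 - F_s0)
    )
    cost = eta * n_I + 0.5 * c * p ** 2
    return benefit - cost


# Hypothetical parameter values, for illustration only.
params = dict(psi=1.0, phi=1.0, R=2.0, b_s=0.5, F_s0=0.6, c=4.0)

p_star = optimal_cheating(**params)

# Brute-force check: maximize the objective over a fine grid of p.
grid = [i / 10000 for i in range(10001)]
p_grid = max(grid, key=lambda p: objective(p, n_I=0.5, mu=0.3, eta=0.1, **params))

# A more competitive election (higher psi) raises cheating; a higher cost c lowers it.
p_more_competitive = optimal_cheating(**{**params, "psi": 2.0})
p_costlier = optimal_cheating(**{**params, "c": 8.0})
```

In this sketch the grid search and the closed form agree, and doubling ψ doubles the chosen p while doubling c halves it, which is the inverse relationship between costs and cheating discussed in the text.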

From this simple set-up we obtain results that indicate there is an inverse relationship between the costs and the amount of cheating. Also, if $\psi$ increases, the election becomes more competitive, and the amount of cheating increases because the benefits of an additional vote to the politician are higher. Given the cost function, we should expect to see an inverse relation between the number of surveys conducted before an election and the amount of cheating.28

In terms of the patterns we see in the data, we can think that initially there was poor information about the benefits and requirements of the program. People thought that having an interview automatically made them eligible for the different social programs. Furthermore, conducting surveys right before an election was not illegal. Under these circumstances, the costs of conducting surveys before an election were sufficiently low that the incumbent would choose to conduct surveys almost exclusively instead of cheating. The exact formula for the score algorithm was released to municipal officials sometime after the summer of 1997. Over time people were also becoming increasingly aware that in order to be eligible, they had to have a score below a given threshold. Additionally, people started realizing that although not illegal, conducting surveys right before an election was opportunistic. All these factors contributed to a sharp decrease in the costs of cheating after 1998 (independently of whether the costs of conducting surveys stayed constant or rose), and the optimal strategy for the incumbent after 1998 became cheating.

4.2 Using Electoral Data to Test Some Predictions

The administration of the Census of the Poor is controlled by the executive branch; thus we used election data for mayors as described in the data section. We expect tighter elections to yield bigger discontinuities at the threshold. In the context of the model this can be interpreted as a higher benefit for the politician from cheating, since the benefits of an additional vote are higher when the election is more competitive.

28 If we had assumed that costs are quadratic for both $n_I$ and $p$, then the amount of cheating and the number of surveys conducted would depend on the relative costs of each.

Using data for the 6 months prior to the elections, we calculated the dependent variable jump. This variable serves as a proxy for the amount of cheating in a municipality and is defined as the difference in the fraction of interviews 3 and 5 points below the threshold relative to the same number of points above the threshold of 47. If there were no surveys conducted in a municipality in a given year, then the variable jump has a missing value. The variable jump ranges from -1 to 1; however, most of the values are positive. The closer this variable is to 0, the smaller the discontinuity at the threshold. We define political competition as the negative of the difference in the fraction of votes the winner received relative to the runner-up in the previous election. The closer the value is to 0, the more competitive the election. This variable ranges from -1 to 0. Since we only have information for all candidates starting in 1997, we can do this for the election years 1997, 2000 and 2003. We used lagged political competition as a proxy for anticipated political competition because the value from the same year is likely to be endogenous, since it is a function of anticipated and manipulated political competition.29

Results are displayed in Table 7. Consistent with the model, the table shows that when the election is more competitive the discontinuity at the threshold is larger. The lagged political competition variable is significant if we construct the variable jump using surveys 3 points below and above the threshold. A standard deviation increase in the amount of political competition increases the percent of interviews 3 points below the threshold relative to 3 points above the threshold from .09 to .1. Using 5 points from the threshold, the variable is not significant, but it is always positive. The results do not change after including population controls.
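The definitions of jump and political competition given above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the code used for Table 7: we assume that the fractions are taken over all interviews in the municipality and period, that "below" includes the threshold itself (since scores of 47 or less are eligible), and that at least two candidates ran. The function names and the example data are hypothetical.

```python
THRESHOLD = 47  # urban eligibility threshold stated in the paper


def jump(scores, window, threshold=THRESHOLD):
    """Share of interviews just below the threshold minus the share just above.

    Assumptions (ours): shares are taken over all interviews in `scores`,
    "below" means threshold - window < s <= threshold, and "above" means
    threshold < s <= threshold + window. Returns None (missing) if there
    are no interviews.
    """
    if not scores:
        return None
    below = sum(1 for s in scores if threshold - window < s <= threshold)
    above = sum(1 for s in scores if threshold < s <= threshold + window)
    return (below - above) / len(scores)


def political_competition(vote_shares):
    """Negative of the winner's margin over the runner-up (ranges from -1 to 0).

    Assumes at least two candidates.
    """
    top, second = sorted(vote_shares, reverse=True)[:2]
    return -(top - second)


# Hypothetical municipality: six interviews and three candidates.
example_scores = [30, 45, 46, 47, 48, 60]
example_jump = jump(example_scores, window=3)  # (3 - 1) / 6
example_competition = political_competition([0.5, 0.3, 0.2])
```

With these conventions jump lies between -1 and 1 and is missing when no interviews were conducted, and political competition lies between -1 and 0, matching the ranges described in the text.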

We have not yet secured data that we could use as proxies for the cost of cheating and that vary by municipality and over time. So far the data we have are at the municipal level but only at one point in time. We use the number of community organizations and the circulation of the main newspaper as proxies for the costs of manipulation in a given municipality, because we believe that it is harder to cheat in municipalities with more community oversight or where citizens are better informed.30 To the extent possible, we try to control for differences across municipalities by including an alternative measure of poverty for municipalities calculated from the 1993 census and population controls. We expect to see that municipalities with better monitoring institutions have less cheating. The dependent variable, jump, again proxies for the amount of cheating in a municipality. For these regressions we used the same years for which we have political competition data: 1997, 2000 and 2003. Since implementation of the program was done at the municipal level, we believe the results reported in Table 8 should be interpreted with caution and should only be taken as preliminary. Nevertheless, with this caveat in mind, we find that the coefficients have the expected signs, consistent with the idea that better monitoring is associated with less cheating in municipalities around election times. A standard deviation increase in the number of community organizations or newspapers in circulation decreases the percent of interviews 3 points below the threshold relative to 3 points above the threshold from .088 to .085.

29 Mayors in Colombia cannot be re-elected for consecutive terms. However, Drazen and Eslava (2005) argue that “manipulation of fiscal policy is regarded as a usual political practice” partly because officials run for election to other (or the same) positions in later years, and because the politician’s decisions “affect his party’s re-election chances (or those of the incumbent’s preferred candidate)”.

30 As mentioned earlier, Rosas and Mendoza (2005) use the number of community organizations as a measure of institutional quality in Colombia. Besley and Burgess (2002) found that government responsiveness is higher where newspaper circulation is higher.


5 Alternative Explanations for Pattern in Score Distribution

In this section we explore whether explanations other than manipulation could be generating the pattern in the score distribution that we see over time.

First we want to rule out that, “by construction,” the score algorithm generates a higher number of possible combinations for scores below the eligibility threshold. The score algorithm takes information from approximately 24 questions. The answers are then used to compute sub-scores for each of the four components. There are 384 possible combinations in the education component, 1008 in the demographic, 90 in utilities, and 480 in the dwelling component, for a total of approximately 16.7 billion possible combinations of answers. We calculated the number of possible combinations that generate each score and plotted the distribution. The maximum number of combinations is around 600 million, for a score of 50. The minimum is 1, for a score of 100. Figure 6 shows that the distribution does not exhibit a discontinuity at the eligibility threshold.
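The counting exercise described above can be sketched as a discrete convolution: starting from a single way to reach a total of zero, each item's possible sub-scores are added in turn, and the number of ways to reach each total is accumulated. The Python sketch below applies this to only three housing items, using the sub-score values listed in Table 2, so it illustrates the technique rather than reproducing Figure 6. Rounding the total to the nearest integer is our assumption, and the function name is hypothetical.

```python
from collections import Counter

SCALE = 10_000  # sub-scores in Table 2 have four decimals; work in integer units

# Sub-score values for three housing items, taken from Table 2.
WALLS = [0, 0.2473, 2.0207, 4.8586, 6.2845, 7.7321]
ROOF = [0, 2.1043, 3.7779, 5.0973]
FLOOR = [0, 2.9037, 3.6967, 5.8712, 6.8915]


def combinations_per_score(items):
    """Count answer combinations that produce each rounded total score."""
    totals = Counter({0: 1})  # one way to reach a total of zero before any item
    for values in items:
        units = [round(v * SCALE) for v in values]
        updated = Counter()
        for partial, ways in totals.items():
            for u in units:
                updated[partial + u] += ways
        totals = updated
    by_score = Counter()
    for total, ways in totals.items():
        by_score[round(total / SCALE)] += ways  # assumption: nearest-integer rounding
    return by_score


counts = combinations_per_score([WALLS, ROOF, FLOOR])

# Total number of answer combinations implied by the four component counts in the text.
total_combinations = 384 * 1008 * 90 * 480  # 16,721,510,400
```

The three items here give 6 × 4 × 5 = 120 combinations in total. Convolving counts item by item avoids enumerating every combination explicitly, which matters once all four components are included.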

One objection to Figure 6 is that in constructing it we assumed that all combinations are equally likely. In reality, however, we expect the covariance between certain answers to be different from zero. For example, people with a dirt floor are more likely to have a thatch roof than people who have floors with carpets. Further, we also expect that certain combinations of answers will not appear in the population. Next we used information from a representative sample of the Colombian population, which should give a realistic representation of the score distribution in the population.

Another explanation for the pattern in the score distribution over time could be changes in general macroeconomic conditions. In fact, in 1999 Colombia experienced a recession. During that year, according to figures from the National Statistical Agency (DANE), real GDP fell by 4.2%.31 We expect that the recession increased the proportion of poor in the population, and thus could have affected the shape of the aggregate score distribution. To address this concern, we took alternative data from nationally representative household surveys for 1993, 1997 and 2003. Using these surveys and the score algorithm, we constructed the poverty index score to see how the distribution behaves over time. We recognize that survey data have shortcomings, some of which include: the wording of questions might be different from the Census of the Poor; the surveys by design have smaller sample sizes; and the surveys provide a “snapshot” of the population in a given year. We tried to overcome the first shortcoming by using the 1993 household survey. This survey was conducted during the summer of 1993, prior to the Census of the Poor. It was used as a pilot survey in the design of the Census of the Poor. The wording of the questions is almost identical to the Census of the Poor.32 We believe that people answering the 1993 household survey had no incentives to provide false information because, prior to the Census of the Poor, eligibility for social programs in Colombia was to our knowledge not determined using survey information. There is nothing we can do to overcome the second shortcoming, but in general, after restricting the sample to strata levels below 4, the surveys we used are representative of our population of interest, the urban poor. The third shortcoming we addressed by using survey data for 1997 and 2003, which provide information on changes of the distribution over time.

31 According to DANE, in 2000 real GDP growth was 2.9%; this growth rate was above the 1996 level of 2.1%.

32 The one exception is in the income question, where the household survey provides more detailed and extensive questions on income sources.

Even though we do not have survey data for 1999, the year of the recession, we expect that if the effects of the recession went beyond 1999 then the 2003 survey data distribution should also exhibit a discontinuity at the threshold, such as the one observed in the Census of the Poor. The first graph in Figure 7 shows that the lowess 1993 household survey distribution and the Census of the Poor distribution for 1994 look very similar.33 The Census of the Poor distribution lies slightly to the right of the 1993 household survey distribution. The second and third graphs in Figure 7 show the poverty index score distribution and the Quality of Life surveys for 1997 and 2003 respectively. In 1997 the Census of the Poor distribution is to the left of the survey distribution, but we do not observe a discontinuity at the eligibility threshold. In 2003, however, the two distributions differ greatly. The mode of the distribution of the Census of the Poor is to the left and there is a discontinuity at the eligibility threshold, which does not appear in the survey data distribution. To summarize, from Figure 7 we can see that if a random sample of interviews was drawn each year, then the distribution would not exhibit a discontinuity at the eligibility threshold and would be moving to the right over time.34 However, what we see instead is that the mode of the Census of the Poor distribution moves left over time, and that after 1997 the distribution shows a discontinuity at the eligibility threshold.

One objection to Figure 7 is that the survey data that we use are a representative sample of the population at a given point in time. Comparisons with these data assume that a random sample of neighborhoods was interviewed in a given year across and within municipalities. In fact, municipalities had discretion over the timing of the surveys, and not all municipalities interviewed all people in strata levels below 4 at once. Thus, it is possible that the pattern we see at the aggregate level is driven by selection. Specifically, richer municipalities could have conducted interviews first, and within a municipality richer neighborhoods could have been surveyed first.35 This is worrisome since one explanation for the pattern in the score distribution could be that over time municipalities became better at identifying the poor neighborhoods, or that the municipalities which conducted the interviews later were poorer and thus had a higher concentration to the left of the threshold.

33 We use lowess because of limitations in sample size. Even if we did not smooth the distributions, the survey data do not exhibit a discontinuity at the threshold.

34 This is consistent with the overall growth in the Colombian economy during this 10 year period.

35 This, however, goes against some information provided by some municipal officials in charge of the implementation, who told us that poorer neighborhoods were prioritized.

Since implementation was done at the municipal level, and to the extent possible our analysis is at this level, one way to check for selection is by comparing, within a municipality, the number of surveys conducted by stratum level over time. We did this because we knew that the central government instructed municipal officials to use strata levels as their targeting mechanism. We should be concerned about selection if, for instance, we see that within a municipality strata level 1 (poorer) interviews are increasing over time while strata level 3 (richer) interviews are decreasing. The equation that we use to calculate the number of interviews within a municipality over time is:

$$\text{surveys\_stratum}^{x}_{jt} = \alpha + \beta_1\, \text{year\_month}_t + \beta_2\, \text{muni}_j + \varepsilon_{jt}$$

where $\text{surveys\_stratum}^{x}$ corresponds to the number of surveys conducted in stratum level $x$. Figure 8 shows that, excluding the peaks in 1995 and 1997 which correspond to electoral periods, for strata 1 to 3 the number of interviews remains relatively constant over time, and that there is a slight upward trend after 2000 for stratum 0. To further look into the possibility of selection in the data as an explanation for the patterns we see over time, we are in the process of trying to obtain population census data for 1993 to see if the characteristics of neighborhoods interviewed earlier within a municipality vary greatly from those that were interviewed later. But in general, the results presented here do not indicate that the score algorithm, changes in economic conditions or selection explain why after 1998 we start seeing a discontinuity exactly at the eligibility threshold.
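One way to read the regression above is as a model with month and municipality effects, where the month effects are reported relative to a base month (January 1994 in Figure 8). The Python sketch below is an illustration under our own simplifying assumptions, not the estimation code behind the figure: it assumes a balanced panel, in which case the least-squares month effects relative to the base month equal the differences in cross-municipality means. The function name and the data are hypothetical.

```python
def relative_month_effects(counts, base_month):
    """Month effects relative to `base_month` for a balanced panel.

    `counts` maps (municipality, month) to the number of surveys in one
    stratum. With every municipality observed in every month, the
    least-squares month effects net of municipality effects reduce to
    differences of month means across municipalities.
    """
    municipalities = {j for j, _ in counts}
    months = sorted({t for _, t in counts})
    if any((j, t) not in counts for j in municipalities for t in months):
        raise ValueError("this sketch requires a balanced panel")
    month_mean = {
        t: sum(counts[(j, t)] for j in municipalities) / len(municipalities)
        for t in months
    }
    return {t: month_mean[t] - month_mean[base_month] for t in months}


# Hypothetical survey counts for two municipalities and two months.
panel = {
    ("A", "1994-01"): 10, ("A", "1994-02"): 30,
    ("B", "1994-01"): 100, ("B", "1994-02"): 140,
}
effects = relative_month_effects(panel, base_month="1994-01")
```

With an unbalanced panel the same quantity would have to be estimated with explicit month and municipality indicators, which this sketch does not attempt.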

6 Conclusion

In this paper we document patterns in the data that indicate there was manipulation during the implementation of the Census of the Poor in Colombia. We also provided a simple framework to illustrate a possible mechanism through which manipulation by politicians may have occurred. We tested some of the predictions of this framework with electoral data and found suggestive evidence that political competition is positively associated with the amount of manipulation in a municipality.

Manipulation is a problem when it interferes with the democratic process, since incumbent politicians, or their political parties, may be getting an unfair advantage. Manipulation is also costly for the government. In a “back of the envelope” calculation we estimate that from 1994 to 2003 approximately 3 million people were not properly assigned a poverty index score. Considering that during the period studied the total population of Colombia was approximately 40 million, the misallocation of 3 million of the poorest segment of the population is sizeable. Developing countries that have adopted similar systems to identify the poor can benefit from the lessons learned from Colombia’s experience when designing or implementing their own programs. Increasing the penalties for cheating, improving detection of cheaters, and restricting to non-electoral periods the selection of the people eligible for the program should be considered as ways in which future duplicity can be limited. Finally, from a methodological perspective, we want to highlight the importance of checking whether the necessary assumptions for identification hold. This should be done before any efforts are made to evaluate the effectiveness of social programs.


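Table 1: Information Collected in the Census of the Poor

| Group | Variable | Description | Used in Score |
|---|---|---|---|
| Location | Area | Urban/Rural | Yes |
| Location | Estrato | Stratum number | No |
| Dwelling | Type | House, room or shelter | No |
| Dwelling | Walls | Material of external wall | Yes |
| Dwelling | Floor | Material of floors | Yes |
| Dwelling | Roof | Material of roof | Yes |
| Dwelling | Lights | Electricity, gasoline or candles | No |
| Dwelling | Toilet | Toilet facilities | Yes |
| Dwelling | Water | Provision of drinking water | Yes |
| Dwelling | Time to water | Time to reach water source | No |
| Dwelling | Trash | Trash disposal | Yes |
| Dwelling | Ownership | Dwelling ownership | No |
| Dwelling | Living room | Number of living rooms | Yes |
| Dwelling | Dining room | Number of dining rooms | Yes |
| Dwelling | Bedrooms | Number of bedrooms | Yes |
| Dwelling | Other rooms | Number of other rooms | Yes |
| Appliances | Refrigerator | Ownership of refrigerator | Yes |
| Appliances | T.V. | Ownership of T.V. | Yes |
| Appliances | Fan | Ownership of fan | Yes |
| Appliances | Blender | Ownership of blender | Yes |
| Appliances | Washing Machine | Ownership of washing machine | Yes |
| Family Members | Gender | Gender | No |
| Family Members | Marital status | Married, single, divorced, cohabitation, etc. | No |
| Family Members | Relation | Relation to family head | Yes |
| Family Members | Date of Birth | Month, day, year | Yes |
| Family Members | Disabled | Disabled indicator | No |
| Family Members | Enrollment | School enrollment indicator | No |
| Family Members | Social security | Affiliation to social security system | Yes |
| Family Members | Last schooling grade | Last grade of schooling completed | Yes |
| Family Members | Last schooling level | Last level of schooling completed | Yes |
| Family Members | Activity | Previous week activity in categories | No |
| Family Members | Regular activity | Regular activity in categories | Yes |
| Family Members | Employee rank | Employee rank or position in categories | No |
| Family Members | Size of business | Size of business where employed in categories | No |
| Family Members | Earnings | Total earnings | Yes |
| Family Members | Farmer | Farming activities indicator | No |
| Administrative Records | Interview date | Month, day, year | Yes |
| Administrative Records | Score | Poverty index score | N/A |
| Administrative Records | Individual component scores | Score for each of the components | N/A |

Source: Colombia’s National Planning Agency (DNP).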

Table 2: Poverty Index Score Algorithm

| Component | Item | Description | Score | Cum. Dist. |
|---|---|---|---|---|
| Education | Education of the highest wage earner | Without education | 0 | 10.21 |
| | | Incomplete primary school | 1.6239 | 38.82 |
| | | Complete primary school | 3.4435 | 65.56 |
| | | Incomplete secondary school | 5.0039 | 86.43 |
| | | Complete secondary school | 7.3434 | 97.66 |
| | | Incomplete college | 9.7833 | 98.97 |
| | | Completed college | 11.546 | 99.86 |
| | | Post-graduate | 12.4806 | 100 |
| Education | Avg. education of family members older than 11 years | Without education | 0 | 4.06 |
| | | (0, 4] | 1.657 | 34.17 |
| | | (4, 5] | 2.9947 | 52.15 |
| | | (5, 10] | 4.969 | 93.24 |
| | | (10, 11] | 7.6387 | 98.15 |
| | | (11, 15] | 9.4425 | 99.77 |
| | | (15, 16] | 10.69 | 99.95 |
| | | More than 16 years of schooling | 11.1396 | 100 |
| Education | Social security of the highest wage earner | No social security and self-employed or not working | 0 | 65.33 |
| | | No social security and works in firm of 2-9 workers | 1.166 | 75.59 |
| | | No social security and works in firm of 10 or more workers | 2.6545 | 81.08 |
| | | With social security and self-employed or not working | 3.9539 | 87.88 |
| | | With social security and works in firm of 2-9 workers | 5.8427 | 90.68 |
| | | With social security and works in firm of 10 or more workers | 6.9718 | 100 |
| Housing | Wall materials | No walls, bamboo | 0 | 2.15 |
| | | Zinc, cloth, cardboard, metal etc. | 0.2473 | 2.99 |
| | | Unpolished wood | 2.0207 | 9.77 |
| | | Mud | 4.8586 | 17.41 |
| | | Adobe | 6.2845 | 24.53 |
| | | Rock, bricks or blocks | 7.7321 | 100 |
| Housing | Roof materials | Straw | 0 | 2.72 |
| | | Recycled materials (cardboard, metal, etc) | 2.1043 | 5.22 |
| | | Tiles, zinc (without a ceiling) | 3.7779 | 68.82 |
| | | Tiles, zinc (with a ceiling) | 5.0973 | 100 |
| Housing | Floor materials | Dirt | 0 | 11.75 |
| | | Unpolished wood | 2.9037 | 16.91 |
| | | Cement | 3.6967 | 73.34 |
| | | Tiles, vinyl or bricks | 5.8712 | 98.89 |
| | | Rugs, polished wood, marble | 6.8915 | 100 |
| Housing | Number of appliances that the family owns | None | 0 | 29.86 |
| | | Up to 3 basic appliances | 2.1435 | 87.9 |
| | | 4 basic appliances without a washer | 3.0763 | 95.98 |
| | | 3 to 4 basic appliances with a washer | 4.7194 | 100 |

Source: Colombia’s National Planning Agency (DNP).

Table 3: Poverty Index Score Algorithm (Cont.)

| Component | Item | Description | Score | Cum. Dist. |
|---|---|---|---|---|
| Demographic | Children to family size ratio | More than 0.65 | 0 | 2.05 |
| | | (0.0, 0.65] | 0.2237 | 51.87 |
| | | No children | 1.4761 | 100 |
| Demographic | Employed to family size ratio | Less than 0.30 | 0 | 66.57 |
| | | (0.30, 0.60] | 0.6717 | 94 |
| | | (0.60, 0.90] | 1.739 | 97.11 |
| | | More than 0.90 | 4.0149 | 100 |
| Demographic | Room crowdedness | Less than 0.20 | 0 | 18.22 |
| | | (0.20, 0.30] | 0.5584 | 30 |
| | | (0.30, 0.40] | 1.6535 | 47.41 |
| | | (0.40, 0.70] | 2.5727 | 74.09 |
| | | (0.70, 1.00] | 4.3886 | 93.05 |
| | | (1.00, 4.00] | 6.0042 | 99.91 |
| | | More than 4.0 | 8.3828 | 100 |
| Demographic | Income per capita relative to the minimum wage | Less than 0.15 | 0 | 56.85 |
| | | (0.15, 0.25] | 0.8476 | 73.69 |
| | | (0.25, 0.35] | 2.1828 | 82.46 |
| | | (0.35, 0.50] | 3.5362 | 92.53 |
| | | (0.50, 0.75] | 5.3636 | 95.25 |
| | | (0.75, 1.00] | 7.0827 | 97.69 |
| | | (1.00, 1.25] | 8.2489 | 98.63 |
| | | (1.25, 1.50] | 9.4853 | 99.15 |
| | | (1.50, 2.00] | 10.2098 | 99.59 |
| | | (2.00, 3.00] | 11.3999 | 99.87 |
| | | (3.00, 4.00] | 13.0872 | 99.94 |
| | | More than 4.0 | 13.7378 | 100 |
| Utilities | Water source | River or spring | 0 | 1.21 |
| | | Public well/pool or other source | 1.1601 | 4.92 |
| | | Well without a pump | 2.6497 | 7.36 |
| | | Well with a pump | 4.6037 | 8.46 |
| | | Truck | 6.1693 | 10.31 |
| | | Water/sewage system | 7.2554 | 100 |
| Utilities | Type of toilet facilities | No toilet facilities | 0 | 8.18 |
| | | Latrine | 2.4519 | 12.3 |
| | | Toilet without connection to water source | 3.3323 | 16.93 |
| | | Toilet connected to a well | 3.9615 | 24.77 |
| | | Toilet connected to sewage | 6.8306 | 100 |
| Utilities | Waste collection and disposal | Throw it to a lot | 0 | 14.78 |
| | | Take it to a container | 2.1291 | 19.5 |
| | | Picked by garbage collection services | 3.2701 | 100 |

Source: Colombia’s National Planning Agency (DNP).
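To make the structure of the algorithm concrete, the following Python sketch looks up one sub-score, the one for income per capita relative to the minimum wage, using the brackets and values in Table 3. It is only an illustration of how a right-closed bracket lookup could work; the function and constant names are ours, and the treatment of a ratio of exactly 0.15 is an assumption, since the table does not assign that value to a bracket.

```python
from bisect import bisect_left

# Upper bounds of the right-closed income brackets in Table 3
# (income per capita relative to the minimum wage).
INCOME_UPPER_BOUNDS = [0.15, 0.25, 0.35, 0.50, 0.75, 1.00, 1.25, 1.50, 2.00, 3.00, 4.00]
# Sub-scores from Table 3; the last entry applies to ratios above 4.0.
INCOME_SUBSCORES = [0, 0.8476, 2.1828, 3.5362, 5.3636, 7.0827,
                    8.2489, 9.4853, 10.2098, 11.3999, 13.0872, 13.7378]


def income_subscore(ratio):
    """Look up the income sub-score for a given income-to-minimum-wage ratio.

    Assumption (ours): a ratio of exactly 0.15 falls in the lowest bracket,
    since the table lists "Less than 0.15" followed by "(0.15, 0.25]".
    """
    # bisect_left returns the index of the first upper bound >= ratio,
    # which matches brackets that are closed on the right.
    return INCOME_SUBSCORES[bisect_left(INCOME_UPPER_BOUNDS, ratio)]
```

For example, `income_subscore(0.2)` and `income_subscore(0.25)` both fall in the (0.15, 0.25] bracket. The other items in Tables 2 and 3 could be handled with the same kind of lookup, and the poverty index score would then be built from the resulting sub-scores.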

Table 4: Education and Income by Poverty Index Score Using 1993 Survey Data

| Poverty Index Score (groups) | Years of Schooling (if older than 18 yrs) | HH income per capita (normalized) |
|---|---|---|
| 0-25 | 2.35 | 0.11 |
| 25-50 | 4.47 | 0.15 |
| 50-75 | 7.96 | 0.60 |
| 75-100 | 12.57 | 2.08 |
| Mean | 7.23 | 0.56 |
| Median | 7 | 0.31 |

Table 5: Reconstructed vs. Given Poverty Index Score

| Description | Individuals | HH | % HH |
|---|---|---|---|
| Urban | 18,283,865 | 5,358,130 | |
| Housing: Match | 18,223,521 | 5,341,261 | 99.67 |
| Housing: No-match | 60,344 | 16,869 | 0.33 |
| Utilities: Match | 18,183,770 | 5,331,420 | 99.45 |
| Utilities: No-match | 100,095 | 26,710 | 0.55 |
| Education: Match | 17,826,330 | 5,229,323 | 97.50 |
| Education: No-match | 457,535 | 116,501 | 2.50 |
| Demographic: Match | 16,145,135 | 4,747,080 | 88.60 |
| Demographic: No-match | 2,138,730 | 611,050 | 11.40 |

Table 6: Differences for Demographic Component in Reconstructed vs. Given Poverty Index Score

| Source of difference | HH | % |
|---|---|---|
| Age | 9,516 | 1.56 |
| Employment | 308 | 0.05 |
| Number of Rooms | 120,764 | 19.76 |
| Minimum wage | 446,368 | 73.05 |
| Family size | 29,223 | 4.78 |
| Value not found | 4,871 | 0.80 |
| Total | 611,050 | |


Figure 1: Number of Census of the Poor Interviews, controlling for Municipality and Strata

[Figure omitted. Vertical axis: relative number of surveys. Horizontal axis: year, 1994 to 2003. Black line indicates regional executive elections (mayors). Base month: Jan, 1994.]

Figure 2: Size of the Discontinuity at the Threshold, controlling for Municipality and Strata

[Figure omitted. Vertical axis: % 3 points pre − % 3 points post. Horizontal axis: year, 1994 to 2003. Black line indicates regional executive elections (mayors). Base month: Jan, 1994. 95% CI.]


Figure 3: 1994-2003 Poverty Index Score Distribution

[Figure omitted. Ten panels, one per year from 1994 to 2003. Vertical axis: percent. Horizontal axis: poverty index score.]


Figure 4: Poverty Index Score and Reconstructed Score

[Figure omitted. Vertical axis: percent. Horizontal axis: poverty index score. Series: Poverty Index Score; Reconstructed Score.]

Figure 5: Suspicious Concentration of Answers

[Figure omitted. Vertical axis: percent. Horizontal axis: poverty index score.]


Figure 6: Combinations that can be Used to Generate Each Score

[Figure omitted. Vertical axis: percent. Horizontal axis: poverty index score.]


Figure 7: Poverty Index and Survey Data Score Distributions

[Figure omitted. Three panels, each with percent on the vertical axis and poverty index score on the horizontal axis. Panel 1993−1994: Poverty Index 1994 and Survey Data 1993 (lowess). Panel 1997: Poverty Index 1997 and Survey Data 1997 (lowess). Panel 2003: Poverty Index 2003 and Survey Data 2003 (lowess).]


Figure 8: Poverty Index Surveys by Stratum, controlling for Municipality

[Figure omitted. Vertical axis: relative number of surveys. Horizontal axis: year, 1994 to 2003. Series: stratum 0, stratum 1, stratum 2, stratum 3. Black line indicates regional executive elections (mayors). Base month: Jan, 1994.]

Figure 9: Census of the Poor and 1993 Survey Data Score Distribution

[Figure omitted. Vertical axis: percent. Horizontal axis: poverty index score. Series: All Years of Poverty Index; Survey Data 1993 (lowess).]


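Figure 10: Marginal Benefit and Marginal Cost of Score Manipulation

[Figure omitted. Vertical axis: cost units. Horizontal axis: score, with $s_I$ and $s_0$ marked. Curves: MC0 and MB0.]

Figure 11: Stylized Score Density Function with Manipulation

[Figure omitted. Vertical axis: $f(s)$. Horizontal axis: score, with $s_I$ and $s_0$ marked.]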


Table 7: Discontinuity at the Threshold and Political Competition

| Dependent variable: | 3 points | 3 points | 3 points | 5 points | 5 points | 5 points |
|---|---|---|---|---|---|---|
| Political competition | 0.09 | 0.087 | 0.088 | 0.09 | 0.087 | 0.087 |
| | [0.034]*** | [0.037]** | [0.037]** | [0.058] | [0.062] | [0.062] |
| Log population | | 0.82 | 0.796 | | 0.701 | 0.668 |
| | | [0.262]*** | [0.294]*** | | [0.399]* | [0.413] |
| Ratio of urban to total population | | | -0.52 | | | -0.705 |
| | | | [3.058] | | | [3.553] |
| Constant | 0.085 | -4.941 | -7.681 | 0.173 | -4.941 | -6.19 |
| | [0.010]*** | [2.549]* | [3.854]** | [0.014]*** | [2.549]* | [4.891] |
| Year effects | Yes | Yes | Yes | Yes | Yes | Yes |
| Municipality effects | Yes | Yes | Yes | Yes | Yes | Yes |
| Observations | 468 | 468 | 468 | 496 | 496 | 496 |
| R-squared | 0.92 | 0.93 | 0.93 | 0.94 | 0.94 | 0.94 |

Robust standard errors in brackets. * significant at 10%; ** significant at 5%; *** significant at 1%.

The dependent variable (-1 to 1) is the difference in the fraction of interviews 3 and 5 points before the threshold relative to the same points after the threshold, using data for the 6 months prior to the election. The closer to 0, the smaller the discontinuity at the threshold.

Political competition (-1 to 0) is defined as the negative of the difference in the fraction of votes the winner received relative to the runner-up in the previous election. The closer to 0, the more competitive the election.


Table 8: Discontinuity at the Threshold and Costs of Cheating (Cross Section)

| Dependent variable: | 3 points | 3 points | 5 points | 5 points |
|---|---|---|---|---|
| Number of community organizations | -0.001 | | -0.003 | |
| | [0.000]*** | | [0.001]*** | |
| Newspaper circulation | | -0.007 | | -0.02 |
| | | [0.002]*** | | [0.003]*** |
| Log population | -0.012 | -0.015 | -0.013 | -0.015 |
| | [0.004]*** | [0.004]*** | [0.006]** | [0.007]** |
| Constant | 0.245 | 0.266 | 0.364 | 0.345 |
| | [0.038]*** | [0.037]*** | [0.057]*** | [0.062]*** |
| Year effects | Yes | Yes | Yes | Yes |
| Other controls | Yes | Yes | Yes | Yes |
| Municipality effects | No | No | No | No |
| Observations | 795 | 594 | 855 | 627 |
| R-squared | 0.07 | 0.09 | 0.08 | 0.08 |

Robust standard errors in brackets. * significant at 10%; ** significant at 5%; *** significant at 1%.

The dependent variable (-1 to 1) is the difference in the fraction of interviews 3 and 5 points before the threshold relative to the same points after the threshold, using data for the 6 months prior to the election. The closer to 0, the smaller the discontinuity at the threshold.

The number of community organizations is divided by 100. Newspaper circulation is divided by 100,000. Other controls include proportion urban population and proportion poor population using a measure for unsatisfied basic needs from the 1993 census.


A Appendix: Model that Assumes the Costs of Cheating In-crease with the Score (preliminary)

Using a probabilistic voting model framework. Let the density of the score s be given by F(s).Voters support the incumbent, I, if the utility they would get from him exceeds the utility theywould get from the challenger C:

$$
G_I + b_s n_I \mathbf{1}[s < s_0] + b_s \mathbf{1}[s_0 < s < s_I] + \delta_i + \theta > G_C \tag{8}
$$

Where:

• $G$ represents the candidate's proposed policy variable, which we assume is exogenous.

• $b_s n_I \mathbf{1}[s < s_0]$ represents the benefit to the individual if the incumbent conducts a relatively large number of surveys, $n_I$, before the election. This term affects only people with a score lower than the official threshold $s_0$.

• $b_s \mathbf{1}[s_0 < s < s_I]$ captures the benefit to people with a score above $s_0$ but below the “modified threshold”, $s_I$, which is chosen by the politician and is greater than or equal to $s_0$.

• $\delta_i$ is voter $i$'s individual bias toward the incumbent. $\delta_i \sim U\left[-\frac{1}{2\phi}, \frac{1}{2\phi}\right]$ and is distributed independently of the score with a density of $\phi$.

• $\theta$ is an aggregate shock to the population's preferences, realized after the parties commit to policies. $\theta \sim U\left[-\frac{1}{2\psi}, \frac{1}{2\psi}\right]$ with a density of $\psi$.

Re-writing equation 8 we get for the pivotal voter:

$$
G_I - G_C + b_s n_I \mathbf{1}[s < s_0] + b_s \mathbf{1}[s_0 < s < s_I] + \theta = -\delta_i \tag{9}
$$

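From equation 9 and using the fact that $\delta_i \sim U\left[-\frac{1}{2\phi}, \frac{1}{2\phi}\right]$, we get the vote share for the incumbent:

$$
\begin{aligned}
V_I = \frac{1}{2} &+ \phi(b_s + G_I - G_C + \theta)(F(s_I) - F(s_0)) \\
&+ \phi(b_s n_I + G_I - G_C + \theta)F(s_0) + \phi(G_I - G_C + \theta)(1 - F(s_I))
\end{aligned} \tag{10}
$$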
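The step from equation 9 to equation 10 can be sketched as follows. This assumes an interior case in which every probability below lies strictly between 0 and 1; the shorthand $x_g$ (net utility gain from the incumbent for a voter in score group $g$) and $m_g$ (population share of group $g$) is introduced here only for illustration and does not appear in the paper.

```latex
% A voter in group g supports the incumbent when \delta_i > -x_g.
% With \delta_i uniform on [-1/(2\phi), 1/(2\phi)] and density \phi:
\begin{align*}
\Pr(\delta_i > -x_g) &= \phi\left(\frac{1}{2\phi} + x_g\right) = \frac{1}{2} + \phi x_g \\
% Weight each group by its population share, with \sum_g m_g = 1:
V_I &= \sum_g m_g\left(\frac{1}{2} + \phi x_g\right) = \frac{1}{2} + \phi \sum_g m_g x_g
\end{align*}
% Groups, shares, and gains:
%   s_0 < s < s_I :  m = F(s_I) - F(s_0),  x = b_s + G_I - G_C + \theta
%   s < s_0       :  m = F(s_0),           x = b_s n_I + G_I - G_C + \theta
%   s > s_I       :  m = 1 - F(s_I),       x = G_I - G_C + \theta
```

Substituting the three groups into the last line gives the three terms of equation 10.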

The first term represents the fraction of people who benefit from cheating and vote for the incumbent. The second term corresponds to the fraction of people who benefit from surveys and vote for the incumbent. The third term captures people who vote for the incumbent for ideological reasons. The incumbent wants to maximize his probability of winning the next election, $P_I = \Pr(V_I > .5)$. Using equation 10 and information on the density of $\theta$, we get the incumbent's problem:

$$
\max_{s_I, n_I} \; P_I R - c(s_I, n_I) \tag{11}
$$


Where R is the value to the incumbent of winning the election. This is equivalent to:

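$$
\begin{aligned}
\max_{s_I, n_I} \; \Big(\frac{1}{2} + \psi\phi\big(&(b_s + G_I - G_C)(F(s_I) - F(s_0)) \\
&+ (b_s n_I + G_I - G_C)F(s_0) + (G_I - G_C)(1 - F(s_I))\big)\Big)R - c(s_I, n_I)
\end{aligned} \tag{12}
$$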

The challenger cannot conduct surveys or cheat before the election.

Let $c(s_I, n_I) = \beta s_I^2 + n_I s_I + \eta n_I^2$.

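Solving for $s_I$ and $n_I$:

$$
s_I = \frac{\psi\phi R b_s \left(2\eta f(s_I) - F(s_0)\right)}{4\beta\eta - 1} \tag{13}
$$

$$
n_I = \frac{\psi\phi R b_s \left(2\beta F(s_0) - f(s_I)\right)}{4\beta\eta - 1} \tag{14}
$$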
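As a check on equations 13 and 14, the first-order conditions of equation 12 under the quadratic cost are $K f(s_I) = 2\beta s_I + n_I$ and $K F(s_0) = s_I + 2\eta n_I$, where $K = \psi\phi R b_s$ is shorthand introduced here. The sketch below treats $f(s_I)$ as a known number, which is a simplification: in the model $f(s_I)$ depends on $s_I$, so equation 13 is an implicit equation. The function names and the parameter values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from typing import Tuple


def closed_form(K: float, beta: float, eta: float,
                F0: float, f_sI: float) -> Tuple[float, float]:
    """Equations 13 and 14, with K = psi*phi*R*b_s and f(s_I) held fixed."""
    det = 4.0 * beta * eta - 1.0
    if det == 0.0:
        raise ValueError("4*beta*eta - 1 must be nonzero")
    s_I = K * (2.0 * eta * f_sI - F0) / det
    n_I = K * (2.0 * beta * F0 - f_sI) / det
    return s_I, n_I


def foc_residuals(K: float, beta: float, eta: float, F0: float,
                  f_sI: float, s_I: float, n_I: float) -> Tuple[float, float]:
    """Left-hand side minus right-hand side of the two first-order conditions."""
    foc_s = K * f_sI - 2.0 * beta * s_I - n_I   # derivative with respect to s_I
    foc_n = K * F0 - s_I - 2.0 * eta * n_I      # derivative with respect to n_I
    return foc_s, foc_n


# Illustrative (made-up) parameter values, not estimates from the paper.
params = dict(K=2.0, beta=1.0, eta=1.0, F0=0.4, f_sI=0.5)
s_I, n_I = closed_form(**params)
residuals = foc_residuals(s_I=s_I, n_I=n_I, **params)
```

Both residuals are zero up to floating-point error, which confirms that the closed forms solve the linear system formed by the two first-order conditions for a given value of $f(s_I)$.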

These expressions imply the following. Assume that $f'(s_I) < 0$, since in the aggregate data the plotted score distribution shows that $f(s)$ is decreasing above $s_0$. Then:

1. To get the prediction that the amount of cheating decreases when the costs of cheating increase, $\partial s_I / \partial \beta < 0$, we need to assume that either:

• $4\beta\eta - 1 > 0$, or

• if $4\beta\eta - 1 < 0$, then $|4\beta\eta - 1| < \left|-\psi\phi R b_s 2\eta f'(s_I)\right|$.

2. To get the prediction that there is an inverse relationship between surveys and cheating, $\partial s_I / \partial n_I < 0$, we need $4\beta\eta - 1 < 0$.

3. To get the prediction that the number of surveys is lower when the cost of conducting surveys is higher, $\partial n_I / \partial \eta < 0$, we need $2\beta F(s_0) - f(s_I) > 0$.
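The condition in item 1 can be checked by implicit differentiation of equation 13, since $f(s_I)$ itself depends on $s_I$. The following is a sketch that assumes $s_I > 0$ and $\eta > 0$ (sign assumptions not stated explicitly in the text) and writes $K = \psi\phi R b_s$ as shorthand introduced here.

```latex
\begin{align*}
% Equation 13 multiplied through by (4\beta\eta - 1):
(4\beta\eta - 1)\, s_I &= K\left(2\eta f(s_I) - F(s_0)\right) \\
% Differentiate both sides with respect to \beta:
4\eta\, s_I + (4\beta\eta - 1)\frac{\partial s_I}{\partial \beta}
  &= 2\eta K f'(s_I)\frac{\partial s_I}{\partial \beta} \\
% Solve for the derivative:
\frac{\partial s_I}{\partial \beta}
  &= \frac{-4\eta\, s_I}{(4\beta\eta - 1) - 2\eta K f'(s_I)}
\end{align*}
```

With $f'(s_I) < 0$, the term $-2\eta K f'(s_I)$ is positive, so the derivative is negative whenever the denominator is positive. That holds if $4\beta\eta - 1 > 0$, or if $4\beta\eta - 1 < 0$ and $|4\beta\eta - 1| < |2\eta K f'(s_I)|$, which are the two cases listed in item 1.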


References

Attanasio, Orazio, Emla Fitzsimons, Ana Gomez, Diana Lopez, Costas Meghir, and Alice Mesnard, “Child Education and Work Choices in the Presence of a Conditional Cash Transfer Programme in Rural Colombia,” CEPR Discussion Papers 5792, C.E.P.R. Discussion Papers, August 2006.

Barrera-Osorio, Felipe, Leigh Linden, and Miguel Urquiola, “The Effects of User Fee Reductions on Enrollment: Evidence from a Quasi-Experiment,” Working Paper, January 2007.

Besley, Timothy and Robin Burgess, “The Political Economy of Government Responsiveness: Theory and Evidence from India,” The Quarterly Journal of Economics, November 2002, 117 (4), 1415–1451.

Cardenas, Mauricio, Introducción a la Economía Colombiana, Alfaomega-Fedesarrollo, Bogotá, Colombia, 2006.

DNP, “SISBEN: Una Herramienta para la Equidad,” Technical Report, July 1997.

Drazen, Allan and Marcela Eslava, “Electoral Manipulation via Expenditure Composition: Theory and Evidence,” NBER Working Papers 11085, National Bureau of Economic Research, January 2005.

Jacob, Brian A. and Steven D. Levitt, “Rotten Apples: An Investigation of the Prevalence and Predictors of Teacher Cheating,” The Quarterly Journal of Economics, August 2003, 118 (3), 843–877.

Lindbeck, Assar and Jörgen Weibull, “Balanced-Budget Redistribution as the Outcome of Political Competition,” Public Choice, January 1987, 52 (3), 272–297.

Moreno, Alexandra, “Por la Cual se Fijan Criterios para Lograr Transparencia en el Sistema de Selección de Beneficiarios SISBEN,” Colombian Congress, 2005.

Persson, Torsten and Guido Tabellini, Political Economics: Explaining Economic Policy, MIT Press, Cambridge MA, 2000.

Robledo, Jorge Enrique, “El Presidente Dispone de $1,4 Billones Para Hacer Clientelismo,” colombia.indymedia.org, October 2006.

Rosas, Andres and Juan Mendoza, “The Economic Effects of Geography: Colombia as a Case Study,” Documentos de Economía 003584, Universidad Javeriana, May 2005.
