A Voter Validation Experiment

SCREENING FOR LIKELY VOTERS IN PRE-ELECTION SURVEYS

 

 

Michael Dimock, Pew Research Center for the People and the Press

Scott Keeter, George Mason University

Mark Schulman, Schulman, Ronca and Bucuvalas, Inc.

Carolyn Miller, Princeton Survey Research Associates

May 16, 2001

Paper prepared for presentation at the 56th Annual American Association for Public Opinion Research Conference, May 17-20, 2001, Montreal, Quebec, Canada.


INTRODUCTION

            For all survey organizations, the accuracy of election projections hinges not only on having large and representative samples, but on accurately predicting who is and is not going to vote on election day.  Discriminating between those who say they are going to vote and those who actually are going to vote has become a fine art, with numerous techniques.  Though the process of culling likely voters from the larger pool of registered voters is often taken for granted, the process came to the forefront of many researchers' attention during the 2000 Presidential election, in which no candidate developed a clear lead and thus measurement precision became essential. 

 

            The Pew Research Center uses a procedure to arrive at likely voter estimates that was first developed in the 1950s and 1960s by Paul Perry, then chief election statistician at the Gallup Organization.  The method is based on deriving a likely voter index from a number of questions that are known to relate to actual voter turnout.  The purpose of this paper is to investigate the effectiveness of this approach using a dataset collected in the 1999 Philadelphia mayoral race in which the actual turnout of pre-election poll respondents was validated through precinct records.  Using this data, we are able to test the effectiveness of the current likely voter index, whether expanding the index to include more items would improve predictions and whether trimming down the index would cause problems.  We are also able to compare the effectiveness of the Guttman scaling technique applied by Gallup and Pew to methods in which respondents are assigned a probability of voting and weighted appropriately. 

 

            The results suggest that the standard Perry-Gallup likely voter index is as effective today as it has ever been, and is very difficult to improve upon.  Expanding the 8-item index to include as many as 15 items has minimal impact on index efficacy.  Moreover, more complicated probability models do nothing to improve the accuracy of the likely voter estimates that can be derived.  Overall, our findings reinforce past research on predicting voter turnout from pre-election polls.  Though it is impossible to accurately predict the behavior of all survey respondents, it is possible to accurately estimate the preferences of voters by identifying those most likely to vote.

 

            In addition to studying pre-election likely voter screens, we also validated the turnout of non-responding households.  Based on these data, we are able to study whether turnout rates among non-respondents differ from those of survey participants.

BACKGROUND

            The 1999 Philadelphia mayoral election turned out to be one of the closest in the city's history: just 9,447 votes separated the victor, Democrat John Street, from his opponent, Republican Sam Katz.  This represents a 2.2% margin of victory among the 441,981 votes cast by residents of the city.  Moreover, it was the most expensive municipal election in American history, with total spending well over $25 million, including $10 million by Street and $7 million by Katz (Committee of Seventy, 1999).  According to the Philadelphia Board of Elections records, roughly 45% of registered voters turned out on election day.

 

            As far as can be determined, turnout among black constituents was relatively high, and overwhelmingly supportive of John Street, who is African-American himself.  Roughly 42% of residents in overwhelmingly black wards voted, and Street received 91% of their vote.  In overwhelmingly white wards, turnout was only slightly higher at 47%, with 83% of the vote going to Katz.  This turnout disparity between black and white wards (42% to 47%) was the smallest in 16 years, according to a local public interest group.

 

            The accuracy of these turnout figures is questionable, not because we don't know how many Philadelphia residents voted, but because of poor record keeping with respect to voter registration.  In 1999, the Board of Elections identified 985,912 registered voters, or 93% of the 1,056,764 who were age eligible according to the Bureau of the Census.  However, in the Pew Research Center's validation study, only 70% of respondents over the age of 18 claimed to be registered voters.  And the evidence suggests that even this may be an overstatement.  Just 86% of self-reported RVs who gave a name and address could actually be found in the voter registration list.  Combined, this suggests that the true registration figure for Philadelphia may be around 60% of the voting age population, or roughly 600,000 instead of the nearly one million reported by the Board of Elections.  Adjusting the total registration numbers based on this estimate, the 441,981 voters who participated in the November election represent roughly a 70% turnout rate among registered voters.
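The adjustment described above can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the variable names are illustrative, and multiplying the two survey rates together is our reading of "combined," not a documented procedure.

```python
# Figures quoted in the text above.
board_registered = 985_912   # registered voters per the Board of Elections
voting_age_pop = 1_056_764   # age-eligible residents per the Census Bureau
votes_cast = 441_981         # votes cast in the November 1999 election

self_reported_rate = 0.70    # respondents 18+ who claimed to be registered
found_on_list_rate = 0.86    # self-reported RVs located on the voter list

# Official picture: registration and turnout as reported by the Board.
official_registration_rate = board_registered / voting_age_pop
official_turnout = votes_cast / board_registered

# Assumption: the adjusted registration rate is the simple product
# of the two survey-based rates.
adjusted_registration_rate = self_reported_rate * found_on_list_rate
adjusted_registered = adjusted_registration_rate * voting_age_pop
adjusted_turnout = votes_cast / adjusted_registered

print(f"official registration rate: {official_registration_rate:.0%}")
print(f"official turnout:           {official_turnout:.0%}")
print(f"adjusted registration rate: {adjusted_registration_rate:.0%}")
print(f"adjusted registered voters: {adjusted_registered:,.0f}")
print(f"adjusted turnout:           {adjusted_turnout:.0%}")
```

The unrounded product gives a registration base somewhat above the "roughly 600,000" quoted in the text, and it is this unrounded base that yields a turnout rate of about 70%.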

 

            Though using a municipal election as a basis for a validation study has inherent external validity concerns, it provides a relatively low-cost means of accumulating and validating actual voting behavior.[1]  Overall, we were able to match roughly 70% of the self-identified registered voters that we interviewed.  The biggest factor in matching success was the willingness of the interviewee to disclose their name and address at the end of the survey.  We successfully matched 86% of those who gave us their name and address, but just 43% of those who gave a name only.

 

            The objective of the matching process is to uncover, using voting records, the actual behavior of our respondents on election day.  Our matching process used five distinct identifying characteristics as a means of aligning interview subjects with voting records:  phone number, address, last name, first name and birth year.  Overall, 75% of the cases we were able to match met virtually all of these criteria, matching on first and last name, birth date, and either phone or address or both.  The remaining 25% were either matched on first and last name and birth year only (primarily among those who gave only their name), or were cases in which we could match at least phone or address, first or last name, and at least a close match on birth year.
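To make the three kinds of match concrete, the following is a minimal sketch of how a respondent might be compared with a single voter record. It is not the procedure actually used in the study: the `Person` fields, the tier labels, the exact-birth-year test for the strictest tier, and the one-year tolerance for a "close" birth year are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    first_name: str
    last_name: str
    birth_year: Optional[int] = None
    phone: Optional[str] = None
    address: Optional[str] = None


def _same(a, b) -> bool:
    """True when both values are present and equal, ignoring case and outer spaces."""
    if a is None or b is None:
        return False
    return str(a).strip().lower() == str(b).strip().lower()


def match_tier(resp: Person, rec: Person, year_tolerance: int = 1) -> str:
    """Classify how strongly a survey respondent matches a voter record.

    Tier labels are hypothetical. year_tolerance is an assumed definition
    of a "close" birth-year match.
    """
    first = _same(resp.first_name, rec.first_name)
    last = _same(resp.last_name, rec.last_name)
    contact = _same(resp.phone, rec.phone) or _same(resp.address, rec.address)

    have_years = resp.birth_year is not None and rec.birth_year is not None
    exact_year = have_years and resp.birth_year == rec.birth_year
    close_year = have_years and abs(resp.birth_year - rec.birth_year) <= year_tolerance

    if first and last and exact_year and contact:
        return "full"            # names, birth year, and phone or address
    if first and last and exact_year:
        return "name_and_year"   # names and birth year only
    if contact and (first or last) and close_year:
        return "partial"         # contact info, one name, close birth year
    return "none"
```

In practice each respondent would be compared against many candidate records and the strongest tier retained; that search step is omitted here.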

 

PART 1:  The Elements of the Perry-Gallup Likely Voter Index

            Typically, estimates of voter preferences in an election poll are based only on those who are registered to vote.  An analysis of a 1984 Gallup validation study suggests that filtering out respondents who say they are not registered introduces very little error in horserace predictions.  Just 6% of those who said they weren't registered actually were, according to voting records, and only 2% actually voted (Colasanto and Mattlin, 1987).  As a result, all the analysis to follow will be based solely on respondents who report themselves as registered voters.[2]

 

            But basing horserace predictions on all who claim to be registered is still problematic, since survey participants tend to overstate both their registration and their propensity to vote.  In the 1984 Gallup study, fully 23% of those who claimed to be registered were not, and 30% did not vote on election day.  Were this error distributed randomly across the population, we might overlook it.  However, overestimation of registration and voting is highest among predominantly Democratic constituencies, leading to a systematic bias in favor of Democratic candidates unless some further filter is applied.

 

            The likely voter screen used by Gallup and the Pew Research Center is based on an index measuring each respondent's propensity to vote.  In addition to registration, the likely voter index, originally developed by Paul Perry at Gallup, is made up of eight items intended to identify four concepts related to voter turnout: voter interest, voter intentions, past voting behavior, and knowledge about where to vote, each of which will be discussed below.  Though there are slight variations between the original Perry-Gallup index applied in the 1960s, 1970s and early 1980s and the one used today by the Pew Research Center, they are based on the same fundamental structure, outlined in Table 1 below.

 

            This procedure results in a Guttman index with values ranging from zero to eight, with the highest values representing those with the greatest likelihood of voting.  Both Gallup and the Pew Research Center then make a projection of voter turnout based on the past turnout rates and early indicators of turnout, such as particularly high or low levels of interest in the campaign.  This turnout projection is used to define what percentage of respondents will be considered "likely voters" – the proportion of highest scoring respondents on which election estimates will be based.  For example, in forecasting the 2000 presidential election, the Pew Center forecast that 50% of the age-eligible population would vote, and based its estimates on the 50% of respondents receiving the highest index scores.  In the 1999 Philadelphia study, evidence suggested that roughly 70% of registered voters would turn out to vote.[3]
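The selection step described above, keeping the highest-scoring share of respondents equal to the projected turnout, can be sketched as follows. The function name is hypothetical, and the handling of ties is an assumption: when the cutoff falls inside a group of respondents with the same score, this sketch gives that group a fractional weight rather than including or excluding it whole.

```python
from collections import Counter


def likely_voter_weights(scores, projected_turnout):
    """Return {index score: weight in [0, 1]} for a projected turnout share.

    Score groups are filled from the top down until the projected share of
    the sample is reached. The group that straddles the cutoff receives a
    fractional weight (an assumed tie-handling rule).
    """
    if not 0.0 <= projected_turnout <= 1.0:
        raise ValueError("projected_turnout must be between 0 and 1")

    counts = Counter(scores)
    remaining = projected_turnout * len(scores)  # respondents still to include
    weights = {}
    for score in sorted(counts, reverse=True):
        group_size = counts[score]
        taken = min(group_size, remaining)
        weights[score] = taken / group_size
        remaining -= taken
    return weights


# Ten hypothetical respondents and a projected turnout of 40%.
example_scores = [8, 8, 8, 7, 7, 6, 5, 3, 0, 0]
example_weights = likely_voter_weights(example_scores, 0.40)
for score in sorted(example_weights, reverse=True):
    print(score, round(example_weights[score], 2))
```

In this toy sample the three respondents scoring 8 are fully included, the two scoring 7 each count for half, and everyone else receives a weight of zero.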

TABLE 1: Elements of the Likely Voter Index

Points on Index   Question                                      Response Categories
1                 Q2   Thought given to election                A Lot/Some
1                 Q6   Follow government affairs                Most/Some of time
1                 Q14  Plan to vote                             Yes
1                 Q15  Likelihood of voting (10-pt scale)       7, 8, 9, 10
1                 D13  Voted in previous Presidential election  Yes, recall candidate
1                 Q7   How often do you vote                    Always/Nearly always/Part of time
1                 Q4   Know where to vote                       Yes
1                 Q5   Ever voted in current election district  Yes
8                 Total (maximum score)

   Respondents are automatically coded zero (0) on the index if:
          (1) they are not registered to vote
          (2) they say they do not plan to vote
   Respondents under 22 are not penalized for past voting behavior (Q5, Q7, D13)
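As an illustration of how the scoring rules in Table 1 might be applied to a single respondent, consider the sketch below. The dictionary keys, the lowercase response strings, and the function name are hypothetical. The table says only that respondents under 22 are "not penalized" on the past-voting items; awarding those three points automatically is one possible reading and is an assumption here.

```python
# Items on which respondents under 22 are not penalized (Table 1).
PAST_VOTING_ITEMS = ("D13", "Q7", "Q5")

# Responses that earn one point on each item (Table 1).
POINT_RESPONSES = {
    "Q2": {"a lot", "some"},                                # thought given to election
    "Q6": {"most of the time", "some of the time"},         # follow government affairs
    "Q14": {"yes"},                                         # plan to vote
    "Q15": {7, 8, 9, 10},                                   # 10-point likelihood scale
    "D13": {"yes, recall candidate"},                       # voted in previous presidential
    "Q7": {"always", "nearly always", "part of the time"},  # how often vote
    "Q4": {"yes"},                                          # know where to vote
    "Q5": {"yes"},                                          # ever voted in district
}


def likely_voter_index(resp):
    """Score one respondent from 0 to 8 following the rules in Table 1."""
    # Automatic zero: not registered, or says they do not plan to vote.
    if not resp.get("registered", False):
        return 0
    if resp.get("Q14") == "no":
        return 0

    age = resp.get("age")
    under_22 = age is not None and age < 22

    score = 0
    for item, qualifying in POINT_RESPONSES.items():
        if under_22 and item in PAST_VOTING_ITEMS:
            score += 1  # assumption: point awarded rather than withheld
        elif resp.get(item) in qualifying:
            score += 1
    return score
```

A respondent who gives a qualifying answer on every item scores 8, while one who reports not being registered scores 0 regardless of the other answers.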

 


            In addition to providing a more stable and reliable measure across distinct survey samples, the eight-item index provides a level of operational and content validity that no single item can achieve.  But in order to fully investigate the effectiveness of the index and whether improvements can be made, the relevance and effectiveness of each index element will first be examined, grouped by the substantive concepts they measure.

 

            Measures of Voter Interest

Table 2: Measures of Interest and Validated Voter Turnout

Follow gov't affairs      %        Voted
  Most of the time       52   ⇒    84%
  Some of the time       32   ⇒    71%
  Only now and then      11   ⇒    61%
  Hardly at all           5   ⇒    55%
  DK/Refused              *   ⇒    --
                        100

Thought given             %        Voted
  A lot                  58   ⇒    85%
  Some (Vol.)             8   ⇒    74%
  Only a little          30   ⇒    62%
  None (Vol.)             3   ⇒    --
  DK/Refused              1   ⇒    --
                        100

            Citizens who are more interested in politics and who have been paying attention to the campaign are presumably more likely to vote than those who are uninterested, and a bivariate analysis of voting patterns suggests that this is true (see Table 2).  To measure interest in politics, respondents are asked how much they follow what's going on in government and public affairs.  According to the 1999 Philadelphia validation study, fully 84% of those who follow politics "most of the time" actually voted in the mayoral race, compared to 61% of those who follow politics "only now and then" and 55% of those who "hardly at all" follow government affairs.

 

            In the 1984 Gallup pre-election poll, a slightly different question achieved similar results.  Seventy-nine percent of those who said they had a "great deal" of interest in politics turned out on election day 1984, compared to 71% of those who had a "fair amount" of interest, 60% of those with "only a little" interest, and 19% of those with "no interest at all."

 

            Looking at actual attention to the campaign in the bottom of Table 2, we see that 85% of 1999 respondents who said they had given "quite a lot" of thought to the upcoming election actually voted, compared to 62% of those who said "only a little."  The identical question achieved comparable figures in 1984, with 74% of those giving "quite a lot" of thought to the election actually voting, compared to just 57% of those who said "only a little."[4] 

 

            Measures of Voter Intentions

            On its face, the most direct way of predicting voter turnout is to simply ask whether a person intends to vote or not.  Unfortunately, such a straightforward question often gives us little traction, since nearly all who say they are registered to vote tell us that they plan to vote.  Fully 97% of registered voters in the Philadelphia study told us they planned to vote, with only 2% saying they did not, proportions almost identical to those in the 1984 nationwide Gallup study.  Though all who say they do not plan to vote are automatically coded at zero on the Perry-Gallup index, this question has a minimal effect on overall index accuracy.

Table 3: Measures of Intention and Validated Voter Turnout

Plan to vote              %        Voted
  Yes                    97   ⇒    77%
  No                      2   ⇒    --
  DK/Refused              1   ⇒    --
                        100

10-pt scale               %        Voted
  10                     77   ⇒    84%
  9                       6   ⇒    71%
  8                       6   ⇒    46%
  7                       3   ⇒    33%
  1-6                     7   ⇒    39%
  DK/Refused              1   ⇒    --
                        100

 

            A more promising measure of voter intention has respondents rate their chances of voting on a scale of 10 to 1.  Though more than three-fourths of registered voters in both 1999 and 1984 rated their chance of voting as a 10, this scale provides a bit more variance than the simple "do you plan to vote" question.  Unfortunately, the Perry-Gallup index codes all responses above "6" as likely voters.  This creates two problems.  First, over 90% (92% in 1999, 95% in 1984) rate their chance of voting as 7 or higher on the scale, leaving us with little variance.  Second, in 1999 only 46% of those who rated their chances of voting at "8" and only 33% of those who rated their chances at "7" actually voted, introducing a high level of error into the likely voter index.  In light of this, we will test whether moving the cutpoint up to "9", or even a solid "10", would improve index effectiveness.[5]
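The trade-off between cutpoints can be seen directly from the 1999 figures in Table 3. The sketch below computes, for each candidate cutpoint, the share of registered voters who would earn the point and the validated turnout within that group. It works from the rounded percentages in the table, so the results are approximations; the function and variable names are illustrative.

```python
# (lowest scale value in row, % of registered voters, % validated as voting),
# taken from the 10-point scale rows of Table 3 (1999 Philadelphia study).
SCALE_ROWS = [
    (10, 77, 84),
    (9, 6, 71),
    (8, 6, 46),
    (7, 3, 33),
    (1, 7, 39),  # responses 1 through 6 are reported as one group
]


def summarize_cutpoint(cutpoint, rows=SCALE_ROWS):
    """Return (share earning the point, validated turnout among them).

    A row earns the point when its lowest scale value is at or above the
    cutpoint. Turnout is the share-weighted average of the row turnouts.
    """
    included = [(share, voted) for low, share, voted in rows if low >= cutpoint]
    share_included = sum(share for share, _ in included)
    turnout = sum(share * voted for share, voted in included) / share_included
    return share_included, turnout


for cut in (7, 9, 10):
    share, turnout = summarize_cutpoint(cut)
    print(f"cutpoint {cut:>2}: {share}% earn the point, {turnout:.1f}% of them voted")
```

With the rounded Table 3 figures, validated turnout is roughly 79% among the 92% at 7 or above, about 83% among the 83% at 9 or 10, and 84% among the 77% at 10. This shows the item in isolation only; the effect on the full index is a separate question.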

 

            Measures of Past Voting Behavior

            Those who have voted in the past are the most likely to turn out in any given election, and measures of past voting behavior are central to any measure of the likelihood of voting.  The Perry-Gallup index uses two general measures of past voting:  whether an individual voted in the previous presidential election, and the individual's own assessment of how regularly they vote.  Each proves to be a powerful predictor of turnout in both the 1999 mayoral race and the 1984 general election.  Since respondents aged 18-21 may not have had the opportunity to vote in previous national elections, past voting behavior is not included as part of the likely voter index for these respondents.

            Table 4 shows that those who say they voted in the 1996 Presidential election were roughly twice as likely as those who did not to participate in the 1999 Philadelphia mayoral election.  Interestingly, the 8% of the sample who couldn't recall if they had voted, or refused to say, also exhibited high turnout in the mayoral race. 

 

Table 4: Measures of Past Voting and Validated Voter Turnout

Voted in '96 Presid.      %        Voted
  Yes, Voted             70   ⇒    81%
  Voted, forgot who      10   ⇒    76%
  Did not vote           12   ⇒    40%
  DK/Refused              8   ⇒    82%
                        100

How often do you…         %        Voted
  Always                 60   ⇒    85%
  Nearly always          25   ⇒    74%
  Part of the time        9   ⇒    43%
  Seldom/(Never-vol.)     5   ⇒    21%
  Other/DK/Refused        1   ⇒    --
                        100

            Our baseline likely voter index codes respondents as likely voters only if they say they voted in 1996 and can recall the name of the person they voted for.  The assumption underlying this coding choice is that many people over-report past voting (in this poll fully 80% of registered voters told us they voted in 1996), and that those who say they voted but cannot recall who they voted for are the most likely to be the non-voters in the crowd.  The validation study suggests otherwise.  Turnout among the 10% who say they voted in 1996 but can't recall who they voted for is not statistically different from turnout among those who say they voted and can recall for whom.  Below we will test whether altering the index to count these respondents as likely voters might improve index accuracy.

 

            Fully 85% of those who say they always vote turned out on November 2, 1999, along with 74% of those who say they nearly always vote.  By comparison, just 43% of those who say they vote part of the time went to the polls, and just 21% of those who said they seldom or never vote.  Unfortunately, the Perry-Gallup likely voter index codes those who say they vote just part of the time as likely voters, which this bivariate analysis suggests may be inaccurate.[6]  Altering the cutpoint on this question to include only those who always or nearly always vote will be tested.
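One rough way to see why the "part of the time" coding looks questionable is to ask how often this single item, taken alone, would classify respondents correctly under each coding rule. The sketch below does so using the rounded figures in Table 4 and leaves out the 1% of Other/DK/Refused responses, for whom no turnout figure is reported. The function name and the "percent correctly classified" summary are our own illustrative choices, not measures used elsewhere in this paper.

```python
# (response, % of registered voters, % validated as voting), from the
# "How often do you vote" rows of Table 4; Other/DK/Refused is omitted.
FREQUENCY_ROWS = [
    ("always", 60, 85),
    ("nearly always", 25, 74),
    ("part of the time", 9, 43),
    ("seldom/never", 5, 21),
]


def percent_correct(coded_likely, rows=FREQUENCY_ROWS):
    """Percent of respondents this item alone would classify correctly.

    Respondents coded as likely count as correct if they voted; all others
    count as correct if they did not vote.
    """
    total_share = sum(share for _, share, _ in rows)
    correct = 0.0
    for response, share, voted in rows:
        hit_rate = voted if response in coded_likely else 100 - voted
        correct += share * hit_rate
    return correct / total_share


current_rule = {"always", "nearly always", "part of the time"}
stricter_rule = {"always", "nearly always"}

print(f"current coding:  {percent_correct(current_rule):.1f}% correct")
print(f"stricter coding: {percent_correct(stricter_rule):.1f}% correct")
```

On these rounded figures the stricter coding classifies about 79% of respondents correctly, against about 78% for the current coding. The gain is modest because only 9% of registered voters fall in the "part of the time" group.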

 

            Measures of Knowledge about Where to Vote