SCREENING FOR LIKELY VOTERS IN PRE-ELECTION SURVEYS
Michael Dimock, Pew Research Center for the People and the Press
Scott Keeter, George Mason University
Mark Schulman, Schulman, Ronca and Bucuvalas, Inc.
Carolyn Miller, Princeton Survey Research Associates
May 16, 2001
Paper prepared for presentation at the 56th Annual American Association for Public Opinion Research Conference, May 17-20, 2001, Montreal, Quebec, Canada.
INTRODUCTION
For all survey organizations, the accuracy of election projections hinges
not only on having large and representative samples, but on accurately
predicting who is and is not going to vote on election day. Discriminating between those who say they are going to vote and those who
actually are going to vote has become
a fine art, with numerous techniques.
Though the process of culling likely voters from the larger pool of
registered voters is often taken for granted, the process came to the forefront
of many researchers' attention during the 2000 Presidential election, in which
no candidate developed a clear lead and thus measurement precision became
essential.
The Pew Research Center uses a procedure to arrive at likely voter
estimates that was first developed in the 1950s and 1960s by Paul Perry, then
chief election statistician at the Gallup Organization. The method is based on deriving a likely
voter index from a number of questions that are known to relate to actual voter
turnout. The purpose of this paper
is to investigate the effectiveness of this approach using a dataset collected
in the 1999 Philadelphia mayoral race in which the actual turnout of
pre-election poll respondents was validated through precinct records. Using this data, we are able to test the
effectiveness of the current likely voter index, whether expanding the index to
include more items would improve predictions and whether trimming down the index
would cause problems. We are also
able to compare the effectiveness of the Guttman scaling technique applied by
Gallup and Pew to methods in which respondents are assigned a probability of
voting and weighted appropriately.
The results suggest that the standard Perry-Gallup likely voter index is
as effective today as it has ever been, and is very difficult to improve
upon. Expanding the 8-item index to
include as many as 15 items has minimal impact on index efficacy. Moreover, more complicated probability
models do nothing to improve the accuracy of the likely voter estimates that can
be derived. Overall, our findings
reinforce past research on predicting voter turnout from pre-election
polls. Though it is impossible to
accurately predict the behavior of all survey respondents, it is possible to
accurately estimate the preferences of voters by identifying those most likely
to vote.
In addition to studying pre-election likely voter screens, we also validated
the turnout of non-responding households. Based on this data, we are able to study
whether turnout rates among non-respondents differ from those of survey
participants.
BACKGROUND
The 1999 Philadelphia mayoral election turned out to be one of the
closest in the city's history – just 9,447 votes separated the victor, Democrat
John Street, from his opponent, Republican Sam Katz. This represents a 2.2% margin of victory
among the 441,981 votes cast by residents of the city. Moreover, it was the most expensive
municipal election in American history – with total spending well over $25
million, including $10 million by Street and $7 million by Katz (Committee of
Seventy, 1999). According to the
Philadelphia Board of Elections records, roughly 45% of registered voters
turned out on election day.
As far as can be determined, turnout among black constituents was
relatively high, and these voters overwhelmingly supported John Street, who is
himself African-American. Roughly
42% of residents in overwhelmingly black wards voted, and Street received 91% of
their vote. In overwhelmingly white
wards, turnout was only slightly higher at 47%, with 83% of the vote going to
Katz. This turnout disparity
between black and white wards (42% to 47%) was the smallest in 16 years,
according to a local public interest group.
The accuracy of these turnout figures is questionable, not because we
don't know how many Philadelphia residents voted, but because of poor record
keeping with respect to voter registration. In 1999, the Board of Elections
identified 985,912 registered voters, or 93% of the 1,056,764 who were age
eligible according to the Bureau of the Census. However, in the Pew Research Center's
validation study, only 70% of respondents over the age of 18 claimed to be
registered voters. And the evidence
suggests that even this may be an overstatement. Just 86% of self-reported RVs who gave a
name and address could actually be found in the voter registration list. Combined, this suggests that the true
registration figure for Philadelphia may be around 60% of the voting age
population, or roughly 600,000 instead of the nearly one million reported by the
Board of Elections. Adjusting the
total registration numbers based on this estimate, the 441,981 voters who
participated in the November election represent roughly a 70% turnout rate among
registered voters.
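The adjustment above can be reproduced directly from the figures quoted in the text. The following is a minimal sketch: the variable names are illustrative, and multiplying the two survey rates assumes that self-reported registrants who could not be found on the list are in fact unregistered, and that the 86% match rate applies to all self-reported registrants.

```python
# Figures quoted in the text above.
voting_age_population = 1_056_764   # age-eligible residents (Bureau of the Census)
official_registered = 985_912       # Board of Elections registration count
votes_cast = 441_981                # votes cast in the November 1999 election

self_reported_rv_rate = 0.70        # share of adult respondents claiming registration
match_rate = 0.86                   # share of name-and-address RVs found on the list

# Official registration as a share of the voting-age population.
official_rate = official_registered / voting_age_population

# Adjusted rate: claimed registration discounted by the match rate.
# Assumption: unmatched self-reported RVs are treated as unregistered.
adjusted_rate = self_reported_rv_rate * match_rate
adjusted_registered = adjusted_rate * voting_age_population

# Turnout among registered voters under each registration base.
turnout_official = votes_cast / official_registered
turnout_adjusted = votes_cast / adjusted_registered

print(f"official registration rate: {official_rate:.1%}")
print(f"adjusted registration rate: {adjusted_rate:.1%}")
print(f"adjusted registered voters: {adjusted_registered:,.0f}")
print(f"turnout on official base:   {turnout_official:.1%}")
print(f"turnout on adjusted base:   {turnout_adjusted:.1%}")
```

The unrounded product gives an adjusted base somewhat above the "roughly 600,000" cited, but the resulting turnout rate still rounds to about 70%.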
Though using a municipal election as a basis for a validation study has
inherent external validity concerns, it provides a relatively low-cost means of
accumulating and validating actual voting behavior.[1] Overall, we were able to match roughly
70% of the self-identified registered voters that we interviewed. The biggest factor in matching success
was the willingness of the interviewee to disclose their name and address at the
end of the survey. We successfully
matched 86% of those who gave us their name and address, but just 43% of those who
gave a name only.
The objective of the matching process is to uncover, using voting
records, the actual behavior of our respondents on election day. Our matching process used five distinct
identifying characteristics as a means of aligning interview subjects with
voting records: phone number,
address, last name, first name and birth year. Overall, 75% of the cases we were able
to match met virtually all of these criteria – matching first and last name,
birth date, and either phone or address or both. The remaining 25% were matched based on
first and last name and birth year only (primarily among those who gave only
their name), or those for whom we could match at least phone or address, first
or last name, and at least a close match on birth year.
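The matching criteria described above can be written out as a simple decision rule. This is only a sketch of the logic as stated in the text, not the procedure actually used: the class, field names and tier labels are hypothetical, "birth date" is treated as birth year, and because the tolerance for a "close" birth year match is not given here, it is passed in as a flag rather than computed.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """Field-by-field agreement between one respondent and one voter record."""
    first_name: bool
    last_name: bool
    birth_year: bool        # exact agreement on birth year
    birth_year_close: bool  # near agreement; tolerance is an unstated assumption
    phone: bool
    address: bool


def match_tier(c: Comparison) -> str:
    """Classify a respondent/record comparison into a (hypothetical) match tier."""
    contact = c.phone or c.address
    full_name = c.first_name and c.last_name

    # Tier met by most matched cases: names, birth year, and phone or address.
    if full_name and c.birth_year and contact:
        return "strong"
    # Names and birth year only (mostly respondents who gave just a name).
    if full_name and c.birth_year:
        return "name_and_birth_year"
    # Phone or address, one name, and at least a close birth year.
    if contact and (c.first_name or c.last_name) and (c.birth_year or c.birth_year_close):
        return "partial"
    return "no_match"
```

A comparison that agrees on every field falls in the first tier, while one that agrees only on last name and address, with a near miss on birth year, falls in the third.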
PART 1: The Elements of the Perry-Gallup Likely Voter Index
Typically, estimates of voter preferences in an election poll are based
only on those who are registered to vote.
An analysis of a 1984 Gallup validation study suggests that filtering out
respondents who say they are not registered introduces very little error in
horserace predictions. Just 6% of
those who said they weren't registered actually were, according to voting
records, and only 2% actually voted (Colasanto and Mattlin, 1987). As a result, all the analysis to follow
will be based solely on respondents who report themselves as registered
voters.[2]
But basing horserace predictions on all who claim to be registered is
still problematic, since survey participants tend to overstate both their
registration and their propensity to vote.
In the 1984 Gallup study, fully 23% of those who claimed to be registered
were not, and 30% did not vote on election day. Were this error distributed randomly
across the population, we might overlook it. However, overestimation of registration
and voting is highest among predominantly Democratic constituencies, leading to
a systematic bias in favor of Democratic candidates unless some further filter
is applied.
The likely voter screen used by Gallup and the Pew Research Center is
based on an index measuring each respondent's propensity to vote. In addition to registration, the likely
voter index, originally developed by Paul Perry at Gallup, is made up of eight
items intended to identify four concepts related to voter turnout: voter
interest, voter intentions, past voting behavior, and knowledge about where to
vote, each of which will be discussed below. Though there are slight variations
between the original Perry-Gallup index applied in the 1960s, 1970s and early
1980s and the one used today by the Pew Research Center, they are based on the
same fundamental structure, outlined in Table 1 below.
This procedure results in a Guttman index with values ranging from zero
to eight, with the highest values representing those with the greatest
likelihood of voting. Both Gallup
and the Pew Research Center then make a projection of voter turnout based on
past turnout rates and early indicators of turnout, such as particularly high or
low levels of interest in the campaign.
This turnout projection is used to define what percentage of respondents
will be considered "likely voters" – the proportion of highest scoring
respondents on which election estimates will be based. For example, in forecasting the 2000
presidential election, the Pew Center forecast that 50% of the age-eligible
population would vote, and based its estimates on the 50% of respondents
receiving the highest index scores.
In the 1999 Philadelphia study, evidence suggested that roughly 70% of
registered voters would turn out to vote.[3]
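The cutoff step described above can be sketched as follows. The function name is hypothetical, and the text does not say how respondents tied at the boundary score are handled; the sketch gives that group a fractional weight so that the weighted share of respondents equals the turnout projection, which is one possible choice rather than a documented part of the procedure.

```python
from collections import Counter


def likely_voter_weights(scores, projected_turnout):
    """Map each index score to a weight in [0, 1].

    scores: list of likely voter index values (0 to 8), one per respondent.
    projected_turnout: projected share of these respondents who will vote.

    Scores are filled from the top down until the projected share is reached.
    Assumption: the boundary score receives a fractional weight.
    """
    counts = Counter(scores)
    remaining = projected_turnout * len(scores)  # respondents still to be included
    weights = {}
    for score in sorted(counts, reverse=True):
        take = min(counts[score], max(remaining, 0.0))
        weights[score] = take / counts[score]
        remaining -= take
    return weights
```

With a 70% projection, for instance, every respondent above the boundary score counts fully, those at the boundary count in part, and those below it drop out of the horserace estimate.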

In addition to providing a more stable and reliable measure across
distinct survey samples, the eight-item index provides a level of operational
and content validity that no single item can achieve. But in order to fully investigate the
effectiveness of the index and whether improvements can be made, the relevance
and effectiveness of each index element will first be examined, grouped by the
substantive concepts they measure.
Measures of Voter Interest
Citizens who are more interested in politics and who have been paying
attention to the campaign are presumably more likely to vote than those who are
uninterested, and a bivariate analysis of voting patterns suggests that this is
true (see Table 2). To measure
interest in politics, respondents are asked how much they follow what's going on
in government and public affairs.
According to the 1999 Philadelphia validation study, fully 84% of those
who follow politics "most of the time" actually voted in the mayoral race,
compared to 61% of those who follow politics "only now and then" and 55% of
those who follow government affairs "hardly at all."
In the 1984 Gallup pre-election poll a slightly different question
achieved similar results.
Seventy-nine percent of those who said they had a "great deal" of
interest in politics turned out on election day 1984, compared to 71% of those
who had a "fair amount" of interest, 60% of those with "only a little"
interest, and 19% of those with "no interest at all."
Looking at actual attention to the campaign in the bottom of Table 2, we
see that 85% of 1999 respondents who said they had given "quite a lot" of
thought to the upcoming election actually voted, compared to 62% of those who
said "only a little." The identical
question achieved comparable figures in 1984, with 74% of those giving "quite a
lot" of thought to the election actually voting, compared to just 57% of those
who said "only a little."[4]
Measures of Voter Intentions
On its face, the most direct way of predicting voter turnout is to simply
ask whether a person intends to vote or not. Unfortunately, such a straightforward
question often gives us little traction, since nearly all who say they are
registered to vote tell us that they plan to vote. Fully 97% of registered voters in
the Philadelphia study told us they planned to vote, with only 2% saying they
did not, proportions almost identical to those in the 1984 nationwide Gallup study. Though all who say they do not plan to
vote are automatically coded at zero on the Perry-Gallup index, this question
has a minimal effect on overall index accuracy.
A more promising measure of voter intention has respondents rate their
chances of voting on a scale of 10 to 1.
Though more than three-fourths of registered voters in both 1999 and 1984
rated their chance of voting as a 10, this scale provides a bit more variance
than the simple "do you plan to vote" question. Unfortunately, the Perry-Gallup index
codes all responses above "6" as likely voters. This has two problems – first, over 90%
(92% in 1999, 95% in 1984) rate their chance of voting as 7 or higher on the
scale, leaving us with little variance.
Second, in 1999 only 46% of
those who rate their chances of voting at "8" and only 33% of those who rate
their chances at "7" actually voted, introducing a high level of error into the
likely voter index. In light of this, we will test whether moving the cutpoint
up to "9", or even a solid "10" would improve index effectiveness.[5]
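One way to compare candidate cutpoints on the 10-point scale is to compute, for each cutpoint, how many respondents are coded as likely voters and how validated turnout differs on either side of the line. The sketch below assumes validated records are available as (rating, voted) pairs; the function name and record layout are illustrative, not the format of the study's data.

```python
def evaluate_cutpoint(records, cutpoint):
    """Summarize one cutpoint on the 1-10 chance-of-voting scale.

    records: list of (rating, voted) pairs, where rating is the self-rated
             chance of voting and voted is the validated turnout (True/False).
    cutpoint: lowest rating coded as a likely voter (7 matches coding all
              responses above 6; 9 and 10 are the stricter alternatives).

    Returns (share coded likely, turnout among those coded likely,
             turnout among the rest); a value is None for an empty group.
    """
    above = [voted for rating, voted in records if rating >= cutpoint]
    below = [voted for rating, voted in records if rating < cutpoint]

    def turnout(group):
        return sum(group) / len(group) if group else None

    share_likely = len(above) / len(records) if records else None
    return share_likely, turnout(above), turnout(below)
```

A better cutpoint is one that widens the turnout gap between the two groups without shrinking the likely voter group so far that the item stops contributing variance to the index.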
Measures of Past Voting Behavior
Those who have voted in the past are the most likely to turn out in any
given election, and measures of past voting behavior are central to any measure
of the likelihood of voting. The
Perry-Gallup index uses two general measures of past voting: whether an individual voted in the
previous presidential election, and the individual's own assessment of how
regularly they vote. Each proves to
be a powerful predictor of turnout in both the 1999 mayoral race and the 1984
general election. Since respondents
aged 18-21 may not have had the opportunity to vote in previous national
elections, past voting behavior is not included as part of the likely voter
index for these respondents.
Table 4 shows that those who say they voted in the 1996 Presidential
election were roughly twice as likely as those who did not to participate in the
1999 Philadelphia mayoral election.
Interestingly, the 8% of the sample who couldn't recall if they had
voted, or refused to say, also exhibited high turnout in the mayoral race.
Our baseline likely voter index codes respondents as likely voters only
if they say they voted in 1996 and
can recall the name of the person they voted for. The assumption underlying this coding
choice is that we know that many people over-report past voting (in this poll
fully 80% of registered voters told us they voted in 1996), and those who say
they voted but cannot recall who they voted for are the most likely to be the
non-voters in the crowd. The
validation study suggests otherwise.
Turnout among the 10% who say they voted in 1996 but can’t recall who
they voted for is not statistically different from turnout among those who say
they voted and can recall for whom.
Below we will test whether altering the index to include these
respondents in the likely voter index might improve index
accuracy.
Fully 85% of those who say they always vote turned out on November 2,
1999, along with 74% of those who say they nearly always vote. By comparison, just 43% of those who say
they vote part of the time went to the polls, and just 21% of those who say
they seldom or never vote.
Unfortunately, the Perry-Gallup likely voter index codes those who say
they vote just part of the time as likely voters, which this bivariate analysis
suggests may be inaccurate.[6] Altering the cutpoint on this question
to include only those who always or nearly always vote will be
tested.
Measures of Knowledge about Where to Vote