Christmas Edition: How Ottawa Votes

This post was prepared for a ‘book club’ of Open Data Ottawa, a group based in Ottawa, Ontario that aims to showcase the good that can come from official government releases of data. Last month, one of the members redesigned the Jobs Portal, for example. This month the focus of the group was the Municipal Election Result dataset.

In this ‘investigation’, I’ve used correlation statistics to explore how voting in Ottawa works and how it can be affected by polling setup. A huge focus this year was the low voter turnout, so I adjusted my statistical search to look for possible causes in the basic setup of public polling that the city uses. I used Microsoft Excel (with heavy reliance on the CORREL function) to both arrange and analyze my data, and combined the election results with publicly available ward population data from the City of Ottawa in order to better understand the smaller decisions that govern election results. I also compared the 2014 election setup to the 2010 setup to see whether any interesting patterns come up. My approach was question-based, letting each answer lead me to the next question.

The one definition you will need before reading my results is my use of the word ‘Opportunities’. I define an ‘Opportunity’ as a combination of time and place where you can go to vote. What this means is that any voting location that was open for polling three times (twice for advanced voting and once for voting day) counts as three ‘Opportunities’! The reason I’ve done this is much the same as the reason the city provides multiple voting days in the first place: trying to catch all possible voters who may miss a specific window. Since I use ‘Opportunities’ to calculate much of what I believe could explain a lower voter turnout, this choice can have significant effects on my dataset, so I’ve divided the dataset by ward. This gives me enough replicates of each finding to check that a trend exists. So, without further ado, here is what I found.
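To make the definition concrete, here is a minimal Python sketch of how ‘Opportunities’ could be counted. I did my actual counting in Excel, so this is only an illustration: the ward names, locations, day labels and the `count_opportunities` helper are all invented for the example.

```python
from collections import Counter

# Hypothetical polling schedule: one (ward, location, day) entry per time
# a location is open. All names here are invented for illustration.
schedule = [
    ("Ward A", "Community Centre", "advance day 1"),
    ("Ward A", "Community Centre", "advance day 2"),
    ("Ward A", "Community Centre", "voting day"),
    ("Ward A", "Library", "voting day"),
    ("Ward B", "School Gym", "voting day"),
]


def count_opportunities(entries):
    """Count distinct (location, day) pairs per ward."""
    unique_entries = set(entries)  # ignore accidental duplicate rows
    return Counter(ward for ward, _location, _day in unique_entries)


opportunities = count_opportunities(schedule)
print(opportunities["Ward A"])  # 4: one location open three times, plus one more
print(opportunities["Ward B"])  # 1
```

The Community Centre contributes three opportunities on its own, which is exactly the ‘open thrice counts as three’ rule described above.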


Since the dataset was divided by wards, can we make sure that this doesn’t change the results we might get?

First, I wanted to check whether dividing the data by wards changes its meaning in any way. Looking at both the 2010 and 2014 results I could see variability across the voter turnout numbers (seen in blue) but nothing too egregious. But don’t take my word for it. In 2010, the mean voter turnout across wards was 44.5% with a standard deviation of 3.48%. In 2014 these numbers were 39.1% and 4.32%, respectively. These deviations are small relative to the means, and if we use the wards as statistical replicates, the chance of any one particularly skewed ward distorting the whole result is small. On the other hand, there was severe variability in the voter turnout per opportunity provided (definition above), shown divided by 10 for proper graph scaling (seen in red). What this means is that although the wards turned out at consistent rates in both years, there is either an inconsistency between wards in how many opportunities are provided or an inconsistency between wards in when people choose to vote!
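For anyone who wants to repeat this check outside of Excel, here is a rough Python sketch of the two series in the chart. The numbers are made up, not the real ward data, and two things are assumptions on my part for the sketch: that the standard deviation is the sample version (what Excel's STDEV.S gives), and that ‘turnout per opportunity’ means votes cast divided by the number of opportunities.

```python
from statistics import mean, stdev

# Invented example values, NOT the real Ottawa ward data.
turnout_pct = {"Ward A": 42.0, "Ward B": 46.0, "Ward C": 44.0}
votes_cast = {"Ward A": 8000, "Ward B": 9000, "Ward C": 8000}
opportunities = {"Ward A": 20, "Ward B": 10, "Ward C": 40}

# Blue series: how much does turnout vary from ward to ward?
turnout_values = list(turnout_pct.values())
average_turnout = mean(turnout_values)
turnout_spread = stdev(turnout_values)  # sample standard deviation (assumed)

# Red series: votes per opportunity, divided by 10 purely for graph scaling.
per_opportunity_scaled = {
    ward: votes_cast[ward] / opportunities[ward] / 10 for ward in votes_cast
}

print(average_turnout, turnout_spread)
print(per_opportunity_scaled)
```

Even in this toy example the turnout percentages sit close together while the per-opportunity values swing widely, which is the same shape of result as in the chart.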


Does a high number of voting opportunities affect the amount of voting that happens per opportunity?

Next, I looked at whether an increase in the number of opportunities could lead to the significant fluctuations seen in the voter turnout per opportunity. No surprises here: when voters are presented with more opportunities to vote, they spread themselves out and vote whenever is most comfortable for them. In 2010, the correlation between these two variables was -0.83 and in 2014 it was -0.87, so they are strongly inversely correlated. The difference between the years can probably be attributed to the lower overall voter turnout in 2014, which makes it seem like fewer people showed up per voting opportunity than in 2010 and so strengthens the correlation (usefully, this suggests that the lower turnout was systematic and that no single ward was responsible for it). However, what this result means, when taken in the context of a consistent voter turnout per ward (the result above), is that when there are fewer voting opportunities, the people most likely to vote still end up capitalizing on them! The correlation between voter turnout and the number of opportunities presented was 0.07 in 2010 and -0.16 in 2014. The 2010 correlation is negligible, but there is a weak negative correlation in the 2014 dataset, meaning that even where more opportunities were provided, slightly fewer people tended to turn up overall. Why? Could this account for the lower turnout in 2014?
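All of the correlations in this post come from Excel's CORREL function, which returns the Pearson correlation coefficient. If you would rather reproduce them in code, a minimal Python version might look like the sketch below. The `correl` name simply mirrors the Excel function, and the sample numbers are invented rather than taken from the election dataset.

```python
from math import sqrt


def correl(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return covariance / sqrt(spread_x * spread_y)


# Invented per-ward values: more opportunities, fewer votes at each one.
opportunities = [10, 20, 40]
votes_per_opportunity = [900, 400, 200]

r = correl(opportunities, votes_per_opportunity)
print(r < 0)  # True: the two series move in opposite directions
```

Each ward contributes one pair of values, so the wards act as the replicates mentioned earlier.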


Does a fluctuation in voting opportunities affect voter turnout?

Now here’s an interesting result! The change in the number of opportunities provided between 2010 and 2014 is correlated with voter turnout: the two variables have a moderate correlation of 0.347 (the first chart). The change in the number of opportunities provided on voting day alone was more weakly correlated, at 0.267 (the second chart). However, the change in the number of opportunities is barely correlated with the change in voter turnout, at only 0.15 (the third chart). How does that make sense? Well, it pays to be mindful that the 2010 turnout wasn’t standardized in any way, so it carries fluctuations of its own. What this means is that we’d need a more standard way of comparing years, or results over a longer period of time, to see whether the two changes are correlated with each other. The number of opportunities presented, on the other hand, is an objective count, so its difference between years can be calculated exactly. For now, it’s clear that the change is correlated with the voter turnout of a given year. Even though voters will try to ‘make it’ to the best possible opportunity (given the strongly negative correlation above), sometimes a change in one of the opportunities on voting day or an advanced voting day may be enough to demotivate them from voting at all! Note that the correlation doesn’t even break the 0.5 mark: squared, it comes to about 0.12, so a linear relationship with this change would account for only about 12% of the variation in turnout between wards, but it is still a factor!
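A quick note on reading these coefficients. The conventional way to turn a correlation into a ‘how much does it explain’ figure is to square it: r squared is the share of the variation in one variable that a straight-line relationship with the other would account for. Here is a small Python sketch using the three coefficients from this section; the `variance_explained` helper is just a name I made up for the example.

```python
def variance_explained(r):
    """Share of variance accounted for by a linear fit with correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    return r ** 2


# The three coefficients reported in this section.
coefficients = {
    "change in opportunities vs turnout": 0.347,
    "change in voting-day opportunities vs turnout": 0.267,
    "change in opportunities vs change in turnout": 0.15,
}

for label, r in coefficients.items():
    print(f"{label}: r = {r}, r squared = {variance_explained(r):.3f}")
```

Keep in mind that this is a statement about variation between wards, not about the reasons any individual resident had for staying home.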


Does the number of advanced polling opportunities affect the turnout?

From the above results we know that the core voting population will always try to make it to the polls, and that changes in the opportunities presented to them do affect that somewhat! So next I looked at whether the number of advanced polling options matters, thinking that it should. Turns out, it didn’t. In 2010, the number of advanced opportunities correlated with voter turnout at 0.09, and in 2014 this was down to -0.02. This suggests that most voting as we know it happens on the day itself, but also that most of the discouragement from voting could happen on voting day itself (the voting-day correlation above, 0.267, was roughly three quarters of the overall 0.347). This could mean that a higher turnout can be achieved by increasing voting day polling locations and decreasing advanced polling ones, leaving the budget of the operation effectively unchanged while improving the turnout! This is backed up by the fact that, in the previous section, wards with a positive change in polling opportunities exclusively populated the high end of the ward turnout percentages.


What if the correlation happened because the city expected a higher turnout in some wards?

Since correlation is not causation, it’s possible that the correlation exists solely because the city correctly predicted a higher turnout in some wards and gave them a higher number of opportunities. So, I tried to find out how the city decides to change the number of opportunities for a given election. Was it based on electors or on population size? The former would make sense politically (and would imply that the correlation we’re seeing is due to the city planning its opportunities according to expected turnout), but the latter would make sense if the city is trying to capitalize on people voting during their work day or their time off (when they visit a ward with a higher population density). The correlation between the change in the number of opportunities and population was 0.24 (the first chart), which suggests that the latter is true, and among the wards that had a change in the number of opportunities on voting day specifically, the correlation was 0.29 (the second chart)! The same comparisons based on electors gave 0.02 (the third chart) and 0.13 (the fourth chart), respectively, which suggests some overlap between population and elector status, or some mobility in how people move residence within the city, but I haven’t investigated this yet.

These two charts looked really similar to me, so I wondered whether changes in population and changes in electors were the same thing, and they are strongly correlated at 0.81, which makes sense: where there are more people, there are more voters. What was interesting, though, was that the change in population was strongly negatively correlated with the change in electors per population, at -0.79 (the fifth chart). What this means is that the more a ward’s population grew, the smaller the share of electors within it became. So if the city were trying to target where electors would most likely be, we would expect a stronger correlation between change in electors and opportunities than between change in population and opportunities, which is not what we see. This also suggests something equally cool: most electors are Ottawa-based, and the increase in population probably isn’t due to an influx of electors from other cities, but to some mixture of the birth rate and immigration! Ottawa’s municipal climate, for at least these past 4 years, has been under the control of those affected by it.
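In case ‘change in electors per population’ sounds abstract, here is a small Python sketch of the quantity for a single ward. The figures and the `electors_share_change` helper are invented for illustration, and I am assuming the change is simply the 2014 ratio minus the 2010 ratio.

```python
def electors_share_change(pop_2010, pop_2014, electors_2010, electors_2014):
    """Change in the electors-to-population ratio between the two elections."""
    if pop_2010 <= 0 or pop_2014 <= 0:
        raise ValueError("population must be positive")
    return electors_2014 / pop_2014 - electors_2010 / pop_2010


# Invented ward: population grows by 10,000 but electors only by 5,000.
change = electors_share_change(
    pop_2010=40_000, pop_2014=50_000,
    electors_2010=30_000, electors_2014=35_000,
)
print(round(change, 3))  # -0.05: the share of electors fell from 75% to 70%
```

Both population and electors go up in this toy ward, yet the share of electors drops, which is the pattern the negative correlation points to.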


Conclusions and open data

So, to sum up the findings: voter turnout is consistent across the entire city, and this is not affected by the number of opportunities presented (or by the density of those opportunities, data not shown), meaning there is a ‘core’ voting population in every ward that will always try to make it. Changes in the voting opportunities presented are correlated with voter turnout, however, and especially so on voting day. This could mean either that some voters are dissuaded when their favorite polling center is unavailable, or that the city specifically targets the changes in its polling locations or times to where it expects the highest percentage of voters. It is unlikely to be the latter, because the changes seem to be based more on changes in population than in electors, and also because some of the wards with the city’s highest turnouts also received some of the city’s strongest cuts to voting opportunities. An immediate improvement the city could make is to cut back on the number of advanced polling opportunities (as high as 60% in some wards) and focus its location rentals or hours on voting day, when the public is drawn to the polls the most.

Things that I have yet to look at are the separation of time and place (whether breaking down the opportunities to remove the time factor could improve our statistics), whether the return of some incumbents could affect voter turnout, whether the geographical lack of some voting day locations helps explain the lower voter turnout, and the actual significance of advanced voting days (considering that statistically they only seem relevant in predicting election results but not much else). In keeping with the open data spirit of this entire article, all the numbers required for the results presented are available here (the second sheet in the file deals with trying to find cool statistical quirks in advanced voting, but I never got around to it fully).
