This post was prepared for a ‘book club’ of Open Data Ottawa, a group based in Ottawa, Ontario, that aims to showcase the good that can come from official government releases of data. Last month, for example, one of the members redesigned the Ottawa.ca Jobs Portal. This month, the group’s focus was the Municipal Election Result dataset.
In this ‘investigation’, I’ve used correlation statistics to explore how voting in Ottawa works and how it can be affected by polling setup. A major focus this year was the low voter turnout, so I adjusted my statistical search to look for possible causes in the basic setup of public polling that the city uses. I used Microsoft Excel (with heavy reliance on the CORREL function) to both arrange and analyze my data, and combined the election results with publicly available ward population data from the City of Ottawa itself to better understand the smaller decisions that govern election results. I also compared the 2014 election setup to the 2010 election setup to see whether any interesting patterns come up. My approach was question-based: each answer led me to ask the next question.
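For readers who would rather not open Excel, here is a minimal Python sketch of the Pearson correlation coefficient, which is the quantity CORREL returns. The function name `correl` and the sample numbers are my own illustrative assumptions; they are not taken from the election spreadsheet.

```python
from math import sqrt

def correl(xs, ys):
    """Pearson correlation coefficient, the quantity Excel's CORREL returns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Made-up illustrative numbers, NOT values from the election dataset.
opportunities = [10, 12, 15, 20]
turnout_pct = [38.0, 41.0, 40.0, 45.0]
r = correl(opportunities, turnout_pct)
print(f"correlation: {r:.2f}")
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or none. As always, a correlation on its own says nothing about which variable drives the other.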
One definition you will need before reading my results is my use of the word ‘Opportunities’. I define an ‘Opportunity’ as a specific time and place where you can go to vote. This means that any voting location that was open for polling three times (twice for advanced voting and once on voting day) counts as three ‘Opportunities’! My reasoning is much the same as the city’s reason for providing multiple voting days in the first place: trying to catch all possible voters who may miss a specific window. Because I use ‘Opportunities’ to calculate much of what I believe could explain a lower voter turnout, this definition can have a significant effect on my dataset, so I’ve divided the dataset by ward. This gives me enough replicates of each finding to check that a trend exists. So, without further ado, here is what I found.
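To make the ‘Opportunities’ definition concrete, here is a small Python sketch of how the count could be tallied per ward. The record layout (ward, location, session), the ward and location names, and both helper functions are hypothetical; the actual spreadsheet may be organized quite differently.

```python
from collections import Counter

# Hypothetical records: one tuple per (ward, location, session) during which a
# location was open. All names below are invented for illustration only.
openings = [
    ("Ward A", "Community Centre", "advance day 1"),
    ("Ward A", "Community Centre", "advance day 2"),
    ("Ward A", "Community Centre", "voting day"),
    ("Ward A", "School Gym", "voting day"),
    ("Ward B", "Library", "voting day"),
]

def count_opportunities(records):
    """Count distinct (location, session) pairs per ward."""
    unique = set(records)  # guard against duplicated rows
    return Counter(ward for ward, _location, _session in unique)

def count_by_kind(records):
    """Split each ward's opportunities into advanced and voting-day counts."""
    counts = {}
    for ward, _location, session in set(records):
        kind = "voting_day" if session == "voting day" else "advanced"
        ward_counts = counts.setdefault(ward, {"advanced": 0, "voting_day": 0})
        ward_counts[kind] += 1
    return counts

per_ward = count_opportunities(openings)
by_kind = count_by_kind(openings)
print(per_ward["Ward A"])  # the Community Centre alone contributes three
```

Here the Community Centre is open for two advanced sessions and on voting day, so it contributes three opportunities to its ward, exactly as in the definition above.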
Since the dataset was divided by ward, can we make sure that this doesn’t change the results we might get?
Does a high number of voting opportunities affect the amount of voting that happens per opportunity?
Does a fluctuation in voting opportunities affect voter turnout?
Does the number of advanced polling opportunities affect the turnout?
What if the correlation happened because the city expected a higher turnout in some wards?
These two charts looked really similar to me, so I wondered whether changes in population and changes in electors were the same thing. They are strongly correlated at 0.81, which makes sense: where there are more people, there are more voters. What was interesting, though, was that the change in population was strongly negatively correlated with the change in electors per population, at -0.79 (the fifth chart). In other words, the more a ward’s population grew, the smaller the share of electors within it became. So if the city were trying to target where electors would most likely be, we would expect to see a stronger correlation between change in electors and opportunities than between change in population and opportunities. This also suggests something equally cool: most electors are Ottawa-based, and the increase in population isn’t due to an influx of electors from other cities, but is probably a mixture of the birth rate and immigration! Ottawa’s municipal climate, for at least these past 4 years, has been under the exclusive control of those affected by it.
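The comparison above can be sketched in Python as follows. Everything here is an assumption made for illustration: the ward figures are invented, the field names are hypothetical, and I have assumed that ‘change’ means the percentage change between 2010 and 2014, which may not match how the spreadsheet calculates it.

```python
from math import sqrt

def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0

def pearson(xs, ys):
    """Pearson correlation coefficient (what Excel's CORREL computes)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented per-ward figures, NOT the real Ottawa numbers.
wards = [
    {"pop_2010": 40000, "pop_2014": 44000, "el_2010": 30000, "el_2014": 31000},
    {"pop_2010": 50000, "pop_2014": 50500, "el_2010": 36000, "el_2014": 36800},
    {"pop_2010": 30000, "pop_2014": 36000, "el_2010": 21000, "el_2014": 22500},
    {"pop_2010": 45000, "pop_2014": 46000, "el_2010": 33000, "el_2014": 33900},
]

# Change in population, in electors, and in electors per population, per ward.
pop_change = [pct_change(w["pop_2010"], w["pop_2014"]) for w in wards]
electors_change = [pct_change(w["el_2010"], w["el_2014"]) for w in wards]
share_change = [
    pct_change(w["el_2010"] / w["pop_2010"], w["el_2014"] / w["pop_2014"])
    for w in wards
]

r_pop_electors = pearson(pop_change, electors_change)
r_pop_share = pearson(pop_change, share_change)
print(f"population vs electors: {r_pop_electors:.2f}")
print(f"population vs electors per population: {r_pop_share:.2f}")
```

With the real ward data substituted in, the two printed values would correspond to the 0.81 and -0.79 figures quoted above, provided the change is defined the same way.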
Conclusions and open data
So, to sum up the findings: voter turnout is consistent across the entire city, and it is not affected by the number of opportunities presented (or by the density of those opportunities; data not shown), meaning there is a ‘core’ voting population in every ward that will always try to make it to the polls. Changes in the voting opportunities presented are correlated with voter turnout, however, and especially so on voting day. This could mean either that some voters are dissuaded when their favorite polling center is unavailable, or that the city specifically targets the changes in its polling locations or times to where it expects the highest percentage of voters. The latter is unlikely, because the changes seem to be based more on changes in population than in electors, and because some of the wards with the city’s highest turnouts also received some of its deepest cuts to voting opportunities. An immediate improvement the city could make is to cut back on the number of advanced polling opportunities (as high as 60% in some wards) and focus its location rentals or hours on voting day, when most of the public comes out to the polls.
Things that I have yet to look at are the separation of time and place (whether breaking down the opportunities to remove the time factor could improve our statistics), whether the return of some incumbents could affect voter turnout, whether the geographical lack of some voting day locations helps explain the lower voter turnout, and the actual significance of advanced voting days (considering that, statistically, they only seem relevant in predicting election results but not much else). In the interests of maintaining the status quo for this entire article, all the numbers required for the results presented are available here (the second sheet in the file deals with trying to find cool statistical quirks in advanced voting, but I never got around to it fully).