Social Data & Visualizations Project

Explainer Page

Data Collection

For this project, data was collected from several sources using Python. First, a historical dataset on the socio-economic status of all districts of the Copenhagen municipality was taken from opendata.dk. It contains data from 2008 to 2014 on a range of socio-economic statistics, such as age distribution and the number of people working in different sectors. The data was divided by 'rodes' (small subdivisions of districts used for taxes), but we wanted to work with districts, so a second dataset was taken from the portal to map every rode to its district. The data was then grouped by district and year with Pandas, and a district_id was added for easy reference, since the Danish letters in the district names cause a lot of character-encoding issues. A GeoJSON file was then used to create a map of the districts. Since none of our group members are Danish, quite some time went into translating all variables into English. The final dataset contained 70 rows (10 districts for 7 years) and 67 columns. Only the counts (and not the ratios) were kept, because grouping the original 'rodes' means summing their values, and ratios cannot simply be added up.
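The grouping step described above can be sketched roughly as follows. This is a minimal illustration, not our actual script: the column names (rode, year, population), the sample values and the way district_id is assigned are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample: one row per 'rode' and year, holding raw counts.
rodes = pd.DataFrame({
    "rode": ["r1", "r2", "r3", "r1", "r2", "r3"],
    "year": [2008, 2008, 2008, 2009, 2009, 2009],
    "population": [100, 150, 200, 110, 140, 210],
})

# Hypothetical lookup table mapping every rode to its district.
lookup = pd.DataFrame({
    "rode": ["r1", "r2", "r3"],
    "district": ["Indre By", "Indre By", "Østerbro"],
})

# Attach the district to every rode row, then sum the counts per district and year.
merged = rodes.merge(lookup, on="rode", how="left")
per_district = merged.groupby(["district", "year"], as_index=False)["population"].sum()

# Add a numeric district_id so later code never has to match on Danish names.
names = sorted(per_district["district"].unique())
district_ids = {name: i for i, name in enumerate(names, start=1)}
per_district["district_id"] = per_district["district"].map(district_ids)
```

Summing works here because the columns are counts. A ratio column would need to be recomputed from the summed counts instead.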

Then, voting data for two municipal elections (2013 and 2017) was scraped from the official results website. The page for Copenhagen links to the results of every voting office. All these pages had similar URLs, with only the last two digits changing, ranging from 1 to 55. Every page contained the same table with the number of votes per party, which could easily be parsed using Beautiful Soup. With a bit of tinkering, it was possible to map every page id to a district (often around 5 offices per district), collect all the data, merge it per district and write it to JSON. This resulted in a dataset with the district_ids as keys and an array of objects as values, each object containing the name of the party, the number of votes and the number of votes lost or gained.
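The merge step could look something like the sketch below. It assumes the per-office tables have already been parsed into (party, votes, change) tuples. The page-id mapping, the party letters, the numbers and the key names (party, votes, change) are hypothetical and only serve to show the shape of the resulting JSON.

```python
import json
from collections import defaultdict

# Hypothetical mapping from page id (voting office) to district_id.
OFFICE_TO_DISTRICT = {1: 1, 2: 1, 3: 2}

# Hypothetical parsed tables: page id -> rows of (party, votes, votes gained or lost).
office_results = {
    1: [("A", 1200, 50), ("B", 800, -30)],
    2: [("A", 900, -10), ("B", 600, 20)],
    3: [("A", 400, 5), ("C", 300, 15)],
}

def merge_per_district(office_results, office_to_district):
    """Sum votes and vote changes of all offices that belong to the same district."""
    totals = defaultdict(lambda: defaultdict(lambda: {"votes": 0, "change": 0}))
    for page_id, rows in office_results.items():
        district_id = office_to_district[page_id]
        for party, votes, change in rows:
            totals[district_id][party]["votes"] += votes
            totals[district_id][party]["change"] += change
    # district_id -> array of objects, one object per party.
    return {
        str(district_id): [
            {"party": party, "votes": t["votes"], "change": t["change"]}
            for party, t in parties.items()
        ]
        for district_id, parties in totals.items()
    }

merged = merge_per_district(office_results, OFFICE_TO_DISTRICT)
as_json = json.dumps(merged, ensure_ascii=False, indent=2)
```

The keys are written as strings because JSON object keys are always strings. Writing as_json to a file gives the structure described above.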

All the code can be found here.

Visualisations

Following the rules described in the lectures, we did our best to choose intuitive visualisations. The main page has 5 different visualisations.

First, a map is the most intuitive way to display information about a city. Above the map there is a slider, and the type of data (Age, Income etc.) can be set through a dropdown menu. The map is color-coded and a legend is provided. The intention here was to give the user a way to spot any geographical clustering in any feature.

When the visitor clicks a district, two visualisations below the map are updated: a bubble chart displaying the results of the municipal elections and a stacked area chart displaying the yearly change in the district.

Since there are a lot of political parties, a bubble chart was an intuitive choice: this kind of visualisation can show many elements in the same layout. We took care to give each political party the same color throughout. Note also that we only show parties that obtained more than 1% of the votes in the district. There are 32 parties in total, and it is not ideal to give disproportionately large space to parties that did not obtain 1% of the votes.
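The 1% filter can be sketched as below. This is only an illustration: the function name, the key names and the sample numbers are hypothetical, and it assumes the share is computed against all votes cast in the district, including those for parties that end up hidden.

```python
def filter_parties(parties, min_share=0.01):
    """Keep only parties whose share of the district's votes exceeds min_share."""
    total = sum(p["votes"] for p in parties)
    if total == 0:
        return []
    return [p for p in parties if p["votes"] / total > min_share]

# Hypothetical district result: party C holds roughly 0.5% of the votes.
district = [
    {"party": "A", "votes": 5000},
    {"party": "B", "votes": 4900},
    {"party": "C", "votes": 50},
]
visible = filter_parties(district)
```

With this sample, only parties A and B would be drawn as bubbles.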

The second chart updated by the map is the stacked area chart. It shows the yearly change of the selected feature. The main intention here was to show the user how the district changes over time.

The map is one way to see the city. Another is the sunburst chart. Everything in the chart is interactive. You can filter any data in this chart and see the yearly distribution of the filtered data in the bar chart below. For example, if you click the inner layer of the chart, you see the distribution of the feature by district (i.e. only population). Moving to the outer layers adds extra filters, and clicking those areas shows the change in the filtered data.

Since our data mainly shows distributions (e.g. high income, middle income, lower income), a sunburst chart was a natural choice. The user can easily see the percentage of high-income people in an area. The user can also easily compare layers, for example the number of "singles" in a district and the number of "singles who have kids". The main drawback of the sunburst chart is that it contains a lot of different areas, which makes it very hard to keep a minimalistic approach.

Finally, even though the bar chart is a very basic chart type, it is always one of the most intuitive ways to present time-series information. Note that the color of the bars is the same as that of the chosen area (the one used to filter the data in the sunburst chart).

Contributions

Ahmet Baglan: Sun Burst Chart, Bar Chart, Bubble Chart, Explainer Page.

Wisse Barkhof: Data Collection, General Structure of the Webpage, Data Story, Map Visualisation, Stack Chart, Explainer Page.

Siddharth Chopde: Putting the page online, Bubble Chart.

Copenhagen in Demographics & Politics

The Data Story

Indre By

Municipal Elections

Change in demographics 2008 - 2014