We know from last week’s post on analyzing the Edmonton census data that adjacent age groups tend to group together in Edmonton neighbourhoods; e.g., 50-54 year-olds tend to live in neighbourhoods with relatively higher numbers of 40-49 and 55-59 year-olds. I’m going to take this idea a little further and, using some common clustering techniques, show how Edmonton neighbourhoods can be divided into 5 major age-based clusters.
Clustering in a nutshell
According to the Wikipedia article, clustering is “the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense.” The best-known clustering algorithm, and the one we used for this analysis, is k-means. (Andrew Moore has an excellent tutorial for those interested.) K-means is a relatively simple but powerful technique that’s very useful for exploring datasets. A practitioner still has quite a few details to sort out (such as scaling the attributes and dealing with collinearity), but the output of k-means often reveals clear, distinct patterns and helps us get our bearings, particularly with marketing data.
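To make the mechanics concrete, here is a minimal sketch of the standard k-means procedure (Lloyd's algorithm) in plain Python: assign each row to its nearest center, recompute each center as the mean of its rows, and repeat until the assignments stop changing. It is not the code used for this analysis. Seeding the centers with k randomly chosen rows is an assumption; libraries such as scikit-learn default to the smarter k-means++ seeding.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Column-wise mean of a non-empty list of rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans(
    rows: Sequence[Sequence[float]], k: int, max_iter: int = 100, seed: int = 0
) -> Tuple[List[Vector], List[int]]:
    """Lloyd's algorithm. Returns (centers, labels), where labels[i] is the
    index of the center assigned to rows[i]."""
    rng = random.Random(seed)
    # Assumption: start from k rows picked at random (without replacement).
    centers = [list(r) for r in rng.sample(list(rows), k)]
    labels = [-1] * len(rows)
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(row, centers[j]))
            for row in rows
        ]
        if new_labels == labels:  # converged: no row changed cluster
            break
        labels = new_labels
        # Update step: each center becomes the mean of its assigned rows.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # leave an empty cluster's center where it was
                centers[j] = mean_vector(members)
    return centers, labels
```

In this setting each row would be one neighbourhood's scaled age profile, and the returned centers would play the role of the cluster centers described in the next section.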
Preparing the Edmonton census data
The Edmonton neighbourhood data is in matrix form: the rows represent neighbourhoods and the columns represent successive five-year age ranges (0-4, 5-9, etc.). Each cell contains the number of people in the associated neighbourhood and age range. We replaced all age ranges of 65 and over with a single aggregate 65+ (senior) group. We then scaled the data row-wise so that, within each row, values run from 0 (the row’s minimum) to 1 (the row’s maximum). Because of this scaling, the clustering operates on the shape of each neighbourhood’s age distribution rather than on its total population: similar neighbourhoods have similar mixes of young and old.
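The two preparation steps (merging the senior ranges, then row-wise min-max scaling) can be sketched in plain Python as follows. The age-range labels (`'0-4'`, `'65-69'`, `'85+'`, and so on) and the helper names are hypothetical; the real census extract may label its columns differently.

```python
from typing import Dict, List

SENIOR_START = 65  # ranges starting at this age are merged into "65+"


def collapse_seniors(counts: Dict[str, int]) -> Dict[str, int]:
    """Merge every age range starting at 65 or older into one '65+' bucket.

    Assumes (hypothetically) keys such as '0-4', '5-9', ..., '85+' in
    ascending age order; the merged bucket is appended last.
    """
    merged: Dict[str, int] = {}
    seniors = 0
    for label, count in counts.items():
        start = int(label.split("-")[0].rstrip("+"))  # '65-69' -> 65, '85+' -> 85
        if start >= SENIOR_START:
            seniors += count
        else:
            merged[label] = count
    merged["65+"] = seniors
    return merged


def minmax_scale_row(row: List[float]) -> List[float]:
    """Rescale one neighbourhood's counts so its smallest value is 0 and its largest is 1."""
    lo, hi = min(row), max(row)
    if hi == lo:  # flat row (e.g., all zeros): avoid dividing by zero
        return [0.0] * len(row)
    return [(v - lo) / (hi - lo) for v in row]


# Hypothetical neighbourhood, for illustration only.
counts = {"0-4": 10, "5-9": 20, "60-64": 30, "65-69": 5, "70-74": 5, "85+": 2}
row = list(collapse_seniors(counts).values())  # [10, 20, 30, 12]
print(minmax_scale_row(row))                   # [0.0, 0.5, 1.0, 0.1]
```

Scaling each row independently is what makes a small neighbourhood and a large one with the same age mix look identical to k-means.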
Neighbourhood cluster centers
The k-means algorithm finds a predetermined number of clusters. (We selected 5 clusters based on a trade-off between model complexity and goodness of fit.) The center of each cluster is calculated by averaging the attribute (column) values of every neighbourhood in the cluster. By inspecting the average attribute values of these cluster centers, we can understand the types of neighbourhoods found in each cluster. (Remember that the cell values range from 0 to 1 and represent the relative size of each age group within a neighbourhood, so a center value near 1 means that age group tends to be among the largest in that cluster’s neighbourhoods.)
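Given each neighbourhood's scaled row and its cluster label, the centers and the total within-cluster sum of squares (a common goodness-of-fit measure for k-means) take only a few lines. The post does not say exactly how the trade-off behind choosing 5 clusters was judged; plotting this sum against k and looking for the point where extra clusters stop paying off (the so-called elbow) is one common approach, assumed here for illustration.

```python
from typing import List, Sequence


def cluster_centers(
    rows: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> List[List[float]]:
    """Average the attribute (column) values of the rows in each cluster."""
    centers = []
    for j in range(k):
        members = [row for row, lab in zip(rows, labels) if lab == j]
        if not members:
            raise ValueError(f"cluster {j} has no members")
        centers.append([sum(col) / len(members) for col in zip(*members)])
    return centers


def within_cluster_ss(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    centers: Sequence[Sequence[float]],
) -> float:
    """Sum of squared distances from each row to its own cluster center.

    Lower means tighter clusters. Adding clusters tends to lower it, so it
    has to be weighed against model complexity when picking k.
    """
    return sum(
        sum((x - c) ** 2 for x, c in zip(row, centers[lab]))
        for row, lab in zip(rows, labels)
    )
```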
Neighbourhood clusters described
- Young Families: Higher proportion of infants (0-4) and adults in the 25-44 range (parents).
- Older Families: Higher proportion of older children (10-24) and 40+ adults (parents).
- Young Adults, Seniors: Higher proportion of young adults (20-34) and seniors (65+).
- Seniors: Higher proportion of 65+ adults.
- Balanced: Comparatively uniform proportions of most age ranges.
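Descriptions like these come from reading each center's profile and noting which age ranges score highest. A minimal helper for that reading might look like the following; the function name, the abbreviated age labels, and the sample center values are hypothetical, not the actual centers from this analysis.

```python
from typing import List, Sequence


def top_age_groups(
    center: Sequence[float], age_labels: Sequence[str], top_n: int = 3
) -> List[str]:
    """Return the top_n age ranges with the highest scaled values in a cluster center."""
    ranked = sorted(zip(center, age_labels), key=lambda pair: pair[0], reverse=True)
    return [label for _, label in ranked[:top_n]]


# Hypothetical, abbreviated center, for illustration only.
ages = ["0-4", "5-9", "10-14", "25-29", "30-34", "65+"]
center = [0.9, 0.6, 0.4, 0.8, 0.95, 0.1]
print(top_age_groups(center, ages))  # ['30-34', '0-4', '25-29']
```

A profile dominated by infants and adults in their late twenties and thirties, as in this made-up example, is the kind of pattern that would earn a label like Young Families.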
How are these 5 age-based neighbourhood clusters distributed across Edmonton?
- On the outskirts: Younger families are settling at the very edge of the city, where most of the new detached residential development is taking place.
- Ring patterns: Older families are also in neighbourhoods at the edge of the city but not as far out. They occupy an inner ring around which the younger family neighbourhoods appear to be growing. The older family ring in turn encircles a core of senior-dominated neighbourhoods.
- Mixed use: Central neighbourhoods attract younger adults without children, and they also have a well-represented senior population. The farther you go from the core, however, the less likely you are to find younger adults, and the more the senior population stands out.
- Balanced neighbourhoods: Balanced neighbourhoods (such as those in the southeast) appear to have ethnically diverse populations with a high proportion of visible minorities.
- Clusters tend to cluster: The neighbourhood clusters aren’t scattered randomly throughout the city; they group together, which puts even more emphasis on the “flocking together” effect.
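The grouping above was read off the map, but it can also be checked numerically. One common approach, sketched here as an assumption rather than something done for this post, is a permutation test in the spirit of a join-count statistic: count pairs of adjacent neighbourhoods that share a cluster, then see how often randomly reshuffled labels produce at least as many. The adjacency list and neighbourhood identifiers are hypothetical inputs.

```python
import random
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def same_cluster_pairs(edges: List[Edge], labels: Dict[Hashable, int]) -> int:
    """Count adjacent neighbourhood pairs that fall in the same cluster."""
    return sum(1 for a, b in edges if labels[a] == labels[b])


def permutation_p_value(
    edges: List[Edge],
    labels: Dict[Hashable, int],
    n_permutations: int = 999,
    seed: int = 0,
) -> Tuple[int, float]:
    """Compare the observed count against counts from randomly shuffled labels.

    Shuffling keeps the number of neighbourhoods per cluster fixed but breaks
    any link between a neighbourhood's location and its cluster.
    """
    rng = random.Random(seed)
    observed = same_cluster_pairs(edges, labels)
    names = list(labels)
    shuffled_values = list(labels.values())
    at_least_as_many = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled_values)
        if same_cluster_pairs(edges, dict(zip(names, shuffled_values))) >= observed:
            at_least_as_many += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_many + 1) / (n_permutations + 1)
    return observed, p_value
```

A small p-value would indicate that same-cluster neighbourhoods sit next to each other more often than a random arrangement would produce.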
Age-based clustering has produced a fascinating picture of how the city has evolved and even suggests how it might continue to evolve. In the not too distant future, the many senior-dominated neighbourhoods in the core of the city are likely to experience significant demographic shifts as residential property starts to change hands. Given that younger families have settled in the outskirts, where they’re likely to remain for years to come, it’s an open question what core neighbourhoods will look like when this shift happens.
This initial attempt didn’t take into account census attributes such as dwelling unit type, dwelling unit ownership, structure type, and property type status. Tackling that is next on the agenda.