Quantifying my taste in music: A marriage between the Spotify and Last.fm APIs

A personal inquiry into a personal matter

Efe Baslar
2022-03-11

First things first

This is a rather short blog post about humble analyses of my humble taste in music. I have no claim to being a music connoisseur, and in fact my taste could very well be viewed as bland and unrefined. This, however, does not stop me from analyzing it. My initial goal was to come up with analyses that lay out the patterns in my listening habits and to find out which artists and tracks I played the most throughout the years.

Last.fm API

So, I have been a faithful Last.fm user [1] over the years, and I have a peculiar partiality for seeing my taste in music kept track of. As I write this, a quick glance at my Last.fm page reveals that I have listened to over 95,000 tracks (each play is called a "scrobble" in Last.fm jargon) in a span of 9 years, most of them from proggy genres. The true number of tracks I listened to in the same period is possibly much higher, as I also frequent YouTube to get some soundtrack rolling while I am studying, since YouTube generally has better-curated content in this respect.

Anyway, prog genres and subgenres usually feature much longer tracks than your usual hit song, so the mere scrobble tally on Last.fm does not necessarily reveal your true addiction.

However, the Last.fm API [2] is pretty limited when it comes to taking a peek at song characteristics. Consequently, the Last.fm API alone is not enough to answer the simple question “whom did I listen to the most?”. While this could come as a letdown, it should cheer you up a little to know that it kindly gives you access to your entire scrobble record along with the top tags for each track/artist/album, both of which I have used throughout the analysis. (Of course, its capabilities are not limited to these two, but they are the ones immediately relevant to our analysis.) The fact that we can access our scrobble record is particularly important since the Spotify API provides no such functionality, which makes it vital to rely on both APIs at once.
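To make the scrobble retrieval a little more concrete, here is a minimal Python sketch, not the code from the project repo. It assumes the `user.getRecentTracks` method of the Last.fm API with JSON output; the helper names `recent_tracks_url` and `parse_scrobbles` and the sample page are made up for illustration, and the actual HTTP call (and paging through all result pages) is left out.

```python
from urllib.parse import urlencode

API_ROOT = "https://ws.audioscrobbler.com/2.0/"  # Last.fm API root


def recent_tracks_url(user, api_key, page=1, limit=200):
    """Build the URL for one page of a user's scrobble history."""
    params = {
        "method": "user.getrecenttracks",
        "user": user,
        "api_key": api_key,   # placeholder; a real key is needed for real calls
        "format": "json",
        "limit": limit,       # page size; 200 is assumed to be the maximum
        "page": page,
    }
    return f"{API_ROOT}?{urlencode(params)}"


def parse_scrobbles(payload):
    """Extract (artist, track, unix_timestamp) tuples from one JSON page."""
    tracks = payload["recenttracks"]["track"]
    if isinstance(tracks, dict):  # a page with a single track may not be a list
        tracks = [tracks]
    rows = []
    for item in tracks:
        # A track that is playing right now has no "date" field yet; skip it.
        if "date" not in item:
            continue
        rows.append((item["artist"]["#text"], item["name"], int(item["date"]["uts"])))
    return rows


# Hand-written sample mimicking the shape of one response page (not real data).
sample_page = {
    "recenttracks": {
        "track": [
            {"artist": {"#text": "Haken"}, "name": "Veil",
             "@attr": {"nowplaying": "true"}},
            {"artist": {"#text": "Haken"}, "name": "Crystallised",
             "date": {"uts": "1546300800"}},
        ]
    }
}

print(recent_tracks_url("some_user", "YOUR_API_KEY"))
print(parse_scrobbles(sample_page))
```

Looping `page` from 1 up to the total page count reported in the response would then yield the full scrobble record.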

Spotify API

Speaking of the Spotify API [3], it provides us with many interesting opportunities. In addition to giving us what we are primarily looking for, namely the individual track lengths (which we cannot get from the Last.fm API), it lets us incorporate the audio features [4] of each track available on Spotify into our analysis. This opens up several interesting avenues to explore, which I will discuss below. What I have mentioned up to this point roughly covers everything I needed for the analysis, and my general strategy for acquiring all this is therefore as follows:

Please find the GitHub repo associated with this project here to see exactly how I went through these steps. I used Python to deal with the GET and POST requests and R to do the analyses and the visualizations.
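As a rough illustration of the Spotify side, the sketch below prepares batched requests for the "get several audio features" endpoint referenced in the footnotes and keeps only the fields used later in this post. It is an assumption-laden sketch rather than the repo's code: the helper names are invented, the access token is a placeholder (obtaining one is presumably where the POST request comes in), and whether the endpoint is available to a given app should be checked against the current Spotify documentation.

```python
from urllib.parse import urlencode

AUDIO_FEATURES_URL = "https://api.spotify.com/v1/audio-features"
MAX_IDS_PER_CALL = 100  # assumed per-request limit on track IDs
KEPT_FIELDS = ("duration_ms", "danceability", "energy", "loudness", "tempo", "valence")


def batched(ids, size=MAX_IDS_PER_CALL):
    """Yield consecutive chunks of at most `size` track IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def audio_features_requests(track_ids, access_token):
    """Return (url, headers) pairs, one per batch, ready for an HTTP GET."""
    headers = {"Authorization": f"Bearer {access_token}"}
    return [
        (f"{AUDIO_FEATURES_URL}?{urlencode({'ids': ','.join(chunk)})}", headers)
        for chunk in batched(track_ids)
    ]


def pick_features(payload):
    """Keep only the relevant fields from one response body."""
    rows = []
    for entry in payload["audio_features"]:
        if entry is None:  # tracks Spotify knows nothing about come back as null
            continue
        rows.append({"id": entry["id"], **{k: entry[k] for k in KEPT_FIELDS}})
    return rows


# Hypothetical track IDs, only to show the batching.
requests_to_send = audio_features_requests([f"track{i}" for i in range(250)], "TOKEN")
print(len(requests_to_send))  # number of GET requests needed for 250 tracks
```

Batching keeps the number of requests manageable when the scrobble record contains thousands of distinct tracks.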

Before moving on, let me share with you what my true lists of top 20 artists and tracks look like, and note how they compare to the number of times each was scrobbled on Last.fm. It shouldn’t come as a surprise that most of these tracks are over 10 minutes long. There are some artists whose songs I listened to for hundreds of hours. No regrets. 100% would do it again.

 #  Artist Name                Genre              Scrobbles  Total Played (hours)
 1  Haken                      Progressive metal       5278                637.23
 2  Leprous                    Progressive metal       5795                601.87
 3  Dream Theater              Progressive metal       4539                589.11
 4  Between the Buried and Me  Progressive metal       3686                433.82
 5  Caligula’s Horse           Progressive metal       4029                423.38
 6  Blind Guardian             Power metal             4471                386.05
 7  Mastodon                   Progressive metal       3889                323.62
 8  Rush                       Progressive rock        2119                254.48
 9  Opeth                      Progressive metal       1741                244.13
10  Ayreon                     Progressive metal       2116                204.99
11  Diablo Swing Orchestra     Avant-garde metal       2663                195.62
12  Ihsahn                     Progressive metal       2066                178.43
13  Tool                       Progressive metal       1418                163.87
14  Soen                       Progressive metal       1453                146.35
15  Cynic                      Progressive metal       1908                141.84
16  Hans Zimmer                Soundtrack              1759                134.18
17  Symphony X                 Progressive metal       1336                133.59
18  Ghost                      Heavy metal             1600                123.50
19  Porcupine Tree             Progressive rock        1016                108.73
20  Native Construct           Progressive metal        953                108.62

Looking at the table above, it is astounding that I have spent at least a hundred hours with each of the top 20 artists, with the time I played Haken, Leprous and Dream Theater each amounting to almost a month (of my life). These simple facts would easily have eluded us if we had only glanced at the mere Last.fm tally. This way, it is much easier to put the time I spent listening to music into perspective. Of course, most of the time when I am listening to music I am not only listening to music, since music is also an amazing tool for me to focus and get into the zone. Therefore, the more time I spend in front of the computer or commuting, the more I resort to listening to proggy tunes.
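The "Total Played (hours)" column boils down to multiplying each track's play count by its length and summing per artist. A minimal Python sketch of that idea follows; the function name `hours_played` and the toy data are made up for illustration, and tracks without a known duration are simply skipped, which is an assumption about how unmatched tracks are handled.

```python
from collections import Counter, defaultdict

MS_PER_HOUR = 3_600_000


def hours_played(scrobbles, duration_ms):
    """Sum listening time per artist.

    scrobbles   -- iterable of (artist, track) pairs, one per play
    duration_ms -- dict mapping (artist, track) to track length in milliseconds
    """
    plays = Counter(scrobbles)
    hours = defaultdict(float)
    for (artist, track), count in plays.items():
        length = duration_ms.get((artist, track))
        if length is None:  # no Spotify match: the play cannot be converted to time
            continue
        hours[artist] += count * length / MS_PER_HOUR
    return dict(hours)


# Toy data (invented): artist A has fewer plays but much longer tracks.
toy_scrobbles = (
    [("A", "epic")] * 3        # three plays of a 20-minute track
    + [("A", "single")] * 10   # ten plays of a 3-minute track
    + [("B", "hit")] * 15      # fifteen plays of a 4-minute track
)
toy_durations = {
    ("A", "epic"): 1_200_000,
    ("A", "single"): 180_000,
    ("B", "hit"): 240_000,
}

print(hours_played(toy_scrobbles, toy_durations))  # {'A': 1.5, 'B': 1.0}
```

In the toy data, artist B leads by scrobbles (15 against 13) while artist A leads by hours, which is exactly the kind of reordering visible between the scrobble and hour columns above.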

 #  Track Name                              Artist Name                Genre              Scrobbles  Total Played (hours)
 1  Crystallised                            Haken                      Progressive metal        140                 45.20
 2  Rosetta Stoned                          Tool                       Progressive metal        176                 32.82
 3  Veil                                    Haken                      Progressive metal        149                 31.29
 4  Chromatic Aberration                    Native Construct           Progressive metal        138                 28.70
 5  Alone In The World                      Caligula’s Horse           Progressive metal        143                 26.40
 6  White Walls                             Between the Buried and Me  Progressive metal        107                 25.36
 7  Justice For Saint Mary                  Diablo Swing Orchestra     Avant-garde metal        178                 24.59
 8  Silent Flight Parliament                Between the Buried and Me  Progressive metal         97                 24.50
 9  Ants of the Sky                         Between the Buried and Me  Progressive metal        106                 23.28
10  Graves                                  Caligula’s Horse           Progressive metal         90                 23.28
11  Visions - remastered 2017               Haken                      Progressive metal         62                 23.21
12  Jaguar God                              Mastodon                   Progressive metal        170                 22.50
13  Nocturnal Conspiracy - remastered 2017  Haken                      Progressive metal        102                 22.36
14  Echo                                    Leprous                    Progressive metal        138                 22.29
15  The Fountain of Lamneth                 Rush                       Progressive rock          66                 21.96
16  The Architect                           Haken                      Progressive metal         83                 21.67
17  Lay Your Ghosts to Rest                 Between the Buried and Me  Progressive metal        128                 21.42
18  Xanadu                                  Rush                       Progressive rock         115                 21.24
19  Extremophile Elite                      Between the Buried and Me  Progressive metal        126                 20.95
20  End Of An Empire                        Turisas                    Folk metal               170                 20.61

It seems I have spent almost two days of my precious lifetime being Crystallised by Haken, which is an amazing musical journey (I am obviously biased here) and was also a treat to witness live 3 years ago [5]. Furthermore, the 20-minute-long Rush magnum opus seems to have squeezed in at the 15th spot with only 66 scrobbles. I remember being obsessed with The Fountain of Lamneth and The Necromancer for a couple of months, and this must have been the result (well, I still play these songs quite regularly!).

Analysis

Let us dive into how we can benefit from the data set we have curated. We have in our hands the timestamps for each scrobbled track, the genres associated with each artist, and a set of audio features for each track. We can take a temporal perspective for first-order insights or an explanatory perspective using linear models. Of course, we’ll do both.

Genres and artists

Well, our first figure is merely cosmetic, but it does reveal some patterns in my listening habits. That I am a huge prog metal fan-boy did not need a second confirmation, but here we can see just how big of a fan-boy I am. The hierarchical graph visualization below shows the artists and genres I listened to, where the radii of the circles are proportional to the time I spent listening to that particular artist. The immense dominance of progressive metal displayed here should not come as a surprise: I have been listening to the genre for almost 15 years now and have been consistently looking for new artists. But several other things are salient in the plot. I am a sucker for good movie/series soundtracks. Especially given that I am a big sci-fi nerd, I have found that listening to soundtracks, particularly when studying/working, is a nice break from an otherwise (prog) metal-heavy undertaking. You can also see that my interest is not limited to prog metal, as I have also listened to a considerable number of other metal and rock artists over the last 9 years.

Click here for a higher quality version of the figure below.

Figure 1: A hierarchical graph representation of what I have listened to over 9 years.

A temporal take

Now let us take a look at how my appetite for different genres has evolved over time. By now you should be able to guess that progressive metal absolutely dominates here too. The pair of plots below shows how long I listened to my top 10 genres (with the rest grouped under “Other”) in each month since February 2013. The first plot shows the share of each genre in the given month, whereas the second plot shows the total number of hours I played tracks from each genre.

Figure 2: Shares of genres I listened to each month over the last 9 years

There are some particularly interesting facts to be drawn from these plots. Every time I go to see a movie that moves something in me, and whose score is essentially one of the pillars of the movie itself, I become enamored with its soundtrack as well. There are two points in time at which the share of soundtrack pieces I listened to surged: late 2017 and late 2021. These dates actually correspond to the release dates of Blade Runner 2049 and Dune, whose scores I found to be phenomenal.

Figure 3: Hours of music I listened to each month, broken down by genre, over the last 9 years

The spikes in the total hours I listened to music usually coincide with the months in which I had to work a lot. For instance, towards the end of 2019 I was on a sprint to finish my master’s thesis. This was, however, followed by a dip in December of the same year, during which I was serving in the military. All in all, I can say that I am pretty consistent in maintaining the (in)equality between genres and that not much has changed in my taste in music over the last 9 years.
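The monthly aggregation behind Figures 2 and 3 can be sketched in a few lines of Python. This is an illustrative reconstruction, not the R code used for the plots: the function names and the toy plays are invented, timestamps are assumed to be Unix seconds interpreted in UTC, and the grouping of smaller genres under “Other” is omitted.

```python
from collections import defaultdict
from datetime import datetime, timezone

MS_PER_HOUR = 3_600_000


def monthly_genre_hours(plays):
    """Aggregate (unix_ts, genre, duration_ms) plays into {month: {genre: hours}}."""
    totals = defaultdict(lambda: defaultdict(float))
    for ts, genre, duration_ms in plays:
        month = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")
        totals[month][genre] += duration_ms / MS_PER_HOUR
    return {month: dict(genres) for month, genres in totals.items()}


def monthly_shares(hours_by_month):
    """Convert hours per genre into each genre's share of its month."""
    shares = {}
    for month, genres in hours_by_month.items():
        total = sum(genres.values())
        shares[month] = {genre: hours / total for genre, hours in genres.items()}
    return shares


# Invented plays: two in November 2017, one in January 2019.
toy_plays = [
    (1510000000, "Soundtrack", 1_800_000),         # 0.5 h
    (1510000500, "Progressive metal", 5_400_000),  # 1.5 h
    (1547164800, "Progressive metal", 3_600_000),  # 1.0 h
]

hours = monthly_genre_hours(toy_plays)   # input for the "hours" plot
print(hours)
print(monthly_shares(hours))             # input for the "shares" plot
```

The first dictionary corresponds to the hours plot and the second to the shares plot; both are keyed by month so they can be drawn on the same time axis.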

Principal Component Analysis

The audio features we retrieved using the Spotify API can be used to identify genre characteristics and to engineer distance measures between genres, tracks and artists. They can also be used to build a classification model (which I am not going to do, as there is a lot of overlap between genres). What I am going to do instead is take the relevant audio features and run separate PCAs for genres, a subset of artists and a subset of songs. To see what makes my music taste… tick!

But let us first see what the Spotify API has in store for us. I discarded the features [6] that I deemed irrelevant for the subset of genres I am analyzing.

Genre

A disclaimer before we move on: the genres and artists are only represented by how much I listened to them. There is obviously a bias here, but it is good to know how open I am to the more distant genres. On a side note, all the variables are centered to zero mean, without which it would not make sense to run a principal component analysis.

The first two principal components shown below account for more than 81% of the entire variation, which is an acceptable level for visualization. The bubble sizes are proportional to the total time I listened to each genre over the last 9 years. As expected, valence and danceability seem to form an axis of their own, whereas energy, loudness and tempo seem to be somewhat correlated with each other (keep in mind that the data set only represents my own selection).
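For readers who want to see where a figure like "the first two components account for X% of the variation" comes from, here is a dependency-free Python sketch. It is not the R code behind the plots: it standardizes the columns (scaling to unit variance is an assumption on top of the centering mentioned above), builds the covariance matrix, and extracts the leading eigenvalues with power iteration and deflation, a simple stand-in for a proper eigen-solver. The toy feature rows are invented.

```python
import math


def standardize(rows):
    """Center each column to zero mean and scale it to unit sample variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(col) / n for col in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
            for col, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of rows whose columns are already centered."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def top_eigenpairs(matrix, k, iterations=500):
    """Leading k eigenpairs of a symmetric matrix (power iteration + deflation)."""
    p = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-zero start
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _step in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0:  # nothing left to extract
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
        value = sum(vec[i] * sum(work[i][j] * vec[j] for j in range(p))
                    for i in range(p))
        pairs.append((value, vec))
        # Deflate: remove the component just found so the next one can emerge.
        for i in range(p):
            for j in range(p):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def explained_variance(rows, k=2):
    """Share of total variance captured by each of the first k components."""
    cov = covariance(standardize(rows))
    total = sum(cov[i][i] for i in range(len(cov)))
    return [value / total for value, _ in top_eigenpairs(cov, k)]


# Invented (valence, energy, tempo) averages for four hypothetical genres.
toy = [
    [0.20, 0.90, 140.0],
    [0.35, 0.80, 125.0],
    [0.60, 0.55, 110.0],
    [0.15, 0.95, 150.0],
]
print([round(r, 3) for r in explained_variance(toy)])
```

Summing the first two returned ratios gives the share of variation that a two-dimensional PCA plot can show.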

You can see that most of the prog genres are clustered around a sweet spot that I seem to like: not too loud, not too fast, not as dark as black metal, but at the same time still keeping metal traits. No wonder the power metal I like is in the immediate vicinity of prog metal. I suspected as much, even though I have seen prog metal and power metal placed at the two extremes of a dichotomy.

Figure 4: Genre PCA: Use the plotly controls in the upper-right corner to navigate the plot!

My usual unwillingness to listen to post-rock is not without a rationale, it seems: even the sort of post-rock I listen to is far from the region I seem to feel comfortable in.

Artists

Instead of plotting all the artists I have ever listened to, I defined a subset of 4 genres: progressive metal, avant-garde metal, progressive rock and power metal (looking at the genre PCA just above, you can see why I have chosen these). This plot works much better if you navigate it using the controls in the upper-right corner. The region I like is characterized by markedly low valence and danceability, yet the music is fast and powerful enough to provide some stimulus. This could actually be the reason why I choose this kind of music even when I am working.

This time, the variance covered by the first two principal components is lower: 67%. Although labeled as power metal, Blind Guardian seem to be proggy enough to have found such a formidable place in my taste in music.

Figure 5: Artists from 4 selected genres PCA: Use the plotly controls in the upper-right corner to navigate the plot!

Songs

So, the final plot is mostly for my own investigation, as it is considerably complicated to behold without prior knowledge of these genres. But if you are willing, hover your mouse over the bubbles for information about a particular track (e.g. track name, artist and genre). I have only included the tracks I have played from my top 25 artists. Unlike in the previous PCA plots, we can see a stronger spread along the valence/danceability axis. This is partly due to the subset being narrower in terms of its genre coverage and partly due to prog metal being a genre with a lot of diversity: the songs can range from death metal-like bursts to concert ballads.

Figure 6: Songs PCA: Use the plotly controls in the upper-right corner to navigate the plot!

Closure

Yep, that was it. I’ll show this post to whoever next tells me that I don’t have a diverse enough taste in music. Yes, I focus on prog and prog-like metal as genres, but within genres and across songs I do have a considerable amount of variation, and I believe it is this variation that has kept me hooked on a couple of genres for years.


  1. https://www.last.fm/tr/user/baslare

  2. https://www.last.fm/api/intro

  3. https://developer.spotify.com/documentation/web-api/reference/#/operations/get-several-audio-features

  4. https://developer.spotify.com/documentation/web-api/reference/#/operations/get-several-audio-features

  5. https://www.setlist.fm/setlist/haken/2019/zorlu-center-psm-istanbul-turkey-73939681.html

  6. acousticness, instrumentalness, key, liveness, mode, time signature and speechiness