The Free Music Archive (FMA) is an online library of free, legal music downloads maintained by WFMU and curated by dozens of cultural institutions and music-minded organizations. FMA provides a platform for hosting and sharing new work, and its library currently features about 16,000 artists. Part of that platform is the detailed assignment of metadata to hosted files, including genre and relational information. To this end, FMA's organizers have released their library's data to the Echo Nest. The Echo Nest describes itself as a music intelligence platform: it crunches the sonic properties of digital song files and provides numeric values for certain features (loudness, vocal presence, "energy level", etc.). The output of the Echo Nest's algorithms is also conditioned by text analysis of popular music resources. Echo Nest co-founder Brian Whitman described the philosophy behind the company's approach on his blog. These analyses are often used to power the automated suggestions provided by services like Spotify and Rdio. For my final project, I created a network map of these suggestions to visualize the sonic landscape of the Free Music Archive.
Both FMA and the Echo Nest make their APIs available to developers. While the Echo Nest works with many content providers, you can limit the results of API queries to specific domains. In this case, I wanted to retrieve data only on FMA artists that the Echo Nest's sonic analysis classified as related to other FMA artists. To do this, I set up two separate API queries using Python scripts. First, I requested all of the artist names and unique ID values from FMA, calling the API 20 records at a time in a loop over result pages. I stored these name and ID pairs in one large dictionary and exported it to a JSON file.
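A minimal sketch of the page-structured loop described above, assuming the FMA API returns artist records as dictionaries. The function `fetch_page` is a hypothetical stand-in for the actual HTTP call (passed in so the loop can be tested offline), and the field names `artist_id` and `artist_name` are assumptions, not confirmed FMA API fields.

```python
import json
from typing import Callable, Dict, List

# Hypothetical page fetcher: (1-based page number, page size) -> list of records.
# In a real script this would wrap a request to the FMA API.
PageFetcher = Callable[[int, int], List[dict]]


def collect_fma_artists(fetch_page: PageFetcher, page_size: int = 20) -> Dict[str, str]:
    """Page through the artist listing and map each artist ID to its name."""
    artists: Dict[str, str] = {}
    page = 1
    while True:
        records = fetch_page(page, page_size)
        if not records:  # an empty page means we ran past the last one
            break
        for rec in records:
            # "artist_id" / "artist_name" are assumed field names.
            artists[str(rec["artist_id"])] = rec["artist_name"]
        if len(records) < page_size:  # a short page is the final page
            break
        page += 1
    return artists


def export_json(artists: Dict[str, str], path: str) -> None:
    """Write the {id: name} dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(artists, fh, ensure_ascii=False, indent=2)
```

Keying the dictionary by ID rather than by name is a deliberate choice in this sketch: IDs are unique, while artist names can repeat.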
The Echo Nest API accepts artist queries by name rather than by foreign (i.e., partner-organization-specific) ID, which made it difficult to retrieve consistent records. However, the API does accept queries by Echo Nest ID, which is assigned to every artist entity in its database. I first looped a request over the FMA-assigned artist names, which returned records containing each artist's Echo Nest identifier and a normalized version of the artist name that played nicely with the Echo Nest API. I then ran a second query using these Echo Nest IDs, limited to FMA-domain artists, and retrieved, for each queried artist, the list of FMA IDs the Echo Nest deemed related to it.
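The two-step lookup can be sketched as follows. The Echo Nest API is no longer available, so the two calls are represented by hypothetical functions, `search_by_name` (name to Echo Nest ID plus normalized name) and `similar_in_fma` (Echo Nest ID to related FMA IDs); their names and return shapes are illustrative assumptions, not the real endpoints.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-ins for the two Echo Nest API calls:
#   search_by_name(name)  -> (echo_nest_id, normalized_name), or None if no match
#   similar_in_fma(en_id) -> list of FMA artist IDs related to that artist
NameSearch = Callable[[str], Optional[Tuple[str, str]]]
SimilarLookup = Callable[[str], List[str]]


def build_similarity_map(
    fma_artists: Dict[str, str],
    search_by_name: NameSearch,
    similar_in_fma: SimilarLookup,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return {fma_id: [related fma_ids]} plus the FMA IDs that found no match."""
    related: Dict[str, List[str]] = {}
    unmatched: List[str] = []
    for fma_id, name in fma_artists.items():
        # Step 1: resolve the FMA artist name to a stable Echo Nest ID.
        hit = search_by_name(name)
        if hit is None:
            unmatched.append(fma_id)
            continue
        en_id, _normalized_name = hit
        # Step 2: query by Echo Nest ID, restricted to FMA-domain artists.
        related[fma_id] = similar_in_fma(en_id)
    return related, unmatched
```

Collecting unmatched IDs separately makes it easy to see which artist names failed the first lookup.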
I used Gephi to create the network graph. Gephi requires imported edge data to be formatted as Source-Target pairs. I used regular expressions to clean up the output from the Echo Nest and write each relationship between two artists on its own row of a giant CSV file. Since the Echo Nest relates each artist to at most 20 others, this made for more than 180,000 lines! Gephi huffed and puffed a bit at the import, but ultimately managed. Importing the nodes for the graph was just a matter of reformatting the original JSON output from FMA as a CSV file.
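A sketch of writing the edge and node tables with Python's `csv` module, assuming the similarity data has already been parsed into a `{fma_id: [related fma_ids]}` dictionary (this works from structured data rather than the regex cleanup the post describes). The column headers follow Gephi's spreadsheet import conventions (`Source`/`Target` for edges, `Id`/`Label` for nodes); the `dedupe_undirected` option is a hypothetical addition that collapses A-B and B-A into one row.

```python
import csv
import io
from typing import Dict, List, TextIO


def write_edges(related: Dict[str, List[str]], out: TextIO,
                dedupe_undirected: bool = False) -> int:
    """Write one Source,Target row per relationship; return the row count."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target"])
    seen = set()
    rows = 0
    for source, targets in related.items():
        for target in targets:
            if source == target:
                continue  # skip self-loops
            if dedupe_undirected:
                key = tuple(sorted((source, target)))
                if key in seen:
                    continue  # B-A already written as A-B
                seen.add(key)
            writer.writerow([source, target])
            rows += 1
    return rows


def write_nodes(artists: Dict[str, str], out: TextIO) -> None:
    """Write the Id,Label node table from the {fma_id: name} dictionary."""
    writer = csv.writer(out)
    writer.writerow(["Id", "Label"])
    for fma_id, name in artists.items():
        writer.writerow([fma_id, name])  # csv quotes names containing commas


if __name__ == "__main__":
    buf = io.StringIO()
    count = write_edges({"1": ["2"], "2": ["1"]}, buf, dedupe_undirected=True)
    print(count)
    print(buf.getvalue())
```

For real files, open them with `newline=""` as the `csv` documentation recommends.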
The network graph shows undirected connections between nodes. Nodes are sized by degree: artists deemed similar to many other artists appear larger than those with fewer sonic cohorts. With this many edges, I ran out of room in Gephi's layout plane to clearly separate the regions of similar artists. Nevertheless, the graph does show clustering along genre lines. Known data issues include duplicate or invalid artist entries, and some artists appearing under multiple name variants, both of which reduce accuracy.
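Gephi computes degree itself, but a small sketch makes the sizing rule concrete. It counts each artist's distinct neighbours in an undirected edge list (so A-B and B-A count once) and maps degree linearly onto a size range. The linear mapping and the `min_size`/`max_size` defaults are assumptions chosen for illustration, similar in spirit to Gephi's ranking-by-size option rather than a reproduction of it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def undirected_degree(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct neighbours per node, treating A-B and B-A as one edge."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(adj) for node, adj in neighbours.items()}


def scale_sizes(degrees: Dict[str, int], min_size: float = 10.0,
                max_size: float = 50.0) -> Dict[str, float]:
    """Linearly map degree onto [min_size, max_size]."""
    if not degrees:
        return {}
    lo, hi = min(degrees.values()), max(degrees.values())
    if hi == lo:
        return {node: min_size for node in degrees}
    span = hi - lo
    return {node: min_size + (d - lo) / span * (max_size - min_size)
            for node, d in degrees.items()}
```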
I also wanted to overlay user favorites from the FMA site onto the graph. At the outset, I wondered whether an artist's centrality in the network graph might correlate with their popularity within FMA. The favorite counts were retrieved alongside the artist IDs from the FMA API. However, the practice of favoriting artists does not seem widespread enough to provide meaningful information: a handful of artists garnered hundreds of favorites, while most, even broadly popular ones, have none. These trends had little, if anything, to do with degree of connectivity.
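The comparison above is informal. One way to put a number on it, which the post itself does not use, is Spearman's rank correlation between degree and favorite count. Because rho only depends on ranks, it is less sensitive to the handful of heavily favorited artists, and averaging ranks handles the many tied zero counts. This sketch assumes both inputs are `{fma_id: count}` dictionaries and uses only the standard library.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (vx * vy) ** 0.5


def spearman(degree: Dict[str, int], favorites: Dict[str, int]) -> float:
    """Spearman's rho over artists present in both dictionaries."""
    common = sorted(set(degree) & set(favorites))
    if len(common) < 2:
        raise ValueError("need at least two shared artists")
    return pearson(average_ranks([degree[a] for a in common]),
                   average_ranks([favorites[a] for a in common]))
```

A value near zero would be consistent with the observation that favorites had little to do with connectivity.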