The ARChive of Contemporary Music website features many image galleries depicting items from the collection, including records, CDs, sheet music, music-related books, and ephemera. Since the launch of the site in May 2014, web traffic to the galleries has been relatively low: only about a third as many users visit the galleries as hit the homepage. The ARC's social media posts also have relatively low reach and low engagement (e.g., an average of one interaction per tweet).

By repurposing interesting, fun, and quirky digital content in the context of social media, perhaps we can better engage followers, attract new users, and drive new traffic to the site, potentially attracting new donors to the non-profit archive.

To efficiently capture and repost this content, I wrote two complementary scripts. The first scrapes image URLs and metadata from the ARC photo gallery pages, which are generated by the NextGEN Gallery plugin for WordPress, and writes this data out to a JSON file. The second script uses Pytumblr, a Python Tumblr API client, to build and send a user-determined number of randomly selected photo posts to Tumblr along with appropriate caption text and tags. The JSON file is then updated with a true/false value for each image to record whether it has been posted.
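
The posting step can be sketched roughly as follows. This is a minimal illustration, not the actual ARC script: the record keys (`image_url`, `caption`, `tags`, `posted`), the function names, and the `RecordingClient` stand-in are all assumptions made for the example. In real use, `client` would be a `pytumblr.TumblrRestClient` built from your own API keys; its `create_photo` method accepts the same keyword arguments used here.

```python
import json
import random


class RecordingClient:
    """Offline stand-in for pytumblr.TumblrRestClient (hypothetical helper).

    It only records the calls it receives, so the sketch runs without
    network access or API keys.
    """

    def __init__(self):
        self.calls = []

    def create_photo(self, blogname, **kwargs):
        self.calls.append(dict(kwargs, blogname=blogname))
        return {}


def load_records(path):
    """Read the list of image records written by the scraper."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def save_records(path, records):
    """Write the records back, including updated `posted` flags."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)


def pick_unposted(records, count, rng=random):
    """Return up to `count` randomly chosen records not yet posted."""
    unposted = [record for record in records if not record["posted"]]
    return rng.sample(unposted, min(count, len(unposted)))


def post_batch(client, blog_name, records, count, rng=random):
    """Queue random unposted images on Tumblr and flag them as posted.

    Error handling is omitted: a real script should inspect the API
    response before setting the flag.
    """
    chosen = pick_unposted(records, count, rng)
    for record in chosen:
        client.create_photo(
            blog_name,
            state="queue",              # add to the blog's queue
            source=record["image_url"],  # Tumblr fetches the image by URL
            caption=record["caption"],
            tags=record["tags"],
        )
        record["posted"] = True
    return chosen
```

A run would then amount to `records = load_records(path)`, `post_batch(client, blog_name, records, count)`, and `save_records(path, records)`, so that the next run skips anything already flagged.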

Why Tumblr? It's free, many themes support a photo-gallery style layout, users can queue and schedule up to 300 posts for publication, and it can serve as a social media hub: through a third-party service, Tumblr photo posts can trigger parallel photo posts on Twitter and Facebook.

Some challenges: Getting Pytumblr to install successfully was difficult. It has the OAuth2 module as a dependency, and that proved tricky to install. Eventually, I was able to install and run Pytumblr using Python 2.7. After writing the scripts specifically with the ARChive of Contemporary Music in mind, I went back and moved all ARC-specific data into a separate file. This leaves the Tumblr-post code clean and generic – in theory, someone else could use this code as their own Tumblr bot, repurposing their own set of images. However, this process was much more difficult for the web scraper script, as this kind of image-scraping is so context-dependent.
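
One way to pull site-specific details out of a scraper is to describe each gallery's markup in a small "profile" object and keep the parsing code generic. The sketch below illustrates that idea using only the standard library. It is not the approach used in the ARC scripts, and the profile fields, class names, and the markup it expects are invented for the example; they do not reflect the actual output of any gallery plugin.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class GalleryProfile:
    """Site-specific scraping rules (all fields are hypothetical)."""

    link_tag: str    # tag that wraps each gallery image
    link_class: str  # CSS class marking a gallery link
    url_attr: str    # attribute holding the full-size image URL
    title_attr: str  # attribute holding the caption text


class GalleryScraper(HTMLParser):
    """Generic parser: collects one record per tag matching the profile."""

    def __init__(self, profile):
        super().__init__()
        self.profile = profile
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != self.profile.link_tag:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.profile.link_class not in classes:
            return
        url = attr_map.get(self.profile.url_attr)
        if url:
            self.records.append({
                "image_url": url,
                "caption": attr_map.get(self.profile.title_attr) or "",
                "posted": False,  # nothing has been posted yet
            })


def scrape(html, profile):
    """Return image records found in `html` according to `profile`."""
    parser = GalleryScraper(profile)
    parser.feed(html)
    parser.close()
    return parser.records


# Example profile for an imaginary gallery whose images are wrapped in
# <a class="gallery-link" href="..." title="..."> elements.
EXAMPLE_PROFILE = GalleryProfile(
    link_tag="a",
    link_class="gallery-link",
    url_attr="href",
    title_attr="title",
)
```

Supporting another gallery generator would then mean writing a new `GalleryProfile` rather than a new scraper, although real sites often need more than four fields (pagination, nested metadata, lazy-loaded images), which is part of why this kind of scraping is so context-dependent.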

Future development:

  • Continue “abstracting” the code for the web scraper
  • Update the documentation accordingly
  • Continue analysis on ARChive web traffic and social media engagement
  • Systematically change variables in social media posts (time of day, caption text, tags/hashtags) and observe effect on engagement and site traffic
  • Set up web scraping “profiles” for other popular image-gallery generators, e.g., CONTENTdm, Omeka, Flickr, Tumblr itself