This project is the RomCom Thermom, a Python-powered tool made up of two main parts:

  1. a JSON dataset of 462 full-text romance and comedy screenplays, web-scraped from the Internet Movie Script Database (IMSDb)
  2. a sentiment analysis script, which uses Google’s machine learning-based Cloud Natural Language API
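
As a rough illustration of the second part, the sketch below shows one way a screenplay could be cut into ten-page segments and each segment sent to the Cloud Natural Language API. It is not the project's actual script: the `segment_pages` and `analyze_segment` helpers are hypothetical names, the screenplay is assumed to be already split into a list of page strings, and the `google-cloud-language` client library is assumed to be installed with credentials configured.

```python
from typing import Dict, List


def segment_pages(pages: List[str], size: int = 10) -> List[Dict[str, str]]:
    """Group a list of page strings into consecutive segments of `size` pages."""
    segments = []
    for start in range(0, len(pages), size):
        chunk = pages[start:start + size]
        segments.append({
            # Labels are 1-based, e.g. "Pages 1-10", "Pages 11-20".
            "label": f"Pages {start + 1}-{start + len(chunk)}",
            "text": "\n".join(chunk),
        })
    return segments


def analyze_segment(text: str) -> Dict[str, float]:
    """Return the document-level score and magnitude for one segment.

    Needs the google-cloud-language package and valid credentials; the
    import is kept local so the rest of the sketch runs without them.
    """
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )
    response = client.analyze_sentiment(request={"document": document})
    sentiment = response.document_sentiment
    return {"score": sentiment.score, "magnitude": sentiment.magnitude}
```

Calling `analyze_segment(segment["text"])` for each item returned by `segment_pages` would yield one score and magnitude pair per ten-page block, which is the shape of the results shown further down.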

Inspiration for the Thermom comes from my interest in storytelling structure, and draws on the experiments of Kurt Vonnegut and Vladimir Propp in mapping out core types of narratives through shapes and irreducible units. I came across several other programming projects that play off Vonnegut and Propp’s ideas (Matthew Jockers’s R package syuzhet, Dan Kuster’s Jupyter notebook on Disney screenplays, and story shape research at the University of Vermont and University of Adelaide), which were also influential. Romantic comedies and the character tropes of reporter, tramp and heiress have been a particular focus of my own investigation; the RomCom Thermom provides a vehicle for examining the tone and emotional flow of characters’ interactions in this genre of narrative.

While by no means exhaustive, the Internet Movie Script Database (IMSDb) is one of the most comprehensive publicly available collections of screenplays on the internet. I chose IMSDb as the data source for the Thermom because of its size and the variety of its content. IMSDb also integrates information from the more famous IMDb into its records, which would make it easy to link other movie elements to the dataset later on. Screenplays on IMSDb are published as HTML pages rather than as PDFs or other downloadable files, which makes the JSON dataset of collected screenplays an even more pertinent and usable asset.
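
Because the screenplays are served as HTML, each one has to be reduced to plain text before it can be stored as JSON. The sketch below shows one way to do that using only the Python standard library. It is an illustration rather than the scraper used for this project: the `ScriptTextExtractor` class, the `screenplay_record` helper, the record's field names, and the assumption that the script body sits inside a `<pre>` element are all mine, and the sample HTML is made up.

```python
import json
from html.parser import HTMLParser


class ScriptTextExtractor(HTMLParser):
    """Collect the text found inside <pre> elements, dropping inner tags."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # number of <pre> elements currently open
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Keep text only while inside a <pre> block.
        if self._depth > 0:
            self._parts.append(data)

    def text(self):
        return "".join(self._parts).strip()


def screenplay_record(title, html):
    """Build a JSON-ready dict holding a title and its extracted script text."""
    parser = ScriptTextExtractor()
    parser.feed(html)
    parser.close()
    return {"title": title, "text": parser.text()}


# Made-up page standing in for a downloaded screenplay.
sample_html = (
    "<html><body><h1>Menu</h1>"
    "<pre><b>INT. NEWSROOM</b>\nA REPORTER enters.</pre>"
    "</body></html>"
)
record = screenplay_record("Example Script", sample_html)
print(json.dumps(record, indent=2))
```

Text outside the `<pre>` element, such as site navigation, is dropped, and inline tags like `<b>` are removed while their text is kept.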

For this project, I selected five classic romcoms to analyze: 10 Things I Hate About You, Annie Hall, His Girl Friday, It Happened One Night and The Apartment. Since the JSON dataset also contains movies that sit at the far ends of the hybrid’s two component genres, I also chose one movie that was obviously a comedy (Beavis and Butt-Head Do America) and one that was obviously a romance (The Bodyguard). Detailed results with corresponding text are available at the GitHub link above, but I am also including the results for 10 Things I Hate About You below for quick reference:

Pages 1-10:
  "score": -0.10000000149011612,
  "magnitude": 85.80000305175781
Pages 11-20:
  "score": -0.20000000298023224,
  "magnitude": 75.5
Pages 21-30:
  "score": -0.10000000149011612,
  "magnitude": 88.9000015258789
Pages 31-40:
  "score": 0.0,
  "magnitude": 76.9000015258789
Pages 41-50:
  "score": -0.10000000149011612,
  "magnitude": 66.80000305175781
Pages 51-60:
  "score": -0.10000000149011612,
  "magnitude": 71.9000015258789
Pages 61-70:
  "score": -0.10000000149011612,
  "magnitude": 77.4000015258789
Pages 71-80:
  "score": -0.10000000149011612,
  "magnitude": 86.5999984741211
Pages 81-90:
  "score": -0.10000000149011612,
  "magnitude": 82.0999984741211
Pages 91-100:
  "score": -0.10000000149011612,
  "magnitude": 46.70000076293945

In this screenplay and the others, I found that the ‘score’ fluctuates very little, giving more of a steady reading of the screenplay’s overall tone, whereas the ‘magnitude’ has its peaks and valleys, more in the tradition of Vonnegut’s story shape analysis. I will be providing Tableau-generated visualizations of my selected examples in the near future, in addition to the detailed JSON files available on GitHub.
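
The contrast between the two measures can be checked directly against the 10 Things I Hate About You figures listed above. The following is a minimal sketch, with the values rounded to one decimal place and a hypothetical `spread` helper; it simply compares the range of each series and locates the magnitude peak and valley.

```python
# Per-segment results for 10 Things I Hate About You, rounded to one decimal.
segments = ["1-10", "11-20", "21-30", "31-40", "41-50",
            "51-60", "61-70", "71-80", "81-90", "91-100"]
scores = [-0.1, -0.2, -0.1, 0.0, -0.1, -0.1, -0.1, -0.1, -0.1, -0.1]
magnitudes = [85.8, 75.5, 88.9, 76.9, 66.8, 71.9, 77.4, 86.6, 82.1, 46.7]


def spread(values):
    """Difference between the largest and smallest value in a series."""
    return max(values) - min(values)


# Page ranges where the magnitude is highest and lowest.
peak = segments[magnitudes.index(max(magnitudes))]
valley = segments[magnitudes.index(min(magnitudes))]

print(f"score range:     {spread(scores):.1f}")
print(f"magnitude range: {spread(magnitudes):.1f}")
print(f"magnitude peak at pages {peak}, valley at pages {valley}")
```

The two ranges are not on the same scale: the API's score is bounded between -1.0 and 1.0, while magnitude is unbounded and accumulates with the amount of text, so the comparison is best read as a description of shape rather than a like-for-like measurement.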

Overall, I believe the RomCom Thermom can be useful in two ways: as a JSON dataset for anyone who wants to examine one of the 462 available screenplays however they see fit, or as a Python script for running sentiment analysis on segments of a movie screenplay, taking the temperature of anything from a Hugh Grant fluff vehicle to a Billy Wilder-penned masterpiece.

Special thanks to Professor Matt Miller and Erin Elsbernd for their assistance on the project.

Photo: Cooper thermometer (physical scan by Alan Webber) and still from It Happened One Night (1934) with Clark Gable and Claudette Colbert