This project was inspired by our involvement with Linked Jazz, a research project based at Pratt Institute that investigates the application of Linked Open Data technologies to digital cultural heritage materials. The work is built on a collection of 50+ jazz oral history transcripts from which relationships between jazz musicians are derived. When the person being interviewed mentions another person, an RDF triple is generated to describe that Person A knows of Person B. One goal in working with the Linked Jazz data, and a goal of linked open data in general, is to link the data with external sources. This project is an exploratory step toward that goal, and serves as a use case and iterative research effort that we hope to expand on in the future.
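As a rough illustration of the kind of triple described above, the sketch below builds a single N-Triples line with the standard library only. The predicate URI (the `knowsOf` term from the Relationship vocabulary) and the example pair of people are assumptions made for illustration; they are not taken from the Linked Jazz data.

```python
# Assumed predicate: the "knowsOf" term from the Relationship vocabulary.
KNOWS_OF = "http://purl.org/vocab/relationship/knowsOf"


def knows_of_triple(interviewee_uri, mentioned_uri):
    """Return one N-Triples line: the interviewee knows of the mentioned person."""
    return f"<{interviewee_uri}> <{KNOWS_OF}> <{mentioned_uri}> ."


# Hypothetical example pair; the URIs follow the DBpedia resource pattern.
line = knows_of_triple(
    "http://dbpedia.org/resource/Mary_Lou_Williams",
    "http://dbpedia.org/resource/Duke_Ellington",
)
print(line)
```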

Linked Jazz Meets Carnegie Hall:

Working with performance history data from Carnegie Hall, we compare the Linked Jazz network of relationships to the network of performers involved in jazz events at Carnegie Hall from 1912 to May 1955. Our first step was to define the subset of people who appear in both datasets. Using a series of Python scripts and the Python Human Name Parser module, we applied string matching to find people with the same first and last names. From a total of 2005 people in the Linked Jazz dataset and 19000+ performers from Carnegie Hall, this method identified 264 name matches. In addition to the name-based string matching, we compared Uniform Resource Identifiers (URIs) across the datasets, using the Python RDFLib package to parse the RDF. Many of the performers in the Carnegie Hall (CH) name directory do not have URIs that could be matched to URIs in the Linked Jazz (LJ) data (which are primarily from DBpedia). Even so, this method identified 268 matches. Combining the matches from both methods and removing duplicates, we generated a list of 373 names in common.
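The two matching passes and the final merge can be sketched roughly as follows. This is a minimal illustration, not the project's scripts: `name_key` is a crude standard-library stand-in for the Human Name Parser module, the records are plain dictionaries rather than parsed RDF, and the `id`, `name`, and `uri` fields and all sample records are hypothetical.

```python
def name_key(full_name):
    """Reduce a name to a lowercase (first, last) key.

    Crude stand-in for a real name parser: handles "First Middle Last"
    and "Last, First Middle" forms only.
    """
    if "," in full_name:
        last, first = [part.strip() for part in full_name.split(",", 1)]
        parts = first.split() + last.split()
    else:
        parts = full_name.split()
    parts = [p.strip(".").lower() for p in parts if p.strip(".")]
    return (parts[0], parts[-1]) if parts else ("", "")


def match_by_name(lj_people, ch_people):
    """Pair records whose (first, last) keys are identical."""
    ch_ids_by_key = {}
    for ch in ch_people:
        ch_ids_by_key.setdefault(name_key(ch["name"]), []).append(ch["id"])
    return {
        (lj["id"], ch_id)
        for lj in lj_people
        for ch_id in ch_ids_by_key.get(name_key(lj["name"]), [])
    }


def match_by_uri(lj_people, ch_people):
    """Pair records that carry exactly the same URI."""
    ch_id_by_uri = {ch["uri"]: ch["id"] for ch in ch_people if ch.get("uri")}
    return {
        (lj["id"], ch_id_by_uri[lj["uri"]])
        for lj in lj_people
        if lj.get("uri") in ch_id_by_uri
    }


# Hypothetical toy records, for illustration only.
lj_people = [
    {"id": "lj1", "name": "Mary Lou Williams",
     "uri": "http://dbpedia.org/resource/Mary_Lou_Williams"},
    {"id": "lj2", "name": "Dizzy Gillespie",
     "uri": "http://dbpedia.org/resource/Dizzy_Gillespie"},
    {"id": "lj3", "name": "Art Blakey",
     "uri": "http://dbpedia.org/resource/Art_Blakey"},
]
ch_people = [
    {"id": "ch1", "name": "Williams, Mary Lou",
     "uri": "http://dbpedia.org/resource/Mary_Lou_Williams"},
    {"id": "ch2", "name": "John Birks Gillespie",
     "uri": "http://dbpedia.org/resource/Dizzy_Gillespie"},
    {"id": "ch3", "name": "Blakey, Art", "uri": None},
]

by_name = match_by_name(lj_people, ch_people)
by_uri = match_by_uri(lj_people, ch_people)
combined = by_name | by_uri  # set union drops pairs found by both methods
print(len(by_name), len(by_uri), len(combined))
```

In the toy data one pair is found by both methods, one by name only, and one by URI only, so the union is smaller than the sum of the two result sets. By the same logic, if every duplicate removed in the real data was a person found by both methods, the reported counts would imply 264 + 268 - 373 = 159 people matched both ways; that figure is an inference, not a number reported by the project.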

From this resulting group of people in both domains, we generated three new datasets using the existing Linked Jazz relationships derived from the oral history transcripts, and from the Carnegie Hall event data:

  1. Linked Jazz relationships where a person in both datasets (CH and LJ) 'knows of' someone else (3058 relationships).
  2. Carnegie Hall relationships where two people who are in both datasets (CH and LJ) performed at the same event (6706 relationships).
  3. A subset of relationships that fall into both of the first two categories (293 relationships).
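One way the three datasets above could be derived is sketched below, under stated assumptions: people are represented by placeholder labels, a 'knows of' relationship is a directed (interviewee, mentioned person) pair, only the interviewee is required to be in the common group for the first dataset, and direction is ignored when checking whether a 'knows of' pair also performed together. The sample data is invented.

```python
from itertools import combinations

# Hypothetical people found in both datasets.
common = {"A", "B", "C"}

# Hypothetical directed 'knows of' pairs: (interviewee, mentioned person).
lj_knows_of = [("A", "B"), ("A", "X"), ("Y", "A"), ("C", "A")]

# Hypothetical Carnegie Hall events: event id -> performers.
ch_events = {"e1": ["A", "B", "Z"], "e2": ["B", "C"]}

# 1. 'Knows of' relationships whose source is in the common group.
lj_relationships = {(s, t) for s, t in lj_knows_of if s in common}

# 2. Unordered pairs of common people who appeared at the same event.
ch_relationships = set()
for performers in ch_events.values():
    shared = sorted(set(performers) & common)
    ch_relationships.update(combinations(shared, 2))

# 3. 'Knows of' relationships whose two people also performed together.
both = {
    (s, t)
    for s, t in lj_relationships
    if tuple(sorted((s, t))) in ch_relationships
}

print(len(lj_relationships), len(ch_relationships), len(both))
```

Sorting each pair before the lookup is what lets a directed 'knows of' pair be compared with an undirected co-performance pair.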

To visualize the resulting networks of relationships and allow for the exploration of these connections, we combined the data into a single CSV file and used it to generate a Gephi visualization.
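A combined edge file of this kind might be written as in the sketch below. The `Source`, `Target`, and `Type` headers follow the column names Gephi's spreadsheet import looks for in an edge table; the extra `Dataset` column, the choice to mark LJ edges as directed and CH edges as undirected, and the sample rows are assumptions for illustration, not the project's actual file layout.

```python
import csv
import io


def write_edge_csv(edges, handle):
    """Write (source, target, dataset) tuples as a Gephi-style edge table."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Type", "Dataset"])
    for source, target, dataset in edges:
        # Assumption: 'knows of' is directional, co-performance is not.
        edge_type = "Directed" if dataset == "LJ" else "Undirected"
        writer.writerow([source, target, edge_type, dataset])


# Hypothetical rows; real rows would come from the three datasets above.
edges = [
    ("Person A", "Person B", "LJ"),
    ("Person A", "Person B", "CH"),
]

buffer = io.StringIO()  # stands in for an open file on disk
write_edge_csv(edges, buffer)
print(buffer.getvalue())
```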

Interactive Version

Thanks to Rob Hudson, Associate Archivist at Carnegie Hall, and to the Linked Jazz Project.