My (perhaps over-ambitious) goal with this project was to perform a global survey of artwork data in Wikidata through Python scripts that queried, collected, and analyzed the properties associated with ALL items that are "instances of" (P31) "painting" (Q3305213), including all subclasses. The hope was to offer cultural heritage institutions some utility in understanding best practices for data modeling and in aligning application profiles toward a more robust, universally usable art dataset in the sphere of linked data. The scripts work in conjunction with each other: they first identify the QIDs of all items classified as "painting" or any of its subclasses, then request all the claim information (i.e., associated properties) for those QIDs from Wikidata, and finally output an aggregate of each property number and its number of occurrences in the dataset.
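The final aggregation step can be sketched with a small function. This is a minimal illustration, not the project's actual script: it assumes the claim data has already been collected into a dict mapping each QID to its `claims` object (the property-to-statements mapping that Wikidata returns), and the function name is my own.

```python
from collections import Counter

def count_properties(items_claims):
    """Tally how often each property (P-number) appears across items.

    `items_claims` maps a QID to its `claims` dict as returned by
    Wikidata, e.g. {"Q12345": {"P31": [...], "P170": [...]}}.
    """
    counts = Counter()
    for claims in items_claims.values():
        # Each key of a claims dict is a property number like "P31".
        counts.update(claims.keys())
    return counts

# Example: two paintings, both with P31, one with P170 (creator).
sample = {"Q1": {"P31": [], "P170": []}, "Q2": {"P31": []}}
# count_properties(sample) tallies P31 twice and P170 once.
```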
I say "ambitious" because it took multiple plans of attack to gather only a portion of the originally intended scope. At the time of this project, there were 566,444 items that were instances of "painting" or of a subclass of "painting," but the analysis considers only 203,063 of them due to the considerable amount of time needed to request the claim data.
[Data + Methods]: The main sources of data were Wikidata's SPARQL endpoint (accessed via the Wikidata Query Service) and the Special:EntityData API endpoint, each queried by its own Python script. The scripts worked in succession with each other to collect, store, and pass data along as JSON.
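The QID-collection step can be sketched as a single SPARQL query against the Query Service. This is a simplified stand-in for the project's scripts: the property path `wdt:P31/wdt:P279*` is the standard Wikidata idiom for "instance of painting or any of its subclasses," and the function names and User-Agent string below are my own. (In practice a query over half a million items would likely need `LIMIT`/`OFFSET` paging or the query would time out.)

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# Every item that is an "instance of" (P31) "painting" (Q3305213),
# following "subclass of" (P279) chains so subclasses are included too.
PAINTING_QUERY = """
SELECT ?item WHERE { ?item wdt:P31/wdt:P279* wd:Q3305213 . }
"""

def qid_from_uri(uri):
    """Entity URIs look like http://www.wikidata.org/entity/Q12345."""
    return uri.rsplit("/", 1)[-1]

def fetch_painting_qids(query=PAINTING_QUERY):
    """Run the query against the Query Service and return bare QIDs."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    req = urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={"User-Agent": "painting-survey/0.1 (example)"},  # hypothetical UA
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [qid_from_uri(b["item"]["value"])
            for b in data["results"]["bindings"]]
```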
[Some barriers]: Wikidata doesn't like it when you fire off hundreds of thousands of requests in succession. Weird, right? That necessitated an adjusted plan of attack for the collection, retrieval, and storage of the Special:EntityData results. Since requesting all the claim information for each of the 566,444 items took a long time (I am talking DAYS), I made the decision to limit the claim collection to just 203,063 items (instances of "painting").
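The polite, throttled version of that retrieval loop might look like the sketch below. It assumes the real Special:EntityData URL pattern (`.../Special:EntityData/{QID}.json`, which returns a document with an `entities` key), but the function names, the one-second delay, and the User-Agent string are my own guesses at a reasonable setup, not the project's actual code.

```python
import json
import time
import urllib.request

def entity_url(qid):
    # Special:EntityData serves the full JSON document for one entity.
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"

def extract_claims(payload, qid):
    """Pull just the claims (property -> statements) out of the response."""
    return payload["entities"][qid].get("claims", {})

def fetch_claims(qids, delay=1.0):
    """Fetch claims one entity at a time, sleeping between requests so the
    server is not flooded. `delay` is a guessed-at polite pause in seconds."""
    claims_by_qid = {}
    for qid in qids:
        req = urllib.request.Request(
            entity_url(qid),
            headers={"User-Agent": "painting-survey/0.1 (example)"},  # hypothetical
        )
        with urllib.request.urlopen(req) as resp:
            claims_by_qid[qid] = extract_claims(json.load(resp), qid)
        time.sleep(delay)
    return claims_by_qid
```

Even at one request per second, 566,444 items works out to more than six days of wall-clock time, which is roughly the arithmetic behind capping the run at 203,063 items.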
[Conclusion]: This project has definitely served as my foray into programmatic inquiry. I look forward to further expanding this project (and skill set) to "instances of" other entities in the Wiki-verse, in the hopes of creating a utility for understanding current-state modeling activities (by popularity) within the linked open data realm.