Human and Python NGrams
By: Sydni Meyer
A working project that provides a foundation for generating word clouds from JSTOR Data for Research (DfR). The project is particularly interested in the potential for these kinds of agreements to foreclose academic freedom, and it uses keyword data from the Economics Department of George Mason University to track language shifts in published articles over time.

Using JSTOR Data for Research (https://docs.constellate.org/differences-from-jstors-data-for-research-dfr/) metadata spanning 1990-2021, I began extracting information about citation practices and keywords from the metadata documents. The initial metadata request and extraction began in early April, before DfR transitioned to a new data platform that provides NGram extraction rather than full citational information (https://docs.constellate.org/topic/about/). This working project sits in that transitional space, attempting a Pythonic recreation of NGram functionality.

This repository is more of a sandbox than a finished project. It is meant to provide foundational tools that help librarians and information professionals understand how to replicate data services provided by licensed resources. The files provided leave breadcrumbs for myself and other Python learners to test their human NGram capacity with Python, or at the very least to understand how to reproduce software capabilities in case they transition to a new form mid-project.
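For readers new to the idea, here is a minimal sketch of what recreating NGram functionality in Python can look like, using only the standard library. It is not the code in this repository: the function names (`tokenize`, `ngrams`, `ngram_counts`) and the sample sentence are invented for illustration, and the tokenizer is a deliberately simple assumption that real DfR metadata would likely outgrow.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    Assumption: letters, digits, and apostrophes are enough to define a word.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n=1):
    """Count how often each n-gram appears in the text."""
    return Counter(ngrams(tokenize(text), n))


# Hypothetical sample sentence, not taken from the DfR metadata.
sample = "Public choice theory and public choice economics"
bigram_counts = ngram_counts(sample, n=2)
print(bigram_counts.most_common(3))
```

Counts like these, gathered per year, are the kind of input a word cloud or a language-shift comparison could be built from.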