I chose to scrape the Brooklyn Historical Society's digital images collection for my final project. This seemed achievable because the site presents all search results in a table that looked easy to loop through.

My initial goal was to create a map in Google Maps with a URL for each image. However, by the end of class my code exists mostly in three separate pieces, and it takes some imagination to see how they fit together. All my code can be found on GitHub.

In bhs_webscrape1.py, each title is successfully gathered from each page, but the code keeps returning to the first page to get image URLs, so the URLs do not match the titles they are paired with. In bhs_find_regex.py, I used regular expressions to print a list of titles that include a street address starting with the # sign. Picture titles vary across the collection, but luckily for me, a set of photos from John Morrell gives the street address, when known, in one consistent format, for example '#1 Argyle Rd'.
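
To show the kind of matching involved, here is a minimal sketch rather than the actual contents of bhs_find_regex.py. The pattern, the function name extract_address, and every sample title except '#1 Argyle Rd' are assumptions made up for illustration.

```python
import re

# Assumed pattern: a '#' sign, a house number (optionally followed by a
# letter), whitespace, then the street name.
ADDRESS_RE = re.compile(r"#\s*(\d+[A-Za-z]?)\s+([A-Za-z0-9 .'-]+)")


def extract_address(title):
    """Return 'number street' if the title holds a '#'-style address, else None."""
    match = ADDRESS_RE.search(title)
    if match is None:
        return None
    number, street = match.groups()
    return f"{number} {street.strip()}"


# Hypothetical titles; only the first format appears in the write-up.
titles = [
    "#1 Argyle Rd",
    "Prospect Park, view of lake",
    "House at #25 Albemarle Rd",
]

addresses = [a for a in map(extract_address, titles) if a is not None]
print(addresses)
```

Titles without a # address simply drop out of the list, which is roughly what printing only the matching titles does.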

The file bhs_find_csv.py writes the original, erroneous JSON file to CSV. Since that file proved unusable, I ultimately copied the addresses printed by the regex script into a spreadsheet, cleaned up the data, and uploaded the spreadsheet to Google Maps. I added a few images manually, since linking images had been my original goal, but unfortunately the semester ended before I could figure out how to make everything work together.
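
For reference, converting a JSON list of records to CSV can be sketched as below. This is not the code in bhs_find_csv.py: the field names title and image_url, the helper name json_records_to_csv, and the example.org URL are all assumptions, and the sample writes to an in-memory buffer instead of a file.

```python
import csv
import io
import json


def json_records_to_csv(json_text, out_file, fieldnames):
    """Write a JSON array of objects to out_file as CSV; return the row count."""
    records = json.loads(json_text)
    # extrasaction="ignore" skips any keys that are not listed in fieldnames.
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return len(records)


# Hypothetical record; the real scraped JSON may be shaped differently.
sample = json.dumps(
    [{"title": "#1 Argyle Rd", "image_url": "https://example.org/img/1.jpg"}]
)

buffer = io.StringIO()
count = json_records_to_csv(sample, buffer, ["title", "image_url"])
print(buffer.getvalue())
```

Keeping each title and its image URL in the same record before writing is what would let the rows stay matched up, which is the step my scraper never got right.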

My map: https://www.google.com/maps/d/edit?mid=1gRhzeRS9dvgfnSJioIfsL1p5CNk&ll=40.661612171044524%2C-73.94808869999997&z=12