- Begin with one directory of JPEG image files and one CSV containing all metadata associated with the images
- Search the captions and keywords for specific strings that indicate themes with geographic implications, then count the assets in each potential collection to confirm that it holds enough photos to form a meaningful sub-collection
- The 6 sub-collections are: beaches, cities, forests, landmarks, mountains, parks
- Create separate CSVs of the metadata for each sub-collection
- Move or copy the images into a separate directory for each sub-collection, to check visually how well the images match the location keywords and to see what each collection looks like
- Identify any images not found during the copying process and list them by filename
- After the collections have been separated into their own directories, combine the sub-collection CSVs into one CSV covering all 6 collections
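The steps above could be sketched in Python roughly as follows. This is a minimal illustration, not the project's actual scripts: the column names (`filename`, `caption`, `keywords`), the search terms in `THEMES`, and all function names are assumptions made for the example. Matching is plain case-insensitive substring search, so a term such as "park" would also match "parking".

```python
import csv
import shutil
from pathlib import Path

# Hypothetical search terms for each sub-collection (assumed, not from the project).
THEMES = {
    "beaches": ["beach"],
    "cities": ["city", "cities", "skyline"],
    "forests": ["forest"],
    "landmarks": ["landmark", "monument"],
    "mountains": ["mountain"],
    "parks": ["park"],
}


def read_metadata(csv_path):
    """Load the metadata CSV into a list of dicts, one per image."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def match_themes(row, themes=THEMES):
    """Return the sub-collections whose terms appear in the caption or keywords."""
    text = f"{row.get('caption', '')} {row.get('keywords', '')}".lower()
    return [name for name, terms in themes.items() if any(t in text for t in terms)]


def split_metadata(rows, themes=THEMES):
    """Group rows by sub-collection; one image may land in several groups."""
    collections = {name: [] for name in themes}
    for row in rows:
        for name in match_themes(row, themes):
            collections[name].append(row)
    return collections


def copy_images(rows, src_dir, dest_dir):
    """Copy each row's image into dest_dir and return the filenames not found."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    missing = []
    for row in rows:
        src = Path(src_dir) / row["filename"]
        if src.is_file():
            shutil.copy2(src, dest / src.name)
        else:
            missing.append(row["filename"])
    return missing


def combine_collections(collections):
    """Merge the per-collection rows into one list, tagging each row with its collection."""
    return [
        {**row, "collection": name}
        for name, rows in collections.items()
        for row in rows
    ]


def write_csv(rows, csv_path):
    """Write a list of dicts to CSV; does nothing when the list is empty."""
    if not rows:
        return
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

Checking `len(rows)` for each group returned by `split_metadata` gives the asset count used to judge whether a theme is large enough to keep as a sub-collection.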
Data Cleaning: OpenRefine was used to clean up the location data, which was split into separate columns for city, state, and country. These values were originally supplied by the submitting photographers and contained a wide range of spelling errors, language discrepancies, and values entered in the wrong field.

Project Visualization (using Tableau):
- Use the Google Developer geocoding tool to obtain latitude and longitude data for the images from their location name strings
- Global map visualization of the image locations for each sub-collection, with the option to filter by collection to understand the breadth of each sub-collection as well as the breadth of all the geographic collections together. Points on the map are sized according to the number of filenames at the location and shaded according to collection subject.
- On hover, the viewer can see the location (city, state, country) and the number of images represented at that location.
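The data behind such a map could be prepared along the following lines. This is a sketch under stated assumptions rather than the project's own code: the column names (`collection`, `city`, `state`, `country`) and the function names are hypothetical, and the `geocode` argument is a stand-in for whatever geocoding service is used, assumed here to take a location string and return a `(lat, lng)` pair or `None`. Each distinct location string is looked up only once, and each output row carries the image count that would drive point size and the hover text.

```python
from collections import Counter


def location_string(row):
    """Join the non-empty city, state, and country values into one lookup string."""
    parts = [row.get("city", ""), row.get("state", ""), row.get("country", "")]
    return ", ".join(p.strip() for p in parts if p and p.strip())


def count_by_location(rows):
    """Count images per (collection, location) pair, skipping rows with no location."""
    counts = Counter()
    for row in rows:
        loc = location_string(row)
        if loc:
            counts[(row["collection"], loc)] += 1
    return counts


def build_map_rows(rows, geocode):
    """Return one row per (collection, location) with coordinates and image count.

    geocode is a hypothetical callable: str -> (lat, lng) or None.
    Locations that cannot be geocoded are left out of the result.
    """
    cache = {}  # location string -> coordinates, so each place is looked up once
    map_rows = []
    for (collection, loc), n in sorted(count_by_location(rows).items()):
        if loc not in cache:
            cache[loc] = geocode(loc)
        coords = cache[loc]
        if coords is None:
            continue
        lat, lng = coords
        map_rows.append(
            {
                "collection": collection,
                "location": loc,
                "lat": lat,
                "lng": lng,
                "image_count": n,
            }
        )
    return map_rows
```

Passing the lookup in as a function keeps the aggregation testable without network access or an API key; the resulting rows can be written to CSV and loaded into the visualization tool.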
Please Note: The images for this project are protected by copyright and the metadata is proprietary. The visualization may be used to explore this particular collection and the scripts may be used as a template for analyzing other image collections or similar projects.