Hacking the Humanities – Midterm Project

Introduction:

My project aimed to geolocate the photographs from the Charles W. Cushman Photograph Collection that were taken in the United States in 1960, and to visualize their metadata by genre: aerial photographs, architectural photographs, and cityscape photographs. My methods were data sorting and cleaning in Microsoft Excel, geolocation and mapping in ArcGIS Online, and elementary visualization of the cleaned data in Flourish. The results were a web map of the locations where the photographs were taken, with additional metadata shown when someone clicks on an individual marker, and graphs visualizing statistical relationships between some of the metadata.

Sources:

This section discusses the dataset used and the steps taken to clean it. Apart from the picture URLs, the dataset drew on the following sources as metrics to classify its data:

I removed most of the columns. For example, out of the 4 columns dedicated to cities, I kept only 1, and made similar changes to the 5 columns for street addresses, counties, countries, etc. I deleted the columns for other names, corporate names, and ID numbers of the photographs, and also deleted the extra genre and topical subject heading columns. In the end, my focus was on the columns that included a description of the photographs, the date they were added to the archive, the primary street address, county, state, and country the images were taken in, the primary genre of the photos, and the primary topical subject heading of the photos.
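The column pruning described above was done by hand in Excel; as a rough sketch, the same step could be scripted with Python's standard `csv` module. The header names (`City 1`, `Genre 1`, `Archive Date`, and so on), the ISO-style date format, and the sample rows are assumptions made for illustration and will not match the real export exactly.

```python
import csv
import io

# Hypothetical headers for illustration; the real export uses its own names.
KEEP = ["Description", "Archive Date", "City 1", "State",
        "Country 1", "Genre 1", "Image URL"]


def clean_rows(csv_text, keep=KEEP, country="United States", year="1960"):
    """Keep only the chosen columns, for rows matching the country and year."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cleaned = []
    for row in reader:
        # Skip photographs taken outside the chosen country.
        if row.get("Country 1", "").strip() != country:
            continue
        # Assumes dates look like "1960-03-12"; adjust for the real format.
        if not row.get("Archive Date", "").strip().startswith(year):
            continue
        # Drop every column that is not in the keep list.
        cleaned.append({col: row.get(col, "").strip() for col in keep})
    return cleaned


# Made-up sample rows, not real records from the collection.
SAMPLE = """Description,Archive Date,City 1,City 2,State,Country 1,Genre 1,Genre 2,Image URL
Golden Gate Bridge,1960-03-12,San Francisco,,California,United States,Cityscape photographs,,http://example.org/1
Street market,1960-05-01,Mexico City,,,Mexico,Cityscape photographs,,http://example.org/2
Old barn,1958-07-04,Posey,,Indiana,United States,Architectural photographs,,http://example.org/3
"""

rows = clean_rows(SAMPLE)
for row in rows:
    print(row["Description"], "|", row["State"], "|", row["Genre 1"])
```

Filtering and pruning in one pass means the duplicate city and genre columns never reach the mapping step, which is the part that was slow to do manually.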

Process:

I imported the cleaned and sorted CSV file into ArcGIS Online, picked a basemap, and symbolized the data with genre as the primary attribute and archive date as the secondary attribute. I found three stock images to symbolize the three genres: aerial photographs, architectural photographs, and cityscape photographs. The secondary attribute controls the relative size of the symbols: the largest symbols mark the photographs archived later in 1960, and the smallest mark those archived earlier in 1960. The labels of the points on the map are the descriptions of the images as received in the original metadata. If you click on an individual point, you can access more details about that image, including a URL to view it in the collection.
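Since the records carry no latitude or longitude values, where each point lands depends on the address text alone. One way to prepare that text, sketched here under assumptions, is to join the location columns into a single address string and skip any blanks. The field names are hypothetical, and whether a single combined field or separate fields works better depends on how the mapping tool is set up.

```python
# Hypothetical field names; substitute the headers from the cleaned CSV.
ADDRESS_FIELDS = ("Street Address", "City", "County", "State", "Country")


def single_line_address(record, fields=ADDRESS_FIELDS):
    """Join the non-empty location fields, most specific first."""
    parts = [record.get(field, "").strip() for field in fields]
    return ", ".join(part for part in parts if part)


# Made-up record with no street address or county.
example = {
    "Street Address": "",
    "City": "San Francisco",
    "County": "",
    "State": "California",
    "Country": "United States",
}
print(single_line_address(example))
```

A record with only a city and state will still be placed, but at city-level precision, which is one reason the points cannot be treated as exact camera positions.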

I also used Flourish to highlight certain relationships between data points. I used the state and genre columns to create a stacked bar chart that shows how many pictures of each genre were taken in each US state. Additionally, I tracked the most frequently occurring topical subject headings and noticed that they overlapped across genres, meaning, for example, that pictures of trees ended up categorized under both cityscape and architectural photographs. I entered the frequency of these topical subject headings under each genre and created a radar chart.
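Both charts rest on the same kind of count: how often each genre appears alongside a state or a subject heading. A minimal sketch of that tally in Python follows, producing one row per state (or heading) and one column per genre, which is the general shape a stacked bar or radar chart needs. The column names and sample rows are invented for illustration, and the exact layout Flourish expects should be checked against its templates.

```python
from collections import Counter


def cross_tab(rows, row_key, col_key):
    """Count rows per (row_key, col_key) pair and return a wide table."""
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    row_labels = sorted({label for label, _ in counts})
    col_labels = sorted({label for _, label in counts})
    table = [[row_key] + col_labels]
    for label in row_labels:
        # A Counter returns 0 for pairs that never occur.
        table.append([label] + [counts[(label, col)] for col in col_labels])
    return table


# Made-up rows, not real records from the collection.
sample = [
    {"State": "California", "Genre": "Cityscape photographs", "Heading": "Trees"},
    {"State": "California", "Genre": "Architectural photographs", "Heading": "Trees"},
    {"State": "California", "Genre": "Cityscape photographs", "Heading": "Bridges"},
    {"State": "Indiana", "Genre": "Aerial photographs", "Heading": "Rivers"},
]

by_state = cross_tab(sample, "State", "Genre")      # stacked bar chart input
by_heading = cross_tab(sample, "Heading", "Genre")  # radar chart input
for line in by_state:
    print(line)
```

In the heading table, any row with a nonzero count in more than one genre column is a heading that overlaps genres, as the tree photographs do in this sample.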

Presentation:

I decided to share my results on a WordPress website hosted on a subdomain. A direct link to my map can be found here:

https://carleton.maps.arcgis.com/apps/webappviewer/index.html?id=88fd46d8751649849d95e49a87f1b3d7

Below are the embedded views of my visualizations. The first one is the map of where all the photographs were taken in the US, with additional metadata on interaction with the plot points. The second one is the Stacked Bar Chart, and the third one is the Radar Chart.

Data Visualization 

Significance:

I realized while cleaning and visualizing the data that some of the metadata classification choices were ones I wouldn’t have expected. For example, images of cacti and taxicabs were classified as architectural photographs, and bridges and trees were classified as cityscape photographs. The classifications overlapped considerably, and the genre categories could have been more distinct. One pitfall is that I couldn’t precisely geolocate the photographs because the data has no latitude and longitude values. I also didn’t know the most efficient way of sorting and cleaning my data, and would love to learn ways of handling data that aren’t as time-consuming as my current strategies!