How can we best use huge amounts of data?

This week in class we are discussing data, data “tidying” and visualization, and data mining. We looked at theory and a variety of examples of how scholars have used amalgamations of huge data sets to reach conclusions and visualize trends. We noted that some of these examples were more successful than others, and as a whole the class seemed to reach a rather pessimistic conclusion: so what? What do these data sets really tell us that furthers our understanding? We looked at the example of organizing paintings by color. I wholeheartedly agreed with a classmate’s questioning of how useful a data set of 40,000 blue images could be. Sure, she argued, we could look at the spread of pigment geographically, iconography associated with the color, or a host of other topics, but does a massive collection of images really help a scholar on that quest? I also wasn’t convinced. What made me even more skeptical was something I hadn’t considered at all: the way these large data sets can be skewed. Professor Bauer brought up “color pollution,” the idea that the background color of a photograph is mined along with the color of the object itself. This means that many coins end up in black sets because of the black velvet drapes they are photographed against for collections, and that sculptures are often inaccurately tagged because of the wall color behind them. So, if we were to run with the hypothetical collection of 40,000 images of works that are mainly blue, not only is this huge collection perhaps not useful to me as an individual scholar trying to make a claim, but it may not even be accurate.
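To make “color pollution” concrete, here is a toy sketch in Python. It is only an illustration under stated assumptions: the tiny grid of letters standing in for a photograph, the `dominant_color` helper, and the idea of ignoring a known backdrop color are all invented for this example and do not describe how any real collection was tagged.

```python
from collections import Counter

def dominant_color(pixels, ignore=None):
    """Return the most common color label in a flat list of pixel labels.

    `ignore` is an optional set of labels to skip, e.g. a known backdrop.
    """
    counts = Counter(p for p in pixels if not ignore or p not in ignore)
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical 6x6 "photograph" of a gold coin on black velvet:
# 'K' = black backdrop, 'G' = gold coin. (Invented data.)
photo = [
    "KKKKKK",
    "KKGGKK",
    "KGGGGK",
    "KGGGGK",
    "KKGGKK",
    "KKKKKK",
]
pixels = [p for row in photo for p in row]

# Counting every pixel lets the backdrop decide the tag.
naive = dominant_color(pixels)

# Skipping the known backdrop color tags the object itself.
masked = dominant_color(pixels, ignore={"K"})

print(naive, masked)
```

In this made-up photo the backdrop covers 24 of the 36 pixels, so the naive count files a gold coin under black. Real photographs are much messier, since the backdrop is rarely a single known color.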

Data mining is also used to identify trends in textual sources. Dan Cohen’s “Searching for the Victorians” is a great example of this, but it also raises the “so what” question from skeptics. Cohen and his fellow researchers were able to code over a million books (!!) thanks to widespread digitization of Victorian-era literature by projects like Google Books and HathiTrust. Below is a graph of the number of books that reference “Revolution” in their titles (for now, only titles are analyzed, but analysis of full text is in the pipeline for the project):

graph of the frequency of the word "revolution" in the titles of victorian books, frequency goes down overall except for a spike around 1848
Graph showing the frequency of the word “Revolution” in the title of Victorian books from Dan Cohen’s “Searching for the Victorians”
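The kind of count that sits behind a graph like this can be sketched in a few lines of Python. This is a minimal illustration, not Cohen’s actual method: the `title_frequency` function is hypothetical, and the sample titles and years are invented for the example rather than drawn from his data.

```python
from collections import defaultdict

def title_frequency(records, word):
    """Share of titles per year that contain `word` (case-insensitive)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, title in records:
        totals[year] += 1
        # Compare whole words so "revolutionary" does not count as "revolution".
        words = {w.strip('.,;:!?"\'()').lower() for w in title.split()}
        if word.lower() in words:
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}

# Invented sample records of (year, title); NOT from Cohen's data set.
records = [
    (1847, "A Treatise on Gardening"),
    (1847, "Sermons for the Year"),
    (1848, "The Revolution in Paris"),
    (1848, "Thoughts on Revolution and Reform"),
    (1849, "A History of the Late Revolution"),
    (1849, "Poems"),
    (1849, "Travels in Italy"),
    (1849, "Lectures on Chemistry"),
]

freq = title_frequency(records, "revolution")
print(freq)
```

Dividing by the number of titles published each year matters here: a raw count would rise simply because more books were printed later in the century.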

The graph is interesting in that it lets us see how much revolution (and therefore perhaps political in/stability and social unrest) was present in the consciousness of society. The spike in the middle of the graph draws the viewer’s attention, but any historian would immediately know that this spike coincides with the revolutions of 1848, in France and across Europe, about which a great deal was published and discussed. So again, you may be left with the question, “So what? What does this actually tell us?” In fact, some commenters asked just that in regard to Cohen’s post.

I don’t mean to be pessimistic about the use of data in the humanities; I think there is huge potential to incorporate it into research in art history and beyond. Returning to Cohen’s revolution example, I actually think there is value in simply visualizing trends. Being able to look not only at a small sample but at virtually all examples of Victorian literature, and to plot trends in the words used, can show the general attitude of the population and what it considered important. Sometimes just showing data and trends is as valuable to scholarship as distinct arguments.

Forensic Architecture at the Whitney Biennial as Another Case Study

image of rubble with colored blobs that overlay possible teargas canisters to help train the AI software
Film still from “Triple Chaser” by Forensic Architecture on view at the 2019 Whitney Biennial

I want to shift back towards collecting and mining images for a brief discussion of the piece by Forensic Architecture included in this year’s Whitney Biennial. Forensic Architecture is an agency of about 20 full-time researchers, filmmakers, and technologists, along with a team of fellows, that investigates global violence, corruption, and conflict. They provide an interesting example of the ways in which image recognition and data amalgamation can be useful: as a journalistic pursuit (they try to showcase the role of a Whitney board member in profiting from violence), as a tool to recognize very different images and sort through huge sets of them, but also simply to create art (they are exhibiting in the Whitney Biennial after all!).

Forensic Architecture has enlisted artists, filmmakers, writers, data analysts, technologists, and academics in an intensively collaborative process. Maps and digital animations often play a critical role in the group’s work, allowing for painstaking recreations of shootings and disasters, and images are often culled from social media and scrutinized for information. Forensic Architecture’s work suggests a union of institutional critique and post-internet aesthetics, and it exists in many forms. On the group’s website, it lives as design-heavy interactive presentations. In museums, their work takes the form of installations dense with videos, diagrams, and elements of sound.

Alex Greenberger, “For Whitney Biennial, One Participant Targets Controversial Whitney Patron”

I encourage you to look more into how Forensic Architecture made the video that was on display at the Whitney, which came out of a larger project, because my limited understanding of the machine learning processes behind it hinders my ability to talk insightfully about the piece. Very simply, however, Forensic Architecture trained AI to identify images of Safariland tear gas canisters. In order to train image recognition software you need A LOT of images; it’s one of the major barriers to use. To get around this, they crowdsourced images of the canisters (and received a disturbing number from activists around the world). They then placed these canisters against various backgrounds and repositioned them at various angles to help train the software further. Again, this hugely simplifies the process, and the video that they produced, which was displayed at the museum, goes through it in much better detail.
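The general idea of multiplying a few examples into many training images, by placing an object on different backgrounds at different angles and positions, can be sketched with a toy example. To be clear, this is not Forensic Architecture’s pipeline: the text-grid “images,” the `rotate90`, `paste`, and `synthesize` helpers, and the sample data are all hypothetical stand-ins chosen so the sketch runs without any image library.

```python
def rotate90(patch):
    """Rotate a 2D grid (list of equal-length strings) 90 degrees clockwise."""
    return ["".join(row[c] for row in reversed(patch))
            for c in range(len(patch[0]))]

def paste(background, patch, top, left, transparent="."):
    """Return a copy of `background` with `patch` pasted at (top, left)."""
    out = [list(row) for row in background]
    for r, row in enumerate(patch):
        for c, ch in enumerate(row):
            if ch != transparent:  # transparent cells let the background show
                out[top + r][left + c] = ch
    return ["".join(row) for row in out]

def synthesize(patch, backgrounds, positions):
    """Yield (image, bounding_box) pairs for every combination.

    The bounding box is (top, left, height, width): the label a detector
    would be trained to predict.
    """
    variants = [patch]
    for _ in range(3):  # the original plus three further quarter-turns
        variants.append(rotate90(variants[-1]))
    for bg in backgrounds:
        for variant in variants:
            for top, left in positions:
                box = (top, left, len(variant), len(variant[0]))
                yield paste(bg, variant, top, left), box

# Invented data: a 1x3 "canister" ('T' marks its cap) and two 4x4 backgrounds.
canister = ["TCC"]
backgrounds = [["~~~~"] * 4, ["#-#-", "-#-#", "#-#-", "-#-#"]]
positions = [(0, 0), (1, 1)]

samples = list(synthesize(canister, backgrounds, positions))
print(len(samples))
for line in samples[0][0]:
    print(line)
```

One object, two backgrounds, four orientations, and two positions already give sixteen labeled examples, which hints at why this approach helps when real photographs are scarce.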

I bring up this example both because I think it’s an amazing, incredibly thought-provoking work of art and because this sort of image recognition training is how I can envision using large amounts of data most effectively. I can see how useful it would be to identify objects (like a tear gas canister) or symbols and then train machines to find them in huge collections of images. On a grand scale this could show cross-cultural connections if we see objects or symbols in use across large geographic or temporal divides, but in a logistical sense it could also help viewers make sense of blurry or degrading images that the human eye may have trouble discerning.

I know in my own work when I look at colonial photographs, many photographers used the same props in multiple photos in order to create “authentic” portraits that satisfied what the colonists envisioned of the “primitives” they controlled. Using image recognition, I could potentially find all the instances in which a certain prop (or type of prop) was used and use this to highlight the fictitious nature of these photographs. Perhaps with the current state of machine learning this wouldn’t be possible (after all, I would need a huge data set to train the machine), but as opposed to some of the examples we looked at in class, this type of image recognition data project may help us answer that nagging “so what” question. I’m not sure I’ll ever be able to code this type of software, although I could definitely find wonderful scholars to collaborate with. Perhaps text data would be most useful and realistic for me. I could easily chart biographical data of subjects or photographers using the basic Excel skills I already have, or use existing text mining software to go through records and pull out relevant information for my research. I’ve been the intern who has to “tidy” this type of data for projects before, so I am used to the work that goes into amassing data in a form that is useful for these tools. Although I have not used text-mining services in the past, I would love to work with these tools in the future, as they would greatly improve my ability to get through vast archives of information. Perhaps these text-based approaches are a better place for me to start as an amateur digital art historian.

5 thoughts on “How can we best use huge amounts of data?”

  1. Taylor Barrett says:

    Michelle, thank you for sharing the Forensic Architecture work; one of my favorite parts of everyone’s blog posts is coming into contact with resources, artworks and scholarship that I otherwise would never have known about. I also appreciate the Forensic Architecture piece because it shows how artists are already using (and are well-versed in) technologies that art historians are starting to adopt. Since art historians must understand not only the cultural context of a work but also how it was made, I enjoy the idea that we can learn more from the artists we study about tools that could actually help our scholarship as well. That being said, I also share your reservations about tackling huge image data mining projects due to my relative lack of technological prowess. I do wonder, though: as technological advancements move FAST, does this mean that the field might be on the edge of a big shift towards the digital? (Right now, I’m thinking of a string of comments on one of Taylor H’s posts about his mapping project.) While this may not be the immediate future of the field, I think we could be moving in that direction faster than any of us are able to realize. In that vein, I think it’s important for us to stop worrying that we lack proficient technological knowledge and take a leap of faith: learn new skills, find people who want to collaborate with us, and integrate these technologies more into our own work. What would happen if we started to push back in our seminars and conducted mini digital projects? (I say ‘mini’ because we all know how time consuming and labor intensive a DAH project can be). Maybe I’m in a radical mood as I write, but I think we should be more excited and less reserved when it comes to the potential for data mining in our field… now, back to yelling at my computer because I don’t know how it works 😉

  2. Michelle, I enjoyed your very thorough post on our readings this week, and I agree; I too was wondering, what would I do with 40,000 blue paintings? On the one hand, I view digital technology and its ability to handle vast amounts of data as something exciting, but on the other hand, I tend to be stumped when it comes to practical applications. Thanks for bringing the Forensic Architecture project to our attention; it is a fascinating venture. I was surprised to learn that their investigation into the use of tear gas and bullets discovered that they were manufactured by companies led by Warren Kanders, who is also a Whitney Museum vice chair. The Forensic Architecture group shared its findings with the European Center for Constitutional and Human Rights (ECCHR). Eyal Weizman, the director of Forensic Architecture, said, “The controversy surrounding the 2019 Whitney Biennial presented us with a challenge that unites two of the fields within which we operate: human rights and culture. When arms traders support culture, they end up being, in return, reputationally supported by cultural and symbolic capital. We decided to use the platform of the Whitney Biennial to invert this economy through these investigations.” When Forensic Architecture was training the AI to look for the Triple Chaser tear gas grenades, they placed the canisters against visually cluttered and colorful backgrounds, so the AI would be able to locate the canisters regardless of the background of the photos. Forensic Architecture is a powerful statement on what digital technology can do. It’s also a compelling statement on the ability to conduct research via the internet, as many volunteers around the world sent in photos and locations when they found the tear gas canisters. I think your suggestion about how to use this technology in terms of showing cross-cultural connections sounds promising.
The first stumbling block in digital art history is that to conduct this type of research would require a large team of people who are knowledgeable and skilled in the use of this technology. The next would be funding for a project of this scope and magnitude. How do you get art historians interested in digital art history? How do you show them that learning all these tools and skills will be worth the time and effort involved? How will they maintain and update these skills? You point out in your post that as an intern, you tidied data. This is very true. Interns are used to do the tasks that others don’t want, or don’t know how to do. I understand the economic realities of the situation, but if technological skills are perpetually farmed out to interns, then nothing changes. For me, the more I learn about digital technology, the more I realize that it is a skill that must be practiced every day, like playing a musical instrument. Unless art historians are going to suddenly be required to use specific software, or digital tools on a regular basis, it seems impossible for me to gain the technical savvy and knowledge to be proficient enough to design a project similar in scope to Forensic Architecture.

  3. Thanks for sharing, Michelle; this is incredibly interesting and such a great tie-in to the real world. You note your lack of insight into machine learning, but let me just say very few know how machine learning works, and even then part of the appeal of big data and machine learning is that the researcher doesn’t even really need to know how the computer is doing what it’s doing. Of course this can result in researchers overlooking bias, because they think the computer is neutral. As we saw in the example of the Trevor Paglen piece, the examples that the computer learns from can be biased, thus creating biased algorithms. I think you’re right, though, that this kind of technology would be insanely useful in art history study, and I love the example that you bring up. I wonder if you would be able to track studios based on specific props? I know for me, in an ideal world with infinite time, knowledge, and money, it would be interesting to compare still life paintings, portraits, and inventory records to see if there is any overlap in a painter’s oeuvre, maybe to elucidate the way the objects were being used. Alas, alack, I think it might be out of my skill range and pay grade for quite a while. Thanks again for sharing!

  4. Michelle,
    I appreciate your candid honesty in acknowledging the “so what” question. I think your one example from the Victorian literature gets at a specific point: that just taking the time to visualize and chart the data gives legitimacy to arguments which may have previously been based on several anecdotal pieces of supporting evidence. While these data mining tools are certainly very flawed, there could be ways to use them to this end if there were fewer variables. Perhaps one way to truly address the “so what” question is to consider if we are trying to answer the wrong questions. The issue of looking at the spread of pigment, for example, would be just as flawed even if it were not a digital humanities project, but a more traditional one. Sometimes, the questions we are trying to answer or the evidence we are seeking will be flawed regardless, and so I think we really do need to take a close look at our own expectations. Digital humanities does not fix flawed research. Have you also thought of ways that digital humanities data mining could be used not for image analysis, but for something else? For historical, cultural, socioeconomic data relevant to a research project? The benefits of text mining are going to be clearer when actually dealing with text, as opposed to judging text mining purely from an image-based approach. I liked your suggestions for very specific uses of AI for image recognition and possibilities for cross-cultural connections, for instance. I’m happy to see that in spite of the seemingly pessimistic view some have of data mining, you are still hopeful for its potential in your work. Thanks for a thought-provoking post!
