Why we care
For the second consecutive year, the writers here at Swampflix have been attempting to complete the #52FilmsByWomen challenge posed to us by the organization Women in Film. The pledge is simple enough: to try, over the course of one year, to watch the equivalent of one film per week by a female director or female writer. As a staff member of a library, I started to wonder which films in our own collection qualify, and how I could find that out. A team of several colleagues, including my co-author Rachel Tillay and supervisor Lisa Hooper, formed to answer this question about our own collection, with the aim of creating a tool that would allow other institutions to analyze their own holdings in the same way.
The Representation Problem
Recently, students of film and film arts have begun to ask whether the creators of film accurately reflect the human record. Studies such as “Inequality in 800 Popular Films: Examining Portrayals of Gender, Race/Ethnicity, LGBT, and Disability from 2007-2015” have explored who creates films and whether those films accurately represent the human condition. Interest in the unequal rates at which women fill various positions has been particularly acute. Women in Film found that “women comprised 11% of all directors working on the top 250 films of 2017.” Women are slightly more likely to be involved in other parts of the creative process. For example, “overall, women accounted for 16% of all directors, writers, executive producers, producers, editors, and cinematographers working on the top 100 films. Women fared best as producers (24%), followed by executive producers (15%), editors (14%), writers (10%), directors (8%), and cinematographers (2%).” These studies all point to the importance of further examining the factors that lead to inequality in hiring and funding practices in the movie business.
The Data Cycle and Libraries
While the factors that lead to inequality in the creation of film are being examined, the role discrimination plays in other portions of the data life cycle has not been. The data life cycle is the process that occurs between the creation of a film and the inclusion of that film as part of the inspiration for a new film. This is the work of libraries, archives, museums, and other cultural heritage institutions. For example, libraries acquire films for their collections, describe the films (also known as creating metadata or cataloging), and store the content for the long term. Specifically, two of the most common, long-standing characteristics of libraries are that they are a “collection [of] what is deemed to be important information” and that they “preserve the information for future users.” [Evans, G. Edward, and Margaret Zarnosky Saponaro. Collection Management Basics, sixth edition, p. 2]
Nevertheless, libraries are not living up to their own ideals of properly recording the wealth of diversity present in modern culture. In fact, one of the topics discussed passionately at recent conferences (such as ALA 2018, held this past June in New Orleans) is how these organizations can work to increase inclusion in their own organizations and the wider community, preserve the record of oppressed peoples, and correct past practices that suppressed the knowledge and values of minorities. In this context, the question about diversity in film becomes, “Is the work of a diverse population being acquired, described, and preserved by historical institutions?” When libraries acquire film and make it available for loan, we are supporting the status quo if we collect more films by men, describe them more accurately, loan them out more often, and save more of them for future watchers. Additionally, libraries that exist on the margins often struggle to protect the collections they’re preserving. As an example of the scale of the loss, the sample collection of data we are examining begins with DVDs bought in 2005. All other DVDs owned by Howard-Tilton Memorial Library, along with a number of other items, were lost when ten feet of water filled the bottom floors of the library during Hurricane Katrina.
If, however, we can begin to correct this bias by collecting, describing, loaning, and preserving more films by women or other under-represented groups, we are participating in creating a more accurate version of the historical record and succeeding in our mission, as well as providing a more equitable set of data from which new films will draw for their inspiration.
A New Tool
For this reason, Howard-Tilton Memorial Library has begun to explore our own collections and is developing a free tool that will allow other preservers of the historical record to examine their own collections to answer these questions. Our initial project has been to examine what percentage of our DVD collection was directed by women and what percentage of the directors whose work we have collected are women.
This project was more difficult than desired because only recently have library metadata (or catalog) records for DVDs been allowed to incorporate demographic data about the creators, and records created by libraries around the world rarely include this data. Unfortunately, in the complex calculus of balancing broad coverage of all materials against detailed records for each one, many new fields like those for demographic data are often ignored. Additionally, the terminology that should be used in demographic fields is still in development. Catalogers and metadata librarians are exploring how to describe gender in sensitive and accurate ways. The terminology must encompass cis and trans, male, female, and gender non-conforming identities. It must be useful for grouping and analyzing large sets of data, be relatively stable, and be extensible as terminology changes over time.
Fortunately for our purposes, cataloging records almost always carefully note the agents associated with the creation and dissemination of each object. The names are recorded according to a very detailed set of predictable rules, many creators of multiple works are assigned their own name format to distinguish them from people who have the same name, and the names appear in the same place in every record. Many records also use terminology or codes that describe the role each person played. We were also able to harvest into our dataset lists of female directors from Wikipedia’s female directors list, Annenberg Inclusion Initiative’s Inclusion in the Director’s Chair, and Collider.com’s The Most Exciting Female Directors Working Today. We created Python scripts and regular expressions that convert the most common name structures in library data (inverted names, often followed by dates or other identifying information) into direct order (First Name Last Name). We documented the process we used for creating and applying these so that others can recreate or extend our work. Finally, we compared the imperfect lists that resulted. We were disappointed to realize that only a bit more than 4% of our DVDs have female directors. We are hopeful that the percentage will increase as we add missing names to our data. However, we are also going to put more effort into acquiring films with female directors in an attempt to create a more representative collection.
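For readers curious what the name conversion and comparison might look like, here is a minimal sketch in Python. It is not our project code: the function names (to_direct_order, normalize, share_matching), the regular expression, and the sample headings are all assumptions made for illustration, and real catalog data has many more edge cases (single names, corporate bodies, titles of nobility) than this handles.

```python
import re
import unicodedata

# Assumed pattern for an inverted heading such as "Bigelow, Kathryn, 1951-".
# The surname is everything before the first comma; the forename runs until
# the next comma, digit, or parenthesis, so trailing dates and role terms
# are simply left out of the match.
INVERTED = re.compile(r"^\s*(?P<last>[^,]+),\s*(?P<first>[^,\d(]+)")


def to_direct_order(heading: str) -> str:
    """Convert 'Last, First, dates...' to 'First Last' (illustrative only)."""
    match = INVERTED.match(heading)
    if match is None:
        # No comma found: leave single names untouched.
        return heading.strip()
    last = match.group("last").strip()
    first = match.group("first").strip()
    # Drop a sentence-ending period, but keep one that follows an initial.
    first = re.sub(r"(?<=\w\w)\.$", "", first)
    return f"{first} {last}"


def normalize(name: str) -> str:
    """Make names comparable: one Unicode form, single spaces, no case."""
    name = unicodedata.normalize("NFC", name)
    return " ".join(name.split()).casefold()


def share_matching(headings, known_names) -> float:
    """Fraction of headings whose direct-order name is in known_names."""
    known = {normalize(n) for n in known_names}
    names = [to_direct_order(h) for h in headings]
    if not names:
        return 0.0
    hits = sum(1 for n in names if normalize(n) in known)
    return hits / len(names)


if __name__ == "__main__":
    # Hypothetical sample data, not drawn from our collection.
    sample_headings = [
        "Bigelow, Kathryn, 1951-",
        "Scorsese, Martin, 1942-",
        "Varda, Agnès, 1928-2019",
        "Nolan, Christopher, 1970-",
    ]
    sample_women = ["Kathryn Bigelow", "agnès varda"]
    print(f"{share_matching(sample_headings, sample_women):.1%}")
```

Exact matching after normalization keeps the sketch easy to audit, but it will miss spelling variants and directors absent from the harvested lists, which is one reason the resulting percentage should be treated as a lower bound.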
We invite you to participate in this work! Ways you can participate include:
- Contributing to lists of creators on Wikipedia who belong to under-represented groups.
- Examining your collections, or collections you have data for. (Spoiler alert: it would take some effort, but nearly all libraries have provided some information about their holdings publicly online.) Because our code is available for free online, you can reuse it as well!
- Checking our work! Is there something obvious we’re missing? If you find something we should take into account, you can even submit suggestions through GitHub and we would love to add them in!
-Rachel Tillay & CC Chapman