Digital humanities, medievalism, and the importance of errors

Two weeks ago, I attended the Middle Ages in the Modern World conference in Lincoln, where I gave a talk on medieval vs. ‘medieval’ names (see the recap if you haven’t already read it). The conference included an extremely interesting paper by Bridget Ruth Whearty (Stanford), “Of Scribes and Digitizers: Modern Digitization Studio as Medieval Scriptorium”, in which she gave the audience an overview of the making of a digital medieval manuscript, in particular the painstaking processes that are in place to ensure that every single digital copy of every single manuscript leaf is presented in conditions identical to every other one. The lighting of the room is controlled, the camera is perfectly placed, there are constraints on what the technicians can wear while the digitization is taking place, etc. The same leaf can be photographed multiple times until an error-free version is obtained. The result is as close to the original as you can get, perfect in presentation, and also perfectly anonymous. Both of these facts gave rise to critical discussion during the Q&A.

In particular, it was pointed out that, for a group of people who claim to understand and respect the importance of history, we can be remarkably bad at keeping and documenting our own history, in this case the ‘history’ that is created in the context of digital humanities or digital medievalist projects, such as the digitization of a book. By concentrating on the production of a ‘perfect’ piece, the process by which the piece is created is erased: The end product has no record of the images that were ultimately discarded, the ones where the light was wrong, or the folio misaligned, or where a reflection of one of the technicians is visible, or even something more tangible such as a hand or an arm. People in the 21st C interested in manuscripts are often deeply interested in their production: how were they made? By how many people? In what order? What were the tools and techniques used? People dedicate their academic careers to answering these questions: So why don’t we provide the answers to the same questions concerning the production of a digital copy of a manuscript?

Another worry is that focusing on the production of a perfect end product can (though it is important to note that it doesn’t have to) result in the erasure not only of the errors along the way to perfection but also of the people who participated in the production. This erasure is even more problematic than the other, because of who it is that is doing the work in these projects: Often it is junior researchers, sometimes even undergrads, and more often than not these junior people are women. When only the name of the PI is attached to the finished product, and not the names of all the people whose work produced it, these already often marginalized people are further marginalized by their erasure from the history of the digital MS. Again, historians often spend lifetimes seeking to identify particular scribes: is hand X the same as hand Y? Do we know the name of the person who wrote this MS? When such questions are answered, it is a cause for celebration. So why, the question was raised during the Q&A on Whearty’s presentation, are we so eager to remove from the modern context the very information that we seek so ardently in the medieval context?

These are all important questions that I think anyone working on a digital humanities/digital medievalist/digital classicist project needs to keep in mind: it is not only important to use these techniques to shed light on the history of the past, but also important to make sure we don’t extinguish the light on the history of the present. I’ll admit, I listened to Whearty’s presentation and the ensuing discussion with some amount of surprise (not ever having seen the ins-and-outs of the procedure of digitization before), because it showed me that the Dictionary has been developed to work in a very different sort of context. Two of the guiding factors in the development of our technical infrastructure have been, from the very beginning:

  • Document EVERYTHING
  • Attribute EVERYTHING

Every single piece of information is, from the moment the file it’s in is created, deposited in a version control repository, specifically, one hosted on GitHub (it’s a private repository, though anyone who would like the code, as opposed to the data, can obtain it by asking). Every single change, no matter how small, to any file is documented: What was changed, who changed it, when it was changed, and why. Every file is reviewed by hand before it is marked for inclusion in an upcoming edition, and we keep track of who marks an entry as ‘live’, and when. When errors are identified, they are corrected, and again everything is documented: Was the error a typo? Was it a misidentification of the canonical name form? Was it an identification of a medieval place name with its modern correspondent? Was it the addition of further information? Was it the follow-up of a ‘todo’ that had previously been put into the file? By keeping track of all this information, we are able to pinpoint specifically when and how errors entered the data (because we know that there will be errors: no one is perfect, not even us!), and this gives us a means by which we can provide quality control over our processes.
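The four questions above (what, who, when, why) correspond closely to what git records for every commit. As a minimal sketch, and not the Dictionary’s actual tooling, the history of a single file could be read back along these lines. The function names and the choice of field separator are assumptions made purely for illustration.

```python
import subprocess
from typing import NamedTuple

# Fields are joined with the ASCII unit separator (written %x1f in git's
# format syntax), which is unlikely to occur in names or commit subjects.
SEP = "\x1f"
# %h = short commit hash (what), %an = author name (who),
# %aI = author date in strict ISO 8601 (when), %s = subject line (why).
LOG_FORMAT = "%h%x1f%an%x1f%aI%x1f%s"


class Change(NamedTuple):
    commit: str
    who: str
    when: str
    why: str


def parse_log(text: str) -> list[Change]:
    """Turn raw `git log` output (one commit per line) into Change records."""
    changes = []
    for line in text.split("\n"):
        if not line:
            continue
        commit, who, when, why = line.split(SEP, 3)
        changes.append(Change(commit, who, when, why))
    return changes


def file_history(path: str) -> list[Change]:
    """Return the recorded changes to one file, newest first."""
    result = subprocess.run(
        ["git", "log", "--follow", f"--format={LOG_FORMAT}", "--", path],
        capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)
```

Calling file_history on the path of an entry file, from inside the repository, would list each change to that file together with its author, date, and stated reason. The “why” is only as good as the commit message, which is one reason to insist on meaningful messages.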

This means that none of our history is lost. If at one point we have a variant name form identified with one canonical name form, and later on realize this was in error and update it, we not only have the updated version but also the chain of documentation that led us to that version. This is particularly important for anyone who might question “Why is X a form of Z rather than of Y?” or for any errors in published versions that are noted by our readers and corrected in later versions. The latter point is crucial. By publishing new editions on a quarterly basis, we give up some amount of the stability that ordinary books have: Information in an edition that someone cited in one month might not be there a few months later if an error has been identified. This may make some think that the Dictionary is not useful as a stable source to be cited, but in fact, there is no reason to worry: Because we document everything, no information is ever lost. When entries are updated in new editions, the old versions do not disappear; they are simply moved to the archive and noted as such (cf., e.g., the archive of the 2015 no. 1 edition). Thus, researchers can cite any version of the Dictionary confident that even if the data they cite gets superseded, it never gets removed or destroyed.

So much for “document EVERYTHING”. The other guiding factor is “attribute EVERYTHING”. Documenting everything makes it easy to determine mechanically who contributed to an entry, either by writing parts of the main entry, or entering variant name forms, or by reviewing and correcting typos, or even by fixing XML errors. I decided early on that I did not want to be in the business of deciding whose contributions were “important enough” to warrant including them in the citation, because the truth is that every contribution was necessary before that particular entry could be published. This is why every single entry has its own, individual “how to cite” instructions at the bottom, e.g.:

example citation

The author lists for these are created automatically by extracting the authors of the git commits that touched the relevant files for each entry. By including everyone who contributed, we ensure that while we are documenting medieval history, we are documenting our own history as well.
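The extraction described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Dictionary’s actual script: the function names are hypothetical, and listing contributors in order of their first contribution is a choice made here for the example, not something the citation format prescribes.

```python
import subprocess


def unique_authors(names: list[str]) -> list[str]:
    """Deduplicate author names, keeping the order in which they first appear."""
    seen = set()
    ordered = []
    for name in names:
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered


def entry_authors(paths: list[str]) -> list[str]:
    """List everyone who has committed to any of the files behind one entry."""
    result = subprocess.run(
        # --reverse lists the oldest commit first; %aN is the author name
        # (after applying .mailmap, if the repository has one).
        ["git", "log", "--reverse", "--format=%aN", "--", *paths],
        capture_output=True, text=True, check=True,
    )
    return unique_authors(result.stdout.splitlines())
```

Because the list is derived from the commit history and not maintained by hand, nobody has to decide whose contribution counts: anyone who committed a change to one of the files appears in the result.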

The field of digital humanities is still relatively new, and for the most part people active in it are still working out what exactly the field is, and how it should function. This means it is still early enough to make these important aspects – documenting the process, not just producing the outcome; and giving credit where credit is due – part of our basic operating principles. If your project doesn’t already operate along similar lines, I’d like to charge you to consider why, and whether this should change.


4 responses to “Digital humanities, medievalism, and the importance of errors”

  1. A couple of thoughts and a couple of questions:

    Do you know Julian’s book, Error, Misuse, Failure? Or Seth Lerer’s Error and the Academic Self: The Scholarly Imagination, Medieval to Modern ?
    “Was the error a typo? ”
    Can you ever be sure you can tell the difference between a Freudian slip and a typo? See the last part of Derrida’s The Post Card for a fascinating discussion of a correction made to a “typo” attributed to Lacan.
    “This means that none of our history is lost.” Well, history in the vulgar sense of a timeline. But history as recursive or repetitive? Your history remains squarely in institutional norms of property and attribution. Who did what is a simple matter of ID. But of course people could use flash names, do prank digitizations with fake errorz. Software doesn’t automatically authenticate anything.

    • Sometimes a typo is just a typo, and not a Freudian slip, for example, when a key sticks and thus does not register that it has been struck, resulting in a missing letter; or when the person clearly began typing before the form was fully loaded, resulting in only the final letters of the name being recorded rather than the full name. Typing “Swedish” for “Sweden” might be a Freudian slip, but it’s an innocuous one and easy to fix.

      I’m not sure what you mean by “flash names” or “prank digitizations”. There are only 8 people with account credentials to enter and edit data directly into the Dictionary’s database. Should I find any evidence either that an account has been hacked or that the editorial assistant associated with the account is introducing deliberate errors, that account will be suspended immediately. The only software authentication that occurs is:

      * ensuring the XML entered via the edit form is well-formed and matches the XML schemata we’ve defined.
      * ensuring that all the required fields are filled in.
      * ensuring that the CNF (header name) and bibkey fields are filled in with values from a designated list of values.

      All other authentication of data is done by hand.

  2. Sorry, I meant Julian Yates.

  3. Pingback: 2015 wrap-up | Dictionary of Medieval Names from European Sources
