Category Archives: technical

Outage update + Mystery Monday: Wurgitan

Every Monday we will post an entry that hasn’t yet been published with a view towards harnessing the collective onomastic power of the internet. If you have any thoughts about the name’s origin, other variants it might be related to, other examples of its use, etc., please share them in the comments! If you wish to browse other Mystery Monday names, there is an index.

But first! A huge shout-out to our technical guru, Dr. Joel Uckelman, who got the site back up and running again Saturday evening. What should have been a simple server upgrade turned into a whole row of dominoes collapsing. First, it turned out that our original hosting service was not equipped to handle the upgrade, and the first we knew of this was when we tried rebooting the machine and it wouldn’t. We switched providers, and he set up an entirely new virtual machine and server, only to find that when he tried to restore all the data via rsync, the connection kept getting dropped after 10-15 seconds, making it completely impossible to rebuild the site. After a couple of rounds with customer service, which regularly got escalated up to the next level, it became clear that (a) it was a network issue on their end, not our end, and (b) they weren’t interested in doing the legwork to find out what the issue was and fix it. So, bye-bye hoster 2, on to hoster 3. He set up a new virtual machine Friday night, and thankfully by the end of Saturday we were able to have the entire site restored. If you’ve ever benefited from the DMNES and would like a way to say thanks, feel free to buy him a beer or a coffee if you’re ever in the area. The hard work of the editorial team would be nothing without the technical infrastructure to make the data available to the world.

Today’s Mystery Monday name is from the Redon cartularies, a dithematic Breton name where we’ve identified the prototheme but not the deuterotheme:

Wurgitan

Our resources for Breton names are, unfortunately, rather limited; so if an element or name doesn’t appear in what we have, we’re generally pretty much at a loss. If any of you, dear readers, have better Breton resources than we do, we’d love to know what you have to say about this name! Please share in the comments.

Filed under announcements, crowd-sourcing, dictionary entries, mystery monday, technical

DMNES outage: longer than planned

Unfortunately, what should’ve been an easily fixed upgrade problem has snowballed. We’re troubleshooting with the company that hosts the virtual machine running the website, because a networking problem is preventing us from restoring from backup. It’s not clear whether they’ll be able to figure out what the networking problem is, or whether we’ll have to switch to a different hosting company.

We hope it won’t be too much longer before we’re back up!

Filed under announcements, technical

DMNES temporarily down

After a server upgrade this afternoon, the machine that runs the DMNES website didn’t reboot. We’re troubleshooting now, and when the site is back up and running we’ll announce this on Twitter and Facebook.

Thanks for your patience!

Filed under announcements, technical

Digital Humanities: a fairy tale? (Guest Post)

We’re very pleased to host our first guest post here on the blog. After our recent post Digital Humanities: Challenges, Difficulties, Reflections, and Questions, we thought it would be interesting to get the perspective of Comp’s side of things, since our post mostly focused on the view of Hums. We invited Dr. Tarek Besold of the KRDB Research Centre at the Free University of Bozen-Bolzano and Sarah Schulz of the Institut für Maschinelle Sprachverarbeitung at Universität Stuttgart to share their thoughts on the issues people on the ‘Digital’ side of Digital Humanities have getting into the ‘Humanities’ side.


And it came to pass, that two curious explorers went out to foreign realms, ruled by two kings that could not be more different in their rules and aims. Moving carefully not to upset the other kingdom, the explorers exercised diplomacy. Gently, they strove to understand each other’s worlds. But what they didn’t know was that their diplomacy and cautious approach could do more harm than good…

In this or a similar way, the story of Digital Humanities could have started. But are Digital Humanities a fairy tale or do they really exist — and if they exist, where can they be found?

Countless attempts to pin down the nature of DH have been made, and the pool of explanations of what it does and what it aims for seems almost endless (cf. whatisdigitalhumanities). Depending on who defines it, and despite all carefulness, it is not uncommon for the digital disciplines to be reduced to a serving science, aiding Humanities to reach their traditional aims using (possibly speedier or more efficient versions of) their traditional methods. The possibilities of evolution on both sides, powered by a cooperation on equal footing (with potential for a real fusion of disciplines in the long run) instead of a hierarchical interaction with clear primacy of one discipline over the other (usually the H being in a stronger position than the D), are often neglected. By the way, these possibilities do indeed exist on both sides and not, as widely assumed by Humanities scholars, just on their side. We have often been asked: “Why do you want to help us? Why do you spend your time on this?” The answer is quite simple. From a computational point of view, the diversity of data and modality poses a big challenge for Computer Science. The problems arising in such projects are often much closer to real-world problems than those tackled in the ivory tower until now, and techniques developed in this context will bring advances for all sorts of real-world applications.

Furthermore, DH is a field that is not just shaped by the risk of miscommunication between totally different disciplines, but moreover a field that carries the fear of its very own scholars along. Afraid of being left behind by an evolving form of their science of origin, researchers seem to be in constant fear of losing their right to exist. Thinking back to our study times, when students of theater science spent an entire semester of their basic education on justifying why theater science should be an independent science, it is surprising to see the opposite trend in Digital Humanities. Whereas the upcoming science of theater was fighting for its emancipation, the newly developing field of Digital Humanities embraces many disciplines. But instead of the scientists who are strolling in this new and undiscovered realm searching for the right to define this new field, single disciplines try to position themselves inside of it, desperate to show their surplus value and unable to let go of the old for something new. Interestingly, this finds its expression in the withdrawal to laboriously developed vocabularies and terminologies of specific fields, jeopardizing in equal parts the diplomacy and the communication of our abovementioned explorers.

The right question to ask might therefore not be what Digital Humanities are but rather what they could be once the two explorers let go of their shyness and fear. The most promising characteristics of what is apparent so far might be a strong will. The will to collaborate despite all frustrations, all communication barriers and all compromises that have to be made with the prospect of a gain. Of which kind this gain might be remains to be seen. But it must be this gain that will define this newly developed science. As a defining part of science is the contribution of new knowledge to mankind, there must be more to it than just the pure simplification of the manual labor of a Humanities scientist. So far, DH is more a potential than a science. DH is something that has yet to develop.

Usually, people make a division: on the one side are the computer scientists, on the other side are the Humanities scholars, and when both sides meet, DH comes to be. To be honest, we are not sure that we are really and wholeheartedly able to believe in this simple formula. To be even more honest, we don’t even want to believe in it. Because if this simple rule was true, DH would have no reason (and consequently also no right) to call itself a new field. It would just be a new and fancy tool in the toolbox that Humanities scholars use to tackle their research questions. Still, having grown up in a world of accelerating progress in Computer Science — and in artificial intelligence in particular — we share the strong conviction that there must be more to it.

What is this more that we are calling for? What we are asking for are new methods and new knowledge. Methods which are genuinely new and only come to pass due to the application of computers and AI techniques and technology to questions in the Humanities. Knowledge that would have remained hidden without these methods and the corresponding consolidation. We envision results that a human Humanities researcher would fail to reach, no matter how much time she invested in striving for them manually. We are convinced that DH cannot, shall not, and will not only be a speed-up and scale-up of conventional methods as practiced for decades and partially centuries. Computer science can offer the skills to work out multimodal collaborations between different fields of Humanities. When art meets sound, when text meets pictures, knowledge comes to light that otherwise would have remained hidden.

The practice of Humanities as a field of study is in a certain way one of the “most human” activities imaginable. As they have research subjects created by humans, like works of art, poems or films, it seems to be an indispensable condition that the researcher needs to share the humanness with the originator in order to understand. Scholars often have to rely on their interpretation of incomplete or lacking information; especially in the arts, much is conveyed in an implicit manner and left to the recipient to add, and so forth. Where we have good reason to believe that all humans share basic principles and qualities in their cognition, this also means that we have to accept that we all share the same biases and limitations, both on the perceptual and on the mental/reasoning level.

Artificial systems aren’t human — they don’t employ the same “hardware”, they don’t work according to the same functional or mechanistic principles, and their strengths and weaknesses seem almost orthogonal to our own. To many (not only, but especially on the Humanities side of things), this makes AI appear absolutely incompatible with the goals of Humanities research. Still, coming from fields such as cognitive AI or computational cognitive modelling, we are convinced that misunderstanding and prejudice are at the basis of this conclusion: It is true, AI still is far from achieving its ambitious goals in overcoming more and more parts of this distinction between man and machine. Nonetheless, cognitive systems and computational models of cognition over the last decades have made great progress and already today put us in the position to model significant parts of human cognition in ways which 20 or 30 years ago would still have been unimaginable. Modelling frameworks such as SOAR or ACT-R allow us to build psychologically-plausible task models down to the level of cognitive memory processes or even activation patterns in a simulated neural network not unlike parts of the brain — why not apply all these possibilities in modelling tasks in the Humanities, complementing and completing the reasoning based on introspection and observation of actual humans as practiced today?

But we want to go even further. Many researchers in Computer Science and AI over the years have learned to appreciate and leverage (rather than deride and avoid) the already mentioned foundational differences between man and machine. An AI does not “see with human eyes”, human perceptual and cognitive primitives are foreign to it, and its entire way of perceiving and conceptualizing is qualitatively different. It thus allows for different forms of sensing, perceiving, and structuring of its environment and of information in general. Using artificial systems and computational methods might offer genuinely new perspectives on these subjects: perspectives which are new and enriching precisely because they are not human, because they do not adhere to our shared “cognitive program”, but instead are only achievable by taking the human out of the loop and letting the machine do the sensing, perceiving, and reasoning.

And we think that what stands in our way of finding these new means and approaches of gathering knowledge is the research question itself. What is needed is exploration without fear and the perception of prejudice as a chance (namely in overcoming it) rather than as a valid reason for staying where we are and doing what we do. The non-human and, thus, so far unknown objectivity that artificial systems can introduce to DH should be welcomed as an opportunity to see well-known objects of investigation under new light and with literally new “mechanic” eyes. What this requires on our side is neither the mind of a Humanities scholar nor the mind of a computer scientist, but the mind of a curious explorer without fear of breaking laws in one or the other kingdom. And the courage to allow also non-human aspects into the very field called Humanities to make a fairy tale come true.

1 Comment

Filed under technical

Digital humanities: Challenges, difficulties, reflections, and questions

Last night on Twitter over at We The Humanities we got onto the subject of collaboration, in particular digital humanities collaborations, and some of the pitfalls and problems that can be faced:

Since it’s hard to encapsulate these very complex issues in 140 characters, we thought we’d devote a blog post to some of the challenges and difficulties we faced in the development of the Dictionary, as well as some questions — because sadly, we don’t have any answers, at this point.

In an ideal world, digital humanities collaborations would involve experts on both sides: Experts in some field of the humanities pooling their resources with experts on the technical and computational side of things. Unfortunately, the world is not always ideal.

Many DH projects receive their impetus on the humanities side, and thus involve someone without the technical/computational skills looking to collaborate with someone who has them. (For ease of discussion, I’m going to call the humanities person Hums and the technical person Comp.) This is the first hurdle: How to find suitable collaborators. Some universities are lucky enough to have digital humanities clusters or centers which have technical people whose dedicated job is to collaborate on such projects. Unfortunately, such dedicated clusters are still quite rare. Even where they exist, there are challenges. The people who work in these clusters are probably already involved in a number of other programmes, which means that Hums has to pitch his project in such a way as to get Comp interested. If Hums isn’t able to, he’s going to be out of luck in utilizing that resource. Now, maybe Comp is interested, and furthermore doesn’t already have enough projects on her plate to prevent her from taking on a new one. Odds are that Comp has been hired into this cluster/resource so that she can contribute to the technical research agenda of the group. This may be the building of some software or infrastructure, the development of search types of methods or algorithms, or something else: Whatever the reason, Comp is going to have to find a way of integrating her contribution to the collaborative project into the work that she has been hired to do. This means that Comp comes into the project with an agenda, some pre-conceived idea of the approach/software/methods/mechanism to use, rather than coming to the project asking “What are the best approaches/software/methods/mechanisms for this project?”

When the Dictionary was in its infancy, we were incredibly lucky to have access to the members of Heidelberg Research Architecture, who were incredibly generous with their time and helpful with the initial “what, exactly, is it that you want to do?” stages. Ultimately what we learned was that the infrastructure that they could offer was not going to be well-suited for our needs: Their focus was primarily on projects where the data already existed, but simply needed to be appropriately marked-up (usually using TEI) so that it could be accessed by a search interface, while in its infancy, the Dictionary was focusing on collecting and creating data, rather than taking pre-existing and already published information. But this period in the development of the Dictionary was marked off by another challenge that Hums can face when talking to Comp: Hums probably only knows very generally what it is that he wants, and not any of the specifics of how it can best be realized. This is, after all, why he is talking to Comp in the first place! (There are not many unicorns out there, people who are both Hums and Comp in equal measure.) This means that Comp has to be able to listen to not only what Hums says he wants, but what he actually wants:

In the end, we knew enough about what we wanted to know that what HRA could offer wasn’t for us, so we had to start looking elsewhere (as would Hums who don’t have a dedicated DH group within the institution to tap). There are two options: Within the university and without. Even universities that don’t have dedicated DH groups are likely to have Comps of various flavors, and maybe even a Comp or two who is interested in the humanities and interested in collaborating with Hums. A difficulty that arises here is that what counts as research for Hums and what counts as research for Comp are likely to be different. Often what Hums needs from Comp is the production of a product using already known/available resources, whereas what Comp needs to justify her involvement in the project to her department is the production of research output. If what Hums is doing doesn’t look like research to people in Comp’s department, Comp will have trouble justifying her spending the necessary time on Hums’s project.

The next option is to go outside the university: There are plenty of qualified Comps out there. But this leads to the next challenge: Cost. The problem is, if these Comps are not working in the university, they’re probably far too costly for a DH project to be able to afford, in two ways. First, quite literally: Unless Hums can find a demi-unicorn (i.e., a Comp who is willing to provide her services free of charge/on a volunteer basis), it is not unreasonable for Hums to have to cough up $100 an hour if he wants to hire a consultant; and if Hums wants a good job done, Comp is going to spend a non-negligible number of hours working for him. Second, even if Hums is lucky enough to have grant money for his project (perhaps in the form of some seedcorn money, or a start-up grant designed to get DH projects off the ground and up and running), that grant money often comes with strings. When the Dictionary’s base of operations moved to England, we were, naturally, quite interested in seeing what sorts of funds for starting up collaborations of this type were available. What we found is that many funding agencies specifically restrict recipients of their money from using it to pay consultants. So even if Hums has the money, he might not be able to use it to pay the people he’d need to pay.

In the end, the Dictionary has been incredibly lucky: We achieved our excellent infrastructure without any monetary outlay. (The lack of monetary outlay is also why we don’t have too many bells and whistles yet: The whole Beggars Cannot Be Choosers thing!) But as we discussed on Twitter last night, our experience is not one that generalizes easily:

So, what are some of the challenges Hums faces when trying to get a DH project up and running? (a) He may not know what he wants, or he may know but not know how to articulate it to Comp. (b) Finding Comp is difficult. (c) Comp’s agenda may not align with Hums’s. (d) Comp costs too much. Unfortunately, we have no good solutions to offer for these difficulties; the best we can offer are some reflections and questions:

  • Can we change the perception of the contributions of Comp to DH projects so that such collaborative work does count in the eyes of her academic/university colleagues? If collaborative outputs which genuinely further Hums’s research agenda were valued in Comp’s own research context, then it would be easier to find Comps who are willing to lend their skills to Hums.
  • How can we work with funding agencies and bodies to change their perceptions of the use of grant money for consultants, especially in the start-up phase? If Comps inside the university are hesitant to devote the necessary time to DH projects, the only other option is to go outside the university: And outside of the academic ivory tower, experts cost money. If you want to have a good product, you have to be able to pay for it.
  • The Dictionary benefited quite a bit from having a long incubation period: More than a year from inception to first creation, and that year involved a lot of protracted discussion, much of which happened in small increments and not at scheduled times. The fact that our Hums and Comp simply spent a lot of time in each other’s presence made determining the appropriate way forward significantly easier. How can this sort of experience be translated into more standard DH arrangements?

In this post, we’ve mostly focused on the problems facing Hums without any Comp; there are also problems that face Comp without any Hums. We make no promises, but we’ve got someone we’d like to tap for a guest post on this side of things! As we said, we don’t have any of the answers for these challenges, and many of these difficulties are not ones that a single person or a single project can change. But these challenges can only be surmounted if we get the conversation going, which we hope this post will contribute to.

3 Comments

Filed under technical

Digital humanities, medievalism, and the importance of errors

Two weeks ago, I attended the Middle Ages in the Modern World conference in Lincoln (and gave a talk on medieval vs. ‘medieval’ names if you haven’t already read the recap of that), which included an extremely interesting paper by Bridget Ruth Whearty (Stanford), “Of Scribes and Digitizers: Modern Digitization Studio as Medieval Scriptorium”, in which she gave the audience an overview of the making of a digital medieval manuscript, in particular the painstaking processes that are in place to ensure that every single digital copy of every single manuscript leaf is presented in identical conditions to every other one. The lighting of the room is controlled, the camera is perfectly placed, there are constraints on what the technicians can wear while the digitization is taking place, etc. The same leaf can be photographed multiple times until an error-free version is obtained. The result is as close to the original as you can get, perfect in presentation — and also perfectly anonymous. And both these facts gave rise to critical discussion during the Q&A.

In particular, it was pointed out that, for a group of people who claim to understand and respect the importance of history, we can be remarkably bad at the keeping and documenting of our own history, in this case, the ‘history’ that is created in the context of digital humanities or digital medievalist projects, such as the digitization of a book. By concentrating on the production of a ‘perfect’ piece, the process by which the piece is created is erased: The end product has no record of the images that were ultimately discarded, the ones where the light was wrong, or the folio misaligned, where there is a reflection of one of the technicians visible, or even a more tangible piece such as a hand or an arm. People in the 21st C interested in manuscripts are often deeply interested in their production: How were they made? By how many people? In what order? What were the tools and techniques used? People dedicate their academic careers to answering these questions: So why don’t we provide the answers to the same questions concerning the production of a digital copy of a manuscript? Another worry is that focusing on the production of a perfect end product can (though it is important to note that it doesn’t have to) result in the erasure not only of the errors along the way to perfection but also of the people who participated in the production. This erasure is even more problematic than the other, because of who it is that is doing the work in these projects: Often it is junior researchers, sometimes even undergrads, and more often than not these junior people are women. When only the name of the PI is attached to the finished product, and not the names of all the people by whose work the product was produced, these already often marginalized people are further marginalized by their erasure from the history of the digital MS. Again, historians often spend lifetimes seeking to identify particular scribes: Is hand X the same as hand Y? Do we know the name of the person who wrote this MS? When such questions are answered, it is a cause for celebration. So why, the question was raised during the Q&A on Whearty’s presentation, are we so eager to remove from the modern context the very information that we seek for so ardently in the medieval context?

These are all important questions that I think anyone working on a digital humanities/digital medievalist/digital classicist project needs to keep in mind, that it is not only important to use these techniques to shed light on the history of the past, but also important to make sure we don’t extinguish the light on the history of the present. I’ll admit, I listened to Whearty’s presentation and the ensuing discussion with some amount of surprise (not ever having seen the ins-and-outs of the procedure of digitization before), because it showed to me that the Dictionary has been developed to work in a very different sort of context. Two of the guiding factors in the development of our technical infrastructure have been, from the very beginning:

  • Document EVERYTHING
  • Attribute EVERYTHING

Every single piece of information is, from the moment the file it’s in is created, deposited in a version control repository hosted on GitHub (it’s a private repository, though anyone who would like the code, as opposed to the data, can obtain it by asking). Every single change, no matter how small, to any file is documented: What was changed, who changed it, when it was changed, and why. Every file is reviewed by hand before it is marked for inclusion in an upcoming edition, and we keep track of who marks an entry as ‘live’, and when. When errors are identified, they are corrected, and again everything is documented: Was the error a typo? Was it a misidentification of the canonical name form? Was it an identification of a medieval place name with its modern correspondent? Was it the addition of further information? Was it the follow-up of a ‘todo’ that had previously been put into the file? By keeping track of all this information, we are able to pinpoint specifically when and how errors entered the data (because we know that there will be errors: no one is perfect, not even us!), and this gives us a means by which we can provide quality control over our processes.
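To make the “what, who, when, and why” concrete, here is a minimal sketch of how such a record could be read back out of a git repository for a single file. It is an illustration under stated assumptions, not the Dictionary’s actual tooling: the function names, the `Change` record, and the example path are all hypothetical, and only standard `git log` options are used.

```python
import subprocess
from typing import List, NamedTuple

# Unit separator: a control character unlikely to occur in names or messages.
FIELD_SEP = "\x1f"


class Change(NamedTuple):
    """One documented change (hypothetical record type for this sketch)."""
    commit: str   # what: the commit hash identifies the exact diff
    author: str   # who
    date: str     # when (ISO 8601)
    reason: str   # why: the first line of the commit message


def parse_log(output: str) -> List[Change]:
    """Turn separator-delimited `git log` output into Change records."""
    changes = []
    for line in output.split("\n"):
        parts = line.split(FIELD_SEP, 3)
        if len(parts) == 4:  # skips blank or malformed lines
            changes.append(Change(*parts))
    return changes


def file_history(repo_dir: str, path: str) -> List[Change]:
    """List every recorded change to one file, newest first."""
    # %H = commit hash, %aN = author name, %aI = author date, %s = subject
    fmt = "%H%x1f%aN%x1f%aI%x1f%s"
    result = subprocess.run(
        ["git", "log", "--follow", f"--format={fmt}", "--", path],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)
```

Calling something like `file_history(".", "entries/Example.xml")` (a made-up path) would then give the full chain of documentation for that one file.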

This means that none of our history is lost. If at one point we have a variant name form identified with one canonical name form, and later on realize this was in error and update it, we not only have the updated version but also the chain of documentation that led us to that version. This is particularly important for anyone who might question “Why is X a form of Z rather than of Y?” or for any errors in published versions that are noted by our readers and corrected in later versions. The latter point is crucial. By publishing new editions on a quarterly basis, we give up some amount of stability that ordinary books have: Information in an edition that someone cited in one month might not be there a few months later if an error has been identified. This may make some think that the Dictionary is not useful as a stable source to be cited, but in fact, there is no reason to worry: Because we document everything, no information is ever lost. When entries are updated in new editions, the old versions do not disappear, they are simply moved to the archive, and then noted as such (cf., e.g., the archive of the 2015 no. 1 edition). Thus, researchers can cite any version of the Dictionary confident that even if the data they cite gets superseded, it never gets removed or destroyed.

So much for “document EVERYTHING”. The other guiding factor is “attribute EVERYTHING”. Documenting everything makes it easy to determine mechanically who contributed to an entry, either by writing parts of the main entry, or entering variant name forms, or by reviewing and correcting typos, or even by fixing XML errors. I decided early on that I did not want to be in the business of deciding whose contributions were “important enough” to warrant including them in the citation, because the truth is that every contribution was necessary before that particular entry could be published. This is why every single entry has its own, individual “how to cite” instructions at the bottom, e.g.:

example citation

The author lists for these are created automatically by extracting the authors of the git commit messages for the relevant files for each entry. By including everyone who contributed, we ensure that while we are documenting medieval history, we are documenting our own history as well.
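A minimal sketch of that extraction step might look like the following. It is not the Dictionary’s actual script: the function names and the example path are hypothetical, and it assumes that the order of first contribution is the order wanted in the citation.

```python
import subprocess
from typing import Iterable, List


def unique_authors(names: Iterable[str]) -> List[str]:
    """Return author names in first-seen order, dropping blanks and repeats."""
    seen = set()
    ordered = []
    for name in names:
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered


def entry_authors(repo_dir: str, paths: List[str]) -> List[str]:
    """List everyone who committed a change to any of the given files."""
    # --reverse puts the oldest commit first; %aN prints the author name.
    result = subprocess.run(
        ["git", "log", "--reverse", "--format=%aN", "--", *paths],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return unique_authors(result.stdout.splitlines())
```

For example, `entry_authors(".", ["entries/Example.xml"])` (a made-up path) would return each contributor once, whether they wrote the entry, added a variant form, or fixed a single XML error.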

The field of digital humanities is still relatively new, and for the most part people active in it are still working out what exactly the field is, and how it should function. This means it is still early enough to make these important aspects (documenting the process, not just producing the outcome; and giving credit where credit is due) part of our basic operating principles. If your project doesn’t already operate along similar lines, I’d like to charge you to consider why, and whether this should change.

4 Comments

Filed under technical

What next?

We’ve hit our biggest milestone, but rather than sitting on our laurels, we’re already looking to the future. Here’s a brief summary of some of our plans.

We will be publishing new editions on a quarterly basis, to begin with, so the next edition is planned for July 2015. The List of Entries page will now reflect new entries to be published in the next edition. In addition to having more entries and more citations, what else are we planning for future editions?

  • Search: This is the biggest level of functionality that we don’t yet have. We will be adding search tools that search (a) header forms; (b) citations; (c) full text; (d) any combination of these; and these tools can be used with limitation functions, e.g., search only a certain gender, time period, geographic location.
  • Hebrew and Arabic scripts: Despite all of our pre-testing, we’ve discovered that the Hebrew, Arabic, and Aramaic scripts are showing up in reversed order. We will have this fixed by the next edition at the latest.
  • Maps: We are already in contact with a GIS specialist who will help put together maps for each entry, showing where a name was used over both time and place.
  • Requested entries: We are happy to take requests for new entries, if there is a particular name you are looking for that is not yet included. You will then be alerted when the relevant entry is published or updated in a future edition.
  • Mobile optimisation: Right now, we’ve done no optimisation for mobile browsing; our primary goal was to get a computer-browser appropriate version up and running. But while the site doesn’t look bad on the few mobiles we’ve tried, there are a few simple things we can implement to improve things.
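
As a rough illustration of the planned search behaviour described in the list above, the sketch below searches header forms, citations, full text, or any combination, with an optional gender restriction. Everything in it is an assumption made for illustration: the `Entry` fields, the `search` function, and the sample data are hypothetical and say nothing about how the Dictionary’s data is actually stored.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Sequence


@dataclass
class Entry:
    """Hypothetical, simplified shape of a dictionary entry."""
    header: str                                         # header name form
    gender: str                                         # e.g. "F" or "M"
    citations: List[str] = field(default_factory=list)  # cited spellings
    text: str = ""                                      # prose of the entry


def search(
    entries: Iterable[Entry],
    term: str,
    fields: Sequence[str] = ("header",),
    gender: Optional[str] = None,
) -> List[Entry]:
    """Case-insensitive substring search over the chosen fields."""
    needle = term.casefold()
    hits = []
    for entry in entries:
        # Limitation function: skip entries outside the requested gender.
        if gender is not None and entry.gender != gender:
            continue
        haystacks: List[str] = []
        if "header" in fields:
            haystacks.append(entry.header)
        if "citations" in fields:
            haystacks.extend(entry.citations)
        if "text" in fields:
            haystacks.append(entry.text)
        if any(needle in h.casefold() for h in haystacks):
            hits.append(entry)
    return hits
```

Further limits such as time period or geographic location could be added in the same way as the gender filter, as extra optional parameters checked before the text comparison.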

We have other bells and whistles planned (browsable categories, timelines, toggle between sort-by-date, sort-by-name, and sort-by-country), but they will be rolled out over the longer term. We are also interested in hearing from our users: What would you find useful? What would you find interesting? What would you find necessary? Let us know, either via a comment here or via email to eic@dmnes.org.

7 Comments

Filed under crowd-sourcing, technical