Our technical guru has been hard at work putting together the Dictionary’s front end, and last night I got a sneak preview of where things stand. The first edition won’t have all the bells and whistles, but it will have all the data, and what I’ve seen so far is pretty darned awesome. (I think I may have used the words “coolest thing ever”, but hyperbole is allowed when you’re seeing glimpses of the fruit of 20+ years of your life.) The neatest part about finally having a published version of the Dictionary is being able to do easy, at-a-glance comparisons, such as how common a name was over a temporal or geographic spread, how many languages it shows up in, or the variety of spellings within a language. I spent a good 20 minutes simply clicking through random entries, and could easily have lost my entire evening to doing this.
It’s so very exciting to watch everything coming together. Let the countdown begin: 18 days to publication!
Brand spankin’ new: you can now reach the DMNES Editor-in-Chief via email@example.com.
Thanks to many hours of volunteer work by our chief technical person, Dr. Joel Uckelman, the Dictionary is one step closer to having a working public website and now has a working private website where the editors can log in to enter new citations and new header names. The webform means we no longer have to edit XML by hand (which is where far and away the most errors creep in). It validates the files against a custom-made schema (also created by Joel), ensures that the header forms being entered actually exist, and cross-checks the bibliographic keys against the bibliographic database. The forms will allow us to enter data at a much faster rate than before, meaning that our goal of having an initial version of the Dictionary with basic searches available by the end of September is still a reasonable one!
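To give a flavor of the kind of checks described above, here is a minimal sketch in Python. It is not Joel's actual code: the element and attribute names (`citation`, `header`, `bibkey`), the two lookup sets, and the function `check_citation` are all hypothetical, and the sketch only tests well-formedness and a couple of cross-references rather than validating against the real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the real list of header forms and the
# bibliographic database; the actual names and keys will differ.
KNOWN_HEADERS = {"header-a", "header-b"}
KNOWN_BIB_KEYS = {"bib-key-1", "bib-key-2"}


def check_citation(xml_text):
    """Return a list of problems found in one citation; an empty list means it passed."""
    try:
        cit = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Hand-edited XML most often fails right here.
        return [f"not well-formed XML: {err}"]

    problems = []
    if cit.tag != "citation":
        problems.append(f"unexpected root element <{cit.tag}>")

    # The header form being cited must already exist.
    header = cit.get("header")
    if header not in KNOWN_HEADERS:
        problems.append(f"unknown header form: {header!r}")

    # The bibliographic key must be present in the bibliography.
    bib = cit.findtext("bibkey")
    if bib not in KNOWN_BIB_KEYS:
        problems.append(f"unknown bibliographic key: {bib!r}")
    return problems
```

Calling `check_citation('<citation header="header-a"><bibkey>bib-key-1</bibkey></citation>')` returns an empty list, while a mistyped header or key produces one message per problem, which is the sort of feedback a webform can show before anything is saved.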
One of the hardest things about getting a project like this up and running is manpower and funding. Many people who may be interested in contributing can only do so if their time is being recompensed; and yet, funding agencies are often loath to provide funding for “unproven” projects. We are extremely grateful to Joel for taking time out of his own projects to build this infrastructure for us, and getting us out of this Catch-22!
After rescheduling twice (hence the radio silence here), the two of us local to Heidelberg finally managed to meet with Eric Decker, the head of the HRA, this afternoon. It was an extremely productive meeting where we discussed the nature of the potential relationship between the HRA and the DMNES, and what sort of technical guidance and help they can offer.
One of the most important concerns we have is the long-term sustainability of whatever set-up we choose: While we want to take advantage of whatever assistance the HRA can offer, we have no guarantee that the Dictionary will have ties with Heidelberg beyond the next four years. Thus, with the choices we make now, we hope to maximize the opportunity to collaborate with the HRA while keeping it possible to collaborate with others in the future.
As a result of the preliminary discussions today, we believe that it makes most sense to store the Dictionary’s data in individual XML files in a git repository. Keeping the files under version control will allow us to keep track of who edits the data and when, so that we can pinpoint when any errors creep in (though hopefully these will be very few!). This data can then be imported into an XML database such as eXist-db or into a relational database such as MySQL, depending on which is more suitable for the task at hand. It is most likely that we will begin working with eXist-db, for two expedient reasons: (1) the HRA already has a large infrastructure built around eXist-db, which we could take advantage of and to which the Dictionary’s data could contribute by making it easy to link to other projects that the HRA develops, and (2) I am currently taking an intensive 6-week course run by Wolfgang Meier, the author of eXist-db, and thus can benefit from training by the expert. Next Monday, after the next training session, I hope to speak with him about the Dictionary to see if he would be willing to help us with another crucial component: Developing the XML schema.
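As a rough illustration of the "individual XML files first, database second" idea, the sketch below reads every XML file in a directory and loads it into a relational table. Everything specific in it is an assumption made for the example: the `entry`/`citation`/`form`/`year` layout is not our schema (which does not exist yet), the sample data is invented, and SQLite stands in for MySQL purely because it ships with Python.

```python
import sqlite3
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def import_entries(xml_dir, conn):
    """Load every *.xml file in xml_dir into one table; return the number of rows added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS citation "
        "(header TEXT, form TEXT, year INTEGER, source_file TEXT)"
    )
    count = 0
    for path in sorted(Path(xml_dir).glob("*.xml")):
        entry = ET.parse(path).getroot()
        header = entry.get("header")
        for cit in entry.iter("citation"):
            conn.execute(
                "INSERT INTO citation VALUES (?, ?, ?, ?)",
                (header, cit.findtext("form"), int(cit.findtext("year")), path.name),
            )
            count += 1
    conn.commit()
    return count


def demo():
    """Write one invented entry file, import it, and run a simple query."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "sample.xml").write_text(
            '<entry header="Sample">'
            "<citation><form>Sampla</form><year>1200</year></citation>"
            "<citation><form>Sampel</form><year>1350</year></citation>"
            "</entry>",
            encoding="utf-8",
        )
        conn = sqlite3.connect(":memory:")
        added = import_entries(tmp, conn)
        rows = conn.execute("SELECT form FROM citation WHERE year < 1300").fetchall()
        conn.close()
        return added, rows
```

The point of the design is that the XML files in the git repository stay the authoritative copy, and the database is something that can be rebuilt from them at any time, so switching between eXist-db and a relational database later would not mean re-entering data.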
Because we intend for the dictionary to be available online and fully searchable, and because our primary goal is to produce this online version rather than a print version that is later put online, the first step in creating the dictionary is to choose the correct technical framework. It is important that we take the time necessary to create an infrastructure appropriate for our specific needs. This requires establishing two things first: a means of data storage and a method of data retrieval. This is where our ongoing discussions with members of the Heidelberg Research Architecture, as well as with the creators and maintainers of similar database/dictionary projects, such as the Middle English Dictionary, the Digitale Familiennamenwörterbuch Deutschlands, and the Dictionary of Medieval Latin from British Sources, come in. The two primary questions that we hope to resolve in these discussions are:
- Do we want to use TEI for our encoding (as the DFD does) or create a custom schema (as the DMLBS did)?
- Which is more appropriate, an XML database (such as eXist-db) or a relational database (such as MySQL)? Both the DFD and the DMLBS use XML databases with XQuery; however, the initial dataset from which the DMNES will be built is currently in MySQL. Two posts which discuss the relative merits for each type of database can be found here and here.
We want to be sure that we pick an infrastructure that is optimal in many different respects: in terms of search speed and efficiency, in terms of long-term maintainability, in terms of ease of use for non-specialists, and in terms of cost and access (as a principle, the DMNES intends to work with open-source tools unless there is absolutely no alternative). Doing so is no easy task, and we do not intend to rush things merely to be able to make headway on the fun part: Collating and interpreting the data!