Computer-aided full-text searching in manuscript documents

By Ben Lacey, Adam Matthew Digital, a sponsor of the 2017 DLF Forum

Between the start of 2014 and the middle of 2015, most of my working day was spent doing one thing: reading through seventeenth- and eighteenth-century manuscripts and writing down the names of people, places, events and keywords that appeared in them. It could take several days to work through a single volume of these texts from the CO5 series in The National Archives, UK. This summer, I typed the word “rebellion” into a search box and watched a computer take seconds to find three instances of the term in the handwritten CO5 document I was viewing.

Handwritten Text Recognition, HTR for short, is something of a holy grail for manuscript studies. I will confess, I was sceptical about the chances of a computer being able to work with the idiosyncrasies and different styles that handwriting presents. I have even presented conference papers in the past on the limitations of printed-text recognition programmes. I think many in the field retain the mix of hope and scepticism that I have always had.

However, work in this area is moving rapidly forward, and has been for several years. I am now a full advocate for this new technology and the potential advantages it will bring. Since that first test, I have seen more that has convinced me that a technology-led revolution in manuscript studies is coming. The HTR search programme that I have been testing now works not only on the clerical hand of the CO5 documents, but also on the rougher writing of correspondence to and from the nineteenth-century statistician, nurse, and sanitation reformer, Florence Nightingale.

These are, of course, early days. Technology is never perfect when it first appears; but what an amazing opportunity we now have to refine this ground-breaking innovation. The ability of a computer to search even a single page of cursive handwriting would have been unbelievable just a short time ago. It is now a reality. That in itself is amazing. The question now is: how far can we take this innovation? And, perhaps more importantly, how can librarians, researchers, and teachers best use it to enhance current study?

This last part is important. I do not think anyone would advocate cutting back on the deep knowledge and understanding of texts gained through close reading of the material. Rather, the innovations we are now seeing in manuscript studies and the digital humanities need to complement and further advance the excellent work already being done. The DLF Forum would be an ideal opportunity to discuss this technology and its potential impact with those at the forefront of digital research, teaching and collection management. It would be great to see you at the Adam Matthew table if you would like to be part of this conversation.
