Crowdsourced User Text Correction (UTC)

Author: Stefan Boddie, Managing Director, DL Consulting Ltd.
Date: 2014-06-24

Veridian’s User Text Correction (UTC) module allows users to correct OCR errors as they come across them in the text. For newspaper digitization collections, which often contain a lot of OCR errors, this can work surprisingly well.

A brief video overview of Veridian’s user text correction feature

Try Veridian’s Crowdsourced User Text Correction (UTC)

Lots of Veridian-based newspaper digitization projects are now using UTC to improve the quality of their collections and engage with their online patrons. Try it out at the links below.

The benefits of UTC

At first glance the benefits of crowdsourced UTC might seem obvious: the “searchability” of the collection gradually improves as errors are corrected. In reality, though, the corrections are just a fortunate by-product. The real benefit for the library is the creation of an engaged online community around its collection.

A good example is the California Digital Newspaper Collection (CDNC), a newspaper digitization project for which UTC has been very successful. In three years the CDNC has built a community of more than 3,000 registered users, and collectively that community has corrected more than 10 million errors!

Aside from the improvements to the OCR text, the benefits to the collection owner include the following:

  • A more engaged user base — more repeat visitors, longer average duration of visits, increased total visits, etc.
  • Opportunities to communicate and engage with online patrons. Are you interested in learning more about who uses your collection, what they use it for, what their priorities are for future digitization, or what they do/don’t like about the collection? Go ahead and ask your registered users!

What motivates online patrons to use UTC?

Believe it or not, many people do want to engage in this way with digitized newspapers and other cultural heritage collections, even if it’s only to correct OCR errors. Below are comments from users, taken from surveys conducted by the CDNC and two other projects using UTC.

I enjoy the correction – it’s a great way to learn more about past history and things of interest whilst doing a ‘service to the community’ by correcting text for the benefit of others.

I have recently retired from IT and thought that I could be of some assistance to the project. It benefits me and other people. It helps with family research.

I am interested in all kinds of history. I have pursued genealogy as a hobby for many years. I correct text at CDNC because I see it as a constructive way to contribute to a worthwhile project. Because I am interested in history, I enjoy it.

As an amateur historical researcher my time for research is very limited. Making time to travel to archives, libraries, and historical societies does not happen as often as I would like. The Cambridge Public Library’s online newspaper collection has been an invaluable resource and it is fun. I am very grateful for all the help I have received over the years from so many research organizations. Correcting text has several benefits. It makes it much more likely that I will find a story if I decide to search for it in the future. It is a way of saying ‘thank you’ to the Cambridge Library for having such a great resource available and maybe I can make the next person’s research a little easier. It is my own little historical preservation project.