Crowdsourced User Text Correction (UTC)

Veridian’s User Text Correction (UTC) module allows users to correct OCR errors as they come across them in the text. For newspaper digitization collections, which often contain a lot of OCR errors, this can work surprisingly well.

The benefits of UTC

At first glance the benefits of crowdsourced UTC might seem obvious: the "searchability" of the collection gradually improves as errors are corrected. In reality, though, the corrections are just a fortunate by-product. The real benefit for the library is the creation of an engaged online community around its collection.

“Engaging with users and building virtual communities is just as important to the users as providing the data itself. They want to be part of a community.” - Rose Holley, National Library of Australia

A good example of a newspaper digitization project for which UTC has been very successful is the California Digital Newspaper Collection (CDNC). At the time of writing, the CDNC has built a community of more than 14,000 registered users, and collectively that community has corrected more than 10 million errors! You can watch a short video of CDNC text correction at the bottom of this page.

Aside from the improvements to the OCR text, the benefits to the collection owner include the following:

  • A more engaged user base: more repeat visitors, longer average visits, more total visits, and so on.
  • Opportunities to communicate and engage with online patrons. Are you interested in learning more about who uses your collection, what they use it for, what their priorities are for future digitization, or what they like or dislike about the collection? Go ahead and ask your registered users!

What motivates online patrons to use UTC?

Believe it or not, many people do want to engage with digitized newspapers and other cultural heritage collections in this way, even if it's only to correct OCR errors. Below are comments from users, gathered in surveys conducted by the CDNC and two other projects that use UTC.

"I enjoy the correction – it’s a great way to learn more about past history and things of interest whilst doing a ‘service to the community’ by correcting text for the benefit of others."

"I have recently retired from IT and thought that I could be of some assistance to the project. It benefits me and other people. It helps with family research."

"I am interested in all kinds of history. I have pursued genealogy as a hobby for many years. I correct text at CDNC because I see it as a constructive way to contribute to a worthwhile project. Because I am interested in history, I enjoy it."

"As an amateur historical researcher my time for research is very limited. Making time to travel to archives, libraries, and historical societies does not happen as often as I would like. The Cambridge Public Library’s online newspaper collection has been an invaluable resource and it is fun. I am very grateful for all the help I have received over the years from so many research organizations. Correcting text has several benefits. It makes it much more likely that I will find a story if I decide to search for it in the future. It is a way of saying ‘thank you’ to the Cambridge Library for having such a great resource available and maybe I can make the next person’s research a little easier. It is my own little historical preservation project."

Try Veridian’s Crowdsourced User Text Correction (UTC)

Many Veridian-based newspaper digitization projects now use UTC to improve the quality of their collections and engage with their online patrons. Try it out at the collections below:

California Digital Newspaper Collection

Illinois Digital Newspaper Collections

Cambridge Public Library

Library of Virginia

Video: CDNC User Text Correction