Workflow of a Veridian newspaper digitization project

Veridian software, services, and workflows were designed for large-scale newspaper digitization projects. They have also been used for many much smaller projects, including some non-newspaper projects, but the workflows and processes remain the same.

Understanding Veridian’s digitization workflows, and how they differ from those used in some other projects, is important, especially for those accustomed to products like CONTENTdm. For example, when developing a digital collection with CONTENTdm, librarians often upload one digital file at a time, manually cataloging each file as they go. That one-by-one upload-and-catalog workflow is fine for many types of projects, but it is simply not practical for large collections of digitized newspapers.

Another key difference between Veridian and other content management systems is that library staff do not usually need to learn how to load or remove data batches, or to configure and customize the software. We do all the technical work for most libraries that host their digital collections with Veridian. We have long-term relationships with our Veridian customers; we don’t simply sell software products and leave library staff to work out how to use them effectively. For libraries that do choose to learn these tasks we provide documentation and support (and it’s really not difficult), but it isn’t usually necessary.

I love the fact that I can set a project in motion and just leave it alone. Veridian has never caused any problems for me. We derive a great deal of peace-of-mind reassurance from the deep support and the fact that everything runs smoothly without any intervention on our part.

Cliff Wulfman, Coordinator of Library Digital Initiatives, Princeton University Library

The Veridian digitization workflow

The workflow for a newspaper digitization project with Veridian typically follows the steps described in the sections below. Large projects are often completed in “batches”, with each batch containing anywhere from a few hundred to many thousands of newspaper pages.

1. Scanning

Newspaper pages are scanned in batches, either from microfilm or paper originals, to produce TIFF images. Some libraries do the scanning in-house, while others use local scanning vendors. We can also recommend one of our scanning partners.

Once each batch of images is scanned, it is typically shipped (either electronically or on a portable hard drive) to whichever organization was chosen to produce the final digital objects.

More information about the scanning process can be found in The Process of Scanning Newspapers.

2. Image clean-up and creation of METS/ALTO digital objects

The recommended digital objects for newspaper digitization are based on METS/ALTO XML and require specialized software to produce. Depending on requirements, the batches of scanned TIFF images are usually shipped either to us at Veridian or to a selected vendor or partner for processing. Alternatively, some large projects choose to purchase the necessary processing software and produce the digital objects themselves.

The digital objects produced during this process include all the appropriate metadata, usually embedded within METS XML files.

The organization responsible for producing the digital objects usually also does the image clean-up work, such as splitting any “two-up” images, cropping borders, de-skewing, and de-speckling.
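To give a feel for what an ALTO file holds, here is a minimal sketch in Python that pulls the recognized text out of a small ALTO fragment, one output line per TextLine. The sample XML is hand-written for illustration and far simpler than a real page file, and the function names are hypothetical; this is not how Veridian itself reads the data. ALTO stores each recognized word in a String element whose CONTENT attribute carries the text, alongside its position on the page.

```python
import xml.etree.ElementTree as ET

# Hand-written ALTO fragment for illustration only. Real files produced by
# digitization software carry many more elements and attributes.
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1" WIDTH="2400" HEIGHT="3600">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Daily" HPOS="100" VPOS="200" WIDTH="150" HEIGHT="40"/>
            <SP/>
            <String CONTENT="News" HPOS="260" VPOS="200" WIDTH="140" HEIGHT="40"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Local" HPOS="100" VPOS="260" WIDTH="150" HEIGHT="40"/>
            <SP/>
            <String CONTENT="Edition" HPOS="260" VPOS="260" WIDTH="200" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
""".strip()


def local_name(tag):
    """Drop the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def text_lines(alto_xml):
    """Return the OCR text of an ALTO document, one string per TextLine."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Each String child is one recognized word; SP elements are spaces.
        words = [
            child.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    for line in text_lines(SAMPLE_ALTO):
        print(line)
```

Matching on the local element name rather than a fixed namespace is a deliberate simplification, since different ALTO versions use different namespace URIs.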

3. Loading digital objects to the Veridian presentation system

Once the digital objects are completed, they are shipped (either electronically or on a portable hard drive) to whoever is hosting the Veridian presentation system. If we host the project, the completed objects are shipped to us; if the library hosts the project itself, the objects are shipped to the library.

For most projects we run the data ingest process to load each new batch of data into Veridian as it arrives. This is true even when the library hosts Veridian on their own hardware — as long as we have remote login access to the server we can do all the work.

4. Quality Assurance (QA)

QA can be done in several different ways, depending on the library’s preference. For some projects we set up a second “staging” copy of Veridian and load all new batches there first. Library staff can then check the data on the staging site before it is copied to the live site. Other projects are comfortable loading new data batches directly to the live site and doing their QA there. If a vendor was chosen to produce the digital objects, that vendor can often load the objects to its own copy of Veridian, so library staff can carry out online QA as soon as the digital objects are completed and before they are shipped.

Errors found during the QA process usually result in the entire digital object (i.e. a newspaper issue) being reprocessed. In rare cases, an entire data batch containing many digital objects may be sent back for reprocessing.
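Because reprocessing happens per issue rather than per page, a QA log of page-level problems has to be rolled up into a list of issues before it goes back to whoever produced the digital objects. The following Python sketch shows one way to do that roll-up. The record layout (issue_id, page, note) and the function name are assumptions made for this example, not a format Veridian prescribes.

```python
from collections import defaultdict


def issues_to_reprocess(qa_errors):
    """Group page-level QA errors by issue.

    Returns a dict mapping each affected issue ID to the sorted list of
    pages on which reviewers found problems. Every issue in the result
    would be sent back for reprocessing as a whole.
    """
    pages_by_issue = defaultdict(set)
    for error in qa_errors:
        pages_by_issue[error["issue_id"]].add(error["page"])
    return {
        issue_id: sorted(pages)
        for issue_id, pages in sorted(pages_by_issue.items())
    }


if __name__ == "__main__":
    # Hypothetical QA log entries written by library staff.
    qa_log = [
        {"issue_id": "gazette-1901-03-14", "page": 2, "note": "page skewed"},
        {"issue_id": "gazette-1901-03-14", "page": 2, "note": "border not cropped"},
        {"issue_id": "gazette-1901-03-14", "page": 1, "note": "wrong issue date"},
        {"issue_id": "herald-1899-11-02", "page": 4, "note": "two-up image not split"},
    ]
    for issue_id, pages in issues_to_reprocess(qa_log).items():
        print(issue_id, "pages with errors:", pages)
```

The sketch leaves the rare whole-batch case to human judgment, since the decision to return an entire batch depends on the project.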

5. Preservation

As well as being loaded to Veridian for online access, and regardless of where the Veridian site is hosted, a copy of the finished and QA-verified digital objects is always shipped to the library.

For more information see Long-term preservation of digitized newspapers.

6. Ongoing support

At Veridian we never consider a digitization project to be “finished”. Having the newspapers digitized, preserved, and posted online is only the first step. After that the collection needs to be nurtured and maintained. It needs to attract visitors and encourage those visitors to engage with the digitized content. Veridian features like Crowdsourced User Text Correction (UTC) and metadata editing allow both librarians and online patrons to contribute to and improve the quality of the collection long after it’s first posted online.

We offer hosting, support, and maintenance services to support digitization projects in the long term. And for most Veridian projects we install software updates with new features at least once every year, to keep them as fresh and up to date as possible.