Our top 10 tips for newspaper digitization projects

  1. Consider the entire project and all its steps from the outset, and talk to all your proposed vendors/suppliers early in the planning process.
  2. The platform that will host the finished collection can affect the type and format of digital objects you need to produce. If possible, make decisions about hosting platforms and long-term preservation before starting the scanning/digitization process.
  3. Consider which parts of the project you’re able to do yourself and which you might outsource. For example, if the project is not too large and you have suitable equipment, you might consider doing the scanning work in-house. For larger projects, though, it might make sense to outsource all the scanning and data preparation work.
  4. Carefully consider how the project fits with (and differs from) your other digitization projects. Is the platform and/or workflow you’ve used for previous digitization projects suitable for newspapers? Are there better alternatives?
  5. Get advice from those who have worked on large newspaper digitization projects before. Newspaper projects have unique characteristics and are often more complex than other types of digitization projects. You are likely to get good advice from other institutions that have gone through the process. Following an approach or workflow similar to the one you’ve developed for other types of projects may not always be the best option.
  6. Produce digital objects in the METS/ALTO format. If done correctly, it should not cost more to produce METS/ALTO than something simpler like PDF. See “Why choose METS/ALTO?” for a full explanation of why this is so important.
  7. Evaluate your scanning/digitization options carefully. If your newspapers have already been microfilmed, it is easier and less expensive to scan from microfilm than from originals. Scanning from microfilm also gives you a larger choice of vendors, since microfilms are much more easily transported than original newspapers. On the other hand, scanning from originals may produce better digital images.
  8. If you are using different vendors for scanning and OCR, as is often the case for large newspaper projects, or if you’re doing the scanning in-house, talk to the people responsible for the OCR process early in the project. If possible, send sample images to the OCR vendor before going ahead with large-scale scanning. Filters and image processing algorithms applied after scanning to make images “look nice” can often have a detrimental effect on OCR accuracy.
  9. Consider the costs and benefits of article segmentation. Many newspaper digitization projects now do this so that individual articles can be identified on the page. Article segmentation is generally considered to provide a better user experience, but it comes at additional cost. Typically you might expect to pay $0.20 – $0.50 per page to digitize newspapers without article segmentation, or $0.70 – $1.00 per page with it.
  10. Consider running a pilot project with a small number of newspaper pages, prior to making any final decisions. Many scanning, OCR, and hosting platform vendors (including us at Veridian!) are willing to process a small number of samples and put them online for evaluation, for little or no cost. Doing so allows you to ensure the entire process works as expected before you commit to it.
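
To make the trade-off in tip 9 concrete, the per-page ranges quoted there can be turned into a rough budget comparison. The following is only a minimal sketch: the rates are the typical figures from tip 9, while the function name `estimate_cost` and the 50,000-page collection size are hypothetical values chosen for illustration. Real vendor quotes will differ.

```python
# Typical per-page rates in USD, taken from tip 9.
# These are rough ranges, not vendor quotes.
RATES = {
    "without_segmentation": (0.20, 0.50),
    "with_segmentation": (0.70, 1.00),
}


def estimate_cost(pages, option):
    """Return a (low, high) cost estimate in USD for a given page count.

    `option` must be one of the keys in RATES.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    low_rate, high_rate = RATES[option]
    # Round to whole cents to avoid floating-point noise in the totals.
    return round(pages * low_rate, 2), round(pages * high_rate, 2)


if __name__ == "__main__":
    pages = 50_000  # hypothetical collection size, for illustration only
    for option in RATES:
        low, high = estimate_cost(pages, option)
        print(f"{option}: ${low:,.2f} to ${high:,.2f}")
```

Running the same calculation with the page count from your own pilot (tip 10) gives a quick sense of how much article segmentation would add to the overall budget.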