Why choose METS/ALTO for newspaper digitization?

Author: Stefan Boddie, Managing Director, DL Consulting Ltd.
Date: 2014-07-19

Why is METS/ALTO so important? Because so many newspaper digitization projects already use it!

Conforming to recognized standards such as METS and ALTO, which are already used by many well-established projects, brings several benefits.

Benefits

  • The long-term sustainability of your digital objects is greatly enhanced. If METS/ALTO ever becomes obsolete, a suitable migration path will almost certainly be developed, because many hundreds of millions of pages have already been digitized in this format by large projects at the Library of Congress and elsewhere.
  • When projects use the same open standards, sharing content between them is straightforward.
  • Projects using METS/ALTO benefit from the knowledge and tools developed by and for other projects using the same standards.

Beyond these benefits, METS/ALTO is simply a better, richer format for capturing digitized newspapers than any alternative currently available.

  • ALTO stores not only the full text of each page but also structural information such as the locations of columns, lines, and individual words.
  • METS supports article segmentation, so articles and article-level metadata such as headlines and bylines can be captured.
  • The combination of METS and ALTO captures very “rich” data, allowing the development of innovative discovery and delivery interfaces.
  • Both METS and ALTO are open XML standards, so no proprietary software is required to read or transform the digitized objects.
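To make the ALTO points concrete, here is a minimal sketch that reads each word's text and position from an ALTO file using only Python's standard library, with no proprietary tools involved. The sample is a hypothetical, hand-written fragment using the ALTO v2 namespace, and the names `ALTO_SAMPLE` and `extract_words` are illustrative only. Real files may use a different ALTO version (and therefore a different namespace) and declare their coordinate unit in a MeasurementUnit element, so treat this as an illustration rather than a production parser.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO v2 namespace; other ALTO versions use a different URI.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}

# Hypothetical hand-written fragment: one text block, one line, two words.
ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="5000" HEIGHT="7000">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100" VPOS="200" WIDTH="1500" HEIGHT="100">
          <TextLine HPOS="100" VPOS="200" WIDTH="1500" HEIGHT="50">
            <String CONTENT="Flood" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="50"/>
            <SP/>
            <String CONTENT="damage" HPOS="420" VPOS="200" WIDTH="400" HEIGHT="50"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def extract_words(alto_xml: str) -> list[dict]:
    """Return one dict per word: its block ID, line index within the block,
    text, and bounding box (HPOS, VPOS, WIDTH, HEIGHT)."""
    root = ET.fromstring(alto_xml)
    words = []
    # Walk the ALTO hierarchy: TextBlock > TextLine > String.
    for block in root.iterfind(".//alto:TextBlock", ALTO_NS):
        for line_no, line in enumerate(block.iterfind("alto:TextLine", ALTO_NS)):
            for string in line.iterfind("alto:String", ALTO_NS):
                box = tuple(
                    float(string.get(attr))
                    for attr in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
                )
                words.append({
                    "block": block.get("ID"),
                    "line": line_no,
                    "text": string.get("CONTENT"),
                    "box": box,
                })
    return words


if __name__ == "__main__":
    for word in extract_words(ALTO_SAMPLE):
        print(word["block"], word["line"], word["text"], word["box"])
```

Because every word carries its own coordinates, an interface built on data like this can highlight search hits directly on the page image, which is one example of the richer delivery the format makes possible.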