Author: Stefan Boddie, Managing Director, DL Consulting Ltd.
The METS standard is a flexible schema for describing a complex digital object (like a digitized newspaper issue). METS describes the structure of the object but does not encode the actual textual content of the object. The ALTO standard fills this gap by encoding the textual content of a digitized page in great detail, including styles and layouts. As well as encoding the digitized text itself, ALTO encodes the spatial coordinates of every column, line, and word as it appears on the page.
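To make the idea of word-level coordinates concrete, here is a minimal Python sketch that reads the text and position of each word from an ALTO document. The XML fragment is hand-written for illustration and is not taken from any real collection; the helper names `local_name` and `extract_words` are hypothetical. The element and attribute names (`TextBlock`, `TextLine`, `String`, `CONTENT`, `HPOS`, `VPOS`, `WIDTH`, `HEIGHT`) come from the ALTO schema, but real files vary by ALTO version and by the software that produced them, so treat this as a starting point rather than a complete parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO fragment used only for illustration.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="2400" HEIGHT="3600">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100" VPOS="200" WIDTH="800" HEIGHT="60">
          <TextLine HPOS="100" VPOS="200" WIDTH="800" HEIGHT="60">
            <String CONTENT="Daily" HPOS="100" VPOS="200" WIDTH="180" HEIGHT="60"/>
            <SP/>
            <String CONTENT="News" HPOS="300" VPOS="200" WIDTH="160" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def extract_words(alto_xml):
    """Return one dict per ALTO String element: the word and its bounding box."""
    root = ET.fromstring(alto_xml)
    words = []
    for elem in root.iter():
        if local_name(elem.tag) != "String":
            continue
        words.append({
            "text": elem.get("CONTENT", ""),
            # Coordinates are kept as floats; their unit depends on the
            # measurement unit declared in the ALTO file's description.
            "hpos": float(elem.get("HPOS", 0)),
            "vpos": float(elem.get("VPOS", 0)),
            "width": float(elem.get("WIDTH", 0)),
            "height": float(elem.get("HEIGHT", 0)),
        })
    return words


if __name__ == "__main__":
    for word in extract_words(SAMPLE_ALTO):
        print(word["text"], word["hpos"], word["vpos"], word["width"], word["height"])
```

Matching on the local element name, rather than a fixed namespace URI, is a deliberate simplification so the same sketch can read files from different ALTO versions.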
The combination of METS and ALTO (often written METS/ALTO) is the current industry standard for newspaper digitization, used by hundreds of modern, large-scale newspaper digitization projects (and lots of smaller projects too!). A very small sample of projects using METS/ALTO is listed below.
- Chronicling America from the Library of Congress.
- The British Newspaper Archive from the British Library.
- Trove from the National Library of Australia.
- Papers Past from the National Library of New Zealand.
- NewspaperSG from the National Library of Singapore.
- Papers of Princeton from Princeton University Library.
- Columbia Spectator Archive from Columbia University Libraries.