Author: Stefan Boddie, Managing Director, DL Consulting Ltd.
What is METS?
The Metadata Encoding and Transmission Standard (METS) is an XML standard maintained by the Library of Congress.
METS is a highly flexible schema for encoding descriptive, administrative, and structural metadata to describe complex digital objects. In a METS file you find information such as the title, author, publisher and date of the original work, and also information about the digital object itself, including the digitization process and the physical and logical structure of the object.
When METS is used to describe digitized newspapers, there is typically one METS file per newspaper issue.
What METS XML contains
As typically used for digitized newspapers, a METS XML file has five main metadata sections. Each section describes a different aspect of the digital object.
Section 1 – Descriptive Metadata — <dmdSec>
Uses MODS or a similar metadata schema to describe the object itself. Here you find the title of the object, as well as other information such as author, publisher, and date.
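As a rough illustration of reading this section, the following Python sketch pulls the title, publisher and date out of a MODS record wrapped in a descriptive metadata section. The sample XML, its ID and all of its values are invented for the example, and the helper name describe is hypothetical. Real files may use a different descriptive schema or nest the record differently.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for METS and MODS version 3.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
}

# Hypothetical descriptive metadata section; title, publisher and date are invented.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="DMD1">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:titleInfo><mods:title>The Example Gazette</mods:title></mods:titleInfo>
          <mods:originInfo>
            <mods:publisher>Example Press</mods:publisher>
            <mods:dateIssued>1901-03-14</mods:dateIssued>
          </mods:originInfo>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
</mets:mets>"""


def describe(mets_xml):
    """Return title, publisher and date from the first MODS record found."""
    root = ET.fromstring(mets_xml)
    mods = root.find("mets:dmdSec/mets:mdWrap/mets:xmlData/mods:mods", NS)
    if mods is None:
        return {}
    return {
        "title": mods.findtext("mods:titleInfo/mods:title", default=None, namespaces=NS),
        "publisher": mods.findtext("mods:originInfo/mods:publisher", default=None, namespaces=NS),
        "date": mods.findtext("mods:originInfo/mods:dateIssued", default=None, namespaces=NS),
    }


if __name__ == "__main__":
    print(describe(SAMPLE))
```

Note that XML element names are case-sensitive, so the lookup uses the schema's spelling, dmdSec.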
Section 2 – Administrative Metadata — <amdSec>
Uses MIX or a similar metadata schema to describe the digitization process and the resulting digital files. Here you find information about the scanning process, hardware, digitization software, compression, file types and more.
Section 3 – File Section — <fileSec>
Lists, describes, and links to the files that make up the complex digital object described by the METS file. For a newspaper issue those files typically include page-level images (in TIFF and/or JPEG 2000 format), ALTO XML files describing the layout and content of each individual page, and page-level and/or issue-level PDF files.
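To make the file listing concrete, here is a minimal Python sketch that walks a file section and returns one entry per file. The sample XML is hypothetical: the file IDs, the USE values on each file group and the file paths are invented, and the helper name list_files is an assumption for illustration only.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
# Attributes in the XLink namespace must be looked up by their qualified name.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical file section for a one-page issue; IDs and paths are invented.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="image">
      <mets:file ID="IMG0001" MIMETYPE="image/jp2">
        <mets:FLocat LOCTYPE="URL" xlink:href="images/0001.jp2"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="alto">
      <mets:file ID="ALTO0001" MIMETYPE="text/xml">
        <mets:FLocat LOCTYPE="URL" xlink:href="alto/0001.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def list_files(mets_xml):
    """Return (group use, file ID, MIME type, location) for every listed file."""
    root = ET.fromstring(mets_xml)
    files = []
    for group in root.findall("mets:fileSec/mets:fileGrp", NS):
        for file_el in group.findall("mets:file", NS):
            locator = file_el.find("mets:FLocat", NS)
            href = locator.get(XLINK_HREF) if locator is not None else None
            files.append(
                (group.get("USE"), file_el.get("ID"), file_el.get("MIMETYPE"), href)
            )
    return files


if __name__ == "__main__":
    for entry in list_files(SAMPLE):
        print(entry)
```

The file IDs matter because the structural sections refer to files by ID rather than by path.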
Section 4 – Physical Structure — <structMap LABEL="Physical Structure">
Describes the physical structure of a complex digital object. For a digitized newspaper this section “points to” and describes the pages that make up the newspaper issue. It includes metadata associated with the physical pages (e.g. page numbers and/or ordering information) and links to files (e.g. images and ALTO XML files) that describe each page.
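The way pages point to files can be sketched in Python as follows. This assumes the structural map is identified by the LABEL value shown in the heading above and that each page is a div with TYPE="PAGE". The sample XML, its TYPE values and file IDs are invented, and the helper name pages is hypothetical. Real files may identify pages differently.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
METS_DIV = "{http://www.loc.gov/METS/}div"

# Hypothetical physical structure for a two-page issue; all IDs are invented.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap LABEL="Physical Structure">
    <mets:div TYPE="ISSUE">
      <mets:div TYPE="PAGE" ORDER="2" ORDERLABEL="2">
        <mets:fptr FILEID="IMG0002"/>
        <mets:fptr FILEID="ALTO0002"/>
      </mets:div>
      <mets:div TYPE="PAGE" ORDER="1" ORDERLABEL="1">
        <mets:fptr FILEID="IMG0001"/>
        <mets:fptr FILEID="ALTO0001"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def pages(mets_xml):
    """Return (order, page label, file IDs) for each page, sorted by order."""
    root = ET.fromstring(mets_xml)
    struct_map = root.find("mets:structMap[@LABEL='Physical Structure']", NS)
    if struct_map is None:
        return []
    result = []
    for div in struct_map.iter(METS_DIV):
        if div.get("TYPE") != "PAGE":  # assumption: pages are marked TYPE="PAGE"
            continue
        file_ids = [fptr.get("FILEID") for fptr in div.findall("mets:fptr", NS)]
        result.append((int(div.get("ORDER")), div.get("ORDERLABEL"), file_ids))
    # ORDER carries the page sequence, so sort on it rather than on document order.
    return sorted(result, key=lambda page: page[0])


if __name__ == "__main__":
    for page in pages(SAMPLE):
        print(page)
```

Each FILEID value refers to the ID of a file listed in the file section, which is how a page is tied to its image and ALTO XML file.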
Section 5 – Logical Structure — <structMap LABEL="Logical Structure">
Describes the “logical” structure of a complex digital object. For newspapers, if articles have been identified during digitization, this section lists the “table of contents” of articles in the newspaper issue, as well as any metadata (e.g. headlines and bylines) associated with individual articles.
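A table of contents could be read out of this section along the following lines. This Python sketch assumes that each article is a div with TYPE="ARTICLE", that the headline is stored in the div's LABEL attribute, and that each article points to text blocks in the ALTO files through area elements. The sample XML, headlines and block IDs are invented, and the helper name table_of_contents is hypothetical. Other METS profiles may record headlines and bylines elsewhere, for example in a linked descriptive metadata section.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
METS_DIV = "{http://www.loc.gov/METS/}div"

# Hypothetical logical structure with two articles; headlines and IDs are invented.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap LABEL="Logical Structure">
    <mets:div TYPE="ISSUE" LABEL="The Example Gazette, 14 March 1901">
      <mets:div TYPE="ARTICLE" LABEL="Harbour Board Meeting">
        <mets:fptr><mets:area FILEID="ALTO0001" BEGIN="P1_TB00001" BETYPE="IDREF"/></mets:fptr>
      </mets:div>
      <mets:div TYPE="ARTICLE" LABEL="Local News">
        <mets:fptr><mets:area FILEID="ALTO0001" BEGIN="P1_TB00004" BETYPE="IDREF"/></mets:fptr>
        <mets:fptr><mets:area FILEID="ALTO0002" BEGIN="P2_TB00001" BETYPE="IDREF"/></mets:fptr>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def table_of_contents(mets_xml):
    """Return (headline, [(ALTO file ID, block ID), ...]) for each article."""
    root = ET.fromstring(mets_xml)
    struct_map = root.find("mets:structMap[@LABEL='Logical Structure']", NS)
    if struct_map is None:
        return []
    toc = []
    for div in struct_map.iter(METS_DIV):
        if div.get("TYPE") != "ARTICLE":  # assumption: articles are TYPE="ARTICLE"
            continue
        areas = [
            (area.get("FILEID"), area.get("BEGIN"))
            for area in div.findall("mets:fptr/mets:area", NS)
        ]
        toc.append((div.get("LABEL"), areas))
    return toc


if __name__ == "__main__":
    for headline, areas in table_of_contents(SAMPLE):
        print(headline, areas)
```

In this sample the second article continues onto a second page, which is why it carries pointers into two different ALTO files.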