Veridian XML API documentation


Please note that the Veridian software changes frequently, and different versions vary in their support for the XML API described on this page. This page should therefore be used only as a guide: the requests and responses it describes may not behave as documented on every Veridian installation. Existing Veridian users should consult the API documentation included with their software installation for instructions specific to their version of Veridian.


The XML interface described in this document is designed to provide an alternative method of accessing the data in a Veridian system. Veridian usually returns HTML pages, to provide a user interface via a web browser, but it can also return XML versions of some pages. These allow access to the document data at a low level, making it possible to use Veridian as a “backend” and create alternative interfaces to search, browse and view the documents.


Contents

1. Introduction

2. Types
2.1. DocumentContent
2.2. DocumentMetadata
2.3. LogicalSectionContent
2.4. LogicalSectionMetadata
2.5. PageContent
2.6. PageMetadata
2.7. PublicationMetadata

3. Requests
3.1. GetDates
3.2. GetDocumentContent
3.3. GetLogicalSectionContent
3.4. GetPageContent
3.5. GetPublications
3.6. GetPublicationDocuments
3.7. SearchDocuments
3.8. SearchLogicalSections
3.9. SearchPages

A. Appendix: Example Request and Responses


1. Introduction

Veridian runs as a CGI executable through a web server (typically Apache HTTPD). When a request is received (usually a GET request with a series of parameters in the URL), Veridian searches its Lucene index, looks up information in its metadata databases, and outputs a response. This response is usually HTML, but many requests also support XML output. To request the XML version, add “&f=XML” to the request parameters.

The remainder of this document describes the parameters to the different XML requests, and the format of the XML responses produced.
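
As a rough illustration of the “&f=XML” convention, the sketch below builds a request URL in Python. The host name is a placeholder and the helper name build_xml_url is hypothetical; only the /cgi-bin/veridian path and the parameter names come from this document.

```python
from urllib.parse import urlencode

# Placeholder host; the /cgi-bin/veridian path follows the examples in this
# document but may differ on a real installation.
BASE_URL = "https://veridian.example.org/cgi-bin/veridian"


def build_xml_url(params, base_url=BASE_URL):
    """Return a request URL that asks Veridian for XML rather than HTML."""
    query = dict(params)
    query["f"] = "XML"  # the parameter that switches the output to XML
    return base_url + "?" + urlencode(query)


print(build_xml_url({"a": "d", "d": "DSC19800118"}))
# https://veridian.example.org/cgi-bin/veridian?a=d&d=DSC19800118&f=XML
```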


2. Types

This section describes some common data types used in the XML responses. The purpose of the data types is to reduce duplication and inconsistency by grouping related fields together and ensuring that these fields always appear together.

2.1. DocumentContent

DocumentContent contains fields related to the contents of a document (rather than its metadata): the pages and logical sections it contains. This type is only returned from the GetDocumentContent request. It contains the following fields:

DocumentContent
Fields
DocumentNextDocumentID The identifier of the next document within the publication, in date order. This will be empty if no next document exists (i.e. the current document is the last document in the publication).

Type: String

Example: DSC19800119

DocumentPrevDocumentID The identifier of the previous document within the publication, in date order. This will be empty if no previous document exists (i.e. the current document is the first document in the publication).

Type: String

Example: DSC19800117

DocumentPdfURL The URL of the PDF for this document, if one is available.

Type: String

Example: /cgi-bin/imageserver/imageserver.pl?oid=DSC19800118&getpdf=true

DocumentViewURL The URL of the HTML page in the Veridian delivery system displaying the document.

Type: String

Example: /cgi-bin/veridian?a=d&d=DSC19800118

ArrayOfPage A container object, with zero or more occurrences (one for each page that makes up the document) of the following field:
Page
A container object, with the following field:
PageMetadata
An instance of PageMetadata.
ArrayOfLogicalSection A container object, with zero or more occurrences (one for each logical section that makes up the document) of the following field:
LogicalSection
A container object, with the following fields:
LogicalSectionMetadata
An instance of LogicalSectionMetadata.
ArrayOfLogicalSection
A container object, with zero or more occurrences (one for each child logical section) of the LogicalSection field (described above).
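
Because ArrayOfLogicalSection nests inside LogicalSection, the section list is most naturally read recursively. The following Python sketch assumes that each field described above appears as an XML element of the same name; the sample XML is hand-written for illustration and is not real Veridian output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample (an assumption about the element layout, not captured
# from a live server).
SAMPLE = """
<DocumentContent>
  <ArrayOfLogicalSection>
    <LogicalSection>
      <LogicalSectionMetadata>
        <LogicalSectionID>DSC19800118.2.1</LogicalSectionID>
        <LogicalSectionTitle>Sample parent section</LogicalSectionTitle>
      </LogicalSectionMetadata>
      <ArrayOfLogicalSection>
        <LogicalSection>
          <LogicalSectionMetadata>
            <LogicalSectionID>DSC19800118.2.1.1</LogicalSectionID>
            <LogicalSectionTitle>Sample child section</LogicalSectionTitle>
          </LogicalSectionMetadata>
          <ArrayOfLogicalSection/>
        </LogicalSection>
      </ArrayOfLogicalSection>
    </LogicalSection>
  </ArrayOfLogicalSection>
</DocumentContent>
""".strip()


def walk_sections(array_elem, depth=0):
    """Yield (depth, section_id, title) for each logical section, parents first."""
    if array_elem is None:
        return
    for section in array_elem.findall("LogicalSection"):
        meta = section.find("LogicalSectionMetadata")
        if meta is not None:
            yield (
                depth,
                meta.findtext("LogicalSectionID", default=""),
                meta.findtext("LogicalSectionTitle", default=""),
            )
        # Child sections live in a nested ArrayOfLogicalSection container.
        yield from walk_sections(section.find("ArrayOfLogicalSection"), depth + 1)


root = ET.fromstring(SAMPLE)
outline = list(walk_sections(root.find("ArrayOfLogicalSection")))
for depth, section_id, title in outline:
    print("  " * depth + section_id + "  " + title)
```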

2.2. DocumentMetadata

DocumentMetadata contains fields related to the metadata of a document. This type is returned from many requests. It contains the following fields:

DocumentMetadata
Fields
DocumentDate The date of the document, in human-readable “DD Month YYYY” format.

Type: String

Example: 18 January 1980

DocumentFeatureCode Specifies any special information about the document, or a reason why it is not present in the Veridian system. Usually empty.

Type: String

Example: Missing document

DocumentID The unique identifier of the document. Document identifiers consist of the document’s publication code followed by the document date in YYYYMMDD format.

Type: String

Example: DSC19800118

DocumentNumber The “number” of the document, as specified in the source METS file.

Type: String

Example: 5255

DocumentTitle The title of the document, as specified in the source METS file. These titles may be incorrect or inconsistent, so the PublicationTitle field is generally more useful.

Type: String

Example: Daily Southern Cross

DocumentType The type of the document; possible values include BOOK, DOCUMENT, IMAGES, MULTIMEDIA, NEWSPAPER, PERIODICAL, PHOTO.

Type: String

Example: NEWSPAPER

DocumentVolume The “volume” of the document, as specified in the source METS file.

Type: String

Example: XXX
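
The DocumentID and DocumentDate formats described above can be decoded on the client side. This is a minimal Python sketch, assuming the publication code is everything before the final eight digits and that month names are in English; the function names are hypothetical.

```python
import re
from datetime import datetime


def split_document_id(document_id):
    """Split e.g. 'DSC19800118' into its publication code and document date."""
    match = re.fullmatch(r"(.+?)(\d{8})", document_id)
    if match is None:
        raise ValueError(f"not a document identifier: {document_id!r}")
    publication_code, digits = match.groups()
    return publication_code, datetime.strptime(digits, "%Y%m%d").date()


def parse_document_date(text):
    """Parse the human-readable 'DD Month YYYY' form of DocumentDate."""
    # %B matches full month names in the current locale; English is assumed.
    return datetime.strptime(text, "%d %B %Y").date()


code, issued = split_document_id("DSC19800118")
print(code, issued.isoformat())  # DSC 1980-01-18
print(parse_document_date("18 January 1980").isoformat())  # 1980-01-18
```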

2.3. LogicalSectionContent

LogicalSectionContent contains fields related to the content of a logical section: its text and images. This type is only returned from the GetLogicalSectionContent request. It contains the following fields:

LogicalSectionContent
Fields
LogicalSectionImagesHTML A block of HTML that displays the series of block images that make up the logical section, including highlighting of query terms (if specified). This is the same HTML that is used in the “clipped article” view in the standard Veridian user interface.

Type: HTML

Example: <div class="veridianimagecontainerdiv" style="width:283px; height:18px"><img src="/cgi-bin/imageserver/imageserver.pl?oid=DSC18740625.2.4.1&amp;area=1&amp;width=283&amp;color=all&amp;ext=jpg&amp;key=" width="283" height="18" class="veridianimage" alt="Block image" title="Block image" /></div>
<div class="veridianimagespacerdiv" style="width:283px; height:18px"></div>

LogicalSectionNextLogicalSectionID The identifier of the next logical section within the document, traversing the hierarchical logical section list. This will be empty if no next logical section exists (i.e. the current logical section is the last logical section in the document).

Type: String

Example: DSC19800118.2.5

LogicalSectionPrevLogicalSectionID The identifier of the previous logical section within the document, traversing the hierarchical logical section list. This will be empty if no previous logical section exists (i.e. the current logical section is the first logical section in the document).

Type: String

Example: DSC19800118.2.3

LogicalSectionTextHTML The text of the logical section, in HTML format with <p> tags surrounding the blocks, and query terms (if specified) highlighted. This may be empty for some logical section types.

Type: String

Example: <p>‘Entrance fee’ to clubhouse shock for lunch crowd IRENE NGOO By CURRY lovers to the popular Indian curry stall at the Singapore Civil Service Sports I Council in Dempsey Road</p>

LogicalSectionTextWordCount The number of tokens in the logical section text.

Type: Number

Example: 31

LogicalSectionViewURL The URL of the HTML page in the Veridian delivery system displaying the logical section.

Type: String

Example: /cgi-bin/veridian?a=d&d=DSC19800118.2.4

2.4. LogicalSectionMetadata

LogicalSectionMetadata contains fields related to the metadata of a logical section. This type is returned from the GetLogicalSectionContent, GetDocumentContent and SearchLogicalSections requests. It contains the following fields:

LogicalSectionMetadata
Fields
LogicalSectionFirstPageID The identifier of the page on which the logical section starts. Page identifiers consist of the document identifier followed by “.1” and then the page’s position within the document.

Type: String

Example: DSC19800118.1.1

LogicalSectionID The unique identifier of the logical section. Logical section identifiers consist of the document identifier followed by “.2” and then the hierarchical position of the logical section within the document.

Type: String

Example: DSC19800118.2.4

LogicalSectionTitle The title of the logical section.

Type: String

Example: ‘Entrance fee’ to clubhouse shock for lunch crowd

LogicalSectionType The type of the logical section; possible values are ADVERTISEMENT, ARTICLE, ARTICLE + ILLUSTRATION, GROUPING_NODE, LETTER, MISCELLANEOUS, OBITUARY.

Type: String

Example: ARTICLE
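
The identifier scheme (document identifier, then “.1” for pages or “.2” for logical sections, then a position) lends itself to a small classifier. The Python sketch below assumes that publication codes never contain a “.” and that every position component is numeric; neither is stated in this document, and classify_identifier is a hypothetical helper name.

```python
def classify_identifier(identifier):
    """Return (kind, document_id, position) for a Veridian identifier."""
    parts = identifier.split(".")
    document_id, rest = parts[0], parts[1:]
    if not rest:
        return ("document", document_id, ())
    # ".1" introduces a page, ".2" a logical section.
    kind = {"1": "page", "2": "logical_section"}.get(rest[0])
    if kind is None or len(rest) < 2:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    # Logical sections may have several levels, e.g. ".2.4.1".
    position = tuple(int(part) for part in rest[1:])
    return (kind, document_id, position)


print(classify_identifier("DSC19800118"))
# ('document', 'DSC19800118', ())
print(classify_identifier("DSC19800118.1.1"))
# ('page', 'DSC19800118', (1,))
print(classify_identifier("DSC18740625.2.4.1"))
# ('logical_section', 'DSC18740625', (4, 1))
```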

2.5. PageContent

PageContent contains fields related to the content of a page: its image and PDF. This type is only returned from the GetPageContent request. It contains the following fields:

PageContent
Fields
PageImageHTML A block of HTML for displaying the page image, including highlighting of query terms (if specified). This is the same HTML that is used in the “classic” page view in the standard Veridian user interface, but without the information for highlighting the logical section areas (this is encapsulated in the LogicalSectionBlock fields below).

Type: HTML

Example: <div class="veridianimagecontainerdiv" style="width:1400px; height:2410px">
<img src="/cgi-bin/imageserver/imageserver.pl?oid=NZGAZETTE18400509.1.1&amp;width=1400&amp;color=all&amp;ext=jpg&amp;key=" width="1400" height="2410" class="veridianimage" alt="Page image" title="Page image" />
</div>
<div class="veridianimagespacerdiv" style="width:1400px; height:2410px"></div>

PageNextPageID The identifier of the next page within the document, traversing the linear page list. This will be empty if no next page exists (i.e. the current page is the last page in the document).

Type: String

Example: DSC19800118.1.5

PagePrevPageID The identifier of the previous page within the document, traversing the linear page list. This will be empty if no previous page exists (i.e. the current page is the first page in the document).

Type: String

Example: DSC19800118.1.3

PagePdfURL The URL of the PDF for this page.

Type: String

Example: /cgi-bin/imageserver/imageserver.pl?oid=DSC19800118.1.1&getpdf=true

PageTextHTML The text of the page, in HTML format with <p> tags surrounding the blocks, and query terms (if specified) highlighted. This may be empty for some page types.

Type: String

Example: <p>‘Entrance fee’ to clubhouse shock for lunch crowd IRENE NGOO By CURRY lovers to the popular Indian curry stall at the Singapore Civil Service Sports I Council in Dempsey Road</p> [truncated]

PageTextWordCount The number of tokens in the page text.

Type: Number

Example: 31

PageViewURL The URL of the HTML page in the Veridian delivery system displaying the page.

Type: String

Example: /cgi-bin/veridian?a=d&d=DSC19800118.1.1

ArrayOfLogicalSectionBlock A container object, with zero or more occurrences (one for each fragment of logical section that appears on the page) of the following field:
LogicalSectionBlock
A container object, with the following fields:
LogicalSectionBlockLocation
The location of the logical section fragment within the page, in “X,Y,Width,Height” format.

Type: String

Example: 498,205,158,810

LogicalSectionID
The identifier of the logical section this fragment is part of.

Type: String

Example: DSC18450819.2.2.4

LogicalSectionTitle
The title of the logical section this fragment is part of.

Type: String

Example: Page 41 Advertisements Column 4

LogicalSectionType
The type of the logical section this fragment is part of.

Type: String

Example: ADVERTISEMENT
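
A LogicalSectionBlockLocation value can be turned into numbers before drawing or hit-testing a region. The Python sketch below only parses the “X,Y,Width,Height” string; it makes no assumption about which image size the coordinates refer to, since that is not stated here. BlockLocation and parse_block_location are hypothetical names.

```python
from typing import NamedTuple


class BlockLocation(NamedTuple):
    """A rectangle in the documented "X,Y,Width,Height" order."""
    x: int
    y: int
    width: int
    height: int


def parse_block_location(text):
    """Parse a LogicalSectionBlockLocation string into a BlockLocation."""
    values = [int(value) for value in text.split(",")]
    if len(values) != 4:
        raise ValueError(f"expected four comma-separated numbers: {text!r}")
    return BlockLocation(*values)


loc = parse_block_location("498,205,158,810")
print(loc)
# BlockLocation(x=498, y=205, width=158, height=810)
print(loc.x + loc.width, loc.y + loc.height)  # right and bottom edges: 656 1015
```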

2.6. PageMetadata

PageMetadata contains fields related to the metadata of a page. This type is returned from the GetDocumentContent, GetPageContent and SearchPages requests. It contains the following fields:

PageMetadata
Fields
PageFeatureCode Specifies any special information about the page. Usually empty.

Type: String

Example: Supplement page

PageID The unique identifier of the page. Page identifiers consist of the document identifier followed by “.1” and then the page’s position within the document.

Type: String

Example: DSC19800118.1.1

PageImageHeight The height of the original page image, in pixels.

Type: Number

Example: 9197

PageImageWidth The width of the original page image, in pixels.

Type: Number

Example: 6614

PageOCRAccuracy The estimated accuracy of the page text from the OCR process. Obtained from the source data.

Type: String

Example: 95%

PageTitle The title of the page. Page numbers do not necessarily start at 1 for a document, and some pages may not have a number at all.

Type: String

Example: Page 8

2.7. PublicationMetadata

PublicationMetadata contains fields related to the metadata of a publication. This type is returned from many requests. It contains the following fields:

PublicationMetadata
Fields
PublicationID The unique identifier of the publication.

Type: String

Example: DSC

PublicationTitle The human-readable name of the publication.

Type: String

Example: Daily Southern Cross


3. Requests

This section describes each of the nine available requests, with descriptions of their request parameters and response values.

3.1. GetDates

The GetDates request allows the date coverage of the collection to be obtained. It returns the list of dates containing documents, optionally filtered by a publication.

GetDates Request
Required parameters
a Must be set to “cl”.
cl Must be set to “CL2”.
f Must be set to “XML”.
Optional parameters
sp A publication ID to filter the results by. Only dates with documents in the specified publication will be returned.

Type: String

Example: DSC

GetDates Response
Fields present on failure
Error Contains an error message if the GetDates request failed. The GetDates request will fail if the “sp” parameter does not specify a valid value.

Type: String

Example: Invalid value “The New Zealand Times” for CGI argument “sp”.

Fields present on success
ArrayOfDate A container object, with zero or more occurrences (one for each date in the collection/publication) of the following field:
Date
The date value in YYYY, YYYYMM or YYYYMMDD format.
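
Since a Date value may carry a year only, a year and month, or a full date, client code needs to handle all three lengths. The Python sketch below assumes each Date is an XML element of that name inside ArrayOfDate; the sample XML is hand-written for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample covering the three documented formats.
SAMPLE = (
    "<ArrayOfDate>"
    "<Date>1874</Date><Date>187406</Date><Date>18740625</Date>"
    "</ArrayOfDate>"
)


def parse_coverage_date(value):
    """Return (year, month, day), using None for components that are absent."""
    value = value.strip()
    if not value.isdigit() or len(value) not in (4, 6, 8):
        raise ValueError(f"unexpected Date value: {value!r}")
    year = int(value[:4])
    month = int(value[4:6]) if len(value) >= 6 else None
    day = int(value[6:8]) if len(value) == 8 else None
    return (year, month, day)


root = ET.fromstring(SAMPLE)
dates = [parse_coverage_date(elem.text or "") for elem in root.iter("Date")]
print(dates)
# [(1874, None, None), (1874, 6, None), (1874, 6, 25)]
```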

3.2. GetDocumentContent

The GetDocumentContent request provides access to all the data available for one document: document content (including the list of pages and the hierarchical list of logical sections), and document metadata.

GetDocumentContent Request
Required parameters
a Must be set to “d”.
d The identifier of the document being requested. Document identifiers consist of the document’s publication code followed by the document date in YYYYMMDD format.

Type: String

Example: DSC19800118

f Must be set to “XML”.
Optional parameters
(None)
GetDocumentContent Response
Fields present on failure
Error Contains an error message if the GetDocumentContent request failed. The GetDocumentContent request will fail if the “d” parameter does not specify a valid document identifier.

Type: String

Example: Missing required CGI argument “d”.

Fields present on success
Document A container object, with the following fields:
DocumentContent
An instance of DocumentContent.
DocumentMetadata
An instance of DocumentMetadata.
PublicationMetadata
An instance of PublicationMetadata.
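
Every request in this document reports failure through an Error field, so a client can check for it once before reading any other field. The Python sketch below assumes Error appears as an XML element somewhere in the response; the root element name VeridianXMLResponse and the VeridianError class are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


class VeridianError(Exception):
    """Raised when a response carries an Error field."""


def parse_response(xml_text):
    """Parse a response and raise VeridianError if it reports a failure."""
    root = ET.fromstring(xml_text)
    # Look for Error at the root or anywhere below it, since the exact
    # nesting is not specified in this document.
    error = root if root.tag == "Error" else root.find(".//Error")
    if error is not None:
        raise VeridianError((error.text or "").strip())
    return root


failed = (
    "<VeridianXMLResponse>"
    '<Error>Missing required CGI argument "d".</Error>'
    "</VeridianXMLResponse>"
)
try:
    parse_response(failed)
except VeridianError as exc:
    print("request failed:", exc)
```

The same check applies unchanged to the other requests, because they all use the same Error field.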

3.3. GetLogicalSectionContent

The GetLogicalSectionContent request provides access to all the data available for one logical section: logical section content (both text and images), and logical section metadata.

GetLogicalSectionContent Request
Required parameters
a Must be set to “d”.
d The identifier of the logical section being requested. Logical section identifiers consist of the document identifier followed by “.2” and then the hierarchical position of the logical section within the document.

Type: String

Example: DSC19800118.2.4

f Must be set to “XML”.
Optional parameters
hl A comma-separated list of phrases/terms to locate in the returned logical section images.

Type: String

Example: the daily,cross

GetLogicalSectionContent Response
Fields present on failure
Error Contains an error message if the GetLogicalSectionContent request failed. The GetLogicalSectionContent request will fail if the “d” parameter does not specify a valid logical section identifier.

Type: String

Example: Missing required CGI argument “d”.

Fields present on success
LogicalSection A container object, with the following fields:
LogicalSectionContent
An instance of LogicalSectionContent.
LogicalSectionMetadata
An instance of LogicalSectionMetadata.
DocumentMetadata
An instance of DocumentMetadata.
PublicationMetadata
An instance of PublicationMetadata.

3.4. GetPageContent

The GetPageContent request provides access to all the data available for one page: page content (image and PDF), and page metadata.

GetPageContent Request
Required parameters
a Must be set to “d”.
d The identifier of the page being requested. Page identifiers consist of the document identifier followed by “.1” and then the page’s position within the document.

Type: String

Example: DSC19800118.1.1

f Must be set to “XML”.
Optional parameters
(None)
GetPageContent Response
Fields present on failure
Error Contains an error message if the GetPageContent request failed. The GetPageContent request will fail if the “d” parameter does not specify a valid page identifier.

Type: String

Example: Missing required CGI argument “d”.

Fields present on success
Page A container object, with the following fields:
PageContent
An instance of PageContent.
PageMetadata
An instance of PageMetadata.
DocumentMetadata
An instance of DocumentMetadata.
PublicationMetadata
An instance of PublicationMetadata.

3.5. GetPublications

The GetPublications request returns the list of publications in the collection.

GetPublications Request
Required parameters
a Must be set to “cl”.
cl Must be set to “CL1”.
f Must be set to “XML”.
Optional parameters
(None)
GetPublications Response
Fields present on failure
Error Contains an error message if the GetPublications request failed. The GetPublications request should not fail.
Fields present on success
ArrayOfPublication A container object, with zero or more occurrences (one for each publication in the collection) of the following field:
Publication
A container object, with the following field:
PublicationMetadata
An instance of PublicationMetadata.

3.6. GetPublicationDocuments

The GetPublicationDocuments request returns the list of documents in a publication.

GetPublicationDocuments Request
Required parameters
a Must be set to “cl”.
cl Must be set to “CL1”.
f Must be set to “XML”.
sp The identifier of the publication being requested.

Type: String

Example: DSC

Optional parameters
(None)
GetPublicationDocuments Response
Fields present on failure
Error Contains an error message if the GetPublicationDocuments request failed. The GetPublicationDocuments request will fail if the “sp” parameter does not specify a valid publication identifier.

Type: String

Example: Invalid value “The New Zealand Times” for CGI argument “sp”.

Fields present on success
ArrayOfDocument A container object, with zero or more occurrences (one for each document in the publication) of the following field:
Document
A container object, with the following field:
DocumentMetadata
An instance of DocumentMetadata.

3.7. SearchDocuments

The SearchDocuments request allows the thousands of documents in the collection to be listed, and optionally filtered in a number of different ways. It returns a list of matching documents.

SearchDocuments Request
Required parameters
a Must be set to “q”.
f Must be set to “XML”.
leq Must be set to “Document”.
Optional parameters
dafdq Filter by date: day component of range start.

Type: String

Example: 01

dafmq Filter by date: month component of range start.

Type: String

Example: 01

dafyq Filter by date: year component of range start.

Type: String

Example: 1980

datdq Filter by date: day component of range end.

Type: String

Example: 31

datmq Filter by date: month component of range end.

Type: String

Example: 12

datyq Filter by date: year component of range end.

Type: String

Example: 1989

deq Filter by decade.

Type: String

Example: 198

o The number of search results to return. Default value is 20 if not specified. Maximum value is 100. Used in conjunction with the “r” parameter, this allows the results for a search to be requested in blocks. This is particularly important when there are many matches for a search.

Type: Number

Example: 20

puq Filter by publication ID.

Type: String

Example: DSC

r The number of the first search result to return. Default value is 1 if not specified. Used in conjunction with the “o” parameter, this allows the results for a search to be requested in blocks. This is particularly important when there are many matches for a search.

Type: Number

Example: 21

sf Allows the search results to be sorted by a document field; valid values are “byDA” (document date), “byPU” (publication ID) and “byTY” (document type). If this parameter is not specified, the search results will be sorted by relevance.

Type: String

Example: byDA

tyq Filter by document type.

Type: String

Example: BOOK

yeq Filter by year.

Type: String

Example: 1980

SearchDocuments Response
Fields present on failure
Error Contains an error message if the SearchDocuments request failed. The SearchDocuments request will fail if invalid values are specified for the “o” or “sf” parameters. If invalid values are specified for the filter parameters then no search results will be returned.

Type: String

Example: Invalid value “Date” for CGI argument “sf”.

Fields present on success
TotalNumberOfSearchResults The number of documents that matched the search criteria. May be 0.

Type: Number

Example: 94851

FirstSearchResultNumberReturned The number of the first search result in the results returned.

Type: Number

Example: 1

LastSearchResultNumberReturned The number of the last search result in the results returned.

Type: Number

Example: 20

ArrayOfDocument A container object, with zero or more occurrences (one for each document returned) of the following field:
Document
A container object, with the following fields:
SearchResultNumber
The position of the search result amongst all the matches to the query (not just those returned).

Type: Number

Example: 5

DocumentMetadata
An instance of DocumentMetadata.
PublicationMetadata
An instance of PublicationMetadata.
ArrayOfSearchFacet A container object, with zero or more occurrences (one for each search facet returned) of the following field:
SearchFacet
A container object, with the following fields:
SearchFacetField
The field name of the search facet.

Type: String

Example: PublicationCode

SearchFacetValue
The value of the search facet.

Type: String

Example: DSC

SearchFacetCount
The frequency of this search facet within the search results.

Type: Number

Example: 494
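
The “o” and “r” parameters together let a client walk through a large result set in blocks. The Python sketch below generates the parameter pairs for each block from TotalNumberOfSearchResults; result_blocks is a hypothetical helper, and it assumes the server simply returns fewer results for the final, partial block.

```python
MAX_BLOCK_SIZE = 100  # documented maximum for the "o" parameter


def result_blocks(total_results, block_size=20):
    """Yield the "r"/"o" parameters needed to fetch every search result."""
    if not 1 <= block_size <= MAX_BLOCK_SIZE:
        raise ValueError("block_size must be between 1 and 100")
    # "r" is 1-based: the first result is number 1.
    for first in range(1, total_results + 1, block_size):
        yield {"r": first, "o": block_size}


for block in result_blocks(45):
    print(block)
# {'r': 1, 'o': 20}
# {'r': 21, 'o': 20}
# {'r': 41, 'o': 20}
```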

3.8. SearchLogicalSections

The SearchLogicalSections request allows the thousands/millions of logical sections in the collection to be searched by keyword, and optionally filtered in a number of different ways. It returns a list of matching logical sections.

SearchLogicalSections Request
Required parameters
a Must be set to “q”.
f Must be set to “XML”.
leq Must be set to “Logical”.
txq One or more keywords to search for in the logical section text.

Type: String

Example: hamilton farming

Optional parameters
dafdq Filter by date: day component of range start.

Type: String

Example: 01

dafmq Filter by date: month component of range start.

Type: String

Example: 01

dafyq Filter by date: year component of range start.

Type: String

Example: 1980

datdq Filter by date: day component of range end.

Type: String

Example: 31

datmq Filter by date: month component of range end.

Type: String

Example: 12

datyq Filter by date: year component of range end.

Type: String

Example: 1989

deq Filter by decade.

Type: String

Example: 198

o The number of search results to return. Default value is 20 if not specified. Maximum value is 100. Used in conjunction with the “r” parameter, this allows the results for a search to be requested in blocks. This is particularly important when there are many matches for a search.

Type: Number

Example: 20

puq Filter by publication ID.

Type: String

Example: DSC

r The number of the first search result to return. Default value is 1 if not specified. Used in conjunction with the “o” parameter, this allows the results for a search to be requested in blocks. This is particularly important when there are many matches for a search.

Type: Number

Example: 21

sf Allows the search results to be sorted by a logical section field; valid values are “byDA” (document date), “byPU” (publication ID), “byTI” (section title) and “byTY” (section type). If this parameter is not specified, the search results will be sorted by relevance.

Type: String

Example: byDA

ssnip Specifies the type of search result “snippet” to return. Valid values are “img” for image snippets, “txt” for text snippets, “auto” for an image snippet if possible (METS/ALTO data) or a text snippet otherwise, and “” for no search snippets. Default value is “txt” if not specified.

Type: String

Example: txt

tyq Filter by section type.

Type: String

Example: ADVERTISEMENT

wofq Filter by section text word count: range start.

Type: String

Example: 50

wotq Filter by section text word count: range end.

Type: String

Example: 100

yeq Filter by year.

Type: String

Example: 1980

SearchLogicalSections Response
Fields present on failure
Error Contains an error message if the SearchLogicalSections request failed. The SearchLogicalSections request will fail if the “txq” parameter is empty, or invalid values are specified for the “o”, “sf” or “ssnip” parameters. If invalid values are specified for the filter parameters then no search results will be returned.

Type: String

Example: Missing required CGI argument “txq”.

Fields present on success
TotalNumberOfSearchResults The number of logical sections that matched the search criteria. May be 0.

Type: Number

Example: 94851

FirstSearchResultNumberReturned The number of the first search result in the results returned.

Type: Number

Example: 1

LastSearchResultNumberReturned The number of the last search result in the results returned.

Type: Number

Example: 20

ArrayOfLogicalSection A container object, with zero or more occurrences (one for each logical section returned) of the following field:
LogicalSection
A container object, with the following fields:
SearchResultNumber
The position of the search result amongst all the matches to the query (not just those returned).

Type: Number

Example: 5

SearchResultScore
The relevancy score for the search result, higher scores being better matches (more relevant).

Type: Number

Example: 3708

SearchResultSnippetHTML
A small preview of the search result, showing the first matching term in the section.

Type: HTML

Example: <div class="veridiansnippettextcontainerdiv">… . 80 0 I up ili urn .. ..8 6 <b class="veridianhighlightcolor">Hamilton</b> .. 85 0 OT’Jiv .. .. 7 C Cambridge …</div>

LogicalSectionMetadata
An instance of LogicalSectionMetadata.
DocumentMetadata
An instance of DocumentMetadata.
PublicationMetadata
An instance of PublicationMetadata.
ArrayOfSearchFacet A container object, with zero or more occurrences (one for each search facet returned) of the following field:
SearchFacet
A container object, with the following fields:
SearchFacetField
The field name of the search facet.

Type: String

Example: LogicalSectionType

SearchFacetValue
The value of the search facet.

Type: String

Example: advertisement

SearchFacetCount
The frequency of this search facet within the search results.

Type: Number

Example: 494
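
The six date-range parameters are easy to get wrong by hand, so it can help to derive them from two dates. The Python sketch below builds a SearchLogicalSections query string; the zero-padded day and month values follow the “01” examples above, date_range_params is a hypothetical helper, and encoding the space in “txq” as “+” is an assumption based on standard form encoding.

```python
from datetime import date
from urllib.parse import urlencode


def date_range_params(start, end):
    """Map a start and end date onto the daf*/dat* filter parameters."""
    return {
        "dafdq": f"{start.day:02d}",
        "dafmq": f"{start.month:02d}",
        "dafyq": str(start.year),
        "datdq": f"{end.day:02d}",
        "datmq": f"{end.month:02d}",
        "datyq": str(end.year),
    }


params = {"a": "q", "leq": "Logical", "txq": "hamilton farming", "puq": "DSC"}
params.update(date_range_params(date(1980, 1, 1), date(1989, 12, 31)))
params["f"] = "XML"
query = urlencode(params)
print(query)
```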

3.9. SearchPages

The SearchPages request allows the thousands/millions of pages in the collection to be searched by keyword, and optionally filtered in a number of different ways. It returns a list of matching pages.

SearchPages Request
Required parameters
a Must be set to “q”.
f Must be set to “XML”.
leq Must be set to “Page”.
txq One or more keywords to search for in the page text.

Type: String

Example: hamilton farming

Optional parameters
dafdq Filter by date: day component of range start.

Type: String

Example: 01

dafmq Filter by date: month component of range start.

Type: String

Example: 01

dafyq Filter by date: year component of range start.

Type: String

Example: 1980

datdq Filter by date: day component of range end.

Type: String

Example: 31

datmq Filter by date: month component of range end.

Type: String

Example: 12

datyq Filter by date: year component of range end.

Type: String

Example: 1989

deq Filter by decade.

Type: String

Example: 198

o The number of search results to return. Default value is 20 if not specified. Maximum value is 100. Used in conjunction with the “r” parameter, this allows the results for a search to be requested in blocks. This is particularly important when there are many matches for a search.

Type: Number

Example: 20

puq Filter by publication ID.

Type: String

Example: DSC

r The number of the first search result to return. Default value is 1 if not specified. Used in conjunction with the “o” parameter, this allows the results for a search to be requested in blocks. This is particularly important when there are many matches for a search.

Type: Number

Example: 21

sf Allows the search results to be sorted by a page field; valid values are “byDA” (document date), “byPU” (publication ID) and “byTI” (page title). If this parameter is not specified, the search results will be sorted by relevance.

Type: String

Example: byDA

ssnip Specifies the type of search result “snippet” to return. Valid values are “img” for image snippets, “txt” for text snippets, “auto” for an image snippet if possible (METS/ALTO data) or a text snippet otherwise, and “” for no search snippets. Default value is “txt” if not specified.

Type: String

Example: txt

wofq Filter by page text word count: range start.

Type: String

Example: 50

wotq Filter by page text word count: range end.

Type: String

Example: 100

yeq Filter by year.

Type: String

Example: 1980

SearchPages Response
Fields present on failure
Error Contains an error message if the SearchPages request failed. The SearchPages request will fail if the “txq” parameter is empty, or invalid values are specified for the “o”, “sf” or “ssnip” parameters. If invalid values are specified for the filter parameters then no search results will be returned.

Type: String

Example: Missing required CGI argument “txq”.

Fields present on success
TotalNumberOfSearchResults The number of pages that matched the search criteria. May be 0.

Type: Number

Example: 94851

FirstSearchResultNumberReturned The number of the first search result in the results returned.

Type: Number

Example: 1

LastSearchResultNumberReturned The number of the last search result in the results returned.

Type: Number

Example: 20

ArrayOfPage A container object, with zero or more occurrences (one for each page returned) of the following field:
Page
A container object, with the following fields:
SearchResultNumber
The position of the search result amongst all the matches to the query (not just those returned).

Type: Number

Example: 5

SearchResultScore
The relevancy score for the search result, higher scores being better matches (more relevant).

Type: Number

Example: 3708

SearchResultSnippetHTML
A small preview of the search result, showing the first matching term in the page.

Type: HTML

Example: <div class="veridiansnippettextcontainerdiv">… . 80 0 I up ili urn .. ..8 6 <b class="veridianhighlightcolor">Hamilton</b> .. 85 0 OT’Jiv .. .. 7 C Cambridge …</div>

PageMetadata
An instance of PageMetadata.
DocumentMetadata
An instance of DocumentMetadata.
PublicationMetadata
An instance of PublicationMetadata.
ArrayOfSearchFacet A container object, with zero or more occurrences (one for each search facet returned) of the following field:
SearchFacet
A container object, with the following fields:
SearchFacetField
The field name of the search facet.

Type: String

Example: PublicationCode

SearchFacetValue
The value of the search facet.

Type: String

Example: DSC

SearchFacetCount
The frequency of this search facet within the search results.

Type: Number

Example: 494
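
Search facets arrive as a flat list of field, value and count triples, which is usually easier to use once grouped by field. The Python sketch below assumes each facet field is an XML element of the same name; the sample XML, including the second facet and its count, is hand-written for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hand-written sample; the NZGAZETTE count is made up for the example.
SAMPLE = """
<ArrayOfSearchFacet>
  <SearchFacet>
    <SearchFacetField>PublicationCode</SearchFacetField>
    <SearchFacetValue>DSC</SearchFacetValue>
    <SearchFacetCount>494</SearchFacetCount>
  </SearchFacet>
  <SearchFacet>
    <SearchFacetField>PublicationCode</SearchFacetField>
    <SearchFacetValue>NZGAZETTE</SearchFacetValue>
    <SearchFacetCount>12</SearchFacetCount>
  </SearchFacet>
</ArrayOfSearchFacet>
""".strip()


def group_facets(root):
    """Return {facet field: {facet value: count}} for all SearchFacet elements."""
    facets = defaultdict(dict)
    for facet in root.iter("SearchFacet"):
        field = facet.findtext("SearchFacetField", default="")
        value = facet.findtext("SearchFacetValue", default="")
        count = int(facet.findtext("SearchFacetCount", default="0"))
        facets[field][value] = count
    return dict(facets)


facets = group_facets(ET.fromstring(SAMPLE))
print(facets)
# {'PublicationCode': {'DSC': 494, 'NZGAZETTE': 12}}
```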


A. Appendix: Example Request and Responses

Request
GetDates a=cl&cl=CL2&f=XML
Request list of all the dates with documents in the collection.
GetDocumentContent a=d&d=DSC18740625&f=XML
Request contents of the Daily Southern Cross June 25 1874 issue.
GetLogicalSectionContent a=d&d=DSC18740625.2.4.1&hl=waipu&f=XML
Request contents of the Daily Southern Cross June 25 1874 “PORT OF AUCKLAND” article, with occurrences of “waipu” located.
GetPageContent a=d&d=DSC18740625.1.1&f=XML
Request contents of the Daily Southern Cross June 25 1874 first page.
GetPublications a=cl&cl=CL1&f=XML
Request list of all the publications in the collection.
GetPublicationDocuments a=cl&cl=CL1&sp=DSC&f=XML
Request list of all the issues in the Daily Southern Cross.
SearchDocuments a=q&leq=Document&dafyq=1850&datyq=1900&puq=DSC&sf=byDA&f=XML
Request issues in the Daily Southern Cross between 1850 and 1900, sorted by date.
SearchLogicalSections a=q&leq=Logical&txq=hamilton&dafyq=1850&datyq=1900&puq=DSC&sf=byTI&f=XML
Request articles in the Daily Southern Cross between 1850 and 1900 containing “hamilton”, sorted by article title.
SearchPages a=q&leq=Page&txq=hamilton&dafyq=1850&datyq=1900&puq=DSC&sf=byDA&f=XML
Request pages in the Daily Southern Cross between 1850 and 1900 containing “hamilton”, sorted by date.
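
The example requests above differ only in a handful of parameters (“a”, “cl”, “leq”, “sp” and the shape of “d”). The Python sketch below maps a query string back to the request name used in this document, which can be handy for logging or testing; identify_request is a hypothetical helper and simply returns “unknown” for anything it does not recognise.

```python
from urllib.parse import parse_qs


def identify_request(query):
    """Name the request from section 3 that a query string corresponds to."""
    params = {key: values[0] for key, values in parse_qs(query).items()}
    action = params.get("a")
    if action == "cl":
        if params.get("cl") == "CL2":
            return "GetDates"
        if params.get("cl") == "CL1":
            # The "sp" parameter distinguishes the two CL1 requests.
            return "GetPublicationDocuments" if "sp" in params else "GetPublications"
    elif action == "d":
        segments = params.get("d", "").split(".")
        if len(segments) == 1 and segments[0]:
            return "GetDocumentContent"
        if len(segments) > 2 and segments[1] == "1":
            return "GetPageContent"
        if len(segments) > 2 and segments[1] == "2":
            return "GetLogicalSectionContent"
    elif action == "q":
        names = {
            "Document": "SearchDocuments",
            "Logical": "SearchLogicalSections",
            "Page": "SearchPages",
        }
        return names.get(params.get("leq"), "unknown")
    return "unknown"


print(identify_request("a=d&d=DSC18740625.2.4.1&hl=waipu&f=XML"))
# GetLogicalSectionContent
```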