This repository was archived by the owner on Jan 23, 2025. It is now read-only.

Metadata

knoxa edited this page Mar 6, 2017 · 6 revisions

Each document in the corpus contains Dublin Core metadata elements. These have been collected as RDF triples to create a corpus metadata document using the [DCMI schema](http://dublincore.org/documents/2012/06/14/dcmi-terms/).

The coverage location RDF links DC coverage fields to locations. The locations are places from DBpedia, with WGS84 latitude and longitude properties. The DC coverage fields are added to the relevant locations as additional rdfs:label properties.
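As a rough illustration of the shape of such data, the sketch below emits N-Triples lines for one location. The property URIs are the standard rdfs:label and WGS84 `lat`/`long` terms; the function name `location_triples`, the choice of N-Triples as serialization, the `xsd:float` datatype, and the sample coordinates are assumptions made for the example, not details taken from the corpus.

```python
# Standard vocabulary URIs for rdfs:label and the W3C WGS84 geo properties.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GEO_LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
GEO_LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long"
XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def location_triples(location_uri, lat, lon, coverage_values):
    """Hypothetical helper: return N-Triples lines for one DBpedia location.

    Each DC coverage value that maps to the location is attached as an
    additional rdfs:label, as described above.
    """
    lines = [
        f'<{location_uri}> <{GEO_LAT}> "{lat}"^^<{XSD_FLOAT}> .',
        f'<{location_uri}> <{GEO_LONG}> "{lon}"^^<{XSD_FLOAT}> .',
    ]
    for value in coverage_values:
        lines.append(
            f'<{location_uri}> <{RDFS_LABEL}> "{escape_literal(value)}" .'
        )
    return lines


# Illustrative values only; the coordinates are approximate placeholders.
triples = location_triples(
    "http://dbpedia.org/resource/Moscow", 55.75, 37.62, ["Moscow", "MOSCOW"]
)
print("\n".join(triples))
```

Attaching the coverage strings as labels on the location means a lookup by any spelling found in a document's coverage field can resolve to the same DBpedia resource.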

The publisher RDF links DC publisher fields to the publishing agency extracted from the header text.

The description RDF is a first attempt at adding a DC description field. The assumption is that, for news reports, the first sentence of a document is essentially a summary. Where a document has a description, it is the text of the first sentence following the "[TEXT]" marker, which indicates the end of the header information.
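The extraction rule can be sketched as follows. This is a minimal illustration, not the code used to build the corpus metadata: the function name `first_sentence_description`, the naive sentence-boundary regex, and the sample document are all assumptions.

```python
import re

TEXT_MARKER = "[TEXT]"  # marks the end of the header information


def first_sentence_description(document):
    """Return the first sentence after the [TEXT] marker, or None.

    Hypothetical helper. The sentence boundary is approximated as the
    first '.', '!' or '?' followed by whitespace or end of text.
    """
    _, marker, body = document.partition(TEXT_MARKER)
    if not marker:
        return None  # no marker, so no description is produced
    body = " ".join(body.split())  # collapse line breaks and extra spaces
    if not body:
        return None
    match = re.search(r"[.!?](?=\s|$)", body)
    return body[: match.end()] if match else body


# Made-up sample document for illustration.
sample = (
    "Header line with source and date\n"
    "[TEXT] Officials met on Monday to discuss trade. "
    "A second round is planned."
)
description = first_sentence_description(sample)
print(description)
```

A splitter this simple will cut sentences short at abbreviations such as "U.S.", so a real pipeline would likely need a proper sentence tokenizer.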
