clarin-eric/PressMint

PressMint: Interoperable Corpora of Historical Newspapers

The CLARIN PressMint project plans to compile corpora of historical newspapers for a number of countries and languages.

PressMint corpora are to be interoperable, i.e. encoded to a common PressMint schema, which is a customisation of the TEI Guidelines, with various downstream formats (TSV, CoNLL-U, JSON etc.) also available. The same scripts should be able to process the common data in any PressMint corpus, despite the different kinds of information included in the individual corpora.
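To illustrate what "the same scripts for any corpus" could look like in practice, here is a minimal sketch in Python that reads a TEI document and writes its paragraphs as TSV. It is not part of the PressMint tooling. The sample document is hypothetical and deliberately incomplete (it is not schema-valid TEI), and the function names `tei_to_rows` and `rows_to_tsv` as well as the TSV columns are assumptions made for this example. Only the TEI namespace and the generic TEI elements `title`, `text` and `p` are taken from the TEI Guidelines.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The TEI namespace is fixed by the TEI Guidelines.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, minimal TEI document; NOT an actual PressMint sample
# and not valid against any PressMint schema.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Example newspaper issue</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="article">
        <p>First paragraph.</p>
        <p>Second
           paragraph.</p>
      </div>
    </body>
  </text>
</TEI>"""


def tei_to_rows(xml_text):
    """Return (title, paragraph number, text) tuples for one TEI document."""
    root = ET.fromstring(xml_text)
    title = root.findtext(
        ".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS
    )
    text_el = root.find("tei:text", TEI_NS)
    if text_el is None:
        return []
    rows = []
    for number, p in enumerate(text_el.iterfind(".//tei:p", TEI_NS), start=1):
        # Join all nested text and normalise whitespace.
        content = " ".join("".join(p.itertext()).split())
        rows.append((title, number, content))
    return rows


def rows_to_tsv(rows):
    """Serialise the rows as TSV with a header line (assumed column names)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["title", "paragraph", "text"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_tsv(tei_to_rows(SAMPLE_TEI)), end="")
```

Because the function relies only on elements that every TEI document shares, it would run unchanged on documents from different corpora. Corpus-specific information is simply ignored here.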

The PressMint Git workflow, scripts and documentation will be based on the ParlaMint project, which builds richly annotated corpora of parliamentary proceedings for a large number of countries and autonomous regions.

This Git repository is, as yet, a stub with content still to be added. Note that there are several branches for different parts of the development.

The repository contains the following directories:

  • The Samples directory contains one subdirectory per contributing (CLARIN) country. It will eventually include samples for all variants and formats of the PressMint corpora.
