    Repositories list

    • core

      Public
      Collection of OCR-related python tools and wrappers from @OCR-D
      Python
      Updated Dec 22, 2025
    • A ground truth (GT) dataset created within the OCR-D project and consisting of 348 pages extracted from historical documents pertaining to the "Verzeichnis der im deutschen Sprachraum erschienenen Drucke" (VD), all of which have been digitised by Staatsbibliothek zu Berlin – Berlin State Library (SBB).
      Shell
      Updated Nov 26, 2025
    • OCR-D-compliant page segmentation
      Python
      Updated Nov 19, 2025
    • A selection of keyboards for transcription software (Aletheia, Transkribus, LAREX, QURATOR-neat, eScriptorium)
      XSLT
      Updated Nov 5, 2025
    • OCR-D wrapper for prima-pagetopdf
      Python
      Updated Oct 30, 2025
    • spec

      Public
      Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s)
      Python
      Updated Sep 18, 2025
    • HTML
      Updated Sep 17, 2025
    • Website for OCR-D specs, formats, requirements
      HTML
      Updated Sep 17, 2025
    • Simple character-based language model using keras
      Python
      Updated Aug 12, 2025
    • Wrapper for the kraken OCR engine
      Python
      Updated Jul 12, 2025
    • ocrd_all

      Public
      Master repository which includes most other OCR-D repositories as submodules
      Makefile
      Updated Jul 4, 2025
    • assets

      Public
      Test data for testing specs and software in @OCR-D
      Makefile
      Updated May 20, 2025
    • OCR-D wrapper for ocr-fileformat
      Python
      Updated May 20, 2025
    • Recognize text using Calamari OCR and the OCR-D framework
      Python
      Updated May 13, 2025
    • An OCR evaluation tool
      Python
      Updated May 7, 2025
    • ocrd_froc

      Public
      Python
      Updated May 6, 2025
    • Python
      Updated May 6, 2025
    • Convert PAGE (v. 2019) to ALTO (v. 2.0 - 4.2)
      Python
      Updated May 6, 2025
    • DFKI Layout Detection for OCR-D
      PureBasic
      Updated May 1, 2025
    • Binarize with Olena/scribo
      Python
      Updated May 1, 2025
    • Run tesseract with the tesserocr bindings with @OCR-D's interfaces
      Python
      Updated Apr 30, 2025
    • Converters for various file formats used for representing OCR
      XSLT
      Updated Apr 30, 2025
    • The OCR-D Ground Truth text and structure corpus was created between 2015 and 2017. Since 2017, the corpus has been further curated and supplemented with metadata where appropriate. It includes PAGE XML files with annotations of text and structure.
      Updated Mar 25, 2025
    • Vue
      Updated Oct 22, 2024
    • Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)
      JavaScript
      Updated Oct 11, 2024
    • Run ImageMagick with an OCR-D CLI
      Shell
      Updated Oct 1, 2024
    • Middleware for running Quiver locally
      Python
      Updated Sep 24, 2024
    • Benchmarking OCR-D workflows in Docker
      HTML
      Updated Sep 20, 2024
    • Python
      Updated Jun 24, 2024
    • The repo gt_structure_5_3 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.
      Updated Jun 24, 2024