Skip to content

Post45-Data-Collective/BookReconciler

Repository files navigation

build passes

BookReconciler 📘💎 — Metadata Enrichment and Work-Level Clustering

  • Who is this for? Digital humanities researchers, librarians, metadata specialists, and more.
  • What does it do? Finds, clusters, and enriches records for books. Adding ISBNS, HathiTrust IDs, subject headings, descriptions, page counts, publication dates, and more.

Table of Contents:


Watch the video

YouTube Demo Video

BookReconciler 📘💎 is a tool that helps you reconcile and enrich bibliographic data from multiple library and knowledge sources:

  1. Library of Congress
  2. Google Books
  3. OCLC / WorldCat
  4. HathiTrust
  5. VIAF
  6. Wikidata
  7. OpenLibrary

You can take a spreadsheet with only title and author information, and you can add identifiers like ISBNs, OCLC numbers, or HathiTrust Volume IDs, as well as valuable contextual information like Library of Congress Subject Headings, genres, descriptions, page counts, and dates of first publication. Additionally, you can find and cluster different editions or manifestations of the same Work (e.g., translations, reprints, etc.).

The tool currently works as an extension of the software application OpenRefine, which makes it accessible to those with and without computational experience. It includes a user-friendly, human-in-the-loop interface for manually evaluating matches, defining Works (e.g., whether to include translations or not), and configuring the behavior of the service (e.g., matching all possible editions or just the best one).

The tool can also serve as a bridge to computational text analysis. A HathiTrust Volume ID can be used to computationally access the full text (for public domain works) or "bags of words" (for in-copyright works) for any text that is held by the HathiTrust Digital Library. This enable users to move from metadata to full computational text analysis. To learn more about accessing full text with Volume IDs, see the HathiTrust Feature Reader Python package.

🚀 Quick Start

👉 Read the full documentation →
📄 Download sample data (20 rows) // Google Sheet
🎥 Watch a tutorial/demo

Sample data source: Claire Grossman, Juliana Spahr, and Stephanie Young. 2022. “The Index of Major Literary Prizes in the US.” Post45 Data Collective. https://doi.org/10.18737/CNJV1733p4520221212

Installation

1. Install OpenRefine

BookReconciler 📘💎 is designed to work with OpenRefine, an open-source tool for working with messy data.

  1. Visit the OpenRefine download page.
  2. Download the latest release for your operating system (Windows, macOS, or Linux).
  3. Unzip the package (if needed) and follow the included instructions to start OpenRefine.
  4. Once running, OpenRefine will be available at:
    http://127.0.0.1:3333/

2. Install BookReconciler

Choose the installation method that works best for your system:

Option 1: Desktop App (Recommended)

Download and run the standalone desktop app.

macOS (Intel or Apple Silicon):

  1. Download the latest macOS app
  2. Open the .dmg file and drag BookReconciler.app to your Applications folder
  3. Launch BookReconciler — your browser will automatically open to http://127.0.0.1:5001/

Note: On first launch, macOS may show a security warning. Right-click the app and select OpenOpen to bypass.

Windows:

  1. Download the latest Windows installer (.exe)
  2. Run the installer and follow the prompts
  3. Launch BookReconciler from your Start Menu — your browser will automatically open to http://127.0.0.1:5001/

Note: Windows may show a SmartScreen warning. Click More infoRun anyway.

Once launched, you can access:

Option 2: Docker App (Mac/Windows)

Requirements: Install Docker Desktop and make sure it's running.

Mac (Intel or Apple Silicon):

  1. Download: BookReconcilerApp.zip
  2. Unzip and double-click BookReconcilerApp.app to launch
  3. Your browser will open to http://127.0.0.1:5001/

Windows:

  1. Download: BookReconcilerApp.bat.zip
  2. Unzip and double-click BookReconcilerApp.bat to launch
  3. Your browser will open to http://127.0.0.1:5001/
Option 3: Command Line with Docker

Works on any OS with Docker installed:

git clone https://github.com/Post45-Data-Collective/openrefine-reconciliation-service.git
cd openrefine-reconciliation-service
docker compose up
Option 4: Launch Your Own Server (Advanced)

If you'd rather not use Docker, you can follow these steps.

Requirements

  • Python 3.10+
  • macOS / Linux / Windows

1) Clone this GitHub repository

git clone https://github.com/<your-org-or-user>/openrefine-reconciliation-service.git
cd openrefine-reconciliation-service

2) Create a virtual environment (recommended but not required)

macOS / Linux

python3 -m venv .venv
source .venv/bin/activate

Windows (PowerShell)

python -m venv .venv
.venv\Scripts\Activate.ps1

3) Install required packages

pip install --upgrade pip
pip install -r requirements.txt

4) Start the server, which runs the BookReconciler tool

# Tell Flask which app to run
export FLASK_APP=app.py          # Windows PowerShell: $env:FLASK_APP="app.py"

# Start BookReconciler on port 5001
flask run --host=0.0.0.0 --port=5001
# (Optional during development) add --debug to auto-reload on file changes:
# flask run --host=0.0.0.0 --port=5001 --debug

When it starts, the service will be available at:


Basic Usage

  1. Open your dataset/project in OpenRefine.

  2. Click a column you want to reconcile—for example, the book "title" column.

  3. Choose Reconcile → Start reconciling…

  4. Click Add Standard Service.

  5. Paste the service URL for BookReconciler, which will connect you with Library of Congress, Wikidata, Google Books, and more:

    http://127.0.0.1:5001/api/v1/reconcile
    
  6. Select a reconciliation type (e.g., LC_Work_Id, OCLC_Record, HathiTrust, VIAF_Personal, VIAF_Title, Wikidata_Title).

  7. Optionally, add "Additional Properties," like the author's name, which may help improve match performance.

  8. Click Start Reconciling.

  9. Wait for reconciliation to complete. This can take seconds to hours depending on the number of values. Then, inspect matches.

  10. Lastly, add new values—ISBNs, Subject Headings, Descriptions, etc.—based on matches. Select Edit Column -> Add columns from reconciled values... Choose the values that you want to add from "Suggested Properties" (possible values are different for each service). They will be added to the spreadsheet.


Full Documentation

For more details about all of the customization and configuration options that are available with BookReconciler, more advanced usage instructions, and technical details, please see the Full Documentation in our Wiki.


Credit & Citation

This code is primarily written by Matt Miller, with contributions from Melanie Walsh and input from Dan Sinykin. The project is supported by the Post45 Data Collective. The code is licensed under the MIT License.

If you use this tool as part of a publication, you can credit us by citing the following paper:

"BookReconciler📘💎: An Open-Source Tool for Metadata Enrichment and Work-Level Clustering". Matt Miller, Dan Sinykin, and Melanie Walsh. Joint Conference on Digital Libraries. December 2025.

BibTeX Citation:

@inproceedings{miller-2025-bookreconciler,
  title = {BookReconciler📘💎: An Open-Source Tool for Metadata Enrichment and Work-Level Clustering},
  author = {Miller, Matt and Sinykin, Dan and Walsh, Melanie},
  booktitle = {Joint Conference on Digital Libraries},
  month = dec,
  year = {2025},
  publisher = {ACM/IEEE},
}

If you use this tool at all, we'd love to hear from you! You can fill out this Google Form or email us.


Acknowledgments

This project was initially supported by a grant from the National Endowment for the Humanities (NEH), "Post45 Data Collective: Enhancing Cultural Data Documentation, Interoperability, and Reach," and led by co-PIs Dan Sinykin and Melanie Walsh. The grant was slated to run from 2024-2026, but it was abruptly cancelled in spring 2025.

We are grateful to the Post45 Data Collective editorial board and to Juan Pablo Albornoz, Jen Doty, Sanghoon Oh, and Teddy Roland for early testing and feedback.


Contributing

In the near term, maintenance of this tool will be supported by the Post45 Data Collective. However, to grow and sustain this project, we strongly welcome and encourage contributions from the broader community (or funding :D ).

Feel free to add pull requests or get in touch if you have ideas. Please also note any issues or problems.


About

BookReconciler, A Tool for Metadata Enrichment and Clustering of Book Data

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors 3

  •  
  •  
  •