- Who is this for? Digital humanities researchers, librarians, metadata specialists, and more.
- What does it do? It finds, clusters, and enriches book records, adding ISBNs, HathiTrust IDs, subject headings, descriptions, page counts, publication dates, and more.
BookReconciler 📘💎 is a tool that helps you reconcile and enrich bibliographic data from multiple library and knowledge sources:
- Library of Congress
- Google Books
- OCLC / WorldCat
- HathiTrust
- VIAF
- Wikidata
- OpenLibrary
Starting from a spreadsheet with only title and author information, you can add identifiers such as ISBNs, OCLC numbers, or HathiTrust Volume IDs, as well as valuable contextual information such as Library of Congress Subject Headings, genres, descriptions, page counts, and dates of first publication. You can also find and cluster different editions or manifestations of the same Work (e.g., translations and reprints).
The tool currently works as an extension of the software application OpenRefine, which makes it accessible to those with and without computational experience. It includes a user-friendly, human-in-the-loop interface for manually evaluating matches, defining Works (e.g., whether to include translations or not), and configuring the behavior of the service (e.g., matching all possible editions or just the best one).
The tool can also serve as a bridge to computational text analysis. A HathiTrust Volume ID can be used to computationally access the full text (for public domain works) or "bags of words" (for in-copyright works) of any text held by the HathiTrust Digital Library. This enables users to move from metadata to full computational text analysis. To learn more about accessing full text with Volume IDs, see the HathiTrust Feature Reader Python package.
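As a small illustration of that bridge, the sketch below validates a Volume ID and turns it into two public HathiTrust URLs: the web viewer (`babel.hathitrust.org/cgi/pt?id=...`) and the brief catalog record from the HathiTrust Bib API. It assumes IDs follow the usual `namespace.identifier` form; the function names and the sample ID are made up for illustration and are not part of BookReconciler.

```python
from urllib.parse import quote, urlencode


def split_htid(htid: str) -> tuple[str, str]:
    """Split a HathiTrust Volume ID into (namespace, local_id) at the first dot."""
    namespace, sep, local_id = htid.strip().partition(".")
    if not sep or not namespace or not local_id:
        raise ValueError(f"Not a HathiTrust Volume ID: {htid!r}")
    return namespace, local_id


def page_turner_url(htid: str) -> str:
    """Link for reading the volume in HathiTrust's web viewer."""
    split_htid(htid)  # validate before building the URL
    return "https://babel.hathitrust.org/cgi/pt?" + urlencode({"id": htid.strip()})


def bib_api_url(htid: str) -> str:
    """Link to the volume's brief catalog record (JSON) from the HathiTrust Bib API."""
    split_htid(htid)  # validate before building the URL
    # Percent-encode everything, since some IDs contain ':' or '/'.
    encoded = quote(htid.strip(), safe="")
    return f"https://catalog.hathitrust.org/api/volumes/brief/htid/{encoded}.json"
```

For a made-up ID, `page_turner_url("mdp.39015012345678")` returns `https://babel.hathitrust.org/cgi/pt?id=mdp.39015012345678`. For full text or bag-of-words features, the same ID would go to the HathiTrust Feature Reader mentioned above.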
👉 Read the full documentation →
BookReconciler 📘💎 is designed to work with OpenRefine, an open-source tool for working with messy data.
- Visit the OpenRefine download page.
- Download the latest release for your operating system (Windows, macOS, or Linux).
- Unzip the package (if needed) and follow the included instructions to start OpenRefine.
- Once it is running, OpenRefine will be available at http://127.0.0.1:3333/
Choose the installation method that works best for your system:
Option 1: Desktop App (Recommended)
Download and run the standalone desktop app.
macOS:

- Download the latest macOS app
- Open the .dmg file and drag BookReconciler.app to your Applications folder
- Launch BookReconciler; your browser will automatically open to http://127.0.0.1:5001/

Note: On first launch, macOS may show a security warning. Right-click the app and select Open → Open to bypass.

Windows:

- Download the latest Windows installer (.exe)
- Run the installer and follow the prompts
- Launch BookReconciler from your Start Menu; your browser will automatically open to http://127.0.0.1:5001/

Note: Windows may show a SmartScreen warning. Click More info → Run anyway.
Once launched, you can access:
- Configuration interface: http://127.0.0.1:5001/
- OpenRefine endpoint: http://127.0.0.1:5001/api/v1/reconcile
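To check from a script that the endpoint is up and see which reconciliation types it offers, you can request its service manifest. This is a minimal sketch assuming BookReconciler follows the W3C Reconciliation Service API convention of returning a JSON manifest on a plain GET, with types listed under `defaultTypes` and data extension advertised under `extend`; the function names are illustrative.

```python
import json
from urllib.request import urlopen

# Local endpoint listed above.
ENDPOINT = "http://127.0.0.1:5001/api/v1/reconcile"


def fetch_manifest(endpoint: str = ENDPOINT, timeout: float = 10.0) -> dict:
    """A plain GET on a reconciliation endpoint returns its service manifest as JSON."""
    with urlopen(endpoint, timeout=timeout) as resp:
        return json.load(resp)


def summarize_manifest(manifest: dict) -> dict:
    """Keep the service name, the default reconciliation type IDs, and extend support."""
    return {
        "name": manifest.get("name", "<unnamed service>"),
        "types": [t.get("id") for t in manifest.get("defaultTypes", [])],
        # Assumption: "Add columns from reconciled values" support is
        # advertised through an "extend" section of the manifest.
        "supports_extend": "extend" in manifest,
    }
```

With the app running, `summarize_manifest(fetch_manifest())` should list type IDs like the ones offered in OpenRefine's reconciliation dialog (e.g., `LC_Work_Id`), if the service exposes them this way.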
Option 2: Docker App (Mac/Windows)
Requirements: Install Docker Desktop and make sure it's running.
macOS:

- Download: BookReconcilerApp.zip
- Unzip it and double-click BookReconcilerApp.app to launch
- Your browser will open to http://127.0.0.1:5001/

Windows:

- Download: BookReconcilerApp.bat.zip
- Unzip it and double-click BookReconcilerApp.bat to launch
- Your browser will open to http://127.0.0.1:5001/
Option 3: Command Line with Docker
Works on any OS with Docker installed:
```bash
git clone https://github.com/Post45-Data-Collective/openrefine-reconciliation-service.git
cd openrefine-reconciliation-service
docker compose up
```

Option 4: Launch Your Own Server (Advanced)
If you'd rather not use Docker, you can follow these steps.
Requirements:

- Python 3.10+
- macOS, Linux, or Windows
Clone the repository:

```bash
git clone https://github.com/<your-org-or-user>/openrefine-reconciliation-service.git
cd openrefine-reconciliation-service
```

Create and activate a virtual environment.

macOS / Linux:

```bash
python3 -m venv .venv
source .venv/bin/activate
```

Windows (PowerShell):

```powershell
python -m venv .venv
.venv\Scripts\Activate.ps1
```

Install the dependencies:

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

Start the server:

```bash
# Tell Flask which app to run
export FLASK_APP=app.py   # Windows PowerShell: $env:FLASK_APP="app.py"
# Start BookReconciler on port 5001
flask run --host=0.0.0.0 --port=5001
# (Optional during development) add --debug to auto-reload on file changes:
# flask run --host=0.0.0.0 --port=5001 --debug
```

When it starts, the service will be available at:
- Browser User Interface (for configuration): http://127.0.0.1:5001/
- OpenRefine endpoint: http://127.0.0.1:5001/api/v1/reconcile
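Because `flask run` or `docker compose up` may take a few seconds (longer on a first Docker build) before the endpoint responds, scripts that drive the service may want to wait for it first. This is a minimal standard-library sketch; the function names, the 60-second default timeout, and the one-second polling interval are arbitrary choices for illustration.

```python
import http.client
import time
import urllib.request
from typing import Callable

# Local endpoint listed above.
ENDPOINT = "http://127.0.0.1:5001/api/v1/reconcile"


def endpoint_is_up(url: str = ENDPOINT) -> bool:
    """Return True if the URL answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, refused connections, and timeouts are all OSError subclasses.
        return False


def wait_until_ready(probe: Callable[[], bool] = endpoint_is_up,
                     timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Call `probe` every `interval` seconds until it succeeds or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Passing the probe in as a parameter keeps the retry loop independent of HTTP, so the same loop could wait on the OpenRefine URL (http://127.0.0.1:3333/) instead.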
- Open your dataset/project in OpenRefine.
- Click the dropdown arrow on a column you want to reconcile (for example, the book "title" column) and choose Reconcile → Start reconciling...
- Click Add Standard Service... and paste the service URL for BookReconciler, which connects you with Library of Congress, Wikidata, Google Books, and more:
  http://127.0.0.1:5001/api/v1/reconcile
- Select a reconciliation type (e.g., `LC_Work_Id`, `OCLC_Record`, `HathiTrust`, `VIAF_Personal`, `VIAF_Title`, `Wikidata_Title`).
- Optionally, add "Additional Properties," such as the author's name, which may improve match quality.
- Click Start Reconciling.
- Wait for reconciliation to complete. This can take anywhere from seconds to hours, depending on the number of values. Then inspect the matches.
- Finally, add new values (ISBNs, Subject Headings, Descriptions, etc.) based on the matches: select Edit column → Add columns from reconciled values... and choose the values you want to add from "Suggested Properties" (the available values differ for each service). They will be added to the spreadsheet as new columns.
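For readers who want to script the same workflow outside OpenRefine, the steps above correspond to the W3C Reconciliation Service API that OpenRefine uses: a batch of `queries`, a list of scored candidates per query, and an `extend` request for extra columns. This sketch assumes BookReconciler accepts that protocol's JSON shapes as form fields. The `author` and `ISBN` property IDs are hypothetical placeholders (check the service's documentation or the "Suggested Properties" list for real IDs), and `best_match` is a simple heuristic, not BookReconciler's own matching logic.

```python
import json
from typing import Iterable, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Local endpoint from the steps above.
ENDPOINT = "http://127.0.0.1:5001/api/v1/reconcile"


def build_queries(rows: Iterable[tuple[str, Optional[str]]], rec_type: str,
                  author_pid: str = "author", limit: int = 3) -> dict:
    """Turn (title, author) pairs into a `queries` batch keyed q0, q1, ..."""
    queries = {}
    for i, (title, author) in enumerate(rows):
        query = {"query": title, "type": rec_type, "limit": limit}
        if author:
            # HYPOTHETICAL property ID for the "Additional Property";
            # the ID BookReconciler actually expects may differ.
            query["properties"] = [{"pid": author_pid, "v": author}]
        queries[f"q{i}"] = query
    return queries


def best_match(result: dict) -> Optional[dict]:
    """Candidate flagged `match` by the service, else the top-scoring one, else None."""
    candidates = result.get("result", [])
    for candidate in candidates:
        if candidate.get("match"):
            return candidate
    return max(candidates, key=lambda c: c.get("score", 0), default=None)


def build_extend(ids: Iterable[str], property_ids: Iterable[str]) -> dict:
    """`extend` payload requesting extra properties (e.g. ISBNs) for matched IDs."""
    return {"ids": list(ids), "properties": [{"id": p} for p in property_ids]}


def post(field: str, payload: dict, endpoint: str = ENDPOINT) -> dict:
    """POST one form field ("queries" or "extend") as JSON; return the decoded reply."""
    data = urlencode({field: json.dumps(payload)}).encode("utf-8")
    # Generous per-request timeout, since large batches can be slow; the value is arbitrary.
    with urlopen(endpoint, data=data, timeout=600) as resp:
        return json.load(resp)
```

With the service running, `best_match(post("queries", build_queries(rows, "LC_Work_Id"))["q0"])` should return the top candidate for the first row, if there is one. Preferring the service's own `match` flag over raw scores is similar to how OpenRefine auto-matches high-confidence candidates; a score threshold is a reasonable alternative.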

For details on BookReconciler's customization and configuration options, more advanced usage instructions, and technical details, please see the Full Documentation in our Wiki.
This code is primarily written by Matt Miller, with contributions from Melanie Walsh and input from Dan Sinykin. The project is supported by the Post45 Data Collective. The code is licensed under the MIT License.
If you use this tool as part of a publication, you can credit us by citing the following paper:
"BookReconciler📘💎: An Open-Source Tool for Metadata Enrichment and Work-Level Clustering". Matt Miller, Dan Sinykin, and Melanie Walsh. Joint Conference on Digital Libraries. December 2025.
BibTeX Citation:
```bibtex
@inproceedings{miller-2025-bookreconciler,
  title     = {BookReconciler📘💎: An Open-Source Tool for Metadata Enrichment and Work-Level Clustering},
  author    = {Miller, Matt and Sinykin, Dan and Walsh, Melanie},
  booktitle = {Joint Conference on Digital Libraries},
  month     = dec,
  year      = {2025},
  publisher = {ACM/IEEE},
}
```
If you use this tool at all, we'd love to hear from you! You can fill out this Google Form or email us.
This project was initially supported by a grant from the National Endowment for the Humanities (NEH), "Post45 Data Collective: Enhancing Cultural Data Documentation, Interoperability, and Reach," and led by co-PIs Dan Sinykin and Melanie Walsh. The grant was slated to run from 2024 to 2026, but it was abruptly cancelled in spring 2025.
We are grateful to the Post45 Data Collective editorial board and to Juan Pablo Albornoz, Jen Doty, Sanghoon Oh, and Teddy Roland for early testing and feedback.
In the near term, maintenance of this tool will be supported by the Post45 Data Collective. However, to grow and sustain this project, we strongly welcome and encourage contributions from the broader community (or funding :D).
Feel free to open pull requests or get in touch if you have ideas, and please report any issues or problems you run into.



