Systematic review enabled - still needs validation

BMLibrarian v0.9.6 Release Notes

Release Date: December 9, 2025
Previous Release: v0.9-alpha (November 11, 2025)
Total Commits: 817+ since v0.9-alpha

This release includes significant new features, extensive bug fixes, and substantial improvements to stability and reliability for the systematic review workflow.


Highlights

  • Complete Systematic Review Workflow - Full end-to-end systematic literature review automation with checkpoint-based resumability
  • Evidence Synthesis Engine - AI-powered citation extraction and narrative synthesis from included papers
  • Professional PDF Export - Publication-quality PDF generation from markdown reports
  • Unified Evaluations Database - PostgreSQL-backed evaluation tracking with full audit trail
  • Model Benchmarking Tool - Compare and evaluate document scoring models
  • Citation-Aware Writing Editor - Academic writing plugin with reference management

New Features

Systematic Review System

The systematic review module has been completely overhauled with production-ready capabilities:

  • Checkpoint-Based Resumability - Save and resume reviews at any phase:

    • Search strategy checkpoint
    • Initial results checkpoint
    • Scoring complete checkpoint
    • Quality assessment checkpoint
    • Full progress history displayed when resuming
  • Evidence Synthesis - New EvidenceSynthesizer component (see the usage sketch after this list):

    • Extracts relevant citations from included papers
    • Generates narrative synthesis answering research questions
    • Configurable citation thresholds and limits
    • Real-time progress callbacks
  • Improved Search & Scoring:

    • Phased execution mode for better progress tracking
    • Per-document progress updates in UI
    • Default inclusion/exclusion criteria
    • Comprehensive excluded paper tracking with reasons
  • Quality Assessment Improvements:

    • Database caching for all quality assessments (PICO, PRISMA, Study Assessment, Paper Weight)
    • Version-tracked cache invalidation
    • Improved study type detection including narrative reviews, scoping reviews, and expert opinions
  • Systematic Review GUI (systematic_review_gui.py):

    • Tabbed interface with report preview
    • Real-time progress visualization
    • Checkpoint browser for resume selection
    • Full activity log with markdown formatting
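
The sketch below shows how the new synthesis step might be driven programmatically. Only the EvidenceSynthesizer name and its (message, current, total) progress-callback signature come from these notes; the import path, method name, and other parameters are assumptions for illustration.

# Hedged usage sketch: the import path, method name, and parameter names are
# assumptions; only the EvidenceSynthesizer name and the
# (message, current, total) callback signature are documented in these notes.
from bmlibrarian.agents import EvidenceSynthesizer  # assumed import path

def on_progress(message: str, current: int, total: int) -> None:
    # v0.9.6 callback signature (see Breaking Changes below)
    print(f"[{current}/{total}] {message}")

included_papers = []  # fill with the papers that passed inclusion/exclusion

synthesizer = EvidenceSynthesizer(progress_callback=on_progress)
synthesis = synthesizer.synthesize(            # method name assumed
    research_question="Does intervention X improve outcome Y?",
    papers=included_papers,
    max_citations=25,                          # hypothetical citation-limit parameter
)
print(synthesis)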

PDF Export System

New professional PDF export using ReportLab:

  • Pure Python - No external binary dependencies (wkhtmltopdf, etc.)
  • Cross-Platform - Works on Windows, macOS, and Linux
  • Publication Quality - Proper fonts, page numbering, headers/footers
  • Full Markdown Support - Headings, lists, tables, code blocks, emphasis
  • Configurable - Page size (A4/Letter), fonts, margins, colors
uv run python export_to_pdf.py report.md -o report.pdf --research-report

Model Benchmarking Tool

New CLI and module for evaluating document scoring models:

uv run python model_benchmark_cli.py benchmark "research question" \
    --models gpt-oss:20b medgemma4B_it_q8:latest \
    --authoritative gpt-oss:120B
  • Compare scoring consistency across models
  • Alignment metrics and statistical analysis
  • Database-backed benchmark run history
  • Visualization of score distributions

Citation-Aware Writing Editor

New Writing plugin for academic document creation:

  • Markdown editor with live preview
  • Automatic References section management
  • Citation insertion from BMLibrarian database
  • Auto-save and document recovery
  • PDF export integration

Audit Trail Validation GUI

New interface for human review of automated evaluations:

uv run python audit_validation_gui.py --user reviewer_name
  • Review and validate AI-generated assessments
  • Incremental mode for unvalidated items only
  • Track validation decisions with explanations

Unified Evaluations Module

New database-backed evaluation tracking system:

  • PostgreSQL evaluations schema for all assessment types
  • Evaluation runs with status tracking (in_progress, completed, failed)
  • Processing time and confidence tracking
  • Full audit trail with timestamps
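
As a rough illustration, the audit trail could be inspected from Python along these lines; the table and column names below are assumptions inferred from the feature list, not the documented schema.

# Hedged sketch: the evaluations.evaluation_runs table and its columns are
# assumed names inferred from the feature list above, not the actual schema.
import psycopg2

conn = psycopg2.connect("dbname=bmlibrarian")  # connection details are illustrative
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT id, status, started_at, completed_at
        FROM evaluations.evaluation_runs       -- assumed table name
        WHERE status = %s
        ORDER BY completed_at DESC
        """,
        ("completed",),
    )
    for row in cur.fetchall():
        print(row)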

Improvements

Study Type Detection

  • Added new study types: narrative_review, scoping_review, expert_opinion
  • Improved LLM prompts for accurate study classification
  • Better handling of review articles that were previously classified as "unknown"

PRISMA Assessment

  • Auto-repair incomplete LLM responses instead of failing
  • Fill missing fields with sensible defaults and clear warnings
  • Track incomplete responses for quality monitoring
  • Include actual invalid values in warning messages for debugging
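
The auto-repair behaviour is roughly in the spirit of this sketch; the helper, field names, and defaults are illustrative rather than the actual implementation.

# Illustrative only: the helper, field names, and defaults are hypothetical;
# the shipped PRISMA agent applies the same idea to its own fields.
import logging

logger = logging.getLogger(__name__)

DEFAULTS = {"protocol_registered": False, "flow_diagram_reported": False}  # example fields

def repair_prisma_response(response: dict) -> dict:
    """Fill missing or invalid PRISMA fields with defaults instead of failing."""
    repaired = dict(response)
    for field, default in DEFAULTS.items():
        value = repaired.get(field)
        if not isinstance(value, bool):
            # The warning includes the actual invalid value to aid debugging
            logger.warning("PRISMA field %r invalid (got %r); defaulting to %r",
                           field, value, default)
            repaired[field] = default
            repaired["incomplete_response"] = True  # tracked for quality monitoring
    return repaired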

Database & Caching

  • Results cache for all quality assessments (study assessment, PICO, PRISMA, paper weight)
  • Version-based cache invalidation
  • Fixed N+1 query patterns in paper retrieval
  • Immediate evaluation persistence (no batch-only saves)
  • DateTimeEncoder for proper JSON serialization
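
The DateTimeEncoder mentioned above presumably follows the standard json.JSONEncoder pattern sketched here; the shipped class may differ in detail.

# Standard-library pattern for a datetime-aware JSON encoder; the actual
# DateTimeEncoder in the cache manager may differ in detail.
import json
from datetime import date, datetime

class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()  # store timestamps as ISO 8601 strings
        return super().default(obj)

# Example: serializing a cached assessment record
payload = json.dumps({"assessed_at": datetime.now()}, cls=DateTimeEncoder)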

GUI Improvements

  • Cross-platform font support (fixes macOS font warnings)
  • Default page size changed to A4 (international standard)
  • PDF viewer with text selection, search, and fit-width zoom
  • Improved progress bars with per-step tracking
  • Restored progress display when resuming from checkpoints

Bug Fixes

Critical Fixes

  • Fixed checkpoint resume crashes with proper error handling
  • Fixed evaluation data not being saved to database
  • Fixed datetime JSON serialization errors in cache manager
  • Fixed callback signature mismatch in EvidenceSynthesizer
  • Fixed N+1 query pattern causing performance issues

Systematic Review Fixes

  • Fixed InclusionDecision construction with required arguments
  • Fixed InclusionStatus.PENDING to use UNCERTAIN
  • Fixed InitialFilter initialization parameter errors
  • Fixed missing research_question in RelevanceScorer
  • Fixed UnboundLocalError in phased search mode
  • Fixed checkpoint files not being saved during resume
  • Fixed missing final_rank in checkpoint resume
  • Fixed quality gate statistics showing incorrect counts

Assessment Fixes

  • Fixed PaperWeightAssessmentAgent.assess_paper() parameter name
  • Fixed PRISMA None results crashing on .to_dict() calls
  • Fixed PostgreSQL type casting for evaluation functions
  • Fixed study_design field extraction for quality assessment

GUI Fixes

  • Fixed QThread crash on application close
  • Fixed validation status not updating in list views
  • Fixed pipe characters breaking markdown tables
  • Fixed report viewer attribute errors after merge

Breaking Changes

  • EvidenceSynthesizer.progress_callback now expects (message, current, total) signature
  • InclusionDecision now requires stage parameter (not exclusion_stage)
  • Relevance score range changed from (1, 5) to (0, 5) to allow marking irrelevant documents
  • Full-text documents must be chunked/embedded before paper weight assessment
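
For code written against v0.9-alpha, the API changes translate roughly as sketched below; the three-argument callback is also shown in context in the EvidenceSynthesizer sketch earlier, and the remaining names here are illustrative.

# Migration sketch: only the renamed keyword, the callback arity, and the new
# score range come from these notes; the surrounding names are illustrative.

# Progress callbacks passed to EvidenceSynthesizer now receive three arguments:
#   def on_progress(message: str, current: int, total: int) -> None: ...

# InclusionDecision takes `stage` rather than `exclusion_stage`:
#   InclusionDecision(stage="screening", ...)

# Relevance scores now span 0-5, with 0 marking an irrelevant document:
def is_relevant(score: int) -> bool:
    return score > 0  # 0 = irrelevant under the new 0-5 range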

Database Migrations

This release includes new database schemas:

  • evaluations schema for evaluation tracking
  • results_cache schema for quality assessment caching

Run the migration scripts before using new features:

uv run python -m bmlibrarian.database.migrations

Documentation Updates

  • New user guides for evidence synthesis, PDF export, and model benchmarking
  • Updated developer documentation for evaluations module
  • Added golden rules compliance documentation
  • Improved CLAUDE.md with comprehensive project structure

Contributors

This release was developed with significant contributions from Claude Code (Anthropic's AI coding assistant), demonstrating effective human-AI collaboration in complex software development.


Upgrade Instructions

  1. Update dependencies:

    uv sync
  2. Run database migrations:

    uv run python initial_setup_and_download.py your.env --skip-medrxiv --skip-pubmed
  3. Clear any stale caches:

    # In PostgreSQL
    TRUNCATE results_cache.study_assessments CASCADE;

Known Issues

  • Large systematic reviews (>1000 papers) may require increased PostgreSQL connection pool size
  • PRISMA assessment may return incomplete results for some document types (auto-repaired with warnings)
  • Evidence synthesis requires Ollama models with sufficient context window

What's Next

  • Enhanced multi-model query generation
  • Improved inter-rater reliability analysis tools
  • Web-based interface option
  • Enhanced counterfactual analysis for contradictory evidence detection

For detailed documentation, see the doc/ directory or visit the project repository.