BMLibrarian v0.9.6 Release Notes
Release Date: December 2025
Previous Release: v0.9-alpha (November 11, 2025)
Total Commits: 817+ since v0.9-alpha
This release includes significant new features, extensive bug fixes, and substantial improvements to stability and reliability for the systematic review workflow.
Highlights
- Complete Systematic Review Workflow - Full end-to-end systematic literature review automation with checkpoint-based resumability
- Evidence Synthesis Engine - AI-powered citation extraction and narrative synthesis from included papers
- Professional PDF Export - Publication-quality PDF generation from markdown reports
- Unified Evaluations Database - PostgreSQL-backed evaluation tracking with full audit trail
- Model Benchmarking Tool - Compare and evaluate document scoring models
- Citation-Aware Writing Editor - Academic writing plugin with reference management
New Features
Systematic Review System
The systematic review module has been completely overhauled with production-ready capabilities:
- Checkpoint-Based Resumability - Save and resume reviews at any phase:
  - Search strategy checkpoint
  - Initial results checkpoint
  - Scoring complete checkpoint
  - Quality assessment checkpoint
  - Full progress history displayed when resuming
- Evidence Synthesis - New `EvidenceSynthesizer` component (see the sketch after this list):
  - Extracts relevant citations from included papers
  - Generates narrative synthesis answering research questions
  - Configurable citation thresholds and limits
  - Real-time progress callbacks
- Improved Search & Scoring:
  - Phased execution mode for better progress tracking
  - Per-document progress updates in the UI
  - Default inclusion/exclusion criteria
  - Comprehensive excluded-paper tracking with reasons
- Quality Assessment Improvements:
  - Database caching for all quality assessments (PICO, PRISMA, Study Assessment, Paper Weight)
  - Version-tracked cache invalidation
  - Improved study type detection, including narrative reviews, scoping reviews, and expert opinions
- Systematic Review GUI (`systematic_review_gui.py`):
  - Tabbed interface with report preview
  - Real-time progress visualization
  - Checkpoint browser for resume selection
  - Full activity log with markdown formatting
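The Evidence Synthesis item above can also be driven programmatically. Below is a minimal sketch assuming hypothetical import path, constructor options, and method names; only the `(message, current, total)` callback signature is documented (see Breaking Changes):

```python
# Sketch only: the import path, constructor options, and synthesize() call are
# illustrative assumptions, not the documented API. The callback signature
# matches the one listed under Breaking Changes.
from bmlibrarian.systematic_review import EvidenceSynthesizer  # assumed import path

def progress(message: str, current: int, total: int) -> None:
    """Report synthesis progress using the (message, current, total) signature."""
    print(f"[{current}/{total}] {message}")

synthesizer = EvidenceSynthesizer(
    citation_threshold=0.7,      # assumed option: minimum relevance for extracted citations
    max_citations=50,            # assumed option: cap on citations used in the narrative
    progress_callback=progress,
)

included_papers = []             # papers retained by the inclusion/exclusion step
synthesis = synthesizer.synthesize(
    research_question="Does drug X reduce mortality in condition Y?",
    included_papers=included_papers,
)
print(synthesis)                 # narrative synthesis answering the research question
```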
PDF Export System
New professional PDF export using ReportLab:
- Pure Python - No external dependencies (wkhtmltopdf, etc.)
- Cross-Platform - Works on Windows, macOS, and Linux
- Publication Quality - Proper fonts, page numbering, headers/footers
- Full Markdown Support - Headings, lists, tables, code blocks, emphasis
- Configurable - Page size (A4/Letter), fonts, margins, colors
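For context on the pure-Python claim, here is a minimal generic ReportLab sketch of the approach. This is standard ReportLab (Platypus) usage, not BMLibrarian's exporter API; actual exports should go through the CLI shown below:

```python
# Generic ReportLab example illustrating the pure-Python approach;
# export_to_pdf.py adds full markdown parsing, headers/footers, and styling.
from reportlab.lib.pagesizes import A4
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import Paragraph, SimpleDocTemplate, Spacer

styles = getSampleStyleSheet()
doc = SimpleDocTemplate("report.pdf", pagesize=A4)  # A4 is now the default page size
story = [
    Paragraph("Systematic Review Report", styles["Heading1"]),
    Spacer(1, 12),
    Paragraph("Body text rendered without wkhtmltopdf or any external toolchain.",
              styles["BodyText"]),
]
doc.build(story)
```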
```
uv run python export_to_pdf.py report.md -o report.pdf --research-report
```

Model Benchmarking Tool
New CLI and module for evaluating document scoring models:
```
uv run python model_benchmark_cli.py benchmark "research question" \
    --models gpt-oss:20b medgemma4B_it_q8:latest \
    --authoritative gpt-oss:120B
```

- Compare scoring consistency across models
- Alignment metrics and statistical analysis
- Database-backed benchmark run history
- Visualization of score distributions
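As a rough illustration of what alignment metrics can look like, the sketch below compares a candidate model's document scores against an authoritative model's scores; the benchmark tool's actual statistics may differ:

```python
# Hypothetical illustration of alignment metrics between a candidate model's
# document scores and an authoritative model's scores for the same documents.
from statistics import correlation, mean

authoritative = [4, 5, 2, 0, 3, 5, 1]   # e.g. scores from the authoritative model
candidate     = [4, 4, 2, 1, 3, 5, 0]   # e.g. scores from a candidate model

mae = mean(abs(a - c) for a, c in zip(authoritative, candidate))
exact_agreement = mean(a == c for a, c in zip(authoritative, candidate))
pearson = correlation(authoritative, candidate)  # requires Python 3.10+

print(f"Mean absolute error:  {mae:.2f}")
print(f"Exact agreement rate: {exact_agreement:.0%}")
print(f"Pearson correlation:  {pearson:.2f}")
```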
Citation-Aware Writing Editor
New Writing plugin for academic document creation:
- Markdown editor with live preview
- Automatic References section management
- Citation insertion from BMLibrarian database
- Auto-save and document recovery
- PDF export integration
Audit Trail Validation GUI
New interface for human review of automated evaluations:
```
uv run python audit_validation_gui.py --user reviewer_name
```

- Review and validate AI-generated assessments
- Incremental mode for unvalidated items only
- Track validation decisions with explanations
Unified Evaluations Module
New database-backed evaluation tracking system:
- PostgreSQL `evaluations` schema for all assessment types
- Evaluation runs with status tracking (in_progress, completed, failed)
- Processing time and confidence tracking
- Full audit trail with timestamps
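A hypothetical sketch of recording an evaluation run against such a schema; the table and column names are illustrative, not the shipped `evaluations` schema:

```python
# Illustrative only: table/column names are assumptions, not the actual schema.
import psycopg2

conn = psycopg2.connect("dbname=bmlibrarian")
with conn, conn.cursor() as cur:
    # Open a run in "in_progress" status (status values from the release notes).
    cur.execute(
        """
        INSERT INTO evaluations.evaluation_runs (run_type, status, started_at)
        VALUES (%s, 'in_progress', now())
        RETURNING id
        """,
        ("prisma_assessment",),
    )
    run_id = cur.fetchone()[0]

    # ... perform the assessment, then close the run with timing and confidence.
    cur.execute(
        """
        UPDATE evaluations.evaluation_runs
        SET status = 'completed',
            processing_time_ms = %s,
            confidence = %s,
            completed_at = now()
        WHERE id = %s
        """,
        (1843, 0.92, run_id),
    )
conn.close()
```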
Improvements
Study Type Detection
- Added new study types: `narrative_review`, `scoping_review`, `expert_opinion`
- Improved LLM prompts for accurate study classification
- Better handling of review articles that were previously classified as "unknown"
PRISMA Assessment
- Auto-repair incomplete LLM responses instead of failing
- Fill missing fields with sensible defaults and clear warnings
- Track incomplete responses for quality monitoring
- Include actual invalid values in warning messages for debugging
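Conceptually, the auto-repair works like the sketch below: missing or invalid fields are filled with defaults and surfaced as warnings rather than hard failures. The field names and defaults here are assumptions, not the actual implementation:

```python
# Illustrative sketch of the auto-repair idea; field names and defaults are
# assumptions, not the actual PRISMA assessment implementation.
import logging

logger = logging.getLogger(__name__)

REQUIRED_FIELDS = {            # assumed defaults for missing PRISMA items
    "protocol_registered": "unclear",
    "search_strategy_reported": "unclear",
    "risk_of_bias_assessed": "unclear",
}

def repair_prisma_response(response: dict) -> tuple[dict, list[str]]:
    """Fill missing PRISMA fields with defaults instead of failing outright."""
    warnings: list[str] = []
    repaired = dict(response)
    for field, default in REQUIRED_FIELDS.items():
        value = repaired.get(field)
        if value is None or value == "":
            # Include the actual invalid value in the warning for debugging.
            warnings.append(f"{field}: invalid value {value!r}, defaulting to {default!r}")
            repaired[field] = default
    for warning in warnings:
        logger.warning("PRISMA auto-repair: %s", warning)
    return repaired, warnings
```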
Database & Caching
- Results cache for all quality assessments (study assessment, PICO, PRISMA, paper weight)
- Version-based cache invalidation
- Fixed N+1 query patterns in paper retrieval
- Immediate evaluation persistence (no batch-only saves)
- DateTimeEncoder for proper JSON serialization
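The `DateTimeEncoder` follows the standard `json.JSONEncoder` pattern; a minimal sketch of the approach (the class shipped with the cache manager may differ in detail):

```python
# Minimal sketch of the DateTimeEncoder approach; the class shipped in the
# cache manager may differ in detail.
import json
from datetime import date, datetime

class DateTimeEncoder(json.JSONEncoder):
    """Serialize datetime/date values as ISO-8601 strings instead of raising."""
    def default(self, obj):
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        return super().default(obj)

# Usage: cached assessment payloads containing timestamps now serialize cleanly.
payload = {"assessed_at": datetime.now(), "score": 4}
print(json.dumps(payload, cls=DateTimeEncoder))
```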
GUI Improvements
- Cross-platform font support (fixes macOS font warnings)
- Default page size changed to A4 (international standard)
- PDF viewer with text selection, search, and fit-width zoom
- Improved progress bars with per-step tracking
- Restored progress display when resuming from checkpoints
Bug Fixes
Critical Fixes
- Fixed checkpoint resume crashes with proper error handling
- Fixed evaluation data not being saved to database
- Fixed datetime JSON serialization errors in cache manager
- Fixed callback signature mismatch in EvidenceSynthesizer
- Fixed N+1 query pattern causing performance issues
Systematic Review Fixes
- Fixed `InclusionDecision` construction with required arguments
- Fixed `InclusionStatus.PENDING` to use `UNCERTAIN`
- Fixed `InitialFilter` initialization parameter errors
- Fixed missing `research_question` in `RelevanceScorer`
- Fixed `UnboundLocalError` in phased search mode
- Fixed checkpoint files not being saved during resume
- Fixed missing `final_rank` in checkpoint resume
- Fixed quality gate statistics showing incorrect counts
Assessment Fixes
- Fixed `PaperWeightAssessmentAgent.assess_paper()` parameter name
- Fixed PRISMA `None` results crashing on `.to_dict()` calls
- Fixed PostgreSQL type casting for evaluation functions
- Fixed `study_design` field extraction for quality assessment
GUI Fixes
- Fixed QThread crash on application close
- Fixed validation status not updating in list views
- Fixed pipe characters breaking markdown tables
- Fixed report viewer attribute errors after merge
Breaking Changes
- `EvidenceSynthesizer.progress_callback` now expects a `(message, current, total)` signature
- `InclusionDecision` now requires a `stage` parameter (not `exclusion_stage`)
- Relevance score range changed from `(1, 5)` to `(0, 5)` to allow marking irrelevant documents
- Full-text documents must be chunked/embedded before paper weight assessment
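For code written against the Python API, adapting to the callback and score-range changes typically looks like the sketch below (the old single-argument callback form is an assumption about prior usage; the new signature and range are as listed above):

```python
# The commented-out form assumes the pre-0.9.6 callback took only a message.
# def progress_callback(message):
#     print(message)

# From v0.9.6, EvidenceSynthesizer.progress_callback receives (message, current, total).
def progress_callback(message: str, current: int, total: int) -> None:
    print(f"[{current}/{total}] {message}")

# Relevance scores now span 0-5, where 0 marks a document as irrelevant.
def is_relevant(score: int) -> bool:
    return score > 0
```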
Database Migrations
This release includes new database schemas:
- `evaluations` schema for evaluation tracking
- `results_cache` schema for quality assessment caching
Run the migration scripts before using new features:
```
uv run python -m bmlibrarian.database.migrations
```

Documentation Updates
- New user guides for evidence synthesis, PDF export, and model benchmarking
- Updated developer documentation for evaluations module
- Added golden rules compliance documentation
- Improved CLAUDE.md with comprehensive project structure
Contributors
This release was developed with significant contributions from Claude Code (Anthropic's AI coding assistant), demonstrating effective human-AI collaboration in complex software development.
Upgrade Instructions
- Update dependencies:

  ```
  uv sync
  ```

- Run database migrations:

  ```
  uv run python initial_setup_and_download.py your.env --skip-medrxiv --skip-pubmed
  ```

- Clear any stale caches:

  ```
  # In PostgreSQL
  TRUNCATE results_cache.study_assessments CASCADE;
  ```
Known Issues
- Large systematic reviews (>1000 papers) may require increased PostgreSQL connection pool size
- PRISMA assessment may return incomplete results for some document types (auto-repaired with warnings)
- Evidence synthesis requires Ollama models with sufficient context window
What's Next
- Enhanced multi-model query generation
- Improved inter-rater reliability analysis tools
- Web-based interface option
- Enhanced counterfactual analysis for contradictory evidence detection
For detailed documentation, see the doc/ directory or visit the project repository.