Skip to content

bohdanbobrowski/blog2epub

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

blog2epub

Maintenance MIT license GitHub all releases GitHub release (with filter) GitHub Release Date - Published_At

Convert website (blog) to epub using command line or GUI.

My main goal in creating this app is to preserve the legacy of the blogosphere for future generations.

Supported blogs:

  • *.blogspot.com
  • *.wordpress.com
  • multiple other blogs and even some webpages

Main features

  • command line (CLI) and graphic user interface (GUI)
  • script downloads all text contents of selected blog to epub file,
  • if it's possible, it includes post comments,
  • images are downsized to given resolution (600*800,640*960 or 1236*1648)
  • images are also by default converted to grayscale,
  • one post = one epub chapter,
  • chapters are sorted by date ascending,
  • cover is generated automatically from downloaded images.

Example covers

Installation

Checkout for latest available builds.

Running from sources

Easiest way

pip install git+https://github.com/bohdanbobrowski/blog2epub.git

Developer environment

git clone [email protected]:bohdanbobrowski/blog2epub.git
cd blog2epub
python -m venv venv
Windows:
venv\Scripts\activate
pip install -e .[dev]

macOS/Linux:

source ./venv/bin/activate
pip install -e .[dev]

Building own executable

Android

buildozer android debug

Windows

python build_windows.py
"C:\Program Files (x86)\NSIS\makensis" blog2epub_windows_installer.nsi

macOS

python build_macos.py
./make_macos_dmg.sh

Linux

I love Linux, but this gives me a headache:

AppImage

Works fine, but is HUGE (about 250 MB).

python build_linux.py
./make_linux_appimage.sh
Snap

This is promising, despite taking 150 MB... but still i have issues with plyer modules called fileselect and notification.

snapcraft pack

Screenshots of GUI

Android (Google Pixel 6a)

Windows (11)

Linux (Ubuntu 24.04)

macOS (Sequoia 15.6)

CLI

blog2epub --help
usage: Blog2epub Cli interface [-h] [-l LIMIT] [-s SKIP] [-q QUALITY] [-o OUTPUT] [-d] url

Convert blog (blogspot.com, wordpress.com or another based on Wordpress) to epub using CLI or GUI.

positional arguments:
  url                   url of blog to download

options:
  -h, --help            show this help message and exit
  -l LIMIT, --limit LIMIT
                        articles limit
  -s SKIP, --skip SKIP  number of skipped articles
  -q QUALITY, --quality QUALITY
                        images quality (0-100)
  -o OUTPUT, --output OUTPUT
                        output epub file name
  -d, --debug           turn on debug

Example:

blog2epub starybezpiek.blogspot.com -l=2 -o=example.epub
Starting blogger.com crawler
Found 54 articles to crawl.
Downloading.
1. 10 lat kremlowskiej propagandy, czyli RT ujawnia swoje sekrety
Downloading.
2. "Komunę obaliliśmy, a nadal jest źle. Dlaczego?" Czyli 1984 Orwella właściwie odczytany
Locale set as en_US.UTF-8
Generating cover (800px*600px) from 1 images.
Cover generated: .\starybezpiek.blogspot.com\example.epub.jpg
Epub created: .\example.epub

Examples

blog2epub starybezpiek.blogspot.com
blog2epub velosov.blogspot.com -l=10
blog2epub poznanskiehistorie.blogspot.com -q=100
blog2epub classicameras.blogspot.com --limit=10 --no-images

Running tests

pytest ./tests
pytest --cov=blog2epub ./tests
pytest --cov=blog2epub --cov-report=html ./tests

Current version

  • integration testing
  • increase unit test coverage
  • use sitemaps.xml for scraping
  • crawlers refactor
    • use builtin dataclasses instead of pydantic
    • more common methods in crawler class
    • expand crawler abstract
  • cli interface refactor
  • greek alphabet support
  • image download and attachment bug solved (ex. modernistyczny-poznan.blogspot.com)
  • color/bw images and covers
  • custom image/cover sizes
  • improved resistance to http errors
  • dedicated crawler class for zeissikonveb.de
  • (on GUI) skip value is enlarged on limit value (if such is set)
  • download progress is much more verbose, also on GUI it can be cancelled everytime
  • remove poetry as it's overcomplicated for the case,
  • Windows installer!
  • results of cancelled downloads might be converted to epub

» Complete Change Log here «

Project backlog

And finally, a list known bugs and future plans for some new functions and enhancements: BACKLOG.md

Project road map:

  • 1.0 - somewhat working
  • 2.0 - fully working project, 90% unit tested and available builds for Android/Windows/Linux/MacOS

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •  

Languages