Convert PDF document to audiobook
This is a proof-of-concept ML pipeline that converts books to audiobooks using a couple of mutually incompatible libraries, each running in its own pipe.
Tested on the Polish PDF edition of the Divine Comedy (357 pages)
from wolnelektury.pl; see the benchmark.
Each page is a separate file.
Each pipe produces its own output, which can be adjusted.
Files that already exist in the output are skipped.
For example, suppose you convert from pdf to html using the document-to-html pipe,
then from html to txt using the html-to-text pipe, and after that
delete the wav directory or only the invalid files in it.
On the next run, only the missing wav files are generated again.
Before that, you can adjust the txt files to get better audio output in the wav files.
Be aware that file names must always stay the same!
Files are numbered with the page numbers of the original document.
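The skip-existing behaviour described above can be sketched as follows. This is only an illustration, not the project's actual code: the function name `pending_pages`, the directory names and the `<page>.<ext>` naming scheme are assumptions.

```python
from pathlib import Path


def pending_pages(src_dir: Path, dst_dir: Path, src_ext: str, dst_ext: str) -> list[Path]:
    """Return source files whose matching output file does not exist yet."""
    pending = []
    for src in sorted(src_dir.glob(f"*{src_ext}")):
        # The output keeps the same stem (assumed to be the page number);
        # only the extension changes, which is why file names must not be renamed.
        dst = dst_dir / (src.stem + dst_ext)
        if not dst.exists():
            pending.append(src)
    return pending
```

With this sketch, calling `pending_pages(Path("txt"), Path("wav"), ".txt", ".wav")` after deleting a few wav files would return only the txt pages that need to be synthesized again.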
docling - for pdf to html conversion
beautifulsoup4 - for html cleanup
coqui-ai/TTS - for TTS
- go to each directory inside pipe
- create .venv with the python version from .python-version
  - ex. for pipe/text-to-speech the python version is 3.11.11
  - use pyenv to install the 3.11.11 python version
  - in the pipe/text-to-speech directory execute: ~/.pyenv/versions/3.11.11/bin/python -m venv .venv
  - execute: source .venv/bin/activate (to activate the virtual env)
  - run: pip install -r requirements.txt (to install the requirements for the given pipe)
  - run: deactivate (and go to the next pipe directory)
- after each pipe environment is installed run: python -m audiobook validate (to check if everything is correct)
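The steps above can be sketched as a small helper that reads `.python-version` and builds the commands for one pipe directory. It only prepares the command strings and does not run anything. The helper name `venv_commands` and the `pyenv_root` parameter are hypothetical, and `--skip-existing` is added on the assumption that pyenv may already have the version installed.

```python
from pathlib import Path


def venv_commands(pipe_dir: Path, pyenv_root: str = "~/.pyenv") -> list[str]:
    """Build the shell commands that set up one pipe's virtual environment."""
    # Each pipe directory is expected to contain a .python-version file.
    version = (pipe_dir / ".python-version").read_text().strip()
    return [
        f"cd {pipe_dir}",
        f"pyenv install --skip-existing {version}",
        f"{pyenv_root}/versions/{version}/bin/python -m venv .venv",
        "source .venv/bin/activate",
        "pip install -r requirements.txt",
        "deactivate",
    ]
```

To get the full list, one could call `venv_commands` for every subdirectory of `Path("pipe").resolve()` and print the result; absolute paths keep each `cd` valid regardless of the previous one.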
After a correct installation, the pipe directory structure should look like this:
pipe/
  document-to-html/
    .venv/ (python 3.13)
    ...
  html-to-text/
    .venv/ (python 3.13)
    ...
  text-to-speech/
    .venv/ (python 3.11)
    ...

Assuming that you managed to install everything, run from the command line:
python3 -m audiobook -d /path/to/some_pdf.pdf -m tts_models/en/ljspeech/vits

Help

python3 -m audiobook -h

Tested models

tts_models/en/ljspeech/vits
tts_models/pl/mai_female/vits

- list models from TTS on command line
- provide steps as command line args
- test with document types other than pdf
- support document OCR
- support for coqui-ai/TTS multilingual models
- fix text-to-speech pipe logging
On an RTX 3090 with a 250 W power limit (book with 357 pages):
time python -m audiobook -d boska-komedia.pdf -m tts_models/pl/mai_female/vits
real 8m38.380s
user 11m40.027s
sys 0m21.981s
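To put the numbers above in per-page terms, here is a short calculation based on the reported `real` time and the 357 pages. The helper `parse_time` is a hypothetical name introduced for this illustration, and it handles only the `XmY.YYYs` format shown above.

```python
import re


def parse_time(value: str) -> float:
    """Convert a `time`-style duration such as '8m38.380s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unexpected duration format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


PAGES = 357  # page count of the benchmarked book
real = parse_time("8m38.380s")  # wall-clock time from the benchmark

print(f"wall-clock time: {real:.2f} s")
print(f"per page: {real / PAGES:.2f} s")
print(f"pages per minute: {PAGES / real * 60:.1f}")
```

This works out to about 1.45 s of wall-clock time per page, or roughly 41 pages per minute, for the whole pipeline on this hardware.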