audiobook

Convert a PDF document to an audiobook.

Description

This is a proof-of-concept ML pipe that converts books to audiobooks using a couple of mutually incompatible libraries.
Tested on a 357-page Polish PDF of the Divine Comedy from wolnelektury.pl; see Benchmark.

Each page is a separate file.
Each pipe produces its own output, which can be adjusted.
Existing output files are skipped instead of being regenerated.

For example, suppose you convert from PDF to HTML with the document-to-html pipe and then from HTML to text with the html-to-text pipe. You can then adjust the txt files to get better audio output, delete the wav directory (or only the invalid files in it), and run again to regenerate the missing wav files.
Be aware that file names must always stay the same!

Files are numbered with the page numbers from the original document.
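
The skip rule above can be sketched in a few lines. This is only an illustration, not the code from audiobook.py: the function name, the txt and wav directory names, and file names such as 12.txt are assumptions. The only facts taken from this README are that each page is its own file, that the name stays the same between pipes, and that existing outputs are skipped.

```python
from pathlib import Path


def pending_pages(src_dir: Path, dst_dir: Path, src_ext: str, dst_ext: str) -> list[Path]:
    """Return the source files that have no matching output file yet."""
    pending = []
    for src in sorted(src_dir.glob(f"*{src_ext}")):
        # The output keeps the same stem (assumed to be the page number);
        # only the extension changes between pipes.
        dst = dst_dir / (src.stem + dst_ext)
        if not dst.exists():
            pending.append(src)
    return pending
```

With this rule, deleting a bad wav file is enough to have that single page rendered again on the next run, which is also why renaming files breaks the matching.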

docling - for pdf to html conversion
beautifulsoup4 - for html cleanup
coqui-ai/TTS - for TTS

Install

Go to each directory inside pipe and create a .venv with the Python version from that directory's .python-version file. For example, for pipe/text-to-speech (Python 3.11.11):
1. cd pipe/text-to-speech
2. Use pyenv to install Python 3.11.11.
3. In the pipe/text-to-speech directory, execute ~/.pyenv/versions/3.11.11/bin/python -m venv .venv
4. Execute source .venv/bin/activate to activate the virtual environment.
5. Run pip install -r requirements.txt to install the requirements for the given pipe.
6. Run deactivate and go to the next pipe directory.
After every pipe environment is installed, run python -m audiobook validate to check that everything is correct.
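
Since the same steps repeat for every pipe, the commands can be generated instead of typed. The sketch below only builds the command strings and does not execute anything. The function name venv_commands is made up for this example, and it assumes each .python-version file contains a single full version string such as 3.11.11.

```python
from pathlib import Path


def venv_commands(pipe_root: Path) -> dict[str, list[str]]:
    """Map each pipe directory name to the shell commands from the steps above."""
    commands = {}
    for pipe_dir in sorted(p for p in pipe_root.iterdir() if p.is_dir()):
        version_file = pipe_dir / ".python-version"
        if not version_file.is_file():
            continue  # no pinned version, so not treated as a pipe
        version = version_file.read_text().strip()
        # Commands are meant to be run from inside pipe_dir.
        commands[pipe_dir.name] = [
            f"pyenv install --skip-existing {version}",
            f"~/.pyenv/versions/{version}/bin/python -m venv .venv",
            "source .venv/bin/activate",
            "pip install -r requirements.txt",
            "deactivate",
        ]
    return commands


if __name__ == "__main__":
    for name, cmds in venv_commands(Path("pipe")).items():
        print(f"# {name}")
        print("\n".join(cmds))
```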

After a correct installation, the pipe directory structure should look like this:

pipe/
   document-to-html/
      .venv/ (python 3.13)
      ...
   html-to-text/
      .venv/ (python 3.13)
      ...
   text-to-speech/
      .venv/ (python 3.11)
      ...
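
A quick independent check of this layout is possible because python -m venv records the interpreter version in .venv/pyvenv.cfg. The sketch below is not the implementation of the validate command, which is not shown in this README; the function names are hypothetical. It compares the version in pyvenv.cfg with the one pinned in .python-version, component by component, so a pin of 3.11 also accepts 3.11.11.

```python
from pathlib import Path
from typing import Optional


def venv_version(pipe_dir: Path) -> Optional[str]:
    """Read the version recorded in .venv/pyvenv.cfg, or None if missing."""
    cfg = pipe_dir / ".venv" / "pyvenv.cfg"
    if not cfg.is_file():
        return None
    for line in cfg.read_text().splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "version":
            return value.strip()
    return None


def version_matches(pinned: str, found: Optional[str]) -> bool:
    """True if `found` starts with all the components of `pinned`."""
    if found is None:
        return False
    want = pinned.split(".")
    return found.split(".")[: len(want)] == want


def check_pipes(pipe_root: Path) -> dict[str, bool]:
    """Map each pipe name to whether its .venv matches .python-version."""
    result = {}
    for pipe_dir in sorted(p for p in pipe_root.iterdir() if p.is_dir()):
        pinned_file = pipe_dir / ".python-version"
        if not pinned_file.is_file():
            continue
        pinned = pinned_file.read_text().strip()
        result[pipe_dir.name] = version_matches(pinned, venv_version(pipe_dir))
    return result
```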

Run

Assuming that you managed to install everything, run from the command line:

python3 -m audiobook -d /path/to/some_pdf.pdf -m tts_models/en/ljspeech/vits
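
Because the libraries are incompatible, each step has to run under its own interpreter. How audiobook.py does this is not shown here; one common approach, sketched below under that caveat, is to start the interpreter from the pipe's .venv as a subprocess. The function names and the script name main.py are hypothetical, and the .venv/bin/python path assumes a POSIX layout, as in the install steps.

```python
import subprocess
from pathlib import Path


def step_command(pipe_dir: Path, script: str, *args: str) -> list[str]:
    """Build the argv that runs `script` with the pipe's own interpreter."""
    python = pipe_dir / ".venv" / "bin" / "python"
    return [str(python), script, *args]


def run_step(pipe_dir: Path, script: str, *args: str) -> None:
    """Run one step inside its pipe directory and fail loudly on errors."""
    subprocess.run(step_command(pipe_dir, script, *args), cwd=pipe_dir, check=True)


if __name__ == "__main__":
    # Hypothetical script name and arguments, only printed here, not executed.
    print(step_command(Path("pipe/text-to-speech"), "main.py",
                       "-m", "tts_models/en/ljspeech/vits"))
```

Calling the venv's python binary directly has the same effect on imports as activating the environment first, so no source .venv/bin/activate is needed inside the driver.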

Help

python3 -m audiobook -h

Tested models

tts_models/en/ljspeech/vits
tts_models/pl/mai_female/vits

TODO

list models from TTS on command line
provide steps as command line args
test with document types other than pdf
support document ocr
support for coqui-ai/TTS multilingual models
fix text-to-speech pipe logging

Benchmark

On an RTX 3090 with a 250 W power limit (book with 357 pages):

time python -m audiobook -d boska-komedia.pdf -m tts_models/pl/mai_female/vits
real    8m38.380s
user    11m40.027s
sys     0m21.981s
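
To put these numbers in per-page terms, the small script below converts the time output to seconds and divides by the page count. It only uses the figures reported above; the helper name to_seconds is made up for this example.

```python
def to_seconds(stamp: str) -> float:
    """Convert a `time` value such as '8m38.380s' to seconds."""
    minutes, _, seconds = stamp.partition("m")
    return int(minutes) * 60 + float(seconds.rstrip("s"))


PAGES = 357
real = to_seconds("8m38.380s")
cpu = to_seconds("11m40.027s") + to_seconds("0m21.981s")

print(f"wall-clock seconds per page: {real / PAGES:.2f}")
print(f"pages per minute: {PAGES / real * 60:.1f}")
print(f"CPU time / wall-clock time: {cpu / real:.2f}")
```

This works out to roughly 1.45 seconds of wall-clock time per page for the whole run, all steps included.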
