Home

### webengine

A script that gathers, corrects, and converts raw HTML files to Markdown, and downloads all the original images.

> python webengine.py -h
usage: webengine.py [-h] url href src type case

get all content (html / images) from a wiki (site) and convert to markdown

positional arguments:
  url         URL to site or wiki
  href        URL of the new site
  src         Path to the image directory
  type        "site" for full website or "wiki" for wiki Title Index page
  case        jekyll, hyde, none

optional arguments:
  -h, --help  show this help message and exit
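
An argument parser that produces the help text above could look roughly like this. It is a minimal sketch built only from the usage output, not the script's actual source; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser matching the five positional arguments shown in the help text."""
    parser = argparse.ArgumentParser(
        prog="webengine.py",
        description="get all content (html / images) from a wiki (site) "
                    "and convert to markdown",
    )
    parser.add_argument("url", help="URL to site or wiki")
    parser.add_argument("href", help="URL of the new site")
    parser.add_argument("src", help="Path to the image directory")
    parser.add_argument(
        "type",
        help='"site" for full website or "wiki" for wiki Title Index page',
    )
    parser.add_argument("case", help="jekyll, hyde, none")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.url, args.href, args.src, args.type, args.case)
```

All five arguments are positional, so they must be given in the order `url href src type case`.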

Once the script finishes, you will have five directories:

- whole_site (original site from wget)
- raw_files (original html files)
- html_files (corrected html files)
- md_files (new corrected markdown files)
- image_files (original images)
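
To confirm a run produced everything, a quick check along these lines may help. It is a sketch that assumes the five directories are created inside the directory the script was run from; the helper name `missing_output_dirs` is hypothetical.

```python
from pathlib import Path

# Directory names as listed above.
EXPECTED_DIRS = ["whole_site", "raw_files", "html_files", "md_files", "image_files"]


def missing_output_dirs(base="."):
    """Return the expected output directories that do not exist under base."""
    base_path = Path(base)
    return [name for name in EXPECTED_DIRS if not (base_path / name).is_dir()]


if __name__ == "__main__":
    missing = missing_output_dirs()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all output directories are present")
```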

This script is the driver for all the others:

When type is wiki:
- url_gatherer
- file_gatherer

When type is site:
- site_gatherer

Then, for either type:
- file_corrector
- file_converter
- image_gatherer
- bold_cleanup
- head_adder
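
The ordering above can be written down as a small lookup. This is only a sketch of the sequence, using the script names from the list; it assumes the steps after the gatherers run for both types, and the function name `pipeline_steps` is hypothetical.

```python
def pipeline_steps(site_type):
    """Return the helper script names in the order the driver would run them."""
    # Gathering differs by type; the rest is assumed to be shared.
    gatherers = {
        "wiki": ["url_gatherer", "file_gatherer"],
        "site": ["site_gatherer"],
    }
    if site_type not in gatherers:
        raise ValueError('type must be "site" or "wiki"')
    shared = [
        "file_corrector",
        "file_converter",
        "image_gatherer",
        "bold_cleanup",
        "head_adder",
    ]
    return gatherers[site_type] + shared


if __name__ == "__main__":
    for name in pipeline_steps("wiki"):
        print(name)
```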

You will want to view a few of the files in md_files to determine how many lines at the top and bottom are not needed (old headers, menus, or footers), and then call filetrimmer:

python filetrimmer.py converted_files X Y
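
The trimming itself might work roughly as follows. This is a sketch, not filetrimmer's actual source: it assumes X is the number of lines to drop from the top of each file and Y the number to drop from the bottom, and the function names are hypothetical.

```python
from pathlib import Path


def trim_lines(text, top, bottom):
    """Drop `top` lines from the start and `bottom` lines from the end of text."""
    lines = text.splitlines(keepends=True)
    # Never let the end index fall before the start index.
    end = max(top, len(lines) - bottom)
    return "".join(lines[top:end])


def trim_directory(directory, top, bottom):
    """Trim every regular file in directory in place (assumed behaviour)."""
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            original = path.read_text(encoding="utf-8")
            path.write_text(trim_lines(original, top, bottom), encoding="utf-8")
```

Because the files are rewritten in place, it is worth testing the chosen X and Y on a copy of the directory first.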

If you do, don't forget to rerun head_adder.py with either jekyll or hyde:

python head_adder.py case
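
For the jekyll case, the added head is presumably YAML front matter, which trimming lines from the top of a file would remove. The sketch below shows one way to prepend it. The `layout` and `title` fields and the function name `add_jekyll_head` are assumptions and are not taken from head_adder.py; the hyde case is left out.

```python
import json


def add_jekyll_head(markdown_text, title, layout="default"):
    """Prepend a minimal Jekyll front matter block to a Markdown document."""
    # json.dumps yields a double-quoted string, which is also valid YAML.
    front_matter = (
        "---\n"
        f"layout: {layout}\n"
        f"title: {json.dumps(title)}\n"
        "---\n\n"
    )
    return front_matter + markdown_text


if __name__ == "__main__":
    print(add_jekyll_head("# Home\n", "Home"))
```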
