Matthew Harris edited this page Jan 21, 2016 · 4 revisions

### webengine

A script that gathers, corrects, and converts raw HTML files to Markdown, and downloads all the original images.

> python webengine.py -h
usage: webengine.py [-h] url href src type case

get all content (html / images) from a wiki (site) and convert to markdown

positional arguments:
  url         URL to site or wiki
  href        URL of the new site
  src         Path to the image directory
  type        "site" for full website or "wiki" for wiki Title Index page
  case        jekyll, hyde, none

optional arguments:
  -h, --help  show this help message and exit
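For readers who want to see how the five positional arguments fit together, here is a minimal `argparse` sketch that produces an equivalent interface. It is reconstructed only from the help text above; the function name `build_parser` is hypothetical, and the real `webengine.py` may define its parser differently.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction from the help output; not the script's actual source.
    parser = argparse.ArgumentParser(
        prog="webengine.py",
        description="get all content (html / images) from a wiki (site) "
                    "and convert to markdown",
    )
    # All five arguments are positional, so order matters on the command line.
    parser.add_argument("url", help="URL to site or wiki")
    parser.add_argument("href", help="URL of the new site")
    parser.add_argument("src", help="Path to the image directory")
    parser.add_argument(
        "type", help='"site" for full website or "wiki" for wiki Title Index page'
    )
    parser.add_argument("case", help="jekyll, hyde, none")
    return parser
```

Calling `build_parser().parse_args([...])` with five strings in the order `url href src type case` returns a namespace with one attribute per argument. The sketch does not restrict `type` or `case` to fixed choices, since the usage line above does not show any.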

Once webengine.py completes, you will have five directories:

whole_site      (original site from wget)
raw_files       (original html files)
html_files      (corrected html files)
md_files        (new corrected markdown files)
image_files     (original images)

This script is the driver for all the others.

You will want to view a few of the files in md_files to determine how many lines at the top and bottom are not needed (old headers, menus, or footers), and then call filetrimmer:

python filetrimmer.py converted_files X Y
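The trimming step can be sketched as follows. This is not the source of `filetrimmer.py`; it assumes that `X` is the number of lines to drop from the top and `Y` the number to drop from the bottom, and that the files end in `.md`. The names `trim_lines` and `trim_directory` are hypothetical.

```python
from pathlib import Path


def trim_lines(text, top, bottom):
    """Return `text` without its first `top` and last `bottom` lines."""
    lines = text.splitlines(keepends=True)
    end = len(lines) - bottom
    # max() keeps the slice empty (rather than wrapping around)
    # when top + bottom exceeds the number of lines.
    return "".join(lines[top:max(end, top)])


def trim_directory(directory, top, bottom):
    """Rewrite every .md file in `directory` in place, trimmed at both ends."""
    for path in sorted(Path(directory).glob("*.md")):
        original = path.read_text(encoding="utf-8")
        path.write_text(trim_lines(original, top, bottom), encoding="utf-8")
```

Because the same counts are applied to every file, it is worth checking several files first, as suggested above, so that no page loses real content.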

If you do, don't forget to re-run head_adder.py with either jekyll or hyde:

python head_adder.py case
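Presumably the header has to be added again because trimming lines from the top also removes it. The sketch below shows one way such a step could look. It is an assumption throughout: the function name `add_head`, the `layout: default` field, and the use of the same YAML-style block for both `jekyll` and `hyde` are illustrative choices, not taken from `head_adder.py`.

```python
import json


def add_head(text, title, case):
    """Prepend a YAML-style header to `text` unless `case` is "none"."""
    # Skip files that already start with a header so re-running is harmless.
    if case == "none" or text.startswith("---\n"):
        return text
    if case not in ("jekyll", "hyde"):
        raise ValueError("case must be jekyll, hyde, or none")
    # json.dumps double-quotes the title so characters such as ':' stay valid.
    header = "---\nlayout: default\ntitle: {}\n---\n\n".format(json.dumps(title))
    return header + text
```

For example, `add_head("body\n", "Home", "jekyll")` returns the original text preceded by a four-line header block and a blank line.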
