Home
Matthew Harris edited this page Jan 21, 2016
### webengine
A script that gathers raw HTML files from a site or wiki, corrects them, and converts them to Markdown. It also downloads all of the original images.
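This page does not say how the images are located. As a rough illustration only, the sketch below collects the `src` of every `<img>` tag with Python's standard `html.parser` and resolves it against the page URL with `urllib.parse.urljoin`. `ImageCollector` and `find_image_urls` are hypothetical names, not part of webengine.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img/> tags also reach this method via handle_startendtag
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.urls.append(urljoin(self.base_url, value))


def find_image_urls(html, base_url):
    """Return absolute image URLs in document order (hypothetical helper)."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    page = '<p><img src="/images/logo.png"><img src="photo.jpg" alt="x"></p>'
    print(find_image_urls(page, "http://example.com/wiki/Home"))
```

Relative paths resolve against the page's own URL, so `photo.jpg` on `/wiki/Home` becomes `/wiki/photo.jpg`.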

    > python webengine.py -h
    usage: webengine.py [-h] url href src type case

    get all content (html / images) from a wiki (site) and convert to markdown

    positional arguments:
      url         URL to site or wiki
      href        URL of the new site
      src         Path to the image directory
      type        "site" for full website or "wiki" for wiki Title Index page
      case        jekyll, hyde, none

    optional arguments:
      -h, --help  show this help message and exit

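For reference, a command-line interface with this help text could be declared with Python's standard `argparse` roughly as follows. This is a reconstruction from the help output, not webengine's source; restricting `type` and `case` with `choices` is an assumption (the help text only lists the accepted values), and `metavar` keeps the usage line identical to the one shown above.

```python
import argparse


def build_parser():
    """Rebuild the CLI described by the help text (a sketch, not webengine's code)."""
    parser = argparse.ArgumentParser(
        prog="webengine.py",
        description="get all content (html / images) from a wiki (site) "
        "and convert to markdown",
    )
    parser.add_argument("url", help="URL to site or wiki")
    parser.add_argument("href", help="URL of the new site")
    parser.add_argument("src", help="Path to the image directory")
    # choices= is an assumption; metavar keeps "type" in the usage line
    parser.add_argument(
        "type", metavar="type", choices=["site", "wiki"],
        help='"site" for full website or "wiki" for wiki Title Index page',
    )
    parser.add_argument(
        "case", metavar="case", choices=["jekyll", "hyde", "none"],
        help="jekyll, hyde, none",
    )
    return parser
```

Calling `build_parser().parse_args()` inside the script would then reject a `type` other than `site` or `wiki` with a usage error instead of failing later in the run.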
When the script finishes, you will have five directories:

- `whole_site`: the original site, as downloaded by wget
- `raw_files`: the original HTML files
- `html_files`: the corrected HTML files
- `md_files`: the new, corrected Markdown files
- `image_files`: the original images
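A quick way to confirm that a run produced everything is to check for these five directories. The sketch below assumes they are created in the directory the script was run from; `missing_output_dirs` is a hypothetical helper, not one of webengine's scripts.

```python
from pathlib import Path

# Directory names as listed on this page
OUTPUT_DIRS = ["whole_site", "raw_files", "html_files", "md_files", "image_files"]


def missing_output_dirs(root="."):
    """Return the expected output directories that do not exist under root."""
    base = Path(root)
    return [name for name in OUTPUT_DIRS if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_output_dirs()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all five output directories are present")
```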
`webengine.py` is the driver for all of the other scripts:

- `wiki` (used when `type=wiki`)
- `site` (used when `type=site`)
- `file_corrector`
- `file_converter`
- `image_gaterer`
- `bold_cleanup`
- `head_adder`
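The dispatch can be pictured as: run exactly one fetch step chosen by `type`, then the remaining scripts. The sketch assumes they run in the order listed, which this page does not state outright; `plan_steps`, `run_pipeline`, and the `runners` mapping are hypothetical stand-ins, since the scripts' real entry points and arguments are not documented here.

```python
# Hypothetical: the names mirror the scripts listed above; the run order is assumed
STEPS_AFTER_FETCH = ["file_corrector", "file_converter", "image_gaterer",
                     "bold_cleanup", "head_adder"]


def plan_steps(site_type):
    """Return the order in which the driver would run the helper scripts."""
    if site_type not in ("wiki", "site"):
        raise ValueError(f"type must be 'wiki' or 'site', got {site_type!r}")
    # Exactly one fetch step runs, chosen by the type argument
    return [site_type] + STEPS_AFTER_FETCH


def run_pipeline(site_type, runners):
    """Call runners[name]() for each step in order; runners maps names to callables."""
    for name in plan_steps(site_type):
        runners[name]()


if __name__ == "__main__":
    print(plan_steps("wiki"))
```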
Open a few of the files in `md_files` to determine how many lines at the top and bottom are not needed (old headers, menus, or footers), then call `filetrimmer`, passing those counts as `X` and `Y`:

    python filetrimmer.py converted_files X Y

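The trimming step amounts to dropping a fixed number of lines from both ends of every file. The sketch below assumes `X` is the number of lines to remove from the top and `Y` the number to remove from the bottom, and that the files end in `.md`; `trim_lines` and `trim_directory` are hypothetical names, not `filetrimmer.py`'s actual code.

```python
from pathlib import Path


def trim_lines(text, top, bottom):
    """Drop `top` lines from the start and `bottom` lines from the end of text."""
    lines = text.splitlines(keepends=True)
    end = len(lines) - bottom if bottom > 0 else len(lines)
    # max() keeps the slice empty (not reversed) when the file is too short
    return "".join(lines[top:max(top, end)])


def trim_directory(directory, top, bottom, pattern="*.md"):
    """Rewrite every matching file in directory with its header/footer lines removed."""
    for path in Path(directory).glob(pattern):
        trimmed = trim_lines(path.read_text(encoding="utf-8"), top, bottom)
        path.write_text(trimmed, encoding="utf-8")
```

Because the files are rewritten in place, it is worth keeping a copy of the directory until the counts are confirmed.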
If you run `filetrimmer.py`, don't forget to rerun `head_adder.py` with either `jekyll` or `hyde`:

    python head_adder.py case
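For the `jekyll` case, the added header is presumably Jekyll front matter: a YAML block between two `---` lines at the top of each file. The sketch below shows only that case and `none`; the fields `head_adder.py` actually writes are not documented here, so the single `title` key is an assumption, and the Hyde header format is deliberately left out. `add_front_matter` is a hypothetical name.

```python
import json


def add_front_matter(text, title, case):
    """Prepend a header for the given case; only "jekyll" and "none" are sketched."""
    if case == "none":
        return text
    if case == "jekyll":
        # Jekyll front matter: YAML between lines of three dashes.
        # json.dumps yields a double-quoted string that is also valid YAML.
        header = f"---\ntitle: {json.dumps(title)}\n---\n\n"
        return header + text
    # Hyde's header format is not described on this page
    raise ValueError(f"unsupported case in this sketch: {case!r}")
```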