**You need Python 3.10 or later to run this script.**
This script uses the Save Page Now 2 Public API.
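Under the hood, each URL is submitted to the Save Page Now 2 API using your S3-like keys (set up in the steps below). Here is a minimal sketch of what such a request can look like, assuming the third-party `requests` library; the endpoint, headers, and response fields follow the public SPN2 documentation, and `capture-urls.py` itself may do things differently:

```python
# Illustrative sketch only; not necessarily how capture-urls.py is implemented.
import time

import requests  # third-party HTTP library (assumed available)

from secret import ACCESS_KEY, SECRET_KEY  # the keys described in the steps below

HEADERS = {
    "Accept": "application/json",
    "Authorization": f"LOW {ACCESS_KEY}:{SECRET_KEY}",
}

def capture(url: str) -> str:
    """Submit a Save Page Now 2 capture job for `url` and wait for the result."""
    resp = requests.post("https://web.archive.org/save",
                         headers=HEADERS, data={"url": url})
    resp.raise_for_status()
    job_id = resp.json()["job_id"]

    while True:  # poll the job until it succeeds or fails
        status = requests.get(
            f"https://web.archive.org/save/status/{job_id}", headers=HEADERS
        ).json()
        if status["status"] == "success":
            return (f"https://web.archive.org/web/"
                    f"{status['timestamp']}/{status['original_url']}")
        if status["status"] == "error":
            raise RuntimeError(status.get("message", "capture failed"))
        time.sleep(5)  # still pending; wait before polling again
```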
To use it:
- Clone or download and unzip this repository.
- Install the required Python libraries. Assuming you cloned or unzipped this repository to the directory `path/to/capture-urls/`:

  ```sh
  cd path/to/capture-urls/
  make
  ```

- Go to https://archive.org/account/s3.php and get your S3-like API keys.
- In `path/to/capture-urls/`, create a file called `secret.py` with the following contents:

  ```python
  ACCESS_KEY = 'your access key'
  SECRET_KEY = 'your secret key'
  ```

  (Use the actual values of your access key and secret key, not `your access key` and `your secret key`.)

- Optionally edit `config.py` to your liking.
- Archive your URLs:

  ```sh
  cat urls.txt | ./capture-urls.py > archived-urls.txt
  ```

  `urls.txt` should contain a list of URLs to be archived, one on each line.
- Archiving URLs can take a long time. You can interrupt the process with Ctrl-C. This will create a file called `progress.json` with the state of the archiving process so far. If you start the process again, it will pick up where it left off. You can add new URLs to `urls.txt` before you restart the process. (A sketch of how this kind of resumable state can work appears after these steps.)
- When it finishes running, you should have a list of the archived URLs in `archived-urls.txt`.
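For the curious, the interrupt-and-resume behavior described above can be implemented with a small JSON state file along these lines. This is only an illustration; the actual layout of `progress.json` is whatever `capture-urls.py` writes:

```python
# Illustrative sketch of resumable archiving via a JSON state file.
# The real format of progress.json is defined by capture-urls.py.
import json
import os

PROGRESS_FILE = "progress.json"  # hypothetical structure: {url: archived_url}

def load_progress() -> dict:
    """Return previously archived URLs, or an empty dict on a fresh start."""
    if os.path.exists(PROGRESS_FILE):
        with open(PROGRESS_FILE) as f:
            return json.load(f)
    return {}

def save_progress(done: dict) -> None:
    with open(PROGRESS_FILE, "w") as f:
        json.dump(done, f, indent=2)

def archive_all(urls, capture):
    """Archive each URL, skipping ones already recorded, saving state as we go."""
    done = load_progress()
    try:
        for url in urls:
            if url in done:           # already archived on a previous run
                continue
            done[url] = capture(url)   # e.g. the capture() sketch shown earlier
            save_progress(done)        # persist after every URL so Ctrl-C is safe
    except KeyboardInterrupt:
        save_progress(done)            # keep partial progress on interrupt
        raise
    return done
```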