Auto_update #14
…lass; create DEMO notebook
…ed in github runner
…into auto_update
Well done, it works. I made some small tweaks...
One thing I'll say: you may find that this fails to commit to main because the branch is protected. There may be a way to give the github-actions bot permission to push to main, or you could update the commit command to force push. When working, developing or testing in such ways, always be careful with force push, as these changes go straight into main. While they can be reverted, it's worth not having that headache. If you're happy, go ahead and figure out the branch protection problem I pointed out and merge her! Lemme know how you get on 😄
In this PR, I have created a CSV containing all scraped PFD reports (~5600). As an interim step, this uses HTML + PDF scraping only (no LLM just yet).
This CSV is now accessible to the user through the new loader.py module; the call they need to run is shown in the new notebook.
Once a week, this CSV is updated using PFDScraper's top_up() method. This is executed via a GitHub Workflow. Each run of the workflow adds a summary detailing how many reports have been updated.
This workflow only runs on main.
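For reference, here is a minimal sketch of how a run summary like the one described above could be produced. It is not the code in this PR: the function names `format_summary` and `write_summary` are hypothetical, and it assumes the summary is simply the difference in row counts before and after the top-up. The only GitHub-specific piece is the `GITHUB_STEP_SUMMARY` environment variable, which Actions sets to a file that accepts appended Markdown.

```python
import os
from pathlib import Path
from typing import Optional


def format_summary(previous_count: int, current_count: int) -> str:
    """Build a short Markdown summary of one top-up run (hypothetical helper)."""
    added = current_count - previous_count
    return (
        "### PFD reports top-up\n"
        f"- Reports before: {previous_count}\n"
        f"- Reports after: {current_count}\n"
        f"- New reports added: {added}\n"
    )


def write_summary(text: str, summary_path: Optional[str] = None) -> None:
    """Append the summary to the step summary file, or print it when run locally."""
    # GITHUB_STEP_SUMMARY is set by GitHub Actions inside a job; it is absent locally.
    summary_path = summary_path or os.environ.get("GITHUB_STEP_SUMMARY")
    if summary_path:
        with Path(summary_path).open("a", encoding="utf-8") as fh:
            fh.write(text)
    else:
        print(text)
```

In the workflow, the script would call `write_summary(format_summary(before, after))` after the top-up, where `before` and `after` are row counts taken around the `top_up()` call.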
@johnpytch I could really do with your expertise here to make sure I've designed this appropriately! I've noticed that this new workflow isn't appearing in the Actions tab, so I'm not sure if I'm missing a final piece of the puzzle.
Along with this running once a week, I want to be able to run the Workflow 'on demand'.
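One possible route to 'on demand' runs, sketched here under assumptions: if the workflow declares a `workflow_dispatch` trigger alongside its schedule, it can be started from the Actions tab or through GitHub's REST endpoint for workflow dispatches. As far as I know, scheduled and manually dispatched workflows are only picked up once the workflow file exists on the default branch, which may be why it isn't showing in the Actions tab yet. The helper name `build_dispatch_request` and the owner, repo, and file names in the usage note are placeholders, not values from this repository.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, workflow_file, token, ref="main"):
    """Build (but do not send) a request that triggers a workflow_dispatch run.

    All arguments are supplied by the caller; nothing here is specific to this repo.
    """
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # The API expects the branch or tag to run against in the "ref" field.
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

To actually trigger a run, pass the result to `urllib.request.urlopen(...)` with a token that is allowed to run Actions on the repository, for example `build_dispatch_request("some-owner", "some-repo", "update.yml", token)`. The "Run workflow" button in the Actions tab does the same thing without any code.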