-
Notifications
You must be signed in to change notification settings - Fork 192
fix: Council Fix Pack - January 2026 #1772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fix: robbrad#1500 - Renfrewshire Council
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1772 +/- ##
=======================================
Coverage 86.81% 86.81%
=======================================
Files 9 9
Lines 1138 1138
=======================================
Hits 988 988
Misses 150 150 ☔ View full report in Codecov by Sentry. |
fix: robbrad#1771 - South Lanarkshire Council
📝 WalkthroughWalkthroughMultiple council scrapers were converted from browser/Selenium flows to direct HTTP/UPRN or ICS-based retrievals; a new Folkestone & Hythe requests-based parser was added; Castle Point calendar parsing was reworked; tests and wiki docs updated to reflect UPRN and parameter changes. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant CouncilSite as Council Website
participant ICSServer as ICS Host
participant Parser as ICS Parser
Note over Client,CouncilSite: Renfrewshire UPRN → ICS retrieval flow
Client->>CouncilSite: GET view URL (with UPRN)
CouncilSite-->>Client: HTML page containing ICS link
Client->>Client: Extract ICS link from HTML
Client->>ICSServer: GET ICS file
ICSServer-->>Client: Return ICS calendar data
Client->>Parser: Parse ICS events (filter next 365 days)
Parser-->>Client: Map events to bin types + dates
Client->>Client: Build & sort `bins` by collectionDate
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
fix: robbrad#1777 - fix: Newark and Sherwood District Council
fix: robbrad#1760 - Folkstone and Hythe District Council
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
wiki/Councils.md (1)
85-86: Remove duplicate Cumberland entry from contents list and clarify the two council sections.The contents list at lines 85–86 contains a duplicate
- [Cumberland](#cumberland)entry. Additionally, the file has two separate sections both titled "### Cumberland" (one for CumberlandAllerdaleCouncil using postcode/house number parameters, another for CumberlandCouncil using UPRN). Since Allerdale merged into Cumberland, clarify which entry users should use—either remove the legacy CumberlandAllerdaleCouncil entry, add a deprecation note, or use distinct section headings to differentiate them.
🧹 Nitpick comments (3)
uk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.py (2)
39-55: Consider error handling for ICS parsing.The
events()function fromicaleventscould fail if the ICS file is malformed or the URL is unreachable. While the current implementation will naturally propagate exceptions, consider whether more specific error messages would improve debugging.Potential improvement
- # Parse ICS calendar - upcoming_events = events(ics_url, start=now, end=future) + # Parse ICS calendar + try: + upcoming_events = events(ics_url, start=now, end=future) + except Exception as e: + raise ValueError( + f"Failed to parse ICS calendar for UPRN {user_uprn}: {str(e)}" + ) from eThis would provide clearer error messages if ICS parsing fails, though the existing behavior of propagating the original exception is also acceptable.
57-59: Usedate_formatconstant for consistency.The hardcoded date format string
"%d/%m/%Y"should ideally reference thedate_formatconstant to ensure consistency across the codebase.🔎 Suggested improvement
bindata["bins"].sort( - key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y") + key=lambda x: datetime.strptime(x.get("collectionDate"), date_format) )This ensures that if
date_formatis ever changed, the sorting logic remains consistent.uk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.py (1)
4-4: Static analysis flagged star imports.Static analysis tools flag
datetimeanddate_formatas potentially undefined because they're imported via star import fromcommon. While this is a codebase-wide pattern, explicit imports would improve clarity and help static analysis tools.Optional refactor to explicit imports
+from datetime import datetime from uk_bin_collection.uk_bin_collection.common import *Note: This assumes
date_formatis intentionally provided by thecommonmodule's star export.Also applies to: 38-38, 42-42
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
uk_bin_collection/tests/input.jsonuk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.pyuk_bin_collection/uk_bin_collection/councils/NewarkAndSherwoodDC.pyuk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.pyuk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.pywiki/Councils.md
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NewarkAndSherwoodDC.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
check_uprn(67-78)
🪛 Ruff (0.14.10)
uk_bin_collection/uk_bin_collection/councils/NewarkAndSherwoodDC.py
20-20: check_uprn may be undefined, or defined from star imports
(F405)
25-25: requests may be undefined, or defined from star imports
(F405)
uk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.py
22-22: check_uprn may be undefined, or defined from star imports
(F405)
28-28: Probable use of requests call without timeout
(S113)
35-37: Avoid specifying long messages outside the exception class
(TRY003)
53-53: date_format may be undefined, or defined from star imports
(F405)
uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py
96-96: date_format may be undefined, or defined from star imports
(F405)
uk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.py
38-38: datetime may be undefined, or defined from star imports
(F405)
42-42: date_format may be undefined, or defined from star imports
(F405)
🔇 Additional comments (11)
wiki/Councils.md (6)
160-160: ✓ Isle of Anglesey entry is well integrated.The new council entry is properly alphabetized in the contents list and follows the documentation format. The note clarifying that either UPRN or postcode/house number can be passed, and that the URL is not used, is helpful.
Also applies to: 2148-2158
1646-1654: ✓ Epsom and Ewell parameters and Selenium requirement properly documented.The updated command, parameter list, and explicit note about Selenium requirement are clear and consistent with other Selenium-dependent councils in the documentation.
3141-3147: ✓ Renfrewshire scraper modernization properly documented.The updated URL and parameter set (UPRN-only, no Selenium) align with the stated PR objective. The removal of
-pand-wparameters and the updated note clearly communicate the breaking change. Users transitioning from the old approach will need to adjust their setup.
1753-1762: ✓ Folkestone and Hythe entry is properly documented.The UPRN-based approach without Selenium is clearly presented and consistent with other modernized council entries.
2526-2532: ✓ Merton and North Hertfordshire entries updated consistently.Both councils are now documented with the UPRN-based approach (
-s -uparameters) and updated URLs. The changes align with the PR's modernization strategy and are properly documented.Also applies to: 2809-2815
1-2: Verify that input.json has been updated to match these documentation changes.This file is marked as auto-generated from
uk_bin_collection/tests/input.json. The marked changes should be reflected in input.json to ensure they persist after regeneration. If these changes were made directly to Councils.md without updating the source file, they will be lost during the next regeneration.Confirm that all council entry updates (Isle of Anglesey, Cumberland, Epsom and Ewell, Merton, North Hertfordshire, Renfrewshire) have corresponding updates in input.json.
uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py (2)
74-75: LGTM! Guard prevents index error on "Area" rows.This addition directly addresses issue #1771 by skipping rows where
schedule_typeis "Area", which likely lack the expected table cell structure that would cause the "list index out of range" error at line 77 when accessing.split(" ")[1].
95-97: LGTM! Improved readability.The reformatting of the date formatting statement improves code readability with no functional changes.
uk_bin_collection/tests/input.json (1)
1973-1980: LGTM! Test data updated for uprn-based flow.The test configuration correctly reflects the migration from Selenium-based scraping to the uprn-based ICS retrieval approach:
- Replaced
house_number,paon, andpostcodewithuprn- Removed
web_driverrequirement- Updated
wiki_noteto reference FindMyAddress for UPRN lookupuk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.py (2)
1-5: LGTM! Imports updated for ICS-based approach.The imports correctly reflect the migration from Selenium to a direct HTTP + ICS parsing approach:
requestsfor HTTP callsicaleventsfor ICS calendar parsingdatetimeutilities for date handling
21-23: LGTM! UPRN parameter handling is correct.The code properly extracts and validates the UPRN parameter using the
check_uprnhelper function from the common module.
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py
Show resolved
Hide resolved
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py
Show resolved
Hide resolved
fix: robbrad#1803 robbrad#1793 Castlepoint District Council
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py:
- Around line 68-74: The loop in CastlepointDistrictCouncil (within the method
that iterates calendarContainer2) currently does "return 0" when a table is
missing and prints month_txt; replace the early return with a skip (e.g.,
continue) so parsing proceeds, remove the debug print, and ensure the method
always returns a dict with a "bins" key (e.g., return {"bins": []} when nothing
parsed) so callers get the expected type; add a warning log if you want to
surface missing tables instead of returning non-dict values.
- Around line 63-66: The code calls calendarContainer.find_all(...) without
checking calendarContainer for None; update the logic in
CastlepointDistrictCouncil (around the calendarContainer variable) to first test
if calendarContainer is truthy and handle the None case (e.g., set
calendarContainer2 = [] or return/skip processing and log a warning) before
invoking calendarContainer.find_all so you avoid an AttributeError.
- Around line 58-61: Null-check wasteCalendarContainer (and the h1 element)
before trying to call get_text to avoid AttributeError if the page structure
changes or the request fails; if the container or h1 is missing, handle
gracefully (e.g., log an error/raise a descriptive exception or return/skip) and
remove the debug print(year) statement so no unintended console output remains;
keep the parsing logic (year_txt and datetime.strptime) intact once the elements
are validated.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
🧰 Additional context used
🪛 Ruff (0.14.10)
uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
60-60: datetime may be undefined, or defined from star imports
(F405)
73-73: datetime may be undefined, or defined from star imports
(F405)
86-86: datetime may be undefined, or defined from star imports
(F405)
88-88: date_format may be undefined, or defined from star imports
(F405)
94-94: datetime may be undefined, or defined from star imports
(F405)
96-96: date_format may be undefined, or defined from star imports
(F405)
101-101: datetime may be undefined, or defined from star imports
(F405)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Integration Tests (3.12, 1.8.4)
- GitHub Check: Run Unit Tests (3.12, 1.8.4)
🔇 Additional comments (1)
uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py (1)
49-52: LGTM on the form field update.The switch from USRN to
roadIDaligns with the updated API requirements.
| wasteCalendarContainer = soup.find("div", class_="contentContainer") | ||
| year_txt = wasteCalendarContainer.find("h1").get_text(strip=True) | ||
| year = datetime.strptime(year_txt, "About my Street - %B %Y").strftime("%Y") | ||
| print(year) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add null check and remove debug print statement.
If the page structure changes or the request fails, wasteCalendarContainer could be None, causing an AttributeError on line 59. Additionally, the print(year) statement appears to be a debug artifact.
Suggested fix
wasteCalendarContainer = soup.find("div", class_="contentContainer")
+ if not wasteCalendarContainer:
+ return data
year_txt = wasteCalendarContainer.find("h1").get_text(strip=True)
year = datetime.strptime(year_txt, "About my Street - %B %Y").strftime("%Y")
- print(year)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| wasteCalendarContainer = soup.find("div", class_="contentContainer") | |
| year_txt = wasteCalendarContainer.find("h1").get_text(strip=True) | |
| year = datetime.strptime(year_txt, "About my Street - %B %Y").strftime("%Y") | |
| print(year) | |
| wasteCalendarContainer = soup.find("div", class_="contentContainer") | |
| if not wasteCalendarContainer: | |
| return data | |
| year_txt = wasteCalendarContainer.find("h1").get_text(strip=True) | |
| year = datetime.strptime(year_txt, "About my Street - %B %Y").strftime("%Y") |
🧰 Tools
🪛 Ruff (0.14.10)
60-60: datetime may be undefined, or defined from star imports
(F405)
🤖 Prompt for AI Agents
In @uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
around lines 58 - 61, Null-check wasteCalendarContainer (and the h1 element)
before trying to call get_text to avoid AttributeError if the page structure
changes or the request fails; if the container or h1 is missing, handle
gracefully (e.g., log an error/raise a descriptive exception or return/skip) and
remove the debug print(year) statement so no unintended console output remains;
keep the parsing logic (year_txt and datetime.strptime) intact once the elements
are validated.
uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
Show resolved
Hide resolved
| for container in calendarContainer2: | ||
| table = container.find("table", class_="calendar") | ||
| if not table: | ||
| return 0 | ||
| month_txt = container.find("tr", class_="calendar").get_text(strip=True) | ||
| month = datetime.strptime(month_txt, "%B").strftime("%m") | ||
| print(month_txt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: return 0 breaks the expected return type.
Returning 0 when a table is not found violates the method contract—callers expect a dict with a "bins" key. This will cause downstream failures. Additionally, the print(month_txt) statement should be removed.
Suggested fix
for container in calendarContainer2:
table = container.find("table", class_="calendar")
if not table:
- return 0
+ continue
month_txt = container.find("tr", class_="calendar").get_text(strip=True)
month = datetime.strptime(month_txt, "%B").strftime("%m")
- print(month_txt)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for container in calendarContainer2: | |
| table = container.find("table", class_="calendar") | |
| if not table: | |
| return 0 | |
| month_txt = container.find("tr", class_="calendar").get_text(strip=True) | |
| month = datetime.strptime(month_txt, "%B").strftime("%m") | |
| print(month_txt) | |
| for container in calendarContainer2: | |
| table = container.find("table", class_="calendar") | |
| if not table: | |
| continue | |
| month_txt = container.find("tr", class_="calendar").get_text(strip=True) | |
| month = datetime.strptime(month_txt, "%B").strftime("%m") |
🧰 Tools
🪛 Ruff (0.14.10)
73-73: datetime may be undefined, or defined from star imports
(F405)
🤖 Prompt for AI Agents
In @uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
around lines 68 - 74, The loop in CastlepointDistrictCouncil (within the method
that iterates calendarContainer2) currently does "return 0" when a table is
missing and prints month_txt; replace the early return with a skip (e.g.,
continue) so parsing proceeds, remove the debug print, and ensure the method
always returns a dict with a "bins" key (e.g., return {"bins": []} when nothing
parsed) so callers get the expected type; add a warning log if you want to
surface missing tables instead of returning non-dict values.
| pink_days = [ | ||
| td.get_text(strip=True) for td in calendar.find_all("td", class_="pink") | ||
| td.get_text(strip=True) for td in table.find_all("td", class_="pink") | ||
| ] | ||
| black_days = [ | ||
| td.get_text(strip=True) | ||
| for td in calendar.find_all("td", class_="normal") | ||
| normal_days = [ | ||
| td.get_text(strip=True) for td in table.find_all("td", class_="normal") | ||
| ] | ||
|
|
||
| for day in pink_days: | ||
| out.append( | ||
| ( | ||
| "Pink collection", | ||
| datetime(year=int(year), month=int(month), day=int(day)), | ||
| ) | ||
| ) | ||
| for day in black_days: | ||
| out.append( | ||
| ( | ||
| "Normal collection", | ||
| datetime(year=int(year), month=int(month), day=int(day)), | ||
| ) | ||
| ) | ||
|
|
||
| return out | ||
|
|
||
| collection_tuple = [] | ||
| for s in (soup, soup_nextmonth): | ||
| collection_tuple.extend(parse_calendar_month(s)) | ||
|
|
||
| ordered_data = sorted(collection_tuple, key=lambda x: x[1]) | ||
|
|
||
| for item in ordered_data: | ||
| dict_data = { | ||
| "type": item[0], | ||
| "collectionDate": item[1].strftime(date_format), | ||
| } | ||
| data["bins"].append(dict_data) | ||
| dict_data = { | ||
| "type": "Pink collection", | ||
| "collectionDate": datetime( | ||
| year=int(year), month=int(month), day=int(day) | ||
| ).strftime(date_format), | ||
| } | ||
| data["bins"].append(dict_data) | ||
| for day in normal_days: | ||
| dict_data = { | ||
| "type": "Normal collection", | ||
| "collectionDate": datetime( | ||
| year=int(year), month=int(month), day=int(day) | ||
| ).strftime(date_format), | ||
| } | ||
| data["bins"].append(dict_data) | ||
|
|
||
| data["bins"].sort( | ||
| key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y") | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard against empty day values and verify format consistency.
- If a table cell with class
pinkornormalcontains no text (e.g., empty cell placeholder),int(day)will raise aValueError. - The sort on line 101 uses a hardcoded format
"%d/%m/%Y"while entries usedate_format. Ifdate_formatdiffers, the sort will fail.
Suggested fix
pink_days = [
- td.get_text(strip=True) for td in table.find_all("td", class_="pink")
+ td.get_text(strip=True) for td in table.find_all("td", class_="pink") if td.get_text(strip=True)
]
normal_days = [
- td.get_text(strip=True) for td in table.find_all("td", class_="normal")
+ td.get_text(strip=True) for td in table.find_all("td", class_="normal") if td.get_text(strip=True)
]
for day in pink_days:
dict_data = {
"type": "Pink collection",
"collectionDate": datetime(
year=int(year), month=int(month), day=int(day)
).strftime(date_format),
}
data["bins"].append(dict_data)
for day in normal_days:
dict_data = {
"type": "Normal collection",
"collectionDate": datetime(
year=int(year), month=int(month), day=int(day)
).strftime(date_format),
}
data["bins"].append(dict_data)
data["bins"].sort(
- key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y")
+ key=lambda x: datetime.strptime(x.get("collectionDate"), date_format)
)🧰 Tools
🪛 Ruff (0.14.10)
86-86: datetime may be undefined, or defined from star imports
(F405)
88-88: date_format may be undefined, or defined from star imports
(F405)
94-94: datetime may be undefined, or defined from star imports
(F405)
96-96: date_format may be undefined, or defined from star imports
(F405)
101-101: datetime may be undefined, or defined from star imports
(F405)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In
@uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py:
- Around line 25-31: The first GET to initialize session cookies (response =
session.get(URI1)) does not check for errors; add a status check immediately
after that call (e.g., call response.raise_for_status() or validate
response.status_code and raise/log) before proceeding to the second request
(response = session.get(URI2)) so that failures on the URI1 request are detected
and handled early; update any surrounding try/except or logging to surface the
error context from the URI1 request.
In @wiki/Councils.md:
- Around line 2148-2158: The documentation lists flags `-s` and `-u` but the
note says "Pass either UPRN or postcode/house number", causing inconsistency;
either document the postcode/house-number flags (e.g., add `-p` for postcode and
`-n` for house number to the "Additional parameters" list with brief
descriptions) or revise the note to state that only UPRN (`-u`) is accepted;
update the Isle of Anglesey block so the parameters list and the note are
consistent (referencing the existing `-s` and `-u` flags and the note text).
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py (1)
1-5: Star import obscures dependencies.The wildcard import on line 4 makes it unclear which symbols (
check_uprn,datetime,date_format) come fromcommon. This triggers Ruff F403/F405 warnings. Consider explicit imports for clarity and to avoid accidental shadowing.♻️ Suggested explicit imports
import requests from bs4 import BeautifulSoup -from uk_bin_collection.uk_bin_collection.common import * +from uk_bin_collection.uk_bin_collection.common import check_uprn, date_format +from datetime import datetime from uk_bin_collection.uk_bin_collection.get_bin_data import AbstractGetBinDataClass
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
uk_bin_collection/tests/input.jsonuk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.pywiki/Councils.md
🚧 Files skipped from review as they are similar to previous changes (1)
- uk_bin_collection/tests/input.json
🧰 Additional context used
🪛 Ruff (0.14.10)
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py
4-4: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names
(F403)
16-16: Unused method argument: page
(ARG002)
19-19: check_uprn may be undefined, or defined from star imports
(F405)
47-47: datetime may be undefined, or defined from star imports
(F405)
53-53: date_format may be undefined, or defined from star imports
(F405)
58-58: datetime may be undefined, or defined from star imports
(F405)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run Unit Tests (3.12, 1.8.4)
- GitHub Check: Run Integration Tests (3.12, 1.8.4)
🔇 Additional comments (9)
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py (2)
16-19: LGTM on method signature and UPRN validation.The
pageparameter is unused because this scraper fetches data directly via HTTP. This is an acceptable pattern when the base class interface requires the parameter but the implementation uses its own data retrieval. The UPRN validation viacheck_uprnis correctly placed.
45-59: Date parsing and sorting logic is correct.The date extraction handles edge cases well with the try/except block. The sorting on line 58 uses
strptimewith"%d/%m/%Y", which correctly matches thedate_formatconstant defined incommon.py(line 17:date_format = "%d/%m/%Y"). Since the code formats dates withdt.strftime(date_format)on line 52 and then parses them with the same format on line 58, the logic is consistent and will function correctly.wiki/Councils.md (7)
1246-1254: LGTM!Cumberland Council documentation correctly updated to reflect UPRN-only parameters with the new URL path.
1644-1655: LGTM!Epsom and Ewell documentation correctly updated with Selenium requirement and all necessary parameters.
1753-1762: LGTM!Folkestone and Hythe documentation correctly updated to reflect the new UPRN-based, non-Selenium implementation.
2160-2169: LGTM!Islington documentation correctly updated with postcode parameter.
2524-2533: LGTM!Merton documentation correctly updated with new URL and UPRN-based parameters.
2807-2816: Verify URL and parameter redundancy.The URL contains
XXXXXXXXplaceholder (for UPRN in the path) and also documents-u XXXXXXXXparameter. Confirm whether both are required or if the UPRN should only be in one place to avoid user confusion.
3139-3148: LGTM!Renfrewshire documentation correctly updated to reflect the breaking change from Selenium-based to UPRN-only implementation, aligning with PR objective #1500.
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py
Show resolved
Hide resolved
| ### Isle of Anglesey | ||
| ```commandline | ||
| python collect_data.py IsleOfAngleseyCouncil https://www.anglesey.gov.wales/en/Residents/Bins-and-recycling/Waste-Collection-Day.aspx -s -u XXXXXXXX | ||
| ``` | ||
| Additional parameters: | ||
| - `-s` - skip get URL | ||
| - `-u` - UPRN | ||
|
|
||
| Note: Pass either UPRN or postcode/house number. URL is not used. | ||
|
|
||
| --- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation note inconsistent with parameters.
The note states "Pass either UPRN or postcode/house number" but only -s and -u parameters are documented. If postcode/house number are supported, add -p and -n to the parameters list. Otherwise, update the note.
🤖 Prompt for AI Agents
In @wiki/Councils.md around lines 2148 - 2158, The documentation lists flags
`-s` and `-u` but the note says "Pass either UPRN or postcode/house number",
causing inconsistency; either document the postcode/house-number flags (e.g.,
add `-p` for postcode and `-n` for house number to the "Additional parameters"
list with brief descriptions) or revise the note to state that only UPRN (`-u`)
is accepted; update the Isle of Anglesey block so the parameters list and the
note are consistent (referencing the existing `-s` and `-u` flags and the note
text).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py:
- Around line 107-109: Replace the hardcoded "%d/%m/%Y" in the sort key with the
shared date_format used elsewhere so parsing stays consistent; update the lambda
to use the common date_format (e.g., datetime.strptime(x.get("collectionDate"),
date_format)) or import date_format from the common module and use it, and
optionally guard against missing/invalid dates by providing a fallback or
wrapping the parse in a try/except to avoid ValueError during sorting.
- Around line 75-77: Remove the debug print and guard against a missing
tr.calendar element: check the result of container.find("tr", class_="calendar")
before calling get_text/strptime (e.g., if result is None, skip this container
or log and continue), then compute month_txt and month using datetime.strptime
only when the element exists; update the code around the month_txt/ month
assignment in the CastlepointDistrictCouncil parsing routine to remove the
print(month_txt) and add the null check for the find call.
- Around line 58-69: The code assumes wasteCalendarContainer.find("h1") always
exists and parses a single `year` for all months, which can raise AttributeError
or mis-tag dates across year boundaries; first add a defensive check that
`wasteCalendarContainer.find("h1")` is not None before using `get_text`
(returning early or logging as other checks do) and only parse a starting
month/year into `current_year` and `current_month`; then in the loop that
iterates `calendarContainer2` (and where dates are constructed) recompute each
month from the month header and detect year rollover by comparing month numbers
(if next month < previous month increment `current_year`) and use `current_year`
(not the original `year` variable) when building date strings.
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py (1)
90-105: Consider consolidating duplicate entry-building logic.The pink and normal collection entry creation blocks are nearly identical. This is minor, but consolidating would reduce duplication.
Optional refactor
-for day in pink_days: - dict_data = { - "type": "Pink collection", - "collectionDate": datetime( - year=int(year), month=int(month), day=int(day) - ).strftime(date_format), - } - data["bins"].append(dict_data) -for day in normal_days: - dict_data = { - "type": "Normal collection", - "collectionDate": datetime( - year=int(year), month=int(month), day=int(day) - ).strftime(date_format), - } - data["bins"].append(dict_data) +for collection_type, days in [("Pink collection", pink_days), ("Normal collection", normal_days)]: + for day in days: + data["bins"].append({ + "type": collection_type, + "collectionDate": datetime( + year=int(year), month=int(month), day=int(day) + ).strftime(date_format), + })
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
🧰 Additional context used
🪛 Ruff (0.14.10)
uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
62-62: datetime may be undefined, or defined from star imports
(F405)
76-76: datetime may be undefined, or defined from star imports
(F405)
93-93: datetime may be undefined, or defined from star imports
(F405)
95-95: date_format may be undefined, or defined from star imports
(F405)
101-101: datetime may be undefined, or defined from star imports
(F405)
103-103: date_format may be undefined, or defined from star imports
(F405)
108-108: datetime may be undefined, or defined from star imports
(F405)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Integration Tests (3.12, 1.8.4)
🔇 Additional comments (2)
uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py (2)
29-30: LGTM!The data structure initialization and the form field change from
USRNtoroadIDalign with the PR objectives to update selectors for reliable scraper initialization.Also applies to: 49-49
79-88: LGTM!The day extraction logic with filtering for non-empty text is clean and defensive.
| wasteCalendarContainer = soup.find("div", class_="contentContainer") | ||
| if not wasteCalendarContainer: | ||
| return data | ||
| year_txt = wasteCalendarContainer.find("h1").get_text(strip=True) | ||
| year = datetime.strptime(year_txt, "About my Street - %B %Y").strftime("%Y") | ||
|
|
||
| calendarContainer = soup.find("div", class_="calendarContainer") | ||
| if not calendarContainer: | ||
| return data | ||
| calendarContainer2 = calendarContainer.find_all( | ||
| "div", class_="calendarContainer" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential AttributeError if <h1> element is missing.
Line 61 assumes find("h1") succeeds, but if the page structure changes or the element is missing, this will raise an AttributeError. Add a defensive check similar to the container checks.
Additionally, the year is extracted once from the header (e.g., "January 2026"), but if the calendar spans across a year boundary (e.g., December 2025 to January 2026), all dates will incorrectly use the same year.
Proposed fix for null check and year rollover
wasteCalendarContainer = soup.find("div", class_="contentContainer")
if not wasteCalendarContainer:
return data
-year_txt = wasteCalendarContainer.find("h1").get_text(strip=True)
-year = datetime.strptime(year_txt, "About my Street - %B %Y").strftime("%Y")
+h1_element = wasteCalendarContainer.find("h1")
+if not h1_element:
+ return data
+year_txt = h1_element.get_text(strip=True)
+base_date = datetime.strptime(year_txt, "About my Street - %B %Y")
+base_month = base_date.month
+base_year = base_date.yearThen in the loop, handle year rollover:
month = datetime.strptime(month_txt, "%B").strftime("%m")
-print(month_txt)
+# Handle year rollover: if month < base_month, we've crossed into next year
+current_year = base_year if int(month) >= base_month else base_year + 1And use current_year instead of year when constructing dates.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| wasteCalendarContainer = soup.find("div", class_="contentContainer") | |
| if not wasteCalendarContainer: | |
| return data | |
| year_txt = wasteCalendarContainer.find("h1").get_text(strip=True) | |
| year = datetime.strptime(year_txt, "About my Street - %B %Y").strftime("%Y") | |
| calendarContainer = soup.find("div", class_="calendarContainer") | |
| if not calendarContainer: | |
| return data | |
| calendarContainer2 = calendarContainer.find_all( | |
| "div", class_="calendarContainer" | |
| ) | |
| wasteCalendarContainer = soup.find("div", class_="contentContainer") | |
| if not wasteCalendarContainer: | |
| return data | |
| h1_element = wasteCalendarContainer.find("h1") | |
| if not h1_element: | |
| return data | |
| year_txt = h1_element.get_text(strip=True) | |
| base_date = datetime.strptime(year_txt, "About my Street - %B %Y") | |
| base_month = base_date.month | |
| base_year = base_date.year | |
| calendarContainer = soup.find("div", class_="calendarContainer") | |
| if not calendarContainer: | |
| return data | |
| calendarContainer2 = calendarContainer.find_all( | |
| "div", class_="calendarContainer" | |
| ) |
🧰 Tools
🪛 Ruff (0.14.10)
62-62: datetime may be undefined, or defined from star imports
(F405)
🤖 Prompt for AI Agents
In @uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
around lines 58 - 69, The code assumes wasteCalendarContainer.find("h1") always
exists and parses a single `year` for all months, which can raise AttributeError
or mis-tag dates across year boundaries; first add a defensive check that
`wasteCalendarContainer.find("h1")` is not None before using `get_text`
(returning early or logging as other checks do) and only parse a starting
month/year into `current_year` and `current_month`; then in the loop that
iterates `calendarContainer2` (and where dates are constructed) recompute each
month from the month header and detect year rollover by comparing month numbers
(if next month < previous month increment `current_year`) and use `current_year`
(not the original `year` variable) when building date strings.
| month_txt = container.find("tr", class_="calendar").get_text(strip=True) | ||
| month = datetime.strptime(month_txt, "%B").strftime("%m") | ||
| print(month_txt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove debug print statement and add null check.
Line 77 contains a debug print statement that should be removed before merging. Additionally, line 75 may raise AttributeError if the tr.calendar element is not found.
Proposed fix
-month_txt = container.find("tr", class_="calendar").get_text(strip=True)
-month = datetime.strptime(month_txt, "%B").strftime("%m")
-print(month_txt)
+month_row = container.find("tr", class_="calendar")
+if not month_row:
+ continue
+month_txt = month_row.get_text(strip=True)
+month = datetime.strptime(month_txt, "%B").strftime("%m")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| month_txt = container.find("tr", class_="calendar").get_text(strip=True) | |
| month = datetime.strptime(month_txt, "%B").strftime("%m") | |
| print(month_txt) | |
| month_row = container.find("tr", class_="calendar") | |
| if not month_row: | |
| continue | |
| month_txt = month_row.get_text(strip=True) | |
| month = datetime.strptime(month_txt, "%B").strftime("%m") |
🧰 Tools
🪛 Ruff (0.14.10)
76-76: datetime may be undefined, or defined from star imports
(F405)
🤖 Prompt for AI Agents
In @uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
around lines 75 - 77, Remove the debug print and guard against a missing
tr.calendar element: check the result of container.find("tr", class_="calendar")
before calling get_text/strptime (e.g., if result is None, skip this container
or log and continue), then compute month_txt and month using datetime.strptime
only when the element exists; update the code around the month_txt/ month
assignment in the CastlepointDistrictCouncil parsing routine to remove the
print(month_txt) and add the null check for the find call.
| data["bins"].sort( | ||
| key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y") | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hardcoded date format in sort may cause inconsistency.
The sort key uses a hardcoded "%d/%m/%Y" format, but the collectionDate is formatted using the date_format variable from the common module. If date_format ever differs, the sort will fail with a ValueError.
Proposed fix
data["bins"].sort(
- key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y")
+ key=lambda x: datetime.strptime(x.get("collectionDate"), date_format)
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| data["bins"].sort( | |
| key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y") | |
| ) | |
| data["bins"].sort( | |
| key=lambda x: datetime.strptime(x.get("collectionDate"), date_format) | |
| ) |
🧰 Tools
🪛 Ruff (0.14.10)
108-108: datetime may be undefined, or defined from star imports
(F405)
🤖 Prompt for AI Agents
In @uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
around lines 107 - 109, Replace the hardcoded "%d/%m/%Y" in the sort key with
the shared date_format used elsewhere so parsing stays consistent; update the
lambda to use the common date_format (e.g.,
datetime.strptime(x.get("collectionDate"), date_format)) or import date_format
from the common module and use it, and optionally guard against missing/invalid
dates by providing a fallback or wrapping the parse in a try/except to avoid
ValueError during sorting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In
@uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py:
- Around line 18-19: The call to check_uprn(user_uprn) in
FolkestoneandHytheDistrictCouncil.py ignores its boolean result so invalid UPRNs
continue processing; capture the return (e.g. valid = check_uprn(user_uprn)) and
if it is False/None abort early (raise ValueError, return an error response, or
log and return) before making API calls or further processing to avoid
downstream failures; ensure you reference user_uprn and check_uprn in your
changes and propagate a clear error path when validation fails.
- Around line 26-32: The two HTTP calls using session.get(URI1) and
session.get(URI2) in FolkestoneandHytheDistrictCouncil.py lack timeouts; update
both calls to include a sensible timeout (e.g. timeout=10 or a module-level
TIMEOUT constant) when calling session.get(...) so the scraper won’t hang,
keeping the subsequent response.raise_for_status() checks unchanged and ensuring
the session variable and URI1/URI2 usage remain the same.
🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py (2)
4-4: Consider explicit imports instead of wildcard import.The wildcard import makes it unclear which symbols (
datetime,date_format,check_uprn) are being used. Explicit imports improve readability and help static analysis tools verify correctness.Suggested fix
-from uk_bin_collection.uk_bin_collection.common import * +from uk_bin_collection.uk_bin_collection.common import check_uprn, date_format +from datetime import datetime
58-60: Usedate_formatconstant in sorting to match date formatting on line 35.Line 35 formats dates using the
date_formatconstant fromcommon.py, but the sorting at lines 58-60 hardcodes the format string. Ifdate_formatever changes, the sorting will break with aValueError.Suggested fix
bindata["bins"].sort( - key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y") + key=lambda x: datetime.strptime(x.get("collectionDate"), date_format) )
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py
🧰 Additional context used
🪛 Ruff (0.14.10)
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py
4-4: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names
(F403)
16-16: Unused method argument: page
(ARG002)
19-19: check_uprn may be undefined, or defined from star imports
(F405)
48-48: datetime may be undefined, or defined from star imports
(F405)
54-54: date_format may be undefined, or defined from star imports
(F405)
59-59: datetime may be undefined, or defined from star imports
(F405)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Run Integration Tests (3.12, 1.8.4)
🔇 Additional comments (2)
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py (2)
8-14: LGTM!Class structure correctly inherits from
AbstractGetBinDataClassand follows the established pattern.
34-56: LGTM!The HTML parsing logic is well-structured with appropriate null checks and defensive error handling. The date parsing is correctly wrapped in a try/except block to gracefully handle malformed data.
| user_uprn = kwargs.get("uprn") | ||
| check_uprn(user_uprn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check_uprn return value is not used.
Looking at the check_uprn implementation in common.py, it returns True on success but implicitly returns None on failure (after printing an error). The code proceeds regardless of whether the UPRN is valid, which could lead to confusing downstream errors when the API request fails or returns unexpected data.
Suggested fix
user_uprn = kwargs.get("uprn")
- check_uprn(user_uprn)
+ if not check_uprn(user_uprn):
+ raise ValueError("Invalid UPRN provided")
bindata = {"bins": []}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| user_uprn = kwargs.get("uprn") | |
| check_uprn(user_uprn) | |
| user_uprn = kwargs.get("uprn") | |
| if not check_uprn(user_uprn): | |
| raise ValueError("Invalid UPRN provided") | |
| bindata = {"bins": []} |
🧰 Tools
🪛 Ruff (0.14.10)
19-19: check_uprn may be undefined, or defined from star imports
(F405)
🤖 Prompt for AI Agents
In
@uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py
around lines 18 - 19, The call to check_uprn(user_uprn) in
FolkestoneandHytheDistrictCouncil.py ignores its boolean result so invalid UPRNs
continue processing; capture the return (e.g. valid = check_uprn(user_uprn)) and
if it is False/None abort early (raise ValueError, return an error response, or
log and return) before making API calls or further processing to avoid
downstream failures; ensure you reference user_uprn and check_uprn in your
changes and propagate a clear error path when validation fails.
| session = requests.session() | ||
| response = session.get( | ||
| URI1 | ||
| ) # Initialize session state (cookies) required by URI2 | ||
| response.raise_for_status() # Validate session initialization | ||
| response = session.get(URI2) | ||
| response.raise_for_status() # Raise HTTPError for bad status codes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add timeout to HTTP requests.
Both session.get() calls lack a timeout parameter. Without a timeout, requests can hang indefinitely if the server is unresponsive, causing the scraper to stall. This is particularly important for automated collection systems.
Suggested fix
+ REQUEST_TIMEOUT = 30 # seconds
+
# Make the GET request
session = requests.session()
response = session.get(
- URI1
+ URI1,
+ timeout=REQUEST_TIMEOUT
) # Initialize session state (cookies) required by URI2
response.raise_for_status() # Validate session initialization
- response = session.get(URI2)
+ response = session.get(URI2, timeout=REQUEST_TIMEOUT)
response.raise_for_status() # Raise HTTPError for bad status codesCommittable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In
@uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py
around lines 26 - 32, The two HTTP calls using session.get(URI1) and
session.get(URI2) in FolkestoneandHytheDistrictCouncil.py lack timeouts; update
both calls to include a sensible timeout (e.g. timeout=10 or a module-level
TIMEOUT constant) when calling session.get(...) so the scraper won’t hang,
keeping the subsequent response.raise_for_status() checks unchanged and ensuring
the session variable and URI1/URI2 usage remain the same.
BREAKING CHANGES
fix: #1500 - Renfrewshire Council - This scraper has been re-developed and now uses the UPRN value of your address only. It also no longer requires Selenium
fix: #1760 - Folkestone and Hythe District Council - This was previously incorrectly spelled Folkstone and Hythe District Council
Fixes
fix: #1771 - South Lanarkshire Council
fix: #1777 - Newark and Sherwood District Council
fix: #1803 - Castlepoint District Council
fix: #1793 - Castlepoint District Council
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.