Skip to content

Conversation

@m26dvd
Copy link
Contributor

@m26dvd m26dvd commented Dec 10, 2025

BREAKING CHANGES
fix: #1500 - Renfrewshire Council - This scraper has been re-developed and now uses the UPRN value of your address only. It also no longer requires Selenium
fix: #1760 - Folkestone and Hythe District Council - This was previously incorrectly spelled Folkstone and Hythe District Council

Fixes
fix: #1771 - South Lanarkshire Council
fix: #1777 - Newark and Sherwood District Council
fix: #1803 - Castlepoint District Council
fix: #1793 - Castlepoint District Council

Summary by CodeRabbit

  • New Features

    • Added Folkestone & Hythe scraper; expanded UPRN-based lookups for Renfrewshire, Newark & Sherwood and others; Castlepoint now parses multi-month calendars; Isle of Anglesey added.
  • Bug Fixes

    • Improved reliability for Renfrewshire (ICS-based retrieval) and Folkestone & Hythe collection retrieval; South Lanarkshire ignores "Area" rows; fixed Folkestone & Hythe naming and removed obsolete web-driver usage.
  • Documentation

    • CLI docs updated to prefer UPRN (-u), added skip (-s) notes, and clarified Selenium guidance per council.

✏️ Tip: You can customize this high-level summary in your review settings.

@codecov
Copy link

codecov bot commented Dec 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.81%. Comparing base (83cce53) to head (e56d0ab).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1772   +/-   ##
=======================================
  Coverage   86.81%   86.81%           
=======================================
  Files           9        9           
  Lines        1138     1138           
=======================================
  Hits          988      988           
  Misses        150      150           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

fix: robbrad#1771 -  South Lanarkshire Council
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 10, 2025

📝 Walkthrough

Walkthrough

Multiple council scrapers were converted from browser/Selenium flows to direct HTTP/UPRN or ICS-based retrievals; a new Folkestone & Hythe requests-based parser was added; Castle Point calendar parsing was reworked; tests and wiki docs updated to reflect UPRN and parameter changes.

Changes

Cohort / File(s) Summary
Folkestone & Hythe (new parser & tests)
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py, uk_bin_collection/tests/input.json
Added a new requests-based scraper that performs two GETs and parses article.service-card entries for bin type and next collection date; corrected public entity name and removed web_driver from tests.
Renfrewshire (ICS/UPRN migration & docs/tests)
uk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.py, wiki/Councils.md, uk_bin_collection/tests/input.json
Removed Selenium/browser automation; use UPRN-based view URL, fetch HTML, extract ICS link, download ICS, parse events via icalevents, map event summaries to bin types; updated URL, wiki note, tests and docs to UPRN usage.
Newark & Sherwood (uprn-driven fetch)
uk_bin_collection/uk_bin_collection/councils/NewarkAndSherwoodDC.py
Now validates uprn, constructs a pid-based URL, performs requests.get to fetch HTML, then parses that response instead of using the provided page string.
Castle Point (calendar parsing rewrite)
uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
Reworked parsing to iterate multiple calendarContainer elements, derive year/month per container, extract Pink/Normal day lists, accumulate into data["bins"] and sort by collectionDate.
South Lanarkshire (robustness tweak)
uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py
Added guard to skip rows where schedule_type == "Area" and small date-formatting adjustments to avoid index errors.
Documentation updates
wiki/Councils.md
Added Isle of Anglesey and updated multiple council entries (Renfrewshire, Cumberland, Epsom & Ewell, Merton, North Hertfordshire, etc.) to reflect UPRN usage and parameter changes (-u, -s).

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant CouncilSite as Council Website
    participant ICSServer as ICS Host
    participant Parser as ICS Parser

    Note over Client,CouncilSite: Renfrewshire UPRN → ICS retrieval flow
    Client->>CouncilSite: GET view URL (with UPRN)
    CouncilSite-->>Client: HTML page containing ICS link
    Client->>Client: Extract ICS link from HTML
    Client->>ICSServer: GET ICS file
    ICSServer-->>Client: Return ICS calendar data
    Client->>Parser: Parse ICS events (filter next 365 days)
    Parser-->>Client: Map events to bin types + dates
    Client->>Client: Build & sort `bins` by collectionDate
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • dp247

Poem

🐰
I hopped through HTML and calendar springs,
Swapped browsers for requests and tiny wings.
UPRNs led me to each dated line,
Cards and ICS now mark collection time.
Hoppity hops — tidy bins, tidy shine! 🥕

🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'fix: Council Fix Pack - January 2026' is vague and does not clearly describe the primary changes. It uses a generic term 'Council Fix Pack' without specifying which councils or what was actually fixed. Use a more descriptive title that specifies the main changes, such as 'fix: Refactor Renfrewshire, Folkestone, Newark, South Lanarkshire, and Castlepoint councils' or focus on the primary breaking change (Renfrewshire Selenium removal).
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed The PR successfully addresses all five linked issues: #1500 (Renfrewshire Selenium removal with UPRN), #1760 (Folkestone spelling correction), #1771 (South Lanarkshire index fix), #1777 (Newark and Sherwood refactor), and #1803/#1793 (Castlepoint null handling).
Out of Scope Changes check ✅ Passed All code changes are directly aligned with the linked issue objectives. The wiki documentation updates are in-scope as they document the parameter changes introduced by the council-specific fixes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

m26dvd and others added 3 commits December 15, 2025 12:01
@m26dvd m26dvd marked this pull request as ready for review December 26, 2025 23:52
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
wiki/Councils.md (1)

85-86: Remove duplicate Cumberland entry from contents list and clarify the two council sections.

The contents list at lines 85–86 contains a duplicate - [Cumberland](#cumberland) entry. Additionally, the file has two separate sections both titled "### Cumberland" (one for CumberlandAllerdaleCouncil using postcode/house number parameters, another for CumberlandCouncil using UPRN). Since Allerdale merged into Cumberland, clarify which entry users should use—either remove the legacy CumberlandAllerdaleCouncil entry, add a deprecation note, or use distinct section headings to differentiate them.

🧹 Nitpick comments (3)
uk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.py (2)

39-55: Consider error handling for ICS parsing.

The events() function from icalevents could fail if the ICS file is malformed or the URL is unreachable. While the current implementation will naturally propagate exceptions, consider whether more specific error messages would improve debugging.

Potential improvement
-        # Parse ICS calendar
-        upcoming_events = events(ics_url, start=now, end=future)
+        # Parse ICS calendar
+        try:
+            upcoming_events = events(ics_url, start=now, end=future)
+        except Exception as e:
+            raise ValueError(
+                f"Failed to parse ICS calendar for UPRN {user_uprn}: {str(e)}"
+            ) from e

This would provide clearer error messages if ICS parsing fails, though the existing behavior of propagating the original exception is also acceptable.


57-59: Use date_format constant for consistency.

The hardcoded date format string "%d/%m/%Y" should ideally reference the date_format constant to ensure consistency across the codebase.

🔎 Suggested improvement
         bindata["bins"].sort(
-            key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y")
+            key=lambda x: datetime.strptime(x.get("collectionDate"), date_format)
         )

This ensures that if date_format is ever changed, the sorting logic remains consistent.

uk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.py (1)

4-4: Static analysis flagged star imports.

Static analysis tools flag datetime and date_format as potentially undefined because they're imported via star import from common. While this is a codebase-wide pattern, explicit imports would improve clarity and help static analysis tools.

Optional refactor to explicit imports
+from datetime import datetime
 from uk_bin_collection.uk_bin_collection.common import *

Note: This assumes date_format is intentionally provided by the common module's star export.

Also applies to: 38-38, 42-42

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 83cce53 and f14840a.

📒 Files selected for processing (6)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.py
  • uk_bin_collection/uk_bin_collection/councils/NewarkAndSherwoodDC.py
  • uk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.py
  • uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py
  • wiki/Councils.md
🧰 Additional context used
🧬 Code graph analysis (1)
uk_bin_collection/uk_bin_collection/councils/NewarkAndSherwoodDC.py (1)
uk_bin_collection/uk_bin_collection/common.py (1)
  • check_uprn (67-78)
🪛 Ruff (0.14.10)
uk_bin_collection/uk_bin_collection/councils/NewarkAndSherwoodDC.py

20-20: check_uprn may be undefined, or defined from star imports

(F405)


25-25: requests may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.py

22-22: check_uprn may be undefined, or defined from star imports

(F405)


28-28: Probable use of requests call without timeout

(S113)


35-37: Avoid specifying long messages outside the exception class

(TRY003)


53-53: date_format may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py

96-96: date_format may be undefined, or defined from star imports

(F405)

uk_bin_collection/uk_bin_collection/councils/FolkstoneandHytheDistrictCouncil.py

38-38: datetime may be undefined, or defined from star imports

(F405)


42-42: date_format may be undefined, or defined from star imports

(F405)

🔇 Additional comments (11)
wiki/Councils.md (6)

160-160: ✓ Isle of Anglesey entry is well integrated.

The new council entry is properly alphabetized in the contents list and follows the documentation format. The note clarifying that either UPRN or postcode/house number can be passed, and that the URL is not used, is helpful.

Also applies to: 2148-2158


1646-1654: ✓ Epsom and Ewell parameters and Selenium requirement properly documented.

The updated command, parameter list, and explicit note about Selenium requirement are clear and consistent with other Selenium-dependent councils in the documentation.


3141-3147: ✓ Renfrewshire scraper modernization properly documented.

The updated URL and parameter set (UPRN-only, no Selenium) align with the stated PR objective. The removal of -p and -w parameters and the updated note clearly communicate the breaking change. Users transitioning from the old approach will need to adjust their setup.


1753-1762: ✓ Folkestone and Hythe entry is properly documented.

The UPRN-based approach without Selenium is clearly presented and consistent with other modernized council entries.


2526-2532: ✓ Merton and North Hertfordshire entries updated consistently.

Both councils are now documented with the UPRN-based approach (-s -u parameters) and updated URLs. The changes align with the PR's modernization strategy and are properly documented.

Also applies to: 2809-2815


1-2: Verify that input.json has been updated to match these documentation changes.

This file is marked as auto-generated from uk_bin_collection/tests/input.json. The marked changes should be reflected in input.json to ensure they persist after regeneration. If these changes were made directly to Councils.md without updating the source file, they will be lost during the next regeneration.

Confirm that all council entry updates (Isle of Anglesey, Cumberland, Epsom and Ewell, Merton, North Hertfordshire, Renfrewshire) have corresponding updates in input.json.

uk_bin_collection/uk_bin_collection/councils/SouthLanarkshireCouncil.py (2)

74-75: LGTM! Guard prevents index error on "Area" rows.

This addition directly addresses issue #1771 by skipping rows where schedule_type is "Area", which likely lack the expected table cell structure that would cause the "list index out of range" error at line 77 when accessing .split(" ")[1].


95-97: LGTM! Improved readability.

The reformatting of the date formatting statement improves code readability with no functional changes.

uk_bin_collection/tests/input.json (1)

1973-1980: LGTM! Test data updated for uprn-based flow.

The test configuration correctly reflects the migration from Selenium-based scraping to the uprn-based ICS retrieval approach:

  • Replaced house_number, paon, and postcode with uprn
  • Removed web_driver requirement
  • Updated wiki_note to reference FindMyAddress for UPRN lookup
uk_bin_collection/uk_bin_collection/councils/RenfrewshireCouncil.py (2)

1-5: LGTM! Imports updated for ICS-based approach.

The imports correctly reflect the migration from Selenium to a direct HTTP + ICS parsing approach:

  • requests for HTTP calls
  • icalevents for ICS calendar parsing
  • datetime utilities for date handling

21-23: LGTM! UPRN parameter handling is correct.

The code properly extracts and validates the UPRN parameter using the check_uprn helper function from the common module.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py:
- Around line 68-74: The loop in CastlepointDistrictCouncil (within the method
that iterates calendarContainer2) currently does "return 0" when a table is
missing and prints month_txt; replace the early return with a skip (e.g.,
continue) so parsing proceeds, remove the debug print, and ensure the method
always returns a dict with a "bins" key (e.g., return {"bins": []} when nothing
parsed) so callers get the expected type; add a warning log if you want to
surface missing tables instead of returning non-dict values.
- Around line 63-66: The code calls calendarContainer.find_all(...) without
checking calendarContainer for None; update the logic in
CastlepointDistrictCouncil (around the calendarContainer variable) to first test
if calendarContainer is truthy and handle the None case (e.g., set
calendarContainer2 = [] or return/skip processing and log a warning) before
invoking calendarContainer.find_all so you avoid an AttributeError.
- Around line 58-61: Null-check wasteCalendarContainer (and the h1 element)
before trying to call get_text to avoid AttributeError if the page structure
changes or the request fails; if the container or h1 is missing, handle
gracefully (e.g., log an error/raise a descriptive exception or return/skip) and
remove the debug print(year) statement so no unintended console output remains;
keep the parsing logic (year_txt and datetime.strptime) intact once the elements
are validated.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f14840a and 19be622.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
🧰 Additional context used
🪛 Ruff (0.14.10)
uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py

60-60: datetime may be undefined, or defined from star imports

(F405)


73-73: datetime may be undefined, or defined from star imports

(F405)


86-86: datetime may be undefined, or defined from star imports

(F405)


88-88: date_format may be undefined, or defined from star imports

(F405)


94-94: datetime may be undefined, or defined from star imports

(F405)


96-96: date_format may be undefined, or defined from star imports

(F405)


101-101: datetime may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)
  • GitHub Check: Run Unit Tests (3.12, 1.8.4)
🔇 Additional comments (1)
uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py (1)

49-52: LGTM on the form field update.

The switch from USRN to roadID aligns with the updated API requirements.

Comment on lines 58 to 61
wasteCalendarContainer = soup.find("div", class_="contentContainer")
year_txt = wasteCalendarContainer.find("h1").get_text(strip=True)
year = datetime.strptime(year_txt, "About my Street - %B %Y").strftime("%Y")
print(year)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add null check and remove debug print statement.

If the page structure changes or the request fails, wasteCalendarContainer could be None, causing an AttributeError on line 59. Additionally, the print(year) statement appears to be a debug artifact.

Suggested fix
         wasteCalendarContainer = soup.find("div", class_="contentContainer")
+        if not wasteCalendarContainer:
+            return data
         year_txt = wasteCalendarContainer.find("h1").get_text(strip=True)
         year = datetime.strptime(year_txt, "About my Street - %B %Y").strftime("%Y")
-        print(year)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
wasteCalendarContainer = soup.find("div", class_="contentContainer")
year_txt = wasteCalendarContainer.find("h1").get_text(strip=True)
year = datetime.strptime(year_txt, "About my Street - %B %Y").strftime("%Y")
print(year)
wasteCalendarContainer = soup.find("div", class_="contentContainer")
if not wasteCalendarContainer:
return data
year_txt = wasteCalendarContainer.find("h1").get_text(strip=True)
year = datetime.strptime(year_txt, "About my Street - %B %Y").strftime("%Y")
🧰 Tools
🪛 Ruff (0.14.10)

60-60: datetime may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
In @uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
around lines 58 - 61, Null-check wasteCalendarContainer (and the h1 element)
before trying to call get_text to avoid AttributeError if the page structure
changes or the request fails; if the container or h1 is missing, handle
gracefully (e.g., log an error/raise a descriptive exception or return/skip) and
remove the debug print(year) statement so no unintended console output remains;
keep the parsing logic (year_txt and datetime.strptime) intact once the elements
are validated.

Comment on lines 68 to 74
for container in calendarContainer2:
table = container.find("table", class_="calendar")
if not table:
return 0
month_txt = container.find("tr", class_="calendar").get_text(strip=True)
month = datetime.strptime(month_txt, "%B").strftime("%m")
print(month_txt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: return 0 breaks the expected return type.

Returning 0 when a table is not found violates the method contract—callers expect a dict with a "bins" key. This will cause downstream failures. Additionally, the print(month_txt) statement should be removed.

Suggested fix
         for container in calendarContainer2:
             table = container.find("table", class_="calendar")
             if not table:
-                return 0
+                continue
             month_txt = container.find("tr", class_="calendar").get_text(strip=True)
             month = datetime.strptime(month_txt, "%B").strftime("%m")
-            print(month_txt)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for container in calendarContainer2:
table = container.find("table", class_="calendar")
if not table:
return 0
month_txt = container.find("tr", class_="calendar").get_text(strip=True)
month = datetime.strptime(month_txt, "%B").strftime("%m")
print(month_txt)
for container in calendarContainer2:
table = container.find("table", class_="calendar")
if not table:
continue
month_txt = container.find("tr", class_="calendar").get_text(strip=True)
month = datetime.strptime(month_txt, "%B").strftime("%m")
🧰 Tools
🪛 Ruff (0.14.10)

73-73: datetime may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
In @uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
around lines 68 - 74, The loop in CastlepointDistrictCouncil (within the method
that iterates calendarContainer2) currently does "return 0" when a table is
missing and prints month_txt; replace the early return with a skip (e.g.,
continue) so parsing proceeds, remove the debug print, and ensure the method
always returns a dict with a "bins" key (e.g., return {"bins": []} when nothing
parsed) so callers get the expected type; add a warning log if you want to
surface missing tables instead of returning non-dict values.

Comment on lines 76 to 102
pink_days = [
td.get_text(strip=True) for td in calendar.find_all("td", class_="pink")
td.get_text(strip=True) for td in table.find_all("td", class_="pink")
]
black_days = [
td.get_text(strip=True)
for td in calendar.find_all("td", class_="normal")
normal_days = [
td.get_text(strip=True) for td in table.find_all("td", class_="normal")
]

for day in pink_days:
out.append(
(
"Pink collection",
datetime(year=int(year), month=int(month), day=int(day)),
)
)
for day in black_days:
out.append(
(
"Normal collection",
datetime(year=int(year), month=int(month), day=int(day)),
)
)

return out

collection_tuple = []
for s in (soup, soup_nextmonth):
collection_tuple.extend(parse_calendar_month(s))

ordered_data = sorted(collection_tuple, key=lambda x: x[1])

for item in ordered_data:
dict_data = {
"type": item[0],
"collectionDate": item[1].strftime(date_format),
}
data["bins"].append(dict_data)
dict_data = {
"type": "Pink collection",
"collectionDate": datetime(
year=int(year), month=int(month), day=int(day)
).strftime(date_format),
}
data["bins"].append(dict_data)
for day in normal_days:
dict_data = {
"type": "Normal collection",
"collectionDate": datetime(
year=int(year), month=int(month), day=int(day)
).strftime(date_format),
}
data["bins"].append(dict_data)

data["bins"].sort(
key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y")
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Guard against empty day values and verify format consistency.

  1. If a table cell with class pink or normal contains no text (e.g., empty cell placeholder), int(day) will raise a ValueError.
  2. The sort on line 101 uses a hardcoded format "%d/%m/%Y" while entries use date_format. If date_format differs, the sort will fail.
Suggested fix
             pink_days = [
-                td.get_text(strip=True) for td in table.find_all("td", class_="pink")
+                td.get_text(strip=True) for td in table.find_all("td", class_="pink") if td.get_text(strip=True)
             ]
             normal_days = [
-                td.get_text(strip=True) for td in table.find_all("td", class_="normal")
+                td.get_text(strip=True) for td in table.find_all("td", class_="normal") if td.get_text(strip=True)
             ]

             for day in pink_days:
                 dict_data = {
                     "type": "Pink collection",
                     "collectionDate": datetime(
                         year=int(year), month=int(month), day=int(day)
                     ).strftime(date_format),
                 }
                 data["bins"].append(dict_data)
             for day in normal_days:
                 dict_data = {
                     "type": "Normal collection",
                     "collectionDate": datetime(
                         year=int(year), month=int(month), day=int(day)
                     ).strftime(date_format),
                 }
                 data["bins"].append(dict_data)

         data["bins"].sort(
-            key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y")
+            key=lambda x: datetime.strptime(x.get("collectionDate"), date_format)
         )
🧰 Tools
🪛 Ruff (0.14.10)

86-86: datetime may be undefined, or defined from star imports

(F405)


88-88: date_format may be undefined, or defined from star imports

(F405)


94-94: datetime may be undefined, or defined from star imports

(F405)


96-96: date_format may be undefined, or defined from star imports

(F405)


101-101: datetime may be undefined, or defined from star imports

(F405)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
@uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py:
- Around line 25-31: The first GET to initialize session cookies (response =
session.get(URI1)) does not check for errors; add a status check immediately
after that call (e.g., call response.raise_for_status() or validate
response.status_code and raise/log) before proceeding to the second request
(response = session.get(URI2)) so that failures on the URI1 request are detected
and handled early; update any surrounding try/except or logging to surface the
error context from the URI1 request.

In @wiki/Councils.md:
- Around line 2148-2158: The documentation lists flags `-s` and `-u` but the
note says "Pass either UPRN or postcode/house number", causing inconsistency;
either document the postcode/house-number flags (e.g., add `-p` for postcode and
`-n` for house number to the "Additional parameters" list with brief
descriptions) or revise the note to state that only UPRN (`-u`) is accepted;
update the Isle of Anglesey block so the parameters list and the note are
consistent (referencing the existing `-s` and `-u` flags and the note text).
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py (1)

1-5: Star import obscures dependencies.

The wildcard import on line 4 makes it unclear which symbols (check_uprn, datetime, date_format) come from common. This triggers Ruff F403/F405 warnings. Consider explicit imports for clarity and to avoid accidental shadowing.

♻️ Suggested explicit imports
 import requests
 from bs4 import BeautifulSoup

-from uk_bin_collection.uk_bin_collection.common import *
+from uk_bin_collection.uk_bin_collection.common import check_uprn, date_format
+from datetime import datetime
 from uk_bin_collection.uk_bin_collection.get_bin_data import AbstractGetBinDataClass
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19be622 and 27a9e0e.

📒 Files selected for processing (3)
  • uk_bin_collection/tests/input.json
  • uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py
  • wiki/Councils.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • uk_bin_collection/tests/input.json
🧰 Additional context used
🪛 Ruff (0.14.10)
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py

4-4: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


16-16: Unused method argument: page

(ARG002)


19-19: check_uprn may be undefined, or defined from star imports

(F405)


47-47: datetime may be undefined, or defined from star imports

(F405)


53-53: date_format may be undefined, or defined from star imports

(F405)


58-58: datetime may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run Unit Tests (3.12, 1.8.4)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)
🔇 Additional comments (9)
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py (2)

16-19: LGTM on method signature and UPRN validation.

The page parameter is unused because this scraper fetches data directly via HTTP. This is an acceptable pattern when the base class interface requires the parameter but the implementation uses its own data retrieval. The UPRN validation via check_uprn is correctly placed.


45-59: Date parsing and sorting logic is correct.

The date extraction handles edge cases well with the try/except block. The sorting on line 58 uses strptime with "%d/%m/%Y", which correctly matches the date_format constant defined in common.py (line 17: date_format = "%d/%m/%Y"). Since the code formats dates with dt.strftime(date_format) on line 52 and then parses them with the same format on line 58, the logic is consistent and will function correctly.

wiki/Councils.md (7)

1246-1254: LGTM!

Cumberland Council documentation correctly updated to reflect UPRN-only parameters with the new URL path.


1644-1655: LGTM!

Epsom and Ewell documentation correctly updated with Selenium requirement and all necessary parameters.


1753-1762: LGTM!

Folkestone and Hythe documentation correctly updated to reflect the new UPRN-based, non-Selenium implementation.


2160-2169: LGTM!

Islington documentation correctly updated with postcode parameter.


2524-2533: LGTM!

Merton documentation correctly updated with new URL and UPRN-based parameters.


2807-2816: Verify URL and parameter redundancy.

The URL contains XXXXXXXX placeholder (for UPRN in the path) and also documents -u XXXXXXXX parameter. Confirm whether both are required or if the UPRN should only be in one place to avoid user confusion.


3139-3148: LGTM!

Renfrewshire documentation correctly updated to reflect the breaking change from Selenium-based to UPRN-only implementation, aligning with PR objective #1500.

Comment on lines +2148 to +2158
### Isle of Anglesey
```commandline
python collect_data.py IsleOfAngleseyCouncil https://www.anglesey.gov.wales/en/Residents/Bins-and-recycling/Waste-Collection-Day.aspx -s -u XXXXXXXX
```
Additional parameters:
- `-s` - skip get URL
- `-u` - UPRN

Note: Pass either UPRN or postcode/house number. URL is not used.

---
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Documentation note inconsistent with parameters.

The note states "Pass either UPRN or postcode/house number" but only -s and -u parameters are documented. If postcode/house number are supported, add -p and -n to the parameters list. Otherwise, update the note.

🤖 Prompt for AI Agents
In @wiki/Councils.md around lines 2148 - 2158, The documentation lists flags
`-s` and `-u` but the note says "Pass either UPRN or postcode/house number",
causing inconsistency; either document the postcode/house-number flags (e.g.,
add `-p` for postcode and `-n` for house number to the "Additional parameters"
list with brief descriptions) or revise the note to state that only UPRN (`-u`)
is accepted; update the Isle of Anglesey block so the parameters list and the
note are consistent (referencing the existing `-s` and `-u` flags and the note
text).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py:
- Around line 107-109: Replace the hardcoded "%d/%m/%Y" in the sort key with the
shared date_format used elsewhere so parsing stays consistent; update the lambda
to use the common date_format (e.g., datetime.strptime(x.get("collectionDate"),
date_format)) or import date_format from the common module and use it, and
optionally guard against missing/invalid dates by providing a fallback or
wrapping the parse in a try/except to avoid ValueError during sorting.
- Around line 75-77: Remove the debug print and guard against a missing
tr.calendar element: check the result of container.find("tr", class_="calendar")
before calling get_text/strptime (e.g., if result is None, skip this container
or log and continue), then compute month_txt and month using datetime.strptime
only when the element exists; update the code around the month_txt/ month
assignment in the CastlepointDistrictCouncil parsing routine to remove the
print(month_txt) and add the null check for the find call.
- Around line 58-69: The code assumes wasteCalendarContainer.find("h1") always
exists and parses a single `year` for all months, which can raise AttributeError
or mis-tag dates across year boundaries; first add a defensive check that
`wasteCalendarContainer.find("h1")` is not None before using `get_text`
(returning early or logging as other checks do) and only parse a starting
month/year into `current_year` and `current_month`; then in the loop that
iterates `calendarContainer2` (and where dates are constructed) recompute each
month from the month header and detect year rollover by comparing month numbers
(if next month < previous month increment `current_year`) and use `current_year`
(not the original `year` variable) when building date strings.
🧹 Nitpick comments (1)
uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py (1)

90-105: Consider consolidating duplicate entry-building logic.

The pink and normal collection entry creation blocks are nearly identical. This is minor, but consolidating would reduce duplication.

Optional refactor
-for day in pink_days:
-    dict_data = {
-        "type": "Pink collection",
-        "collectionDate": datetime(
-            year=int(year), month=int(month), day=int(day)
-        ).strftime(date_format),
-    }
-    data["bins"].append(dict_data)
-for day in normal_days:
-    dict_data = {
-        "type": "Normal collection",
-        "collectionDate": datetime(
-            year=int(year), month=int(month), day=int(day)
-        ).strftime(date_format),
-    }
-    data["bins"].append(dict_data)
+for collection_type, days in [("Pink collection", pink_days), ("Normal collection", normal_days)]:
+    for day in days:
+        data["bins"].append({
+            "type": collection_type,
+            "collectionDate": datetime(
+                year=int(year), month=int(month), day=int(day)
+            ).strftime(date_format),
+        })
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27a9e0e and 141c6ff.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
🧰 Additional context used
🪛 Ruff (0.14.10)
uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py

62-62: datetime may be undefined, or defined from star imports

(F405)


76-76: datetime may be undefined, or defined from star imports

(F405)


93-93: datetime may be undefined, or defined from star imports

(F405)


95-95: date_format may be undefined, or defined from star imports

(F405)


101-101: datetime may be undefined, or defined from star imports

(F405)


103-103: date_format may be undefined, or defined from star imports

(F405)


108-108: datetime may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)
🔇 Additional comments (2)
uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py (2)

29-30: LGTM!

The data structure initialization and the form field change from USRN to roadID align with the PR objectives to update selectors for reliable scraper initialization.

Also applies to: 49-49


79-88: LGTM!

The day extraction logic with filtering for non-empty text is clean and defensive.

Comment on lines +58 to 69
wasteCalendarContainer = soup.find("div", class_="contentContainer")
if not wasteCalendarContainer:
return data
year_txt = wasteCalendarContainer.find("h1").get_text(strip=True)
year = datetime.strptime(year_txt, "About my Street - %B %Y").strftime("%Y")

calendarContainer = soup.find("div", class_="calendarContainer")
if not calendarContainer:
return data
calendarContainer2 = calendarContainer.find_all(
"div", class_="calendarContainer"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Potential AttributeError if <h1> element is missing.

Line 61 assumes find("h1") succeeds, but if the page structure changes or the element is missing, this will raise an AttributeError. Add a defensive check similar to the container checks.

Additionally, the year is extracted once from the header (e.g., "January 2026"), but if the calendar spans across a year boundary (e.g., December 2025 to January 2026), all dates will incorrectly use the same year.

Proposed fix for null check and year rollover
 wasteCalendarContainer = soup.find("div", class_="contentContainer")
 if not wasteCalendarContainer:
     return data
-year_txt = wasteCalendarContainer.find("h1").get_text(strip=True)
-year = datetime.strptime(year_txt, "About my Street - %B %Y").strftime("%Y")
+h1_element = wasteCalendarContainer.find("h1")
+if not h1_element:
+    return data
+year_txt = h1_element.get_text(strip=True)
+base_date = datetime.strptime(year_txt, "About my Street - %B %Y")
+base_month = base_date.month
+base_year = base_date.year

Then in the loop, handle year rollover:

 month = datetime.strptime(month_txt, "%B").strftime("%m")
-print(month_txt)
+# Handle year rollover: if month < base_month, we've crossed into next year
+current_year = base_year if int(month) >= base_month else base_year + 1

And use current_year instead of year when constructing dates.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
wasteCalendarContainer = soup.find("div", class_="contentContainer")
if not wasteCalendarContainer:
return data
year_txt = wasteCalendarContainer.find("h1").get_text(strip=True)
year = datetime.strptime(year_txt, "About my Street - %B %Y").strftime("%Y")
calendarContainer = soup.find("div", class_="calendarContainer")
if not calendarContainer:
return data
calendarContainer2 = calendarContainer.find_all(
"div", class_="calendarContainer"
)
wasteCalendarContainer = soup.find("div", class_="contentContainer")
if not wasteCalendarContainer:
return data
h1_element = wasteCalendarContainer.find("h1")
if not h1_element:
return data
year_txt = h1_element.get_text(strip=True)
base_date = datetime.strptime(year_txt, "About my Street - %B %Y")
base_month = base_date.month
base_year = base_date.year
calendarContainer = soup.find("div", class_="calendarContainer")
if not calendarContainer:
return data
calendarContainer2 = calendarContainer.find_all(
"div", class_="calendarContainer"
)
🧰 Tools
🪛 Ruff (0.14.10)

62-62: datetime may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
In @uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
around lines 58 - 69, The code assumes wasteCalendarContainer.find("h1") always
exists and parses a single `year` for all months, which can raise AttributeError
or mis-tag dates across year boundaries; first add a defensive check that
`wasteCalendarContainer.find("h1")` is not None before using `get_text`
(returning early or logging as other checks do) and only parse a starting
month/year into `current_year` and `current_month`; then in the loop that
iterates `calendarContainer2` (and where dates are constructed) recompute each
month from the month header and detect year rollover by comparing month numbers
(if next month < previous month increment `current_year`) and use `current_year`
(not the original `year` variable) when building date strings.

Comment on lines +75 to +77
month_txt = container.find("tr", class_="calendar").get_text(strip=True)
month = datetime.strptime(month_txt, "%B").strftime("%m")
print(month_txt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove debug print statement and add null check.

Line 77 contains a debug print statement that should be removed before merging. Additionally, line 75 may raise AttributeError if the tr.calendar element is not found.

Proposed fix
-month_txt = container.find("tr", class_="calendar").get_text(strip=True)
-month = datetime.strptime(month_txt, "%B").strftime("%m")
-print(month_txt)
+month_row = container.find("tr", class_="calendar")
+if not month_row:
+    continue
+month_txt = month_row.get_text(strip=True)
+month = datetime.strptime(month_txt, "%B").strftime("%m")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
month_txt = container.find("tr", class_="calendar").get_text(strip=True)
month = datetime.strptime(month_txt, "%B").strftime("%m")
print(month_txt)
month_row = container.find("tr", class_="calendar")
if not month_row:
continue
month_txt = month_row.get_text(strip=True)
month = datetime.strptime(month_txt, "%B").strftime("%m")
🧰 Tools
🪛 Ruff (0.14.10)

76-76: datetime may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
In @uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
around lines 75 - 77, Remove the debug print and guard against a missing
tr.calendar element: check the result of container.find("tr", class_="calendar")
before calling get_text/strptime (e.g., if result is None, skip this container
or log and continue), then compute month_txt and month using datetime.strptime
only when the element exists; update the code around the month_txt/ month
assignment in the CastlepointDistrictCouncil parsing routine to remove the
print(month_txt) and add the null check for the find call.

Comment on lines +107 to +109
data["bins"].sort(
key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y")
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Hardcoded date format in sort may cause inconsistency.

The sort key uses a hardcoded "%d/%m/%Y" format, but the collectionDate is formatted using the date_format variable from the common module. If date_format ever differs, the sort will fail with a ValueError.

Proposed fix
 data["bins"].sort(
-    key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y")
+    key=lambda x: datetime.strptime(x.get("collectionDate"), date_format)
 )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
data["bins"].sort(
key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y")
)
data["bins"].sort(
key=lambda x: datetime.strptime(x.get("collectionDate"), date_format)
)
🧰 Tools
🪛 Ruff (0.14.10)

108-108: datetime may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
In @uk_bin_collection/uk_bin_collection/councils/CastlepointDistrictCouncil.py
around lines 107 - 109, Replace the hardcoded "%d/%m/%Y" in the sort key with
the shared date_format used elsewhere so parsing stays consistent; update the
lambda to use the common date_format (e.g.,
datetime.strptime(x.get("collectionDate"), date_format)) or import date_format
from the common module and use it, and optionally guard against missing/invalid
dates by providing a fallback or wrapping the parse in a try/except to avoid
ValueError during sorting.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In
@uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py:
- Around line 18-19: The call to check_uprn(user_uprn) in
FolkestoneandHytheDistrictCouncil.py ignores its boolean result so invalid UPRNs
continue processing; capture the return (e.g. valid = check_uprn(user_uprn)) and
if it is False/None abort early (raise ValueError, return an error response, or
log and return) before making API calls or further processing to avoid
downstream failures; ensure you reference user_uprn and check_uprn in your
changes and propagate a clear error path when validation fails.
- Around line 26-32: The two HTTP calls using session.get(URI1) and
session.get(URI2) in FolkestoneandHytheDistrictCouncil.py lack timeouts; update
both calls to include a sensible timeout (e.g. timeout=10 or a module-level
TIMEOUT constant) when calling session.get(...) so the scraper won’t hang,
keeping the subsequent response.raise_for_status() checks unchanged and ensuring
the session variable and URI1/URI2 usage remain the same.
🧹 Nitpick comments (2)
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py (2)

4-4: Consider explicit imports instead of wildcard import.

The wildcard import makes it unclear which symbols (datetime, date_format, check_uprn) are being used. Explicit imports improve readability and help static analysis tools verify correctness.

Suggested fix
-from uk_bin_collection.uk_bin_collection.common import *
+from uk_bin_collection.uk_bin_collection.common import check_uprn, date_format
+from datetime import datetime

58-60: Use date_format constant in sorting to match date formatting on line 35.

Line 35 formats dates using the date_format constant from common.py, but the sorting at lines 58-60 hardcodes the format string. If date_format ever changes, the sorting will break with a ValueError.

Suggested fix
         bindata["bins"].sort(
-            key=lambda x: datetime.strptime(x.get("collectionDate"), "%d/%m/%Y")
+            key=lambda x: datetime.strptime(x.get("collectionDate"), date_format)
         )
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 141c6ff and e56d0ab.

📒 Files selected for processing (1)
  • uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py
🧰 Additional context used
🪛 Ruff (0.14.10)
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py

4-4: from uk_bin_collection.uk_bin_collection.common import * used; unable to detect undefined names

(F403)


16-16: Unused method argument: page

(ARG002)


19-19: check_uprn may be undefined, or defined from star imports

(F405)


48-48: datetime may be undefined, or defined from star imports

(F405)


54-54: date_format may be undefined, or defined from star imports

(F405)


59-59: datetime may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run Integration Tests (3.12, 1.8.4)
🔇 Additional comments (2)
uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py (2)

8-14: LGTM!

Class structure correctly inherits from AbstractGetBinDataClass and follows the established pattern.


34-56: LGTM!

The HTML parsing logic is well-structured with appropriate null checks and defensive error handling. The date parsing is correctly wrapped in a try/except block to gracefully handle malformed data.

Comment on lines +18 to +19
user_uprn = kwargs.get("uprn")
check_uprn(user_uprn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

check_uprn return value is not used.

Looking at the check_uprn implementation in common.py, it returns True on success but implicitly returns None on failure (after printing an error). The code proceeds regardless of whether the UPRN is valid, which could lead to confusing downstream errors when the API request fails or returns unexpected data.

Suggested fix
         user_uprn = kwargs.get("uprn")
-        check_uprn(user_uprn)
+        if not check_uprn(user_uprn):
+            raise ValueError("Invalid UPRN provided")
         bindata = {"bins": []}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
user_uprn = kwargs.get("uprn")
check_uprn(user_uprn)
user_uprn = kwargs.get("uprn")
if not check_uprn(user_uprn):
raise ValueError("Invalid UPRN provided")
bindata = {"bins": []}
🧰 Tools
🪛 Ruff (0.14.10)

19-19: check_uprn may be undefined, or defined from star imports

(F405)

🤖 Prompt for AI Agents
In
@uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py
around lines 18 - 19, The call to check_uprn(user_uprn) in
FolkestoneandHytheDistrictCouncil.py ignores its boolean result so invalid UPRNs
continue processing; capture the return (e.g. valid = check_uprn(user_uprn)) and
if it is False/None abort early (raise ValueError, return an error response, or
log and return) before making API calls or further processing to avoid
downstream failures; ensure you reference user_uprn and check_uprn in your
changes and propagate a clear error path when validation fails.

Comment on lines +26 to +32
session = requests.session()
response = session.get(
URI1
) # Initialize session state (cookies) required by URI2
response.raise_for_status() # Validate session initialization
response = session.get(URI2)
response.raise_for_status() # Raise HTTPError for bad status codes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add timeout to HTTP requests.

Both session.get() calls lack a timeout parameter. Without a timeout, requests can hang indefinitely if the server is unresponsive, causing the scraper to stall. This is particularly important for automated collection systems.

Suggested fix
+        REQUEST_TIMEOUT = 30  # seconds
+
         # Make the GET request
         session = requests.session()
         response = session.get(
-            URI1
+            URI1,
+            timeout=REQUEST_TIMEOUT
         )  # Initialize session state (cookies) required by URI2
         response.raise_for_status()  # Validate session initialization
-        response = session.get(URI2)
+        response = session.get(URI2, timeout=REQUEST_TIMEOUT)
         response.raise_for_status()  # Raise HTTPError for bad status codes

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In
@uk_bin_collection/uk_bin_collection/councils/FolkestoneandHytheDistrictCouncil.py
around lines 26 - 32, The two HTTP calls using session.get(URI1) and
session.get(URI2) in FolkestoneandHytheDistrictCouncil.py lack timeouts; update
both calls to include a sensible timeout (e.g. timeout=10 or a module-level
TIMEOUT constant) when calling session.get(...) so the scraper won’t hang,
keeping the subsequent response.raise_for_status() checks unchanged and ensuring
the session variable and URI1/URI2 usage remain the same.

robbrad added a commit that referenced this pull request Jan 14, 2026
@robbrad robbrad closed this Jan 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants