refactor(scraper): replace Crawlee with hand-rolled fetch+cheerio #67
Merged
ThatXliner merged 14 commits into main on Apr 3, 2026
Commits (14; titles truncated in this capture)
- …upsertContent
- …drop Crawlee
- …t + log
- …+ log
- …, yargs, TS errors

All but one of the listed commits carry the trailer `Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>`; the remaining one carries `Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>`.
lcai000 (Collaborator) reviewed on Apr 1, 2026 and left a comment:
Removing the Crawlee and Playwright dependencies is good; it's simpler.
The reduced LOC is good.
TODO: make the logging more descriptive. The current logging prints a lot of noise, much of it unnecessary; clean it up.
Summary
- Crawlee replaced with `fetch` + `cheerio` directly, removing ~2100 lines of lockfile and heavy dependencies
- `upsertBill`, `upsertGovernmentContent` and `upsertCourtCase` merged into a single `upsertContent(type, data)` using a discriminated union
- Shared helpers: `fetchWithRetry()` (retry + backoff + Retry-After + timeout) and `log()` (timestamped, scraper-prefixed logging)
- `main.ts` is now a loop over `Scraper[]` objects, with yargs for CLI parsing
- … `google-images.ts`

Net result: -2100 lines, fewer dependencies, unified patterns, same behavior.
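The summary does not show how `upsertContent(type, data)` uses its discriminated union. One common way to type a two-argument function that way is a union of `[type, data]` tuples used as a rest parameter, which lets TypeScript 4.6+ narrow `data` once `type` has been checked. This is a minimal sketch under that assumption; the record shapes, the key format and the in-memory `store` standing in for `@acme/db` are all hypothetical.

```typescript
// Hypothetical record shapes: the real schemas live in @acme/db and are not shown in the PR.
interface Bill { billNumber: string; title: string }
interface GovernmentContent { url: string; title: string }
interface CourtCase { docketNumber: string; caseName: string }

// One tuple per content type; the first element is the discriminant.
type UpsertArgs =
  | ["bill", Bill]
  | ["government", GovernmentContent]
  | ["courtCase", CourtCase];

type AnyContent = Bill | GovernmentContent | CourtCase;

// In-memory stand-in for the database: natural key -> latest record.
const store = new Map<string, AnyContent>();

// Insert or overwrite by key, which is what makes it an "upsert".
const save = (key: string, record: AnyContent): string => {
  store.set(key, record);
  return key;
};

// Typing the function through a rest-tuple union lets TypeScript (4.6+)
// narrow `data` inside each case of the switch on `type`.
type UpsertContent = (...args: UpsertArgs) => string;

const upsertContent: UpsertContent = (type, data) => {
  switch (type) {
    case "bill":
      return save(`bill:${data.billNumber}`, data); // data: Bill
    case "government":
      return save(`gov:${data.url}`, data); // data: GovernmentContent
    case "courtCase":
      return save(`case:${data.docketNumber}`, data); // data: CourtCase
  }
};
```

Passing a mismatched pair such as `upsertContent("bill", { url: "…", title: "…" })` is rejected at compile time, which is the main benefit over three separate functions sharing one loosely typed signature.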
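The `fetchWithRetry()` bullet combines four behaviours in one helper. A minimal sketch of how they can fit together, assuming Node 18+ (global `fetch`, `Response` and `AbortSignal.timeout`): exponential backoff of `baseDelayMs * 2 ** attempt`, a server-supplied `Retry-After` overriding that delay, and a fresh timeout on every attempt. The option names and defaults, the retry-on-429/5xx rule and the injectable `fetchImpl` are hypothetical, not taken from the PR.

```typescript
// Hypothetical options; names and defaults are illustrative, not the PR's.
interface RetryOptions {
  retries?: number; // extra attempts after the first one
  baseDelayMs?: number; // backoff delay = baseDelayMs * 2 ** attempt
  timeoutMs?: number; // timeout applied to each attempt separately
  fetchImpl?: typeof fetch; // injectable so the helper can be tested offline
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry-After is either delta-seconds ("120") or an HTTP date; returns ms or null.
function parseRetryAfter(header: string | null, now = Date.now()): number | null {
  if (header === null || header.trim() === "") return null;
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(header);
  return Number.isNaN(date) ? null : Math.max(0, date - now);
}

// Assumption: rate limiting (429) and server errors (5xx) are worth retrying.
const isRetryable = (status: number) => status === 429 || status >= 500;

async function fetchWithRetry(
  url: string,
  init: RequestInit = {},
  opts: RetryOptions = {},
): Promise<Response> {
  const { retries = 3, baseDelayMs = 500, timeoutMs = 15_000, fetchImpl = fetch } = opts;
  for (let attempt = 0; ; attempt++) {
    let delay = baseDelayMs * 2 ** attempt;
    try {
      // Note: this replaces any caller-supplied init.signal with the per-attempt timeout.
      const res = await fetchImpl(url, { ...init, signal: AbortSignal.timeout(timeoutMs) });
      if (!isRetryable(res.status) || attempt >= retries) return res;
      delay = parseRetryAfter(res.headers.get("retry-after")) ?? delay;
      await res.body?.cancel(); // discard the failed response body before retrying
    } catch (err) {
      // Network failure or timeout (AbortSignal.timeout aborts with a TimeoutError).
      if (attempt >= retries) throw err;
    }
    await sleep(delay);
  }
}
```

On the final attempt the last response is returned as-is rather than thrown, so callers still check `res.ok`; a production version might also cap very large `Retry-After` values.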
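The test plan expects log lines prefixed with `[HH:MM:SS] [GovTrack]`. A minimal sketch of a `log()` producing that format, assuming local time and a scraper name passed on each call; the split into a separate `logPrefix()` helper is a hypothetical choice made so the format can be checked without capturing console output.

```typescript
// Hypothetical sketch: the PR's actual log() signature is not shown.
const pad2 = (n: number) => String(n).padStart(2, "0");

// Builds the "[HH:MM:SS] [Scraper]" prefix from local time.
function logPrefix(scraper: string, now: Date = new Date()): string {
  const time = `${pad2(now.getHours())}:${pad2(now.getMinutes())}:${pad2(now.getSeconds())}`;
  return `[${time}] [${scraper}]`;
}

// Passing args through lets console.log format objects and errors itself.
function log(scraper: string, ...args: unknown[]): void {
  console.log(logPrefix(scraper), ...args);
}

log("GovTrack", "fetched", 20, "bills");
```

Keeping the prefix in one place also gives a single spot to address the review's TODO about noisy output, for example by adding a level argument and filtering on it.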
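The `main.ts` bullet describes a loop over `Scraper[]` objects. A dependency-free sketch of that shape, assuming each scraper exposes a `name` (the CLI argument, as in `pnpm run start:dev govtrack`) and an async `run()`. In the PR, yargs turns the command line into the selected names; here they are passed as a plain array, and the "empty selection runs everything" rule and the error collection are hypothetical.

```typescript
// Hypothetical Scraper shape; the PR's actual interface is not shown.
interface Scraper {
  name: string; // CLI name, e.g. "govtrack"
  run: () => Promise<void>; // fetch, parse and upsert for one source
}

// Runs the selected scrapers one after another and returns the names that failed.
// Assumption: an empty selection means "run everything".
async function runScrapers(all: Scraper[], selected: string[]): Promise<string[]> {
  const unknown = selected.filter((name) => !all.some((s) => s.name === name));
  if (unknown.length > 0) throw new Error(`Unknown scraper(s): ${unknown.join(", ")}`);

  const toRun = selected.length === 0 ? all : all.filter((s) => selected.includes(s.name));
  const failed: string[] = [];
  for (const scraper of toRun) {
    try {
      await scraper.run();
    } catch (err) {
      // One failing source should not stop the others.
      console.error(`[${scraper.name}] failed:`, err);
      failed.push(scraper.name);
    }
  }
  return failed;
}
```

Running sequentially keeps log lines from different sources from interleaving and avoids hitting several upstream APIs at once; `Promise.allSettled` would be the concurrent alternative. With yargs, the unknown-name check could instead be delegated to its `choices` option.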
Test plan
- `cd apps/scraper && npx tsc --noEmit`: compiles with only pre-existing `@acme/db` errors
- `pnpm run start:dev govtrack`: fetches bills, logs with `[HH:MM:SS] [GovTrack]` prefix
- `pnpm run start:dev whitehouse`: fetches articles with pagination
- `pnpm run start:dev congress`: fetches from Congress.gov API
- `pnpm run start:dev scotus`: fetches from CourtListener API
- `pnpm run start:dev --help`: shows yargs help output
- `grep -r "crawlee" apps/scraper/src/`: no matches

🤖 Generated with Claude Code