A scalable, backend-first system that scrapes real job listings, analyzes hiring trends, and serves insights through a fast API and a clean frontend. Built in Go, without unnecessary dependencies.
- Scrapes job data from We Work Remotely
- Analyzes trends across:
  - Most in-demand skills
  - Frequently mentioned locations
  - Hiring companies
  - Experience level distribution
- Deduplicates and stores listings in MongoDB
- Cleans up stale listings automatically (TTL)
- Offers a minimal frontend to trigger crawls and view trends
- Background crawling with async updates — no page reloads
I wanted to understand how hiring patterns shift across roles — and use that as an excuse to build something real. This wasn’t about making another CRUD app. I wanted concurrency, background processing, crawling architecture, and live data flowing through a pipeline I built end-to-end.
| Component | Tool |
|---|---|
| Language | Go (1.22) |
| Scraper | Colly |
| Database | MongoDB (Atlas) |
| Trend Logic | Mongo Aggregation Pipelines |
| API Server | net/http |
| Frontend | Vanilla JS + HTML/CSS |
| Infra | Go Modules, .env config |
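
Colly does the scraping. Below is a minimal sketch of a role-specific We Work Remotely crawler; the category URL, CSS selectors, and `Job` fields are illustrative and may not match the site's current markup or the repository's actual code.

```go
// Minimal Colly scraper sketch; selectors, URL, and Job fields are placeholders.
package main

import (
	"fmt"
	"log"

	"github.com/gocolly/colly/v2"
)

type Job struct {
	Title   string
	Company string
	URL     string
}

func main() {
	c := colly.NewCollector(
		colly.AllowedDomains("weworkremotely.com"),
	)

	var jobs []Job

	// Each listing row is parsed into a Job; the selectors are assumptions.
	c.OnHTML("section.jobs li", func(e *colly.HTMLElement) {
		jobs = append(jobs, Job{
			Title:   e.ChildText("span.title"),
			Company: e.ChildText("span.company"),
			URL:     e.Request.AbsoluteURL(e.ChildAttr("a", "href")),
		})
	})

	c.OnError(func(r *colly.Response, err error) {
		log.Printf("request to %s failed: %v", r.Request.URL, err)
	})

	// Hypothetical category page for backend roles.
	if err := c.Visit("https://weworkremotely.com/categories/remote-back-end-programming-jobs"); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("scraped %d listings\n", len(jobs))
}
```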
- `/api/crawl?role=X` triggers `StartCrawling` in the background
- `StartCrawling` spawns async workers for downloading jobs
- Deduplicated results are written to MongoDB
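
A minimal sketch of that trigger-and-detach flow with `net/http`; the signature and body of `StartCrawling` here are placeholders, not the repository's actual implementation.

```go
// Sketch of a crawl trigger endpoint: validate the role, detach the crawl
// into a goroutine with its own timeout, and return immediately.
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"time"
)

// StartCrawling stands in for the real crawler entry point; this signature is an assumption.
func StartCrawling(ctx context.Context, role string) {
	// Spawn per-site workers, deduplicate results, write to MongoDB, etc.
	log.Printf("crawling jobs for role %q", role)
}

func crawlHandler(w http.ResponseWriter, r *http.Request) {
	role := r.URL.Query().Get("role")
	if role == "" {
		http.Error(w, "missing role", http.StatusBadRequest)
		return
	}

	// Detach the crawl from the request lifecycle so the handler returns immediately.
	go func() {
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
		defer cancel()
		StartCrawling(ctx, role)
	}()

	w.WriteHeader(http.StatusAccepted)
	fmt.Fprintf(w, "crawl started for %s\n", role)
}

func main() {
	http.HandleFunc("/api/crawl", crawlHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```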
A high-level view of how the components are organized, logically and physically:
- Frontend: Basic HTML/CSS/JS served via Go’s static file server
- API Server: Uses `net/http` with role validation and background job handling
- Job Crawler: Modular, Colly-based crawlers per role
- Trend Worker: Aggregates skill/location trends using Mongo pipelines
- MongoDB Atlas: Stores job documents with TTL index
Each part is modular, extensible, and communicates clearly with its neighbors.
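
As a sketch of the trend worker's approach, here is a skills-by-frequency aggregation using the official Mongo Go driver; the database, collection, and field names are assumptions, not the repository's actual schema.

```go
// Sketch of a "top skills" aggregation: unwind the skills array, count each
// skill, keep the ten most frequent. Schema details are assumptions.
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	client, err := mongo.Connect(ctx, options.Client().ApplyURI(os.Getenv("DATABASE_URL")))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	jobs := client.Database("jobcrawler").Collection("jobs")

	pipeline := mongo.Pipeline{
		{{Key: "$match", Value: bson.D{{Key: "role", Value: "backend"}}}},
		{{Key: "$unwind", Value: "$skills"}},
		{{Key: "$group", Value: bson.D{
			{Key: "_id", Value: "$skills"},
			{Key: "count", Value: bson.D{{Key: "$sum", Value: 1}}},
		}}},
		{{Key: "$sort", Value: bson.D{{Key: "count", Value: -1}}}},
		{{Key: "$limit", Value: 10}},
	}

	cur, err := jobs.Aggregate(ctx, pipeline)
	if err != nil {
		log.Fatal(err)
	}
	var results []bson.M
	if err := cur.All(ctx, &results); err != nil {
		log.Fatal(err)
	}
	fmt.Println(results)
}
```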
```
Job-Crawler
├── api_server        # API server and static frontend
│   ├── handlers      # API endpoint logic
│   ├── routes        # HTTP route mapping
│   ├── static        # Frontend JS + CSS
│   └── templates     # HTML views
├── internal          # Core crawling logic
│   ├── crawler       # Job scrapers and role logic
│   │   └── sites     # Site-specific parsers
│   ├── downloader    # HTTP client with timeout/cancel
│   └── urlfrontier   # Deduplicated job queue
├── pkg               # MongoDB, models, shared utils
├── trend_worker      # Aggregation logic for trends
├── images            # Diagrams and screenshots
```
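
The `urlfrontier` package acts as a deduplicated job queue. A minimal sketch of what such a structure can look like, assuming a mutex-guarded seen-set in front of a buffered channel (not the repository's actual type):

```go
// Sketch of a deduplicating URL frontier: each URL is handed out at most once,
// even with concurrent producers. Not the repo's actual implementation.
package main

import (
	"fmt"
	"sync"
)

type Frontier struct {
	mu    sync.Mutex
	seen  map[string]bool
	queue chan string
}

func NewFrontier(size int) *Frontier {
	return &Frontier{
		seen:  make(map[string]bool),
		queue: make(chan string, size),
	}
}

// Push enqueues a URL unless it has already been seen.
func (f *Frontier) Push(url string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.seen[url] {
		return
	}
	f.seen[url] = true
	f.queue <- url
}

// Jobs exposes the queue for worker goroutines to range over.
func (f *Frontier) Jobs() <-chan string { return f.queue }

func main() {
	f := NewFrontier(16)
	f.Push("https://weworkremotely.com/remote-jobs/1")
	f.Push("https://weworkremotely.com/remote-jobs/1") // duplicate, ignored
	f.Push("https://weworkremotely.com/remote-jobs/2")
	close(f.queue) // demo only: no more producers

	for url := range f.Jobs() {
		fmt.Println(url)
	}
}
```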
```
GET /api/crawl?role=backend&role=ml
→ Starts background crawl for given roles

GET /api/trends?role=frontend
→ Returns trend report for a role
```

Crawls run in the background. The UI automatically refreshes results once done.
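
Since `/api/crawl` accepts repeated `role` parameters and the server validates them, here is a sketch of multi-value parsing against a whitelist; the set of valid roles below is an assumption, not the repository's actual list.

```go
// Sketch of multi-role query parsing with a whitelist, so junk values never
// reach the crawler. The validRoles set is illustrative.
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

var validRoles = map[string]bool{
	"backend":  true,
	"frontend": true,
	"ml":       true,
	"devops":   true,
}

// parseRoles reads every ?role=... value and rejects anything not whitelisted.
func parseRoles(r *http.Request) ([]string, bool) {
	var roles []string
	for _, role := range r.URL.Query()["role"] {
		if !validRoles[role] {
			return nil, false
		}
		roles = append(roles, role)
	}
	return roles, len(roles) > 0
}

func main() {
	req := httptest.NewRequest(http.MethodGet, "/api/crawl?role=backend&role=ml", nil)
	if roles, ok := parseRoles(req); ok {
		fmt.Println("crawling roles:", roles) // crawling roles: [backend ml]
	}

	bad := httptest.NewRequest(http.MethodGet, "/api/crawl?role=pirate", nil)
	if _, ok := parseRoles(bad); !ok {
		fmt.Println("rejected unknown role")
	}
}
```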
- Go 1.22+
- MongoDB Atlas URI
Create a `.env` file:

```
DATABASE_URL=mongodb+srv://<your-connection-string>
```
Run the API server:

```bash
go run api_server/main.go
```

Then open http://localhost:8080 in your browser.
- Select a role (e.g., “Backend Developer”)
- Click “Analyze Now”
- A countdown shows scraping progress
- Trends appear when ready — no reloads
- Clean separation between crawling, storage, and analytics
- MongoDB TTL cleans up stale job listings after 30 days (see the index sketch after this list)
- Background workers use context for timeout + cancellation
- Role validation to prevent junk API calls
- Trend analytics are fast, computed via aggregation pipelines
- No frontend framework — just async JS and minimal UI
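
For the TTL cleanup mentioned above, here is a sketch of ensuring a 30-day TTL index with the Mongo Go driver; it assumes each job document stores a `created_at` timestamp, and the database, collection, and field names are illustrative.

```go
// Sketch of a 30-day TTL index: MongoDB deletes a document once its
// created_at value is older than ExpireAfterSeconds. Names are assumptions.
package main

import (
	"context"
	"log"
	"os"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	client, err := mongo.Connect(ctx, options.Client().ApplyURI(os.Getenv("DATABASE_URL")))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	jobs := client.Database("jobcrawler").Collection("jobs")

	_, err = jobs.Indexes().CreateOne(ctx, mongo.IndexModel{
		Keys:    bson.D{{Key: "created_at", Value: 1}},
		Options: options.Index().SetExpireAfterSeconds(30 * 24 * 60 * 60),
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("30-day TTL index ensured on created_at")
}
```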
- Designing resilient crawlers with proper error handling
- Handling concurrency and graceful cancellation in Go
- Coordinating async jobs with the frontend
- Working with MongoDB's aggregation framework and TTL indexes
- Building something that actually solves a problem
Achal — backend-leaning engineer with an interest in systems that scale and data that tells a story.



