A full-stack web application for scraping, storing, and managing job listings from LinkedIn
Features • Installation • Usage • API Docs • Contributing
- Project Overview
- Features
- Tech Stack
- Architecture
- Installation
- Configuration
- Usage
- API Documentation
- Frontend Guide
- Database Schema
- Scraping Details
- Troubleshooting
- Development
- Contributing
- License
- Assumptions & Limitations
- Future Improvements
Job Scraper Dashboard is a comprehensive full-stack application designed to automate job hunting by:
- Scraping job listings from LinkedIn in real-time
- Storing data in PostgreSQL
- Searching through collected jobs with advanced filters
- Managing job listings through a modern React dashboard
- Monitoring scraping progress with live updates
- Time-Saving: Automate job search across multiple parameters
- Centralized Storage: All jobs in one place, searchable and filterable
- Real-time Updates: Live scraping progress monitoring
- User-Friendly: Intuitive dashboard with responsive design
- Scalable: Built with production-ready technologies
- ✅ Real-time LinkedIn Scraping - Live job extraction with progress tracking
- ✅ Advanced Search - Filter jobs by title, company, location
- ✅ Job Details View - Complete job descriptions with modal display
- ✅ Direct Apply Links - One-click access to LinkedIn job postings
- ✅ Admin Dashboard - Database management and bulk operations
- ✅ Responsive Design - Works on mobile and desktop
- ✅ Background Processing - Non-blocking scraping operations
- Live Scraping Status - Real-time progress monitoring
- Job Statistics - Total counts and insights
- Search & Filter - Quick find functionality
- Delete Operations - Individual and bulk job removal
- Admin Controls - Database management interface
Backend:

| Technology | Version | Purpose |
|---|---|---|
| FastAPI | 0.104.1 | Modern web framework for APIs |
| SQLAlchemy | 2.0.23 | ORM for database interactions |
| Pydantic | 2.5.0 | Data validation and settings |
| Selenium | 4.15.2 | Browser automation for scraping |
| BeautifulSoup4 | 4.12.2 | HTML parsing for job data |
| Uvicorn | 0.24.0 | ASGI server for FastAPI |
| Psycopg2 | 2.9.9 | PostgreSQL adapter |
Frontend:

| Technology | Purpose |
|---|---|
| React 18 | Frontend UI library |
| React Router | Client-side routing |
| Tailwind CSS | Utility-first CSS framework |
| Lucide React | Icon library |
| Vite | Build tool and dev server |
Database:

| Technology | Purpose |
|---|---|
| PostgreSQL 14+ | Primary relational database |
| pgAdmin (optional) | Database management GUI |
Tools:

| Technology | Purpose |
|---|---|
| dotenv | Environment variable management |
| WebDriver Manager | Auto ChromeDriver management |
| Chrome Headless | Headless browser for scraping |
| Git | Version control |
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│ React Frontend  │─────▶│ FastAPI Backend │─────▶│  PostgreSQL DB  │
│   (Dashboard)   │      │   (REST API)    │      │  (Job Storage)  │
└─────────────────┘      └─────────────────┘      └─────────────────┘
         │                        │                        │
         ▼                        ▼                        ▼
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│  User Browser   │      │    Selenium     │      │   Data Models   │
│      (UI)       │      │    (Scraper)    │      │      (ORM)      │
└─────────────────┘      └─────────────────┘      └─────────────────┘
job-scraper-dashboard/
├── backend/                  # FastAPI Backend
│   ├── main.py               # FastAPI app & endpoints
│   ├── database.py           # DB connection setup
│   ├── models.py             # SQLAlchemy models
│   ├── scraper.py            # LinkedIn scraper class
│   ├── requirements.txt      # Python dependencies
│   └── .env                  # Environment variables
│
├── frontend/                 # React Frontend
│   ├── src/
│   │   ├── components/       # React components
│   │   │   ├── Header.jsx
│   │   │   ├── ScrapeForm.jsx
│   │   │   ├── SearchBar.jsx
│   │   │   ├── JobCard.jsx
│   │   │   ├── JobModal.jsx
│   │   │   └── Footer.jsx
│   │   ├── services/
│   │   │   └── api.js        # API service layer
│   │   ├── pages/
│   │   │   ├── App.jsx       # Main dashboard
│   │   │   └── AdminPanel.jsx
│   │   ├── index.css         # Tailwind styles
│   │   └── main.jsx          # React entry point
│   ├── package.json          # Frontend dependencies
│   └── index.html            # HTML template
│
├── README.md                 # This documentation
└── .gitignore
backend/main.py
# Core application setup with routes for:
# - Job management (CRUD operations)
# - Scraping control (start/stop/status)
# - Statistics and admin endpoints
# - CORS middleware configuration
# - Background task processing

backend/database.py
# SQLAlchemy engine and session factory setup
# PostgreSQL connection configuration
# Database session dependency injection

backend/models.py
# SQLAlchemy ORM models for the Job entity
# Includes fields: title, company, location, description, url, etc.
# Unique constraints to prevent duplicate jobs
# Timestamp fields for tracking

backend/scraper.py
# Selenium-based web scraper for LinkedIn jobs
# Methods for: page navigation, job extraction, description parsing
# Anti-detection measures and error handling
# Duplicate job checking logic
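To make the backend layout concrete, the sketch below shows how the /scrape endpoint, CORS middleware, and background processing described above could fit together in main.py. It is a hedged illustration, not the project's exact code; names such as run_scrape and the in-memory scraping_status dict are assumptions.

# Illustrative sketch of backend/main.py (assumed, not verbatim)
from fastapi import BackgroundTasks, FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

app = FastAPI(title="Job Scraper API")

# Allow the React dev server to call the API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173", "http://localhost:3000"],
    allow_methods=["*"],
    allow_headers=["*"],
)

class ScrapeRequest(BaseModel):
    keyword: str
    location: str
    max_pages: int = 2

# Simple in-memory status read by GET /scraping-status (assumed structure)
scraping_status = {"is_scraping": False, "jobs_found": 0}

def run_scrape(req: ScrapeRequest) -> None:
    """Placeholder for the Selenium scraper; updates the shared status."""
    scraping_status["is_scraping"] = True
    # ... run the LinkedIn scraper and save jobs to PostgreSQL here ...
    scraping_status["is_scraping"] = False

@app.post("/scrape")
def start_scrape(req: ScrapeRequest, background_tasks: BackgroundTasks):
    # BackgroundTasks keeps the request non-blocking
    background_tasks.add_task(run_scrape, req)
    return {"message": "Scraping started", "keyword": req.keyword}

@app.get("/scraping-status")
def get_status():
    return scraping_status

A file along these lines, served with uvicorn, exposes the same endpoint shapes documented in the API section below.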
frontend/src/pages/App.jsx
// Main application component
// State management for jobs, search, scraping status
// Component composition and layout
// API integration and data fetching

frontend/src/components/
- Header.jsx - Navigation and branding
- ScrapeForm.jsx - Scraping controls and form
- SearchBar.jsx - Search functionality
- JobCard.jsx - Individual job display
- JobModal.jsx - Detailed job view
- Footer.jsx - Footer information

frontend/src/services/api.js
// Centralized API client
// Methods for all backend interactions
// Error handling and response parsing

Prerequisites:
- Python 3.9+ (with pip)
- Node.js 16+ (with npm)
- PostgreSQL 14+
- Chrome/Chromium browser
- Git (for version control)
git clone https://github.com/uzair-javed-1/LinkedinJobScrapper
cd LinkedinJobScrapper

# Navigate to backend directory
cd backend
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On Mac/Linux:
source venv/bin/activate
# Install Python dependencies
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env
# Edit .env with your database credentials

-- Connect to PostgreSQL
psql -U postgres
-- Create database
CREATE DATABASE job_scraper;
-- Create user (optional)
CREATE USER scraper_user WITH PASSWORD 'your_password';
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE job_scraper TO scraper_user;
-- Exit psql
\q

# Navigate to frontend directory
cd ../frontend
# Install Node.js dependencies
npm install

Edit backend/.env file:

DATABASE_URL=postgresql://scraper_user:your_password@localhost:5432/job_scraper

For reference, the author's local setup uses DATABASE_URL=postgresql://postgres:uzair@localhost:5432/job_scraper, where postgres:uzair are the local PostgreSQL credentials, job_scraper is the database created above, and 5432 is the PostgreSQL port.
Create backend/.env file with:
# Database Configuration
DATABASE_URL=postgresql://username:password@localhost:5432/job_scraper
# Scraping Configuration
SCRAPING_MAX_PAGES=2
SCRAPING_DELAY_SECONDS=2
SCRAPING_HEADLESS=true
# Application Configuration
DEBUG=true
HOST=0.0.0.0
PORT=8000
# CORS Configuration
CORS_ORIGINS=http://localhost:5173,http://localhost:3000

Edit frontend/src/services/api.js:

const API_URL = 'http://localhost:8000'; // Change if backend runs elsewhere

The application automatically creates tables. Manual configuration includes:
1. Start PostgreSQL:

   # Linux
   sudo systemctl start postgresql

   # Windows
   # Start the PostgreSQL service (via Services)

2. Verify the connection:

   psql -U scraper_user -d job_scraper -h localhost
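For reference, backend/database.py typically turns DATABASE_URL into an engine and session factory along these lines (a minimal sketch assuming the standard SQLAlchemy pattern; not necessarily the project's exact code):

# Illustrative sketch of backend/database.py (assumed, not verbatim)
import os

from dotenv import load_dotenv
from sqlalchemy import create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

load_dotenv()  # reads backend/.env

DATABASE_URL = os.getenv(
    "DATABASE_URL",
    "postgresql://username:password@localhost:5432/job_scraper",
)

engine = create_engine(DATABASE_URL, pool_pre_ping=True)
SessionLocal = sessionmaker(bind=engine, autocommit=False, autoflush=False)
Base = declarative_base()

def get_db():
    """FastAPI dependency: yield a session and always close it."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()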
Terminal 1 - Backend Server:
cd backend
source venv/bin/activate # or venv\Scripts\activate on Windows
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Terminal 2 - Frontend Server:
cd frontend
npm run dev

# Backend (production)
cd backend
gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app
# Frontend (build and serve)
cd frontend
npm run build
npm run preview

- Frontend Dashboard: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs (Swagger UI)
- Admin Panel: http://localhost:5173/admin
- Log in to LinkedIn: sign in to your LinkedIn account, and make the browser you are logged in with your default browser.
- Open Dashboard: Navigate to http://localhost:5173
- Enter Parameters:
- Keyword: Job title or skill (e.g., "Software Engineer")
- Location: City, state, or country (e.g., "New York")
- Start Scraping: Click "Start Scraping" button
- Monitor Progress: Watch real-time updates
- Stop Anytime: Click the stop button to cancel. Known issue: the stop button is not yet fully wired up; the backend stop route exists, but the frontend call is not correctly aligned with it and needs a small React fix. For now, calling the stop endpoint directly (or opening the /admin route) will terminate the scraping process.
curl -X POST "http://localhost:8000/scrape" \
-H "Content-Type: application/json" \
-d '{
"keyword": "Data Scientist",
"location": "San Francisco",
"max_pages": 2
}'

import requests
# Start scraping
response = requests.post(
"http://localhost:8000/scrape",
json={
"keyword": "Marketing Manager",
"location": "Chicago",
"max_pages": 3
}
)
print(f"Scraping started: {response.json()}")
# Check status
status = requests.get("http://localhost:8000/scraping-status").json()
print(f"Current status: {status}")-
Initialization:
- Opens Chrome in headless mode
- Navigates to LinkedIn jobs search
- Closes popups and modals
-
Job Collection:
- Extracts job cards from each page
- Gets title, company, location, URL
- Checks for duplicates in database
-
Detail Extraction:
- Visits each job page individually
- Extracts full job description
- Captures posting date if available
-
Data Storage:
- Saves to PostgreSQL with unique constraints
- Updates scraping status
- Commits transaction
-
Cleanup:
- Closes browser
- Updates final status
- Releases database connection
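The stages above map roughly onto a loop like the one below. It is a simplified sketch that assumes LinkedIn's public search pages use the CSS selectors listed in the Scraping Details section; the real scraper.py handles more cases (popups, pagination, descriptions, database writes).

# Simplified scraping loop (illustrative; selectors and names are assumptions)
import random
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def scrape_page(driver, url):
    """Load one search page and return basic info for each job card."""
    driver.get(url)
    time.sleep(random.uniform(1, 3))  # polite, human-like delay
    soup = BeautifulSoup(driver.page_source, "html.parser")
    jobs = []
    for card in soup.select("div.base-card"):  # one of the job card selectors
        title = card.select_one("h3.base-search-card__title")
        company = card.select_one("h4.base-search-card__subtitle")
        link = card.select_one("a.base-card__full-link")
        if title and company and link:
            jobs.append({
                "title": title.get_text(strip=True),
                "company": company.get_text(strip=True),
                "url": link["href"],
            })
    return jobs

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
try:
    results = scrape_page(
        driver,
        "https://www.linkedin.com/jobs/search/?keywords=Data%20Scientist&location=San%20Francisco",
    )
    # Before saving, skip any job whose URL already exists in the database,
    # mirroring the UNIQUE(title, company, url) constraint described below.
finally:
    driver.quit()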
| Parameter | Default | Description | Recommended Range |
|---|---|---|---|
| max_pages | 2 | Number of pages to scrape | 1-5 |
| delay | 1-3s | Delay between requests | 1-5 seconds |
| headless | True | Headless browser mode | True/False |
| timeout | 15s | Page load timeout | 10-30 seconds |
http://localhost:8000
No authentication required for development. For production, implement JWT or API keys.
GET /
curl http://localhost:8000/

Response:
{
"message": "Job Scraper API",
"status": "running"
}

GET /jobs
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| skip | integer | 0 | Number of records to skip |
| limit | integer | 50 | Maximum records to return |
| search | string | "" | Search term for title/company/location |
Example:
curl "http://localhost:8000/jobs?search=engineer&skip=0&limit=20"Response:
[
{
"id": 1,
"title": "Senior Software Engineer",
"company": "Tech Corp Inc.",
"location": "San Francisco, CA",
"description": "We are looking for a Senior Software Engineer...",
"url": "https://linkedin.com/jobs/view/123456",
"source": "LinkedIn",
"posted_date": "2023-12-01T10:00:00Z",
"scraped_at": "2023-12-01T14:30:00Z"
}
]

GET /jobs/{job_id}

curl http://localhost:8000/jobs/1

DELETE /jobs/{job_id}

curl -X DELETE http://localhost:8000/jobs/1

Response:
{
"message": "Job deleted"
}

POST /scrape
Request Body:
{
"keyword": "Software Engineer",
"location": "New York",
"max_pages": 2
}

Response:
{
"message": "Scraping started",
"keyword": "Software Engineer"
}

GET /scraping-status

curl http://localhost:8000/scraping-status

Response:
{
"is_scraping": true,
"current_page": 1,
"total_pages": 2,
"jobs_found": 5,
"current_job": "Scraping: Senior Backend Engineer at Google"
}

POST /stop-scraping

curl -X POST http://localhost:8000/stop-scraping

Response:
{
"message": "Stopping scraper..."
}

GET /stats

curl http://localhost:8000/stats

Response:
{
"total_jobs": 150
}

DELETE /admin/delete-all

curl -X DELETE http://localhost:8000/admin/delete-all

Response:
{
"message": "Deleted 150 jobs",
"count": 150
}

| Status Code | Description |
|---|---|
| 400 | Bad Request - Invalid input |
| 404 | Not Found - Resource doesn't exist |
| 409 | Conflict - Duplicate job |
| 500 | Internal Server Error |
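As an illustration, handlers along these lines would produce the 404 and 409 responses above (a hedged sketch reusing the project's database.py and models.py modules; not the actual main.py code):

# Illustrative error handling for the status codes above (assumed, not verbatim)
from fastapi import Depends, HTTPException
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session

from database import get_db   # session dependency from backend/database.py
from models import Job        # SQLAlchemy model from backend/models.py

def get_job_or_404(job_id: int, db: Session = Depends(get_db)):
    job = db.query(Job).filter(Job.id == job_id).first()
    if job is None:
        raise HTTPException(status_code=404, detail="Job not found")  # 404
    return job

def save_job_or_409(db: Session, job: Job) -> Job:
    try:
        db.add(job)
        db.commit()
        return job
    except IntegrityError:  # UNIQUE(title, company, url) violated
        db.rollback()
        raise HTTPException(status_code=409, detail="Duplicate job")  # 409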
class ScrapeRequest(BaseModel):
    keyword: str
    location: str
    max_pages: int = 2

class JobResponse(BaseModel):
    id: int
    title: str
    company: str
    location: Optional[str]
    description: Optional[str]
    url: str
    source: str
    posted_date: Optional[datetime]
    scraped_at: datetime

Location: frontend/src/pages/App.jsx
- Main application container
- Manages global state (jobs, search, scraping status)
- Renders all other components
Location: frontend/src/components/Header.jsx
- Application header with title
- Navigation to admin panel
- Responsive design with gradient background
Location: frontend/src/components/ScrapeForm.jsx
- Form for starting scraping jobs
- Real-time progress monitoring
- Start/stop controls with visual feedback
Features:
- Keyword and location inputs
- Live scraping status updates
- Progress bar and job count
- Success/error notifications
Location: frontend/src/components/SearchBar.jsx
- Search functionality for jobs
- Real-time filtering
- Clear search option
Location: frontend/src/components/JobCard.jsx
- Displays individual job listing
- Compact view with essential info
- Action buttons (View, Apply, Delete)
Job Card Layout:
┌─────────────────────────────────┐
│ Senior Software Engineer        │ ← Title
│ Google                          │ ← Company
│ 📍 Mountain View, CA            │ ← Location
│                                 │
│ [👁 View]  [🔗 Apply]  [🗑]     │ ← Action buttons
└─────────────────────────────────┘
Location: frontend/src/components/JobModal.jsx
- Modal popup for detailed job view
- Full job description
- Direct apply link to LinkedIn
Location: frontend/src/components/Footer.jsx
- Footer with author information
- Contact details and GitHub link
Location: frontend/src/pages/AdminPanel.jsx
- Administrative interface
- Statistics display
- Bulk delete functionality
- Database management tools
Location: frontend/src/services/api.js
Methods Available:
// Get jobs with search
api.getJobs(search = '', skip = 0, limit = 50)
// Get single job
api.getJob(id)
// Start scraping
api.scrapeJobs(keyword, location, maxPages = 2)
// Get scraping status
api.getScrapingStatus()
// Stop scraping
api.stopScraping()
// Delete job
api.deleteJob(id)
// Delete all jobs
api.deleteAllJobs()
// Get statistics
api.getStats()

The application uses React hooks for state management:
// Main state variables
const [jobs, setJobs] = useState([]) // Job listings
const [search, setSearch] = useState('') // Search term
const [loading, setLoading] = useState(false) // Loading state
const [scraping, setScraping] = useState(false) // Scraping status
const [selectedJob, setSelectedJob] = useState(null) // Selected job for modal

// React Router setup in main.jsx
<BrowserRouter>
<Routes>
<Route path="/" element={<App />} />
<Route path="/admin" element={<AdminPanel />} />
</Routes>
</BrowserRouter>

CREATE TABLE jobs (
id SERIAL PRIMARY KEY,
title VARCHAR(500) NOT NULL,
company VARCHAR(500) NOT NULL,
location VARCHAR(500),
description TEXT,
url VARCHAR(1000) NOT NULL,
source VARCHAR(100) DEFAULT 'LinkedIn',
posted_date TIMESTAMP WITH TIME ZONE,
scraped_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
-- Unique constraint to prevent duplicates
UNIQUE(title, company, url)
);

| Field | Type | Description | Constraints |
|---|---|---|---|
| id | SERIAL | Auto-incrementing primary key | PRIMARY KEY |
| title | VARCHAR(500) | Job title | NOT NULL |
| company | VARCHAR(500) | Company name | NOT NULL |
| location | VARCHAR(500) | Job location | NULLABLE |
| description | TEXT | Full job description | NULLABLE |
| url | VARCHAR(1000) | LinkedIn job URL | NOT NULL |
| source | VARCHAR(100) | Source platform | DEFAULT 'LinkedIn' |
| posted_date | TIMESTAMPTZ | Original posting date | NULLABLE |
| scraped_at | TIMESTAMPTZ | When job was scraped | DEFAULT NOW() |
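A SQLAlchemy model matching this table could look roughly like the following (a sketch of backend/models.py inferred from the schema above, not necessarily the exact code):

# Job model implied by the schema above (illustrative)
from sqlalchemy import Column, DateTime, Integer, String, Text, UniqueConstraint, func

from database import Base  # declarative base from backend/database.py

class Job(Base):
    __tablename__ = "jobs"

    id = Column(Integer, primary_key=True, index=True)
    title = Column(String(500), nullable=False)
    company = Column(String(500), nullable=False)
    location = Column(String(500))
    description = Column(Text)
    url = Column(String(1000), nullable=False)
    source = Column(String(100), default="LinkedIn")
    posted_date = Column(DateTime(timezone=True))
    scraped_at = Column(DateTime(timezone=True), server_default=func.now())

    # Mirrors the duplicate-prevention constraint in the SQL schema
    __table_args__ = (UniqueConstraint("title", "company", "url", name="uq_job"),)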
-- Create indexes for faster queries
CREATE INDEX idx_jobs_title ON jobs(title);
CREATE INDEX idx_jobs_company ON jobs(company);
CREATE INDEX idx_jobs_location ON jobs(location);
CREATE INDEX idx_jobs_scraped_at ON jobs(scraped_at DESC);
-- Full-text search index (optional)
CREATE INDEX idx_jobs_search ON jobs
USING gin(to_tsvector('english', title || ' ' || company || ' ' || COALESCE(location, '')));

INSERT INTO jobs (title, company, location, url, description, source)
VALUES (
'Senior Software Engineer',
'Google',
'Mountain View, CA',
'https://linkedin.com/jobs/view/123456',
'Join our team to build scalable systems...',
'LinkedIn'
);

Location: backend/scraper.py

1. __init__(self)
   - Sets up Chrome browser with headless options
   - Configures WebDriver with anti-detection settings
   - Initializes WebDriverWait for element waiting

2. close_popups(self)
   - Closes LinkedIn popups and modals
   - Uses multiple CSS selectors for robustness
   - Handles various popup types

3. scrape_jobs(self, keyword, location, max_pages=20)
   - Main scraping method
   - Handles pagination and job extraction
   - Returns a list of job dictionaries

4. get_job_description(self, job_url)
   - Visits individual job pages
   - Extracts full job descriptions
   - Handles "Show more" buttons
1. Initialize Browser
        ↓
2. Navigate to LinkedIn Jobs Search
        ↓
3. Close Popups
        ↓
4. For each page:
   ├─ 4.1. Scroll page
   ├─ 4.2. Parse HTML with BeautifulSoup
   ├─ 4.3. Extract job cards
   └─ 4.4. For each job card:
        ├─ 4.4.1. Extract basic info
        ├─ 4.4.2. Check for duplicates
        ├─ 4.4.3. Visit job page for description
        └─ 4.4.4. Save to database
        ↓
5. Cleanup and Close Browser
# Job card selectors
JOB_CARD_SELECTORS = [
'div.job-search-card',
'div.base-card',
'li.jobs-search-results__list-item'
]
# Title selectors
TITLE_SELECTORS = [
'h3.base-search-card__title',
'a.base-card__full-link'
]
# Company selectors
COMPANY_SELECTORS = [
'h4.base-search-card__subtitle',
'a.hidden-nested-link'
]
# Description selectors
DESCRIPTION_SELECTORS = [
'div.show-more-less-html__markup',
'div.jobs-description__content',
'div.description__text',
'section.description'
]

- User-Agent Rotation: Uses realistic user agent string
- Headless Mode: Runs browser in background
- Random Delays: Varies timing between requests
- Scroll Simulation: Mimics human scrolling behavior
- Popup Handling: Closes all interfering popups
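In code, these measures typically come down to a handful of Chrome options plus randomized timing, roughly as follows (a sketch assuming Selenium 4 with Chrome; the flags and helper names are illustrative, not the project's exact scraper.py):

# Illustrative anti-detection setup (assumed, not verbatim)
import random
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")  # run without a visible window
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument(
    "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
driver = webdriver.Chrome(options=options)

def polite_pause(low=1.0, high=3.0):
    """Random delay between requests to mimic human browsing."""
    time.sleep(random.uniform(low, high))

def human_scroll(driver, steps=5):
    """Scroll the page in small increments instead of jumping to the bottom."""
    for _ in range(steps):
        driver.execute_script("window.scrollBy(0, 600);")
        polite_pause(0.5, 1.5)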
To avoid LinkedIn blocking:
- Default delay: 1-3 seconds between requests
- Max pages per scrape: 2 (configurable)
- Random delays to mimic human behavior
- Consider using proxies for production
Symptoms:
- "Could not connect to database" error
- Jobs not saving to database
- API returning 500 errors
Solutions:
1. Check if PostgreSQL is running:

   # Linux
   sudo systemctl status postgresql

   # Windows
   # Check Services for PostgreSQL

2. Verify the connection string in .env:

   DATABASE_URL=postgresql://username:password@localhost:5432/database

3. Test the connection manually:

   psql -U username -d database -h localhost
Symptoms:
- "ChromeDriver executable needs to be in PATH" error
- Selenium fails to start
- Browser not opening
Solutions:
1. Update Chrome and ChromeDriver:

   pip install --upgrade webdriver-manager

2. Check the Chrome installation:

   google-chrome --version
   # or
   chromium --version

3. Run in non-headless mode for debugging:

   # In scraper.py, comment out:
   # options.add_argument('--headless')
Symptoms:
- Application slowing down over time
- High memory usage in task manager
- Browser crashes during scraping
Solutions:
1. Limit scraping pages:

   max_pages=2  # Reduce from higher values

2. Increase the delay between requests:

   time.sleep(3)  # Increase from 1 second

3. Implement a periodic browser restart
Symptoms:
- CAPTCHA appears
- "Access denied" errors
- No job cards found
- IP address temporarily blocked
Solutions:
1. Add longer, random delays:

   import random
   time.sleep(random.randint(3, 7))

2. Reduce scraping frequency:

   max_pages=1  # Scrape fewer pages at once

3. Use proxy rotation (advanced)
Symptoms:
- "Failed to fetch" errors in console
- API calls timing out
- Blank dashboard
- CORS errors
Solutions:
1. Check if the backend is running:

   curl http://localhost:8000

2. Update the API URL in the frontend:

   // In frontend/src/services/api.js
   const API_URL = 'http://localhost:8000';

3. Check the CORS configuration:

   # In main.py
   app.add_middleware(
       CORSMiddleware,
       allow_origins=["http://localhost:5173"],
       allow_credentials=True,
       allow_methods=["*"],
       allow_headers=["*"],
   )
import logging
logging.basicConfig(
level=logging.DEBUG,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler('debug.log'),
logging.StreamHandler()
]
)

# Backend logs
tail -f uvicorn.log
# Database logs (Linux)
tail -f /var/log/postgresql/postgresql-14-main.log
# Application logs
tail -f debug.log

curl http://localhost:8000/       # Health check
curl http://localhost:8000/jobs # Get jobs
curl http://localhost:8000/stats  # Get stats

-- Check job count
SELECT COUNT(*) FROM jobs;
-- Check recent jobs
SELECT * FROM jobs ORDER BY scraped_at DESC LIMIT 5;
-- Check for duplicates
SELECT title, company, COUNT(*)
FROM jobs
GROUP BY title, company
HAVING COUNT(*) > 1;

1. Database Indexing:

   CREATE INDEX idx_jobs_combined ON jobs(title, company, location);
   CREATE INDEX idx_jobs_posted_date ON jobs(posted_date DESC);

2. Connection Pooling:

   engine = create_engine(
       DATABASE_URL,
       pool_size=10,
       max_overflow=20,
       pool_recycle=3600,
       pool_pre_ping=True
   )

3. Query Optimization:

   from sqlalchemy.orm import selectinload
   jobs = db.query(Job).options(selectinload(Job.tags)).all()
# Clone repository
git clone https://github.com/your-username/job-scraper-dashboard.git
cd job-scraper-dashboard
# Setup backend
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Setup frontend
cd ../frontend
npm install

Backend (requirements-dev.txt):
pytest>=7.0.0
pytest-asyncio>=0.20.0
black>=23.0.0
flake8>=6.0.0
mypy>=1.0.0
pre-commit>=3.0.0
Frontend Development:
npm install -D eslint prettier @types/react @types/react-dom

- Follow PEP 8 guidelines
- Use type hints for all functions
- Maximum line length: 100 characters
- Use docstrings for all public methods
Example:
def get_jobs(
    skip: int = 0,
    limit: int = 50,
    search: Optional[str] = None,
    db: Session = Depends(get_db),
) -> List[JobResponse]:
    """
    Retrieve jobs with optional filtering and pagination.

    Args:
        skip: Number of records to skip
        limit: Maximum records to return
        search: Search term for filtering
        db: Database session

    Returns:
        List of job objects
    """
    query = db.query(models.Job)
    # ... implementation

- Use functional components with hooks
- Follow React naming conventions
- Use Tailwind CSS for styling
- Implement prop types or TypeScript
Example:
const JobCard = ({ job, onView, onDelete }) => {
return (
<div className="job-card">
{/* JSX content */}
</div>
);
};
JobCard.propTypes = {
job: PropTypes.object.isRequired,
onView: PropTypes.func.isRequired,
onDelete: PropTypes.func.isRequired,
};

# tests/test_main.py
from fastapi.testclient import TestClient
from main import app

client = TestClient(app)

def test_get_jobs():
    response = client.get("/jobs")
    assert response.status_code == 200
    assert isinstance(response.json(), list)

// JobCard.test.jsx
import { render, screen } from '@testing-library/react';
import JobCard from '../components/JobCard'; // adjust the path to your test location

test('renders job cards', () => {
  render(<JobCard job={mockJob} />);
  expect(screen.getByText(mockJob.title)).toBeInTheDocument();
});

- Fork the Repository
- Create Feature Branch
git checkout -b feature/your-feature-name
- Make Changes
- Follow coding standards
- Add tests if applicable
- Update documentation
- Commit Changes
git add .
git commit -m "Add: Description of changes"
- Push and Create PR
git push origin feature/your-feature-name
# Create Pull Request on GitHub

- Use conventional commits: feat:, fix:, docs:, style:, refactor:, test:, chore:
- Update README for new features
- Add API documentation for new endpoints
- Update component documentation
- Code Review: All PRs require review
- Testing: Must pass existing tests
- Documentation: Must be updated
- CI/CD: Must pass pipeline
This project is licensed under the MIT License - see the LICENSE file for details.
- LinkedIn for providing job data
- Open source community for libraries and tools
- Contributors and testers
- Mentors and advisors
- LinkedIn Structure Stability: LinkedIn's HTML/CSS structure remains relatively unchanged
- Public Access: Jobs are accessible without LinkedIn login
- English Content: Primary language for job descriptions is English
- Geographic Availability: Jobs are available in specified locations
- Browser Compatibility: Chrome/Chromium is available on the system
- Network Stability: Stable internet connection for scraping
- Rate Limiting: LinkedIn may block excessive requests
- CAPTCHA Challenges: May encounter CAPTCHA during scraping
- JavaScript Rendering: Requires Selenium for dynamic content
- Memory Usage: Long scraping sessions may use significant memory
- Network Dependence: Requires stable internet connection
- Browser Updates: ChromeDriver compatibility issues with Chrome updates
- Single Source: Currently only supports LinkedIn
- No Scheduling: Manual scraping only, no automated schedules
- Limited Filters: Basic keyword/location filtering only
- No User Accounts: Single-user system
- No Export: Cannot export data to external formats
- No Notifications: No alert system for new jobs
- Indeed Integration
  - Add support for Indeed.com scraping
  - Unified job storage
  - Source-specific parsing

- Advanced Filters
  - Salary range filtering
  - Job type (full-time, contract, etc.)
  - Experience level filtering
  - Remote/hybrid/onsite options

- Export Functionality
  - CSV export
  - Excel export
  - PDF reports
  - JSON API for integration

- User Authentication
  - Multi-user support
  - Role-based access (admin/user)
  - User preferences
  - Saved searches

- Email Notifications
  - New job alerts
  - Daily/weekly digests
  - Custom notification rules
  - Unsubscribe options

- Scheduling System
  - Automated daily scraping
  - Custom schedule configuration
  - Result notifications
  - Performance monitoring

- Multiple Job Sources
  - Glassdoor integration
  - Monster integration
  - CareerBuilder support
  - Company career pages

- Advanced Analytics
  - Job market trends
  - Salary analysis
  - Company insights
  - Location heatmaps

- Resume Matching
  - Resume upload
  - Skills matching
  - Job recommendations
  - Application tracking

- AI Features
  - Smart job recommendations
  - Resume optimization
  - Interview preparation
  - Salary negotiation tips

- Mobile Application
  - iOS app
  - Android app
  - Push notifications
  - Offline access

- Enterprise Features
  - Team collaboration
  - Applicant tracking
  - Reporting dashboard
  - API access for businesses

- Performance Optimization
  - Database indexing optimization
  - Caching implementation
  - Async processing improvements
  - Load balancing

- Security Enhancements
  - JWT authentication
  - Rate limiting
  - Input validation
  - Security headers

- Monitoring & Logging
  - Application performance monitoring
  - Error tracking
  - Usage analytics
  - Audit logging
Documentation last updated: December 2025
Project Version: 1.0.0
Maintainer: Uzair Javed
Contact: [email protected]
GitHub: uzair-javed-1
LinkedIn: LinkedIn