🚀 Job Scraper Dashboard - Complete Documentation


A full-stack web application for scraping, storing, and managing job listings from LinkedIn

Features • Installation • Usage • API Docs • Contributing

📋 Table of Contents

  • Project Overview
  • Features
  • Tech Stack
  • Architecture
  • Installation
  • Configuration
  • Usage
  • How to Use the Scraper
  • API Documentation
  • Frontend Guide
  • Database Schema
  • Scraping Details
  • Troubleshooting
  • Development
  • Contributing
  • License
  • Acknowledgments
  • Assumptions & Limitations
  • Future Improvements

🎯 Project Overview

What is Job Scraper Dashboard?

Job Scraper Dashboard is a comprehensive full-stack application designed to automate job hunting by:

  • Scraping job listings from LinkedIn in real-time
  • Storing data in PostgreSQL
  • Searching through collected jobs with advanced filters
  • Managing job listings through a modern React dashboard
  • Monitoring scraping progress with live updates

Key Benefits

  • Time-Saving: Automate job search across multiple parameters
  • Centralized Storage: All jobs in one place, searchable and filterable
  • Real-time Updates: Live scraping progress monitoring
  • User-Friendly: Intuitive dashboard with responsive design
  • Scalable: Built with production-ready technologies

✨ Features

Core Features

  • ✅ Real-time LinkedIn Scraping - Live job extraction with progress tracking
  • ✅ Advanced Search - Filter jobs by title, company, location
  • ✅ Job Details View - Complete job descriptions with modal display
  • ✅ Direct Apply Links - One-click access to LinkedIn job postings
  • ✅ Admin Dashboard - Database management and bulk operations
  • ✅ Responsive Design - Works perfectly on mobile and desktop
  • ✅ Background Processing - Non-blocking scraping operations

Dashboard Features

  • Live Scraping Status - Real-time progress monitoring
  • Job Statistics - Total counts and insights
  • Search & Filter - Quick find functionality
  • Delete Operations - Individual and bulk job removal
  • Admin Controls - Database management interface

๐Ÿ—๏ธ Tech Stack

Backend (Python)

| Technology | Version | Purpose |
| --- | --- | --- |
| FastAPI | 0.104.1 | Modern web framework for APIs |
| SQLAlchemy | 2.0.23 | ORM for database interactions |
| Pydantic | 2.5.0 | Data validation and settings |
| Selenium | 4.15.2 | Browser automation for scraping |
| BeautifulSoup4 | 4.12.2 | HTML parsing for job data |
| Uvicorn | 0.24.0 | ASGI server for FastAPI |
| Psycopg2 | 2.9.9 | PostgreSQL adapter |

Frontend (React)

| Technology | Purpose |
| --- | --- |
| React 18 | Frontend UI library |
| React Router | Client-side routing |
| Tailwind CSS | Utility-first CSS framework |
| Lucide React | Icon library |
| Vite | Build tool and dev server |

Database

| Technology | Purpose |
| --- | --- |
| PostgreSQL 14+ | Primary relational database |
| pgAdmin (optional) | Database management GUI |

DevOps & Tools

| Technology | Purpose |
| --- | --- |
| dotenv | Environment variable management |
| WebDriver Manager | Auto ChromeDriver management |
| Chrome Headless | Headless browser for scraping |
| Git | Version control |

๐Ÿ—๏ธ Architecture

System Architecture

┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│  React Frontend  │◄──►│  FastAPI Backend │◄──►│  PostgreSQL DB   │
│  (Dashboard)     │    │  (REST API)      │    │  (Job Storage)   │
└──────────────────┘    └──────────────────┘    └──────────────────┘
         │                       │                       │
         ▼                       ▼                       ▼
┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│  User Browser    │    │  Selenium        │    │  Data Models     │
│  (UI)            │    │  (Scraper)       │    │  (ORM)           │
└──────────────────┘    └──────────────────┘    └──────────────────┘

Project Structure

job-scraper-dashboard/
├── backend/                   # FastAPI Backend
│   ├── main.py                # FastAPI app & endpoints
│   ├── database.py            # DB connection setup
│   ├── models.py              # SQLAlchemy models
│   ├── scraper.py             # LinkedIn scraper class
│   ├── requirements.txt       # Python dependencies
│   └── .env                   # Environment variables
│
├── frontend/                  # React Frontend
│   ├── src/
│   │   ├── components/        # React components
│   │   │   ├── Header.jsx
│   │   │   ├── ScrapeForm.jsx
│   │   │   ├── SearchBar.jsx
│   │   │   ├── JobCard.jsx
│   │   │   ├── JobModal.jsx
│   │   │   └── Footer.jsx
│   │   ├── services/
│   │   │   └── api.js         # API service layer
│   │   ├── pages/
│   │   │   ├── App.jsx        # Main dashboard
│   │   │   └── AdminPanel.jsx
│   │   ├── index.css          # Tailwind styles
│   │   └── main.jsx           # React entry point
│   ├── package.json           # Frontend dependencies
│   └── index.html             # HTML template
│
├── README.md                  # This documentation
└── .gitignore

Backend File Details

backend/main.py - FastAPI Application

# Core application setup with routes for:
# - Job management (CRUD operations)
# - Scraping control (start/stop/status)
# - Statistics and admin endpoints
# - CORS middleware configuration
# - Background task processing
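
As a rough sketch of how these pieces typically fit together: the route shapes below come from the API documentation later in this README, while the module names (models, database, get_db) are assumed from the file descriptions above, so the real main.py may differ.

from fastapi import Depends, FastAPI
from fastapi.middleware.cors import CORSMiddleware
from sqlalchemy.orm import Session

import models
from database import engine, get_db

# The README notes that tables are created automatically on startup
models.Base.metadata.create_all(bind=engine)

app = FastAPI(title="Job Scraper API")

# Allow the React dev server (Vite, port 5173) to call the API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.get("/")
def health_check():
    return {"message": "Job Scraper API", "status": "running"}

@app.get("/stats")
def get_stats(db: Session = Depends(get_db)):
    return {"total_jobs": db.query(models.Job).count()}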

backend/database.py - Database Connection

# SQLAlchemy engine and session factory setup
# PostgreSQL connection configuration
# Database session dependency injection
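
A minimal sketch of that setup, assuming python-dotenv and the DATABASE_URL variable shown in the .env examples in this README; the actual file may differ in detail.

import os

from dotenv import load_dotenv
from sqlalchemy import create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

load_dotenv()

DATABASE_URL = os.getenv("DATABASE_URL")

engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
Base = declarative_base()

# FastAPI dependency: yield a session and always close it afterwards
def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()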

backend/models.py - Data Models

# SQLAlchemy ORM models for Job entity
# Includes fields: title, company, location, description, url, etc.
# Unique constraints to prevent duplicate jobs
# Timestamp fields for tracking
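
A sketch of what such a Job model could look like, using the columns and unique constraint listed in the Database Schema section below (illustrative, not necessarily the exact model in the repository):

from sqlalchemy import Column, DateTime, Integer, String, Text, UniqueConstraint, func

from database import Base

class Job(Base):
    __tablename__ = "jobs"

    id = Column(Integer, primary_key=True, index=True)
    title = Column(String(500), nullable=False)
    company = Column(String(500), nullable=False)
    location = Column(String(500))
    description = Column(Text)
    url = Column(String(1000), nullable=False)
    source = Column(String(100), default="LinkedIn")
    posted_date = Column(DateTime(timezone=True))
    scraped_at = Column(DateTime(timezone=True), server_default=func.now())

    # Prevent the same posting from being stored twice
    __table_args__ = (UniqueConstraint("title", "company", "url", name="uq_job"),)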

backend/scraper.py - LinkedIn Scraper

# Selenium-based web scraper for LinkedIn jobs
# Methods for: page navigation, job extraction, description parsing
# Anti-detection measures and error handling
# Duplicate job checking logic

Frontend File Details

frontend/src/pages/App.jsx - Main Dashboard

// Main application component
// State management for jobs, search, scraping status
// Component composition and layout
// API integration and data fetching

frontend/src/components/ - UI Components

  • Header.jsx - Navigation and branding
  • ScrapeForm.jsx - Scraping controls and form
  • SearchBar.jsx - Search functionality
  • JobCard.jsx - Individual job display
  • JobModal.jsx - Detailed job view
  • Footer.jsx - Footer information

frontend/src/services/api.js - API Service

// Centralized API client
// Methods for all backend interactions
// Error handling and response parsing

🚀 Installation

Prerequisites

  • Python 3.9+ (with pip)
  • Node.js 16+ (with npm)
  • PostgreSQL 14+
  • Chrome/Chromium browser
  • Git (for version control)

Step-by-Step Installation

1. Clone the Repository

git clone https://github.com/uzair-javed-1/LinkedinJobScrapper
cd LinkedinJobScrapper

2. Backend Setup

# Navigate to backend directory
cd backend

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On Mac/Linux:
source venv/bin/activate

# Install Python dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env
# Edit .env with your database credentials

3. Database Setup

-- Connect to PostgreSQL
psql -U postgres

-- Create database
CREATE DATABASE job_scraper;

-- Create user (optional)
CREATE USER scraper_user WITH PASSWORD 'your_password';

-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE job_scraper TO scraper_user;

-- Exit psql
\q

4. Frontend Setup

# Navigate to frontend directory
cd ../frontend

# Install Node.js dependencies
npm install

5. Environment Configuration

Edit backend/.env file:

DATABASE_URL=postgresql://scraper_user:your_password@localhost:5432/job_scraper

For reference, the author's local setup uses:

DATABASE_URL=postgresql://postgres:uzair@localhost:5432/job_scraper

Here postgres is the database user, uzair is that user's password, job_scraper is the database created above, and 5432 is the PostgreSQL server port.

โš™๏ธ Configuration

Backend Configuration

Create backend/.env file with:

# Database Configuration
DATABASE_URL=postgresql://username:password@localhost:5432/job_scraper

# Scraping Configuration
SCRAPING_MAX_PAGES=2
SCRAPING_DELAY_SECONDS=2
SCRAPING_HEADLESS=true

# Application Configuration
DEBUG=true
HOST=0.0.0.0
PORT=8000

# CORS Configuration
CORS_ORIGINS=http://localhost:5173,http://localhost:3000
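
The backend reads these values at startup. A small sketch of how they can be loaded with python-dotenv (variable names are taken from the file above; the defaults and parsing shown here are illustrative):

import os

from dotenv import load_dotenv

load_dotenv()  # reads backend/.env into the process environment

DATABASE_URL = os.getenv("DATABASE_URL")
MAX_PAGES = int(os.getenv("SCRAPING_MAX_PAGES", "2"))
HEADLESS = os.getenv("SCRAPING_HEADLESS", "true").lower() == "true"
CORS_ORIGINS = os.getenv("CORS_ORIGINS", "http://localhost:5173").split(",")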

Frontend Configuration

Edit frontend/src/services/api.js:

const API_URL = 'http://localhost:8000';  // Change if backend runs elsewhere

Database Configuration

The application automatically creates tables. Manual configuration includes:

  1. Start PostgreSQL:

    # Linux
    sudo systemctl start postgresql
    
    # Windows (via Services)
    # Start PostgreSQL service
  2. Verify Connection:

    psql -U scraper_user -d job_scraper -h localhost

🚀 Usage

Running the Application

Development Mode

Terminal 1 - Backend Server:

cd backend
source venv/bin/activate  # or venv\Scripts\activate on Windows
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Terminal 2 - Frontend Server:

cd frontend
npm run dev

Production Mode

# Backend (production)
cd backend
gunicorn -w 4 -k uvicorn.workers.UvicornWorker main:app

# Frontend (build and serve)
cd frontend
npm run build
npm run preview

Access Points

  • Frontend dashboard: http://localhost:5173
  • Backend API: http://localhost:8000
  • Interactive API docs (Swagger UI): http://localhost:8000/docs

🎯 How to Use the Scraper

Via Web Interface

  1. Log in to LinkedIn: sign in to your LinkedIn account and make the browser you are logged in with your default browser.
  2. Open Dashboard: Navigate to http://localhost:5173
  3. Enter Parameters:
    • Keyword: Job title or skill (e.g., "Software Engineer")
    • Location: City, state, or country (e.g., "New York")
  4. Start Scraping: Click "Start Scraping" button
  5. Monitor Progress: Watch real-time updates
  6. Stop Anytime: Click the "Stop" button to cancel. Known issue: the stop button is not yet fully wired up — the backend /stop route exists, but the frontend call is not correctly aligned with it and needs a small React fix. As a workaround, call the /stop route directly or open the /admin page to terminate the scraping process; a proper fix is planned.

Via API

curl -X POST "http://localhost:8000/scrape" \
  -H "Content-Type: application/json" \
  -d '{
    "keyword": "Data Scientist",
    "location": "San Francisco",
    "max_pages": 2
  }'

Via Python Script

import requests

# Start scraping
response = requests.post(
    "http://localhost:8000/scrape",
    json={
        "keyword": "Marketing Manager",
        "location": "Chicago",
        "max_pages": 3
    }
)
print(f"Scraping started: {response.json()}")

# Check status
status = requests.get("http://localhost:8000/scraping-status").json()
print(f"Current status: {status}")

Scraping Process Details

  1. Initialization:

    • Opens Chrome in headless mode
    • Navigates to LinkedIn jobs search
    • Closes popups and modals
  2. Job Collection:

    • Extracts job cards from each page
    • Gets title, company, location, URL
    • Checks for duplicates in database
  3. Detail Extraction:

    • Visits each job page individually
    • Extracts full job description
    • Captures posting date if available
  4. Data Storage:

    • Saves to PostgreSQL with unique constraints (a sketch of this step follows the list)
    • Updates scraping status
    • Commits transaction
  5. Cleanup:

    • Closes browser
    • Updates final status
    • Releases database connection
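
Steps 2 and 4 above (duplicate checking and saving under the unique constraint) might look roughly like this; an illustrative sketch, not the exact code in scraper.py:

from database import SessionLocal
import models

def save_job_if_new(job_data: dict) -> bool:
    """Insert a scraped job unless an identical posting already exists."""
    db = SessionLocal()
    try:
        exists = (
            db.query(models.Job)
            .filter(
                models.Job.title == job_data["title"],
                models.Job.company == job_data["company"],
                models.Job.url == job_data["url"],
            )
            .first()
        )
        if exists:
            return False  # duplicate, skip it
        db.add(models.Job(**job_data))
        db.commit()
        return True
    finally:
        db.close()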

Scraping Parameters

| Parameter | Default | Description | Recommended Range |
| --- | --- | --- | --- |
| max_pages | 2 | Number of pages to scrape | 1-5 |
| delay | 1-3 s | Delay between requests | 1-5 seconds |
| headless | True | Headless browser mode | True/False |
| timeout | 15 s | Page load timeout | 10-30 seconds |

๐ŸŒ API Documentation

Base URL

http://localhost:8000

Authentication

No authentication required for development. For production, implement JWT or API keys.
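
For example, a lightweight API-key check could be added as a FastAPI dependency in front of the admin routes. This is only an illustrative sketch; the header name and environment variable are made up here and nothing like this exists in the current code:

import os

from fastapi import HTTPException, Security
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)

def require_api_key(api_key: str = Security(api_key_header)) -> str:
    # Compare the request header against a server-side secret
    if api_key != os.getenv("API_KEY"):
        raise HTTPException(status_code=401, detail="Invalid or missing API key")
    return api_key

# Usage: @app.delete("/admin/delete-all", dependencies=[Depends(require_api_key)])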

Endpoints Reference

1. Health Check

GET /

curl http://localhost:8000/

Response:

{
  "message": "Job Scraper API",
  "status": "running"
}

2. Get All Jobs

GET /jobs

Query Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| skip | integer | 0 | Number of records to skip |
| limit | integer | 50 | Maximum records to return |
| search | string | "" | Search term for title/company/location |

Example:

curl "http://localhost:8000/jobs?search=engineer&skip=0&limit=20"

Response:

[
  {
    "id": 1,
    "title": "Senior Software Engineer",
    "company": "Tech Corp Inc.",
    "location": "San Francisco, CA",
    "description": "We are looking for a Senior Software Engineer...",
    "url": "https://linkedin.com/jobs/view/123456",
    "source": "LinkedIn",
    "posted_date": "2023-12-01T10:00:00Z",
    "scraped_at": "2023-12-01T14:30:00Z"
  }
]
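
Under the hood, the search term is matched against title, company, and location. A rough sketch of how such a filter can be built with SQLAlchemy (assuming the Job model described in models.py; the actual query in main.py may differ):

from sqlalchemy import or_

import models

def filter_jobs(db, search: str = "", skip: int = 0, limit: int = 50):
    query = db.query(models.Job)
    if search:
        pattern = f"%{search}%"
        query = query.filter(or_(
            models.Job.title.ilike(pattern),
            models.Job.company.ilike(pattern),
            models.Job.location.ilike(pattern),
        ))
    # Newest scrapes first, with simple offset/limit pagination
    return query.order_by(models.Job.scraped_at.desc()).offset(skip).limit(limit).all()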

3. Get Single Job

GET /jobs/{job_id}

curl http://localhost:8000/jobs/1

4. Delete Job

DELETE /jobs/{job_id}

curl -X DELETE http://localhost:8000/jobs/1

Response:

{
  "message": "Job deleted"
}

5. Start Scraping

POST /scrape

Request Body:

{
  "keyword": "Software Engineer",
  "location": "New York",
  "max_pages": 2
}

Response:

{
  "message": "Scraping started",
  "keyword": "Software Engineer"
}
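
Scraping runs in the background so this request returns immediately. One way to structure such an endpoint is with FastAPI's BackgroundTasks; this is a hedged sketch in which run_scraper is a hypothetical helper that drives LinkedInScraper and stores the results:

from fastapi import BackgroundTasks

@app.post("/scrape")
def start_scrape(request: ScrapeRequest, background_tasks: BackgroundTasks):
    # Hand the long-running scrape off to a background task
    background_tasks.add_task(
        run_scraper, request.keyword, request.location, request.max_pages
    )
    return {"message": "Scraping started", "keyword": request.keyword}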

6. Get Scraping Status

GET /scraping-status

curl http://localhost:8000/scraping-status

Response:

{
  "is_scraping": true,
  "current_page": 1,
  "total_pages": 2,
  "jobs_found": 5,
  "current_job": "Scraping: Senior Backend Engineer at Google"
}

7. Stop Scraping

POST /stop-scraping

curl -X POST http://localhost:8000/stop-scraping

Response:

{
  "message": "Stopping scraper..."
}

8. Get Statistics

GET /stats

curl http://localhost:8000/stats

Response:

{
  "total_jobs": 150
}

9. Delete All Jobs (Admin)

DELETE /admin/delete-all

curl -X DELETE http://localhost:8000/admin/delete-all

Response:

{
  "message": "Deleted 150 jobs",
  "count": 150
}

Error Responses

| Status Code | Description |
| --- | --- |
| 400 | Bad Request - invalid input |
| 404 | Not Found - resource doesn't exist |
| 409 | Conflict - duplicate job |
| 500 | Internal Server Error |
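
These map onto FastAPI's HTTPException. For example, the single-job endpoint can raise a 404 roughly like this (a sketch; variable and model names are assumed from earlier sections):

from fastapi import Depends, HTTPException
from sqlalchemy.orm import Session

@app.get("/jobs/{job_id}")
def get_job(job_id: int, db: Session = Depends(get_db)):
    job = db.query(models.Job).filter(models.Job.id == job_id).first()
    if job is None:
        raise HTTPException(status_code=404, detail="Job not found")
    return job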

Pydantic Models

ScrapeRequest

from pydantic import BaseModel

class ScrapeRequest(BaseModel):
    keyword: str
    location: str
    max_pages: int = 2

JobResponse

from datetime import datetime
from typing import Optional

from pydantic import BaseModel

class JobResponse(BaseModel):
    id: int
    title: str
    company: str
    location: Optional[str]
    description: Optional[str]
    url: str
    source: str
    posted_date: Optional[datetime]
    scraped_at: datetime

💻 Frontend Guide

Component Overview

1. App (Main Dashboard)

Location: frontend/src/pages/App.jsx

  • Main application container
  • Manages global state (jobs, search, scraping status)
  • Renders all other components

2. Header Component

Location: frontend/src/components/Header.jsx

  • Application header with title
  • Navigation to admin panel
  • Responsive design with gradient background

3. ScrapeForm Component

Location: frontend/src/components/ScrapeForm.jsx

  • Form for starting scraping jobs
  • Real-time progress monitoring
  • Start/stop controls with visual feedback

Features:

  • Keyword and location inputs
  • Live scraping status updates
  • Progress bar and job count
  • Success/error notifications

4. SearchBar Component

Location: frontend/src/components/SearchBar.jsx

  • Search functionality for jobs
  • Real-time filtering
  • Clear search option

5. JobCard Component

Location: frontend/src/components/JobCard.jsx

  • Displays individual job listing
  • Compact view with essential info
  • Action buttons (View, Apply, Delete)

Job Card Layout:

┌─────────────────────────────────┐
│ Senior Software Engineer        │ ← Title
│ Google                          │ ← Company
│ 📍 Mountain View, CA            │ ← Location
│                                 │
│ [👁️ View] [📤 Apply]      [🗑️] │ ← Action buttons
└─────────────────────────────────┘

6. JobModal Component

Location: frontend/src/components/JobModal.jsx

  • Modal popup for detailed job view
  • Full job description
  • Direct apply link to LinkedIn

7. Footer Component

Location: frontend/src/components/Footer.jsx

  • Footer with author information
  • Contact details and GitHub link

8. AdminPanel Component

Location: frontend/src/pages/AdminPanel.jsx

  • Administrative interface
  • Statistics display
  • Bulk delete functionality
  • Database management tools

API Service Layer

Location: frontend/src/services/api.js

Methods Available:

// Get jobs with search
api.getJobs(search = '', skip = 0, limit = 50)

// Get single job
api.getJob(id)

// Start scraping
api.scrapeJobs(keyword, location, maxPages = 2)

// Get scraping status
api.getScrapingStatus()

// Stop scraping
api.stopScraping()

// Delete job
api.deleteJob(id)

// Delete all jobs
api.deleteAllJobs()

// Get statistics
api.getStats()

State Management

The application uses React hooks for state management:

// Main state variables
const [jobs, setJobs] = useState([])          // Job listings
const [search, setSearch] = useState('')      // Search term
const [loading, setLoading] = useState(false) // Loading state
const [scraping, setScraping] = useState(false) // Scraping status
const [selectedJob, setSelectedJob] = useState(null) // Selected job for modal

Routing

// React Router setup in main.jsx
<BrowserRouter>
  <Routes>
    <Route path="/" element={<App />} />
    <Route path="/admin" element={<AdminPanel />} />
  </Routes>
</BrowserRouter>

๐Ÿ—„๏ธ Database Schema

Jobs Table

CREATE TABLE jobs (
    id SERIAL PRIMARY KEY,
    title VARCHAR(500) NOT NULL,
    company VARCHAR(500) NOT NULL,
    location VARCHAR(500),
    description TEXT,
    url VARCHAR(1000) NOT NULL,
    source VARCHAR(100) DEFAULT 'LinkedIn',
    posted_date TIMESTAMP WITH TIME ZONE,
    scraped_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    
    -- Unique constraint to prevent duplicates
    UNIQUE(title, company, url)
);

Field Descriptions

| Field | Type | Description | Constraints |
| --- | --- | --- | --- |
| id | SERIAL | Auto-incrementing primary key | PRIMARY KEY |
| title | VARCHAR(500) | Job title | NOT NULL |
| company | VARCHAR(500) | Company name | NOT NULL |
| location | VARCHAR(500) | Job location | NULLABLE |
| description | TEXT | Full job description | NULLABLE |
| url | VARCHAR(1000) | LinkedIn job URL | NOT NULL |
| source | VARCHAR(100) | Source platform | DEFAULT 'LinkedIn' |
| posted_date | TIMESTAMPTZ | Original posting date | NULLABLE |
| scraped_at | TIMESTAMPTZ | When job was scraped | DEFAULT NOW() |

Indexes for Performance

-- Create indexes for faster queries
CREATE INDEX idx_jobs_title ON jobs(title);
CREATE INDEX idx_jobs_company ON jobs(company);
CREATE INDEX idx_jobs_location ON jobs(location);
CREATE INDEX idx_jobs_scraped_at ON jobs(scraped_at DESC);

-- Full-text search index (optional)
CREATE INDEX idx_jobs_search ON jobs 
USING gin(to_tsvector('english', title || ' ' || company || ' ' || COALESCE(location, '')));

Sample Data

INSERT INTO jobs (title, company, location, url, description, source)
VALUES (
    'Senior Software Engineer',
    'Google',
    'Mountain View, CA',
    'https://linkedin.com/jobs/view/123456',
    'Join our team to build scalable systems...',
    'LinkedIn'
);

🤖 Scraping Details

LinkedInScraper Class

Location: backend/scraper.py

Key Methods (a condensed skeleton follows this list):

  1. __init__(self)

    • Sets up Chrome browser with headless options
    • Configures WebDriver with anti-detection settings
    • Initializes WebDriverWait for element waiting
  2. close_popups(self)

    • Closes LinkedIn popups and modals
    • Uses multiple CSS selectors for robustness
    • Handles various popup types
  3. scrape_jobs(self, keyword, location, max_pages=20)

    • Main scraping method
    • Handles pagination and job extraction
    • Returns list of job dictionaries
  4. get_job_description(self, job_url)

    • Visits individual job pages
    • Extracts full job descriptions
    • Handles "Show more" buttons
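
A condensed, hedged skeleton of how these methods fit together. The selectors come from the list further below, but the waits, pagination, and error handling are heavily simplified compared to the real backend/scraper.py:

import time
from urllib.parse import quote_plus

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

class LinkedInScraper:
    def __init__(self, headless: bool = True):
        options = Options()
        if headless:
            options.add_argument("--headless")
        # Basic anti-detection: realistic user agent, hide the automation flag
        options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
        options.add_experimental_option("excludeSwitches", ["enable-automation"])
        self.driver = webdriver.Chrome(
            service=Service(ChromeDriverManager().install()), options=options
        )
        self.wait = WebDriverWait(self.driver, 15)

    def scrape_jobs(self, keyword: str, location: str, max_pages: int = 2):
        url = (
            "https://www.linkedin.com/jobs/search/"
            f"?keywords={quote_plus(keyword)}&location={quote_plus(location)}"
        )
        self.driver.get(url)
        jobs = []
        for _ in range(max_pages):
            # Scroll to load more results (real pagination handling is more involved)
            self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(2)  # polite delay while results load
            soup = BeautifulSoup(self.driver.page_source, "html.parser")
            for card in soup.select("div.base-card"):
                title = card.select_one("h3.base-search-card__title")
                company = card.select_one("h4.base-search-card__subtitle")
                link = card.select_one("a.base-card__full-link")
                if title and company and link:
                    jobs.append({
                        "title": title.get_text(strip=True),
                        "company": company.get_text(strip=True),
                        "url": link["href"],
                    })
        return jobs

    def close(self):
        self.driver.quit()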

Scraping Flow

1. Initialize Browser
   ↓
2. Navigate to LinkedIn Jobs Search
   ↓
3. Close Popups
   ↓
4. For each page:
   │   4.1. Scroll page
   │   4.2. Parse HTML with BeautifulSoup
   │   4.3. Extract job cards
   │   4.4. For each job card:
   │       │   4.4.1. Extract basic info
   │       │   4.4.2. Check for duplicates
   │       │   4.4.3. Visit job page for description
   │       │   4.4.4. Save to database
   ↓
5. Cleanup and Close Browser

Selectors Used

# Job card selectors
JOB_CARD_SELECTORS = [
    'div.job-search-card',
    'div.base-card',
    'li.jobs-search-results__list-item'
]

# Title selectors
TITLE_SELECTORS = [
    'h3.base-search-card__title',
    'a.base-card__full-link'
]

# Company selectors
COMPANY_SELECTORS = [
    'h4.base-search-card__subtitle',
    'a.hidden-nested-link'
]

# Description selectors
DESCRIPTION_SELECTORS = [
    'div.show-more-less-html__markup',
    'div.jobs-description__content',
    'div.description__text',
    'section.description'
]

Anti-Detection Measures

  1. User-Agent Rotation: Uses realistic user agent string
  2. Headless Mode: Runs browser in background
  3. Random Delays: Varies timing between requests
  4. Scroll Simulation: Mimics human scrolling behavior
  5. Popup Handling: Closes all interfering popups

Rate Limiting

To avoid LinkedIn blocking:

  • Default delay: 1-3 seconds between requests
  • Max pages per scrape: 2 (configurable)
  • Random delays to mimic human behavior
  • Consider using proxies for production

🔧 Troubleshooting

Common Issues & Solutions

Issue 1: Database Connection Failed

Symptoms:

  • "Could not connect to database" error
  • Jobs not saving to database
  • API returning 500 errors

Solutions:

  1. Check if PostgreSQL is running:

    # Linux
    sudo systemctl status postgresql
    
    # Windows
    # Check Services for PostgreSQL
  2. Verify connection string in .env:

    DATABASE_URL=postgresql://username:password@localhost:5432/database
  3. Test connection manually:

    psql -U username -d database -h localhost

Issue 2: ChromeDriver Not Found

Symptoms:

  • "ChromeDriver executable needs to be in PATH" error
  • Selenium fails to start
  • Browser not opening

Solutions:

  1. Update Chrome and ChromeDriver:

    pip install --upgrade webdriver-manager
  2. Check Chrome installation:

    google-chrome --version
    # or
    chromium --version
  3. Run in non-headless mode for debugging:

    # In scraper.py, comment out:
    # options.add_argument('--headless')

Issue 3: Memory Issues

Symptoms:

  • Application slowing down over time
  • High memory usage in task manager
  • Browser crashes during scraping

Solutions:

  1. Limit scraping pages:

    max_pages=2  # Reduce from higher values
  2. Increase delay between requests:

    time.sleep(3)  # Increase from 1 second
  3. Implement periodic browser restart

Issue 4: LinkedIn Blocking/Throttling

Symptoms:

  • CAPTCHA appears
  • "Access denied" errors
  • No job cards found
  • IP address temporarily blocked

Solutions:

  1. Add longer, random delays:

    import random
    time.sleep(random.randint(3, 7))
  2. Reduce scraping frequency:

    max_pages=1  # Scrape fewer pages at once
  3. Use proxy rotation (advanced)

Issue 5: Frontend Not Connecting to Backend

Symptoms:

  • "Failed to fetch" errors in console
  • API calls timing out
  • Blank dashboard
  • CORS errors

Solutions:

  1. Check if backend is running:

    curl http://localhost:8000
  2. Update API URL in frontend:

    // In frontend/src/services/api.js
    const API_URL = 'http://localhost:8000';
  3. Check CORS configuration:

    # In main.py
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["http://localhost:5173"],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )

Debugging Tips

Enable Debug Logging

import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('debug.log'),
        logging.StreamHandler()
    ]
)

Check Application Logs

# Backend logs
tail -f uvicorn.log

# Database logs (Linux)
tail -f /var/log/postgresql/postgresql-14-main.log

# Application logs
tail -f debug.log

Test API Endpoints

curl http://localhost:8000/                        # Health check
curl http://localhost:8000/jobs                    # Get jobs
curl http://localhost:8000/stats                   # Get stats

Database Diagnostics

-- Check job count
SELECT COUNT(*) FROM jobs;

-- Check recent jobs
SELECT * FROM jobs ORDER BY scraped_at DESC LIMIT 5;

-- Check for duplicates
SELECT title, company, COUNT(*)
FROM jobs
GROUP BY title, company
HAVING COUNT(*) > 1;

Performance Optimization Tips

  1. Database Indexing:

    CREATE INDEX idx_jobs_combined ON jobs(title, company, location);
    CREATE INDEX idx_jobs_posted_date ON jobs(posted_date DESC);
  2. Connection Pooling:

    engine = create_engine(
        DATABASE_URL,
        pool_size=10,
        max_overflow=20,
        pool_recycle=3600,
        pool_pre_ping=True
    )
  3. Query Optimization:

    from sqlalchemy.orm import selectinload
    jobs = db.query(Job).options(selectinload(Job.tags)).all()

🛠️ Development

Setting Up Development Environment

1. Clone and Setup

# Clone repository
git clone https://github.com/your-username/job-scraper-dashboard.git
cd job-scraper-dashboard

# Setup backend
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Setup frontend
cd ../frontend
npm install

2. Development Dependencies

Backend (requirements-dev.txt):

pytest>=7.0.0
pytest-asyncio>=0.20.0
black>=23.0.0
flake8>=6.0.0
mypy>=1.0.0
pre-commit>=3.0.0

Frontend Development:

npm install -D eslint prettier @types/react @types/react-dom

Code Standards

Python Backend

  • Follow PEP 8 guidelines
  • Use type hints for all functions
  • Maximum line length: 100 characters
  • Use docstrings for all public methods

Example:

def get_jobs(
    skip: int = 0,
    limit: int = 50,
    search: Optional[str] = None,
    db: Session = Depends(get_db),
) -> List[JobResponse]:
    """
    Retrieve jobs with optional filtering and pagination.
    
    Args:
        skip: Number of records to skip
        limit: Maximum records to return
        search: Search term for filtering
        db: Database session
    
    Returns:
        List of job objects
    """
    query = db.query(models.Job)
    # ... implementation

React Frontend

  • Use functional components with hooks
  • Follow React naming conventions
  • Use Tailwind CSS for styling
  • Implement prop types or TypeScript

Example:

const JobCard = ({ job, onView, onDelete }) => {
  return (
    <div className="job-card">
      {/* JSX content */}
    </div>
  );
};

JobCard.propTypes = {
  job: PropTypes.object.isRequired,
  onView: PropTypes.func.isRequired,
  onDelete: PropTypes.func.isRequired,
};

Testing

Backend Tests

# tests/test_main.py
from fastapi.testclient import TestClient
from main import app

client = TestClient(app)

def test_get_jobs():
    response = client.get("/jobs")
    assert response.status_code == 200
    assert isinstance(response.json(), list)

Frontend Tests

// JobCard.test.jsx
test('renders job cards', () => {
  render(<JobCard job={mockJob} />);
  expect(screen.getByText(mockJob.title)).toBeInTheDocument();
});

🤝 Contributing

Development Workflow

  1. Fork the Repository
  2. Create Feature Branch
    git checkout -b feature/your-feature-name
  3. Make Changes
    • Follow coding standards
    • Add tests if applicable
    • Update documentation
  4. Commit Changes
    git add .
    git commit -m "Add: Description of changes"
  5. Push and Create PR
    git push origin feature/your-feature-name
    # Create Pull Request on GitHub

Code Standards

  • Use conventional commits: feat:, fix:, docs:, style:, refactor:, test:, chore:
  • Update README for new features
  • Add API documentation for new endpoints
  • Update component documentation

Review Process

  1. Code Review: All PRs require review
  2. Testing: Must pass existing tests
  3. Documentation: Must be updated
  4. CI/CD: Must pass pipeline

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • LinkedIn for providing job data
  • Open source community for libraries and tools
  • Contributors and testers
  • Mentors and advisors

⚠️ Assumptions & Limitations

Assumptions

  1. LinkedIn Structure Stability: LinkedIn's HTML/CSS structure remains relatively unchanged
  2. Public Access: Jobs are accessible without LinkedIn login
  3. English Content: Primary language for job descriptions is English
  4. Geographic Availability: Jobs are available in specified locations
  5. Browser Compatibility: Chrome/Chromium is available on the system
  6. Network Stability: Stable internet connection for scraping

Technical Limitations

  1. Rate Limiting: LinkedIn may block excessive requests
  2. CAPTCHA Challenges: May encounter CAPTCHA during scraping
  3. JavaScript Rendering: Requires Selenium for dynamic content
  4. Memory Usage: Long scraping sessions may use significant memory
  5. Network Dependence: Requires stable internet connection
  6. Browser Updates: ChromeDriver compatibility issues with Chrome updates

Functional Limitations

  1. Single Source: Currently only supports LinkedIn
  2. No Scheduling: Manual scraping only, no automated schedules
  3. Limited Filters: Basic keyword/location filtering only
  4. No User Accounts: Single-user system
  5. No Export: Cannot export data to external formats
  6. No Notifications: No alert system for new jobs

🔮 Future Improvements

Phase 1: Immediate (1-2 Months)

  1. Indeed Integration

    • Add support for Indeed.com scraping
    • Unified job storage
    • Source-specific parsing
  2. Advanced Filters

    • Salary range filtering
    • Job type (full-time, contract, etc.)
    • Experience level filtering
    • Remote/hybrid/onsite options
  3. Export Functionality

    • CSV export
    • Excel export
    • PDF reports
    • JSON API for integration

Phase 2: Short-term (3-6 Months)

  1. User Authentication

    • Multi-user support
    • Role-based access (admin/user)
    • User preferences
    • Saved searches
  2. Email Notifications

    • New job alerts
    • Daily/weekly digests
    • Custom notification rules
    • Unsubscribe options
  3. Scheduling System

    • Automated daily scraping
    • Custom schedule configuration
    • Result notifications
    • Performance monitoring

Phase 3: Medium-term (6-12 Months)

  1. Multiple Job Sources

    • Glassdoor integration
    • Monster integration
    • CareerBuilder support
    • Company career pages
  2. Advanced Analytics

    • Job market trends
    • Salary analysis
    • Company insights
    • Location heatmaps
  3. Resume Matching

    • Resume upload
    • Skills matching
    • Job recommendations
    • Application tracking

Phase 4: Long-term (12+ Months)

  1. AI Features

    • Smart job recommendations
    • Resume optimization
    • Interview preparation
    • Salary negotiation tips
  2. Mobile Application

    • iOS app
    • Android app
    • Push notifications
    • Offline access
  3. Enterprise Features

    • Team collaboration
    • Applicant tracking
    • Reporting dashboard
    • API access for businesses

Technical Improvements

  1. Performance Optimization

    • Database indexing optimization
    • Caching implementation
    • Async processing improvements
    • Load balancing
  2. Security Enhancements

    • JWT authentication
    • Rate limiting
    • Input validation
    • Security headers
  3. Monitoring & Logging

    • Application performance monitoring
    • Error tracking
    • Usage analytics
    • Audit logging

Documentation last updated: December 2025
Project Version: 1.0.0
Maintainer: Uzair Javed
Contact: [email protected]
GitHub: uzair-javed-1
LinkedIn: LinkedIn