
MaheshDhingra/MaheshSpidey



MaheshSpidey 1.8.4

A Python web scraper that fetches data from various sources, including GitHub, YouTube, LinkedIn, arXiv, and movie and song databases. It stores the data in structured directories, respects robots.txt, and uploads the datasets to Kaggle.
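Python's standard library can handle the robots.txt check before a page is requested. The sketch below uses `urllib.robotparser.RobotFileParser`. The robots.txt body and the `MaheshSpideyBot` user-agent are hypothetical placeholders, and this README does not document how the project's own scrapers perform the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; a real scraper would fetch it from
# https://<host>/robots.txt (RobotFileParser.set_url() + read() do that).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

USER_AGENT = "MaheshSpideyBot"  # hypothetical user-agent string


def build_parser(robots_body: str) -> RobotFileParser:
    """Parse a robots.txt body without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser


def allowed(parser: RobotFileParser, url: str) -> bool:
    """Return True if USER_AGENT may fetch url under the parsed rules."""
    return parser.can_fetch(USER_AGENT, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in ("https://example.com/trending", "https://example.com/private/data"):
        print(url, "->", "fetch" if allowed(rp, url) else "skip")
```

A scraper would call `allowed()` before each request and skip any URL the site disallows.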

Objective

Track trending GitHub repositories and YouTube videos daily for open analysis, historical growth tracking, and data sharing.
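Historical growth tracking comes down to comparing dated snapshots of the same repository. The sketch below computes day-over-day star gains. The `{"date": ..., "stars": ...}` record shape is an assumption made for illustration, because this README does not document the schema of the files under `data/github/tracked/`.

```python
from datetime import date

# Hypothetical snapshot records for one tracked repository; the real
# files under data/github/tracked/ may use a different schema.
SNAPSHOTS = [
    {"date": "2025-07-08", "stars": 1100},
    {"date": "2025-07-06", "stars": 1000},
    {"date": "2025-07-07", "stars": 1040},
]


def daily_star_growth(snapshots: list[dict]) -> list[tuple[str, int]]:
    """Return (date, stars gained since the previous snapshot), oldest first."""
    # Sort by calendar date so unordered input still yields correct deltas.
    ordered = sorted(snapshots, key=lambda s: date.fromisoformat(s["date"]))
    return [
        (curr["date"], curr["stars"] - prev["stars"])
        for prev, curr in zip(ordered, ordered[1:])
    ]


if __name__ == "__main__":
    for day, gained in daily_star_growth(SNAPSHOTS):
        print(f"{day}: {gained:+d} stars")
```

Pairing each snapshot with its predecessor means the first day has no delta, and a repository that loses stars shows a negative value instead of being hidden.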


Setup

# Clone the repo
git clone https://github.com/MaheshDhingra/MaheshSpidey.git
cd MaheshSpidey

# Install dependencies
pip install -r requirements.txt

Project Structure

MaheshSpidey/
├── data/
│   ├── arxiv_ai/
│   │   └── 2025-07-08-papers.json
│   ├── github/
│   │   ├── 2025-07-08/
│   │   │   ├── top.json
│   │   │   └── trending.json
│   │   └── tracked/
│   │       ├── anthropics_prompt-eng-interactive-tutorial.json
│   │       ├── CodeWithHarry_Sigma-Web-Dev-Course.json
│   │       ├── commaai_openpilot.json
│   │       ├── dockur_macos.json
│   │       ├── ed-donner_llm_engineering.json
│   │       ├── humanlayer_12-factor-agents.json
│   │       ├── pocketbase_pocketbase.json
│   │       ├── rustfs_rustfs.json
│   │       ├── smallcloudai_refact.json
│   │       └── th-ch_youtube-music.json
│   ├── movies/
│   │   ├── movies_0.csv
│   │   ├── movies_1998.csv
│   │   ├── movies_2017.csv
│   │   └── movies_2018.csv
│   └── songs_2000/
│       └── songs_2000_data.csv
├── logs/
│   └── youtube/
│       └── scraper.log
├── scraper/
│   ├── SpideyGithub/
│   │   ├── github_scraper.py
│   │   └── track_repo_growth.py
│   ├── SpideyLinked/
│   │   └── linked_tech_jobs.py
│   ├── SpideyMovie/
│   │   └── movie_scrapper.py
│   ├── SpideyResearch/
│   │   └── arixy-scrapper.py
│   ├── SpideySongs2000/
│   │   └── songs_2000_scraper.py
│   └── SpideyYoutube/
│       └── youtube_scraper.py
├── requirements.txt
└── README.md
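The GitHub layout above stores daily results under a date-named folder (`data/github/2025-07-08/trending.json`) and per-repository histories as `owner_repo.json` under `tracked/`. The helpers below sketch how such paths could be built and written. The `owner_repo` mapping is inferred from the listed filenames, and the function names are hypothetical, not taken from the project's scripts.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def snapshot_path(root: Path, source: str, day: date, name: str) -> Path:
    """Build a dated path such as data/github/2025-07-08/trending.json."""
    return root / source / day.isoformat() / f"{name}.json"


def tracked_path(root: Path, full_name: str) -> Path:
    """Map 'owner/repo' to data/github/tracked/owner_repo.json (inferred convention)."""
    owner, repo = full_name.split("/", 1)
    return root / "github" / "tracked" / f"{owner}_{repo}.json"


def save_json(path: Path, payload) -> None:
    """Write payload as pretty-printed UTF-8 JSON, creating parent folders."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")


if __name__ == "__main__":
    # Write into a throwaway directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        out = snapshot_path(Path(tmp), "github", date(2025, 7, 8), "trending")
        save_json(out, [{"repo": "commaai/openpilot"}])
        print(out.relative_to(tmp))
        print(tracked_path(Path("data"), "commaai/openpilot"))
```

Keying folders by ISO date means a plain directory listing sorts chronologically, which keeps day-by-day comparisons simple.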