MaheshSpidey 1.8.4

A Python web scraper that fetches data from several sources, including GitHub, YouTube, LinkedIn, arXiv, and movie and song databases. It stores the data in structured directories, respects robots.txt, and uploads the resulting datasets to Kaggle.
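As an illustration of what respecting robots.txt can look like, here is a minimal sketch using the standard library's `urllib.robotparser`. It is not taken from the project's scrapers: the `USER_AGENT` string, the helper names, and the sample rules are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string; the project's real value is not documented here.
USER_AGENT = "MaheshSpideyBot"

# Sample rules used only for this illustration.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file into a reusable parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS)
print(is_allowed(parser, "https://example.com/trending"))      # allowed by the sample rules
print(is_allowed(parser, "https://example.com/private/data"))  # blocked by the sample rules
```

In a live scraper the rules would normally be loaded from the target site with `RobotFileParser.set_url(...)` followed by `read()`, and each request would be skipped when `is_allowed` returns False.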

Scraped data:
https://www.kaggle.com/datasets/maheshdhingra/youtube-trending
https://www.kaggle.com/datasets/maheshdhingra/research-papers
https://www.kaggle.com/datasets/maheshdhingra/movies-year
https://www.kaggle.com/datasets/maheshdhingra/song-year
https://www.kaggle.com/datasets/maheshdhingra/github-top
https://www.kaggle.com/datasets/maheshdhingra/github-trending

Objective

Track trending GitHub repositories and YouTube videos daily for open analysis, historical growth tracking, and data sharing.
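One way daily tracking and historical growth could be recorded is sketched below. The file schema (a JSON list of `{"date", "stars"}` entries) and the function names `record_snapshot` and `daily_growth` are assumptions for illustration. They are not confirmed to match `track_repo_growth.py`.

```python
import json
from datetime import date
from pathlib import Path


def record_snapshot(path: Path, stars: int, day: date) -> list:
    """Store one snapshot per day in a repo's history file and return the history."""
    history = json.loads(path.read_text()) if path.exists() else []
    stamp = day.isoformat()
    # Drop any existing entry for the same day so re-runs do not create duplicates.
    history = [entry for entry in history if entry["date"] != stamp]
    history.append({"date": stamp, "stars": stars})
    # ISO dates sort chronologically as plain strings.
    history.sort(key=lambda entry: entry["date"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(history, indent=2))
    return history


def daily_growth(history: list) -> list:
    """Return the change in stars between each pair of consecutive snapshots."""
    return [later["stars"] - earlier["stars"] for earlier, later in zip(history, history[1:])]
```

Calling `record_snapshot` once per day for each tracked repository builds up a time series, and `daily_growth` turns that series into per-day deltas suitable for plotting or sharing.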

Setup

# Clone the repo
git clone https://github.com/MaheshDhingra/MaheshSpidey.git
cd MaheshSpidey

# Install dependencies
pip install -r requirements.txt

Project Structure

MaheshSpidey/
├── data/
│   ├── arxiv_ai/
│   │   └── 2025-07-08-papers.json
│   ├── github/
│   │   ├── 2025-07-08/
│   │   │   ├── top.json
│   │   │   └── trending.json
│   │   └── tracked/
│   │       ├── anthropics_prompt-eng-interactive-tutorial.json
│   │       ├── CodeWithHarry_Sigma-Web-Dev-Course.json
│   │       ├── commaai_openpilot.json
│   │       ├── dockur_macos.json
│   │       ├── ed-donner_llm_engineering.json
│   │       ├── humanlayer_12-factor-agents.json
│   │       ├── pocketbase_pocketbase.json
│   │       ├── rustfs_rustfs.json
│   │       ├── smallcloudai_refact.json
│   │       └── th-ch_youtube-music.json
│   ├── movies/
│   │   ├── movies_0.csv
│   │   ├── movies_1998.csv
│   │   ├── movies_2017.csv
│   │   └── movies_2018.csv
│   └── songs_2000/
│       └── songs_2000_data.csv
├── logs/
│   └── youtube/
│       └── scraper.log
├── scraper/
│   ├── SpideyGithub/
│   │   ├── github_scraper.py
│   │   └── track_repo_growth.py
│   ├── SpideyLinked/
│   │   └── linked_tech_jobs.py
│   ├── SpideyMovie/
│   │   └── movie_scrapper.py
│   ├── SpideyResearch/
│   │   └── arixy-scrapper.py
│   ├── SpideySongs2000/
│   │   └── songs_2000_scraper.py
│   └── SpideyYoutube/
│       └── youtube_scraper.py
├── requirements.txt
└── README.md
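
The layout above suggests two naming conventions for GitHub data: one directory per date holding `top.json` and `trending.json`, and one `owner_repo.json` file per tracked repository. The following sketch shows how such paths could be built. The helper names and the `DATA_ROOT` constant are hypothetical and are not taken from the scraper source.

```python
from datetime import date
from pathlib import Path

# Assumed root of the data tree, relative to the repository root.
DATA_ROOT = Path("data")


def daily_github_path(kind: str, day: date, root: Path = DATA_ROOT) -> Path:
    """Build the path for a daily listing, e.g. data/github/2025-07-08/top.json."""
    if kind not in {"top", "trending"}:
        raise ValueError(f"unknown listing kind: {kind!r}")
    return root / "github" / day.isoformat() / f"{kind}.json"


def tracked_repo_path(full_name: str, root: Path = DATA_ROOT) -> Path:
    """Build the path for a tracked repo given its 'owner/repo' name."""
    owner, repo = full_name.split("/", 1)
    return root / "github" / "tracked" / f"{owner}_{repo}.json"


print(daily_github_path("top", date(2025, 7, 8)).as_posix())
print(tracked_repo_path("commaai/openpilot").as_posix())
```

Keeping path construction in one place makes it easier to change the layout later without touching each scraper.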
