ScrapeddashiiAlchemy 🔮

Turning raw HTML into structured health data — with a little DevTools wizardry.

This repo scrapes every condition page listed at ada.com/conditions, extracting each <h2>-headed feature section and the content blocks beneath it. The results are saved in both JSON and CSV formats, ready for analysis, modeling, or medical NLP tasks.

🧠 Origin Story

While exploring ada.com, I wanted structured data on medical conditions. Here's how the alchemy happened:

1. Fired up Chrome DevTools 🔍
   Inspected the layout of a few condition pages and noticed a consistent pattern:
   - Feature sections (e.g., Symptoms, Causes, Types of X) were always inside <h2> tags.
   - The content under each was wrapped in <div class="Text_wrapper__rP9t7"> blocks.
2. Discovered the HTML rhythm 🎼
   Under each <h2>, the relevant text was grouped into these styled div elements, up until the next <h2>.
3. Built the parser 🧰
   Scraped each condition page, looping over all <h2> headers and capturing all following Text_wrapper__rP9t7 content.
4. Exported it all ✍️
   Saved the scraped content into a cleanly structured conditions.json and conditions.csv.
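
The <h2> → Text_wrapper__rP9t7 pattern described above can be sketched in a few lines. This is a minimal illustration, not the code in scrape_ada_extended.py: it uses only the standard-library html.parser (the repo itself lists beautifulsoup4), and the SectionParser class, the extract_sections() helper, and the sample markup are all made up for the example.

```python
from html.parser import HTMLParser

WRAPPER_CLASS = "Text_wrapper__rP9t7"  # class name quoted in this README


def _clean(parts):
    """Join text fragments and collapse runs of whitespace."""
    return " ".join(" ".join(parts).split())


class SectionParser(HTMLParser):
    """Collect wrapper-div text under each <h2>, up to the next <h2>."""

    def __init__(self):
        super().__init__()
        self.sections = {}      # header text -> list of text blocks
        self._current = None    # header currently being filled
        self._in_h2 = False
        self._h2_parts = []
        self._depth = 0         # > 0 while inside a wrapper div
        self._block_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2, self._h2_parts = True, []
        elif tag == "div":
            if self._depth:
                self._depth += 1  # nested div inside a wrapper
            elif self._current is not None:
                classes = (dict(attrs).get("class") or "").split()
                if WRAPPER_CLASS in classes:
                    self._depth, self._block_parts = 1, []

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self._in_h2 = False
            self._current = _clean(self._h2_parts)
            self.sections.setdefault(self._current, [])
        elif tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:  # wrapper closed: store its text
                text = _clean(self._block_parts)
                if text:
                    self.sections[self._current].append(text)

    def handle_data(self, data):
        if self._in_h2:
            self._h2_parts.append(data)
        elif self._depth:
            self._block_parts.append(data)


def extract_sections(html):
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    return parser.sections


# Made-up markup that mimics the pattern; not copied from ada.com.
sample = """
<div class="Text_wrapper__rP9t7"><p>Intro before any header is skipped.</p></div>
<h2>Symptoms of Sepsis</h2>
<div class="Text_wrapper__rP9t7"><p>High fever.</p><p>Confusion.</p></div>
<div class="Other">Ignored.</div>
<h2>Causes</h2>
<div class="Text_wrapper__rP9t7"><div>Infection.</div></div>
"""
sections = extract_sections(sample)
```

Wrapper blocks that appear before the first <h2> are skipped here, since they belong to no feature section; whether the real scraper keeps them (for example as an overview) is not something this sketch assumes.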

🚀 Features

- DevTools-Inspired DOM Mapping: Designed based on real-world inspection of Ada’s frontend structure.
- Flexible <h2> Extraction: Auto-detects any feature section, with no hardcoded "Symptoms", "Causes", etc.
- Content-Aware Scraping: Pulls all relevant paragraphs under each section until the next begins.
- Dual Output Formats:
  - JSON: nested, flexible, clean
  - CSV: flat, tabular, analysis-ready
- Stealth Mode: Random User-Agent headers and respectful delays to avoid detection.
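
For a feel of what the random User-Agent headers and delays might look like, here is a small sketch. The USER_AGENTS entries below are placeholders, and random_headers() and polite_pause() are hypothetical helper names; the actual list and sleep calls live in scrape_ada_extended.py and may differ.

```python
import random
import time

# Placeholder pool (assumption); the real USER_AGENTS list is in the script.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/2.0",
]


def random_headers(rng=random):
    """Pick one User-Agent at random for the next request."""
    return {"User-Agent": rng.choice(USER_AGENTS)}


def polite_pause(min_s=1.0, max_s=3.0, rng=random, sleep=time.sleep):
    """Sleep for a random duration in [min_s, max_s] and return it."""
    delay = rng.uniform(min_s, max_s)
    sleep(delay)
    return delay


headers = random_headers()
waited = polite_pause(0.0, 0.01)  # tiny range so the demo finishes quickly
```

The resulting dict can be passed as the headers argument of requests.get(), with polite_pause() called between page fetches. The 1 to 3 second default is an illustrative choice, not a value taken from the repo.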

🛠️ Installation

Clone this repo

```
git clone https://github.com/Ksschkw/ScrapeddashiiAlchemy.git
cd ScrapeddashiiAlchemy
```

Create a virtualenv (optional but smart)

```
python3 -m venv venv
source venv/bin/activate
```

Install the ingredients

```
pip install -r requirements.txt
```

Requirements: requests, beautifulsoup4, pandas

🎯 Usage

Run the script with:

```
python scrape_ada_extended.py
```

This will generate:

- conditions.json: each condition with all its feature sections and text blocks.
- conditions.csv: tabular form, one row per condition, with columns added dynamically as new features are detected.
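
The "dynamically expanding columns" idea boils down to taking the union of keys across all scraped records. The sketch below shows one way to do that with the standard-library json and csv modules; the repo lists pandas, so the real script may export differently. The helper names and the second record are invented for illustration.

```python
import csv
import io
import json


def collect_columns(records):
    """Return every key seen across records, in first-seen order."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    return columns


def to_json_text(records):
    """Serialize records as pretty-printed JSON, keeping non-ASCII text."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv_text(records):
    """One row per record; features a record lacks become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=collect_columns(records), restval=""
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = [
    {
        "condition": "Sepsis",
        "url": "https://ada.com/conditions/sepsis/",
        "symptoms": "High fever, confusion, rapid heart rate...",
    },
    {   # placeholder record, only here to show a differing feature set
        "condition": "Example",
        "url": "https://ada.com/conditions/example/",
        "treatment": "Placeholder text",
    },
]
json_text = to_json_text(records)
csv_text = to_csv_text(records)
```

Writing json_text and csv_text to conditions.json and conditions.csv is then a plain file write. Keeping columns in first-seen order means the CSV layout depends on scrape order; sorting the column names would be a reasonable alternative.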

🧬 Sample Output

```
{
  "condition": "Sepsis",
  "url": "https://ada.com/conditions/sepsis/",
  "overview": "Sepsis is a potentially life-threatening condition...",
  "symptoms": "High fever, confusion, rapid heart rate...",
  "treatment": "Hospitalization, IV fluids, antibiotics..."
}
```

⚙️ Configuration

- Delays & Headers: tweak the time.sleep() calls and the USER_AGENTS list in scrape_ada_extended.py.
- Feature Key Normalization: custom rules live inside normalize_header_to_key() (e.g., "Symptoms of X" → "symptoms").
- Slug-based URLs: you can optionally use your own list of slugs instead of crawling the entire /conditions index.
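
To make the normalization and slug options more concrete, here is a rough sketch. The body of normalize_header_to_key() below is a guess at the kind of rule involved, not the repo's actual implementation, and slug_to_url() is a hypothetical helper whose URL shape is inferred from the sample output.

```python
import re

BASE_URL = "https://ada.com/conditions"


def normalize_header_to_key(header):
    """Turn a section header into a key, e.g. "Symptoms of X" -> "symptoms".

    Assumed rules: lowercase, drop a trailing "of <something>", and
    replace anything that is not a letter or digit with underscores.
    """
    text = header.strip().lower()
    text = re.sub(r"\s+of\s+.*$", "", text)
    key = re.sub(r"[^a-z0-9]+", "_", text).strip("_")
    return key or "section"


def slug_to_url(slug):
    """Build a condition URL from a slug such as "sepsis"."""
    return f"{BASE_URL}/{slug.strip('/')}/"


# Hypothetical hand-picked slug list, used instead of crawling the index.
slugs = ["sepsis"]
urls = [slug_to_url(slug) for slug in slugs]
```

The "of" rule is deliberately crude: a header like "Loss of appetite" would collapse to "loss", so real rules likely need more care than this sketch shows.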

🧪 Ideas for Next Experiments

🧠 Symptom clustering via NLP
📊 Auto-profiling datasets with pandas-profiling (now published as ydata-profiling)
🌍 Translate scraped text using DeepL API
🔎 Semantic search with embedding vectors (OpenAI, HuggingFace)

🤝 Contributing

Found a weird condition page? Wanna help extend the scraper for images or diagrams? Fork it, improve it, and send a PR 💫

“Alchemy isn't magic. It's structured parsing wrapped in mystery.” — ScrapeddashiiAlchemy
