This project takes a URL and returns a Markdown summary of the page, using either the OpenAI API (`gpt-4o-mini`) or a local model (`llama3.2`) served through Ollama.
- Environment Check: Verifies that the appropriate virtual environment (e.g. `llms`, `venv`) is active.
- API Key Handling:
  - Loads the `.env` file from the project root.
  - Checks that `OPENAI_API_KEY` exists and is correctly formatted.
- Website Class: Downloads and parses the webpage using `requests` and `BeautifulSoup`, strips out unnecessary tags (`script`, `style`, `img`, `input`), and extracts the main text.
- Model Interfaces:
  - OpenAI: Uses the `openai` Python SDK and `gpt-4o-mini`.
  - Ollama: Sends a POST request to the local server at `http://localhost:11434/api/chat`.
- Prompt Handling: Combines system and user prompts for summarization.
- Display: Uses `IPython.display.Markdown` to show the output in a notebook.
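The environment and API key checks amount to a few lines of standard-library Python. The following is a minimal sketch: the function names are hypothetical, and treating `sk-proj-` as the required prefix is an assumption taken from the `.env` examples in this README.

```python
import os
import sys
from typing import Optional


def in_virtualenv() -> bool:
    """True inside a venv/virtualenv or an activated conda environment."""
    # venv changes sys.prefix but not sys.base_prefix; conda sets CONDA_DEFAULT_ENV.
    return sys.prefix != sys.base_prefix or "CONDA_DEFAULT_ENV" in os.environ


def check_api_key(key: Optional[str]) -> str:
    """Return a human-readable status for an OpenAI key (format check only)."""
    if not key:
        return "No API key found: add OPENAI_API_KEY to your .env file."
    # Assumed format: project keys start with "sk-proj-".
    if not key.startswith("sk-proj-"):
        return "A key was found, but it does not start with sk-proj-."
    if key.strip() != key:
        return "A key was found, but it has leading or trailing whitespace."
    return "API key found and looks good."
```

In the notebook, you would typically call `load_dotenv(override=True)` from the `python-dotenv` package first, then pass `os.getenv("OPENAI_API_KEY")` to `check_api_key`. The check only validates the format; whether the key is actually accepted is only known once the first API call succeeds.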
- Accepts any public URL (e.g. articles, blog posts)
- Supports both cloud-based and local language models
- Extracts clean content from HTML
- Returns summary in Markdown format
- Displays results inside Jupyter Notebooks
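The Website class could look roughly like the sketch below. It assumes `requests` and `beautifulsoup4` are installed. The `User-Agent` header, the 10-second timeout, and the separate `parse_html` helper are illustrative choices, not necessarily what the project does. Splitting parsing from downloading lets you test the HTML cleanup without network access.

```python
from typing import Tuple

import requests
from bs4 import BeautifulSoup

# Assumed browser-like header: some sites reject requests' default User-Agent.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

# Tags treated as noise, matching the list in the architecture notes.
NOISE_TAGS = ["script", "style", "img", "input"]


def parse_html(html) -> Tuple[str, str]:
    """Return (title, visible text) with noise tags removed."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else "No title found"
    # Fall back to the whole document when the page has no <body> tag.
    root = soup.body or soup
    for tag in root(NOISE_TAGS):  # calling a tag is shorthand for find_all()
        tag.decompose()
    text = root.get_text(separator="\n", strip=True)
    return title, text


class Website:
    """Download a page and keep its URL, title and main text."""

    def __init__(self, url: str):
        self.url = url
        response = requests.get(url, headers=HEADERS, timeout=10)
        response.raise_for_status()  # fail loudly on 4xx/5xx responses
        self.title, self.text = parse_html(response.content)
```

For example, `Website("https://en.wikipedia.org/wiki/OpenAI").text` would hold the article's visible text, one text fragment per line.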
- Python 3.8 or higher
- Jupyter Lab or Notebook
- Active virtual environment (`llms`, `venv`, etc.)
- `.env` file containing:

  ```
  OPENAI_API_KEY=sk-proj-...
  ```

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/webpage-summarizer.git
   cd webpage-summarizer
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

   Or, with conda:

   ```bash
   conda env create -f environment.yml
   conda activate llms
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Add your OpenAI API key. Create a `.env` file in the parent directory of your notebook:

   ```
   OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
   ```

5. Start Ollama (for local models):

   ```bash
   ollama run llama3.2
   ```

6. Run Jupyter Lab:

   ```bash
   jupyter lab
   ```

In a notebook cell, run:

```python
display_summary("https://en.wikipedia.org/wiki/OpenAI")
```

This will generate a Markdown summary of the provided URL.
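The model calls below use `SYSTEM_PROMPT` and `USER_PROMPT` without defining them. The following minimal sketch shows how they might be built from a page's title and text. The prompt wording, the `build_user_prompt` and `build_messages` helpers, and the 20,000-character cap are all assumptions, not the project's actual values.

```python
from typing import Dict, List

# Hypothetical prompt wording; the project's actual prompts may differ.
SYSTEM_PROMPT = (
    "You are an assistant that analyzes the contents of a website "
    "and provides a short summary, ignoring text that might be navigation related. "
    "Respond in Markdown."
)

MAX_CHARS = 20_000  # assumed cap so very long pages stay within the context window


def build_user_prompt(title: str, text: str) -> str:
    """Combine the page title and extracted text into the user prompt."""
    return (
        f"You are looking at a website titled {title}.\n"
        "Please provide a short summary of its contents in Markdown. "
        "If it includes news or announcements, summarize these too.\n\n"
        + text[:MAX_CHARS]  # truncate the page text, not the instructions
    )


def build_messages(title: str, text: str) -> List[Dict[str, str]]:
    """Return the chat messages that both backends accept."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": build_user_prompt(title, text)},
    ]
```

Because OpenAI and Ollama both take the same `role`/`content` message list, `display_summary` can pass `build_messages(site.title, site.text)` to either backend. It can then render the reply with `display(Markdown(summary))` from `IPython.display`.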
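Calling OpenAI:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT},
    ],
)
summary = response.choices[0].message.content
```

Calling Ollama locally:

```python
import requests

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": USER_PROMPT},
        ],
        "stream": False,  # return a single JSON object instead of streamed chunks
    },
)
response.raise_for_status()
summary = response.json()["message"]["content"]
```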
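Ollama's `/api/chat` endpoint streams by default. Unless the request body sets `"stream": false`, the response is newline-delimited JSON: one object per chunk, each carrying a piece of the reply in `message.content`, with the last one marked `"done": true`. Calling `response.json()` on such a body fails. The following minimal sketch reassembles the chunks. `collect_ollama_stream` is a hypothetical helper, not part of this project or of any Ollama client library.

```python
import json
from typing import Iterable, Union


def collect_ollama_stream(lines: Iterable[Union[bytes, str]]) -> str:
    """Join the message.content pieces of a streamed /api/chat response."""
    parts = []
    for line in lines:
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)  # json.loads accepts both str and bytes
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the final chunk carries stats, not more text
    return "".join(parts)
```

With `requests`, pass `stream=True` to `requests.post(...)` and feed `response.iter_lines()` to the helper. To show the summary being written incrementally in the notebook, print each piece as it arrives instead of joining the pieces at the end.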