AI-curated cultural events calendar for Athens, Greece. Transforms daily event newsletters and events scraped from the internet into SEO/GEO-optimized static pages designed for AI answer engines (ChatGPT, Perplexity, Claude), future A2A assistants, and humans.
- Phase: Live Prototype (Developer-Only)
- Current Milestone: Available only to project developers
- Next Milestone: Make available to search engines
- Future Milestone: Release to a small group of first users for feedback
- Mode: Static Site (Netlify CDN)
- Pages: Dynamic (currently ~315 pages, based on active events)
- Location: Athens, Greece (EET/UTC+2)
- Deployment: Netlify (auto-deploy on git push)
Latest Update (Nov 3, 2025): ✅ Full Automation Active - Running Daily at 8 AM
- ✅ Email ingestion integrated (Priority 0 - fetches newsletters from Gmail)
- ✅ Web scraping automation (3-step pipeline with frequency-based scheduling)
- ✅ Cron job deployed (daily at 8 AM Athens time)
- ✅ State tracking & duplicate prevention
- ✅ Automated archiving & logging
- See `docs/EMAIL-INGESTION-INTEGRATION.md` and `docs/CRON-AUTOMATION-SETUP.md`
Start with Athens. Prove the model. Expand to agent-barcelona, agent-berlin, agent-cities. Become the global cultural events platform for the AI era, monetized through affiliate revenue (tickets, hotels, restaurants) and agent referral networks where AI agents earn commission on bookings they drive.
In the reputation economy where AI trust = revenue, agent-athens is positioned to be the source that AI engines cite first. We're building the infrastructure for affiliate marketing in the post-LLM world.
1A. Email Ingestion (✅ INTEGRATED - Automated newsletter fetching):
Status: Integrated into orchestrator as Priority 0 (runs before web scraping)
Workflow:
- Connect to Gmail via IMAP ([email protected])
- Fetch unread newsletter emails from INBOX
- Save emails to `data/emails-to-parse/` for Claude Code parsing
- Mark as processed in `data/processed-emails.json` (prevents reprocessing; see the sketch below)
- Archive emails (move from INBOX → All Mail)
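As an illustration of the duplicate-prevention step, here is a minimal sketch that tracks Message-IDs in `data/processed-emails.json`. The file shape (a flat JSON array of IDs) and the helper names are assumptions for illustration; the real logic lives in `src/ingest/email-ingestion.ts`:

```typescript
// Sketch: Message-ID tracking against data/processed-emails.json.
// Assumes the state file is a flat JSON array of Message-ID strings.
import { existsSync, readFileSync, writeFileSync } from "node:fs";

const STATE_FILE = "data/processed-emails.json";

function loadProcessedIds(): Set<string> {
  if (!existsSync(STATE_FILE)) return new Set();
  return new Set(JSON.parse(readFileSync(STATE_FILE, "utf8")) as string[]);
}

// Returns true if the email is new; records its Message-ID so reruns skip it.
function markProcessed(messageId: string): boolean {
  const seen = loadProcessedIds();
  if (seen.has(messageId)) return false; // already ingested - skip
  seen.add(messageId);
  writeFileSync(STATE_FILE, JSON.stringify([...seen], null, 2));
  return true;
}
```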
Features:
- ✅ Frequency-based scheduling: daily (integrated into orchestrator)
- ✅ Duplicate prevention: tracks Message-IDs to avoid reprocessing
- ✅ Email archiving: keeps the inbox clean, creates an audit trail
- ✅ Timeout handling: 60-second timeout with 2 retries
- ✅ State tracking: included in orchestrator state management
Scripts:
- `scripts/ingest-emails.ts` - Standalone email fetching
- `scripts/parse-emails.ts` - Helper for the Claude Code parsing workflow
- `src/ingest/email-ingestion.ts` - Core email-fetching logic
Usage:
```bash
# Standalone (test only)
bun run scripts/ingest-emails.ts            # Fetch emails
bun run scripts/ingest-emails.ts --dry-run  # Preview

# Integrated (production)
bun run scripts/scrape-all-sources.ts       # Runs email ingestion first (Step 0)
```

Parsing Workflow:
- Emails saved to `data/emails-to-parse/`
- Ask Claude Code: "Parse emails in data/emails-to-parse/ and add events to database"
- Claude Code uses the FREE `tool_agent` (no API costs!)
- Events imported to database with auto-deduplication
Configuration: `config/orchestrator-config.json` → `email_ingestion` section
1B. Web Scraping (✅ AUTOMATED - Standalone orchestrator with frequency scheduling):
Orchestrator: `bun run scripts/scrape-all-sources.ts`
Features:
- ✅ Frequency-based scheduling: daily/weekly/monthly (only scrapes when due; see the sketch after the configuration list)
- ✅ Timeout handling: kills runaway processes (configurable per source)
- ✅ Retry logic: exponential backoff (2s, 4s, 8s)
- ✅ State tracking: `data/scrape-state.json` (timestamps, counts, errors)
- ✅ Rate limiting: configurable delays between sources
- ✅ CLI modes: `--force`, `--source=X`, `--dry-run`
Configuration:
- `config/orchestrator-config.json` - Pipeline config (timeout, retry, frequency, priority)
- `config/scrape-list.json` - Scraper config (sites, URLs, user agent)
- Currently configured: viva.gr (daily), more.com (daily), gazarte.gr (weekly)
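To make the frequency scheduling concrete, here is a sketch of the "is this source due?" decision. The field names (`frequency`, `lastRun`) are assumptions, not the actual schema of `config/orchestrator-config.json` or `data/scrape-state.json`:

```typescript
// Sketch: a source is due when its last successful run is older than
// the interval implied by its configured frequency.
type Frequency = "daily" | "weekly" | "monthly";

const INTERVAL_MS: Record<Frequency, number> = {
  daily: 24 * 60 * 60 * 1000,
  weekly: 7 * 24 * 60 * 60 * 1000,
  monthly: 30 * 24 * 60 * 60 * 1000, // approximation
};

function isDue(frequency: Frequency, lastRunIso?: string): boolean {
  if (!lastRunIso) return true; // never scraped: always due
  return Date.now() - new Date(lastRunIso).getTime() >= INTERVAL_MS[frequency];
}

// e.g. isDue("weekly", state["gazarte.gr"]?.lastRun) - or pass --force to skip the check
```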
Pipeline (3-step execution per source):
1. Scraper: Python scraper (`scripts/scrape-all-sites.py --site X`)
   - Crawls the website URL
   - Saves HTML to `data/html-to-parse/`
2. Parser: Python parser (`scripts/parse_tier1_sites.py`)
   - Extracts events from HTML
   - Saves JSON to `data/parsed-events/`
3. Importer: Bun importer (`scripts/import-X-events.ts`)
   - Imports events to the database
   - Auto-deduplicates by hash(title+date+venue)
Usage:
```bash
# Run all sources that are due (frequency-based)
bun run scripts/scrape-all-sources.ts

# Force all sources (ignore frequency)
bun run scripts/scrape-all-sources.ts --force

# Run specific source
bun run scripts/scrape-all-sources.ts --source=viva.gr

# Preview without executing
bun run scripts/scrape-all-sources.ts --dry-run
```

Automation: ✅ ACTIVE - Cron job running daily at 8 AM

Cron Schedule:

```bash
0 8 * * * cd /Users/chrism/Project\ with\ Claude/AgentAthens/agent-athens && /Users/chrism/.bun/bin/bun run scripts/scrape-all-sources.ts >> logs/scrape-$(date +\%Y\%m\%d).log 2>&1
```

Documentation: See docs/EMAIL-INGESTION-INTEGRATION.md, docs/CRON-AUTOMATION-SETUP.md, and docs/INTEGRATION-COMPLETE.md
1C. Database Upsert (Deduplication & Storage):
- Normalize all collected events (from email + web) to Schema.org format
- For each event:
  - Generate event ID (hash of title+date+venue)
  - Check whether the ID exists in the database
    - If it exists: UPDATE (description, price, URL changes)
    - If new: INSERT (see the sketch below the output example)
- Log results summary
Output:
```
📊 Database Upsert Results:
✅ X new events inserted
🔄 Y events updated (price/description changes)
⏭️ Z duplicates skipped (already current)
```
Total: a dynamic number of new raw events per day in the database (realistically, we don't expect hundreds of new events every day)
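A minimal sketch of this upsert, using Bun's built-in SQLite driver and `node:crypto` for the ID hash. Table and column names are assumptions; the real schema lives in `data/events.sql`:

```typescript
// Sketch: deterministic event ID + insert-or-update against SQLite.
import { Database } from "bun:sqlite";
import { createHash } from "node:crypto";

const db = new Database("data/events.db");

// Event ID = hash(title + date + venue), as described above.
function eventId(title: string, date: string, venue: string): string {
  return createHash("sha256").update(`${title}|${date}|${venue}`).digest("hex");
}

type RawEvent = {
  title: string; date: string; venue: string;
  description: string; price: string; url: string;
};

function upsertEvent(e: RawEvent): "inserted" | "updated" {
  const id = eventId(e.title, e.date, e.venue);
  const exists = db.query("SELECT 1 FROM events WHERE id = ?").get(id);
  if (exists) {
    db.query("UPDATE events SET description = ?, price = ?, url = ? WHERE id = ?")
      .run(e.description, e.price, e.url, id);
    return "updated";
  }
  db.query("INSERT INTO events (id, title, date, venue, description, price, url) VALUES (?, ?, ?, ?, ?, ?, ?)")
    .run(id, e.title, e.date, e.venue, e.description, e.price, e.url);
  return "inserted";
}
```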
AI Description Generator (bun run scripts/enrich-events.ts):
For each event without a full description:
- Build enrichment prompt:
  - Include event metadata (title, type, venue, date, genre)
  - Request exactly 400 words (±20 acceptable)
  - Emphasize cultural context, artist background, what makes it special
  - Avoid marketing fluff; focus on authentic storytelling
- Call `tool_agent` (TODO: schedule calls through the Agent SDK so the pipeline runs standalone, the way we currently run it manually in Claude Code via the internal `tool_agent`):
  - Generate a compelling narrative description
  - Include practical details naturally (time, location, price)
  - Mention Athens neighborhood connections when relevant
  - Never fabricate facts
- Update database:
  - Store in the `full_description` column
  - Update the `updated_at` timestamp
  - Validate word count (must be ~400 words)
- Rate limiting (see the sketch below):
  - 2-second pause between `tool_agent` calls
  - Handle 429 errors gracefully (wait 30s)
  - Log progress and errors
Output: All events have rich 400-word descriptions
Cost: CRITICAL: FREE when using `tool_agent`!
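A sketch of the pacing and 429 handling described above; `enrichOne` stands in for whatever function actually calls `tool_agent`, and the error shape is an assumption:

```typescript
// Sketch: 2s pause between calls, 30s backoff on rate limits.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function enrichAll(
  events: { id: string }[],
  enrichOne: (id: string) => Promise<void>, // hypothetical tool_agent wrapper
): Promise<void> {
  for (const event of events) {
    try {
      await enrichOne(event.id);
      console.log(`Enriched ${event.id}`);
    } catch (err: any) {
      if (err?.status === 429) {
        console.warn("Rate limited (429) - waiting 30s before continuing...");
        await sleep(30_000);
      } else {
        console.error(`Enrichment failed for ${event.id}:`, err);
      }
    }
    await sleep(2_000); // 2-second pause between tool_agent calls
  }
}
```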
Automatic Event Lifecycle (within generate-site.ts):
- Delete expired events (see the sketch below):
  - Remove events older than 1 day (past events)
  - Keep only today's and future events
  - Maintains database size (~300-500 event categories typical)
- Smart date handling:
  - "Today" automatically updates daily
  - "Tomorrow" becomes "today" (no manual updates)
  - Past events disappear automatically
Output: Clean database with only current/future events
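The cleanup step can be a single SQL statement. This sketch assumes the events table stores ISO-8601 dates in a `date` column; the actual schema may differ:

```typescript
// Sketch: delete events that ended before today, keep today/future.
import { Database } from "bun:sqlite";

const db = new Database("data/events.db");

const stale = db
  .query("SELECT COUNT(*) AS n FROM events WHERE date < date('now', '-1 day')")
  .get() as { n: number };

db.query("DELETE FROM events WHERE date < date('now', '-1 day')").run();

console.log(`Deleted ${stale.n} expired events`);
```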
Combinatorial Page Generator (bun run build):
- Load events from database:

  ```typescript
  const allEvents = getAllEvents();
  console.log(`📥 Loaded ${allEvents.length} events`);
  ```
- Generate combinatorial pages (Type × Time × Price × Genre = dynamic number of pages, currently ~315):
  - Core time pages (dynamic count): `/today`, `/tomorrow`, `/this-week`, `/this-weekend`, `/this-month`, `/next-month`, `/all-events`
    - TODO: Consider whether we also need pages with specific values, such as `/november-2025` (which currently appears as "next-month")
  - Type pages (dynamic count: types × time ranges): `/concert-today`, `/exhibition-this-week`, `/cinema-this-weekend`, etc.
  - Price pages (dynamic count: price filters × time ranges): `/open-today`, `/with-ticket-this-week`, etc.
    - Note: Using "open" and "with-ticket" terminology instead of "free/paid"
  - Type + Price pages (dynamic count: types × prices × time ranges): `/open-concert-today`, `/with-ticket-exhibition-this-week`, etc.
  - Genre pages (dynamic count: genres × time ranges × prices): `/jazz-concert-today`, `/open-jazz-concert-this-week`, etc.
- For each page:
  - Filter events matching criteria (type, time, price, genre)
  - Generate HTML with Schema.org markup (Event + CollectionPage)
  - Generate JSON API (same data, different format)
  - Handle empty pages gracefully ("0 events found, check back tomorrow")
  - Add cross-links to related pages
- Generate discovery files:
  - `llms.txt` - AI agent discovery (what this site offers) - TODO: Confirm if this is a good practice
  - `robots.txt` - Search engine crawling rules
  - `sitemap.xml` - All URLs for search engines (dynamic count)
Output: HTML pages + JSON APIs + 3 discovery files (total size ~4.1 MB)
Build time: ~2-5 seconds (Bun is fast!)
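For illustration, here is how the combinatorial enumeration might look, using the dimension values listed in the stats section below (genre pages, which depend on the actual events, are omitted):

```typescript
// Sketch: enumerate Type x Time x Price page slugs.
const types = ["concert", "exhibition", "cinema", "theater", "performance", "workshop"];
const times = ["today", "tomorrow", "this-week", "this-weekend", "this-month", "next-month", "all-events"];
const prices = ["open", "with-ticket"];

const pages: string[] = times.map((t) => `/${t}`); // core time pages

for (const time of times) {
  for (const price of prices) pages.push(`/${price}-${time}`); // price pages
  for (const type of types) {
    pages.push(`/${type}-${time}`); // type pages
    for (const price of prices) pages.push(`/${price}-${type}-${time}`); // type + price pages
  }
}

console.log(`${pages.length} pages before genre combinations`);
```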
Git + Netlify Auto-Deploy:
- Commit changes:

  ```bash
  git add dist/ data/events.db
  git commit -m "chore: Daily update $(date +%Y-%m-%d)

  - X new events added
  - Y events enriched with AI descriptions
  - Z past events removed

  🤖 Automated daily update"
  git push origin main
  ```
- Netlify detects push:
  - Triggers build (instant - just copies files)
  - Atomic deployment (zero downtime)
  - Global CDN distribution
  - SSL/HTTPS automatic
- Site goes live:
  - https://agent-athens.netlify.app
  - All pages updated (dynamic count)
  - Fresh data visible to users and AI agents
Deploy time: ~30 seconds (Netlify build + CDN propagation)
Total pipeline: ~20 minutes (collection β enrichment β generation β deployment)
Note: Designed for Athens (EET/EEST) but configurable for any timezone.
08:00 AM - Email ingestion + Web scraping (PRIORITY - need to develop now)
08:05 AM - AI enrichment (using tool_agent)
08:10 AM - Database cleanup
08:15 AM - Site generation (dynamic page count)
08:20 AM - Git commit + Netlify deploy
08:25 AM - ✅ Live site updated
Current (Manual): Run `bun run build` whenever events are added or updated. "Manual" from the human developer's perspective can mean asking Claude Code to run these tasks; the Agent SDK will be able to run them standalone when called through the SDK.
Future (Automated): macOS launchd triggers full pipeline daily at 8 AM.
- Do we have new events? → Yes: Enrich with AI descriptions
- Are events enriched? → No: Run `scripts/enrich-events.ts`
- Are there past events? → Yes: Auto-cleanup on site generation
- Page has 0 events? → Still generate (show "0 events found" message)
- Database has changes? → Regenerate ALL pages (ensures consistency)
- Deployment ready? → Git push → Netlify auto-deploys
Before you start coding, you'll need to set up a few things manually. These are one-time setup tasks that enable you to own and operate the production infrastructure.
Why: You'll own the codebase and control deployments.
Steps:
- Fork or transfer the repository to your GitHub account:
  - Option A: Fork: https://github.com/ggrigo/agent-athens → Click "Fork"
  - Option B: Transfer: Repo owner transfers ownership to you (Settings → Transfer)
- Clone YOUR repository:

  ```bash
  git clone https://github.com/YOUR-USERNAME/agent-athens.git
  cd agent-athens
  ```

- You now control the `main` branch and all deployments
Why: You'll own the production deployment and domain.
Steps:
- Go to netlify.com and sign up (free tier is plenty)
- Click "Add new site" β "Import an existing project"
- Connect to your GitHub repository (
YOUR-USERNAME/agent-athens) - Configure build settings:
- Build command:
bun run build(or leave empty - we commitdist/) - Publish directory:
dist - Branch:
main
- Build command:
- Click "Deploy site"
- Your production site is live at:
agent-athens.netlify.app(or custom domain)
Auto-Deploy Setup:
- Netlify will auto-deploy every time you push to `main`
- Or use the CLI: `netlify deploy --prod --dir=dist`
Local CLI Setup:
```bash
npm install -g netlify-cli
netlify login
netlify link   # Links to YOUR production site
```

Why: To receive Athens event newsletters for testing email ingestion (Phase 1A).
Steps:
- Create a new Gmail account (e.g., [email protected])
- Enable IMAP in Gmail:
  - Settings → Forwarding and POP/IMAP → Enable IMAP
- Generate an App Password:
  - Google Account → Security → 2-Step Verification → App Passwords
  - Select "Mail" and generate a 16-character password
- Subscribe to Athens event newsletters:
  - This is Athens: thisisathens.org
  - Lifo Guide: lifo.gr/guide
  - Venue newsletters: Six D.O.G.S, Gazarte, Bios, Fuzz Club, SNFCC
- Save your credentials to `.env`:

  ```bash
  cp .env.example .env
  # Edit .env with your email and app password
  ```
Note: Email ingestion is now integrated as Priority 0 in the orchestrator (see Phase 1A above); web scraping alone requires no credentials.
Why: HTML parsing and event enrichment use Claude Code tool_agent (free with your subscription).
Steps:
- Install Claude Code CLI: Follow instructions at claude.ai/code
- Authenticate: `claude login`
- You're ready to use interactive AI features (HTML parsing, enrichment)
What you'll use it for:
- Parsing HTML event pages into structured JSON
- Generating 400-word event descriptions
- Database queries and transformations
```bash
# Install Bun (fast JavaScript runtime)
curl -fsSL https://bun.sh/install | bash

# Verify installation
bun --version
```

```bash
# Clone repository
git clone https://github.com/ggrigo/agent-athens.git
cd agent-athens

# Install dependencies
bun install
```

```bash
# Create SQLite database with schema
bun run scripts/init-database.ts

# Import sample events (optional)
bun run scripts/import-scraped-events.ts
```

```bash
# Generate 400-word descriptions for all events
bun run scripts/enrich-events.ts

# Or enrich in batches (need to decide on tool_agent capacity)
bun run scripts/enrich-5-events.ts
```

```bash
# Build all pages (dynamic count)
bun run build

# Output: dist/ directory with HTML + JSON files
```

```bash
# First time: Connect to Netlify
netlify login
netlify init

# Deploy (or just git push if auto-deploy is configured)
bun run deploy

# Or manually: netlify deploy --prod --dir=dist
```

Note: The following is offered as an example architecture.
```
agent-athens/
├── src/                            # Source code
│   ├── generate-site.ts            # Main site generator (combinatorial logic)
│   ├── types.ts                    # TypeScript types (Event, Filters, etc.)
│   ├── db/                         # Database layer
│   │   └── database.ts             # SQLite queries (insert, update, get events)
│   ├── templates/                  # HTML generation
│   │   └── page.ts                 # Page renderer (Schema.org markup)
│   └── utils/                      # Utilities
│       ├── normalize.ts            # Event normalization (Schema.org format)
│       ├── filters.ts              # Event filtering (type, time, price, genre)
│       └── urls.ts                 # URL building (/open-jazz-concert-today)
├── scripts/                        # Standalone scripts
│   ├── init-database.ts            # Database initialization
│   ├── scrape-events.ts            # Web scraping
│   ├── import-scraped-events.ts    # Import JSON events to DB
│   ├── enrich-events.ts            # AI description generation (all events)
│   └── enrich-5-events.ts          # AI enrichment (batched)
├── data/                           # Data files (gitignored except .sql)
│   ├── events.db                   # SQLite database (gitignored)
│   ├── events.sql                  # Database schema
│   ├── scraped-events.json         # Raw scraped events (gitignored)
│   └── unenriched-events.json      # Events pending enrichment (gitignored)
├── dist/                           # Generated static site (gitignored locally, committed for Netlify)
│   ├── *.html                      # HTML pages (dynamic count)
│   ├── api/*.json                  # JSON API endpoints (dynamic count)
│   ├── llms.txt                    # AI agent discovery
│   ├── robots.txt                  # Search engine rules
│   └── sitemap.xml                 # Search engine sitemap
├── logs/                           # Runtime logs (gitignored)
├── netlify.toml                    # Netlify configuration
├── package.json                    # Bun dependencies
├── tsconfig.json                   # TypeScript configuration
├── .gitignore                      # Git exclusions
├── README.md                       # This file
├── PROJECT_DESCRIPTION.md          # Full technical overview
├── ELEVATOR_PITCH.md               # 30-second + 2-minute pitches
├── IMPLEMENTATION_PLAN.md          # 4-step daily pipeline architecture
├── ENRICHMENT_README.md            # AI enrichment guide
└── COMBINATORIAL_SEO_STRATEGY.md   # SEO/GEO strategy documentation
```
Note: The following values are offered as examples.
- Input: Curated events daily (dynamic count)
- Output: Unique pages (dynamic count, currently ~315)
- Multiplier: Dynamic coverage based on event diversity
- Update Frequency: Daily (8:00 AM Athens time)
Note: These are example values. We need to define the multi-dimensional cube and its accepted values.
- Event Types: 6 (concert, exhibition, cinema, theater, performance, workshop)
- Time Ranges: 7 (today, tomorrow, this-week, this-weekend, this-month, next-month, all-events)
- Price Filters: 2 (open, with-ticket)
- Genres: Top genres per type (dynamic, based on actual events)
Pattern: /{price}-{genre}-{type}-{time}
Note: Using "open" and "with-ticket" terminology β
Examples:
- `/open-jazz-concert-today`
- `/contemporary-art-exhibition-this-week`
- `/with-ticket-electronic-concert-this-weekend`
- `/cinema-this-month`
- `/open-today`
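A sketch of a slug builder for this pattern, where omitted dimensions are simply dropped. The signature is an assumption; the real implementation lives in `src/utils/urls.ts`:

```typescript
// Sketch: build a slug in price -> genre -> type -> time order,
// skipping any dimension that is not set.
function buildUrl(parts: { price?: string; genre?: string; type?: string; time?: string }): string {
  const segments = [parts.price, parts.genre, parts.type, parts.time]
    .filter((s): s is string => Boolean(s));
  return "/" + segments.join("-");
}

buildUrl({ price: "open", genre: "jazz", type: "concert", time: "today" }); // "/open-jazz-concert-today"
buildUrl({ type: "cinema", time: "this-month" });                           // "/cinema-this-month"
buildUrl({ price: "open", time: "today" });                                 // "/open-today"
```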
- Active Events: 300-500 event categories at any time (rolling window)
- Cleanup Policy: Events older than 1 day auto-deleted
- Retention: 90 days historical (future enhancement)
- File Size: 2-5 MB (SQLite)
- Model: Anthropic Agent SDK (`tool_agent`)
- Word Count: 400 words (±20 acceptable)
- Rate Limit: 2 seconds between requests
- Cost: FREE (using `tool_agent`)
Current:
- Manual Scraping: This is Athens, SNFCC, Gazarte, Bios, Six D.O.G.S, Fuzz Club
- Need: List of websites to crawl
- SQLite Database: Persistent event storage
In Development:
- Gmail IMAP: Automated newsletter ingestion ([email protected])
  - Need: List of active newsletters for monitoring and documentation of event catchment scope
Note: We have done minimal analysis. The following is offered as a direction that must be confirmed, and will be updated and modified over time as we learn more about SEO/GEO best practices.
Discovery:
- `llms.txt` - Tells AI agents what this site offers (TODO: Confirm if this is a good practice)
- Schema.org markup - Machine-readable event data (Event + CollectionPage)
- Freshness signals - Explicit "Last updated: Oct 19, 2025" timestamps
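As an illustration of the Schema.org markup, here is a sketch that renders one event as a JSON-LD script tag (the field values and helper name are illustrative, not the site's actual template):

```typescript
// Sketch: render a single Schema.org Event as JSON-LD.
function eventJsonLd(e: { title: string; startDate: string; venue: string; url: string }): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "Event",
    name: e.title,
    startDate: e.startDate, // ISO 8601, e.g. "2025-10-20T21:00:00+03:00"
    location: { "@type": "Place", name: e.venue, address: "Athens, Greece" },
    url: e.url,
  };
  return `<script type="application/ld+json">${JSON.stringify(data, null, 2)}</script>`;
}
```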
Trust Signals:
- Daily updates (freshness = AI trust)
- Structured data (easy to parse)
- Single source (no conflicting data)
- Specific pages (exact intent matching)
Citation Format Example:
User: "What open concerts are in Athens this weekend?"
AI Agent Response:
"According to agent-athens (updated today), there are 3 open concerts
this weekend:
1. Jazz Night at Six D.O.G.S (Friday, Oct 20)
2. Electronic Showcase at Bios (Saturday, Oct 21)
3. Indie Band at Fuzz Club (Sunday, Oct 22)
Source: https://agent-athens.netlify.app/open-concert-this-weekend"
- Keyword-rich URLs (`/open-jazz-concert-today`)
- Semantic HTML with proper headings
- Mobile-responsive design
- Fast loading (static HTML, global CDN)
- Internal linking (related pages)
- Content Validation: Word count checks on AI descriptions (~400 words)
- Date Handling: Automatic cleanup of past events (no stale data)
- Error Handling: Rate limit detection, retry logic, progress logging
- Empty Pages: Graceful handling (show "0 events found" message)
- URL Stability: All URLs always exist (even with 0 events)
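The word-count validation can be as simple as the following sketch (the ±20 tolerance matches the enrichment spec above):

```typescript
// Sketch: accept descriptions of 400 +/- 20 words.
function isValidDescription(text: string, target = 400, tolerance = 20): boolean {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.abs(words - target) <= tolerance;
}
```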
Edit src/generate-site.ts:
```typescript
// Skip site generation
if (process.env.PAUSE_GENERATION === 'true') {
  console.log('⏸️ Generation paused via PAUSE_GENERATION flag');
  process.exit(0);
}
```

```bash
# Delete dist/ and regenerate everything
rm -rf dist/
bun run build
```

```bash
# Revert to previous Netlify deployment
netlify rollback
```

Runtime logs in `logs/`:
- `scrape-YYYY-MM-DD.log` - Web scraping
- `enrich-YYYY-MM-DD.log` - AI enrichment
- `build-YYYY-MM-DD.log` - Site generation
- `deploy-YYYY-MM-DD.log` - Netlify deployment
```bash
# Test database initialization
bun run scripts/init-database.ts

# Test event import
bun run scripts/import-scraped-events.ts

# Test AI enrichment (batched - need to decide on tool_agent capacity)
bun run scripts/enrich-5-events.ts

# Test site generation
bun run build

# Check output
ls -lh dist/*.html | head -10

# Test deployment (dry run)
netlify deploy --dir=dist
# Then check deploy preview URL
```

Create `~/Library/LaunchAgents/com.user.agent-athens.plist`:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.user.agent-athens</string>
<key>ProgramArguments</key>
<array>
<string>/Users/georgios/Documents/Projects/athens-events/agent-athens/daily-update.sh</string>
</array>
<key>StartCalendarInterval</key>
<dict>
<key>Hour</key>
<integer>8</integer>
<key>Minute</key>
<integer>0</integer>
</dict>
<key>StandardOutPath</key>
<string>/Users/georgios/Documents/Projects/athens-events/agent-athens/logs/daily.log</string>
<key>StandardErrorPath</key>
<string>/Users/georgios/Documents/Projects/athens-events/agent-athens/logs/daily.error.log</string>
</dict>
</plist>Create daily-update.sh:
```bash
#!/bin/bash
set -e

cd /Users/georgios/Documents/Projects/athens-events/agent-athens

echo "========================================"
echo "agent-athens Daily Update"
echo "Started: $(date)"
echo "========================================"

# Step 1: Ingest events (PRIORITY - need to develop now)
# echo -e "\n📥 Step 1: Ingesting events..."
# bun run src/ingest/daily-ingestion.ts

# Step 2: Enrich events with AI (using tool_agent)
echo -e "\n🤖 Step 2: Enriching events..."
bun run scripts/enrich-events.ts

# Step 3: Generate site
echo -e "\n📄 Step 3: Generating site..."
bun run build

# Step 4: Deploy to Netlify
echo -e "\n🚀 Step 4: Deploying..."
git add dist/ data/events.db
git commit -m "chore: Daily update $(date +%Y-%m-%d)

🤖 Automated daily update"
git push origin main

echo -e "\n========================================"
echo "✅ Daily update complete!"
echo "Finished: $(date)"
echo "========================================"
```

Make executable:

```bash
chmod +x daily-update.sh
```

Load launchd:

```bash
launchctl load ~/Library/LaunchAgents/com.user.agent-athens.plist
```

v0.1.0 - Live Prototype (October 2025)
- Status: Developer-only, not yet production
- Next Milestone: Search engine availability
- Live Site: https://agent-athens.netlify.app
- GitHub: https://github.com/ggrigo/agent-athens
- For AI Agents: https://agent-athens.netlify.app/llms.txt
- A2A Protocol: TODO - Research "agent card" or similar requirements from the A2A protocol
Current Status: Live prototype with a dynamic page count (currently ~315 pages); daily automation active (cron at 8 AM Athens time)
"When AI agents recommend Athens events, they recommend agent-athens."