
🌊 Tsunami Lesson RAPTOR: AI-Powered Disaster Education System

Hierarchical Knowledge Retrieval for Disaster Prevention Education

An advanced RAG (Retrieval-Augmented Generation) system using RAPTOR algorithm to hierarchically organize and retrieve lessons from the 2011 Great East Japan Earthquake and Tsunami for educational purposes.


🎯 Overview

This system was developed to preserve and pass on the lessons learned from the Great East Japan Earthquake and Tsunami of March 11, 2011, to future generations. Using the RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval) algorithm, it organizes disaster lesson texts into a hierarchy of summaries, so that search and summarization can draw on both detailed passages and their higher-level context.
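The RAPTOR idea described above (embed text chunks, cluster them, summarize each cluster with the LLM, then repeat on the summaries) can be sketched as follows. This is a minimal illustration, not the repository's implementation: `build_raptor_tree`, `Node`, and the `embed`/`cluster`/`summarize` callables are hypothetical names, and in a real pipeline they would be backed by mxbai-embed-large, a clustering step, and granite-code:8b. Note that this sketch numbers layers from the leaves upward (0 = original chunks), whereas the tree statistics below number levels from the root down.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Vector = List[float]


@dataclass
class Node:
    text: str
    embedding: Vector
    layer: int  # 0 = original chunks (leaves); higher layers hold summaries
    children: List["Node"] = field(default_factory=list)


def build_raptor_tree(
    chunks: Sequence[str],
    embed: Callable[[str], Vector],
    cluster: Callable[[List[Vector]], List[int]],
    summarize: Callable[[List[str]], str],
    max_depth: int = 3,
) -> List[Node]:
    """Return every node of the tree: leaves first, root(s) last."""
    layer_nodes = [Node(c, embed(c), 0) for c in chunks]
    all_nodes = list(layer_nodes)
    for layer in range(1, max_depth + 1):
        if len(layer_nodes) <= 1:
            break  # a single node is already the root
        # Group the current layer's nodes by cluster label.
        labels = cluster([n.embedding for n in layer_nodes])
        groups: Dict[int, List[Node]] = {}
        for node, label in zip(layer_nodes, labels):
            groups.setdefault(label, []).append(node)
        # Summarize each cluster into one parent node on the next layer.
        next_layer = []
        for members in groups.values():
            summary = summarize([m.text for m in members])
            next_layer.append(Node(summary, embed(summary), layer, members))
        all_nodes.extend(next_layer)
        layer_nodes = next_layer
    return all_nodes


# Toy stand-ins (hypothetical): length-based "embedding", pairwise "clustering",
# and concatenation instead of an LLM summary.
nodes = build_raptor_tree(
    ["a", "b", "c", "d"],
    embed=lambda text: [float(len(text))],
    cluster=lambda vectors: [i // 2 for i in range(len(vectors))],
    summarize=lambda texts: " + ".join(texts),
)
root = nodes[-1]
print(root.layer, root.text)  # 2 a + b + c + d
```

With `max_depth=3`, the loop adds at most three summary layers above the leaves, which is consistent with the four levels reported under Knowledge Base Statistics.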

🚀 Key Features

  • 🌲 Hierarchical Knowledge Structure: Organizes lessons using the RAPTOR algorithm
  • 🔍 Context-Aware Search: Semantic search that takes hierarchical relationships into account
  • 📝 Automatic Summarization: LLM-powered, context-preserving lesson summaries
  • 📊 Optimized Clustering: Automatic selection of the number of clusters using the silhouette coefficient
  • 🎯 Disaster Education Focused: Specialized prompts and chunking for disaster prevention education
  • 📈 Advanced Visualization: Comprehensive visualization suite with NetworkX, t-SNE, and UMAP
  • 🚀 3D Dynamic Tree Visualization: Interactive 3D RAPTOR tree with cognee-inspired rendering
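One way to make search take hierarchical relationships into account is the collapsed-tree retrieval described in the RAPTOR paper: all nodes, from leaf chunks to high-level summaries, are scored against the query embedding together, so a question can match either a specific lesson or a broader summary. The sketch below assumes that strategy (the repository may instead use tree traversal or a FAISS index); `collapsed_tree_search`, `cosine`, and the toy 2-D vectors are hypothetical stand-ins for real 1024-dimensional embeddings.

```python
import math
from typing import List, Sequence, Tuple

# Each node: (text, layer, embedding); layer 0 = leaf chunk, higher = summary.
TreeNode = Tuple[str, int, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def collapsed_tree_search(
    query: Sequence[float], nodes: List[TreeNode], top_k: int = 3
) -> List[Tuple[float, str, int]]:
    """Score leaves and summaries together and return the top_k matches."""
    scored = [(cosine(query, emb), text, layer) for text, layer, emb in nodes]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


nodes = [
    ("Summary: evacuation lessons", 1, [1.0, 0.0]),
    ("Leaf: Kamaishi evacuation case", 0, [0.9, 0.1]),
    ("Leaf: Onagawa reconstruction case", 0, [0.0, 1.0]),
]
for score, text, layer in collapsed_tree_search([1.0, 0.0], nodes, top_k=2):
    print(f"{score:.3f}  layer={layer}  {text}")
```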

📊 System Architecture

Knowledge Base Statistics

  • Total Content: 50,892 characters of comprehensive disaster lessons
  • Hierarchical Structure: 78-node knowledge tree with 4 depth levels
  • Cluster Distribution: Mean 33.1 documents, median 7.0 documents per cluster
  • Coverage: 20 major categories of disaster lessons

Technical Stack

  • LLM: granite-code:8b (GPU-optimized)
  • Embeddings: mxbai-embed-large (1024 dimensions)
  • Vector Storage: FAISS with hierarchical indexing
  • Clustering: Silhouette-based optimization (max_depth=3)
  • Visualization: NetworkX, matplotlib, seaborn, plotly, t-SNE, UMAP
  • 3D Visualization: Plotly WebGL, cognee-inspired dynamic 3D tree rendering

📚 Data Sources

The knowledge base draws on multiple authoritative sources (see Acknowledgments) and covers the following topics:

Major Topics (20 Categories)

  1. Damage Assessment & Initial Response

    • Earthquake/tsunami overview and casualties (15,900 deaths, 2,525 missing)
    • Emergency response by Self-Defense Forces, fire departments, and police
    • Transportation infrastructure, communication, and supply chain challenges
  2. Tsunami Evacuation Case Studies

    • Kesennuma City, Miyagi: Port facility evacuation
    • Otsuchi Town, Iwate: Disaster response headquarters damage and lessons
    • Iwaki City, Fukushima: Combined nuclear and tsunami disaster response
    • Minamisanriku Town, Miyagi: Disaster Prevention Building tragedy
    • Rikuzentakata City, Iwate: Urban destruction and reconstruction
  3. Disaster Prevention Education Examples

    • Kamaishi City: "Miracle of Kamaishi" and three principles
    • Sendai City: Disaster education supplementary readers and Arahama Elementary ruins
    • Ishinomaki City: Okawa Elementary lessons and school disaster system review
    • Fukushima Prefecture: Radiation education and scientific literacy
  4. Reconstruction Town Planning

    • Rikuzentakata City: 10 m elevation reconstruction project (5 million m³ of soil)
    • Onagawa Town: Compact city and Seapal-Pier Onagawa
    • Higashimatsushima City: Environmental Future City concept and smart housing
    • Natori City Yuriage: Challenges and lessons in resident consensus building
  5. Fukushima Nuclear Accident Details

    • Nuclear plant incident timeline and technical analysis
    • Evacuation procedures and long-term impact assessment
    • Radiation monitoring and safety protocols

📊 Visualization Results

The system generates comprehensive visualizations to understand the hierarchical knowledge structure:

🌲 RAPTOR Tree Structure


Hierarchical Knowledge Tree:

  • Total Nodes: 78 (distributed across 4 levels)
  • Root Level: 1 comprehensive overview node
  • Intermediate Levels: 4 → 16 category-specific summary nodes
  • Leaf Level: 57 specific lesson nodes
  • Node Colors: Depth-based (Root: green, Intermediate: blue, Leaf: yellow)
  • Node Sizes: Proportional to document count
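The depth-based coloring and count-based sizing above can be sketched as below. This assumes a four-level tree (depth 0 = root, depth 3 = leaves) and a linear min-max mapping of document counts into the `node_size_range` of (100, 3000) listed under Visualization Parameters; `depth_color` and `node_sizes` are hypothetical helpers, not functions from the notebook. Strictly proportional sizing would be size = k × count; the min-max mapping used here instead keeps every node inside the configured range.

```python
from typing import List, Tuple


def depth_color(depth: int, max_depth: int = 3) -> str:
    """Root green, intermediate levels blue, leaves yellow (as described above)."""
    if depth == 0:
        return "green"
    return "yellow" if depth == max_depth else "blue"


def node_sizes(
    doc_counts: List[int], size_range: Tuple[float, float] = (100, 3000)
) -> List[float]:
    """Min-max scale document counts into the marker-size range."""
    lo, hi = size_range
    c_min, c_max = min(doc_counts), max(doc_counts)
    if c_max == c_min:
        return [float(lo)] * len(doc_counts)
    return [lo + (c - c_min) / (c_max - c_min) * (hi - lo) for c in doc_counts]


print([depth_color(d) for d in range(4)])  # ['green', 'blue', 'blue', 'yellow']
print(node_sizes([1, 217, 433]))  # [100.0, 1550.0, 3000.0]
```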

📊 Cluster Statistics Analysis


Statistical Distribution:

  • Depth Distribution: Balanced 4-level hierarchy (1→4→16→57)
  • Cluster Size Variation: Range from 1 to 433 documents
  • Mean Cluster Size: 33.1 documents
  • Median Cluster Size: 7.0 documents
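The gap between the mean (33.1) and the median (7.0) indicates a right-skewed size distribution: most clusters are small, while a few high-level nodes (the root alone covers 433 documents) pull the mean up. The sketch below shows how these summary statistics can be computed with Python's standard `statistics` module; `cluster_size_summary` is a hypothetical helper, and the cluster sizes are made up for illustration, not the repository's data.

```python
import statistics
from typing import Dict, List


def cluster_size_summary(sizes: List[int]) -> Dict[str, float]:
    """Range, mean, and median of per-cluster document counts."""
    return {
        "min": min(sizes),
        "max": max(sizes),
        "mean": round(statistics.mean(sizes), 1),
        "median": statistics.median(sizes),
    }


# Hypothetical sizes: several small clusters plus one large root-level cluster.
print(cluster_size_summary([1, 3, 7, 12, 433]))
# {'min': 1, 'max': 433, 'mean': 91.2, 'median': 7}
```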

📈 Clustering Quality Evaluation


Quality Metrics:

  • Silhouette Coefficient: 0.0968 average (higher is better)
  • Davies-Bouldin Index: 2.8291 average (lower is better)
  • Optimal k Selection: k=2 (6 times), k=3 (2 times)
  • Strategy: Silhouette-based automatic optimization
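Silhouette-based selection of k can be sketched as: cluster the embeddings for several candidate values of k, score each partition with the mean silhouette coefficient s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the rest of its own cluster and b(i) is the mean distance to the nearest other cluster, and keep the k with the highest score. The pure-Python sketch below assumes k-means with farthest-point initialization and Euclidean distance; the repository's actual clustering method and candidate range are not shown here, and `kmeans`, `silhouette`, and `select_k` are hypothetical names. In practice, scikit-learn's `silhouette_score` and `davies_bouldin_score` (in `sklearn.metrics`) compute these metrics directly.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, iters: int = 50) -> List[int]:
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient; singleton clusters score 0 (as in scikit-learn)."""
    def mean_dist(i: int, label: int) -> Optional[float]:
        d = [math.dist(points[i], q) for j, q in enumerate(points) if labels[j] == label and j != i]
        return sum(d) / len(d) if d else None

    scores = []
    for i in range(len(points)):
        a = mean_dist(i, labels[i])
        if a is None:
            scores.append(0.0)
            continue
        b = min(mean_dist(i, other) for other in set(labels) if other != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def select_k(points: List[Point], candidates=(2, 3, 4)) -> Tuple[Optional[int], float]:
    """Return the candidate k with the highest mean silhouette score."""
    best_k, best_score = None, -math.inf
    for k in candidates:
        if k >= len(points):
            continue
        labels = kmeans(points, k)
        if len(set(labels)) < 2:
            continue  # silhouette is undefined for a single cluster
        score = silhouette(points, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Two well-separated toy blobs standing in for document embeddings.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(select_k(points))
```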

🎯 High-Dimensional Visualization


Dimensionality Reduction:

  • t-SNE: 1024D → 2D projection that preserves local neighborhood structure
  • UMAP: 1024D → 2D projection that better preserves global structure, with clearer cluster boundaries
  • Interactive Versions: Available in HTML format for detailed exploration

🛠️ Installation & Setup

Prerequisites

  1. Install Ollama (https://ollama.ai/)
  2. Download Required Models:
ollama pull mxbai-embed-large    # Embedding model (1024 dimensions)
ollama pull granite-code:8b      # LLM model (8B parameters)
  3. Install Python Dependencies:
pip install langchain langchain-community langchain-ollama
pip install faiss-cpu scikit-learn numpy pandas
pip install matplotlib seaborn plotly networkx
pip install umap-learn jupyter notebook

Verification

Test the installation:

python test_simple.py

Expected output:

✅ Embedding model works! Vector dimension: 1024
✅ LLM model works! Response: Hello! ...
✅ All models are working correctly!
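The contents of `test_simple.py` are not reproduced in this README; a check producing output like the above might look like the following sketch. `verify_models` is a hypothetical helper that takes the embedding and generation functions as arguments, so it can be exercised without a running Ollama server. With the models installed you could pass, for example, `OllamaEmbeddings(model="mxbai-embed-large").embed_query` and `lambda p: ChatOllama(model="granite-code:8b").invoke(p).content` from the `langchain_ollama` package.

```python
from typing import Callable, List


def verify_models(
    embed: Callable[[str], List[float]],
    generate: Callable[[str], str],
    expected_dim: int = 1024,
) -> List[str]:
    """Return status lines like the expected output above; raise on failure."""
    vector = embed("test")
    if len(vector) != expected_dim:
        raise ValueError(f"Expected {expected_dim}-dim embeddings, got {len(vector)}")
    reply = generate("Say hello.")
    if not reply.strip():
        raise ValueError("LLM returned an empty response")
    return [
        f"✅ Embedding model works! Vector dimension: {len(vector)}",
        f"✅ LLM model works! Response: {reply[:20]}...",
        "✅ All models are working correctly!",
    ]


# Fake callables stand in for the real models in this example.
for line in verify_models(lambda text: [0.0] * 1024, lambda prompt: "Hello! How can I help?"):
    print(line)
```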

🚀 Quick Start

Interactive Search

python quick_start.py

Sample Queries:

  • "What were effective tsunami evacuation actions?"
  • "What are the success factors of the Kamaishi Miracle?"
  • "What is the counterpart system?"
  • "What's important for disaster information transmission?"
  • "What are the challenges in reconstruction town planning?"

Batch Processing

python quick_start.py batch

Full Evaluation

python tsunami_lesson_raptor.py

📊 Visualization Suite

Jupyter Notebook Analysis

jupyter notebook raptor_tree_visualization_tsunami.ipynb

The notebook contains 67 cells including:

  • Library imports and model configuration
  • Tree construction and data loading
  • NetworkX structure visualization
  • Statistical analysis and distribution plots
  • High-dimensional embedding projections
  • Interactive plot generation

Generated Visualizations

All visualizations are saved to the output_figure/ directory:

| Visualization Type | File | Purpose | Key Information |
|---|---|---|---|
| 🌲 Tree Structure | 01_tree_structure.png | Hierarchical overview | 78 nodes, 4 levels, parent-child relationships |
| 📊 Statistics | 02_cluster_statistics.png | Numerical analysis | Depth distribution, cluster sizes |
| 📈 Evaluation | 03_evaluation_metrics.png | Clustering quality | Silhouette scores, DBI, k-selection |
| 🎯 t-SNE | 04_tsne_visualization.png | Local structure | Neighborhood preservation, cluster boundaries |
| 🗺️ UMAP | 05_umap_visualization.png | Global structure | Topology preservation, clear separation |
| 🔍 Multi-layer | 06_multi_layer_comparison.png | Layer comparison | Abstraction process, level structures |
| 🚀 Ultra-Fast 3D | 07_ultra_fast_3d_raptor.html | 3D tree structure | Interactive WebGL, GPU-optimized |
| ⚡ Instant 3D | 08_instant_3d_raptor.html | One-click visualization | Instant 3D tree (~3.8s total) |

🔬 Technical Details

🚀 3D Dynamic Visualization System

Ultra-Fast Interactive 3D Tree Rendering

The system includes a 3D visualization module, inspired by cognee, that delivers interactive RAPTOR tree exploration in under 4 seconds:

Key Features

  • ⚡ Lightning Performance: 0.022s 3D generation, 3.8s total execution
  • 🎮 Interactive Controls: Mouse-based rotation, zoom, and pan
  • 🌈 Hierarchical Color Coding: Level-based visual distinction (Red→Orange→Yellow→Green)
  • 📊 Dynamic Node Sizing: Document count-based proportional sizing
  • 🔗 Edge Visualization: Parent-child relationship mapping
  • 💾 Multi-format Output: Interactive HTML + static PNG exports

3D Structure Overview

78-Node Hierarchical Tree:
├── Level 0 (Root): 1 node - 433 documents (Red)
├── Level 1: 4 nodes - 108 docs/node (Orange)
├── Level 2: 16 nodes - 27 docs/node (Yellow)
└── Level 3: 57 nodes - 7 docs/node (Green)

Performance Metrics

PERFORMANCE_3D = {  # Python identifiers cannot start with a digit
    "initialization": "0.002s (cached)",
    "coordinate_calculation": "0.022s", 
    "webgl_rendering": "<0.2s",
    "total_execution": "3.759s",
    "speed_improvement": "1500x faster than baseline"
}
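Stage timings like those in the performance dictionary above can be collected with `time.perf_counter`. The sketch below is an assumption about how such numbers could be measured, not the notebook's code; `timed` is a hypothetical helper, and the list comprehension is a placeholder workload standing in for the real coordinate calculation.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(stage: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of a block under timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failed stages still show up.
        timings[stage] = time.perf_counter() - start


timings: Dict[str, float] = {}
with timed("coordinate_calculation", timings):
    coords = [(i, 2 * i) for i in range(10_000)]  # placeholder workload
print({stage: f"{secs:.3f}s" for stage, secs in timings.items()})
```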

Technical Implementation

  • NumPy Vectorization: Batch coordinate calculation
  • Plotly WebGL: GPU-accelerated browser rendering
  • Pre-computed Data: Fixed tree structure for instant loading
  • Conditional Initialization: Skips setup steps that have already run, preventing duplicate processing
  • Memory Optimization: Efficient data structures
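As an illustration of batch coordinate calculation with NumPy, the sketch below places each tree level on a horizontal ring and computes all positions of a level in one vectorized call, with depth mapped to the z axis. The ring layout, `level_gap`, `base_radius`, and the function name `tree_coordinates_3d` are assumptions for illustration; the notebook's actual layout may differ. The resulting array could be fed to a Plotly `Scatter3d` trace for WebGL rendering.

```python
import numpy as np


def tree_coordinates_3d(level_sizes, level_gap=1.0, base_radius=1.0):
    """Return an (N, 3) array of node positions, one horizontal ring per level."""
    blocks = []
    for depth, n in enumerate(level_sizes):
        # All angles for this level at once (vectorized), evenly spaced on the ring.
        angles = np.linspace(0.0, 2.0 * np.pi, num=n, endpoint=False)
        # A single node (the root) sits on the axis; deeper levels get wider rings.
        radius = 0.0 if n == 1 else base_radius * (depth + 1)
        blocks.append(np.column_stack([
            radius * np.cos(angles),
            radius * np.sin(angles),
            np.full(n, -depth * level_gap),
        ]))
    return np.vstack(blocks)


coords = tree_coordinates_3d([1, 4, 16, 57])  # level sizes reported above
print(coords.shape)  # (78, 3)
```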

RAPTOR Configuration

RAPTOR_CONFIG = {
    "max_depth": 3,                    # Maximum tree depth
    "selection_strategy": "silhouette", # Clustering strategy
    "chunk_size": 500,                 # Text chunk size
    "chunk_overlap": 100,              # Overlap between chunks
    "embedding_dimension": 1024,       # mxbai-embed-large dimension
    "temperature": 0.0,                # LLM temperature for consistency
}
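With `chunk_size=500` and `chunk_overlap=100`, a fixed-size window advances 400 characters per chunk, so consecutive chunks share 100 characters of context. The sketch below is a plain character-window splitter written for illustration; `chunk_text` is a hypothetical name and the repository's actual splitter is not shown here. LangChain's `RecursiveCharacterTextSplitter`, for example, takes the same two parameters but also prefers to break at separators such as paragraph breaks and newlines.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print([len(c) for c in chunk_text("x" * 1200)])  # [500, 500, 400]
```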

Visualization Parameters

VISUALIZATION_CONFIG = {
    "tree_layout": "spring",           # NetworkX layout algorithm
    "figure_size": (15, 10),          # Figure size (inches)
    "node_size_range": (100, 3000),   # Node size range
    "edge_alpha": 0.6,                # Edge transparency
    "dpi": 300,                       # Image resolution
    "tsne_perplexity": 30,           # t-SNE parameter
    "umap_n_neighbors": 15,          # UMAP parameter
}

# 3D Visualization Configuration
RAPTOR_3D_CONFIG = {
    "instant_generation": True,       # One-click execution
    "webgl_optimization": True,       # GPU browser rendering
    "color_scheme": {                # Hierarchical colors
        0: "#FF0000",  # Root (Red)
        1: "#FF8000",  # Level 1 (Orange)  
        2: "#FFFF00",  # Level 2 (Yellow)
        3: "#00FF00"   # Level 3 (Green)
    },
    "node_size_scale": 20,           # Document count scaling
    "camera_position": (1.2, 1.2, 1.2),  # Default 3D view
    "background": "black",           # Scene background
    "export_formats": ["html", "png"]    # Output file types
}

📈 Performance Metrics

System Performance

  • Tree Construction: ~2-3 minutes (initial build)
  • Search Response: <1 second (after model loading)
  • GPU Utilization: 100% (granite-code:8b)
  • Memory Usage: ~4GB RAM, ~8GB VRAM
  • 3D Visualization: 3.8s total execution (instant after initialization)
  • Interactive Rendering: WebGL real-time performance

Quality Metrics

  • Retrieval Accuracy: Context-aware hierarchical search
  • Clustering Quality: Silhouette coefficient 0.0968
  • Knowledge Coverage: 20 major disaster lesson categories
  • Language Support: Japanese (primary), English (documentation)

🚀 Quick Start

3D Visualization Demo

To experience the interactive 3D RAPTOR tree visualization:

  1. Open Notebook: raptor_tree_visualization_tsunami.ipynb
  2. Run One-Click Cell: Execute the "🚀 ワンクリック超高速起動" (one-click ultra-fast launch) cell
  3. Interactive Exploration: Use mouse to rotate, zoom, and explore the 78-node structure
  4. Export Options: Automatically generates HTML and PNG files
# One-click 3D visualization execution
# Cell execution time: ~3.8 seconds
# Output: Interactive 3D tree with WebGL rendering
instant_fig = instant_3d_raptor()
instant_fig.show()

Files Generated

  • output_figure/08_instant_3d_raptor.html - Interactive 3D visualization
  • output_figure/08_instant_3d_raptor.png - Static image export

🎯 Applications

Educational Use Cases

  1. Disaster Prevention Training: Structured lesson delivery
  2. Academic Research: Systematic disaster lesson analysis
  3. Policy Development: Evidence-based disaster preparedness
  4. Community Education: Accessible lesson sharing

Technical Applications

  1. Knowledge Management: Hierarchical information organization
  2. RAG System Development: Advanced retrieval techniques
  3. Visualization Research: Multi-modal data representation
  4. AI Education: RAPTOR algorithm implementation example
  5. 3D Interactive Learning: Immersive knowledge exploration
  6. Web-based Education: Browser-compatible 3D tree navigation

🔄 Future Development

  • Multilingual Support: English, Chinese, Korean translations
  • Web Interface: Browser-based interactive system
  • Knowledge Base Expansion: Global tsunami lessons integration
  • Multimedia Integration: Images, videos, audio materials
  • Real-time Updates: Dynamic knowledge base updates
  • Mobile Application: Smartphone-optimized interface
  • API Development: RESTful API for integration
  • Evaluation Dataset: JQaRA-format 30-question dataset

🤝 Contributing

This project aims to preserve disaster lessons for educational purposes. We welcome contributions in:

  • Knowledge Base Expansion: Additional disaster lesson sources
  • Evaluation Dataset Creation: Standardized assessment materials
  • Documentation Improvement: Enhanced guides and tutorials
  • Internationalization: Translation and localization
  • Visualization Enhancement: New analysis techniques
  • Performance Optimization: Speed and efficiency improvements

📄 License

This project is freely available for educational and research purposes. For commercial use, please check the licensing terms of the underlying LLM models.

🙏 Acknowledgments

This system is based on research and activities from:

  • Cabinet Office Reconstruction Agency: "Lessons and Know-how Collection for Reconstruction"
  • Tohoku University International Research Institute of Disaster Science: Prof. Fumihiko Imamura's research
  • Kamaishi City Board of Education: Disaster prevention education practices
  • Storytellers nationwide: Community-based lesson sharing activities

We pray for those who lost their lives in the Great East Japan Earthquake and hope these lessons will contribute to future disaster prevention.


Version: 1.0 - International Edition
Created: October 20, 2025
Project: Tsunami Lesson RAPTOR System

📞 Contact

For questions, suggestions, or collaboration opportunities, please open an issue in this repository.

Repository: https://github.com/tk-yasuno/tsunami-lesson-rag