Pangea is an Elixir, Phoenix LiveView distributed OTP monitoring system designed to be customised and deployed within private networks (VPC/VPN). It provides a solid foundation for building your own distributed monitoring solution with real-time dashboards and geographically distributed workers.

WillC33/pangea

Pangea

An extensible distributed network monitoring platform built with Elixir and Phoenix LiveView

Pangea is a foundational monitoring system designed to be customised and deployed within private networks (VPC/VPN). It provides a solid foundation for building your own distributed monitoring solution with real-time dashboards and geographically distributed workers.

⚠️ Important: This is a reference implementation requiring configuration and customisation before production use. It is not a ready-to-deploy solution.

Features

Core Features

  • Private Network First: Designed for VPC/VPN deployment with future TLS clustering support
  • Account-based Access: Secure access control via configurable account codes
  • Real-time Dashboard: Live updates via Phoenix LiveView with retro terminal UI
  • Extensible Foundation: Modular architecture for adding custom monitoring types
  • SQLite Persistence: Configurable data storage with migration support

Current Monitoring Types

  • HTTP/HTTPS Monitoring: Response times, status codes, availability checks
  • ICMP Ping Tests: Network latency and connectivity testing
  • DNS Resolution (Experimental): Domain resolution performance monitoring
  • TLS Certificate Monitoring (Experimental): Certificate validation and expiry tracking

Deployment Models

  • Private Network: VPC/VPN deployment with cookie-based clustering (current)
  • Public Internet (Planned): TLS-secured clustering for public deployments
  • Hybrid: Mixed private/public worker deployments

Prerequisites & Dependencies

System Requirements

  • Elixir 1.14+
  • Erlang/OTP 25+
  • Docker and Docker Compose (for development)
  • Private network connectivity between nodes (VPC/VPN)

Configuration Requirements

Before deployment, you must configure:

  • Email service (SMTP credentials for transactional emails)
  • Secret keys and security tokens
  • Account access codes
  • Database paths and persistence settings
  • Node networking and discovery

This is not a plug-and-play solution - expect to modify configuration files, environment variables, and potentially code before deployment.

Quick Start

Development Setup

  1. Clone the repository

    git clone <repository-url>
    cd pangea
  2. Install dependencies

    mix deps.get
  3. Setup the database

    mix ecto.setup
  4. Start the coordinator node

    mix phx.server
  5. Access the dashboard

    Open http://localhost:4000 in your browser

Docker Development

The included Docker Compose setup provides a complete development environment:

# Start development cluster
make up

# Monitor logs
make logs

# Access coordinator dashboard
open http://localhost:4000

Architecture & Design

Network Security Model

Current: Private Network Deployment

  • Uses Erlang distributed clustering with shared cookies
  • Requires VPC or VPN connectivity between nodes
  • No public port exposure required
  • Suitable for internal infrastructure monitoring

Planned: Public Internet Deployment

  • TLS-secured clustering for public network deployment
  • Certificate-based authentication between nodes
  • Encrypted metric transmission
  • Suitable for global monitoring networks

Authentication Model

  • Account Code System: Users must provide a valid account code to access the system
  • Session Management: Standard Phoenix authentication with configurable session duration
  • Worker Authentication: Secure node-to-node communication via clustering protocol
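
As a rough illustration of the account code check described above, the sketch below compares a submitted code against the configured list in constant time. The module name `AccountCodeSketch` and its function are hypothetical and not taken from Pangea's codebase; `:crypto.hash_equals/2` needs OTP 25+, which matches the stated prerequisites.

```elixir
defmodule AccountCodeSketch do
  @moduledoc "Hypothetical helper; not part of Pangea's codebase."

  # Returns true when `candidate` matches any configured code.
  # Both sides are hashed first so :crypto.hash_equals/2 always receives
  # equal-length binaries (it raises on mismatched sizes).
  def valid?(candidate, configured_codes) when is_binary(candidate) do
    candidate_hash = :crypto.hash(:sha256, candidate)

    configured_codes
    |> Enum.reject(&is_nil/1)
    |> Enum.map(fn code ->
      :crypto.hash_equals(candidate_hash, :crypto.hash(:sha256, code))
    end)
    |> Enum.any?()
  end

  # Anything that is not a binary (for example nil) is never valid.
  def valid?(_candidate, _configured_codes), do: false
end
```

Every configured code is compared before the result is reduced, so the time taken does not reveal which entry matched.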

Data Persistence

  • SQLite Database: Configurable storage location for historical metrics
  • In-Memory Caching: Recent metrics cached for real-time dashboard updates
  • Migration Support: Ecto migrations for schema updates
  • Backup Friendly: Single file database for easy backup/restore
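
The in-memory caching of recent metrics could look something like the sketch below: a bounded, per-target list held in an Agent. The module name, the cap of 100 entries, and the function names are assumptions for illustration; the actual cache in Pangea may be structured differently.

```elixir
defmodule RecentMetricsSketch do
  @moduledoc "Hypothetical bounded in-memory cache; names are illustrative."
  use Agent

  @max_per_target 100

  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end

  # Prepend the newest metric and drop anything beyond the cap.
  def put(target, metric, max \\ @max_per_target) do
    Agent.update(__MODULE__, fn cache ->
      Map.update(cache, target, [metric], fn existing ->
        Enum.take([metric | existing], max)
      end)
    end)
  end

  # Newest first; empty list for unknown targets.
  def recent(target) do
    Agent.get(__MODULE__, &Map.get(&1, target, []))
  end
end
```

Capping each target's list keeps memory bounded regardless of how long a node runs, while older history stays available from SQLite.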

Configuration Guide

Essential Configuration

1. Email Service Configuration

# config/runtime.exs
config :pangea, Pangea.Mailer,
  adapter: Swoosh.Adapters.SMTP,
  relay: System.get_env("SMTP_RELAY"),
  username: System.get_env("SMTP_USERNAME"),
  password: System.get_env("SMTP_PASSWORD"),
  # Environment variables are strings; the SMTP adapter expects an integer port
  port: String.to_integer(System.get_env("SMTP_PORT") || "587")

2. Account Access Codes

# config/runtime.exs
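account_codes = [
  System.get_env("ACCOUNT_CODE_1"),
  System.get_env("ACCOUNT_CODE_2")
  # Add more as needed
]

# Drop unset variables so nil never ends up in the list of valid codes
config :pangea, :account_codes, Enum.reject(account_codes, &is_nil/1)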
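
The Docker Swarm example further down passes a single `ACCOUNT_CODES` variable rather than numbered ones. If you prefer that style, a comma-separated value could be parsed roughly as follows. The module name and the comma delimiter are assumptions, not something the project defines.

```elixir
defmodule AccountCodesEnvSketch do
  # Hypothetical helper: splits a comma-separated value into clean codes.
  def parse(nil), do: []

  def parse(raw) when is_binary(raw) do
    raw
    |> String.split(",", trim: true)
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
  end
end

# The same pipeline can be inlined in config/runtime.exs, for example
# starting from System.get_env("ACCOUNT_CODES").
```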

3. Security Secrets

# Generate secure secrets
mix phx.gen.secret

4. Database Configuration

# config/runtime.exs
config :pangea, Pangea.Repo,
  database: System.get_env("DATABASE_PATH") || "/data/pangea.db",
  pool_size: String.to_integer(System.get_env("POOL_SIZE") || "5")

Deployment Options

Docker Compose (Development)

The included Docker Compose setup provides a complete development environment:

# Start development cluster
make up

# Monitor logs
make logs

# Access coordinator dashboard
open http://localhost:4000

Infrastructure as Code

Terraform Example (requires customisation)

# Example VPC deployment
module "pangea_vpc" {
  source = "./terraform/modules/vpc"
  
  coordinator_instance_type = "t3.medium"
  worker_instance_type      = "t3.small"
  worker_regions           = ["us-east-1", "eu-west-1", "ap-southeast-1"]
  
  account_codes = var.account_codes
  smtp_config   = var.smtp_config
}

Docker Swarm Example (requires customisation)

# docker-stack.yml
version: '3.8'
services:
  coordinator:
    image: your-registry/pangea:latest
    environment:
      NODE_TYPE: coordinator
      ACCOUNT_CODES: ${ACCOUNT_CODES}
      SMTP_RELAY: ${SMTP_RELAY}
    networks:
      - pangea_private
    deploy:
      placement:
        constraints: [node.role == manager]

  worker-eu:
    image: your-registry/pangea:latest
    environment:
      NODE_TYPE: worker
      REGION: eu-west-1
    networks:
      - pangea_private

Manual Deployment

  1. Prepare Configuration Files

    cp config/runtime.exs.example config/runtime.exs
    # Edit with your specific configuration
  2. Build Release

    MIX_ENV=prod mix release
  3. Deploy to Target Infrastructure

    # Copy release to target servers
    # Configure systemd services or Docker containers
    # Ensure network connectivity between nodes

Usage

Creating Monitoring Jobs

  1. Access the dashboard at /dash
  2. Click [+ NEW MONITOR]
  3. Configure:
    • Target: Domain or IP address to monitor
    • Type: ping, http, dns, or tls
    • Worker: Specific worker or all workers
  4. Click [EXECUTE] to start monitoring
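
For anyone scripting job creation instead of using the form, the three fields above might be validated along these lines. `MonitorJobSketch` and its return shapes are hypothetical; only the four type names come from this README.

```elixir
defmodule MonitorJobSketch do
  @moduledoc "Hypothetical validation for the form fields described above."

  # Explicit mapping avoids creating atoms from user input.
  @types %{"ping" => :ping, "http" => :http, "dns" => :dns, "tls" => :tls}

  def validate(%{target: target, type: type} = params) when is_binary(target) do
    with trimmed when trimmed != "" <- String.trim(target),
         {:ok, type_atom} <- Map.fetch(@types, type) do
      # :all stands in for "all workers" when no worker is given (assumed default).
      {:ok, %{target: trimmed, type: type_atom, worker: Map.get(params, :worker, :all)}}
    else
      "" -> {:error, :missing_target}
      :error -> {:error, :unknown_type}
    end
  end

  def validate(_params), do: {:error, :invalid_params}
end
```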

Monitoring Types

HTTP Monitoring

  • Monitors: Response time, status codes, response size
  • Target format: example.com or api.example.com/health
  • Automatically uses HTTPS
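
A minimal sketch of how a bare target could be turned into an HTTPS URL and how a status code could be mapped to an availability result. The module name, the upgrade of `http://` targets, and the 200..399 "up" range are assumptions made for illustration.

```elixir
defmodule HttpTargetSketch do
  # Builds the URL to request; bare targets get an https:// prefix.
  def to_url("https://" <> _ = url), do: url
  def to_url("http://" <> rest), do: "https://" <> rest
  def to_url(target) when is_binary(target), do: "https://" <> target

  # Maps a status code to a coarse availability result (range is assumed).
  def classify(status) when status in 200..399, do: :up
  def classify(status) when is_integer(status), do: :down
end
```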

Ping Monitoring

  • Monitors: Network latency, packet loss
  • Target format: example.com or IP address
  • Uses ICMP echo requests

DNS Monitoring

  • Monitors: Resolution time, DNS response status
  • Target format: example.com
  • Tests A record resolution
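
One way to time an A record lookup is with Erlang's built-in `:inet_res` resolver client, as sketched below. The module name and the shape of the returned map are hypothetical; Pangea's own DNS monitor may measure this differently.

```elixir
defmodule DnsCheckSketch do
  # Times an A record lookup using Erlang's built-in resolver client.
  def check(domain, timeout_ms \\ 2_000) when is_binary(domain) do
    name = String.to_charlist(domain)

    {elapsed_us, addresses} =
      :timer.tc(fn ->
        :inet_res.lookup(name, :in, :a, timeout: timeout_ms, retry: 1)
      end)

    %{
      target: domain,
      # An empty answer covers NXDOMAIN, timeouts, and unreachable resolvers.
      status: if(addresses == [], do: :error, else: :ok),
      addresses: Enum.map(addresses, &(&1 |> :inet.ntoa() |> to_string())),
      resolution_time_us: elapsed_us
    }
  end
end
```

`:inet_res.lookup/4` returns an empty list on any failure, so this sketch cannot tell a missing record apart from a timeout; `:inet_res.resolve/4` would be needed for that.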

TLS Monitoring

  • Monitors: Certificate validity, handshake time
  • Target format: example.com:443
  • Checks certificate expiration
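
Once a certificate's expiry time has been extracted, the expiration check reduces to date arithmetic. The sketch below assumes the expiry is already available as a `DateTime`; the module name and the 30-day warning threshold are illustrative choices, not values from Pangea.

```elixir
defmodule CertExpirySketch do
  @warn_days 30

  # Classifies a certificate by its expiry time relative to the current time.
  def classify(%DateTime{} = not_after), do: classify(not_after, DateTime.utc_now())

  # Same check against an explicit `now`, which makes the logic testable.
  def classify(%DateTime{} = not_after, %DateTime{} = now) do
    seconds_left = DateTime.diff(not_after, now, :second)
    days_left = div(seconds_left, 86_400)

    cond do
      seconds_left <= 0 -> {:expired, days_left}
      days_left < @warn_days -> {:expiring_soon, days_left}
      true -> {:ok, days_left}
    end
  end
end
```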

Viewing Results

  • Live Dashboard: Real-time status and metrics
  • Drill-down View: Detailed history and response time charts
  • ASCII Charts: Visual representation of performance trends
  • Job Management: Start/stop monitoring jobs per worker
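
To give a feel for the ASCII chart idea, here is a small sketch that renders response times as horizontal bars. It is not the dashboard's actual rendering code; the module name and bar character are assumptions.

```elixir
defmodule AsciiChartSketch do
  # Renders one "#" bar per sample, scaled so the largest value fills `width`.
  def bars(samples, width \\ 20) when is_list(samples) and width > 0 do
    max = Enum.max(samples, &>=/2, fn -> 0 end)

    Enum.map(samples, fn value ->
      len = if max > 0, do: round(value / max * width), else: 0
      String.pad_trailing(String.duplicate("#", len), width) <> " " <> to_string(value)
    end)
  end
end
```

Calling `AsciiChartSketch.bars([10, 20, 4], 10) |> Enum.each(&IO.puts/1)` prints one line per sample with the raw value after the bar.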

Extending Pangea

Adding Custom Monitoring Types

# lib/pangea/monitoring/custom_monitor.ex
defmodule Pangea.Monitoring.CustomMonitor do
  alias Pangea.Metrics.CustomResult
  
  @check_interval 60_000
  
  def start_link(target, worker_pid) do
    Task.start_link(fn -> monitor_loop(target, worker_pid) end)
  end
  
  defp monitor_loop(target, worker_pid) do
    result = perform_custom_check(target)
    GenServer.cast(worker_pid, {:custom_completed, target, result})
    Process.sleep(@check_interval)
    monitor_loop(target, worker_pid)
  end
  
  defp perform_custom_check(target) do
    # Implement your custom monitoring logic
    %CustomResult{
      target: target,
      worker_node: node(),
      status: :ok,
      timestamp: DateTime.utc_now(),
      custom_metric: "your_value"
    }
  end
end
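
The monitor above reports each result with `GenServer.cast/2`, so whatever process receives it needs a matching `handle_cast/2` clause. A minimal receiver could look like this; `WorkerSketch` is a hypothetical stand-in, and the real worker in Pangea will do more than keep the latest result.

```elixir
defmodule WorkerSketch do
  @moduledoc "Hypothetical receiver for the {:custom_completed, target, result} cast."
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)

  # Returns the most recent result recorded for `target`, or nil.
  def latest(worker, target), do: GenServer.call(worker, {:latest, target})

  @impl true
  def init(state), do: {:ok, state}

  # Keeps only the newest result per target.
  @impl true
  def handle_cast({:custom_completed, target, result}, state) do
    {:noreply, Map.put(state, target, result)}
  end

  @impl true
  def handle_call({:latest, target}, _from, state) do
    {:reply, Map.get(state, target), state}
  end
end
```

If you pass `self()` from an IEx session instead of a GenServer pid, the casts arrive as plain `{:"$gen_cast", message}` tuples in the shell's mailbox and can be inspected with `flush()`.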

Custom Metric Types

# lib/pangea/metrics/custom_result.ex
defmodule Pangea.Metrics.CustomResult do
  @derive Jason.Encoder
  @enforce_keys [:target, :worker_node, :timestamp]
  defstruct [:target, :worker_node, :status, :timestamp, :custom_metric]
  
  @type t :: %__MODULE__{
    target: String.t(),
    worker_node: atom(),
    status: :ok | :error,
    timestamp: DateTime.t(),
    custom_metric: any()
  }
end

defimpl Pangea.Metrics.Pushable, for: Pangea.Metrics.CustomResult do
  def validate(%Pangea.Metrics.CustomResult{target: nil}), do: {:error, :missing_target}
  def validate(_), do: :ok
end
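
For readers new to protocols, the following self-contained sketch shows the same pattern end to end: a protocol with a `validate/1` function, a struct, and an implementation. All names here are hypothetical stand-ins; the real `Pangea.Metrics.Pushable` protocol may define additional functions.

```elixir
defprotocol PushableSketch do
  @doc "Returns :ok or {:error, reason} for a metric struct."
  def validate(metric)
end

defmodule LatencyResultSketch do
  defstruct [:target, :latency_ms]
end

defimpl PushableSketch, for: LatencyResultSketch do
  # Clauses are tried in order: a missing target is reported first.
  def validate(%{target: nil}), do: {:error, :missing_target}
  def validate(%{latency_ms: ms}) when is_number(ms) and ms >= 0, do: :ok
  def validate(_), do: {:error, :invalid_latency}
end
```

Dispatching on the struct type lets each metric carry its own validation rules without the caller needing a `case` over every metric module.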

Configuration Customisation

Environment-Specific Settings

# config/prod.exs or config/staging.exs
import Config

# Custom monitoring intervals
config :pangea, :monitoring_intervals,
  http: 30_000,
  ping: 30_000,
  dns: 60_000,
  tls: 3_600_000,
  custom: 120_000

# Regional worker configuration
config :pangea, :worker_regions, [
  "us-east-1": "worker-use1@internal.example.com",
  "eu-west-1": "worker-euw1@internal.example.com",
  "ap-southeast-1": "worker-apse1@internal.example.com"
]
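
Settings like these are read at runtime through `Application.get_env/3`. A monitor might look up its interval as sketched below; the module name and the 60-second fallback are assumptions.

```elixir
defmodule IntervalConfigSketch do
  @default_ms 60_000

  # Looks up the configured interval for a monitor type, with a fallback.
  def interval_for(type) when is_atom(type) do
    :pangea
    |> Application.get_env(:monitoring_intervals, [])
    |> Keyword.get(type, @default_ms)
  end
end
```

Reading the value on each call, rather than capturing it in a module attribute, means changes made in `config/runtime.exs` take effect without recompiling.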

Development & Customisation

Project Structure

lib/
├── pangea/
│   ├── NodeManager/          # Coordinator and worker logic
│   ├── Monitoring/           # Monitor implementations (HTTP, Ping, DNS, TLS)
│   ├── Metrics/              # Metric data structures and validation
│   ├── Telemetry/            # Event handling and metric emission
│   └── Accounts/             # User authentication and account codes
├── pangea_web/
│   ├── live/                 # LiveView dashboard components
│   ├── controllers/          # Phoenix HTTP controllers
│   └── components/           # Reusable UI components
config/
├── config.exs               # Base configuration
├── dev.exs                  # Development settings
├── prod.exs                 # Production settings (requires customisation)
└── runtime.exs              # Runtime configuration (requires customisation)

Development Workflow

# Clone and setup
git clone <your-fork>
cd pangea
mix deps.get
mix ecto.setup

# Development with live reloading
iex -S mix phx.server

# Run tests
mix test

# Format code
mix format

# Generate documentation
mix docs

Testing Custom Monitors

# Debug specific worker nodes
make debug-coordinator
make debug-nyc
make debug-london

# Test custom monitoring logic
iex> Pangea.Monitoring.CustomMonitor.start_link("target.example.com", self())

# Reload changes in development
make reload

Security Considerations

Private Network Deployment (Current)

  • Cookie-based clustering: Ensure Erlang cookies are kept secret
  • VPC/VPN only: Do not expose clustering ports (4369, 9100+) to the public internet
  • Account codes: Use strong, randomly generated account access codes
  • Database security: Protect SQLite database files with appropriate file permissions

Future Public Internet Deployment

  • TLS clustering: Certificate-based node authentication
  • Encrypted metrics: All data transmission will be encrypted
  • Certificate management: Proper PKI infrastructure required
  • Network security: Additional firewall and intrusion detection recommended

General Security

  • Secret management: Use proper secret management tools (HashiCorp Vault, AWS Secrets Manager)
  • Regular updates: Keep Elixir/Erlang and dependencies updated
  • Access logging: Monitor authentication attempts and access patterns
  • Backup encryption: Encrypt database backups

Troubleshooting

Common Issues

Workers not connecting

  • Check Erlang cookie matches between coordinator and workers
  • Verify network connectivity on EPMD port 4369 and on the distribution ports (9100+)
  • Ensure hostnames resolve correctly
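
The checks above can be partly automated from an IEx session on the coordinator. This sketch only uses standard `Node` functions; the module name and the returned reasons are hypothetical.

```elixir
defmodule ClusterDiagSketch do
  # Reports why a worker might be unreachable from the local node.
  def check(worker) when is_atom(worker) do
    cond do
      # Started without --name/--sname: distribution is off entirely.
      not Node.alive?() -> {:error, :local_node_not_distributed}
      worker in Node.list() -> {:ok, :already_connected}
      Node.ping(worker) == :pong -> {:ok, :connected}
      # Cookie mismatch, blocked ports, and DNS failures all end up here.
      true -> {:error, :unreachable}
    end
  end
end
```

An `{:error, :unreachable}` result does not say which of the three causes applies, so compare `Node.get_cookie()` on both nodes and test the ports separately.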

High memory usage

  • Recent metrics are cached in memory - restarting a node clears its cache
  • Consider reducing monitoring frequency
  • Monitor number of active jobs
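
Before restarting anything, it can help to see where the memory is going. The sketch below summarises the figures from `:erlang.memory/0`; the module name and the choice of categories are illustrative.

```elixir
defmodule MemoryReportSketch do
  # Summarises VM memory in megabytes using :erlang.memory/0.
  def summary do
    for {kind, bytes} <- :erlang.memory(),
        kind in [:total, :processes, :ets, :binary],
        into: %{} do
      {kind, Float.round(bytes / 1_048_576, 1)}
    end
  end
end
```

A large `:ets` or `:processes` figure relative to `:total` points at cached data or process state, while a large `:binary` figure often indicates big payloads being held by long-lived processes.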

Database locked errors

  • SQLite doesn't handle high concurrency well
  • Consider PostgreSQL for production deployments
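
If you stay on SQLite, write-ahead logging and a longer busy timeout may reduce lock errors. The fragment below assumes `Pangea.Repo` uses the `Ecto.Adapters.SQLite3` adapter from the `ecto_sqlite3` package, which this README does not confirm; the values are starting points rather than recommendations.

```elixir
# config/runtime.exs (fragment; assumes Ecto.Adapters.SQLite3)
import Config

config :pangea, Pangea.Repo,
  # Write-ahead logging lets readers proceed while a write is in progress
  journal_mode: :wal,
  # Wait up to 5 seconds for a lock before returning a busy error
  busy_timeout: 5_000,
  # SQLite allows one writer at a time, so a small pool is usually enough
  pool_size: 5
```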

Debug Commands

# Check cluster status
Node.list()

# View active monitoring jobs
Pangea.NodeManager.Jobs.list_active_jobs()

# Check worker connectivity
Node.ping(:"worker-nyc@hostname")

Roadmap

Short Term

  • TLS Clustering: Certificate-based authentication for public internet deployment
  • Enhanced DNS Monitoring: Support for different record types (MX, CNAME, TXT)
  • Improved TLS Checks: Certificate chain validation and expiry alerting
  • Configuration Templates: Example configurations for common deployment scenarios

Medium Term

  • Historical Analytics: Long-term trend analysis and reporting
  • Alerting Framework: Pluggable notification system (email, webhooks, Slack)
  • API Endpoints: REST API for external integrations and automation
  • Worker Auto-discovery: Automatic worker registration and health checking

Long Term

  • Custom Monitor Plugins: Hot-swappable monitoring modules
  • Geographic Visualization: Map-based worker and target visualization
  • Multi-tenant Support: Organization-level isolation and management
  • Performance Optimization: Enhanced clustering and metric aggregation

Support & Contributing

Getting Help

  • Issues: Report bugs and request features via GitHub Issues
  • Documentation: Check the code documentation with mix docs
  • Examples: See config/*.example files for configuration templates

Contributing Guidelines

This is an extensible foundation - contributions should focus on:

  • Core functionality improvements: Better clustering, monitoring accuracy
  • New monitoring types: Additional protocols and service checks
  • Security enhancements: Authentication, encryption, access control
  • Documentation: Setup guides, configuration examples, deployment patterns

Before contributing:

  1. Fork the repository and create a feature branch
  2. Ensure tests pass (mix test)
  3. Follow Elixir formatting standards (mix format)
  4. Add documentation for new features
  5. Consider backward compatibility for configuration changes

Development Support

  • Use GitHub Discussions for architecture questions
  • Tag issues with appropriate labels (bug, enhancement, documentation)
  • Include minimal reproduction cases for bug reports
  • Provide configuration examples for feature requests

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.


⚠️ Important Disclaimers:

  • This is a foundational example requiring significant customisation
  • No support is provided for production deployments
  • You are responsible for security, scaling, and maintenance
  • Test thoroughly in your specific environment before production use

Built with Elixir, Phoenix LiveView, and distributed systems patterns.
