An extensible distributed network monitoring platform built with Elixir and Phoenix LiveView
Pangea is a foundational monitoring system designed to be customised and deployed within private networks (VPC/VPN). It provides a starting point for building your own distributed monitoring solution, with real-time dashboards and geographically distributed workers.
- Private Network First: Designed for VPC/VPN deployment with future TLS clustering support
- Account-based Access: Secure access control via configurable account codes
- Real-time Dashboard: Live updates via Phoenix LiveView with retro terminal UI
- Extensible Foundation: Modular architecture for adding custom monitoring types
- SQLite Persistence: Configurable data storage with migration support
- HTTP/HTTPS Monitoring: Response times, status codes, availability checks
- ICMP Ping Tests: Network latency and connectivity testing
- DNS Resolution (Experimental): Domain resolution performance monitoring
- TLS Certificate Monitoring (Experimental): Certificate validation and expiry tracking
- Private Network: VPC/VPN deployment with cookie-based clustering (current)
- Public Internet (Planned): TLS-secured clustering for public deployments
- Hybrid: Mixed private/public worker deployments
- Elixir 1.14+
- Erlang/OTP 25+
- Docker and Docker Compose (for development)
- Private network connectivity between nodes (VPC/VPN)
Before deployment, you must configure:
- Email service (SMTP credentials for transactional emails)
- Secret keys and security tokens
- Account access codes
- Database paths and persistence settings
- Node networking and discovery
This is not a plug-and-play solution - expect to modify configuration files, environment variables, and potentially code before deployment.
- Clone the repository
  git clone <repository-url>
  cd pangea
- Install dependencies
  mix deps.get
- Set up the database
  mix ecto.setup
- Start the coordinator node
  mix phx.server
- Access the dashboard
  Open http://localhost:4000 in your browser
The included Docker Compose setup provides a complete development environment:
# Start development cluster
make up
# Monitor logs
make logs
# Access coordinator dashboard
open http://localhost:4000
Current: Private Network Deployment
- Uses Erlang distributed clustering with shared cookies (see the sketch after this list)
- Requires VPC or VPN connectivity between nodes
- No public port exposure required
- Suitable for internal infrastructure monitoring
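As a rough illustration of what cookie-based clustering looks like from a running node (the node names, addresses, and cookie value below are placeholders, not defaults shipped with Pangea):
# In an IEx session on the coordinator (all values illustrative):
Node.set_cookie(:"replace-with-a-long-random-cookie")  # must match every worker's cookie
Node.connect(:"worker-nyc@10.0.1.15")                   # reachable only over the VPC/VPN
Node.list()                                             # connected workers appear here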
Planned: Public Internet Deployment
- TLS-secured clustering for public network deployment
- Certificate-based authentication between nodes
- Encrypted metric transmission
- Suitable for global monitoring networks
- Account Code System: Users must provide a valid account code to access the system (see the sketch after this list)
- Session Management: Standard Phoenix authentication with configurable session duration
- Worker Authentication: Secure node-to-node communication via clustering protocol
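A minimal, hypothetical sketch of how the account-code check can be wired to the :account_codes setting configured later in this README (the module and function names are illustrative, not Pangea's actual API):
defmodule Pangea.Accounts.AccessCode do
  # Hypothetical helper: a submitted code is accepted only if it matches one of
  # the codes configured under :account_codes in config/runtime.exs.
  def valid?(code) when is_binary(code) do
    :pangea
    |> Application.get_env(:account_codes, [])
    |> Enum.reject(&is_nil/1)
    |> Enum.member?(code)
  end
end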
- SQLite Database: Configurable storage location for historical metrics
- In-Memory Caching: Recent metrics cached for real-time dashboard updates
- Migration Support: Ecto migrations for schema updates
- Backup Friendly: Single file database for easy backup/restore
1. Email Service Configuration
# config/runtime.exs
config :pangea, Pangea.Mailer,
adapter: Swoosh.Adapters.SMTP,
relay: System.get_env("SMTP_RELAY"),
username: System.get_env("SMTP_USERNAME"),
password: System.get_env("SMTP_PASSWORD"),
port: String.to_integer(System.get_env("SMTP_PORT") || "587")
2. Account Access Codes
# config/runtime.exs
config :pangea, :account_codes, [
System.get_env("ACCOUNT_CODE_1"),
System.get_env("ACCOUNT_CODE_2")
# Add more as needed
]
3. Security Secrets
# Generate secure secrets
mix phx.gen.secret
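The generated value is typically supplied through an environment variable and wired to the endpoint's secret_key_base; a minimal sketch, assuming the conventional PangeaWeb.Endpoint module name:
# config/runtime.exs (sketch; PangeaWeb.Endpoint is assumed, not confirmed by this README)
config :pangea, PangeaWeb.Endpoint,
  secret_key_base: System.fetch_env!("SECRET_KEY_BASE")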
4. Database Configuration
# config/runtime.exs
config :pangea, Pangea.Repo,
database: System.get_env("DATABASE_PATH") || "/data/pangea.db",
pool_size: String.to_integer(System.get_env("POOL_SIZE") || "5")
Terraform Example (requires customisation)
# Example VPC deployment
module "pangea_vpc" {
  source = "./terraform/modules/vpc"
  coordinator_instance_type = "t3.medium"
  worker_instance_type = "t3.small"
  worker_regions = ["us-east-1", "eu-west-1", "ap-southeast-1"]
  account_codes = var.account_codes
  smtp_config = var.smtp_config
}
Docker Swarm Example (requires customisation)
# docker-stack.yml
version: '3.8'
services:
  coordinator:
    image: your-registry/pangea:latest
    environment:
      NODE_TYPE: coordinator
      ACCOUNT_CODES: ${ACCOUNT_CODES}
      SMTP_RELAY: ${SMTP_RELAY}
    networks:
      - pangea_private
    deploy:
      placement:
        constraints: [node.role == manager]
  worker-eu:
    image: your-registry/pangea:latest
    environment:
      NODE_TYPE: worker
      REGION: eu-west-1
    networks:
      - pangea_private
networks:
  pangea_private:
    driver: overlay  # private overlay network shared by coordinator and workers
- Prepare Configuration Files
  cp config/runtime.exs.example config/runtime.exs
  # Edit with your specific configuration
- Build Release
  MIX_ENV=prod mix release
- Deploy to Target Infrastructure
  # Copy release to target servers
  # Configure systemd services or Docker containers
  # Ensure network connectivity between nodes
- Access the dashboard at /dash
- Click [+ NEW MONITOR]
- Configure:
  - Target: Domain or IP address to monitor
  - Type: ping, http, dns, or tls
  - Worker: Specific worker or all workers
- Click [EXECUTE] to start monitoring
HTTP Monitoring
- Monitors: Response time, status codes, response size
- Target format: example.com or api.example.com/health
- Automatically uses HTTPS
Ping Monitoring
- Monitors: Network latency, packet loss
- Target format: example.com or IP address
- Uses ICMP echo requests
DNS Monitoring
- Monitors: Resolution time, DNS response status
- Target format: example.com
- Tests A record resolution
TLS Monitoring
- Monitors: Certificate validity, handshake time
- Target format: example.com:443
- Checks certificate expiration
- Live Dashboard: Real-time status and metrics
- Drill-down View: Detailed history and response time charts
- ASCII Charts: Visual representation of performance trends
- Job Management: Start/stop monitoring jobs per worker
# lib/pangea/monitoring/custom_monitor.ex
defmodule Pangea.Monitoring.CustomMonitor do
  alias Pangea.Metrics.CustomResult

  @check_interval 60_000

  def start_link(target, worker_pid) do
    Task.start_link(fn -> monitor_loop(target, worker_pid) end)
  end

  defp monitor_loop(target, worker_pid) do
    result = perform_custom_check(target)
    GenServer.cast(worker_pid, {:custom_completed, target, result})
    Process.sleep(@check_interval)
    monitor_loop(target, worker_pid)
  end

  defp perform_custom_check(target) do
    # Implement your custom monitoring logic
    %CustomResult{
      target: target,
      worker_node: node(),
      status: :ok,
      timestamp: DateTime.utc_now(),
      custom_metric: "your_value"
    }
  end
end
# lib/pangea/metrics/custom_result.ex
defmodule Pangea.Metrics.CustomResult do
  @derive Jason.Encoder
  @enforce_keys [:target, :worker_node, :timestamp]
  defstruct [:target, :worker_node, :status, :timestamp, :custom_metric]

  @type t :: %__MODULE__{
          target: String.t(),
          worker_node: atom(),
          status: :ok | :error,
          timestamp: DateTime.t(),
          custom_metric: any()
        }
end
defimpl Pangea.Metrics.Pushable, for: Pangea.Metrics.CustomResult do
  def validate(%Pangea.Metrics.CustomResult{target: nil}), do: {:error, :missing_target}
  def validate(_), do: :ok
end
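A quick way to exercise the custom metric before wiring it into a worker is from an iex -S mix session (values illustrative):
result = %Pangea.Metrics.CustomResult{
  target: "target.example.com",
  worker_node: node(),
  status: :ok,
  timestamp: DateTime.utc_now(),
  custom_metric: "your_value"
}
Pangea.Metrics.Pushable.validate(result)
#=> :ok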
Environment-Specific Settings
# config/prod.exs or config/staging.exs
import Config
# Custom monitoring intervals
config :pangea, :monitoring_intervals,
http: 30_000,
ping: 30_000,
dns: 60_000,
tls: 3_600_000,
custom: 120_000
# Regional worker configuration
config :pangea, :worker_regions, [
"us-east-1": "worker-use1@internal.example.com",
"eu-west-1": "worker-euw1@internal.example.com",
"ap-southeast-1": "worker-apse1@internal.example.com"
]
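For illustration, the regional mapping above is an ordinary keyword list, so it can be read back like this (how Pangea consumes it internally may differ):
regions = Application.get_env(:pangea, :worker_regions, [])
Keyword.get(regions, :"eu-west-1")
#=> "worker-euw1@internal.example.com"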
lib/
├── pangea/
│ ├── NodeManager/ # Coordinator and worker logic
│ ├── Monitoring/ # Monitor implementations (HTTP, Ping, DNS, TLS)
│ ├── Metrics/ # Metric data structures and validation
│ ├── Telemetry/ # Event handling and metric emission
│ └── Accounts/ # User authentication and account codes
├── pangea_web/
│ ├── live/ # LiveView dashboard components
│ ├── controllers/ # Phoenix HTTP controllers
│ └── components/ # Reusable UI components
config/
├── config.exs # Base configuration
├── dev.exs # Development settings
├── prod.exs # Production settings (requires customisation)
└── runtime.exs # Runtime configuration (requires customisation)
# Clone and setup
git clone <your-fork>
cd pangea
mix deps.get
mix ecto.setup
# Development with live reloading
iex -S mix phx.server
# Run tests
mix test
# Format code
mix format
# Generate documentation
mix docs
# Debug specific worker nodes
make debug-coordinator
make debug-nyc
make debug-london
# Test custom monitoring logic
iex> Pangea.Monitoring.CustomMonitor.start_link("target.example.com", self())
# Reload changes in development
make reload
- Cookie-based clustering: Ensure Erlang cookies are kept secret
- VPC/VPN only: Do not expose clustering ports (4369, 9100+) to public internet
- Account codes: Use strong, randomly generated account access codes
- Database security: Protect SQLite database files with appropriate file permissions
- TLS clustering: Certificate-based node authentication
- Encrypted metrics: All data transmission will be encrypted
- Certificate management: Proper PKI infrastructure required
- Network security: Additional firewall and intrusion detection recommended
- Secret management: Use proper secret management tools (HashiCorp Vault, AWS Secrets Manager)
- Regular updates: Keep Elixir/Erlang and dependencies updated
- Access logging: Monitor authentication attempts and access patterns
- Backup encryption: Encrypt database backups
Workers not connecting
- Check Erlang cookie matches between coordinator and workers
- Verify network connectivity on EPMD port 4369
- Ensure hostnames resolve correctly
High memory usage
- Metrics are stored in memory - restart nodes to clear
- Consider reducing monitoring frequency
- Monitor number of active jobs
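Two quick checks from a remote console can show whether memory pressure tracks the job count (this assumes list_active_jobs/0, shown under the debugging commands below, returns a list):
:erlang.memory(:total)                               # total bytes currently allocated by the VM
length(Pangea.NodeManager.Jobs.list_active_jobs())   # number of active monitoring jobs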
Database locked errors
- SQLite doesn't handle high concurrency well
- Consider PostgreSQL for production deployments
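If you do switch, the change is largely confined to the Repo configuration plus the adapter declared in Pangea.Repo itself; a minimal sketch, assuming a DATABASE_URL-style Postgres setup:
# config/runtime.exs (sketch; DATABASE_URL is an assumed variable name, and Pangea.Repo
# would also need adapter: Ecto.Adapters.Postgres in its `use Ecto.Repo`)
config :pangea, Pangea.Repo,
  url: System.get_env("DATABASE_URL"),
  pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")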
# Check cluster status
Node.list()
# View active monitoring jobs
Pangea.NodeManager.Jobs.list_active_jobs()
# Check worker connectivity
Node.ping(:"worker-nyc@hostname")- TLS Clustering: Certificate-based authentication for public internet deployment
- Enhanced DNS Monitoring: Support for different record types (MX, CNAME, TXT)
- Improved TLS Checks: Certificate chain validation and expiry alerting
- Configuration Templates: Example configurations for common deployment scenarios
- Historical Analytics: Long-term trend analysis and reporting
- Alerting Framework: Pluggable notification system (email, webhooks, Slack)
- API Endpoints: REST API for external integrations and automation
- Worker Auto-discovery: Automatic worker registration and health checking
- Custom Monitor Plugins: Hot-swappable monitoring modules
- Geographic Visualization: Map-based worker and target visualization
- Multi-tenant Support: Organization-level isolation and management
- Performance Optimization: Enhanced clustering and metric aggregation
- Issues: Report bugs and request features via GitHub Issues
- Documentation: Check the code documentation with mix docs
- Examples: See config/*.example files for configuration templates
This is an extensible foundation - contributions should focus on:
- Core functionality improvements: Better clustering, monitoring accuracy
- New monitoring types: Additional protocols and service checks
- Security enhancements: Authentication, encryption, access control
- Documentation: Setup guides, configuration examples, deployment patterns
Before contributing:
- Fork the repository and create a feature branch
- Ensure tests pass (mix test)
- Follow Elixir formatting standards (mix format)
- Add documentation for new features
- Consider backward compatibility for configuration changes
- Use GitHub Discussions for architecture questions
- Tag issues with appropriate labels (bug, enhancement, documentation)
- Include minimal reproduction cases for bug reports
- Provide configuration examples for feature requests
This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.
- This is a foundational example requiring significant customisation
- No support is provided for production deployments
- You are responsible for security, scaling, and maintenance
- Test thoroughly in your specific environment before production use
Built with Elixir, Phoenix LiveView, and distributed systems patterns.