Ansible playbooks for deploying a production-ready RKE2 Kubernetes cluster with High Availability (HA) configuration.
# 1. Clone repository
git clone <repository-url>
cd kubernetes-cookbook
# 2. Configure environment
cp .env.example .env
vim .env # Edit configuration
# ⚠️ IMPORTANT: Update these credentials in .env
# - RKE2_TOKEN: Change to a strong random value
# 3. Configure inventory
cp inventory/hosts.yml.example inventory/hosts.yml
vim inventory/hosts.yml # Add your server IPs and SSH credentials
# 4. Setup SSH keys (if not already done)
ssh-keygen -t rsa -b 4096
ssh-copy-id root@<master-ip>
ssh-copy-id root@<worker-ip>
# 5. Test connectivity
set -a && source .env && set +a
ansible -i inventory/hosts.yml all -m ping
# 6. Deploy RKE2 cluster
ansible-playbook -i inventory/hosts.yml site.yml
# 7. Get kubeconfig
scp root@<master-ip>:/etc/rancher/rke2/rke2.yaml ~/.kube/rke2-config
# Edit the file and change server: https://127.0.0.1:6443 to your master IP
export KUBECONFIG=~/.kube/rke2-config
kubectl get nodes

RKE2 (Rancher Kubernetes Engine 2) is a CNCF-certified Kubernetes distribution focused on security and compliance. These playbooks automate the deployment of an HA RKE2 cluster with multiple control plane nodes.
- ✅ RKE2 HA Cluster: Multi-master Kubernetes cluster with etcd
- ✅ Environment-based Configuration: All settings via `.env` file
- ✅ Secure Credentials Management: Tokens not in git
- ✅ Private Registry Support: Multiple registry authentication
- ✅ Automated Installation: Complete cluster setup with Ansible
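All of these settings flow from `.env` into the configuration files the roles render on each node. As a rough illustration only (the authoritative template is `roles/rke2-server/templates/config.yaml.j2`), the first master's `/etc/rancher/rke2/config.yaml` might end up looking like this with the example values used throughout this README:

# /etc/rancher/rke2/config.yaml on the first master (illustrative sketch)
token: "your-secure-random-token-here"    # RKE2_TOKEN
cni: canal                                # RKE2_CNI
cluster-cidr: 10.42.0.0/16                # RKE2_CLUSTER_CIDR
service-cidr: 10.43.0.0/16                # RKE2_SERVICE_CIDR
tls-san:
  - 192.168.1.100                         # RKE2_API_IP (Load Balancer VIP)
  - rke2.example.com                      # RKE2_TLS_SAN_EXTRA
# Additional masters and all agents also set:
# server: https://192.168.1.100:9345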
Master Nodes (Control Plane):
- CPU: 2 cores
- RAM: 4GB
- Disk: 50GB
- Quantity: 1 node (minimum) or 3 nodes (for HA)
Worker Nodes:
- CPU: 2 cores
- RAM: 4GB
- Disk: 50GB
- Quantity: 2+ nodes
- RHEL/CentOS 7.x, 8.x
- Rocky Linux 8.x, 9.x
- Ubuntu 18.04, 20.04, 22.04, 24.04
- Debian 10, 11
- All nodes must have internet connectivity to download RKE2
- Nodes must be able to communicate with each other
- Load Balancer for RKE2 API Server (recommended for production)
Master Nodes:
- 9345/tcp - RKE2 supervisor API
- 6443/tcp - Kubernetes API
- 10250/tcp - Kubelet metrics
- 2379-2380/tcp - etcd
- 8472/udp - VXLAN (Canal/Flannel)
- 4789/udp - VXLAN (Flannel)
- 9098/tcp - Canal (Calico health check)
- 9099/tcp - Canal (Felix health check)
Worker Nodes:
- 10250/tcp - Kubelet metrics
- 8472/udp - VXLAN (Canal/Flannel)
- 4789/udp - VXLAN (Flannel)
All Nodes:
- 30000-32767/tcp - NodePort Services
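If firewalld is active on the nodes, the master-node ports above can be opened roughly as follows (a sketch only; trim the list for workers, or translate to ufw/iptables as appropriate):

# Example: open master-node ports with firewalld (run on each master)
sudo firewall-cmd --permanent \
  --add-port=9345/tcp --add-port=6443/tcp --add-port=10250/tcp \
  --add-port=2379-2380/tcp --add-port=9098/tcp --add-port=9099/tcp \
  --add-port=30000-32767/tcp --add-port=8472/udp --add-port=4789/udp
sudo firewall-cmd --reload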
# macOS
brew install ansible

# Ubuntu/Debian
sudo apt update
sudo apt install ansible -y

# RHEL/CentOS
sudo yum install ansible -y
# or
sudo dnf install ansible -y

# Install required Ansible collections
ansible-galaxy collection install ansible.posix
ansible-galaxy collection install community.general

# Copy example file
cp .env.example .env
# Edit important variables
vim .env

Key variables to configure:
# API Server Load Balancer VIP or DNS
RKE2_API_IP="192.168.1.100"
# Cluster token - MUST CHANGE THIS
# Note: Use a simple password string, not K10<hash>::<user>:<pass> format
RKE2_TOKEN="your-secure-random-token-here"
# Network Plugin
RKE2_CNI="canal" # canal, calico, cilium, flannel
# CIDR Ranges
RKE2_CLUSTER_CIDR="10.42.0.0/16"
RKE2_SERVICE_CIDR="10.43.0.0/16"
# Additional TLS SANs (comma-separated)
RKE2_TLS_SAN_EXTRA="rke2.example.com,kubernetes.example.com"
# Container Registry Configuration (JSON format for multiple registries)
REGISTRIES_JSON='[{"url":"registry.company.com","username":"admin","password":"secret"}]'
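These variables reach Ansible through the process environment, which is why every run starts with `set -a && source .env && set +a`. As a sketch of the mechanism (the real `group_vars/all.yml` may use different variable names), values are typically picked up with environment lookups:

# group_vars/all.yml (illustrative sketch; see the actual file for exact names)
rke2_token: "{{ lookup('env', 'RKE2_TOKEN') }}"
rke2_api_ip: "{{ lookup('env', 'RKE2_API_IP') }}"
rke2_cni: "{{ lookup('env', 'RKE2_CNI') | default('canal', true) }}"
rke2_cluster_cidr: "{{ lookup('env', 'RKE2_CLUSTER_CIDR') | default('10.42.0.0/16', true) }}"
rke2_service_cidr: "{{ lookup('env', 'RKE2_SERVICE_CIDR') | default('10.43.0.0/16', true) }}"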
# Copy example file
cp inventory/hosts.yml.example inventory/hosts.yml
# Edit with your node information
vim inventory/hosts.yml

Example configuration:
all:
  children:
    master:
      hosts:
        k8s-master-01:
          ansible_host: 192.168.1.101
          ansible_user: root
          ansible_ssh_private_key_file: ~/.ssh/id_rsa
    worker:
      hosts:
        k8s-worker-01:
          ansible_host: 192.168.1.111
          ansible_user: root
          ansible_ssh_private_key_file: ~/.ssh/id_rsa
        k8s-worker-02:
          ansible_host: 192.168.1.112
          ansible_user: root
          ansible_ssh_private_key_file: ~/.ssh/id_rsa

SSH Options:
# Option 1: SSH Key (Recommended)
ansible_ssh_private_key_file: ~/.ssh/id_rsa
# Option 2: Password
ansible_ssh_pass: your_password
# Option 3: Sudo password (if non-root user)
ansible_become_pass: your_sudo_password

# Generate SSH key if not exists
ssh-keygen -t rsa -b 4096
# Copy SSH key to all nodes
ssh-copy-id root@192.168.1.101
ssh-copy-id root@192.168.1.111
ssh-copy-id root@192.168.1.112
# Test SSH connectivity
ssh root@192.168.1.101

# Load environment variables
set -a && source .env && set +a
# Test with SSH key
ansible -i inventory/hosts.yml all -m ping
# Or test with password
ansible -i inventory/hosts.yml all -m ping --ask-pass

# Check syntax
ansible-playbook -i inventory/hosts.yml site.yml --syntax-check
# Dry run (no changes made)
ansible-playbook -i inventory/hosts.yml site.yml --check
# List tasks to be executed
ansible-playbook -i inventory/hosts.yml site.yml --list-tasks

# Load environment variables first
set -a && source .env && set +a
# Full cluster installation
ansible-playbook -i inventory/hosts.yml site.yml
# With verbose output (for debugging)
ansible-playbook -i inventory/hosts.yml site.yml -v
# or -vv, -vvv, -vvvv for more details

# Install master nodes only
ansible-playbook -i inventory/hosts.yml site.yml --limit master
# Install worker nodes only
ansible-playbook -i inventory/hosts.yml site.yml --limit worker
# Install specific node only
ansible-playbook -i inventory/hosts.yml site.yml --limit k8s-master-01

# SSH to master node
ssh root@<master-ip>
# Check cluster nodes
kubectl get nodes -o wide
# Check all pods across namespaces
kubectl get pods -A -o wide
# Check RKE2 service status
systemctl status rke2-server # on master
systemctl status rke2-agent   # on worker

Check cluster status:
# Check nodes
kubectl get nodes -o wide
# Check system pods
kubectl get pods -A
# Check RKE2 version
kubectl version

For production HA setup, configure a Load Balancer for the RKE2 API server:
# /etc/haproxy/haproxy.cfg
frontend rke2_api_frontend
    bind *:6443
    mode tcp
    option tcplog
    default_backend rke2_api_backend

frontend rke2_supervisor_frontend
    bind *:9345
    mode tcp
    option tcplog
    default_backend rke2_supervisor_backend

backend rke2_api_backend
    mode tcp
    option tcp-check
    balance roundrobin
    server master-01 192.168.1.101:6443 check
    server master-02 192.168.1.102:6443 check
    server master-03 192.168.1.103:6443 check

backend rke2_supervisor_backend
    mode tcp
    option tcp-check
    balance roundrobin
    server master-01 192.168.1.101:9345 check
    server master-02 192.168.1.102:9345 check
    server master-03 192.168.1.103:9345 check

# /etc/nginx/nginx.conf
stream {
    upstream rke2_api {
        least_conn;
        server 192.168.1.101:6443 max_fails=3 fail_timeout=5s;
        server 192.168.1.102:6443 max_fails=3 fail_timeout=5s;
        server 192.168.1.103:6443 max_fails=3 fail_timeout=5s;
    }
    upstream rke2_supervisor {
        least_conn;
        server 192.168.1.101:9345 max_fails=3 fail_timeout=5s;
        server 192.168.1.102:9345 max_fails=3 fail_timeout=5s;
        server 192.168.1.103:9345 max_fails=3 fail_timeout=5s;
    }
    server {
        listen 6443;
        proxy_pass rke2_api;
    }
    server {
        listen 9345;
        proxy_pass rke2_supervisor;
    }
}
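The load balancer itself should not become a single point of failure. One common pattern, not covered by these playbooks, is to run HAProxy or nginx on two hosts and float the `RKE2_API_IP` between them with keepalived. A minimal sketch, assuming the VIP 192.168.1.100 and an interface named eth0:

# /etc/keepalived/keepalived.conf (illustrative sketch)
vrrp_instance RKE2_API {
    state MASTER                 # BACKUP on the second load balancer
    interface eth0               # adjust to the host's interface
    virtual_router_id 51
    priority 100                 # use a lower priority on the BACKUP host
    advert_int 1
    virtual_ipaddress {
        192.168.1.100            # RKE2_API_IP
    }
}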
# Copy from master node
scp root@192.168.1.101:/etc/rancher/rke2/rke2.yaml ~/.kube/rke2-config
# Update server address in kubeconfig (replace 127.0.0.1 with Load Balancer IP)
sed -i '' 's/127.0.0.1/192.168.1.100/g' ~/.kube/rke2-config   # macOS/BSD sed; on Linux drop the empty '' argument
# Export kubeconfig
export KUBECONFIG=~/.kube/rke2-config
# Or merge into main kubeconfig
KUBECONFIG=~/.kube/config:~/.kube/rke2-config kubectl config view --flatten > ~/.kube/config.new
mv ~/.kube/config.new ~/.kube/config

# Check nodes
kubectl get nodes -o wide
# Check system pods
kubectl get pods -A
# Cluster info
kubectl cluster-info
# Check RKE2 version
kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}'
# Check etcd health
kubectl get pods -n kube-system -l component=etcd
# View component status
kubectl get componentstatuses # deprecated in newer versions
kubectl get --raw='/readyz?verbose'   # preferred method

# Create test deployment
kubectl create deployment nginx --image=nginx --replicas=3
# Expose as service
kubectl expose deployment nginx --port=80 --type=NodePort
# Check deployment
kubectl get deployments
kubectl get pods -l app=nginx
kubectl get svc nginx
# Access the service
curl http://<node-ip>:<node-port>
# Cleanup
kubectl delete deployment nginx
kubectl delete service nginx

- Add node to `inventory/hosts.yml`:
worker:
  hosts:
    k8s-worker-03:
      ansible_host: 192.168.1.113
      ansible_user: root
      ansible_ssh_private_key_file: ~/.ssh/id_rsa

- Run playbook for new node only:
set -a && source .env && set +a
ansible-playbook -i inventory/hosts.yml site.yml --limit k8s-worker-03

- Verify node joined:
kubectl get nodes

- Add to inventory under the `master` section
- Update Load Balancer configuration to include new master
- Run playbook:
set -a && source .env && set +a
ansible-playbook -i inventory/hosts.yml site.yml --limit k8s-master-02

- Verify etcd cluster:
kubectl get pods -n kube-system -l component=etcd

# Drain node first
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Remove from cluster
kubectl delete node <node-name>
# Uninstall RKE2 on the node
ansible-playbook -i inventory/hosts.yml uninstall.yml --limit <node-name>
# Remove from inventory
# Edit inventory/hosts.yml and remove the node entry

kubernetes-cookbook/
├── .env.example              # Environment variables template
├── .env                      # Actual configuration (gitignored)
├── .gitignore                # Git ignore rules
├── ansible.cfg               # Ansible configuration
├── site.yml                  # Main installation playbook
├── uninstall.yml             # Uninstall playbook
├── README.md                 # This file
├── inventory/
│   ├── hosts.yml.example     # Inventory template
│   └── hosts.yml             # Actual inventory (gitignored)
├── group_vars/
│   └── all.yml               # Ansible variables (reads from environment)
├── vars/
│   └── registry.yml          # Complex variable parsing (registries, taints)
└── roles/
    ├── prereq/               # System preparation
    │   └── tasks/
    │       └── main.yml
    ├── rke2-server/          # RKE2 master nodes installation
    │   ├── tasks/
    │   │   ├── main.yml
    │   │   └── registry.yml
    │   └── templates/
    │       ├── config.yaml.j2
    │       └── registries.yaml.j2
    └── rke2-agent/           # RKE2 worker nodes installation
        ├── tasks/
        │   ├── main.yml
        │   └── registry.yml
        └── templates/
            ├── config.yaml.j2
            └── registries.yaml.j2
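For reference, the `registries.yaml.j2` templates translate `REGISTRIES_JSON` into RKE2's registry configuration. Using the single-registry example from `.env`, the rendered `/etc/rancher/rke2/registries.yaml` would look roughly like this (RKE2's documented schema; the exact template output may differ):

mirrors:
  registry.company.com:
    endpoint:
      - "https://registry.company.com"
configs:
  registry.company.com:
    auth:
      username: admin
      password: secret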
# From Ansible control node
ansible -i inventory/hosts.yml master -m shell -a "systemctl status rke2-server"
ansible -i inventory/hosts.yml worker -m shell -a "systemctl status rke2-agent"
# View logs
ansible -i inventory/hosts.yml master -m shell -a "journalctl -u rke2-server -n 50"
ansible -i inventory/hosts.yml worker -m shell -a "journalctl -u rke2-agent -n 50"
# Or SSH directly to node
ssh root@192.168.1.101
systemctl status rke2-server
journalctl -u rke2-server -f

# Check node conditions
kubectl describe node <node-name>
# Check kubelet logs on worker
ssh root@<worker-ip>
journalctl -u rke2-agent -n 100 --no-pager
# Check CNI pods
kubectl get pods -n kube-system -o wide | grep -E 'canal|calico|cilium'
# Restart RKE2 service
systemctl restart rke2-agent # on worker
systemctl restart rke2-server   # on master

# Check etcd pods
kubectl get pods -n kube-system -l component=etcd
# Check etcd health (from master node)
ETCDCTL_API=3 /var/lib/rancher/rke2/bin/etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
--key=/var/lib/rancher/rke2/server/tls/etcd/server-client.key \
endpoint health
# List etcd members
ETCDCTL_API=3 /var/lib/rancher/rke2/bin/etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
--key=/var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  member list

# Check certificate validity
openssl x509 -in /var/lib/rancher/rke2/server/tls/server-ca.crt -text -noout | grep -A2 Validity
# Verify TLS connection
openssl s_client -connect <master-ip>:6443 -servername kubernetes
# Check kubelet certificates
ls -la /var/lib/rancher/rke2/agent/*.crt

# Test connectivity between nodes
ansible -i inventory/hosts.yml all -m ping
# Test port connectivity
ansible -i inventory/hosts.yml all -m wait_for -a "host=192.168.1.101 port=6443 timeout=5"
# Check iptables rules
ansible -i inventory/hosts.yml all -m shell -a "iptables -L -n | grep 10.42"
# Test RKE2 API endpoint
curl -k https://192.168.1.100:6443/version
# Check pod network connectivity
kubectl run test-pod --image=busybox --rm -it -- ping 10.42.0.1
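Cluster DNS is another frequent culprit; a quick in-cluster check (the pod name `dns-test` is arbitrary):

# Verify that CoreDNS resolves the Kubernetes API service
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never -- \
  nslookup kubernetes.default.svc.cluster.local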
# Test SSH connectivity with verbose output
ssh -vvv root@192.168.1.101
# Copy SSH key again if needed
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.1.101
# Test with password authentication
ansible -i inventory/hosts.yml all -m ping --ask-pass
# Check SSH configuration
cat ~/.ssh/config

# Verbose mode to see detailed errors
ansible-playbook -i inventory/hosts.yml site.yml -vvv
# Check inventory parsing
ansible-inventory -i inventory/hosts.yml --list
ansible-inventory -i inventory/hosts.yml --graph
# Test fact gathering
ansible -i inventory/hosts.yml all -m setup
# Validate environment variables
set -a && source .env && set +a
env | grep RKE2
# Test RKE2 API
curl -k https://<master-ip>:6443

# Check registry configuration
cat /etc/rancher/rke2/registries.yaml
# Test registry connectivity
curl -v https://btxh-reg.azinsu.com/v2/_catalog
# Check pod image pull status
kubectl describe pod <pod-name> | grep -A5 Events
# View containerd logs
tail -n 100 /var/lib/rancher/rke2/agent/containerd/containerd.log   # RKE2 runs its own containerd (not a separate systemd unit)
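You can also test a pull directly through RKE2's embedded containerd with the bundled crictl (the image name below is just an example based on the registry configured in `.env`):

# Pull a test image via RKE2's containerd to verify registry authentication
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
/var/lib/rancher/rke2/bin/crictl pull registry.company.com/library/nginx:latest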
- Update `RKE2_VERSION` in `.env`:

RKE2_VERSION="v1.35.0+rke2r1"

- Load environment and run playbook:
set -a && source .env && set +a
ansible-playbook -i inventory/hosts.yml site.yml

# On each node, run:
curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION=v1.35.0+rke2r1 sh -
# Restart service
systemctl restart rke2-server # on master
systemctl restart rke2-agent # on worker
# Verify upgrade
kubectl get nodes

Note: Always upgrade master nodes first, then worker nodes.
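On production clusters it is also worth draining each node before restarting its RKE2 service so workloads are rescheduled cleanly. A sketch for one worker (node name and IP taken from the inventory example; repeat per node):

# Drain, upgrade, restart, then return the node to service
kubectl drain k8s-worker-01 --ignore-daemonsets --delete-emptydir-data
ssh root@192.168.1.111 'curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION=v1.35.0+rke2r1 INSTALL_RKE2_TYPE=agent sh - && systemctl restart rke2-agent'
kubectl uncordon k8s-worker-01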
# SSH to master node
ssh root@<master-ip>
# Create etcd snapshot using RKE2 built-in command
rke2 etcd-snapshot save --name snapshot-$(date +%Y%m%d-%H%M%S)
# List snapshots
rke2 etcd-snapshot list
# Snapshots are stored in: /var/lib/rancher/rke2/server/db/snapshots/
# Copy snapshot to backup location
scp /var/lib/rancher/rke2/server/db/snapshots/snapshot-*.db backup-server:/backups/
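RKE2 can also take snapshots on a schedule. These are standard RKE2 server options you could add to `/etc/rancher/rke2/config.yaml` (or to the config template) on the master nodes, shown here as a sketch:

# /etc/rancher/rke2/config.yaml – automated etcd snapshots (illustrative)
etcd-snapshot-schedule-cron: "0 */12 * * *"   # every 12 hours
etcd-snapshot-retention: 5                    # keep the 5 most recent snapshots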
# Stop RKE2 on all master nodes
systemctl stop rke2-server
# On first master node, restore snapshot
rke2 server \
--cluster-reset \
--cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/snapshot-xxx.db
# Start RKE2 on first master
systemctl start rke2-server
# Wait for cluster to be ready
kubectl get nodes
# Re-join other master nodes (if multi-master setup)
# They will automatically sync from the first master
systemctl start rke2-server

- Use at least 3 master nodes for production HA setup
- Set up the Load Balancer before cluster installation
- Backup etcd regularly (recommended: daily automated backups)
- Monitor cluster health with Prometheus/Grafana
- Update RKE2 regularly to patch security vulnerabilities
- Use persistent storage for production workloads
- Configure resource limits for pods (requests/limits)
- Implement network policies for security isolation
- Enable audit logging to track cluster activities
- Test disaster recovery procedures periodically
- Use private registries with authentication for production images
- Implement proper RBAC for user access control
- Enable Pod Security Standards (PSS) for workload security (see the example after this list)
- Use secrets management (e.g., Sealed Secrets, External Secrets)
- Regular security scanning of container images
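For the Pod Security Standards item above, enforcement is driven by namespace labels; a minimal example (the namespace name is a placeholder):

# Enforce the "restricted" Pod Security Standard on a namespace
kubectl label namespace my-app \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted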
To completely remove RKE2 from all nodes:
set -a && source .env && set +a
ansible-playbook -i inventory/hosts.yml uninstall.yml

This will:
- Stop RKE2 services on all nodes
- Remove RKE2 binaries and configuration files
- Clean up container images and volumes
- Reset network interfaces
- Remove firewall rules
- RKE2 Official Documentation
- RKE2 GitHub Repository
- RKE2 Installation Options
- RKE2 HA Setup Guide
- Kubernetes Documentation
- Ansible Documentation
- Canal CNI Documentation
This project is licensed under the MIT License.
Contributions are welcome! Please feel free to submit a Pull Request.
If you encounter any issues or have questions:
- Check the Troubleshooting section above
- Review RKE2 official documentation
- Open an issue in this repository
Author: XDEV Asia Labs
Last Updated: December 2025
RKE2 Version: v1.34.2+rke2r1 (Kubernetes v1.34.2)
MIT License
All contributions are welcome! Please open a pull request or an issue.
If you run into problems, please:
- Check the Troubleshooting section
- Check the RKE2 logs: journalctl -u rke2-server -f
- Open an issue with full details about the error