Skip to content

Complete Azure Monitor implementation for VM observability with Terraform. Features Log Analytics workspace, VM Insights via Azure Policy, custom KQL alerts (CPU>95%, Memory>90%), metric alerts, interactive dashboards with 6 components, and custom workbooks. Includes email notifications and RBAC configuration.

Notifications You must be signed in to change notification settings

MDavidHernandezP/AzureVirtualMachineMonitoring

Repository files navigation

Azure Monitor Lab 1 - Virtual Machines Monitoring & Alerting

Azure Azure Monitor Terraform Log Analytics KQL

This repository demonstrates comprehensive Azure Monitor implementation for virtual machine observability, including Log Analytics workspace configuration, VM Insights deployment, custom alerting with Kusto queries, metric-based monitoring, interactive dashboards, and custom workbooks. The infrastructure is fully automated using Terraform with modular architecture.

Overview

This lab implements enterprise-grade monitoring for Azure virtual machines using Azure Monitor's full suite of capabilities. The project provisions Windows and Linux VMs, configures Log Analytics agents, implements VM Insights through Azure Policy, creates sophisticated alert rules using both Kusto queries and metrics, and builds custom visualizations through dashboards and workbooks.

What This Project Demonstrates

  • Log Analytics Workspace: Centralized log collection and analysis
  • VM Insights: Performance monitoring and dependency mapping
  • Azure Policy Integration: Automated agent deployment and configuration
  • Kusto Query Language: Custom log-based alerting
  • Metric Alerts: Threshold-based monitoring
  • Dashboard Creation: Visual monitoring interfaces
  • Workbook Development: Interactive reporting and analysis
  • Alert Action Groups: Email notification configuration
  • RBAC Management: Role-based access control

Table of Contents

Lab Objectives

Primary Goals

  • Deploy monitoring infrastructure using Azure Monitor
  • Configure Log Analytics workspace for centralized logging
  • Implement VM Insights via Azure Policy
  • Create alert rules using Kusto queries and metrics
  • Build custom dashboards for real-time monitoring
  • Develop workbooks for interactive analysis
  • Configure notifications through Action Groups
  • Manage RBAC for proper access control

Learning Outcomes

  • Azure Monitor architecture and components
  • Log Analytics workspace configuration
  • VM Insights deployment and usage
  • Kusto Query Language (KQL) for log analysis
  • Alert rule creation and management
  • Metric-based monitoring strategies
  • Dashboard design and implementation
  • Workbook development techniques
  • Azure Policy for automated compliance
  • Action Group configuration

Architecture

Monitoring Infrastructure

Azure Subscription
    ↓
┌─────────────────────────────────────────┐
│  Log Analytics Workspace                │
│  - Centralized log collection           │
│  - Data retention policies              │
│  - KQL query engine                     │
└─────────────────────────────────────────┘
           ↓
    ┌──────┴──────┐
    ↓             ↓
┌─────────┐   ┌─────────┐
│ Windows │   │ Linux   │
│ VM      │   │ VM      │
│ + Agent │   │ + Agent │
└─────────┘   └─────────┘
    ↓             ↓
┌─────────────────────────────────────────┐
│  VM Insights                            │
│  - Performance data                     │
│  - Dependency mapping                   │
│  - Health monitoring                    │
└─────────────────────────────────────────┘

Alert & Notification Flow

Monitored VMs
    ↓
Data Collection (Agents)
    ↓
Log Analytics Workspace
    ↓
┌────────────────────────────┐
│  Alert Rules               │
│  ├─ Log-based (KQL)        │
│  │  - CPU > 95%            │
│  │  - Memory > 90%         │
│  ├─ Metric-based           │
│  │  - Custom metrics       │
│  └─ VM Insights-based      │
│     - Performance metrics  │
└────────────────────────────┘
    ↓
Action Groups
    ↓
Email Notifications

Dashboard & Visualization

Azure Monitor
    ↓
┌─────────────────────────────────────────┐
│  Custom Dashboard                       │
│  ├─ CPU Usage %                         │
│  ├─ Memory Usage %                      │
│  ├─ Free Memory (MB)                    │
│  ├─ Disk Free Space (GB)                │
│  ├─ Disk Used Space %                   │
│  └─ Network Usage (Mbit)                │
└─────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────┐
│  Custom Workbook                        │
│  ├─ 2+ Metric Charts                    │
│  └─ 2+ Log Query Charts                 │
└─────────────────────────────────────────┘

Task Breakdown

Task 1: Infrastructure Setup (Foundation)

Requirements: ✅ Create Azure Log Analytics Workspace ✅ Create Windows Virtual Machine ✅ Create Linux Virtual Machine

Implementation:

  • Log Analytics Workspace: Centralized workspace for both VMs

    • Location: Same as resource group
    • SKU: PerGB2018 (pay-as-you-go)
    • Retention: 30 days default
  • Windows VM:

    • OS: Windows Server 2019/2022
    • Size: Standard_B2s (2 vCPU, 4 GB RAM)
    • Disk: Standard SSD
    • Network: Public IP for remote access
  • Linux VM:

    • OS: Ubuntu 20.04/22.04 LTS
    • Size: Standard_B2s (2 vCPU, 4 GB RAM)
    • Disk: Standard SSD
    • Network: Public IP for remote access

Terraform Module: modules/log_analytics/

Task 2: Agent Deployment & Configuration

Requirements: ✅ Install Log Analytics agent on both VMs ✅ Add VM Insights using Azure Policy ✅ Configure data collection:

  • System events
  • Application events
  • Performance counters (CPU, RAM, Free Space)

Implementation Details:

Log Analytics Agent:

  • Windows: Microsoft Monitoring Agent (MMA)
  • Linux: OMS Agent for Linux
  • Connection: Workspace ID and Primary Key
  • Status: Running and reporting

VM Insights via Azure Policy:

  • Policy: "Enable Azure Monitor for VMs"
  • Scope: Resource group or subscription
  • Remediation: Automatic for existing VMs
  • Effect: DeployIfNotExists

Data Sources Configuration:

Windows Event Logs:

  • System: Error, Warning, Critical
  • Application: Error, Warning, Critical

Linux Syslog:

  • Facilities: auth, authpriv, cron, daemon, kern, syslog
  • Levels: Error, Warning, Critical

Performance Counters:

Windows:

Processor(_Total)\% Processor Time - 60s
Memory\% Committed Bytes In Use - 60s
Memory\Available MBytes - 60s
LogicalDisk(_Total)\% Free Space - 60s
LogicalDisk(_Total)\Free Megabytes - 60s
Network Adapter(*)\Bytes Sent/sec - 60s
Network Adapter(*)\Bytes Received/sec - 60s

Linux:

Processor\PercentProcessorTime - 60s
Memory\PercentUsedMemory - 60s
Memory\AvailableMBytes - 60s
LogicalDisk\PercentFreeSpace - 60s
LogicalDisk\FreeSpaceInMBytes - 60s
Network\BytesSentPerSecond - 60s
Network\BytesReceivedPerSecond - 60s

Terraform Module: modules/monitoring/

Task 3: Alerting & Visualization

Requirements: ✅ Create alert rules for custom alerts (CPU > 95%, Memory > 90%) using Kusto query ✅ Create metric alert rules for 2 metrics using Metric Explorer ✅ Create alert rules for 2 metrics using VM Insights ✅ Create dashboard with 6 components ✅ Create workbook with 2+ metric charts and 2+ log charts

3.1 Log-Based Alerts (Kusto Query)

Alert 1: High CPU Usage

Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| where InstanceName == "_Total"
| summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 5m), Computer
| where AggregatedValue > 95

Configuration:

  • Frequency: Every 5 minutes
  • Time range: Last 5 minutes
  • Threshold: 0 results (fires when condition is met)
  • Severity: Critical (Sev 1)

Alert 2: High Memory Usage

Perf
| where ObjectName == "Memory" and CounterName == "% Committed Bytes In Use"
| summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 5m), Computer
| where AggregatedValue > 90

Configuration:

  • Frequency: Every 5 minutes
  • Time range: Last 5 minutes
  • Threshold: 0 results
  • Severity: Warning (Sev 2)

Terraform Module: modules/alerts/

3.2 Metric-Based Alerts

Alert 3: Percentage CPU (Metric)

  • Metric: Percentage CPU
  • Operator: Greater than
  • Threshold: 80%
  • Aggregation: Average
  • Period: 5 minutes
  • Frequency: 1 minute

Alert 4: Available Memory Bytes (Metric)

  • Metric: Available Memory Bytes
  • Operator: Less than
  • Threshold: 500 MB (524288000 bytes)
  • Aggregation: Average
  • Period: 5 minutes
  • Frequency: 1 minute

3.3 VM Insights Alerts

Alert 5: VM Performance - CPU

  • Source: VM Insights
  • Metric: CPU utilization percentage
  • Condition: Greater than 85%
  • Evaluation: 5-minute average

Alert 6: VM Performance - Memory

  • Source: VM Insights
  • Metric: Memory utilization percentage
  • Condition: Greater than 85%
  • Evaluation: 5-minute average

3.4 Dashboard Configuration

Dashboard Components (6 required):

  1. CPU Usage %

    • Type: Line chart
    • Source: Performance counters
    • Metric: Processor Time
    • Aggregation: Average
    • Time range: Last 24 hours
  2. Memory Usage %

    • Type: Line chart
    • Source: Performance counters
    • Metric: Committed Bytes In Use
    • Aggregation: Average
    • Time range: Last 24 hours
  3. Free Memory (MB)

    • Type: Gauge/Number tile
    • Source: Performance counters
    • Metric: Available MBytes
    • Aggregation: Latest
    • Time range: Last 1 hour
  4. Logical Disk Free Space (GB)

    • Type: Bar chart
    • Source: Performance counters
    • Metric: Free Space
    • Aggregation: Average by disk
    • Time range: Last 4 hours
  5. Logical Disk Used Space %

    • Type: Pie chart
    • Source: Performance counters
    • Metric: % Used Space
    • Aggregation: Latest
    • Time range: Current
  6. Network Usage (Mbit)

    • Type: Area chart
    • Source: Performance counters
    • Metrics: Bytes Sent/Received per second
    • Conversion: Bytes to Mbit
    • Aggregation: Sum
    • Time range: Last 12 hours

Terraform Module: modules/dashboard/

3.5 Workbook Configuration

Minimum Requirements: 2 metric charts + 2 log view charts

Metric Charts:

Chart 1: CPU Utilization Trend

  • Data source: Azure Monitor Metrics
  • Metric: Percentage CPU
  • Visualization: Line chart
  • Split by: Virtual Machine
  • Time range: Last 7 days

Chart 2: Memory Available

  • Data source: Azure Monitor Metrics
  • Metric: Available Memory Bytes
  • Visualization: Area chart
  • Aggregation: Average
  • Time range: Last 24 hours

Log Query Charts:

Chart 3: Top CPU Consumers

Perf
| where TimeGenerated > ago(1h)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AvgCPU = avg(CounterValue) by Computer
| order by AvgCPU desc
| take 10
  • Visualization: Bar chart

Chart 4: Memory Trend by Computer

Perf
| where TimeGenerated > ago(24h)
| where ObjectName == "Memory" and CounterName == "Available MBytes"
| summarize AvgMemory = avg(CounterValue) by Computer, bin(TimeGenerated, 1h)
| render timechart
  • Visualization: Time chart

Additional Workbook Features:

  • Parameters for VM selection
  • Time range selector
  • Interactive filters
  • Export capabilities

Terraform Module: modules/workbook/

Terraform Modules

Module Structure

modules/
├── log_analytics/          # Workspace creation
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
│
├── monitoring/             # VM agents and data sources
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
│
├── alerts/                 # Alert rules (KQL, Metric, VM Insights)
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
│
├── dashboard/             # Custom dashboard
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
│
├── workbook/              # Custom workbook
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
│
└── rbac/                  # Role assignments
    ├── main.tf
    ├── variables.tf
    └── outputs.tf

Module: Log Analytics

Purpose: Create and configure Log Analytics workspace

Resources:

  • azurerm_log_analytics_workspace
  • azurerm_log_analytics_solution (for VM Insights)

Outputs:

  • Workspace ID
  • Workspace primary key
  • Workspace location

Module: Monitoring

Purpose: Deploy agents and configure data collection

Resources:

  • azurerm_virtual_machine_extension (Log Analytics agent)
  • azurerm_monitor_data_collection_rule
  • azurerm_monitor_data_collection_rule_association

Configuration:

  • Windows and Linux agent deployment
  • Performance counter configuration
  • Event log collection setup

Module: Alerts

Purpose: Create all alert rules and action groups

Resources:

  • azurerm_monitor_scheduled_query_rules_alert_v2 (KQL alerts)
  • azurerm_monitor_metric_alert (Metric alerts)
  • azurerm_monitor_action_group (Email notifications)

Alert Types:

  1. Log-based (Kusto queries)
  2. Metric-based (Azure Monitor metrics)
  3. VM Insights-based (Performance metrics)

Module: Dashboard

Purpose: Create custom monitoring dashboard

Resources:

  • azurerm_portal_dashboard

Components:

  • CPU usage visualization
  • Memory metrics
  • Disk space monitoring
  • Network traffic charts

Module: Workbook

Purpose: Create interactive workbook

Resources:

  • azurerm_application_insights_workbook

Features:

  • Metric-based charts
  • Log query visualizations
  • Interactive parameters
  • Custom layouts

Module: RBAC

Purpose: Grant Contributor role to trainer

Resources:

  • azurerm_role_assignment

Configuration:

  • Scope: Subscription level
  • Role: Contributor
  • Principal: Trainer email/object ID

Monitoring Configuration

Data Collection Rules

Performance Counters Collection Interval: 60 seconds

Windows Performance Counters:

performance_counters = [
  "\\Processor(_Total)\\% Processor Time",
  "\\Memory\\% Committed Bytes In Use",
  "\\Memory\\Available MBytes",
  "\\LogicalDisk(_Total)\\% Free Space",
  "\\LogicalDisk(_Total)\\Free Megabytes",
  "\\Network Adapter(*)\\Bytes Sent/sec",
  "\\Network Adapter(*)\\Bytes Received/sec"
]

Linux Performance Counters:

performance_counters = [
  "Processor(*)\\PercentProcessorTime",
  "Memory(*)\\PercentUsedMemory",
  "Memory(*)\\AvailableMBytes",
  "LogicalDisk(*)\\PercentFreeSpace",
  "LogicalDisk(*)\\FreeSpaceInMBytes",
  "Network(*)\\BytesSentPerSecond",
  "Network(*)\\BytesReceivedPerSecond"
]

VM Insights Configuration

Enabled Features:

  • Performance monitoring
  • Map (dependency visualization)
  • Health diagnostics

Azure Policy Assignment:

  • Initiative: "Enable Azure Monitor for VMs"
  • Effect: DeployIfNotExists
  • Remediation: Automatic
  • Scope: Resource group

Alert Rules

Alert Rule Summary

Alert Name Type Metric/Query Threshold Severity Frequency
High CPU (KQL) Log CPU > 95% 0 results Sev 1 5 min
High Memory (KQL) Log Memory > 90% 0 results Sev 2 5 min
CPU Percentage Metric Percentage CPU 80% Sev 2 1 min
Low Memory Metric Available Memory <500MB Sev 2 1 min
VM CPU Insights VM Insights CPU utilization 85% Sev 2 5 min
VM Memory Insights VM Insights Memory utilization 85% Sev 2 5 min

Action Groups

Email Action Group:

  • Name: "email-action-group"
  • Email addresses: Admin, DevOps team
  • SMS: Optional
  • Webhook: Optional for integration

Configuration:

resource "azurerm_monitor_action_group" "email" {
  name                = "email-action-group"
  resource_group_name = var.resource_group_name
  short_name          = "emailag"

  email_receiver {
    name          = "admin"
    email_address = var.admin_email
  }
}

Dashboard Components

Dashboard Layout

Top Row: Current Status

  • CPU Usage % (Gauge)
  • Memory Usage % (Gauge)
  • Free Memory MB (Number)

Middle Row: Resource Utilization

  • Disk Free Space GB (Bar chart)
  • Disk Used Space % (Pie chart)

Bottom Row: Network Activity

  • Network In/Out Mbit (Area chart)

Dashboard JSON Structure

{
  "lenses": {
    "0": {
      "order": 0,
      "parts": {
        "0": { "position": { "x": 0, "y": 0, "colSpan": 4, "rowSpan": 3 } },
        "1": { "position": { "x": 4, "y": 0, "colSpan": 4, "rowSpan": 3 } },
        "2": { "position": { "x": 8, "y": 0, "colSpan": 4, "rowSpan": 3 } }
      }
    }
  }
}

Workbook Features

Interactive Elements

Parameters:

  • Virtual Machine selector (dropdown)
  • Time range picker
  • Resource group filter
  • Subscription selector

Visualizations:

  • Line charts for trends
  • Bar charts for comparisons
  • Tables for detailed data
  • Maps for geographical distribution

Queries:

  • Real-time performance metrics
  • Historical trend analysis
  • Anomaly detection
  • Capacity planning

Key Features

Comprehensive Monitoring

  • Multi-platform support: Windows and Linux VMs
  • Real-time metrics: CPU, memory, disk, network
  • Log aggregation: Centralized event collection
  • Performance tracking: Historical trend analysis

Automated Alerting

  • Multiple alert types: Log-based, metric-based, insights-based
  • Intelligent thresholds: CPU 95%, Memory 90%
  • Email notifications: Immediate alert delivery
  • Severity levels: Critical, Warning, Informational

Custom Visualizations

  • Interactive dashboards: Real-time monitoring views
  • Custom workbooks: Ad-hoc analysis capabilities
  • Multiple chart types: Lines, bars, pies, gauges
  • Export functionality: PDF, Excel, CSV

Infrastructure as Code

  • Terraform modules: Reusable, modular configuration
  • Version control: Infrastructure changes tracked
  • Reproducible deployments: Consistent environments
  • Automated provisioning: Rapid deployment

Prerequisites

Required Tools

  • Terraform: >= 1.0.0
  • Azure CLI: Latest version
  • Azure Subscription: Active with sufficient quota

Required Permissions

  • Subscription Contributor: Resource creation
  • User Access Administrator: RBAC assignments
  • Monitoring Contributor: Alert rule creation

Required Knowledge

  • Azure Monitor concepts
  • Kusto Query Language basics
  • Terraform HCL syntax
  • Azure VM fundamentals
  • Networking basics

Getting Started

Step 1: Clone and Configure

# Clone repository
git clone <repository-url>
cd azure-monitor-lab1

# Configure Azure authentication
az login

# Set subscription
az account set --subscription "your-subscription-id"

Step 2: Configure Variables

# Create terraform.tfvars
cat > terraform.tfvars <<EOF
resource_group_name = "rg-monitoring-lab"
location            = "eastus"
admin_email         = "[email protected]"
trainer_email       = "[email protected]"
vm_admin_username   = "azureuser"
vm_admin_password   = "ComplexPassword123!"
EOF

Step 3: Deploy Infrastructure

# Initialize Terraform
terraform init

# Review plan
terraform plan

# Apply configuration
terraform apply

# Note outputs
terraform output

Step 4: Verify Deployment

# Check VMs are running
az vm list --resource-group rg-monitoring-lab --output table

# Verify Log Analytics workspace
az monitor log-analytics workspace show \
  --resource-group rg-monitoring-lab \
  --workspace-name la-workspace

# Check agent status
az vm extension list \
  --resource-group rg-monitoring-lab \
  --vm-name windows-vm \
  --output table

Step 5: Access Monitoring

  1. Navigate to Azure Portal
  2. Go to Azure Monitor
  3. View dashboards, workbooks, and alerts
  4. Test alert rules by generating load

Repository Structure

.
├── modules/
│   ├── log_analytics/              # Workspace module
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── monitoring/                 # Agent & data collection
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── alerts/                     # Alert rules
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── dashboard/                  # Custom dashboard
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── workbook/                   # Custom workbook
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── rbac/                       # Role assignments
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
│
├── main.tf                         # Root module
├── variables.tf                    # Root variables
├── providers.tf                    # Provider configuration
├── backend.tf                      # State backend
├── outputs.tf                      # Root outputs
└── README.md                      # This file

Validation

Infrastructure Validation

# Verify all resources
az resource list --resource-group rg-monitoring-lab --output table

# Check Log Analytics workspace
az monitor log-analytics workspace show \
  --resource-group rg-monitoring-lab \
  --workspace-name la-workspace

# Verify VM extensions
az vm extension list \
  --resource-group rg-monitoring-lab \
  --vm-name windows-vm

Monitoring Validation

# Test Kusto query
az monitor log-analytics query \
  --workspace la-workspace \
  --analytics-query "Perf | where ObjectName == 'Processor' | take 10"

# List alert rules
az monitor metrics alert list \
  --resource-group rg-monitoring-lab

# Check action groups
az monitor action-group list \
  --resource-group rg-monitoring-lab

Alert Testing

# Generate CPU load on Windows VM
# RDP into Windows VM and run:
# for /l %i in (0,0,0) do @echo test

# Generate CPU load on Linux VM
# SSH into Linux VM and run:
# stress --cpu 2 --timeout 300

Contributing

This repository documents personal learning progress through a DevOps bootcamp. While it's primarily for educational purposes, suggestions and improvements are welcome!

How to Contribute

  1. Report Issues: Found a bug or error in scripts?

    • Open an issue describing the problem
    • Include script name and error message
    • Provide steps to reproduce
  2. Suggest Improvements: Have ideas for better implementations?

    • Fork the repository
    • Create a feature branch
    • Submit pull request with clear description
  3. Share Knowledge: Learned something new?

    • Add comments or documentation
    • Create additional practice exercises
    • Write tutorials or guides

License

This project is created for educational purposes as part of a DevOps bootcamp internship.

Educational Use: Feel free to use these scripts and documentation for learning purposes.

Attribution: If you use or reference this work, please provide attribution to the original author.

No Warranty: These scripts are provided "as is" without warranty of any kind. Use at your own risk, especially in production environments.

About

Complete Azure Monitor implementation for VM observability with Terraform. Features Log Analytics workspace, VM Insights via Azure Policy, custom KQL alerts (CPU>95%, Memory>90%), metric alerts, interactive dashboards with 6 components, and custom workbooks. Includes email notifications and RBAC configuration.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages