Azure Monitor Lab 1 - Virtual Machines Monitoring & Alerting

This repository demonstrates comprehensive Azure Monitor implementation for virtual machine observability, including Log Analytics workspace configuration, VM Insights deployment, custom alerting with Kusto queries, metric-based monitoring, interactive dashboards, and custom workbooks. The infrastructure is fully automated using Terraform with modular architecture.

Overview

This lab implements enterprise-grade monitoring for Azure virtual machines using Azure Monitor's full suite of capabilities. The project provisions Windows and Linux VMs, configures Log Analytics agents, implements VM Insights through Azure Policy, creates sophisticated alert rules using both Kusto queries and metrics, and builds custom visualizations through dashboards and workbooks.

What This Project Demonstrates

Log Analytics Workspace: Centralized log collection and analysis
VM Insights: Performance monitoring and dependency mapping
Azure Policy Integration: Automated agent deployment and configuration
Kusto Query Language: Custom log-based alerting
Metric Alerts: Threshold-based monitoring
Dashboard Creation: Visual monitoring interfaces
Workbook Development: Interactive reporting and analysis
Alert Action Groups: Email notification configuration
RBAC Management: Role-based access control

Lab Objectives

Primary Goals

Deploy monitoring infrastructure using Azure Monitor
Configure Log Analytics workspace for centralized logging
Implement VM Insights via Azure Policy
Create alert rules using Kusto queries and metrics
Build custom dashboards for real-time monitoring
Develop workbooks for interactive analysis
Configure notifications through Action Groups
Manage RBAC for proper access control

Learning Outcomes

Azure Monitor architecture and components
Log Analytics workspace configuration
VM Insights deployment and usage
Kusto Query Language (KQL) for log analysis
Alert rule creation and management
Metric-based monitoring strategies
Dashboard design and implementation
Workbook development techniques
Azure Policy for automated compliance
Action Group configuration

Architecture

Monitoring Infrastructure

Azure Subscription
    ↓
┌─────────────────────────────────────────┐
│  Log Analytics Workspace                │
│  - Centralized log collection           │
│  - Data retention policies              │
│  - KQL query engine                     │
└─────────────────────────────────────────┘
           ↓
    ┌──────┴──────┐
    ↓             ↓
┌─────────┐   ┌─────────┐
│ Windows │   │ Linux   │
│ VM      │   │ VM      │
│ + Agent │   │ + Agent │
└─────────┘   └─────────┘
    ↓             ↓
┌─────────────────────────────────────────┐
│  VM Insights                            │
│  - Performance data                     │
│  - Dependency mapping                   │
│  - Health monitoring                    │
└─────────────────────────────────────────┘

Alert & Notification Flow

Monitored VMs
    ↓
Data Collection (Agents)
    ↓
Log Analytics Workspace
    ↓
┌────────────────────────────┐
│  Alert Rules               │
│  ├─ Log-based (KQL)        │
│  │  - CPU > 95%            │
│  │  - Memory > 90%         │
│  ├─ Metric-based           │
│  │  - Custom metrics       │
│  └─ VM Insights-based      │
│     - Performance metrics  │
└────────────────────────────┘
    ↓
Action Groups
    ↓
Email Notifications

Dashboard & Visualization

Azure Monitor
    ↓
┌─────────────────────────────────────────┐
│  Custom Dashboard                       │
│  ├─ CPU Usage %                         │
│  ├─ Memory Usage %                      │
│  ├─ Free Memory (MB)                    │
│  ├─ Disk Free Space (GB)                │
│  ├─ Disk Used Space %                   │
│  └─ Network Usage (Mbit)                │
└─────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────┐
│  Custom Workbook                        │
│  ├─ 2+ Metric Charts                    │
│  └─ 2+ Log Query Charts                 │
└─────────────────────────────────────────┘

Task Breakdown

Task 1: Infrastructure Setup (Foundation)

Requirements: ✅ Create Azure Log Analytics Workspace ✅ Create Windows Virtual Machine ✅ Create Linux Virtual Machine

Implementation:

Log Analytics Workspace: Centralized workspace for both VMs
- Location: Same as resource group
- SKU: PerGB2018 (pay-as-you-go)
- Retention: 30 days default
Windows VM:
- OS: Windows Server 2019/2022
- Size: Standard_B2s (2 vCPU, 4 GB RAM)
- Disk: Standard SSD
- Network: Public IP for remote access
Linux VM:
- OS: Ubuntu 20.04/22.04 LTS
- Size: Standard_B2s (2 vCPU, 4 GB RAM)
- Disk: Standard SSD
- Network: Public IP for remote access

Terraform Module: modules/log_analytics/

Task 2: Agent Deployment & Configuration

Requirements: ✅ Install Log Analytics agent on both VMs ✅ Add VM Insights using Azure Policy ✅ Configure data collection:

System events
Application events
Performance counters (CPU, RAM, Free Space)

Implementation Details:

Log Analytics Agent:

Windows: Microsoft Monitoring Agent (MMA)
Linux: OMS Agent for Linux
Connection: Workspace ID and Primary Key
Status: Running and reporting

VM Insights via Azure Policy:

Policy: "Enable Azure Monitor for VMs"
Scope: Resource group or subscription
Remediation: Automatic for existing VMs
Effect: DeployIfNotExists

Data Sources Configuration:

Windows Event Logs:

System: Error, Warning, Critical
Application: Error, Warning, Critical

Linux Syslog:

Facilities: auth, authpriv, cron, daemon, kern, syslog
Levels: Error, Warning, Critical

Performance Counters:

Windows:

Processor(_Total)\% Processor Time - 60s
Memory\% Committed Bytes In Use - 60s
Memory\Available MBytes - 60s
LogicalDisk(_Total)\% Free Space - 60s
LogicalDisk(_Total)\Free Megabytes - 60s
Network Adapter(*)\Bytes Sent/sec - 60s
Network Adapter(*)\Bytes Received/sec - 60s

Linux:

Processor\PercentProcessorTime - 60s
Memory\PercentUsedMemory - 60s
Memory\AvailableMBytes - 60s
LogicalDisk\PercentFreeSpace - 60s
LogicalDisk\FreeSpaceInMBytes - 60s
Network\BytesSentPerSecond - 60s
Network\BytesReceivedPerSecond - 60s

Terraform Module: modules/monitoring/

Task 3: Alerting & Visualization

Requirements: ✅ Create alert rules for custom alerts (CPU > 95%, Memory > 90%) using Kusto query ✅ Create metric alert rules for 2 metrics using Metric Explorer ✅ Create alert rules for 2 metrics using VM Insights ✅ Create dashboard with 6 components ✅ Create workbook with 2+ metric charts and 2+ log charts

3.1 Log-Based Alerts (Kusto Query)

Alert 1: High CPU Usage

Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| where InstanceName == "_Total"
| summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 5m), Computer
| where AggregatedValue > 95

Configuration:

Frequency: Every 5 minutes
Time range: Last 5 minutes
Threshold: 0 results (fires when condition is met)
Severity: Critical (Sev 1)

Alert 2: High Memory Usage

Perf
| where ObjectName == "Memory" and CounterName == "% Committed Bytes In Use"
| summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 5m), Computer
| where AggregatedValue > 90

Configuration:

Frequency: Every 5 minutes
Time range: Last 5 minutes
Threshold: 0 results
Severity: Warning (Sev 2)

Terraform Module: modules/alerts/

3.2 Metric-Based Alerts

Alert 3: Percentage CPU (Metric)

Metric: Percentage CPU
Operator: Greater than
Threshold: 80%
Aggregation: Average
Period: 5 minutes
Frequency: 1 minute

Alert 4: Available Memory Bytes (Metric)

Metric: Available Memory Bytes
Operator: Less than
Threshold: 500 MB (524288000 bytes)
Aggregation: Average
Period: 5 minutes
Frequency: 1 minute

3.3 VM Insights Alerts

Alert 5: VM Performance - CPU

Source: VM Insights
Metric: CPU utilization percentage
Condition: Greater than 85%
Evaluation: 5-minute average

Alert 6: VM Performance - Memory

Source: VM Insights
Metric: Memory utilization percentage
Condition: Greater than 85%
Evaluation: 5-minute average

3.4 Dashboard Configuration

Dashboard Components (6 required):

CPU Usage %
- Type: Line chart
- Source: Performance counters
- Metric: Processor Time
- Aggregation: Average
- Time range: Last 24 hours
Memory Usage %
- Type: Line chart
- Source: Performance counters
- Metric: Committed Bytes In Use
- Aggregation: Average
- Time range: Last 24 hours
Free Memory (MB)
- Type: Gauge/Number tile
- Source: Performance counters
- Metric: Available MBytes
- Aggregation: Latest
- Time range: Last 1 hour
Logical Disk Free Space (GB)
- Type: Bar chart
- Source: Performance counters
- Metric: Free Space
- Aggregation: Average by disk
- Time range: Last 4 hours
Logical Disk Used Space %
- Type: Pie chart
- Source: Performance counters
- Metric: % Used Space
- Aggregation: Latest
- Time range: Current
Network Usage (Mbit)
- Type: Area chart
- Source: Performance counters
- Metrics: Bytes Sent/Received per second
- Conversion: Bytes to Mbit
- Aggregation: Sum
- Time range: Last 12 hours

Terraform Module: modules/dashboard/

3.5 Workbook Configuration

Minimum Requirements: 2 metric charts + 2 log view charts

Metric Charts:

Chart 1: CPU Utilization Trend

Data source: Azure Monitor Metrics
Metric: Percentage CPU
Visualization: Line chart
Split by: Virtual Machine
Time range: Last 7 days

Chart 2: Memory Available

Data source: Azure Monitor Metrics
Metric: Available Memory Bytes
Visualization: Area chart
Aggregation: Average
Time range: Last 24 hours

Log Query Charts:

Chart 3: Top CPU Consumers

Perf
| where TimeGenerated > ago(1h)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AvgCPU = avg(CounterValue) by Computer
| order by AvgCPU desc
| take 10

Visualization: Bar chart

Chart 4: Memory Trend by Computer

Perf
| where TimeGenerated > ago(24h)
| where ObjectName == "Memory" and CounterName == "Available MBytes"
| summarize AvgMemory = avg(CounterValue) by Computer, bin(TimeGenerated, 1h)
| render timechart

Visualization: Time chart

Additional Workbook Features:

Parameters for VM selection
Time range selector
Interactive filters
Export capabilities

Terraform Module: modules/workbook/

Terraform Modules

Module Structure

modules/
├── log_analytics/          # Workspace creation
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
│
├── monitoring/             # VM agents and data sources
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
│
├── alerts/                 # Alert rules (KQL, Metric, VM Insights)
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
│
├── dashboard/             # Custom dashboard
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
│
├── workbook/              # Custom workbook
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
│
└── rbac/                  # Role assignments
    ├── main.tf
    ├── variables.tf
    └── outputs.tf

Module: Log Analytics

Purpose: Create and configure Log Analytics workspace

Resources:

azurerm_log_analytics_workspace
azurerm_log_analytics_solution (for VM Insights)

Outputs:

Workspace ID
Workspace primary key
Workspace location

Module: Monitoring

Purpose: Deploy agents and configure data collection

Resources:

azurerm_virtual_machine_extension (Log Analytics agent)
azurerm_monitor_data_collection_rule
azurerm_monitor_data_collection_rule_association

Configuration:

Windows and Linux agent deployment
Performance counter configuration
Event log collection setup

Module: Alerts

Purpose: Create all alert rules and action groups

Resources:

azurerm_monitor_scheduled_query_rules_alert_v2 (KQL alerts)
azurerm_monitor_metric_alert (Metric alerts)
azurerm_monitor_action_group (Email notifications)

Alert Types:

Log-based (Kusto queries)
Metric-based (Azure Monitor metrics)
VM Insights-based (Performance metrics)

Module: Dashboard

Purpose: Create custom monitoring dashboard

Resources:

azurerm_portal_dashboard

Components:

CPU usage visualization
Memory metrics
Disk space monitoring
Network traffic charts

Module: Workbook

Purpose: Create interactive workbook

Resources:

azurerm_application_insights_workbook

Features:

Metric-based charts
Log query visualizations
Interactive parameters
Custom layouts

Module: RBAC

Purpose: Grant Contributor role to trainer

Resources:

azurerm_role_assignment

Configuration:

Scope: Subscription level
Role: Contributor
Principal: Trainer email/object ID

Monitoring Configuration

Data Collection Rules

Performance Counters Collection Interval: 60 seconds

Windows Performance Counters:

performance_counters = [
  "\\Processor(_Total)\\% Processor Time",
  "\\Memory\\% Committed Bytes In Use",
  "\\Memory\\Available MBytes",
  "\\LogicalDisk(_Total)\\% Free Space",
  "\\LogicalDisk(_Total)\\Free Megabytes",
  "\\Network Adapter(*)\\Bytes Sent/sec",
  "\\Network Adapter(*)\\Bytes Received/sec"
]

Linux Performance Counters:

performance_counters = [
  "Processor(*)\\PercentProcessorTime",
  "Memory(*)\\PercentUsedMemory",
  "Memory(*)\\AvailableMBytes",
  "LogicalDisk(*)\\PercentFreeSpace",
  "LogicalDisk(*)\\FreeSpaceInMBytes",
  "Network(*)\\BytesSentPerSecond",
  "Network(*)\\BytesReceivedPerSecond"
]

VM Insights Configuration

Enabled Features:

Performance monitoring
Map (dependency visualization)
Health diagnostics

Azure Policy Assignment:

Initiative: "Enable Azure Monitor for VMs"
Effect: DeployIfNotExists
Remediation: Automatic
Scope: Resource group

Alert Rules

Alert Rule Summary

Alert Name	Type	Metric/Query	Threshold	Severity	Frequency
High CPU (KQL)	Log	CPU > 95%	0 results	Sev 1	5 min
High Memory (KQL)	Log	Memory > 90%	0 results	Sev 2	5 min
CPU Percentage	Metric	Percentage CPU	80%	Sev 2	1 min
Low Memory	Metric	Available Memory	<500MB	Sev 2	1 min
VM CPU Insights	VM Insights	CPU utilization	85%	Sev 2	5 min
VM Memory Insights	VM Insights	Memory utilization	85%	Sev 2	5 min

Action Groups

Email Action Group:

Name: "email-action-group"
Email addresses: Admin, DevOps team
SMS: Optional
Webhook: Optional for integration

Configuration:

resource "azurerm_monitor_action_group" "email" {
  name                = "email-action-group"
  resource_group_name = var.resource_group_name
  short_name          = "emailag"

  email_receiver {
    name          = "admin"
    email_address = var.admin_email
  }
}

Dashboard Components

Dashboard Layout

Top Row: Current Status

CPU Usage % (Gauge)
Memory Usage % (Gauge)
Free Memory MB (Number)

Middle Row: Resource Utilization

Disk Free Space GB (Bar chart)
Disk Used Space % (Pie chart)

Bottom Row: Network Activity

Network In/Out Mbit (Area chart)

Dashboard JSON Structure

{
  "lenses": {
    "0": {
      "order": 0,
      "parts": {
        "0": { "position": { "x": 0, "y": 0, "colSpan": 4, "rowSpan": 3 } },
        "1": { "position": { "x": 4, "y": 0, "colSpan": 4, "rowSpan": 3 } },
        "2": { "position": { "x": 8, "y": 0, "colSpan": 4, "rowSpan": 3 } }
      }
    }
  }
}

Workbook Features

Interactive Elements

Parameters:

Virtual Machine selector (dropdown)
Time range picker
Resource group filter
Subscription selector

Visualizations:

Line charts for trends
Bar charts for comparisons
Tables for detailed data
Maps for geographical distribution

Queries:

Real-time performance metrics
Historical trend analysis
Anomaly detection
Capacity planning

Key Features

Comprehensive Monitoring

Multi-platform support: Windows and Linux VMs
Real-time metrics: CPU, memory, disk, network
Log aggregation: Centralized event collection
Performance tracking: Historical trend analysis

Automated Alerting

Multiple alert types: Log-based, metric-based, insights-based
Intelligent thresholds: CPU 95%, Memory 90%
Email notifications: Immediate alert delivery
Severity levels: Critical, Warning, Informational

Custom Visualizations

Interactive dashboards: Real-time monitoring views
Custom workbooks: Ad-hoc analysis capabilities
Multiple chart types: Lines, bars, pies, gauges
Export functionality: PDF, Excel, CSV

Infrastructure as Code

Terraform modules: Reusable, modular configuration
Version control: Infrastructure changes tracked
Reproducible deployments: Consistent environments
Automated provisioning: Rapid deployment

Prerequisites

Required Tools

Terraform: >= 1.0.0
Azure CLI: Latest version
Azure Subscription: Active with sufficient quota

Required Permissions

Subscription Contributor: Resource creation
User Access Administrator: RBAC assignments
Monitoring Contributor: Alert rule creation

Required Knowledge

Azure Monitor concepts
Kusto Query Language basics
Terraform HCL syntax
Azure VM fundamentals
Networking basics

Getting Started

Step 1: Clone and Configure

# Clone repository
git clone <repository-url>
cd azure-monitor-lab1

# Configure Azure authentication
az login

# Set subscription
az account set --subscription "your-subscription-id"

Step 2: Configure Variables

# Create terraform.tfvars
cat > terraform.tfvars <<EOF
resource_group_name = "rg-monitoring-lab"
location            = "eastus"
admin_email         = "[email protected]"
trainer_email       = "[email protected]"
vm_admin_username   = "azureuser"
vm_admin_password   = "ComplexPassword123!"
EOF

Step 3: Deploy Infrastructure

# Initialize Terraform
terraform init

# Review plan
terraform plan

# Apply configuration
terraform apply

# Note outputs
terraform output

Step 4: Verify Deployment

# Check VMs are running
az vm list --resource-group rg-monitoring-lab --output table

# Verify Log Analytics workspace
az monitor log-analytics workspace show \
  --resource-group rg-monitoring-lab \
  --workspace-name la-workspace

# Check agent status
az vm extension list \
  --resource-group rg-monitoring-lab \
  --vm-name windows-vm \
  --output table

Step 5: Access Monitoring

Navigate to Azure Portal
Go to Azure Monitor
View dashboards, workbooks, and alerts
Test alert rules by generating load

Repository Structure

.
├── modules/
│   ├── log_analytics/              # Workspace module
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── monitoring/                 # Agent & data collection
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── alerts/                     # Alert rules
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── dashboard/                  # Custom dashboard
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── workbook/                   # Custom workbook
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── rbac/                       # Role assignments
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
│
├── main.tf                         # Root module
├── variables.tf                    # Root variables
├── providers.tf                    # Provider configuration
├── backend.tf                      # State backend
├── outputs.tf                      # Root outputs
└── README.md                      # This file

Validation

Infrastructure Validation

# Verify all resources
az resource list --resource-group rg-monitoring-lab --output table

# Check Log Analytics workspace
az monitor log-analytics workspace show \
  --resource-group rg-monitoring-lab \
  --workspace-name la-workspace

# Verify VM extensions
az vm extension list \
  --resource-group rg-monitoring-lab \
  --vm-name windows-vm

Monitoring Validation

# Test Kusto query
az monitor log-analytics query \
  --workspace la-workspace \
  --analytics-query "Perf | where ObjectName == 'Processor' | take 10"

# List alert rules
az monitor metrics alert list \
  --resource-group rg-monitoring-lab

# Check action groups
az monitor action-group list \
  --resource-group rg-monitoring-lab

Alert Testing

# Generate CPU load on Windows VM
# RDP into Windows VM and run:
# for /l %i in (0,0,0) do @echo test

# Generate CPU load on Linux VM
# SSH into Linux VM and run:
# stress --cpu 2 --timeout 300

Contributing

This repository documents personal learning progress through a DevOps bootcamp. While it's primarily for educational purposes, suggestions and improvements are welcome!

How to Contribute

Report Issues: Found a bug or error in scripts?
- Open an issue describing the problem
- Include script name and error message
- Provide steps to reproduce
Suggest Improvements: Have ideas for better implementations?
- Fork the repository
- Create a feature branch
- Submit pull request with clear description
Share Knowledge: Learned something new?
- Add comments or documentation
- Create additional practice exercises
- Write tutorials or guides

License

This project is created for educational purposes as part of a DevOps bootcamp internship.

Educational Use: Feel free to use these scripts and documentation for learning purposes.

Attribution: If you use or reference this work, please provide attribution to the original author.

No Warranty: These scripts are provided "as is" without warranty of any kind. Use at your own risk, especially in production environments.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
modules		modules
README.md		README.md
backend.tf		backend.tf
cmaz-3z7ncrr4-modX-dashboard-vms - Microsoft Azure.pdf		cmaz-3z7ncrr4-modX-dashboard-vms - Microsoft Azure.pdf
main.tf		main.tf
providers.tf		providers.tf
variables.tf		variables.tf

MDavidHernandezP/AzureVirtualMachineMonitoring

Folders and files

Latest commit

History

Repository files navigation

Azure Monitor Lab 1 - Virtual Machines Monitoring & Alerting

Overview

What This Project Demonstrates

Table of Contents

Lab Objectives

Primary Goals

Learning Outcomes

Architecture

Monitoring Infrastructure

Alert & Notification Flow

Dashboard & Visualization

Task Breakdown

Task 1: Infrastructure Setup (Foundation)

Task 2: Agent Deployment & Configuration

Task 3: Alerting & Visualization

3.1 Log-Based Alerts (Kusto Query)

3.2 Metric-Based Alerts

3.3 VM Insights Alerts

3.4 Dashboard Configuration

3.5 Workbook Configuration

Terraform Modules

Module Structure

Module: Log Analytics

Module: Monitoring

Module: Alerts

Module: Dashboard

Module: Workbook

Module: RBAC

Monitoring Configuration

Data Collection Rules

VM Insights Configuration

Alert Rules

Alert Rule Summary

Action Groups

Dashboard Components

Dashboard Layout

Dashboard JSON Structure

Workbook Features

Interactive Elements

Key Features

Comprehensive Monitoring

Automated Alerting

Custom Visualizations

Infrastructure as Code

Prerequisites

Required Tools

Required Permissions

Required Knowledge

Getting Started

Step 1: Clone and Configure

Step 2: Configure Variables

Step 3: Deploy Infrastructure

Step 4: Verify Deployment

Step 5: Access Monitoring

Repository Structure

Validation

Infrastructure Validation

Monitoring Validation

Alert Testing

Contributing

How to Contribute

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages