This repository demonstrates an Azure Monitor implementation for virtual machine observability: Log Analytics workspace configuration, VM Insights deployment, log alerts written in Kusto Query Language (KQL), metric alerts, dashboards, and custom workbooks. The infrastructure is provisioned with Terraform and organized into modules.
The lab provisions one Windows and one Linux VM, installs the Log Analytics agent on both, enables VM Insights through Azure Policy, creates alert rules from both Kusto queries and metrics, and builds visualizations as a dashboard and a workbook.
- Log Analytics Workspace: Centralized log collection and analysis
- VM Insights: Performance monitoring and dependency mapping
- Azure Policy Integration: Automated agent deployment and configuration
- Kusto Query Language: Custom log-based alerting
- Metric Alerts: Threshold-based monitoring
- Dashboard Creation: Visual monitoring interfaces
- Workbook Development: Interactive reporting and analysis
- Alert Action Groups: Email notification configuration
- RBAC Management: Role-based access control
- Lab Objectives
- Architecture
- Task Breakdown
- Terraform Modules
- Monitoring Configuration
- Alert Rules
- Dashboard Components
- Workbook Features
- Key Features
- Prerequisites
- Getting Started
- Repository Structure
- Screenshots
- Validation
- Credits

Lab Objectives:
- Deploy monitoring infrastructure using Azure Monitor
- Configure Log Analytics workspace for centralized logging
- Implement VM Insights via Azure Policy
- Create alert rules using Kusto queries and metrics
- Build custom dashboards for real-time monitoring
- Develop workbooks for interactive analysis
- Configure notifications through Action Groups
- Manage RBAC for proper access control
- Azure Monitor architecture and components
- Log Analytics workspace configuration
- VM Insights deployment and usage
- Kusto Query Language (KQL) for log analysis
- Alert rule creation and management
- Metric-based monitoring strategies
- Dashboard design and implementation
- Workbook development techniques
- Azure Policy for automated compliance
- Action Group configuration
            Azure Subscription
                     ↓
┌─────────────────────────────────────────┐
│         Log Analytics Workspace         │
│  - Centralized log collection           │
│  - Data retention policies              │
│  - KQL query engine                     │
└─────────────────────────────────────────┘
                     ↓
              ┌──────┴──────┐
              ↓             ↓
         ┌─────────┐   ┌─────────┐
         │ Windows │   │  Linux  │
         │   VM    │   │   VM    │
         │ + Agent │   │ + Agent │
         └─────────┘   └─────────┘
              ↓             ↓
┌─────────────────────────────────────────┐
│               VM Insights               │
│  - Performance data                     │
│  - Dependency mapping                   │
│  - Health monitoring                    │
└─────────────────────────────────────────┘

        Monitored VMs
              ↓
   Data Collection (Agents)
              ↓
    Log Analytics Workspace
              ↓
┌────────────────────────────┐
│ Alert Rules                │
│ ├─ Log-based (KQL)         │
│ │   - CPU > 95%            │
│ │   - Memory > 90%         │
│ ├─ Metric-based            │
│ │   - Custom metrics       │
│ └─ VM Insights-based       │
│     - Performance metrics  │
└────────────────────────────┘
              ↓
        Action Groups
              ↓
     Email Notifications

               Azure Monitor
                     ↓
┌─────────────────────────────────────────┐
│ Custom Dashboard                        │
│ ├─ CPU Usage %                          │
│ ├─ Memory Usage %                       │
│ ├─ Free Memory (MB)                     │
│ ├─ Disk Free Space (GB)                 │
│ ├─ Disk Used Space %                    │
│ └─ Network Usage (Mbit)                 │
└─────────────────────────────────────────┘
                     ↓
┌─────────────────────────────────────────┐
│ Custom Workbook                         │
│ ├─ 2+ Metric Charts                     │
│ └─ 2+ Log Query Charts                  │
└─────────────────────────────────────────┘

Requirements:
- ✅ Create Azure Log Analytics Workspace
- ✅ Create Windows Virtual Machine
- ✅ Create Linux Virtual Machine

Implementation:
- Log Analytics Workspace: Centralized workspace for both VMs
  - Location: Same as resource group
  - SKU: PerGB2018 (pay-as-you-go)
  - Retention: 30 days default
- Windows VM:
  - OS: Windows Server 2019/2022
  - Size: Standard_B2s (2 vCPU, 4 GB RAM)
  - Disk: Standard SSD
  - Network: Public IP for remote access
- Linux VM:
  - OS: Ubuntu 20.04/22.04 LTS
  - Size: Standard_B2s (2 vCPU, 4 GB RAM)
  - Disk: Standard SSD
  - Network: Public IP for remote access

Terraform Module: modules/log_analytics/

Requirements:
- ✅ Install Log Analytics agent on both VMs
- ✅ Add VM Insights using Azure Policy
- ✅ Configure data collection:
  - System events
  - Application events
  - Performance counters (CPU, RAM, Free Space)

Implementation Details:
Log Analytics Agent:
- Windows: Microsoft Monitoring Agent (MMA)
- Linux: OMS Agent for Linux
- Connection: Workspace ID and Primary Key
- Status: Running and reporting
VM Insights via Azure Policy:
- Initiative: "Enable Azure Monitor for VMs"
- Scope: Resource group or subscription
- Remediation: Automatic for existing VMs
- Effect: DeployIfNotExists
Data Sources Configuration:
Windows Event Logs:
- System: Error, Warning, Critical
- Application: Error, Warning, Critical
Linux Syslog:
- Facilities: auth, authpriv, cron, daemon, kern, syslog
- Levels: Error, Warning, Critical
Performance Counters:
Windows:
Processor(_Total)\% Processor Time - 60s
Memory\% Committed Bytes In Use - 60s
Memory\Available MBytes - 60s
LogicalDisk(_Total)\% Free Space - 60s
LogicalDisk(_Total)\Free Megabytes - 60s
Network Adapter(*)\Bytes Sent/sec - 60s
Network Adapter(*)\Bytes Received/sec - 60s
Linux:
Processor\PercentProcessorTime - 60s
Memory\PercentUsedMemory - 60s
Memory\AvailableMBytes - 60s
LogicalDisk\PercentFreeSpace - 60s
LogicalDisk\FreeSpaceInMBytes - 60s
Network\BytesSentPerSecond - 60s
Network\BytesReceivedPerSecond - 60s
Terraform Module: modules/monitoring/

Requirements:
- ✅ Create alert rules for custom alerts (CPU > 95%, Memory > 90%) using Kusto query
- ✅ Create metric alert rules for 2 metrics using Metric Explorer
- ✅ Create alert rules for 2 metrics using VM Insights
- ✅ Create dashboard with 6 components
- ✅ Create workbook with 2+ metric charts and 2+ log charts

Alert 1: High CPU Usage
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| where InstanceName == "_Total"
| summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 5m), Computer
| where AggregatedValue > 95

Configuration:
- Frequency: Every 5 minutes
- Time range: Last 5 minutes
- Threshold: Greater than 0 results (fires when the query returns at least one row)
- Severity: Error (Sev 1)
Alert 2: High Memory Usage
Perf
| where ObjectName == "Memory" and CounterName == "% Committed Bytes In Use"
| summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 5m), Computer
| where AggregatedValue > 90

Configuration:
- Frequency: Every 5 minutes
- Time range: Last 5 minutes
- Threshold: Greater than 0 results (fires when the query returns at least one row)
- Severity: Warning (Sev 2)
Terraform Module: modules/alerts/
Alert 3: Percentage CPU (Metric)
- Metric: Percentage CPU
- Operator: Greater than
- Threshold: 80%
- Aggregation: Average
- Period: 5 minutes
- Frequency: 1 minute
Alert 4: Available Memory Bytes (Metric)
- Metric: Available Memory Bytes
- Operator: Less than
- Threshold: 500 MB (524288000 bytes)
- Aggregation: Average
- Period: 5 minutes
- Frequency: 1 minute
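
The Available Memory Bytes metric is reported in bytes, so the 500 MB threshold of Alert 4 has to be written as a byte count. A minimal shell sketch of that arithmetic, assuming binary megabytes (1 MB = 1024 × 1024 bytes), which is the convention that yields the 524288000 figure above. `mb_to_bytes` is a hypothetical helper name, not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert megabytes to bytes.
# Assumption: binary megabytes, 1 MB = 1024 * 1024 bytes.
mb_to_bytes() {
  local mb="$1"
  echo $(( mb * 1024 * 1024 ))
}

# Threshold used by Alert 4
mb_to_bytes 500   # prints 524288000
```

The same helper can be used to derive a different threshold, for example `mb_to_bytes 1024` for a 1 GB floor.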
Alert 5: VM Performance - CPU
- Source: VM Insights
- Metric: CPU utilization percentage
- Condition: Greater than 85%
- Evaluation: 5-minute average
Alert 6: VM Performance - Memory
- Source: VM Insights
- Metric: Memory utilization percentage
- Condition: Greater than 85%
- Evaluation: 5-minute average
Dashboard Components (6 required):
- CPU Usage %
  - Type: Line chart
  - Source: Performance counters
  - Metric: Processor Time
  - Aggregation: Average
  - Time range: Last 24 hours
- Memory Usage %
  - Type: Line chart
  - Source: Performance counters
  - Metric: Committed Bytes In Use
  - Aggregation: Average
  - Time range: Last 24 hours
- Free Memory (MB)
  - Type: Gauge/Number tile
  - Source: Performance counters
  - Metric: Available MBytes
  - Aggregation: Latest
  - Time range: Last 1 hour
- Logical Disk Free Space (GB)
  - Type: Bar chart
  - Source: Performance counters
  - Metric: Free Space
  - Aggregation: Average by disk
  - Time range: Last 4 hours
- Logical Disk Used Space %
  - Type: Pie chart
  - Source: Performance counters
  - Metric: % Used Space
  - Aggregation: Latest
  - Time range: Current
- Network Usage (Mbit)
  - Type: Area chart
  - Source: Performance counters
  - Metrics: Bytes Sent/Received per second
  - Conversion: Bytes to Mbit
  - Aggregation: Sum
  - Time range: Last 12 hours

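
Three of the tiles above show values in units that differ from the collected counters (GB rather than Free Megabytes, used rather than free percentage, Mbit rather than bytes per second). The sketch below illustrates the conversions involved. It assumes the tiles are derived from the counters listed in the data collection section, and it assumes binary gigabytes (1 GB = 1024 MB) and decimal megabits (1 Mbit = 1,000,000 bits). The function names are hypothetical and do not come from this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the unit conversions behind three dashboard tiles.

# Free Megabytes -> GB (assumption: binary, 1 GB = 1024 MB), two decimals.
mb_to_gb() {
  awk -v mb="$1" 'BEGIN { printf "%.2f\n", mb / 1024 }'
}

# % Free Space -> % used space.
free_pct_to_used_pct() {
  awk -v free="$1" 'BEGIN { printf "%.1f\n", 100 - free }'
}

# Bytes per second -> megabits per second (assumption: 1 Mbit = 1,000,000 bits).
bytes_per_sec_to_mbit() {
  awk -v bps="$1" 'BEGIN { printf "%.2f\n", bps * 8 / 1000000 }'
}

mb_to_gb 20480                 # prints 20.00
free_pct_to_used_pct 37.5      # prints 62.5
bytes_per_sec_to_mbit 1250000  # prints 10.00
```

In the dashboard itself the equivalent arithmetic would live in the tile's query. The shell form is only meant to make the factors (1024, 100 minus the value, and 8 / 1,000,000) explicit.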
Terraform Module: modules/dashboard/
Minimum Requirements: 2 metric charts + 2 log view charts
Metric Charts:
Chart 1: CPU Utilization Trend
- Data source: Azure Monitor Metrics
- Metric: Percentage CPU
- Visualization: Line chart
- Split by: Virtual Machine
- Time range: Last 7 days
Chart 2: Memory Available
- Data source: Azure Monitor Metrics
- Metric: Available Memory Bytes
- Visualization: Area chart
- Aggregation: Average
- Time range: Last 24 hours
Log Query Charts:
Chart 3: Top CPU Consumers
Perf
| where TimeGenerated > ago(1h)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AvgCPU = avg(CounterValue) by Computer
| order by AvgCPU desc
| take 10

- Visualization: Bar chart
Chart 4: Memory Trend by Computer
Perf
| where TimeGenerated > ago(24h)
| where ObjectName == "Memory" and CounterName == "Available MBytes"
| summarize AvgMemory = avg(CounterValue) by Computer, bin(TimeGenerated, 1h)
| render timechart

- Visualization: Time chart
Additional Workbook Features:
- Parameters for VM selection
- Time range selector
- Interactive filters
- Export capabilities
Terraform Module: modules/workbook/
modules/
├── log_analytics/ # Workspace creation
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── monitoring/ # VM agents and data sources
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── alerts/ # Alert rules (KQL, Metric, VM Insights)
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── dashboard/ # Custom dashboard
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── workbook/ # Custom workbook
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
└── rbac/ # Role assignments
├── main.tf
├── variables.tf
└── outputs.tf
Purpose: Create and configure Log Analytics workspace
Resources:
- azurerm_log_analytics_workspace
- azurerm_log_analytics_solution (for VM Insights)
Outputs:
- Workspace ID
- Workspace primary key
- Workspace location
Purpose: Deploy agents and configure data collection
Resources:
- azurerm_virtual_machine_extension (Log Analytics agent)
- azurerm_monitor_data_collection_rule
- azurerm_monitor_data_collection_rule_association
Configuration:
- Windows and Linux agent deployment
- Performance counter configuration
- Event log collection setup
Purpose: Create all alert rules and action groups
Resources:
- azurerm_monitor_scheduled_query_rules_alert_v2 (KQL alerts)
- azurerm_monitor_metric_alert (Metric alerts)
- azurerm_monitor_action_group (Email notifications)
Alert Types:
- Log-based (Kusto queries)
- Metric-based (Azure Monitor metrics)
- VM Insights-based (Performance metrics)
Purpose: Create custom monitoring dashboard
Resources:
azurerm_portal_dashboard
Components:
- CPU usage visualization
- Memory metrics
- Disk space monitoring
- Network traffic charts
Purpose: Create interactive workbook
Resources:
azurerm_application_insights_workbook
Features:
- Metric-based charts
- Log query visualizations
- Interactive parameters
- Custom layouts
Purpose: Grant Contributor role to trainer
Resources:
azurerm_role_assignment
Configuration:
- Scope: Subscription level
- Role: Contributor
- Principal: Trainer email/object ID
Performance Counters Collection Interval: 60 seconds
Windows Performance Counters:
performance_counters = [
  "\\Processor(_Total)\\% Processor Time",
  "\\Memory\\% Committed Bytes In Use",
  "\\Memory\\Available MBytes",
  "\\LogicalDisk(_Total)\\% Free Space",
  "\\LogicalDisk(_Total)\\Free Megabytes",
  "\\Network Adapter(*)\\Bytes Sent/sec",
  "\\Network Adapter(*)\\Bytes Received/sec"
]

Linux Performance Counters:
performance_counters = [
  "Processor(*)\\PercentProcessorTime",
  "Memory(*)\\PercentUsedMemory",
  "Memory(*)\\AvailableMBytes",
  "LogicalDisk(*)\\PercentFreeSpace",
  "LogicalDisk(*)\\FreeSpaceInMBytes",
  "Network(*)\\BytesSentPerSecond",
  "Network(*)\\BytesReceivedPerSecond"
]

Enabled Features:
- Performance monitoring
- Map (dependency visualization)
- Health diagnostics
Azure Policy Assignment:
- Initiative: "Enable Azure Monitor for VMs"
- Effect: DeployIfNotExists
- Remediation: Automatic
- Scope: Resource group
| Alert Name | Type | Metric/Query | Threshold | Severity | Frequency |
|---|---|---|---|---|---|
| High CPU (KQL) | Log | CPU > 95% | > 0 results | Sev 1 | 5 min |
| High Memory (KQL) | Log | Memory > 90% | > 0 results | Sev 2 | 5 min |
| CPU Percentage | Metric | Percentage CPU | 80% | Sev 2 | 1 min |
| Low Memory | Metric | Available Memory | <500MB | Sev 2 | 1 min |
| VM CPU Insights | VM Insights | CPU utilization | 85% | Sev 2 | 5 min |
| VM Memory Insights | VM Insights | Memory utilization | 85% | Sev 2 | 5 min |
Email Action Group:
- Name: "email-action-group"
- Email addresses: Admin, DevOps team
- SMS: Optional
- Webhook: Optional for integration
Configuration:
resource "azurerm_monitor_action_group" "email" {
  name                = "email-action-group"
  resource_group_name = var.resource_group_name
  short_name          = "emailag"

  email_receiver {
    name          = "admin"
    email_address = var.admin_email
  }
}

Top Row: Current Status
- CPU Usage % (Gauge)
- Memory Usage % (Gauge)
- Free Memory MB (Number)
Middle Row: Resource Utilization
- Disk Free Space GB (Bar chart)
- Disk Used Space % (Pie chart)
Bottom Row: Network Activity
- Network In/Out Mbit (Area chart)
{
  "lenses": {
    "0": {
      "order": 0,
      "parts": {
        "0": { "position": { "x": 0, "y": 0, "colSpan": 4, "rowSpan": 3 } },
        "1": { "position": { "x": 4, "y": 0, "colSpan": 4, "rowSpan": 3 } },
        "2": { "position": { "x": 8, "y": 0, "colSpan": 4, "rowSpan": 3 } }
      }
    }
  }
}

Parameters:
- Virtual Machine selector (dropdown)
- Time range picker
- Resource group filter
- Subscription selector
Visualizations:
- Line charts for trends
- Bar charts for comparisons
- Tables for detailed data
- Maps for geographical distribution
Queries:
- Real-time performance metrics
- Historical trend analysis
- Anomaly detection
- Capacity planning

Key Features:
- Multi-platform support: Windows and Linux VMs
- Real-time metrics: CPU, memory, disk, network
- Log aggregation: Centralized event collection
- Performance tracking: Historical trend analysis
- Multiple alert types: Log-based, metric-based, insights-based
- Alert thresholds: CPU 95%, Memory 90% (log-based alerts)
- Email notifications: Immediate alert delivery
- Severity levels: Critical, Warning, Informational
- Interactive dashboards: Real-time monitoring views
- Custom workbooks: Ad-hoc analysis capabilities
- Multiple chart types: Lines, bars, pies, gauges
- Export functionality: PDF, Excel, CSV
- Terraform modules: Reusable, modular configuration
- Version control: Infrastructure changes tracked
- Reproducible deployments: Consistent environments
- Automated provisioning: Rapid deployment

Prerequisites:
- Terraform: >= 1.0.0
- Azure CLI: Latest version
- Azure Subscription: Active with sufficient quota
- Subscription Contributor: Resource creation
- User Access Administrator: RBAC assignments
- Monitoring Contributor: Alert rule creation
- Azure Monitor concepts
- Kusto Query Language basics
- Terraform HCL syntax
- Azure VM fundamentals
- Networking basics
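
The Terraform >= 1.0.0 requirement listed above can be checked before starting the lab. The following is a minimal sketch of such a check, assuming a `sort` that supports `-V` (version sort, as in GNU coreutils). `version_ge` is a hypothetical helper, and the installed version is hard-coded so the sketch runs without Terraform present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when version $1 is greater than or equal to version $2.
# Assumption: `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# A literal stands in for the installed version; in practice it could be
# taken from the output of `terraform version`.
installed="1.5.7"
if version_ge "$installed" "1.0.0"; then
  echo "Terraform $installed satisfies >= 1.0.0"
else
  echo "Terraform $installed is older than 1.0.0" >&2
fi
```

The comparison sorts the two version strings and checks that the required minimum sorts first, which avoids hand-written parsing of the dotted components.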
# Clone repository
git clone <repository-url>
cd azure-monitor-lab1
# Configure Azure authentication
az login
# Set subscription
az account set --subscription "your-subscription-id"

# Create terraform.tfvars
cat > terraform.tfvars <<EOF
resource_group_name = "rg-monitoring-lab"
location = "eastus"
admin_email = "<admin-email>"
trainer_email = "<trainer-email>"
vm_admin_username = "azureuser"
vm_admin_password = "ComplexPassword123!"
EOF

# Initialize Terraform
terraform init
# Review plan
terraform plan
# Apply configuration
terraform apply
# Note outputs
terraform output

# Check VMs are running
az vm list --resource-group rg-monitoring-lab --output table
# Verify Log Analytics workspace
az monitor log-analytics workspace show \
--resource-group rg-monitoring-lab \
--workspace-name la-workspace
# Check agent status
az vm extension list \
--resource-group rg-monitoring-lab \
--vm-name windows-vm \
--output table

- Navigate to Azure Portal
- Go to Azure Monitor
- View dashboards, workbooks, and alerts
- Test alert rules by generating load
.
├── modules/
│ ├── log_analytics/ # Workspace module
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── monitoring/ # Agent & data collection
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── alerts/ # Alert rules
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── dashboard/ # Custom dashboard
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── workbook/ # Custom workbook
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ └── rbac/ # Role assignments
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
│
├── main.tf # Root module
├── variables.tf # Root variables
├── providers.tf # Provider configuration
├── backend.tf # State backend
├── outputs.tf # Root outputs
└── README.md # This file
# Verify all resources
az resource list --resource-group rg-monitoring-lab --output table
# Check Log Analytics workspace
az monitor log-analytics workspace show \
--resource-group rg-monitoring-lab \
--workspace-name la-workspace
# Verify VM extensions
az vm extension list \
--resource-group rg-monitoring-lab \
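--vm-name windows-vm

# Test Kusto query (--workspace expects the workspace GUID, i.e. its customerId, not its name)
WORKSPACE_GUID=$(az monitor log-analytics workspace show \
  --resource-group rg-monitoring-lab \
  --workspace-name la-workspace \
  --query customerId --output tsv)
az monitor log-analytics query \
  --workspace "$WORKSPACE_GUID" \
  --analytics-query "Perf | where ObjectName == 'Processor' | take 10"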
# List alert rules
az monitor metrics alert list \
--resource-group rg-monitoring-lab
# Check action groups
az monitor action-group list \
--resource-group rg-monitoring-lab

# Generate CPU load on Windows VM
# RDP into Windows VM and run:
# for /l %i in (0,0,0) do @echo test
# Generate CPU load on Linux VM
# SSH into Linux VM and run:
# stress --cpu 2 --timeout 300

This repository documents personal learning progress through a DevOps bootcamp. While it's primarily for educational purposes, suggestions and improvements are welcome!
- Report Issues: Found a bug or error in scripts?
  - Open an issue describing the problem
  - Include script name and error message
  - Provide steps to reproduce
- Suggest Improvements: Have ideas for better implementations?
  - Fork the repository
  - Create a feature branch
  - Submit pull request with clear description
- Share Knowledge: Learned something new?
  - Add comments or documentation
  - Create additional practice exercises
  - Write tutorials or guides

This project is created for educational purposes as part of a DevOps bootcamp internship.
Educational Use: Feel free to use these scripts and documentation for learning purposes.
Attribution: If you use or reference this work, please provide attribution to the original author.
No Warranty: These scripts are provided "as is" without warranty of any kind. Use at your own risk, especially in production environments.