Skip to content

Conversation

@chrisghill
Copy link
Member

BIG changes. Got rid of all the monitoring/core-services stuff. Going to make these separate bundles to add on.

Also a lot of updates and improvements to the terraform itself. Many of these are breaking changes and there is no easy upgrade path - best is to create a new cluster and cutover.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces breaking changes to upgrade the EKS cluster infrastructure to version 2.0.0, removing monitoring/core-services functionality and modernizing the Terraform configuration with significant architectural improvements.

Changes:

  • Removed monitoring, Prometheus, Grafana, and custom resources modules - these will be offered as separate add-on bundles
  • Updated EKS cluster to use modern authentication modes (API_AND_CONFIG_MAP), Amazon Linux 2023 AMI, and enhanced security configurations
  • Migrated EBS CSI driver from Helm chart to AWS-managed add-on and added new managed add-ons (pod identity agent, snapshot controller)

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
massdriver.yaml Version bumped to 2.0.0, removed core_services and prometheus configuration, updated K8s version to 1.34, added newer instance types
src/main.tf Added vpc_id local, access_config for API-based auth, upgrade_policy, EKS access entry for nodes, changed to AL2023 AMI, updated naming from name_suffix to name
src/iam.tf Added SourceAccount conditions to IAM roles, refactored cluster encryption policy, renamed resources to use underscores
src/kms.tf Fixed partition reference bug, added Auto Scaling permissions, switched from var.vpc.specs.aws.region to data.aws_region.current.name
src/data.tf Added aws_region data source
src/logging.tf Reordered kms_key_id parameter
src/_massdriver_variables.tf Removed core_services and prometheus variables, changed name_suffix to name
operator.md Updated documentation for AL2023, API-based auth, and modern security features
core-services/addons.tf Migrated to AWS-managed add-ons (EBS CSI, pod identity, snapshot controller), added VPC CNI configuration
core-services/autoscaler.tf New file implementing cluster autoscaler with IRSA
core-services/storage_classes.tf New file creating gp3 storage classes (default, high-IOPS, io2)
core-services/service_account.tf Renamed resources from massdriver-cloud-provisioner to massdriver_access
core-services/_massdriver_variables.tf Removed core_services and prometheus variables, changed name_suffix to name
core-services/_artifacts.tf Updated to reference renamed massdriver_access_token
custom-resources/* Deleted entire custom-resources directory and cert-manager issuer configuration
core-services/observability.tf Deleted Prometheus and metrics-server configurations
core-services/monitoring.tf Deleted alarm channel and application alarms
core-services/core_services.tf Deleted ingress-nginx, external-dns, cert-manager, and EFS CSI configurations
Comments suppressed due to low confidence (1)

core-services/addons.tf:1

  • The throughput parameter value should likely be "1000" (MB/s) with proper units or validation. Consider adding a comment clarifying that this is in MB/s to prevent confusion, as AWS gp3 throughput is specified in MiB/s.
locals {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

key_arn = module.kms.key_arn
}
resources = ["secrets"]
resources = ["secrets"]
Copy link

Copilot AI Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent indentation: this line has extra leading spaces (8 spaces instead of the expected 4). This should be aligned with the rest of the encryption_config block.

Suggested change
resources = ["secrets"]
resources = ["secrets"]

Copilot uses AI. Check for mistakes.

# Add permissions for Auto Scaling to use the key for encrypted EBS volumes
statement {
sid = "Allow Auto Scaling to use the key"
Copy link

Copilot AI Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Auto Scaling KMS permissions statement lacks an 'effect' field. While Terraform may default to 'Allow', it's a best practice to explicitly specify 'effect = "Allow"' for clarity and to match the pattern used in other statements in this policy document.

Suggested change
sid = "Allow Auto Scaling to use the key"
sid = "Allow Auto Scaling to use the key"
effect = "Allow"

Copilot uses AI. Check for mistakes.
Comment on lines +40 to +45
parameters = {
type = "gp3"
iops = "16000"
throughput = "1000"
encrypted = "true"
fsType = "ext4"
Copy link

Copilot AI Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing documentation for the gp3 high-IOPS storage class parameters. Consider adding comments explaining that gp3 supports up to 16,000 IOPS and 1,000 MiB/s throughput, and that these values should be adjusted based on workload requirements.

Suggested change
parameters = {
type = "gp3"
iops = "16000"
throughput = "1000"
encrypted = "true"
fsType = "ext4"
// gp3 supports up to 16,000 IOPS and 1,000 MiB/s throughput; these values
// are set to the maximums here and should be adjusted based on workload
// performance requirements and cost considerations.
parameters = {
type = "gp3"
iops = "16000"
throughput = "1000"
encrypted = "true"
fsType = "ext4"

Copilot uses AI. Check for mistakes.
volume_binding_mode = "WaitForFirstConsumer"

parameters = {
type = "io2"
Copy link

Copilot AI Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing documentation for the io2 storage class IOPS parameter. Consider adding a comment explaining that io2 supports up to 64,000 IOPS (depending on volume size) and this value should be tuned based on workload requirements and volume size.

Suggested change
type = "io2"
type = "io2"
// io2 supports up to 64,000 IOPS depending on volume size; tune this value based on workload requirements and volume size.

Copilot uses AI. Check for mistakes.
@chrisghill chrisghill merged commit e5d34af into main Feb 2, 2026
7 checks passed
@chrisghill chrisghill deleted the v2.0.0 branch February 2, 2026 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants