@andyzhangx
Contributor

No description provided.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds a new blog post about autoscaling KAITO inference workloads on AKS using KEDA. The post introduces the alpha autoscaling feature released in KAITO v0.8.0 and provides a comprehensive guide for enabling intelligent autoscaling based on service monitoring metrics.

Key Changes

  • New blog post documenting KAITO inference workload autoscaling with KEDA
  • Includes architecture overview, prerequisites, installation steps, and quickstart guide
  • Demonstrates using the new InferenceSet CRD with KEDA's external scaler pattern

@andyzhangx andyzhangx force-pushed the autoscale-inference-workloads-with-kaito branch from 6d99ae0 to 7ba066d Compare December 15, 2025 15:18
Contributor

@sdesai345 sdesai345 left a comment


Added some comments!

@andyzhangx andyzhangx requested a review from Copilot January 8, 2026 13:32
@andyzhangx andyzhangx force-pushed the autoscale-inference-workloads-with-kaito branch from ab681e1 to c44f47c Compare January 8, 2026 13:35
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 11 comments.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 6 comments.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 9 comments.

Copilot AI review requested due to automatic review settings January 14, 2026 14:41
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated 14 comments.

@andyzhangx
Contributor Author

@pauldotyu @sdesai345 I have addressed all your comments. Could you take another look? Thanks.

@@ -0,0 +1,274 @@
---
title: "Autoscale KAITO inference workloads on AKS using KEDA"
date: "2026-01-15"
Contributor


Should push this date out to when you think the post will actually be published.


LLM inference is a core, widely used capability in KAITO. As the number of waiting inference requests grows, more inference instances are needed to keep requests from queuing; when the number of waiting requests drops, scaling instances back in improves GPU utilization. Kubernetes Event-driven Autoscaling (KEDA) is well suited to autoscaling inference pods: it provides event-driven, fine-grained scaling based on external metrics and triggers, and it supports a wide range of event sources (including custom metrics), so pods can scale in direct response to workload demand. This flexibility and extensibility make KEDA a good fit for dynamic, cloud-native applications that need responsive, efficient autoscaling.
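
To make the scale-out and scale-in reasoning concrete, here is a minimal sketch of a replica calculation driven by the number of waiting requests. It mirrors the average-value style calculation used by the Kubernetes Horizontal Pod Autoscaler (total metric divided by a per-replica target, rounded up) and ignores tolerance and stabilization windows. The function name, the target of 10 waiting requests per replica, and the replica bounds are hypothetical values chosen for illustration; this is not the actual logic or configuration of the KAITO scaler.

```python
import math


def desired_replicas(waiting_requests: int,
                     target_per_replica: int,
                     min_replicas: int = 1,
                     max_replicas: int = 5) -> int:
    """Return a replica count that keeps the average number of waiting
    requests per replica at or below the target (hypothetical helper)."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if waiting_requests < 0:
        raise ValueError("waiting_requests must not be negative")

    # Average-value target: total queue depth divided by the per-replica
    # target, rounded up so the average never exceeds the target.
    wanted = math.ceil(waiting_requests / target_per_replica)

    # Clamp to the configured bounds so the workload never scales below
    # min_replicas or beyond the available GPU capacity (max_replicas).
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Assumed target: 10 waiting requests per replica.
    for queue_depth in (0, 3, 25, 100):
        print(queue_depth, desired_replicas(queue_depth, target_per_replica=10))
```

With these assumed values, a growing queue raises the replica count up to the configured maximum, and an empty queue brings it back down to the minimum, which is the behavior the paragraph above describes.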

To enable intelligent autoscaling for KAITO inference workloads using service.monitoring metrics, use the following components and features:
Contributor


Is service.monitoring a typo here? Should it include a dot between service and monitoring?


- [Kubernetes Event Driven Autoscaling (KEDA)](https://github.com/kedacore/keda)

- **[keda.kaito.scaler](https://github.com/kaito-project/keda-kaito-scaler)** – A dedicated KEDA external scaler, eliminating the need for external dependencies such as Prometheus.
Contributor

@pauldotyu pauldotyu Jan 26, 2026


Suggested change
- **[keda.kaito.scaler](https://github.com/kaito-project/keda-kaito-scaler)** A dedicated KEDA external scaler, eliminating the need for external dependencies such as Prometheus.
- **[KEDA KAITO Scaler](https://github.com/kaito-project/keda-kaito-scaler)** A dedicated KEDA external scaler, eliminating the need for external dependencies such as Prometheus.


### Architecture

The following diagram shows how keda-kaito-scaler integrates KAITO InferenceSet with KEDA to autoscale inference workloads on AKS:
Contributor


Suggested change
The following diagram shows how keda-kaito-scaler integrates KAITO InferenceSet with KEDA to autoscale inference workloads on AKS:
The following diagram shows how KEDA KAITO Scaler integrates KAITO InferenceSet with KEDA to autoscale inference workloads on AKS:


### Metric-Based KEDA Scaler

#### Install keda-kaito-scaler
Contributor


Suggested change
#### Install keda-kaito-scaler
#### Install KEDA KAITO Scaler


#### Install keda-kaito-scaler

> This component is required only when using metric-based KEDA scaler, ensure that keda-kaito-scaler is installed within the same namespace as KEDA.
Contributor


Suggested change
> This component is required only when using metric-based KEDA scaler, ensure that keda-kaito-scaler is installed within the same namespace as KEDA.
> This component is required only when using metric-based KEDA scaler, ensure that KEDA KAITO Scaler is installed within the same namespace as KEDA.

Contributor

@pauldotyu pauldotyu left a comment


Minor suggestions mostly around casing of "KEDA KAITO Scaler" throughout the doc.

