docs: add autoscale-inference-workloads-with-kaito blog #5507
base: master
Pull request overview
This PR adds a new blog post about autoscaling KAITO inference workloads on AKS using KEDA. The post introduces the alpha autoscaling feature released in KAITO v0.8.0 and provides a comprehensive guide for enabling intelligent autoscaling based on service monitoring metrics.
Key Changes
- New blog post documenting KAITO inference workload autoscaling with KEDA
- Includes architecture overview, prerequisites, installation steps, and quickstart guide
- Demonstrates using the new InferenceSet CRD with KEDA's external scaler pattern
website/blog/2025-12-11-autoscale-inference-workloads-with-kaito/index.md
6d99ae0 to 7ba066d
sdesai345
left a comment
Added some comments!
fix typo
ab681e1 to c44f47c
Pull request overview
Copilot reviewed 2 out of 3 changed files in this pull request and generated 11 comments.
Pull request overview
Copilot reviewed 2 out of 3 changed files in this pull request and generated 6 comments.
Pull request overview
Copilot reviewed 2 out of 3 changed files in this pull request and generated 9 comments.
Pull request overview
Copilot reviewed 2 out of 3 changed files in this pull request and generated 14 comments.
@pauldotyu @sdesai345 I have addressed all your comments. Could you take a look again? thx
@@ -0,0 +1,274 @@
---
title: "Autoscale KAITO inference workloads on AKS using KEDA"
date: "2026-01-15"
Should push this date out to when you think the post will actually be published.
LLM inference serving is a core, widely used feature of KAITO. As the number of waiting inference requests grows, more inference instances are needed so that requests are not blocked. Conversely, when the number of waiting requests declines, reducing the number of inference instances improves GPU utilization. Kubernetes Event-driven Autoscaling (KEDA) is well suited to autoscaling inference pods: it scales workloads in a fine-grained, event-driven way based on external metrics and triggers, and it supports a wide range of event sources (such as custom metrics), so pods can scale in step with actual workload demand. This flexibility and extensibility make KEDA a good fit for dynamic, cloud-native applications that need responsive and efficient autoscaling.
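As a rough illustration of the scale-out/scale-in behavior described above, the sketch below models how a KEDA trigger with an `AverageValue` target (KEDA's default `metricType`) turns a count of waiting inference requests into a replica count using the Kubernetes HPA formula. This is a simplified assumption, not KEDA KAITO Scaler's implementation: the function name `desired_replicas`, the `target_per_replica` parameter, and the replica bounds are hypothetical, and the HPA tolerance band, stabilization windows, and scaling policies are ignored.

```python
import math


def desired_replicas(waiting_requests: float, target_per_replica: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Hypothetical helper: replica count an HPA-style AverageValue target would request.

    Simplified sketch: ignores the HPA tolerance band, stabilization
    windows, and scale-rate policies that KEDA/HPA also apply.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # AverageValue semantics: spread the total metric across replicas so
    # that each replica sees roughly `target_per_replica` waiting requests.
    raw = math.ceil(waiting_requests / target_per_replica)
    # Clamp the result to the configured replica bounds.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Print the replica count for a few example queue depths.
    for waiting in (0, 4, 25, 90, 500):
        print(waiting, desired_replicas(waiting, target_per_replica=10))
```

With a target of 10 waiting requests per replica and bounds of 1 to 10, 25 waiting requests yield 3 replicas, while 500 are capped at 10; an idle queue falls back to the minimum rather than to zero unless `min_replicas` is set to 0.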
To enable intelligent autoscaling for KAITO inference workloads using service.monitoring metrics, use the following components and features:
Is service.monitoring a typo here? Should it include a dot between service and monitoring?
- [Kubernetes Event Driven Autoscaling (KEDA)](https://github.com/kedacore/keda)
- **[keda.kaito.scaler](https://github.com/kaito-project/keda-kaito-scaler)** - A dedicated KEDA external scaler, eliminating the need for external dependencies such as Prometheus.
- **[keda.kaito.scaler](https://github.com/kaito-project/keda-kaito-scaler)** - A dedicated KEDA external scaler, eliminating the need for external dependencies such as Prometheus.
- **[KEDA KAITO Scaler](https://github.com/kaito-project/keda-kaito-scaler)** - A dedicated KEDA external scaler, eliminating the need for external dependencies such as Prometheus.
### Architecture
The following diagram shows how keda-kaito-scaler integrates KAITO InferenceSet with KEDA to autoscale inference workloads on AKS:
The following diagram shows how keda-kaito-scaler integrates KAITO InferenceSet with KEDA to autoscale inference workloads on AKS:
The following diagram shows how KEDA KAITO Scaler integrates KAITO InferenceSet with KEDA to autoscale inference workloads on AKS:
### Metric-Based KEDA Scaler
#### Install keda-kaito-scaler
#### Install keda-kaito-scaler
#### Install KEDA KAITO Scaler
#### Install keda-kaito-scaler
> This component is required only when using metric-based KEDA scaler, ensure that keda-kaito-scaler is installed within the same namespace as KEDA.
> This component is required only when using metric-based KEDA scaler, ensure that keda-kaito-scaler is installed within the same namespace as KEDA.
> This component is required only when using metric-based KEDA scaler, ensure that KEDA KAITO Scaler is installed within the same namespace as KEDA.
pauldotyu
left a comment
Minor suggestions mostly around casing of "KEDA KAITO Scaler" throughout the doc.
No description provided.