update multitenant docs, add best practice guide (#4090)

bechols · claude · brianmacdonald-temporal · web-flow · commit 2a307127af87 · 2026-01-29T16:55:36.000-05:00
* update multitenant docs, add best practice guide

* fix links

* Apply suggestions from code review

* Apply suggestions from code review

* Fix broken links after cloud docs path change

Update links from /production-deployment/cloud/... to /cloud/... to
reflect the cloud docs directory restructuring.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Revise multi-tenant feature page for open source and Cloud

- Split namespace isolation into open source and Cloud subsections
- Add open source benefits: Workflow ID uniqueness, resource isolation,
  configuration boundaries, custom Authorizer access control
- Clarify what Cloud adds: API keys/mTLS, built-in RBAC, rate limits,
  HA replication, Nexus
- Condense application multi-tenancy section, link to best practices

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Move Nexus to open source section (available in both)

Nexus is GA for self-hosted and Cloud. Added it to the open source
section and removed from Cloud-only features.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Update multi-tenant-patterns.mdx

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Brian MacDonald <brian.macdonald@temporal.io>
diff --git a/docs/evaluate/development-production-features/multi-tenant.mdx b/docs/evaluate/development-production-features/multi-tenant.mdx
@@ -1,35 +1,53 @@
 ---
 id: multi-tenancy
 title: Multi-tenancy - Temporal feature
-description: Learn about Temporal Cloud's multi-tenant architecture and how it enhances scalability, efficiency, and cost-effectiveness.
+description: Learn about Temporal's namespace isolation for multi-tenancy and how to build multi-tenant applications.
 sidebar_label: Multi-tenancy
 tags:
 - Temporal Cloud
+- Multitenancy
 keywords:
 - multi-tenant
 - Temporal Cloud
-- cloud architecture
-- scalability
-- cost-effectiveness
-- noisy neighbor
-- database performance
-- high throughput
+- namespace isolation
+- multi-tenant applications
+- tenant isolation
 ---
 
 import { RelatedReadContainer, RelatedReadItem } from '@site/src/components';
 
-A Namespace is a unit of isolation within the Temporal Platform -- but even a single Namespace is still multi-tenant.
-Multi-tenancy ensures extra capacity is available for all customers during traffic spikes.
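+Multi-tenancy in Temporal operates at two levels: Namespace isolation, which the Temporal platform provides, and application multi-tenancy, which you build on top of Temporal for your own customers.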
 
-However, multi-tenancy can also presents the challenge of "noisy neighbors", where high-traffic tenants consume excess resources, causing slower performance for other tenants.
-This is a common problem for database scaling.
+## Namespace isolation
 
-Temporal's write-heavy workload, where changes in execution state are constantly written to the persistence layer, demands a database that supports reliably high throughput with low latency for multiple customers, concurrently and fairly.
+[Namespaces](/namespaces) are Temporal's unit of isolation, providing logical separation for multi-tenant deployments in both open source Temporal and Temporal Cloud.
 
-With Temporal Cloud, customers pay for consumption instead of entire sets of hardware, providing a cost-effective solution.
-Temporal Cloud's architecture scales to handle multiple tenants efficiently.
+### Open source Temporal
+
+Namespaces in self-hosted Temporal provide:
+
+- **Workflow ID uniqueness**: Temporal guarantees unique Workflow IDs within a Namespace. Different Namespaces can have Workflows with the same ID without conflict.
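+- **Resource isolation**: Per-Namespace rate limits help prevent traffic in one Namespace from degrading other Namespaces on the same Temporal Service, although all Namespaces still share the Service's underlying resources, such as its persistence store.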
+- **Configuration boundaries**: Settings like [Retention Period](/temporal-service/temporal-server#retention-period) and [Archival](/temporal-service/archival) destination are configured per Namespace.
+- **Access control**: Use a custom [Authorizer](/self-hosted-guide/security#authorization) on your Frontend Service to restrict who can access each Namespace.
+- **Inter-Namespace communication**: Use [Nexus](/evaluate/nexus) for controlled communication between Namespaces.
+
+### Temporal Cloud
+
+Temporal Cloud builds on these capabilities with additional isolation guarantees:
+
+- **Independent authentication** via [API keys](/cloud/api-keys) or [mTLS certificates](/cloud/certificates)
+- **Built-in [role-based access controls](/cloud/users#namespace-level-permissions)** without custom Authorizer configuration
+- **Separate [rate limits](/cloud/limits#namespace-level)** to prevent noisy neighbor problems
+- **[High availability replication](/cloud/high-availability)** across regions
 
 <RelatedReadContainer>
-  <RelatedReadItem path="https://docs.temporal.io/cloud/security#namespace-isolation" text="Namespace Isolation" archetype="cloud-guide" />
-  <RelatedReadItem path="https://docs.temporal.io/cloud/pricing" text="Cost-effective Consumption" archetype="cloud-guide" />
+  <RelatedReadItem path="/cloud/security#namespace-isolation" text="Namespace Isolation Details" archetype="cloud-guide" />
+  <RelatedReadItem path="/cloud/pricing" text="Temporal Cloud Pricing" archetype="cloud-guide" />
 </RelatedReadContainer>
+
+## Application multi-tenancy
+
+Many organizations use Temporal to power their own multi-tenant SaaS applications, isolating their customers' workloads using Task Queues, Search Attributes, and Worker design patterns.
+
+See the [multi-tenant application patterns guide](/production-deployment/multi-tenant-patterns) for detailed recommendations on architecting multi-tenant applications with Temporal.
diff --git a/docs/evaluate/temporal-cloud/security.mdx b/docs/evaluate/temporal-cloud/security.mdx
@@ -10,6 +10,7 @@ keywords:
   - security
   - temporal cloud
 tags:
+  - Multitenancy
   - Security
   - Temporal Cloud
 ---
@@ -50,9 +51,34 @@ By deploying a [Codec Server](/production-deployment/data-encryption) you can se
 ### Namespace isolation
 
 The base unit of isolation in a Temporal environment is a [Namespace](/namespaces).
-Each Temporal Cloud account can have multiple Namespaces.
-A Namespace (regardless of account) cannot interact with other Namespaces.
-Each Namespace is available through a secure gRPC endpoint and an HTTPS (TLS) endpoint.
+Each Temporal Cloud account can have multiple Namespaces, and each Namespace is isolated to ensure your workloads remain secure and performant.
+
+#### Authentication
+
+Each Namespace is secured with your choice of authentication method:
+- **mTLS certificates** - Namespace-specific X.509 certificates for mutual TLS authentication
+- **API keys** - Namespace-scoped API keys for authentication
+
+See [API Keys](/cloud/api-keys) and [mTLS Certificates](/cloud/certificates) for more details on configuring authentication for your Namespace.
+
+#### Rate limiting
+
+Temporal Cloud protects each Namespace with separate rate limits to prevent noisy neighbor problems:
+- **Actions Per Second (APS)** - Limits the rate of [actions](/best-practices/managing-aps-limits) performed in your Workflows
+- **Operations Per Second (OPS)** - Limits the rate of all [operations](/references/operation-list) that create load on Temporal Server
+
+These per-Namespace rate limits ensure that one Namespace experiencing a traffic spike cannot impact the performance or reliability of other Namespaces, whether those Namespaces belong to a single Temporal Cloud account or separate ones.
+
+See [Rate limiting](/cloud/limits) for more information about Temporal Cloud limits, and [Monitoring trends against limits](/cloud/service-health#rps-aps-rate-limits) for monitoring best practices.
+
+#### Inter-Namespace communication
+
+Namespaces are isolated by default. The only way for Workflows in one Namespace to interact with Workflows in another Namespace is through [Temporal Nexus](/nexus), which provides controlled, secure cross-Namespace communication via Nexus Endpoints.
+
+See [Nexus Security](/nexus/security) for details on how Nexus enables secure inter-Namespace communication.
+
+#### Logical segregation
+
 Temporal Cloud is a multi-tenant service.
 Namespaces in the same environment are logically segregated.
 Namespaces do not share data processing or data storage across regional boundaries.
diff --git a/docs/production-deployment/multi-tenant-patterns.mdx b/docs/production-deployment/multi-tenant-patterns.mdx
@@ -0,0 +1,282 @@
+---
+id: multi-tenant-patterns
+title: Multi-tenant application patterns
+sidebar_label: Multi-tenant patterns
+description: Learn how to build multi-tenant applications using Temporal with task queue isolation patterns, worker design, and best practices.
+slug: /production-deployment/multi-tenant-patterns
+toc_max_heading_level: 4
+keywords:
+  - multi-tenant
+  - task queues
+  - worker patterns
+  - SaaS
+tags:
+  - Multitenancy
+  - Best Practices
+---
+
+import { RelatedReadContainer, RelatedReadItem } from '@site/src/components';
+
+Many SaaS providers and large enterprise platform teams use a single Temporal [Namespace](/namespaces) with [per-tenant Task Queues](#1-task-queues-per-tenant-recommended) to power their multi-tenant applications. This approach maximizes resource efficiency while maintaining logical separation between tenants.
+
+This guide covers architectural patterns, design considerations, and practical examples for building multi-tenant applications with Temporal.
+
+## Architectural principles
+
+When designing a multi-tenant Temporal application, follow these principles:
+
+- **Define your tenant model** - Determine what constitutes a tenant in your business (customers, pricing tiers, teams, etc.)
+- **Prefer simplicity** - Start with the simplest pattern that meets your needs
+- **Understand Temporal limits** - Design within the constraints of your Temporal deployment
+- **Test at scale** - Performance testing must drive your capacity decisions
+- **Plan for growth** - Consider how you'll onboard new tenants and scale workers
+
+## Architectural patterns
+
+There are three main patterns for multi-tenant applications in Temporal, listed from most to least recommended:
+
+### 1. Task Queues per tenant (Recommended)
+
+**Use different [Task Queues](/task-queue) for each tenant's [Workflows](/workflows) and [Activities](/activities).**
+
+This is the recommended pattern for most use cases. Each tenant gets dedicated Task Queue(s), with [Workers](/workers) polling multiple tenant Task Queues in a single process.
+
+**Pros:**
+- Strong isolation between tenants
+- Efficient resource utilization
+- Flexible worker scaling
+- Easy to add new tenants
+- Can handle thousands of tenants per [Namespace](/namespaces)
+
+**Cons:**
+- Requires worker configuration management
+- Potential for uneven resource distribution
+- Need to prevent "noisy neighbor" issues at the worker level
+
+<RelatedReadContainer>
+  <RelatedReadItem path="#task-queue-isolation-pattern" text="Task Queue Isolation Pattern Details" archetype="feature-guide" />
+</RelatedReadContainer>
+
+### 2. Shared Workflow Task Queues, separate Activity Task Queues
+
+**Share [Workflow Task Queues](/task-queue) but use different [Activity Task Queues](/task-queue) per tenant.**
+
+Use this pattern when [Workflows](/workflows) are lightweight but [Activities](/activities) have heavy resource requirements or external dependencies that need isolation.
+
+**Pros:**
+- Easier worker management than full isolation
+- Activity-level tenant isolation
+- Good for compute-intensive Activities
+
+**Cons:**
+- Less isolation than pattern #1
+- Workflow Tasks for all tenants share the same Task Queues and Workers, so one tenant's Workflow load can delay other tenants
+- More complex to reason about
+
+### 3. Namespace per tenant
+
+**Use a separate [Namespace](/namespaces) for each tenant.**
+
+Only practical for a small number (< 50) of high-value tenants due to operational overhead.
+
+**Pros:**
+- Complete isolation between tenants
+- Per-tenant rate limiting
+- Maximum security
+
+**Cons:**
+- Higher operational overhead
+- Credential and connectivity management per [Namespace](/namespaces)
+- Requires more [Workers](/workers) (minimum 2 per Namespace for high availability)
+- Expensive at scale
+
+<RelatedReadContainer>
+  <RelatedReadItem path="/evaluate/development-production-features/multi-tenancy#namespace-isolation" text="Namespace Isolation" archetype="cloud-guide" />
+</RelatedReadContainer>
+
+## Task Queue isolation pattern
+
+This section details the recommended pattern for most multi-tenant applications.
+
+### Worker design
+
+When a [Worker](/workers) starts up:
+
+1. **Load tenant configuration** - Retrieve the list of tenants this Worker process should handle (from a config file, API, or database)
+2. **Derive [Task Queue](/task-queue) names** - For each tenant, generate a unique Task Queue name (e.g., `customer-{tenant-id}`). Task Queues are created on first use, so there is nothing to provision on the Temporal side.
+3. **Create a Worker per Task Queue** - In the Go SDK, each Worker polls a single Task Queue, so create one Worker per tenant Task Queue and register your [Workflow](/workflows) and [Activity](/activities) implementations on each
+4. **Run all Workers in one process** - Start every Worker from the same process, sharing a single Client connection
+
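+```go
+// Example: Go Worker process polling multiple tenant Task Queues
+// (c is a client.Client shared by all Workers in the process)
+var workers []worker.Worker
+for _, tenant := range assignedTenants {
+    taskQueue := fmt.Sprintf("customer-%s", tenant.ID)
+
+    w := worker.New(c, taskQueue, worker.Options{})
+    w.RegisterWorkflow(YourWorkflow)
+    w.RegisterActivity(YourActivity)
+    if err := w.Start(); err != nil { // Start does not block
+        log.Fatalln("Unable to start Worker for", taskQueue, err)
+    }
+    workers = append(workers, w)
+}
+
+<-worker.InterruptCh() // block until SIGINT or SIGTERM
+for _, w := range workers {
+    w.Stop()
+}
+```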
+
+### Routing requests to Task Queues
+
+Your application needs to route [Workflow](/workflows) starts and other operations to the correct tenant [Task Queue](/task-queue):
+
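+```go
+// Example: Starting a Workflow for a specific tenant
+taskQueue := fmt.Sprintf("customer-%s", tenantID)
+workflowOptions := client.StartWorkflowOptions{
+    ID:        workflowID,
+    TaskQueue: taskQueue, // must match a Task Queue that a Worker polls
+}
+we, err := c.ExecuteWorkflow(ctx, workflowOptions, YourWorkflow, input)
+if err != nil {
+    return err
+}
+log.Println("Started Workflow", we.GetID(), we.GetRunID())
+```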
+
+Consider creating an API or service that:
+- Maps tenant IDs to Task Queue names
+- Tracks which [Workers](/workers) are handling which tenants
+- Allows both your application and Workers to read the mappings of:
+    1. Tenant IDs to Task Queues 
+    1. Workers to tenants
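+
+A minimal sketch of such a mapping service follows, assuming an in-memory store purely for illustration; `TenantRegistry`, `Assign`, `TaskQueueFor`, and `TenantsFor` are hypothetical names, not part of the Temporal SDK. The application would call `TaskQueueFor` before starting a Workflow, and each Worker would call `TenantsFor` on startup. A real deployment would back this with a database or configuration service.
+
+```go
+package main
+
+import (
+    "fmt"
+    "sort"
+    "sync"
+)
+
+// TenantRegistry is a hypothetical in-memory stand-in for a tenant mapping service.
+type TenantRegistry struct {
+    mu            sync.RWMutex
+    workerTenants map[string][]string // Worker ID -> tenant IDs it polls
+    assigned      map[string]bool     // tenants handled by at least one Worker
+}
+
+func NewTenantRegistry() *TenantRegistry {
+    return &TenantRegistry{
+        workerTenants: make(map[string][]string),
+        assigned:      make(map[string]bool),
+    }
+}
+
+// TaskQueueName applies the "customer-{tenant-id}" convention used on this page.
+func TaskQueueName(tenantID string) string {
+    return fmt.Sprintf("customer-%s", tenantID)
+}
+
+// Assign records that the Worker workerID polls tenantID's Task Queue.
+// Assigning the same tenant to two Workers models HA replicas.
+func (r *TenantRegistry) Assign(workerID, tenantID string) {
+    r.mu.Lock()
+    defer r.mu.Unlock()
+    r.workerTenants[workerID] = append(r.workerTenants[workerID], tenantID)
+    r.assigned[tenantID] = true
+}
+
+// TaskQueueFor is called by the application before starting a Workflow.
+func (r *TenantRegistry) TaskQueueFor(tenantID string) (string, error) {
+    r.mu.RLock()
+    defer r.mu.RUnlock()
+    if !r.assigned[tenantID] {
+        return "", fmt.Errorf("tenant %q is not assigned to any Worker", tenantID)
+    }
+    return TaskQueueName(tenantID), nil
+}
+
+// TenantsFor is called by a Worker on startup; the result is sorted.
+func (r *TenantRegistry) TenantsFor(workerID string) []string {
+    r.mu.RLock()
+    defer r.mu.RUnlock()
+    out := append([]string(nil), r.workerTenants[workerID]...)
+    sort.Strings(out)
+    return out
+}
+```
+
+Rejecting unassigned tenants in `TaskQueueFor` avoids starting Workflows on a Task Queue that no Worker polls, where they would wait in the backlog until a Worker appears or a timeout fires.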
+
+### Capacity planning
+
+Key questions to answer through performance testing:
+
+**[Namespace](/namespaces) capacity:**
+- How many concurrent [Task Queue](/task-queue) pollers can your Namespace support?
+- What are your [Actions Per Second (APS)](/cloud/limits#actions-per-second) limits?
+- What are your [Operations Per Second (OPS)](/references/operation-list) limits?
+
+**[Worker](/workers) capacity:**
+- How many tenants can a single Worker process handle?
+- What are the CPU and memory requirements per tenant?
+- How many concurrent [Workflow](/workflows) executions per tenant?
+- How many concurrent [Activity](/activities) executions per tenant?
+
+**SDK configuration to tune** (Go SDK `worker.Options` field names; other SDKs have equivalent settings):
+- `MaxConcurrentWorkflowTaskExecutionSize`
+- `MaxConcurrentActivityExecutionSize`
+- `MaxConcurrentWorkflowTaskPollers`
+- `MaxConcurrentActivityTaskPollers`
+- Worker replicas (in Kubernetes deployments)
+
+### Provisioning new tenants
+
+Automate tenant onboarding with a Temporal [Workflow](/workflows):
+
+1. Create a tenant onboarding Workflow that:
+   - Validates tenant information
+   - Provisions infrastructure
+   - Deploys/updates [Worker](/workers) configuration
+   - Triggers Worker restarts or scaling
+   - Verifies the tenant is operational
+
+2. Store tenant-to-Worker mappings in a database or configuration service
+
+3. Update Worker deployments to pick up new tenant assignments
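+
+Step 2 needs a rule for choosing which Workers take a new tenant. The sketch below assumes Workers are deployed in groups (the replicas that poll the same set of tenants) and that the per-group tenant limit comes from your load testing; `PlaceTenant` and `ErrNoCapacity` are hypothetical names. It picks the least-loaded group that still has room, and could run as an Activity inside the onboarding Workflow.
+
+```go
+package main
+
+import "errors"
+
+// ErrNoCapacity means every Worker group is full, so a new group must be
+// deployed before more tenants can be onboarded.
+var ErrNoCapacity = errors.New("all Worker groups are at capacity")
+
+// PlaceTenant returns the index of the Worker group with the fewest tenants
+// that is still below capacity. Ties go to the lowest index, so placement
+// is deterministic for the same inputs.
+func PlaceTenant(groupLoads []int, capacity int) (int, error) {
+    best := -1
+    for i, load := range groupLoads {
+        if load >= capacity {
+            continue // group is full
+        }
+        if best == -1 || load < groupLoads[best] {
+            best = i
+        }
+    }
+    if best == -1 {
+        return -1, ErrNoCapacity
+    }
+    return best, nil
+}
+```
+
+When `PlaceTenant` returns `ErrNoCapacity`, the onboarding Workflow can deploy a new Worker group before retrying, rather than overloading an existing one.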
+
+## Practical example
+
+**Scenario:** A SaaS company has 1,000 customers and expects to grow to 5,000 customers over 3 years. They have 2 [Workflows](/workflows) and ~25 [Activities](/activities) per Workflow. All customers are on the same tier (no segmentation yet).
+
+### Assumptions
+
+| Item | Value |
+|------|-------|
+| Current customers | 1,000 |
+| Workflow Task Queues per customer | 1 |
+| Activity Task Queues per customer | 1 |
+| Max Task Queue pollers per Namespace | 5,000 |
+| SDK concurrent Workflow task pollers | 5 |
+| SDK concurrent Activity task pollers | 5 |
+| Max concurrent Workflow executions | 200 |
+| Max concurrent Activity executions | 200 |
+
+### Capacity calculations
+
+**[Task Queue](/task-queue) poller limits:**
+- Each [Worker](/workers) uses 10 pollers per tenant (5 Workflow + 5 Activity)
+- Maximum Workers in [Namespace](/namespaces): 5,000 pollers ÷ 10 = **500 Workers**
+
+**Worker capacity:**
+- Each Worker can theoretically handle 200 [Workflows](/workflows) and 200 [Activities](/activities) concurrently
+- Conservative estimate: **250 tenants per Worker** (accounting for overhead)
+- For 1,000 customers: **4 Workers minimum** (plus replicas for HA)
+- For 5,000 customers: **20 Workers minimum** (plus replicas for HA)
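+
+The Worker counts above come from dividing the customer count by the tenants each Worker can handle and rounding up. A small helper makes the rounding explicit, which matters when the customer count is not a multiple of 250. The function names are hypothetical, and the replica count is an input rather than a recommendation.
+
+```go
+package main
+
+// MinWorkerGroups returns how many Worker groups are needed for the given
+// number of tenants when each group handles tenantsPerWorker tenants.
+// It rounds up, because a partial group still needs a Worker.
+func MinWorkerGroups(tenants, tenantsPerWorker int) int {
+    return (tenants + tenantsPerWorker - 1) / tenantsPerWorker
+}
+
+// TotalWorkers returns the number of Worker processes when each group
+// runs the given number of replicas for high availability.
+func TotalWorkers(groups, replicas int) int {
+    return groups * replicas
+}
+```
+
+For example, 1,001 customers at 250 tenants per Worker need 5 Worker groups, or 10 Worker processes with 2 replicas each.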
+
+**Namespace capacity:**
+- At 250 tenants per Worker, need 2 Workers per group of tenants (for HA)
+- Maximum tenants in Namespace: (500 Workers ÷ 2) × 250 = **62,500 tenants**
+
+:::note
+These are theoretical calculations based on SDK defaults. **Always perform load testing** to determine actual capacity for your specific workload. Monitor CPU, memory, and Temporal metrics during testing.
+
+While testing, also pay attention to your [metrics capacity and cardinality](/cloud/metrics/openmetrics/api-reference#managing-high-cardinality).
+:::
+
+### Worker assignment strategies
+
+**Option 1: Static configuration**
+- Each [Worker](/workers) reads a config file listing assigned tenant IDs
+- Simple to implement
+- Requires deployment to add tenants
+
+**Option 2: Dynamic API**
+- Workers call an API on startup to get assigned tenants
+- Workers identified by static ID (1 to N)
+- API returns tenant list based on Worker ID
+- More flexible, no deployment needed for new tenants
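+
+For Option 2, the API needs a stable rule for turning a Worker ID into a tenant list. One possible rule, sketched below, hashes each tenant ID with FNV-1a and assigns it to Worker `hash mod N + 1`, so every tenant lands on exactly one Worker ID and the mapping needs no stored table. `TenantsForWorker` is a hypothetical helper, and running each Worker ID as two replicas for high availability is left to your deployment.
+
+```go
+package main
+
+import (
+    "hash/fnv"
+    "sort"
+)
+
+// TenantsForWorker returns the tenants that the Worker with static ID
+// workerID (1..numWorkers) should poll. Each tenant ID is hashed with
+// FNV-1a, so the same tenant always maps to the same Worker ID.
+func TenantsForWorker(workerID, numWorkers int, tenantIDs []string) []string {
+    var assigned []string
+    for _, id := range tenantIDs {
+        h := fnv.New32a()
+        h.Write([]byte(id)) // hash.Hash.Write never returns an error
+        if int(h.Sum32()%uint32(numWorkers))+1 == workerID {
+            assigned = append(assigned, id)
+        }
+    }
+    sort.Strings(assigned) // stable order for config output and logs
+    return assigned
+}
+```
+
+Plain modulo hashing moves most tenants to a different Worker ID whenever N changes. If you expect to rebalance often, consistent hashing or an explicit tenant-to-Worker table stored behind the same API keeps moves small.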
+
+## Best practices
+
+### Monitoring
+
+Track these [metrics](/references/sdk-metrics) per tenant:
+- [Workflow completion](/cloud/metrics/openmetrics/metrics-reference#workflow-completion-metrics) rates
+- [Activity execution](/cloud/metrics/openmetrics/metrics-reference#task-queue-metrics) rates
+- [Task Queue backlog](/cloud/metrics/openmetrics/metrics-reference#task-queue-metrics)
+- [Worker resource utilization](/references/sdk-metrics#worker_task_slots_used)
+- [Workflow failure rates](/encyclopedia/detecting-workflow-failures)
+
+### Handling noisy neighbors
+
+Even with [Task Queue](/task-queue) isolation, monitor for tenants that:
+- Generate excessive load
+- Have high failure rates
+- Cause [Worker](/workers) resource exhaustion
+
+Strategies:
+- Implement per-tenant rate limiting in your application
+- Move problematic tenants to dedicated Workers
+- Set tight [Workflow](/workflows) and [Activity](/activities) timeouts so a misbehaving tenant cannot hold Worker slots indefinitely
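+
+Per-tenant rate limiting in your application can be as simple as one token bucket per tenant, checked before starting a Workflow or sending a Signal on that tenant's behalf. The sketch below is a hypothetical, in-process `TenantLimiter`; it takes the current time as a parameter so it can be tested deterministically. For production, a maintained library such as `golang.org/x/time/rate` (one `rate.Limiter` per tenant) is a reasonable alternative, and a limit shared across application instances would need an external store.
+
+```go
+package main
+
+import (
+    "sync"
+    "time"
+)
+
+// TenantLimiter keeps one token bucket per tenant.
+type TenantLimiter struct {
+    mu      sync.Mutex
+    rate    float64 // tokens refilled per second
+    burst   float64 // maximum tokens a bucket can hold
+    buckets map[string]*bucket
+}
+
+type bucket struct {
+    tokens float64
+    last   time.Time
+}
+
+func NewTenantLimiter(rate, burst float64) *TenantLimiter {
+    return &TenantLimiter{rate: rate, burst: burst, buckets: make(map[string]*bucket)}
+}
+
+// Allow reports whether tenantID may perform one more operation at time now,
+// consuming a token if so. New tenants start with a full bucket.
+func (l *TenantLimiter) Allow(tenantID string, now time.Time) bool {
+    l.mu.Lock()
+    defer l.mu.Unlock()
+    b, ok := l.buckets[tenantID]
+    if !ok {
+        b = &bucket{tokens: l.burst, last: now}
+        l.buckets[tenantID] = b
+    }
+    // Refill for the time elapsed since the last call, capped at burst.
+    if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
+        b.tokens += elapsed * l.rate
+        if b.tokens > l.burst {
+            b.tokens = l.burst
+        }
+        b.last = now
+    }
+    if b.tokens < 1 {
+        return false
+    }
+    b.tokens--
+    return true
+}
+```
+
+Because each tenant has its own bucket, one tenant exhausting its tokens does not reduce the rate available to any other tenant.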
+
+### Tenant lifecycle
+
+Plan for:
+- **Onboarding** - Automated provisioning [Workflow](/workflows)
+- **Scaling** - When to add new [Workers](/workers) for growing tenants
+- **Offboarding** - Graceful tenant removal and data cleanup
+- **Rebalancing** - Redistributing tenants across Workers
+
+### Search Attributes
+
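+Register a custom Keyword [Search Attribute](/search-attribute) such as `TenantId` in your Namespace, then set it when starting each Workflow to enable tenant-scoped queries:
+```go
+// Add tenant ID as a Search Attribute when starting the Workflow
+tenantKey := temporal.NewSearchAttributeKeyKeyword("TenantId")
+workflowOptions := client.StartWorkflowOptions{
+    ID:        workflowID,
+    TaskQueue: fmt.Sprintf("customer-%s", tenantID),
+    TypedSearchAttributes: temporal.NewSearchAttributes(
+        tenantKey.ValueSet(tenantID),
+    ),
+}
+```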
+
+This allows filtering [Workflows](/workflows) by tenant in the UI and SDK:
+```sql
+TenantId = 'customer-123' AND ExecutionStatus = 'Running'
+```
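+
+When these queries are built from application input, constructing the filter in one place helps keep tenant scoping consistent. The sketch below assumes the `TenantId` Keyword Search Attribute described above; `TenantFilter` is a hypothetical helper. It rejects tenant IDs containing quotes or backslashes rather than attempting to escape them, and accepts only known `ExecutionStatus` values.
+
+```go
+package main
+
+import (
+    "fmt"
+    "strings"
+)
+
+// validStatuses lists ExecutionStatus values accepted in Visibility queries.
+var validStatuses = map[string]bool{
+    "Running": true, "Completed": true, "Failed": true, "Canceled": true,
+    "Terminated": true, "ContinuedAsNew": true, "TimedOut": true,
+}
+
+// TenantFilter builds a List Filter scoped to one tenant, optionally
+// narrowed to a single ExecutionStatus (pass "" to skip it).
+func TenantFilter(tenantID, status string) (string, error) {
+    // Reject instead of escape, so a tenant ID can never change the query.
+    if tenantID == "" || strings.ContainsAny(tenantID, `'"\`) {
+        return "", fmt.Errorf("invalid tenant ID %q", tenantID)
+    }
+    q := fmt.Sprintf("TenantId = '%s'", tenantID)
+    if status != "" {
+        if !validStatuses[status] {
+            return "", fmt.Errorf("unknown ExecutionStatus %q", status)
+        }
+        q += fmt.Sprintf(" AND ExecutionStatus = '%s'", status)
+    }
+    return q, nil
+}
+```
+
+The resulting string can be passed as the `Query` of a `ListWorkflowExecutionsRequest`, for example through the Go SDK's `client.ListWorkflow`.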
+
+## Related resources
+
+<RelatedReadContainer>
+  <RelatedReadItem path="/evaluate/development-production-features/multi-tenancy" text="Multi-tenancy Overview" archetype="feature-guide" />
+  <RelatedReadItem path="/cloud/limits" text="Temporal Cloud Limits" archetype="cloud-guide" />
+  <RelatedReadItem path="/visibility" text="Visibility and Search Attributes" archetype="feature-guide" />
+</RelatedReadContainer>
diff --git a/sidebars.js b/sidebars.js