
Conversation


@NurayAhmadova NurayAhmadova commented Dec 18, 2025

Description

Related Issue

Motivation and Context

How This Has Been Tested

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Refactoring or add test (improvements in base code or adds test coverage to functionality)

Checklist

  • I ensured that the documentation is up to date
  • I explained why this PR updates go.mod in detail with reasoning why it's required
  • I would like a code coverage CI quality gate exception and have explained why

Ticket Details

TT-5545
Status: In Dev
Summary: Issue when retrieving a list of keys attached to an API

Generated at: 2025-12-19 09:26:09


github-actions bot commented Dec 18, 2025

API Changes

--- prev.txt	2025-12-19 09:26:47.587337275 +0000
+++ current.txt	2025-12-19 09:26:37.784371522 +0000
@@ -9829,6 +9829,8 @@
     SessionDetail returns the session detail using the storage engine (either in
     memory or Redis)
 
+func (b *DefaultSessionManager) SessionDetailBulk(orgID string, keyNames []string, hashed bool) (map[string]user.SessionState, error)
+
 func (b *DefaultSessionManager) Sessions(filter string) []string
     Sessions returns all sessions in the key store that match a filter key (a
     prefix)
@@ -10771,10 +10773,12 @@
 
 func (l *LDAPStorageHandler) GetListRange(keyName string, from, to int64) ([]string, error)
 
-func (r *LDAPStorageHandler) GetMultiKey(keyNames []string) ([]string, error)
+func (r *LDAPStorageHandler) GetMultiKey(_ []string) ([]string, error)
 
 func (l *LDAPStorageHandler) GetRawKey(filter string) (string, error)
 
+func (l *LDAPStorageHandler) GetRawMultiKey(keys []string) ([]string, error)
+
 func (l *LDAPStorageHandler) GetRollingWindow(keyName string, per int64, pipeline bool) (int, []interface{})
 
 func (l LDAPStorageHandler) GetSet(keyName string) (map[string]string, error)
@@ -11264,6 +11268,8 @@
 
 func (r *RPCStorageHandler) GetRawKey(keyName string) (string, error)
 
+func (r *RPCStorageHandler) GetRawMultiKey(keys []string) ([]string, error)
+
 func (r *RPCStorageHandler) GetRollingWindow(keyName string, per int64, pipeline bool) (int, []interface{})
 
 func (r RPCStorageHandler) GetSet(keyName string) (map[string]string, error)
@@ -11827,6 +11833,7 @@
 	UpdateSession(keyName string, session *user.SessionState, resetTTLTo int64, hashed bool) error
 	RemoveSession(orgID string, keyName string, hashed bool) bool
 	SessionDetail(orgID string, keyName string, hashed bool) (user.SessionState, bool)
+	SessionDetailBulk(orgID string, keyNames []string, hashed bool) (map[string]user.SessionState, error)
 	KeyExpired(newSession *user.SessionState) bool
 	Sessions(filter string) []string
 	ResetQuota(string, *user.SessionState, bool)
@@ -13382,6 +13389,10 @@
     the retrieval was successful. Currently, this method is not implemented and
     will cause a panic if invoked.
 
+func (s *DummyStorage) GetRawMultiKey(keys []string) ([]string, error)
+    GetRawMultiKey retrieves multiple values from the DummyStorage. Since
+    DummyStorage is an in-memory map, we just delegate to GetMultiKey.
+
 func (s *DummyStorage) GetRollingWindow(string, int64, bool) (int, []interface{})
     GetRollingWindow retrieves data for a specified rolling window; currently
     not implemented.
@@ -13434,6 +13445,7 @@
 type Handler interface {
 	GetKey(string) (string, error) // Returned string is expected to be a JSON object (user.SessionState)
 	GetMultiKey([]string) ([]string, error)
+	GetRawMultiKey([]string) ([]string, error)
 	GetRawKey(string) (string, error)
 	SetKey(string, string, int64) error // Second input string is expected to be a JSON object (user.SessionState)
 	SetRawKey(string, string, int64) error
@@ -13521,6 +13533,12 @@
 
 func (m MdcbStorage) GetRawKey(string) (string, error)
 
+func (m MdcbStorage) GetRawMultiKey(keys []string) ([]string, error)
+    GetRawMultiKey retrieves multiple values from the MdcbStorage based on a
+    slice of keys. Since MdcbStorage is a wrapper that manages fallback logic
+    between local and RPC layers, this method delegates to GetMultiKey to avoid
+    duplicating that fallback logic.
+
 func (m MdcbStorage) GetRollingWindow(key string, per int64, pipeline bool) (int, []interface{})
 
 func (m MdcbStorage) GetSet(key string) (map[string]string, error)
@@ -13633,6 +13651,9 @@
 
 func (r *RedisCluster) GetRawKey(keyName string) (string, error)
 
+func (r *RedisCluster) GetRawMultiKey(keys []string) ([]string, error)
+    GetRawMultiKey retrieves multiple values using a Pipeline.
+
 func (r *RedisCluster) GetRollingWindow(keyName string, per int64, pipeline bool) (int, []interface{})
 
 func (r *RedisCluster) GetSet(keyName string) (map[string]string, error)
@@ -13846,6 +13867,9 @@
 func (m *MockHandler) GetRawKey(arg0 string) (string, error)
     GetRawKey mocks base method.
 
+func (m *MockHandler) GetRawMultiKey(arg0 []string) ([]string, error)
+    GetRawMultiKey mocks base method.
+
 func (m *MockHandler) GetRollingWindow(key string, per int64, pipeline bool) (int, []any)
     GetRollingWindow mocks base method.
 
@@ -13951,6 +13975,9 @@
 func (mr *MockHandlerMockRecorder) GetRawKey(arg0 any) *gomock.Call
     GetRawKey indicates an expected call of GetRawKey.
 
+func (mr *MockHandlerMockRecorder) GetRawMultiKey(arg0 any) *gomock.Call
+    GetRawMultiKey indicates an expected call of GetRawMultiKey.
+
 func (mr *MockHandlerMockRecorder) GetRollingWindow(key, per, pipeline any) *gomock.Call
     GetRollingWindow indicates an expected call of GetRollingWindow.
 


probelabs bot commented Dec 18, 2025

This pull request addresses a critical performance issue (TT-5545) where retrieving a list of API keys filtered by an API ID caused excessive memory consumption and timeouts. The previous implementation loaded all session objects into memory before filtering. This PR refactors the logic to be highly memory-efficient and scalable by first fetching all key identifiers and then processing them in manageable batches.

Files Changed Analysis

The changes span 11 files and introduce a new bulk-processing workflow across the API, authentication, and storage layers:

  • gateway/api.go: The core logic in handleGetAllKeys has been rewritten to implement a batching mechanism. It now iterates through keys in chunks of 1000, calling a new bulk-fetch method for each batch.
  • gateway/auth_manager.go: A new method, SessionDetailBulk, has been added to the SessionHandler interface to efficiently retrieve session details for a list of keys.
  • storage/storage.go: The Handler interface has been updated with a new GetRawMultiKey method, establishing a contract for bulk data retrieval from any underlying storage backend.
  • storage/redis_cluster.go: Contains a performant, pipeline-based implementation of GetRawMultiKey for Redis, ensuring efficient bulk-read operations.
  • Other Storage Handlers (ldap, rpc, mdcb, dummy): These have been updated to implement the new GetRawMultiKey interface method for compatibility.
  • Tests: New integration and unit tests have been added in gateway/api_test.go and gateway/auth_manager_test.go to validate the new functionality.

Architecture & Impact Assessment

  • What this PR accomplishes: It resolves a critical scalability bottleneck in the /tyk/keys API endpoint, making it viable for use in deployments with millions of API keys.
  • Key technical changes introduced:
    • Introduction of a batch processing pattern for API key filtering.
    • Addition of a SessionDetailBulk method to the session management layer.
    • Modification of the core storage.Handler interface to support bulk-reading (GetRawMultiKey).
    • Propagation of context.Context from the API layer down to the storage layer for better timeout and cancellation handling.
  • Affected system components: The changes impact the Gateway API (specifically the /tyk/keys endpoint), the Authentication Manager, and the entire Storage Layer, requiring updates to all storage backend implementations.

Key Filtering Flow

sequenceDiagram
    participant Client
    participant GatewayAPI
    participant SessionManager
    participant Storage

    Client->>GatewayAPI: GET /tyk/keys?api_id=some-api
    GatewayAPI->>SessionManager: Sessions(filter) # Get all key IDs
    SessionManager-->>GatewayAPI: [key1, key2, ..., keyN]

    loop For each batch of 1000 keys
        GatewayAPI->>SessionManager: SessionDetailBulk(ctx, batch_keys)
        SessionManager->>Storage: GetRawMultiKey(batch_keys)
        Storage-->>SessionManager: {key_data_1, key_data_2, ...}
        SessionManager-->>GatewayAPI: map[key]SessionState
        GatewayAPI->>GatewayAPI: Filter sessions in batch by api_id
    end

    GatewayAPI-->>Client: 200 OK [filtered_key1, ...]

Scope Discovery & Context Expansion

The modification of the storage.Handler interface is a significant architectural change with broad impact, as it necessitates updates across all supported storage backends. While the Redis implementation is efficient, the implementations for other backends have issues:

  • RPC & MDCB Storage: The GetMultiKey and GetRawMultiKey implementations in gateway/rpc_storage_handler.go and storage/mdcb_storage.go are incorrect. They perform sequential lookups and return after the first found key, rather than fetching the entire batch. This will lead to incomplete results for deployments using these backends.
  • LDAP Storage: The gateway/ldap_auth_handler.go implementation is a stub that returns an error, meaning this feature will not work for LDAP-based deployments.

The introduction of context.Context is a notable improvement, enhancing the gateway's resilience by allowing request timeouts and cancellations to propagate down to the storage layer.

Metadata
  • Review Effort: 4 / 5
  • Primary Label: bug

Powered by Visor from Probelabs

Last updated: 2025-12-19T09:34:52.851Z | Triggered by: pr_updated | Commit: b219123

💡 TIP: You can chat with Visor using /visor ask <your question>


probelabs bot commented Dec 18, 2025

Security Issues (1)

🟢 Info · gateway/api.go:781
The `filter` parameter from user input is sanitized using a denylist of characters (`*?"{}[]`). While this is a good defense against pattern injection in the underlying storage (Redis), an allowlist-based validation approach is generally safer: it restricts input to a set of known-safe characters (e.g., alphanumerics, dashes, underscores), providing stronger protection against unforeseen malicious inputs.
💡 Suggestion: Validate the `filter` parameter against a regular expression that defines a strict set of allowed characters. For example, if filters are expected to be simple prefixes, a pattern like `^[a-zA-Z0-9_.-]+$` could be used. This provides a more robust defense-in-depth posture.

Architecture Issues (1)

🟡 Warning · gateway/rpc_storage_handler.go:313-328
The implementation of `GetMultiKey` iterates over keys and makes a separate RPC call (`r.GetKey(key)`) for each one. This is inefficient when fetching a large number of keys, as it creates an "N+1" query problem over RPC. The same pattern is repeated in `storage/mdcb_storage.go`.
💡 Suggestion: A more scalable approach would be a new RPC endpoint that accepts a list of keys and returns all of them in a single call, significantly reducing network latency and overhead. While the current implementation fixes a bug, it should be treated as a temporary solution, with a follow-up task to implement a proper bulk RPC endpoint.

Performance Issues (4)

🟠 Error · gateway/rpc_storage_handler.go:310-324
The `GetMultiKey` function retrieves keys sequentially in a loop, an N+1 query problem: a batch of 1000 keys results in 1000 separate RPC calls, causing significant latency.
💡 Suggestion: The underlying RPC service should be enhanced with a bulk retrieval endpoint, and this client updated to use it. If that is not possible, this way of fetching keys will not scale and an alternative approach should be considered.
🟠 Error · storage/mdcb_storage.go:61-75
The `GetMultiKey` function retrieves keys sequentially in a loop, an N+1 query problem: a batch of 1000 keys results in 1000 separate calls to `GetKey`, causing significant latency.
💡 Suggestion: This implementation should perform a bulk fetch when the underlying storage supports it, instead of iterating and calling `GetKey` for each key.
🟠 Error · storage/redis_cluster.go:332-352
The `context.Context` from the incoming HTTP request is not propagated to the storage layer. `GetRawMultiKey` and `pipelineFetch` use `context.Background()`, so if the client cancels the request, the potentially long-running database query is not cancelled. This leads to unnecessary resource consumption and a less resilient system.
💡 Suggestion: Modify the `storage.Handler` and `gateway.SessionHandler` interfaces to accept a `context.Context` in their bulk retrieval methods, then pass the request context from `handleGetAllKeys` all the way down to the Redis calls (`MGet`, `pipe.Get`, `pipe.Exec`).
🟡 Warning · gateway/api.go:737
The `batch` slice is re-allocated inside the `for` loop on every iteration. For a large number of keys, this causes unnecessary allocations and garbage-collector pressure.
💡 Suggestion: Declare the `batch` slice once before the loop and reset its length to zero at the start of each iteration with `batch = batch[:0]`.
🔧 Suggested Fix
batch := make([]string, 0, batchSize)
for i := 0; i < len(keys); i += batchSize {
    // ...
    batch = batch[:0] // Reset slice for reuse
    for _, k := range keys[i:end] {
        // ...
    }
    // ...
}

Quality Issues (3)

🟠 Error · gateway/api.go:720-763
The new bulk-fetch and input-validation code introduces several critical paths that are not covered by tests, as indicated by the new-code coverage of 61.5%. Missing tests include: 1. rejection of filters containing unsafe characters in `getAllSessionKeys`; 2. context cancellation within the batch-processing loop in `handleGetAllKeys`; 3. error handling when `gw.GlobalSessionManager.SessionDetailBulk` returns an error; 4. errors returned from the storage layer in `gateway/auth_manager.go:SessionDetailBulk`; 5. JSON unmarshalling errors for individual session objects in `gateway/auth_manager.go:SessionDetailBulk`; 6. errors from the Redis client or pipeline execution in `storage/redis_cluster.go:GetRawMultiKey`.
💡 Suggestion: Add unit and integration tests covering these failure scenarios and edge cases: for example, use a mock storage handler that returns an error to test error propagation, and add specific cases for invalid filter inputs and context cancellation.
🟡 Warning · storage/mdcb_storage.go:61-75
`GetMultiKey` in `MdcbStorage` (and `RPCStorageHandler`) was fixed to correctly retrieve all requested keys instead of returning after the first success, but no test cases were added to verify the fix. Without a dedicated test there is a risk of regression, and it is not proven that the bug is fully resolved in all cases.
💡 Suggestion: Add a unit test for `GetMultiKey` that requests multiple keys, including a mix of existing and non-existing ones, and asserts that all corresponding values are returned correctly. This test should have failed with the old implementation.
🟡 Warning · gateway/api.go:720
The batch size for processing keys is hardcoded to 1000. While this may be a reasonable default, making it configurable would let operators tune performance for their environment (e.g., Redis performance, network latency, session object size).
💡 Suggestion: Make the batch size a configurable parameter in the Tyk gateway configuration, with 1000 as the default, improving the maintainability and adaptability of the feature.


Last updated: 2025-12-19T09:34:55.614Z | Triggered by: pr_updated | Commit: b219123

@sonarqubecloud

Quality Gate failed

Failed conditions
61.5% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud
