
Conversation


@NurayAhmadova NurayAhmadova commented Dec 18, 2025

Description

Related Issue

Motivation and Context

How This Has Been Tested

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Refactoring or add test (improvements in base code or adds test coverage to functionality)

Checklist

  • I ensured that the documentation is up to date
  • I explained why this PR updates go.mod in detail with reasoning why it's required
  • I would like a code coverage CI quality gate exception and have explained why

Ticket Details

TT-5545
Status: In Dev
Summary: Issue when retrieving a list of keys attached to an API

Generated at: 2025-12-19 09:26:09


github-actions bot commented Dec 18, 2025

API Changes

--- prev.txt	2025-12-19 09:26:47.587337275 +0000
+++ current.txt	2025-12-19 09:26:37.784371522 +0000
@@ -9829,6 +9829,8 @@
     SessionDetail returns the session detail using the storage engine (either in
     memory or Redis)
 
+func (b *DefaultSessionManager) SessionDetailBulk(orgID string, keyNames []string, hashed bool) (map[string]user.SessionState, error)
+
 func (b *DefaultSessionManager) Sessions(filter string) []string
     Sessions returns all sessions in the key store that match a filter key (a
     prefix)
@@ -10771,10 +10773,12 @@
 
 func (l *LDAPStorageHandler) GetListRange(keyName string, from, to int64) ([]string, error)
 
-func (r *LDAPStorageHandler) GetMultiKey(keyNames []string) ([]string, error)
+func (r *LDAPStorageHandler) GetMultiKey(_ []string) ([]string, error)
 
 func (l *LDAPStorageHandler) GetRawKey(filter string) (string, error)
 
+func (l *LDAPStorageHandler) GetRawMultiKey(keys []string) ([]string, error)
+
 func (l *LDAPStorageHandler) GetRollingWindow(keyName string, per int64, pipeline bool) (int, []interface{})
 
 func (l LDAPStorageHandler) GetSet(keyName string) (map[string]string, error)
@@ -11264,6 +11268,8 @@
 
 func (r *RPCStorageHandler) GetRawKey(keyName string) (string, error)
 
+func (r *RPCStorageHandler) GetRawMultiKey(keys []string) ([]string, error)
+
 func (r *RPCStorageHandler) GetRollingWindow(keyName string, per int64, pipeline bool) (int, []interface{})
 
 func (r RPCStorageHandler) GetSet(keyName string) (map[string]string, error)
@@ -11827,6 +11833,7 @@
 	UpdateSession(keyName string, session *user.SessionState, resetTTLTo int64, hashed bool) error
 	RemoveSession(orgID string, keyName string, hashed bool) bool
 	SessionDetail(orgID string, keyName string, hashed bool) (user.SessionState, bool)
+	SessionDetailBulk(orgID string, keyNames []string, hashed bool) (map[string]user.SessionState, error)
 	KeyExpired(newSession *user.SessionState) bool
 	Sessions(filter string) []string
 	ResetQuota(string, *user.SessionState, bool)
@@ -13382,6 +13389,10 @@
     the retrieval was successful. Currently, this method is not implemented and
     will cause a panic if invoked.
 
+func (s *DummyStorage) GetRawMultiKey(keys []string) ([]string, error)
+    GetRawMultiKey retrieves multiple values from the DummyStorage. Since
+    DummyStorage is an in-memory map, we just delegate to GetMultiKey.
+
 func (s *DummyStorage) GetRollingWindow(string, int64, bool) (int, []interface{})
     GetRollingWindow retrieves data for a specified rolling window; currently
     not implemented.
@@ -13434,6 +13445,7 @@
 type Handler interface {
 	GetKey(string) (string, error) // Returned string is expected to be a JSON object (user.SessionState)
 	GetMultiKey([]string) ([]string, error)
+	GetRawMultiKey([]string) ([]string, error)
 	GetRawKey(string) (string, error)
 	SetKey(string, string, int64) error // Second input string is expected to be a JSON object (user.SessionState)
 	SetRawKey(string, string, int64) error
@@ -13521,6 +13533,12 @@
 
 func (m MdcbStorage) GetRawKey(string) (string, error)
 
+func (m MdcbStorage) GetRawMultiKey(keys []string) ([]string, error)
+    GetRawMultiKey retrieves multiple values from the MdcbStorage based on a
+    slice of keys. Since MdcbStorage is a wrapper that manages fallback logic
+    between local and RPC layers, this method delegates to GetMultiKey to avoid
+    duplicating that fallback logic.
+
 func (m MdcbStorage) GetRollingWindow(key string, per int64, pipeline bool) (int, []interface{})
 
 func (m MdcbStorage) GetSet(key string) (map[string]string, error)
@@ -13633,6 +13651,9 @@
 
 func (r *RedisCluster) GetRawKey(keyName string) (string, error)
 
+func (r *RedisCluster) GetRawMultiKey(keys []string) ([]string, error)
+    GetRawMultiKey retrieves multiple values using a Pipeline.
+
 func (r *RedisCluster) GetRollingWindow(keyName string, per int64, pipeline bool) (int, []interface{})
 
 func (r *RedisCluster) GetSet(keyName string) (map[string]string, error)
@@ -13846,6 +13867,9 @@
 func (m *MockHandler) GetRawKey(arg0 string) (string, error)
     GetRawKey mocks base method.
 
+func (m *MockHandler) GetRawMultiKey(arg0 []string) ([]string, error)
+    GetRawMultiKey mocks base method.
+
 func (m *MockHandler) GetRollingWindow(key string, per int64, pipeline bool) (int, []any)
     GetRollingWindow mocks base method.
 
@@ -13951,6 +13975,9 @@
 func (mr *MockHandlerMockRecorder) GetRawKey(arg0 any) *gomock.Call
     GetRawKey indicates an expected call of GetRawKey.
 
+func (mr *MockHandlerMockRecorder) GetRawMultiKey(arg0 any) *gomock.Call
+    GetRawMultiKey indicates an expected call of GetRawMultiKey.
+
 func (mr *MockHandlerMockRecorder) GetRollingWindow(key, per, pipeline any) *gomock.Call
     GetRollingWindow indicates an expected call of GetRollingWindow.
 


probelabs bot commented Dec 18, 2025

This pull request addresses a critical performance issue (TT-5545) where retrieving a list of API keys filtered by an API ID caused excessive memory consumption and timeouts. The previous implementation loaded all session objects into memory before filtering. This PR refactors the logic to be highly memory-efficient and scalable by first fetching all key identifiers and then processing them in manageable batches.

Files Changed Analysis

The changes span 11 files and introduce a new bulk-processing workflow across the API, authentication, and storage layers:

  • gateway/api.go: The core logic in handleGetAllKeys has been rewritten to implement a batching mechanism. It now iterates through keys in chunks of 1000, calling a new bulk-fetch method for each batch.
  • gateway/auth_manager.go: A new method, SessionDetailBulk, has been added to the SessionHandler interface to efficiently retrieve session details for a list of keys.
  • storage/storage.go: The Handler interface has been updated with a new GetRawMultiKey method, establishing a contract for bulk data retrieval from any underlying storage backend.
  • storage/redis_cluster.go: Contains a performant, pipeline-based implementation of GetRawMultiKey for Redis, ensuring efficient bulk-read operations.
  • Other Storage Handlers (ldap, rpc, mdcb, dummy): These have been updated to implement the new GetRawMultiKey interface method for compatibility.
  • Tests: New integration and unit tests have been added in gateway/api_test.go and gateway/auth_manager_test.go to validate the new functionality.

Architecture & Impact Assessment

  • What this PR accomplishes: It resolves a critical scalability bottleneck in the /tyk/keys API endpoint, making it viable for use in deployments with millions of API keys.
  • Key technical changes introduced:
    • Introduction of a batch processing pattern for API key filtering.
    • Addition of a SessionDetailBulk method to the session management layer.
    • Modification of the core storage.Handler interface to support bulk-reading (GetRawMultiKey).
    • Propagation of context.Context from the API layer down to the storage layer for better timeout and cancellation handling.
  • Affected system components: The changes impact the Gateway API (specifically the /tyk/keys endpoint), the Authentication Manager, and the entire Storage Layer, requiring updates to all storage backend implementations.

Key Filtering Flow

sequenceDiagram
    participant Client
    participant GatewayAPI
    participant SessionManager
    participant Storage

    Client->>GatewayAPI: GET /tyk/keys?api_id=some-api
    GatewayAPI->>SessionManager: Sessions(filter) # Get all key IDs
    SessionManager-->>GatewayAPI: [key1, key2, ..., keyN]

    loop For each batch of 1000 keys
        GatewayAPI->>SessionManager: SessionDetailBulk(ctx, batch_keys)
        SessionManager->>Storage: GetRawMultiKey(batch_keys)
        Storage-->>SessionManager: {key_data_1, key_data_2, ...}
        SessionManager-->>GatewayAPI: map[key]SessionState
        GatewayAPI->>GatewayAPI: Filter sessions in batch by api_id
    end

    GatewayAPI-->>Client: 200 OK [filtered_key1, ...]

Scope Discovery & Context Expansion

The modification of the storage.Handler interface is a significant architectural change with broad impact, as it necessitates updates across all supported storage backends. While the Redis implementation is efficient, the implementations for other backends have issues:

  • RPC & MDCB Storage: The GetMultiKey and GetRawMultiKey implementations in gateway/rpc_storage_handler.go and storage/mdcb_storage.go are incorrect. They perform sequential lookups and return after the first found key, rather than fetching the entire batch. This will lead to incomplete results for deployments using these backends.
  • LDAP Storage: The gateway/ldap_auth_handler.go implementation is a stub that returns an error, meaning this feature will not work for LDAP-based deployments.

The introduction of context.Context is a notable improvement, enhancing the gateway's resilience by allowing request timeouts and cancellations to propagate down to the storage layer.

Metadata
  • Review Effort: 4 / 5
  • Primary Label: bug

Powered by Visor from Probelabs

Last updated: 2025-12-19T09:34:52.851Z | Triggered by: pr_updated | Commit: b219123

💡 TIP: You can chat with Visor using /visor ask <your question>


probelabs bot commented Dec 18, 2025

Security Issues (1)

🟢 Info · gateway/api.go:781
The `filter` parameter from user input is sanitized using a denylist of characters (`*?"{}[]`). While this is a good defense against pattern injection in the underlying storage (Redis), an allowlist-based validation approach is generally safer: it restricts input to a set of known-safe characters (e.g., alphanumerics, dashes, underscores), providing stronger protection against unforeseen malicious inputs.
💡 Suggestion: Validate the `filter` parameter against a regular expression that defines a strict set of allowed characters. For example, if filters are expected to be simple prefixes, a pattern like `^[a-zA-Z0-9_.-]+$` could be used. This provides a more robust defense-in-depth posture.

Architecture Issues (1)

🟡 Warning · gateway/rpc_storage_handler.go:313-328
The implementation of `GetMultiKey` iterates over keys and makes a separate RPC call (`r.GetKey(key)`) for each one. This is inefficient when fetching a large number of keys, as it creates an "N+1" query problem over RPC. The same pattern is repeated in `storage/mdcb_storage.go`.
💡 Suggestion: A more scalable approach would be a new RPC endpoint that accepts a list of keys and returns all of them in a single call, significantly reducing network latency and overhead. While the current implementation fixes a bug, it should be treated as a temporary solution, with a follow-up task to implement a proper bulk RPC endpoint.

Performance Issues (4)

🟠 Error · gateway/rpc_storage_handler.go:310-324
The `GetMultiKey` function retrieves keys sequentially in a loop, an N+1 query problem: a batch of 1000 keys results in 1000 separate RPC calls, causing significant latency.
💡 Suggestion: The underlying RPC service should be enhanced with a bulk retrieval endpoint, and this client updated to use it. If that is not possible, this way of fetching keys will not scale and an alternative approach should be considered.
🟠 Error · storage/mdcb_storage.go:61-75
The `GetMultiKey` function retrieves keys sequentially in a loop, an N+1 query problem: a batch of 1000 keys results in 1000 separate calls to `GetKey`, causing significant latency.
💡 Suggestion: This implementation should perform a bulk fetch when the underlying storage supports it, instead of iterating and calling `GetKey` for each key.
🟠 Error · storage/redis_cluster.go:332-352
The `context.Context` from the incoming HTTP request is not propagated to the storage layer. `GetRawMultiKey` and `pipelineFetch` use `context.Background()`, so if the client cancels the request, the potentially long-running database query is not cancelled. This leads to unnecessary resource consumption and a less resilient system.
💡 Suggestion: Modify the `storage.Handler` and `gateway.SessionHandler` interfaces to accept a `context.Context` in their bulk retrieval methods, then pass the request context from `handleGetAllKeys` all the way down to the Redis calls (`MGet`, `pipe.Get`, `pipe.Exec`).
🟡 Warning · gateway/api.go:737
The `batch` slice is re-allocated inside the `for` loop on every iteration. For a large number of keys, this causes unnecessary allocations and garbage-collector pressure.
💡 Suggestion: Declare the `batch` slice once before the loop and reset its length to zero at the start of each iteration with `batch = batch[:0]`.
🔧 Suggested Fix
batch := make([]string, 0, batchSize)
for i := 0; i < len(keys); i += batchSize {
    // ...
    batch = batch[:0] // Reset slice for reuse
    for _, k := range keys[i:end] {
        // ...
    }
    // ...
}

Quality Issues (3)

🟠 Error · gateway/api.go:720-763
The new bulk-fetch and input-validation code introduces several critical paths that are not covered by tests, as indicated by the new-code coverage of 61.5%. Missing tests include: 1. rejection of filters containing unsafe characters in `getAllSessionKeys`; 2. context cancellation within the batch-processing loop in `handleGetAllKeys`; 3. error handling when `gw.GlobalSessionManager.SessionDetailBulk` returns an error; 4. errors returned from the storage layer in `gateway/auth_manager.go:SessionDetailBulk`; 5. JSON unmarshalling errors for individual session objects in `gateway/auth_manager.go:SessionDetailBulk`; 6. errors from the Redis client or pipeline execution in `storage/redis_cluster.go:GetRawMultiKey`.
💡 Suggestion: Add unit and integration tests covering these failure scenarios and edge cases: for example, use a mock storage handler that returns an error to test error propagation, and add specific cases for invalid filter inputs and context cancellation.
🟡 Warning · storage/mdcb_storage.go:61-75
`GetMultiKey` in `MdcbStorage` (and `RPCStorageHandler`) was fixed to correctly retrieve all requested keys instead of returning after the first success, but no test cases were added to verify the fix. Without a dedicated test there is a risk of regression, and it is not proven that the bug is fully resolved in all cases.
💡 Suggestion: Add a unit test for `GetMultiKey` that requests multiple keys, including a mix of existing and non-existing ones, and asserts that all corresponding values are returned correctly. This test should have failed with the old implementation.
🟡 Warning · gateway/api.go:720
The batch size for processing keys is hardcoded to 1000. While this may be a reasonable default, making it configurable would let operators tune performance for their environment (e.g., Redis performance, network latency, session object size).
💡 Suggestion: Make the batch size a configurable parameter in the Tyk gateway configuration, with 1000 as the default, improving the maintainability and adaptability of the feature.


Last updated: 2025-12-19T09:34:55.614Z | Triggered by: pr_updated | Commit: b219123

@sonarqubecloud

Quality Gate failed

Failed conditions
61.5% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud
