Refactor + force router keywords to lowercase #2597
base: main
Conversation
```diff
        result_lower = result.lower()
        for keyword in self.keywords:
-           if keyword.lower() == result_lower:
+           if keyword == result_lower:
```
keyword is already lower case by now, so no need to call .lower() again
@coderabbitai do a full review please, especially the migration
📝 Walkthrough

This PR refactors node infrastructure by extracting mixin classes from nodes.py into a dedicated mixins.py module. It implements case-insensitive routing by converting conditional branches to lowercase in router logic and applies consistent lowercase normalization to routing keywords in both the backend (Python) and frontend (TypeScript) components. Several deprecated node implementations and history middleware classes are removed as part of the consolidation.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks: ❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
apps/pipelines/nodes/mixins.py (1)
85-96: Return type annotation is misleading. The `mode="before"` validator receives and returns raw data (dict), not a `Self` instance. The `-> Self` annotation is technically incorrect for a "before" mode validator.

🔎 Suggested fix:
```diff
 @model_validator(mode="before")
 @classmethod
-def ensure_default_parameters(cls, data) -> Self:
+def ensure_default_parameters(cls, data: dict) -> dict:
     if llm_provider_model_id := data.get("llm_provider_model_id"):
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- apps/pipelines/nodes/base.py (1 hunks)
- apps/pipelines/nodes/mixins.py (1 hunks)
- apps/pipelines/nodes/nodes.py (3 hunks)
- assets/javascript/apps/pipeline/nodes/widgets.tsx (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{js,ts,jsx,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use ESLint with TypeScript support for JavaScript/TypeScript linting
Files:
assets/javascript/apps/pipeline/nodes/widgets.tsx
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use TypeScript strict mode for type checking
Files:
assets/javascript/apps/pipeline/nodes/widgets.tsx
**/*.{html,tsx,ts,jsx,js}
📄 CodeRabbit inference engine (CLAUDE.md)
Use DaisyUI components as the primary UI framework for all frontend components before falling back to TailwindCSS custom styling
Files:
assets/javascript/apps/pipeline/nodes/widgets.tsx
**/*.{html,tsx,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
All UI components must support both light and dark modes using theme-aware styling
Files:
assets/javascript/apps/pipeline/nodes/widgets.tsx
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for Python linting and formatting with a line length of 120 characters
Use isort via Ruff for Python import sorting
Files:
apps/pipelines/nodes/base.py
apps/pipelines/nodes/nodes.py
apps/pipelines/nodes/mixins.py
⚙️ CodeRabbit configuration file
**/*.py: Do not review for Python compatibility below 3.13
Flag heavy AI/ML imports at module level (langchain_*, openai, anthropic, google, boto3, pandas, numpy). These should be lazy-loaded inside methods to keep Django startup fast.
Files:
apps/pipelines/nodes/base.py
apps/pipelines/nodes/nodes.py
apps/pipelines/nodes/mixins.py
apps/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
apps/**/*.py: Avoid importing heavy AI/ML libraries (langchain_google_vertexai, langchain_google_genai, langchain_anthropic, langchain_openai, boto3, pandas, numpy) at module level; use lazy imports inside functions instead
Use TYPE_CHECKING guard for type-only imports from heavy AI/ML libraries to avoid module-level import costs
Files:
apps/pipelines/nodes/base.pyapps/pipelines/nodes/nodes.pyapps/pipelines/nodes/mixins.py
🧠 Learnings (6)
📚 Learning: 2024-11-15T13:16:57.559Z
Learnt from: SmittieC
Repo: dimagi/open-chat-studio PR: 869
File: assets/javascript/apps/pipeline/widgets.tsx:284-310
Timestamp: 2024-11-15T13:16:57.559Z
Learning: Type definitions for the `AssistantIdWidget` component in `assets/javascript/apps/pipeline/widgets.tsx` are maintained in a separate file.
Applied to files:
assets/javascript/apps/pipeline/nodes/widgets.tsx
📚 Learning: 2024-10-18T07:00:25.261Z
Learnt from: SmittieC
Repo: dimagi/open-chat-studio PR: 735
File: apps/pipelines/models.py:183-190
Timestamp: 2024-10-18T07:00:25.261Z
Learning: In `apps/pipelines/models.py`, within the `Pipeline` model's `create_new_version` method, only the `pipeline` field needs to be updated for node versions using `bulk_update`, as per the design.
Applied to files:
apps/pipelines/nodes/nodes.py
apps/pipelines/nodes/mixins.py
📚 Learning: 2024-10-18T07:02:17.007Z
Learnt from: SmittieC
Repo: dimagi/open-chat-studio PR: 735
File: apps/chat/bots.py:291-291
Timestamp: 2024-10-18T07:02:17.007Z
Learning: In `apps/chat/bots.py`, the `PipelineBot.process_input` method returns a `PipelineState` TypedDict where the 'ai_message_id' key always exists.
Applied to files:
apps/pipelines/nodes/nodes.py
📚 Learning: 2024-10-18T08:35:37.836Z
Learnt from: snopoke
Repo: dimagi/open-chat-studio PR: 740
File: apps/pipelines/models.py:9-9
Timestamp: 2024-10-18T08:35:37.836Z
Learning: In `apps/pipelines/models.py`, both `langchain_core.messages` and `langchain.schema` are acceptable import paths for `AIMessage` and related classes.
Applied to files:
apps/pipelines/nodes/nodes.py
📚 Learning: 2024-10-18T08:35:40.277Z
Learnt from: snopoke
Repo: dimagi/open-chat-studio PR: 740
File: apps/pipelines/models.py:252-262
Timestamp: 2024-10-18T08:35:40.277Z
Learning: In `apps/pipelines/models.py`, the `PipelineChatMessages.as_langchain_messages()` method intentionally returns messages in reverse order, so the sequence is correct as implemented.
Applied to files:
apps/pipelines/nodes/nodes.py
📚 Learning: 2024-10-18T06:41:13.553Z
Learnt from: SmittieC
Repo: dimagi/open-chat-studio PR: 735
File: apps/pipelines/models.py:21-34
Timestamp: 2024-10-18T06:41:13.553Z
Learning: In `apps/pipelines/models.py`, it's acceptable to prefetch `node_set` in the default queryset of `PipelineManager.get_queryset`.
Applied to files:
apps/pipelines/nodes/nodes.py
🪛 Ruff (0.14.8)
apps/pipelines/nodes/mixins.py
58-58: Avoid specifying long messages outside the exception class
(TRY003)
129-129: Avoid specifying long messages outside the exception class
(TRY003)
131-131: Avoid specifying long messages outside the exception class
(TRY003)
297-297: Unused method argument: info
(ARG002)
339-339: Unused lambda argument: x
(ARG005)
370-370: Unused method argument: state
(ARG002)
373-373: Unused method argument: reference_data
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: python-tests
- GitHub Check: cypress-tests
🔇 Additional comments (11)
assets/javascript/apps/pipeline/nodes/widgets.tsx (1)
656-664: LGTM! Consistent lowercase normalization for keywords. The frontend now normalizes keywords to lowercase on input, aligning with the backend `RouterMixin` validation that enforces lowercase keywords. This ensures users see the actual stored value immediately.

apps/pipelines/nodes/nodes.py (2)
61-68: LGTM! Clean import refactor. The mixin imports are well-organized and clearly indicate the relocated functionality.
602-606: LGTM! Case-insensitive routing logic is correct. The comparison `keyword == result_lower` works correctly because:

- `keywords` are pre-normalized to lowercase by the `RouterMixin.ensure_keywords_are_lowercase` validator
- `result_lower` is explicitly lowercased before comparison

Together these ensure consistent case-insensitive routing behavior; see the sketch below.
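A minimal, self-contained sketch of the two halves of that guarantee (the field name `keywords` and the validator name are taken from the review text; the model and function around them are assumptions, not the repo's actual classes):

```python
from pydantic import BaseModel, field_validator


class RouterParams(BaseModel):
    # Stand-in for the router node's config model.
    keywords: list[str]

    @field_validator("keywords")
    @classmethod
    def ensure_keywords_are_lowercase(cls, value: list[str]) -> list[str]:
        # Normalize once at validation time so routing never needs .lower() again.
        return [keyword.lower() for keyword in value]


def route(params: RouterParams, result: str) -> str | None:
    # Mirrors the reviewed comparison: only the LLM output is lowered,
    # because keywords are already lowercase by construction.
    result_lower = result.lower()
    for keyword in params.keywords:
        if keyword == result_lower:
            return keyword
    return None


assert route(RouterParams(keywords=["Yes", "No"]), "YES") == "yes"
```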
apps/pipelines/nodes/mixins.py (7)
53-58: LGTM! Cached LLM provider model lookup. The `@lru_cache` decorator appropriately caches the database lookup, and the error handling is clear.
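For reference, the caching pattern being praised looks roughly like the sketch below; the in-memory dict stands in for the real database table, and all names are illustrative:

```python
from functools import lru_cache

# Fake "table" standing in for the real provider-model records.
_PROVIDER_MODELS = {1: {"name": "gpt-4o", "provider": "openai"}}


@lru_cache
def get_provider_model(model_id: int) -> dict:
    # The cache means repeated lookups while building a pipeline hit the
    # dict (or, in the real code, the database) only once per id.
    try:
        return _PROVIDER_MODELS[model_id]
    except KeyError:
        raise ValueError(f"LLM provider model {model_id} does not exist") from None
```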
290-302: LGTM! Keyword validation ensures lowercase and uniqueness. The two validators correctly:
- Normalize keywords to lowercase
- Validate that keywords are non-empty and unique
Note: The `info` parameter in `ensure_keywords_exist` is unused, but this is typical for Pydantic validators that don't need field context.
326-344: LGTM! Extraction chain structure is well-designed. The `_prompt_chain` and `extraction_chain` methods properly compose LangChain runnables for structured data extraction.

Note: The lambda `lambda x: reference_data` ignores its argument intentionally; this is a common pattern for injecting static values into runnable chains.
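A tiny, hedged illustration of that constant-injection pattern with `langchain_core` runnables (the key names and the `reference_data` value are invented for the example):

```python
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

reference_data = {"existing": "profile data"}  # static value to inject

# Piping into a dict coerces it into a parallel runnable; the lambda ignores
# whatever flows through the chain and always returns the constant.
chain = RunnablePassthrough() | {
    "input": RunnablePassthrough(),
    "reference_data": RunnableLambda(lambda _x: reference_data),
}

print(chain.invoke("hello"))
# {'input': 'hello', 'reference_data': {'existing': 'profile data'}}
```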
367-374: Base implementation stubs are intentionally minimal. These methods (`get_node_output`, `get_reference_data`, `update_reference_data`) are meant to be overridden by subclasses. The unused arguments are expected for this pattern.
421-431: LGTM! Schema validation is correct. The validator properly checks that `data_schema` is valid JSON and a non-empty dictionary.
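That check amounts to something like the following sketch (the surrounding model and the error wording are assumptions, not the repository's code):

```python
import json

from pydantic import BaseModel, field_validator


class ExtractionParams(BaseModel):
    data_schema: str  # JSON object describing the fields to extract

    @field_validator("data_schema")
    @classmethod
    def validate_data_schema(cls, value: str) -> str:
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            raise ValueError("data_schema must be valid JSON") from None
        if not isinstance(parsed, dict) or not parsed:
            raise ValueError("data_schema must be a non-empty JSON object")
        return value
```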
316-323: Line reference is incorrect and concern is not valid. The code accesses `self.name` at line 323 (not 318), which is properly inherited through the class hierarchy. Both `RouterNode` and `StaticRouterNode` inherit from `PipelineRouterNode`, which extends `BasePipelineNode` where `name: str` is defined. This is correct usage of multiple inheritance with mixins.

Likely an incorrect or invalid review comment.
173-176: No changes needed. The `node_id` attribute is properly available in all classes using `HistoryMixin` through their inheritance chain from `BasePipelineNode`, which defines it as a field at line 311. `LLMResponseWithPrompt` inherits it via `LLMResponse` → `PipelineNode` → `BasePipelineNode`, and `RouterNode` inherits it via `PipelineRouterNode` → `BasePipelineNode`.

apps/pipelines/nodes/base.py (1)
440-449: LGTM! Core lowercase normalization for router matching. This change ensures that LLM-generated routing keywords are normalized to lowercase before:
- Matching against the output map
- Generating output tags
- Looking up target nodes in the edge map
This works in conjunction with `RouterMixin.ensure_keywords_are_lowercase` to provide consistent case-insensitive routing.
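In rough terms, the normalize-then-look-up step reads like the snippet below; `output_map` and `edge_map` are the names used in this review, while the wrapping function is an assumption for illustration:

```python
def resolve_route(llm_output: str, output_map: dict[str, str], edge_map: dict[str, str]) -> str | None:
    # Lowercase the model's answer before any lookups so "Yes", "YES" and
    # "yes" all resolve to the same output handle and target node.
    keyword = llm_output.strip().lower()
    output_handle = output_map.get(keyword)
    if output_handle is None:
        return None
    return edge_map.get(output_handle)
```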
🧩 Analysis chain

@SmittieC I'll perform a comprehensive review of this PR, with special attention to the migration. Let me gather the necessary information.

[Several scripts executed; outputs collapsed.]

Let me examine the test coverage and check for any edge cases in the migration.

```bash
#!/bin/bash
# Look at the router test to see what it covers
cat apps/pipelines/tests/test_runnable_builder.py | sed -n '/class TestRouterNode/,/^class [^T]/p' | head -100
```

```bash
#!/bin/bash
# Check the RouterMixin validators more carefully
cat apps/pipelines/nodes/mixins.py | sed -n '/class RouterMixin/,/^class /p'
```

```bash
#!/bin/bash
# Check if there are any keywords with uppercase in existing test data
rg -i "keywords.*[A-Z]" apps/pipelines/tests/ --type py -C2
```

```bash
#!/bin/bash
# Look at how StaticRouterNode processes keywords
rg -n -A15 "class StaticRouterNode" apps/pipelines/nodes/nodes.py
```
❌ 23 Tests Failed:
View the top 3 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard
```python
        node.params["keywords"] = [kw.lower() for kw in node.params["keywords"]]
        nodes_to_update.append(node)

        # …

        if idx == 100:
```
does this mean that it only saves the first batch of 100 items?
Rather, could we do something like:

```python
for chunk in queryset.iterator(chunk_size=100):
    # Process chunk
    # Then bulk_update immediately
```
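To make the concern concrete, here is a minimal sketch of a data migration that flushes every full batch rather than only the first one. The model name `Node`, the app label `pipelines`, and the keyword handling are assumptions for illustration, not the actual migration code:

```python
BATCH_SIZE = 100


def lowercase_router_keywords(apps, schema_editor):
    Node = apps.get_model("pipelines", "Node")  # assumed historical model
    batch = []
    for node in Node.objects.iterator(chunk_size=BATCH_SIZE):
        keywords = node.params.get("keywords")
        if not keywords:
            continue
        node.params["keywords"] = [kw.lower() for kw in keywords]
        batch.append(node)
        if len(batch) >= BATCH_SIZE:
            Node.objects.bulk_update(batch, ["params"])
            batch = []
    if batch:
        # Flush the final partial batch instead of dropping it.
        Node.objects.bulk_update(batch, ["params"])
```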
snopoke
left a comment
My concern with this is that users are still responsible for adding the keywords to the router prompt, so typos are still likely to occur, which the model might follow.
Technical Description
Forces all router keywords to lowercase. See https://dimagi.slack.com/archives/C07EJ7ENB1N/p1765973629616449
Migrations
Demo
https://www.loom.com/share/004b1243d47646d18fd866085695e4c1
Docs and Changelog
From now on, routers will only support lowercase keywords and will force all keywords to be lowercase.