Refactor + force router keywords to lowercase #2597
base: main
Conversation
```diff
        result_lower = result.lower()
        for keyword in self.keywords:
-           if keyword.lower() == result_lower:
+           if keyword == result_lower:
```
keyword is already lower case by now, so no need to call .lower() again
@coderabbitai do a full review please, especially the migration
📝 Walkthrough

This PR refactors node infrastructure by extracting mixin classes from nodes.py into a dedicated mixins.py module. It implements case-insensitive routing by converting conditional branches to lowercase in router logic and applies consistent lowercase normalization to routing keywords in both the backend (Python) and frontend (TypeScript) components. Several deprecated node implementations and history middleware classes are removed as part of the consolidation.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks: ❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
apps/pipelines/nodes/mixins.py (1)
85-96: Return type annotation is misleading. The `mode="before"` validator receives and returns raw data (dict), not a `Self` instance. The `-> Self` annotation is technically incorrect for a "before" mode validator.

🔎 Suggested fix:
```diff
 @model_validator(mode="before")
 @classmethod
-def ensure_default_parameters(cls, data) -> Self:
+def ensure_default_parameters(cls, data: dict) -> dict:
     if llm_provider_model_id := data.get("llm_provider_model_id"):
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- apps/pipelines/nodes/base.py (1 hunks)
- apps/pipelines/nodes/mixins.py (1 hunks)
- apps/pipelines/nodes/nodes.py (3 hunks)
- assets/javascript/apps/pipeline/nodes/widgets.tsx (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{js,ts,jsx,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use ESLint with TypeScript support for JavaScript/TypeScript linting
Files:
assets/javascript/apps/pipeline/nodes/widgets.tsx
**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Use TypeScript strict mode for type checking
Files:
assets/javascript/apps/pipeline/nodes/widgets.tsx
**/*.{html,tsx,ts,jsx,js}
📄 CodeRabbit inference engine (CLAUDE.md)
Use DaisyUI components as the primary UI framework for all frontend components before falling back to TailwindCSS custom styling
Files:
assets/javascript/apps/pipeline/nodes/widgets.tsx
**/*.{html,tsx,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
All UI components must support both light and dark modes using theme-aware styling
Files:
assets/javascript/apps/pipeline/nodes/widgets.tsx
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for Python linting and formatting with a line length of 120 characters
Use isort via Ruff for Python import sorting
Files:
apps/pipelines/nodes/base.py
apps/pipelines/nodes/nodes.py
apps/pipelines/nodes/mixins.py
⚙️ CodeRabbit configuration file
**/*.py: Do not review for Python compatibility below 3.13
Flag heavy AI/ML imports at module level (langchain_*, openai, anthropic, google, boto3, pandas, numpy). These should be lazy-loaded inside methods to keep Django startup fast.
Files:
apps/pipelines/nodes/base.py
apps/pipelines/nodes/nodes.py
apps/pipelines/nodes/mixins.py
apps/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
apps/**/*.py: Avoid importing heavy AI/ML libraries (langchain_google_vertexai, langchain_google_genai, langchain_anthropic, langchain_openai, boto3, pandas, numpy) at module level; use lazy imports inside functions instead
Use TYPE_CHECKING guard for type-only imports from heavy AI/ML libraries to avoid module-level import costs
Files:
apps/pipelines/nodes/base.pyapps/pipelines/nodes/nodes.pyapps/pipelines/nodes/mixins.py
🧠 Learnings (6)
📚 Learning: 2024-11-15T13:16:57.559Z
Learnt from: SmittieC
Repo: dimagi/open-chat-studio PR: 869
File: assets/javascript/apps/pipeline/widgets.tsx:284-310
Timestamp: 2024-11-15T13:16:57.559Z
Learning: Type definitions for the `AssistantIdWidget` component in `assets/javascript/apps/pipeline/widgets.tsx` are maintained in a separate file.
Applied to files:
assets/javascript/apps/pipeline/nodes/widgets.tsx
📚 Learning: 2024-10-18T07:00:25.261Z
Learnt from: SmittieC
Repo: dimagi/open-chat-studio PR: 735
File: apps/pipelines/models.py:183-190
Timestamp: 2024-10-18T07:00:25.261Z
Learning: In `apps/pipelines/models.py`, within the `Pipeline` model's `create_new_version` method, only the `pipeline` field needs to be updated for node versions using `bulk_update`, as per the design.
Applied to files:
apps/pipelines/nodes/nodes.py
apps/pipelines/nodes/mixins.py
📚 Learning: 2024-10-18T07:02:17.007Z
Learnt from: SmittieC
Repo: dimagi/open-chat-studio PR: 735
File: apps/chat/bots.py:291-291
Timestamp: 2024-10-18T07:02:17.007Z
Learning: In `apps/chat/bots.py`, the `PipelineBot.process_input` method returns a `PipelineState` TypedDict where the 'ai_message_id' key always exists.
Applied to files:
apps/pipelines/nodes/nodes.py
📚 Learning: 2024-10-18T08:35:37.836Z
Learnt from: snopoke
Repo: dimagi/open-chat-studio PR: 740
File: apps/pipelines/models.py:9-9
Timestamp: 2024-10-18T08:35:37.836Z
Learning: In `apps/pipelines/models.py`, both `langchain_core.messages` and `langchain.schema` are acceptable import paths for `AIMessage` and related classes.
Applied to files:
apps/pipelines/nodes/nodes.py
📚 Learning: 2024-10-18T08:35:40.277Z
Learnt from: snopoke
Repo: dimagi/open-chat-studio PR: 740
File: apps/pipelines/models.py:252-262
Timestamp: 2024-10-18T08:35:40.277Z
Learning: In `apps/pipelines/models.py`, the `PipelineChatMessages.as_langchain_messages()` method intentionally returns messages in reverse order, so the sequence is correct as implemented.
Applied to files:
apps/pipelines/nodes/nodes.py
📚 Learning: 2024-10-18T06:41:13.553Z
Learnt from: SmittieC
Repo: dimagi/open-chat-studio PR: 735
File: apps/pipelines/models.py:21-34
Timestamp: 2024-10-18T06:41:13.553Z
Learning: In `apps/pipelines/models.py`, it's acceptable to prefetch `node_set` in the default queryset of `PipelineManager.get_queryset`.
Applied to files:
apps/pipelines/nodes/nodes.py
🪛 Ruff (0.14.8)
apps/pipelines/nodes/mixins.py
58-58: Avoid specifying long messages outside the exception class
(TRY003)
129-129: Avoid specifying long messages outside the exception class
(TRY003)
131-131: Avoid specifying long messages outside the exception class
(TRY003)
297-297: Unused method argument: info
(ARG002)
339-339: Unused lambda argument: x
(ARG005)
370-370: Unused method argument: state
(ARG002)
373-373: Unused method argument: reference_data
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: python-tests
- GitHub Check: cypress-tests
🔇 Additional comments (11)
assets/javascript/apps/pipeline/nodes/widgets.tsx (1)
656-664: LGTM! Consistent lowercase normalization for keywords. The frontend now normalizes keywords to lowercase on input, aligning with the backend `RouterMixin` validation that enforces lowercase keywords. This ensures users see the actual stored value immediately.

apps/pipelines/nodes/nodes.py (2)
61-68: LGTM! Clean import refactor. The mixin imports are well-organized and clearly indicate the relocated functionality.
602-606: LGTM! Case-insensitive routing logic is correct. The comparison `keyword == result_lower` works correctly because:

- `keywords` are pre-normalized to lowercase by the `RouterMixin.ensure_keywords_are_lowercase` validator
- `result_lower` is explicitly lowercased before comparison

Together these ensure consistent case-insensitive routing behavior; see the sketch below.
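A minimal, self-contained sketch of the two halves of that guarantee (the field name `keywords` and the validator name are taken from the review text; the model and function around them are assumptions, not the repo's actual classes):

```python
from pydantic import BaseModel, field_validator


class RouterParams(BaseModel):
    # Stand-in for the router node's config model.
    keywords: list[str]

    @field_validator("keywords")
    @classmethod
    def ensure_keywords_are_lowercase(cls, value: list[str]) -> list[str]:
        # Normalize once at validation time so routing never needs .lower() again.
        return [keyword.lower() for keyword in value]


def route(params: RouterParams, result: str) -> str | None:
    # Mirrors the reviewed comparison: only the LLM output is lowered,
    # because keywords are already lowercase by construction.
    result_lower = result.lower()
    for keyword in params.keywords:
        if keyword == result_lower:
            return keyword
    return None


assert route(RouterParams(keywords=["Yes", "No"]), "YES") == "yes"
```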
apps/pipelines/nodes/mixins.py (7)
53-58: LGTM! Cached LLM provider model lookup. The `@lru_cache` decorator appropriately caches the database lookup, and the error handling is clear.
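For reference, the caching pattern being praised looks roughly like the sketch below; the in-memory dict stands in for the real database table, and all names are illustrative:

```python
from functools import lru_cache

# Fake "table" standing in for the real provider-model records.
_PROVIDER_MODELS = {1: {"name": "gpt-4o", "provider": "openai"}}


@lru_cache
def get_provider_model(model_id: int) -> dict:
    # The cache means repeated lookups while building a pipeline hit the
    # dict (or, in the real code, the database) only once per id.
    try:
        return _PROVIDER_MODELS[model_id]
    except KeyError:
        raise ValueError(f"LLM provider model {model_id} does not exist") from None
```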
290-302: LGTM! Keyword validation ensures lowercase and uniqueness. The two validators correctly:
- Normalize keywords to lowercase
- Validate that keywords are non-empty and unique
Note: The `info` parameter in `ensure_keywords_exist` is unused, but this is typical for Pydantic validators that don't need field context.
326-344: LGTM! Extraction chain structure is well-designed. The `_prompt_chain` and `extraction_chain` methods properly compose LangChain runnables for structured data extraction.

Note: The lambda `lambda x: reference_data` ignores its argument intentionally; this is a common pattern for injecting static values into runnable chains.
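A tiny, hedged illustration of that constant-injection pattern with `langchain_core` runnables (the key names and the `reference_data` value are invented for the example):

```python
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

reference_data = {"existing": "profile data"}  # static value to inject

# Piping into a dict coerces it into a parallel runnable; the lambda ignores
# whatever flows through the chain and always returns the constant.
chain = RunnablePassthrough() | {
    "input": RunnablePassthrough(),
    "reference_data": RunnableLambda(lambda _x: reference_data),
}

print(chain.invoke("hello"))
# {'input': 'hello', 'reference_data': {'existing': 'profile data'}}
```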
367-374: Base implementation stubs are intentionally minimal. These methods (`get_node_output`, `get_reference_data`, `update_reference_data`) are meant to be overridden by subclasses. The unused arguments are expected for this pattern.
421-431: LGTM! Schema validation is correct. The validator properly checks that `data_schema` is valid JSON and a non-empty dictionary.
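That check amounts to something like the following sketch (the surrounding model and the error wording are assumptions, not the repository's code):

```python
import json

from pydantic import BaseModel, field_validator


class ExtractionParams(BaseModel):
    data_schema: str  # JSON object describing the fields to extract

    @field_validator("data_schema")
    @classmethod
    def validate_data_schema(cls, value: str) -> str:
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            raise ValueError("data_schema must be valid JSON") from None
        if not isinstance(parsed, dict) or not parsed:
            raise ValueError("data_schema must be a non-empty JSON object")
        return value
```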
316-323: Line reference is incorrect and concern is not valid. The code accesses `self.name` at line 323 (not 318), which is properly inherited through the class hierarchy. Both `RouterNode` and `StaticRouterNode` inherit from `PipelineRouterNode`, which extends `BasePipelineNode` where `name: str` is defined. This is correct usage of multiple inheritance with mixins.

Likely an incorrect or invalid review comment.
173-176: No changes needed. The `node_id` attribute is properly available in all classes using `HistoryMixin` through their inheritance chain from `BasePipelineNode`, which defines it as a field at line 311. `LLMResponseWithPrompt` inherits it via `LLMResponse` → `PipelineNode` → `BasePipelineNode`, and `RouterNode` inherits it via `PipelineRouterNode` → `BasePipelineNode`.

apps/pipelines/nodes/base.py (1)
440-449: LGTM! Core lowercase normalization for router matching. This change ensures that LLM-generated routing keywords are normalized to lowercase before:
- Matching against the output map
- Generating output tags
- Looking up target nodes in the edge map
This works in conjunction with `RouterMixin.ensure_keywords_are_lowercase` to provide consistent case-insensitive routing.
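In rough terms, the normalize-then-look-up step reads like the snippet below; `output_map` and `edge_map` are the names used in this review, while the wrapping function is an assumption for illustration:

```python
def resolve_route(llm_output: str, output_map: dict[str, str], edge_map: dict[str, str]) -> str | None:
    # Lowercase the model's answer before any lookups so "Yes", "YES" and
    # "yes" all resolve to the same output handle and target node.
    keyword = llm_output.strip().lower()
    output_handle = output_map.get(keyword)
    if output_handle is None:
        return None
    return edge_map.get(output_handle)
```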
🧩 Analysis chain

@SmittieC I'll perform a comprehensive review of this PR, with special attention to the migration. Let me gather the necessary information.

[Several scripts executed; outputs collapsed.]

Let me examine the test coverage and check for any edge cases in the migration.

```bash
#!/bin/bash
# Look at the router test to see what it covers
cat apps/pipelines/tests/test_runnable_builder.py | sed -n '/class TestRouterNode/,/^class [^T]/p' | head -100
```

```bash
#!/bin/bash
# Check the RouterMixin validators more carefully
cat apps/pipelines/nodes/mixins.py | sed -n '/class RouterMixin/,/^class /p'
```

```bash
#!/bin/bash
# Check if there are any keywords with uppercase in existing test data
rg -i "keywords.*[A-Z]" apps/pipelines/tests/ --type py -C2
```

```bash
#!/bin/bash
# Look at how StaticRouterNode processes keywords
rg -n -A15 "class StaticRouterNode" apps/pipelines/nodes/nodes.py
```
❌ 23 Tests Failed:
View the top 3 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard
```python
        node.params["keywords"] = [kw.lower() for kw in node.params["keywords"]]
        nodes_to_update.append(node)

        # …

        if idx == 100:
```
does this mean that it only saves the first batch of 100 items?
Rather, could we do something like:

```python
for chunk in queryset.iterator(chunk_size=100):
    # Process chunk
    # Then bulk_update immediately
```
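To make the concern concrete, here is a minimal sketch of a data migration that flushes every full batch rather than only the first one. The model name `Node`, the app label `pipelines`, and the keyword handling are assumptions for illustration, not the actual migration code:

```python
BATCH_SIZE = 100


def lowercase_router_keywords(apps, schema_editor):
    Node = apps.get_model("pipelines", "Node")  # assumed historical model
    batch = []
    for node in Node.objects.iterator(chunk_size=BATCH_SIZE):
        keywords = node.params.get("keywords")
        if not keywords:
            continue
        node.params["keywords"] = [kw.lower() for kw in keywords]
        batch.append(node)
        if len(batch) >= BATCH_SIZE:
            Node.objects.bulk_update(batch, ["params"])
            batch = []
    if batch:
        # Flush the final partial batch instead of dropping it.
        Node.objects.bulk_update(batch, ["params"])
```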
snopoke
left a comment
My concern with this is that users are still responsible for adding the keywords to the router prompt, so typos are still likely to occur, which the model might follow.
Technical Description
Forces all router keywords to lowercase. See https://dimagi.slack.com/archives/C07EJ7ENB1N/p1765973629616449
Migrations
Demo
https://www.loom.com/share/004b1243d47646d18fd866085695e4c1
Docs and Changelog
From now on, routers will only support lowercase keywords and will force all keywords to be lowercase.