Skip to content

refactor(native): decompose core Rust algorithms and pipeline#845

Merged
carlos-alm merged 11 commits intomainfrom
refactor/titan-native-algorithm-decomposition
Apr 5, 2026
Merged

refactor(native): decompose core Rust algorithms and pipeline#845
carlos-alm merged 11 commits intomainfrom
refactor/titan-native-algorithm-decomposition

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Decompose louvain_impl (cog=85, bugs=2.72) into init_communities(), move_nodes_phase(), and aggregate_phase()
  • Split extract_param_names_strategy (cog=83, bugs=1.36) into per-language handler functions
  • Decompose run_pipeline (cog=110, bugs=7.42) into stage functions: setup, collect, parse, resolve, edges, finalize

Titan Audit Context

Changes

  • crates/codegraph-core/src/graph_algorithms.rs -- decompose louvain_impl into 3 phase functions
  • crates/codegraph-core/src/dataflow.rs -- split extract_param_names_strategy into per-language handlers
  • crates/codegraph-core/src/build_pipeline.rs -- decompose run_pipeline into stage functions

Metrics Impact

  • run_pipeline: bugs 7.42 -> 4.39, cog 110 -> 29, MI 22.9 -> 34.1
  • louvain_impl: bugs 2.72 -> 0.43, cog 85 -> 8, MI 30.6 -> 50.7
  • extract_param_names_strategy: bugs 1.36 -> 0.17, cog 83 -> 1, MI 23.1 -> 56.2

Test plan

  • CI passes (lint + build + tests)
  • Graph algorithm tests pass (community detection unchanged)
  • Build pipeline produces identical graph output
  • Dataflow analysis results unchanged

Extract hardcoded magic numbers to named constants in constants.rs:
- Louvain: MAX_LEVELS=50, MAX_PASSES=20, MIN_GAIN=1e-12, DEFAULT_SEED=42
- Dataflow: TRUNCATION_LIMIT=120
- Build pipeline: FAST_PATH_MAX_CHANGED_FILES=5, FAST_PATH_MIN_EXISTING_FILES=20
Also extract DEFAULT_RANDOM_SEED=42 in TS louvain.ts.
@claude
Copy link
Copy Markdown

claude bot commented Apr 4, 2026

Claude finished @carlos-alm's task —— View job


Root Cause Analysis

Problem Being Solved: Reducing cognitive complexity in high-complexity native Rust functions that were identified as hard to maintain (louvain_impl cog=85→8, extract_param_names_strategy cog=83→1, run_pipeline cog=110→29)

Why This Approach: Function decomposition is the standard solution for cognitive overload. The metrics improvements (bugs 7.42→4.39, MI 22.9→34.1) demonstrate real maintainability gains. This follows established "Titan" patterns for decomposing complex algorithms.

Risk Assessment: ✅ LOW - Pure refactoring with preserved test coverage and functionality. The existing Louvain and dataflow tests validate behavioral compatibility.

Backlog Compliance

  • Zero-dep: ✅ — No new runtime dependencies, pure Rust code reorganization
  • Foundation-aligned: ✅ — Improves maintainability of the native engine without changing APIs or violating principles
  • Problem-fit: 3 — Improves maintainability and reduces bugs in the native backend, but doesn't directly address the core "AI agents lost in codebases" problem
  • Breaking: No — Internal refactoring only, all public APIs preserved
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Code Quality Analysis

Excellent Decomposition Pattern

The refactoring follows clean separation principles:

crates/codegraph-core/src/graph_algorithms.rs:

  • louvain_impl split into init_communities(), local_move_phase(), aggregation_phase() - each with single responsibility
  • Well-defined function signatures with clear parameter passing
  • Tests preserved (lines 543-669) validate identical behavior

crates/codegraph-core/src/dataflow.rs:

  • extract_param_names_strategy decomposed into per-language handlers (extract_params_python, extract_params_go, etc.)
  • Strategy pattern implementation is clean and extensible
  • Each handler focuses on one language's AST patterns

crates/codegraph-core/src/build_pipeline.rs:

  • run_pipeline broken into staged functions with clear timing boundaries
  • Each stage has isolated concerns (collect, detect, parse, resolve, etc.)
  • Error handling preserved throughout the decomposition

Constants Extraction

crates/codegraph-core/src/constants.rs:

  • Magic numbers extracted to named constants (LOUVAIN_MAX_LEVELS, LOUVAIN_MIN_GAIN, FAST_PATH_MAX_CHANGED_FILES)
  • Follows the CLAUDE.md guidance: "always add them to DEFAULTS and wire them through config"

Test Integrity Maintained

All existing tests in graph_algorithms.rs (lines 554-668) remain unchanged and comprehensive:

  • BFS traversal (forward/backward/depth-limited)
  • Shortest path (found/not found/same node)
  • Fan-in/out centrality
  • Louvain community detection (empty/two-cliques/single-component)

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: High-quality refactoring that significantly improves maintainability metrics without breaking functionality. The decomposition patterns are exemplary, constants are properly extracted, and test coverage validates compatibility. While this doesn't directly solve the AI orientation problem, it's foundational work that makes the native engine more maintainable for future enhancements.

Minor note: This represents excellent engineering practice for managing technical debt in complex algorithms.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 4, 2026

Greptile Summary

This PR decomposes three high-complexity Rust functions in the native engine — louvain_impl (cog 85→8), extract_param_names_strategy (cog 83→1), and run_pipeline (cog 110→29) — into focused helper functions without changing behavior.

  • graph_algorithms.rs: louvain_impl is split into louvain_init, local_move_phase, aggregation_phase, and compute_modularity. The LouvainState struct cleanly encapsulates the mutable multi-level loop state. The original edge map is correctly preserved outside LouvainState for use in compute_modularity (since aggregation strips intra-community edges from cur_edges), making the architectural intent explicit via the in-code comment.
  • dataflow.rs: extract_param_names_strategy's monolithic match arm is replaced by 7 private extract_params_* handler functions (Python, Go, Rust, Java, C#, PHP, Ruby). The dispatch function is now a clean one-liner delegation table.
  • build_pipeline.rs: run_pipeline's ~400-line body is reorganized into named stage helpers (collect_source_files, reparse_barrel_candidates, finalize_build, check_version_mismatch, build_insert_batches, build_file_hash_entries, build_and_insert_call_edges), each with a clear single responsibility. The previous concern about the _rel_path underscore prefix has been resolved.
  • One minor Clippy-flagged style issue: file_node_id: file_node_id at line 880 in build_and_insert_call_edges can use Rust field init shorthand.

Confidence Score: 5/5

Safe to merge — pure decomposition refactor with no behavioral changes and no P0/P1 issues

All three decompositions are mechanically correct and semantics-preserving. The Louvain phase split correctly retains the original edge map outside LouvainState for final modularity computation. The dataflow dispatch is a clean table-driven refactor. The build pipeline stage extraction preserves all control flow and variable scoping. The only finding is a P2 redundant field initializer. The prior concern about the _rel_path underscore prefix is confirmed resolved.

No files require special attention

Important Files Changed

Filename Overview
crates/codegraph-core/src/build_pipeline.rs Decomposed run_pipeline into 7 named stage helpers; one redundant field initializer at line 880 (file_node_id: file_node_id)
crates/codegraph-core/src/dataflow.rs Split extract_param_names_strategy into 7 per-language handler functions; dispatch reduced to a clean match delegation table
crates/codegraph-core/src/graph_algorithms.rs Decomposed louvain_impl into louvain_init, local_move_phase, aggregation_phase, and compute_modularity; edge map clone and state management are semantically correct

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[run_pipeline] --> B[check_version_mismatch]
    A --> C[collect_source_files\nStage 2]
    C --> D{Scoped build?}
    D -- Yes --> D1[Use scope files directly]
    D -- No --> D2{Incremental + journal?}
    D2 -- Yes --> D3[try_fast_collect]
    D2 -- No --> D4[collect_files]
    A --> E[detect_changes\nStage 3]
    E --> F{No changes?}
    F -- Yes --> G[write_journal_header\nEarly return]
    F -- No --> H[parse_files_parallel\nStage 4]
    H --> I[do_insert_nodes\nStage 5]
    I --> J[resolve_imports_batch\nStage 6]
    J --> K{Incremental?}
    K -- Yes --> L[reparse_barrel_candidates\nStage 6b]
    K -- No --> M
    L --> M[build_import_edges\nStage 7]
    M --> N[build_and_insert_call_edges]
    N --> O[structure metrics\nStage 8]
    O --> P[role_classification]
    P --> Q[finalize_build\nStage 9]
Loading

Reviews (4): Last reviewed commit: "fix: resolve merge conflicts with main" | Re-trigger Greptile

Comment on lines +507 to +509
for (_rel_path, symbols) in file_symbols.iter() {
for imp in &symbols.imports {
let abs_file = Path::new(root_dir).join(_rel_path);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 _rel_path is used — misleading underscore prefix

In Rust, the _ prefix on a variable name suppresses the "unused variable" compiler warning and signals to readers that the variable is intentionally not used. However, _rel_path is actively referenced on line 509 (Path::new(root_dir).join(_rel_path)), so the prefix is misleading and could cause confusion for future contributors.

Suggested change
for (_rel_path, symbols) in file_symbols.iter() {
for imp in &symbols.imports {
let abs_file = Path::new(root_dir).join(_rel_path);
for (rel_path, symbols) in file_symbols.iter() {
for imp in &symbols.imports {
let abs_file = Path::new(root_dir).join(rel_path);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 457d85a. Removed the underscore prefix — the variable is now rel_path since it is actively used on the next line.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 457d85a. Removed the underscore prefix — the variable is now rel_path since it is actively used on the next line.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already fixed in 457d85a by renaming _rel_path to rel_path. Confirmed the fix is preserved through the merge with main.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

Codegraph Impact Analysis

20 functions changed18 callers affected across 5 files

  • run_pipeline in crates/codegraph-core/src/build_pipeline.rs:90 (0 transitive callers)
  • collect_source_files in crates/codegraph-core/src/build_pipeline.rs:424 (1 transitive callers)
  • reparse_barrel_candidates in crates/codegraph-core/src/build_pipeline.rs:491 (1 transitive callers)
  • finalize_build in crates/codegraph-core/src/build_pipeline.rs:610 (1 transitive callers)
  • build_file_hash_entries in crates/codegraph-core/src/build_pipeline.rs:713 (1 transitive callers)
  • extract_params_python in crates/codegraph-core/src/dataflow.rs:584 (4 transitive callers)
  • extract_params_go in crates/codegraph-core/src/dataflow.rs:613 (4 transitive callers)
  • extract_params_rust in crates/codegraph-core/src/dataflow.rs:632 (4 transitive callers)
  • extract_params_java in crates/codegraph-core/src/dataflow.rs:648 (4 transitive callers)
  • extract_params_csharp in crates/codegraph-core/src/dataflow.rs:662 (4 transitive callers)
  • extract_params_php in crates/codegraph-core/src/dataflow.rs:676 (4 transitive callers)
  • extract_params_ruby in crates/codegraph-core/src/dataflow.rs:690 (4 transitive callers)
  • extract_param_names_strategy in crates/codegraph-core/src/dataflow.rs:709 (5 transitive callers)
  • handle_return_stmt in crates/codegraph-core/src/dataflow.rs:1051 (6 transitive callers)
  • louvain_init in crates/codegraph-core/src/graph_algorithms.rs:260 (5 transitive callers)
  • xorshift32 in crates/codegraph-core/src/graph_algorithms.rs:308 (3 transitive callers)
  • local_move_phase in crates/codegraph-core/src/graph_algorithms.rs:317 (5 transitive callers)
  • aggregation_phase in crates/codegraph-core/src/graph_algorithms.rs:391 (5 transitive callers)
  • compute_modularity in crates/codegraph-core/src/graph_algorithms.rs:443 (5 transitive callers)
  • louvain_impl in crates/codegraph-core/src/graph_algorithms.rs:481 (4 transitive callers)

- Fix type mismatch in build_pipeline.rs: change FAST_PATH_MIN_EXISTING_FILES
  from usize to i64 to match get_existing_file_count return type
- Remove misleading underscore prefix from _rel_path variable that is
  actually used (build_pipeline.rs:507)
- Replace missed hardcoded 120 literal with DATAFLOW_TRUNCATION_LIMIT
  constant at dataflow.rs:1071
@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed both Greptile findings in 457d85a:

  1. _rel_path misleading prefix (build_pipeline.rs:507): Removed the underscore — renamed to rel_path since the variable is actively used.
  2. Missed constant substitution (dataflow.rs:1071): Replaced the hardcoded 120 literal with DATAFLOW_TRUNCATION_LIMIT.

Also fixed the Rust compile error that was failing CI: FAST_PATH_MIN_EXISTING_FILES was typed as usize but compared against existing_file_count which is i64 (from SQLite). Changed the constant type to i64 to match.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed the two remaining Greptile findings from the re-review in b01db3f:

  1. Stage numbering mismatch in module doc comment (build_pipeline.rs): Updated the doc comment to match the actual implementation — 9 stages plus 6b for barrel reparse, not 10. Combined the import edges + call edges description into stage 7, and added stage 6b.

  2. _root_dir dead parameter in build_file_hash_entries (build_pipeline.rs): Removed the unused _root_dir parameter from both the function signature and the call site on line 228. The function reads abs_path directly from the ChangedFile entries, so the root dir was never needed.

Also merged origin/main to resolve the CONFLICTING state (conflicts in barrel_resolution.rs, constants.rs, graph_algorithms.rs — all resolved preserving the PR's decomposition while taking main's quality fixes).

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm
Copy link
Copy Markdown
Contributor Author

Resolved merge conflicts with main in bcf6ef6. Changes from PR #865 (scoped build paths, barrel resolution Windows normalization, build_meta table name) are incorporated into the decomposed helper functions. All three conflicts in build_pipeline.rs resolved by keeping the decomposed function calls while updating the extracted functions to include main's fixes.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 6f30d27 into main Apr 5, 2026
18 of 19 checks passed
@carlos-alm carlos-alm deleted the refactor/titan-native-algorithm-decomposition branch April 5, 2026 07:02
@github-actions github-actions bot locked and limited conversation to collaborators Apr 5, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant