refactor: deduplicate code #38

rayokamoto · 2026-02-12T06:55:01Z

Summary by CodeRabbit

Refactor
- Removed the EpisodeResult result type and simplified MetaWorld environment interfaces.
- Streamlined environment registry by removing legacy registration checks.
- Centralized environment loading/evaluation flow and added container cleanup utilities.
- Consolidated response construction for scores and weights endpoints.
- Introduced an API client and added GPU enablement flags to worker orchestration.
Config
- Validator and Miner configs now inherit shared network defaults.
Tests
- Extensive test consolidations and parametrizations for crypto, cycle isolation, and genesis suites.
Chores
- Updated ignore entries in project ignore file.

coderabbitai · 2026-02-12T06:55:24Z

Warning

Rate limit exceeded

@rayokamoto has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 5 minutes and 37 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

Replaces family-wide environment lookups with registry family API, centralizes API response construction, adds env_loader utilities for loading/warming/evaluating environments, refactors executor workers to use APIClient and env_loader, removes EpisodeResult dataclass, updates configs, and consolidates tests with fixtures and parametrization.

Changes

Cohort / File(s)	Summary
Git / Config `/.gitignore`, `kinitro/config.py`	Added `.ruff_cache/` to ignore; `ValidatorConfig` and `MinerConfig` now subclass `NetworkConfig` and drop duplicate network fields and env prefix config.
Environment family lookups `environments/_template/env.py`, `environments/genesis/env.py`, `environments/metaworld/env.py`	Replaced `get_all_environment_ids` + filtering with `get_environments_by_family("<family>")` calls.
Environments core `kinitro/environments/registry.py`, `kinitro/environments/__init__.py`, `kinitro/environments/base.py`	Removed `is_family_available` and `register_environment` from registry; removed `EpisodeResult` dataclass and its export.
MetaWorld environment `kinitro/environments/metaworld_env.py`	Removed `action_bounds` property and added private `_sync_camera_state` to centralize camera sync logic.
API routes `kinitro/api/routes/scores.py`, `kinitro/api/routes/weights.py`	Extracted inline response assembly into helpers (`_build_scores_response`, `_build_weights_response`) and updated endpoints to use them.
Executor utilities `kinitro/executor/env_loader.py`	New module: `build_load_kwargs`, `load_and_warmup_env`, `run_evaluation`, and `force_remove_container` to standardize loading, warmup, evaluation, and cleanup.
Executor refactor `kinitro/executor/family_worker.py`, `kinitro/executor/worker.py`, `kinitro/executor/worker_process.py`	Switched to `APIClient` for API calls, use new env_loader utilities for env lifecycle and evaluation, added `gpu_enabled`/`eval_gpu` propagation, and centralized container cleanup.
Registry & templates `environments/_template/env.py`	Template updated to use `get_environments_by_family("myenv")` (family string) instead of filtering IDs.
Tests: crypto, cycles, genesis `tests/unit/test_crypto.py`, `tests/unit/test_cycle_isolation.py`, `tests/unit/test_genesis.py`	Introduced fixtures and parameterized tests, added test helpers (`keypair`, `_make_mock_cycle`, `_make_mock_task`, `_make_g1_env`), and consolidated mock side effects for clarity.

Sequence Diagram(s)

sequenceDiagram
    participant Worker
    participant EnvLoader as env_loader
    participant Affinetes as af_env
    participant Container
    participant APIClient

    Worker->>EnvLoader: build_load_kwargs(image, mode, ...)
    EnvLoader-->>Worker: load_kwargs
    Worker->>EnvLoader: load_and_warmup_env(family, image, load_kwargs)
    EnvLoader->>Affinetes: load_env(load_kwargs)
    Affinetes->>Container: create/start container
    Container-->>Affinetes: ready
    Affinetes-->>EnvLoader: env instance
    EnvLoader->>Affinetes: env.list_environments() (warmup)
    EnvLoader-->>Worker: env instance

    loop per task
        Worker->>EnvLoader: run_evaluation(env, task, ...)
        EnvLoader->>Affinetes: env.evaluate(task, ...)
        Affinetes-->>EnvLoader: TaskResult
        EnvLoader-->>Worker: TaskResult
        Worker->>APIClient: submit_results(TaskResult)
        APIClient-->>Worker: ack
    end

    Worker->>EnvLoader: force_remove_container(container_name)
    EnvLoader->>Container: docker rm -f
    Container-->>EnvLoader: removed

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

refactor: split eval-env into separate metaworld and procthor environments #7: Changes to environment registry APIs and family-based metadata handling, directly related to get_environments_by_family adoption.
feat(executor): add concurrent multi-process executor #25: Executor and worker refactor overlapping in APIClient usage, task fetch/submit flows, and run_family_worker signature changes.
feat(executor): per-family Docker images and verification fixes #16: Modifications to executor environment-loading and per-family container/image handling related to the new env_loader utilities.

Suggested reviewers

Shr1ftyy

"🐰
I hopped through code with eager paws,
I stitched the registry, closed the gaps,
Warmed up the envs, ran tasks with cheer,
Cleaned up containers, nothing to fear —
A carrot for tests passing fast! 🥕"

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title 'refactor: deduplicate code' accurately and clearly describes the primary purpose of the changeset, which involves consolidating duplicated logic across multiple files.
Docstring Coverage	✅ Passed	Docstring coverage is 86.25% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch refactor/cleanup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/unit/test_cycle_isolation.py (1)

91-105: ⚠️ Potential issue | 🟡 Minor

Test doesn't verify its stated intent — no completed/failed task is present.

The name and docstring claim this test checks that completed/failed tasks are left unchanged, but it only creates a PENDING task and asserts it gets cancelled. There's no COMPLETED or FAILED task to verify is left untouched, making this effectively a duplicate of test_cancels_running_cycle_and_tasks.

Add a completed task mock and assert its status remains unchanged:

Proposed fix

     async def test_leaves_completed_tasks_unchanged(self, mock_session):
         """Completed/failed tasks from incomplete cycles are not modified."""
         mock_cycle = _make_mock_cycle()
-        mock_task = _make_mock_task(TaskStatus.PENDING.value)
+        mock_pending_task = _make_mock_task(TaskStatus.PENDING.value)
+        mock_completed_task = _make_mock_task(TaskStatus.COMPLETED.value)
 
-        mock_session.execute = _mock_execute_results([mock_cycle], [mock_task])
+        mock_session.execute = _mock_execute_results(
+            [mock_cycle],
+            [mock_pending_task, mock_completed_task],
+        )
 
         storage = Storage("postgresql+asyncpg://test:test@localhost/test")
 
         cycles_cancelled, tasks_cancelled = await storage.cancel_incomplete_cycles(mock_session)
 
-        # Only the pending task is cancelled
+        # Only the pending task is cancelled; completed task is untouched
         assert tasks_cancelled == 1
-        assert mock_task.status == TaskStatus.FAILED.value
+        assert mock_pending_task.status == TaskStatus.FAILED.value
+        assert mock_completed_task.status == TaskStatus.COMPLETED.value

🤖 Fix all issues with AI agents

In `@kinitro/executor/env_loader.py`:
- Around line 130-143: The code is discarding evaluation error details by
hardcoding error=None when building the TaskResult; change this to propagate any
error returned by the environment by extracting error = result.get("error") (or
None by default) and pass that into the TaskResult constructor (TaskResult(...,
error=error)); keep existing fields (task.task_uuid, success, score,
total_reward, timesteps) and only replace the hardcoded None with the extracted
error so evaluation failures include the environment's error message.

In `@kinitro/executor/family_worker.py`:
- Around line 111-121: The FamilyWorker instantiation chain never receives the
GPU flag so gpu_enabled is always false; pass ExecutorConfig.eval_gpu through
WorkerProcess.start into run_family_worker, add a gpu_enabled parameter to
run_family_worker and propagate it into FamilyWorker.__init__ (store as
self.gpu_enabled), and include gpu_enabled=self.gpu_enabled when calling
build_load_kwargs in FamilyWorker before load_and_warmup_env; update any
signatures that call run_family_worker (and tests) accordingly so family workers
honor eval_gpu instead of defaulting to False.

🧹 Nitpick comments (7)

kinitro/api/routes/scores.py (1)
17-17: Add type annotations to the helper's parameters.

cycle and scores_orm are untyped, which reduces IDE support and type-checker coverage. Since the types are already imported or available (EvaluationCycleORM from storage, list[MinerScoreORM]), annotating them is straightforward. The same applies to _build_weights_response in weights.py.
Proposed fix

You'll need to import the ORM types:
from kinitro.backend.models import EvaluationCycleORM, MinerScoreORM
Then annotate:
-def _build_scores_response(cycle, scores_orm) -> ScoresResponse:
+def _build_scores_response(cycle: EvaluationCycleORM, scores_orm: list[MinerScoreORM]) -> ScoresResponse:
As per coding guidelines, "Add return types to public functions and methods. Prefer BaseSettings and BaseModel type annotations for config/DTOs."
kinitro/api/routes/weights.py (1)
13-13: Add type annotations to the helper's parameters (same as scores.py).

Note that cycle is nullable here since get_cycle returns EvaluationCycleORM | None, and the body already handles None with conditionals on lines 25–27.
Proposed fix
-def _build_weights_response(weights_orm, cycle) -> WeightsResponse:
+def _build_weights_response(weights_orm: ComputedWeightsORM, cycle: EvaluationCycleORM | None) -> WeightsResponse:
You'll need to add the ORM import:
from kinitro.backend.models import ComputedWeightsORM, EvaluationCycleORM, WeightsResponse, WeightsU16
As per coding guidelines, "Add return types to public functions and methods. Prefer BaseSettings and BaseModel type annotations for config/DTOs."
tests/unit/test_crypto.py (2)
58-60: Good use of fixture to deduplicate keypair generation.

Since the keypair is never mutated, you could optionally scope this to "class" to avoid regenerating keys for every test method, but function scope is fine for isolation.

215-217: Remove the unnecessary alias backend_keypair = keypair.

Either rename the fixture parameter to backend_keypair directly, or use keypair throughout the method body. The intermediate assignment adds no value.
Option A: rename the fixture parameter
-    def test_full_commitment_flow(self, keypair):
+    def test_full_commitment_flow(self, backend_keypair):
         """Test full flow: generate keys, encrypt, parse, decrypt."""
-        backend_keypair = keypair
Option B: use `keypair` throughout
     def test_full_commitment_flow(self, keypair):
         """Test full flow: generate keys, encrypt, parse, decrypt."""
-        backend_keypair = keypair
-
         # Miner encrypts their deployment ID
         deployment_id = "95edf2b6-e18b-400a-8398-5573df10e5e4"
-        encrypted_blob = encrypt_deployment_id(deployment_id, backend_keypair.public_key_hex())
+        encrypted_blob = encrypt_deployment_id(deployment_id, keypair.public_key_hex())
         ...
-            parsed["encrypted_deployment"], backend_keypair.private_key
+            parsed["encrypted_deployment"], keypair.private_key
tests/unit/test_genesis.py (2)
78-80: object.__new__ bypass is fragile — consider a comment listing the assumed invariant.

This works because _compute_reward and _check_success currently only use their explicit arguments and never touch self.* attributes. If a future refactor introduces instance-state access in those methods, every test here will fail with an opaque AttributeError instead of a clear message.

A minimal safeguard: add a brief inline comment noting the invariant, or assert early that the methods don't rely on instance state (e.g., a smoke attribute check). Not blocking, since the pattern is documented in the docstring.

620-762: Consider a @pytest.fixture for the repeated _make_g1_env() calls.

Every test method in TestG1Reward (and TestG1Success) calls _make_g1_env() identically. A class-scoped or function-scoped fixture would reduce the repetition and make it trivial to swap the construction strategy later.
Example fixture
`@pytest.fixture`()
def g1_env() -> G1Environment:
    return _make_g1_env()
Then each test takes g1_env as a parameter instead of calling the helper.
environments/metaworld/env.py (1)

46-343: Significant code duplication between metaworld/env.py and genesis/env.py Actor classes.

Given this PR's goal of deduplicating code, it's worth noting that the Actor classes in environments/metaworld/env.py and environments/genesis/env.py share nearly identical structure: _call_miner, _run_evaluation, _build_error_result, _get_env, and cleanup are essentially the same, differing only in the family prefix string and a few minor details (e.g., genesis uses asyncio.Lock for serialization). A shared base Actor class or mixin could eliminate ~200 lines of duplication. This could be a follow-up.

kinitro/executor/env_loader.py

kinitro/executor/family_worker.py

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/unit/test_cycle_isolation.py (1)
91-105: ⚠️ Potential issue | 🟡 Minor

Test name doesn't match what's actually verified.

test_leaves_completed_tasks_unchanged only supplies a pending task via _make_mock_task(TaskStatus.PENDING.value). It never creates a completed task to assert it remains unmodified. Consider either renaming the test (e.g., test_cancels_only_pending_tasks) or adding a completed mock task and asserting its status is unchanged.
💡 Option: add a completed task to actually verify the claim
     async def test_leaves_completed_tasks_unchanged(self, mock_session):
         """Completed/failed tasks from incomplete cycles are not modified."""
         mock_cycle = _make_mock_cycle()
         mock_task = _make_mock_task(TaskStatus.PENDING.value)
+        mock_completed_task = _make_mock_task(TaskStatus.COMPLETED.value)
 
-        mock_session.execute = _mock_execute_results([mock_cycle], [mock_task])
+        mock_session.execute = _mock_execute_results([mock_cycle], [mock_task, mock_completed_task])
 
         storage = Storage("postgresql+asyncpg://test:test@localhost/test")
 
         cycles_cancelled, tasks_cancelled = await storage.cancel_incomplete_cycles(mock_session)
 
         # Only the pending task is cancelled
         assert tasks_cancelled == 1
         assert mock_task.status == TaskStatus.FAILED.value
+        assert mock_completed_task.status == TaskStatus.COMPLETED.value

🧹 Nitpick comments (7)

tests/unit/test_genesis.py (1)
625-627: Consider extracting g1_env to a module-level fixture to avoid duplication.

The same g1_env fixture is defined identically in both TestG1Reward and TestG1Success. You could define it once at module scope (or in conftest.py) to further reduce duplication — which aligns with this PR's objective.
♻️ Example: module-level fixture
# At module level or in conftest.py
`@pytest.fixture`()
def g1_env() -> G1Environment:
    return _make_g1_env()
Then remove the per-class fixture definitions and add g1_env as a parameter to each test method.
Also applies to: 772-774
tests/unit/test_crypto.py (1)

58-60: Same keypair fixture pattern duplicated across three test classes.

Identical BackendKeypair.generate() fixtures appear in TestBackendKeypair, TestEncryptDecrypt, and TestIntegration (as backend_keypair). Consider a single module-level or conftest.py fixture to consolidate, consistent with this PR's deduplication goal.

Also applies to: 141-143, 211-213
kinitro/api/routes/weights.py (1)
39-54: Consider adding return type annotations to endpoint functions.

The coding guidelines require return types on public functions. While FastAPI infers the response from response_model, adding -> WeightsResponse would satisfy the guideline and improve readability.
 `@router.get`("/latest", response_model=WeightsResponse)
 async def get_latest_weights(
     session: AsyncSession = Depends(get_session),
     storage: Storage = Depends(get_storage),
-):
+) -> WeightsResponse:
Same for get_weights_for_block at Line 58.
kinitro/executor/env_loader.py (1)
147-161: Silent except Exception: pass hides container-removal failures.

During cleanup debugging, a completely silent swallow gives zero visibility. A logger.debug would cost nothing and help diagnose stuck-container issues.
Proposed fix
     try:
         subprocess.run(
             ["docker", "rm", "-f", container_name],
             capture_output=True,
             text=True,
             timeout=5,
             check=False,
         )
-    except Exception:
-        pass
+    except Exception as e:
+        logger.debug("force_remove_container_failed", container=container_name, error=str(e))
As per coding guidelines: "Avoid broad except Exception unless you re-raise or return a clear error" and "Provide actionable error messages and log details with structlog".
kinitro/executor/family_worker.py (1)

244-246: Nit: _submit_result wraps a single result in a list.

submit_results expects a list, but this always submits one result. This is fine if the API client handles single-element lists efficiently, but worth noting in case batching is desired later.

kinitro/executor/worker_process.py (1)

39-62: Positional arg list is getting unwieldy (17 args) — consider passing a config/dataclass.

This isn't a regression from this PR, but adding eval_gpu as the 17th positional arg further increases the risk of ordering mistakes. A config dataclass or kwargs dict would be safer for cross-process serialization. Fine to defer.

kinitro/executor/worker.py (1)

22-34: Remove the unused EvalEnvConfig dataclass.

This dataclass is not referenced anywhere in the codebase and appears to be leftover from earlier code. Removing it (lines 22-33) will eliminate dead code.

Shr1ftyy

lgtm.

rayokamoto requested a review from Shr1ftyy February 12, 2026 06:55

coderabbitai bot reviewed Feb 12, 2026

View reviewed changes

kinitro/executor/env_loader.py Show resolved Hide resolved

kinitro/executor/family_worker.py Show resolved Hide resolved

rayokamoto force-pushed the refactor/cleanup branch from 3300722 to 2cdedef Compare February 12, 2026 07:07

coderabbitai bot reviewed Feb 12, 2026

View reviewed changes

refactor: deduplicate code

271097d

rayokamoto force-pushed the refactor/cleanup branch from 2cdedef to 271097d Compare February 12, 2026 07:19

Shr1ftyy approved these changes Feb 12, 2026

View reviewed changes

rayokamoto merged commit ba1b778 into main Feb 12, 2026
2 checks passed

rayokamoto deleted the refactor/cleanup branch February 12, 2026 07:52

coderabbitai bot mentioned this pull request Feb 12, 2026

feat: improve typing throughout the codebase #39

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: deduplicate code #38

refactor: deduplicate code #38

rayokamoto commented Feb 12, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Feb 12, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Shr1ftyy left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

refactor: deduplicate code #38

refactor: deduplicate code #38

Conversation

rayokamoto commented Feb 12, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Shr1ftyy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rayokamoto commented Feb 12, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 12, 2026 •

edited

Loading