Skip to content

Conversation

@rayokamoto
Copy link
Contributor

@rayokamoto rayokamoto commented Feb 12, 2026

Summary by CodeRabbit

  • Refactor

    • Removed the EpisodeResult result type and simplified MetaWorld environment interfaces.
    • Streamlined environment registry by removing legacy registration checks.
    • Centralized environment loading/evaluation flow and added container cleanup utilities.
    • Consolidated response construction for scores and weights endpoints.
    • Introduced an API client and added GPU enablement flags to worker orchestration.
  • Config

    • Validator and Miner configs now inherit shared network defaults.
  • Tests

    • Extensive test consolidations and parametrizations for crypto, cycle isolation, and genesis suites.
  • Chores

    • Updated ignore entries in project ignore file.

@rayokamoto rayokamoto requested a review from Shr1ftyy February 12, 2026 06:55
@coderabbitai
Copy link

coderabbitai bot commented Feb 12, 2026

Warning

Rate limit exceeded

@rayokamoto has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 5 minutes and 37 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

Replaces family-wide environment lookups with registry family API, centralizes API response construction, adds env_loader utilities for loading/warming/evaluating environments, refactors executor workers to use APIClient and env_loader, removes EpisodeResult dataclass, updates configs, and consolidates tests with fixtures and parametrization.

Changes

Cohort / File(s) Summary
Git / Config
/.gitignore, kinitro/config.py
Added .ruff_cache/ to ignore; ValidatorConfig and MinerConfig now subclass NetworkConfig and drop duplicate network fields and env prefix config.
Environment family lookups
environments/_template/env.py, environments/genesis/env.py, environments/metaworld/env.py
Replaced get_all_environment_ids + filtering with get_environments_by_family("<family>") calls.
Environments core
kinitro/environments/registry.py, kinitro/environments/__init__.py, kinitro/environments/base.py
Removed is_family_available and register_environment from registry; removed EpisodeResult dataclass and its export.
MetaWorld environment
kinitro/environments/metaworld_env.py
Removed action_bounds property and added private _sync_camera_state to centralize camera sync logic.
API routes
kinitro/api/routes/scores.py, kinitro/api/routes/weights.py
Extracted inline response assembly into helpers (_build_scores_response, _build_weights_response) and updated endpoints to use them.
Executor utilities
kinitro/executor/env_loader.py
New module: build_load_kwargs, load_and_warmup_env, run_evaluation, and force_remove_container to standardize loading, warmup, evaluation, and cleanup.
Executor refactor
kinitro/executor/family_worker.py, kinitro/executor/worker.py, kinitro/executor/worker_process.py
Switched to APIClient for API calls, use new env_loader utilities for env lifecycle and evaluation, added gpu_enabled/eval_gpu propagation, and centralized container cleanup.
Registry & templates
environments/_template/env.py
Template updated to use get_environments_by_family("myenv") (family string) instead of filtering IDs.
Tests: crypto, cycles, genesis
tests/unit/test_crypto.py, tests/unit/test_cycle_isolation.py, tests/unit/test_genesis.py
Introduced fixtures and parameterized tests, added test helpers (keypair, _make_mock_cycle, _make_mock_task, _make_g1_env), and consolidated mock side effects for clarity.

Sequence Diagram(s)

sequenceDiagram
    participant Worker
    participant EnvLoader as env_loader
    participant Affinetes as af_env
    participant Container
    participant APIClient

    Worker->>EnvLoader: build_load_kwargs(image, mode, ...)
    EnvLoader-->>Worker: load_kwargs
    Worker->>EnvLoader: load_and_warmup_env(family, image, load_kwargs)
    EnvLoader->>Affinetes: load_env(load_kwargs)
    Affinetes->>Container: create/start container
    Container-->>Affinetes: ready
    Affinetes-->>EnvLoader: env instance
    EnvLoader->>Affinetes: env.list_environments() (warmup)
    EnvLoader-->>Worker: env instance

    loop per task
        Worker->>EnvLoader: run_evaluation(env, task, ...)
        EnvLoader->>Affinetes: env.evaluate(task, ...)
        Affinetes-->>EnvLoader: TaskResult
        EnvLoader-->>Worker: TaskResult
        Worker->>APIClient: submit_results(TaskResult)
        APIClient-->>Worker: ack
    end

    Worker->>EnvLoader: force_remove_container(container_name)
    EnvLoader->>Container: docker rm -f
    Container-->>EnvLoader: removed
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • Shr1ftyy

"🐰
I hopped through code with eager paws,
I stitched the registry, closed the gaps,
Warmed up the envs, ran tasks with cheer,
Cleaned up containers, nothing to fear —
A carrot for tests passing fast! 🥕"

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title 'refactor: deduplicate code' accurately and clearly describes the primary purpose of the changeset, which involves consolidating duplicated logic across multiple files.
Docstring Coverage ✅ Passed Docstring coverage is 86.25% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch refactor/cleanup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/test_cycle_isolation.py (1)

91-105: ⚠️ Potential issue | 🟡 Minor

Test doesn't verify its stated intent — no completed/failed task is present.

The name and docstring claim this test checks that completed/failed tasks are left unchanged, but it only creates a PENDING task and asserts it gets cancelled. There's no COMPLETED or FAILED task to verify is left untouched, making this effectively a duplicate of test_cancels_running_cycle_and_tasks.

Add a completed task mock and assert its status remains unchanged:

Proposed fix
     async def test_leaves_completed_tasks_unchanged(self, mock_session):
         """Completed/failed tasks from incomplete cycles are not modified."""
         mock_cycle = _make_mock_cycle()
-        mock_task = _make_mock_task(TaskStatus.PENDING.value)
+        mock_pending_task = _make_mock_task(TaskStatus.PENDING.value)
+        mock_completed_task = _make_mock_task(TaskStatus.COMPLETED.value)
 
-        mock_session.execute = _mock_execute_results([mock_cycle], [mock_task])
+        mock_session.execute = _mock_execute_results(
+            [mock_cycle],
+            [mock_pending_task, mock_completed_task],
+        )
 
         storage = Storage("postgresql+asyncpg://test:test@localhost/test")
 
         cycles_cancelled, tasks_cancelled = await storage.cancel_incomplete_cycles(mock_session)
 
-        # Only the pending task is cancelled
+        # Only the pending task is cancelled; completed task is untouched
         assert tasks_cancelled == 1
-        assert mock_task.status == TaskStatus.FAILED.value
+        assert mock_pending_task.status == TaskStatus.FAILED.value
+        assert mock_completed_task.status == TaskStatus.COMPLETED.value
🤖 Fix all issues with AI agents
In `@kinitro/executor/env_loader.py`:
- Around line 130-143: The code is discarding evaluation error details by
hardcoding error=None when building the TaskResult; change this to propagate any
error returned by the environment by extracting error = result.get("error") (or
None by default) and pass that into the TaskResult constructor (TaskResult(...,
error=error)); keep existing fields (task.task_uuid, success, score,
total_reward, timesteps) and only replace the hardcoded None with the extracted
error so evaluation failures include the environment's error message.

In `@kinitro/executor/family_worker.py`:
- Around line 111-121: The FamilyWorker instantiation chain never receives the
GPU flag so gpu_enabled is always false; pass ExecutorConfig.eval_gpu through
WorkerProcess.start into run_family_worker, add a gpu_enabled parameter to
run_family_worker and propagate it into FamilyWorker.__init__ (store as
self.gpu_enabled), and include gpu_enabled=self.gpu_enabled when calling
build_load_kwargs in FamilyWorker before load_and_warmup_env; update any
signatures that call run_family_worker (and tests) accordingly so family workers
honor eval_gpu instead of defaulting to False.
🧹 Nitpick comments (7)
kinitro/api/routes/scores.py (1)

17-17: Add type annotations to the helper's parameters.

cycle and scores_orm are untyped, which reduces IDE support and type-checker coverage. Since the types are already imported or available (EvaluationCycleORM from storage, list[MinerScoreORM]), annotating them is straightforward. The same applies to _build_weights_response in weights.py.

Proposed fix

You'll need to import the ORM types:

from kinitro.backend.models import EvaluationCycleORM, MinerScoreORM

Then annotate:

-def _build_scores_response(cycle, scores_orm) -> ScoresResponse:
+def _build_scores_response(cycle: EvaluationCycleORM, scores_orm: list[MinerScoreORM]) -> ScoresResponse:

As per coding guidelines, "Add return types to public functions and methods. Prefer BaseSettings and BaseModel type annotations for config/DTOs."

kinitro/api/routes/weights.py (1)

13-13: Add type annotations to the helper's parameters (same as scores.py).

Note that cycle is nullable here since get_cycle returns EvaluationCycleORM | None, and the body already handles None with conditionals on lines 25–27.

Proposed fix
-def _build_weights_response(weights_orm, cycle) -> WeightsResponse:
+def _build_weights_response(weights_orm: ComputedWeightsORM, cycle: EvaluationCycleORM | None) -> WeightsResponse:

You'll need to add the ORM import:

from kinitro.backend.models import ComputedWeightsORM, EvaluationCycleORM, WeightsResponse, WeightsU16

As per coding guidelines, "Add return types to public functions and methods. Prefer BaseSettings and BaseModel type annotations for config/DTOs."

tests/unit/test_crypto.py (2)

58-60: Good use of fixture to deduplicate keypair generation.

Since the keypair is never mutated, you could optionally scope this to "class" to avoid regenerating keys for every test method, but function scope is fine for isolation.


215-217: Remove the unnecessary alias backend_keypair = keypair.

Either rename the fixture parameter to backend_keypair directly, or use keypair throughout the method body. The intermediate assignment adds no value.

Option A: rename the fixture parameter
-    def test_full_commitment_flow(self, keypair):
+    def test_full_commitment_flow(self, backend_keypair):
         """Test full flow: generate keys, encrypt, parse, decrypt."""
-        backend_keypair = keypair
Option B: use `keypair` throughout
     def test_full_commitment_flow(self, keypair):
         """Test full flow: generate keys, encrypt, parse, decrypt."""
-        backend_keypair = keypair
-
         # Miner encrypts their deployment ID
         deployment_id = "95edf2b6-e18b-400a-8398-5573df10e5e4"
-        encrypted_blob = encrypt_deployment_id(deployment_id, backend_keypair.public_key_hex())
+        encrypted_blob = encrypt_deployment_id(deployment_id, keypair.public_key_hex())
         ...
-            parsed["encrypted_deployment"], backend_keypair.private_key
+            parsed["encrypted_deployment"], keypair.private_key
tests/unit/test_genesis.py (2)

78-80: object.__new__ bypass is fragile — consider a comment listing the assumed invariant.

This works because _compute_reward and _check_success currently only use their explicit arguments and never touch self.* attributes. If a future refactor introduces instance-state access in those methods, every test here will fail with an opaque AttributeError instead of a clear message.

A minimal safeguard: add a brief inline comment noting the invariant, or assert early that the methods don't rely on instance state (e.g., a smoke attribute check). Not blocking, since the pattern is documented in the docstring.


620-762: Consider a @pytest.fixture for the repeated _make_g1_env() calls.

Every test method in TestG1Reward (and TestG1Success) calls _make_g1_env() identically. A class-scoped or function-scoped fixture would reduce the repetition and make it trivial to swap the construction strategy later.

Example fixture
`@pytest.fixture`()
def g1_env() -> G1Environment:
    return _make_g1_env()

Then each test takes g1_env as a parameter instead of calling the helper.

environments/metaworld/env.py (1)

46-343: Significant code duplication between metaworld/env.py and genesis/env.py Actor classes.

Given this PR's goal of deduplicating code, it's worth noting that the Actor classes in environments/metaworld/env.py and environments/genesis/env.py share nearly identical structure: _call_miner, _run_evaluation, _build_error_result, _get_env, and cleanup are essentially the same, differing only in the family prefix string and a few minor details (e.g., genesis uses asyncio.Lock for serialization). A shared base Actor class or mixin could eliminate ~200 lines of duplication. This could be a follow-up.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/test_cycle_isolation.py (1)

91-105: ⚠️ Potential issue | 🟡 Minor

Test name doesn't match what's actually verified.

test_leaves_completed_tasks_unchanged only supplies a pending task via _make_mock_task(TaskStatus.PENDING.value). It never creates a completed task to assert it remains unmodified. Consider either renaming the test (e.g., test_cancels_only_pending_tasks) or adding a completed mock task and asserting its status is unchanged.

💡 Option: add a completed task to actually verify the claim
     async def test_leaves_completed_tasks_unchanged(self, mock_session):
         """Completed/failed tasks from incomplete cycles are not modified."""
         mock_cycle = _make_mock_cycle()
         mock_task = _make_mock_task(TaskStatus.PENDING.value)
+        mock_completed_task = _make_mock_task(TaskStatus.COMPLETED.value)
 
-        mock_session.execute = _mock_execute_results([mock_cycle], [mock_task])
+        mock_session.execute = _mock_execute_results([mock_cycle], [mock_task, mock_completed_task])
 
         storage = Storage("postgresql+asyncpg://test:test@localhost/test")
 
         cycles_cancelled, tasks_cancelled = await storage.cancel_incomplete_cycles(mock_session)
 
         # Only the pending task is cancelled
         assert tasks_cancelled == 1
         assert mock_task.status == TaskStatus.FAILED.value
+        assert mock_completed_task.status == TaskStatus.COMPLETED.value
🧹 Nitpick comments (7)
tests/unit/test_genesis.py (1)

625-627: Consider extracting g1_env to a module-level fixture to avoid duplication.

The same g1_env fixture is defined identically in both TestG1Reward and TestG1Success. You could define it once at module scope (or in conftest.py) to further reduce duplication — which aligns with this PR's objective.

♻️ Example: module-level fixture
# At module level or in conftest.py
`@pytest.fixture`()
def g1_env() -> G1Environment:
    return _make_g1_env()

Then remove the per-class fixture definitions and add g1_env as a parameter to each test method.

Also applies to: 772-774

tests/unit/test_crypto.py (1)

58-60: Same keypair fixture pattern duplicated across three test classes.

Identical BackendKeypair.generate() fixtures appear in TestBackendKeypair, TestEncryptDecrypt, and TestIntegration (as backend_keypair). Consider a single module-level or conftest.py fixture to consolidate, consistent with this PR's deduplication goal.

Also applies to: 141-143, 211-213

kinitro/api/routes/weights.py (1)

39-54: Consider adding return type annotations to endpoint functions.

The coding guidelines require return types on public functions. While FastAPI infers the response from response_model, adding -> WeightsResponse would satisfy the guideline and improve readability.

 `@router.get`("/latest", response_model=WeightsResponse)
 async def get_latest_weights(
     session: AsyncSession = Depends(get_session),
     storage: Storage = Depends(get_storage),
-):
+) -> WeightsResponse:

Same for get_weights_for_block at Line 58.

kinitro/executor/env_loader.py (1)

147-161: Silent except Exception: pass hides container-removal failures.

During cleanup debugging, a completely silent swallow gives zero visibility. A logger.debug would cost nothing and help diagnose stuck-container issues.

Proposed fix
     try:
         subprocess.run(
             ["docker", "rm", "-f", container_name],
             capture_output=True,
             text=True,
             timeout=5,
             check=False,
         )
-    except Exception:
-        pass
+    except Exception as e:
+        logger.debug("force_remove_container_failed", container=container_name, error=str(e))

As per coding guidelines: "Avoid broad except Exception unless you re-raise or return a clear error" and "Provide actionable error messages and log details with structlog".

kinitro/executor/family_worker.py (1)

244-246: Nit: _submit_result wraps a single result in a list.

submit_results expects a list, but this always submits one result. This is fine if the API client handles single-element lists efficiently, but worth noting in case batching is desired later.

kinitro/executor/worker_process.py (1)

39-62: Positional arg list is getting unwieldy (17 args) — consider passing a config/dataclass.

This isn't a regression from this PR, but adding eval_gpu as the 17th positional arg further increases the risk of ordering mistakes. A config dataclass or kwargs dict would be safer for cross-process serialization. Fine to defer.

kinitro/executor/worker.py (1)

22-34: Remove the unused EvalEnvConfig dataclass.

This dataclass is not referenced anywhere in the codebase and appears to be leftover from earlier code. Removing it (lines 22-33) will eliminate dead code.

Copy link
Collaborator

@Shr1ftyy Shr1ftyy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

@rayokamoto rayokamoto merged commit ba1b778 into main Feb 12, 2026
2 checks passed
@rayokamoto rayokamoto deleted the refactor/cleanup branch February 12, 2026 07:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants