
Conversation


@nightguarder nightguarder commented Dec 20, 2025

Custom MLX Models Support - Issue #918

Motivation

Fixes Issue #918: Enable users to run custom MLX-based models from mlx-community on Hugging Face without manual code updates.

What Changed

1. Frontend UI for Custom Models

Commit: "Add custom models to dashboard"

  • Added a "Custom Models" section to the downloads page with a Hugging Face model ID input
  • Implemented a "Download Model" button that triggers a model download into the exo home folder
  • Safe registration of the custom model via a separate /custom_models API

2. Tests

Commit: "Integrate new test"

  • Pytest tests are available in src/exo/worker/tests/test_custom_model.py
  • A CI workflow file (.github/workflows/custom_models.yml) is included, with the future possibility of running the tests in a GitHub Actions pipeline
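As a sketch, a minimal workflow for that file could look like the following (the job layout, runner image, and Python version are illustrative assumptions, not taken from this PR):

```yaml
name: custom-models
on: [push, pull_request]

jobs:
  test:
    runs-on: macos-latest        # MLX models need Apple Silicon in practice
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -e . pytest
      - run: pytest src/exo/worker/tests/test_custom_model.py
```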

3. Persistent Storage & Model Registration

Commit: "Persist storage for custom models"

  • Fixed resolve_model_meta() to check both short_id keys and full model_id values
  • Enabled custom model registration to ~/.exo/custom_models.json during download
  • Models reload automatically on EXO restart from persistent storage
  • Prettified display name stored for custom models under the pretty_name key

4. Safe Downloading Logic

Commit: "SAFE model registration"

  • Implemented "lazy loading" logic. Runners are now skipped for download_only instances until a task is received, while model downloads continue in the background.
  • Added automatic model registration. Custom models are now registered in ~/.exo/custom_models.json immediately when the download starts, ensuring they are recognized by the system right away.
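The register-first, download-in-background ordering can be sketched like this (all names here are hypothetical; the real implementation lives in impl_shard_downloader.py and plan.py):

```python
import threading

def fetch_weights(model_id: str) -> None:
    """Placeholder for the actual Hugging Face shard download."""
    pass

def start_download(model_id: str, registry: dict) -> threading.Thread:
    """Register the model immediately, then download in the background.

    No runner is spawned here: for download_only instances the runner
    stays skipped until the scheduler hands the node a task.
    """
    # Register first, so the rest of the system recognises the model
    # while its weights are still downloading.
    registry[model_id] = {"status": "downloading"}

    def _worker() -> None:
        fetch_weights(model_id)
        registry[model_id]["status"] = "ready"

    t = threading.Thread(target=_worker, daemon=True)
    t.start()
    return t
```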

Why It Works

This implementation enables dynamic custom model loading without requiring manual modifications to model_cards.

Users can:

  • Download any mlx-community model from Hugging Face via the dashboard
  • Have models persist across restarts
  • Test out their model once it loads

Known Issues

1. Missing chat_template.jinja for Some Models

Some mlx-community models don't include a chat template, causing the model to output its internal instructions instead of formatted chat responses. This is a model-specific issue with mlx-community models, not a bug in our implementation.

Workaround: Use models that include proper chat templates (e.g., mlx-community/Qwen2.5-0.5B-Instruct-4bit) or add a chat_template.jinja yourself.
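For reference, a minimal ChatML-style chat_template.jinja (the format used by Qwen-family models; an illustrative sketch, not a file from this PR) looks roughly like:

```jinja
{%- for message in messages -%}
<|im_start|>{{ message.role }}
{{ message.content }}<|im_end|>
{%- endfor -%}
{%- if add_generation_prompt -%}
<|im_start|>assistant
{%- endif -%}
```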

Testing

Note that a manual rebuild of the dashboard is needed: cd dashboard && npm run build

Manual Testing

  • Hardware: MacBook Pro (M4 Pro)
  • Tested with mlx-community/Qwen2.5-14B-Instruct-8bit
  • Verified:
    • Model appears in downloads list with correct size
    • Download progress bar updates in real-time
    • Model persists in ~/.exo/custom_models.json
    • Prettified name stored under the pretty_name key
    • Model is available after restarting exo
    • Chat inference works correctly

Automated Testing

  • Integration test: src/exo/worker/tests/test_custom_model.py
  • CI workflow: .github/workflows/test_custom_models.yml

Files Modified

  • src/exo/master/api.py - Model resolution & API response
  • src/exo/shared/models/model_cards.py - Persistence logic
  • src/exo/worker/download/impl_shard_downloader.py - Registration on download
  • src/exo/worker/plan.py - Scheduler lazy loading logic
  • dashboard/src/routes/downloads/+page.svelte - Custom models UI
  • dashboard/src/lib/stores/app.svelte.ts - API integration

@nightguarder nightguarder marked this pull request as draft December 20, 2025 13:02
@Evanev7 Evanev7 linked an issue Dec 20, 2025 that may be closed by this pull request
@nightguarder
Contributor Author

Hi, I have successfully added a new feature: testing custom MLX models.

Can someone please clone & run my fork to verify downloading a larger model like mlx-community/gpt-oss-20b-MXFP4-Q8? I don't have enough RAM :/

@nightguarder
Contributor Author

I hope this is something we wanted. Currently only for testing purposes.
[Screenshot 2025-12-20 at 16:02:34]

@nightguarder
Contributor Author

Not sure why my VSCode Prettier auto prettified all the files I’ve changed.

If it's needed to approve this feature request, I will probably create a new clean PR where I only change the required code blocks, to keep it clean.


Evanev7 commented Dec 20, 2025

Looks good! I wonder if we should directly add the model to the model cards instead of a separate KNOWN_MODELS but there's wider questions to be answered in there.


Evanev7 commented Dec 20, 2025

As for prettier, I don't believe our current formatter extends to the dashboard so I don't particularly mind atm


nightguarder commented Dec 20, 2025

Looks good! I wonder if we should directly add the model to the model cards instead of a separate KNOWN_MODELS but there's wider questions to be answered in there.

My idea was that after users test and verify a model, we add it to model_cards as an officially supported model. But yeah, it can be skipped.


Evanev7 commented Dec 20, 2025

Ok - gpt-oss-20b-MXFP4-Q8 did not work, but the download was completely fine, seems like an upstream problem.

@nightguarder
Contributor Author

Ok - gpt-oss-20b-MXFP4-Q8 did not work, but the download was completely fine, seems like an upstream problem.

Yes, I see the error. This might be more difficult than I thought: Runner 4e13d976-5262-43eb-b513-e9678e673e59 crashed with critical exception Quantized SDPA does not support attention sinks


Evanev7 commented Dec 20, 2025

This isn't an issue for this PR - we need to bump mlx versions and test afaik.

@nightguarder
Contributor Author

OK, it's working. The GPT-OSS model loaded. However, I had to add TEMPORARY overrides, as in my commit 2e446ab. Not ideal; we need to wait for an official mlx version with support.

@nightguarder
Contributor Author

GPT-OSS-20B has no chat_template.jinja, resulting in artifacts and instructions appearing in chat:

QUERY
Hello

EXO
09:25:43
TTFT 555ms•70.7 tok/s
<|channel|>analysis<|message|>We need to be helpful, concise, no reasoning inside answer. Respond "Hello". Maybe ask how to help.<|end|><|start|>assistant<|channel|>final<|message|>Hello! How can I help you today? 

@nightguarder nightguarder marked this pull request as ready for review December 21, 2025 09:27

Evanev7 commented Dec 21, 2025

Appreciate the enthusiasm, but can we keep this PR down to custom models? The gpt-oss fix is a separate issue.

@gj-aazoo

gj-aazoo commented Jan 2, 2026

Tested this branch and ran into this; it would be useful to also be able to download models that you converted yourself.

Model ID must start with mlx-community/

@nightguarder nightguarder changed the title from "Integrating custom mlx models" to "Integrating custom models for Exo" Jan 2, 2026
@nightguarder
Contributor Author

Tested this branch and ran into this; it would be useful to also be able to download models that you converted yourself.

Model ID must start with mlx-community/

You want to run models outside of mlx-community? They are not optimized for exo.

@gj-aazoo

gj-aazoo commented Jan 2, 2026 via email


nightguarder commented Jan 3, 2026

Yes, it is an MLX-converted model.

Ok, You can now try your custom model. Please let me know if it works.

@nightguarder nightguarder requested a review from Evanev7 January 3, 2026 21:06

Evanev7 commented Jan 4, 2026

Let me know when a good time for re-review is, I'm keen to get this integrated next week.

@gj-aazoo

gj-aazoo commented Jan 4, 2026 via email

@gj-aazoo

gj-aazoo commented Jan 5, 2026 via email

@gj-aazoo

gj-aazoo commented Jan 5, 2026 via email


nightguarder commented Jan 5, 2026

Turns out I needed to rebuild the dashboard, my bad. The download works fine now, and the model runs fine.

Great news! Yes, you need to update the frontend via cd dashboard && npm run build. What model did you use? No problems with unknown/weird <tokens> appearing in chat? Thanks

@gj-aazoo

gj-aazoo commented Jan 5, 2026 via email

@nightguarder
Contributor Author

@Evanev7
I think we are now ready for a code review, since another user reported successfully running a custom model.

However, I had some issues after pulling the latest changes from your commit 1ec550d, mainly with placement.py.

Regarding that commit: do you plan to fix the downloaded-model status? I find it rather distracting that all the default models are shown as not downloaded. Just hide them under a "Not Downloaded" tab.


Evanev7 commented Jan 7, 2026

It's not my favourite, but it's not really my code. WIP I think.
I would really much rather we only checked for models we had actually downloaded instead of trying all of them.


nightguarder commented Jan 10, 2026

It's not my favourite, but it's not really my code. WIP I think. I would really much rather we only checked for models we had actually downloaded instead of trying all of them.

@Evanev7 I would like to fix it but this PR already has a lot of changes and we should not complicate things further.


Evanev7 commented Jan 12, 2026

Agreed - let's get this merged and we can iterate


@JakeHillion JakeHillion left a comment


A few things before we can consider merging this:

  • There are merge conflicts in model_cards and the Svelte file. I am able to fix the model card ones quite easily, but the Svelte changes were significantly overlapping. Please push a merge commit for that.
  • There are several unrelated changes in here. Please run git rm -r .vscode/, git checkout main dashboard/package-lock.json and commit at least.
  • Run nix fmt and commit it.
  • There are lots of unrelated changes spread throughout this. I left a comment on a specific one. After the above steps, please push this PR and take a look through the "Files Changed" on GitHub. If there are any changes which aren't adding integration for custom models, please remove them from the PR so they don't show up anymore.

Thanks for the submission, I look forward to reviewing it in detail once it's merged & cleaned up!

topologyData,
type DownloadProgress,
placeInstance,
} from "$lib/stores/app.svelte";


These quote changes, for example, shouldn't be in this PR.



5 participants