ComfyUI nodes for ACE-Step AI music generation — text-to-music, cover/remix, repaint, and LLM-powered sample generation.
Search for ACE-Step-ComfyUI in ComfyUI Manager and install it. To install manually instead:
cd ComfyUI/custom_nodes
git clone https://github.com/ace-step/ACE-Step-ComfyUI.git
cd ACE-Step-ComfyUI
pip install -r requirements.txt
Restart ComfyUI after installation.
- Get your API key from acemusic.ai/api-key
- In the Text2music Server node, set mode → `cloud` and paste your key into api_key
- Or set the environment variable `ACESTEP_API_KEY`

The key is auto-saved locally (in `.config/` under the node directory) and displayed as `●●●●` for security. It is not stored in workflow JSON.
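
The two ways of supplying the key can be sketched as a small lookup. This is an illustration only: the function names are hypothetical, and the precedence (node field first, then `ACESTEP_API_KEY`) is an assumption, since the text above does not say which source wins when both are set.

```python
import os


def resolve_api_key(node_value, env=None):
    """Return the key typed into the node, else the ACESTEP_API_KEY variable.

    Assumption: the node field takes precedence over the environment.
    """
    env = os.environ if env is None else env
    key = (node_value or "").strip()
    if key:
        return key
    return env.get("ACESTEP_API_KEY", "").strip()


def mask_key(key):
    """Fixed-width mask, so the display does not reveal the key length."""
    return "●●●●" if key else ""
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.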
Local mode lets you run the ACE-Step inference server on your own machine (requires a GPU with sufficient VRAM).
1. Install ACE-Step 1.5
git clone https://github.com/ace-step/ACE-Step-1.5.git
cd ACE-Step-1.5
Install dependencies (choose one):
# Using uv (recommended)
uv sync
# Or using pip
pip install -e .
2. Start the local server
# Using uv
uv run acestep-openrouter --host 0.0.0.0 --port 8002
# Or if installed via pip
acestep-openrouter --host 0.0.0.0 --port 8002
The server will download model weights on first launch and then listen on http://127.0.0.1:8002.
3. Configure ComfyUI
In the Text2music Server node, set mode → local. The URL auto-fills to http://127.0.0.1:8002 and the api_key field is hidden since no key is needed locally.
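
Before queueing a prompt in local mode, it can help to confirm that something is actually listening on the local URL. The following is a minimal sketch using only the Python standard library; `server_reachable` is a hypothetical helper, not part of the node pack, and it checks only that the TCP port accepts connections, not that the ACE-Step API is ready (model weights may still be downloading).

```python
import socket
from urllib.parse import urlparse

LOCAL_SERVER_URL = "http://127.0.0.1:8002"  # default local URL from the text above


def server_reachable(url=LOCAL_SERVER_URL, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname or "127.0.0.1"
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `server_reachable()` with no arguments checks the default local address.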
Drag workflows/text2music.json into ComfyUI to load the ready-to-use workflow.
┌── refer_audio ──┐
Load Audio (refer) ──────────┘ │
▼
Text2music Gen Params ── gen_params ──► Text2music Server ──► Save Audio
▲ ▲ │
Load Audio (src) ──── src_audio ───────────────┘ Settings ┘ Show Text
Audio Codes ────────── audio_codes ────────────┘
- Open the workflow. sample_mode is OFF by default, so you are in manual mode.
- Fill in caption (music style description) and lyrics (with `[verse]`, `[chorus]` tags). Leave lyrics empty for instrumental.
- Click Queue Prompt. The generated audio appears in Save Audio and generation info in Show Text.
Instead of writing caption and lyrics yourself, let the LLM generate everything from a simple description:
- Toggle sample_mode → ON in the Gen Params node.
- Fill in sample_query (e.g. `"a funk rock song with groovy bass and punchy drums"`).
- Choose vocal_language and set is_instrumental if you want instrumental only.
- Click Queue Prompt. The LLM generates caption, lyrics, bpm, key, duration, and time signature automatically, then synthesizes the music.
In sample mode, only three fields are shown: `sample_query`, `vocal_language`, and `is_instrumental`. All other parameters are decided by the LLM.
When sample_mode is OFF, you have full manual control:
| Field | Description |
|---|---|
| caption | Music style / genre description |
| lyrics | Song lyrics (empty = instrumental) |
| vocal_language | Language for vocals |
| auto | ON (default): let the LM decide bpm, key, duration, time signature. OFF: set them manually below. |
| bpm | Beats per minute (shown when auto=OFF) |
| key | Musical key, e.g. C major (shown when auto=OFF) |
| duration | Duration in seconds (shown when auto=OFF) |
| time_signature | e.g. 4, 3, 6 (shown when auto=OFF) |
| is_repaint | Enable repaint mode (see below) |
| cover_strength | Noise injection strength for cover (0 = clean cover) |
| remix_strength | How much original audio to preserve (1.0 = full) |
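
The show/hide rules for the Gen Params fields can be summarized as a small function. This is a sketch of the documented behavior, not the node's actual code: `visible_fields` is a hypothetical name, the sample_mode toggle itself is omitted from the result, and the sketch assumes cover_strength and remix_strength are always shown in manual mode because the table lists no condition for them.

```python
def visible_fields(sample_mode, auto=True, is_repaint=False):
    """Return the Gen Params fields shown for a given toggle state."""
    if sample_mode:
        # Sample mode: everything else is decided by the LLM.
        return ["sample_query", "vocal_language", "is_instrumental"]
    fields = ["caption", "lyrics", "vocal_language", "auto"]
    if not auto:
        # Manual metas appear only when auto is OFF.
        fields += ["bpm", "key", "duration", "time_signature"]
    fields.append("is_repaint")
    if is_repaint:
        fields += ["repaint_start", "repaint_end"]
    # Assumption: the strength fields have no visibility condition.
    fields += ["cover_strength", "remix_strength"]
    return fields
```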
- Load the source song via Load Audio and connect it to Gen Params → `src_audio`.
- Alternatively, connect pre-extracted audio codes via Audio Codes → Gen Params → `audio_codes`.
- The task type automatically switches to `cover` when `src_audio` or `audio_codes` is connected.
- Adjust cover_strength and remix_strength to control the output.

The server automatically converts source audio to audio codes internally when needed. The generated `audio_codes` are also returned as a third output of Text2music Server.
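
The automatic task switch can be sketched as follows. Only the `cover` rule is stated explicitly in the text; the default name `text2music` and the choice to let `repaint` take precedence when is_repaint is ON are assumptions made for illustration, and `infer_task_type` is a hypothetical helper.

```python
def infer_task_type(src_audio=None, audio_codes=None, is_repaint=False):
    """Pick a task type from which inputs are connected."""
    if is_repaint and src_audio is not None:
        # Assumption: repaint wins over cover when both conditions hold.
        return "repaint"
    if src_audio is not None or bool(audio_codes):
        return "cover"
    # Assumption: label for plain text-to-music generation.
    return "text2music"
```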
- Connect source audio to Gen Params → `src_audio`.
- Toggle is_repaint → ON. The repaint_start and repaint_end fields appear.
- Set the time range (in seconds) for the region to regenerate.
- Fill in caption and lyrics for the regenerated section.
Connect a reference audio file to Gen Params → refer_audio to guide the style and timbre of the generated music. This works with any mode.
The Settings node controls inference hyperparameters:
| Field | Default | Description |
|---|---|---|
| seed | -1 | Random seed (-1 = random, set a number for reproducibility) |
| thinking | true | Enable 5Hz LM audio code generation (higher quality, slower) |
| use_cot_caption | true | LM chain-of-thought for caption refinement |
| use_cot_language | true | LM chain-of-thought for language detection |
| temperature | 0.85 | Sampling temperature |
| lm_cfg_scale | 1.0 | LM classifier-free guidance scale |
| dit_guidance_scale | 3.5 | DiT guidance scale |
| dit_inference_steps | 60 | Number of DiT denoising steps |
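
For scripting or documentation purposes, the defaults above can be captured in a dataclass. The field names and default values come from the table; the class itself, the `resolve_seed` helper, and the 32-bit range used for random seeds are assumptions, since the text does not say where or how a seed of -1 is replaced.

```python
import random
from dataclasses import dataclass


@dataclass
class Settings:
    """Inference hyperparameters with the defaults listed in the table."""
    seed: int = -1
    thinking: bool = True
    use_cot_caption: bool = True
    use_cot_language: bool = True
    temperature: float = 0.85
    lm_cfg_scale: float = 1.0
    dit_guidance_scale: float = 3.5
    dit_inference_steps: int = 60


def resolve_seed(seed, rng=None):
    """Return the seed unchanged, or draw a random one when it is -1."""
    if seed != -1:
        return seed
    rng = rng or random.Random()
    # Assumption: a 32-bit unsigned range is acceptable to the server.
    return rng.randrange(0, 2**32)
```

Recording the resolved seed alongside the output makes a run reproducible even when the node was left at -1.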
| Node | Description | Inputs | Outputs |
|---|---|---|---|
| Text2music Gen Params | Build generation parameters. Supports sample_mode (LLM auto) and manual mode. | sample_mode, vocal_language, + optional fields | gen_params |
| Settings | Inference hyperparameters for LM + DiT. | seed, thinking, temperature, etc. | settings |
| Text2music Server | Calls the ACE-Step API to generate music. | gen_params, settings, server_url, api_key, mode | audio, info, audio_codes |
| Audio Codes | Editable passthrough for audio codes. Paste manually or receive from Text2music Server. | audio_codes_in (optional) | audio_codes |
| Show Text | Displays any STRING input as a read-only text area. | text | text (passthrough) |
- Cover/Repaint auto-safety: When the task is `cover` or `repaint`, `thinking`, `use_cot_caption`, and `use_cot_language` are forced to `false` regardless of Settings values, because the LM is not used for these tasks.
- Auto metas: When `auto` is ON, the LM infers bpm/key/duration/time_signature. When OFF, your manual values are sent directly.
- Use Show Text nodes connected to `info` outputs to inspect what the server returned (LLM-generated parameters, generation metadata, etc.).
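
The cover/repaint auto-safety rule can be sketched as a pure function over a settings dictionary. This illustrates the documented behavior only; `apply_task_safety` and the dictionary representation are hypothetical, and the real node may apply the override elsewhere (for example on the server side).

```python
LM_FLAGS = ("thinking", "use_cot_caption", "use_cot_language")


def apply_task_safety(task_type, settings):
    """Return a copy of settings with LM flags forced off for cover/repaint."""
    effective = dict(settings)  # leave the caller's dictionary untouched
    if task_type in ("cover", "repaint"):
        for flag in LM_FLAGS:
            effective[flag] = False
    return effective
```

Comparing the returned dictionary with the original shows which flags were overridden for a given task.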