ACE-Step-ComfyUI

ComfyUI nodes for ACE-Step AI music generation — text-to-music, cover/remix, repaint, and LLM-powered sample generation.

Installation

Via ComfyUI Manager

Search for ACE-Step-ComfyUI in ComfyUI Manager and install.

Manual Installation

cd ComfyUI/custom_nodes
git clone https://github.com/ace-step/ACE-Step-ComfyUI.git
cd ACE-Step-ComfyUI
pip install -r requirements.txt

Restart ComfyUI after installation.

Setup

Cloud Mode (default)

1. Get your API key from acemusic.ai/api-key.
2. In the Text2music Server node, set mode → cloud and paste your key into api_key. Alternatively, set the environment variable ACESTEP_API_KEY.

The key is auto-saved locally (in .config/ under the node directory) and displayed as ●●●● for security. It is not stored in workflow JSON.
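
The two ways of supplying the key can be pictured with the short sketch below. It is only an illustration, not the node's actual code: `resolve_api_key` and `mask_key` are hypothetical helper names, and giving the node field precedence over the environment variable is an assumption.

```python
import os


def resolve_api_key(node_value: str = "") -> str:
    """Return the key typed into the node, else ACESTEP_API_KEY, else ""."""
    key = node_value.strip()
    if key:
        return key  # assumption: a key typed into the node wins
    return os.environ.get("ACESTEP_API_KEY", "").strip()


def mask_key(key: str) -> str:
    """Render a stored key as bullets so it is never shown in plain text."""
    return "●" * 4 if key else ""
```

With `ACESTEP_API_KEY` exported in the shell that launches ComfyUI, `resolve_api_key("")` would return the exported value under these assumptions.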

Local Mode

Local mode lets you run the ACE-Step inference server on your own machine (requires a GPU with sufficient VRAM).

1. Install ACE-Step 1.5

git clone https://github.com/ace-step/ACE-Step-1.5.git
cd ACE-Step-1.5

Install dependencies (choose one):

# Using uv (recommended)
uv sync

# Or using pip
pip install -e .

2. Start the local server

# Using uv
uv run acestep-openrouter --host 0.0.0.0 --port 8002

# Or if installed via pip
acestep-openrouter --host 0.0.0.0 --port 8002

The server downloads model weights on first launch. With --host 0.0.0.0 it then listens on all network interfaces on port 8002, so it is reachable from the same machine at http://127.0.0.1:8002.
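
Before switching ComfyUI to local mode, it can help to confirm that something is actually listening on the port. The sketch below only opens a TCP connection; it does not call any ACE-Step endpoint, and `server_is_listening` is a hypothetical helper name.

```python
import socket


def server_is_listening(host: str = "127.0.0.1", port: int = 8002,
                        timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Calling `server_is_listening()` while the model weights are still downloading may return False, since the server might not accept connections until it has finished starting up.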

3. Configure ComfyUI

In the Text2music Server node, set mode → local. The URL auto-fills to http://127.0.0.1:8002 and the api_key field is hidden since no key is needed locally.

Usage Guide

Drag workflows/text2music.json into ComfyUI to load the ready-to-use workflow.

                             ┌── refer_audio ──┐
Load Audio (refer) ──────────┘                 │
                                               ▼
                            Text2music Gen Params ── gen_params ──► Text2music Server ──► Save Audio
                                               ▲                          ▲          │
Load Audio (src) ──── src_audio ───────────────┘                   Settings ┘      Show Text
Audio Codes ────────── audio_codes ────────────┘

Quick Start: Text-to-Music

1. Open the workflow. sample_mode is OFF by default, so you start in manual mode.
2. Fill in caption (music style description) and lyrics (with [verse], [chorus] tags). Leave lyrics empty for an instrumental track.
3. Click Queue Prompt. The generated audio appears in Save Audio and the generation info in Show Text.
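
As an illustration of what the caption and lyrics fields might contain, the sketch below assembles a lyrics string with section tags. `build_lyrics` is a hypothetical helper, the lyric lines are placeholders, and separating sections with a blank line is an assumption rather than a documented requirement.

```python
def build_lyrics(sections: list[tuple[str, list[str]]]) -> str:
    """Join (tag, lines) pairs into one string with [tag] headers."""
    blocks = []
    for tag, lines in sections:
        blocks.append("\n".join([f"[{tag}]", *lines]))
    # Assumption: one blank line between sections.
    return "\n\n".join(blocks)


caption = "upbeat synth-pop with bright pads and a steady beat"
lyrics = build_lyrics([
    ("verse", ["City lights are calling out my name"]),
    ("chorus", ["We run, we run into the night"]),
])
instrumental_lyrics = ""  # empty lyrics request an instrumental track
```

The resulting `lyrics` string can be pasted into the lyrics field as-is.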

Sample Mode (LLM Auto-Generation)

Instead of writing caption and lyrics yourself, let the LLM generate everything from a simple description:

1. Toggle sample_mode → ON in the Gen Params node.
2. Fill in sample_query (e.g. "a funk rock song with groovy bass and punchy drums").
3. Choose vocal_language and set is_instrumental if you want instrumental only.
4. Click Queue Prompt. The LLM generates caption, lyrics, bpm, key, duration, and time signature automatically, then synthesizes the music.

In sample mode, only three fields are shown: sample_query, vocal_language, and is_instrumental. All other parameters are decided by the LLM.
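
The three sample-mode fields can be thought of as one small parameter set. The sketch below is a hypothetical representation: the dict layout and the `sample_mode_params` helper are assumptions for illustration, not the structure the node actually sends to the server.

```python
def sample_mode_params(sample_query: str, vocal_language: str,
                       is_instrumental: bool = False) -> dict:
    """Collect the three fields exposed in sample mode into one dict."""
    query = sample_query.strip()
    if not query:
        raise ValueError("sample_query must not be empty in sample mode")
    return {
        "sample_mode": True,
        "sample_query": query,
        "vocal_language": vocal_language,
        "is_instrumental": is_instrumental,
    }


params = sample_mode_params(
    "a funk rock song with groovy bass and punchy drums",
    vocal_language="en",  # assumed value; pick one offered by the node
)
```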

Manual Mode Controls

When sample_mode is OFF, you have full manual control:

Field	Description
caption	Music style / genre description
lyrics	Song lyrics (empty = instrumental)
vocal_language	Language for vocals
auto	ON (default): let the LM decide bpm, key, duration, time signature. OFF: set them manually below.
bpm	Beats per minute (shown when auto=OFF)
key	Musical key, e.g. `C major` (shown when auto=OFF)
duration	Duration in seconds (shown when auto=OFF)
time_signature	e.g. `4`, `3`, `6` (shown when auto=OFF)
is_repaint	Enable repaint mode (see below)
cover_strength	Noise injection strength for cover (0 = clean cover)
remix_strength	How much original audio to preserve (1.0 = full)
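
The effect of the auto toggle on the four metadata fields can be sketched as follows. This is an assumption-laden illustration: `manual_mode_params`, the dict layout, and the example values are hypothetical, and the real node may handle missing fields differently.

```python
META_FIELDS = ("bpm", "key", "duration", "time_signature")


def manual_mode_params(caption: str, lyrics: str = "", *, vocal_language: str,
                       auto: bool = True, **metas) -> dict:
    """Build manual-mode parameters; metas are only used when auto is OFF."""
    params = {
        "sample_mode": False,
        "caption": caption,
        "lyrics": lyrics,  # empty string = instrumental
        "vocal_language": vocal_language,
        "auto": auto,
    }
    if not auto:
        missing = [name for name in META_FIELDS if name not in metas]
        if missing:
            raise ValueError(f"auto is OFF, so these fields are required: {missing}")
        params.update({name: metas[name] for name in META_FIELDS})
    return params


# auto ON: bpm, key, duration, and time_signature are left to the LM.
auto_params = manual_mode_params("lo-fi hip hop", vocal_language="en")

# auto OFF: the four values are supplied by hand.
fixed_params = manual_mode_params(
    "lo-fi hip hop", vocal_language="en", auto=False,
    bpm=80, key="C major", duration=60, time_signature="4",
)
```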

Cover / Remix Mode

1. Load the source song via Load Audio and connect it to Gen Params → src_audio. Alternatively, connect pre-extracted audio codes via Audio Codes → Gen Params → audio_codes.
2. The task type switches to cover automatically when src_audio or audio_codes is connected.
3. Adjust cover_strength and remix_strength to control the output.

The server automatically converts source audio to audio codes internally when needed. The generated audio_codes are also returned as a third output of Text2music Server.
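
The automatic switch to the cover task can be pictured as a small decision function. This is a sketch of the rule as described above, not the node's code: `infer_task_type` is a hypothetical name, and "text2music" as the label for the default task is an assumption.

```python
from typing import Any, Optional


def infer_task_type(src_audio: Optional[Any] = None, audio_codes: str = "") -> str:
    """Return "cover" when a source is connected, else the default task."""
    if src_audio is not None or audio_codes.strip():
        return "cover"
    return "text2music"  # assumed name for the plain text-to-music task
```

Under these assumptions, connecting either input is enough: `infer_task_type(audio_codes="...")` and `infer_task_type(src_audio=waveform)` both select the cover task.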

Repaint Mode

1. Connect source audio to Gen Params → src_audio.
2. Toggle is_repaint → ON. The repaint_start and repaint_end fields appear.
3. Set the time range (in seconds) for the region to regenerate.
4. Fill in caption and lyrics for the regenerated section.
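
A quick sanity check on the repaint range can catch typos before a job is queued. The sketch below assumes the range must lie inside the source audio and that the end must come after the start; the real node or server may accept other conventions, and `validate_repaint_range` is a hypothetical helper.

```python
def validate_repaint_range(repaint_start: float, repaint_end: float,
                           source_duration: float) -> tuple[float, float]:
    """Check a repaint range in seconds against the source length."""
    if repaint_start < 0:
        raise ValueError("repaint_start must be >= 0")
    if repaint_end <= repaint_start:
        raise ValueError("repaint_end must be greater than repaint_start")
    if repaint_end > source_duration:
        raise ValueError("repaint_end exceeds the source audio duration")
    return float(repaint_start), float(repaint_end)
```

For a 120-second source, `validate_repaint_range(30, 45, 120)` passes, while swapping the two values raises an error.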

Reference Audio

Connect a reference audio file to Gen Params → refer_audio to guide the style and timbre of the generated music. This works with any mode.

Settings Node

The Settings node controls inference hyperparameters:

Field	Default	Description
seed	`-1`	Random seed (`-1` = random, set a number for reproducibility)
thinking	`true`	Enable 5Hz LM audio code generation (higher quality, slower)
use_cot_caption	`true`	LM chain-of-thought for caption refinement
use_cot_language	`true`	LM chain-of-thought for language detection
temperature	`0.85`	Sampling temperature
lm_cfg_scale	`1.0`	LM classifier-free guidance scale
dit_guidance_scale	`3.5`	DiT guidance scale
dit_inference_steps	`60`	Number of DiT denoising steps
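
The defaults in the table can be mirrored in a small dataclass, which is handy when scripting or comparing runs. This is only a sketch: the `Settings` class and `resolve_seed` helper are hypothetical, and whether the random seed is drawn by the node or by the server is not stated here, so the seed range below is an assumption.

```python
import random
from dataclasses import asdict, dataclass


@dataclass
class Settings:
    """Defaults copied from the Settings table above."""
    seed: int = -1
    thinking: bool = True
    use_cot_caption: bool = True
    use_cot_language: bool = True
    temperature: float = 0.85
    lm_cfg_scale: float = 1.0
    dit_guidance_scale: float = 3.5
    dit_inference_steps: int = 60


def resolve_seed(seed: int) -> int:
    """Keep a fixed seed; draw a random one when seed is -1."""
    # Assumption: a 32-bit range is used for random seeds.
    return random.randint(0, 2**32 - 1) if seed == -1 else seed


reproducible = asdict(Settings(seed=1234))
```

Setting a fixed seed, as in `reproducible`, is what makes two runs with identical parameters comparable.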

Node Reference

Node	Description	Inputs	Outputs
Text2music Gen Params	Build generation parameters. Supports sample_mode (LLM auto) and manual mode.	`sample_mode`, `vocal_language`, + optional fields	`gen_params`
Settings	Inference hyperparameters for LM + DiT.	`seed`, `thinking`, `temperature`, etc.	`settings`
Text2music Server	Calls the ACE-Step API to generate music.	`gen_params`, `settings`, `server_url`, `api_key`, `mode`	`audio`, `info`, `audio_codes`
Audio Codes	Editable passthrough for audio codes. Paste manually or receive from Text2music Server.	`audio_codes_in` (optional)	`audio_codes`
Show Text	Displays any STRING input as a read-only text area.	`text`	`text` (passthrough)

Tips

Cover/Repaint auto-safety: When the task is cover or repaint, thinking, use_cot_caption, and use_cot_language are forced to false regardless of Settings values — the LM is not used for these tasks.
Auto metas: When auto is ON, the LM infers bpm/key/duration/time_signature. When OFF, your manual values are sent directly.
Use Show Text nodes connected to info outputs to inspect what the server returned (LLM-generated parameters, generation metadata, etc.).
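
The cover/repaint auto-safety rule from the first tip can be sketched as a small override step. This illustrates the described behavior only; `apply_auto_safety` and the dict-based settings are hypothetical, not the node's implementation.

```python
LM_FLAGS = ("thinking", "use_cot_caption", "use_cot_language")


def apply_auto_safety(task_type: str, settings: dict) -> dict:
    """Return a copy of settings with LM flags forced off for cover/repaint."""
    effective = dict(settings)  # leave the caller's dict untouched
    if task_type in ("cover", "repaint"):
        for flag in LM_FLAGS:
            effective[flag] = False
    return effective


user_settings = {"thinking": True, "use_cot_caption": True,
                 "use_cot_language": True, "temperature": 0.85}
cover_settings = apply_auto_safety("cover", user_settings)
```

Here `cover_settings` has all three LM flags set to False, while `temperature` and the original `user_settings` are unchanged.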

License

MIT
