ACE-Step v1.5 XL Turbo

Fast 4B music generation model with 8-step inference for higher-quality rapid iteration

Text to Audio

ACE-Step v1.5 XL Turbo Overview

ACE-Step v1.5 XL Turbo is the accelerated 4B DiT variant of ACE-Step 1.5. It is optimized for faster music generation with 8-step distilled inference while retaining the higher-capacity XL architecture. It supports text-to-music, cover generation, and repaint workflows, making it suitable for rapid iteration when the 2B turbo model does not deliver sufficient audio quality.

How to Use ACE-Step v1.5 XL Turbo

Overview

ACE-Step v1.5 XL Turbo is the fast 4B variant in the ACE-Step 1.5 XL line. It combines the larger XL decoder with a distilled 8-step inference setup, making it the speed-focused option for users who still want the higher-capacity XL architecture.

This model is a good fit for rapid iteration workflows where generation speed matters but the smaller 2B turbo model does not provide enough audio quality.

Capabilities

Fast Text-to-Music Generation

Generate music from text prompts using a reduced-step inference profile. This makes the model better suited to quick turnaround and repeated iteration than the quality-oriented XL Base variant.

Cover and Repaint Workflows

In addition to text-to-music, the XL Turbo variant supports cover generation and repaint tasks, which makes it useful for selected audio editing workflows as well as prompt-driven creation.

Higher-Capacity XL Audio Quality

This variant uses the same 4B DiT class as the rest of the XL family, giving it stronger quality potential than the smaller 2B turbo model.

Multilingual Prompt and Lyric Support

The ACE-Step 1.5 family supports prompt and lyric control across 50+ languages, which is useful for multilingual music generation and structured song workflows.

Input and Output

  • Model ID: acestep-v15-xl-turbo
  • Input: text prompts, lyrics, and task-specific conditioning depending on workflow
  • Output: generated audio
  • Inference profile: 8 distilled steps without CFG
  • Supported duration: 10 seconds to 10 minutes
  • Task coverage: text-to-music, cover, repaint
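The limits listed above can be checked before a request is submitted. The following is a minimal sketch in Python, not part of any official ACE-Step API: the `GenerationRequest` class, its field names, and `validate_request` are hypothetical names introduced for illustration. Only the model ID, step count, duration range, and task names come from the list above.

```python
from dataclasses import dataclass
from typing import List

# Values copied from the Input and Output list above.
MODEL_ID = "acestep-v15-xl-turbo"
INFERENCE_STEPS = 8                 # distilled steps, run without CFG
MIN_DURATION_S = 10                 # 10 seconds
MAX_DURATION_S = 10 * 60            # 10 minutes
SUPPORTED_TASKS = ("text-to-music", "cover", "repaint")


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical request container; not an official ACE-Step type."""
    prompt: str
    duration_s: float
    task: str = "text-to-music"
    lyrics: str = ""


def validate_request(req: GenerationRequest) -> List[str]:
    """Return a list of problems; an empty list means the request fits the listed limits."""
    problems = []
    if req.task not in SUPPORTED_TASKS:
        problems.append(f"unsupported task: {req.task!r}")
    if not (MIN_DURATION_S <= req.duration_s <= MAX_DURATION_S):
        problems.append(
            f"duration {req.duration_s}s is outside "
            f"{MIN_DURATION_S}-{MAX_DURATION_S}s"
        )
    return problems


if __name__ == "__main__":
    request = GenerationRequest(prompt="warm lo-fi piano, slow tempo", duration_s=45)
    print(validate_request(request))
```

Validating on the client side like this only catches requests that fall outside the documented range; how an actual deployment rejects or clamps such requests is not specified here.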

Hardware Notes

Like the other XL models, this variant has higher VRAM requirements than the 2B line. The official model card lists support from 12 GB VRAM with offload and quantization, with 20 GB or more recommended for use without offload.
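The VRAM guidance above amounts to a simple threshold rule. The sketch below expresses it in Python under stated assumptions: `suggest_memory_mode` and the returned labels are hypothetical and do not correspond to real ACE-Step flags or settings. Only the 12 GB and 20 GB thresholds come from the text.

```python
# Thresholds taken from the hardware note above.
MIN_VRAM_GB = 12          # listed minimum, with offload and quantization
NO_OFFLOAD_VRAM_GB = 20   # recommended for use without offload


def suggest_memory_mode(vram_gb: float) -> str:
    """Map available VRAM to a hypothetical label describing the listed guidance."""
    if vram_gb >= NO_OFFLOAD_VRAM_GB:
        return "no-offload"
    if vram_gb >= MIN_VRAM_GB:
        return "offload-and-quantization"
    return "below-listed-minimum"


if __name__ == "__main__":
    for gb in (8, 12, 16, 24):
        print(gb, "GB ->", suggest_memory_mode(gb))
```

The labels are placeholders for whatever offload and quantization options a given runtime exposes; consult the official model card for the actual settings.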

Typical Use Cases

  • Fast iteration on music prompts
  • Higher-quality turbo generation
  • Text-guided cover generation
  • Repaint-style music editing tasks
  • Prototyping songs before switching to a slower quality-first model