ACE-Step v1.5 Base
ACE-Step v1.5 Base is an open-source music generation foundation model built on a hybrid LLM planner and Diffusion Transformer architecture. It generates full tracks from text prompts with support for voice cloning, lyric editing, remixing, cover generation, and compositions up to 10 minutes. It supports over 50 languages and runs on consumer hardware with under 4GB VRAM.
API Options
Platform-level options for task execution and delivery.
-
taskType
string required value: audioInference -
Identifier for the type of task being performed
-
taskUUID
string required UUID v4 -
UUID v4 identifier for tracking tasks and matching async responses. Must be unique per task.
-
outputType
string default: URL -
Audio output type.
Allowed values 3 values
-
outputFormat
string default: MP3 -
File format for the generated audio.
Allowed values 3 values
-
webhookURL
string URI -
Specifies a webhook URL where JSON responses will be sent via HTTP POST when generation tasks complete. For batch requests with multiple results, each completed item triggers a separate webhook call as it becomes available.
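A minimal sketch of a receiver for these webhook calls, using only the Python standard library. The handler name, the port, and the `received` list are hypothetical, and the response schema is not described in this section, so the code only decodes each POST body as JSON and stores it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # decoded webhook bodies, in arrival order


def parse_webhook_body(raw: bytes):
    """Decode one webhook POST body as JSON (the payload schema is assumed)."""
    return json.loads(raw.decode("utf-8"))


class WebhookHandler(BaseHTTPRequestHandler):  # hypothetical handler name
    def do_POST(self):
        # Read exactly the number of bytes announced by the sender.
        length = int(self.headers.get("Content-Length", 0))
        received.append(parse_webhook_body(self.rfile.read(length)))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block forever, accepting webhook POSTs on the given port."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Because each completed item in a batch triggers its own call, the receiver should expect several POSTs for one request and should not assume any arrival order.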
-
deliveryMethod
string default: sync -
Determines how the API delivers task results.
Allowed values 2 values
- Returns complete results directly in the API response.
- Returns an immediate acknowledgment with the task UUID. Poll for results using getResponse.
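The polling flow for the second delivery method could be sketched as follows. The `send` callable is a hypothetical transport function (it is assumed to post one task object and return the result, or `None` while the task is still running), and the shape of the `getResponse` request is an assumption based on the description above, not a confirmed schema.

```python
import time
from typing import Callable, Optional


def poll_for_result(
    send: Callable[[dict], Optional[dict]],
    task_uuid: str,
    interval: float = 2.0,
    max_attempts: int = 30,
) -> dict:
    """Ask for the task result repeatedly until one is returned or attempts run out."""
    # Assumed request shape: a getResponse task that reuses the original taskUUID.
    request = {"taskType": "getResponse", "taskUUID": task_uuid}
    for _ in range(max_attempts):
        result = send(request)
        if result is not None:
            return result
        time.sleep(interval)  # wait before asking again
    raise TimeoutError(f"no result for task {task_uuid} after {max_attempts} attempts")
```

Injecting `send` keeps the loop independent of any HTTP client and makes it easy to exercise with a stub.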
-
uploadEndpoint
string URI -
Specifies a URL where the generated content will be automatically uploaded using the HTTP PUT method. The raw binary data of the media file is sent directly as the request body. For secure uploads to cloud storage, use presigned URLs that include temporary authentication credentials.
Common use cases:
- Cloud storage: Upload directly to S3 buckets, Google Cloud Storage, or Azure Blob Storage using presigned URLs.
- CDN integration: Upload to content delivery networks for immediate distribution.
// S3 presigned URL for secure upload
https://your-bucket.s3.amazonaws.com/generated/content.mp4?X-Amz-Signature=abc123&X-Amz-Expires=3600
// Google Cloud Storage presigned URL
https://storage.googleapis.com/your-bucket/content.jpg?X-Goog-Signature=xyz789
// Custom storage endpoint
https://storage.example.com/uploads/generated-image.jpg
The content data will be sent as the request body to the specified URL when generation is complete.
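To check that a storage endpoint accepts this kind of upload before passing it as uploadEndpoint, the request can be emulated locally. This is a sketch under assumptions: the function name is hypothetical, and the Content-Type header is a guess, since the text above only states that the raw bytes are sent with HTTP PUT.

```python
import urllib.request


def build_put_upload(upload_endpoint: str, media_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT request carrying raw media bytes as the body."""
    return urllib.request.Request(
        upload_endpoint,
        data=media_bytes,
        method="PUT",
        # Assumption: the header actually sent by the service is not documented here.
        headers={"Content-Type": "application/octet-stream"},
    )


# Example with the custom-endpoint style URL; nothing is sent over the network.
request = build_put_upload("https://storage.example.com/uploads/test.mp3", b"\x00\x01")
```

Passing the returned object to `urllib.request.urlopen` would perform the upload against a real endpoint.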
-
ttl
integer min: 60 -
Time-to-live (TTL) in seconds for generated content. Only applies when outputType is 'URL'.
-
includeCost
boolean default: false -
Include task cost in the response.
-
numberResults
integer min: 1 max: 20 default: 1 -
Number of results to generate. Each result uses a different seed, producing variations of the same parameters.
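The constraints on ttl and numberResults can be checked before a request is sent. The helper below is a hypothetical sketch: it enforces only the limits listed above, and rejecting a ttl for a non-URL outputType is a design choice of this example, since the documentation only says the value does not apply in that case.

```python
from typing import Optional


def build_delivery_options(
    output_type: str = "URL",
    ttl: Optional[int] = None,
    number_results: int = 1,
    include_cost: bool = False,
) -> dict:
    """Return the delivery-related fields of a task after basic range checks."""
    if not 1 <= number_results <= 20:
        raise ValueError("numberResults must be between 1 and 20")
    options = {
        "outputType": output_type,
        "numberResults": number_results,
        "includeCost": include_cost,
    }
    if ttl is not None:
        if output_type != "URL":
            raise ValueError("ttl only applies when outputType is 'URL'")
        if ttl < 60:
            raise ValueError("ttl must be at least 60 seconds")
        options["ttl"] = ttl
    return options
```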
Inputs
Input resources for the task (images, audio, etc). These must be nested inside the inputs object.
-
inputs»audio
string -
Audio input (UUID or URL).
Generation Parameters
Core parameters for controlling the generated content.
-
model
string required value: runware:ace-step@v1.5-base -
Identifier of the model to use for generation.
-
positivePrompt
string required min: 2 max: 3000 -
Text prompt describing elements to include in the generated output.
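Putting the four required fields together, a minimal task object might be built like this. The function name is hypothetical, and how the object is transported (endpoint, authentication, wrapping) is outside this section and deliberately left out.

```python
import json
import uuid

MODEL_ID = "runware:ace-step@v1.5-base"


def build_minimal_task(positive_prompt: str) -> dict:
    """Build a task containing only the required fields listed in this reference."""
    if not 2 <= len(positive_prompt) <= 3000:
        raise ValueError("positivePrompt must be 2 to 3000 characters long")
    return {
        "taskType": "audioInference",
        "taskUUID": str(uuid.uuid4()),  # fresh UUID v4 for every task
        "model": MODEL_ID,
        "positivePrompt": positive_prompt,
    }


task = build_minimal_task("warm lo-fi piano with soft vinyl crackle")
print(json.dumps(task, indent=2))
```

All optional parameters fall back to the defaults listed in this reference when they are omitted.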
-
negativePrompt
string min: 2 max: 3000 -
Prompt to guide what to exclude from generation. Ignored when guidance is disabled (CFGScale ≤ 1).
-
duration
float min: 0 max: 300 step: 0.1 default: 60 -
Duration of the generation in seconds. Total frames = duration × fps.
-
seed
integer min: 0 max: 2147483647 -
Random seed for reproducible generation. When not provided, a random seed is generated in the unsigned 32-bit range.
-
steps
integer min: 0 max: 1000 default: 100 -
Total number of denoising steps. Higher values generally produce more detailed results but take longer.
-
CFGScale
float min: 1 max: 30 step: 0.01 default: 10 -
Guidance scale. Higher values follow the prompt more closely at the cost of quality.
-
strength
float min: 0 max: 1 step: 0.01 default: 0.5 -
Fraction of steps using the input source instead of generated output.
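The numeric ranges above lend themselves to a table-driven check on the client side. This is an illustrative sketch, not part of the API: the function and table names are hypothetical, and the default CFGScale of 10 is taken from the listing above.

```python
# Inclusive (min, max) ranges copied from the parameter listing above.
GENERATION_LIMITS = {
    "duration": (0, 300),
    "seed": (0, 2147483647),
    "steps": (0, 1000),
    "CFGScale": (1, 30),
    "strength": (0, 1),
}


def check_generation_params(params: dict) -> list:
    """Return human-readable problems found in the given parameters (empty if none)."""
    problems = []
    for name, (low, high) in GENERATION_LIMITS.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name}={params[name]} is outside [{low}, {high}]")
    # negativePrompt has no effect when guidance is disabled (CFGScale <= 1).
    if "negativePrompt" in params and params.get("CFGScale", 10) <= 1:
        problems.append("negativePrompt is ignored because CFGScale <= 1")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue in a single pass.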
Settings
Technical parameters to fine-tune the inference process. These must be nested inside the settings object.
-
settings»bpm
integer min: 30 max: 300 -
Beats per minute. If not set, the model decides automatically.
-
settings»coverConditioningScale
float min: 0 max: 1 step: 0.01 default: 1 -
Fraction of steps using source-audio conditioning.
-
settings»guidanceType
string default: apg -
Controls how guidance is applied during generation.
Allowed values 2 values
- Adaptive Projected Guidance.
- Classifier-Free Guidance.
-
settings»keyScale
string -
Musical key and scale in '{Note}{Accidental} {Mode}' format (e.g. 'C major', 'F# minor', 'Bb major').
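The '{Note}{Accidental} {Mode}' format can be checked with a small regular expression. This is a sketch under assumptions: the only modes shown in the examples are 'major' and 'minor', so the pattern accepts just those two, and the set of values the API actually accepts may differ.

```python
import re

# Note A-G, optional accidental (# or b), one space, then the mode.
# Assumption: only 'major' and 'minor' appear in the documented examples.
KEY_SCALE_RE = re.compile(r"[A-G][#b]? (major|minor)")


def is_valid_key_scale(value: str) -> bool:
    """Check that a string follows the documented keyScale format."""
    return KEY_SCALE_RE.fullmatch(value) is not None
```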
-
settings»lyrics
string -
Song lyrics, typically formatted as they would appear on a lyrics website.
-
settings»repaintingEnd
integer -
End time in seconds for the repaint region. Requires input audio. Values beyond the audio duration append new audio.
-
settings»repaintingStart
integer -
Start time in seconds for the repaint region. Requires input audio. Negative values prepend audio before the start.
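One way to reason about a repaint region is to split it into the part that extends the track and the part that overwrites existing audio. The helper below is a hypothetical sketch of that arithmetic, based on one reading of the two descriptions above (negative start prepends, end past the duration appends). It does not call the API and does not describe how the model processes the region internally.

```python
def describe_repaint(audio_duration: float, start: int, end: int) -> dict:
    """Summarize how a repaint region relates to the input audio (all values in seconds)."""
    if end <= start:
        raise ValueError("repaintingEnd must be greater than repaintingStart")
    return {
        # Seconds of new audio placed before the original start.
        "prepended_seconds": max(0, -start),
        # Seconds of new audio placed after the original end.
        "appended_seconds": max(0.0, end - audio_duration),
        # Portion of the original audio covered by the region.
        "repainted_existing": (max(start, 0), min(end, audio_duration)),
    }
```

For a 30-second input with a region from -5 to 40, this reports 5 seconds prepended, 10 seconds appended, and the whole original track repainted.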
-
settings»timeSignature
integer -
Beats per measure. Empty string lets the model decide.
Allowed values 4 values
-
settings»vocalLanguage
string default: en -
ISO 639-1 language code for vocals. 'unknown' for instrumental or auto detection.
Allowed values 51 values
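Since these parameters must be nested inside the settings object, a small builder can keep the nesting in one place. The function name is hypothetical. Dropping unset values is a choice of this sketch, reflecting that several settings let the model decide when they are not provided.

```python
SETTINGS_FIELDS = {
    "bpm", "coverConditioningScale", "guidanceType", "keyScale", "lyrics",
    "repaintingEnd", "repaintingStart", "timeSignature", "vocalLanguage",
}


def build_settings(**fields) -> dict:
    """Wrap the given fields in a 'settings' object, omitting any that are None."""
    unknown = set(fields) - SETTINGS_FIELDS
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    return {"settings": {k: v for k, v in fields.items() if v is not None}}


# Merge into a task dict alongside the top-level parameters.
task = {"taskType": "audioInference", "model": "runware:ace-step@v1.5-base"}
task.update(build_settings(bpm=120, keyScale="F# minor", vocalLanguage="en"))
```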