MiniMax

MiniMax Music 2.6

Promptable full-song generation with vocals, lyrics, BPM and key control

Text to Audio

MiniMax Music 2.6 Overview

MiniMax Music 2.6 is MiniMax’s latest music generation model for producing full vocal songs and instrumentals from text prompts. It accepts natural-language prompts or detailed production-style instructions, follows specified BPM and key with high reliability, and exposes fine-grained song structure control through section tags. The same Music API also supports instrumental generation, lyrics-assisted workflows, and synchronous or streaming delivery.

From $0.15 per audio output
  • 3-min song generation: $0.15
  • 3-min song cover: $0.15
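Because pricing is listed per output, a batch budget is simple multiplication. The sketch below assumes each song generation or cover of up to 3 minutes is billed as one output at the flat $0.15 rate shown above; it uses `Decimal` so currency totals do not pick up floating-point rounding error. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Per-output prices (USD) as listed above. Assumption: every song generation
# or cover up to 3 minutes is billed as exactly one output at a flat rate.
PRICE_PER_OUTPUT = {
    "song_generation": Decimal("0.15"),
    "song_cover": Decimal("0.15"),
}


def estimate_cost(songs: int = 0, covers: int = 0) -> Decimal:
    """Estimate the USD cost of a batch of song generations and covers."""
    if songs < 0 or covers < 0:
        raise ValueError("counts must be non-negative")
    total = (songs * PRICE_PER_OUTPUT["song_generation"]
             + covers * PRICE_PER_OUTPUT["song_cover"])
    # Round to whole cents for display.
    return total.quantize(Decimal("0.01"))


# 200 songs plus 50 covers: 250 outputs at $0.15 each.
print(estimate_cost(songs=200, covers=50))  # 37.50
```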

Commercial use

How to Use MiniMax Music 2.6

Overview

MiniMax Music 2.6 is a music generation model for creating complete songs from text prompts and lyrics. It supports both vocal music and instrumental generation, with control over style, arrangement, tempo, key, and section structure.

This model is suited to API workflows that need more control than simple text-to-music generation, especially when outputs need consistent musical parameters or a defined song layout.

Capabilities

Full Song Generation

Generate complete songs from prompts with vocals, lyrics, backing instrumentation, and arrangement. The model can work from simple natural-language descriptions or more technical music prompts.

Instrumental Generation

Generate music without vocals by enabling instrumental mode. This is useful for background music, soundtrack generation, ambient beds, and other non-vocal use cases.
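A minimal sketch of how a client might switch between vocal and instrumental requests. Every field name here (`model`, `prompt`, `lyrics`, `is_instrumental`, `stream`) and the model identifier are hypothetical placeholders, not confirmed MiniMax Music API parameters; check the official API reference before sending real requests. The rule that instrumental requests carry no lyrics is likewise an assumption.

```python
from typing import Any, Optional


def build_music_request(prompt: str,
                        lyrics: Optional[str] = None,
                        instrumental: bool = False,
                        stream: bool = False) -> dict[str, Any]:
    """Assemble a hypothetical request body for vocal or instrumental output."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if instrumental and lyrics:
        # Assumed rule: lyrics have no effect once vocals are disabled.
        raise ValueError("instrumental requests should not include lyrics")
    body: dict[str, Any] = {
        "model": "music-2.6",        # placeholder identifier
        "prompt": prompt.strip(),
        "is_instrumental": instrumental,  # hypothetical flag name
        "stream": stream,                 # sync vs. streaming delivery
    }
    if lyrics:
        body["lyrics"] = lyrics
    return body


# Background bed for a video: no vocals, no lyrics.
request = build_music_request(
    "Ambient pad, slowly evolving textures, no drums", instrumental=True)
```

Rejecting lyrics on instrumental requests early keeps a misconfigured job from silently producing a track that ignores half of its input.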

BPM and Key Control

The model can follow explicit tempo and key instructions in the prompt, which helps when tracks need to align with pacing, mood, or other musical constraints across a larger library.
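When a library needs consistent parameters, it helps to generate the tempo and key phrase from validated values instead of typing it by hand. The sketch below is a hypothetical helper: the "C major" / "F# minor" / "Bb major" key notation and the 40 to 240 BPM sanity range are assumptions for illustration, not documented model limits.

```python
import re

# Note letter, optional accidental (# or b), then the mode. Input is
# lowercased before matching, so "Bb Major" and "bb major" both parse.
_KEY_PATTERN = re.compile(r"^([a-g])([#b]?)\s+(major|minor)$")


def tempo_key_phrase(bpm: int, key: str) -> str:
    """Return a prompt fragment stating tempo and key explicitly.

    The 40-240 BPM bounds are a hypothetical sanity check only.
    """
    if not 40 <= bpm <= 240:
        raise ValueError(f"BPM {bpm} outside assumed range 40-240")
    match = _KEY_PATTERN.match(key.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized key: {key!r}")
    note, accidental, mode = match.groups()
    return f"{bpm} BPM, key of {note.upper()}{accidental} {mode}"


print(tempo_key_phrase(92, "a minor"))  # 92 BPM, key of A minor
```

Reusing the same normalized phrase across a batch means every track in a set is requested with identical wording for tempo and key.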

Section-Level Structure Control

Lyrics can include structure tags such as [Intro], [Verse], [Pre Chorus], [Chorus], [Bridge], [Outro], [Inst], and related sections. This allows more direct control over how the song develops across sections.
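The structure tags are plain text inside the lyrics, so a song layout can be assembled programmatically. In this sketch the tag set is taken from the list above (extend it for the "related sections"), while the layout, with one tag per line, lyric lines beneath it, and a blank line between sections, is an assumed convention. `build_tagged_lyrics` is a hypothetical helper name.

```python
# Section tags named in the text above; "related sections" can be added here.
KNOWN_TAGS = {"Intro", "Verse", "Pre Chorus", "Chorus", "Bridge", "Outro", "Inst"}


def build_tagged_lyrics(sections: list[tuple[str, list[str]]]) -> str:
    """Join (tag, lines) pairs into one lyrics string with [Tag] headers.

    Assumed layout: tag on its own line, lyric lines below it, blank line
    between sections. Sections such as [Intro] or [Inst] may have no lines.
    """
    blocks = []
    for tag, lines in sections:
        if tag not in KNOWN_TAGS:
            raise ValueError(f"unknown section tag: {tag}")
        blocks.append("\n".join([f"[{tag}]", *lines]))
    return "\n\n".join(blocks)


lyrics = build_tagged_lyrics([
    ("Intro", []),
    ("Verse", ["City lights are fading out", "I can hear the morning call"]),
    ("Chorus", ["Hold on, hold on", "We are almost home"]),
    ("Outro", []),
])
print(lyrics)
```

Validating tags before submission catches typos such as "[Chrous]" that would otherwise be treated as ordinary lyric text.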

Flexible Prompting

Prompts can be short and descriptive or production-oriented, with details such as instrumentation, genre, tempo, vocal style, and arrangement cues.
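One way to keep production-style prompts consistent is to compose them from named fields. The `ProductionPrompt` class below is hypothetical; its fields mirror the cues listed above, and the semicolon-separated output is an assumed formatting choice, not a syntax the model requires. A prompt with only a genre renders as a short descriptive prompt.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProductionPrompt:
    """Hypothetical builder for a production-style text prompt."""
    genre: str
    instrumentation: list[str] = field(default_factory=list)
    tempo: Optional[str] = None
    vocal_style: Optional[str] = None
    arrangement: Optional[str] = None

    def render(self) -> str:
        # Start with the genre, then append whichever cues are set.
        parts = [self.genre]
        if self.instrumentation:
            parts.append("instrumentation: " + ", ".join(self.instrumentation))
        for cue in (self.tempo, self.vocal_style, self.arrangement):
            if cue:
                parts.append(cue)
        return "; ".join(parts)


detailed = ProductionPrompt(
    genre="Synthwave",
    instrumentation=["analog bass", "gated drums"],
    tempo="110 BPM",
    vocal_style="breathy female vocals",
    arrangement="slow build into a big final chorus",
).render()
short = ProductionPrompt(genre="Lo-fi hip hop").render()
```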

Typical Use Cases

  • Song generation from prompt and lyrics
  • Instrumental background music generation
  • Music libraries with controlled BPM or key
  • Structured music generation for games, ads, or branded audio
  • Rapid iteration on genre and arrangement concepts

More models from MiniMax

MiniMax Music Cover

API only

MiniMax Music Cover is MiniMax’s song-to-song transformation model for reimagining an existing track in a new style. It preserves the original vocal melody while changing voice timbre, instrumentation, genre, and arrangement through a text prompt. It supports one-step generation from reference audio or a two-step workflow with preprocessing and optional lyric editing.

MiniMax M2.7 is a long‑context LLM designed for agentic workflows across software engineering, search and tool use, and high‑value office productivity tasks. It’s built for multi‑step execution, with strong instruction following and dependable task decomposition, making it a solid default for production assistants that write code, call tools, and handle complex document workflows.

MiniMax M2.7‑Highspeed is the performance‑tuned variant of M2.7, built for lower latency and higher throughput while keeping output behavior consistent with the standard model. It’s a strong fit for interactive coding agents, tool‑calling pipelines, and office automation flows where responsiveness matters.

MiniMax-M2.5 is a MiniMax frontier model optimized for fast, low-cost agentic workflows across coding, search/tool use, and high-value office tasks. Trained with large-scale reinforcement learning in complex real-world environments, it delivers strong reasoning, efficient task decomposition, and high-quality outputs for production assistants and enterprise workflows.

MiniMax Speech 2.8 is an advanced text-to-speech model that turns text into natural, expressive audio in multiple languages. It delivers broadcast-ready speech with rich prosody, emotional control, and a diverse voice library. The model supports long text inputs and can be used for voiceovers, narration, accessibility tools, and interactive voice applications.

MiniMax Hailuo 2.3 Fast is the speed tier of the Hailuo 2.3 video family. It targets rapid iteration for social clips, ads, and previews, producing 6-second outputs at 768p or 1080p with smooth motion and stable composition. It is well suited to high-volume, image-driven video workflows.

MiniMax Hailuo 2.3 is a cinematic video model for short-form production. It accepts text prompts or image inputs and outputs 6- or 10-second clips at 768p or 1080p. It focuses on consistent motion, strong physics, and stable scenes for ads, social content, and creative shots.

MiniMax Hailuo 02 is a 1080p AI video model for cinematic, high-motion scenes. It converts text prompts or still images into short, polished clips with strong instruction following and realistic physics. It is well suited to commercial spots, trailers, music promos, and social shorts.

MiniMax 01 Live generates short stylized videos from static anime art. It focuses on expressive character motion with consistent details. Use it to turn illustrations or manga panels into dynamic clips suitable for cutscenes, social posts, or prototype shots.

MiniMax 01 Director generates short cinematic video clips from text prompts with director-level control. It supports detailed camera movement instructions, stable framing, and reduced motion randomness, making it well suited to film previz, ads, and story beats inside production tools.

MiniMax 01 is a compact text-to-video model for short clips. It turns simple prompts into 720p videos with smooth motion and cinematic framing. It targets fast iteration and stable output so developers can prototype interactive video features and creative tools with low latency.