MiniMax
MiniMax Music Cover

Audio-to-audio song transformation that preserves melody while changing style

Audio to Audio

MiniMax Music Cover Overview

MiniMax Music Cover is MiniMax’s song-to-song transformation model for reimagining an existing track in a new style. It preserves the original vocal melody while changing voice timbre, instrumentation, genre, and arrangement through a text prompt. It supports one-step generation from reference audio or a two-step workflow with preprocessing and optional lyric editing.

How to Use MiniMax Music Cover

Overview

MiniMax Music Cover is an audio-to-audio music model for transforming an existing song into a new style while preserving the core melody. It is designed for cover-style generation rather than original song creation from scratch.

This model is useful when the input track already exists and the goal is to reinterpret it through a different genre, arrangement, instrumentation, or vocal style.

Capabilities

Melody-Preserving Style Transfer

The model keeps the recognizable melody from the source audio while changing the surrounding presentation. This makes the output feel like the same song reworked in a new style rather than a fully new composition.

Prompt-Controlled Cover Generation

A text prompt controls the target style of the generated cover, including genre, instrumentation, vocal character, arrangement, and overall energy.
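As a rough illustration of how those style dimensions could be combined into one prompt string, here is a minimal sketch. The `CoverStyle` class and its field names are hypothetical helpers introduced for this example only; the source does not specify any prompt format, so a comma-separated description is simply assumed.

```python
from dataclasses import dataclass


@dataclass
class CoverStyle:
    """Hypothetical container for the style dimensions named above."""
    genre: str
    instrumentation: str = ""
    vocal_character: str = ""
    arrangement: str = ""
    energy: str = ""

    def to_prompt(self) -> str:
        # Join only the fields that were filled in, in a fixed order.
        parts = [
            self.genre,
            self.instrumentation,
            self.vocal_character,
            self.arrangement,
            self.energy,
        ]
        return ", ".join(p.strip() for p in parts if p and p.strip())


style = CoverStyle(
    genre="bossa nova",
    instrumentation="nylon guitar and brushed drums",
    vocal_character="soft female vocal",
    energy="relaxed",
)
prompt = style.to_prompt()
print(prompt)
```

Keeping the dimensions as separate fields makes it easier to vary one aspect, such as genre, while holding the others fixed between generations.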

Audio-to-Audio Workflow

The model accepts a reference song as input and returns a transformed version of that song. This makes it appropriate for cover generation, reinterpretation, and remix-adjacent workflows.

One-Step and Two-Step Modes

The model supports a direct generation flow from reference audio and prompt, as well as a preprocessing flow that extracts melody features and structured lyrics for more controlled editing before generation.
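The difference between the two flows can be sketched as request payloads. Everything below is an assumption made for illustration: the function names, the `step`, `reference_audio`, `features_id`, and `lyrics` keys, and the idea of a features identifier returned by preprocessing are placeholders, not the actual MiniMax API. The sketch only builds dictionaries and makes no network calls.

```python
from typing import Optional


def build_one_step_request(reference_audio: str, prompt: str) -> dict:
    """One-step flow: reference audio and style prompt in a single request."""
    if not reference_audio or not prompt.strip():
        raise ValueError("reference_audio and prompt are both required")
    return {
        "step": "generate",                  # hypothetical field
        "reference_audio": reference_audio,  # hypothetical field
        "prompt": prompt.strip(),
    }


def build_preprocess_request(reference_audio: str) -> dict:
    """Two-step flow, step 1: ask for melody features and structured lyrics."""
    if not reference_audio:
        raise ValueError("reference_audio is required")
    return {"step": "preprocess", "reference_audio": reference_audio}


def build_generate_from_features_request(
    features_id: str, prompt: str, lyrics: Optional[str] = None
) -> dict:
    """Two-step flow, step 2: generate from preprocessed features.

    `lyrics` is optional so that unedited extracted lyrics can be reused.
    """
    payload = {
        "step": "generate",
        "features_id": features_id,  # hypothetical handle from step 1
        "prompt": prompt.strip(),
    }
    if lyrics is not None:
        payload["lyrics"] = lyrics
    return payload


one_step = build_one_step_request("song.mp3", "acoustic folk, warm male vocal")
step1 = build_preprocess_request("song.mp3")
step2 = build_generate_from_features_request(
    "features-123", "acoustic folk, warm male vocal", lyrics="[Verse]\nNew words"
)
```

The one-step payload carries the audio and prompt together, while the two-step flow leaves a gap between preprocessing and generation where lyrics can be inspected or changed.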

Lyric-Aware Editing

In the two-step workflow, extracted lyrics and song structure can be reviewed or edited before generating the final cover. This gives more control when the output needs to preserve or modify lyrical content.
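A lyric edit between the two steps might look like the sketch below. The representation is assumed, not taken from the model's documentation: extracted lyrics are modeled as a list of labeled sections, and the bracketed `[Verse]` and `[Chorus]` labels in the rendered text are illustrative only. `Section`, `replace_section_lines`, and `render` are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    label: str                          # e.g. "Verse" or "Chorus" (assumed labels)
    lines: List[str] = field(default_factory=list)


def replace_section_lines(
    sections: List[Section], label: str, new_lines: List[str]
) -> List[Section]:
    """Return a copy with the first section matching `label` rewritten."""
    edited: List[Section] = []
    replaced = False
    for s in sections:
        if not replaced and s.label == label:
            edited.append(Section(s.label, list(new_lines)))
            replaced = True
        else:
            edited.append(Section(s.label, list(s.lines)))
    if not replaced:
        raise KeyError(f"no section labeled {label!r}")
    return edited


def render(sections: List[Section]) -> str:
    """Serialize sections as bracketed labels followed by their lines."""
    blocks = ["\n".join([f"[{s.label}]"] + s.lines) for s in sections]
    return "\n\n".join(blocks)


extracted = [
    Section("Verse", ["Walking down the empty street"]),
    Section("Chorus", ["Hold on, hold on"]),
]
edited = replace_section_lines(extracted, "Chorus", ["Hold on to the light"])
lyrics_text = render(edited)
print(lyrics_text)
```

Because the helper returns a copy, the original extraction stays available for comparison or for generating an unedited cover.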

Typical Use Cases

  • Reimagining existing songs in a different genre
  • Creating alternate arrangements from source audio
  • Converting a vocal demo into a different musical style
  • Building user-facing cover generation features
  • Lyric-aware cover generation with preprocessing and editing

More models from MiniMax

MiniMax Music 2.6

API only

MiniMax Music 2.6 is MiniMax’s latest music generation model for full vocal songs and instrumentals from text prompts. It supports natural-language prompts or detailed production-style instructions, follows specified BPM and key with high reliability, and exposes fine-grained song structure control through section tags. The same Music API also supports instrumental generation, lyrics-assisted workflows, and synchronous or streaming delivery.

MiniMax M2.7 is a long‑context LLM designed for agentic workflows across software engineering, search and tool use, and high‑value office productivity tasks. It’s built for multi‑step execution, with strong instruction following and dependable task decomposition, making it a solid default for production assistants that write code, call tools, and handle complex document workflows.

MiniMax M2.7‑Highspeed is the performance‑tuned variant of M2.7, built for lower latency and higher throughput while keeping output behavior consistent with the standard model. It’s a strong fit for interactive coding agents, tool‑calling pipelines, and office automation flows where responsiveness matters.

MiniMax-M2.5 is MiniMax’s latest frontier model, optimized for fast, low-cost agentic workflows across coding, search/tool use, and high-value office tasks. Trained with large-scale reinforcement learning in complex real-world environments, it delivers strong reasoning, efficient task decomposition, and high-quality outputs for production assistants and enterprise workflows.

MiniMax Speech 2.8 is an advanced text-to-speech model that turns text into natural, expressive audio in multiple languages. It delivers broadcast-ready speech with rich prosody, emotional control, and a diverse voice library. The model supports large input lengths and can be used for voiceovers, narration, accessibility tools, and interactive voice applications.

MiniMax Hailuo 2.3 Fast is the speed tier of the Hailuo 2.3 video family. It targets rapid iteration for social clips, ads, and previews. It produces 6-second outputs at 768p or 1080p with smooth motion and stable composition. It is ideal for high-volume, image-driven video workflows.

MiniMax Hailuo 2.3 is a cinematic video model for short-form production. It accepts text prompts or image inputs and outputs 6- or 10-second clips at 768p or 1080p. It focuses on consistent motion, strong physics, and stable scenes for ads, social content, and creative shots.

MiniMax Hailuo 02 is a 1080p AI video model for cinematic, high-motion scenes. It converts text prompts or still images into short, polished clips with strong instruction following and realistic physics. It is ideal for commercial spots, trailers, music promos, and social shorts.

MiniMax 01 Live generates short stylized videos from static anime art. It focuses on expressive character motion with consistent details. Use it to turn illustrations or manga panels into dynamic clips suitable for cutscenes, social posts, or prototype shots.

MiniMax 01 Director generates short cinematic video clips from text prompts with director-level control. It supports detailed camera movement instructions, stable framing, and reduced motion randomness. It is ideal for film previz, ads, and story beats inside production tools.

MiniMax 01 is a compact text-to-video model for short clips. It turns simple prompts into 720p videos with smooth motion and cinematic framing. It targets fast iteration and stable output so developers can prototype interactive video features and creative tools with low latency.