Your AI video pipeline is silent. Here's the fix.
Mirelo SFX v1.5 adds synchronized sound effects to silent AI video clips without regeneration, now available on Runware.

Most AI video pipelines have the same gap. You generate a clip, it looks good, and then you realize it has no sound, or the sound is just not good enough. Your options at that point are not great: regenerate and hope the next model delivers something better, spend time manually sourcing and syncing effects, or ship something that feels unfinished.
Mirelo SFX v1.5 is a post-processing step that removes that decision. You pass it a silent video, it returns the same clip with synchronized sound effects. No regenerating. No writing sound descriptions for every clip. The video is the prompt.
It's now available on Runware.
How Mirelo SFX v1.5 works
Mirelo SFX v1.5 reads your video frames, infers what's happening on screen, and generates sound effects timed to match. Footsteps, impacts, ambient noise, mechanical sounds, or weather. Whatever the scene calls for. You can steer it with a text prompt if you want, or leave it to interpret the footage on its own.
A few things worth knowing upfront:
- Input: silent video clips up to 10 seconds (longer clips are truncated)
- Output: the same video with synced SFX, or an audio-only track for compositing
- Steering: optional text prompt to guide the sound style or character
- Variations: 2 to 4 samples per call
- Pricing on Runware: $0.01 per second of output audio per sample (a three-second clip with three variations costs nine cents)
The model won't generate speech or music. If you need those, you're handling them separately. For most pipeline use cases, that's actually preferable: clean, compositable SFX with nothing bleeding in.
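The pricing is a straight multiplication, which makes budgeting at volume easy to sanity-check. A throwaway helper (the function name and rounding are my own, not part of any SDK):

```python
def sfx_cost(duration_seconds: float, variations: int,
             rate_per_second: float = 0.01) -> float:
    """Runware pricing: $0.01 per second of output audio, per sample."""
    return round(duration_seconds * variations * rate_per_second, 4)

# A three-second clip with three variations: 3 s x 3 samples x $0.01 = $0.09
print(sfx_cost(3, 3))
```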
[Demo: Explosion clip, silent input vs. output with Mirelo SFX]
[Demo: Cooking clip, silent input vs. output with Mirelo SFX]
How it compares
MMAudio V2 is the closest like-for-like alternative: video in, synced audio out. It's a solid research model, published at CVPR 2025, with capable output and API availability on third-party platforms.
The practical differences are worth knowing. MMAudio's weights are released under CC-BY-NC 4.0, which means commercial use isn't guaranteed and needs to be assessed before you ship. It also works best with a text prompt alongside the video, so someone or something in your pipeline needs to describe the sound for each clip. And it returns one output per call.
Mirelo is a commercial API with clear usage rights, requires no prompt, and returns up to four variations per call. For pipelines running at volume without a human in the loop, those aren't minor differences.
| | Mirelo SFX v1.5 | MMAudio V2 |
|---|---|---|
| Input | Silent video | Video + text prompt |
| Output | Clean SFX track | Full audio (SFX, ambient, sometimes music) |
| Commercial use | Yes | Not guaranteed (CC-BY-NC) |
| Prompt required | No | Recommended |
| Variations per call | 2 to 4 | 1 |
| Max duration | 10 seconds | 8 seconds (optimized) |
[Audio comparison demos: Mirelo SFX v1.5 versus MMAudio V2, three paired examples]
Where it fits in real pipelines
The simplest case
Generate video, call Mirelo, get back a clip with sound. For use cases that don't need voice or music, that's the whole pipeline. It works well for:
- Product showcases and demos
- Social content and short-form video
- Game cutscenes and UI previews
- Any context where a watchable clip matters more than a fully produced piece
It won't replace a sound designer on a major production. But as a default step that runs on every clip, it holds up.
Prompting when you have context
The model interprets video frames on its own, but it responds well to text prompts when you give them. If your pipeline already tracks generation metadata, use it.
A clip generated with a prompt like "underwater chase sequence" will produce noticeably better audio when you pass something like "muffled impact, water movement, low ambient pressure" to Mirelo alongside the video. Stylized or non-photorealistic content especially benefits here: the visual cues are harder for any model to read, and a prompt anchors the output in the right sonic territory. Passing your generation prompt through to Mirelo adds almost no overhead and improves results on edge cases the model would otherwise misread.
The full audio stack
When you need all three layers (sound effects, voice, and music), keep them separate. Each layer stays independent, so if a clip gets regenerated, you're re-running Mirelo on that clip, not rebuilding the whole mix.
A typical setup:
- SFX: Mirelo SFX v1.5, using audio-only output mode
- Voice: ElevenLabs or similar
- Music: your background score model of choice
- Composition: FFmpeg or your audio tool of choice
Batch with variation selection
Mirelo returns two to four variants per call. In a high-volume pipeline, run each clip with three variations and build lightweight selection logic to pick the best one before delivery. It's not a substitute for quality review on anything high-stakes, but for pipelines generating at volume, it reduces outliers without adding a full review step.
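The selection logic can be as thin as a scoring callable over the returned variants. What the scorer measures (loudness range, clipping, silence ratio) depends on your content; here it's injected, and the variant dicts are a hypothetical shape, not the real API response:

```python
from typing import Callable

def pick_best(variants: list[dict],
              score: Callable[[dict], float]) -> dict:
    """Pick one of the 2-4 variants Mirelo returns per call.

    `variants` is assumed to be a list of result dicts from your API
    client; `score` encodes whatever heuristic you trust.
    """
    if not variants:
        raise ValueError("no variants to select from")
    return max(variants, key=score)
```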
A few honest caveats
The ten-second limit is real. For short-form content, it's fine. For anything longer, you're either trimming clips, chaining calls, or choosing a different approach.
Where it can struggle is with highly abstract or surreal video, where motion cues are ambiguous. Most standard AI video output won't hit this edge case, but some will. Worth testing on your specific content before committing it to production.
And as with any generative model, build variation selection into your pipeline rather than assuming the first output is the right one.
Getting started
Mirelo SFX v1.5 is in the Runware Playground now if you want to test it on your own clips before writing any API code. Same credentials, same billing as everything else on the platform.
Try it in the Playground or go straight to the API docs.
