Sora 2

Next generation AI video and audio model from OpenAI


Sora 2 is OpenAI’s flagship generative model for video and audio. It generates visually rich clips from text prompts, with synchronized dialogue and sound, improved physical realism, and finer scene control, and it also supports editing and extending existing video inputs.

OpenAI
Commercial use
Text to Video · Video to Video · Image to Video · Audio to Video
Each generation costs $0.10 per second at 720p.
720p · 8s · $0.80
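The pricing above is a simple per-second rate, so the cost of a clip is just duration times rate. A minimal sketch, assuming the flat $0.10/s rate at 720p stated above (other resolutions or tiers, if any, are not covered here):

```python
# Assumed flat rate for 720p output, per the listing above.
RATE_PER_SECOND_720P = 0.10

def generation_cost(duration_s: float, rate: float = RATE_PER_SECOND_720P) -> float:
    """Return the cost in USD for a clip of the given duration in seconds."""
    return round(duration_s * rate, 2)

# An 8-second 720p clip matches the listed $0.80.
print(generation_cost(8))
```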


More models from this creator

GPT Image 1.5 is OpenAI’s newest flagship image model powering the latest ChatGPT Images. It delivers significantly faster image generation with stronger instruction following, more precise edits that preserve original details, more believable transformations, and improved rendering of dense or small text. It is suited for practical creative workflows, detailed design tasks, and production use cases.

Sora 2 Pro is the higher quality Sora 2 variant for precision video work. It supports text prompts and image inputs. It outputs synchronized video with sound, higher resolution frames, and stronger temporal consistency. Ideal for production clips and demanding pipelines.

GPT Image 1 is OpenAI’s native GPT-4o image model. It creates detailed visuals from text prompts. It supports diverse styles and precise layouts. It can edit existing images with masks. It renders readable text in scenes. It suits design tools and production workflows.

DALL·E 3 converts natural language prompts into detailed images with strong caption fidelity. It improves handling of complex instructions and visual details. It integrates with ChatGPT and the OpenAI API for programmatic image creation and workflow automation.

DALL·E 2 is OpenAI’s diffusion based text to image model. It generates high quality images from prompts. It supports inpainting for local edits and outpainting for extended canvases. Developers use it through an API for creative tools, design workflows, and content pipelines.

OpenAI CLIP ViT-L/14 is a contrastive vision-language model that embeds images and text into a shared representation space. It enables tasks like zero-shot image classification, semantic search, and similarity scoring by computing aligned feature vectors for images and texts.
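The shared embedding space described above is what makes zero-shot classification work: an image and a set of candidate captions are encoded into the same vector space, and the caption with the highest cosine similarity wins. An illustrative sketch with mock embeddings standing in for CLIP's encoder outputs (the random vectors and 512-dimensional size are assumptions for illustration, not the real model):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(v):
    """L2-normalize vectors along the last axis, as CLIP does before comparison."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Mock feature vectors standing in for CLIP image/text encoder outputs.
image_emb = normalize(rng.normal(size=512))
labels = ["a dog", "a cat", "a car"]
text_embs = normalize(rng.normal(size=(len(labels), 512)))

# With normalized vectors, a dot product is cosine similarity; the
# best-matching caption is the zero-shot prediction.
scores = text_embs @ image_emb
prediction = labels[int(np.argmax(scores))]
print(prediction)
```

The same similarity scores also drive semantic search: rank a corpus of image embeddings against one text embedding instead of the other way around.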