DeepSeek

DeepSeek-V4-Flash

Fast frontier LLM with 1M context, tool use, and dual thinking modes

Text to Text

DeepSeek-V4-Flash Overview

DeepSeek-V4-Flash is DeepSeek's fast, efficient, and cost-focused frontier language model for coding, reasoning, and agent workflows. It supports both thinking and non-thinking modes, a 1M token context window, up to 384K output tokens, tool calls, JSON output, and efficient long-context operation for software, research, and structured professional tasks.

Pricing (token based)
Input tokens / 1M: $0.14
Output tokens / 1M: $0.28
Cached input / 1M: $0.014
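
As a rough illustration of how the per-million-token rates listed above translate into a per-request cost, here is a minimal sketch. It assumes all three rates are in US dollars, and that cached input tokens are a subset of the input tokens billed at the cached rate instead of the regular input rate. The source states neither point explicitly, and the function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Rates per 1M tokens, copied from the pricing listed above.
# Assumption: all three figures are in US dollars.
PRICE_PER_MILLION = {
    "input": Decimal("0.14"),
    "output": Decimal("0.28"),
    "cached_input": Decimal("0.014"),
}


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> Decimal:
    """Estimate the cost of one request.

    Assumption: cached_input_tokens is the portion of input_tokens that
    was served from cache and is billed at the cached rate only.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed input tokens")

    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * PRICE_PER_MILLION["input"]
        + cached_input_tokens * PRICE_PER_MILLION["cached_input"]
        + output_tokens * PRICE_PER_MILLION["output"]
    )
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    # 200K-token prompt, 150K of it cached, 4K tokens of output.
    print(estimate_cost(200_000, 4_000, cached_input_tokens=150_000))
```

`Decimal` is used instead of `float` so that small per-request amounts add up without rounding drift when summed over many requests.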

Commercial use

How to Use DeepSeek-V4-Flash

Overview

DeepSeek-V4-Flash is a fast frontier language model built for coding, reasoning, and long-context agent workflows.

It is a strong fit for teams that need a high-capability text model with a large context window, strong tool use, a configurable reasoning mode, and a good cost-to-performance ratio for production assistants, coding systems, and autonomous task execution.

Strengths

Dual Thinking Modes

DeepSeek-V4-Flash supports both thinking and non-thinking modes. That makes it useful across a wide range of workloads, from quick lower-latency responses to more deliberate reasoning-heavy tasks.
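
One way an application might decide between the two modes per request is sketched below. This is only a routing heuristic: the source does not say how a mode is selected in the API, so the code just returns a mode label. The `Task` fields, the `choose_mode` function, and the 5-second threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical description of an incoming request."""
    needs_multistep_reasoning: bool
    latency_budget_ms: int


def choose_mode(task: Task, thinking_min_latency_ms: int = 5_000) -> str:
    """Return "thinking" or "non-thinking" for a task.

    Thinking mode is picked only when the task benefits from deliberate
    reasoning and the caller can tolerate the extra latency. The default
    threshold is an arbitrary placeholder, not a measured value.
    """
    if (task.needs_multistep_reasoning
            and task.latency_budget_ms >= thinking_min_latency_ms):
        return "thinking"
    return "non-thinking"


if __name__ == "__main__":
    print(choose_mode(Task(needs_multistep_reasoning=True, latency_budget_ms=30_000)))
    print(choose_mode(Task(needs_multistep_reasoning=False, latency_budget_ms=30_000)))
```

In practice the threshold would be tuned against measured latency for each mode.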

Very Large Context Window

The model supports a 1M token context window with up to 384K output tokens. This makes it suitable for large repositories, long documents, retrieval-heavy workflows, and agents that need to keep substantial context in scope.
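
To make the two limits concrete, the sketch below checks whether a prompt fits and how many output tokens can be requested. It rests on assumptions the source does not confirm: that "1M" and "384K" mean 1,000,000 and 384,000 tokens exactly, and that prompt and output tokens share the same context window. The names `plan_budget` and `Budget` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: the advertised limits are decimal round numbers.
CONTEXT_WINDOW_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 384_000


@dataclass(frozen=True)
class Budget:
    prompt_tokens: int
    max_output_tokens: int


def plan_budget(prompt_tokens: int, desired_output_tokens: int) -> Budget:
    """Clamp the requested output length to what the limits allow.

    Assumption: prompt and output tokens count against the same
    context window, so a longer prompt leaves less room for output.
    """
    if prompt_tokens < 0 or desired_output_tokens <= 0:
        raise ValueError("token counts must be positive")
    if prompt_tokens >= CONTEXT_WINDOW_TOKENS:
        raise ValueError("prompt alone fills the context window")

    room_left = CONTEXT_WINDOW_TOKENS - prompt_tokens
    allowed = min(desired_output_tokens, MAX_OUTPUT_TOKENS, room_left)
    return Budget(prompt_tokens=prompt_tokens, max_output_tokens=allowed)


if __name__ == "__main__":
    # A 900K-token repository dump leaves room for at most 100K of output.
    print(plan_budget(900_000, 384_000))
```

Token counts here would come from whatever tokenizer the deployment uses; the sketch takes them as given.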

Fast and Cost-Efficient Frontier Model

Within the DeepSeek V4 line, the Flash variant is positioned as the fast and economical option. It is designed for teams that want strong reasoning and coding quality without the higher cost of the Pro variant.

Strong Tool Use

DeepSeek-V4-Flash supports tool calls and structured API workflows, which makes it well suited to function-calling systems, coding agents, and production assistants that need external actions in addition to raw generation.

Long-Context Efficiency

The V4 family is designed to process long contexts efficiently. Flash is most relevant for workloads that actually fill a large context window and need doing so to remain practical, rather than treating the 1M limit as a headline figure.

Coding and Agent Work

DeepSeek positions V4-Flash as strong on reasoning and on par with V4-Pro for simpler agent tasks, which makes it a good fit for many day-to-day coding and automation workflows.

Capabilities

Text-to-Text

DeepSeek-V4-Flash handles general language tasks including coding assistance, summarization, planning, reasoning, drafting, structured generation, and research-oriented outputs.

Tool Calling

The model supports tool calls in API workflows, making it suitable for agent orchestration, external action pipelines, and function-driven application design.
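
The application side of a tool-calling loop can be sketched as follows: declare a tool, then execute whatever call the model returns. The tool schema below follows the widely used OpenAI-style `tools` format. That this page's API accepts exactly this shape is an assumption, and `count_lines`, `REGISTRY`, and `dispatch_tool_call` are hypothetical names. No network request is made.

```python
import json
from typing import Any, Callable

# Tool declaration in the OpenAI-style function-calling format.
# Assumption: the serving API accepts this schema shape.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "count_lines",
            "description": "Count the lines in a block of text.",
            "parameters": {
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        },
    }
]


def count_lines(text: str) -> int:
    """Example tool: number of lines in the given text."""
    return len(text.splitlines())


# Maps tool names the model may request to local Python callables.
REGISTRY: dict[str, Callable[..., Any]] = {"count_lines": count_lines}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run one tool call and return a JSON string to send back to the model."""
    handler = REGISTRY.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        arguments = json.loads(arguments_json)
        result = handler(**arguments)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong argument names: report, do not crash.
        return json.dumps({"error": str(exc)})
    return json.dumps({"result": result})


if __name__ == "__main__":
    call_args = json.dumps({"text": "a\nb\nc"})
    print(dispatch_tool_call("count_lines", call_args))
```

Returning errors as JSON instead of raising lets the model see the failure and retry with corrected arguments, which is a common design choice in agent loops.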

Long-Context Reasoning

DeepSeek-V4-Flash is designed for workflows where very large context windows and long outputs are central to the task.

Input and Output

  • AIR ID: deepseek:v4@flash
  • Input: text
  • Output: text
  • Context window: 1M tokens
  • Max output: 384K tokens
  • Thinking modes: thinking and non-thinking
  • Tool use: supported
  • JSON output: supported

Best Fit

  • Coding assistants
  • Tool-using agents
  • Long-document analysis
  • Large-repository reasoning
  • Cost-sensitive production LLM workflows