DeepSeek-V4-Flash

DeepSeek-V4-Flash is DeepSeek's fast, efficient, and cost-focused frontier language model for coding, reasoning, and agent workflows. It supports both thinking and non-thinking modes, a 1M token context window, up to 384K output tokens, tool calls, JSON output, and efficient long-context operation for software, research, and structured professional tasks.

Complete technical specification for integration
Ready-to-use code snippets for common workflows
API Options
Platform-level options for task execution and delivery.
taskType
stringrequiredvalue: textInferenceIdentifier for the type of task being performed
taskUUID
stringrequiredUUID v4UUID v4 identifier for tracking tasks and matching async responses. Must be unique per task.
outputFormat
stringdefault: TEXTSpecifies the file format of the generated output. The available values depend on the task type and the specific model's capabilities.
Allowed values2 values
webhookURL
stringURISpecifies a webhook URL where JSON responses will be sent via HTTP POST when generation tasks complete. For batch requests with multiple results, each completed item triggers a separate webhook call as it becomes available.
Learn more1 resource
- WebhooksPLATFORM
- Webhooks
deliveryMethod
stringdefault: syncDetermines how the API delivers task results.
Allowed values3 values
- Returns complete results directly in the API response.
- Returns an immediate acknowledgment with the task UUID. Poll for results using getResponse.
- Streams results token-by-token as they are generated.
Learn more1 resource
- Task PollingPLATFORM
includeCost
booleandefault: falseInclude task cost in the response.
includeUsage
booleandefault: falseInclude token usage statistics in the response.
Core Parameters
Primary parameters that define the task output.
numberResults
integermin: 1max: 4default: 1Number of results to generate. Each result uses a different seed, producing variations of the same parameters.
model
stringrequiredvalue: deepseek-v4-flashIdentifier of the model to use for generation.
seed
integermin: 0max: 9223372036854776000Random seed for reproducible generation. When not provided, a random seed is generated in the unsigned 32-bit range.
jsonSchema
object | stringJSON Schema for structured output. Only honoured when
outputFormatis JSON. Accepts the OpenAI envelope ({name, schema, strict}) or a bare JSON Schema; bare schemas are auto-wrapped withname='response'andstrict=true.
messages
array of objectsrequiredmin items: 1Array of chat messages forming the conversation context.
tools
array of objectsmin items: 1Tool definitions available for the model to call during generation.
Properties4 properties
tools»typetype
stringrequiredThe kind of tool to make available to the model. User-defined functions require
nameandschema, while built-in tools (search,codeInterpreter) are executed server-side by the provider.Allowed values1 value
tools»namename
stringmax: 64Unique function name. Required for function tools.
tools»descriptiondescription
stringExplanation of what the function does, used by the model to decide when to call it.
tools»schemaschema
objectJSON Schema object describing the function's input parameters.
Settings
Technical parameters to fine-tune the inference process. These must be nested inside the settings object.
settings object.settings»systemPromptsystemPrompt
stringmin: 1max: 1000000System-level instruction that guides the model's behavior and output style across the entire generation.
settings»temperaturetemperature
floatmin: 0max: 2step: 0.01default: 1Controls randomness in generation. Lower values produce more deterministic outputs, higher values increase variation and creativity.
settings»topPtopP
floatmin: 0max: 1step: 0.01default: 1Nucleus sampling parameter that controls diversity by limiting the probability mass. Lower values make outputs more focused, higher values increase diversity.
settings»frequencyPenaltyfrequencyPenalty
floatmin: 0max: 2step: 0.01default: 0Penalizes tokens based on their frequency in the output so far. A value of 0.0 disables the penalty.
settings»maxTokensmaxTokens
integermin: 1max: 1048576default: 32768Maximum number of tokens to generate in the response.
settings»presencePenaltypresencePenalty
floatmin: 0max: 2step: 0.01default: 0Encourages the model to introduce new topics. A value of 0.0 disables the penalty.
settings»stopSequencesstopSequences
array of stringsmin: 1max: 50max items: 5Array of sequences that will cause the model to stop generating further tokens when encountered.
settings»thinkingLevelthinkingLevel
stringdefault: offControls the depth of internal reasoning the model performs before generating a response.
Allowed values3 values
toolChoice
objectControls how the model selects which tool to call. This only takes effect when
toolsare defined.Examples3 examples
Let the model decide (default):
"toolChoice": { "type": "auto" }Force a specific tool call:
"toolChoice": { "type": "tool", "name": "get_weather" }Require any tool call:
"toolChoice": { "type": "any" }Properties2 properties
toolChoice»typetype
stringrequiredStrategy the model uses to decide when and which tools to call.
Allowed values4 values
- The model decides whether to call a tool based on the conversation context. This is the recommended default.
- The model must call at least one tool but chooses which one. Useful when you always need structured output.
- The model must call the specific tool identified by
name. Use this to force a particular function call. - The model will not call any tool, even if tools are defined. Useful for forcing a text-only response.
toolChoice»namename
stringName of the specific tool the model must call. Required when type is
tool.