CLIP Skip: Adjusting text interpretation depth
Adjusts which text encoder layer interprets your prompt, shifting between literal and abstract output.
Introduction
The clipSkip parameter controls which layer of the CLIP text encoder is used to interpret your prompt. By skipping deeper layers, you change how the model reads your text, shifting between literal interpretation and more abstract, stylistic output.
Most diffusion models use a component called CLIP (Contrastive Language-Image Pre-training), which contains a text encoder that translates your prompt into a numerical representation the model can understand. This text encoder is a neural network with multiple layers, each extracting different levels of meaning:
- Deeper layers (lower skip values) focus on abstract, semantic, and stylistic aspects of your prompt.
- Earlier layers (higher skip values) emphasize literal, concrete interpretations.
Adjusting clipSkip lets you tune whether the model follows your prompt closely or takes more creative liberties with style and composition.
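The relationship between the skip value and the layer that gets used can be sketched in a few lines of Python. This is a toy illustration, not the service's implementation: it assumes that clipSkip counts layers dropped from the end of the encoder, that 0 means the final layer is used, and that the encoder has 12 layers. The function name `select_encoder_layer` is hypothetical.

```python
def select_encoder_layer(num_layers: int, clip_skip: int) -> int:
    """Return the 1-based index of the encoder layer whose output is used.

    Assumption: clip_skip counts layers dropped from the end,
    so clip_skip=0 selects the final layer.
    """
    if clip_skip < 0 or clip_skip >= num_layers:
        raise ValueError("clip_skip must be between 0 and num_layers - 1")
    return num_layers - clip_skip


# Hypothetical 12-layer text encoder.
for skip in (0, 1, 2):
    layer = select_encoder_layer(12, skip)
    print(f"clipSkip={skip}: output taken from layer {layer} of 12")
```

Under these assumptions, each increment of clipSkip moves the read-out point one layer earlier in the encoder.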
For sticker images, a clipSkip value of 2 is preferred: it produces a simpler, cleaner result that better fits the minimalistic style expected of a sticker.
Smiling avocado with sunglasses emoji, stickers pack, outline, white borders, detailed, cartoon, black background
For photorealistic portraits, where fine detail and realism matter most, leaving clipSkip at 0 (disabled) produces a richer, more detailed image that better matches the intended outcome.
A stylized portrait of a woman with vibrant orange and teal makeup, short platinum hair, and big statement earrings, captured in bright sunlight with colorful reflections around her. Fresh, bold, energetic
Request structure
The clipSkip parameter is an integer set at the top level of the task object in your generation request.
[
  {
    "taskType": "imageInference",
    "model": "civitai:101055@128078",
    "positivePrompt": "Smiling avocado with sunglasses emoji, stickers pack",
    "clipSkip": 2,
    "steps": 30,
    "width": 1024,
    "height": 1024
  }
]
Recommended values
The optimal clipSkip value depends on your use case and the content you're generating:
| Use case | Recommended value | Why |
|---|---|---|
| Photorealism, portraits | 0 (disabled) | Deeper layers preserve fine detail, skin texture, and accurate color reproduction |
| Anime, illustrations | 1 - 2 | Many anime-trained models respond well to skipping 1-2 layers. The output becomes more stylized and compositionally bold |
| Stickers, flat art | 2 | Skipping more layers produces cleaner lines and simpler forms that suit flat design |
| Abstract, experimental | 2 - 3 | Higher skip values push the model toward looser, more interpretive output |
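The table above can be turned into a small lookup when building requests programmatically. The sketch below is one possible approach, not an official client: the use-case keys and the `build_task` helper are hypothetical, and where the table gives a range, the lower bound is used as a starting point. The request fields mirror the example in the request structure section.

```python
import json

# Values taken from the recommendations table. Keys are hypothetical labels.
# For ranges ("1 - 2", "2 - 3") the lower bound is used as a starting point.
RECOMMENDED_CLIP_SKIP = {
    "photorealism": 0,
    "anime": 1,
    "sticker": 2,
    "abstract": 2,
}


def build_task(prompt: str, use_case: str, model: str = "civitai:101055@128078") -> dict:
    """Build one imageInference task with the recommended clipSkip value."""
    clip_skip = RECOMMENDED_CLIP_SKIP[use_case]
    task = {
        "taskType": "imageInference",
        "model": model,
        "positivePrompt": prompt,
        "steps": 30,
        "width": 1024,
        "height": 1024,
    }
    # A value of 0 means "disabled", so the field is simply left out.
    if clip_skip > 0:
        task["clipSkip"] = clip_skip
    return task


payload = [build_task("Smiling avocado with sunglasses emoji, stickers pack", "sticker")]
print(json.dumps(payload, indent=2))
```

Omitting the field for a value of 0 keeps the request identical to one that never mentions clipSkip, which relies on the default behavior described in the tips.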
Architecture notes
clipSkip only applies to models that use the CLIP text encoder, such as SD 1.5 and SDXL-based models. Other models that rely on different text encoders (like T5 or LLaMA) will not be affected by this parameter.
Note that SDXL models already skip one layer by default, so setting clipSkip to 2 with SDXL effectively skips three layers from the original encoder.
Models like FLUX, Recraft, and other non-CLIP architectures ignore this parameter entirely. If you're unsure whether your model uses CLIP, omit clipSkip and let the default behavior take effect.
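The layer arithmetic described in these notes can be made explicit with a short sketch. The architecture labels and the `effective_skip` helper are hypothetical and are not part of the API. The built-in skip of 1 for SDXL comes from the note above, while the value of 0 for SD 1.5 is an assumption.

```python
# Layers already skipped by the architecture itself.
# "sdxl": 1 follows the note above; "sd1.5": 0 is an assumption.
BUILT_IN_SKIP = {"sd1.5": 0, "sdxl": 1}


def effective_skip(architecture: str, clip_skip: int):
    """Return the total number of layers skipped, or None if clipSkip is ignored."""
    if architecture not in BUILT_IN_SKIP:
        # Non-CLIP architectures (for example FLUX) ignore the parameter.
        return None
    return BUILT_IN_SKIP[architecture] + clip_skip


print(effective_skip("sd1.5", 2))  # 2
print(effective_skip("sdxl", 2))   # 3
print(effective_skip("flux", 2))   # None
```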
Tips
- Default to 0 for most work. Unless you're targeting a specific stylistic effect, the default (no skip) gives you the most faithful interpretation of your prompt.
- Match the model's training. Many community models on Civitai list a recommended clipSkip value. If the model was fine-tuned with clipSkip: 2, using that value will produce output closest to the model's intended aesthetic.
- Pair with LoRAs carefully. Some LoRAs are trained with a specific clipSkip value. Mismatching can produce unexpected style shifts. Check the LoRA's documentation if available.
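These tips can be combined into a small pre-flight check. The sketch below is only an illustration: `resolve_clip_skip` is a hypothetical helper, and it assumes you have collected the recommended values by hand from the model and LoRA pages, since nothing here queries them automatically.

```python
def resolve_clip_skip(model_recommended=None, lora_recommended=(), default=0):
    """Choose a clipSkip value and report LoRAs trained with a different one.

    model_recommended: value listed on the model page, or None if unknown.
    lora_recommended: iterable of (lora_name, value) pairs.
    """
    # Prefer the model's recommendation; otherwise fall back to the default of 0.
    chosen = default if model_recommended is None else model_recommended
    warnings = [
        f"LoRA '{name}' was trained with clipSkip {value}, but {chosen} is selected"
        for name, value in lora_recommended
        if value != chosen
    ]
    return chosen, warnings


value, notes = resolve_clip_skip(2, [("line-art", 2), ("watercolor", 1)])
print(value)
for note in notes:
    print(note)
```

The LoRA names in the usage example are made up. The helper only reports mismatches and leaves the decision about which value to use to you.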