MODEL ID sourceful:riverflow-2.5@pro

Riverflow 2.5 Pro

by Sourceful

Riverflow 2.5 Pro is the higher-capability variant in Sourceful's Riverflow 2.5 family. It supports both text-to-image and image-to-image workflows and is positioned for commercial visual work that needs stronger output quality, tighter brand control, and more dependable production results across packaging, product imagery, advertising, and design-heavy creative pipelines.

The custom scoring rubric

How to use scoringPrompt and scoringRubric on Sourceful Riverflow 2.5 Pro to drive different production workflows from the same brand inputs.

Introduction

Image models are usually optimised for general-audience appeal. That works for one-off illustrations, but production creative is a different problem. The same brief and the same product assets can produce a wholesale catalog packshot, an editorial magazine spread, an Instagram Story, an email header, an outdoor billboard, or a lifestyle shot for a paid social ad. Each one is a different right answer for a different downstream job. A prompt alone cannot decide between them.

Riverflow 2.5 Pro lets you encode that downstream context directly in the request. Alongside the positivePrompt, you supply a scoringPrompt (a free-text framing of what you are judging for) and an optional scoringRubric (a structured set of weighted dimensions the internal judge scores each candidate against). After every edit the model performs, the judge scores the candidate against your rubric and either accepts the result or sends the candidate back for another iteration.

This guide walks through the request shape, then exercises the rubric across three distinct production patterns (a judge swap, a channel kit, and a packshot-to-lifestyle transformation), then breaks down the anatomy of a rubric dimension and the mechanics of writing strong dimensions.

Request shape

The call takes the source references, a positivePrompt, and a settings block that carries the scoringPrompt and the optional scoringRubric. The example below is one of the rubrics used in the judge-swap demo further down the guide.

[
  {
    "taskType": "imageInference",
    "taskUUID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "model": "sourceful:riverflow-2.5@pro",
    "positivePrompt": "Generate a marketing image for Acre Pantry, an artisan small-batch granola bar brand, featuring their three granola bar SKUs (Cacao Crunch, Honey Almond, Wild Berry). Use the supplied bar wrapper references and the brand logo card. The composition may feature one bar as a hero, two bars together, or all three of the SKUs depending on what the scoring rubric rewards.",
    "width": 1280,
    "height": 720,
    "inputs": {
      "referenceImages": [
        "https://example.com/ref-bar-cacao.jpg",
        "https://example.com/ref-bar-almond.jpg",
        "https://example.com/ref-bar-berry.jpg",
        "https://example.com/ref-logo.jpg"
      ]
    },
    "settings": {
      "thinkingLevel": "high",
      "scoringPrompt": "Score this image as a clean product catalog packshot for B2B retail buyers and e-commerce listings. The packshot must read clearly at thumbnail size and survive professional approval by a packaging stakeholder.",
      "scoringRubric": [
        {
          "key": "packDominance",
          "label": "Pack dominance",
          "description": "All three bar wrappers face front-on to the camera and the ACRE PANTRY wordmark plus each flavor name are clearly readable. The three bars together fill at least 60% of the frame area. No bars are partially cropped, rotated away from camera, or hidden.",
          "weight": 0.4,
          "passingScore": 0.7
        },
        {
          "key": "studioPurity",
          "label": "Studio purity",
          "description": "The frame contains ONLY the three bars and a pure white or near-white seamless studio background. NO food ingredients (no nuts, no berries, no chocolate, no oats, no honey). NO fabric or linen. NO plants or eucalyptus. NO ceramic mugs or bowls. NO scattered crumbs. NO atmospheric lighting. NO surface textures or wood grain. Any extra element in the frame is a scoring failure.",
          "weight": 0.4
        },
        {
          "key": "catalogConservatism",
          "label": "Catalog conservatism",
          "description": "Calm even three-point studio lighting, conservative front-on crops, no creative angles or dramatic shadows. Composition reads as approval-ready for a retail buyer rather than as a stylised concept piece.",
          "weight": 0.2
        }
      ]
    }
  }
]

Example response:

[
  {
    "taskType": "imageInference",
    "taskUUID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "imageUUID": "f1e2d3c4-b5a6-7890-1234-567890abcdef",
    "imageURL": "https://im.runware.ai/image/os/a14d18/ws/2/ii/f1e2d3c4-b5a6-7890-1234-567890abcdef.jpg",
    "seed": 837412938
  }
]

A few notes on the shape:

  • The scoringPrompt is a free-text framing line for the judge. The scoringRubric is the structured breakdown into weighted dimensions. Each is independently valid. Sending scoringPrompt alone is the lightweight pattern when you want one quality preference without designing a full rubric. Sending both is the production shape when the judge needs both an overall framing and weighted criteria.
  • The rubric is independent from the positivePrompt. The generator is not asked to think about catalog conservatism. The judge is asked to score for it after each candidate and reject results that fall short.
  • thinkingLevel controls how many edit-and-judge cycles the model is allowed before settling. The rubric only does real work when the model has the budget to iterate against it.
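
The two shapes described in the notes above can be sketched in Python. This is a minimal illustration, not an official client: the helper name build_request, its defaults, and the shortened prompt are assumptions of this example. Only the field names come from the request shown earlier.

```python
import json
import uuid

MODEL_ID = "sourceful:riverflow-2.5@pro"


def build_request(positive_prompt, reference_images, scoring_prompt=None,
                  scoring_rubric=None, thinking_level="high",
                  width=1280, height=720):
    """Assemble a one-task request body (hypothetical helper)."""
    settings = {"thinkingLevel": thinking_level}
    # Each scoring field is optional and independently valid.
    if scoring_prompt is not None:
        settings["scoringPrompt"] = scoring_prompt
    if scoring_rubric is not None:
        settings["scoringRubric"] = scoring_rubric
    return [{
        "taskType": "imageInference",
        "taskUUID": str(uuid.uuid4()),
        "model": MODEL_ID,
        "positivePrompt": positive_prompt,
        "width": width,
        "height": height,
        "inputs": {"referenceImages": list(reference_images)},
        "settings": settings,
    }]


refs = ["https://example.com/ref-bar-cacao.jpg", "https://example.com/ref-logo.jpg"]
prompt = "Generate a marketing image for Acre Pantry granola bars."  # shortened for the sketch

# Lightweight pattern: one framing line, no rubric.
lightweight = build_request(
    prompt, refs,
    scoring_prompt="Reward outputs that read as approachable rather than aspirational.",
)

# Production shape: framing line plus weighted dimensions.
production = build_request(
    prompt, refs,
    scoring_prompt="Score this image as a clean product catalog packshot.",
    scoring_rubric=[{
        "key": "catalogConservatism",
        "label": "Catalog conservatism",
        "description": "Calm even studio lighting, conservative front-on crops, "
                       "no creative angles or dramatic shadows.",
        "weight": 1,
    }],
)

print(json.dumps(lightweight, indent=2))
```

The generator-facing fields are identical in both calls. Only the settings block changes, which keeps the judging context separate from the brief.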

Three judges, one prompt

The first production pattern is the headline one: same brief, same references, three different judging criteria. This is where the rubric concept lands clearest, because the only thing changing across the three calls is what the judge is rewarding.

The four supplied references are three bar wrappers and the brand logo card. They stay constant across all three calls below.

The shared positivePrompt sent on all three calls is intentionally open-ended:

Generate a marketing image for Acre Pantry, an artisan small-batch
granola bar brand, featuring their three granola bar SKUs (Cacao
Crunch, Honey Almond, Wild Berry). Use the supplied bar wrapper
references and the brand logo card. The composition may feature one
bar as a hero, two bars together, or all three of the SKUs depending
on what the scoring rubric rewards.

The prompt deliberately gives the model permission to use one bar, two bars, or all three. The rubric decides which.

The same generation brief produces three visually distinct outputs because each rubric explicitly forbids what the other rubrics reward. The studio rubric forbids the props that the editorial rubric requires. The editorial rubric allows the bars to shrink, which the studio rubric forbids. The poster rubric forbids three bars where the others want them, and reserves half the frame as a flat copy zone where the others would put atmosphere.

Sharp rubrics describe what should not appear as clearly as what should. A judge that only knows what the rubric wants will default to the model's general-audience taste for everything else. A judge that knows what the rubric forbids has clear ground to reject candidates on.
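
As a rough Python sketch of the judge swap: the base task is shared and only the rubric is replaced per call. The studio description is abbreviated from the request earlier in the guide. The editorial and poster descriptions are illustrative wording written for this example from the contrasts described above, not the rubrics actually used in the demo, and the helper with_judge is hypothetical.

```python
import copy
import uuid

# Shared across all three calls: the brief and the references never change.
BASE_TASK = {
    "taskType": "imageInference",
    "model": "sourceful:riverflow-2.5@pro",
    "positivePrompt": "Generate a marketing image for Acre Pantry granola bars.",  # shortened
    "width": 1280,
    "height": 720,
    "inputs": {"referenceImages": [
        "https://example.com/ref-bar-cacao.jpg",
        "https://example.com/ref-bar-almond.jpg",
        "https://example.com/ref-bar-berry.jpg",
        "https://example.com/ref-logo.jpg",
    ]},
}

# One dimension per judge for brevity; each forbids what another rewards.
JUDGES = {
    "studio": {
        "key": "studioPurity",
        "label": "Studio purity",
        "description": "The frame contains ONLY the three bars and a pure white "
                       "seamless studio background. Any extra element is a scoring failure.",
        "weight": 1,
    },
    "editorial": {  # illustrative wording
        "key": "editorialMood",
        "label": "Editorial mood",
        "description": "Bars sit inside a styled scene with ingredient props and fabric. "
                       "Bars may occupy a small part of the frame. "
                       "A bare white studio background is a scoring failure.",
        "weight": 1,
    },
    "poster": {  # illustrative wording
        "key": "copyZone",
        "label": "Massive copy zone",
        "description": "One hero bar only. At least half the frame is a flat, empty "
                       "colour field reserved for copy. Showing all three bars is a scoring failure.",
        "weight": 1,
    },
}


def with_judge(base, dimension):
    """Return a copy of the shared task with one judge attached."""
    task = copy.deepcopy(base)
    task["taskUUID"] = str(uuid.uuid4())
    task["settings"] = {"thinkingLevel": "high", "scoringRubric": [copy.deepcopy(dimension)]}
    return task


calls = {name: with_judge(BASE_TASK, dim) for name, dim in JUDGES.items()}
for name, task in calls.items():
    print(name, "->", task["settings"]["scoringRubric"][0]["key"])
```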

Channel-kit production

The second production pattern is the channel kit: one product set, multiple deliverables sized and styled for different distribution channels. The same shared positivePrompt and references go into each call, but the request changes two things at once: the aspect ratio of the output, and a format-specific rubric that knows what that channel's deliverable needs to do.

Three channels are rendered from the same Acre Pantry inputs: a 1:1 product thumbnail, a 9:16 mobile frame, and a 16:9 email banner.

The same product family, the same brand inputs, the same positivePrompt. What differs is the aspect ratio plus a rubric that knows what that channel's job is. The 1:1 thumbnail rubric understands that a product tile lives next to other product tiles in a grid. The 9:16 rubric understands that the top and bottom of a mobile frame get eaten by UI. The 16:9 rubric understands that an email banner exists to host a subject line. Each format gets the framing it actually needs.

This is the rubric pattern that scales to real production: one set of brand inputs, a library of rubrics per channel, one call per deliverable.
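
One way to organise such a library, sketched in Python. The channel names, the pixel sizes for the 1:1 and 9:16 formats, and the rubric wording are assumptions made for this example (only 1280 x 720 appears in the request earlier in the guide), so check supported dimensions before relying on them.

```python
from fractions import Fraction

# Hypothetical per-channel library: output size plus a format-specific rubric.
CHANNEL_LIBRARY = {
    "catalogThumbnail": {
        "width": 1024, "height": 1024,  # assumed size for 1:1
        "scoringRubric": [{
            "key": "gridLegibility",
            "label": "Grid legibility",
            "description": "The pack reads clearly at thumbnail size when placed "
                           "next to other product tiles in a grid.",
            "weight": 1,
        }],
    },
    "mobileStory": {
        "width": 720, "height": 1280,  # assumed size for 9:16
        "scoringRubric": [{
            "key": "safeZones",
            "label": "UI safe zones",
            "description": "No product or wordmark sits in the top or bottom bands "
                           "of the frame, which are covered by mobile UI.",
            "weight": 1,
        }],
    },
    "emailBanner": {
        "width": 1280, "height": 720,  # 16:9, as in the example request
        "scoringRubric": [{
            "key": "headlineSpace",
            "label": "Headline space",
            "description": "A clear, uncluttered area is left free to host a line of copy.",
            "weight": 1,
        }],
    },
}


def channel_task(channel, positive_prompt, reference_images):
    """Build one task per deliverable from shared brand inputs."""
    spec = CHANNEL_LIBRARY[channel]
    return {
        "taskType": "imageInference",
        "model": "sourceful:riverflow-2.5@pro",
        "positivePrompt": positive_prompt,
        "width": spec["width"],
        "height": spec["height"],
        "inputs": {"referenceImages": list(reference_images)},
        "settings": {"thinkingLevel": "high", "scoringRubric": spec["scoringRubric"]},
    }


kit = [channel_task(name, "Generate a marketing image for Acre Pantry granola bars.",
                    ["https://example.com/ref-logo.jpg"])
       for name in CHANNEL_LIBRARY]
for task in kit:
    print(task["width"], "x", task["height"], "ratio", Fraction(task["width"], task["height"]))
```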

Packshot to lifestyle

The third production pattern is image-to-image transformation: send a single-SKU packshot reference, and the rubric decides what context that packshot gets transplanted into. Real brand teams pair specific SKUs with specific lifestyle contexts. The three calls below each send a different bar paired with the lifestyle direction that fits its narrative best: Honey Almond onto the trail, Wild Berry onto the Sunday morning counter, Cacao Crunch into the post-workout gym.

All three lifestyle rubrics share a barFidelity dimension with passingScore: 0.7 because every version needs the wrapper to render accurately. That is a non-negotiable for brand work, which is exactly the case passingScore exists to enforce. The other two dimensions per rubric are the ones that differentiate the lifestyle direction, and each direction's context dimension explicitly forbids the cues the other two use.

This pattern is what the rubric enables for brand teams that need to atomise a single packshot into many in-context shots for a campaign launch, social calendar, or retailer-specific collaboration.
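
A sketch of how the shared floor might be reused across the three lifestyle rubrics, in Python. The barFidelity key and its passingScore of 0.7 come from the text above. The weights, the context keys, and every description are illustrative assumptions, and each rubric is cut to two dimensions for brevity.

```python
# Shared non-negotiable dimension, reused unchanged in every lifestyle rubric.
BAR_FIDELITY = {
    "key": "barFidelity",
    "label": "Bar fidelity",
    "description": "The wrapper matches the supplied packshot reference: wordmark, "
                   "flavor name, and colours render accurately with no invented text.",
    "weight": 0.4,        # assumed weight
    "passingScore": 0.7,  # hard floor from the guide
}

# SKU pairing from the guide; context wording is illustrative only.
CONTEXTS = {
    "trailRest": ("Honey Almond",
                  "Outdoor trail setting with natural terrain. "
                  "NO kitchen counters. NO gym equipment."),
    "sundayMorning": ("Wild Berry",
                      "Domestic kitchen counter in soft morning light. "
                      "NO outdoor terrain. NO gym equipment."),
    "postWorkout": ("Cacao Crunch",
                    "Gym setting after a workout. "
                    "NO kitchen counters. NO outdoor terrain."),
}


def lifestyle_rubric(context_key):
    """Combine the shared floor with one direction-specific context dimension."""
    sku, description = CONTEXTS[context_key]
    context_dimension = {
        "key": context_key + "Context",
        "label": f"{sku} context",
        "description": description,
        "weight": 0.6,  # assumed weight
    }
    return [dict(BAR_FIDELITY), context_dimension]


rubrics = {name: lifestyle_rubric(name) for name in CONTEXTS}
for name, rubric in rubrics.items():
    print(name, [d["key"] for d in rubric])
```

Keeping the fidelity dimension in one place means a change to the wrapper spec propagates to every lifestyle direction at once.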

Anatomy of a rubric dimension

A scoringRubric is an array of one to eight dimensions. Each dimension has four required fields and two optional ones that tighten the judging behaviour:

  • key (lowercase, alphanumeric, max 40 chars): machine-readable ID for the dimension. Appears in logs and judge output.
  • label (max 80 chars): short human-readable name for the dimension.
  • description (max 1000 chars): the actual judging criteria. The model uses this text to decide what to look for in each candidate. This is the load-bearing field.
  • weight (positive number): relative importance. Weights are normalised internally, so 0.5 / 0.3 / 0.2 and 5 / 3 / 2 produce the same behaviour.
  • passingScore (optional, 0 to 1): minimum score this dimension must reach for the candidate to be accepted. Use for non-negotiables.
  • scoreGuidance (optional, up to 5 entries): score anchors that define what numeric scores mean. Each entry has a score (0 to 1) and a description. Anchors give the judge a consistent vocabulary across runs.
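
The field limits listed above can be checked client-side before sending a request. The following Python validator is a sketch under the constraints as stated in this guide; it is not the API's own validation, and the function name is hypothetical. It accepts mixed-case keys because the example request uses camelCase keys such as packDominance.

```python
import re

REQUIRED = ("key", "label", "description", "weight")


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_rubric(rubric):
    """Return a list of problems; an empty list means no problem was found."""
    errors = []
    if not 1 <= len(rubric) <= 8:
        errors.append("rubric must have 1 to 8 dimensions")
    for i, dim in enumerate(rubric):
        for field in REQUIRED:
            if field not in dim:
                errors.append(f"[{i}] missing required field: {field}")
        if "key" in dim and not re.fullmatch(r"[A-Za-z0-9]{1,40}", str(dim["key"])):
            errors.append(f"[{i}] key must be alphanumeric, max 40 chars")
        if len(str(dim.get("label", ""))) > 80:
            errors.append(f"[{i}] label exceeds 80 chars")
        if len(str(dim.get("description", ""))) > 1000:
            errors.append(f"[{i}] description exceeds 1000 chars")
        if "weight" in dim and not (_is_number(dim["weight"]) and dim["weight"] > 0):
            errors.append(f"[{i}] weight must be a positive number")
        floor = dim.get("passingScore")
        if floor is not None and not (_is_number(floor) and 0 <= floor <= 1):
            errors.append(f"[{i}] passingScore must be between 0 and 1")
        anchors = dim.get("scoreGuidance", [])
        if len(anchors) > 5:
            errors.append(f"[{i}] scoreGuidance allows at most 5 entries")
        for anchor in anchors:
            score = anchor.get("score")
            if not (_is_number(score) and 0 <= score <= 1) or not anchor.get("description"):
                errors.append(f"[{i}] each anchor needs a score in 0..1 and a description")
    return errors


good = [{"key": "packDominance", "label": "Pack dominance",
         "description": "All three wrappers face the camera.",
         "weight": 0.4, "passingScore": 0.7}]
bad = [{"key": "pack dominance", "label": "x", "description": "y", "weight": 0}]

print(validate_rubric(good))
print(validate_rubric(bad))
```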

The three demos above each lean on a different combination. The studio packshot and catalog thumbnail rubrics use passingScore on their pack-clarity dimension as a hard floor. The lifestyle rubrics use passingScore on their barFidelity dimension because a misrendered wrapper kills the creative even if the lifestyle reads. The other dimensions in each rubric lean on weight alone.

Weights are normalised internally, but descriptions are not. A weak description gets effectively downweighted by the judge even when its numeric weight is high, because the judge has nothing concrete to score. Put the writing budget on description, not on tweaking weight numbers.
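
The arithmetic behind normalisation, and the difference between a high weight and a hard floor, can be illustrated in Python. The judge's real aggregation is internal and not described in this guide, so the weighted mean and the floor check below are an assumed model for illustration, and the candidate scores are made up.

```python
import math


def normalise(weights):
    """Scale weights so they sum to 1; only relative values matter."""
    total = sum(weights)
    return [w / total for w in weights]


def weighted_score(rubric, scores):
    """Assumed summary: weighted mean of per-dimension scores."""
    shares = normalise([d["weight"] for d in rubric])
    return sum(share * scores[d["key"]] for share, d in zip(shares, rubric))


def clears_floors(rubric, scores):
    """Assumed rule: every dimension with a passingScore must reach it."""
    return all(scores[d["key"]] >= d["passingScore"]
               for d in rubric if "passingScore" in d)


# 0.5 / 0.3 / 0.2 and 5 / 3 / 2 describe the same split.
same_split = all(math.isclose(a, b)
                 for a, b in zip(normalise([0.5, 0.3, 0.2]), normalise([5, 3, 2])))
print("same split:", same_split)

rubric = [
    {"key": "packDominance", "weight": 0.4, "passingScore": 0.7},
    {"key": "studioPurity", "weight": 0.4},
    {"key": "catalogConservatism", "weight": 0.2},
]
# Made-up scores: strong overall, but the floored dimension falls short.
scores = {"packDominance": 0.6, "studioPurity": 0.95, "catalogConservatism": 0.9}

print("weighted score:", round(weighted_score(rubric, scores), 2))  # about 0.80
print("clears floors:", clears_floors(rubric, scores))
```

Under this model the candidate averages well yet still fails, because packDominance sits below its floor. That is the behaviour a high weight alone cannot guarantee.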

The description carries the rubric

The most common rubric pitfall is writing descriptions that sound like the rubric author's feelings rather than observable criteria the judge can score. To isolate the description as the variable, the next demo holds the scoringPrompt, the positivePrompt, the references, and thinkingLevel all constant. Only the rubric descriptions change between the two calls.

One call uses three vague dimensions ("looks good", "brand feel", "clean") that any image could plausibly satisfy. The other uses the studio packshot rubric from the judge-swap demo above, with descriptions written as observable specs ("the frame contains ONLY the three bars and a pure white seamless studio background", "no food ingredients", "no fabric", "no plants").

The weak rubric is technically valid, and the judge accepts it. But "looks good" and "brand feel" are not observable specs, so the judge has nothing concrete to score against and falls back to the model's default sense of visual appeal. The rubric is present in the request but is not actually steering anything.

Write descriptions like a creative reviewer would write notes to a designer. State what should be in the frame, what should be visually dominant, and what should not appear. Adjectives without referents give the judge nothing to work with.

Pairing with thinking level

The rubric only does real work when the model has the budget to iterate against it. thinkingLevel is that budget knob. On low, the model takes one or two edit-and-judge passes and settles. On high or xhigh, it keeps iterating until the candidate clears the rubric or the budget runs out.

Both outputs below use the editorial magazine rubric from the judge-swap demo. The only thing that changes is thinkingLevel.

Treat thinkingLevel and scoringRubric as one tuning surface. Sending a detailed rubric on low mostly wastes the rubric, because the model accepts an early candidate before the rubric has time to land. Sending no rubric on xhigh mostly wastes the budget, because the model burns its thinking allowance chasing its own default judge with no extra guidance. Match the budget to the work the rubric is doing.

low and medium are reasonable defaults for ideation and exploration. high is the sweet spot when the rubric is doing real work. xhigh is for production runs where outputs need to clear a strict rubric repeatedly and you accept the longer thinking time and higher cost.
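
The pairing guidance above can be turned into a small pre-flight check. This Python sketch is a hypothetical helper, not part of the API, and it assumes the four levels named in this guide are the only valid values.

```python
LEVELS = ("low", "medium", "high", "xhigh")  # levels named in this guide


def pairing_warning(thinking_level, scoring_rubric=None):
    """Return a warning string for a mismatched pairing, or None if it looks fine."""
    if thinking_level not in LEVELS:
        raise ValueError(f"unknown thinkingLevel: {thinking_level!r}")
    has_rubric = bool(scoring_rubric)
    if has_rubric and thinking_level == "low":
        return "rubric likely wasted: an early candidate may be accepted before it lands"
    if not has_rubric and thinking_level == "xhigh":
        return "budget likely wasted: no rubric to iterate against"
    return None


rubric = [{"key": "studioPurity", "label": "Studio purity",
           "description": "Only the three bars on a white seamless background.",
           "weight": 1}]

print(pairing_warning("low", rubric))
print(pairing_warning("xhigh"))
print(pairing_warning("high", rubric))
```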

Tips

  1. Write descriptions like product specs, not feelings. State what should be in the frame, what should be dominant, and what should not appear. The judge can score "the frame contains ONLY the three bars and a pure white seamless studio background, no food ingredients, no fabric". It cannot reliably score "looks good".
  2. Sharp rubrics explicitly forbid what other rubrics reward. Studio purity forbids the props that editorial mood requires. Massive copy zone forbids the secondary elements that thumbnail legibility uses. Without explicit exclusions the model defaults to its general-audience taste and the rubrics converge.
  3. The lightweight pattern is one framing line, no rubric. When you want one quality preference (for example, "reward outputs that read as approachable rather than aspirational") without designing a structured rubric, send only scoringPrompt. Add scoringRubric when the judge needs weighted dimensions to score against.
  4. Weights apportion attention, they do not add or drop dimensions. If a dimension does not belong in the rubric, remove it. If it belongs but matters less, lower its weight. Weights are normalised internally, so only the relative values matter.
  5. Set a hard floor when a dimension is non-negotiable. Typical cases are pack fidelity, brand wordmark legibility, and no fabricated SKUs. Use passingScore rather than just weighting the dimension high. The judge will reject candidates that miss the floor regardless of how well the rest of the rubric scores.
  6. Lock the scoring scale with anchors. Without anchors, "0.6" on one run is not the same "0.6" on the next. With scoreGuidance describing what each score actually means, the judge has a consistent vocabulary across runs of the same rubric.
  7. Match the thinking budget to how much the rubric matters. A thinkingLevel of low or medium is good for ideation. A thinkingLevel of high or xhigh gives the model the budget to actually iterate against the rubric until it scores well, which is when the rubric is doing real work.
  8. Build a rubric library, not a one-off. The channel-kit pattern in this guide implies a real production workflow: one rubric per channel, one set of brand inputs, one call per deliverable. Naming and reusing rubrics is what lets a brand team scale this from a demo into a launch process.
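
To make tips 5 and 6 concrete, here is a Python sketch of a single dimension that combines a hard floor with score anchors. The field names follow the anatomy section of this guide. The anchor wording, the weight, and the choice of four anchors are illustrative assumptions.

```python
import json

# A non-negotiable dimension with anchors that pin down what each score means.
bar_fidelity = {
    "key": "barFidelity",
    "label": "Bar fidelity",
    "description": "The wrapper matches the supplied reference: wordmark, flavor "
                   "name, and colours render accurately with no invented text.",
    "weight": 0.4,        # assumed weight
    "passingScore": 0.7,  # hard floor
    "scoreGuidance": [    # up to 5 entries, each with a score and a description
        {"score": 0.0, "description": "Wrapper is missing or unrecognisable."},
        {"score": 0.4, "description": "Wrapper is present but the wordmark or "
                                      "flavor name is misspelled or distorted."},
        {"score": 0.7, "description": "Wordmark and flavor name are correct and "
                                      "legible, with minor colour or proportion drift."},
        {"score": 1.0, "description": "Wrapper matches the reference in text, "
                                      "colour, and layout."},
    ],
}

anchor_scores = [a["score"] for a in bar_fidelity["scoreGuidance"]]
print(json.dumps(bar_fidelity, indent=2))
```

Placing one anchor exactly at the passingScore is a deliberate choice in this sketch: it tells the judge in words what the minimum acceptable result looks like.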