How to Write AI Video Prompts for Veo 3.1

Veo 3.1 is the strongest video generation model available in 2026 for cinematography and physical realism. It is also the most expensive of the credible options, at approximately 50p to 60p per second of finished video for standard 720p output. The economics only work if the prompt is structured to play to the model's strengths and avoid its specific failure modes. A well-written Veo 3.1 prompt is the difference between paying a premium for premium output and paying a premium for output that a cheaper model could have produced.
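To make the economics concrete, here is a rough per-clip cost sketch using the 50p to 60p per-second range quoted above. The function name and the idea of budgeting for several takes per usable clip are illustrative assumptions, not published pricing logic.

```python
# Rough cost estimate from the 50p-60p per-second range quoted in this article.
# The number of takes per usable clip is an illustrative assumption.

def clip_cost_range_gbp(seconds, takes=1, low_pence=50, high_pence=60):
    """Return (low, high) cost in pounds for `takes` generations of `seconds` each."""
    if seconds <= 0 or takes < 1:
        raise ValueError("seconds must be positive and takes at least 1")
    total_seconds = seconds * takes
    return (total_seconds * low_pence / 100, total_seconds * high_pence / 100)


# Example: an 8-second clip where four takes are generated to get one keeper.
low, high = clip_cost_range_gbp(seconds=8, takes=4)
print(f"£{low:.2f} to £{high:.2f}")
```

Retakes multiply the per-second price, which is why a brief that lands in fewer attempts matters more on Veo than on cheaper models.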

Veo's documentation focuses on what the model can do. The practical question for performance marketers is the inverse: what kind of brief produces output worth the per-second premium, and what kind of brief produces output that should have been routed elsewhere.

Where Veo 3.1 actually beats the competition

There are three categories of shot where Veo 3.1 is meaningfully ahead of Sora 2 Pro, Kling 3.0, and the rest of the leaderboard.

Physics-heavy shots: liquid pours, fabric movement, hair physics, particles in air, smoke, steam, fire. Veo's physics engine in 3.1 produces output that is plausibly real on first viewing in ways the other models do not. For DTC categories where product interaction with materials matters (a serum drop landing on skin, a smoothie pouring into a glass, a fabric flexing under tension), Veo is the right model.

Cinematic lighting nuance: directional light shaping faces, golden-hour warmth, mixed-source lighting (window plus practical lamp), neon accents. The other models can hit basic three-point lighting; Veo handles lighting with the directional intent of a competent cinematographer.

Camera movement integrated with subject motion: dolly-in synchronised with a head turn, push-in matched to a step forward, parallax through a tracking shot. Veo's coordination of camera and subject is the model's distinctive strength.

What Veo is not particularly better at: stylised animation, cartoon physics, character consistency across cuts, vertical-format short-form aesthetics. For those use cases, the per-second premium is not justified.

The prompt structure that produces Veo's best output

Veo responds to detailed scene-level briefs more than to high-level concept briefs. The structure that consistently produces the model's premium-tier output covers ten elements.

  1. Subject and action: who is in the shot and what they are doing, in a single descriptive line.
  2. Setting: location, time of day, environmental detail. "Bedroom, late evening, soft warm light from a bedside lamp" rather than "bedroom."
  3. Camera framing: shot type (wide, medium, close-up), height, angle. "Medium shot, eye level, slightly off-axis to subject's left."
  4. Camera movement: static, slow push, tracking, handheld. "Slow push from medium to medium-close-up over five seconds."
  5. Lighting: direction, quality (hard/soft), colour temperature. "Single soft source from screen-right, 3200K, slight warm spill."
  6. Lens character: focal length, depth of field, any imperfections. "85mm equivalent, shallow depth of field, slight lens flare on highlights."
  7. Subject details: clothing, posture, micro-actions. "Wearing a charcoal jumper, hands relaxed on the table, slight forward lean during dialogue."
  8. Performance direction: tone, emotional register, pace. "Reflective, mid-thought, not performative."
  9. Aesthetic reference: visual style, era, photographic vs cinematographic. "Naturalistic, in the register of contemporary documentary advertising, not commercial."
  10. Negative constraints: what the brief specifically does not want. "No to-camera smiling. No commercial-set staging. No on-screen text."
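The ten elements above can be kept as structured fields and assembled into a single prompt string, which makes it harder to forget one. This is a minimal sketch of plain text assembly, not a Veo API call: the class name, field names, and labels are hypothetical choices for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class VeoBrief:
    """One field per element of the ten-part structure (names are illustrative)."""
    subject_action: str
    setting: str
    framing: str
    movement: str
    lighting: str
    lens: str
    subject_details: str
    performance: str
    aesthetic: str
    avoid: str


# Labels prefixed to each element after the opening subject line (assumed wording).
LABELS = {
    "setting": "Setting",
    "framing": "Camera",
    "movement": "Camera movement",
    "lighting": "Lighting",
    "lens": "Lens",
    "subject_details": "Subject details",
    "performance": "Performance",
    "aesthetic": "Aesthetic",
    "avoid": "Avoid",
}


def render(brief):
    """Join all ten elements into one prompt, refusing briefs with empty slots."""
    missing = [f.name for f in fields(brief) if not getattr(brief, f.name).strip()]
    if missing:
        raise ValueError(f"brief is missing: {', '.join(missing)}")
    parts = [brief.subject_action.rstrip(".") + "."]
    for f in fields(brief)[1:]:
        value = getattr(brief, f.name).rstrip(".")
        parts.append(f"{LABELS[f.name]}: {value}.")
    return " ".join(parts)


brief = VeoBrief(
    subject_action="A woman in her thirties sits at a table, mid-thought",
    setting="Bedroom, late evening, soft warm light from a bedside lamp",
    framing="Medium shot, eye level, slightly off-axis to subject's left",
    movement="Slow push from medium to medium-close-up over five seconds",
    lighting="Single soft source from screen-right, 3200K, slight warm spill",
    lens="85mm equivalent, shallow depth of field",
    subject_details="Charcoal jumper, hands relaxed on the table",
    performance="Reflective, mid-thought, not performative",
    aesthetic="Naturalistic, contemporary documentary advertising",
    avoid="to-camera smiling, commercial-set staging, on-screen text",
)
print(render(brief))
```

Raising on an empty field is a deliberate choice here: a brief with a blank lighting or avoid slot is exactly the kind that produces generic output at a premium price.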

Veo 3.1 uses substantially more of this brief than earlier models. Earlier-generation video models would ignore most prompt detail beyond the headline subject and action; Veo 3.1 reads and applies camera movement, lighting direction, and lens character with reasonable fidelity.

A prompt that produces premium-tier output, dissected

Here is a Veo 3.1 brief for a DTC supplement testimonial, written in the structure above:

Female 30-something subject, mid-thought, sitting at a kitchen table with a mug. Setting: kitchen at late evening, single warm bedside-style lamp on the counter providing the only practical light source, dim window in background showing dusk. Camera: medium shot, eye level, slightly off-axis to the subject's left, shallow depth of field, 50mm equivalent. Camera movement: very slow push from medium to medium-close over five seconds. Lighting: single soft source screen-right at approximately 3000K, slight warm spill on the cheek closest to camera, opposite side of face in soft shadow. Subject wears a charcoal jumper, hair slightly undone, hands wrapped around the mug, posture slightly forward, occasional sip. Performance: reflective rather than presenting, speaks as if to a friend off-camera, occasional pause mid-sentence. Aesthetic: naturalistic documentary advertising in the register of recent supplement-brand campaigns, not commercial-set polish. Avoid: to-camera smiling, on-screen text, commercial lighting, glossy product placement, energetic delivery. Five seconds total. Audio: ambient kitchen hum, no music bed.

This brief produces Veo 3.1 output that is genuinely useful as a hero placement. The same brief on Hailuo or Kling 3.0 produces output that does not justify the cost differential; the cheaper models do not use the lighting and camera movement detail in the same way.

Common Veo 3.1 mistakes that destroy the cost economics

Five mistakes reliably produce expensive output that should have been generated on a cheaper model.

Underspecified briefs: a one-line prompt ("a woman drinking coffee") gets generic output regardless of the model. The premium only pays off if the brief gives Veo the detail it is capable of using.

Conflicting directions: "casual but cinematic" or "professional but authentic." Veo will resolve the conflict at random, and the result is usually neither. Pick a register and brief consistently.

Over-specifying impossible physics: requesting movement Veo cannot render plausibly (rapid liquid pours with multiple objects, complex hair physics through wind) produces uncanny artefacts. Test before scaling.

Ignoring the negative-constraint slot: Veo defaults to commercial-set staging when not explicitly told otherwise. The "avoid" line in the brief is load-bearing for ad output that should not look commercial.

Generating at full quality for hook tests: hook-stage variants will be killed within 24 hours regardless of model quality. Generating them on Veo wastes spend that should be on Kling or Hailuo. Brands using model orchestration tools route hooks to cheaper models automatically.
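Some of these mistakes can be caught before any credits are spent. The following is a minimal pre-flight check, assuming a plain-text prompt: the word-count threshold, the list of conflicting pairs, and the issue labels are illustrative assumptions drawn from the examples above, not rules from Veo's documentation.

```python
import re

# Register pairs this article flags as conflicting; extend as needed (assumption).
CONFLICTING_PAIRS = [("casual", "cinematic"), ("professional", "authentic")]


def lint_brief(prompt, min_words=40):
    """Return a list of issue labels for a prompt; an empty list means no flags."""
    text = prompt.lower()
    words = re.findall(r"[a-z0-9'-]+", text)
    issues = []
    # Underspecified: too short to carry camera, lighting, and lens detail.
    if len(words) < min_words:
        issues.append("underspecified")
    # Conflicting directions: both halves of a known clashing pair are present.
    for first, second in CONFLICTING_PAIRS:
        if first in words and second in words:
            issues.append(f"conflicting-register: {first} vs {second}")
    # Negative constraints: look for an "avoid" or "no ..." instruction.
    if not re.search(r"\b(avoid|no)\b", text):
        issues.append("missing-negative-constraints")
    return issues


print(lint_brief("a woman drinking coffee"))
```

A check like this is crude by design. It cannot judge whether requested physics is renderable, so the advice to test before scaling still applies.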

For a fuller treatment of model routing across the leaderboard, see Cost per AI video by model in 2026. For Sora-specific prompting, see How to write AI video prompts for Sora 2 Pro.

When to pick Veo 3.1 versus alternatives

A practical decision rule based on placement and use case:

  • Hero placements with sustained spend (£20k+ media spend behind the asset): Veo 3.1. The cinematography quality is observable to the audience and justifies the premium.
  • Mid-funnel content with character consistency requirements: Sora 2 Pro is usually the better pick. Sora's character consistency across cuts beats Veo's, and the cost is lower.
  • Hook testing at scale: Hailuo or Kling 3.0. Veo at hook stage destroys campaign maths.
  • Short-form vertical for TikTok: Seedance handles vertical aspect ratios more cleanly than Veo.
  • Anything where lighting and camera movement carry the shot: Veo 3.1.
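The decision rule above can be written down as a small routing function. This is a sketch under stated assumptions: the `Shot` fields, the order in which the rules are checked, and the fallback to the cheaper tier are all illustrative choices, not how any particular orchestration tool works.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """Minimal description of a planned asset (field names are hypothetical)."""
    stage: str  # "hook", "mid_funnel", or "hero"
    media_spend_gbp: float = 0.0
    needs_character_consistency: bool = False
    vertical_short_form: bool = False
    lighting_or_camera_led: bool = False


def pick_model(shot):
    """Apply the article's decision rule; rule order is an assumption."""
    if shot.stage == "hook":
        return "Hailuo or Kling 3.0"
    if shot.vertical_short_form:
        return "Seedance"
    if shot.stage == "mid_funnel" and shot.needs_character_consistency:
        return "Sora 2 Pro"
    if shot.lighting_or_camera_led:
        return "Veo 3.1"
    if shot.stage == "hero" and shot.media_spend_gbp >= 20_000:
        return "Veo 3.1"
    # Fallback (assumption): default to the cheaper tier when no rule applies.
    return "Hailuo or Kling 3.0"


print(pick_model(Shot(stage="hero", media_spend_gbp=25_000)))
print(pick_model(Shot(stage="hook", lighting_or_camera_led=True)))
```

Checking the hook stage first reflects the point made earlier: hook variants are short-lived, so they go to cheaper models even when the shot would otherwise suit Veo.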

The brands deploying AI video at scale typically use Veo for 5-15% of total generation volume (the hero placements that justify the premium) and route the bulk to cheaper models. Model orchestration tools that read the brief intent and route automatically remove the operational overhead.

How vertical-aware platforms reduce per-model translation overhead

The same canonical brief produces meaningfully different output across Veo, Sora, Kling, and Hailuo unless the syntax is tuned per model. Veo wants explicit camera and lighting direction; Kling responds to action descriptions and reference imagery more than to lighting verbosity; Hailuo benefits from compressed, directive briefs.
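As a rough illustration of what per-model tuning means in practice, the sketch below selects a different subset of one canonical brief for each model. The field names and the per-model field lists are assumptions inferred from the tendencies described above; they are not documented prompt formats for any of these models.

```python
# Which canonical fields each model's prompt keeps (illustrative assumption).
FIELDS_BY_MODEL = {
    "veo": ["subject_action", "setting", "framing", "movement", "lighting", "lens", "avoid"],
    "kling": ["subject_action", "movement", "setting"],
    "hailuo": ["subject_action", "setting"],
}


def translate(brief, model):
    """Build a model-specific prompt from a canonical brief dictionary."""
    if model not in FIELDS_BY_MODEL:
        raise ValueError(f"no field mapping for model: {model}")
    kept = [brief[key].rstrip(".") for key in FIELDS_BY_MODEL[model] if key in brief]
    return ". ".join(kept) + "."


canonical = {
    "subject_action": "Woman sips tea at a kitchen table",
    "setting": "Kitchen, late evening",
    "framing": "Medium shot, eye level",
    "movement": "Slow push over five seconds",
    "lighting": "Single soft source screen-right, 3000K",
    "lens": "50mm equivalent, shallow depth of field",
    "avoid": "No on-screen text",
}

for model in FIELDS_BY_MODEL:
    print(f"{model}: {translate(canonical, model)}")
```

A real translation layer would also rephrase content and attach reference imagery where a model supports it. Dropping fields is only the simplest version of the idea.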

Brands managing this manually either rewrite the brief for every model (operationally expensive) or accept degraded output on the models that need different syntax (expensive in wasted generation spend). Tonic Studio's per-model translation handles this layer: a single canonical brief gets translated into each model's preferred prompt shape automatically, and the platform's show-me-the-prompt transparency exposes the per-model output so brands can debug when results do not match intent.

For vertical-specific applications of these prompting patterns, see AI video ads for fitness apparel brands or AI testimonial videos for serum brands.

FAQ

Does Veo 3.1 require Google Cloud access to use?

Veo is available through Google AI Studio for development, Vertex AI for production, and through partner platforms. Direct API access requires Vertex AI quota; partner platforms (including Tonic Studio) handle the routing without per-customer Vertex setup.

What's the maximum clip length on Veo 3.1?

Standard configurations support up to 8 seconds at 720p in single-generation mode. Longer clips can be produced by chaining generations with frame interpolation, though character and scene consistency across stitches is imperfect.
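For planning purposes, the number of chained generations follows directly from the 8-second ceiling. The helper below is a minimal sketch that only splits a target duration into segments. It assumes segments are simply concatenated and does not model any overlap that stitching or interpolation might need.

```python
import math

MAX_CLIP_SECONDS = 8  # single-generation ceiling quoted above


def plan_segments(target_seconds, max_clip=MAX_CLIP_SECONDS):
    """Split a target duration into full-length segments plus one final remainder."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    count = math.ceil(target_seconds / max_clip)
    durations = [max_clip] * (count - 1)
    # The last segment carries whatever time is left after the full-length ones.
    durations.append(target_seconds - max_clip * (count - 1))
    return durations


print(plan_segments(30))
```

Each extra segment is another stitch point, so fewer, longer segments limit the places where character or scene consistency can slip.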

Does Veo 3.1 support reference-image conditioning?

Yes. Brand reference imagery, talent reference, and style reference are all supported as conditioning inputs. This is critical for brands maintaining specific visual aesthetics; without it, output drifts toward Veo's default register.

How does Veo 3.1 audio compare to silent-video models?

Native synchronised audio is generated alongside video. Quality is competitive with Sora 2 Pro on dialogue and ambient sound. Music generation is weaker than dedicated audio models. For DTC ads where music is laid in post anyway, the audio premium may not be needed.

Is Veo 3.1's quality advantage stable, or is the leaderboard moving?

The leaderboard moves quickly. Sora and Kling have closed the cinematography gap meaningfully through 2026. The decision rule based on use case (rather than headline benchmark) will outlast specific model rankings.


100 free credits to test Veo 3.1 alongside Sora, Kling, Hailuo, and Seedance through Tonic's per-model routing: tonicstudio.ai/signup?promo=UGC100.
