How to Write AI Video Prompts That Actually Look Professional

12 min read

Open any AI video tool. Type "a woman drinking coffee in a kitchen". Run it. The result is fine. Forgettable, amateur, and indistinguishable from the millions of other "fine" generations being produced this minute. Now type "macro shot, espresso cup on a marble counter, soft window light from camera-left, slow push-in, 50mm equivalent, golden hour warmth, slight handheld feel, A24 muted grade". Same model, same product, same coffee. The second result reads like a campaign asset.

The difference is not the model. The difference is cinematography vocabulary. AI video prompts for marketing are a discipline now, and the people who learned that discipline are getting professional-looking output from the same tools that produce mediocre output for everyone else.

This post is the working framework. Ten elements that drive most of the perceived quality difference, with concrete amateur-versus-professional examples for each. Read it through once, keep it as a reference, and your hit rate on first-generation usable assets should rise noticeably.

The 10 elements of cinematographic prompting

Every professional-looking AI video can be broken down into ten prompt elements. Most amateur prompts include two or three of them. Most professional prompts include all ten, even if some are implicit in the brand's house style. Here is the full list, with examples.

1. Subject

Who or what is in the shot. The most common amateur mistake is leaving this generic.

Amateur. A woman.
Professional. A 35-year-old woman with shoulder-length brown hair, light makeup, casual loungewear in muted earth tones.

The professional version gives the model enough specificity to render a consistent character. Generic prompts produce stock-flavoured talent because the model has nothing to anchor on.

2. Action

What the subject is doing, including the small physical details that read as natural.

Amateur. She is drinking coffee.
Professional. She holds the espresso cup with both hands, takes a slow sip, and looks down briefly before glancing toward the window.

Action sentences with two or three small movements read as continuous and natural. Single-action prompts produce stiff outputs.

3. Setting

Location with sensory detail. Setting carries half the mood of any shot.

Amateur. A kitchen.
Professional. A small modern kitchen with a marble countertop, mid-evening light coming through a single window with sheer curtains, a cluttered bookshelf visible in the soft background blur.

Specific settings produce cinematic backgrounds. Generic settings produce stock kitchens.

4. Camera angle

Close-up, medium, wide, or specific framing. This is the most-skipped element in amateur prompts.

Amateur. (no angle specified)
Professional. Medium close-up framed at chest height, talent at three-quarter angle to camera.

Common conventions:

Close-up (CU). Head and shoulders. Used for testimonials, intimate moments, beauty product application.
Medium close-up (MCU). Chest up. The default testimonial framing.
Medium shot (MS). Waist up. Lifestyle and conversational scenes.
Wide shot (WS). Full body or environment. Sets context.
Extreme close-up (ECU). Detail level. Eyes, lips, product texture.

5. Camera movement

Static, push, pull, dolly, pan, tilt, handheld, gimbal. Movement is what separates a video from a still.

Amateur. (no movement specified)
Professional. Slow push-in over the course of the spoken line, settling into the close framing on the final beat.

Common moves:

Static. No camera movement. Reads as honest and grounded.
Slow push-in. Camera advances gradually toward the subject. Builds intimacy.
Slow pull-out. Camera retreats. Reveals context or ends a moment.
Dolly. Camera physically travels on a track or wheels, toward, away from, or alongside the subject. Smooth, premium.
Handheld. Slight imperfection in the motion. Reads as authentic UGC.
Gimbal. Smooth movement that follows the subject. Lifestyle and action.

6. Lens specification

The single highest-leverage element most marketers do not use. Lens choice changes the entire feel of the shot.

24mm wide. Environmental shots, lifestyle, anything that needs context. Slight distortion at edges.
35mm natural. Often described as close to how the human eye sees. Default for honest, conversational shots.
50mm portrait. Flattering compression for talent on camera. The standard testimonial lens. Soft background blur.
85mm intimate. Stronger compression and stronger background blur. Beauty close-ups, premium product hero.
Macro. Extreme close-up of product detail. Texture, packaging, ingredient shots.

A 35mm-shot testimonial reads natural and grounded. The same scene at 85mm reads cinematic and premium. Most current models will follow the specification when you give it, though not always exactly.

Amateur. A close-up of her face.
Professional. 85mm equivalent close-up, soft background blur, slight foreground bokeh.
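
One way to keep lens language consistent across a team is a small lookup table. The sketch below is illustrative only: `LENS_PRESETS` and `lens_phrase` are hypothetical names, not part of any tool, and the phrasing is lifted from the list above.

```python
# Hypothetical lookup of lens phrases, based on the lens list in this post.
LENS_PRESETS = {
    "wide": "24mm equivalent, wide environmental framing",
    "natural": "35mm equivalent, natural perspective",
    "portrait": "50mm equivalent, soft background blur",
    "intimate": "85mm equivalent, strong background blur",
    "macro": "macro lens, extreme close-up of product detail",
}


def lens_phrase(kind: str, extras: tuple[str, ...] = ()) -> str:
    """Return a prompt fragment for the requested lens, plus optional extras."""
    if kind not in LENS_PRESETS:
        raise ValueError(
            f"unknown lens kind: {kind!r}; expected one of {sorted(LENS_PRESETS)}"
        )
    return ", ".join((LENS_PRESETS[kind], *extras))


print(lens_phrase("intimate", ("slight foreground bokeh",)))
# 85mm equivalent, strong background blur, slight foreground bokeh
```

Failing loudly on an unknown lens name is a deliberate choice here: a typo in the brief is cheaper to catch before the generation credit is spent.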

7. Lighting

The most undervalued element. Lighting carries mood more than any other choice.

Golden hour. The hour after sunrise or before sunset. Warm, soft, flattering. Lifestyle.
Soft window light. Diffused daylight from a window. The default for honest testimonial work. Reads natural.
Harsh midday. High-contrast direct sunlight. Energy, fitness, anything that needs urgency.
Neon. Saturated colour with strong contrast. Beauty, fashion, evening lifestyle.
Candlelight. Warm, low-key, intimate. Premium evening rituals.
Studio softbox. Even, flattering, slightly clinical. Product hero shots.
Practical mixed. Lamps and overhead fixtures combining. Authentic interior register.

Amateur. Good lighting.
Professional. Soft window light from camera-left, golden-hour warmth, slight rim light from a practical lamp behind the talent.

8. Mood and atmosphere

The emotional register the shot should produce. Models respond to mood adjectives.

Cinematic. Composed, considered, premium.
Raw UGC. Slight imperfection, immediate, honest.
Premium. Considered, sparse, confident.
Energetic. Movement, brightness, active.
Intimate. Close, soft, low-key.
Aspirational. Spacious, well-lit, lifestyle-led.

Amateur. A nice video.
Professional. Honest, intimate, slightly low-key, the register of a friend speaking after dinner rather than a presenter on camera.

9. Style and aesthetic

Reference points that anchor the visual treatment. References work well in modern models.

Film grain references (16mm, 35mm, digital flat).
Colour grading references (A24 muted, Wes Anderson pastel, blockbuster teal-and-orange, neutral commercial).
Era references (90s consumer camcorder, 70s Polaroid warmth, 80s saturated film stock).
Mood references (Apple product launch, Aesop interior, Goop-style wellness).

Amateur. A cinematic look.
Professional. A24 muted colour grading, slight 35mm grain, neutral skin tones, no aggressive saturation.

10. Negative prompts

What you do not want to see. Models reach for clichés if you let them. Negative prompts cut the clichés.

Common negative prompts for DTC creative:

No stock-flavoured generic kitchen.
No overly white or sterile lighting.
No product floating unnaturally.
No exaggerated talent expressions.
No on-screen text or captions (those get added in post).
No cinematic dust particles or haze unless requested.
No multiple talents in frame unless specified.
No camera-shake artefacts unless handheld is requested.

Negative prompts are particularly important when generating supplement or skincare content, where models tend to default to over-stylised "miracle moment" treatments.
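
The ten elements can be held as one structured brief so that none gets forgotten. The sketch below is a minimal illustration, not any tool's actual format: `ShotBrief` and its method names are hypothetical, and whether negatives go in a separate field or are appended to the main prompt depends on the tool you use.

```python
from dataclasses import dataclass, field


@dataclass
class ShotBrief:
    """The ten elements from this post, held as one structured brief."""

    subject: str
    action: str
    setting: str
    camera_angle: str
    camera_movement: str
    lens: str
    lighting: str
    mood: str
    style: str
    negatives: list[str] = field(default_factory=list)

    def positive_prompt(self) -> str:
        """Join elements 1-9 in the order listed above, skipping blanks."""
        parts = [
            self.subject, self.action, self.setting, self.camera_angle,
            self.camera_movement, self.lens, self.lighting, self.mood, self.style,
        ]
        cleaned = [p.strip().rstrip(".") for p in parts if p.strip()]
        return ". ".join(cleaned) + "." if cleaned else ""

    def negative_prompt(self) -> str:
        """Join element 10 as a comma-separated list of things to avoid."""
        return ", ".join(n.strip() for n in self.negatives if n.strip())


brief = ShotBrief(
    subject="A 35-year-old woman with shoulder-length brown hair",
    action="She holds the espresso cup with both hands and takes a slow sip",
    setting="A small modern kitchen with a marble countertop",
    camera_angle="Medium close-up, talent at three-quarter angle to camera",
    camera_movement="Slow push-in",
    lens="50mm equivalent, soft background blur",
    lighting="Soft window light from camera-left",
    mood="Honest, intimate, slightly low-key",
    style="A24 muted colour grading, slight 35mm grain",
    negatives=["on-screen text", "sterile white lighting", "exaggerated expressions"],
)

print(brief.positive_prompt())
print(brief.negative_prompt())
```

Making the first nine fields required means a brief cannot be constructed with an element silently left out, which is the failure mode this section describes.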

Cinematography conventions for different content types

Once you have the ten elements, the next layer is knowing the conventions that work for specific content types. Here are the four most common DTC content types and the cinematography conventions that produce strong output for each.

Testimonial

The single most-used content type in DTC paid social. The convention:

50mm portrait lens equivalent
Medium close-up framing
Soft window light from camera-left or right (not direct)
Slight handheld feel for authenticity
Static or very slow push-in
Mid-evening or daytime warmth
Mood: honest, slightly low-key, intimate

A testimonial shot in this register feels like an actual human conversation. The 50mm renders the talent's features with little distortion, which is flattering. The window light reads natural. The slight handheld separates it from a presenter clip.

Product hero

The shot that carries the brand and the product. The convention:

Macro or 85mm
Extreme close-up to medium close-up
Studio softbox or single-source dramatic lighting
Slow dolly or slow rotation
Premium colour grade
Mood: considered, premium, sparse

Product hero shots benefit from the most cinematography effort because the asset has to carry the brand. A poorly lit hero shot is the fastest way to make a £40 product look like a £4 product.

Lifestyle

Product in context. The convention:

35mm natural lens
Medium shot, talent and product both in frame
Golden hour or natural daylight
Gimbal or slow handheld movement
Mood: aspirational, warm, lived-in

Lifestyle shots are where the brand's interior aesthetic lives. The setting carries half the work. Specific environmental details (the marble counter, the linen napkin, the open book) make the shot feel composed without being staged.

Before-and-after

Transition or comparison content. The convention:

50mm portrait lens
Identical framing across both states
Identical lighting across both states (this is the hard part)
Static camera, no movement
Subtle differences in talent expression and tension
Mood: honest, observational

The before-and-after register only works if the framing and lighting are identical. Any change in those reads as cheating, even if the actual product effect is real. Many AI video tools can help hold framing across regenerations using seed values and reference images, which is one of the things that makes them suited to this content type, although the match still needs checking shot by shot.
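
The "identical except for the talent" rule is easy to enforce in code. This sketch builds both requests from one shared brief and then checks that only the talent state differs. The `seed` and `talent_state` fields are hypothetical; not every tool exposes a seed, and the field names will differ where one does.

```python
from copy import deepcopy


def before_after_pair(
    shared: dict, before_state: str, after_state: str, seed: int
) -> tuple[dict, dict]:
    """Build two requests that differ only in the talent state."""
    pair = []
    for state in (before_state, after_state):
        request = deepcopy(shared)  # copy so the two requests share no mutable values
        request["talent_state"] = state
        request["seed"] = seed  # hypothetical field, same value for both states
        pair.append(request)
    return pair[0], pair[1]


def differing_keys(a: dict, b: dict) -> set[str]:
    """Return the keys whose values differ between two requests."""
    return {key for key in a.keys() | b.keys() if a.get(key) != b.get(key)}


shared = {
    "lens": "50mm portrait",
    "framing": "medium close-up",
    "lighting": "soft window light from camera-left",
    "movement": "static camera, no movement",
}
before, after = before_after_pair(
    shared,
    before_state="neutral expression, slight tension in the brow",
    after_state="relaxed brow, faint smile",
    seed=1234,
)

print(differing_keys(before, after))  # {'talent_state'}
```

A shared seed does not guarantee matching output, so the check only proves the two briefs are identical, not the two renders.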

For supplement and skincare brands working in regulated markets, before-and-after content also has compliance considerations. We covered those in detail in our piece on FTC compliance for supplement advertising, and they apply just as much to AI-generated comparisons as to creator-shot ones.
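
The four conventions in this section are stable enough to store as presets. The sketch below is one possible shape, with the values copied from the lists above. `CONVENTIONS` and `apply_convention` are hypothetical names for illustration.

```python
from typing import Optional

# Preset values copied from the four conventions described in this section.
CONVENTIONS = {
    "testimonial": {
        "lens": "50mm portrait equivalent",
        "framing": "medium close-up",
        "lighting": "soft window light from camera-left or right, not direct",
        "movement": "static or very slow push-in, slight handheld feel",
        "mood": "honest, slightly low-key, intimate",
    },
    "product_hero": {
        "lens": "macro or 85mm",
        "framing": "extreme close-up to medium close-up",
        "lighting": "studio softbox or single-source dramatic lighting",
        "movement": "slow dolly or slow rotation",
        "mood": "considered, premium, sparse",
    },
    "lifestyle": {
        "lens": "35mm natural",
        "framing": "medium shot, talent and product both in frame",
        "lighting": "golden hour or natural daylight",
        "movement": "gimbal or slow handheld movement",
        "mood": "aspirational, warm, lived-in",
    },
    "before_after": {
        "lens": "50mm portrait",
        "framing": "identical framing across both states",
        "lighting": "identical lighting across both states",
        "movement": "static camera, no movement",
        "mood": "honest, observational",
    },
}


def apply_convention(
    content_type: str, overrides: Optional[dict[str, str]] = None
) -> dict[str, str]:
    """Return a copy of the preset for a content type, with overrides applied."""
    if content_type not in CONVENTIONS:
        raise ValueError(f"unknown content type: {content_type!r}")
    return {**CONVENTIONS[content_type], **(overrides or {})}


shot = apply_convention("testimonial", {"lighting": "soft window light from camera-left"})
print(shot["lens"])      # 50mm portrait equivalent
print(shot["lighting"])  # soft window light from camera-left
```

Overrides win over the preset, so a brand's house style can narrow a convention without editing the shared table.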

How model-specific dialects matter

You can write the perfect cinematography-enriched brief and still get mediocre results if you send it to the wrong model in the wrong dialect. We covered the model-by-model breakdown in detail in our piece comparing seven AI video models on the same brief, but the dialect summary is worth restating here.

Kling. Comma-separated cinematography terms in a specific order. "50mm portrait, soft window light from camera-left, slow push-in, A24 muted grade, MCU, slight handheld feel."

Veo. Flowing prose with native audio direction. "A medium close-up of a 35-year-old woman holding an espresso cup. The camera pushes in slowly across the line. Soft window light from her left, mid-evening warmth in the colour temperature. Her voice softens on the third sentence."

Sora. Vivid action verbs and present-tense phrasing. "She lifts the espresso cup. Light catches the rim. Steam rises briefly. The camera presses in. She speaks quietly, looks down, glances back."

Hailuo. Compact, declarative briefs. "Woman, 35, espresso, kitchen, soft window light, push-in, intimate testimonial register."

Grok. Aesthetic references and style adjectives. "An A24-graded testimonial moment. Soft, intimate, considered. Espresso cup, hands, slight blur."

The same enriched cinematography brief, translated correctly into each dialect, will produce coherent results across all of them. Translated incorrectly, results vary wildly. This is mechanical work that nobody enjoys doing manually.

The shortcut: orchestration platforms that do this automatically

Cinematography prompting is a real skill, and getting good at it will lift the output of any AI video tool you use. There is no substitute for understanding the ten elements, the conventions, and the dialects, especially if you are running enough creative volume to justify learning the discipline.

That said, most DTC creative teams should not be spending their senior people's time on prompt engineering. The work is repetitive. The cinematography conventions are stable. The dialects are mechanical. All of it is exactly the kind of work that platform software is good at and humans are bad at.

This is what cinematography enrichment does at the platform level. The user writes a brief in plain English describing the content type, the talent, and the brand voice. The platform applies the appropriate cinematography conventions for that content type. The same enriched brief is then translated into the dialect of whichever model gets routed for the job. The user never has to write "50mm portrait, soft window light from camera-left" if they do not want to.
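
The three stages described above (enrich, route, translate) compose into a short pipeline. The sketch below shows the shape only. Every stage is a toy stand-in, the routing rule is invented for illustration, and none of it reflects how any particular platform is implemented.

```python
from typing import Callable

Brief = dict[str, str]


def run_pipeline(
    plain_brief: Brief,
    enrich: Callable[[Brief], Brief],
    route: Callable[[Brief], str],
    translators: dict[str, Callable[[Brief], str]],
) -> tuple[str, str]:
    """Enrich a plain brief, pick a model, and translate for that model."""
    enriched = enrich(plain_brief)
    model = route(enriched)
    if model not in translators:
        raise KeyError(f"no translator registered for model {model!r}")
    return model, translators[model](enriched)


# Toy stages, for illustration only.
def enrich_testimonial(brief: Brief) -> Brief:
    """Fill in testimonial defaults; anything the user wrote takes priority."""
    defaults = {
        "lens": "50mm portrait",
        "lighting": "soft window light",
        "movement": "slow push-in",
    }
    return {**defaults, **brief}


def route_by_audio(brief: Brief) -> str:
    """Invented rule: briefs that need audio go to the prose-style model."""
    return "prose_model" if brief.get("needs_audio") == "yes" else "terms_model"


def as_terms(brief: Brief) -> str:
    return ", ".join([brief["lens"], brief["lighting"], brief["movement"], brief["subject"]])


def as_prose(brief: Brief) -> str:
    return (
        f"{brief['subject']}, filmed on a {brief['lens']} lens "
        f"with {brief['lighting']} and a {brief['movement']}."
    )


translators = {"terms_model": as_terms, "prose_model": as_prose}

model, prompt = run_pipeline(
    {"subject": "woman, 35, espresso, kitchen"},
    enrich_testimonial,
    route_by_audio,
    translators,
)
print(model)   # terms_model
print(prompt)  # 50mm portrait, soft window light, slow push-in, woman, 35, espresso, kitchen
```

Passing the stages in as functions keeps the control question explicit: any stage can be swapped for a manual one when a hero piece needs it.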

The trade-off is the trade-off of any abstraction layer. You give up some control to get back time. For DTC creative teams running ten or more assets per week, the time-back is dramatically more valuable than the control given up, because the cinematography conventions are well-understood and consistent.

For one-off hero pieces where the cinematography is the creative choice, manual prompting still wins. For the volume layer, automation wins.

Cinematography as a skill versus a platform feature

The right way to think about cinematography in 2026 is that it is now a layer in the stack, not a manual craft. Either you do it yourself, or your tooling does it for you. The output quality difference between a cinematography-aware pipeline and a generic one is enormous. The output quality difference between manual cinematography and automated cinematography in a well-built pipeline is small.

If you have the time and interest to learn the discipline yourself, the ten elements and the conventions in this post are roughly the working framework. Start with the testimonial convention because it is the highest-volume content type and the easiest to get right. Move to product hero next. Lifestyle and before-and-after come after.

If you would rather have the framework applied automatically, that is what platform-level cinematography enrichment exists to do. Either path produces better results than running unenriched generic prompts through a horizontal AI video tool. The wrong path is to keep doing what most teams are doing in 2026, which is generating mediocre output from good models because the briefs are missing seven or eight of the ten elements that matter.
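
A quick way to audit your own briefs is to count which of the ten elements they actually contain. This is a minimal sketch, assuming briefs are stored as dictionaries keyed by element name. The key names are hypothetical.

```python
# The ten elements from this post, as hypothetical dictionary keys.
ELEMENTS = (
    "subject", "action", "setting", "camera_angle", "camera_movement",
    "lens", "lighting", "mood", "style", "negatives",
)


def _is_blank(value) -> bool:
    """Treat None, empty or whitespace-only strings, and empty lists as blank."""
    if isinstance(value, str):
        return not value.strip()
    return not value


def missing_elements(brief: dict) -> list[str]:
    """Return the element names that are absent or blank, in canonical order."""
    return [name for name in ELEMENTS if _is_blank(brief.get(name))]


amateur = {"subject": "a woman", "action": "drinking coffee", "setting": "a kitchen"}
missing = missing_elements(amateur)
print(f"{len(missing)} of {len(ELEMENTS)} elements missing: {', '.join(missing)}")
# 7 of 10 elements missing: camera_angle, camera_movement, lens, lighting, mood, style, negatives
```

The check only confirms that an element is present, not that it is specific enough, so "good lighting" would still pass.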

For brands doing £5M+ in annual revenue, a guided walkthrough shows how the enrichment performs against your real briefs. For everyone else, the free tier with 50 welcome credits is enough to compare enriched-and-translated against your current pipeline.