Requires Premium.
How to use it
- Open the video generator and start or open a Session
- Select Text-to-Video
- Write a prompt describing the subject, setting, and motion
- Select a resolution
- Click Generate
Writing a prompt for Text-to-Video
A Text-to-Video prompt needs to cover both the scene and the motion in one description. Structure it as: subject → setting → action → camera.
Good: a woman with long dark hair, standing in a sunlit bedroom, turning slowly toward camera, slow zoom in, soft natural light
Bad: beautiful woman, high quality, cinematic — no motion, no setting, no camera direction.
Without a first frame, the model makes its own composition decisions. The more specific your prompt, the less the model improvises.
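The subject → setting → action → camera order can be made mechanical with a small helper. This is a hypothetical sketch, not part of the video generator: `VideoPrompt` and its fields are illustrative names, and the result is just the plain comma-separated string you would paste into the prompt box. Fed the parts of the "Good" example above, it reproduces that prompt.

```python
from dataclasses import dataclass


@dataclass
class VideoPrompt:
    """Hypothetical helper holding the parts of a Text-to-Video prompt."""
    subject: str
    setting: str
    action: str
    camera: str
    extras: tuple[str, ...] = ()  # optional descriptors, e.g. lighting

    def build(self) -> str:
        # Join in the recommended order: subject -> setting -> action -> camera,
        # then any extras; empty parts are skipped so no ", ," gaps appear.
        parts = [self.subject, self.setting, self.action, self.camera, *self.extras]
        return ", ".join(p.strip() for p in parts if p.strip())


prompt = VideoPrompt(
    subject="a woman with long dark hair",
    setting="standing in a sunlit bedroom",
    action="turning slowly toward camera",
    camera="slow zoom in",
    extras=("soft natural light",),
).build()
print(prompt)
```

Keeping each part in its own field makes a missing element (for example, no camera direction) easy to spot before you spend tokens on a generation.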
See Video Prompting for a full breakdown of motion and camera terms.
When to use Text-to-Video vs Image-to-Video
| | Text-to-Video | Image-to-Video |
|---|---|---|
| Speed | Faster to start | Requires a first frame |
| Composition control | Low — model decides | High — you set the frame |
| Character consistency | Variable | Consistent with first frame |
| Best for | Quick experiments, abstract scenes | Specific characters, precise shots |
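The table boils down to a simple rule of thumb. The sketch below encodes it as a hypothetical helper (`choose_mode` and `Mode` are illustrative names, not product features): pick Image-to-Video when you need a specific character or a precise shot, and Text-to-Video otherwise.

```python
from enum import Enum


class Mode(Enum):
    TEXT_TO_VIDEO = "Text-to-Video"
    IMAGE_TO_VIDEO = "Image-to-Video"


def choose_mode(needs_consistent_character: bool, needs_precise_composition: bool) -> Mode:
    """Hypothetical rule of thumb distilled from the comparison table.

    Image-to-Video gives consistent characters and full composition control;
    Text-to-Video is faster to start and suits quick experiments.
    """
    if needs_consistent_character or needs_precise_composition:
        return Mode.IMAGE_TO_VIDEO
    return Mode.TEXT_TO_VIDEO


print(choose_mode(False, False).value)  # quick experiment
print(choose_mode(True, False).value)   # specific character
```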
Token cost
Text-to-Video uses tokens. See /premium for current rates.
Common issues
The character doesn't look consistent across the clip
Text-to-Video has no reference frame, so character appearance can drift. If consistency matters, switch to Image-to-Video with a generated first frame.
There's no visible motion in the output
The prompt doesn’t describe any movement. Add explicit motion terms to the prompt, such as slow pan, walking forward, hair moving, or camera pull back.
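Before generating, you can catch motionless prompts with a quick keyword check. This is a rough sketch under a loud assumption: `MOTION_TERMS` is an illustrative list built from the examples on this page, not a list of terms the model is known to recognize, and a keyword match does not guarantee visible motion in the output. `has_motion` and `ensure_motion` are hypothetical names.

```python
import re

# ASSUMPTION: illustrative motion vocabulary drawn from the examples on this
# page. It is not an official list of terms the model recognizes.
MOTION_TERMS = ("pan", "zoom", "pull back", "walking", "turning", "moving")


def has_motion(prompt: str) -> bool:
    """Return True if the prompt mentions at least one example motion term."""
    text = prompt.lower()
    # Word boundaries stop "pan" from matching inside words like "panorama".
    return any(re.search(rf"\b{re.escape(term)}\b", text) for term in MOTION_TERMS)


def ensure_motion(prompt: str, default_motion: str = "slow pan") -> str:
    """Append a fallback motion term when the prompt describes no movement."""
    return prompt if has_motion(prompt) else f"{prompt}, {default_motion}"


print(ensure_motion("beautiful woman, high quality, cinematic"))
```

A fallback like this only guarantees that the prompt mentions motion; choosing a movement that fits the scene is still up to you.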
The composition is not what I expected
Without a first frame, the model interprets composition freely. Add camera and framing terms to your prompt, such as close-up, mid-shot, wide angle, or overhead, or switch to Image-to-Video for full control.
Image-to-Video
Animate a generated first frame for more control.
Video Prompting
Motion and camera terms that work.