Traditionally, creative software has served primarily in the final stages of refinement and production. One reason for this is language: we had to translate our creative intent into tedious sequences of low-level, machine-readable parameters such as pixel coordinates and hex codes. Generative models have changed this. Instead of manipulating these low-level parameters, we can now express intent naturally, across various modalities—"what would this picture look like at evening time?" or "make this video match the style of these images." This shift enables software to move beyond production tools to become instruments of creative exploration.
However, navigating these vast latent spaces presents new challenges in interface design:
- How can we best visualize these high-dimensional spaces to help users discover interesting regions to explore?
- How can we enable precise user control while still leaving room for serendipitous moments of discovery?
- How can we support the non-linear nature of creative exploration, enabling both divergent and convergent thinking?
We recently shared our philosophy around interface design for this new era of media, and today, we're excited to present a prototype that explores these questions through video keyframing.
Graph Structure: A Window into Latent Space
The graph structure is the foundation of the prototype. Images are represented as nodes, serving as waypoints in the model's latent space. Connecting one node to another creates an edge: a video that transitions from the first node's image (the first frame) to the second node's image (the last frame) across latent space and time.
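To make the structure concrete, here is a minimal sketch of how such a graph could be modeled. It is not the prototype's actual implementation: the class names, the `image_ref` and `video_ref` fields, and the file names are all hypothetical stand-ins for whatever handles a real system would use.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Node:
    """An image acting as a waypoint in latent space."""
    node_id: str
    image_ref: str  # hypothetical handle: a path, URL, or asset ID


@dataclass(frozen=True)
class Edge:
    """A video that transitions from one node's image to another's."""
    start: str      # node_id of the first frame
    end: str        # node_id of the last frame
    video_ref: str  # hypothetical handle to the generated video


@dataclass
class Graph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node_id: str, image_ref: str) -> Node:
        node = Node(node_id, image_ref)
        self.nodes[node_id] = node
        return node

    def connect(self, start: str, end: str, video_ref: str) -> Edge:
        # An edge only makes sense between two existing image nodes.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        edge = Edge(start, end, video_ref)
        self.edges.append(edge)
        return edge

    def outgoing(self, node_id: str) -> List[Edge]:
        """All transitions that begin at the given node."""
        return [e for e in self.edges if e.start == node_id]


graph = Graph()
graph.add_node("noon", "noon.png")
graph.add_node("evening", "evening.png")
graph.add_node("night", "night.png")
graph.connect("noon", "evening", "noon_to_evening.mp4")
graph.connect("noon", "night", "noon_to_night.mp4")
```

In this sketch, a node with more than one outgoing edge (here, "noon") is a point where the exploration branches into alternatives.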
Balancing Control and Serendipity
Precise controls help narrow the vast space of possibilities, but variation and unpredictability can produce "happy accidents": possibilities we would not have considered under precise control. To balance this tradeoff, we provide two affordances that let users manipulate images in a "relational" manner, where each keeps one aspect of the source image consistent while allowing unpredictability in another.
Users can transform selected images in two ways: “Image to Image,” which preserves the original composition while altering the style via a text prompt, and “Image Variations,” which maintains the original style while varying the composition.
Supporting Non-Linear Exploration
Creative exploration rarely follows a straight line. The graph structure naturally affords exploration by allowing users to diverge at various points, creating new forks of possible alternatives. As more exploration occurs, the graph grows naturally, tracking various experimental paths.
This allows users to construct non-linear timelines. We provide a sequencer that lets users export these non-linear timelines as a video with a linear timeline, similar to a “choose your own adventure” experience.
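One way to picture the sequencing step is as flattening a single chosen route through a branching graph into an ordered list of clips. The sketch below assumes exactly that; the edge table, the `sequence_path` function, and the file names are hypothetical and only illustrate the idea, not how the prototype's sequencer works internally.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical edge table: (start node, end node) -> video clip for that transition.
EdgeTable = Dict[Tuple[str, str], str]


def sequence_path(edges: EdgeTable, path: Sequence[str]) -> List[str]:
    """Return the clips to play, in order, for one chosen route through the graph."""
    clips = []
    # Walk consecutive pairs of nodes along the route.
    for start, end in zip(path, path[1:]):
        if (start, end) not in edges:
            raise ValueError(f"no edge from {start!r} to {end!r}")
        clips.append(edges[(start, end)])
    return clips


edges = {
    ("noon", "evening"): "noon_to_evening.mp4",
    ("noon", "night"): "noon_to_night.mp4",  # a fork: an alternative branch from "noon"
    ("evening", "night"): "evening_to_night.mp4",
}

# Choosing the route noon -> evening -> night yields a linear timeline of two clips.
timeline = sequence_path(edges, ["noon", "evening", "night"])
```

Picking a different route through the same graph, such as going from "noon" directly to "night", would produce a different linear video from the same set of explorations.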
An Open Workspace
Beyond the graph structure, we do not impose any organizational constraints on the workspace. Users have complete freedom to organize their nodes and edges, clustering related explorations or separating distinct creative experiments as their process demands.
Exploring Further
Our prototype demonstrates how creative interfaces can evolve in the age of generative media. The graph structure provides a way to navigate latent space, treating images as waypoints and transitions as paths through creative possibilities.
By continuing to experiment with new interface primitives, we can realize the full potential of generative models not just as production tools but as active partners in the creative process, expanding our ability to discover and explore new possibilities.