The Turing Reel
Can you tell real video from AI? We showed 1,000 people two videos from the same frame - one real, one generated. Fewer than 10% could reliably tell the difference. Try it yourself.
Thursday, January 22
Evaluating Recognition of AI-Generated Content
AI video generation models have improved exponentially since we released Gen-2, the first publicly available text-to-video model, in early 2023. Two years ago, these models took several minutes to generate choppy, pixelated clips that were a few seconds long. Today, leading video generation models can reliably produce outputs that are virtually indistinguishable from real video.
This week, we released image-to-video capabilities for Gen-4.5, our latest base model. Today, we're publishing new research evaluating people's ability to determine whether a five-second video is real or was generated by our model. We're also launching a new site where anyone can try the test for themselves.
For this research study, we recruited a random sampling of 1,043 participants. Each participant viewed 20 videos (10 real, 10 generated) in randomized order and judged whether each was real or AI-generated. Each video was generated only once – the outputs were not edited, and no video was regenerated to improve quality or skew results.
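As a rough illustration of this trial structure (not Runway's actual study code), a session could be assembled as in the sketch below; the file names and the build_session helper are hypothetical, and only the 10 real / 10 generated split and the randomized order come from the study description.

```python
import random

def build_session(real_clips, generated_clips, n_per_class=10, seed=None):
    """Assemble one participant's session: 10 real + 10 generated clips, shuffled.

    real_clips / generated_clips are lists of file paths (placeholder names);
    each trial keeps its ground-truth label so responses can be scored later.
    """
    rng = random.Random(seed)
    trials = [{"path": p, "label": "real"} for p in rng.sample(real_clips, n_per_class)]
    trials += [{"path": p, "label": "generated"} for p in rng.sample(generated_clips, n_per_class)]
    rng.shuffle(trials)  # randomized presentation order
    return trials

# Example usage with placeholder file names
real = [f"real_{i:02d}.mp4" for i in range(50)]
generated = [f"gen_{i:02d}.mp4" for i in range(50)]
session = build_session(real, generated, seed=42)
print(session[:3])
```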
Results
Over 90% of participants could not reliably distinguish Gen-4.5 outputs from real video.
Only 99 of 1,043 participants (9.5%) achieved statistically significant accuracy (≥15/20 correct, p < 0.05, binomial test). Overall detection accuracy was 57.1%, only slightly above chance. Performance was similar on real (58.0%) and generated (56.1%) videos, indicating no systematic detection strategy.
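The ≥15/20 threshold follows from a one-sided binomial test against chance (p = 0.5): 15 correct is the smallest score whose tail probability falls below 0.05. A minimal check (our own illustration, using only the standard library):

```python
from math import comb

def binomial_p_value(k, n=20, p=0.5):
    """One-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 15/20 correct is the smallest score significant at p < 0.05
print(round(binomial_p_value(15), 4))  # ~0.0207 -> significant
print(round(binomial_p_value(14), 4))  # ~0.0577 -> not significant
```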
Detection accuracy varied by content category. Human-related videos (faces, hands, actions) were easier to detect (58-65%), while animals and architecture fell below chance (45-47%) – participants were more likely to mistake generated videos for real than vice versa.
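Per-category accuracy of this kind is a straightforward aggregation over individual judgments. The sketch below shows one way to compute it with pandas; the response records and column names are hypothetical, not the study's actual data format.

```python
import pandas as pd

# Hypothetical per-response records: one row per (participant, video) judgment.
responses = pd.DataFrame([
    {"participant": 1, "category": "faces",   "truth": "generated", "judgment": "generated"},
    {"participant": 1, "category": "animals", "truth": "generated", "judgment": "real"},
    {"participant": 2, "category": "faces",   "truth": "real",      "judgment": "real"},
    # ... one row per judgment in the full dataset
])

responses["correct"] = responses["truth"] == responses["judgment"]
accuracy_by_category = responses.groupby("category")["correct"].mean()
print(accuracy_by_category)  # categories below 0.5 are judged worse than chance
```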
These findings represent a fundamental shift in how we should think about video authenticity. For years, we've been building toward General World Models. Realistic simulation is a prerequisite for solving hard problems in the physical world. Gen-4.5 is the most capable simulator we've built yet. But that capability comes with responsibility. When 90% of people cannot reliably distinguish synthetic from real footage—and when generated content in certain categories is more convincing than reality—detection is an inadequate strategy for trust and verification.
Conclusions
Video generation models will continue their exponential improvement, assuming we continue to scale training data and compute. The AI industry and society at large have reached a tipping point, where the average person cannot determine if a video is generated by AI or not.
From photography to Photoshop to traditional CGI, technology has consistently shifted public opinion on what makes a piece of content "real." As AI models continue to improve, we expect another, similar shift. We believe that foundational model developers, including Runway, have a responsibility to drive public conversation around the quality of model outputs, and to explore how we can mitigate the societal challenges this technology will introduce while continuing to push the boundaries of AI research and innovation.
All Runway-generated outputs include C2PA metadata, allowing us to certify the origin and provenance of the content our models produce. This open technical standard is embraced by a wide variety of media companies and news organizations, but it is not infallible. We need to build new, more capable standards that preserve trust while enabling creative possibility. That requires technical solutions like C2PA, but also new literacies, updated editorial standards and ongoing dialogue about authenticity.
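As a hedged sketch of what provenance checking can look like in practice (our illustration, not Runway's verification tooling), the open-source c2patool CLI from the Content Authenticity Initiative can read a file's C2PA manifest. Invoking it from Python might look like the following, assuming c2patool is installed and that running it with a bare file path prints the manifest store as JSON; flags and output format may vary by version, and the file name is a placeholder.

```python
import json
import subprocess

def read_c2pa_manifest(path):
    """Run the c2patool CLI on a media file and parse its manifest output.

    Assumes `c2patool <file>` prints the C2PA manifest store as JSON;
    exact flags and output format may differ across c2patool versions.
    """
    result = subprocess.run(["c2patool", path], capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"c2patool failed: {result.stderr.strip()}")
    return json.loads(result.stdout)

manifest = read_c2pa_manifest("generated_clip.mp4")  # placeholder file name
print(json.dumps(manifest, indent=2))
```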
Moving forward, we're committed to three principles: maintaining transparency about our models' capabilities, collaborating with industry partners on verification standards and engaging directly with creators, enterprises and policymakers to establish new norms for synthetic media.
Methodology
Source videos were sampled from Filmpac across five content categories: faces, full-body human motion, animals, nature scenes and urban environments. For each category, we selected examples representative of content people often aim to generate. The first frame of each video was extracted and used as input to Gen-4.5 with default settings. Each video was generated once, with no regeneration or post-processing. Real and generated clips were trimmed to five seconds and matched in resolution. Participants could view each video for up to 10 seconds before making their judgment. Participants who achieved at least 75% accuracy (≥15/20 correct, p < 0.05, binomial test) were classified as successful detectors.
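To make the preparation step concrete, here is a minimal sketch of extracting a video's first frame for image-to-video conditioning. OpenCV is our choice of library for illustration; the post does not describe Runway's internal tooling, and the file paths are placeholders.

```python
import cv2

def extract_first_frame(video_path, frame_path):
    """Read the first frame of a video and save it as an image file."""
    capture = cv2.VideoCapture(video_path)
    ok, frame = capture.read()  # first decoded frame
    capture.release()
    if not ok:
        raise RuntimeError(f"Could not read a frame from {video_path}")
    cv2.imwrite(frame_path, frame)

# Placeholder paths; the saved frame would then serve as the image input to the model.
extract_first_frame("source_clip.mp4", "first_frame.png")
```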