Generative AI is opening new possibilities for creating and transforming video in real time. This talk explores how recent models such as StreamDiffusion and LongLive push diffusion techniques into practical use for low-latency video generation and transformation. The speaker gives a deep technical walkthrough of how these systems can be adapted for streaming use cases, unpacking the full pipeline, from decoding through the diffusion process to encoding, and highlighting the optimization strategies, such as key-value (KV) caching, that make interactive generation possible. The talk also weighs the trade-offs between ultra-low-latency video transformation and generating longer, more coherent streams. To make this concrete, the speaker presents demos of StreamDiffusion (served with the open-source cloud service Daydream) and LongLive (explored with the open-source research tool Scope), showcasing practical examples of both video-to-video transformation and streaming text-to-video generation.

