What is Remotion? The Claude Code skill everyone's talking about
Pierson Marks (00:00)
Welcome to Creative Flux, everybody. This is episode 28 of the podcast where we talk about generative media: everything from AI video, audio, and images to world models and text to speech. Last week we talked about Ralph Wiggum and Claude Cowork and some of that stuff.
It was a little bit more of a coding episode than normal. This week we're going to focus more on the video side, in a similar vein to last week, where you can prompt your way to code using Ralph Wiggum and Claude Code and all that stuff. We're going to focus on motion graphics and video, because this went super viral this week and everybody's making really cool videos right now.
Bilal Tahir (00:38)
Yeah, yeah, I'm very excited for this, because the tech behind this, Remotion, is near and dear to our hearts; we use it extensively at JellyPod. It's such a cool piece of software. Even before AI, just the idea that you can program video with code is so...
Pierson Marks (00:49)
Yeah.
Bilal Tahir (00:58)
It's like, wow, I can just write code and get a video right there. It's the closest thing to a nerd being a director, you know? Even before AI video. So yeah.
Pierson Marks (00:59)
Right.
I mean, okay, let's just jump into it. Let's talk
about what Remotion is. So Remotion is a library that allows you to render React components into videos and animate them over time. At a very high level: when you're programming and building components on your front end, you have boxes, dividers, text, images; you have components.
And these components are what's rendered on your screen. Most of the time on a website, you just have text in the middle, your hero text, header images, and those are static; you can interact with the website, but they're just static components. What Remotion said was: okay, we have these static components, what if we could animate those components over time? So you can have text appear and disappear at specific frames.
What they did was build a library that allows you to do this very easily: you define React components just like you're building a website, and then you can render those components and create videos. It's been around for a few years now, built by a small team over in Europe, and it's really cool.
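To make that concrete, here's a minimal sketch of the pattern: a React component that reads the current frame and derives its styling from it, plus the composition registration that gives it dimensions, a frame rate, and a duration. The component name, sizes, and timings are invented for illustration; the imports are Remotion's core APIs.

```tsx
import React from "react";
import { AbsoluteFill, Composition, interpolate, useCurrentFrame } from "remotion";

// A plain React component. Remotion tells it which frame is being
// rendered, and the component derives all styling from that number,
// so every frame is a pure function of time.
const HelloTitle: React.FC = () => {
  const frame = useCurrentFrame();
  // Fade the title in over the first 30 frames (1 second at 30 fps).
  const opacity = interpolate(frame, [0, 30], [0, 1], {
    extrapolateRight: "clamp",
  });
  return (
    <AbsoluteFill style={{ justifyContent: "center", alignItems: "center" }}>
      <h1 style={{ fontSize: 120, opacity }}>Hello, Remotion</h1>
    </AbsoluteFill>
  );
};

// Registering the component as a composition is what makes it renderable
// as a video: 150 frames at 30 fps is a 5-second, 1080p clip.
export const Root: React.FC = () => (
  <Composition
    id="HelloTitle"
    component={HelloTitle}
    durationInFrames={150}
    fps={30}
    width={1920}
    height={1080}
  />
);
```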
Bilal Tahir (02:17)
It's so cool. And I love the insight. It was such a simple but brilliant insight that Jonny, the creator of Remotion, had. On a lower level, the key insight is that when people think React, they think web framework, but React is not just a web framework. It's a rendering engine. So you can take React and apply it to so many things besides the web. There's actually a repo, I think, collecting all these non-web use cases for React. Remotion is obviously one; his insight was, oh, you can take the React diffing algorithm and apply it to video. And React is known to all web developers, so it's such an easy way to get introduced to this, versus a completely new framework where you have to learn the API, et cetera. And it's declarative. You just say, this component comes in at this frame and it moves out.
React handles the rest. The other one I love is React Email, a similar insight from the people at Resend, which is an email service. They built an open source package where you can write React components for your email: this is my header, this is my paragraph, this is my image, just React code, and it builds beautiful email templates. Really powerful insight, and I encourage people to check out all those other repos.
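For comparison, a React Email template looks roughly like this; the component names come from the @react-email/components package, while the copy and URL are made up:

```tsx
import React from "react";
import { Html, Head, Body, Container, Heading, Text, Button } from "@react-email/components";

// The same declarative idea applied to email: these components compile
// down to the table-based HTML that email clients expect.
export const WelcomeEmail: React.FC = () => (
  <Html>
    <Head />
    <Body style={{ fontFamily: "sans-serif" }}>
      <Container>
        <Heading>Welcome aboard</Heading>
        <Text>Thanks for signing up. Your account is ready.</Text>
        <Button href="https://example.com/login">Get started</Button>
      </Container>
    </Body>
  </Html>
);
```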
But coming back to Remotion: as you said, it's just components. You define components: this is an image component, this is a text component, this is an effect component. These are the frames when it comes in, and you get a video. And it's just code. And I remember...
Pierson Marks (03:59)
Right.
Bilal Tahir (04:01)
For me, when I started getting into Remotion, it was so cool, but it was also frustrating, because for a 30-second video I had to write so much code; you have to take care of so much. And when the ChatGPT moment happened in late 2022, literally one of my first thoughts was: this is going to change Remotion, because then I don't have to write all this Remotion code. Because Remotion is declarative and LLMs can write code, you can finally just say in natural language: hey, take this image, let it come in on this frame from this corner and get it out on frame 60, and it'll do it. And it'll do it in a very deterministic, predictable way, because it's just writing code. So it's different from the AI-generated video we've talked about, where you might say, give me a cat that jumps, but the cat is not the cat you imagined. If you have the exact cat, the image, you can program that and get a very deterministic video. Now, obviously it doesn't work for all kinds of scenes, but for some it does. Motion graphics launch videos are the ones that make the most sense, right, because it's just text and captions and cool effects. We'll see other use cases, but it's just such an easy way to build those kinds of videos, and I'm glad this is catching on. It started with Remotion tweeting out a skills package you can use with Claude, and it blew up to 7 million views as of this recording.
Pierson Marks (05:35)
Don't worry.
Right, it's really cool. One of the things I really want to emphasize: there's another package we've talked about, Satori, that lets you create images from React components. Remotion is kind of like that for video, in a sense; you still have to render it, but it's the parameterization and the ability to explicitly define components. Whereas if you go to Sora or
Bilal Tahir (05:45)
Yeah.
Pierson Marks (06:02)
Veo 3 and you want to create a video that has text moving across the screen, that's not deterministic. You could say: use Times New Roman font with this text and move it from left to right across the screen, and it might work perfectly, but the font might be a little bit off, and the next generation might be slightly wrong. So you're letting a non-deterministic video model do something that could be done deterministically. I think there's a hybrid approach, and we've been talking about this for eight episodes: let the LLM define a skeleton structure and then fill in the things that need to be deterministic with code. So you can have layers in Remotion where you have the actual text moving across the screen.
Bilal Tahir (06:45)
Hmm.
Pierson Marks (06:55)
The LLM can create the video structure and then let code handle the deterministic parts. Yeah. All right.
Bilal Tahir (06:58)
No, that's such a powerful idea. Agreed, 100%. I wonder if somebody, I mean, I feel like somebody will do it. It's almost like you need a router. Right now you have to decide: I want to use Remotion, set that up. But I wonder if somebody just builds a smart tool, a router, which automatically takes your prompt and says: okay, this part makes sense as deterministic code, the Times New Roman text, but I'm going to generate an AI video background because I want something nice and cool. And you combine the two. So maybe somebody does that.
Pierson Marks (07:27)
I mean, this is like the MCP that we released at JellyPod, the Satori MCP server. Some of the templates we have there are like: you can use an image model to generate a background and then use Satori to layer text on top of it. So if you want to create a blog image for your blog article, you want the background to be San Francisco, Golden Gate Bridge, blah, blah, blah, but you want it to say "San Francisco" on top, with the date in a specific color. Let the background image be generated by the model, let the LLM generate the text that lays on top of it, and then you combine them together and you have a single image where the text is deterministic. It's the best of both worlds.
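A rough sketch of that layering pattern with Satori, which turns JSX into SVG. The background URL, font file, and copy are placeholders; the satori(element, options) call and its fonts option are the library's actual API.

```tsx
import satori from "satori";
import { readFile } from "node:fs/promises";

// Satori needs real font data to lay out text deterministically.
const fontData = await readFile("./fonts/Inter-Bold.ttf");

// An AI-generated background image with deterministic text on top.
const svg = await satori(
  <div style={{ display: "flex", position: "relative", width: 1200, height: 630 }}>
    {/* Background produced by an image model ahead of time (placeholder URL) */}
    <img src="https://example.com/golden-gate.png" width={1200} height={630} />
    {/* Deterministic overlay: exact font, size, color, and position */}
    <div style={{ position: "absolute", bottom: 40, left: 40, color: "white", fontSize: 64 }}>
      San Francisco · Jan 2026
    </div>
  </div>,
  {
    width: 1200,
    height: 630,
    fonts: [{ name: "Inter", data: fontData, weight: 700, style: "normal" }],
  }
);
// `svg` is an SVG string; rasterize it (e.g. with resvg) to get a PNG.
```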
Bilal Tahir (08:20)
Exactly. I mean, there are so many cool uses. I saw one person do this with captions, but fun captions: the normal captions coming in, but then when the speaker started yelling, the captions went boom, like a Batman-type "pow" effect. And it was all dynamic. I was like, that's so cool; there are all kinds of fun effects you can now create using this approach.
It'll be very interesting to see where people take it. I do want to talk about how this happened, though, because it's funny how this tech works. Go back to 2022 and ChatGPT. The story, and I don't know how many people know this, is that the model behind ChatGPT was actually available about six months before that. Some people were talking about it, like, wow, GPT-3.5 is awesome, very cool. But that was just the model, and it was just developers using it. Then somebody, as a random project, said: what if we put a chat interface on it and just release it? They did that in November, and the rest is history. It was probably one of the most insane viral moments ever; within two weeks, everyone was talking about it.
Now, Remotion: similar story. We've all known you can create these predefined Lego blocks and put them together. People have done that; people have built abstraction layers and products on top. There are so many products that are basically built on Remotion. But...
What happened, and it makes sense why it happens now: December of 2025, the Opus 4.5 moment we talked about, where everyone discovers vibe coding. Opus 4.5 is amazing, agentic is in, Ralph Wiggum is happening. And we've talked about skills before, which is basically: you create these SKILL.md markdown documents and you give them to your agent, and suddenly your agent understands the API way better, so it can do a way better job. Then Vercel releases npx skills, which is basically a one-line command for adding these skills, so everyone starts creating their own. And then this week Remotion says: okay, we'll create skills, because they have a ton of documentation; there are crazy, all kinds of things you can do. Remotion is actually one of those few APIs I've found that LLMs struggle with, because it changes so much. The API has changed a lot, and there are so many community packages you can add on which the model may not know: 3D, skew, morphism, blah, blah. So they create these extensive SKILL.md files, just markdown files, put them behind a one-line command, and the tweet just blows up. And I wonder if this is like the ChatGPT moment, where something that's been possible for months, maybe at least a year, suddenly has people going: wow, I can create motion graphics just by chatting with Claude. It'll be very interesting. I hope it does blow up for the Remotion guys.
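Skills are plain markdown files with a small metadata header that the agent reads before writing code. The file below is an invented sketch of the format, not Remotion's actual skill; the frontmatter fields follow the published Agent Skills convention of a name plus a description telling the agent when to use it.

```markdown
---
name: remotion
description: Guidance for writing Remotion compositions. Use when creating or editing programmatic videos.
---

# Remotion

- Drive all animation from `useCurrentFrame()` and `interpolate()`; never use CSS transitions,
  because every frame must render deterministically.
- Place timeline segments with `<Sequence from={...} durationInFrames={...}>`.
- Register every composition in the root file with an id, dimensions, fps, and duration.
```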
Pierson Marks (10:58)
Totally.
Bilal Tahir (11:00)
I love them, but they are not the best when it comes to product and marketing. They're true nerds at heart. They should be making so much money. I mean, they do fine, but compared to the billion-dollar companies standing on their product, the fact that they don't get a slice of that is kind of tragic. So maybe this is what changes that.
Pierson Marks (11:21)
Maybe
they'll change their licensing model, and maybe we should...
Bilal Tahir (11:25)
I just think licensing is always a dumb model, in my opinion, at least for this. You should just build things, and they did with the Remotion editor starter; they had an editor starter package, and we supported them with that at JellyPod. I think they should just build a managed player, and I know they've resisted that because they want to stay pure and all that, blah, blah, blah. But at the end of the day, that's how you support an open source product: you build something managed, a hosted kind of offering. So they should totally build an agentic Remotion chat interface, because they understand Remotion better than anyone. They built it. Just build it, you know?
Pierson Marks (11:58)
Yeah,
they should. Create the video agent, the Remotion video agent, where anyone can come in, give it some assets, and create the thing. Totally, they should. And charge $20 a month for users to create videos, rendered locally on your device or in the cloud on Lambda.
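For reference, this is roughly what rendering looks like with Remotion's CLI today; the composition ID, output path, and Lambda site URL below are placeholders:

```bash
# Render a composition from the current project to an MP4 locally.
npx remotion render HelloTitle out/video.mp4

# Or render in the cloud with @remotion/lambda, given a deployed site URL.
npx remotion lambda render https://remotion-example.s3.amazonaws.com/sites/my-site HelloTitle
```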
Bilal Tahir (12:14)
Yeah.
Yeah, it'll be interesting
to see how the Higgsfields of the world respond, because right now they're basically LLM wrappers: oh, use Kling 2.1 or Veo. I wonder if they'll now add a programmatic, After Effects-style layer using Remotion under the hood, where you can say, make the lighting darker, and you don't need to generate a whole new five-second video just for that. Well, maybe you would for lighting, but let's say, like we said, making text appear; that's just expensive. Why generate another 50-to-70-cent render, which will probably come out different anyway, when you can just add captions and stuff on the video itself?
Pierson Marks (12:53)
Totally, I completely agree. I think what would be useful right now: I'd love to jump in a little bit and explain how you actually do this for somebody who has never worked with Remotion, never worked with skills, and walk through it really quickly. If you're only listening to this, we'll explain what's going on; but I know a lot of people watch this on YouTube too, and it would be cool to show something created end to end.
Bilal Tahir (13:17)
Yeah, yeah, let's do it.
So basically, this is the Remotion Studio. What you do is create these components, which are like different types of videos; this is the default here. These are some of our templates as well; we created them in Remotion for JellyPod, which you should check out: it's a cool, awesome podcast tool with components where you can do all sorts of fun templates. And we actually have a free tool for that as well, where even if you have a non-JellyPod podcast, you can throw the audio in there and it'll generate viral, scroll-stopping clips. Anyway, enough shilling. So you create these components, then you can play them, and then you can just talk to Claude Code and be like, hey, change that. So this was something I made yesterday: a JellyPod launch video that goes through some of our features. Maybe we'll play it. Should I just play the whole thing?
So yeah, that was it.
So basically, you start with the concept. Ours is: we have all these features at JellyPod. You can create hosts, clone your voice, create a podcast. We can also write documents using your own writing style, and then images. So I wanted a cohesive journey through all the things you can do. And you can see that in Remotion you can create these amazing infographics, like a dialogue box: you click a button, it shows a podcast, an audiogram, et cetera. So it's really cool that you can create all these things. And because we're in our own code base, that's another advantage: you can just tell Claude, oh, use some of our own assets there, use a similar color, et cetera, and add stuff. So very cool. And one thing I want to mention is,
Pierson Marks (14:53)
All right.
Bilal Tahir (14:59)
Claude Code doesn't have eyes, so this can be very annoying, because it'll do something and the text will overlap wrong or there will be some spacing issue, and you'll have to tell it. A good hack is something called agent browser, which basically lets Claude open a browser so the agent sees its own output. It's so cool: you'll see it click, run the player, record the screen, then check it, review it, and fix it. That increases the quality of the output so much.
Pierson Marks (15:23)
Great.
Totally, totally, yeah.
Absolutely, it's really cool. I want to take a step back real quick for people who have never seen this interface; I think we're biased because we're familiar with it. What we're looking at right here is, you could almost say, a video editor. It's not really, but it's very similar. On the bottom you have a timeline; that green box is the music. You have these multiple sequences, five different sequences on the bottom, and a sequence is essentially just a series of components. So in the first five seconds here we have a single sequence. Can we move the playhead to the first sequence?
Yeah, so we see here we have that first sequence. If we play this real quick, as we play and pause, we have the music on the top line, we have the "create a host" card, and then we have the Dr. Sarah Chen card that appears. Things will appear and disappear, and that's what the sequence timeline shows: when a component pops in, that's the blue box, and when it leaves, there's nothing there. Each one of these boxes represents one of these components, which is just code. And so, as Bilal was saying previously, we have
this box here, Dr. Sarah Chen: you have the emoji on the left, the purple gradient background, the description, the title, whatever. These are just defined components that Claude wrote, and then Claude said: hey, I wrote this component, this card, and it should be put into the video at the very beginning. And so you just iterate, and you do that over and over again. You create these components, you figure out where they should go in the video, and you let Claude do that. And the agent browser gives Claude the ability to see its changes, because otherwise it's kind of operating blind; it's very smart, but it may screw some stuff up, so being able to see what it actually created is cool. And this whole thing is happening in the web browser. This is not a downloaded app; it's literally a website. You have your player on the bottom with your playback buttons, you have all the compositions on the left, and then on the right-hand side, and this is where it's really cool, you have props. Though we don't have them here.
Bilal Tahir (17:48)
We don't have them here, but yeah. These are just
default hard-coded things you can add. So you can say, name, title, Dr. Sarah Chen, astrophysicist, but let's say you want to show a male avatar or something. You can just change those props, and then this will change automatically.
Pierson Marks (18:03)
Totally.
Can we click on it? Let's show that. If you go to the composition, I think, on the left, the lab product demo, if we go to one of our JellyPod templates... scroll. Oh.
Bilal Tahir (18:11)
The thing is, they're broken because they need the defaults; you can't access them in the studio. Otherwise I would have shown them.
Pierson Marks (18:19)
Okay, interesting.
I thought I fixed that. But okay, cool.
Bilal Tahir (18:25)
yeah.
Yeah. They're basically a JSON schema, you know, and obviously manually updating them is annoying. But imagine you take this and build it into your web app: you can have a dropdown, or you can have prompts which literally generate new props. So you could say, I want to show different languages here, or whatever.
Pierson Marks (18:48)
Right. So instead of going back to Claude and asking it, you can modify it here. "Your voice in 70 languages," rather than being hard-coded text, is just a variable, and it shows up on the right-hand side here in the props. Then if you want it to say "your voice in 75 languages," you just make that edit on the right-hand side and it shows up.
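In code, that pattern is Remotion's defaultProps plus an optional zod schema, which is what gives the Studio its editable right-hand panel. A minimal sketch, with invented prop names and copy:

```tsx
import React from "react";
import { Composition } from "remotion";
import { z } from "zod";

// The schema describes the editable props; the Studio renders a form
// for them, and renders can override them without touching the code.
export const hostCardSchema = z.object({
  name: z.string(),
  title: z.string(),
  tagline: z.string(),
});

const HostCard: React.FC<z.infer<typeof hostCardSchema>> = ({ name, title, tagline }) => (
  <div style={{ padding: 40 }}>
    <h1>{name}</h1>
    <h2>{title}</h2>
    <p>{tagline}</p>
  </div>
);

export const Root: React.FC = () => (
  <Composition
    id="HostCard"
    component={HostCard}
    schema={hostCardSchema}
    defaultProps={{
      name: "Dr. Sarah Chen",
      title: "Astrophysicist",
      tagline: "Your voice in 70 languages",
    }}
    durationInFrames={150}
    fps={30}
    width={1920}
    height={1080}
  />
);
```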
Bilal Tahir (19:09)
Exactly.
Yep, very powerful: you can change everything from colors on down. And then one thing, what was it? There was a...
The way it works, in terms of technical detail: if you want something to render on top, you put it later. The quote-unquote z-index of Remotion is that the component that comes later in the markup literally renders on top. That's how you order stuff. And it's really powerful once you understand the React paradigm, the declarative way of doing things: you just create components and declare which frames they appear in. You can use Sequences, and there's another one called Series, which chains things back to back so you don't have to position each one manually. Very powerful stuff.
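A sketch of the two primitives just mentioned, with placeholder child components: Sequence places children at an absolute frame offset, while Series chains them back to back so you don't hand-compute the offsets.

```tsx
import React from "react";
import { Sequence, Series } from "remotion";

// Placeholder scenes standing in for real components.
const Intro: React.FC = () => <h1>Intro</h1>;
const FeatureCard: React.FC = () => <h1>Feature</h1>;

// Sequence: absolute placement. The second clip starts at frame 90.
export const WithSequences: React.FC = () => (
  <>
    <Sequence from={0} durationInFrames={90}>
      <Intro />
    </Sequence>
    <Sequence from={90} durationInFrames={120}>
      <FeatureCard />
    </Sequence>
  </>
);

// Series: the same timeline, but each clip starts where the previous ended.
export const WithSeries: React.FC = () => (
  <Series>
    <Series.Sequence durationInFrames={90}>
      <Intro />
    </Series.Sequence>
    <Series.Sequence durationInFrames={120}>
      <FeatureCard />
    </Series.Sequence>
  </Series>
);
```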
Pierson Marks (19:57)
Can we show,
can we open up Claude here and just make some change with Claude? I think it'll be pretty cool.
Bilal Tahir (20:03)
Let's see. So what do we want to do? Let's see, Lab Launch. This one was: any content, podcast comes in, edit. Or maybe actually just take the main one.
Pierson Marks (20:12)
Honestly.
Bilal Tahir (20:12)
We had
Sarah Chen come in, we had voice clones, we had a NASA podcast, publishing clips, images. Create a host. So you create a host and then you clone your voice, but maybe I wanted to...
design.
Pierson Marks (20:23)
Yeah,
what if we just take the "create a host" card? So if we go to the very first frame here, right when the card pops in, what if we say: let's move it to be centered in the screen. So instead of being a little off-center, centered and much bigger. Can we just ask Claude to center the "create a host" card, make it much bigger, and then animate it upwards when the bottom card comes in?
Bilal Tahir (20:45)
Center Create Host and then what was it?
Pierson Marks (20:48)
Make it bigger.
Like three times bigger or something, and then animate it upwards when the bottom card appears below it.
Bilal Tahir (20:49)
and yeah.
Pierson Marks (20:57)
I know we can't see Claude here, but Bilal is typing into it. And how does Claude connect into this? How is it actually making these changes?
Bilal Tahir (20:59)
Yeah, I mean, just, yeah. Hopefully we'll just.
It's just regular code editing; it updates the code and this hot reloads. That's the cool thing.
Pierson Marks (21:14)
So there's
code running behind here, this is the Remotion Studio showing it, and then there's the code for this actual composition.
Bilal Tahir (21:23)
Yeah, very cool. You can see it's editing now and hot reloading and stuff. As for the agent browser, it's very interesting: if you run it directly, it'll open up the browser itself, but you can also do headless mode. So let's see. Did it change? OK. OK.
Pierson Marks (21:27)
Is it iterating? OK, so it's making some changes.
Oh yeah, it did. It did move up. Yeah. Well, that's cool.
And there it is, it got bigger. Look. We have to watch it one more time. Is it done or still working?
Bilal Tahir (21:47)
Yeah, there you go.
Pierson Marks (21:52)
Bigger.
Let's do one more. Let's make the background of the first one blue. Let's just show that last one: make the background of the video blue.
That'll be an easy change to visualize here.
So it looks at the code, it edits the code.
Bilal Tahir (22:05)
Yeah, and already you can see there's so much opportunity here if you build a product around this for people who don't know code. They just want to talk to a video; how powerful would that be? You just drop in a video and you start chatting, you know?
Pierson Marks (22:17)
Mm-hmm.
It's so cool. And you can imagine, like...
As you get better, you have your Remotion skill, but you can also have your own little skills, like your own theme guidelines. So if you're using Remotion and trying to create your own thing for your company, or as an individual, you say: hey, I always want all the components to be purple and styled with shadows, blah, blah, blah. You create your own skill and give that to Claude too, so it can use both skills. Oh, there we go, some blue.
Like that, it's a gradient. Did you say gradient or did it just do that? Nice.
Bilal Tahir (22:50)
Nice. I said gradient because I wanted it a little lighter, but this looks better, the lighting. So maybe it can be a little...
Pierson Marks (22:58)
That's cool.
Bilal Tahir (22:58)
Yeah, that is pretty sick. And again, because it's code, from a nerdy angle, we do other stuff too, like lint checks and type checks. We run the code simplifier plugin, which just makes the code tighter, as you do. All the things that make your regular coding better make this better as well. It just translates over.
Pierson Marks (23:21)
Right.
There we go. That looks cool. I like it with the gradient back there.
Bilal Tahir (23:26)
Yeah, maybe I'll dim it a bit for contrast. But again, there's all kinds of stuff here. I said lighter, then darker, and it was too dark. So you can already see there are so many opportunities: maybe there's a color picker here, or a more intelligent way for me to grab a dial and do this. So, what else?
Pierson Marks (23:52)
Well, that would be the props, right?
So the gradient map.
Bilal Tahir (23:54)
That would be the props, yeah. Something like that. But yeah, there's so much opportunity here to make a very nice, intuitive UI for this. And there are a million UIs; obviously for video editors everyone has their own flow, but I still feel like it's too complicated. There's nothing dead simple, like the Apple of video editors.
Pierson Marks (23:57)
All right.
Well, who's that one guy creating that? I think it's a YC company, not Mosaic; Mosaic is one of them. There's one guy who's big on X, and I see his tweets all the time. I forget who it is, but he's always talking about how he's creating the Cursor for video editing. And I know he's using this behind the scenes, you know?
Bilal Tahir (24:37)
Oh yeah.
Definitely, definitely. Cursor for video.
Pierson Marks (24:42)
But it's also just hard. Like anything, it's easy to get to 80%, and then it's just as hard to get to 90%, and even harder to get to 100%. Getting the baseline thing working is easy-ish, but then it's: okay, now I want to do this thing, now I want this thing. You have to look at all these different use cases and figure out how to add new features without sacrificing the user experience of all the other features. And that's just
product management, engineering, all these really cool things.
Bilal Tahir (25:12)
Yeah.
You're right, you're right. The last 5-10% is always... I mean, that's where you spend most of the time.
Pierson Marks (25:22)
That's where the money's made, because when code becomes easy to generate, when you can make anything pretty quickly with Claude Code and everything, getting to 80% is going to be even easier. But getting to 100% is going to be even harder. So it's cool.
Bilal Tahir (25:37)
Yeah,
it's cool. Yeah, just going through some stuff you can do; I was playing around. These are all effects you can easily build in there. It's funny, one cool combo, because we've talked about this: there are cool UI libraries like Aceternity, I forget its name, and even shadcn, and people have built these amazing CSS text effects.
This is code; you can use CSS styling here. So you can translate those effects over, like a text typewriter effect, or flowing in, or gradients. You can literally copy-paste those open source CSS libraries, bring them here, and build a video version of them. So many interesting things.
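For instance, a CSS typewriter effect translated into Remotion terms looks roughly like this; instead of a CSS animation, the visible substring is derived from the frame number, so the render is deterministic (the component name and timing are invented):

```tsx
import React from "react";
import { useCurrentFrame } from "remotion";

export const Typewriter: React.FC<{ text: string }> = ({ text }) => {
  const frame = useCurrentFrame();
  // Reveal one character every 3 frames, clamped to the full string.
  const charsShown = Math.min(text.length, Math.floor(frame / 3));
  return (
    <div style={{ fontFamily: "monospace", fontSize: 72 }}>
      {text.slice(0, charsShown)}
      {/* Blinking caret: visible for 15 frames, hidden for 15 */}
      <span style={{ opacity: frame % 30 < 15 ? 1 : 0 }}>|</span>
    </div>
  );
};
```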
Pierson Marks (26:18)
Totally. Do you think there's
going to be a shadcn for Remotion?
Bilal Tahir (26:21)
Yeah, I mean, does it need to be for Remotion? Because you can just use shadcn, right?
Pierson Marks (26:27)
I just wondered
if there are some animations and stuff that are a little more... It's definitely still React, but...
Part of why shadcn is so good is that they built in all the accessibility through Radix and Base UI, all those little things, and that doesn't matter in video. What does matter in video is the slight, small animations and other things. If I could just plug in the right text and animate a typewriter or something... I don't know. Maybe it's easier.
Bilal Tahir (26:56)
Yeah, that's actually a good idea. Because Remotion does have a showcase, and their Discord is pretty cool; people post stuff they've built, and sometimes they open source it, there are GitHub repos. But you're right, maybe there's something there. There should be a Remotion component library with basic stuff like cool backgrounds and text coming in, and then you can take that and tweak it. That could be cool. Maybe there's an opportunity there for
Pierson Marks (27:23)
Yeah.
Bilal Tahir (27:24)
somebody to build it. So, yeah.
Pierson Marks (27:27)
Yeah, well, that's super cool. I know we wanted to focus on this, but was there anything else we wanted to touch on before we wrap up?
Bilal Tahir (27:36)
Yeah,
I think on this note, one other thing I want to mention, which I think is super sick and I saw on the LocalLLaMA subreddit, is LTX-2. We've covered LTX-2 before: it's basically an open source model by Lightricks, 19 billion parameters, distilled. Very cool, you can run it, and it's probably one of the better open source models. The cool thing is that, unlike Veo 3 or Kling, they released it as open source; the model is out there. And something I hadn't thought about:
we see this with images: Qwen and all these other labs have released open source models, and people take them and create LoRAs out of them. So they'll create a cool anime style or whatever, and you can generate that particular style of image over and over. Turns out you can do that with video. People are taking LTX-2, training LoRAs on it, and creating community versions of video effects. This kind of blew my mind, because you can create amazing effects. That's the cool part of open source: you let people go wild, and the space of possibilities gets explored, because the community as a whole has way more imagination than you do. One of the cool examples I saw was a guy who took one of the LoRAs, an audio-video LoRA focused on lip sync: you give it any song and an image, and it creates a video where the person in the image sings that song. And it nails not just the lip sync but the emotion; if it's a high note, you'll see the character really shouting that note, and it looks very realistic. We'll link it in the show notes, but I was like, this is amazing; what other cool things can you do? It's similar to the Remotion thing: the power is in the tinkering. So you combine AI-generated video, where you can tinker and tweak and come up with your own multi-angle LoRA or lighting LoRA or singing LoRA, and
Pierson Marks (29:10)
Hello.
Bilal Tahir (29:29)
then you have Remotion, where you tweak the components. Combining those, the sky's the limit for what you can do.
Pierson Marks (29:36)
Totally, that'd be super interesting.
Yeah. What I've been seeing recently is these upscaling video models. You take, like, Grand Theft Auto IV, take some footage of it, and put an upscaling model on it. It takes video that's pixelated with dated graphics and upscales it, so the characters look like real people, the cars look real, everything looks real. And I think this is what the future of video graphics will look like: you max out at a low-res, polygonal, procedurally generated video, and then you put a generative model on top, like a LoRA trained specifically for what you want your graphics to look like, and you layer that. The actual code gives you the consistency: the car, the person, it's pretty much that person, just maybe without the freckles, or a little lower res. Then you let the generative graphics engine upscale that to realism. And maybe every single time the freckles are a little off, not in the same spot, or the car has slightly different imperfections, the non-determinism of it, but it looks real. I think we're at the limit today of what video games can do; they're not going to get much better in terms of realism. It's kind of plateaued; you're not seeing significant changes. But if we put an AI model at the very end, the last part of the graphics pipeline, we can actually get photorealistic graphics with the consistency of non-AI.
Bilal Tahir (31:19)
No, I
think that's a great idea. It's also a very cheap way to iterate, because there's something about having low-latency, real-time feedback. You play around with it, and once you know what you like, you're like: all right, now let's upscale, do the thing. So there's so much opportunity with that kind of workflow. And I think we'll see that with world models. We didn't talk about this, but World Labs, and you'll probably find this interesting, just released their API as well. So, I mean, you're a developer, you
Pierson Marks (31:33)
Mm-hmm.
World Labs API.
Bilal Tahir (31:50)
So you can now do it programmatically; maybe that comes into play. Maybe that's the third dimension: now you put a whole world into Remotion, and you're rendering text, and you have AI-generated videos. So yeah, it's crazy how all these things combine.
Pierson Marks (32:01)
Oh, that's interesting. Do you know, does the API allow you to generate the world and also navigate within the world? Do you know what the API looks like?
Bilal Tahir (32:07)
I haven't looked at their API. I wonder, I'm curious. Generation is probably just, oh, you give it a curl POST request, but there's probably a more complicated way you move the character, right? You'd probably need a WebSocket or something for that, because that's a long-lived connection.
Pierson Marks (32:24)
Yeah, but it could also just be that, because of the way you have all the Gaussian splats, you have the world generated, and then you have a camera that moves around based on your keyboard input. But I wonder if the API could just be: place the camera, you give it coordinates, and you can move the camera forward or back. It's not real time; it just sends you back a snapshot.
Bilal Tahir (32:41)
Right, right, yeah.
Yeah, I wonder if we can add Remotion components to the world. Like, you create the world, then you add Remotion components inside it and point the camera at them or something. Then you have a 360, like, JellyPod features as kind of a world. That'd be sick. Yeah. So yeah, I mean...
Pierson Marks (32:59)
Yeah, that would be cool.
Bilal Tahir (33:03)
So many cool possibilities. We're living in such exciting times. Hopefully for you guys listening, this has kicked off some ideas in your head. The best way is just to play around with these tools and see what inspiration comes.
Pierson Marks (33:15)
Yeah, I've
said this to many people: I think 2026 is the year of AI creation. So it'll be exciting to see. I think 2025 was the year of AI coding, and this is the year of creation. I think we're going to see vibe creation, all of that stuff, explode.
Bilal Tahir (33:33)
Yes, yes, definitely.
Yeah, we see that with Cowork, with everything: basically all the non-technical people getting on the Claude coding-agent bandwagon. And that's the majority of people. So it'll be cool to see what they create, because they have a lot more imagination than a lot of developers, let's just say.
Pierson Marks (33:53)
Totally.
Well, on that note: episode 28, we covered a lot of stuff. Remotion, congrats on the explosion of popularity; well deserved. OK, we'll talk next week. Take care.
Bilal Tahir (33:59)
Woo, yeah.
All right, take care guys, bye.
