Anthropic's Legal Victory, Interactive Media, Generative Workflows

PiersonMarks:

Hey. How's it going?

Bilal Tahir:

Good. How are you doing?

PiersonMarks:

Good. Good. Happy Friday. Happy Friday. It's beautiful day here in San Francisco.

Bilal Tahir:

Oh, yeah. Shitty day in Seattle. So Oh.

PiersonMarks:

You know. Bummer. Bummer. Yeah. I'm so excited.

PiersonMarks:

This weekend, it's gonna be it's 70 degrees, sunny.

Bilal Tahir:

Nice.

PiersonMarks:

You you don't get that many of those here, so excited for it. But

Bilal Tahir:

Oh, wait. You don't? I guess, yeah, summer is a little gray and has a problem. Right?

PiersonMarks:

Yeah. This last week, it's been a little gray. I was just sharing this, but I decided to wash my rug. I had this dirty rug, so I went outside and washed it. But of course, when it was drying, the fog came in and it was just... Oh, no. Not a good thing.

PiersonMarks:

But

Bilal Tahir:

Yeah, well, it depends. SF is like 20 different weather systems depending on where you are, so.

PiersonMarks:

Totally. Yeah, well, to everybody listening, welcome to Creative Flux, the semi-technical podcast diving into the world of generative AI. I'm Pierson Marks.

Bilal Tahir:

And I'm Bilal Tahir.

PiersonMarks:

And we just dive into the world of gen AI media, really focused on what happened this week, what are the cool things that people in the AI space are excited about, specifically around audio and gen media, like images and video, and what are the workflows that are actually working. I know over the past month, we've seen Greg the Stormtrooper go viral, the Yeti, the Gorilla, all of the above, Sasquatch, Bigfoot. And so the world's changing so much. I mean, do you follow Justine Moore from a16z?

Bilal Tahir:

Yeah. I've followed her for a while, but she's just been posting so many bangers. Like, I feel like I'm liking basically every tweet she puts out, because she'll have just the funniest, you know, gen AI stuff. And actually, she's a twin, so her twin sister, Olivia Moore, they're both a16z partners, and she also posts in the same space. I think they have a podcast together too.

Bilal Tahir:

Think I watched clips of it. So yeah.

PiersonMarks:

Are you sure they're actually twins or is it like a like a AI avatar and they're just saying they're twins?

Bilal Tahir:

Who knows? Maybe. Yeah. I think they're twins but yeah. I don't know.

Bilal Tahir:

Both very smart people, very cool, very cool content. You know? Definitely check them out on Twitter.

PiersonMarks:

She posted a lot of stuff. I mean, I saw the other day, I just follow those threads. And it's so cool, she's just showing kind of what we're aiming to do in some sense, but she's doing more practical stuff. It's like, hey, I did this, and then here's how I did it.

PiersonMarks:

Like, I used, you know, whatever, the new Flux Kontext.

Bilal Tahir:

Right. What I find so inspiring is that she's, like, not technical, or at least she doesn't use, like, APIs or whatever, and so she'll be like, I went to this tool, I did this, I did this. And I think, oh, in general, we can, like, combine that. Like, you know, we can stitch this together, you know.

Bilal Tahir:

So there are so many opportunities like that. She did this podcast one, which was similar to what we wanna do, where she first made the audio, and then she animated the different clips using Veo 3, or not Veo 3, but another video tool, and then stitched them together. I'm like, yeah, that should just be one thing.

PiersonMarks:

Right. Totally. Totally. Yeah. And I wanna dive in a little bit later about, I saw yesterday you made this, like, two-minute-long Viking trailer sort of thing, and I would love, you know, in a second, to hear how you were thinking about it, like, why you chose that concept.

PiersonMarks:

Like, was there anything behind the Vikings? What was that workflow? What was that process like for you, where you went from, you know, the first tool to the next, and how you thought about that and how you built it? But, I mean, I think we just talked about this. A lot of stuff happens super quickly.

PiersonMarks:

And based on some feedback, I thought it'd be cool just to run down a little bit of what we could talk about on this episode, what happened this week. It was a slower week for sure, I think, in the gen AI media space. Last week we talked about ElevenLabs v3 and the expressive voice models. The API is still not out. I even went

Bilal Tahir:

ElevenLabs, if you're hearing this. But also, just to say, when we say slower week, we're like, only 10 things came out instead of 20. Right? So that's the thing.

PiersonMarks:

Right. That's slow. Slow for us. But, yeah, I literally went to an AI Tinkerers event on Wednesday, and it was good. It was sponsored by ElevenLabs.

PiersonMarks:

It was a voice agents fair. It was the first one that I went to here in San Francisco. So I was really debating, do I go to the first one and demo? We're not really a voice agent product, so it kind of felt out of place. But after watching the demos, I was like, I could have demoed.

PiersonMarks:

But it was cool because it was all around voice. I got to meet a lot of people in voice that were building things like what Alexa should be, about controlling, you know, all your lights and your fans and dishwashers in your home with voice agents. And then there were some people talking about evals for voice agents: if you're running a call center agent, how do you measure the performance of that agent? Is it handling angry customers correctly? Is it, like, understanding the tone and all that stuff?

PiersonMarks:

So that was pretty cool. I met the founder of Daily there.

Bilal Tahir:

That's Kwindla? Kwindla? I forgot his name.

PiersonMarks:

Yes. Yes.

Bilal Tahir:

Yeah. Yeah. And I met him at the AI engineer conference, I think it was called.

PiersonMarks:

Yeah. Totally. Yeah. It was cool. So

Bilal Tahir:

Yeah, they actually made this thing called Daily Bots initially, and I was using them. It was actually one of the first products with open source models under the hood. They did speech to text using, like, a really fast Groq Whisper. And then they used Cartesia, which is like a faster ElevenLabs.

Bilal Tahir:

You know, they're focusing on flash models. Yeah. The guy who created Cartesia, I think, invented flash attention or whatever. Right.

Bilal Tahir:

And so they use Cartesia, and I guess that would be the text to speech part. And so they made something with less than a second of response time, so it was super fast. And this is, like, now it's gotten way more nuanced, but that was one of the first products I saw. I was like, wow.

Bilal Tahir:

This is, like, open source models, you can bring your own keys, you can put it together and it can be super fast; make your own voice agent. And then, I think it's the same product, but they renamed it to Pipecat Cloud, and they basically offer managed hosting, but I think there's still an open source component to it. It's a very interesting product. If you ever do voice agents, I would definitely look into that for sure.
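A minimal sketch of the kind of STT to LLM to TTS loop being described here; the transcribe, complete, and synthesize helpers are hypothetical stand-ins for whichever providers you wire in (a fast hosted Whisper, any chat model, a low-latency TTS like Cartesia), not Daily's or Pipecat's actual API.

```python
import time

def transcribe(audio_chunk: bytes) -> str:
    # Hypothetical STT call (e.g. a fast hosted Whisper); returns the user's words.
    return "placeholder transcript"

def complete(user_text: str, history: list) -> str:
    # Hypothetical LLM call that drafts the agent's reply from the conversation so far.
    return "placeholder reply"

def synthesize(reply_text: str) -> bytes:
    # Hypothetical low-latency TTS call (Cartesia-style flash voice model).
    return b"placeholder audio"

def handle_turn(audio_chunk: bytes, history: list) -> bytes:
    """One voice-agent turn: audio in, audio out, aiming for sub-second latency."""
    start = time.perf_counter()
    user_text = transcribe(audio_chunk)
    reply = complete(user_text, history)
    audio = synthesize(reply)
    history.append({"user": user_text, "assistant": reply})
    print(f"turn latency: {time.perf_counter() - start:.2f}s")
    return audio
```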

PiersonMarks:

Right. Right. Yeah. No. It's yeah.

PiersonMarks:

Like, even when we're talking about this, there are a few things that also came out this week. There's Vercel Ship, which we won't get into because it's not gen AI media, but a lot of cool products. If you're in the tech space and you use Vercel, they've dominated the mind share of AI engineers. Like, we use Vercel. I like Next. Bilal has used Next forever, before it was even called Next.

PiersonMarks:

Zeit.

Bilal Tahir:

Yeah. No. It was called Next, but the company Vercel was called Zeit, which I actually liked. I think it's a German word. But no.

Bilal Tahir:

Next is awesome. Yeah. Vercel shipped a lot of stuff, like you said. It wasn't about AI, but it holds up, I feel like, the whole AI industry. I mean, fluid compute, queues, what else was there... yeah.

Bilal Tahir:

I mean, I remember with every Vercel announcement, we were just posting on our side. We're like, oh, we can use this for this, this for this. You know, it's just really interesting stuff.

PiersonMarks:

Totally. It made me wonder, honestly. I was telling you how I did some login auth work just because I really don't like our auth pages. I think they're, like, the worst designed parts of the app, and there are some bugs there as well, which is bad, I realized. But Vercel announced this whole, like, bot detection thing. Everybody hates the captchas where you have to, like, type the letters, and they're so annoying, and that's what kinda got me down the rabbit hole.

PiersonMarks:

Like, let me look at the auth stuff and

Bilal Tahir:

Oh, yeah. Do we use the reCAPTCHA thing on the client? I guess we have Turnstile. Right? Cloudflare, probably?

PiersonMarks:

There's, like, basic bot detection, but nothing on the auth form. And, yeah, maybe I add that. I don't know.
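For what it's worth, a minimal sketch of server-side Cloudflare Turnstile verification on an auth form, assuming the widget is already on the client and its token arrives with the login POST; TURNSTILE_SECRET is a placeholder for your own secret key.

```python
import os
import requests

TURNSTILE_VERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify"
TURNSTILE_SECRET = os.environ.get("TURNSTILE_SECRET", "placeholder-secret")

def verify_turnstile(token: str, remote_ip: str | None = None) -> bool:
    """Ask Cloudflare whether the Turnstile token from the auth form is valid."""
    payload = {"secret": TURNSTILE_SECRET, "response": token}
    if remote_ip:
        payload["remoteip"] = remote_ip
    resp = requests.post(TURNSTILE_VERIFY_URL, data=payload, timeout=5)
    return resp.json().get("success", False)
```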

PiersonMarks:

But we'll see. Yeah. But coming back to, like, Gen AI stuff, like so this week, I think some of the things that I wanted to talk about today, one of them, big court case ruling. Mhmm. Talk about that.

PiersonMarks:

Anthropic: a federal judge ruled in favor of Anthropic on essentially their argument that training AI models constitutes fair use. So that was a US ruling by a federal judge that agreed with Anthropic's fair use claim. And then we'll dive into, like, the piracy aspect of that later. Runway announced their Gen-4 API. Kling AI also announced motion control in their studio.

PiersonMarks:

And then there was, like, Flux Kontext dev, which makes it super cheap to fine-tune image models. So all these things I would love to, you know, talk about. And then there's, like, a higher level topic about context engineering that, you know, might make sense to talk about or might not,

Bilal Tahir:

but Context engineering. That's the new buzzword.

PiersonMarks:

Context engineering.

Bilal Tahir:

I think I made fun of that, or somebody did. I feel like it's gonna take off. It's gonna become a thing.

PiersonMarks:

Totally. Maybe we buy contextengineering.com, and then we just say, Jellypod is the best at context engineering.

Bilal Tahir:

Wouldn't it be funny if, instead of back end, front end, you're like, okay, this guy makes the system prompts. He's a system prompt engineer, you know, and this guy makes the user prompts. And like, oh, I focus on the baseline, you know. Yeah. So sick.

PiersonMarks:

If you have an English degree, I mean, those degrees are probably worth some money right now, you know? Oh, you're a writer? You're a system prompt engineer. A context engineer. Yeah.

PiersonMarks:

But

Bilal Tahir:

That'll be the ultimate thing when all the engineers lose their jobs, but the philosophy majors are the ones, you know, running the show.

PiersonMarks:

Totally. Totally. Well, yeah, let's kick this off. Let's jump first into the copyright stuff, Anthropic. And then next, let's jump into the workflow that you went through when you were creating that Viking short, I'd say.

Bilal Tahir:

Right.

PiersonMarks:

The video.

Bilal Tahir:

And, by the way, we'll leave all the links that we talked about in the notes once we're done.

PiersonMarks:

Totally. Totally. Sweet. Yeah. So I know I just mentioned this, but there was a well known federal judge who this week ruled that Anthropic, the makers of Claude, their training constitutes fair use. They took books, physically took books, ripped out the pages, despined those books, scanned those books, and used authors' works to train Claude.

PiersonMarks:

And that was the premise; that's what Anthropic did. Authors came out and said, hey, you took our work and illegally violated our copyrights by not compensating us for, you know, taking our stuff and then redistributing it to everybody. It screws authors, and it breaks the incentives for authors; why would an author wanna write a book if it's just gonna get sucked into the machine and be redistributed? And the judge essentially said, just like how humans can check out a book at a library or can go to Barnes and Noble and purchase a book to become a better writer, to be inspired by tone and prose and, you know, learn how to write by reading,

PiersonMarks:

AI does the same thing; it creates transformative new works, and training on those materials constitutes fair use. And so I think this is a pretty groundbreaking thing in the creative space, because for copyright holders in images, in video, and in writing, this impacts a lot. I expect this to go to the Supreme Court because there will definitely be appeals. But as a preliminary ruling, it's, yeah, I mean, it's interesting.

Bilal Tahir:

Yeah. No. For sure. And, you know, if you're a techno-accelerationist, it's probably good news that, you know, we can do this, models aren't bogged down by lawsuits, and they can train on the best data. It's something I feel like all the labs have figured out.

Bilal Tahir:

I think for years now, it's been garbage in, garbage out. The quality of the data is one of the biggest things that determines how good the model output is gonna be. That's why, I think with Claude, anyone who's used Claude, the writing is really good in Claude. I think this is one reason. They've really spent a lot of effort making sure that the inputs that go into the model training are really good, high quality.

PiersonMarks:

Totally. And it's pretty crazy. I mean, the way that they went about this is they hired the former lead at Google, I forget his name, but he was the one in charge of the Google Books program where, like, they were digitally archiving all these books. And so Anthropic hired him, and his job was to legally obtain essentially every single book in the world and train Claude on it. Right.

PiersonMarks:

Pretty cool. I mean, think about it: how do you figure out how to get every single book in the world? Like, that's literally your objective.

Bilal Tahir:

Yeah. It's also interesting to me that they felt the need to do this versus just using Common Crawl. You know? Because, I mean, I need to get more into the weeds of it, but I basically thought we had the whole Internet's text data at this point. So I thought the next frontier was images and videos and stuff like YouTube, which we can come to in a second.

Bilal Tahir:

It seems like it's not. It seems there's still a treasure trove in these books that the models need to be trained on.

PiersonMarks:

Right. And I wonder, I'm not a legal expert, I'm not a lawyer at all, but I wonder if there's some sort of thing where, when you go to a website, there are terms of service associated with that website. If you go and view a page or you download a PDF, it's like, by downloading this, you accept the terms and conditions. But when you go to a library or when you go to Barnes and Noble and you buy a book, you're not accepting terms and conditions.

PiersonMarks:

There's copyright law, but I wonder if that also has a part to play in this, like why they got physical books. It'd be interesting. I wonder if anyone listening to this is a lawyer and wants to comment to me. Yeah.

Bilal Tahir:

It's a murky area. And in the same vein, you know, you talked about images, but I wonder if this is gonna set precedent for YouTube as well, because YouTube's biggest thing has been its public videos. If you can train on YouTube videos, is that legal? YouTube says that's against their terms of service. We know for a fact OpenAI does, and probably all the labs, they use YouTube videos in their datasets, but they can't admit that because, you know, that opens up a can of worms.

Bilal Tahir:

So I'm sure that'll be a case too. And Google, in some ways, you know, they want that to happen, because that way the only entity that can legally train on it will be Gemini. So that would be a huge advantage. But I think they'll settle or something, though it's an interesting case to think about as well.

PiersonMarks:

Totally.

Bilal Tahir:

Because the same as like reading something, if I watch a million YouTube videos, it's in my head and I get inspired, you know, so I

PiersonMarks:

Oh, a 100%. Yeah. YouTube is a data trove. And Facebook, like, I strongly believe... so, I don't know. Google has an incredible ad business.

PiersonMarks:

Facebook, Meta, also has an incredible ad business. I've always felt that Facebook is a little bit more on the "hey, we're gonna do whatever we wanna do with your data" side. Google's a little bit more of "hey, we're gonna fingerprint you, we're gonna serve you good ads," but maybe they were a little bit more "don't be evil," even though that's not necessarily their slogan anymore.

PiersonMarks:

Mark Zuckerberg maybe leans a little bit more into the gray area, at least that's what I felt. Right. But I totally see it. We were just talking about this a second ago, about ads and creators and hiring UGC creators. It's this huge new industry where, over the last five years, you could be an influencer, you could promote a product, and based on your looks or based on who you are, you get brand deals, and that's huge. A lot of people can make some solid money on that.

PiersonMarks:

It's harder than it looks, but people make that money. All that stuff is getting posted to Instagram, getting posted to Shorts, getting posted to Facebook. It's going everywhere. Facebook has a lot of, you know, data posted to Instagram as well, and they just came out and said, yeah, we're training on your posts.

PiersonMarks:

We're training on your videos. Imagine a world where us at Jellypod, you know, we go to Facebook, the Meta ads platform, and rather than uploading a video that we hired a creator or an influencer to do, we just prompt, like, I want a 35-year-old female who has brunette hair, this is her background, and create the video. And so now Facebook, in their ad platform, provides that access to us directly. And then they can create variations. They can just test.

PiersonMarks:

Then maybe you make her brown-haired, blonde-haired. They could dynamically change what she looks like based on user demographics. And I think the advertising industry is about to get blown open by this. Like

Bilal Tahir:

Oh, it's interesting you said that, because TikTok is launching this thing. It's actually what you're saying, it's, I think, in beta right now, but they're gonna let you create AI videos like that: 35-year-old brunette, UGC style. And so I do think this is the future. If you really look at the horizon, six, twelve months out, I think the majority of videos will be AI. And they'll be, you know, scripted; maybe it's my audio and there's a different video, maybe with a different accent, whatever it could be, or totally from scratch.

Bilal Tahir:

I think most people will be, you know, creating that. Then that raises the question: maybe a truly authentic video becomes more valued. But, again, that also depends on the assumption that AI video will always have an uncanny valley, which I don't think is the case. In a world where real videos and AI videos basically look the same, I mean, why wouldn't you just create AI videos?

Bilal Tahir:

Right? It's a crazy thing to wrap your mind around. I don't think we truly understand the second, third, fourth order effects of, you know, living in a generative-AI-first world. It's kinda crazy.

PiersonMarks:

Right. Right. No. And I mean, my parents, I don't know about yours, but my parents never they didn't grow up around cartoons or animations. And that was something that was new.

PiersonMarks:

Disney and Pixar, and before that you had the whole nickelodeons, like the five-cent nickel movies, and then animation. That became something where, in the later half of the twentieth century and the early twenty-first century, a large number of shows were animated. And you could go to filmmakers from the fifties, and the people back then would say, what is this? Animation? No.

PiersonMarks:

You have to have actors and you have to be recording. It probably seemed that way to them, I'm speaking for them, but it seems very relatable to today, where, you know, hey, we can sit down and create a cool Viking sort of medieval times video, and that's cool. I mean, who cares if it's an animation that you use CGI programs to do, or you use some AI tools to do it. If it's entertaining, it's entertaining.

Bilal Tahir:

Yeah. And I think we're getting there, like, in terms of different things, media, companionship, everything. It's a it's a can of worms. But

PiersonMarks:

It's a can of worms. And people don't like it either. Like, there are strong opinions, especially on Facebook. I saw MrBeast launch something

Bilal Tahir:

this week. Right? The AI YouTube thumbnail tool, which I think is one of those things where he's just so big that he has to do it, but it's a little stupid, I feel. Somebody called it out, I think it was Justine. She said, yeah, you're gonna hire the artist who's gonna go to the AI model and create it using AI.

Bilal Tahir:

So he is just a middleman.

PiersonMarks:

Right. Totally. Yeah. Wait. Wait.

PiersonMarks:

Give it give some context to what happened because I

Bilal Tahir:

don't... Yeah. So what happened was, well, actually, you might know more, because I didn't know about it, but MrBeast has an AI product where you could create YouTube thumbnails using AI. And apparently there was a huge backlash to that, and, you know, he basically backtracked. He came out with this apology where he said, I read your comments, your feedback, we're pulling the whole thing, and we're gonna let you commission human artists instead. Yeah.

PiersonMarks:

Right. Yeah. It's the innovator's dilemma. Yeah.

Bilal Tahir:

Exactly. Yeah. So, you know, you think about us, we could get that too. Like, at Jellypod, you can come and create podcast cover art, you know, if you want. It's all AI generated, but it's amazing.

Bilal Tahir:

So

PiersonMarks:

Oh, for sure. For sure. Yeah. Yeah. I mean, we're just segueing in, like, video creation.

PiersonMarks:

And I know I brought this up. Can you just share it? Like, share with me too. You created something. You posted it on X yesterday.

PiersonMarks:

I thought it was pretty cool. There were some Vikings and everything. And you kind of shared, like, what the models were that you used. You wanna dive into that a little bit?

Bilal Tahir:

Yeah, I think it'd be interesting. To give some context, I created this Viking video, which was a montage. It's supposed to be a montage where they're getting ready for war, etcetera. It wasn't as good as I wanted it to be, but I time box those things, because I had to go somewhere.

Bilal Tahir:

Was like, alright. I have thirty minutes. What am

PiersonMarks:

I gonna Really? You did that in thirty minutes?

Bilal Tahir:

Yeah. I mean, mostly it was processing and stuff. Like, you know, it takes a while. But it's because I have a tool. I have my own little UI tool, and I think I've talked about this before.

Bilal Tahir:

But what I do is I parallelize a lot of this stuff. So I start with, I usually go to Claude or ChatGPT, and I'm like, give me a script. The script is usually in the form of scenes. It's an array of scenes. So the first shot could be a Viking hammering on his sword, you know, making it sharp. And the second scene could be, you know, a Viking woman painting something on her face, whatever.

Bilal Tahir:

So you have this series of shots. And what inspired me was this song, actually. It's funny, because I heard the song from, I think it was Seedance. They made a trailer, and I loved the song, and I found it. And the way my mind works is, I don't know how common it is.

Bilal Tahir:

Music gives me inspiration. If a song really hits, I immediately visualize a story with it in my head. And that's not necessarily the music video, it's my own story. And so one of the things I've always wanted to do, because I have all these songs in mind, is create a music video based on that.

Bilal Tahir:

So for this one, I was like, oh, this I can do, because it's pretty simple cinematic zoom scenes and stuff, so why not? So I took the song. I took a voice over, which is from the show Vikings. I haven't actually watched Vikings.

Bilal Tahir:

It's been on my list for the longest time, but I was like, Vikings, so maybe there's a good Odin, war cry, glum monologue. I just downloaded that from YouTube. And so I had these two audio tracks. Then I created the script, which is, like, you know, a bunch of random scenes. So I have that array.

Bilal Tahir:

And in my tool, what I do is, basically, I parallelize it. It basically runs through the script and generates the images first. So you generate the images, you know, a series of images. And if you like the images... I used, I think it was Imagen for that. Or, sorry.

Bilal Tahir:

Kontext, Flux Kontext, for that. Sometimes I go back and forth. But I used Kontext, which we'll talk about. The editing part is where it shines.

Bilal Tahir:

But you can also do text to image, you can just prompt and get an image, which is pretty good using Flux Kontext Pro. Create those images. Once I like the images, then I used the Hailuo 02 Standard video model to kick off a parallelized video generation run where it just generates five-second videos for each of them. And then I merge it together, and that's my video. And then I just, you know, go to CapCut and add the songs and stuff on the audio there.

Bilal Tahir:

So the actual process is very quick, because once you have the script, the hard part is just getting the scenes and stuff. Right? That's where I'm like, you know, that's the moat. Like, just getting cool scenes.

PiersonMarks:

Gotcha. So let me just try to repeat this back to you, because, you know, I haven't spent as much time as I'd like making these videos. It just looks super fun. And I think there are a lot of people out there... you said you parallelized some stuff, and there's maybe some stuff under the hood that you coded up. And I think that's probably intimidating for a lot of people.

PiersonMarks:

Especially people that aren't as technical. And then there's, like, the Zapiers, the automation software that can probably get hooked up. So for the people out there listening that want to do what Bilal just did, let me try to repeat what you did. You had this idea of Vikings.

PiersonMarks:

It came from a song. You heard a song, and you were like, I wanna create a video montage of Vikings blowing their horn and storming an area. So you took the song, and you went to ChatGPT, and you said, hey, create me 10 scenes. And in those 10 scenes, it was like, first scene, a man painting his face; second scene was something else. And so you asked ChatGPT to create all the scenes for, like, two minutes, or is that kinda what

Bilal Tahir:

it was? Yeah, and the scene is basically an image prompt and a video prompt. You wanna distinguish between those, because for the image, you wanna describe the actual first frame, which is like, oh, you know, man, Viking, crazy hair, muscles, weird, intimidating, looking at you. That's the scene. And then the video prompt is what does he do? Maybe he, you know, takes out his axe and points it at you menacingly.

Bilal Tahir:

That's the action. So there's the action, which is the video, and the image, which is setting the scene. So I generate that off the bat, and, I know I talked about parallelizing and stuff, but you can do this using Fal or one of the generators where you generate the images first. You know, come up with the idea, generate the scenes, then generate the images. Once those images make sense to you, then you can spend time and generate the videos one at a time.

Bilal Tahir:

You know, you can even put the prompts in and generate three, four at a time, and it'll just keep running, because it takes five minutes. The reason I like doing the image part first is because videos take a while and they also cost money. Like, you know, this one, Hailuo 02 is like 27 cents, I think, for five seconds. Whereas Flux Kontext Pro is like 3 cents for an image, and it can even be, like, point zero zero zero three if you use Schnell. So that's just a very easy, frictionless way to experiment and get the scene right.

Bilal Tahir:

And then once you know the scenes, you can kind of imagine, okay, this scene, then this scene is gonna go. And then it's like filling in the blanks. Okay, I have five seconds here, five seconds here, five seconds here. Right.
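A minimal sketch of the scene-array workflow described here, with hypothetical generate_image and generate_video helpers standing in for whichever providers you actually call (a Kontext-style image model, a Hailuo-style video model); the cheap image pass comes first, and the approved frames are then fanned out into parallel five-second video generations.

```python
from concurrent.futures import ThreadPoolExecutor

# The LLM-written script: one image prompt (first frame) and one video prompt (action) per scene.
scenes = [
    {"image": "Viking hammering a sword at a forge, embers flying, cinematic close-up",
     "video": "slow push-in as he lifts the blade and inspects the edge"},
    {"image": "Viking woman painting warpaint on her face by firelight",
     "video": "she turns toward the camera, eyes hard, flames flickering"},
]

def generate_image(prompt: str) -> str:
    # Hypothetical text-to-image call; returns a path/URL to the generated first frame.
    return f"frame_{abs(hash(prompt)) % 1000}.png"

def generate_video(frame: str, prompt: str) -> str:
    # Hypothetical image-to-video call; returns a path/URL to a ~5 second clip.
    return frame.replace(".png", ".mp4")

def render_montage(scenes: list) -> list:
    # Cheap pass first: one still per scene, reviewed by eye before paying for video.
    frames = [generate_image(s["image"]) for s in scenes]
    # Expensive pass in parallel: each approved frame becomes a short clip.
    with ThreadPoolExecutor(max_workers=len(scenes)) as pool:
        clips = list(pool.map(generate_video, frames, [s["video"] for s in scenes]))
    return clips  # stitch these back to back in CapCut, ffmpeg, etc.

print(render_montage(scenes))
```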

PiersonMarks:

Totally. So you go from ChatGPT to 10 scenes. For those scenes, first it's, hey, create the image prompt. So you get all those image prompts to be, like, the first frame of each scene. And then you get the video prompts.

PiersonMarks:

So the video will take that image and be like, now pan the camera from left to right, or add flames, or whatever. So you go from text, you have some idea of what you want the whole thing to look like, to scenes of 10 images, and then you expand those 10 images into 10 videos. You take those 10 videos, put them into a tool like CapCut, stitch them back to back, and then you're good?

Bilal Tahir:

Yeah. Yeah. I mean, and that's one way to do it. The other way a lot of people do is they'll generate one scene and then they'll take the last frame and then use it as the first frame of the second scene if they want a continuation of the same storyline. That's one way to, you know, have a more of a continuous scene.

Bilal Tahir:

So this is one reason I like the montage: it can just be a bunch of juxtaposed scenes. It doesn't necessarily need to have one continuous story. And this is one of those hacks I feel like we're gonna be doing for the next few months, but hopefully soon you can just generate, like, a two, five, ten, thirty minute video and you don't have to worry about this. But for now, these are the hacks we have to do.
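And a rough sketch of that last-frame trick, assuming the same kind of hypothetical generate_video helper (here returning a local clip path) plus a standard ffmpeg invocation to grab the final frame of each clip:

```python
import subprocess

def generate_video(frame_path: str, prompt: str) -> str:
    # Hypothetical image-to-video call, as in the earlier sketch; returns a local .mp4 path.
    return frame_path.replace(".png", ".mp4")

def last_frame(clip_path: str, out_path: str) -> str:
    # Standard ffmpeg: seek ~0.1s before the end of the clip and dump a single frame.
    subprocess.run(
        ["ffmpeg", "-y", "-sseof", "-0.1", "-i", clip_path, "-frames:v", "1", out_path],
        check=True,
    )
    return out_path

def extend_story(first_frame: str, prompts: list) -> list:
    """Chain clips by seeding each scene with the last frame of the previous one."""
    clips, frame = [], first_frame
    for i, prompt in enumerate(prompts):
        clip = generate_video(frame, prompt)
        clips.append(clip)
        frame = last_frame(clip, f"frame_{i}.png")
    return clips
```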

PiersonMarks:

And from what I understand, that was kinda the whole Flow thing from Google, where you can extend, take that last frame

Bilal Tahir:

Right.

PiersonMarks:

And then extend it and be like, hey, like, go from taking this image from the last frame of the previous scene's video and then extend it to the next one. Right?

Bilal Tahir:

Yeah. And I think Midjourney also lets you do that. Their video model that came out like a couple of weeks ago has been causing waves. It's pretty pretty good apparently. They let you do it up to four times.

Bilal Tahir:

So people have generated up to I think thirty thirty to forty second videos which Oh,

PiersonMarks:

That's cool. Do you have a preference? I mean, there's Hailuo, there's Midjourney's video, there

Bilal Tahir:

is Right.

PiersonMarks:

Sora, there's Veo 3. Like, if you were to rank them based on quality to cost, how does everything stack up? I mean

Bilal Tahir:

Yeah. So for me, my big filter is the API, because I'm a nerd and I need the API. So that's why stuff like Midjourney and Sora are immediately not an option for me. I mean, if you don't care about the API, then Midjourney I've heard is apparently the cheapest and really the best quality model, because basically for the cost of an image you can get a video right now. And they might be subsidizing this, and you can get multiple generations.

Bilal Tahir:

So if you're a creative, maybe that's a good option. And they're supposed to be the most artistic, beautiful models. In terms of the ones I use, I would say Kling is still number one, Kling Master. Every video model usually has a pro version and a light version. So Kling Master is supposed to be really good.

Bilal Tahir:

I rank that up there. Hailuo Pro is also up there, basically up there. They're a bit more expensive, I think 74 cents compared to Kling's, which might be 48, so close-ish. Right.

Bilal Tahir:

But they're up there. And then on the other side, you have Seedance Lite, which is 18 cents. It's basically a great budget option for simple scenes and gives you good bang for your buck for the Lite version. And on the extreme budget side, there's LTX, which I've talked about before, but I love LTX because it's 4 or 5 cents for a five second video, which is ridiculous. It basically generates within five to ten seconds; it's a small model.

Bilal Tahir:

But as you can imagine, you can't really do complex stuff. It's great for simple scenes like zooms and motions. And if you don't want amazing high definition cinematic footage, you know, it's a great way. So I'll often do that. I'll create the images.

Bilal Tahir:

I do the LTX version first, and then I'll generate the pro version. So it's a good approach: you almost wanna start cheap and go up and up and spend more money as you get more confident in the scene.

PiersonMarks:

No, it's super interesting. I mean, just like how we saw, I think a year ago, there were so many text language models. And there still are. People are making language models; they post the weights to Hugging Face.

PiersonMarks:

And, you know, you have LMArena and they're competing, like, which is the best? And at some point, it's more qualitative, where it's just, what sounds good? Right now, it feels like we're in the same place for video and images, where there are all these different models. And I'm trying to understand the end state five years from now. Are we gonna see one model, like a winner-take-all market, where it's gonna be essentially free to generate, or super cheap?

PiersonMarks:

It's gonna be good. What I struggle to think about is what's gonna differentiate these models, because you see prices go down, you see quality go up. Like, what's gonna keep Midjourney competitive with Sora? They're gonna have a style, and it's always gonna be in that style, and then maybe that's not actually good because their TAM is smaller. Do you have any thoughts?

Bilal Tahir:

I mean, it's an interesting question. Honestly, it's, like, so hard to look that far out. But I feel like there will always be differentiation. Maybe it's fine tunes. Like, look at images.

Bilal Tahir:

Images hit this quality threshold where they're good, but then you have artistic image models, models like Ideogram if you want text accurately rendered on an image. So I wonder if videos will be the same. Like, if you want a nineties-footage type of model, there'll be a fine-tuned model for that, a high definition model, maybe an anime model, etcetera, and companies just double down on that. But it's hard for me to imagine one model to rule them all, because I feel like I just haven't really seen that happen in any other domain, you know, because there's always a way to differentiate. But who knows?

PiersonMarks:

Yeah. Because when I think about this, it's like, first principles: what does a video model do? You take some prompt, you adhere to that prompt, and then you create a video or an image out of it. And based on how good that prompt is, the model adheres to the user's prompt, you know, as strictly as the model was trained to.

PiersonMarks:

Like, maybe there's more imagination. Some models may take your short little prompt, and you might not explicitly say something, but they might just assume that you wanted your video to be better. So, like, if you're an astronaut in space, they might put some stars in the background; they make some assumptions. And on the other side, that might not be the right thing for a lot of creatives where it's like, hey, I'm gonna be really specific.

PiersonMarks:

My prompt is gonna be exactly what I want, and follow it to a T. Don't add stars in the background, because I did not say stars. And so for me, that's where I could see the artistic imagination of these models kinda differing. Because at the end of the day, if every model can just be really good, I don't see a future where Sora or, you know, any of these can't do text well, because the models are just gonna be good.

PiersonMarks:

Like, why can't

Bilal Tahir:

it be... I feel like also, and this is why it's so crazy, I think the underlying assumption we're making here is that there's a ceiling. Like, let's say the models can generate reality like we see it through our eyes. Okay? But then the question is, why stop there?

Bilal Tahir:

Is there something like post reality? Like, is there something like better than, you know, four k, eight k? I mean, I don't know. I feel like we'll just keep pushing the bar. I don't know what that looks like, you know, but I think you just stop at some point.

Bilal Tahir:

You're like, alright. The models are good enough to basically create real life footage. Okay? But I feel like people always want more, and we just don't know what that is. So it's kinda crazy, you know?

PiersonMarks:

Yeah. It's totally like the new styles. Like when you had modern art versus abstract art. When people started seeing abstract art back in, what was it, the nineteenth century or whenever it was, I don't know, they were like, what is this craziness?

PiersonMarks:

Like, I've never seen anything like that before. They didn't even know how to comprehend something so absurd.

Bilal Tahir:

No, that's a great analogy, because abstract art and expressionism all started because, you know, before that, painting was a profession to capture real life, because we didn't have cameras. Then cameras were invented, and they're like, we don't need to paint anymore, and the artists innovated and were like, alright, we're gonna do post-reality, we're gonna do expressionism, abstract art, modern art, etcetera. And so I wonder if there's a video dimension here where, once we can just recreate reality, we do that. And I've noticed multiple creators say this: do not think of these video generation models as models that generate video. Think of them as real world simulators.

Bilal Tahir:

And I really like that way of thinking about this, because, yeah, you're generating video, but really, the end state is you're generating worlds, like video game worlds you can walk in, talk in, live in, and that's a very interesting thing. So maybe, you know, I wanna live in an amazing high definition 4K world, but maybe sometimes I wanna live in, like, a Super Mario world.

PiersonMarks:

For sure.

Bilal Tahir:

Yeah. Yeah. Very fascinating.

PiersonMarks:

You segued right into where I was gonna go, perfectly, because I was literally gonna bring up how, when I first saw Sora, I remember, like, whatever, a year ago now or six months, I forget, time flies. But there was one of those videos that they demoed, and it was a Jeep going through the hillside, kind of just driving along. And it looked like, you know, Forza or GTA, a really good version of those video games.

PiersonMarks:

But you could imagine where computing changes. Today, you know, the majority of computing is done where you have the calculations happening on the CPU, you have the GPU rendering those pixels on the screen, and they communicate back and forth, but the logic is actually hard coded into the physics engine. And then the physics and the lights and the shaders and everything is rendered to the screen through the GPU, because it can do that really well. But that's not how our brains work, and that's not how I think the future of computing will work.

PiersonMarks:

And this is probably what Jensen was saying too. You know, I'm spewing what he's probably wanting to happen in the world. But take the Xbox I have in my living room: if you just bump up that GPU and it's able to run a world model that is trained on, let's say, Minecraft. Like, if you just trained a model on a lot of Minecraft, and I'm going here because there was an example of this with Oasis. The model starts to learn, like, hey, my character should move across the ground without falling through the ground. And there's a tree over there.

PiersonMarks:

If I'm moving closer to the tree, the tree gets bigger on the screen. And if I press a button, the model understands, hey, I'm chopping down that tree. It's less hard coded, like, action, render something. It's more like, you press a button on your controller and the screen changes, versus a complex physics engine and, like... Right.

PiersonMarks:

Business game logic going on and running in the CPU.

Bilal Tahir:

Mhmm.

PiersonMarks:

So it's kind of wild to me to think about like how video games and that could all like completely completely change.

Bilal Tahir:

Absolutely, there are so many fascinating ideas. One idea I've always wanted to do, I think this would be a hit show: imagine a trash reality show, but with AI generated characters, like Big Brother, you know, one of those where they're trapped in a house. But the way it works is it's generated on the fly, so every hour or ten minutes there's a poll: does Brad find out that Ashley's cheating, or whatever, and then based on the poll, it happens. It's like an endless Bandersnatch type of thing, you know, you can take the plot wherever it goes, which, imagine, just opens up a whole new way of consuming content. It's dynamic right there, and you're kind of controlling the characters, you know. Totally.

Bilal Tahir:

Yeah. So it's fascinating. Speaking of controlling the characters, I think you mentioned this before, and I've never used Runway, but I'm curious, because you mentioned there's a Runway Gen-4 API now. Have you used Runway before, and what do you think? Because they used to be the best; they were way ahead of the game.

Bilal Tahir:

Right? I remember. And I don't know if other people have caught up now and they've kinda fallen off, or they're still doing their thing, but they seem to be focused more on, like, studios and artists, you know, really professional creators. Right.

PiersonMarks:

Right. Right. I mean, I wish I had a better answer, but I haven't used Gen-4 much. I don't think I've used Gen-4 at all.

PiersonMarks:

And then I just saw that the API came out. But you're right. I mean, I think they were doing, like, the inpainting stuff. When I have used Runway, it was like, I took an image of a waterfall and dragged a brush across the waterfall, and now the water is actually moving in that direction. Everything else kinda stays the same, but the water looks like it's moving and flowing.

PiersonMarks:

It seems like that's the direction they're going in, where you have an underlying great model, and they're building tools on top of that model to provide precision and control using, like, inpainting-esque stuff, if that makes sense. I mean

Bilal Tahir:

Yeah. It's funny. It mirrors us at Jellypod. Right? I mean, instead of NotebookLM, which lets you just create an AI podcast, we let you be precise and generate the exact script, change the audio, you know, down to the sentence, etcetera.

Bilal Tahir:

Because we're also banking on personalization and precision, because that's what our users want. They want control. Sounds like that's what Runway is banking on too. Their artists want control. They don't just wanna press a button and get a scene.

Bilal Tahir:

They're like, well, I want the scene, but I know I wanna, you know, change the shadow and the shade and all that Right. Etcetera. So

PiersonMarks:

Interesting. What's funny is, engineers will just hover and gravitate towards whatever the coolest new shiny thing is because, obviously, we love building, and we just,

Bilal Tahir:

like Right.

PiersonMarks:

Using, like, the coolest new tech. And then you force-fit this new tech into maybe what it's not great at. Like, going back to Wednesday, I went to the AI Tinkerers voice agents fair. And you see, voice is something super cool. You know?

PiersonMarks:

There's Siri that came out over a decade ago. You have Alexa. There's always a promise of voice being this universal interface. And now you have ChatGPT advanced voice mode, and it's actually good. Like, you could actually have a conversation.

PiersonMarks:

It sounds realistic. It knows you. It could soon, like, take actions on your behalf, so it's actually Jarvis-esque. But I think right now you're seeing a lot of companies kind of force voice into their UI when voice isn't the best medium for a lot of use cases. It could be better than what some platforms use today, but, like I was saying earlier, I don't wanna go to an airline's website and necessarily chat with the agent thing when I could just click a button and be like, oh, I'm going from San Francisco Airport to JFK.

PiersonMarks:

Let me just see all the flights. That's such a better interface than talking to the website, like, hey, I'm doing this, then it responds back to me, then I have to process that.

PiersonMarks:

I think that's where there's a lot of leverage for builders and creatives to recognize that, hey, I know this is cool, the technology is cool, but it's probably not the right fit everywhere. And so it's that balance you have to figure out.

Bilal Tahir:

For sure.

PiersonMarks:

For sure.

Bilal Tahir:

No. Absolutely. I feel it's funny when we come full circle back to the knobs and dials that we had,

PiersonMarks:

you know Right.

Bilal Tahir:

Way back when. Like, yeah, that worked. Why did you break it? So, totally. There's an interface for everything.

PiersonMarks:

For sure. Right. Sure.

Bilal Tahir:

But I am looking forward to a world where I can use, like, I know you love coding with, you know, voice. Like, if you can just change a world, like, add a waterfall here, some flowers, some birds, and have it happen in real time, I think it's also gonna be, like, this Jevons paradox where it just makes it seamless to explore your imagination, because the cost of doing it is so low. So it's gonna be a Cambrian explosion of creativity.

PiersonMarks:

Totally. That's so exciting. It's so exciting. But

Bilal Tahir:

I love it. I don't know if I've shared this, but I do think this is something I wanna nerd out on. One of my ideas: I wanna make GitHub for content.

Bilal Tahir:

Because the analogy is, like, the cost of content basically goes down to the cost of code. And so, let's say you come up with Game of Thrones, you put it out there, but I hate season eight, so I make a PR. I say, I don't like this, happier ending, whatever, and I submit it, and then people can vote on it. Maybe you merge it in; if you don't, you're like, no, I love season eight.

Bilal Tahir:

I'm like, alright, I'm gonna fork your repo. I'm gonna make my own Game of Thrones, a Game of Thrones clone, etcetera. I know it's kinda nerdy, but I feel like the GitHub workflow, having a piece of code, then people making a branch of the code, which is a copy of the code, reviewing it, and, if you like it, merging it back to the main branch, or creating a fork and making its own thing. You can use that for text content. I'm surprised book authors don't use it, you know, but I think it's such a powerful workflow of versioning and exploring different ideas. And I feel like you can use it with content, you know, and

PiersonMarks:

Super interesting.

Bilal Tahir:

Have very interesting things like Star Wars, but Ghibli. I don't know. It's just like create that, you know, etcetera.

PiersonMarks:

No, totally. That makes my brain go in, like, 10 different directions, thinking how sick that would be. Because you'd have, like, a mainline branch with some content, it gets forked, and then you'd have to think about, okay, what is the atomic element of content? Is it the prompt?

PiersonMarks:

Is it a prompt plus the model? What are those baseline things? Is it the generated object? Maybe. And you take that object as context into the next fork, and you can make changes.

PiersonMarks:

And, like, what are those changes? The diff. The diff is like, hey, I applied this new prompt to this image, and it's like, that's cool. And then you can also reduce

Bilal Tahir:

the cost of creation. Git commit: add hot girl.

PiersonMarks:

Yeah. 100%. That's wild. And you can even imagine this with characters.

PiersonMarks:

If you're building a world where, I mean, putting this into Jellypod's shoes, people build their hosts as characters. And something that you've built, and we've talked about this, is memory and backstories and context awareness. And at the end of the day, if I clone my voice and I put in documents that make me me, and, you know, I want to send my host, my podcast, over to you, or at least allow you to use me as a character in your content, I should be able to do that. Is there a monetization angle?

PiersonMarks:

Is there a market that could be created there, where you have creators and you could pay to use, like, their AI models? And, oh, man. That's, well, so interesting.

Bilal Tahir:

It's a crazy world. I mean, it goes to the AI-to-AI economy we've talked about, where you almost have a chaperone who's like your agent, and he goes out and, you know, does business on your behalf and stuff. Then again, maybe that's the midway way of looking at it, and fifty years from now people will be like, yeah, these guys, in an age of abundance, were talking about trying to monetize five cents per video. What do you need money for? It's like that book, I don't know if you ever read the Culture series; it's a very fascinating book series.

Bilal Tahir:

It's one of the few sci-fi series I like, because it's basically set in a post-abundance era, but it's not dystopian. It's an era where machines have reached AGI, and so society is basically run by AIs. And so the humans kinda just chill and do whatever, but there's no such thing as money and stuff. Relationships are sort of different, you know, without getting into the details.

Bilal Tahir:

People can change their gender, their avatars, etcetera. And it's a fascinating, you know, thought experiment on what that looks like. I mean, it can be dystopian depending on how you look at it, but it's clearly not, which is what most sci-fi leans towards, like, oh, bad shit happens. But, you know, I feel like the Culture series is a cool glimpse into what could happen with us.

PiersonMarks:

For serious. Right. No. Interesting. Yeah.

PiersonMarks:

Cool. Well, I mean, on that note, I think we can wrap up here. We talked about a lot of stuff.

Bilal Tahir:

Before we wrap up, just a quick shout out to Kontext. I know we didn't get a chance to talk about it, but image editing, and we'll do a whole show on it, but Kontext image editing is out there. So they already released a model a few weeks back where you can give it an image and edit it, and it was state of the art, on par with ChatGPT Image 1. They just open sourced the model though, and what this allows is fine tuning the model. So Fal has done that recently, where you can take a bunch of images of, like, a broccoli hairstyle, and you can train the model to edit any image and give you a broccoli hairstyle. It just makes it easier to do that, and you can fine tune it.

Bilal Tahir:

And you don't have to be technical about it. They actually have a trainer on their site, and all you have to do is supply ten, twenty images of the style you want, maybe it's cartoon, you know, etcetera, and the model learns, okay, this is the style you wanna edit an image toward, and then you can create that style. So...
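A hedged sketch of that "handful of style images in, reusable style edit out" flow; train_style_lora and edit_with_lora are hypothetical stand-ins for whichever hosted trainer and inference endpoint you use (the exact dataset format and endpoint names vary by provider), not Fal's actual API.

```python
from pathlib import Path

def train_style_lora(image_dir: str, trigger_word: str) -> str:
    # Hypothetical trainer call: upload ~10-20 consistent style examples, get back a LoRA id/URL.
    examples = sorted(Path(image_dir).glob("*.png"))
    assert 10 <= len(examples) <= 30, "a small, consistent set of examples is usually enough"
    return f"lora://{trigger_word.replace(' ', '-')}-v1"

def edit_with_lora(image_url: str, lora_id: str, prompt: str) -> str:
    # Hypothetical inference call: run the base editing model with the trained LoRA applied.
    return image_url.replace(".png", "_edited.png")

lora = train_style_lora("style_examples/", trigger_word="broccoli hairstyle")
print(edit_with_lora("https://example.com/me.png", lora, "give the subject a broccoli hairstyle"))
```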

PiersonMarks:

it is it leaning more towards, like, realistic edits or is it leaning towards, like, anything or

Bilal Tahir:

I've seen mostly realistic, but I think you can do anything. If you go on their site, there's a bunch of things, age progression, baby version, background change, Cartoonify. Okay. So there's a Cartoonify professional photo like the LinkedIn one, object removal, Wojak style.

PiersonMarks:

Right. Alright.

Bilal Tahir:

Totally. Actually, the person who made it, Jonathan, I actually know him. He works at Fal, pretty cool guy. You know, I've followed him for a while.

PiersonMarks:

Oh, nice.

Bilal Tahir:

Check him out on Twitter too.

PiersonMarks:

Alright. Totally. Yeah. On Wednesday, I mean, Dex from HumanLayer, he was the organizer for the AI Tinkerers here.

Bilal Tahir:

Oh, nice.

PiersonMarks:

Yeah. I was hey. Yeah. That's sick. But cool.

PiersonMarks:

Okay. Flux Kontext brings me back to the days when I was trying to train a LoRA and do our podcast cover art. I was trying to figure it out, but then I was like, you know what? The prompts get me there. The prompts are cool.

PiersonMarks:

I mean, every every cover art that we generate is, like, pretty unique and different.

Bilal Tahir:

Yeah. Yeah. To be honest, I've never really gone too far down the training route, because I'm like, the image models are good enough for most edits. The reason you wanna use this is because the cost comes down. Like, if you wanna generate a ton of them, it makes sense to just train the model on 20 images, and then you can basically get the same image, slightly better editing, at a tenth of the price.

Bilal Tahir:

So that's probably the use case. It's also a good way to build wrapper products on. Maybe you wanna do LinkedIn photo to professional LinkedIn photo shoots; train the LoRA, and then you can create a state of the art LinkedIn image generator.

PiersonMarks:

Right. Right. And don't pay the $30 for those three images that those, like, wrapper companies charge. Do it yourself. Yeah.

PiersonMarks:

You know?

Bilal Tahir:

Yeah. Respect the hustle, but, yeah, it's a

PiersonMarks:

good move. Yeah. Totally. Totally. Well, cool.

PiersonMarks:

On that note, episode three of Creative Flux will be here next week. We'll get these out on Fridays, so come listen and join our conversation for, you know, semi-technical folks. We'll get a little bit more technical, a little higher level, and talk about everything that happened in the world of generative media and tech. So, alright. Yep.

PiersonMarks:

See you.

Bilal Tahir:

Take care, guys. Bye.
