An In-Depth Look into AI Image Editing

Pierson Marks (00:02.004)
Hey.

Bilal Tahir (00:03.884)
Hey, hey everyone. How's it going, Pierson?

Pierson Marks (00:06.91)
Good, good, how you doing?

Bilal Tahir (00:08.942)
Good, good. Another crazy week in AI land.

Pierson Marks (00:11.144)
This is a big day.

Pierson Marks (00:15.102)
Crazy week in AI land, and a crazy week for us: episode 10. We're halfway to 20.

Bilal Tahir (00:19.926)
Yeah, wow, double digits. What was the statistic? Like 80% quit by episode three or something, and then...

Pierson Marks (00:26.036)
It was: 90% of podcasters never get past episode three, and only 1% ever reach episode 20. I don't have my sources on this. It could be completely false, but it was a cool stat. Maybe it's US podcasts, I don't know. But yeah, so we're halfway there. 90% of work is showing up.

Bilal Tahir (00:47.054)
90% of work is showing up, you know, as they say. It's like that Mr. Beast thing, where people would come up to him like, hey, I need your help. You know, how do I become a successful YouTuber? And he goes, create 100 videos and then come to me. And he says most people won't do it, but the people who do it, by the time they hit 100 videos, they don't need my help anymore. So...

Pierson Marks (01:10.27)
Right. It's so true. It's true with everything in life. It's so cool. I mean, it's like, hey, you do something the first time, you're going to be ass at it.

Bilal Tahir (01:15.555)
Yeah.

Pierson Marks (01:21.312)
And the second time, you're still going to be bad. A third time, probably still bad. But every time you get slightly better, and you eventually get to this point where you're like, wow, your current work is just unrecognizable compared to the first version. That's what's so fun. That's what makes life so fun. You know that meme? It's not a meme, but it's engineering: the SpaceX Falcon rocket boosters, where version one just

Bilal Tahir (01:26.476)
Right.

Bilal Tahir (01:36.62)
No.

Pierson Marks (01:51.089)
has all these pipes everywhere and just looks like some garage project, like Iron Man put it together. And then v3 is super clean: it's smaller, it's more compact, it's more efficient, everything's better about it. And you just can't get to v3 without doing v1 and v2. It's just impossible. You can't do it. So, all right, all right.

Bilal Tahir (02:06.732)
Yeah, yeah, get through those versions. It's true. Hopefully we'll level up our game too as we do more of these episodes.

Pierson Marks (02:18.654)
Totally. Yeah, we were talking about this. So last week I was at an event called AI Tinkerers. That's how we met. Or actually, we didn't meet there, but we both went to those. And one of the presenters was showcasing his workflow for taking podcast episodes and turning them into

blog posts, essentially. So it took an episode, pulled the transcript, and created this high quality, well formatted blog post that would go up online. And it's just cool. That's the type of stuff that is a force multiplier, especially in the marketing space. They create a podcast, they just sit down and record it like we do.

Bilal Tahir (03:01.454)
Hmm.

Pierson Marks (03:07.75)
And then they have a pipeline that takes the transcript and creates a blog post derived from it. And yeah, it's interesting.

Bilal Tahir (03:11.864)
Right.

Yeah. And how do they do it? Do they use an MCP Claude Code workflow? What's behind that?

Pierson Marks (03:21.566)
They just coded it. They built it. They uploaded the video, the video got transcribed, and then that transcription went through this pipeline of formatting, essentially, into a plain text blog post, and that post got uploaded to their website. And we could do something the same. I think that's kind of the plan here, to just...

We're creating cool SEO content. It's going on YouTube.

Bilal Tahir (03:52.046)
Yeah, yeah, for sure. I mean, the art of repurposing is super powerful. And I feel like you have some workflows that are very similar. Like this video: you'll take it, there's the edit, there's clips, there's the transcription, there's the...

posts, the Buffer posts and stuff, et cetera. So all these steps, you know. And I know you've been thinking about those too. So yeah, totally. It's always a question of building a system to do the thing versus just doing the thing.

Pierson Marks (04:22.6)
Yeah, and the hardest thing, I mean, you know this too, there's just so much to do all the time. I'm like, man, I'm doing thing A and I need to do thing B and thing C. And if you don't get to thing B and thing C, tomorrow there's thing A, thing B, and thing C still. And you're like...

Because it's always a challenge, like multitasking. I'm a complete believer that multitasking is a myth: if you're multitasking, you're doing both things badly. I would like to be proven wrong. I haven't yet, really. But...

Bilal Tahir (04:43.863)
Right.

Bilal Tahir (04:47.562)
Mm-hmm. Yeah.

Bilal Tahir (05:02.602)
For sure. As Ron Swanson says, never half-ass two things; whole-ass one thing.

Pierson Marks (05:07.346)
Right, right, totally. But yeah, we'll see. This will be interesting. But this week, let's get down into it. We had some cool news. So last week we talked about Genie 3.

We talked about the future of Hollywood and filmmaking related to Genie 3. We talked a little bit about Claude Code and how that could be used as a general purpose agent. That was an interesting conversation. But this week, I think we wanted to talk a little bit more about...

The Photoshop killer, the Nano Banana model. Super interesting. I'll let you lead on that. And then there's some other stuff that we'll get to. But you just want to start off with Nano Banana and like, what is it? That's a weird name.

Bilal Tahir (05:46.83)
Hmm.

Bilal Tahir (05:59.054)
Yeah, yeah, we can get into it. But just before we start, I want to echo on Genie 3. You shared Bilawal's video with me, which was very interesting. I watched it, and there were a couple of things we hadn't called out, because we were focusing so much on the gaming and experience-engine side. There are some cool commercial aspects, where you can simulate robot training. I think we did talk about that, but also smart cities and stuff, where you can simulate

Pierson Marks (06:26.036)
Mm-hmm.

Bilal Tahir (06:27.83)
how traffic flows in a city, et cetera, or, if you want to go that crazy, something like weather patterns. And so that was a very interesting aspect I had not thought about: that you can use it for simulations and use that as training data for X, which could be robot training or just planning the grid, et cetera. So very powerful.

Pierson Marks (06:51.892)
No, it's super interesting. And for context, if you didn't listen to the last episode, Genie 3 is Google's world model. Essentially, they took a lot of data, probably YouTube, Google Earth, satellite imagery, everything, and created a real-time model

of the world. And so you could generate, based on an image or a video, an interactive world that you could walk through. Everything is being generated in real time. And that was just super interesting. You could just get

put into a world, whether that's the real world, a fake world, a piece of art, and actually interact with that image or whatever you just generated. Move around, like playing a video game. So cool.

Bilal Tahir (07:52.172)
Yeah, or have... you know, if you want an experience... I've never been a huge

Star Trek fan, just because I never got into it, but there's this concept called the holodeck. I have seen scenes of it, where people can go into a room and it becomes whatever, like an 1800s tavern, and it feels very real. So you can simulate that reality. That's a very interesting concept, where you could simulate a night out at the bar while just being in your pajamas. So it's very interesting.

Pierson Marks (08:26.206)
Don't worry.

Bilal Tahir (08:28.674)
You know, it doesn't bode well for us socializing, but I'm fine with that. IRL is overrated.

Pierson Marks (08:34.516)
No, yeah.

Yeah, it's so cool. We'll link the video in the show notes. It's 20 minutes, and it goes through, I think, eight different use cases of Genie 3.

Bilal Tahir (08:49.442)
Yeah, crazy use cases. And he didn't mention this because it's been very recent, but this is so typical in tech: they come out with Genie 3, and literally within a week some lab released an open source version, which was almost as good, and free and open source. So this space is...

Pierson Marks (09:04.5)
Right, it forces the hand, forces them to release it, because you can't use Genie 3 yet, right?

Bilal Tahir (09:09.006)
Yeah, yeah. But it's crazy how those things always happen. It's kind of like the four-minute mile, where one guy does it and then a bunch of people do it. So yeah, it's interesting. I think this tech is closer than we think it is, and it'll be very interesting to see how the way we interact on the internet changes because of it.

Pierson Marks (09:35.284)
Totally, totally, yeah. Yeah, totally. But cool, if you want to listen more to that, check out the last one, last episode.

Bilal Tahir (09:39.342)
Yeah. Yeah. Jumping back to image editing. So Nano Banana. What is Nano Banana? Actually, we don't know exactly what Nano Banana is. We do know it's Google's latest image editing model. They haven't released it; hopefully they'll release it soon. But it's basically state of the art image editing. And it actually...

Pierson Marks (10:00.094)
Wait, how do we know about it if it's not released and we don't know what it is?

Bilal Tahir (10:04.558)
Because Logan Kilpatrick, who's the lead PM, has been tweeting out banana emojis. So I think they're teasing it; that's, I guess, the goal. So what happens is there are sites like LM Arena. What these labs do is they'll put their model out with a code name. So it probably won't be called Nano Banana once it's out.

Pierson Marks (10:12.916)
Oh wait, so Nano Banana is out there. You could use it, but it's unofficial.

Bilal Tahir (10:28.858)
There are other models like Horizon, et cetera, which we don't know about. I mentioned last week that one is apparently better than Opus at front end, but we don't know whose it is. So they put them out there as a way to test the model and get feedback before they do a full release. And Nano Banana has been making waves on LM Arena and other sites like that. So that's why we know about it.

Pierson Marks (10:37.83)
Mm. Right.

Pierson Marks (10:51.38)
And LM Arena is a website where you have two models go head to head against each other? Like, what is LM Arena?

Bilal Tahir (11:01.932)
Yeah, basically it's an Elo rating type thing, where you ask a question and it gives you answers from two models you don't know about. It's kind of like a blind audition, and you pick the answer. And based on that, they come up with this Elo rating. It's actually a pretty genius system

from a basic standpoint, which has been totally gamified. And I actually don't believe in LM Arena benchmarks at all at this point, because a lot of them don't make sense. But it was a great idea when it started, and it took off. It's become the de facto place to go and look at models.
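The Elo-style rating Bilal describes can be sketched in a few lines. This is a minimal illustration, not LM Arena's actual implementation; the K-factor of 32 and the starting rating of 1000 are conventional defaults assumed for the example.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool, k: float = 32) -> tuple[float, float]:
    """Shift both ratings toward the observed blind-vote outcome."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    return r_a + delta, r_b - delta

# Two models start equal; A wins one blind comparison.
ra, rb = update(1000, 1000, a_won=True)  # ra rises, rb falls by the same amount
```

Each blind vote nudges the winner up and the loser down, with bigger jumps for upsets, which is why a leaderboard emerges from nothing but pairwise preferences.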

Pierson Marks (11:35.86)
And so for an image editing model like Nano Banana, how does that work in this head to head?

Bilal Tahir (11:45.026)
Yeah, I mean, it might not be LM Arena, because I know LM Arena is primarily text, but it could be; I think they might have added it. I imagine what happens is you edit an image, you get two different outputs, and you pick one, or something like that. But yeah, supposedly it's the best at editing, and it actually follows in the wake of Qwen Image, which is...

Pierson Marks (11:55.41)
Gotcha. Cool. Cool.

Bilal Tahir (12:06.7)
out currently; it has been released by Qwen. It's also a really good image editing model. It's out there on fal and Replicate, and it's open source, so you can actually fine-tune it as well, which is another interesting aspect of open source models I'll get into shortly. But Qwen Image is really good: very minute edits can happen while it maintains most of the image. And Nano Banana apparently takes it to the next level. And really, with image editing, I think the...

real big North Star is kind of eliminating Photoshop. You just...

take an image, you can edit minute details, and it just gives you that perfectly. And I think we're fast reaching that point. Now, I don't think Photoshop is going away, because I think you need a GUI and there's a ton of other stuff you have to do, but this will probably get baked into these editor workflow tools, where you just use these models. For me, what's exciting is character consistency. I've been kind of harping on this for a while, but I feel like one of the reasons we don't have coherent stories is because you start with

a character and the character changes, because you generate five-second videos, et cetera. With character consistency, what I've seen is... our friend PJ, he made an awesome commercial for, it was a tax company, I forget... sorry, it was Ramp. He did a 30-second commercial. And the way his workflow works is super interesting. What he does is he starts with these shots. So he has the image

and the main character, but edited into different poses, et cetera. That's how you come up with your main shots. And then from your shots you generate the videos and stuff, and then you stitch them together. I think we're going to see this new artistic workflow where you have these shots generated first, and then from the shots, using image to video, you're generating different videos and stitching them together. So very cool.

Pierson Marks (14:06.259)
Right. That's interesting. Yeah, you mentioned a lot of interesting topics. I think character consistency is something we've talked about a lot here: how important it is for movies, for video games, for all this stuff. There are so many reasons why being able to have consistent characters is important. And one of the things you mentioned, Nano Banana, so image editing in general, is going to be really interesting, because I've always really enjoyed

messing around with Photoshop and Illustrator. That was probably the...

When I was much younger, that was a pretty complicated, and still today is a very complicated, piece of software. And there are things that Photoshop does do well. I'll be interested to see what the GUI will be when the underlying software that powers image editing is no longer Photoshop-style press-buttons-and-do-calculations, but rather a model. Because one of the things that Photoshop does really well is layers, so you can kind of have history

Bilal Tahir (15:08.141)
Right.

Pierson Marks (15:10.179)
and sort of stack layers. And I wonder: when you use an image editing model, you take an image, you apply some transformation, and you get a new image. So there isn't really a concept of layers. Like, say I added text to this image; I asked my image editing model, hey, add text to the billboard that's in the background, and then you're like, actually, edit that text. You kind of want to compare it back and forth, or remove it.

Bilal Tahir (15:38.35)
Yeah.

Pierson Marks (15:39.959)
So the layering will be the question. What will the scaffolding be?

Bilal Tahir (15:42.434)
Right. And that's a good call out, because in that example, I would much rather, and I think most people would rather, just use a brush, select the element they want to edit, and not have to say,

the top right corner of the billboard, the green one. Because I've tried to do that, and sometimes it's ambiguous and you have to be really specific in your instruction. It's annoying to write that down. So I actually think that is not the experience most people want. I think what will happen is: right now people have these layers, they select a layer, and then they have these 20 different options they can use, or maybe 200. But now you can just prompt it: select the jacket layer, and now you've selected it, now change that to green.

So you'll probably have a pop-up box or something, but the layer concept, I think, is so baked in and so powerful that it'll probably remain, because it's much easier to select the shadow layer or the jacket layer rather than saying, change the jacket. Or, that's a bad example, but imagine there are 50 people and you're like, the person in the second to last row, no, second from the left... no, I'll just select the person directly.
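The layer idea being circled here, edits you can toggle or remove without destroying the base image, can be sketched as a stack of functions composited on demand. This is a hypothetical illustration (the dict-based "image" and the `composite` helper are inventions for the sketch), not how Photoshop or any editing model actually represents layers.

```python
from typing import Callable

# An "image" is just a dict of fields for this sketch; each layer is a named,
# pure transformation kept in a list, so edits stay separate from the base.
Image = dict
Layer = tuple[str, Callable[[Image], Image]]

def composite(base: Image, layers: list[Layer], enabled: set[str]) -> Image:
    """Apply enabled layers in order; disabled layers are skipped, not lost."""
    img = dict(base)
    for name, fn in layers:
        if name in enabled:
            img = fn(img)
    return img

base = {"billboard_text": ""}
layers = [
    ("add_text", lambda im: {**im, "billboard_text": "SALE"}),
    ("edit_text", lambda im: {**im, "billboard_text": im["billboard_text"] + "!"}),
]

with_text = composite(base, layers, enabled={"add_text", "edit_text"})
reverted = composite(base, layers, enabled=set())  # "remove the text" = toggle off
```

The point of the sketch is that removing or comparing an edit is just toggling membership in `enabled`, whereas a model that emits a flat image has already collapsed the stack.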

Pierson Marks (16:48.031)
Right. Totally. It's really interesting, because I wonder if this lives at the model layer or at the scaffolding software layer around the model. Could a model,

rather than generating an image, generate the Photoshop file that has the layers? And so rather than being trained on... yeah, SVG is a great example. For anybody, it's a scalable vector graphic, I think.

Bilal Tahir (17:10.21)
Right. Or the SVG or whatever. Yeah, right. Right.

Bilal Tahir (17:18.86)
Right. Right.

Pierson Marks (17:20.937)
But it's just a bunch of lines. It's a text file, essentially, that says, hey, from point A to point B, draw a line, and have that line be this thickness. And so a text model can generate SVGs pretty well.
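As a tiny illustration of the point that an SVG is just text a model can emit, here is a sketch that assembles an SVG document from `<line>` elements by string formatting; the coordinates and helper names are arbitrary choices for the example.

```python
def svg_line(x1: int, y1: int, x2: int, y2: int, width: int = 2) -> str:
    """One SVG <line> element: 'from point A to point B, with this thickness'."""
    return (f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" '
            f'stroke="black" stroke-width="{width}"/>')

def svg_doc(elements: list[str], size: int = 100) -> str:
    """Wrap elements in a minimal, valid SVG document."""
    body = "\n  ".join(elements)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{size}" height="{size}">\n  {body}\n</svg>')

# An "X" made of two lines; the whole image is just this string.
doc = svg_doc([svg_line(0, 0, 100, 100), svg_line(0, 100, 100, 0)])
```

Because the output is plain text with a regular grammar, generating it is the same kind of task as generating code, which is why text models handle SVG far better than raw pixels.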

Bilal Tahir (17:30.252)
Hmm. Right. That's true. And you use Photoshop. Photoshop is what, PSD? They have a special extension for their assets. PSD or something. Yeah.

Pierson Marks (17:41.75)
Yeah, it's PSD, right. And it doesn't have to actually be a Photoshop file, but could a model be trained, rather than producing an image,

to produce multiple layers of an image, which are collapsed when you view it but can also be expanded: hey, here's the baseline, here's the shadow we applied, here's the text we applied, here are the final curves and lighting and stuff.

Bilal Tahir (18:09.75)
Right, right. Yeah, for sure. I mean, we already have models that do text to 3D. They'll generate 3D, GLB models or something, where you can rotate it and get the whole thing. And I think that's a good call: you could probably generate whatever asset you want, optimized for manipulation and also for compression. A lot of people just generate PNG and JPEG, but why not WebP or AVIF or whatever?

Pierson Marks (18:26.729)
Right.

Pierson Marks (18:37.365)
I try to change everything to WebP nowadays, because it goes from like a one megabyte file down to a 20 kilobyte file. That's 50 times smaller.

Bilal Tahir (18:43.564)
Right. Yeah. I want to use WebP, but there are so many issues I've hit with things not supporting it, particularly with OG images and stuff. Maybe now it's better, but it used to be that it had to be a PNG, otherwise it wouldn't render. So a lot of people were annoyed at the...

the WebP guy. There was this funny meme of Vince McMahon: somebody says, sir, we found the WebP guy, and he goes, where is he? Where's that son of a bitch? And he just runs out. But of course you want more compression. And apparently AVIF, which is the latest and greatest, is even better than WebP, but nobody supports AVIF at this point. At least WebP has been around for a while.

Pierson Marks (19:13.205)
That's funny. Oh, that's funny.

Pierson Marks (19:24.201)
Right, totally. Yeah. And one more thing I think would be interesting before we move away from image editing.

So we were talking about Nano Banana, and how the challenge with image editing via a prompt is the lack of specificity, not specifying your intent well enough to actually get the intended output, because you have to be like, hey, select row two of all those people in the audience. I wonder... like, I use Photoshop.

They have some AI tools built in. They have this generative background, generative remove. They're really good; they work, and they integrate really easily into that UI. The other thing we've always talked about is tools: an LLM can call tools and perform actions. Like...

I just wonder if Photoshop simplifies the UI into, like, a Photoshop AI. Instead of CS... they're on Creative Cloud now, so there was CS6 and then Creative Cloud. Maybe the next thing is Photoshop AI or something. And rather than having all these buttons everywhere as the default, you just have a text prompt on the right-hand side and say, hey, do this thing for me,

Bilal Tahir (20:36.832)
Mm-hmm.

Pierson Marks (20:46.003)
and then it will actually hook into all the actions that Photoshop can normally do and call those as tools. So rather than, like, create a new layer for me, or add text, or, you know...

Bilal Tahir (20:55.598)
Hmm.

Bilal Tahir (21:03.042)
Yeah. I mean, that's an interesting business idea. Somebody basically builds a wrapper on Photoshop and gives you more of a Canva-like experience. And then you don't have to worry about the 50,000 different tools under the hood. Yeah.

Pierson Marks (21:07.517)
Right? I'm Gimp. Right?

Pierson Marks (21:16.501)
I think you see this all the time. This is something where you have these copilots in so many applications now, where on the right-hand side there's just a text box. Like Riverside, which we record on, has that.

Bilal Tahir (21:23.426)
Yeah. Yeah, you know what we should do... You know, one of my unsung heroes, who doesn't know I exist, and nobody knows about this guy: you know Photopea?

Pierson Marks (21:35.105)
Oh yes, yes, Photopea.

Bilal Tahir (21:36.696)
So Photopea, it's one of those OG apps. It was created by one guy, in like 2000-something

or whatever, I don't know. He makes millions of dollars from it, I think. It's a totally free tool, all based on advertising or donations or something, but it's ridiculous. He basically said, I want to create a free Photoshop browser version. And it's crazy: it's the simplest thing in terms of no auth, nothing. Obviously the application itself is super complicated, but it's literally just a free browser app. There's no gateway or anything. And he just keeps at it, for like a decade plus.

So yeah, crazy. Like an OG indie hacker. So definitely check out Photopea. And hopefully he adds some AI stuff like we were talking about. That would be great. Because I would love for Adobe to get killed, because they are a

shit company. Their subscription plans are insane, and they will not let you cancel. There's a huge issue with them where you have to call them to cancel and pay a cancellation fee, which is stupid. Why would you have to pay a fee to cancel a subscription? Ridiculous. So that's why I'm not a fan of Adobe, and I would love for the Photopeas and the Canvas and the Figmas of the world to eat their lunch.

Pierson Marks (22:50.43)
Right, 100%. No, totally. Yeah, well, we'll see what happens. When you have new innovation and paradigm shifts in tech, this is when companies die. We'll see what happens with them, and what happens with Apple.

Bilal Tahir (23:05.036)
Yeah, yeah, for sure.

Kind of closing the loop on image editing: another cool thing about image editing with open source models is that you can fine-tune them with LoRAs, which are a way of fine-tuning image models. There are some really cool fine-tunes; the Replicate team in particular is very good at putting out fine-tuned models. They have a couple of dedicated members who do it. And one of my favorites on Qwen Image is called real-life anime. So if you've seen the movie Who Framed Roger Rabbit, it's an old classic movie,

Pierson Marks (23:16.618)
Mm.

Bilal Tahir (23:36.736)
it's a bunch of Looney Tunes, kind of like Space Jam Looney Tunes, but the main character is real. So it's a real-life movie, but you've got toons in it. And I love that concept, and I wish we had more

movies and shows built around that concept. There are very few. It kind of happened around the nineties. There was a boom: Who Framed Roger Rabbit; there was an Arnold Schwarzenegger toon movie, which is bizarre that he did, I think it was Last Action Hero maybe; and then there's obviously Space Jam. So it was a fad in the late eighties, early nineties, and then it died. But this LoRA might bring it back. Now you can generate a toon image and everything else is cinematic,

Pierson Marks (24:15.99)
All

Bilal Tahir (24:20.57)
you know, real life. So that's just one example of fine-tuned models. The advantage of fine-tuned models is you don't have to worry about getting the perfect prompt. Models are getting good enough that you can do this with the right prompt, but a fine-tune just bakes the aesthetic in, and then you focus on generating the exact character or situation in the image and not worry about the art style.

Pierson Marks (24:47.584)
So would you recommend somebody who's trying to generate some media spend the time first seeing if there's a fine-tuned version of something out there, or playing around with the latest image gen or video gen model first? What do you recommend?

Bilal Tahir (25:07.82)
Yeah, I mean, I think start by trying out the base models and see if you like them. But if consistency is key for you, especially if you want an exact style, and second, you don't want to

type in a thousand-word prompt over and over again, then fine-tuning could be key. And the cool thing about fine-tuning with platforms like fal and Replicate is you don't need to be technical. You don't need to download PyTorch or whatever and fine-tune yourself. It's literally a GUI: you go, you upload 20 images of the style you want, and you press run. It used to cost hundreds of dollars; now it costs like a dollar or two, because it's a simple LoRA. And within, I think, 10 to 15 minutes, or sometimes even less,

maybe sometimes a couple of minutes, you can

fine-tune it. So you can fine-tune Qwen for image editing. And if you want to do a style on image generation, I would check out Flux Dev, which is from Black Forest Labs, the core team that created Stable Diffusion. They have some of the best state of the art models. Flux Kontext, we know, is good for editing as well. Flux Dev is probably the best cheap text-to-image model, et cetera. So you can use them.
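For anyone who prefers code over the GUI, kicking off a LoRA fine-tune through a hosted platform looks roughly like the sketch below. The parameter names (`input_images`, `trigger_word`, `steps`) and the commented-out training call are illustrative assumptions, not a documented contract; check fal's or Replicate's current docs for the real interface.

```python
def build_training_input(zip_url: str, trigger_word: str, steps: int = 1000) -> dict:
    """Package the ~20 style images (as a hosted zip URL) plus a trigger word.

    The trigger word is the token you later put in prompts to invoke the
    learned style. All field names here are hypothetical examples.
    """
    assert steps > 0, "training needs at least one step"
    return {
        "input_images": zip_url,       # hypothetical parameter name
        "trigger_word": trigger_word,  # hypothetical parameter name
        "steps": steps,
    }

payload = build_training_input("https://example.com/style.zip", "TOONSTYLE")

# The actual API call would look something like this (hypothetical, not run):
# training = replicate.trainings.create(
#     version="<some-lora-trainer>",
#     input=payload,
#     destination="your-username/my-style-lora",
# )
```

The useful part is the shape of the request: a zip of style images, a trigger word, and a step count are the whole interface, which is why these platforms can hide the PyTorch entirely.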

Pierson Marks (26:20.81)
Right. That's super interesting. I feel like for most people, yeah, it's super powerful, and even though it's so easy now, people don't even know where to start. So just to explain: if you're interested in generating images, most people probably know of ChatGPT and its image generation, and you could do that there.

My brother, which is funny, was visiting recently, and he had the Kling app on his iPhone. I was like, whoa, that's cool. He was making, like, hippos. He was doing something with hippos; I was like, this is weird. Hippos playing basketball or something. So fun. And then he's like, but I'm out of my credits, and, oh, I have to subscribe to this plan to get more credits. He's like, I don't want that. So it's definitely not as user friendly as an app. But for anyone out there that's kind of interested to play around with it, what could you do? The...

Bilal Tahir (27:05.1)
Yeah.

Pierson Marks (27:13.43)
Probably going to fal or Replicate is the cheapest you can get, I mean, besides buying your own GPU and hosting your model. You just go to fal, you have a GUI, and then you can fine-tune your image model.

Bilal Tahir (27:17.699)
Right.

Bilal Tahir (27:23.554)
Yeah, I love those platforms. It makes life so much easier. You get to stand on the shoulders of those giants, and they've optimized it. It's not trivial to host these models, run them optimally, get the right parameters set, et cetera. For most people, it doesn't make sense to go down that rabbit hole.

Pierson Marks (27:33.557)
Right.

Pierson Marks (27:39.989)
Right.

Pierson Marks (27:44.949)
Totally. One of the things I think might be interesting, because I'm curious too: let's say you have an image model or video model or whatever and you want to fine-tune it. You found the right model; you're able to fine-tune it. Those 20 images that you need...

Where do you get them? And how close do they have to be? Let's say I want to create a style of, like, pixelated Pokemon. That kind of makes sense; I can go to Google and just get some pixelated Pokemon. But how close and similar do they have to be? Do they all have to be the exact same, or how diverse should they be? What works?

Bilal Tahir (28:22.734)
Yeah, I think that's a good question. It depends on what you're trying to generate. If you're trying to generate, say, an anime style, a Pokemon style, then it doesn't have to be Pikachu in all

20 shots. If anything, you probably want Ash in one, and you want to capture a more diverse style. But let's say you want to create a fine-tuned Pierson model; then you probably want portrait shots that have you, but they can't just be the same photo of you. Maybe you want some photos where you're smiling, maybe some side shots and stuff. So you want a little bit of diversity.

But if you add four images of the back of your head, that's probably going to mess up the Pierson portrait fine-tune, right? So it really comes down to what you're trying to generate. And it's very interesting, because the taste of selecting these images and fine-tuning is its own kind of skill set. Now image models are so good that you can kind of just use prompts as well,

Pierson Marks (29:02.516)
Hmm. Right.

Bilal Tahir (29:25.712)
But back in the day, when Stable Diffusion first came out, people would fine-tune it on, like...

you know, Pixar style, Disney, old anime style, modern Studio Ghibli style. And then they would basically make presets out of these, like a dropdown option, and sell those as apps, which under the hood would just hit one of those 20 models. And there were apps making thousands of dollars from this, just because most people don't want to deal with the models themselves. Now I would say for most of these use cases, you're probably better off just making an optimized system prompt, a prompt that can generate the style you want, and using

that.

That'll get you to 99%. But if you really want the exact style, then fine-tuning probably is still worth it. And it's so easy, you can always try it. And it's also cheaper, because once you fine-tune that model... sorry, I didn't mention this, this is actually a great hack: people will go to ChatGPT's image model or one of these higher-end models, generate the 20 images, then fine-tune a Flux Dev or Schnell model, which costs, like,

0.00003 cents, right? And then you generate basically the same quality. So that's another arbitrage. If you're generating thousands of pics, you don't want to rely on ChatGPT, which takes a long time and costs money, right? So that can decrease your costs and latency. It's a great hack as well.
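The arbitrage here is just arithmetic: pay the premium model a few times to build a training set, then amortize it over cheap inference. The prices below are illustrative placeholders, not real vendor pricing:

```python
# Back-of-envelope math for the "distill into a cheap fine-tune" hack.
# All prices are hypothetical placeholders for illustration.
def total_cost(n_images, cost_per_image):
    return n_images * cost_per_image

frontier_price = 0.05     # hypothetical per-image price, premium model
finetuned_price = 0.0005  # hypothetical per-image price, small fine-tuned model
seed_images = 20          # images generated on the premium model to train on
training_cost = 2.00      # hypothetical one-off fine-tuning fee

n = 10_000  # images you actually need at scale
premium = total_cost(n, frontier_price)
distilled = (total_cost(seed_images, frontier_price)
             + training_cost
             + total_cost(n, finetuned_price))
print(f"premium-only: ${premium:.2f}, distilled: ${distilled:.2f}")
```

Even with the fixed training fee, the distilled path wins quickly once volume is in the thousands, which is the whole point of the hack.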

Pierson Marks (30:53.206)
In that example, are you overfitting to those 20 examples? So would that just be for a specific use case? Like, hey, I want to generate just a bowl of fruits. I go to ChatGPT: generate a strawberry, generate an orange, generate a banana, generate an apple. And then you fine-tune on those, and now you can generate different versions of the strawberries and bananas. But it wouldn't be great for... you can't do that and then expect to, hey, generate a super high-quality version of

Bilal Tahir (31:15.608)
Right, yeah.

Pierson Marks (31:23.25)
a car or something, right? So you'd have it kind of be...

Bilal Tahir (31:25.186)
Right. I mean, I think it's more.

more about style. From what I've seen on Civitai and other sites, people will fine-tune based on the aesthetic, like Arcane style (Arcane is a show that has a certain animation style). Midjourney is very popular because Midjourney just has this really cool aesthetic; you can generate characters that look amazing. So you can imagine, they can generate images which look like seventies grain, vintage, you know, with the freckles and stuff, and get that vibe. So it's not necessarily

generating a fruit bowl; it's more the style. Having said that, the other popular use case is if you want an Instagram model, where you want the same type of, let's say, redhead. That's a great way where you can generate Midjourney images of that model and then fine-tune a smaller model to generate more pictures of her or him.

Pierson Marks (32:20.51)
Right, right. It's super interesting. Yeah, there's still so much here where we're just figuring it out, you know, figuring out what's the best way to do it, how to make money with it, how to have fun with it. It's really cool.

Bilal Tahir (32:27.382)
Yeah, yeah, for sure.

Bilal Tahir (32:32.204)
Yeah, I mean, I think it's just a fun hack right now. I do believe, and I've seen this particularly in the last year, the gap is closing between fine-tuning and just having one model that generates whatever you like at a cheap enough rate that it doesn't matter. It used to be you needed to fine-tune; now it's more like...

for 1% of cases you fine-tune. And if you just play it out for the next six to twelve months and beyond, you're probably gonna get to 0.001%. Maybe if you're Coca-Cola and you want the exact Coca-Cola bottle, you're gonna fine-tune your model. But for most of us, it's like nano to micro to whatever; Pico Banana is just gonna be so good that it doesn't matter. So yeah.

Pierson Marks (33:12.682)
Hmm, right. So do you see any use case then? Like, say in five years, do you think there will be any use case for fine-tuning? Maybe in a movie, so character consistency, where maybe the freckle has to be in the exact right spot on the face, always, and that's just going to be impossible to define in a prompt.

Bilal Tahir (33:36.302)
Yeah, I mean, I don't know. The only thing I can think of is, if you just don't want to put in a prompt and stuff, you just want to have it baked in. So maybe that, and cost, because it'll always be cheaper to have a fine-tuned smaller model with that aesthetic. But at the same time, if the cost is like three cents versus 0.03 cents versus 0.003 cents, do you care?

Pierson Marks (34:04.663)
Mm-hmm.

Bilal Tahir (34:04.854)
So it's interesting. For me personally, I probably use those fine-tuned models a lot less now versus what I used to. I used them a lot more last year.

Pierson Marks (34:15.755)
Right. It just gave me this idea too: combining fine-tuning a model with image editing and video gen. When we're talking about video gen, we have video models that can do things pretty well throughout; they just generate some video. But if you take an image model...

You generate an image, and that's the first frame. Let's say the video is 24 frames per second; that snapshot is the first frame. Then you generate the second frame, which is pretty much: take the first frame, image-edit it, and then let the video model fill in the blanks. So you could have so much control.

Bilal Tahir (34:58.84)
Hmm.

Pierson Marks (35:03.895)
So that you have complete creative control, rather than letting the AI video model do everything. Like, hey, here's the first frame, here's the second frame. And you use the image editing model to be like, move the head from left to right, so it goes from here to here, with that nice image editing. Yeah.
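The workflow being described, building keyframes by repeatedly editing the previous frame and then letting a video model interpolate between them, can be sketched as a loop. Everything here is a stand-in: `edit_image` is a stub where a real image-editing model call would go, and the frame dicts are toy stand-ins for actual images:

```python
# Toy model of the frame-by-frame control idea: start from one generated
# image, apply one small edit per keyframe, and hand the keyframes to a
# video model to fill in the in-between frames.
def edit_image(frame, instruction):
    # Stub: a real image-editing model call would go here.
    # We just record the edit history so the flow is visible.
    return {"pixels": frame["pixels"], "history": frame["history"] + [instruction]}

def build_keyframes(first_frame, instructions):
    """Each keyframe is the previous keyframe plus one edit instruction."""
    frames = [first_frame]
    for inst in instructions:
        frames.append(edit_image(frames[-1], inst))
    return frames

start = {"pixels": "frame-0", "history": []}
keys = build_keyframes(start, ["turn head slightly left", "turn head further left"])
print(len(keys), keys[-1]["history"])
```

The design point is that each edit is relative to the previous frame, so the creator controls the motion arc explicitly instead of describing the whole shot in one prompt.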

Bilal Tahir (35:15.998)
Hmm. Yeah, it's like old school animation, where they would draw a picture and then move the character a bit, draw a little more, and superimpose stuff on it. So that's pretty interesting. Yeah. I have thought about it. It's interesting to me that image editing has gotten so cheap, but video is only now getting cheap. And I've thought about, why can't we just generate 60... if you can generate

Pierson Marks (35:26.878)
Right, right,

Bilal Tahir (35:38.414)
60 images at 60 frames per second, or 24 images at 24 frames per second... for five seconds that's 24 times 5, so 120 frames. So if you can generate 120 images for like five cents, why does a five-second video cost 25 cents? From at least my naive understanding, there's more to it, because there's stuff like interpolation that is not captured in just having the frames, and you kind of need to do it more end to end. So it's interesting.
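The arithmetic in that comparison, using the conversation's rough numbers (which are illustrative, not quoted pricing from any vendor):

```python
# Frame count for a short clip, and the naive cost comparison
# from the discussion. Prices are hypothetical round numbers.
fps = 24
seconds = 5
frames = fps * seconds  # 24 * 5 = 120 frames

image_batch_cost = 0.05  # ~5 cents for 120 images, per the conversation
video_clip_cost = 0.25   # ~25 cents for the same 5-second clip

print(frames)
print(round(video_clip_cost / image_batch_cost, 2))  # ~5x premium for video
```

The gap between the two numbers is what the "why can't we just generate the frames?" question is pointing at; the answer offered is that temporal consistency and interpolation are not free.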

Pierson Marks (36:06.441)
For sure, sure. Wait, I'm just looking... wait. Sorry, I'll cut this out. I need to make sure that...

Pierson Marks (36:16.375)
I'll cut that out. I was like, shoot, I don't want to get a ticket right now. But no, I'm good. No, I agree, I agree. So we'll see what happens with the video stuff.

Bilal Tahir (36:17.346)
What happened?

Bilal Tahir (36:28.686)
You parking illegally?

Pierson Marks (36:31.283)
No, no, no, I see my car right here. I just heard the street sweeper and I was like, wait, wait. No, I'm good. I'm good. OK. Yeah, sweet. OK, cool.

Bilal Tahir (36:38.167)
Yeah

Yeah, anyway, very cool image editing. Check it out. I think there's a lot of powerful use cases, like generating consistent characters or a certain art style. I'm...

very interested in playing around with it, hopefully this weekend. We did a little bit on Qwen Image, and Qwen Image is also pretty good. So if you don't want to wait for Nano Banana, definitely check out Qwen Image; it's pretty cheap as well. It's all on fal.

Pierson Marks (37:15.381)
Also on fal. Also on fal. Nice.

Bilal Tahir (37:18.636)
And you can try a fine-tune. It's called Qwen Image Trainer. If you ever want to fine-tune anything, look for something called an image trainer; they usually have that. And also check out Flux. It's a little dated now, at least in AI land terms, but Flux Dev... and Flux had an update, FLUX.1 now. So good. I actually left it for a while because I was just generating one-off images with GPT Image 1 on ChatGPT. And I went back and realized, re-realized, that

what I was missing was that I would generate an image on ChatGPT and it sometimes takes two or three minutes, and it just completely disrupted my flow. On fal, I generate an image and it literally takes a second or two. And then I can just...

Something about the instant feedback makes me iterate on that prompt so much. I'm like, wait, no, add a little back here, remove that. And I have no qualms about generating 30, 50, 60, a hundred images of the same thing, because it costs next to nothing. So there's something about getting that latency down. I think that's very crucial for the creative process.

Pierson Marks (38:08.663)
Totally.

Pierson Marks (38:21.367)
All right.

Pierson Marks (38:27.223)
I mean, this goes full circle to what we started the conversation with: iteration speed. V1 to VN. The key to improvement is just faster iteration, with everything, whether that's image generation, building a company, building a product, rocket boosters. If you can decrease that, everyone's happier and your products will be better.

Bilal Tahir (38:51.618)
Yeah.

Bilal Tahir (38:55.566)
I know. I mean, it would be awesome. Basically, Flux Dev is so good, but having Midjourney quality, that would be so cool too. It's kind of frustrating to me that Midjourney has had this amazing aesthetic that somehow we've not been able to quite replicate, for some reason. I don't know why. I don't know what the secrets are. I mean, it's probably a lot of copyrighted data, that's why. I don't know. But...

Pierson Marks (39:20.991)
I wonder what happened. Do you know what happened with the copyright suits? Are they still ongoing with Midjourney? I haven't followed. Yeah.

Bilal Tahir (39:27.79)
I have no idea. I think they're still ongoing and stuff, or maybe they settled, but they're moving forward. They have a video model as well now. Coupled with that, I follow some artists on Twitter who generate amazing Midjourney images and then animate them. I can see these guys creating amazing storylines. You know, they'll have...

Pierson Marks (39:31.766)
Yeah.

Pierson Marks (39:35.639)
Right.

Bilal Tahir (39:51.69)
an art style, and from there they'll generate different characters and storylines, and I start, you know, relating to the character, because I've seen that art style pop up on my feed all the time. And I'm like, wow, he's having a coffee now. Wow, he's with his girlfriend. Suddenly I'm on a journey with this character. So this is going to be super powerful, how you can create these stories in a style that people can relate to.

Pierson Marks (40:03.66)
Right.

Pierson Marks (40:10.103)
Totally.

Pierson Marks (40:19.957)
For sure. It is super interesting. Yeah, it's a wild world. I mean, I know that ElevenLabs had their music API. We talked about that a little bit. ElevenLabs Music, which we mentioned the other time, is now available via API. So you can use that in your product. Text-to-Speech V3 too.

Bilal Tahir (40:37.73)
Yep, yeah, same with the Text-to-Speech V3 API, which is also available now, which is great for developers. If you want to build something, you can take ElevenLabs Text-to-Speech V3, which is their best speech model,

take that, generate amazing dialogue, and you can add speech tags like laugh or giggle or whisper to make it super realistic. Then you take the music API and add a nice jazz background overlay or whatever. And then you can take the sound effects API and add sound effects as well. So basically, end to end, you can create a super realistic scene
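The layering step, speech on top of a quieter music bed, is conceptually just mixing sample streams with gain. This is a toy sketch on raw float samples; a real pipeline would decode actual API audio output and use a proper audio library rather than hand-rolled lists:

```python
# Toy sketch of layering a speech track over a music bed.
# Samples are plain floats in [-1, 1]; real audio would come from
# decoded API output, and mixing would be done by an audio library.
def overlay(speech, music, music_gain=0.3):
    """Mix two equal-length sample lists, ducking the music under the speech."""
    return [s + music_gain * m for s, m in zip(speech, music)]

speech = [0.5, -0.5, 0.25, 0.0]  # pretend TTS output samples
music = [0.2, 0.2, 0.2, 0.2]     # pretend jazz-bed samples
mixed = [round(x, 2) for x in overlay(speech, music)]
print(mixed)  # [0.56, -0.44, 0.31, 0.06]
```

Lowering `music_gain` is the ducking decision: the bed stays audible without competing with the dialogue, which is the whole point of stacking the TTS, music, and sound-effect outputs.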

Pierson Marks (41:17.207)
Totally.

Bilal Tahir (41:19.412)
using all the tools.

Pierson Marks (41:21.269)
Yeah, no, it's wild.

Bilal Tahir (41:22.062)
It's super powerful combining them all together. I think there's still some alpha in combining these disparate tools. One of the reasons Veo 3 is so viral is because it has video, it has sound, it has dialogue, right? It combines it all for you, but it's still expensive. And you can kind of get there if you just stitch these tools together, but it takes some work. That's the alpha, right? Right now there's only a certain number of people who can stitch those tools together.

Pierson Marks (41:48.225)
Right, totally.

Pierson Marks (41:52.569)
And there will always be some alpha here. I think there was an interview I watched the other day with Greg Brockman, the president of OpenAI, and he was saying how, when he started computing, he was like, man, I'm too late, the internet already happened, there's gonna be nothing else for me to build. And it's just so funny. It's always like that; it's never gonna be too late. So if you're listening to this and you're like, man, Veo 3 came out and...

Bilal Tahir (41:54.99)
Yeah.

Bilal Tahir (42:06.337)
Right.

Yeah.

Pierson Marks (42:17.72)
everyone's going to be able to do all these... no, there's always going to be alpha. It would be more or less depending, for sure, but there'll always be another wave, other opportunities, new things, especially in a space that moves so quickly.

Bilal Tahir (42:30.907)
It's all about the mindset. Some people just have that mindset. Pieter Levels put out... you know, Pieter Levels, just in case anyone doesn't know, he's this OG indie hacker who makes millions of dollars on his solo dev businesses like Photo AI. I think his superpower is just...

It's not even some unique insight. He's just looking at it from a sales angle, like, oh wow, I can make a product out of this and I can sell it. And the most recent thing he was looking into was what we were talking about. He's like, oh, I literally went to my friends and was like, hey, you have a shop, you have a product; I can create this avatar, and you can edit the video and add your own thing, and the avatar can sell it, and then you can put it on TikTok or whatever. UGC. Again, if you've been in this world, something people have been doing for months, but he just...

you know, the way he phrased it, the way he was putting it out, going and talking to users, and everyone was like, yeah, I would love to have that. Because most people don't live on the internet; it's just us folks. They don't know you can do that. And if you can package it up into a nice, awesome product that makes it clear what the value add is, and then go find those users, you can make a business out of it. And you probably will. I think you'll make bank on this.

Pierson Marks (43:43.287)
Absolutely. And we're not saying it's easy, but it's not scary either. I mean, it will be hard, and I think he makes it seem easier than it is. The difficulty comes from the ambiguity and from actually doing things. I think so much of the difficulty in running a business is...

the unknown unknowns. You don't even know what you should be doing; you don't even know how to approach it, because you're not even thinking of it. You don't know what you don't know. And the only way to minimize that is just trying.

Bilal Tahir (44:11.341)
Hmm.

Pierson Marks (44:25.568)
And you'll try the first time and fail, and you'll realize, in hindsight, I should have done this differently. And the next time you go try that and fail again. And so Pieter Levels, I mean, he had a lot of products before to get him to the point where he is now. He's also a non-traditionally trained engineer, but he does stuff like what you were talking about, putting the Rhode lip gloss on the UGC creators.

Bilal Tahir (44:36.206)
Right.

Bilal Tahir (44:49.134)
Hmm.

Pierson Marks (44:51.606)
Like, he has the energy, he has the skill to do that. If you asked me, hey, could you do this right now? It would take me a long time. I wouldn't know how, and I probably wouldn't be able to get to the level of quality that he does at all. He's been doing this stuff for so long. But just take his mentality: you can do whatever you want to do. It's not going to be straightforward or necessarily easy, but it's not going to be impossible.

And you should try it and iterate and push through the hard parts.

Bilal Tahir (45:25.464)
Yeah, I mean, it's all about going through those reps and developing taste. It comes back to what we were saying: taste is just so important, and he's developed it, basically a taste for what people like. And that is such a superpower. It's probably going to be one of the last few superpowers us humans will value as everything else gets automated. So develop that taste, go out, talk to users, put your stuff out there, develop a feel for what works and what doesn't.

Pierson Marks (45:53.612)
Right, right. The beauty. Bring back beauty. I love it. Yeah. But I mean, on that note, I think we covered Nano Banana. We covered ElevenLabs Music. We covered image editing.

Bilal Tahir (46:05.9)
Yeah, yeah, this was an image editing one. We should have a theme for every episode. This was the image editing episode, I guess, right? Last time was the world model episode: Genie, real-time playable worlds. Yeah, because we do cover a bunch of things.

Pierson Marks (46:12.418)
All right.

Pierson Marks (46:17.048)
It's hard sometimes to do a themed episode because... right. But yeah, the image editing episode. Hopefully this is helpful for people who don't know how to get started on image editing, or maybe we had some tips in there to do it better. Yeah.

Bilal Tahir (46:35.372)
Please do leave comments, if you have any ideas, about how you would use image editing or any of the other things we mentioned, because we love reading those, and we'll talk about them in the next episode if it makes sense.

Pierson Marks (46:55.981)
Right, yeah. Well, on that note, episode 10 wrapped up. Crazy, 10 episodes. Follow us on Spotify.

Bilal Tahir (47:00.802)
Woo, 10 episodes. Wow, we did it.

Pierson Marks (47:07.761)
So yeah, cool. I know, I know. My computer's been acting weird recently.

Bilal Tahir (47:11.544)
No, you cut out there.

Bilal Tahir (47:17.559)
No, you gotta do your sign-off again. That was cool. It was a...

Pierson Marks (47:21.175)
Yeah, wait, wait. I don't even know what I said. So, episode 10... I don't even know what I just said. I lost it.

Bilal Tahir (47:29.724)
You're saying follow us on Spotify.

Pierson Marks (47:31.479)
Oh yeah, yeah. So yeah, episode 10 in the books. Follow us on Spotify, Apple Podcasts, YouTube, anywhere you get your podcasts, and we'll see you next week. Cool, bye.

Bilal Tahir (47:42.094)
Cheers, bye guys.
