Gemini 3 Flash, ChatGPT Image 1.5, Meta SAM Audio

Pierson Marks (00:00)
Okay. ⁓ We're live. Episode 25. ⁓

Bilal Tahir (00:01)
We are live. Hello, hello, hello. 25th

quarter of a century. Here we are.

Pierson Marks (00:07)
quarter

we were just talking before this about how we're going to... like we already clip up Creative Flux into little short clips and post it to TikTok and YouTube and everything. But we just released clips in Jellypod. And so we're gonna start clipping these into audiogram videos and other stuff too. So excited to see that.

Bilal Tahir (00:26)
We dogfood

our product. I think it's cool. The templates are super cool. I think a lot of people will like them.

Pierson Marks (00:28)
Dogfood the stuff.

For everybody that's watching for the first time, this is Creative Flux. We talk about generative media, specifically AI video, audio, world models, music, image editing models. I don't even know, I probably forgot some things, but we try to focus on the creative side of generative AI versus the...

LLM stuff, but sometimes it goes into other things and other topics. Like last week we talked about Disney and OpenAI's deal, which is really exciting. We also talked about Paramount and Netflix ads, ElevenLabs and Meta. We talk about everything. So if you want, yeah.

Bilal Tahir (01:10)
A lot of these

deals, actually... the one that kind of slipped by us, we hadn't talked about it: Suno and Warner Brothers as well. That's a huge one.

Pierson Marks (01:18)
let's talk, what happened there?

Bilal Tahir (01:20)
So Suno and

Warner Brothers group, that's... they own all the music. They had some sort of agreement where I think their artists can, will, are gonna be on Suno potentially, which is gonna be... I mean, Suno's biggest challenge was being sued out of existence, and I think they're finally making the deals there as well, same as, you know, OpenAI is doing with Disney. So it'll be interesting to see how this pans out. A lot of people, I went on their subreddit, a lot of people are hating on this. They're like, this is the end of Suno, they're gonna restrict stuff because they're...

Pierson Marks (01:25)
WMG.

Bilal Tahir (01:53)
there's been a couple of controversial moves, like Udio, the other one. I think they had this thing where they didn't let free users download their music, and there was a huge backlash on that. With Suno here, everyone's like, oh my God, they're gonna take my music off, they're gonna ban it. But I actually think it's gonna be the other way, where they're gonna have more artists, maybe like Grimes. She put her voice out and she's like, anyone can use it to collaborate with me as long as I get my royalty. And I think you're gonna see a lot of this, where musicians will just say, you can make a cover out of my song or use it,

Pierson Marks (01:57)
Mm-hmm.

Bilal Tahir (02:20)
but just give me my pound of flesh.

Pierson Marks (02:23)
But the thing is, the musicians don't have that power to, it's the labels that do. And so it's like, yeah, so it's interesting. If you're independent, like you might see Chance the Rapper or somebody, and like Macklemore, those guys have more power to do that. But yeah.

Bilal Tahir (02:26)
Well, exactly, yes.

Yeah, well,

I actually think the labels are more likely to sign. At least in my experience, from what I've seen, it's the artists that are very... it's like, my art, I don't want to share, versus labels are like, business is business, money is money, you know, sure. So, all right.

Pierson Marks (02:50)
Right, right,

totally. I think it's just like the structure is going to be very interesting, to see how these deals pan out, because yeah, Udio, they prevented the downloading of music right when they... Who acquired them, or did they have a cease and desist? I forget.

Bilal Tahir (03:02)
Did Udio have an acquisition? I didn't know. I thought they were still going.

Pierson Marks (03:06)
Something happened. Let me look this up because I...

Yeah, like, I don't know if you see this, yeah, so WMG, Warner Music Group, they're getting acquired, or a partnership, I mean, between them. And so they said that they'll be rolling out more robust features for creation opportunities and collaboration. But yeah, at least there's a partnership there. So like you have a label supporting.

Bilal Tahir (03:29)
It's

a...

It was an interesting rabbit hole, I definitely recommend it. There's this, just a random fact, but I stumbled upon this channel. It's called Top Music Attorney. It's like the...

This individual, she's like awesome, amazing. So she's the lead lawyer who's suing Suno. They're suing for artists. And she was actually a former musician, well, still a musician, but she's a lawyer. She talks all proper, buttoned up, this is the legal implication. But then she has a music channel where she's just singing really hardcore rock songs. And I'm like, this is awesome. This person has these two personas like that.

Pierson Marks (04:05)
Right, totally.

Bilal Tahir (04:07)
It's a very interesting channel. ⁓

Pierson Marks (04:09)
I'll have to check that one out. Last night I had dinner with a few other founders and Wilson Sonsini, which is like one of the oldest tech law firms here in the Bay. Like they took Google public, maybe they took HP public too. You know, these guys have been around for a while.

and two partners were there, and we were talking about copyright and IP law. It was very interesting. You know, we've mentioned this in the past, but the latest guidance from the USPTO was about how generative media, like if you're one-shotting an image, like if you just prompt an AI to generate an image, that's not copyrightable, but if you go and you use an image editing model and you spend time kind of editing

the model and, like, put effort in there, then it is copyrightable. And so it's very interesting, because I have the opinion that this is just temporary guidance. Anyway, I do believe that in the near future, any image or any asset generated by an LLM or some generative model will be copyrightable, independent of whether it was created via, you know, somebody recording it in a studio, like

if I'm playing a guitar, that's copyrightable, or if I'm taking a photo, that's copyrightable.

Bilal Tahir (05:24)
According

to this legal guidance, it's saying you cannot copyright. If you generate an image out of GPT Image, it's not yours.

Pierson Marks (05:32)
Correct,

that's what the Patent and Trademark Office has issued: this is copyrightable, this is not copyrightable. I can link it in the show notes. I spent a lot of time on it. It's huge, it's like a 50 page thing. I read it in depth because it's very interesting, because I...

Bilal Tahir (05:44)
Right. Really?

And is it because it's a derivative work? Like, what is the technical classification of why you can't do that? Like, it's not yours, it's some sort of...

Pierson Marks (05:56)
So, I mean, I don't know the right answer to your question, like why it's not, but they say it's not. It's more of, hey, if you go to ChatGPT and you ask it to generate an image, let's say a bald eagle flying over the White House, that image that you generated via prompt, not copyrightable. If you took that image and you took it into Nano Banana Pro,

and you edited the image, say you selected the beak and you're like, hey, put a snake in that beak. You selected the grass, you put some flowers in the grass, and say you went over here and you put a person on a bicycle. You spend time to edit and, you know, create that image more, put some more effort into it. And that's the key thing, the effort, right? And so then you get into this world where it's like, okay, you one-shotted the image, not copyrightable,

Bilal Tahir (06:39)
Right. Yeah, what constitutes effort?

Pierson Marks (06:47)
you spent some time editing the image, copyrightable, right? And they're saying it's because of effort. And then you have this question, okay, well, if I take a photo of something, that is copyrightable. You know, I could spend an hour lining up the perfect photo of Yosemite, spending a lot of time and effort to get the right shot. Or I could take out my phone and just snap quick pictures, spend no effort.

Bilal Tahir (07:08)
Yeah, maybe that's

the solution. You generate the image and then you screenshot it.

Pierson Marks (07:12)
Well, you know, like when you get into the law, it's not about... it's actually about intent and tests.

Bilal Tahir (07:16)
Yeah.

But I agree with you. This

is a very midway take, because we're going to be generating tons of images. And this doesn't even... I mean, what if it transfers into other domains like music? We talked about, I don't know if we explicitly talked about it, but there was an artist, Xania Monet, she just signed a $3 million deal, and she was, you know, the first human artist using AI that broke out, that's making money out of this. And there's probably others as well. But if she can... I mean, with music, it's even more important that you have the

copyright to your songs, right? So...

Pierson Marks (07:50)
you have to be

able to assign that IP to yourself or some person. Yeah, so you'll see, because that deal would not have been signed if they believed that those works were not copyrightable. So they definitely are. And so there's a threshold test, like in any legal domain, there is a test, like a three part test. I forget the test for this, but if it passes the test, it passes the test, and...

Bilal Tahir (07:52)
Yeah, so very interesting.

Pierson Marks (08:14)
Right now the test is kind of effort-based, which I think will not be the case in the future. I think all created works, independent of the medium, are going to be copyrightable. I'm not a lawyer, but I think that's just going to be what's adjudicated in the Supreme Court, probably.

Bilal Tahir (08:29)
Yeah,

I feel like copyright law, it's built over hundreds of years of precedent. It's all up in the air right now because we've never lived in this kind of age where you can just reproduce everything so easily. I feel like so many things just go out the window in this day and age, even with covers and stuff. I was so interested in this... what I wanted to do was make a Christmas album or whatever.

Pierson Marks (08:45)
totally.

Bilal Tahir (08:53)
I just wanted to make music and stuff. But I'm looking at this IP stuff, and this is how I discovered all of this. I'm like, oh, can I make a cover of Madonna's song or whatever, right? And there's so many specific things, because you have to get... there's something called a mechanical license, which is what you get. Like, literally, if you go to a bar and sing a karaoke song, they literally have a license for you to sing that song in that area, which is called a mechanical license. And you kind of have to do that.

If you want to... you can't just upload a cover song of, let's say, whatever Madonna song, you can't upload that to YouTube directly, because that would violate it. Yeah, I mean, you can't just do covers. Technically you're supposed to get a mechanical license, but nobody does. So most people are actually in breach. So what they're doing now, Spotify and YouTube, they have this automatic thing where they'll get the licenses for you and then you just pay them. And there's others, I think maybe DistroKid or whatever, they kind of do that for you.

Pierson Marks (09:28)
Really?

Interesting.

Bilal Tahir (09:49)
Then there's another one, which is called... I forgot the exact name, but let's say you take a song and you actually update it. Like you take one part of the song and you change it to rock or whatever. That's actually a bigger deal.

And there's actually a bigger license you have to get for that. So there's a license that is easy to get, which is the mechanical license, and then there's this whole other license. So I definitely recommend looking into it. But the fact that I had to spend an hour of my time just to figure this out, I'm like, what the hell? This is insane.

Pierson Marks (10:19)
So the question I have

is, you know, those lo-fi fruits channels... not lo-fi, but like the fruit channels. No, not ASMR, not lo-fi. There's these artists, they're pretty much cover artists that cover everything. You take some song, and then essentially these cover artists, they're called fruits. I don't know, they're on Spotify, they're on YouTube, everything, and you...

Bilal Tahir (10:23)
Yeah.

Like Lofi Girl or, no, ASMR. ⁓

Right. I have never heard

this term before. Fruits. Interesting.

Pierson Marks (10:49)
Well, it's

like the artist or whatever, and they have, I forget... it's like Melon, and they have like Melon covers. I think, yeah, it's Melon radio, and they have covers of every single song, essentially, and they're in different genres, different...

Bilal Tahir (11:02)
Mmm.

Pierson Marks (11:07)
They don't sound anything like the song, except the lyrics are the same and the melody is the same, but it's in different genres, different BPMs, everything, and they're all over Spotify, all over YouTube. I used to listen to them all the time. It's Melon on Spotify. They have half a million listeners. They have like "The Rhythm of the Night," "Blue," "Replay," like... You'll start to see them now, I guarantee.

Bilal Tahir (11:30)
Yeah, I mean, that's interesting. I wonder... I mean, but again, they could be winging it, but I assume they have some sort of... they've done the licensing deal, because it's a reinterpretation and so on. And I'm sure you don't go to the artists, you go to some centralized group that kind of does all this stuff and they just take a fee. So it's very interesting.

Pierson Marks (11:49)
Gotcha. That's interesting.

Bilal Tahir (11:52)
But

again, this is like such a headache. People should just be able to create stuff. This is where technology is gonna be so important. I mean, I think the original artists, should they get a royalty? Obviously, yes, of course they should, but...

anyone who just wants to make a cover, should they be spending hours and hours trying to figure out how to stay on the right side of the law? I mean, that's just not gonna work. Most people are just gonna either give up or they're just gonna wing it. So you might as well come up with some automated mechanism for this thing to work.

Pierson Marks (12:22)
Alright, I mean,

but you have the bias of not being the artist, not being the label. But you know, no, I agree. The music world and copyright is interesting. But okay, let's move on, because we have a bunch of stuff.

Bilal Tahir (12:35)
Yeah, today is a

shorter episode. We're going to be taking a Christmas break, I guess, for a couple of weeks

after this, you know, so let's make it good. But there's a lot of news we have to cover before we go. I guess the big one this week was Gemini 3 Flash came out. I was seeing every Google or Gemini employee just posting three lightning emojis over and over again. It was super hyped. And at least on the benchmarks, it has delivered. It was almost as good as Gemini 3 Pro, blew past 2.5 Pro. And we've seen this before. This is kind of the expected cycle where whatever the

Pierson Marks (12:43)
Right, episode 25, totally. 100%.

Bilal Tahir (13:11)
Pro model was previously, the Flash model kind of beats that, but at obviously a tenth of the cost and very fast, and that's what Flash is.

Pierson Marks (13:20)
Right. Cause you just distill,

you just distill the pro model down into like...

Bilal Tahir (13:24)
Right, yeah, the cycle is, every six months, and I've heard Hassabis talk about this, every six months they do a new fresh run, they get the model, and the big model is the Pro model, then they distill it down to Flash. They used to actually distill it down even more, to like a Lite version, they did three tiers, but now they've kind of settled on two, so Flash and Pro. Pro is obviously the state of the art model, but it's a little more costly. Flash is for if you want something fast and cheap.

And in my opinion, Flash is usually a bigger deal than Pro in a lot of ways, because for most of the things you want to do, like quick generations of text, autocompletes, autocorrections, if Flash is good enough for that, you would use Flash. So I'm very excited about Flash. And I'm particularly excited about

the other models that sit on top of Flash, which would be Nano Banana Flash, which hasn't come out yet, which would be the visual model, right? Right now we have Nano Banana Pro at $0.15 an image. Flash, I'm assuming, will be around $0.04, since the previous Flash was $0.04 an image, so I'm guessing it'll be around the same when Nano Banana Flash 3 comes out. And then you have the TTS model. So we have Gemini 2.5 Pro text-to-speech.

Pierson Marks (14:25)
All right.

Bilal Tahir (14:37)
Very good, and hopefully they'll release a TTS model for 3 as well, which would be better voice quality, you know, better prompt adherence and so on.
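(For anyone who wants to try the existing Gemini TTS that Bilal mentions, here is a minimal sketch using the google-genai Python SDK. The model id, voice name, and audio handling follow the public preview docs as far as we can tell; treat them as assumptions and check the current documentation before relying on it.)

```python
# Minimal sketch of calling Gemini's text-to-speech via the google-genai SDK.
# Model name, voice name, and response handling are assumptions based on the
# public preview docs; verify against the current docs.
import wave
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",  # assumed preview model id
    contents="Say warmly: welcome back to Creative Flux!",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
            )
        ),
    ),
)

# The preview returns raw 24 kHz 16-bit PCM; wrap it in a WAV container.
pcm = response.candidates[0].content.parts[0].inline_data.data
with wave.open("line.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(24000)
    f.writeframes(pcm)
```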

Pierson Marks (14:47)
Totally. Well, yeah, because we started seeing the other day that Google is rolling out automatic dubbing to all videos. So on YouTube, on Shorts, they'll automatically be dubbing all that stuff. Instagram did that. We talked about this last week with Meta and ElevenLabs, but they're going to auto-dub your Reels into other languages so you can get higher reach for your videos. But YouTube too, I mean, they're just going to do that, which is awesome. Actually, somebody sent me a video in Danish

the other day so I could hear the pronunciation of some words. And it was auto-translated, auto-dubbed into English. I couldn't even listen to it in the way that they wanted me to listen to it. So I was just like, sorry.

Bilal Tahir (15:24)
Interesting. So you don't even

get a choice to switch to the original.

Pierson Marks (15:29)
I don't know, I bet you could. I just saw it auto-play. I didn't really see where the button was. So I bet there is, like, under an ellipsis side button, like, view original. But yeah.

Bilal Tahir (15:35)
Hmm.

But I mean, again, this is going to be so interesting, because it's going to open up your market to so many areas. We're kind of in the English speaking world, and that's

a big world, I mean it's the biggest language world, so there's a lot of choice, but there's a lot of cool Spanish speakers and French speakers and, you know, Chinese creators that we never really get to listen to, or if we do, it's usually a

translation that kind of takes some of the magic away. So it would be very interesting to see how all the creators in these other siloed worlds come together and can share. And hopefully, I mean, the hippie version is, everyone is friendlier and happier, more understanding towards each other because of it.

Pierson Marks (16:25)
Totally. Well, it's...

No,

absolutely. I mean, there was something that I know you saw the other day, but it was like, there's a lot of Chinese papers in, like, computer science and math. You know, there's a problem in academia where you have a PhD doing research in some department, let's say in biology or chemistry or something, and you have another person in another department, another university, and they don't collaborate and they never connect the dots between, my research is over here, your research is over there. They don't even know of each other. And you have to have some sort of...

Bilal Tahir (16:34)
Yes.

Pierson Marks (16:56)
collision, like, oh, I connected the dots from this paper from 1980 and this paper from 2010 and combined it with my research. And that's how new discoveries get made. But the key thing is, well, it's hard because, okay, they're in different domains. You have to actively go and think about it, a very hard thing to do, but at least they're in English. Imagine another dimension where there's a paper in Chinese or a paper in Italian

and that had some discovery, something that was cool in isolation but wasn't revolutionary. But when you combine that research with your research, that's amazing. Like, you'd have a universal ability to connect the dots.

Bilal Tahir (17:36)
That's what, I mean, this is

a crucial point.

I can explain what you were talking about there. So what happened was, there was a story, I think it's called Chinese arXiv. The big site that has all the papers, arXiv, you know, archive with an X, arxiv.org, that has all the papers on it, everyone publishes there. And some guy basically made a site called Chinese arXiv which takes all the Chinese papers, translates them, and puts them up. And he was actually looking for sponsors. He's like, hey, if anyone wants to... and some guy, a Good Samaritan, said, I'm going to sponsor the whole project, because this is very crucial.

And it is crucial, because I remember I read this book by Kai-Fu Lee about China and superintelligence back in 2018, 2019, years ago. This is before ChatGPT, when AI was just becoming the big thing. And he made such an important point, because he's from China and he goes back and forth, he's an investor here and there. He's like, one of the things people don't realize is, as soon as a paper comes out here, somebody takes it, translates it into Mandarin, and it's circulated within all the WhatsApp groups.

So on WeChat and everywhere, they take that,

and then a paper comes out in China and nobody does that in English. So it's a one way door. They take all your knowledge and they learn from it. And it's not like they're hiding it, they'll publish papers there, we just don't read them. And so it's so important, particularly with China. Because there's this bias that the West has all the innovation and China is great at maybe manufacturing and stuff like that. And that's true, but also, today there's a lot of things in which China is basically at par with the US, and in some they're even leading. Like robotics, for instance,

people think about Tesla, et cetera. I actually think if you follow this space, China leads. Unitree and everything, they've blown past anything America can do in robotics. So...

If you are a robotics engineer, you probably want to pay attention to what they're doing and you want to learn from them. And same goes with LLMs. We get the DeepSeek moment, or Kimi K2. It shouldn't be a surprise. It should just be like, yeah, we've been following them, we know what's up, we know they're smart. And for us as consumers, it's great if the Ilyas of the world here and the equivalents over there are collaborating, because it's great for us.

Pierson Marks (19:34)
Mm-hmm. All right.

Totally. Well, think about this.

So I mean, this is in the context of podcasting. So we're creating this podcast in English. You know, do they have a Chinese version of this episode?

Bilal Tahir (19:52)
Right.

Yeah. Now you're giving me an idea,

we should totally be doing that. We should be dubbing ourselves in Mandarin and French and just like be publishing that.

Pierson Marks (20:01)
I think Lex Fridman and

Huberman and some of those guys do it.

Bilal Tahir (20:05)
Yeah, but they pay for that. Because I remember Lex Fridman had an interview with Modi, the Indian prime minister, and it was in Hindi. And it was, I mean, pretty decent. I think he used ElevenLabs under the hood for that. Yeah.

Pierson Marks (20:18)
Right. That makes sense. I wonder if Spotify

will partner with ElevenLabs or someone to offer auto-dubbing like this, because it seems like a no-brainer. Like, I wrote a blog post the other day, just because we released clips in Jellypod, about content repurposing. You take a core piece of content, and you repurpose that into blog posts, you repurpose that into clips, you repurpose that into all this other stuff. That's one dimension. And the other dimension is repurposing into other languages. So it's like you have a much broader

Bilal Tahir (20:36)
Okay.

Pierson Marks (20:44)
addressable market in terms of the audience.

Bilal Tahir (20:46)
And you know who was one of the first people to figure this out? MrBeast. Like always, he's been ahead of it. He actually has dedicated channels for Spanish, French, etc. And I remember I read the story. So he dubbed his channel in Spanish and he hired a voice actor for himself, and it was one of the most famous actors in Mexico, South America, whatever. And people were like, why are you spending so much money? Just get a normal voice actor. He's like, no, I want someone who actually is recognizable. And, I mean, true to MrBeast, he goes above and beyond, and...

Pierson Marks (20:51)
Mmm.

Bilal Tahir (21:15)
I'm sure that's paid off for him, you know, because people are like, wow, he's got the actor from the movie, whatever, doing his voice. So...

Pierson Marks (21:23)
Totally.

You know, it's those things. I think it's so important for people to recognize that when things become easier, the differentiator is doing the things that are hard, the things that aren't as easy. So whether it's in product design, it's doing the things where you ask, is it really necessary? But it's the polish around the edges that is the differentiator. It's the taste. And it's hard, because in the moment you're like, it doesn't matter, let me just get this thing out. But it's deciding

Bilal Tahir (21:40)
Yep.

Pierson Marks (21:53)
how much care and time you take in building anything, whether that's building your YouTube channel, whether it's a piece of art, a design, a podcast studio. It's those things that are the difference, I believe, between something that's meaningful and worth time versus something that just doesn't matter to people. Cause it's like...

And it's also, it's always a,

gradient kind of thing, and that's the tough part. It's like, okay, at what point is it too much, do the marginal returns no longer pay off? But hey, oftentimes I think they do. So...

Bilal Tahir (22:31)
I agree,

definitely that execution matters more than ever, especially in today's age where you can build anything if you want to.

Pierson Marks (22:40)
Totally. For sure. So there were a few other

things that I wanted to get to also. One of the things that excited me was SAM, the Segment Anything audio model. This was released by Meta. I think probably a year ago they released the first SAM model. And Segment Anything, essentially, for people that aren't aware, is the ability to distinguish objects from other objects. So...

Bilal Tahir (22:50)
Yeah.

Pierson Marks (23:06)
The original models were image and video models where you could take, let's say, a video of a soccer game or football game, and you're able to identify the players, the soccer ball, and segment essentially that frame or that video into unique objects. Here's the goal, here's the soccer ball, here's the player, here's another player. So it's kind of like edge detection on steroids, object detection, segment anything. And then Meta this week, they...

really meant segment anything, so they did it with an audio model. And so you can now isolate individual tracks and audio objects inside a single track. So if you're listening to a movie, and that movie has audio where maybe there's a plane flying overhead, you have people talking in the background, you hear all these different noises, and maybe a guitarist, you can prompt and say, hey, give me the guitar audio, and it'll just extract that guitar audio out,

Bilal Tahir (23:57)
Yeah.

Pierson Marks (23:59)
and you can listen to just that audio. It was really cool.

Bilal Tahir (24:00)
Yeah.

No, it is really cool. And I know we talked about use cases and stuff for it. I remember there was another model that did this before. It was not as good as

SAM is right now, but I remember I made a cappella versions of famous bands' songs and stuff. And there's probably so many cool things you can do if you can personalize songs. I mean, you could imagine a world where people listen to shows and there's a Netflix setting where, you know, somebody's hard of hearing and they're like, I actually want the vocals to be louder. You can literally have dials, not just volume. You know, people used to just have a volume dial, and now we could have like ten dials.

So super interesting.
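(SAM's prompt-driven separation is new, but the fixed-stem version Bilal describes, pulling an a cappella out of a finished song, can already be approximated with the open-source Demucs separator, which is a different tool than Meta's SAM audio model. A rough sketch, assuming the demucs CLI is installed and using its default htdemucs output layout:)

```python
# Rough sketch: split a song into vocals / accompaniment with Demucs.
# This is conventional fixed-stem separation, not SAM-style prompted
# separation; output paths assume Demucs' default htdemucs layout.
import subprocess
from pathlib import Path

def make_acapella(track: str, out_dir: str = "separated") -> Path:
    """Run Demucs in two-stem mode and return the path to the vocals file."""
    subprocess.run(
        ["demucs", "--two-stems=vocals", "-o", out_dir, track],
        check=True,
    )
    stem_dir = Path(out_dir) / "htdemucs" / Path(track).stem
    return stem_dir / "vocals.wav"  # no_vocals.wav is the instrumental

if __name__ == "__main__":
    print(make_acapella("my_song.mp3"))
```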

Pierson Marks (24:42)
That is interesting. I'm excited more on the live DJing and music creation side too. I used to be into DJing and music production and stuff, and I can just imagine you have some musician behind the keyboard, because DJs will have a keyboard sometimes, and they'll have decks and all these knobs and buttons, you know. And when you're selecting the songs to play, if you're actually doing a live performance and you're actually choosing the music,

you can think, like, I want to bring this next song in, isolate it, just give me the vocals and the bass, and do things that you couldn't do with normal equalization. It'd be very cool, because you could live remix things very precisely, even if you don't have the underlying tracks. So...

Bilal Tahir (25:30)
It's super interesting. And we've talked about, I mean, Jellypod, maybe there will be use cases in podcast land as well. You can do all sorts of cool things and layer stuff or slice and dice. So, yeah, fun.

Pierson Marks (25:30)
Yeah, so.

totally.

Let's

talk about slicing and dicing. I know you were doing something with Nano Banana and the grid flow.

Bilal Tahir (25:48)
Yeah, I think this was

such a cool trick. I'm not going to take credit for the idea, but the thing about Nano Banana is two things. First, it's awesome. It really is the best model in terms of image editing. And GPT Image 1.5 came out this week. It's good, but I do think Nano Banana is still better than OpenAI's. I tried it, and they did a

great job, and the price is cheaper compared to what they had before. The yellow tint, I think, is gone now, which is nice. But Nano Banana is still king, and it's great. The problem is it's expensive, it's 15 cents an image. So one of the flows that I've been seeing going around is, if you want multiple images, you ask it for a three by three grid, or some people say three by three is too much, two by two is the sweet spot. But you can generate that grid of images with the reference object.

You're like, give me one where he's in a car, one where he's standing, do that. Then you can basically, either using edge detection or just because the model will actually give you a pretty standard layout, use width and height and slice that image into four or six images. And then use an upscaler, like Crystal Upscaler, or there's one...

people love SeedVR, et cetera, but that takes a while. But you can use that and actually upscale two to four times and basically get a high quality image. And so you get four images for the price of one, because the upscaling and the slicing aren't as expensive, it's basically minimal. I tried this, and it actually works really well. But the caveat is, it depends on the subject. So for me, as a brown dude, I realized that my

face can get distorted a lot. You make thumbnails, so you've probably noticed this with Nano Banana as well, my face gets to where I'm like, that's not me anymore. It somehow loses the base image at some point. The more steps, the more you should think of it as a lossy process. So if you're willing to lose a bit of the subject, it's fine, depending on your use case and obviously on the subject. I mean, if you're making Ferrari cars and stuff, it'll probably

handle that pretty well. But it's a great flow, I definitely recommend checking it out. And you can also use it to animate stuff. So I've used it to take those images, slice and dice, then animate and make five second videos, stick them back together, and it looks like a photo shoot. Somebody did this Bill Gates Lambo photo shoot, if you've seen it. It's awesome. He's like a Miami drug dealer, just sitting there in his tracksuit in the Ferrari, and they're doing different shots coming in and out. So...
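(The slicing step of this grid flow is simple enough to sketch. Here is a minimal example with Pillow that cuts one Nano Banana 2x2 grid into four tiles; the resize at the end is just a placeholder for wherever a real upscaler model would run, and the file names are illustrative.)

```python
# Minimal sketch of the slicing step of the grid flow: take one 2x2 grid
# image and cut it into four equal tiles with Pillow.
from PIL import Image

def slice_grid(path: str, rows: int = 2, cols: int = 2) -> list[Image.Image]:
    grid = Image.open(path)
    tile_w, tile_h = grid.width // cols, grid.height // rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            box = (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
            tiles.append(grid.crop(box))
    return tiles

for i, tile in enumerate(slice_grid("nano_banana_grid.png")):
    # Placeholder "upscale": a real flow would send each tile to an
    # upscaler model (Crystal, SeedVR, etc.) instead of a plain resize.
    tile.resize((tile.width * 2, tile.height * 2), Image.LANCZOS).save(f"shot_{i}.png")
```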

Pierson Marks (28:00)
So,

I didn't see that.

That's

Bilal Tahir (28:26)
Really cool,

Pierson Marks (28:26)
funny.

Bilal Tahir (28:26)
I actually think this could be a new... because we've seen this with others, you know, ChatGPT went viral because people could make Studio Ghibli images of themselves. We've seen with Nano Banana, it was like people could put celebrities next to themselves. And now I think photo shoots and, like, actual video animations of yourself are gonna be the next viral meme. You're gonna see this blow up.

Pierson Marks (28:38)
Sora 2.

Totally. No, super interesting. And I have a quick question. So if you're creating a four by four or a two by two grid of yourself and it distorts or loses some of the aspects of your face, and you wanted to keep the character consistent for the next generation, are you taking one of those segments of the generated image and putting that into the next model? Or are you taking the original base image of you and regenerating? Like, do you go from the output

and take that output into the next one, or do you go back to the original?

Bilal Tahir (29:19)
You can do both. I mean, what I've actually done... my original approach wasn't edge detection. What I did was, you can actually ask Nano Banana to take the grid and the base reference and be like, actually, just give me the top right or top left.

And that method will actually be more consistent, because Nano Banana has the reference. The other thing you can do, I guess, I haven't tried it, is once you upscale the image and you feel like it's not quite the same, you can just do plain old editing, give it the base reference and be like, hey, can you make the subject consistent? So I think there's a lot of flexibility, you can do a lot, and these models will just get better, where hopefully the amount of steps you have to do... it's still hacky, people are making all these node

workflows and stuff, and that's going to go away. You should just be able to make a one-step thing out of it.

Pierson Marks (30:08)
Totally. Yeah, we'll get there. You know, everyone's like,

AGI is coming. No, we still have node-based workflows.

Bilal Tahir (30:12)
Yeah, or tooling around it. I haven't

used it, actually it's on my list. Higgsfield is another company. I thought they were just a wrapper company, so I dismissed them for a long time, but they do apparently have their own models. Also, from what I see on Twitter, the creators love them because the UI is really good. So, we've seen this with Jellypod, like we pride ourselves on our UX. Sorry, I missed that word.

Pierson Marks (30:19)
yeah, Higgsfield's cool.

And they also pay a lot of people for...

And they also

pay a lot of the creators to talk about Higgsfield.

Bilal Tahir (30:39)
yeah, that's a good point too.

I mean, hey, it's a strategy. But I do think their UX is pretty good, so I wanted to actually dig deep into it when I have time. But I think if you're a developer or you're just building tools, there's so much alpha in just stitching these tools together, in the way you do something, just perfecting it for your workflow and putting it out there. And you'd be surprised how much demand there is from someone who doesn't even want to take

Pierson Marks (30:43)
It's true, it was good.

Bilal Tahir (31:05)
15 minutes to figure out how to go from Nano Banana to Kling or whatever. They just want a one-click button. They want to upload an image and they want to get a photo shoot, you know, like the flow we talked about that's just three steps. If you put them together, you can just have a thing where you upload an image and you get a photo shoot out of it, like a 30 second video. People would pay 10, 20 dollars for that.

Pierson Marks (31:09)
Totally.

I don't know why, I

don't know why FAL doesn't allow like a workflows app store. Like, I would build a workflow and then you could essentially set your price for that workflow. And it could be like the base model prices times some profit margin. I would love to build a workflow that anybody could use and I could make profits.

Bilal Tahir (31:34)
Yeah.

Hey, FAL, what's going on?

If anyone from FAL is listening, that's a great idea. I actually think that would be pretty sick, you know? It gives people an incentive to actually build cool workflows. I actually found a cool workflow that someone shared for this that helped me a lot, too. And FAL workflows, for people who don't know, FAL is probably my favorite platform for AI inference models, a workflow is basically, what they let you do is you can take a model, take the image generation of that model, put it into another model, and basically make this

Pierson Marks (32:00)
Hmm. Sorry.

Bilal Tahir (32:15)
workflow. Once you do that using a GUI, you can actually have one API endpoint. So you can call that API endpoint and it goes through all those steps. So very powerful.
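(The same chaining idea, hand-rolled in Python with the fal_client SDK rather than FAL's hosted Workflows GUI. The model ids and response fields below are assumptions for illustration; each model's actual input and output schema is listed on its FAL page.)

```python
# Sketch of chaining one FAL model's output into another: generate an image,
# then pass its URL to an upscaler. Model ids and response fields are
# assumptions; check each model's schema on FAL before using.
import fal_client

def generate_then_upscale(prompt: str) -> str:
    gen = fal_client.subscribe(
        "fal-ai/flux/dev",              # assumed text-to-image model id
        arguments={"prompt": prompt},
    )
    image_url = gen["images"][0]["url"]  # response shape depends on the model

    upscaled = fal_client.subscribe(
        "fal-ai/clarity-upscaler",      # assumed upscaler model id
        arguments={"image_url": image_url},
    )
    return upscaled["image"]["url"]

print(generate_then_upscale("a bald eagle flying over the White House"))
```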

Pierson Marks (32:17)
All right. All right.

It's super cool.

Absolutely. Well, and on that note: powerful workflows, image editing, all the cool stuff, the Nano Banana grid flow, SAM, amazing.

Bilal Tahir (32:35)
Yeah, I know.

but yeah, this is a wrap for 2025, you know, awesome time. So all right, take care guys. Happy New Year. Cheers.

Pierson Marks (32:39)
Wrap for 2025. We'll see everybody next year.

Absolutely. See you all. Bye.
