Moltbook, Kling 3.0, Model Wars (Opus 4.6 vs. OpenAI Codex 5.3)
Pierson Marks (00:00)
Episode 30. Wow, I can't believe we're here. I know. I can't believe it. It's been 30 weeks, essentially. Maybe a few more because we missed two over the holidays. Yeah, wow, half a year.
Bilal Tahir (00:02)
Already. Can't believe it. Yeah.
Yes, I mean, happy, yeah, happy year. I mean, we started in August, July, maybe. I think June, actually. I think June, July.
Pierson Marks (00:20)
Right?
What a crazy last six months. I mean, there's been so much out there, and I feel like it's just even speeding up, and it's hard for us, I mean, it's even hard for us to keep track of everything that's going on in the generative media space. It's like...
Bilal Tahir (00:36)
It's crazy. I mean, I think people will come up with a word for it, because I've been hearing, as I said, there are people having a hard time sleeping and stuff because they're just so hooked on maxing out the rate limit and making sure they're using their agents 24/7 and shit like that. And I'm pretty sure there'll be some sort of "LLM psychosis" or whatever kind of term. They'll come up with some sort of funny term that catches on to describe the thing. Because I feel like it can't be healthy, because I'm stimulated all the time. I'm like, all right, I've got to do this, got to do this. I've always got all this on.
Pierson Marks (01:09)
yeah.
I think the best advice for anybody out there who is just overstimulated by the number, the sheer number of releases and changes that go on all the time is, like, two things. One, get off Twitter. You know, yeah. Get off Twitter, touch some grass. Like, actually, I think if you can just stay off during the week, or maybe just dedicate yourself, like, one hour a day on there, because trust me, it's not gonna be that crazy. I'm on there all the time too.
Bilal Tahir (01:09)
Yeah.
Yeah, grass.
Pierson Marks (01:38)
But it's like, you could get all your news in, like, a quick 30-minute skim versus putting your head on a swivel about everything all the time. Like, blah, blah. No, it's just, like, overwhelming. And then number two is listen to Creative Flux every Friday and get all your updates about everything that's actually kind of meaningful. And then go down the rabbit holes over the weekend and figure out what you want to actually build.
Bilal Tahir (01:51)
That's true.
Exactly, yeah.
We bring you the shit that matters in generative media and agents and the LLM world. So you can just take the frosting.
Pierson Marks (02:06)
Totally.
For sure. I mean, we were just talking about this with these AI shorts and digital opium and everything. And everybody knows the problem: social media, the scroll, the dopamine hits, everything. Same thing with staying up to date with AI news. It's just dopamine hits, like, cool new product, cool new thing, potential, blah, blah, blah. It's, like, in your face, Genie 3. And then you're like, you know what's much better than the small dopamine hits?
It's the prolonged satisfaction of actually building something and putting it out into the world. Like, that is unparalleled. I mean, if you want to talk about sustained, like, oh, wow, I've built something and that's cool, you know, I added a new feature. Like, those are some nice dopamine hits too. Oh, totally. I was thinking this the other day. I was just like, man, all these kids out there, when you get, like,
Bilal Tahir (02:52)
Oh, 100%. Yeah, that's the real dopamine right there.
Pierson Marks (03:04)
who are addicted to getting likes on their Instagram pictures, wait till you get your first dollar made online. That hits different.
Bilal Tahir (03:09)
Hmm.
Yeah, I know. I know. I mean, it's the ultimate drug in a way. And you can't. Yeah.
Pierson Marks (03:13)
Oh, capitalism. Yeah.
Well, thanks everybody for hopping on episode 30 of Creative Flux, the podcast where we talk about generative AI, media, images, video, audio, music, coding sometimes.
Bilal Tahir (03:29)
It's kind of like tech, where it's just encompassing everything. You know, who knows, we'll be talking about politics at this rate, but hopefully not. But it's crazy, yeah, no, it's crazy because AI, I mean, you know, we're both super AI-pilled and stuff. It kind of touches every facet of life in a way. So you can't really talk about AI without talking about other stuff. I feel like it's a disservice, it leaves you with an incomplete picture.
Pierson Marks (03:33)
Totally.
Yeah
Bilal Tahir (03:57)
Talking about other implications of it is very important. Like, a great example, we didn't get to it because we were talking about Clawdbot last week. And that was amazing for automation stuff. There are some security issues, I'm sure, with it. But it's still such an awesome product. And I don't know if you saw, Peter, the guy, Peter Steinberger, he had a meetup in SF yesterday. I didn't know, I just saw some videos, apparently. Huge line, like everyone, like he's like a celebrity now. I mean, he already was pretty big, but people wanted to talk to him.
Crazy. But one of the things we didn't get to, I think, last week was some guy made this thing called Moltbook, which is a social media platform, which basically takes your Clawdbot AI and lets you, you know, basically it's like a Reddit for your bots. And I thought that was a very interesting experiment. I do think it's mostly slop. And a lot of people were like, this is AI slop. Others, like Karpathy, were like, oh, this is an amazing experiment. It's like, this is like AI takeoff.
I mean, Karpathy always has more nuance than that. I mean, I'm kind of distilling what he said. But for me, where I landed is, like, I think it is AI slop, but I found it fascinating from a Game of Life perspective. So if you've ever seen Conway's Game of Life, this was a game this scientist made in the 1970s. I don't know if he was a scientist or whatever, but a researcher. He basically came up with these rules where he took a cell and he said, if a cell is basically next to another cell with a certain color, the other cell dies. If they're both, like, positive, another cell spawns or whatever. Like, there are three rules, I think, to Conway's Game of Life, simple rules, like one plus one equals two, one minus one, stuff like that, binary stuff, 1970s. And using those three simple rules, when you simulate stuff, it would come up with this amazing simulation, like crazy-ass simulations. And this was called the Game of Life, and basically the philosophical idea was that simple rules can lead to really complex systems, and, you know, it can tie into how we, you know, human life evolved and all that.
And I kind of made that analogy with these Moltbooks because, yeah, I mean, it's just AI slop, but I wonder if there's something there. If you take millions of bots, each conditioned on your prompt, because every bot has a different system prompt, you know, it comes from your own preferences, I wonder if there's some emergent phenomenon based on that. It could be simulations for prediction, market predictions, or just some more complex behavior or intelligence and alpha, you know. So that was just a very interesting thing, and it's still playing out, so early.
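As an aside for readers who want to see the "simple rules, complex behavior" idea run: below is a minimal Python sketch of Conway's Game of Life using the standard statement of the rules (a live cell survives with two or three live neighbors, a dead cell comes alive with exactly three, everything else is dead next generation), which is a bit more specific than the loose description above. The function name `step` and the set-of-coordinates representation are choices made for this sketch, not anything from the episode.

```python
from collections import Counter


def step(live):
    """Advance one generation; `live` is a set of (row, col) live cells."""
    # Count, for every cell adjacent to a live cell, how many live neighbors it has.
    neighbor_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbors; survival on 2 or 3 neighbors.
    return {
        cell
        for cell, n in neighbor_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


# A "glider": five cells that travel diagonally across the grid forever.
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
state = glider
for _ in range(4):
    state = step(state)

# After four generations the glider has the same shape, shifted by one row and one column.
print(sorted(state))
```

Nothing in the rules mentions movement, yet the glider moves, which is the kind of emergence being pointed at here.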
Pierson Marks (06:26)
No, absolutely. I mean, too, like, just think about computers in general. Like, you have simple logic rules. You have very simple rules. You have, like, binary logic, AND gates, that are very simple. But you can do amazing things. You can record, you could record this podcast right now, streaming information between us. And now you can have intelligence with just some basic, simple binary ones and zeros.
Bilal Tahir (06:40)
Exactly. Yeah.
Yeah,
Pierson Marks (06:54)
logic gates. But yeah,
Bilal Tahir (06:54)
it's crazy. we're just like
Pierson Marks (06:56)
Moltbook, but Moltbook too was interesting because there was some skepticism around it.
So I mean, I didn't follow through because I was just like, well, this is interesting, cool, doesn't impact my day-to-day life too much. So I didn't dig in. But there was some concern about the realism, like, if these are all, I think at the time I last looked, there were like 1.4 million users or something. And were they all real? There was no sort of deduplication, like,
Bilal Tahir (07:25)
Mm-hmm.
Pierson Marks (07:33)
You can create a lot of accounts and like, you know.
Bilal Tahir (07:35)
Yeah, no, some guy actually made Hacker News because he was able to make 500,000 accounts, because there was no rate limit. So, Moltbook is different from the main Clawdbot project. I mean, they're not related. I mean, he just kind of built it. And so the other guy is a lot hackier than the original. Like, the API keys were getting leaked and stuff initially. Yeah, yeah, yeah. He didn't, like, encrypt them. So you could see all the API keys.
Pierson Marks (07:39)
Of Moltbook or of... yeah.
The API keys on...
Bilal Tahir (08:00)
It was very vibe-coded. But I agree. Yeah, there are some concerns there. The other thing that I find very kind of dumb is, like, yeah, there's like, "Oh my God, this AI is talking about taking over." I'm like, yeah, what do you think is happening? You gave the AI a task to act like, basically, a Reddit user. It pattern-matches. It's gonna talk conspiracy, right? I mean, it's like the meme where you go to an LLM and say, "Hey, tell me you're human." And the LLM replies, "I'm human." And it goes, "Oh my God."
So it's just mirroring for the most part. So I don't know.
Pierson Marks (08:30)
Totally.
It gets down to those philosophical questions about life. What was that question?
I am because... I think, therefore I am. Like, what is life?
Bilal Tahir (08:45)
I think, therefore I am. Yeah, Descartes.
Yeah, but I mean, thinking is a, you know, I guess it comes down to how you define thinking. These are next-token predictors.
Pierson Marks (08:57)
But it's interesting. But yeah, so. Yeah, maybe, maybe. Cool. OK, so Moltbook and OpenClaw, the renaming, all this stuff. Really cool. I wanted to touch on, there's a bunch of other stuff too. Like, last week we didn't touch on Genie 3. I just wanted to mention that a little bit. Like, this is something I'm really interested in, world models. I think it's
Bilal Tahir (09:10)
Yeah.
Pierson Marks (09:26)
probably the next frontier, like, where the large leaps and bounds of artificial intelligence come from, as in world models, spatial intelligence, modeling physics. You know, it's like, imagine if you were a blind person, or maybe you were locked in a room and you had all the libraries and all the videos and everything in the world. And, you know, you were locked in a room without ever experiencing, like, reality.
That's kind of where AI and LLMs are today. Like, they have infinite knowledge, they've read every book, they've watched every video, blah, blah, essentially. But they never really had, like, the spatial intelligence of the real world. So they've just been locked in this room in solitary confinement. But spatial intelligence and world models kind of give them that new dimension. And Genie 3 was just really, really cool. It came out of the private beta or whatever, and you've got to pay a lot of money to...
I think you have to be on, like, the ultimate plan or something to get access, I'm not sure.
Bilal Tahir (10:25)
Yeah, what blows my mind with Genie 3 is, like, I mean, we still wait for images to take seconds sometimes, like, high-quality images to be generated. And here, I mean, I was chatting to somebody, you can generate that world in, like, less than 30 seconds or something, just real time. I'm like, how is this 3D thing possible quicker than a 2D one? Like, I mean, there's probably some mathematical matrix-operation reason why one is better than the other. I mean, it's working, but it's funny.
Pierson Marks (10:51)
Yeah, super interesting. Yeah, I saw some really cool things. I saw one where it was a recreation of Halo and this guy was riding around on, like, the Warthog. I forget. It's like the four-wheeler thing. He's just driving around. It's, like, in this Halo map and you're Master Chief and whatever. And then he falls off a cliff, the character, and, like, because Genie 3, if you're not aware, it's a world model and you can move around in it. Like, you can move
Bilal Tahir (10:59)
⁓
Pierson Marks (11:19)
forward, back, left, right, and, like, jump. So it's kind of like you're playing a video game, but you're more like moving a camera around, but you have a character. So it's kind of video game-esque. And so obviously they're trained on video games. But the user was riding around in his Warthog, his little four-wheeler, and then he falls off a cliff and Genie 3 respawned him in a new location and added the HUD, like the heads-up display,
like Halo had. So you had the mini-map, you had some icons, and the HUD, the heads-up display, wasn't there before he respawned. It just wasn't there, and then it respawned, it added the HUD, and it turned on these lights, and it put him in a new spot. And the mini-map was moving accurately too. It was really interesting.
Bilal Tahir (12:07)
Yeah, that's awesome. And like you
said, they've definitely trained it on video games and stuff. So it probably does a lot better in some scenarios than others based on the data, which totally makes sense. Very interesting. Yeah, yeah. Apparently, it's out now. Basically, anyone can access it. They released it.
Pierson Marks (12:21)
Right. So I'm excited to try it.
Bilal Tahir (12:32)
I don't know if they released an API announcement or if it was just that the alpha version is out now. Because Genie was announced, like, a few months ago, but it wasn't, like, available, and now anyone can go and basically try it. But it might still be, like, you need to apply using a form or something. I think, yeah.
Pierson Marks (12:46)
Right.
Oh, I should double-check, because I want to try it. I thought I heard somewhere that it was a big... You had to be on the AI ultimate plan or something, but maybe that was just wrong.
Bilal Tahir (12:59)
You might be right. Yeah, yeah, Ultra, the Gemini Ultra plan, I think, maybe. Yeah, that's a good way for them to, I guess, get those subscribers.
Pierson Marks (13:03)
Maybe.
Yeah, get all
the nerds out there trying it out.
Bilal Tahir (13:10)
Yeah, yeah.
I mean, the other one is World Labs. I mean, they are open and they do have an API, which, I'm surprised they did open their API. So you can definitely programmatically generate stuff, which, I mean, I always love APIs more, because I immediately think, oh, what if I can script together something now? Maybe create, like, a lo-fi dynamic world that changes with a song or something, make an app like that. That would be sick.
Pierson Marks (13:33)
Ooh, totally. That would be cool. Well, sweet. OK, cool. So Moltbook, Genie 3. Maybe we can go into Opus 4.6 and do on top.
Bilal Tahir (13:42)
Yeah. Catch a knob. Yeah.
Yes, as of, what, 30 minutes, almost an hour ago, Opus 4.6 is out, the state-of-the-art model, basically, based on the metrics. I mean, we have no reason to doubt it; based on the history of Anthropic, usually they deliver. I mean, 4.5 was a beast. I mean, we're huge fans of 4.5. I'm actually really... because everyone, I mean, you know, was saying Sonnet 5, and I'm always like, give me the biggest, best model. And so I'm kind of glad it is Opus, you know, because if it had been Sonnet 5, apparently Sonnet 5 supposedly is better than Opus 4.5, but then in my head I would be like, well, what does Opus 5 look like? You know? So it's good, you know, we just get the biggest, baddest model they have.
It was 4.6. A couple of big things: obviously it's gonna be more intelligent and stuff, but they really focused on long context, and this is the first model from Anthropic that goes up to 1 million context. And that's been a problem, because, I don't know if you've hit it, but I run into the context window in Claude Code all the time, where I have to /compact it and it loses all its context and it suddenly regresses and it makes the same mistake it did, like, three turns ago, and it really eats the window very fast. So I'm excited for a window that's five times as long there. That's cool. And...
Pierson Marks (15:02)
Totally. I wonder where
the intelligence like, because there's always, even though you can get up to the context window, there becomes a point where intelligence.
Bilal Tahir (15:11)
Yes, it degrades way before that, that's kind of known. If a 200,000 context window starts degrading around 50,000, then hopefully this degrades around 200,000 out of a million, right? I mean, if you maintain the ratio, which is still a win, I guess. Yeah.
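To make the "maintain the ratio" arithmetic concrete, here is a tiny Python sketch. The 50,000-out-of-200,000 figure is the speaker's rough illustration, not a measured benchmark, and the function name is invented for this example; whether degradation actually scales linearly with window size is an assumption, not something established in the episode.

```python
def projected_degradation_point(old_window, old_degradation_point, new_window):
    """Scale the point where quality starts dropping, assuming the ratio holds."""
    # Integer arithmetic keeps the result exact for token counts.
    return new_window * old_degradation_point // old_window


# Speaker's illustrative numbers: quality dips around 50k tokens in a 200k window.
estimate = projected_degradation_point(200_000, 50_000, 1_000_000)
print(estimate)  # 250000
```

Under that linear assumption, the usable range lands around 250,000 tokens, in the same ballpark as the "around 200,000" mentioned above.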
Pierson Marks (15:27)
Totally.
Bilal Tahir (15:29)
Yeah, I've started proactively running /compact a lot because it's weird. It'll show a hundred percent and then it will go for another while. Sometimes it immediately compacts automatically, and then other times it'll forget to compact and basically it runs out, and then it doesn't even have enough room to compact. So I basically have to start a fresh window. So I've been proactive about compacting. And we've talked about subagents here before, and parallelization, and I feel like the super users, and we will get into your workflow, they have this process now where they won't hit that because, unlike noob me, they basically spin up alternate sessions, subagents, which have their own context window, do the task, come back, and keep the main window clean, which is, I guess, the best way to do this.
Pierson Marks (16:09)
Right. Yeah, totally.
I mean, yeah, the highest-leverage thing right now, if you're using Claude Code, and we'll talk about this in a second because it just released this experimental feature called agent teams, which is, like, a swarm-based approach. I haven't actually been able to test it yet. So everything we'll talk about is based on my understanding so far. I didn't read it all. I have it up right here, about everything. It's a super long document, but I'm like, well, first off, I mean,
Bilal Tahir (16:29)
What do mean? You had 45 minutes. You didn't do it. Yeah.
Pierson Marks (16:39)
There's so much leverage in just actually reading the docs of, like, Claude Code, staying up to date kind of with the changelog. I mean, obviously it's, it's a skim. Yeah.
Bilal Tahir (16:49)
Yeah. And Anthropic is awesome. I mean, they put in the time, and, if you're like me, it's a, it's a book. Like, but you learn a lot.
Pierson Marks (16:55)
Totally, it's a book.
Yeah. And I think that's what I like, you know, when we chat too, about, like, you know, sometimes I'll say something I found out and I do, and maybe you like it enough to try it out, and vice versa, where you do something and, like, that makes sense in my head. And I'll try that out. Because all you need, I think, is enough inertia to really just, like, you know, you have to commit. It's like running or, like, working out. It's like, you have to be like,
I'm gonna commit to do this new thing and force myself to... Exactly. But then you might be like, okay, actually this was really good, I'm glad I pushed myself. But sometimes it's just like naturally, you naturally stumble upon it. It's like when somebody tells you like, eat your veggies. I'm like, I'm not gonna eat my veggies. It's like, because you told me to, I'm gonna eat them because I told myself to. But like, you know, it's one of those things where you don't wanna be necessarily told how to do something better. You wanna stumble across this thing in the eureka moment.
Bilal Tahir (17:27)
The first mile is the hardest, as they say.
Right.
Pierson Marks (17:54)
yourself and you're like, oh, I did this thing that works for me. But sometimes it helps to really read some of the stuff and maybe it spawns enough, it gives you enough context about what's possible so that you can maybe criticize yourself. Is there a better way of doing the thing I'm doing that if I invest five minutes now, it'll pay dividends for the next however long?
Bilal Tahir (18:08)
Mm-hmm.
Yeah.
Right. No, no, 100%. So I know you have an awesome process for Claude Code, so maybe it's a good time to share that with the audience. And maybe this will give them inspiration to try it out.
Pierson Marks (18:29)
Yeah, totally. I mean, so, like, one of the things, I got off track, but if you're using Claude Code, maybe 4.6 and the newest update of Claude Code will do this automatically. But first off, you have to do this. You have to go through plan mode, obviously. Like, don't just go accept edits right away. You have to go through plan mode. And that first plan, you know, actually read the plan.
Because as annoying as it may sound that, like, there's a five-step plan and bullet three on step two is wrong, you know, edit it. Actually say, no, I'm not gonna accept this. I'm gonna edit this and iterate there. So iterate on your plan. And then after your plan is actually good, ask it to use subagents. Ask it to parallelize its work into tasks and allow it to use subagents to do the work in parallel.
Because a lot of times, if you're working on code, there are things that just don't depend on each other and they're not sequential. And so splitting those up into subagents works, because Claude is really good at prioritizing and figuring out the dependency tree of the tasks. So explicitly saying to Claude, "Hey, parallelize this using subagents," and it'll do it well.
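For readers who want to picture what "figuring out the dependency tree and parallelizing" means mechanically, here is a small Python sketch. It is not how Claude Code schedules subagents internally (the episode does not say); it only illustrates the general idea using the standard library's `graphlib.TopologicalSorter`. The task names and the `run_subagent` placeholder are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_subagent(task):
    # Placeholder: a real setup would hand the task to a subagent with its own context.
    return f"done: {task}"


def run_plan(dependencies, max_workers=4):
    """Run tasks in waves; every task in a wave has all of its prerequisites finished."""
    sorter = TopologicalSorter(dependencies)  # maps task -> set of prerequisite tasks
    sorter.prepare()
    waves, results = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # tasks with no unfinished prerequisites
            waves.append(ready)
            # Independent tasks in the same wave run concurrently.
            for task, result in zip(ready, pool.map(run_subagent, ready)):
                results[task] = result
                sorter.done(task)
    return waves, results


# Hypothetical plan: two independent starting tasks, then work that depends on them.
plan = {
    "api_schema": set(),
    "db_migration": set(),
    "backend_handlers": {"api_schema", "db_migration"},
    "frontend_form": {"api_schema"},
    "integration_tests": {"backend_handlers", "frontend_form"},
}
waves, results = run_plan(plan)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

The point of reviewing the plan first is that the dependency edges are where mistakes hide: a missing edge means two subagents edit things that actually depend on each other.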
Bilal Tahir (19:41)
Right, so I wanna clarify that, because, yeah, you can ask Claude and now it does it, but there's also a /agents command and you can use that. So is it basically the same thing, or is one approach more effective than the other?
Pierson Marks (19:57)
What is /agents? I've never used /agents.
Bilal Tahir (20:00)
I thought that was like plan mode. I think that effectively forces it to do it. I've only done what you're doing. I will just tell it and it parallelizes on its own under the hood. So maybe it uses it. Yeah.
Pierson Marks (20:05)
⁓
interesting.
Yeah, no, I need to try /agents. I've never seen it. There we go. Here's one of the things. I'm just going to try it next time.
Bilal Tahir (20:16)
Yeah.
Yeah. I think /agents will spawn subagents and then you can assign it. I feel like, honestly, the bitter lesson, I don't know, maybe I'm being naive, but especially with 4.6, and we haven't gotten time to read this, I do think a lot of this stuff is going to get abstracted away. I mean, it's great. You should totally make your crazy 50-step parallelizing workflow. I think you learn a lot from that, but you should be prepared that some one-click thing is gonna be way better in, like, a couple of months, if not today. So I mean, you know, that's how it is.
Pierson Marks (20:51)
Yeah, totally.
No, totally. I mean, like, that's why I started this past weekend. I wanted to build this agent swarm type of architecture. And then I came across on X, some people were talking about this, like it was already going to be in the next version of Claude Code when 4.6, or 5 at that time, they thought it was going to be 5, came out. But yeah, I wanted to be able to essentially have a singular product manager, which is a Claude Code instance,
who was able to delegate like on a big code, like, you know, delegate to individual agents and those individual agents have sub agents. And so the difference between this architecture was that like, you have a singular point of contact where you give commands to. And then like in an organization, you have the CEO, you have their like direct reports.
The CEO decides, I want to have this high level thing. Like I want to do this. Let me figure out the like basic tasks of like, this, product manager needs to own this task. The developer team is on this task then. And so those people have like their own tasks, but they can coordinate with each other. Like, Hey, can you give me some information about what you're actually implementing so I can write the correct product update or that the changes that I make over here also like incorporate what you're doing. And so those things can communicate.
Bilal Tahir (21:55)
Right.
Pierson Marks (22:13)
And then those like the heads of the teams, but then they also have their own developers or employees that don't do cross unit communication because they're like, they're self-contained. They focus on one task, they implement a spec pretty clearly, but they're not the ones in charge of doing the communication and like figuring out the ambiguity of the cross org sort of.
Bilal Tahir (22:19)
Right, and this one more off.
Go ahead.
Pierson Marks (22:39)
requirements, and that's, like, what their boss's job is. Like, they're the communicating layer, figuring out all the changes, then just giving them to the individual developers.
Bilal Tahir (22:48)
Wait, do you know Gas Town?
Pierson Marks (22:53)
Steve
Bilal Tahir (22:54)
Yeah, yeah. I mean, it's similar-ish. I mean, great minds think alike. I mean, I feel like that's kind of the concept behind Gas Town, I think, right?
Pierson Marks (23:02)
I watched a podcast when he was on Latent Space, and I obviously know who he is. Yeah, I've never actually looked into Gas Town more, but it's a big vibe-coded project.
Bilal Tahir (23:11)
Yeah.
It is, yeah, yeah. But very similar, it's kind of like the same, and he has, like, the master node or whatever, and the idea is similar to what you're saying. You anthropomorphize a human organization, you know, with, like, a product manager delegating tasks to developers. I think it is effective, I guess.
I have two thoughts on that. The first is the anthropomorphizing. Because I haven't gone deep into Gas Town, but I read a couple of articles about it, and what I loved was the fact that he anthropomorphized them, which actually made it, he gives it a theme, like, there's a town, Gas Town, it's kind of steampunk-y. That actually made it much more palatable than calling it, like, a master node and a slave node and MapReduce and all that. Like, you know, you take the jargon and you make it, like,
Pierson Marks (23:57)
Mm.
Bilal Tahir (24:00)
nicer. I think that's a great way. So if you're a developer making a framework, definitely leverage that. Because I feel like that really adds to the marketing aspect of it and makes it more palatable. So that was great. The second, though, the caveat, I feel like, and this is maybe where I kind of push back on it, is I do feel like, because we're so used to human organizations, I wonder if it's the wrong paradigm for agents. And we're just trying to pattern-match right now,
do this product manager thing, and maybe there's a more effective way. Maybe, from first principles, there's no such thing as a product manager or a developer. Maybe it's just agents doing things interchangeably, because an agent is an agent. I mean, it's just a system prompt that makes it a product manager versus a developer, right? I mean, that's just one thought. But I wonder if it's something that we look back on and say we were thinking of it that way because of where we were coming from.
Pierson Marks (24:54)
Right.
No, I mean, it is interesting, because I kind of lean, like, I think modeling agents on organization structures may be the right thing, but just like how different organizations have different org charts, like, the way that Facebook is organized versus Apple versus Microsoft versus Amazon, they're very different modes of communication.
And so, like, maybe that's the thing. It's like, how do you actually structure it: a top-down kind of tree versus this circular org chart. Like Nvidia, for example, Jensen has, like, multiple dozen direct reports, and yeah, he is unique. Right. It only works with him. And so, when we talk about agents, the difference between agents and humans is that humans have
Bilal Tahir (25:35)
yeah, he's a very unique, only works for him though. Yeah.
Pierson Marks (25:49)
like, natively unique skills, where my skill set's gonna be different than your skill set, like, what we're good at, what we're not good at. And the responsibility of the organization is to figure out where somebody's strengths and weaknesses are and pair them with other people in a way that maximizes their strengths and is complementary to their weaknesses. Agents aren't like that unless you're talking about different models, which might be the case, you know?
Bilal Tahir (26:09)
Yeah. Right. Exactly. Exactly. For me,
I think, yeah. No, no, please.
Pierson Marks (26:17)
No, yeah, yeah, exactly.
Bilal Tahir (26:18)
Yeah, no, for me, where I've kind of landed, I think the two big dimensions where it makes sense to have a divide-and-conquer is, first, you have some sort of a parent who decides, is this task, like,
a GPT 5.1 nano task or a Haiku task or an Opus task, which obviously saves compute, right? So you classify that task. And the second is permissions. So that's big with organizations. If you want to spawn a hundred agents, subagents, you probably want some sort of RLS, row-level security, some sort of policy where, okay, you have read/write permissions to this database table, because there's no reason for you to have read/write to the whole database, because that's not your task. So I think security
and then cost is where I can see you can do classifications, and then maybe you spin off 100 agents and 20 of them are smaller agents, 80 are like high-purpose, and each has different permissions, and maybe there's a Terraform or infrastructure-as-code kind of way where you can do that. I mean, that could be pretty cool, I guess.
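A rough Python sketch of those two dimensions, cost tiering and least-privilege permissions, declared up front in an infrastructure-as-code style. Everything here is hypothetical: the tier names, the `SubagentSpec` fields, and the idea that a complexity score of 1 to 3 is supplied by the parent agent (in practice that classification might itself come from a model). It is one possible shape, not a description of any existing framework.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map them to whichever small/medium/large models you use.
TIER_BY_COMPLEXITY = {1: "small", 2: "medium", 3: "large"}


@dataclass(frozen=True)
class SubagentSpec:
    """Declarative description of one subagent: what it does, on what, with which model."""
    task: str
    model_tier: str
    readable: frozenset
    writable: frozenset


def plan_subagent(task, complexity, reads=(), writes=()):
    """Parent-side decision: pick the cheapest adequate tier and the narrowest access."""
    if complexity not in TIER_BY_COMPLEXITY:
        raise ValueError(f"unknown complexity: {complexity}")
    writable = frozenset(writes)
    # Anything a subagent may write, it may also read; nothing else is granted.
    return SubagentSpec(task, TIER_BY_COMPLEXITY[complexity], frozenset(reads) | writable, writable)


def is_allowed(spec, table, action):
    """Check a single access against the spec; unknown actions are denied."""
    if action == "read":
        return table in spec.readable
    if action == "write":
        return table in spec.writable
    return False


renamer = plan_subagent("rename columns in docs", complexity=1, reads=["docs"])
migrator = plan_subagent("migrate billing schema", complexity=3, writes=["billing"])
print(renamer.model_tier, is_allowed(renamer, "billing", "read"))
print(migrator.model_tier, is_allowed(migrator, "billing", "write"))
```

Keeping the spec frozen and declarative means the whole fleet of subagents can be reviewed as data before anything runs.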
Pierson Marks (27:19)
Yeah, totally. No, it would be very interesting. Yeah, the permissions, how agents do permissions is still a very unsolved problem because does, like, if you inherit the permissions of the invoker, like me,
Bilal Tahir (27:32)
Well, from what I've
seen, one thing you can, I think the abstraction is tools. You don't give the agent permission, you give the tool the permission, and then you just pass certain tools to the agent, and that's the only one it can call. And that's kind of, I've found that idea to be effective.
Pierson Marks (27:47)
Right. But still, you have to have more granularity too. I mean, like, if you want one agent to be able to write to this directory but not that one, while a different agent should be able to write to that one. Like, even if it's the same tool, like accessing a database, do you create different tools for each one? Probably not, you know.
Bilal Tahir (28:08)
You're
right, you're right. Maybe there's some sort of access token where an agent can call it. The tool has the right permissions, but it checks, hey, am I being called with the right access token? So not anyone can invoke it. So maybe there's something like that. I don't think it's too complicated, I mean, there's ways to do it.
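One way to picture the "same tool, different access tokens" idea in Python: a single tool object that checks the caller's token against the scopes granted to it, so there is no need for one tool per agent. The class name `ScopedTool`, the path-prefix scoping, and the in-memory "filesystem" are all invented for this sketch; a real system would need proper path normalization and token storage, which are out of scope here.

```python
import secrets


class ScopedTool:
    """One tool shared by many agents; each agent's token limits where it may act."""

    def __init__(self, name, func):
        self.name = name
        self._func = func
        self._grants = {}  # token -> tuple of allowed path prefixes

    def grant(self, allowed_prefixes):
        """Issue a fresh token scoped to the given path prefixes."""
        token = secrets.token_hex(16)
        self._grants[token] = tuple(allowed_prefixes)
        return token

    def call(self, token, path, *args):
        prefixes = self._grants.get(token)
        # Naive prefix check for illustration; it does not handle ".." or symlinks.
        if prefixes is None or not path.startswith(prefixes):
            raise PermissionError(f"{self.name}: token not allowed to access {path}")
        return self._func(path, *args)


# In-memory stand-in for a filesystem so the sketch is self-contained.
files = {}


def write_file(path, content):
    files[path] = content
    return len(content)


writer = ScopedTool("write_file", write_file)
docs_agent_token = writer.grant(["docs/"])
src_agent_token = writer.grant(["src/"])

writer.call(docs_agent_token, "docs/readme.md", "hello")
try:
    writer.call(docs_agent_token, "src/main.py", "print('hi')")
except PermissionError as error:
    print(error)
```

The agent itself never holds permissions; it only holds a token, and the tool decides what that token is good for.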
Pierson Marks (28:25)
And
I mean, like that's why I think that organizations are the right structure for permissioning, for delegation, just because we've spent honestly, probably about two, at least two centuries refining like large organizations. Maybe Dutch East India Company was probably the first like 300 years ago, but you know,
Bilal Tahir (28:44)
Right.
Pierson Marks (28:49)
There's been a lot of trial and error, and we've arrived at a situation that does work and all this stuff. And maybe it's not the best, but at least it probably is able to work. Because I love this idea where, I think I wrote a blog about this, or it's still on my desktop, but I think organization structures are going to dramatically change, because right now, like, how we have a company:
You have entry-level employees, they have their managers, they have those managers, and you have eventually all the way up to the CEO or the president of the board, even. And the base-level workers or the entry-level employees, like, the low-level employees, there's nothing lower than those people. But I think that what's actually going to happen is those base-level employees are going to have agents that live beneath the org chart that aren't physical employees, but they're
digital employees, where one base-level, entry-level employee is, like, the CEO of their digital organization, and the org chart spreads all the way down, and every single person is going to have this. So instead of a 2D triangle, you have a 3D, like, triangle that goes deep: pyramids.
Bilal Tahir (29:59)
Yeah, yeah, no, I mean, I think that's fascinating. I guess that's different than...
When you do a task, you spin up a hundred agents, then you spin them down. So this is different from the org, where you don't hire someone and then fire them the same day and then rehire a new one the next day. But this would be more memory. And I know all the labs have, I know OpenAI at least has, and Devin even too, I think they have a plan to launch an AI employee. And maybe that's where it is. It's more anthropomorphic. Maybe it has a name and a badge and proper access and memory and all that.
So it's fascinating. I think there's no either-or. I mean, you're probably going to see all of these approaches come together, and new approaches which we haven't even thought about. Once these things get into production, we'll be like, yeah, we should have done it this way and that way. So early stuff, fascinating, exciting.
Pierson Marks (30:47)
Totally,
absolutely. Well, cool. We went down the agent route a little bit. What else should we chat about?
Bilal Tahir (30:53)
Yeah, yeah, no, it's great. ⁓ well,
I mean, since we are a generative media podcast, sorry, we should probably talk generative media. And I think the big one, which came out a couple of days ago, I think, was Kling 3.0. So Kling is obviously one of the best labs out there for, you know, video, and they've kind of branched off into a couple of different models. They have one which is their standard model, which was Kling V2.5, and V2.6 came later, and then they came up with this new model called the O model,
and it's called the Omni model, I think. And that one was more about editing. The standard model is more like, I want to generate this five-second shot.
That's what you use it for, and it's great for action shots and stuff like that. The Omni model is like, here's a video, I want you to add Pierson on the right and Bilal on the left. So really great for editing. I'm kind of confused why they specifically branched off and didn't just kind of unify those models, but that's the approach they took. And they just released their latest version. At least this time, rather than doing two different launches, they unified the launch. So they launched like,
like 12 models or something, and it's basically a standard model, you know, with the standard variations, so image-to-video, text-to-video, etc., on V3, and then you have the O3, which is video-to-video, image editing, blah blah blah.
So they're all on fal, probably Replicate soon as well. And obviously on Kling's platform. Amazing, state of the art. The big thing is obviously much improved quality, but there are a couple of big improvements they've made which I think are kind of game changers. I mean, ByteDance with the...
what is it? Seedance did that already with multi-shot, where you can have like a five-second video and you could have different shots in it, but Seedance isn't the best model. Kling has the quality, and now you can basically do up to six shots in the same prompt. So what does that mean? That means it's like,
with most videos, you give it a five-second prompt and it's like a dolly and zoom, and you show the character and maybe they're talking, and there's native audio, which Kling does too. Every model now has native audio, which is awesome, and Kling is state of the art, the lip syncing is amazing. But what you can do is you can say, for the first couple of seconds show a close-up of his face, then maybe show his hand and he's like, you know, scratching the table nervously, and then go back and he's licking his lips,
like a classic show, and I can't stress enough how much that gets you from the uncanny valley to real, because the faster edits just make it more engaging and just way more like Hollywood quality. So I definitely recommend you guys check it out, and they've actually added a new parameter called multi prompt. The way Seedance did it was kind of hacky, because it wasn't like one for one. You would do something like, okay, cut, first two seconds show this,
and then do this, in the same prompt. Kling makes it more explicit, which is great, because then you get a more robust output. They let you pass in an array of prompts. And so the first prompt says this happens, and then this happens. And the shots can be really different. So you can be like, first two seconds show a character, then the next two seconds show the sky, the sun coming up, and then go back to the character.
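To make the "array of prompts" idea concrete, here is a rough sketch of what assembling such a request body might look like. The key names (`multi_prompt`, `prompt`, `duration`, `total_duration`) are hypothetical placeholders, not a documented schema, and the six-shot cap is simply the figure mentioned in the conversation; check the provider's endpoint documentation for the real field names and limits.

```python
MAX_SHOTS = 6  # "up to six" as mentioned in the episode; confirm against the docs


def build_multi_shot_payload(shots: list[tuple[str, int]]) -> dict:
    """Turn (prompt, seconds) pairs, in playback order, into one request body."""
    if not 1 <= len(shots) <= MAX_SHOTS:
        raise ValueError(f"expected 1 to {MAX_SHOTS} shots, got {len(shots)}")
    return {
        # Hypothetical key names chosen for illustration only.
        "multi_prompt": [
            {"prompt": prompt, "duration": seconds} for prompt, seconds in shots
        ],
        "total_duration": sum(seconds for _, seconds in shots),
    }


payload = build_multi_shot_payload(
    [
        ("Close-up of the character's face, nervous expression", 2),
        ("The sky at dawn, the sun coming up", 2),
        ("Back to the character, who looks up", 1),
    ]
)
```

Compared with the single-prompt "cut, first two seconds show this" style, each shot here is its own entry, so the model does not have to infer where one shot ends and the next begins.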
Pierson Marks (34:03)
so much.
Bilal Tahir (34:07)
The other thing I like about this is it kind of saves money. Even though the shot is expensive, it might cost you like 30 cents or 60, 70 cents for a five-second shot, if you're getting three to six shots out of a five-second video, you can basically divide that cost, because a lot of people were generating five-second videos and then trimming them and then generating the next shot. So in a way, it's even more economical. So don't be dissuaded if you see the cost and you're like, oh, this is still too expensive.
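The arithmetic behind that point, as a small sketch. It assumes the rough 60-cent figure for a five-second clip mentioned above and compares generating one clip per shot (then trimming) against one multi-shot clip; the function name and the three-shot example are illustrative only.

```python
def total_cents(clip_cents: int, shots: int, multi_shot: bool) -> int:
    """Cost of getting `shots` usable shots at `clip_cents` per generated clip."""
    # Trimming workflow: one full clip per shot. Multi-shot: one clip for all shots.
    clips_needed = 1 if multi_shot else shots
    return clip_cents * clips_needed


CLIP_CENTS = 60  # rough figure quoted in the conversation, not a price list
SHOTS = 3

trimmed = total_cents(CLIP_CENTS, SHOTS, multi_shot=False)
multi = total_cents(CLIP_CENTS, SHOTS, multi_shot=True)
print(f"generate-and-trim: ${trimmed / 100:.2f} total, ${trimmed / SHOTS / 100:.2f} per shot")
print(f"multi-shot:        ${multi / 100:.2f} total, ${multi / SHOTS / 100:.2f} per shot")
```

With these assumed numbers the multi-shot clip works out to 20 cents per shot instead of 60.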
Pierson Marks (34:31)
That's very interesting.
I might have missed this. Can you repeat? How do people get started with Kling? Like what's the platform?
Bilal Tahir (34:39)
Yeah.
For me personally,
I like fal because, I mean, I like the APIs and stuff and you can use it. If you're non-technical, I've heard Kling's actual platform is really good because they abstract a lot of this away. So you don't have to worry about, should I use the O model or the standard model? You just add in, I want this image to be my start frame, this my end frame, I want these shots and I want these three characters, and under the hood, it'll match it with the right model.
So if you're more non-technical, you can do that. If you really want to get in the weeds, I would go on fal, they have all the endpoints, and I would check them all out. And if you're lazy like me, I would go to Kling's blog post, they have a prompt guide. Actually, fal also made a Kling guide. I would just take that, make a skill out of it in Claude, and then give it all the model endpoints and be like, hey, you figure it out, you know, what can I do here? And just imagine.
Pierson Marks (35:21)
Mmm.
I mean, you should totally create this skill and put it on skills.sh.
Bilal Tahir (35:30)
yeah,
I create micro apps for myself all the time where I'd be like, let's create a framework, a skill and a framework. And then I like to kind of create, you know, generate stuff on my own and stuff. It's pretty cool, you know, just my local. It's easy. I don't have to worry about security or anything, right? It's just all local and stuff. just create the routes, you know, and boom, hit it. It's pretty, yeah.
Pierson Marks (35:51)
Wow, that's
cool. I need to try that out. I haven't done any video editing or used video models in a while, since... Yeah, in a minute.
Bilal Tahir (36:01)
I mean, yeah, it's pretty sick. And then I think what was it? They had like this one other, I feel like there are three things we talked about, native audio, amazing lip syncing, the multi-shot and...
It's $0.224 without audio or $0.336 with audio, per second, not per five seconds. So yeah, it is a little expensive. This is the pro version, still low. Let me see what the standard is. The standard is almost 17 cents if you don't do audio, 26 cents per second with audio. Per second, yeah, yeah, expensive.
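Since the pricing is per second rather than per clip, a quick sketch of what a five-second clip comes to at the standard-tier rates quoted here. It treats "almost 17 cents" as 17 and takes the 26-cent audio rate as stated; these are numbers from the conversation, not an official price list.

```python
# Cents per second for the standard tier, as quoted in the episode (approximate).
STANDARD_CENTS_PER_SECOND = {"no_audio": 17, "with_audio": 26}


def clip_cost_cents(seconds: int, audio: bool) -> int:
    """Clip price in cents at a flat per-second rate."""
    rate = STANDARD_CENTS_PER_SECOND["with_audio" if audio else "no_audio"]
    return rate * seconds


for audio in (False, True):
    cents = clip_cost_cents(5, audio)
    print(f"5-second clip, audio={audio}: ${cents / 100:.2f}")
```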
Pierson Marks (36:20)
Got it.
26 cents per second.
Bilal Tahir (36:32)
The other thing you can do, and we talked about this previously, is Kling launched this thing called Voice ID, which is cool because you can basically use another endpoint to create a voice ID. So you can give it a voice, or you can generate a voice, and you get back an ID. And if you give it to any of these Kling models, it'll use that voice. So that solves voice consistency, because you won't have the character sounding different in different scenes. So I think that's a good way to handle that problem as well.
Pierson Marks (36:52)
Hmm.
different thing. You know, I think I'll spend some time tomorrow editing this podcast, obviously, but also taking these sections, like this section about Kling, putting them into Jellypod docs, letting it do some deep research on the actual documentation, and then creating some articles about this stuff, because it'd be interesting. I mean, you know, we talk about it, and if you're listening to this podcast in the car or on your walk, you're not going to go and try this stuff. And we're not talking about the details.
Bilal Tahir (37:26)
know maybe you ⁓
should have a Creative Flux blog or something. Just automatically create blogs and stuff.
Pierson Marks (37:32)
Yeah. And maybe it's
like the AI engineering blog on Jellypod or something. Maybe. But yeah, super cool. And talking about this, also, I mean, I know there was a launch that went viral. Viral is now a, maybe, sorry, my computer's getting so slow now. But there was a launch that went viral on
Bilal Tahir (37:38)
Could ⁓ be. yeah. Nice.
Pierson Marks (37:58)
on Twitter, X, called TrueShort. And it's an AI studio and streaming app. They create and release their own original movies and shows. They're all AI generated. In the last six months, they hit $2.4 million in revenue, 2 million minutes watched, they're number nine in the app store for news. And this founder, this guy, Nate Tepper,
Bilal Tahir (38:01)
Mmm.
Pierson Marks (38:22)
He is like, this is why I strongly, like, he's the type of person that proves that, I think, AI enhances and accelerates creativity and isn't the doomer, like, it's gonna destroy creativity. And so he has this thread where he's saying, hey, like in high school, I watched a movie a night, and during COVID, I wrote full feature screenplays. They're a hundred pages long, but turning one into a movie would have taken
a lot of money, a lot of people. And now that AI has come along, he can take those screenplays that he's written and actually start to create something and make these shorts and make these movies. And there are so many people that are great fiction writers or just screenplay writers whose screenplays won't ever go anywhere. One, because maybe they don't want to put in the effort. They don't want to go into that, but they have amazing stories. And so I think
this is why Nate created this app, because they're all short-form mobile soap operas. So it's kind of like that old vertical, what was that? The heavily VC-backed short-form mobile app. Yeah, yeah, Quibi or something.
Bilal Tahir (39:28)
Quibi, Quibi, Quibi. Yeah, yeah, I mean, I've
been
Everyone made fun of them because, I mean, they basically lost a record amount of money and then shut down, but they were in a way too early. I mean, their big thing was they hired like Hollywood stars and paid them like 10 million dollars for 30 seconds of their time, versus the way, you know, Nate has done it. And I think this is like the smart way, and actually some Chinese apps we talked about, you know, ranking in the app store, have a similar strategy, which is like just use low-budget actors, or, yeah, in Nate's case, it's just totally AI generated. But before this,
Pierson Marks (39:40)
Mm-hmm.
Bilal Tahir (40:00)
say, there were apps that would literally hire people, like low-grade, like C-list celebrities from LA who were just trying to, they were like waiters and waitresses trying to get into the industry, hire them for pennies on the dollar, make these shorts, and they were making a killing. And now you don't even need those actors. You can just use an AI-generated cast for this. So it's awesome.
Pierson Marks (40:04)
So.
Totally, you could do
it in your bedroom.
Bilal Tahir (40:28)
$2.4
million in six months, annualized. I mean, that's insane.
Pierson Marks (40:33)
Yeah, it's pretty cool. Like, this is only a minute, but I want to show this because like you're like, okay AI, like what slop is this? It's gonna look horrible. But like watch this video. I mean, ⁓ if you're watching YouTube.
Yeah, but like, it was pretty cool.
Bilal Tahir (40:46)
I wonder how he feels
it. Does he have like a team of, like, prompters, writer-prompters, who just sit there, and he's given them like max subscriptions of Kling or Krea or whatever? And they're just creating, they probably have a process or something with a storyboard and stuff. And we've talked about PJ before, and he shares amazing workflows about how he does it, you know, he storyboards stuff and creates these amazing shorts and stuff. So yeah, there's so much alpha in there for sure.
Pierson Marks (41:10)
We're gonna see this
and so they're doing an interesting technique. They're being the studio and the distribution engine also. But I think that there is going to be a lot of room for just small creative indie teams that have amazing stories. Low budget, maybe they could produce a video or movie for fifty to a hundred thousand dollars and, you know, sell that one-hour movie to Netflix for like a million bucks or something. And Netflix is buying the story, they're buying
the creative elements, the things that AI can't really do. But you're able to do that now with a team of like five people, $100,000, maybe $200,000, and make a 5X return on your money in a matter of months. I mean, it's possible. Months, not years.
Bilal Tahir (41:56)
Yeah, there's a lot of opportunity. I'm so excited about it. I mean, like, you know, every passing day the opportunity just gets bigger, not smaller. Yeah.
Pierson Marks (42:06)
Totally. I know. It's awesome.
Bilal Tahir (42:08)
Yeah, this is awesome. I know we're almost out of time. I want to just really quickly touch on this one last thing, because you pointed it out: PaperBanana. This is very interesting, because we've talked about slide generation. We talked about Kimi slides. You know, you can create PowerPoint slides with, you know, Nano Banana. NotebookLM has slides as well. Jellypod has slides, which you should check out. They're amazing with our magic slide template. But this is interesting because it basically focuses on a paper, so you can give it a PDF paper and it generates infographics,
and I believe it uses Nano Banana under the hood. But what I like is it comes up with a holistic model first, like what makes sense, what is the concept in the paper, right? And then it basically becomes a director and directs the Nano Banana prompts, and comes up with a consistent theme and style, which is really cool.
Pierson Marks (42:57)
Totally, and it does like sort of iterative improvement. It generates the image, it looks at the image, it uses a VLM to kind of make sure that the image actually meets expectations, and iterates on that too. So it's like an agentic framework for creating illustrations for academic papers using Nano Banana, but it's called PaperBanana. I mean, it's a reference-driven agentic framework for automated academic illustrations that
orchestrates five specialized agents: retriever, planner, stylist, visualizer, and critic.
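The generate, look, and iterate loop described here can be sketched in a few lines. This is not PaperBanana's implementation: `generate` and `critique` are hypothetical stand-ins for an image model call and a VLM check, and the string-based stubs below exist only so the loop can run end to end.

```python
from typing import Callable


def refine(
    generate: Callable[[str], str],
    critique: Callable[[str], list[str]],
    prompt: str,
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Regenerate until the critic reports no issues or the round budget runs out.

    Returns the last image and the number of critique rounds performed.
    """
    image = generate(prompt)
    for round_no in range(1, max_rounds + 1):
        issues = critique(image)
        if not issues:
            return image, round_no
        # Fold the critic's feedback into the next prompt and try again.
        prompt = prompt + " Fix: " + "; ".join(issues)
        image = generate(prompt)
    return image, max_rounds


# Stub "models" so the sketch is runnable without any external service.
def fake_generate(prompt: str) -> str:
    return f"image<{prompt}>"


def fake_critique(image: str) -> list[str]:
    return [] if "axis labels" in image else ["add axis labels"]


final_image, rounds = refine(fake_generate, fake_critique, "bar chart of results")
```

In a real pipeline the critic would be a vision-language model looking at rendered pixels, and the retriever, planner, and stylist steps would run before the first `generate` call.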
Bilal Tahir (43:31)
This is super cool. It kind of reminds me, I remember you had this idea about GitHub to a podcast, where we'll take a repo and be able to create a podcast. I wonder if maybe you can apply the same formula and create infographics about a repo, about what the code base is or like what the concepts are, you know, or something.
Pierson Marks (43:48)
Oh
yeah, totally. Yeah, no, I totally, still think maybe it's just because I would want this really badly. But I mean, if you could automate Fireship-like videos where you have a repo and on every major release, you're automating a high quality tutorial on what the open source project was so that you just watch a 10 minute video.
Bilal Tahir (44:14)
Yeah,
I mean, yeah, that's it. Because we talked about Remotion like a few weeks back, about how Remotion can open the floodgates and you can create these amazing programmatic videos. So I wonder if there's a Remotion plus Nano Banana plus, obviously, some sort of work to understand the repo first, kind of workflow there. Probably there is, you can create that. That's it.
Pierson Marks (44:34)
Mark. Yeah, it'd
be very, very interesting. yeah, I'm excited for JellyPod 2 and like a lot of the stuff that we're working on right now gives, I mean, it's going to give us the ability to really test out a lot of new, agentic creation mediums. So like this, illustrations, like self-refinement, self-iteration, subagents, I mean.
Keep an eye out on Jellypod, everybody. Plug our, plug our startup.
Bilal Tahir (45:00)
Yeah, yeah. We got some
exciting updates coming through here. yeah. Awesome.
Pierson Marks (45:06)
Yes, yes. ⁓
But cool. Well, okay. Creative Flux, episode 30 in the books. Yeah.
Bilal Tahir (45:11)
Woo, yeah,
30. We did it. And yeah, it's a bit more exciting than when we started.
Pierson Marks (45:16)
Sweet.
Absolutely, absolutely. Well, okay, cool. Well, have a great rest of the week and yeah, talk soon.
Bilal Tahir (45:25)
All right, take care guys, bye.
