Grok 4, Multimodal Learning, & Is Coding Extinct?

Pierson Marks (00:00.743)
Cool, cool. Hey, how's it going?

Bilal Tahir (00:04.142)
Good, how are you doing?

Pierson Marks (00:05.895)
Good, good. It's always fun at the beginning. We kick it off with the same intro. How's it going, even though we were just talking a second ago?

Bilal Tahir (00:14.574)
I know, I know, I know. We gotta do something like...

a call, you know, like "Good morning, Vietnam" or something like that.

Pierson Marks (00:23.109)
Yeah, good morning AI nerds and everybody else out there that wants to listen to us talk for an hour.

Bilal Tahir (00:26.808)
Yeah, yeah. There's, I forgot his name, Derek something, but he was one of the YouTubers I used to follow when I was learning to code, because he would make these amazing videos, like learn Python in 40 minutes or something, really distilled. And his catchphrase was, "Well, hello internet." And he had this really thick voice. Real OG YouTuber. Awesome. Really distilled it down into the simplest essence.

Pierson Marks (00:50.565)
Really?

Was that how you, I'm curious, what was that journey like? When were you first starting to program, and how did you go about it? I'm curious.

Bilal Tahir (01:01.774)
Oh, I think that was 2014, 2013, around that time. Because I started as an analyst. I was doing Excel spreadsheets eight hours a day. You basically get really good at running,

you know, VLOOKUPs and creating pivot tables. And then I wanted to get out of that, and I got into SQL, and SQL made sense to me because, you know, I was in the data world already, but I was like, this is so much simpler versus how you have to do these Excel things. I wanted to learn more about programming. And so I started looking into a career move, and I wanted to become a data analyst. And for a data analyst, basically you do SQL for the most part, but it also helps if you know a little programming, and the language is Python.

Bilal Tahir (01:47.488)
So I was like, all right, I'm going to take Python. And I'd taken Java in college and I had hated it. I absolutely hated object-oriented programming. It wasn't the thing for me. I wonder if I had taken Python then, or done something more functional, whether that would have changed my life. I think it would have, but...

Pierson Marks (01:57.095)
Yeah.

Pierson Marks (02:06.278)
Really?

Bilal Tahir (02:06.572)
I remember I did that class and was like, programming isn't for me, I'm not doing it. And then five, six years later, I was like, I'm at this dead-end career, I need to switch. So I picked up this course called Learn Python the Hard Way, which I think is still a great way to get into programming, because it's basically literally the simplest way to learn programming. It's not like solving problems; it's like, this is how you just print, you know, print hello, just copy-paste. So you literally just type out

the words and that's it. It's just building muscle memory of code. And that was very effective. And I started doing that. Then I got a job at Capital One, you know, one of the biggest banks in America, really cool company. People don't think a bank would be a cool place for technology, but actually they're very tech-forward. I really enjoyed it. And they treat their analysts and software engineers with a lot of respect, because they're like, you know what you're doing. And that's where I kind of honed my craft in Python and SQL, et cetera.

Pierson Marks (02:33.252)
Right.

Bilal Tahir (03:02.128)
And then ultimately I switched to JavaScript and React and TypeScript along the way in that job, because I wanted to get more into front end. It's a whole story. But yeah, what about you? I guess you were computer science, so you were, like, hacking away from the start, breaking into NASA as a 12-year-old or whatever.

Pierson Marks (03:11.78)
Right. No, interesting.

Pierson Marks (03:20.281)
No, no, no, no, no, no, no.

No, no, no. I mean, like, I think I mentioned this on a previous episode. I was always just fascinated with computers. I was always building; I was a huge Lego nerd. I played Club Penguin a bunch. Club Penguin is kind of the reason why I got into programming, because I made like a basic website to talk about Club Penguin. It was like a blog website, just talking about the pins and where the cheat codes were and what they were and how to...

get the most out of Club Penguin. That was like mid-elementary school, whenever I was doing that. But I wasn't really, necessarily, during that time, I was just kind of fascinated with

like building things, not necessarily coding. It was more of a means to an end, because, you know, I built a basic website just because, like, that's cool, I'm building something. And then that opened up the door to Blender and 3D animation software. And I think we talked about that in the past too. I really liked using the tools to build a mock-up of a swimming pool, or palm trees, or a hotel, and iPhones, and random things that were just cool.

And I really loved doing that, because I would watch YouTube videos of people essentially doing the exact same thing. Like, they're creating a car in Blender, and they're showing the keyboard shortcuts on the screen of what they're doing: select this edge, then click this keyboard shortcut,

Pierson Marks (04:52.622)
and then I'll do this thing, and I'll just emulate them. Like, completely watch this eight-hour video of them making this car and just follow every step. It would take me like 20 hours probably to watch the eight-hour video, but it was elementary school, middle school, and it was fun. I would wake up on weekends doing that. And I probably still have a bunch of files somewhere on some old hard drive of all those things. And then, yeah, then high school came around. I would play around in video game design classes and

some basic, basic stuff, nothing really too crazy. And then when college came around, I studied computer science, and that was really when it kind of accelerated. I was like...

I was between business and CS. I remember I applied to USC with, it was business slash computer science, like a 50-50 major. And then I applied to Berkeley and UCLA with computer science strictly. And then some places I applied to business.

Pierson Marks (05:55.175)
But most places I applied, it was engineering. And I just think about how different my life would have been if I had decided not to study computer science. I always think about that.

Bilal Tahir (06:07.006)
Yeah, yeah, yeah, you probably chose right. I always say it's easier to go from a technical to a non-technical field. I mean, you can always do an MBA if you wanted to, but if you do business, you would have to pick up code and stuff later. I don't know. I think it's good to get into a STEM field.

Pierson Marks (06:19.534)
Yeah. And I'm curious, I mean, this is a conversation people are having right now. What is your thought on studying programming right now, going to school and studying programming? Do you have any opinions on that? I kind of do.

Bilal Tahir (06:35.884)
Yeah, I know it's a big, hot topic and stuff. I personally, you know, I think coding is still a very valuable skill set and you should do it, because coding isn't just about typing words on a screen, like, this is how, you know, you make a hook or whatever.

Pierson Marks (06:42.938)
Right.

Bilal Tahir (06:53.568)
For me, it's been about the way you think about stuff: this is a step, can it be parallelized? Small stuff like that, that you optimize. The design thinking, I think, is actually the more important part. And before AI, if you think about the trajectory of

an engineering career, it was like: you were a junior engineer and you were just a code monkey, you just coded. And then at some point you get more responsibility and become a senior or principal engineer. And at that point you're not really writing code, but you're managing stuff, and you're thinking more about how the system should be architected and stuff. In a way, AI has basically brought that down to the junior level, where you can just start doing that now. But the reason there's a hierarchy is because you kind of need to go through the coding

phase to really understand how to think about that system. And so I think that's valuable. You know, I personally think, yeah, AI will design systems and it's great, but the way I think of it is:

I have a superpower because I did code the old-fashioned way, and so I can get more out of the system. And so I think you should absolutely learn to code. It's a valuable skill set, and it's more about the synapses that form in coding; that's something that will serve you even if ultimately LLMs are the ones writing the code. So yeah.

Pierson Marks (08:14.183)
Totally. No, I completely agree with you, because I remember, there's a spectrum. On one end it's like, okay, you don't learn to code. People are like, AI is gonna take my job, I'm not gonna do that, which I think is a horrible take. Don't do that. If you're passionate about software and you're building things, learn to code, because the skills gained there are awesome.

And then there's boot camps, where you take a year-long boot camp to learn to code. That's actually where I think the most dangerous point is, the boot camp area, where you do six months to learn, like, React or, you know, build a website, those things. I think that's probably the most dangerous zone, in that you're not learning the hard fundamentals as much; you're learning more of the application layer, and it doesn't teach you how to think as deeply as, like, a

CS type of engineering degree, maybe. I don't know.

Bilal Tahir (09:06.433)
Yeah, I totally agree. And if you get the fundamentals wrong, it can be hard. Like, I personally, we were talking about coding journeys, I was doing Python and I wanted to get into front end. And so I started React before I did JavaScript, which actually set me back in some ways, because I spent basically the first one to two years on very shaky ground, because I didn't understand JavaScript: closures, or scopes, or hoisting and shit like that.

And I feel like when I actually went back and learned a lot of that, it actually made me get more out of React, because I understood the basics of JavaScript. And so there's something about just getting those fundamentals, you know. By the way, for anyone who's in tech, there's a book series, it's free on GitHub, called You Don't Know JS. It's six books and it's intermediate to advanced, but it'll give you basically,

I think, a better understanding of JavaScript than 99% of all front-end devs, because most front-end devs, I think, jump straight into React or whatever framework, Next.js, and they don't really appreciate the language and why it is the way it is. And once you understand that, you understand why the frameworks make certain decisions; it just gives you more appreciation. I'm still not an expert by any means, but it definitely helped me a lot. So I recommend You Don't Know JS.

Pierson Marks (10:16.103)
Right.

Wait, what's it called again? Because I kind of want to read it; I bet I'll learn at least a few things from there.

Bilal Tahir (10:23.79)
Yeah, I'll send it to you. I think it's called, You Don't Know JS, and it's free. It was made by a guy, and he basically published it on GitHub. You Don't Know JS, I think it's a book series. Yeah, You Don't Know JS, 183,000 stars. Yeah, it's pretty good.

Pierson Marks (10:44.881)
Wow. That's crazy. Sweet. Maybe we turn it into a podcast and we listen to it on my walk.

Bilal Tahir (10:53.102)
I know one of your passions is that you want to create a GitHub-repo-to-podcast flow.

Pierson Marks (10:59.461)
I don't know, I just always think about that because, like, I tweeted this, but the people that scroll the GitHub repo feed are on another level of just...

Insane. Like, if you open up your GitHub app on your phone and you're scrolling, like, these are the issues getting updated on my starred repos that I'm following, that's crazy. But I would love it, you know? Like, if there's a bunch of cool projects out there, and there's this new major version that released all these cool new features, and I could get a five-minute or ten-minute podcast, like, what's in the new release? What did it do? That'd be sick.

Bilal Tahir (11:47.054)
No, seriously, GitHub is so underutilized, and their data is actually open. One of the projects I made with my brother one time was like Git Trends. We actually even bought the domain and stuff. There used to be a page called GitHub Trending that they deprecated, I don't know if you remember. It was basically all the trending projects. And a lot of people loved that, because it's a way to discover, because like you said, there's so many repos, it's hard to discover good stuff. And so what we did was we...

Pierson Marks (12:03.046)
Yeah.

Bilal Tahir (12:15.838)
the trending API was open, so we took that, and we basically would let you filter by when was the last time this GitHub repo was updated. Because a lot of times you search for something and you get the most starred, but then you notice the last commit was five years ago. And that's almost always, especially in today's world, anything over two, three years, I always dismiss that package, especially in JavaScript land. I know it's unfortunate, but

things change, module styles, ESM, all these things change. So any package that's more than three, four years old is almost gonna be unusable for you. And for me, that was always a big thing. So what we built was this thing where you could put in the date, the last update was within, let's say, six months or whatever, sort by stars, and then you can filter by language, because GitHub actually tags it; you can see, okay, does it use TypeScript? Does it use Python? So it was actually very useful. And I'm like...

Nobody's built this stuff, you know? And now imagine you put some podcasts on it, maybe you create little snippets about it: this is a transpiler thing, or this is a little, you know, CLI tool or whatever, right? So yeah, I think there's a lot of gems here. So we are giving away golden ideas, guys. Go build this.
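[Editor's note: the filter Bilal describes, a recency cutoff, a star sort, and a language facet, can be sketched against GitHub's public search API, which really does support `language:` and `pushed:>` qualifiers. This is a minimal sketch and not the actual Git Trends code; the sample repo dicts below are made up, though the field names match GitHub's REST API.]

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def build_search_url(language: str, months: int = 6, per_page: int = 10) -> str:
    """Build a GitHub search-API URL for the top-starred repos in
    `language` that were pushed within roughly the last `months` months."""
    cutoff = date.today() - timedelta(days=30 * months)
    query = f"language:{language} pushed:>{cutoff.isoformat()}"
    params = urlencode(
        {"q": query, "sort": "stars", "order": "desc", "per_page": per_page}
    )
    return f"https://api.github.com/search/repositories?{params}"


def filter_recent(repos: list[dict], cutoff_iso: str) -> list[dict]:
    """Drop repos whose last push predates the cutoff, then sort by stars.
    ISO-8601 date strings compare correctly as plain strings."""
    fresh = [r for r in repos if r["pushed_at"] >= cutoff_iso]
    return sorted(fresh, key=lambda r: r["stargazers_count"], reverse=True)


# Made-up sample data shaped like GitHub's repo objects:
repos = [
    {"name": "old-but-starred", "pushed_at": "2020-01-15", "stargazers_count": 90_000},
    {"name": "fresh-tool", "pushed_at": "2025-05-01", "stargazers_count": 4_200},
    {"name": "fresh-framework", "pushed_at": "2025-06-20", "stargazers_count": 12_000},
]
print([r["name"] for r in filter_recent(repos, "2025-01-01")])
# → ['fresh-framework', 'fresh-tool']
```

The heavily-starred but stale repo is dropped entirely, which is exactly the "last commit was five years ago" dismissal described above.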

Pierson Marks (13:17.882)
Right. Right.

Pierson Marks (13:23.366)
Yeah, yeah, yeah, here's a startup idea. Go do it. No, totally. And I mean, this week, and I'm jumping back into kind of more of the creative side, this week had some cool,

cool announcements. I mean, the biggest one in the AI world is probably Grok 4. So if anyone didn't know, Grok 4 came out, which is SOTA, or state of the art, on a lot of benchmarks, which is pretty cool. A benchmark is pretty much just a test that a lab or an external researcher will use

Bilal Tahir (13:44.536)
Mm-hmm.

Pierson Marks (14:05.254)
to test a new model, so to benchmark it against, like, OpenAI or Anthropic or any of these other players. So it did pretty well. I didn't dive into it; I haven't used it yet. I don't have Grok, the premium plan, which I think you need. Do you need the premium plan to?

Bilal Tahir (14:21.422)
I feel like you do. I know as a premium user you have Grok there, but I don't know if it uses the latest one. They also had that debacle last week, or even this week, I think, where Grok kind of went full Hitler. And so then they shut it down, from what I last saw. So I don't think you can use Grok, or maybe, I don't know if they've turned it back on, but they shut it down for a bit. So I wonder if Grok 4 will just take over from there. But yeah, I...

Pierson Marks (14:38.693)
Right.

Bilal Tahir (14:52.226)
So first off, just giving credit, the xAI team is insane. What they've done, they've basically caught up to OpenAI within two years, which is insane. And a lot of it, I mean, I'm sure they're amazing researchers, but a lot of it was a feat of engineering, because they have done

this insane thing where they've been able to pull in and build these power plants, because you need a shit-ton of energy to run these models. And there's this guy called Dylan Patel, whom I really recommend people follow. He runs this newsletter called SemiAnalysis. It's become one of the foremost newsletters out there in terms of the chip industry and AI training, et cetera. They get deep into the exact training methods. And his Substack blew up, and now he makes 20 million a year, and it's just him. It's crazy, you know? It's crazy.

Pierson Marks (16:14.554)
Wow. Wait, he's making $20 million a year on Substack? Wow.

Bilal Tahir (16:19.918)
Yeah, because he said, and it's not the newsletter, it's actually the data. He says 10 percent of the revenue is Substack, like the subscription, but 90 percent is, we just sell the data to, you know, hedge funds and stuff like that, because it's alpha, right? Like, okay, what's Nvidia doing? Who's using what chips, what are they selling, what's going to be the projection for next quarter, et cetera. And coming back to xAI, so,

insane achievement. At the same time, there are some concerns. Like, I really didn't use Grok at first; the pricing was pretty weird, so I was like, why would I use that? And it's still expensive. But I think the bigger concern is some of the stuff that's come out. Jeremy Howard posted this: if you ask it about, like, Israel versus Palestine, the first thing it does is go to Elon's posts to see what Elon thinks, and then it replies. And this is not just a system prompt problem; there's something bigger. So I don't know.

I don't want to speculate too much at this point, but I do think, knowing Elon Musk and his history, there's a good chance that Grok will be biased towards him, and he'll use that as a tool to kind of spread the news that he wants to. So that gives me pause. I feel like combining Twitter with that model is a pretty powerful combo.

Pierson Marks (17:39.748)
Right. Right. It's a powerful model,

Bilal Tahir (17:41.92)
But anyway, that's the political side. The technical side is actually interesting. So what they did was,

so the way you train a model is: first you pre-train it, you just train an LLM on a bunch of text data, blah, blah, blah. That's called pre-training. And then you do reinforcement learning on top, where you actually give it examples of, this is how you respond to a question, et cetera. And that'll actually make it sound very intelligent and come up with coherent answers, et cetera. In this model, apparently, from what I read,

because usually what you do is spend a shit-ton of compute on pre-training and then add a little RL frosting on top. They did it the other way. They basically started with a strong model, like Grok 3, I think, did a little bit of pre-training, and then they 10x'd the amount of compute on RL. And that's apparently how they got the benchmarks to be so good. At the same time, if you look at the benchmarks, they spent 10x the compute and only got like two, three percentage points of incremental increase, which...

Pierson Marks (18:34.587)
Hmm.

Pierson Marks (18:44.038)
Right.

Bilal Tahir (18:45.172)
is kind of concerning, because that kind of indicates there's a plateau on how much you can scale RL as well. And up to last year, it was all about test-time compute and stuff like that. And we were like, okay, there's no wall, we're going to keep blazing through. But now there might be a wall. I don't know a lot, I'm not an AI expert, but, you know, the benchmarks kind of make you go, okay, maybe we are not going to

Pierson Marks (18:53.222)
All right.

Pierson Marks (19:02.683)
Right.

Bilal Tahir (19:13.526)
hit this insane takeoff that we were promised.

Pierson Marks (19:16.518)
Right. It's super interesting as somebody, you know, I understand the very basic fundamentals because I've listened to a lot of things, I've read things. Am I an AI researcher? No. I need to dig back into all those papers. And, you know, there's like 30 papers that Karpathy identified, like, okay, these are the top 30 papers that every person should read if you really want to understand how transformers work. And then there's the 3Blue1Brown series

Bilal Tahir (19:40.078)
Yeah.

Pierson Marks (19:46.426)
about LLMs, which is also a good watch. But yeah, the thing that, we talked about this last week, about spatial awareness and spatial understanding. And just at a very high level, at a, you know, ignorant sort of take: yeah, a lot of data is in text, but there's a lot of data still in images, and there's a lot of data in videos that's still not really tapped. Like, how do you extract

Bilal Tahir (19:59.967)
faithfully, yeah.

Pierson Marks (20:16.349)
all that information in video? And then there's also just world data. I'm just super interested to see what happens with, you know, one, Tesla and the vision data that they'll be able to get across the world, and world understanding about, you know, structures and physics and stuff like that. And will understanding physics help you with other domains? I mean, one of the cooler things that I think Llama showed was that,

you know, if a model is better at programming, it's gonna be better at all other domains. So performing RL on logic-based programming tasks also raises the floor across the board, even in areas unrelated to programming. So are there other things, like increasing spatial awareness and reasoning, physics-based stuff, will that improve other areas? Maybe. I don't know.

Bilal Tahir (21:11.342)
Absolutely, no, you're spot on. Because there's some stuff that gets most of the attention, and here it's these benchmarks and the training method, but honestly there are so many levers to pull. Like you mentioned multimodality, that data, but also other small things that only the researchers in the field know.

A good example: Albert Gu, who co-invented flash attention, where attention used to be quadratic and flash attention makes it, like, less than quadratic. A lot of models use that, I think Gemini as well, to make it very fast and cheap. Then Albert Gu started Cartesia, which is, in a way, a competitor to ElevenLabs, and they basically focused on latency, so they can get,

supposedly, the fastest voice agents, using that architecture. Well, one of the papers they came out with recently, it's called H-Nets, I think, and it basically talked about tokenization: how tokenization is actually a huge problem, because it causes so many gotchas, because people don't understand that LLMs don't see words like we do.

They actually break them up into these little chunks. That's why we have stuff like "how many R's in strawberry", and people go, oh, ha ha, you can't do that, stupid. It's not stupid; when you understand how it actually sees, it sees the word "straw" and then "berry". And "berry" might be broken into "ber" and "ry". So when you ask how many R's are in strawberry, you are asking how many R's are in "straw", "ber", "ry". And imagine you were asked that question: you're like, is he talking about the first word, the second word,

the first two words concatenated? It's not obvious, right? That's why they get it wrong. So when you understand tokenization, so why do I bring this up? Well, they came up with a new paper, and they're like, tokenization is annoying, it's another abstraction, we should just have better tokenization. It's such a low-level thing, but they came up with this new architecture where you can basically tokenize in a smarter way rather than naively, because everyone just uses the same OpenAI chunking algorithm they came up with for GPT-3

Bilal Tahir (23:24.526)
for the last three, four years. So a small innovation got great results. Will it revolutionize LLMs? I don't know, but that's just one area. Then you've got stuff like attention, self-attention, all these layers. Hundreds of PhDs are working on each layer in that whole stack, optimizing it. That's why, you know, I mean,

Pierson Marks (23:35.206)
Right.

Bilal Tahir (23:48.768)
even with today's architecture, there are so many levers to pull, and LLMs are gonna get way cheaper, way higher quality. So yeah, reason to be bullish for sure.
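[Editor's note: the strawberry example above can be sketched in a few lines. The token split `["str", "aw", "berry"]` is hypothetical, real BPE vocabularies chunk differently, but it shows why "how many R's?" is ambiguous when the model sees chunks rather than letters.]

```python
def count_letter(token_chunks, letter):
    """Count a letter per chunk, the way a model 'sees' the word,
    and return both the per-chunk counts and the total."""
    per_chunk = {tok: tok.count(letter) for tok in token_chunks}
    return per_chunk, sum(per_chunk.values())


# Hypothetical token split -- a real tokenizer may split differently:
tokens = ["str", "aw", "berry"]
per_chunk, total = count_letter(tokens, "r")
print(per_chunk)  # {'str': 1, 'aw': 0, 'berry': 2}
print(total)      # 3
```

The right answer (3) only falls out if the model correctly reasons across chunk boundaries; "how many R's in this chunk?" has three different answers depending on which chunk you mean.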

Pierson Marks (23:56.047)
Right. Maybe you know this, because it's a question I've always had, and maybe it's naive or whatever. All LLMs, at least it seems like, and maybe I'm just completely wrong on this, are trained on English text, or just, like, all text.

So maybe that's just completely wrong, and it's trained on all textual data. But me, as somebody who doesn't understand Mandarin, thinking about the Mandarin language, or Chinese: you know, they use characters.

And to me, it seems like one of those characters more directly relates to what could be considered a token, versus in Latin-based languages like English, where you have a word built up of multiple letters. But in Mandarin, you have a single character representing an idea, and it's a much larger alphabet. And it's just like, is that,

does the language itself, like, we're essentially building large language models based on the fact that, okay, this is the language that humans use, the language that our society uses. But is that a limiting factor? Is there another language, another sort of representation, that can encode information more densely? Like, is it a limit of our vocabulary, and how our current languages are structured, that limits the ability of large language models to be intelligent?

Pierson Marks (25:35.704)
And it goes back to how, last week or the other week, we talked about there being limits in our language, where I can't explain to you the color green precisely, because the green that I see is different than the green you might see, and there's no way to communicate, with full accuracy, the green that I see.

Bilal Tahir (25:53.304)
Yeah, no, it's super fascinating, because it's not even just LLMs; even for human beings, language changes the way we perceive the world. There are nuances in different languages where you kind of need to understand the language, it's almost like having an inside joke. Like, you say it that way because, in our language, that's just funny or whatever. It's hard to explain that. And with LLMs, it's

doubly compounded. So it's a fascinating point. I haven't really thought about it, but you're right. And, I imagine, so I do think LLMs are trained on Common Crawl, so they're to a certain extent trained on every language, which is why I think they can be really good translators out of the box. They are, but of course there's a bias. I mean, most of the internet is English, at least the Common Crawl database. So it's an interesting question, because there are other countries, particularly China, that have made Chinese models which are more focused on Mandarin. And would they be,

Pierson Marks (26:35.077)
Right.

Bilal Tahir (26:49.74)
you know, there's literature, say, in math: one of the reasons, you know, Chinese people are better at maths is because Mandarin is a more mathematically sound language. Like, their days, Monday, Tuesday, I think it's like "day one", "day two". Small stuff like that, so the mathematical concepts make more sense to, you know, Chinese kids. I mean, that's one small aspect, but I've heard that. So I imagine, if you apply that to LLMs, maybe LLMs can be better at certain tasks just because of the

Pierson Marks (27:06.948)
Interesting.

Right.

Bilal Tahir (27:17.64)
language they were created in. It's a fascinating concept. I don't know more about it, but I imagine that would be the case. So interesting.

Pierson Marks (27:24.198)
Right, right. No, it's super, yeah, super interesting. And I know we got down this whole rabbit hole about tokenization; I see it on the agenda too, you know, about Cartesia, the new approach, super, super cool. So, I mean, we started off kicking off with our coding journeys, then Grok 4, tokenization, all of these theoreticals. Fun.

Bilal Tahir (27:31.949)
Yeah.

Bilal Tahir (27:44.802)
Yeah, all over, sorry. We haven't gotten into the GenMedia stuff yet, so. Yeah.

Pierson Marks (27:49.413)
No, I'm digging this, you know. To finish up with that: there's a cool podcast I haven't yet listened to. If you're listening to this podcast, you probably know of it. It's Latent Space, hosted by two awesome computer scientists, technical people, swyx and...

Alessio. All I know is it's Alessio, and his last name starts with F, and he's like, "I'm Alessio, the CTO of Decibel Partners." Every episode starts off at the beginning with that; that's why I know. But yeah, they just had an interview with Olivia Moore and her sister,

Pierson Marks (28:35.77)
Justine Moore, partners at a16z. And I haven't listened to it, but they are, like, the creative influencers of the tech world, I would say.

Bilal Tahir (28:45.678)
I mean, they're up there, definitely. I think the leaders in terms of, you know, posting cool content. But yeah, I follow both of them, and I almost feel bad because I like almost every post, because it's so cool and interesting. But yeah, it was a fascinating podcast. I listened to it. And kind of taking a step back, I think one of the, obviously one

Bilal Tahir (29:11.468)
reason this has really exploded is Veo 3, which we talked about a couple of weeks back. You know, it was the first model that really combined audio and video together in a really good way. And so you create Yetis or Sasquatches or Stormtroopers, and they can, like, say something, and it actually sounds really cool and funny. So people made all these skits with it. And Olivia and Justine, you know, play around with that; they repost other people's creator stuff. And a lot of these things are blowing up like crazy, getting millions of

Pierson Marks (29:21.83)
Mm-hmm.

Bilal Tahir (29:41.364)
views, you know, using characters like the Yeti, Sasquatch, etc. And one of the big updates that happened this week, a couple of updates happened. First, image-to-video was released in Flow, which is the Gemini video app. And what that unlocked was great character consistency, because before that, you could only do text-to-video in Veo 3. So you create a Yeti

video, eight seconds. Then you create another Yeti video, and the Yeti may or may not look the same. It's one of the reasons people love Stormtroopers, and I guess the Yeti too, because you can get a fairly consistent look. But if you want to use the same character throughout, that was very hard. Image-to-video unlocks it, because now you can just generate an image of a character and use that as the starting frame of a Veo 3 video. So I think

in the next few weeks we'll see longer-format videos, you know, maybe two, three, five, ten minute Veo 3 videos, or videos with more serious storylines. So I think this is a step change towards actually getting to a 10-to-20-minute episode, or a cartoon series or whatever, using just Veo 3. So it's a huge step.

Pierson Marks (30:53.99)
So you said this was added into Google's Flow product?

Bilal Tahir (30:59.67)
Yeah, it's there as far as I know, but it's not in the API. I'm waiting for the API; it would be super cool on Fal and Replicate. What is in the API is Veo 3 Fast, which is basically a cheaper version of Veo 3. And there was some minor debacle on that: first they announced it, and Fal and Replicate hosted it, and it was 5x cheaper. I was like, this is amazing. And then they doubled the price and were like, sorry, apparently there was some miscommunication, I guess, with Gemini. But right now it's 40 cents per second, which for an eight-second video gives you $3.20,

compared to six dollars before. And if you use Flow, it's even cheaper, I think it's like 20 cents per second there. So depending on your usage, you might want to use Flow.
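The cost comparison here is simple per-second arithmetic. A minimal sketch, using the rates quoted on air (as remembered in conversation, not official published pricing):

```python
# Per-clip cost at a flat per-second rate. The rates below are the
# figures mentioned in the episode and may be out of date.

def clip_cost(seconds: float, rate_per_second: float) -> float:
    """Dollar cost of one generated clip at a flat per-second rate."""
    return round(seconds * rate_per_second, 2)

api_cost = clip_cost(8, 0.40)   # quoted Veo 3 Fast API rate -> $3.20
old_cost = clip_cost(8, 0.75)   # the earlier rate, which worked out to $6.00
flow_cost = clip_cost(8, 0.20)  # the quoted in-app Flow rate -> $1.60
print(api_cost, old_cost, flow_cost)
```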

Pierson Marks (31:36.358)
So Google's essentially subsidizing that product. You're still paying per use, but it's cheaper to use Flow than the API.

Bilal Tahir (31:47.554)
Yeah, you just have to get a subscription, though. It's like what, $250 or something? So depending on what your needs are, you can do that. I'm an API guy, I just want the API on demand. I'm happy paying hundreds of dollars, but I just don't want to be locked into a subscription. It's like a weird thing in my head.

Pierson Marks (31:55.514)
Right. Yes.

Yeah.

Pierson Marks (32:05.248)
Totally, that's it. Yeah, Flow seems cool. There's an idea I want to play around with. There was this guy that kind of went viral maybe last week, and he creates Greco-futurism art. Did you see this?

So Greco-futurism is essentially an architecture style of, like, the Greeks in the future. Greco, the Greeks, plus futurism. And if you've ever seen Star Wars, it's kind of like that, where it's very beautiful

art, Greek columns and stone and marble, but futuristic, in the sense of: what if the Greeks had access to LEDs and modern technology, what could be built? And he posted this gallery of all these images he created, and they were so sick. I was like, man, I want to live in that world. It looks so cool.

Bilal Tahir (32:54.786)
How nice.

Bilal Tahir (33:02.286)
How nice. How nice. So it's like Greek, like Roman Empire, like baths, but it's got microwaves or appliances and stuff.

Pierson Marks (33:12.23)
It's like, not the appliances, it was just more the architecture. It was like, what would a house look like if you had these massive arches and columns and LEDs, and it felt futuristic, but also just really beautiful. Because I think that's a theme that I really, really hope starts to reemerge. David Perell, he's a writer and...

Bilal Tahir (33:18.062)
Nice.

Pierson Marks (33:37.935)
great thought leader, and he's really doubled down on this idea that in a world of AI abundance, it's very easy to create things, the floor rises, and so we should be pushing our limits to make things that are beautiful, and, like,

a renaissance of, sort of, creative beauty. When you can create so much, you know, and everybody can create, the thing that either differentiates you, or what we should be doing, is making things that are crafted beautifully. And I love that idea. Like, I...

Bilal Tahir (34:12.632)
Yeah, no, for sure. Like I've said, it'll be a Cambrian explosion of ideas, because you can mix and remix and add your own twist to things. It makes me think, and it's kind of prescient now that I think about it, but Andy Warhol said this like 50 years ago: in the future, everyone will be famous for 15 minutes. That's where the phrase "15 minutes of fame" comes from. And now that I think about it, the way it's so...

Our world moves so fast. You can literally come up with a Greco-Roman theme, you put it out there, you go viral for 15 minutes, and then everyone moves on. And if you look at the world, that window is actually shrinking. It used to be that maybe once you were famous, you stayed famous, but now people go viral for a month, or a week, and the time keeps shrinking. So maybe we'll reach a point where everyone just gets to, it's almost like a, what do you call it?

I forget the name, where everyone just takes turns. It's like, I'm famous now, you're famous now, you're famous now. So it's crazy. You know, in a way it's much more democratized, but yeah.

Pierson Marks (35:18.309)
Right.

Yeah, totally. It'll be interesting. And I went down this tangent because I would love to recreate what he's doing, be inspired by that, because it's really cool. I think it's awesome. But then taking that, with scene consistency, and creating, essentially,

home tours. Like, I love watching YouTube videos of just some sick houses in other places around the world. Oh, what does this city look like? What does this house look like? The coolest Airbnbs around the world. I love watching that while eating or whatever. But to me, watching this cool Airbnb in the middle of Europe, that's awesome, but I'll never go there, I'll never book it. It's fantasy. It's like, oh, I'm imagining myself in this Airbnb

and experiencing that experience.

You know, that's very similar to if there were a YouTuber who created a 10-minute home-tour video of this Greco-futuristic hotel, and I'm just watching like, this is cool. There are the arches over there, the staircase, the rooms look like this. It's exactly the same. Yeah, it doesn't exist, but in my world the other Airbnb doesn't exist either, because I'm never going to go there. I'm never going to experience it

Bilal Tahir (36:45.144)
Right.

Pierson Marks (36:45.523)
except just watching it for entertainment. So.

Bilal Tahir (36:49.542)
Yeah, you know, it doesn't exist right now, but with VR tech and stuff, maybe you can actually immerse yourself there. So who knows? That's awesome. Yeah, no, I love it.

Pierson Marks (36:57.202)
Yeah, no, totally. But maybe this weekend I'll mess around with Flora. Or, wait, is it Flora or Fauna? I think it's Flora.

Bilal Tahir (37:06.828)
Yeah, I mean, you posted about Flora. So what is Flora? Is it basically similar to Flow?

Pierson Marks (37:12.945)
I think so. I think it's more model-agnostic, where you have nodes, it seems like. The first might be an image node: you have this image, or maybe you don't have an image, so you give it a prompt, like, generate an image of this person. So you have a person with a mustache in some background. Then you connect that node to the next node, and that node is image-to-video. And then you can connect to the next node, like, add another person in the background.

Bilal Tahir (37:37.752)
Got it, right. So very similar to ComfyUI, but maybe nicer, probably a nicer UX. Yeah, it's funny how everyone just comes back to the ComfyUI node paradigm. It's become one of those ubiquitous things: that's just how you build pipelines, whether it's Gumloop or whatever. We're just connecting nodes together. Everything is a graph.

Pierson Marks (37:44.837)
I think so, very similar. Yeah.

Pierson Marks (37:56.764)
Bye.

Pierson Marks (38:01.703)
Totally. And I feel like all those companies have such an opportunity here. Maybe ComfyUI has this, but I know Gumloop doesn't: just let me define my flows in JavaScript or Python. Just be like, hey, this is the node, this is the name of the node, and here are the parameters. Or maybe even your own language where I could just type it out, because

it's hard for me, and probably for you too, to work in the UI. Like, hey, here's a node, and you have to connect this node to that place, and then click this button to insert the variable from step one into step two. Just let me define the flowchart in code.
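A code-first flow definition like the one being described could look something like this. The `Pipeline` class and node names are hypothetical, just to show the shape; real tools (ComfyUI, Gumloop, Flora) each have their own formats:

```python
# A minimal sketch of defining a node pipeline in plain Python
# instead of a drag-and-drop canvas. Stub lambdas stand in for models.

from typing import Any, Callable

class Pipeline:
    def __init__(self) -> None:
        # Ordered list of (name, function) steps; each step is one "node".
        self.steps: list[tuple[str, Callable[[Any], Any]]] = []

    def node(self, name: str, fn: Callable[[Any], Any]) -> "Pipeline":
        self.steps.append((name, fn))
        return self  # chainable, so the flow reads top to bottom

    def run(self, value: Any) -> Any:
        for name, fn in self.steps:
            value = fn(value)  # each node's output feeds the next node's input
        return value

# The Yeti example from the conversation, with stubs standing in for models:
flow = (
    Pipeline()
    .node("generate_image", lambda prompt: f"image({prompt})")
    .node("image_to_video", lambda image: f"video({image})")
)
print(flow.run("a yeti in the snow"))  # video(image(a yeti in the snow))
```

Text like this is also easy to diff, version, and debug step by step, which is exactly the console.log-style visibility the node UIs lack.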

Bilal Tahir (38:45.314)
Yeah, yeah, I was mentioning my frustration with these tools, and this is totally a skill issue on my part, but these small abstractions trip me up. Especially, I've noticed, the inputs and outputs, I always struggle with. Unlike code, it's never clear to me: what am I inputting? What am I outputting? And there are various ways to check, like, this is the data, okay, yeah. But then when you connect the output of this node to that one, in my head, that means the input just flows through, but it doesn't.

It's almost like you have to go through this weird JavaScript object and it might be there or it might not be. I'm like, well, first just...

You know, I work very step by step. If something doesn't work, I console.log it: okay, show me exactly what's there. And there's just no console.log ability in these abstraction tools. You have to connect, test, test, and if it fails, the whole node fails, and you're like, why the fuck did this fail? So I feel like there's a lot of improvement people can make on these abstraction tools. And I saw a post about this: once you work with these no-code tools, you have an appreciation for why non-technical users struggle. It's almost like driving with the

Pierson Marks (39:38.257)
Right.

Bilal Tahir (39:52.648)
brake on. This is why, like we talked about, I feel coding is a superpower, because the abstractions hold you back in so many small ways that compound, you know. So...

Pierson Marks (40:03.331)
Mm-hmm. Totally, totally. I've never been this person, but I feel like you have this: I would like to have a library of scripts. You know, we write scripts all the time to automate certain things and tasks. You create a Python script or a bash script, you put it on your desktop, you run it, you do your thing, and it's cool. That's sick,

because if you're technical, you can automate some process pretty easily, which is very powerful. Writing a script is the most powerful thing you can do in terms of automating any process. But oftentimes you don't know exactly what the script should do. If you want to connect one system to another, or interact with the file system on your computer, you have to know the code to write. You don't need to know that anymore, because AI can do it. So I'm just imagining:

I have ChatGPT, Raycast, whatever, some chatbot, and I would love it if, when I'm trying to do something, it just creates a script for me on demand. Hey, resize these images. It writes the script, the script resizes the images, it works, it's great, and now that script gets added to my toolbox

on my computer locally, so I can create scripts on demand for all these one-off workflows that I might reuse in the future. Maybe it knows my file system and everything, but it lives in this one place, so I don't have to recreate the script every time. Because sometimes I'll forget I even wrote a script in the past, like, wait, where did that go? It should be in my toolbox.
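A minimal sketch of that toolbox idea, with made-up paths and function names; the point is just that generated scripts get saved somewhere discoverable instead of being rewritten each time:

```python
# Save AI-generated one-off scripts into a single "toolbox" folder so
# they can be listed and rerun later. Paths here are illustrative; in
# practice the toolbox might live at ~/.toolbox.

import tempfile
from pathlib import Path

def save_script(toolbox: Path, name: str, source: str) -> Path:
    """Persist a generated script as <toolbox>/<name>.py."""
    toolbox.mkdir(parents=True, exist_ok=True)
    path = toolbox / f"{name}.py"
    path.write_text(source)
    return path

def list_scripts(toolbox: Path) -> list[str]:
    """Names of every saved script, so old ones can be rediscovered."""
    if not toolbox.exists():
        return []
    return sorted(p.stem for p in toolbox.glob("*.py"))

# Demo in a temp directory so nothing touches the real home folder:
toolbox = Path(tempfile.mkdtemp()) / "toolbox"
save_script(toolbox, "resize_images", "# generated script body goes here\n")
print(list_scripts(toolbox))  # ['resize_images']
```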

Bilal Tahir (41:42.734)
Hmm.

Bilal Tahir (41:47.728)
right, yeah.

Right. Raycast, do you use that? It has a feature called snippets, right? You can save these little shortcuts and then just go.

Pierson Marks (41:58.972)
I think they have extensions and shortcuts, and I'm still getting familiar with it. I'm probably at an amateur level on Raycast right now, kind of high level; there's still a lot of power I haven't looked into. But maybe that is the thing, they have the little snippet stuff.

Bilal Tahir (42:18.03)
Yeah, yeah, I mean, small stuff. I saw this one: because I type the same URL all the time, like Jellypod, you can make a shortcut, like semicolon-J or whatever, a sequence you'd never really type on its own, and it'll just auto-populate the URL. That's basic, but very helpful. You can do stuff like that.

Pierson Marks (42:31.719)
Mm-hmm.

Pierson Marks (42:37.666)
Right. I have one. I think it's CC. I'm going to test this. Yes, oh yeah: CC, for close conversation. So when we're in Intercom chatting with support users, if I type CC, I have this snippet that says, hey, thanks for reaching out, if you need any more help I'm right here, but if not, I'm just going to close this conversation. So CC: close conversation.

Bilal Tahir (43:01.656)
That's awesome. Unless you ever have to type a word with two Cs together consecutively.

Pierson Marks (43:06.457)
What word has two C's? Yeah, accommodation. Yeah, yeah, yeah. But I think it doesn't trigger mid-word; I think the CC has to come first. I'm gonna test that right now. Let's see... yeah, no, it doesn't. It'd have to be C, C, space, and then it goes. So, yeah. I mean, I think we'll take some of this stuff from

Bilal Tahir (43:09.55)
Rebecca. I don't know, does Becca count? I don't know. Rebecca.

Bilal Tahir (43:19.948)
Yeah.

Bilal Tahir (43:26.03)
okay, okay, nice. fun edge cases. Awesome.
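The edge case being tested here, whether "CC" fires inside a word like "accommodation", comes down to word boundaries. A toy expander sketch (the trigger and snippet text are made up, and Raycast's actual matching rules may differ):

```python
# Only expand a snippet trigger when it stands alone as a word, so the
# "cc" inside "accommodation" is never touched.

import re

SNIPPETS = {
    "cc": "Thanks for reaching out! If you need any more help I'm right "
          "here; otherwise I'll close this conversation.",
}

def expand(text: str) -> str:
    """Replace whole-word snippet triggers with their expansion text."""
    pattern = r"\b(" + "|".join(map(re.escape, SNIPPETS)) + r")\b"
    # \b word boundaries require non-letter characters (or the string
    # edge) around the trigger, which is why mid-word matches never fire.
    return re.sub(pattern, lambda m: SNIPPETS[m.group(1)], text)

print(expand("cc"))             # expands to the full closing message
print(expand("accommodation"))  # unchanged: "cc" is inside a word
```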

Pierson Marks (43:35.377)
this and punt it to next week. Or actually...

Bilal Tahir (43:38.159)
Yeah, we're at time, so we can move on. I know we were going to talk about video-to-video; we can probably punt that, but we'll leave a teaser: I think there are some cool workflows there.

Pierson Marks (43:47.547)
Yeah, a lot of cool video-to-video workflows next week. We'll keep next week's episode coming out on Friday, and yeah.

Bilal Tahir (43:55.778)
We might have to move to Thursday because I'm actually leaving. But yeah. Right, right, yeah.

Pierson Marks (43:58.268)
We'll record earlier in the week if you're down, and I'll keep it coming out on Friday. And then the following week maybe I can figure out a guest or someone to come on. I don't know, I'll reach out to some people like, you wanna talk about how you're using these tools? And maybe we'll see somebody who wants to come on. That'd be nuts. Hey, I have this podcast that gets 20 listeners a week.

Bilal Tahir (44:16.92)
Talk to the Murtvans, know who I mean? Come on.

Bilal Tahir (44:24.974)
You know what's funny? I've noticed a lot of these podcasts, I don't know, they pop up in my feed. I was watching a Dylan interview, and I looked at the guy's channel, and he had the same, like, 20 subscribers, 400 views. I'm like, well, how did he get him? And I think there's a lot of people who started like us; they just start podcasts, and maybe they're in SF or whatever, and they're like, oh, would you want to come on my podcast? And you know, the power of asking is real. A lot of people will give you the time of day, because it doesn't hurt them. They're like, yeah, free press, whatever.

Pierson Marks (44:51.783)
All right. Totally, totally. You're right. I have that on my to-do list, reaching out to all those channels, the Starter Story types. Cause yeah, you're right. But that's a little different, cause I'm like, hey, put me on your show. But yes, a hundred percent.

Bilal Tahir (44:54.58)
And so, yeah, good.

Bilal Tahir (45:08.876)
I did see this from Justine, which was very interesting. She talked to Grok's voice mode, then she went to Hedra, which is a talking-avatar tool. And she animated, well, she animated herself, which she probably didn't need to do, but she also took a picture of a robot and animated it based on the audio from Grok. So she basically simulated a podcast with Grok, but with avatars. And you can almost do that: you can clone your voice in JellyPod,

Pierson Marks (45:14.087)
Hmm.

Pierson Marks (45:37.692)
Mm-hmm.

Bilal Tahir (45:37.858)
and kind of record yourself talking to yourself. And I feel like that will actually be a viral format: you just talk to yourself, but it's a podcast.

Pierson Marks (45:40.731)
Right.

Pierson Marks (45:47.025)
Totally. I talk to myself all the time, so I mean like, yeah, totally.

Bilal Tahir (45:49.442)
Yeah, it's a true echo chamber. You're like, yeah, man, you know. It's like, have you ever seen that Polaroid meme? They clipped together him talking to himself, like, just me, just us, and they're just looking at each other. It's funny.

Pierson Marks (46:01.767)
No, I've never seen that meme. That's funny, that's hilarious. But sweet. Well, cool. Everyone, thanks for tuning in to episode five of Creative Flux. It comes out every week on Friday. Talk to you next week.

Bilal Tahir (46:07.042)
Awesome.

Bilal Tahir (46:18.008)
Talk next week, take care.

