
Title: xAI’s Mind Blowing Grok 3 Demo w/Elon Musk & Team (full replay)
Author: Solving The Money Problem

Transcript:
Well, welcome to the Grok 3 presentation. So the mission of xAI and Grok is to understand the universe. We want to understand the nature of the universe so we can figure out what's going on: where are the aliens, what's the meaning of life, how does the universe end, how did it start, all these fundamental questions. We're driven by curiosity about the nature of the universe, and that's also what causes us to be a maximally truth-seeking AI, even if that truth is sometimes at odds with what is politically correct. In order to understand the nature of the universe you must absolutely rigorously pursue truth, or you will not understand the universe; you'll be suffering from some amount of delusion or error. So that is our goal: figure out what's going on. And we're very excited to present Grok 3, which we think is an order of magnitude more capable than Grok 2, in a very short period of time, and that's thanks to the hard work of an incredible team. I'm honored to work with such a great team, and of course we'd love to have some of the smartest humans out there join our team. So with that, let's go.

Hi everyone, my name is Igor, I lead engineering at xAI. I'm Jimmy Ba, leading research. I'm Tony, working on the reasoning team. All right, I'm Elon, I don't do anything, I just show up occasionally.
occasionally yeah so um like I mentioned
Gro is the tool that we're working on
Gro is our AI that we're building here
at XI and we've been working extremely
hard over the last few months to improve
gr as much as we can so we can give it
to all of you so we can give all of you
access to it um we think it's going to
be extremely useful do we think it's
going to be interesting to talk to funny
really really funny um and we're going
to explain to you how we've improved gr
over the last few months we've made
quite a jump in in capabilities yeah
actually we should explain maybe also
what is why do we call it Gro so Gro is
a word from um a Highland novel Stranger
in a Strange Land um and it's uh used by
a guy who's who was raised on Mars um
and the word Gro is to sort of fully and
profoundly understand something that's
what the word gr mean fully and
profoundly understand something and
empathy is important true so yeah so uh
if we charted xas progress uh in the
last few months has only been 17 months
since we started kicking of our very
first model uh grock 1 was almost like a
toy by this point only 314 billion
parameters and now if you proud the
progress the time on x-axis the
performance of favorite Benchmark
numbers M mlu on the Y axis we're
literally progressing at unprecedent
speed across the whole field and then we
kick off grock 1.5 right after grock One
released after November 2023 and then
So if you look at where all the performance is coming from: when you have the right engineering team and all the best AI talent, the only thing you still need is compute; big intelligence comes from a big cluster. So we can re-plot the entire progress of xAI, now replacing the benchmark on the y-axis with the total amount of training FLOPs, that is, how many GPUs we can run at any given time to train our large language models to compress the entire internet. Or really, all human knowledge. That's right, the internet being part of it, but it's really all human knowledge, everything. Yeah, the whole internet fits onto a USB stick at this point; it's like all the human tokens. That's right. And very soon, into the real world too.
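For a sense of scale on "total training FLOPs" as a y-axis, here is a back-of-the-envelope sketch; the utilization figure, the token count, and the standard 6*N*D approximation below are illustrative assumptions on my part, not numbers stated in the presentation.

```python
# Back-of-the-envelope training-compute estimate (illustrative assumptions only,
# not xAI's actual numbers): FLOPs a cluster supplies vs. FLOPs a model consumes.

H100_BF16_PEAK = 989e12      # ~peak dense BF16 FLOP/s of one H100 (approximate)
SECONDS_PER_DAY = 86_400

def cluster_flops(num_gpus: int, days: float, utilization: float = 0.4) -> float:
    """Total FLOPs a cluster can deliver at a given sustained utilization."""
    return num_gpus * H100_BF16_PEAK * utilization * days * SECONDS_PER_DAY

def model_flops(params: float, tokens: float) -> float:
    """Standard ~6*N*D estimate for training a dense transformer once."""
    return 6 * params * tokens

if __name__ == "__main__":
    # e.g. a hypothetical 100,000-GPU cluster running for 90 days at 40% utilization
    supplied = cluster_flops(num_gpus=100_000, days=90, utilization=0.4)
    # e.g. a hypothetical 314B-parameter model trained on 12T tokens
    consumed = model_flops(params=314e9, tokens=12e12)
    print(f"cluster supplies ~{supplied:.2e} FLOPs")
    print(f"model consumes  ~{consumed:.2e} FLOPs")
```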
So we had so much trouble actually training Grok 2 back in the day. We kicked off the model around February, and we thought we had a large amount of chips, but it turned out we could barely get 8K training chips running coherently at any given time, and we had so many cooling and power issues. I think you were there in the data center. Yeah, it was really more like 8K GPUs at 80% efficiency on average, so more like 6,500 effective H100s, training for several months. But now we're at 100K. Yeah, that's right. So what's the next step? After Grok 2, if we wanted to continue to accelerate, we had to take the matter into our own hands; we had to solve all the cooling, all the power issues, everything.

Yeah, so in April of last year Elon decided that really the only way for xAI to succeed, for xAI to build the best AI out there, is to build our own data center. We didn't have a lot of time, because we wanted to give you Grok 3 as quickly as possible, so we realized we had to build the data center in about four months. It turned out it took us 122 days to get the first 100K GPUs up and running, and that was a monumental effort. We believe it's the biggest fully connected H100 cluster of its kind. And we didn't just stop there; we decided that we needed to double the size of the cluster pretty much immediately if we wanted to build the kind of AI that we want to build. So we then had another phase, which we haven't talked about publicly yet, so this is the first time we're talking about it, where we doubled the capacity of the data center yet again, and that one only took us 92 days. We've been able to use all of these GPUs, all of this compute, to improve Grok in the meantime, and basically today we're going to present to you the results of that, the fruits that came from it.

Yeah, so all roads lead to Grok 3: 10x more compute, more than 10x really, maybe 15x, compared to our previous generation model. Grok 3 finished pre-training in early January, and the model is actually still training right now.
So this is a little preview of our benchmark numbers. We evaluated Grok 3 on three different categories: general mathematical reasoning, general knowledge about STEM and science, and also computer science and coding. AIME, the American Invitational Mathematics Examination, is hosted once a year, and if we evaluate model performance on it, we can see that Grok 3 is in a league of its own across the board; even its little brother, Grok 3 mini, is reaching the frontier across all the other competitors.

Now you might say, well, at this point all these benchmarks are just evaluating memorization of the textbooks, memorization of the GitHub repos; how about real-world usefulness, how about actually using those models in our product? So what we did instead is kick off a blind test of our Grok 3 model, code named Chocolate. It's pretty hot, yeah, hot chocolate. It's been running on this platform called Chatbot Arena for two weeks; I think the entire X platform at some point speculated this might be the next generation of AI coming out. The way Chatbot Arena works is that it strips away the entire product surface; it's a raw comparison of the engines of those AIs, the language models themselves. It's a plain interface where the user submits one single query, gets shown two responses without knowing which model each one comes from, and then votes on which is better. So in this blind test, an early version of Grok 3 already reached an Elo of about 1400; no other model has reached that Elo score in head-to-head comparison against all the other models. And it's not just one single category: it's 1400 aggregated across all the categories in Chatbot Arena, capabilities, instruction following, coding. So it's number one across the board in this blind test, and it's still climbing; we keep updating it, so it's about 1400 and climbing.
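For context, the Arena score being described is an Elo-style rating computed from pairwise blind votes. Here is a minimal sketch of the generic Elo update; the K-factor and ratings below are made up, and this is the textbook formula rather than LMSYS's or xAI's exact aggregation.

```python
# Minimal Elo update for pairwise blind comparisons (illustrative only).

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: float, k: float = 32.0) -> tuple[float, float]:
    """Update both ratings after one vote. a_won is 1.0, 0.0, or 0.5 for a tie."""
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (a_won - e_a)
    r_b_new = r_b + k * ((1.0 - a_won) - (1.0 - e_a))
    return r_a_new, r_b_new

# hypothetical example: a 1400-rated model beats a 1350-rated model once
print(update(1400.0, 1350.0, a_won=1.0))   # A gains ~13.7 points, B loses ~13.7
```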
Yeah, and in fact we have a version of the model that we think is already much better than the one we tested here. We'll see how far it gets, but that's the one we're working on and talking about today. Yeah, so actually, one thing: if you're using Grok 3, I think you may notice improvements almost every day, because we're continuously improving the model. Literally even within 24 hours you'll see improvements.

Yep. But we believe here at xAI that getting the best pre-trained model is not enough; that's not enough to build the best AI. The best AI needs to think like a human, needs to contemplate all the possible solutions, self-critique, verify all the solutions, backtrack, and also think from first principles; that's a very important capability. So we believe that as we take the best pre-trained model and continue training it with reinforcement learning, it will elicit additional reasoning capabilities that allow the model to become so much better, and to scale not just at training time but at test time as well. We've already found these models extremely useful internally for our own engineering, saving hundreds of hours of coding time. So Igor, you're the power user of our Grok reasoning model, what are some use cases?
Yeah, so like Jimmy said, we've added advanced reasoning capabilities to Grok, and we've been testing them pretty heavily over the last few weeks. To give you a little taste of what it looks like when Grok is solving hard reasoning problems, we prepared two little problems for you: one comes from physics, and one is actually a game that Grok is going to write for us. For the physics problem, what we want Grok to do is plot a viable trajectory for a transfer from Earth to Mars, and then, at a later point in time, a transfer back from Mars to Earth. That requires some physics that Grok will have to understand. So we're going to challenge Grok to come up with a viable trajectory, calculate it, and then plot it for us so we can see it. And this is totally unscripted, by the way. That's the entirety of the prompt, just to be clear? Yeah, exactly, there's nothing more than that. This is the Grok interface, and we've typed in the text that you can see here: generate code for an animated 3D plot of a launch from Earth, landing on Mars, and then back to Earth at the next launch window. We've now kicked off the query, and you can see Grok is thinking. Part of Grok's advanced reasoning capabilities are these thinking traces that you can see here; you can even go inside and actually read what Grok is thinking as it's going through the problem, as it's trying to solve it. We should say we are doing some obfuscation of the thinking so that our model doesn't get totally copied instantly, so there's more to the thinking than is displayed. Yeah. And because this is totally unscripted, there's actually a chance that Grok might make a little coding mistake and it might not actually work, so just in case, we're going to launch two more instances of this; if something goes wrong we'll be able to switch to those and show you something presentable. So we're kicking off the other two as well.
And like I said, we have a second problem as well. One of our favorite activities here at xAI is having Grok write games for us, and not just any old game that you might already be familiar with, but actually creating new games on the spot and being creative about it. One example that we found was really, really fun is: create a game that's a mixture of the two games Tetris and Bejeweled. This is maybe an important point: obviously, if you ask an AI to create a game like Tetris, there are many examples of Tetris on the Internet, or a game like Bejeweled or whatever, and it can copy them. What's interesting here is that it achieved a creative solution combining the two games that actually works, and is a good game. Yeah, we're seeing the beginnings of creativity. Fingers crossed that we can recreate that; hopefully it works, otherwise it'll be embarrassing. So, because this is a bit more challenging, we're going to use something special here which we call Big Brain: that's our mode in which we use more computation, more reasoning, for Grok, just to make sure there's a good chance it might actually do it. We're also going to fire off three attempts here at creating this game that's a mixture of Tetris and Bejeweled. Yeah, let's see what Grok comes up with. I've played the game, it's pretty good; it's like, wow, okay, this is something. So while Grok is thinking in the background, we can now actually talk about some concrete numbers: how well is Grok doing across tons of different tasks that we've tested it on? We'll hand it over to Tony to talk about that.
Okay, so let's see how Grok does on those interesting, challenging benchmarks. So, reasoning again refers to models that actually think for quite a long time before they try to solve a problem. In this case, around a month ago the Grok 3 pre-training finished, and after that we worked very hard to put the reasoning capability into the current Grok 3 model. But again, this is very early days, so the model is still in training; right now what we're going to show people is a beta version of the Grok 3 reasoning model. Alongside it, we are also training a mini version of the reasoning model. So on this plot you can see Grok 3 Reasoning beta and then Grok 3 mini Reasoning. The Grok 3 mini reasoning model is actually a model that we trained for a much longer time, and you can see that sometimes it actually performs slightly better compared to Grok 3 Reasoning. This also means there's huge potential for Grok 3 Reasoning, because it's been trained for so much less time.
All right, so let's actually look at how it does on those three benchmarks Jimmy already introduced. Essentially, we're looking at three different areas: mathematics, science, and coding. For math we're picking high school competition math problems; for science we pick PhD-level science questions; and for coding it's also pretty challenging, it's competitive coding and also some LeetCode, the kind of coding interview problems people usually get when they interview at a company. On those benchmarks you can see that Grok 3 performs quite well across the board compared to the other competitors. So it's pretty promising, these models are very smart.

So Tony, what are those shaded bars? Yeah, I'm glad you asked this question. Because these models can reason, can think, you can also ask them to think even longer; you can spend more of what we call test-time compute, which means you can spend more time reasoning about a problem before you spit out the answer. In this case, the shaded bar means that we just asked the model to spend more time: it can solve the same problem many, many times before it tries to conclude what the right solution is, and once you give this compute, this kind of budget, to the model, it turns out the model can perform even better. That's essentially the shaded part in those bars. Right, so I think this is really exciting, because now instead of just doing one chain of thought with the AI, why not do multiple at once? Yes, so that's a very powerful technique that allows us to continue to scale the model's capabilities after training.
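A minimal sketch of that "solve the same problem many times" idea, sampling several independent attempts and majority-voting over the final answers (often called self-consistency); the toy solver and its 60% accuracy below are invented for illustration and say nothing about Grok's internals.

```python
# Toy illustration of spending more test-time compute: sample the "solver"
# several times and majority-vote over the final answers.
import random
from collections import Counter

def noisy_solver(question: str, rng: random.Random) -> int:
    """Stand-in for one sampled chain of thought (hypothetical: returns the
    correct answer 7 about 60% of the time, a wrong answer otherwise)."""
    return 7 if rng.random() < 0.6 else rng.choice([5, 6, 8])

def best_of_n(question: str, n: int, seed: int = 0) -> int:
    rng = random.Random(seed)
    answers = [noisy_solver(question, rng) for _ in range(n)]
    # keep whichever final answer the independent samples agree on most
    return Counter(answers).most_common(1)[0][0]

print(best_of_n("toy question", n=1))    # a single chain of thought is often wrong
print(best_of_n("toy question", n=32))   # with more samples the majority settles on 7
```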
And people often ask: are we actually just overfitting to the benchmarks? How about generalization? Yes, this is definitely a question we ask ourselves, whether we are overfitting to the current benchmarks. Luckily, we have a real test: about five days ago, AIME 2025 just finished, the competition where high school students compete, so we got this very fresh new exam and asked our two models to compete on the same benchmark, the same exam. It turns out, very interestingly, that Grok 3 Reasoning, the big one, actually does better on this fresh new exam. This also means the generalization capability of the big model is much stronger compared to the smaller model; on last year's exam it's actually the opposite, the small model kind of learned the previous exams better. So this actually shows some kind of true generalization from the model. That's right. So 17 months ago our Grok 0 and Grok 1 barely solved any high school problems, and now we have a kid that has already graduated: Grok is ready to go to college, is that right? Yeah, I mean, it won't be long before the human exams are simply too easy; they won't be hard at all. And internally, as Grok continues to evolve, we're going to talk about what we're excited about, but very soon there will be no more benchmarks left.
Yeah, one thing that's quite fascinating, I think, is that we basically only trained Grok's reasoning abilities on math problems and competitive coding problems, very specialized kinds of tasks, but somehow it's able to work on all kinds of other tasks, including creating games, lots and lots of different things. What seems to be happening is that Grok learns the ability to detect its own mistakes in its thinking, correct them, persist on a problem, try lots of different variants, and pick the one that's best. So there are these generalizing abilities that Grok learns from mathematics and from coding which it can then use to solve all kinds of other problems. Yeah, that's pretty... I mean, reality is the instantiation of mathematics. That's right.
And one thing we're really excited about, going back to our founding mission, is: what if one day we have a computer, just like Deep Thought, that utilizes our entire cluster for one very important problem at test time, all the GPUs turned on? I think back then we were building the GPU cluster together, you were plugging in cables, and I remember when we turned on the first initial test, you could hear all the GPUs humming in the hallway; it almost felt spiritual. Yeah, that's actually a pretty cool thing we're able to do, that we can go into the data center and tinker with the machines there. For example, we went in and unplugged a few of the cables and made sure that our training setup keeps running stably. That's something that most AI teams out there don't usually do, but it totally unlocks a new level of reliability in what you're able to do with the hardware.
Okay, so when are we going to solve the Riemann hypothesis? Well, the easiest solution is to enumerate over all possible strings, and as long as you have a verifier and enough compute, you'll be able to do it. My projection would be... what's your guess, what does your neural net calculate? So my bold prediction, and three years ago I told you this, so now it's two years later: two things are going to happen. We're going to see machines win some medals, the Turing Award, absolutely, the Fields Medal, the Nobel Prize, probably with some expert in the loop, some expert uplift. Do you mean this year or next year? Okay, that's what it comes down to, really.
Yeah, so it looks like Grok finished all of its thinking on the two problems, so let's take a look at what it said. All right, so this was the little physics problem we had. We've collapsed the thoughts here so they're hidden, and then we see Grok's answer below that. It explains that it wrote a Python script using matplotlib, then gives us all of the code. Let's take a quick look at the code; it seems to be doing reasonable things here, not totally off the mark. It says "solve Kepler" here, so maybe it's solving Kepler's laws numerically. There's really only one way to find out if this thing is working, I'd say: let's give it a try, let's run the code. All right, and we can see Grok is animating two different planets, Earth and Mars, and the green ball is the vehicle, the spacecraft, that's transiting between Earth and Mars. You can see the journey from Earth to Mars, and it looks like, indeed, the astronauts return safely at the right moment in time. Now, obviously this was just generated on the spot, so we can't tell you right now whether that was actually the correct solution; we're going to take a closer look, maybe call some colleagues from SpaceX and ask them if this is legit. It's pretty close. I mean, there's a lot of complexity in the actual orbits that has to be taken into account, but this is pretty close to what it would look like. Awesome.
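Not Grok's actual output, but a minimal sketch of the kind of script being described, under simplifying assumptions I'm adding myself: circular, coplanar Earth and Mars orbits, a single one-way Hohmann transfer leg rather than the full round trip, and Kepler's equation solved by Newton iteration.

```python
# Illustrative sketch only: animated 3D plot of a simplified Earth-to-Mars
# Hohmann transfer. Units are AU and years, with mu = 4*pi^2 so that Earth's
# orbital period is exactly one year.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

MU = 4 * np.pi**2              # Sun's gravitational parameter [AU^3 / yr^2]
R_EARTH, R_MARS = 1.0, 1.524   # orbital radii [AU], circular approximation

def solve_kepler(M, e, tol=1e-10):
    """Solve Kepler's equation M = E - e*sin(E) for E by Newton iteration."""
    E = M.copy()
    for _ in range(50):
        dE = (E - e * np.sin(E) - M) / (1 - e * np.cos(E))
        E = E - dE
        if np.max(np.abs(dE)) < tol:
            break
    return E

# Hohmann transfer ellipse between the two circular orbits
a_t = (R_EARTH + R_MARS) / 2                   # semi-major axis [AU]
e_t = (R_MARS - R_EARTH) / (R_MARS + R_EARTH)  # eccentricity
t_transfer = np.pi * np.sqrt(a_t**3 / MU)      # half an orbital period [yr]

t = np.linspace(0.0, t_transfer, 300)
M_anom = np.sqrt(MU / a_t**3) * t              # mean anomaly along the transfer
E = solve_kepler(M_anom, e_t)
x_s = a_t * (np.cos(E) - e_t)                  # spacecraft position, perihelion at Earth
y_s = a_t * np.sqrt(1 - e_t**2) * np.sin(E)

# circular planet orbits, phased so Mars sits at the arrival point at t_transfer
theta_e = 2 * np.pi / R_EARTH**1.5 * t
theta_m = (np.pi - 2 * np.pi / R_MARS**1.5 * t_transfer) + 2 * np.pi / R_MARS**1.5 * t

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.set_xlim(-2, 2); ax.set_ylim(-2, 2); ax.set_zlim(-1, 1)
ang = np.linspace(0, 2 * np.pi, 200)
ax.plot(R_EARTH * np.cos(ang), R_EARTH * np.sin(ang), np.zeros_like(ang), "b:", lw=0.5)
ax.plot(R_MARS * np.cos(ang), R_MARS * np.sin(ang), np.zeros_like(ang), "r:", lw=0.5)
earth, = ax.plot([], [], [], "bo")
mars,  = ax.plot([], [], [], "ro")
ship,  = ax.plot([], [], [], "g^")

def update(i):
    earth.set_data_3d([R_EARTH * np.cos(theta_e[i])], [R_EARTH * np.sin(theta_e[i])], [0])
    mars.set_data_3d([R_MARS * np.cos(theta_m[i])], [R_MARS * np.sin(theta_m[i])], [0])
    ship.set_data_3d([x_s[i]], [y_s[i]], [0])
    return earth, mars, ship

anim = FuncAnimation(fig, update, frames=len(t), interval=20, blit=False)
plt.show()
```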
In fact, I have that on my pendant here; it's got the Earth-Mars Hohmann transfer on it. When are we going to install Grok on a rocket? Well, I suppose in two years; everything is two years away. Earth-Mars transit windows occur about every 26 months; we're currently in a transit window, and the next one would be roughly November of next year, around the end of next year. And if all goes well, SpaceX will send Starship rockets to Mars with Optimus robots and Grok.

I'm curious what this combination of Tetris and Bejeweled looks like, "bet Tetris" as we've named it internally. Okay, so we also have an output from Grok here. It says it wrote a Python script and explains what it's been doing. If you look at the code now, there are some constants being defined here, some colors, then the tetrominoes, the pieces of Tetris, are there. Obviously it's very hard to tell at one glance whether this is good, so we've got to run it to figure out if it's working. Let's give it a try, fingers crossed. All right, so this kind of looks like Tetris, but the colors are a little bit off, the colors are different here. And if you think about what's going on here, Bejeweled has this mechanic where if you get three jewels in a row, they disappear, and gravity activates. So what happens if you get three of the same color together? Okay, so something happened. I think what Grok did in this version is that once you connect at least three blocks of the same color in a row, they disappear, and then gravity activates and all the other blocks fall down. I'm kind of curious whether there's still a Tetris mechanic here, where if a line is full, does it actually clear the line, or what happens?
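As a rough sketch of the hybrid mechanic being described here (clear any horizontal run of three or more same-colored blocks, then let gravity pull everything down), a made-up minimal grid implementation follows; it is not the game Grok generated in the demo.

```python
# Minimal sketch of the Bejeweled-style mechanic described above: clear any
# horizontal run of 3+ same-colored cells, then apply gravity. Illustrative only.
GRID = [
    [".", ".", ".", "."],
    ["R", "R", "R", "G"],
    ["B", ".", "G", "G"],
]

def clear_matches(grid):
    """Mark horizontal runs of >= 3 identical non-empty cells as empty."""
    to_clear = set()
    for r, row in enumerate(grid):
        run = 1
        for c in range(1, len(row) + 1):
            if c < len(row) and row[c] != "." and row[c] == row[c - 1]:
                run += 1
            else:
                if run >= 3:
                    to_clear.update((r, cc) for cc in range(c - run, c))
                run = 1
    for r, c in to_clear:
        grid[r][c] = "."
    return len(to_clear)

def apply_gravity(grid):
    """Let blocks fall straight down into empty cells, column by column."""
    for c in range(len(grid[0])):
        stack = [grid[r][c] for r in range(len(grid)) if grid[r][c] != "."]
        pad = len(grid) - len(stack)
        for r in range(len(grid)):
            grid[r][c] = "." if r < pad else stack[r - pad]

cleared = clear_matches(GRID)
apply_gravity(GRID)
print(f"cleared {cleared} cells")   # the R R R run disappears
for row in GRID:
    print(" ".join(row))
```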
It's up to interpretation, you know, so who knows. Yeah, I mean, it'll do different variants when you ask it; it doesn't do the same thing every time. Exactly, we've seen a few other "bet Tetris" attempts that worked very differently, but this one seems cool. So, are we ready for a game studio at xAI? Yes, we're launching an AI gaming studio at xAI. If you're interested in joining us and building AI games, please join xAI; we're launching an AI gaming studio, we're announcing it tonight. Let's go. Epic Games... but that's an actual game studio. Yeah, yeah.

All right, so I think one thing that's super exciting for us is that once you have the best pre-trained model and the best reasoning model, we already see that if you give those models the capability to think harder, think longer, think more broadly, the performance continues to improve. And we're really excited about the next frontier: what happens if we not only allow the model to think harder but also provide more tools, just like real humans use to solve problems? For real humans, we don't ask them to solve the Riemann hypothesis with just pen and paper and no internet. So all the basics, web browsing, search engines, and code interpreters, together with the best reasoning model, build the foundation for the Grok agents to come.
So today we're actually introducing a new product called Deep Search. It's the first generation of our Grok agents, one that doesn't just help engineers, researchers, and scientists do coding, but actually helps everyone answer the questions they have day to day. It's kind of like a next-generation search engine that really helps you understand the universe. So you can start asking questions like, for example, hey, when is the next Starship launch date? Let's try that and get the answer. On the left-hand side we see a high-level progress bar; essentially, the model is not just going to do one single search like current RAG systems, but actually thinks very deeply about what the user's intent is, what facts it should consider at the same time, and how many different websites it should go and read. This can really save hundreds of hours of everyone's Google time if you want to really look into certain topics. And then on the right-hand side you can see bullet summaries of what the model is currently doing: what websites it's browsing, what sources it's verifying, and oftentimes it actually cross-validates different sources out there to make sure the answer is correct before it outputs the final answer.
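A tiny, self-contained sketch of that cross-validation idea, keeping only claims confirmed by at least two distinct sources; the snippets and the aggregation rule below are invented for illustration and are not how Deep Search is actually implemented.

```python
# Illustrative-only sketch: keep a claim only if it is supported by at least
# two distinct "sources", the cross-validation behavior described above.

# stand-in for snippets an agent might have pulled from different websites
retrieved = [
    {"source": "site-a.example", "claim": "launch NET Feb 24"},
    {"source": "site-b.example", "claim": "launch NET Feb 24"},
    {"source": "site-c.example", "claim": "launch on Feb 20"},
]

def cross_validate(snippets, min_sources: int = 2):
    """Return claims supported by at least `min_sources` distinct sources."""
    support = {}
    for s in snippets:
        support.setdefault(s["claim"], set()).add(s["source"])
    return [claim for claim, srcs in support.items() if len(srcs) >= min_sources]

print(cross_validate(retrieved))   # -> ['launch NET Feb 24']
```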
And we can, at the same time, fire up a few more queries. How about... you're a gamer, right? Sure, yeah. So how about: what are some of the best and most popular builds in the Path of Exile hardcore league? I mean, you could technically just look at the hardcore ladder, that might be a faster way to figure it out. Yeah, we'll see what the model does. And then we can also do something more fun, for example, how about making a prediction about March Madness? Yeah, so this is kind of a fun one, where Warren Buffett has a billion-dollar bet: if you can exactly match the entire winning bracket of March Madness, you can win a billion dollars from Warren Buffett. So it would be pretty cool if AI could help you win a billion dollars from Buffett. That seems like a pretty good investment, let's go. All right, so now let's fire up the query and see what the model does. We can actually go back to our very first one. How about that... Buffett wasn't counting on this. It's already done, that's right. Okay, so we got the result of the first one; the model thought for around one minute. The key insight here: the next Starship launch is going to be on February 24th or later, so no earlier than February 24th. It might be sooner.
So I think we can scroll down and see what the model did. It does a little research on Flight 7, what happened, how it got grounded, and it actually looked into the FCC filings from its data collection and then drew a new conclusion. If we continue to scroll down, right, it makes a little table; I think inside xAI we often joke that time to first table is the only latency that matters. So that's how the model makes its inference and looks up all the sources. And then we can look into the gaming one. For this particular one, it looks at the builds, with the Infernalist, and if we go down, a surprising fact about all the other builds: it looks into the twelve classes, and we see that minion builds were pretty popular when the game first came out, and now the Invokers of the world have taken over. Invoker Monk, for sure. That's right, followed by the Stormweavers, which are really good at mapping. And then we can look at March Madness; how about that?
One interesting thing about Deep Search is that if you go into the panel that shows the subtasks, you can click at the bottom left and, in this case, actually scroll through and read through the mind of Grok: what information the model thinks is trustworthy, what is not, and how it cross-validates different information sources. That makes the entire search experience and information retrieval process a lot more transparent to our users, and it's much more powerful than any search engine out there. You can literally just tell it to only use sources from X, and it will try to respect that. So it's much more steerable, much more intelligent. I mean, it really should save you a lot of time: something that might take you half an hour or an hour of researching on the web or searching social media, you can just ask it to go do, and come back 10 minutes later and it's done an hour's worth of work for you. That's really what it comes down to, exactly, and maybe better than you could have done it yourself. Yeah, think of it as having an infinite number of interns working for you; you can just fire up all the tasks and come back a minute later. So this is going to be an interesting one: March Madness hasn't happened yet, so I guess we'll have to follow up on the next live stream. Yeah, it seems pretty good; like, $40 might get you a billion dollars. A $40 subscription, that's right, it might work. So, when are users going to get their hands on Grok 3?
Yeah, so the good news is we've been working tirelessly to actually release all of these features we've shown you: the Grok 3 base model with amazing chat capabilities, really useful, really interesting to talk to, the Deep Search, the advanced reasoning mode. We want to roll all of these things out to you today, starting with the Premium+ subscribers on X, so that's the first group that will initially get access. Make sure to update your X app if you want to see all of the advanced capabilities, because we just released the update as we're talking here. And if you're interested in getting early access to Grok, then sign up for Premium+. We're also announcing that we're starting a separate subscription for Grok, which we call SuperGrok, for those real Grok fans who want the most advanced capabilities and the earliest access to new features, so feel free to check that out as well. This is for the dedicated Grok app and for the website. Exactly, so our new website is called grok.com; you'd never guess. And you can also find our Grok app in the iOS App Store, which gives you an even more polished experience that's totally Grok-focused, if you want to have Grok easily available, one tap away. Yeah, the version on grok.com, in the web browser, is going to be the latest and most advanced version, because obviously it takes us a while to get something into an app and then get it approved by the app store, and if something's in a phone format there are limitations on what you can do. So the most powerful version of Grok, and the latest version, will be the web version at grok.com.
So watch out for the name Grok 3 in the app; that's the dead giveaway that you have it. If it still says Grok 2, then Grok 3 hasn't quite arrived for you yet, but we're working hard to roll this out today, and then to even more people over the coming days. Yeah, make sure you update your phone app too, where you're going to get all the tools we showcased today, with the thinking mode and the Deep Search. So really looking forward to all the feedback you have. And I think we should emphasize that this is kind of a beta, meaning you should expect some imperfections at first, but we will improve it rapidly, almost every day; in fact, every day I think it'll get better. So if you want a more polished version, maybe wait a week, but expect improvements literally every day. And then we're also going to be providing voice interaction, so you can have conversational interaction; in fact, I was trying it earlier today and it's working pretty well, but it needs a bit more polish. The sort of mode where we can just literally talk to it like you're talking to a person, that's awesome; it's actually, I think, one of the best experiences of Grok, but that's probably about a week away.
So, with that said, I think we might have some audience questions. Sure, yeah, let's take a look, questions from the audience on the X platform. Cool, so the first question here is: the Grok voice assistant, when is it coming out? As soon as possible; just like Elon said, it's just a little bit of polishing away from being available to everybody. Obviously it's going to be released in an early form and we're going to rapidly iterate on it. And the next question is: when will Grok 3 be in the API? So this is coming; the Grok 3 API, with both the reasoning models and Deep Search, is coming your way in the coming weeks. We're actually very excited about the enterprise use cases of all these additional tools that Grok now has access to, and how test-time compute and tool use can really accelerate all the business use cases. Another one: will voice mode be native or text-to-speech? I think what that means is: is it going to be one model that understands what you say and then talks back to you, or is it going to be some system that has text-to-speech inside of it? And the good news is it's going to be one model, a variant of Grok 3 that we're going to release, which basically understands what you're saying and then generates the audio directly from that. So very much like Grok 3 generates text, that model generates audio, and that has a bunch of advantages. I was talking to it earlier today and it said, "Hi, Igor," reading my name probably from some text that it had, and I said, no, no, my name is Igor, and it remembered that, so it could continue to say "Igor" just like a human would. You can't achieve that with text-to-speech. So, here's a question for you, pretty spicy: is Grok a boy or a girl, and how do they sing? Grok is whatever you want it to be. Yeah. Are you single?
Yes. All right, the shop is open. So, honestly, people are going to fall in love with Grok; it's like 1,000% probable. Yeah. The next question: will Grok be able to transcribe audio into text? Yes, we'll have this capability in both the app and the API. We think Grok should just be your personal assistant, looking over your shoulder, following you along the way, learning everything you have learned, and really helping you understand the world better and become smarter every day. Yeah, I mean, the voice mode isn't simply voice-to-text; it understands tone, inflection, pacing, everything. It's wild; it's like talking to a person. Okay, so: any plans for conversation memory? Yeah, absolutely, we're working on it right now. I already forgot. That's right. Let's see what the other ones are. What about the DM features? So if you have personalization, if Grok remembers your previous interactions, should it be one Grok or multiple different Groks? It's up to you: you can have one Grok or many Groks. I suspect people will probably have more than one. Yeah, I want to have a Dr. Grok, the Grok doctor, that's right. Cool. So in the past we've open-sourced Grok 1, and somebody's asking whether we're going to do it again with Grok 2. Yeah, our general approach is that we will open source the last version when the next version is fully out, so when Grok 3 is mature and stable, which is probably within a few months, then we'll open source Grok 2.
Okay, so we probably have time for one last question: what was the most difficult part about working on this project, I assume Grok 3, and what are we most excited about? I think, looking back, getting the whole model training on 100K H100s coherently is almost like battling against the final boss of the universe, which is entropy. At any given time you can have a cosmic ray beaming down that flips a bit in your transistor, and if it flips a mantissa bit, the entire gradient update is out of whack. And now you have 100,000 of those GPUs and you have to orchestrate them, and at any given time any of the GPUs can go down.
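To make the flipped-bit point concrete, here is a small illustration (my own example, not an xAI diagnostic): flipping a single bit of a float32 gradient value has very different effects depending on whether it lands in the low mantissa, the high mantissa, or the exponent.

```python
# Illustration of why a single cosmic-ray bit flip matters: flip one bit of a
# float32 gradient value and see how much it changes. (Example only.)
import struct

def flip_bit(x: float, bit: int) -> float:
    """Flip bit `bit` (0 = least significant) of the IEEE-754 float32 encoding of x."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", x))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

grad = 0.0123
print(flip_bit(grad, 3))    # low mantissa bit: tiny, harmless change
print(flip_bit(grad, 22))   # high mantissa bit: value changes noticeably
print(flip_bit(grad, 27))   # exponent bit: value changes by orders of magnitude
```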
Yeah, I mean, it's worth breaking down how we were able to get the world's most powerful training cluster operational within 122 days, because when we started off, we actually weren't intending to do a data center ourselves. We went to the data center providers and asked how long it would take to have 100,000 GPUs operating coherently in a single location, and we got time frames of 18 to 24 months. We were like, well, 18 to 24 months means losing is a certainty, so the only option was to do it ourselves. So then you break down the problem; I guess I'm doing reasoning here, one single chain of thought. Yeah, exactly. Well, we needed a building; we couldn't build a building in that time, so we had to use an existing building. So we looked for factories that had been abandoned but where the factory was in good shape, like a company had gone bankrupt or something, and we found an Electrolux factory in Memphis. That's why it's in Memphis, home of Elvis, and also, I think, the capital of ancient Egypt. It was actually a very nice factory that, for whatever reason, Electrolux had left, and that gave us shelter for the computers. Then we needed power.
We needed at least 120 megawatts at first, but the building only had 15 megawatts, and ultimately, for 200,000 GPUs, we needed a quarter of a gigawatt. So we initially leased a whole bunch of generators: we had generators on one side of the building, just trailer after trailer of generators, until we could get the utility power to come in. But then we also needed cooling, so on the other side of the building it was just trailer after trailer of cooling; we leased about a quarter of the mobile cooling capacity of the United States on that other side of the building. Then we needed to get the GPUs all installed, and they're all liquid cooled; in order to achieve the density necessary, this is a liquid-cooled system, so we had to get all the plumbing for liquid cooling. Nobody had ever done a liquid-cooled data center at this scale, so this was an incredibly dedicated effort by a very talented team to achieve that outcome. And you may think, now it's going to work; nope. The issue is that the power fluctuations for a GPU cluster are dramatic. It's like this giant symphony that is taking place; imagine having a symphony with 100,000 or 200,000 participants, and the whole orchestra goes quiet and loud within, you know, 100 milliseconds. This caused massive power fluctuations, which then caused the generators to lose their minds; they weren't expecting this. So to buffer the power we used Tesla Megapacks to smooth it out, but the Megapacks had to be reprogrammed, so xAI, working with Tesla, reprogrammed the Megapacks to be able to deal with these dramatic power fluctuations and smooth out the power so the computers could actually run properly. And that worked; it's quite tricky.
But even at that point, you still have to make the computers all communicate effectively, so all the networking had to be solved: debugging a bazillion network cables, debugging NCCL at 4 in the morning. We solved it at roughly 4:20 a.m., yes, that's when it was figured out. There were a whole bunch of issues; for one, there was a BIOS mismatch, the BIOS was not set up correctly, and we had to diff our lspci outputs between two different machines, one that was working and one that was not, and many, many other things. Yeah, exactly, this would go on for a long time if we actually listed all the things. But it's interesting: it's not like we just magically made it happen. You have to break down the problem, just like Grok does for reasoning, into its constituent elements, and then solve each of the constituent elements, in order to achieve a coherent training cluster in a period of time that is a small fraction of what anyone else could do it in. And then once the training cluster was up and running and we could use it, we had to make sure that it actually stays healthy throughout, which is its own giant challenge. And then we had to get every single detail of the training right in order to get a Grok-3-level model, which is actually really, really hard. We don't know if there are any other models out there that have Grok's capabilities, but whoever trains a model better than Grok has to be extremely good at the science of deep learning and at every aspect of the engineering, so it's not so easy to pull this off. And is this now going to be the last cluster we build and the last model we train?
Oh, we've already started work on the next cluster, which will be about five times the power, so instead of a quarter of a gigawatt, roughly 1.2 gigawatts. What's the power of the Back to the Future car? Anyway, it's roughly in that order, I think. And these will be the GB200/GB300 clusters; it will once again be the most powerful training cluster in the world. So we're not stopping here, and our reasoning model is going to continue to improve by accessing more tools every day, so we're very excited to share the upcoming results with you all. Yeah, the thing that keeps us going is basically being able to give Grok 3 to you and then seeing the usage go up, seeing everybody enjoy Grok; that's what really gets us up in the morning. So yeah, thanks for tuning in. Thanks, guys.

Hey Grok, what's up, can you hear me? I'm so excited to finally meet you. I can't wait to chat and learn more about each other. I'll talk to you soon.