Worth the Hype? OpenAI ‘Deep Research’ Tested

Transcript

Title: Worth the Hype? OpenAI ‘Deep Research’ Tested
Author: Dr. Know-it-all Knows it all

Transcript:
hey y'all it's Dr. Know-it-all today I
wanted to talk about OpenAI's new deep
research product this is something
that's quite different than their
operator which I found very much
underwhelming and of course I've
recently tested their o3-mini and
o3-mini-high their terribly named products
but today I wanted to talk about deep
research which I think will prove to be
quite the game changer although at this
point it is very difficult to use
because I think that OpenAI is being a
little bit overwhelmed so anyway
introducing deep research this was from
a couple of days ago and it's just taken
me it's taken a long time to go through
this normally you know that I do these
tests like when I test out the product I
do it kind of in front of you and I show
how it's thinking and stuff not the way
I'm going to be able to test this I had
to use this on and off in the background
for the better part of the day to try to
get it to actually run a few of these
deep research projects it has been
completely bogged down it's been very
frustrating to use so I have to admit
that to start with it's not the most
exciting thing in the world but I I can
start to see inklings and when they get
this kind of ironed out and they
actually have the bandwidth to be able
to do inference for the group of people
that are interested by the way this is
only available for the $200 a month
subscribers I have a feeling OpenAI may
open up a more expensive $500 or $1,000 or
$2,000 a month subscription tier because
at this point it looks like their
services are being overwhelmed and $200
a month is probably not enough to reduce
demand so at this point I'm going to
cross cross my fingers I'm you know I'm
happy that I'm able to use it right now
for a mere $200 a month so it's not
cheap but it's very cool when I'm
actually able to use it with several
caveats so like I said the number one
caveat is just the fact that it's it's
basically unavailable I've been trying
to use it all day I tried to use it last
night it just it doesn't work it just
gets stuck in fact let me go take a look
right here yep waiting so I've been
trying to get it to do uh something for
me this project for me for about
actuators and stuff I I thought for
Scott Walter I would do one on actuators
and I was very curious about what the
latest is you know from all these
different robotics companies and stuff
but I have tried many many times to get
this thing to operate I've got a whole
bunch of different conversations where
I've sort of started it over so it's
basically not working now we're going to
sort of push that aside and assume that
that stuff is going to get worked out
and talk about this you know the reason
why it's important and the reason why it
is the future of AI and the reason why
people say that this is the beginning of
the significant hockey stick moment the
number one reason is they built this on
their o3 model so thus far we have seen
their o1 model which was the first
reasoning model which would actually
think about things of course you know R1
came out the DeepSeek R1 and has
significantly challenged OpenAI and so
OpenAI of course had to recast things
and revamp things and so of course in
response to that OpenAI released their
o3-mini and o3-mini-high which
apparently give you full access to the
actual rationale that's going on rather
than what o1 did which was just sort
of a summary which was less than stellar
so anyway we did get that but that did
not give us access to their o3 model the
really big model the you know the
big boy model as opposed to the mini
models but it looks like OpenAI's deep
research actually does give us access to
their o3 model the full model so this is
the first you know kind of insight that
we have into how their o3 model will
work and from the couple of examples
that I've been able to eke out today it
looks like it's actually really very
cool but it doesn't present things quite
as well as I wish it would so we'll take
a look at all of that so we'll start
with what it is and this is from
OpenAI's press release and of course I'll
leave a link to this down in the
description deep research is OpenAI's
next agent that can do work for you
independently you give it a prompt and
ChatGPT will find analyze and
synthesize hundreds of online sources to
create a comprehensive report at the
level of a research analyst and yeah
it can actually do papers that are
fairly sophisticated here powered by a
version of the upcoming OpenAI o3 model so
this is what I was talking about that's
optimized for web browsing and data
analysis it leverages reasoning to
search interpret and analyze massive
amounts of text images and PDFs on the
internet pivoting as needed in reaction
to information it encounters so
basically it is a full-fledged agent you
uh say what you want and then it asks
you a whole bunch of questions it's it's
rather annoying actually it asks you so
many questions that it actually kind of
takes you out of the flow because you
have to answer the questions but
they're good questions so I
don't deny the fact that OpenAI needs
to do this so that this agent that might
go spend 10 or 20 or 30 minutes
researching something for you is not
wasting its time so it makes sense why
they do this but it is rather annoying
the number of questions it asks before
it starts but anyway you know again I'll
give them that because they don't want
to be wasting their time or compute
Cycles or whatever but once it gets
going and assuming it actually starts to
work and doesn't break you get some
really interesting insights it does a
lot of research it'll search dozens of
sources and find the best sources it
will do everything from I asked it to
scour the internet and find me the best
replacement for my Sony a7 III that I'm
using currently I don't know that I'll
use it but I was like that sounds like a
good thing right what would I what would
I want to spend my $3,000 to $5,000 on if I
wanted to upgrade my camera setup for my
studio and of course then I also asked
it about robot actuators I I did that
for Scott like I said and for my own
interest I haven't gotten an answer from
that yet I did a couple of things that I
can't talk about because they're more
private business related stuff and I was
able to eke that stuff out but then to
give it a real challenge a real
scientific challenge I asked it about
the latest theories about time as an
emergent property of the universe rather
than an intrinsic Einsteinian sort of
property of the universe in other words
spacetime versus time as an emergent
phenomenon I wanted to see how it would
do with something that you know I would
consider to be a rather sophisticated
physics based paper that should take
account of things that have been written
in the last 20 or 25 years or so so even
though it was difficult for me to
do this because it was it was broken and
it wasn't working a lot I was able to
get a range of different reports and so
I think I've got you know at least some
insight on this so again before we look
at the results we'll go a little bit
deeper into their press release which of
course is very light on details anyway
deep research is built for people who do
intensive knowledge work in areas like
Finance science policy and engineering
and need thorough precise and reliable
research it's also good for Shoppers Etc
so you know basically if you need it to
do deep research if it's something that
might take you hours or days to do
yourself this is a great way to do as
Dave Shapiro calls it cognitive
offloading to get rid of some of the
work that you would have to do some of
the brain cycles that you would have to
expend to do anything from shop for a
new camera to learning about Stephen
Wolfram's hypergraph update version of
emergent time in physics so you know
anything in between those things biology
medicine you name it you could get the
kind of information you want and
anything that's available on the
internet that you can scour the internet
for it can find for you so it's really
really cool so how does it work in more
detail deep research independently
discovers reasons about and consolidates
insights from across the web to
accomplish this it was trained on real
world tasks requiring browser and python
tool use so in other words it's able to
browse the web it can also write python
code and I guess execute it as well
using the same reinforcement learning
methods behind OpenAI o1 our first
reasoning model so reinforcement
learning if you happen to have watched
my videos on R1 and R1-Zero definitely
check those out up here I'll leave them
at the end of the video as well but
anyway same basic idea reinforcement
learning to allow their o1 and now their
o3 model to reason about the world to
think about the world in a reasoning way
and it can use tools like browsing the
web and python as they say here while o1
demonstrates impressive capabilities in
coding math and other technical domains
many real world challenges demand
extensive context and information
gathering from diverse online sources
deep research builds on these reasoning
capabilities to bridge that gap allowing
it to take on the types of problems
people face in work and everyday life so
in other words it's an actual agent it's
actually something that you can say go
do this task for me and it can be a
multi-step task and you want a report at
the end you don't care particularly
about the details you just want it to go
out and do this stuff that is agentic
behavior you're not having to give it
instructions every step of the way and
hold its hand you're like go do this for
me and it goes and does it for you as
for how it was trained deep research was
trained using end-to-end reinforcement
learning on hard browsing and reasoning
tasks across a range of domains so how
they got that information how they got
the requests and the answers and stuff
good question that's something they're
not telling us about but anyway
obviously it wasn't easy to collect this
data to train it on but they've done that
through that training it learned to plan
and execute a multi-step trajectory in
other words it's agentic it's behaving
on its own to find the data it needs
backtracking and reacting to real-time
information where necessary the
model is also able to browse over user
uploaded files plot and iterate on
graphs using the python tool embed
both generated graphs and images from
websites in its responses and cite
specific sentences or passages from its
sources so it's able to act as a
research assistant I mean that's
basically that's basically what a
research assistant would do these days
they would go online they would find the
best sources they would read them they
would extract the good passages they
would put together a report for you in
the fashion that you wanted do you want
it academic do you want kind of a
high-level executive summary whatever
you're interested in it will give you
that formatted output with tables with
graphs Etc and then for fun let's take a
look at Humanity's last exam which used
to be called Humanity's Last Stand in
other words it's supposed to be like the
hardest test that humans have ever come
up with and potentially it's the hardest
thing we will ever come up with and the
next version of this will actually be an
AI generated version of this exam anyway
I haven't really looked at the questions
but I know they're very very difficult
so uh you can see the accuracy
percentage GPT-4o which was
state-of-the-art not that long ago got
3.3% accuracy on this exam so just kind
of pathetic right you know
it's like almost chance at that point if
it's 3% you can then see here that
OpenAI o1 which is relatively recent just a
few months old got 9.1% so that's a 3X
improvement that's very very nice but
still single digits you can also see the
DeepSeek R1 only got a 9.4 and that is
not multimodal it was on text only
subset but anyway it's on the order
of the same performance as OpenAI's o1
and then of course you can see that
OpenAI's recent releases o3-mini and o3-mini-high
got 10.5% and 13% respectively but
then OpenAI's deep research gets 26.6%
in other words it doubles the best score
that was there a couple of days ago now
does it have access to Google and to
tools and things like that yes this is
something that these other models don't
necessarily have access to so it does
give it a leg up but we're talking about
going from 3% what six months ago or
something actually even less than that I
mean GPT-4o was state-of-the-art in like
uh September October right I
think that o1 came out in like November
if I'm remembering correctly so not that
long ago state-of-the-art was 3% we're
now at a quarter we're over a quarter of
the way to acing this exam what is
another six months or a year going to
give us I I don't know but you know once
this thing accomplishes on the order of
80 or 90% on Humanity's last exam people
are actually kind of running out of
ideas of how to test these things
anymore so that's the kind of like just
wall we're hitting at this point it's
just a hockey stick that it's just going
up like crazy and these models because
they're reinforcement learning based are
able to continue to self-improve because
you can actually ask these models to
write up documentation on how to improve
themselves they can then take that
documentation and figure out how to make
themselves better and once AI can make
itself better then it can make
everything else better because of course
the smarter the AI is the more it's able
to do deep research in other areas as
well so we're really we're looking at a
crazy moment in history right now so
with that being said let's take a look
at some results here and like I said
it's been challenging to get these
results the number one thing you need to
do is press this deep research button
down here and turn it a blue color I
told it a bunch of details and you can
pause the video I'm not going to read
all of this because it is a lot but
anyway I give it a lot of details about
a camera what I'm using it for what I
want out of it Etc and I said what I'm
using right now which is an a7 III so this
isn't the most deep thing in the world
but it is something I could spend days
going down a rabbit hole trying to
figure out what the best camera was
second guessing myself going like oh
what about this what about this right so
this is something if it can do it in a
half an hour and it can save me that
cognitive load and it can give me a good
recommendation this is fantastic even
though it's not like Earth shattering it
could make a difference to me if I'm
going to spend three or $5,000 I want to
spend my money wisely right so anyway
then what it did was like I said the
first thing it does is asks a whole
bunch of questions so I did ask it about
a teleprompter as well by the way if
anybody knows I would love a
teleprompter for my iPhone for like this
where I could actually put people's
video on here like basically mirror the
iPhone so I can see it and look at them
when I'm talking otherwise I'm usually
looking off to the side which seems to
put people off but anyway aside from
that we're looking you know it asks
questions about budget range shooting
resolution frame rate current lighting
setup all that kind of stuff I give it
the answer then more questions autofocus
priority preferred lens type studio
lighting setup etc and then I answered
those questions and then it had more
questions for me so like I said it is very
very pedantic about asking for a lot of
questions you know again I can
understand why they're about to expend a
huge number of tokens and compute Cycles
so they want to make sure it's right but
it is a little bit annoying before you
get there anyway so I gave that more
answers and so now it said I'm going to
research top five cameras best lens
choices comparison table Etc I'll update
you when I have the details so here's
where you know it kind of like just sat
there and I would wait for like 10
minutes and then it's like okay well you
know whatever and So eventually I just
said hello or are you working on it or
something and it finally got going and
then it it it finally did this now this
was like I think the third web browser I
opened with the same question kind of
was copying pasting by the end so the
answers might not perfectly match up
with the questions anymore but by the
time I finally got it working it spent
17 minutes you know looking through all
of this stuff and unfortunately when it
comes up with the answer you can't see
the thinking over here so it's kind of
gone which is which is unfortunate but I
did get some screen grab to show you for
another one but anyway when it gets done
you can see it produces a pretty nice
result here and it says the a7 the Alpha
a7 IV and we've got the Canon EOS R6 Mark
II we've got the Sony a7S III good good
recommendation and then the classic
Panasonic Lumix S5 II so yeah so those are
the the choices it gives a a chart about
all of this with approximate price lens
details all of this stuff with sources
that you can go to so you can actually
look at the sources if you're interested
and then some teleprompter Solutions as
well that I didn't find as useful but
still relatively nice to see this then
of course we get a kind of generic
response and I said hey you know give me
a recommendation I want the best camera
lens in your opinion plus the best
teleprompter solution so it didn't
demur it was like okay here's
my recommendation which is likely what I
would pick because I've got the Sony a7 III
so it's like well you know getting the
a7 IV makes perfectly good sense plus I
can use the lens that I currently
have on that so that would make really
good sense to do that or I could upgrade
the lens to the 24-70 that it recommends
here as well so it gave me a rationale
for that and the best teleprompter
solution which again I didn't find as
good because I really want a software
solution more than a hardware solution
and it was trying to find a hardware one
I've already got a teleprompter I've got
the mirror and all that stuff I just
want something where I can run video on
my phone but regardless this actually
produced something that is very useful
to me and you know saved a lot of time
and I think it came up with the right
answer I have a general sort of sense
that the Lumix or the Sony would be the
two best solutions for really really
active autofocus in a studio lighting
situation it's those those seem to be
the the correct choices and so I was
very pleased to see that this reinforced
my opinion about that now the big caveat
that I was talking about and is even
more egregious with the scientific paper
that I had it work on is there's no real
way to get this out of ChatGPT and into
a document there's no way to print a PDF
that looks attractive and if you copy it
it sort of doesn't copy the format in
properly the table doesn't look good and
so it's it's really really substandard
so this is this is the big caveat I
wanted to pass on to uh OpenAI it's
like you guys need you know an export
feature you need something and I know
there's this share up here but that's
like share a link that doesn't actually
like print out or copy I want something
where I can actually print it print it
to PDF save it to a Word document print
it as a LaTeX document whatever that's
pretty necessary if you're going to have
a product like this it needs to be able
to produce professional output and I know
there is this thing called canvas but
when I try to edit in canvas it won't
let me do that because it says that that
the document is too long for canvas so
if you're going to use canvas as your
primary interface for creating more
pretty attractive documents and
exporting them then you better darn well
make the canvas bigger so you can
actually edit these documents there but
I would say at least a basic PDF export
would be really really important you
need to be able to have this formatted
properly you can't just have garbage
text all over the place if you try to
export it so that's super annoying
because you've got this beautiful thing
but you can't export it and you can't
share it with other people so speaking
of broken you can see here that I wanted
you know to ask it about actuators and
remember at the beginning of this video
which was now about 26 minutes ago 27
minutes ago it's still waiting it just
sits there and either eventually it will
break and just stop working entirely and
I'll have to ask something like you know
I'll be like are you working
and see like yeah you know just not
really getting much of a response it's
it's definitely been broken and I think
again it's because it's overwhelmed but
it is a frustrating experience that
you're not able to actually get a
response rapidly enough but let's take a
look at one of these that actually
worked well which was I want the latest
research into the question of whether
time is an emergent phenomenon or
intrinsic sources should be of the
highest quality the question of whether
Einstein's theory that space and time
are sort of intrinsic fabric is correct
or whether some other means of creating
the macroscopic reality we experience is
closer to the truth of the universe is
what I'm after here be sure your summary
is formatted like a high-quality journal
article with graphs and tables as needed
things so anyway I'm I just I'm super
fascinated by this topic and I was like
well let's go ahead and let the
researcher figure this out let deep
research do this again I you know I
don't have this clicked right now but I
did of course before I you know
submitted the whole thing and of course
I have a lot of familiarity with this
topic because it really fascinates me so
I've spent a lot of time researching it
I wanted to see what this would come up
with it actually came up with one theory
that I had not heard of before so that
was actually really cool I was able to
learn some stuff but anyway you can see
the first thing it did was you know
asked a bunch of questions I answered
the questions asked more questions and
then it asked me how familiar I was I'm
like you know I'm good you can you can
go in as much depth as you want so it
was able to you know slowly get through
all the questions that it had more
questions more questions so and then it
said it's going to print all this stuff
out and then it stalled it didn't do
anything for for the longest time so you
know I would kind of prompt it I'd be
like thanks are you working let me know
when you start the research Etc
eventually it it kicked over this was
again I think I had to open this up
three times in three different web
browsers and start over again and then
finally it was able to do that it was
able to do this in 8 minutes which I
find pretty remarkable considering that
it's a fairly complex topic but you know
it went off and spent time that would have
taken me a full day of research that I
that I guess you know 8 to 10 hours it
would have taken me to gather all these
sources and put this together and that's
not even counting writing it up in some
sort of reasonable format so you could
maybe even consider it 2 days if I
wanted to write it up decently well so
as it works and unfortunately like I
said once it gets done it sort of buries
this and you can't see it anymore so I
just did some screen grabs you can see
here you know let me let me know when
you start doing the research and it
starts doing it and there's like a
progress bar that goes across and over
on the right hand side there's sources
and activity and you can see you know
exploring recent contributions in string
theory identifying relevant sources
searching for all of this stuff so it
goes through this you can see there's 18
sources at this point up to 28 sources
at this point um you know it's it's
looks at like citing particular lines
out of individual papers so it's really
cool you know you can actually see the
process of it going through and
analyzing this stuff at this point it's
looking at Wolfram Alpha at Wolfram
Physics to look into the hypergraph
version of the universe that Wolfram has
been promoting for the last decade or so
at this point and then it continues on
building the writing plan so it moves
from research and it had 28 sources here
to figuring out how to write up the
final documentation and so when it
finally gets there it produces this
paper where you can see there's an
abstract it goes through the different
possibilities there's an introduction
general relativity you know second law
of Thermodynamics emergence all that
kind of stuff so again if you want to
actually if you want me to I can try to
put this in the description but the
problem is like I said you can't really
print this if you try to print it out it
ends up being a garbled mess so it's
great inside of this but you can't
really export it which is very very
frustrating so that's a big caveat to
deep research it's not that useful at
this point because you can't share it
easily you'd have to sort of copy and
paste it by pieces and reformat the
whole thing and that would be a pain in
the butt especially when you get to
things like this where you have uh
equations and stuff and actually there's
some things like this like mu and nu
slash it doesn't even format those
properly anyway but this equation is a
complete disaster if you try to like
export it and stuff so anyway it should
at the very least print to PDF and
create LaTeX documents that are markup
documents with the LaTeX formatting I
mean that seems like the basics of being
able to do export but anyway so you can
see that it does it it's you know you've
got thermodynamic Arrow of time emergent
Direction quantum gravity uh let's see
what else do we have string theory and
holography uh computation and novel
approaches so this would be Wolfram stuff
and then you eventually get to uh and I
did want to have experimental
constraints and falsifiability I want
theories that at least have some degree
of falsifiability or else it's just sort
of magic right it's otherwise it's
Alchemy or something but anyway so it
goes through all of this stuff and then
eventually creates this table where you
can see there's gr general relativity
there's thermodynamics goes through
fundamental key features key predictions
falsifiability Etc and you can see that
most of them these days actually
consider that time is more emergent than
it is intrinsic whereas certainly gr
general relativity assumes that it's
intrinsic the rest of them most of the
rest of them consider it to be emergent
or mixed with string theory and
holography so anyway it goes through
that and then we get to eventually a
conclusion here and then of course we
get to the ubiquitous sources with links
so you can actually go and check the
sources out so I would consider this to
be a a really good you know kind of
graduate level summary paper now it's
not discovering any new science it's not
making any predictions about science
it's just kind of going through and
summarizing what other people have done
but if you were going to be researching
in this area and you wanted to catch up
quickly and go like okay I need the
latest about this so I can see what my
theory is and how it stands versus the
rest of these this is a a huge timesaver
this is something either you're going to
have to hope somebody else already did
this for you or you're going to have to
spend days working on this to put this
all together for yourself and this did
this in 8 minutes that is crazy it did
the research the formatting and the
writing in 8 minutes that is nuts so
with the caveat that this is completely
overwhelmed and sorry Scott it's still
waiting if this eventually does produce
something about these actuators I will
actually pass it on to you because I'm
interested myself but anyway with the
caveat that it's it's completely
overwhelmed and not working very well
right now and a really big caveat that
you can't as far as I can tell export
this in any kind of reasonably formatted
document at this point the Deep research
project itself is really pretty
remarkable is this the game changer
people are talking about on the internet
I don't think it's quite as big as some
of the hype that people are talking
about but this is a true Game Changer in
terms of productivity elevation this is
the same kind of thing where talking to
a chatbot really can help you to
discover something really rapidly in the
creative space and can help you with
coding and stuff with this you can start
to take on larger tasks and you can
actually give up more of your uh
autonomy you can delegate more to an
entity like this and it can go out and
do significant tasks rather than writing
you a piece of code that can operate it
looks like something like this could
potentially write you an entire software
suite right it can write you something
with multiple different files with a
main function that then accesses a bunch
of sub files with really thinking about
how to do a more sophisticated
engineering project on its own or it can
do scientific research like this or it
can go shopping for you so whatever
you're interested in if it involves
python or scraping the web or doing
research on the web I think that deep
research is going to be a true Game
Changer is it enough to change the world
just yet I don't quite think so I don't
think it's good enough to discover new
science and new AI research on its own
yet but what is o4 going to look like I
mean we're talking about o3 and o3 was
only 3 to four months after o1 if o4
comes out by summer like May or June or
something and it's another leap that's
this much above o3 then the sky is the
limit at that point eventually this AI
stuff is going to start to research
itself and it's going to figure out how
to make itself better not just in a sort
of training sort of manner like it does
right now a reinforcement learning
manner but in a more explicit directed
manner where it's actually doing its own
research and figuring it out on its own
and at that point things get really
crazy all right everybody what do you
think about this have you used deep
research yourself have you been able
to make it function definitely let me
know in the comments while you're down
there if you don't mind liking and
subscribing both of those really help
out the channel and I greatly appreciate
it and I will see you in the next video
bye-bye
[Music]
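
The Humanity's Last Exam figures quoted in the video can be checked with a few lines of arithmetic. This is only a minimal sketch in Python, not anything from OpenAI. The scores are the percentages read out in the transcript, with the garbled caption values "99.1%" and "133%" taken to mean 9.1% and 13.0% (an assumption based on the "single digits" and "doubles the best score" remarks). The names `scores`, `ratio`, and `best_before` are made up for illustration.

```python
# Humanity's Last Exam accuracy in percent, as read out in the video.
# Assumption: the garbled caption values "99.1%" and "133%" mean 9.1 and 13.0.
scores = {
    "GPT-4o": 3.3,
    "OpenAI o1": 9.1,
    "DeepSeek R1 (text-only subset)": 9.4,
    "o3-mini": 10.5,
    "o3-mini-high": 13.0,
    "deep research": 26.6,
}


def ratio(newer: str, older: str) -> float:
    """How many times higher the newer score is than the older one."""
    return scores[newer] / scores[older]


def best_before(name: str) -> float:
    """Highest score among all entries other than `name`."""
    return max(value for key, value in scores.items() if key != name)


if __name__ == "__main__":
    # "that's a 3X improvement": o1 versus GPT-4o
    print(f"o1 vs GPT-4o: {ratio('OpenAI o1', 'GPT-4o'):.2f}x")
    # "it doubles the best score": deep research versus the best other entry
    doubling = scores["deep research"] / best_before("deep research")
    print(f"deep research vs previous best: {doubling:.2f}x")
    # "over a quarter of the way to acing this exam"
    print(f"over a quarter: {scores['deep research'] > 25.0}")
```

On these numbers the "3X" is closer to 2.8x, while "doubles" and "over a quarter" both hold.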
