You've heard of large language models like ChatGPT. They can answer questions, write stories, and even engage in conversation. But if you want to build a business that uses this technology, you'll need to ask yourself an important question: how do I take this raw model, this raw intelligence, and actually customize it to my use case? How do I make it really good for my users, so that it's differentiated and better than what's out there?

This is Raza Habib. His company, Humanloop, enables large language models to have even greater superpowers. "We can help you build differentiated applications and products on top of these models. The range of use cases now feels more limited by imagination than by technology." You can replicate your exact writing style, customize tone, fact-check answers, and train the model on your company's unique data. "We really hope that this is a platform on top of which the next million developers can build LLM applications."

In our conversation, we explore the secrets to building an app that stands out: "What made it so good that a million users signed up in five days was a fine-tuning exercise." The impact of generative AI on developers: "They're finding a significant fraction of their code is being written by a large language model." And what the future of large language models might bring to society as a whole: "It's an ethical minefield. There are going to be societal consequences on the path to AGI. The potential benefits are huge as well, but we do need to tread very carefully."
Let's start with the basics, at a high level: what is a large language model, and why is it that they've suddenly made such a splash? I assume they've been around a lot longer than the past year or two.

Yeah, language models themselves are really old concepts and old technology. All a language model really is, is a statistical model of words in a language: you take a big bunch of text and you try to predict the word that will come next, given a few previous words. In "the cat sat on the mat", "mat" is the most likely next word, and then you have a distribution over all the other words in your vocabulary.
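The next-word prediction setup just described can be sketched with a toy bigram model: count which word follows which in a small corpus, then turn the counts into a probability distribution. This is purely illustrative and nothing like the neural models under discussion, but the prediction task is the same.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """For each word, count how often each next word follows it."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_distribution(model, word):
    """Normalize the counts into a probability distribution over next words."""
    total = sum(model[word].values())
    return {w: c / total for w, c in model[word].items()}

model = train_bigram_model("the cat sat on the mat the cat sat on the rug")
print(next_word_distribution(model, "the"))  # {'cat': 0.5, 'mat': 0.25, 'rug': 0.25}
```

Scaling up means replacing the count table with a neural network and conditioning on far more than one previous word, but the training objective is still "predict the next word."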
As you scale these language models up, both in the number of parameters they have and in the size of the dataset they're trained on, it turns out they keep getting better and better at this prediction task. Eventually, to improve, they have to start doing things like acquiring world knowledge. Early on, a language model is learning letter frequencies and word frequencies; that's fairly straightforward, and it's what we're used to from predictive text on our phones. But if the language model is going to finish the sentence "Today the president of the United States said...", it has to have learned who the president of the United States is. If it's going to finish a sentence that's a maths problem, it has to be able to solve the maths problem. So where we are today, starting from GPT-1 and GPT-2, but GPT-3 was really the one where everyone said, okay, something is very, very different here, is that we have these models of language that are just models of the words. They don't know anything about the outside world, and there are loads of debates about whether they actually understand language, but they are able to do this task extremely well, and the only way to do that is to have gotten better at some form of reasoning and some form of knowledge.

What are some
of the challenges of using a pre-trained model like ChatGPT?

One of the big ones is that they have a tendency to confidently make things up, to hallucinate. I think Nat Friedman describes it as alternating between spooky and kooky: sometimes it's so good you cannot believe the large language model was able to do that, and then just occasionally it's horrendously wrong. That comes down to how the models are originally trained: they're trained to do next-word prediction, so they don't necessarily know that they shouldn't be dishonest. Sometimes they get it wrong, and the danger is that they get it wrong confidently, very persuasively, very authoritatively, so people might mistakenly trust these models. There are a couple of ways you can hopefully fix that, and it's an open research question, but the way we can help with Humanloop today is that we make it very easy to pull factual context into the prompt that you give to the model, so the model is much more likely to use that context rather than make something up. We've seen that be a very successful technique for reducing hallucinations.
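A minimal sketch of that context-injection idea: retrieve relevant facts and prepend them to the prompt, so the model grounds its answer in them rather than inventing one. The retrieval here is naive keyword overlap, and the function names are invented for illustration; this is not Humanloop's API.

```python
import re

def words(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))

def retrieve_context(question, documents, k=2):
    """Naive retrieval: rank documents by word overlap with the question."""
    ranked = sorted(documents, key=lambda d: len(words(question) & words(d)),
                    reverse=True)
    return ranked[:k]

def build_grounded_prompt(question, documents):
    """Prepend retrieved facts so the model uses them instead of making things up."""
    context = "\n".join(retrieve_context(question, documents))
    return ("Answer using ONLY the context below. "
            "If the answer is not there, say so.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

docs = ["Humanloop was founded by Raza Habib.",
        "The cat sat on the mat."]
print(build_grounded_prompt("Who founded Humanloop?", docs))
```

Production systems typically swap the keyword overlap for embedding-based search, but the prompt structure, facts first, question after, is the core of the technique.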
Terrific. And this is an element of building a differentiated model for your use case?

Absolutely, and an element of making it safe and reliable. When ChatGPT came out there was a lot of frustration from people who didn't like its personality: the tone was a bit obsequious, it defers, it doesn't want to give strong opinions on things. To me that demonstrates the need for many different types of models, tones, and customizations, depending on the use case and on the audience, and we can help you do that.

Can you talk a little bit about what it means to fine-tune a model, and why that's important? If you look at
what the difference is between ChatGPT, or OpenAI's most recent text-davinci-003 model, and what's been on the platform for two years without getting as much attention, the difference is fine-tuning. It's more or less the same base model; you can see on the OpenAI website that it's one of their code-pretrained models. What made it so good that a million users signed up in five days was a fine-tuning exercise. Fine-tuning means gathering examples of the outputs you want for the task you're trying to do, and then doing a little bit of extra training on top of the base model to specialize it to that task. What OpenAI did first, and others have followed, is an initial fine-tuning round of these models on input and output pairs that are actually instructions and the results you would like from those instructions, so these are human-generated pairs of data.
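Instruction-tuning data of the kind just described, human-written prompt and response pairs, is typically stored as JSONL, one example per line. The field names and contents below are invented for illustration; the exact format varies by provider.

```python
import json

# Hypothetical human-written instruction/response pairs.
examples = [
    {"prompt": "Summarize: The meeting moved from Monday to Tuesday.",
     "completion": "The meeting is now on Tuesday."},
    {"prompt": "Write a polite one-line decline to a sales email.",
     "completion": "Thanks for reaching out, but we're not interested right now."},
]

# JSONL: one training example per line, ready for a fine-tuning job.
with open("instructions.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

print(sum(1 for _ in open("instructions.jsonl")))  # 2 training examples
```

The base model is then trained for a few extra steps on files like this, which is what turns a raw next-word predictor into something that follows instructions.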
The model is then further fine-tuned using something called reinforcement learning from human feedback (RLHF), where you gather human preference data: you show people a few different generations from the model, ask them to rank them or choose which of two they prefer, and then use that as a training signal that can ultimately fine-tune the model. It turns out that RLHF makes a huge difference to performance; it's really hard to overstate. In the InstructGPT paper that OpenAI released, they compared a one-to-two-billion-parameter model with instruction tuning and RLHF to the full GPT-3 model, and people preferred it despite the fact that it was a hundred times smaller. Anthropic had a very exciting paper just a couple of weeks ago where they were actually able to get similar results to RLHF without the H, by having a second model provide the evaluation feedback instead, and that's obviously a lot more scalable.
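The preference data described above is typically used to train a reward model with a pairwise, Bradley-Terry style loss: the reward assigned to the human-preferred response should exceed the reward of the rejected one. A sketch with plain scalars standing in for a real reward model's outputs:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss: low when the chosen response outscores the rejected one."""
    return -math.log(sigmoid(reward_chosen - reward_rejected))

# Reward model agrees with the human ranking: small loss.
print(round(preference_loss(2.0, -1.0), 3))  # 0.049
# Reward model disagrees: large loss, pushing the rewards the right way.
print(round(preference_loss(-1.0, 2.0), 3))  # 3.049
```

Once trained on many such comparisons, the reward model scores new generations, and reinforcement learning then fine-tunes the language model to produce outputs that score highly.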
What data do developers need to bring in order to fine-tune a model?

There are two types of fine-tuning you might do. Developers might just show up with a corpus of books or some background text because they want to fine-tune for tone: their company's chat logs, the tone of voice from their marketing communications, or all the emails they've sent, for example. I'd think about that as almost extra pre-training, but it's fine-tuning as well. The other kind of fine-tuning data comes from in-production usage. Once their app is being used, they're capturing the data their customers provide, and they're capturing feedback data from that. In some sense it's being automated at this point: Humanloop takes care of that data capture for you and makes the fine-tuning easy.

So the LLM produces an interaction with a customer, and the customer gives a thumbs up or thumbs down as to whether it was helpful?

To give a concrete example, take the email case: imagine you're helping someone draft a sales email. You generate a first draft for them, and then they either send it or they don't; that's a very interesting piece of feedback you can capture. They probably edit it, so you can capture the edited text. And they either get a response or they don't. All of those bits of feedback are things we would capture and then use to drive improvements of the underlying model.
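The feedback signals just described, explicit votes, edits, and downstream outcomes like whether the email was sent, can be captured as structured records tied to each generation. This is a hand-rolled sketch with an invented schema; Humanloop automates this kind of capture, and its actual API will differ.

```python
import time

feedback_log = []

def log_generation(generation_id, prompt, output):
    """Record one model generation so later feedback can be attached to it."""
    feedback_log.append({"id": generation_id, "prompt": prompt,
                         "output": output, "feedback": [], "ts": time.time()})

def log_feedback(generation_id, kind, value):
    """kind: 'vote' (up/down), 'edit' (corrected text), 'outcome' (e.g. 'sent')."""
    for rec in feedback_log:
        if rec["id"] == generation_id:
            rec["feedback"].append({"kind": kind, "value": value})

log_generation("g1", "Draft a sales email to Acme", "Hi Acme team, ...")
log_feedback("g1", "edit", "Hello Acme folks, ...")  # the user's edited text
log_feedback("g1", "outcome", "sent")                # they actually sent it

# Edited outputs become fine-tuning targets; votes and outcomes become reward signals.
positive = [r for r in feedback_log
            if any(f["value"] == "sent" for f in r["feedback"])]
print(len(positive))  # 1
```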
Got it. If a developer is trying to build an app using a large language model for the first time, what problems are they likely to encounter, and how do you help them address those problems?
We typically help developers with three key problems: prototyping, evaluation, and customization. Maybe I can talk about each of those. At the early stages of developing a new large language model product, you have to try to get a good prompt that works well for your use case. That tends to be highly iterative; you end up with hundreds of different versions of these things lying around, and managing the complexity of that versioning and experimentation is something we help with. Then, the use cases people are building now tend to be a lot more subjective than what you might have done with machine learning before, so evaluation is a lot harder: you can't just calculate accuracy on a test set. Helping developers understand how well their app is working with their end customers is the next thing we really make easy. And finally, customization: everyone has access to the same base models, everyone can use GPT-3, but if you want to build something differentiated you need a way to customize the model to your use case, your end users, and your context. We make that much easier, both through fine-tuning and through a framework for running experiments. We can help you get a product to market faster, but most importantly, once you're there, we can help you make something your users prefer over the base models.

That seems
pretty fundamental. I mean, it's prototyping and getting the first versions out, testing and evaluation, and then differentiation; that seems pretty fundamental to building something great.

I think so. We really hope this is a platform on top of which the next million developers can build LLM applications. We've worked really closely with some of the first companies to realize the importance of this, understood the pain points they had, and, in proper YC fashion, tried to build something those people really wanted. I think we've got to a point, and we're now seeing this from others, where it really does solve acute pain points for them. It doesn't really matter to us what base language model you're using; we can help you with the data and feedback collection, with fine-tuning, and with prototyping. Those problems are going to be very similar across different models, and really we just want to help you get to the best result for your use case. Sometimes that will mean choosing a different model.
I wanted to ask: how is the job or role of a developer likely to change in the future because of this technology?

It's interesting; I've thought about this a lot. In the short term it augments developers: you can do the same things you could do before, faster. To me the most impressive application of large language models so far is GitHub Copilot. I think they cracked a really novel UX and figured out how to apply a large language model in a way that's now used by a huge number of developers. Many people I speak to say they're finding a significant fraction of their code is being written by a large language model, and if you'd asked people two years ago whether that would happen, almost no one would have said yes. One thing that surprises me is that the people who tell me they use it the most are some of the people I consider the better or more senior developers. You might have thought this tool would help juniors more, but I think people who are more accustomed to editing and reading code actually benefit more from the completions. So in the short term it just accelerates us and allows us to do more. On a longer time horizon, you could imagine developers becoming more like product managers, in that they're writing the spec and the documentation, but more of the grunt work and more of the boilerplate is taken care of by models.
On a long enough time horizon, I don't know. There are very few jobs that can be done so much through just text; we've really pushed it to the extreme. We've got GitHub, we have remote work, and engineers can do a lot of their jobs entirely sitting at a computer screen. So when we do get towards things that look like AGI, I suspect developers will actually be one of the first jobs to see large fractions of their work automated, which I think is very counter-intuitive. But also, predicting the future is hard.

So what do you think the next
breakthroughs in LLM technology will be?

I actually think the roadmap here is quite well known. There's a bunch of things coming that are kind of baked in: we know they're coming, we just have to wait for them to be achieved. One thing developers will really care about is the context window. At the moment, when you use these models, there's a limit to how much information you can feed in each time, and extending that context window is going to add a lot more capability. One thing I'm really excited about is augmenting large language models with the ability to take actions. We've seen a few examples of this; there's a startup called Adept AI doing it, and a few others, where you essentially let the large language model decide to take on some task: it can output a string that says, for instance, "search the internet for this thing", then generate some more on the basis of the result, and repeat. So you actually start treating these large language models much more like agents than just text-generation machines.
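That action-taking pattern, the model emits a command, the system executes it and feeds the result back in, can be sketched as a loop. Here both the "model" and the search tool are hard-coded stubs invented for illustration; a real system would call an actual LLM and parse its output for such commands.

```python
def fake_llm(prompt):
    """Stub standing in for an LLM: first asks to search, then answers."""
    if "RESULT:" not in prompt:
        return "SEARCH: population of France"
    return "ANSWER: about 68 million"

def fake_search(query):
    """Stub tool; a real agent would call a search API here."""
    return "France has about 68 million people."

def agent_loop(task, max_steps=5):
    prompt = task
    for _ in range(max_steps):
        out = fake_llm(prompt)
        if out.startswith("SEARCH:"):
            # Execute the action the model asked for, append the result, loop.
            prompt += "\nRESULT: " + fake_search(out[len("SEARCH:"):].strip())
        elif out.startswith("ANSWER:"):
            return out[len("ANSWER:"):].strip()
    return None  # gave up after max_steps

print(agent_loop("What is the population of France?"))  # about 68 million
```

The `max_steps` cap matters in practice: an agent that never emits a final answer would otherwise loop forever.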
Well, if AI taking actions is something we have to expect, or look forward to, can this technology fundamentally be steered in a safe and ethical direction, and how?
Oh gosh, that's a tough question. I certainly hope so, and I think we need to spend more time thinking about this and working on it than we currently do, because as the capabilities increase it becomes more pressing. There are a lot of different angles to it. There are people who worry about end safety, people like Eliezer Yudkowsky, who, in order to distinguish himself from normal AI safety, talks about AI "not killing everyone": he thinks the risks are potentially so large that this could be an existential threat. Then there are the shorter-term threats of social disruption. People feel threatened by these models, and there are going to be societal consequences, even from the weaker versions on the path to AGI, that raise serious ethical questions. The models bake in the biases and preferences that were in the data and in the team that built them at the time they were constructed. So it's an ethical minefield. I don't think that means we shouldn't do it, because I think the potential benefits are huge as well, but we do need to tread very carefully.

How strong is
the network effect with these models? In other words, could it be that in the future there is one model that rules them all, because it will be bigger and hence smarter than anything anyone else could build? Or is that not the dynamic at play here?

I don't think that's the dynamic at play here.
To me, the barriers to entry for training one of these models are mostly capital and talent. The people needed are still very specialized and very smart, and you need lots of money to pay for GPUs. But beyond that I don't see that much secret sauce. OpenAI, for all the criticism they get, have actually been pretty open, and DeepMind have been pretty open: they've published a lot about how they've achieved what they've achieved. So the main barriers to replicating something like GPT-3 are: can you get enough compute, can you get smart people, and can you get the data. And more people are following on their heels. There's some question about whether the feedback data might give them a flywheel; I'm a little bit skeptical that it would give them so much that no one could catch up.
Why? That seems pretty compelling: if they have a two-year head start and thousands and thousands of apps get built, the lead they have in terms of feedback data would seem to be pretty compelling.

I think the feedback data is great for narrower applications. If you're building an end-user application, you can get a lot of differentiation through feedback and customization. But they're building a very general model that has to be good at everything, so they can't let it become bad at code while it gets good at something else, which others can do.

I see, got it. Now let me ask you probably
the hardest question here. OpenAI's mission is to build AGI, artificial general intelligence, so that machines can be at the cognitive level of humans, if not better. Do you think that's within reach? Do the recent breakthroughs mean it's closer than people thought, or is this, for the time being, still science fiction?

There's a huge amount
of uncertainty here, and if you poll experts you get a wide range of opinions, even if you poll the people closest to it; if you chat to folks at OpenAI or other companies, opinions differ. But compared to most of the public's perception, people in the field think it's plausible sooner than a lot of us thought. There are prediction markets on this: Metaculus polls people on how likely they think AGI is and when, and I think the median estimate is something like 2040. If you think that's even plausible, that's remarkably soon for a technology that might upend almost all of society. What is very clear is that we are still going to see very dramatic improvements in the short term, and even before AGI, a lot of societal transformation and a lot of economic benefit, but also questions that we're going to have to wrestle with to make sure this is a positive for society. On the short end of timelines, there are people who think 2030 is plausible, but those same people will accept there's some probability it won't happen for hundreds of years; there's a distribution. If you take it seriously, and I think you should take it seriously, it's still very hard to internalize. Even having made the choice to accept that by 2030 it's plausible we will have machines that can do all the cognitive tasks humans can do and more, if you then ask me, "Okay, Raza, are you building your company in a way that obviously makes sense in that world?", I'm trying, but it's really hard to internalize intuitively. Stuart Russell has a point where he says: if I told you an alien civilization was going to land on Earth in 50 years, you wouldn't do nothing. And there's some possibility that we've got something like an alien arriving soon.

Right, soon: an alien arriving soon. Yeah, you heard it here first.
So let me ask you: what does this new technology mean for startups?

Oh man, it's unbelievably exciting; it's really difficult to articulate. There are so many things that previously required a research team and felt just impossible that now you simply ask the model for. Honestly, there's stuff that during my PhD I didn't think would be possible for years, problems I spent time trying to solve, where you want a system that can generate questions, or be a really good chatbot like ChatGPT, a realistic one that can track context over long stretches of conversation, not like Alexa or Siri handling a single message. The range of use cases now feels more limited by imagination than by technology. And when there is a technology change this abrupt, where something has improved so much, and YC teaches this, a few different things open up opportunities for new applications. We're beginning to see a sort of Cambrian explosion of new startups; I think the latest YC batch has many more of them. We see it at Humanloop too: we get a lot of inbound interest from companies that are at the beginning of their explorations, trying to figure out how to take this raw model, this raw intelligence, and actually turn it into a differentiated product.
Hopefully we have some AI engineers, or aspiring AI engineers, listening today who might be interested in working at Humanloop. Are you hiring, and what kind of culture and company are you trying to build?

We absolutely are hiring. We're hoping to build a platform for what's potentially one of the most disruptive technologies we've ever had, one that ideally will be used by millions of developers in the future. There's going to be a lot of doing things for the first time, and also inventing novel UX and UI experiences. So we want full-stack developers who are comfortable, genuinely really comfortable, up and down the stack, who deeply care about the end-user experience, and who will enjoy speaking to our customers. They're fun customers to work with, because we're working with startups and AI companies that are really on the cutting edge; they're real innovators. If that sounds exciting to you: a lot of it will be very hard, lots of it will be very new, but it'll also be very rewarding.

Well,
this has been really fascinating. I think what my crystal ball says is that one day in the future, literally millions of developers will be using your tools to build great applications using AI technology. So I wish you luck, and thank you again for your time.

Thank you, Ollie; it's been an absolute pleasure.