
# Relation between the vector field and the score function for generative modelling

A short note written by Federico Bergamin (fedbe@dtu.dk) with the help of Stas Syrota (stasy@dtu.dk) and Aliaksandra Shysheya (as2975@cam.ac.uk).

## Unconditional velocity field $u_t(x)$ and $\nabla \log p_t(x)$

Before starting we refresh the definition of probability path and velocity field as introduced in the Flow Matching paper by Lipman et al., as they will be useful to understand the derivations. These are given by:

$$p_t(x) = \int p_t(x|x_1)\, q(x_1)\, dx_1 \qquad u_t(x) = \int u_t(x|x_1)\, \frac{p_t(x|x_1)\, q(x_1)}{p_t(x)}\, dx_1$$

In addition to that, we know that for a Gaussian probability path $p_t(x|x_1) = \mathcal{N}(x \mid \alpha_t x_1, \sigma_t^2 I)$, the conditional velocity field $u_t(x|x_1)$ can be derived as

$$u_t(x|x_1) = \frac{\dot{\sigma}_t}{\sigma_t}\left(x - \alpha_t x_1\right) + \dot{\alpha}_t x_1$$

We can rewrite this as follows (while I like the trick, I still have to figure out how one can think about this in the first place tbh, but that's life I suppose)

$$\begin{aligned}
u_t(x|x_1) &= \frac{\dot{\sigma}_t}{\sigma_t}(x - \alpha_t x_1) + \dot{\alpha}_t x_1 \\
&= \underbrace{\frac{\dot{\alpha}_t}{\alpha_t}x - \frac{\dot{\alpha}_t}{\alpha_t}x}_{\text{revolutionary idea of adding } 0} + \frac{\dot{\sigma}_t}{\sigma_t}(x - \alpha_t x_1) + \dot{\alpha}_t x_1 \\
&= \frac{\dot{\alpha}_t}{\alpha_t}x + \frac{\dot{\sigma}_t}{\sigma_t}(x - \alpha_t x_1) - \frac{\dot{\alpha}_t}{\alpha_t}(x - \alpha_t x_1) \\
&= \frac{\dot{\alpha}_t}{\alpha_t}x - (x - \alpha_t x_1)\left(\frac{\dot{\alpha}_t}{\alpha_t} - \frac{\dot{\sigma}_t}{\sigma_t}\right) \\
&= \frac{\dot{\alpha}_t}{\alpha_t}x - \frac{1}{\sigma_t \alpha_t}(x - \alpha_t x_1)\left(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t\right) \\
&= \frac{\dot{\alpha}_t}{\alpha_t}x + \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\nabla\log p_t(x|x_1)
\end{aligned}$$
where in the last derivation we used the fact that $\nabla\log p_t(x|x_1) = -\frac{1}{\sigma_t^2}\left(x - \alpha_t x_1\right)$, since $p_t(x|x_1) = \mathcal{N}(x \mid \alpha_t x_1, \sigma_t^2 I)$.
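To convince ourselves that the rewrite is correct, here is a quick numerical check (my own, not from any of the papers), assuming the schedule $\alpha_t = t$, $\sigma_t = 1-t$ that is used later in the note:

```python
# Sanity check: the rewritten conditional velocity (with the conditional score)
# must equal the original form. Schedule alpha_t = t, sigma_t = 1 - t assumed.
import numpy as np

rng = np.random.default_rng(0)
t = 0.37
alpha, sigma, dalpha, dsigma = t, 1.0 - t, 1.0, -1.0

x1 = rng.normal(size=5)                        # a "data" point x_1
x = alpha * x1 + sigma * rng.normal(size=5)    # a point on the conditional path

# original form: u_t(x|x_1) = (dsigma/sigma)(x - alpha x_1) + dalpha x_1
u_orig = (dsigma / sigma) * (x - alpha * x1) + dalpha * x1

# rewritten form using the conditional score -(x - alpha x_1) / sigma^2
score_cond = -(x - alpha * x1) / sigma**2
u_rewritten = (dalpha / alpha) * x + (dalpha * sigma - alpha * dsigma) * sigma / alpha * score_cond

print(np.allclose(u_orig, u_rewritten))        # True
```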

By substituting the expression we have just found for $u_t(x|x_1)$ into the definition of $u_t(x)$, we get:
$$\begin{aligned}
u_t(x) &= \int u_t(x|x_1)\,\frac{p_t(x|x_1)\,q(x_1)}{p_t(x)}\,dx_1 \\
&= \int \left(\frac{\dot{\alpha}_t}{\alpha_t}x + \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\nabla\log p_t(x|x_1)\right)\frac{p_t(x|x_1)\,q(x_1)}{p_t(x)}\,dx_1 \\
&= \int \frac{\dot{\alpha}_t}{\alpha_t}x\,\frac{p_t(x|x_1)\,q(x_1)}{p_t(x)}\,dx_1 + \int \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\nabla\log p_t(x|x_1)\,\frac{p_t(x|x_1)\,q(x_1)}{p_t(x)}\,dx_1 \\
&= \frac{\dot{\alpha}_t}{\alpha_t}x\underbrace{\int \frac{p_t(x|x_1)\,q(x_1)}{p_t(x)}\,dx_1}_{=\,p_t(x)/p_t(x)\,=\,1} + \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\underbrace{\int \nabla\log p_t(x|x_1)\,\frac{p_t(x|x_1)\,q(x_1)}{p_t(x)}\,dx_1}_{\text{What is this?}}
\end{aligned}$$

To understand the second term, we have to analyze the score $\nabla\log p_t(x)$. Indeed, we can rewrite it by using the log-trick as follows:
$$\begin{aligned}
\nabla\log p_t(x) &= \frac{\nabla p_t(x)}{p_t(x)} && \text{using log-trick} \\
&= \frac{\nabla \int p_t(x|x_1)\,q(x_1)\,dx_1}{p_t(x)} && \text{definition of prob. path} \\
&= \int \frac{\nabla p_t(x|x_1)\,q(x_1)}{p_t(x)}\,dx_1 \\
&= \int \nabla\log p_t(x|x_1)\,\frac{p_t(x|x_1)\,q(x_1)}{p_t(x)}\,dx_1 && \text{using log-trick}
\end{aligned}$$

Therefore we can see that the missing integral in the equation above is exactly $\nabla\log p_t(x)$. By substituting this definition, we get the relation between the velocity field and the score function, which is given by:
$$u_t(x) = \frac{\dot{\alpha}_t}{\alpha_t}x + \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\nabla\log p_t(x)$$

**NOTE**: All the derivations above rely on the fact that we are able to arrive at an expression for $u_t(x|x_1)$ in terms of $\nabla\log p_t(x|x_1)$. This is possible only because we are considering a conditional probability path that is Gaussian, i.e. $p_t(x|x_1) = \mathcal{N}(x \mid \alpha_t x_1, \sigma_t^2 I)$, which is possible if and only if the base distribution $q(x_0)$ is Gaussian. However, flow matching/stochastic interpolants can be used to transport any distribution $q(x_0)$ to any distribution $q(x_1)$. In that case, the conditional probability path is still Gaussian, but it depends on both $x_0$ and $x_1$, i.e. $p_t(x|x_1, x_0) = \mathcal{N}(x \mid \mu_t(x_1, x_0), \sigma_t(x_1, x_0)^2 I)$, and if we marginalize only $x_0$ out, we do not get a Gaussian distribution.

**NOTE**: In a lot of papers we find this relation written using derivatives of logarithms. Indeed, if we look closely at $\frac{\dot{\alpha}_t}{\alpha_t}$ and $\frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}$, we can see that they remind us of derivatives of logarithmic functions. Indeed, we can write $\frac{d\ln\alpha_t}{dt} = \frac{\dot{\alpha}_t}{\alpha_t}$. In the same way, if we consider the following:
$$-\frac{1}{2}\frac{d\sigma_t^2}{dt} + \sigma_t^2\frac{d\ln\alpha_t}{dt} = -\sigma_t\dot{\sigma}_t + \sigma_t^2\frac{\dot{\alpha}_t}{\alpha_t} = \frac{\sigma_t}{\alpha_t}\left(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t\right).$$
So I feel that at the end of Appendix B of the Guided Flows paper they are missing a square, kind of. The same quantity can also be written as $\sigma_t^2\frac{d\ln(\alpha_t/\sigma_t)}{dt}$.

Therefore, we can define the vector field in terms of the score in the following three equivalent ways:

$$\begin{aligned}
u_t(x) &= \frac{\dot{\alpha}_t}{\alpha_t}x + \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\nabla\log p_t(x) \\
u_t(x) &= \frac{d\ln\alpha_t}{dt}x + \sigma_t^2\frac{d\ln(\alpha_t/\sigma_t)}{dt}\nabla\log p_t(x) \\
u_t(x) &= \frac{d\ln\alpha_t}{dt}x + \left(-\frac{1}{2}\frac{d\sigma_t^2}{dt} + \sigma_t^2\frac{d\ln\alpha_t}{dt}\right)\nabla\log p_t(x)
\end{aligned}$$
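As a quick sanity check (my own sketch, not taken from the papers), we can verify numerically that the three score coefficients coincide for the schedule $\alpha_t = t$, $\sigma_t = 1-t$:

```python
# Check that the three equivalent score coefficients agree for alpha_t = t, sigma_t = 1 - t.
import numpy as np

t = np.linspace(0.05, 0.95, 19)
alpha, sigma = t, 1.0 - t
dalpha, dsigma = np.ones_like(t), -np.ones_like(t)

c1 = (dalpha * sigma - alpha * dsigma) * sigma / alpha           # first form
c2 = sigma**2 * (dalpha / alpha - dsigma / sigma)                # sigma_t^2 d ln(alpha_t/sigma_t)/dt
c3 = -0.5 * (2.0 * sigma * dsigma) + sigma**2 * (dalpha / alpha) # -(1/2) d sigma_t^2/dt + sigma_t^2 d ln alpha_t/dt

print(np.allclose(c1, c2), np.allclose(c2, c3))   # True True
```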

## Conditional velocity field $u_t(x|y)$ and $\nabla \log p_t(x|y)$

To show the connection between $u_t(x|y)$ and $\nabla\log p_t(x|y)$, we have to follow almost the same steps as we did above.

In the case of a certain conditioning observation $y$, the probability path and the velocity field are defined as follows:

$$p_t(x|y) = \int p_t(x|x_1)\,q(x_1|y)\,dx_1 \qquad u_t(x|y) = \int u_t(x|x_1)\,\frac{p_t(x|x_1)\,q(x_1|y)}{p_t(x|y)}\,dx_1$$

As before, the trick is to rewrite the conditional velocity field $u_t(x|x_1)$ by using the fact that we are dealing with a Gaussian probability path $p_t(x|x_1)$. Therefore, as before, we have that

$$u_t(x|x_1) = \frac{\dot{\alpha}_t}{\alpha_t}x + \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\nabla\log p_t(x|x_1)$$

We can also consider the conditional score $\nabla\log p_t(x|y)$, which can be rewritten following similar steps as before:
$$\begin{aligned}
\nabla\log p_t(x|y) &= \frac{\nabla p_t(x|y)}{p_t(x|y)} && \text{using log-trick} \\
&= \frac{\nabla \int p_t(x|x_1)\,q(x_1|y)\,dx_1}{p_t(x|y)} && \text{definition of prob. path} \\
&= \int \frac{\nabla p_t(x|x_1)\,q(x_1|y)}{p_t(x|y)}\,dx_1 \\
&= \int \nabla\log p_t(x|x_1)\,\frac{p_t(x|x_1)\,q(x_1|y)}{p_t(x|y)}\,dx_1 && \text{using log-trick}
\end{aligned}$$

By substituting the definition of $u_t(x|x_1)$ into $u_t(x|y)$, we get:
$$\begin{aligned}
u_t(x|y) &= \int u_t(x|x_1)\,\frac{p_t(x|x_1)\,q(x_1|y)}{p_t(x|y)}\,dx_1 \\
&= \int \left(\frac{\dot{\alpha}_t}{\alpha_t}x + \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\nabla\log p_t(x|x_1)\right)\frac{p_t(x|x_1)\,q(x_1|y)}{p_t(x|y)}\,dx_1 \\
&= \int \frac{\dot{\alpha}_t}{\alpha_t}x\,\frac{p_t(x|x_1)\,q(x_1|y)}{p_t(x|y)}\,dx_1 + \int \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\nabla\log p_t(x|x_1)\,\frac{p_t(x|x_1)\,q(x_1|y)}{p_t(x|y)}\,dx_1 \\
&= \frac{\dot{\alpha}_t}{\alpha_t}x\underbrace{\int \frac{p_t(x|x_1)\,q(x_1|y)}{p_t(x|y)}\,dx_1}_{=\,p_t(x|y)/p_t(x|y)\,=\,1} + \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\underbrace{\int \nabla\log p_t(x|x_1)\,\frac{p_t(x|x_1)\,q(x_1|y)}{p_t(x|y)}\,dx_1}_{\text{definition of }\nabla\log p_t(x|y)} \\
&= \frac{\dot{\alpha}_t}{\alpha_t}x + \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\nabla\log p_t(x|y)
\end{aligned}$$

where we used the definition of $\nabla\log p_t(x|y)$ we found above.

**NOTE**: This derivation is exactly the one of Zheng et al., who showed that we can write the conditional velocity field as follows

$$u_t(x|y) = a_t x + b_t \nabla\log p_t(x|y)$$

which, in the case of a Gaussian probability path defined as $p_t(x|x_1) = \mathcal{N}(x \mid \alpha_t x_1, \sigma_t^2 I)$, becomes

$$u_t(x|y) = \underbrace{\frac{\dot{\alpha}_t}{\alpha_t}}_{a_t}x + \underbrace{\frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}}_{b_t}\nabla\log p_t(x|y)$$

## Guiding an unconditional velocity field

Let's assume we have a pre-trained velocity field $v_\theta(t, x_t)$. How can we guide it to sample, for example, from a specific class? In the section above we have shown that
$$u_t(x) = \frac{\dot{\alpha}_t}{\alpha_t}x + \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\nabla\log p_t(x)$$

We can also write the unconditional score $\nabla\log p_t(x)$ in terms of the velocity field $u_t(x)$:

$$\nabla\log p_t(x) = \left(u_t(x) - \frac{\dot{\alpha}_t}{\alpha_t}x\right)\frac{\alpha_t}{\sigma_t(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)}$$

In addition to that, if we want to sample an example $x$ conditioned on a specific observation $y$, we are interested in simulating the conditional vector field $u_t(x|y)$, which we don't have available. However, above we have seen that we can write it as

$$\begin{aligned}
u_t(x|y) &= \frac{\dot{\alpha}_t}{\alpha_t}x + \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\nabla\log p_t(x|y) \\
&= \frac{\dot{\alpha}_t}{\alpha_t}x + \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\left[\nabla\log p_t(x) + \nabla\log p_t(y|x)\right] && \text{using Bayes' rule} \\
&= \frac{\dot{\alpha}_t}{\alpha_t}x + \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\Bigg[\underbrace{\left(u_t(x) - \frac{\dot{\alpha}_t}{\alpha_t}x\right)\frac{\alpha_t}{\sigma_t(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)}}_{\text{def. of }\nabla\log p_t(x)} + \nabla\log p_t(y|x)\Bigg] \\
&= \frac{\dot{\alpha}_t}{\alpha_t}x + u_t(x) - \frac{\dot{\alpha}_t}{\alpha_t}x + \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\nabla\log p_t(y|x) \\
&= u_t(x) + \frac{(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)\sigma_t}{\alpha_t}\nabla\log p_t(y|x)
\end{aligned}$$

**Example**: In the most common setting, where we assume that $q(x_0) = \mathcal{N}(x_0 \mid 0, I)$ and the probability path can be expressed as $p_t(x|x_1) = \mathcal{N}(x \mid t x_1, (1-t)^2 I)$, i.e. we have $\alpha_t = t$ and $\sigma_t = 1-t$, we can compute the unconditional and the conditional velocity fields as follows:
$$u_t(x) = \frac{1}{t}x + \frac{1-t}{t}\nabla\log p_t(x) \qquad u_t(x|y) = u_t(x) + \frac{1-t}{t}\nabla\log p_t(y|x)$$

where $\nabla\log p_t(y|x)$ can be an additional classifier trained on the interpolation $x$.
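A possible implementation sketch of this guidance scheme is given below. It is my own sketch, not the note's (or any library's) actual implementation: `velocity_model(x, t)` is a hypothetical pre-trained velocity field and `classifier_log_prob(x, t, y)` a hypothetical time-dependent classifier returning $\log p_t(y|x)$, and sampling is done with a plain Euler integrator.

```python
import torch

def guided_velocity(velocity_model, classifier_log_prob, x, t, y, scale=1.0):
    """Approximate u_t(x|y) = u_t(x) + scale * (1 - t)/t * grad_x log p_t(y|x)."""
    x = x.detach().requires_grad_(True)
    log_prob = classifier_log_prob(x, t, y)            # log p_t(y|x), classifier on interpolants
    (grad,) = torch.autograd.grad(log_prob.sum(), x)   # grad_x log p_t(y|x)
    u = velocity_model(x.detach(), t)                  # unconditional u_t(x)
    return (u + scale * ((1.0 - t) / t) * grad).detach()

def sample(velocity_model, classifier_log_prob, y, shape, n_steps=100, eps=1e-3):
    """Euler integration of dx/dt = u_t(x|y) from t = eps (noise) to t = 1 (data)."""
    x = torch.randn(shape)                             # base distribution N(0, I)
    ts = torch.linspace(eps, 1.0, n_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * guided_velocity(velocity_model, classifier_log_prob, x, t0, y)
    return x
```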

## Inverse problems

Score-based models are nice because there is a neat way to tackle inverse problems. In this case, indeed, they allow us to sample from $p(x_1|y)$ (note: in diffusion models this would usually be $p(x(0)|y)$) without the need to train a separate classifier. Indeed, we can write the required $p(y|x)$ as follows:
$$p(y|x) = \int p(y, x_1|x)\,dx_1 = \int \underbrace{p(y|x_1)}_{\text{likelihood model}}\,\underbrace{p(x_1|x)}_{\text{most probable }x_1\text{ given noisy }x}\,dx_1$$

where $p(y|x_1)$ is the likelihood model, which we can compute easily, and $p(x_1|x)$ is the distribution which, given a noisy $x$, gives us a distribution over the possible noiseless $x_1$. This distribution, despite not being available in closed form, can be approximated by moment matching. The mean can be obtained by using Tweedie's formula (first introduced by Robbins):
$$\hat{x}(x) = \mathbb{E}[x_1|x] = \frac{x + \sigma_t^2\,\nabla_x\log p_t(x)}{\alpha_t}$$

Therefore, if we want to apply the same technique with flow matching, we have to compute the corresponding Tweedie's/Robbins' formula using the velocity field. To do so, one possibility is to use the equation that relates the score to the velocity field that we introduced above. If we do so, we have a way to compute $\hat{x}(x)$ using a velocity field. We will derive the equation step by step.

$$\begin{aligned}
\hat{x}(x) &= \frac{x}{\alpha_t} + \frac{\sigma_t^2}{\alpha_t}\nabla_x\log p_t(x) \\
&= \frac{x}{\alpha_t} + \frac{\sigma_t^2}{\alpha_t}\left(u_t(x) - \frac{\dot{\alpha}_t}{\alpha_t}x\right)\frac{\alpha_t}{\sigma_t(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)} \\
&= \frac{x}{\alpha_t} + \sigma_t\left(u_t(x) - \frac{\dot{\alpha}_t}{\alpha_t}x\right)\frac{1}{\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t} \\
&= \frac{x}{\alpha_t} + \frac{\sigma_t}{\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t}u_t(x) - \frac{\dot{\alpha}_t\sigma_t}{\alpha_t(\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t)}x \\
&= \frac{x}{\alpha_t}\left(1 - \frac{\dot{\alpha}_t\sigma_t}{\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t}\right) + \frac{\sigma_t}{\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t}u_t(x) \\
&= \frac{x}{\alpha_t}\left(\frac{\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t - \dot{\alpha}_t\sigma_t}{\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t}\right) + \frac{\sigma_t}{\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t}u_t(x) \\
&= \frac{x}{\alpha_t}\left(\frac{-\alpha_t\dot{\sigma}_t}{\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t}\right) + \frac{\sigma_t}{\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t}u_t(x) \\
&= -\frac{\dot{\sigma}_t}{\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t}x + \frac{\sigma_t}{\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t}u_t(x) \\
&= \frac{\sigma_t}{\dot{\alpha}_t\sigma_t - \alpha_t\dot{\sigma}_t}\left(u_t(x) - \frac{\dot{\sigma}_t}{\sigma_t}x\right)
\end{aligned}$$
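To double-check the formula, here is a small numerical experiment of mine on a 1-D Gaussian toy problem, where the marginal score, the velocity field, and both versions of Tweedie's formula are available in closed form (schedule $\alpha_t = t$, $\sigma_t = 1-t$, data distribution $\mathcal{N}(m, s^2)$, so $p_t = \mathcal{N}(tm, t^2 s^2 + (1-t)^2)$):

```python
# Velocity-based Tweedie mean vs. classical score-based Tweedie mean on a Gaussian toy.
import numpy as np

m, s = 2.0, 0.5
t = 0.6
alpha, sigma, dalpha, dsigma = t, 1.0 - t, 1.0, -1.0

x = np.linspace(-3.0, 5.0, 9)
var_t = t**2 * s**2 + (1.0 - t) ** 2
score = -(x - t * m) / var_t                         # closed-form marginal score
u = (dalpha / alpha) * x + (dalpha * sigma - alpha * dsigma) * sigma / alpha * score

x_hat_score = (x + sigma**2 * score) / alpha         # classical Tweedie's formula
x_hat_vel = sigma / (dalpha * sigma - alpha * dsigma) * (u - (dsigma / sigma) * x)

print(np.allclose(x_hat_score, x_hat_vel))           # True
```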

Now we have a way to approximate the mean of $p(x_1|x)$. The covariance can also be estimated via Tweedie's formula, although this requires the computation of the Hessian:
$$\mathrm{Cov}[x_1|x] = \frac{\sigma_t^2}{\alpha_t^2}\left(I + \sigma_t^2\,\nabla^2\log p_t(x)\right) = \hat{\Sigma}(x)$$

Therefore, we can now approximate $p(x_1|x)$ as a Gaussian distribution with mean $\hat{x}(x)$ and covariance $\hat{\Sigma}(x)$. In addition to that, assume that we have an observation likelihood that is Gaussian, i.e. $p(y|x_1) = \mathcal{N}(y \mid A(x_1), \sigma_y^2 I)$, where $A(\cdot)$ is the observation model and $\sigma_y^2$ the likelihood variance. By doing so (and assuming a linear observation model $A$), we can get a closed-form approximation for $p(y|x)$ given by $\mathcal{N}(y \mid A\hat{x}(x), A\hat{\Sigma}(x)A^\top + \sigma_y^2 I)$. The computation of the Hessian can be problematic, and, in addition, in our case we would have to find a relationship between the velocity field and the Hessian. For this reason, different papers propose different ways to approximate it, e.g. simply by considering $\mathrm{Cov}[x_1|x] = r_t^2 I$, where $r_t$ is a monotonically increasing function. The final approximation is then given by
$$p(y|x) \approx \mathcal{N}\!\left(y \mid A\hat{x}(x),\, (\sigma_y^2 + r_t^2)\,I\right)$$
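Below is a hedged sketch (my own, with hypothetical names, not tied to any specific paper or library) of how this approximation could be used to compute the guidance term $\nabla_x\log p(y|x)$ for a linear observation model $y = Ax_1 + \varepsilon$, with $\hat{x}(x)$ obtained from the velocity field as derived above:

```python
import torch

def x_hat_from_velocity(velocity_model, x, t, alpha, sigma, dalpha, dsigma):
    """Velocity-based Tweedie mean: sigma/(dalpha*sigma - alpha*dsigma) * (u_t(x) - (dsigma/sigma) x)."""
    u = velocity_model(x, t)
    return sigma / (dalpha * sigma - alpha * dsigma) * (u - (dsigma / sigma) * x)

def guidance_gradient(velocity_model, x, t, y, A, sigma_y, r_t, alpha, sigma, dalpha, dsigma):
    """grad_x of log N(y | A x_hat(x), (sigma_y^2 + r_t^2) I); this is what gets added to u_t(x)."""
    x = x.detach().requires_grad_(True)
    x_hat = x_hat_from_velocity(velocity_model, x, t, alpha, sigma, dalpha, dsigma)
    resid = y - A @ x_hat
    log_p = -0.5 * (resid @ resid) / (sigma_y**2 + r_t**2)   # Gaussian log-likelihood up to a constant
    (grad,) = torch.autograd.grad(log_p, x)
    return grad
```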

**Alternative derivation**: In the Pokle et al. paper (Training-free Linear Image Inversion via Flows), we find a different approach to derive $\hat{x}(x)$ for Tweedie's formula. They, indeed, decided to use the relationship between the velocity field and the score defined using the derivatives of logarithms that we presented above, i.e.
$$u_t(x) = \frac{d\ln\alpha_t}{dt}x + \sigma_t^2\frac{d\ln(\alpha_t/\sigma_t)}{dt}\nabla\log p_t(x)$$

If we now rewrite this relationship from the score perspective, we get
$$\nabla\log p_t(x) = \left(u_t(x) - \frac{d\ln\alpha_t}{dt}x\right)\left(\sigma_t^2\frac{d\ln(\alpha_t/\sigma_t)}{dt}\right)^{-1}$$

Then we can insert this into Tweedie's formula; the rest is just calculus, as we show by deriving it step by step:
$$\begin{aligned}
\hat{x}(x) &= \frac{x}{\alpha_t} + \frac{\sigma_t^2}{\alpha_t}\nabla_x\log p_t(x) \\
&= \frac{x}{\alpha_t} + \frac{\sigma_t^2}{\alpha_t}\left(u_t(x) - \frac{d\ln\alpha_t}{dt}x\right)\left(\sigma_t^2\frac{d\ln(\alpha_t/\sigma_t)}{dt}\right)^{-1} \\
&= \frac{x}{\alpha_t} + \frac{1}{\alpha_t}\left(\frac{d\ln(\alpha_t/\sigma_t)}{dt}\right)^{-1}u_t(x) - \frac{1}{\alpha_t}\left(\frac{d\ln(\alpha_t/\sigma_t)}{dt}\right)^{-1}\frac{d\ln\alpha_t}{dt}x \\
&= \frac{x}{\alpha_t}\left(1 - \left(\frac{d\ln(\alpha_t/\sigma_t)}{dt}\right)^{-1}\frac{d\ln\alpha_t}{dt}\right) + \left(\alpha_t\frac{d\ln(\alpha_t/\sigma_t)}{dt}\right)^{-1}u_t(x) \\
&= \frac{x}{\alpha_t}\left(\frac{\frac{d\ln\alpha_t}{dt} - \frac{d\ln\sigma_t}{dt} - \frac{d\ln\alpha_t}{dt}}{\frac{d\ln(\alpha_t/\sigma_t)}{dt}}\right) + \left(\alpha_t\frac{d\ln(\alpha_t/\sigma_t)}{dt}\right)^{-1}u_t(x) \\
&= -\frac{x}{\alpha_t}\left(\frac{d\ln(\alpha_t/\sigma_t)}{dt}\right)^{-1}\frac{d\ln\sigma_t}{dt} + \left(\alpha_t\frac{d\ln(\alpha_t/\sigma_t)}{dt}\right)^{-1}u_t(x) \\
&= \left(\alpha_t\frac{d\ln(\alpha_t/\sigma_t)}{dt}\right)^{-1}\left(u_t(x) - \frac{d\ln\sigma_t}{dt}x\right)
\end{aligned}$$

**Example**: We now go back to the running example where $\alpha_t = t$ and $\sigma_t = 1-t$ (and $\dot{\alpha}_t = 1$ and $\dot{\sigma}_t = -1$, respectively). If we substitute them above, we get
$$\hat{x}(x) = (1-t)\left(u_t(x) + \frac{1}{1-t}x\right) = (1-t)\,u_t(x) + x$$
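A tiny numerical check of mine that the general velocity-based formula indeed reduces to this expression for $\alpha_t = t$, $\sigma_t = 1-t$:

```python
import numpy as np

rng = np.random.default_rng(0)
x, u = rng.normal(size=4), rng.normal(size=4)   # stand-ins for x and u_t(x)

for t in (0.1, 0.5, 0.9):
    alpha, sigma, dalpha, dsigma = t, 1.0 - t, 1.0, -1.0
    general = sigma / (dalpha * sigma - alpha * dsigma) * (u - (dsigma / sigma) * x)
    print(np.allclose(general, (1.0 - t) * u + x))   # True
```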

## Probability flow ODE and recovering the SDE using the velocity field

Before jumping into the relationship between the velocity field and the SDE, we have to introduce some background concepts. We will try to do it in the simplest possible way. Let's start by considering the general form of an SDE (here for simplicity we assume $x \in \mathbb{R}$, i.e. $x$ is a scalar):
$$dx = f(x,t)\,dt + g(t)\,dW$$

where $f(x,t)$ is the drift term and $g(t)$ (which sometimes can also depend on $x$, i.e. $g(x,t)$) is the diffusion term. Let us also assume that the initial condition $x_0$ is sampled from a certain distribution $p_0(x)$. Then the Fokker-Planck equation tells us how the probability distribution $p_t(x)$ evolves over time:
$$\frac{\partial p_t(x)}{\partial t} = -\frac{\partial}{\partial x}\big(f(x,t)\,p_t(x)\big) + \frac{1}{2}g^2(t)\frac{\partial^2}{\partial x^2}p_t(x)$$

Let us now assume that we have an ODE instead, with a specific form of the velocity field, which is given by
$$\frac{dx}{dt} = h(x,t), \qquad h(x,t) = f(x,t) - \frac{g(t)^2}{2}\frac{\partial}{\partial x}\log q_t(x)$$
where the initial observations are sampled from $x_0 \sim q_0(x)$. The evolution over time of $q_t(x)$, the probability distribution associated with these dynamics, can be described by the continuity equation, which gives
$$\begin{aligned}
\frac{\partial q_t(x)}{\partial t} &= -\frac{\partial}{\partial x}\big(h(x,t)\,q_t(x)\big) \\
&= -\frac{\partial}{\partial x}\left(\left(f(x,t) - \frac{g(t)^2}{2}\frac{\partial}{\partial x}\log q_t(x)\right)q_t(x)\right) \\
&= -\frac{\partial}{\partial x}\big(f(x,t)\,q_t(x)\big) + \frac{1}{2}g(t)^2\frac{\partial}{\partial x}\!\left[\left(\frac{\partial}{\partial x}\log q_t(x)\right)q_t(x)\right] \\
&= -\frac{\partial}{\partial x}\big(f(x,t)\,q_t(x)\big) + \frac{1}{2}g(t)^2\frac{\partial}{\partial x}\!\left[\left(\frac{1}{q_t(x)}\frac{\partial}{\partial x}q_t(x)\right)q_t(x)\right] \\
&= -\frac{\partial}{\partial x}\big(f(x,t)\,q_t(x)\big) + \frac{1}{2}g(t)^2\frac{\partial}{\partial x}\frac{\partial}{\partial x}q_t(x) \\
&= -\frac{\partial}{\partial x}\big(f(x,t)\,q_t(x)\big) + \frac{1}{2}g(t)^2\frac{\partial^2}{\partial x^2}q_t(x)
\end{aligned}$$

By comparing this with the Fokker-Planck equation above, we can notice that the SDE and the ODE have the same dynamics if $p_0(x) = q_0(x)$. The ODE is known as the probability flow ODE, and it is usually written as:
$$\frac{dx}{dt} = f(x,t) - \frac{1}{2}g^2(t)\,\nabla\log p_t(x)$$

Everybody at this point cites Kingma et al. 2021 (the Variational Diffusion Models paper), but I am not able to find where they actually define the following:

$$f(x,t) = x\frac{d\ln\alpha_t}{dt} \qquad g^2(t) = \frac{d\sigma_t^2}{dt} - 2\sigma_t^2\frac{d\ln\alpha_t}{dt}$$

Therefore, the probability flow ODE can also be written as
$$\begin{aligned}
\frac{dx}{dt} &= f(x,t) - \frac{1}{2}g^2(t)\,\nabla\log p_t(x) \\
&= x\frac{d\ln\alpha_t}{dt} - \frac{1}{2}\left(\frac{d\sigma_t^2}{dt} - 2\sigma_t^2\frac{d\ln\alpha_t}{dt}\right)\nabla\log p_t(x) \\
&= x\frac{d\ln\alpha_t}{dt} + \left(-\frac{1}{2}\frac{d\sigma_t^2}{dt} + \sigma_t^2\frac{d\ln\alpha_t}{dt}\right)\nabla\log p_t(x) \\
&= u_t(x)
\end{aligned}$$

where in the last equation we used the relation between $u_t(x)$ and $\nabla\log p_t(x)$ we derived at the beginning.
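As a small illustration (a toy experiment of mine, not from the note), we can integrate this probability flow ODE for 1-D Gaussian data, where the marginal score is known in closed form, and check that samples from the base distribution end up approximately distributed as the data:

```python
import numpy as np

rng = np.random.default_rng(0)
m, s = 2.0, 0.5                       # data distribution N(m, s^2)

def u_t(x, t):
    # closed-form marginal p_t = N(t*m, t^2 s^2 + (1 - t)^2) for alpha_t = t, sigma_t = 1 - t
    var_t = t**2 * s**2 + (1.0 - t) ** 2
    score = -(x - t * m) / var_t
    return x / t + (1.0 - t) / t * score

eps, n_steps = 1e-3, 2000
ts = np.linspace(eps, 1.0, n_steps + 1)
x = rng.normal(size=50_000)           # samples from the base distribution N(0, 1)
for t0, t1 in zip(ts[:-1], ts[1:]):   # Euler integration of dx/dt = u_t(x)
    x = x + (t1 - t0) * u_t(x, t0)

print(x.mean(), x.std())              # should be close to m = 2.0 and s = 0.5
```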

**Alternative and possibly simpler derivation**: We can derive the same thing by following a different, and maybe easier, path. The derivation presented above was given in the following presentation/lecture. However, we can reach the same conclusion by simply rewriting the Fokker-Planck equation, as we are going to see, and then using the definition of divergence and its connection to the continuity equation. We mention several concepts in just two lines, so let us describe them briefly before getting to the step-by-step derivation. We have seen above that the Fokker-Planck equation tells us the evolution of the probability $p_t(x)$ over time when $x$ follows the dynamics described by the SDE. We can also see that it consists of two terms: the first one roughly describes the change of probability due to the drift, which is a deterministic function; the second term, instead, is related to the diffusion term of the SDE and describes the change of probability due to the diffusion. Roughly speaking, an ODE is just a deterministic SDE, i.e. an SDE without the diffusion term. Therefore, the evolution of the probability $p_t(x)$ following an ODE is given by just the first term, i.e.
$$\frac{\partial p_t(x)}{\partial t} = -\frac{\partial}{\partial x}\big(f(x,t)\,p_t(x)\big),$$
which is usually referred to as the continuity equation. Here we report it for the scalar case, but in the high-dimensional case it is written as $\frac{\partial p_t(x)}{\partial t} = -\nabla\cdot\big(f(x,t)\,p_t(x)\big)$, where $\nabla\cdot F$ denotes the divergence operator.
Now we have all the ingredients to show the alternative derivation. Let's start from the Fokker-Planck equation:
$$\begin{aligned}
\frac{\partial p_t(x)}{\partial t} &= -\frac{\partial}{\partial x}\big(f(x,t)\,p_t(x)\big) + \frac{1}{2}g^2(t)\frac{\partial^2}{\partial x^2}p_t(x) \\
&= -\frac{\partial}{\partial x}\big(f(x,t)\,p_t(x)\big) + \frac{1}{2}g^2(t)\frac{\partial}{\partial x}\frac{\partial}{\partial x}p_t(x) && \text{just a rewriting} \\
&= -\frac{\partial}{\partial x}\big(f(x,t)\,p_t(x)\big) + \frac{1}{2}g^2(t)\frac{\partial}{\partial x}\!\left[\left(\frac{\partial}{\partial x}\log p_t(x)\right)p_t(x)\right] && \text{log-trick: } \nabla\log p(x) = \frac{\nabla p(x)}{p(x)} \\
&= -\frac{\partial}{\partial x}\left[\left(f(x,t) - \frac{1}{2}g^2(t)\frac{\partial}{\partial x}\log p_t(x)\right)p_t(x)\right]
\end{aligned}$$

which can be seen as a continuity equation of the following ODE:
$$\frac{dx}{dt} = F(x,t) = f(x,t) - \frac{1}{2}g^2(t)\,\nabla\log p_t(x)$$

Therefore, we have shown how we can express an SDE via an ODE, provided that we know the score $\nabla\log p_t(x)$. Then, following the same reasoning we did above, we can show how to link this to the vector field $u_t(x)$. If you are interested in learning more about the Fokker-Planck equation, we suggest this nice blogpost.

## A different path to draw a connection between flow matching and score-based modelling

*I am not completely sure this is 100% correct, so do not draw any conclusions from this.*

A different path to get to the relationship between the vector field in flow matching and the score learned by score-based models can be derived by following Karras et al.'s approach. In the paper, they propose a different view of the probability flow ODE associated with a particular noising SDE and the corresponding backward/denoising SDE. Before showing the relation, we take a detour and start from the usual SDE formulation of the noising process as introduced by Song et al.,

$$dx = f(x,t)\,dt + g(t)\,dw_t,$$
where $f(x,t)$ is the drift, $g(t)$ the diffusion, and $w_t$ a Wiener process. The drift and the diffusion usually depend on the type of noising SDE we are considering, e.g. whether it is variance exploding or variance preserving, but usually in diffusion models we choose a drift term that is affine, meaning that $f(x,t)$ can be written as $f(t)x$. Therefore, the usual SDE we consider is given by
$$dx = f(t)x\,dt + g(t)\,dw_t$$

Since the drift is affine, the perturbation kernel (or noising kernel) associated with this SDE has a closed form given by
$$p_{0t}(x(t) \mid x(0)) = \mathcal{N}\!\left(x(t);\, s(t)x(0),\, s^2(t)\sigma(t)^2 I\right)$$

where $s(t)$ and $\sigma(t)$ depend on the choice of the drift and the diffusion and are given by
$$s(t) = \exp\!\left(\int_0^t f(\xi)\,d\xi\right) \qquad \sigma(t) = \sqrt{\int_0^t \frac{g(\xi)^2}{s(\xi)^2}\,d\xi}$$

**NOTE**: We are now assuming that the perturbation kernel has the form $p_{0t}(x(t) \mid x(0)) = \mathcal{N}(x(t);\, s(t)x(0),\, s^2(t)\sigma(t)^2 I)$, while before we were considering $p_t(x(t) \mid x_1) = \mathcal{N}(x(t);\, \alpha(t)x_1,\, \sigma(t)^2 I)$. Therefore, $\sigma$ here is different from above, and, in addition to that, while the base distribution is $p_T$ for diffusion models, in flow matching it is given by $p_0$.

The marginal distribution $p_t(x)$, which is the one that the probability flow ODE has to match for every time $t$, is given by
$$p_t(x) = \int_{\mathbb{R}^d} p_{0t}(x \mid x_0)\,p_{\text{data}}(x_0)\,dx_0$$

and the probability flow ODE that actually obeys this $p_t(x)$ is given by
$$dx = \left[f(t)x - \frac{1}{2}g^2(t)\,\nabla_x\log p_t(x)\right]dt$$

However, this probability flow ODE is defined in terms of the drift $f(t)$ and the diffusion $g(t)$. It would be more helpful if the marginal distribution $p_t(x)$ that the ODE is trying to match could be expressed in terms of the $s(t)$ and $\sigma(t)$ that appear in the noising kernel. In the paper, they show that this is possible, and the resulting ODE is given by:
$$dx = \left[\frac{\dot{s}(t)}{s(t)}x - s^2(t)\,\sigma(t)\dot{\sigma}(t)\,\nabla_x\log p\!\left(\frac{x}{s(t)};\,\sigma(t)\right)\right]dt$$

### Derivation of the probability flow ODE in terms of $s(t)$ and $\sigma(t)$

For completeness, we will derive step by step the new formulation of the probability flow ODE; the same derivations can be found in the appendix of the Karras et al. paper. Let us start by writing the marginal distribution in a different way:
$$\begin{aligned}
p_t(x) &= \int_{\mathbb{R}^d} p_{0t}(x \mid x_0)\,p_{\text{data}}(x_0)\,dx_0 \\
&= \int_{\mathbb{R}^d} p_{\text{data}}(x_0)\,\mathcal{N}\!\left(x(t);\, s(t)x_0,\, s^2(t)\sigma(t)^2 I\right)dx_0 && \text{insert definition of } p_{0t}(x|x_0) \\
&= \int_{\mathbb{R}^d} p_{\text{data}}(x_0)\,s(t)^{-d}\,\mathcal{N}\!\left(\frac{x(t)}{s(t)};\, x_0,\, \sigma(t)^2 I\right)dx_0 && x = s(t)x_0 + s(t)\sigma(t)\epsilon \text{ so } \tfrac{x(t)}{s(t)} = x_0 + \sigma(t)\epsilon \\
&= s(t)^{-d}\int_{\mathbb{R}^d} p_{\text{data}}(x_0)\,\mathcal{N}\!\left(\frac{x(t)}{s(t)};\, x_0,\, \sigma(t)^2 I\right)dx_0 \\
&= s(t)^{-d}\underbrace{\left[p_{\text{data}} * \mathcal{N}\!\left(0,\, \sigma(t)^2 I\right)\right]}_{\text{convolution, mollified version of the data}}\!\left(\frac{x}{s(t)}\right)
\end{aligned}$$

By looking at the definition of $p_t(x)$ above, we can define the following compact terms:
$$p(x;\,\sigma(t)) = p_{\text{data}} * \mathcal{N}\!\left(0,\, \sigma(t)^2 I\right) \qquad\text{and}\qquad p_t(x) = s(t)^{-d}\,p\!\left(\frac{x}{s(t)};\,\sigma(t)\right)$$

We can therefore substitute the new definition of the marginal distribution $p_t(x)$ into the probability flow ODE and get the following:
$$\begin{aligned}
dx &= \left[f(t)x - \frac{1}{2}g^2(t)\,\nabla_x\log p_t(x)\right]dt \\
&= \left[f(t)x - \frac{1}{2}g^2(t)\,\nabla_x\log\left(s(t)^{-d}\,p\!\left(\frac{x}{s(t)};\,\sigma(t)\right)\right)\right]dt \\
&= \left[f(t)x - \frac{1}{2}g^2(t)\left(\underbrace{\nabla_x\log s(t)^{-d}}_{=\,0} + \nabla_x\log p\!\left(\frac{x}{s(t)};\,\sigma(t)\right)\right)\right]dt \\
&= \left[f(t)x - \frac{1}{2}g^2(t)\,\nabla_x\log p\!\left(\frac{x}{s(t)};\,\sigma(t)\right)\right]dt
\end{aligned}$$

We can now compute the drift $f(t)$ and the diffusion $g(t)$ coefficients in terms of $s(t)$ and $\sigma(t)$. Let's start with $s(t)$:
$$s(t) = \exp\!\left(\int_0^t f(\xi)\,d\xi\right) \;\Longrightarrow\; \log s(t) = \int_0^t f(\xi)\,d\xi \;\Longrightarrow\; \frac{d}{dt}\log s(t) = \frac{d}{dt}\int_0^t f(\xi)\,d\xi \;\Longrightarrow\; \frac{\dot{s}(t)}{s(t)} = f(t)$$

while for $g(t)$ we get:
$$\sigma(t) = \sqrt{\int_0^t \frac{g(\xi)^2}{s(\xi)^2}\,d\xi} \;\Longrightarrow\; \sigma^2(t) = \int_0^t \frac{g(\xi)^2}{s(\xi)^2}\,d\xi \;\Longrightarrow\; \frac{d}{dt}\sigma^2(t) = \frac{g(t)^2}{s^2(t)} \;\Longrightarrow\; 2\sigma(t)\dot{\sigma}(t) = \frac{g(t)^2}{s^2(t)} \;\Longrightarrow\; g(t) = s(t)\sqrt{2\sigma(t)\dot{\sigma}(t)}$$
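We can sanity-check these two relations numerically (a small check of mine, not from the paper) on a toy choice where $s(t)$ and $\sigma(t)$ have closed forms, e.g. $f(t) = -1$ and $g(t) = 1$, which give $s(t) = e^{-t}$ and $\sigma(t)^2 = (e^{2t}-1)/2$:

```python
# Check f(t) = s'(t)/s(t) and g(t) = s(t) * sqrt(2 sigma(t) sigma'(t)) for f = -1, g = 1.
import numpy as np

t = np.linspace(0.1, 2.0, 20)
s = np.exp(-t)
sigma = np.sqrt((np.exp(2 * t) - 1.0) / 2.0)

ds = -np.exp(-t)                          # s'(t)
dsigma = np.exp(2 * t) / (2.0 * sigma)    # sigma'(t), from d(sigma^2)/dt = exp(2t)

f_recovered = ds / s                      # should recover f(t) = -1
g_recovered = s * np.sqrt(2.0 * sigma * dsigma)   # should recover g(t) = 1

print(np.allclose(f_recovered, -1.0), np.allclose(g_recovered, 1.0))   # True True
```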

So by substituting these into the probability flow ODE, we get a different version of it in terms of $s(t)$ and $\sigma(t)$:
$$\begin{aligned}
dx &= \left[f(t)x - \frac{1}{2}g^2(t)\,\nabla_x\log p\!\left(\frac{x}{s(t)};\,\sigma(t)\right)\right]dt \\
&= \left[\frac{\dot{s}(t)}{s(t)}x - \frac{1}{2}\,s^2(t)\,2\sigma(t)\dot{\sigma}(t)\,\nabla_x\log p\!\left(\frac{x}{s(t)};\,\sigma(t)\right)\right]dt \\
&= \left[\frac{\dot{s}(t)}{s(t)}x - s^2(t)\,\sigma(t)\dot{\sigma}(t)\,\nabla_x\log p\!\left(\frac{x}{s(t)};\,\sigma(t)\right)\right]dt
\end{aligned}$$

Therefore, given a perturbation kernel, we now have a way to describe the ODE that has the same marginals as the underlying SDE linked to that specific perturbation kernel. If we think about flow matching now, we are approximating an ODE, and to do so we are using a probability path. Thus, if our base distribution $p_0$ is a standard Gaussian, our velocity field is approximating exactly the probability flow ODE we have presented above, meaning that
$$u_t(x) = \frac{\dot{s}(t)}{s(t)}x - s^2(t)\,\sigma(t)\dot{\sigma}(t)\,\nabla_x\log p\!\left(\frac{x}{s(t)};\,\sigma(t)\right)$$

We now check what this gives for the usual flow matching choice of probability path $p_t(x|x_1) = \mathcal{N}(x;\, t x_1,\, (1-t)^2 I)$. While above we considered $\alpha_t = t$ and $\sigma_t = 1-t$, to follow the Karras et al. framework we now have $s(t) = t$, while $\sigma(t) = \frac{1-t}{t}$. From those we can simply compute also the derivatives $\dot{s}(t) = 1$ and $\dot{\sigma}(t) = -\frac{1}{t^2}$. By substituting these into the relation between the velocity field and the score, we get:
$$\begin{aligned}
u_t(x) &= \frac{\dot{s}(t)}{s(t)}x - s^2(t)\,\sigma(t)\dot{\sigma}(t)\,\nabla_x\log p\!\left(\frac{x}{s(t)};\,\sigma(t)\right) \\
&= \frac{1}{t}x - t^2\,\frac{1-t}{t}\left(-\frac{1}{t^2}\right)\nabla_x\log p\!\left(\frac{x}{s(t)};\,\sigma(t)\right) \\
&= \frac{1}{t}x + \frac{1-t}{t}\,\nabla_x\log p\!\left(\frac{x}{s(t)};\,\sigma(t)\right)
\end{aligned}$$

which is the same result we found above by following a different approach.
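As a final sanity check (my own, not from the paper), we can verify numerically that the two parameterizations, $(\alpha_t, \sigma_t)$ and $(s(t), \sigma(t))$, give the same velocity field coefficients for this example:

```python
# Coefficients of x and of the score from both parameterizations, for
# alpha_t = t, sigma_t = 1 - t  <->  s(t) = t, sigma(t) = (1 - t)/t.
import numpy as np

t = np.linspace(0.05, 0.95, 19)

# flow matching parameterization
alpha, sigma_fm, dalpha, dsigma_fm = t, 1.0 - t, 1.0, -1.0
coef_x_fm = dalpha / alpha
coef_score_fm = (dalpha * sigma_fm - alpha * dsigma_fm) * sigma_fm / alpha

# Karras-style parameterization
s, ds = t, 1.0
sigma_k, dsigma_k = (1.0 - t) / t, -1.0 / t**2
coef_x_k = ds / s
coef_score_k = -(s**2) * sigma_k * dsigma_k

print(np.allclose(coef_x_fm, coef_x_k), np.allclose(coef_score_fm, coef_score_k))   # True True
```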

## References

- Lipman, Yaron, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. "Flow matching for generative modeling." ICLR 2023.
- Zheng, Qinqing, Matt Le, Neta Shaul, Yaron Lipman, Aditya Grover, and Ricky T. Q. Chen. "Guided flows for generative modeling and decision making." arXiv preprint arXiv:2311.13443 (2023).
- Kingma, Diederik, Tim Salimans, Ben Poole, and Jonathan Ho. "Variational diffusion models." Advances in Neural Information Processing Systems 34 (2021).
- Pokle, Ashwini, Matthew J. Muckley, Ricky T. Q. Chen, and Brian Karrer. "Training-free linear image inversion via flows." arXiv preprint arXiv:2310.04432 (2023).
- Efron, B. "Tweedie's formula and selection bias." Journal of the American Statistical Association, 106, 2011.
- Robbins, H. E. "An empirical Bayes approach to statistics." In Breakthroughs in Statistics: Foundations and Basic Theory, pp. 388–394. Springer, 1992.
- Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., and Poole, B. "Score-based generative modeling through stochastic differential equations." arXiv preprint arXiv:2011.13456 (2020).
- Karras, T., Aittala, M., Aila, T., and Laine, S. "Elucidating the design space of diffusion-based generative models." Advances in Neural Information Processing Systems 35 (2022), 26565–26577.