Running mean (batched version) with Welford's method

Definitions:
\[ u_n = \frac{\sum_{i=1}^n x_i}{n} \\ u_{n+k} = \frac{\sum_{i=1}^{n+k} x_i}{n+k}\\ \]

Problem:

  • We have previously computed \(u_n\).
  • Now we have a batch of new samples \(x_{n+1},\dots,x_{n+k}\).
  • We want to compute \(u_{n+k}\) without accessing the old values of \(x_1,\dots,x_n\).

We can decompose \(u_{n+k}\) as:
\[ u_{n+k} = u_{n+k} + u_n - u_n = u_n + (u_{n+k}-u_n) \]

If we had an easy expressión for \(u_{n+k}-u_n\), we could use the old value of \(u_n\) to efficiently calculate \(u_{n+k}\). Let's try some algebra to simplify \(u_{n+k}-u_n\):

\[ u_{n+k}-u_n = \frac{\sum_{i=1}^{n+k} x_i}{n+k} - \frac{\sum_{i=1}^n x_i}{n}\\ =\frac{ n \sum_{i=1}^{n+k} x_i}{(n+k)n} - (n+k) \frac{\sum_{i=1}^n x_i}{(n+k)n}\\ =\frac{ n \sum_{i=1}^{n+k} x_i - (n+k) \sum_{i=1}^n x_i}{(n+k)n} \\ =\frac{ n \sum_{i=1}^{n} x_i + n \sum_{i=n+1}^{n+k} x_i - (n+k) \sum_{i=1}^n x_i}{(n+k)n}\\ =\frac{ n \sum_{i=n+1}^{n+k} x_i - k \sum_{i=1}^n x_i}{(n+k)n}\\ =\frac{ \sum_{i=n+1}^{n+k} x_i - k u_n} {(n+k)}\\ \]

That seems simple enough:

  • Numerator
    • \(\sum_{i=n+1}^{n+k} x_i\)) it's just a sum of the new values
    • The old mean \(u_n\) is scaled by \(k\) so that the magnitudes of both of these quantities match
    • The substraction actually gives us \(\sum_{i=1}^{n+k} x_i\)!
  • Denominator: we divide by \(n+k\) to obtain the new mean

Therefore, our algorithm is:

\[ u_{n+k} = u_{n} + (u_{n+k}-u_n) \\ u_{n+k} = u_n + \frac{ (\sum_{i=n+1}^{n+k} x_i) - k u_n} {(n+k)} \quad \textit{(more efficient)}\\ u_{n+k} = u_n + \frac{ \sum_{i=n+1}^{n+k} (x_i - u_n)} {(n+k)} \quad \textit{(more precise)}\\ \]

We can use a class in Python to implement this algorithm for numpy arrays with batches \(x\) of \(k\) numbers:

class RunningMean:
    def __init__(self):
        self.n=0
        
    def update(self,x:np.ndarray):
        k = x.shape[0]
        if self.n==0:
            self.n=k
            self.mean = x.mean(axis=0)
        else:
            self.n+=k
            diff = x.sum(axis=0) - k * self.mean
            # alternative, less efficient, better precision
            # diff = (x - self.mean).sum(axis=0)
            self.mean = self.mean + diff / self.n

Welford's method

Welford's method is a stable method for computing a running mean. It is generally derived for cases where the updates happen only for a single new sample.

Note that the previous is a generalization of Welford's method for k=1, since:

\[ u_{n+k} = u_n + \frac{ (\sum_{i=n+1}^{n+k} x_i) - k u_n} {(n+k)} \\ \text{Replacing $k=1$:}\\ u_{n+1} = u_n + \frac{ x_{n+1} -u_n} {(n+1)} \\ \]

Select a repo