# Forecasting based on total raw power and average quality multiplier
## Summary
The algorithm currently used to predict 20-day block reward per sector (necessary to compute initial pledge collateral) suffers from two fundamental problems:
1) It may predict values outside valid variable ranges (for instance negative network power)
2) It cannot anticipate the baseline crossing, one of the most significant events affecting the amount of block rewards, so the prediction always lags in recognizing that the baseline was crossed.
We propose here a prediction algorithm that solves both of these problems, as well as producing more accurate predictions than the current algorithm.
## Introduction
We need to predict the expected 20-day block reward per sector. The block reward per sector is given by the ratio of total new rewards to total QAP. The current on-chain prediction mechanism tracks these two quantities, applies alpha-beta filters to them, and uses the filter estimates to predict the cumulative sum of their ratio.
There is an intrinsic problem to this approach, which is that these two quantities are not independent from each other. For instance, we plot here the data on total new rewards,

There is a large shift in behavior at the baseline crossing on **2021-04-03**. If we treat the **new rewards** as an independent variable, this shift seems completely unpredictable. Any prediction based only on **new reward** data will therefore be unaware of the upcoming baseline crossing, and there will be a 20-day lag before the prediction reflects that the baseline has been crossed.
The truth is that **new rewards** is not an independent variable, and it is instead completely derivable from the **total raw power**, as well as knowledge of the baseline power.
The details of how to compute the new rewards, based on the total raw power, are given in https://spec.filecoin.io/systems/filecoin_token/block_reward_minting/ .
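As a rough illustration of this dependence, the sketch below derives per-step new rewards from a raw power series, under a simplified reading of the minting model in the linked specification (simple minting plus baseline minting driven by cumulative capped raw power). All constants (`B0`, `G`, `LAM`, `M_SIMPLE`, `M_BASELINE`) and all function names are assumptions chosen for illustration, not values taken from this note; the specification remains the authoritative definition.

```python
import math

# All constants below are illustrative assumptions, not values from this note.
YEAR = 365.0                        # time unit: days
G = math.log(2.0) / YEAR            # assumed baseline growth rate (doubling per year)
LAM = math.log(2.0) / (6.0 * YEAR)  # assumed minting decay rate (6-year half-life)
B0 = 2.5                            # assumed initial baseline power (arbitrary units)
M_SIMPLE = 330e6                    # assumed simple-minting allocation
M_BASELINE = 770e6                  # assumed baseline-minting allocation


def baseline(t):
    """Baseline power at time t (days since network start)."""
    return B0 * math.exp(G * t)


def effective_time(cum_capped):
    """Effective network time implied by the cumulative capped raw power."""
    return math.log(G * cum_capped / B0 + 1.0) / G


def cumulative_minted(t, cum_capped):
    """Total minted by time t: simple part plus baseline part."""
    simple = M_SIMPLE * (1.0 - math.exp(-LAM * t))
    base = M_BASELINE * (1.0 - math.exp(-LAM * effective_time(cum_capped)))
    return simple + base


def new_rewards(raw_power, dt=1.0):
    """Per-step new rewards for a raw power series sampled every dt days."""
    rewards, cum_capped, prev = [], 0.0, 0.0
    for i, p in enumerate(raw_power):
        t = (i + 1) * dt
        # Only power up to the baseline counts towards baseline minting.
        cum_capped += min(p, baseline(i * dt)) * dt
        minted = cumulative_minted(t, cum_capped)
        rewards.append(minted - prev)
        prev = minted
    return rewards
```

In this sketch, the cap `min(p, baseline(...))` is what produces the change of behavior at the baseline crossing: once raw power exceeds the baseline, extra power no longer increases the minted amount.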
In this note, we therefore show how it makes more sense to track and make predictions based on two *independent* variables, namely the **total raw power** and the **average quality multiplier**$\equiv$**total QAP/total raw power**.
## Forecasting raw power with proper range
The current forecasting methods are based on linear extrapolation. We track the **new rewards** and **total QAP** data, estimate their current positions and velocities, and extrapolate under the assumption that they will continue evolving linearly with that velocity for the next 20 days.
This approach therefore assumes that the quantities being extrapolated actually evolve approximately linearly over a 20-day period.
In reality, we expect the **total raw power** to grow roughly exponentially, rather than linearly. This is already built into the assumption that its growth should be comparable to that of the baseline power, which is exponential.
There is also a more fundamental issue with linear extrapolations, which is that the quantities we are extrapolating have a limited range, and are not free to take any real value. **Total raw power**, **total QAP**, and **new rewards** are all restricted to being greater than or equal to zero.
This restriction has already been violated by the current alpha-beta filter based forecasting method. For instance, below is the current extrapolation for **total QAP**. There was an actual shock on **2020-12-20**, which was amplified in the prediction (by the 20-day factor). This amplification was so large that the predicted values entered the *unphysical* negative regime.

This is a problem for two reasons. First, these predicted values should never be negative. Second, the formula currently used to predict the cumulative sum of the ratio of these quantities takes their logarithm, which is undefined for negative values.
### Tracking and extrapolating Logarithm of total raw power
Our proposed solution for forecasting the total raw power within its proper range is to base the prediction on a linear extrapolation of the logarithm of the **total raw power** instead of the total raw power itself, with the following steps:
1) Compute $L=\log(\mathbf{total\,raw\,power})$.
2) Estimate an effective instantaneous velocity associated with $L$ at each point in time.
3) Predict the future $L$ by linear extrapolation based on the current velocity.
4) Filter this extrapolation if noisy.
5) Re-exponentiate to obtain the prediction $\mathbf{total\,raw\,power}=\exp(L)$.
This guarantees that the final prediction of **total raw power** *always remains positive*: even if the quantity we are actually linearly extrapolating (the logarithm of total raw power) becomes negative, the prediction is positive after exponentiation.
The predictions should also be more accurate, since the **logarithm of total raw power** should evolve more linearly than the **total raw power** itself, which, as mentioned above, is expected to grow roughly exponentially.
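A minimal sketch of steps 1 to 3 and 5 (the filtering in step 4 is omitted here), with a plain linear extrapolation alongside for comparison. The function names, and the choice to express `tau` and `horizon` as numbers of samples, are assumptions made for illustration.

```python
import math


def extrapolate_log_power(power, tau, horizon):
    """Predict power `horizon` steps ahead from the last `tau` steps of history."""
    L_now = math.log(power[-1])
    L_past = math.log(power[-1 - tau])
    v_L = (L_now - L_past) / tau      # velocity of the logarithm
    L_future = L_now + v_L * horizon  # linear extrapolation in log space
    return math.exp(L_future)         # back to power; always positive


def extrapolate_linear(power, tau, horizon):
    """Plain linear extrapolation of the power itself, for comparison only."""
    v = (power[-1] - power[-1 - tau]) / tau
    return power[-1] + v * horizon


# A sudden drop, extrapolated 20 steps ahead.
shock = [10.0, 10.0, 10.0, 4.0]
print(extrapolate_linear(shock, tau=1, horizon=20))     # negative value
print(extrapolate_log_power(shock, tau=1, horizon=20))  # small but positive
```

For a series that grows exactly exponentially, the log-space extrapolation reproduces the continuation of the series, which is the sense in which it matches the expected growth pattern better.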
## Forecasting average quality multiplier with proper range
We now also track the **average quality multiplier**, a quantity that is by definition restricted to values between 1 and 10 (under the current definition of quality multipliers, https://spec.filecoin.io/systems/filecoin_mining/sector/sector-quality/).
With the same motivation as before, if we apply simple linear extrapolation on this variable, the prediction may land outside the appropriate variable range. So our approach is to perform a similar variable mapping, to an alternative variable that can take any real value and we are free to linearly extrapolate. We then take this new variable and map it back to the bounded variable. One way to accomplish this is by switching to the variable
$$T\equiv\tanh^{-1}\left(\frac{\mathbf{average\,quality\,multiplier}-1}{9}\right)$$
Our extrapolation algorithm is then:
1) Compute $T$ based on the **average quality multiplier**.
2) Calculate an effective velocity associated with $T$.
3) Linearly extrapolate $T$.
4) Filter this extrapolation if noisy.
5) Compute the prediction for the average quality multiplier by inverting the mapping:
$$\mathbf{average\,quality\,multiplier}=9\tanh(T)+1$$
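The mapping and its inverse can be sketched as follows, again leaving out the filtering step. The names `to_T`, `from_T` and `extrapolate_quality` are hypothetical, and `tau` and `horizon` are assumed to be expressed in numbers of samples.

```python
import math

Q_MIN, Q_MAX = 1.0, 10.0  # bounds of the average quality multiplier


def to_T(q):
    """Map a multiplier in [1, 10) to T; math.atanh raises ValueError at q == 10."""
    return math.atanh((q - Q_MIN) / (Q_MAX - Q_MIN))


def from_T(T):
    """Invert the mapping. The result never exceeds 10.

    With this mapping valid inputs give T >= 0, and a negative T maps below 1.
    """
    return (Q_MAX - Q_MIN) * math.tanh(T) + Q_MIN


def extrapolate_quality(q, tau, horizon):
    """Predict the multiplier `horizon` steps ahead from the last `tau` steps."""
    T_now, T_past = to_T(q[-1]), to_T(q[-1 - tau])
    v_T = (T_now - T_past) / tau
    return from_T(T_now + v_T * horizon)


# A rising series that plain linear extrapolation would push far above 10.
rising = [8.0, 9.0, 9.5]
print(extrapolate_quality(rising, tau=1, horizon=20))  # stays at or below 10
```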
## Estimating velocities and smoothing post-extrapolation
To make our linear extrapolations, we need to estimate the current velocity associated with $L$ and $T$. As was shown in (https://hackmd.io/@R02mDHrYQ3C4PFmNaxF5bw/HJtL7i9HK), applying an alpha-beta filter just to obtain a velocity estimate is not very efficient, since the original data is already smooth enough. Instead, we estimate the velocity over a time window $\tau$, such that
$$v_L(t_0)=\frac{L(t_0)-L(t_0-\tau)}{\tau},$$
$$v_T(t_0)=\frac{T(t_0)-T(t_0-\tau)}{\tau},$$
where the size of $\tau$ acts as a smoothing factor, since the estimate equals the average velocity over the previous $\tau$ time steps.
We use these velocities to predict the future $L$ and $T$. These predictions are quite noisy and benefit from filtering, so we apply alpha-beta filters to the predictions for $L$ and $T$.
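For reference, a textbook alpha-beta filter applied to a sequence of noisy predictions can be sketched as below. This is the generic form of the filter, assuming a fixed sampling interval `dt` and initialization at the first sample with zero velocity; it is not necessarily identical to the implementation used to produce the figures.

```python
def alpha_beta_filter(samples, alpha, beta, dt=1.0):
    """Return smoothed position estimates for a sequence of noisy samples."""
    x, v = samples[0], 0.0  # initial position and velocity estimates
    smoothed = []
    for z in samples:
        x_pred = x + v * dt    # predict forward one step
        residual = z - x_pred  # measurement minus prediction
        x = x_pred + alpha * residual
        v = v + (beta / dt) * residual
        smoothed.append(x)
    return smoothed
```

Small values of `alpha` and `beta` make the filter respond slowly to each new sample, which smooths the output more heavily.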
<figure>

<figcaption align = "center"><b> Observed vs 20-day prediction of total raw power, with τ = 2 hours. The prediction is then filtered with α = 0.00007,
β = 0.0000000002 (values found by trial and error)</b></figcaption>
</figure>
We notice that even after the shock on **2020-12-20** is amplified by the extrapolation, it never goes below 0.
<figure>

<figcaption align = "center"><b> Observed vs 20-day prediction of average quality multiplier, with τ = 2 hours. The prediction is then filtered with α = 0.00003,
β = 0.000000001 (values found by trial and error)</b></figcaption>
</figure>
## Calculating the 20 day prediction for new rewards/total QAP
We have shown how to make 20-day predictions based on data on **total raw power** and **average quality multiplier**.
**Total new rewards** is computable from the **total raw power** data as discussed in https://spec.filecoin.io/systems/filecoin_token/block_reward_minting/ .
The **total QAP** can be computed as **total raw power \* average quality multiplier**.
With these two, and our predictions of **total raw power** and **average quality multiplier**, it is straightforward to make a prediction for the 20-day *extrapolation* of the ratio $R=$**new rewards/total QAP** with the following approach:
1) Given the current $L(t_0),\,T(t_0)$ use our algorithms to predict $L(t_0+20d),\,T(t_0+20d)$.
2) Use $L(t_0+20d),\,T(t_0+20d)$ to compute $\mathbf{new\,rewards}(t_0+20d),\,\,\mathbf{total\, QAP}(t_0+20d)$.
3) Compute predicted ratio,
$$R(t_0+20d)=\frac{\mathbf{new\,rewards}(t_0+20d)}{\mathbf{total\, QAP}(t_0+20d)}$$
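Steps 2 and 3 can be sketched as a single function. Here `rewards_from_power` is a hypothetical caller-supplied function that maps predicted total raw power to new rewards (for example, an implementation of the minting rules in the linked specification); it is an assumption of this sketch, as is the function name `predict_ratio`.

```python
import math


def predict_ratio(L_future, T_future, rewards_from_power):
    """Predicted new rewards / total QAP from the extrapolated L and T."""
    raw_power = math.exp(L_future)              # undo the logarithm
    quality = 9.0 * math.tanh(T_future) + 1.0   # undo the tanh mapping
    total_qap = raw_power * quality
    return rewards_from_power(raw_power) / total_qap


# Toy usage with a constant reward function (illustrative only).
print(predict_ratio(math.log(4.0), 0.0, lambda power: 10.0))
```

Because the reward is computed from the predicted raw power rather than extrapolated on its own, a predicted baseline crossing shows up in the ratio at the predicted time.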
<figure>

<figcaption align = "center"><b> Observed vs 20-day prediction of ratio of new rewards/total QAP</b></figcaption>
</figure>
## Predicting the 20 day Cumulative sum of R
Once we have a prediction $R(t_0+\Delta t)$, we can use this to estimate the cumulative sum
$$R_\Sigma(t_0+\Delta t)\equiv \int_0^{\Delta t} R(t_0+t)\,dt.$$
If we assume the evolution from $R(t_0)$ to our prediction $R(t_0+\Delta t)$ was linear, that gives us the following formula for the cumulative sum
$$R_\Sigma(t_0+\Delta t)=\Delta t\,R(t_0)+\frac{\Delta t}{2}\Delta R,$$
where $\Delta R=[R(t_0+\Delta t)-R(t_0)]$.
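To spell out the intermediate step, the linear assumption corresponds to the path
$$R(t_0+t)=R(t_0)+\frac{t}{\Delta t}\Delta R,\qquad 0\le t\le\Delta t,$$
and integrating it over the prediction horizon gives
$$\int_0^{\Delta t}\left[R(t_0)+\frac{t}{\Delta t}\Delta R\right]dt=\Delta t\,R(t_0)+\frac{\Delta R}{\Delta t}\,\frac{\Delta t^2}{2}=\Delta t\,R(t_0)+\frac{\Delta t}{2}\Delta R,$$
which is the trapezoid rule applied to the two endpoints. As an illustrative example with made-up numbers, $R(t_0)=2$, $R(t_0+\Delta t)=4$ and $\Delta t=20$ give $\Delta R=2$ and $R_\Sigma=20\cdot 2+10\cdot 2=60$.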
This prediction can be smoothed further by basing $\Delta R$ not only on the value calculated at $t_0$ but on a window of time $\tau_2$ before $t_0$, redefining
$$\Delta R=\frac{1}{\tau_2}\int_0^{\tau_2}[R(t_0+\Delta t-\tau^\prime)-R(t_0-\tau^\prime)]d\tau^\prime$$
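In discrete form, the windowed average and the resulting cumulative sum can be sketched as follows. The sketch assumes evenly spaced samples, approximates the integral by a simple mean over the last `tau2` samples, and uses hypothetical function names.

```python
def smoothed_delta_R(R_predicted, R_observed, tau2):
    """Average of (prediction - observation) over the last tau2 samples.

    R_predicted[i] is the prediction, made at sample i, of R one horizon
    ahead; R_observed[i] is the observed R at sample i. Both end at t0.
    """
    diffs = [p - o for p, o in zip(R_predicted[-tau2:], R_observed[-tau2:])]
    return sum(diffs) / len(diffs)


def cumulative_sum(R_now, delta_R, delta_t):
    """Cumulative sum of R over delta_t, assuming linear evolution."""
    return delta_t * R_now + 0.5 * delta_t * delta_R


# Toy usage with made-up numbers.
delta_R = smoothed_delta_R([3.0, 5.0, 7.0], [1.0, 1.0, 1.0], tau2=2)
print(cumulative_sum(1.0, delta_R, delta_t=20.0))  # 70.0
```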
Below we show our prediction for the 20 day cumulative sum of $R$ using this approach.
<figure>

<figcaption align = "center"><b> Observed vs prediction for 20-day Cumulative sum of ratio of new rewards/total QAP, using τ_2=2hours.</b></figcaption>
</figure>
We also calculate the mean percentage of error of this prediction, as defined in (https://hackmd.io/@R02mDHrYQ3C4PFmNaxF5bw/HJtL7i9HK),
$ME=1.046\%$
<figure>

<figcaption align = "center"><b> Observed vs TRUNCATED prediction for 20-day cumulative sum of ratio of new rewards/total QAP, with the current algorithm used by Filecoin. The artificial peak after the 2020-12-20 shock is larger than pictured here, but we truncated it for better visibility. Notice that there is an artificial lag time after baseline crossing.</b></figcaption>
</figure>
The current algorithm has a mean percentage of error of
$ME=1.205 \%$
## Summary of proposed algorithm
We propose a forecasting algorithm for the 20-day block reward per sector, which has the following advantages over the currently used algorithm:
1) By extrapolating the correct independent variables, we can easily predict an upcoming baseline crossing.
2) We ensure all variables stay inside their regime of validity (for instance, total network power may not become negative), a problem that has affected the currently used algorithm.
3) The end result is more accurate with a lower mean percentage of error.
Here are the steps of the proposed algorithm.
1) Read **total raw power** and **total QAP** data.
2) Compute **average quality multiplier** data as **total QAP/total raw power**.
1) Compute $L=\log(\mathbf{total\,raw\,power})$.
2) Estimate an effective instantaneous velocity associated with $L$ at each point in time $t_0$ as,
$$v_L(t_0)=\frac{L(t_0)-L(t_0-\tau)}{\tau},$$
for some period $\tau$, where $\tau$ functions as a smoothing parameter.
3) Predict the future $L$ by linear extrapolation based on the current velocity as,
$$L(t_0+\Delta t)=L(t_0)+v_L(t_0)\Delta t.$$
4) Alpha-beta filter this extrapolation.
5) Re-exponentiate to obtain the prediction $\mathbf{total\,raw\,power}(t_0+\Delta t)=\exp[L(t_0+\Delta t)]$.
1) Compute $T$ based on the **average quality multiplier** as,
$$T\equiv\tanh^{-1}\left(\frac{\mathbf{average\,quality\,multiplier}-1}{9}\right).$$
2) Calculate an effective velocity associated with $T$ at time $t_0$ as,
$$v_T(t_0)=\frac{T(t_0)-T(t_0-\tau)}{\tau}.$$
3) Linearly extrapolate $T$ as,
$$T(t_0+\Delta t)=T(t_0)+v_T(t_0)\Delta t.$$
4) Alpha-beta filter this extrapolation.
5) Compute the prediction for the average quality multiplier by inverting the mapping:
$$\mathbf{average\,quality\,multiplier}(t_0+\Delta t)=9\tanh[T(t_0+\Delta t)]+1$$
1) Compute **total QAP** as
$$\mathbf{total\,QAP}(t_0+\Delta t)=\mathbf{total\,raw\,power}(t_0+\Delta t)\cdot\mathbf{average\,quality\,multiplier}(t_0+\Delta t).$$
2) Compute $\mathbf{new\,rewards}(t_0+\Delta t)$ using $\mathbf{total\,raw\,power}(t_0+\Delta t)$, as described in (https://spec.filecoin.io/systems/filecoin_token/block_reward_minting/).
3) Compute the expected block reward per sector rate
$$R(t_0+\Delta t)=\frac{\mathbf{new\,rewards}(t_0+\Delta t)}{\mathbf{total\, QAP}(t_0+\Delta t)}$$
4) Smooth this prediction by averaging over the previous $\tau_2$ time periods,
$$\Delta R=\frac{1}{\tau_2}\int_0^{\tau_2}[R(t_0+\Delta t-\tau^\prime)-R(t_0-\tau^\prime)]d\tau^\prime$$
5) Use this to compute the expected *cumulative sum* of $R$,
$$R_\Sigma(t_0+\Delta t)=\Delta t\,R(t_0)+\frac{\Delta t}{2}\Delta R.$$