# PID-IIR primer
* Author: Robert Jördens <rj@quartiq.de>
* Date: 2021-06-10
* Revision: v0.7
* This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/deed.en_US)
## Introduction
* Control problem
* P: proportional action
* I: integral action
* D: derivative action
* No introduction to control system theory or design.
* No tutorial on the use of PID controllers.
* No analysis in the usual terms of control theory.
* No analysis of the system to be controlled (plant).
* No transfer functions, no Bode diagrams.
* No poles-zeros-gain formalism.
* No notch, bandstop, bandpass, lowpass, highpass filters.
* No discussion of phase/gain margins.
* No lead/lag. No compensation schemes.
* No controller tuning.
* Only implementation of the controller.
* Only in terms of (finite difference) operators.
* Mostly digital filter terminology and variable names.
[^Abramovitch2015]
[^Beauregard2011]
[^Abramovitch2015]: D. Y. Abramovitch, "Trying to keep it real: 25 Years of trying to get the stuff I learned in grad school to work on mechatronic systems," 2015 IEEE Conference on Control Applications (CCA), 2015, pp. 223-250, doi: [10.1109/CCA.2015.7320636](http://dx.doi.org/10.1109/cca.2015.7320636).
[^Beauregard2011]: Brett Beauregard, "Improving the Beginner’s PID", Blog post on the Arduino PID Library, April 2011 http://brettbeauregard.com/blog/2011/04/improving-the-beginners-pid-introduction/.
## Naive PID controller
### Analog
The naive approach outlined below is found in many texts, books, and papers [^Abramovitch2015][^Wescott] and it is frequently implemented as a first attempt to address a control problem.
[^Wescott]: Tim Wescott: PID Without a PhD, Wescott Design Services 2018 http://wescottdesign.com/articles/pid/pidWithoutAPhd.pdf
Input $x(t)$, setpoint/target $u$, output $y(t)$, error $e(t)$:
$$ e(t) = x(t) - u \\
y(t) = k_I\int_{-\infty}^te(t')dt' + k_Pe(t) + k_D\frac{\partial e(t)}{\partial t} $$
Units of the gain coefficients are $[k_I] = 1/\text{s}, [k_P] = 1, [k_D] = \text{s}$ with the natural time scale the SI second.
```graphviz
digraph {
node [fontname=helvetica];
rankdir = LR;
node [fixedsize=true];
node [shape=none]; x; y; u;
node [shape=circle]; sub [label="+"]; sum [label="+"];
{ rank = same; sub; u; }
j [shape=point];
node [shape=box];
id [label="1"];
int [label="Integ"];
der [label="Diff"];
node [shape=triangle, orientation=-90]; P; I; D;
u -> sub [headlabel="-", labelangle=-60];
x -> sub [headlabel="+", labelangle=60];
sub -> j [arrowhead=none, headlabel="e"];
j -> id:w; id -> P; P:e -> sum -> y;
j -> int:w; int -> I; I:e -> sum;
j -> der:w; der -> D; D:e -> sum;
}
```
#### Sign
Note that with this sign convention the overall negative feedback required in many cases to implement the correct controller response is expressed through negative gain coefficients. The error is then positive when the input exceeds the setpoint, as expected from the common use of the term "error".
Other authors unfortunately choose to invert the sign of the error instead and define $e' = u - x$. But it is unclear why a too-high input should correspond to a negative error: apart from contradicting the common use of the word "error", this convention leads to confusion and additional sign flips in the cases where feedback with overall positive sign is required due to actuator or sensor specifics. For example, a refrigerator must respond to a high temperature with a higher actuator setting (more cooling).
#### Variations
Several variations exist. For example, in some educational contexts a version may be presented where, instead of three individual gains, one common gain stage precedes all three operator paths, and integral and derivative action are then specified by their corner frequencies or characteristic times relative to the proportional action. None of this fundamentally affects the following. In fact, it exacerbates key issues, like the integral gain step described below.
### Digital
* Sample interval $\tau$.
* $i$ is the sample index, $i=0$ the current sample. Increasing $i$ refers to older samples of age $i\tau$.
$$ e_i = x_i - u $$
Finite differences yields the following approximation for the current output sample:
$$ y_0 = k_I\sum_{i=0}^\infty\tau e_i + k_Pe_0 + k_D\frac{e_0 - e_1}{\tau} $$
Such a digital implementation is only ever an approximation of the analog controller since $\tau > 0$. It is only accurate for frequencies significantly below half the sample rate (the Nyquist frequency $1/(2\tau)$).
To implement this controller, one stores both $e_1$ and an integrator/accumulator $w$ between iterations such that the update equations are:
$$ w_0 = w_1 + e_0 \\
y_0 = k_Iw_0\tau + k_Pe_0 + k_D\frac{e_0 - e_1}{\tau} $$
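For reference, the naive update equations can be sketched in a few lines of Python (the function name and the state dictionary are ours, purely for illustration):

```python
def naive_pid_step(x0, u, state, k_i, k_p, k_d, tau):
    """One iteration of the naive digital PID update (illustration only)."""
    e0 = x0 - u                        # error
    w0 = state["w"] + e0               # accumulate the raw, unscaled error
    y0 = k_i * w0 * tau + k_p * e0 + k_d * (e0 - state["e1"]) / tau
    state["w"], state["e1"] = w0, e0   # keep the integrator and e1 for next time
    return y0
```

Only $e_1$ and the accumulator $w$ survive between iterations; the flaws discussed next all trace back to this structure.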
### Flaws
We now show that the naive approach above has multiple severe flaws that are almost certain to be encountered in real-world scenarios. Many of these flaws are prominently present in well-known publications and implementations of PID controllers ([^Wescott] to give just one example). All of the flaws below are inherent to the naive discrete time ("digital") controller, but most also apply to the continuous time ("analog") controller that would be implemented using one or several (operational) amplifiers or even mechanical, pneumatic, or hydraulic components.
We'll try to address the flaws in the context of the digital controller here.
#### Units
Digital, periodically sampled data processing systems are governed by the Nyquist-Shannon sampling theorem. A frequency in SI Hertz or a time interval in SI seconds always needs to be viewed in relation to the relevant sample rate and sample interval respectively. For example, an input signal at the sampling frequency is indistinguishable from a constant (0 Hz) input signal and will be interpreted as such. This sets a natural scale for time and frequency in a sampled system and suggests the use of relative (not SI) units. With the naive implementation the magnitude of the quantities fails to convey or enforce the sampled nature of the system leading to misunderstanding and errors.
**Solution:** In the digital controller, set $\tau = 1$ in order to work with time intervals in units of the sample interval and frequencies in units of the sampling rate. In these units "$1$" is the natural scale and limit for frequency and time domain quantities. This moves factors of $\tau$ and $1/\tau$ from the implementation details closer to the user-facing side and allows reasoning about the capabilities of the controller versus the properties of the plant as well as the validity of the finite differences approximation. The numerical values of $k_I/\pi, k_P, k_D\pi$ now refer to gains extrapolated to Nyquist frequency (half the sampling frequency). These units also enable expressing the constraint that in frequency domain, I and D gain must intersect below the P gain as $k_Ik_D \le k_P^2$. Otherwise the controller response contains resonances and/or anti-resonances.
We write:
$$ y_0 = k_Iw_0 + k_Pe_0 + k_D(e_0 - e_1) $$
#### Integral gain step
In practice, the integral gain $k_I$ is not a constant but must be changed, tuned. With the naive implementation, an integral gain change $\Delta k_I$ leads to a large, instantaneous, lasting, undesired controller output response $\Delta k_I w_0$ that immediately moves the control loop out of regulation and clips the actuator.
**Solution:** Apply $k_I$ at each iteration, and only then integrate:
$$ w_0 = w_1 + k_Ie_0 \\
y_0 = w_0 + \ldots $$
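A minimal sketch of why this ordering helps: once each error sample is scaled by the gain before accumulation, past contributions stay frozen inside $w$ and a later gain change only affects future samples (numbers are arbitrary):

```python
def accumulate(w, e0, k_i):
    """Scale the error first, then integrate: w0 = w1 + k_i * e0."""
    return w + k_i * e0

# With this ordering, past contributions are frozen inside w at the gain
# that was active when they were accumulated. Halving k_i mid-run only
# rescales *future* errors; there is no instantaneous k_i * w output step.
w = 0.0
for e in [1.0, 1.0, 1.0]:
    w = accumulate(w, e, k_i=0.1)   # w ends near 0.3
w = accumulate(w, 1.0, k_i=0.05)    # gain change: w just grows by 0.05
```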
#### Derivative setpoint kick
In tracking control applications the setpoint $u$ follows prescribed trajectories with well-known spectra and rate-of-change. But also in stabilization control in practice the setpoint $u$ is not constant but changed during operation of the controller.
With the naive implementation, such a change of the setpoint by $\Delta u$ occurs within one controller cycle. It is differentiated and leads to a large (note that $|k_D|$ is typically large compared to $|k_I|$ and $|k_P|$), fast, and transient controller output kick response $k_D\Delta u$. This response is desired in tracking control applications but in stabilization control it is unphysical and undesired as it upsets the system and clips the controller.
**Solution:** Apply derivative gain only to the input, not to the error:
$$ y_0 = \ldots + k_D(x_0 - x_1) $$
#### Integral wind-up and output limiting
A practical output actuator $y_0$ has a finite range and is clipped: $y_0 = \text{clip}(y_0', y_\text{min}, y_\text{max})$.
Yet the naive PID output can and will often produce outputs arbitrarily far outside that range as soon as deviations from close-to-steady-state operation occur. This is the case in several common scenarios:
* Turn-on transients
* Large setpoint changes
* Interruption of the control loop closure, for example due to actuator disable or missing sensor data
* Dynamic change of the actuator range
* Large disturbances of the plant
During these situations the error is large for a long time and the response of the controller exceeds the range of the output, so the actuator cannot meet the controller demand. The integrator in particular contributes to this effect as it accumulates the large error for the duration of the transient: it will "wind up" and the output will stop reacting to any change in the error, even a sign change. To again reach a valid operating point and bring the output into range the integrator needs to be "unwound": the loop ultimately has to artificially cause a similarly large error of opposite sign for a similarly long duration. This is highly undesirable and completely unnecessary.
One clean and robust remedy is to ensure that the controller output is the only quantity where clipping is performed.
To achieve this, clipping or limiting the integrator $w_0$ alone is insufficient: the proportional and derivative terms can still bring the output out of range. Even backing off the $w_0$ limits away from the actual actuator output range of $y_0$ is often impractical: since the proportional and derivative terms can be large (in normalized units), the back-off would need to be large as well.
**Solution:** Eliminate the intermediate integrator $w$ by expressing it in terms of $y$ and $x$, additionally store $y_1$, $x_2$ and clip $y_0$ exclusively:
$$ w_1 = y_1 - k_Pe_1 - k_D(x_1 - x_2) \\
y_0' = y_1 + (k_P + k_I)e_0 - k_Pe_1 + k_D(x_0 -2x_1 + x_2) \\
= -k_Iu + y_1 + (k_P + k_I + k_D)x_0 - (k_P + 2k_D)x_1 + k_Dx_2 \\
y_0 = \text{clip}(y_0', y_\text{min}, y_\text{max})
$$
<!-- $$ y_0 = \left[y_0'\right]_{y_\text{min}}^{y_\text{max}} $$ -->
Here we have also eliminated the error $e_0$ and instead directly introduce the setpoint $u$ as an offset into the equation for $y_0'$. In doing so we dropped the proportional response to the setpoint in the same manner we have already dropped derivative response. For a constant setpoint in stabilization control this is of no consequence. To maintain proportional response to a setpoint change in tracking control, the setpoint needs to be tracked and $k_P\Delta u=k_P(u_0 - u_1)$ is added to $y_0'$.
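A sketch of the resulting update in Python (names are ours; the clip limits stand in for the actuator range) shows both the integrator-free form and the immediate recovery after saturation:

```python
def pid_step(x0, u, state, k_i, k_p, k_d, y_min, y_max):
    """Wind-up-free PID update: the output is the only clipped quantity."""
    x1, x2, y1 = state["x1"], state["x2"], state["y1"]
    e0, e1 = x0 - u, x1 - u
    y0 = y1 + (k_p + k_i) * e0 - k_p * e1 + k_d * (x0 - 2 * x1 + x2)
    y0 = min(max(y0, y_min), y_max)    # clip y0 exclusively
    state["x1"], state["x2"], state["y1"] = x0, x1, y0
    return y0
```

Because the stored $y_1$ is the clipped output, no state can run away beyond the actuator range: the moment the error changes sign, the output moves again, with no unwinding phase.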
#### Bumpless transfer
A typical operation mode involves approaching the desired working point by manually (or through some means other than PID control) directly setting the output $y_0$ and only then enabling the PID action to take over control.
The naive implementation handles this poorly: the correct integrator value has to be solved for based on the current gains, the current proportional error contribution, and the actual output.
Luckily addressing integral wind-up has also made it possible to derive a sound method to implement bumpless transfer.
**Solution:** Clear $k_{I,P,D}$ and adjust $y_1$ to the desired value. Then set $k_{I,P,D}$ to take over. That transfer can be made smooth and smart by ramping up the gains to their final value and/or in the sequence $k_D$, $k_P$, $k_I$, instead of enabling them all abruptly.
### Data flow
Having addressed the flaws of the naive PID implementation, we have arrived at a very different architecture from the naive one. Instead of presenting the full data flow of this improved naive PID controller, we now derive that same architecture again from first principles and later show the common data flow diagram of both the fixed naive implementation and the IIR-style PID controller.
## IIR implementation
We need to compute the new output sample $y_0$.
Assume we have the three most recent input samples $x_0, x_1, x_2$ available and the two most recent output samples $y_1, y_2$.
### Feed-forward
We start with a feed-forward equation with three coefficients $b$ where the output is affected by the input samples
$$ y_0 = b_0x_0 + b_1x_1 + b_2x_2 = b\cdot x.$$
This can approximate (in the standard finite difference sense) increasing powers of the differentiation operator (P=D⁰, D, and D²) using for example the following kernels:
* $b^{(0)} = [b^{(0)}_0, b^{(0)}_1, b^{(0)}_2] = [1, 0, 0]$
* $b^{(1)} = [1, -1, 0]$
* $b^{(2)} = [1, -2, 1]$
Note that there are other valid approximations, such as $b^{(0)} = [1, 1, 1]/3$ and $b^{(1)} = [1, 0, -1]/2$. The choice of derivative (or equivalently integration) rule is a large and multi-layered topic without a universally optimal selection [^Arias-Trijillo2015][^Lyons2019]. The kernels listed above correspond to the backward Euler rule and offer a small gain and phase advantage in some situations near Nyquist. A triangular integration rule (Tustin's approximation) is another interesting choice [^Abramovitch2015] but will fail dramatically for derivative action. Sufficiently far away from Nyquist the different approximations all become equivalent.
[^Arias-Trijillo2015]: Juana Arias-Trujillo, Rafael Blázquez, Susana López-Querol, "Frequency-Domain Assessment of Integration Schemes for Earthquake Engineering Problems", Mathematical Problems in Engineering, vol. 2015, Article ID 259530, 2015. https://doi.org/10.1155/2015/259530
[^Lyons2019]: Rick Lyons: [The Risk In Using Frequency Domain Curves To Evaluate Digital Integrator Performance](https://www.dsprelated.com/showarticle/1299.php), DSPrelated.com, 2019-09-24
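The kernels can be sanity-checked numerically: applied to samples of $x(t) = t^2$ (newest sample first, per the index convention above), $b^{(1)}$ yields the backward difference and $b^{(2)}$ the exact second difference (a sketch, helper name is ours):

```python
def apply(b, x):
    """Dot product of a 3-tap kernel with [x0, x1, x2] (x0 is the newest)."""
    return b[0] * x[0] + b[1] * x[1] + b[2] * x[2]

# Samples of x(t) = t^2 at t = 0, 1, 2; newest first: x0 = 4, x1 = 1, x2 = 0.
x = [4.0, 1.0, 0.0]
assert apply([1, 0, 0], x) == 4.0   # b0: identity (P)
assert apply([1, -1, 0], x) == 3.0  # b1: backward difference (D)
assert apply([1, -2, 1], x) == 2.0  # b2: second difference (D2), exact for t^2
```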
### Feed-back
Similarly, feed-back coefficients of a reversed equation where the current output is also affected by old outputs
$$ a_0y_0 + a_1y_1 + a_2y_2 = a\cdot y = x_0 $$
can represent increasing powers of the integration operator (P=I⁰, I, and I² respectively) with the same kernels as the $b^{(i)}$ for feed-forward above:
* $a^{(0)} = [a^{(0)}_0, a^{(0)}_1, a^{(0)}_2] = [1, 0, 0]$ is pure proportional feedback P: $y_0 = x_0$
* $a^{(1)} = [1, -1, 0]$ is I: $y_0 = y_1 + x_0$: an integrator/accumulator
* $a^{(2)} = [1, -2, 1]$ is I²: $y_0 = 2y_1 - y_2 + x_0$: a double integrator (this can be shown by introducing a helper integrator $w_0 = w_1 + x_0$, then expressing $w_1$ in terms of $y$ and $x$, and finally noticing that $y$ integrates $w$: $y_0 = y_1 + w_0$)
### Feed-forward and feed-back
Combining feed-forward and feed-back expressions, the kernels $b^{(\cdot)}, a^{(\cdot)}$ act as:
| | $b^{(0)}$ | $b^{(1)}$ | $b^{(2)}$ |
| --------- | --------- | --------- | --------- |
| $a^{(0)}$ | P | D | D² |
| $a^{(1)}$ | I | P | D |
| $a^{(2)}$ | I² | I | P |
Feed-forward and feed-back can be applied successively. Introducing $w_0$ as both the output of the feed-forward section as well as the input of the feed-back section, we have:
$$w_0 = b_0x_0 + b_1x_1 + b_2x_2 \\
a_0y_0 = w_0 - a_1y_1 - a_2y_2 = b_0x_0 + b_1x_1 + b_2x_2 - a_1y_1 - a_2y_2
.$$
This is the update equation for a universal biquadratic digital IIR filter.
We can now write the IIR-style PID update equation:
$$ a^{(1)}\cdot y = (k_I b^{(0)} + k_P b^{(1)} + k_D b^{(2)}) \cdot x $$
or in full:
$$ y_0 = y_1 + (k_I + k_P + k_D) x_0 - (k_P + 2k_D) x_1 + k_D x_2$$
Note that this is the exact same update equation as the one derived above after having addressed the critical flaws in the naive approach.
Similarly, an IIR-style PII² controller is implemented with:
$$ a^{(2)}\cdot y = (k_{I²} b^{(0)} + k_I b^{(1)} + k_P b^{(2)}) \cdot x $$
If the application needs both I² and D, a higher order IIR implementation with more taps, or preferably multiple biquads in series or parallel (with careful analysis of integral wind-up) can be used.
### Gain limit
The behavior of the IIR filter in the limit of low frequencies is fully determined by the contributions from lowest order kernels $b^{(i)}$ and $a^{(j)}$ that have non-zero gains $k_{i}$ and $k_{j}$. In that limit the gain of the filter is $k_i/k_j\omega^{i - j}$ with normalized angular frequency $\omega=2\pi\tau f$. Conversely the high-frequency gain is fully determined by the highest order non-zero $k_{m}b^{(m)}$ and $k_{n}a^{(n)}$ terms as $k_{m}/k_{n}\omega^{m - n}$.
This freedom of two additional parameters in the PID update equation can be used to affect the controller gain at low and high frequency.
For example, to limit the integral gain in a PID controller to $\hat{k_I}$, a term $k_I/\hat{k_I}a^{(0)}\cdot y$ is added to the update equation.
A derivative gain limit of $\hat{k_D}$ can simultaneously be implemented in the same controller:
$$ (k_I/\hat{k_I}a^{(0)} + a^{(1)} + k_D/\hat{k_D}a^{(2)})\cdot y
= (k_I b^{(0)} + k_P b^{(1)} + k_D b^{(2)}) \cdot x $$
In a PII² controller low frequency gain limits for the I and I² behavior can be implemented similarly.
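In coefficient form, a gain-limited PID is just a sum of scaled kernels on both sides of the update equation. A sketch (function name and argument handling are ours; the symbols follow the text):

```python
def pid_coefficients(k_i, k_p, k_d, k_i_lim=None, k_d_lim=None):
    """Biquad coefficients (a, b) for a PID with optional gain limits (sketch)."""
    ident, diff1, diff2 = [1, 0, 0], [1, -1, 0], [1, -2, 1]  # the three kernels
    b = [k_i * p + k_p * q + k_d * r
         for p, q, r in zip(ident, diff1, diff2)]
    a = list(diff1)                      # plain PID: one integration, a = a1 kernel
    if k_i_lim is not None:              # integral gain limit: add an a0 term
        a = [ai + k_i / k_i_lim * p for ai, p in zip(a, ident)]
    if k_d_lim is not None:              # derivative gain limit: add an a2 term
        a = [ai + k_d / k_d_lim * r for ai, r in zip(a, diff2)]
    return a, b
```

Without limits this reproduces the $b$ coefficients of the PID update equation above, $[k_I + k_P + k_D, -(k_P + 2k_D), k_D]$, with $a = a^{(1)}$.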
### Setpoint as offset
The only other immediate degree of freedom in the biquad update equation is an offset $v_u$. This is then the unique place to introduce the constant setpoint $u$ by pushing it from the input through the delays and multipliers of the feed-forward section into the summing junction as $v_u = -(b_0 + b_1 + b_2) u$.
Note that this offset only has an effect if there is a $b^{(0)}$ contribution. A good implementation will always reduce the degree of the $a$ and $b$ coefficients as much as possible by cancelling. This will either lead to a non-zero $b^{(0)}$ term or prove that the controller is purely derivative and has no gain at DC. In the latter case, a setpoint is irrelevant. In tracking control applications where higher order response to setpoint changes is desired, the same approach as for the naive implementation would be taken by introducing the setpoint early as an offset on $x$ or adding a $b^{(1)}\cdot u$ term to the summing junction.
### Multiple inputs
To incorporate data samples from other inputs, feed-forward terms from other sensors can be added to the summing junction using the kernels above and analyzed with the same tools. Using this formalism complex multiple-input multiple-output topologies can be generated.
### Implementation
#### Normalization
For an implementation it is obviously beneficial to normalize the coefficients without loss of generality to $a_0 = 1$ for a floating point implementation and $a_0 = 2^k$ for a fixed point (integer) implementation. The latter then only requires an arithmetic right shift after summing and before clipping and offers a convenient way to introduce a rounding bias. The magnitude of the shift and the bias depend on the maximum gain coefficient required.
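A sketch of the output scaling for an assumed $a_0 = 2^{14}$ (the shift value is an arbitrary example, not a recommendation):

```python
SHIFT = 14                 # a0 = 2**14, an assumed Q2.14-style scaling
BIAS = 1 << (SHIFT - 1)    # rounding bias: round half up instead of truncating

def scale_output(acc):
    """Divide the wide accumulator by a0 = 2**SHIFT with rounding."""
    return (acc + BIAS) >> SHIFT   # Python's >> on ints is an arithmetic shift

# 1.5 in this scaling is 1.5 * 2**14 = 24576; with the bias it rounds to 2,
# whereas a plain shift would truncate it to 1.
```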
#### Dynamic range
TBD
#### Signs and Clipping
An implementor will also want to invert the sign of $a_1$ and $a_2$ for faster and smaller code.
Then we are left with a dot product between the coefficient vector $b\otimes a = [b_0, b_1, b_2, -a_1, -a_2]$ and the sample vector $x\otimes y = [x_0, x_1, x_2, y_1, y_2]$, offset by the setpoint $v_u = -(b_0 + b_1 + b_2) u$, and clipped by the actuator range $[y_\text{min}, ..., y_\text{max}]$:
$$y_0' = v_u + (b\otimes a)\cdot(x\otimes y) = v_u + b_0x_0 + b_1x_1 + b_2x_2 - a_1y_1 - a_2y_2 \\
y_0 = \text{clip}(y_0', y_\text{min}, y_\text{max})
$$
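A direct Python transcription of this update (function name and buffer layout are ours):

```python
def biquad_step(x0, xy, ba, v_u, y_min, y_max):
    """One direct form I biquad iteration (sketch).

    xy holds [x1, x2, y1, y2] from the previous call;
    ba holds [b0, b1, b2, -a1, -a2] with a0 = 1.
    """
    y0 = v_u + sum(c * s for c, s in zip(ba, [x0] + xy))  # offset + dot product
    y0 = min(max(y0, y_min), y_max)                       # clip to actuator range
    xy[:] = [x0, xy[0], y0, xy[2]]                        # shift the delay lines
    return y0
```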
```graphviz
digraph g {
node [fontname=helvetica];
rankdir = LR;
node [fixedsize=true];
node [shape=none]; x; y;
{
rank=same;
sum [shape=circle, label="+"];
u [shape=plain, label="vᵤ"];
}
node [shape=triangle];
node [orientation=-90];
{
rank=same;
b0 [label=b₀];
b1 [label=b₁];
b2 [label=b₂];
}
node [orientation=90];
{
rank=same;
a0 [label="clip", shape=box];
a1 [label="-a₁"];
a2 [label="-a₂"];
}
node [shape=box];
{
rank=same;
xj [shape=point];
zb1 [label=z⁻¹];
zb2 [label=z⁻¹];
}
{
rank=same;
yj [shape=point];
za1 [label=z⁻¹];
za2 [label=z⁻¹];
}
edge [weight=10];
x -> xj [arrowhead=none];
xj -> b0 -> sum;
sum -> a0 [label="y'"];
a0 -> yj [arrowhead=none];
yj -> y;
edge [weight=1];
u -> sum;
xj -> zb1 -> b1; b1:e -> sum;
zb1 -> zb2 -> b2; b2:e -> sum;
yj -> za1;
za1 -> za2;
edge [dir=back];
sum -> a1:w; a1 -> za1;
sum -> a2:w; a2 -> za2;
}
```
Here we use a bit of digital signal processing notation where the register $z^{-1}$ corresponds to "delay by one cycle": if the input to a $z^{-1}$ block is $x_i$ the output is $x_{i + 1}$.
#### Topology and Overflow
This form of an IIR filter is called the "direct form I". When used with fixed point integers, it is free from internal overflow as long as the output $y_0'$ of the summing junction does not overflow. It could also be written in "transposed direct form II" with the same property.
#### Multiply-accumulate
The filter architecture can be implemented very efficiently on many relevant computing platforms using a fused/pipelined multiply-accumulate instruction (DSP, CPU, GPU) or primitive (FPGA, ASIC). These are the universal high-performance DSP building blocks.
When implemented on fixed point architectures, wide multiplications and wide accumulators (for example 32-by-32 bit multiplication with 64 bit output and 64 bit accumulators) enable a wide range of gain coefficients that can be represented.
#### Optimization
If there is no I² term required, there will be no $a_2$ and the equation simplifies further.
If additionally there is no need for a gain limit (see above), the $a_1$ coefficient is unity, removing another multiplication.
This leaves 3 multiplications, 4 additions, 3 stored samples, and 4 pieces of storage for coefficients and offset for a full universal PID controller.
Even in the absence of multiplication support on the target platform this is still a powerful pattern: often the coefficients can be approximated by powers of two or small sums of powers of two. Then cheap shift and add operations replace the multiplications.
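For example, a gain of $0.625 = 2^{-1} + 2^{-3}$ needs only one shift-add pair (illustrative; the result is exact only when the operand is divisible by the smallest power of two):

```python
def mul_0p625(x):
    """Multiply an integer by 0.625 = 1/2 + 1/8 using shifts and adds."""
    return (x >> 1) + (x >> 3)

# For x = 1024: 512 + 128 = 640 = 0.625 * 1024 exactly.
```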
#### Timing analysis
When implemented in programmable logic, the time critical path is the loop through the $a_1$ multiplier, the summing junction, and the clipping into the $z^{-1}$ register. The feed-forward half and part of the summation can be pipelined at some latency cost and the other register in the feedback path can be pushed through the $a_2$ multiplier to improve slack. If the single-cycle critical path is limiting performance, a multi-cycle approach with more pipelining registers may be beneficial if that would allow the clock frequency to be increased by more than a factor of two to compensate.
## Conclusion
## Thanks
The author would like to thank Norman Krackow and Sven Rotter for encouragement, critical reading, and constructive comments.