# Process for deciding on deprecations and breaking changes
Assessing impact
----------------
Here "impact" means how unmodified code may be negatively affected by a change, ignoring any deprecation period.
To get an idea of how much impact a change has, try to list all potential impacts. This will often be just a single item (a user of function `x` has to replace it with `y`), but there could be multiple distinct ones. Ideally, *after* listing all potential impacts, rank each of them on the following two scales (do not yet think about how to make the transition easier):
1. **Severity** (How bad is the impact for an affected user?)
- Minor: A performance regression or change in (undocumented) warning/error category will fall here. This type of change would normally not require a deprecation cycle or special consideration.
- Typical: Code must be updated to avoid an error, and the update is simple to do in a way that works on both existing and future NumPy versions.
- Severe: Code will error or crash, and there is no simple workaround or fix.
- Critical: Code returns incorrect results. A change requiring massive effort may fall here. A hard crash (e.g. a segfault) is not in itself necessarily critical; it is much less severe than a silent change.
2. **Likelihood** (How many users does the change affect?)
- Rare: Change has very few impacted users (or even no known users after a code search). The normal assumption is that there is always someone affected, but a rarely used keyword argument of an already rarely used function may fall here.
- Limited: Change is in a rarely used function or function argument. Another possibility is that it affects only a small group of very advanced users.
- Common: Change affects a bigger audience or multiple large downstream libraries.
- Ubiquitous: Change affects a large fraction of NumPy users.
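The "typical" severity category assumes the fix can be written so that it runs on both existing and future versions of the library. A minimal sketch of that pattern, using purely hypothetical names (stand-in namespaces rather than a real NumPy API):

```python
import types

# Stand-ins for an old and a new version of a library in which
# `old_name` was renamed to `new_name` (hypothetical names).
lib_old = types.SimpleNamespace(old_name=len)
lib_new = types.SimpleNamespace(new_name=len)

def get_func(lib):
    # Feature detection: prefer the new name, fall back to the old one,
    # so the same code runs unchanged on both library versions.
    new = getattr(lib, "new_name", None)
    return new if new is not None else lib.old_name

print(get_func(lib_old)([1, 2, 3]))  # -> 3
print(get_func(lib_new)([1, 2, 3]))  # -> 3
```

A user fix of this shape is what keeps an impact in the "typical" row rather than the "severe" one.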
The categories will not always be perfectly clear. That is OK. Rather than establishing precise guidelines, the purpose is a structured *process* that can be reviewed.
When the impact is exceptionally difficult to assess, it is often feasible to try a change on the development branch while signalling willingness to revert it. Downstream libraries test against it (and the release candidate), which gives a chance to correct an originally optimistic assessment.
After assessing each impact, it will fall somewhere on the following table:
Severity\Likelihood | Rare | Limited | Common | Ubiquitous
--------------------|------|---------|--------|-----------
**Minor** | ok | ok | ok? |
**Typical** | ok? | | | no?
**Severe** | | | no? | no
**Critical** | no? | no | no | no
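As an illustration only (this is not official tooling), the table can be encoded as a small lookup, with `None` marking the blank cells, i.e. the unclear middle ground that needs case-by-case discussion:

```python
# The impact table encoded as a dict: (severity, likelihood) -> verdict.
# None marks the blank cells that require case-by-case discussion.
IMPACT_TABLE = {
    ("minor",    "rare"):       "ok",
    ("minor",    "limited"):    "ok",
    ("minor",    "common"):     "ok?",
    ("minor",    "ubiquitous"): None,
    ("typical",  "rare"):       "ok?",
    ("typical",  "limited"):    None,
    ("typical",  "common"):     None,
    ("typical",  "ubiquitous"): "no?",
    ("severe",   "rare"):       None,
    ("severe",   "limited"):    None,
    ("severe",   "common"):     "no?",
    ("severe",   "ubiquitous"): "no",
    ("critical", "rare"):       "no?",
    ("critical", "limited"):    "no",
    ("critical", "common"):     "no",
    ("critical", "ubiquitous"): "no",
}

def assess(severity, likelihood):
    """Return the table verdict, or None when discussion is needed."""
    return IMPACT_TABLE[(severity.lower(), likelihood.lower())]

print(assess("Severe", "Ubiquitous"))  # -> no
print(assess("Minor", "Rare"))         # -> ok
```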
Note that all changes (except "minor" ones) should normally follow the two-release deprecation-warning policy. The "no" fields mean a change is clearly unacceptable, although a NEP can always overrule them. This table only assesses the "impact"; it does not weigh the impact against the benefits of the proposed change. That trade-off must be favourable no matter how small the impact is. However, having assessed the impact, it becomes easier to weigh it against the benefit.
(Note that the table is not symmetric. An impact with "critical" severity is rarely considered even when _no_ known users are impacted.)
### Mitigation and arguing of benefits
Any change falling outside the "ok" fields requires careful consideration. When an impact is larger, you can try to mitigate it to "move" on the table. Some possible ways to do this are:
* An avoidable warning for at least two releases (the policy for any change that modifies behaviour) reduces a change by one category (usually from "typical" to "minor" severity).
* The severity category may be reduced by creating an easy workaround (i.e. moving it from "severe" to "typical").
* Sometimes a change may break working code but also fix _existing_ bugs; this can offset the severity. In extreme cases, this may warrant classifying a change as a bug fix.
* For particularly noisy changes (i.e. the "ubiquitous" category), consider fixing downstream packages, delaying the warning, or using a `PendingDeprecationWarning`. Simply prolonging the deprecation period is also an option. This reduces how many users struggle with the change and smooths the transition.
* Exceptionally clear documentation and communication can make the impact more acceptable. This may not be enough to move a "category" by itself, but it helps.
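The two warning categories mentioned above differ in visibility: `DeprecationWarning` is shown in tests and `__main__` code, while `PendingDeprecationWarning` is silent by default. A minimal standard-library sketch, with a purely hypothetical function and keyword:

```python
import warnings

def normalize(values, old_keyword=None):
    # Hypothetical function, purely for illustration.
    if old_keyword is not None:
        # Standard two-release cycle: an avoidable DeprecationWarning
        # reduces a "typical" impact toward "minor".
        warnings.warn("`old_keyword` is deprecated; use the new API",
                      DeprecationWarning, stacklevel=2)
    return [v / max(values) for v in values]

def noisy_change(x):
    # For particularly noisy (ubiquitous) changes, start with a
    # PendingDeprecationWarning, which is silent by default.
    warnings.warn("this behaviour will change in a future release",
                  PendingDeprecationWarning, stacklevel=2)
    return x

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    normalize([1, 2, 4], old_keyword=True)
    noisy_change(3)

print([w.category.__name__ for w in caught])
# -> ['DeprecationWarning', 'PendingDeprecationWarning']
```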
After mitigation, the benefits can be assessed:
* Any benefit of the change can be argued to "offset" the impact. If this is necessary, a broad community discussion on the mailing list is required. It should be clear that this does not actually "mitigate" the impact but rather argues that the benefit outweighs it.
This is not a fixed set of rules, but a framework to assess and then try to mitigate the impact of a proposed change to an acceptable level. Arguing that a benefit can overcome multiple "impact" categories may require exceptionally large benefits, and most likely a NEP. For example, a change with an initial impact classification of "severe" and "ubiquitous" is unlikely to even be considered unless the severity can be reduced.
Many deprecations will fall at or below a "typical and limited" impact (e.g. removal of an uncommon function argument). These receive a deprecation warning to make the impact acceptable, together with a brief discussion that the change itself is worthwhile (e.g. the API is cleaner afterwards). Any more disruptive change requires broad community discussion: at least a thread on the NumPy mailing list, and the person proposing it will likely be asked to write a NEP.
### Summary and reasoning for this process
The aim of this process and table is to provide a loose formalism with the goal of:
* *Diligence:* Following this process ensures a detailed assessment of a change's impact without being distracted by the benefits. This is achieved by following well-defined steps:
1. Listing each potential impact (usually one).
2. Assessing the severity.
3. Assessing the likelihood.
4. Discussing what steps are/can be taken to lower the impact *ignoring any benefits*.
5. If the impact is not low at this point, this should prompt the consideration and listing of alternatives.
6. Arguing that the benefits outweigh the remaining impact. (This is a distinct step: the original impact assessment stands as it was.)
* *Transparency:* Using this process for difficult decisions makes it easier for the reviewer and community to follow how a decision was made and criticize it.
* *Nuance:* When it is clear that an impact is larger than typical, this will prompt more care and thought. In many cases it may also clarify that a change has a lower impact than expected at first sight.
* *Experience:* Using a similar formalism for many changes makes it easier to learn from past decisions by providing an approach to compare and conceptualize them.
We aim to follow these steps in the future for difficult decisions. In general, any reviewer or community member may ask for this process to be followed for a proposed change. If the change is difficult, the process will be worth the effort; if it is very low impact, it will be quick to clarify why.
NOTE: At this time the process is new and is expected to require clarification.
Examples
--------
It should be stressed again that the categories will rarely be clear-cut, and the examples below are intentionally categorized with some uncertainty. Even unclear categories can help in forming a clearer idea of a change.
### Histogram
The "histogram" example doesn't really add much with respect to this process, but noting the duplicated effort/impact would probably move it into a more severe category than most deprecations.
That makes it a more difficult decision and indicates that careful thought should be spent on alternatives.
### Integer indexing requirement
* Severity: Typical--Severe (although each fix is fairly easy, users often had to make many changes)
* Likelihood: Ubiquitous
How ubiquitous it really was probably only became clear after the (rc?) release. The change would now probably go through a NEP, as it initially falls into the lower-right part of the table.
To get into the "acceptable" part of the table we note that:
1. Real bugs were caught in the process (argued to reduce severity)
2. The deprecation was delayed and longer than normal (argued to mitigate the number of affected users by giving much more time)
Even with these considerations, it still has a large impact and clearly requires careful thought and community discussion about the benefits.
### Removing financial functions
* Severity: Severe (on the high end)
* Likelihood: Limited (maybe common)
While not used by a large user base (limited), the removal is disruptive (severe). The change ultimately required a NEP, since it is not easy to weigh the maintenance advantage of removing the functions against the impact on their users.
The NEP reduced the severity by providing a workaround: a pip-installable package as a drop-in replacement. For heavy users of these functions this is still more severe than most deprecations, but it lowered the impact assessment enough for the benefit of removal to outweigh the impact.