# Pulp3 Squashing Migrations
## Problem:
* We have a lot of migrations.
* Old migrations show paths where we have gone wrong and turned around.
* There are migrations to create tables that are removed by later migrations.
* There are migrations to rename fields on models.
* There are migrations that change the types of fields.
* Plugin migrations depend on pulpcore migrations.
* Running all migrations in a new installation takes a long time.
* We have supporting code (historical model) only needed to run ancient migrations.
## General Advice on Writing Migrations
* Do not seed data in migrations; use a `post_migrate` signal handler instead.
* Write data migrations only to update existing data and mark them `elidable=True`.
  * `pulpcore-manager squashmigrations` then drops them from the squashed migration, since a fresh installation has no existing data to update.
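
To find data migrations that still lack the flag, a small checker can help. The sketch below is not part of Pulp or Django: `non_elidable_operations` is a hypothetical helper that parses a migration file with the standard `ast` module and reports `RunPython`/`RunSQL` calls without a literal `elidable=True` keyword. The example migration source is made up and is only parsed, never executed.

```python
import ast

# Django operations that carry an `elidable` flag.
DATA_OPERATIONS = {"RunPython", "RunSQL"}


def non_elidable_operations(source):
    """Return (line, operation) pairs for data operations without elidable=True."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Accept both `migrations.RunPython(...)` and a bare `RunPython(...)`.
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name not in DATA_OPERATIONS:
            continue
        elidable = any(
            kw.arg == "elidable"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if not elidable:
            findings.append((node.lineno, name))
    return sorted(findings)


# Hypothetical migration source, used only as input for the checker.
EXAMPLE_MIGRATION = '''
from django.db import migrations

def backfill(apps, schema_editor):
    pass

class Migration(migrations.Migration):
    dependencies = []
    operations = [
        migrations.RunPython(backfill, migrations.RunPython.noop, elidable=True),
        migrations.RunSQL("UPDATE example SET done = TRUE"),
    ]
'''

for line, operation in non_elidable_operations(EXAMPLE_MIGRATION):
    print(f"line {line}: {operation} is not marked elidable")
```

The checker only recognizes the keyword form with a literal `True`; a flag passed positionally or through a variable would be reported as missing.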
## Squashing Migrations in Django
See the [Django docs](https://docs.djangoproject.com/en/3.2/topics/migrations/#migration-squashing).
Squashing is meant to be a two-step process in Django:
1. Squash the migrations up to a specific one.
   * The squashed migration declares that it replaces the original migrations; if none of those has been applied, it is used to migrate the database in one step.
   * Ideally Django will optimize all the migrations down to `CreateModel` operations plus some extra steps for foreign keys and indices.
2. Delete the replaced migrations once you can be sure that all installations seeing the code have applied either all or none of them.
   * This will remove any trace of the previous migrations.
   * While we may be able to do this in plugins, in pulpcore it may never be possible, or only once all plugins depend solely on the squashed migration.
     (Can we enforce this with the deprecation policy?)
Step one should be safe for Pulp at any time in core and in plugins.
Step two is problematic.
You cannot repeat step one without step two:
`CommandError: You cannot squash squashed migrations! Please transition it to a normal migration first: https://docs.djangoproject.com/en/3.2/topics/migrations/#squashing-migrations`
Should we start by squashing all migrations for a release? Probably not.
## General Advice on Squashing Migrations
* After squashing, check that the resulting schema matches your models.
  * There have been cases with incorrect `default` values.
    https://code.djangoproject.com/ticket/26223
* Inspect the generated migration. The optimization algorithm is not perfect.
  * Hand optimizing may be possible.
* If you change the number of the squashed migration from `0001_...` to one more than the last existing migration, Django will continue to count from there for future migrations.
  * Not sure this is a good idea.
* The squashed migration may declare multiple dependencies on migrations in pulpcore.
  * It should be safe to only keep the latest.
* Check the dependencies, and when deleting squashed migrations, adjust the dependencies of later migrations.
* Never, ever, ever rename an existing migration.
* Squashing a squashed migration again may optimize it even more.
  * The `optimizemigration` command (Django 4.1 and later) should help here.
* Keep at least one migration unsquashed to allow for easier dependency specifications.
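
Trimming a squashed migration's `dependencies` to the latest entry per app can be sketched as follows. This is an illustration only: `latest_dependencies` and the migration names are hypothetical, and the approach assumes each app's migrations form a linear chain, so that depending on the highest-numbered one implies all earlier ones.

```python
import re


def latest_dependencies(dependencies):
    """Keep, per app label, only the dependency with the highest numeric prefix."""
    latest = {}
    passthrough = []
    for app_label, name in dependencies:
        match = re.match(r"(\d+)_", name)
        if match is None:
            # Names such as "__first__" carry no number; leave them untouched.
            passthrough.append((app_label, name))
            continue
        number = int(match.group(1))
        if app_label not in latest or number > latest[app_label][0]:
            latest[app_label] = (number, name)
    return passthrough + [(app, name) for app, (_, name) in latest.items()]


# Hypothetical dependencies as they might appear in a freshly squashed plugin migration.
squashed_dependencies = [
    ("core", "0012_example"),
    ("core", "0020_example"),
    ("core", "0003_example"),
]
print(latest_dependencies(squashed_dependencies))
```

If an app's migration graph has branches or merge migrations, the highest number alone is not a reliable proxy and the list should be reviewed by hand.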
## Squashing Migrations in Pulp
Plugin migrations depend on pulpcore migrations.
But migrations may only be deleted once there are no more dependencies.
In general a plugin must be installable after all pulpcore migrations have run.
**Note:** By deleting migrations we lose the ability to upgrade from any earlier version.
Squashing migrations in plugins that no other plugin depends on should be much simpler.
Plan:
- Introduce empty migrations for breaking change releases to make the whole process more transparent.
  - Plugins should adjust their dependencies to point at these or later migrations.
- Squash pulpcore migrations in a breaking change release.
- Require all plugins to publish a release compatible with the squashed migration.
- Delete redundant migrations with the next breaking change release.
```
pulpcore 3.25
  0001_initial
  ...
  0020
  0001_squashed_0020   # "squashed migration"
  0021_release_3.25    # plugins depend on this once they declare pulpcore>=3.25,<3.30
pulpcore 3.30
  0001_squashed_0020   # rewritten to be a "normal migration" (i.e. loses its `replaces` attribute)
  0021_release_3.25    # plugins can still depend on this (will be removed in 3.35)
  0022
  ...
  0030
  0001_squashed_0030   # "squashed migration"
  0031_release_3.30    # plugins depend on this once they declare pulpcore>=3.30,<3.35
pulpcore 3.35
  0001_squashed_0030   # rewritten to be a "normal migration"
  0031_release_3.30    # plugins can still depend on this (will be removed in 3.40)
  0032
  ...
  0040
  0001_squashed_0040   # "squashed migration"
  0041_release_3.35    # plugins depend on this once they declare pulpcore>=3.35,<3.40
```
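The "rewritten to be a normal migration" step in the listing above comes down to removing the `replaces` attribute from the squashed migration's `Migration` class, which is how Django tells a squashed migration from a normal one. A minimal sketch of that rewrite, using only the standard `ast` module; `drop_replaces` and the sample source are hypothetical and not taken from any existing tool:

```python
import ast


def drop_replaces(source):
    """Return migration source without the `replaces = [...]` class attribute."""
    lines = source.splitlines(keepends=True)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for stmt in node.body:
            is_replaces = isinstance(stmt, ast.Assign) and any(
                isinstance(target, ast.Name) and target.id == "replaces"
                for target in stmt.targets
            )
            if is_replaces:
                # lineno and end_lineno are 1-based and inclusive.
                del lines[stmt.lineno - 1 : stmt.end_lineno]
                return "".join(lines)
    # Nothing to do: the migration is already a normal one.
    return source


# Hypothetical squashed migration, used only as text input.
SQUASHED = '''from django.db import migrations


class Migration(migrations.Migration):

    replaces = [
        ("core", "0001_initial"),
        ("core", "0002_example"),
    ]

    initial = True
    dependencies = []
    operations = []
'''

print(drop_replaces(SQUASHED))
```

The replaced migration files still have to be deleted, and any migration that depended on them has to be pointed at the squashed one. This sketch covers neither.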
There is an experiment to manipulate migrations as needed for this process:
https://github.com/mdellweg/pulpcore/tree/migration_tools
Django 4.1 ships its own `optimizemigration` command.
## 8.22.2022 - Notes
## Attendees: x9c4, ipanova, ggainey, dralley, humerto, fao89, bmbouter
* implications on old-installs: if you're too old, you may have to do multi-step updates
  * e.g.: to go from 3.24 to 3.36, you would need to upgrade to 3.25, then 3.30, then to 3.36
* ipanova: this has a large potential impact on our product-users
* bmbouter: even if squashing is painless, the multi-step-upgrade problem may make this a nonstarter
* dralley: not seeing a blocker to squashing **plugin** migrations, independent from whatever we do with pulpcore
* mdellweg: deleting migrations has the same impact on plugins as pulpcore, in terms of upgrade impact
* ipanova: need to think about this harder, esp the upgrade-path issue
* mdellweg: if we don't delete migrations, we avoid the multi-step-upgrade problem, but can only ever squash **once**
* ggainey/ipanova: current number of migrations is partially a result of the period we're coming out of (LOTS of migrations early in pulp3, rate-of-change WAY lower now)
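
The multi-step upgrade from the example above can be computed mechanically. The sketch below encodes one reading of the plan, stated here as an assumption: every breaking release squashes, and the following breaking release deletes what was squashed, so an installation can jump directly to any version below the second breaking release ahead of it. `upgrade_path` is a hypothetical helper and versions are plain tuples.

```python
def upgrade_path(current, target, squash_releases):
    """List the versions to install, in order, to get from current to target.

    Assumes target is newer than current and that each squash release deletes
    the migrations that the previous squash release replaced.
    """
    path = []
    while True:
        ahead = sorted(release for release in squash_releases if release > current)
        # A direct upgrade works as long as the target is older than the
        # second squash release ahead of the current version.
        if len(ahead) < 2 or target < ahead[1]:
            path.append(target)
            return path
        # Otherwise stop at the next squash release first.
        current = ahead[0]
        path.append(current)


SQUASH_RELEASES = [(3, 25), (3, 30), (3, 35)]

for version in upgrade_path((3, 24), (3, 36), SQUASH_RELEASES):
    print(".".join(str(part) for part in version))
```

For 3.24 to 3.36 this yields the stops 3.25, 3.30 and 3.36 from the notes. With no deleted migrations the list of squash releases is empty and every upgrade is a single step.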
## Action Item
* [mdellweg] open discussion on Discourse, point to this doc
  * larger audience than pulpcore