Pulp3 Squashing Migrations

Problem:

  • We have a lot of migrations.
  • Old migrations show paths where we have gone wrong and turned around.
  • There are migrations to create tables that are removed by later migrations.
  • There are migrations to rename fields on models.
  • There are migrations that change the types of fields.
  • Plugin migrations depend on pulpcore migrations.
  • Running all migrations in a new installation takes a long time.
  • We have supporting code (historical model) only needed to run ancient migrations.

General Advice on Writing Migrations

  • Do not seed data in migrations; use a post_migrate signal handler instead.
  • Write data migrations only to update existing data and mark them elidable=True.
    • pulpcore-manager squashmigrations will drop them, because a fresh installation has no existing data to update anyway.

Squashing Migrations in Django

See the Django docs.
Squashing is meant to be a two-step process in Django:

  1. Squash the migrations up to a specific one.
    • It will claim to replace the original migrations and if none of those was applied, it will be used to migrate the database in one step.
    • Ideally Django will optimize all the migrations to CreateModel and some extra steps for foreign keys and indices.
  2. Delete the replaced migrations once you can be sure that all installations seeing the code either have applied all or none of them.
    • This will remove any trace of the previous migrations.
    • While we may be able to do this in plugins, in pulpcore it only becomes possible once all plugins depend solely on the squashed migration, which may be never. (Can we enforce this with the deprecation policy?)

Step one should be safe for Pulp at any time in core and in plugins.
Step two is problematic.

You cannot repeat step one without step two: CommandError: You cannot squash squashed migrations! Please transition it to a normal migration first: https://docs.djangoproject.com/en/3.2/topics/migrations/#squashing-migrations

Should we start by squashing all migrations for a release? Probably not.

General Advice on Squashing Migrations

  • After squashing, check that the schema produced reflects your models (for example, pulpcore-manager makemigrations --check --dry-run should report no changes).
  • Observe the produced result. The optimization algorithm is not perfect.
    • Hand optimizing may be possible.
  • If you change the number of the squashed migration from 0001_... to one more than the last existing migration, Django will continue to count from there for future migrations.
    • Not sure this is a good idea.
  • The squashed migration may declare multiple dependencies on migrations in pulpcore.
    • It should be safe to only keep the latest.
  • Check the dependencies, and when deleting squashed migrations, adjust the dependencies of later migrations.
  • Never, ever, ever rename an existing migration.
  • Squashing a squashed migration again may optimize it even more.
    • The optimize command should help here.
  • Keep at least one migration unsquashed to allow for easier dependency specifications.
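
The advice to keep only the latest pulpcore dependency can be sketched as a small helper. This is an illustration under assumptions: the function name is made up, migration names are expected to start with a numeric prefix, the history of each app is assumed to be linear, and special targets such as "__first__" are not handled.

```python
def keep_latest_dependencies(dependencies):
    """Reduce a list of (app_label, migration_name) pairs to the
    highest-numbered migration per app."""

    def number(name):
        # "0020_example" -> 20
        return int(name.split("_", 1)[0])

    latest = {}
    for app_label, name in dependencies:
        if app_label not in latest or number(name) > number(latest[app_label]):
            latest[app_label] = name
    return sorted(latest.items())


# Hypothetical dependency list as a squashed plugin migration might declare it.
deps = [
    ("core", "0005_example"),
    ("core", "0020_example"),
    ("core", "0012_example"),
]
print(keep_latest_dependencies(deps))  # [('core', '0020_example')]
```

With a linear history, the latest migration already implies all earlier ones, which is why dropping the others should be safe.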

Squashing Migrations in Pulp

Plugin migrations depend on pulpcore migrations. But migrations may only be deleted once there are no more dependencies. In general a plugin must be installable after all pulpcore migrations have run.
Note: By deleting migrations we lose the ability to upgrade from any older version.
Squashing the migrations in plugins that no other plugin depends on should be much simpler.

Plan:

  • Introduce empty migrations for breaking change releases to make the whole process more transparent.
    • Plugins should adjust their dependencies on these or later migrations.
  • Squash pulpcore migrations in a breaking change release.
  • Require all plugins to publish a release that is compatible with the squashed migration.
  • Delete the redundant migrations with the next breaking change release.

pulpcore 3.25
  0001_initial
  ...
  0020
  0001_squashed_0020  # "squashed migration"
  0021_release_3.25  # plugins depend on this once they declare pulpcore>=3.25,<3.30
  
pulpcore 3.30
  0001_squashed_0020  # rewritten to be a "normal migration" (i.e. loses its 'replaces' attribute)
  0021_release_3.25  # plugins can still depend on this (will be removed in 3.35)
  0022
  ...
  0030
  0001_squashed_0030  # "squashed migration"
  0031_release_3.30  # plugins depend on this once they declare pulpcore>=3.30,<3.35

pulpcore 3.35
  0001_squashed_0030  # rewritten to be a "normal migration"
  0031_release_3.30  # plugins can still depend on this (will be removed in 3.40)
  0032
  ...
  0040
  0001_squashed_0040  # "squashed migration"
  0041_release_3.35  # plugins depend on this once they declare pulpcore>=3.35,<3.40
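
One consequence of this schedule is that the migrations replaced by a squash disappear one breaking release later, so an installation has to stop at the squashing release before moving to any release that no longer ships them. A small sketch of that rule; the function name and the list of squashing releases are illustrative assumptions derived from the schedule above.

```python
def parse(version):
    # "3.30" -> (3, 30); compare as integer tuples, never as floats.
    return tuple(int(part) for part in version.split("."))


def upgrade_path(current, target, squash_releases):
    """Return the releases to install, in order, to get from current to target.

    A squashing release is a mandatory stop if it is newer than the current
    version and the following squashing release (which deletes the replaced
    migrations) is not newer than the target.
    """
    cur, tgt = parse(current), parse(target)
    stops = sorted(squash_releases, key=parse)
    path = []
    for release, next_release in zip(stops, stops[1:]):
        if cur < parse(release) and parse(next_release) <= tgt:
            path.append(release)
    path.append(target)
    return path


SQUASH_RELEASES = ["3.25", "3.30", "3.35", "3.40"]  # assumed from the schedule
print(upgrade_path("3.24", "3.36", SQUASH_RELEASES))  # ['3.25', '3.30', '3.36']
```

An installation already on 3.26 would only need the stop at 3.30, since it has the first squashed migration recorded as applied.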

There is an experiment to manipulate migrations as needed for this process: https://github.com/mdellweg/pulpcore/tree/migration_tools
Django 4.1 ships its own optimizemigration command.

8.22.2022 - Notes

Attendees: x9c4, ipanova, ggainey, dralley, humerto, fao89, bmbouter

  • implications on old-installs: if you're too old, you may have to do multi-step updates
    • e.g.: to go from 3.24 to 3.36, you would need to upgrade to 3.25, then 3.30, then to 3.36
    • ipanova: this has a large potential impact on our product-users
    • bmbouter: even if squashing is painless, the multi-step-upgrade problem may make this a nonstarter
    • dralley: not seeing a blocker to squashing plugin migrations, independent from whatever we do with pulpcore
      • mdellweg: deleting migrations has the same impact on plugins as pulpcore, in terms of upgrade impact
    • ipanova: need to think about this harder, esp the upgrade-path issue
    • mdellweg: if we don't delete migrations, we avoid the multi-step-upgrade problem, but can only ever squash once
  • ggainey/ipanova: current number of migrations is partially a result of the period we're coming out of (LOTS of migrations early in pulp3, rate-of-change WAY lower now)

Action Item

  • [mdellweg] open discussion on Discourse, point to this doc
    • larger audience than pulpcore