Dropping DB columns

# Dropping DB columns Ok, so you wake up one day and decide you have to delete that DB column that doesn’t make more sense in your application anymore. Easy, right? Wrong. It’s not rocket science, but there are some caveats. But before moving on, let’s make some assumptions. ## Assumptions 1. We’ll be using Rails as our main framework, so Rails migrations will be responsible for changing the database. 1. Pull Requests (PRs) are used to review and merge changes into the main branch. 1. Each PR, when merged to master, will be built in CI to make sure everything works as expected before deploying this new code to our production servers 1. As soon as CI completes its validations and everything is good, then one deployment to production will kick-off 1. Any deployment has many steps, but for the sake of simplicity, let’s assume two steps in this order 1. Changes to the database are applied 1. The code is copied to the web servers, and the web servers are restarted by CI so that the new code becomes available in production Looks like Nitro, right? ## Example Let’s create an example to make things easier to understand, and explain. Assume we have a database table named `bands` with columns `id`, `name`, and `presentations`. The presentations column stores the number of live performances of a band, but that is not info we want to track anymore, so let’s drop this column. It’s reasonable to think that our application will have many code references to that column. Things like this: ```ruby # Read reference on the model Band.where(presentations: 0) # Write reference on the model @band.increment!(:presentations) # Read reference in a manual query ActiveRecord::Connection.execute("SELECT * FROM bands WHERE prensentations = 0") ``` Or even references in Javascript like: ```js // A field reference in a GraphQL schema const LIST_BANDS = gql` query { bands { id presentations } } ` // An object reference const bands = fetchBandsApi() bands[0].presentations ``` Cool, so what’s the challenge of dropping the column? The main challenge is that the database and the code must be in sync for an application to function correctly. What do I mean by that? I mean that any code reference must be pointing to a valid column to not break the application. For instance, given the `bands` table structure with `id`, `name`, and `presentations` columns, if a ruby code calls to an unexistent column, like `@band.genre`, an error code will be thrown saying that such column does not exist. In other words, your app is broken. ## Implementation Let’s implement this column drop by taking our example app and all the previous assumptions. ### Attempt #1 1. Create a rails migration that drops the column. Commit. 2. Perform a code search to remove or update each one of the code references. Commit. 3. Create a PR. 4. Merge. If you forgot any code reference, your app would break on the page that calls that reference. Okay, but even if you spent a lot of time on step #2 and you haven't missed any code references, your app will still be down for a little while. And this little while depends on how long it takes for your deployment process to apply the Rails migration and perform the webserver restart. Let’s call this time the “deployment gap”. ### Deployment Gap The problem with Attempt #1 is that the application will be broken during the deployment gap. That happens because the code running in the web server and the database are in sync right before your deployment starts. In other words, the code loaded in the web server has references such as `band.presentations` all over the place AND the database has that column, too. When your deployment starts, it will drop the column from the DB, but the webserver will not have restarted yet. So, the code will still have those code references, but the database no longer has that column. Everything gets back in sync as soon as the webserver is restarted with the new code version, and you’re good. ### Attempt #2 As you might have heard before, these changes in an application must be done in two separate PRs. The first PR will remove any code references, and a second PR will drop the column. That deals with the Deployment Gap issue since you only merge the second PR after the webservers have restarted with the changes applied from your first PR. In other words, by the time the second deployment kicks in, the web server will not have any code references to the dropped column since you have removed them in a previous PR. Everything should be fine then, right? Wrong. ### Rails magic There’s some Rails magic happening under the hood, and I don’t mean that in a bad way. When you start (restart) the webserver, Rails will read the database schema and use this information to build queries with ActiveRecord. So, a query like this: ```ruby Band.where(name: "Muse").eager_load(:songs) ``` Will be converted to a SQL query like the following in the DB layer: ```SQL SELECT bands.id AS t0_r0, bands.name AS t0_r1, bands.presentations AS t0_r2, songs.id AS t1_r0, songs.name AS t1_r1 FROM bands LEFT OUTER JOIN songs ON songs.band_id = bands.id WHERE bands.name = "Muse"; ``` Do you see the problem? Because the `presentations` column is still on the database/schema, then Rails will keep adding dynamic references to that column. That means that when you merge your second PR, the web server still has some references to the column you’re about to drop. And that means that your app will be broken again. ### Attempt #3 So we need 3 PRs? Not really. There’s a method in ActiveRecord that helps us in these situations. It’s called `ignore_columns`. As the name suggests, it’s telling ActiveRecord to ignore columns even though they are in the database/schema. To safely drop a column, we should go with the two PRs approach while including the ignore_columns in the first PR. For example: ```ruby class Band < ActiveRecord::Base ignore_columns :presentations end ``` Now we are all set. So, to safely drop columns, you can follow these steps: 1. Perform a code search to remove or update any code references to the column you're willing to drop. 1. Add the `ignore_columns` in the related model. 1. Create a PR and merge it. 1. Wait until the deployment finishes. 1. Create a rails migration that drops the column. 1. Merge it. ## Links - This post here aims to explain *why* deleting columns can be troublesome. As soon as you understand it, you don't need to use this post as a guide for the future. But instead, you can read the compact version of it in [our documentation](https://github.com/powerhome/nitro-web/blob/master/docs/database/database-standards.md#deleting-columns). - There's an interesting gem called [StrongMigrations](https://github.com/ankane/strong_migrations) that will watch your migrations to help you avoid this and many other unsafe migrations. - All the knowledge shared here was based on a real-life situation where the `ignore_columns` was not used, and the app was down for a little while. You can find some documentation about that topic [here](https://github.com/powerhome/nitro-web/pull/23556).