![Kagi Doggo Resting](https://assets.kagi.com/v1/kagi_assets/doggo/doggo_4.png)

# Kagi DB Practices

The following is a quick summary of the techniques we use to program our database layer, which have allowed us to scale our DB code with minimal issues as our app has grown in complexity and user base size.

## Tooling

### Cloud

We run on GCP, using Postgres under Cloud SQL, with one primary instance and many zonal read replicas.

### Libraries

We write in raw SQL using `crystal-db` and `crystal-pg`.

### Migrations

We use `micrate`. It has some rough edges, but it's a great & simple tool.

### psql

We make heavy use of `psql` for quick development. In particular, it becomes very nice to work with using `\e` to open `$EDITOR` to edit queries. Once the query works as we expect, because we do not use any ORM, it is mostly just a matter of pasting the query into our code and writing a datatype (if relevant).

I can also recommend some lines from my `.psqlrc`:

```
\set QUIET 1
\x auto
\timing
\set COMP_KEYWORD_CASE upper
\set PROMPT1 '%[%033[1m%]%M %n@%/%R%[%033[0m%]%# '
\set PROMPT2 '[more] %R > '
\set VERBOSITY verbose
\set HISTFILE ~/.psql_history- :DBNAME
\set HISTCONTROL ignoredups
\pset null '(null)'
\pset pager off
\pset linestyle unicode
\unset QUIET
```

With options like `\timing`, we get an immediate sense of whether something could be slow, and we can run the query planner for analysis before even integrating the query into code. It also means we use the same tool locally as we do for remote debugging on staging or production database instances - it's a great tool to become familiar with.

## API Design

All queries are exposed as static methods in our `DB` module:

```crystal
module DB
  struct ReturnType
    include ::DB::Serializable
    include ::DB::Serializable::NonStrict

    getter col1 : Int32
    getter col2 : Bool
  end

  def self.table_query_name(...) : ReturnType
    # ..
  end
end
```

Queries are NOT written for arbitrary CRUD updates to the backing table. Sometimes we do end up writing a plain CRUD layer, but that is not how we approach writing the API. Each query is written to serve a specific function that our app needs. In many cases, there is no singular "backing table" - a query may read a subset of a table, join multiple tables, or even return virtual columns that do not exist on disk. In a sense, our DB layer is "feature oriented" rather than an attempt at a generic "object relational" mapping.

Treating our DB code as a sort of FFI layer, this decoupling brings many benefits. If we alter or remove a query, we get a list of all affected app features via compiler errors. We can also change the query underneath without affecting app code. We can tune the performance of each query for its specific use case, rather than attempting to preemptively optimize arbitrary table access.

Cloud SQL metrics will show you the raw SQL submitted to the database, and each query's load on the database. Since we use raw SQL in our code, it is very easy to trace an under-performing query back to our code base with a grep.

### Naming

Query methods are usually named "big endian": the principal table they operate on comes first - usually whatever table the query is `FROM`, even if there are `JOIN`s involved - followed by a description of the action they perform. This keeps names mostly predictable, and editors still autocomplete this just fine, even without LSP.
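For instance, a few hypothetical bindings named under this convention (the tables and actions here are invented for illustration) would look like:

```crystal
module DB
  # principal table first, then the action performed on it
  def self.user_get(user_id : Int64)
    # ..
  end

  def self.user_email_update(user_id : Int64, email : String)
    # ..
  end

  def self.billing_invoice_list_for_user(user_id : Int64)
    # ..
  end

  # rather than action-first names like `get_user` or `update_user_email`
end
```

Grouping by table also keeps every query that touches `users` adjacent, both in the file and in autocomplete listings.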
### Heredocs

Almost all queries are written using Crystal's heredoc syntax:

```crystal
def self.table_action(param1, param2, *, db = SQL)
  db.exec(<<-SQL, param1, param2)
    UPDATE table
    SET param1 = $1, param2 = $2
    SQL
end
```

This allows formatting queries in a legible and unobtrusive way, without having to deal with any escapes. It will also trim leading whitespace using the last `SQL` as a guide, which makes logs easier to read as well. Finally, editors and sites like GitHub will actually recognize this and enable SQL syntax highlighting in that section.

Almost all queries use a static string. A select few build an SQL string incrementally: in most cases, this is simply tacking on an extra `WHERE` clause.

### Parameters

#### Objects vs IDs

API parameters accept only what they need. Generally this means that if a query operates on a user ID, it accepts just `user_id`, not a `User`. This gives a lot more flexibility by not forcing code to produce an entire `User` object when that particular operation does not need any field other than the user's ID.

We do not make overloads that accept `User`. That just adds noise / more code than typing `user.id`.

#### DB Parameter

*Every* query binding accepts an untyped, named `db` parameter. It has a default value of a connection pool instance (`DB::Database`). We have two connection pools in global constants:

- `SQL`, a read-write primary
- `SQL_RR`, a zonal read-only replica

> Weird naming, but `SQL` was a legacy decision when we started the codebase
> because we couldn't pick a better one. And `SQL_RR` was added later. You
> can probably pick better names.

This has a few properties:

- At a glance, you can confirm or search for which queries go to our read replica, and which need to perform writes.
- Switching a query between the RW primary and the RO replica means just changing `SQL` -> `SQL_RR`, or vice versa.

```crystal
# targets primary
def self.table_create(..., *, db = SQL)
  # ..
end

# targets read-replica
def self.table_get(..., *, db = SQL_RR) : ReturnType
  # ..
end
```

> The `*, db` enforces that `db` must be passed as an explicit
> `db:` argument by callers.

The untyped parameter allows either a `DB::Database` **or** a `DB::Connection` to be passed. This allows trivially composing multiple queries in a transaction with zero changes needed to the API:

```crystal
# one-shot queries, no tx
table_update1(id, param1)
table_update2(id, param1)
table_update3(id, param1)

# wrap in a tx
DB.transaction do |tx|
  db = tx.connection
  table_update1(id, param1, db: db)
  table_update2(id, param1, db: db)
  table_update3(id, param1, db: db)
end
```

These transaction operations themselves will usually be wrapped in their own static method, which also takes a `db` arg at the top level, for consistency with everything else.

### Return Types

Most often, each query is given its own `Serializable` type bound to the return type of that particular query. This way, each query is treated as its own independent unit of code that can be freely moved around and changed without concern of having a rippling effect across a lot of queries, e.g. when adding a new column without a default. This also gives us a lot of control over bandwidth and memory size at the query level - for example, some queries may only return the *subset* of the `users` table columns that are needed for that particular query.

Every type will `include ::DB::Serializable::NonStrict`. This is important for zero-downtime updates to our app - adding new columns should not crash any existing code.
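As a minimal sketch of how a query and its dedicated return type fit together (the `users` columns and the binding name here are hypothetical, and `SQL_RR` is the read-replica pool from above):

```crystal
module DB
  # Only the columns this particular feature needs, not the whole `users` row.
  struct UserContactInfo
    include ::DB::Serializable
    include ::DB::Serializable::NonStrict

    getter id : Int64
    getter email : String
    getter display_name : String?
  end

  def self.user_get_contact_info(user_id : Int64, *, db = SQL_RR) : UserContactInfo?
    rs = db.query(<<-SQL, user_id)
      SELECT id, email, display_name
      FROM users
      WHERE id = $1
      SQL
    UserContactInfo.from_rs(rs).first?
  end
end
```

With `NonStrict`, any columns in the result set that the struct does not declare are simply ignored rather than raising.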
We may place utility methods on these types, but they are *always* kept simple: performing some summary of the data or state of the returned object, and never e.g. performing more queries.

#### Enums

We prefer to use ints in the DB for Crystal enums, with explicitly declared values. We felt pg enums were not worth the investment, nor did we prefer `TEXT`. Consider also that we have things other than Crystal interacting with our tables (Python, Go, ...).

We use the following converters to override the default serialization, and everything has worked out fine.

```crystal
# Non-nullable INT column -> enum
module EnumInt(EnumT)
  def self.from_rs(rs)
    value = rs.read(Int32)
    EnumT.new(value)
  end
end

# Nullable INT column -> enum?
module EnumNullInt(EnumT)
  def self.from_rs(rs)
    value = rs.read(Int32?)
    return unless value
    EnumT.new(value)
  end
end

# Matches a raw text value against the (underscored) enum member names,
# falling back to the given block when there is no match.
macro enum_str_lookup(value, enum_t)
  case value
  {% for member in EnumT.constants %}
  when {{member.stringify.underscore}}.to_slice
    EnumT::{{member}}
  {% end %}
  else
    {{yield}}
  end
end

# Non-nullable TEXT column -> enum
module EnumStr(EnumT)
  def self.from_rs(rs)
    value = rs.read(Bytes)
    enum_str_lookup(value, EnumT) do
      raise "Unknown enum value for #{EnumT}: #{String.new(value)}"
    end
  end
end

# Nullable TEXT column -> enum?
module EnumNullStr(EnumT)
  def self.from_rs(rs)
    value = rs.read(Bytes?)
    return unless value
    enum_str_lookup(value, EnumT) do
      nil
    end
  end
end
```

The `Str` variants were used to transition some columns away from using strings.

### Update Confirmation

Minimally, our `UPDATE` or `DELETE` bindings will return a bool to confirm whether the query had any effect, in case the caller needs to know this. The majority operate directly on a PK, so there will always be 0 or 1 matches.

```crystal
def self.table_update(id : Int64, param : String, *, db = SQL) : Bool
  r = db.exec(<<-SQL, id, param)
    UPDATE table
    SET param = $2
    WHERE id = $1
    SQL
  r.rows_affected == 1 # or > 0, if editing multiple
end
```

### Errors

None of our DB bindings raise errors (besides those propagated from the `db` or `pg` shard itself). If the caller needs to handle some case, we return an enum or union, and the caller uses an exhaustive `case`. Exceptions are unnecessary overhead 99% of the time, where we just need to inform the immediate caller of some logical state. Exceptions are left for *actual* exceptions (protocol errors, unavailability, ...).

## Addendums

### Placement of Static Methods

Initially, we started by attaching static methods that do queries directly on the "model" that they primarily deal with. For example:

```crystal
module DB
  struct User
    def self.create(email : String) : User
      # ..
    end

    def self.get(email : String, full : Bool) : User?
      # ..
    end
  end
end
```

It felt fine to do this at first, but it also stemmed from the fact that we initially started our project with an ORM before removing it. However, as our codebase grew, it became apparent that this was not a great idea - more complex queries raised distracting questions as to which "model" they should be placed on, so you wind up with a mess of inconsistencies.

So I suggest: completely discard the idea of attaching queries to any "model" type; it is not useful, and it does not scale. All new queries that we write go directly as static methods on our `DB` module, with the freedom to pick an expressive name, decoupled from congregating around any one particular datatype or table.

Today we would write it as:

```crystal
module DB
  # Limited/essential data about a user (ID, email)
  struct UserLite
    # ..
  end

  # "All" user data (ID, email, + all settings)
  struct UserFull
    # ..
  end

  def self.user_create(email : String, *, db = SQL) : UserLite
    # ..
  end

  def self.user_get_lite(email : String) : UserLite?
    # ..
  end

  def self.user_get_full(email : String) : UserFull?
    # ..
  end
end
```
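Call sites then read directly off the `DB` module and, when needed, compose through the `db` parameter described earlier. A sketch, assuming the bindings above plus a hypothetical `user_settings_init` binding:

```crystal
# one-shot, no transaction
user = DB.user_get_lite(email) || DB.user_create(email)

# the same bindings composed into a transaction
DB.transaction do |tx|
  db = tx.connection
  user = DB.user_create(email, db: db)
  DB.user_settings_init(user.id, db: db)
end
```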