
Setting up a primary-standby PostgreSQL 12 Cluster for OpenNMS

I'm not a PostgreSQL expert. This guide should serve as a reference based on my experience with that database while helping customers deploy it in their environments.

PostgreSQL (PG) is a powerful, open source, object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.

OpenNMS uses PostgreSQL to store the inventory and the core objects OpenNMS requires to be able to work properly. In other words, PostgreSQL is an essential and mandatory component, and OpenNMS cannot work without it.

It is crucial to keep in mind that a PG cluster in primary-standby mode is for disaster recovery or high availability purposes. PostgreSQL is NOT a distributed database, meaning the data will always be handled by the primary server, which is the only one capable of executing mutations (i.e., add, modify and delete entries in tables). Standbys will be in replication mode all the time, and they can be used in read-only mode for reporting purposes, but not for normal OpenNMS operation.

I will describe how to install and configure a PostgreSQL 12 cluster in primary-standby mode on RHEL/CentOS 7/8 using the RPMs from yum.postgresql.org, because RHEL/CentOS 7 ships with PG 9.2 and RHEL/CentOS 8 ships with PG 10. PG 10 can be used with modern versions of OpenNMS Horizon and Meridian, although I'll focus on a newer version here.

If you need more than one instance of OpenNMS, we recommend having a dedicated PostgreSQL instance for each of them and avoiding sharing the same PostgreSQL server (or primary-standby cluster) across multiple OpenNMS servers. If there is a problem with PostgreSQL or the operating system it runs on, or maintenance work is required, that will affect ALL the OpenNMS servers (due to the unavoidable downtime it generates, as OpenNMS cannot work without PostgreSQL). Whereas if each OpenNMS has its own instance, any work on the PostgreSQL application or the underlying operating system will never affect the other OpenNMS instances.

Please keep in mind that you may find the term "master" used for the "primary" server and the term "slave" used for a "standby" server in the official documentation or blog posts.

NTP

All the PostgreSQL nodes must have their time synchronized at all times. Failing to do so can lead to data inconsistencies, as event and alarm operations in OpenNMS are time sensitive.

For this reason, it is mandatory to have NTP installed and always running to synchronize time.

It is also recommended to do the same on the OpenNMS server itself.
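
For example, on RHEL/CentOS 7/8 you can use chrony, the time service those distributions ship with (this is just a minimal sketch; point chrony at your own NTP servers if you have them):

sudo yum install -y chrony
sudo systemctl enable --now chronyd
chronyc tracking   # verify the clock is synchronized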

Install Packages

You must install the PostgreSQL YUM repository as described at yum.postgresql.org. That site explains how to add the repository to your Red Hat-based Linux distribution and how to install the server.

For instance, for RHEL/CentOS 7:

sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo yum install -y postgresql12-server

For instance, for RHEL/CentOS 8:

sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo dnf -qy module disable postgresql
sudo dnf install -y postgresql12-server

The default configuration and the access rules in pg_hba.conf must be updated for use with OpenNMS, especially if OpenNMS will be running on a different server (which is the recommended way to do it).

Besides that, and to facilitate the cluster's maintenance, I use repmgr, which simplifies the procedure to set up a primary-standby cluster and even offers a Linux daemon to automatically promote a standby to be the new primary in case the original primary becomes unavailable. However, keep in mind that this is a personal preference; you don't have to use it, as the native PostgreSQL tools can do the job. Other third-party tools can do the same work as repmgr, but that falls outside the scope of this guide.

The good news is that the YUM repository for PostgreSQL also contains the RPMs for repmgr, built against each major version of PostgreSQL.

sudo yum install -y repmgr_12

As mentioned, the versions must match, which is why the repmgr RPM package name includes the major version of PostgreSQL you're using.

Firewall

If the server where PostgreSQL is running has firewalld enabled, make sure to open the ports by doing the following:

sudo firewall-cmd --permanent --add-service=postgresql
sudo firewall-cmd --add-service=postgresql

Configuring repmgr

As we're going to use repmgr to manage registration and replication on the primary and the standbys, we need to configure that tool properly.

Each node that will be part of the PG cluster needs a unique numeric ID and a unique name (like the hostname). Additionally, the connection information must be supplied. These three elements will be different on each server (you could use the hostname if it is resolvable to the server's IP address or use the IP address instead). The rest of the entries will be common for all the servers.

Here is an example:

repmgr_nodeid=1
repmgr_nodename=$(hostname -s)
repmgr_cfg=/etc/repmgr/12/repmgr.conf
repmgr_bin=/usr/pgsql-12/bin/repmgr
sudo cp $repmgr_cfg $repmgr_cfg.original
cat <<EOF | sudo tee $repmgr_cfg
# Unique content
node_id=$repmgr_nodeid
node_name='$repmgr_nodename'
conninfo='host=$repmgr_nodename user=repmgr dbname=repmgr'
# Shared content
data_directory='/var/lib/pgsql/12/data'
use_replication_slots=true
log_level=INFO
reconnect_attempts=3
reconnect_interval=10
failover=manual
pg_bindir='/usr/pgsql-12/bin'
promote_command='$repmgr_bin standby promote -f $repmgr_cfg --log-to-file'
follow_command='$repmgr_bin standby follow -f $repmgr_cfg --log-to-file --upstream-node-id=%n'
service_start_command='sudo systemctl start postgresql-12'
service_stop_command='sudo systemctl stop postgresql-12'
service_reload_command='sudo systemctl reload postgresql-12'
service_restart_command='sudo systemctl restart postgresql-12'
EOF
sudo chown postgres:postgres $repmgr_cfg

The above assumes manual failover. If you enable automatic failover, keep in mind that it could lead to split-brain situations, making OpenNMS and the database useless if something goes wrong.

Also, it assumes the repmgr and postgres users at the operating system level have sudo access to manage the PostgreSQL service via systemctl.
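
As a sketch, that access could be granted with a sudoers drop-in like the following (the file name and rule list are assumptions; adjust them to your security policies):

cat <<EOF | sudo tee /etc/sudoers.d/postgres
postgres ALL=(root) NOPASSWD: /usr/bin/systemctl start postgresql-12, /usr/bin/systemctl stop postgresql-12, /usr/bin/systemctl reload postgresql-12, /usr/bin/systemctl restart postgresql-12
EOF
sudo chmod 440 /etc/sudoers.d/postgres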

Authentication

It is important to have authentication enabled to access PostgreSQL. I will discuss this more when explaining the changes in pg_hba.conf.

However, this can be inconvenient in some cases, so to simplify maintenance (and assuming only restricted personnel have root access to the PG servers), we need to create a password file. This file is located at /var/lib/pgsql/.pgpass and should be built like this:

repmgr_password="repmgr"
postgres_password="postgres"
pgpass="/var/lib/pgsql/.pgpass"
cat <<EOF | sudo tee $pgpass
*:*:*:repmgr:$repmgr_password
*:*:replication:postgres:$postgres_password
EOF
sudo chown postgres:postgres $pgpass
sudo chmod 600 $pgpass

Configuring the Primary

The following describes how to configure the primary for replication. For more information, follow the official documentation for high-availability.

Keep in mind that to avoid performance impacts, we recommend using asynchronous replication at the expense of consistency, meaning a standby would eventually be consistent, but that is not guaranteed. The requirements for synchronous replication affect performance, which is why we don't recommend following that path. It is still important to monitor the replication status per PostgreSQL's recommendations; the JDBC Collector in OpenNMS can help retrieve the metrics and create alarms when certain thresholds are crossed. However, because OpenNMS cannot operate without PostgreSQL, it is recommended to monitor PostgreSQL from another OpenNMS with its own DB instance, separate from the PostgreSQL cluster.

The first thing to do is initialize the database directory and start the PostgreSQL server:

sudo /usr/pgsql-12/bin/postgresql-12-setup initdb
sudo systemctl enable postgresql-12
sudo systemctl start postgresql-12

pg_hba.conf

For PostgreSQL 12, that file is located at /var/lib/pgsql/12/data/pg_hba.conf.

There are multiple ways to configure the content here, and the actual solution depends on how many servers need access for client operations and replication. The client-based operations come from the DBAs and OpenNMS; replication involves the postgres and repmgr users.

Access can be granted at the subnet level or by adding a discrete list of servers. The disadvantage of listing servers is that adding a new standby requires changing pg_hba.conf on every single instance, which could cause complications. Subnet-level access doesn't have that problem, but it could allow access to servers that shouldn't have it. This is why using credentials is essential and mandatory.

The following will assume that the OpenNMS server (or servers) and all the PostgreSQL servers will be in the same subnet. For demonstration purposes, let's assume that network is 192.168.1.0/24.

The content of that file should look like this:

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     peer
# IPv4 local connections:
host    all             all             127.0.0.1/32            ident
# IPv6 local connections:
host    all             all             ::1/128                 ident
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     postgres                                peer
host    replication     postgres        127.0.0.1/32            md5
host    replication     postgres        ::1/128                 md5
# OpenNMS connections:
host    opennms         opennms         192.168.1.0/24          md5
host    template1       opennms         192.168.1.0/24          md5
# repmgr connections:
host    repmgr          repmgr          192.168.1.0/24          md5
host    replication     repmgr          192.168.1.0/24          md5

Only the last two sections were added there, and authentication via md5 is required.

Make sure to adjust the content for your specific use case.
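
For instance, if you'd rather list servers explicitly instead of using a subnet, the same entries would look like this, one line per client with a /32 mask (the addresses below are hypothetical):

host    opennms         opennms         192.168.1.10/32         md5
host    template1       opennms         192.168.1.10/32         md5
host    repmgr          repmgr          192.168.1.11/32         md5
host    replication     repmgr          192.168.1.11/32         md5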

Speaking of passwords, make sure they are created properly (the following assumes PostgreSQL is up and running on the primary):

repmgr_password="repmgr"
postgres_password="postgres"
sudo -u postgres psql -c "CREATE USER repmgr SUPERUSER REPLICATION LOGIN ENCRYPTED PASSWORD '$repmgr_password';"
sudo -u postgres psql -c "CREATE DATABASE repmgr OWNER repmgr;"
sudo -u postgres psql -c "ALTER USER postgres WITH ENCRYPTED PASSWORD '$postgres_password';"

postgresql.conf

Then you need to modify /var/lib/pgsql/12/data/postgresql.conf to enable streaming replication.

listen_addresses = '*'
shared_preload_libraries = 'repmgr'
wal_level = 'replica'
wal_log_hints = on
wal_sender_timeout = 1s
max_replication_slots = 5
max_wal_senders = 16
wal_buffers = 16MB
checkpoint_completion_target = 0.7
hot_standby = on
log_connections = on
default_statistics_target = 100

The above was extracted from the PostgreSQL official documentation, plus a few useful blog posts I found.

Note that I'm assuming a maximum of 5 replication slots (i.e., max_replication_slots), meaning this cluster could have up to 5 standbys. It is better to set that value to a reasonable number from the beginning, as changing that in the future might be complicated (and falls outside the scope of this guide). Also, note that the max_wal_senders should be equal to or greater than max_replication_slots as recommended by the documentation.
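
If you want to check how many slots exist and whether they are in use, you can query the primary at any time (a quick sanity check, not part of the setup itself):

sudo -u postgres psql -c "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"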

Besides that, it is recommended to apply the basic tuning based on the hardware used. There are multiple ways to approach that, and I found the following site useful in that regard:

https://pgtune.leopard.in.ua/#/

I use "Mixed type of application" as the value for DB Type.

For instance, if the server has 8 Cores and 32 GB of RAM, and assuming 100 connections, you'll get for PG 12:

# DB Version: 12
# OS Type: linux
# DB Type: mixed
# Total Memory (RAM): 32 GB
# CPUs num: 8
# Connections num: 100
# Data Storage: ssd

max_connections = 100
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 10485kB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_parallel_maintenance_workers = 4

In case you're planning to share the PG cluster across multiple OpenNMS instances, even if we don't recommend doing so, keep in mind that max_connections should accommodate ALL the OpenNMS instances that will be connecting to the PG cluster, and that the default number of connections for the pool is 50 (configurable via opennms-datasources.xml). So, either reduce that in OpenNMS (which has the potential side effect of reducing performance) or increase the number of connections on your PostgreSQL server if your DB servers are powerful enough. For instance, the above example might run into trouble with, say, 10 OpenNMS servers, due to the overhead of maintaining over 500 connections.
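
For reference, the pool size lives in opennms-datasources.xml in a connection-pool element that looks roughly like the following (the attribute values shown here are illustrative; verify them against your own file):

<connection-pool idleTimeout="600"
                 loginTimeout="3"
                 minPool="25"
                 maxPool="50"
                 maxSize="50" />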

Once you have all the configuration changes in place for the primary, you must restart the service:

sudo systemctl restart postgresql-12

Register the server as primary

The following commands register and set up the PostgreSQL server with the primary role:

repmgr_cfg=/etc/repmgr/12/repmgr.conf
repmgr_bin=/usr/pgsql-12/bin/repmgr
sudo -u postgres $repmgr_bin -f $repmgr_cfg -v primary register
sudo -u postgres $repmgr_bin -f $repmgr_cfg cluster show

The last command shows the current state of the cluster, for instance:

 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------
 1  | pg01 | primary | * running |          | default  | 100      | 1        | host=pg01 user=repmgr dbname=repmgr

Adding and configuring a standby

Do not initialize the database as we did for the primary.

To add standbys to the cluster, we only need access to the primary server and repmgr configured as described above. At this point, PostgreSQL should not be running on the standby, only installed, and the data directory must be empty.

Then, you can configure the standby with the following command:

pg_master_server="pg01"
repmgr_cfg="/etc/repmgr/12/repmgr.conf"
repmgr_bin="/usr/pgsql-12/bin/repmgr"
sudo -u postgres $repmgr_bin -h $pg_master_server -U repmgr -d repmgr -f $repmgr_cfg -W standby clone

Make sure to adjust pg_master_server accordingly.

If there is a problem, make sure the password file was properly configured and has the right permissions.
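
A quick way to verify it is to connect from the standby to the primary as the repmgr user; with a correct .pgpass in place, this should succeed without prompting for a password (pg01 being the primary's name in this guide):

sudo -u postgres psql -h pg01 -U repmgr -d repmgr -c "SELECT 1;"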

Then, you can start the PostgreSQL server:

sudo systemctl enable postgresql-12
sudo systemctl start postgresql-12

Finally, you can register the standby:

sudo -u postgres $repmgr_bin -f $repmgr_cfg -v standby register
sudo -u postgres $repmgr_bin -f $repmgr_cfg cluster show

The last command shows the current state of the cluster, for instance:

 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------
 1  | pg01 | primary | * running |          | default  | 100      | 1        | host=pg01 user=repmgr dbname=repmgr
 2  | pg02 | standby |   running | pg01     | default  | 100      | 1        | host=pg02 user=repmgr dbname=repmgr

Repmgrd - Automatic Failover

Even if automatic failover looks very appealing, it might give you the false illusion that it will work for every use case, and that is not true. For instance, if a network problem prevents a standby running repmgrd from connecting to the primary, it will believe the primary is down and promote itself to be the new primary, even though the original primary might be up and running. If you have more standbys, they might do the same. When this happens, you end up in a split-brain situation that affects all applications requiring access to the database, like OpenNMS, because you must guarantee, at all times, that one and only one server has the primary role. This is another reason why sharing the same cluster across multiple OpenNMS servers is a bad idea.

If you're going to use it, make sure you have failover=automatic in /etc/repmgr/12/repmgr.conf, then start repmgrd:

sudo systemctl start repmgr12

Note that this should not be started with the operating system, which is why I'm skipping the enable part. You should run repmgrd on all the PG servers, and you must guarantee a reliable network across all of them.
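
To confirm the daemon is monitoring the cluster, repmgr offers a status subcommand (the subcommand is called daemon status in repmgr 5.0/5.1 and service status from 5.2 onward, so check your version):

sudo -u postgres /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf daemon status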

Configuring OpenNMS

We don't need complex third-party tools like PGPool-II here because the JDBC Driver for PostgreSQL supports multi-host entries within the URL, and you can even instruct the driver to prefer the primary.

The only caveat here is that the URL must be updated every time the PG cluster topology changes. For instance, if you add a new standby, the URL must be updated, and this requires restarting OpenNMS. This is particularly tedious if you're sharing the same PG cluster across multiple OpenNMS servers (another reason to avoid that).

Here is how it looks:

jdbc:postgresql://pgsrv01:5432,pgsrv02:5432/opennms?targetServerType=master&loadBalanceHosts=false

Within the XML configuration inside opennms-datasources.xml, make sure to escape the & character (it has to be &amp; because & is a reserved character in XML). For instance:

<jdbc-data-source name="opennms"
                  database-name="opennms"
                  class-name="org.postgresql.Driver"
                  url="jdbc:postgresql://pgsrv01:5432,pgsrv02:5432/opennms?targetServerType=master&amp;loadBalanceHosts=false"
                  user-name="opennms"
                  password="opennms" />
<jdbc-data-source name="opennms-admin"
                  database-name="template1"
                  class-name="org.postgresql.Driver"
                  url="jdbc:postgresql://pgsrv01:5432,pgsrv02:5432/template1?targetServerType=master&amp;loadBalanceHosts=false"
                  user-name="postgres"
                  password="postgres" />

Make sure to use the proper password for the postgres user and register a password for the OpenNMS application. Also, do not forget to adjust the URL with the proper FQDN or IP addresses of all the primary-standby PostgreSQL cluster members.

After executing the install script, you can set the password for the opennms user in PostgreSQL like this, executed from the primary server:

onms_password="opennms"
sudo -u postgres psql -c "ALTER USER opennms WITH ENCRYPTED PASSWORD '$onms_password';"

Test automatic failover

You can either stop or kill the PostgreSQL process on the primary server and then watch one of the standbys get promoted automatically. The timing depends on the repmgr parameters for retry and wait.
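
A minimal test sequence, assuming pg01 is the current primary and pg02 a standby running repmgrd, could look like this:

sudo systemctl stop postgresql-12   # on pg01, simulate the primary failing
sudo -u postgres /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster show   # on pg02, watch it become primary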

Of course, this will make OpenNMS inoperable until the standby becomes the new primary, the JDBC driver realizes that, finds the new primary in the multi-host list from the JDBC URL, and rebuilds the connections of the pool. Then OpenNMS will continue working, but it might not detect everything that happened during the transition, and anything it was trying to persist during that time will be lost. This is another reason why sharing the PG cluster is bad: all the OpenNMS servers will suffer from the interruption.

Promote a standby manually

If you're not using automatic failover, once you kill the primary, do the following on one of the standbys to promote it as the new primary:

sudo -u postgres /usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf
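
If the cluster has additional standbys, each of the remaining ones must be pointed at the new primary (this is the same operation the follow_command configured earlier automates):

sudo -u postgres /usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf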

Restore a broken server as standby

If you suspect data corruption on the file system on which the PostgreSQL data resides, you can restore the failed server as a brand new standby as described above. If not, there are alternatives, but I haven't explored those options.
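
As a sketch, re-adding a failed node as a brand-new standby boils down to wiping its data directory and repeating the clone procedure, plus a --force on the registration since the node is already known to the cluster (pg02 here stands for whichever server is the current primary):

sudo systemctl stop postgresql-12
sudo rm -rf /var/lib/pgsql/12/data/*
sudo -u postgres /usr/pgsql-12/bin/repmgr -h pg02 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf -W standby clone
sudo systemctl start postgresql-12
sudo -u postgres /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf standby register --force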

Monitor replication

From the primary:

postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+----------------------------------
pid              | 96891
usesysid         | 16384
usename          | repmgr
application_name | pgsrv02
client_addr      | 192.168.1.12
client_hostname  |
client_port      | 39028
backend_start    | 2021-06-09 10:14:36.181368-05
backend_xmin     |
state            | streaming
sent_lsn         | 9B/1C53EF40
write_lsn        | 9B/1C53EF40
flush_lsn        | 9B/1C53EF40
replay_lsn       | 9B/1C53EF40
write_lag        | 00:00:00.000497
flush_lag        | 00:00:00.001131
replay_lag       | 00:00:00.001133
sync_priority    | 0
sync_state       | async
reply_time       | 2021-06-09 14:00:39.605857-05

From the standby:

postgres=# \x
Expanded display is on.
postgres=# select pg_is_in_recovery(), pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();
-[ RECORD 1 ]-----------------+------------------------------
pg_is_in_recovery             | t
pg_is_wal_replay_paused       | f
pg_last_wal_receive_lsn       | 9B/2246C9B0
pg_last_wal_replay_lsn        | 9B/2246C9B0
pg_last_xact_replay_timestamp | 2021-06-09 14:07:19.736478-05

You can do some math to calculate the lag:

postgres=# SELECT CASE
             WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
             ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
           END AS log_delay;
-[ RECORD 1 ]
log_delay | 0

Ideally, it should be 0 or as close as possible to it.