
Setting up a primary-standby PostgreSQL 12 Cluster for OpenNMS

I'm not a PostgreSQL expert. This guide should serve as a reference based on my experience with that database while helping customers deploy it in their environments.

PostgreSQL (PG) is a powerful, open source, object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.

OpenNMS uses PostgreSQL to store the inventory and the core objects OpenNMS requires to be able to work properly. In other words, PostgreSQL is an essential and mandatory component, and OpenNMS cannot work without it.

It is crucial to keep in mind that a PG cluster in primary-standby mode is for disaster recovery or high availability purposes. PostgreSQL is NOT a distributed database, meaning the data will always be handled by the primary server, which is the only one capable of executing mutations (i.e., add, modify and delete entries in tables). Standbys will be in replication mode all the time, and they can be used in read-only mode for reporting purposes, but not for normal OpenNMS operation.

I will describe how to install and configure a PostgreSQL 12 cluster in primary-standby mode on RHEL/CentOS 7/8 using the RPMs from yum.postgresql.org, because RHEL/CentOS 7 ships with PG 9.2 and RHEL/CentOS 8 ships with PG 10. PG 10 can be used with modern versions of OpenNMS Horizon and Meridian, although I'll focus on a newer version here.

If you need more than one instance of OpenNMS, we recommend having a dedicated PostgreSQL instance for each of them and avoiding sharing the same PostgreSQL server (or primary-standby cluster) across multiple OpenNMS servers. If there is a problem with PostgreSQL or the operating system it runs on, or maintenance work is required, that will affect ALL the OpenNMS servers (due to the unavoidable downtime it generates, as OpenNMS cannot work without PostgreSQL). Whereas if each OpenNMS has its own instance, any work on the PostgreSQL application or the underlying operating system will never affect the other OpenNMS instances.

Please keep in mind that you may find the term "master" used for the "primary" server and the term "slave" used for a "standby" server in the official documentation or blog posts.

NTP

All the PostgreSQL nodes must have their time synchronized at all times. Failing to do so can lead to data inconsistencies, as event and alarm operations in OpenNMS are time sensitive.

For this reason, it is mandatory to have NTP installed and always running to synchronize time.

It is also recommended to do the same on the OpenNMS server itself.
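
For example, on RHEL/CentOS 7/8 you can use chrony, the time service those distributions ship with (this is just a minimal sketch; point chrony at your own NTP servers if you have them):

sudo yum install -y chrony
sudo systemctl enable --now chronyd
chronyc tracking   # verify the clock is synchronized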

Install Packages

You must install the PostgreSQL YUM repository as described at yum.postgresql.org. That site explains how to add the repository to your Red Hat-based Linux distribution and how to install the server.

For instance, for RHEL/CentOS 7:

sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo yum install -y postgresql12-server

For instance, for RHEL/CentOS 8:

sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo dnf -qy module disable postgresql
sudo dnf install -y postgresql12-server

The default configuration and the access rules in pg_hba.conf must be updated for use with OpenNMS, especially if OpenNMS will be running on a different server (which is the recommended way to do it).

Besides that, and to facilitate the cluster's maintenance, I use repmgr, which simplifies the procedure to set up a primary-standby cluster and even offers a Linux daemon to automatically promote a standby to be the new primary in case the original primary becomes unavailable. However, keep in mind that this is a personal preference; you don't have to use it, as the native PostgreSQL tools can do the job. Other third-party tools can do the same work as repmgr, but that falls outside the scope of this guide.

The good news is that the YUM repository for PostgreSQL also contains the RPMs for repmgr, built against each major version of PostgreSQL.

sudo yum install -y repmgr_12

As mentioned, the versions must match, which is why the repmgr RPM package name includes the major version of PostgreSQL you're using.

Firewall

If the server where PostgreSQL is running has firewalld enabled, make sure to open the ports by doing the following:

sudo firewall-cmd --permanent --add-service=postgresql
sudo firewall-cmd --add-service=postgresql

Configuring repmgr

As we're going to use repmgr to manage registration and replication on the primary and the standbys, we need to configure that tool properly.

Each node that will be part of the PG cluster needs a unique numeric ID and a unique name (like the hostname). Additionally, the connection information must be supplied. These three elements will be different on each server (you could use the hostname if it is resolvable to the server's IP address or use the IP address instead). The rest of the entries will be common for all the servers.

Here is an example:

repmgr_nodeid=1
repmgr_nodename=$(hostname -s)
repmgr_cfg=/etc/repmgr/12/repmgr.conf
repmgr_bin=/usr/pgsql-12/bin/repmgr
sudo cp $repmgr_cfg $repmgr_cfg.original
cat <<EOF | sudo tee $repmgr_cfg
# Unique content
node_id=$repmgr_nodeid
node_name='$repmgr_nodename'
conninfo='host=$repmgr_nodename user=repmgr dbname=repmgr'
# Shared content
data_directory='/var/lib/pgsql/12/data'
use_replication_slots=true
log_level=INFO
reconnect_attempts=3
reconnect_interval=10
failover=manual
pg_bindir='/usr/pgsql-12/bin'
promote_command='$repmgr_bin standby promote -f $repmgr_cfg --log-to-file'
follow_command='$repmgr_bin standby follow -f $repmgr_cfg --log-to-file --upstream-node-id=%n'
service_start_command='sudo systemctl start postgresql-12'
service_stop_command='sudo systemctl stop postgresql-12'
service_reload_command='sudo systemctl reload postgresql-12'
service_restart_command='sudo systemctl restart postgresql-12'
EOF
sudo chown postgres:postgres $repmgr_cfg

The above assumes manual failover. If you enable automatic failover, keep in mind that it could lead to split-brain situations, making OpenNMS and the database useless if something goes wrong.

Also, it assumes the repmgr and postgres users at the operating system level have sudo access to manage the PostgreSQL service via systemctl.
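
As a sketch, that access could be granted with a sudoers drop-in like the following (the file name and rule list are assumptions; adjust them to your security policies):

cat <<EOF | sudo tee /etc/sudoers.d/postgres
postgres ALL=(root) NOPASSWD: /usr/bin/systemctl start postgresql-12, /usr/bin/systemctl stop postgresql-12, /usr/bin/systemctl reload postgresql-12, /usr/bin/systemctl restart postgresql-12
EOF
sudo chmod 440 /etc/sudoers.d/postgres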

Authentication

It is important to have authentication enabled to access PostgreSQL. I will discuss this more when explaining the changes in pg_hba.conf.

However, this can be inconvenient in some cases, so to simplify maintenance (and assuming only restricted personnel have root access to the PG servers), we need to create a password file. This file is located at /var/lib/pgsql/.pgpass and should be built like this:

repmgr_password="repmgr"
postgres_password="postgres"
pgpass="/var/lib/pgsql/.pgpass"
cat <<EOF | sudo tee $pgpass
*:*:*:repmgr:$repmgr_password
*:*:replication:postgres:$postgres_password
EOF
sudo chown postgres:postgres $pgpass
sudo chmod 600 $pgpass

Configuring the Primary

The following describes how to configure the primary for replication. For more information, follow the official documentation for high-availability.

Keep in mind that to avoid performance impacts, we recommend using asynchronous replication at the expense of consistency, meaning a standby would eventually be consistent, but that is not guaranteed. The requirements for synchronous replication affect performance, which is why we don't recommend following that path. It is still important to monitor the replication status per PostgreSQL's recommendations; the JDBC Collector in OpenNMS can help retrieve the metrics and create alarms when certain thresholds are crossed. However, because OpenNMS cannot operate without PostgreSQL, it is recommended to monitor PostgreSQL from another OpenNMS with its own DB instance, separate from the PostgreSQL cluster.

The first thing to do is initialize the database directory and start the PostgreSQL server:

sudo /usr/pgsql-12/bin/postgresql-12-setup initdb
sudo systemctl enable postgresql-12
sudo systemctl start postgresql-12

pg_hba.conf

For PostgreSQL 12, that file is located at /var/lib/pgsql/12/data/pg_hba.conf.

There are multiple ways to configure the content here, and the actual solution depends on how many servers need access for client operations and replication. The client-based operations come from the DBAs and OpenNMS; replication involves the postgres and repmgr users.

Access can be granted at the subnet level or by adding a discrete list of servers. The disadvantage of listing servers is that adding a new standby requires changing pg_hba.conf on every single instance, which could cause complications. Subnet-level access doesn't have that problem, but it could allow access to servers that shouldn't have it. This is why using credentials is essential and mandatory.

The following will assume that the OpenNMS server (or servers) and all the PostgreSQL servers will be in the same subnet. For demonstration purposes, let's assume that network is 192.168.1.0/24.

The content of that file should look like this:

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     peer
# IPv4 local connections:
host    all             all             127.0.0.1/32            ident
# IPv6 local connections:
host    all             all             ::1/128                 ident
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     postgres                                peer
host    replication     postgres        127.0.0.1/32            md5
host    replication     postgres        ::1/128                 md5
# OpenNMS connections:
host    opennms         opennms         192.168.1.0/24          md5
host    template1       opennms         192.168.1.0/24          md5
# repmgr connections:
host    repmgr          repmgr          192.168.1.0/24          md5
host    replication     repmgr          192.168.1.0/24          md5

Only the last two sections were added there, and authentication via md5 is required.

Make sure to adjust the content for your specific use case.
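
For instance, if you'd rather list servers explicitly instead of using a subnet, the same entries would look like this, one line per client with a /32 mask (the addresses below are hypothetical):

host    opennms         opennms         192.168.1.10/32         md5
host    template1       opennms         192.168.1.10/32         md5
host    repmgr          repmgr          192.168.1.11/32         md5
host    replication     repmgr          192.168.1.11/32         md5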

Speaking of passwords, make sure they are created properly (the following assumes PostgreSQL is up and running on the primary):

repmgr_password="repmgr"
postgres_password="postgres"
sudo -u postgres psql -c "CREATE USER repmgr SUPERUSER REPLICATION LOGIN ENCRYPTED PASSWORD '$repmgr_password';"
sudo -u postgres psql -c "CREATE DATABASE repmgr OWNER repmgr;"
sudo -u postgres psql -c "ALTER USER postgres WITH ENCRYPTED PASSWORD '$postgres_password';"

postgresql.conf

Then you need to modify /var/lib/pgsql/12/data/postgresql.conf to enable streaming replication.

listen_addresses = '*'
shared_preload_libraries = 'repmgr'
wal_level = 'replica'
wal_log_hints = on
wal_sender_timeout = 1s
max_replication_slots = 5
max_wal_senders = 16
wal_buffers = 16MB
checkpoint_completion_target = 0.7
hot_standby = on
log_connections = on
default_statistics_target = 100

The above was extracted from the PostgreSQL official documentation, plus a few useful blog posts I found.

Note that I'm assuming a maximum of 5 replication slots (i.e., max_replication_slots), meaning this cluster could have up to 5 standbys. It is better to set that value to a reasonable number from the beginning, as changing that in the future might be complicated (and falls outside the scope of this guide). Also, note that the max_wal_senders should be equal to or greater than max_replication_slots as recommended by the documentation.
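
If you want to check how many slots exist and whether they are in use, you can query the primary at any time (a quick sanity check, not part of the setup itself):

sudo -u postgres psql -c "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"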

Besides that, it is recommended to apply the basic tuning based on the hardware used. There are multiple ways to approach that, and I found the following site useful in that regard:

https://pgtune.leopard.in.ua/#/

I use "Mixed type of application" as the value for DB Type.

For instance, if the server has 8 Cores and 32 GB of RAM, and assuming 100 connections, you'll get for PG 12:

# DB Version: 12
# OS Type: linux
# DB Type: mixed
# Total Memory (RAM): 32 GB
# CPUs num: 8
# Connections num: 100
# Data Storage: ssd

max_connections = 100
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 10485kB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 8
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_parallel_maintenance_workers = 4

In case you're planning to share the PG cluster across multiple OpenNMS instances, even if we don't recommend doing so, keep in mind that max_connections should accommodate ALL the OpenNMS instances that will be connecting to the PG cluster, and that the default number of connections for the pool is 50 (configurable via opennms-datasources.xml). So, either reduce that in OpenNMS (which has the potential side effect of reducing performance) or increase the number of connections on your PostgreSQL server if your DB servers are powerful enough. For instance, the above example might run into trouble with, say, 10 OpenNMS servers, due to the overhead of maintaining over 500 connections.
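
For reference, the pool size lives in opennms-datasources.xml in a connection-pool element that looks roughly like the following (the attribute values shown here are illustrative; verify them against your own file):

<connection-pool idleTimeout="600"
                 loginTimeout="3"
                 minPool="25"
                 maxPool="50"
                 maxSize="50" />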

Once you have all the configuration changes in place for the primary, you must restart the service:

sudo systemctl restart postgresql-12

Register the server as primary

The following commands register and set up the PostgreSQL server with the primary role:

repmgr_cfg=/etc/repmgr/12/repmgr.conf
repmgr_bin=/usr/pgsql-12/bin/repmgr
sudo -u postgres $repmgr_bin -f $repmgr_cfg -v primary register
sudo -u postgres $repmgr_bin -f $repmgr_cfg cluster show

The last command shows the current state of the cluster, for instance:

 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------
 1  | pg01 | primary | * running |          | default  | 100      | 1        | host=pg01 user=repmgr dbname=repmgr

Adding and configuring a standby

Do not initialize the database as we did for the primary.

To add standbys to the cluster, we only need access to the primary server and repmgr configured as described above. At this point, PostgreSQL should not be running on the standby, only installed, and the data directory must be empty.

Then, you can configure the standby with the following command:

pg_master_server="pg01"
repmgr_cfg="/etc/repmgr/12/repmgr.conf"
repmgr_bin="/usr/pgsql-12/bin/repmgr"
sudo -u postgres $repmgr_bin -h $pg_master_server -U repmgr -d repmgr -f $repmgr_cfg -W standby clone

Make sure to adjust pg_master_server accordingly.

If there is a problem, make sure the password file was properly configured and has the right permissions.
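
A quick way to verify it is to connect from the standby to the primary as the repmgr user; with a correct .pgpass in place, this should succeed without prompting for a password (pg01 being the primary's name in this guide):

sudo -u postgres psql -h pg01 -U repmgr -d repmgr -c "SELECT 1;"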

Then, you can start the PostgreSQL server:

sudo systemctl enable postgresql-12
sudo systemctl start postgresql-12

Finally, you can register the standby:

sudo -u postgres $repmgr_bin -f $repmgr_cfg -v standby register
sudo -u postgres $repmgr_bin -f $repmgr_cfg cluster show

The last command shows the current state of the cluster, for instance:

 ID | Name | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+-------------------------------------
 1  | pg01 | primary | * running |          | default  | 100      | 1        | host=pg01 user=repmgr dbname=repmgr
 2  | pg02 | standby |   running | pg01     | default  | 100      | 1        | host=pg02 user=repmgr dbname=repmgr

Repmgrd - Automatic Failover

Even if automatic failover looks very appealing, it might give you the false illusion that it will work for every use case, and that is not true. For instance, if a network problem prevents a standby running repmgrd from connecting to the primary, it will believe the primary is down and promote itself to be the new primary, even though the original primary might be up and running. If you have more standbys, they might do the same. When this happens, you end up in a split-brain situation that affects all applications requiring access to the database, like OpenNMS, because you must guarantee, at all times, that one and only one server has the primary role. This is another reason why sharing the same cluster across multiple OpenNMS servers is a bad idea.

If you're going to use it, make sure you have failover=automatic in /etc/repmgr/12/repmgr.conf, then start repmgrd:

sudo systemctl start repmgr12

Note that this should not be started with the operating system, which is why I'm skipping the enable part. You should run repmgrd on all the PG servers, and you must guarantee a reliable network across all of them.
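
To confirm the daemon is monitoring the cluster, repmgr offers a status subcommand (the subcommand is called daemon status in repmgr 5.0/5.1 and service status from 5.2 onward, so check your version):

sudo -u postgres /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf daemon status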

Configuring OpenNMS

We don't need complex third-party tools like PGPool-II here because the JDBC Driver for PostgreSQL supports multi-host entries within the URL, and you can even instruct the driver to prefer the primary.

The only caveat here is that the URL must be updated every time the PG cluster topology changes. For instance, if you add a new standby, the URL must be updated, and this requires restarting OpenNMS. This is particularly tedious if you're sharing the same PG cluster across multiple OpenNMS servers (another reason to avoid that).

Here is how it looks:

jdbc:postgresql://pgsrv01:5432,pgsrv02:5432/opennms?targetServerType=master&loadBalanceHosts=false

Within the XML configuration inside opennms-datasources.xml, make sure to escape the & character (it has to be &amp; because & is a reserved character in XML). For instance:

<jdbc-data-source name="opennms"
                  database-name="opennms"
                  class-name="org.postgresql.Driver"
                  url="jdbc:postgresql://pgsrv01:5432,pgsrv02:5432/opennms?targetServerType=master&amp;loadBalanceHosts=false"
                  user-name="opennms"
                  password="opennms" />
<jdbc-data-source name="opennms-admin"
                  database-name="template1"
                  class-name="org.postgresql.Driver"
                  url="jdbc:postgresql://pgsrv01:5432,pgsrv02:5432/template1?targetServerType=master&amp;loadBalanceHosts=false"
                  user-name="postgres"
                  password="postgres" />

Make sure to use the proper password for the postgres user and register a password for the OpenNMS application. Also, do not forget to adjust the URL with the proper FQDN or IP addresses of all the primary-standby PostgreSQL cluster members.

After executing the install script, you can set the password for the opennms user in PostgreSQL like this, executed from the primary server:

onms_password="opennms"
sudo -u postgres psql -c "ALTER USER opennms WITH ENCRYPTED PASSWORD '$onms_password';"

Test automatic failover

You can either stop or kill the PostgreSQL process on the primary server and then watch one of the standbys get promoted automatically. The timing depends on the repmgr parameters for retry and wait.
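
A minimal test sequence, assuming pg01 is the current primary and pg02 a standby running repmgrd, could look like this:

sudo systemctl stop postgresql-12   # on pg01, simulate the primary failing
sudo -u postgres /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster show   # on pg02, watch it become primary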

Of course, this will make OpenNMS inoperable until the standby becomes the new primary, the JDBC driver realizes that, finds the new primary in the multi-host list from the JDBC URL, and rebuilds the connections of the pool. Then OpenNMS will continue working, but it might not detect everything that happened during the transition, and anything it was trying to persist during that time will be lost. This is another reason why sharing the PG cluster is bad: all the OpenNMS servers will suffer from the interruption.

Promote a standby manually

If you're not using automatic failover, once you kill the primary, do the following on one of the standbys to promote it as the new primary:

sudo -u postgres /usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf
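
If the cluster has additional standbys, each of the remaining ones must be pointed at the new primary (this is the same operation the follow_command configured earlier automates):

sudo -u postgres /usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf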

Restore a broken server as standby

If you suspect data corruption on the file system on which the PostgreSQL data resides, you can restore the failed server as a brand new standby as described above. If not, there are alternatives, but I haven't explored those options.
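
As a sketch, re-adding a failed node as a brand-new standby boils down to wiping its data directory and repeating the clone procedure, plus a --force on the registration since the node is already known to the cluster (pg02 here stands for whichever server is the current primary):

sudo systemctl stop postgresql-12
sudo rm -rf /var/lib/pgsql/12/data/*
sudo -u postgres /usr/pgsql-12/bin/repmgr -h pg02 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf -W standby clone
sudo systemctl start postgresql-12
sudo -u postgres /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf standby register --force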

Monitor replication

From the primary:

postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+----------------------------------
pid              | 96891
usesysid         | 16384
usename          | repmgr
application_name | pgsrv02
client_addr      | 192.168.1.12
client_hostname  |
client_port      | 39028
backend_start    | 2021-06-09 10:14:36.181368-05
backend_xmin     |
state            | streaming
sent_lsn         | 9B/1C53EF40
write_lsn        | 9B/1C53EF40
flush_lsn        | 9B/1C53EF40
replay_lsn       | 9B/1C53EF40
write_lag        | 00:00:00.000497
flush_lag        | 00:00:00.001131
replay_lag       | 00:00:00.001133
sync_priority    | 0
sync_state       | async
reply_time       | 2021-06-09 14:00:39.605857-05

From the standby:

postgres=# \x
Expanded display is on.
postgres=# select pg_is_in_recovery(), pg_is_wal_replay_paused(), pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp();
-[ RECORD 1 ]-----------------+------------------------------
pg_is_in_recovery             | t
pg_is_wal_replay_paused       | f
pg_last_wal_receive_lsn       | 9B/2246C9B0
pg_last_wal_replay_lsn        | 9B/2246C9B0
pg_last_xact_replay_timestamp | 2021-06-09 14:07:19.736478-05

You can do some math to calculate the lag:

postgres=# SELECT CASE
             WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
             ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
           END AS log_delay;
-[ RECORD 1 ]
log_delay | 0

Ideally, it should be 0 or as close as possible to it.