
Deploying Multiple OpenNMS Backends sharing the same Data Clusters

The objective here is to deploy multiple OpenNMS backends, where each backend has its own set of Minions/Locations. The idea is to share the same Kafka cluster across all the backend/Minion sets. For this reason, the topics in Kafka should be private to each backend/Minion set, so that messages are sent by the appropriate Minions and processed by the appropriate backend server.

When the data collection feature is required, the inventory should be unique per backend server. The Cassandra/ScyllaDB cluster will be shared among all the backends. In this case, each backend will have its own keyspace, so the data from one backend won't interfere with the others.
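
For instance, with Newts as the time-series storage, each backend can point at its own keyspace. A minimal sketch, assuming the org.opennms.newts.config.keyspace property from the Newts configuration and default installation paths (the keyspace name newts_dc1 and the file name newts.properties are placeholders):

# Give this backend its own keyspace on the shared Cassandra/ScyllaDB cluster
echo "org.opennms.newts.config.keyspace=newts_dc1" \
  >> /opt/opennms/etc/opennms.properties.d/newts.properties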

In terms of the inventory, as each backend is completely independent and unaware of the others, each OpenNMS backend will have its own PostgreSQL server to simplify maintenance and avoid potential bottlenecks that might affect other backends.

The consequence of having independent backends is that, to obtain a unified view of all the monitored devices, using Grafana/Helm with custom dashboards will be mandatory, as the WebUI of each backend shows information only for the devices it is monitoring.

Deployment


Requirements

  • Ensure that NTP is enabled and running on all the servers involved with the solution, as time synchronization is critical.
  • Ensure that the combination of foreign source name and foreign ID is unique for all the monitored nodes across all the OpenNMS backend servers, even though each backend has its own keyspace on ScyllaDB/Cassandra, to simplify locating specific resources on the unified UI with Helm/Grafana.
  • Ensure that the location names are unique across all the OpenNMS backend servers, even though they won't share the same PostgreSQL server.
  • Ensure that the Minion ID is unique across all the Minions and OpenNMS backend servers, even though they won't share the same PostgreSQL server.
  • Ensure that the org.opennms.instance.id attribute is unique per OpenNMS backend server and its Minions.
  • Ensure that the external entity responsible for maintaining the inventory across all the OpenNMS servers is consistent with the above rules and uses the REST API to configure the SNMP credentials, the Requisitions, and the Foreign Source Definitions if needed on the respective backend server (see the sketch after this list).
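
To illustrate the last point, a hedged sketch of setting SNMP credentials through the OpenNMS REST API; the hostname onms-dc1, the credentials, and the target IP address are placeholders, and the payload field names may vary slightly by version:

# Hypothetical example: set the SNMP community for one device on backend onms-dc1
curl -u admin:admin -X PUT -H 'Content-Type: application/json' \
  -d '{"version":"v2c","readCommunity":"public"}' \
  http://onms-dc1:8980/opennms/rest/snmpConfig/192.0.2.10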

Constraints

The first requirement is meant to facilitate the management of the unified Grafana/Helm dashboards, in case combining metrics from multiple backends is desired. It also makes it possible to identify the source of a given resource/node just by looking at its foreignSource/foreignID.

As Kafka will be in the middle of the communication, each OpenNMS instance must only receive data from the Minions it manages. In theory, we're talking about multiple independent OpenNMS/Minion deployments that don't share inventory. The main purpose of this is to avoid side effects, avoid introducing lag on a topic that belongs to another OpenNMS/Minion combination when one combination is busy, and to facilitate troubleshooting.

The location name is part of the RPC topics, but it is not part of the Sink topics. That means the Sink topics would be shared, carrying information that matters only to certain OpenNMS servers. This is the main motivation for this change.

Solution

On each backend/Minion set:

  1. For the OpenNMS server, create a file inside the /opt/opennms/etc/opennms.properties.d/ directory (for example minion.properties) and add an entry for org.opennms.instance.id to it; for instance:

echo "org.opennms.instance.id=XXXX" \
  > /opt/opennms/etc/opennms.properties.d/minion.properties

  2. For the Minions, append the org.opennms.instance.id property and its value at the end of the file called /opt/minion/etc/custom.system.properties; for instance:

echo "org.opennms.instance.id=XXXX" \
  >> /opt/minion/etc/custom.system.properties

The suggestion for the ID format would be:

OpenNMS.XXX

Or

CompanyName.XXX

Where XXX is the custom identifier per OpenNMS backend; a combination like DC+Region is suggested. Please make sure it is a short string without special characters, as this will be part of each topic name. There is no need to include that information in the location name; the location name can be used to convey other information.

The actual value of the Instance ID (XXXX in the above example) must match between OpenNMS and its Minions.
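
A quick way to double-check that the values match is to compare both files; a minimal sketch, assuming the default paths used above:

# Both commands should print the same org.opennms.instance.id value
grep org.opennms.instance.id /opt/opennms/etc/opennms.properties.d/minion.properties
grep org.opennms.instance.id /opt/minion/etc/custom.system.properties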

Background

Let's analyze the existing topics from a test environment:

[ec2-user@kafka1 ~]$ /opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper1:2181/kafka --list
OpenNMS.Alarms
OpenNMS.Apex.rpc-request.Collect
OpenNMS.Apex.rpc-request.DNS
OpenNMS.Apex.rpc-request.Detect
OpenNMS.Apex.rpc-request.Echo
OpenNMS.Apex.rpc-request.PING
OpenNMS.Apex.rpc-request.PING-SWEEP
OpenNMS.Apex.rpc-request.Poller
OpenNMS.Apex.rpc-request.Requisition
OpenNMS.Apex.rpc-request.SNMP
OpenNMS.Durham.rpc-request.Collect
OpenNMS.Durham.rpc-request.DNS
OpenNMS.Durham.rpc-request.Detect
OpenNMS.Durham.rpc-request.Echo
OpenNMS.Durham.rpc-request.PING
OpenNMS.Durham.rpc-request.PING-SWEEP
OpenNMS.Durham.rpc-request.Poller
OpenNMS.Durham.rpc-request.Requisition
OpenNMS.Durham.rpc-request.SNMP
OpenNMS.Events
OpenNMS.Metrics
OpenNMS.Nodes
OpenNMS.ALEC.Inventory
OpenNMS.Sink.Events
OpenNMS.Sink.Heartbeat
OpenNMS.Sink.Syslog
OpenNMS.Sink.Telemetry-IPFIX
OpenNMS.Sink.Telemetry-JTI
OpenNMS.Sink.Telemetry-NXOS
OpenNMS.Sink.Telemetry-Netflow-5
OpenNMS.Sink.Telemetry-Netflow-9
OpenNMS.Sink.Trap
OpenNMS.rpc-response.Collect
OpenNMS.rpc-response.DNS
OpenNMS.rpc-response.Detect
OpenNMS.rpc-response.Echo
OpenNMS.rpc-response.Poller
OpenNMS.rpc-response.SNMP
__consumer_offsets
alec-datasource-alarmStore-changelog
alec-datasource-inventoryStore-changelog
alec-datasource-situationStore-changelog

The above is the result when having single-topic=false (which is the default for some versions of OpenNMS).

If you have single-topic=true, that reduces the number of RPC topics; for instance:

[ec2-user@kafka1 ~]$ /opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper1:2181/kafka --list
OpenNMS.Alarms
OpenNMS.Apex.rpc-request
OpenNMS.Durham.rpc-request
OpenNMS.Events
OpenNMS.Metrics
OpenNMS.Nodes
OpenNMS.ALEC.Inventory
OpenNMS.Sink.Events
OpenNMS.Sink.Heartbeat
OpenNMS.Sink.Syslog
OpenNMS.Sink.Telemetry-IPFIX
OpenNMS.Sink.Telemetry-JTI
OpenNMS.Sink.Telemetry-NXOS
OpenNMS.Sink.Telemetry-Netflow-5
OpenNMS.Sink.Telemetry-Netflow-9
OpenNMS.Sink.Trap
OpenNMS.rpc-response
__consumer_offsets
alec-datasource-alarmStore-changelog
alec-datasource-inventoryStore-changelog
alec-datasource-situationStore-changelog

On this particular installation, there is one OpenNMS instance with two Minions in different locations (Apex and Durham) handling telemetry, flows, Syslog, and SNMP trap messages.
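
For reference, a hedged sketch of enabling the single-topic behavior on the OpenNMS side; the property name org.opennms.core.ipc.rpc.kafka.single-topic and the file name kafka.properties are assumptions based on the Horizon Kafka RPC configuration, so verify them against the documentation for your version:

# Assumption: Horizon 23+ using Kafka as the RPC strategy
echo "org.opennms.core.ipc.rpc.kafka.single-topic=true" \
  >> /opt/opennms/etc/opennms.properties.d/kafka.properties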

Let's now analyze each group of topics that exists on Kafka:

  1. Kafka Producer

This is an optional feature to forward the inventory, events, alarms and metrics to Kafka topics. The topic names are:

OpenNMS.Alarms
OpenNMS.Nodes
OpenNMS.Events
OpenNMS.Metrics

This can be fully customized per OpenNMS server. In fact, these topics can be shared across all the OpenNMS instances. The only consumer of these topics in this example deployment is the correlation engine (which uses the first two topics).
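
To see what a backend publishes there, a standard Kafka console consumer can be pointed at one of these topics. A minimal sketch — the broker address kafka1:9092 is a placeholder, and since the Kafka Producer payloads are protobuf-encoded the output is binary, so this mostly confirms that messages are flowing:

# Watch the alarms topic (expect binary protobuf payloads)
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server kafka1:9092 \
  --topic OpenNMS.Alarms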

  2. ALEC

This is an optional feature for advanced alarm correlation. The topic names are:

OpenNMS.ALEC.Inventory
alec-datasource-alarmStore-changelog
alec-datasource-inventoryStore-changelog
alec-datasource-situationStore-changelog

These topics are managed by ALEC (Architecture for Learning Enabled Correlation), a new feature currently under development.

  3. Sink Pattern

This is part of the core communication between OpenNMS and Minions, and covers all asynchronous messages sent by Minions to OpenNMS. The topic names are:

OpenNMS.Sink.Events
OpenNMS.Sink.Heartbeat
OpenNMS.Sink.Syslog
OpenNMS.Sink.Telemetry-IPFIX
OpenNMS.Sink.Telemetry-JTI
OpenNMS.Sink.Telemetry-NXOS
OpenNMS.Sink.Telemetry-Netflow-5
OpenNMS.Sink.Telemetry-Netflow-9
OpenNMS.Sink.Trap

These are related to asynchronous messages like SNMP traps, Syslog messages, flow data, and streaming telemetry sent by the monitored devices to the Minions.

These topics are the ones that should not be shared across multiple OpenNMS instances, and they require a custom "instanceId" to make sure each OpenNMS backend and its Minions produce/consume only on their own topics.

  4. RPC Requests [from OpenNMS to Minions]

This is part of the core communication between OpenNMS and Minions, and covers all the messages associated with synchronous monitoring (like polling and data collection).

NOTE: Using RPC with Kafka is not available on OpenNMS Meridian; this feature requires OpenNMS Horizon 23 or newer. For Meridian, ActiveMQ will manage the RPC communication, and the same naming logic applies to the ActiveMQ queue names.

When not using single-topic, the topic names are:

OpenNMS.Apex.rpc-request.Collect
OpenNMS.Apex.rpc-request.DNS
OpenNMS.Apex.rpc-request.Detect
OpenNMS.Apex.rpc-request.Echo
OpenNMS.Apex.rpc-request.PING
OpenNMS.Apex.rpc-request.PING-SWEEP
OpenNMS.Apex.rpc-request.Poller
OpenNMS.Apex.rpc-request.Requisition
OpenNMS.Apex.rpc-request.SNMP
OpenNMS.Durham.rpc-request.Collect
OpenNMS.Durham.rpc-request.DNS
OpenNMS.Durham.rpc-request.Detect
OpenNMS.Durham.rpc-request.Echo
OpenNMS.Durham.rpc-request.PING
OpenNMS.Durham.rpc-request.PING-SWEEP
OpenNMS.Durham.rpc-request.Poller
OpenNMS.Durham.rpc-request.Requisition
OpenNMS.Durham.rpc-request.SNMP

When using single-topic, the topic names are:

OpenNMS.Apex.rpc-request
OpenNMS.Durham.rpc-request

These topics are divided per location. As you can see, the location names are Apex and Durham.

This set works in conjunction with the rpc-response set, so even though these topics are constrained by location, the relationship with the other topic set forces us to use the instanceId feature per OpenNMS backend.

  5. RPC Responses [from Minions to OpenNMS]

These topics are also associated with RPC communication, but they don't contain the location name (hence the need to set the instanceId).

When not using single-topic, the topic names are:

OpenNMS.rpc-response.Collect
OpenNMS.rpc-response.DNS
OpenNMS.rpc-response.Detect
OpenNMS.rpc-response.Echo
OpenNMS.rpc-response.Poller
OpenNMS.rpc-response.SNMP

When using single-topic, the topic names are:

OpenNMS.rpc-response

Using RPC with Kafka is not available on OpenNMS Meridian; this feature requires OpenNMS Horizon 23 or newer. For Meridian, ActiveMQ will manage the RPC communication, but the response queues are handled differently compared with the solution for Kafka.

These are shared topics that handle RPC responses from all the Minions across all the locations.

For reasons similar to the Sink topics, these should be constrained per OpenNMS backend and its Minions, and the way to do that is by setting an instanceId per OpenNMS backend.

InstanceID Example

The topics affected by the Instance ID are the Sink and RPC topics. The default value of the instanceId is OpenNMS (for OpenNMS Horizon) or Meridian (for OpenNMS Meridian 2018 or newer).

When this ID is set to, say, CompanyName-DC1-G1 (for data center 1, group/sub-group 1), the topics will look like the following, assuming a location name of Loc1:

CompanyName-DC1-G1.Sink.Events
CompanyName-DC1-G1.Sink.Heartbeat
CompanyName-DC1-G1.Sink.Syslog
CompanyName-DC1-G1.Sink.Trap

CompanyName-DC1-G1.Loc1.rpc-request.Collect
CompanyName-DC1-G1.Loc1.rpc-request.DNS
CompanyName-DC1-G1.Loc1.rpc-request.Detect
CompanyName-DC1-G1.Loc1.rpc-request.Echo
CompanyName-DC1-G1.Loc1.rpc-request.PING
CompanyName-DC1-G1.Loc1.rpc-request.PING-SWEEP
CompanyName-DC1-G1.Loc1.rpc-request.Poller
CompanyName-DC1-G1.Loc1.rpc-request.Requisition
CompanyName-DC1-G1.Loc1.rpc-request.SNMP

CompanyName-DC1-G1.rpc-response.Collect
CompanyName-DC1-G1.rpc-response.DNS
CompanyName-DC1-G1.rpc-response.Detect
CompanyName-DC1-G1.rpc-response.Echo
CompanyName-DC1-G1.rpc-response.Poller
CompanyName-DC1-G1.rpc-response.SNMP

Obviously, if a second OpenNMS backend has an instanceId of CompanyName-DC1-G2 or CompanyName-DC2-G3, the topics won't interfere with each other, and the backends can safely share the same Kafka cluster.
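
With unique prefixes in place, each backend's topics can be verified in isolation; a minimal sketch using the same CLI shown earlier:

# List only the topics belonging to one backend/Minion set
/opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper1:2181/kafka --list | grep '^CompanyName-DC1-G1\.'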

About ActiveMQ

The above solution was designed to be implemented using Kafka. Unfortunately, there are situations in which it is required to use ActiveMQ, for example, when using Meridian 2018 or a version of Horizon older than 23.

ActiveMQ requires additional changes to $OPENNMS_HOME/etc/opennms-activemq.xml when the instance ID is changed; to be more precise, under a section called authorizationEntries.

The default configuration contains the following:

<!-- Users in the minion role can write/create queues that are not keyed by location -->
<authorizationEntry queue="OpenNMS.*.*" write="minion" admin="minion" />
<!-- Users in the minion role can read/create from queues that are keyed by location -->
<authorizationEntry queue="OpenNMS.*.*.*" read="minion" admin="minion" />

The prefix OpenNMS should be replaced with the chosen Instance ID. In the case of Meridian 2018, the default prefix is different, but the logic still applies.
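
For example, with an Instance ID of CompanyName-DC1-G1 as used above, the entries would become (a sketch derived from the default configuration shown earlier; substitute your own ID):

<!-- Users in the minion role can write/create queues that are not keyed by location -->
<authorizationEntry queue="CompanyName-DC1-G1.*.*" write="minion" admin="minion" />
<!-- Users in the minion role can read/create from queues that are keyed by location -->
<authorizationEntry queue="CompanyName-DC1-G1.*.*.*" read="minion" admin="minion" />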