The objective here is to deploy multiple OpenNMS backends, where each backend has its own set of Minions/locations, while sharing the same Kafka cluster across all the backend/Minion sets. For this reason, the topics in Kafka should be private to each backend/Minion set; that way, the messages are sent by the appropriate Minions and processed by the appropriate backend server.
When the data collection feature is required, the inventory should be unique per backend server. The Cassandra/ScyllaDB cluster will be shared among all the backends; in this case, each backend will have its own keyspace, so the data from one backend won't interfere with the others.
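Since Newts is the component that writes to Cassandra/ScyllaDB, the keyspace can be overridden per backend. The following is a minimal sketch, assuming Newts is the time-series strategy; the file name and the keyspace name newts_dc1_g1 are hypothetical, so pick one per backend:
# Hypothetical per-backend keyspace for Newts (choose a unique name per backend)
echo "org.opennms.newts.config.keyspace=newts_dc1_g1" \
> /opt/opennms/etc/opennms.properties.d/newts.properties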
In terms of the inventory, as each backend is completely independent and unaware of the others, each OpenNMS backend will have its own PostgreSQL server to simplify maintenance and avoid potential bottlenecks that might affect other backends.
The consequence of having independent backends is that, to get a unified view of all the monitored devices, the use of Grafana/Helm with custom dashboards is mandatory, as the WebUI of each backend shows information only for the devices it is monitoring.
The org.opennms.instance.id attribute must be unique per OpenNMS backend server and its Minions. The first reason for this requirement is to facilitate the management of the unified Grafana/Helm dashboards, in case combining metrics from multiple backends is desired. It is also useful to be able to identify the source of a given resource/node just by looking at its foreignSource/foreignID.
As Kafka will be in the middle of the communication, each OpenNMS must only get the data from the Minions that it is managing. In theory, we're talking about multiple independent OpenNMS/Minion deployments that don't share inventory. The main purpose of this is to avoid side effects, such as introducing lag on a topic that belongs to another OpenNMS/Minion combination when one combination is busy, and to facilitate troubleshooting.
The location name is part of the RPC topics, but it is not part of the Sink topics. That means the Sink topics would be shared, containing information that matters only to certain OpenNMS servers. This is the main motivation for this change.
On each backend/minion set:
On the OpenNMS server, create a file inside the /opt/opennms/etc/opennms.properties.d/ directory (for example, minion.properties) and add an entry for org.opennms.instance.id inside of it; for instance:
echo "org.opennms.instance.id=XXXX" \
> /opt/opennms/etc/opennms.properties.d/minion.properties
On each Minion, append the org.opennms.instance.id property and its value at the end of the file called /opt/minion/etc/custom.system.properties; for instance:
echo "org.opennms.instance.id=XXXX" \
>> /opt/minion/etc/custom.system.properties
The suggestion for the ID format would be:
OpenNMS.XXX
Or
CompanyName.XXX
Where XXX is the custom identifier per OpenNMS backend. It is suggested to use a combination like DC+Region. Please make sure it is a short string without special characters, as it will be part of each topic name. There is no need to include that information in the location name, which leaves the location name free to carry other information.
The actual value of the Instance ID (XXXX in the above example) must match between OpenNMS and its Minions.
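After restarting OpenNMS and the Minions, a quick way to confirm that both sides agree is to compare the entries directly; for instance:
# On the OpenNMS server:
grep org.opennms.instance.id /opt/opennms/etc/opennms.properties.d/minion.properties
# On each Minion (both commands should print the same org.opennms.instance.id=XXXX line):
grep org.opennms.instance.id /opt/minion/etc/custom.system.properties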
Let's analyze the existing topics from a test environment:
[ec2-user@kafka1 ~]$ /opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper1:2181/kafka --list
OpenNMS.Alarms
OpenNMS.Apex.rpc-request.Collect
OpenNMS.Apex.rpc-request.DNS
OpenNMS.Apex.rpc-request.Detect
OpenNMS.Apex.rpc-request.Echo
OpenNMS.Apex.rpc-request.PING
OpenNMS.Apex.rpc-request.PING-SWEEP
OpenNMS.Apex.rpc-request.Poller
OpenNMS.Apex.rpc-request.Requisition
OpenNMS.Apex.rpc-request.SNMP
OpenNMS.Durham.rpc-request.Collect
OpenNMS.Durham.rpc-request.DNS
OpenNMS.Durham.rpc-request.Detect
OpenNMS.Durham.rpc-request.Echo
OpenNMS.Durham.rpc-request.PING
OpenNMS.Durham.rpc-request.PING-SWEEP
OpenNMS.Durham.rpc-request.Poller
OpenNMS.Durham.rpc-request.Requisition
OpenNMS.Durham.rpc-request.SNMP
OpenNMS.Events
OpenNMS.Metrics
OpenNMS.Nodes
OpenNMS.ALEC.Inventory
OpenNMS.Sink.Events
OpenNMS.Sink.Heartbeat
OpenNMS.Sink.Syslog
OpenNMS.Sink.Telemetry-IPFIX
OpenNMS.Sink.Telemetry-JTI
OpenNMS.Sink.Telemetry-NXOS
OpenNMS.Sink.Telemetry-Netflow-5
OpenNMS.Sink.Telemetry-Netflow-9
OpenNMS.Sink.Trap
OpenNMS.rpc-response.Collect
OpenNMS.rpc-response.DNS
OpenNMS.rpc-response.Detect
OpenNMS.rpc-response.Echo
OpenNMS.rpc-response.Poller
OpenNMS.rpc-response.SNMP
__consumer_offsets
alec-datasource-alarmStore-changelog
alec-datasource-inventoryStore-changelog
alec-datasource-situationStore-changelog
The above is the result when having single-topic=false (which is the default for some versions of OpenNMS). If you have single-topic=true, that reduces the number of RPC topics; for instance:
[ec2-user@kafka1 ~]$ /opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper1:2181/kafka --list
OpenNMS.Alarms
OpenNMS.Apex.rpc-request
OpenNMS.Durham.rpc-request
OpenNMS.Events
OpenNMS.Metrics
OpenNMS.Nodes
OpenNMS.ALEC.Inventory
OpenNMS.Sink.Events
OpenNMS.Sink.Heartbeat
OpenNMS.Sink.Syslog
OpenNMS.Sink.Telemetry-IPFIX
OpenNMS.Sink.Telemetry-JTI
OpenNMS.Sink.Telemetry-NXOS
OpenNMS.Sink.Telemetry-Netflow-5
OpenNMS.Sink.Telemetry-Netflow-9
OpenNMS.Sink.Trap
OpenNMS.rpc-response
__consumer_offsets
alec-datasource-alarmStore-changelog
alec-datasource-inventoryStore-changelog
alec-datasource-situationStore-changelog
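In case you need to control this behavior explicitly, single-topic is a setting of the Kafka RPC module and has to match on both sides. The following is a sketch, assuming Horizon with RPC over Kafka enabled; verify the exact property names against the documentation for your version:
# On the OpenNMS server:
echo "org.opennms.core.ipc.rpc.kafka.single-topic=true" \
> /opt/opennms/etc/opennms.properties.d/kafka-rpc.properties
# On each Minion:
echo "single-topic=true" \
>> /opt/minion/etc/org.opennms.core.ipc.rpc.kafka.cfg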
On this particular installation, there is one OpenNMS instance with two Minions at different locations (Apex and Durham) handling telemetry, flow, syslog, and SNMP trap messages.
Let's analyze the topics that exist on Kafka:
The first group comes from an optional feature (the Kafka Producer) to forward the inventory, events, alarms, and metrics to Kafka topics. The topic names are:
OpenNMS.Alarms
OpenNMS.Nodes
OpenNMS.Events
OpenNMS.Metrics
These names can be fully customized per OpenNMS server; in fact, these topics can be shared across all the OpenNMS instances. The only consumer for these topics on this example deployment is the correlation engine (which uses the first two topics).
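The topic names come from the Kafka Producer configuration. The following is a sketch, assuming the feature is already installed and configured through its standard config file; the property names should be verified against the Kafka Producer documentation for your version:
# Kafka Producer topic names (one file per OpenNMS server)
cat <<EOF > /opt/opennms/etc/org.opennms.features.kafka.producer.cfg
eventTopic=OpenNMS.Events
alarmTopic=OpenNMS.Alarms
nodeTopic=OpenNMS.Nodes
metricTopic=OpenNMS.Metrics
EOF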
The next group belongs to an optional feature for advanced alarm correlation. The topic names are:
OpenNMS.ALEC.Inventory
alec-datasource-alarmStore-changelog
alec-datasource-inventoryStore-changelog
alec-datasource-situationStore-changelog
These topics are managed by ALEC (Architecture for Learning Enabled Correlation), a new feature currently under development.
The Sink topics are part of the core communication between OpenNMS and Minions, and cover all the asynchronous messages sent by the Minions to OpenNMS. The topic names are:
OpenNMS.Sink.Events
OpenNMS.Sink.Heartbeat
OpenNMS.Sink.Syslog
OpenNMS.Sink.Telemetry-IPFIX
OpenNMS.Sink.Telemetry-JTI
OpenNMS.Sink.Telemetry-NXOS
OpenNMS.Sink.Telemetry-Netflow-5
OpenNMS.Sink.Telemetry-Netflow-9
OpenNMS.Sink.Trap
These topics are related to asynchronous messages like SNMP traps, syslog messages, flow data, and streaming telemetry sent by the monitored devices to the Minions. They are the ones that should not be shared across multiple OpenNMS instances, and they require a custom instance ID to make sure each OpenNMS backend and its Minions produce and consume only on their own topics.
The RPC request topics are also part of the core communication between OpenNMS and Minions, and cover all the messages associated with synchronous monitoring (like polling and data collection).
NOTE: Using RPC with Kafka is not available on OpenNMS Meridian; this feature requires OpenNMS Horizon 23 or newer. For Meridian, ActiveMQ manages the RPC communication, and the same naming logic applies to the ActiveMQ queue names.
When not using single-topic, the topic names are:
OpenNMS.Apex.rpc-request.Collect
OpenNMS.Apex.rpc-request.DNS
OpenNMS.Apex.rpc-request.Detect
OpenNMS.Apex.rpc-request.Echo
OpenNMS.Apex.rpc-request.PING
OpenNMS.Apex.rpc-request.PING-SWEEP
OpenNMS.Apex.rpc-request.Poller
OpenNMS.Apex.rpc-request.Requisition
OpenNMS.Apex.rpc-request.SNMP
OpenNMS.Durham.rpc-request.Collect
OpenNMS.Durham.rpc-request.DNS
OpenNMS.Durham.rpc-request.Detect
OpenNMS.Durham.rpc-request.Echo
OpenNMS.Durham.rpc-request.PING
OpenNMS.Durham.rpc-request.PING-SWEEP
OpenNMS.Durham.rpc-request.Poller
OpenNMS.Durham.rpc-request.Requisition
OpenNMS.Durham.rpc-request.SNMP
When using single-topic, the topic names are:
OpenNMS.Apex.rpc-request
OpenNMS.Durham.rpc-request
These topics are divided per location; as you can see, the location names are Durham and Apex.
This set works in conjunction with the rpc-response set, so even though these topics are constrained by location, their relationship with the other topic set forces us to use the instanceId feature per OpenNMS backend.
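While troubleshooting, the standard Kafka tooling can confirm which consumer group is attached to which topic; the group names vary by version, so list them first:
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka1:9092 --list
# Then describe one of the printed groups to see its topic/partition assignments:
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server kafka1:9092 \
--describe --group <group-name>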
These topics are also associated with RPC communication, but they don't contain the location name (hence the need to set the instanceId):
When not using single-topic, the topic names are:
OpenNMS.rpc-response.Collect
OpenNMS.rpc-response.DNS
OpenNMS.rpc-response.Detect
OpenNMS.rpc-response.Echo
OpenNMS.rpc-response.Poller
OpenNMS.rpc-response.SNMP
When using single-topic, the topic name is:
OpenNMS.rpc-response
NOTE: Using RPC with Kafka is not available on OpenNMS Meridian; this feature requires OpenNMS Horizon 23 or newer. For Meridian, ActiveMQ manages the RPC communication, but the response queues are handled differently compared with the solution for Kafka.
These are shared topics that handle the RPC responses from all the Minions at all the locations. For reasons similar to the Sink topics, they should be constrained per OpenNMS backend and its Minions, and the way to do that is by setting an instanceId per OpenNMS backend.
All the topics affected by the Instance ID are the Sink and RPC topics. The default value of the instanceId is OpenNMS (for OpenNMS Horizon) or Meridian (for OpenNMS Meridian 2018 or newer).
When this ID is set to, say, CompanyName-DC1-G1 (for data center 1, group/sub-group 1), the topics will look like the following, assuming a location name of Loc1:
CompanyName-DC1-G1.Sink.Events
CompanyName-DC1-G1.Sink.Heartbeat
CompanyName-DC1-G1.Sink.Syslog
CompanyName-DC1-G1.Sink.Trap
CompanyName-DC1-G1.Loc1.rpc-request.Collect
CompanyName-DC1-G1.Loc1.rpc-request.DNS
CompanyName-DC1-G1.Loc1.rpc-request.Detect
CompanyName-DC1-G1.Loc1.rpc-request.Echo
CompanyName-DC1-G1.Loc1.rpc-request.PING
CompanyName-DC1-G1.Loc1.rpc-request.PING-SWEEP
CompanyName-DC1-G1.Loc1.rpc-request.Poller
CompanyName-DC1-G1.Loc1.rpc-request.Requisition
CompanyName-DC1-G1.Loc1.rpc-request.SNMP
CompanyName-DC1-G1.rpc-response.Collect
CompanyName-DC1-G1.rpc-response.DNS
CompanyName-DC1-G1.rpc-response.Detect
CompanyName-DC1-G1.rpc-response.Echo
CompanyName-DC1-G1.rpc-response.Poller
CompanyName-DC1-G1.rpc-response.SNMP
Obviously, if a second OpenNMS backend has an instanceId of CompanyName-DC1-G2 or CompanyName-DC2-G3, the topics won't interfere with each other, and the backends can safely share the same Kafka cluster.
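If you also want Kafka itself to enforce the separation instead of relying on naming conventions alone, prefixed ACLs (available since Kafka 2.0) can be scoped per instance ID. The following is a sketch, assuming an authorizer is enabled and a SASL principal backend-dc1-g1 (a hypothetical name) exists for that backend and its Minions; the consumer-group prefix is assumed to follow the instance ID, so adjust it to the group names actually in use:
/opt/kafka/bin/kafka-acls.sh \
--authorizer-properties zookeeper.connect=zookeeper1:2181/kafka \
--add --allow-principal User:backend-dc1-g1 \
--producer --consumer --group 'CompanyName-DC1-G1' \
--topic 'CompanyName-DC1-G1.' --resource-pattern-type prefixed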
The above solution was designed to be implemented using Kafka. Unfortunately, there are situations in which it is required to use ActiveMQ; for example, when using Meridian 2018, or a version of Horizon older than 23.
ActiveMQ requires additional changes in $OPENNMS_HOME/etc/opennms-activemq.xml when the instance ID is changed; to be more precise, under a section called authorizationEntries.
The default configuration contains the following:
<!-- Users in the minion role can write/create queues that are not keyed by location -->
<authorizationEntry queue="OpenNMS.*.*" write="minion" admin="minion" />
<!-- Users in the minion role can read/create from queues that are keyed by location -->
<authorizationEntry queue="OpenNMS.*.*.*" read="minion" admin="minion" />
The prefix OpenNMS should be replaced with the chosen Instance ID. In the case of Meridian 2018, the default prefix is different, but the logic still applies.
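For example, if the chosen Instance ID were CompanyName-DC1-G1, the entries would become:
<!-- Users in the minion role can write/create queues that are not keyed by location -->
<authorizationEntry queue="CompanyName-DC1-G1.*.*" write="minion" admin="minion" />
<!-- Users in the minion role can read/create from queues that are keyed by location -->
<authorizationEntry queue="CompanyName-DC1-G1.*.*.*" read="minion" admin="minion" />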