Deploying Multiple OpenNMS Backends sharing the same Data Clusters
===========
The objective here is to deploy multiple OpenNMS backends, where each backend has its own set of Minions/Locations, while sharing a single Kafka cluster across all the backend/Minion sets. For this reason, the Kafka topics must be private to each backend/Minion set, so that messages are produced by the appropriate Minions and consumed by the appropriate backend server.
When the data collection feature is required, the inventory should be unique per backend server. The Cassandra/ScyllaDB cluster will be shared among all the backends; each backend will have its own keyspace, so the data from one backend won't interfere with the others.
In terms of the inventory, as each backend is completely independent and unaware of the others, each OpenNMS backend will have its own PostgreSQL server to simplify maintenance and avoid potential bottlenecks that might affect other backends.
The consequence of having independent backends is that a unified view of all the monitored devices requires Grafana/Helm with custom dashboards, as the WebUI of each backend shows information only for the devices it monitors.
## Deployment
![](https://i.imgur.com/OGWv9YM.jpg)
## Requirements
* Ensure that NTP is enabled and running on all the servers involved with the solution, as time synchronization is critical.
* Ensure that the foreign source name and foreign ID combination is unique for all the monitored nodes across all the OpenNMS backend servers, even though each backend will have its own keyspace on ScyllaDB/Cassandra. This simplifies locating specific resources in the unified UI with Helm/Grafana.
* Ensure that the location names are unique across all the OpenNMS back-end servers, even if they won't share the same PostgreSQL server.
* Ensure that the Minion ID is unique across all the Minions and OpenNMS back-end servers, even if they won't share the same PostgreSQL server.
* Ensure that the `org.opennms.instance.id` attribute is unique per OpenNMS backend server and its Minions.
* Ensure that the external entity responsible for maintaining the inventory across all the OpenNMS servers is consistent with the above rules and uses the ReST API to configure the SNMP credentials, the Requisitions, and the Foreign Source Definition if needed on the respective backend server.
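As a hedged sketch of that last requirement, the external inventory system could drive each backend through its ReST API roughly as follows. The hostname, credentials, target IP, and requisition name (`DC1-Routers`) are placeholders, and the `run` helper prints the commands instead of executing them, so adapt both before use:

```bash
# Placeholders: replace with the backend's real URL and credentials.
BASE="http://onms-backend-1:8980/opennms/rest"
AUTH="admin:admin"
REQ="DC1-Routers"   # requisition (foreign source) name, unique across backends

# Dry-run helper: prints the command; change "echo \"+ $*\"" to "$@" to execute.
run() { echo "+ $*"; }

# Set SNMP credentials for a node's IP address
run curl -u "$AUTH" -X PUT -H 'Content-Type: application/json' \
    -d '{"readCommunity":"public","version":"v2c"}' \
    "$BASE/snmpConfig/192.0.2.10"

# Create or update a requisition from a local XML file, then trigger its import
run curl -u "$AUTH" -X POST -H 'Content-Type: application/xml' \
    -d @requisition.xml "$BASE/requisitions"
run curl -u "$AUTH" -X PUT "$BASE/requisitions/$REQ/import"
```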
## Constraints
The first requirement is to facilitate the management of the unified Grafana/Helm dashboards, in case combining metrics from multiple backends is desired. Also, it would be useful to be able to identify the source of a given resource/node just by looking at its foreignSource/foreignID.
As Kafka will be in the middle of the communication, one OpenNMS must only get the data from the Minions that it is managing. In theory, we're talking about multiple independent OpenNMS/Minion deployments that don't share inventory. The main purpose of this is to avoid side effects, avoid introducing lag on a topic that belongs to another OpenNMS/Minion combination when one combination is busy, and facilitate troubleshooting.
The location name is part of the RPC request topics, but it is not part of the Sink topics. That means the Sink topics would be shared, carrying information that matters only to certain OpenNMS servers. This is the main motivation for this change.
## Solution
On each backend/minion set:
1. For the OpenNMS server, create a file inside the `/opt/opennms/etc/opennms.properties.d/` directory (for example `minion.properties`) and add an entry for `org.opennms.instance.id` inside of it; for instance:
```bash=
echo "org.opennms.instance.id=XXXX" \
> /opt/opennms/etc/opennms.properties.d/minion.properties
```
2. For the Minions, append the `org.opennms.instance.id` property and its value at the end of the file called `/opt/minion/etc/custom.system.properties`; for instance:
```bash=
echo "org.opennms.instance.id=XXXX" \
>> /opt/minion/etc/custom.system.properties
```
The suggestion for the ID format would be:
`OpenNMS.XXX`
Or
`CompanyName.XXX`
Where `XXX` is the custom identifier for each OpenNMS backend; a combination like DC+Region is suggested. Please make sure it is a short string without special characters, as it becomes part of each topic name. There is no need to repeat that information in the location names, which remain free to convey other details about the location.
:::warning
The actual value of the Instance ID (`XXXX` in the above example) must match between OpenNMS and its Minions.
:::
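A quick way to verify this: the property value must be identical on both sides. A minimal sketch, using sample files under `/tmp` to stand in for the real paths (`/opt/opennms/etc/opennms.properties.d/minion.properties` on OpenNMS and `/opt/minion/etc/custom.system.properties` on the Minion):

```bash
# Sample files standing in for the real OpenNMS and Minion property files
mkdir -p /tmp/onms-demo
echo "org.opennms.instance.id=CompanyName-DC1-G1" > /tmp/onms-demo/opennms.properties
echo "org.opennms.instance.id=CompanyName-DC1-G1" > /tmp/onms-demo/minion.properties

# One unique line of output means the instance IDs match
ids=$(grep -h '^org.opennms.instance.id=' /tmp/onms-demo/*.properties | sort -u)
if [ "$(printf '%s\n' "$ids" | wc -l)" -eq 1 ]; then
  echo "MATCH: $ids"
else
  echo "MISMATCH:"; printf '%s\n' "$ids"
fi
```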
## Background
Let's analyze the existing topics from a test environment:
```bash=
[ec2-user@kafka1 ~]$ /opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper1:2181/kafka --list
OpenNMS.Alarms
OpenNMS.Apex.rpc-request.Collect
OpenNMS.Apex.rpc-request.DNS
OpenNMS.Apex.rpc-request.Detect
OpenNMS.Apex.rpc-request.Echo
OpenNMS.Apex.rpc-request.PING
OpenNMS.Apex.rpc-request.PING-SWEEP
OpenNMS.Apex.rpc-request.Poller
OpenNMS.Apex.rpc-request.Requisition
OpenNMS.Apex.rpc-request.SNMP
OpenNMS.Durham.rpc-request.Collect
OpenNMS.Durham.rpc-request.DNS
OpenNMS.Durham.rpc-request.Detect
OpenNMS.Durham.rpc-request.Echo
OpenNMS.Durham.rpc-request.PING
OpenNMS.Durham.rpc-request.PING-SWEEP
OpenNMS.Durham.rpc-request.Poller
OpenNMS.Durham.rpc-request.Requisition
OpenNMS.Durham.rpc-request.SNMP
OpenNMS.Events
OpenNMS.Metrics
OpenNMS.Nodes
OpenNMS.ALEC.Inventory
OpenNMS.Sink.Events
OpenNMS.Sink.Heartbeat
OpenNMS.Sink.Syslog
OpenNMS.Sink.Telemetry-IPFIX
OpenNMS.Sink.Telemetry-JTI
OpenNMS.Sink.Telemetry-NXOS
OpenNMS.Sink.Telemetry-Netflow-5
OpenNMS.Sink.Telemetry-Netflow-9
OpenNMS.Sink.Trap
OpenNMS.rpc-response.Collect
OpenNMS.rpc-response.DNS
OpenNMS.rpc-response.Detect
OpenNMS.rpc-response.Echo
OpenNMS.rpc-response.Poller
OpenNMS.rpc-response.SNMP
__consumer_offsets
alec-datasource-alarmStore-changelog
alec-datasource-inventoryStore-changelog
alec-datasource-situationStore-changelog
```
The above is the result when having `single-topic=false` (which is the default for some versions of OpenNMS).
If you have `single-topic=true`, that reduces the number of RPC topics; for instance:
```bash=
[ec2-user@kafka1 ~]$ /opt/kafka/bin/kafka-topics.sh --zookeeper zookeeper1:2181/kafka --list
OpenNMS.Alarms
OpenNMS.Apex.rpc-request
OpenNMS.Durham.rpc-request
OpenNMS.Events
OpenNMS.Metrics
OpenNMS.Nodes
OpenNMS.ALEC.Inventory
OpenNMS.Sink.Events
OpenNMS.Sink.Heartbeat
OpenNMS.Sink.Syslog
OpenNMS.Sink.Telemetry-IPFIX
OpenNMS.Sink.Telemetry-JTI
OpenNMS.Sink.Telemetry-NXOS
OpenNMS.Sink.Telemetry-Netflow-5
OpenNMS.Sink.Telemetry-Netflow-9
OpenNMS.Sink.Trap
OpenNMS.rpc-response
__consumer_offsets
alec-datasource-alarmStore-changelog
alec-datasource-inventoryStore-changelog
alec-datasource-situationStore-changelog
```
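For reference, a sketch of how the single-topic behavior is typically enabled on the OpenNMS side. The property name below reflects recent Horizon releases and is an assumption; verify it against the documentation for your version:

```
# e.g. /opt/opennms/etc/opennms.properties.d/kafka-rpc.properties
# (file name is arbitrary; property name may vary by version)
org.opennms.core.ipc.rpc.kafka.single-topic=true
```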
On this particular installation, there is one OpenNMS instance with two Minions in different locations (Apex and Durham) handling telemetry, flows, syslog, and trap messages.
Let's analyze the topics that exist on Kafka:
1. Kafka Producer
This is an optional feature to forward the inventory, events, alarms and metrics to Kafka topics. The topic names are:
```=
OpenNMS.Alarms
OpenNMS.Nodes
OpenNMS.Events
OpenNMS.Metrics
```
This can be fully customized per OpenNMS server; in fact, these topics can be shared across all the OpenNMS instances. The only consumer for these topics in this example deployment is the correlation engine (which uses the first two topics).
2. ALEC
This is an optional feature for advanced alarm correlation. The topic names are:
```=
OpenNMS.ALEC.Inventory
alec-datasource-alarmStore-changelog
alec-datasource-inventoryStore-changelog
alec-datasource-situationStore-changelog
```
These topics are managed by the [ALEC](https://alec.opennms.com/) (Architecture for Learning Enabled Correlation), a new feature currently under development.
3. Sink Pattern
This is part of the core communication between OpenNMS and Minions, and covers all asynchronous messages sent by Minions to OpenNMS. The topic names are:
```=
OpenNMS.Sink.Events
OpenNMS.Sink.Heartbeat
OpenNMS.Sink.Syslog
OpenNMS.Sink.Telemetry-IPFIX
OpenNMS.Sink.Telemetry-JTI
OpenNMS.Sink.Telemetry-NXOS
OpenNMS.Sink.Telemetry-Netflow-5
OpenNMS.Sink.Telemetry-Netflow-9
OpenNMS.Sink.Trap
```
These are related to async messages such as SNMP traps, syslog messages, flow data, and streaming telemetry sent by the monitored devices to the Minions.
These topics are the ones that should *not* be shared across multiple OpenNMS instances, and they require a custom `instanceId` to make sure each OpenNMS backend and its Minions produce/consume only on their own topics.
4. RPC Requests [from OpenNMS to Minions]
This is part of the core communication between OpenNMS and Minions, and covers all the messages associated with synchronous monitoring (like polling and data collection).
> NOTE: Using RPC with Kafka is not available on OpenNMS Meridian; this feature requires OpenNMS Horizon 23 or newer. For Meridian, ActiveMQ will manage RPC communication, and the naming logic described here applies to the ActiveMQ queue names.
When not using `single-topic`, the topic names are:
```=
OpenNMS.Apex.rpc-request.Collect
OpenNMS.Apex.rpc-request.DNS
OpenNMS.Apex.rpc-request.Detect
OpenNMS.Apex.rpc-request.Echo
OpenNMS.Apex.rpc-request.PING
OpenNMS.Apex.rpc-request.PING-SWEEP
OpenNMS.Apex.rpc-request.Poller
OpenNMS.Apex.rpc-request.Requisition
OpenNMS.Apex.rpc-request.SNMP
```
```=
OpenNMS.Durham.rpc-request.Collect
OpenNMS.Durham.rpc-request.DNS
OpenNMS.Durham.rpc-request.Detect
OpenNMS.Durham.rpc-request.Echo
OpenNMS.Durham.rpc-request.PING
OpenNMS.Durham.rpc-request.PING-SWEEP
OpenNMS.Durham.rpc-request.Poller
OpenNMS.Durham.rpc-request.Requisition
OpenNMS.Durham.rpc-request.SNMP
```
When using `single-topic`, the topic names are:
```=
OpenNMS.Apex.rpc-request
OpenNMS.Durham.rpc-request
```
These topics are divided per location; as you can see, the location names are `Apex` and `Durham`.
This set works in conjunction with the `rpc-response` set, so even though these topics are constrained by location, the relationship with the other topic set forces us to use the `instanceId` feature per OpenNMS backend.
5. RPC Responses [from Minions to OpenNMS]
These topics are also associated with RPC communication, but don't contain the location name (hence the need to set the `instanceId`):
When not using `single-topic`, the topic names are:
```=
OpenNMS.rpc-response.Collect
OpenNMS.rpc-response.DNS
OpenNMS.rpc-response.Detect
OpenNMS.rpc-response.Echo
OpenNMS.rpc-response.Poller
OpenNMS.rpc-response.SNMP
```
When using `single-topic`, the topic names are:
```=
OpenNMS.rpc-response
```
:::info
Using RPC with Kafka is not available on OpenNMS Meridian, this feature requires OpenNMS Horizon 23 or newer. For Meridian, ActiveMQ will manage RPC communication, but the response Queues are handled differently compared with the solution for Kafka.
:::
These are shared topics that handle RPC responses from all the Minions in all the locations.
For reasons similar to the Sink topics, these should be constrained per OpenNMS backend and its Minions, and the way to do it is by setting an `instanceId` per OpenNMS backend.
## InstanceID Example
The topics affected by the instance ID are the Sink and RPC topics. The default value of the `instanceId` is `OpenNMS` (for OpenNMS Horizon) or `Meridian` (for OpenNMS Meridian 2018 or newer).
When this ID is set to, say, `CompanyName-DC1-G1` (for data center 1, group/sub-group 1), the topics will look like the following, assuming a location name of `Loc1`:
```=
CompanyName-DC1-G1.Sink.Events
CompanyName-DC1-G1.Sink.Heartbeat
CompanyName-DC1-G1.Sink.Syslog
CompanyName-DC1-G1.Sink.Trap
```
```=
CompanyName-DC1-G1.Loc1.rpc-request.Collect
CompanyName-DC1-G1.Loc1.rpc-request.DNS
CompanyName-DC1-G1.Loc1.rpc-request.Detect
CompanyName-DC1-G1.Loc1.rpc-request.Echo
CompanyName-DC1-G1.Loc1.rpc-request.PING
CompanyName-DC1-G1.Loc1.rpc-request.PING-SWEEP
CompanyName-DC1-G1.Loc1.rpc-request.Poller
CompanyName-DC1-G1.Loc1.rpc-request.Requisition
CompanyName-DC1-G1.Loc1.rpc-request.SNMP
```
```=
CompanyName-DC1-G1.rpc-response.Collect
CompanyName-DC1-G1.rpc-response.DNS
CompanyName-DC1-G1.rpc-response.Detect
CompanyName-DC1-G1.rpc-response.Echo
CompanyName-DC1-G1.rpc-response.Poller
CompanyName-DC1-G1.rpc-response.SNMP
```
Obviously, if a second OpenNMS backend has an `instanceId` of `CompanyName-DC1-G2` or `CompanyName-DC2-G3`, the topics won't interfere with each other, and the backends can safely share the same Kafka cluster.
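This separation can be illustrated with a quick shell check: topic names derived from distinct instance IDs can never collide.

```bash
# Topic names for two hypothetical backends sharing one Kafka cluster.
# Distinct instance IDs guarantee distinct topic names.
for id in CompanyName-DC1-G1 CompanyName-DC1-G2; do
  echo "${id}.Sink.Trap"
  echo "${id}.Loc1.rpc-request"
  echo "${id}.rpc-response"
done | sort | uniq -d   # duplicates only: prints nothing
```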
## About ActiveMQ
The above solution was designed to be implemented using Kafka. Unfortunately, there are situations in which using ActiveMQ is required; for example, when using Meridian 2018 or a version of Horizon older than 23.
ActiveMQ requires additional changes in `$OPENNMS_HOME/etc/opennms-activemq.xml` when the instance ID is changed; more precisely, in the section called `authorizationEntries`.
The default configuration contains the following:
```xml
<!-- Users in the minion role can write/create queues that are not keyed by location -->
<authorizationEntry queue="OpenNMS.*.*" write="minion" admin="minion" />
<!-- Users in the minion role can read/create from queues that are keyed by location -->
<authorizationEntry queue="OpenNMS.*.*.*" read="minion" admin="minion" />
```
The prefix `OpenNMS` should be replaced with the chosen instance ID. In the case of Meridian 2018, the default prefix is different, but the logic still applies.
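For example, assuming the hypothetical instance ID `CompanyName-DC1-G1` used earlier, the entries would become:

```xml
<!-- Users in the minion role can write/create queues that are not keyed by location -->
<authorizationEntry queue="CompanyName-DC1-G1.*.*" write="minion" admin="minion" />
<!-- Users in the minion role can read/create from queues that are keyed by location -->
<authorizationEntry queue="CompanyName-DC1-G1.*.*.*" read="minion" admin="minion" />
```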