# NetData Packaging Plan
> TL;DR: Support one and only one way of distributing the Netdata Agent through static binary all-batteries-included packages. Wrap the Netdata Agent binary in a number of popular Distribution Package Manager formats.
> [name=James Mills]
> [TOC]
## Overview
The following is an outline of a plan to finally get Netdata on to as many systems, OS(es), distributions and architectures as possible by providing binary packages. Several attempts have been made thus far and have fallen short of the intended goals.
### Goals
- Get Netdata onto as many systems as possible.
- Support automated updates of the agent.
- Be as simple to install as a single click or a single command.
- Not break users' systems.
### Current Status
What we have managed to do thus far is:
- Build and publish a set of RPM and DEB packages for a small number of OS/Distributions, namely CentOS, Debian and Ubuntu.
In addition, we have tried to increase the scope of packages we support and have to date:
- Built a number of "package builders" for more than a dozen Linux Distributions and versions and three architectures.
- Migrated a number of the old Travis CI based workflows to GitHub Actions and improved them.
## How?
Implementation:
- [ ] Refactor the existing static / Makeself build process and move to GitHub Actions with parallel publications.
- [ ] Add support for more architectures:
  - [ ] x86 (_32bit_)
  - [x] x86_64 (_64bit_)
  - [ ] ARM (_32bit_)
  - [ ] ARM64 (_64bit_)
- [ ] Create wrappers for a number of Linux/OS/Distributions:
  - [ ] RPM (_CentOS, RHEL, etc_)
  - [ ] DEB (_Debian, Ubuntu, etc_)
  - [ ] APK (_Alpine_)
  - [ ] Bottle (_macOS_)
  - [ ] TGZ (_CRUX, Gentoo, Slackware, LFS, and others..._)
> __Note:__ When we say `TGZ` here we effectively mean a Tarball. We _can_ of course use any compression algorithm we want here and we will probably use XZ and ship `.tar.xz`(s). Thanks @thiagoftsm
> [name=James Mills]
Deployment:
> Deployment will happen as early as possible and iteratively: as soon as we have an e2e flow/process working, we will push it to users.
- [ ] Add a "feature gate" to `kickstart.sh` that determines if we have binary packages for a particular user's system and funnels a percentage of users to those packages.
- [ ] Gradually ramp up the percentage of users we funnel over a number of weeks as we support more native package formats, OSes and Architectures.
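
As a rough illustration of the feature gate described above, the following is a minimal sketch and not the actual `kickstart.sh` logic. The function names, the `ROLLOUT_PERCENT` variable and the use of the host name as a bucketing key are all assumptions made for this example. The only architecture it treats as available is x86_64, the one already ticked off in the list above.

```sh
#!/bin/sh
# Hypothetical rollout gate; none of these names exist in kickstart.sh today.

# Assumed knob: percentage of eligible hosts to funnel to binary packages.
ROLLOUT_PERCENT="${ROLLOUT_PERCENT:-10}"

# Return 0 if a static build is (hypothetically) published for this architecture.
have_binary_for() {
  case "$1" in
    x86_64) return 0 ;;
    *)      return 1 ;;
  esac
}

# Map an identifier to a stable bucket in 0..99 using the POSIX cksum utility.
bucket_of() {
  sum="$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)"
  echo "$((sum % 100))"
}

# Print "binary" or "source" for the given architecture and host identifier.
choose_install_path() {
  arch="$1"
  host_id="$2"
  if have_binary_for "$arch" && [ "$(bucket_of "$host_id")" -lt "$ROLLOUT_PERCENT" ]; then
    echo "binary"
  else
    echo "source"
  fi
}

choose_install_path "$(uname -m)" "$(uname -n)"
```

Hashing a stable identifier instead of drawing a random number keeps each host on the same side of the gate between runs, so raising the percentage only ever moves hosts from the source path to the binary path.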
### Details
- We will only ever build and ship statically linked binaries of the Netdata Agent for all the architectures we care about.
- These binaries will be "batteries-included" and "self-updating".
- They will never depend on anything from the Host besides physical host resources and the OS Kernel.
- OS Native Packages will be provided for a number of popular OS/Distributions as "wrappers" of the binaries we build and ship.
- This effectively means the default distribution channel is our `TGZ` (_Tarball with our static binary_)
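
Since the tarball is the default channel, an installer has to map the running machine to one of the architectures we build for. A minimal sketch, assuming a hypothetical `netdata-<arch>.tar.xz` naming scheme (no naming scheme is defined in this plan) and covering only commonly seen `uname -m` values:

```sh
#!/bin/sh
# Normalise `uname -m` output to the four architecture names used in this plan.
normalize_arch() {
  case "$1" in
    i386|i486|i586|i686) echo "x86" ;;
    x86_64|amd64)        echo "x86_64" ;;
    armv6l|armv7l)       echo "arm" ;;
    aarch64|arm64)       echo "arm64" ;;
    *)                   echo "unsupported"; return 1 ;;
  esac
}

# Hypothetical file name for the static tarball of a given machine type.
tarball_name() {
  arch="$(normalize_arch "$1")" || return 1
  echo "netdata-${arch}.tar.xz"
}

tarball_name "$(uname -m)" || echo "no static build for $(uname -m)"
```

Anything not in the table falls through to the error branch, which is where an installer could fall back to another installation method.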
## Why?
> Did you know that the Netdata Agent that we build and distribute today via `kickstart-static64.sh` is only ~5.5MB in size?
> Did you also know that it takes mere seconds to install and setup?
So why do we want to do all of the above? Easy.
- This makes our users' lives far easier and simpler. A user just needs to perform a 1-click or 1-command installation for any OS/System/Architecture they happen to be on or want Netdata on.
- This solves "hard dependencies" for us in ways OS-dependent packages can never solve very well. We bundle and ship all dependencies in the binary. We control how the software is built, not our users, so there are fewer "build problems" reported by users and fewer "problem XY" reports from trying to get the Netdata Agent running on some esoteric system we hadn't thought of or don't support very well.
- Updates just work and can be completely automated if desired.
- No more downloading a dozen different things, building various components or other complications.
To illustrate some of the pre-existing issues this would solve:
- [How do we handle systems with an openssl library that is incompatible with libwebsockets?](https://github.com/netdata/netdata/issues/8846)
- [Older openssl libraries cause ACLK reconnection failure](https://github.com/netdata/netdata/issues/8812)
- [Properly document that we have issues with multiple installed versions of OpenSSL](https://github.com/netdata/netdata/issues/8306)
And _almost_ any Issue labelled with `area/build` or `area/packaging` for instance:
- [NetData fails to build and install on FreeBSD with custom libmosquitto fork](https://github.com/netdata/netdata/issues/8131)
- [solaris/*bsd support](https://github.com/netdata/netdata/issues/601)
- _and numerous others..._
## When?
> Since we already have much of the tooling and processes in place for some of this, we can confidently start shipping this new packaging model to users within a week or so, in parallel to our existing packaging.
> [name=James Mills]
- ~1 week -- Refactor existing build/publish process.
- ~1 week -- Start funnelling users to the new packaging.
- ~2 weeks -- Expand support for non-x86 architectures.
- ~1 week -- Expand support for other Package Formats (_wrappers_).
At this point we would make the decision to turn off the `kickstart-static64.sh`.
> It would effectively become deprecated, since static installs would be the default via `kickstart.sh` anyway.
> [name=James Mills]
We would then also deprecate and announce that we no longer support the packages on PackageCloud and disable the repos there.
> These packages would be replaced with new packages that wrap our binaries without external dependencies. Because we plan to do this in parallel, we will no longer use PackageCloud for this but normal HTTP storage such as a GCS Bucket.
> [name=James Mills]
## Discussion
### What about collectors?
We have four kinds of supported external collectors today:
- Collectors written in Bash
- Collectors written in Python
- Collectors written in JavaScript
- Collectors written in Go
We also have "Core Collectors" distributed along with the Agent written in C.
For Python, JavaScript and Bash collectors we can continue to depend on whatever is available on the Host system to run said collectors.
For collectors written in Go or C, these are already shipped as static binaries and usually have few or even zero external dependencies (_with some exceptions of CGo code, like database collectors_).
### What about Bash?
We are "okay" supporting shelling out to `/bin/bash` for various functionality. We effectively `popen()` out to either `/bin/bash` or `/usr/bin/env bash`. Long term it would be nice to shell out to `/bin/sh` or `/usr/bin/env sh` so we can support systems that do not ship with Bash, e.g. Busybox or a myriad of IoT platforms. In the longer term we would eliminate all use of shell scripts and shelling out for agent functionality.
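
One possible shape for the interim step, sketched under the assumption that the interpreter is looked up through `PATH` rather than the hard-coded paths above. `pick_shell` is a hypothetical helper, not an existing agent function:

```sh
#!/bin/sh
# Prefer bash when it is installed, otherwise fall back to the POSIX sh.
pick_shell() {
  for candidate in bash sh; do
    # `command -v` is the POSIX way to resolve a command name to its path.
    if found="$(command -v "$candidate" 2>/dev/null)"; then
      echo "$found"
      return 0
    fi
  done
  return 1
}

echo "plugin scripts would run under: $(pick_shell)"
```

A fallback like this only helps if the scripts themselves avoid Bash-only syntax, which is the larger part of the work.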
### What about eBPF Plugin?
> This is an interesting point because eBPF heavily depends on the Linux ABI. Whilst I'm confident the shared libraries will not be a problem and we can also choose to statically link the eBPF Plugin inside the `netdata` binary, eBPF will not work on any other type of OS besides ones that run the Linux Kernel.
>
> It's a bit of a special case but one I think we can live with until we make plans to also support other types of tracing that you find in non-Linux Kernels such as *BSD's DTrace.
> [name=James Mills]
@thiagoftsm Your thoughts here would be invaluable!
@thiagoftsm Yes @ mention works. Please comment as per above ☝️
### What about Native Binary Packages built from Source for Linux Distro X?
We should not do this for several reasons:
- We're a small company and simply do not have the bandwidth to support Netdata on even the subset of systems we try to support today!
- The community at large has a much wider variety of packages across many more systems than we do! (_see below_)
- We should simplify the way we package and distribute Netdata so we can focus on more important things.
- Doing so means we can get the Netdata agent out to more systems, more architectures and more OSes more easily.
Much of the software that runs on many more systems than Netdata does is not distributed as native packages built from source either. Some examples:
Statically linked:
- Prometheus
- InfluxDB
- Docker
- Syncthing ([Stats](https://data.syncthing.net/))
Bundle all Dependencies:
- Dropbox
- VirtualBox
- Chrome (_depends on a very small number of consistent Systems libs_)
In addition, it appears the community at large maintains their own Netdata package(s) across quite a large variety of systems. See: https://repology.org/project/netdata/versions -- Not only does this have a larger coverage than our own packages but it makes sense that the community can do this as they have more bandwidth.
### We have seen several installation errors that are not getting caught with our current process.
This is a bit of a KP. There are several things going on here:
- We don't `set -e` in most/all? of our packaging scripts and tooling. We should, and this plan, once executed, will ensure we do. [#TBD](TBD)
- We already have an extensive CI test suite for the source-based installer `./netdata-installer.sh`, which `kickstart.sh` ultimately uses. Any installation errors should be caught there and block PRs (_and they do_).
- It is a known issue, and has been problematic, that changes to our packaging can result, and have resulted, in broken Travis builds for our RPM/DEB packages. This plan's primary goal is to eliminate this set of problems entirely.
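
To make the `set -e` point concrete, here is a small sketch that runs the same steps with and without the flag. The steps are placeholders invented for this example, and `false` stands in for a failing build command:

```sh
#!/bin/sh
# Hypothetical packaging steps; `false` simulates a command that fails.
steps='echo "build"; false; echo "publish"'

# Without -e the failure is silently ignored and "publish" still runs.
sh -c "$steps"

# With -e the shell stops at the first failing command and exits non-zero.
sh -e -c "$steps" || echo "packaging aborted with status $?"
```

One caveat worth remembering when adopting this: the shell ignores `-e` for commands run as the condition of an `if` or on the left-hand side of `&&` / `||`, so failures inside functions called that way still need explicit checks.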
### We have no way to checkpoint in stable releases, to prevent a binary from going out until we're ready.
This is a KP and technical debt we inherited.
It will be fixed and the full release process / workflow refactored. This is the final thing on our list before we kill Travis with fire 🔥
### How do we update our supported OSs matrix and what does it mean for our processes?
We don't. If this plan is approved and executed, the intent is to eliminate this complicated and hard-to-follow list entirely within a few weeks and replace it simply with:
> Supported Systems:
> - All Linux OS(es) for x86, x86_64, arm and arm64.
> - All BSD OS(es) for x86, x86_64
And eventually Windows (_once we get there_)!