# Adding telegraf agent to opentelemetry-collector-contrib

## Import issues

While working on adding a telegraf receiver to [opentelemetry-collector-contrib](https://github.com/open-telemetry/opentelemetry-collector-contrib), one might notice that the dependencies pulled in by telegraf and opentelemetry-collector-contrib conflict in a couple of places.

Adding the following dependency to opentelemetry-collector-contrib

```
require github.com/influxdata/telegraf v1.17.1
```

results in the following build errors:

```
$ make otelcontribcol-unstable
GO111MODULE=on CGO_ENABLED=0 go build -o ./bin/otelcontribcol_unstable_darwin_amd64 \
  -ldflags "-X github.com/open-telemetry/opentelemetry-collector-contrib/internal/version.GitHash=63b2f339 -X github.com/open-telemetry/opentelemetry-collector-contrib/internal/version.Version=v0.19.0 -X go.opentelemetry.io/collector/internal/version.BuildType=release" -tags enable_unstable ./cmd/otelcontribcol
go: finding module for package github.com/prometheus/prometheus/discovery/install
go: finding module for package github.com/Azure/azure-sdk-for-go/arm/compute
go: finding module for package github.com/Azure/azure-sdk-for-go/arm/network
../../../.gvm/pkgsets/go1.15.7/global/pkg/mod/github.com/prometheus/prometheus@v2.5.0+incompatible/discovery/azure/azure.go:24:2: module github.com/Azure/azure-sdk-for-go@latest found (v51.0.0+incompatible), but does not contain package github.com/Azure/azure-sdk-for-go/arm/compute
../../../.gvm/pkgsets/go1.15.7/global/pkg/mod/github.com/prometheus/prometheus@v2.5.0+incompatible/discovery/azure/azure.go:25:2: module github.com/Azure/azure-sdk-for-go@latest found (v51.0.0+incompatible), but does not contain package github.com/Azure/azure-sdk-for-go/arm/network
../../../.gvm/pkgsets/go1.15.7/global/pkg/mod/github.com/prometheus/prometheus@v2.5.0+incompatible/discovery/consul/consul.go:27:2: ambiguous import: found package github.com/hashicorp/consul/api in multiple modules:
	github.com/hashicorp/consul v1.2.1 (/Users/pmalek/.gvm/pkgsets/go1.15.7/global/pkg/mod/github.com/hashicorp/consul@v1.2.1/api)
	github.com/hashicorp/consul/api v1.7.0 (/Users/pmalek/.gvm/pkgsets/go1.15.7/global/pkg/mod/github.com/hashicorp/consul/api@v1.7.0)
../../../.gvm/pkgsets/go1.15.7/global/pkg/mod/go.opentelemetry.io/collector@v0.19.0/receiver/prometheusreceiver/factory.go:22:2: module github.com/prometheus/prometheus@latest found (v2.5.0+incompatible), but does not contain package github.com/prometheus/prometheus/discovery/install
make: *** [otelcontribcol-unstable] Error 1
```

### Working version with resolved import issues

There is a working version at https://github.com/pmalek-sumo/opentelemetry-collector-contrib/tree/telegrafreceiver (on the `telegrafreceiver` branch) which uses a fork of telegraf at https://github.com/pmalek-sumo/telegraf/releases/tag/v1.17.7

The `telegrafreceiver` branch in the fork above adds the receiver to the unstable components list, so to build it one needs to run:

```
make otelcontribcol-unstable
```

This version removes the following plugins to make compilation pass:

* `inputs/kube_inventory`
* `inputs/prometheus`

It was also necessary to change `*prompb.Label` to `prompb.Label` in a couple of lines in `plugins/serializers/prometheusremotewrite/prometheusremotewrite.go` due to [changes](https://github.com/prometheus/prometheus/pull/4957) in `prompb/remote.pb.go` from `github.com/prometheus/prometheus`.

### Incompatible Prometheus dependency

Prometheus introduced Go modules in `v2.6` ([commit][1]) but does not follow the Go semantic import versioning convention (every module at major version 2 or later should carry a `/v<num>` suffix in its module path), and there is [no intention of changing this][2]. As a result, requiring the Prometheus package adds `github.com/prometheus/prometheus@v2.5.0` as a dependency, since that was the last version without Go modules support and is therefore not subject to the module requirements mentioned above.
This can be overcome by requiring a [particular SHA of a release commit][3] in telegraf like so:

```
go get github.com/prometheus/prometheus@e83ef207b6c2398919b69cd87d2693cfc2fb4127
```

This particular version is [a v2.21.0 release commit][4], which doesn't conflict with the `00f16d1ac3a4` version required by otc-contrib (that one points to the [`v2.22.1` commit][5]).

[1]: https://github.com/prometheus/prometheus/commit/a516bc2160b86c652d7ebb7d2df0fc27ca328f8b
[2]: https://github.com/prometheus/prometheus/issues/8417#issuecomment-769042914
[3]: https://github.com/prometheus/prometheus/issues/7991#issuecomment-701298893
[4]: https://github.com/prometheus/prometheus/commit/e83ef207b6c2398919b69cd87d2693cfc2fb4127
[5]: https://github.com/prometheus/prometheus/commit/00f16d1ac3a4c94561e5133b821d8e4d9ef78ec2

### Renaming telegraf imports

The [working version](#Working-version-with-resolved-import-issues) also needed its import paths renamed from `github.com/influxdata/telegraf` to `github.com/pmalek-sumo/telegraf` to prevent using the original telegraf sources, which do not work when imported into OTC.

## In process data flow with telegraf's Agent

### `telegraf.Agent` interface

Telegraf's [`Agent`][agent_1], which manages the plugins defined in the config, exports the following functions (callable from a 3rd party Go package):

* `func (a *Agent) Once(ctx context.Context, wait time.Duration) error` runs a single metrics gather
* `func (a *Agent) Run(ctx context.Context) error` runs the agent until the context is cancelled
* `func (a *Agent) Test(ctx context.Context, wait time.Duration) error` runs a single gather but writes the gathered metrics to stdout

As one can see, there is no point at which one could read the gathered metrics (apart from running `Test` and hooking into the process' stdout). With the current state of `Agent`'s API it is therefore impossible to get metrics from telegraf's input plugins into otc for processing, should an in-process data flow using otc's data model be chosen.

Changing telegraf's [`agent.Run()`][agent_run], which manages plugins (inputs, processors, aggregators, outputs), and adjusting the code to export ingested metrics so that otc can consume them would be a rather controversial undertaking which most likely wouldn't be accepted upstream.

[agent_1]: https://github.com/influxdata/telegraf/blob/86e50f85b39fe9afe1b62b8e1f5ef8c268ff1894/agent/agent.go#L20-L23
[agent_run]: https://github.com/influxdata/telegraf/blob/86e50f85b39fe9afe1b62b8e1f5ef8c268ff1894/agent/agent.go#L112-L198

#### Extending the interface

One could be tempted to extend the interface with the following func:

```
func (a *Agent) RunWithChannel(ctx context.Context, out chan<- telegraf.Metric) error
```

which could be used in OTC's receiver to consume the metrics from telegraf.

### Data model differences and controversies

`telegraf.Metric` is an interface with (among others) the following funcs:

```
type Metric interface {
	// Name is the primary identifier for the Metric and corresponds to the
	// measurement in the InfluxDB data model.
	Name() string

	// TagList returns the tags as a slice ordered by the tag key in lexical
	// bytewise ascending order. The returned value should not be modified,
	// use the AddTag or RemoveTag methods instead.
	TagList() []*Tag

	// FieldList returns the fields as a slice in an undefined order. The
	// returned value should not be modified, use the AddField or RemoveField
	// methods instead.
	FieldList() []*Field

	// Type returns a general type for the entire metric that describes how you
	// might interpret, aggregate the values.
	//
	// This method may be removed in the future and its use is discouraged.
	Type() ValueType

	...
}
```

Metric values are stored as fields, a list of key-value pairs returned by `FieldList()`, while the metric type is set on top of all those values and can be obtained with `Type()`.

On the other hand, OTC has a data model where a metric has a type (more fine grained, e.g. `pdata.MetricDataTypeDoubleGauge` instead of `Gauge`) and data points attached to that metric, each with a value and a timestamp set.

```
var m telegraf.Metric
...
var t = m.Time().UnixNano()
...
switch m.Type() {
// ... other types ...
// case telegraf.Cou
	for _, f := range m.FieldList() {
		// Each telegraf field becomes a separate OTC metric
		// named <metric name>_<field key>.
		pm := pdata.NewMetric()
		pm.SetName(m.Name() + "_" + f.Key)

		// Only float64 and uint64 field values are handled in this excerpt.
		switch v := f.Value.(type) {
		case float64:
			pm.SetDataType(pdata.MetricDataTypeDoubleGauge)
			dps := pm.DoubleGauge().DataPoints()
			dps.Resize(1)
			dps.At(0).SetValue(v)
			dps.At(0).SetTimestamp(pdata.TimestampUnixNano(t))
		case uint64:
			pm.SetDataType(pdata.MetricDataTypeIntGauge)
			dps := pm.IntGauge().DataPoints()
			dps.Resize(1)
			dps.At(0).SetValue(int64(v))
			dps.At(0).SetTimestamp(pdata.TimestampUnixNano(t))
		}
		metrics.Append(pm)
	}
...
```

## Data flow through loopback interface

Another option would be to run telegraf as described above (using `telegraf.Agent`) but with its output plugins sending data to otc receivers.
This would require the user to pay more attention during configuration, and it would consume more resources because of the serialization/deserialization costs (serializing in telegraf's output plugin and deserializing in the otc receiver). This might also be a rather controversial idea to bring upstream.

---

###### tags: `telegraf` `otc`