# Fluentd Study and Intro
###### tags: `By_Ivan`
Fluentd is an open source log server application. It is licensed under the terms of the ***Apache License v2.0*** and was created by ***Treasure Data***.
The application can perform the following tasks:
1. Data collection,
2. Message routing,
3. Filtering,
4. Storage and buffer,
5. Parsing
The main application is **Fluentd**, and **td-agent** is its distributed package. Both provide the same functionality but have different installation requirements.
<img src=https://www.fluentd.org/images/fluentd-architecture.png>
**See also:** [FluentBit](https://docs.fluentbit.io/manual/)
___
## Content
[TOC]
___
## Related Slides
https://docs.google.com/presentation/d/1R_FrZpzvvHkKCsHNQhM-QGLXj9iJqf6A67zIaOpyJEI/edit?usp=sharing
## Deployment
### Settings (suggested)
###### reference: `https://docs.fluentd.org/installation/before-install`
1. NTP (e.g. [chrony](https://chrony.tuxfamily.org/), ntpd, etc.)
Keeps the system timestamp accurate. This is crucial for any production-grade logging service.
2. Max number of file descriptors
``` shell
$ ulimit -n
65535
```
::: warning
If the console shows **1024** or below, it is insufficient.
:::
Add the following lines to your **/etc/security/limits.conf** file and **reboot** :
``` text
root soft nofile 65536
root hard nofile 65536
* soft nofile 65536
* hard nofile 65536
```
:::info
**Alternatively**
- If Fluentd runs under systemd, the option **LimitNOFILE=65536** can be used instead.
- If you are using the td-agent package, this value is set by default.
:::
3. Optimize Network
For high load environments with many Fluentd instances, add the following configuration to your /etc/sysctl.conf file:
``` text
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 5000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_wmem = 4096 12582912 16777216
net.ipv4.tcp_rmem = 4096 12582912 16777216
net.ipv4.tcp_max_syn_backlog = 8096
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 10240 65535
```
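The two settings above can be checked from a script before Fluentd is started. This is only a minimal sketch: the helper names `nofile_status` and `conf_value` are made up for this note, and the threshold of 65535 is an assumption taken from the sample `ulimit -n` output.

```shell
#!/bin/sh
# Threshold taken from the sample output above (assumption); adjust as needed.
MIN_NOFILE=65535

# nofile_status CURRENT [MIN]: prints "ok" or "insufficient".
nofile_status() {
  current="$1"
  min="${2:-$MIN_NOFILE}"
  if [ "$current" = "unlimited" ] || [ "$current" -ge "$min" ]; then
    echo "ok"
  else
    echo "insufficient"
  fi
}

# conf_value FILE KEY: prints the value assigned to KEY in a sysctl-style file.
conf_value() {
  awk -F'=' -v key="$2" '
    { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }
    k == key { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); print v }
  ' "$1"
}

echo "open files: $(ulimit -n) -> $(nofile_status "$(ulimit -n)")"

# Example:
#   conf_value /etc/sysctl.conf net.core.somaxconn
# After editing /etc/sysctl.conf, reload it with: sudo sysctl -p
```

`conf_value` only reads the file, so it shows what is configured, not what the kernel currently uses.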
### Installation and Requirements
#### Install by gem:
> Required:
>1. Ruby 2.4 or later
>2. gcc
>3. make
``` shell
$ gem install fluentd --no-doc
```
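Before running `gem install`, the requirements listed above can be checked with a short script. This is a sketch, not an official installer step: `missing_tools` and `version_ge` are helper names invented here, and `version_ge` assumes a `sort` that supports `-V` (GNU coreutils and recent BSDs).

```shell
#!/bin/sh
# missing_tools CMD...: prints each command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# version_ge A B: succeeds when version A >= version B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

missing="$(missing_tools ruby gcc make)"
if [ -n "$missing" ]; then
  echo "missing: $missing"
elif ! version_ge "$(ruby -e 'print RUBY_VERSION')" "2.4"; then
  echo "Ruby is older than 2.4"
else
  echo "prerequisites look fine"
fi
```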
#### Install by package (aka. td-agent):
Supported platforms:
- [Red Hat Linux](https://docs.fluentd.org/installation/install-by-rpm)
- [Debian / Ubuntu](https://docs.fluentd.org/installation/install-by-deb)
- [macOS](https://docs.fluentd.org/installation/install-by-dmg)
- [Windows](https://docs.fluentd.org/installation/install-by-msi)
:::warning
If you are running as the **root user**, the script will not work because of the `sudo` command.
``` shell
# run inside sudo
sudo sh <<SCRIPT
curl https://packages.treasuredata.com/GPG-KEY-td-agent | apt-key add -
# add treasure data repository to apt
echo "deb http://packages.treasuredata.com/4/ubuntu/focal/ focal contrib" > /etc/apt/sources.list.d/treasure-data.list
# update your sources
apt-get update
# install the toolbelt
apt-get install -y td-agent
SCRIPT
```
The package should still work, but the repository must be set up by hand.
:::
:::danger
The package does not support **CentOS 5** because of some issues with **OpenSSL**.
The application can still be installed by following this [document](https://gist.github.com/repeatedly/97d4746e83a5ec135abf3eb77f46ff30).
:::
## Endpoint and Signals
### Signals
| SIGINT / SIGTERM | SIGUSR1 | SIGUSR2 | SIGHUP | SIGCONT |
| -------- | -------- | -------- | -------- | -------- |
| Stops the daemon and force-flushes | Force-flushes | Reloads the configuration file | Reloads the configuration file | Calls sigdump |
| Tries to flush the entire memory buffer once, without retrying | flush_interval still functions as usual | Rebuilds the data pipeline; supported since **v1.9.0** | Restarts the worker process; not recommended since **v1.9.0** | Dumps Fluentd's internal status |
[Signals reference](https://docs.fluentd.org/deployment/signals#signals)
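The signals in the table can be sent with the standard `kill` command. The sketch below wraps that in a small helper. The action names (`stop`, `flush`, `reload`, `dump`) and the PID file path are assumptions made for this note, not something Fluentd defines.

```shell
#!/bin/sh
# Hypothetical PID file path; use the one your own daemon writes.
PID_FILE="${PID_FILE:-/var/run/fluentd.pid}"

# signal_for ACTION: maps an action name (chosen for this sketch)
# to the signal listed in the table above.
signal_for() {
  case "$1" in
    stop)   echo TERM ;;
    flush)  echo USR1 ;;
    reload) echo USR2 ;;
    dump)   echo CONT ;;
    *)      return 1 ;;
  esac
}

# fluentd_signal ACTION: sends the mapped signal to the PID stored in $PID_FILE.
fluentd_signal() {
  sig="$(signal_for "$1")" || { echo "unknown action: $1" >&2; return 1; }
  kill -s "$sig" "$(cat "$PID_FILE")"
}

# Usage:
#   fluentd_signal flush
```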
### Endpoints
Disabled by default. The endpoints can be enabled by adding the **rpc_endpoint** parameter to the `<system>` section of the config file:
```shell
<system>
rpc_endpoint 127.0.0.1:24444
</system>
```
```shell
$ curl http://127.0.0.1:24444${ENDPOINT}
```
| ENDPOINT | Function |
| -------- | -------- |
|/api/processes.interruptWorkers | Stops the daemon (same as SIGINT). |
|/api/processes.killWorkers | Stops the daemon (same as SIGTERM). |
|/api/processes.flushBuffersAndKillWorkers | Flushes the buffer and stops the daemon (SIGUSR1 + SIGTERM). |
|/api/plugins.flushBuffers | Flushes the buffered messages (same as SIGUSR1). |
|/api/config.gracefulReload | Reloads the configuration (same as SIGUSR2). |
|/api/config.reload | Reloads the configuration (same as SIGHUP). |
[RPC reference](https://docs.fluentd.org/deployment/rpc)
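Calling the endpoints from a script could look like the sketch below. It assumes the `rpc_endpoint` address from the example above. The helper names `rpc_url` and `fluentd_rpc` are made up here, and the endpoint name is passed without the `/api/` prefix because the helper adds it.

```shell
#!/bin/sh
# Address taken from the rpc_endpoint example above.
RPC_ADDR="${RPC_ADDR:-127.0.0.1:24444}"

# rpc_url NAME: builds the URL for an endpoint such as plugins.flushBuffers.
rpc_url() {
  printf 'http://%s/api/%s\n' "$RPC_ADDR" "$1"
}

# fluentd_rpc NAME: calls the endpoint and fails on HTTP errors.
fluentd_rpc() {
  curl --silent --show-error --fail "$(rpc_url "$1")"
}

# Usage:
#   fluentd_rpc plugins.flushBuffers
#   fluentd_rpc config.gracefulReload
```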
::: warning
While Fluentd supports endpoint features, **endpoint support for FluentBit** is still under development and is a low-priority feature.
:::
## Run and Config settings
###### For FluentBit setting, please check : ([NYI](https://))
Fluentd consists of multiple plugins.
The config file defines which plugins to use and how the data is processed.
[List of all plugins](https://www.fluentd.org/plugins)
When starting Fluentd, the config file path can be specified with the "-c" argument.
```shell
$ /bin/fluentd -c /PATH/TO/CONFIG
```
To run Fluentd as a daemon (in the background), add the "-d" argument followed by a PID file path.
```shell
$ /bin/fluentd -c /PATH/TO/CONFIG -d /PATH/TO/PIDFILE
$ pkill -f fluentd # this will stop the daemon
```
If Fluentd was installed by package (td-agent), the application can be run through the service manager.
- RedHat Linux / Debian / Ubuntu:
```shell=
$ vi /etc/systemd/system/td-agent.service # config location
$ sudo systemctl start td-agent.service # start
$ sudo systemctl status td-agent.service # status
$ sudo systemctl stop td-agent.service # stop
```
- MacOS:
```shell=
$ vi /etc/td-agent/td-agent.conf                               # config
$ sudo launchctl load /Library/LaunchDaemons/td-agent.plist    # start
$ less /var/log/td-agent/td-agent.log                          # status
$ sudo launchctl unload /Library/LaunchDaemons/td-agent.plist  # stop
```
- Windows .msi:
please check the [document](https://docs.fluentd.org/installation/install-by-msi)
### Plugin Install
### Config Syntax
Our current process model:
```plantuml
skinparam activity {
BackgroundColor<< Result >> Lightblue
BackgroundColor<< Data >> Dimgray
BorderColor Peru
BorderColor<< Result >> Tomato
BorderColor<< Data >> White
FontColor<< Result >> Tomato
FontColor<< Data >> Lightgray
}
partition Database {
log_data<< Data >>-down>mongoDB
snmp_data<< Data >>-down>mongoDB
}
mongoDB-down>[mongo_tail]Fluentd
partition MOD #lightgreen{
others<< Data >>-down>[...]FluentBit
metrics<< Data >>-down>[cpu]FluentBit
syslog<< Data >>-down>[file]FluentBit
}
FluentBit->[parsing/filter]FluentBit
FluentBit-down->[forward] Fluentd
Fluentd->[filter/csv]Fluentd
Fluentd-down->[copy]==Routing==
==Routing== -down> SQL
-down>report << Result >>
==Routing== -down> Prometheus
-down>alert << Result >>
```
- **Pipeline**
The configuration file consists of these directives:
- **source** -- directives determine the input sources
- **match** -- directives determine the output destinations
- **filter** -- directives determine the event processing pipelines
- **system** -- directives set system wide configuration
- **label** -- directives group the output and filter for internal routing
- **@include** -- directives include other files
Normally, a basic pipeline would be like this:
source(input) -> filter -> match(output)
- **Language**
Each plugin must be wrapped in a directive block, e.g. `<source>` ... `</source>`.
Keep the indentation of each line consistent.
To add a comment, put "#" at the front of the line.
```yaml=
<source>           # specify the directive (pipeline stage)
  @type forward    # specify which plugin to use
  bind 0.0.0.0
  port 24224
  # argument values
  # ... ...
  tag test_tag     # add a tag to the message, important in Fluentd
</source>
```
- **How "Match" works**
```yaml=11
<match test_tag>   # match messages carrying the tag defined previously
  @type stdout     # specify which plugin to use
</match>
```
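Putting the directives together, a complete source -> filter -> match pipeline can be written to a file and checked. This is a minimal sketch: the `message` field name and the `/error/` pattern are hypothetical, and the file is created in a temporary location rather than in the real config path.

```shell
#!/bin/sh
# Write a minimal source -> filter -> match pipeline to a temporary file.
CONF="$(mktemp)"

cat > "$CONF" <<'EOF'
<source>
  @type forward
  bind 0.0.0.0
  port 24224
  tag test_tag
</source>

# Keep only events whose "message" field (hypothetical) contains "error".
<filter test_tag>
  @type grep
  <regexp>
    key message
    pattern /error/
  </regexp>
</filter>

<match test_tag>
  @type stdout
</match>
EOF

echo "config written to $CONF"

# Optional: validate the file without starting the pipeline, if fluentd is installed.
if command -v fluentd >/dev/null 2>&1; then
  fluentd --dry-run -c "$CONF"
fi
```

The filter sits between source and match and uses the same tag pattern as the match, so only events tagged `test_tag` pass through it.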
### Source Plugin (input)
1. **forward:**
receives data from other Fluentd / FluentBit instances
2. **mongo_tail:**
periodically checks the database for updates, then outputs the updated results
### Filter Plugin (modifying)
1. **csv:**
parses CSV string messages into JSON format
2. **grep:**
keeps or drops messages that contain specific words
3. **pretty json format:**
### Match Plugin (output)
1. **mongo:**
inserts JSON messages into MongoDB
2. **stdout:**
outputs raw messages to stdout
3. **copy:**
copies the output message to multiple plugins
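
For the **copy** plugin, each destination goes into its own `<store>` block. The sketch below writes such a match section to a temporary file. The second store (`file` with the path `/tmp/fluentd-copy-demo`) is a placeholder picked for illustration, not one of the outputs in our process model.

```shell
#!/bin/sh
# Write a match section that copies every test_tag event to two outputs.
COPY_CONF="$(mktemp)"

cat > "$COPY_CONF" <<'EOF'
<match test_tag>
  @type copy
  <store>
    @type stdout
  </store>
  <store>
    # Placeholder destination for this sketch.
    @type file
    path /tmp/fluentd-copy-demo
  </store>
</match>
EOF

echo "copy example written to $COPY_CONF"
```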
### Metrics
1. **prometheus input plugin**: This plugin provides an HTTP endpoint that exposes metrics to the Prometheus server on port 24231.
2. **prometheus_monitor input plugin**: This plugin collects internal metrics in Fluentd. It is used to monitor Fluentd itself.
3. **prometheus_output_monitor input plugin**: This plugin collects internal metrics for **output plugins** in Fluentd.
[Prometheus Integration](https://hackmd.io/8ZeK16vGRlO1Do7yP6mB_Q#Fluentd-amp-Prometheus)
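
To check by hand what the prometheus input plugin exposes, the endpoint can be fetched with `curl`. This sketch assumes the plugin listens on the local host with port 24231 and the `/metrics` path. The helper names `metrics_url` and `scrape_metrics` are made up for this note.

```shell
#!/bin/sh
# Host is an assumption; the port is the one mentioned above.
METRICS_HOST="${METRICS_HOST:-127.0.0.1}"
METRICS_PORT="${METRICS_PORT:-24231}"

# metrics_url: builds the scrape URL.
metrics_url() {
  printf 'http://%s:%s/metrics\n' "$METRICS_HOST" "$METRICS_PORT"
}

# scrape_metrics: fetches the metrics page and fails on HTTP errors.
scrape_metrics() {
  curl --silent --show-error --fail "$(metrics_url)"
}

# Usage:
#   scrape_metrics | head -n 20
```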
___
Created 2020/08/28
Last Modified 2020/09/01