Fluentd Study and Intro

# Fluentd Study and Intro
###### tags: `By_Ivan`

Fluentd is an open source log server application. It is licensed under the terms of the ***Apache License v2.0*** and was made by ***Treasure Data***.

The application can perform the following tasks:
1. Data collection
2. Message routing
3. Filtering
4. Storage and buffering
5. Parsing

The main application's name is **Fluentd**, and **td-agent** is the distributed package. Both provide the same functionality but have different installation requirements.

<img src=https://www.fluentd.org/images/fluentd-architecture.png>

**See also:** [FluentBit](https://docs.fluentbit.io/manual/)

___
## Content
[TOC]

___
## Related Slides
https://docs.google.com/presentation/d/1R_FrZpzvvHkKCsHNQhM-QGLXj9iJqf6A67zIaOpyJEI/edit?usp=sharing

## Deployment
### Settings (suggested)
###### reference: `https://docs.fluentd.org/installation/before-install`
1. NTP (e.g. [chrony](https://chrony.tuxfamily.org/), ntpd, etc.)
   Keeps the current timestamp accurate. This is crucial for all production-grade logging services.
2. Max number of file descriptors
   ``` shell
   $ ulimit -n
   65535
   ```
   ::: warning
   If the console shows **1024** or below, it is insufficient.
   :::
   Add the following lines to your **/etc/security/limits.conf** file and **reboot**:
   ``` json
   root soft nofile 65536
   root hard nofile 65536
   * soft nofile 65536
   * hard nofile 65536
   ```
   :::info
   **Alternatively**
   - If fluentd is running under systemd, the option **LimitNOFILE=65536** can be used instead.
   - If you are using the td-agent package, this value is set up by default.
   :::
3. Optimize Network
   For high-load environments with many Fluentd instances, add the following configuration to your /etc/sysctl.conf file:
   ``` json
   net.core.somaxconn = 1024
   net.core.netdev_max_backlog = 5000
   net.core.rmem_max = 16777216
   net.core.wmem_max = 16777216
   net.ipv4.tcp_wmem = 4096 12582912 16777216
   net.ipv4.tcp_rmem = 4096 12582912 16777216
   net.ipv4.tcp_max_syn_backlog = 8096
   net.ipv4.tcp_slow_start_after_idle = 0
   net.ipv4.tcp_tw_reuse = 1
   net.ipv4.ip_local_port_range = 10240 65535
   ```

### Installation and Requirements
#### Install by gem:
> Required:
>1. Ruby 2.4 or later
>2. gcc
>3. make

``` shell
$ gem install fluentd --no-doc
```

#### Install by package (aka. td-agent):
Supported platforms:
- [Red Hat Linux](https://docs.fluentd.org/installation/install-by-rpm)
- [Debian / Ubuntu](https://docs.fluentd.org/installation/install-by-deb)
- [macOS](https://docs.fluentd.org/installation/install-by-dmg)
- [Windows](https://docs.fluentd.org/installation/install-by-msi)

:::warning
If you are running as the **root user**, the install script will not work because it calls the "sudo" command.
``` shell
# run inside sudo
sudo sh <<SCRIPT
  curl https://packages.treasuredata.com/GPG-KEY-td-agent | apt-key add -
  # add treasure data repository to apt
  echo "deb http://packages.treasuredata.com/4/ubuntu/focal/ focal contrib" > /etc/apt/sources.list.d/treasure-data.list
  # update your sources
  apt-get update
  # install the toolbelt
  apt-get install -y td-agent
SCRIPT
```
The package itself should still work, but the repository must be set up by hand.
:::

:::danger
Package does not support **CentOS 5** because of some issues with **OpenSSL**.
The application can still be installed by following this [document](https://gist.github.com/repeatedly/97d4746e83a5ec135abf3eb77f46ff30).
:::

## Endpoint and Signals
### Signals
| SIGINT / SIGTERM | SIGUSR1 | SIGUSR2 | SIGHUP | SIGCONT |
| -------- | -------- | -------- | --- | --- |
| Stops the daemon + force flush | Force flush | Reloads the configuration file | Reloads the configuration file | Calls SIGDUMP |
| Tries to flush the entire memory buffer once, with no retry | flush_interval still functions as usual | Reconstructs the data pipeline, supported since **v1.9.0** | Restarts the worker process, not recommended since **v1.9.0** | Dumps fluentd internal status |

[Signals reference](https://docs.fluentd.org/deployment/signals#signals)

### Endpoints
Disabled by default. The RPC endpoint can be enabled by setting the **rpc_endpoint** parameter in the `<system>` section of the **config file**:
```shell
<system>
  rpc_endpoint 127.0.0.1:24444
</system>
```
```shell
$ curl http://127.0.0.1:24444${ENDPOINT}
```
| ENDPOINT | Function |
| -------- | -------- |
| /api/processes.interruptWorkers | Stops the daemon. |
| /api/processes.killWorkers | Stops the daemon. |
| /api/processes.flushBuffersAndKillWorkers | Flushes buffer and stops the daemon. |
| /api/plugins.flushBuffers | Flushes the buffered messages. |
| /api/config.gracefulReload | Reloads configuration. |
| /api/config.reload | Reloads configuration. |

[RPC reference](https://docs.fluentd.org/deployment/rpc)

::: warning
While Fluentd supports endpoint features, **endpoint support for FluentBit** is still under development and is a low-priority feature.
:::

## Run and Config settings
###### For FluentBit setting, please check : ([NYI](https://))
Fluentd consists of multiple plugins. Which plugins to use and how the data is processed are defined in a config file.
[list of all plugins](https://www.fluentd.org/plugins)

When starting up fluentd, the config file path can be specified with the "-f" argument.
```shell
$ /bin/fluentd -f /PATH/TO/CONFIG
```
To run fluentd as a daemon (in the background), add the "-d" (or "--daemon") argument, which takes a PID file path.
```shell
$ /bin/fluentd -f /PATH/TO/CONFIG -d /PATH/TO/PIDFILE
$ pkill -f fluentd # this will stop the daemon
```
If fluentd was installed by package (td-agent), the application can be managed through the service manager.
- RedHat Linux / Debian / Ubuntu:
```shell=
$ vi /etc/systemd/system/td-agent.service    # config location
$ sudo systemctl start td-agent.service      # start
$ sudo systemctl status td-agent.service     # status
$ sudo systemctl stop td-agent.service       # stop
```
- macOS:
```shell=
$ vi /etc/td-agent/td-agent.conf                                 # config
$ sudo launchctl load /Library/LaunchDaemons/td-agent.plist      # start
$ less /var/log/td-agent/td-agent.log                            # status
$ sudo launchctl unload /Library/LaunchDaemons/td-agent.plist    # stop
```
- Windows .msi: please check the [document](https://docs.fluentd.org/installation/install-by-msi)

### Plugin Install
### Config Syntax
Our current process model:
```plantuml
skinparam activity {
  BackgroundColor<< Result >> Lightblue
  BackgroundColor<< Data >> Dimgray
  BorderColor Peru
  BorderColor<< Result >> Tomato
  BorderColor<< Data >> White
  FontColor<< Result >> Tomato
  FontColor<< Data >> Lightgray
}
partition Database {
  log_data<< Data >>-down>mongoDB
  snmp_data<< Data >>-down>mongoDB
}
mongoDB-down>[mongo_tail]Fluentd
partition MOD #lightgreen{
  others<< Data >>-down>[...]FluentBit
  metrics<< Data >>-down>[cpu]FluentBit
  syslog<< Data >>-down>[file]FluentBit
}
FluentBit->[parsing/filter]FluentBit
FluentBit-down->[forward] Fluentd
Fluentd->[filter/csv]Fluentd
Fluentd-down->[copy]==Routing==
==Routing== -down> SQL
-down>report << Result >>
==Routing== -down> Prometheus
-down>alert << Result >>
```
- **Pipeline**
  The configuration file consists of these directives:
  - **source** -- directives determine the input sources
  - **match** -- directives determine the output destinations
  - **filter** -- directives determine the event processing pipelines
  - **system** --
    directives set system-wide configuration
  - **label** -- directives group the output and filter for internal routing
  - **@include** -- directives include other files

  Normally, a basic pipeline looks like this:
  source(input) -> filter -> match(output)
- **Language**
  Fluentd plugins must be wrapped in a directive tag pair (e.g. `<source>` ... `</source>`).
  Each line should follow strict, consistent indentation.
  To make comments, use "#" at the front of the line.
  ```yaml=
  <source>                # specify directive pipeline
    @type forward         # specify which plugin to use
    bind 0.0.0.0
    port 24224            # arguments values
    # ... ...
    tag test_tag          # add tag to message, important in fluentd
  </source>
  ```
- **How "Match" works**
  ```yaml=11
  <match test_tag>        # match the message with the tag defined previously
    @type stdout          # specify which plugin to use
  </match>
  ```

### Source Plugin (input)
1. **forward:** receive data from other Fluentd / FluentBit instances
2. **mongo_tail:** periodically check the database for updates, then output the updated results

### Filter Plugin (modifying)
1. **csv:** parse CSV string messages into JSON format
2. **grep:** keep or filter out messages that contain specific words
3. **pretty json format:**

### Match Plugin (output)
1. **mongo:** insert JSON messages into MongoDB
2. **stdout:** output the raw message to stdout
3. **copy:** copy the output message to several plugins

### Metrics
1. **prometheus input plugin**: provides an HTTP endpoint that exposes metrics to the Prometheus server on port 24231.
2. **prometheus_monitor input plugin**: collects internal metrics in Fluentd. It is used to monitor Fluentd itself.
3. **prometheus_output_monitor input plugin**: collects internal metrics for the **output plugins** in Fluentd.

[Prometheus Integration](https://hackmd.io/8ZeK16vGRlO1Do7yP6mB_Q#Fluentd-amp-Prometheus)

___
[TOC]

Created 2020/08/28
Last Modified 2020/09/01
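
As a rough sketch of the source -> filter -> match pipeline described under Config Syntax, the fragment below combines the **grep** filter and the **copy** output listed in the plugin sections. It assumes events arrive with the tag `test_tag` and carry a `message` field; the field name, the `/error/` pattern, and the output path are hypothetical values chosen for illustration and are not taken from our deployment.

```yaml
# Keep only events whose "message" field matches the pattern.
# "message" and /error/ are hypothetical; adjust to the real record layout.
<filter test_tag>
  @type grep
  <regexp>
    key message
    pattern /error/
  </regexp>
</filter>

# Send every remaining event to two outputs at once.
<match test_tag>
  @type copy
  <store>
    @type stdout
  </store>
  <store>
    @type file
    # hypothetical output path
    path /tmp/fluentd/test_tag
  </store>
</match>
```

Events pass through the filter before they reach the match block, so only records that survive the grep condition are copied to both stores.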