Fluent Bit is an open source, lightweight, multi-platform service created for data collection, mainly logs and streams of data. The Fluent Bit service can be used for collecting CPU metrics from servers, aggregating logs for applications and services, collecting data from IoT devices (like sensors), and more.
Fluent Bit is easy to set up and configure, and you can start collecting data from different sources quickly. It has been developed with performance in focus, because log processing is a continuous process.
It gives you full control over what data to collect, how to parse it to give the collected data a structure and remove unwanted parts, how to filter records, and where to push the result, thereby providing an end-to-end solution for data collection.
Currently, Fluent Bit is among the most popular services for managing log challenges in cloud environments.
Fluent Bit supports multiple input sources for collecting and processing logs, and it can push them to many different destinations, all configured through simple changes in the Fluent Bit configuration file.
Fluent Bit is also compatible with Docker and Kubernetes and can be used to aggregate logs for applications running in Kubernetes pods.
Fluent Bit Features
Fluent Bit was created with just what you need for log processing and aggregation, nothing less, nothing more. It comes from Treasure Data, which first created Fluentd; you can think of Fluentd as an advanced version of Fluent Bit, or Fluent Bit as a lighter version of Fluentd.
Where Fluent Bit supports about 70 plugins for input and output sources, Fluentd supports 1000+.
1. Fluent Bit is super lightweight and fast, requires very little CPU and memory to work, and performs all I/O operations in asynchronous mode.
2. Fluent Bit is event-driven, where each log statement processed is treated as an event.
3. We can easily configure various input sources.
4. It supports multiple data formats.
5. It has a simplified routing mechanism. We can tag the incoming data and then match the tag while configuring the output destination, and Fluent Bit handles the routing automatically.
6. The configuration file for Fluent Bit is very easy to understand and modify.
7. Communication with the output destination can be secured via TLS, and Fluent Bit even supports using certificates for the output destinations, as sketched below.
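For instance, here is a minimal sketch of an Elasticsearch output secured with TLS; the host, port, and CA file path are hypothetical placeholders:
[OUTPUT]
    # ship records to a TLS-enabled Elasticsearch endpoint
    Name        es
    Match       *
    Host        es.example.com
    Port        9243
    tls         On
    tls.verify  On
    tls.ca_file /etc/ssl/certs/ca.pem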
What are Input and Output Plugins?
Fluent Bit can be configured to collect data from various sources and then push the collected log data to a destination.
The different ways we can collect data are managed via input plugins, which are predefined in Fluent Bit; all we have to do is choose the right one. For example, if you want to collect CPU metrics, all you have to do is tell Fluent Bit to use the cpu input plugin. Similarly, if you have to read one or multiple log files, you can use the tail input plugin to continuously read logs from the specified files.
Here is a list of the Input Plugins supported in Fluent Bit 1.4: Fluent Bit Input Plugins
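As a quick illustration, here is a minimal sketch of the two input plugins mentioned above; the log file path and the tags are hypothetical placeholders:
[INPUT]
    # sample CPU usage metrics periodically
    Name cpu
    Tag  cpu_metrics

[INPUT]
    # continuously read new lines appended to a log file
    Name tail
    Path /var/log/app.log
    Tag  app_logs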
Just like input plugins, we have output plugins, which are again predefined in Fluent Bit; all we have to do is choose the one that satisfies our needs and use it in the Fluent Bit configuration file. For example, if you wish to push the aggregated logs to Elasticsearch, you can do so using the es output plugin; if you want to write all the aggregated logs to a file, you can use the file output plugin; to stream aggregated logs to Kafka, you can use the kafka output plugin; and so on.
Here is a list of the Output Plugins supported in Fluent Bit 1.4: Fluent Bit Output Plugins
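For instance, here is a minimal sketch that writes every collected record to files in a local directory; the output path is a hypothetical placeholder:
[OUTPUT]
    # write matched records to files under the given directory
    Name  file
    Match *
    Path  /var/log/flb-output/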
Fluent Bit Key Terminologies and Concepts:
Now that we know about input and output plugins, which are used to specify the input sources and output destinations, let's cover a few other important concepts and terminologies which you must know before you start working with Fluent Bit.
So we will be covering the following:
- Events or Records
- Parser - Structuring the Message
- Filter
- Buffer
- Routing - Tag and Match
The data pipeline for the flow of data in Fluent Bit looks like this: Input → Parser → Filter → Buffer → Routing → Output.
Let's cover each one of these one by one.
What are Events or Records?
Every log statement or every incoming piece of data coming to Fluent Bit is treated as an event or a record.
Suppose the following is a chunk of log statements read by Fluent Bit:
Jan 18 12:52:16 flb systemd[2222]: Starting GNOME Terminal Server
Jan 18 12:52:16 flb dbus-daemon[2243]: [session uid=1000 pid=2243] Successfully activated service 'org.gnome.Terminal'
Jan 18 12:52:16 flb systemd[2222]: Started GNOME Terminal Server.
Jan 18 12:52:16 flb gsd-media-keys[2640]: # watch_fast: "/org/gnome/terminal/legacy/" (establishing: 0, active: 0)
Then, since we have 4 lines of logs here, they will be treated as 4 separate events.
An Event, internally, is in array form with the following information: [TIMESTAMP, MESSAGE]
Here, the timestamp is the Unix time at which the event is received, in the format SECONDS.NANOSECONDS, and the message is the log text.
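Illustratively, the first log line above would internally become an event roughly like this (the timestamp value is made up, and the exact key name for the message depends on the input plugin):
[1610974336.000000000, {"log"=>"Jan 18 12:52:16 flb systemd[2222]: Starting GNOME Terminal Server"}]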
What is a Parser?
When an event or a record is received by Fluent Bit, it is generally a log statement which has a lot of information stacked together in a single line, like a timestamp, thread information, fully qualified class name, log level and the log text.
To convert this unstructured log statement into a structured format, we can use parsers. Fluent Bit provides multiple parsers, the simplest one being the JSON parser, which expects the log statement events to be in the form of a JSON map.
We can also use the regular expression parser, wherein we define a custom Ruby regular expression that uses the named capture feature to define which content belongs to which key name.
For example, this is a default regular expression parser configuration, specified in the Fluent Bit configuration file, for parsing Apache logs:
[PARSER]
    Name        apache
    Format      regex
    Regex       ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z
Hence, when Fluent Bit receives an Apache server log line like:
192.168.2.20 - - [29/Jul/2015:10:27:10 -0300] "GET /cgi-bin/try/ HTTP/1.0" 200 3395
the parser specified above will convert this unstructured log message into the following structured form:
[1154104030, {"host"=>"192.168.2.20",
"user"=>"-",
"method"=>"GET",
"path"=>"/cgi-bin/try/",
"code"=>"200",
"size"=>"3395",
"referer"=>"",
"agent"=>""
}
]
You can specify multiple Parsers in your Fluent Bit configuration file.
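For reference, here is a minimal sketch of a JSON parser definition, assuming log lines arrive as JSON maps carrying a time field in the format shown; the parser name is hypothetical:
[PARSER]
    # decode each line as a JSON map and lift "time" into the event timestamp
    Name        json_logs
    Format      json
    Time_Key    time
    Time_Format %d/%b/%Y:%H:%M:%S %z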
What is a Filter?
What if you want to append some additional information to the log data received? Or what if you want to filter out some unwanted log statements? This is taken care of by Fluent Bit filters.
There are many filter plugins available which can be used for filtering the log data. You can find all the supported filter plugins here: Fluent Bit Filter Plugins
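As a sketch of both use cases above, the grep filter below keeps only records whose log field contains ERROR, and the record_modifier filter appends a hostname field to each record; the tag and hostname values are hypothetical:
[FILTER]
    # keep only records whose "log" field matches the pattern
    Name   grep
    Match  app_logs
    Regex  log ERROR

[FILTER]
    # append a static key/value pair to every matched record
    Name   record_modifier
    Match  app_logs
    Record hostname web-01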
Buffering
While processing log data, Fluent Bit uses heap memory to store the data temporarily before pushing it to the output destination. Fluent Bit manages this temporary storage for data that has been processed but not yet sent to the output destination, even while more log data keeps coming in for processing.
Fluent Bit has two different buffering strategies: a primary mechanism and a secondary one.
The primary mechanism is the in-memory approach, and the secondary one uses the file system. Most often the two are used together, where the data ready for processing or ready for delivery to the output destination is kept in memory, and the data waiting in the queue is stored in the file system.
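Here is a minimal sketch of enabling filesystem buffering alongside the default in-memory buffering; the storage and log file paths are hypothetical placeholders:
[SERVICE]
    # where buffered chunks are persisted on disk
    storage.path              /var/log/flb-storage/
    storage.sync              normal
    storage.backlog.mem_limit 5M

[INPUT]
    Name         tail
    Path         /var/log/app.log
    # buffer this input's data on the filesystem instead of memory only
    storage.type filesystem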
Routing
When reading log data from an input source, Fluent Bit can add a tag to the log events/records, and that tag is matched when routing messages to an output destination. Fluent Bit does all of this automatically and efficiently; all we have to do is provide a tag in the input plugin configuration and provide the same string, or a matching wildcard pattern, in the output configuration.
We can even configure multiple output destinations to pick up the same event. It all depends on your requirements.
Let's take an example. In the snippet of the configuration file shown below, we have set up two input sources, one for CPU and another for memory, and two different output destinations, Elasticsearch and standard output:
[INPUT]
    Name cpu
    Tag  my_cpu

[INPUT]
    Name mem
    Tag  my_mem

[OUTPUT]
    Name  es
    Match my_cpu

[OUTPUT]
    Name  stdout
    Match my_mem
Now Fluent Bit will automatically route the log messages coming from the CPU input to Elasticsearch (es), and the memory metrics to standard output. All of this is done using the Tag and Match fields.
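Match also accepts wildcards, so a single output sketched like the one below would pick up the events from both inputs above:
[OUTPUT]
    # my_* matches both my_cpu and my_mem
    Name  stdout
    Match my_*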
Fluentd vs. Fluent Bit
As discussed earlier, Fluentd is a full-fledged logging layer with a lot of features, whereas Fluent Bit can be considered a very small application with only the most required and useful features of Fluentd.
Let's see the basic differences between both:
| Features | Fluentd | Fluent Bit |
| --- | --- | --- |
| Scope | Containers and Servers | Embedded Linux / Containers / Servers |
| Language | Written in C & Ruby | Written in C |
| Memory | Requires about ~40MB of memory | Requires only ~650KB of memory |
| Performance | High performance | High performance |
| Dependencies | Built as a Ruby gem, it requires a certain number of gems. | Zero dependencies, unless some special plugin requires them. |
| Plugins | More than 1000 plugins available | Around 70 plugins available |
Apart from the above differences, there are many similarities in their functionality: both Fluentd and Fluent Bit can act as aggregators and forwarders, and the two services can be used together or separately.
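For example, a common pattern is to run Fluent Bit as a lightweight forwarder on each node and ship its records to a central Fluentd aggregator using the forward output plugin; the host below is a hypothetical placeholder (24224 is the default forward port):
[OUTPUT]
    # send records to an upstream Fluentd instance
    Name  forward
    Match *
    Host  fluentd.example.com
    Port  24224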
Conclusion:
In this article, we covered the basics of Fluent Bit, how data flows through it, the key terminologies associated with it, and the differences between Fluentd and Fluent Bit.