The EFK stack is Elasticsearch, Fluent Bit, and the Kibana UI, and it is gaining popularity for Kubernetes log aggregation and management. The 'F' in the EFK stack can also be Fluentd, which is like the big brother of Fluent Bit. Fluent Bit, being a lightweight service, is the right choice for basic log management use cases.
So in this tutorial, we will be deploying Elasticsearch, Fluent Bit, and Kibana on Kubernetes. If you want to test the EFK stack on a local server before taking the setup to Kubernetes, you can check this post: How to set up Elasticsearch, Fluent bit, and Kibana for Log aggregation and Visualization. We have also written posts explaining the setup of Fluent Bit on Linux machines and, in general, what Fluent Bit is; you should check those too.
Why do we need the EFK Stack?
Well, if you know Kubernetes, you may be thinking that you can use the kubectl logs command to easily check the logs of any running pod. But what if there are 100 pods or even more? In that case, checking logs pod by pod becomes very difficult. On top of this, the Kibana dashboard UI can be configured the way you want so you can continuously monitor logs at runtime, which makes it easy for someone with no experience of running Linux commands to check logs and monitor the Kubernetes cluster and the applications running on it.
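For instance, tailing a single pod is simple enough, but repeating it across dozens of pods, containers, and restarts quickly becomes unmanageable (the pod, container, and namespace names below are just placeholders):
# Follow the logs of one pod in a given namespace
kubectl logs -f my-app-7d4b9c6d8f-x2xv2 -n my-namespace
# Logs of a specific container inside that pod
kubectl logs my-app-7d4b9c6d8f-x2xv2 -c my-container -n my-namespace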
If you are on AWS, you can configure Elasticsearch to archive logs to an S3 bucket (this can be done without the EFK stack too, but it is worth mentioning) so that historical logs are persisted.
If you have a large application with 100 pods running, plus logs coming in from the Kubernetes system, the Docker containers, etc., and you do not have a centralized log aggregation and management system, you will, sooner or later, regret it big time. Hence the EFK stack is a good choice.
Also, using Fluent Bit we can parse logs from various input sources, filter them to add more information or remove unwanted information, and then store the data in Elasticsearch.
How does it Work?
Well, to understand the setup, here is a picture:
Here we have a Kubernetes cluster with 3 nodes; on these 3 nodes, pods will be created to run various services, such as your applications and, in this case, the EFK stack.
Fluent Bit runs as a DaemonSet, which means each node in the cluster will have one Fluent Bit pod, and it will read logs from the /var/log/containers directory, where a log file is created for each container running on that node.
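If you want to see what Fluent Bit will be tailing, you can log in to any cluster node and list that directory; on Docker-based nodes the files are typically symlinks named after the pod, namespace, and container (the exact layout may vary with your container runtime):
# List per-container log files on a node; names look like
# <pod-name>_<namespace>_<container-name>-<container-id>.log
ls /var/log/containers/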
The Elasticsearch service runs in one pod while Kibana runs in another. They can be on the same cluster node too, depending on resource availability, but both usually demand a lot of CPU and memory, so their pods often end up on different cluster nodes.
There will be some pods running your applications, shown as App1 and App2 in the picture above. The Fluent Bit service will read the logs of these apps and push the data to Elasticsearch as JSON documents, and Kibana will then query Elasticsearch to display the data in its UI.
So let's start with the setup.
How To Setup EFK Stack On Kubernetes
Step 1: Create a Namespace
It's good practice to create a separate namespace for every functional unit in Kubernetes as this makes the management of pods running within a particular namespace easy. To see the existing namespaces, you can use the following command:
kubectl get namespaces
and you will see the list of existing namespaces:
NAME STATUS AGE
default Active 5m
kube-system Active 5m
kube-public Active 5m
We will create a new namespace named kube-logging. To do so, create a new file and name it kube-logging.yaml using your favorite editor, like vim:
vi kube-logging.yaml
Press i to enter the INSERT mode and then copy the following text in it.
kind: Namespace
apiVersion: v1
metadata:
  name: kube-logging
Then press ESC followed by :wq! and hit ENTER.
To create the namespace using the YAML file created above, run the following command:
kubectl create -f kube-logging.yaml
you will see the following output:
namespace/kube-logging created
You can further confirm the namespace creation by running the kubectl get namespaces command again.
Step 2: Setup Elasticsearch
For Elasticsearch we will set up a headless service and a stateful set which will get attached to this service. A headless service does not perform load balancing or have a static IP. We are making Elasticsearch a headless service because we will set up a 3-node elastic cluster and we want each node to have all the data stored in it, so we don't want any load balancing. We will get 3 Elasticsearch pods running once we are done with everything, which will ensure high availability.
Creating Elasticsearch Service:
Create a new file and name it elastic-service.yaml using your favorite editor like vim:
vi elastic-service.yaml
Press i to enter the INSERT mode and then copy the following text in it.
kind: Service
apiVersion: v1
metadata:
  name: elasticsearch
  namespace: kube-logging
  labels:
    app: elasticsearch
spec:
  selector:
    app: elasticsearch
  clusterIP: None
  ports:
    - port: 9200
      name: rest
    - port: 9300
      name: inter-node
Then press ESC followed by :wq! and hit ENTER.
In the YAML file, we have defined a Service called elasticsearch in the kube-logging namespace and given it the label app: elasticsearch, which will be used when we define the StatefulSet for Elasticsearch. Also, we have set clusterIP to None, as this is required to make it a headless service.
And we have specified ports 9200 and 9300 for REST API access and inter-node communication respectively.
To create the service using the YAML file created above, run the following command:
kubectl create -f elastic-service.yaml
You should see the following output:
service/elasticsearch created
To double check, we can run the following command to see all the services running in the kube-logging namespace that we created:
kubectl get services -n kube-logging
You will see the output similar to this:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
elasticsearch ClusterIP None <none> 9200/TCP,9300/TCP 26s
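Because this is a headless service, Kubernetes creates a DNS record per Elasticsearch pod rather than a single load-balanced virtual IP. Once the StatefulSet pods from the next step are running, you can optionally verify this from a throwaway busybox pod (the image tag and pod name here are just suggestions):
# Each ready es-cluster-* pod should show up as its own A record
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -n kube-logging -- nslookup elasticsearch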
Creating the StatefulSet
Now let's define the YAML for creating the StatefulSet for the Elasticsearch service. In the StatefulSet we provide the cluster information, such as the cluster name, the number of replicas, and the template for replica creation. We also specify which Elasticsearch version to install and the resources, like CPU and memory, to allocate.
Create a new file and name it elastic-statefulset.yaml using your favorite editor like vim:
vi elastic-statefulset.yaml
Press i to enter the INSERT mode and then copy the following text in it.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: es-cluster
  namespace: kube-logging
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.2.0
        resources:
          limits:
            cpu: 1000m
            memory: 2Gi
          requests:
            cpu: 500m
            memory: 1Gi
        ports:
        - containerPort: 9200
          name: rest
          protocol: TCP
        - containerPort: 9300
          name: inter-node
          protocol: TCP
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
        env:
          - name: cluster.name
            value: k8s-logs
          - name: node.name
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: discovery.seed_hosts
            value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
          - name: cluster.initial_master_nodes
            value: "es-cluster-0,es-cluster-1,es-cluster-2"
          - name: ES_JAVA_OPTS
            value: "-Xms512m -Xmx512m"
      initContainers:
      - name: fix-permissions
        image: busybox
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
      - name: increase-vm-max-map
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      - name: increase-fd-ulimit
        image: busybox
        command: ["sh", "-c", "ulimit -n 65536"]
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: data
      labels:
        app: elasticsearch
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: do-block-storage
      resources:
        requests:
          storage: 10Gi
Then press ESC followed by :wq! and hit ENTER.
In the above YAML file, we have defined the following:
- The Elasticsearch cluster information: the cluster name, which is es-cluster; the namespace, which will be kube-logging; the name of the headless service we defined in the section above; the number of replicas, which is 3; and the template for those replicas, selected with the label app: elasticsearch.
- The container information, like the version of Elasticsearch to be set up, which is 7.2.0 in this case, and the resource allocation for CPU and memory: the limits section defines the maximum a container may use, and the requests section defines how much is reserved for it when scheduling.
- The port information, defining the port numbers for the REST API and inter-node communication.
- The environment variables, followed by the init containers, which run some pre-setup commands before the Elasticsearch container starts, and finally the storage to be allocated for Elasticsearch data, which we have kept at 10 GB, but you can increase it as per your requirements (see the storage class note just after this list).
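One thing to watch out for: do-block-storage is DigitalOcean's block storage class. If you are running on a different provider or on-premises, list the storage classes available in your cluster and put one of those names in storageClassName instead:
# Shows the storage classes your cluster offers; the default one is marked "(default)"
kubectl get storageclass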
To create the StatefulSet using the YAML file created above, run the following command:
kubectl create -f elastic-statefulset.yaml
You should see the following output:
statefulset.apps/es-cluster created
To double check, we can run the following command to see all the pods running in the kube-logging namespace that we created:
kubectl get pod -n kube-logging
You should see something like this in the output:
NAME READY STATUS RESTARTS AGE
es-cluster-0 1/1 Running 0 3m07s
es-cluster-1 1/1 Running 0 3m07s
es-cluster-2 0/1 Pending 0 3m07s
We can also make a curl request to the REST API, but for that we need the IP address of a pod. To get it, run the following command:
kubectl get pod -n kube-logging -o wide
The output for this command will be:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
es-cluster-0 1/1 Running 0 3m12s XX.XXX.XXX.XXX YOUR_NODE_NAME <none> <none>
Now you can run the following curl command to hit the Elasticsearch service:
curl http://XX.XXX.XXX.XXX:9200/
which will give output like this:
{
  "name" : "es-cluster-0",
  "cluster_name" : "es-cluster",
  "cluster_uuid" : "UfWUnhaIJUyPLu4_DkW7ew",
  "version" : {
    "number" : "7.2.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "508c38a",
    "build_date" : "2019-06-20T15:54:18.811730Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
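To go a step further, you can confirm that all three Elasticsearch nodes have joined the cluster. A simple check (assuming the same pod IP as above) is the cluster health endpoint, which should report "number_of_nodes" : 3 once every pod is up:
# Cluster health; status should be green or yellow, not red
curl "http://XX.XXX.XXX.XXX:9200/_cluster/health?pretty"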
Step 3: Setup Kibana
By now you know the kubectl commands and how we create the YAML files and apply them, so I will skip those details. For Kibana, we will have a Kibana Service and a Deployment that launches one pod.
We will be creating two YAML files, one for Kibana service and the other for Kibana deployment.
Here is the kibana-service.yaml file (Use the vim editor to create a file and save the content in it, just like we did above):
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: kube-logging
  labels:
    app: kibana
spec:
  ports:
  - port: 5601
  selector:
    app: kibana
In the Service YAML we specified the service name, the namespace, the port on which the service will be accessible, and the label app: kibana used to select the Kibana pods.
Now, let's create the deployment YAML file named kibana-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: kube-logging
  labels:
    app: kibana
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana:7.2.0
        resources:
          limits:
            cpu: 1000m
            memory: 1Gi
          requests:
            cpu: 700m
            memory: 1Gi
        env:
        - name: ELASTICSEARCH_URL
          value: http://elasticsearch:9200
        ports:
        - containerPort: 5601
In the deployment file, we have specified the image version for Kibana, which is 7.2.0 (the Elasticsearch and Kibana versions should match), the port information, resource information like CPU and memory, and the environment variable pointing Kibana at our Elasticsearch service.
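One caveat: on newer 7.x Kibana images the elasticsearch.url setting has been replaced by elasticsearch.hosts, so if the Kibana pod starts but cannot reach Elasticsearch, a reasonable thing to try is swapping the environment variable name (same value) as sketched below:
        env:
        - name: ELASTICSEARCH_HOSTS   # 7.x-style replacement for ELASTICSEARCH_URL
          value: http://elasticsearch:9200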
To create the service and the deployment using the YAML files created above, run the following commands:
kubectl create -f kibana-service.yaml
kubectl create -f kibana-deployment.yaml
You should see the following output:
service/kibana created
deployment.apps/kibana created
To check the pod status run the following command:
kubectl get pods -n kube-logging
NAME READY STATUS RESTARTS AGE
es-cluster-0 1/1 Running 0 2h
es-cluster-1 1/1 Running 0 2h
es-cluster-2 0/1 Pending 0 2h
kibana-598vgt546f5-7b9wx 1/1 Running 0 2h
To access the Kibana UI, run the command below and then open the UI in your browser:
kubectl port-forward kibana-598vgt546f5-7b9wx 5601:5601 --namespace=kube-logging
Then you can access the Kibana UI at the following URL: http://localhost:5601/
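While the port-forward is running, you can also hit Kibana's status API from another terminal to confirm it is up and can talk to Elasticsearch before opening the browser:
# Returns a JSON status document; the overall state should be "green"
curl http://localhost:5601/api/status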
Step 4: Fluent Bit Service
For Fluent Bit, we will create 5 YAML files and apply them using the kubectl command like we did in the above sections. The YAML files will be:
- fluent-bit-service-account.yaml: Creates a ServiceAccount named fluent-bit in the kube-logging namespace, which the Fluent Bit pods will use to access the Kubernetes API.
- fluent-bit-role.yaml: Creates a ClusterRole that grants the get, list, and watch permissions on Kubernetes resources such as pods and namespaces to the fluent-bit ServiceAccount.
- fluent-bit-role-binding.yaml: Binds the ServiceAccount to the ClusterRole created above.
- fluent-bit-configmap.yaml: The main file, in which we specify the configuration for the Fluent Bit service: the Input plugin, Parser, Filter, Output plugin, etc. We have already covered the Fluent Bit service and its configuration.
- fluent-bit-ds.yaml: Defines the DaemonSet for Fluent Bit, along with the Elasticsearch connection settings and other basic configuration.
Below we have the content of all the files. Please create these files and then we will run them all.
fluent-bit-service-account.yaml File:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: kube-logging
  labels:
    app: fluent-bit
fluent-bit-role.yaml File:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit
  labels:
    app: fluent-bit
rules:
- apiGroups: [""]
  resources:
  - pods
  - namespaces
  verbs: ["get", "list", "watch"]
fluent-bit-role-binding.yaml File:
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluent-bit
roleRef:
  kind: ClusterRole
  name: fluent-bit
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: kube-logging
fluent-bit-configmap.yaml File:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: kube-logging
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-elasticsearch.conf

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off

  output-elasticsearch.conf: |
    [OUTPUT]
        Name            es
        Match           *
        Host            ${FLUENT_ELASTICSEARCH_HOST}
        Port            ${FLUENT_ELASTICSEARCH_PORT}
        Logstash_Format On
        Replace_Dots    On
        Retry_Limit     False

  parsers.conf: |
    [PARSER]
        Name        apache
        Format      regex
        Regex       ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        apache2
        Format      regex
        Regex       ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        apache_error
        Format      regex
        Regex       ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$

    [PARSER]
        Name        nginx
        Format      regex
        Regex       ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        json
        Format      json
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name        syslog
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S
fluent-bit-ds.yaml File:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: kube-logging
  labels:
    k8s-app: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit-logging
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.3.11
        imagePullPolicy: Always
        ports:
        - containerPort: 2020
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      serviceAccountName: fluent-bit
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"
Once you have created these files, we will use the kubectl create command to deploy the Fluent Bit service. Run the following commands:
kubectl create -f fluent-bit-service-account.yaml
kubectl create -f fluent-bit-role.yaml
kubectl create -f fluent-bit-role-binding.yaml
kubectl create -f fluent-bit-configmap.yaml
kubectl create -f fluent-bit-ds.yaml
Run the following command to see whether the DaemonSet was created or not:
kubectl get ds -n kube-logging
And that's it. Our work here is done. You can use the kubectl get pod and kubectl get services commands from the sections above to see the pod information and the services running.
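Once the Fluent Bit pods are running, logs should start flowing into Elasticsearch. Because the output plugin has Logstash_Format On, the indices are named logstash-YYYY.MM.DD, so logstash-* is the index pattern to create in Kibana (under Management) to start exploring the logs. As an optional sanity check, you can list the indices using the same pod IP as in Step 2:
# Look for logstash-<date> indices with a growing docs.count
curl "http://XX.XXX.XXX.XXX:9200/_cat/indices?v"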
Conclusion:
So in this long tutorial, we successfully set up the EFK stack for logging in Kubernetes. The EFK stack here refers to Elasticsearch, Fluent Bit, and Kibana. If you want to set up Fluentd instead, the same set of YAML files can be used as a template, with the Fluent Bit image and configuration replaced by their Fluentd equivalents.
If you face any issue in the setup, share it with us and we will definitely help you out.
Frequently Asked Questions (FAQs)
1. What is the EFK stack?
The EFK stack consists of Elasticsearch, Fluent-bit, and Kibana. It is a popular combination of open-source tools used for log management in Kubernetes environments. Elasticsearch is a distributed search and analytics engine, Fluent-bit is a lightweight log collector, and Kibana is a data visualization platform.
2. Why use the EFK stack for Kubernetes log management?
The EFK stack offers several advantages for Kubernetes log management. It provides centralized log storage, efficient log collection and parsing, and powerful data visualization capabilities. Using the EFK stack allows you to effectively monitor and analyze logs, troubleshoot issues, and gain valuable insights into your Kubernetes environment.
3. How do I set up the EFK stack for Kubernetes log management?
Setting up the EFK stack involves deploying and configuring Elasticsearch, Fluent-bit, and Kibana in your Kubernetes cluster. You can use YAML manifests or Helm charts to deploy these components. Configuration involves defining log sources, setting up log parsing and filtering, and configuring visualization dashboards in Kibana.
4. Can I scale the EFK stack to handle large log volumes?
Yes, the EFK stack is designed to handle large log volumes. Elasticsearch, the core component of the stack, is built for horizontal scalability and can be configured as a cluster to handle increased log ingestion and storage requirements. By properly configuring and scaling the EFK stack components, you can handle logs at scale.
5. Are there alternatives to the EFK stack for Kubernetes log management?
Yes, there are alternatives to the EFK stack for Kubernetes log management. Some popular alternatives include the ELK stack (Elasticsearch, Logstash, and Kibana) and the Prometheus-Grafana stack. Each stack has its own set of features and capabilities, so it's essential to evaluate your requirements and choose the stack that best aligns with your needs.