User activity on our apps is incredibly valuable data. System failures and performance states provide rich information that feeds the quality of a product. However, collecting every single action from both humans and machines can be a complex task.
ELK is a technology stack that combines Elasticsearch, Logstash, and Kibana to provide a comprehensive approach to consolidating, managing, and analyzing logs from your apps, delivering real-time insights from the consolidated data.
The Problem
Assuming that all system activity flows to a single point is incorrect, although not impossible to achieve: modern distributed systems produce thousands or even millions of log entries per day from different sources across our network.
In a few words, it is accelerated data growth from many different sources.
And with heavy log generation comes rapidly growing disk usage.
The Solution
The most common way of getting graphs and business metrics out of log files is to consolidate all of them in a single disk/storage location.
Centralized log storage
Pull/Push strategy
Data can flow in both directions, pulled by the central log server or pushed from every node; which one you choose depends on your resources and strategy. Once all the eggs are in one basket, parsing and searching become easy tasks with the right tools. Finally, tracking and graphing the results produces an interesting decision-making tool.
Diving into ELK (Elasticsearch, Logstash, Kibana)
The ELK Stack is popular because it fulfills a specific need in the log management and log analysis space. In cloud-based infrastructures, consolidating log output to a central location from different sources such as web servers, mail servers, database servers, and network appliances can be particularly useful. This is especially true when trying to make better, data-informed decisions. The ELK stack simplifies searching and analyzing data by providing real-time insights from the log data.
It is common to run the full ELK stack, not each individual component separately. Each of these services plays an important role, and in order to perform under high demand it is more advantageous to deploy each service on its own server. Doing so, however, introduces other problems related to multi-node deployments, networking, security, and management. For now, we will stick with the single-stack deployment so we can learn the basics of ELK and use it for development and testing purposes. We will address multi-node deployments suitable for production environments further along in the process.
ELK components: the big picture
Elasticsearch:
A distributed RESTful search engine built for the cloud that stores the logged messages.
Logstash:
Collects, processes, and forwards events and log messages. Receives logged messages and relays them to ElasticSearch.
Kibana:
An open-source, browser-based analytics and search dashboard. Provides visualization capabilities on top of the content indexed on an Elasticsearch cluster.
Beats:
Lightweight client agents in charge of shipping new files and file changes from the data source into the stack.
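Put together, the typical flow of data through these components looks roughly like this:
Beats (on each node) -> Logstash (collect and parse) -> Elasticsearch (store and index) <- Kibana (search and visualize)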
Implementing ELK the easy way (Docker)
This guide offers a step-by-step walk-through of the basic setup of an ELK Stack using container technology (Docker), the basics of feeding logs into the stack, and performing real-time analysis of the data.
This is a particularly useful approach if you are not interested in getting into the low-level details of the ELK setup process and just want to leverage its benefits.
Setting up the ELK Container
The first thing we need to do is download an ELK image. We will be using the sebp/elk image available on Docker Hub, which packages the latest available versions of the components.
Let’s pull the image by issuing the following command:
$ docker pull sebp/elk
Using default tag: latest
Trying to pull repository docker.io/sebp/elk ... latest: Pulling from sebp/elk
The image is around 1 GB, so depending on your internet connection it may take some time to download. Once the image has been downloaded you will see the following message:
Status: Downloaded newer image for docker.io/sebp/elk:latest
And you should be able to see the image by executing the docker images command:
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
docker.io/sebp/elk latest 3e07e86fb40d 10 days ago 975.5 MB
Now we can create a container from the image with the following command:
$ sudo docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -p 5000:5000 -it --name elk sebp/elk
This command publishes the following ports, which are needed for proper operation of the ELK stack:
- 5601 (Kibana web interface).
- 9200 (Elasticsearch JSON interface).
- 5044 (Logstash Beats interface, receives logs from Beats such as Filebeat – see the Forwarding logs with Filebeat section).
- 5000 (Logstash Lumberjack interface, receives logs from Logstash forwarders – see the Forwarding logs with Logstash forwarder section).
Access Kibana’s web interface by browsing to http://<your-host>:5601, where <your-host> is the hostname or IP address of the host Docker is running on, e.g. localhost if running a local native version of Docker, or the IP address of the virtual machine if running a VM-hosted version of Docker.
As of Kibana version 4.0.0, you won’t be able to see anything (not even an empty dashboard) until something has been logged, so let’s jump right into the next section on how to forward logs from regular applications, such as the Apache Web Server.
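Before forwarding any logs, it is worth checking that Elasticsearch inside the container is answering on its published port (this assumes the container is running on localhost):
$ curl http://localhost:9200/
You should get back a small JSON document describing the cluster and its version.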
Logstash Configuration
Let’s first review how the process works:
- The Apache Web Server is running, and logging requests to its configured path.
- Logstash, running as a persistent daemon, monitors the Apache logs for new lines and processes them.
- Logstash will send parsed logs in JSON document form to Elasticsearch for storage and the ability to perform analytics on them.
- Kibana uses Elasticsearch as a back-end for dashboarding and searching.
The first thing to do is confirm where Apache is logging to. By default, the Apache log file can be found here:
sudo tail /var/log/apache2/access.log
You should see one or more lines that look like this:
10.0.0.1 - - [18/Jun/2016:09:10:02 +0000] "GET / HTTP/1.1" 200 11359 "-" "curl/7.38.0"
This means Logstash can now ingest and parse these logs.
Next, create a simple Logstash configuration file. Create a file named apache.conf (the name is arbitrary) under /etc/logstash/conf.d/apache.conf:
input {
  file {
    path => "/var/log/apache2/access.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
If your log file is located somewhere else, adjust the path => "/var/log/apache2/access.log" line accordingly.
The config file simply watches the Apache log file for events, parses them with a grok pattern (a simplified, predefined regular expression) called COMBINEDAPACHELOG, and sends the parsed events to Elasticsearch (the Logstash documentation has additional information).
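For reference, the COMBINEDAPACHELOG pattern breaks each line into named fields such as clientip, verb, request, response, bytes, referrer, and agent. The sample request shown earlier would produce a document along these lines (a simplified, illustrative sketch; Logstash also adds its own metadata fields such as @timestamp and @version):
{
  "clientip": "10.0.0.1",
  "verb": "GET",
  "request": "/",
  "response": "200",
  "bytes": "11359",
  "timestamp": "18/Jun/2016:09:10:02 +0000"
}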
Visualising Data
To confirm the pipeline is working correctly, make some requests to your web server using your web browser. Then, look at an Elasticsearch API to see that a new index has been created (it will appear in the list of indices as logstash- followed by the date):
curl localhost:9200/_cat/indices
There should be an index with the current date appended, containing documents inserted by Logstash. Now, with Kibana running inside the container, browse to the machine’s address on port 5601 to start using Kibana’s interface:
(Screenshot: Kibana index pattern setup page)
Kibana makes an educated guess at your index and time field names, so selecting “Create” here will get you started. Click on the “Discover” tab at the top and you’ll see a timeline of Apache log events.
(Screenshot: Kibana Discover tab showing Apache log events)
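If you prefer the command line over Kibana, you can also query the indexed documents directly through the Elasticsearch API; for example, to fetch Apache events with a 200 response (logstash-* matches the default index names created above):
$ curl 'localhost:9200/logstash-*/_search?q=response:200&pretty'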
Implementing ELK the hard way
By hard we mean deploying every individual component from scratch on a fresh cloud machine.
Set up the Elasticsearch repositories
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
sudo apt-get install apt-transport-https
echo "deb https://artifacts.elastic.co/packages/5.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-5.x.list
sudo apt-get update
https://www.elastic.co/guide/en/elasticsearch/reference/current/deb.html
https://www.elastic.co/guide/en/elasticsearch/reference/master/heap-size.html
sudo apt-get install elasticsearch
vi /etc/elasticsearch/elasticsearch.yml
vi /etc/elasticsearch/jvm.options
-Xms512m
-Xmx1g
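As a reference, a minimal single-node setup only needs a handful of settings in /etc/elasticsearch/elasticsearch.yml (the names below are illustrative; adjust them to your environment), and for jvm.options Elastic recommends setting -Xms and -Xmx to the same value:
cluster.name: my-elk
node.name: elk-node-1
network.host: localhost
http.port: 9200
With the configuration in place, start the service: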
sudo systemctl start elasticsearch.service
Easy way to monitor service activity
sudo journalctl -f
sudo journalctl --unit elasticsearch --since "2016-10-30 18:17:16"
https://www.elastic.co/guide/en/logstash/current/installing-logstash.html#package-repositories
sudo apt-get install logstash
https://www.elastic.co/guide/en/logstash/current/running-logstash.html#running-logstash-systemd
vi /etc/logstash/logstash.yml
sudo systemctl start logstash.service
Test the Elasticsearch endpoint
curl -XGET 'localhost:9200/?pretty'
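If Elasticsearch is up, the response is a small JSON document similar to the following (values will differ on your machine):
{
  "name" : "node-1",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "5.6.0"
  },
  "tagline" : "You Know, for Search"
}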
First test
root@ubuntu-xenial:/vagrant# systemctl start logstash.service
root@ubuntu-xenial:/vagrant# ps aux | grep logstash
logstash 27235 152 39.8 3444488 404764 ? SNsl 23:22 0:15 /usr/bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -Djava.awt.headless=true -Dfile.encoding=UTF-8 -XX:+HeapDumpOnOutOfMemoryError -Xmx1g -Xms256m -Xss2048k -Djffi.boot.library.path=/usr/share/logstash/vendor/jruby/lib/jni -Xbootclasspath/a:/usr/share/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/usr/share/logstash/vendor/jruby -Djruby.lib=/usr/share/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main /usr/share/logstash/lib/bootstrap/environment.rb logstash/runner.rb --path.settings /etc/logstash
root 27270 0.0 0.0 14224 928 pts/0 S+ 23:22 0:00 grep --color=auto logstash
root@ubuntu-xenial:/vagrant# whereis logstash
logstash: /etc/logstash /usr/share/logstash
root@ubuntu-xenial:/vagrant# /usr/share/logstash/bin/logstash -e 'input { stdin { } } output { stdout {} }'
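Type a test message such as hello world and press Enter; Logstash should echo it back as an event with a timestamp, which confirms the pipeline runs end to end. Exit with Ctrl-C (or Ctrl-D) when you are done.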
Install Filebeat on the client machine
sudo apt-get install filebeat
vi /etc/filebeat/filebeat.yml
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/syslog
output.logstash:
  hosts: ["localhost:5044"]
systemctl start filebeat.service
Create your first pipeline
vi /etc/logstash/conf.d/localhost_syslog.conf
input {
  beats {
    port => "5044"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  geoip {
    source => "clientip"
  }
}
output {
  stdout { codec => rubydebug }
}
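Note that with this output block the parsed events only go to the console, which is handy for debugging. To actually store them in Elasticsearch (so Kibana can find them) you would add an elasticsearch output alongside or instead of stdout, for example:
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  stdout { codec => rubydebug }
}
Also keep in mind that %{COMBINEDAPACHELOG} is an Apache access-log pattern; plain syslog lines from /var/log/syslog will not match it and will come through tagged with _grokparsefailure.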
systemctl restart logstash.service
Test config
root@ubuntu-xenial:/home/ubuntu# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/localhost_syslog.conf --config.test_and_exit
WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults
Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs to console
Configuration OK
15:48:23.938 [LogStash::Runner] INFO logstash.runner - Using config.test_and_exit mode. Config Validation Result: OK. Exiting Logstash
root@ubuntu-xenial:/home/ubuntu#
Fire up the agent service on the client side
systemctl restart filebeat.service
Deploy Kibana
sudo apt-get install kibana
whereis kibana
kibana: /etc/kibana /usr/share/kibana
vi /etc/kibana/kibana.yml
Set elasticsearch.url to point at your Elasticsearch instance:
elasticsearch.url: "http://localhost:9200"
Then start the service:
systemctl start kibana.service
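Kibana can take a few seconds to start; you can confirm that it is answering on its port before opening a browser:
curl -I http://localhost:5601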
Control main services
ubuntu@ubuntu-xenial:~$
ubuntu@ubuntu-xenial:~$ sudo su
root@ubuntu-xenial:/home/ubuntu# systemctl start elasticsearch.service
root@ubuntu-xenial:/home/ubuntu# systemctl start logstash.service
root@ubuntu-xenial:/home/ubuntu# systemctl start filebeat.service
root@ubuntu-xenial:/home/ubuntu# systemctl start kibana.service
root@ubuntu-xenial:/home/ubuntu#
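If you want the whole stack to come back automatically after a reboot, you can also enable the units (optional):
sudo systemctl enable elasticsearch.service logstash.service filebeat.service kibana.service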
Monitor for startup activity
Apr 19 21:42:49 ubuntu-xenial su[1779]: pam_systemd(su:session): Cannot create session: Already running in a session
Apr 19 21:43:01 ubuntu-xenial systemd[1]: Starting Elasticsearch...
Apr 19 21:43:01 ubuntu-xenial systemd[1]: Started Elasticsearch.
Apr 19 21:43:08 ubuntu-xenial systemd[1]: Started logstash.
Apr 19 21:43:15 ubuntu-xenial systemd[1]: Started filebeat.
Apr 19 21:43:21 ubuntu-xenial systemd[1]: Started Kibana.
Check that data is reaching Elasticsearch
root@ubuntu-xenial:/home/ubuntu# curl 'localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open .kibana Ru4w0KGQRLOa-Ogp5dHk7Q 1 1 2 0 23.4kb 23.4kb
yellow open filebeat-2017.04.20 bhSrqJLpQk2RTrJikBjTXw 5 1 1278749 0 152.8mb 152.8mb
yellow open logstash-2017.04.20 ez3TSPJZSniovcTPHtd92Q 5 1 586259 0 76.1mb 76.1mb
yellow open filebeat-2017.04.19 LvDgezqwSDaVVzg0utKIqg 5 1 3414161 0 407mb 407mb
root@ubuntu-xenial:/home/ubuntu#
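As an extra sanity check you can count the documents shipped by Filebeat so far:
curl 'localhost:9200/filebeat-*/_count?pretty'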
Conclusion
Log management and analysis are key activities when it comes to maintaining a stable infrastructure. Being able to consolidate logs from different sources and analyze them easily adds a lot of value to your DevOps practice and fosters a culture of continuous improvement within your organization.