Processing logs with Elastic Stack #1 - parse and send various log entries to Elasticsearch

Why should we learn how to process application logs with Elastic Stack? After all, the default logging mechanism in Spring Boot allows us to start working on our POC in no time. However, we must be aware that inadequate logging makes debugging and monitoring difficult in a production environment.

What we are going to build

In this example we are going to work with the project described in the Spring Boot Log4j 2 advanced configuration #2 – add a Rollover Strategy for log files post and available in the spring-boot-log4j-2-scaffolding repository. To enhance the project with the Elastic Stack we’re going to add:

FileBeat to read from a log file and pass entries to Logstash;
Logstash to parse and send logs to Elasticsearch;
Elasticsearch to keep indexed logs accessible to Kibana;
ElasticHQ to monitor Elasticsearch.

As a result, we will be able to process Spring Boot logs with Elastic Stack.

Process logs in Elastic Stack run with Docker

All services are configured in the docker-compose.yml file attached to the project. You can clone the repository and run $ docker-compose up on your machine to verify the results. Remember to start the Spring Boot app first, so that there are logs for Elastic Stack to process.

The example configuration is based on the documentation. I store all sensitive and configurable properties as environment variables. In the same directory as the docker-compose.yaml file, create the file that contains the default values for the environment:

Using the COMPOSE_PROJECT_NAME variable is totally up to you – I just wanted to shorten the service names in the command line output. Browse image tags to see other available versions.

Run Elasticsearch with Docker

The container config is shown in the following snippet from the docker-compose.yaml file:

Volume

To keep data between container restarts I set up a named volume on my machine. I mounted my elasticsearch volume at /usr/share/elasticsearch/data in the container, the location recommended in the docs and in this issue.

Environment variables

You can read about the details concerning the ES_JAVA_OPTS in the Setting JVM options for an ElasticSearch service run in a Docker container post. Let’s explore the rest in the following sections.

Security

By default the security features are disabled. We want the services to communicate securely. Therefore, we set the xpack.security.enabled property to true and provide the credentials.

Production and development mode

When an Elasticsearch node uses single-node discovery, it can’t form a cluster with another machine via a non-loopback address. Configuring the internal communication in this way means that the node runs in development mode. We want to work in this mode so that the bootstrap checks don’t block startup: in development mode any failed check is logged as a warning, while in production mode it prevents the application from starting.

These bootstrap checks inspect a variety of Elasticsearch and system settings and compare them to values that are safe for the operation of Elasticsearch.
https://www.elastic.co/guide/en/elasticsearch/reference/current/bootstrap-checks.html
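The difference between the two modes can be pictured with a small Python sketch. It is a simplified illustration of the behaviour described above, not how Elasticsearch implements it; the function name, the exception class and the example check results are all hypothetical.

```python
import logging

logger = logging.getLogger("bootstrap")


class BootstrapCheckError(RuntimeError):
    """Raised when a check fails in production mode."""


def run_bootstrap_checks(checks, production_mode):
    """Evaluate (name, passed) pairs and return the names of failed checks.

    Development mode: failures are only logged as warnings.
    Production mode: any failure aborts startup with an exception.
    """
    failed = [name for name, passed in checks if not passed]
    if failed and production_mode:
        raise BootstrapCheckError("bootstrap checks failed: " + ", ".join(failed))
    for name in failed:
        logger.warning("bootstrap check failed: %s", name)
    return failed


# Hypothetical check results, for illustration only.
example_checks = [("heap size", True), ("file descriptors", False)]
```

Calling run_bootstrap_checks(example_checks, production_mode=False) only logs a warning, while the same call with production_mode=True raises the exception and stops the start-up.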

Ports

Elasticsearch uses the http and transport ports. The former supports incoming HTTP requests and the latter serves for communication between nodes. We’re going to run only one elasticsearch container, so we’ll expose only the http port, 9200, to allow communication with Logstash and Kibana (that is, to expose the APIs over HTTP). Check out the documentation on configuring the transport modules if you need to set up communication between nodes.

Networks

I’m going to keep all services running in the example project within one network – internal. Feel free to configure networking according to your needs.

Run ElasticHQ with Docker

For monitoring Elasticsearch nodes we’re going to use ElasticHQ. It’s an open-source application that we can run using its Docker image. This tool provides the REST API for managing clusters at the http://localhost:5000/api URL. To run the service with Docker I updated the docker-compose.yaml file as shown below:

To make sure that the elasticsearch service will start before elastichq, we use the depends_on property.

Connecting to Elasticsearch

After starting the container we can verify the results by visiting the default address http://localhost:5000. You can see the page in the screenshot below:

[Screenshot: ElasticHQ connecting with Elasticsearch]

The default URL visible in the input takes its value from the HQ_DEFAULT_URL environment variable. The ElasticHQ format for Basic Auth requires including the Elasticsearch credentials in the URL. To make the default port (5000) available, it is exposed in the docker-compose.yaml file. After a successful connection with the elasticsearch node we can see the following view:

[Screenshot: ElasticHQ view showing how Elastic Stack processes logs]

You can also connect over SSL, change the logging setup or externalize the configuration.
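To make the Basic Auth URL format more concrete, here is a minimal Python sketch that assembles such a URL. The helper name is hypothetical, and percent-encoding the credentials is my own precaution for special characters (I am assuming ElasticHQ accepts encoded credentials); the host, port and example credentials are the ones used later in this post.

```python
from urllib.parse import quote, urlsplit


def build_basic_auth_url(host, port, user, password, scheme="http"):
    """Embed credentials in a URL as scheme://user:password@host:port."""
    # Percent-encode the credentials so characters like '@' or ':' stay unambiguous.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return f"{scheme}://{safe_user}:{safe_password}@{host}:{port}"


url = build_basic_auth_url("elasticsearch", 9200, "elastic", "test")
parts = urlsplit(url)  # splits the URL back into username, password, host and port
```

The resulting string is the kind of value you could put into HQ_DEFAULT_URL or paste into the connection form.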

Run Logstash with Docker

Furthermore, to ensure that we process logs properly within our Elastic Stack, we are going to transfer data through a Logstash pipeline.

Pipeline

Create the logstash.conf file in which we’re going to specify and configure plugins for each pipeline section:

The Logstash documentation contains other example configurations that illustrate how you can create a more advanced setup. Let’s take a look at the structure of our config file.

input

Logstash will expect incoming Beats connections on the 5044 port. We have to remember this when configuring the Filebeat output.

filter

Applying filters allows us to parse and customise unstructured log data. With the Grok filter plugin we can configure the syntax and semantic to pull out useful fields from a log entry. Feel free to browse available patterns or create custom patterns if needed.

match

Every log entry is going to be matched against regular expressions and mapped according to the parts we want to extract. You don’t need to start your application to verify whether the pattern will work. Visit Grok Debugger, paste an example log line and your pattern to see the matches. As you can see I defined two matches – one for Java exceptions and one for Spring Boot logs. You can learn more about it in the How to parse exceptions and normal logs with Grok filters post.
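The idea of trying several patterns against one entry can be sketched in Python with named groups, which play a role similar to grok's semantic names. This is only an analogy: the two regular expressions and the log layouts they assume are hypothetical and are not the grok patterns used in the project.

```python
import re

# Hypothetical log layouts; the project's real grok patterns are not reproduced here.
SPRING_BOOT_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<logger>[\w.$]+)\s+-\s+"
    r"(?P<message>.*)$"
)
JAVA_EXCEPTION = re.compile(
    r"^(?P<exception>[\w.$]+(?:Exception|Error)):\s+(?P<message>[^\n]*)"
)

# Patterns are tried in order; the first one that matches wins.
PATTERNS = [("exception", JAVA_EXCEPTION), ("log", SPRING_BOOT_LINE)]


def parse_entry(entry):
    """Return (kind, fields) for the first matching pattern, or (None, {})."""
    for kind, pattern in PATTERNS:
        match = pattern.match(entry)
        if match:
            return kind, match.groupdict()
    return None, {}
```

An entry that matches neither pattern comes back as (None, {}), which corresponds to the situation where grok marks the event as a parse failure.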

Custom grok patterns

In the match section you can see that I declared the path to the directory containing my custom grok patterns:

In my project the patterns directory is located in the same place as the logstash.conf file. Later, in the docker configuration you’ll see that I mount the ./logstash/pipeline directory to the /usr/share/logstash/pipeline location in the container. Therefore, in the patterns_dir option I put the resulting path inside the container.

Conditions

In my example I want to impose control over how events are processed by the filter:

I merge two parts of the logger field as described in the Parsing logs with Grok #1 What to do when part of one field got caught in a different pattern post.
I remove the grok failure tag when an entry was successfully processed by one of my matches, as described in the Parsing logs with Grok #2 How to parse exceptions alongside regular logs post.
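The two conditional steps listed above can be mimicked on a plain Python dict. This is a rough sketch under my own assumptions: the logger_rest field name is hypothetical, and I assume the failure tag is grok's default _grokparsefailure, which can be left on an event when one of two grok filters did not match even though the other one did.

```python
GROK_FAILURE_TAG = "_grokparsefailure"


def apply_conditions(event, matched):
    """Mimic the two conditional steps on a parsed event (a plain dict)."""
    result = dict(event)
    tags = list(result.get("tags", []))

    # 1. Merge a logger name that was split across two fields.
    #    "logger_rest" is a hypothetical field name.
    if "logger_rest" in result:
        result["logger"] = result.get("logger", "") + result.pop("logger_rest")

    # 2. Drop the failure tag when one of the matches succeeded.
    if matched and GROK_FAILURE_TAG in tags:
        tags.remove(GROK_FAILURE_TAG)

    # Keep the tags field only when there is something left in it.
    if tags:
        result["tags"] = tags
    else:
        result.pop("tags", None)
    return result
```

The function returns a new dict and leaves the input untouched, so it is easy to compare an event before and after the conditions are applied.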

output

This is the final step in our pipeline. We can declare multiple outputs to push data to different destinations and we have a wide range of available plugins to assist us in this task. In our example I’m going to use the elasticsearch plugin. You can explore documentation to learn about all accessible options for this plugin. I decided to set the output options with environment variables.

hosts

We use this parameter to reference either data or client nodes in Elasticsearch. I could pass an array of hosts to distribute requests across them, but for the sake of simplicity let’s just use one – elasticsearch:9200.

user, password

We enabled the Elasticsearch security in the container config. Therefore, my logstash service has to use the username and password that will be set for elasticsearch.

index

Logstash will write logs under this index. I decided to make it dynamic by combining the spring-boot-app-logs prefix with the event timestamp formatted according to the Joda format.
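To show what such a dynamic name looks like, here is a small Python sketch that builds a daily index name from an event timestamp. It assumes the Joda pattern is YYYY.MM.dd (the one visible in the index name later in this post) and that the date is taken in UTC; the function name is hypothetical and the code only imitates the result, it is not how Logstash formats the name internally.

```python
from datetime import datetime, timedelta, timezone


def index_name(prefix, event_timestamp):
    """Build a daily index name such as spring-boot-app-logs-2020.01.31."""
    # Joda's YYYY.MM.dd corresponds to %Y.%m.%d here; the date is taken in UTC (assumption).
    utc = event_timestamp.astimezone(timezone.utc)
    return f"{prefix}-{utc.strftime('%Y.%m.%d')}"


late_utc = datetime(2020, 1, 31, 23, 59, tzinfo=timezone.utc)
name = index_name("spring-boot-app-logs", late_utc)

# An event stamped with a +02:00 offset: the local date is already February 1st,
# but the index date follows the UTC date.
local = datetime(2020, 2, 1, 1, 0, tzinfo=timezone(timedelta(hours=2)))
name_from_local = index_name("spring-boot-app-logs", local)
```

With one index per day, old logs can be removed simply by deleting the indices for the days you no longer need.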

Debugging in console

You can add another output to see Logstash logs in the console:

If you use IntelliJ with Docker support you will be able to see the parsed log entries alongside regular Logstash logs:

[Screenshot: parsed log entries in the Logstash console output]

Docker container

To run this service we’re going to add the following lines to our docker-compose.yml file:

Applying pipeline config to the docker service using a volume

This service has to use the pipeline in the logstash.conf file and our custom grok patterns. We’re going to mount the ./logstash/pipeline directory as a read-only volume to the /usr/share/logstash/pipeline location in the container. Thanks to that, our container will be able to use our configuration file and custom patterns.

Ports

We need to expose the default port of the Logstash API – 9600 – as well as the 5044 port we already defined in the logstash.conf file as the input port for data sent by Filebeat.

Environment variables

We’re going to set the heap size with the same JAVA_OPTS variable as in the Elasticsearch container. Furthermore, we have to set credentials and define hosts that will be applied in the logstash.conf file. I decided to disable X-Pack Monitoring to keep this example as simple as possible. If you leave it enabled but not configured you will get the Unable to retrieve license information from license server error.

Run Filebeat with Docker

Filebeat can read and forward log lines reliably even when it’s interrupted. Once everything works again, it starts from where it was when the failure occurred. Therefore, we can be sure that Elastic Stack will process all logs.

In this example it will read log entries from the all.log file. Remember that Filebeat doesn’t read the last line in a file if it isn’t terminated with a newline character. You can explore all configuration options in the filebeat.reference.yml file and learn more about this tool in the How Filebeat works docs.
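Both behaviours, resuming from the last position and skipping an unterminated last line, can be pictured with a short Python sketch. It is a deliberately simplified model, not Filebeat's implementation: the function name is hypothetical and the offset stands in for the read position that Filebeat persists between runs.

```python
def read_complete_lines(data, offset=0):
    """Return (lines, new_offset), consuming only newline-terminated lines.

    `data` is the file content as bytes; `offset` is where the previous read stopped.
    """
    lines = []
    while True:
        end = data.find(b"\n", offset)
        if end == -1:
            break  # the remainder has no newline yet, so leave it for a later read
        lines.append(data[offset:end].decode("utf-8"))
        offset = end + 1
    return lines, offset


content = b"first\nsecond\nthird"
lines, offset = read_complete_lines(content)

# Once the last line is terminated, reading resumes from the saved offset.
more_lines, new_offset = read_complete_lines(content + b"\n", offset)
```

In the first call the "third" line is held back because nothing terminates it; the second call picks it up without re-reading the earlier lines.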

Configuration

Firstly, we need to configure our filebeat service to:

read log entries from the all.log file,
concatenate lines of a stacktrace into one entry,
send entries to our logstash service.

You can see the full setup in the snippet below:

inputs

In this section we tell Filebeat how it should locate and process data. We are enabling the log input that will read from the file that we specified in the paths option. Additionally, we’re concatenating a Java stack trace into one entry by using the multiline option. Check out the Reading from rotating logs and Log rotation results in lost or duplicate events articles if you want to configure Filebeat to read from rotating log files.
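The effect of the multiline option can be sketched in Python as follows. The continuation rule is an assumption of mine (stack frames starting with whitespace followed by "at" or "...", and lines starting with "Caused by:"); the actual pattern lives in the project's filebeat.yml and may differ, and the function name is hypothetical.

```python
import re

# Assumed continuation rule: stack frames ("    at ...", "    ... 12 more") and "Caused by:" lines.
CONTINUATION = re.compile(r"^\s+(?:at|\.{3})\s|^Caused by:")


def join_multiline(lines):
    """Append continuation lines to the entry that precedes them."""
    entries = []
    for line in lines:
        if entries and CONTINUATION.search(line):
            entries[-1] += "\n" + line
        else:
            entries.append(line)
    return entries


raw_lines = [
    "2020-01-31 10:00:00.000 ERROR com.example.App - failed",
    "java.lang.IllegalStateException: boom",
    "\tat com.example.App.main(App.java:10)",
    "Caused by: java.io.IOException: disk",
    "\t... 3 more",
    "2020-01-31 10:00:01.000 INFO com.example.App - ok",
]
entries = join_multiline(raw_lines)
```

Here the exception line and the three lines that follow it end up in a single entry, so the whole stack trace reaches Logstash as one event instead of four.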

output

We’re going to configure Filebeat to use Logstash. For the sake of simplicity I specified only one entry in the hosts option and used the default 5044 port.

Set config file permissions properly

We can read in the documentation that:

The owner of the configuration files must be either root or the user who is executing the Beat process. The permissions on each file must disallow writes by anyone other than the owner.
https://www.elastic.co/guide/en/beats/libbeat/current/config-file-permissions.html#config-file-permissions

To comply with these requirements for our filebeat.yml file, we are going to customize the image configuration. Create the Dockerfile with the following content:

Our configuration will be copied to the /usr/share/filebeat/filebeat.yml location in the container.
We switch temporarily to root to change the ownership of this file: the user ownership is changed to root and the group ownership is changed to filebeat.
We remove the write privilege for anyone except the owner.
In the end, we can switch to the filebeat user.

Don’t disable the strict permission check and don’t run the container as root to fix the ownership issue. Use the custom image instead.
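If you want to confirm the write-permission part of the quoted rule for a file on a POSIX system, a few lines of Python are enough. This is only a sketch: the function name is hypothetical, it does not check the ownership part of the rule, and it runs against a temporary file rather than a real filebeat.yml.

```python
import os
import stat
import tempfile


def writable_only_by_owner(path):
    """True when neither group nor others have write permission on the file."""
    mode = os.stat(path).st_mode
    return not (mode & (stat.S_IWGRP | stat.S_IWOTH))


# Demonstration on a temporary file rather than a real filebeat.yml.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name

os.chmod(demo_path, 0o664)  # group-writable: violates the rule
too_open = writable_only_by_owner(demo_path)

os.chmod(demo_path, 0o644)  # write access removed for group and others
strict = writable_only_by_owner(demo_path)

os.remove(demo_path)
```

The second chmod has the same effect as removing the write privilege for everyone except the owner, which is what the custom image does for the configuration file.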

Configure the docker container

The filebeat service is the last one we’re going to set up in the docker-compose.yml file in this article.

In the filebeat.yml file we specified the path to our log file: /logs/all.log. In the container config we have to mount this file to the given path. Thanks to this volume, Filebeat can access the logs as you can see in the screenshot below:

[Screenshot: file with logs to be processed with Elastic Stack mounted to a Filebeat container]

Verify that Elastic Stack processes application logs

Let’s assume that we have the all.log file with the following content:

I’m going to start all services with the following command run in the directory where my docker-compose.yml file is located:

When all services are running I have to do the following:

visit the ElasticHQ app running at http://localhost:5000/,
connect using the http://elastic:test@elasticsearch:9200 URL in the form input,
choose the Query option from the menu (top right corner),
select the spring-boot-app-logs-YYYY.MM.dd index and execute the default query.

The results look like the screenshot below:

[Screenshot: logs processed with Elastic Stack shown in ElasticHQ]

We can verify that filtering, parsing and mutating log entries work correctly. As you can see, we configured the Elastic Stack services to process our logs.

In addition, you can find the code that enables us to process logs with Elastic Stack in the commit 2a8068c2209c00605f2f24470c75b83ab712267c.

Learn more on how to take care of logs and process them with Elastic Stack

Logging guide from OWASP
Logging cheat sheet from OWASP
Insufficient logging and monitoring impacts
Using the grok debugger (YouTube video)
Getting started with the Elastic Stack
Getting started with Filebeat
disabling Logstash monitoring: How to fix logstash error Unable to retrieve license information from license server
enabling Logstash monitoring: Monitoring Logstash with X-Pack, Configuring Credentials for Logstash Monitoring

Photo by Alex Knight on StockSnap
