Processing logs with Elastic Stack #1 – parse and send various log entries to ElasticSearch

The default logging mechanism in Spring Boot allows us to start working on our POC in no time. However, we must be aware that inadequate logging makes debugging and monitoring difficult in a production environment.

What we are going to build

In this example we are going to work with the project described in the post “Spring Boot Log4j 2 advanced configuration #2 – add a Rollover Strategy for log files”, which is available in the spring-boot-log4j-2-scaffolding repository. To enhance the project with the Elastic Stack we’re going to add:

  • FileBeat to read from a log file and pass entries to Logstash;
  • Logstash to parse and send logs to Elasticsearch;
  • Elasticsearch to keep indexed logs accessible to Kibana;
  • ElasticHQ to monitor Elasticsearch.

Dockerized environment

All services are configured in the docker-compose.yml file which is attached to the project. You can clone the repository and run $ docker-compose up on your machine to verify results.

The example configuration is based on the documentation. All sensitive and configurable properties will be passed as environment variables. In the directory where the docker-compose.yaml file resides, create the file that contains the default values for the environment:

Using the COMPOSE_PROJECT_NAME variable is totally up to you – I just wanted to shorten the service names in the command line output. Browse image tags to see other available versions.

Elasticsearch

The container config is shown in the following snippet from the docker-compose.yaml file:
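A minimal sketch of such a service definition could look like the fragment below. It is not the repository's exact file: the ELASTIC_VERSION, ES_JAVA_OPTS and ELASTIC_PASSWORD variables are assumed to come from the environment file, and their names are my assumption. The individual options are discussed in the sections that follow.

```yaml
# docker-compose.yaml (fragment): an illustrative sketch, not the repository's exact file
services:
  elasticsearch:
    # ELASTIC_VERSION is a hypothetical variable defined in the environment file
    image: docker.elastic.co/elasticsearch/elasticsearch:${ELASTIC_VERSION}
    volumes:
      # the named volume keeps the data directory between container restarts
      - elasticsearch:/usr/share/elasticsearch/data
    environment:
      # JVM options (heap size); the value comes from a hypothetical variable
      ES_JAVA_OPTS: ${ES_JAVA_OPTS}
      # single-node discovery keeps the node in development mode
      discovery.type: single-node
      # security features are disabled by default, so enable them explicitly
      xpack.security.enabled: "true"
      # password for the built-in elastic user
      ELASTIC_PASSWORD: ${ELASTIC_PASSWORD}
    ports:
      # only the http port is published; the transport port stays unexposed
      - "9200:9200"
    networks:
      - internal

# top-level declarations referenced by the service above
volumes:
  elasticsearch:

networks:
  internal:
```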

Volume

To keep data between container restarts I set up a named volume on my machine. I mounted my elasticsearch volume at /usr/share/elasticsearch/data, the data directory recommended in the docs and in this issue.

Environment variables

The details concerning the ES_JAVA_OPTS are covered in the Setting JVM options for an ElasticSearch service run in a Docker container post. Let’s explore the rest in the following sections.

Security

By default the security features are disabled. We want secured communication between the services. Therefore, we set the xpack.security.enabled property to true and provide the credentials.

Production and development mode

When an Elasticsearch node uses single-node discovery, it can’t form a cluster with another machine via a non-loopback address. Configuring the internal communication in this way means that the node runs in development mode. We want to work in this mode so that failed bootstrap checks don’t stop the node: in development mode any failed check is logged as a warning, while in production mode it prevents the application from starting.

These bootstrap checks inspect a variety of Elasticsearch and system settings and compare them to values that are safe for the operation of Elasticsearch.

https://www.elastic.co/guide/en/elasticsearch/reference/current/bootstrap-checks.html

Ports

Elasticsearch uses the http and transport ports. The former supports incoming HTTP requests and the latter serves communication between nodes. We’re going to run only one elasticsearch container, so we’ll expose only the http port, 9200, which makes the APIs available over HTTP to Logstash and Kibana. Check out the documentation on configuring the transport modules if you need to set up communication between nodes.

Networks

I’m going to keep all services running in the example project within one network – internal. Feel free to configure networking according to your needs.

Elastichq

For monitoring Elasticsearch nodes we’re going to use ElasticHQ. It’s an open source application that we can run using its docker image. This tool provides a REST API for managing clusters at the http://localhost:5000/api url. To run the service with Docker I updated the docker-compose.yaml file as shown below:

To make sure the elasticsearch service is started before elastichq, we use the depends_on property.
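As a rough illustration, the service could be declared along these lines. The elastichq/elasticsearch-hq image and the HQ_DEFAULT_URL variable are the ones ElasticHQ publishes; the ELASTIC_USER and ELASTIC_PASSWORD variables are hypothetical names for values kept in the environment file.

```yaml
# docker-compose.yaml (fragment): an illustrative sketch
services:
  elastichq:
    image: elastichq/elasticsearch-hq
    environment:
      # default connection url shown in the ElasticHQ form;
      # the credentials are embedded in the url (hypothetical variable names)
      HQ_DEFAULT_URL: http://${ELASTIC_USER}:${ELASTIC_PASSWORD}@elasticsearch:9200
    ports:
      # publish the default ElasticHQ port
      - "5000:5000"
    depends_on:
      # start the elasticsearch container first
      - elasticsearch
    networks:
      - internal
```

Keep in mind that depends_on only controls the start order of the containers. It does not wait until Elasticsearch is ready to accept connections.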

Connecting to Elasticsearch

After starting the container we can verify the results by visiting the default address http://localhost:5000. The page is shown in the screenshot below:

elastichq connecting with elasticsearch screenshot

The default url visible in the input takes its value from the HQ_DEFAULT_URL environment variable. The ElasticHQ format for Basic Auth requires embedding the Elasticsearch credentials in the url, for example http://elastic:test@elasticsearch:9200. To make the default port (5000) available, it is exposed in the docker-compose.yaml file. After a successful connection with the elasticsearch node we are redirected to the following view:

elastichq screen screenshot

You can also connect over SSL, change the logging setup or externalize the configuration.

Logstash

To ensure that the Elasticsearch output will be well structured we are going to transfer data through a Logstash pipeline.

Pipeline

Create the logstash.conf file in which we’re going to specify and configure plugins for each pipeline section:

The Logstash documentation contains other example configurations that illustrate how you can create a more advanced setup. Let’s take a look at the structure of our config file.

input

Logstash will expect incoming Beats connections on the 5044 port. We have to remember this when configuring the Filebeat output.

filter

Applying filters allows us to parse and customise unstructured log data. With the Grok filter plugin we can define the syntax and semantics used to pull useful fields out of a log entry. Feel free to browse available patterns or create custom patterns if needed.

match

Every log entry is going to be matched against regular expressions and mapped according to the parts we want to extract. You don’t need to start your application to verify whether the pattern will work. Visit Grok Debugger, paste an example log line and your pattern to see the matches. As you can see, I defined two matches: one for Java exceptions and one for Spring Boot logs. You can learn more about it in the How to parse exceptions and normal logs with Grok filters post.

Custom grok patterns

In the match section you can see that I declared the path to the file containing my custom grok patterns:

In my project the patterns directory is located in the same place as the logstash.conf file. Later, in the docker configuration you’ll see that I mount the ./logstash/pipeline directory to the /usr/share/logstash/pipeline location in the container. Therefore, in the patterns_dir option I put the resulting path to this file.

Conditions

In my example I want to impose control over how events are processed by the filter:

output

This is the final step in our pipeline. We can declare multiple outputs to push data to different destinations and we have a wide range of available plugins to assist us in this task. In our example I’m going to use the elasticsearch plugin. You can explore the documentation to learn about all accessible options for this plugin. I decided to set the output options with environment variables.

hosts

This parameter is used to reference either data or client nodes in Elasticsearch. I could pass an array of hosts to distribute requests across them, but for the sake of simplicity let’s just use one: elasticsearch:9200.

user, password

We enabled the Elasticsearch security in the container config. Therefore, my logstash service has to use the username and password that will be set for elasticsearch.

index

The log events will be written to this index. I decided to make it dynamic by combining the spring-boot-app-logs prefix with the event timestamp formatted with a Joda pattern, which gives names like spring-boot-app-logs-YYYY.MM.dd.

Debugging in console

You can add another output to see Logstash logs in the console:

If you use IntelliJ with Docker support you will be able to see the parsed log entries alongside regular Logstash logs:

logstash output in console screenshot

Docker container

To run this service we’re going to add the following lines to our docker-compose.yml file:

Applying pipeline config to the docker service using a volume

This service has to use the pipeline in the logstash.conf file and our custom grok patterns. We’re going to mount the ./logstash/pipeline directory as a read-only volume to the /usr/share/logstash/pipeline location in the container. Thanks to that, our container will be able to use our configuration file and custom patterns.

Ports

We need to expose the default port of the Logstash monitoring API, 9600, as well as the 5044 port we already defined in the logstash.conf file as the input port for data sent by Filebeat.

Environment variables

We’re going to set the heap size with the same JAVA_OPTS variable as in the Elasticsearch container. Furthermore, we have to set credentials and define hosts that will be applied in the logstash.conf file. I decided to disable X-Pack Monitoring to keep this example as simple as possible. If you leave it enabled but not configured you will get the Unable to retrieve license information from license server error.
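Putting the volume, ports and environment variables together, the service could be sketched as follows. The heap value is only an example, and ELASTIC_VERSION, ELASTIC_USER, ELASTIC_PASSWORD and ELASTIC_HOST are hypothetical names that the logstash.conf file would have to reference with the ${...} syntax. LS_JAVA_OPTS is the variable Logstash reads for JVM options, and to my knowledge the official image translates XPACK_MONITORING_ENABLED into the xpack.monitoring.enabled setting.

```yaml
# docker-compose.yaml (fragment): an illustrative sketch
services:
  logstash:
    # ELASTIC_VERSION is a hypothetical variable defined in the environment file
    image: docker.elastic.co/logstash/logstash:${ELASTIC_VERSION}
    volumes:
      # read-only mount with logstash.conf and the custom grok patterns
      - ./logstash/pipeline:/usr/share/logstash/pipeline:ro
    ports:
      # Beats input declared in logstash.conf
      - "5044:5044"
      # Logstash monitoring API
      - "9600:9600"
    environment:
      # heap size for the Logstash JVM (example value)
      LS_JAVA_OPTS: "-Xms256m -Xmx256m"
      # keep the example simple: no X-Pack Monitoring
      XPACK_MONITORING_ENABLED: "false"
      # values used by the elasticsearch output (hypothetical names)
      ELASTIC_USER: ${ELASTIC_USER}
      ELASTIC_PASSWORD: ${ELASTIC_PASSWORD}
      ELASTIC_HOST: elasticsearch:9200
    depends_on:
      - elasticsearch
    networks:
      - internal
```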

Filebeat

Filebeat can read and forward log lines reliably even when it’s interrupted. Once everything works again, it starts from where it was when the failure occurred. In this example it will read log entries from the all.log file. Remember that Filebeat doesn’t read the last line in a file if there is no new line after it. You can explore all configuration options in the filebeat.reference.yml file and learn more about this tool in the How Filebeat works docs.

Configuration

We need to configure our filebeat service to:

  • read log entries from the all.log file,
  • concatenate lines of a stacktrace into one entry,
  • send entries to our logstash service.

You can see the full setup in the snippet below:

inputs

In this section we tell Filebeat how it should locate and process data. We are enabling the log input that will read from the file specified in the paths option. Additionally, we’re concatenating the lines of a Java stack trace into one entry by using the multiline option. Check out the Reading from rotating logs and Log rotation results in lost or duplicate events articles if you want to configure Filebeat to read from rotating log files.

output

We’re going to configure Filebeat to use Logstash. For the sake of simplicity I specified only one entry in the hosts option and used the default 5044 port.
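The inputs and output sections described above could be sketched as the following filebeat.yml. The multiline pattern is the Java stack trace example from the Filebeat documentation, so treat it as an assumption: the pattern used in the project may differ. The logstash host name assumes the service is called logstash in the docker-compose file.

```yaml
# filebeat.yml: an illustrative sketch
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      # path inside the container; the log file is mounted there
      - /logs/all.log
    # lines starting with whitespace followed by "at" or "...",
    # or starting with "Caused by:", belong to a stack trace
    multiline.pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:'
    # append matching lines to the preceding line that did not match
    multiline.negate: false
    multiline.match: after

output.logstash:
  # the Beats input port declared in logstash.conf
  hosts: ["logstash:5044"]
```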

Set config file permissions properly

The documentation states:

The owner of the configuration files must be either root or the user who is executing the Beat process. The permissions on each file must disallow writes by anyone other than the owner.

https://www.elastic.co/guide/en/beats/libbeat/current/config-file-permissions.html#config-file-permissions

To comply with these requirements for our filebeat.yml file, we are going to customize the image configuration. Create the Dockerfile file with the following content:

  • Our configuration will be copied to the /usr/share/filebeat/filebeat.yml location in the container.
  • We switch temporarily to root to change ownership of this file: the user ownership is changed to the root and the group ownership is changed to the filebeat.
  • We remove write privilege for anyone except the owner.
  • We switch to the filebeat user.

Don’t disable the strict permission check and don’t run the container as root to fix the ownership issue. Use the custom image instead.

Configure the docker container

The filebeat service is the last one we’re going to set up in the docker-compose.yml file in this article.

In the filebeat.yml file we specified the path to our log file: /logs/all.log. In the container config we have to mount this file to the given path. Thanks to this volume, Filebeat can access the logs as you can see in the screenshot below:

log file mounted to filebeat container screenshot
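A sketch of a filebeat service that builds the custom image and mounts the log file could look like this. The ./filebeat build context and the ./logs/all.log host path are hypothetical; adjust them to wherever your Dockerfile and log file actually live.

```yaml
# docker-compose.yaml (fragment): an illustrative sketch
services:
  filebeat:
    build:
      # hypothetical directory holding the Dockerfile and filebeat.yml
      context: ./filebeat
    volumes:
      # hypothetical host path, mounted read-only at the path set in filebeat.yml
      - ./logs/all.log:/logs/all.log:ro
    depends_on:
      # start the logstash container first
      - logstash
    networks:
      - internal
```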

Verify results

Let’s assume that we have the all.log file with the following content:

I’m going to start all services with the following command run in the directory where my docker-compose.yml file is located:

When all services are running I have to do the following:

  • visit the ElasticHQ app running at http://localhost:5000/,
  • connect with the http://elastic:test@elasticsearch:9200 url in the form input,
  • choose the Query option from the menu (top right corner),
  • select the spring-boot-app-logs-YYYY.MM.dd index and execute the default query.

The results look like the screenshot below:

elastichq query results screenshot

We can verify that filtering, parsing and mutating log entries work correctly.

The work presented in this article is contained in the commit 2a8068c2209c00605f2f24470c75b83ab712267c.

Photo by Alex Knight on StockSnap
