I want to show you some issues that I encountered while applying the Grok filter plugin to logs. Most of my problems resulted from overusing the GREEDYDATA pattern (.*). I tested my solutions with the Grok Debugger app, where you can also find the list of available patterns.
On the screenshot below you can see how using the GREEDYDATA pattern on the logged_for_thread field messed up the parsing:
We don’t need the --- [] characters in the final result. Therefore, we can use them to explicitly pinpoint the start and end of a thread name. By including them in the pattern, we exclude them from the outcome, which gives us only the actual main label. Furthermore, we can now use the NOTSPACE (\S+) pattern instead of the .* as before. You can see how it works on the following screenshot:
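For reference, this is the corresponding fragment of the full match pattern listed at the end of this article:
--- \[+%{NOTSPACE:logged_for_thread}+\]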
We can use the JAVACLASS pattern for finding loggers:
(?:[a-zA-Z0-9-]+\.)+[A-Za-z0-9$]+
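For instance, it matches a standard Spring Boot logger from the sample entries at the end of this article in full:
# %{JAVACLASS:logger} applied to a sample entry:
# o.s.s.c.ThreadPoolTaskExecutor -> logger = "o.s.s.c.ThreadPoolTaskExecutor"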
Unfortunately, it is not enough to properly identify the logger in the following example:
To solve this issue I’m going to:
- identify the log_message with a custom pattern;
- match the troublesome .[.[.[/] part separately;
- glue both parts of the logger together.
In my case every log_message starts with a whitespace followed by a colon. Let’s write this information as a MSG pattern in the custom.txt file:
# logstash/pipeline/patterns/custom.txt
MSG \s:.*
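For example, applied to the first sample entry listed at the end of this article, this pattern matches the following suffix (note that, as defined, the separator itself becomes part of the captured value):
# MSG (\s:.*) matches:
#  : Shutting down ExecutorService 'applicationTaskExecutor'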
We can include this pattern in the logstash.conf file only after declaring the path to it in the patterns_dir option. Make sure that you use the correct path. In my example I placed this directory in the same location as the configuration file:
# logstash/pipeline/logstash.conf
…
grok {
  patterns_dir => ["./patterns"]
  match => { "message" => "… %{MSG:log_message}" }
}
…
Now we have to deal with the troublesome .[.[.[/] part between the JAVACLASS and MSG patterns. In my case everything placed here is still part of a logger. Therefore, I’m going to identify it as the %{GREEDYDATA:loggercd} segment. We ended up with two logger parts, where the second one contains trailing whitespace:
To create a properly constructed logger field we’re going to apply the mutate filter plugin:
# logstash/pipeline/logstash.conf
…
if [loggercd] {
  mutate {
    replace => { "logger" => "%{logger}%{loggercd}" }
    strip => ["logger"]
    remove_field => ["loggercd"]
  }
}
…
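To illustrate, here is what happens to the problematic entry (the field values are my assumption, based on the steps above):
# before the mutate filter:
#   logger   => "o.a.c.c.C"
#   loggercd => ".[.[.[/] "
# after replace, strip and remove_field:
#   logger   => "o.a.c.c.C.[.[.[/]"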
As you can see, we:
- checked for the existence of the loggercd part; if a message doesn’t contain it, we want to leave it as it is;
- concatenated the logger and loggercd fields and replaced the original incomplete logger field with that value, stripping the trailing whitespace;
- removed the loggercd field as it fulfilled its task.
To show you the results I have to run my app this time. On the following screenshot (from my ElasticHQ instance) you can see how the original message was parsed:
You can see the example logstash.conf file (with the STDOUT output so you can easily test it yourself) below:
# logstash/pipeline/logstash.conf
input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    patterns_dir => ["./patterns"]
    match => {
      "message" => "%{TIMESTAMP_ISO8601:log_timestamp}\s*%{LOGLEVEL:log_level}\s*%{POSINT:logged_for_pid}\s*--- \[+%{NOTSPACE:logged_for_thread}+\]\s*%{JAVACLASS:logger}%{GREEDYDATA:loggercd}\s*%{MSG:log_message}"
    }
  }
  if [loggercd] {
    mutate {
      replace => { "logger" => "%{logger}%{loggercd}" }
      strip => [ "logger" ]
      remove_field => [ "loggercd" ]
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
You can test it against logs that have the same structure as the entries listed below:
2020-05-12 08:31:26.530 INFO 10197 --- [SpringContextShutdownHook] o.s.s.c.ThreadPoolTaskExecutor : Shutting down ExecutorService 'applicationTaskExecutor'
2020-05-12 08:29:20.099 INFO 10197 --- [main] o.a.c.c.C.[.[.[/] : Initializing Spring embedded WebApplicationContext
2020-05-12 08:29:20.290 INFO 10197 --- [main] s.SpringBootLog4j2ScaffoldingApplication : Started SpringBootLog4j2ScaffoldingApplication in 1.099 seconds (JVM running for 1.729)
2020-05-12 08:29:20.290 ERROR 10197 --- [main] s.SpringBootLog4j2ScaffoldingApplication : Logger exception message
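If you’d like to try it without setting up Beats, one option (a sketch; swapping the input is my suggestion, and the sample.log file name is an assumption) is to replace the beats input with the stdin plugin:
# quick local test: use stdin instead of beats
input {
  stdin { }
}
Then pipe the sample entries into Logstash:
bin/logstash -f logstash/pipeline/logstash.conf < sample.log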
While the example log entries I used in this article don’t exhaust the topic, they may be useful to illustrate some issues with identifying structure in any content that we want to parse with Grok.
In the end, try to restrict the usage of the .* pattern (GREEDYDATA) as much as possible and debug tricky entries. The Grok Constructor application might be helpful as well.