Grok processor
The Grok processor parses unstructured log messages and extracts structured fields from them. It matches each message against a set of predefined patterns and can handle a wide variety of log formats.
You can provide multiple patterns to the Grok processor. The processor tries the patterns in the order they are listed: the first pattern that matches is used to extract the fields, and the remaining patterns are skipped. If no pattern matches, the processor fails and you can troubleshoot the issue. Refer to Generate patterns for more information.
List the patterns that match most of your messages first, followed by more specific ones. This reduces the number of match attempts the Grok processor has to make and improves the performance of the pipeline.
This functionality uses the Elasticsearch Grok pipeline processor. Refer to the Elasticsearch Grok processor documentation for more information.
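For example, the following request runs a Grok processor with two patterns through the Elasticsearch simulate pipeline API. This is a minimal sketch; the field names and log format are illustrative. The first pattern matches the sample document, so the second, more general fallback is never tried.

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "%{TIMESTAMP_ISO8601:@timestamp} %{LOGLEVEL:log.level} %{GREEDYDATA:message}",
            "%{TIMESTAMP_ISO8601:@timestamp} %{GREEDYDATA:message}"
          ]
        }
      }
    ]
  },
  "docs": [
    { "_source": { "message": "2025-05-01T12:00:00Z INFO Server started" } }
  ]
}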
In addition to the predefined patterns, you can define your own patterns by expanding the Optional fields section. You can then use these custom patterns in the Grok processor.
The patterns are defined in the following format:
{
  "MY_DATE": "%{YEAR}-%{MONTHNUM}-%{MONTHDAY}"
}
where MY_DATE is the name of the pattern. The pattern can then be used in the processor:
%{MY_DATE:date}
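Custom patterns correspond to the pattern_definitions option of the Elasticsearch Grok processor. As a minimal sketch, with illustrative field names, an equivalent processor configuration would look like this:

{
  "grok": {
    "field": "message",
    "patterns": ["%{MY_DATE:date} %{GREEDYDATA:message}"],
    "pattern_definitions": {
      "MY_DATE": "%{YEAR}-%{MONTHNUM}-%{MONTHDAY}"
    }
  }
}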
Generate patterns
Generating patterns requires an LLM connector to be configured. Instead of writing the Grok patterns by hand, you can use the Generate Patterns button to generate them for you.
Click the plus icon next to a suggested pattern to accept it and add it to the list of patterns used by the Grok processor.
Under the hood, the 100 samples shown on the right side are grouped into categories of similar messages. For each category, a Grok pattern is generated by sending a few of the samples to the LLM. Patterns that match are then shown in the UI.
This can incur additional costs, depending on the LLM connector you are using. A single iteration typically uses between 1,000 and 5,000 tokens, depending on the number of identified categories and the length of the messages.