The CloudWatch Book - Logs & Insights


Chapter 4. Central Logs Management and Analysis with Logs & Insights
Once you have a running application, you will quickly find yourself in a situation where you need to
understand how your system behaves. While typical metrics can help you understand how your
hardware performs, you need to gain insights into your actual user journey. This is where Logging
comes in.

Your system needs to continuously log information about what is happening. That can be login requests from users, errors that occurred, or simply information about how long a certain operation took.

But where should the logging process actually happen?

4.1. CloudWatch Is the Centralized Logging Service in AWS
This is where CloudWatch comes in. CloudWatch Logs is the centralized logging service. It collects and
stores the logs in one central place.

In AWS, there are many services that log to CloudWatch by default. CloudWatch gives you the
functionality to store, view, and analyze these logs with a set of different tools.

Figure 10. CloudWatch Receives Logs from Multiple AWS Services

Take Lambda as an example: it logs to CloudWatch automatically. Each time you run a Lambda function, its logs are stored in CloudWatch Logs.

Logs in CloudWatch reside in Streams, and Streams reside in Log Groups. To analyze logs and to get an
overview of Logs across different Log Groups, it is important to understand the concept of Log Insights.
You will learn all of that in this chapter.

We start by exploring the concepts of Logs in general. After that, we continue with Logs Insights. Logs Insights lets you query your different Log Groups like a database with a SQL-like query language. We will show you different examples of using Logs Insights within our example project.

To fully understand the following chapter, it would be good to read the chapter about the example
project first. This will help you follow where logs appear and how to query them.

We will explain different concepts by showing you the actual names of Log Groups, Streams, and more.
Remember that the exact names depend on how you have deployed the example project and can vary.

4.2. Fundamentals of Logs, Streams, and Groups in CloudWatch
Let’s get started with the first topic of this chapter: Logs Concepts.

Figure 11. Log Groups, Log Streams and Log Events

Logs follow the concept of:

• Log Groups: A Log Group can be seen as a container for Log Streams. A single Log Group is typically
attached to one application, like a Lambda Function or ECS container.

• Log Streams: Log Streams contain the actual Logs. There are multiple Log Streams within one Log
Group. Each Log Stream contains multiple Log Events.

• Log Events: Log Events are the actual log outputs of your application or resource. They are the
smallest unit of Logs.

Let’s dive into each category and see an example for each of them.

4.2.1. Log Groups

A Log Group is typically attached to one service or application. For example, one Log Group is attached
to one Lambda function. A Log Group holds several Log Streams.

Let’s understand Log Groups by looking at an example. We use our REST API Handler Lambda function
as an example.

Figure 12. The Lambda Function of Our REST API

The REST API handler Lambda function is a Lambda function that is triggered by an API Gateway. It is
responsible for handling all incoming requests to our REST API.

Lambda Log Groups always follow the convention /aws/lambda/<function-name>.

For most AWS services, the naming convention looks like this: /aws/<service>. There are just a few exceptions to this rule, like API Gateway Execution Logs.

If you create a Log Group with this exact name, Lambda will automatically log into this Log Group (if the IAM permissions are set correctly). In our example, the function name is cw-ho-tf-dev-api-rest. Therefore, the Log Group attached to this Lambda function is called:

/aws/lambda/cw-ho-tf-dev-api-rest

There are two ways of finding this Log Group. Either you go to CloudWatch ⇒ Log Groups and search for
this name. Or you go to the Lambda function ⇒ Monitoring ⇒ View CloudWatch logs. We find ourselves
doing the latter.

Figure 13. Jumping to the CloudWatch Logs via the Lambda Console

Once you click on this button, it will open a new tab with the Log Group. In the Log Group, you will see
several things like:

• The name of the Log Group

• The retention period (1 week in our example)

• Available Log Streams

Figure 14. The Log Group Details

4.2.2. Log Streams

This brings us to the next concept: Log Streams.

Multiple Log Streams are contained within one Log Group. Multiple Log Events live in one Log Stream.

For Lambda, one Log Stream typically corresponds to one execution environment of a Lambda function. That means the Log Stream can contain multiple events that are logged by the Lambda function, but each execution environment has its own Log Stream. If multiple execution environments are launched, for example because of concurrent invocations, multiple Log Streams are created.

Let’s see an example of a Log Stream.

Figure 15. An Example Log Stream

The Log Stream contains all Log Events. You can define the time range of the Log Events and filter
events.

If you want to filter events, you can do this here already. The syntax is not very intuitive, but it is
nevertheless powerful. You can also easily filter by a specific time range or a specific string.

Figure 16. Filtering Events in a Stream

For example, you can filter by the simple string REPORT to get all report logs of the Lambda execution. This log shows you some information about how long the Lambda ran and how much memory it used.

You can also filter structured logs as the query language of CloudWatch allows you to filter by JSON
fields.

But the real power of filtering logs comes later with Logs Insights.

4.2.3. Log Events

Log Events are the smallest unit of logs. They are the actual log outputs of your application. Log Events
are contained within Log Streams.

One actual Log Event looks like this:

{
  "level": "INFO",
  "message": "Getting repositories from DynamoDB.",
  "sampling_rate": 0.1,
  "service": "repo-tracker",
  "timestamp": "2023-11-26T01:23:13.424Z",
  "xray_trace_id": "1-65629e01-3175af1e414cddf527102748",
  "correlationIds": {
    "requestId": "df5049c6-3706-490f-88c4-c751fb58ba84"
  }
}

This is a Log Event that was logged with a structured logger. If you haven’t set up a structured logger,
the Log Event will look different. It typically is a simple text output.

Now you know the basic concept of how CloudWatch treats logs. Log Groups contain Log Streams, and
Log Streams contain Log Events. Let’s now look at another basic building block: Structured Logs.

4.3. Use Structured Logging as the Basic Building Block for Observability
Using structured logging is the basic building block for observability. It is highly recommended to create
your logs in a structured format like JSON.

Figure 17. Sending structured JSON logs to CloudWatch

By default, logs are just plain text. Many AWS services emit their own plain-text formats. Lambda, for example, does this in its INIT, START, END, and REPORT logs.

START RequestId: 1e4b3b3e-3b3e-4b3e-3b3e-4b3e3b3e3b3e Version: $LATEST
END RequestId: 1e4b3b3e-3b3e-4b3e-3b3e-4b3e3b3e3b3e
REPORT RequestId: 1e4b3b3e-3b3e-4b3e-3b3e-4b3e3b3e3b3e Duration: 0.30 ms Billed Duration: 100 ms Memory Size: 1024 MB Max Memory Used: 112 MB

If you compare that with an example log from a structured logger, you will see the difference:

{
  "level": "INFO",
  "message": "Getting repositories from DynamoDB.",
  "sampling_rate": 0.1,
  "service": "repo-tracker",
  "clientIp": "192.168.131.39",
  "timestamp": "2023-11-26T01:23:13.424Z",
  "xray_trace_id": "1-65629e01-3175af1e414cddf527102748",
  "correlationIds": {
    "requestId": "df5049c6-3706-490f-88c4-c751fb58ba84"
  },
  "time": 98,
  "error": 0,
  "failure": 0,
  "cacheHit": 0
}

We recommend using a structured JSON format. This will not only help you query and analyze your logs
more effectively, but it will also enable you to log wide events and add additional context to your logs.

Figure 18. Structured Logs Example

You can even add context for things that you don't need right now but might need in the future.

With correlation IDs, you can even attach IDs through several log outputs. This will help you trace
requests better and understand which requests belong to each other.

Built-In Structured Logging

In 2023, AWS announced that Lambda can now emit structured JSON logs to CloudWatch Logs without any code changes. This is a great change, but in this book, we're not using it. This has several reasons:

• You can’t set default log levels

• You can’t sample logs

• No log formatter

• Weird timestamp format

This is why we stick to another library for structured logging. We are sure AWS is working on a couple of
these things already. And once they are there, we will update this section!

Powertools for AWS for Structured Logging

Powertools for AWS is an amazing library developed by internal teams at AWS. We use this library
throughout this book. For CloudWatch, we use several features like:

• Structured logging

• Setting log levels

• Assigning correlation IDs

• Sampling logs

• Tracing requests

• Creating custom metrics

• … and much more

Throughout the different chapters, we will make use of Powertools for AWS a lot. But remember, you
can also use other libraries like Pino or Winston for structured logging.

4.3.1. Use Powertools for AWS to Log in a Structured Way

Let’s see how you can use the logger from Powertools for AWS in Node.js:

import { Logger } from '@aws-lambda-powertools/logger';

const logger = new Logger();

logger.info('Creating my first log', { params: "hello World" });

This will result in the following JSON log:

{
  "level": "INFO",
  "message": "Creating my first log",
  "sampling_rate": 0.1,
  "timestamp": "2023-11-26T01:51:24.480Z",
  "params": "hello World"
}

This gives us a structured log already. But we can do even more with the logger. Let’s add some
correlation IDs so that we can easily find logs that belong together.

logger.appendKeys({ requestId: event.requestContext.requestId });

With that, each subsequent log will now have the requestId as a correlation ID. We will see this in action
later once we use Log Insights.
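With the default logger settings, the appended key simply shows up as an additional top-level attribute. The log from the example above would then look roughly like this (the requestId value is taken from our earlier example and is only illustrative):

{
  "level": "INFO",
  "message": "Creating my first log",
  "sampling_rate": 0.1,
  "timestamp": "2023-11-26T01:51:24.480Z",
  "params": "hello World",
  "requestId": "df5049c6-3706-490f-88c4-c751fb58ba84"
}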

For our example project, we use one shared logger with additional configuration.

import { Logger } from '@aws-lambda-powertools/logger';
import { GithubTrackerLogFormatter } from './log-formatter';

export const logger = new Logger({
  sampleRateValue: 0.1,
  serviceName: 'repo-tracker',
  logFormatter: new GithubTrackerLogFormatter()
});

In the custom logger, we do the following:

• Sample Rate: We set the sample rate to 0.1. This means that debug logs are emitted for 10% of all invocations.

• Service Name: We set the service name to repo-tracker. This will be added to each log.

• Log Formatter: With a log formatter, you can control the structure and attribute names of your log output.

4.3.2. Log Formatter

In our custom logger, we have created a log formatter.

We have added a custom log format that adds additional attributes to the log. Here is a smaller version
of the log formatter, so that you get an idea:

backend/lambda/utils/log-formatter.ts

class GithubTrackerLogFormatter extends LogFormatter {
  public formatAttributes(
    attributes: UnformattedAttributes,
    additionalLogAttributes: LogAttributes
  ): LogItem {
    const baseAttributes = {
      logLevel: attributes.logLevel,
      message: attributes.message,
      environment: attributes.environment,
      awsRegion: attributes.awsRegion,
      correlationIds: {
        awsRequestId: attributes.lambdaContext?.awsRequestId,
        xRayTraceId: attributes.xRayTraceId,
      },
      lambdaFunction: {
        name: attributes.lambdaContext?.functionName,
        arn: attributes.lambdaContext?.invokedFunctionArn,
        memoryLimitInMB: attributes.lambdaContext?.memoryLimitInMB,
        version: attributes.lambdaContext?.functionVersion,
        coldStart: attributes.lambdaContext?.coldStart,
      },
      timestamp: this.formatTimestamp(attributes.timestamp),
      logger: {
        sampleRateValue: attributes.sampleRateValue,
      },
    };

    // Create a new LogItem with the base attributes
    const logItem = new LogItem({ attributes: baseAttributes });

    // Merge additional attributes
    logItem.addAttributes(additionalLogAttributes);

    return logItem;
  }
}

The log formatter transforms the log into a more readable version.

Figure 19. Logs with and without Log Formatter

In the screenshot above, you can see the differences between both logs. It will automatically extract
important details like the awsRegion, which is the region where the function is deployed.

Using a log formatter will help you a lot with the readability of your logs.

4.3.3. Adding the Lambda Context

Each Lambda function has a context. The context contains information about the Lambda function. By
adding the context, we get additional information in our logs such as:

• Function Name

• Function ARN

• Memory

• Version

• Cold Start

Our log now looks like this:

{
  "message": "Incoming event",
  "service": "repo-tracker",
  "awsRegion": "us-east-1",
  "correlationIds": {
    "awsRequestId": "57423c1e-a8db-4daf-8e63-39483bce8c9a",
    "xRayTraceId": "1-65f2bb4d-3e59ea5908ddc0ce1ca5597d"
  },
  "lambdaFunction": {
    "name": "cw-ho-tf-dev-api-rest",
    "arn": "arn:aws:lambda:us-east-1:590183990318:function:cw-ho-tf-dev-api-rest",
    "memoryLimitInMB": 1024,
    "version": "$LATEST",
    "coldStart": true
  },
  "logLevel": "INFO",
  "timestamp": "2024-03-14T08:54:39.290Z",
  "logger": {
    "sampleRateValue": 0.1
  }
}

This information can be very helpful when analyzing your workloads. The context is typically sent along
with the invocation event of the Lambda. You can add the context to your logger with three different
options:

1. You can call logger.addContext(context) after initializing the logger.

2. You can use a middy middleware to add the data before your execution happens.

3. You can use the class method decorator @logger.injectLambdaContext() (especially for developers who like OOP).

We love to use the middy.js framework, so we make use of that.

4.3.4. Middy.js Middleware Excursion

Middy.js is a middleware framework for AWS Lambda. It allows you to run middlewares before or after
the execution of your Lambda function.

Figure 20. Middy.js

We use it a lot for repetitive tasks like adding the Lambda context to the logger. In the screenshot, you can see an example of such repetitive tasks, such as:

• Adding the Lambda context

• Logging the incoming event

• Connecting to the database

You can also add middlewares that should happen after the invocation, such as:

• Clearing the state from the logger

• Closing database connections

• Logging success messages

Middy is not directly connected to any CloudWatch feature per se. But we wanted to mention it since the
Powertools for AWS library is using middy. And we think it can speed up your Lambda development
workflow a lot!
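Here is a minimal sketch of option 2 from the list above: wiring our shared logger into a handler with Middy and the Powertools injectLambdaContext middleware. The handler body and file paths are illustrative, and the exact import paths depend on your Powertools version:

import middy from '@middy/core';
import { injectLambdaContext } from '@aws-lambda-powertools/logger/middleware';

import { logger } from './utils/logger'; // the shared Powertools logger from above

const lambdaHandler = async (event: any) => {
  // The Lambda context (function name, memory, cold start, ...) is already attached here.
  logger.info('Incoming event');
  return { statusCode: 200, body: JSON.stringify({ ok: true }) };
};

// logEvent: true additionally logs the incoming event at the start of each invocation.
export const handler = middy(lambdaHandler).use(
  injectLambdaContext(logger, { logEvent: true })
);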

This was a quick introduction to structured logging and Powertools. Powertools is not the only option,
but a pretty good one. Now, let’s learn more about CloudWatch.

4.4. Setting Up Log Retention Policies to Manage Storage and Costs

Figure 21. Log Ingestion and Log Storage

CloudWatch Logs has two main cost parameters:

1. The amount of data ingested

2. The amount of data stored

We will see this in detail in the pricing section.

The easiest way to adjust the costs for the second parameter, the amount of data stored, is to set a
retention policy. The Log Retention period defines how long logs are stored in CloudWatch. By default,
logs are stored indefinitely when you create a new Lambda function.

This is not only suboptimal but can also be very costly. First of all, you will pay for the storage of these logs. Second, you will have a lot of logs, which can make it very hard to analyze and find the appropriate logs. Third, you often have audit requirements that define how long you need to store logs.

This is where Log Retention helps you. You can set the retention period to a value between 1 day and
10 years. Reminder: If you omit the retention period, logs are stored indefinitely.

Learn about your requirements and set your retention period accordingly.

Tip: It is often required from a regulatory perspective to store your logs longer than
you'd like to. For example, SOC2 requires you to store your server logs for at least 1
year.

4.4.1. A Note about Lambda and Log Groups

One problem with Lambda is that Log Groups are created automatically. By logging your first output to
STDOUT, a Log Group is created. This Log Group will not be in your CloudFormation Stack or your
Terraform state. That makes it quite hard to configure the Log Retention period.

This is why we suggest explicitly creating a Log Group for every Lambda function yourself. Log Groups are connected to Lambda functions only by their name, and creating them yourself makes it easy to set a retention period.

Here you can see a Log Group that was created by a Lambda function:

Figure 22. Missing Retention for Log Groups

Infrastructure-as-code tools help you with that. In every project, we define a Lambda module that
creates a Log Group for each Lambda function. We set a default retention period of 14 days and let
developers override it if necessary.

We always deploy Lambda functions together with a Log Group so that we can set the retention period.
Here is the example Terraform code.

module "lambda_main" {
source = "terraform-aws-modules/lambda/aws"
function_name = "${var.global_prefix}-${var.environment}-api-rest"
handler = "dist/api-rest.handler"
runtime = var.nodejs_runtime

35
cloudwatch_logs_retention_in_days = 14
}

After we deploy a Lambda function with the Terraform code above, we will see the Log Group in the
CloudWatch console.

Figure 23. Log Group with a defined retention period

In 2024, Lambda launched a feature that allows you to set a custom Log Group for your Lambda function. That means you can also create a Log Group with a different name and attach it to your function. This is important context, especially when you come across legacy applications or Lambda functions that don't follow the default naming convention.
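Here is a rough sketch of what that can look like with the plain aws_lambda_function resource. The logging_config block and its arguments are our assumption of how you would wire this up in Terraform, and the IAM role, runtime, and artifact names are placeholders:

resource "aws_cloudwatch_log_group" "api_rest" {
  name              = "/custom/api-rest-logs" # any name you like
  retention_in_days = 14
}

resource "aws_lambda_function" "api_rest" {
  function_name = "cw-ho-tf-dev-api-rest"
  role          = aws_iam_role.lambda.arn
  handler       = "dist/api-rest.handler"
  runtime       = "nodejs20.x"
  filename      = "lambda.zip"

  logging_config {
    log_format = "JSON"                                 # Lambda's built-in structured logging
    log_group  = aws_cloudwatch_log_group.api_rest.name # custom Log Group instead of /aws/lambda/<function-name>
  }
}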

4.5. Analyze Logs with CloudWatch Logs Insights
Up until now, we have covered everything about ingesting logs into CloudWatch. But once you have
many logs, you will see that it is quite hard to analyze your logs. CloudWatch Logs Insights is exactly for
this use case.

Figure 24. Aggregating & querying logs via Log Insights

It helps you query your logs across multiple Log Groups. You can build queries (very much like a SQL
query) to find logs efficiently.

With sample queries, field suggestions, and the ability to share queries, it is one of the most powerful
tools in CloudWatch. If you use CloudWatch as a monitoring tool, Logs Insights will be your best friend.

We have often seen organizations reach for third-party tools for problems that could easily be solved with Logs Insights. This is why we want to give you a deep dive into Logs Insights.

One important note: You can only query log data that was ingested after November 5, 2018.

4.5.1. The Log Insights Console is Very Powerful

Let's first get an overview of the Logs Insights console. Head over to CloudWatch and select Logs Insights. Make sure that you are in the correct region. Our application is deployed in the us-east-1 region.

After you open the Log Insights console, you will see the following:

Figure 25. The Log Insights Console

There is quite a bit to see in the console:

• Time Selection: You can choose the time range you want to query, either absolute (from 2021-12-12
12:00:00 to 2021-12-12 13:00:00) or relative (last 1 hour).

• Log Groups: A dropdown menu to select up to 50 Log Groups you want to query. We have selected
the Log Group of our REST API Handler Lambda Function (/aws/lambda/cw-ho-tf-dev-api-rest).

• Query Editor: The query editor helps you write queries by suggesting fields and functions. Since
2023, there has also been an AI query generator, but more on that later.

• Results, Patterns, Visualizations: Once you run a query, you will see the results, patterns, and
visualizations.

• Discovered Fields: This tab shows you all the fields that are available for querying.

• Sample Queries: CloudWatch Log Insights provides you with a set of sample queries. You can also
save queries and share them with the team.

We will dive into each functionality in the following sections.

4.5.2. Log Insights Comes with a Set of Sample Queries

Before we explain how to use Logs Insights, let’s take a look at some of the sample queries it comes
with. These are example queries you can use to get started quickly. Let’s examine two different queries
and explain what they do:

1. Find the most expensive requests:

filter @type = "REPORT"
| fields @requestId, @billedDuration
| sort by @billedDuration desc

This query filters all Log Events that have the field @type set to REPORT. This is a special Log Event that
each Lambda function provides at the end of the execution. For our Lambda function, it looks like this:

REPORT RequestId: acb7bcdd-5048-46c1-9c2c-a965e2dc825f Duration: 478.69 ms Billed Duration: 479 ms Memory Size: 1024 MB Max Memory Used: 112 MB

From this Log Event, the fields @requestId and @billedDuration are extracted. Finally, the results are
sorted by the @billedDuration field. The results look like this:

Figure 26. A query result example

We can see the most expensive requests based on the @billedDuration field.

2. Determine the amount of overprovisioned memory:

filter @type = "REPORT"
| stats max(@memorySize / 1000 / 1000) as provisionedMemoryMB,
    min(@maxMemoryUsed / 1000 / 1000) as smallestMemoryRequestMB,
    avg(@maxMemoryUsed / 1000 / 1000) as avgMemoryUsedMB,
    max(@maxMemoryUsed / 1000 / 1000) as maxMemoryUsedMB,
    provisionedMemoryMB - maxMemoryUsedMB as overProvisionedMB

This query is a bit more complex. It shows you how to use the stats function. This query uses the
statistical functions min, max, and avg to calculate the overprovisioned memory. It will show you how
much memory your Lambda function used and how much memory you assigned to it. Let’s see the
results:

Figure 27. Another query result example

Wow, we over-provisioned our Lambda function quite a bit.

The values show:

• provisioned memory: 1024 MB

• smallest memory request: 17 MB

• average memory used: 94 MB

• max memory used: 135 MB

So we over-provisioned it by almost 1 GB (1024 MB provisioned minus 135 MB max used leaves roughly 889 MB of unused headroom)!

With increased memory allocation, you also gain proportional enhancements in bandwidth and
vCPUs. This means that by overprovisioning memory, you can achieve better networking speed
and improved CPU performance. For instance, in memory-intensive applications, allocating
additional memory can reduce latency and increase throughput. Similarly, for compute-bound
tasks, more vCPUs can lead to faster processing times. Therefore, strategically overprovisioning
memory can optimize both your network and CPU resources, leading to overall better system
performance.

Interestingly, this can lead to cost savings. For example, a Lambda function with more memory
might complete its tasks faster, reducing the total execution time and thus lowering the overall
cost. By optimizing performance, you can achieve better efficiency and potentially lower expenses.

These two queries give you an idea of what is possible with CloudWatch Logs Insights.

Now, let’s start over and learn how to use Log Insights.

4.5.3. Finding Relevant Fields with the Query Editor and Field Selections

Now that you have seen some sample queries, let’s dive into how to use Logs Insights.

The main building blocks of Logs Insights are queries. Queries allow you to filter, aggregate, and visualize your logs. The query language is similar to SQL.

You select the fields you want, define a filter to narrow down the logs, and use stats to aggregate the
logs. You can make use of a lot of different functions to get the most out of your logs. Let’s get an
overview and see what is possible.

We will start with a simple query and then dive into more complex queries.

fields @timestamp, @message, @logStream, @log
| sort @timestamp desc
| limit 20

You will see this query every time you open the Log Insights console. Let’s see what it does:

• fields: This function selects the fields you want to see in the results. In this example, these are @timestamp, @message, @logStream, and @log.

• sort: This function sorts the results by the field @timestamp in descending order.

• limit: This function limits the results to 20 entries.

Once you run the query, you will see the following output:

Figure 28. Query result including a visualization

The result shows you a visualization of when Log Events were logged.

The first step of writing queries is to find the relevant fields you want to query. The query editor helps you with that. Once you start typing, you can make use of auto-suggestions.

For example, we have different correlationIds in our logs. To see which ones are available, we can start
typing correlationIds., hit the auto-suggestion shortcut (cmd+space on Mac), and see all available fields.

Figure 29. Auto suggestions in the Log Insights query editor

Another way of finding all available fields is on the right side of the console. The tab Discovered Fields
shows you all the fields that are available for querying. Important: You need to run one query first. We
suggest just running the first sample query with a larger limit to see all available fields.

Figure 30. Log Insights shows discovered fields in our structured logs

After that, you will see all available fields and how often they appear. For example, the field
correlationIds.awsRequestId is only available in 69% of your logs. If you click on one field, you will also
see in which Log Groups the field appears and how often. Since we only selected one Log Group in our
query, the field appears in 100% of the logs for that Log Group.

4.5.3.1. Create your Queries with AI

In 2023, AWS announced the AI Query Generator for Logs Insights. This query generator lets you generate queries from natural language. For example, you can ask it to find all POST requests for a specific Log Group. It will then try to generate a query for you.

To do this, simply enter the text find all POST requests into the query generator. Then click on Create Query, and it will generate the query for you. Here is the result:

Figure 31. Generating queries with the Query generator (powered by AI)

It correctly generates the query for us.

Please note that this feature is currently only available in us-east-1 and that it is still in beta (as of August 2024).

This means the AI may sometimes generate queries that won’t work, but it provides a good
starting point. However, a more critical issue arises when the AI generates a syntactically correct
query that is semantically incorrect. This can lead to misleading results and potentially costly
mistakes.

Therefore, it is crucial to invest time in learning the Logs Insights syntax thoroughly.
Understanding the syntax and semantics will help you to verify and refine AI-generated queries.
Remember, AI should be considered an assistant, not a replacement.

It can help you get started, but your expertise and judgment are essential for accurate and
actually useful monitoring.

4.5.4. Filtering Logs with the Filter Function

One of the most common functionalities in Log Insights is to filter logs based on certain fields. This is
where the filter function comes into play.

There are many different ways of filtering logs. You can filter on numerical values (points>1000) or also
on text values (httpMethod="GET"). You can match multiple values with the in operator (httpMethod in
("GET", "POST")) or even use regex (httpMethod like /GET/). This can be very helpful and is a very good
way to get started.

Let’s look at an example log event (the log event is slightly modified for better readability):

{
  "message": "Removing microsoft/vscode from DynamoDB.",
  "service": "repo-tracker",
  "awsRegion": "us-east-1",
  "correlationIds": {
    "awsRequestId": "7376e3e8-c7ee-4363-97c2-235a515b412e"
  },
  "lambdaFunction": {
    "name": "cw-ho-tf-dev-api-rest"
  },
  "logLevel": "INFO",
  "timestamp": "2024-03-17T08:01:32.765Z",
  "logger": { "sampleRateValue": 0.1 }
}

A very common use case of Log Insights is to find logs that belong together. For example, we have
added the awsRequestId to the correlationIds field. Let’s see which logs all belong to this deletion
request:

fields @timestamp, message
| filter correlationIds.awsRequestId = "7376e3e8-c7ee-4363-97c2-235a515b412e"
| sort @timestamp asc

For that, we filter all logs by the correlationIds.awsRequestId field. The ID for this log was 7376e3e8-c7ee-4363-97c2-235a515b412e. This one will be different for your execution!

Figure 32. Filtering by the request ID

We can also filter all logs and look for all POST requests:

fields @timestamp, fullName, @message, @logStream, @log
| filter correlationIds.httpMethod = "POST"
| sort @timestamp desc

Figure 33. Filtering by the HTTP method

This query shows you all logs that have the correlationIds.httpMethod set to POST.

Using Like for Regex and Full-Text Search

There are many operators that you can use with the filter function. For example, you can also search the full text of a log. This is often very useful.

Let’s say we want to find all POST requests that contain the word sst in the name of the repository.

fields @timestamp, fullName, @message, @logStream, @log
| filter correlationIds.httpMethod = "POST" and fullName like /sst/
| sort @timestamp desc

Figure 34. Filtering for HTTP method and repository name

This query will show you all POST requests that contain the word sst in the name of the repository. You
can combine filter arguments with and or or operators.

The operator like also supports using regular expressions and wildcards.

fields @message
| filter fullName like /(?i)Code/

With this query, you check for all logs that contain the case-insensitive word code in the fullName field.

You can also negate the filter with the not operator.

fields @message
| filter fullName not like /(?i)Code/

Now you’ll find all logs that don’t contain the word Code in the fullName field.

There are some more operators that you can use with the filter function, for example, the in operator to match multiple values in an array.

4.5.5. Aggregate Logs with Statistical Functions

Log Insights can be very powerful when you apply statistical functions to your logs. Statistical functions
follow the syntax: stats <function> as <alias> by bin(<time>)

The alias and bin are both optional. There are several different statistical functions, which can be
categorized as aggregation and non-aggregation functions.

4.5.5.1. Aggregation Functions Calculate a Single Value

Aggregation functions use data to calculate a single value. The available aggregation functions are:

• avg() - Calculates the average of values in a numeric log field.

• count() - Counts the total number of log events.

• count_distinct() - Estimates the number of unique values in a log field.

• max() - Identifies the maximum value in a log field.

• min() - Identifies the minimum value in a log field.

• pct() - Determines the value at a specific percentile in a dataset.

• stddev() - Calculates the standard deviation of values in a numeric log field.

• sum() - Calculates the sum of values in a numeric log field.
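For example, here is a small sketch that combines several of these functions on the @duration field that Lambda reports for every invocation:

filter @type = "REPORT"
| stats count(*) as invocations,
    avg(@duration) as avgDurationMs,
    max(@duration) as maxDurationMs,
    pct(@duration, 99) as p99DurationMs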

4.5.5.2. Non-Aggregation Functions Display One Data Value

Non-aggregation functions do not perform calculations; they display one data value from a set of data.
For example, the earliest or latest field.

The available non-aggregation functions are:

• earliest() - Returns the value from the earliest timestamped log event for a specified field.

• latest() - Returns the value from the latest timestamped log event for a specified field.

• sortsFirst() - Returns the value that sorts first in the queried logs for a specified field.

• sortsLast() - Returns the value that sorts last in the queried logs for a specified field.

Let’s see some examples:

4.5.5.3. Count How Often a Repository Was Added

Let’s say we want to know how often the repository microsoft/vscode was added to our application. We
can run a query to do exactly that:

fields @timestamp, fullName, @message, @logStream, @log, repository
| filter correlationIds.httpMethod = "POST" and repository.full_name like "code"
| stats count(*) as CountCodeRepos

Figure 35. Building a complex query to find out how often a repository was added

This shows us that in the past 4 weeks, repositories with the name code were added 240 times.

4.5.5.4. Create a Time-Series of Added Repositories

We can even take this one notch further and look at how this data behaved over time. Let’s bin this data
into 1-day buckets and see how often it was added.

fields @timestamp, fullName, @message, @logStream, @log, repository
| filter correlationIds.httpMethod = "POST" and repository.full_name like "code"
| stats count(*) as CountCodeRepos by bin(1d) as DailyAdds

This results in a time series of how often the repository was added.

Figure 36. Timeseries how often the repository was added

With the bin() function, you can slice your data into different time bins. In this example, we slice the data into 1-day bins. You can use one of the following time units:

• ms - Milliseconds

• s - Seconds

• m - Minutes

• h - Hours

• d - Days

• w - Weeks

• mo - Months

• q - Quarters

• y - Years

Visualize your Time-Series Data

In Logs Insights, you can even visualize your queries. To do that, head over to the Visualization tab, select your chart type (we selected Line Chart), and you can see your time-series data.

Figure 37. A line-chart visualization of our query results

This shows the query above (counting how often the repository VS Code was added) in a time-series chart.

This gives us a quick overview of our log data. It can be super helpful to understand business logic
better without creating expensive custom metrics. You can then go ahead and add this visualization to
your CloudWatch Dashboard and share it with your stakeholders.

4.5.6. Intrinsic Functions like ispresent, isIpInSubnet, concat Make Your Life Easier

Logs Insights offers a range of intrinsic functions that make your life easier. These functions help you
analyze your logs efficiently and effectively.

Some examples of intrinsic functions are:

4.5.6.1. Numeric Functions

• abs() - Calculates the absolute value.

• ceil() / floor() - Rounds numbers up or down.

• greatest() / least() - Finds the largest or smallest value.

• log() - Computes the logarithm.

• sqrt() - Calculates the square root.

4.5.6.2. Date Functions

• datefloor() / dateceil() - Rounds down or up to the nearest date unit.

• fromMillis() / toMillis() - Converts between milliseconds and date-time.

4.5.6.3. General Functions

• ispresent() - Checks if a field is present.

• coalesce() - Returns the first non-null value.

4.5.6.4. IP Functions

• isValidIp() / isValidIpV4() - Validates IP addresses.

• isIpv4inSubnet() - Checks if an IPv4 address is within a specified subnet.

4.5.6.5. String Functions

• isempty() / isblank() / concat() - Checks for empty or blank strings and concatenates strings.

• ltrim() / trimChars() / rtrim() - Trims characters from the left, right, or both sides of a string.

• strlen() / toupper() / tolower() - Returns the length of a string and converts strings to upper or
lower case.

Example Usage

A typical example is the ispresent function. With this function, you can filter logs where a field is
present. For our example, we want to see all logs where the repository field is present.

fields @timestamp, repository
| filter ispresent(repository)

Now you only see logs where the field is present. There are many more such functions. The easiest way to discover them is to go to the query editor and hit cmd+space (or ctrl+space on Windows) to see all available functions.

Figure 38. Showing available functions via the auto-complete shortcut
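As another small example, intrinsic functions can also be used directly inside fields. Here we build a readable request label from the correlation attributes of our structured logs:

fields @timestamp, concat(correlationIds.httpMethod, " ", correlationIds.path) as request
| filter ispresent(correlationIds.httpMethod)
| sort @timestamp desc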

4.5.7. Adding Fields with the Parse Functionality

Sometimes you may want to add additional fields to your logs. In our example application, it could be
interesting to split the fullName field into owner and repo. Let’s see how we can do that with the parse
functionality of Log Insights.

From our fullName log attribute, we can extract the owner and the repo. For example, if the fullName is
apache/airflow, the owner would be apache and the repository would be airflow.

Figure 39. The owner and name parts of a repository

With the following query, we can extract these two fields:

fields @timestamp, fullName
| filter ispresent(fullName)
| parse fullName "*/*" as owner, repository

Note: The ispresent filter is not strictly necessary here, but it makes the output clearer because we only see logs that actually contain the field. The result looks like this:

Figure 40. Using the ispresent filter

This functionality makes it very easy to extract fields from your logs. It is also possible to extract fields
using Regex or Glob expressions.
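Here is a sketch of the regex variant with named capture groups, where owner and repo are names we pick for the extracted fields:

fields @timestamp, fullName
| filter ispresent(fullName)
| parse fullName /(?<owner>[^\/]+)\/(?<repo>.+)/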

4.5.8. Find Log Patterns with the Pattern Functionality

Logs in your application typically follow certain patterns. Log Insights has a functionality to find these patterns and cluster your logs based on them. A pattern is a shared structure that recurs across similar log events.

This feature was recently released (in 2023) and is a very powerful tool. When you query your logs, you
will now see a tab called Patterns. Let’s run an example query and see what happens:

fields @timestamp, fullName, @message, message

We’ll query our REST API Handler Lambda function and retrieve a lot of logs.

Figure 41. Finding patterns in our logs

Log Insights has found 29 patterns in our logs. You can sort these patterns by event count or event
ratio.

We can see that the logs that appear the most are the start and end logs (see the second and third
rows). But we can also see that the pattern Incoming Request appears frequently. We can inspect this
pattern by clicking on the magnifying glass.

Figure 42. Inspecting patterns

Another pane will open, allowing you to examine the pattern in more detail. You can see the log events
that belong to this pattern, as well as all the values of the attributes and related patterns.

If you have a high volume of data, patterns can be useful for finding similar error cases and helping you
debug.

4.5.9. Deduplicate Logs with dedup

A common problem with sending a lot of logs to a logging system is that you also increase the noise. The more logs you have, the harder it becomes to find the relevant ones.

For every user session, you will have duplicated logs. This is not a bad thing. But at certain times, you
will want to see only logs of one user flow.

This is where the dedup function comes into play. dedup removes duplicate logs from your query results.

Let’s see an example query. We get all GET requests.

fields @timestamp, message, fullName
| filter correlationIds.httpMethod = 'GET'

You can see that many logs are duplicated because of multiple GET requests.

The logs in Log Insights look like this:

Figure 43. Filtering our results by the HTTP method GET

Once we apply the dedup function, we can see only one journey of the logs without seeing all the noise.

fields @timestamp, message, fullName
| filter correlationIds.httpMethod = 'GET'
| dedup message

Figure 44. Using the dedup function to reduce the noise

When working with a lot of logs, this can be very helpful to reduce the noise.

4.5.10. Saving and Sharing Your Queries

Creating queries is an art in itself. You don’t want to do this over and over again. This is why it makes
sense to share queries with your team and save them. This is really easy in Log Insights. Click on the tab
"Queries" and you can save your queries.

Figure 45. Saving queries

You can organize your queries in folders. We have two query folders:

1. API Gateway Logs

2. Lambda Logs

It makes sense to do this for queries that you use often.

One thing that is often overlooked with Saved Queries is that they act as Tabs. It sounds a bit weird but
hear us out.

Once you execute a query, you can switch between saved queries. The outputs and the changes in the
queries are saved. If you have a change in your query, you will see a blue indicator next to the name of
the query.

Figure 46. Blue indicator for query changes

There is no need to open the window multiple times. It took us some time to discover that!

4.5.11. Find Previous Queries with the Query History

Log Insights also keeps track of your query history. This is very helpful if you want to go back to a query
you have run before.

Figure 47. Insights history

It often happens that you play around with queries and can’t remember them anymore. The query
history helps you with that.

4.6. Real-time Logs with CloudWatch Live Tail


Another great feature from the CloudWatch team arrived in 2023. CloudWatch Live Tail allows you to stream your logs in real time. It often happens that you’re developing a Lambda function and you need to check the logs. Doing this often meant checking multiple Log Streams within a Log Group, and finding the correct one was a hassle.

This is where Live Tail can help you a lot. It gives you the ability to tail your logs in real time. Developers
and ops engineers were used to checking the logs of server-side applications with tail -f. CloudWatch
Live Tail is like tail -f for the Cloud.

Figure 48. The Live Tail console view

After you go to the Live Tail console, you can select a Log Group. Here, we’ve selected the Log Group of
our REST API Handler Lambda function.

You can also go one level deeper and select a Log Stream. Once you click on "Start," you will see the logs
in real time.

Figure 49. A running Live Tail session

4.6.1. Highlighting and Filtering Terms

There are two ways to filter and highlight your logs. First of all, Live Tail has a highlight feature. This is
the bar at the top. You can enter a term, and it will highlight all logs that contain this term. For example,
we have added DynamoDB and API.

Each term will get a different color. The color shown next to the log entry indicates that the term is in
the log.

Figure 50. Highlighting inside Live Tail

After you add the terms, the respective logs will be highlighted with a colored bar on the side. This can
help you a lot if you have a large number of incoming logs.

The second way to filter your logs is by using a filter. The filter follows the JSON path syntax that is used in several other places within AWS, for example in Log Stream filters and Metric Filters.

We add a filter to only match all logs from the service repo-tracker.

{$.service = "repo-tracker"}

Figure 51. Filtering inside Live Tail

This filter shows us all logs from our service repo-tracker. It makes the logging experience smoother
since all INIT, START, REPORT logs are filtered out. You can also adapt the filter to specific users, paths,
or other attributes.
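You can also combine multiple conditions in one filter pattern, for example to only tail error logs of our service (a sketch based on the fields of our structured logs):

{ ($.service = "repo-tracker") && ($.logLevel = "ERROR") }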

4.7. Masking Sensitive Data with CloudWatch


One common pitfall with logging is logging sensitive data. Your engineers need access to logs in
development and production environments. But once you start handling workloads such as
subscriptions, address changes, or anything related to personal information, you need to be careful.

Figure 52. Masking Sensitive Data with CloudWatch

CloudWatch can automatically detect sensitive data and mask it. What is sensitive data? For example:

• Credit Card Numbers

• Email addresses

• IP addresses

• Usernames

• Passwords

CloudWatch has built-in mechanisms for pattern matching and more advanced machine-learning
techniques to detect sensitive data.

4.7.1. Policies to Hide Sensitive Data

You can make use of several policies to hide sensitive data. Head over to your Log Group, select Data
Protection, and click on Create Policy.

Figure 53. Creating a Data Protection Policy

In the policy, you can select different Data Identifiers. For example, you can choose:

• Address

• AwsSecretKey

• BankAccountNumber

• and many more.

The Data Identifiers allow you to narrow down the data you want to hide. You can also enable Audit
Destinations. If CloudWatch finds sensitive data, it will be reported to the Audit Destination of your
choice. You can choose between CloudWatch, S3, and Kinesis. We are choosing a CloudWatch Log
Group. Let’s click on Activate data protection.

In our example application, we are logging the IP address in the incoming Lambda event. Let’s see how
CloudWatch can hide this data. In the event, we can now see that the IP address is masked with
asterisks:

Figure 54. Masked IP Address in CloudWatch Logs

We have, of course, also implemented this in our example application with Terraform and CDK.
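Here is a rough sketch of what the Terraform side can look like, assuming the aws_cloudwatch_log_data_protection_policy resource and the IpAddress managed data identifier (treat the exact policy document as a starting point, not the project's exact code):

resource "aws_cloudwatch_log_data_protection_policy" "rest_api" {
  log_group_name = "/aws/lambda/cw-ho-tf-dev-api-rest"

  policy_document = jsonencode({
    Name    = "mask-ip-addresses"
    Version = "2021-06-01"
    Statement = [
      {
        Sid            = "audit"
        DataIdentifier = ["arn:aws:dataprotection::aws:data-identifier/IpAddress"]
        Operation = {
          Audit = {
            FindingsDestination = {} # or point to a CloudWatch Log Group, S3 bucket, or Firehose stream
          }
        }
      },
      {
        Sid            = "redact"
        DataIdentifier = ["arn:aws:dataprotection::aws:data-identifier/IpAddress"]
        Operation = {
          Deidentify = { MaskConfig = {} }
        }
      }
    ]
  })
}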

Masking data across a whole AWS Account

The example we showed you was for a single Log Group. Data protection policies can be implemented at two levels: for all Log Groups across your entire account, or for specific individual Log Groups. When a policy is created at the account level, it applies to existing Log Groups and to any Log Groups created in the future.

Figure 55. Account-Level Data Protection Policy

4.7.2. Unmasking Sensitive Data

The IP address in the last screenshot is masked. There are certain times when you want to unmask the
data. For example, if you need to debug an issue that involves the IP address.

The great thing about hiding the data is that you can unmask it. You need to have the right permissions
to do that (logs:Unmask). You can go directly to the Log Group, click on Display, and select Temporarily
unmask protected data.

You can also use the unmask() function in Logs Insights.

fields @timestamp, @message, unmask(@message)
| filter customer.CreditCard like //
| sort @timestamp desc

This feature of masking and unmasking data helps you keep your customer data secure and still allows
you to efficiently debug your issues.

4.8. Logs in our Example Application


We’ve learned the basics about CloudWatch Logs & Logs Insights. Now, let’s see how this looks in our
example application. Our example application has several components that create logs.

Let’s take another look at the architecture of the GitHub tracker:

Figure 56. Example Application Architecture

Each red line from the CloudWatch icon represents a connection to CloudWatch. Almost every service
you can see emits logs to CloudWatch. We will show you which logs are generated by our REST API
handler, which is responsible for getting, adding, and removing repositories from the DynamoDB table.

When we make requests to our API, there are three main places where our application creates logs.

Figure 57. Log Groups for REST API

These places are:

1. Lambda Logs - this is where the actual business logic is executed

2. API Gateway Access Logs - access logs provide information about the access to your API

3. API Gateway Execution Logs - execution logs provide information about the execution of your API,
such as latencies and status.

4.8.1. Lambda Logs

Lambda logs are logs created by your Lambda function. The Log Group for the Lambda function is called
/aws/lambda/{function-name}. For example, if you deployed the example application with Terraform, it
will be: /aws/lambda/cw-ho-tf-dev-api-rest

Let’s open this Log Group and see which logs were created.

Figure 58. Lambda Log Group

Once we open the Log Group, we can see different Log Streams. As a recap: Each Log Stream in Lambda
is a separate execution environment.

If we go to the Log Stream, we can see the actual logs.

Figure 59. Log Stream for REST API

Let’s look at one example Log Event. Each Lambda execution starts with a START RequestId Log Event. This shows you which function version was used and gives you the request ID. After that, all Log Events we have configured follow.

For example, we see:

• Incoming event

• Data to process

• Retrieving data from DynamoDB

• … and much more.

You can open each Log Event to see additional data. Here is one example delete log event:

{
  "message": "Removing repository from DynamoDB",
  "service": "repo-tracker",
  "awsRegion": "us-east-1",
  "correlationIds": {
    "awsRequestId": "71dc393d-08fc-408a-bbea-9b43ff260e3a",
    "xRayTraceId": "1-65f7d919-322417b94928eab3742de94d",
    "requestId": "40fba0f0-161e-4c9f-b068-30398b8aaeef",
    "httpMethod": "DELETE",
    "path": "/repositories/hapijs%2Fhapi",
    "buildTimestamp": "none"
  },
  "lambdaFunction": {
    "name": "cw-ho-tf-dev-api-rest",
    "arn": "arn:aws:lambda:us-east-1:590183990318:function:cw-ho-tf-dev-api-rest",
    "memoryLimitInMB": 1024,
    "version": "$LATEST",
    "coldStart": false
  },
  "logLevel": "INFO",
  "timestamp": "2024-03-18T06:03:08.594Z",
  "logger": {
    "sampleRateValue": 0.1
  },
  "fullName": "hapijs/hapi"
}

It gives us information about the Lambda function, some information on the system, and IDs and
messages that are relevant to our application.

4.8.2. API Gateway Access Logs

Next, we look at API logs. Each access to the API is logged by API Gateway. These logs are called Access
Logs.

We’ve defined the access logs settings in Terraform like this:

resource "aws_api_gateway_stage" "stage" {


stage_name = "prod"
rest_api_id = aws_api_gateway_rest_api.rest_api.id
deployment_id = aws_api_gateway_deployment.deployment.id

xray_tracing_enabled = true

access_log_settings {
destination_arn = aws_cloudwatch_log_group.websocket_access_logs.arn
format = jsonencode({
requestId : "$context.requestId",
ip : "$context.identity.sourceIp",
...
identityCaller : "$context.identity.caller",
})
}
}

The access_log_settings block defines the Log Group to which the logs are sent and the format it should
follow. Once you deploy your API Gateway, the stage settings look like this:

Figure 60. API Gateway Stage Settings

Custom access logs are enabled with a specific format. The Log Group can also be found here:
/aws/apigateway/<API-STAGE_ID>/access_logs.

Here is an example Log Event of an access log:

{
  "accountId": "-",
  "apiKey": "-",
  "authorizerPrincipalId": "-",
  "caller": "-",
  "httpMethod": "GET",
  "identityAccessKey": "-",
  "identityCaller": "-",
  "identitySourceIp": "3.250.215.109",
  "integrationLatency": "51",
  "ip": "3.250.215.109",
  "protocol": "HTTP/1.1",
  "requestId": "b675d420-f676-4c78-be1f-15ff12b490eb",
  "requestTime": "25/Nov/2023:04:08:06 +0000",
  "requestTimeEpoch": "1700885286342",
  "resourcePath": "/{proxy+}",
  "responseLength": "2",
  "stage": "prod",
  "status": "200",
  "user": "-",
  "userAgent": "undici",
  "userArn": "-"
}

You can see different attributes like:

• httpMethod - The HTTP method of the request

• identitySourceIp - The IP address of the caller

• requestId - The request ID of the API Gateway request

• … and much more!

The requestId can be helpful if you want to see additional logs in Logs Insights. For example, we have
added the requestId as a correlation ID to our logs.

logger.appendKeys({
requestId: event.requestContext.requestId,
});

Each Log Event will contain this one correlation ID.

The requestId is specific to an invocation. Adding it as a persistent key/attribute is discouraged because persistent keys are not cleared automatically.

Conceptually, the requestId is temporary and scoped to a single invocation, even if you want it in all your log entries for that invocation. If you want to use this, make sure to clear it at the end of a request via middleware, for example, one from Middy.js.
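A minimal sketch of such a cleanup step as a hand-rolled Middy middleware (the middleware itself and its name are illustrative, not the exact code of the example project):

import middy from '@middy/core';
import type { MiddlewareObj } from '@middy/core';

import { logger } from './utils/logger'; // the shared Powertools logger

// Drop the per-request correlation ID after the invocation so it cannot
// leak into logs of the next request handled by the same execution environment.
const clearRequestId = (): MiddlewareObj => ({
  after: () => logger.removeKeys(['requestId']),
  onError: () => logger.removeKeys(['requestId']),
});

export const handler = middy(async (event: any) => {
  logger.appendKeys({ requestId: event.requestContext.requestId });
  // ... business logic ...
  return { statusCode: 200, body: '{}' };
}).use(clearRequestId());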

If we want to see all logs by this request ID, we can make use of the following query in Logs Insights:

fields @timestamp, message, @message
| filter requestId = 'b675d420-f676-4c78-be1f-15ff12b490eb'
| sort @timestamp asc

This really simple query gives us all logs that are attached to this requestId. That can help you a lot while
debugging and understanding requests. It will also help you understand the flow of your application.

Figure 61. Logs Insights Query by Request ID

4.8.3. Execution Logs Give Request Details

The last logs we look at in our REST API are Execution Logs. These logs are created by API Gateway and
contain information about the execution of the API Gateway. They show the request and response to
each request. Be aware that these logs can get quite expensive if your API has a lot of requests!

We’ve created the CloudWatch Log Group with the name: API-Gateway-Execution-Logs_<API_GATEWAY_STAGE_ID>/prod. The logs are especially helpful if you have more advanced authentication with usage plans, API keys, etc. Even for our simple application, they are quite helpful to understand your API better.

Here you can see an example journey with all Execution Logs attached:

Figure 62. Execution Logs Example

Execution Logs show you the whole trace and journey of the request. We won’t delve deeper into
execution logs. Make sure to make some requests to the application and see how the logs behave.

4.9. CloudWatch Logs Insights Has a Few Quotas That You Need to Be Aware Of
You can find all CloudWatch Logs quotas here: CloudWatch Quotas.

Important quotas for Logs & Insights are the following (not a complete list):

• Query duration: A Logs Insights query can run for up to 60 minutes.

• Discovered log fields: Maximum of 1,000 fields per Log Group.

• Extracted JSON fields: Maximum of 200.

• Concurrent Live Tail sessions: 15.

• Log event size: Maximum of 256 KB.

• Log Groups per account per region: 1,000,000.

You can request a quota increase for most of the quotas. There are many more quotas based on the
requests per second for CloudWatch APIs. Check out the link above for more information.

4.10. Understanding CloudWatch Logs Pricing: Ingestion, Storage, and Analysis Costs
CloudWatch Logs pricing is based on the amount of log data ingested, stored, and analyzed. The TLDR
is:

• Ingestion - How much log data is put into CloudWatch

• Storage - How much log data is stored in CloudWatch

• Analysis - How much log data you’ve analyzed with Logs Insights

4.10.1. Ingestion

AWS CloudWatch charges for the amount of log data ingested per account per region. The first 5 GB of
ingested log data per month is included in the free tier. After that, the price per GB ingested varies
depending on the region and the type of log data. For example, US East (N. Virginia) has a price of $0.50
per GB for standard log data and $0.25 per GB for VPC Flow Logs.

4.10.2. Storage

The second pricing category is storage. The first 5 GB of stored logs are also included in the free tier. After that, the price depends on the number of GB stored. For example, in us-east-1, log storage costs $0.03 per GB per month.

4.10.3. Log Analysis

The last pricing category is log analysis. When you analyze logs with Logs Insights, you pay for each GB of log data your queries scan. In us-east-1, you pay $0.005 per GB of scanned log data.

Figure 63. Analyzed bytes per Query

This screenshot shows that 22.7 KB of logs were scanned. This can get much higher if you have a lot of logs. For our example application, the volume is still low.
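As a rough back-of-the-envelope calculation with the us-east-1 prices above: a workload that ingests 50 GB of standard logs per month, keeps 100 GB in storage, and scans 200 GB with Logs Insights queries pays roughly 50 × $0.50 = $25 for ingestion, 100 × $0.03 = $3 for storage, and 200 × $0.005 = $1 for analysis. In most setups, ingestion is by far the biggest cost driver, which is another reason to keep an eye on what you log.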

4.11. Best Practices for Logs & Insights


1. Use Structured Logging: Always use structured logging (like JSON) in your applications. This will
allow you to easily query and filter your logs using CloudWatch Logs Insights.

2. Set Log Retention Policies: By default, logs in CloudWatch are kept indefinitely, and this can lead to unnecessary costs. Always set up a Log Retention policy that aligns with your business and compliance needs.

3. Use Metric Filters: Metric filters allow you to turn log data into numerical CloudWatch metrics that
you can graph or set an alarm on. We will create one in the later chapters.

4. Centralize your Logs: If you have logs in multiple AWS accounts, consider centralizing them into a
single account. This can make it easier to manage and analyze your logs. You can use the
Observability Access Manager (OAM) to achieve this.

5. Monitor Log Group Metrics: CloudWatch provides metrics for the number of log events and the
volume of log data ingested. Monitor these metrics to keep track of your logging activity and costs.
Keep an eye on the ingestion.

6. Use Log Insights for Complex Queries: If you need to perform complex queries on your logs, use
CloudWatch Logs Insights. It allows you to perform SQL-like queries on your log data and visualize
the results.

7. Mask Sensitive Data: Understand which data within your logs is sensitive and create data
protection policies to mask this data.

8. Use Live Tail for Real-Time Logs: Consider using Live Tail if you’re working with incoming Log
Streams. This makes it much easier to find the correct logs.

4.12. Summary
In this chapter, we explored central logs management and analysis with CloudWatch Logs and Logs
Insights. We covered the basics of Log Groups, Log Streams, and Log Events. We emphasized structured
logging using Powertools for AWS and discussed log retention policies to manage storage and costs. We
delved into CloudWatch Logs Insights, learning to use the console, sample queries, filtering,
aggregation, intrinsic functions, and saving queries.

We also introduced CloudWatch Live Tail for real-time log monitoring and discussed masking sensitive
data. We examined logs in our example application, focusing on Lambda logs and API Gateway access
logs.

Logs and Insights are central to observability with CloudWatch because they provide visibility into your
applications and infrastructure.
