The CloudWatch Book - Logs & Insights
Your system needs to continuously log information about what is happening. This can be incoming requests from users, errors that occurred, or simply how long a certain operation took.
In AWS, there are many services that log to CloudWatch by default. CloudWatch gives you the
functionality to store, view, and analyze these logs with a set of different tools.
Take Lambda as an example: Lambda logs to CloudWatch automatically. Each time you run a Lambda function, its logs are stored in CloudWatch Logs.
Logs in CloudWatch reside in Log Streams, and Log Streams reside in Log Groups. To analyze logs and get an overview of logs across different Log Groups, it is important to understand CloudWatch Logs Insights.
You will learn all of that in this chapter.
We start by exploring the concepts of logs in general. After that, we continue with Logs Insights. Logs Insights lets you query your Log Groups like a database with a SQL-like query language. We will show you different examples of using Logs Insights within our example project.
To fully understand the following chapter, it would be good to read the chapter about the example
project first. This will help you follow where logs appear and how to query them.
We will explain different concepts by showing you the actual names of Log Groups, Log Streams, and more.
Remember that these names depend on how you deployed the example project, so yours can vary.
• Log Groups: A Log Group can be seen as a container for Log Streams. A single Log Group is typically
attached to one application, like a Lambda Function or ECS container.
• Log Streams: Log Streams contain the actual Logs. There are multiple Log Streams within one Log
Group. Each Log Stream contains multiple Log Events.
• Log Events: Log Events are the actual log outputs of your application or resource. They are the
smallest unit of Logs.
Let’s dive into each category and see an example for each of them.
A Log Group is typically attached to one service or application. For example, one Log Group is attached
to one Lambda function. A Log Group holds several Log Streams.
Let’s understand Log Groups by looking at an example. We use our REST API Handler Lambda function
as an example.
The REST API handler Lambda function is a Lambda function that is triggered by an API Gateway. It is
responsible for handling all incoming requests to our REST API.
For most AWS services, Log Group names start with the prefix /aws/<service>/. For Lambda, the full name is /aws/lambda/<function-name>. There are just a few exceptions to this rule, such as API Gateway execution logs.
If you create a Log Group with this exact name, Lambda automatically logs into it (provided the IAM permissions are set correctly). In our example, the function name is cw-ho-tf-dev-api-rest, so the Log Group attached to this Lambda function is called:
/aws/lambda/cw-ho-tf-dev-api-rest
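Because Lambda finds its default Log Group purely by name, the convention fits into a one-line helper. A minimal sketch; defaultLambdaLogGroupName is a hypothetical name, and the helper ignores the custom Log Group option that Lambda also supports:

```typescript
// Hypothetical helper: builds the default Log Group name that Lambda
// writes to when no custom Log Group is configured for the function.
function defaultLambdaLogGroupName(functionName: string): string {
  if (functionName.length === 0) {
    throw new Error("functionName must not be empty");
  }
  return `/aws/lambda/${functionName}`;
}

console.log(defaultLambdaLogGroupName("cw-ho-tf-dev-api-rest"));
```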
There are two ways of finding this Log Group. Either you go to CloudWatch ⇒ Log Groups and search for
this name. Or you go to the Lambda function ⇒ Monitoring ⇒ View CloudWatch logs. We find ourselves
doing the latter.
Figure 13. Jumping to the CloudWatch Logs via the Lambda Console
Once you click on this button, the Log Group opens in a new tab.
Multiple Log Streams are contained within one Log Group, and multiple Log Events live in one Log Stream.
For Lambda, one Log Stream typically corresponds to one execution environment of a Lambda function. That means a Log Stream can contain the events of many invocations that ran in that environment, but each environment has its own Log Stream. If Lambda starts multiple execution environments, for example under concurrent load, multiple Log Streams are created.
The Log Stream contains all Log Events. You can define the time range of the Log Events and filter
events.
If you want to filter events, you can do this here already. The syntax is not very intuitive, but it is
nevertheless powerful. You can also easily filter by a specific time range or a specific string.
For example, you can filter by the simple string REPORT to get all report logs of your Lambda invocations. A REPORT log shows how long the invocation ran, how long it was billed for, and how much memory it used.
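Because the REPORT line has a fixed plain-text layout, you can also parse it yourself, for example in a script that post-processes exported logs. A minimal sketch, assuming the tab-separated text format of REPORT lines; the ReportLine type and parseReportLine are hypothetical, the sample values are made up, and optional parts such as Init Duration are ignored:

```typescript
// Fields extracted from a Lambda REPORT line (illustrative subset).
interface ReportLine {
  requestId: string;
  durationMs: number;
  billedDurationMs: number;
  memorySizeMB: number;
  maxMemoryUsedMB: number;
}

function parseReportLine(line: string): ReportLine | undefined {
  if (!line.startsWith("REPORT ")) return undefined; // not a REPORT line

  // Collect the tab-separated "Label: value unit" segments into a map.
  const fields = new Map<string, string>();
  for (const segment of line.slice("REPORT ".length).split("\t")) {
    const separator = segment.indexOf(": ");
    if (separator === -1) continue;
    fields.set(segment.slice(0, separator).trim(), segment.slice(separator + 2).trim());
  }

  // parseFloat stops at the unit, so "13 ms" becomes 13 and "1024 MB" becomes 1024.
  const num = (label: string): number => {
    const raw = fields.get(label);
    if (raw === undefined) throw new Error(`REPORT line is missing "${label}"`);
    return parseFloat(raw);
  };

  const requestId = fields.get("RequestId");
  if (requestId === undefined) return undefined;

  return {
    requestId,
    durationMs: num("Duration"),
    billedDurationMs: num("Billed Duration"),
    memorySizeMB: num("Memory Size"),
    maxMemoryUsedMB: num("Max Memory Used"),
  };
}

// Made-up sample values in the REPORT layout.
const sample =
  "REPORT RequestId: 0f1e2d3c-1111-2222-3333-444455556666\t" +
  "Duration: 12.34 ms\tBilled Duration: 13 ms\tMemory Size: 1024 MB\tMax Memory Used: 95 MB";

console.log(parseReportLine(sample));
```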
You can also filter structured logs, because CloudWatch filter patterns can match on JSON fields, for example { $.level = "INFO" }.
But the real power of filtering logs comes later with Logs Insights.
Log Events are the smallest unit of logs. They are the actual log outputs of your application. Log Events
are contained within Log Streams.
{
"level": "INFO",
"message": "Getting repositories from DynamoDB.",
"sampling_rate": 0.1,
"service": "repo-tracker",
"timestamp": "2023-11-26T01:23:13.424Z",
"xray_trace_id": "1-65629e01-3175af1e414cddf527102748",
"correlationIds": {
"requestId": "df5049c6-3706-490f-88c4-c751fb58ba84"
}
}
This is a Log Event that was logged with a structured logger. If you haven’t set up a structured logger,
the Log Event will look different. It typically is a simple text output.
Now you know the basic concept of how CloudWatch treats logs. Log Groups contain Log Streams, and
Log Streams contain Log Events. Let’s now look at another basic building block: Structured Logs.
4.3. Use Structured Logging as the Basic Building Block for Observability
Using structured logging is the basic building block for observability. It is highly recommended to create
your logs in a structured format like JSON.
By default, logs are just plain text. AWS often writes its own logs in a fixed plain-text format. Lambda, for example, does this in its INIT_START, START, and REPORT logs.
If you compare that with an example log from a structured logger, you will see the difference:
{
"level": "INFO",
"message": "Getting repositories from DynamoDB.",
"sampling_rate": 0.1,
"service": "repo-tracker",
"clientIp": "192.168.131.39",
"timestamp": "2023-11-26T01:23:13.424Z",
"xray_trace_id": "1-65629e01-3175af1e414cddf527102748",
"correlationIds": {
"requestId": "df5049c6-3706-490f-88c4-c751fb58ba84"
},
"time": 98,
"error": 0,
"failure": 0,
"cacheHit": 0
}
We recommend using a structured JSON format. This not only helps you query and analyze your logs more effectively, but also enables you to log wide events and add additional context to your logs, even for things that you don't need right now but might need in the future.
Figure 18. Structured Logs Example
With correlation IDs, you can attach the same ID to several log outputs. This helps you trace requests and understand which log lines belong together.
In 2023, AWS announced that Lambda can send logs to CloudWatch Logs in JSON format without any code changes. This is a great change, but we are not using it in this book, for several reasons:
• No log formatter
This is why we stick to another library for structured logging. We are sure AWS is working on a couple of
these things already. And once they are there, we will update this section!
Powertools for AWS is an amazing library developed by internal teams at AWS. We use this library
throughout this book. For CloudWatch, we use several features like:
• Structured logging
• Sampling logs
• Tracing requests
• Creating custom metrics
Throughout the different chapters, we will make use of Powertools for AWS a lot. But remember, you
can also use other libraries like Pino or Winston for structured logging.
Here is the output of a first log written with the logger from Powertools for AWS in Node.js:
{
"level": "INFO",
"message": "Creating my first log",
"sampling_rate": 0.1,
"timestamp": "2023-11-26T01:51:24.480Z",
"params": "hello World"
}
This gives us a structured log already. But we can do even more with the logger. Let’s add some
correlation IDs so that we can easily find logs that belong together.
After you add the requestId to the logger as a correlation ID, each subsequent log includes it. We will see this in action later when we use Logs Insights.
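The effect of appending correlation IDs can be illustrated without any library. A dependency-free sketch; StructuredLogger is hypothetical and only borrows the method name appendKeys from Powertools for readability, it is not how Powertools implements its logger:

```typescript
type Attributes = Record<string, unknown>;

// Hypothetical minimal structured logger: every call writes one JSON line.
class StructuredLogger {
  private readonly service: string;
  private persistentKeys: Attributes = {};

  constructor(service: string) {
    this.service = service;
  }

  // Keys appended here are merged into every subsequent log line.
  appendKeys(keys: Attributes): void {
    this.persistentKeys = { ...this.persistentKeys, ...keys };
  }

  info(message: string, extra: Attributes = {}): string {
    const line = JSON.stringify({
      level: "INFO",
      message,
      service: this.service,
      timestamp: new Date().toISOString(),
      ...this.persistentKeys,
      ...extra,
    });
    console.log(line); // in Lambda, stdout ends up in CloudWatch Logs
    return line;
  }
}

const logger = new StructuredLogger("repo-tracker");
logger.appendKeys({ correlationIds: { requestId: "df5049c6-3706-490f-88c4-c751fb58ba84" } });
logger.info("Getting repositories from DynamoDB.");
logger.info("Saving repository.", { fullName: "apache/airflow" });
```

Both log lines carry the same correlationIds.requestId, which is exactly what makes them easy to group in a query later.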
For our example project, we use one shared logger with additional configuration.
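• Sample Rate: We set the sample rate to 0.1. This means that debug logging is enabled only for a random sample of about 10% of executions, so we keep roughly 10% of all debug logs.
• Service Name: We set the service name to repo-tracker. This is added to each log.
• Log Formatter: With a custom log formatter, you control which attributes each log contains and how they are named.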
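Sampling boils down to one random decision per execution. A minimal sketch under that assumption; isDebugSampled and effectiveLogLevel are hypothetical, and the random source can be injected so the behavior is testable:

```typescript
// Hypothetical helper: decides once per execution whether DEBUG logs are
// emitted. `random` defaults to Math.random and can be injected for tests.
function isDebugSampled(sampleRate: number, random: () => number = Math.random): boolean {
  if (sampleRate < 0 || sampleRate > 1) {
    throw new RangeError("sampleRate must be between 0 and 1");
  }
  return random() < sampleRate;
}

// Effective log level: DEBUG when sampled, otherwise the configured level.
function effectiveLogLevel(configured: string, sampleRate: number, random?: () => number): string {
  return isDebugSampled(sampleRate, random) ? "DEBUG" : configured;
}

// Prints "DEBUG" in roughly 10% of runs and "INFO" otherwise.
console.log(effectiveLogLevel("INFO", 0.1));
```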
We have added a custom log formatter that adds additional attributes to the log. You can find it in the example project at backend/lambda/utils/log-formatter.ts.
The log formatter transforms the log into a more readable version.
In the screenshot above, you can see the differences between both logs. The formatter also extracts important details such as awsRegion, the region where the function is deployed.
Using a log formatter will help you a lot with the readability of your logs.
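A formatter is essentially a function from the logger's raw attributes to the JSON object that ends up in CloudWatch. A dependency-free sketch that produces the nested shape used in this chapter's examples (awsRegion, correlationIds, lambdaFunction); the input and output types are hypothetical and do not reproduce the Powertools LogFormatter interface:

```typescript
// Hypothetical input: raw attributes a logger collects for one log call.
interface RawLogAttributes {
  message: string;
  serviceName: string;
  awsRegion: string;
  logLevel: string;
  timestamp: Date;
  sampleRateValue: number;
  xRayTraceId?: string;
  lambdaContext?: {
    awsRequestId: string;
    functionName: string;
    invokedFunctionArn: string;
    memoryLimitInMB: number;
    functionVersion: string;
    coldStart: boolean;
  };
}

// Output shape matching the formatted log example in this chapter.
interface FormattedLog {
  message: string;
  service: string;
  awsRegion: string;
  correlationIds: { awsRequestId?: string; xRayTraceId?: string };
  lambdaFunction?: { name: string; arn: string; memoryLimitInMB: number; version: string; coldStart: boolean };
  logLevel: string;
  timestamp: string;
  logger: { sampleRateValue: number };
}

function formatLogItem(a: RawLogAttributes): FormattedLog {
  const ctx = a.lambdaContext;
  return {
    message: a.message,
    service: a.serviceName,
    awsRegion: a.awsRegion,
    // Group the IDs that link log lines of the same request.
    correlationIds: { awsRequestId: ctx?.awsRequestId, xRayTraceId: a.xRayTraceId },
    // Nest the Lambda context fields under one key; omitted outside Lambda.
    lambdaFunction: ctx
      ? {
          name: ctx.functionName,
          arn: ctx.invokedFunctionArn,
          memoryLimitInMB: ctx.memoryLimitInMB,
          version: ctx.functionVersion,
          coldStart: ctx.coldStart,
        }
      : undefined,
    logLevel: a.logLevel,
    timestamp: a.timestamp.toISOString(),
    logger: { sampleRateValue: a.sampleRateValue },
  };
}

const formatted = formatLogItem({
  message: "Incoming event",
  serviceName: "repo-tracker",
  awsRegion: "us-east-1",
  logLevel: "INFO",
  timestamp: new Date("2024-03-14T08:54:39.290Z"),
  sampleRateValue: 0.1,
  xRayTraceId: "1-65f2bb4d-3e59ea5908ddc0ce1ca5597d",
  lambdaContext: {
    awsRequestId: "57423c1e-a8db-4daf-8e63-39483bce8c9a",
    functionName: "cw-ho-tf-dev-api-rest",
    invokedFunctionArn: "arn:aws:lambda:us-east-1:590183990318:function:cw-ho-tf-dev-api-rest",
    memoryLimitInMB: 1024,
    functionVersion: "$LATEST",
    coldStart: true,
  },
});
console.log(JSON.stringify(formatted, null, 2));
```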
Each Lambda invocation receives a context object that contains information about the Lambda function. By adding the context, we get additional information in our logs such as:
• Function Name
• Function ARN
• Memory
• Version
• Cold Start
{
"message": "Incoming event",
"service": "repo-tracker",
"awsRegion": "us-east-1",
"correlationIds": {
"awsRequestId": "57423c1e-a8db-4daf-8e63-39483bce8c9a",
"xRayTraceId": "1-65f2bb4d-3e59ea5908ddc0ce1ca5597d"
},
"lambdaFunction": {
"name": "cw-ho-tf-dev-api-rest",
"arn": "arn:aws:lambda:us-east-1:590183990318:function:cw-ho-tf-dev-api-rest",
"memoryLimitInMB": 1024,
"version": "$LATEST",
"coldStart": true
},
"logLevel": "INFO",
"timestamp": "2024-03-14T08:54:39.290Z",
"logger": {
"sampleRateValue": 0.1
}
}
This information can be very helpful when analyzing your workloads. The context is passed to your handler together with the invocation event. You can add the context to your logger with three different options:
1. You can call logger.addContext(context) manually at the beginning of your handler.
2. You can use a middy middleware to add the data before your execution happens.
3. You can use the class method decorator @logger.injectLambdaContext() (especially for developers who like OOP).
Middy.js is a middleware framework for AWS Lambda. It allows you to run middlewares before or after
the execution of your Lambda function.
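The before/after idea is easy to see in a dependency-free sketch. This is not middy's API; the Middleware type, the trimmed-down Context, and withMiddlewares are hypothetical. As in middy, before hooks run in the order they are attached and after hooks run in reverse order:

```typescript
// Trimmed-down stand-in for the Lambda context (only the fields used here).
interface Context {
  functionName: string;
  awsRequestId: string;
}

type Handler<E, R> = (event: E, context: Context) => Promise<R>;

// Hypothetical middleware shape: optional hooks around the handler.
interface Middleware<E, R> {
  before?: (event: E, context: Context) => Promise<void> | void;
  after?: (result: R, context: Context) => Promise<void> | void;
}

// Wraps a handler: all before hooks in order, then the handler,
// then all after hooks in reverse order (onion model).
function withMiddlewares<E, R>(handler: Handler<E, R>, middlewares: Middleware<E, R>[]): Handler<E, R> {
  return async (event, context) => {
    for (const m of middlewares) await m.before?.(event, context);
    const result = await handler(event, context);
    for (const m of [...middlewares].reverse()) await m.after?.(result, context);
    return result;
  };
}

// Example: a before hook that records which function is running.
const seen: string[] = [];
const handler = withMiddlewares<{ path: string }, string>(
  async (event) => `handled ${event.path}`,
  [{ before: (_event, context) => { seen.push(context.functionName); } }],
);

handler({ path: "/repos" }, { functionName: "cw-ho-tf-dev-api-rest", awsRequestId: "abc" })
  .then((result) => console.log(result, seen));
```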
We use it a lot for repetitive tasks like adding the Lambda context to the logger. The screenshot shows examples of such tasks that run before the invocation. You can also add middlewares that run after the invocation.
Middy is not directly connected to any CloudWatch feature. But we wanted to mention it because Powertools for AWS integrates with middy, and we think it can speed up your Lambda development workflow a lot!
This was a quick introduction to structured logging and Powertools. Powertools is not the only option,
but a pretty good one. Now, let’s learn more about CloudWatch.
The easiest way to reduce storage costs, that is, the costs for the amount of log data stored, is to set a retention policy. The log retention period defines how long logs are stored in CloudWatch. By default, the Log Group of a new Lambda function stores logs indefinitely.
This is not only suboptimal but can also be very costly. First, you pay for the storage of these logs. Second, you accumulate a lot of logs, which makes it very hard to analyze them and find the relevant ones. Third, you often have audit requirements that define how long you need to store logs.
This is where log retention helps you. You can set the retention period to one of a fixed set of values between 1 day and 10 years (3653 days), for example 7, 14, 30, 90, or 365 days. Reminder: If you omit the retention period, logs are stored indefinitely.
Learn about your requirements and set your retention period accordingly.
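Because only specific values are allowed, it helps to map a requirement such as "keep logs for at least 100 days" to the next allowed value. A sketch; the list mirrors the values documented for the CloudWatch Logs PutRetentionPolicy API, and retentionFor is a hypothetical helper:

```typescript
// Retention values (in days) accepted by CloudWatch Logs.
const ALLOWED_RETENTION_DAYS = [
  1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
  731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
] as const;

// Hypothetical helper: returns the smallest allowed retention that covers
// the required number of days, so a requirement is never undercut.
function retentionFor(requiredDays: number): number {
  const match = ALLOWED_RETENTION_DAYS.find((d) => d >= requiredDays);
  if (match === undefined) {
    throw new RangeError(`${requiredDays} days exceeds the 3653-day maximum`);
  }
  return match;
}

console.log(retentionFor(14)); // 14
console.log(retentionFor(365)); // 365, e.g. for a one-year audit requirement
console.log(retentionFor(100)); // 120
```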
Tip: From a regulatory perspective, you are often required to store your logs longer than you'd like to. For example, SOC 2 audits commonly expect server logs to be retained for at least 1 year.
One problem with Lambda is that Log Groups are created automatically. By logging your first output to
STDOUT, a Log Group is created. This Log Group will not be in your CloudFormation Stack or your
Terraform state. That makes it quite hard to configure the Log Retention period.
This is why we suggest creating the Log Group for every Lambda function yourself, as part of your infrastructure code. Log Groups are connected to Lambda functions only by their name, so you can create /aws/lambda/<function-name> up front and set its retention period there.
Here you can see a Log Group that was created by a Lambda function:
Infrastructure-as-code tools help you with that. In every project, we define a Lambda module that
creates a Log Group for each Lambda function. We set a default retention period of 14 days and let
developers override it if necessary.
We always deploy Lambda functions together with a Log Group so that we can set the retention period.
Here is the example Terraform code.
module "lambda_main" {
  source = "terraform-aws-modules/lambda/aws"

  function_name = "${var.global_prefix}-${var.environment}-api-rest"
  handler       = "dist/api-rest.handler"
  runtime       = var.nodejs_runtime

  # The module creates the Log Group /aws/lambda/<function_name>
  # and applies this retention period to it.
  cloudwatch_logs_retention_in_days = 14
}
After we deploy a Lambda function with the Terraform code above, we will see the Log Group in the
CloudWatch console.
In late 2023, as part of its advanced logging controls, Lambda launched a feature that allows setting a custom Log Group for your Lambda function. That means you can also create a Log Group with a different name and attach it to your Lambda function. Keep both options in mind, especially when you work with legacy applications or Lambda functions that still use the default naming convention.
4.5. Analyze Logs with CloudWatch Logs Insights
Up until now, we have covered everything about ingesting logs into CloudWatch. But once you have
many logs, you will see that it is quite hard to analyze your logs. CloudWatch Logs Insights is exactly for
this use case.
It helps you query your logs across multiple Log Groups. You can build queries (very much like a SQL
query) to find logs efficiently.
With sample queries, field suggestions, and the ability to share queries, it is one of the most powerful
tools in CloudWatch. If you use CloudWatch as a monitoring tool, Logs Insights will be your best friend.
We have often seen organizations adopt third-party tools for problems that could easily be solved with Logs Insights. This is why we want to give you a deep dive into Logs Insights.
One important note: You can only query logs that were ingested on or after November 5, 2018.
Let’s first have an overview of the Logs Insights console. Head over to CloudWatch and select Log
Insights. Make sure that you are in the correct region. Our application is deployed in the us-east-1
region.
After you open the Log Insights console, you will see the following:
Figure 25. The Log Insights Console
• Time Selection: You can choose the time range you want to query, either absolute (from 2021-12-12
12:00:00 to 2021-12-12 13:00:00) or relative (last 1 hour).
• Log Groups: A dropdown menu to select up to 50 Log Groups you want to query. We have selected
the Log Group of our REST API Handler Lambda Function (/aws/lambda/cw-ho-tf-dev-api-rest).
• Query Editor: The query editor helps you write queries by suggesting fields and functions. Since
2023, there has also been an AI query generator, but more on that later.
• Results, Patterns, Visualizations: Once you run a query, you will see the results, patterns, and
visualizations.
• Discovered Fields: This tab shows you all the fields that are available for querying.
• Sample Queries: CloudWatch Log Insights provides you with a set of sample queries. You can also
save queries and share them with the team.
Before we explain how to use Logs Insights, let’s take a look at some of the sample queries it comes
with. These are example queries you can use to get started quickly. Let’s examine two different queries
and explain what they do:
1. Find the most expensive requests:
This query filters all Log Events that have the field @type set to REPORT. This is a special Log Event that Lambda writes at the end of each invocation. For our Lambda function, it looks like this:
From this Log Event, the fields @requestId and @billedDuration are extracted. Finally, the results are sorted by the @billedDuration field in descending order. The results look like this:
We can see the most expensive requests based on the @billedDuration field.
2. Determine the amount of overprovisioned memory:
This query is a bit more complex and shows you how to use the stats function. It uses the statistical functions min, max, and avg to calculate the overprovisioned memory, that is, how much memory your Lambda function actually used compared to how much memory you assigned to it. Let's see the results:
Figure 27. Another query result example
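Both sample queries run inside Logs Insights, but their logic is easy to mirror locally. A sketch in TypeScript over already-parsed REPORT records; the record type, the function names, and the data are hypothetical, and the code follows the logic described above rather than the exact Logs Insights implementation:

```typescript
// One parsed REPORT record.
interface ReportRecord {
  requestId: string;
  billedDurationMs: number;
  memorySizeMB: number; // configured memory
  maxMemoryUsedMB: number; // peak memory of this invocation
}

// Query 1: sort by billed duration, most expensive first.
function mostExpensive(reports: ReportRecord[], limit: number): ReportRecord[] {
  return [...reports].sort((a, b) => b.billedDurationMs - a.billedDurationMs).slice(0, limit);
}

// Query 2: min/avg/max of used memory and the gap to the configured memory.
function memoryStats(reports: ReportRecord[]) {
  if (reports.length === 0) throw new Error("no REPORT records");
  const used = reports.map((r) => r.maxMemoryUsedMB);
  const provisionedMB = reports.reduce((m, r) => Math.max(m, r.memorySizeMB), 0);
  const maxUsedMB = used.reduce((m, v) => Math.max(m, v), 0);
  return {
    provisionedMB,
    smallestUsedMB: used.reduce((m, v) => Math.min(m, v), Infinity),
    avgUsedMB: used.reduce((s, v) => s + v, 0) / used.length,
    maxUsedMB,
    overProvisionedMB: provisionedMB - maxUsedMB,
  };
}

// Made-up records for illustration.
const records: ReportRecord[] = [
  { requestId: "a", billedDurationMs: 120, memorySizeMB: 1024, maxMemoryUsedMB: 90 },
  { requestId: "b", billedDurationMs: 450, memorySizeMB: 1024, maxMemoryUsedMB: 110 },
  { requestId: "c", billedDurationMs: 35, memorySizeMB: 1024, maxMemoryUsedMB: 95 },
];

console.log(mostExpensive(records, 2).map((r) => r.requestId)); // b, then a
console.log(memoryStats(records).overProvisionedMB); // 914
```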
With increased memory allocation, you also gain proportional enhancements in bandwidth and
vCPUs. This means that by overprovisioning memory, you can achieve better networking speed
and improved CPU performance. For instance, in memory-intensive applications, allocating
additional memory can reduce latency and increase throughput. Similarly, for compute-bound
tasks, more vCPUs can lead to faster processing times. Therefore, strategically overprovisioning
memory can optimize both your network and CPU resources, leading to overall better system
performance.
Interestingly, this can lead to cost savings. For example, a Lambda function with more memory
might complete its tasks faster, reducing the total execution time and thus lowering the overall
cost. By optimizing performance, you can achieve better efficiency and potentially lower expenses.
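The cost argument can be checked with simple arithmetic. Lambda compute is billed in GB-seconds, that is, configured memory in GB times billed duration in seconds (request charges come on top). A small sketch with assumed numbers, not measurements; gbSeconds is a hypothetical helper:

```typescript
// Lambda compute is billed in GB-seconds: memory (GB) x billed duration (s).
// Lambda pricing counts 1 GB as 1024 MB. Request charges are not included.
function gbSeconds(memoryMB: number, billedDurationMs: number): number {
  return (memoryMB / 1024) * (billedDurationMs / 1000);
}

// Assumed, not measured: doubling memory cuts the billed duration from 1000 ms to 400 ms.
const small = gbSeconds(1024, 1000); // 1 GB for 1 s -> 1 GB-s
const large = gbSeconds(2048, 400); // 2 GB for 0.4 s -> 0.8 GB-s

console.log(small, large, large < small ? "more memory is cheaper here" : "more memory costs more");
```

The break-even point follows directly: doubling memory only pays off if it more than halves the billed duration.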
These two queries give you an idea of what is possible with CloudWatch Logs Insights.
Now, let’s start over and learn how to use Log Insights.
4.5.3. Finding Relevant Fields with the Query Editor and Field Selections
Now that you have seen some sample queries, let’s dive into how to use Logs Insights.
The main building blocks of Logs Insights are queries. Queries allow you to filter, aggregate, and visualize your logs. The query language is similar to SQL.
You select the fields you want, define a filter to narrow down the logs, and use stats to aggregate the
logs. You can make use of a lot of different functions to get the most out of your logs. Let’s get an
overview and see what is possible.
We will start with a simple query and then dive into more complex queries.
fields @timestamp, @message, @logStream, @log
| sort @timestamp desc
| limit 20
You will see this query every time you open the Log Insights console. Let’s see what it does:
• fields: This function selects the fields you want to see in the results. In this example, you see the fields @timestamp, @message, @logStream, and @log.
• sort: This function sorts the results by the field @timestamp in descending order.
• limit: This function limits the results to the first 20 log events.
Once you run the query, you will see the following output:
The result shows you a visualization of when Log Events were logged.
The first step of writing queries is to find the relevant fields you want to query. The query editor helps you with that. Once you start typing, you can make use of auto-suggestions.
For example, we have different correlationIds in our logs. To see which ones are available, we can start
typing correlationIds., hit the auto-suggestion shortcut (cmd+space on Mac), and see all available fields.
Another way of finding all available fields is on the right side of the console. The tab Discovered Fields
shows you all the fields that are available for querying. Important: You need to run one query first. We
suggest just running the first sample query with a larger limit to see all available fields.
Figure 30. Log Insights shows discovered fields in our structured logs
After that, you will see all available fields and how often they appear. For example, the field correlationIds.awsRequestId is only present in 69% of your logs. If you click on a field, you will also see in which Log Groups it appears and how often. Since we only selected one Log Group in our query, 100% of the field's occurrences come from that Log Group.
In 2023, AWS announced the AI query generator for Logs Insights. This query generator lets you generate queries from natural-language prompts. For example, you can ask it to find all POST requests for a specific Log Group, and it will try to generate a query for you.
To do this, simply enter the prompt find all POST requests in the query generator. Then click on Create Query, and it will generate the query for you. Here is the result:
Figure 31. Generating queries with the Query generator (powered by AI)
It correctly generates the query for us.
Please note that this feature is currently only available in us-east-1 and that it is still in beta
(state: August 2024).
This means the AI may sometimes generate queries that won't work, but it provides a good starting point. A more critical issue arises when the AI generates a query that is syntactically correct but semantically wrong. This can lead to misleading results and potentially costly mistakes.
Therefore, it is crucial to invest time in learning the Logs Insights syntax thoroughly.
Understanding the syntax and semantics will help you to verify and refine AI-generated queries.
Remember, AI should be considered an assistant, not a replacement.
It can help you get started, but your expertise and judgment are essential for accurate and
actually useful monitoring.
One of the most common functionalities in Log Insights is to filter logs based on certain fields. This is
where the filter function comes into play.
There are many different ways of filtering logs. You can filter on numerical values (points > 1000) or on text values (httpMethod = "GET"). You can match multiple values with the in operator (httpMethod in ["GET", "POST"]) or even use regular expressions (httpMethod like /GET/). This is a very good way to get started.
Let's look at an example log event (the log event is slightly modified for better readability):
{
"message": "Removing microsoft/vscode from DynamoDB.",
"service": "repo-tracker",
"awsRegion": "us-east-1",
"correlationIds": {
"awsRequestId": "7376e3e8-c7ee-4363-97c2-235a515b412e"
},
"lambdaFunction": {
"name": "cw-ho-tf-dev-api-rest"
},
"logLevel": "INFO",
"timestamp": "2024-03-17T08:01:32.765Z",
"logger": { "sampleRateValue": 0.1 }
}
A very common use case of Log Insights is to find logs that belong together. For example, we have
added the awsRequestId to the correlationIds field. Let’s see which logs all belong to this deletion
request:
For that, we filter all logs by the correlationIds.awsRequestId field. The ID for this log was 7376e3e8-c7ee-4363-97c2-235a515b412e. This one will be different for your execution!
We can also filter all logs and look for all POST requests:
This query shows you all logs that have the correlationIds.httpMethod set to POST.
There are many operations out there that you can use with the filter operation. For example, you can
also search full text. This is often very useful in logs.
Let’s say we want to find all POST requests that contain the word sst in the name of the repository.
fields @timestamp, fullName, @message, @logStream, @log
| filter correlationIds.httpMethod = "POST" and fullName like /sst/
| sort @timestamp desc
This query will show you all POST requests that contain the word sst in the name of the repository. You can combine filter conditions with the logical operators "and" and "or".
The operator like also supports using regular expressions and wildcards.
fields @message
| filter fullName like /(?i)Code/
With this query, you check for all logs that contain the case-insensitive word code in the fullName field.
You can also negate the filter with the not operator.
fields @message
| filter fullName not like /(?i)Code/
Now you’ll find all logs that don’t contain the word Code in the fullName field.
There are some more operators that you can use with the filter function. For example, the in operator matches a field against a list of values, as in filter correlationIds.httpMethod in ["GET", "POST"].
Log Insights can be very powerful when you apply statistical functions to your logs. Statistical functions
follow the syntax: stats <function> as <alias> by bin(<time>)
The alias and bin are both optional. There are several different statistical functions, which can be
categorized as aggregation and non-aggregation functions.
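Aggregation functions use data to calculate a single value. The available aggregation functions are:
• avg() / min() / max() / sum() - Calculate the average, minimum, maximum, or sum of a numeric field.
• count() - Counts the total number of log events.
• count_distinct() - Counts the number of unique values of a field.
• pct() / stddev() - Calculate a percentile or the standard deviation of a numeric field.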
Non-aggregation functions do not perform calculations; they return one data value from a set of data, for example the earliest or latest value of a field.
• earliest() - Returns the value from the earliest timestamped log event for a specified field.
• latest() - Returns the value from the latest timestamped log event for a specified field.
• sortsFirst() - Returns the value that sorts first in the queried logs for a specified field.
• sortsLast() - Returns the value that sorts last in the queried logs for a specified field.
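The difference between the time-based and the order-based functions is easiest to see side by side. A sketch over in-memory events; the LogEvent type and the function names are hypothetical, events without the field are skipped, and sortsFirst/sortsLast use plain string order here (numeric fields may be ordered differently by Logs Insights):

```typescript
// A log event with its timestamp (ms since epoch) and parsed fields.
interface LogEvent {
  timestamp: number;
  fields: Record<string, string>;
}

// Only events that actually contain the field are considered.
function withField(events: LogEvent[], field: string): LogEvent[] {
  return events.filter((e) => field in e.fields);
}

// earliest(field): value from the event with the smallest timestamp.
function earliest(events: LogEvent[], field: string): string | undefined {
  const candidates = withField(events, field);
  if (candidates.length === 0) return undefined;
  return candidates.reduce((a, b) => (b.timestamp < a.timestamp ? b : a)).fields[field];
}

// latest(field): value from the event with the largest timestamp.
function latest(events: LogEvent[], field: string): string | undefined {
  const candidates = withField(events, field);
  if (candidates.length === 0) return undefined;
  return candidates.reduce((a, b) => (b.timestamp > a.timestamp ? b : a)).fields[field];
}

// sortsFirst/sortsLast(field): smallest/largest value in string order,
// independent of when the events were logged.
function sortedValues(events: LogEvent[], field: string): string[] {
  return withField(events, field).map((e) => e.fields[field]).sort();
}
function sortsFirst(events: LogEvent[], field: string): string | undefined {
  return sortedValues(events, field)[0];
}
function sortsLast(events: LogEvent[], field: string): string | undefined {
  const values = sortedValues(events, field);
  return values[values.length - 1];
}

const events: LogEvent[] = [
  { timestamp: 3, fields: { fullName: "microsoft/vscode" } },
  { timestamp: 1, fields: { fullName: "sst/sst" } },
  { timestamp: 2, fields: { fullName: "apache/airflow" } },
];
console.log(earliest(events, "fullName"), latest(events, "fullName"));
console.log(sortsFirst(events, "fullName"), sortsLast(events, "fullName"));
```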
Let’s say we want to know how often the repository microsoft/vscode was added to our application. We
can run a query to do exactly that:
Figure 35. Building a complex query to find out how often a repository was added
This shows us that in the past 4 weeks, repositories with the name code were added 240 times.
We can take this one step further and look at how this data behaved over time. Let's bin the data into 1-day buckets and see how often the repository was added per day.
This results in a time series of how often the repository was added.
With the bin() function, you can slice your data into different time bins. In this example, we slice the data into 1-day bins. You can use one of the following time units:
• ms - Milliseconds
• s - Seconds
• m - Minutes
• h - Hours
• d - Days
• w - Weeks
• mo - Months
• q - Quarters
• y - Years
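Conceptually, bin() rounds each timestamp down to the start of its bucket, and stats then counts per bucket. A sketch, assuming fixed-length bins aligned to the Unix epoch (so 1-day bins start at midnight UTC); calendar units such as mo, q, and y need calendar arithmetic and are not covered, and countByBin is hypothetical:

```typescript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Counts events per fixed-length bin, keyed by the bin's start time (ISO, UTC).
function countByBin(timestamps: number[], binMs: number): Map<string, number> {
  const bins = new Map<string, number>();
  for (const ts of timestamps) {
    const binStart = Math.floor(ts / binMs) * binMs; // round down to the bin start
    const key = new Date(binStart).toISOString();
    bins.set(key, (bins.get(key) ?? 0) + 1);
  }
  return bins;
}

const added = [
  "2024-03-14T08:54:39.290Z",
  "2024-03-14T23:59:59.000Z",
  "2024-03-15T00:00:00.000Z",
].map((iso) => Date.parse(iso));

// Two events fall into the 2024-03-14 bin and one into the 2024-03-15 bin.
console.log(countByBin(added, DAY_MS));
```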
In Logs Insights, you can even visualize your queries. To do that, head over to the Visualization tab. Select your chart type (we selected Line Chart), and you can see your time-series data.
This shows the query above (counting how often the repository VS code was added) in a time-series
chart.
This gives us a quick overview of our log data. It can be super helpful to understand business logic
better without creating expensive custom metrics. You can then go ahead and add this visualization to
your CloudWatch Dashboard and share it with your stakeholders.
4.5.6. Intrinsic Functions like ispresent, isipinsubnet, concat Make Your Life Easier
Logs Insights offers a range of intrinsic functions that make your life easier. These functions help you
analyze your logs efficiently and effectively.
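4.5.6.4. IP Functions
• isValidIp() / isValidIpV4() / isValidIpV6() - Checks whether a field contains a valid IP address.
• isIpInSubnet() / isIpv4InSubnet() / isIpv6InSubnet() - Checks whether an IP address lies within a given subnet in CIDR notation.
Useful string functions include:
• isempty() / isblank() / concat() - Checks for empty or blank strings and concatenates strings.
• ltrim() / rtrim() / trim() - Trims characters from the left, right, or both sides of a string.
• strlen() / toupper() / tolower() - Returns the length of a string and converts strings to upper or lower case.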
lower case.
Example Usage
A typical example is the ispresent function. With this function, you can filter logs where a field is
present. For our example, we want to see all logs where the repository field is present.
fields @timestamp, repository
| filter ispresent(repository)
Now you only see all logs where the field is present. There are many more such functions. The easiest
way to figure them out is to go to the query editor and hit cmd+space (or ctrl+space on Windows) to see
all available functions.
Sometimes you may want to derive additional fields from your logs at query time. In our example application, it could be interesting to split the fullName field into owner and repo. Let's see how we can do that with the parse functionality of Logs Insights.
From our fullName log attribute, we can extract the owner and the repo. For example, if the fullName is
apache/airflow, the owner would be apache and the repository would be airflow.
Note: The ispresent filter is not strictly necessary here, but it makes the output clearer because we only see logs that contain the field. The result looks like this:
Figure 40. Using the ispresent filter
This functionality makes it very easy to extract fields from your logs. It is also possible to extract fields
using Regex or Glob expressions.
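In Logs Insights, a glob expression along the lines of parse fullName "*/*" as owner, repo should perform this split at query time. The same extraction in TypeScript, for example for a script that post-processes exported logs, as a sketch; splitFullName and RepoName are hypothetical, and the helper splits at the first slash and rejects values without one:

```typescript
interface RepoName {
  owner: string;
  repo: string;
}

// Splits "owner/repo" at the first slash; returns undefined if either part is empty.
function splitFullName(fullName: string): RepoName | undefined {
  const slash = fullName.indexOf("/");
  if (slash <= 0 || slash === fullName.length - 1) return undefined;
  return { owner: fullName.slice(0, slash), repo: fullName.slice(slash + 1) };
}

console.log(splitFullName("apache/airflow")); // owner "apache", repo "airflow"
console.log(splitFullName("airflow")); // undefined
```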
Logs in your application typically follow certain patterns. Logs Insights can find these patterns and cluster your logs based on them. A pattern is a shared text structure that recurs across your log events.
This feature was released in 2023 and is a very powerful tool. When you query your logs, you will now see a tab called Patterns. Let's run an example query and see what happens:
We’ll query our REST API Handler Lambda function and retrieve a lot of logs.
Log Insights has found 29 patterns in our logs. You can sort these patterns by event count or event
ratio.
We can see that the logs that appear the most are the start and end logs (see the second and third
rows). But we can also see that the pattern Incoming Request appears frequently. We can inspect this
pattern by clicking on the magnifying glass.
Another pane will open, allowing you to examine the pattern in more detail. You can see the log events
that belong to this pattern, as well as all the values of the attributes and related patterns.
If you have a high volume of data, patterns can be useful for finding similar error cases and helping you
debug.
A common problem with putting out a lot of logs into a logging system is that you also increase the
noise. The more logs you have, the harder it will be to find relevant logs.
For every user session, you will have duplicated logs. This is not a bad thing. But at certain times, you
will want to see only logs of one user flow.
This is where the dedup function comes into play. dedup removes duplicate logs from your query results.
| filter correlationIds.httpMethod = 'GET'
You can see that many logs are duplicated because of multiple GET requests.
Once we apply the dedup function, we can see only one journey of the logs without seeing all the noise.
Figure 44. Using the dedup function to reduce the noise
When working with a lot of logs, this can be very helpful to reduce the noise.
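Conceptually, dedup keeps the first result for each unique combination of the fields you pass it, so the sort order decides which duplicate survives. A sketch of that behavior; dedup and the Row type are hypothetical, and rows missing a field are grouped together here, which is an assumption:

```typescript
type Row = Record<string, string | undefined>;

// Keeps the first row for each unique combination of `fields`, in input order
// (sort first, then dedup, as you would in a query).
function dedup(rows: Row[], fields: string[]): Row[] {
  const seen = new Set<string>();
  return rows.filter((row) => {
    const key = JSON.stringify(fields.map((f) => row[f] ?? null));
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Rows already sorted by timestamp, newest first.
const rows: Row[] = [
  { timestamp: "08:00:03", message: "Getting repositories from DynamoDB.", httpMethod: "GET" },
  { timestamp: "08:00:02", message: "Getting repositories from DynamoDB.", httpMethod: "GET" },
  { timestamp: "08:00:01", message: "Incoming request", httpMethod: "GET" },
];

console.log(dedup(rows, ["message"]));
```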
Creating queries is an art in itself. You don’t want to do this over and over again. This is why it makes
sense to share queries with your team and save them. This is really easy in Log Insights. Click on the tab
"Queries" and you can save your queries.
You can organize your queries in folders. We have two query folders; one of them is called Lambda Logs.
It makes sense that you do this for queries that you often use.
One thing that is often overlooked with Saved Queries is that they act as Tabs. It sounds a bit weird but
hear us out.
Once you execute a query, you can switch between saved queries. The outputs and the changes in the
queries are saved. If you have a change in your query, you will see a blue indicator next to the name of
the query.
There is no need to open the window multiple times. It took some time until we discovered that!
Log Insights also keeps track of your query history. This is very helpful if you want to go back to a query
you have run before.
It often happens that you play around with queries and can’t remember them anymore. The query
history helps you with that.
When you want to see your logs as they arrive, Live Tail can help you a lot. It gives you the ability to tail your logs in real time. Developers and ops engineers are used to checking the logs of server-side applications with tail -f. CloudWatch Live Tail is like tail -f for the cloud.
Figure 48. The Live Tail console view
After you go to the Live Tail console, you can select a Log Group. Here, we’ve selected the Log Group of
our REST API Handler Lambda function.
You can also go one level deeper and select a Log Stream. Once you click on "Start," you will see the logs
in real time.
There are two ways to filter and highlight your logs. First of all, Live Tail has a highlight feature. This is
the bar at the top. You can enter a term, and it will highlight all logs that contain this term. For example,
we have added DynamoDB and API.
Each term gets a different color. After you add the terms, every log entry that contains one of them is highlighted with a colored bar on the side, and the color tells you which term the entry contains. This can help you a lot if you have a large number of incoming logs.
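To make the highlighting behavior concrete, here is a small TypeScript sketch that assigns each term a color and reports which colors apply to a log line. The palette, the helper names, and the case-sensitive substring matching are assumptions for illustration. This is not how the Live Tail console is implemented.

```typescript
// Illustrative sketch of term highlighting, loosely modeled on Live Tail.
// Assumptions: a fixed palette and case-sensitive substring matching.
const PALETTE = ["blue", "orange", "green", "purple", "red"];

// Assign each highlight term a color, cycling through the palette.
function assignColors(terms: string[]): Map<string, string> {
  const termColors = new Map<string, string>();
  terms.forEach((term, i) => termColors.set(term, PALETTE[i % PALETTE.length]));
  return termColors;
}

// Return the colors of all terms contained in a log line (empty if none).
// Map.forEach visits terms in insertion order, so colors keep the term order.
function highlight(line: string, termColors: Map<string, string>): string[] {
  const matches: string[] = [];
  termColors.forEach((color, term) => {
    if (line.includes(term)) matches.push(color);
  });
  return matches;
}

const colors = assignColors(["DynamoDB", "API"]);
const sampleLines = [
  "Removing repository from DynamoDB",
  "Handling API request", // hypothetical log message
  "REPORT RequestId: 71dc393d-08fc-408a-bbea-9b43ff260e3a",
];
sampleLines.forEach((line) => console.log(highlight(line, colors), line));
```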
The second way to filter your logs is by using a filter pattern. For JSON logs, you select properties with $. selectors, similar to JSON Path. The same filter pattern syntax is used elsewhere in CloudWatch Logs, for example in metric filters and subscription filters.
We add a filter that only matches logs from the service repo-tracker:
{$.service = "repo-tracker"}
Figure 51. Filtering inside Live Tail
This filter shows us all logs from our service repo-tracker. It makes the logging experience smoother, since the INIT, START, and REPORT lines emitted by the Lambda platform have no service field and are therefore filtered out. You can also adapt the filter to specific users, paths, or other attributes.
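The following TypeScript sketch shows the idea behind a JSON filter pattern like {$.service = "repo-tracker"}. The selector after $. walks into the parsed JSON event, and lines that are not JSON never match. The sketch supports only a single string equality on a property path. The real filter pattern syntax is much richer, with other operators, && and ||, wildcards, and arrays, so treat this as an illustration, not a reimplementation. The function names are hypothetical.

```typescript
// Sketch: evaluate a single-equality JSON filter pattern such as
// {$.service = "repo-tracker"} against one log line.

// Parse patterns of the form { $.a.b = "value" } (the only form supported here).
function parseEqualityPattern(pattern: string): { path: string[]; value: string } {
  const match = /^\{\s*\$\.([\w.]+)\s*=\s*"([^"]*)"\s*\}$/.exec(pattern.trim());
  if (!match) throw new Error(`Unsupported pattern: ${pattern}`);
  return { path: match[1].split("."), value: match[2] };
}

// True if the line is JSON and the selected property equals the expected value.
function matchesPattern(line: string, pattern: string): boolean {
  const { path, value } = parseEqualityPattern(pattern);
  let current: unknown;
  try {
    current = JSON.parse(line);
  } catch {
    return false; // plain-text lines, e.g. START or REPORT, never match
  }
  for (const key of path) {
    if (typeof current !== "object" || current === null) return false;
    current = (current as Record<string, unknown>)[key];
  }
  return current === value;
}

const pattern = '{$.service = "repo-tracker"}';
const incoming = [
  "START RequestId: 71dc393d-08fc-408a-bbea-9b43ff260e3a Version: $LATEST",
  '{"message":"Removing repository from DynamoDB","service":"repo-tracker"}',
];
incoming.filter((line) => matchesPattern(line, pattern)).forEach((line) => console.log(line));
```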
Figure 52. Masking Sensitive Data with CloudWatch
CloudWatch can automatically detect sensitive data and mask it. What is sensitive data? For example:
• Email addresses
• IP addresses
• Usernames
• Passwords
CloudWatch has built-in mechanisms for pattern matching and more advanced machine-learning
techniques to detect sensitive data.
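As a toy illustration of pattern-based masking, the TypeScript sketch below replaces IPv4 addresses in a log line with asterisks. The regular expression, the fixed "****" mask, and the helper name are assumptions. CloudWatch uses its own managed data identifiers and does not expose its matching logic.

```typescript
// Toy masking sketch, not CloudWatch's implementation.
// Simplified IPv4 pattern: it also matches invalid values such as 999.1.1.1.
const IPV4_PATTERN = /\b(?:\d{1,3}\.){3}\d{1,3}\b/g;
const MASK = "****";

// Replace every IPv4-looking value in the line with a fixed mask.
function maskIpAddresses(line: string): string {
  return line.replace(IPV4_PATTERN, MASK);
}

console.log(maskIpAddresses('{"ip":"3.250.215.109","httpMethod":"GET"}'));
```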
To hide sensitive data, you create a data protection policy. Head over to your Log Group, select Data Protection, and click on Create Policy.
Figure 53. Creating a Data Protection Policy
In the policy, you can select different Data Identifiers. For example, you can choose:
• Address
• AwsSecretKey
• BankAccountNumber
The Data Identifiers let you narrow down the data you want to hide. You can also enable Audit Destinations. If CloudWatch finds sensitive data, it reports the findings to the Audit Destination of your choice. You can choose between a CloudWatch Log Group, Amazon S3, and Amazon Data Firehose (formerly Kinesis Data Firehose). We are choosing a CloudWatch Log Group. Let’s click on Activate data protection.
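A data protection policy is a JSON document. The TypeScript sketch below builds one with an audit statement, which reports findings to a CloudWatch Log Group, and a deidentify statement, which masks the data. It follows the policy structure described in the CloudWatch Logs documentation as we understand it. The policy name, the findings Log Group name, and the helper function are hypothetical placeholders. Verify the identifier names against the current list of managed data identifiers.

```typescript
// Sketch of a log-group-level data protection policy document.
// Placeholder names below are hypothetical.
const IDENTIFIER_PREFIX = "arn:aws:dataprotection::aws:data-identifier/";

interface PolicyStatement {
  Sid: string;
  DataIdentifier: string[];
  Operation: Record<string, unknown>;
}

function buildDataProtectionPolicy(identifiers: string[], findingsLogGroup: string) {
  const arns = identifiers.map((id) => IDENTIFIER_PREFIX + id);
  const statements: PolicyStatement[] = [
    {
      // Audit: report findings to a CloudWatch Logs Log Group.
      Sid: "audit-policy",
      DataIdentifier: arns,
      Operation: {
        Audit: { FindingsDestination: { CloudWatchLogs: { LogGroup: findingsLogGroup } } },
      },
    },
    {
      // Deidentify: mask matching data in the log events.
      Sid: "redact-policy",
      DataIdentifier: arns,
      Operation: { Deidentify: { MaskConfig: {} } },
    },
  ];
  return {
    Name: "data-protection-policy", // hypothetical policy name
    Version: "2021-06-01",
    Statement: statements,
  };
}

const policy = buildDataProtectionPolicy(["IpAddress"], "hypothetical-audit-findings");
console.log(JSON.stringify(policy, null, 2));
```

The resulting JSON string is the kind of document you would pass to the PutDataProtectionPolicy API or keep in your infrastructure code.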
In our example application, we are logging the IP address in the incoming Lambda event. Let’s see how
CloudWatch can hide this data. In the event, we can now see that the IP address is masked with
asterisks:
Figure 54. Masked IP Address in CloudWatch Logs
Of course, we have also implemented this in our example application with Terraform and CDK.
The example we showed you was for a single Log Group. Data protection policies can be implemented at two levels: at the account level, covering all log groups in your account, or for specific individual log groups. When a policy is created at the account level, it applies to existing log groups and to any log groups created in the future.
The IP address in the last screenshot is masked. Sometimes you want to unmask the data, for example, if you need to debug an issue that involves the IP address.
The great thing about masking is that you can still unmask the data when you need it. You need the right permission to do that (logs:Unmask). Go directly to the Log Group, click on Display, and select Temporarily unmask protected data.
This feature of masking and unmasking data helps you keep your customer data secure and still allows
you to efficiently debug your issues.
Each red line from the CloudWatch icon represents a connection to CloudWatch. Almost every service
you can see emits logs to CloudWatch. We will show you which logs are generated by our REST API
handler, which is responsible for getting, adding, and removing repositories from the DynamoDB table.
When we make requests to our API, there are three main places where our application creates logs.
Figure 57. Log Groups for REST API
1. Lambda Logs - logs created by your Lambda function
2. API Gateway Access Logs - access logs provide information about each request made to your API
3. API Gateway Execution Logs - execution logs provide information about the execution of your API, such as latencies and status.
Lambda logs are logs created by your Lambda function. The Log Group for the Lambda function is called
/aws/lambda/{function-name}. For example, if you deployed the example application with Terraform, it
will be: /aws/lambda/cw-ho-tf-dev-api-rest
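The naming convention is easy to reproduce in code, for example, when a script needs to find the Log Group of a function. This TypeScript sketch builds the default Log Group name from a function name or from a Lambda function ARN. The helper names are hypothetical. The convention does not apply if you configure a custom Log Group for your function.

```typescript
// Build the default Log Group name: /aws/lambda/{function-name}.
// Assumes the function uses the default Log Group, not a custom one.
function lambdaLogGroupName(functionName: string): string {
  return `/aws/lambda/${functionName}`;
}

// Extract the function name from an ARN such as
// arn:aws:lambda:<region>:<account>:function:<name>[:<qualifier>]
function functionNameFromArn(arn: string): string {
  const parts = arn.split(":");
  if (parts.length < 7 || parts[2] !== "lambda" || parts[5] !== "function") {
    throw new Error(`Not a Lambda function ARN: ${arn}`);
  }
  return parts[6]; // any version or alias qualifier at index 7 is ignored
}

// 123456789012 is a placeholder account ID.
console.log(lambdaLogGroupName(functionNameFromArn(
  "arn:aws:lambda:us-east-1:123456789012:function:cw-ho-tf-dev-api-rest:7",
)));
```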
Let’s open this Log Group and see which logs were created.
Figure 58. Lambda Log Group
Once we open the Log Group, we can see different Log Streams. As a recap: in Lambda, each Log Stream belongs to a separate execution environment.
Let’s look at an example. Each Lambda invocation starts with a START RequestId Log Event. This shows you which version of the function was used and gives you the request ID. After that, all Log Events we have configured follow, for example:
• Incoming event
• Data to process
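If you export or process these events yourself, the START line is easy to parse. The TypeScript sketch below assumes the plain-text format START RequestId: <id> Version: <version>. If you switch your function to the JSON log format, platform events look different and this regular expression will not match them. The function name is hypothetical.

```typescript
// Sketch: extract the request ID and function version from a START line
// in Lambda's plain-text log format.
const START_LINE = /^START RequestId: (\S+) Version: (\S+)/;

function parseStartLine(line: string): { requestId: string; version: string } | undefined {
  const match = START_LINE.exec(line);
  return match ? { requestId: match[1], version: match[2] } : undefined;
}

console.log(parseStartLine("START RequestId: 71dc393d-08fc-408a-bbea-9b43ff260e3a Version: $LATEST"));
```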
You can open each Log Event to see additional data. Here is an example log event from a delete request:
{
  "message": "Removing repository from DynamoDB",
  "service": "repo-tracker",
  "awsRegion": "us-east-1",
  "correlationIds": {
    "awsRequestId": "71dc393d-08fc-408a-bbea-9b43ff260e3a",
    "xRayTraceId": "1-65f7d919-322417b94928eab3742de94d",
    "requestId": "40fba0f0-161e-4c9f-b068-30398b8aaeef",
    "httpMethod": "DELETE",
    "path": "/repositories/hapijs%2Fhapi",
    "buildTimestamp": "none"
  },
  "lambdaFunction": {
    "name": "cw-ho-tf-dev-api-rest",
    "arn": "arn:aws:lambda:us-east-1:590183990318:function:cw-ho-tf-dev-api-rest",
    "memoryLimitInMB": 1024,
    "version": "$LATEST",
    "coldStart": false
  },
  "logLevel": "INFO",
  "timestamp": "2024-03-18T06:03:08.594Z",
  "logger": {
    "sampleRateValue": 0.1
  },
  "fullName": "hapijs/hapi"
}
It gives us information about the Lambda function, some information on the system, and IDs and
messages that are relevant to our application.
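Logs Insights discovers fields in JSON log events and exposes nested properties with dotted names, such as correlationIds.requestId. The TypeScript sketch below mimics that flattening for a shortened version of the event above, so you can see which field names to use in queries. It is an illustration only, with a hypothetical function name. Arrays are not expanded, and Logs Insights applies its own rules and limits.

```typescript
// Sketch: flatten nested JSON into dotted field names, similar to how
// Logs Insights names discovered fields. Arrays are kept as single values.
function flattenFields(
  value: unknown,
  prefix = "",
  out: Record<string, unknown> = {},
): Record<string, unknown> {
  if (typeof value === "object" && value !== null && !Array.isArray(value)) {
    for (const [key, child] of Object.entries(value)) {
      flattenFields(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out[prefix] = value;
  }
  return out;
}

const eventFields = flattenFields({
  message: "Removing repository from DynamoDB",
  correlationIds: { requestId: "40fba0f0-161e-4c9f-b068-30398b8aaeef", httpMethod: "DELETE" },
  lambdaFunction: { coldStart: false },
});
console.log(eventFields);
```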
4.8.2. API Gateway Access Logs
Next, we look at API logs. API Gateway logs each access to the API. These logs are called Access Logs. In our Terraform setup, access logging is configured on the API Gateway stage:
# Excerpt from the API Gateway stage resource; its opening line is not shown
  xray_tracing_enabled = true
  access_log_settings {
    destination_arn = aws_cloudwatch_log_group.websocket_access_logs.arn
    format          = jsonencode({
      requestId : "$context.requestId",
      ip : "$context.identity.sourceIp",
      ...
      identityCaller : "$context.identity.caller",
    })
  }
}
The access_log_settings block defines the Log Group to which the logs are sent and the format it should
follow. Once you deploy your API Gateway, the stage settings look like this:
Figure 60. API Gateway Stage Settings
Custom access logs are enabled with a specific format. The Log Group can also be found here: /aws/apigateway/<API-STAGE_ID>/access_logs. A single access log entry looks like this:
{
  "accountId": "-",
  "apiKey": "-",
  "authorizerPrincipalId": "-",
  "caller": "-",
  "httpMethod": "GET",
  "identityAccessKey": "-",
  "identityCaller": "-",
  "identitySourceIp": "3.250.215.109",
  "integrationLatency": "51",
  "ip": "3.250.215.109",
  "protocol": "HTTP/1.1",
  "requestId": "b675d420-f676-4c78-be1f-15ff12b490eb",
  "requestTime": "25/Nov/2023:04:08:06 +0000",
  "requestTimeEpoch": "1700885286342",
  "resourcePath": "/{proxy+}",
  "responseLength": "2",
  "stage": "prod",
  "status": "200",
  "user": "-",
  "userAgent": "undici",
  "userArn": "-"
}
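The access log gives you the request ID (requestId) of the API Gateway request, the caller’s IP address, the HTTP method, the status code, the integration latency, and much more!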
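In this custom format every value is a string, including numbers such as "status": "200", and in the entry above missing values appear as "-". A small TypeScript sketch that turns one entry into typed values could look like this. The summary shape and the function name are our own, hypothetical choices.

```typescript
// Sketch: convert one access log entry (custom JSON format) into typed values.
interface AccessLogSummary {
  requestId: string;
  method: string;
  statusCode: number;
  integrationLatencyMs: number | undefined;
  sourceIp: string;
}

function summarizeAccessLog(line: string): AccessLogSummary {
  const entry = JSON.parse(line) as Record<string, string>;
  const latency = Number(entry.integrationLatency); // "-" becomes NaN
  return {
    requestId: entry.requestId,
    method: entry.httpMethod,
    statusCode: Number(entry.status),
    integrationLatencyMs: Number.isNaN(latency) ? undefined : latency,
    sourceIp: entry.ip,
  };
}

console.log(summarizeAccessLog(
  '{"requestId":"b675d420-f676-4c78-be1f-15ff12b490eb","httpMethod":"GET","status":"200","integrationLatency":"51","ip":"3.250.215.109"}',
));
```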
The requestId can be helpful if you want to see additional logs in Logs Insights. For example, we have
added the requestId as a correlation ID to our logs.
logger.appendKeys({
  requestId: event.requestContext.requestId,
});
Conceptually, the requestId is temporary and scoped to a single invocation, even though you want it in all log entries of that invocation. If you use this approach, make sure to clear it at the end of each request, for example, with a middleware such as one from Middy.js.
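Why clearing matters: a logger created outside the handler lives as long as the execution environment, so keys appended in one invocation leak into the next warm invocation unless you remove them. The TypeScript sketch below uses a hypothetical SketchLogger class, not the Powertools logger, and a simple wrapper that plays the role of the middleware. It handles synchronous handlers only. An async handler would need to await its result before clearing.

```typescript
// Hypothetical minimal structured logger (not Powertools).
class SketchLogger {
  private keys: Record<string, unknown> = {};

  appendKeys(extra: Record<string, unknown>): void {
    Object.assign(this.keys, extra);
  }

  clearKeys(): void {
    this.keys = {};
  }

  format(message: string): string {
    return JSON.stringify({ message, ...this.keys });
  }
}

// Created once per execution environment, so it survives warm invocations.
const logger = new SketchLogger();

// Wrap a synchronous handler so appended keys are cleared after every call,
// similar in spirit to a cleanup middleware.
function withScopedKeys<E, R>(fn: (evt: E) => R): (evt: E) => R {
  return (evt) => {
    try {
      return fn(evt);
    } finally {
      logger.clearKeys();
    }
  };
}

const handler = withScopedKeys((evt: { requestId?: string }) => {
  if (evt.requestId) logger.appendKeys({ requestId: evt.requestId });
  return logger.format("Handling request");
});

console.log(handler({ requestId: "b675d420-f676-4c78-be1f-15ff12b490eb" }));
console.log(handler({})); // no stale requestId from the previous call
```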
If we want to see all logs for this request ID, we can use the following query in Logs Insights:
This simple query gives us all logs that are attached to this requestId. That helps a lot while debugging, and it also helps you understand the flow of requests through your application.
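A query of this kind can also be generated, for example, from a script or a runbook. The TypeScript sketch below builds one. It assumes that Lambda events carry the ID under correlationIds.requestId, as in the Lambda log event shown earlier, and that access log entries carry it as requestId, so it matches both when you select both Log Groups. The function name and the strict ID check, which keeps quotes out of the query, are our own additions.

```typescript
// Sketch: build a Logs Insights query for all events of one API Gateway request.
function queryByRequestId(requestId: string): string {
  // Reject anything that is not a plain ID so the value cannot break the quotes.
  if (!/^[\w-]+$/.test(requestId)) {
    throw new Error(`Unexpected request ID format: ${requestId}`);
  }
  return [
    "fields @timestamp, @message",
    `| filter correlationIds.requestId = "${requestId}" or requestId = "${requestId}"`,
    "| sort @timestamp asc",
  ].join("\n");
}

console.log(queryByRequestId("b675d420-f676-4c78-be1f-15ff12b490eb"));
```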
Figure 61. Logs Insights Query by Request ID
The last logs we look at in our REST API are Execution Logs. These logs are created by API Gateway and contain information about how API Gateway executed each request. They show the request and the response for each call. Be aware that these logs can get quite expensive if your API has a lot of requests!
We’ve created the CloudWatch Log Group with the name API-Gateway-Execution-Logs_<REST_API_ID>/prod, which follows the naming scheme API Gateway uses for execution logs. The logs are especially helpful if you have more advanced authentication with usage plans, API keys, etc. Even for our simple application, they are quite helpful for understanding your API better.
Here you can see an example journey with all Execution Logs attached:
Figure 62. Execution Logs Example
Execution Logs show you the whole trace and journey of the request. We won’t delve deeper into execution logs here. Send some requests to the application and see how the logs behave.
Important quotas for Logs & Insights are the following (not a complete list):
Description Quota
You can request a quota increase for most of the quotas. There are many more quotas based on the
requests per second for CloudWatch APIs. Check out the link above for more information.
4.10. Understanding CloudWatch Logs Pricing: Ingestion, Storage, and Analysis Costs
CloudWatch Logs pricing is based on three dimensions: the amount of log data ingested, stored, and analyzed.
4.10.1. Ingestion
AWS CloudWatch charges for the amount of log data ingested per account per region. The first 5 GB of
ingested log data per month is included in the free tier. After that, the price per GB ingested varies
depending on the region and the type of log data. For example, US East (N. Virginia) has a price of $0.50
per GB for standard log data and $0.25 per GB for VPC Flow Logs.
4.10.2. Storage
The second pricing category is storage. The first 5 GB of stored logs are also included in the free tier. After that, the price depends on the amount of data stored. For example, in us-east-1, storage costs $0.03 per GB per month.
4.10.3. Analysis
The last pricing category is log analysis. If you analyze logs with Logs Insights, you scan through a lot of log data, and you pay for each GB scanned. In us-east-1, you will pay $0.005 per GB of scanned log data.
This screenshot shows that 22.7 KB of logs were scanned. This can get much higher if you have a lot of logs. For our example application, the log volume is still low.
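To get a feeling for the numbers, here is a rough TypeScript estimator using the us-east-1 prices quoted above. It assumes that the 5 GB free allowance applies separately to ingestion and storage, as described in this section, applies no free allowance to Logs Insights scans, and ignores other log classes and regions. The constant and function names are hypothetical. Prices change, so check the CloudWatch pricing page before relying on the result.

```typescript
// Rough monthly CloudWatch Logs cost estimate (us-east-1 prices quoted above).
const PRICES_US_EAST_1 = {
  ingestPerGb: 0.5, // standard log data ingestion
  storagePerGbMonth: 0.03, // stored log data per month
  scanPerGb: 0.005, // data scanned by Logs Insights
  freeGb: 5, // assumed to apply to ingestion and storage separately
};

function estimateMonthlyCost(ingestedGb: number, storedGb: number, scannedGb: number): number {
  const p = PRICES_US_EAST_1;
  const ingestion = Math.max(0, ingestedGb - p.freeGb) * p.ingestPerGb;
  const storage = Math.max(0, storedGb - p.freeGb) * p.storagePerGbMonth;
  const analysis = scannedGb * p.scanPerGb; // no free allowance applied here
  return ingestion + storage + analysis;
}

console.log(estimateMonthlyCost(50, 100, 200).toFixed(2)); // prints 26.35
```

For 50 GB ingested, 100 GB stored, and 200 GB scanned per month, this gives $22.50 for ingestion, $2.85 for storage, and $1.00 for analysis.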
2. Set Log Retention Policies: By default, logs in CloudWatch are kept indefinitely, and this can lead to
unnecessary costs. Always set up a Log Retention policy that aligns with your business and
compliance needs.
3. Use Metric Filters: Metric filters allow you to turn log data into numerical CloudWatch metrics that
you can graph or set an alarm on. We will create one in the later chapters.
4. Centralize your Logs: If you have logs in multiple AWS accounts, consider centralizing them into a
single account. This can make it easier to manage and analyze your logs. You can use the
Observability Access Manager (OAM) to achieve this.
5. Monitor Log Group Metrics: CloudWatch provides metrics for the number of log events (IncomingLogEvents) and the volume of log data ingested (IncomingBytes). Monitor these metrics to keep track of your logging activity and costs, especially ingestion.
6. Use Logs Insights for Complex Queries: If you need to perform complex queries on your logs, use CloudWatch Logs Insights. It allows you to perform SQL-like queries on your log data and visualize the results.
7. Mask Sensitive Data: Understand which data within your logs is sensitive and create data
protection policies to mask this data.
8. Use Live Tail for Real-Time Logs: Consider using Live Tail if you’re working with incoming Log
Streams. This makes it much easier to find the correct logs.
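For the retention policy, CloudWatch does not accept arbitrary periods: you pick from a fixed set of values. This TypeScript sketch chooses the shortest allowed setting that covers a compliance requirement. The list reflects the values accepted by the PutRetentionPolicy API at the time of writing and is worth double-checking against the current API reference. The function name is hypothetical.

```typescript
// Retention values (in days) accepted by PutRetentionPolicy at time of writing.
const ALLOWED_RETENTION_DAYS = [
  1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1096,
  1827, 2192, 2557, 2922, 3288, 3653,
];

// Pick the shortest allowed retention that still covers the required days.
function retentionFor(requiredDays: number): number {
  const match = ALLOWED_RETENTION_DAYS.find((days) => days >= requiredDays);
  if (match === undefined) {
    throw new Error(`No retention setting covers ${requiredDays} days; keep logs without expiry or export them`);
  }
  return match;
}

console.log(retentionFor(100));
```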
4.12. Summary
In this chapter, we explored centralized log management and analysis with CloudWatch Logs and Logs Insights. We covered the basics of Log Groups, Log Streams, and Log Events. We emphasized structured logging using Powertools for AWS and discussed log retention policies to manage storage and costs. We delved into CloudWatch Logs Insights, learning to use the console, sample queries, filtering, aggregation, intrinsic functions, and saving queries.
We also introduced CloudWatch Live Tail for real-time log monitoring and discussed masking sensitive data. We examined logs in our example application, focusing on Lambda logs and API Gateway access and execution logs.
Logs and Insights are central to observability with CloudWatch because they provide visibility into your
applications and infrastructure.