Lab: Analyzing and Visualizing Streaming Data with Kinesis Data Firehose, OpenSearch Service, and OpenSearch Dashboards
[version_1.0.27]
Note: It may take up to 25 minutes for the lab to complete set up and launch.
In this lab, you will use Amazon Kinesis Data Firehose to collect data from a web server that is
hosted in Amazon Elastic Compute Cloud (Amazon EC2). You will then use AWS Lambda to
process and enrich the data. You will analyze the data by using OpenSearch Dashboards, where you
can build an index and then visualize the data to create dashboards. Business users will be able to
access dashboards by using Amazon Cognito.
https://awsacademy.instructure.com/courses/96839/modules/items/8946974 1/23
1/2/25, 17:20 Lab: Analyzing and Visualizing Streaming Data with Kinesis Data Firehose, OpenSearch Service, and OpenSearch Dashboards
Duration
This lab will require approximately 90 minutes to complete.
Scenario
The administrator for the university bookstore's website wants to gather insights on how visitors
interact with the site. She has been using a web-based JavaScript tracking system that includes
charts with information about user activity. The information includes where users are located,
which browsers they use, and whether they use mobile devices. The data can also indicate whether
visitors reached a bookstore product's page from a search page or a recommendations page.
The number of visitors to the site has grown each year. However, because a lot of visitors use
browsers that block third-party tracking scripts, she is now concerned that the data about user
activity isn't accurate.
While researching how to get data that is more accurate, she came across the Analyze User
Behavior Using Amazon Elasticsearch Service, Amazon Kinesis Data Firehose and Kibana post on
the AWS Database Blog. The post details a solution to use streaming data to analyze the logs for a
web server and determine user access patterns. The web administrator heard about your team
using AWS services to create data analytics solutions. She contacts you for advice about whether
the approach from the blog post could provide the same information as her current tracking
system.
The blog post includes files for a basic website with a search page and a recommendation page that
refer visitors to a few product pages. You decide to use this basic website to develop a proof of
concept (POC) to use streaming data to analyze user activity for the bookstore website.
The solution uses Kinesis Data Firehose to ingest streamed web server access logs and then uses
Lambda functions to enrich the data. After using Kinesis and Lambda to process the data, you can
use OpenSearch Service to load and index the data. By using OpenSearch Dashboards, you can
create visualizations of the data to provide insight about visitors to the university's websites. You
can share these visualizations by using Amazon Cognito as an authentication tool and AWS
Identity and Access Management (IAM) to authorize access to the dashboard.
When you start the lab, the environment will contain the resources that are shown in the following
diagram.
The infrastructure is designed to analyze streaming data and consists of the following
components:
An EC2 instance that runs a web server. The instance is in a public subnet.
The web server that runs on the instance includes a website with the following files. If
you would like to review the website package, you can download and open this .zip file:
httpd.conf: Configuration file for the Apache web server
agent.json: Configuration file to connect the web server to a Kinesis Data
Firehose delivery stream
search.php: Page where a user can search for a product
recommendation.php: Page that recommends a particular product based on
the user's search history
echo.php, kindle.php, and firetvstick.php: Pages for three products on the
website
A Kinesis Data Firehose delivery stream that captures streaming data from the web server
logs. You will use the delivery stream to write data to an OpenSearch Service cluster.
A Lambda function to transform the data. If you would like to review the function, you
can download and open this .zip file.
An OpenSearch Service cluster to index and store the data.
An OpenSearch Dashboards instance to build data visualizations and gain insights from
the data.
By the end of the lab, you will have used the architecture to perform several tasks. The table after
the diagram provides a detailed explanation of these tasks in relation to the lab architecture.
NUMBERED TASK    DETAIL
1 You will review the EC2 instance configuration for the web server. You
will also review the OsDemoWebserverIAMRole IAM role
and OsDemoWebserverIAMPolicy IAM policies to understand the IAM
permissions that are applied to the role.
2 You will review the Kinesis Data Firehose delivery stream that is configured
to capture the web access logs for users who access the website.
3 You will also review the configuration for the OpenSearch Service cluster
that is used in the lab.
4 You will then set up the OpenSearch Service index.
5 After you configure everything, you will browse the website to generate
access logs.
6 After you generate access logs, CloudWatch log events will also be
generated in the AWS account. You will review these logs to better
understand how Kinesis Data Firehose ingests these and passes them to
a Lambda function for further enrichment.
7 You will create an OpenSearch Service index pattern, which is needed to
create visualizations in OpenSearch Dashboards.
8 You will build a pie chart visualization in OpenSearch Dashboards.
9 You will wrap up the lab by creating a heat map visualization.
4. Review the security settings that are associated with the web server instance.
On the lower part of the summary page for the OpenSearch Demo instance,
choose the Security tab.
Under IAM Role, choose the OsDemoWebserverIAMRole link.
The IAM console opens. Multiple IAM policies were created to control access to
the resources in this lab. To view the details of each policy, you can choose the
plus icon to the left of the policy name.
Choose the link for OsDemoWebserverIAMPolicy1.
Choose the JSON tab.
The IAM policy displays in JSON format.
Review the policy.
This IAM policy enables the webserver EC2 instance to read and write to the
TempS3Bucket S3 bucket.
Note: There are other policies referenced by the OsDemoWebserverIAMRole,
and these perform other functions. If you are interested, review these and
identify what they do.
Excellent! In this task, you reviewed the EC2 instance and how it is secured.
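For orientation, a policy that grants read and write access to the objects in a single S3 bucket typically has the shape shown in the following sketch. This is not the lab's actual policy: the bucket name example-temp-bucket and the helper function allows are assumptions made for illustration, and the helper only does an exact match on action names rather than real IAM evaluation.

```python
import json

# Hypothetical bucket name; the lab's real bucket name is generated at launch.
BUCKET_ARN = "arn:aws:s3:::example-temp-bucket"

# Illustrative policy document granting object read/write on one bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"{BUCKET_ARN}/*",
        }
    ],
}


def allows(policy_doc, action):
    """Return True if any Allow statement lists the action exactly.

    Simplified on purpose: real IAM evaluation also handles wildcards,
    Deny statements, conditions, and resource matching.
    """
    for statement in policy_doc["Statement"]:
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if statement["Effect"] == "Allow" and action in actions:
            return True
    return False


print(json.dumps(policy, indent=2))
print(allows(policy, "s3:PutObject"))     # True
print(allows(policy, "s3:DeleteObject"))  # False
```

When you compare this sketch with the JSON tab in the IAM console, look for the same three parts in each statement: Effect, Action, and Resource.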
5. To access the Kinesis console, in the search box to the right of Services, search for and
choose Kinesis.
Note: Multiple services start with "Kinesis," including Amazon Kinesis Data Analytics
and Amazon Kinesis Video Streams. Choose the service that is only "Kinesis."
Note: At a high level, the Lambda function enriches the web server logs with
more information. For example, the function determines the site visitor's
geographical location by converting the visitor's IP address to a location. You
can review details about this Lambda function by clicking on the link if you are
interested.
Congratulations! In this task, you reviewed the configuration for the Kinesis Data Firehose
delivery stream.
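To make the role of the Lambda function more concrete, the following sketch shows the general shape of a Kinesis Data Firehose data transformation function. It is not the lab's function. It assumes that each record arrives as a base64-encoded JSON object, and the enrich function is a hypothetical stand-in that sets a fixed country value instead of performing a real IP address lookup.

```python
import base64
import json


def enrich(record):
    """Placeholder enrichment (assumption): the lab's function derives the
    visitor's location from the IP address; a fixed value stands in here."""
    enriched = dict(record)
    enriched["country"] = "unknown"
    return enriched


def lambda_handler(event, context):
    """Decode each Firehose record, enrich it, and return it re-encoded."""
    output = []
    for rec in event["records"]:
        try:
            payload = json.loads(base64.b64decode(rec["data"]))
            data = json.dumps(enrich(payload)).encode("utf-8")
            output.append({
                "recordId": rec["recordId"],
                "result": "Ok",
                "data": base64.b64encode(data).decode("utf-8"),
            })
        except (ValueError, TypeError):
            # Hand the record back unchanged and mark it as failed.
            output.append({
                "recordId": rec["recordId"],
                "result": "ProcessingFailed",
                "data": rec["data"],
            })
    return {"records": output}


# Local usage example with a made-up record.
sample_event = {
    "records": [
        {
            "recordId": "1",
            "data": base64.b64encode(b'{"host": "203.0.113.7"}').decode("utf-8"),
        }
    ]
}
result = lambda_handler(sample_event, None)
print(result["records"][0]["result"])  # Ok
```

The delivery stream matches each returned record to its input by recordId, so the function returns every record it receives, including the ones that it could not process.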
This role allows the OpenSearch domain to use Amazon Cognito to authenticate
users.
Return to the Roles page, and choose the Role name link for
os_demo_firehose_delivery_role.
Review the permissions policy that is associated with the role,
OsdemoFirehoseIAMPolicy.
This role is more complex than the other role, but at a high level it allows
Kinesis Data Firehose to:
Access Amazon S3 and get and write objects.
Get Lambda functions and use them.
Create CloudWatch logs.
Use the OpenSearch Service domain to do things like configure the
domain.
Congratulations! In this task, you reviewed the configuration for the OpenSearch Service cluster.
Note: Indexing is the method that search engines use to organize data for fast retrieval. The
resulting structure is called an index. For more information about OpenSearch indexes, see Index
Data.
In OpenSearch, the basic unit of data is a JSON document. Within an index, OpenSearch identifies
each document by using a unique ID.
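The idea behind indexing can be illustrated with a toy example. The following sketch is a simplification and not how OpenSearch is implemented: it stores a few made-up documents by ID and builds a lookup table from each field value to the IDs of the documents that contain it, so that a search does not need to scan every document.

```python
from collections import defaultdict

# Made-up documents, keyed by a unique ID.
documents = {
    "1": {"webpage": "echo.php", "browser": "Firefox"},
    "2": {"webpage": "kindle.php", "browser": "Chrome"},
    "3": {"webpage": "echo.php", "browser": "Chrome"},
}


def build_index(docs):
    """Map each (field, value) pair to the set of matching document IDs."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            index[(field, value)].add(doc_id)
    return index


index = build_index(documents)

# Look up matching documents directly instead of scanning all of them.
print(sorted(index[("webpage", "echo.php")]))  # ['1', '3']
```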
Password: Passw0rd1!
Note: Amazon Cognito hosts this login screen. OpenSearch Service
depends on Amazon Cognito for authentication and IAM for
authorization.
You are prompted to change the password.
Enter the following for the new password: Passw0rd1!2
Choose Send.
If a pop-up window prompts you to Add data or Explore on my own,
choose Explore on my own.
In the console area, replace the existing JSON text with the following
command:
DELETE /apache_logs
This command deletes the apache_logs index from the OpenSearch Service
cluster, if the index already exists.
To run the command, choose the blue arrow icon.
The following response displays:
{
"acknowledged": true
}
11. Now create the OpenSearch index by using the REST API.
Note: The PUT method is used to create the index. For more information, see Create
Index.
Copy and paste the following text into the console:
PUT apache_logs
{
"settings" : {
"index" : {
"number_of_shards" : 10,
"number_of_replicas" : 0
}
},
"mappings": {
"properties": {
"agent": {
"type": "text"
},
"browser": {
"type": "keyword"
},
"bytes": {
"type": "text"
},
"city": {
"type": "keyword"
},
"country": {
"type": "keyword"
},
"datetime": {
"type": "date","format":"dd/MMM/yyyy:HH:mm:ss Z"
},
"host": {
"type": "text"
},
"location": {
"type": "geo_point"
},
"referer": {
"type": "text"
},
"os": {
"type": "keyword"
},
"request": {
"type": "text"
},
"response": {
"type": "text"
},
"webpage": {
"type": "keyword"
},
"refering_page": {
"type": "keyword"
}
}
}
}
Note: Copy the command exactly as written, or you might receive errors when
you run the command. You should see the exact text that is shown in the
following image.
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "apache_logs"
}
Analysis: This command created a new index called apache_logs. When the web server logs update
because of website traffic, the Kinesis Data Firehose delivery stream will populate the OpenSearch
Service cluster with data based on the mappings in the command. The command sets the data
types for the fields in the server log files within the OpenSearch Service database. Now that you
have an index, you can generate some web access logs and then create visualizations that are
based on the data that is in this index.
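The datetime mapping uses the format dd/MMM/yyyy:HH:mm:ss Z, which matches the timestamp style of Apache access logs. As a rough illustration of what that pattern accepts, the following sketch parses a made-up timestamp with the closest Python equivalent. The sample value is an assumption, and the Python directives only approximate the OpenSearch pattern.

```python
from datetime import datetime

# Closest Python equivalent of the OpenSearch pattern dd/MMM/yyyy:HH:mm:ss Z.
# Note: %b (month abbreviation) depends on the locale; English is assumed.
PYTHON_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Made-up timestamp in the style of an Apache access log.
sample = "10/Oct/2024:13:55:36 +0000"

parsed = datetime.strptime(sample, PYTHON_FORMAT)
print(parsed.isoformat())  # 2024-10-10T13:55:36+00:00

# A value in a different layout does not match the pattern.
try:
    datetime.strptime("2024-10-10 13:55:36", PYTHON_FORMAT)
except ValueError:
    print("does not match the expected format")
```

Because the field is mapped as a date, OpenSearch Dashboards can later use it as the time field for filtering by time range.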
To begin the testing process, you must first generate some logs on the web server.
12. Open a new browser tab or window, and go to the following URL. Replace <PUBLIC-IP>
with the public IP address that you copied previously: http://<PUBLIC-IP>/main.php
Note: For this POC, you need data from at least two web browsers. To diversify
the results, you can use multiple devices to access and browse the website.
Excellent! In this task, you populated the web server logs with data so that you can analyze the
data later with OpenSearch Dashboards.
As a data engineer, how can you observe the data being ingested, processed, and transformed in
your POC? CloudWatch Logs can help you see what is happening.
In this task, you will review the CloudWatch Logs information that was generated when the web
server access logs were ingested into Kinesis Data Firehose, and then transformed and enriched by
Lambda.
Note: In this lab, the workflow events are combined into a log group named /aws/lambda/aes-
demo-lambda-function. The two key events that you will review in this log group are Incoming
Record from Kinesis Firehose and Transformed Record going back to Kinesis Firehose.
16. Expand one of the logs with a message that begins with Incoming Record from Kinesis
Firehose.
The details of the event are similar to the following:
Analysis: This CloudWatch Logs event was generated when the web server access log was
sent from the EC2 instance and ingested into Kinesis Data Firehose. After being ingested,
the access log data is routed to Lambda, where a function transforms and enriches the
data.
17. Expand one of the logs with a message that begins with Transformed Record going back to
Kinesis Firehose.
The details of the event are similar to the following:
Analysis: Notice how the data is transformed with additional fields and enriched with the
location of the site visitor (by using the visitor's IP address).
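To see how additional fields can be derived from a raw access log entry, consider the following sketch. It assumes that the web server writes the Apache combined log format, and it uses a made-up log line. The way that it derives webpage and refering_page (by taking the last path segment) is an assumption for illustration and might differ from what the lab's Lambda function does.

```python
import re

# Pattern for the Apache combined log format (assumed for this sketch).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<datetime>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<response>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Split one access log line into named fields, or return None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Derive page names from the last path segment (illustrative only).
    parts = fields["request"].split()
    path = parts[1] if len(parts) > 1 else ""
    fields["webpage"] = path.rsplit("/", 1)[-1]
    fields["refering_page"] = fields["referer"].rsplit("/", 1)[-1]
    return fields


# Made-up log line; the IP address is from a documentation range.
sample = (
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] '
    '"GET /echo.php HTTP/1.1" 200 2326 '
    '"http://example.com/search.php" "Mozilla/5.0"'
)

record = parse_line(sample)
print(record["webpage"])        # echo.php
print(record["refering_page"])  # search.php
```

The field names in this sketch follow the names in the apache_logs index mapping, including the spelling of refering_page.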
18. Expand one of the logs with a message that begins with REPORT RequestId.
The details of the event are similar to the following:
Analysis: This event appears periodically in the log stream and includes usage
information for Kinesis Data Firehose and Lambda. For example, the Billed Duration is
the time that it takes Lambda to process a group of web access logs and enrich them. With
Kinesis Data Firehose and Lambda, AWS customers are billed for what they use.
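If you want to work with these usage numbers outside the console, a REPORT line can be parsed with a regular expression. The following sketch uses a made-up REPORT line that follows the usual layout of Lambda report entries; the request ID and all values are assumptions. It then computes the billed compute in GB-seconds, which is one of the inputs to Lambda pricing.

```python
import re

REPORT_PATTERN = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>\d+) ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<max_memory_mb>\d+) MB"
)


def parse_report(line):
    """Extract the usage fields from a REPORT line, or return None."""
    match = REPORT_PATTERN.search(line)
    if match is None:
        return None
    return {
        "request_id": match.group("request_id"),
        "duration_ms": float(match.group("duration_ms")),
        "billed_ms": int(match.group("billed_ms")),
        "memory_mb": int(match.group("memory_mb")),
        "max_memory_mb": int(match.group("max_memory_mb")),
    }


# Made-up REPORT line; the fields are tab-separated in real log events.
sample = (
    "REPORT RequestId: 8f5a1c2e-0000-4000-8000-000000000000\t"
    "Duration: 102.25 ms\tBilled Duration: 103 ms\t"
    "Memory Size: 128 MB\tMax Memory Used: 70 MB"
)

usage = parse_report(sample)
# Billed compute: billed seconds multiplied by configured memory in GB.
gb_seconds = usage["billed_ms"] / 1000 * usage["memory_mb"] / 1024
print(usage["billed_ms"], round(gb_seconds, 6))
```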
Awesome! You have reviewed CloudWatch log events that are associated with the Kinesis Data
Firehose delivery stream and Lambda function from your POC. The delivery stream and Lambda
function automatically ingested and transformed the web access logs that you created by browsing
the site. You can view the log events in CloudWatch to see the process happening and get
information about resource use.
Perfect! In this task, you created an OpenSearch index pattern by using the datetime field that is
included in the access log data.
22. Create a bucket for the operating system of the visitor's device.
Choose Add > Split slices.
For Sub aggregation, choose Terms.
For Field, choose os.
Keep the default values for Order by, Order, and Size.
Turn on Group other values in separate bucket.
Keep the default values for the remaining options.
23. Repeat the previous steps to add another bucket for the browser type.
24. Choose Update in the lower-right corner.
26. Apply the donut style to the pie chart and configure other chart settings.
In the pane on the right side of the page, choose Options.
Turn off the Donut and Show top level only options as shown in the following
image.
Turn on Show labels.
To apply the changes, choose Update in the lower-right corner.
The data labels are added to the visualization. Your visualization should look
similar to the following image.
Note: The colors on your visualization might not match the colors in the image.
Analysis: This pie chart is similar to the previous one but includes the filter for a specific browser.
Congratulations! In this task, you successfully created a stacked pie chart to illustrate which
operating systems and browsers the website visitors were using.
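The Split slices buckets that you added use Terms aggregations: each one groups documents by the distinct values of a field and counts them, and a sub-aggregation repeats the grouping inside each bucket. The following sketch imitates that behavior in plain Python on a few made-up documents. It is meant to explain the chart, not to reproduce how OpenSearch computes aggregations.

```python
from collections import Counter, defaultdict

# Made-up documents with the two fields used for the pie chart slices.
docs = [
    {"os": "Windows", "browser": "Chrome"},
    {"os": "Windows", "browser": "Firefox"},
    {"os": "Mac OS X", "browser": "Chrome"},
    {"os": "Windows", "browser": "Chrome"},
]


def nested_terms(records, outer, inner):
    """Count records per outer value, then per inner value within each."""
    buckets = defaultdict(Counter)
    for record in records:
        buckets[record[outer]][record[inner]] += 1
    return {key: dict(counts) for key, counts in buckets.items()}


slices = nested_terms(docs, "os", "browser")
for os_name, browsers in slices.items():
    # The outer ring size is the sum of the inner slices.
    print(os_name, sum(browsers.values()), browsers)
```

In the chart, each operating system corresponds to one slice, and the browsers within that operating system subdivide it.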
31. In the upper-right corner, change the duration to the last 1 hour, and then choose
Refresh.
The heat map shows the number of webpage views in addition to whether the referrals
were from the search page or the recommendations page.
Your heat map should look similar to the following, but it will be different based on the
pages that you visited when you generated the web access logs and the browsers that you
used.
Analysis: Based on this image of the visualization, visitors accessed the Echo and
FireStick product pages more often from the search page than the recommendations
page. The team could infer that the search page is more effective than the
recommendations page at directing users to the product pages.
Congratulations! In this task, you created a heat map to gain insights into whether more
customers were referred to product pages from the search page or the recommendations page.
Your POC to demonstrate how to use Kinesis Data Firehose and OpenSearch Service to analyze
streaming data from a website was successful.
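The numbers behind a heat map like this one form a grid: one row per product page, one column per referring page, and a view count in each cell. The following sketch builds such a grid from made-up records. The page names come from the lab's website, but the counts are invented and the field names are assumed to match the apache_logs mapping.

```python
from collections import Counter

# Made-up page views: (webpage, refering_page) pairs.
views = [
    ("echo.php", "search.php"),
    ("echo.php", "search.php"),
    ("echo.php", "recommendation.php"),
    ("firetvstick.php", "search.php"),
    ("kindle.php", "recommendation.php"),
]


def build_grid(pairs):
    """Return sorted row labels, column labels, and a grid of counts."""
    counts = Counter(pairs)
    rows = sorted({page for page, _ in pairs})
    cols = sorted({ref for _, ref in pairs})
    grid = [[counts[(page, ref)] for ref in cols] for page in rows]
    return rows, cols, grid


rows, cols, grid = build_grid(views)
print("page".ljust(18), *[c.ljust(20) for c in cols])
for page, row in zip(rows, grid):
    print(page.ljust(18), *[str(n).ljust(20) for n in row])
```

A darker cell in the heat map corresponds to a larger count in this grid.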
In this lab, you configured OpenSearch Service to use an index for web access log data. You
observed how Kinesis Data Firehose ingested that data and then Lambda transformed it quickly.
You also created visualizations in OpenSearch Dashboards to analyze your data and generate
insights.
Lab complete
Congratulations! You have completed the lab.
35. At the top of this page, choose End Lab, and then choose Yes to confirm that you want
to end the lab.
© 2022, Amazon Web Services, Inc. and its affiliates. All rights reserved. This work may not be
reproduced or redistributed, in whole or in part, without prior written permission from Amazon
Web Services, Inc. Commercial copying, lending, or selling is prohibited.