AWS Projects - Build a Log Analytics Solution on AWS
AWS
September 2021
Notices
Customers are responsible for making their own independent assessment of the information in
this document. This document: (a) is for informational purposes only, (b) represents current
AWS product offerings and practices, which are subject to change without notice, and (c) does
not create any commitments or assurances from AWS and its affiliates, suppliers or licensors.
AWS products or services are provided “as is” without warranties, representations, or
conditions of any kind, whether express or implied. The responsibilities and liabilities of AWS to
its customers are controlled by AWS agreements, and this document is not part of, nor does it
modify, any agreement between AWS and its customers.
© 2021 Amazon Web Services, Inc. or its affiliates. All rights reserved.
Contents
Introduction
Architecture
Estimate Your Costs
Services Used and Costs
Tutorial
Step 1: Set Up Prerequisites
Step 2: Create an Amazon Kinesis Data Firehose Delivery Stream
Step 3: Install and Configure the Amazon Kinesis Agent on the EC2 Instance
Step 4: Create an OpenSearch Domain in Amazon OpenSearch Service (successor to Amazon Elasticsearch Service)
Step 5: Create a Second Amazon Kinesis Data Firehose Delivery Stream
Step 6: Create an Amazon Kinesis Data Analytics Application
Step 7: View the Aggregated Streaming Data
Step 8: Clean Up
Additional Resources
About this Guide
Log analytics is a common big data use case that allows you to analyze log data from websites,
mobile devices, servers, sensors, and more for a wide variety of applications such as digital
marketing, application monitoring, fraud detection, ad tech, games, and IoT. In this project, you
use Amazon Web Services to build an end-to-end log analytics solution that collects, ingests,
processes, and loads both batch data and streaming data, and makes the processed data
available to your users in analytics systems they are already using and in near real-time. The
solution is highly reliable, cost-effective, scales automatically to varying data volumes, and
requires almost no IT administration.
Introduction
Amazon Kinesis Data Analytics is the easiest way to process streaming data in real time with
standard SQL without having to learn new programming languages or processing frameworks.
Amazon Kinesis Data Analytics enables you to create and run SQL queries on streaming data so
that you can gain actionable insights and respond to your business and customer needs
promptly.
This tutorial walks you through the process of ingesting streaming log data, aggregating that
data, and persisting the aggregated data so that it can be analyzed and visualized. You create a
complete end-to-end system that integrates several AWS services. You analyze a live stream of
Apache access log data and aggregate the total number of requests for each HTTP response type every
minute. To visualize this data in near real-time, you use a user interface (UI) tool that charts the
results.
Architecture
One of the major benefits of using Amazon Kinesis Data Analytics is that an entire analysis
infrastructure can be created with a serverless architecture. The system created in this tutorial
implements Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, and Amazon
OpenSearch Service (successor to Amazon Elasticsearch Service). Each of these services is
designed for seamless integration with one another. The architecture is depicted below.
In this architecture example, the web server is an Amazon Elastic Compute Cloud (Amazon EC2)
instance. You install the Amazon Kinesis Agent on this Linux instance.
1. The Kinesis Agent continuously forwards log records to an Amazon Kinesis Data
Firehose delivery stream.
2. Amazon Kinesis Data Firehose writes each log record to Amazon Simple Storage
Service (Amazon S3) for durable storage of the raw log data. Amazon Kinesis Data
Analytics continuously runs a SQL statement against the streaming input data.
3. Amazon Kinesis Data Analytics creates an aggregated data set every minute and outputs
that data to a second Kinesis Data Firehose delivery stream.
4. This Firehose delivery stream writes the aggregated data to an OpenSearch domain
hosted in Amazon OpenSearch Service.
5. You create a view of the streaming data using OpenSearch Dashboards to visualize the
output of your system.
Amazon EC2
Description: Amazon EC2 provides the virtual application servers, known as instances, to run
your web application on the platform you choose. Amazon EC2 allows you to configure and
scale your compute capacity easily to meet changing requirements and demand. It is integrated
into Amazon’s computing environment, allowing you to leverage the AWS suite of services.
How Pricing Works: Amazon EC2 pricing is based on four components: the instance type you
choose (EC2 comes in 40+ types of instances with options optimized for compute, memory,
storage, and more), the AWS Region your instances are based in, the software you run, and the
pricing model you select (on-demand instances, reserved capacity, spot, etc.). For more
information, see Amazon EC2 pricing.
Example: Assume your log files reside on a single Linux t2.micro EC2 instance in the US East
region. With an on-demand pricing model, the monthly charge for your virtual machine is
$8.35. For this tutorial, assuming that the log-generating instance runs for 1 hour, your EC2 cost
is estimated to be $0.0116 [= ($8.35 per month / 30 days per month / 24 hours per day) * 1
hour].
For this tutorial, assume that the system is only ingesting data for 1 hour. The cost specifically
for this tutorial would be approximately $0.25 [= ($179.21 per month / 30 days per month / 24
hours per day) * 1 hour].
The second Kinesis Data Firehose delivery stream is receiving records at a much less frequent
rate. Because the Amazon Kinesis Data Analytics application is outputting only a few rows of
data every minute, the cost for that delivery stream is correspondingly smaller. Assuming only
five records per minute are ingested, and each record is less than 5 KB, the cost for the delivery
stream is $0.00005 for the 1-hour duration assumed for this tutorial.
Amazon S3
Description: Amazon S3 provides secure, durable, and highly-scalable cloud storage for the
objects that make up your application. Examples of objects you can store include source code,
logs, images, videos, and other artifacts that are created when you deploy your application.
Amazon S3 makes it easy to use object storage with a simple web interface to store and
retrieve your files from anywhere on the web, meaning that your website will be reliably
available to your visitors.
How Pricing Works: Amazon S3 pricing is based on five components: the type of Amazon S3
storage you use, the AWS Region where you store your content (e.g., US East vs. Asia Pacific -
Sydney), the amount of data you store, the number of requests you or your users make to store
new content or retrieve the content, and the amount of data that is transferred from Amazon
S3 to you or your users. For more information, see Amazon S3 Pricing.
Example: Using Standard Storage in the US East Region, if you store 5 GB of content, you pay
$0.115 per month. If you created your account in the past 12 months, and you are eligible for
the AWS Free Tier, you pay $0.00 per month. For this tutorial, assume that the producer creates
5 GB of data. Over a 1-hour period, the total cost for storing the records in Amazon S3 is
$0.00016 [= ($0.115 per month / 30 days per month / 24 hours per day) * 1 hour].
How Pricing Works: With Amazon Kinesis Data Analytics, you pay only for what you use. You
are charged an hourly rate based on the average number of Kinesis Processing Units (KPUs)
used to run your stream processing application.
A single KPU is a unit of stream processing capacity comprised of 4 GB memory, 1 vCPU
compute, and corresponding networking capabilities. As the complexity of your queries varies,
and the demands on memory and compute vary in response, Amazon Kinesis Data Analytics
automatically and elastically scales the number of KPUs required to complete your analysis.
There are no resources to provision and no upfront costs or minimum fees associated with
Amazon Kinesis Data Analytics. For more information, see Amazon Kinesis Data Analytics Pricing.
Example: This example assumes that the system is running for 1 hour in the US East Region. The
SQL query in this tutorial is basic and does not consume more than one KPU. Given that the
price for Amazon Kinesis Data Analytics in US East is $0.11 per KPU-hour, and the tutorial runs
for 1 hour, the total cost for the usage of Amazon Kinesis Data Analytics is $0.11.
Tutorial
You can evaluate the simplicity and effectiveness of Amazon Kinesis Data Analytics with this
tutorial, which walks you through a simple Amazon Kinesis Data Analytics application.
You perform the following steps in this tutorial:
Step 1: Set Up Prerequisites
Step 2: Create an Amazon Kinesis Data Firehose Delivery Stream
Step 3: Install and Configure the Amazon Kinesis Agent on the EC2 Instance
Step 4: Create an OpenSearch Domain in Amazon OpenSearch Service
Step 5: Create a Second Amazon Kinesis Data Firehose Delivery Stream
Step 6: Create an Amazon Kinesis Data Analytics Application
Step 7: View the Aggregated Streaming Data
Step 8: Clean Up
This tutorial is not meant for production environments and does not discuss options in depth.
After you complete the steps, you can find more in-depth information to create your own
Amazon Kinesis Data Analytics application in the Additional Resources section.
You want to ensure that your EC2 instance has an AWS Identity and Access
Management (IAM) role configured with permission to write to Amazon Kinesis Data
Firehose and Amazon CloudWatch. For more information, see IAM Roles for Amazon
EC2.
a. Choose Create role.
b. For trusted entity, choose AWS service.
c. For the use case, choose EC2.
7. Clear the search bar and type CloudWatchFull. Select the check box for
CloudWatchFullAccess.
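If you prefer to script the role setup, a hypothetical AWS CLI equivalent of the console steps might look like the following (the role name is illustrative, and the tutorial itself uses the console; creating the instance profile needed to attach the role to an instance is omitted for brevity):

# Create an EC2-assumable role (role name is a placeholder)
aws iam create-role \
    --role-name web-log-ec2-role \
    --assume-role-policy-document '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Principal": {"Service": "ec2.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'
# Grant the CloudWatch and Kinesis Data Firehose write access described above
aws iam attach-role-policy --role-name web-log-ec2-role \
    --policy-arn arn:aws:iam::aws:policy/CloudWatchFullAccess
aws iam attach-role-policy --role-name web-log-ec2-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonKinesisFirehoseFullAccess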
#!/bin/bash
# Install the Fake Apache Log Generator and its Python dependencies,
# then stage the script in /tmp/logs. Paths assume this runs as EC2
# user data, where the working directory is /.
sudo yum update -y
sudo yum install git -y
sudo easy_install pip
sudo pip install pytz
sudo pip install numpy
sudo pip install faker
sudo pip install tzlocal
git clone https://github.com/kiritbasu/Fake-Apache-Log-Generator.git
mkdir /tmp/logs
cp /Fake-Apache-Log-Generator/apache-fake-log-gen.py /tmp/logs/
Once you connect to the EC2 instance, move to the /tmp/logs directory and run the following
line of code to start the Fake Apache Log Generator program. Run this line of code multiple
times to create multiple log files within the /tmp/logs directory.
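The command itself is not reproduced in this extract. A plausible invocation, based on the flags documented in the Fake-Apache-Log-Generator README (-n sets the number of log lines, -o LOG writes a timestamped access_log file in the current directory), is:

# Generate 100 fake Apache access log entries in /tmp/logs
sudo python /tmp/logs/apache-fake-log-gen.py -n 100 -o LOG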
Take note of the path to the log file. You need this information later in this tutorial.
In this step, you create an Amazon Kinesis Data Firehose delivery stream to save each log entry
in Amazon S3 and to provide the log data to the Amazon Kinesis Data Analytics application that
you create later in this tutorial.
c. In the Create S3 bucket window, for S3 bucket name, specify a unique name. You do
not need to use the name elsewhere in this tutorial. However, Amazon S3 bucket
names are required to be globally unique.
d. For Region, choose US East (N. Virginia).
e. Choose Create S3 Bucket.
19. Choose Next.
20. On the Configure settings screen, scroll down to Permissions, and for IAM role,
choose Create or update IAM role.
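For reference, a delivery stream with an Amazon S3 destination can also be created from the AWS CLI. The following is a sketch only; the role ARN and bucket ARN are placeholders for resources you must already have created:

aws firehose create-delivery-stream \
    --delivery-stream-name web-log-ingestion-stream \
    --delivery-stream-type DirectPut \
    --s3-destination-configuration RoleARN=arn:aws:iam::123456789012:role/firehose-delivery-role,BucketARN=arn:aws:s3:::your-unique-bucket-name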
The Amazon Kinesis Agent can preprocess records from monitored files before sending them to
your delivery stream. It has native support for Apache access log files, which you created in
Step 1. When configured, the agent parses log files in the Apache Common Log format and
converts each line in the file to JSON format before sending the records to your Kinesis Data
Firehose delivery stream, which you created in Step 2.
1. To install the agent, connect to the EC2 instance over SSH, then copy and paste the
following command. For more information, see Download and Install the Agent.
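The command is not reproduced in this extract; per the linked agent documentation, the agent is installed on Amazon Linux with:

sudo yum install -y aws-kinesis-agent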
2. For detailed instructions on how to configure the agent to process and send log data to
your Amazon Kinesis Data Firehose delivery stream, see Configure and Start the Agent.
To configure the agent for this tutorial, modify the configuration file located at
/etc/aws-kinesis/agent.json using the following template.
o Replace the filePattern value with the full path to your log files, including a
wildcard if you have multiple log files with the same naming convention. For
example, it might look similar to: “/tmp/logs/access_log*”. The value will be
different, depending on your use case.
o Replace name-of-delivery-stream with the name of the Kinesis Data
Firehose delivery stream you created in Step 2.
o The firehose.endpoint is firehose.us-east-1.amazonaws.com
(default).
"firehose.endpoint": "firehose.us-east-1.amazonaws.com",
"flows": [
{
"filePattern": "/tmp/logs/access_log*",
"deliveryStream": "name-of-delivery-stream",
"dataProcessingOptions": [
{
"optionName": "LOGTOJSON",
"LogFormat": "COMMONAPACHELOG"
}]
}
]
Once started, the agent looks for files in the configured location and sends the records to the
Kinesis Data Firehose delivery stream.
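Per the agent documentation, you start the agent (and optionally configure it to start on system boot) with:

sudo service aws-kinesis-agent start
# Optional: start the agent automatically at boot
sudo chkconfig aws-kinesis-agent on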
Note: The following example shows a restrictive access policy where the domain
is restricted to only allow traffic from a specific IP address.
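The policy itself is not reproduced in this extract. An IP-restricted domain access policy generally takes the following form; the account ID and CIDR range below are placeholders, and web-log-summary is the domain name used in this tutorial:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:123456789012:domain/web-log-summary/*",
      "Condition": {
        "IpAddress": { "aws:SourceIp": "192.0.2.0/24" }
      }
    }
  ]
}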
8. Choose Next.
9. Review the details for your Amazon Kinesis Data Firehose delivery stream and choose
Create Delivery Stream.
10. Add permissions for your Kinesis Data Firehose delivery stream to access your
OpenSearch cluster:
a. Select the newly created web-log-aggregated-data stream and choose the IAM role,
under the Configuration tab.
b. In the IAM window that opens, choose Add inline policy (on the right).
c. On the Create policy page, choose a service and search for OpenSearch in the
search box.
d. Under Actions, select the check box for All OpenSearch Service.
e. Expand the Resources tab, select the Specific radio button and check Any in this
account. Choose Review policy.
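The console selections above correspond to an inline policy along these lines (a sketch; the account ID is a placeholder, and es: is the IAM action prefix that Amazon OpenSearch Service retains from Amazon Elasticsearch Service):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "es:*",
      "Resource": "arn:aws:es:us-east-1:123456789012:domain/*"
    }
  ]
}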
6. Under Source, choose Kinesis Firehose delivery stream and select the web-log-ingestion-
stream that you created in Step 2.
7. Scroll down to the Schema section and choose Discover schema. Amazon Kinesis Data
Analytics analyzes the source data in your Kinesis Data Firehose delivery stream and
creates a formatted sample of the input data for your review:
After approximately 60 to 90 seconds, the Source data section presents you with a
sample of source data that is flowing into your source delivery stream.
11. In the SQL editor, enter the following SQL code:
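The SQL code itself is not reproduced in this extract. The following sketch shows a one-minute tumbling-window aggregation consistent with the goal described in the Introduction; it assumes the default in-application stream name SOURCE_SQL_STREAM_001 and the "response" column produced by the agent's LOGTOJSON processing:

-- Output stream read by the destination delivery stream
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    datetime    TIMESTAMP,
    status      INTEGER,
    statusCount INTEGER);

-- Count requests per HTTP response code over one-minute tumbling windows
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM
        STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '1' MINUTE),
        "response",
        COUNT(*)
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY
        "response",
        STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '1' MINUTE);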
14. Under Destination, choose Kinesis Firehose delivery stream and select the web-log-
aggregated-data stream that you created in Step 5.
15. For In-application stream, choose DESTINATION_SQL_STREAM.
16. Leave all other options set to their default values and choose Save changes.
2. In the Domain column, choose the OpenSearch domain called web-log-summary that
you created in Step 4.
3. On the Overview tab, click the link next to OpenSearch Dashboards.
4. Enter the username and password you created in Step 4.
Because this is the first time you are opening OpenSearch Dashboards in your
OpenSearch domain, you need to configure it. This configuration includes giving your
user permissions to access the index in OpenSearch and giving Kinesis Data Firehose
permission to write to the index.
5. In the Welcome screen, choose Explore on my own. Choose Global for tenant. In the
left toolbar, choose Security and then choose Role Mappings.
6. Verify that an Open Distro security role named all_access is listed. If it is not listed,
choose Create role and choose the role all_access. See the Amazon OpenSearch Service
fine-grained access control documentation for more information.
7. Modify the users associated with the all_access role:
a. Choose Manage mapping and under Users, choose admin. This is the user you used
to log into OpenSearch Dashboards.
b. Under Backend roles, enter the ARN of the IAM identity you are logged in with in
your AWS account. The ARN can be found on the IAM console page, under Users:
click into your user account and copy the User ARN.
c. Repeat this step for the Firehose delivery role you created earlier, found under
Roles in the IAM console, by choosing Add another backend role.
You should now have three entries listed for the all_access Role.
d. Choose Map.
8. Choose the OpenSearch Dashboards icon on the top left to return to the Home
dashboard.
9. Choose OpenSearch Dashboards, Visualize & analyze.
10. Choose Create index pattern. In the Index pattern field, type request_data*. This entry
uses the OpenSearch index name that you created in Step 5.
Step 8: Clean Up
After completing this tutorial, be sure to delete the AWS resources that you created so that you
no longer accrue charges.
3. Choose the instance, and then on the Actions menu, choose Instance State, then
Terminate.
4. Read the warning regarding instance termination, and choose Yes, Terminate.
If you used an existing EC2 instance with Apache access logs and you do not plan to stop or
terminate the instance, you should stop the Amazon Kinesis Agent so that no additional records
are sent to the Kinesis Data Firehose delivery stream. Stop the agent with the following
command:
sudo service aws-kinesis-agent stop
Note: At this point in the tutorial, you have terminated or stopped any services
that accrue charges while ingesting and processing data. Because the data
producer has been stopped, you will not incur additional charges for Amazon
Kinesis Data Firehose and Amazon Kinesis Data Analytics since data is not being
ingested or processed. You can safely leave them in place for later reference or
future development. However, if you wish to remove all resources created in this
tutorial, continue with the following steps.
Delete the Amazon Kinesis Data Analytics Application and the Amazon Kinesis
Data Firehose Delivery Streams
1. Navigate to the Amazon Kinesis console at https://console.aws.amazon.com/kinesis.
2. Choose Go to Analytics.
3. Locate and select the name of the Amazon Kinesis Data Analytics application called
web-log-aggregation-tutorial that you created in Step 6 to view its details.
4. Choose Application details.
5. On the Actions menu, choose Delete application.
6. To confirm the deletion, in the confirmation modal, choose Delete application.
7. Navigate to the Amazon Kinesis Data Firehose console at
https://console.aws.amazon.com/firehose.
8. Choose the Firehose delivery stream called web-log-ingestion-stream that you created
in Step 2.
9. On the Actions menu, choose Delete.
10. To confirm the deletion, enter the name of the delivery stream and choose Delete.
11. Repeat items 7 through 10 for the second delivery stream called web-log-aggregated-
data that you created in Step 5.
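If you prefer the AWS CLI, hypothetical equivalents of the console cleanup steps look like the following (delete-application requires the application's creation timestamp, which you can read from describe-application first):

# Look up the application's creation timestamp
aws kinesisanalytics describe-application \
    --application-name web-log-aggregation-tutorial
# Delete the application, then both delivery streams
aws kinesisanalytics delete-application \
    --application-name web-log-aggregation-tutorial \
    --create-timestamp <timestamp-from-describe-application>
aws firehose delete-delivery-stream \
    --delivery-stream-name web-log-ingestion-stream
aws firehose delete-delivery-stream \
    --delivery-stream-name web-log-aggregated-data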
Additional Resources
We recommend that you continue to learn more about the concepts introduced in this guide
with the following resources:
• For detailed information on Amazon Kinesis Analytics, see Amazon Kinesis Analytics:
How It Works.
• For information on how to develop your own Amazon Kinesis Analytics application,
with specific information about its SQL extensions, windowed queries, and joining
multiple streams, see Streaming SQL Concepts.
• For additional examples, see Example Amazon Kinesis Analytics Applications.