
Log Analytics with Amazon Kinesis and Amazon Elasticsearch Service
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
What to do with a terabyte of logs?
Log analytics architecture

data source → Amazon Kinesis Firehose → Amazon Elasticsearch Service → Kibana
Amazon Elasticsearch Service is a cost-effective managed service that makes it easy to deploy, manage, and scale open source Elasticsearch for log analytics, full-text search, and more.
Amazon Elasticsearch Service benefits

• Easy to use
• Scalable
• Highly available
• Open-source compatible
• Secure
• AWS integrated
Adobe Developer Platform (Adobe I/O)

Data sources → Amazon Kinesis Streams → Spark Streaming → Amazon Elasticsearch Service

PROBLEM
• Cost-effective monitoring for an XL amount of log data
• Over 200,000 API calls per second at peak – destinations, response times, bandwidth
• Integrate seamlessly with other components of the AWS ecosystem

SOLUTION
• Log data is routed with Amazon Kinesis to Amazon Elasticsearch Service, then displayed using Kibana

BENEFITS
• Management and operational simplicity
• Flexibility to try out different cluster configurations during dev and test
• The Adobe team can easily see traffic patterns and error rates, quickly identifying anomalies and potential challenges
McGraw Hill Education

PROBLEM
• Supporting a wide catalog across multiple services in multiple jurisdictions
• Over 100 million learning events each month
• Tests, quizzes, and learning modules begun / completed / abandoned

SOLUTION
• Search and analyze test results, student/teacher interaction, teacher effectiveness, and student progress
• Analytics of applications and infrastructure are now integrated to understand operations in real time

BENEFITS
• Confidence to scale throughout the school year – from 0 to 32 TB in 9 months
• Focus on their business, not their infrastructure
Get set up right
Amazon ES overview

[Architecture diagram: the Elasticsearch API, integrated with IAM, CloudTrail, CloudWatch, Elastic Load Balancing, and Amazon Route 53.]
Data pattern

• One index per day: logs_01.21.2017, logs_01.22.2017, ... logs_01.27.2017
• Each index has multiple shards (Shard 1, Shard 2, Shard 3)
• Each shard contains a set of documents
• Each document contains a set of fields and values (host, ident, auth, timestamp, etc.), all held in an Amazon ES cluster – see the sketch below
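
Below is a minimal sketch of writing one document under this daily-index pattern. The domain endpoint is a placeholder, and the plain unsigned HTTP call assumes an open access policy; signed requests are covered later.

import datetime
import json
import requests

ES_ENDPOINT = "https://search-mydomain.us-east-1.es.amazonaws.com"  # placeholder

def index_log_event(event):
    # One index per day, named like logs_01.21.2017 as above
    index_name = datetime.date.today().strftime("logs_%m.%d.%Y")
    resp = requests.post(f"{ES_ENDPOINT}/{index_name}/log",
                         data=json.dumps(event),
                         headers={"Content-Type": "application/json"})
    resp.raise_for_status()

index_log_event({"host": "199.72.81.55", "ident": "-", "auth": "-",
                 "timestamp": "2017-01-21T00:00:01Z", "verb": "GET"})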
Deployment of indices to a cluster

[Diagram: an Amazon ES cluster holding Index 1 (Shards 1–3) and Index 2 (Shards 1–3). Each shard has a primary and a replica, distributed across Instance 1, Instance 2, and Instance 3; one instance acts as master.]
How many instances?

The index size will be about the same as the corpus of source documents
• Double this if you are deploying an index replica

Size based on storage requirements
• Either local storage or up to 1.5 TB of EBS per instance
• Example: a 2 TB corpus will need 4 instances
– Assuming a replica and using EBS
– Or with i2.2xlarge nodes (1.6 TB ephemeral storage)

A worked sizing sketch follows.
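
As a rough sketch of the arithmetic above (the same formula appears under best practices later): double the corpus for a replica, then divide by per-node storage. Note that the raw division yields 3 nodes for the 2 TB example, so the slide's 4 instances presumably include headroom.

import math

def data_nodes_needed(corpus_tb, replicas=1, storage_per_node_tb=1.5):
    # Index size is roughly the corpus size; each replica adds another copy
    storage_needed_tb = corpus_tb * (1 + replicas)
    # Data nodes = storage needed / storage per node, rounded up
    return math.ceil(storage_needed_tb / storage_per_node_tb)

print(data_nodes_needed(2.0))                           # EBS-backed nodes: 3 minimum
print(data_nodes_needed(2.0, storage_per_node_tb=1.6))  # i2.2xlarge: 3 minimum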
Instance type recommendations

Instance   Workload
T2         Entry point. Dev and test.
M3, M4     Equal read and write volumes.
R3, R4     Read-heavy or workloads with high memory demands (e.g., aggregations).
C4         High concurrency/indexing workloads.
I2         Up to 1.6 TB of SSD instance storage.
Cluster with no dedicated masters

[Diagram: an Amazon ES cluster of three data nodes (Instance 1, Instance 2, Instance 3) holding the shards; one of the data nodes also acts as master.]
Cluster with dedicated masters

[Diagram: dedicated master nodes alongside the data nodes (Instance 1, Instance 2, Instance 3); the data nodes serve queries and updates.]
Master node selection

• < 10 nodes: m3.medium, c4.large
• 11–20 nodes: m4.large, r4.large, m3.large, r3.large
• 21–40 nodes: c4.xlarge, m4.xlarge, r4.xlarge, m3.xlarge
Cluster with zone awareness

[Diagram: four instances split across Availability Zone 1 and Availability Zone 2, with each shard's primary and replica placed in different zones.]
Small use cases

• Logstash co-located on the application instance
• SigV4 signing via the provided output plugin (see the sketch below)
• Up to 200 GB of data
• m3.medium + 100 GB EBS data nodes
• 3x m3.medium master nodes
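
A minimal sketch of the SigV4 signing the output plugin performs, using botocore's signer directly; the region and endpoint are placeholders.

import botocore.session
import requests
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

REGION = "us-east-1"                                                # placeholder
URL = "https://search-mydomain.us-east-1.es.amazonaws.com/_search"  # placeholder

credentials = botocore.session.Session().get_credentials()
request = AWSRequest(method="GET", url=URL)
SigV4Auth(credentials, "es", REGION).add_auth(request)  # "es" is the signing service name

resp = requests.get(URL, headers=dict(request.headers))
print(resp.status_code)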
Large use cases

• Data flows from instances and applications via Lambda, from sources such as Amazon DynamoDB, Amazon S3, and Amazon CloudWatch; CloudWatch Logs ingestion is implicit
• SigV4 signing via Lambda execution roles
• Up to 5 TB of data
• r3.2xlarge + 512 GB EBS data nodes
• 3x m3.medium master nodes
XL use cases

• Ingest supported through high-volume technologies like Spark (on Amazon EMR) or Amazon Kinesis
• Up to 60 TB of data
• r3.8xlarge + 640 GB data nodes
• 3x m3.xlarge master nodes
Best practices

• Data nodes = storage needed / storage per node
• Use GP2 EBS volumes
• Use 3 dedicated master nodes for production deployments
• Enable Zone Awareness
• Set indices.fielddata.cache.size = 40
A domain configuration sketch covering these settings follows.
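
A hedged sketch of a domain that follows these practices, using the boto3 es client; the domain name, instance types, and counts are illustrative, not prescriptive.

import boto3

es = boto3.client("es")
es.create_elasticsearch_domain(
    DomainName="log-analytics",                            # hypothetical name
    ElasticsearchClusterConfig={
        "InstanceType": "r3.2xlarge.elasticsearch",
        "InstanceCount": 4,                                # storage needed / storage per node
        "DedicatedMasterEnabled": True,
        "DedicatedMasterType": "m3.medium.elasticsearch",
        "DedicatedMasterCount": 3,                         # 3 dedicated masters for production
        "ZoneAwarenessEnabled": True,                      # requires an even instance count
    },
    EBSOptions={"EBSEnabled": True, "VolumeType": "gp2", "VolumeSize": 512},
    AdvancedOptions={"indices.fielddata.cache.size": "40"},
)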
Amazon Kinesis

Amazon Kinesis: Streaming Data Made Easy
Services make it easy to capture, deliver, and process streams on AWS:

• Amazon Kinesis Streams
• Amazon Kinesis Firehose
• Amazon Kinesis Analytics
Amazon Kinesis Streams

• Easy administration
• Build real-time applications with the framework of your choice
• Low cost
Amazon Kinesis Firehose

• Zero administration
• Direct-to-data store integration
• Seamless elasticity
Amazon Kinesis Analytics

• Interact with streaming data in real time using SQL
• Build fully managed and elastic stream-processing applications that process data for real-time visualizations and alarms
Amazon Kinesis - Firehose vs. Streams

Amazon Kinesis Streams is for use cases that require custom processing, per incoming record, with sub-1-second processing latency, and a choice of stream processing frameworks.

Amazon Kinesis Firehose is for use cases that require zero administration, the ability to use existing analytics tools based on Amazon S3, Amazon Redshift, and Amazon Elasticsearch, and a data latency of 60 seconds or higher.
Kinesis Firehose overview

• Delivery stream: the underlying AWS resource
• Destination: Amazon ES, Amazon Redshift, or Amazon S3
• Record: put records on the stream to deliver to the destination (see the sketch below)
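
A minimal sketch of putting a record on a delivery stream with boto3; the stream name and record contents are placeholders.

import json
import boto3

firehose = boto3.client("firehose")

log_line = {"host": "199.72.81.55", "verb": "GET", "timestamp": "2017-01-21T00:00:01Z"}
firehose.put_record(
    DeliveryStreamName="log-delivery-stream",              # hypothetical
    Record={"Data": (json.dumps(log_line) + "\n").encode("utf-8")},
)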
Kinesis Firehose Data Transformation

• Firehose buffers up to 3 MB of ingested data
• When the buffer is full, Firehose automatically invokes a Lambda function, passing an array of records to be processed (sketched after the example records below)
• The Lambda function processes and returns an array of transformed records, with a status for each record
• Transformed records are saved to the configured destination
[{" [{
"recordId": "1234", "recordId": "1234",
"data": "encoded-data" "result": "Ok"
}, "data": "encoded-data"
{ },
"recordId": "1235", {
"data": "encoded-data" "recordId": "1235",
} "result": "Dropped"
] "data": "encoded-data"
}
]
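
A sketch of a transformation function matching the contract above: Firehose hands the Lambda base64-encoded records, and each recordId must come back with a result of Ok, Dropped, or ProcessingFailed. The drop rule and the added field are purely illustrative.

import base64
import json

def handler(event, context):
    # Firehose invokes this with a batch; every recordId must be returned
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        if payload.get("verb") == "HEAD":                  # hypothetical filter rule
            output.append({"recordId": record["recordId"],
                           "result": "Dropped",
                           "data": record["data"]})
            continue
        payload["processed"] = True                        # illustrative enrichment
        data = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8")).decode("utf-8")
        output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
    return {"records": output}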
Kinesis Firehose delivery architecture with transformations

data source → Firehose delivery stream → data transformation function → transformed records → Amazon Elasticsearch Service

Source records, transformation failures, and delivery failures are written to an S3 bucket.
Kinesis Firehose features for ingest

• Serverless scale
• Error handling
• S3 backup
Best practices

• Use smaller buffer sizes to increase throughput, but be careful of concurrency
• Use index rotation based on sizing (see the sketch below)
• Default stream limits: 2,000 transactions/second, 5,000 records/second, and 5 MB/second
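
A hedged sketch of wiring buffering and index rotation into an Elasticsearch destination at stream creation time; the ARNs, names, and buffer values are illustrative.

import boto3

firehose = boto3.client("firehose")
firehose.create_delivery_stream(
    DeliveryStreamName="log-delivery-stream",              # hypothetical
    ElasticsearchDestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",     # placeholder
        "DomainARN": "arn:aws:es:us-east-1:123456789012:domain/logs",  # placeholder
        "IndexName": "logs",
        "TypeName": "log",
        "IndexRotationPeriod": "OneDay",                   # one index per day, as earlier
        "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 1},   # small buffer
        "S3BackupMode": "FailedDocumentsOnly",
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",
            "BucketARN": "arn:aws:s3:::my-backup-bucket",              # placeholder
        },
    },
)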
Log analysis with aggregations

Amazon ES aggregations

• Buckets – a collection of documents meeting some criterion
• Metrics – calculations on the content of buckets
Metric: count
Bucket: time

Example: host:199.72.81.55 with <histogram of verb>

Field data for 199.72.81.55 (time, verb):
1 GET; 4 GET; 8 POST; 12 GET; 30 PUT; 42 GET; 58 GET; 100 POST; ...

Resulting buckets and counts: GET 5, POST 2, PUT 1

Look up → Field data → Buckets → Counts
A more complicated aggregation

• Bucket: ARN
• Bucket: Region
• Bucket: eventName
• Metric: Count
A query sketch for this nesting follows.
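
A sketch of that nesting as an Elasticsearch query body sent over HTTP; the index name and the CloudTrail-style field names are assumptions.

import json
import requests

ES_ENDPOINT = "https://search-mydomain.us-east-1.es.amazonaws.com"  # placeholder

query = {
    "size": 0,                                             # aggregations only, no hits
    "aggs": {
        "by_arn": {
            "terms": {"field": "userIdentity.arn"},        # assumed field name
            "aggs": {
                "by_region": {
                    "terms": {"field": "awsRegion"},       # assumed field name
                    "aggs": {
                        "by_event": {"terms": {"field": "eventName"}}
                    },
                }
            },
        }
    },
}
resp = requests.post(f"{ES_ENDPOINT}/logs_01.21.2017/_search",
                     data=json.dumps(query),
                     headers={"Content-Type": "application/json"})
# Each bucket carries a doc_count, which is the Metric: Count
print(json.dumps(resp.json()["aggregations"], indent=2))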
Best practices

• Make sure that your fields are not_analyzed (a mapping sketch follows this list)
• Visualizations are based on buckets/metrics
• Use a histogram on the x-axis first, then sub-aggregate
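
A sketch of a mapping that keeps string fields not_analyzed so terms aggregations bucket on whole values (Elasticsearch 2.x mapping syntax, current for this deck); the index, type, and field names are illustrative.

import json
import requests

ES_ENDPOINT = "https://search-mydomain.us-east-1.es.amazonaws.com"  # placeholder

mapping = {
    "mappings": {
        "log": {                                           # illustrative type name
            "properties": {
                "host": {"type": "string", "index": "not_analyzed"},
                "verb": {"type": "string", "index": "not_analyzed"},
                "timestamp": {"type": "date"},
            }
        }
    }
}
# Create the next daily index with the mapping in place before ingesting
resp = requests.put(f"{ES_ENDPOINT}/logs_01.28.2017",
                    data=json.dumps(mapping),
                    headers={"Content-Type": "application/json"})
print(resp.json())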


Run Elasticsearch in the AWS cloud with Amazon Elasticsearch Service
Use Kinesis Firehose to ingest data simply
Use Kibana for monitoring and Elasticsearch queries for deeper analysis
What to do next

Qwiklab:
https://qwiklabs.com/searches/lab?keywords=introduction%20to%20amazon%20elasticsearch%20service

Centralized logging solution:
https://aws.amazon.com/answers/logging/centralized-logging/

Our overview page on AWS:
https://aws.amazon.com/elasticsearch-service/
Q&A

Thank you for joining!
