AWS - Kinesis Quizlet
What can I do with Amazon Kinesis Data Streams?
Accelerated log and data feed intake: Instead of waiting to batch up the data, you
can have your data producers push data to an Amazon Kinesis data stream as soon
as the data is produced, preventing data loss in case of data producer failures. For
example, system and application logs can be continuously added to a data stream
and be available for processing within seconds.
Real-time metrics and reporting: You can extract metrics and generate reports from
Amazon Kinesis data stream data in real-time. For example, your Amazon Kinesis
Application can work on metrics and reporting for system and application logs as the
data is streaming in, rather than wait to receive data batches.
Real-time data analytics: With Amazon Kinesis Data Streams, you can run real-time
streaming data analytics. For example, you can add clickstreams to your Amazon
Kinesis data stream and have your Amazon Kinesis Application run analytics in real-
time, enabling you to gain insights out of your data at a scale of minutes instead of
hours or days.
Complex stream processing: You can create Directed Acyclic Graphs (DAGs) of
Amazon Kinesis Applications and data streams. In this scenario, one or more
Amazon Kinesis Applications can add data to another Amazon Kinesis data stream
for further processing, enabling successive stages of stream processing.
How do I use Amazon Kinesis Data Streams?
Creating an Amazon Kinesis data stream through either the AWS Management Console
or the CreateStream operation.
Configuring your data producers to continuously add data to your data stream.
Building your Amazon Kinesis Applications to read and process data from your data
stream, using either the Amazon Kinesis API or the Amazon Kinesis Client Library (KCL).
A minimal sketch of these three steps follows below.
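For illustration, here is a minimal boto3 (Python) sketch of those three steps; the stream name, region, payload, and shard count are assumptions, not part of the original notes:

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Step 1: create a data stream (CreateStream) and wait until it is ACTIVE
kinesis.create_stream(StreamName="example-stream", ShardCount=2)
kinesis.get_waiter("stream_exists").wait(StreamName="example-stream")

# Step 2: a data producer adds a record (PutRecord)
kinesis.put_record(
    StreamName="example-stream",
    Data=b'{"event": "login"}',
    PartitionKey="user-123",
)

# Step 3: a consumer reads records from one shard (GetShardIterator + GetRecords)
shard_id = kinesis.describe_stream(StreamName="example-stream")["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName="example-stream", ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
)["ShardIterator"]
records = kinesis.get_records(ShardIterator=iterator)["Records"]

In practice a consumer would typically use the Kinesis Client Library (KCL) rather than raw GetRecords calls, as described later in these notes.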
What are the limits of Amazon Kinesis Data Streams?
By default, Records of a stream are accessible for up to 24 hours from the time they
are added to the stream. You can raise this limit to up to 7 days by enabling
extended data retention.
The maximum size of a data blob (the data payload before Base64-encoding) within
one record is 1 megabyte (MB).
Each shard can support up to 1000 PUT records per second.
How does Amazon Kinesis Data Streams differ from Amazon SQS?
Amazon Kinesis Data Streams enables real-time processing of streaming big data. It
provides ordering of records, as well as the ability to read and/or replay records in
the same order to multiple Amazon Kinesis Applications. The Amazon Kinesis Client
Library (KCL) delivers all records for a given partition key to the same record
processor, making it easier to build multiple applications reading from the same
Amazon Kinesis data stream (for example, to perform counting, aggregation, and
filtering).
Amazon Simple Queue Service (Amazon SQS) offers a reliable, highly scalable
hosted queue for storing messages as they travel between computers. Amazon SQS
lets you easily move data between distributed application components and helps you
build applications in which messages are processed independently (with message-
level ack/fail semantics), such as automated workflows.
When should I use Amazon Kinesis Data Streams, and when should I use Amazon
SQS?
Use Amazon Kinesis Data Streams for use cases with requirements such as the following:
Routing related records to the same record processor (as in streaming MapReduce).
For example, counting and aggregation are simpler when all records for a given key
are routed to the same record processor.
Ordering of records. For example, you want to transfer log data from the application
host to the processing/archival host while maintaining the order of log statements.
Ability for multiple applications to consume the same stream concurrently. For
example, you have one application that updates a real-time dashboard and another
that archives data to Amazon Redshift. You want both applications to consume data
from the same stream concurrently and independently.
Ability to consume records in the same order a few hours later. For example, you
have a billing application and an audit application that runs a few hours behind the
billing application. Because Amazon Kinesis Data Streams stores data for up to 7
days, you can run the audit application up to 7 days behind the billing application.
What is a shard?
A shard is the base throughput unit of an Amazon Kinesis data stream. One shard
provides a capacity of 1MB/sec data input and 2MB/sec data output. One shard can
support up to 1000 PUT records per second. You will specify the number of shards
needed when you create a data stream. For example, you can create a data stream
with two shards. This data stream has a throughput of 2MB/sec data input and
4MB/sec data output, and allows up to 2000 PUT records per second. You can
monitor shard-level metrics in Amazon Kinesis Data Streams and add or remove
shards from your data stream dynamically as your data throughput changes by
resharding the data stream.
What is a record?
A record is the unit of data stored in an Amazon Kinesis data stream. A record is
composed of a sequence number, partition key, and data blob. The data blob is the data
of interest your data producer adds to a data stream. The maximum size of a data
blob (the data payload before Base64-encoding) is 1 megabyte (MB).
What is a partition key?
A partition key is used to segregate and route records to different shards of a data
stream. A partition key is specified by your data producer while adding data to an
Amazon Kinesis data stream. For example, assume you have a data stream with
two shards (shard 1 and shard 2). You can configure your data producer to use two
partition keys (key A and key B) so that all records with key A are added to shard 1
and all records with key B are added to shard 2.
What is a sequence number?
A sequence number is a unique identifier for each record. Sequence numbers are
assigned by Amazon Kinesis when a data producer calls the PutRecord or PutRecords
operation to add data to an Amazon Kinesis data stream. Sequence numbers for the
same partition key generally increase over time; the longer the time period between
PutRecord or PutRecords requests, the larger the sequence numbers become.
How do I decide the throughput of my Amazon Kinesis data stream?
The throughput of an Amazon Kinesis data stream is determined by the number of
shards within the data stream.
Estimate the average size of the record written to the data stream in kilobytes (KB),
rounded up to the nearest 1 KB. (average_data_size_in_KB)
Estimate the number of records written to the data stream per second.
(number_of_records_per_second)
Decide the number of Amazon Kinesis Applications consuming data concurrently
and independently from the data stream. (number_of_consumers)
Calculate the incoming write bandwidth in KB (incoming_write_bandwidth_in_KB),
which is equal to the average_data_size_in_KB multiplied by the
number_of_records_per_second.
Calculate the outgoing read bandwidth in KB (outgoing_read_bandwidth_in_KB),
which is equal to the incoming_write_bandwidth_in_KB multiplied by the
number_of_consumers.
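As a worked example of this calculation (the input numbers below are assumed for illustration, and 1 MB is treated as 1000 KB to match the KB-based estimate):

import math

average_data_size_in_KB = 2            # rounded up to the nearest 1 KB
number_of_records_per_second = 500
number_of_consumers = 3

incoming_write_bandwidth_in_KB = average_data_size_in_KB * number_of_records_per_second  # 1000 KB/sec
outgoing_read_bandwidth_in_KB = incoming_write_bandwidth_in_KB * number_of_consumers     # 3000 KB/sec

# Each shard supports 1 MB/sec of data input and 2 MB/sec of data output
number_of_shards = math.ceil(max(
    incoming_write_bandwidth_in_KB / 1000,
    outgoing_read_bandwidth_in_KB / 2000,
))
print(number_of_shards)  # 2 shards for this example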
What is the minimum throughput I can request for my Amazon Kinesis data stream?
The throughput of an Amazon Kinesis data stream scales by unit of shard. A single
shard provides the smallest throughput of a data stream: 1MB/sec data input and
2MB/sec data output.
What is the maximum throughput I can request for my Amazon Kinesis data stream?
The throughput of an Amazon Kinesis data stream is designed to scale without
limits. By default, each account can provision 10 shards per region. You can use the
Amazon Kinesis Data Streams Limits form to request more than 10 shards within a
single region.
How can record size affect the throughput of my Amazon Kinesis data stream?
A shard provides 1MB/sec data input rate and supports up to 1000 PUT records per
sec. Therefore, if the record size is less than 1KB, the actual data input rate of a
shard will be less than 1MB/sec, limited by the maximum number of PUT records per
second.
How do I add data to my Amazon Kinesis data stream?
You can add data to an Amazon Kinesis data stream via PutRecord and PutRecords
operations, Amazon Kinesis Producer Library (KPL), or Amazon Kinesis Agent.
What is the difference between PutRecord and PutRecords?
The PutRecord operation allows a single data record within an API call, and the PutRecords
operation allows multiple data records within an API call. For more information about
PutRecord and PutRecords operations, see PutRecord and PutRecords.
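A hedged boto3 comparison of the two operations (the stream name and payloads are hypothetical):

import boto3

kinesis = boto3.client("kinesis")

# PutRecord: one data record per API call
kinesis.put_record(
    StreamName="example-stream",
    Data=b"single record payload",
    PartitionKey="sensor-1",
)

# PutRecords: multiple data records (up to 500) per API call
kinesis.put_records(
    StreamName="example-stream",
    Records=[
        {"Data": b"payload-1", "PartitionKey": "sensor-1"},
        {"Data": b"payload-2", "PartitionKey": "sensor-2"},
    ],
)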
What is Amazon Kinesis Producer Library (KPL)?
Amazon Kinesis Producer Library (KPL) is an easy-to-use and highly configurable
library that helps you put data into an Amazon Kinesis data stream. KPL presents a
simple, asynchronous, and reliable interface that enables you to quickly achieve high
producer throughput with minimal client resources.
What is Amazon Kinesis Agent?
Amazon Kinesis Agent is a pre-built Java application that offers an easy way to
collect and send data to your Amazon Kinesis data stream. You can install the agent
on Linux-based server environments such as web servers, log servers, and
database servers. The agent monitors certain files and continuously sends data to
your data stream.
How do I use Amazon Kinesis Agent?
After installing Amazon Kinesis Agent on your servers, you configure it to monitor
certain files on the disk and then continuously send new data to your Amazon
Kinesis data stream.
What happens if the capacity limits of an Amazon Kinesis data stream are exceeded
while the data producer adds data to the data stream?
The capacity limits of an Amazon Kinesis data stream are defined by the number of
shards within the data stream. The limits can be exceeded by either data throughput
or the number of PUT records. While the capacity limits are exceeded, the put data
call will be rejected with a ProvisionedThroughputExceeded exception. If this is due
to a temporary rise of the data stream's input data rate, retry by the data producer
will eventually lead to completion of the requests. If this is due to a sustained rise of
the data stream's input data rate, you should increase the number of shards within
your data stream to provide enough capacity for the put data calls to consistently
succeed. In both cases, Amazon CloudWatch metrics allow you to learn about the
change of the data stream's input data rate and the occurrence of
ProvisionedThroughputExceeded exceptions.
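A sketch of producer-side handling of this exception, assuming a simple exponential backoff (the retry parameters are arbitrary):

import time
import boto3
from botocore.exceptions import ClientError

kinesis = boto3.client("kinesis")

def put_with_retry(stream_name, data, partition_key, max_attempts=5):
    # Retry PutRecord when the shard capacity is temporarily exceeded
    for attempt in range(max_attempts):
        try:
            return kinesis.put_record(
                StreamName=stream_name, Data=data, PartitionKey=partition_key
            )
        except ClientError as err:
            if err.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise
            time.sleep(0.1 * (2 ** attempt))  # back off, then retry
    # A sustained rise in input rate needs more shards, not more retries
    raise RuntimeError("stream over capacity; consider resharding")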
What is enhanced fan-out?
Enhanced fan-out is an optional feature for Kinesis Data Streams consumers that
provides logical 2 MB/sec throughput pipes between consumers and shards. This
allows customers to scale the number of consumers reading from a data stream in
parallel, while maintaining high performance.
What is an Amazon Kinesis Application?
An Amazon Kinesis Application is a data consumer that reads and processes data
from an Amazon Kinesis data stream. You can build your applications using
Amazon Kinesis Data Analytics, the Amazon Kinesis API, or the Amazon Kinesis Client
Library (KCL).
What is Amazon Kinesis Client Library (KCL)?
Amazon Kinesis Client Library (KCL) for Java | Python | Ruby | Node.js | .NET is a
pre-built library that helps you easily build Amazon Kinesis Applications for reading
and processing data from an Amazon Kinesis data stream.
What is the SubscribeToShard API?
The SubscribeToShard API is a high performance streaming API that pushes data
from shards to consumers over a persistent connection without a request cycle from
the client. The SubscribeToShard API uses the HTTP/2 protocol to deliver data to
registered consumers whenever new data arrives on the shard, typically within
70ms, offering ~65% faster delivery compared to the GetRecords API. The
consumers will enjoy fast delivery even when multiple registered consumers are
reading from the same shard.
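A hedged boto3 sketch of enhanced fan-out with SubscribeToShard; the stream ARN, consumer name, and shard ID are placeholders:

import boto3

kinesis = boto3.client("kinesis")

# Register the consumer for enhanced fan-out (required before SubscribeToShard);
# the consumer must reach ACTIVE state before subscribing.
consumer = kinesis.register_stream_consumer(
    StreamARN="arn:aws:kinesis:us-east-1:123456789012:stream/example-stream",
    ConsumerName="dashboard-consumer",
)["Consumer"]

# SubscribeToShard pushes records over a persistent HTTP/2 connection
response = kinesis.subscribe_to_shard(
    ConsumerARN=consumer["ConsumerARN"],
    ShardId="shardId-000000000000",
    StartingPosition={"Type": "LATEST"},
)
for event in response["EventStream"]:
    for record in event["SubscribeToShardEvent"]["Records"]:
        pass  # process each pushed record here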
Can I use SubscribeToShard without using enhanced fan-out?
No, SubscribeToShard requires the use of enhanced fan-out, which means you also
need to register your consumer with the Kinesis Data Streams service before you
can use SubscribeToShard.
What is Amazon Kinesis Connector Library?
Amazon Kinesis Connector Library is a pre-built library that helps you easily
integrate Amazon Kinesis Data Streams with other AWS services and third-party
tools. Amazon Kinesis Client Library (KCL) for Java | Python | Ruby | Node.js | .NET
is required for using Amazon Kinesis Connector Library. The current version of this
library provides connectors to Amazon DynamoDB, Amazon Redshift, Amazon S3,
and Elasticsearch. The library also includes sample connectors of each type, plus
Apache Ant build files for running the samples.
What is Amazon Kinesis Storm Spout?
Amazon Kinesis Storm Spout is a pre-built library that helps you easily integrate
Amazon Kinesis Data Streams with Apache Storm. The current version of Amazon
Kinesis Storm Spout fetches data from an Amazon Kinesis data stream and emits it as
tuples. You add the spout to your Storm topology to leverage Amazon Kinesis
Data Streams as a reliable, scalable stream capture, storage, and replay service.
What is a worker and a record processor generated by Amazon Kinesis Client
Library (KCL)?
An Amazon Kinesis Application can have multiple application instances and a worker
is the processing unit that maps to each application instance. A record processor is
the processing unit that processes data from a shard of an Amazon Kinesis data
stream. One worker maps to one or more record processors. One record processor
maps to one shard and processes records from that shard.
How does Amazon Kinesis Client Library (KCL) keep track of data records being
processed by an Amazon Kinesis Application?
Amazon Kinesis Client Library (KCL) for Java | Python | Ruby | Node.js | .NET
automatically creates an Amazon DynamoDB table for each Amazon Kinesis
Application to track and maintain state information such as resharding events and
sequence number checkpoints. The DynamoDB table shares the same name as the
application, so you need to make sure your application name doesn't conflict
with any existing DynamoDB tables under the same account within the same region.
All workers associated with the same application name are assumed to be working
together on the same Amazon Kinesis data stream. If you run an additional instance
of the same application code, but with a different application name, KCL treats the
second instance as an entirely separate application also operating on the same data
stream.
How can I automatically scale up the processing capacity of my Amazon Kinesis
Application using Amazon Kinesis Client Library (KCL)?
You can create multiple instances of your Amazon Kinesis Application and have
these application instances run across a set of Amazon EC2 instances that are part
of an Auto Scaling group. As processing demand increases, new Amazon EC2
instances running your application instances are launched automatically. Amazon
Kinesis Client Library (KCL) for Java | Python | Ruby | Node.js | .NET will generate a
worker for this new instance and automatically move record processors from
overloaded existing instances to this new instance.
What is resharding?
Resharding is the process used to scale your data stream using a series of shard
splits or merges. In a shard split, a single shard is divided into two shards, which
increases the throughput of the data stream. In a shard merge, two shards are
merged into a single shard, which decreases the throughput of the data stream. For
more information, see Resharding a Data Stream in the Amazon Kinesis Data
Streams developer guide.
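For example, the two resharding operations expressed with boto3 (the stream name, shard IDs, and hash key are placeholders):

import boto3

kinesis = boto3.client("kinesis")

# Shard split: divide one shard into two to increase throughput
kinesis.split_shard(
    StreamName="example-stream",
    ShardToSplit="shardId-000000000000",
    NewStartingHashKey="170141183460469231731687303715884105728",  # midpoint of the 128-bit hash key range
)

# Shard merge: combine two adjacent shards to decrease throughput (and cost)
kinesis.merge_shards(
    StreamName="example-stream",
    ShardToMerge="shardId-000000000001",
    AdjacentShardToMerge="shardId-000000000002",
)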
Can I privately access Kinesis Data Streams APIs from my Amazon Virtual Private
Cloud (VPC) without using public IPs?
Yes, you can privately access Kinesis Data Streams APIs from your Amazon Virtual
Private Cloud (VPC) by creating VPC Endpoints. With VPC Endpoints, the routing
between the VPC and Kinesis Data Streams is handled by the AWS network without
the need for an Internet gateway, NAT gateway, or VPN connection. The latest
generation of VPC Endpoints used by Kinesis Data Streams are powered by AWS
PrivateLink, a technology that enables private connectivity between AWS services
using Elastic Network Interfaces (ENI) with private IPs in your VPCs. To learn more
about PrivateLink, visit the PrivateLink documentation.
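A sketch of creating such an interface VPC endpoint with boto3; the VPC, subnet, and security group IDs are placeholders, and the service name shown assumes the usual com.amazonaws.<region>.kinesis-streams form:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",          # PrivateLink-powered endpoint using ENIs with private IPs
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.kinesis-streams",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)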
How do I use Amazon Kinesis Data Firehose?
Create an Amazon Kinesis Data Firehose delivery stream through the Firehose
Console or the CreateDeliveryStream operation. You can optionally configure an
AWS Lambda function in your delivery stream to prepare and transform the raw data
before loading the data.
Configure your data producers to continuously send data to your delivery stream
using the Amazon Kinesis Agent or the Firehose API.
Firehose automatically and continuously loads your data to the destinations you
specify.
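A minimal boto3 sketch of these steps with an S3 destination; the bucket, IAM role, and buffering values are assumptions:

import boto3

firehose = boto3.client("firehose")

# Create the delivery stream (CreateDeliveryStream) with an S3 destination
firehose.create_delivery_stream(
    DeliveryStreamName="example-delivery-stream",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::example-bucket",
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
        "CompressionFormat": "GZIP",
    },
)

# A data producer sends a record to the delivery stream
firehose.put_record(
    DeliveryStreamName="example-delivery-stream",
    Record={"Data": b"log line\n"},
)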
What is a source?
A source is where your streaming data is continuously generated and captured. For
example, a source can be a logging server running on Amazon EC2 instances, an
application running on mobile devices, a sensor on an IoT device, or a Kinesis
stream.
What is a delivery stream?
A delivery stream is the underlying entity of Amazon Kinesis Data Firehose. You use
Firehose by creating a delivery stream and then sending data to it.
What is a record?
A record is the data of interest your data producer sends to a delivery stream. The
maximum size of a record (before Base64-encoding) is 1000 KB.
What is a destination?
A destination is the data store where your data will be delivered. Amazon Kinesis
Data Firehose currently supports Amazon S3, Amazon Redshift, Amazon
Elasticsearch Service, and Splunk as destinations.
How does compression work when I use the CloudWatch Logs subscription feature?
You can use the CloudWatch Logs subscription feature to stream data from CloudWatch
Logs to Kinesis Data Firehose. All log events from CloudWatch Logs are already
compressed in gzip format, so you should keep Firehose's compression
configuration set to uncompressed to avoid double-compression.
What kind of encryption can I use?
Amazon Kinesis Data Firehose allows you to encrypt your data after it's delivered to
your Amazon S3 bucket. While creating your delivery stream, you can choose to
encrypt your data with an AWS Key Management Service (KMS) key that you own.
What is data transformation with Lambda?
Firehose can invoke an AWS Lambda function to transform incoming data before
delivering it to destinations. You can configure a new Lambda function using one of
the Lambda blueprints we provide or choose an existing Lambda function.
What is source record backup?
If you use data transformation with Lambda, you can enable source record backup,
and Amazon Kinesis Data Firehose will deliver the un-transformed incoming data to
a separate S3 bucket. You can specify an extra prefix to be added in front of the
"YYYY/MM/DD/HH" UTC time prefix generated by Firehose.
What is error logging?
If you enable data transformation with Lambda, Firehose can log any Lambda
invocation and data delivery errors to Amazon CloudWatch Logs so that you can
view the specific error logs if Lambda invocation or data delivery fails.
What is buffer size and buffer interval?
Amazon Kinesis Data Firehose buffers incoming streaming data to a certain size or
for a certain period of time before delivering it to destinations. You can configure
buffer size and buffer interval while creating your delivery stream. Buffer size is in
MBs and ranges from 1MB to 128MB for Amazon S3 destination and 1MB to 100MB
for Amazon Elasticsearch Service destination. Buffer interval is in seconds and
ranges from 60 seconds to 900 seconds. Please note that in circumstances where
data delivery to destination is falling behind data writing to delivery stream, Firehose
raises buffer size dynamically to catch up and make sure that all data is delivered to
the destination.
How is buffer size applied if I choose to compress my data?
Buffer size is applied before compression. As a result, if you choose to compress
your data, the size of the objects within your Amazon S3 bucket can be smaller than
the buffer size you specify.
What privilege is required for the Amazon Redshift user that I need to specify while
creating a delivery stream?
The Amazon Redshift user needs to have Redshift INSERT privilege for copying
data from your Amazon S3 bucket to your Redshift cluster.
What do I need to do if my Amazon Redshift cluster is within a VPC?
If your Amazon Redshift cluster is within a VPC, you need to grant Amazon Kinesis
Data Firehose access to your Redshift cluster by unblocking Firehose IP addresses
from your VPC.
Why do I need to provide an Amazon S3 bucket while choosing Amazon Redshift as
destination?
For Amazon Redshift destination, Amazon Kinesis Data Firehose delivers data to
your Amazon S3 bucket first and then issues Redshift COPY command to load data
from your S3 bucket to your Redshift cluster.
Why do I need to provide an Amazon S3 bucket when choosing Amazon
Elasticsearch Service as destination?
When loading data into Amazon Elasticsearch Service, Amazon Kinesis Data
Firehose can back up all of the data or only the data that failed to deliver. To take
advantage of this feature and prevent any data loss, you need to provide a backup
Amazon S3 bucket.
Can I change the configurations of my delivery stream after it's created?
You can change the configuration of your delivery stream at any time after it's
created. You can do so by using the Firehose Console or the UpdateDestination
operation. Your delivery stream remains in ACTIVE state while your configurations
are updated and you can continue to send data to your delivery stream. The updated
configurations normally take effect within a few minutes.
How do I prepare and transform raw data in Amazon Kinesis Data Firehose?
Amazon Kinesis Data Firehose allows you to use an AWS Lambda function to
prepare and transform incoming raw data in your delivery stream before loading it to
destinations. You can configure an AWS Lambda function for data transformation
when you create a new delivery stream or when you edit an existing delivery stream.
How do I return prepared and transformed data from my AWS Lambda function back
to Amazon Kinesis Data Firehose?
All transformed records from Lambda must be returned to Firehose with the following
three parameters; otherwise, Firehose will reject the records and treat them as data
transformation failure.
- recordId
- result
- data
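As an illustration, a Lambda handler that returns records in that shape; the upper-casing transform is a placeholder, and the data field must remain base64-encoded:

import base64

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        transformed = payload.upper()  # placeholder transformation
        output.append({
            "recordId": record["recordId"],  # echo the incoming recordId unchanged
            "result": "Ok",                  # Ok | Dropped | ProcessingFailed
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}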
Can I keep a copy of all the raw data in my S3 bucket?
Yes, Firehose can back up all un-transformed records to your S3 bucket concurrently
while delivering transformed records to destination. Source record backup can be
enabled when you create or update your delivery stream.
How do I add data to my Amazon Kinesis Data Firehose delivery stream?
You can add data to an Amazon Kinesis Data Firehose delivery stream through
Amazon Kinesis Agent or Firehose's PutRecord and PutRecordBatch operations.
Kinesis Data Firehose is also integrated with other AWS data sources such as
Kinesis Data Streams, AWS IoT, Amazon CloudWatch Logs, and Amazon
CloudWatch Events.
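A short boto3 sketch of PutRecordBatch against a delivery stream (the stream name and payloads are hypothetical):

import boto3

firehose = boto3.client("firehose")

# PutRecordBatch: multiple records (up to 500) per call
response = firehose.put_record_batch(
    DeliveryStreamName="example-delivery-stream",
    Records=[
        {"Data": b'{"level": "INFO", "msg": "started"}\n'},
        {"Data": b'{"level": "WARN", "msg": "retrying"}\n'},
    ],
)
print(response["FailedPutCount"])  # non-zero means some records need to be resent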
How do I add data to my Firehose delivery stream from my Kinesis stream?
When you create or update your delivery stream through AWS console or Firehose
APIs, you can configure a Kinesis stream as the source of your delivery stream.
Once configured, Firehose will automatically read data from your Kinesis stream and
load the data to specified destinations.
How often does Kinesis Data Firehose read data from my Kinesis stream?
Kinesis Data Firehose calls Kinesis Data Streams GetRecords() once every second
for each Kinesis shard.
From where does Kinesis Data Firehose read data when my Kinesis stream is
configured as the source of my delivery stream?
Kinesis Data Firehose starts reading data from the LATEST position of your Kinesis
data stream when it's configured as the source of a delivery stream.
Can I configure my Kinesis data stream to be the source of multiple Firehose
delivery streams?
Yes, you can. However, note that the GetRecords() call from Kinesis Data Firehose
is counted against the overall throttling limit of your Kinesis shard, so you need to
plan your delivery stream along with your other Kinesis applications to make sure
you won't get throttled.
Can I still add data to delivery stream through Kinesis Agent or Firehose's PutRecord
and PutRecordBatch operations when my Kinesis data stream is configured as
source?
No, you cannot. When a Kinesis data stream is configured as the source of a
Firehose delivery stream, Firehose's PutRecord and PutRecordBatch operations will
be disabled. You should add data to your Kinesis data stream through the Kinesis
Data Streams PutRecord and PutRecords operations instead.
How do I add data to my delivery stream from AWS IoT?
You add data to your delivery stream from AWS IoT by creating an AWS IoT action
that sends events to your delivery stream. For more information
How do I add data to my delivery stream from CloudWatch Logs?
You add data to your Firehose delivery stream from CloudWatch Logs by creating a
CloudWatch Logs subscription filter that sends events to your delivery stream.
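A hedged boto3 sketch of that subscription filter; the log group, role, and delivery stream ARNs are placeholders:

import boto3

logs = boto3.client("logs")

logs.put_subscription_filter(
    logGroupName="/app/example",
    filterName="to-firehose",
    filterPattern="",  # an empty pattern forwards all log events
    destinationArn="arn:aws:firehose:us-east-1:123456789012:deliverystream/example-delivery-stream",
    roleArn="arn:aws:iam::123456789012:role/cwlogs-to-firehose-role",
)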
How do I add data to my Amazon Kinesis Data Firehose delivery stream from
CloudWatch Events?
You add data to your Firehose delivery stream from CloudWatch Events by creating
a CloudWatch Events rule with your delivery stream as target.
Resharding
Split shard - increases throughput (performance); merge shard - cost saving
What is record
Unit of data stored in stream
Record consists of
1. Partition key, 2. Sequence number, 3. Data Blob
Partition key
1. Groups data by shard 2. determines which shard a record belongs to 3. specified
by the application putting data into the stream
Sequence number
Unique number assigned to each record; identifies the record within the stream
Data Blob
Actual data, max size 1 MB
Stream retention period
1 day, extendable up to 7 days
How to add data into stream
1. Kinesis Streams API, 2. KPL, 3. Kinesis Agent
Kinesis stream API
AWS SDK for Java, PutRecord, PutRecords
KCL
KCL uses a DynamoDB table to track processed data. If a worker fails, KCL restarts
processing of the shard at the last known processed record. Use Auto Scaling to
handle scalability.
KCL_dynamoDB
KCL names the DynamoDB table after the stream application name; each row represents a
shard, and the hash key is the shard ID. The DynamoDB table is created with 10 RCUs and 10 WCUs.
KCL_dynamoDB_1
DynamoDB provisioned throughput exceptions happen when:
1. There are too many shards
2. The application checkpoints too frequently
Stream clients
S3, Redshift, DynamoDB, Elasticsearch, EMR, Lambda
Client Library Pipeline
Stream -> ITransformer -> IFilter -> IBuffer -> IEmitter ->
S3/Redshift/DynamoDB/Elasticsearch
KCL connectors available for
S3, DynamoDB, ElasticSearch, RedShift (not EMR)
Lambda can automatically read records from a stream, process them, and send them to
S3, Redshift, DynamoDB
Kinesis FireHose
Collect and stream data in near real time, Load data into S3, Redshift, ElasticSearch
Kinesis FireHose-1
Fully managed service. Scaling, sharding, monitoring done with zero administration.
Can be created using console, API
Kinesis FireHose-2
Buffer size 1MB-128MB (5MB default), buffer interval 60-900 sec
(300 sec default) - the stream delivers to the destination when whichever condition is
met first (size/time). FH will increase buffer size automatically if delivery falls behind.
Redshift transfer depends on how fast the COPY command completes
Kinesis FireHose-3- SDK loading
PutRecord, PutRecordBatch
Kinesis FireHose-2 -Lambda
Data transformation: create a new function or use existing blueprints - syslog -> CSV,
syslog -> JSON, Apache log -> CSV/JSON, Kinesis FH processing
Kinesis FireHose-2 - compression
uncompressed, gzip, zip, snappy
Kinesis FireHose-2 - Encrypt
KMS
Kinesis FireHose-2 - Lambda processing
FH requires from Lambda: 1) recordId, 2) result (Ok, Dropped, ProcessingFailed), 3)
data
Kinesis FireHose-2 - Failure Handling
Lambda invocations are retried 3 times; invocation errors go to CloudWatch Logs.
Unsuccessfully processed records are sent to S3 in a processing_failed folder. S3: FH
retries delivery for 24 hours. Redshift: retry duration 0-7200 seconds from S3; skipped
objects are recorded in a manifest file
Redshift
is a fast and powerful, fully managed, petabyte-scale data warehouse service in the
cloud.
VPC
A logically isolated network in the AWS Cloud.
EC2
is a web service that provides resizable compute capacity in the cloud
S3
is object storage with a simple web service interface to store and retrieve any
amount of data from anywhere on the web. It is designed to deliver 99.999999999%
durability, and scale past trillions of objects worldwide.
RDS
makes it easy to set up, operate, and scale a relational database in the cloud. It
provides cost-efficient and resizable capacity while managing time-consuming
database administration tasks, freeing you up to focus on your applications and
business. Amazon RDS provides you six familiar database engines to choose from,
including Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft
SQL Server.
Aurora
is a MySQL-compatible relational database engine that combines the speed and
availability of high-end commercial databases with the simplicity and cost-
effectiveness of open source databases. Amazon Aurora provides up to five times
better performance than MySQL with the security, availability, and reliability of a
commercial database at one tenth the cost.
You can now launch Amazon Aurora database instances with either MySQL or
PostgreSQL compatibility. PostgreSQL compatibility is now available in preview.
IAM
AWS Identity and Access Management (IAM) enables you to securely control access
to AWS services and resources for your users. Using IAM, you can create and
manage AWS users and groups, and use permissions to allow and deny their access
to AWS resources.
data pipeline
is a web service that helps you reliably process and move data between different
AWS compute and storage services, as well as on-premises data sources, at
specified intervals. With AWS Data Pipeline, you can regularly access your data
where it's stored, transform and process it at scale, and efficiently transfer the results
to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and
Amazon EMR.
AWS Data Pipeline helps you easily create complex data processing workloads that
are fault tolerant, repeatable, and highly available. You don't have to worry about
ensuring resource availability, managing inter-task dependencies, retrying transient
failures or timeouts in individual tasks, or creating a failure notification system. AWS
Data Pipeline also allows you to move and process data that was previously locked
up in on-premises data silos.
EMR
provides a managed Hadoop framework that makes it easy, fast, and cost-effective
to process vast amounts of data across dynamically scalable Amazon EC2
instances. You can also run other popular distributed frameworks such as Apache
Spark, HBase, Presto, and Flink in Amazon EMR, and interact with data in other
AWS data stores such as Amazon S3 and Amazon DynamoDB.
Amazon EMR securely and reliably handles a broad set of big data use cases,
including log analysis, web indexing, data transformations (ETL), machine learning,
financial analysis, scientific simulation, and bioinformatics.
DynamoDB
is a fast and flexible NoSQL database service for all applications that need
consistent, single-digit millisecond latency at any scale. It is a fully managed cloud
database and supports both document and key-value store models. Its flexible data
model and reliable performance make it a great fit for mobile, web, gaming, ad tech,
IoT, and many other applications.
Kinesis
is a platform for streaming data on AWS, offering powerful services to make it easy
to load and analyze streaming data, and also providing the ability for you to build
custom streaming data applications for specialized needs. Web applications, mobile
devices, wearables, industrial sensors, and many software applications and services
can generate staggering amounts of streaming data - sometimes TBs per hour - that
need to be collected, stored, and processed continuously. Amazon Kinesis services
enable you to do that simply and at a low cost.
Kinesis Firehose
Amazon Kinesis Firehose is the easiest way to load streaming data into AWS. It can
capture, transform, and load streaming data into Amazon Kinesis Analytics, Amazon
S3, Amazon Redshift, and Amazon Elasticsearch Service, enabling near real-time
analytics with existing business intelligence tools and dashboards you're already
using today.
Kinesis Streams
Amazon Kinesis Streams enables you to build custom applications that process or
analyze streaming data for specialized needs. Amazon Kinesis Streams can
continuously capture and store terabytes of data per hour from hundreds of
thousands of sources such as website clickstreams, financial transactions, social
media feeds, IT logs, and location-tracking events. With Amazon Kinesis Client
Library (KCL), you can build Amazon Kinesis Applications and use streaming data to
power real-time dashboards, generate alerts, implement dynamic pricing and
advertising, and more.
Kinesis Analytics
Amazon Kinesis Analytics is the easiest way to process streaming data in real time
with standard SQL without having to learn new programming languages or
processing frameworks. Amazon Kinesis Analytics enables you to create and run
SQL queries on streaming data so that you can gain actionable insights and respond
to your business and customer needs promptly.
Kinesis vs SQS
Amazon Kinesis is differentiated from Amazon's Simple Queue Service (SQS) in that
Kinesis is used to enable real-time processing of streaming big data. SQS, on the
other hand, is used as a message queue to store messages transmitted between
distributed application components.
Kinesis provides routing of records using a given key, ordering of records, the ability
for multiple clients to read messages from the same stream concurrently, replay of
messages up to as long as seven days in the past, and the ability for a client to
consume records at a later time. Kinesis Streams will not dynamically scale in
response to increased demand, so you must provision enough streams ahead of
time to meet the anticipated demand of both your data producers and data
consumers.
SQS provides for messaging semantics so that your application can track the
successful completion of work items in a queue, and you can schedule a delay in
messages of up to 15 minutes. Unlike Kinesis Streams, SQS will scale automatically
to meet application demand. SQS has lower limits to the number of messages that
can be read or written at one time compared to Kinesis, so applications using Kinesis
can work with messages in larger batches than when using SQS.
API Gateway
is a fully managed service that makes it easy for developers to create, publish,
maintain, monitor, and secure APIs at any scale. With a few clicks in the AWS
Management Console, you can create an API that acts as a "front door" for
applications to access data, business logic, or functionality from your back-end
services, such as workloads running on Amazon Elastic Compute Cloud (Amazon
EC2), code running on AWS Lambda, or any Web application. Amazon API Gateway
handles all the tasks involved in accepting and processing up to hundreds of
thousands of concurrent API calls, including traffic management, authorization and
access control, monitoring, and API version management. Amazon API Gateway has
no minimum fees or startup costs. You pay only for the API calls you receive and the
amount of data transferred out.
Cognito
lets you easily add user sign-up and sign-in to your mobile and web apps. With
Amazon Cognito, you also have the options to authenticate users through social
identity providers such as Facebook, Twitter, or Amazon, with SAML identity
solutions, or by using your own identity system. In addition, Amazon Cognito enables
you to save data locally on users' devices, allowing your applications to work even
when the devices are offline. You can then synchronize data across users' devices
so that their app experience remains consistent regardless of the device they use.
OpsWorks
uses Chef recipes to start new app server instances, configure application server
software, and deploy applications. Organizations can leverage Chef recipes to
automate operations like software configurations, package installations, database
setups, server scaling, and code deployment.
KMS -Key Management Service
is a managed service that makes it easy for you to create and control the encryption
keys used to encrypt your data, and uses Hardware Security Modules (HSMs) to
protect the security of your keys. AWS Key Management Service is integrated with
several other AWS services to help you protect the data you store with these
services. AWS Key Management Service is also integrated with AWS CloudTrail to
provide you with logs of all key usage to help meet your regulatory and compliance
needs.
IAM
Identity and Access Management (AWS version of Active Directory)
CloudTrail
Tracks API usage in an account and stores logs in S3
S3
Simple Storage Service; major storage area for AWS; uses buckets and folders to organize
files and data
Amazon Redshift
Data warehousing
Amazon Kinesis
data streaming
Amazon Kinesis Firehose
...
Amazon Simple Workflow Service
Build, run and scale background or parallel processes
Virtual Private Cloud
...
DynamoDB
Scalable NoSQL database; supports document and key/value models
AWS Storage Gateway
Can connect an on-premises software application to cloud storage
CloudWatch
Monitoring Service for AWS services such as CloudFront, EC2, S3, etc.
SNS
Simple Notification Service; can send and deliver messages
SES
Simple Email Service; can send and deliver emails
SQS
Simple Queue Service; message queueing
Region
Geographically separated region; each region contains multiple isolated Availability Zones
EMR
Elastic Map-Reduce; for processing large amounts of data
Ganglia
open source software to monitor clusters without adding load
AWS Config
Detailed view of the configuration of AWS resources in an account. Produces a configuration
item for each resource used