Kinesis

Amazon Kinesis is a feature of AWS that allows for real-time collection, processing, and analysis of video and data streams. It includes four main capabilities: Kinesis Video Streams, Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics, each providing unique functionalities for handling streaming data. Kinesis is a managed service that simplifies setup compared to alternatives like Apache Kafka, while also offering features like durability, security, and serverless processing.

Uploaded by

subham joshi

Amazon Kinesis is an Amazon Web Services (AWS) offering that collects, processes, and analyses video and data streams in a real-time environment.

There are four Amazon Kinesis Capabilities:


1. Amazon Kinesis Video Streams

1. Stream video from millions of devices: It provides SDKs that make it
easy for devices to securely stream media to AWS for playback,
storage, analytics, machine learning, and other processing.
2. Build real-time vision and video-enabled apps: You can easily build
applications with real-time computer vision capabilities through
integration with Amazon Rekognition Video, and with real-time video
analytics capabilities using popular open-source machine learning
frameworks.
3. Playback live and recorded video streams: Users can easily stream
live and recorded media from their Kinesis video streams to a
browser or mobile application using the Kinesis Video Streams HTTP
Live Streaming (HLS) capability.
4. Durable, searchable storage: It uses Amazon S3 as the underlying
data store, which means your data is stored durably and reliably.
Kinesis Video Streams enables you to quickly search and retrieve video
fragments based on device and service-generated timestamps.
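As a sketch of the HLS capability described above, the following uses the AWS SDK for Python (boto3) to fetch a playback URL; the stream name "my-video-stream" is a hypothetical example, and real use requires AWS credentials and an existing video stream:

```python
# Sketch: fetching an HLS playback URL for a Kinesis video stream with boto3.
# "my-video-stream" is a hypothetical stream name.

def hls_request_params(stream_name, playback_mode="LIVE"):
    """Build the parameters for GetHLSStreamingSessionURL (pure helper)."""
    return {"StreamName": stream_name, "PlaybackMode": playback_mode}

def get_hls_url(stream_name):
    import boto3  # AWS SDK; imported lazily so the helper above stays offline-testable
    kvs = boto3.client("kinesisvideo")
    # Kinesis Video Streams requires fetching a per-stream data endpoint first.
    endpoint = kvs.get_data_endpoint(
        StreamName=stream_name, APIName="GET_HLS_STREAMING_SESSION_URL"
    )["DataEndpoint"]
    archived = boto3.client("kinesis-video-archived-media", endpoint_url=endpoint)
    return archived.get_hls_streaming_session_url(
        **hls_request_params(stream_name)
    )["HLSStreamingSessionURL"]

if __name__ == "__main__":
    print(get_hls_url("my-video-stream"))  # requires AWS credentials
```

The returned URL can be handed directly to any HLS-capable player, such as a browser `<video>` element.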

2. Amazon Kinesis Data Streams

Streaming data is data that is generated continuously by thousands of
data sources, which typically send their data records simultaneously and
in small sizes.

Amazon Kinesis Data Streams is used to collect and process huge
streams of data records in real time. You can create Kinesis Data Stream
applications, which are data-processing applications that read data
from a stream in the form of data records. These applications typically use
the Kinesis Client Library and can run on Amazon EC2 instances. Processed
records can be sent to AWS dashboards, used to generate alerts, sent to
other AWS services, or used to dynamically change advertising and pricing
strategies.
1. Durability: It minimises data loss by synchronously replicating
streaming data across three Availability Zones in the AWS Region.
2. Security: Sensitive data can be encrypted within KDS, and you can
access your data privately through Amazon Virtual Private Cloud (VPC).
3. Easy to use and low cost: Components such as connectors, agents, and the
Kinesis Client Library (KCL) help you build streaming
applications quickly and effectively. There is no upfront cost for Kinesis
Data Streams; you only pay for the resources you use.
4. Elasticity and real-time performance: You can dynamically scale your
applications from gigabytes to terabytes of data per hour by adjusting
the throughput. Real-time analytics applications can be supplied with
streaming data within a very short time of the data being collected.
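A producer that writes records into a data stream can be sketched with boto3 as below; the stream name "web-logs" is hypothetical, and a production producer would also retry records rejected for exceeding provisioned throughput:

```python
# Sketch of a Kinesis Data Streams producer ("web-logs" is a hypothetical
# stream name; real use requires AWS credentials and an existing stream).

def to_record_entries(events):
    """Turn (partition_key, payload) pairs into PutRecords entries (pure helper)."""
    return [{"Data": data.encode("utf-8"), "PartitionKey": key}
            for key, data in events]

def chunk(entries, size=500):
    """PutRecords accepts at most 500 records per request, so batch accordingly."""
    return [entries[i:i + size] for i in range(0, len(entries), size)]

def put_events(stream_name, events):
    import boto3  # AWS SDK; imported lazily so the helpers above stay offline-testable
    kinesis = boto3.client("kinesis")
    for batch in chunk(to_record_entries(events)):
        kinesis.put_records(StreamName=stream_name, Records=batch)

if __name__ == "__main__":
    put_events("web-logs", [("user-42", '{"path": "/home"}')])
```

The partition key (here a hypothetical user ID) decides which shard each record lands on, as described in the terminology section later in this document.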

3. Amazon Kinesis Data Firehose

1. Integrated with AWS services and service providers: Amazon
Kinesis Data Firehose is integrated with Amazon S3, Amazon Redshift,
and Amazon Elasticsearch Service.
2. Serverless data transformation: It can easily convert raw streaming
data from your data sources into formats like Apache Parquet and
Apache ORC required by your destination data stores, without having to
build your own data processing pipelines.
3. Near real-time: New data can be loaded into destinations within
60 seconds of being sent to the service. As a result, you can
access new data sooner and react to business and operational events
faster.
4. Pay only for what you use: With Amazon Kinesis Data Firehose,
users need to pay only for the volume of data they transmit through the
service, and if applicable, for data format conversion.
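Delivering data through Firehose can be sketched as follows; the delivery stream name "clickstream-to-s3" is a hypothetical example, and newline-delimiting the payloads is an assumption about the destination format rather than a Firehose requirement:

```python
# Sketch of sending records to a Kinesis Data Firehose delivery stream.
# "clickstream-to-s3" is a hypothetical delivery stream name.

def firehose_records(lines):
    """Wrap payloads as Firehose records, newline-delimited so that records
    delivered to S3 remain easy to split (pure helper)."""
    return [{"Data": (line + "\n").encode("utf-8")} for line in lines]

def deliver(delivery_stream, lines):
    import boto3  # AWS SDK; imported lazily so the helper above stays offline-testable
    firehose = boto3.client("firehose")
    records = firehose_records(lines)
    # PutRecordBatch accepts up to 500 records per call.
    for i in range(0, len(records), 500):
        firehose.put_record_batch(
            DeliveryStreamName=delivery_stream, Records=records[i:i + 500]
        )

if __name__ == "__main__":
    deliver("clickstream-to-s3", ['{"event": "click"}'])
```

Unlike Data Streams, no partition key is needed here: Firehose manages buffering and delivery to the destination itself.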

4. Amazon Kinesis Data Analytics

1. Powerful real-time processing: It provides built-in functions to filter,
aggregate, and transform streaming data for advanced analytics. It
processes streaming data with sub-second latencies, enabling you to
analyse and respond to incoming data and events in real time.
2. No servers to manage: It runs your streaming applications without
requiring you to provision or manage any infrastructure. Amazon
Kinesis Data Analytics automatically scales the infrastructure up and
down as required to process incoming data.
3. Pay only for what you use: With Amazon Kinesis Data Analytics, you
only pay for the processing resources that your streaming applications
use.
Kinesis Data Streams Terminology

Kinesis Data Stream

A Kinesis data stream is a set of shards. Each shard has a sequence of data
records. Each data record has a sequence number that is assigned by
Kinesis Data Streams.

Data Record

A data record is the unit of data stored in a Kinesis data stream. Data records
are composed of a sequence number, a partition key and a data blob, which
is an immutable sequence of bytes. Kinesis Data Streams does not inspect,
interpret, or change the data in the blob in any way. A data blob can be up to
1 MB.

Capacity Mode

A data stream capacity mode determines how capacity is managed and how
you are charged for the usage of your data stream. Currently, in Kinesis Data
Streams, you can choose between an on-demand mode and
a provisioned mode for your data streams.

With the on-demand mode, Kinesis Data Streams automatically manages
the shards in order to provide the necessary throughput. You are charged
only for the actual throughput that you use, and Kinesis Data Streams
automatically accommodates your workloads’ throughput needs as they ramp
up or down.

With the provisioned mode, you must specify the number of shards for the
data stream. The total capacity of a data stream is the sum of the capacities
of its shards. You can increase or decrease the number of shards in a data
stream as needed and you are charged for the number of shards at an hourly
rate.
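The two capacity modes map directly onto the parameters of the CreateStream API; a sketch with boto3, using hypothetical stream names:

```python
# Sketch: creating streams in each capacity mode ("orders-on-demand" and
# "orders-provisioned" are hypothetical stream names).

def create_stream_params(name, mode, shard_count=None):
    """Build CreateStream parameters for either capacity mode (pure helper)."""
    params = {"StreamName": name, "StreamModeDetails": {"StreamMode": mode}}
    if mode == "PROVISIONED":
        # Only provisioned mode takes an explicit shard count.
        params["ShardCount"] = shard_count
    return params

def create_stream(name, mode, shard_count=None):
    import boto3  # AWS SDK; imported lazily so the helper above stays offline-testable
    boto3.client("kinesis").create_stream(
        **create_stream_params(name, mode, shard_count))

if __name__ == "__main__":
    create_stream("orders-on-demand", "ON_DEMAND")
    create_stream("orders-provisioned", "PROVISIONED", shard_count=4)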

Retention Period

The retention period is the length of time that data records are accessible
after they are added to the stream. A stream’s retention period is set to a
default of 24 hours after creation and can be increased to up to 365 days.

Producer

Producers put records into Amazon Kinesis Data Streams. For example, a
web server sending log data to a stream is a producer.

Consumer

Consumers get records from Amazon Kinesis Data Streams and process
them. These consumers are known as Amazon Kinesis Data Streams
applications.
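A minimal consumer can poll a single shard directly through the GetRecords API, as sketched below; production consumers would normally use the Kinesis Client Library instead, and the stream and shard names here are hypothetical:

```python
# Sketch of a bare-bones consumer polling one shard directly.
# "web-logs" and the shard ID are hypothetical examples.
import time

def decode_records(records):
    """Extract (partition_key, payload) pairs from GetRecords output (pure helper)."""
    return [(r["PartitionKey"], r["Data"].decode("utf-8")) for r in records]

def read_shard(stream_name, shard_id):
    import boto3  # AWS SDK; imported lazily so the helper above stays offline-testable
    kinesis = boto3.client("kinesis")
    iterator = kinesis.get_shard_iterator(
        StreamName=stream_name, ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest available record
    )["ShardIterator"]
    while iterator:  # loops until the shard is closed
        resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
        for key, payload in decode_records(resp["Records"]):
            print(key, payload)
        iterator = resp.get("NextShardIterator")
        time.sleep(1)  # avoid exceeding the per-shard read rate

if __name__ == "__main__":
    read_shard("web-logs", "shardId-000000000000")
```

This direct-polling approach handles only one shard; the Kinesis Client Library, described later, manages iterators, multiple shards, and failover for you.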

Shard

A shard is a uniquely identified sequence of data records in a stream. A
stream is composed of one or more shards, each of which provides a fixed
unit of capacity. As your data rate changes, you can increase or decrease the
number of shards allocated to your stream.

Partition Key

A partition key is used to group data by shard within a stream. Kinesis Data
Streams segregates the data records belonging to a stream into multiple
shards. It uses the partition key that is associated with each data record to
determine which shard a given data record belongs to. Partition keys are
Unicode strings, with a maximum length limit of 256 characters for each key.
An MD5 hash function is used to map partition keys to 128-bit integer values
and to map associated data records to shards using the hash key ranges of
the shards. When an application puts data into a stream, it must specify a
partition key.
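The MD5-based mapping described above can be illustrated in pure Python; the even split of the hash space below is an assumption that holds for a freshly created provisioned stream, since resharding can change the hash key ranges:

```python
# Pure-Python illustration of partition-key-to-shard mapping: the key is
# MD5-hashed to a 128-bit integer, and each shard owns a contiguous hash range.
import hashlib

def hash_key(partition_key):
    """MD5-hash a partition key to a 128-bit integer, as Kinesis does."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")

def shard_for(partition_key, shard_count):
    """Pick a shard, assuming the 128-bit hash space is split evenly
    (true for a freshly created provisioned stream)."""
    range_size = 2 ** 128 // shard_count
    return min(hash_key(partition_key) // range_size, shard_count - 1)

# The same partition key always lands on the same shard:
assert shard_for("user-42", 4) == shard_for("user-42", 4)
```

This is why all records sharing a partition key preserve their relative order: they always travel through the same shard.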

Sequence Number

Each data record has a sequence number that is unique per partition-key
within its shard. Kinesis Data Streams assigns the sequence number after
you write to the stream with client.putRecords or client.putRecord. Sequence
numbers for the same partition key generally increase over time. The longer
the time period between write requests, the larger the sequence numbers
become.

Kinesis Client Library

The Kinesis Client Library is compiled into your application to enable fault-
tolerant consumption of data from the stream. The Kinesis Client Library
ensures that for every shard there is a record processor running and
processing that shard. The library also simplifies reading data from the
stream. The Kinesis Client Library uses an Amazon DynamoDB table to store
control data. It creates one table per application that is processing data.
AWS Kinesis vs Apache Kafka:

There are both a number of similarities and a number of differences between
Kinesis and Kafka.

Similarities-

1. Both are designed to ingest and process multiple large-scale streams of
data with a great deal of flexibility in terms of source.

2. Both replace traditional message brokers in environments that ingest large
streams of data that need to be processed and delivered to other applications
and services.

The biggest difference between the two is that Amazon Kinesis is a managed
service that requires minimal setup and configuration. Kafka is an
open-source solution, requiring significant time, investment, and knowledge to
configure.

Kinesis:

Key concepts used by Kinesis include:
Data Producers, Data Consumers, Data Streams, Shards, Data Records,
Partition Key, and Sequence Number.

• Data Producers are the source devices that emit Data Records.
• The Data Consumer retrieves the Data Records from shards in the stream.
The consumer is the app or service that makes use of the stream data.

• Shards are comprised of these Data Records, and in turn, numerous
shards make up a Kinesis Data Stream.

• The partition key is an identifier, such as a user ID or timestamp, that
determines which shard a record is written to, with the sequence number
serving as a unique identifier for each data record. Together, these
preserve the ordering of records within a shard.

Kafka:

Kafka utilises similar concepts: Records, Topics, Consumers, Producers,
Brokers, Logs, Partitions, and Clusters.

• With Kafka, records are immutable from the outset and sent sequentially to
ensure continuous flow without data degradation.

• A Topic is essentially a stream of records and could be considered
analogous to a Kinesis Data Stream, with a Kafka partition corresponding
to a Kinesis shard. Logs serve as storage on disk, further sub-divided
into partitions and segments.

• Kafka has four key APIs. The Producer API sends streams of data to
Topics in the Kafka cluster. The Consumer API reads the streams of data
from topics. The Streams API transforms streams of data from input to
output Topics. The Connect API implements connectors that pull from
source systems and push from Kafka to other systems/services/
applications.

• The Broker can be considered a Kafka Server running in a Kafka Cluster.
Multiple Kafka Brokers are part of a given cluster, and a Kafka Cluster can
consist of Brokers split among many servers. However, Broker can
sometimes refer to Kafka as a whole. Essentially, it is the piece that
manages the stream of data, incoming and outgoing.

Differences in integrations and feature sets-

• Kafka has SDK support for Java, but Kinesis can support Android, Java,
Go, and .Net, among others. Because Kafka is open source, however, new
integrations are in development every day.

• But while Kinesis may currently offer more flexibility in integrations, it is less
flexible in terms of configuration – it only allows the retention period and the
number of shards to be configured, and it writes synchronously to three
different machines across separate Availability Zones (this standard
configuration can constrain throughput performance). Kafka is more flexible,
allowing more control over configuration, letting you set the complexity of
replication, and, when configured properly for a use case, can be even more
scalable and offer greater throughput.

• Kinesis’ standardised configuration allows it to be set up in hours as
opposed to weeks.
Kinesis Data Streams vs SQS

Amazon Kinesis Data Streams

• allows real-time processing of streaming big data and the ability to read
and replay records to multiple Amazon Kinesis Applications.
• Amazon Kinesis Client Library (KCL) delivers all records for a given
partition key to the same record processor, making it easier to build
multiple applications that read from the same Amazon Kinesis stream
(for example, to perform counting, aggregation, and filtering).

Amazon SQS

• offers a reliable, highly scalable hosted queue for storing messages as
they travel between applications or microservices.
• It moves data between distributed application components and helps
decouple these components.
• supports both standard and FIFO queues
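For contrast with the Kinesis consumer shown earlier, a minimal SQS round trip can be sketched with boto3; the queue name "orders-queue" is a hypothetical example:

```python
# Sketch: sending and receiving an SQS message with boto3.
# "orders-queue" is a hypothetical queue name.

def batch_entries(messages):
    """Build SendMessageBatch entries (pure helper); SQS accepts up to 10 per call."""
    return [{"Id": str(i), "MessageBody": body} for i, body in enumerate(messages)]

def send_and_receive(queue_name):
    import boto3  # AWS SDK; imported lazily so the helper above stays offline-testable
    sqs = boto3.client("sqs")
    url = sqs.get_queue_url(QueueName=queue_name)["QueueUrl"]
    sqs.send_message_batch(QueueUrl=url, Entries=batch_entries(['{"order": 1}']))
    resp = sqs.receive_message(QueueUrl=url, MaxNumberOfMessages=10)
    for msg in resp.get("Messages", []):
        print(msg["Body"])
        # Unlike Kinesis records, which stay readable for the retention
        # period, SQS messages must be deleted once processed:
        sqs.delete_message(QueueUrl=url, ReceiptHandle=msg["ReceiptHandle"])

if __name__ == "__main__":
    send_and_receive("orders-queue")
```

The delete-after-processing step highlights the core difference: SQS consumes messages destructively, while a Kinesis stream can be read and replayed by multiple applications.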
