What Is Streaming Data

Streaming Data refers to continuously generated data from various sources that needs to be processed incrementally for real-time analytics. It is beneficial for industries to gain insights into customer activity and operational performance, evolving from simple data collection to complex analyses using machine learning. AWS provides services like Amazon Kinesis for managing streaming data, along with options for deploying other streaming platforms on its infrastructure.

Uploaded by

Suresh

What is Streaming Data?

Streaming data is data that is generated continuously by thousands of data sources, which
typically send data records simultaneously and in small sizes (on the order of kilobytes).
Streaming data includes a wide variety of data, such as log files generated by customers using
your mobile or web applications, ecommerce purchases, in-game player activity, information
from social networks, financial trading floors, or geospatial services, and telemetry from
connected devices or instrumentation in data centers.

This data needs to be processed sequentially and incrementally, either on a record-by-record
basis or over sliding time windows, and is used for a wide variety of analytics including
correlations, aggregations, filtering, and sampling. Information derived from such analysis gives
companies visibility into many aspects of their business and customer activity, such as service
usage (for metering/billing), server activity, website clicks, and the geo-location of devices,
people, and physical goods, and enables them to respond promptly to emerging situations. For
example, businesses can track changes in public sentiment on their brands and products by
continuously analyzing social media streams, and respond in a timely fashion as the need arises.
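
As a rough illustration of processing record by record over a sliding time window, the sketch below counts events per key over the last 60 seconds. The class name, the sample click data, and the assumption that records arrive in timestamp order are all illustrative and not part of the original text.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events per key over the last `window_seconds` of event time."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, key) pairs in arrival order
        self.counts = {}        # key -> number of events currently in the window

    def add(self, timestamp, key):
        """Process one record and return the updated count for its key."""
        self.events.append((timestamp, key))
        self.counts[key] = self.counts.get(key, 0) + 1
        self._evict(timestamp)
        return self.counts.get(key, 0)

    def _evict(self, now):
        # Drop records that have slid out of the window (assumes ordered timestamps).
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]


# Hypothetical website clicks: (seconds since start, page name).
clicks = [(0, "home"), (10, "cart"), (30, "home"), (65, "home"), (100, "cart")]
counter = SlidingWindowCounter(window_seconds=60)
for ts, page in clicks:
    in_window = counter.add(ts, page)
    print(f"t={ts:>3}s page={page:<5} count in last 60s = {in_window}")
```

Each call touches only the new record and the records that just expired, so the work per record stays small even when the stream never ends.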

Benefits of Streaming Data

Streaming data processing is beneficial in most scenarios where new, dynamic data is generated
on a continual basis. It applies to most industry segments and big data use cases.
Companies generally begin with simple applications such as collecting system logs and
rudimentary processing like rolling min-max computations. Then, these applications evolve to
more sophisticated near-real-time processing. Initially, applications may process data streams to
produce simple reports and perform simple actions in response, such as emitting alarms when
key measures exceed certain thresholds. Eventually, those applications perform more
sophisticated forms of data analysis, like applying machine learning algorithms, and extract
deeper insights from the data. Over time, complex stream and event processing algorithms, like
decaying time windows to find the most recent popular movies, are applied, further enriching
the insights.
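
The early-stage processing described above (rolling min-max computations and threshold alarms) could look roughly like the following. This is a minimal sketch: the function names, the window size of three readings, the 90% threshold, and the CPU figures are made up for illustration.

```python
from collections import deque


def rolling_min_max(values, size):
    """Yield (value, rolling_min, rolling_max) over the last `size` values."""
    window = deque(maxlen=size)   # the oldest value is dropped automatically
    for value in values:
        window.append(value)
        yield value, min(window), max(window)


def alarms(values, size, threshold):
    """Return the positions at which the rolling maximum exceeds `threshold`."""
    fired = []
    for index, (_, _, high) in enumerate(rolling_min_max(values, size)):
        if high > threshold:
            fired.append(index)
    return fired


# Hypothetical CPU utilisation readings, in percent.
cpu_percent = [41, 47, 52, 93, 60, 58, 55, 49]
for value, low, high in rolling_min_max(cpu_percent, size=3):
    print(f"value={value} min={low} max={high}")
print("alarm at positions:", alarms(cpu_percent, size=3, threshold=90))
```

The alarm stays active for as long as the spike remains inside the window, which is one simple way to avoid flapping on a single reading.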

Streaming Data Examples

• Sensors in transportation vehicles, industrial equipment, and farm machinery send data to a
streaming application. The application monitors performance, detects potential defects in
advance, and places a spare part order automatically, preventing equipment downtime.

• A financial institution tracks changes in the stock market in real time, computes value-at-risk,
and automatically rebalances portfolios based on stock price movements.

• A real-estate website tracks a subset of data from consumers’ mobile devices and makes
real-time recommendations of properties to visit based on their geo-location.

• A solar power company has to maintain power throughput for its customers, or pay penalties.
It implemented a streaming data application that monitors all of the panels in the field and
schedules service in real time, thereby minimizing the periods of low throughput from each
panel and the associated penalty payouts.

• A media publisher streams billions of clickstream records from its online properties,
aggregates and enriches the data with demographic information about users, and optimizes
content placement on its site, delivering more relevant content and a better experience to its
audience.

• An online gaming company collects streaming data about player-game interactions, and feeds
the data into its gaming platform. It then analyzes the data in real time and offers incentives
and dynamic experiences to engage its players.
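
To make one of these examples concrete, the media publisher case (aggregating clickstream records and enriching them with demographic information) might be sketched as follows. The lookup table, field names, and sample records are hypothetical; a real system would read the demographic data from a user-profile store.

```python
from collections import Counter

# Hypothetical lookup table mapping user ids to an age segment.
USER_SEGMENTS = {"u1": "18-24", "u2": "25-34", "u3": "25-34"}


def enrich(click):
    """Attach a demographic segment to one clickstream record."""
    segment = USER_SEGMENTS.get(click["user"], "unknown")
    return {**click, "segment": segment}


def top_article_per_segment(clicks):
    """Aggregate enriched clicks and return the most-clicked article per segment."""
    counts = {}
    for click in map(enrich, clicks):
        counts.setdefault(click["segment"], Counter())[click["article"]] += 1
    return {segment: c.most_common(1)[0][0] for segment, c in counts.items()}


sample_clicks = [
    {"user": "u1", "article": "a"},
    {"user": "u2", "article": "b"},
    {"user": "u3", "article": "b"},
    {"user": "u2", "article": "a"},
    {"user": "u4", "article": "c"},   # unknown user, no demographic data
]
print(top_article_per_segment(sample_clicks))
```

The result could then drive content placement, for instance by promoting each segment's top article to readers in that segment.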

Comparison between Batch Processing and Stream Processing

Before dealing with streaming data, it is worth comparing and contrasting stream
processing and batch processing. Batch processing can be used to compute arbitrary queries over
different sets of data. It usually computes results that are derived from all the data it
encompasses, and enables deep analysis of big data sets. MapReduce-based systems, like
Amazon EMR, are examples of platforms that support batch jobs. In contrast, stream
processing requires ingesting a sequence of data, and incrementally updating metrics, reports,
and summary statistics in response to each arriving data record. It is better suited for real-time
monitoring and response functions.
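
The difference can be seen on a single metric. In the sketch below, which uses made-up latency values and illustrative names, the batch function needs the whole dataset before it can answer, while the streaming class updates its answer as each record arrives and stores no past records.

```python
def batch_mean(dataset):
    """Batch style: needs the whole dataset before it can answer."""
    return sum(dataset) / len(dataset)


class RunningMean:
    """Stream style: updates the metric as each record arrives."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        # Incremental update; no past records need to be stored.
        self.mean += (value - self.mean) / self.count
        return self.mean


# Hypothetical request latencies in milliseconds.
latencies_ms = [120, 80, 100, 140]
stream = RunningMean()
for value in latencies_ms:
    print(f"after {value}: running mean = {stream.update(value):.1f}")
print("batch mean:", batch_mean(latencies_ms))
```

Both approaches agree once all records have been seen; the streaming version simply has a usable answer at every step along the way.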

                Batch processing                 Stream processing

Data scope      Queries or processing over all   Queries or processing over data within a
                or most of the data in the       rolling time window, or on just the most
                dataset.                         recent data record.

Data size       Large batches of data.           Individual records or micro batches
                                                 consisting of a few records.

Performance     Latencies in minutes to hours.   Requires latency on the order of seconds
                                                 or milliseconds.

Analyses        Complex analytics.               Simple response functions, aggregates, and
                                                 rolling metrics.

Many organizations are building a hybrid model by combining the two approaches, maintaining
both a real-time layer and a batch layer. Data is first processed by a streaming data platform
such as Amazon Kinesis to extract real-time insights, and then persisted into a store like S3,
where it can be transformed and loaded for a variety of batch processing use cases.
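
A toy version of this hybrid model is sketched below. It is an assumption-laden illustration: an in-memory list stands in for a durable store such as S3, and the class, field names, threshold, and order data are invented for the example.

```python
class HybridPipeline:
    """Real-time layer plus batch layer over the same incoming records."""

    def __init__(self, alert_threshold):
        self.alert_threshold = alert_threshold
        self.alerts = []    # output of the real-time layer
        self.archive = []   # in-memory stand-in for a durable store such as S3

    def ingest(self, record):
        # Real-time layer: react to the single record immediately.
        if record["amount"] > self.alert_threshold:
            self.alerts.append(record["order_id"])
        # Persist the raw record for later batch processing.
        self.archive.append(record)

    def batch_revenue_by_country(self):
        # Batch layer: a query over all archived data.
        totals = {}
        for record in self.archive:
            country = record["country"]
            totals[country] = totals.get(country, 0) + record["amount"]
        return totals


# Hypothetical ecommerce orders.
orders = [
    {"order_id": 1, "country": "DE", "amount": 40},
    {"order_id": 2, "country": "US", "amount": 900},
    {"order_id": 3, "country": "DE", "amount": 60},
    {"order_id": 4, "country": "US", "amount": 100},
]
pipeline = HybridPipeline(alert_threshold=500)
for order in orders:
    pipeline.ingest(order)
print("real-time alerts:", pipeline.alerts)
print("batch totals:", pipeline.batch_revenue_by_country())
```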

Challenges in Working with Streaming Data

Streaming data processing requires two layers: a storage layer and a processing layer. The
storage layer needs to support record ordering and strong consistency to enable fast,
inexpensive, and replayable reads and writes of large streams of data. The processing layer is
responsible for consuming data from the storage layer, running computations on that data, and
then notifying the storage layer to delete data that is no longer needed. You also have to plan for
scalability, data durability, and fault tolerance in both the storage and processing layers. As a
result, many platforms have emerged that provide the infrastructure needed to build streaming
data applications, including Amazon Kinesis Streams, Amazon Kinesis Firehose, Apache Kafka,
Apache Flume, Apache Spark Streaming, and Apache Storm.
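
The division of labour between the two layers can be sketched in a few lines. This is a simplified, single-process illustration with hypothetical class names, not how any of the platforms above is implemented: the storage layer is an ordered, append-only log with replayable reads, and the processing layer consumes it, tracks a checkpoint, and tells the log what can be deleted.

```python
class StreamLog:
    """Storage layer: an append-only, ordered log with replayable reads."""

    def __init__(self):
        self.records = {}   # sequence number -> record
        self.next_seq = 0

    def append(self, record):
        seq = self.next_seq
        self.records[seq] = record
        self.next_seq += 1
        return seq

    def read_from(self, seq):
        """Return (seq, record) pairs in order, starting at `seq`."""
        return [(s, self.records[s]) for s in sorted(self.records) if s >= seq]

    def trim_before(self, seq):
        """Delete records that the processing layer no longer needs."""
        for s in [s for s in self.records if s < seq]:
            del self.records[s]


class SumProcessor:
    """Processing layer: consumes the log, keeps a checkpoint, then trims."""

    def __init__(self, log):
        self.log = log
        self.checkpoint = 0   # next sequence number to process
        self.total = 0

    def poll(self):
        for seq, record in self.log.read_from(self.checkpoint):
            self.total += record
            self.checkpoint = seq + 1
        # Notify the storage layer that processed data can be deleted.
        self.log.trim_before(self.checkpoint)
        return self.total


log = StreamLog()
for value in (5, 7, 1):
    log.append(value)
processor = SumProcessor(log)
print("total after first poll:", processor.poll())
log.append(10)
print("total after second poll:", processor.poll())
```

Because the processor remembers its checkpoint, a restart could resume from that sequence number rather than from the beginning, provided the checkpoint itself is stored durably.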

Working with Streaming Data on AWS

Amazon Web Services (AWS) provides a number of options to work with streaming data. You can
take advantage of the managed streaming data services offered by Amazon Kinesis, or deploy
and manage your own streaming data solution in the cloud on Amazon EC2.

Amazon Kinesis is a platform for streaming data on AWS. It offers services that make it easy to
load and analyze streaming data, and it also enables you to build custom streaming data
applications for specialized needs. It offers two services: Amazon Kinesis Firehose and Amazon
Kinesis Streams.

In addition, you can run other streaming data platforms, such as Apache Kafka, Apache Flume,
Apache Spark Streaming, and Apache Storm, on Amazon EC2 and Amazon EMR.

Amazon Kinesis Streams

Amazon Kinesis Streams enables you to build your own custom applications that process or
analyze streaming data for specialized needs. It can continuously capture and store terabytes of
data per hour from hundreds of thousands of sources. You can then build applications that
consume the data from Amazon Kinesis Streams to power real-time dashboards, generate alerts,
implement dynamic pricing and advertising, and more. Amazon Kinesis Streams supports your
choice of stream processing framework, including Kinesis Client Library (KCL), Apache Storm,
and Apache Spark Streaming.

Amazon Kinesis Firehose

Amazon Kinesis Firehose is the easiest way to load streaming data into AWS. It can capture and
automatically load streaming data into Amazon S3 and Amazon Redshift, enabling near real-time
analytics with the business intelligence tools and dashboards you are already using. It
enables you to implement an ELT approach and gain benefits from streaming data quickly.
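
As a hedged sketch of what writing to these services can look like from Python, the functions below call `put_record` with the keyword arguments used by the boto3 Kinesis and Firehose clients. The stream names, the event fields, and the choice of `device_id` as partition key are assumptions for illustration, and `RecordingClient` is a made-up stand-in so the example runs without AWS access.

```python
import json


def send_to_stream(kinesis_client, stream_name, event):
    """Write one event to a Kinesis stream, partitioned by device id."""
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=event["device_id"],   # same device -> same shard
    )


def send_to_firehose(firehose_client, delivery_stream_name, event):
    """Hand one event to a Firehose delivery stream for loading."""
    return firehose_client.put_record(
        DeliveryStreamName=delivery_stream_name,
        # Newline-delimited JSON keeps records separable after delivery.
        Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
    )


class RecordingClient:
    """Offline stand-in for a boto3 client; it only records the calls made."""

    def __init__(self):
        self.calls = []

    def put_record(self, **kwargs):
        self.calls.append(kwargs)
        return {"ok": True}   # placeholder, not the real response shape


# With AWS credentials configured, boto3.client("kinesis") or
# boto3.client("firehose") would be passed in instead of the stand-in.
fake = RecordingClient()
send_to_stream(fake, "example-stream", {"device_id": "sensor-7", "temp_c": 21.5})
print(fake.calls[0]["PartitionKey"])
```

Passing the client in as a parameter keeps the functions testable offline and leaves region and credential configuration to the caller.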

Other Streaming Solutions on Amazon EC2

You can install streaming data platforms of your choice on Amazon EC2 and Amazon EMR, and
build your own stream storage and processing layers. By building your streaming data solution
on Amazon EC2 and Amazon EMR, you can avoid the friction of infrastructure provisioning, and
gain access to a variety of stream storage and processing frameworks. Options for the streaming
data storage layer include Apache Kafka and Apache Flume. Options for the stream processing
layer include Apache Spark Streaming and Apache Storm.
