
Session 3.9.1

This document discusses data streams in data analytics. It defines a data stream as a continuous ordered flow of data items arriving at a high rate. It describes different types of data streams including transactional and measurement streams. Examples of common data stream sources mentioned are sensor data, image data, and internet/web traffic. Key characteristics of data streams are large volumes, continuous updating, requirement for real-time processing, and inability to randomly access past data. Applications discussed include fraud detection, real-time trading, customer analytics, and monitoring IT systems. Advantages are improving sales and costs while disadvantages include lack of security and dependence on cloud providers. The next session will cover stream data models and architectures.

Uploaded by

dhurgadevi

SRI KRISHNA COLLEGE OF TECHNOLOGY

[An Autonomous Institution | Affiliated to Anna University and Approved by AICTE | Accredited by NAAC with ‘A’ Grade]

KOVAIPUDUR, COIMBATORE – 641 042.

21ITE06/BIG DATA ANALYTICS


III YEAR /CSE/VI SEMESTER
MODULE-3
Module 3- HADOOP ECO SYSTEMS

Session Topic
3.1 Hadoop Eco systems: Hive – Architecture – data type – File format
3.2 HQL – SerDe – User defined functions
3.3 Pig: Features – Anatomy – Pig on Hadoop – Pig Latin overview – Data types – Running Pig – Execution modes of Pig
3.4 HDFS commands – Relational operators – Eval Functions – Complex data type – Piggy Bank – User defined Functions – Parameter substitution – Diagnostic operator
3.5 Jasper Report: Introduction – Connecting to MongoDB – Connecting to Cassandra
3.6 Introduction of Big data Machine learning with Spark: Introduction to Spark MLlib, Linear Regression – Clustering – Collaborative filtering
3.7 Association rule mining – Decision tree using Spark
3.8 Introduction to Graph – Introduction to Spark GraphX
MODULE 1 Introduction to Big Data
3.9.1 Introduction to Streams Concepts

Course Outcome:
Upon completion of the session, students shall have ability to

CO6 Explore the importance of the big data framework Hive and its built-in functions, data types and services like DDL. [AP]
Data Stream in Data Analytics
➢ A data stream is a real-time, continuous, ordered (implicitly by arrival time or explicitly by timestamp) sequence of items. It is not possible to control the order in which items arrive, nor is it feasible to store a stream locally in its entirety.
➢ It involves enormous volumes of data, with items arriving at a high rate.

Types of Data Streams :


➢ A data stream is a (possibly unbounded) sequence of tuples. Each tuple comprises a set of attributes, similar to a row in a database table.
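The tuple model above can be sketched in Python. This is only an illustration: the `Reading` type, its attribute names and the synthetic values are assumptions made for the example, not part of the course material. A generator stands in for a possibly unbounded stream, and the consumer only ever sees a finite prefix of it.

```python
import itertools
from typing import Iterator, NamedTuple


class Reading(NamedTuple):
    """One stream tuple: a fixed set of attributes, like a table row."""
    timestamp: int   # explicit ordering attribute (hypothetical unit: seconds)
    sensor_id: str   # hypothetical attribute name
    value: float     # hypothetical attribute name


def reading_stream(sensor_id: str = "s1") -> Iterator[Reading]:
    """Yield readings without end; the consumer decides how many to take."""
    for t in itertools.count():
        # Synthetic values, for illustration only.
        yield Reading(timestamp=t, sensor_id=sensor_id, value=20.0 + (t % 5) * 0.5)


# The stream itself never ends, so we take only the first three tuples.
first_three = list(itertools.islice(reading_stream(), 3))
```

Because the generator never terminates, calling `list(reading_stream())` directly would not return; this mirrors the point that a stream cannot be captured in its entirety.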


Types of Data Streams
Transactional data streams
◦ Log interactions between entities
◦ Credit card – purchases by consumers from merchants
◦ Telecommunications – phone calls by callers to the dialled parties
◦ Web – accesses by clients of information at servers
Measurement data streams
◦ Sensor networks – physical natural phenomena, road traffic
◦ IP networks – traffic at router interfaces
◦ Earth climate – temperature, humidity levels at weather stations
Examples of Stream Sources
Sensor Data
◦ Sensor data is used, for example, in navigation systems. Imagine a temperature sensor floating about in the ocean, sending back to the base station a reading of the surface temperature each hour.
◦ The data generated by this sensor is a stream of real numbers.
◦ One hourly sensor produces very little data, but with many sensors reporting at high rates the volume grows quickly. If 3.5 terabytes arrive every day, we need to think about what can be kept in working storage and what can only be archived.
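The slides do not say how a figure of 3.5 terabytes per day arises. As a rough worked example, and purely under assumed numbers (4-byte readings, 10 readings per second, one million sensors; none of these figures appear in the slides), the arithmetic can be sketched as:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_bytes(readings_per_day: int, bytes_per_reading: int, sensors: int = 1) -> int:
    """Total bytes produced per day by a group of identical sensors."""
    return readings_per_day * bytes_per_reading * sensors


# One sensor, one 4-byte reading per hour (the hourly case from the slide).
hourly_sensor = daily_bytes(readings_per_day=24, bytes_per_reading=4)

# Assumed faster sensor: 10 readings per second, 4 bytes each.
fast_sensor = daily_bytes(readings_per_day=10 * SECONDS_PER_DAY, bytes_per_reading=4)

# Assumed fleet of one million such sensors.
fleet = daily_bytes(readings_per_day=10 * SECONDS_PER_DAY, bytes_per_reading=4,
                    sensors=1_000_000)

print(hourly_sensor)  # 96 bytes per day
print(fast_sensor)    # 3456000 bytes, about 3.5 megabytes per day
print(fleet)          # 3456000000000 bytes, about 3.5 terabytes per day
```

Under these assumptions a single sensor is trivial to store, while the fleet reaches the terabyte scale at which the choice between working storage and archive starts to matter.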
Image Data
Satellites often send down to Earth streams consisting of many terabytes of images per day. Surveillance cameras generate images with lower resolution than satellites, but there can be many of them, each producing a stream of images at intervals of one second.
Internet and Web Traffic
A switching node in the middle of the Internet receives streams of IP packets from many inputs and routes them to its outputs. Websites receive streams of heterogeneous types. For example, Google receives a hundred million search queries per day.
Characteristics of Data Streams
➢ Large volumes of continuous data, possibly infinite.
➢ Constantly changing, and requires a fast, real-time response.
➢ Data streams capture nicely today's data processing needs.
➢ Random access is expensive, so single-scan (one-pass) algorithms are used.
➢ Only a summary of the data seen so far is stored.
➢ Most stream data are at a fairly low level or multidimensional in nature, and need multilevel and multidimensional processing.
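The single-scan and summary-only characteristics can be illustrated with a running mean and variance. The sketch below uses Welford's online method, chosen here as one common example rather than something the slides prescribe; the class name `RunningStats` is hypothetical. Each item is seen exactly once and then discarded, and only three numbers are kept as the summary.

```python
class RunningStats:
    """Constant-size summary of a stream: count, mean and variance."""

    def __init__(self) -> None:
        self.n = 0        # number of items seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new item into the summary (Welford's update)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of the items seen so far."""
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # stand-in for an arriving stream
    stats.update(value)                 # the item is not stored afterwards
```

After the loop, `stats.mean` and `stats.variance` describe the whole sequence even though no past item can be accessed again.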
Applications of Data Streams

◦ Fraud detection
◦ Real-time stock trading
◦ Customer analytics
◦ Monitoring and reporting on internal IT systems
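To make the fraud detection application concrete, here is a toy sliding-window rule over a transactional stream. It is only a sketch: the rule itself (more than three purchases on one card within 60 seconds), the function name `flag_bursts` and the sample events are all assumptions for illustration, and real fraud systems are far more elaborate. The code assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Iterable, Iterator, Tuple


def flag_bursts(transactions: Iterable[Tuple[int, str]],
                window_seconds: int = 60,
                max_in_window: int = 3) -> Iterator[Tuple[int, str]]:
    """Yield (timestamp, card) when a card exceeds max_in_window purchases
    within the last window_seconds. Assumes timestamps are non-decreasing."""
    recent = defaultdict(deque)  # card -> timestamps still inside the window
    for timestamp, card in transactions:
        window = recent[card]
        window.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while timestamp - window[0] >= window_seconds:
            window.popleft()
        if len(window) > max_in_window:
            yield (timestamp, card)


# Hypothetical (timestamp, card) events.
events = [(0, "A"), (10, "A"), (15, "B"), (20, "A"), (30, "A"), (100, "A")]
alerts = list(flag_bursts(events))
print(alerts)  # [(30, 'A')]
```

Only the timestamps inside the current window are kept per card, so memory stays bounded however long the stream runs.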
Advantages of Data Streams
➢ Helps in improving sales
➢ Helps in detecting errors
➢ Helps in minimizing costs
➢ Provides the information needed to react swiftly to risk
Disadvantages of Data Streams

➢ Lack of security of data in the cloud
➢ Dependence on the cloud provider
➢ Off-premises storage of data introduces the potential for disconnection
Next Session…

3.9.2 Stream Data Model and Architecture

