Subject: Big-Data Analytics (CSE-420) Class: B.Tech (CSE) Semester: 6 Lecture No. 1

The document provides an overview of a lecture on big data analytics. It defines big data using the four V's of volume, velocity, variety, and veracity. It discusses big data issues related to storage and processing. Traditional systems like databases and data warehouses struggle with big data. Tools for big data include NoSQL databases, Hadoop, HDFS, and cloud platforms. Harnessing big data involves online transaction processing, online analytical processing, and real-time analytical processing.

Uploaded by

Mdim
Copyright
© All Rights Reserved

Subject: Big-Data Analytics (CSE-420)

Class: B.Tech (CSE)


Semester: 6th
Lecture No. 1
At the end of this lecture, students will be able to understand the following concepts:

1. Introduction to Big-Data

2. Big-Data Issues

3. Issues with existing system

4. Big-Data Definition Using 4V’s (IBM def.)

5. Why we need Big-Data

6. Tools typically used in Big-Data

7. Harnessing Big-Data
• Introduction to Big Data:--
Big Data is not a framework to be learned, nor a language, nor a
technology. It is really a problem statement.

• Data is measured in terms of:--

Byte, KB (Kilobyte), MB (Megabyte), GB (Gigabyte), TB (Terabyte),
PB (Petabyte), EB (Exabyte), ZB (Zettabyte), YB (Yottabyte)
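The ladder of units above can be sketched as a small conversion helper. This is only an illustration: the function name `human_readable` is hypothetical, and it assumes decimal (SI) units where each step is a factor of 1000. The slides do not say whether a factor of 1000 or 1024 is intended.

```python
# Assumption: decimal (SI) units, each step up is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1000,
        # or at the last unit if the number is larger than the table covers.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1000

print(human_readable(100))            # a few bytes
print(human_readable(100 * 1000**2))  # megabyte range
print(human_readable(10**15))         # petabyte range
```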


• Big Data Issues:--
There are basically two kinds of issues in Big Data:--

1. Storage-wise

2. Processing-wise

• Issues with existing system:--


Existing systems such as Java, mainframes, DWH/BI tools, and RDBMSs
(such as DB2 and MySQL) strain to store the data and strain to
process it.
• Big Data Definition Using 4 V’s (IBM Def.):--
1. Volume of the Data:-- The amount of data being added from the
data sources to the data warehouse.
Example:-- 100 B, 100 MB, 100 TB, 1 PB

2. Velocity of the Data:-- The speed at which data is added to the
warehouse from different data sources. Data is being generated at
an alarming rate.
Example:-- 100 GB in a year, in a week, or in a day.
3. Variety of the Data:-- Different kinds of data are being
generated from various sources.

Example:--

Structured:-- Data in rows and columns

Semi-structured:-- XML files, JSON files, click streams, etc.

Unstructured:-- Data with no predefined structure or schema, such
as free text.

4. Veracity of the Data:-- Uncertainties and inconsistencies in
the data.
Veracity is concerned with extracting the valuable or useful data
out of the whole data set.

• Any data set that is high in volume, high in velocity, high in
variety, and high in veracity, that is, a data set with all four of
these characteristics, is called Big Data.
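The three kinds of variety listed above can be illustrated with a short sketch. All sample records here are invented for illustration and are not taken from the lecture. It uses only Python's standard `csv` and `json` modules.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields.
# (Hypothetical sample data.)
structured = "id,name,city\n1,Asha,Delhi\n2,Ravi,Pune\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys; fields may differ between records.
semi_structured = '{"id": 3, "name": "Meera", "clicks": ["home", "cart"]}'
record = json.loads(semi_structured)

# Unstructured: free text with no schema, so there are no named fields
# to look up. Here we can only do ad hoc processing such as counting words.
unstructured = "Customer called to say the delivery arrived late but intact."
word_count = len(unstructured.split())

print(rows[0]["city"])   # field access by column name
print(record["clicks"])  # nested list inside the JSON record
print(word_count)
```

The point of the contrast is that the structured and semi-structured inputs can be queried by field name, while the unstructured text has to be interpreted before it yields anything comparable.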
• Big Data Definition:--
Big Data is the term for a collection of data sets so large and
complex that it becomes difficult to process using on-hand
database system tools or traditional data processing
applications.

OR

Data that cannot be processed by a traditional DBMS is called
Big Data. An RDBMS cannot process such huge data.
• Why Big-Data:--
1. To increase the storage capacities

2. To increase the processing power

3. Availability of the data


• Tools typically used in Big-Data:--
1. NoSQL:-- MongoDB database

2. MapReduce:-- Hadoop, Hive, Pig, S4, MapR

3. Storage:-- S3, HDFS

4. Servers:-- EC2, Google App Engine

5. Processing:-- R, Yahoo!
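As a rough illustration of the MapReduce idea named in item 2, the sketch below counts words in plain Python using three phases: map, shuffle, and reduce. It is a toy in-memory version of the pattern, not the Hadoop API. The function names and the input lines are made up for this example.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical input; in a real cluster each line could live on a different node.
lines = ["big data is big", "data is everywhere"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts)
```

Because each call to `map_phase` depends only on its own line, the map work could in principle be spread across many machines, which is what makes the pattern attractive for large data sets.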
• Harnessing Big-Data:--
1. OLTP (Online Transaction Processing):-- (DBMS)

2. OLAP (Online Analytical Processing):-- (Data warehouses)

3. RTAP (Real-Time Analytical Processing):-- (Big-Data Analytics
& Techniques)
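The difference between the three processing styles can be sketched with Python's built-in `sqlite3` module. This is a simplified illustration under stated assumptions: the `orders` table and its rows are hypothetical, and a single in-memory database stands in for what would normally be separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, city TEXT, amount REAL)")

# OLTP style: many small transactions, each touching one or a few rows.
events = [("Delhi", 250.0), ("Pune", 100.0), ("Delhi", 150.0)]
for city, amount in events:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO orders (city, amount) VALUES (?, ?)", (city, amount))

# OLAP style: one read-only query that aggregates over many rows.
totals = conn.execute(
    "SELECT city, SUM(amount) FROM orders GROUP BY city ORDER BY city"
).fetchall()
print(totals)

# RTAP style: keep the answer up to date as each new event arrives,
# instead of re-scanning the stored data.
running_total = 0.0
for _, amount in events:
    running_total += amount
print(running_total)
```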
Big Data Definition:--
• Big Data is the amount of data just beyond technology's
capability to store, manage and process efficiently.

Bottom line: Any data that exceeds our current capability of
processing can be regarded as "big".
