
Module - 1 - Introduction To Big Data

The document provides an overview of Big Data, highlighting its significance, challenges, and the evolving characteristics known as the '6Vs' (Volume, Velocity, Variety, Veracity, Value, and Variability). It categorizes data into structured, unstructured, and semi-structured types, and discusses the application of Big Data across various sectors such as banking, government, education, healthcare, e-commerce, and social media. Additionally, it outlines tools for Big Data analytics and contrasts traditional data management with Big Data management.

Uploaded by

sahilkhedekar2002


INTRODUCTION TO BIG DATA

Big Data is one of the most talked-about technology trends today. The real challenge for large organizations is to get the most out of the data they already have and to predict what kind of data to collect in the future. How to take existing data and make it meaningful enough to give accurate insight into the past is one of the key discussion points in many executive meetings.

With the explosion of data, the challenge has moved to the next level, and Big Data is now a reality in many organizations. Every organization and expert shares the same goal of getting the most out of the data, but the route and the starting point differ for each. As organizations evaluate and architect big data solutions, they are also learning about the methods and opportunities related to Big Data.

There is no single solution to big data, and no single vendor can claim to know everything about Big Data. Big Data is too big a concept, and there are many players: different architectures, different vendors and different technologies.

Big Data 3 V’s and 6 V’s

Big Data was originally defined by the “3Vs” (Volume, Velocity and Variety), but it is now described by “6Vs”, which are also termed the characteristics of Big Data:

Join Our Telegram Group to Get Notifications, Study Materials, Practice test & quiz: https://t.me/ccatpreparations
Visit: www.ccatpreparation.com
1. Volume:

• The name ‘Big Data’ itself refers to enormous size.

• Volume is the sheer amount of data.
• Size plays a crucial role in determining the value of data. Whether a particular dataset can be considered Big Data depends on its volume: only a very large volume qualifies.
• Hence, when dealing with Big Data, it is necessary to consider the characteristic ‘Volume’.
• Example: In 2016, estimated global mobile traffic was 6.2 exabytes (6.2 billion GB) per month. By 2020, the total amount of data was projected to reach almost 40,000 exabytes.

2. Velocity:
• Velocity refers to the high speed at which data accumulates.
• In Big Data, data flows in at high velocity from sources such as machines, networks, social media and mobile phones.

• There is a massive and continuous flow of data. Velocity determines the potential of data: how fast data is generated and processed to meet demand.
• Sampling data can help in dealing with issues like ‘velocity’.
• Example: More than 3.5 billion searches are made on Google every day. Also, the number of Facebook users grows by approximately 22% year on year.
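The notes say that sampling can help with velocity. One standard way to sample a stream whose length is not known in advance is reservoir sampling (Algorithm R), sketched below in Python. It is a technique chosen here for illustration, not one the notes prescribe, and `reservoir_sample` and the simulated query stream are hypothetical. It keeps a uniform random sample of `k` items in O(k) memory while reading each item only once, which is what makes it suitable for fast, continuous data.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return k items chosen uniformly at random from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)        # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)      # random index in [0, i], both ends inclusive
            if j < k:
                sample[j] = item       # happens with probability k / (i + 1)
    return sample


if __name__ == "__main__":
    # Hypothetical high-velocity stream: 100,000 simulated search queries, read once.
    queries = (f"query-{n}" for n in range(100_000))
    print(reservoir_sample(queries, k=5, seed=42))
```

Because each item is examined once and then discarded unless it enters the reservoir, the same function works whether the stream holds a thousand items or a billion.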

3. Variety:
• It refers to the nature of data: structured, semi-structured and unstructured.
• It also refers to heterogeneous sources.
• Variety is the arrival of data from new sources both inside and outside an enterprise. It can be structured, semi-structured or unstructured.
  ◦ Structured data: organized data. It generally refers to data whose length and format are defined.
  ◦ Semi-structured data: semi-organized data. It is a form of data that does not conform to a formal data structure. Log files are examples of this type of data.
  ◦ Unstructured data: unorganized data. It refers to data that doesn’t fit neatly into the traditional row-and-column structure of a relational database. Text, pictures and videos are examples of unstructured data, which can’t be stored in the form of rows and columns.
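Log files, cited above as semi-structured data, usually follow a loose pattern that can be turned into structured fields. The sketch below assumes a made-up log format (`date time LEVEL [component] message`); real log formats vary, so the regular expression and the `parse_log_line` helper are illustrative assumptions, not a standard.

```python
import re
from typing import Dict, Optional

# Assumed log layout for this example: "<date> <time> <LEVEL> [<component>] <free-text message>"
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) \[(?P<component>[^\]]+)\] (?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[Dict[str, str]]:
    """Turn one semi-structured log line into a structured record, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    line = "2024-03-01 12:00:05 ERROR [payments] Timeout contacting gateway after 30s"
    print(parse_log_line(line))
```

The fixed parts (date, time, level, component) become columns, while the message stays free text: the record is partly structured, which is exactly what makes logs semi-structured.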

4. Veracity:
• It refers to inconsistencies and uncertainty in data: the available data can sometimes get messy, and its quality and accuracy are difficult to control.
• Big Data is also variable because of the multitude of data dimensions resulting from multiple disparate data types and sources.
• Example: Data in bulk can create confusion, whereas too little data may convey only half or incomplete information.
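Veracity problems are often caught with simple rule-based checks before any analysis. The following Python sketch counts three kinds of issues in a batch of records: missing required fields, negative amounts and exact duplicates. The field names, the rules and the `quality_report` function are hypothetical choices for illustration; a real pipeline would define its own checks.

```python
from typing import Any, Dict, List


def quality_report(records: List[Dict[str, Any]], required: List[str]) -> Dict[str, int]:
    """Count basic veracity issues in a batch of flat records with hashable values."""
    report = {"missing_fields": 0, "negative_amount": 0, "duplicates": 0}
    seen = set()
    for rec in records:
        # A required field that is absent, None or an empty string counts as missing.
        if any(rec.get(field) in (None, "") for field in required):
            report["missing_fields"] += 1
        # Rule assumed for this example: monetary amounts must not be negative.
        amount = rec.get("amount")
        if isinstance(amount, (int, float)) and amount < 0:
            report["negative_amount"] += 1
        # Exact duplicate: same keys with the same values as an earlier record.
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint in seen:
            report["duplicates"] += 1
        seen.add(fingerprint)
    return report


if __name__ == "__main__":
    batch = [
        {"id": 1, "customer": "A", "amount": 120.0},
        {"id": 2, "customer": "", "amount": 75.5},
        {"id": 3, "customer": "C", "amount": -10.0},
        {"id": 1, "customer": "A", "amount": 120.0},
    ]
    print(quality_report(batch, required=["id", "customer", "amount"]))
```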


5. Value:
• After taking the first four V’s into account, there comes one more V, which stands for Value. A bulk of data that has no value is of no use to a company unless it is turned into something useful.
• Data in itself is of no use or importance; it needs to be converted into something valuable from which information can be extracted. Hence, Value can be considered the most important of the 6V’s.

6. Variability:
• To what extent, and how fast, is the structure of your data changing?
• How often does the meaning or shape of your data change?
• Example: It is as if you ate the same ice cream every day and its taste kept changing.
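One concrete way to measure how often the shape of data changes is to compare the set of fields seen in each batch with the previous batch. The sketch below does that for batches of Python dictionaries; `schema_changes` and the ice-cream records (echoing the example above) are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional, Set


def schema_changes(batches: Iterable[List[Dict[str, Any]]]) -> List[Dict[str, Set[str]]]:
    """For every batch after the first, report which field names appeared or disappeared."""
    changes: List[Dict[str, Set[str]]] = []
    previous: Optional[Set[str]] = None
    for batch in batches:
        # The "shape" of a batch is taken to be the union of all keys used by its records.
        fields: Set[str] = set().union(*(record.keys() for record in batch))
        if previous is not None:
            changes.append({"added": fields - previous, "removed": previous - fields})
        previous = fields
    return changes


if __name__ == "__main__":
    day1 = [{"customer": "u1", "flavour": "vanilla"}]
    day2 = [{"customer": "u2", "flavour": "vanilla", "topping": "nuts"}]
    day3 = [{"customer": "u3", "topping": "nuts"}]
    for day, change in enumerate(schema_changes([day1, day2, day3]), start=2):
        print(f"day {day}: {change}")
```

Counting how many batches report a non-empty change gives a rough, easily tracked measure of variability.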

Different Types of Big Data

Big Data types are used to categorize the numerous kinds of data generated daily. There are primarily 3 types of data in analytics, explained below with examples:

1. Structured Data: Any data that can be processed, is easily accessible, and can be stored in a fixed format is called structured data. In Big Data, structured data is the easiest to work with because it has highly coordinated measurements defined by set parameters. Examples of structured Big Data include:

• Address
• Age
• Credit/debit card numbers
• Contact
• Expenses
• Billing
2. Unstructured Data: Unstructured data in Big Data consists of multitudes of unstructured files (images, audio, logs and video). This form of data is classified as intricate data because of its unfamiliar structure and relatively huge size. A stark example of unstructured data is the output returned by ‘Google Search’ or ‘Yahoo Search’.

3. Semi-structured Data: In Big Data, semi-structured data is a combination of structured and unstructured data. It has some features of structured data but contains information that does not adhere to any formal structure of data models or a relational database. Examples of semi-structured data include XML and JSON.
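The three types can be contrasted in a few lines of Python using only the standard library. This is an illustrative sketch with made-up data: an in-memory SQLite table stands in for structured data (using a few of the fields listed above), JSON records with differing fields stand in for semi-structured data, and a short free-text review stands in for unstructured data.

```python
import json
import re
import sqlite3
from collections import Counter

# --- Structured: a fixed schema, every row has the same typed columns ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, address TEXT, age INTEGER, expenses REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "12 Park Street, Pune", 34, 1520.50),
     (2, "7 Lake Road, Mumbai", 27, 980.00),
     (3, "3 Hill View, Nagpur", 45, 2310.75)],
)
# Because the schema is known, a declarative query answers the question directly.
over_30 = conn.execute("SELECT COUNT(*), SUM(expenses) FROM customers WHERE age > 30").fetchone()

# --- Semi-structured: JSON records describe the same entity but carry different fields ---
orders = json.loads(
    '[{"id": 1, "item": "Phone", "price": 14999},'
    ' {"id": 2, "item": "Cover", "tags": ["accessory", "sale"]}]'
)
# The set of fields has to be discovered from the data itself.
all_fields = sorted(set().union(*(order.keys() for order in orders)))

# --- Unstructured: free text has no schema; some structure is imposed by counting words ---
review = "Big data is big. Data about data is still data."
word_counts = Counter(re.findall(r"[a-z]+", review.lower()))

if __name__ == "__main__":
    print(over_30)
    print(all_fields)
    print(word_counts.most_common(1))
```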

Banking

Since a massive amount of data is gushing in from innumerable sources, banks need to find uncommon and unconventional ways to manage big data. It is also essential to examine customer requirements, render services according to their specifications, and reduce risks while sustaining regulatory compliance. Financial institutions turn to Big Data Analytics to solve this problem.

Government

Government agencies use Big Data for many purposes, such as running agencies, managing utilities, dealing with traffic jams, or limiting the effects of crime. However, alongside these benefits of Big Data, the government must also address concerns about transparency and privacy.

• Aadhaar Card: The Indian government holds records of all 1.21 billion citizens. This huge dataset is stored and analyzed to find out several things, such as the number of young people in the country, on the basis of which schemes are designed to reach the maximum population. Such data can’t be stored in a traditional database, so it is stored and analyzed using various Big Data Analytics tools.
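The kind of aggregation described above, such as counting young people, reduces to a filter followed by a group-and-count. The sketch below runs it on a handful of made-up `(state, age)` pairs; the 15 to 29 age band, the `youth_by_state` function and the data are assumptions for illustration. At national scale the same logic would run on a distributed engine such as those listed under the tools for Big Data analytics, not in a single Python process.

```python
from collections import Counter
from typing import Iterable, Tuple

YOUTH_MIN_AGE, YOUTH_MAX_AGE = 15, 29  # assumed definition of "youth", inclusive


def youth_by_state(residents: Iterable[Tuple[str, int]]) -> Counter:
    """Count residents whose age lies in the youth band, grouped by state."""
    return Counter(
        state for state, age in residents if YOUTH_MIN_AGE <= age <= YOUTH_MAX_AGE
    )


if __name__ == "__main__":
    sample = [("Maharashtra", 22), ("Maharashtra", 41), ("Kerala", 17),
              ("Kerala", 29), ("Bihar", 12)]
    print(youth_by_state(sample))
```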

Education

Big Data has a vital impact on students, school systems and curricula. By interpreting big data, institutions can track students’ growth, identify at-risk students, and achieve an improved system for evaluating and assisting principals and teachers.

• Example: The education sector holds a lot of information concerning curricula, students and faculty. This information is analyzed to gain insights that can improve the operational efficiency of educational organizations. Collecting and analyzing information about students, such as attendance, test scores, grades and other issues, involves a lot of data. Big data provides a progressive framework in which this data can be stored and analyzed, making it easier for institutes to work with.
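Identifying at-risk students, as mentioned above, can start from simple threshold rules over attendance and test scores. In this sketch the `StudentRecord` fields, the 75% attendance and 40-mark score thresholds, and the `at_risk` function are all hypothetical; an institution would choose its own criteria, possibly learned from historical data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StudentRecord:
    student_id: str
    attendance_pct: float  # share of classes attended, 0 to 100
    avg_score: float       # average test score, 0 to 100


def at_risk(students: List[StudentRecord],
            min_attendance: float = 75.0,
            min_score: float = 40.0) -> List[str]:
    """Return IDs of students whose attendance or average score is below the (assumed) thresholds."""
    return [s.student_id for s in students
            if s.attendance_pct < min_attendance or s.avg_score < min_score]


if __name__ == "__main__":
    roster = [
        StudentRecord("S1", 92.0, 78.0),
        StudentRecord("S2", 61.0, 55.0),
        StudentRecord("S3", 88.0, 35.0),
    ]
    print(at_risk(roster))
```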

Big Data in Healthcare

Big Data is being used extensively in healthcare. Its use includes collecting data, analyzing it and leveraging it for customers. Patients’ clinical data is also too complex to be handled or understood by traditional systems. Because big data can be processed with Machine Learning algorithms and by Data Scientists, tackling such huge data becomes manageable.

• Example: Nowadays, doctors rely heavily on patients’ clinical records, which means that a lot of data needs to be gathered, and for many different patients. Old or traditional data storage methods cannot store this data. Since a large amount of data comes from different sources in various formats, the need to handle it has increased, which is why the Big Data approach is needed.
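The difficulty described above, data arriving from different sources in different formats, can be illustrated by merging a CSV export of lab results with JSON visit records into one record per patient. The formats, field names and `merge_patient_data` helper are hypothetical; real clinical systems use their own standards and would need far more validation.

```python
import csv
import io
import json
from collections import defaultdict
from typing import Any, Dict

# Hypothetical exports from two separate systems, in two different formats.
LAB_RESULTS_CSV = "patient_id,test,value\nP1,glucose,110\nP2,glucose,95\nP1,hba1c,6.1\n"
VISITS_JSON = '[{"patient_id": "P1", "date": "2024-01-10"}, {"patient_id": "P3", "date": "2024-02-02"}]'


def merge_patient_data(lab_csv: str, visits_json: str) -> Dict[str, Dict[str, Any]]:
    """Combine CSV lab results and JSON visit records into one record per patient ID."""
    patients: Dict[str, Dict[str, Any]] = defaultdict(lambda: {"labs": {}, "visits": []})
    for row in csv.DictReader(io.StringIO(lab_csv)):
        patients[row["patient_id"]]["labs"][row["test"]] = float(row["value"])
    for visit in json.loads(visits_json):
        patients[visit["patient_id"]]["visits"].append(visit["date"])
    return dict(patients)


if __name__ == "__main__":
    for pid, record in sorted(merge_patient_data(LAB_RESULTS_CSV, VISITS_JSON).items()):
        print(pid, record)
```

Note that each source covers a different subset of patients (P2 has no visits, P3 has no lab results), which is typical when integrating heterogeneous data.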

E-commerce

Maintaining customer relationships is the most important concern in the e-commerce industry. E-commerce websites use different marketing ideas to retail their merchandise to their customers, manage

• Flipkart: Flipkart is a huge e-commerce website that handles a lot of traffic daily. But when there is a pre-announced sale on Flipkart, traffic grows exponentially and can crash the website. To handle this kind of traffic and data, Flipkart uses Big Data, which helps in organizing and analyzing the data for further use.

Social Media

Social media is currently considered the largest data generator. Statistics show that more than 500 terabytes of new data are generated in social media databases every day, particularly in the case of Facebook. This data mainly consists of videos, photos, message exchanges, etc. A single activity on any social media site generates a lot of data, which is stored and processed whenever required. Since the stored data runs into terabytes, processing it with legacy systems would take a long time. Big Data is a solution to this problem.
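A quick calculation shows what 500 terabytes per day means as a sustained ingest rate. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), and `daily_volume_to_rate` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_to_rate(terabytes_per_day: float) -> float:
    """Convert a daily data volume in decimal terabytes into a sustained rate in gigabytes per second."""
    bytes_per_day = terabytes_per_day * 10**12  # 1 TB = 10^12 bytes (decimal units assumed)
    return bytes_per_day / SECONDS_PER_DAY / 10**9  # 1 GB = 10^9 bytes


if __name__ == "__main__":
    rate = daily_volume_to_rate(500)
    print(f"500 TB/day is about {rate:.2f} GB/s")
```

Under these assumptions, 500 TB per day works out to roughly 5.79 GB arriving every second, around the clock, which is far beyond what a single legacy server is typically provisioned to ingest and process.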

What is Big Data Analytics?

Big Data Analytics examines large and varied types of data to uncover hidden patterns, insights and correlations. It is helping large companies facilitate their growth and development, and it mainly involves applying various data mining algorithms to a given dataset.
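Finding correlations, one of the goals named above, can be illustrated with the Pearson correlation coefficient. The sketch below computes it from scratch in Python on made-up advertising and sales figures; in practice this would usually run through a statistics library or a distributed engine, and `pearson` is a hypothetical helper name.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient: covariance divided by the product of standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length, at least 2 values each")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    ad_spend = [10, 20, 30, 40, 50]  # hypothetical weekly ad spend
    sales = [12, 24, 33, 48, 55]     # hypothetical weekly sales
    print(round(pearson(ad_spend, sales), 3))
```

A value near +1 or -1 signals a strong linear relationship worth investigating; it does not by itself establish cause and effect.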

How is Big Data Analytics used today?

Big Data Analytics is used in several industries to allow organizations and companies to make better decisions, as well as to verify or disprove existing theories or models. The focus of data analytics lies in inference, which is the process of deriving conclusions based solely on what the researcher already knows.

Tools for Big Data Analytics

• Apache Hadoop
Hadoop is a framework that allows you to store big data in a distributed environment and process it in parallel.
• Apache Pig
Apache Pig is a platform used for analyzing large datasets by representing them as data flows. Pig is designed to provide an abstraction over MapReduce, which reduces the complexity of writing MapReduce programs.
• Apache HBase
Apache HBase is a multidimensional, distributed, open-source NoSQL database written in Java. It runs on top of HDFS, providing Bigtable-like capabilities for Hadoop.
• Apache Spark
Apache Spark is an open-source, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

• Talend
Talend is an open-source data integration platform. It provides many services for enterprise application integration, data integration, data management, cloud storage, data quality and Big Data.
• Splunk
Splunk is an American company that produces software for monitoring, searching and analyzing machine-generated data through a web-style interface.
• Apache Hive
Apache Hive is a data warehouse system built on top of Hadoop and used for querying and analyzing structured and semi-structured data.
• Kafka
Apache Kafka is a distributed messaging system that was initially developed at LinkedIn and later became an Apache project. Kafka is agile, fast, scalable and distributed by design.
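Hadoop processes data with the MapReduce model, and Pig, as noted above, is an abstraction over it. The three phases can be simulated in a single Python process to show the idea: map each line to `(word, 1)` pairs, shuffle the pairs into groups by key, then reduce each group to a total. This is only a conceptual sketch; it does not use Hadoop's actual Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values emitted for the same key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data beats opinions"]))
```

On a real cluster the mappers and reducers run on many machines in parallel and the shuffle moves data across the network; the per-record logic, however, stays as simple as shown here.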

Difference between Traditional data and Big data

Traditional Data | Big Data
Traditional data is generated at the enterprise level. | Big data is generated outside the enterprise level.
Its volume ranges from gigabytes to terabytes. | Its volume ranges from petabytes to exabytes or zettabytes.
Traditional database systems deal with structured data. | Big data systems deal with structured, semi-structured and unstructured data.
Traditional data is generated per hour, per day or at longer intervals. | Big data is generated far more frequently, mainly every second.
Traditional data sources are centralized and managed in centralized form. | Big data sources are distributed and managed in distributed form.
Data integration is very easy. | Data integration is very difficult.
A normal system configuration is capable of processing traditional data. | A high-end system configuration is required to process big data.
The size of the data is very small. | The size is much larger than that of traditional data.
Traditional database tools are used to perform any database operation. | Special kinds of database tools are required to perform any database schema-based operation.
Normal functions can manipulate the data. | Special kinds of functions are needed to manipulate the data.
Its data model is strict-schema based and static. | Its data model is flat-schema based and dynamic.
Traditional data is stable, with known inter-relationships. | Big data is not stable, and its relationships are unknown.
Traditional data comes in a manageable volume. | Big data comes in huge volumes that become unmanageable.
It is easy to manage and manipulate the data. | It is difficult to manage and manipulate the data.
Its data sources include ERP transaction data, CRM transaction data, financial data, organizational data, web transaction data, etc. | Its data sources include social media, device data, sensor data, video, images, audio, etc.
