Lect 2 Big Data Lesson01
Objectives
By the end of this lesson, you will be able to:
Explain the need for Big Data
Define the concept of Big Data
Describe the basics and benefits of Hadoop
Need for Big Data
An often-cited estimate holds that 90% of the world's data was created in the last two years alone.
Structured formats, such as traditional databases, have limitations when handling large quantities of data. There is therefore a need for a suitable mechanism, such as Big Data technologies, to handle these growing quantities.
Big Data relies on three important aspects of data complexity (volume, velocity, and variety), as explained in the following image.
What is Big Data
Defining Big Data: Big Data is the term applied to data sets whose size is beyond the ability of the commonly used software tools to capture, manage, and process within a tolerable elapsed time.
Sources of Big Data:
● Web logs
● Sensor networks
● Social media
● Internet text and documents
● Internet pages
● Search index data
● Atmospheric science, astronomy, biochemistry, and medical records
● Scientific research
● Military surveillance
● Photography archives
Types of Data
Three types of data can be identified:
Unstructured Data
Data that does not have a pre-defined data model
E.g., text files
Semi-structured Data
Data that does not conform to a formal data model such as a relational table, but contains tags or markers that separate its elements
E.g., XML files
Structured Data
Data that is represented in a tabular format
E.g., databases
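To make the three categories concrete, the following minimal sketch pulls a person's name out of each kind of data using only the Python standard library. The sample records (`STRUCTURED`, `SEMI_STRUCTURED`, `UNSTRUCTURED`) and the helper functions are hypothetical and exist only for illustration; CSV text stands in for a database table.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured data (hypothetical sample): every row has the same columns,
# like a database table. CSV text stands in for the table here.
STRUCTURED = """id,name,city
1,Asha,Pune
2,Ben,Leeds
"""

# Semi-structured data (hypothetical sample): no fixed table schema, but tags
# mark each element. The second <user> has an <email> field the first lacks.
SEMI_STRUCTURED = """<users>
  <user id="1"><name>Asha</name></user>
  <user id="2"><name>Ben</name><email>ben@example.com</email></user>
</users>"""

# Unstructured data (hypothetical sample): free text with no data model.
UNSTRUCTURED = "Asha moved to Pune last year. Ben still lives in Leeds."


def read_structured(text):
    """Columns are fixed, so the name field is addressed directly by its header."""
    return [row["name"] for row in csv.DictReader(io.StringIO(text))]


def read_semi_structured(text):
    """Tags locate each value, even though the fields vary from record to record."""
    root = ET.fromstring(text)
    return [user.findtext("name") for user in root.findall("user")]


def read_unstructured(text):
    """No schema exists, so only a heuristic is possible: collect capitalised words."""
    return re.findall(r"\b[A-Z][a-z]+\b", text)


if __name__ == "__main__":
    print(read_structured(STRUCTURED))            # ['Asha', 'Ben']
    print(read_semi_structured(SEMI_STRUCTURED))  # ['Asha', 'Ben']
    print(read_unstructured(UNSTRUCTURED))        # ['Asha', 'Pune', 'Ben', 'Leeds']
```

The structured and semi-structured readers return exactly the names because a column header or a tag identifies them. The unstructured reader can only guess: its capitalised-word heuristic also picks up the city names, which is why unstructured data usually needs far more processing effort.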
Handling Limitations of Big Data
Introduction to Hadoop
History and Milestones of Hadoop
Hadoop originated from Nutch, an open-source search engine project designed to work over distributed network nodes. Yahoo was the first company to develop and use Hadoop as a core part of its system operations. Today, Hadoop is a core part of the systems at companies such as Facebook, LinkedIn, and Twitter.
Hadoop Milestones
Organizations Using Hadoop
Name of the organization | Cluster specifications | Uses
Quiz 1
a. Structured data
b. Semi-structured data
c. Unstructured data
d. Flexible-structure data
Answer: c.
Quiz 2
Answer: a.
Quiz 3
Answer: c.
Quiz 4
Answer: d.
Quiz 5
a. Volume
b. Velocity
c. Variety
d. Value
Answer: a.
Explanation: Volume in Big Data refers to the size of the data to be processed.
Quiz 6
Which of the following aspects of Big Data refers to the speed of the response to data requests generated by users?
a. Variety
b. Value
c. Velocity
d. Volume
Answer: c.
Explanation: Velocity in Big Data refers to the speed of the response to data requests generated by users.
Quiz 7
Which of the following aspects of Big Data refers to multiple data sources?
a. Variety
b. Value
c. Volume
d. Velocity
Answer: a.
Summary
Let us summarize the topics covered in this lesson:
● Big Data is the term applied to data sets whose size is beyond the ability
of the commonly used software tools to capture, manage, and process
within a tolerable elapsed time.
● Big Data relies on volume, velocity, and variety with respect to
processing.
● Data can be divided into three types: unstructured data, semi-structured
data, and structured data.
● Hadoop is a free, Java-based programming framework that supports the
processing of large data sets in a distributed computing environment.
● Hadoop is a software framework used by organizations like Facebook,