Big Data - Module 1

The document introduces Big Data, highlighting its characteristics such as volume, velocity, veracity, and variety, along with the types of data including structured, semi-structured, and unstructured. It discusses the development of Hadoop as a solution to manage large data sets and mentions various companies that provide commercial support for Hadoop. Additionally, it outlines the evolution of data processing technologies, starting from the Google File System in 2002 to the establishment of Hadoop in the mid-2000s.

Uploaded by naikrohith90
Copyright © All Rights Reserved
Introduction to Big Data, Hadoop and the Ecosystem


Mr.R.VIVEKANANDHAN (Senior SME)
What is Data?

Data is a collection of facts, information, and statistics, and it can be in various forms such as numbers, text, sound, images, or any other format.
Big Data
If you face problems storing and processing data with traditional systems, that data is called Big Data.
Characteristics of Big Data
Volume of Big Data

Data volume is the amount of data that is stored, processed, and transmitted. It is a critical factor in Big Data, where the volume of data is often so large that it requires specialized processing technologies.
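One common way to cope with large volume is to process data in small pieces instead of loading it all into memory. The following is a minimal Python sketch of that idea, not a Hadoop API; the function names and the tiny chunk size used in the demo are hypothetical choices for illustration.

```python
import io
from typing import BinaryIO, Iterator


def read_in_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield fixed-size chunks so the whole input never sits in memory at once."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        yield chunk


def count_records(stream: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Count newline-terminated records one chunk at a time.

    Counting newline bytes per chunk stays correct even when a record spans
    two chunks, because each newline byte lands in exactly one chunk.
    """
    return sum(chunk.count(b"\n") for chunk in read_in_chunks(stream, chunk_size))


if __name__ == "__main__":
    # A tiny in-memory stand-in for a large file; the small chunk size forces several reads.
    sample = io.BytesIO(b"a\nbb\nccc\n")
    print(count_records(sample, chunk_size=4))  # 3
```

Memory use depends on the chunk size, not on the file size, which is the property that matters when data volume grows.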
Velocity of Big Data

Velocity in Big Data refers to the speed at which data is generated, collected, processed, and analyzed. It determines how quickly insights can be extracted and used for decision-making.
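To make velocity concrete, the sketch below measures how many events arrived within the last few seconds using a sliding time window. It is a minimal illustration in plain Python; the class name and the window size are assumptions, and real stream-processing systems are far more elaborate.

```python
from collections import deque


class SlidingWindowRate:
    """Counts events seen in the last `window_seconds` and reports events per second."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()  # arrival times, oldest on the left

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window ending at `now`.
        while self._timestamps and self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()

    def record(self, timestamp: float) -> None:
        """Register one event; timestamps are assumed to arrive in increasing order."""
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def rate(self, now: float) -> float:
        """Events per second over the window ending at `now`."""
        self._evict(now)
        return len(self._timestamps) / self.window_seconds


if __name__ == "__main__":
    meter = SlidingWindowRate(window_seconds=1.0)
    for t in [0.0, 0.5, 1.0, 1.5, 2.0]:
        meter.record(t)
    print(meter.rate(now=2.0))  # 2.0 (only the events at 1.5 and 2.0 are in the window)
```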
Veracity of Big Data

Veracity is a Big Data characteristic related to consistency, accuracy, quality, and trustworthiness. Data veracity refers to the bias, noise, and abnormalities in data. It also covers incomplete data, errors, outliers, and missing values.
Variety of Big Data

Variety in Big Data refers to the different types, formats, and sources of data that organizations collect and process. Unlike traditional structured data (e.g., databases), Big Data includes structured, semi-structured, and unstructured data from diverse sources, making it complex to manage and analyze.
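Handling variety usually means mapping data from different sources onto one common shape. The sketch below assumes two hypothetical sources, a CSV export and a JSON-lines feed with made-up field names, and normalizes both into the same record format.

```python
import csv
import io
import json
from typing import Dict, List

Record = Dict[str, object]


def from_csv(text: str) -> List[Record]:
    """Structured source: a CSV export with hypothetical columns `user,amount`."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"user": row["user"], "amount": float(row["amount"]), "source": "csv"} for row in reader]


def from_json_lines(text: str) -> List[Record]:
    """Semi-structured source: one JSON object per line with a nested customer field."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        doc = json.loads(line)
        records.append({
            "user": doc["customer"]["name"],
            "amount": float(doc.get("total", 0)),  # a missing total is treated as 0 here
            "source": "json",
        })
    return records


if __name__ == "__main__":
    csv_text = "user,amount\nasha,12.5\nravi,3\n"
    json_text = '{"customer": {"name": "meena"}, "total": 7}\n'
    combined = from_csv(csv_text) + from_json_lines(json_text)
    for record in combined:
        print(record)
```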
Types of Data
• Structured Data
Well-organized, fits into databases
Examples:
SQL databases, spreadsheets, customer transaction records.

• Semi-Structured Data
Partially organized but lacks a strict format
Examples:
JSON, XML, NoSQL databases, emails, sensor logs.

• Unstructured Data
No predefined format, difficult to process
Examples:
Social media posts, images, videos, PDFs, audio recordings.
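The three types of data differ in how they can be accessed. In this sketch, structured rows are queried with SQL (using Python's built-in sqlite3 with an in-memory database), semi-structured JSON documents are read with fields that may be absent, and unstructured text needs pattern extraction. The table, field names, and hashtag pattern are hypothetical examples.

```python
import json
import re
import sqlite3
from typing import List, Optional, Sequence, Tuple


def structured_total(rows: Sequence[Tuple[str, float]]) -> float:
    """Structured: rows with a fixed schema, queried with SQL in a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE txn (customer TEXT NOT NULL, amount REAL NOT NULL)")
        conn.executemany("INSERT INTO txn (customer, amount) VALUES (?, ?)", rows)
        (total,) = conn.execute("SELECT SUM(amount) FROM txn").fetchone()
        return total if total is not None else 0.0  # SUM over no rows yields NULL
    finally:
        conn.close()


def semi_structured_emails(docs: Sequence[str]) -> List[Optional[str]]:
    """Semi-structured: JSON documents whose fields may be present or absent."""
    return [json.loads(doc).get("email") for doc in docs]


def unstructured_hashtags(post: str) -> List[str]:
    """Unstructured: free text, so useful fields must be extracted with patterns or models."""
    return re.findall(r"#(\w+)", post)


if __name__ == "__main__":
    print(structured_total([("asha", 10.0), ("ravi", 5.5)]))  # 15.5
    print(semi_structured_emails(['{"name": "x", "email": "x@example.com"}', '{"name": "y"}']))
    print(unstructured_hashtags("Loving #BigData and #Hadoop!"))  # ['BigData', 'Hadoop']
```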
• 2002 – Google File System
• 2004 – Google MapReduce
• Using these as a reference, Doug Cutting developed Hadoop around 2005-06
• Free access to the source code
• Problem => Solution => Community => License => Publish => Research & Development => Analysis => Commercialization
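The MapReduce model referenced above splits work into a map step that emits key-value pairs, a shuffle that groups pairs by key, and a reduce step that combines each group. The following is a single-process Python sketch of the classic word-count example; it only mimics the model and is not Hadoop's or Google's API.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["Big data needs big tools", "Hadoop handles big data"]))
    # {'big': 3, 'data': 2, 'hadoop': 1, 'handles': 1, 'needs': 1, 'tools': 1}
```

In a real cluster, map tasks run in parallel on different machines and the framework performs the shuffle; here everything runs in one process purely to show the data flow.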
How Companies Commercialize Open Source

• No commercial support from the ASF (Apache Software Foundation)
• Vendors offer support to the companies that need Hadoop
• They earn revenue by providing support for Hadoop
• Cloudera – Cloudera VM
• Hortonworks – HDP Sandbox
• IBM – BigInsights
• Microsoft – HDInsight
• AWS – EMR
