Big Data - Module 1
Data is a collection of facts, information, and statistics, and it can take various forms such as numbers, text, sound, images, or any other format.
Big Data
Data that causes problems when you try to store and process it is called Big Data.
Characteristics of Big Data
Volume of Big Data
• Semi-Structured Data
Partially organized but lacks a strict format
Examples:
JSON, XML, NoSQL databases, emails, sensor logs.
• Unstructured Data
No predefined format, difficult to process
Examples:
Social media posts, images, videos, PDFs, audio recordings.
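The following minimal Python sketch makes the difference between the two categories concrete. The sensor-log record and the social media post are hypothetical sample data. The JSON record carries its own field names, so fields can be read directly even though no strict schema is enforced. Another record might add or omit keys. The free-text post has no fields at all, so even a simple question such as "which hashtags appear?" needs ad hoc text processing.

```python
import json
import re

# Hypothetical semi-structured record: keys describe the values,
# but another record could add or omit fields (no strict schema).
sensor_log = '{"sensor_id": "s-17", "temp_c": 21.5, "tags": ["lab", "floor2"]}'
record = json.loads(sensor_log)

# Because the format is not strict, missing fields are read with .get().
humidity = record.get("humidity")  # None: this record has no humidity field

# Hypothetical unstructured text: there are no fields, so information
# has to be extracted with pattern matching or other text processing.
post = "Loving the new #BigData course! #Hadoop next week"
hashtags = re.findall(r"#(\w+)", post)

print(record["sensor_id"], record["temp_c"], humidity)  # s-17 21.5 None
print(hashtags)  # ['BigData', 'Hadoop']
```

Images, videos, and audio are harder still. Extracting anything from them usually requires specialized processing rather than a simple pattern.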
2002 – Google File System
2004 – Google MapReduce
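MapReduce splits a job into two phases. The map phase emits key/value pairs, and the reduce phase combines all values that share a key. The following single-process Python sketch uses the classic word-count example to illustrate that model only. It is not Hadoop's or Google's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. A real framework runs the map and reduce steps in parallel across many machines and performs the shuffle over the network.

```python
from collections import defaultdict

def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group emitted values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine every value for one key; for word count, sum the ones.
    return key, sum(values)

def word_count(lines):
    # Map each line, shuffle the pairs, then reduce each key group.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

if __name__ == "__main__":
    docs = ["big data needs big tools", "Hadoop processes big data"]
    print(word_count(docs))
```

Each `map_phase` call depends only on its own line, and each `reduce_phase` call depends only on one key's values. That independence is what lets the work be spread across a cluster.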
The ASF (Apache Software Foundation) does not provide support for Hadoop.
Instead, vendors offer support to the companies that need Hadoop and make money commercially by doing so:
Cloudera – ClouderaVM
Hortonworks – HDP Sandbox
IBM – BigInsights
Microsoft – HDInsight
AWS – EMR