Unit 1.1 - Introduction To Big Data Analytics
Data
• Raw facts
Information
• Processed data
Insight / Knowledge
• Deep understanding
Digital Data
Structured
Semi-structured
Unstructured
1. Structured Digital Data
• Data is called structured when it conforms to a pre-defined
schema or structure
• E.g.: RDBMS
• Ease of working with Structured Data
• Adding, modifying, and deleting data
• Security
• Indexing
• Scalability
• Transaction processing
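The points above can be illustrated with a small sketch using Python's built-in sqlite3 module as a stand-in for an RDBMS. The table name, column names, and sample rows are hypothetical and chosen only for illustration; the sketch shows a pre-defined schema, add/modify/delete operations, an index, and a transaction.

```python
import sqlite3


def demo_structured():
    """Show schema, insert/update/delete, indexing, and a transaction."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database

    # Pre-defined schema: every row must fit these columns and constraints.
    conn.execute(
        "CREATE TABLE employee ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT NOT NULL)"
    )
    # Indexing: speeds up lookups by department.
    conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

    # Transaction: commits if the block succeeds, rolls back on an exception.
    with conn:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (1, "Asha", "Sales"))
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (2, "Ravi", "HR"))
        conn.execute("UPDATE employee SET dept = ? WHERE id = ?", ("Finance", 2))
        conn.execute("DELETE FROM employee WHERE id = ?", (1,))

    rows = conn.execute(
        "SELECT id, name, dept FROM employee ORDER BY id"
    ).fetchall()

    # A row that violates the schema (missing dept) is rejected.
    try:
        with conn:
            conn.execute("INSERT INTO employee (id, name) VALUES (?, ?)", (3, "Meena"))
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True

    conn.close()
    return rows, rejected


rows, rejected = demo_structured()
print(rows, rejected)
```

Because the schema is fixed up front, the database itself can refuse malformed rows, which is part of what makes structured data easy to work with.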
2. Semi-structured Digital Data
• Also referred to as Self-describing Structure
• Data stored using Markup tags
• E.g.:
• HTML – HyperText Markup Language
• XML – eXtensible Markup Language
• JSON – JavaScript Object Notation
• There is no separation between the data and the schema
• Entities belonging to the same class need not have the same
attributes
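As a minimal sketch of the last two points, the JSON below holds two records of the same class whose attributes differ. The records and field names are made up for illustration. Each record carries its own attribute names, so no separate schema is needed to read it.

```python
import json

# Two hypothetical "customer" records; the attribute names travel with the data.
raw = """[
  {"type": "customer", "name": "Asha", "email": "asha@example.com"},
  {"type": "customer", "name": "Ravi", "phone": "555-0100", "city": "Pune"}
]"""

records = json.loads(raw)


def attribute_sets(items):
    """Return the sorted attribute names of each record."""
    return [sorted(item.keys()) for item in items]


attrs = attribute_sets(records)
# Attributes shared by both records.
common = sorted(set(records[0]) & set(records[1]))

print(attrs)
print(common)
```

Both records belong to the same class, yet only "name" and "type" are shared; a relational table would need nullable columns or a schema change to hold both.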
3. Unstructured Digital Data
• Data that does not conform to any pre-defined data model
• E.g.:
• Text messages
• Log files on a server
• Email
• Web pages
• Images
• Audio
• Video
• Free-form text
• Social media posts
• Chats
• Document
How much data is structured?
• Structured data: 20%
• Unstructured data: 80%
Introduction to Big Data
• Big Data refers to massive datasets, collected from a variety
of data sources to meet business needs, that are analyzed to
reveal new insights for optimized decision-making.
[Figure: Big Data Analytics, Mobile Computing, Cloud Computing; highly available data storage and computational facility]
Big Data Characteristics (5 Vs)
1. Volume
2. Velocity
3. Variety
4. Veracity
5. Value
1. Volume
• The “big” in big data is a relative term
• E.g.: What is big for a small company may be quite small for a government
[Figure: real-time data; petabytes (~10^15 bytes)]
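To put the petabyte scale in perspective, here is a small sketch that converts a raw byte count into decimal (power-of-1000) units. The function name and unit list are hypothetical helpers, not part of any library.

```python
# Decimal (SI) storage units, each 1000 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_readable(num_bytes: int) -> str:
    """Format a byte count using the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the largest unit available.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


print(human_readable(10**15))     # one petabyte
print(human_readable(2_500_000))  # a few megabytes
```

What counts as "big" depends on which of these units an organization's systems are built to handle.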
Volume     Can handle only small volumes       Can handle much larger volumes
Velocity   Can handle only low-velocity data   Can handle high-velocity data