Big Data Basics - Simple Notes
• Volume: The amount of data. Think of how much data is generated every second (social
media posts, website visits, sensor data, etc.).
• Velocity: The speed at which data is created, processed, and analyzed.
• Variety: The different types of data, such as structured (tables, rows) and unstructured
data (text, images, videos).
Some definitions add two more Vs: Veracity (data reliability) and Value (usefulness of the data).
b. Spark
c. NoSQL Databases
• Databases designed for handling unstructured data and data that doesn't fit neatly into
traditional relational databases.
• Examples:
o MongoDB: A document-based database.
o Cassandra: A wide-column database for handling large amounts of data across
many servers.
o HBase: A column-family store designed to scale across many machines.
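To make the difference between these data models concrete, here is a minimal sketch that uses plain Python dictionaries and lists as stand-ins. It does not use any real database client; the names `users_collection`, `users_table`, `find_by_city`, and `read_column` are made up for illustration.

```python
# Illustrative only: plain Python structures standing in for two NoSQL data models.
# No real database client is used here; all names are hypothetical.

# Document model (the style MongoDB uses): each record is a self-contained,
# nested document, and documents in one collection may have different fields.
users_collection = [
    {"_id": 1, "name": "Ana", "emails": ["ana@example.com"], "address": {"city": "Lisbon"}},
    {"_id": 2, "name": "Ben", "last_login": "2024-01-05"},  # different fields, same collection
]

# Wide-column / column-family model (the style Cassandra and HBase use):
# each row key maps to a sparse set of columns grouped into families.
users_table = {
    "user:1": {"profile": {"name": "Ana", "city": "Lisbon"}, "activity": {"logins": 12}},
    "user:2": {"profile": {"name": "Ben"}},  # missing columns are simply not stored
}


def find_by_city(collection, city):
    """Return the names of documents whose nested address.city matches."""
    return [doc["name"] for doc in collection if doc.get("address", {}).get("city") == city]


def read_column(table, row_key, family, column, default=None):
    """Look up a single cell by row key, column family, and column name."""
    return table.get(row_key, {}).get(family, {}).get(column, default)
```

Neither structure needs a fixed schema up front, which is the property that makes these models a fit for data that does not map neatly onto relational tables.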
a. Kafka
• A distributed streaming platform that allows you to build real-time data pipelines and
stream data to other systems.
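Streaming platforms of this kind (Apache Kafka is a well-known example) are commonly organized around an append-only log that each consumer reads at its own position. The sketch below imitates that idea in memory with a single process. The `TopicLog` class and its `publish` and `poll` methods are hypothetical and are not the API of any real client library.

```python
from collections import defaultdict


class TopicLog:
    """Hypothetical in-memory stand-in for one topic of a streaming platform."""

    def __init__(self):
        self._records = []                # append-only list of messages
        self._offsets = defaultdict(int)  # next offset to read, per consumer name

    def publish(self, message):
        """Append a message and return its offset in the log."""
        self._records.append(message)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return unread messages for this consumer and advance its offset."""
        start = self._offsets[consumer]
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


clicks = TopicLog()
for page in ["/home", "/cart", "/checkout"]:
    clicks.publish({"page": page})

first_batch = clicks.poll("analytics", max_records=2)  # first two events
second_batch = clicks.poll("analytics")                # the remaining event
audit_batch = clicks.poll("audit")                     # a new consumer starts at offset 0
```

Because each consumer tracks its own offset, several downstream systems can read the same stream independently, which is what makes this design useful for feeding data to other systems.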
b. Hive
• A data warehouse system built on top of Hadoop that allows for querying data using
SQL-like language (HiveQL).
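As a rough feel for what "SQL-like" means here, the sketch below runs a simple aggregate query. It uses Python's built-in `sqlite3` module as a local stand-in, not Hive itself; the `page_views` table and its rows are invented for the example. A query of this shape (SELECT, GROUP BY, ORDER BY) is the kind of statement HiveQL is designed to express, with Hive running it over data stored in Hadoop rather than in a local file.

```python
import sqlite3

# sqlite3 is only a local stand-in here; Hive would run a similar query
# over data stored in Hadoop. Table name and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [(1, "/home"), (2, "/home"), (1, "/cart"), (3, "/home")],
)

# Count views per page, most viewed first.
query = """
    SELECT page, COUNT(*) AS view_count
    FROM page_views
    GROUP BY page
    ORDER BY view_count DESC
"""
views_per_page = conn.execute(query).fetchall()
conn.close()
```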
c. Pig
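• A high-level platform for creating MapReduce programs in Hadoop. It uses Pig Latin,
a data-flow scripting language whose operators (such as FILTER, GROUP, and JOIN)
resemble SQL operations but are written as a sequence of steps.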
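To show what a MapReduce program does under the hood, here is a minimal single-process word count in Python. It is not Pig Latin and does not use Hadoop; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are chosen for illustration only. On a real cluster the same three phases would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases together."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


counts = word_count(["big data is big", "data moves fast"])
```

Writing these phases by hand for every job is tedious, which is the motivation for higher-level tools that generate them from a shorter script.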
• Healthcare: Analyzing patient data for personalized medicine and disease prediction.
• Finance: Detecting fraud, predicting market trends, and risk management.
• Retail: Customer behavior analysis, inventory management, and targeted marketing.
• Government: Analyzing public data for policy making, crime prevention, and smart
cities.
This is a high-level summary of Big Data concepts.