Unit 1 Big Data Analysis

Big Data refers to large and complex data sets that require advanced tools for processing and analysis. It is characterized by the 5Vs: Volume, Velocity, Variety, Veracity, and Value, and features such as scalability and real-time analysis. Various technologies like Hadoop and Spark support Big Data applications across sectors like healthcare, retail, and banking.

Uploaded by Mahesh veera

Unit 1: Introduction to Big Data

1. Definition of Big Data

Big Data refers to large and complex data sets that cannot be processed efficiently using traditional data processing tools. It involves capturing, storing, managing, and analyzing huge volumes of data to extract valuable insights.

2. Characteristics of Big Data (5Vs)

1. Volume - Large amounts of data (terabytes to zettabytes).

2. Velocity - High speed of data generation (real-time or near-real-time).

3. Variety - Different data formats (text, images, videos, logs, etc.).

4. Veracity - Data quality, accuracy, and trustworthiness.

5. Value - Usefulness of the data in decision-making.
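To get a feel for the Volume range quoted above, the following sketch converts between storage units. It assumes decimal (SI) prefixes, where each step up is a factor of 1,000; the helper name `convert` is hypothetical and exists only for this illustration.

```python
# Decimal (SI) storage units. This is an assumption: binary units
# (KiB, MiB, ...) use factors of 1,024 and give slightly different numbers.
UNITS = {
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}

def convert(amount, from_unit, to_unit):
    """Convert an amount of data between two decimal storage units."""
    return amount * UNITS[from_unit] / UNITS[to_unit]

# How many terabytes fit in one zettabyte?
print(convert(1, "ZB", "TB"))
```

Under these assumptions one zettabyte is a billion terabytes, which is why the two ends of the Volume range call for very different storage designs.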

3. Features of Big Data

- Scalability - Systems must scale horizontally to manage large data sets.

- Fault Tolerance - Systems should handle failures gracefully.

- Distributed Storage - Data is stored across multiple machines.

- Parallel Processing - Tasks are executed concurrently across nodes.

- Real-Time Analysis - Insights can be extracted in real time or near real time.

- Cost Efficiency - Uses commodity hardware and open-source tools like Hadoop.

- Flexibility - Supports multiple data formats and sources.

- Data Redundancy - Ensures data availability through replication.
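The distributed storage, fault tolerance, and data redundancy features above can be sketched together in a few lines. This is a toy model, not the placement policy of Hadoop or any other real system: the node names, the block names, the replication factor of 3, and the helpers `place_block` and `is_available` are all assumptions made for illustration.

```python
import hashlib

def place_block(block_id, nodes, replication=3):
    """Pick `replication` distinct nodes for a block (hypothetical policy)."""
    # A stable hash means the same block always maps to the same nodes.
    digest = hashlib.md5(block_id.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    count = min(replication, len(nodes))
    # Take consecutive nodes from the hashed start position, wrapping around.
    return [nodes[(start + i) % len(nodes)] for i in range(count)]

def is_available(replicas, failed_nodes):
    """A block stays readable while at least one replica is on a live node."""
    return any(node not in failed_nodes for node in replicas)

nodes = ["node-1", "node-2", "node-3", "node-4", "node-5"]
placement = {b: place_block(b, nodes) for b in ["block-a", "block-b", "block-c"]}

# Simulate two simultaneous node failures. With 3 replicas on distinct
# nodes, every block still has at least one live copy.
failed = {"node-1", "node-2"}
for block, replicas in placement.items():
    print(block, replicas, "available:", is_available(replicas, failed))
```

Adding more entries to `nodes` spreads blocks over more machines without changing the code, which is the idea behind horizontal scaling.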

4. Types of Digital Data

- Structured - Data with a fixed schema, organized in rows and columns (e.g., tables in SQL databases).

- Semi-Structured - Self-describing data with tags or keys but no rigid schema (e.g., XML, JSON).

- Unstructured - No predefined format (e.g., videos, emails, social media posts).
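The difference between the three types shows up in how much work is needed before the data can be queried. The sketch below uses only the Python standard library; the sample records are made up for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, and every row follows the same schema.
structured = io.StringIO("id,name,amount\n1,Asha,250\n2,Ravi,400\n")
rows = list(csv.DictReader(structured))
total = sum(int(row["amount"]) for row in rows)

# Semi-structured: keys describe the data, but fields can vary per record.
semi_structured = '[{"id": 1, "tags": ["new"]}, {"id": 2, "email": "r@example.com"}]'
records = json.loads(semi_structured)
all_keys = sorted({key for record in records for key in record})

# Unstructured: free text with no schema. Any structure has to be derived,
# here with a naive whitespace split and punctuation strip.
unstructured = "Great service, will order again. Delivery was slow though."
words = [w.strip(".,").lower() for w in unstructured.split()]

print(total, all_keys, len(words))
```

The structured rows can be summed directly, the JSON records first need their varying keys discovered, and the free text needs tokenizing before any analysis can start.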

5. Traditional vs Big Data Systems


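Traditional Systems vs Big Data Systems:

- Storage - Traditional: Centralized; Big Data: Distributed

- Processing - Traditional: Batch; Big Data: Batch and Real-Time

- Data Types - Traditional: Structured; Big Data: All types

- Scalability - Traditional: Vertical; Big Data: Horizontal

- Cost - Traditional: Expensive; Big Data: Cost-effective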

6. Technologies Supporting Big Data

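- Hadoop - Open-source framework for distributed storage and processing.

- MapReduce - Programming model for parallel data processing.

- Spark - In-memory processing framework, typically faster than disk-based MapReduce.

- NoSQL - Non-relational databases such as MongoDB and Cassandra, which offer flexible data models.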
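The MapReduce programming model listed above can be illustrated with the classic word count. This is a single-process sketch in plain Python, not Hadoop's API: the function names `map_phase`, `shuffle_phase`, and `reduce_phase` and the sample input are assumptions chosen for illustration. In a real cluster, each input split would be mapped on a different node and the shuffle would move data over the network.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key, here by summing counts."""
    return {key: sum(values) for key, values in groups.items()}

# Two input splits stand in for data stored on two different machines.
splits = ["big data needs big tools", "data tools scale out"]
mapped = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```

Because each call to `map_phase` depends only on its own split, the map step can run on many nodes at once, which is what makes the model suitable for parallel processing.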

7. Applications of Big Data

- Healthcare - Patient analytics, disease prediction.

- Retail - Customer behavior prediction.

- Banking - Fraud detection, risk analysis.

- Government - Smart cities, public safety.

- Social Media - Trend analysis, sentiment mining.
