
INTRODUCTION TO BIG DATA

Presented By: Man Singh
Enrolment No.: 0198EX211025
Branch: CSE VII Sem
Subject: CS702(B)-Big Data
Introduction to Big Data
Big data refers to massive, complex datasets that traditional data processing applications cannot handle adequately. It encompasses a wide range of data types, including structured, unstructured, and semi-structured data, generated from various sources at high velocity and volume.
Key Characteristics of Big Data
1. Volume: The sheer amount of data being generated and collected, often in the petabytes or exabytes range.
2. Variety: The diverse data types, including structured, unstructured, and semi-structured data, from various sources.
3. Velocity: The speed at which data is being created, collected, and processed, often in real time or near real time.
4. Veracity: The trustworthiness and reliability of the data, ensuring its accuracy and consistency.
Benefits of Big Data
Improved Decision-Making: Leveraging data-driven insights to make more informed and strategic decisions.
Increased Operational Efficiency: Optimizing processes and workflows through data-based process improvements.
Enhanced Customer Experience: Personalized and targeted products/services based on customer behavior and preferences.
Challenges in Big Data Management
1. Data Storage: Managing the vast amount of data and ensuring scalable storage solutions.
2. Data Security: Protecting sensitive data and ensuring compliance with data privacy regulations.
3. Data Integration: Integrating diverse data sources and formats into a cohesive system.
Big Data Technologies and Tools
Hadoop: A framework for distributed processing of large datasets.
Apache Spark: A unified analytics engine for large-scale data processing (a minimal PySpark sketch follows this list).
Apache Kafka: A distributed streaming platform for building real-time data pipelines.
Tableau: Data visualization and business intelligence software.
Unit 2

Introduction to Hadoop
Hadoop is an open-source software framework designed to handle large-scale data processing and storage across a distributed computing environment. It provides a reliable, scalable, and fault-tolerant infrastructure for data-intensive applications.
What is Hadoop?
Distributed File System: Hadoop's Distributed File System (HDFS) stores and manages data across multiple servers, providing high availability and fault tolerance.
MapReduce: Hadoop's MapReduce framework processes large datasets in parallel, distributing the workload across a cluster of computers (see the word-count sketch after this list).
Ecosystem: Hadoop has a rich ecosystem of tools and technologies that extend its capabilities, such as Hive, Spark, and Kafka.
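To make the MapReduce model above concrete, the following word-count sketch is written for Hadoop Streaming, which lets ordinary Python scripts act as the mapper and reducer over lines read from standard input. The script name and invocation style are assumptions for illustration; on a real cluster the job would typically be submitted through the Hadoop Streaming JAR with HDFS input and output paths.

    #!/usr/bin/env python3
    # wordcount_streaming.py: an illustrative mapper/reducer pair for Hadoop Streaming.
    # Run as mapper:  python3 wordcount_streaming.py map
    # Run as reducer: python3 wordcount_streaming.py reduce
    # Hadoop Streaming feeds input on stdin, collects output from stdout,
    # and delivers keys to the reducer already sorted and grouped.
    import sys

    def mapper():
        # Emit "word<TAB>1" for every word on every input line.
        for line in sys.stdin:
            for word in line.strip().split():
                print(f"{word}\t1")

    def reducer():
        # Sum the counts for each word; input arrives sorted by key.
        current_word, current_count = None, 0
        for line in sys.stdin:
            word, count = line.rstrip("\n").split("\t")
            if word == current_word:
                current_count += int(count)
            else:
                if current_word is not None:
                    print(f"{current_word}\t{current_count}")
                current_word, current_count = word, int(count)
        if current_word is not None:
            print(f"{current_word}\t{current_count}")

    if __name__ == "__main__":
        mapper() if sys.argv[1] == "map" else reducer()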
Key Components of Hadoop
1. NameNode: The NameNode manages the file system namespace and coordinates access to files by clients.
2. DataNode: DataNodes store and manage the actual data blocks, providing redundancy and fault tolerance (a small HDFS client sketch follows this list).
3. ResourceManager: The ResourceManager allocates and manages the computational resources in the Hadoop cluster.
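To make the split between the NameNode (metadata) and the DataNodes (block storage) more tangible, the sketch below talks to HDFS through its WebHDFS interface using the third-party Python package hdfs (HdfsCLI). The host name, port, user, and paths are hypothetical and would need to match a real cluster; this is an assumption-laden illustration, not the only way to access HDFS.

    # Minimal HDFS client sketch using the third-party "hdfs" (HdfsCLI) package.
    # Host, port, user, and paths are hypothetical; adjust them to your cluster.
    from hdfs import InsecureClient

    # The client sends metadata operations to the NameNode's WebHDFS endpoint;
    # the actual block data is streamed to and from DataNodes.
    client = InsecureClient("http://namenode-host:9870", user="hadoop")

    # Write a small file into HDFS, overwriting it if it already exists.
    client.write("/user/hadoop/hello.txt", data=b"hello, hdfs\n", overwrite=True)

    # List a directory and read the file back.
    print(client.list("/user/hadoop"))
    with client.read("/user/hadoop/hello.txt") as reader:
        print(reader.read().decode("utf-8"))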
Hadoop Ecosystem and Applications
Big Data Analytics: Hadoop is widely used for analyzing large datasets, including web logs, sensor data, and social media information.
Data Warehousing: Hadoop's distributed storage and processing capabilities make it a popular choice for data warehousing and business intelligence applications.
Stream Processing: Hadoop's ecosystem includes tools like Spark Streaming and Kafka for real-time data processing and streaming analytics (see the Kafka producer sketch after this list).
Machine Learning: Hadoop's scalability and parallel processing features enable advanced machine learning and deep learning algorithms to be applied to big data.
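As a small illustration of the stream-processing point above, the sketch below publishes JSON events to a Kafka topic using the third-party kafka-python package; a Spark Streaming job or a Kafka consumer could then process these events in near real time. The broker address, topic name, and event fields are hypothetical placeholders.

    # Minimal Kafka producer sketch using the third-party kafka-python package.
    # Broker address, topic name, and event contents are hypothetical placeholders.
    import json
    import time

    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        # Serialize Python dicts to JSON bytes before sending.
        value_serializer=lambda event: json.dumps(event).encode("utf-8"),
    )

    # Publish a few example sensor readings to the "sensor-events" topic.
    for reading in range(5):
        event = {"sensor_id": "s-01", "value": reading, "ts": time.time()}
        producer.send("sensor-events", value=event)

    # Make sure all buffered messages are delivered before exiting.
    producer.flush()
    producer.close()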
Benefits and Challenges of Hadoop

Scalability: Hadoop's distributed architecture allows it to scale up or down to handle growing data volumes and processing needs.
Cost-Effectiveness: Hadoop runs on commodity hardware, making it a cost-effective solution for big data processing and storage.
Fault Tolerance: Hadoop's replication and failover mechanisms provide a high degree of fault tolerance and data reliability.
Complexity: The distributed nature of Hadoop and its ecosystem can make it challenging to set up, configure, and manage.
Thank you
