Rev. Lecture 1 PPT2
UNIT 1
Introduction – distributed file system
Lecture 1
Presented By:
Pooja Varshney
Assistant Professor, CEIT
Learning Objective
• Understand the architecture, purpose, and functionality of
distributed file systems in managing large-scale data.
• Learn the concept of Big Data, its significance, and how it
influences decision-making across industries.
• Identify the Four Vs (Volume, Velocity, Variety, Veracity) and
key drivers propelling the growth of Big Data.
• Explore analytics techniques and real-world applications of
Big Data in domains like healthcare, finance, and social media.
• Understand the MapReduce framework and implement
algorithms like Matrix-Vector Multiplication to process large
datasets efficiently.
Learning Outcome
• Understanding Distributed File Systems: Learners will be able to
explain the role and functionality of distributed file systems in
managing large datasets.
• Comprehending Big Data Concepts: Learners will articulate the
importance of Big Data and its impact on modern decision-making
processes.
• Analyzing the Four Vs and Drivers: Learners will evaluate the Four Vs of
Big Data and identify the key factors driving its growth and adoption.
• Applying Big Data Analytics: Learners will demonstrate the ability to
analyze Big Data and recognize its applications across various industries.
• Implementing MapReduce Algorithms: Learners will design and
execute algorithms, such as Matrix-Vector Multiplication, using the
MapReduce paradigm to solve complex data problems.
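As a preview of the Matrix-Vector Multiplication outcome, here is a minimal single-machine sketch of the MapReduce idea in Python. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase) and the (row, column, value) input layout are assumptions made for illustration, and the vector is assumed to fit in memory on every mapper.

```python
from collections import defaultdict

def map_phase(matrix_entries, vector):
    """Map: for each matrix entry (i, j, m_ij), emit (i, m_ij * v_j)."""
    for i, j, m_ij in matrix_entries:
        yield i, m_ij * vector[j]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key (the row index)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the partial products of each row to get one output entry."""
    return {i: sum(values) for i, values in groups.items()}

def matrix_vector_multiply(matrix_entries, vector):
    return reduce_phase(shuffle(map_phase(matrix_entries, vector)))

# Illustrative input: M = [[1, 2], [3, 4]] stored as (row, column, value), v = [5, 6]
entries = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
result = matrix_vector_multiply(entries, [5, 6])
print(result)  # {0: 17, 1: 39}
```

In a real cluster the map calls would run in parallel on different blocks of the matrix, and the framework would perform the shuffle step between machines.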
Distributed File System (DFS)
• Which of the following best describes the term "fault tolerance" in a DFS?
A. Ability to detect errors in file transfers
B. Ability to continue functioning despite node failures
C. The process of replicating data
D. Ensuring fast write operations
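To make the question concrete, the toy model below sketches how a replicated store can behave when nodes fail. It is an in-memory illustration only, not how HDFS or any real DFS is implemented; the class TinyDFS, its methods, and the replication factor of 2 are all hypothetical.

```python
class TinyDFS:
    """Toy in-memory model of block replication (hypothetical, for illustration)."""

    def __init__(self, node_names, replication=2):
        self.nodes = {name: {} for name in node_names}  # node -> {block_id: data}
        self.alive = set(node_names)
        self.replication = replication

    def write(self, block_id, data):
        # Store a copy of the block on the first `replication` live nodes.
        targets = sorted(self.alive)[: self.replication]
        for name in targets:
            self.nodes[name][block_id] = data
        return targets

    def fail(self, name):
        # Simulate a node crash: the node stops serving its blocks.
        self.alive.discard(name)

    def read(self, block_id):
        # Serve the block from any live node that still holds a replica.
        for name in sorted(self.alive):
            if block_id in self.nodes[name]:
                return self.nodes[name][block_id]
        raise IOError(f"no live replica of block {block_id!r}")

dfs = TinyDFS(["n1", "n2", "n3"])
dfs.write("b1", b"hello")
dfs.fail("n1")
print(dfs.read("b1"))  # b'hello'
```

Compare what the read call does after one node fails with the four options in the question.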
• https://www.youtube.com/watch?v=c3loR2znLDI
• https://userweb.ucs.louisiana.edu/~vvr3254/CMPS598/Notes/Matrix-Vector%20Multiplication%20by%20MapReduce-v2.pdf?utm_source=chatgpt.com
• https://www.databricks.com/glossary/hadoop-distributed-file-system-hdfs
Reference Books
• "Hadoop: The Definitive Guide" by Tom White
• "Big Data: Principles and Best Practices of Scalable
Real-Time Data Systems" by Nathan Marz and James
Warren
• "Mining of Massive Datasets" by Jure Leskovec, Anand
Rajaraman, and Jeffrey Ullman
• "Big Data Analytics: From Strategic Planning to
Enterprise Integration with Tools, Techniques, NoSQL,
and Graph" by David Loshin
• "Data-Intensive Text Processing with MapReduce" by
Jimmy Lin and Chris Dyer
THANK YOU