DBMS Final
Presented By
Amr Khaled Salem 4 12311117
Devyani Anande 5 12310148
Tapan Belapurkar 7 12311797
C E Dhakshesh 11 12311915
Vishwakarma Institute of Technology
Department of Instrumentation & Control
Overview
• Introduction
• Objective of the Seminar
• Need for the Seminar
• Motivation for the Seminar
• Literature Review
• Components of GFS
• Block Diagram
• Applications
• Case Studies
• YouTube
• Technical Comparisons
• Advantages of GFS
• Advantages over Traditional File Systems
• Integration with DBMS
• Conclusion
• References
Introduction
•Google File System (GFS) is a distributed
file system designed by Google to handle
large-scale data storage and processing.
•To explore how GFS integrates with Database Management Systems (DBMS) for storing
large data sets and enhancing performance.
•To discuss the role of distributed file systems in modern database architectures and
applications.
Need for the Seminar
•As data grows exponentially, traditional storage systems and DBMS face scalability
and performance challenges.
•Big Data, Cloud Computing, and Machine Learning applications require systems
like GFS for efficient data handling.
•Google relies heavily on GFS (and on its successor, Colossus), and other cloud and big data services rely on GFS-inspired systems such as HDFS for scalable and resilient storage.
Motivation for the Seminar
•To explore solutions that can handle massive volumes of data across distributed
systems, beyond the limitations of traditional databases.
•To understand how GFS ensures data integrity and availability, both of which are critical for
modern DBMS applications.
Literature Review
Author | Title | Publication/Year | Key Contributions
S. Ghemawat, H. Gobioff, and S.-T. Leung | "The Google File System" | ACM SOSP, 2003 | Introduced GFS, a scalable distributed file system for large-scale data processing; focused on fault tolerance and high throughput.
K. Shvachko et al. | "The Hadoop Distributed File System" | IEEE MSST, 2010 | Discussed how GFS influenced the Hadoop Distributed File System (HDFS); compared design elements such as metadata management.
T. Condie et al. | "MapReduce Online" | USENIX NSDI, 2010 | Discussed extensions to the MapReduce framework for online processing, built on Hadoop and its GFS-style file system, HDFS.
H. Gupta | "Insights from paper — The Google File System" | Medium, 2021 | Analyzed the key aspects of GFS, including its fault-tolerance mechanisms and scalability features, offering insights into its real-world applications.
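Architecture of GFS
Components of Google File System (GFS)
• Master Node: Manages all file system metadata and coordinates system-wide activities such as chunk placement, replication, and garbage collection.
• Key Responsibilities: Maintains the file namespace, the mapping from files to chunks, and the current location of every chunk replica, and logs metadata changes to keep the file system consistent.
• Chunk Servers: Store file data as fixed-size 64 MB chunks on local disks and serve that data to clients.
• Fault Tolerance: Each chunk is replicated on multiple chunk servers (three replicas by default), so data survives the failure of individual machines.
• Clients: Ask the master which chunk servers hold the data they need, then read and write that data directly with those chunk servers, so file data never flows through the master.
• Data Access: Clients can perform read, write, and append operations on data stored in GFS.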
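The division of work between master, chunk servers, and clients can be sketched in a few lines of Python. This is a simplified model, not Google's code: the `Master` class, the `locate` function, and the file and server names are hypothetical. It shows the client-side step described in the GFS paper, where the client turns a byte offset into a chunk index using the fixed chunk size, asks the master for that chunk's handle and replica locations, and would then read from a chunk server directly.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunk size, as in GFS


@dataclass
class Master:
    """Hypothetical in-memory stand-in for the master's metadata tables."""
    files: dict[str, list[int]] = field(default_factory=dict)      # file -> chunk handles, in order
    locations: dict[int, list[str]] = field(default_factory=dict)  # handle -> replica servers

    def lookup(self, path: str, chunk_index: int) -> tuple[int, list[str]]:
        """Return the chunk handle and replica locations for one chunk of a file."""
        handle = self.files[path][chunk_index]
        return handle, self.locations[handle]


def locate(master: Master, path: str, offset: int) -> tuple[int, list[str], int]:
    """Map a byte offset in a file to (chunk handle, replicas, offset inside the chunk)."""
    chunk_index = offset // CHUNK_SIZE   # which chunk of the file holds this byte
    chunk_offset = offset % CHUNK_SIZE   # position of the byte within that chunk
    handle, replicas = master.lookup(path, chunk_index)
    # A real client would now contact one of `replicas` directly;
    # the master only supplies metadata, never file contents.
    return handle, replicas, chunk_offset


master = Master(
    files={"/logs/web.log": [101, 102]},
    locations={101: ["cs1", "cs2", "cs3"], 102: ["cs2", "cs4", "cs5"]},
)
print(locate(master, "/logs/web.log", 70 * 1024 * 1024))
# (102, ['cs2', 'cs4', 'cs5'], 6291456)
```

Keeping the master out of the data path is what lets a single master serve many clients: it answers small metadata queries, while the bulk data transfer is spread across the chunk servers.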
Applications of GFS in DBMS
• Big Data Storage
• Data Analytics
• Backups
• Cloud-Based Storage
• Data Sharding for Performance Optimization
Case Studies / Real-world Applications
• Google Search: Uses GFS for storing search index data
• YouTube: Utilizes GFS for storing video data across multiple
servers
• Google Bigtable (NoSQL): A distributed storage system for managing
structured data; it relies on GFS to store its log and data files.
• Cloud Services: Google Cloud's storage builds on Colossus, the
successor to GFS, while other providers such as AWS run their own
distributed storage systems.
Applications of GFS in YouTube
• Massive Storage
• Replication
• High Throughput
• Metadata Management
• Scalability
• Load Balancing
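The Replication and Load Balancing points above depend on how the master decides where to put each chunk's replicas. The GFS paper lists three criteria for new replicas: prefer chunk servers with below-average disk utilization, limit the number of recent chunk creations on each server, and spread replicas across racks. The sketch below folds them into one simple heuristic. The `ChunkServer` fields, the `max_recent` threshold, sorting by utilization instead of filtering on the average, and the two-pass ordering are assumptions for illustration, not Google's or YouTube's actual policy.

```python
from dataclasses import dataclass


@dataclass
class ChunkServer:
    name: str
    rack: str
    disk_used: float       # fraction of disk space in use (0.0 to 1.0)
    recent_creations: int  # chunks created on this server recently


def place_replicas(servers: list[ChunkServer], replicas: int = 3,
                   max_recent: int = 5) -> list[str]:
    """Choose chunk servers for a new chunk using a simplified GFS-style heuristic."""
    # Skip servers that just received many new chunks: fresh chunks usually
    # attract heavy write traffic, so this spreads that load out.
    candidates = [s for s in servers if s.recent_creations < max_recent]
    # Prefer servers with emptier disks so utilization evens out over time.
    candidates.sort(key=lambda s: s.disk_used)

    chosen: list[str] = []
    racks_used: set[str] = set()
    # Pass 1: at most one replica per rack, so losing a rack cannot destroy every copy.
    for s in candidates:
        if len(chosen) < replicas and s.rack not in racks_used:
            chosen.append(s.name)
            racks_used.add(s.rack)
    # Pass 2: if there are fewer racks than replicas, fill the gaps with the
    # emptiest servers not already chosen.
    for s in candidates:
        if len(chosen) < replicas and s.name not in chosen:
            chosen.append(s.name)
    return chosen


servers = [
    ChunkServer("cs1", "rackA", 0.80, 0),
    ChunkServer("cs2", "rackA", 0.20, 0),
    ChunkServer("cs3", "rackB", 0.50, 0),
    ChunkServer("cs4", "rackB", 0.10, 9),  # emptiest, but busy with recent creations
    ChunkServer("cs5", "rackC", 0.60, 1),
]
print(place_replicas(servers))  # ['cs2', 'cs3', 'cs5']
```

Placing replicas on different racks also helps read load balancing: popular data, such as a frequently watched video, can be served from several racks at once.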
Technical Comparisons
HDFS (Hadoop Distributed File System):
Similarities:
• Both are designed for large-scale data storage and processing.
• Operate on commodity hardware to reduce costs.
Differences:
• GFS supports atomic record appends, letting many clients append to the same file concurrently, whereas HDFS follows a single-writer, multiple-reader model
tailored for batch processing in Hadoop ecosystems.
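• Both rely on a single metadata server (the GFS master and the HDFS NameNode). GFS replicates the master's operation log and checkpoints on multiple machines and offers read-only shadow masters,
while the original HDFS NameNode was a single point of failure until NameNode high availability was added in Hadoop 2.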
• Block sizes differ: GFS uses 64 MB chunks; HDFS uses 128 MB blocks by default (64 MB in early Hadoop releases).
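To see what the block-size difference means in practice, the arithmetic below counts how many chunks (GFS) or blocks (HDFS) a file needs: the file size divided by the block size, rounded up. Each chunk or block is a separate entry that the master or NameNode must track, so doubling the block size roughly halves that metadata for large files. The file sizes are arbitrary examples, and MB/TB are binary units here.

```python
MB = 1024 ** 2
GFS_CHUNK_SIZE = 64 * MB     # GFS chunk size
HDFS_BLOCK_SIZE = 128 * MB   # HDFS default block size (Hadoop 2 and later)


def blocks_needed(file_size: int, block_size: int) -> int:
    """Number of chunks/blocks a non-empty file occupies (ceiling division)."""
    return -(-file_size // block_size)


for label, size in [("100 MB file", 100 * MB), ("1 TB file", 1024 ** 4)]:
    print(label,
          "GFS chunks:", blocks_needed(size, GFS_CHUNK_SIZE),
          "HDFS blocks:", blocks_needed(size, HDFS_BLOCK_SIZE))
# 100 MB file GFS chunks: 2 HDFS blocks: 1
# 1 TB file GFS chunks: 16384 HDFS blocks: 8192
```

The last chunk of a file is usually only partly filled; GFS uses lazy space allocation, so that partial chunk does not consume a full 64 MB of disk.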
Advantages of GFS for DBMS
•Scalability: Capacity grows by adding commodity chunk servers, so a cluster can scale to petabytes of data and the databases built on it can grow with their data.
•High Throughput: Optimized for high sustained bandwidth on large sequential reads and appends, which suits big data processing, batch analytics, and database backups, and is critical for services like Google Search and YouTube.
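High throughput comes largely from parallelism: a large file is split into chunks spread over many chunk servers, so a client (or many MapReduce workers) can fetch different chunks from different machines at the same time, and aggregate bandwidth grows with the number of servers. The sketch below imitates that pattern inside one process. `CHUNK_STORE` and `read_chunk` are hypothetical stand-ins for network reads from chunk servers, and a thread pool issues the reads concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "chunk servers": chunk handle -> chunk bytes.
# In a real deployment each chunk would live on a different machine.
CHUNK_STORE = {
    1: b"alpha-",
    2: b"beta-",
    3: b"gamma",
}


def read_chunk(handle: int) -> bytes:
    """Stand-in for a network read from whichever chunk server holds `handle`."""
    return CHUNK_STORE[handle]


def read_file(handles: list[int], workers: int = 4) -> bytes:
    """Fetch all chunks of a file concurrently, then reassemble them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order even though the reads overlap in time.
        return b"".join(pool.map(read_chunk, handles))


print(read_file([1, 2, 3]))  # b'alpha-beta-gamma'
```

Threads suit this I/O-bound pattern; in a real client the time saved comes from waiting on many chunk servers at once rather than one after another.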
Integration with DBMS
Data Handling:
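•GFS stores large files of unstructured bytes, so a DBMS built on top of it keeps its own structured formats (such as logs and sorted data files) inside GFS files, as Bigtable does. This makes GFS a suitable backend for
modern data systems, including both SQL and NoSQL databases.
•GFS serves as the storage backbone, allowing a DBMS to store and retrieve
massive datasets efficiently.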
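Because GFS stores plain byte streams, a database on top of it has to impose its own record format. A common pattern, and the one the GFS paper recommends for record appends, is to make each record self-validating and uniquely identified: record append writes data atomically at least once, so a retried append can leave duplicates or padding that readers must skip. The sketch below models a DBMS write-ahead log as an in-memory byte string. The CRC-plus-JSON record layout and the function names are hypothetical, and a real system would write the bytes with GFS record append.

```python
import json
import zlib


def encode_record(record_id: str, payload: dict) -> bytes:
    """Serialize one log record with a CRC32 checksum so readers can detect damage."""
    body = json.dumps({"id": record_id, "data": payload}, sort_keys=True)
    checksum = zlib.crc32(body.encode())
    return f"{checksum}\t{body}\n".encode()


def read_log(raw: bytes) -> list[dict]:
    """Replay records, skipping padding, corrupt lines, and retried duplicates."""
    seen, records = set(), []
    for line in raw.decode(errors="replace").splitlines():
        checksum, sep, body = line.partition("\t")
        if not sep or not checksum.isdigit() or zlib.crc32(body.encode()) != int(checksum):
            continue  # padding or a torn/corrupt record
        record = json.loads(body)
        if record["id"] in seen:
            continue  # duplicate left behind by an at-least-once retry
        seen.add(record["id"])
        records.append(record)
    return records


log_bytes = (encode_record("txn-1", {"k": "a", "v": 1})
             + encode_record("txn-2", {"k": "b", "v": 2})
             + encode_record("txn-2", {"k": "b", "v": 2})  # retried append
             + b"\x00\x00\x00\n")                            # padding-like junk
print([r["id"] for r in read_log(log_bytes)])  # ['txn-1', 'txn-2']
```

Pushing duplicate and corruption handling into the reader keeps GFS itself simple, which matches the paper's approach of relaxed consistency plus application-level checks.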
Challenges and Limitations
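• Dependency on Google's Infrastructure: GFS is proprietary and built for Google's internal environment, so it is not available for general use.
• Limited Small-File Performance – Optimized for large files: every small file still occupies at least one chunk and its own metadata on the master, so millions of small files inflate master memory, and a small file read by many clients at once can turn its few chunk servers into hot spots.
• High Latency for Metadata Updates – Every metadata change goes through the single master and must be recorded in its replicated operation log before the master responds, so frequent metadata changes (such as creating many files) can slow down
operations.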
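The small-file problem can be quantified with the figures the GFS paper gives for master memory: less than 64 bytes of metadata per 64 MB chunk, and typically less than 64 bytes of namespace data per file. Treating both as upper bounds, the sketch compares 1 TB of data stored as a few large files with the same data stored as about a million 1 MB files. The assumption that every file occupies at least one chunk, and the chosen file sizes, are illustrative.

```python
MB = 1024 ** 2
GB = 1024 ** 3
CHUNK_SIZE = 64 * MB
BYTES_PER_CHUNK = 64  # upper bound on master metadata per chunk (GFS paper)
BYTES_PER_FILE = 64   # typical upper bound on namespace data per file (GFS paper)


def master_metadata_bytes(num_files: int, file_size: int) -> int:
    """Rough upper bound on master memory for `num_files` files of `file_size` bytes."""
    # Assumption: every file, however small, occupies at least one chunk.
    chunks_per_file = max(1, -(-file_size // CHUNK_SIZE))
    return num_files * (BYTES_PER_FILE + chunks_per_file * BYTES_PER_CHUNK)


large = master_metadata_bytes(16, 64 * GB)    # 1 TB as 16 files of 64 GB
small = master_metadata_bytes(1024 ** 2, MB)  # 1 TB as 1,048,576 files of 1 MB
print(large, small)  # 1049600 134217728
```

That is roughly 1 MB versus 128 MB of master memory for the same data. It is only an upper-bound estimate, but it shows why millions of small files strain a single master.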
Conclusion
•GFS is a powerful distributed file system designed for large-scale data storage, which can
greatly enhance the performance and scalability of Database Management Systems (DBMS).
•Cloud-native databases, big data analytics, and machine learning will continue to depend on GFS-style distributed file systems, such as GFS's successor Colossus and HDFS, for efficient storage and processing.
•As data continues to grow, integrating systems like GFS with DBMS will be essential for
managing complex data at scale in a fault-tolerant and efficient manner.
References
• S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google File System," in ACM Symposium
on Operating Systems Principles (SOSP), 2003.
• K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop Distributed File System,"
in IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010.
• A. Anand, "Google File System: Paper Review," Medium, Sep. 2020.
• T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears,
"MapReduce Online," in Proceedings of the 7th USENIX Symposium on Networked
Systems Design and Implementation (NSDI), 2010.
• H. Gupta, "Insights from paper — The Google File System," Medium, Jan. 2021.
• GeeksforGeeks, https://www.geeksforgeeks.org/google-file-system/