
DBMS Final

The seminar discusses the Google File System (GFS) and its significance in database management systems, emphasizing its role in handling large-scale data storage and processing. It covers GFS's architecture, components, applications, and advantages over traditional file systems, highlighting its integration with modern DBMS. The presentation also addresses challenges and limitations while concluding that GFS is essential for managing complex data efficiently as data continues to grow.

Seminar on:

Google File System and its Role in Database Management Systems

Presented by
Amr Khaled Salem 4 12311117
Devyani Anande 5 12310148
Tapan Belapurkar 7 12311797
C E Dhakshesh 11 12311915
Vishwakarma Institute of Technology
Department of Instrumentation & Control
Overview
• Introduction
• Objective of the Seminar
• Need for the Seminar
• Motivation for the Seminar
• Literature Review
• Components of GFS
• Block Diagram
• Applications
• Case Studies
• YouTube
• Technical Comparisons
• Advantages of GFS
• Advantages over Traditional File Systems
• Integration with DBMS
• Challenges and Limitations
• Conclusion
• References
Introduction
• Google File System (GFS) is a distributed file system designed by Google to handle large-scale data storage and processing.

• It is optimized for high-throughput, fault-tolerant data access across massive clusters of commodity machines, supporting Google’s services.

• A DBMS often requires robust storage for large datasets and must meet high-availability demands. GFS is a potential solution for storing unstructured, large-scale data.
Objective of the Seminar
• To provide an understanding of how GFS functions and its significance in data management.

• To explore how GFS integrates with Database Management Systems (DBMS) for storing large datasets and enhancing performance.

• To discuss the role of distributed file systems in modern database architectures and applications.
Need for the Seminar
• As data grows exponentially, traditional storage systems and DBMSs face scalability and performance challenges.

• Big Data, Cloud Computing, and Machine Learning applications require systems like GFS for efficient data handling.

• Google relies heavily on GFS, and other cloud and big data providers rely on GFS-inspired distributed file systems such as HDFS, for scalable and resilient storage.
Motivation for the Seminar
• To explore solutions that can handle massive volumes of data across distributed systems, beyond the limitations of traditional databases.

• To understand how GFS ensures data integrity and availability, which are critical for modern DBMS applications.

• To examine the role of GFS in improving the performance of data-heavy applications, from database backups to analytics.
Literature Review
Author | Title | Publication/Year | Key Contributions
S. Ghemawat, H. Gobioff, and S.-T. Leung | "The Google File System" | ACM SOSP, 2003 | Introduced GFS, a scalable distributed file system for large-scale data processing; focused on fault tolerance and high throughput.
K. Shvachko et al. | "The Hadoop Distributed File System" | IEEE MSST, 2010 | Discussed how GFS influenced the Hadoop Distributed File System (HDFS); compared design elements such as metadata management.
T. Condie et al. | "MapReduce Online" | USENIX NSDI, 2010 | Discussed extensions to the MapReduce framework for online processing, with considerations on GFS's role in supporting such systems.
A. Anand | "Google File System: Paper Review" | Medium, 2020 | Provided a comprehensive review of GFS, highlighting its architecture, design decisions, and impact on distributed storage systems.
H. Gupta | "Insights from paper — The Google File System" | Medium, 2021 | Analyzed the key aspects of GFS, including its fault tolerance mechanisms and scalability features, offering insights into its real-world applications.
Architecture of GFS
Components of Google File System (GFS)
• Master Node: Manages metadata, controls file system operations, and coordinates data management across the system.
  • Key Responsibilities: Keeps track of the file namespace, the mapping from files to chunks, and the locations of chunk replicas; maintains file system consistency.
• Chunk Servers: Store the actual data as fixed-size chunks (typically 64 MB each) and serve data to clients.
  • Fault Tolerance: Each chunk is replicated across multiple chunk servers (three replicas by default) for reliability.
• Clients: Ask the master node for metadata, then read and write file data directly with the chunk servers, so file data never flows through the master.
  • Data Access: Clients can perform read, write, and append operations on data stored in GFS.
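The client/master/chunk-server interaction above can be sketched as a small in-memory simulation. This is an illustration, not Google's implementation: the classes `Master`, `ChunkServer`, and `Client` and their methods are hypothetical, and a read is assumed to stay inside a single chunk. It shows the key idea: the client turns a byte offset into a chunk index, asks the master only for metadata (and caches the answer), then reads the bytes directly from a chunk server.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as in the original GFS design


@dataclass
class ChunkServer:
    """Hypothetical in-memory chunk server: chunk handle -> chunk bytes."""
    name: str
    chunks: dict[int, bytes] = field(default_factory=dict)

    def read(self, handle: int, offset: int, length: int) -> bytes:
        return self.chunks[handle][offset:offset + length]


@dataclass
class Master:
    """Hypothetical master: holds metadata only, never file contents."""
    files: dict[str, list[int]] = field(default_factory=dict)      # path -> chunk handles
    locations: dict[int, list[str]] = field(default_factory=dict)  # handle -> replica servers

    def lookup(self, path: str, chunk_index: int) -> tuple[int, list[str]]:
        handle = self.files[path][chunk_index]
        return handle, self.locations[handle]


class Client:
    def __init__(self, master: Master, servers: dict[str, ChunkServer]):
        self.master = master
        self.servers = servers
        self.cache: dict[tuple[str, int], tuple[int, list[str]]] = {}

    def read(self, path: str, offset: int, length: int) -> bytes:
        # Translate the file offset into (chunk index, offset inside that chunk).
        chunk_index, chunk_offset = divmod(offset, CHUNK_SIZE)
        key = (path, chunk_index)
        if key not in self.cache:
            # One metadata request to the master; later reads of this chunk skip it.
            self.cache[key] = self.master.lookup(path, chunk_index)
        handle, replicas = self.cache[key]
        # File data comes straight from a chunk server (here: simply the first replica).
        return self.servers[replicas[0]].read(handle, chunk_offset, length)


if __name__ == "__main__":
    servers = {"cs1": ChunkServer("cs1"), "cs2": ChunkServer("cs2")}
    for s in servers.values():
        s.chunks[7] = b"hello gfs"
    master = Master(files={"/logs/a": [7]}, locations={7: ["cs1", "cs2"]})
    client = Client(master, servers)
    print(client.read("/logs/a", 6, 3))  # b'gfs'
```

Always choosing the first replica is a simplification; the GFS paper describes clients reading from a nearby replica instead.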
Applications of GFS in DBMS
• Big Data Storage
• Data Analytics
• Backup
• Cloud-Based Services
• Data Sharding for Performance Optimization
Case Studies / Real-world Applications
• Google Search: Uses GFS to store search index data.
• YouTube: Uses GFS to store video data across multiple servers.
• Google Bigtable (NoSQL): A distributed storage system for managing structured data; it relies on GFS to store its log and data files.
• Cloud Services: Cloud providers such as Google Cloud and AWS offer large-scale storage services built on similar distributed-storage principles, such as spreading data redundantly across many commodity servers.
Application of GFS in YouTube

• Massive Storage
• Replication
• High Throughput
• Metadata Management
• Scalability
• Load Balancing
Technical Comparisons
HDFS (Hadoop Distributed File System):
Similarities:
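• Both are designed for large-scale data storage and processing.
• Both run on commodity hardware to reduce costs.
Differences:
• GFS is proprietary and internal to Google, whereas HDFS is an open-source Apache project, modeled on GFS, for the Hadoop ecosystem.
• Both rely on a single metadata server (the GFS master, the HDFS NameNode). GFS replicates the master's operation log and checkpoints on multiple machines and provides read-only shadow masters; early HDFS NameNodes were a single point of failure, which later Hadoop releases addressed with NameNode high availability.
• GFS supports concurrent record appends by many clients to the same file, while HDFS allows only one writer per file at a time.
• Block sizes differ: GFS typically uses 64 MB chunks; HDFS uses 128 MB blocks by default (64 MB in early releases).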
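The block-size difference is easy to quantify. The sketch below assumes binary units (1 TB = 2^40 bytes) and uses the GFS paper's figure of less than 64 bytes of master metadata per 64 MB chunk as an upper bound; the HDFS NameNode's per-block cost is different and is deliberately not modeled, so only piece counts are compared for HDFS.

```python
MB = 1024 ** 2
TB = 1024 ** 4
GFS_META_PER_CHUNK = 64  # bytes; the GFS paper's upper bound per 64 MB chunk


def chunk_count(file_size: int, chunk_size: int) -> int:
    # The last chunk may be only partially filled, so round up (integer ceiling).
    return -(-file_size // chunk_size)


def gfs_master_metadata(file_size: int, chunk_size: int = 64 * MB) -> int:
    # Upper-bound estimate of the per-chunk metadata the GFS master keeps in memory.
    return chunk_count(file_size, chunk_size) * GFS_META_PER_CHUNK


if __name__ == "__main__":
    for label, size in [("GFS, 64 MB chunks", 64 * MB), ("HDFS, 128 MB blocks", 128 * MB)]:
        print(f"1 TB file with {label}: {chunk_count(TB, size)} pieces")
    print(f"GFS master metadata for that file: <= {gfs_master_metadata(TB) // 1024} KiB")
```

A 1 TB file needs 16,384 chunks at 64 MB and 8,192 blocks at 128 MB: doubling the block size halves the number of pieces the metadata server must track, at the cost of coarser units for small files.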
Advantages of GFS for DBMS
• Scalability: Supports petabytes of data, allowing databases to grow by adding chunk servers rather than replacing existing hardware.

• Fault Tolerance: Replication ensures data availability, which is crucial for databases in production environments.

• High Throughput: Ideal for large-scale data operations such as big data processing, batch analytics, and database backups.

• Cost-Effective Storage: By using commodity hardware, GFS offers a cost-efficient storage solution compared to traditional enterprise storage systems.
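To make the fault-tolerance point concrete, here is a sketch of how a master could plan re-replication after chunk servers fail. It assumes a simplified model: `replicas` maps hypothetical chunk handles to server names, the set of live servers is already known from missed heartbeats, and new replica targets are picked alphabetically. GFS's real placement also weighs disk utilization, recent chunk creations, and rack diversity.

```python
REPLICATION_GOAL = 3  # GFS keeps three replicas per chunk by default


def plan_re_replication(
    replicas: dict[int, set[str]],
    live_servers: set[str],
    goal: int = REPLICATION_GOAL,
) -> list[tuple[int, str, str]]:
    """Return (chunk_handle, source_server, target_server) copy jobs."""
    # Forget replicas on servers that stopped answering heartbeats.
    surviving = {h: locs & live_servers for h, locs in replicas.items()}
    # Chunks furthest below the goal are repaired first; chunks with no
    # surviving replica cannot be recovered and are skipped.
    under = sorted(
        (h for h, locs in surviving.items() if 0 < len(locs) < goal),
        key=lambda h: len(surviving[h]),
    )
    jobs = []
    for handle in under:
        locs = surviving[handle]
        source = min(locs)  # any surviving replica can serve as the copy source
        # Simplified placement: alphabetical order among live servers lacking this chunk.
        for target in sorted(live_servers - locs)[: goal - len(locs)]:
            jobs.append((handle, source, target))
    return jobs
```

Repairing the most endangered chunks first mirrors the priority the GFS paper describes: a chunk that has lost two of its three replicas is closer to data loss than one that has lost a single replica.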
Advantages of GFS Over Traditional File Systems

• Efficient for large-scale sequential reads and writes.

• Built-in fault tolerance with replication across chunk servers.

• Highly optimized for high-throughput operations, which are critical for services like Google Search and YouTube.
Integration with DBMS
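Data Handling:
• GFS supports large-scale unstructured data, making it a suitable storage backend for DBMSs designed around its large-file, append-heavy access pattern, such as Bigtable.

• Facilitates database backups by enabling quick and reliable storage of snapshots and transaction logs; GFS's built-in snapshot operation copies a file or directory tree almost instantly using copy-on-write.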
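Transaction logs map naturally onto GFS's record append operation, which the GFS paper describes as atomic but at-least-once: if a client retries after an apparent failure, the record can appear more than once, and applications are expected to filter duplicates using unique identifiers. The sketch below simulates that behavior with a hypothetical `FlakyLogFile` class and a UUID per log entry. It ignores details such as padding and replicas that diverge after a failed append.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FlakyLogFile:
    """Hypothetical stand-in for a GFS file: an append may be applied even though
    the client sees a failure, which is why record append is at-least-once."""
    records: list[tuple[str, bytes]] = field(default_factory=list)
    fail_after_write: set[int] = field(default_factory=set)  # attempt numbers whose ack is "lost"
    attempts: int = 0

    def record_append(self, record: tuple[str, bytes]) -> int:
        self.attempts += 1
        self.records.append(record)  # the file system, not the client, picks the offset
        if self.attempts in self.fail_after_write:
            raise TimeoutError("ack lost")  # data was written, but the client cannot know
        return len(self.records) - 1


def append_log_entry(log: FlakyLogFile, payload: bytes, max_retries: int = 3) -> int:
    # Tag each entry with a unique ID so readers can drop duplicates caused by retries.
    record = (str(uuid.uuid4()), payload)
    for _ in range(max_retries):
        try:
            return log.record_append(record)
        except TimeoutError:
            continue  # retry; the earlier attempt may already be in the file
    raise RuntimeError("append failed after retries")


def replay(log: FlakyLogFile) -> list[bytes]:
    # Reader side: keep the first copy of each record ID, in file order.
    seen, out = set(), []
    for record_id, payload in log.records:
        if record_id not in seen:
            seen.add(record_id)
            out.append(payload)
    return out
```

With a lost acknowledgement on the second append, the file holds four records, yet replaying it yields each log entry exactly once.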

Distributed DBMS Architectures:

• Distributed databases such as Google’s Bigtable are layered on top of GFS, which stores their data files and commit logs.

• GFS serves as the storage backbone, allowing the DBMS to efficiently retrieve and store massive datasets.
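The layering of Bigtable on GFS can be illustrated with a toy log-structured table. In this sketch, a hypothetical `AppendOnlyFS` stands in for GFS, and a hypothetical `TinyTablet` follows the general pattern the Bigtable paper describes: every write is first appended to a commit log in the file system, then buffered in an in-memory memtable, which is periodically written out as a sorted, immutable file (an SSTable). The class names and the tiny flush threshold are assumptions chosen for illustration; compactions, log truncation, and tablet servers are omitted.

```python
import bisect
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AppendOnlyFS:
    """Hypothetical stand-in for GFS: files are only created whole or appended to."""
    files: dict[str, list] = field(default_factory=dict)

    def append(self, path: str, item) -> None:
        self.files.setdefault(path, []).append(item)

    def create(self, path: str, items: list) -> None:
        self.files[path] = list(items)  # written once, never modified afterwards


class TinyTablet:
    """Toy Bigtable-style tablet: commit log and sorted immutable files live in the FS."""

    def __init__(self, fs: AppendOnlyFS, name: str, flush_threshold: int = 2):
        self.fs, self.name, self.flush_threshold = fs, name, flush_threshold
        self.memtable: dict[str, str] = {}
        self.sstables: list[str] = []  # file paths, oldest first

    def put(self, key: str, value: str) -> None:
        self.fs.append(f"/{self.name}/commit.log", (key, value))  # durable append first
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self._flush()

    def _flush(self) -> None:
        # Write the memtable out as a sorted, immutable file, then start a fresh one.
        path = f"/{self.name}/sstable-{len(self.sstables)}"
        self.fs.create(path, sorted(self.memtable.items()))
        self.sstables.append(path)
        self.memtable.clear()

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for path in reversed(self.sstables):  # the newest file holds the latest value
            rows = self.fs.files[path]
            i = bisect.bisect_left(rows, (key,))  # binary search in the sorted rows
            if i < len(rows) and rows[i][0] == key:
                return rows[i][1]
        return None
```

Because the table only ever appends to its log and writes whole immutable files, its I/O matches the large, sequential, append-oriented pattern GFS is optimized for.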
Challenges and Limitations
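• Proprietary, Workload-Specific Design: Built for Google’s infrastructure and workloads rather than as a general-purpose file system, and not publicly available.

• Replication Overhead: Triple replication increases storage requirements.

• Limited Small-File Performance: Optimized for large files; many small files inflate the master's metadata and can create hot spots on the few chunk servers that hold them.

• Metadata Update Latency: All metadata changes go through a single master, so workloads with frequent metadata updates can be slowed down.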
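The replication and small-file limitations can be put into numbers with rough arithmetic. The sketch below assumes the default replication factor of three and uses the GFS paper's upper bounds of 64 bytes of master metadata per chunk and 64 bytes of namespace data per file; the 64 GB dataset and 64 KB file size are arbitrary example values.

```python
KB, MB, GB = 1024, 1024 ** 2, 1024 ** 3
CHUNK_SIZE = 64 * MB
REPLICAS = 3          # default GFS replication factor
META_PER_CHUNK = 64   # bytes, upper bound from the GFS paper
META_PER_FILE = 64    # bytes of namespace data per file, same source


def raw_storage(logical_bytes: int, replicas: int = REPLICAS) -> int:
    # Every byte is stored once per replica.
    return logical_bytes * replicas


def master_memory(num_files: int, file_size: int) -> int:
    # Each file costs namespace metadata plus one metadata entry per chunk it occupies;
    # even a tiny non-empty file needs a whole chunk entry.
    chunks_per_file = -(-file_size // CHUNK_SIZE)  # integer ceiling division
    return num_files * (META_PER_FILE + chunks_per_file * META_PER_CHUNK)


if __name__ == "__main__":
    data = 64 * GB
    print(f"Raw disk for {data // GB} GB of data: {raw_storage(data) // GB} GB")
    print(f"Master memory, one 64 GB file: {master_memory(1, data)} bytes")
    print(f"Master memory, 64 KB files: {master_memory(data // (64 * KB), 64 * KB) // MB} MiB")
```

Triple replication turns 64 GB of data into 192 GB of disk, and storing the same 64 GB as about a million 64 KB files raises the master's metadata from about 64 KB to 128 MiB, roughly 2,000 times more.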
Conclusion
• GFS is a powerful distributed file system designed for large-scale data storage that can greatly enhance the performance and scalability of Database Management Systems (DBMS).

• The evolution of cloud-native databases, big data analytics, and machine learning will continue to depend on distributed systems like GFS for efficient storage and processing.

• As data continues to grow, integrating systems like GFS with DBMS will be essential for managing complex data at scale in a fault-tolerant and efficient manner.
References
• S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google File System," in Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP), 2003.
• K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop Distributed File System," in Proceedings of the IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010.
• A. Anand, "Google File System: Paper Review," Medium, Sep. 2020.
• T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears, "MapReduce Online," in Proceedings of the 7th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2010.
• H. Gupta, "Insights from paper — The Google File System," Medium, Jan. 2021.
• GeeksforGeeks, https://www.geeksforgeeks.org/google-file-system/.
