DS Lecture 5
Fall 2023-2022
File System
Client-server file systems
Consists of:
• Central servers
– Point of congestion, single point of failure
• Alleviate somewhat with replication and client caching
– E.g., Coda, tokens (aka leases, oplocks)
– Limited replication can lead to congestion
• File data is still centralized
– A file server stores all data from a file; the file is not split across servers
– Even if replication is in place, a client downloads all data for a file from one server
• File sizes are limited to the capacity available on a server
– What if you need a 1,000 TB file?
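A rough back-of-the-envelope sketch of the file-size limit above. The per-server capacity of 100 TB is a made-up figure for illustration only, and `servers_needed` is a hypothetical helper, not part of any real file system API. It shows that the 1,000 TB file cannot live on one such server and would need at least 10 of them if the file system were able to split it.

```python
TB = 10**12  # decimal terabyte, in bytes

def servers_needed(file_size_bytes, server_capacity_bytes):
    """Minimum number of servers if a file may be split across servers.

    Uses integer ceiling division to avoid floating-point rounding.
    """
    return -(-file_size_bytes // server_capacity_bytes)

file_size = 1000 * TB        # the 1,000 TB file from the notes
server_capacity = 100 * TB   # ASSUMPTION: hypothetical capacity of one server

# In a classic client-server file system the whole file must fit on one server.
fits_on_one_server = file_size <= server_capacity

print(fits_on_one_server)                          # False
print(servers_needed(file_size, server_capacity))  # 10
```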
Google File System (GFS)
GFS Goals
clusters.
Ex: Hadoop 1
– Avro™: A data serialization system.
• Written in Java
• Master/Slave architecture
• Single NameNode
– Master server responsible for the namespace & access control
• Multiple DataNodes
– Each DataNode is responsible for managing the storage attached to its node
• A file is split into one or more blocks
– Typical block size = 128 MB (vs. 64 MB for GFS)
– Blocks are stored in a set of DataNodes
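The block layout described above can be sketched roughly as follows. This is a toy model, not Hadoop code: `split_into_blocks` and `place_blocks` are hypothetical helpers, the DataNode names are invented, the round-robin placement is a simplification rather than the real placement policy, and the replica count of 2 is an assumption chosen only for the example. The dictionary returned by `place_blocks` stands in for the block-to-DataNode mapping that a master such as the NameNode would keep.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # typical block size quoted in the notes

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering the file; the last block may be short."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]

def place_blocks(num_blocks, datanodes, replicas=2):
    """Map each block index to a list of DataNodes (simplified round-robin).

    ASSUMPTION: round-robin placement and the replica count are illustrative only.
    """
    if replicas > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replica count")
    return {b: [datanodes[(b + r) % len(datanodes)] for r in range(replicas)]
            for b in range(num_blocks)}

# A 300 MB file becomes two full 128 MB blocks plus one 44 MB block.
blocks = split_into_blocks(300 * MB)
block_map = place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"])

for index, (offset, length) in enumerate(blocks):
    print(index, offset // MB, length // MB, block_map[index])
```

A client that wants to read the file would first ask the master for this mapping and then fetch each block from one of the listed DataNodes, so no single server has to hold or serve the whole file.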