Unit 4
A Distributed File System (DFS) lets clients access and manage files that are stored across
multiple machines or locations, balancing load and providing fault tolerance. The key concepts
are covered below.
File Models in DFS
1. Client-Server Model:
In this model, a central server or group of servers stores files, and clients connect to these
servers to access files.
Example: Network File System (NFS), a widely used DFS where clients mount file systems
remotely and access them as if they were local. NFS servers maintain file storage, while
clients access these resources using protocols such as RPC (Remote Procedure Call).
2. Cluster-Based Systems:
These systems distribute file storage and processing across clusters of servers or nodes,
allowing data to be spread across multiple machines to balance load and provide redundancy.
Example: Google File System (GFS) splits files into chunks, which are stored across
different machines, improving fault tolerance and performance by allowing concurrent
processing on different parts of a file.
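3. Symmetric Model:
In a symmetric DFS, each node can act as both a client and a server, decentralizing
file access and management.
Example: Hadoop Distributed File System (HDFS) is often cited here because it distributes
data and tasks across multiple nodes in a cluster, providing resilience and high throughput
for big data workloads. Strictly speaking, HDFS still relies on a central NameNode for
metadata, so it is only partly symmetric and sits closer to the cluster-based model.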
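The chunking idea described for GFS above can be sketched in a few lines. This is only an illustration: the tiny chunk size, the server names, and the round-robin placement are assumptions made for the example, not the actual GFS placement policy (GFS itself used 64 MB chunks).

```python
CHUNK_SIZE = 4   # bytes; deliberately tiny so the example is easy to follow
REPLICAS = 2     # hypothetical replication factor for this sketch


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut a file's bytes into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunks(num_chunks: int, servers: list[str], replicas: int = REPLICAS) -> dict[int, list[str]]:
    """Assign each chunk index to `replicas` distinct servers, round-robin.

    Assumes replicas <= len(servers) so that the copies land on different machines.
    """
    return {
        idx: [servers[(idx + r) % len(servers)] for r in range(replicas)]
        for idx in range(num_chunks)
    }


data = b"hello distributed world"
chunks = split_into_chunks(data)
placement = place_chunks(len(chunks), ["s1", "s2", "s3"])
```

Because every chunk lives on more than one server, losing a single machine does not lose data, and different chunks can be read from different servers at the same time.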
Security in DFS
• Security in DFS involves protecting data and ensuring only authorized access. This often
involves access control lists (ACLs), encryption, and secure protocols (e.g., Kerberos
authentication and SSL/TLS encryption).
• Authentication, such as using tokens or certificates, ensures that only verified users or
nodes access the system.
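As a rough illustration of token-based authentication combined with an access control list, the sketch below signs a user name with an HMAC and checks it before consulting the ACL. The secret key, the token format, the file path, and the function names are all hypothetical; real deployments would rely on an established mechanism such as Kerberos rather than this hand-rolled scheme.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-shared-secret"          # hypothetical; never hard-code keys in practice
ACL = {"/reports/q1.csv": {"alice"}}        # hypothetical ACL: path -> users allowed to read


def issue_token(user: str, key: bytes = SECRET_KEY) -> str:
    """Return 'user:signature', where the signature is an HMAC-SHA256 of the user name."""
    sig = hmac.new(key, user.encode(), hashlib.sha256).hexdigest()
    return f"{user}:{sig}"


def verify_token(token: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the signature and compare it in constant time."""
    user, _, sig = token.rpartition(":")
    expected = hmac.new(key, user.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig.encode(), expected.encode())


def can_read(token: str, path: str) -> bool:
    """Authenticate first (valid token), then authorize (user listed in the ACL)."""
    if not verify_token(token):
        return False
    user = token.rpartition(":")[0]
    return user in ACL.get(path, set())
```

The two checks are kept separate on purpose: authentication answers "who is this?", while the ACL answers "what may they access?".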
Distributed Databases
• Distributed databases manage and store data across multiple nodes, ensuring
scalability, performance, and fault tolerance.
Partitioning Types
1. Vertical Partitioning:
Splits tables by columns, allowing related attributes to be stored together.
Example: A customer database could store basic information (name, contact) on
one server, while sensitive information (credit card data) is stored separately,
enabling more secure access.
2. Horizontal Partitioning:
Distributes tables by rows across nodes, usually based on keys (e.g., customer
region).
Example: A user database could be split so that users from North America are
stored on one server and users from Europe on another, optimizing access based
on region.
3. Hybrid Partitioning:
Combines vertical and horizontal partitioning to optimize data access and query
performance.
Example: Customer data is horizontally partitioned by region, with each region also
vertically partitioned by data types (e.g., basic info, transaction history).
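The three partitioning types above can be illustrated with plain Python dictionaries standing in for table rows. The sample rows, column names, and function names are made up for the example; a real database would perform the split at the storage layer.

```python
customers = [
    {"id": 1, "name": "Ana", "region": "EU", "card": "4111-xxxx"},
    {"id": 2, "name": "Bob", "region": "NA", "card": "5500-xxxx"},
    {"id": 3, "name": "Chloe", "region": "EU", "card": "3400-xxxx"},
]


def vertical_split(row: dict, sensitive: tuple = ("card",)) -> tuple[dict, dict]:
    """Split one row by columns; the primary key is kept in both halves for joins."""
    basic = {k: v for k, v in row.items() if k not in sensitive}
    secret = {"id": row["id"], **{k: row[k] for k in sensitive}}
    return basic, secret


def horizontal_split(rows: list[dict], key: str = "region") -> dict[str, list[dict]]:
    """Split a table by rows: each distinct key value becomes its own shard."""
    shards: dict[str, list[dict]] = {}
    for row in rows:
        shards.setdefault(row[key], []).append(row)
    return shards


def hybrid_split(rows: list[dict]) -> dict[str, dict[str, list[dict]]]:
    """Shard by region first, then split each shard's rows by columns."""
    layout = {}
    for region, shard in horizontal_split(rows).items():
        pairs = [vertical_split(r) for r in shard]
        layout[region] = {
            "basic": [b for b, _ in pairs],
            "sensitive": [s for _, s in pairs],
        }
    return layout


layout = hybrid_split(customers)
```

Here `layout["EU"]["basic"]` holds the non-sensitive columns for European customers, while the card data for the same customers sits in `layout["EU"]["sensitive"]` and could be placed on a more tightly secured server.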
CRUD Operations
1. Master-Slave Architecture:
A central master node handles write operations, while slaves replicate data for reads.
Example: MySQL Master-Slave replication uses the master node for writes,
propagating changes to slave nodes, which serve read requests to improve read
performance.
2. Peer-to-Peer Architecture:
Each node acts as an equal, capable of handling reads and writes, with data shared
across peers.
Example: Cassandra uses a peer-to-peer approach where any node can accept reads
and writes, distributing the load evenly and enhancing availability.
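The master-slave pattern from item 1 can be sketched as a small in-memory model: writes go to the master and are copied to every slave, while reads rotate across the slaves. The class and node names are hypothetical, and replication here is synchronous purely to keep the example short; MySQL replication is asynchronous by default, so a real slave may briefly lag behind the master.

```python
import itertools


class Node:
    """A storage node holding a simple key-value map."""

    def __init__(self, name: str):
        self.name = name
        self.data: dict = {}


class MasterSlaveCluster:
    def __init__(self, master: Node, slaves: list[Node]):
        self.master = master
        self.slaves = slaves
        self._next_slave = itertools.cycle(slaves)  # round-robin read balancing

    def write(self, key, value) -> None:
        """All writes hit the master, which then propagates them to the slaves."""
        self.master.data[key] = value
        for slave in self.slaves:  # simplification: copied immediately, no lag
            slave.data[key] = value

    def read(self, key):
        """Reads are served by the slaves in turn, offloading the master."""
        node = next(self._next_slave)
        return node.name, node.data.get(key)


cluster = MasterSlaveCluster(Node("master"), [Node("replica-1"), Node("replica-2")])
cluster.write("user:1", "Ana")
first_read = cluster.read("user:1")
second_read = cluster.read("user:1")
```

In a peer-to-peer design such as the Cassandra example, there would be no distinguished `master`: any node could run `write`, and the cluster would decide which peers store each key.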
CAP Theorem
• The CAP theorem states that a distributed system can guarantee at most two of
the following three properties simultaneously:
1. Consistency: All nodes see the same data at the same time.
2. Availability: Every request to a working node receives a response, even when
other nodes have failed.
3. Partition Tolerance: The system remains functional even if network partitions
cause a loss of connectivity between some nodes.
• Network partitions cannot be ruled out in practice, so distributed databases
effectively choose between consistency and availability while a partition lasts. For
instance:
• AP Systems (e.g., DynamoDB) prioritize availability and partition tolerance and
typically offer eventual consistency.
• CP Systems (e.g., HBase) emphasize consistency and partition tolerance but may
experience reduced availability during network partitions.
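The AP versus CP trade-off can be made concrete with a toy model of two replicas. The `Replica` class and its `mode` flag are invented for this sketch and do not reflect how DynamoDB or HBase are implemented; the point is only to show what each choice gives up once the replicas can no longer reach each other.

```python
class Replica:
    """One of two replicas; `mode` is 'AP' or 'CP' (hypothetical flag for this sketch)."""

    def __init__(self, name: str, mode: str):
        self.name = name
        self.mode = mode
        self.value = None
        self.peer = None          # set by make_pair
        self.partitioned = False  # True when the peer is unreachable

    def write(self, value) -> None:
        if not self.partitioned:
            # Healthy network: update both copies, so they stay consistent.
            self.value = value
            self.peer.value = value
        elif self.mode == "AP":
            # AP: stay available and accept the write locally; copies now differ.
            self.value = value
        else:
            # CP: refuse the write rather than let the copies diverge.
            raise RuntimeError("peer unreachable: write rejected to stay consistent")


def make_pair(mode: str) -> tuple[Replica, Replica]:
    a, b = Replica("a", mode), Replica("b", mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a: Replica, b: Replica) -> None:
    """Simulate a network partition between the two replicas."""
    a.partitioned = b.partitioned = True
```

With an AP pair, a write during the partition succeeds but the two replicas hold different values until they reconcile (eventual consistency). With a CP pair, the same write raises an error, so the data stays consistent at the cost of availability.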
Distributed Web Systems
Replication
• Replication in distributed web systems enhances availability and reliability by
mirroring resources across multiple servers.
Example: Content Delivery Networks (CDNs) replicate web content globally,
allowing users to access content from a server near their location.
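A simple way to picture "a server near their location" is to choose the replica with the smallest great-circle distance to the user. The edge server names and coordinates below are illustrative assumptions, and production CDNs typically route by DNS, anycast, or measured latency rather than pure geography.

```python
import math

# Hypothetical edge locations as (latitude, longitude) in degrees.
EDGE_SERVERS = {
    "frankfurt": (50.1, 8.7),
    "virginia": (38.9, -77.0),
    "singapore": (1.35, 103.8),
}


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371 * math.asin(math.sqrt(h))


def nearest_edge(user_location: tuple[float, float]) -> str:
    """Return the name of the edge server closest to the user."""
    return min(EDGE_SERVERS, key=lambda name: haversine_km(user_location, EDGE_SERVERS[name]))
```

For example, a user in Paris at roughly `(48.85, 2.35)` would be served from the `frankfurt` replica in this setup, while every replica holds the same mirrored content.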
Security in Distributed Web Systems