Unit 4: Distributed Systems

Distributed File Systems (DFS)

Distributed File Systems allow files to be accessed and managed across multiple locations,
balancing load and providing fault tolerance. Here’s an in-depth look at the key concepts.
File Models in DFS
1. Client-Server Model:
In this model, a central server or group of servers stores files, and clients connect to these
servers to access files.
Example: Network File System (NFS), a widely used DFS where clients mount file systems
remotely and access them as if they were local. NFS servers maintain file storage, while
clients access these resources using protocols such as RPC (Remote Procedure Call).
2. Cluster-Based Systems:
These systems distribute file storage and processing across clusters of servers or nodes,
allowing data to be spread across multiple machines to balance load and provide redundancy.
Example: Google File System (GFS) splits files into chunks, which are stored across
different machines, improving fault tolerance and performance by allowing concurrent
processing on different parts of a file.
3. Symmetric Model:
• In a symmetric DFS, every node can act as both a client and a server, decentralizing
file access and management.
• Example: Hadoop Distributed File System (HDFS) distributes data and tasks
across multiple nodes in a cluster, providing resilience and high throughput for big
data workloads (note, though, that HDFS still relies on a central NameNode for
metadata, so it is not fully symmetric).

4. NFS (Network File System):
• NFS enables users to access files on remote systems as if they were local, using a
client-server model.
• NFS relies on remote procedure calls (RPC) to request services on remote file
systems, allowing for centralized file management across distributed systems (the
RPC idea is sketched below).
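The RPC idea behind NFS can be illustrated with a small sketch. This is not the real NFS/ONC-RPC protocol; it uses Python's standard xmlrpc modules instead, and the export directory, port, and host name are hypothetical.

# server.py - sketch of RPC-style remote file access (illustrative only;
# real NFS speaks ONC RPC, not XML-RPC). Path and port are hypothetical.
from xmlrpc.server import SimpleXMLRPCServer

EXPORT_DIR = "/srv/export"  # directory this "server" shares

def read_file(relative_path):
    # Return the contents of a file under the exported directory.
    with open(f"{EXPORT_DIR}/{relative_path}") as f:
        return f.read()

server = SimpleXMLRPCServer(("0.0.0.0", 8000))
server.register_function(read_file, "read_file")
server.serve_forever()

On the client side, the remote read looks like an ordinary local call:

# client.py - the remote call is wrapped to look like a local function call.
import xmlrpc.client

fs = xmlrpc.client.ServerProxy("http://fileserver:8000")  # hypothetical host
print(fs.read_file("notes/unit4.txt"))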
Naming and Automounting
• Naming: Naming in DFS involves unique identification of files across nodes. It
can be hierarchical (similar to a file path or URL structure) or flat. Hierarchical
naming is usually preferred, allowing clear organization of files across locations.
• Automounting: DFS implementations often use automounting to map remote
directories onto the local file system on demand, when files are first accessed,
reducing the overhead of mounting directories manually (a small sketch follows).
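A minimal sketch of automount-style name resolution, assuming a hypothetical mount table that maps local path prefixes to remote exports; the remote directory is "mounted" (here, merely recorded) the first time a path under it is accessed.

# Sketch of lazy, automount-style name resolution. The mount table and
# server names are hypothetical.
MOUNT_MAP = {                      # local prefix -> remote location
    "/remote/home": "fileserver1:/export/home",
    "/remote/data": "fileserver2:/export/data",
}
mounted = set()                    # prefixes already mounted

def resolve(path):
    for prefix, target in MOUNT_MAP.items():
        if path.startswith(prefix):
            if prefix not in mounted:      # mount lazily on first access
                print(f"automounting {target} at {prefix}")
                mounted.add(prefix)
            return target + path[len(prefix):]
    raise FileNotFoundError(path)

print(resolve("/remote/home/sachin/notes.txt"))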
File Sharing and Replication
• File Sharing: Allows multiple users to access files concurrently. Synchronization
mechanisms, such as file locking, versioning, and conflict resolution strategies,
are used to manage concurrent access.
• Replication: Files are copied across multiple nodes or data centers, improving
availability and fault tolerance. For example, HDFS replicates data blocks across
multiple nodes, so data remains accessible even if one node fails (a minimal
write-all sketch follows).
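A minimal write-all replication sketch. Replicas are modelled as in-memory dictionaries standing in for storage nodes; a real DFS such as HDFS replicates fixed-size blocks over the network and tracks replica placement in metadata.

# Sketch of simple replication: each write goes to every replica, and a
# read succeeds as long as at least one replica still holds the file.
replicas = [{}, {}, {}]            # three replica "nodes"

def write(name, data):
    for node in replicas:          # write-all strategy
        node[name] = data

def read(name):
    for node in replicas:          # first reachable replica wins
        if name in node:
            return node[name]
    raise FileNotFoundError(name)

write("report.txt", b"quarterly numbers")
replicas[0].clear()                # simulate one node failing
print(read("report.txt"))          # still served by a surviving replica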
Peer-to-Peer (P2P) Systems
• P2P systems remove the central server, and each node functions as both a client
and a server, allowing direct sharing of files.
Example: BitTorrent uses P2P file sharing, allowing peers to exchange file chunks
directly, reducing the load on any single server and enabling scalability (the
chunk-exchange idea is sketched below).
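A sketch of the chunk-exchange idea, not the actual BitTorrent protocol: the file is split into fixed-size chunks, hypothetical peers each hold a subset, and the downloader reassembles the chunks in order.

# Sketch of P2P chunk exchange (illustrative only, not BitTorrent itself).
CHUNK_SIZE = 4
original = b"distributed fs!!"
chunks = [original[i:i + CHUNK_SIZE] for i in range(0, len(original), CHUNK_SIZE)]

# Hypothetical peers, each holding only some chunk indices.
peers = {
    "peer_a": {0: chunks[0], 2: chunks[2]},
    "peer_b": {1: chunks[1], 3: chunks[3]},
}

def download(num_chunks):
    assembled = {}
    for holdings in peers.values():        # fetch from whichever peer has it
        assembled.update(holdings)
    return b"".join(assembled[i] for i in range(num_chunks))

assert download(len(chunks)) == original   # file rebuilt from many peers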
Byzantine Failures
• Byzantine Failures refer to situations where nodes may fail arbitrarily, even
maliciously.
• Byzantine Fault Tolerance (BFT) techniques, such as PBFT (Practical Byzantine
Fault Tolerance), allow a system to continue operating correctly even when some
nodes exhibit faulty or malicious behaviour (the client-side voting idea is sketched below).
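A sketch of only the client-side voting step used in BFT reads: a value is accepted once at least f + 1 replicas report it, since at most f replicas can lie. Full PBFT additionally requires 3f + 1 replicas and a multi-phase agreement protocol, which is not shown here.

# Sketch of client-side voting over replica replies.
from collections import Counter

def accept_reply(replies, f):
    # Return the value backed by at least f + 1 replicas, else None.
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None

# Four replicas (3f + 1 with f = 1); one replica answers maliciously.
replies = ["balance=100", "balance=100", "balance=100", "balance=999"]
print(accept_reply(replies, f=1))    # -> "balance=100"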
Security and Authentication

• Security in DFS involves protecting data and ensuring only authorized access. This often
involves access control lists, encryption, and secure protocols (e.g., Kerberos
authentication and SSL encryption).
• Authentication, such as using tokens or certificates, ensures that only verified users or
nodes can access the system (an access-control sketch follows).
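A minimal access-control-list (ACL) check sketch, with hypothetical users, paths, and permissions; a real DFS would pair such checks with authentication (e.g., Kerberos tickets) and encrypted transport.

# Sketch of an ACL check before a file operation.
ACL = {
    "/shared/report.txt": {"alice": {"read", "write"}, "bob": {"read"}},
}

def authorize(user, path, action):
    # Allow the action only if the ACL grants it to this user for this path.
    return action in ACL.get(path, {}).get(user, set())

print(authorize("bob", "/shared/report.txt", "read"))    # True
print(authorize("bob", "/shared/report.txt", "write"))   # False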
Distributed Databases
• Distributed databases manage and store data across multiple nodes, ensuring
scalability, performance, and fault tolerance.
Partitioning Types
1. Vertical Partitioning:
Splits tables by columns, allowing related attributes to be stored together.
Example: A customer database could store basic information (name, contact) on
one server, while sensitive information (credit card data) is stored separately,
enabling more secure access.
2. Horizontal Partitioning:
• Distributes tables by rows across nodes, usually based on keys (e.g., customer
region).
• Example: A user database could be split so that users from North America are
stored on one server and users from Europe on another, optimizing access based
on region (see the routing sketch after this list).
3. Hybrid Partitioning:
Combines vertical and horizontal partitioning to optimize data access and query
performance.
Example: Customer data is horizontally partitioned by region, with each region also
vertically partitioned by data types (e.g., basic info, transaction history).
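A minimal sketch of routing rows to partitions by a partition key (here, region). Node names and the region mapping are hypothetical; production systems typically use hash or range partitioning managed by the database itself.

# Sketch of horizontal (row) partitioning by a region key.
REGION_TO_NODE = {"NA": "db-na", "EU": "db-eu"}
nodes = {"db-na": [], "db-eu": []}

def insert_user(user):
    node = REGION_TO_NODE[user["region"]]   # partition key = region
    nodes[node].append(user)

insert_user({"id": 1, "name": "Aiden", "region": "NA"})
insert_user({"id": 2, "name": "Lena", "region": "EU"})
print(nodes["db-eu"])   # only European users live on db-eu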
CRUD Operations

• CRUD (Create, Read, Update, Delete) operations are fundamental to database
management. In distributed databases, handling CRUD requires efficient
mechanisms to ensure synchronization, consistency, and performance across
nodes.
Query Optimization
• Optimizing queries in distributed databases is crucial to reduce execution time and
resource usage. Techniques include:
• Data Localization: Filtering data at local nodes before transferring it.
• Join Optimization: Choosing efficient strategies for combining tables that
reside on different nodes.
• Caching: Storing frequently accessed query results locally to reduce
repetitive processing (a small cache sketch follows this list).
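A minimal query-result cache sketch; run_distributed_query is a hypothetical stand-in for the expensive cross-node execution.

# Sketch of caching query results to avoid repeated distributed execution.
query_cache = {}

def run_distributed_query(sql):
    print(f"executing across nodes: {sql}")      # expensive path
    return [("row", 1), ("row", 2)]              # dummy result

def query(sql):
    if sql not in query_cache:                   # cache miss -> run remotely
        query_cache[sql] = run_distributed_query(sql)
    return query_cache[sql]

query("SELECT * FROM orders WHERE region = 'EU'")   # executes across nodes
query("SELECT * FROM orders WHERE region = 'EU'")   # served from the cache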
Master-Slave and Peer-to-Peer Architectures

1. Master-Slave Architecture:
A central master node handles write operations, while slaves replicate data for reads.
Example: MySQL master-slave replication directs all writes to the master node and
propagates the changes to slave nodes, which serve read requests to improve read
performance (a routing sketch follows this list).
2. Peer-to-Peer Architecture:
Each node acts as an equal, capable of handling reads and writes, with data shared
across peers.
Example: Cassandra uses a peer-to-peer approach where any node can accept reads
and writes, distributing the load evenly and enhancing availability.
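A minimal sketch of read/write routing in a master-slave setup, with hypothetical node names: writes go to the master, reads rotate across replicas. Real MySQL replication additionally ships changes from the master's binary log to the slaves asynchronously, which is not modelled here.

# Sketch of master-slave request routing.
import itertools

MASTER = "db-master"
REPLICAS = itertools.cycle(["db-replica-1", "db-replica-2"])

def route(statement):
    # Send writes to the master; spread reads across the replicas.
    is_write = statement.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
    return MASTER if is_write else next(REPLICAS)

print(route("INSERT INTO users VALUES (1, 'Sachin')"))   # -> db-master
print(route("SELECT * FROM users"))                      # -> db-replica-1
print(route("SELECT * FROM users"))                      # -> db-replica-2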
CAP Theorem
• The CAP theorem states that a distributed system can provide at most two of the
following three guarantees simultaneously:
1. Consistency: Every read sees the most recent write, so all nodes appear to hold the
same data at the same time.
2. Availability: Every request receives a response, even when some nodes fail.
3. Partition Tolerance: The system remains functional even if network partitions
cause a loss of connectivity between some nodes.
• Distributed databases therefore make trade-offs between these properties. For
instance:
• AP Systems (e.g., DynamoDB) prioritize availability and partition tolerance but
may provide only eventual consistency.
• CP Systems (e.g., HBase) emphasize consistency and partition tolerance but may
experience reduced availability during network partitions.
Distributed Web Systems

• Distributed web systems ensure seamless client-server interactions across a
distributed architecture, optimizing performance, security, and scalability.
Web Clients and Servers
1. Web Clients: Web clients, typically browsers, make requests to servers using
HTTP to fetch or interact with resources (e.g., HTML pages, JSON data).
2. Web Servers: Servers handle these requests, providing or managing data,
application logic, and other resources.
HTTP Connections and Methods
1. HTTP Connections:
• HTTP/1.1 introduced persistent connections, allowing multiple requests to be sent
over a single connection to reduce latency (illustrated in the sketch after this list).
2. HTTP Methods:
• GET: Retrieves data from the server.
• POST: Submits data to the server for processing.
• PUT: Creates or replaces a resource at a given URI (commonly used for updates).
• DELETE: Removes data from the server.
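A small sketch of the four methods issued over one persistent connection using the third-party requests library; httpbin.org is a public echo service used here only as a convenient test endpoint.

# Sketch of GET/POST/PUT/DELETE over a single reused connection.
import requests

with requests.Session() as session:          # Session reuses the TCP connection
    base = "https://httpbin.org"
    print(session.get(f"{base}/get", params={"id": 1}).status_code)
    print(session.post(f"{base}/post", json={"name": "Sachin"}).status_code)
    print(session.put(f"{base}/put", json={"name": "Sachin K."}).status_code)
    print(session.delete(f"{base}/delete").status_code)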
Messaging and SOAP
• Messaging: Distributed web systems rely on messaging conventions to manage
communication between services, with REST (an architectural style built on plain
HTTP) and SOAP being the two most common approaches.
• SOAP: SOAP is an XML-based messaging protocol for exchanging structured
information across networks, often used in distributed web services that require
strict standards and security (a request sketch follows).
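A sketch of what a SOAP call looks like on the wire: an XML envelope posted over HTTP. The service URL, the operation namespace, and the GetPrice operation are hypothetical; real services describe these details in a WSDL document.

# Sketch of posting a SOAP 1.1 envelope with the requests library.
import requests

envelope = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetPrice xmlns="http://example.com/stock">
      <Symbol>INFY</Symbol>
    </GetPrice>
  </soap:Body>
</soap:Envelope>"""

response = requests.post(
    "http://example.com/stock-service",          # hypothetical endpoint
    data=envelope,
    headers={"Content-Type": "text/xml; charset=utf-8"},
)
print(response.status_code)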
Naming and Proxy Caching

• Naming: Distributed systems use unique names, typically URLs or URIs, to
identify resources.
• Proxy Caching: Proxy servers cache frequently accessed resources, reducing
latency for users and load on origin servers. The caching happens close to the user,
as in a Content Delivery Network (CDN); a small TTL-cache sketch follows.
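A minimal proxy-cache sketch with a fixed time-to-live (TTL); fetch_from_origin is a hypothetical stand-in for the request to the origin server.

# Sketch of a proxy cache: responses are served from memory until they expire.
import time

TTL_SECONDS = 60
cache = {}                       # url -> (expiry_time, body)

def fetch_from_origin(url):
    print(f"cache miss, contacting origin for {url}")
    return f"<html>content of {url}</html>"

def get(url):
    expiry, body = cache.get(url, (0, None))
    if time.time() >= expiry:                       # missing or expired
        body = fetch_from_origin(url)
        cache[url] = (time.time() + TTL_SECONDS, body)
    return body

get("http://example.com/index.html")   # goes to the origin
get("http://example.com/index.html")   # served from the proxy cache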

Replication
• Replication in distributed web systems enhances availability and reliability by
mirroring resources across multiple servers.
Example: Content Delivery Networks (CDNs) replicate web content globally,
allowing users to access content from a server near their location.
Security in Distributed Web Systems

• Security mechanisms include HTTPS (for secure HTTP connections),
authentication (e.g., OAuth, JWT), and encryption protocols to protect data and
user interactions.
• Authentication: Methods like OAuth, JWT tokens, and Single Sign-On (SSO)
validate user identities, while encryption safeguards sensitive data (a token-signing
sketch follows).
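A sketch of the HMAC-signing idea that JWTs are built on, using only the Python standard library; the secret key and payload are hypothetical, and a real deployment would use a proper JWT library over HTTPS.

# Sketch of issuing and verifying an HMAC-signed token.
import base64, hashlib, hmac, json

SECRET = b"server-side-secret"   # hypothetical secret, kept on the server

def issue_token(payload):
    # Encode the payload and sign it so clients cannot tamper with it.
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    signature = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{signature}"

def verify_token(token):
    # Recompute the signature; reject the token if it does not match.
    body, signature = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(body))

token = issue_token({"user": "sachin", "role": "student"})
print(verify_token(token))                        # valid token -> payload
tampered = token[:-1] + ("0" if token[-1] != "0" else "1")
print(verify_token(tampered))                     # altered signature -> None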
