DS 2
Let’s break down the concept of distributed file systems and their characteristics
in a simpler and more detailed way.
A **distributed file system (DFS)** allows users to store and access files across a
network as if they were on their local computer. This means you can access files from
any computer connected to the network, making it easier to share information and
collaborate.
1. **Remote Access**: Users can access files stored on a server from any computer on
the network, similar to how they would access files on their own hard drive.
3. **File Sharing**: The main goal of a DFS is to facilitate the sharing of files. For
example, web servers allow users to access files stored on them over the Internet.
- **Load Balancing**: Distributing file requests evenly across servers to prevent any
one server from becoming overloaded.
- **Reliability**: Ensuring that files are accessible even if some servers fail.
- **Availability**: Making sure files are available whenever users need them.
1. **Organization and Storage**: File systems organize and store files on disks or other
storage devices. They manage how files are stored and retrieved.
- Owner’s identity
- Access control lists (who can access the file and how)
3. **File Management**: File systems allow users to create, name, and delete files.
They use **directories** (folders) to organize files:
- **Directories**: Special files that map text names to file identifiers. They can also
contain other directories, creating a hierarchical structure (like folders within folders).
4. **Access Control**: File systems control who can access files and what they can do
with them (like read, write, or execute). This is managed through user permissions.
5. **Metadata**: This refers to the extra information stored by the file system, which is
essential for managing files. Metadata includes file attributes and directory structures.
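As a rough illustration of the access control and metadata points above, the sketch below keeps an owner and an access control list in a per-file attribute record and checks a request against it. The names `FileAttributes` and `is_allowed`, the field layout, and the rule that the owner may do anything are assumptions made for this example, not part of any particular file system.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class FileAttributes:
    """Metadata kept for one file (hypothetical layout for illustration)."""
    owner: str
    length: int = 0
    # Access control list: user name -> operations that user may perform.
    acl: Dict[str, Set[str]] = field(default_factory=dict)


def is_allowed(attrs: FileAttributes, user: str, operation: str) -> bool:
    """Return True if `user` may perform `operation` ("read", "write", "execute")."""
    if user == attrs.owner:
        return True  # assumption: the owner may always do everything
    return operation in attrs.acl.get(user, set())


# Example record: alice owns the file, bob may only read it.
report = FileAttributes(owner="alice", length=2048, acl={"bob": {"read"}})
```

Here `is_allowed(report, "bob", "read")` succeeds while `is_allowed(report, "bob", "write")` does not, because only `read` appears in bob's ACL entry.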
Let’s simplify and explain the requirements for distributed file systems in detail.
These requirements ensure that distributed file systems function effectively and
efficiently.
1. **Transparency**:
Transparency is about making the distributed file system easy to use without the user
needing to know where files are stored or how they are accessed. There are several
types of transparency:
- **Access Transparency**: Users should be able to access both local and remote files
using the same operations. Programs that work with local files should also work with
remote files without needing changes.
- **Location Transparency**: The file names should remain the same, even if the files
are moved to different locations. Users shouldn’t have to change how they refer to files.
- **Scaling Transparency**: The system should be able to grow easily to handle more
files or users without major changes. This means it can expand incrementally to meet
demand.
2. **Concurrent File Updates**:
When multiple users or programs access and modify the same file at the same time,
the system must ensure that these changes do not interfere with each other. This is
known as **concurrency control**. The system should manage simultaneous updates
so that all users see consistent and correct data.
3. **File Replication**:
In a distributed file system, files can be copied and stored in multiple locations. This
has two main benefits:
- **Load Sharing**: Multiple servers can handle requests for the same file, distributing
the workload and improving performance.
- **Fault Tolerance**: If one server fails, users can still access the file from another
server that has a copy. This helps ensure that files remain available even during server
outages.
4. **Heterogeneity**:
The system should work across different types of hardware and operating systems.
This means that clients and servers can be built on various platforms, allowing for
flexibility and compatibility.
5. **Fault Tolerance**:
The file system should continue to function even if some clients or servers fail. Servers
can be designed to be **stateless**, meaning they don’t need to remember past
interactions. A stateless server that crashes can restart and resume service without
having to recover any per-client state, and file replication helps maintain availability.
6. **Consistency**:
In a traditional file system there is a single copy of each file, so after an update all
users see the same version. However, in a distributed system with replicated files, there
can be delays in updating all copies. The system needs to ensure that users see the
most current version of a file, despite these delays.
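7. **Security**:
Access to files should be controlled, for example through access control lists, and
requests from clients should be authenticated.
8. **Efficiency**:
The distributed file system should provide the same features as traditional file systems
while also performing well. This means it should be fast and responsive, allowing users
to access and manage files without noticeable delays.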
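To make the replication requirement more concrete, here is a minimal sketch of a client that reads from whichever replica is reachable. It assumes every replica holds an identical copy and that a failed server raises an error immediately; the class and function names (`Replica`, `read_with_failover`) are invented for this example.

```python
class ReplicaUnavailable(Exception):
    """Raised when a replica server cannot be reached."""


class Replica:
    """Hypothetical file server holding a copy of some files."""

    def __init__(self, name, files, up=True):
        self.name, self.files, self.up = name, dict(files), up

    def read(self, path):
        if not self.up:
            raise ReplicaUnavailable(self.name)
        return self.files[path]


def read_with_failover(replicas, path, start=0):
    """Try replicas in round-robin order, beginning at index `start`.

    Varying `start` between requests spreads the load (load sharing);
    skipping failed replicas provides fault tolerance.
    """
    n = len(replicas)
    for offset in range(n):
        replica = replicas[(start + offset) % n]
        try:
            return replica.name, replica.read(path)
        except ReplicaUnavailable:
            continue  # this copy is unreachable, try the next one
    raise ReplicaUnavailable("no replica could serve " + path)


# s1 is down, s2 holds the same copy of /a.
servers = [Replica("s1", {"/a": b"data"}, up=False), Replica("s2", {"/a": b"data"})]
```

With `s1` marked as down, `read_with_failover(servers, "/a")` is served by `s2`. Keeping the copies consistent after an update is a separate problem that this sketch does not address.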
Let’s break down the architecture of a distributed file service, focusing on its
components and how they work together in a clear and detailed way.
2. **Directory Service**
3. **Client Module**
Each component interacts with the others to provide the users with a complete file
management system, similar to traditional file systems but with enhancements for
distributed environments.
- **Purpose**: The flat file service deals with the actual content of the files. It manages
operations like reading, writing, creating, and deleting files.
- **Unique File Identifiers (UFIDs)**: Each file is referenced by a Unique File Identifier
(UFID), which is a long string of bits. The purpose of a UFID is to ensure that every file in
the distributed system can be uniquely identified, preventing confusion and conflicts.
- **File Operations**: Here are the primary operations performed by the flat file service:
- **Get Attributes**: Fetches the attributes (such as file size and ownership) for a file.
- **Fault Tolerance**: The flat file service interface is designed for reliability:
- Operations (except `Create`) are **idempotent**: repeating one has the same effect as
performing it once. This means if a request doesn't return an answer, the client can safely retry it.
- **Stateless Servers**: The servers do not keep track of the history of operations,
meaning they can restart without needing to remember previous interactions.
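The following sketch shows why such an interface is safe to retry. It is an in-memory stand-in, not a real server: the class name `FlatFileService`, the use of `uuid` to generate UFIDs, and the explicit `position` argument on `read` and `write` are assumptions for illustration. Because every call carries the UFID and position it needs, the server keeps no per-client state, and repeating a `write` leaves the file unchanged.

```python
import uuid


class FlatFileService:
    """In-memory stand-in for a flat file server; keeps no per-client state."""

    def __init__(self):
        self._files = {}  # UFID -> bytearray holding the file contents

    def create(self):
        # Not idempotent: every call makes a new file with a fresh UFID.
        ufid = uuid.uuid4().hex  # stand-in for a "long string of bits"
        self._files[ufid] = bytearray()
        return ufid

    def write(self, ufid, position, data):
        buf = self._files[ufid]
        if len(buf) < position:
            buf.extend(b"\0" * (position - len(buf)))  # pad up to position
        buf[position:position + len(data)] = data

    def read(self, ufid, position, n):
        return bytes(self._files[ufid][position:position + n])

    def get_attributes(self, ufid):
        return {"length": len(self._files[ufid])}

    def delete(self, ufid):
        self._files.pop(ufid, None)  # deleting twice has the same effect as once


svc = FlatFileService()
ufid = svc.create()
svc.write(ufid, 0, b"hello")
svc.write(ufid, 0, b"hello")  # a retried request leaves the file unchanged
```

An interface with an implicit read-write pointer (as in UNIX `read` and `write`) would not have this property, since a repeated call would advance the pointer twice.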
- **Purpose**: The directory service provides a way to map user-friendly text names
(like "myfile.txt") to their corresponding UFIDs. This allows users to access files without
remembering their UFIDs.
- **Add Name**: Adds a new entry to a directory (like creating a shortcut for a file).
- **Organization**: Each directory is stored as a file with its own UFID, meaning the
directory service relies on the flat file service for storage.
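A directory service of this kind can be sketched as a mapping from (directory, name) to UFID. In the sketch below the directories live in a plain dictionary rather than in files held by the flat file service, and the method names `add_name` and `lookup` as well as the sample UFID strings are assumptions for illustration.

```python
class DirectoryService:
    """Maps text names to UFIDs, one table per directory (simplified sketch)."""

    def __init__(self):
        self._dirs = {}  # directory UFID -> {text name: file UFID}

    def add_name(self, dir_ufid, name, file_ufid):
        entries = self._dirs.setdefault(dir_ufid, {})
        if name in entries:
            raise FileExistsError(name)  # a name may appear once per directory
        entries[name] = file_ufid

    def lookup(self, dir_ufid, name):
        try:
            return self._dirs[dir_ufid][name]
        except KeyError:
            raise FileNotFoundError(name) from None


dirs = DirectoryService()
dirs.add_name("dir-1", "myfile.txt", "ufid-42")  # hypothetical identifiers
```

After this, `dirs.lookup("dir-1", "myfile.txt")` returns the UFID that the client then passes to the flat file service.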
- **Purpose**: The client module runs on each client computer and acts as the
interface that user applications interact with. It simplifies accessing files and
directories.
- **Functionality**: It combines the operations of both the flat file service and the
directory service into a single programming interface:
- It means that user programs can perform file operations (like reading or writing)
without needing to know whether the files are local or stored on a remote server.
- The client module also contains information about the network locations of the flat
file server and directory server.
In a distributed file system, managing who can access what files is crucial. There are
two approaches to handle access control:
- An access check is performed when translating a file name into its UFID. Each check
returns a capability (like a ticket) that grants access to the client for future operations
without needing to verify permissions again.
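One way to picture the capability approach is a signed ticket: the server checks the access control list once, then hands back a token it can verify cheaply later. The sketch below uses an HMAC for the signature; the key, the token layout, and the function names are all assumptions for this example, and a real system would also need expiry and protection against theft of the token.

```python
import hashlib
import hmac

SERVER_KEY = b"demo-secret"  # hypothetical key known only to the server


def issue_capability(user, ufid, rights, acl):
    """Check the ACL once, then return a signed capability for `ufid`."""
    if not rights <= acl.get((user, ufid), set()):
        raise PermissionError(f"{user} lacks {sorted(rights)} on {ufid}")
    body = f"{ufid}:{','.join(sorted(rights))}"
    tag = hmac.new(SERVER_KEY, body.encode(), hashlib.sha256).hexdigest()
    return body + ":" + tag


def capability_permits(capability, ufid, right):
    """Verify the signature only; the ACL is not consulted again."""
    body, _, tag = capability.rpartition(":")
    expected = hmac.new(SERVER_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return False  # forged or altered capability
    cap_ufid, _, rights = body.partition(":")
    return cap_ufid == ufid and right in rights.split(",")


acl = {("alice", "ufid-42"): {"read", "write"}}
cap = issue_capability("alice", "ufid-42", {"read"}, acl)
```

The capability `cap` allows `read` on `ufid-42` but not `write`, and changing any character of it makes verification fail.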
Here’s a simpler breakdown of hierarchical file systems and file groups,
focusing on the core concepts without getting too technical:
A **hierarchical file system** is like a filing cabinet where everything is organized neatly.
Here’s how it works:
1. **Tree Structure**:
- Imagine a tree where the trunk is the **root directory**. This is the starting point.
- Branches are called **directories** (or folders), and they can hold other directories
or **files** (documents, pictures, etc.).
2. **Pathnames**:
- Every file or directory has a **pathname** that tells you how to get there, starting
from the root. For example, `/documents/picture.jpg` means you go to the
`documents` folder, and then find the file called `picture.jpg` inside it.
3. **Multiple Names**:
- In this system, you can give the same file different names. For example, you might
have a file named `report.txt` that’s in two different folders. You can use a special
command called **linking** to create another name for the same file in a different
folder.
- In a distributed file system, this structure uses services to organize files. The client
(your computer) uses these services to manage directories and files, just like you would
on your local computer.
- There’s a function that helps look up a file’s unique identifier (UFID) based on its
pathname.
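The lookup function mentioned above can be sketched as a walk from the root, one pathname component at a time. The directory table, the UFID strings, and the helper names `lookup_path` and `link` are hypothetical; the point is only that a link adds a second name for the same UFID.

```python
ROOT = "ufid-root"

# Hypothetical directory contents: directory UFID -> {name: UFID}.
directories = {
    "ufid-root": {"documents": "ufid-docs", "shared": "ufid-shared"},
    "ufid-docs": {"picture.jpg": "ufid-pic", "report.txt": "ufid-report"},
    "ufid-shared": {},
}


def lookup_path(pathname):
    """Resolve an absolute pathname to a UFID, one component at a time."""
    current = ROOT
    for component in pathname.strip("/").split("/"):
        if component == "":
            continue  # the pathname was just "/"
        if current not in directories:
            raise NotADirectoryError(pathname)
        try:
            current = directories[current][component]
        except KeyError:
            raise FileNotFoundError(pathname) from None
    return current


def link(dir_path, new_name, target_path):
    """Give an existing file a second name in another directory."""
    directories[lookup_path(dir_path)][new_name] = lookup_path(target_path)


link("/shared", "final-report.txt", "/documents/report.txt")
```

Both `/documents/report.txt` and `/shared/final-report.txt` now resolve to the same UFID, so they are two names for one file rather than two copies.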
**File groups** help keep files organized across different servers. Here’s how they
function:
1. **Definition**:
- A file group is just a collection of files located on the same server. Think of it as a
specific section of the filing cabinet.
2. **Moving Groups**:
- Servers can hold many file groups, and these groups can be moved between servers
if needed. However, once a file is part of a group, it stays in that group forever.
3. **Unique Identifiers**:
- Each file group has a **unique identifier** (like an ID card number) so that it can be
distinguished from other groups. This ID helps ensure there are no duplicates across the
whole system.
Here’s a simplified explanation of the **Sun Network File System (NFS)**, its features,
and how it works:
**Sun NFS** (Network File System) is a system that allows computers on a network to
access files from other computers as if they were on their own local hard drive. It was
created by Sun Microsystems in the 1980s and is widely used for file sharing.
1. **Client-Server Model**:
- Every computer can act as both a **client** (which requests files) and a **server**
(which shares files). This allows any machine to access and provide files to others.
2. **Remote Access**:
- NFS enables programs on client computers to retrieve and send data to files located
on remote servers seamlessly.
3. **File Handles**:
- Each file in NFS is identified by a **file handle**, a unique identifier that helps locate
the file on the server.
4. **Virtual File System (VFS)**:
- NFS uses a **VFS** layer to handle both local and remote files through one interface.
The VFS layer tells them apart and routes each request to the local file system or to the
NFS client, so applications don’t have to.
1. **Mounting**:
- **Soft-mounted**: If the server is unavailable, the client will stop trying after a few
attempts and will return an error instead.
3. **Caching**:
- **Client Caching**: Clients store the results of file operations (like reads and writes)
in memory to reduce the number of requests they send to the server. This makes file
access faster.
4. **Pathname Translation**:
- When accessing files, the client translates file pathnames to file handles in a step-
by-step process, sending requests to the server as needed.
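The soft-mount behaviour described under Mounting above can be sketched as a bounded retry loop. This is only an illustration of the idea, not how an NFS client is implemented: `ServerUnavailable`, `soft_mounted_call`, and the default of three attempts are assumptions for this example.

```python
class ServerUnavailable(Exception):
    """Stand-in for a request that timed out."""


def soft_mounted_call(request, retries=3):
    """Retry `request` a few times, then give up and report an error."""
    last_error = None
    for _ in range(retries):
        try:
            return request()
        except ServerUnavailable as exc:
            last_error = exc  # remember the failure and try again
    raise OSError(f"server not responding after {retries} attempts") from last_error


attempts = {"count": 0}


def flaky_read():
    """Hypothetical request that fails twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ServerUnavailable()
    return b"file contents"


def dead_server():
    """Hypothetical request to a server that never answers."""
    raise ServerUnavailable()
```

Calling `soft_mounted_call(flaky_read)` succeeds on the third attempt, while `soft_mounted_call(dead_server)` ends with an `OSError` that the application has to handle.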
- NFS servers check permissions every time a user tries to access a file. This ensures
that only authorized users can access or modify files.
- NFS can also use **Kerberos** for enhanced security, which verifies the identity of
users before granting access.
Here’s a simplified overview of the **Andrew File System (AFS)**, explaining its
features, how it works, and key components.
The **Andrew File System (AFS)** is a distributed file system designed to provide
location-independent access to files across a network. It uses local caching to improve
performance and reduce the workload on servers.
1. **Location Independence**:
- Files can be accessed regardless of where they are physically stored on the network.
2. **Local Caching**:
- AFS stores copies of files in a local cache on the client’s machine to speed up
access. This means that once a file is fetched, subsequent requests for the same file
can be served from the cache.
- Most files are updated by a single user (usually the owner); files shared by many users
are mostly read and rarely changed.
4. **Plentiful Disk Space**:
- AFS assumes that there is enough disk space available to store cached files on the
client’s local disk.
1. **Vice**:
- This is the server-side software that provides shared file services. It runs on top of the
UNIX kernel and manages the storage of files.
2. **Venus**:
- This is the client-side software that acts as a cache manager. It serves as the
interface between the application programs and the Vice server.
1. **File Storage**:
- Files can be either local (stored on the client’s disk) or shared (stored on the Vice
server). Local files are treated like regular UNIX files.
2. **Caching Process**:
- When a workstation requests a file for the first time, Venus forwards the request to the
Vice server, which sends the file back; Venus then stores a copy in the local cache.
- For subsequent requests for the same file, Venus serves the file directly from the
local cache, which is faster.
3. **File Identification**:
- Each file in AFS is identified by a unique **file identifier (fid)**. Venus translates file
pathnames into these fids to manage access.
4. **Location Transparency**:
- Users can access files without needing to know where they are physically stored,
meaning the file names do not reveal their storage locations.
- AFS servers keep track of which clients hold cached copies of a file. If a file is modified
by one client, the server sends a **callback** to all other clients that have cached that
file, notifying them of the change.
- When a client requests a file, it receives a **callback promise**, which ensures the
server will inform it if the file changes.
6. **Cache Validation**:
- If a workstation restarts, Venus checks the validity of cached files. It sends a request
to the server to confirm that the cached files are up-to-date. If a file has been modified,
the cached version is marked as invalid.
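The interplay of caching and callbacks described above can be sketched with two small classes. This is a toy model under several assumptions: whole files are held in memory, a file is written back when it is closed, and the method names (`fetch`, `store`, `break_callback`) are chosen for this example rather than taken from the AFS interface.

```python
class Vice:
    """Toy server: stores files and remembers who holds a callback promise."""

    def __init__(self):
        self.files = {}     # fid -> contents
        self.promises = {}  # fid -> set of Venus clients with a cached copy

    def fetch(self, fid, client):
        self.promises.setdefault(fid, set()).add(client)  # callback promise
        return self.files[fid]

    def store(self, fid, data, writer):
        self.files[fid] = data
        for client in self.promises.pop(fid, set()):
            if client is not writer:
                client.break_callback(fid)  # notify other caches of the change
        self.promises.setdefault(fid, set()).add(writer)


class Venus:
    """Toy cache manager running on one workstation."""

    def __init__(self, server):
        self.server = server
        self.cache = {}   # fid -> (contents, still valid?)
        self.fetches = 0  # how many times we had to contact the server

    def open(self, fid):
        entry = self.cache.get(fid)
        if entry is None or not entry[1]:
            self.fetches += 1
            self.cache[fid] = (self.server.fetch(fid, self), True)
        return self.cache[fid][0]

    def close(self, fid, data):
        self.cache[fid] = (data, True)
        self.server.store(fid, data, self)

    def break_callback(self, fid):
        if fid in self.cache:
            self.cache[fid] = (self.cache[fid][0], False)  # mark as invalid


vice = Vice()
vice.files["fid-1"] = b"v1"
a, b = Venus(vice), Venus(vice)
```

If client `a` opens `fid-1` twice, only the first open contacts the server. Once client `b` stores a new version, `a`'s cached copy is marked invalid and its next open fetches the file again.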
- These are human-readable names that help users easily identify resources.
Examples include:
2. **Numeric Addresses**:
3. **Object Identifiers**:
- These identify the location of an object rather than the object itself. They are often
used in databases and object-oriented systems.
- **Meaningfulness**: Names are more descriptive and easier for users to remember.
In distributed systems, different naming systems are used for various types of
resources:
**URIs** provide a standardized way to identify resources. There are two main types of
URIs:
- Specifies the location of a resource and the protocol to access it (e.g., `http`, `ftp`).
- Example: `http://www.example.com/index.html`
- Examples:
- `urn:dcs.qmul.ac.uk:TR2007-5`
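The difference between the two kinds of URI shows up when they are parsed. The sketch below uses `urlparse` from Python's standard library; the helper name `classify_uri` and the shape of the returned dictionary are choices made for this example.

```python
from urllib.parse import urlparse


def classify_uri(uri):
    """Split a URI into its parts and say whether it is a URL or a URN."""
    parts = urlparse(uri)
    if parts.scheme == "urn":
        # A URN names a resource without saying where it is or how to fetch it.
        namespace, _, specific = parts.path.partition(":")
        return {"kind": "URN", "namespace": namespace, "name": specific}
    # A URL gives the access protocol, the host, and the path on that host.
    return {
        "kind": "URL",
        "protocol": parts.scheme,
        "host": parts.netloc,
        "path": parts.path,
    }
```

For `http://www.example.com/index.html` this yields the protocol `http`, the host `www.example.com`, and the path `/index.html`. For `urn:dcs.qmul.ac.uk:TR2007-5` there is no host or protocol to extract, only a namespace and a name.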
**Name services** are systems that translate names into attributes or locations of
resources. They are essential for managing names in distributed environments.
2. **RMI Registry**: Binds remote objects to symbolic names in Java's Remote Method
Invocation (RMI).
- **Unification**: It’s helpful for different resources managed by various services to use
a consistent naming scheme (like URIs).
1. **Name Spaces**:
- How names are organized and structured. A well-defined namespace helps avoid
conflicts and makes it easier to manage resources.
2. **Name Resolution**:
1. **Hierarchical Namespace**:
- This is a structured naming system where names are organized in a hierarchy, similar
to a tree.
- Examples include DNS, where domains branch out from a root, creating a parent-
child relationship among names.
- This type features a single global context with one naming authority for all names.
**DNS** is the naming service used on the Internet for TCP/IP networks. It translates
human-friendly domain names into IP addresses that computers use to communicate
with each other.
2. **Hierarchy of Domains**:
- **Root-Level Domain**: This is the top of the hierarchy, represented by a dot (`.`)
and is usually not shown in domain names explicitly.
- **Top-Level Domains (TLDs)**: These follow the root and include well-known
domains such as `.com`, `.org`, and country codes like `.uk` or `.jp`.
- `sub` is a subdomain, indicating a specific area of content within the larger site.
- **Aliases**:
- An alias is a name that refers to another name or resource, meaning both names
point to the same information.
**Global Uniformity**:
- The DNS provides a consistent naming structure where a specific name always points
to the same resource, no matter who or where it's being looked up from.
When merging different naming systems, such as the file systems of two computers
(let's say named "red" and "blue"), each having its own root, you might encounter
overlapping file names (e.g., `/etc/passwd`).
**Merging Approach**:
- To effectively manage multiple file systems, you can create a **super root** directory
and mount each computer's file system within it.
- Example:
- Mount the file system from **red** at `/red` and from **blue** at `/blue`.
- Users can refer to the files using paths like `/red/etc/passwd` for the file on red and
`/blue/etc/passwd` for the file on blue, thus avoiding conflicts.
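The super root idea amounts to a small mount table. In this sketch the table `MOUNTS` and the function `resolve` are hypothetical names: the first component of a path selects the machine, and the remainder is the path within that machine's own file system.

```python
# Mount table for the super root: first path component -> machine name.
MOUNTS = {"red": "red", "blue": "blue"}


def resolve(super_path):
    """Translate a path under the super root into (machine, local path)."""
    parts = super_path.strip("/").split("/", 1)
    mount_point = parts[0]
    if mount_point not in MOUNTS:
        raise FileNotFoundError(super_path)
    local_path = "/" + (parts[1] if len(parts) > 1 else "")
    return MOUNTS[mount_point], local_path
```

`resolve("/red/etc/passwd")` and `resolve("/blue/etc/passwd")` both end in the local path `/etc/passwd`, but on different machines, which is how the name clash is avoided.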
Let’s break down the **Domain Name System (DNS)** and its components in a clear
and detailed manner, making it easy to understand.
DNS names are organized hierarchically, meaning they are structured in levels from the
highest to the lowest. The highest-level domain is on the right, and as you move left, you
get more specific. For example, in the domain name `www.dcs.qmul.ac.uk`:
- **`uk`**: This is the top-level domain (TLD), indicating it belongs to the United
Kingdom.
- **`.fr`**: France.
- When you enter a URL like `www.dcs.qmul.ac.uk` in your web browser, the browser
sends a DNS query to find the corresponding IP address. This allows the browser to
connect to the correct web server.
Besides the common queries mentioned, there are other types of DNS queries:
- **Reverse Resolution**:
- This allows a user to find the domain name associated with a specific IP address. For
example, if you have an IP address, you can query the DNS to find out which domain
name it corresponds to.
- **Host Information**:
- DNS can store additional information about a host, such as its architecture type and
operating system.
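Reverse resolution works by turning the IP address into a special domain name and looking that name up. For IPv4 the octets are reversed and placed under `in-addr.arpa`, a detail not spelled out above. The sketch uses the standard `ipaddress` module to build that name; the `PTR_RECORDS` table and the host name in it are made-up sample data.

```python
import ipaddress


def reverse_lookup_name(ip):
    """Build the domain name that is queried to reverse-resolve `ip`."""
    return ipaddress.ip_address(ip).reverse_pointer


# Hypothetical pointer records, as a name server might hold them.
PTR_RECORDS = {"1.2.0.192.in-addr.arpa": "www.example.com"}


def reverse_resolve(ip):
    """Return the host name for `ip`, or None if no record exists."""
    return PTR_RECORDS.get(reverse_lookup_name(ip))
```

So a reverse query for `192.0.2.1` is really an ordinary lookup of the name `1.2.0.192.in-addr.arpa`.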
DNS operates using a distributed network of servers that manage domain names and
their corresponding IP addresses. Here’s how it works:
- This server holds the authoritative and writable copy of the zone file, which contains
all the DNS records for its domain.
- These servers maintain copies of the zone file from the master server. They provide
redundancy and help distribute the load of DNS queries.
3. **DNS Zones**:
- **CNAME Record**: Allows one domain name to be an alias for another (e.g.,
`www.example.com` can point to `example.com`).
- **MX Record**: Specifies the mail exchange servers for a domain, indicating where
emails should be sent.
- **NS Record**: Identifies the name servers that are authoritative for a domain.
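To see how these record types fit together, the sketch below models a tiny zone as a dictionary keyed by name and record type. The zone contents, including the `A` (address) record and all host names, are invented sample data, and `query` is a hypothetical helper that follows a CNAME alias when the requested type is not stored under the name itself.

```python
# Hypothetical zone data: (name, record type) -> list of values.
ZONE = {
    ("example.com", "A"): ["192.0.2.1"],
    ("www.example.com", "CNAME"): ["example.com"],
    ("example.com", "MX"): ["mail.example.com"],
    ("example.com", "NS"): ["ns1.example.com"],
}


def query(name, rtype, max_aliases=8):
    """Look up records of `rtype` for `name`, following CNAME aliases."""
    for _ in range(max_aliases):
        if (name, rtype) in ZONE:
            return ZONE[(name, rtype)]
        alias = ZONE.get((name, "CNAME"))
        if alias is None:
            return []  # no such record and no alias to follow
        name = alias[0]  # continue the lookup under the canonical name
    raise RuntimeError("CNAME chain too long")
```

An address query for `www.example.com` finds no address record under that name, follows the alias to `example.com`, and returns the address stored there.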
To improve efficiency, DNS responses are cached. This means that when a DNS server
resolves a domain name, it temporarily stores the result for a period of time known as
**Time To Live (TTL)**. TTL values can range from a few minutes to several days,
depending on how the DNS server is configured. This caching helps reduce the load on
DNS servers and speeds up subsequent queries for the same domain.
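A cache that honours TTLs can be sketched in a few lines. The class name `TTLCache` and the injectable `clock` parameter are assumptions for this example; the clock is passed in only so that expiry can be demonstrated without waiting.

```python
import time


class TTLCache:
    """Remembers resolved names until their time to live runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (address, expiry time)

    def put(self, name, address, ttl_seconds):
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None  # never resolved before
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # stale: force a fresh lookup
            return None
        return address
```

A caller that receives `None` from `get` performs the full lookup and then stores the answer with the TTL supplied in the response.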
Let’s break down **DNS name resolution** and how it works in a detailed yet
easy-to-understand way.
When you type a website address into your browser, several steps happen to find the
corresponding IP address:
2. **DNS Resolver**: The operating system asks the DNS resolver (usually provided by
your Internet Service Provider or a public DNS service) to resolve the domain name.
3. **Query Process**: The DNS resolver communicates with various DNS servers to find
the IP address associated with that domain name.
Since DNS holds a vast amount of data, no single server contains all of it. Instead, DNS
data is distributed, and the process of finding this data through various servers is called
**navigation**. There are different methods of navigation:
1. **Iterative Navigation**
2. **Multicast Navigation**
In **iterative navigation**, when a DNS resolver doesn't find the name in its cache, it will
ask the local name server:
- If the local server knows the IP address, it returns the IP address immediately.
- If not, the server provides a referral to another server that might know.
- The resolver then queries the next server and continues this process until it finds the IP
address or determines that the name does not exist.
**Example**:
- The DNS resolver queries the local DNS server (let's call it **NS1**).
- If **NS1** doesn't have the information, it refers the resolver to another server
(**NS2**).
- This continues until the IP address is found or the name is confirmed not to exist.
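The referral chain in this example can be sketched as a loop driven by the resolver. The server table below is invented: the server names follow the example, a third server **NS3** and the address `192.0.2.10` are added as placeholders, and each server knows at most one other server to refer to.

```python
# Hypothetical name servers: what each can answer and whom it refers to.
SERVERS = {
    "NS1": {"answers": {}, "refer_to": "NS2"},
    "NS2": {"answers": {}, "refer_to": "NS3"},
    "NS3": {"answers": {"www.dcs.qmul.ac.uk": "192.0.2.10"}, "refer_to": None},
}


def ask(server, name):
    """One query to one server: an answer, a referral, or a definite 'no'."""
    info = SERVERS[server]
    if name in info["answers"]:
        return "answer", info["answers"][name]
    if info["refer_to"] is not None:
        return "referral", info["refer_to"]
    return "not_found", None


def resolve_iteratively(name, start="NS1"):
    """The resolver itself follows each referral until the name is settled."""
    server, contacted = start, []
    while server is not None:
        contacted.append(server)
        kind, value = ask(server, name)
        if kind == "answer":
            return value, contacted
        server = value if kind == "referral" else None
    return None, contacted
```

The returned list of contacted servers shows that all the work is done by the resolver: it talks to NS1, NS2, and NS3 in turn.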
In **multicast navigation**, the query is sent to a group of name servers at once, rather
than one at a time:
- The client sends a query to many servers asking for a specific name and object type.
This method can speed up the name resolution process because multiple servers are
queried simultaneously.
- The selected server can query other name servers either through multicast or
iteratively.
- This process continues until the name is resolved, and the server returns the
resolution result to the client.
**Example**:
- If **NS1** does not know the IP, it will start querying its peer servers either through
direct queries or multicast.
- If the first server doesn’t have the IP address, it will contact other servers on behalf of
the client, continuing this process until it finds the IP address or determines that it
cannot be found.
- Once the IP address is found, the server sends it back to the client.
**Example**:
- If **NS1** doesn’t have the information, it queries **NS2**. If **NS2** doesn’t have
the information, it queries **NS3**, and this continues until the name is resolved or
found to be invalid.
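When servers query each other on the client's behalf, the client sends a single request and each server passes the question on. The sketch below models that chain with a hypothetical `NameServer` class; the address `192.0.2.10` is a placeholder and each server forwards to at most one other server.

```python
class NameServer:
    """Toy name server that resolves on behalf of whoever asks it."""

    def __init__(self, name, records, next_server=None):
        self.name = name
        self.records = dict(records)
        self.next_server = next_server
        self.queries_received = 0

    def resolve(self, query):
        self.queries_received += 1
        if query in self.records:
            return self.records[query]
        if self.next_server is None:
            return None  # nobody left to ask: the name does not exist
        # Ask the next server ourselves and pass its answer back.
        return self.next_server.resolve(query)


ns3 = NameServer("NS3", {"www.dcs.qmul.ac.uk": "192.0.2.10"})
ns2 = NameServer("NS2", {}, next_server=ns3)
ns1 = NameServer("NS1", {}, next_server=ns2)
```

The client only ever calls `ns1.resolve(...)`; the answer travels back through NS2 and NS1, which also gives each server on the path a chance to cache it.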
### Caching
- Both the **client** and **DNS servers** keep a record of previous name resolutions.
This means that if the same name is queried again, the system can quickly return the
cached result rather than going through the lookup process again.
- Cached information typically has a **Time To Live (TTL)** value, which specifies how
long the information can be stored before it needs to be refreshed.
Let’s break down **directory services** in a simple and clear way.
One of the most common examples of a directory service is the **Domain Name
System (DNS)**. Here’s how it works:
- **Host Name to IP Address Mapping**: DNS servers store a list of mappings between
human-readable names (like `www.example.com`) and their corresponding IP
addresses (like `192.0.2.1`). This allows users to access websites using easy-to-
remember names instead of complex numbers.
- **Clients and Servers**: When a computer (the DNS client) wants to find the IP
address for a domain name, it sends a query to a DNS server. The server looks up the
name in its database and returns the corresponding IP address.
1. **Yellow Pages Services**: These services help you find resources based on
attributes. For example, if you want to find a printer that supports color printing, you
could search for printers with that specific feature.
2. **White Pages Services**: These services help you find resources based on names.
For example, if you want to find the phone number for a specific person or organization,
you would use a white pages service.
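The two kinds of lookup differ only in what the search key is. In this sketch the directory entries and the helper names `white_pages` and `yellow_pages` are made up for illustration.

```python
# Hypothetical directory of printers with their attributes.
DIRECTORY = [
    {"name": "lobby-printer", "colour": False, "duplex": True},
    {"name": "design-printer", "colour": True, "duplex": True},
]


def white_pages(name):
    """Find the entry with exactly this name (or None)."""
    return next((entry for entry in DIRECTORY if entry["name"] == name), None)


def yellow_pages(**wanted):
    """Find the names of all entries whose attributes match `wanted`."""
    return [
        entry["name"]
        for entry in DIRECTORY
        if all(entry.get(key) == value for key, value in wanted.items())
    ]
```

`yellow_pages(colour=True)` answers the colour-printer question from the text, while `white_pages("lobby-printer")` returns everything known about one named printer.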
- **Discover Services**: UDDI allows clients to find available web services based on
either their names (white pages) or attributes (yellow pages). For example, a client can
search for a payment processing service by looking for services that support credit card
transactions.
- **Finding Services**: Discovery services help clients, especially mobile ones, find the
most suitable resources available in their current location. For example, when you
arrive at a hotel, your device might use a discovery service to find the nearest printing
service for your documents.
Here’s a simplified overview of the **Global Name Service (GNS)**:
The **Global Name Service (GNS)** is a naming system that helps manage and
organize names and resources in a network. It was designed in **1986** by Lampson
and colleagues at the **DEC Systems Research Centre**.
1. **Merging Name Servers**: GNS can combine two or more name servers into one
system, making it easier to manage names and resources.
3. **Email Addressing**: GNS simplifies how email addresses are structured, making it
easier to send emails to users.
1. **Directory Structure**:
- **Value Name**: Refers to a specific value (like user data) within that directory.
**Example**: For a user named **Peter Smith** at **QMUL**, their information might
be stored as:
- **<EC/UK/AC/QMUL, Peter.Smith>**
- This tells you where to find Peter Smith's data within the EC (European Community)
directory.
3. **Value Trees**:
- The leaves of the directory tree contain values structured in value trees, which could
include passwords, email addresses, and other attributes associated with a name.
When merging different GNS systems (like one for **Europe** and another for **North
America**), a new root called **WORLD** can be introduced:
- **New Root**: This creates a unified hierarchy that can include both regions’
directories.
- **Directory Identifiers**: To avoid confusion, old names are updated with unique
directory identifiers (e.g., transforming **</UK/AC/QMUL, Peter.Smith>** into
**<#599/UK/AC/QMUL, Peter.Smith>**, where **#599** is the identifier for the EC
directory).
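The renaming step can be sketched as two small functions: one that prefixes an old name with the identifier of the directory that used to be its root, and one that maps that identifier to a path below the new root. The table `WELL_KNOWN`, the identifier `#642`, the `NORTH AMERICA` path, and the function names are assumptions for this example; only `#599` for the EC directory comes from the text.

```python
# Hypothetical table: directory identifier -> path below the new WORLD root.
WELL_KNOWN = {"#599": "WORLD/EC", "#642": "WORLD/NORTH AMERICA"}


def qualify(old_name, working_root_id):
    """Prefix a pre-merge name with the identifier of its former root."""
    directory, value = old_name
    return working_root_id + directory, value


def absolute_path(name):
    """Rewrite a qualified name as a path starting from the new root."""
    directory, value = name
    root_id, _, rest = directory.partition("/")
    return WELL_KNOWN[root_id] + "/" + rest, value


old = ("/UK/AC/QMUL", "Peter.Smith")
new = qualify(old, "#599")
```

Because the identifier stays fixed, a client that still uses names relative to the old EC root keeps working after the merge.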
Here’s a simplified overview of the **X.500 Directory Service**:
1. **Global Directory Service**: X.500 acts like a global "White Pages" directory,
allowing users to find information about individuals and organizations.
- The entire directory structure, including all the data associated with the nodes (like
names, email addresses, etc.), is called the **Directory Information Base** (DIB).
- The user interface program that allows users to access one or more DSAs is called a
**DUA**.
- It helps users perform searches and retrieve information from the directory.
- Many universities, like the **University of Michigan**, use X.500 for routing emails and
providing name lookups.
- X.500 can also work with **Lightweight Directory Access Protocol (LDAP)**, which is a
simpler protocol for accessing directory services.