Chapter 4 - Naming: Distributed Systems (IT 441)
Objectives of the Chapter
In this chapter we discuss how:
human-friendly names are organized and implemented;
names are used to locate mobile entities;
names that are no longer used are removed, also called garbage
collection
Which one is easier for humans, and which for machines? And
why?
74.125.237.83 or google.com
128.250.1.22 or distributed systems website
128.250.1.25 or Prof. Buyya
Disk 4, Sector 2, block 5 OR /usr/raj/hello.c
4. Introduction
In a distributed system, names are used to refer to a wide variety of
resources such as:
Computers, services, remote objects, and files, as well as users.
Naming is a fundamental issue in DS design as it facilitates
communication and resource sharing.
A name in the form of URL is needed to access a specific web
page.
Processes cannot share particular resources managed by a
computer system unless they can name them consistently
Users cannot communicate with one another via a DS unless
they can name one another, e.g., with email addresses.
Names are not the only useful means of identification: descriptive
attributes are another.
What are Naming Services?
How do Naming Services facilitate communication and resource
sharing?
A URL facilitates locating a resource exposed on
the Web.
e.g., abc.net.au suggests that it is likely to be an Australian entity
Consistent and uniform naming helps processes in a
distributed system to interoperate and manage resources.
e.g., commercial organizations use .com; non-profit organizations use
.org
Users refer to each other by means of their names (e.g., email addresses)
rather than their system IDs
Naming Services are not only useful to locate resources but
also to gather additional information about them such as
attributes
What are Naming Services?
Definition
In a Distributed System, a Naming Service is a specific
service whose aim is to provide a consistent and uniform
naming of resources, thus allowing other programs or
services to localize them and obtain the required
metadata for interacting with them.
Key benefits
Resource localization
Uniform naming
Device independent address (e.g., you can move domain
name/web site from one server to another server seamlessly).
The role of names and name services
Role of Names and Naming Services
- Name Resolution
[Figure: a client sends a name (e.g., www.google.com) to the Naming Service, which keeps a table of name, IP, and attributes and returns the corresponding IP address]
Role of names (cont.)
names play an important role in:
sharing resources
uniquely identifying entities
referring to locations
etc.
an important issue is that a name can be resolved to the entity
it refers to
to resolve names, it is necessary to implement a naming
system
in a distributed system, the implementation of a naming
system is itself often distributed, unlike in nondistributed
systems
efficiency and scalability of the naming system are the main
issues
4.1 Naming Entities
Names, Identifiers, and Addresses
a name in a distributed system is a string of bits or
characters that is used to refer to an entity
an entity is anything; e.g., resources such as hosts, printers,
disks, files, objects, processes, users, ...
entities can be operated on; e.g., a resource such as a printer
offers an interface containing operations for printing a
document, requesting the status of a job, ...
to operate on an entity, it is necessary to access it through
its access point, which is itself a (special) entity
Identity
Identifier properties:
An identifier refers to at most one entity
Internet Centric View
Addresses:
Says how to reach an object; it has location semantics
associated with it
Usually, a format easy to process by computers
Name:
Does not have any location semantics associated with it
Usually, a format easier for people to understand/read/remember
Examples:
IP address: 169.229.131.109
Name: arachne.berkeley.edu
Naming Systems
Flat Naming
Resolves identifiers to addresses
Structured Naming
Resolves structured human-friendly names to addresses
Attribute-based Naming
Resolves descriptive names to addresses
Name Service
Name space: defines the set of possible names and their relationships
Hierarchical (e.g., Unix and Windows file names)
Flat
Bindings: the mapping between names and values (e.g., addresses or
other names)
Bindings can be implemented by using tables
Resolution: procedure that, when invoked with a name, returns the
corresponding value
Name server: specific implementation of a resolution mechanism that
is available on the network and that can be queried by sending
messages
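The terms above can be illustrated with a minimal sketch, assuming a table-based implementation. The class name NameService and its methods are hypothetical, chosen only for this illustration; the sample name and address are the ones from the earlier Internet-centric example, and the alias name is made up.

```python
class NameService:
    """Toy name service: a table of bindings plus a resolution procedure."""

    def __init__(self):
        # Bindings: name -> value (an address or another name).
        self.bindings = {}

    def bind(self, name, value):
        self.bindings[name] = value

    def resolve(self, name):
        """Return the value bound to name, following names bound to names."""
        if name not in self.bindings:
            raise KeyError(name)
        value = self.bindings[name]
        seen = {name}
        # A binding may map a name to another name; keep resolving until
        # the value is no longer itself a bound name.
        while value in self.bindings:
            if value in seen:
                raise LookupError("cyclic binding for " + name)
            seen.add(value)
            value = self.bindings[value]
        return value


ns = NameService()
ns.bind("arachne.berkeley.edu", "169.229.131.109")
ns.bind("arachne-alias", "arachne.berkeley.edu")   # hypothetical alias
print(ns.resolve("arachne-alias"))
```

A real name server would expose resolve over the network and answer query messages; here it is an ordinary method call.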
Binding and Resolution in the Internet
Mapping
access point
the name of an access point is called an address (such as
IP address and port number as used by the transport layer)
the address of the access point of an entity is also referred
to as the address of the entity
an entity can have more than one access point (similar to
accessing an individual through different telephone
numbers)
an entity may change its access point in the course of time
(e.g., a mobile computer getting a new IP address as it
moves)
an address is a special kind of name
Examples
name of an FTP server (entity)
4.2 Name Spaces and Name Resolution
names in a distributed system are organized into a name space
a name space is generally organized as a labeled, directed
graph with two types of nodes
leaf node: represents the named entity and stores
information such as its address or the state of that entity
directory node: a special entity that has a number of outgoing
edges, each labeled with a name
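As a rough sketch of such a naming graph, directory nodes can hold labeled outgoing edges and leaf nodes can hold an address. The class Node, the function resolve_path, and the stored address string are hypothetical and serve only as an illustration.

```python
class Node:
    """A node in the naming graph."""

    def __init__(self, address=None):
        self.address = address   # set for leaf nodes
        self.edges = {}          # label -> Node, used by directory nodes


def resolve_path(root, path):
    """Follow the edge labels of an absolute path name, starting at root."""
    node = root
    for label in [part for part in path.split("/") if part]:
        if label not in node.edges:
            raise KeyError("no edge labeled " + repr(label))
        node = node.edges[label]
    return node


# Build a tiny name space: /home/steen/mbox
root = Node()
home = root.edges["home"] = Node()
steen = home.edges["steen"] = Node()
steen.edges["mbox"] = Node(address="hypothetical-address-of-mbox")

print(resolve_path(root, "/home/steen/mbox").address)
```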
Hard link
symbolic link: the entity is represented by a leaf node that,
instead of storing the address or state of the entity,
stores an absolute path name
example: Sun’s Network File System (NFS) is a distributed file
system with a protocol that describes how a client can access
a file stored on a (remote) NFS file server
an NFS URL may look like nfs://flits.cs.vu.nl/home/steen
- nfs is an implementation of a protocol
- flits.cs.vu.nl is a server name to be resolved using DNS
- /home/steen is resolved by the server
e.g., the subdirectory /remote includes mount points for
foreign name spaces on the client machine
a directory node named /remote/vu is used to store
nfs://flits.cs.vu.nl/home/steen
consider /remote/vu/mbox
this name is resolved by starting at the root directory on
the client’s machine and continuing until node /remote/vu
is reached, which returns the URL nfs://flits.cs.vu.nl/home/steen
this leads the client machine to contact flits.cs.vu.nl
using the NFS protocol
then the file mbox is read from the directory /home/steen
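The resolution of /remote/vu/mbox can be sketched as follows. This is only an illustration of the idea, not how an NFS client is implemented: the table MOUNT_POINTS and the function resolve_mounted are hypothetical, and no server is contacted.

```python
from urllib.parse import urlsplit

# Mount points in the client's local name space (taken from the example above).
MOUNT_POINTS = {"/remote/vu": "nfs://flits.cs.vu.nl/home/steen"}


def resolve_mounted(path):
    """Split a local path into (protocol, server name, path on the server)."""
    for mount_point, url in MOUNT_POINTS.items():
        if path == mount_point or path.startswith(mount_point + "/"):
            parts = urlsplit(url)
            rest = path[len(mount_point):]   # e.g. "/mbox"
            # parts.netloc would be resolved with DNS,
            # and the returned path would be resolved by the server.
            return parts.scheme, parts.netloc, parts.path + rest
    raise KeyError(path + " is not under a mount point")


print(resolve_mounted("/remote/vu/mbox"))
```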
Linking and Mounting
[Figure: name space A and name space B joined by mounting; labels shown: protocol, server, mounting point, mount point]
A different approach to merge name spaces (with
scalability problems)
[Figure: name spaces merged under a new root node, with a mapping table]
global layer
formed by highest level nodes (root node and nodes close
to it or its children)
nodes on this layer are characterized by their stability, i.e.,
directory tables are rarely changed
they may represent organizations or groups of
organizations whose names are stored in the name
space
administrational layer
groups of entities that belong to the same organization or
administrational unit, e.g., departments
relatively stable
managerial layer
nodes that may change regularly, e.g., nodes representing
hosts of a LAN, shared files such as libraries or binaries,
…
nodes are managed not only by system administrators, but
also by end users
an example partitioning of the DNS name space, including Internet-accessible files, into three layers
the name space is divided into nonoverlapping parts, called
zones in DNS
a zone is a part of the name space that is implemented by a
separate name server
some requirements of servers at different layers
performance (responsiveness to lookups), availability (failure
rate), etc.
high availability is critical for the global layer, since name
resolution cannot proceed beyond the failing server; it is also
important at the administrational layer for clients in the same
organization
performance is very important in the lowest (managerial) layer;
for the higher layers, results of lookups can be cached and
reused because of the relative stability of those layers
availability and performance may be enhanced by client-side
caching (at the global and administrational layers, since names
there do not change often) and by replication; both complicate
the implementation since they may introduce inconsistency
problems (see Chapter 6)
Item Global Administrational Managerial
Simple DNS Example
Host whistler.cs.cmu.edu wants IP address of www.berkeley.edu
1. Contacts its local DNS server, mango.srv.cs.cmu.edu
2. mango.srv.cs.cmu.edu contacts root name server, if necessary
3. Root name server contacts authoritative name server
ns1.berkeley.edu, if necessary
[Figure: requesting host, local name server, root name server, and authoritative name server, with message steps numbered 1 to 6]
Iterative
a name resolver hands over the complete name to the root name
server
the root server resolves the name as far as it can and returns the
result to the client; each layer resolves as much as it can and
returns the address of the next name server
at a minimum, it can resolve the first level and send the name of
the first-level name server to the client
the client calls the first-level name server, then the second, ..., until
it finds the address of the entity
recursive name resolution of <nl, vu, cs, ftp>; name servers cache
intermediate results for subsequent lookups
communication costs may be reduced in recursive name
resolution
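The difference between the two styles can be sketched as below for the name <nl, vu, cs, ftp>. The server names in SERVERS and the final address string are hypothetical placeholders; each "server" is just a dictionary, and caching is left out.

```python
# Hypothetical name servers: each maps one label to the next server,
# and the last one maps the final label to an address.
SERVERS = {
    "root": {"nl": "ns-nl"},
    "ns-nl": {"vu": "ns-vu"},
    "ns-vu": {"cs": "ns-cs"},
    "ns-cs": {"ftp": "address-of-ftp"},
}


def resolve_iterative(labels):
    """Iterative: the client itself contacts every name server in turn."""
    server, contacted_by_client = "root", []
    for label in labels:
        contacted_by_client.append(server)
        # The server resolves one label and returns the next step to the client.
        server = SERVERS[server][label]
    return server, contacted_by_client


def resolve_recursive(labels, server="root"):
    """Recursive: each server passes the rest of the name to the next server."""
    result = SERVERS[server][labels[0]]
    if len(labels) == 1:
        return result
    return resolve_recursive(labels[1:], result)


name = ["nl", "vu", "cs", "ftp"]
print(resolve_iterative(name))
print(resolve_recursive(name))
```

In the iterative version the client exchanges messages with all four servers; in the recursive version it contacts only the root, which is why communication costs for the client may be lower.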
Label
each node has a label, a string with a maximum of 63
characters (case insensitive)
the root label is null
children of a node must have different names (to guarantee
uniqueness)
Domain Name
each node has a domain name
a full domain name is a sequence of labels separated by dots
(the last character is a dot)
domain names are read from the node up to the root
full path names must not exceed 255 characters
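The rules above can be checked with a short sketch. It applies only the limits stated on this slide (63 characters per label, 255 characters per full name, trailing dot, case-insensitive labels); the function names are hypothetical, and real DNS software enforces further rules that are not modeled here.

```python
def is_valid_full_domain_name(name):
    """Check a full domain name against the limits stated above."""
    if not name.endswith(".") or len(name) > 255:
        return False
    if name == ".":
        return True   # the root: its label is null
    labels = name[:-1].split(".")
    return all(1 <= len(label) <= 63 for label in labels)


def same_domain_name(a, b):
    """Labels are case insensitive, so compare in lower case."""
    return a.lower() == b.lower()


print(is_valid_full_domain_name("ftp.cs.vu.nl."))
print(same_domain_name("FTP.cs.vu.nl.", "ftp.CS.vu.nl."))
```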
4.3 Locating Mobile Entities
the naming services discussed so far are used for naming
entities that have fixed locations
they are not well suited for supporting name-to-address
mappings that change regularly, as is the case with mobile
entities
mobility could be within the same domain or to a different
domain
e.g. 1: an FTP server called ftp.cs.vu.nl is moved to a new
machine (but within the same domain)
update only the DNS database of the name server for
cs.vu.nl; lookups are not affected
e.g. 2: ftp.cs.vu.nl is moved to a machine named
ftp.cs.unisa.edu.au, which is in a completely different domain
two solutions allow users to continue to access the server:
1. record the address of the new machine in the DNS
database for cs.vu.nl; lookup operations are not affected,
but if ftp.cs.vu.nl moves once again to a different machine,
the database must be updated again, making operations on
nodes at the managerial layer less efficient
2. record the name of the new machine, instead of its
address, in the DNS database, making ftp.cs.vu.nl a
symbolic link
lookup operations become less efficient (two-step process)
but a further move needs only a local update (make
ftp.cs.unisa.edu.au a symbolic link)
but each such move adds another step to the lookup
operation
hence, both approaches have drawbacks
the problem with traditional naming services is that they
maintain a direct mapping between human-friendly names and
the addresses of entities
each time a name or an address changes, the mapping must
also change
a better solution is to separate naming from locating entities
by introducing identifiers (an identifier never changes, each entity
has exactly one identifier, and an identifier is never assigned to
a different entity)
a naming service is used to look up an identifier; it gets a
name as input and returns an identifier as output
the identifier (obtained from a naming service) can be stored
locally since it does not change
locating an entity is handled by a location service; it gets an
identifier as input and returns the current address of the
identified entity as output
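The separation of naming from locating can be sketched with two tables. The identifier "id-1234" and the address strings are hypothetical placeholders, and both services are plain dictionaries rather than real network services.

```python
# Naming service: human-friendly name -> identifier (never changes).
naming_service = {"ftp.cs.vu.nl": "id-1234"}
# Location service: identifier -> current address (changes when the entity moves).
location_service = {"id-1234": "address-A"}


def lookup(name, identifier_cache):
    """Resolve a name in two steps, caching the stable identifier locally."""
    if name not in identifier_cache:
        identifier_cache[name] = naming_service[name]
    return location_service[identifier_cache[name]]


def move(identifier, new_address):
    """Only the location service is updated when an entity moves."""
    location_service[identifier] = new_address


cache = {}
print(lookup("ftp.cs.vu.nl", cache))
move("id-1234", "address-B")
print(lookup("ftp.cs.vu.nl", cache))
```

The cached identifier stays valid after the move; only the second step returns a different result.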
a) direct, single level mapping between names and addresses
b) two-level mapping using identifiers
Location Service
two solutions for LANs: Broadcasting and Multicasting, and
Forwarding Pointers
1. Broadcasting and Multicasting
a computer that wants to access another computer
whose IP address it knows broadcasts this address
the owner responds by sending its Ethernet address
used by ARP (Address Resolution Protocol) in the Internet
to find the data link address (MAC address) of a machine
broadcasting is inefficient when the network grows
(wastage of bandwidth and too much interruption to other
machines)
multicasting is better when the network grows - send only
to a restricted group of hosts
multicasting can also be used to locate the nearest replica
- choose the one whose reply comes in first
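The broadcasting idea, and why it scales poorly, can be illustrated with a small simulation. This is not the ARP protocol itself: the class Host, the counter interrupts, and all IP and MAC values are hypothetical.

```python
class Host:
    def __init__(self, ip, mac):
        self.ip, self.mac = ip, mac
        self.interrupts = 0   # broadcast requests this host had to inspect

    def on_request(self, wanted_ip):
        """Every host sees the broadcast; only the owner answers."""
        self.interrupts += 1
        return self.mac if wanted_ip == self.ip else None


def broadcast_lookup(hosts, wanted_ip):
    """Send the request to all hosts and return the owner's reply, if any."""
    replies = [host.on_request(wanted_ip) for host in hosts]
    answers = [reply for reply in replies if reply is not None]
    return answers[0] if answers else None


hosts = [Host("10.0.0.%d" % i, "mac-%d" % i) for i in range(1, 6)]
print(broadcast_lookup(hosts, "10.0.0.3"))
print([host.interrupts for host in hosts])
```

Every host is interrupted once per lookup, which is the cost that grows with the network; multicasting would limit the request to a restricted group of hosts.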
2. Forwarding Pointers
when an entity moves from A to B, it leaves behind a
reference to its new location
advantage
simple: as soon as the entity has been located once using a
traditional naming service, the chain of forwarding
pointers can be used to find the current address
drawbacks
the chain can be too long - locating becomes expensive
all the intermediary locations in a chain have to maintain
their pointers; vulnerability if links are broken
hence, making sure that chains are short and that
forwarding pointers are robust is an important issue
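A chain of forwarding pointers can be sketched as a dictionary from an old location to the next one. The function shorten shows one possible way to keep chains short, by redirecting the first pointer to the current location; it is an assumption made for illustration, not a method described on this slide. Location names are hypothetical.

```python
def locate(start, pointers):
    """Follow forwarding pointers from start; return (current location, hops)."""
    location, hops, seen = start, 0, {start}
    while location in pointers:
        location = pointers[location]
        if location in seen:
            raise RuntimeError("cycle in forwarding chain")
        seen.add(location)
        hops += 1
    return location, hops


def shorten(start, pointers):
    """Assumed optimization: point start directly at the current location."""
    current, _ = locate(start, pointers)
    if current != start:
        pointers[start] = current
    return current


# The entity moved A -> B -> C -> D, leaving a pointer behind each time.
pointers = {"A": "B", "B": "C", "C": "D"}
print(locate("A", pointers))
shorten("A", pointers)
print(locate("A", pointers))
```

If the pointer stored at B or C is lost, lookups starting there fail, which illustrates the vulnerability of broken links.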
Home-Based Approaches
the previous approaches have scalability problems
a home location keeps track of the current location of an
entity; often it is the place where an entity was created
it is a two-tiered approach
an example where it is used is Mobile IP
each mobile host uses a fixed IP address
all communication to that IP address is initially directed
to the host’s home agent, located on the LAN
corresponding to the network address contained in the
mobile host’s IP address
whenever the mobile host moves to another network, it
requests a temporary address in the new network and
informs the home agent of the new address
when the home agent receives a message for the mobile
host, it forwards it to the new address and also informs the
sender of the host’s current location for sending subsequent
packets
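The home-agent behavior described above can be sketched as follows. This is a simplified model, not the Mobile IP protocol: the class HomeAgent, the function send, and the address strings are hypothetical.

```python
class HomeAgent:
    def __init__(self):
        self.current = {}   # fixed home address -> temporary address

    def register(self, home_address, temporary_address):
        """Called by the mobile host after it moves to another network."""
        self.current[home_address] = temporary_address

    def receive(self, home_address, packet):
        """Forward the packet and report the host's current location."""
        destination = self.current.get(home_address, home_address)
        return {"forwarded_to": destination, "packet": packet}


def send(agent, known_locations, home_address, packet):
    """First packet goes via the home agent; later ones go directly."""
    if home_address in known_locations:
        return "direct", known_locations[home_address]
    reply = agent.receive(home_address, packet)
    known_locations[home_address] = reply["forwarded_to"]
    return "via home agent", reply["forwarded_to"]


agent = HomeAgent()
agent.register("home-address", "temporary-address")
known = {}
print(send(agent, known, "home-address", "first packet"))
print(send(agent, known, "home-address", "second packet"))
```

The sketch does not handle a host that moves again after the sender has learned its location.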
home-based approach: the principle of Mobile IP
problems:
creates communication latency
the host is unreachable if the home no longer exists
(permanently changed); the solution is to register the home
at a traditional name service
Hierarchical Approaches
a generalization of the two-tiered approach into multiple
layers
a network is divided into a collection of domains, similar to
DNS
a single top-level domain spans the entire network
each domain can be subdivided into multiple, smaller
domains
the lowest-level domain is called a leaf domain; typically a
LAN
each domain D has an associated directory node dir(D) that
keeps track of the entities in that domain leading to a tree of
directory nodes
the root (directory) node knows about all entities
hierarchical organization of a location service into domains, each having an
associated directory node
an example of storing information of an entity having two addresses in
different leaf domains
Pointer Caching
caching is effective only if the cached data rarely change
since a mobile entity changes its address regularly, it is not
advisable to cache its address; instead we can cache the
pointers in higher level domains since they don’t change
frequently
if D is the smallest domain in which a mobile entity moves
regularly, then a lookup operation can start at dir(D); hence
cache dir(D)
caching a reference to a directory node of the lowest-level domain in
which an entity will reside most of the time
4.4 Removing Unreferenced Entities
when an entity is no longer referenced (by naming and location
services), it must be removed
facilities for automatically removing unreferenced entities are
called distributed garbage collectors
The Problem of Unreferenced Objects
consider remote objects
an object can be accessed only if there is a remote reference
to it
an object for which there is no remote reference to it must be
removed
but there could be two objects, each storing a reference to
the other, that are not referenced by anything else; this can be
generalized to two or more objects forming a cycle of
objects referring only to each other
such objects must be detected and removed
this can be modeled by a graph, where each node represents
an object
there are special objects, such as system wide services and
users, which need not be referenced themselves, called the
root set
the hollow nodes represent objects that are not directly or
indirectly referenced by objects in the root set; such objects
must be removed
the problem of maintaining a proper reference count in the presence of
unreliable communication
another problem occurs when copying a remote reference to
another process
P1 passes a reference to P2, but the object is not yet aware of
the new reference; then P1 removes its own reference before
P2 contacts the object, creating a race condition
solution: let P1 first inform the object that it is passing a
reference to P2
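The race and its fix can be sketched with a simple reference counter. The class RemoteObject and both functions are hypothetical; message ordering is simulated by the order of the method calls, and the object is assumed to be removed as soon as its counter reaches zero.

```python
class RemoteObject:
    def __init__(self):
        self.count = 0
        self.alive = True

    def increment(self):
        if not self.alive:
            raise RuntimeError("object was already removed")
        self.count += 1

    def decrement(self):
        self.count -= 1
        if self.count == 0:
            self.alive = False   # no references left: the object is removed


def pass_reference_unsafe(obj):
    """P1 removes its reference before P2 has contacted the object."""
    obj.decrement()   # P1's removal arrives first; the count drops to 0
    obj.increment()   # P2's message arrives too late and fails


def pass_reference_safe(obj):
    """P1 first informs the object that it is passing a reference to P2."""
    obj.increment()   # registered on behalf of P2
    obj.decrement()   # only then does P1 remove its own reference


obj = RemoteObject()
obj.increment()            # P1 holds the only reference
pass_reference_safe(obj)
print(obj.alive, obj.count)
```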
when a new remote reference is created, half of the partial
weight is assigned to the new proxy
when a remote reference is duplicated, half of the partial
weight of the proxy is assigned to the new proxy
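The two halving rules above can be sketched as follows. Several details are assumptions not stated on these slides: the object side (called Skeleton here) keeps a total weight and a partial weight, the initial weight is 128, and deleting a proxy subtracts its weight from the total. Running out of weight to halve is not handled.

```python
class Skeleton:
    """Object side: keeps a total weight and a partial weight (assumed design)."""

    def __init__(self, total=128):
        self.total = total
        self.partial = total

    def new_reference(self):
        """Creating a remote reference hands half the partial weight to the proxy."""
        half = self.partial // 2
        self.partial -= half
        return Proxy(self, half)

    def unreferenced(self):
        # Assumption: no proxies remain when total and partial weight are equal.
        return self.total == self.partial


class Proxy:
    def __init__(self, skeleton, weight):
        self.skeleton, self.weight = skeleton, weight

    def duplicate(self):
        """Duplicating hands half of this proxy's weight to the new proxy."""
        half = self.weight // 2
        self.weight -= half
        return Proxy(self.skeleton, half)

    def delete(self):
        # Assumption: the proxy's weight is subtracted from the total weight.
        self.skeleton.total -= self.weight
        self.weight = 0


s = Skeleton()
p1 = s.new_reference()
p2 = p1.duplicate()       # no message to the object is needed here
p1.delete()
p2.delete()
print(s.unreferenced())
```

Under these assumptions, duplicating a reference involves only the two proxies, so the race between incrementing and decrementing a counter at the object does not arise.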
Identifying Unreachable Entities
some entities can’t be reached from the root set and must be
removed; but the garbage collection methods discussed so
far fail to locate these entities
we need methods that trace all entities and
remove those that cannot be reached from the root set;
such methods are called tracing-based garbage collection
Naive Tracing in Distributed Systems
in a uniprocessor system mark-and-sweep collectors are
used
they use two phases
mark phase: trace all entities from the root set and mark
them (such as recording the entity in a table)
sweep phase: search those that are not marked (those to
be removed)
drawbacks: to ensure that the reachability graph remains the
same, all executing programs need to be stopped
temporarily while execution is switched to garbage
collection; this scenario is called stop-the-world and is not
desirable for distributed garbage collectors
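The two phases can be sketched for a single process as below. The reference graph is a plain dictionary and the entity names are hypothetical; the sketch assumes nothing changes the graph while it runs, which is exactly the stop-the-world requirement.

```python
def mark(root_set, references):
    """Mark phase: record every entity reachable from the root set."""
    marked, stack = set(), list(root_set)
    while stack:
        entity = stack.pop()
        if entity in marked:
            continue
        marked.add(entity)
        stack.extend(references.get(entity, ()))
    return marked


def sweep(all_entities, marked):
    """Sweep phase: everything that was not marked is to be removed."""
    return set(all_entities) - marked


# "x" and "y" refer only to each other, so neither is reachable from the root set.
references = {"root": ["a"], "a": ["b"], "b": [], "x": ["y"], "y": ["x"]}
marked = mark({"root"}, references)
print(sorted(marked))
print(sorted(sweep(references, marked)))
```

Unlike reference counting, tracing also finds the cycle formed by "x" and "y".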
Thank You!!