Unit 2
Running applications hold their information in data structures, whereas the messages that move
between components of a distributed system consist of sequences of bytes. A data structure must
therefore be converted to a sequence of bytes before it is transmitted, and on arrival the bytes
must be converted back into the original data structure.
Computers handle many different types of data, and these types are not represented in the same way
on every machine to which data may be sent. Individual primitive data items can take a variety of
values, and not all computers store primitive values such as integers in the same byte order.
Different architectures also represent floating-point numbers differently. Integers are ordered in
one of two ways: big-endian order, in which the Most Significant Byte (MSB) is placed first, and
little-endian order, in which the Least Significant Byte (LSB) is placed first.
A further issue is the set of codes used to represent characters. Most applications on UNIX systems
use ASCII character coding, which uses one byte per character, whereas Unicode encodings such as
UTF-16 use two or more bytes per character and allow text in many different languages to be
represented.
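The two byte orders and the difference in character sizes can be seen directly in a few lines of Java. This is a minimal sketch; the class name ByteOrderDemo and the sample value are chosen only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class ByteOrderDemo {
    // Encode a 32-bit integer using the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    // Render bytes as two-digit hex values separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D;
        // Same integer, two different byte sequences.
        System.out.println(hex(encode(value, ByteOrder.BIG_ENDIAN)));    // 0A 0B 0C 0D
        System.out.println(hex(encode(value, ByteOrder.LITTLE_ENDIAN))); // 0D 0C 0B 0A
        // Same character, different sizes depending on the character coding.
        System.out.println("A".getBytes(StandardCharsets.US_ASCII).length); // 1
        System.out.println("A".getBytes(StandardCharsets.UTF_16BE).length); // 2
    }
}
```

A receiver that assumed the wrong byte order would read the little-endian bytes as a completely different integer, which is why the sender and receiver need an agreed representation.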
There must be a way to convert all of this data to a standard format so that it can be exchanged
successfully between computers. One of two methods can be used. In the first, values are converted
to an agreed-upon external format before transmission and converted to the local format on receipt;
if the two computers are known to be of the same type, the conversion to the external format can be
skipped. In the second, values are sent in the sender’s format, along with a description of the
format, and the recipient converts them if necessary. It’s worth noting, though, that the bytes
themselves are never altered during transmission. To support Remote Procedure Call (RPC) or Remote
Method Invocation (RMI) mechanisms, any data type that can be passed as a parameter or returned as
a result must be convertible to a sequence of bytes, with its individual primitive values expressed
in an agreed format. An agreed standard for representing data structures and primitive values is
called an external data representation.
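The option of sending values in the sender's format together with a description of that format can be sketched as follows. The one-byte flag and the class name TaggedMessage are assumptions made for this illustration, not part of any standard described in these notes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class TaggedMessage {
    // Hypothetical flag values describing the sender's byte order.
    static final byte BIG = 0;
    static final byte LITTLE = 1;

    // Sender: write a one-byte format flag, then the value in the sender's own order.
    static byte[] send(int value, ByteOrder senderOrder) {
        ByteBuffer buf = ByteBuffer.allocate(5);
        buf.put(senderOrder == ByteOrder.BIG_ENDIAN ? BIG : LITTLE);
        buf.order(senderOrder).putInt(value);
        return buf.array();
    }

    // Receiver: read the flag, then interpret the payload in the order it announces.
    static int receive(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        ByteOrder order = buf.get() == BIG ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN;
        return buf.order(order).getInt();
    }

    public static void main(String[] args) {
        System.out.println(receive(send(1984, ByteOrder.LITTLE_ENDIAN))); // 1984
        System.out.println(receive(send(1984, ByteOrder.BIG_ENDIAN)));    // 1984
    }
}
```

The bytes are not changed in transit; only the receiver's interpretation depends on the flag, so a receiver with the same byte order as the sender does no conversion work at all.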
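Approaches:
Three approaches to external data representation and marshalling are covered here: CORBA's Common
Data Representation (CDR), Java object serialization, and XML. CORBA itself is built around the
following components: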
Object Request Broker (ORB): It provides a communication infrastructure for the objects to
communicate across a network.
Dynamic Invocation Interface (DII): Using DII, client applications can use server objects
without knowing their types at compile time. The client obtains a reference to a CORBA
object and can then make invocation requests dynamically on that object.
Interface Repository (IR): As the name implies, interfaces can be registered in the interface
repository. Its purpose is to let a client find an object that was not known at compile time,
look up information about its interface, and then build a request to be sent to the ORB.
Object Adapter (OA): It is used to access ORB services like object reference generation.
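The idea behind dynamic invocation, calling an operation on an object whose type was not known when the client was compiled, can be illustrated with Java reflection. This is only an analogy: it is not the CORBA DII API, and the class name DynamicInvocationDemo is hypothetical.

```java
import java.lang.reflect.Method;

final class DynamicInvocationDemo {
    // Invoke a no-argument method by name on an object whose type
    // is discovered at run time rather than at compile time.
    static Object invoke(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot invoke " + methodName, e);
        }
    }

    public static void main(String[] args) {
        Object target = "corba"; // the caller only sees an Object
        System.out.println(invoke(target, "length"));      // 5
        System.out.println(invoke(target, "toUpperCase")); // CORBA
    }
}
```

In this analogy the run-time lookup of the method plays roughly the role that an interface lookup plays for a CORBA client that builds a request dynamically.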
Common Data Representation (CDR) is CORBA's external representation for the structured and
primitive data types that are passed as arguments or results in remote invocations on CORBA
distributed objects. It allows clients and servers written in different programming languages to
communicate with one another. For example, it covers the conversion between little-endian and
big-endian byte order.
CDR defines 15 primitive types, including short (16-bit), long (32-bit), unsigned short, unsigned
long, float (32-bit), double (64-bit), char, boolean (TRUE, FALSE), octet (8-bit), and any (which
can represent any basic or constructed type), as well as a variety of composite types, such as:
string: The length (unsigned long) followed by the characters in order (wide characters are
also possible).
array: The elements in order; the length is fixed and is therefore not transmitted.
enumerated: An unsigned long; the values are given by the order in which they are declared.
Example:
struct Person {
string name;
string place;
long year;
};
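To make the layout concrete, the following sketch flattens the three fields of Person in declaration order, writing each string as a 4-byte length followed by its characters and padding to a 4-byte boundary. It is a simplified illustration under those assumptions, not a conforming CDR encoder: the CORBA specification has further rules (for example on string terminators and alignment) that are omitted here, and the class name CdrSketch is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class CdrSketch {
    // Write a string as a 4-byte length, then its characters,
    // then zero padding up to the next 4-byte boundary.
    static void putString(ByteBuffer buf, String s) {
        byte[] chars = s.getBytes(StandardCharsets.US_ASCII);
        buf.putInt(chars.length);
        buf.put(chars);
        while (buf.position() % 4 != 0) {
            buf.put((byte) 0);
        }
    }

    // Flatten the Person fields in declaration order: name, place, year.
    // The fixed 256-byte buffer assumes short strings (illustration only).
    static byte[] marshalPerson(String name, String place, int year) {
        ByteBuffer buf = ByteBuffer.allocate(256).order(ByteOrder.BIG_ENDIAN);
        putString(buf, name);
        putString(buf, place);
        buf.putInt(year);
        byte[] out = new byte[buf.position()];
        buf.flip();
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] message = marshalPerson("Smith", "London", 1984);
        System.out.println(message.length); // 28
    }
}
```

Note that the message carries no field names or type tags; the receiver can only unmarshal it because both sides know the struct definition in advance.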
Marshalling in CORBA:
Marshalling operations can be generated automatically from the specification of the types of the
data items to be transmitted in a message. CORBA IDL describes the types of data structures and
primitive data items, and provides a notation for specifying the types of the arguments and
results of RMI methods.
Java Remote Method Invocation (RMI) allows both objects and primitive data values to be passed as
arguments and results of method invocations. In Java, the term serialization refers to the activity
of flattening an object (an instance of a class) or a set of related objects into a serial format
suitable for saving to disk or sending in a message.
Java provides a mechanism called object serialization. It allows an object to be represented as a
sequence of bytes containing the object’s data together with information about the type of the
object and the types of the data stored in it. After a serialized object has been written to a
file, it can be read from the file and deserialized; that is, the type information and the bytes
that represent the object and its data are used to recreate the object in memory.
Moreover, an object can be serialized on one platform and deserialized on a completely different
platform, because the serialized form does not depend on the underlying platform.
For example, the Java class equivalent to the Person struct defined in CORBA IDL might be:
import java.io.*;
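Only the import line of the listing is present here. A minimal sketch of such a class, assuming fields that mirror the IDL struct (name, place, year), might look like the following; the accessors and the serialize and deserialize helpers are additions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class Person implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String name;
    private final String place;
    private final int year;

    Person(String name, String place, int year) {
        this.name = name;
        this.place = place;
        this.year = year;
    }

    String getName() { return name; }
    String getPlace() { return place; }
    int getYear() { return year; }

    // Flatten a Person into a sequence of bytes.
    static byte[] serialize(Person person) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(person);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuild a Person from bytes produced by serialize.
    static Person deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Person) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A round trip such as Person.deserialize(Person.serialize(new Person("Smith", "London", 1984))) yields a new object with the same field values. Unlike the CDR form, the serialized bytes include information about the class, so the receiver does not need prior knowledge of the types in the message.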
XML is a markup language that was defined by the World Wide Web Consortium (W3C) for general use on
the web. It was initially developed for writing structured documents for the web. Clients use XML
to communicate with web services, and XML is also used to define the interfaces and other
properties of web services. XML is used in many other ways as well, including archiving and
retrieval systems; although an XML archive may be larger than a binary archive, it has the
advantage of being readable on any machine. Other XML applications include the design of user
interfaces and the encoding of operating system configuration files.
In contrast to HTML, which employs a fixed set of tags, XML is extensible in the sense that users
can define their own tags. If an XML document is meant to be used by several applications, the tag
names must be agreed upon among them.
Example:
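As an illustration, a Person record can be written with user-defined tags and read back with the XML parser in the Java standard library. The tag names, the id attribute, and the sample values below are assumptions chosen to mirror the Person struct shown earlier, and the class name PersonXml is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class PersonXml {
    // A Person described with user-defined tags (illustrative values).
    static final String SAMPLE =
        "<person id=\"123456789\">"
      + "<name>Smith</name>"
      + "<place>London</place>"
      + "<year>1984</year>"
      + "</person>";

    // Parse an XML string and return its root element.
    static Element parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Return the text of the first child element with the given tag name.
    static String field(Element person, String tag) {
        return person.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element person = parse(SAMPLE);
        System.out.println(field(person, "name"));  // Smith
        System.out.println(field(person, "place")); // London
        System.out.println(field(person, "year"));  // 1984
    }
}
```

Because the data is textual and every value is labelled by its tag, the message is self-describing and readable on any machine, at the cost of being larger than an equivalent binary encoding.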
Virtualization in Distributed Systems refers to the technology that abstracts and pools physical
resources (such as servers, storage, and network devices) to create virtual resources that can be
dynamically allocated and managed across a distributed network of physical machines.
Key Aspects of Virtualization in Distributed Systems:
Resource Pooling: Physical resources are pooled together and allocated to virtual instances
as needed. This enables efficient utilization of hardware by distributing resources among
multiple virtual environments.
Resource Optimization:
Scalability:
o They support the rapid deployment of new applications and services and enable
easier testing and development of software in isolated environments.
Simplified Management:
In distributed systems, virtualization can take various forms, each addressing different aspects of
resource management and deployment. Here are the primary types of virtualization used in
distributed systems:
1. Server Virtualization
Definition: Server virtualization involves creating multiple virtual servers on a single physical
server using a hypervisor.
2. Storage Virtualization
Definition: Storage virtualization abstracts physical storage resources into a single logical
storage pool, making it easier to manage and allocate storage.
3. Network Virtualization
Definition: Network virtualization abstracts network resources to create virtual networks
that are independent of physical hardware.
Types:
o Virtual LANs (VLANs): Segregates network traffic into different virtual networks
within a physical network, improving security and efficiency.
o Software-Defined Networking (SDN): Separates the control plane from the data
plane in networking, allowing for centralized network management and dynamic
resource allocation.
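The server virtualization idea described above, several virtual servers sharing one physical server, can be modelled in a few lines. This is a toy sketch for illustration only: the class PhysicalHost and its methods are hypothetical and do not correspond to any real hypervisor API, and CPU count stands in for all physical resources.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PhysicalHost {
    private final int totalCpus;
    // CPUs assigned to each virtual server, keyed by VM name.
    private final Map<String, Integer> vms = new LinkedHashMap<>();

    PhysicalHost(int totalCpus) {
        this.totalCpus = totalCpus;
    }

    // CPUs not yet assigned to any virtual server.
    int freeCpus() {
        int used = 0;
        for (int cpus : vms.values()) {
            used += cpus;
        }
        return totalCpus - used;
    }

    // Create a virtual server only if the physical capacity allows it.
    boolean createVm(String name, int cpus) {
        if (cpus <= 0 || vms.containsKey(name) || cpus > freeCpus()) {
            return false;
        }
        vms.put(name, cpus);
        return true;
    }

    // Destroying a virtual server returns its CPUs to the pool.
    boolean destroyVm(String name) {
        return vms.remove(name) != null;
    }

    public static void main(String[] args) {
        PhysicalHost host = new PhysicalHost(8);
        System.out.println(host.createVm("web", 4));   // true
        System.out.println(host.createVm("db", 3));    // true
        System.out.println(host.createVm("cache", 2)); // false
        System.out.println(host.freeCpus());           // 1
    }
}
```

The third request is refused because only one CPU remains; releasing a virtual server with destroyVm makes its share available again for new instances.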
Virtualization in distributed systems supports a range of use cases and applications that
enhance flexibility, efficiency, and scalability. Here are some key use cases and
applications:
Cloud Computing
o Resource Pooling: Data centers use virtualization to pool and allocate resources
dynamically based on demand, improving efficiency and utilization.
o Rapid Provisioning: Virtual machines and containers can be quickly provisioned for
development and testing, accelerating development cycles and improving agility.