UNIT 3
Data Management Issues
Part-1 .... (106J - 119J)
106 (IT-8) J Data Management Issues
PART- 1
Data Management Issues, Data Replication for Mobile Computers.
CONCEPT OUTLINE : PART-1
Data management is a process of managing data as a resource
that is valuable to an organization or business.
Some of the issues of data management are :
a. Mobility
b. Wireless medium
c. Transaction management
d. Portability of mobile devices
Data replication generates and manages multiple copies of data
at one or more sites.
Questions-Answers
Long Answer Type and Medium Answer Type Questions
Answer
Mobile database :
1. A mobile database is a database that can be connected to a mobile
computing device over a mobile network.
Mobile Computing 113 (IT-8) J
[Figure labels: MU, BS, Fixed Host, DBS]
8. In a mobile environment, the user may also change his device to access a
service.
9. So, the diversity of devices and consequently the context of use of services
or an application must be taken into account.
10. In fact, mobile environments are characterized by frequent changes
in their resources, which come from various sources such as the nature
of the wireless network itself, the mobility of users and multi-terminal access.
11. These changes may influence data replication because the creation of and
access to the data may need a set of resources.
12. For example, confidential data like a credit card number may not be
replicated and exchanged across non-secure nodes and links.
13. Thus, variation in the level of security may prevent the user from
accessing this data.
14. So, a traditional system is not able to satisfy the client's request.
15. To ensure service continuity, the replication system functionalities like
creation, placement, read, write and consistency operations must be
adapted to all variations in the resources that the data may need.
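The security example in points 12 and 13 can be sketched as a placement filter. This is only an illustration under assumed names: `Node`, `DataItem` and `replica_targets` are hypothetical and do not come from any particular replication system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    secure: bool          # assumed flag: is the node (and its link) trusted?

@dataclass(frozen=True)
class DataItem:
    name: str
    confidential: bool    # assumed flag: must the item stay on secure nodes?

def replica_targets(item, nodes):
    """Return the names of the nodes on which a replica of `item` may be placed."""
    if item.confidential:
        # Confidential data is never replicated to non-secure nodes.
        return [n.name for n in nodes if n.secure]
    return [n.name for n in nodes]

nodes = [Node("fixed_host", True), Node("public_hotspot", False), Node("phone", True)]
card = DataItem("credit_card_number", True)
weather = DataItem("weather_report", False)

print(replica_targets(card, nodes))     # only the secure nodes
print(replica_targets(weather, nodes))  # every node
```

If the user moves to a place where only `public_hotspot` is reachable, the list for the confidential item becomes empty, which is the situation point 13 describes.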
Que 3.5. How does data replication work ?
Answer
1. Replication can be provided in many forms.
2. The basic objective of the replication technique is to improve
performance and to increase data availability and consistency.
3. The principal functionalities of a replication system are :
a. Replica creation
b. Replica placement
c. Read/write operations
d. Replica consistency
Fig. 3.5.1. [Figure labels: application, application interface, application
trigger, strategy manager, replica planner, localization manager, consistency
manager, required context repository, provided context repository, environment.]
14. After receiving the notification from the application trigger modules,
the strategy manager chooses the adapted strategy, ensures system
consistency and implements this strategy in some or all modules (replica
planner, localization manager and consistency manager).
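The notification step described above can be sketched as follows. The class names, the rule table and the event names (`low_bandwidth`, `insecure_link`) are all hypothetical; the text does not specify how the strategy manager selects a strategy, so a simple lookup table is assumed here.

```python
class Module:
    """Stand-in for the replica planner, localization manager or consistency manager."""
    def __init__(self, name):
        self.name = name
        self.strategy = None

    def apply(self, strategy):
        self.strategy = strategy

class StrategyManager:
    def __init__(self, modules, rules):
        self.modules = {m.name: m for m in modules}
        # rules: context event -> (strategy name, names of the modules it affects)
        self.rules = rules

    def notify(self, event):
        """Called by the application trigger when the context changes."""
        strategy, affected = self.rules[event]
        for name in affected:               # "some or all modules"
            self.modules[name].apply(strategy)
        return strategy

modules = [Module("replica planner"), Module("localization manager"),
           Module("consistency manager")]
rules = {   # assumed example rules, for illustration only
    "low_bandwidth": ("asynchronous", ["consistency manager"]),
    "insecure_link": ("secure_nodes_only", ["replica planner", "localization manager"]),
}
manager = StrategyManager(modules, rules)
print(manager.notify("insecure_link"))
print({m.name: m.strategy for m in modules})
```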
Que 3.6. Discuss different possible replicating strategies.
Answer
Data replication strategies : Replication can be provided in numerous
forms and combinations. There are mainly three strategies of replication :
1. Synchronous replication :
a. Synchronous replication is a technique for replicating data between
databases (or file systems) where the system being replicated
waits for the data to have been recorded on the duplicate system
before proceeding.
b. Under the synchronous data replication strategy, updates are applied
to all database replicas of an object as part of the original transaction.
c. The database replicas are thus kept in a state of synchronization.
d. In synchronous replication, if one or more sites that hold replicas
are unavailable, the transaction cannot complete.
e. Also, a large number of messages are required to coordinate
synchronization.
2. Asynchronous replication :
a. Asynchronous replication is a technique for replicating data between
databases (or file systems) where the system being replicated does
not wait for the data to have been recorded on the duplicate system
before proceeding.
b. Here the target database is updated after the source database is
modified.
c. Also, the delay in regaining consistency may range from a few seconds
to several hours or even days.
d. Asynchronous replication has the advantage of speed, at the
increased risk of data loss during communication or duplicate
system failure.
e. It is a more recent technology for providing fault tolerance for server and
network storage.
f. Unlike previously used replication technology, asynchronous
replication technology works by capturing changes in files at the
operating system level (byte level).
g. Previous technologies, such as SQL transaction replication, work
within applications or at the hardware layer.
3. Push and pull data replication :
a. The push data replication strategy includes both snapshot replication (a
single updater form of replication where only the publishing sites
can update the data) and near real time replication schemes.
b. Snapshot data replication is best suited to applications which are
not in need of current data.
c. Such applications are found in data warehousing or data mining as
well as non-real time decision support systems.
d. A near real time replication scheme employs triggers that are stored at
each local database and execute each time a part of the replicated
database is updated, propagating the changes to the other remote
databases.
e. Triggers allow the database to update transparently and independent
of the programs and users.
f. In the push strategy, the source data controls the data replication
procedures, while with pull data replication schemes, the local
database determines the replication processes.
g. In push replication, a publisher site controls when replication is to
occur and 'pushes' the changes out to the subscribers.
h. While in pull replication, the subscriber sites determine when they
wish to receive replication transactions.
i. Here the publisher is the originator of a replicated database change
and the subscriber is a receiver of a replicated database change.
j. A subscriber may also be a publisher and a publisher may also be a
subscriber, so it is a bi-directional scheme.
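The difference between the synchronous and asynchronous strategies above can be sketched in a few lines. This is a minimal in-memory illustration, not a real database replication API: `Replica`, `synchronous_write` and `AsynchronousPrimary` are hypothetical names, and propagation is assumed to succeed whenever it is attempted.

```python
class Replica:
    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}

def synchronous_write(replicas, key, value):
    """Update every replica as part of one transaction, or none of them."""
    if not all(r.available for r in replicas):
        return False                  # one unavailable site blocks the transaction
    for r in replicas:
        r.data[key] = value
    return True

class AsynchronousPrimary:
    def __init__(self, source, targets):
        self.source = source
        self.targets = targets
        self.pending = []             # updates committed locally but not yet sent

    def write(self, key, value):
        self.source.data[key] = value  # commit at the source without waiting
        self.pending.append((key, value))

    def propagate(self):
        """Run later (seconds, hours or days): bring the targets up to date."""
        for key, value in self.pending:
            for target in self.targets:
                target.data[key] = value
        self.pending.clear()

site_a, site_b = Replica("a"), Replica("b", available=False)
print(synchronous_write([site_a, site_b], "x", 1))   # blocked by the unavailable site

source, target = Replica("source"), Replica("target")
primary = AsynchronousPrimary(source, [target])
primary.write("y", 2)
print(source.data, target.data)   # the target is stale until propagate() runs
primary.propagate()
print(source.data, target.data)
```

Any update still sitting in `pending` when the source fails is lost, which is the data-loss risk noted for asynchronous replication.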
Que 3.7. Discuss the concept of index replication. What purpose
it serves in mobile computing environment ?
UPTU 2014-15, Marks 05
Answer
Index replication :
1. Mobile computing environments have two major restrictions, i.e.,
bandwidth limitation and energy restriction.
2. Bandwidth limitation means that only very narrow bandwidths
can be used for wireless communication.
3. Energy restriction means that mobile computing devices usually use
batteries as their main energy source.
4. Data broadcasting is a mechanism that can efficiently cope with the
above discussed limitations.
5. The server sends a data stream to a large number of unspecified clients and
the clients receive the broadcast data.
6. As clients do not send requests to the server and the energy consumed
in sending data is much larger than that in receiving data, data
broadcasting is energy-efficient.
7. It is also bandwidth-efficient because many clients share the broadcasting
channel.
8. Mobile units provide two kinds of operating modes, i.e., active and doze
mode.
9. The energy consumption in doze mode is about 1000 times less than that
in active mode.
10. The "index" on the wireless broadcast data stream enables the mobile unit
to remain in doze mode when it need not read the broadcast data.
11. Without the index, the whole data stream must be read from the time the data
access request is initiated to the time the required data are completely
downloaded.
12. However, by using the index, the client reads only some index portions in
the broadcast stream and recognizes the appropriate address of the
target data.
13. After obtaining the address, i.e., the temporal offset from the index to
the data, the client can remain in doze mode until the target data are
delivered.
14. The amount of time elapsed from the moment a client asks for data to
the time it receives the appropriate data is called the "access time". The "tuning
time" is the amount of time for which the client actually listens to the
channel.
15. The major index replication schemes that have been developed so far
are (1, M) indexing and distributed indexing. (1, M) indexing replicates
the global index M times in each broadcast cycle. This can reduce the tuning
time for searching index buckets in the broadcast stream.
16. Distributed indexing organizes the index structure hierarchically and
replicates some part of the index appropriately. It performs better than (1,
M) indexing with respect to the tuning time, i.e., it is more energy
efficient.
17. A data bucket is a bucket whose contents are data.
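The trade-off between access time and tuning time can be illustrated with a small bucket-counting model. This is a simplified sketch under stated assumptions, not the published scheme in full: time is measured in buckets, every bucket is assumed to carry the offset to the next index copy, and the client is assumed to read a whole index copy (`index_len` buckets) rather than walk a tree. All function and parameter names are hypothetical.

```python
def build_cycle(n_data, index_len, m):
    """One broadcast cycle: M segments, each an index copy followed by data buckets."""
    assert n_data % m == 0, "this sketch assumes the data splits evenly into M segments"
    per_segment = n_data // m
    cycle = []
    for seg in range(m):
        cycle += [("index", seg)] * index_len
        cycle += [("data", seg * per_segment + k) for k in range(per_segment)]
    return cycle

def flat_broadcast(n_data, start, target):
    """No index: the client stays active for every bucket until the target arrives."""
    buckets = (target - start) % n_data + 1
    return buckets, buckets                      # (access time, tuning time)

def indexed_broadcast(n_data, index_len, m, start, target):
    """Index replicated M times: the client dozes between the buckets it needs."""
    cycle = build_cycle(n_data, index_len, m)
    size = len(cycle)

    def kind(pos):
        return cycle[pos % size][0]

    def is_index_start(pos):
        return kind(pos) == "index" and kind(pos - 1) != "index"

    pos, tuning = start, 0
    if not is_index_start(pos):
        tuning += 1                              # initial probe: learn where the index is
        pos += 1
        while not is_index_start(pos):
            pos += 1                             # doze until the next index copy
    tuning += index_len                          # read the index copy
    pos += index_len
    while cycle[pos % size] != ("data", target):
        pos += 1                                 # doze until the target data bucket
    tuning += 1                                  # read the target data bucket
    pos += 1
    return pos - start, tuning                   # (access time, tuning time)

# The client tunes in while data bucket 1 is on the air and wants data bucket 6.
print(flat_broadcast(8, start=1, target=6))            # (6, 6)
print(indexed_broadcast(8, 2, 2, start=3, target=6))   # (8, 4)
```

In this example the index copies lengthen the cycle, so the access time rises from 6 to 8 buckets, but the client is active for only 4 buckets instead of 6.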
Questions-Answers
Long Answer Type and Medium Answer Type Questions
11. The system topology is divided into clusters with independent control.
12. A good clustering scheme will tend to preserve its structure when a few
nodes are moving and the topology is slowly changing.
13. Otherwise, high processing and communications overheads will be paid
to re-construct clusters.
UPTU 2013-14, Marks 10
OR
Answer
Fig. 3.11.1. (a) System topology (b) Cluster formation
12. After clustering in Fig. 3.11.1(b), we can find six clusters in the system,
which are (1, 2), (3, 4, 11), (5, 6, 7, 8, 9), (10, 12, 13), (14, 15, 16, 17), (18,
19, 20).
13. To prove the correctness of the algorithm we have to show that :
a. Every node eventually determines its cluster and only one cluster.
b. In a cluster, any two nodes are at most two hops away.
c. The algorithm terminates.
if (my_id == min(T))
{
    my_cid = my_id;
    broadcast cluster(my_id, my_cid);
    T = T - {my_id};
}
for (;;)
{
    ...
    if (T is empty) stop;
}
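The fragment above only shows the start and the stopping condition of the distributed algorithm. A centralized simulation of lowest-ID clustering can make the outcome easier to follow. This sketch assumes that a node decides only after all of its lower-ID neighbours have decided, that it joins the lowest-ID neighbouring cluster head if one exists, and that it otherwise becomes a head itself. The topology is a made-up six-node example, not the one in Fig. 3.11.1, and `lowest_id_clusters` is a hypothetical name.

```python
def lowest_id_clusters(adjacency):
    """Return {cluster head id: [member ids]} for an undirected graph.

    adjacency maps each node id to the set of its one-hop neighbours.
    """
    cid = {}
    for node in sorted(adjacency):
        # Neighbours with a lower id have already decided; keep those that are heads.
        heads = [n for n in adjacency[node] if n < node and cid[n] == n]
        cid[node] = min(heads) if heads else node
    clusters = {}
    for node, head in cid.items():
        clusters.setdefault(head, []).append(node)
    return clusters

# Hypothetical topology (symmetric one-hop neighbour sets).
topology = {
    1: {2, 3},
    2: {1, 4},
    3: {1},
    4: {2, 5},
    5: {4, 6},
    6: {5},
}
print(lowest_id_clusters(topology))   # {1: [1, 2, 3], 4: [4, 5], 6: [6]}
```

Every member is a one-hop neighbour of its head, so any two nodes in the same cluster are at most two hops apart, and each node is assigned exactly one cluster ID.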
Que 3.12. What was the motivation for designing the CODA
system ? Discuss CODA file system in detail.
UPTU 2011-12, Marks 10
OR
Explain CODA file system.
Answer
1. CODA was designed to be a scalable, secure, and highly available
distributed file system.
2. An important goal was to achieve a high degree of naming and location
transparency so that the system would appear to its users very similar
to a pure local file system.
3. By also taking high availability into account, the designers of CODA
also tried to reach a high degree of failure transparency.
4. CODA is a descendant of version 2 of the Andrew File System (AFS), and
inherits many of its architectural features.
5. CODA follows the same organization as AFS.
6. Every Virtue workstation hosts a user-level process called Venus, whose
role is similar to that of an NFS client.
7. A Venus process is responsible for providing access to the files that are
maintained by the Vice file servers.
8. In CODA, Venus is also responsible for allowing the client to continue
operation even if access to the file servers is (temporarily) impossible.
9. This additional role is a major difference from the approach followed in
NFS.
10. The important issue is that Venus runs as a user-level process.
11. Again, there is a separate Virtual File System (VFS) layer that intercepts
all calls from client applications, and forwards these calls either to the
local file system or to Venus. This organization with VFS is the same as
in NFS.
12. Venus, in turn, communicates with Vice file servers using a user-level
RPC system. The RPC system is constructed on top of UDP datagrams
and provides at-most-once semantics. There are three different server
side processes. The great majority of the work is done by the actual
Vice file servers, which are responsible for maintaining a local collection
of files.
13. Like Venus, a file server runs as a user-level process. In addition, trusted
Vice machines are allowed to run an authentication server. Finally,
update processes are used to keep meta information on the file system
consistent at each Vice server.
14. CODA appears to its users as a traditional UNIX-based file system. It
supports most of the operations that form part of the VFS specification.
15. Unlike NFS, CODA provides a globally shared namespace that is
maintained by the Vice servers. Clients have access to this namespace
by means of a special subdirectory in their local namespace.
16. Whenever a client looks up a name in this subdirectory, Venus ensures
that the appropriate part of the shared namespace is mounted locally.
[Figure labels: mobile client (application, cache), server, Virtue client,
transparent access to a Vice file server]
Answer
1. Communication in CODA is done through remote procedure calls (RPC),
using the RPC2 system.
2. Each time a remote procedure is called, the RPC2 client code
starts a new thread that sends an invocation request to the server and
then blocks until it receives an answer.
3. As processing the request may take an arbitrary time, the server
regularly sends messages to the client to show that it is still working
on the request.
4. If the server stops responding, the client thread notices that these
messages have ceased and reports a failure to the calling application.
RPC2 also supports side effects.
5. A side effect is a mechanism by which the client and server can
communicate using an application specific protocol.
6. For example, RPC2 allows the client and the server to set up a separate
connection for transferring video data to the client on time.
7. Connection setup is done as a side effect of an RPC call to the server.
8. Venus runs as a user-level process. There is a separate virtual file
system (VFS) layer that intercepts all calls from client applications, and
forwards these calls either to the local file system or to Venus as shown
in Fig. 3.13.1.
9. Venus communicates with Vice file servers using a user-level Remote
Procedure Call (RPC) system.
10. The RPC system is constructed on top of UDP datagrams and provides
at-most-once semantics.
[Figure labels: client application and server connected by RPC; client side
effect and server side effect connected by an application-specific protocol]
11. An important design issue in CODA is that the server keeps track of which
clients have a local copy of a file.
12. When a file is modified, a server invalidates local copies by notifying the
appropriate clients through an RPC.
13. If a server can notify only one client at a time, invalidating all clients
may take some time as shown in Fig. 3.13.2(a).
14. The problem is made worse by the fact that an RPC may fail, in which case
the server waits for a timeout before giving up on that client. To
overcome this problem, the server sends an invalidation message to all
clients in parallel as shown in Fig. 3.13.2(b).
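The benefit of sending invalidations in parallel can be sketched with ordinary threads. This is not the RPC2 interface: `notify` is a hypothetical stand-in that simply sleeps for an assumed round-trip time and then returns, and the client names and delays are invented for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def notify(client, delay):
    """Stand-in for one invalidation RPC: wait for the round trip, then 'reply'."""
    time.sleep(delay)
    return client

def invalidate_one_at_a_time(clients):
    # The server waits for each reply before contacting the next client.
    return [notify(name, delay) for name, delay in clients.items()]

def invalidate_in_parallel(clients):
    # All invalidation messages are sent at once; the replies overlap in time.
    with ThreadPoolExecutor(max_workers=len(clients)) as pool:
        futures = [pool.submit(notify, name, delay) for name, delay in clients.items()]
        return [f.result() for f in futures]

clients = {"client_a": 0.05, "client_b": 0.05, "client_c": 0.05}  # assumed delays (s)

t0 = time.perf_counter()
sequential_replies = invalidate_one_at_a_time(clients)
sequential_time = time.perf_counter() - t0

t0 = time.perf_counter()
parallel_replies = invalidate_in_parallel(clients)
parallel_time = time.perf_counter() - t0

print(f"one at a time: {sequential_time:.2f} s, in parallel: {parallel_time:.2f} s")
```

With sequential sending the total time is roughly the sum of the per-client delays, while with parallel sending it is roughly the largest single delay, so one slow or unreachable client no longer holds up the others.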
Fig. 3.13.2. (a) Sending an invalidation message one at a time
(b) Sending invalidation messages in parallel
Answer
2. Access control and protection database : The directory access
control list protects the files on the CODA server.
Features of CODA :
1. It is freely available under a liberal license.
2. Server replication.
3. Security model for authentication, encryption, and access
control.
4. Well defined semantics of sharing, even in the presence of network
failure.
5. Good scalability.
6. Continued operation during partial network failures in the server network.
Que 3.15. What do you mean by CODA file system ? Also explain
the clients in CODA. How are disconnected operations performed
in CODA ?
UPTU 2013-14, Marks 10
Answer
CODA file system : Refer Q. 3.12, Page 126J, Unit-3.
Disconnected operations in CODA :
1. Disconnected operation is a mode of operation that enables a client to
continue accessing critical data during temporary failures of a shared
data repository.
2. It is a temporary deviation from normal operation as a client of a shared
repository.
3. Disconnected operation in a file system is indeed feasible, efficient and
usable.
4. The central idea behind this is that the caching of data, widely used to
improve performance, can also be exploited to enhance availability.
Clients in CODA :
1. CODA contains a large collection of untrusted UNIX clients and a small
number of trusted UNIX file servers.
2. Each CODA client has a local disk and can communicate with the servers
over a high bandwidth network.
3. At certain times, a client may be temporarily unable to communicate
with some or all of the servers.
4. This may be due to a server or network failure, or due to the detachment
of a portable client from the network.
5. Clients view CODA as a single, location-transparent shared UNIX file
system.
Que 3.16. Design the CODA file system and explain the different
states. Draw the state transition diagram and disconnected
operation in CODA file system.
UPTU 2014-15, Marks 10
OR
Answer
Coda file system and its states : Refer Q. 3.12, Page 126J and Q. 3.13,
Page 127J; Unit-3.
Disconnected operations : Refer Q. 3.15, Page 130J, Unit-3.
Venus states :
1. Hoarding, emulation and re-integration are the three states of
disconnected operation in the CODA file system.
2. These three states are called the Venus states.
1. Hoarding :
a. In this state, Venus hoards useful data in anticipation of
disconnection.
b. It manages its cache in a way that balances the needs of
connected and disconnected operation.
c. Many factors complicate the implementation of hoarding :
i. Disconnections and reconnections are often unpredictable.
ii. Activity at other clients must be accounted for, so that the
latest version of an object is in the cache at disconnection.
iii. Since cache space is finite, the availability of less critical objects
may have to be sacrificed in favor of more critical objects.
2. Re-integration :
a. It is a transitory state through which Venus passes in changing
roles from pseudo-server to cache manager.
b. Re-integration is performed a volume at a time, with all update
activity in the volume suspended until completion.
c. During re-integration, conflicts are detected and, where possible,
automatically resolved.
3. Emulation :
a. When the number of servers in the client's AVSG (Accessible Volume
Storage Group) drops to zero, the client enters the emulation state, in
which the behaviour of the server for the volume has to be emulated on
the client's machine.
b. It means that all file requests will be directly serviced using the
locally cached copy of the file.
c. When a client is in its emulation state, it may still be able to contact
servers to manage other volumes.
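The hoarding trade-off in point 1(c), where finite cache space forces less critical objects to give way to more critical ones, can be sketched with a small priority cache. `HoardCache`, the file names and the numeric priorities are hypothetical; the text does not say how Venus assigns priorities, so plain integers are assumed.

```python
class HoardCache:
    """Keep at most `capacity` objects, always evicting the least critical one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.priority = {}              # object name -> priority (higher = more critical)

    def hoard(self, name, priority):
        self.priority[name] = priority
        while len(self.priority) > self.capacity:
            victim = min(self.priority, key=self.priority.get)
            del self.priority[victim]   # sacrifice the least critical object

    def available_offline(self, name):
        """Would this object be usable in the emulation state?"""
        return name in self.priority

cache = HoardCache(capacity=2)
cache.hoard("thesis.tex", 90)
cache.hoard("notes.txt", 50)
cache.hoard("mail.db", 70)              # cache is full: notes.txt is evicted
print(sorted(cache.priority))           # ['mail.db', 'thesis.tex']
print(cache.available_offline("notes.txt"))
```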
Venus states and transitions are shown in Fig. 3.16.1.
Fig. 3.16.1. [State transition diagram: Hoarding to Emulation on
disconnection; Emulation to Re-integration on physical reconnection;
Re-integration to Hoarding on logical reconnection.]
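The three Venus states and the labelled transitions can be written as a small state machine. This is a sketch of the transitions named in the figure only; the names `VenusState`, `TRANSITIONS` and `next_state` are hypothetical, and ignoring an event that has no matching transition is an assumption made here for simplicity.

```python
from enum import Enum

class VenusState(Enum):
    HOARDING = "hoarding"
    EMULATION = "emulation"
    REINTEGRATION = "re-integration"

# (current state, event) -> next state
TRANSITIONS = {
    (VenusState.HOARDING, "disconnection"): VenusState.EMULATION,
    (VenusState.EMULATION, "physical reconnection"): VenusState.REINTEGRATION,
    (VenusState.REINTEGRATION, "logical reconnection"): VenusState.HOARDING,
}

def next_state(state, event):
    """Follow a transition if one is defined; otherwise stay in the same state."""
    return TRANSITIONS.get((state, event), state)

state = VenusState.HOARDING
for event in ["disconnection", "physical reconnection", "logical reconnection"]:
    state = next_state(state, event)
    print(event, "->", state.value)
```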