
A Case Study on Different Applications and Security Issues in Distributed Systems

Abhinav Agarwal

19ucs254

[email protected]

Department of Computer Science and Engineering


The LNM Institute of Information Technology, Jaipur

Abstract

This case study looks into the background of the different applications of distributed systems and the security around them.

The background covers the functioning of distributed web applications, including an introduction to web caching, which we dive into in the later part of the paper. We also look into the basics of distributed file systems and technologies built on top of them, such as Sun's NFS, CODA, and HDFS. System security is another critical aspect of maintaining these distributed systems: they need to be secured against various attacks, and we will see some of the ways to achieve this, such as Authentication Protocols, Public-Private Key Cryptography, Digital Signatures, and Firewalls.

In the later part of the case study, we look into web caching and the various methods used to implement it, namely the push-, pull-, and lease-based approaches, along with their respective merits and demerits.

Then, we look into the solution proposed by Haobo Yu and team in [7] to achieve scalability in web caching while ensuring consistency among proxies, which is a major drawback of the other approaches.

At last, we conclude by exploring the efficiency of this approach.

Background

Distributed Web Applications

A standard web application consists of a web browser client and a web server. It can also contain a proxy server, which acts as an intermediary between the client and the server. This proxy server can have multiple use cases, ranging from load balancing to caching. There is also a database server, along with a file server, with which the web server communicates to get the data requested by the user.

The structure discussed above can be built as a monolithic architecture or as a microservice architecture, in which the application is divided into different microservices, each responsible for a particular functionality. Building it this way eliminates the single point of failure, and any microservice can be scaled independently based on requirements.

Web Caching

Web caching is used when we want performance while scaling the system. We duplicate some functionality or data on multiple nodes, and requests for these functions and data get served from those nodes instead of the main server, which helps reduce the load on the main server. Web caching works best when employed close to the clients; this is where technologies like edge computing and CDNs come into play, where results are computed at the "edge" of the network.

The problem with this method is consistency: the data may have been updated at the main server but not yet at these nodes, so stale data gets served to the users. To solve this problem we can use either a pull-based or a push-based approach; in the first, the nodes pull data from the central server at regular intervals, and in the second, the main server pushes new data onto the nodes whenever it changes.

Distributed File Systems

A DFS is yet another application of distributed systems, in which files are stored on remote servers and accessed using Remote Procedure Calls (RPCs).
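
As a minimal sketch of this idea (not the protocol of any of the systems described below), remote file reads can be exposed over RPC. The host, port, and read_file interface here are illustrative assumptions:

```python
# A minimal RPC file-server sketch using Python's standard xmlrpc module.
# The host, port, and read_file() interface are illustrative assumptions,
# not the actual protocol used by NFS, CODA, or xFS.
from xmlrpc.server import SimpleXMLRPCServer

def read_file(path: str) -> str:
    """Return the contents of a file stored on this (remote) server."""
    with open(path, "r") as f:
        return f.read()

if __name__ == "__main__":
    server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
    server.register_function(read_file, "read_file")
    server.serve_forever()   # clients call read_file over the network
```

A client would then call `xmlrpc.client.ServerProxy("http://localhost:8000/").read_file("/data/report.txt")` and receive the file contents as if it were a local call.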

Sun's Network File System (NFS) is a widely used DFS that uses a virtual file system layer to handle local and remote files. NFS uses the mount protocol to access remote files: the mount protocol establishes a local name for the remote files, users access remote files using local names, and the OS takes care of the mapping. NFS also allows client caching, where cached data can be stale for up to 30 seconds. NFS implements security using user ID and group ID authentication only [2].

CODA was developed to make file system disconnection transparent, which is especially needed for mobile clients. In CODA each file belongs to exactly one volume, and each volume may be replicated across several servers. CODA works on the principle of read-one, write-all, where write conflicts are resolved manually by the user, much like in Git [3].

Let us talk about xFS a little. It is a serverless file system designed for high-speed LAN environments; it distributes data storage across disks using software RAID and log-based network striping, and it eliminates central server caching by using cooperative caching. Because xFS uses RAID, the overhead of parity management hurts performance for small writes, and RAID hardware is also very expensive.

Other file systems include LFS, a log-structured file system that provides fast writes, simple recovery, and flexible file locations, and the Hadoop Distributed File System (HDFS), which is optimized for large data sets accessed through Hadoop [6].

Distributed System Security

The objective of security is to protect against invalid operations, unauthorized invocations, and unauthorized users; various techniques and protocols are used to achieve this.

There have been a lot of developments to answer the question of how to authenticate users. Many of the answers are in the direction of encryption, but even if we achieve this using public-private key cryptography, the scheme is only as "secure" as the distribution of the keys, and for that, key-exchange algorithms like Diffie-Hellman have been introduced.
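
As a toy illustration of the Diffie-Hellman idea (a sketch only; the prime, generator, and secrets below are tiny illustrative values, far too small for real use):

```python
# Toy Diffie-Hellman key exchange: both parties derive the same shared
# secret without ever sending it over the network. Real deployments use
# large safe primes (2048+ bits) or elliptic curves, not these toy values.
p, g = 23, 5            # public prime modulus and generator (illustrative)

a = 6                   # Alice's private secret (normally random)
b = 15                  # Bob's private secret   (normally random)

A = pow(g, a, p)        # Alice sends A = g^a mod p
B = pow(g, b, p)        # Bob sends   B = g^b mod p

alice_key = pow(B, a, p)      # (g^b)^a mod p
bob_key = pow(A, b, p)        # (g^a)^b mod p
assert alice_key == bob_key   # both sides now share the same key
```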

To protect against intruders, one can use a firewall, a network component sitting between the inside and the outside of the network that drops packets on the basis of their source and destination addresses.
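
A minimal packet-filter sketch of this idea (the rule format and addresses are assumptions made for illustration):

```python
# Minimal address-based packet filter: drop packets whose source or
# destination matches a blocked rule. Real firewalls also match on ports,
# protocols, and connection state.
BLOCKED_SOURCES = {"203.0.113.7"}          # example addresses (documentation ranges)
BLOCKED_DESTINATIONS = {"192.0.2.10"}

def allow_packet(src: str, dst: str) -> bool:
    """Return True if the packet may pass through the firewall."""
    if src in BLOCKED_SOURCES or dst in BLOCKED_DESTINATIONS:
        return False    # drop the packet
    return True         # forward the packet

print(allow_packet("198.51.100.4", "192.0.2.10"))  # False: destination blocked
print(allow_packet("198.51.100.4", "192.0.2.99"))  # True
```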

To provide encryption and authentication between a web server and a client, SSL (Secure Sockets Layer) was developed by Netscape. To begin an SSL session, the server presents a certificate containing its public key; the certificate is signed with the CA's private key and is verified using the CA's public keys that are stored in the browser [4].
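
A small sketch of the client side using Python's standard ssl module (the host name is an illustrative placeholder):

```python
# Open a TLS connection and let the library verify the server's certificate
# against the CA certificates trusted by the local system (the same role
# the browser's built-in CA store plays in the description above).
import socket
import ssl

context = ssl.create_default_context()   # loads trusted CA certificates

with socket.create_connection(("example.com", 443)) as sock:
    with context.wrap_socket(sock, server_hostname="example.com") as tls:
        print(tls.version())                  # negotiated protocol, e.g. TLSv1.3
        print(tls.getpeercert()["subject"])   # identity from the verified certificate
```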

Blockchains implement security using consensus validation: each transaction is signed using the user's private key and inserted into the ledger, and the transaction is validated by a peer-to-peer network without compromising private information and without any central security authority. Once approved, it exists on the ledger permanently. Bitcoin uses the same mechanism to record the transactions on its chain [5].
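
A toy append-only ledger sketch showing why approved entries are tamper-evident (signatures and the consensus protocol are omitted; the block structure here is an assumption made for illustration):

```python
# Each block stores the hash of the previous block, so altering any past
# transaction changes every later hash and is immediately detectable.
# Real blockchains add digital signatures and consensus on top of this.
import hashlib
import json

def block_hash(block: dict) -> str:
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

ledger = [{"index": 0, "tx": "genesis", "prev_hash": "0" * 64}]

def append_transaction(tx: str) -> None:
    prev = ledger[-1]
    ledger.append({"index": prev["index"] + 1,
                   "tx": tx,
                   "prev_hash": block_hash(prev)})

def verify_ledger() -> bool:
    return all(ledger[i]["prev_hash"] == block_hash(ledger[i - 1])
               for i in range(1, len(ledger)))

append_transaction("Alice pays Bob 5")
append_transaction("Bob pays Carol 2")
print(verify_ledger())                   # True
ledger[1]["tx"] = "Alice pays Bob 500"   # tamper with history
print(verify_ledger())                   # False: the hash chain no longer matches
```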

Case Evaluation - Web Caching
Web caching is traditionally done using three methods: pull-based caching, push-based caching, and a hybrid approach.

Pull-Based Caching
This approach is based on the concept of time-to-live (TTL). When a request arrives at the cache after the TTL has expired, the cache pulls the latest data from the server. If the TTL is fixed, then the cache staleness is bounded by this TTL. If we set a very small value for the TTL, however, it negates much of the benefit of web caching.

The proxy can also dynamically determine the refresh interval (TTL) based on past observations; this is known as intelligent polling. For example, the proxy can increase the interval if the object has not changed in the two previous polls and decrease the interval if it has.
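
A small sketch of a pull-based cache entry with an adaptive TTL in the spirit of intelligent polling (the fetch function, interval bounds, and doubling/halving rule are illustrative assumptions, not an exact policy from the literature):

```python
# Pull-based cache entry with an adaptive refresh interval ("intelligent
# polling"): lengthen the TTL when the object looks stable, shorten it when
# the object changes. fetch_from_origin() stands in for the origin request.
import time

MIN_TTL, MAX_TTL = 5.0, 300.0   # seconds, illustrative bounds

class CacheEntry:
    def __init__(self, url: str, fetch_from_origin):
        self.url = url
        self.fetch = fetch_from_origin
        self.data = self.fetch(url)
        self.ttl = MIN_TTL
        self.expires_at = time.time() + self.ttl
        self.unchanged_polls = 0

    def get(self):
        if time.time() >= self.expires_at:       # TTL expired: pull again
            fresh = self.fetch(self.url)
            if fresh == self.data:
                self.unchanged_polls += 1
                if self.unchanged_polls >= 2:    # stable object: back off
                    self.ttl = min(self.ttl * 2, MAX_TTL)
            else:
                self.data = fresh
                self.unchanged_polls = 0
                self.ttl = max(self.ttl / 2, MIN_TTL)  # volatile object: poll sooner
            self.expires_at = time.time() + self.ttl
        return self.data
```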

Generally, the pull-based approach is not preferred for dynamic content due to the high overhead of pulling unchanged data. There can also be consistency issues: if the data is changing very frequently, the user may see old data because the new data has not yet been pulled to the nearest node. For static content, however, it is the best approach.

Push-Based Caching
In this type of caching, the server keeps track of changes to a particular page and, whenever that page changes, notifies the proxies and floods the network with the updated data. While this approach eliminates staleness, it incurs the cost of requiring the server to keep track of all proxies. Flooding the entire network also has its own overhead, so this approach does not scale.
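
A minimal push-based sketch (the proxy registry and notification interface are illustrative assumptions):

```python
# Push-based invalidation sketch: the origin server remembers which proxies
# cache each page and pushes the new content to all of them on every write.
# The per-page proxy registry is exactly the state that limits scalability.
from collections import defaultdict

class Proxy:
    def __init__(self, name: str):
        self.name = name
        self.cache: dict[str, str] = {}

    def receive_push(self, url: str, content: str) -> None:
        self.cache[url] = content              # cache is always fresh

class OriginServer:
    def __init__(self):
        self.pages: dict[str, str] = {}
        self.subscribers = defaultdict(list)   # url -> list of caching proxies

    def register(self, url: str, proxy: Proxy) -> None:
        self.subscribers[url].append(proxy)    # server must track every proxy

    def write(self, url: str, content: str) -> None:
        self.pages[url] = content
        for proxy in self.subscribers[url]:    # push to all caching proxies
            proxy.receive_push(url, content)
```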

When working with dynamic content, ensuring consistency is a big issue. If we materialize dynamic content as static pages and store them in the cache to be served to users, then under the push-based approach even a very small change forces us to flush the cache, regenerate the content, and store it again. This approach is also not resilient to server crashes.

Hybrid Approach - Leases
A lease is a duration for which the server agrees to notify the proxy of modifications. A lease is issued to a proxy on its first request, and the server sends notifications until the lease expires; once it expires, the proxy has to renew the lease. The lease duration lets the scheme interpolate between the two extremes: a lease of zero length degenerates to pure polling (pull), while an infinite lease degenerates to pure push.
Different policies are defined for the lease duration. One is age-based, where the larger the expected lifetime of an object, the longer the lease. Another is renewal-frequency-based, where proxies at which objects are popular get longer leases. A third is server-load-based, where shorter leases are given during heavy load.
The efficiency of the whole system depends upon the lease duration, and short leases carry the overhead of frequent renewal.
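
A small lease-based sketch combining the two mechanisms (the lease length, the renewal call, and the invalidation interface are illustrative assumptions):

```python
# Lease-based consistency sketch: while a proxy holds an unexpired lease on
# a page, the server pushes invalidations to it; after expiry the proxy must
# renew (pull) before trusting its copy again.
import time

LEASE_SECONDS = 30.0   # illustrative duration; the policies above tune this value

class LeaseServer:
    def __init__(self):
        self.pages: dict[str, str] = {}
        self.leases: dict[str, dict] = {}      # url -> {proxy: lease expiry time}

    def grant_lease(self, url: str, proxy: "LeaseProxy") -> str:
        self.leases.setdefault(url, {})[proxy] = time.time() + LEASE_SECONDS
        return self.pages.get(url, "")

    def write(self, url: str, content: str) -> None:
        self.pages[url] = content
        now = time.time()
        for proxy, expiry in self.leases.get(url, {}).items():
            if expiry > now:                   # only current lease holders are notified
                proxy.invalidate(url)

class LeaseProxy:
    def __init__(self, server: LeaseServer):
        self.server = server
        self.cache: dict[str, str] = {}
        self.lease_expiry: dict[str, float] = {}

    def invalidate(self, url: str) -> None:
        self.cache.pop(url, None)

    def get(self, url: str) -> str:
        fresh_lease = self.lease_expiry.get(url, 0.0) > time.time()
        if url in self.cache and fresh_lease:
            return self.cache[url]             # still covered by a valid lease
        self.cache[url] = self.server.grant_lease(url, self)   # renew and pull
        self.lease_expiry[url] = time.time() + LEASE_SECONDS
        return self.cache[url]
```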

Proposed Solution & Implementation
A scalable cache consistency architecture has been proposed in [7], which utilizes invalidations, hierarchy, and leases.

Each group in the hierarchy is associated with a set of caches, and caches send heartbeats to each other that are equivalent to cache-to-cache leases. Each cache maintains a server table in order to locate where a web server sits in the hierarchy. A client request is forwarded to the first cache in the hierarchy that holds a valid copy of the requested page.

The caching hierarchy is maintained in the form of groups, where each cache joins the group owned by its parent. Thus a parent does not need to know who its children are, and a child can choose its parent freely as long as cycles are prevented. Hierarchy establishment and maintenance are discussed further in [8].

The hierarchy is kept alive with the help of heartbeats: each group owner sends periodic heartbeats to its associated group. If each lease has length T and t is the time between subsequent heartbeats, then T/t = 5 in their setup. This ensures that the loss of an occasional heartbeat does not cause much of a problem, since several more heartbeats arrive before the lease expires.

With the heartbeat, knowledge of invalidated pages is piggybacked. Only pages that have been requested after they were last rendered invalid need to be included. Each heartbeat carries the set of pages that have been rendered invalid at the parent cache, and this knowledge is propagated to its child caches.
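
A sketch of a heartbeat carrying piggybacked invalidations down one level of the hierarchy (the message shape and method names are assumptions for illustration, not the exact interface from [7]):

```python
# Heartbeat with piggybacked invalidations: a parent cache periodically sends
# its children a liveness message plus the URLs invalidated since the last
# heartbeat; children drop those entries and forward the news further down.
class HierCache:
    def __init__(self, name: str):
        self.name = name
        self.cache: dict[str, str] = {}
        self.children: list["HierCache"] = []
        self.pending_invalid: set[str] = set()   # to piggyback on the next heartbeat

    def invalidate(self, url: str) -> None:
        self.cache.pop(url, None)
        self.pending_invalid.add(url)

    def send_heartbeat(self) -> None:
        """Called every t seconds by the group owner (t << lease length T)."""
        payload = set(self.pending_invalid)
        self.pending_invalid.clear()
        for child in self.children:
            child.on_heartbeat(payload)

    def on_heartbeat(self, invalidated: set[str]) -> None:
        for url in invalidated:
            self.invalidate(url)     # drop the stale copy and propagate downward
```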

Heartbeats travel up as well as down, from the server to the top-level cache; the cache to which a web server is attached is called its primary cache. Each server sends a JOIN request up the hierarchy, and every cache receiving this request makes an entry in its server routing table. Note that the top-level cache therefore knows all the servers attached to the hierarchy. Servers communicate with their primary cache with the help of heartbeats. A cache can also send a LEAVE signal to its parent and children if it does not receive a heartbeat within T seconds.

A client can attach to any cache in the hierarchy; let us call it the client's primary cache. When a client requests a page, it sends the request to its primary cache. This cache checks whether it contains the requested page; if not, the request is forwarded to the next cache. When the request is fulfilled, either by the originating server or by some intermediate cache, the response takes the reverse path, updating all the caches along the way and finally serving the user.
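
A sketch of this lookup along the path of caches toward the server (the cache objects and the origin fetch are illustrative assumptions; each cache exposes a `.cache` dict, like the HierCache sketch above):

```python
# Hierarchical lookup sketch: walk up the chain of caches from the client's
# primary cache toward the top until one holds a valid copy (or the origin
# server is reached), then fill every cache on the reverse path so later
# requests are served closer to the client.
def lookup(path_to_top, url, fetch_from_origin):
    for depth, cache in enumerate(path_to_top):
        if url in cache.cache:                   # valid copy found at this level
            content = cache.cache[url]
            break
    else:
        depth = len(path_to_top) - 1
        content = fetch_from_origin(url)         # no cache had it: ask the origin
    for cache in path_to_top[:depth + 1]:        # fill caches on the reverse path
        cache.cache[url] = content
    return content
```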

Conclusion
In this term paper we first discussed the various applications of distributed systems and the role of security in them. We saw that a lot of advancements have been made in the fields of web applications, caching, and distributed file systems. Then we saw the various approaches used to perform web caching for static and dynamic content and the merits and demerits of each approach.
At last, we saw a new kind of approach to web caching that is scalable and keeps the caches consistent. The approach combines the lessons of the pull-, push-, and lease-based approaches. The authors' performance evaluation suggests that when heartbeats arrive more frequently than writes occur, the approach is very effective in keeping pages fresh. When pages are write-dominated, the approach ensures freshness because, if a page is invalid, the request is served from the cache in the hierarchy that contains the correct copy. When pages are read-dominated, the invalidation approach offers significant reductions in server hit counts and client response times.

References
1. Course on Distributed Systems, 2022-23
2. https://www.ibm.com/docs/en/aix/7.1?topic=management-network-file-system
3. http://www.coda.cs.cmu.edu/
4. https://www.cloudflare.com/learning/ssl/how-does-ssl-work/
5. Course on Blockchain Foundation and Smart Contract, 2021-22
6. Tanenbaum, A. S., and van Steen, M. Distributed Systems: Principles and Paradigms.
7. Yu, H., Breslau, L., and Shenker, S. A Scalable Web Cache Consistency Architecture. In Proceedings of ACM SIGCOMM, 1999.
8. Rosenstein, A., Li, J., and Tong, S. Y. MASH: The Multicasting Archie Server Hierarchy. ACM SIGCOMM Computer Communication Review 27, 3 (July 1997).
