
A Mini Project Report on

“DATA ROUTING STRATEGY IN CLOUD ENVIRONMENT”


Submitted to the

JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD

In partial fulfillment of the requirement for the award of the degree of

BACHELOR OF TECHNOLOGY IN

COMPUTER SCIENCE & ENGINEERING

BY
K Akash Reddy (22WJ5A0514)

K Sritha (21WJ1A05F8)

K Sharath (22WJ5A0511)

Under the Esteemed Guidance Of

Ms. Hyma Birudaraju, Assistant Professor, GNITC

Department Of Computer Science Engineering

GURU NANAK INSTITUTIONS TECHNICAL CAMPUS (AUTONOMOUS)

School of Engineering and Technology
Ibrahimpatnam, R.R. District - 501506
2024-2025


Department of Computer Science & Engineering

CERTIFICATE
This is to certify that this mini project entitled “DATA ROUTING STRATEGY
IN CLOUD ENVIRONMENT” being submitted by K Akash Reddy
(22WJ5A0514), K Sritha (21WJ1A05F8), and K Sharath (22WJ5A0511) in
partial fulfillment for the award of the Degree of Bachelor of Technology in
Computer Science & Engineering of the Guru Nanak Institutions
Technical Campus, Hyderabad during the academic year 2024-2025, is a
record of bonafide work carried out under our guidance and supervision at
Guru Nanak Institutions Technical Campus (Autonomous).

________________________ ________________________ ________________________

Ms. Hyma Birudaraju Mr. Mohd Irfan Dr. Geetha Tripati

INTERNAL GUIDE PROJECT COORDINATOR HOD CSE

________________________

EXTERNAL EXAMINER
PROJECT COMPLETION CERTIFICATE
This is to certify that the following students of final year B.Tech, Department of
Computer Science and Engineering, Guru Nanak Institutions Technical Campus
(GNITC), have completed their training and project at GNITC successfully.

STUDENT NAME ROLL NUMBER


1. K Akash Reddy 22WJ5A0514
2. K Sritha 21WJ1A05F8
3. K Sharath 22WJ5A0511

The training was conducted on Java Technology for the completion of the project
titled “DATA ROUTING STRATEGY IN CLOUD ENVIRONMENT” in
October 2024. The project has been completed in all aspects.
ACKNOWLEDGEMENT
I wish to express my sincere thanks to Sardar Tavinder Singh Kohli, Chairman, GNITC for providing me
with a conducive environment for carrying through my academic schedules and project with ease.

I wish to express my sincere thanks to Sardar Gagandeep Singh Kohli, Vice Chairman, GNITC for
providing me with a conducive environment for carrying through my academic schedules and project with
ease.

I wish to express my sincere thanks to Dr. Harvinder Singh Saini, Managing Director, GNITC for
providing me with a conducive environment for carrying through my academic schedules and project with
ease.

I wish to express my sincere thanks to Dr. S. Sreenatha Reddy, Director, GNITC for providing me with a
conducive environment for carrying through my academic schedules and project with ease.

I wish to express my sincere thanks to Dr. Rishi Sayal, Professor and Director, GNITC for providing me
with a conducive environment for carrying through my academic schedules and project with ease.

I have been truly blessed to have a wonderful adviser, Ms. Hyma Birudaraju, Assistant Professor, GNITC,
for guiding me to explore the ramifications of my work, and I express my sincere gratitude towards her for
leading me throughout the completion of the project.

I would like to express my sincere thanks to Mr. Mohd Irfan, Associate Professor, Department of CSE,
Project Coordinator, for providing seamless support and the right suggestions for the completion of the
project.

I specially thank my internal guide, Ms. Hyma Birudaraju, Assistant Professor, GNITC, for her suggestions
and constant guidance in every stage of the project.

Finally, I would like to thank my family members for their moral support and encouragement in achieving my
goals.

K Akash Reddy (22WJ5A0514)

K Sritha (21WJ1A05F8)

K Sharath (22WJ5A0511)
DATA ROUTING STRATEGY IN CLOUD
ENVIRONMENT
ABSTRACT
The application of data deduplication technology reduces the demand for data storage and
improves resource utilization. Compared with the limited storage and computing capacity of a
single node, cluster data deduplication technology has great advantages. However, cluster data
deduplication also brings new issues: a reduced deduplication rate and the load balancing of
storage nodes. A data routing strategy can balance the deduplication rate against load balancing.
Therefore, this paper proposes a data routing strategy based on a distributed Bloom Filter. 1) The
Superchunk is used as the basic unit of data routing to improve system throughput. According to
Broder’s theorem, the k least-valued fingerprints are selected as the Superchunk features and sent
to the storage nodes. The optimal node is selected as the routing node by matching against the
Bloom Filter and the storage capacity of the node, both of which are maintained in the memory
of the storage node. 2) A system prototype is designed and implemented. The specific parameters
of all kinds of routing strategies are obtained through experiments, and the routing strategies
proposed in this paper are tested. The theoretical analysis and experimental results prove the
feasibility of the proposed strategies. Compared with other routing strategies, our method
improves the deduplication rate by 3%, reduces the communication query overhead by more than
36%, and improves the load balancing degree of the storage system.
LIST OF CONTENTS
CHAPTER NO. TITLE

1. CHAPTER 1: INTRODUCTION

1.1 General
1.2 Scope Of the Project
1.3 Objective
1.4 Problem Statement
1.5 Existing System
1.5.1 Existing System Disadvantages
1.5.2 Literature Survey
1.6 Proposed System
1.6.1 Proposed System Advantages
2. CHAPTER 2: PROJECT DESCRIPTION

2.1 General
2.2 Methodologies
2.2.1 Modules Name
2.2.2 Modules Explanation
2.2.3 Given Input and Expected Output
2.3 Technique or Algorithm
3. CHAPTER 3: REQUIREMENTS

3.1 General
3.2 Hardware Requirements
3.3 Software Requirements
3.4 Functional Specification
3.5 Non-Functional Specification
4. CHAPTER 4: SYSTEM DESIGN

4.1 General
4.1.1 Use Case Diagram
4.1.2 Class Diagram
4.1.3 Object Diagram
4.1.4 State Diagram
4.1.5 Sequence Diagram
4.1.6 Collaboration Diagram
4.1.7 Activity Diagram
4.1.8 Component Diagram
4.1.9 E-R Diagram
4.1.10 Data Flow Diagram
4.1.11 Deployment Diagram
4.2 System Architecture
5. CHAPTER 5: SOFTWARE SPECIFICATION

5.1 General
5.2 Features of Java
5.2.1 The Java Framework
5.2.2 Objective of Java
5.2.3 Java Swing Overview
5.2.4 Evolution of Collection Framework
5.3 Conclusion
6. CHAPTER 6: IMPLEMENTATION

6.1 General
6.2 Implementation
7. CHAPTER 7: SNAPSHOTS

7.1 General
7.2 Various Snapshots
8. CHAPTER 8: SOFTWARE TESTING

8.1 General

8.2 Developing Methodologies

8.3 Types of Testing

8.3.1 Unit testing


8.3.2 Functional test
8.3.3 System Test
8.3.4 Performance Test
8.3.5 Integration Testing
8.3.6 Acceptance Testing
8.3.7 Build the test plan
9. CHAPTER 9: APPLICATIONS AND FUTURE
ENHANCEMENT

9.1 General

9.2 Future Enhancements

10. CHAPTER 10: CONCLUSION & REFERENCES

10.1 Conclusion

10.2 References
LIST OF FIGURES
FIGURE NO. NAME OF THE FIGURE

2.3.2 Module Diagram

4.2 Use case Diagram

4.3 Class Diagram

4.4 Object Diagram

4.5 State Diagram

4.6 Sequence Diagram

4.7 Collaboration Diagram

4.8 Activity Diagram

4.9 Component Diagram

4.10 Data flow Diagram

4.11 E-R Diagram

4.12 Deployment Diagram

4.13 System Architecture

7.1 Home Page


LIST OF NOTATIONS

(The notation symbols are graphical and are not reproduced here; only the names and descriptions are listed.)

S.NO NAME DESCRIPTION

1. Class: Represents a collection of similar entities grouped together; attributes and operations are marked + (public), - (private), or # (protected).

2. Association: Represents static relationships between classes. Roles represent the way the two classes see each other.

3. Actor: Represents interaction between the system and the external environment.

4. Aggregation: Aggregates several classes into a single class.

5. Relation (uses): Used for additional process communication.

6. Relation (extends): The extends relationship is used when one use case is similar to another use case but does a bit more.

7. Communication: Communication between various use cases.

8. State: State of the process.

9. Initial State: Initial state of the object.

10. Final State: Final state of the object.

11. Control Flow: Represents the various control flows between states.

12. Decision Box: Represents a decision-making process based on a constraint.

13. Use Case: Represents interaction between the system and the external environment.

14. Component: Represents physical modules, which are a collection of components.

15. Node: Represents physical modules, which are a collection of components.

16. Data Process/State: A circle in a DFD represents a state or process that has been triggered by some event or action.

17. External Entity: Represents external entities such as a keyboard, sensors, etc.

18. Transition: Represents communication that occurs between processes.

19. Object Lifeline: Represents the vertical dimension along which the object communicates.

20. Message: Represents the messages exchanged.
LIST OF ABBREVIATIONS

S.NO ABBREVIATION EXPANSION

1. DB Database

2. JVM Java Virtual Machine

3. JSP Java Server Pages

4. CB Collective Behavior

5. RSSS Ramp Secret Sharing Scheme

6. JRE Java Runtime Environment


CHAPTER 1

INTRODUCTION

1.1 GENERAL

Data is more important today than ever for enterprises and individuals. With the exponential
growth of data, it has become difficult to manage such huge volumes. The volume of data in
enterprise data management centers normally reaches the PB level, or even the EB level, which
increases the cost of data management. Improving storage efficiency in data backup has
therefore become a research hotspot. According to surveys, almost 75% of the data in the digital
world is duplicated, and duplicated data reaches approximately 90% in backup and file systems.
The development of deduplication technology in recent years has provided an effective solution
for duplicated data: deduplication is a data compression strategy applied to storage systems that
achieves high data compression rates.

Limited by the computational and storage capacity of a single node, Cloud Deduplication
(CD) makes use of the powerful storage capacity and parallelism of clustered systems so that
deduplication can be used in large-scale distributed storage systems. Cloud deduplication can
either use a data routing strategy to send data to each storage node and deduplicate it at that
node, or deduplicate the data before it is sent to the storage node. Both approaches have their
pros and cons, but to preserve the parallelism of the system and keep the system overhead low,
current cluster deduplication systems mainly use the former for data routing. Implementing the
routing strategy requires maintaining fingerprint indexes of data blocks in the memory of storage
nodes, where the size of a data block is 4 KB and the fingerprint of each data block is about 40 B.
The ratio of the size of the block fingerprints to the size of the data volume is therefore about
1:100. Considering the memory space and cost limitations of storage nodes, such a huge number
of fingerprint indexes cannot be stored in the memory of the storage nodes, so most of them are
stored on disk, and the needed fingerprint indexes are read into memory only when deduplication
is performed.
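
As a rough, illustrative calculation (assuming the 4 KB chunk and 40 B fingerprint figures above, and a hypothetical 1 PB data set), the following Java sketch shows why the full fingerprint index cannot fit in node memory:

// Back-of-the-envelope index sizing for the figures quoted above.
public class IndexSizing {
    public static void main(String[] args) {
        long dataBytes = 1L << 50;     // hypothetical 1 PB of backup data
        long chunkSize = 4 * 1024;     // 4 KB per data block
        long fpSize = 40;              // ~40 B per block fingerprint
        long chunks = dataBytes / chunkSize;
        long indexBytes = chunks * fpSize;
        System.out.printf("chunks=%d, index=%.1f TB, ratio 1:%d%n",
                chunks, indexBytes / 1e12, dataBytes / indexBytes);
    }
}

About 11 TB of fingerprints for 1 PB of data, which is why most of the index must live on disk and be paged into memory only during deduplication.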
1.2 SCOPE OF THE PROJECT

The routing strategy proposed in this paper, using the data similarity principle and relevant
information such as storage node capacity, achieves a balance between the deduplication rate
and load balancing, avoiding a significant drop in the deduplication rate while achieving a
balanced distribution of data across the storage nodes.

1.3 OBJECTIVE

In contrast, cloud storage is a cloud computing service with data storage and management as the
core. Applying deduplication technology to cloud storage systems and using the computing and
storage capabilities of nodes in cloud storage can improve the performance of the entire storage
system. However, managing a large number of storage nodes in the cloud storage system and
ensuring node load balancing while avoiding a significant drop in deduplication rate are two key
issues that need to be addressed. The main role of the data routing policy is to send data from
the client to the storage nodes. It needs to send as much duplicate data as possible to the same
node. However, after a certain time this will lead to a state where some nodes store a large
amount of the system's data while other nodes are idle, so how to balance these two concerns is
the key point to consider in the data routing policy.
1.4 PROBLEM STATEMENT

However, too many disk accesses will cause disk bottleneck problems and lead to lower
system throughput. At the same time, a cluster deduplication system often deduplicates only
against the data stored in the node itself, which reduces the overall deduplication rate of the
entire system and also leads to a large amount of duplicate data being routed to a few nodes,
resulting in load imbalance. Hence, the main problems facing cloud deduplication are query
performance, deduplication rate, and load balancing. With the development of cloud computing,
cloud computing services not only unify the deployment and management of data resources but
also optimize the utilization of resources. After a load-balancing routing strategy is adopted in
the entire deduplication system, the storage capacity of all storage nodes increases
simultaneously.
1.5 EXISTING SYSTEM

➢ Too many disk accesses will cause disk bottleneck problems and lead to lower system
throughput.
➢ A cluster deduplication system often deduplicates only against the data stored in the node
itself, which reduces the overall deduplication rate of the entire system and also leads to a
large amount of duplicate data being routed to a few nodes, resulting in load imbalance.
➢ Hence, the main problems facing cloud deduplication are query performance,
deduplication rate, and load balancing.

1.5.1 EXISTING SYSTEM DISADVANTAGES

It is difficult for the node memory to maintain such a huge hash table, and the existing
approach is less efficient and less secure.


1.5.2 LITERATURE SURVEY

Title: Research on multi-feature data routing strategy in deduplication.

Year: 2020

Author: Q. He, G. Bian, B. Shao, and W. Zhang.

Description: Deduplication is a popular data reduction technology in storage systems which has
significant advantages, such as finding and eliminating duplicate data, reducing data storage
capacity required, increasing resource utilization, and saving storage costs. The file features are a
key factor that is used to calculate the similarity between files, but the similarity calculated by
the single feature has some limitations especially for the similar files. The storage node feature
reflects the load condition of the node, which is the key factor to be considered in the data
routing. This paper introduces a multi-feature data routing strategy (DRMF). The routing
strategy is made based on the features of the cluster, including routing communication, file
similarity calculation, and the determination of the target node. The mutual information
exchange is achieved by routing communication, routing servers, and storage nodes. The storage
node calculates the similarity between the files stored, and then the file is routed according to the
information provided by the routing server. The routing server determines the target node of the
route according to the similar results and the node load features. The system prototype is
designed and implemented; also, we develop a system to process the feature of cluster and
determine the specific parameters of various features of experiments. In the end, we simulate the
multi-feature data routing and single-feature data routing, respectively, and compare the
deduplication rate and data skew between the two strategies. The experimental results show that
the proposed multi-feature data routing strategy DRMF can improve the deduplication rate of
the cluster and maintain a lower data skew rate compared with the single-feature-based routing
strategy MCS.
Title: Boafft: Distributed deduplication for big data storage in the cloud.

Year: 2020

Author: S. Luo, G. Zhang, C. Wu, S. U. Khan, and K. Li.

Description:

As data progressively grows within data centers, the cloud storage systems continuously
face challenges in saving storage capacity and providing capabilities necessary to move big data
within an acceptable time frame. In this paper, we present the Boafft, a cloud storage system
with distributed deduplication. The Boafft achieves scalable throughput and capacity using
multiple data servers to deduplicate data in parallel, with a minimal loss of deduplication ratio.
Firstly, the Boafft uses an efficient data routing algorithm based on data similarity that reduces
the network overhead by quickly identifying the storage location. Secondly, the Boafft maintains
an in-memory similarity indexing in each data server that helps avoid a large number of random
disk reads and writes, which in turn accelerates local data deduplication. Thirdly, the Boafft
constructs hot fingerprint cache in each data server based on access frequency, so as to improve
the data deduplication ratio. Our comparative analysis with EMC's stateful routing algorithm
reveals that the Boafft can provide a comparatively high deduplication ratio with a low network
bandwidth overhead. Moreover, the Boafft makes better usage of the storage space, with higher
read/write bandwidth and good load balance.

Currently, the enterprise data centers manage PB or even EB magnitude of data. The data
in those cloud storage systems (e.g., GFS, HDFS, Ceph, Eucalyptus, and GlusterFS) that provide
a large number of users with storage services are even larger. The cost of enterprise data storage
and management is increasing rapidly, and the improvement of storage resource utilization has
become a grand challenge, which we face in the field of big data storage. According to a survey,
about 75 percent of the data in the digital world is identical [6], and the data redundancy in
backup and archival storage systems especially is greater than 90 percent. The technique of data
deduplication can identify and eliminate duplicate data in a storage system. Consequently, the
introduction of data deduplication into cloud storage systems brings an ability to effectively
reduce the storage requirement of big data and lower the cost of data storage.
Title: Data deduplication technology for cloud storage.

Year: 2020

Author: Q. He, G. Bian, B. Shao, and W. Zhang.

Description: As huge amounts of data are produced and stored for various applications, problems
arise in data storage as well as in data access time. The stored data should be accessed in the least
amount of time, even though duplicates of the data exist in storage. This becomes easier using
cloud storage devices due to the easy access to data and their duplicates. Memory reutilization
makes the storage efficient. The data can be backed up in the cloud and restored easily when
there is a crash, preventing inconsistency in the data. For deduplication, logical pointers and hash
functions are used along with a deduplication manager that replaces duplicate data with pointers
and keeps track of duplicates irrespective of the load. The security of the data is
also a key factor along with data consistency. This chapter discusses data deduplication for cloud
computing and explains how the cloud helps in addressing the issues related to duplicates of
data.
Title: EaD: A collision-free and high-performance deduplication scheme for flash storage
systems.

Year: 2020

Author: S. Wu, J. Zhou, W. Zhu, H. Jiang, Z. Huang, Z. Shen, and B. Mao.

Description: Inline deduplication is a popular technique to effectively reduce the write traffic
and improve the space efficiency for flash-based storage. However, it also introduces computing
and memory overhead to generate and store the cryptographic hash (fingerprint). Along the
advent of 3D XPoint and Z-NAND technologies with vastly improved latency and bandwidth,
both the computing and memory overheads are becoming much more pronounced in
deduplication-based flash storage with cryptographic hash functions in use. To address these
problems, we propose an ECC (Error Correcting Code) assisted deduplication approach, called
EaD, which exploits the ECC property and the asymmetric read-write performance
characteristics of modern flash-based storage. EaD first identifies data similarity based on the
fingerprints of data chunks represented by their ECC values, thus significantly reducing the
costly cryptographic hash computing and alleviating the memory space overhead. Based on the
identification results, similar data chunks and their ECCs are read from the flash to perform a
byte-by-byte comparison in memory to definitively identify and remove redundant data chunks.
Our experiments show that the EaD approach significantly reduces the I/O latency by an average
of 1.92× and 1.86×, and reduces the memory consumption by an average of 35.0% and 21.9%,
compared with the existing SHA- and sampling-based deduplication approaches, respectively.

Flash-based devices have been extensively deployed in modern computer systems to
satisfy the increasing demand for storage performance and energy efficiency. Due to the unique
characteristics of the flash memory technology, the performance and reliability of flash storage is
highly sensitive to the write traffic. Thus, techniques that can reduce the number of writes to
flash storage are desirable and have received a lot of attention from both industry and academia.
The most popular and effective among these techniques is data deduplication, which has gained
increasing traction due to its ability to reduce the storage space requirement by eliminating
duplicate data and minimizing the transmission of redundant data in storage systems.
Title: BCD deduplication: effective memory compression using partial cache-line deduplication

Year: 2021

Author: Sungbo Park, Ingab Kang, Yaebin Moon, Jung Ho Ahn, G. Edward SuhAuthors Info &
Claims.

Description: In this paper, we identify new partial data redundancy among multiple cache lines
that are not exploited by traditional memory compression or memory deduplication. We propose
Base and Compressed Difference (BCD) deduplication that effectively utilizes the partial
matches among cache lines through a novel combination of compression and deduplication to
increase the effective capacity of main memory. Experimental results show that BCD achieves
the average compression ratio of 1.94× for SPEC2017, DaCapo, TPC-DS, and TPC-H, which is
48.4% higher than the best prior work. We also present an efficient implementation of BCD in a
modern memory hierarchy, which compresses data in both the last-level cache (LLC) and main
memory with modest area overhead. Even with additional meta-data accesses and
compression/deduplication operations, cycle-level simulations show that BCD improves the
performance of the SPEC2017 benchmarks by 2.7% on average because it increases the effective
capacity of the LLC. Overall, the results show that BCD can significantly increase the capacity
of main memory with little performance overhead.
Title: Ultra-low latency SSDs' impact on overall energy efficiency.
Year: 2020
Author: B. Harris and N. Altiparmak.

Description: Recent technological advancements have enabled a generation of Ultra-Low
Latency (ULL) SSDs that blurs the performance gap between primary and secondary storage
devices. However, their power consumption characteristics are largely unknown. In addition,
ULL performance in a block device is expected to put extra pressure on operating system
components, significantly affecting energy efficiency of the entire system. In this work, we
empirically study overall energy efficiency using a real ULL storage device, Optane SSD, a
power meter, and a wide range of IO workload behaviors. We present a comparative analysis by
laying out several critical observations related to idle vs. active behavior, read vs. write behavior,
energy proportionality, impact on system software, as well as impact on overall energy
efficiency. To the best of our knowledge, this is the first published study of a ULL SSD's impact
on the system's overall power consumption, which can hopefully lead to future energy-efficient
designs.

Various innovations have taken place in the storage subsystem within the past few years,
including wide adoption of the NVMe interface for new generation storage devices, the
corresponding multi-queue request submission/completion capability (blk-mq) implemented in
the Linux block IO layer, and new IO scheduling algorithms specifically designed for blk-mq.
We have also seen the emergence of new storage technologies such as Intel’s Optane SSDs
based on their 3D XPoint technology, Samsung’s Z-SSD with their SLC based 3D NAND
technology, and Toshiba’s XLFlash design using a similar technology as Z-SSDs. All these
innovations enable a new generation of Ultra-Low Latency (ULL) SSDs that are broadly defined
as providing sub-10 µs data access latency. Compared with existing storage technologies, ULL
SSDs help close the performance gap between primary and secondary storage devices.
Title: Improving the Performance of Deduplication-Based Storage Cache via Content-Driven
Cache Management Methods
Year: 2021
Author: Yujuan Tan, Congcong Xu, Jing Xie, Zhichao Yan, Hong Jiang, Witawas Srisa-an.

Description: Data deduplication, as a proven technology for effective data reduction in backup
and archiving storage systems, is also showing promises in increasing the logical space capacity
for storage caches by removing redundant data. However, our in-depth evaluation of the existing
deduplication-aware caching algorithms reveals that they only work well when the cached block
size is set to 4 KB. Unfortunately, modern storage systems often set the block size to be much
larger than 4 KB, and in this scenario, the overall performance of these caching schemes drops
below that of the conventional replacement algorithms without any deduplication. There are
several reasons for this performance degradation. The first reason is the deduplication overhead,
which is the time spent on generating the data fingerprints and their use to identify duplicate
data. Such overhead offsets the benefits of deduplication. The second reason is the extremely low
cache space utilization caused by read and write alignment. The third reason is that existing
algorithms only exploit access locality to identify block replacement. There is a lost opportunity
to effectively leverage the content usage patterns such as intensity of content redundancy and
sharing in deduplication-based storage caches to further improve performance. We propose
CDAC, a Content-driven Deduplication-Aware Cache, to address this problem. CDAC focuses
on exploiting the content redundancy in blocks and intensity of content sharing among source
addresses in cache management strategies. We have implemented CDAC based on LRU and
ARC algorithms, called CDAC-LRU and CDAC-ARC respectively. Our extensive experimental
results show that CDAC-LRU and CDAC-ARC outperform the state-of-the-art deduplication-
aware caching algorithms, D-LRU, and D-ARC, by up to 23.83X in read cache hit ratio, with an
average of 3.23X, and up to 53.3 percent in IOPS, with an average of 49.8 percent, under a real-
world mixed workload when the cache size ranges from 20...
1.6 PROPOSED SYSTEM

➢ This paper proposes a data routing strategy based on a distributed Bloom Filter.
1) The Superchunk is used as the basic unit of data routing to improve system throughput.
According to Broder’s theorem, the k least-valued fingerprints are selected as the
Superchunk features and sent to the storage nodes.
The optimal node is selected as the routing node by matching against the BloomFilter
and the storage capacity of the node, both of which are maintained in the memory of the
storage node.
2) A system prototype is designed and implemented. The specific parameters of all kinds
of routing strategies are obtained through experiments, and the routing strategies
proposed in this paper are tested.

1.6.1 PROPOSED SYSTEM ADVANTAGES

➢ The proposed strategy avoids the system throughput overhead caused by the broadcast
data query method and by maintaining a BloomFilter with many fingerprints within the
query node, both of which greatly affect system performance.
➢ More efficient and more secure.
CHAPTER 2

PROJECT DESCRIPTION

2.1 GENERAL

The routing algorithm proposed in this section is based on the cluster deduplication system
architecture. It is not possible to send all the data of a superblock to the storage nodes for
querying to determine the routing node when data routing is performed. If a storage node used
index tables to store the data block fingerprints, the storage node would not have enough
memory space to keep all the fingerprint data. It is also possible to maintain in memory only a
few representative fingerprint IDs of the superblock. This approach can save a large amount of
memory space, but the sequential query mode of the index table increases the communication
overhead of the system. A BloomFilter can instead be used to manage fingerprint indexes in
memory; stateful routing policies use this approach, but it causes a large amount of system
overhead.
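
For reference, the sketch below is a minimal BloomFilter in Java (a hypothetical helper, not the prototype's implementation) showing the insert and membership-query behaviour that the routing strategy relies on; the bit-array size m and hash count k are assumed parameters:

import java.util.Arrays;
import java.util.BitSet;

// Minimal Bloom filter: no false negatives, tunable false-positive rate.
public class BloomFilter {
    private final BitSet bits;
    private final int m;   // number of bits
    private final int k;   // number of hash functions

    public BloomFilter(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // Double hashing over the fingerprint bytes simulates k hash functions.
    private int hash(byte[] fp, int i) {
        int h1 = Arrays.hashCode(fp);
        int h2 = (h1 ^ (h1 >>> 16)) | 1;          // force the step to be odd
        return Math.floorMod(h1 + i * h2, m);
    }

    public void add(byte[] fingerprint) {
        for (int i = 0; i < k; i++) bits.set(hash(fingerprint, i));
    }

    public boolean mightContain(byte[] fingerprint) {
        for (int i = 0; i < k; i++)
            if (!bits.get(hash(fingerprint, i))) return false;
        return true;   // "possibly present": false positives are possible
    }
}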

2.2 METHODOLOGIES

2.2.1 MODULES NAME:


This project has the following four modules:

1. User Interface
2. Client-Server (CS)
3. Router
4. User
2.2.2 MODULES EXPLANATION AND DIAGRAM

➢ User Interface Design

In this module we design the windows for the project, focusing mainly on the login page design
with partial knowledge information. Application users need to log in through the user interface
to view the application. The GUI is the medium connecting the user and the media database: the
login screen takes the user's name and password, which are checked against the database. If the
username and password are valid, the user can access the database.

[Module diagram: User Registration → User Login → verification against the Cloud Database → User Page on success, Error Page on failure]
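
A minimal sketch of the login check described above, assuming a hypothetical MySQL table users(name, password) and placeholder connection credentials (a real system would store hashed passwords, not plain text):

import java.sql.*;

// Validates a username/password pair against the cloud database.
public class LoginDao {
    public boolean isValidUser(String name, String password) throws SQLException {
        String url = "jdbc:mysql://localhost:3306/cloud_db";   // assumed DB name
        String sql = "SELECT 1 FROM users WHERE name = ? AND password = ?";
        try (Connection con = DriverManager.getConnection(url, "root", "");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, name);
            ps.setString(2, password);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();   // a matching row means valid credentials
            }
        }
    }
}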


➢ Client-Server (CS)
The main function of this part is to process the data stream. In this algorithm, when the client-
server receives an uploaded data stream, it performs data chunking, calculates the fingerprint of
each data block, and handles the combination and routing of the superblock.

Also, after determining the routing node of the data, the metadata information of the data
needs to be sent to the metadata server for data recovery. When determining the routing node,
the k minimum block fingerprints are sent to the storage nodes for querying instead of sending
the data blocks themselves, which reduces the communication overhead of the system; a sketch
of this step follows the module diagram below.

[Module diagram: Client-Server login and validation against the Cloud Database; the client-server views node details, uploads and checks data, and responds to requests]
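
The following Java sketch illustrates the client-server step described above: fingerprinting the chunks of a superchunk and selecting the k minimum fingerprints as its representative features, per Broder's theorem. SHA-1 and the fixed chunk list are assumptions for illustration, not fixed parameters of the system:

import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

// Computes chunk fingerprints and picks the k smallest as superchunk features.
public class SuperchunkFeatures {
    static byte[] sha1(byte[] chunk) throws Exception {
        return MessageDigest.getInstance("SHA-1").digest(chunk);
    }

    static List<byte[]> kMinFingerprints(List<byte[]> chunks, int k) throws Exception {
        List<byte[]> fps = new ArrayList<>();
        for (byte[] c : chunks) fps.add(sha1(c));
        fps.sort(SuperchunkFeatures::compareBytes);   // ascending fingerprint order
        return fps.subList(0, Math.min(k, fps.size()));
    }

    // Lexicographic comparison of unsigned bytes.
    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }
}

Only these k fingerprints travel to the storage nodes, which keeps the per-superchunk query traffic small.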
➢ Router
This is the second module in this project. The router first enters its name and password and
logs in to the web page. After login, it checks all the client PCs' request data and accepts the
requests. It views all files and detects any problems in the process.

1. Register PC
2. View PC
3. Accept Request
4. PC Results
5. FLA Detection
6. Logout
➢ User (PC)

In this project the user registers first and then logs into the web page. Registration generates a
username and password, with which the user logs into the web page and can then check and
store data.

1. Interest
2. Content
3. Logout
2.3 TECHNIQUE USED OR ALGORITHM USED

Distributed BloomFilter-Based Data Routing Strategy:

This section designs a distributed BloomFilter-based data routing strategy that enables fast data
routing queries using a BloomFilter and a few superblock representative fingerprint IDs. Unlike
EMC’s stateful routing strategy, this routing strategy uses the BloomFilter to maintain the
superblock representative fingerprint IDs in the memory of the storage node, without all the
fingerprint indexes being stored in that storage node. Only a few superblock representative
fingerprint IDs are sent, and only a few routing nodes are queried during the query process,
achieving global load balancing through local load balancing.

A. Similarity of Super Block

In cluster data deduplication, to ensure system throughput, the routing unit is normally a super
block. Before sending data, the destination node of each superblock needs to be determined; that
is, it is necessary to determine which storage node stores the most of the data duplicated in the
super block. To reduce the communication overhead of the system, it is impossible to send the
data blocks to all storage nodes for one-by-one comparison. Instead, methods such as MD5 or
SHA-1 are used to calculate the fingerprint information of each data block, the fingerprint
information is sent to the storage nodes, and a data similarity algorithm then determines the
similarity of the data. This paper uses the BloomFilter to judge the similarity of superblock data,
exploiting the BloomFilter's advantages of high query efficiency and low storage occupancy
while limiting the influence of its misjudgment (false-positive) rate on the data judgment. In
addition, to ensure load balance in a cluster storage system, it is not only necessary to send as
much duplicate data as possible to the same storage node, but also to take the storage load of
each node into account and comprehensively select the destination node for data routing. As
shown in Figure 1, suppose that the number of representative fingerprint IDs k is 8 and the
number of storage nodes in the cluster deduplication system is 4. For Node0, 2 of the 8
representative fingerprint IDs are already present in the storage node's BloomFilter, and the
stored capacity of the node is 0.5; for Node1, 4 fingerprints already exist; 2 in Node2; and 4 in
Node3. Intuitively, the superblock should be sent to Node1 or Node3, but Figure 1 shows that a
large amount of data has already been stored in these two nodes. Continuing to send data to them
would soon leave them without storage capacity to receive incoming data. Therefore, the
node-selection method in this paper uses Ci/Vi (where Ci is the number of data block fingerprints
of the superblock already present in the BloomFilter of storage node i, and Vi is the volume of
data blocks already stored on that node) to calculate the optimal node for data routing. As the
figure shows, the calculated value for Node2 is the largest, so Node2 is selected as the final node
for the data routing.

FIGURE 1. Similarity matching diagram.

Node2 has only two matching data block fingerprints, so sending the super block to this node is
not the optimal choice in terms of deduplication rate. In practice, the system considers the
optimal storage node from the data routing of the first super block onwards; the routing strategy
based on load balancing, rather than one that considers the deduplication rate first, starts only
after the entire system has stored a certain number of data blocks. Therefore, after load balancing
is adopted in the entire deduplication system, the storage capacity of all storage nodes increases
simultaneously, and no storage node's stored capacity becomes much larger than that of the other
nodes. Consequently, when the similarity matching of the super block is performed, the stored
capacities of the storage nodes are relatively balanced, and the situation shown in the figure,
where some storage nodes still have free space even though they store a large amount of data,
will not occur. In practical applications, when the storage nodes store data in a balanced way, the
main factor determining the routing node of the superblock is the number of matching fingerprint
IDs. Using only max(Ci/Vi) may therefore not be a good choice, so strategies that use a set
threshold, or the average storage capacity of the system, to select the optimal storage node have
been proposed. The literature sets a threshold specifying the minimum value of Ci; when Ci is
lower than this value, the storage node is not considered, which solves certain problems. In
addition, the percentage by which a node's stored capacity exceeds the average capacity of all
storage nodes in the system, multiplied by Ci/Vi, can be used as the judgment basis for selecting
the optimal routing node. These methods have achieved certain results.
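
A compact sketch of the selection rule just described: choose the node maximizing Ci/Vi, subject to a minimum-match threshold. The threshold value and the array-based layout are assumptions for illustration, not the paper's exact parameters:

// matches[i] = Ci, the representative fingerprint IDs found in node i's
// BloomFilter; stored[i] = Vi, the data volume already stored on node i.
public class RoutingSelector {
    static int selectNode(int[] matches, double[] stored, int minMatches) {
        int best = -1;
        double bestScore = -1;
        for (int i = 0; i < matches.length; i++) {
            if (matches[i] < minMatches) continue;   // threshold guard from the literature
            double score = matches[i] / stored[i];   // Ci / Vi
            if (score > bestScore) { bestScore = score; best = i; }
        }
        return best;   // -1: no node matched enough fingerprints
    }
}

Returning -1 lets the caller fall back to a purely load-based choice when no node matches enough fingerprints.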
CHAPTER 3

REQUIREMENTS ENGINEERING

3.1 GENERAL

However, this routing strategy achieves the idea of global load balancing through a local
load-balancing strategy: only a few optimal nodes are selected among the nodes, which means
that the selected nodes could be local optimal nodes rather than global optimal nodes.

3.2 HARDWARE REQUIREMENTS

The hardware requirements may serve as the basis for a contract for the implementation
of the system and should therefore be a complete and consistent specification of the whole
system. They are used by software engineers as the starting point for the system design. The
specification should state what the system should do, not how it should be implemented.

• PROCESSOR : Pentium IV 2.6 GHz / Intel Core 2 Duo

• RAM : 512 MB DDR RAM
• MONITOR : 15" color
• HARD DISK : 40 GB
3.3 SOFTWARE REQUIREMENTS

The software requirements document is the specification of the system. It should include both a
definition and a specification of requirements. It is a statement of what the system should do
rather than how it should do it. The software requirements provide a basis for creating the
software requirements specification, and are useful in estimating cost, planning team activities,
performing tasks, and tracking the team's progress throughout the development activity.

• Front End : J2EE (JSP, SERVLET)


• Back End : MY SQL 5.5
• Operating System : Windows 7
• IDE : Eclipse
3.4 FUNCTIONAL REQUIREMENTS

A functional requirement defines a function of a software system or its components. A function
is described as a set of inputs, the behaviour, and outputs. The outsourced computation data is
more secure.

➢ Router

1. Register PC
2. View PC
3. Accept Request
4. PC Results
5. FLA Detection
6. Logout

➢ PC

1. Interest
2. Content
3. Logout
3.5 NON-FUNCTIONAL REQUIREMENTS

The major non-functional Requirements of the system are as follows.

➢ Usability

The system is designed as a completely automated process; hence there is little or no user
intervention.

➢ Reliability

The system is more reliable because of the qualities inherited from the chosen platform, Java.
Code built using Java is more reliable.

➢ Performance

This system is developed in a high-level language using advanced front-end and back-end
technologies, so it will respond to the end user on the client system within a very short time.

➢ Supportability

The system is designed to be cross-platform. It is supported on a wide range of hardware and on
any software platform that has a JVM built in.

➢ Implementation

The system is implemented in a web environment using the Struts framework. Apache Tomcat is
used as the web server and Windows XP Professional as the platform. The user interface is based
on the HTML tags provided by Struts.
CHAPTER 4

DESIGN ENGINEERING

4.1 GENERAL

Design engineering deals with the various UML (Unified Modeling Language) diagrams used
for the implementation of the project. Design is a meaningful engineering representation of a
thing that is to be built. Software design is a process through which the requirements are
translated into a representation of the software. Design is where quality is rendered in software
engineering, and it is the means to accurately translate customer requirements into a finished
product.
4.1.1 Use Case Diagram

[Use case diagram: the ROUTER actor performs Register PC (includes Active), View PC, Accept Request (includes Accept), FLA Detection (includes FLA Defence), Result, and Logout]

EXPLANATION:

Use cases are used during requirements elicitation and analysis to represent the functionality of
the system. Use cases focus on the behaviour of the system from an external point of view. The
identification of actors and use cases results in the definition of the boundary of the system, that
is, in differentiating the tasks accomplished by the system from the tasks accomplished by its
environment. The actors are outside the boundary of the system, whereas the use cases are
inside the boundary of the system.
4.1.2 Class Diagram

EXPLANATION

This class diagram represents how the classes, with their attributes and methods, are linked
together to perform the verification.
4.1.3 Object Diagram

[Object diagram: Router, Sender, Database, and Receiver objects]

EXPLANATION:

The above diagram shows the flow of objects between the classes. An object diagram shows a
complete or partial view of the structure of a modeled system. This object diagram represents
how the classes, with their attributes and methods, are linked together to perform the
verification with security.
4.1.4 State Chart Diagram

[State chart diagram: Register PC → View PCs → Accept Request → PC Results → FLA Detection]

EXPLANATION:

State diagrams are a loosely defined diagram type that shows workflows of stepwise activities
and actions, with support for choice, iteration and concurrency. State diagrams require that the
system described is composed of a finite number of states; sometimes this is indeed the case,
while at other times it is a reasonable abstraction. Many forms of state diagrams exist, which
differ slightly and have different semantics.
4.1.5 Sequence Diagram

[Sequence diagram: lifelines Router, Login, Authentication, Home, PC, Detection, Result, and Logout. Flow: 1 login(); 2 authentication(); 3 if fail() / 4 success(); 5 view new pc request(); 6 active(); 7 success(); 8 view pc request(); 9 accept(); 10 success(); 11 view FLA detection(); 12 defence(); 13 success/fail(); 14 view final result(); 15 logout request(); 16 logged out successfully()]

EXPLANATION:

A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction diagram
that shows how processes operate with one another and in what order. It is a construct of a
Message Sequence Chart. A sequence diagram shows object interactions arranged in time
sequence. It depicts the objects and classes involved in the scenario and the sequence of
messages exchanged between the objects needed to carry out the functionality of the scenario.
4.1.6 Collaboration Diagram

[Collaboration diagram: the same Router, Login, Authentication, Home, PC, Detection, Result, and Logout objects as in the sequence diagram, with the login, authentication, PC request, FLA detection/defence, result, and logout messages shown as numbered links]

EXPLANATION:

A collaboration diagram, also called a communication diagram or interaction diagram, is an
illustration of the relationships and interactions among software objects in the Unified Modeling
Language (UML). The concept is more than a decade old, although it has been refined as
modeling paradigms have evolved.
4.1.7 Activity Diagram

[Activity diagram: Router → Login (NO: retry, YES: Home) → PC Request → Active PC, View PC, Accept Request, FLA Detection, FLA Defence → Logout]

EXPLANATION:

Activity diagrams are graphical representations of workflows of stepwise activities and actions
with support for choice, iteration and concurrency. In the Unified Modeling Language, activity
diagrams can be used to describe the business and operational step-by-step workflows of
components in a system. An activity diagram shows the overall flow of control.
4.1.8 Component Diagram

[Component diagram: Router, Sender, Database, and Receiver components]

EXPLANATION:

In the Unified Modeling Language, a component diagram depicts how components are wired
together to form larger components or software systems. It is used to illustrate the structure of
arbitrarily complex systems. The user gives a main query; it is converted into sub-queries and
sent through data dissemination to the data aggregators. Results are shown to the user by the
data aggregators. All boxes are components and the arrows indicate dependencies.
4.1.9 E-R Diagram:

[E-R diagram: User (Name, Address, Email, pwd) and Router (name, pwd) entities are verified against the Data Center, which exchanges requests and responses with the Cloud Database]

EXPLANATION:

The Entity-Relationship Model (ERM) is an abstract and conceptual representation of data.
Entity-relationship modeling is a database modeling method used to produce a type of
conceptual schema or semantic data model of a system, often a relational database.
4.1.10 Data Flow Diagram:

[Level 0 and Level 1 data flow diagrams]

EXPLANATION:

A data flow diagram (DFD) is a graphical representation of the "flow" of data through an
information system, modeling its process aspects. Often they are a preliminary step used to create an
overview of the system which can later be elaborated. DFDs can also be used for the visualization of data
processing (structured design).

A DFD shows what kinds of data will be input to and output from the system, where the data will
come from and go to, and where the data will be stored. It does not show information about the timing of
processes, or information about whether processes will operate in sequence or in parallel.
4.1.11 Deployment Diagram:

[Deployment diagram: Router, Sender, Database, and Receiver nodes]

EXPLANATION:

In the Unified Modeling Language, a deployment diagram depicts how components are deployed
on nodes to form larger systems. It is used to illustrate the structure of arbitrarily complex
systems. The user gives a main query; it is converted into sub-queries and sent through data
dissemination. Results are shown to the user by the data aggregators. The arrows indicate
dependencies.
4.2 System Architecture

[Figure: System Architecture Model]


CHAPTER 5

DEVELOPMENT TOOLS

5.1 GENERAL
This chapter is about the software language and the tools used in the development of the project. The
platform used here is JAVA. The Primary languages are JAVA, J2EE and J2ME. In this project J2EE is
chosen for implementation.

5.2 FEATURES OF JAVA

5.2.1 THE JAVA FRAMEWORK

Java is a programming language originally developed by James Gosling at Sun Microsystems
and released in 1995 as a core component of Sun Microsystems' Java platform. The language
derives much of its syntax from C and C++ but has a simpler object model and fewer low-level
facilities. Java applications are typically compiled to bytecode that can run on any Java Virtual
Machine (JVM) regardless of computer architecture. Java is general-purpose, concurrent, class-
based, and object-oriented, and is specifically designed to have as few implementation
dependencies as possible. It is intended to let application developers "write once, run anywhere".

Java is considered by many to be one of the most influential programming languages of the
20th century and is widely used from application software to web applications. The Java
framework is a platform-independent framework that simplifies application development for the
internet. Java technology's versatility, efficiency, platform portability, and security make it an
ideal technology for network computing. From laptops to datacenters, game consoles to
scientific supercomputers, cell phones to the Internet, Java is everywhere!
5.2.2 OBJECTIVES OF JAVA

To see places of Java in Action in our daily life, explore java.com.

Why Software Developers Choose Java

Java has been tested, refined, extended, and proven by a dedicated community. And numbering
more than 6.5 million developers, it's the largest and most active on the planet. With its
versatility, efficiency, and portability, Java has become invaluable to developers by enabling
them to:

• Write software on one platform and run it on virtually any other platform
• Create programs to run within a Web browser and Web services
• Develop server-side applications for online forums, stores, polls, HTML forms
processing, and more
• Combine applications or services using the Java language to create highly customized
applications or services
• Write powerful and efficient applications for mobile phones, remote processors, low-cost
consumer products, and practically any other device with a digital heartbeat

Some Ways Software Developers Learn Java

Today, many colleges and universities offer courses in programming for the Java platform. In
addition, developers can also enhance their Java programming skills by reading Sun's
java.sun.com Web site, subscribing to Java technology-focused newsletters, using the Java
Tutorial and the New to Java Programming Center, and signing up for Web, virtual, or
instructor-led courses.

Object Oriented
To be an object-oriented language, any language must have at least the following four characteristics.

1. Inheritance: It is the process of creating new classes that use the behavior of existing
classes by extending them, in order to reuse the existing code and add additional
features as needed.
2. Encapsulation: It is the mechanism of combining the information and providing the
abstraction.

3. Polymorphism: As the name suggest one name multiple form, Polymorphism is the
way of providing the different functionality by the functions having the same name based
on the signatures of the methods.

4. Dynamic binding: Sometimes we don't know the specific type of an object while
writing our code. Dynamic binding is the way of providing the maximum functionality
to a program for a specific type at runtime.

5.2.3 JAVA SWING OVERVIEW

The Abstract Window Toolkit (AWT) is cross-platform.

Swing provides many controls and widgets with which to build user interfaces. Swing class
names typically begin with a J, such as JButton, JList, and JFrame, mainly to differentiate them
from their AWT counterparts; in general they are one-to-one replacements. Swing is built on the
concept of lightweight components, in contrast to AWT's and SWT's heavyweight components.
The difference between the two is that lightweight components are rendered (drawn) using pure
Java code, such as drawLine and drawImage, whereas heavyweight components use the native
operating system to render the components.

Some components in Swing are actually heavyweight components. The top-level classes and
any derived from them are heavyweight as they extend the AWT versions. This is needed
because at the root of the UI, the parent windows need to be provided by the OS. These top-
level classes include JWindow, JFrame, JDialog and JApplet. All Swing components to be
rendered to the screen must be able to trace their way to a root window of one of those
classes.
Note: it is generally not a good idea to mix heavyweight components with lightweight
components (other than as previously mentioned), as you will encounter layering issues; e.g.,
a lightweight component that should appear "on top" ends up being obscured by a
heavyweight component. The few exceptions to this include using heavyweight components
as the root pane and for popup windows. Generally speaking, heavyweight components will
render on top of lightweight components and will not be consistent with the look and feel
being used in Swing. There are exceptions, but that is an advanced topic; the truly
adventurous may want to read Sun's article on mixing heavyweight and lightweight
components.
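
A tiny sketch tying the above together: a lightweight JButton inside a heavyweight JFrame, created on the event dispatch thread as Swing requires:

import javax.swing.*;

public class SwingDemo {
    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Demo");       // heavyweight top-level window
            JButton button = new JButton("Click");   // lightweight, Java-rendered control
            button.addActionListener(e -> System.out.println("clicked"));
            frame.add(button);
            frame.setSize(200, 100);
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        });
    }
}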

5.2.4 EVOLUTION OF COLLECTION FRAMEWORK:

Almost all collections in Java are derived from the java.util.Collection interface. Collection
defines the basic parts of all collections. The interface states the add() and remove() methods for
adding to and removing from a collection respectively. Also required is the toArray() method,
which converts the collection into a simple array of all the elements in the collection. Finally, the
contains() method checks if a specified element is in the collection. The Collection interface is a
sub interface of java.util.Iterable, so the iterator() method is also provided. All collections have
an iterator that goes through all of the elements in the collection. Additionally, Collection is a
generic. Any collection can be written to store any class. For example, Collection<String> can
hold strings, and the elements from the collection can be used as strings without any casting
required.

There are three main types of collections:

• Lists: always ordered, may contain duplicates and can be handled the same way as usual
arrays
• Sets: cannot contain duplicates and provide random access to their elements
• Maps: connect unique keys with values, provide random access to its keys and may host
duplicate values
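
A short illustration of the three collection families (the values are arbitrary):

import java.util.*;

public class CollectionsDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("a", "b", "a"));  // ordered, duplicates allowed
        Set<String> set = new TreeSet<>(list);                        // duplicates removed, sorted
        Map<String, Integer> map = new HashMap<>();                   // unique keys -> values
        for (String s : list) map.merge(s, 1, Integer::sum);          // count occurrences
        System.out.println(list);   // [a, b, a]
        System.out.println(set);    // [a, b]
        System.out.println(map);    // {a=2, b=1} (iteration order not guaranteed)
    }
}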
LIST:

Lists are implemented in the JCF via the java.util.List interface. It defines a list as essentially a
more flexible version of an array. Elements have a specific order, and duplicate elements are
allowed. Elements can be placed in a specific position. They can also be searched for within the
list. Two concrete classes implement List. The first is java.util.ArrayList, which implements the
list as an array. Whenever functions specific to a list are required, the class moves the elements
around within the array in order to do it. The other implementation is java.util.LinkedList. This
class stores the elements in nodes that each have a pointer to the previous and next nodes in the
list. The list can be traversed by following the pointers, and elements can be added or removed
simply by changing the pointers around to place the node in its proper place.

SET:

Java's java.util.Set interface defines the set. A set can't have any duplicate elements in it.
Additionally, the set has no set order. As such, elements can't be found by index. Set is
implemented by java.util.HashSet, java.util.LinkedHashSet, and java.util.TreeSet. HashSet uses
a hash table. More specifically, it uses a java.util.HashMap to store the hashes and elements and
to prevent duplicates. Java.util.LinkedHashSet extends this by creating a doubly linked list that
links all of the elements by their insertion order. This ensures that the iteration order over the set
is predictable. java.util.TreeSet uses a red-black tree implemented by a java.util.TreeMap. The
red-black tree makes sure that there are no duplicates. Additionally, it allows Tree Set to
implement java.util.SortedSet.

The java.util.Set interface is extended by the java.util.SortedSet interface. Unlike a regular set,
the elements in a sorted set are sorted, either by the element's compareTo() method, or a method
provided to the constructor of the sorted set. The first and last elements of the sorted set can be
retrieved, and subsets can be created via minimum and maximum values, as well as beginning or
ending at the beginning or ending of the sorted set. The SortedSet interface is implemented
by java.util.TreeSet.
java.util.SortedSet is extended further via the java.util.NavigableSet interface. It's similar to
SortedSet, but there are a few additional methods. The floor(), ceiling(), lower(), and higher()
methods find an element in the set that's close to the parameter. Additionally, a descending
iterator over the items in the set is provided. As with SortedSet, java.util.TreeSet implements
NavigableSet.

MAP:

Maps are defined by the java.util.Map interface in Java. Maps are simple data structures that
associate a key with a value. The element is the value. This makes the map very flexible. If the
key is the hash code of the element, the map is essentially a set. If it is just an increasing number,
it becomes a list. Maps are implemented by java.util.HashMap, java.util.LinkedHashMap,
and java.util.TreeMap. HashMap uses a hash table. The hashes of the keys are used to find the
values in various buckets. LinkedHashMap extends this by creating a doubly linked list between
the elements. This allows the elements to be accessed in the order in which they were inserted
into the map. TreeMap, in contrast to HashMap and LinkedHashMap, uses a red-black tree. The
keys are used as the values for the nodes in the tree, and the nodes point to the values in the map.
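
A short illustrative sketch (our own example, not project code) of the three Map
implementations:

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapDemo {
    public static void main(String[] args) {
        // HashMap: hash-table lookup, no ordering guarantee
        Map<String, String> table = new HashMap<>();
        table.put("fp01", "NodeA");
        table.put("fp02", "NodeB");
        System.out.println(table.get("fp01")); // NodeA

        // LinkedHashMap preserves insertion order when iterating
        Map<String, String> ordered = new LinkedHashMap<>(table);
        System.out.println(ordered);

        // TreeMap keeps keys sorted in a red-black tree
        Map<String, String> sorted = new TreeMap<>(table);
        System.out.println(sorted); // keys in ascending order
    }
}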

THREAD:

Simply put, a thread is a program's path of execution. Most programs written today run as a
single thread, causing problems when multiple events or actions need to occur at the same time.
Suppose, for example, that a program cannot draw pictures while reading keystrokes: it must
give its full attention to the keyboard input because it lacks the ability to handle more than one
event at a time. The ideal solution to this problem is the seamless execution of two or more
sections of a program at the same time.
CREATING THREADS:

Java's creators have graciously designed two ways of creating threads: implementing an interface
and extending a class. Extending a class is the way Java inherits methods and variables from a
parent class. In this case, one can only extend or inherit from a single parent class. This
limitation within Java can be overcome by implementing interfaces, which is the most common
way to create threads. (Note that the act of inheriting merely allows the class to be run as a
thread. It is up to the class to start() execution, etc.)

Interfaces provide a way for programmers to lay the groundwork of a class. They are used to
design the requirements for a set of classes to implement. The interface sets everything up, and
the class or classes that implement the interface do all the work. The different classes that
implement the interface all have to follow the same rules. Both approaches are sketched below.
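
The two ways of creating threads can be sketched as follows (an illustrative example of our
own, not project code); note that start() schedules run() on a new thread, while calling run()
directly would execute on the current thread:

// Approach 1: extend Thread (uses up the single inheritance slot)
class CounterThread extends Thread {
    @Override
    public void run() {
        System.out.println("running in " + getName());
    }
}

// Approach 2: implement Runnable (leaves the class free to extend another parent)
class CounterTask implements Runnable {
    @Override
    public void run() {
        System.out.println("running in " + Thread.currentThread().getName());
    }
}

public class ThreadDemo {
    public static void main(String[] args) {
        new CounterThread().start();           // start() schedules run() on a new thread
        new Thread(new CounterTask()).start(); // wrap the Runnable in a Thread
    }
}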

5.3 CONCLUSION

Swing's high level of flexibility is reflected in its inherent ability to override the native
host operating system (OS)'s GUI controls for displaying itself. Swing "paints" its controls using
the Java 2D APIs, rather than calling a native user interface toolkit. The Java thread scheduler is
very simple: all threads have a priority value that can be changed dynamically by calls to the
thread's setPriority() method. We implement the above concepts in our project so that work is
carried out efficiently across the server.
CHAPTER 6

IMPLEMENTATION

6.1 GENERAL

The implementation is the source code of the project.

6.2 IMPLEMENTATION

Coding:

Index.html

<!DOCTYPE html>
<html>
<head>
<title>Network</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<!-- favicons -->
<link rel="apple-touch-icon" sizes="57x57" href="images/favicons/apple-icon-
57x57.png">
<link rel="apple-touch-icon" sizes="60x60" href="images/favicons/apple-icon-
60x60.png">
<link rel="apple-touch-icon" sizes="72x72" href="images/favicons/apple-icon-
72x72.png">
<link rel="apple-touch-icon" sizes="76x76" href="images/favicons/apple-icon-
76x76.png">
<link rel="apple-touch-icon" sizes="114x114" href="images/favicons/apple-icon-
114x114.png">
<link rel="apple-touch-icon" sizes="120x120" href="images/favicons/apple-icon-
120x120.png">
<link rel="apple-touch-icon" sizes="144x144" href="images/favicons/apple-icon-
144x144.png">
<link rel="apple-touch-icon" sizes="152x152" href="images/favicons/apple-icon-
152x152.png">
<link rel="apple-touch-icon" sizes="180x180" href="images/favicons/apple-icon-
180x180.png">
<link rel="icon" type="image/png" sizes="192x192" href="images/favicons/android-
icon-192x192.png">
<link rel="icon" type="image/png" sizes="32x32" href="images/favicons/favicon-
32x32.png">
<link rel="icon" type="image/png" sizes="96x96" href="images/favicons/favicon-
96x96.png">
<link rel="icon" type="image/png" sizes="16x16" href="images/favicons/favicon-
16x16.png">
<link rel="manifest" href="/manifest.json">
<meta name="msapplication-TileColor" content="#ffffff">
<meta name="msapplication-TileImage" content="/ms-icon-144x144.png">
<meta name="theme-color" content="#ffffff">
<!-- favicons -->
<link rel="stylesheet" type="text/css" href="//maxcdn.bootstrapcdn.com/font-
awesome/4.7.0/css/font-awesome.min.css">
<link rel="stylesheet" type="text/css" href="css/style.css">
<link rel="stylesheet" type="text/css" href="css/custom-responsive-style.css">
<link href="//fonts.googleapis.com/css?family=Montserrat" rel="stylesheet">
<script type="text/javascript" src="script/jquery-3.2.1.min.js"></script>
<script type="text/javascript" src="script/all-plugins.js"></script>
<script type="text/javascript" src="script/plugin-active.js"></script>
</head>
<body data-spy="scroll" data-target=".main-navigation" data-offset="150">
<section id="MainContainer">
<!-- Header starts here -->
<jsp:include page="menu.jsp"></jsp:include>
<!-- Header ends here -->

<!-- Banner starts here -->


<section id="HeroBanner">
<div class="hero-content">
<h1>Research on Data Routing Strategy in Cloud Environment</h1>
<%String status = request.getParameter("status");
if(status!=null)
{%>
<h1><%out.print(status); %></h1>
<%}
%>
</div>
</section>
<!-- Banner ends here -->
<!-- Register section starts here -->
<section id="Register">
<div class="container contact-container">
<h3 class="contact-title" style="margin-top: 10%;">PC
Registration</h3>
<div class="contact-outer-wrapper">
<div class="form-wrap">
<form action="./RegisterServlet" method="post">
<div class="fname floating-label">
<input type="text" class="floating-input" name="name"
id="full-name-field" required="" />
<label for="full-name-field">Name</label>
</div>
<div class="email floating-label">
<input type="email" class="floating-input"
name="email" id="mail-field" required="" />
<label for="mail-field">Email</label>
</div>
<div class="contact floating-label">
<input type="password" class="floating-input"
name="password" id="contact-us-field" required=""/>
<label for="contact-us-field">Password</label>
</div>
<div class="company floating-label">
<input type="tel" class="floating-input"
name="mobile" id="company-field" maxlength="10" required=""/>
<label for="company-field">Mobile</label>
</div>
<div class="user-msg floating-label">
<textarea class="floating-input" name="address"
id="user-msg-field" required=""></textarea>
<label for="user-msg-field" class="msg-
label">Address</label>
</div>
<div class="submit-btn">
<button type="submit">Submit</button>
</div>
</form>
</div>
</div>
</div>
</section>
<!-- Services section ends here -->
<!-- About Us section starts here -->
<section id="About">
<div class="container" style="margin-bottom: 20%;">
<div class="about-wrapper">
<h2 style="margin-top: 10%;">About Project</h2>
<p>
</p>
</div>
</div>
</section>
<!-- About Us section ends here -->

<!-- Contact us section starts here -->


<section id="ContactUs">
<div class="container contact-container" style="margin-bottom: 10%;">
<h3 class="contact-title">Login</h3>
<div class="">
<div class="form-wrap">
<form action="./LoginServlet" method="post">
<div class="email floating-label" style="margin-left:
40%;">
<input type="email" class="floating-input"
name="email" id="mail-field" required=""/>
<label for="mail-field">Email</label>
</div>
<div class="contact floating-label" style="margin-left:
40%;">
<input type="password" class="floating-input"
name="password" id="contact-us-field" required=""/>
<label for="contact-us-field">Password</label>
</div>
<div class="submit-btn">
<button type="submit">Login</button>
</div>
</form>
</div>
</div>
</div>
</section>
<!-- Contact us section ends here -->
</section>
</body>

</html>

LoginServlet.java

package detection.defense.cache.pollution.servlet;

import java.io.IOException;
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.Arrays;

import javax.servlet.RequestDispatcher;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

import detection.defense.cache.pollution.Bean.Bean;
import detection.defense.cache.pollution.dao.SecurityDAO;

@WebServlet("/LoginServlet")

public class LoginServlet extends HttpServlet {

static String packets;

protected void doPost(HttpServletRequest request,


HttpServletResponse response) throws ServletException, IOException {

        String email = request.getParameter("email");
        String password = request.getParameter("password");

        InetAddress host = InetAddress.getLocalHost();
        System.out.println("Host--->" + host);
        System.out.println("Host--->" + host.getHostAddress().trim());

        // Split on a literal '|': the pipe must be escaped, otherwise the regex
        // treats it as alternation and splits between every character.
        String[] str = email.split("\\|");
        System.out.println("Packets--->%" + Arrays.toString(str) + "%");
        System.out.println("URL--->" + request.getRequestURI());

        int uid = 0;
        String uname = null;
        String mail = null;


if(email.equalsIgnoreCase("[email protected]")&&password.equalsIgnoreCase("router"))

{ RequestDispatcher rd =
request.getRequestDispatcher("RouterHome.jsp?status=<font color=white>Welcome
Router</font>");

rd.include(request, response);

} else

{ try {

ArrayList<Bean> al = new SecurityDAO().pcLogin(email,password);

for(Bean b : al)

uid = b.getUid();

uname = b.getUname();

mail = b.getEmail();

if(!al.isEmpty())

HttpSession session = request.getSession();

session.setAttribute("uid", uid);
session.setAttribute("uname", uname);

session.setAttribute("email", email);

RequestDispatcher rd =
request.getRequestDispatcher("PCHome.jsp?status=<font color=white>Welcome
"+uname+"</font>");

rd.include(request, response);

} else

RequestDispatcher rd = request.getRequestDispatcher("index.jsp?status=<font color=red>Invalid


Email and Password</font>");

rd.include(request, response); }

} catch (Exception e) {

e.printStackTrace();

RequestDispatcher rd =
request.getRequestDispatcher("index.jsp?status=<font color=red>Some Internal Error</font>");

rd.include(request, response);

} }

}
CHAPTER 7

SNAPSHOTS

7.1 GENERAL

This project is implemented as a web application using core Java; the server process is
maintained using Socket and ServerSocket, and the design (presentation) part is handled by
Cascading Style Sheets.
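
The server-side pattern can be sketched as follows (a minimal illustration of our own; the
port number and reply format are assumptions, not the project's actual server code):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Minimal echo server: accepts client connections in a loop and replies.
public class EchoServer {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(9090)) {
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    String line = in.readLine();      // read one request line
                    out.println("received: " + line); // echo it back to the client
                }
            }
        }
    }
}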

7.2 VARIOUS SNAPSHOTS


CHAPTER 8
SOFTWARE TESTING

8.1 GENERAL
The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies, assemblies and/or a finished product. It is the
process of exercising software with the intent of ensuring that the software system meets its
requirements and user expectations and does not fail in an unacceptable manner. There are
various types of tests, and each test type addresses a specific testing requirement.

8.2 DEVELOPING METHODOLOGIES


The test process is initiated by developing a comprehensive plan to test the general
functionality and special features on a variety of platform combinations. Strict quality control
procedures are used. The process verifies that the application meets the requirements specified in
the system requirements document and is bug free. The following are the considerations used to
develop the framework for the testing methodologies.

8.3 Types of Tests

8.3.1 Unit Testing


Unit testing involves the design of test cases that validate that the internal program logic is
functioning properly and that program inputs produce valid outputs. All decision branches and
internal code flow should be validated. It is the testing of individual software units of the
application. It is done after the completion of an individual unit, before integration. This is
structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests
perform basic tests at the component level and test a specific business process, application,
and/or system configuration. Unit tests ensure that each unique path of a business process
performs accurately to the documented specifications and contains clearly defined inputs and
expected results.
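
As a brief illustration (a hypothetical JUnit 4 sketch; the validator and test names are our
assumptions, not the project's actual test code), positive and negative unit tests for a single
unit look like this:

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

// Hypothetical unit test; the validator below stands in for a real application unit.
public class LoginValidatorTest {

    // Minimal unit under test (illustrative only)
    static boolean isValidEmail(String s) {
        return s != null && s.matches("[^@\\s]+@[^@\\s]+\\.[^@\\s]+");
    }

    // Positive test case: well-formed input is accepted
    @Test
    public void acceptsValidEmail() {
        assertTrue(isValidEmail("user@example.com"));
    }

    // Negative test case: malformed input is rejected
    @Test
    public void rejectsMalformedEmail() {
        assertFalse(isValidEmail("not-an-email"));
    }
}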
8.3.2 Functional Test
Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user manuals.
Functional testing is centered on the following items:
Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.

8.3.3 System Test


System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration-oriented system integration test. System testing is based on process descriptions
and flows, emphasizing pre-driven process links and integration points.

8.3.4 Performance Test


Performance testing ensures that output is produced within the required time limits, and
measures the time taken by the system for compiling, for responding to users, and for servicing
requests sent to the system to retrieve results.

8.3.5 Integration Testing


Software integration testing is the incremental integration testing of two or more integrated
software components on a single platform to expose failures caused by interface defects.
The task of the integration test is to check that components or software applications, e.g.
components in a software system or, one step up, software applications at the company level,
interact without error.
8.3.6 Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant
participation by the end user. It also ensures that the system meets the functional requirements.

Acceptance Testing for Data Synchronization:


➢ Acknowledgements are received by the sender node after the packets are received by
the destination node
➢ The route add operation is performed only when there is a route request in need
➢ The status information of the nodes is updated automatically during the cache updation
process

8.3.7 Build the test plan

Any project can be divided into units that can be further processed in detail. A testing
strategy for each of these units is then carried out. Unit testing helps to identify the possible
bugs in the individual component, so the component that has bugs can be identified and
rectified.
TEST CASES:

Test cases can be divided into two types: positive test cases and negative test cases. Positive
test cases are conducted by the developer with the intention of obtaining the expected output;
negative test cases are conducted with the intention of verifying that invalid input does not
produce the output.

TEST PLAN

The test procedure is started by building a thorough plan to test the general usefulness and
special features on a variety of platform combinations. Strict quality control methods are used.
The procedure checks that the application meets the requirements specified in the system
requirements report and is free of bugs.

Any project can be separated into units that can be further processed in detail. A testing
strategy for each of these units is then carried out. Unit testing serves to identify the potential
bugs in the individual segment, so the segment that has bugs can be recognized and corrected.
CHAPTER 9

APPLICATION

9.1 GENERAL

The process is carried out in the storage nodes and does not affect system performance, so the
price paid is worth it. In practical use, this data routing strategy needs to ensure a balance
between storage nodes and data volume, to prevent the use of too many storage nodes with little
data, which results in poor deduplication. In addition, the minimum number of block fingerprints
in the BloomFilter maintained by the storage nodes needs to be selected reasonably, based on the
performance of the system's storage nodes.

9.2 FUTURE ENHANCEMENT

The following aspects deserve more attention and research in the future:

1. The deletion and addition of storage nodes in the cluster deduplication system. The
storage nodes in the cluster deduplication system may stop working due to some
problems, or the storage capacity may no longer meet the data growth; at this point,
storage nodes need to be added. However, how to transfer the data to the new storage
node and how to recover the lost data remain open questions. Although the current
consistent-hashing and erasure-code-based strategies achieve certain effects, they can
still be improved.

2. Selection of the representative fingerprint ID of the super block. At present, Broder's
theorem is mainly used to select the representative fingerprint ID of the super block,
and the effect of the chosen data routing unit on the data routing strategy is not obvious.
Therefore, the selection of a good data routing unit or of a representative fingerprint ID
can effectively improve the deduplication rate of the system.
CHAPTER 10

CONCLUSION & REFERENCE

10.1 CONCLUSION

In order to reduce the communication overhead of the clustered deduplication system, this paper
proposes a distributed BloomFilter-based data routing strategy, which takes the superblock as the
basic data routing unit, selects the k smallest block fingerprints of the superblock as the
representative fingerprint IDs of the superblock, and determines p candidate storage nodes from
these fingerprint IDs. The number of occurrences of these k smallest block fingerprint IDs in the
corresponding nodes is then determined using the BloomFilter, and the optimal node is calculated
as the final data routing node using the number of matches and the capacity of the storage node.
This strategy can obtain a better deduplication rate and load balancing with lower communication
overhead. At present, research on cluster deduplication routing strategies mainly focuses on the
two aspects of deduplication rate and load balancing, and many research results have been
obtained. However, there are still many problems that have not been solved or that have become
the main research points.
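
The routing decision can be sketched as follows (illustrative Java of our own; the candidate-selection
rule, the scoring weight, and the Node abstraction are assumptions rather than the paper's exact
formulation):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative routing sketch: pick a node for a superblock by combining
// BloomFilter match counts with remaining node capacity.
public class RoutingSketch {

    interface Node { // assumed node abstraction
        boolean bloomFilterMightContain(long fingerprint);
        double freeCapacityRatio();
    }

    /** Choose a destination node for a superblock's block fingerprints. */
    static int routeSuperblock(List<Long> blockFingerprints, Node[] nodes, int k, int p) {
        // 1. Representative IDs: the k smallest block fingerprints of the superblock.
        List<Long> reps = new ArrayList<>(blockFingerprints);
        Collections.sort(reps);
        reps = reps.subList(0, Math.min(k, reps.size()));

        // 2. Candidate nodes: p nodes addressed by the representative IDs
        //    (here simply fingerprint mod node count; the paper may differ).
        // 3. Score each candidate by BloomFilter hits, weighted by free capacity.
        int best = -1;
        double bestScore = -1.0;
        for (int i = 0; i < Math.min(p, nodes.length); i++) {
            int candidate = (int) Math.floorMod(reps.get(i % reps.size()), (long) nodes.length);
            int hits = 0;
            for (long fp : reps) {
                if (nodes[candidate].bloomFilterMightContain(fp)) {
                    hits++; // fingerprint probably already stored on this node
                }
            }
            double score = hits * nodes[candidate].freeCapacityRatio(); // assumed weighting
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }
}

More matches mean more of the superblock's chunks are probably already stored on that node
(better deduplication), while the capacity factor steers new data toward less-loaded nodes
(load balancing).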
10.2 REFERENCE

[1] J. Gantz and D. Reinsel, The digital universe decade: Are you ready?, Needham, MA, USA,
May 2010.

[2] H. Biggar, Experiencing data de-duplication: Improving efficiency and reducing capacity
requirements, Silicon Valley, CA, USA, Feb. 2007.

[3] Q. He, G. Bian, B. Shao and W. Zhang, "Research on multi-feature data routing strategy in
deduplication", Sci. Program., vol. 2020, pp. 1-11, Oct. 2020.

[4] N. Mandagere, P. Zhou, M. A. Smith and S. Uttamchandani, "Demystifying data
deduplication", Proc. ACM/IFIP/USENIX Int. Middleware Conf. Companion (Companion), pp.
12-17, 2008.

[5] D. Bhagwat, K. Eshghi, D. D. E. Long and M. Lillibridge, "Extreme binning: Scalable
parallel deduplication for chunk-based file backup", Proc. IEEE Int. Symp. Modeling Anal.
Simulation Comput. Telecommun. Syst., pp. 1-9, Sep. 2009.

[6] M. Lillibridge, K. Eshghi, D. Bhagwat, V. Deolalikar, G. Trezis and P. Camble, "Sparse
indexing: Large scale inline deduplication using sampling and locality", Proc. 7th USENIX
Conf. File Storage Technol. (FAST), 2009.

[7] D. Meister and A. Brinkmann, "Multi-level comparison of data deduplication in a backup
scenario", Proc. SYSTOR Israeli Experim. Syst. Conf. (SYSTOR), pp. 8, 2009.

[8] C. Dubnicki, L. Gryz, L. Heldt, M. Kaczmarczyk, W. Kilian, P. Strzelczak, et al.,
"HYDRAstor: A scalable secondary storage", Proc. 7th USENIX Conf. File Storage Technol.
(FAST), pp. 197-210, 2009.

[9] W. Dong, F. Douglis, K. Li, H. Patterson, S. Reddy and P. Shilane, "Tradeoffs in scalable
data routing for deduplication clusters", Proc. 9th USENIX Conf. File Storage Technol. (FAST),
pp. 15-29, 2011.

[10] S. Luo, G. Zhang, C. Wu, S. U. Khan and K. Li, "Boafft: Distributed deduplication for big
data storage in the cloud", IEEE Trans. Cloud Comput., vol. 8, no. 4, pp. 1199-1211, Oct. 2020.
[11] R. Real and J. M. Vargas, "The probabilistic basis of Jaccard’s index of
similarity", Systematic Biol., vol. 45, no. 3, pp. 380-385, 1996.

[12] Y. Fu, H. Jiang and N. Xiao, "A scalable inline cluster deduplication framework for big data
protection", Proc. ACM/IFIP/USENIX 13th Int. Conf. Middleware (Middleware), pp. 354-373,
Dec. 2012.

[13] Q. He, G. Bian, B. Shao and W. Zhang, "Data deduplication technology for cloud
storage", Tehnički Vjesnik, vol. 27, no. 5, pp. 1445-1446, 2020.

[14] G. Lu, Y. Jin and D. H. C. Du, "Frequency based chunking for data de-duplication", Proc.
IEEE Int. Symp. Modeling Anal. Simulation Comput. Telecommun. Syst., pp. 287-296, Aug.
2010.

[15] S. Brin, J. Davis and H. García-Molina, "Copy detection mechanisms for digital
documents", Proc. ACM SIGMOD Int. Conf. Manage. Data (SIGMOD), pp. 398-409, 1995.

[16] S. Wu, J. Zhou, W. Zhu, H. Jiang, Z. Huang, Z. Shen, et al., "EaD: A collision-free and
high-performance deduplication scheme for flash storage systems", Proc. IEEE 38th Int. Conf.
Comput. Design (ICCD), pp. 155-162, Oct. 2020.

[17] P. Yang, N. Xue, Y. Zhang, Y. Zhou, L. Sun, W. Chen, et al., "Reducing garbage collection
overhead in SSD based on workload prediction", Proc. 11th USENIX Workshop Hot Topics
Storage File Syst. (HotStorage), pp. 20, Jul. 2019.

[18] B. Harris and N. Altiparmak, "Ultra-low latency SSDs’ impact on overall energy
efficiency", Proc. 12th USENIX Workshop Hot Topics Storage File Syst. (HotStorage), pp. 1-25,
Jul. 2020.

[19] Q. He, G. Bian, B. Shao and W. Zhang, "Research on multi-feature data routing strategy in
deduplication", Sci. Program., vol. 2020, pp. 1-11, Oct. 2020.

[20] Y. Zhang, W. Xia, D. Feng, H. Jiang, Y. Hua and Q. Wang, "Finesse: Fine-grained feature
locality based fast resemblance detection for post deduplication delta compression", Proc. 17th
USENIX Conf. File Storage Technol. (FAST), pp. 121-128, Feb. 2019.
[21] M. O. Rabin, "Fingerprinting by random polynomials", 1981.

[22] A. Z. Broder, "Some applications of Rabin’s fingerprinting method" in Sequences II:
Methods in Communication, Security, and Computer Science, Cham, Switzerland: Springer, pp.
143-152, 1993.

[23] K. Eshghi and H. K. Tang, "A framework for analyzing and improving content-based
chunking algorithms", 2009.

[24] R. Rivest, The MD5 Message-Digest Algorithm, 1992.

[25] Secure Hash Standard, May 1993.

[26] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors", Commun. ACM,
vol. 13, no. 7, pp. 422-426, Jul. 1970.

[27] [online] Available:
http://www.google.co.hk/ggblog/googlechinablog/2007/07/bloomfilter_7469.html.

[28] C. Liu, Y. Lu, C. Shi, G. Lu, D. H. C. Du and D.-S. Wang, "ADMAD: Application-driven
metadata aware de-duplication archival storage system", Proc. 5th IEEE Int. Workshop Storage
Netw. Archit. Parallel (I/Os), pp. 29-35, Sep. 2008.

[29] Q. He, G. Bian, W. Zhang, F. Wu and Z. Li, "TCFTL: Improved real-time flash memory
two cache flash translation layer algorithm", J. Nanoelectron. Optoelectron., vol. 16, no. 3, pp.
403-413, Mar. 2021.

[30] Y.-J. Fu, N. Xiao, X.-K. Liao and F. Liu, "Application-aware client-side data reduction and
encryption of personal data in cloud backup services", J. Comput. Sci. Technol., vol. 28, no. 6,
pp. 1012-1024, Nov. 2013.

[31] R. Real and J. M. Vargas, "The probabilistic basis of Jaccard’s index of
similarity", Systematic Biol., vol. 45, no. 3, pp. 380-385, Sep. 1996.

[32] Q. He, G. Bian, W. Zhang, F. Zhang, S. Duan and F. Wu, "Research on routing strategy in
cluster deduplication system", IEEE Access, vol. 9, pp. 135485-135495, 2021.
[33] Z. Zhang, D. Bhagwat, W. Litwin, D. Long and S. J. T. Schwarz, "Improved deduplication
through parallel binning", Proc. IEEE 31st Int. Perform. Comput. Commun. Conf. (IPCCC), pp.
130-141, Dec. 2012.

[34] M. Ajdari, P. Park, J. Kim, D. Kwon and J. Kim, "CIDR: A cost-effective in-line data
reduction system for terabit-per-second scale SSD arrays", Proc. IEEE Int. Symp. High Perform.
Comput. Archit. (HPCA), pp. 28-41, Feb. 2019.

[35] Q. He, Z. Yu, G. Bian, W. Zhang, K. Liu and Z. Li, "Research on key technologies of NBD
storage service system based on load classification", AIP Adv., vol. 11, no. 12, Dec. 2021.

[36] H. Cui, H. Duan, Z. Qin, C. Wang and Y. Zhou, "SPEED: Accelerating enclave applications
via secure deduplication", Proc. IEEE 39th Int. Conf. Distrib. Comput. Syst. (ICDCS), pp. 1072-
1082, Jul. 2019.

[37] K. Han, H. Kim and D. Shin, "WAL-SSD: Address remapping-based write-ahead-logging
solid-state disks", IEEE Trans. Comput., vol. 69, no. 2, pp. 260-273, Feb. 2020.

[38] J. B. Djoko, J. Lange and A. J. Lee, "NeXUS: Practical and secure access control on
untrusted storage platforms using client-side SGX", Proc. 49th Annu. IEEE/IFIP Int. Conf.
Dependable Syst. Netw. (DSN), pp. 401-413, Jun. 2019.

[39] B. Fuhry, L. Hirschoff, S. Koesnadi and F. Kerschbaum, "SeGShare: Secure group file
sharing in the cloud using enclaves", Proc. 50th Annu. IEEE/IFIP Int. Conf. Dependable Syst.
Netw. (DSN), pp. 476-488, Jun. 2020.

[40] D. Harnik, E. Tsfadia, D. Chen and R. Kat, "Securing the storage data path with SGX
enclaves" in arXiv:1806.10883, 2018.

[41] T. Kim, J. Park, J. Woo, S. Jeon and J. Huh, "ShieldStore: Shielded in-memory key-value
storage with SGX", Proc. 14th EuroSys Conf., pp. 1-15, Mar. 2019.

[42] R. Krahn, B. Trach, A. Vahldiek-Oberwagner, T. Knauth, P. Bhatotia and C. Fetzer, "Pesos:
Policy enhanced secure object store", Proc. 13th EuroSys Conf., pp. 1-17, Apr. 2018.
[43] J. Li, P. P. C. Lee, C. Tan, C. Qin and X. Zhang, "Information leakage in encrypted
deduplication via frequency analysis: Attacks and defenses", ACM Trans. Storage, vol. 16, no. 1,
pp. 1-30, Feb. 2020.

[44] J. Li, Z. Yang, Y. Ren, P. P. C. Lee and X. Zhang, "Balancing storage efficiency and data
confidentiality with tunable encrypted deduplication", Proc. 15th Eur. Conf. Comput. Syst., pp.
1-15, Apr. 2020.

[45] S. Mofrad, F. Zhang, S. Lu and W. Shi, "A comparison study of Intel SGX and AMD
memory encryption technology", Proc. 7th Int. Workshop Hardw. Architectural Support Secur.
Privacy, pp. 1-8, Jun. 2018.

[46] O. Oleksenko, B. Trach, R. Krahn, A. Martin, C. Fetzer and M. Silberstein, "Varys:
Protecting SGX enclaves from practical side-channel attacks", Proc. USENIX ATC, pp. 227-240,
2018.

[47] C. Priebe, K. Vaswani and M. Costa, "EnclaveDB: A secure database using SGX", Proc.
IEEE Symp. Secur. Privacy (SP), pp. 264-278, May 2018.

[48] Y. Ren, J. Li, Z. Yang, P. P. C. Lee and X. Zhang, "Accelerating encrypted deduplication
via SGX", 2021, [online] Available:
http://www.cse.cuhk.edu.hk/~pclee/www/pubs/tech_sgxdedup.pdf.

[49] W. You and B. Chen, "Proofs of ownership on encrypted cloud data via Intel SGX", Proc.
ACNS, pp. 400-416, 2020.

[50] Y. Zhang, W. Xia, D. Feng, H. Jiang, Y. Hua and Q. Wang, "Finesse: Fine-grained feature
locality based fast resemblance detection for post deduplication delta compression", Proc. 17th
USENIX Conf. File Storage Technol. (FAST), pp. 121-128, Feb. 2019.

[51] W. Xia, X. Zou, H. Jiang, Y. Zhou, C. Liu, D. Feng, et al., "The design of fast content-
defined chunking for data deduplication based storage systems", IEEE Trans. Parallel Distrib.
Syst., vol. 31, no. 9, pp. 2017-2031, Sep. 2020.
[52] S. Park, I. Kang, Y. Moon, J. H. Ahn and G. E. Suh, "BCD deduplication: Effective
memory compression using partial cache-line deduplication", Proc. 26th ACM Int. Conf.
Architectural Support Program. Lang. Operating Syst., pp. 52-64, Apr. 2021.

[53] Z. Cao, H. Wen, F. Wu and D. H. Du, "ALACC: Accelerating restore performance of data
deduplication systems using adaptive look-ahead window assisted chunk caching", Proc. 16th
USENIX Conf. File Storage Technol. (FAST), pp. 309-324, 2018.

[54] Y. Tan, C. Xu, J. Xie, Z. Yan, H. Jiang, W. Srisa-An, et al., "Improving the performance of
deduplication-based storage cache via content-driven cache management methods", IEEE Trans.
Parallel Distrib. Syst., vol. 32, no. 1, pp. 214-228, Jan. 2021.

[55] Z. Cao, S. Liu, F. Wu, G. Wang, B. Li and D. H. Du, "Sliding look-back window assisted
data chunk rewriting for improving deduplication restore performance", Proc. 17th USENIX
Conf. File Storage Technol. (FAST), pp. 129-142, 2019.

[56] Y. Zhang, Y. Yuan, D. Feng, C. Wang, X. Wu, L. Yan, et al., "Improving restore
performance for in-line backup system combining deduplication and delta compression", IEEE
Trans. Parallel Distrib. Syst., vol. 31, no. 10, pp. 2302-2314, Oct. 2020.

[57] M. Lu, F. Wang, D. Feng and Y. Hu, "A read-leveling data distribution scheme for
promoting read performance in SSDs with deduplication", Proc. 48th Int. Conf. Parallel
Process., pp. 22, Aug. 2019.

[58] X. Zou, J. Yuan, P. Shilane, W. Xia, H. Zhang and X. Wang, "The dilemma between
deduplication and locality: Can both be achieved", Proc. 19th USENIX Conf. File Storage
Technol. (FAST), pp. 171-185, 2021.

[59] Q. Wang, J. Li, W. Xia, E. Kruus, B. Debnath and P. P. C. Lee, "Austere flash caching with
deduplication and compression", Proc. USENIX Annu. Tech. Conf. (ATC), pp. 713-726, Jul.
2020.

[60] Y. Zhang, Y. Yuan, D. Feng, C. Wang, X. Wu, L. Yan, et al., "Improving restore
performance for in-line backup system combining deduplication and delta compression", IEEE
Trans. Parallel Distrib. Syst., vol. 31, no. 10, pp. 2302-2314, Oct. 2020.
