
BFC Project

The document discusses an efficient storage technique for cloud-based storage systems. It proposes using lightweight metadata for large files and uploading files in chunks to enable parallel and resumable uploads. It also describes using MD5 hashing to detect duplicate files during uploads in order to avoid data duplication and wasted storage space. The system would allow users to efficiently store, retrieve, and manage large files while minimizing storage usage through data deduplication.

Uploaded by sudha gilmore

An Efficient Storage Technique for Cloud-Based Storage Systems

Guided by: Ms. Deepthi V R
Assistant Professor
Department of Computer Science and Engineering

Project by: Arun K
Roll No. 07
M.Tech – M2 CSE
09-May-2019
Contents
• Introduction
• Current system limitations
• An Efficient Storage System
Introduction
• Cloud storage systems are now widely used; they allow ordinary users to store large files in distributed storage systems.
• Many companies provide such services, including Google Drive, ZingMe, and Dropbox.
Limitations
• Efficiently storing, retrieving, and managing big files in the system is difficult.

• Data duplication wastes storage space, because the same static data is stored separately for different users.

• A large amount of metadata is required to store large files.


An Efficient Storage System

• A lightweight metadata design for big files: every file has approximately the same amount of metadata.
• Files are uploaded in chunks, so that parallel and resumable uploads are possible.
• Before uploading, the client calculates the MD5 value of the file and sends it to the server.
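The client-side MD5 step can be sketched as follows. This is a minimal illustration assuming Python's standard `hashlib` module (the project itself lists JavaScript and PHP); the function name `file_md5` and the 1 MiB block size are illustrative choices, not part of the original design. Reading the file in fixed-size blocks means a big file never has to fit in memory.

```python
import hashlib
import io
from typing import BinaryIO


def file_md5(stream: BinaryIO, block_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a binary stream, read block by block."""
    digest = hashlib.md5()
    # iter() with a sentinel keeps reading until read() returns b"" (end of file).
    for block in iter(lambda: stream.read(block_size), b""):
        digest.update(block)
    return digest.hexdigest()


if __name__ == "__main__":
    # A real client would pass open(path, "rb"); an in-memory stream keeps this self-contained.
    print(file_md5(io.BytesIO(b"abc")))  # 900150983cd24fb0d6963f7d28e17f72
```

The digest does not depend on the block size, so the client can hash the file while streaming it and send only the 32-character hex string to the server.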
An Efficient Storage System
• The server compares this value with the MD5 values of the files it already stores.
• If a match is found, the file is not saved to server space again; only a reference to the existing copy is kept.
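The server-side comparison can be sketched as a dictionary keyed by MD5 value. This is a hypothetical in-memory model, not the project's database schema: the class `DedupStore`, its `register` method, and the check that newly uploaded bytes actually match the claimed MD5 are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class DedupStore:
    """Hypothetical server-side store keeping one physical copy per distinct MD5 value."""

    blobs: Dict[str, bytes] = field(default_factory=dict)  # MD5 hex -> stored content
    refs: Dict[Tuple[str, str], str] = field(default_factory=dict)  # (user, filename) -> MD5 reference

    def has(self, md5_hex: str) -> bool:
        # The client sends only the MD5 value first; the server checks its index.
        return md5_hex in self.blobs

    def register(self, user: str, name: str, md5_hex: str, data: Optional[bytes] = None) -> bool:
        """Record the file for `user`; return True only if new bytes were stored."""
        stored = False
        if md5_hex not in self.blobs:
            # Unknown hash: the content must be uploaded and must match the claimed MD5.
            if data is None or hashlib.md5(data).hexdigest() != md5_hex:
                raise ValueError("content missing or does not match MD5")
            self.blobs[md5_hex] = data
            stored = True
        # New or duplicate, the user's entry is only a reference to the shared copy.
        self.refs[(user, name)] = md5_hex
        return stored


if __name__ == "__main__":
    store = DedupStore()
    content = b"same static data"
    h = hashlib.md5(content).hexdigest()
    print(store.register("alice", "report.pdf", h, content))  # True: bytes stored
    print(store.has(h))                                        # True
    print(store.register("bob", "copy.pdf", h))                # False: reference only
    print(len(store.blobs))                                    # 1
```

Keying the store by digest turns the duplicate check into a single dictionary lookup, and the second uploader never has to send the file body.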
Activities
• Uploading a file

• Users can upload any type of file to the server, and every other user can download these files.

• If the same file is uploaded again, by the same or a different user, it is not saved in the database a second time.

• Instead, the second uploader gets a reference to the existing data. This avoids duplicate files in the cloud.
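Because files are uploaded in chunks, an interrupted upload can be resumed. One way to model this is for the server to remember which chunk indices it has received, so that the client resends only the missing ones. The sketch below is a hypothetical in-memory model: `UploadSession`, `split`, and the 8-byte chunk size in the example are assumptions, not the project's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UploadSession:
    """Hypothetical server-side record of one in-progress chunked upload."""

    num_chunks: int
    received: Dict[int, bytes] = field(default_factory=dict)  # chunk index -> bytes

    def put_chunk(self, index: int, data: bytes) -> None:
        if not 0 <= index < self.num_chunks:
            raise IndexError(f"chunk index {index} out of range")
        self.received[index] = data  # resending the same chunk simply overwrites it

    def missing(self) -> List[int]:
        # After a dropped connection, the client asks which chunks still need sending.
        return [i for i in range(self.num_chunks) if i not in self.received]

    def assemble(self) -> bytes:
        if self.missing():
            raise RuntimeError("upload incomplete")
        return b"".join(self.received[i] for i in range(self.num_chunks))


def split(data: bytes, chunk_size: int) -> List[bytes]:
    """Fixed-size chunks; only the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


if __name__ == "__main__":
    data = b"0123456789" * 3          # 30 bytes
    chunks = split(data, 8)           # 4 chunks: 8, 8, 8 and 6 bytes
    session = UploadSession(num_chunks=len(chunks))
    # First attempt: the connection drops after chunks 0 and 2 arrive.
    session.put_chunk(0, chunks[0])
    session.put_chunk(2, chunks[2])
    print(session.missing())          # [1, 3]
    # Resume: send only the missing chunks.
    for i in session.missing():
        session.put_chunk(i, chunks[i])
    print(session.assemble() == data)  # True
```

Since each chunk is independent, the same session object also accepts chunks sent in parallel and in any order.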
Activities
• Chunks Creation

• A chunk is a data segment generated from a file uploaded by the user. If the file is bigger than the configured chunk size, it is split into a list of chunks.

• All chunks generated from a file have the same size, except the last chunk, which may be equal to or smaller than that size.

• The ID generator then assigns IDs to the file and to its first chunk using an auto-increment mechanism.
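Chunk creation with auto-increment IDs can be sketched as below. The sketch assumes, following the conclusion's "logically contiguous chunk-id", that a file's chunks receive consecutive IDs starting from the first chunk's ID. Under that assumption the per-file metadata is the same five integers for any file size, which matches the lightweight metadata goal. `FileMeta`, `IdGenerator`, and `create_chunks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FileMeta:
    """Fixed-size metadata: the same five integers whatever the file size."""

    file_id: int
    first_chunk_id: int
    num_chunks: int
    chunk_size: int
    file_size: int

    def chunk_ids(self) -> range:
        # Chunk IDs are contiguous, so they are derived rather than stored one by one.
        return range(self.first_chunk_id, self.first_chunk_id + self.num_chunks)


class IdGenerator:
    """Hypothetical auto-increment ID source (a database sequence in a real system)."""

    def __init__(self) -> None:
        self._next_file_id = 1
        self._next_chunk_id = 1

    def next_file_id(self) -> int:
        fid = self._next_file_id
        self._next_file_id += 1
        return fid

    def reserve_chunk_ids(self, n: int) -> int:
        """Reserve n consecutive chunk IDs and return the first one."""
        first = self._next_chunk_id
        self._next_chunk_id += n
        return first


def create_chunks(data: bytes, chunk_size: int, ids: IdGenerator) -> Tuple[FileMeta, Dict[int, bytes]]:
    # Files no larger than chunk_size (including empty files) become a single chunk.
    pieces = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    meta = FileMeta(
        file_id=ids.next_file_id(),
        first_chunk_id=ids.reserve_chunk_ids(len(pieces)),
        num_chunks=len(pieces),
        chunk_size=chunk_size,
        file_size=len(data),
    )
    return meta, dict(zip(meta.chunk_ids(), pieces))


if __name__ == "__main__":
    ids = IdGenerator()
    meta_a, chunks_a = create_chunks(b"x" * 10, 4, ids)  # chunks of 4, 4 and 2 bytes
    meta_b, chunks_b = create_chunks(b"y" * 3, 4, ids)   # smaller than chunk_size: 1 chunk
    print(meta_a)  # FileMeta(file_id=1, first_chunk_id=1, num_chunks=3, chunk_size=4, file_size=10)
    print(list(meta_b.chunk_ids()))  # [4]
```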
Activities
• Data deduplication

• Data deduplication is a crucial mechanism for eliminating redundant copies of identical data.

• Duplicate data is detected during upload using the MD5 hash function.

• Suppose the client selects file X for upload. The MD5 value of X is computed and then used to find a duplicate file on the server.
Activities
• Downloading a File

• Users can download files from the cloud.

• Since files are uploaded in parallel as separate chunks, they can also be downloaded in parallel.
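Parallel download can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`: each chunk is fetched by its own worker, and `Executor.map` returns results in the order of the input chunk IDs, so the file is reassembled correctly even when chunks finish out of order. `fetch_chunk` stands in for whatever request the real client makes and is hypothetical, as is the in-memory chunk store in the example.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def parallel_download(
    chunk_ids: Sequence[int],
    fetch_chunk: Callable[[int], bytes],
    workers: int = 4,
) -> bytes:
    """Fetch all chunks concurrently and join them in chunk-ID order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, regardless of completion order.
        parts = list(pool.map(fetch_chunk, chunk_ids))
    return b"".join(parts)


if __name__ == "__main__":
    store = {7: b"Hello, ", 8: b"chunked ", 9: b"world"}  # hypothetical chunk store

    def fetch(cid: int) -> bytes:
        time.sleep(random.uniform(0, 0.05))  # simulate varying network latency
        return store[cid]

    print(parallel_download(range(7, 10), fetch))  # b'Hello, chunked world'
```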
Uploading and Deduplication Algorithm
Downloading Algorithm
Implementation
HARDWARE REQUIREMENT

• Processor : Intel(R) Core(TM) i3
• RAM : 256 MB
• Hard Disk Drive : 20 GB or higher
• Keyboard : 101/102 Natural Keyboard
• Monitor : Resolution of 800 X 600
• Mouse : Serial Mouse
SOFTWARE REQUIREMENT
• Operating System : Windows XP
• Browser : Internet Explorer 5.0
• Front End : JavaScript and PHP
• Back End : SQL Server 2014
• Server : Apache Tomcat
• IDE : Adobe Dreamweaver
Tools Study
• JavaScript and PHP
• Apache Tomcat Server
• MySQL Server
• Adobe Dreamweaver
Conclusion
• A lightweight metadata design for big files in a cloud storage system was developed.
• Every file has nearly the same size of metadata.
• The chunks of each file are given logically contiguous chunk IDs in the cloud, which makes it easier to distribute data and scale out the storage system.
• The design brings the advantages of a key-value store to big-file storage, which key-value stores do not support by default for large values.
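The key-value idea in the conclusion can be sketched as a store in which each chunk is a small value keyed by its integer chunk ID, so a big file is never stored as one huge value. The placement rule `chunk_id % num_nodes` is an illustrative assumption, not a scheme stated in this project; `ShardedChunkStore` and its methods are hypothetical names.

```python
from typing import Dict, List


class ShardedChunkStore:
    """Hypothetical key-value chunk store spread over several storage nodes."""

    def __init__(self, num_nodes: int) -> None:
        if num_nodes < 1:
            raise ValueError("need at least one node")
        # Each node is modelled as a plain dict: chunk ID -> chunk bytes.
        self.nodes: List[Dict[int, bytes]] = [{} for _ in range(num_nodes)]

    def node_for(self, chunk_id: int) -> int:
        # Assumed placement rule: consecutive chunk IDs land on different nodes.
        return chunk_id % len(self.nodes)

    def put(self, chunk_id: int, value: bytes) -> None:
        self.nodes[self.node_for(chunk_id)][chunk_id] = value

    def get(self, chunk_id: int) -> bytes:
        return self.nodes[self.node_for(chunk_id)][chunk_id]

    def read_file(self, first_chunk_id: int, num_chunks: int) -> bytes:
        # Contiguous IDs: a whole file is addressed by (first_chunk_id, num_chunks).
        return b"".join(self.get(first_chunk_id + i) for i in range(num_chunks))


if __name__ == "__main__":
    store = ShardedChunkStore(num_nodes=3)
    for offset, part in enumerate([b"big ", b"file ", b"data"]):
        store.put(10 + offset, part)
    print([store.node_for(cid) for cid in range(10, 13)])  # [1, 2, 0]
    print(store.read_file(10, 3))  # b'big file data'
```

Because a file's chunk IDs form a contiguous range, the same model would also work with range partitioning, where each node owns a block of consecutive IDs.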
References
• Dropbox tech blog. https://tech.dropbox.com/. Accessed October 28, 2014.
• Zing Me. http://me.zing.vn. Accessed October 28, 2014.
• D. Borthakur. HDFS architecture guide. Hadoop Apache Project, http://hadoop.apache.org/common/docs/current/hdfs_design.pdf, 2008.
• F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS), 26(2):4, 2008.
• I. Drago, E. Bocchi, M. Mellia, H. Slatman, and A. Pras. Benchmarking personal cloud storage. In Proceedings of the 2013 Conference on Internet Measurement Conference, pages 205–212. ACM, 2013.
• I. Drago, M. Mellia, M. M. Munafo, A. Sperotto, R. Sadre, and A. Pras. Inside Dropbox: Understanding personal cloud storage services. In Proceedings of the 2012 ACM Conference on Internet Measurement Conference, pages 481–494. ACM, 2012.
