
SearchStorage
Home > Enterprise storage, planning and management > data deduplication (intelligent compression or single-instance storage) definition

Data Deduplication (Intelligent Compression or Single-Instance Storage) Definition

Posted by Margaret Rouse, WhatIs.com
Contributor(s): Stephen J. Bigelow, Jeff Hawkins

Data deduplication (often called "intelligent compression" or "single-instance storage") is a method of reducing storage needs by eliminating redundant data. Only one unique instance of the data is actually retained on storage media, such as disk or tape. Redundant data is replaced with a pointer to the unique data copy. For example, a typical email system might contain 100 instances of the same one-megabyte (MB) file attachment. If the email platform is backed up or archived, all 100 instances are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; each subsequent instance is just referenced back to the one saved copy. In this example, a 100 MB storage demand is reduced to only 1 MB.
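The email-attachment example above can be sketched in a few lines. This is a minimal, hypothetical illustration (the `store`/`mailbox` structures are invented for this sketch, not a real product's design): each unique file is stored once under its content hash, and every message keeps only a pointer to it.

```python
import hashlib

store = {}      # content hash -> file bytes (the single stored copy)
mailbox = []    # per-message "pointers" (hashes) instead of full copies

def save_attachment(data: bytes) -> str:
    """Store an attachment once; return a pointer (its content hash)."""
    digest = hashlib.sha256(data).hexdigest()
    if digest not in store:          # first time this content is seen
        store[digest] = data         # keep the one unique instance
    mailbox.append(digest)           # each message keeps only a pointer
    return digest

attachment = b"x" * 1_000_000        # a 1 MB attachment
for _ in range(100):                 # 100 messages carry the same file
    save_attachment(attachment)

print(len(mailbox))                          # 100 references
print(sum(len(v) for v in store.values()))   # 1000000 bytes actually stored
```

Despite 100 logical copies, only about 1 MB of real storage is consumed, matching the reduction described above.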


Data deduplication offers other benefits. Lower storage space requirements save money on disk expenditures. More efficient use of disk space also allows for longer disk retention periods, which maintains better recovery time objectives (RTO) over a longer window and reduces the need for tape backups. Data deduplication also reduces the data that must be sent across a WAN for remote backups, replication and disaster recovery.

Data deduplication can generally operate at the file or block level. File deduplication eliminates duplicate files (as in the example above), but this is not a very efficient means of deduplication. Block deduplication looks within a file and saves unique iterations of each block. Each chunk of data is processed using a hash algorithm such as MD5 or SHA-1. This process generates a unique number for each piece, which is then stored in an index. If a file is updated, only the changed data is saved. That is, if only a few bytes of a document or presentation are changed, only the changed blocks are saved; the changes don't constitute an entirely new file. This behavior makes block deduplication far more efficient. However, block deduplication takes more processing power and uses a much larger index to track the individual pieces.
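A minimal sketch of block-level deduplication, assuming fixed-size chunks for simplicity (many real systems use variable-size chunking): each block is hashed, and only blocks whose hash is not already in the index are stored.

```python
import hashlib

CHUNK = 4096  # fixed-size blocks; a simplifying assumption for this sketch

index = {}    # block hash -> block bytes (the deduplicated store)

def dedupe(data: bytes) -> list:
    """Split data into blocks; store only blocks not already in the index.

    Returns the list of block hashes ("pointers") that reconstruct the file.
    """
    refs = []
    for i in range(0, len(data), CHUNK):
        block = data[i:i + CHUNK]
        h = hashlib.sha256(block).hexdigest()
        if h not in index:
            index[h] = block
        refs.append(h)
    return refs

v1 = b"A" * 8192 + b"B" * 4096   # three blocks: A, A, B
refs1 = dedupe(v1)
v2 = b"A" * 8192 + b"C" * 4096   # only the last block changed
refs2 = dedupe(v2)

print(len(index))  # 3 unique blocks stored, not 6
```

Note how updating only the last block of `v2` adds a single new block to the index; the unchanged blocks deduplicate against the first version, which is exactly the efficiency gain described above.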

Hash collisions are a potential problem with deduplication. When a piece of data receives a hash number, that number is compared with the index of existing hash numbers. If the hash number is already in the index, the piece of data is considered a duplicate and does not need to be stored again. Otherwise, the new hash number is added to the index and the new data is stored. In rare cases, the hash algorithm may produce the same hash number for two different chunks of data. When a hash collision occurs, the system won't store the new data because it sees that its hash number already exists in the index. This is called a false positive, and it can result in data loss. Some vendors combine hash algorithms to reduce the possibility of a hash collision; some also examine metadata to identify data and prevent collisions.
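One common safeguard against the false positive described above can be sketched as follows (a hypothetical illustration, not a specific vendor's implementation): on a hash match, compare the stored block byte-for-byte before discarding the new data, so a genuine collision is caught rather than silently losing data.

```python
import hashlib

index = {}  # block hash -> stored block bytes

def store_block(block: bytes) -> str:
    """Store a block unless an identical one already exists."""
    h = hashlib.sha256(block).hexdigest()
    if h in index:
        # Guard against a false positive: verify byte-for-byte before
        # treating the block as a duplicate. On a real collision the
        # contents would differ even though the hashes match.
        if index[h] != block:
            raise RuntimeError("hash collision detected; do not discard data")
        return h          # genuine duplicate, nothing new to store
    index[h] = block
    return h

store_block(b"hello")
store_block(b"hello")     # duplicate: verified, not stored twice
store_block(b"world")
print(len(index))         # 2 unique blocks
```

The byte comparison costs an extra read of the stored block, which is the usual trade-off against the (already astronomically small with SHA-256) risk of silent data loss.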

In actual practice, data deduplication is often used in conjunction with other forms of data
reduction such as conventional compression and delta differencing. Taken together, these
three techniques can be very effective at optimizing the use of storage space.
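Combining deduplication with conventional compression, as described above, can be sketched like this (an illustrative pairing, not a particular product's pipeline): unique blocks are deduplicated first, and only the unique copies are compressed.

```python
import hashlib
import zlib

index = {}  # block hash -> compressed unique block

def store(block: bytes) -> str:
    """Deduplicate, then compress only the unique copy."""
    h = hashlib.sha256(block).hexdigest()
    if h not in index:
        index[h] = zlib.compress(block)   # compress only unique data
    return h

def load(h: str) -> bytes:
    """Reconstruct a block from its pointer."""
    return zlib.decompress(index[h])

block = b"repetitive data " * 256          # 4 KB of redundant content
h = store(block)
store(block)                               # duplicate: no extra storage

print(len(index))                          # 1 unique block kept
print(len(index[h]) < len(block))          # True: compression shrank it further
```

Deduplication removes redundancy *across* blocks, while compression removes redundancy *within* each stored block, which is why the two techniques stack well.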
