Searchstorage Techtarget Com Definition Data-Deduplication
Data Deduplication (Intelligent Compression or Single-Instance Storage) Definition
Posted by Margaret Rouse
WhatIs.com
Contributor(s): Stephen J. Bigelow, Jeff Hawkins
Data deduplication offers other benefits. Lower storage space requirements save money on disk
expenditures. More efficient use of disk space also allows longer disk retention periods, so
older data can still be restored quickly from disk (better recovery time objectives, or RTO)
and the need for tape backups is reduced. Data deduplication also reduces the amount of data
that must be sent across a WAN for remote backups, replication, and disaster recovery.
Data deduplication generally operates at either the file level or the block level. File
deduplication eliminates duplicate files (as in the example above), but it is not a very
efficient means of deduplication: if even one byte of a file changes, the whole file is treated
as new and stored again. Block deduplication looks within a file and saves each unique block
only once. Each block (or chunk) of data is processed with a hash algorithm such as MD5 or
SHA-1, which generates a number (the hash) that is, for practical purposes, unique to that
piece; the hash is then stored in an index. If a file is updated, only the changed data is
saved. That is, if only a few bytes of a document or presentation change, only the changed
blocks are saved; the changes don't constitute an entirely new file. This behavior makes block
deduplication far more efficient. However, block deduplication takes more processing power and
uses a much larger index to track the individual pieces.
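The difference between the two approaches can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not any vendor's implementation: the function names, the in-memory dictionaries standing in for disk and index, and the fixed 4 KiB block size are all hypothetical, and SHA-1 is used simply because it is one of the algorithms named above.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products choose their own


def file_level_put(store: dict, data: bytes) -> int:
    """File-level dedup: key the whole file by its hash. Returns bytes written."""
    digest = hashlib.sha1(data).digest()
    if digest in store:
        return 0  # identical file already stored; only a pointer is needed
    store[digest] = data
    return len(data)


def block_level_put(index: dict, data: bytes):
    """Block-level dedup: hash each block. Returns (recipe, bytes written)."""
    recipe, written = [], 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha1(block).digest()
        if digest not in index:
            index[digest] = block  # unique block: store it once
            written += len(block)
        recipe.append(digest)  # the file is rebuilt from this list of digests
    return recipe, written


def block_level_get(index: dict, recipe) -> bytes:
    # Reassemble a file by following its recipe of block digests.
    return b"".join(index[d] for d in recipe)


# A hypothetical 40 KiB "presentation" made of ten distinct 4 KiB blocks.
v1 = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))
v2 = bytearray(v1)
v2[5 * BLOCK_SIZE + 100:5 * BLOCK_SIZE + 104] = b"EDIT"  # overwrite 4 bytes in block 5
v2 = bytes(v2)

files, blocks = {}, {}
print(file_level_put(files, v1), file_level_put(files, v2))  # 40960 40960
_, w1 = block_level_put(blocks, v1)
r2, w2 = block_level_put(blocks, v2)
print(w1, w2)                             # 40960 4096
print(block_level_get(blocks, r2) == v2)  # True
```

Changing four bytes forces the file-level store to keep a second full copy, while the block-level store writes only the one block that changed. Because the blocks in this sketch are fixed-size, inserting bytes (rather than overwriting them) would shift every later block boundary and defeat the matching; this is why many systems use variable-size, content-defined chunking instead.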
Hash collisions are a potential problem with deduplication. When a piece of data receives a
hash number, that number is compared with the index of existing hash numbers. If the hash
number is already in the index, the piece of data is considered a duplicate and does not need
to be stored again. Otherwise, the new hash number is added to the index and the new data is
stored. In rare cases, the hash algorithm may produce the same hash number for two different
chunks of data. When such a hash collision occurs, the system won't store the new data because
it sees that its hash number already exists in the index. This is called a false positive, and
it can result in data loss. Some vendors combine hash algorithms to reduce the possibility of a
hash collision. Some vendors are also examining metadata to identify data and prevent
collisions.
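The false-positive scenario can be made concrete with a small Python sketch. It deliberately uses a toy hash (the sum of the bytes modulo 256) that collides easily, then repeats the run with MD5 and SHA-1 digests concatenated, as one assumed way of combining hash algorithms; vendors' actual schemes are not described here and may differ. All function names and data are hypothetical, and real MD5 or SHA-1 collisions are far rarer in ordinary data than this toy makes them appear.

```python
import hashlib


def weak_hash(chunk: bytes) -> int:
    # Deliberately poor toy hash (byte sum mod 256) so a collision is easy to trigger.
    return sum(chunk) % 256


def combined_hash(chunk: bytes) -> bytes:
    # Concatenate MD5 and SHA-1 digests: a false positive now requires both to collide.
    return hashlib.md5(chunk).digest() + hashlib.sha1(chunk).digest()


def dedup(chunks, key_fn):
    """Index chunks by key_fn; a key already in the index is treated as a duplicate."""
    index, recipe = {}, []
    for chunk in chunks:
        key = key_fn(chunk)
        if key not in index:
            index[key] = chunk  # new hash: store the data
        recipe.append(key)      # every chunk is recorded as a reference to its key
    return index, recipe


def restore(index, recipe):
    return [index[key] for key in recipe]


chunks = [b"ab", b"ba"]  # different data with the same byte sum (195)
weak_index, weak_recipe = dedup(chunks, weak_hash)
print(restore(weak_index, weak_recipe))      # [b'ab', b'ab']: b"ba" was silently lost
strong_index, strong_recipe = dedup(chunks, combined_hash)
print(restore(strong_index, strong_recipe))  # [b'ab', b'ba']
```

With the weak hash, the second chunk is judged a duplicate and never stored, so restoring it returns the wrong data; with the combined digest, both chunks are kept.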
In practice, data deduplication is often used in conjunction with other forms of data
reduction, such as conventional compression and delta differencing. Taken together, these
three techniques can be very effective at optimizing the use of storage space.
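A minimal Python sketch of deduplication followed by conventional compression is shown here; delta differencing is not shown. It assumes fixed 4 KiB chunks, SHA-1 chunk hashes, and zlib compression of each unique chunk. The function names and the synthetic log data are illustrative only, not a description of any product.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size


def reduce_data(data: bytes):
    """Deduplicate fixed-size chunks, then zlib-compress each unique chunk."""
    unique = {}  # SHA-1 digest -> compressed chunk (stored once)
    recipe = []  # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha1(chunk).digest()
        if digest not in unique:
            unique[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return unique, recipe


def rebuild(unique, recipe) -> bytes:
    # Decompress each referenced chunk and concatenate in order.
    return b"".join(zlib.decompress(unique[d]) for d in recipe)


# Hypothetical backup stream: the same 16 KiB of log text repeated 8 times.
log = b"".join(b"INFO request %05d served\n" % i for i in range(700))[:16384]
stream = log * 8
unique, recipe = reduce_data(stream)
stored = sum(len(c) for c in unique.values())
print(len(stream), len(unique), stored)  # original bytes, unique chunks, bytes actually stored
```

Deduplication removes the seven repeated copies of the log (32 chunks reduce to 4 unique ones), and compression then shrinks the four chunks that remain.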