
Data Protection and Recovery in the Small and Mid-sized Business (SMB)

An Outlook Report from Storage Strategies NOW

By Deni Connor, Patrick H. Corrigan and James E. Bagley
Intern: Emily Hernandez
October 11, 2010

Storage Strategies NOW 8815 Mountain Path Circle Austin, Texas 78759

Note: The information and recommendations made by Storage Strategies NOW, Inc. are based upon public information and sources and may also include personal opinions, both of Storage Strategies NOW and others, all of which we believe are accurate and reliable. Because market conditions change and are not within our control, the information and recommendations are made without warranty of any kind. All product names used and mentioned herein are the trademarks of their respective owners. Storage Strategies NOW, Inc. assumes no responsibility or liability for any damages whatsoever (including incidental, consequential or otherwise) caused by your use of, or reliance upon, the information and recommendations presented herein, nor for any inadvertent errors which may appear in this document. This report is purchased by Nexsan, which understands and agrees that the report is furnished solely for its use and may be distributed in whole to partners, prospects and customers. Copyright 2010. All rights reserved. Storage Strategies NOW, Inc.

Sponsored By

Table of Contents

Sponsored By
Introduction
The Small and Medium Business Market
    Size of Market by Revenue and IT Spending
    Importance of Data Protection and Business Continuity Software
    SMB Unique Requirements
    Growing Data Retention Demands
    Technology Availability
    US SMB Businesses and Revenue by Size (SBA, 2007 data)
    The North American Industrial Classification System (NAICS)
    How To Reach the SMB Market
    SMB Sectors Requiring Large Amounts of Data
        Energy exploration and operations for oil and natural gas
        Mining operations other than oil and gas
        Motion picture and video production
        Data processing, hosting and related services
        Software publishers
        The financial industry
        Legal services
        Accounting, tax preparation, bookkeeping and payroll services
        Architectural, engineering and related services
        Computer systems design and related services
        Research and development in physics, engineering and life sciences
        Healthcare
    Managed Service Providers (MSPs)
Data Protection Technologies
    Backup to Tape
    Virtual Tape Library (VTL)
    Tape vs. Disk: What to Choose and Why
        Disk-to-Disk-to-Tape (D2D2T)
        Tape isn't dead, but the mission for tape has changed
        Tape vs. Disk - Reliability
        Tape vs. Disk - Performance
        Tape vs. Disk - Management
        Tape vs. Disk - Availability
        Tape vs. Disk - Power Efficiency
        Conclusion
    Backup to Removable Disk
    On-Line Backup
        Online Backup Issues
    Methods for Backing up Data
    File Synchronization
    Remote Data Replication
    Images, Clones and Snapshot Images
    Continuous Data Protection and Near Continuous Data Protection
    Agent vs. Agentless Backup
    Windows Volume Shadow Copy Service (VSS)
    Encryption and Password Protection of Backup Media
        Tape Drive-based Encryption
        Encryption Issues
        Backup Data Compression
    Data Deduplication
        File Mode and Block Mode
        In-Line or Post-Processing Deduplication
        Backup Performance
        Restore Performance
        Power and Cooling Considerations
        ECO-Matters
        Source or Target Deduplication
        The Downsides of Data Deduplication
    Application-Specific Backup
    Virtual Machine (VM) Backup
    Backing Up Virtual Machines
    Hypervisor-specific Backup Methods
        Microsoft Hyper-V
        KVM, VirtualBox, Xen, XenServer and Others
Tips and Best Practices for Effective Backups
Use case profile
    Customer Name: Clark Enersen
    Vendor Name: Nexsan
Table 1.1 Vendor/Product Name

Introduction
The data protection and recovery space is exploding as more businesses recognize that protecting their assets -- their information -- is key to business survival. Until the last few years, small and mid-sized businesses were a market underserved by data protection software, appliances and online backup services. Yet these organizations have the same need as large enterprises to protect their data. Unlike large enterprises, though, SMBs face a number of unique challenges. Providing full-time dedicated IT resources may be beyond their means, and paying for that IT help and for the software to manage their data can quickly overwhelm them. They often turn to managed service providers or value-added resellers to manage their infrastructures or to supplement the IT skills they have. There are now many software packages, appliances, target arrays (with integrated snapshot and replication capabilities) and services available to SMB customers that provide data protection and recovery. This survey addresses most of them.

The Small and Medium Business Market


For this survey we analyzed companies with at least one paid employee but fewer than one thousand employees. In the United States alone, there are approximately 5.75 million firms in this category and only about 13,000 firms with one thousand or more employees. Worldwide, SSG-NOW estimates there are more than eight million firms in this category. In addition, many governmental units, whether departments of larger organizations or typical municipalities, have IT requirements similar to those of the SMB. The SMB market can't be characterized solely by the number of employees an organization has. We talked to many SMBs that, while they have few employees, have storage capacities under management that may surprise the casual observer. Their storage consumption varies widely, from 500GB at the low end to 100TB at the high end. In some instances, such as video post-production, the data can grow into the petabyte range in the making of a single movie. SMB data is growing at a pace that doubles capacity requirements roughly every 18 months.

Size of Market by Revenue and IT Spending


All US companies had revenue of about $30 trillion in 2007. SMBs accounted for $13 trillion in revenue that year, which is the most recent data set available. Despite the effects of global recession, worldwide IT spending by SMBs was about $575 billion in 2009 and is estimated to grow to $630 billion by 2014.

Importance of Data Protection and Business Continuity Software


Data protection has become the highest priority for IT spending in the SMB, according to surveys conducted in 2010. This represents a shift, as SMB executives realize how computer-centric their organizations have become. Until recently, data protection was viewed as expensive insurance against events that could not easily be predicted, and the costs of data loss were unknown. But organizations of all sizes now realize that loss of access to data directly affects their ability to operate. The recommended allocation of IT budget to this critical function ranges from 5% to 10%, depending on the type of business. Worldwide, SMB spending on data protection is estimated at $30 billion to $60 billion in 2010.

SMB Unique Requirements


SMBs have unique requirements and challenges compared to their larger counterparts. First, IT resources are minimal; IT duties are often performed by the proprietor or by staff members who have many other responsibilities in the firm. We talked to one SMB whose administrator was also responsible for human resources and finance, as well as a myriad of other miscellaneous duties. Further, while infrastructure is often limited to a number of desktop or laptop clients and perhaps a few dozen servers, the technology available to these organizations is second to none, and adopting new equipment, often at lower cost and better performance, is usually easier than in large organizations, which are less nimble because they must move technology forward en masse. Purchasing decisions are likely to be quicker because fewer people are involved. And one data loss experience is usually enough to justify acquiring data protection and business continuity products. With virtually every business endeavor reliant on technology, data loss experiences happen at an ever increasing rate.

Growing Data Retention Demands


One thing SMBs have in common with their larger counterparts is the explosive growth in data storage requirements. Certain business segments have higher capacity requirements, for example, healthcare providers, law firms and financial organizations. But all organizations have data retention requirements for accounting information and increasing governmental reporting demands. The data must be retained and available for long periods, often, as in the case of electronic health records, forever.

Technology Availability
Changing technology can be rapidly adopted by SMBs. Simple tape backup systems, the norm a decade ago, are being replaced by low-cost drive arrays, and even small organizations are adopting virtualization, replication, mirroring and deduplication technologies. Low-cost bandwidth supplied by cable and telecommunication providers allows the delivery of cloud-based data protection services to home and small offices. Local data storage appliances that automatically back up to cloud storage are becoming widely deployed as organizations realize that offsite data retention can be transparent and automatic to their operations, rather than an expensive, problem-prone effort.

US SMB Businesses and Revenue by Size (SBA, 2007 data)


Size      Firms        Establishments   Employees      Revenue (x1000)
Total     6,049,655    7,705,018        120,604,265    $29,746,741,904
0-4       3,705,275    3,710,700        6,139,463      $1,434,680,823
5-9       1,060,250    1,073,875        6,974,591      $1,144,930,232
10-14     425,914      444,721          4,981,758      $791,709,665
15-19     218,928      237,689          3,674,424      $603,788,766
20-24     134,254      152,547          2,928,296      $489,530,870
25-29     89,643       106,623          2,405,637      $402,007,359
30-34     64,753       81,086           2,063,987      $364,392,992
35-39     47,641       62,878           1,754,582      $304,339,758
40-44     38,221       51,847           1,600,913      $293,476,569
45-49     29,705       43,325           1,391,754      $249,407,544
50-74     86,364       139,864          5,195,105      $979,545,562
75-99     41,810       85,215           3,582,686      $710,220,323
100-149   39,316       102,135          4,749,055      $967,245,234
150-199   18,620       66,602           3,205,201      $674,337,913
200-299   17,780       87,923           4,309,143      $897,848,746
300-399   8,155        55,515           2,808,347      $595,711,397
400-499   4,715        43,678           2,101,982      $476,906,931
500-749   6,094        71,702           3,695,682      $800,475,934
750-999   2,970        45,990           2,561,972      $636,199,229
0-999     6,040,408    6,663,915        66,124,578     $12,816,755,847

The North American Industrial Classification System (NAICS)


The NAICS is maintained by the Census Bureau as a way to classify businesses into sectors. The following are the major classifications, with subsectors defined under each. Sectors shown in bold are included in this report as containing the SMBs with the largest amount of data per employee.

Sector   Description
11       Forestry, Fishing, Hunting and Agriculture Support
21       Mining
22       Utilities
23       Construction
31-33    Manufacturing
42       Wholesale Trade
44-45    Retail Trade
48-49    Transportation and Warehousing
51       Information
52       Finance and Insurance
53       Real Estate and Rental and Leasing
54       Professional, Scientific and Technical Services
55       Management of Companies and Enterprises
56       Administrative and Support and Waste Management and Remediation Services
61       Educational Services
62       Health Care and Social Assistance
71       Arts, Entertainment and Recreation
72       Accommodation and Food Services
81       Other Services (except Public Administration)
99       Unclassified

How To Reach the SMB Market


With upwards of $60 billion in annual data protection spending, many technology companies are specifically targeting SMBs with data protection and business continuity products. Because of the sheer number of organizations, marketing to millions of SMBs is distinctly different from targeting the Global 5000 enterprises. Mass-media marketing by software-as-a-service providers MozyPro and Carbonite is one example. Much of the delivery of technology to this market is performed by managed service providers (MSPs) -- effectively, outsourced IT groups. Data protection software is often bundled with storage systems, ranging from small network attached storage (NAS) filers to large, purpose-built systems with all of the advanced features used by large enterprises. How do these systems apply to SMBs? The answer is simple. While the SMB market is defined as fewer than 1,000 employees, the amount of storage that needs protection varies wildly based on the type of business. As a result, vertical marketing, along with channel partner recruitment, is a critical factor in reaching the gold in the SMB market. A small radiology practice may have only a dozen employees but will often be managing terabytes of data that doubles every 18 months and must be retained forever.
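That doubling rate compounds quickly. As a rough sketch (our illustration, not from the report; the 5TB starting point is a hypothetical figure for a small practice), capacity at an 18-month doubling rate follows C(t) = C0 * 2^(t/18):

```python
# Illustrative projection of storage growth at the report's quoted rate:
# capacity doubles every 18 months, i.e. C(t) = C0 * 2**(t / 18).
def projected_capacity_tb(initial_tb: float, months: float,
                          doubling_months: float = 18.0) -> float:
    return initial_tb * 2 ** (months / doubling_months)

# A small radiology practice starting at 5 TB (hypothetical figure):
for years in (1, 3, 5):
    print(years, round(projected_capacity_tb(5.0, years * 12), 1), "TB")
# -> roughly 7.9 TB after 1 year, 20.0 TB after 3, 50.4 TB after 5
```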

SMB Sectors Requiring Large Amounts of Data


As we focus our research on businesses with fewer than 1,000 employees, we find little correlation between the amount of storage being protected and the number of employees. The sector a business serves is a far better predictor than its headcount. Small business segments with large amounts of data include energy exploration and extraction, engineering, healthcare practices, law firms and motion picture/video production, to mention a few.

Energy exploration and operations for oil and natural gas

6,430 firms are involved in oil and natural gas extraction (NAICS 211111). 1,950 firms perform oil and natural gas drilling (NAICS 213111). 6,880 firms are involved in support services for oil and gas operations (NAICS 213112). These SMBs generate and consume large amounts of data in the analysis of geologic information, engineering and equipment design data, well design and mapping, production volumes and the management of flows and depletion. Regulations require significant data retention periods, in many cases permanent retention, as well as continuous reporting to various governmental entities for safety, revenue, environmental and mapping purposes.

Mining operations other than oil and gas

4,440 firms are involved in mineral extraction other than oil and gas (NAICS 212). Another 680 firms are involved in support of mining operations (NAICS 213113, 213114, 213115). These SMBs have similar data set analysis, retention and reporting requirements to the oil and gas industries.

Motion picture and video production

Motion picture and video production, perhaps surprisingly, is dominated by SMBs: some 12,300 firms have fewer than 1,000 employees, while only 45 firms employ more than 1,000 (NAICS 51211). Another 2,015 firms are involved in post-production (NAICS 51219), which has the highest storage requirement per employee because these firms edit the full content. The advent of high-definition video and three-dimensional productions has multiplied the amount of storage required during production and post-production. Additionally, the cutover from analog to nearly 100% digital content has hit this sector like a tidal wave.

Data processing, hosting and related services


7,280 firms in this sector are SMBs (NAICS 5182). These include the managed service providers that, in many cases, serve as the IT departments for the majority of SMBs. These organizations are also critical to the data protection market as resellers, recommenders or providers of data protection equipment and services. A growing number of cloud-based storage services are accessed through this sector.

Software publishers

6,000 software publishers (NAICS 5112) with fewer than 1,000 employees have special data protection requirements, including revision control and maintenance of large test-case databases.

The financial industry

The financial industry, while including many huge banking institutions, is made up of many thousands of SMBs. Due to transactional data protection and retention requirements, these organizations all have very large data sets that must be protected from loss, corruption and theft.

Credit intermediation and related services

This large segment includes all banking, savings and lending services. Approximately 67,500 firms fall into the SMB category (NAICS 522). Transactional data is critical to all of these companies; no loss of data is acceptable in this area. Inability to access transactional data may be tolerable for brief periods, but loss of data is fatal. In addition, all data must be retained for indefinite periods, and most firms keep all transactional data permanently. Support data such as e-mails and other communications are subject to regulatory immutability -- that is, any communication must be maintained in case of inquiry or litigation. Finally, all documentation related to lending has come under additional regulation during the last year. This creates a tremendous data protection requirement for all firms in this sector.

Securities intermediation and related services

54,500 firms involved in the brokerage of stocks, bonds and other financial instruments are included in this sector (NAICS 523). Since the Sarbanes-Oxley Act of the early 2000s, and recent enactments under extended financial regulation, transactions and communications are under increasingly strict regulatory control. Additional scrutiny extends to executive compensation and communications regarding all securities transactions. This creates additional stress in this sector in terms of data protection and retention.

Insurance carriers and related services

67,000 SMBs are involved in the insurance business (NAICS 524). From simple insurance agencies to large-scale fiduciary activity, the sector has come under significant new reporting and financial allocation requirements. New regulations in health insurance create additional reporting and data management requirements. Unless significant changes to existing legislation occur, this segment will be adding storage and protection at levels an order of magnitude above prior periods. Large data sets also include research functions and actuarial data.

Funds, trusts and other financial vehicles

2,100 SMBs are involved in the management of trusts, mutual funds and other financial instruments (NAICS 525). These firms have similar transactional, data retention and regulatory reporting requirements.

Legal services

185,000 SMBs are involved in legal services (NAICS 5411). Data protection and security are extremely critical in law firms. Extensive access to online research has replaced the traditional law library. Recent enhancements to tool sets related to electronic discovery and immutable archiving have increased the amount of data managed by law firms.

Accounting, tax preparation, bookkeeping and payroll services

106,200 SMBs are involved in accounting and payroll services (NAICS 5412). Data protection is critical to these organizations that have regulatory requirements for long-term record retention.


Architectural, engineering and related services

100,000 SMBs are involved in architectural and civil engineering (NAICS 5413). Permanent data retention and fast access to the data is critical in this field. Often multiple offices of these firms require simultaneous access to this information. Computer aided design (CAD) data represents very large data sets with access and revision control as major application requirements.

Computer systems design and related services

99,600 SMBs are involved in computer systems design and services (NAICS 5415). These firms have extensive data requirements for CAD files, computer program source files, test and simulation data and development support data.

Research and development in physics, engineering and life sciences

10,900 SMBs perform research in these areas (NAICS 54171). Huge databases, including genomes, pharmaceuticals and theoretical simulations, are required for basic research. This data is often modified and updated, with associated revision control and results derivatives. Data protection is critical to the field, with many regulatory aspects related to field testing and trial results.

Healthcare
The American Recovery and Reinvestment Act of 2009 (ARRA), often referred to as the stimulus plan, created a fund in excess of $35 billion to fund new technology for healthcare providers of all types. Along with the large source of funding came new requirements for data retention and for securing personal data against breach. The Affordable Care Act of 2010 added a number of new regulations that directly affect information technology in this sector. New diagnostic equipment generates huge data sets that must be retained within electronic health records (EHR) permanently. The net result is an exponential increase in the amount of data that must be retained and protected.

Offices of physicians

190,500 SMBs make up the vast majority of physician practices in the US (NAICS 6211). These firms are under the same regulations, incentives and data retention requirements as the hospital system, generally without the benefit of information technology employees. In addition to an array of specialized medical equipment that generates large amounts of data, physicians are becoming more computer-centric in all areas, including EHR, billing and even prescription writing. A major requirement of EHR compliance under the ARRA is computerized prescription order entry (CPOE), which will require automation far beyond the simple scribbling of a prescription onto a piece of paper and sending it off with the patient.

Outpatient care centers other than family planning and substance abuse

Some 8,200 firms are involved in this sector (NAICS 62149), which includes surgical, HMO, dialysis and emergency care centers.

Medical and diagnostic laboratories

7,500 medical and diagnostic laboratories are SMBs (NAICS 6215). These organizations generate huge data sets and are subject to the same regulatory considerations as all organizations involved in EHR generation.

Acute care hospitals

We do not consider the 4,000 acute care hospitals to be in the SMB market (NAICS 622). While some individual hospitals may fall into the category, even small hospitals are generally managed by larger organizations with IT staffing and centralized support.


Managed Service Providers (MSPs)


Managed Service Providers are major resources for IT support to SMBs. Ranging from a few employees to large regional and national entities, MSPs provide hardware and software recommendations, resell equipment and software, and often provide hosted data center support. They are a major channel for data protection services to the SMB market.


Data Protection Technologies


Backup to Tape
The traditional method of system backup has been file-by-file backup to tape, typically using a tape rotation scheme that provides backups at different points in time. Tape has suffered from a number of problems:

- A multiplicity of tape and data formats. Numerous tape formats of varying capacities have been and are being used. These formats are incompatible with each other, so moving from one tape technology to another over time, or from one tape library to another, can be an expensive process. In addition, backup software vendors have often used their own proprietary logical data formats (similar to the way different word processors, such as Word and WordPerfect, use different formats), which further compounds the problem. In both cases you must have the same type of tape drive, and often the same software, to restore a tape. The recent development of the Linear Tape File System (LTFS), a standardized file system for LTO-5 tape, should help alleviate the compatibility issues to some degree, assuming the standard is widely accepted.

- Reliability issues. Over the years tape has suffered from reliability issues, both with drives and media. It is not uncommon for a tape drive to require repair or replacement within three years. Although newer tape technologies, such as LTO, have improved tape reliability, media issues are still all too common. In addition, tape drive read/write heads must be cleaned on a regular basis to maintain reliability. Tape libraries add mechanical components that can fail as well and require replacement.

- Cost. The cost per megabyte of tape media has dropped considerably over the years, but it has not kept pace with the drop in the cost of disk media. In addition, the cost of the tape drive itself is relatively high. Internet pricing on LTO-4 drives (800GB native capacity) is about $2,500-$4,000, while LTO-5 drives (1.5TB native capacity) sell for about $3,500-$5,000.

- Performance. Although tape drive performance and tape capacity have both increased significantly in recent years, the amount of data most organizations need to back up has increased dramatically as well. Even with faster backups, many organizations cannot perform full backups to tape in their available backup window without using multiple tape drives and multiple backup servers, further increasing the cost and complexity of tape backup.

- Recovery. Recovering data from tape can be time-consuming. For effective recovery, tapes must be labeled properly and the backup system must maintain a catalog of tapes in a database. If the database is lost or corrupted, tapes must be re-cataloged, which can itself be a very time-consuming process. Since most tape rotation schemes include some offsite storage of tapes, if the data that needs to be recovered is on a tape stored off site, that tape must be retrieved before the data can be recovered. And since tape is a linear format, accessing and restoring a file or files usually takes significantly longer than restoring the same file or files from disk. Automated tape libraries and bar coding of tapes can alleviate some of these issues, but automated libraries add mechanical and electronic components that can fail, so in some circumstances they create additional problems.

Note: Linear Tape Open (LTO) is a tape format created by the LTO Consortium, which was initiated by Seagate, HP and IBM. LTO is an open standard created in the late 1990s as an alternative to the numerous proprietary tape formats then in existence. LTO-5 is the latest incarnation of the standard; LTO-5 tape cartridges have a native capacity of 1.5TB. Linear Tape File System (LTFS) is a standardized file system for LTO-5 and above; data written in LTFS format can be used independently of any particular storage application. Since LTO is an open standard, LTO drives and media are available from many manufacturers.

In spite of these issues many SMBs still use tape backup. In some organizations it is the only backup method employed, while in others it is used in addition to or in conjunction with another backup method.
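To put the backup-window problem in concrete terms, the sketch below estimates how many tape drives a full backup would need to finish inside a given window. It is an illustration, not part of the report; the throughput figures are the published native (uncompressed) rates for LTO-4 (120 MB/s) and LTO-5 (140 MB/s), and real-world rates vary with compression and streaming behavior.

```python
import math

# Rough backup-window estimate: how many tape drives are needed to move
# `data_tb` terabytes within `window_hours`, assuming each drive streams
# at its native rate with no compression and no interleaving overhead.
def drives_needed(data_tb: float, window_hours: float,
                  drive_mb_per_s: float) -> int:
    total_mb = data_tb * 1_000_000          # 1 TB = 1,000,000 MB (decimal)
    mb_per_drive = drive_mb_per_s * window_hours * 3600
    return math.ceil(total_mb / mb_per_drive)

# 20 TB full backup in an 8-hour window:
print(drives_needed(20, 8, 120))   # LTO-4 at 120 MB/s -> 6 drives
print(drives_needed(20, 8, 140))   # LTO-5 at 140 MB/s -> 5 drives
```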

Virtual Tape Library (VTL)


Virtual Tape Libraries solve one of the major problems of tape: the difficulty of completing a backup within the available backup window. A VTL appears to the system to which it is connected as a tape library with multiple tapes. This means an organization can use its existing legacy tape backup software to back up to a much faster disk-based system. With a VTL, the virtual tapes are stored on the system for a period of time to allow file restorations, if necessary. Sophisticated VTLs can also export data to tape for archiving purposes. Vendors of VTLs include IBM, SEPATON, Quantum, FalconStor Software, Data Domain, Overland Storage and Hitachi Data Systems. Employing a VTL might make sense for SMBs trying to extend the life of their existing backup software, but a disk-to-disk-to-tape approach (see below) probably makes more sense if software is being upgraded or already supports disk-to-disk-to-tape. (Note that a VTL is itself a form of disk-to-disk-to-tape, but it is usually not recognized as such by backup software, which sees the VTL as an actual tape library.)

Tape vs. Disk: What to Choose and Why

Disk-to-Disk-to-Tape (D2D2T)

Tape isn't dead, but the mission for tape has changed


The continual announcements of the death of tape are certainly premature, though they mark the beginning of the inevitable. Storage technologies do die, as witnessed by the demise of everything from Hollerith cards, paper tape and floppy disks to round-reel tape, to name a few. They all shared a common fate: each was superseded by an emerging technology that stored data more reliably at better cost and performance. Cartridge tape replaced round reel, but does disk threaten the future of tape? The answer is yes in some areas and no in others, at least for the moment.

When looking at the benefit comparison, some IT professionals who choose tape for their backup environment cite reasons such as tape still being fast enough to meet their window, or their organization being able to tolerate extended periods of downtime while waiting on a restore. The most common justification for deploying tape over disk, however, is that tape is simply cheaper. With the performance, management and reliability benefits clearly belonging to disk, the outstanding issue seems to be a perceived cost issue. In a direct comparison of media costs, it is true that the cost-per-byte is slightly cheaper for a tape cartridge than for an equivalent disk. But when you consider the larger costs of media upgrades, new transports, management and remastering, the overall costs will likely favor the new generations of dense disk arrays using MAID energy savings.

For some IT professionals, that's where they draw the line and make a decision: for them, the race is decided by the cost of media. The problem is that the race really isn't about the cost of media; it's about the associated cost of several other factors: downtime, reliability, management, availability, data growth and the cost of the backup system itself. In other words, it's about the big picture.

It should be noted that some IT professionals have sidestepped the whole tape vs. disk dilemma and implemented tiered solutions that use both in concert, otherwise known as Disk-to-Disk-to-Tape (D2D2T). With this approach, IT professionals write backups directly to a disk array, where the data remains for 90 days before being passed on to tape for deep archiving and off-site portability. Organizations thereby leverage the many benefits of online disk storage while maintaining the portability and long-term retention they are used to receiving with tape.

Since the cost comparison between tape and disk is far more complex than the media itself, this section outlines the considerations necessary for a more complete understanding of the benefits and true costs of tape versus disk, to help IT professionals choose and justify their backup environment. Whereas the decision was controversial just a few short years ago, it has never been clearer or easier to understand than now, with new technologies, capacities and market prices.

Tape vs. Disk - Reliability


The primary benefit of tape is to offer adequate data protection at low cost. Analysts estimate that one in ten recovery images on tape is unrecoverable. Data up to one year old has a 10-15% failure rate, and the failure rate of data five or more years old is 40-45%. Other studies have revealed that much of this goes unnoticed: Storage Magazine reported that 34% of companies that back up their data to tape never test their backups, and that 77% of the companies that did test their backups found restore failures. Boston Computing Network's Data Loss Statistics found that seven out of ten small firms that experience a major data loss are out of business within a year. Paradoxically, all of this risk assumes you have completed a backup and have the option of a restore. Backups to tape are frequently not completed within the defined backup window, and if you have no backup, you have no option to restore.

If a 10% failure rate with tape is a best-case scenario for a data center, that is still 10% more than an organization can afford; the best-case scenario for tape reliability is still a data center's worst operational risk. For this reason, organizations must make multiple copies of every tape backup to increase the reliability of their protection architecture. Although the cost of media might be cheaper for tape than disk on a one-for-one comparison, after one includes the number of copies it takes for tape to achieve acceptable levels of reliability, the cost-per-byte protected far exceeds disk. That's why cost comparisons shouldn't revolve around bytes stored but rather bytes protected. Tape only outperforms disk on outright media costs when organizations accept the associated reliability risk and don't make multiple copies of each tape backup.

Disk, unlike tape, has a multitude of built-in and commonly used reliability and protection mechanisms, such as RAID and automated error checking. There is no such thing as RAID for tape: if one tape in a backup job group fails, the integrity of the whole restore collapses. By utilizing RAID 6, organizations are protected against even extreme circumstances like dual drive failures on the latest large-capacity drives. With tape, there is no built-in redundancy.
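The "bytes protected, not bytes stored" argument reduces to simple probability. Here is a minimal sketch assuming the 10% per-copy failure rate cited above and independent failures between copies (an optimistic assumption, since copies often share handling and storage conditions):

```python
# Probability that at least one of n independent tape copies restores,
# given a per-copy failure rate f. Independence is an assumption; in
# practice shared handling and environment make copies correlated.
def restore_success(n_copies: int, per_copy_failure: float = 0.10) -> float:
    return 1.0 - per_copy_failure ** n_copies

for n in range(1, 5):
    print(f"{n} copies -> {restore_success(n):.4%} chance of a good restore")
# 1 copy -> 90%; 2 copies -> 99%; 3 copies -> 99.9%; 4 copies -> 99.99%
# Media cost scales linearly with n, which is why the cost-per-byte
# *protected* on tape exceeds the raw cost-per-byte stored.
```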

Reliability versus cost is one of the key determinants in the disk-versus-tape decision. Unless a user has the latest high-end tape transport and library targeted at the enterprise, serious reliability issues are to be expected under heavy use, notwithstanding the complexity of tape management and the risk from handling. Additionally, users need many transports and libraries to stand a chance of getting nightly backups done on time. Then there is the media. Anyone using tape understands that the cost of the media is the big expense. Every couple of years, as new transports are announced, users discover the old media will no longer work, and all of it has to be replaced. The process of replacement is not only expensive, it is also quite disruptive. When taking into consideration the reliability exposures and the necessary retention of multiple copies of every byte, the true cost-per-byte of tape is far higher than the base measurement. And that still doesn't take into consideration what has always been seen as the necessary evil: performance, networks, resource conflicts, scheduling, media management and more.

Using disk as a library offers a flexible, ultra-reliable, high-performance and operationally efficient solution. Features from backup software vendors have made backup to disk the logical choice for simple and flexible backup and recovery. For applications like VMware, Exchange and SharePoint, protection, recoverability and performance are key. While tape is still used, it is rare to see it used exclusively today. The benefits of Disk-to-Disk (D2D) are too great, which is why at least 70% of all backups are written to disk first.

Tape vs. Disk - Performance


For many organizations, the only cost that really matters is the business cost of downtime. The fundamental question to ask when looking at tape or disk is: what are your recovery objectives? After all, it's not really about the backup, it's about the restore. It has to work, and it has to work on time.

The architecture used to gain backup performance on tape has a backlash at restore time. The technique used to achieve acceptable performance levels with tape is called multi-threading: a backup application starts many backup streams (typically around 15), which are interleaved onto a single tape transport. A typical user may need to run 300 threads over the course of a night. While multi-threading allows the best possible backup-to-tape performance, it also guarantees performance problems on restore, and it actually increases the probability of a data loss failure. Here is why. Multi-threading interleaves multiple backup job streams onto one transport to keep up with today's fast transports, but as soon as one of the streams completes (because it has a small file size), performance of the backup degrades. The degradation continues until the transport can no longer receive data fast enough to stay in full streaming mode and drops into a start-stop mode. From a reliability point of view, start-stop mode stresses the media, which can lead to media failures. Beyond start-stop mode, 80 passes are required to completely write an LTO-5 cartridge, which by itself raises media reliability concerns: each pass causes wear on the media and heads.

The impact from a restore point of view is slow performance. To recover an individual file, directory, user or application, the system must read back all blocks across all tape cartridges used for the backup, with 14/15 of the data read back thrown away. This results in seriously slow performance, if the data can be read back at all; a single uncorrectable read error can cost the entire backup. Protection objectives are measured as Recovery Point Objectives (the amount of data at risk) and Recovery Time Objectives (the amount of downtime you can tolerate), and both are a concern with tape. LTO-5 tape can easily take nearly 8 hours to recover 10TB, compared to 2.5 hours for a high-performance disk array used as a protection library. What is the cost of downtime for you?

When using disk as a protection library, the problem is solved. Backup software such as Veritas NetBackup indexes all the data as it is stored directly to disk. Whether there is a need to recover a single sub-object to VMware, an email message to Exchange or a SharePoint document, users can recall them individually with a simple point and click. Due to the random-access nature of the recovery, performance is unparalleled. Many use the NDMP protocol to write directly to disk for easy configuration and management of network-based backups; with NDMP, network congestion is minimized because the data path and control path are separated. With disk used as a protection library, backup can occur locally -- from file servers direct to disk -- while management occurs from a central location. Operation is simple because the backup application indexes directly to disk. The decreased infrastructure complexity makes everything easier and far more operationally efficient. With disk, backup is faster to restore and much easier to manage.
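The restore penalty of interleaving is easy to quantify. The sketch below uses the figures from this section -- roughly 15 interleaved streams per transport and LTO-5's 140 MB/s native read rate -- and ignores load, seek and start-stop penalties, so it understates the real restore time; the 0.5TB job size is a hypothetical example.

```python
# Read amplification of multi-threaded (interleaved) tape backups:
# restoring one of `streams` interleaved jobs requires reading the
# whole interleaved region, discarding (streams - 1)/streams of it.
def restore_read_tb(job_tb: float, streams: int = 15) -> float:
    return job_tb * streams

def restore_hours(job_tb: float, read_mb_per_s: float,
                  streams: int = 15) -> float:
    total_mb = restore_read_tb(job_tb, streams) * 1_000_000
    return total_mb / (read_mb_per_s * 3600)

# Restoring a 0.5 TB job from a 15-way interleaved LTO-5 backup
# (140 MB/s native read, ignoring load/seek/start-stop penalties):
print(f"{restore_read_tb(0.5):.1f} TB read")   # 7.5 TB read for 0.5 TB
print(f"{restore_hours(0.5, 140):.1f} hours")  # ~14.9 hours
```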

Tape vs. Disk - Management


Backup to tape has always been an administrative challenge because of the amount of manual intervention needed: tape backup must be closely supervised, equipment needs regular maintenance, heads have to be cleaned, and tapes must be loaded, replaced, labeled and transported. While multiple tapes and monitoring are required for a single backup to tape, backup to disk is a completely automated procedure -- just set and forget.

Backup to tape typically uses a Grandfather-Father-Son (GFS) retention plan, sketched in code below. The GFS scheme uses daily (Son), weekly (Father) and monthly (Grandfather) backup media sets. Four backup media sets are each labeled for a day of the week. Typically, incremental backups are performed on the Son media, which is reused each week on the day matching its label. The Father media is reused monthly, and the Grandfather media records full backups on the last business day of each month. As a result, each 1TB of primary disk can require up to 25TB of archival tape storage.

The cost to implement, maintain and manage this level of protection can be overwhelming. Here is one example to illustrate the capital expense: an organization backing up 42TB of primary disk would need 1,575 LTO-4 tapes over the course of a year, assuming 80% usage efficiency for each cartridge. At $38 per cartridge, the cost is $59,850. Using GFS, the cost of storing 25 copies of the data rises to $1,496,250. By comparison, the cost of a second 42TB array as a backup target is in the range of $45,000.

A restore of a single user or application can easily require loading and reading 10 to 30 cartridges or more. Finding the right cartridges and having each one of them work without failure is a major concern. The manpower required to manage a tape library far exceeds the manpower needed to manage disk as a protection library. A tape library is typically a serialized resource: backup jobs are scheduled by priority, resources are switched and allocated to a job, and when that job completes, resources are switched again, and the process goes on. The associated monitoring and administration of complex processes places a heavy burden on the IT department and easily leads to operational failures.

Using disk as a protection library allows users to share resources among multiple servers simultaneously, whether on a SAN or through the network by way of iSCSI -- no monitoring, no switching, no hassles. Backup jobs run simultaneously, avoiding tape's requirement to wait for the previous job to complete and resources to be switched before starting a new one. With a disk array, multiple streams can run at the same time. Users can also easily collect or move data offsite over a WAN for geographically protected data.
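The GFS rotation described above maps cleanly to code. A minimal sketch of the media-selection logic (the weekday assignments and the "last business day" simplification are our assumptions for the sketch, not any particular product's implementation):

```python
import calendar
from datetime import date, timedelta

# Simplified Grandfather-Father-Son media selection: monthly fulls on
# the last business day of the month, weekly fulls on Fridays, daily
# incrementals on Mon-Thu media labeled by weekday.
def last_business_day(year: int, month: int) -> date:
    d = date(year, month, calendar.monthrange(year, month)[1])
    while d.weekday() >= 5:          # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d

def media_for(day: date) -> str:
    if day == last_business_day(day.year, day.month):
        return f"Grandfather (monthly full, {day:%B})"
    if day.weekday() == 4:           # Friday
        return "Father (weekly full)"
    return f"Son (incremental, {day:%A})"

for offset in range(7):
    d = date(2010, 10, 25) + timedelta(days=offset)
    if d.weekday() < 5:              # business days only
        print(d, media_for(d))
# Mon-Thu use the matching Son set; Oct 29, 2010 is the month's last
# business day, so it takes the Grandfather full instead of a Father.
```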

With disk used as a protection library, backups are routed through a centralized backup infrastructure; by leveraging deduplication, users can expect up to 20x savings in stored data with significant improvements in backup and restore performance. Where tape requires 1,575 cartridges (times 25 for GFS) to protect 42TB over the course of a year, a deduplicating disk storage system would need only about 2TB. And with backup data reduced to its raw essentials, the data is even more easily transferred over a network to a disk system at a disaster recovery site.
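Pulling the figures from the last two paragraphs together, this sketch reproduces the cost arithmetic. All inputs are the report's own numbers (cartridge count, $38 media price, 25x GFS multiplier, roughly 20x deduplication), not independent measurements:

```python
# Cost comparison using the report's own figures.
CARTRIDGES_PER_YEAR = 1_575    # LTO-4 tapes to cover 42 TB of primary disk
COST_PER_CARTRIDGE = 38        # USD
GFS_COPIES = 25                # Grandfather-Father-Son retention multiplier

single_pass = CARTRIDGES_PER_YEAR * COST_PER_CARTRIDGE
gfs_total = single_pass * GFS_COPIES
print(f"one set of cartridges: ${single_pass:,}")   # $59,850
print(f"with GFS retention:    ${gfs_total:,}")     # $1,496,250

# Versus disk: the report quotes ~$45,000 for a second 42 TB array,
# and ~20x deduplication, shrinking 42 TB of backup data to ~2 TB.
print(f"deduplicated footprint: {42 / 20:.1f} TB")  # 2.1 TB
```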

Tape vs. Disk - Availability


It is well understood that magnetic tape degrades over time, and temperature and humidity have a dramatic impact on shelf life. Ten degrees of temperature change can change the life of a tape by ten years or more. If an administrator loads a cart of tapes and takes them to a non-raised-floor room, there is a great danger that temperature and humidity changes will accelerate the effects of thermal decay, which in turn can destroy data in as little as five years. The Library of Congress and the National Media Lab recommend that, for data having permanent value, storage areas should be kept at a constant 45 to 50 F or colder (do not store magnetic tapes below 46 F, as this may cause lubrication separation from the tape binder) and 20 to 30% relative humidity (RH) for magnetic tapes (open reel and cassette) and 45 to 50% RH for all others. Environmental conditions must not fluctuate more than 5 F or 5% RH over a 24-hour period. Tape should be stored in dark areas except when being accessed, keeping recordings away from UV sources such as unshielded fluorescent tubes and sunlight. (Source: The National Media Lab) Widely fluctuating temperature or RH severely shortens the life span of all tape. This is one of the main reasons tape is only viable for the large enterprise that can afford a library large enough to maintain tape on a raised floor, handled exclusively by a robot.

The design of the cartridge and the transport are critical to tape reliability as well. The enterprise-class transports used today are rated in the 400,000-hour range. A well-managed cartridge (correctly controlled temperature and humidity) that is also stagnant (i.e., a cartridge that has not been used) has a shelf life of around 20 years. But over a 20-year shelf life, at least six generations of transport technology will have come and gone. Without the transport that wrote the cartridge, along with the application software, operating system, computer hardware, operations manuals, ample spare parts and the recorded media itself, data cannot be retrieved. Even with all of those moving parts in harmony and perfect environmental conditions, the chances of getting data back are about 23%. If anything goes wrong with any of the cartridges used for a backup, there is no redundancy, which means the organization is unable to retrieve its data. IT organizations deal with this by re-mastering data onto new transports and new media with every generation change, which is a very expensive process.

The mechanisms for reading and writing tape are far more complicated than disk. With a disk, there is a flat, stable surface that spins without flexing in a hermetically sealed, contaminant-free enclosure. Beyond the disk itself, disk arrays offer complete data redundancy with RAID technology and 99.999% availability with hot-swappable components, redundant controllers, power supplies and so on.

Tape vs. Disk - Power Efficiency


Tape has long been considered the most power-efficient medium, since a cartridge can be stored without power. However, disk has made huge advances in power efficiency with spin-down technology such as Nexsan's AutoMAID, which enables highly cost-efficient long-term data retention by progressively putting disks into deeper sleep modes while offering near-instantaneous response. With advanced power savings, Nexsan disk arrays allow the performance and management simplicity of disk backup with greatly reduced power consumption. With an easy-to-use power configuration manager, the user can create policies for desired power savings after user-defined periods of idle time. When idle thresholds are met, AutoMAID progressively reduces disk drive power consumption. The first I/O request wakes the array back up to full power. Once awake, the array performs at 100% until enough idle time has passed to satisfy the energy savings policy, which places the array into increasingly deep levels of sleep. All of this happens automatically and provides great response performance as well.
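A minimal sketch of how a tiered idle policy of this kind behaves. The level names and timeout thresholds below are illustrative assumptions for the sketch, not Nexsan's actual AutoMAID levels or defaults:

```python
# Illustrative tiered spin-down policy: the longer a disk sits idle,
# the deeper its power-saving state; any I/O returns it to full power.
# Levels and thresholds are hypothetical, not Nexsan's actual settings.
IDLE_POLICY = [  # (idle_minutes, state)
    (10, "heads parked"),
    (30, "reduced rpm"),
    (60, "spun down"),
]

def power_state(idle_minutes: float) -> str:
    state = "full power"
    for threshold, name in IDLE_POLICY:
        if idle_minutes >= threshold:
            state = name
    return state

for idle in (5, 15, 45, 90):
    print(f"idle {idle:3d} min -> {power_state(idle)}")
# idle 5 -> full power; 15 -> heads parked; 45 -> reduced rpm;
# 90 -> spun down. The first I/O request resets the idle clock.
```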

Conclusion
Although the cost-per-byte stored on a single tape cartridge is less than disk, that is an isolated figure that gives a very incomplete look at a much larger picture. Grandfather-Father-Son produces about 25 times more copies on tape than on disk; that alone makes tape much more expensive. The choice is even clearer when adding the cost of labor to manage tape, the risk of data loss and downtime, performance limitations and the inconvenience of data that is offline. Protection, performance, reliability, management and cost all favor disk storage. And with AutoMAID power intelligence, online retention of rarely accessed data is justified. From the early 1950s until the late 1990s, the volume of data made sense for tape technology. But with the explosion of the digital universe, tape can't reasonably sustain the role it once held. For most organizations, that threshold has already been reached: they can't even back up all their data within the necessary window, let alone restore data fast enough to meet business requirements. As the pioneer of disk-to-disk backup, Nexsan was the first to understand and deliver the benefit of low-cost disk for the backup environment. As such, Nexsan has been delivering unparalleled value and leadership to enterprises of every size for over ten years. Small to large, Nexsan has the disk library for all your backup and archiving needs.

Backup to Removable Disk


This approach uses removable disks in a manner similar to tape. One or more backup sets are written to a set of removable disks, which are then periodically rotated using a scheme similar to a tape rotation scheme. With this approach the cost of a tape drive is eliminated and the speed of backup and restore is increased. Hard disks still cost more than tape, however. They are also more susceptible than tape to damage from dropping, and their ability to retain data while sitting on the shelf is still relatively unknown, although a spokesperson for one vendor said the shelf life should be at least five years, and periodic refreshing by powering up and rereading and rewriting the data should extend the data retention period another five years. One vendor of cartridge systems claims thirty years of archival storage. Both tape and disk appear to be susceptible to damage from temperature extremes, but hard disks appear to be less susceptible than tape to damage from high humidity.


There are many ways to mount removable disks:

- External disk drives using USB, FireWire or eSATA interfaces.
- Internal cartridge dock and cartridges, such as the RDX system developed by ProStor Systems. The dock is installed in a 5 1/4" drive bay. Internal RDX docks use a USB or SATA interface.
- External cartridge dock and cartridges, such as RDX. External RDX docks use a USB interface.
- Internal tray-less hot-swap rack. This device allows the swapping of bare SATA drives and requires an available hot-swap SATA port.
- External tray-less hot-swap rack. This typically requires a USB or eSATA port.

The tray-less drives are the least expensive, since the racks for them cost only approximately $20-$75 and you are not paying for a case or cartridge for each drive. They are, however, the most susceptible to damage from dropping and static electricity. The cartridge systems are probably the least susceptible to damage. The damage resistance of standard external drives is difficult to determine and to a great extent depends on the construction of the enclosure.

Online Backup
Increased Internet access speeds, combined with ever-decreasing disk storage costs, have made across-the-Internet backup viable. Known as both online backup and cloud backup, the use of these services has increased dramatically over the last few years. Numerous companies provide online backup services, software and even dedicated backup appliances. Some systems combine online backup with more traditional disk-to-tape or disk-to-disk backup. Some online systems provide for maintaining multiple versions or revisions of files and some do not.

Because of the low transfer speed of online backup compared with disk-to-disk or disk-to-tape, most organizations do not rely on it for primary backup. This is not true in all cases, however. Some backup systems, for example, provide for online mounting of virtual machine images, allowing users to access their server resources while local virtual machines are being rebuilt. Online backup is usually used in conjunction with some method of local backup, and backup systems that provide local backup increasingly provide online backup as well.

Most online services have a fixed base monthly or yearly cost plus data transfer and storage costs. Low-end services can cost as little as $4-5 per month in base fees, while the base cost of some services can run to hundreds of dollars per month. Transfer and storage costs can vary from a low of about $0.15 per gigabyte to $3.00 per gigabyte or more. Some vendors charge for data transfer and some do not. The types of services provided vary as well. For example, some services are strictly backup and restore, while others provide shared remote access and/or remote drive mapping, so that multiple users can access online data as they would from local storage. Some provide remote application support as well. Some backup services compress and deduplicate your data before uploading to reduce network traffic, data transfer costs and storage costs.
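Because pricing combines a base fee with per-gigabyte storage and sometimes transfer charges, total cost is straightforward to model. The sketch below uses the price ranges quoted above; the specific figures are illustrative:

    def annual_cost(base_monthly, stored_gb, per_gb_storage,
                    transferred_gb=0, per_gb_transfer=0.0):
        """Base fee plus storage charges each month, plus transfer charges."""
        monthly = base_monthly + stored_gb * per_gb_storage
        return 12 * monthly + transferred_gb * per_gb_transfer

    # A low-end service: $5/month base, $0.15/GB stored, free transfer.
    print(annual_cost(5.00, 200, 0.15))                # 420.0

    # A premium service: $200/month base, $1.00/GB stored, $0.10/GB moved.
    print(annual_cost(200.00, 200, 1.00, 2400, 0.10))  # 5040.0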


Online Backup Issues

The following issues should be considered when selecting an online backup service:

1. Data transfer rate. When large amounts of data need to be backed up, high-speed Internet connections are required.
2. Security. Most online backup services provide 256-bit SSL connections, but in some cases secure connections are optional. Also, some service providers encrypt your data and some do not.
3. Protection of your data. Some online providers have redundant data sites, while some store all your data in a single location. It is important to know how your provider protects your data.
4. Retention policies. Can you set a policy for retention of multiple versions of your data? How flexible can your retention policy be? Can you set different policies for different classes of data? What is the service provider's retention policy if a billing issue or dispute should arise? Is your data immediately deleted? Is there a grace period before access is cut off, and an additional grace period before data is deleted?
5. Emergency data access. How do you access your data if the systems being backed up are unavailable? Are there alternate access methods? What if you need a large amount of data quickly? Some services can arrange to ship your data to you on disk, if necessary. Also, is the data stored in a proprietary format or can it be accessed by multiple applications?
6. Appropriateness of service. Are the services provided optimal for your organization? For example, if you would like online access to a virtual machine image in an emergency, can your software and online service provide that?
7. Costs versus benefits. Price per gigabyte of data stored or transferred is not the only measurement of online service costs and benefits. Make sure the services provided fit your organization's needs in a cost-effective fashion.

Methods for Backing up Data


There are several means for backing up and protecting data.

Traditional File-based Backup

The traditional file-based backup approach backs up a system's files and directories, along with file attributes, as discrete items. Some systems can back up directory service (Active Directory, eDirectory, etc.) information as well, usually as a separate backup. The big advantage of this approach is that a file or group of files, or a directory object or objects, can be restored relatively quickly and easily from any available backup medium. Most file backup systems maintain a catalog of the files and directory entries of all tapes (or other media) in the backup rotation. As tapes are overwritten, their entries are removed from the catalog.
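A minimal sketch of such a catalog, assuming a simple in-memory design (real products use a database), shows how entries disappear when a tape is reused:

    # Each media set maps to the list of file entries it currently holds.
    catalog = {}

    def record_backup(media_id, entries):
        # Overwriting a media set replaces (and so removes) its old entries.
        catalog[media_id] = entries

    def find_file(path):
        """Return every media set holding a copy of `path`."""
        return [m for m, entries in catalog.items()
                if any(e["path"] == path for e in entries)]

    record_backup("TAPE-001", [{"path": "/data/report.doc", "ver": 1}])
    record_backup("TAPE-002", [{"path": "/data/report.doc", "ver": 2}])
    print(find_file("/data/report.doc"))   # ['TAPE-001', 'TAPE-002']
    record_backup("TAPE-001", [])          # tape reused and overwritten
    print(find_file("/data/report.doc"))   # ['TAPE-002']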


File backup doesn't lend itself to quick bare metal recovery, so a number of backup software vendors provide add-ons that perform disk imaging of the basic system, including the boot sector and operating system. This approach, although greatly improved in recent years, has been problematic, especially if the recovery image has not been kept up to date or if a system is being restored to a different server or dissimilar hardware. Another problem with traditional file backup systems is that if the catalog becomes unavailable (due to problems with the backup server, for example), backup media must be re-imported into a new catalog, which can be a time-consuming process with multiple sets of backup media.

File Synchronization
File synchronization refers to the periodic or continuous copying of files and directories from a source location to one or more destination locations in order to maintain duplicate file sets. This technique is often used to make sure the most recent versions of files are available elsewhere if a primary system fails. When implemented with a versioning system, this approach can maintain multiple revisions of files. File synchronization, with or without versioning, is often used in cloud (online) backup systems. It is also used between systems within an organization, commonly between sites, to make sure data is quickly available in case of a site-related disaster. File synchronization is often used in addition to traditional backup systems, since it can provide immediate access to data.

Most file synchronization approaches are unidirectional, meaning they synchronize in one direction only. Bidirectional or multi-directional approaches also exist, but they are much more complex to implement and often require manual intervention to avoid version conflicts. When updating files that have previously been replicated, some programs re-replicate entire files while others use delta encoding to replicate only the changes. Delta encoding can significantly reduce both network traffic and replication time. Data compression and data deduplication can also be employed to optimize performance across WAN links.
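A minimal unidirectional synchronization sketch follows, comparing timestamps and sizes and recopying whole files; production tools typically add the delta encoding, compression and deduplication described above:

    import os, shutil

    def sync_dir(src, dst):
        """One-way sync: copy files that are new or changed (size/mtime)."""
        for root, _dirs, files in os.walk(src):
            target = os.path.join(dst, os.path.relpath(root, src))
            os.makedirs(target, exist_ok=True)
            for name in files:
                s = os.path.join(root, name)
                d = os.path.join(target, name)
                if (not os.path.exists(d)
                        or os.path.getmtime(s) > os.path.getmtime(d)
                        or os.path.getsize(s) != os.path.getsize(d)):
                    shutil.copy2(s, d)   # copy2 preserves timestamps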

Remote Data Replication


Remote data replication is the process of duplicating data between remote sites. With replication, data is written both to a local, or primary, storage system and to one or more remote, or secondary, storage systems. It is usually employed to guarantee data currency and availability in the event of a site disaster. Remote data replication can be conducted across the Internet or private networks, and can be synchronous, asynchronous, semi-synchronous or point-in-time.

Synchronous replication assures that each write operation is completed to both primary and secondary storage before a host system or application is notified that the operation is complete. This method assures that identical data is written to both primary and secondary storage but, because of the timing issues involved, it can affect application performance. Effective synchronous replication requires extremely reliable, high-speed networks. Synchronous replication is usually employed where real-time replication with the highest level of reliability is a greater concern than cost. This method is often used by financial institutions, where the loss of even a few minutes of data can cost millions of dollars. Because of network performance requirements, synchronous replication over long distances typically employs Fibre Channel over IP with channel extenders. As distance increases, latency also increases, which can affect application performance. Distances of less than 150-200 miles are typically recommended, but under some circumstances greater distances can be achieved.


With asynchronous replication, data is written to primary storage and then to secondary storage some time later. The host system or application is notified that the operation is complete when the write to the primary system is complete. Data is then passed to secondary storage when network bandwidth is available, typically within seconds but sometimes hours later. Asynchronous replication is a good choice when relatively slow or unreliable networks are employed.

With semi-synchronous replication, a transaction is considered complete when it is acknowledged by the primary storage system and the secondary storage system has received the data into memory or a log file. The actual write to secondary storage is performed asynchronously. This results in better performance than a synchronous system, but it does increase the chance that the secondary write will fail.

Point-in-time replication uses snapshots to periodically update data changes, usually on a scheduled basis. This is the least reliable approach, but it can be performed more effectively over low-speed links.

Asynchronous, semi-synchronous and point-in-time replication can span virtually any distance, so they are good choices when storage systems are great distances apart. Because these approaches do not require immediate write acknowledgment from secondary storage, they also create less of a potential performance impact on the host.
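The asynchronous model is easy to see in code: writes are acknowledged as soon as primary storage commits, while a background worker drains a queue to the secondary. This is a minimal sketch in which two dictionaries stand in for real storage systems:

    import queue, threading

    primary, secondary = {}, {}          # stand-ins for real storage
    replication_queue = queue.Queue()

    def write(key, data):
        primary[key] = data                  # commit locally first...
        replication_queue.put((key, data))   # ...then queue for replication
        # the caller is acknowledged here, before the secondary has the data

    def replicator():
        while True:
            key, data = replication_queue.get()   # waits for queued writes
            secondary[key] = data                 # remote write, done later
            replication_queue.task_done()

    threading.Thread(target=replicator, daemon=True).start()
    write("block-42", b"payload")
    replication_queue.join()   # demo only: wait for replication to drain
    assert secondary["block-42"] == b"payload"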

Images, Clones and Snapshots


Another method of backup is to replicate a disk or volume to another device. The methods for doing this are known as imaging, cloning and snapshotting. The descriptions here are representative and do not reflect all methods used by various software vendors to create images, clones or snapshots.

Imaging software creates a replica of a disk, volume or multiple volumes as a file or set of files that can be used to restore a system to its state at the time the image was created. An image file is similar in function to a CD/DVD ISO file. There are no standards for disk and volume image file formats, and most are proprietary to a particular software package. Older imaging software only allowed the restoration of complete images, but many current systems allow the restoration of specific files and folders.

Cloning creates replicas of disks, including bootable replicas of system disks. While imaging requires restoring the image file to a disk, a clone can be used as-is in place of a failed disk.

Snapshotting refers to the process of capturing the state of a system at a particular point in time. Disk imaging and cloning are both forms of snapshotting. There are two primary forms of snapshots: full and differential. A full snapshot captures an entire volume, disk or system, while a differential snapshot captures only changes made since the last full snapshot. By maintaining multiple differential snapshots along with a full snapshot, a system can be restored to different points in time.

Early image and cloning software, as well as some current software, requires the system being imaged to be shut down and booted from a floppy disk, CD or USB device that hosts the imaging software in order to create or restore the image or clone. A number of current products, however, allow imaging or cloning of a live system. In the Windows environment most products use Microsoft's Volume Snapshot Service, or Volume Shadow Copy Service (VSS), for this function. VSS is a set of services designed to provide consistent copies of Windows systems and applications such as Microsoft SQL Server and Exchange. There are live imaging systems for Macintosh OS and Linux as well. Apple's Time Machine, included with Macintosh OS X, can be used to create bootable backups, and several third-party products do this as well. For Linux, Acronis Backup and Recovery 10 and the open source package Mondo Rescue can be used for live imaging.
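The full-plus-differential scheme can be sketched at the block level: hash each block of the current volume against the full snapshot and keep only the blocks that differ. This sketch assumes a fixed-size volume and is purely illustrative:

    import hashlib

    BLOCK = 4096

    def hashes(volume):
        return [hashlib.sha256(volume[i:i + BLOCK]).hexdigest()
                for i in range(0, len(volume), BLOCK)]

    def differential(full, current):
        """Changed blocks since the full snapshot, keyed by block index."""
        return {i: current[i * BLOCK:(i + 1) * BLOCK]
                for i, (a, b) in enumerate(zip(hashes(full), hashes(current)))
                if a != b}

    def restore(full, diff):
        blocks = [full[i:i + BLOCK] for i in range(0, len(full), BLOCK)]
        for i, data in diff.items():
            blocks[i] = data
        return b"".join(blocks)

    full = b"A" * BLOCK + b"B" * BLOCK
    now  = b"A" * BLOCK + b"C" * BLOCK
    diff = differential(full, now)       # only block 1 changed
    assert restore(full, diff) == now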


Continuous Data Protection and Near Continuous Data Protection


When data is written to disk, a continuous data protection (CDP) system saves that new or updated data to a backup system. A near continuous data protection system captures changed data every few seconds or at predefined intervals instead of immediately upon disk write. For most purposes the effect of the two approaches is the same: data can be restored from nearly any point in time. Both approaches can have some effect on system performance, and both generally consume more backup media space than more traditional approaches. Some CDP packages allow administrators to set event-driven restore points, such as the monthly closing of the books.
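A minimal near-CDP sketch follows, assuming a polling capture function called every few seconds: only changed files are journaled, and any journaled timestamp becomes a restore point:

    import hashlib, time

    journal = []       # (timestamp, path, content) restore points
    _last_seen = {}    # path -> hash of the last captured version

    def capture(files):
        """Called every few seconds; journals only files that changed."""
        now = time.time()
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            if _last_seen.get(path) != digest:
                journal.append((now, path, content))
                _last_seen[path] = digest

    def restore_as_of(t):
        """Rebuild the newest version of every file at or before time t."""
        state = {}
        for ts, path, content in journal:
            if ts <= t:
                state[path] = content
        return state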

Agent vs. Agentless Backup


When the backup server or service is not running on the system being backed up, some method of data transfer must be employed. This can be accomplished by installing a special piece of software, an agent, written specifically to communicate with the backup system, or by using software that is already installed on the computer, which often means using standard communication protocols such as CIFS (SMB) or NFS. The agentless approach usually simplifies the rollout of a backup system and can also reduce overall costs. Agents, on the other hand, can often provide better communication between the backup server and client, allowing, for example, a client to tell the server about changes that need to be reflected in the backup. In agentless systems, as well as some agent-based systems, backup control is generally handled at the backup server. Agents are also used for application backup; an agent can make sure a database is in a consistent state for backup, for example.

Windows Volume Shadow Copy Service (VSS)


Volume Shadow Copy Service (VSS) is a set of services designed to provide consistent copies of Windows systems and Windows applications such as Microsoft SQL Server and Exchange. VSS has been included with Windows since Windows Server 2003 and allows the backup of open files, locked files and open databases. Backups created with VSS are called shadow copies. VSS can back up full volumes and, with the use of application-aware components, back up specific applications, such as Microsoft SQL Server and Exchange. For volumes, VSS can create clones, which are complete volume copies, and differential copies, which are copies of data changed since the last full or clone backup.

For databases such as SQL Server and Exchange, VSS can be used to create full backups, copy backups, incremental backups and differential backups. A full backup includes all selected databases but deletes transaction log files older than the start of the backup. A copy backup does not delete log files and will consume more disk space, but it allows data to be restored from points in time prior to the backup, if that data is in the transaction logs. An incremental backup backs up only database changes since the last full or incremental backup and then deletes logs older than the start of the backup. When using incremental backups, a restore requires a full or copy backup and all subsequent incrementals. A differential backup backs up only changes since the last full or copy backup but does not delete pre-backup logs. When using differential backups, a restore requires only the full or copy backup and the last differential. Generally, differential backups are preferred over incremental backups.

Some backup systems provide for transaction log backup through VSS as well.
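The restore-chain rules described above (a base plus every subsequent incremental, versus a base plus only the last differential) can be captured in a short function. This sketch assumes a chain uses one style or the other, not both:

    def restore_chain(backups):
        """backups: list like ['full', 'incremental', ...], oldest first.
        Returns the indices of the backup sets needed for a restore."""
        base = max(i for i, b in enumerate(backups) if b in ("full", "copy"))
        tail = backups[base + 1:]
        if "incremental" in tail:
            # the base plus every incremental taken after it
            return [base] + [base + 1 + i for i, b in enumerate(tail)
                             if b == "incremental"]
        if "differential" in tail:
            # the base plus only the most recent differential
            last = max(i for i, b in enumerate(tail) if b == "differential")
            return [base, base + 1 + last]
        return [base]

    print(restore_chain(["full", "incremental", "incremental"]))   # [0, 1, 2]
    print(restore_chain(["full", "differential", "differential"])) # [0, 2]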


A VSS requester, usually a component of the backup software, starts the creation of the backup, or shadow copy. A VSS writer, usually using copy-on-write, makes sure the data being backed up is in a consistent state. A VSS provider creates the copy. Most current software that backs up Windows uses VSS to some degree.

Encryption and Password Protection of Backup Media


Encryption and password protection are often used for media that will be physically transported from one site to another or stored in an unsecured location. In some industries, legal and/or regulatory compliance may require encryption of such media. Accidental disclosure of personal health records or financial data can have severe repercussions even if specific laws or regulations are not violated. Some backup programs, such as older versions of Symantec Backup Exec and EMC NetWorker, provide password protection but not encryption. This makes unauthorized restoration of data difficult, but not impossible.

Advanced Encryption Standard (AES)

Advanced Encryption Standard (AES) is an encryption standard adopted by the U.S. government as Federal Information Processing Standard (FIPS) 197 in 2001. AES is the encryption standard used by most enterprise-level backup systems. It supports key sizes of 128, 192 and 256 bits.

Tape Drive-based Encryption


As of version 4, the Linear Tape Open (LTO) tape format supports hardware-based encryption at the tape drive. Although encryption is available for LTO-4 and LTO-5 tape drives, it is not implemented in all drives, so if it is used, both the backup drive and the restore drive, if different, must support encryption.

Encryption Issues

There are several issues to consider when you decide to encrypt data.

- Performance. Encryption uses CPU cycles, so it will affect the performance of the system doing the encryption.
- Key Management. In simplest terms, an encryption key is a randomly generated piece of information that determines the output of a cryptographic process or algorithm. Once data is encrypted with a particular key, the appropriate key (with AES it is the same key) is required for decryption. When encryption is used for backup systems, it is absolutely critical to make sure the key is available when data restoration is necessary. Effective key management procedures must be in place to make sure keys are properly generated, stored, used and replaced if necessary. Keys and key management procedures must be stored and backed up outside the systems to which they apply, so that they are available in an emergency.
- Encryption and Backup Data Compression. If both compression and encryption are used on a backup system, the data should be compressed before it is encrypted. If software encryption is used, then compression should be disabled on the backup device. If LTO hardware encryption is employed, then both compression and encryption can be performed by the tape drive.
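The compress-before-encrypt ordering is easy to demonstrate. The sketch below uses Python's zlib with AES-256 in CTR mode from the third-party cryptography package; a production system would use an authenticated mode and formal key escrow, so treat this purely as an illustration of the ordering:

    import os, zlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def protect(data, key):
        """Compress first, then encrypt (AES-256, CTR mode).
        Encrypting first would leave nothing for compression to remove."""
        compressed = zlib.compress(data)
        nonce = os.urandom(16)             # must be unique per backup set
        enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
        return nonce, enc.update(compressed) + enc.finalize()

    def recover(nonce, blob, key):
        dec = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
        return zlib.decompress(dec.update(blob) + dec.finalize())

    key = os.urandom(32)   # in practice, generate once and escrow the key
                           # outside the system being protected
    nonce, blob = protect(b"backup payload " * 1000, key)
    assert recover(nonce, blob, key) == b"backup payload " * 1000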

Backup Data Compression

Data compression is the process of encoding data so it uses less media space. Standard compression algorithms usually operate at the bit level, removing redundant bits of data and replacing them with codes that can be used to restore the data on read. Data compression is supported by most backup systems. Compression can be provided by the backup application, the tape drive, or, if backing up to disk, the operating system of the backup disk. Application-based compression is sometimes proprietary, so the compressed data can be read only by that software. Tape drive-based compression is transparent to the backup software and does not affect the readability of the tape. Current Windows operating systems provide for transparent, or on-write, compression. Transparent compression is not native to any of the current production Linux file systems. Compression and decompression both affect system performance, since the processes use CPU cycles, RAM and disk space.

Data Deduplication
Data deduplication eliminates redundant data to reduce storage requirements. Pointers are used to reference the single unique instance of the data retained on the storage system. Depending on the type of data, deduplication can significantly reduce storage requirements. For example, an e-mail system might maintain copies of a file attachment in multiple users' mailboxes; with data deduplication, only one copy is maintained. Currently, deduplication is used primarily for backup and archiving systems. Although data deduplication can be used on primary file systems, system overhead, lack of standards and lack of direct operating system support make this less attractive. (One exception: Sun Microsystems' (now Oracle) ZFS file system includes block deduplication support. ZFS is supported on current versions of Oracle Solaris, OpenIndiana (formerly OpenSolaris) and FreeBSD.)

File Mode and Block Mode

There are two primary modes of data deduplication: file mode and block mode. File mode looks for duplicate files, while block mode looks for duplicate blocks of data within files. Block deduplication can be either fixed-block or variable-block. Fixed-block deduplication looks for identical data blocks at fixed boundaries, while variable-block deduplication uses more intelligent, and thus more processor-intensive, algorithms to find identical data regardless of block alignment. The effectiveness of the three modes varies with the type of data being stored. Generally, file deduplication is the least effective in terms of data reduction but has the least system overhead; variable-block deduplication is the most effective but has the greatest overhead; and fixed-block deduplication falls somewhere in the middle.
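Fixed-block deduplication reduces to hashing each block and storing only the first copy. A minimal sketch, assuming 4KB blocks and SHA-256 fingerprints:

    import hashlib

    BLOCK = 4096
    store = {}    # hash -> unique block (the "single instance")

    def dedupe_write(data):
        """Store each unique block once; return pointers for the stream."""
        pointers = []
        for i in range(0, len(data), BLOCK):
            block = data[i:i + BLOCK]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)   # only the first copy is kept
            pointers.append(digest)
        return pointers

    def rehydrate(pointers):
        return b"".join(store[p] for p in pointers)

    first  = dedupe_write(b"A" * 8192)   # two identical blocks...
    second = dedupe_write(b"A" * 4096)   # ...and a third duplicate
    print(len(store))                    # 1 unique block stored
    assert rehydrate(first) == b"A" * 8192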

In-Line or Post-Processing Deduplication

In a backup or archiving environment, deduplication can be applied in two ways: in-line or post-process. In-line deduplication operates as data is being written to a target device. If a new block (or file) is the same as an existing block, the new block is not written to the storage device; instead, a pointer is set to the existing block (or file). With post-process deduplication, data is written to disk as it is received and then analyzed and deduplicated after the fact. In-line deduplication uses RAM instead of disk space, but it can affect performance while data is being written to disk.

Post-process deduplication is part of an intelligent disk target, which is associated with a disk library. In post-process deduplication, backup data is written to a disk staging area, where the dedupe process works on the data at a later point in time. Post-process deduplication allows the use of most backup software choices, certainly all of the mainstream options. While sufficient storage must exist to hold a complete first copy in the scratch pool, the low implementation cost of a SATA disk library offsets that cost as compared with an in-line, server-based solution. The in-line approach to deduplication takes a lot of computing capacity, is still a slower performer, and carries greater risk than the post-process approach. Servers are also expensive from a CAPEX point of view, and they are more power hungry than an efficient SATA array. More power produces more heat, and that results in greater costs for cooling. Another major benefit of post-processing is that data is moved to the safety of the disk library without being slowed by deduplication processes in a server, as with in-line deduplication. As a result, post-processing systems accept data and perform at much faster rates than the in-line approach.

In-line deduplication requires the use of specific backup software clients, and this is the first objection to its implementation. Not everyone wants to abandon backup software that is currently in use and that people are happy with; this forced replacement is seen as disruptive and requires training and new processes. The client software associated with the in-line approach talks to a server, also running specific backup software, to identify files and the hash or fingerprint created for each file, which determines what action, if any, should be taken to deduplicate the file.

In-line deduplication suppliers tout the benefits of reduced file traffic across LANs/WANs and decreased storage capacity, since they don't use a scratch pool. However, the in-line approach sits in the data path and can slow down the incoming backup and other applications that are trying to use the same SAN ports; this is why in-line dedupe devices slow down application performance. Although the largest performance risks are associated with applications moving large streams of data, any application can be impacted. A major risk arises if a restore is required while a backup is underway. In that case the in-line server is already consumed with the very CPU-intensive work of deduplicating the running backups; if it is now asked to restore deduplicated data, an additional load is placed on the server to rehydrate that data, which is also very CPU-intensive. The result is that the backup jobs slow down and the restore is slow as well. Most organizations find the cost of downtime a critical concern to the health of the company, and slowing down a restore can have significant economic ramifications, so it is better to avoid this potential risk; after all, Murphy's Law prevails. By deduplicating in-line, performance is limited by the speed of the deduplication engine, and scalability is typically limited as well. Building out an infrastructure to hit desired performance levels in large environments can be quite expensive. What suppliers of in-line deduplication solutions won't talk about are the disadvantages of having to change backup software, the slower overall performance, and the potential increase in TCO caused by the expensive and power-hungry additional servers needed to run this approach.

One final approach is a hybrid called concurrent processing. Concurrent processing still moves data to a disk staging area first, but doesn't wait for backups to finish before deduping.

Backup Performance
In backing up, performance is a function of two things:

1. Capture
   a. The number of network port connections and their performance, available bandwidth, and congestion.
   b. The performance of the Intelligent Disk Target (IDT), also known as the disk library. The network and the IDT work together to transport data from a backup server's backup application and capture it to the IDT cache.
2. Post-processing
   a. The performance of the IDT's back end, which reads data from the backup repository, analyzes the cached data using a hashing algorithm, and deduplicates the data down to the block level to create a block-level repository.

It is reasonable to expect that the capture speed to the cache will be slightly faster than the creation of the block-level repository.

Restore Performance
Restoring data to recover an application is always critically sensitive to speed. When restoring from deduplicated data, the volume being restored must be complete. While it is possible to optimize the backup by eliminating redundancy, the restore volume must be re-inflated to restore all the duplicate copies. Therefore, while the total amount of data representing a backup can be 20 or more times smaller than the original data, a restore must write back the entire data volume, including the duplicated data.

Even though time is spent processing pointers to re-inflate the data, overall restore speed will be approximately the same as the speed of the backup. This is the result of performance algorithms that take advantage of a phenomenon known as locality of reference. As sequential data is written to a disk, one record is laid down after another. In restoring data, it is possible to take advantage of this by pre-processing a read request from the restore utility, anticipating that the next sequential record will be requested. By doing this, the wait time, or latency, associated with the request is avoided, with data available at the moment it is requested. This seemingly small saving adds up to a large saving in time over the length of the restore. Thus restore performance is, in practice, the same as backup performance.

From an application logic point of view, the restore utility asks for records to restore an application. The dedupe appliance fetches the appropriate records and presents them as requested. Neither the backup nor the restore application is aware of the deduplication or inflation process at the target.

Power and Cooling Considerations


Deduplication is one of several steps necessary to achieve energy efficiency in the data center. Deduplication started as a way to increase space and time efficiency on tape backup targets. However, the reliability and performance of SATA arrays have risen and prices have continued to fall, to the point that the overall value equation now favors disk. Performance has always been a primary concern in the protection architecture: when doing backups, the issue is getting everything backed up in the time allotted that night; when doing a restore, time is money, and the longer an application is down, the greater the economic impact to the business. Disk libraries are far faster and far more reliable than tape.

What about efficiency? Using the advanced power management technologies available today, power consumption can be significantly reduced. For a disk subsystem, it is reasonable to expect a 60% reduction in overall power for an advanced MAID power-managed disk subsystem versus more traditional disk at a given capacity.


ECO-Matters
Where the numbers really shine is when you add deduplication. Compare the annual power of a 52TB deduplicated system that has advanced power management to a typical and popular system that has neither. If we assume a 20-to-1 efficiency in data reduction in the 52TB system, it would take 1 petabyte of a non-deduplicated system to hold the same data.

                     Capacity   Compression   Annual Energy   Annual Cost    Annual Metric
                     in TBs     Ratio         in kWhs         at $.12/kWh    Tons of CO2
    Nexsan           52         20:1          8,609           $1,033         4
    Generic disk,
      no dedupe      1,005      1:1           205,054         $24,606        103
    Tape library     503        2:1           16,740          $2,009         8

Comparing the annual power figures, the 52TB system would use 8,609 kWh in one year, while the 1PB non-power-managed system would use 205,054 kWh. At $.12 per kilowatt-hour, the cost to power them would be $1,033 versus $24,606, an enormous economic difference. To give some scale for what this means in energy savings: according to the EPA, the average home uses 11,965 kWh annually. Simply by using an advanced SATA array with power management and data deduplication, you could save the equivalent of the energy used by 46.63 homes for a year. It is also the difference between putting 4 metric tons of CO2 into the air in a year versus 103.

Source or Target Deduplication

Data deduplication can occur at or near the data source (the system being backed up) or at or near the target (the backup system). Source deduplication deduplicates data either on the source system itself or on a separate system near the source, such as a dedicated deduplication appliance. Source deduplication can reduce network traffic, which can be significant in a WAN environment. Target deduplication operates on the target system itself or on a separate system near the target. One advantage of target deduplication is that it can deduplicate data from multiple sources and can potentially provide greater overall data reduction. Both source and target deduplication can be used in the same backup environment, but usually at a significant cost.

A hybrid form of deduplication, sometimes called client backup deduplication, performs the deduplication on the source system and then compares the result to data stored on the target. If identical data is already stored on the target, the data is not transferred and the appropriate pointers are created to reference the existing data. This approach prevents duplicate data from being transferred across the network and can reduce the impact on network performance. It is especially effective when backing up data from multiple similar systems, such as client PCs.
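The client backup deduplication handshake can be sketched in a few lines: the source sends fingerprints, the target answers with the ones it lacks, and only those blocks cross the network. Function and variable names here are illustrative:

    import hashlib

    target_store = {}   # hash -> block, held on the backup target

    def target_missing(hashes):
        """Target's reply: which fingerprints it does not already hold."""
        return {h for h in hashes if h not in target_store}

    def client_backup(blocks):
        """Source side: transfer only blocks the target lacks."""
        digests = [hashlib.sha256(b).hexdigest() for b in blocks]
        need = target_missing(digests)         # one round trip
        sent = 0
        for digest, block in zip(digests, blocks):
            if digest in need:
                target_store[digest] = block   # actual transfer
                sent += 1
                need.discard(digest)   # don't resend a repeated block
        return sent

    print(client_backup([b"x" * 4096, b"y" * 4096]))   # 2 blocks sent
    print(client_backup([b"x" * 4096, b"z" * 4096]))   # 1 block sent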

The Downsides of Data Deduplication

The deduplication process uses system resources and, depending on how and where it is implemented, can affect system and network performance. Since there are currently no industry standards for deduplication, the system used to dedupe the data must also be used to reconstitute it, creating a high degree of vendor lock-in. Another major issue is that cost-effective deduplication requires an investment in hardware, software and implementation services. The costs, however, are often offset by reductions in backup time and backup media costs.

Application-Specific Backup
The major issue with backing up database applications is that the files must be quiesced so they can be backed up in a consistent, synchronized state. This can be accomplished in a number of ways (this is not an all-inclusive list):

- Shut down the database before the backup and restart it after. This is the simplest method and can easily be done with scripting, but it makes the database unavailable during backup.
- Lock and flush (write all pending updates to disk) the tables before the backup and unlock them after. A read-only lock allows other clients to query the database but not update it. This still has the problem of the database not being updatable during backup.
- Export data as SQL statements. This preserves the table data, but each table must be individually restored to the database server.
- Use application-specific APIs or utilities. For example, most approaches to backing up Microsoft SQL Server and Microsoft Exchange use Microsoft's Volume Shadow Copy Service (VSS) and a Virtual Backup Device Interface (VDI) for the particular application. Oracle backups typically utilize Oracle Recovery Manager (RMAN); Oracle also provides a VSS writer to allow Oracle database backup with VSS. PostgreSQL provides pg_dump and pg_dumpall. Zmanda provides Zmanda Recovery Manager for MySQL (ZRM) in both community and commercial versions. These utilities and APIs ensure the backup sees a consistent view of the database.
- Buffer application writes. Application write requests are written to a buffer and the database is updated after backup. Read requests can also access the buffer during backup, so those requests are able to see any data written while the database is locked.
- Use database transaction logging. A transaction log is a record of all changes made to a database. Transaction logs are commonly used to restore data that was deleted, modified or corrupted after the most recent backup.
- Use third-party application-specific software to back up an application. For example, Zmanda Recovery Manager (ZRM) will back up MySQL databases, and Zmanda also provides ZRM agents for using ZRM with other backup products. Such third-party products might use proprietary techniques or some of the approaches outlined here.

Note that if a particular backup system doesn't support these methods directly, they can often be implemented through scripting and scheduling.
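As one example of the scripting approach, a scheduled job can call a database's native dump utility. The sketch below drives PostgreSQL's pg_dump, which produces a consistent backup while the database remains available; the database name and output path are illustrative:

    import datetime, subprocess

    db = "salesdb"                                   # illustrative
    stamp = datetime.date.today().isoformat()
    outfile = f"/backups/{db}-{stamp}.dump"          # illustrative

    subprocess.run(
        ["pg_dump", "--format=custom", f"--file={outfile}", db],
        check=True,   # raise on failure so the scheduler can alert
    )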

Virtual Machine (VM) Backup


Virtual machines can be backed up in a number of ways:

- Back up a virtual machine as if it were a physical machine, by installing a backup agent or backup software in the VM.
- Back up the virtual machine from the host as a file system object or objects.
- Back up the virtual machine as a bootable copy or snapshot. Depending on the VM, host and backup application, this may be done with or without agent software in the VM itself. Some systems allow the creation of base snapshots and incremental snapshots, which store only changes since the last snapshot. This approach allows restoration to any point in time at which a snapshot was taken.
- Use live migration. Live migration allows you to copy a running VM to a different physical machine without shutting down the machine being migrated. Most current VM managers support live migration.
- Use continuous live migration. This technique uses a combination of live migration and checkpointing to replicate a VM on a continuous basis. Remus, a continuous live migration utility included with current releases of Xen, is an example.
- Some combination of the above. For example, at least one backup system creates an image backup of the entire VM and then backs up the file system from that image, providing the image for disaster recovery and the file system backup for restoration of individual files.

Backing Up Virtual Machines


The widespread deployment of virtual machines (VMs) on microprocessor-based systems has created a whole new set of backup issues. Different virtual machine managers, or hypervisors, have different levels of support for backup; likewise, different backup systems take different approaches to VM backups. One purpose of virtualization is better utilization of hardware resources, which means that VM hosts typically run closer to maximum CPU, memory and I/O capacity than physical (non-VM) servers. Backup also tends to use a lot of CPU, memory and I/O resources, so it can impact the performance of a busy VM host. In addition, backing up multiple VMs raises most or all of the issues involved in backing up multiple physical servers, including scheduling and completing backups within the available backup window. VM backups should ensure that the entire VM, as well as individual files and directories, can be restored quickly and easily. Depending on the hypervisor being backed up, a variety of backup techniques may be necessary to achieve this goal.


There are a number of options for backing up VMs:

- Run the backup from the VM host or a proxy and back up the files that contain the VM and its definitions from the host. In most cases the VM must be shut down during backup.
- Back up the contents of the VM as if it were a physical machine. This approach usually requires an agent running in the guest VM. Database applications must be quiesced using the same techniques used for applications on physical servers.
- Hot snapshot: create an image of the VM while it is running. Since the backup application is running on the VM host or a proxy, there must be some degree of coordination between the backup system and database applications running in the VM to make sure they are in a consistent, stable state.
- A combination of approaches. Some backup applications will use a combination of the above.

Hypervisor-specific Backup Methods


Different hypervisors use different methods to back up virtual machines.

- VMware -- Software that backs up VMware commonly uses either the VMware Consolidated Backup (VCB) application or the more recent VMware vStorage APIs for Data Protection. VCB is a standalone application that can be called by other backup software. The vStorage APIs, however, are effectively file system drivers that allow access to VMware from Windows or Linux applications. Applications can also access VMware's proprietary VMFS file system from within the VMware ESX service console. Safely backing up database applications usually requires an agent running within the VM.
- Microsoft Hyper-V -- Backup applications for Hyper-V typically use VSS. VSS allows VMs to be backed up from the Hyper-V host. It also allows VSS-aware applications within a VM to be properly quiesced for backup.
- KVM, VirtualBox, Xen, XenServer and others -- These hypervisors generally require command-line utilities, scripts or third-party software for backup (see the sketch below).
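As a hedged illustration of that command-line approach, the sketch below snapshots a libvirt/KVM guest with virsh; the domain and snapshot names are assumptions, and equivalent scripts exist for VirtualBox (VBoxManage) and the Xen toolstacks:

    import subprocess

    domain = "web-server-vm"   # illustrative libvirt domain name

    # Create a named snapshot of the running guest...
    subprocess.run(
        ["virsh", "snapshot-create-as", domain, "nightly",
         "--description", "pre-backup snapshot"],
        check=True,
    )
    # ...back up the domain's disk images, then remove the snapshot to
    # avoid an ever-growing snapshot chain:
    subprocess.run(["virsh", "snapshot-delete", domain, "nightly"],
                   check=True)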


Tips and Best Practices for Effective Backups


Given the variety of backup requirements and methods, it is nearly impossible to outline a set of best practices that will meet all or even most situations. You should, however, create a data protection plan that fits the needs of your organization. Creating this plan will require a cost/benefit analysis. For example, every step closer to zero downtime increases costs, so you need to weigh the costs of additional anti-downtime measures against the benefits of the incremental protection they provide. Your plan should consider the following:

- Minimization of downtime. Consider the consequences of both short-term and long-term downtime. Some downtime, even if it is only a few minutes, is inevitable; even systems sold as non-stop systems have failed. If nothing else, most systems must be shut down for maintenance at least occasionally. Determine the maximum unplanned downtime you can afford, then weigh that against the costs of implementing the systems and services necessary to achieve your desired level of uptime.
- Access to data in an emergency. You back up data so that it is available in an emergency, which could mean a disk failure, data corruption, accidental data deletion or any number of other causes. This means having the data available and having the means to restore or access it. Among other things, backed-up data must be both immediately available and stored off-site.
- Long-term storage of archived data. Make sure your data protection plan provides for archival storage of data that must be kept for legal, financial or business reasons.

An effective data protection plan should include some method of local backup, such as disk-to-tape, disk-to-disk-to-tape or disk-to-disk. This system should provide for maintaining multiple backup sets or restore points, which protects against the failure of a single backup media set and allows you to restore a file from an earlier point in time. Your media rotation scheme should also include storing some of your backup sets off-site. Your local backup system should back up all critical data as well as provide the ability to fully restore critical systems, such as servers. The ability to perform a bare metal restore (restoring a complete system, including system files, directly from backup) is a plus; many backup programs offer this feature at varying levels of cost and complexity.

Effective use of server virtualization can help minimize downtime. Many backup systems create bootable VM images, and with some systems it is easy to fail over to a backup image. Some systems will also begin the rebuild of the primary VM and allow it to run while it is being rebuilt, transferring data from the backup as needed. Other than a possible reduction in performance, this process is transparent to users and applications.

Online backup can give you additional protection against deletion or corruption of critical data. In most environments online backup is used for critical data files that cannot be easily recreated, although, depending on requirements and the online system employed, some organizations back up all data online.

Applications such as databases and email systems usually have specific backup requirements to make sure they are properly quiesced so they can be backed up in a consistent, synchronized state. Make sure your backup routine provides for this.

When multiple systems are backed up, redundant data is usually backed up as well. Effective use of data deduplication in the backup process can save backup media space, backup time and LAN/WAN bandwidth. One downside of deduplication is that there are no standard formats, so you are reliant on a specific vendor's system.

Use Case Profile


Customer Name: Clark Enersen
Type of Organization: Architectural engineering firm
Number of Employees: 100
Location and Environment: Main office in Lincoln, Nebraska, with an additional office in Kansas City, Missouri
Contact Name: Cory Pierce, IT Manager
Amount and Type of Data Protected: 1.5TB in the Lincoln office and 1TB in Kansas City, primarily CAD, business data and e-mail. Eight physical servers in Lincoln and five in Kansas City needed shareable disk space for virtualization, supporting 120 fixed client nodes and 1 mobile worker in New England.

Challenges:
- Needed to convert from siloed NAS and DAS storage to shared storage for virtual machine support
- Replication of engineering data between both locations
- Client systems need data protection
- Systems required shareable storage for AppAssure backup and Symantec Backup Exec to an LTO-4 tape system
- 3PAR and EqualLogic systems were evaluated but were too expensive

Solution:
- 2 Nexsan SASBeast systems, one in each location
- 2 Nexsan iSeries i200 appliances with two Fibre Channel and two iSCSI ports, one in each location
- Fibre Channel is used between appliances and SASBeasts, iSCSI between servers and offices

Benefits:
- Availability of 42 drive slots in each system, which can be SAS or SATA
- VMware snapshot capability
- High-speed replication support between both sites
- Systems were easily installed with telephone support from the reseller (Condor Systems)
- Post-installation follow-up from Nexsan was appreciated
- Backup systems from AppAssure and Symantec Backup Exec are fully compatible


Vendor Name: Nexsan


Product Name: iSeries and Dedupe SG
Link to website: http://nexsan.com
Links to data sheets: http://www.nexsan.com/nexsan_iseries.php and http://www.nexsan.com/dedupesg.php
Software, Hardware-based Appliance, Virtual Appliance, Online, Target Array: Target array and appliance

Product Description: Nexsan manufactures a number of target arrays and appliances for small- and mid-sized businesses that incorporate snapshot and replication capability. Among them are:

The Nexsan iSeries is an iSCSI storage area network that supports full, incremental, synthetic and snapshot backups of Windows, Linux, AIX, HP-UX, Macintosh OS X, NetWare and Solaris servers and Windows, Linux, Macintosh OS X, NetWare and Solaris laptops and desktops. It supports Exchange, SQL Server and SharePoint applications and Oracle, Sybase and MySQL databases. Agents are available for SQL Server and Exchange. The iSeries ships with integrated snapshot capability and also has bare metal recovery capability. It supports near-continuous data protection as well as both local and remote replication. Additional services included with the iSeries are mirroring and data migration. With the iSeries, administrators can create and control thousands of virtual machines that support multiple applications and storage pools simultaneously. The iSeries incorporates both Serial ATA (SATA) and Serial Attached SCSI (SAS) drives in the same system and can contain up to 42 drives in a 4U enclosure, for scalability of up to a petabyte of data.

The Nexsan Dedupe SG is a deduplication appliance that provides disk-based data protection for Windows, Linux, UNIX, Macintosh OS X and NetWare servers, desktops and laptops. It uses FalconStor's File-Interface Deduplication System to provide deduplication capabilities for any number of environments. The Dedupe SG provides inline and post-processing deduplication in an appliance format that has a maximum throughput of 600MB per second and a logical capacity of 1.4PB. The Nexsan Dedupe SG is available in three configurations:

- 4TB and 7TB usable capacity in a 5U configuration;
- 12TB, 18TB and 26TB usable capacity in a 6U configuration; and
- 52TB and 68TB usable capacity in a 10U configuration.

Channel: Nexsan sells through VARs.
Cost: The Nexsan iSeries starts at $30,000; the Nexsan Dedupe SG starts at $40,000.


Table 1.1 Vendor/Product Name: Nexsan iSeries and Nexsan Dedupe SG

The detailed feature matrix compares the two products across virtualization support, server and client operating systems, applications and agents, snapshots and bare metal recovery, continuous data protection, deduplication, replication and file versioning, management console features, and channel. Key entries:

- Form factor: target array (iSeries) and deduplication appliance (Dedupe SG); both are disk-based (no tape media).
- Backup types (both products): full, incremental, synthetic and other (replication); servers, desktops and laptops are supported.
- Snapshots and recovery (iSeries): 32 snapshots per volume, with bare metal recovery.
- Replication: local and remote replication, asynchronous over IP.
- Deduplication (Dedupe SG): inline and post-processing, with a maximum throughput of 600MB/sec, maximum raw capacity of 68TB and logical capacity of 1.4PB.
- Virtualization categories covered by the matrix: Microsoft Hyper-V, VMware ESX/ESXi and vSphere (including vStorage API support), Citrix XenServer, Xen, Solaris Zones and Solaris Logical Domains; coverage varies by product.
- Operating system and application support follows the product description above (Windows Server 2003 through 2008 R2 and the Small/Essential Business Server editions, SUSE, Red Hat, Macintosh OS X, NetWare, AIX, HP-UX and Solaris; Exchange, SQL Server, SharePoint, Oracle, Sybase and MySQL).
- Management: Web-based GUI management console with monitoring and reporting.
- Pricing: iSeries from $30,000 (approximately $0.50/GB); Dedupe SG from $40,000 (approximately $0.40 per logical GB); agents at no charge.
- Channel: VARs.
