Implementing IBM Storage Data Deduplication Solutions Sg247888
Alex Osuna Eva Balogh Rucel F Javier Zohar Mann Alexandre Ramos Galante de Carvalho
ibm.com/redbooks
International Technical Support Organization IBM Storage Data Deduplication Solutions February 2011
SG24-7888-00
Note: Before using this information and the product it supports, read the information in Notices on page xi.
First Edition (February 2011) This edition applies to Version 2.5 of ProtecTIER, versions 6.1 and 6.2 of IBM Tivoli Storage Manager, and Data ONTAP 7.3.4 for N series. This document was created or updated on February 1, 2011.
Copyright International Business Machines Corporation 2011. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Notices  ix
Trademarks  x

Preface  xi
The team who wrote this book  xi
Now you can become a published author, too!  xiii
Comments welcome  xiii
Stay connected to IBM Redbooks  xiv

Part 1. Introduction to Deduplication  1

Chapter 1. Introduction to Deduplication  3
1.1 IBM data reduction and deduplication  4
1.2 Deduplication overview  4
1.2.1 Chunking  5
1.2.2 Processing  6
1.2.3 Consolidation  6
1.3 Architecture  6
1.3.1 Where deduplication processing occurs  7
1.3.2 When deduplication occurs  8
1.4 Benefits of data deduplication  8

Chapter 2. Introduction to N series Deduplication  9
2.1 How does deduplication for IBM System Storage N series storage system work  10
2.2 Deduplicated Volumes  13
2.3 Deduplication metadata  14
2.4 Sizing for performance and space efficiency  15
2.4.1 Deduplication general best practices  15
2.5 Compressing and Deduplicating  17

Chapter 3. Introduction to ProtecTIER deduplication  19
3.1 Terms and definitions  20
3.2 Overview of HyperFactor technology and deduplication  22
3.3 IBM System Storage TS7600 with ProtecTIER  24
3.3.1 TS7610 - Entry Edition  24
3.3.2 TS7650 - Appliance Edition  24
3.3.3 TS7650G - Enterprise Edition  24
3.3.4 TS7680 - Gateway Edition for System z  24
3.4 Deduplication with HyperFactor technology  25
3.4.1 Impact of HyperFactor  25
3.4.2 ProtecTIER data ingest flow  26
3.4.3 ProtecTIER VTL concepts  28
3.4.4 ProtecTIER OST concepts  28
3.4.5 Steady state  29

Chapter 4. Introduction to IBM Tivoli Storage Manager deduplication  31
4.1 Deduplication overview  32
4.2 How ITSM data deduplication works  33
4.3 IBM Tivoli Storage Manager deduplication overview  34
4.3.1 ITSM server-side deduplication overview  34
4.3.2 ITSM client-side deduplication overview  35

Part 2. Planning for Deduplication  39

Chapter 5. N series deduplication planning  41
5.1 Supported hardware models and ONTAP versions  42
5.2 Deduplication and Data ONTAP version considerations  42
5.3 Deduplication Licensing  43
5.4 Compatibility with native N series functions  43
5.4.1 Deduplication and High Availability Pairs  43
5.4.2 Deduplication and Non-disruptive Upgrade (NDU)  44
5.4.3 Deduplication and Performance Acceleration Modules (PAM)  44
5.4.4 Deduplication and Snapshot copies  44
5.4.5 Deduplication and SnapRestore  45
5.4.6 Deduplication and SnapMirror replication  45
5.4.7 Deduplication and SnapVault  49
5.4.8 Deduplication and SnapLock  50
5.4.9 Deduplication and MultiStore (vFiler)  51
5.4.10 Deduplication and LUNs  51
5.4.11 Deduplication and the volume copy command  55
5.4.12 Deduplication and FlexClone volumes  55
5.4.13 Deduplication and read reallocation  56
5.4.14 Deduplication with Quotas  56
5.4.15 Deduplication with NDMP  56
5.4.16 Deduplication with DataFabric Manager/Protection Manager  57
5.5 Compatibility with non-native functions  57
5.5.1 Deduplication with VMWare  57
5.5.2 Deduplication with Tivoli Storage Manager  58
5.5.3 Deduplication with Backup Exec  58
5.5.4 Deduplication with Lotus Domino  58
5.5.5 Deduplication with Microsoft Exchange  58
5.6 Data Characteristics  59
5.7 Deduplication and Storage Capacity  60
5.7.1 Metadata  60
5.8 Deduplication and Performance  61
5.8.1 Duration of the deduplication operation  61
5.8.2 The I/O performance of deduplicated volumes  61
5.9 Deduplication Scheduling  62
5.10 Aggregate and Volume Considerations  63

Chapter 6. ProtecTIER planning  65
6.1 Hardware planning for the 3959-SM1, 3958-AP1, 3958-DD4, and 3958-DE2  66
6.1.1 General overview of the TS7600  66
6.1.2 Hardware and software components of the 3959-SM1  68
6.1.3 Hardware and software components of the 3958-AP1  69
6.1.4 Hardware and software components of the 3958-DD4  70
6.1.5 Hardware and software components of the 3958-DE2  72
6.2 Planning for deduplication  73
6.3 Planning for Open systems with VTL  75
6.3.1 Sizing inputs  77
6.3.2 Capacity sizing  83
6.3.3 Performance Sizing  93
6.4 Planning for Open systems with OST  101
6.5 Planning for installation  102
6.5.1 Supported backup server operating environments  102
6.5.2 ProtecTIER manager workstation requirements  102

Chapter 7. ITSM planning  105
7.1 ITSM Planning overview  106
7.2 ITSM Deduplication pre-requisites  106
7.2.1 ITSM active log and archive log sizing  106
7.2.2 ITSM database sizing  107
7.3 Memory  108
7.4 Types of ITSM deduplication  108
7.4.1 Server-side deduplication  108
7.4.2 Client-side deduplication  109
7.5 ITSM deduplication considerations  110
7.5.1 Supported versions  111
7.5.2 Eligible storage pools  111
7.5.3 Encrypted data  111
7.5.4 Compressed data  112
7.5.5 Small files  112
7.5.6 Lan-free considerations  112
7.5.7 Hierarchical Storage Management (HSM)  112
7.5.8 Collocation  112
7.5.9 Disaster Recovery considerations  113
7.6 When to use ITSM deduplication  114
7.7 When not to use ITSM deduplication  114

Part 3. Implementing Deduplication  115

Chapter 8. Implementing N series deduplication  117
8.1 Requirements Overview  117
8.2 The IBM N series System Manager Software  118
8.2.1 Overview of System Manager Software  118
8.2.2 Bug Fix  118
8.2.3 Features  119
8.2.4 System Requirements  119
8.2.5 Installing the IBM N series System Manager Software  120
8.2.6 Steps to install the IBM N series System Manager Software  120
8.2.7 Starting the IBM N series System Manager Software  125
8.3 End-to-end Deduplication configuration example using command line  154
8.4 End-to-end Deduplication configuration example using IBM N series System Manager Software  157
8.5 Configuring deduplication schedules  166
8.6 Sizing for performance and space efficiency  168
8.6.1 Deduplication general best practices  168

Chapter 9. Implementing ProtecTIER  171
9.1 Getting started  172
9.1.1 TS7610 SMB Appliance  172
9.1.2 TS7650 Appliance  172
9.1.3 TS7650G Gateway  172
9.2 Installing ProtecTIER Manager  173
9.2.1 Prerequisites  173
9.2.2 Installing on Windows XP as an example  173
9.3 Starting ProtecTIER Manager  179
9.3.1 Adding Nodes to ProtecTIER Manager  179
9.3.2 Logging in into ProtecTIER Manager  181
9.4 ProtecTIER configuration on TS7610  183
9.4.1 ProtecTIER Configuration Menu  183
9.4.2 ProtecTIER Manager Configuration Wizard  187
9.5 ProtecTIER software install  195
9.5.1 autorun  195
9.5.2 ptconfig  197
9.5.3 fsCreate  200
9.6 Repository creating  201
9.6.1 Repository planning  201
9.6.2 Repository creating  203
9.7 OST configuration  208
9.7.1 The OpenStorage Operating Environment  208
9.7.2 Configuring Storage Server (STS)  209
9.7.3 Modifying STS  209
9.7.4 Creating Logical Storage Units (LSUs)  210
9.7.5 Modifying LSUs  214
9.7.6 Deleting LSU  217
9.8 Virtual library creation  218
9.8.1 TS7610 Library  227
9.9 Host implementation  228
9.9.1 Connecting hosts to ProtecTIER systems  228
9.9.2 Installing and configuring the device driver in OS  232
9.10 Deduplication operation with ProtecTIER  236
9.10.1 Deduplication with VTL  237
9.10.2 Compression with VTL  238
9.10.3 Deduplication with OST  238
9.10.4 Compression with OST  238
9.11 Backup and restore applications  239
9.11.1 Recommendations for all backup servers  239
9.11.2 General recommendations  239
9.11.3 Data types  242
9.11.4 IBM Tivoli Storage Manager  244

Chapter 10. Implementing ITSM deduplication  261
10.1 Implementing server-side deduplication  262
10.2 Implementing client-side deduplication  267
10.3 Managing data deduplication  268
10.3.1 Starting duplicate-identification processes automatically  268
10.3.2 Starting duplicate-identification processes manually  268
10.3.3 Enabling data deduplication  269
10.3.4 Disabling data deduplication  270
10.3.5 Disabling the copy storage pool backup requirement  270
10.3.6 Restoring or retrieving deduplicated files  271
10.4 Deduplication best practices  272
10.4.1 Server resources  272
10.4.2 Data safety  272
10.4.3 Software version  273
10.4.4 Administrative schedules  273
10.4.5 Restore performance  273

Data flow considerations
Deduplication platform features
Deduplication features matrix
ProtecTIER deduplication
ITSM deduplication
IBM N-Series deduplication

Part 4. Appendixes  283

Appendix A. N series use case  285
Introduction  285
Lab Environment  286
Result  286
Summary  286

Appendix B. ProtecTIER user cases  289
ProtecTIER deduplication  290

Appendix C. ITSM deduplication examples  293
Introduction  294
Environment  294
Results  294
Summary  295

Related publications  297
IBM Redbooks  297
Other publications  297
Online resources  297
How to get Redbooks  297
Help from IBM  298

Index  299
Notices
This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
AIX 5L AIX DB2 Diligent Domino DS4000 DS8000 HyperFactor IBM Informix Lotus Notes Lotus Notes POWER5+ ProtecTIER Redbooks Redbooks (logo) System p5 System Storage System x System z Tivoli VTF XIV z/OS z/VM
The following terms are trademarks of other companies: Snapshot, WAFL, SnapVault, SnapRestore, SnapMirror, SnapLock, NearStore, MultiStore, FlexVol, FlexClone, DataFabric, Data ONTAP, and the NetApp logo are trademarks or registered trademarks of NetApp, Inc. in the U.S. and other countries. Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.
Preface
Until now, the only way to capture, store, and effectively retain constantly growing amounts of enterprise data was to add more disk space to the storage infrastructure, an approach which can quickly become cost-prohibitive as information volumes continue to grow while capital budgets for infrastructure do not. Data deduplication has emerged as a key technology in the effort to dramatically reduce the amount of data stored and the cost associated with storing it. Deduplication is the art of intelligently reducing storage needs by an order of magnitude better than common data compression techniques, through the elimination of redundant data so that only one instance of a data set is actually stored.

IBM has the broadest portfolio of deduplication solutions in the industry, giving us the freedom to solve customer issues with the most effective technology. Whether it is source or target, inline or post-process, hardware or software, disk or tape, IBM has a solution with the technology that best solves the problem. This IBM Redbooks publication covers the current deduplication solutions that IBM has to offer:

- IBM ProtecTIER Gateway and Appliance
- IBM Tivoli Storage Manager
- IBM System Storage N series Deduplication
management systems administration, and mass storage/SAN/NAS administration and architecture.

Rucel F. Javier is an IT Specialist from IBM Philippines, Inc. He is a Field Technical Support Specialist for the Systems and Technology Group (STG), providing pre-sales and post-sales support. He has worked in the IT industry for 9 years and is an IBM Certified Specialist in System x, Blade Servers/Centers, and Midrange Systems Storage. He has worked on the implementation and support of N series, DS3000, and DS5000 storage systems and has extensive experience with Linux operating systems. Rucel also performs proofs of concept (POC), product presentations, and solution design for customer requirements.
Thanks to the following people for their contributions to this project:

Dan Edwards
IBM Global Technology Services

Shayne Gardener
IBM Software Group, Tivoli

Mikael Lindstrom
GTS Services Delivery

Craig McAllister
IBM Software Group, Tivoli

Norbert Pott
Tivoli Storage Manager V6.1 Technical Guide authors
Dave Accordino
IBM Systems & Technology Group, Systems Hardware Development

Denise Brown
IBM Systems & Technology Group, Systems Hardware Development, ProtecTIER Test

Joe Dain
IBM Systems & Technology Group, Systems Hardware Development, TS7600 Development

Mary Lovelace
IBM International Technical Support Organization

Gerard Kimbuende
IBM Systems & Technology Group, Systems Hardware Development

Carlin Smith
IBM Advanced Technical Skills (ATS) - Americas
Comments welcome
Your comments are important to us! We want our books to be as helpful as possible. Send us your comments about this book or other IBM Redbooks publications in one of the following ways:

- Use the online Contact us review Redbooks form found at: ibm.com/redbooks
- Send your comments in an email to: [email protected]
- Mail your comments to: IBM Corporation, International Technical Support Organization, Dept. HYTD, Mail Station P099, 2455 South Road, Poughkeepsie, NY 12601-5400
Part 1. Introduction to Deduplication
In this part we introduce deduplication from an industry and IBM point of view.
Chapter 1. Introduction to Deduplication
Business data growth rates will continue to increase rapidly in the coming years. Likewise, retention and retrieval requirements for new and existing data will expand, driving still more data to disk storage. As the amount of disk-based data continues to grow, there is ever-increasing focus on improving data storage efficiencies across the Information Infrastructure. Data reduction is a tactic that can decrease the disk storage and network bandwidth required, lower the Total Cost of Ownership (TCO) for storage infrastructures, optimize the use of existing storage assets, and improve data recovery infrastructure efficiency. Deduplication and other forms of data reduction are features that can exist within multiple components of the Information Infrastructure. IBM offers a comprehensive set of deduplication solutions. IBM has been the industry leader in storage efficiency solutions for decades. IBM invented Hierarchical Storage Management (HSM) and the progressive incremental backup model, greatly reducing the primary and backup storage needs of its customers. Today, IBM continues to provide its customers the most efficient data management and data protection solutions available. N series, ProtecTIER, and IBM Tivoli Storage Manager are excellent examples of IBM's continued leadership.
1.2.1 Chunking
Chunking refers to breaking data into some sort of standardized unit that can be examined for duplicates. Depending on the technology and locality of the deduplication process, these units can be files or more granular components such as blocks. File-level deduplication is generally less versatile than other means of deduplication, but if the deduplication platform is format-aware, it can potentially identify explicit components within certain files, such as embedded objects. Block-level deduplication is generally data-agnostic in the sense that it compares blocks regardless of the file type, application, or OS that the data originated from. Some handling of data, such as compartmentalization or encryption, can cause identical files to have mismatched blocks, which reduces the efficacy of block-level deduplication. Block-based deduplication is more granular, but it can take more processing power and can require a larger index or catalog to track the individual pieces.

There are four methods of data chunking, and each method influences the data deduplication ratio:

- File based: one chunk is one file. File-based chunking is typically used by devices that only have file system visibility.
- Block based: the data object is chunked into blocks of fixed or variable size. Block-based chunking is typically used by block storage devices.
- Format aware: understands explicit data formats and chunks the data object according to the format, for example, breaking a PowerPoint presentation into separate slides.
- Format agnostic: chunking is based on an algorithm that looks for logical breaks or similar elements within a data object.
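To make the first two chunking methods concrete, here is a minimal Python sketch of file-based and fixed-size block chunking. The 4 KB block size and the function names are illustrative assumptions made for this example only; they do not represent the implementation of any specific IBM product.

```python
# Illustrative chunking helpers (not tied to any specific product).

def file_based_chunks(file_bytes: bytes) -> list:
    """File-based chunking: the whole file is a single chunk."""
    return [file_bytes]

def block_based_chunks(file_bytes: bytes, block_size: int = 4096) -> list:
    """Fixed-size block chunking: split the data into 4 KB blocks."""
    return [file_bytes[i:i + block_size]
            for i in range(0, len(file_bytes), block_size)]

data = b"example payload " * 1024          # 16 KB of sample data
print(len(file_based_chunks(data)))        # 1 chunk (the whole file)
print(len(block_based_chunks(data)))       # 4 chunks of 4 KB each
```

The trade-off described above is visible even in this sketch: the block-based variant produces many more chunks to index, but any one of them can match a block from an unrelated file.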
1.2.2 Processing
Each chunk of data must be identified in a way that is easily comparable. Chunks are processed using a parity calculation or a cryptographic hash function. This processing gives the chunks shorter identifiers known as hash values, digital signatures, or fingerprints. These fingerprints can be stored in an index or catalog where they can be compared quickly with other fingerprints to find matching chunks. In rare cases where hash values are identical but reference non-identical chunks, a hash collision occurs, which can lead to data corruption. To avoid this scenario, a secondary comparison should be done to verify that hash-based duplicate chunks are in fact redundant before moving on to consolidation. Some deduplication technologies do a byte comparison of chunks after a fingerprint match to avoid hash collisions. Processing of chunks is generally the most CPU-intensive part of deduplication and can impact I/O if done inline.

There are three methods to differentiate a chunk, and each method influences the duplicate identification performance:

- Hashing: computes a hash (MD5, SHA-1, SHA-2) for each data chunk and compares it with the hashes of existing data; an identical hash means most likely identical data. Hash collisions (identical hash but non-identical data) must be prevented through a secondary comparison (additional metadata, a second hash method, or a binary comparison).
- Binary comparison: compares all bytes of similar chunks.
- Delta differencing: computes a delta between two similar chunks, where one chunk is the baseline and the second chunk is the delta. Because each delta is unique, there is no possibility of collision. To reconstruct the original chunk, the deltas have to be re-applied to the baseline chunk.
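The sketch below, a simplified Python illustration, shows fingerprinting with SHA-256 and a byte-for-byte secondary check against hash collisions. The in-memory dictionary index and the function names are assumptions made for the example; real products use their own index structures and hash choices.

```python
import hashlib

def fingerprint(chunk: bytes) -> str:
    # A cryptographic hash serves as the chunk's digital fingerprint.
    return hashlib.sha256(chunk).hexdigest()

def is_duplicate(chunk: bytes, index: dict) -> bool:
    """Return True if an identical chunk is already recorded in the index."""
    fp = fingerprint(chunk)
    if fp not in index:
        index[fp] = chunk               # first occurrence: remember it
        return False
    # Secondary byte-for-byte comparison guards against hash collisions.
    return index[fp] == chunk

index = {}
print(is_duplicate(b"4 KB block of data", index))   # False: new chunk
print(is_duplicate(b"4 KB block of data", index))   # True: duplicate detected
```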
1.2.3 Consolidation
Once duplicate chunks have been clearly compared and identified, the pointers to those chunks must be changed so they point to a single unique copy rather than multiple duplicate chunks. Once the pointers are consolidated, the now extraneous data chunks can be released. In cases of inline processing, the duplicate chunks may never be written to the physical disk storage.
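As a sketch of the consolidation idea, the toy reference-counted chunk store below keeps one physical copy per unique chunk and releases it only when no pointer references it any longer. The class and its methods are invented for this illustration and are not the data structures of any IBM product.

```python
import hashlib

class ChunkStore:
    """Toy deduplicating store: one physical copy per unique chunk."""

    def __init__(self):
        self.chunks = {}     # fingerprint -> the single stored copy
        self.refcount = {}   # fingerprint -> number of logical references

    def write(self, chunk: bytes) -> str:
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in self.chunks:
            self.chunks[fp] = chunk                      # store the unique copy once
        self.refcount[fp] = self.refcount.get(fp, 0) + 1
        return fp                                        # caller keeps this "pointer"

    def release(self, fp: str) -> None:
        self.refcount[fp] -= 1
        if self.refcount[fp] == 0:                       # no pointers left: free the chunk
            del self.chunks[fp]
            del self.refcount[fp]

store = ChunkStore()
p1 = store.write(b"same data")
p2 = store.write(b"same data")       # duplicate: no second physical copy is kept
print(len(store.chunks))             # 1
```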
1.3 Architecture
This section offers a high level description of the deduplication architecture.
IBM Tivoli Storage Manager can do client-based (ITSM client) and server-based (ITSM server) deduplication processing. ProtecTIER and N series do storage-system-based deduplication processing.
Inline Deduplication
Inline deduplication refers to handling processing and consolidation before data is written to disk. Hashing and the hash comparison must be done on the fly, which can add performance overhead. If a byte-for-byte comparison must be done to avoid hash collisions, still more overhead is needed. When deduplicating primary data where milliseconds are significant, inline deduplication is generally not recommended. A benefit of inline deduplication is that duplicate chunks are never written to the destination disk system.
Postprocess Deduplication
Postprocess deduplication refers to handling processing and consolidation after the data has been written to disk, generally with scheduled or manual runs. This allows more control over the performance impact of deduplication to avoid peaks or conflicts with other data I/O processes such as backups or DR replication. The duplicate data will consume disk space until deduplication runs. Buffer capacity must be more rigorously maintained to accommodate the more dynamic utilization. This method will likely be the only option for processing data already in place on the disk storage systems prior to bringing deduplication technology into the environment.
Chapter 2. Introduction to N series Deduplication
2.1 How does deduplication for IBM System Storage N series storage system work
Regardless of operating system, application, or file system type, all data blocks are written to a storage system using a data reference pointer, without which the data could not be referenced or retrieved. In traditional (non-deduplicated) file systems, data blocks are stored without regard to any similarity with other blocks in the same file system. In Figure 2-1, five identical data blocks are stored in a file system, each with a separate data pointer. Although all five data blocks are identical, each is stored as a separate instance and each consumes physical disk space.
In a deduplicated N series file system, two new and important concepts are introduced:

- A catalog of all data blocks is maintained. This catalog contains a record of all data blocks, using a hash or fingerprint that identifies the unique contents of each block.
- The file system is capable of allowing many data pointers to reference the same physical data block.

Cataloging data objects, comparing the objects, and redirecting reference pointers form the basis of the deduplication algorithm. As shown in Figure 2-2, referencing several identical blocks with a single master block allows the space that is normally occupied by the duplicate blocks to be given back to the storage system.
Hashing
Data deduplication begins with a comparison of two data blocks. It would be impractical (and very arduous) to scan an entire data volume for duplicate blocks each time new data is written to that volume. For that reason, deduplication creates a small hash value for each new block and stores these values in a catalog. A hash value, also called a digital fingerprint or digital signature as shown in Figure 2-3, is a small number that is generated from a longer string of data. A hash value is substantially smaller than the data block itself, and is generated by a mathematical formula in such a way that it is unlikely (although not impossible) for two non-identical data blocks to produce the same hash value.
Figure 2-3 A hash value is a digital fingerprint that represents a much larger object
The deduplication process works as follows:

1. The fingerprint catalog is sorted and searched for identical fingerprints.
2. When a fingerprint match is made, the associated data blocks are retrieved and compared byte-for-byte.
3. Assuming successful validation, the inode pointer metadata of the duplicate block is redirected to the original block.
4. The duplicate block is marked as free and returned to the system, eligible for reuse.
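The following Python sketch mirrors these four steps on a toy block map: it builds and sorts a fingerprint catalog, validates candidate matches byte-for-byte, redirects the duplicate's pointer, and frees the duplicate block. Block numbers and data structures are invented for the illustration and do not reflect the internal Data ONTAP or WAFL structures.

```python
import hashlib

def dedupe_pass(blocks: dict, block_map: dict) -> list:
    """blocks: block number -> 4 KB of data; block_map: logical -> physical block.
    Returns the physical block numbers freed by this pass."""
    catalog = sorted((hashlib.sha256(data).hexdigest(), number)
                     for number, data in blocks.items())     # sorted fingerprint catalog
    freed, canonical, previous_fp = [], None, None
    for fp, number in catalog:
        if fp == previous_fp and blocks[number] == blocks[canonical]:  # byte-for-byte check
            block_map[number] = canonical    # redirect the pointer to the original block
            freed.append(number)             # the duplicate block becomes free space
        else:
            canonical, previous_fp = number, fp
    for number in freed:
        del blocks[number]
    return freed

blocks = {1: b"A" * 4096, 2: b"A" * 4096, 3: b"B" * 4096}
block_map = {n: n for n in blocks}
print(dedupe_pass(blocks, block_map))        # [2]: block 2 now points at block 1
```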
Hash catalog
A catalog of hash values is used to identify candidates for deduplication. A system process identifies duplicates, and data pointers are modified accordingly. The advantage of catalog deduplication is that the catalog is used only to identify duplicate objects; it is not accessed during the actual reading or writing of the data objects. That task is still handled by the normal file system data structure as shown in Figure 2-4 on page 12.
Figure 2-4 Catalog indexing: the file system controls block sharing of deduplicated blocks
Deduplication is an IBM System Storage N series storage efficiency offering that provides block-level deduplication within the entire flexible volume on IBM System Storage N series storage systems. Beginning with Data ONTAP 7.3, IBM System Storage N series gateways also support deduplication. IBM System Storage N series gateways are designed to be used as a gateway system that sits in front of third-party storage, allowing IBM System Storage N series storage efficiency and other features to be used on third-party storage. Figure 2-5 on page 12 shows how IBM System Storage N series deduplication works at the highest level.
Essentially, deduplication stores only unique blocks in the flexible volume and creates a small amount of additional metadata during the process. Notable features of deduplication are that:

- It works with a high degree of granularity (that is, at the 4 KB block level).
- It operates on the active file system of the flexible volume. Any block referenced by a Snapshot is not made available until the Snapshot is deleted.
- It is a background process that can be configured to run automatically, can be scheduled, or can run manually through the command-line interface (CLI).
- It is application-transparent, and therefore it can be used for data originating from any application using the IBM System Storage N series storage system.
- It is enabled and managed through a simple CLI.
- It can be enabled on (and can deduplicate blocks on) flexible volumes with new and existing data.

In summary, deduplication works as follows:

1. Newly saved data on the IBM System Storage N series storage system is stored in 4 KB blocks as usual by Data ONTAP.
2. Each block of data has a digital fingerprint, which is compared to all other fingerprints in the flexible volume.
3. If two fingerprints are found to be the same, a byte-for-byte comparison is done of all bytes in the block. If an exact match is found between the new block and the existing block on the flexible volume, the duplicate block is discarded and its disk space is reclaimed.
In Figure 2-6, the number of physical blocks used on the disk is three (instead of five), and the number of blocks saved by deduplication is two (five minus three). In the remainder of this chapter, these blocks are referred to as used blocks and saved blocks. This is called multiple block referencing. Each data block has a block-count reference kept in the volume metadata. As additional indirect blocks (shown as IND in Figure 2-6) point to the data, or existing blocks stop pointing to the data, this value is incremented or decremented accordingly. When no indirect blocks point to a data block, it is released. The Data ONTAP deduplication technology allows duplicate 4 KB blocks anywhere in the flexible volume to be deleted, as described in the following sections. The maximum sharing for a block is 255. For example, if there are 500 duplicate blocks, deduplication reduces them to only two physical blocks. Also note that this ability to share blocks is different from the ability to keep 255 Snapshot copies for a volume.
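The 255-way sharing limit translates into simple arithmetic: the number of physical blocks that remain after deduplicating a set of identical blocks is the block count divided by 255, rounded up. The small helper below only illustrates that calculation; the constant and function name are chosen for the example.

```python
import math

MAX_SHARING = 255   # maximum number of references to a single shared block

def physical_blocks_needed(identical_blocks: int) -> int:
    """Physical copies that remain after deduplicating a set of identical blocks."""
    return math.ceil(identical_blocks / MAX_SHARING)

print(physical_blocks_needed(500))   # 2, matching the 500-block example above
print(physical_blocks_needed(255))   # 1
```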
If the amount of new data is small, run deduplication infrequently, because there is no benefit to running it frequently in such a case, and it consumes CPU resources. How often you run it depends on the rate of change of the data in the flexible volume. The more concurrent deduplication processes you are running, the more system resources are consumed. Given this information, the best option is to perform one of the following actions:

- Use the auto mode so that deduplication runs only when significant additional data has been written to each particular flexible volume (this approach tends to naturally spread out when deduplication runs).
- Stagger the deduplication schedule for the flexible volumes so that it runs on alternate days.
- Run deduplication manually.
If Snapshot copies are required, run deduplication before creating the Snapshot to minimize the amount of data before the data gets locked into the copies. (Make sure that deduplication has completed before creating the copy.) Creating a Snapshot on a flexible volume before deduplication has a chance to run and complete on that flexible volume can result in lower space savings. If Snapshot copies are to be used, the Snapshot reserve should be greater than zero (0). An exception to this might be in an FCP or iSCSI LUN scenario, where it is often set to zero for thin-provisioning reasons. For deduplication to run properly, you have to leave some free space for the deduplication metadata.

Deduplication is tightly integrated with Data ONTAP and the WAFL file structure. Because of this integration, deduplication is performed with extreme efficiency. Complex hashing algorithms and lookup tables are not required. Instead, deduplication uses the internal characteristics of IBM System Storage N series storage systems with the NearStore option to create and compare digital fingerprints, redirect data pointers, and free up redundant data areas, as shown in Figure 2-7.
Chapter 3. Introduction to ProtecTIER deduplication
Front end
The connection between the ProtecTIER system and the backup server is referred to as a front end connection.
Back end
The connection between the ProtecTIER system and the disk array is referred to as a back end connection.
Node
An IBM System x3850 X5 is viewed as a node by the ProtecTIER Manager software. You can have a single node (one IBM System x3850 X5) or a two-node cluster (two IBM System x3850 X5 servers).
Metadata
Metadata is the data used to keep track of your backup data, including where it is stored on the disk.
User data
User data is the backup files and data sets stored on the virtual tape library. It is the data that you are storing on disk.
Nominal capacity
The amount of user data that ProtecTIER is managing.
Physical capacity
The physical disk space available within the array.
Factoring ratio
The factoring ratio refers to the ratio of nominal capacity to physical capacity. For example, if you have 100 TB of user data (nominal capacity) and it is stored on 10 TB of physical capacity, your factoring ratio is 10 to 1.
Repository
The repository is the physical disk that holds the ProtecTIER factored data. There are two types of file systems that make up the ProtecTIER Repository: User Data and Meta Data.
Disk array
The disk array attaches to the IBM System Storage TS7600 with ProtecTIER through back end connections and holds the repository or cache of factored backup data.
Change rate
The percentage of change in data from one backup set to the next. The daily change rate is the percentage of change from one day's backup cycle to the next. For example, if the daily change rate is 10%, it means that only 10% of the backed-up data changes from one day to the next.
Disaster recovery
Disaster recovery (DR) is the process of recovering production site data from a remote location. It includes a way to indicate to a remote repository that the production site has gone down.
Failover
The process of failover is undertaken when continued operations at the source location are no longer possible and a disaster is declared.
Failback
A process that is initiated from the remote site when the source site is again able to continue production operations, and therefore backup processes. The process ensures that the paired repositories are resynchronized using the least amount of bandwidth while maintaining the most recent copies of backups. After the failback process, the operational norms that existed prior to the DR event resume.
Principality/ownership
An attribute indicating the repository in which an individual cartridge can be updated or written by a backup application. A cartridge at its principal repository can be read/write (R/W) or read-only (R/O); at other sites it is R/O. A cartridge can have principality/ownership turned on for only one site.
Replication
A process that transfers logical objects like cartridges from one ProtecTIER repository to another. The replication function allows ProtecTIER deployment to be distributed across sites. Each site has a single or clustered ProtecTIER environment. The ProtecTIER server that is a
part of the replication grid has two dedicated replication ports: Eth3 and Eth4 are used for replication. Replication ports are connected to the customer's WAN and are configured on two subnets by default.
Shelf
A container of virtual tape library (VTL) cartridges within a ProtecTIER repository. This is analogous to a shelf or a rack where physical tapes are kept when outside an automated tape library.
TS7650
When used alone, this term signifies the IBM open systems family of virtualization solutions that operate on the ProtecTIER platform.
(Figure: a TS7650G attached through an FC switch to back-end disk arrays that hold the repository of factored data.)
With this approach, HyperFactor is able to surpass the reduction ratios attainable by any other data reduction method. HyperFactor can reduce any duplicate data, regardless of its location or how recently it was stored. Unlike hash-based techniques, HyperFactor finds duplicate data without needing exact matches of chunks of data. When new data is received, HyperFactor checks to see if similar data has already been stored. If similar data has already been stored, then only the difference between the new data and previously stored data needs to be retained. Not only is this an effective technique for finding duplicate data, but it performs very well. In Figure 3-1, with HyperFactor deduplication, when new data is received, HyperFactor looks for data similarities and checks those similarities in the Memory Resident Index. When similarity matches are found, the existing similar element is read from disk and a binary differential is performed on the similar elements. Unique data with corresponding pointers is stored in the repository and the Memory Resident Index is updated with the new similarities. Existing data is not stored.
(Figure 3-1: HyperFactor deduplication of a new data stream: 1. Look through the data for similarity. 2. Read the elements that are most similar. 3. Diff the reference with the version; several elements (Element A, Element B, Element C) may be used.)
HyperFactor data deduplication uses a 4 GB Memory Resident Index to track similarities for up to 1 petabyte (PB) of physical disk in a single repository. Depending on the data deduplication ratio for your data, you could store much more than one PB of data on your disk array. For example, with a ratio of 12 to 1, you could store 12 PB on that one PB of disk array. With the Memory Resident Index, HyperFactor can identify potentially duplicate data quickly for large amounts of data and does this on data ingest, or inline, reducing the amount of processing required for your data. The read-back rate of the ProtecTIER deduplication technology is generally faster than the write rate to the system, since there is no risk of fragmentation, and no access to the index or heavy computation is required during a restore activity. A restore simply opens the metadata files and fetches the data according to the pointers they contain.
The amount of space saved is a function of many factors, but mostly of the tape processing policies, the retention periods, and the variance of the data between them. Over time, the effect of HyperFactor is a system-wide factoring ratio. The factoring ratio is derived by dividing the total data before reduction by the total data after reduction.
Example: with a total of 250 TB of data before reduction and 10 TB after reduction, 250 TB / 10 TB = 25:1 HyperFactor ratio. ProtecTIER expresses data reduction as a HyperFactor ratio. To translate a ratio into the percentage of storage saved, Table 3-1 shows several examples:
Table 3-1 HyperFactor ratio translated to space saved %
HyperFactor ratio     Space saved
2:1                   50%
4:1                   75%
10:1                  90%
20:1                  95%
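The arithmetic behind Table 3-1 is simple. The following minimal Python sketch (illustrative only, not a ProtecTIER utility) converts a factoring ratio into the percentage of space saved and derives the ratio from before/after capacities:

def space_saved_percent(factoring_ratio: float) -> float:
    # Space saved = 1 - (data after reduction / data before reduction)
    return (1.0 - 1.0 / factoring_ratio) * 100.0

def factoring_ratio(before_tb: float, after_tb: float) -> float:
    # Total data before reduction divided by total data after reduction
    return before_tb / after_tb

for ratio in (2, 4, 10, 20):
    print(f"{ratio}:1 -> {space_saved_percent(ratio):.0f}% saved")  # 50, 75, 90, 95
print(f"{factoring_ratio(250, 10):.0f}:1 HyperFactor ratio")        # 25:1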
The factoring ratio of your data depends heavily on two key variables:
1. Data retention period: the period of time (usually measured in days) that defines how long customers keep their disk-based backups online. This period typically ranges from 30 to 90 days, but can be much longer.
2. Data change rate: the rate at which the data received from the backup application changes from backup to backup. This measurement has the most relevance when like backup policies are compared. (Data change rates might range from 1% to more than 25%, but are difficult to observe directly.)
See more details about the factoring ratio in the section Factoring ratio on page 20.
The data flow is as follows:
1. A new data stream is sent to the ProtecTIER server, where it is first received and analyzed by HyperFactor.
2. For each data element in the new data stream, HyperFactor searches the Memory Resident Index in ProtecTIER to locate the data in the repository that is most similar to the data element.
3. The similar data is read from the repository.
4. A binary differential between the new data element and the data from the repository is performed, resulting in the delta difference.
5. The delta from step 4 is written to the disk repository after being processed with delta compression, which behaves like the Lempel-Ziv-Haruyasu (LZH) compression algorithm. With LZH compression, additional size reduction might be achieved for delta data. Some size reduction might also be accomplished for new data (such as the initial backup of unique new data) through this compression.
6. The Memory Resident Index is updated with the location of the new data that has been added.
The Memory Resident Index is written to the metadata file system frequently. Once duplicate data has been identified, the Memory Resident Index is not needed to read the data. This eliminates the concern that access to the data might be compromised if the Memory Resident Index were corrupted or lost. Because the Memory Resident Index is only used for data deduplication on data ingest, data accessibility remains if the index is lost. The Memory Resident Index is restored from the metadata file system if needed. If the Memory Resident Index were lost and restored, any index updates for deduplication that occurred in the window between the last index save and the index loss would be unavailable, and new data could not be compared for any similarities developed during that very short window. The only impact would be a slight, probably unmeasurable, reduction in the overall deduplication ratio.
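Purely as an illustration, the following Python sketch mimics this ingest flow with stand-ins; the similarity signature, the byte-level diff, and zlib compression below are hypothetical simplifications, not HyperFactor internals:

import zlib

class SimilarityIndex:
    # Stand-in for the Memory Resident Index: maps a coarse similarity
    # signature of a data element to the location of similar data on disk.
    def __init__(self):
        self.signatures = {}

    def _signature(self, element: bytes) -> int:
        # Hypothetical coarse signature; HyperFactor uses its own scheme.
        return hash(element[:256])

    def lookup(self, element: bytes):
        return self.signatures.get(self._signature(element))

    def update(self, element: bytes, location: int) -> None:
        self.signatures[self._signature(element)] = location

def binary_diff(old: bytes, new: bytes) -> bytes:
    # Toy byte-level delta; a real implementation stores only the differences
    # plus pointers back to the existing element.
    padded = old.ljust(len(new), b"\x00")[:len(new)]
    return bytes(a ^ b for a, b in zip(padded, new))

def ingest(element: bytes, index: SimilarityIndex, repository: list) -> int:
    # Returns the number of physical bytes written for this element.
    # For simplicity, the repository list here holds the logical elements.
    location = index.lookup(element)                  # step 2: search the index
    if location is not None:
        similar = repository[location]                # step 3: read the similar element
        delta = binary_diff(similar, element)         # step 4: binary differential
        written = zlib.compress(delta)                # step 5: delta compression
    else:
        written = zlib.compress(element)              # new data is compressed and stored
    repository.append(element)
    index.update(element, len(repository) - 1)        # step 6: update the index
    return len(written)

index, repo = SimilarityIndex(), []
first = b"customer database backup " * 1000
second = first[:-100] + b"slightly changed tail bytes " * 4
print(ingest(first, index, repo))    # first copy: roughly the compressed size of the data
print(ingest(second, index, repo))   # similar data: only a small compressed delta is written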
(Figure: ProtecTIER metadata structures. A database keeps track of the number of times a storage location on the User disk is referenced by virtual volumes; virtual volume metadata files, stored on Meta Data LUNs, track where logical blocks were stored on the User disk; and user data written by backup applications is stored in locations on the User Data RAID LUNs.)
ProtecTIER uses the metadata files to read back the virtual cartridges. When a virtual cartridge is overwritten or deleted, the reference count for each segment of user data on a virtual cartridge is decremented. Once the reference count reaches zero, the space occupied by that user data segment is released. ProtecTIER uses the Global File System (GFS) to allow multiple references to the same data.
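A minimal sketch of reference-counted block sharing of this kind (illustrative only, not the GFS implementation):

class BlockStore:
    # Many virtual cartridges can reference the same user-data segment;
    # the space is released only when the reference count reaches zero.
    def __init__(self):
        self.blocks = {}     # segment id -> data
        self.refcount = {}   # segment id -> number of cartridge references

    def reference(self, segment_id, data=None):
        if segment_id not in self.blocks:
            self.blocks[segment_id] = data
            self.refcount[segment_id] = 0
        self.refcount[segment_id] += 1

    def release(self, segment_id):
        self.refcount[segment_id] -= 1
        if self.refcount[segment_id] == 0:     # no cartridge references remain
            del self.blocks[segment_id]        # segment returns to free space
            del self.refcount[segment_id]

store = BlockStore()
store.reference("S2", b"...")   # cartridge A writes segment S2
store.reference("S2")           # cartridge B deduplicates against S2
store.release("S2")             # cartridge A is overwritten
store.release("S2")             # cartridge B is deleted; the S2 space is freed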
- Backup-to-disk as a NetBackup client
- Single point of control of NetBackup policies, cartridge sets, and pools
- Full support via API to simplify writing and managing backup data to disk
- Connects to NetBackup via a plug-in on the media server
- Provides a new interface for the storage device, called a Logical Storage Unit (LSU)
- LSUs can be duplicated, moved, and shared by multiple NetBackup media servers
(Figure: OpenStorage configuration. A backup server running the NBU IBM plug-in connects over an IP network to the ProtecTIER server and its repository.)
storage block identified. After all references in a metadata file are processed, the metadata file is deleted. Each time the reference count of a storage block goes to zero, that storage block is returned to the pool of free blocks and becomes usable as free space.
Not all units of free space are usable in ProtecTIER. The smallest unit of usable, or allocatable, space in the repository is 1 MB, while a storage block is 16 KB. As storage blocks are freed as a result of an overwrite of a virtual cartridge, some of the freed space will be in amounts of less than 1 MB of contiguous blocks, and the system must defragment the file system. ProtecTIER keeps one block group free for defragmentation. The active blocks in a single block group are copied to contiguous space in the free block group, essentially defragmenting that block group. The block group from which the data was copied becomes the new free block group. All of this processing occurs in the background and can occur while new data is written to new metadata files associated with the virtual cartridge being overwritten.
Once the system enters steady state, ProtecTIER manages four types of I/O activity to the disk: the two types of I/O performed as all virtual cartridges were filled (standard backup activity, as described previously), plus the new additional work of:
- Reading and deleting old metadata files
- Moving data from fragmented block groups to free space in block groups
To prevent the defragmentation operations from impacting performance, they are only allowed a maximum of 15% of the total input/output operations per second (IOPS) of the system. From a performance standpoint, when the ProtecTIER system first begins ingesting data, ProtecTIER is not matching new data to existing data and no fragmentation is occurring. This enables high performance. Once steady state is achieved, performance stabilizes. Figure 3-7 is a conceptualization of the performance from initial implementation through steady state. The change in performance and the time to reach steady state depend on the size of the repository and the operational characteristics of your data.
Figure 3-7 ProtecTIER performance from initial implementation through steady state
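Purely as an illustration of the block-group defragmentation described above (not ProtecTIER's on-disk logic), a small Python sketch of moving active blocks into the reserved free block group:

def defragment(block_groups, free_group):
    # Active blocks from the most fragmented group are copied contiguously into
    # the reserved free group; the emptied group becomes the new free group.
    source = max(block_groups, key=lambda group: group.count(None))
    active = [block for block in source if block is not None]
    free_group[:] = active + [None] * (len(free_group) - len(active))
    source[:] = [None] * len(source)        # the source is now the new free group
    return source                           # reserved for the next defragmentation pass

groups = [["a", None, "b", None], ["c", "d", "e", None]]
reserved = [None] * 4
reserved = defragment(groups, reserved)     # "a" and "b" now sit contiguously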
Chapter 4.
Comparison of data reduction techniques in IBM Tivoli Storage Manager:

Client compression
- How it reduces data: compresses file data on the client before it is sent to the server
- Data supported: backup, archive, HSM, API
- Scope of data reduction: redundant data within the same file on the client node
- Avoids storing identical files renamed, copied, or relocated on the client node: No
- Removes redundant data for files from different client nodes: No
- Available: prior to V6

Incremental forever
- How it reduces data: only new or changed files are backed up after the initial backup
- Data supported: backup
- Scope of data reduction: files that do not change between backups
- Avoids storing identical files renamed, copied, or relocated on the client node: No
- Removes redundant data for files from different client nodes: No
- Available: prior to V6

Subfile backup
- How it reduces data: backs up only the parts of a file that change between backups
- Data supported: backup (Windows only)
- Scope of data reduction: subfiles that do not change between backups
- Avoids storing identical files renamed, copied, or relocated on the client node: No
- Removes redundant data for files from different client nodes: No
- Available: prior to V6

Server-side deduplication
- How it reduces data: the server eliminates redundant data chunks
- Data supported: backup, archive, HSM, API
- Scope of data reduction: redundant data from any files in the storage pool
- Avoids storing identical files renamed, copied, or relocated on the client node: Yes
- Removes redundant data for files from different client nodes: Yes
- Available: 6.1

Client-side deduplication
- How it reduces data: the client and server eliminate redundant data chunks
- Data supported: backup, archive, API
- Scope of data reduction: redundant data from any files in the storage pool
- Avoids storing identical files renamed, copied, or relocated on the client node: Yes
- Removes redundant data for files from different client nodes: Yes
- Available: 6.2
Deduplication is a technique that allows more data to be stored on disk. It works by removing duplicates in the stored version of your data. In order to do that, the deduplication system has to process the data into a slightly different form. When you need the data back, it can be reprocessed into the same form as it was originally submitted. With deduplication, the larger the quantity being deduplicated, the more opportunity exists to find similar patterns in the data, and the better the deduplication ratio can theoretically be. While all workloads may benefit from deduplication, the effectiveness varies with data type, data retention, change rate, and backup policies. Deduplication requires that there are similarities in the data being deduplicated: for example, if a single file exists more than once in the same store, it can be reduced to one copy plus a pointer for each deduplicated version (this is often referred to as a single instance store). Other workloads, such as uncompressible and non-repeated media (JPEGs, MPEGs, MP3s, or specialist data such as geo-survey data sets), will not produce significant savings in space consumed. This is because the data is not compressible, has no repeating segments, and has no similar segments.
In many situations, deduplication works better than compression against large data sets, because even with data that is otherwise uncompressible, deduplication offers the potential to efficiently store duplicates of the same compressed file. To sum up, deduplication typically allows for more unique data to be stored on a given amount of media, at the cost of the additional processing on the way into the media (during writes) and the way out (during reads). Figure 4-2 illustrates a data deduplication overview.
(Figure 4-2: Data deduplication overview. 1. Files are divided into data chunks, which are evaluated to determine a unique signature for each. 2. Signature values are compared to identify all duplicates. 3. Duplicate data chunks are replaced with pointers to a single stored chunk in the TSM storage pool, saving storage space.)
# csum -h MD5 tivoli.tsm.devices.6.1.2.0.bff
05e43d5f73dbb5beb1bf8d370143c2a6 tivoli.tsm.devices.6.1.2.0.bff
# csum -h SHA1 tivoli.tsm.devices.6.1.2.0.bff
0d14d884e9bf81dd536a9bea71276f1a9800a90f tivoli.tsm.devices.6.1.2.0.bff
A typical method of deduplication is to logically separate the data in a store into manageable chunks, produce a hash value for each chunk, and store those hash values in a table. When new data is taken in (ingested) into the store, the hash value of each new chunk is compared with the table, and where there is a match, only a small pointer to the first copy of the chunk is stored instead of the new data itself. Typical chunk sizes can be anywhere in the range of 2 KB to 4 MB (an average chunk size is 256 KB). There is a trade-off to be made with chunk size: a smaller chunk size means a larger hash table, so if the chunk is too small, the table of hash pointers can grow large enough to outweigh the space saved by deduplication. A larger chunk size means that, to gain savings, the data must have larger sections of repeating patterns, so while the hash-pointer table will be small, the deduplication will find fewer matches.
The hashes used in deduplication are similar to those used in security products; MD5 and SHA-1 are both commonly used cryptographic hash algorithms, and both are used in deduplication products, along with other more specialized, customized algorithms. Before Tivoli Storage Manager chunks the data at a bit-file object level, it calculates an MD5 digest of the objects in question, which are then sliced up into chunks. Each chunk has an SHA-1 hash associated with it, which is used for the deduplication. The MD5 digests are there to verify that objects submitted to the deduplication system are reformed correctly: the MD5 is recalculated and compared with the saved one to ensure that returned data is correct.
With any hash, there is a possibility of a collision, which is the situation when two chunks with different data happen to have the same hash value. This possibility is extremely remote: in fact, the chance of this happening is less likely than the undetected, unrecovered hardware error rate. Other methods exist in the deduplication technology area that are not hash based, and so do not have any logical possibility of collisions. One such method is HyperFactor, which is implemented in the IBM ProtecTIER storage system and explained in Chapter 3, Introduction to ProtecTIER deduplication on page 19.
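A minimal Python sketch of that scheme (illustrative only; the chunk size and in-memory table are simplifications, not the Tivoli Storage Manager implementation):

import hashlib

CHUNK_SIZE = 256 * 1024          # illustrative average chunk size from the text
chunk_table = {}                 # SHA-1 digest -> single stored copy of the chunk

def ingest(obj: bytes):
    # Store an object, keeping only one copy of each identical chunk.
    object_md5 = hashlib.md5(obj).hexdigest()       # used later to verify reassembly
    recipe = []                                     # ordered chunk digests (pointers)
    for i in range(0, len(obj), CHUNK_SIZE):
        chunk = obj[i:i + CHUNK_SIZE]
        digest = hashlib.sha1(chunk).hexdigest()
        if digest not in chunk_table:               # new chunk: store it once
            chunk_table[digest] = chunk
        recipe.append(digest)                       # duplicates cost only a pointer
    return object_md5, recipe

def restore(object_md5: str, recipe: list) -> bytes:
    obj = b"".join(chunk_table[d] for d in recipe)
    assert hashlib.md5(obj).hexdigest() == object_md5   # verify the reformed object
    return obj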
Note: ITSM does not deduplicate random-access disk storage pools (DISK type device class), nor tape storage pools. The primary storage pool must be a sequential-access disk storage pool (FILE type device class) that is enabled for data deduplication. Data deduplication can be enabled by the IBM Tivoli Storage Manager administrator on each storage pool individually, so it is possible to deduplicate those types of data which will benefit the most, as opposed to everything. Figure 4-3 shows how server-side deduplication works.
(Figure 4-3: Server-side deduplication. 1. Client1 backs up files A, B, C and D; files A and C have different names but the same data. 2. Client2 backs up files E, F and G; file E has data in common with files B and G. 3. A TSM server process chunks the data and identifies the duplicate chunks, which are replaced with pointers to a single stored copy on the server volumes.)
When migrating or copying from storage pools with the FILE device class type to a tape storage pool, ITSM does not store the deduplicated version of the data on the tape device class devices. Instead, it reconstructs the data so that full copies are stored on tape. This is to aid processing for recoveries, which could otherwise become difficult to manage.
When restoring or retrieving files, the client node queries for and displays files as it normally does. If a user selects a file that exists in a deduplicated storage pool, the server manages the work of reconstructing the file.
Client-side data deduplication uses the following process:
- The client creates extents. Extents are parts of files that are compared with other file extents to identify duplicates.
- The client and server work together to identify duplicate extents.
- The client sends non-duplicate extents to the server.
- Subsequent client data-deduplication operations create new extents. Some or all of those extents might match the extents that were created in previous data-deduplication operations and sent to the server. Matching extents are not sent to the server again.
Figure 4-4 shows how client-side deduplication works.
(Figure 4-4: Client-side deduplication. 1. Client1 backs up files A, B, C and D, which are all different files with different chunks. 2. Client2 backs up files E, F and G; files E and F have data in common with files B and C. 3. The TSM server and client identify the duplicate chunks using a hash index database built on the client, and duplicate data is not sent to the server. 4. An identify-duplicates process is not needed to recognize duplicate data; reclamation processing is only needed when dead space results from deleted data.)
With client-side data deduplication, you can:
- Exclude specific files on a client from data deduplication.
- Enable a data deduplication cache that reduces network traffic between the client and the server. The cache contains extents that were sent to the server in previous incremental backup operations. Instead of querying the server for the existence of an extent, the client queries its cache. You can specify a size and location for the client cache. If an inconsistency between the server and the local cache is detected, the local cache is removed and repopulated.
- Enable both client-side data deduplication and compression to reduce the amount of data that is stored by the server. Each extent is compressed before being sent to the server. The trade-off is between storage savings and the processing power that is required to compress client data. In general, if you compress and deduplicate data on the client
system, you are using approximately twice as much processing power as data deduplication alone. You enable client-side data deduplication using a combination of settings on the client node and the ITSM server. The primary storage pool that is specified by the copy group of the management class associated with the client data must be a sequential-access disk storage pool (FILE type device class) that is enabled for data deduplication. For more information about client-side deduplication, please refer to the Tivoli Storage Manager Backup-archive clients documentation located at: http://publib.boulder.ibm.com/infocenter/tsminfo/v6r2/topic/com.ibm.itsm.ic.doc/we lcome.html
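As a rough sketch of the client-side flow with a local extent cache (the names and structures below are hypothetical stand-ins, not the ITSM client code or options):

import hashlib

class DedupClient:
    # Illustrative client-side deduplication with a local extent cache.
    def __init__(self, server_extents: set):
        self.server_extents = server_extents   # digests the server already stores
        self.cache = set()                     # extents sent in earlier backups

    def backup(self, extents) -> int:
        sent = 0
        for extent in extents:
            digest = hashlib.sha1(extent).hexdigest()
            if digest in self.cache:           # answered locally, no server query
                continue
            if digest not in self.server_extents:
                self.server_extents.add(digest)   # only non-duplicate extents travel
                sent += 1
            self.cache.add(digest)
        return sent

client = DedupClient(server_extents=set())
print(client.backup([b"extent-1", b"extent-2", b"extent-1"]))  # 2 extents sent
print(client.backup([b"extent-1", b"extent-3"]))               # only 1 new extent sent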
Part 2
Chapter 5.
Data ONTAP 7.3.1 and later provide checkpoint restarts: if the deduplication process is interrupted, it can later restart and continue processing the same change log from a checkpoint rather than from the beginning of the last run. Data ONTAP 8 no longer requires the NearStore license for deduplication; only the A-SIS license is needed. See the Licensing section for more details, and see the relevant sections below for Data ONTAP compatibility with other functions.
Both of the licenses are available independently at no additional cost; contact your procurement representative for ordering details. With HA-paired filers, all partnered nodes should have the licenses installed. For licensing requirements with SnapMirror source and destination filers, see the SnapMirror section below. Before removing the A-SIS license, you must disable deduplication on any flexible volumes for which it is enabled. If you attempt to remove the license without first disabling deduplication, you receive a warning message asking you to disable the feature. If licensing is removed or expired, no additional deduplication can occur and no deduplication commands can be run. However, deduplicated volumes remain deduplicated, the existing storage savings are kept, and all data is still accessible.
Note: Prior to ONTAP 8, HA pairs were referred to as Active/Active Configurations. Standard and Mirrored HA pairs support deduplication with the same hardware and ONTAP requirements as standalone systems. The requisite deduplication licenses must be installed on all paired nodes. When no takeover has been done, deduplication on each node works independently; the standard deduplication requirements and limitations apply to each node separately.
Upon takeover, deduplication processing stops for the unavailable partner node's volumes. Change logging continues on the active partner until the change log is full. All data continues to be accessible regardless of change log availability. Deduplication operations for volumes owned by the active partner continue as scheduled, but the CPU overhead can have a bigger impact in conjunction with the increased general workload from the takeover. Because of the increased workload during takeover, the best practice recommendation is to disable deduplication on the active partner until after giveback. After giveback occurs, data in the change logs is processed during the next deduplication run (scheduled or manual). Stretch and Fabric MetroClusters support deduplication starting with Data ONTAP 7.3.1.
Note: When you run a deduplication scan on a volume with the 16-GB Performance Acceleration Module installed, you obtain suboptimal space savings.
Deduplication metadata within the volume can be locked into a Snapshot copy when the copy is created. In Data ONTAP 7.2.x, all of the deduplication metadata resides within the volume and is susceptible to this Snapshot locking. Starting with Data ONTAP 7.3.0, the deduplication change logs and fingerprint database were relocated to the aggregate level, outside of the volume. The temporary metadata files that are still in the volume are deleted when the deduplication operation completes. This change enables deduplication to achieve higher space savings with Data ONTAP 7.3 when Snapshot copies are used.
For deduplication to provide the greatest storage efficiency when used in conjunction with Snapshot copies, consider the following best practices during planning:
- Remove old Snapshot copies maintained in deduplicated volumes to release duplicate data locked in Snapshot copies taken prior to deduplication.
- Reduce the retention time of Snapshot copies maintained in deduplicated volumes.
- Run deduplication before creating new Snapshot copies, and make sure deduplication has enough time to complete so that the temporary metadata is cleared.
- Schedule deduplication only after significant new data has been written to the volume.
- Configure the appropriate reserve space for the Snapshot copies.
- If the snap reserve is zero, turn off the Snapshot auto-create schedule (this is the case in most LUN deployments).
- If the space used by Snapshot copies grows to more than 100%, reports obtained by running the df -s command show incorrect results, because some space from the active file system is being taken by Snapshot copies, and the actual savings from deduplication are therefore not reported.
To run deduplication with volume SnapMirror:
- Deduplication is only enabled, run, scheduled, and managed from the primary location. The flexible volume at the secondary location inherits all of the attributes and storage savings through SnapMirror.
- Only unique blocks are transferred, so deduplication reduces network bandwidth usage too.
- The maximum volume size limit is imposed based on the lower of the maximum volume size limits of the source and destination volumes.
Licensing
The A-SIS license must be installed at the primary location (source) but is not strictly required at the destination. As a best practice, the A-SIS license should also be installed on the destination filer if there is ever the intention of having it run deduplication on volumes locally (example: if the destination filer ever becomes the primary such as in a DR scenario). The NearStore license must be installed on both the source and destination with Data ONTAP versions prior to version 7.3.1. Starting with Data ONTAP 7.3.1, the NearStore license is no longer required on the destination system. With Data ONTAP 8, no NearStore license is required for deduplication on either the source or destination.
SnapMirror licensing requirements are unchanged from the standard SnapMirror requirements when used concurrently with deduplication.
Scheduling
When configuring both volume SnapMirror and deduplication for a volume, consider the scheduling of each. The processes are not aware of each other, and scheduling must be managed independently. As a best practice, start VSM of the intended volume only after deduplication of that volume has completed, to avoid capturing undeduplicated data and deduplication metadata in the VSM Snapshot copy and then replicating that extra data over the network and onto the destination filer. The regular Snapshot and deduplication metadata storage efficiency and scheduling concerns are potentially amplified with VSM, because these regular Snapshot copies are also replicated to the destination storage system; see 5.4.4, Deduplication and Snapshot copies on page 44. Deduplication and VSM processes can potentially contend for CPU cycles if running concurrently; see 5.8, Deduplication and Performance on page 61.
A qtree can be replicated to a secondary storage system (destination) by using Qtree SnapMirror (QSM), as shown in Figure 5-2, Figure 5-3 and Figure 5-4.
Qtree SnapMirror replication from a deduplicated source volume to a non deduplicated destination volume
Qtree SnapMirror replication from a non deduplicated source volume to a deduplicated destination volume
Qtree SnapMirror replication from a deduplicated source volume to a deduplicated destination volume
Licensing
The A-SIS license must be installed on any storage systems running deduplication whether source, target, or both. The NearStore license must be installed on any storage systems running deduplication whether source, target, or both. With Data ONTAP 8, no NearStore license is required for deduplication on either the source or destination. SnapMirror licensing requirements are unchanged from the standard SnapMirror requirements when used concurrently with deduplication.
Scheduling
Deduplication savings are not transferred with Qtree SnapMirror; deduplication needs to be configured and scheduled independently at the source and destination, if desired. It can be run at either or both independently. Also, QSM and deduplication processes are not aware of each other, and scheduling for them must be managed separately. As a best practice, start deduplication of the destination qtree only after QSM of that qtree has completed. Deduplication and QSM processes can potentially contend for CPU cycles if running concurrently.
The deduplication schedule on the destination volume is linked to the SnapVault schedule: every SnapVault update (baseline or incremental) kicks off the deduplication process on the destination after the archival Snapshot copy is taken. The deduplication schedule on the source is still independent of SnapVault, just as with qtree SnapMirror. The deduplication schedule on the destination cannot be configured manually, and the sis start command is not allowed either; however, the sis start -s command can be run manually on the destination.
The archival Snapshot copy is replaced with a new one after deduplication has finished running on the destination. (The name of the new Snapshot copy is the same as that of the archival copy, but the creation time of the copy is changed.) The SnapVault update is not dependent upon completion of the deduplication operation on the destination: subsequent SnapVault incremental updates may run while the deduplication process on the destination volume from the previous backup is still in progress. In this case, the deduplication process continues to run, but the archival Snapshot copy does not get replaced after deduplication has finished running.
When using SnapVault, the maximum volume sizes for deduplication for the primary and secondary are independent of one another; volumes on each of the systems need to abide by their respective maximum volume size limits. Starting with Data ONTAP 7.3, if using SnapVault with NetBackup, block sharing is supported for partner volumes in takeover mode.
When deduplication is run on an existing SnapVault source for the first time, all saved space is transferred to the destination system, because the next SnapVault update recognizes deduplicated blocks as changed blocks. Because of this, the size of that transfer might be several times larger than the regular transfers. Running deduplication on the source system periodically helps prevent this issue for future SnapVault transfers. If possible, run deduplication before the SnapVault baseline transfer.
Protection Manager 3.8 or later can be used to manage deduplication with SnapVault. With Data ONTAP 7.3, SnapVault integration with deduplication replaces Snapshot copies. As a result, Protection Manager has to wait for deduplication to finish before renaming Snapshot copies. During the time that Protection Manager waits, it does not allow clients to list the Snapshot copies or restore from them, which can adversely affect the recovery point objective. Data ONTAP 7.3 is required for use with Open Systems SnapVault (OSSV).
Autocommit functions irrespective of the deduplication status of the files. When using Qtree SnapMirror, deduplication needs to be run separately on the source and destination. The WORM property is carried forward by qtree SnapMirror. Switching on WORM or deduplication on either end has no effect on the qtree SnapMirror transfers. Undoing deduplication will also have no effect when done on either the source or the destination. When using Volume SnapMirror, the WORM property of the files is carried forward by volume SnapMirror. Deduplication only needs to be run on the primary. Volume SnapMirror allows the secondary to inherit the deduplication. Undoing deduplication can only be done after breaking the volume SnapMirror relationship. To revert to a previous release of ONTAP on a system hosting a volume that is deduplicated and has SnapLocked data on it, the volume must first be undeduplicated.
Definitions
Several definitions are used in the examples. They are defined as follows:
Volume Space Guarantee
A volume option that specifies whether the volume's space is reserved from the aggregate's free pool at the time of the volume creation. Setting this option to none is essentially thin provisioning of the volume.
Fractional Reserve
A volume option that enables you to determine how much space Data ONTAP reserves for Snapshot copy overwrites for LUNs, as well as for space-reserved files, when all other space in the volume is used. The default is 100%. The behavior of the fractional reserve space parameter with deduplication is the same as though a Snapshot copy had been taken in the volume and blocks were being overwritten.
LUN Space Reservation
An N series LUN option that ensures 100% of the LUN's space is removed from the volume's free pool at the time the LUN is created.
Volume Free Pool
The free blocks in the parent volume of the LUN. These blocks can be assigned anywhere in the volume as necessary.
Aggregate Free Pool
The free blocks in the parent aggregate of the volume containing the LUN. These blocks can be assigned anywhere in the aggregate as necessary.
Volume space guarantees, LUN space reservations, and fractional reserves can be configured so that the use of the freed blocks by the IBM System Storage N series storage system changes depending on the configuration. By varying the values, freed blocks can be returned to the LUN overwrite reserve, the volume free pool, the aggregate free pool, or a combination.
Table 5-2 Configuration examples
Configuration A (default): LUN space reservation on; volume fractional reserve 100; volume space guarantee volume. After deduplication, freed blocks are returned to the fractional overwrite reserve.
Configuration B: LUN space reservation on; volume fractional reserve 1-99; volume space guarantee volume. Freed blocks are split between the fractional overwrite reserve and the volume free pool.
Configuration C: LUN space reservation on; volume fractional reserve 0; volume space guarantee volume. Freed blocks are returned to the volume free pool.
Configuration D: LUN space reservation off; volume fractional reserve any; volume space guarantee volume. Freed blocks are returned to the volume free pool.
Configuration E: LUN space reservation off; volume fractional reserve any; volume space guarantee none. After deduplication and thin provisioning, freed blocks are returned to the aggregate free pool.
Configuration A (default)
When a LUN containing default values is deduplicated, no apparent savings are observed by the storage administrator because the LUN was, by default, space reserved when it was created and fractional reserve was set to 100% in the volume. Any blocks freed through deduplication are allocated to the fractional reserve area. This configuration means that overwriting to the LUN should never fail, even if it is overwritten entirely.
Configuration B
The only difference between this configuration and configuration A is that the amount of space reserved for overwriting is based on the fractional reserve value set for the volume. As a result, this configuration splits the free blocks between fractional overwrite reserve and volume free space. For example, if the fractional reserve value is set to 25, then 25% of the freed blocks go into fractional overwrite reserve and 75% of the freed blocks are returned to the volume free pool.
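As a quick arithmetic sketch of that split (illustrative only, not a Data ONTAP tool):

def split_freed_blocks(freed_mb: float, fractional_reserve_pct: float):
    # Configuration B: freed blocks are divided between the fractional
    # overwrite reserve and the volume free pool by the reserve percentage.
    to_overwrite_reserve = freed_mb * fractional_reserve_pct / 100.0
    to_volume_free_pool = freed_mb - to_overwrite_reserve
    return to_overwrite_reserve, to_volume_free_pool

print(split_freed_blocks(1000, 25))   # (250.0, 750.0): 25% to reserve, 75% to free pool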
Note: If Snapshot copies are turned off for the volume (or if no Snapshot exists in the volume) and the percentage of savings is less than the fractional reserve because of deduplication, then this is not a recommended configuration for a volume with deduplication.
Configuration C
The only difference between this configuration and configuration B is that the value of fractional reserve is set to zero. As a result, this configuration forces all freed blocks to the volume free pool and no blocks are set aside for fractional reserve.
Configuration D
The difference between this configuration and configuration C is that the LUN is not space reserved. With LUN space guarantees off, the value for volume fractional reserve is ignored for all LUNs in this volume. From a deduplication perspective, there is no difference between this and the previous configuration, and all freed blocks go to the volume free pool.
Configuration E
This configuration forces the free blocks out of the volume and into the aggregate free pool, where the blocks can be reallocated for any other volumes in the aggregate.
in the aggregate. In this case, there is no fingerprint database file in the cloned volume for the data that came from the parent. However, the data in the cloned volume inherits the space savings of the original data. The deduplication process also continues for any new data written to the clone, and creates the fingerprint database for the new data. However, the deduplication process obtains space savings in the new data only, and does not deduplicate between the new data and the old data. To run deduplication for all the data in the cloned volume (and thus obtain higher space savings), use the sis start -s command. Depending on the size of the logical data in the volume, this process can take a long time to complete. Beginning with Data ONTAP 7.3.1, in addition to standard FlexClone, FlexClone at the file and LUN level is available and is allowed on deduplicated volumes. Deduplication can be used to regain capacity savings on data that was copied using FlexClone at the file or LUN level, and has been logically migrated (that is, with Qtree SnapMirror, SnapVault, NDMP dump, and so on).
Volume splitting
When a cloned volume is split from the parent volume, all of the original data in the clone is undeduplicated after the volume split operation. If deduplication is running on the cloned volume, this data gets deduplicated again in subsequent deduplication operations on the volume.
file-level operation rather than a physical block-level operation. Not having deduplicated blocks on the tapes can actually be considered an advantage in some cases: if the data on the tape were deduplicated, it would be in a proprietary format that would require Data ONTAP for restores.
Important: With VI and vSphere, the need for proper partitioning and alignment of the VMDKs is extremely important (not just for deduplication). To help prevent the negative performance impact of LUN/VMDK misalignment, refer to IBM System Storage N series with VMware ESX Server, SG24-7636. Also note that the applications in which the performance is heavily affected by deduplication (when these applications are run without VI) are likely to suffer the same performance impact from deduplication when they are run with VI. A deduplication and VMware solution with file-based NFS shares is easy and straightforward. Combining deduplication and VMware with block-based LUNs requires a bit more work. For more information about this topic, please reference 5.4.10, Deduplication and LUNs
Figure 5-5 The amount of duplicate data recovered from an actual customer's production data
Be cognizant of any application-level data management such as containerization or compression as this could potentially reduce the efficacy of deduplication block consolidation. Compressed, encrypted, containerized, block mis-aligned, or audio/video data will likely have low returns with deduplication. Be sure to determine if potential deduplication savings will offset deduplication capacity overhead requirements at the volume and aggregate levels.
Encrypted Data
Encryption removes data redundancy. As a result, encrypted data usually yields extremely low deduplication savings, if any. Because encryption can potentially be run at the share level, it is possible to create a flexible volume where only part of the data on the volume is encrypted and it is still possible to deduplicate the rest of the volume effectively.
Volatile Data
Volumes containing highly volatile data may not be good candidates for deduplication because of the unpredictable capacity impact. Also, deduplication may run more frequently than desired if the auto schedule is used.
Tip: Remember that deduplication is volume-specific. Try to group similar data within the same volume to achieve optimal results.
5.7.1 Metadata
Although deduplication can provide storage savings in many environments, a small amount of storage capacity overhead is associated with it. This fact should be considered when planning for deduplication. Metadata includes the fingerprint database, the change logs, and temporary data:
- Fingerprint database: a fingerprint record exists for every 4 KB data block. The fingerprint records for all of the data blocks in the volume are stored in the fingerprint database file.
- Change logs: the size of the deduplication change log files depends on the rate of change of the data and on how frequently deduplication is run.
- Temporary files: when deduplication is running, it creates several temporary files. These temporary metadata files are deleted when the deduplication process finishes running.
Note: The amount of space required for deduplication metadata is dependent on the amount of data being deduplicated within the volumes, and not the size of the volumes or the aggregates.
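As a rough, purely hypothetical illustration of why metadata scales with the deduplicated data rather than with volume size (the bytes-per-record figure below is an assumed placeholder, not a published value):

BLOCK_SIZE = 4 * 1024            # a fingerprint record exists for every 4 KB block
ASSUMED_BYTES_PER_RECORD = 32    # placeholder assumption, not a product specification

def fingerprint_overhead_gb(deduplicated_data_gb: float) -> float:
    # Rough fingerprint-database estimate for a given amount of deduplicated data.
    records = deduplicated_data_gb * 1024**3 / BLOCK_SIZE
    return records * ASSUMED_BYTES_PER_RECORD / 1024**3

# 1 TB of deduplicated data needs the same fingerprint space whether it sits in
# a small volume or a large one; only the amount of data matters.
print(f"{fingerprint_overhead_gb(1024):.1f} GB")   # 8.0 GB under this assumption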
Chapter 6.
ProtecTIER planning
In this chapter, we discuss configuration and sizing considerations and give you detailed planning information to prepare for a smooth implementation of the TS7600 family in your environment. You will find the following topics in this chapter:
- Hardware planning
- Planning for deduplication
- Planning for Open systems with Virtual Tape Library (VTL)
- Planning for Open systems with OpenStorage (OST)
- Planning for installation
6.1 Hardware planning for the 3959-SM1, 3958-AP1, 3958-DD4, and 3958-DE2
In this section, we describe which options and features can be configured with the IBM System Storage TS7610 ProtecTIER Deduplication Appliance Express (3959-SM1), the IBM System Storage TS7650 ProtecTIER Deduplication Appliance (3958-AP1), the IBM System Storage TS7650G ProtecTIER Deduplication Gateway (3958-DD4), and the IBM System Storage TS7680 ProtecTIER Deduplication Gateway for System z (3958-DE2). We provide information about which configuration options are available and useful. This section discusses the following topics:
- Hardware and software components
- Configuration options
installed on a workstation, you can start to work on the 3958-AP1. Once your host is attached and zoned to the 3958-AP1, you can set up your application on your host.
The IBM System Storage TS7650G ProtecTIER Deduplication Gateway (3958-DD4) can be ordered with IBM storage; by combining the advantages of IBM disk and tape storage subsystems, a highly reliable solution is delivered. The 3958-DD4 can also be ordered without back-end disk storage, and it also supports non-IBM disk storage.
Note: For a list of disk subsystems that are supported by the TS7650G, refer to the interoperability matrix: ftp://service.boulder.ibm.com/storage/tape/ts7650_support_matrix.pdf Referring to the TS7650G in the System Storage Interoperation Center (SSIC) provides a list of supported environments: http://www-03.ibm.com/systems/support/storage/config/ssic/index.jsp
The IBM System Storage TS7680 ProtecTIER Deduplication Gateway for System z is a 1-rack (19 inch) high availability solution. It comes with a redundant pair of Enterprise Tape Controllers for mainframe host (z/OS) attachment and two clustered ProtecTIER servers running the deduplication engine. The TS7680 comes without disk storage, allowing clients to use IBM or non-IBM disk storage as back-end disk storage.
The functionality of the 3959-SM1, 3958-AP1, 3958-DD4, and 3958-DE2 is identical, but there are differences in performance and maximum capacity. Table 6-1 shows the specifications for a single server; in the case of the 3958-DE2, the values are for its two ProtecTIER servers.
Table 6-1 Differences for the 3959-SM1, 3958-AP1, 3958-DD4 and 3958-DE2

3959-SM1 (Open systems host type)
- Processor cores: 4 (2 threads per core with Hyper-Threading)
- Memory: 24 GB
- Virtual libraries: up to 4
- Virtual tape drives: up to 64
- Virtual cartridges: up to 8,192
- Supported virtual library emulations: ATL P3000, DTC VTF 0100, IBM TS3500, IBM V-TS3500

3958-AP1 (Open systems host type)
- Processor cores: 16 (32 threads)
- Memory: 32 GB
- Virtual libraries: up to 12
- Virtual tape drives: up to 256
- Virtual cartridges: up to 128,000
- Supported virtual library emulations: ATL P3000, DTC VTF 0100, IBM TS3500, IBM V-TS3500

3958-DD4 (Open systems host type)
- Processor cores: 32 (64 threads)
- Memory: 64 GB
- Virtual libraries: up to 16
- Virtual tape drives: up to 256
- Virtual cartridges: up to 500,000
- Supported virtual library emulations: ATL P3000, DTC VTF 0100, IBM TS3500, IBM V-TS3500

3958-DE2 (Mainframe host type)
- Processor cores: 48 (2 threads per core with Hyper-Threading)
- Memory: 64 GB
- Virtual libraries: 1
- Virtual tape drives: 256
- Virtual cartridges: 1,000,000
- Supported virtual library emulation: IBM C06-PT library
(Table 6-1, continued: additional per-model rows cover physical capacity (up to 1 PB for the gateway models), support with the IBM TS3500 virtual tape library definition and multipath driver (c), two-node cluster configuration (a), IP-based replication configuration, disaster recovery failover, flexible disk-based storage option, sustained in-line throughput (d), data reduction (d), preinstalled disk storage, and whether the server(s) come in a rack.)
a. Only available in dual node configuration
b. With Hyper-Threading (2 threads per core)
c. Not for the Quantum P3000 virtual library
d. Depending on the back-end disk storage and workload
manufacturing. The IBM System Storage TS7610 has native Reliability, Availability, and Serviceability (RAS) software, so there is no need for additional hardware such as a TS3000 System Console. The RAS software is also installed during manufacturing, but you have to complete the final setup, such as providing details for e-mail or SNMP traps. See section 9.1.1, TS7610 SMB Appliance on page 172. The Entry Edition Small is the 4 TB version. It can easily be upgraded to 5.4 TB, which is the Entry Edition Medium. Because the appliance already contains all disk drives, no physical changes are required, only a software update. The software media is shipped based on the upgrade order, which is Feature Code #9314. The upgrade is a customer-driven procedure. Figure 6-1 on page 69 shows a view of the Appliance Express.
a one-unit (1U) System x server that allows an IBM System Service Representative (SSR) to perform maintenance and, if enabled by the customer, the TS3000 System Console (TSSC) can remotely monitor the installation and automatically call home with any hardware errors.
The 3958-DD4 requires the purchase of the following additional hardware components (one or more frames, one or more disk arrays, and expansions) to be fully functional:
- The disk array is the term used in this document to refer to a disk storage subsystem.
- The expansion refers to a disk expansion attached to the disk array.
- A frame is a 19-inch rack supplied by the customer and used to house the Gateway servers and the TS3000 System Console. A second frame can be used to house disk arrays and expansions.
Note: Disk arrays, expansions, and frames are required but are not included with the 3958-DD4.
The supported disk arrays have the following characteristics:
- Support for the 3958-DD4 server operating system at the correct update level.
- Dual active-active controllers, for compatibility with the Linux Multipath software included in the 3958-DD4 server operating system to allow path failover.
- Fibre Channel or SATA disk systems.
- Support for the back-end Fibre Channel Host Bus Adapter (HBA) brand, model, and firmware level installed on the Gateway server. The back-end HBAs are used to direct-attach or SAN-attach the 3958-DD4 server to the disk array. In the case of SAN attachment, the disk array must also support the fabric switches used. In some cases (for example, XIV) direct attach is not supported.
- The array should not perform its own compression by default. ProtecTIER does not require additional compression to be effective; ProtecTIER performs compression, by default, after the deduplication process.
The front view of the 3958-DD4 server is shown in Figure 6-3 on page 71.
an IBM System Service Representative (SSR) to perform maintenance and, if enabled by the client, the TS3000 System Console (TSSC) can remotely monitor the installation and automatically call home with any hardware errors. A wide variety of disk-based storage can be attached to the TS7680; you can check the list of supported disk for the TS7650G. See Figure 6-5 on page 73 for a logical layout of the IBM System Storage TS7680 ProtecTIER Deduplication Gateway for System z.
(Figure 6-5: Logical layout of the TS7680. A redundant pair of Enterprise Controllers connects over Fibre Channel and IP to the clustered ProtecTIER servers, with the TSSC and back-end disk.)
The capacity of the ProtecTIER repository consists of the factored backup streams and the metadata that describes the factored backup streams, so it is fundamental to have the proper amount of back-end disk capacity as part of the ProtecTIER system configuration. The capacity reduction effect of deduplication is expressed as a deduplication ratio or factoring ratio. In essence, the deduplication ratio is the ratio of nominal data (the sum of all user data backup streams) to the physical storage used (including all user data, metadata, and spare capacity, that is, the total amount of disk storage for which the user pays). Deduplication is most effective in environments in which there is a high degree of data redundancy; for example, a 10:1 deduplication ratio implies that the system is able to find redundancy in 90% of the data received. Backup workloads are the best fit, because the backup process creates very high levels of redundancy:
- Weekly full backups
- Daily incremental or differential backups
- Longer retention times increase the number of copies, and therefore the amount of data redundancy
In Open Systems, 100% of ProtecTIER systems are deployed in backup environments. See details in 6.3, Planning for Open systems with VTL on page 75, and 6.4, Planning for Open systems with OST on page 101. Mainframe workload profiles vary significantly; often, a repository consists of a large variety of applications and data types, which has implications for planning and sizing the repository. See Appendix, ProtecTIER deduplication, on page 290.
(Figure 6-7: Key elements within the TS7650G implementation planning phase. Pre-implementation planning covers 1. assessment of customer objectives, 2. assessment of the current backup and recovery environment, and 3. environment interoperability qualification (CEV form), supported by tools such as the ProtecTIER Data Collection Form (Environmental Survey, Data Protection Survey), and is followed by capacity planning and performance planning.)
You must understand how the product itself calculates the deduplication ratio and presents and displays the data. ProtecTIER calculates the deduplication ratio by comparing the
nominal data sent to the system to the physical capacity used to store the data. This information is displayed through the ProtecTIER Manager GUI and through other ProtecTIER utilities. Table 6-2 describes general customer profiles for Open systems, giving a high-level overview and a suggestion for a suitable ProtecTIER product. Because the appliances have predefined capacity and performance, you have to make sure that your needs are covered by them. Note: In the case of the appliances, upgrades can be done only within the same appliance family, either TS7610 or TS7650. In the case of the TS7650G (and the TS7680 for System z), there is no predefined capacity; it has to be sized in advance. A maximum of 1 PB of physical storage can be attached to a TS7650G or TS7680 system. See more details about the TS7680 in Appendix, ProtecTIER deduplication, on page 290.
Table 6-2 General customer profiles for Open systems

TS7610 Small Appliance (4 TB physical capacity, up to 80 MB/s) and TS7610 Medium Appliance (5.4 TB physical capacity, up to 80 MB/s):
- 500 GB or less of incremental backups per day
- 1-3 TB of full backups each week
- Experiencing solid data growth
- Looking to make backup and recovery improvements without making radical changes

TS7650 Appliance (7 TB physical capacity, up to 150 MB/s):
- 1 TB or less of incremental backups per day
- 1-3 TB of full backups each week
- Experiencing average data growth
- Needs a cost-effective solution

TS7650 Appliance (18 TB physical capacity, up to 250 MB/s):
- 3 TB or less of incremental backups per day
- 3-6 TB of full backups each week
- Experiencing rapid data growth
- Needs good performance to meet the backup window

TS7650 Appliance (36 TB physical capacity, up to 500 MB/s):
- 5 TB or less of incremental backups per day
- 5-12 TB of full backups each week
- Additional growth expected
- Meeting the backup window is an issue; higher performance needed

TS7650G Gateway (up to 1 PB physical capacity, up to 1200 MB/s):
- More than 5 TB of backups per day
- Meeting the backup window is an issue; needs the highest performance
- Experiencing rapid data growth; needs scalability
Table 6-3 illustrates how physical space consumption is derived from the three general factors: backup size, retention, and data change rate. Table 6-3 also suggests which appliance model could be a good fit for each of the described cases.
Table 6-3 Suggested appliance models for different backup size, retention, and data change rate
Daily Backup (GB)  Retention (Days)  Data Change Rate  Required Physical Space (GB)  Dedup Ratio (x:1)  Suggested Model
300                30                15%               1388                          6.5                TS7610 4TB
300                30                20%               1750                          5.1                TS7610 4TB
300                60                15%               2513                          7.2                TS7610 4TB
300                60                20%               3250                          5.5                TS7610 4TB
300                90                15%               3638                          7.4                TS7610 5.4TB
300                90                20%               4750                          5.7                TS7610 5.4TB
500                30                15%               2313                          6.5                TS7610 4TB
500                30                20%               2917                          5.1                TS7610 4TB
500                60                15%               4188                          7.2                TS7610 5.4TB
500                60                20%               5417                          5.5                TS7650 7TB
500                90                15%               6063                          7.4                TS7650 7TB
500                90                20%               7917                          5.7                TS7650 18TB
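The dedup ratio column is simply the nominal data held online (daily backup multiplied by retention) divided by the required physical space. A small Python check of that relationship (not a sizing tool; real sizing also factors in data change rate and workload):

def factoring_ratio(daily_backup_gb: float, retention_days: int,
                    physical_space_gb: float) -> float:
    # Nominal data kept online divided by the physical space that holds it.
    nominal_gb = daily_backup_gb * retention_days
    return nominal_gb / physical_space_gb

print(round(factoring_ratio(300, 30, 1388), 1))   # 6.5, first row of Table 6-3
print(round(factoring_ratio(500, 90, 7917), 1))   # 5.7, last row of Table 6-3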
What is your estimated annual data growth rate? What are your expected capacity savings with a deduplication solution? What are the possible changes that you plan to make in your current backup architecture?
For example, you might have a current backup environment like this:
- LAN-free backup to the physical tape library for databases
- Disk-to-disk-to-tape (D2D2T) for file servers, web servers, mail servers, and application servers
- A disaster recovery solution based on remote tape vaulting by truck
You want to change your environment to this configuration:
- LAN-free backup to the virtual tape library for databases
- Disk to virtual tape library for file servers, web servers, mail servers, and application servers. For file servers with small files, you might choose to perform NDMP image backups to the VTL or have the backup application copy its disk storage pool to the VTL.
- Disaster recovery solutions based on remote virtual tape vaulting via replication
By greatly reducing the amount of data that is stored through the factoring process, only a fraction of the original data must be replicated to protect against disaster. With the reduction in the amount of data, the required bandwidth and disk storage are greatly minimized. As a result, the IBM System Storage TS7600 with ProtecTIER provides recovery from online disk, and recovery can be fast, reliable, and manageable.
After the requirements of the environment are well understood, the capabilities of a given solution must be assessed. This assessment might have two stages:
- An evaluation of the characteristics of the solution itself
- Actual testing of the system in a live environment
- The length of time that you would like to retain backup data on disk at the local site and at the DR site
- The profile of applications that are being backed up
- Other unique elements of your requirements
Throughput considerations
The IBM System Storage TS7600 with ProtecTIER is a virtual tape library with enterprise-scale in-band factoring, which means that all data reduction occurs in real time as the backups are running (this is in contrast to post-processing, in which backups are first written to disk and then factored at a later point). The in-band factoring approach has many advantages, but it also requires the appropriate level of hardware and proper configuration to achieve optimal performance. Properly configured, a single Enterprise-level IBM System Storage TS7650G with a ProtecTIER node is capable of achieving sustained throughput rates of up to 500 MBps in live production environments. Using a two-node clustered configuration, the IBM System Storage TS7650G with ProtecTIER can achieve sustained throughput rates of up to 900 MBps for backup and 1200 MBps for restore. The actual performance that any given environment achieves depends on several variables that we cover in this section. The purpose of this section is to discuss considerations that can impact throughput performance, measured in megabytes per second (MBps), when testing and deploying the IBM System Storage TS7600 with ProtecTIER. The following three components play a role in the overall system throughput that ProtecTIER can achieve:
- SAN connectivity
- Disk array
- Data type (also called backup policy)
For each component, we list the best practices for optimal performance.
SAN connectivity
For the best SAN connectivity:
- Make sure that the fabric switches are at the latest firmware revision of their operating system (contact the manufacturer or reseller).
- IBM System Storage TS7600 with ProtecTIER front-end ports should not be in a zone with any other IBM System Storage TS7600 with ProtecTIER front-end ports.
- If possible, dedicated Host Bus Adapter (HBA) ports in the backup server should be zoned to a single IBM System Storage TS7600 with ProtecTIER and its front-end ports.
- Ensure that Inter-Switch Links (ISLs) between switches connected to IBM System Storage TS7600 with ProtecTIER ports and backup servers or storage arrays are not oversubscribed. ISL links for DS4000 and DS5000 are not recommended.
- Use at least 4 Gbps HBAs for the TS7650G back-end and front-end connections.
- If using a SAN point-to-point topology to connect the TS7650G to the disk array, create dedicated zones (one zone per initiator) for the ProtecTIER back-end ports. It is best if each zone has only one initiator and one target port, creating overlapping zones. Do not mix the ProtecTIER back-end ports (QLogic) with the front-end ProtecTIER ports (Emulex) or any other SAN devices in the same zone.
Disk array
A critical hardware component in a ProtecTIER implementation is the disk array that holds the ProtecTIER repository (see Figure 6-8 on page 81). The repository is the physical disk that holds the ProtecTIER HyperFactored data. Two types of file systems make up the ProtecTIER repository:
- Metadata
- User data
See more details about setting up the disk array in section 6.3.3, Performance Sizing on page 93.
Figure 6-8 Repository of ProtecTIER: metadata and user data
Metadata file systems store all aspects of the data that is backed up and cataloged, but not the data itself, whether it requires new disk space or not. The user data file systems store the actual data that is backed up or referenced by new generations of the data. It is critical that the performance of the metadata file system be optimal. In general, we recommend RAID 10 RAID groups (4+4, 6+6, or 8+8 disks) for the meta data file systems. See details in section Meta Data on page 99. Note: The configuration of the disk array is the variable that has the greatest impact on overall system performance. Tuning the array for the unique ProtecTIER I/O pattern is critical. ProtecTIER is random read oriented. Eighty to ninety percent of I/O in a typical ProtecTIER system environment is random reads at 60KB block size. Therefore, any storage array deployed with ProtecTIER should be optimized for this I/O pattern. In all cases, the disk array manufacturer (or reseller) should be consulted to determine the best tuning and configuration parameters for the particular array being deployed. The user data LUNs should be tuned for a random-read intensive workload, while the metadata LUNs should be tuned for a random-write intensive workload. In 6.3.3, Performance Sizing on page 93 we describe implementation considerations related to the disk array configuration.
Data type
The other factor that affects performance in a ProtecTIER system environment is the data that is being targeted for backup. Some data, such as databases and Lotus Notes email, is highly compressible and also factors quite well. Other data, such as video or seismic data, cannot be compressed or factored well. Also, the various backup applications have features, such as encryption or multiplexing, that can also affect the ProtecTIER factoring ratio and performance. The type of data that ProtecTIER systems are HyperFactoring can affect both
the factoring ratio and system performance. This section also includes general system testing and benchmarking considerations. See Table 6-4 on page 82.
Table 6-4 Sample composition of Backup Policies, or Data types
Data Type                  Example
Production Database        DB2
Data Warehouse             Informix
Email                      Lotus Notes
File & Print Services      Windows or Unix
Tip: Accurate planning assesses the capacity and performance requirements for each data type separately.

Consider the following for throughput:
- If multiple backup streams of the same data set are used for testing, we recommend that a single copy of the data be backed up first to populate the repository.
- If encryption features of the backup application are turned on, the factoring ratio and performance of these data sets will degrade drastically.

Note: Always encrypt last. Deduplicating encrypted data is ineffective, and compressing encrypted data can decrease security. Drive-level encryption has no performance impact and ensures that encryption is the last action performed.

- Compression should be disabled in the backup application or database backup program. Compression is common with SQL database backups using LiteSpeed. Data should be sent to the ProtecTIER servers uncompressed, or the factoring ratio for this data will be low. ProtecTIER can manage multiple VTLs, each with its own configuration; for compressed data streams, create a new ProtecTIER VTL with compression turned off. Compressing data a second time can cause data expansion, so compressed data should be segregated in ProtecTIER whenever possible.

Note: Compaction is also a type of data compression.

- Multiplexing features of the backup application or database backup tool (RMAN or similar) should be disabled. The backup application should send only one stream to each virtual tape drive. Because ProtecTIER systems allow up to 256 virtual tape drives per node for the IBM System Storage TS7600 with ProtecTIER, the system can process many streams. For example, before the IBM System Storage TS7650 and TS7650G with ProtecTIER, RMAN sent nine streams (three file streams to each of three real tape drives). With the IBM System Storage TS7650 and TS7650G, the parameters should be adjusted to send one file stream each to nine virtual tape drives. This does not affect database backup speed (it might even improve it).
- Small files do not factor as well as larger files. For best system performance, in test or in production, at least 24 data streams of backup data sets should be run at the same time. This takes advantage of the fact that ProtecTIER systems can process 24 storage units at the same time. The storage unit is one of the four allocation entities that ProtecTIER systems use for abstracting the physical disk to provide contiguous logical space when the actual physical space is fragmented. Other options to improve factoring for small files (less than 32 KB) are listed below:
- For files residing on NAS boxes, perform NDMP image backups and send the backups to ProtecTIER.
- File-level backups should first be backed up to the backup application's disk storage pools, and then the disk storage pool can be copied to ProtecTIER.

Tip: In general, the more virtual tape drives defined, the better. IBM System Storage TS7600 with ProtecTIER is optimized for a large number of virtual drives.
Definitions
Data change rate     The rate at which data received from the backup application changes from backup to backup. This measurement is more relevant when like policies are compared. Data change rates can be from 1-25% and are best calculated with tools like the IBM TPC for Data product.
Retention period     The period in time (usually in days) that defines how long customers will keep their disk-based backups online. Retention periods on average are 30-90 days, but can be longer depending on business and government regulations.
All the information required in the data protection survey is fundamental to sizing the solution, and the data change rate is the most important variable in determining the size of the ProtecTIER repository. The data change rate can be an estimate or can be measured through a site assessment. The estimated or measured data change rate and all the information gathered by the data protection survey provides an estimate or measurement for the factoring ratio, which is defined as the ratio of nominal capacity (the sum of all user data backup streams) to the physical capacity used (including all user data, metadata, and spare capacity, that is, the total amount of disk storage for which the user pays). In the following section, we discuss formulas and worksheets necessary to calculate the factoring ratio and the corresponding physical capacity to purchase for sustaining a certain amount of nominal capacity. Figure 6-9 on page 84 shows the overview about inputs and outputs of the IBM Capacity Planner tool.
Figure 6-9 IBM Capacity Planner tool overview. Input from customer: backup size (full/incremental), backup frequency (weekly/daily), retention period (days), and backup window (hours). Output: nominal capacity (TB), factoring ratio (HF), required user data repository size (TB), and required performance (MB/s).
In production environments, the ProtecTIER Repository will be a blend of many backup policies (data types) that protect many different application and data environments. Each backup policy has two variables that primarily influence the realized factoring ratio (and the subsequent physical storage requirements for the ProtecTIER Repository):
- The data change rate
- The data retention period
The values of these variables differ across the various backup policies and associated data sets.
Note: Each policy can be said to have its own unique factoring ratio and nominal and physical storage capacities.

The key task in capacity planning is to determine the physical storage required for all data types used in the analysis. This is done by first determining the nominal and physical storage capacities required for each data type and totaling these values for all data types. After a total nominal and total physical storage capacity is calculated, a system-level factoring ratio can be calculated for the overall repository. Therefore, a weighted average change rate is calculated based on percentage estimates of each type of backup policy.

Capacity planning is both an art and a science. When sizing the ProtecTIER Repository capacity, it is important to build in some extra capacity. This allows for a margin of error and adds a buffer for scenarios that require more capacity, for example:
- You add more backup policies to your environment.
- Your backup policies grow (corporate data growth).
The size of this buffer or padding will vary from situation to situation.

Note: Adding 10% to the physical storage calculations is a good rule of thumb. If you appreciate the importance of this margin, and given the value in disk savings that ProtecTIER systems provide, the incremental cost of the disk is easily justified.
There are other factors that influence the factoring ratio:
- Use of compression or encryption software before data is deduplicated will dramatically lessen the factoring ratio.
- Type of data: typically, database files, operating system and application software packages, log files, email, user documents, and snapshots provide high factoring. For images, video, and seismic data, the factoring ratio will be lower, unless they are redundant.
- Backup application: Tivoli Storage Manager incremental-forever backups will lead to less data, and thus fewer opportunities for deduplication. Full backups will lead to a higher ratio.
- Initial factoring ratios will sometimes be lower than expected while the repository is being populated with daily and weekly backups. The expected factoring ratios will be achieved once a full 60-90 days of backups have been completed.
The nominal capacity is calculated as shown in the following formula:

NominalCapacity = FullCapacityVersions + IncrementalCapacityVersions

Where:
NominalCapacity         This parameter represents the overall capacity stored in the repository during the retention period and is composed of all the full and incremental versions stored.
FullCapacityVersions    This parameter represents the overall full backup capacity (expressed in GB) stored during the retention period. In the following formula, you can see how FullCapacityVersions depends on the FullCapacity, FullRetention, and FullFrequency parameters.
FullCapacity            This parameter represents the capacity (expressed in GB) stored during a full backup.
FullRetention           This parameter represents the retention period (expressed in days) for the full backup jobs (for example, you might decide to retain your full jobs for 30 days).
FullFrequency           This parameter indicates how often you perform the full jobs during the retention period (for example, four versions in 30 days, that is, one full job a week, so this parameter must be set to a value of 7).

Note: The number of versions is obtained by dividing FullRetention by FullFrequency.

In the following formula, you can see the relationship between these parameters:

FullCapacityVersions = FullCapacity x (FullRetention / FullFrequency)
Where:
IncrementalCapacityVersions    This parameter represents the overall incremental backup capacity (expressed in GB) stored during the retention period. In the formula below, you can see how IncrementalCapacityVersions depends on the IncrementalCapacity, IncrementalFrequency, IncrementalRetention, FullRetention, and FullFrequency parameters.
IncrementalCapacity            This parameter represents the capacity (expressed in GB) stored during incremental backup.
IncrementalRetention           This parameter represents the retention period (expressed in days) for the incremental backup jobs.
IncrementalFrequency           This parameter indicates how often you perform the incrementals during the retention period (set this parameter to 1 if you perform an incremental every day).
FullRetention                  This parameter represents the retention period (expressed in days) for the full backup jobs (for example, you might decide to retain your full jobs for 30 days).
FullFrequency                  This parameter indicates how often you perform the full jobs during the retention period (for example, four versions in 30 days, that is, one full job a week, so this parameter must be set to a value of 7).

Note: In the formula below, the number of full versions is subtracted because incremental backups are not performed on the days that full backups run.

IncrementalCapacityVersions = IncrementalCapacity x (IncrementalRetention / IncrementalFrequency - FullRetention / FullFrequency)

For the physical capacity, we have the following formula:

PhysicalCapacity = (FullPhysicalCapacity + IncrementalPhysicalCapacity) / CompressionRate

Where:
PhysicalCapacity               This parameter represents the physical capacity (expressed in GB) effectively required in the repository to satisfy the nominal capacity of your environment.
FullPhysicalCapacity           This parameter indicates the full physical capacity (expressed in GB) effectively required in the repository. In the formula below, note that the first full backup must be stored in its entirety because no data is yet in the repository, so no initial delta comparison is possible.
IncrementalPhysicalCapacity    This parameter indicates the incremental physical capacity (expressed in GB) effectively required in the repository.
CompressionRate                This parameter describes the compression rate obtainable in ProtecTIER through its delta compression. Note that it also reduces the initial backup of unique new data.

The following formula shows how to calculate the FullPhysicalCapacity parameter:

FullPhysicalCapacity = FullCapacity + (FullCapacityVersions - FullCapacity) x FullChangeRate

FullChangeRate indicates the estimated change rate between full backups in your current environment. Again, note that the first full backup must be stored in its entirety because no data is present in the repository, so no initial delta comparison is possible.
The following formula shows how to calculate the incremental physical capacity:

IncrementalPhysicalCapacity = IncrementalCapacityVersions x IncrementalChangeRate

IncrementalChangeRate indicates the estimated change rate between incremental backups in your current environment. Note that the first full backup must be stored in its entirety, because no data is present in the repository, and so it is not possible to make an initial delta comparison. Finally, the factoring ratio is shown in the following formula.
FactoringRatio = NominalCapacity / PhysicalCapacity
This formula is quite complex, but it might give you an idea of the impact of the estimated data change rate on the estimated factoring ratio. Increasing the data change rate leads to a decreased factoring ratio. Also note how the compression rate is inversely proportional to the physical capacity. Another relationship involves the nominal capacity, the retention period, and the backup frequency. Increasing the retention period or decreasing the backup frequency leads to an increasing factoring ratio.
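To make the relationships concrete, the following Python sketch implements the formulas above end to end and adds the 10% planning buffer recommended earlier. It is only an illustration of the arithmetic, not an IBM sizing tool, and the input values at the bottom are hypothetical examples.

# Sketch of the ProtecTIER capacity-planning formulas described above.
# This is not an IBM planning tool; the sample inputs are hypothetical.

def capacity_plan(full_capacity, full_retention, full_frequency,
                  incr_capacity, incr_retention, incr_frequency,
                  full_change_rate, incr_change_rate, compression_rate):
    # Versions kept online during the retention period
    full_versions = full_capacity * (full_retention / full_frequency)
    incr_versions = incr_capacity * (incr_retention / incr_frequency
                                     - full_retention / full_frequency)
    nominal = full_versions + incr_versions

    # The first full backup is stored in its entirety; later versions
    # contribute only their changed data. Compression then reduces the total.
    full_physical = full_capacity + (full_versions - full_capacity) * full_change_rate
    incr_physical = incr_versions * incr_change_rate
    physical = (full_physical + incr_physical) / compression_rate

    return nominal, physical, nominal / physical

# Hypothetical example: 5 TB weekly fulls and 500 GB daily incrementals,
# both kept for 30 days, 10%/25% change rates, 2:1 delta compression.
nominal, physical, ratio = capacity_plan(
    full_capacity=5000, full_retention=30, full_frequency=7,
    incr_capacity=500, incr_retention=30, incr_frequency=1,
    full_change_rate=0.10, incr_change_rate=0.25, compression_rate=2.0)

padded = physical * 1.10   # 10% planning buffer, as recommended earlier
print(f"Nominal: {nominal:.0f} GB, physical: {physical:.0f} GB "
      f"(with buffer: {padded:.0f} GB), factoring ratio: {ratio:.1f}:1")

With these example inputs the sketch reports a factoring ratio of roughly 7:1, which shows how strongly the change rates and compression rate drive the physical capacity requirement.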
At equilibrium, the number of cartridges that are returned to the available pool roughly equals the number required for a given day's payload. What does this mean for the backup operator? It means that capacity shortages are usually easy to predict: typically, if the number of new tapes used exceeds the number of tapes being returned to the pool, a capacity shortage is coming. One other key point to note is that in the physical tape world, early warning (EW) signals are provided to the backup application by the tape drive when a tape cartridge is nearing its end. This signal allows the backup application to change to a fresh cartridge efficiently. This EW signal is relevant to understanding IBM System Storage TS7600 with ProtecTIER capacity management.
In Table 6-7, the factoring ratio at equilibrium is less than the factoring ratio that was used when the TS7650G or TS7650 was first installed. As the table shows, there are 1000 tape cartridges, but because the factoring ratio has stabilized at a lower value (8:1 versus 10:1), the nominal capacity has decreased from 100 TB to 80 TB. To accommodate the change, the capacity per cartridge has decreased from 100 GB to 80 GB.
Table 6-7 Effect of learning algorithm with a lower than expected factoring ratio
Day      Physical capacity   Number of cartridges   Factoring ratio   Nominal capacity   Capacity per cartridge
Day 1    10 TB               1000                   10:1              100 TB             100 GB
Day 30   10 TB               1000                   8:1               80 TB              80 GB
As the cartridge size changes, the EW signal arrives sooner or later than it originally did. In the example shown in Table 6-6, the EW for each cartridge occurs 20 GB later on day 30 than on day 1, allowing more data to fit on a given cartridge. In the example shown in Table 6-7, the EW for each cartridge occurs 20 GB earlier on day 30 than on day 1, allowing less data to fit on a given cartridge. As a result of the learning algorithm, more or fewer tapes will be consumed during a given day's workload.

Note: Backup administrators for ProtecTIER must keep track of the number of cartridges, as this number is a key indicator of capacity fluctuations.
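The effect in Table 6-7 is simple arithmetic, as the short sketch below shows; the values are the ones from the table, and the calculation itself is only illustrative.

# Nominal capacity and per-cartridge size as the factoring ratio stabilizes.
# Values are taken from Table 6-7; the calculation is illustrative only.

physical_tb = 10
cartridges = 1000

for day, factoring_ratio in (("Day 1", 10), ("Day 30", 8)):
    nominal_tb = physical_tb * factoring_ratio
    per_cartridge_gb = nominal_tb * 1000 / cartridges
    print(f"{day}: nominal {nominal_tb} TB, {per_cartridge_gb:.0f} GB per cartridge")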
Once the system runs a full backup cycle (for all data sets that it will manage), the capacity changes should stabilize.
Capacity management implications: Adding new data sets to an existing IBM System Storage TS7600 with ProtecTIER
Often during the initial phase, or sometime thereafter, you will decide to send more data to a TS7650G or TS7650. While this is a common occurrence, this situation creates new implications of which the backup administrator must be aware. A new data stream will have a high change rate (given that all of the data is new to the IBM System Storage TS7600 with ProtecTIER). This causes an increase in the system-wide change rate and a decrease in the nominal capacity, because the factoring ratio is going to decrease. As the new data set runs through a full cycle, the nominal capacity might or might not return to what it was previously, depending on the data change rate of the new data set. Given the variability that is inherent in this situation, you must be aware of the phenomenon and understand the impact. The best way to add new data streams is to first sample the data to project the likely impact. In some cases, this action might create a need for more physical disk that might or might not have been built-in to the original TS7650G or TS7650 design.
The reserve margin is established during the cartridge screen of the create-library wizard. You will see the same wizard when adding new cartridges to a library. This wizard gives you the option to set the maximum cartridge growth; see Figure 6-10.

The virtual cartridge size text box is calculated dynamically, depending on the number of cartridges you add, and it does not include any reserve. If you set the maximum cartridge growth to a value lower than the calculated value, you create a reserve. In our example, it is 50 GB, or 50%. You can experiment with the numbers by adding more or fewer cartridges in the No. of cartridges field. The maximum cartridge growth means that the cartridge will report end of tape when it reaches 100 GB. Because without this setting your cartridge size would be around 150 GB, you create some reserve for the deduplication ratio to change.

Note: If you create multiple libraries, you need to calculate the total nominal capacity and then compute the appropriate number of cartridges and maximum growth ahead of time in order to maintain a reserve. The process above only applies to a new system with a single library.
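A quick way to check how much reserve a maximum-cartridge-growth setting leaves is sketched below. The 150 GB calculated cartridge size and the 100 GB cap come from the example above; the total nominal capacity and cartridge count used to reproduce them are assumptions, and the helper function itself is hypothetical.

# Reserve created by capping cartridge growth below the calculated size.
# The 150 GB calculated size and the 100 GB cap come from the example above;
# the 150 TB nominal / 1000 cartridge split is an assumed breakdown.

def cartridge_reserve(nominal_capacity_gb, num_cartridges, max_growth_gb):
    calculated_size = nominal_capacity_gb / num_cartridges   # wizard's dynamic value
    reserve_per_cart = calculated_size - max_growth_gb
    reserve_pct = reserve_per_cart / max_growth_gb * 100
    return calculated_size, reserve_per_cart, reserve_pct

calc, reserve, pct = cartridge_reserve(nominal_capacity_gb=150_000,
                                       num_cartridges=1000, max_growth_gb=100)
print(f"Calculated size: {calc:.0f} GB, reserve: {reserve:.0f} GB (~{pct:.0f}%)")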
Table 6-8 Comparison between Fibre Channel and SATA
Factor                                              Fibre Channel        SATA         SATA difference
Spin speed (RPM)                                    10,000 and 15,000    7,200
Command queuing                                     Yes, 16 max          No, 1 max
Single disk I/O rate (number of 512 byte IOPS)a     280 and 340          88           .31 and .25
Read bandwidth (MBps)                               69 and 76            60
Write bandwidth (MBps)                              68 and 71            30

a. Note that IOPS and bandwidth figures are from disk manufacturer tests in ideal lab conditions. In practice, you will see lower numbers, but the ratio between SATA and FC disks still applies.
The speed of the drive is the number of revolutions per minute (RPM). A 15 K drive rotates 15,000 times per minute. At higher speeds, the drives tend to be denser, as a large-diameter platter spinning at such speeds is likely to wobble. With the faster speeds comes the ability to have greater throughput.

Seek time is how long it takes for the drive head to move to the correct sectors on the drive to either read or write data. It is measured in thousandths of a second (milliseconds or ms). The faster the seek time, the quicker the data can be read from or written to the drive. The average seek time decreases as the speed of the drive increases. Typically, a 7.2 K drive will have an average seek time of around 9 ms, a 10 K drive around 5.5 ms, and a 15 K drive around 3.5 ms.

Command queuing allows multiple commands to be outstanding to the disk drive at the same time. The drives have a queue where outstanding commands can be dynamically rescheduled or re-ordered, along with the necessary tracking mechanisms for outstanding and completed portions of the workload. SATA disks do not have command queuing, and the Fibre Channel disks currently have a command queue depth of 16.
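As a rough planning illustration (not an IBM sizing formula), the per-disk IOPS figures from Table 6-8 can be used to estimate how many spindles a random-read workload needs; the target IOPS value below is hypothetical.

# Rough spindle-count estimate from the per-disk IOPS figures in Table 6-8.
# Illustration only; real sizing should use the IBM planning tools.

import math

DISK_IOPS = {"FC 15K": 340, "FC 10K": 280, "SATA 7.2K": 88}

target_iops = 5000          # hypothetical random-read requirement
for disk, iops in DISK_IOPS.items():
    spindles = math.ceil(target_iops / iops)
    print(f"{disk}: about {spindles} disks for {target_iops} IOPS")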
RAID
In this section we describe Redundant Array of Independent Drives (RAID), the RAID levels that you can build, and why we choose a particular level in a particular situation.
RAID 0
RAID0 is also known as data striping. It is well suited for program libraries requiring rapid loading of large tables or, more generally, applications requiring fast access to read-only data or fast writing. RAID 0 is only designed to increase performance. There is no redundancy, so any disk failures require reloading from backups. Select RAID 0 for applications that would benefit from the increased performance capabilities of this RAID level. Never use this level for critical applications that require high availability.
RAID 1
RAID 1 is also known as disk mirroring. It is most suited to applications that require high data availability, good read response times, and where cost is a secondary issue. The response time for writes can be somewhat slower than for a single disk, depending on the write policy. The writes can either be executed in parallel for speed or serially for safety. Select RAID 1 for applications with a high percentage of read operations and where cost is not a major concern.
Because the data is mirrored, the capacity of the logical drive when assigned RAID 1 is 50% of the array capacity. Here are some recommendations when using RAID 1:
- Use RAID 1 for the disks that contain your operating system. It is a good choice because the operating system can usually fit on one disk.
- Use RAID 1 for transaction logs. Typically, the database server transaction log can fit on one disk drive. In addition, the transaction log performs mostly sequential writes. Only rollback operations cause reads from the transaction logs. Therefore, we can achieve a high rate of performance by isolating the transaction log on its own RAID 1 array.
- Use write caching on RAID 1 arrays. Because a RAID 1 write will not complete until both writes have been done (two disks), performance of writes can be improved through the use of a write cache. When using a write cache, be sure that it is battery-backed up.

Note: RAID 1 is actually implemented only as RAID 10 on DS4000 and DS5000 products.
RAID 5
RAID 5 (Figure 6-11 on page 96) stripes data and parity across all drives in the array. RAID 5 offers both data protection and increased throughput. When you assign RAID 5 to an array, the capacity of the array is reduced by the capacity of one drive (for data-parity storage). RAID 5 gives you higher capacity than RAID 1, but RAID 1 offers better performance. RAID 5 is best used in environments requiring high availability and fewer writes than reads. RAID 5 is good for multi-user environments, such as database or file system storage, where typical I/O size is small and there is a high proportion of read activity. Applications with a low read percentage (write-intensive) do not perform as well on RAID 5 logical drives because of the way that a controller writes data and redundancy data to the drives in a RAID 5 array. If there is a low percentage of read activity relative to write activity, consider changing the RAID level of an array for faster performance. Use write caching on RAID 5 arrays, because RAID 5 writes will not be completed until at least two reads and two writes have occurred. The response time of writes will be improved through the use of write cache (be sure that it is battery-backed up).
Figure 6-11 RAID 5: blocks of the host-visible logical drive striped, with parity, across a five-disk RAID set
RAID 5 arrays with caching can give as good performance as any other RAID level, and with some workloads, the striping effect gives better performance than RAID 1.
RAID 6
RAID 6 (Figure 6-12 on page 97) provides a striped set with dual distributed parity and fault tolerance from two drive failures. The array continues to operate with up to two failed drives. This makes larger RAID groups more practical, especially for high-availability systems. This becomes increasingly important because large-capacity drives lengthen the time needed to recover from the failure of a single drive. Single parity RAID levels are vulnerable to data loss until the failed drive is rebuilt. The larger the drive, the longer the rebuild will take. Dual parity gives time to rebuild the array without the data being at risk if one drive, but no more, fails before the rebuild is complete. RAID 6 can be used in the same workloads in which RAID 5 excels.
Figure 6-12 RAID 6: data blocks striped across a six-disk RAID set with two distributed parity blocks (P1 and P2) per stripe
RAID 10
RAID 10 (Figure 6-13 on page 98), also known as RAID 1+0, implements block interleave data striping and mirroring. In RAID 10, data is striped across multiple disk drives, and then those drives are mirrored to another set of drives. The performance of RAID 10 is approximately the same as RAID 0 for sequential I/Os. RAID 10 provides an enhanced feature for disk mirroring that stripes data and copies the data across all the drives of the array. The first stripe is the data stripe. The second stripe is the mirror (copy) of the first data stripe, but it is shifted over one drive. Because the data is mirrored, the capacity of the logical drive is 50% of the physical capacity of the hard disk drives in the array.
Figure 6-13 RAID 10
The recommendations for using RAID 10 are:
- Use RAID 10 whenever the array experiences more than 10% writes. RAID 5 does not perform as well as RAID 10 with a large number of writes.
- Use RAID 10 when performance is critical.
- Use write caching on RAID 10. Because a RAID 10 write will not be completed until both writes have been done, write performance can be improved through the use of a write cache (be sure that it is battery-backed up).
When comparing RAID 10 to RAID 5:
- RAID 10 writes a single block through two writes. RAID 5 requires two reads (read original data and parity) and two writes. Random writes are significantly faster on RAID 10.
- RAID 10 rebuilds take less time than RAID 5 rebuilds. If a real disk fails, RAID 10 rebuilds it by copying all the data on the mirrored disk to a spare. RAID 5 rebuilds a failed disk by merging the contents of the surviving disks in an array and writing the result to a spare.
RAID 10 is the best fault-tolerant solution in terms of protection and performance, but it comes at a cost.
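The back-end cost of the two write patterns described above can be compared with a small calculation. This is a generic RAID arithmetic sketch with a hypothetical workload, not a ProtecTIER-specific figure.

# Back-end disk operations generated by host I/O: RAID 10 needs two writes
# per host write, RAID 5 needs two reads plus two writes (generic RAID math).

def backend_ops(host_reads, host_writes, raid_level):
    if raid_level == "RAID10":
        return host_reads + 2 * host_writes
    if raid_level == "RAID5":
        return host_reads + 4 * host_writes
    raise ValueError(raid_level)

# Hypothetical workload: 8,000 random reads and 2,000 random writes per second
for level in ("RAID10", "RAID5"):
    print(f"{level}: {backend_ops(8000, 2000, level)} back-end IOPS")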
Meta Data
We highly recommend Fibre Channel disks and RAID 10 groups for the metadata LUNs (with the layout per planning requirements).

Metadata RAID groups: The recommended number of metadata RAID groups is determined by the performance planning tool with the help of your IBM representative. This number can range from 2 to 10 or more RAID groups (based on repository size, factoring ratio, and performance needs).

Note: Use only one type of RAID group for metadata; do not mix them. For example, if one metadata RAID group is 4+4, do not create a 2+2 group for the second metadata RAID group.

Create only one LUN per RAID group, that is, one LUN that spans the entire RAID group. The only exception is the single 1 GB metadata LUN. The 1 GB metadata LUN (the Quorum, also called the cluster file system LUN) can be created on any of the metadata RAID groups. The size of the required metadata LUNs and file systems is a function of the nominal capacity of the repository (physical space and expected factoring ratio) and can be determined with the help of IBM representatives, using the metadata planner tool, before the system installation.
User Data
Fibre Channel user data LUNs: We recommend RAID 5 with at least five disk members (4+1 per group).
SATA user data LUNs: We recommend RAID 5 with at least five disk members (4+P or 8+P per group), or RAID 6.
User data RAID groups: The IBM Performance Planner tool provides the recommended number of RAID groups for the given throughput requirements. We recommend that at least 24 user data RAID groups are created for optimal performance. Create only one LUN per RAID group. The size of the user data RAID groups and LUNs should be consistent; for example, don't mix 7+1 SATA user data LUNs with 3+1 SATA LUNs. Smaller disk groups will hold back the performance of the larger groups and will degrade the overall system throughput. File systems should be no larger than 8 TB; ProtecTIER will not allow the use of any file system larger than 8 TB.

Note: Keep in mind that hot spare disks should be assigned separately.
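Given the 8 TB file system limit and the recommendation of at least 24 user data RAID groups, the number of user data file systems for a given repository can be estimated as sketched below. This is a simplified illustration with hypothetical repository sizes; the IBM Performance Planner tool is the authoritative source.

# Estimate the number of user data file systems for a physical repository
# size, honoring the 8 TB file system limit and the recommendation of at
# least 24 user data RAID groups (one LUN/file system per group).
# Simplified sketch only.

import math

MAX_FS_TB = 8
MIN_USER_DATA_GROUPS = 24

def user_data_filesystems(physical_user_data_tb):
    by_size = math.ceil(physical_user_data_tb / MAX_FS_TB)
    return max(by_size, MIN_USER_DATA_GROUPS)

for repo_tb in (40, 120, 250):   # hypothetical repository sizes
    print(f"{repo_tb} TB repository -> {user_data_filesystems(repo_tb)} user data file systems")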
Repository
The repository consists of the metadata LUNs and the user data LUNs. See an overview of the repository in Figure 6-14 on page 100.
Figure 6-14 ProtecTIER repository: the HyperFactor index and metadata file systems on RAID 10, and the user data file systems (user data from the backup application) on RAID 5 or RAID 6
Tool input:
- Disk technology (FC/SATA/SAS)
- Disk speed (10k/15k)
- Disk capacity (GB)
- RAID grouping (4+4/5+1) for both metadata and user data
Input from capacity planning:
- Physical user data repository size
- Factoring ratio
- Required performance
Tool output:
- Number of user data file systems needed
- Number of HDDs needed for metadata file systems
- Number of metadata file systems needed
With OpenStorage, ProtecTIER can be integrated with NetBackup to provide the means for backup-to-disk without having to emulate traditional tape libraries. Using a plug-in that is installed on an OST-enabled media server, ProtecTIER can implement a communication protocol that supports data transfer and control between the backup server and the ProtecTIER server. Therefore, to support the plug-in, ProtecTIER implements a storage server emulation.
Tight integration
NetBackup is aware of all backup image copies and manages the creation, movement, and deletion of all backup images.
Note: If you are planning to run ProtecTIER Manager on a UNIX system, configure your graphics card and X windows system. This is done either manually or using the Xconfigurator utility. For instructions, refer to the appropriate Linux documentation.
Chapter 7.
ITSM planning
This chapter contains information that helps you plan for a successful IBM Tivoli Storage Manager (ITSM) deduplication implementation. In this chapter, the following topics are discussed:
- ITSM planning overview
- ITSM deduplication prerequisites
- Types of ITSM deduplication
- ITSM deduplication considerations
- When to use ITSM deduplication
- When not to use ITSM deduplication
Now imagine that we have a nightly backup load of 100,000 files per client and we have 300 clients. This represents 30,000,000 (100,000 * 300) files for that nightly workload. If those 30,000,000 files represented 60,000,000 deduplicatable extents, the total archive log space required would be around 83.8 GB: (60,000,000 * 1500) / 1024 / 1024 / 1024 = 83.8 GB.

Another consideration for the active log impact from IDENTIFY processing is the size of the files. For example, a client might back up a single object, perhaps an image backup of a file system, that is 800 GB in size. This can represent a very high number of extents because of the nature of this data; for this discussion, we assume that it is 1.2 million extents. These 1.2 million extents would represent a single transaction for an IDENTIFY process that requires an estimated 1.67 GB of active log space: (1,200,000 * 1500) / 1024 / 1024 / 1024 = 1.67 GB. This 1.67 GB of active log space for that single file might be easily attainable in isolation. If the deduplication-enabled storage pool will have a mix of data (a lot of small files and some very large, highly deduplicatable files), then we recommend that you multiply the active log size estimate by two. For example, if the previous estimate recommended a 25 GB active log size, with deduplication of mixed (small and big) files the active log size becomes 50 GB and the archive log is then 150 GB (three times the size of the active log).

An easier approach to estimating the ITSM active log size is that if you deduplicate very large objects (for example, image backups), you can use an active log size that is 20% of the database size. So, if you have a 300 GB database, for example, you should consider using a 60 GB active log (300 * 0.20) and a 180 GB archive log (60 * 3).

In conclusion, there are a number of factors to consider when planning the size of the ITSM server active and archive logs. The previous examples presented some basic values that can be used for estimation purposes. Keep in mind that you might need to consider larger values in your actual environment.
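The following Python sketch reproduces the log-sizing arithmetic above. It is only an illustration of the rules of thumb quoted in this section (about 1,500 bytes of log space per extent, archive log three times the active log); the workload figures are the examples from the text, and the assumed two extents per file is taken from the 30 million files / 60 million extents example.

# Rough sizing sketch for ITSM active/archive log space used by deduplication.
# Rules of thumb from the text: ~1,500 bytes of log space per extent,
# archive log = 3x active log. All values are illustrative.

BYTES_PER_EXTENT = 1500
GiB = 1024 ** 3

def log_space_gb(extents):
    """Estimated log space in GiB for a given number of deduplicated extents."""
    return extents * BYTES_PER_EXTENT / GiB

# Nightly workload: 300 clients x 100,000 files, ~2 extents per file on average
nightly_extents = 300 * 100_000 * 2
print(f"Archive log for nightly workload: {log_space_gb(nightly_extents):.1f} GB")

# A single 800 GB image backup object assumed to produce ~1.2 million extents
print(f"Active log for one large object:  {log_space_gb(1_200_000):.2f} GB")

# Alternative rule of thumb for very large objects: active log ~= 20% of the
# database size, archive log = 3x the active log
db_size_gb = 300
active = db_size_gb * 0.20
print(f"Active log: {active:.0f} GB, archive log: {active * 3:.0f} GB")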
Assuming that one third of these files are deduplicated and that each deduplicated file has 10 extents: ((1,500,000 / 3) * 10 * 250 bytes) / 1024 / 1024 / 1024 = 1.16 GB. Therefore, 2.56 GB of database space is required. Allow up to 50% additional space (or 1.28 GB) for overhead. The database should then have at least 3.84 GB for that client node. If you cannot estimate the number of files that may be backed up, you can roughly estimate the database size as 1% to 5% of the required server storage space. For example, if you need 100 GB of server storage, your database should be 1 - 5 GB.

Note: As shown, data deduplication can have a big impact on the ITSM database size. If you plan on using data deduplication, make sure you have enough database space left (consider an additional 50% of free database space for data deduplication).
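The database estimate can also be scripted. In the sketch below, the 250-bytes-per-extent figure and the file counts come from the text; the roughly 1 KB of database space per backed-up file used for the base estimate is an assumption chosen so that the totals match the 2.56 GB example above.

# Minimal sketch of the ITSM database sizing estimate described above.
# 250 bytes per extent and the example file counts come from the text;
# the ~1 KB-per-file base figure is an assumption for the base estimate.

GiB = 1024 ** 3

files = 1_500_000
base_gb = files * 1000 / GiB              # assumed ~1 KB of database space per file
dedup_files = files / 3                   # one third of the files are deduplicated
extents_per_file = 10
dedup_gb = dedup_files * extents_per_file * 250 / GiB   # 250 bytes per extent

required = base_gb + dedup_gb
with_overhead = required * 1.5            # allow up to 50% additional space
print(f"Dedup entries: {dedup_gb:.2f} GB, total: {required:.2f} GB, "
      f"with 50% overhead: {with_overhead:.2f} GB")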
7.3 Memory
'Minimum memory requirements to run production servers: On 64-bit systems (which are recommended): 12 GB, or 16 GB if you use deduplication. If you plan to run multiple instances, each instance requires the memory listed for one server. Multiply the memory for one server by the number of instances planned for the system.' These are the minimums. For a heavily used TSM 6 server with deduplication, you may want to consider at least 64 GB per instance.
Figure: server-side data deduplication. (1) Client1 backs up files A, B, C, and D; files A and C have different names but the same data. (2) Client2 backs up files E, F, and G; file E has data in common with files B and G. (3) A TSM server process chunks the data and identifies the duplicate chunks.
Server-side data deduplication offers more protection against data loss. By default, primary sequential-access storage pools that are set up for data deduplication must be backed up to non-deduplicated copy storage pools before they can be reclaimed and before duplicate data can be removed. The default ensures that the server has copies of whole files at all times, in either a primary storage pool or a copy storage pool. If your ITSM server has enough CPU and disk I/O resources available but your ITSM clients do not, you should consider using only server-side deduplication.
Figure: client-side data deduplication. (1) Client1 backs up files A, B, C, and D; they are all different files with different chunks. (2) Client2 backs up files E, F, and G; files E and F have data in common with files B and C. (3) The TSM server and client identify which chunks are duplicates by using a hash index built on the client, and duplicate data is not sent to the server. (4) IDENTIFY processing is not needed to recognize duplicate data; reclamation processing is only needed when dead space results from deleted data.
Client-side data deduplication provides several advantages over server-side data deduplication:
- It reduces the amount of data sent over the network (LAN).
- The processing power that is required to identify duplicate data is offloaded from the server to client nodes. Server-side data deduplication is always enabled for deduplication-enabled storage pools; however, files that are in the deduplication-enabled storage pools and that were already deduplicated by the client do not require additional processing.
- The processing power that is required to remove duplicate data on the server is eliminated, allowing space savings on the server to occur immediately.

Note: For further data reduction, you can enable client-side data deduplication and compression together. Each extent is compressed before it is sent to the server. Compression saves space, but it increases the processing time on the client workstation.

For client-side data deduplication, the IBM Tivoli Storage Manager server must be Version 6.2 or higher. If your network is highly used and you have enough CPU and I/O resources available on the ITSM client, you should consider using client-side deduplication for that client.
tsm: TSMSRVR>
7.5.8 Collocation
You can use collocation for storage pools that are set up for data deduplication. However, collocation might not have the same benefit as it does for storage pools not set up for data deduplication.
By using collocation with storage pools that are set up for data deduplication, you can control the placement of data on volumes. However, the physical location of duplicate data might be on different volumes. No-query-restore and other processes remain efficient in selecting volumes that contain non-deduplicated data. However, the efficiency declines when additional volumes are required to provide the duplicate data.
copy storage pool will be referencing different chunks and maintaining different database references in order to track and manage the data chunks that represent a given file. On the other hand, the non-deduplicated copy storage pool in this case is likely real tape (something other than DEVTYPE=FILE) and is being used along with DRM for the purpose of having an off-site data protection site. The off-site copy storage pool volumes, in conjunction with an appropriate database backup, can then be used to restore the server and restore or retrieve data from the copy storage pool volumes. If you are using virtual volumes to store data on another ITSM server (a source ITSM server can send data to a remote ITSM server storage pool), the following scenarios apply:
- When you copy or move data to a deduplicated storage pool that uses virtual volumes, the data is deduplicated.
- When you copy or move data from a deduplicated storage pool to a non-deduplicated storage pool that uses virtual volumes, the data is reconstructed.
Part 3
Implementing Deduplication
In this part we discuss the implementation tasks required for deduplication.
Chapter 8.
Note: The Gateway versions of the models require Data ONTAP 7.3. Table 8-2 lists the software features needed.
Table 8-2 Software features needed
Requirements                                             Specification
Software licenses                                        NearStore option; Deduplication (A-SIS, bundled in Data ONTAP; see 5.3, Deduplication Licensing on page 43)
Protocols                                                All
Maximum deduplication volume sizes for different         See IBM System Storage N series Data ONTAP 7.3 Storage Management Guide, GC52-1277
Data ONTAP versions (see 5.10, Aggregate and
Volume Considerations on page 63)
Note: A volume should never exceed the deduplicated volume size limit for the entire life of the volume. If a volume ever becomes larger than this limit, and is later shrunk to a smaller size, deduplication cannot be enabled on that volume.
8.2.3 Features
System Manager 1.1R1 includes the following features:
- Quota management: System Manager helps you to manage your quotas. You can create, edit, configure, and delete quotas using a wizard. You can also define quotas when you create a volume or qtree by using the Quotas tab in the Volume Create and Qtree Create dialog boxes.
- Array LUN support: System Manager provides complete support for N series storage system array LUNs. An array LUN is a group of disks or disk partitions in a span of storage space. Instead of direct-attached disks, you can view the array LUNs, create aggregates, and add array LUNs to an existing aggregate.
- Large aggregate support (16 TB and larger): When configured with Data ONTAP 8.0, System Manager supports aggregates larger than 16 TB.
- NFS protocol configuration: System Manager supports NFS protocol configuration using the System Manager GUI.
- Seamless Windows integration: System Manager integrates seamlessly into your management environment by using the Microsoft Management Console (MMC).
- Discovery and setup of storage systems: System Manager enables you to quickly discover a storage system or an active/active configuration (HA pair) on a network subnet. You can easily set up a new system and configure it for storage.
- SAN provisioning: System Manager provides a workflow for LUN provisioning, as well as simple aggregate and FlexVol creation.
- Network-attached storage (NAS) provisioning: System Manager provides a unified workflow for CIFS and NFS provisioning, as well as management of shares and exports.
- Management of storage systems: System Manager provides ongoing management of your storage system or active/active configuration (HA pair).
- Streamlined active/active configuration (HA pair) management: System Manager provides a combined setup for an active/active configuration (HA pair) of IBM N series storage systems, logical grouping and management of such a configuration in the console or navigation tree, and common configuration changes for both systems in an active/active configuration (HA pair).
- Systray (Windows notification area): System Manager provides real-time monitoring and notification of key health-related events for an IBM N series storage system.
- iSCSI and FC: System Manager manages iSCSI and FC protocol services for exporting data to host systems.
Figure 8-2 on page 121 shows the license agreement of the IBM N series System Manager. Choose the radio button to agree and press next.
Select the folder where System Manager will be installed, as shown in Figure 8-3 on page 122. Select Everyone if you want to allow all users to manage the IBM N series storage systems; otherwise, choose Just me. Then press the Next button.
Figure 8-4 on page 123 shows that installation is ready to start. Press next to start the installation.
Figure 8-5 on page 124 shows that IBM N series System Manager Software is now being installed.
Figure 8-6 on page 125 shows that IBM N series System Manager software has been installed. Press the close button to finish the setup wizard and launch the System Manager.
If you have a list of the IP addresses of your IBM N series storage systems, press the Add button and specify the IP address of the storage system to manage, as shown in Figure 8-8 on page 127.
If you do not have a list of IP addresses for the IBM N series storage systems, press the Discover button and IBM N series System Manager will discover all IBM N series storage systems running in the network, as shown in Figure 8-9 on page 128.
Figure 8-10 on page 129 shows the discovered IBM N series storage system running in the network.
To gain access to the storage systems, a user ID and password must be specified for each system, as shown in Figure 8-11 on page 130.
Figure 8-12 on page 131 shows the Managed IBM N series storage system list. Properties, Performance, Recommendations and Reminders are being displayed in the System Manager screen.
Note: The IBM N series System Manager software will be used for the end-to-end deduplication (Advanced Single Instance Storage) configuration example later in this chapter (see also 8.3, End-to-end Deduplication configuration example using command line on page 154). Frequent tasks that can be performed with the IBM N series System Manager software, as shown in Figure 8-13 on page 132, include creating volumes, LUNs, shares/exports, and NFS datastores for VMware.
Figure 8-14 on page 133 shows the created volume in the IBM N series storage system with details.
Figure 8-15 on page 134 shows the shared folders being exported and the details of each shared folder. If deduplication is enabled on the volume, you can see a tab that says Deduplication.
Figure 8-16 on page 135 shows the sessions, that is, which hosts are connected to which shared folders.
Figure 8-17 on page 136 shows LUN management, where you can create, edit, and delete LUNs, show their status, and manage Snapshots.
Figure 8-18 on page 137 shows the qtrees present on each created volume; you can create, edit, and delete qtrees here.
Figure 8-19 on page 138 shows disk management, where you can view the disks that are included in an aggregate, and create aggregates or add disks to an aggregate.
Figure 8-20 on page 139 shows aggregate management, where you can view, create, and delete aggregates.
Figure 8-21 on page 140 shows user management, where you can create, delete, and edit users, and set a password for each user.
Figure 8-22 on page 141 shows the existing groups, where you can add and delete groups and view which users belong to which group.
Figure 8-23 on page 142 shows the DNS setup in IBM N series System Manager. Add the DNS server IP address and turn on the service, caching, and dynamic updates.
Figure 8-24 on page 143 shows the existing interfaces present in the IBM N series storage system. You can see the details of each interface attached to the storage system.
Figure 8-25 on page 144 shows the IP address and the host name of the IBM N series system storage.
Figure 8-26 on page 145 shows the Network Information Service (NIS) setup. You can specify additional NIS servers by IP address.
Figure 8-27 on page 146 shows the CIFS protocol setup. You can see the CIFS details, and home directories and auditing can be enabled.
Figure 8-29 on page 148 shows the iSCSI setup. You can see whether the service is running, the target node name, the alias, and other required details.
Figure 8-30 on page 149 shows security and password management, including the trusted hosts and RSH settings.
Figure 8-31 on page 150 shows the SSH and SSL setup used for host connections. You can generate SSH keys and certificates for secure communication.
AutoSupport can be configured as shown in Figure 8-32 on page 151. You can specify the mail host name, the message recipient email addresses, and the minimal message recipient email addresses.
The date and time zone setup, as shown in Figure 8-33 on page 152, can be enabled and edited. You can see the time servers being used by the IBM N series storage system.
SNMP setup can be edited as shown in Figure 8-34 on page 153. If there are multiple SNMP IP addresses, simply add those addresses in the table shown in the figure.
The system log of the IBM N series storage system can be viewed as shown in Figure 8-35 on page 154. Events, their severity, messages, and timestamps can be seen in this view.
itsonas1> vol create VolArchive aggr0 200g
Creation of volume 'VolArchive' with size 200g on containing aggregate 'aggr0' has completed
2. Enable deduplication on the flexible volume and verify that it is turned on. The vol status command shows the attributes for flexible volumes that have deduplication turned on. After you turn on deduplication, Data ONTAP lets you know that if this were an existing flexible volume that already contained data before deduplication was enabled, you would want to run the sis start -s command. Example 8-2 uses a new flexible volume, so running the command is not necessary.
Example 8-2 Using the vol status command
itsonas1> sis on /vol/VolArchive
Deduplication for "/vol/VolArchive" is enabled.
Already existing data could be processed by running "sis start -s /vol/VolArchive"
itsonas1> vol status VolArchive
        Volume State          Status           Options
    VolArchive online         raid_dp, flex    nosnap=on, sis
                Containing aggregate: 'aggr0'

3. Another way to verify that A-SIS is enabled on the flexible volume is to simply check the output of sis status on the flexible volume, as shown in Example 8-3.
Example 8-3 Running the sis status command
itsonas1> sis status /vol/VolArchive
Path                 State      Status     Progress
/vol/VolArchive      Enabled    Idle       Idle for 00:00:20

4. Turn off the default A-SIS schedule (Example 8-4) by using the sis config command.
Example 8-4 Running the sis config command
itsonas1> sis config /vol/VolArchive
Path                      Schedule
/vol/VolArchive           sun-sat@0
itsonas1> sis config -s - /vol/VolArchive
itsonas1> sis config /vol/VolArchive
Path                      Schedule
/vol/VolArchive

5. Mount the flexible volume to /testDedup on a Linux server, and copy files from the users' directories into the new archive directory flexible volume. From the host perspective, the result is shown in Example 8-5.
Example 8-5 Result from host perspective using Linux
[root@localhost /]mkdir testDedup
[root@localhost /]mount itsonas1:/vol/VolArchive/1 testDedup
[root@localhost /]df -k testDedup
Filesystem                  kbytes     used      avail     capacity  Mounted on
itsonas1:/vol/VolArchive/1  167772160  89353344  78418816  54%       /testDedup

6. Next, examine the flexible volume, run deduplication, and monitor the status. Use the df -s command to examine the storage consumed and the space saved. Note that no space savings have been achieved by simply copying data to the flexible volume, even though deduplication is turned on. What has happened is that all the blocks that have been written
to this flexible volume since deduplication was turned on have had their fingerprints written to the change log file. Refer to Example 8-6 on page 156.
Example 8-6 Examine the storage consumed and space saved
itsonas1> df -s /vol/VolArchive
Filesystem            used       saved     %saved
/vol/VolArchive/      89353528   0         0%

7. Run deduplication on the flexible volume. This step causes the change log to be processed, fingerprints to be sorted and merged, and duplicate blocks to be found. Refer to Example 8-7.
Example 8-7 Run deduplication on the flexible volume
itsonas1> sis start /vol/VolArchive
The deduplication operation for "/vol/VolArchive" is started.

8. Monitor the progress of deduplication by using the sis status command, as shown in Example 8-8.
Example 8-8 Monitor deduplication progress
itsonas1> sis status /vol/VolArchive
Path                 State      Status     Progress
/vol/VolArchive      Enabled    Active     65 GB Searched
itsonas1> sis status /vol/VolArchive
Path                 State      Status     Progress
/vol/VolArchive      Enabled    Active     538 MB (5%) Done
itsonas1> sis status /vol/VolArchive
Path                 State      Status     Progress
/vol/VolArchive      Enabled    Active     1424 MB (14%) Done
itsonas1> sis status /vol/VolArchive
Path                 State      Status     Progress
/vol/VolArchive      Enabled    Active     8837 MB (90%) Done
itsonas1> sis status /vol/VolArchive
Path                 State      Status     Progress
/vol/VolArchive      Enabled    Active     9707 MB (99%) Done
itsonas1> sis status /vol/VolArchive
Path                 State      Status     Progress
/vol/VolArchive      Enabled    Idle       Idle for 00:00:07

9. When sis status indicates that the flexible volume is again in the Idle state, deduplication has finished running and you can check the space savings it provided in the flexible volume. Refer to Example 8-9.
Example 8-9 Examine the storage consumed and space savings again
itsonas1> df -s /vol/VolArchive
Filesystem            used       saved      %saved
/vol/VolArchive/      79515276   9856456    11%
8.4 End-to-end Deduplication configuration example using IBM N series System Manager Software
To create a flexible volume with Deduplication, perform the following steps: 1. Create a flexible volume using the IBM N series System Manager, as shown in Figure 8-36 on page 157.
2. Enable deduplication on the flexible volume and verify that it is turned on, as shown in Figure 8-37 on page 158, and press the Next button to create the flexible volume.
3. After creating the flexible volume, IBM N series System Manager confirms that the volume is created and that deduplication is enabled, and shows the schedule, the status, and the type of deduplication, as shown in Figure 8-38 on page 159.
4. Disable Deduplication as shown in Figure 8-39 on page 160. Press Apply and Ok to disable the deduplication.
6. Verify the export name of the flexible volume that was created as shown in Figure 8-41 on page 162
7. Mount the flexible volume to /testDedup on Linux server, and copy files from the users directories into the new archive directory flexible volume. From the host perspective, the result is shown in Example 8-10.
Example 8-10 Result from host perspective using Linux
[root@localhost /]mkdir testDedup
[root@localhost /]mount itsonas1:/vol/VolArchive/ testDedup
[root@localhost /]df -k testDedup
Filesystem                 kbytes     used      avail      capacity  Mounted on
itsonas1:/vol/VolArchive/  167772160  33388384  134383776  20%       /testArchives

8. Enable deduplication on /vol/VolArchive by ticking the button, as shown in Figure 8-42 on page 163.
9. Take note of the percentage used of /vol/VolArchive before deduplication runs.
11. After pressing the Start button, you are asked whether you want partial volume deduplication or full volume deduplication. Choose Full volume deduplication for testing purposes, as shown in Figure 8-45. Press the OK button to start the process.
12. To verify that deduplication is running, check the status in the Deduplication properties; it should show a value that changes each time you press the refresh button of the console. See Figure 8-46 on page 165.
13. Figure 8-47 on page 165 shows that the deduplication process is done, and you can see the space saved after running the process.
itsonas1> sis help config
sis config [ [ -s schedule ] <path> | <path> ... ]
Sets up, modifies, and retrieves the schedule of deduplication volumes.

If you run it without any flags, sis config returns the schedules for all flexible volumes that have deduplication enabled. Example 8-12 shows the four different formats that the reported schedules can have.
Example 8-12 The four format types that reported schedules can have
itsonas1> sis config
Path                 Schedule
/vol/dvol_1
/vol/dvol_2          23@sun-fri
/vol/dvol_3          auto
/vol/dvol_4          sat@6

The meaning of each of these schedule types is as follows:
- On flexible volume dvol_1, deduplication is not scheduled to run.
- On flexible volume dvol_2, deduplication is scheduled to run at 11 p.m. every day from Sunday to Friday.
- On flexible volume dvol_3, deduplication is set to auto-schedule. This means that deduplication is triggered by the amount of new data written to the flexible volume, specifically when 20% new fingerprints are in the change log.
- On flexible volume dvol_4, deduplication is scheduled to run on Saturday at 6 a.m.
When the -s option is specified, the command sets up or modifies the schedule on the specified flexible volume. The schedule parameter can be specified in one of four ways, as shown in Example 8-13.
Example 8-13 The schedule parameters
[day_list][@hour_list]
[hour_list][@day_list]
-
auto

The meaning of these parameters is as follows: day_list specifies which days of the week deduplication should run. It is a comma-separated list of the first three letters of the day: sun, mon, tue, wed, thu, fri, sat. The names are not case-sensitive. Day ranges such as mon-fri can also be used. The default day_list is sun-sat.
hour_list specifies which hours of the day deduplication should run on each scheduled day. hour_list is a comma-separated list of the integers from 0 to 23. Hour ranges such as 8-17 are allowed. Step values can be used in conjunction with ranges; for example, 0-23/2 means every 2 hours. The default hour_list is 0, which is midnight on the morning of each scheduled day. If a hyphen character (-) is specified, there is no scheduled deduplication operation on the flexible volume. The auto schedule causes deduplication to run on that flexible volume when there are 20% new fingerprints in the change log. This check is done in a background process and occurs every hour. Beginning with Data ONTAP 7.3.1, this threshold is configurable by using the sis config -s auto@num /vol/<vol-name> command, where num is a two-digit number that specifies the percentage. When deduplication is enabled on a flexible volume for the first time, an initial schedule is assigned to the flexible volume. This initial schedule is sun-sat@0, which means once every day at midnight. To configure the schedules shown earlier in this section, issue the commands shown in Example 8-14.
Example 8-14 Configure the schedules
itsonas1> sis config -s - /vol/dvol_1
itsonas1> sis config -s 23@sun-fri /vol/dvol_2
itsonas1> sis config -s auto /vol/dvol_3
itsonas1> sis config -s sat@6 /vol/dvol_4
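The auto threshold itself can also be tuned from its 20% default. The following is a minimal sketch, assuming Data ONTAP 7.3.1 or later and an illustrative 50% threshold on /vol/dvol_3:
itsonas1> sis config -s auto@50 /vol/dvol_3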
The more concurrent deduplication processes you are running, the more system resources are consumed. Given this information, the best option is to perform one of the following actions:
Use the auto mode so that deduplication runs only when significant additional data has been written to each particular flexible volume (this approach tends to naturally spread out when deduplication runs).
Stagger the deduplication schedule for the flexible volumes so that it runs on alternate days.
Run deduplication manually (a sketch follows this list).
If Snapshot copies are required, run deduplication before creating the Snapshot to minimize the amount of data before the data gets locked into the copies. (Make sure that deduplication has completed before creating the copy.) Creating a Snapshot on a flexible volume before deduplication has a chance to run and complete on that flexible volume can result in lower space savings (see 5.4.4, Deduplication and Snapshot copies on page 44).
If Snapshot copies are to be used, the Snapshot reserve should be greater than zero (0). An exception to this might be an FCP or iSCSI LUN scenario, where it is often set to zero for thin-provisioning reasons.
For deduplication to run properly, you have to leave some free space for the deduplication metadata (see 5.7.1, Metadata on page 60).
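A minimal sketch of running deduplication manually from the CLI and watching its progress follows; the volume name dvol_1 is assumed for illustration, and the -s option tells sis start to scan the data that already exists on the volume:
itsonas1> sis start -s /vol/dvol_1
itsonas1> sis status /vol/dvol_1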
Chapter 9. Implementing ProtecTIER
In this chapter we introduce:
Getting started
Installing ProtecTIER Manager
Creating repositories (if applicable)
Creating a library (if applicable)
Adding cartridges (if applicable)
Host implementation
Backup applications
9.2.1 Prerequisites
Before you start with the installation of the ProtecTIER Manager on your workstation, make sure that the following prerequisites are met:
One of the following operating systems: Windows 32/64 bit (2003/XP/7), or Linux Red Hat 32/64 bit (Red Hat Enterprise 4 or 5)
At least 1.2 GB of available disk space
At least 256 MB of RAM
The workstation can access the ProtecTIER service nodes' IP addresses (ports 3501 and 3503 are open on the firewall).
In addition, it is recommended that the monitor for ProtecTIER Manager be configured to the following settings:
Resolution of 1024 x 768 pixels or higher (this is the minimum resolution supported; however, 1280 x 1024 is recommended)
24 bit color or higher
Note: If you are planning to run ProtecTIER Manager on a UNIX system, configure your graphics card and X Window System. This is done either manually or using the Xconfigurator utility. For instructions, refer to the appropriate Linux documentation.
2. If the autorun process does not launch automatically, select Start → Run, type D: (where D: is your CD-ROM drive), and press Enter. From the files listed on the CD, select the Windows version. The Introduction window is displayed (Figure 9-1 on page 174).
Click Next. The Software License Agreement window is displayed (Figure 9-2).
Note: You can print the License Agreement by clicking Print. If you want to read the non-IBM terms of the license agreement, click Read Non-IBM Terms and a window is displayed with the corresponding text.
3. Select I accept both the IBM and the non-IBM terms and click Next. The Red Hat Enterprise Linux License Agreement window is displayed (Figure 9-3).
Figure 9-3 ProtecTIER Manager Install: Red Hat Enterprise Linux Licence Agreement window
4. Select I accept the terms of the License Agreement and click Next. The Choose Install Folder window is displayed (Figure 9-4).
5. Enter the path where you want to install ProtecTIER Manager or click Choose to browse for a location. Note: Click Restore Default Folder to revert to the default path. Click Next. The Choose Shortcut Folder window is displayed (Figure 9-5 on page 177).
6. Select one of the following locations for the ProtecTIER Manager shortcut:
In a new Program Group: Creates a new program group in the Program list of the Start menu.
In an existing Program Group: Adds the shortcut to an existing program group in the Program list of the Start menu.
In the Start Menu: Creates shortcuts directly in the Start menu.
On the Desktop: Creates shortcuts on the desktop.
In the Quick Launch Bar: Creates shortcuts in the Quick Launch Bar.
Other: Enables you to enter a path location for the shortcut or to browse for a location by clicking Choose.
Don't create icons: No shortcut icons are created.
You can select Create Icons for All Users to create a shortcut in the defined location for all user accounts on the workstation. In our example we used the default Other. Click Next. The Pre-Installation Summary window is displayed (Figure 9-6 on page 178).
7. Click Install. The Installing ProtecTIER Manager window is displayed and ProtecTIER Manager is being installed on your computer (Figure 9-7 on page 178).
When the installation is complete and ProtecTIER Manager has been successfully installed, the Install complete window is displayed (Figure 9-8).
Do these steps:
1. Click Add Node, or click Node → Add Node if you are adding nodes. See Figure 9-10. You are asked to add the IP address of the node. See Figure 9-11 on page 181.
Note: In the case of a dual-node cluster, you can add the IP of either node of the cluster. The system and both nodes will be added to the ProtecTIER Manager. If you just uninstalled and upgraded the ProtecTIER Manager, all node IPs are still listed in ProtecTIER Manager.
2. Enter the IP address of the node and click OK. The node is displayed in the Nodes pane, and the Login button is displayed in the View pane. Note: Do not change the port number of the node unless directed by IBM support.
2. Click Login. You are prompted for your user name and password as seen on Figure 9-13.
3. Enter your user name and password.
4. Click OK. ProtecTIER Manager has default user accounts corresponding to three user permission levels. See Table 9-1 for the levels, default user names, and passwords. We recommend changing the default values.
Table 9-1 Default user names and passwords
Permission level    Default user name    Default password
Administrator       ptadmin              ptadmin
Operator            ptoper               ptoper
Monitor             ptuser               ptuser
Note: Only one administrator can be logged into a ProtecTIER system at a time.
ProtecTIER Configuration Menu:
==============================
1. Update Time, Date, Timezone & Timeserver(s)
2. Update System Name
3. Update Customer Network
4. Enable/Disable Call Home
5. Activate/Deactivate Replication Manager
6. Update Replication Network
7. Configure Static routes
8. Configure Application Interfaces
9. Restore Network Configuration
q. Quit
Please choose an option:
The first three items of the ProtecTIER Configuration Menu must be completed as a minimum setup. We describe these first three menu items in detail in the next sections.
Date, Time, Timezone & Timeserver(s) configuration
==================================================
1. Set date & time
2. Set Timezone
3. Set Timeserver(s)
c. Commit changes and exit
q. Exit without committing changes
Please Choose:
Here you can set up the date and time, the time zone, and the timeserver(s). We recommend this sequence: first the timezone setup, then either date & time or timeserver setup.
Please Choose:1
Please specify the date in DD/MM/YYYY format [16/09/2010]:
Please specify the time in HH:MM:SS format [21:10:42]:
Set Timezone
After choosing 2, as shown in Example 9-4, you enter the two-letter country code; the matching time zones are then listed, and you can choose the appropriate one.
Example 9-4 Country code selection
Please Choose:2
Enter a 2 letter country code (or type 'm' to enter the timezone manually): us
Time zones under US:
====================
1. America/New_York
2. America/Detroit
3. America/Kentucky/Louisville
4. America/Kentucky/Monticello
5. America/Indiana/Indianapolis
6. America/Indiana/Vincennes
7. America/Indiana/Winamac
8. America/Indiana/Marengo
9. America/Indiana/Petersburg
10. America/Indiana/Vevay
11. America/Chicago
Press <ENTER> to continue
12. America/Indiana/Tell_City
13. America/Indiana/Knox
14. America/Menominee
15. America/North_Dakota/Center
16. America/North_Dakota/New_Salem
17. America/Denver
18. America/Boise
19. America/Shiprock
20. America/Phoenix
21. America/Los_Angeles
Press <ENTER> to continue
22. America/Anchorage
23. America/Juneau
24. America/Yakutat
25. America/Nome
26. America/Adak
27. Pacific/Honolulu
Set timeserver(s)
After choosing 3, you can add the IP address of the timeserver(s), as seen in Example 9-5.
Please Choose:3
Please specify the timeserver's IP Address: 9.11.107.11
Would you like to set a secondary timeserver? (yes|no) yes
Please specify the secondary timeserver's IP Address: 9.11.107.12
Date, Time, Timezone & Timeserver(s) configuration
==================================================
1. Set date & time
2. Set Timezone
3. Set Timeserver(s)
c. Commit changes and exit *
q. Exit without committing changes
Please Choose:c
Please review the following information:
========================================
Date: Thu Sep 16 22:23:35 2010
Do you wish to apply those settings? (yes|no) yes
note: the cluster & VTFD services on all nodes must be stopped in order to continue
Do you wish to continue? (yes|no) yes
Stopping RAS               [ Done ]
Stopping VTFD              [ Done ]
Stopping Cluster Services  [ Done ]
Stopping NTPD              [ Done ]
Setting Date & Time        [ Done ]
Starting NTPD              [ Done ]
Starting cluster           [ Done ]
Cluster Started
Starting VTFD              [ Done ]
Starting RAS               [ Done ]
Press the ENTER key to continue...
Date, Time, Timezone & Timeserver(s) configuration
==================================================
1. Set date & time
2. Set Timezone
3. Set Timeserver(s)
c. Commit changes and exit *
q. Exit without committing changes
Please Choose:q
Do you wish to exit without commiting changes? (yes|no) yes
Press the ENTER key to continue...
ProtecTIER Configuration Menu:
==============================
1. Update Time, Date, Timezone & Timeserver(s)
2. Update System Name
3. Update Customer Network
4. Enable/Disable Call Home
5. Activate/Deactivate Replication Manager
6. Update Replication Network
7. Configure Static routes
8. Configure Application Interfaces
9. Restore Network Configuration
q. Quit
Please choose an option:2
Starting Cluster, please wait
Starting cluster           [ Done ]
Cluster Started
Please enter a new system name [pt_system]: Redbooks
Changing system name       [ Done ]
Updated system name successfully
UpdateSystemName ended successfully
Press the ENTER key to continue...
ProtecTIER Configuration Menu:
==============================
1. Update Time, Date, Timezone & Timeserver(s)
2. Update System Name
3. Update Customer Network
4. Enable/Disable Call Home
5. Activate/Deactivate Replication Manager
6. Update Replication Network
7. Configure Static routes
8. Configure Application Interfaces
9. Restore Network Configuration
q. Quit
Please choose an option:3
Starting Cluster, please wait
Starting cluster                          [ Done ]
Cluster Started
Would you like to stop the VTFD service? (yes|no) yes
Stopping RAS                              [ Done ]
Stopping VTFD                             [ Done ]
Please provide the following information:
-----------------------------------------
Customer Network, IP Address [9.11.201.105]:
Customer Network, Netmask [255.255.254.0]:
Customer Network, Default Gateway [9.11.200.1]:
Customer Network, Hostname [Hungary]:
Configuring Network                       [ Done ]
Setting Hostname                          [ Done ]
Saving configuration                      [ Done ]
Collecting RAS Persistent configuration   [ Done ]
Updated network configuration successfully
Starting VTFD                             [ Done ]
Starting RAS                              [ Done ]
UpdateNetwork ended successfully
Press the ENTER key to continue...
Note: Do not specify leading zeros in any of the address numbers (e.g. 192.168.001.015).
To start the Configuration Wizard, go to Server → Configuration wizard. See Figure 9-15.
In the Configuration Wizard you will be able to set up:
Customer information
Enable SNMP traps
Provide details for SNMP traps
Enable E-mail alerts
Provide details about E-mail alerts
The Configuration wizard starts with a Welcome window, see Figure 9-16 on page 189.
After clicking on Next, you see the Registration dialog, see Figure 9-17.
Figure 9-17 Configuration wizard: Registration information window
After filling it out with your details, you can hit the Next button, see Figure 9-18.
The next dialog is the SNMP trap enablement, see Figure 9-19 on page 191.
If you enable SNMP traps, you can fill out more details, see Figure 9-20 on page 191.
Figure 9-20 Configuration wizard: SNMP traps setup window -adding host
You can add multiple SNMP hosts. After clicking on Add, the host will be listed at the Communities section. See Figure 9-21 on page 192.
After clicking on Next, you can enable e-mail notifications. See Figure 9-22 on page 193.
If you enable e-mail alerts, you can add e-mail server IP and e-mail recipients. See Figure 9-23 on page 194.
When you have filled it out, click Next to arrive at the Report section, which summarizes your changes. See Figure 9-24 on page 195.
If you are satisfied with the Summary, you can hit Finish, and your changes will be added to the system. Your system is ready for host attachment. You can continue at 9.9.1, Connecting hosts to ProtecTIER systems.
9.5.1 autorun
To install the software, in most cases you have to copy the tar package to the ProtecTIER server, untar it, and run autorun. In some cases it is autorun -f. Follow the details in the Release Notes.
Example 9-10 installing software
[root@tuscany PT_TS7650G_V2.5.0.TST_117_3-full.x86_64]# ./autorun -f
To check the installed level, you can run the get_versions command, as seen in Example 9-11:
Example 9-11 versions command
[root@tuscany ~]# /opt/dtc/app/sbin/get_versions
<?xml version="1.0" encoding="UTF-8" ?>
<version-info>
  <component name="PT" version="2.5.0.1261"/>
  <component name="ptlinux" version="7123.126-1261"/>
  <component name="dtcemulex" version="5223.004-0"/>
  <component name="ptrepmgr" version="6123.069-0"/>
  <component name="vtl" version="7.123.126"/>
  <component name="model" version="TS7650G"/>
</version-info>
You can also check it from rasMenu, as seen in Example 9-12:
Example 9-12 RAS Menu
[root@tuscany ~]# rasMenu
+------------------------------------------------+
| RAS Text Based Menu running on tuscany         |
+------------------------------------------------+
| 1) Check if RAS service is running             |
| 2) Run RAS environment check                   |
| 3) Start RAS service                           |
| 4) Stop RAS service                            |
| 5) Get RAS Version                             |
| 6) Get PT Code Version                         |
| 7) Display Firmware Levels                     |
| 8) Manage Configuration (...)                  |
| 9) System Health Monitoring (...)              |
| 10) Problem management (...)                   |
| 11) Call Home Commands (...)                   |
| 12) Collect Logs (...)                         |
| 13) Enterprise Controller (...)                |
| E) Exit                                        |
+------------------------------------------------+
>>> Your choice? 6
Begin Processing Procedure
PT version        : 2.5.0.1261
Build date        : Oct_04_2010
PVT main package  : ptlinux-7123.126-1261
DTC Emulex driver : dtcemulex-5223.004-0
vtl version       : 7123.126
End Processing Procedure
Press any key to continue
Or you can check it from the GUI. Click the Version number in the GUI, and you will see the Show version information window, as seen in Figure 9-25.
9.5.2 ptconfig
The ptconfig script has many functions. The utility can be found in the /opt/dtc/install directory, see Example 9-13 on page 197.
Example 9-13 Ptconfig script
[root@tuscany ~]# cd /opt/dtc/install/
[root@tuscany install]# ./ptconfig
Main utility for installing and maintaining a ProtecTIER system
Usage:
./ptconfig -modelList
./ptconfig -restoreNetworkConfig
./ptconfig -updateFirmwares
./ptconfig -setClock
./ptconfig -updateReplicationIp
./ptconfig -addReplication
./ptconfig -updateSystemName
./ptconfig -activatePTRepMan
./ptconfig -upgrade -model=<model> [-usedefaults]
./ptconfig -deactivatePTRepMan
./ptconfig -configRAS
./ptconfig -validate
./ptconfig -addReplicationNic
./ptconfig -updateNetwork
./ptconfig -install -app=<app> -model=<model> [-usedefaults]
./ptconfig -appInterfaces
./ptconfig -replace -app=<app> -model=<model> [-usedefaults]
./ptconfig -appList
./ptconfig -updateDSfw
./ptconfig -staticRoutes
./ptconfig -restoreSerialNumber
./ptconfig -modifyNic
-modelList             list the available models (*)
-restoreNetworkConfig  Restores network configuration (*)
-updateFirmwares       Update xSeries firmwares (*)
-setClock              Manage the time, date, timeserver(s) and timezone settings (*)
-model                 specify model type
-updateReplicationIp   update ip addresses used by replication (*)
-addReplication        add IP configuration for replication (*)
-updateSystemName      replace the system name (cluster name) (*)
-usedefaults           don't ask the user for configuration attributes
-activatePTRepMan      Install the ProtecTIER Replication Manager (*)
-upgrade               upgrade an existing model to a different model (*)
-deactivatePTRepMan    Uninstall the ProtecTIER Replication Manager (*)
-configRAS             Configure the RAS service (*)
-validate              run validation test on an installed cluster (*)
-app                   specify application to install (use the appList option to see all available applications)
-addReplicationNic     replace replication NIC location after upgrading DD1 to DD3 (*)
-updateNetwork         replace the external (management) ip address and/or hostname (*)
-install               install and configure a PT system with a specific model (*)
-appInterfaces         Configure Application Interfaces (*)
-replace               install and configure a PT system with an existing repository (*)
-appList               list the available applications (*)
-updateDSfw            Update the DS disk array firmware (*)
-staticRoutes          edit static routes configuration (*)
-restoreSerialNumber   Restore the serial number for the server (*)
-modifyNic             change Network interface card configuration (*)
Options marked with a '*' are mutually exclusive
We use the ptconfig -install -model=TS7650G -app=VTL to install a TS7650G Gateway model with VTL, see Example 9-14.
Example 9-14 installation example
[root@tuscany install]# ./ptconfig -modelList
Available models:
=================
TS7650G    TS7650G Gateway
[root@tuscany install]# ./ptconfig -appList
Available applications:
=======================
OST_10G    OST using 10G NICs
VTL_OLD    VTL Application on M2
OST_1G     OST using 1G NICs
VTL        VTL Application
[root@tuscany install]# ./ptconfig -install -model=TS7650G -app=VTL
For each software update, you can list the available models.
Note: Do not specify leading zeros in any of the address numbers (e.g. 192.168.001.015).
Note: If your system is OST, you have to run the ptconfig -install differently, for example: [root@tuscany install]# ./ptconfig -install -model=TS7650G -app=OST_1G
[root@italy ~]# cd /opt/dtc/install/
[root@italy install]# ./ptconfig -updateNetwork
[root@italy install]# ./ptconfig -updateSystemName
Note: Do not specify leading zeros in any of the address numbers (e.g. 192.168.001.015).
9.5.3 fsCreate
This step is required only when you have separately attached storage, for example with the TS7650G or TS7680. The script is not required for the appliance models, TS7610 or TS7650. The fsCreate utility creates file systems on the attached storage devices. You need to run fsCreate, which is in the /opt/dtc/app/sbin directory. See Example 9-16.
Example 9-16 FsCreate utility
[root@tuscany install]# cd ../app/sbin/
[root@tuscany sbin]# pwd
/opt/dtc/app/sbin
[root@tuscany sbin]# ./fsCreate
Syntax: fsCreate
Options (mutually exclusive):
  -n    #create GFS file systems for all mpath devices during first time installation
  -e    #create GFS file systems for new mpath devices during capacity upgrade
  -t    #create mount points and register GFS file systems to /etc/fstab
  -r    #display all repository GFS file systems
  -u    #display unused devices
  -g    #display all non-repository GFS file systems
Optional parameters:
  -s    #script mode, removes header from output and disables user prompts
[root@tuscany sbin]# ./fsCreate -n
In our example we use -n, since this is a first-time installation. The fsCreate script removes any existing data on the disk array as a result of creating the file systems. You will have to type in data loss to confirm that you are aware that all data will be deleted. You will need to use this script again in case of a capacity increase, but with -e.
Note: fsCreate is used only for TS7650G and TS7680 models, as the storage is attached separately.
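As a sketch of verifying the result with the display options listed in Example 9-16 (the output depends on your configuration):
[root@tuscany sbin]# ./fsCreate -r    # display all repository GFS file systems
[root@tuscany sbin]# ./fsCreate -u    # display unused devices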
Here you can select whether you are planning for Virtual Tape Library (VTL) or Open Storage (OST), and enter the inputs you have from the planning, in 6.3, Planning for Open systems with VTL on page 75, or the planning for OST. These numbers are critical for the Repository, since after you create the Repository, you will be able to change only the Repository size.
1. In the Repository size field, select the size value of the repository that you want to create.
2. In the Estimated factoring ratio field, enter the value estimated for your environment based on your data change rate, backup policies, and retention period.
3. In the System peak throughput field, specify the rate of system peak throughput that your metadata file systems can support.
4. In the MD RAID configuration field, select the RAID configuration of the logical volumes on which the repository metadata file systems are to be created. For example, select FC-15K 4+4 for a configuration of RAID 10 4+4 with Fibre Channel 15K RPM disks.
5. In the Disk size field, enter the size of the disks that you use in your storage array.
6. Click OK. The Repository metadata storage requirements dialog is displayed with a list of file system arrangement options that are suitable for your needs.
After you have given the inputs, you will get a screen describing the metadata requirements, as seen in Figure 9-27.
In Figure 9-27 you can see the requirements from a metadata point of view. As an example: if you have a 5 TB Repository, the estimated factoring ratio is 14, peak throughput is 400 MB/s, and you use FC 15K 450 GB drives in RAID 10 4+4 for metadata, you will need three file systems and 16 GB of memory in the ProtecTIER server. The required metadata (MD) sizes are 1024 MB, 156.9 GB, and 147.6 GB. The 1024 MB requirement is always there; the others depend on factors like the HyperFactor ratio. The recommendation is that when you carve out the LUN, you use all the space in the RAID group. In our case we use RAID 10 4+4 for the MD and 450 GB drives, therefore creating LUNs of approximately 1.6 TB from the 4+4 group of 450 GB drives. See more details about metadata in the section Meta Data on page 99. Also, we recommend creating slightly different sizes of LUNs for metadata and User Data, so you are able to distinguish them from each other.
3. Click Next. The Repository Name window is displayed, see Figure 9-29 on page 204.
4. In the System name field, enter the name of the system on which the repository will be created. The Repository name field is populated automatically. If you would like a different name for the Repository, you can change it after you enter the System name.
5. Click Next and the Repository size window is displayed, see Figure 9-30 on page 205.
6. In the Repository size field, enter the repository size in terabytes that you determined using the Create repository planning wizard.
7. In the Estimated factoring ratio field, enter the estimated factoring ratio value that was determined with the assistance of your IBM System Services Representative (SSR), Lab-based Services (LBS), and Field Technical Service and Support (FTSS) personnel.
8. In the System peak throughput field, specify the rate of system peak throughput that your metadata file systems can support.
9. In the Metadata RAID configuration field, select the RAID configuration of the logical volumes on which the repository metadata file systems are to be created. For example, select FC-15K 4+4 for a configuration of RAID 10 4+4 with Fibre Channel 15K RPM disks. This depends on the way that your storage arrays are configured. Other choices are Serial Attached SCSI (SAS) storage arrays and Serial Advanced Technology Attachment (SATA) storage arrays.
10. In the Disk size field, enter the size of the disks that are in your storage array.
11. Click Next. You will go into the Resources menu, which shows each file system that was created before.
12. Verify that the correct file systems are selected for metadata and user data, based on the metadata file system sizes indicated by the repository planning process. The system automatically distributes the file systems between Meta Data file systems, Available file systems, and User Data file systems. If the file systems selected by ProtecTIER for metadata and User Data do not match the file systems that you created for those purposes, change the assignment by selecting file systems from the Available file systems list and clicking the arrow buttons to move file systems to and from the metadata (MD) file systems and user data (UD) file systems lists. See Figure 9-31 on page 206.
Note: By default, the ProtecTIER system generally selects the smallest available file systems for use as metadata file systems. The remaining file systems are available for user data. You cannot assign more storage space for user data than the repository size defined in the Repository Size window.
13. Click OK. The Metadata resources advanced dialog closes.
Note: Optionally, reopen the Metadata resources dialog by clicking Advanced in the Metadata window.
14. Now you will see the summarized view of Meta Data and User Data size. See Figure 9-32 on page 207.
15.Click Next. The Create repository Report window is displayed, see Figure 9-33:
16. Click Finish. The Create repository wizard closes and a confirmation window is displayed, see Figure 9-34.
17.Click Yes. The ProtecTIER system temporarily goes offline to create the repository. This operation might take a while. The Create repository window is displayed until the repository is created, see Figure 9-35.
storage units and their contents. Currently, only one STS can be defined for each ProtecTIER OST storage appliance.
2. The STS definition window will show up, where you will need to fill out the STS name and credentials (user name and password), see Figure 9-37. NetBackup uses these credentials so that the media server can log in to the storage server for storage access.
Note: The STS unique name cannot be modified once the STS has been created. Only the credentials can be changed later.
3. You can create only one STS. You can delete the STS, but if LSUs already exist you will get a warning.
Note: You cannot modify the STS name, only the credentials.
3. Click Add to add a new LSU. After clicking Add, you will get the window seen in Figure 9-41.
4. Type in the name for the LSU and the description, which can be up to 1024 characters.
Note: Once an LSU has been created and saved, the LSU name can no longer be modified.
5. At Nominal percentage, you can configure the LSU to take a percentage of the total repository. A maximum of 100% can be allocated, and you can define it with up to two decimal digits, for example 10.75%. If you try to overallocate, a warning shows up, see Figure 9-42.
6. You can set up HyperFactor mode for each LSU, see Figure 9-43:
7. You can set up Compression mode for each LSU, see Figure 9-44 on page 213:
8. After you have filled out all fields and selected the appropriate modes, click OK and the LSU shows up in the management window as New.
9. To create this LSU, click on Save changes. 10.The new LSU is created.
After selecting the LSU and clicking on Modify, you get this window, Figure 9-47:
Here you can change the description, the % size of LSU, mode of HyperFactor and mode of Compression.
Note: You cannot change the LSU name.
When you are done with the modification and click OK, you go back to the LSU management window and see the modified LSU listed there, see Figure 9-48.
To apply the modification, click Save changes.
Note: If you change the HyperFactor or Compression mode to enabled or disabled, the changes take effect only on the next image.
You can select more than one LSU by holding the CTRL key and clicking each one, see Figure 9-49 on page 217.
In this case the Multiple adjust function becomes available. If you click it, you can distribute the remaining quota of the repository equally between the selected LSUs, or you can add a specified % of quota to all of them, see Figure 9-50.
To make the changes, click Save Changes. You are asked to confirm the delete by typing data loss, see Figure 9-52.
This step is not necessary for the TS7610, since it has a preconfigured library, but if you want to create more libraries or new libraries on another TS7600 product, you will need this section. A library can be created on a ProtecTIER system of either a one-node cluster or a two-node cluster.
Note: Use the Scan button of the Port attributes pane to verify that the ports of the ProtecTIER system to which the virtual devices of the library are to be assigned are connected to the correct host. If LUN masking is enabled, make sure to create LUN masking groups for the new library too.
1. Log in to the system you want to create the new library on.
2. Click VT → VT Library → Create new library, and the Create library wizard will start, see Figure 9-53.
3. Click Next. The Library details window is displayed, Figure 9-54 on page 220. 4. In the ProtecTIER VT name field, enter a name for the library.
5. Click Next. In the library details specify the type of library that you want to use for your application, see Figure 9-55 on page 221.
By default the IBM TS3500 is selected.
Note: V-TS3500 is a Symantec Veritas NetBackup (NBU) requirement. The functionality of the IBM TS3500 and the IBM V-TS3500 is the same.
Note: Verify that the backup application that you are using supports the type of library model that you select.
6. Click Next. The Tape Model window is displayed. Select the tape drive model that you want to use for your virtual library, see Figure 9-56 on page 222.
7. Click Next. The Tape drives window opens. In the Number of tape drives field for each node, enter the number of tape drives to assign to the node. To maximize load balancing, we recommend that you distribute tape drives across the nodes in a two-node cluster based on the relative power of the nodes. See the window on Figure 9-57 on page 223.
In this example we create eight drives on node Tuscany. We can create a maximum of 230 drives, since there are other libraries already using some tape drives.
Note: The maximum number of drives is 256 per node. These drives are divided between the virtual libraries as you configure them.
8. Click Next. The Assignment window opens.
Figure 9-58 Create new library: assigning drives and robots to ports
Select or deselect the check boxes next to each port to define which of the node's ports are assigned virtual devices. In our example we set up an IBM TS3500 and by default all the robots are selected and enabled. If you have chosen a library model other than IBM, the robots are not checked and only one must be chosen. In the Drives fields corresponding to each selected port, select the number of virtual tape drives that are assigned to each port. Optionally, click Select All to automatically select both ports. Click Equally divide to evenly divide the number of drives between the ports. Check the Robot check box if you want the library virtual robot to be accessible through this port.
Note: For high-availability purposes, the IBM System Storage TS7600 with ProtecTIER supports the assignment of the virtual robot to multiple ports. The backup application can only access the virtual robot through the specific node and port to which the robot is assigned.
Verify that the port is connected to the appropriate host in your backup environment using the Scan button in the port attributes pane. If you are using LUN masking, make sure that all required devices are listed in the LUN masking group.
9. Click Next. The Cartridges window is displayed, see Figure 9-59 on page 225.
10. In the No. of cartridges field, enter the number of cartridges that you want to have in the library. The Virtual cartridge size field automatically displays the maximum possible size for virtual cartridges for your system, based on the number of cartridges entered, the total amount of available storage space in your repository, and the current HyperFactor ratio. Optionally, select the Max. cartridge growth check box. When selected, you can limit the maximum amount of nominal data that a cartridge can contain. The value of the maximum number of cartridges possible on a system depends on the amount of storage space available on your system. See more details in Planning for cartridges on page 88.
In the Barcode seed field, enter a value for the barcode seed. The barcode seed is the barcode that is assigned to the first cartridge created. Every cartridge added after the first cartridge is assigned a barcode following the initial barcode seed.
Note: The barcode seed must contain only numbers and capital letters and be only six characters in length (for example, R00000). The maximum quantity of cartridges depends on the amount of storage space available on your system.
11. Click Next. The Slots window opens, see Figure 9-60 on page 226.
In the Number of slots field, enter the number of cartridge slots that you want to have in the library. By default the number of slots is equal to the number of cartridges added in the previous step.
Note: We recommend creating more slots than the cartridges created in the previous step, so that you have empty slots. If you have empty slots, you can add cartridges later easily, without any disruption. If you do not have empty slots when adding new cartridges, the action is disruptive: since the dimensions of the library have to be changed in this case, the system goes offline while doing so.
In the Number of import/export slots field, enter the number of import/export slots that you want to have in the library. The maximum number of import/export slots that can be defined in the entire system is 1022. The I/O slots are used to move cartridges between the libraries and the shelf.
12. Click Next and then Finish. The Create new library wizard closes and a summary report is displayed (Figure 9-61 on page 227).
13.The ProtecTIER system temporarily goes offline (Figure 9-62) to create the library. The library is displayed in the Services pane and the VT monitoring window opens.
[root@frankfurt tmp]# ./itdt -f /dev/IBMtape0 inquiry 80
Issuing inquiry for page 0x80...
Inquiry Page 0x80, Length 14
        0 1 2 3  4 5 6 7  8 9 A B  C D E F    0123456789ABCDEF
 0000 - 0180 000A 3134 3937 3531 3530 3030    [....1497515000  ]
Exit with code: 0
[root@frankfurt tmp]# ./itdt -f /dev/IBMtape1 inquiry 80
Issuing inquiry for page 0x80...
Inquiry Page 0x80, Length 14
        0 1 2 3  4 5 6 7  8 9 A B  C D E F    0123456789ABCDEF
 0000 - 0180 000A 3134 3937 3531 3530 3032    [....1497515002  ]
Exit with code: 0
[root@frankfurt tmp]# ./itdt -f /dev/IBMtape2 inquiry 80
Issuing inquiry for page 0x80...
Inquiry Page 0x80, Length 14
        0 1 2 3  4 5 6 7  8 9 A B  C D E F    0123456789ABCDEF
 0000 - 0180 000A 3134 3937 3531 3530 3034    [....1497515004  ]
Exit with code: 0
We recommend identifying the virtual tape devices and creating a table with the data, as shown in Table 9-2.
Table 9-2 Virtual Tape Library worksheet
Device in OS | Type | VTL system | VTL node | VTL port | VTL port WWN | Serial number | Element number
SAN zoning
The ProtecTIER Gateway and appliance have specific recommendations about how SAN zones are created:
Use zones based on World Wide Port Name (WWPN).
Use two-member zones, that is, one initiator port and one target port per zone.
For each backup server, create a separate zone for each HBA that will access ProtecTIER virtual resources.
Before creating WWN zones at a SAN switch, you must get the WWPN of each port for both your ProtecTIER and your host computer.
For ProtecTIER, you need the WWPN of each front end port. To do this, you can use the ProtecTIER Manager GUI, see Figure 9-64. In ProtecTIER, the front end ports are Emulex HBAs and the back end ports are Qlogic HBAs.
For your host computer, you can issue a command to get the WWPN of each HBA. In our case, we worked with Linux, see Example 9-18.
Example 9-18
[root@frankfurt ~]# cat /sys/class/fc_host/host6/port_name
0x10000000c97c70cf
After successful zoning, if you click Scan you can check which WWPNs are seen by which port, see Figure 9-65. Our setup is correct, and this front end port of ProtecTIER sees the WWPN of our Frankfurt node.
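If the host has several HBAs, you can list all of their WWPNs in one pass. This is a minimal sketch using the same sysfs path; the host numbers vary by system:
[root@frankfurt ~]# for f in /sys/class/fc_host/host*/port_name; do echo "$f: $(cat $f)"; done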
Interoperability
Check the IBM Interoperability Matrix to ensure that the version of the backup server and the operating system that you are running on are supported for ProtecTIER. Also check the HBA and firmware level of the host platform to ensure that your end-to-end environment is supported.
Note: Not all backup applications require the IBM Tape Device Driver installation. For some vendors, the SCSI pass-through or native OS driver is used. Check the vendor requirements and the ISV matrix.
IBM maintains the latest levels of System Storage tape drive and library device drivers and documentation on the Internet. Obtain them by accessing the following URL:
http://www.ibm.com/support/fixcentral
Refer to the SSIC and ISV websites for release information:
System Storage Interoperation Center (SSIC) website:
http://www-03.ibm.com/systems/support/storage/config/ssic/displayesssearchwithoutjs.wss?start_over=yes
Independent Software Vendors (ISV) website:
http://www-03.ibm.com/systems/storage/tape/library.html#interoperability
The lin_tape device driver for Linux is provided in a source rpm package. The utility tools for lin_tape are supplied in binary rpm packages. They will be downloaded with the driver.
Note: The latest packages can be accessed at Fix Central:
http://www.ibm.com/support/fixcentral
You have to install the Linux device driver lin_tape; see details in the IBM device driver User's Guide.
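As an illustrative sketch only (the package file names, version, and output directory are assumptions that vary by driver level and distribution; use the packages downloaded from Fix Central), building and installing the driver on a Red Hat Enterprise Linux 5 host typically follows this pattern:
[root@frankfurt tmp]# rpmbuild --rebuild lin_tape-1.41.1-1.src.rpm     # build a binary rpm against the running kernel
[root@frankfurt tmp]# rpm -ivh /usr/src/redhat/RPMS/x86_64/lin_tape-1.41.1-1.x86_64.rpm
[root@frankfurt tmp]# rpm -ivh lin_taped-1.41.1-rhel5.x86_64.rpm       # the lin_taped daemon utility package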
[root@frankfurt ~]# cd /tmp
[root@frankfurt tmp]# ./itdt
You will go to the Entry Menu, see Example 9-20.
Example 9-20 Entry Menu
IBM Tape Diagnostic Tool Standard Edition - V4.1.0 Build 026
Entry Menu
[S] Scan for tape drives (Diagnostic/Maintenance Mode)
[U] Tapeutil (Expert Mode)
[H] Help
[Q] Quit program
Notes:
- During a test, user data on the cartridge will be erased!
- Make sure no other program is accessing the devices used by ITDT!
- A device scan may take several minutes in some cases!
- Q + Enter will always close this program.
- H + Enter will display a Help page.
<[H] Help | [Q] Quit | Command > s
Here you can choose S to scan for tape drives. First, during the scan, you will see the Example 9-21 screen.
Example 9-21 Scan execution
- Device List
|  9 |    |    |    |    |              |            |      |            | |
| 10 |    |    |    |    |              |            |      |            | |
| 11 |    |    |    |    |              |            |      |            | |
+----+----+----+----+----+--------------+------------+------+------------+-+
Scan running...
<[Q] Quit | Command >
After the scan is finished, you will see the populated screen shown in Example 9-22. In our case the library has two robots and ten drives.
Example 9-22 post results
- Device List
 [#]  Host Bus  ID   LUN  Model          Serial       Ucode  Changer
+----+----+----+----+----+--------------+------------+------+------------+-+
|  0 |  6 |  0 |  0 |  0 | 03584L32     | 75159990402| 0100 |            | |
|  1 |  6 |  0 |  0 |  1 | ULT3580-TD3  | 1497515000 | 5AT0 | 75159990402| |
|  2 |  6 |  0 |  0 |  2 | ULT3580-TD3  | 1497515002 | 5AT0 | 75159990402| |
|  3 |  6 |  0 |  0 |  3 | ULT3580-TD3  | 1497515004 | 5AT0 | 75159990402| |
|  4 |  6 |  0 |  0 |  4 | ULT3580-TD3  | 1497515006 | 5AT0 | 75159990402| |
|  5 |  6 |  0 |  0 |  5 | ULT3580-TD3  | 1497515008 | 5AT0 | 75159990402| |
|  6 |  8 |  0 |  0 |  0 | 03584L32     | 75159990402| 0100 |            | |
|  7 |  8 |  0 |  0 |  1 | ULT3580-TD3  | 1497515001 | 5AT0 | 75159990402| |
|  8 |  8 |  0 |  0 |  2 | ULT3580-TD3  | 1497515003 | 5AT0 | 75159990402| |
|  9 |  8 |  0 |  0 |  3 | ULT3580-TD3  | 1497515005 | 5AT0 | 75159990402| |
| 10 |  8 |  0 |  0 |  4 | ULT3580-TD3  | 1497515007 | 5AT0 | 75159990402| |
| 11 |  8 |  0 |  0 |  5 | ULT3580-TD3  | 1497515009 | 5AT0 | 75159990402| |
+----+----+----+----+----+--------------+------------+------+------------+-+
<[H] Help | [Q] Quit | + | - | Line # | Command >
After the scan, you can view the same output by checking /proc/scsi/IBMtape and /proc/scsi/IBMchanger, or /proc/scsi/IBM*, see Example 9-23.
Example 9-23 device display
[root@frankfurt /]# cat /proc/scsi/IBMtape
lin_tape version: 1.41.1
lin_tape major number: 251
Attached Tape Devices:
Number  model        SN          HBA   FO Path
0       ULT3580-TD3  1497515000  lpfc  NA
1       ULT3580-TD3  1497515002  lpfc  NA
2       ULT3580-TD3  1497515004  lpfc  NA
3       ULT3580-TD3  1497515006  lpfc  NA
4       ULT3580-TD3  1497515008  lpfc  NA
5       ULT3580-TD3  1497515001  lpfc  NA
6       ULT3580-TD3  1497515003  lpfc  NA
7       ULT3580-TD3  1497515005  lpfc  NA
8       ULT3580-TD3  1497515007  lpfc  NA
9       ULT3580-TD3  1497515009  lpfc  NA
[root@frankfurt /]# cat /proc/scsi/IBMchanger
lin_tape version: 1.41.1
lin_tape major number: 251
Attached Changer Devices:
Number  model     SN                HBA   FO Path
0       03584L32  0014975159990402  lpfc  NA
1       03584L32  0014975159990402  lpfc  NA
If you compare this view with the ProtecTIER GUI Library General view in Figure 9-67, you will find the same devices and serials there.
Note: The robots have the same serial number. It is important to use the appropriate driver if Control Path Failover (CPF) is enabled, as the ISV matrix describes.
1. You can change this:
a. by clicking the drop-down button next to HyperFactor mode in the Library view, or
b. from the menu: first choose the appropriate library in the navigation pane, then in the top menu click VT → VT Library → Set HyperFactor mode. You will get the dialog shown in Figure 9-69.
2. Select one of the following options, as directed by IBM Support:
HyperFactor enabled: HyperFactor operates as normal.
HyperFactor disabled: HyperFactor stops. When you restart HyperFactor, the HyperFactor process proceeds as normal based on the data stored from before HyperFactor stopped.
Baseline: HyperFactor stops factoring incoming data and uses the newly stored non-factored data as the reference for factoring new data after HyperFactor is resumed.
3. Click OK. The ProtecTIER VT HyperFactor mode window closes.
2. Select Disable compression and click OK. The ProtecTIER compression mode dialog closes and compression is stopped.
3. Selecting Enable compression on the ProtecTIER compression mode dialog resumes data compression.
Note: The compression done by ProtecTIER is called Diligent Compression. For compressed data streams, create a new ProtecTIER VTL with compression turned off. Compressing data a second time can cause data expansion, so compressed data should be segregated in ProtecTIER whenever possible.
See more details in 9.7.4, Creating Logical Storage Units (LSUs) on page 210.
Interoperability
Check the IBM Interoperability Matrix to ensure that the version of the backup server and the operating system that you are running on are supported for ProtecTIER. You can view the matrix at the following address: http://www.ibm.com/systems/support/storage/config/ssic/index.jsp
Software compatibility
Make sure that your backup server version, platform, and operating system version are on the supported hardware and software list for ProtecTIER. You can view the list at the following address: http://www-03.ibm.com/systems/storage/tape/library.html#compatibility
Software currency
Ensure that the backup server has the latest patches or maintenance level to improve the overall factoring performance. Ensure that the operating system of the platform that the backup server is running on has the latest patches or maintenance level to improve the overall HyperFactor performance.
Compression
Standard compression will effectively scramble the data sent to ProtecTIER, making pattern matching difficult; real-time compression in databases and IBM Real-time compression does not scramble this data. As can be expected, this situation will have an effect on data-matching rates, even if the same data is sent each time. ProtecTIER will compress the data that it sends to the back-end physical disk, after it has been received by the virtual tape drives and deduplicated. If a data stream is not already compressed, our experience suggests that it is most efficient to let ProtecTIER compress data after deduplication. Compressed data can have fewer duplicate blocks than uncompressed data, so the effectiveness of deduplication can be diminished. Workloads and results vary, so we encourage experimentation.
Compressed data can reach backup systems and ProtecTIER in a number of ways. Consider the following use cases where the data stream going to ProtecTIER remains compressed:
File backups may contain files from remote systems that enable compression in their backup client software to conserve network bandwidth.
File system backups may also contain compressed file formats, such as GIF images, which remain compressed until opened by a GIF reader.
Block-level database backups may contain compressed database objects. Database compression features deliver about 5:1 data reduction for Oracle, DB2, and Informix, and a bit less for databases with free compression.
NDMP image backups may contain data compressed by IBM Real-time compression or another process. NDMP uses a back channel to create backups, so IBM Real-time compression or other rehydration processes are bypassed.
Some of the actions you can take regarding data being compressed before reaching ProtecTIER are:
ProtecTIER does not require additional compression to be effective. ProtecTIER performs compression, by default, after the deduplication process. Do not allocate disk for ProtecTIER VTLs that compresses data by default.
ProtecTIER can manage multiple VTLs, each with its own configuration. For compressed data streams, create a new ProtecTIER VTL with compression turned off. Compressing data a second time can cause data expansion, so compressed data should be segregated in ProtecTIER whenever possible.
File systems with small files (under 32 KB), whether or not they are compressed, should not be sent directly to ProtecTIER. The following options should be considered to prevent ProtecTIER from bypassing small files:
Large NAS systems with small files should use NDMP for image backups and then send those files to ProtecTIER.
File level backups should first back up to backup application Disk Storage Pools, and then the Disk Storage Pool can be copied to ProtecTIER.
If a data stream is not already compressed, our experience suggests it is most efficient to let ProtecTIER compress data after deduplication. Compressed data can have fewer duplicate blocks than uncompressed data, so the effectiveness of deduplication can be diminished. Workloads and results vary, so we encourage experimentation.
Always encrypt last. Deduplicating encrypted data is ineffective. Compressing encrypted data can decrease security. Drive-level encryption has no performance impact, and it ensures that encryption occurs last.
Encryption
Encryption makes each piece of data sent to ProtecTIER unique, including duplicate data. As can be expected, this has an effect on data matching rates and the factoring performance because even if the same data is sent each time, it will appear differently to the deduplication engine. We strongly recommend that you disable any encryption features for the ProtecTIER storage pool in the backup server. Note: Always encrypt last. Deduplicating encrypted data is ineffective. Compressing encrypted data can decrease security. Drive-level encryption has no performance impact, and it ensures that encryption occurs last.
Multiplexing
Do not use the multiplexing feature of any backup application with the ProtecTIER storage pool. Although ProtecTIER will work with these features, the benefits (disk savings) of the HyperFactor algorithm and compression will be greatly reduced. We strongly recommend that you disable any multiplexing features in the backup server for the ProtecTIER storage pool.
Ports
If possible, provision the HBA ports by connecting to the IBM System Storage TS7650 or TS7650G storage system with no other devices (disk or real tape) connected or zoned to those ports. Ensure that the HBAs in the backup server are spread across all the PCI buses.
Additional factors
Another factor that affects performance in a ProtecTIER environment is the type of data being targeted for backup. Some data is well suited to data deduplication and other data is not. For example, small files (less than 32 KB in size) commonly found in operating systems do not factor well, although the built-in compression might reduce their stored size. You might want to consider some of the following options: Larger NAS systems should use NDMP for image backups and then be sent to ProtecTIER. File level backups should first back up to backup application Disk Storage Pools, and then the Disk Storage Pool can be copied to ProtecTIER. We discuss known configuration changes suited to specific data types in 9.11.3, Data types.
Controls the number of tape drives that will be used for backup. Use more parallel backup streams to achieve higher performance. For this, make sure that the number of virtual tape drives available within the ProtecTIER library matches the number of parallel streams configured within RMAN, as sketched below.
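As an illustration only (the stream count of eight is an assumption matching the eight virtual drives created earlier in this chapter; adapt it to your library and media manager), the RMAN side of this pairing can look like this:
RMAN> CONFIGURE DEVICE TYPE 'SBT_TAPE' PARALLELISM 8;
RMAN> BACKUP DATABASE;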
Domino
Compaction is a function of Domino to reduce the primary space that the NSF files take. This is a best practice used in almost all Domino environments. Compaction shuffles the data in the NSF file so that it takes less space. By doing that, the data will be considered new, since it will have almost a 100% change rate. There are two possible solutions to overcome this:
Disable compaction, although this is contrary to Domino's best practices.
Upgrade Domino to 8.5 and enable DAOS (Domino Attachment and Object Service).
DAOS allows storing only one instance of an attachment, as opposed to having multiple copies stored and written to disk. Of course, if the spreadsheet or attachment is modified, DAOS will store the newly modified copy as a separate version. Moreover, attachment consolidation is not limited to mail; it occurs as soon as an attachment is saved in any document of any database on the server on which the feature is enabled. In a standard Notes database (NSF), the attachments are stored inside the NSF file itself, and the database is self-contained. In order to back up a standard Notes database, only the NSF file itself needs to be backed up. After you introduce DAOS, the NSFs that participate in DAOS contain only references to the NLO files where the attachment content is stored. As a result, backing up the NSF alone is no longer enough; the NLO data needs to be backed up as well. These NLO files can be a very good candidate for deduplication.
SQL
We recommend revising the index defragmentation timing and creating a monitoring procedure for it. We recommend not doing index defragmentation daily, but changing it to weekly, monthly, or defragmentation based on a threshold setup, where you define exactly at what level or percentage it should run. This leads to a better factoring ratio, since the data change rate won't be as high.
Blocksize: We recommend using larger block sizes, above 256 KB. Since in SQL native backup you might not have the option to change it higher than 64 KB, you will have to use the backup application options for that. For example, if you use ITSM rather than SQL native backup, you can set up a larger block size in ITSM. You should check the settings for BUFFERCOUNT, MAXTRANSFERSIZE and STRIPES. By tuning these you can eliminate some of the data shuffling happening with SQL; an illustrative example follows.
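The following is a sketch only, not taken from the product documentation: the database name, backup target, and values are hypothetical and simply show where BUFFERCOUNT, MAXTRANSFERSIZE, and the (64 KB maximum) BLOCKSIZE are set on a native SQL Server backup:
BACKUP DATABASE SalesDB
   TO DISK = N'E:\dump\SalesDB.bak'
   WITH BUFFERCOUNT = 64,
        MAXTRANSFERSIZE = 4194304,
        BLOCKSIZE = 65536;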
DB2
The current DB2 version involves built-in multiplexing, which significantly reduces a dedupe solution's factoring ratio (and performance). Improvements to optimize factoring can be managed with larger buffer sizes. Backup performance, especially for large database files, can be improved by increasing the parallelism of the backup streams. These split files can then be backed up with multiple ITSM backup sessions. We recommend using maximal buffers, minimum parallelism, and a minimum number of sessions; see Example 9-24.
Example 9-24
db2 backup db abc use tsm open 8 sessions with 18 buffers buffer 16384 parallelism 8
Buffer size
The size of the buffer used for a backup stream. A value of 16384 will provide the best factoring, but may require more memory than is available. Adjust if necessary; for example, the session in Example 9-24 above requires 1.2 GB of memory. If that's too much, use buffer 4097 instead of buffer 16384.
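As a rough cross-check, and assuming the DB2 backup buffer size is expressed in 4 KB pages, the memory used by Example 9-24 is approximately 18 buffers x 16384 pages x 4 KB per page, which works out to about 1.2 GB and matches the figure quoted above.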
Parallelism
Determines the number of DB2 threads used to handle the backup process. Set value to the minimum number that still allows backups to complete in time. Start with 24 and adjust as necessary.
Sessions
Determines the number of ProtecTIER tape drives used to write the backup. Set value to the minimum number that still allows backups to complete in time. Start with 8 and adjust as necessary.
Client compression should be disabled. When using Windows-based IBM Tivoli Storage Manager servers, the IBM Tivoli Storage Manager driver for tapes and libraries for Windows must be used; native Windows drivers for the emulated P3000 libraries and DLT7000 drives are not supported. Figure 9-71 illustrates a typical ITSM environment using ProtecTIER. The ITSM environment is straightforward: the ITSM servers are connected to storage devices (disk, real tape, or virtual tape), which are used to store the data backed up from the clients they serve. Every action and backup set that ITSM processes is recorded in the ITSM database. Without a copy of the ITSM database, an ITSM server cannot restore any of the data that is contained on the storage devices.
ProtecTIER provides a virtual tape interface to the ITSM servers and allows the creation of two storage pools:
- The ACTIVE TSM pool
- The ONSITE TAPE pool (called PT_TAPE_POOL in Figure 9-71)
The user can also maintain another storage pool to create real physical tapes to take offsite (called OFFSITE_TAPE in our example). The user has sized the ProtecTIER system to store all active data and about 30 days of inactive client files on virtual tape. The customer also created an ACTIVE TSM pool, also hosted on the ProtecTIER system, which contains the most recent (active) files backed up from all client servers. The ACTIVE pool is where client restores are served from. The advantage of this architecture is that it eliminates the use of physical tape in the data center and allows restores to occur much faster, because they come from ProtecTIER disk-based virtual tape rather than real tape.
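As an illustrative sketch only (the device class names ltoclass and physlto and the MAXSCRATCH values are hypothetical, and the active-data pool additionally requires the domain's ACTIVEDESTINATION to be set), such a pool layout could be built with standard ITSM commands:

def stgpool pt_tape_pool ltoclass maxscr=100
def stgpool active_pool ltoclass pooltype=activedata maxscr=50
def stgpool offsite_tape physlto pooltype=copy maxscr=100
copy activedata pt_tape_pool active_pool

The COPY ACTIVEDATA command populates the active-data pool with only the active backup versions, which is what allows restores to be served from the ProtecTIER-resident ACTIVE pool.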
For the purposes of this example, the ITSM server is named SERVER1 and is running version 6.1; see Example 9-25. The host server is called frankfurt.storage.tucson.ibm.com and is running Red Hat Enterprise Linux 5.4.
Example 9-25   Querying the server status
tsm: SERVER1>q stat
Storage Management Server for Linux/x86_64 - Version 6, Release 1, Level 2.0

                       Server Name: SERVER1
    Server host name or IP address:
         Server TCP/IP port number: 1500
                       Crossdefine: Off
               Server Password Set: No
    Server Installation Date/Time: 08/30/2010 16:56:11
         Server Restart Date/Time: 09/24/2010 16:22:03
                    Authentication: On
        Password Expiration Period: 90 Day(s)
     Invalid Sign-on Attempt Limit: 0
           Minimum Password Length: 0
                      Registration: Closed
                    Subfile Backup: No
                      Availability: Enabled
                        Accounting: Off
            Activity Log Retention: 30 Day(s)
    Activity Log Number of Records: 6134
                 Activity Log Size: <1 M
 Activity Summary Retention Period: 30 Day(s)
more...   (<ENTER> to continue, 'C' to cancel)
The library was given ten virtual LTO3 cartridge drives. The tape drives are divided between two ports. The virtual robot is defined on the ProtecTIER node called tuscany. Five hundred library slots were created, along with sixteen import/export (I/E) slots for completeness of the library emulation. Ten virtual cartridges were also created. See more details of the virtual LTO3 drive definitions in Figure 9-73.
Note the logical unit number (LUN) assigned to each virtual tape drive. It is important to know that this is the ProtecTIER LUN number only and the host operating system will almost certainly assign each drive a different LUN than that which appears here, as the host will have more logical units or resources than just these tape drives.
During library definition, ProtecTIER assigned serial numbers to each drive, seeded from a random number. These can be important later when defining the IBM Tivoli Storage Manager paths that link the host devices to the IBM Tivoli Storage Manager tape drive definitions. Knowing the drive serial number is helpful if you experience any problems when defining the paths, because you can use it as a common reference point when matching a drive with its host device file name. The element number is the other address of each object in the tape library. Everything that can be located (such as the robot, drives, slots, and I/O slots) has a unique element number in the library. ITSM (and any other tape-library-related software) uses this number as the address in the SCSI commands that drive the robot. The element number can also be important later when defining the IBM Tivoli Storage Manager tape drives. Ten virtual cartridges are added to the library. They are set up with a maximum cartridge growth of 100 GB. One of the cartridges (A00001L3) has data in this view. All of them are R/W (read and write capable); see Figure 9-74. The Capacity, Data size, and Max size columns all show nominal values.
Defining the virtual tape library to Linux with IBM Tivoli Storage Manager
The steps needed to define a ProtecTIER virtual tape library and drives to IBM Tivoli Storage Manager are identical to those required for the corresponding physical tape library and drives. To define a physical or virtual library in ITSM, perform the following steps:
1. Install the device drivers for tape, as demonstrated in 9.9.2, "Installing and configuring the device driver in OS" on page 232.
2. Run the scan from itdt, as shown in "Scan all devices" on page 234. You will get the output seen in Example 9-26:
Example 9-26 running the scan from itdt
IBM Tape Diagnostic Tool Standard Edition
  #   Host   Bus   ID   LUN   Model
+----+----+----+----+----+--------------+------------+------+------------+-+ | 0 | 6 | 0 | 0 | 0 | 03584L32 | 75159990402| 0100 | | | | 1 | 6 | 0 | 0 | 1 | ULT3580-TD3 | 1497515000 | 5AT0 | 75159990402| | | 2 | 6 | 0 | 0 | 2 | ULT3580-TD3 | 1497515002 | 5AT0 | 75159990402| | | 3 | 6 | 0 | 0 | 3 | ULT3580-TD3 | 1497515004 | 5AT0 | 75159990402| | | 4 | 6 | 0 | 0 | 4 | ULT3580-TD3 | 1497515006 | 5AT0 | 75159990402| | | 5 | 6 | 0 | 0 | 5 | ULT3580-TD3 | 1497515008 | 5AT0 | 75159990402| | | 6 | 8 | 0 | 0 | 0 | 03584L32 | 75159990402| 0100 | | | | 7 | 8 | 0 | 0 | 1 | ULT3580-TD3 | 1497515001 | 5AT0 | 75159990402| | | 8 | 8 | 0 | 0 | 2 | ULT3580-TD3 | 1497515003 | 5AT0 | 75159990402| | | 9 | 8 | 0 | 0 | 3 | ULT3580-TD3 | 1497515005 | 5AT0 | 75159990402| | | 10 | 8 | 0 | 0 | 4 | ULT3580-TD3 | 1497515007 | 5AT0 | 75159990402| | | 11 | 8 | 0 | 0 | 5 | ULT3580-TD3 | 1497515009 | 5AT0 | 75159990402| | +----+----+----+----+----+--------------+------------+------+------------+-+
<[H] Help | [Q] Quit | + | - | Line # | Command >
3. Obtain information about the tape devices, as seen in Example 9-27:
Example 9-27 device information
[root@frankfurt /]# cat /proc/scsi/IBMtape
lin_tape version: 1.41.1
lin_tape major number: 251
Attached Tape Devices:
Number  model        SN          HBA   FO Path
0       ULT3580-TD3  1497515000  lpfc  NA
1       ULT3580-TD3  1497515002  lpfc  NA
2       ULT3580-TD3  1497515004  lpfc  NA
3       ULT3580-TD3  1497515006  lpfc  NA
4       ULT3580-TD3  1497515008  lpfc  NA
5       ULT3580-TD3  1497515001  lpfc  NA
6       ULT3580-TD3  1497515003  lpfc  NA
7       ULT3580-TD3  1497515005  lpfc  NA
8       ULT3580-TD3  1497515007  lpfc  NA
9       ULT3580-TD3  1497515009  lpfc  NA
[root@frankfurt /]# cat /proc/scsi/IBMchanger
lin_tape version: 1.41.1
lin_tape major number: 251
Attached Changer Devices:
Number  model     SN                HBA   FO Path
0       03584L32  0014975159990402  lpfc  NA
1       03584L32  0014975159990402  lpfc  NA
4. You can read the device IDs or run an element inventory after opening the device in itdt.
a. Start itdt and choose Tapeutil (U); see Example 9-28:
Example 9-28 element inventory
(Q to quit)
IBM Tape Diagnostic Tool Standard Edition - V4.1.0 Build 026
Entry Menu
  [S] Scan for tape drives (Diagnostic/Maintenance Mode)
  [U] Tapeutil (Expert Mode)
  [H] Help
  [Q] Quit program
Notes:
- During a test, user data on the cartridge will be erased!
- Make sure no other program is accessing the devices used by ITDT!
- A device scan may take several minutes in some cases!
- Q + Enter will always close this program.
- H + Enter will display a Help page.
<[H] Help | [Q] Quit | Command > u
b. After that, open a device (1); see Example 9-29:
Example 9-29 command menu
---------------------------- General Commands: ---------------------------[1] Open a Device [5] Reserve Device [9] Mode Sense [2] Close a Device [6] Release Device [10] Query Driver Ver. [3] Inquiry [7] Request Sense [11] Display All Paths [4] Test Unit Ready [8] Log Sense ---------------------------- Tape Drive Commands: ------------------------[20] Rewind [28] Erase [36] Display Message [21] Forward Space Filemarks [29] Load Tape [37] Report Density Supp [22] Backward Space Filemarks [30] Unload Tape [38] Test Encryp. Path [23] Forward Space Records [31] Write Filemarks [39] Config. TCP/IP Port [24] Backward Space Records [32] Synchronize Buffers [25] Space to End of Data [33] Query/Set Parameter [26] Read and Write Tests [34] Query/Set Tape Position [27] Read or Write Files [35] Query Encryption Status ---------------------------- Tape Library Commands: ----------------------[50] Element Information [55] Initialize Element Status [51] Position to Element [56] Prevent/Allow Medium Removal [52] Element Inventory [57] Initialize Element Status Range [53] Exchange Medium [58] Read Device IDs [54] Move Medium [59] Read Cartridge Location
---------------------------- Service Aid Commands: -----------------------
[70] Dump/Force Dump/Dump        [71] Firmware Update
<[H] Help | [Q] Quit | Command > 1
c. Type in the right device; in our case it is /dev/IBMchanger0. See Example 9-30.
Example 9-30 selecting device
ITDT- Open a Device +---------------------------------------------+ | /dev/IBMtape0 | +---+-----------------------------------------+ | 1 | 1=Read/Write, 2=Read Only, 3=Write Only, +---+
<Specify Device Name | [enter] for /dev/IBMtape0> /dev/IBMchanger0
d. Then choose 1 for the Read/Write mode; see Example 9-31:
Example 9-31 mode selection
ITDT- Open a Device +---------------------------------------------+ | /dev/IBMchanger0 | +---+-----------------------------------------+ | 1 | 1=Read/Write, 2=Read Only, 3=Write Only, +---+
<Specify Mode | [enter] for 1> 1
e. Then press Enter to execute the command and display the result; see Example 9-32:
Example 9-32 execution
Command Result +----------------------------------------------------------------------------+ | Opening Device... | | Open Device PASSED | | | | Device:/dev/IBMchanger0 opened | | | | | +----------------------------------------------------------------------------+
< [Q] Quit | [N] Next | [P] Previous | + | - | [Enter] Return >
f. Press Enter to return to the main menu, and choose 50 Element Information; see Example 9-33. Here you will see the dimensions of the library and the starting element numbers.
Example 9-33 element information Command Result +-----------------------------------------------------------------------------+ | Getting element information... | | Read Element Information PASSED | | | | | | Number of Robots .............. 1 | | First Robot Address ............ 0 | | Number of Slots ................ 500 | | First Slot Address ............. 1026 | | Number of Import/Exports ....... 16 | | First Import/Export Address .... 64514 | | Number of Drives ............... 10 | | First Drive Address ............ 2 | | | +-----------------------------------------------------------------------------+ < [Q] Quit | [N] Next| [P] Previous | + | - | [Enter] Return >
g. Press Enter again to go back to the menu. Now you can run 58 Read Device IDs. You will get the results seen in Example 9-34. To go to the next page within Tapeutil, press n (next). In our example we show only two drives.
Example 9-34 Read device ids Command Result +-----------------------------------------------------------------------------+ | Reading element device ids... | | Read Device Ids PASSED | | | | | | Drive Address 2 | | Drive State ................... Normal | | ASC/ASCQ ...................... 0000 | | Media Present ................. No | | Robot Access Allowed .......... Yes | | Source Element Address Valid .. No | | Media Inverted ................ No | | Same Bus as Medium Changer .... Yes | | SCSI Bus Address Valid ........ No | | Logical Unit Number Valid ..... No | | Volume Tag .................... | | Device ID, Length 34 | | | | 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF | | 0000 - 4942 4D20 2020 2020 554C 5433 3538 302D [IBM ULT3580-] | | 0010 - 5444 3320 2020 2020 3134 3937 3531 3530 [TD3 14975150] | | 0020 - 3030 [00 ] | | | | | | | | Drive Address 3 | | Drive State ................... Normal | | ASC/ASCQ ...................... 0000 |
| Media Present ................. No                                          |
| Robot Access Allowed .......... Yes                                         |
| Source Element Address Valid .. No                                          |
| Media Inverted ................ No                                          |
| Same Bus as Medium Changer .... Yes                                         |
| SCSI Bus Address Valid ........ No                                          |
| Logical Unit Number Valid ..... No                                          |
| Volume Tag ....................                                             |
| Device ID, Length 34                                                        |
|        0 1 2 3  4 5 6 7  8 9 A B  C D E F    0123456789ABCDEF               |
| 0000 - 4942 4D20 2020 2020 554C 5433 3538 302D  [IBM     ULT3580-]           |
| 0010 - 5444 3320 2020 2020 3134 3937 3531 3530  [TD3     14975150]           |
| 0020 - 3031                                     [01              ]           |
+-----------------------------------------------------------------------------+
Note: You can also get the serial number for each device with the following command:
[root@frankfurt tmp]# ./itdt -f /dev/IBMtape0 inquiry 80
Issuing inquiry for page 0x80...
Inquiry Page 0x80, Length 14
       0 1 2 3  4 5 6 7  8 9 A B  C D E F    0123456789ABCDEF
0000 - 0180 000A 3134 3937 3531 3530 3030    [....1497515000  ]
h. You can choose 52 Element Inventory to see the element numbers. You will get the following results; see Example 9-35.
Example 9-35 viewing element numbers Command Result +-----------------------------------------------------------------------------+ | Reading element status... | | | | Element Inventory PASSED | | | | | | | | Robot Address 0 | | Robot State ................... Normal | | ASC/ASCQ ...................... 0000 | | Media Present ................. No | | Source Element Address Valid .. No | | Media Inverted ................ No | | Volume Tag .................... | | | | Drive Address 2 | | Drive State ................... Normal | | ASC/ASCQ ...................... 0000 | | Media Present ................. No | | Robot Access Allowed .......... Yes | | Source Element Address Valid .. No | | Media Inverted ................ No | | Same Bus as Medium Changer .... Yes | | SCSI Bus Address Valid ........ No | | Logical Unit Number Valid ..... No | | Volume Tag .................... | | | | Drive Address 3 |
| Drive State ................... Normal                                      |
| ASC/ASCQ ...................... 0000                                        |
| Media Present ................. No                                          |
| Robot Access Allowed .......... Yes                                         |
| Source Element Address Valid .. No                                          |
| Media Inverted ................ No                                          |
| Same Bus as Medium Changer .... Yes                                         |
| SCSI Bus Address Valid ........ No                                          |
| Logical Unit Number Valid ..... No                                          |
| Volume Tag ....................                                             |
|                                                                             |
| Slot Address ................... 1026                                       |
| Slot State .................... Normal                                      |
| ASC/ASCQ ...................... 0000                                        |
| Media Present ................. Yes                                         |
| Robot Access Allowed .......... Yes                                         |
| Source Element Address ........ 1038                                        |
| Media Inverted ................ No                                          |
| Volume Tag .................... A00000L3                                    |
|                                                                             |
| Slot Address ................... 1027                                       |
| Slot State .................... Normal                                      |
| ASC/ASCQ ...................... 0000                                        |
| Media Present ................. Yes                                         |
| Robot Access Allowed .......... Yes                                         |
| Source Element Address ........ 8                                           |
| Media Inverted ................ No                                          |
| Volume Tag .................... A00001L3                                    |
|                                                                             |
| Slot Address ................... 1028                                       |
| Slot State .................... Normal                                      |
| ASC/ASCQ ...................... 0000                                        |
| Media Present ................. Yes                                         |
| Robot Access Allowed .......... Yes                                         |
| Source Element Address ........ 3                                           |
| Media Inverted ................ No                                          |
| Volume Tag .................... A00002L3                                    |
+-----------------------------------------------------------------------------+
5. Using the above information we filled out this worksheet, Table 9-3 on page 254.
Table 9-3   Tape Library Worksheet

Device in OS   Type   VTL system   VTL node   VTL port WWN       Serial number      Element number
IBMchanger0    3584   Tuscany      Tuscany    10000000c97c6ca1   0014975159990402   0
IBMtape0       LT3    Tuscany      Tuscany    10000000c97c6ca1   1497515000         2
IBMtape1       LT3    Tuscany      Tuscany    10000000c97c62ef   1497515001         3
6. Now you can define the library in IBM Tivoli Storage Manager, using the commands DEFINE LIBRARY and QUERY LIBRARY. See Example 9-36.
Example 9-36 defining library
tsm: SERVER1>def library vtl libtype=scsi relabelscratch=yes
ANR8400I Library VTL defined.

tsm: SERVER1>q libr f=d
                   Library Name: VTL
                   Library Type: SCSI
                         ACS Id:
               Private Category:
               Scratch Category:
          WORM Scratch Category:
               External Manager:
                         Shared: No
                        LanFree:
             ObeyMountRetention:
        Primary Library Manager:
                            WWN:
                  Serial Number:
                      AutoLabel:
                   Reset Drives:
                Relabel Scratch:
Last Update by (administrator):
          Last Update Date/Time:
Note: The RELABELSCRATCH parameter allows volumes to be relabeled automatically when they are returned to scratch. This ensures that after a cartridge expires and returns to scratch, ITSM relabels it; in this case the space is freed up in ProtecTIER as well.
7. Define a path to the IBM Tivoli Storage Manager library using the DEFINE PATH and QUERY PATH commands. See Example 9-37:
Example 9-37 Define path
tsm: SERVER1>def path server1 vtl srct=server destt=library device=/dev/IBMchanger0
ANR1720I A path from SERVER1 to VTL has been defined.
8. Define the drives to IBM Tivoli Storage Manager using the DEFINE DRIVE and QUERY DRIVE commands. See Example 9-38:
Example 9-38 define drive
tsm: SERVER1>def drive vtl drive00
ANR8404I Drive DRIVE00 defined in library VTL.
tsm: SERVER1>def drive vtl drive01
ANR8404I Drive DRIVE01 defined in library VTL.
tsm: SERVER1>q dr

Library Name   Drive Name   Device Type   On-Line
------------   ----------   -----------   -------
VTL            DRIVE00      LTO           Yes
VTL            DRIVE01      LTO           Yes
tsm: SERVER1>q dr vtl DRIVE00 f=d

Library Name: VTL
  Drive Name: DRIVE00
 Device Type: LTO
     On-Line: Yes
Read Formats: ULTRIUM3C,ULTRIUM3,ULTRIUM2C,ULTRIUM2,ULTRIUMC,ULTRIUM
Write Formats: ULTRIUM3C,ULTRIUM3,ULTRIUM2C,ULTRIUM2
     Element: 2
 Drive State: EMPTY
 Volume Name:
Allocated to:
         WWN: 20010000C97C6CA1
Serial Number: 1497515000
Last Update by (administrator): ALEXC
Last Update Date/Time: 08/31/2010 13:42:09
Cleaning Frequency (Gigabytes/ASNEEDED/NONE): NONE

tsm: SERVER1>q dr vtl DRIVE01 f=d

Library Name: VTL
  Drive Name: DRIVE01
 Device Type: LTO
     On-Line: Yes
Read Formats: ULTRIUM3C,ULTRIUM3,ULTRIUM2C,ULTRIUM2,ULTRIUMC,ULTRIUM
Write Formats: ULTRIUM3C,ULTRIUM3,ULTRIUM2C,ULTRIUM2
     Element: 3
 Drive State: UNKNOWN
 Volume Name:
Allocated to:
         WWN: 20010000C97C62EF
Serial Number: 1497515001
Last Update by (administrator): ALEXC
Last Update Date/Time: 09/24/2010 16:46:19
Cleaning Frequency (Gigabytes/ASNEEDED/NONE): NONE

9. Define the paths to the IBM Tivoli Storage Manager drives. See Example 9-39:
Example 9-39 Defining paths
tsm: SERVER1>def path server1 drive00 srct=server destt=drive library=vtl device=/dev/IBMtape0
ANR1720I A path from SERVER1 to VTL DRIVE00 has been defined.
tsm: SERVER1>def path server1 drive01 srct=server destt=drive library=vtl device=/dev/IBMtape1
ANR1720I A path from SERVER1 to VTL DRIVE01 has been defined.
tsm: SERVER1>q path

Source Name   Source Type   Destination Name   Destination Type   On-Line
-----------   -----------   ----------------   ----------------   -------
SERVER1       SERVER        VTL                LIBRARY            Yes
SERVER1       SERVER        DRIVE00            DRIVE              Yes
SERVER1       SERVER        DRIVE01            DRIVE              Yes
10. Define an IBM Tivoli Storage Manager device class using the DEFINE DEVCLASS and QUERY DEVCLASS commands; see Example 9-40.
Example 9-40 Defining Device Class
tsm: SERVER1>def devc ltoclass libr=vtl devtype=lto format=drive mountret=5
ANR2203I Device class LTOCLASS defined.
tsm: SERVER1>q devclass

Device Class   Device Access   Storage Pool   Device   Format   Est/Max          Mount
Name           Strategy        Count          Type              Capacity (MB)    Limit
----------     ------------    ------------   ------   ------   -------------    ------
DISK           Random          1
LTOCLASS       Sequential      1              LTO      DRIVE                     DRIVES
11. Define an IBM Tivoli Storage Manager storage pool that uses the device class defined in the previous step, using the DEFINE STGPOOL and QUERY STGPOOL commands. See Example 9-41.
Example 9-41 Defining Storage pool
tsm: SERVER1>def stg ptpool ltoclass maxscr=100
ANR2200I Storage pool PTPOOL defined (device class LTOCLASS).
tsm: SERVER1>q stgpool

Storage Pool   Device Class   Estimated   Pct    Pct    High Mig   Low Mig   Next Storage
Name           Name           Capacity    Util   Migr   Pct        Pct       Pool
------------   ------------   ---------   ----   ----   --------   -------   ------------
PTPOOL         LTOCLASS       0.0 M       0.0    0.0    90         70
SPACEMGPOOL    DISK           0.0 M       0.0    0.0    90         70
12. Label the cartridges in the library using the LABEL LIBVOLUME command, and use QUERY LIBVOLUME to list the volumes that were labeled. See Example 9-42.
Example 9-42 Querying the volumes
tsm: SERVER1>label libv vtl search=yes labels=barcode checkin=scratch
ANS8003I Process number 2 started.
tsm: SERVER1>q libv

Library Name   Volume Name   Status    Owner   Last Use   Home Element   Device Type
------------   -----------   -------   -----   --------   ------------   -----------
VTL            A00000L3      Scratch                      1,026          LTO
VTL            A00001L3      Scratch                      1,027          LTO
VTL            A00002L3      Scratch                      1,028          LTO
VTL            A00003L3      Scratch                      1,029          LTO
VTL            A00004L3      Scratch                      1,030          LTO
VTL            A00005L3      Scratch                      1,031          LTO
VTL            A00006L3      Scratch                      1,032          LTO
VTL            A00007L3      Scratch                      1,033          LTO
VTL            A00008L3      Scratch                      1,034          LTO
VTL            A00009L3      Scratch                      1,035          LTO
The virtual library is now defined to your IBM Tivoli Storage Manager server, and the virtual drives and cartridges are ready for use. You now must use standard methods to alter your management class copy groups so that the destination value points to the storage pool created for the virtual library.

Note: If you do not label the virtual cartridges before use, the process will fail when ITSM attempts to write data to them, and ITSM will issue an error message saying that it could not read the internal label of the cartridge. If this error occurs, issue CHECKOUT LIBVOLUME commands to check out and reset the library status of all the cartridges (include the REMOVE=NO option so that they do not leave the virtual library) and label them again with the LABEL LIBVOLUME command. If you forget to include the REMOVE=NO option in your CHECKOUT LIBVOLUME command, the library places the virtual cartridges in the virtual import/export slots. You can view cartridges stored in these slots on the Import/Export tab of the Library window. Using the menu options accessed by right-clicking the desired cartridge, you can relocate the cartridges back to standard virtual slots. After they are relocated to standard slot locations, use the LABEL LIBVOLUME command to label and check them in again. Alternatively, you can label them directly from the Import/Export slots by using the SEARCH=BULK option on the LABEL LIBVOLUME command.
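As an illustrative sketch only (the cartridge name reuses one of the volumes created earlier in this chapter), the recovery sequence described in the note could look like this on the ITSM administrative command line:

tsm: SERVER1>checkout libvolume vtl A00003L3 remove=no checklabel=no
tsm: SERVER1>label libvolume vtl search=yes labels=barcode checkin=scratch overwrite=yes

The OVERWRITE=YES option allows the existing (unreadable) label to be rewritten when the cartridge is relabeled and checked back in as scratch.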
Note: If IBM Tivoli Storage Manager has SANDISCOVERY ON when you are using the ProtecTIER Virtual Tape Library, it can cause problems with the tape drive path descriptions of a node if the node goes offline or the path to the port is broken. With this option on in this scenario, you must reconfigure all the tape paths or determine which device belongs to which path by serial number (which will take much longer). With a lot of virtual drives, this could be time consuming.
LAN-free backup windows were dictated not entirely by business needs, but also by hardware availability. With ProtecTIER and its maximum of 256 virtual tape drives per ProtecTIER node, you can almost completely eliminate any hardware restrictions that you might have faced previously, and schedule your backups when they are required by your business needs.

Data streams: You might be able to reduce your current backup window by taking full advantage of ProtecTIER's throughput performance capabilities. If tape drive availability has been a limiting factor on concurrent backup operations on your ITSM server, you can define a greater number of virtual drives and reschedule backups to run at the same time to maximize the number of parallel tape operations possible on ProtecTIER systems.
Note: If you choose to implement this strategy, you might need to increase the value of the MAXSESSIONS option on your ITSM server.

Reclamation: You should continue to reclaim virtual storage pools that are resident on ProtecTIER. The thresholds for reclamation might need some adjustment until the system reaches steady state (refer to 3.4.5, "Steady state" on page 29 for an explanation of this term). When this point is reached, the fluctuating size of the virtual cartridges should stabilize, and you can decide what the fixed reclamation limit ought to be.

Number of cartridges: This decision has several factors to consider. In ProtecTIER, the capacity of your repository is spread across all your defined virtual cartridges. If you define only a small number of virtual cartridges in ProtecTIER Manager, you might end up with cartridges that each hold a large amount of nominal data. While this might reduce complexity, it could also affect restore operations, because a cartridge required for a restore might be in use by a backup or housekeeping task. Preemption can resolve this issue, but it might be better to define extra cartridges so that your data is spread over more cartridges and drives to make the best use of your virtual tape environment.

Reuse delay period for storage pool cartridges: When deciding how many virtual cartridges to define, remember to consider the current storage pool REUSEDELAY value. This is usually equal to the number of days that your ITSM database backups are retained before they expire. The same delay period should apply to the storage pools that store data on ProtecTIER virtual cartridges, and you might need to increase the number of cartridges defined to ensure that you always have scratch cartridges available for backup.

Collocation: When using a virtual library, consider implementing collocation for your primary storage pools. If you begin a restore while another task (for example, a backup or cartridge reclamation) is using the virtual cartridge, you might not be able to access the data on it immediately. Using collocation means that all your data is contained on the same set of virtual cartridges. Because you do not have any of the restrictions normally associated with this feature on physical cartridges (such as media and slot consumption), you can enable the option quite safely. Consider these points when determining how many virtual cartridges to create, and remember that you can always create additional virtual cartridges at any time.

Physical tape: Depending on your data protection requirements, it might still be necessary to copy the deduplicated data to physical tape.
This can be achieved by using standard ITSM copy storage pools that have device classes directing data to physical libraries and drives.
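For illustration only (the pool names follow the PT_TAPE_POOL and OFFSITE_TAPE examples used earlier, and the values shown are arbitrary starting points rather than recommendations), these adjustments map to standard ITSM commands such as:

upd stgpool pt_tape_pool reclaim=60 collocate=node reusedelay=7
backup stgpool pt_tape_pool offsite_tape maxprocess=2

The first command tunes the reclamation threshold, enables node collocation, and aligns the reuse delay with the database backup retention; the second copies the primary virtual tape pool to the physical offsite copy pool.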
Chapter 10.
tsm: TSMSRVR>define devc dedup devtype=file mountl=20 maxcap=1G dir=/tsmpool
ANR2203I Device class DEDUP defined.
tsm: TSMSRVR>define devc nondedup devtype=file mountl=20 maxcap=1G dir=/tsmpool2
ANR2203I Device class NONDEDUP defined.
tsm: TSMSRVR>define stgpool deduppool dedup maxscr=200 deduplicate=yes identifyprocess=0
ANR2200I Storage pool DEDUPPOOL defined (device class DEDUP).
tsm: TSMSRVR>define stgpool nondeduppool nondedup maxscr=200 deduplicate=no pooltype=copy
ANR2200I Storage pool NONDEDUPOOL defined (device class NONDEDUP).

Note: Make sure that the filesystem your device class configuration points to is large enough to store all of your backups (in our example, /tsmpool and /tsmpool2).

Now we create a simple backup policy structure that points to the deduplication-enabled storage pool created above (in our example, DEDUPPOOL). See Example 10-2.
Example 10-2 Creating the backup policy structure
tsm: TSMSRVR>def domain dedup descript="Deduplicated Data Domain"
ANR1500I Policy domain DEDUP defined.
tsm: TSMSRVR>def policyset DEDUP PS_DEDUP
ANR1510I Policy set PS_DEDUP defined in policy domain DEDUP.
tsm: TSMSRVR>def mgmt DEDUP PS_DEDUP MC_DEDUP
ANR1520I Management class MC_DEDUP defined in policy domain DEDUP, set PS_DEDUP.
tsm: TSMSRVR>def copyg DEDUP PS_DEDUP MC_DEDUP dest=deduppool
ANR1530I Backup copy group STANDARD defined in policy domain DEDUP, set
PS_DEDUP, management class MC_DEDUP.
tsm: TSMSRVR>assign defmgmt DEDUP PS_DEDUP MC_DEDUP
ANR1538I Default management class set to MC_DEDUP for policy domain DEDUP, set PS_DEDUP.
tsm: TSMSRVR>validate policyset DEDUP PS_DEDUP
ANR1554W DEFAULT Management class MC_DEDUP in policy set DEDUP PS_DEDUP does not have an ARCHIVE copygroup: files will not be archived by default if this set is activated.
ANR1515I Policy set PS_DEDUP validated in domain DEDUP (ready for activation).
tsm: TSMSRVR>activate policyset DEDUP PS_DEDUP
ANR1554W DEFAULT Management class MC_DEDUP in policy set DEDUP PS_DEDUP does not have an ARCHIVE copygroup: files will not be archived by default if this set is activated.
Do you wish to proceed? (Yes (Y)/No (N)) y
ANR1554W DEFAULT Management class MC_DEDUP in policy set DEDUP PS_DEDUP does not have an ARCHIVE copygroup: files will not be archived by default if this set is activated.
ANR1514I Policy set PS_DEDUP activated in policy domain DEDUP.

For additional information about policy domain configuration and backup or archive copy group options, refer to the Tivoli Storage Manager Administrator's Guide at:
http://publib.boulder.ibm.com/infocenter/tsminfo/v6r2/topic/com.ibm.itsm.ic.doc/welcome.html

Finally, we create a new ITSM client nodename using the DEDUP domain (see Example 10-3).
Example 10-3 Creating an ITSM nodename
tsm: TSMSRVR>reg node TSM_CLIENT <password> domain=dedup
ANR2060I Node TSM_CLIENT registered in policy domain DEDUP.
ANR2099I Administrative userid TSMNODE defined for OWNER access to node TSMNODE.

We ran an incremental backup of the TSM_CLIENT machine, copying 101 GB to our DEDUPPOOL storage pool, as shown in Example 10-4.
Example 10-4 Displaying storage pool usage
tsm: TSMSRVR>q stg

Storage Pool   Device Class   Estimated   Pct    Pct    High Mig   Low Mig   Next Storage
Name           Name           Capacity    Util   Migr   Pct        Pct       Pool
------------   ------------   ---------   ----   ----   --------   -------   ------------
DEDUPPOOL      DEDUP          133 G       78.0   78.0   90         70
NONDEDUPPOOL   NONDEDUP       0.0 M       0.0
With data in our storage pool, we can run the identify duplicates process to identify duplicated data on the DEDUPPOOL storage pool (see Example 10-5).
tsm: TSMSRVR>identify duplic deduppool numproc=3
ANR1913I IDENTIFY DUPLICATES successfully started 3 IDENTIFY processes.
tsm: TSMSRVR>q proc

Process Number   Process Description    Status
--------------   --------------------   ------------------------------------------------
           299   Identify Duplicates    Storage pool: DEDUPPOOL. Volume:
                                        /tsmpool/00000318.BFS. State: active.
                                        State Date/Time: 08/31/10 15:59:23.
                                        Current Physical File(bytes): 986,619,600.
                                        Total Files Processed: 134. Total Duplicate
                                        Extents Found: 9. Total Duplicate Bytes
                                        Found: 1,763,771.
           300   Identify Duplicates    Storage pool: DEDUPPOOL. Volume:
                                        /tsmpool/00000319.BFS. State: active.
                                        State Date/Time: 08/31/10 15:59:23.
                                        Current Physical File(bytes): 252,246,757.
                                        Total Files Processed: 28. Total Duplicate
                                        Extents Found: 4. Total Duplicate Bytes
                                        Found: 1,170,719.
           301   Identify Duplicates    Storage pool: DEDUPPOOL. Volume:
                                        /tsmpool/0000031A.BFS. State: active.
                                        State Date/Time: 08/31/10 15:59:23.
                                        Current Physical File(bytes): 93,628,221.
                                        Total Files Processed: 241. Total Duplicate
                                        Extents Found: 125. Total Duplicate Bytes
                                        Found: 14,430,666.
After all data in the storage pool has been identified, the identify duplicates processes go into an idle state and remain there until you cancel the processes or there is new data to be deduplicated. Even if we run the reclamation process now on the DEDUPPOOL storage pool, the data will not be deleted, because it has not yet been copied to a copy pool (see Example 10-6). By default, ITSM only deletes deduplicated data during reclamation after it has been copied to a copy storage pool. If you want to disable this behavior, see "Disabling the copy storage pool backup requirement" on page 270.
Example 10-6 Showing storage pool usage
tsm: TSMSRVR>q stg

Storage Pool   Device Class   Estimated   Pct    Pct    High Mig   Low Mig   Next Storage
Name           Name           Capacity    Util   Migr   Pct        Pct       Pool
------------   ------------   ---------   ----   ----   --------   -------   ------------
DEDUPPOOL      DEDUP          133 G       78.0   78.0   90         70
NONDEDUPPOOL   NONDEDUP       0.0 M       0.0
Now we back up the storage pool to a non-deduplicated copy storage pool (see Example 10-7).
Example 10-7 Backing up the primary storage pool
ANS8003I Process number 302 started.

After the backup stg process finishes, we can see that all data was copied, as shown in Example 10-8 on page 265.
Example 10-8 Displaying storage pools usage after backup stg
tsm: TSMSRVR>q stg

Storage Pool   Device Class   Estimated   Pct    Pct    High Mig   Low Mig   Next Storage
Name           Name           Capacity    Util   Migr   Pct        Pct       Pool
------------   ------------   ---------   ----   ----   --------   -------   ------------
DEDUPPOOL      DEDUP          133 G       78.0   78.0   90         70
NONDEDUPPOOL   NONDEDUP       133 G       78.0
As you can see above, even after backing up the deduplicated storage pool to a copy storage pool, the deduplicated data is not deleted. This will only happen after you run reclamation on the deduplicated storage pool (see Example 10-9).
Example 10-9 Running reclamation
tsm: TSMSRVR>reclaim stg DEDUPPOOL thre=1
ANR2110I RECLAIM STGPOOL started as process 44.
ANR4930I Reclamation process 44 started for primary storage pool DEDUPPOOL manually, threshold=1, duration=300.
ANS8003I Process number 44 started.

After the reclaim stg process finishes, the deduplicated data is deleted and you can see the amount of free space, as shown in Example 10-10. Note the Duplicate Data Not Stored parameter in the storage pool listing.
Example 10-10 Showing storage pool usage after deduplication
tsm: TSMSRVR>q stg

Storage Pool   Device Class   Estimated   Pct    Pct    High Mig   Low Mig   Next Storage
Name           Name           Capacity    Util   Migr   Pct        Pct       Pool
------------   ------------   ---------   ----   ----   --------   -------   ------------
DEDUPPOOL      DEDUP          133 G       58.5   58.5   90         70
NONDEDUPPOOL   NONDEDUP       133 G       78.0
tsm: TSMSRVR>q stg deduppool f=d

       Storage Pool Name: DEDUPPOOL
       Storage Pool Type: Primary
       Device Class Name: DEDUP
      Estimated Capacity: 133 G
      Space Trigger Util: 98.6
                Pct Util: 58.5
                Pct Migr: 58.5
             Pct Logical: 99.9
            High Mig Pct: 90
             Low Mig Pct: 70
         Migration Delay: 0
      Migration Continue: Yes
     Migration Processes: 1
   Reclamation Processes: 1
       Next Storage Pool:
    Reclaim Storage Pool:
  Maximum Size Threshold: No Limit
                  Access: Read/Write
             Description:
       Overflow Location:
   Cache Migrated Files?:
              Collocate?:
   Reclamation Threshold:
Offsite Reclamation Limit:
Maximum Scratch Volumes Allowed:
Number of Scratch Volumes Used:
Delay Period for Volume Reuse:
  Migration in Progress?:
    Amount Migrated (MB):
Elapsed Migration Time (seconds):
Reclamation in Progress?:
Last Update by (administrator):
   Last Update Date/Time: 17:48:58
Storage Pool Data Format:
    Copy Storage Pool(s):
     Active Data Pool(s):
 Continue Copy on Error?:
                CRC Data:
        Reclamation Type:
Overwrite Data when Deleted:
       Deduplicate Data?:
Processes For Identifying Duplicates:
Duplicate Data Not Stored:
          Auto-copy Mode:
Contains Data Deduplicated by Client?:
Note: Not all of the duplicate data may expire at once; several reclamation runs may be needed to expire all of the duplicate extents. You can use the show deduppending <stgpool_name> command on the ITSM server to show the amount of data that has been identified as duplicate but has not yet been deleted (see Example 10-11).
Example 10-11 Running show deduppending command
tsm: TSMSRVR>show deduppending DEDUPPOOL
ANR1015I Storage pool DEDUPPOOL has 775,340 duplicate bytes pending removal.
tsm: TSMSRVR>
tsm: TSMSRVR>reg node TSM_CLIENT <password> domain=dedup dedup=clientorserver
ANR2060I Node TSM_CLIENT registered in policy domain DEDUP.
ANR2099I Administrative userid TSMNODE defined for OWNER access to node TSMNODE.

The next step is to edit the dsm.opt file (or dsm.sys on UNIX) on the TSM_CLIENT machine and include the line:
deduplication yes
You can also enable client-side deduplication from the ITSM Backup-Archive Client GUI, using the following steps:
1. Select Edit > Client Preferences.
2. Click the Deduplication tab.
3. Select the Enable Deduplication check box.
4. Click OK to save your selections and close the Preferences Editor.
After saving the dsm.opt (or dsm.sys for UNIX/AIX) file, the machine is ready for client-side deduplication. Table 10-1 summarizes when client-side deduplication is enabled, because the setting on the ITSM server ultimately determines whether client-side data deduplication is used.
Table 10-1   Data deduplication settings

Value of the deduplication option     Value of the nodename's deduplication   Data deduplication method
in the ITSM client configuration      parameter on the ITSM server
file
Yes                                   ClientOrServer                          Client-side data deduplication
Yes                                   ServerOnly                              Server-side data deduplication
No                                    ClientOrServer                          Server-side data deduplication
No                                    ServerOnly                              Server-side data deduplication
You can also combine both client-side and server-side data deduplication in the same production environment. For example, you can specify certain nodenames for client-side data deduplication and certain nodenames for server-side data deduplication. You can store the data for both sets of client nodes in the same deduplicated storage pool.
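As a minimal sketch (the server address is a hypothetical placeholder), a Windows dsm.opt for a node using client-side deduplication might contain:

* dsm.opt - client options file (illustrative)
NODENAME          TSM_CLIENT
TCPSERVERADDRESS  tsmsrvr.example.com
TCPPORT           1500
DEDUPLICATION     YES

Remember that this option only takes effect if the node is registered (or updated) on the server with dedup=clientorserver, as shown in Table 10-1.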
tsm: TSMSRVR>define stgpool deduppool dedup maxscr=200 deduplicate=yes identifyprocess=3
ANR2200I Storage pool DEDUPPOOL defined (device class DEDUP).
Note: The identify duplicates processes are always active and always appear in the q proc command output. If there is no data to be deduplicated, they continue to run but show an idle state (see Example 10-16). You can change the number of duplicate-identification processes at any time using the update stgpool command, as in Example 10-14.
Example 10-14 Updating identification processes in a storage pool
tsm: TSMSRVR>upd stg DEDUPPOOL identifyprocess=2
ANR2202I Storage pool DEDUPPOOL updated.
tsm: TSMSRVR>define stgpool deduppool dedup maxscr=200 deduplicate=yes identifyprocess=0
ANR2200I Storage pool DEDUPPOOL defined (device class DEDUP).

If you are creating a copy storage pool or an active-data pool and you do not specify a value, the server does not start any processes automatically. After the storage pool has been created, you can increase or decrease the number of duplicate-identification processes as shown in Example 10-14. Remember that any value above zero causes the duplicate-identification processes to start automatically.
Since the identify duplicates processes will not be running (we specified identifyprocess=0), you need to start the processes either manually or via an ITSM administrative schedule, specifying the number of processes and the duration as shown in Example 10-16.
Example 10-16 Running identify duplicates process
tsm: TSMSRVR>identify dup DEDUPPOOL numproc=3 duration=120
ANR1913I IDENTIFY DUPLICATES successfully started 3 IDENTIFY processes.
tsm: TSMSRVR>q proc

Process Number   Process Description    Status
--------------   --------------------   ------------------------------------------------
            58   Identify Duplicates    Storage pool: DEDUPPOOL. Volume: NONE.
                                        State: idle. State Date/Time: 09/07/10 16:23:21.
                                        Current Physical File(bytes): 0. Total Files
                                        Processed: 0. Total Duplicate Extents Found: 0.
                                        Total Duplicate Bytes Found: 0.
            59   Identify Duplicates    Storage pool: DEDUPPOOL. Volume: NONE.
                                        State: idle. State Date/Time: 09/07/10 16:23:21.
                                        Current Physical File(bytes): 0. Total Files
                                        Processed: 0. Total Duplicate Extents Found: 0.
                                        Total Duplicate Bytes Found: 0.
            60   Identify Duplicates    Storage pool: DEDUPPOOL. Volume: NONE.
                                        State: idle. State Date/Time: 09/07/10 16:23:21.
                                        Current Physical File(bytes): 0. Total Files
                                        Processed: 0. Total Duplicate Extents Found: 0.
                                        Total Duplicate Bytes Found: 0.

When the amount of time that you specify as a duration expires, the number of duplicate-identification processes always reverts to the number of processes specified in the storage pool definition. If that number is zero, all duplicate-identification processes are killed.
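A sketch of the administrative schedule approach (the schedule name, start time, and process count are arbitrary examples) could look like this:

tsm: TSMSRVR>define schedule identify_dedup type=administrative cmd="identify duplicates deduppool numprocess=3 duration=120" active=yes starttime=20:00 period=1 perunits=days

This runs the duplicate-identification work once per day outside the backup window, and the duration limit returns the processes to the value defined on the storage pool when it expires.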
tsm: TSMSRVR>upd stg DEDUPPOOL deduplicate=yes identifyproc=3
ANR2202I Storage pool DEDUPPOOL updated.
tsm: TSMSRVR>upd stg DEDUPPOOL deduplicate=no
ANR2202I Storage pool DEDUPPOOL updated.
tsm: TSMSRVR>setopt deduprequiresbackup no
Do you wish to proceed? (Yes (Y)/No (N)) y
ANR2119I The DEDUPREQUIRESBACKUP option has been changed in the options file.
tsm: TSMSRVR>

Note: Setting this option to no means that no storage pool backup is required before deduplicated data can be deleted. This is not the default value and is not recommended.

To display all ITSM server configuration options, use the q opt command at the administrative command line prompt, as shown in Example 10-20.
Example 10-20 ITSM server options
tsm: TSMSRVR>q opt

Server Option        Option Setting       Server Option        Option Setting
-----------------    -----------------    -----------------    -----------------
CommTimeOut          60                   IdleTimeOut          15
MessageFormat        1                    Language             AMENG
Alias Halt           HALT                 MaxSessions          25
ExpInterval          24                   RunStats Interval    10
ActiveLogSize        16,384               DatabaseMemPercent   AUTO
ActiveLogDir         /tsmlog              MirrorLogDir
ArchFailOverLogDir DbDiagLogSize ExpQuiet ReportRetrieve VolumeHistory TxnGroupMax MoveSizeThresh DisableScheds REQSYSauthoutfile QueryAuth ThroughPutTimeThreshold Resource Timeout AdminOnClientPort IMPORTMERGEUsed NDMPControlPort SearchMPQueue SanRefreshTime DedupRequiresBackup NumOpenVolsAllowed ServerDedupTxnLimit TCPPort TCPWindowsize TCPNoDelay CommMethod ShmPort UserExit AssistVCRRecovery AcsTimeoutX AcsQuickInit SNMPSubagentHost TECHost UNIQUETECevents Async I/O 3494Shared SANdiscovery SSLTCPADMINPort
ArchiveLogDir 1024 No No /home/tsm/tsm/volhist.out 4096 4096 No Yes None 0 60 Yes Yes 10,000 No 0 No 20 300 1500 64512 Yes ShMem 1510 Yes 1 Yes 127.0.0.1 No No No Off TcpAdminport TCPBufsize CommMethod MsgInterval FileExit FileTextExit AcsAccessId AcsLockDrive SNMPSubagentPort SNMPHeartBeatInt TECPort UNIQUETDPTECevents SHAREDLIBIDLE CheckTrailerOnFree SSLTCPPort SANDISCOVERYTIMEOUT CheckTapePos EventServer DISPLAYLFINFO Devconfig MoveBatchSize RestoreInterval AuditStorage MsgStackTrace ThroughPutDataThreshold NOPREEMPT TEC UTF8 Events NORETRIEVEDATE DNSLOOKUP NDMPPortRange SHREDding RetentionExtension VARYONSERIALLY ClientDedupTxnLimit
/tsmarchlog Yes Yes No /home/tsm/tsm/devconfig.out 1000 1,440 Yes On 0 (No) No No Yes 0,0 Automatic 365 No 300
No 1521 5 0 No No On
15
However, restore or retrieve operations from a sequential-access disk storage pool (FILE type device class) that is set up for data deduplication have different performance characteristics than a FILE storage pool that is not set up for data deduplication. In a FILE storage pool that is not set up for data deduplication, files on a volume that are being restored or retrieved are read sequentially from the volume before the next volume is mounted. This process ensures optimal I/O performance and eliminates the need to mount a volume multiple times.

In a FILE storage pool that is set up for data deduplication, however, extents that comprise a single file can be distributed across multiple volumes. To restore or retrieve the file, each volume containing a file extent must be mounted. As a result, the I/O is more random, which can lead to slower restore or retrieve times. These results occur more often with small files that are less than 100 KB. In addition, more processor resources are consumed when restoring or retrieving from a deduplicated storage pool. The additional consumption occurs because the data is checked to ensure that it has been reassembled properly.

Although a restore or retrieve operation of deduplicated small files might be relatively slow, these operations are still typically faster than a restore or retrieve operation from tape because of the added tape mount time. If you have data where the restore or retrieve time is critical, you can use a sequential-access disk storage pool that is not set up for data deduplication.

Note: To reduce the mounting and unmounting of FILE storage pool volumes, the server allows multiple volumes to remain mounted until they are no longer needed. The number of volumes that can be mounted at a time is controlled by the NUMOPENVOLSALLOWED option on the ITSM server (see Example 10-21 on page 273).
default value. For more information, see 10.3.5, "Disabling the copy storage pool backup requirement" on page 270.

tsm: TSMSRVR>setopt NumOpenVolsAllowed 20
Do you wish to proceed? (Yes (Y)/No (N)) y
ANR2119I The NUMOPENVOLSALLOWED option has been changed in the options file.

Each session within a client operation or server process can have as many open FILE volumes as specified by this option. A session is initiated by a client operation or by a server process. Multiple sessions can be started within each.
During a client-restore operation, volumes can remain open for the duration of the operation and for as long as a client session is active. During a no-query restore operation, the volumes remain open until the no-query restore completes; at that time, all volumes are closed and released. However, for a standard restore operation started in interactive mode, the volumes might remain open at the end of the restore operation. The volumes are closed and released when the next classic restore operation is requested. For any node backing up or archiving data into a deduplicated storage pool, set the value of the MAXNUMMP parameter in the client-node definition to a value at least as high as the NUMOPENVOLSALLOWED option. Increase this value if you notice that the node is failing client operations because the MAXNUMMP value is being exceeded.
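For illustration (the node name reuses the TSM_CLIENT example from this chapter, and the value simply mirrors the NUMOPENVOLSALLOWED setting shown earlier), the parameter can be adjusted with:

tsm: TSMSRVR>update node TSM_CLIENT maxnummp=20

Setting MAXNUMMP at least as high as NUMOPENVOLSALLOWED prevents client operations from failing because the node runs out of mount points while restoring from a deduplicated FILE pool.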
Chapter 11.
Recommendations
In this chapter we discuss which deduplication option to choose among N series, IBM Tivoli Storage Manager (ITSM), and ProtecTIER. The following topics are covered:
- Data flow considerations
- Deduplication platform features
- Deduplication features matrix
- ProtecTIER deduplication
- ITSM deduplication
- IBM N series deduplication
Impact
Which solution is best for a given environment is commonly a trade-off between performance and cost. Firms should consider the economic and operational benefits of deduplication in a multi-tier primary storage system versus using deduplication only for backup and archive data resident in secondary storage. However, there is no single silver bullet or one solution that fits all cases. IT managers have to choose solutions that address their biggest and largest number of pain points, with a clear understanding of the trade-offs that come with that choice. Considerations include:
Performance versus cost is not completely mutually exclusive:
Solutions available in the market today represent a continuum that strikes a balance between performance and cost (cost of storage and network bandwidth). Decision makers have to make purchase decisions after careful analysis of their I/O requirements, data growth rates, network and processing capacity, and the composition of data types across storage tiers. Some suppliers offer tools that allow customers to assess the performance impact of deduplication specific to their own environment before deploying it in production. The balance between performance and cost savings is created primarily by using different permutations of deduplication, namely:
- Primary storage deduplication: in-line or post-process
- Secondary storage deduplication: in-line or post-process
Figure 11-1 positions these permutations (primary storage in-line, primary storage post-process, secondary storage in-line, and secondary storage post-process) along the axes of infrastructure utilization and performance: in-line deduplication on primary storage offers the maximum cost savings as a result of real-time deduplication and optimal network bandwidth utilization; post-process deduplication on secondary storage offers the least performance degradation and the highest transparency as a result of low CPU utilization; the permutations in between strike a middle ground between performance and cost savings.
Other variables that play a role in balancing these trade-offs at different levels include:

Optimization technique: While some products use either deduplication or compression, others use both to provide a higher level of optimization. The choice of optimization technique and level of optimization is itself a balancing act between performance and cost. Solutions that perform intense optimization using both compression and deduplication on primary data tend to place heavy overhead on computational resources, and thus may not be suitable for primary storage IT environments.

Speed versus a selective approach: In the primary storage deduplication and compression space, some products attempt to address performance issues by simply being fast, while others meet performance objectives by being prudent in the choice of data they optimize. The choice between the two approaches, and the technology used to pursue either path, again affects the performance versus cost equation.

Features to enable, disable, revert, and schedule deduplication: Users should have the option to turn off deduplication and/or revert their data to its original state if the consequences of deduplication turn out to be undesirable. Other configurable items should include the ability to enable deduplication for specific applications or workloads while disabling it for others. Lastly, deduplication done in a post-process fashion should be schedulable during less processing-intensive periods of the day.

From the discussion and Figure 11-1 above, it is evident that solutions that perform in-line deduplication at the source rank high from an efficiency perspective. These
solutions, in reality, are well suited to specific use cases within IT environments. Applications that require random read and write I/O induce substantial latencies in environments that employ in-line deduplication, and such latencies cannot be tolerated in the majority of primary storage environments.
the market today for data deduplication in very large environments. In addition to its unmatched speed and scalability, ProtecTIER's unique technology is extremely efficient in its use of memory and I/O, allowing it to sustain performance as the data store grows. This is a major area of concern with most other deduplication technologies.

So, what is a large environment? This also is subjective, but here are some guidelines to consider. If you have 10 TB of changed data to back up per day, you would need to deduplicate at 121 MB/sec over a full 24 hours to deduplicate all of that data:

121 MB/sec * 60 seconds * 60 minutes * 24 hours = ~10 TB

The rate of 121 MB/sec approaches the practical upper limit of throughput of most deduplication technologies in the industry, with the exception of ProtecTIER. Some vendors claim higher than 121 MB/sec with various configurations, but the actual, experienced deduplication rates are typically much lower than the claims, especially when measured over time as the data store grows. As a result, most other vendors avoid focusing on direct deduplication performance rates (even with their deduplication calculators) and instead make capability claims based on broad assumptions about the amount of duplicate data (the deduplication ratio) processed each day. ProtecTIER, on the other hand, can deduplicate data at 900 MB/sec (1.2 GB/sec with a two-node cluster), and can sustain rates of that kind over time.

Another consideration is that it is very unlikely that any environment will have a full 24-hour window every day in which to perform deduplication (either inline during backup data ingest, or post-processed after the data has been stored). Typical deduplication windows are more like 8 hours or less per day (for example, during, or immediately after, daily backup processing). For a specific environment, you can calculate the required deduplication performance rate by dividing the total average daily amount of changed data to be backed up and deduplicated by the number of seconds in your available daily deduplication window:

Dedup_Rate = Amount_of_MBs_Daily_Backup_Data / Number_Seconds_in_Dedup_Window

For example, we saw that 10 TB of data in a full 24-hour window requires a deduplication rate of 121 MB/sec. As another example, 5 TB of daily data in an 8-hour deduplication window would require a 182 MB/sec deduplication rate: 5,242,880 MB / 28,800 sec = 182 MB/sec.

Although deduplication performance rates can vary widely based on configuration and data, 100 to 200 MB/sec seems to be about the maximum for most deduplication solutions except ProtecTIER. The scenario of 5 TB in an 8-hour window would therefore lead immediately to a decision to select ProtecTIER. Another view of this is that ProtecTIER can deduplicate 10 TB of data in 2.8 hours; under very ideal conditions, it would take the fastest of the other deduplication solutions 13.9 hours to deduplicate 10 TB of data. Rather than utilize ProtecTIER, you could deploy multiple, distinct deduplication engines to handle a larger daily load like this, but that would restrict the domain across which you deduplicate and reduce your deduplication ratios. The equation above can assist in determining whether you need ProtecTIER for the highest performance.

The above discussion only considers deduplication processing. Note that you also need to consider that IBM Tivoli Storage Manager native deduplication introduces additional impact on reclamation processing. Also, remember to plan for growth in your average daily backup amount.
Here are some considerations for choosing ProtecTIER:
- ProtecTIER supports data stores of up to 1 PB, representing potentially up to 25 PB of primary data.
- ProtecTIER deduplication ratios for IBM Tivoli Storage Manager data are lower because of IBM Tivoli Storage Manager's own data reduction efficiencies, but some ProtecTIER customers have seen up to 10:1 or 12:1 deduplication on IBM Tivoli Storage Manager data.
- Environments needing to deduplicate PBs of represented data should likely choose ProtecTIER.
- Environments that require global deduplication across the widest possible domain of data should also use ProtecTIER. ProtecTIER deduplicates data across many IBM Tivoli Storage Manager (or other) backup servers and any other tape applications, whereas IBM Tivoli Storage Manager's native deduplication operates only over a single server storage pool. So, if you want deduplication across a domain of multiple IBM Tivoli Storage Manager (or other backup product) servers, you should employ ProtecTIER.
- ProtecTIER is also the right choice if a Virtual Tape Library (VTL) appliance model is desired.
- Different deduplication technologies use different approaches to guarantee data integrity. However, all technologies have matured their availability characteristics to the point that availability is no longer a salient decision criterion for choosing deduplication solutions.
Part 4
Appendixes
These appendixes give examples of data before and after deduplication.
Appendix A.
Introduction
This section discusses storage (space) savings with deduplication. Testing of various data sets was performed to determine typical space savings in different environments. These results were obtained in two ways:
- Running deduplication on various production data sets in the lab
- Using IBM System Storage N series storage systems deployed in the real world running deduplication
We tested different types of data and obtained different data deduplication ratios, depending on the type of data (for example, regular user files, compressed files, mp3 files, picture files, and so on).
Lab Environment
The IBM N series storage system used in this use case had the following configuration:
- Data ONTAP version 7.3.4
- 1 aggregate, 7 disks, and 1 spare
- Licenses enabled: A-SIS (deduplication), CIFS, FCP, FlexClone, iSCSI, NearStore option, NFS, SnapRestore
- IBM N series System Manager
Result
The best space-savings result achieved in our lab was 48% space saved using VMware host images. The best deduplication result achieved in our lab was a 92% data deduplication ratio, backing up several versions of the same presentation file, or backing up several presentation files with minor differences between them.
Note: In customer primary storage environments, we currently see VMware data deduplication savings of 92% and mixed application data deduplication savings of 64%.
Summary
As you can see, the data deduplication ratio varies depending on the type of data you are backing up; the more similar blocks a volume contains, the better your deduplication ratio will be. In our tests, when we backed up just one version of each file (an initial full backup), we achieved an overall deduplication ratio of around 30%. When backing up similar files several times, we achieved a 92% deduplication ratio. You may achieve even better results in a real production environment, because you may have more similar files and more versions of the same file stored in your storage system.

More space savings can be achieved by combining N series deduplication with IBM Real-time Compression. The IBM Real-time Compression Appliances increase the capacity of existing storage infrastructure, helping enterprises meet the demands of rapid data growth while also enhancing storage performance and utilization. All IBM Real-time Compression Appliances apply IBM patented real-time data compression techniques to primary and existing storage, delivering optimization and savings throughout the entire storage life cycle. The result is exceptional cost savings, ROI, and operational and environmental efficiencies. IBM Real-time Compression Appliance technology is based on proven Lempel-Ziv (LZ) data compression algorithms. Real-time compression enables the IBM Real-time Compression Appliances to deliver real-time, random access, lossless data compression, maintaining reliable and consistent performance and data integrity. The IBM Real-time Compression Appliance provides real-time data compression of up to 80% in N series environments, which increases storage efficiency with no performance degradation.
This product (see Figure A-1) is transparent and easy to deploy and manage. It does not degrade performance, requires no changes to applications, networks, or storage, and preserves high availability.
Appendix B.
ProtecTIER deduplication
In our lab environment, we used an IBM System Storage TS7650G with a 1 TB repository. We created one TS3500 virtual tape library with 10 LTO3 drives and used Version 2.5 of the ProtecTIER software. Our tests confirmed that compressed data is not a good candidate for deduplication. Because ProtecTIER performs its own compression, we recommend sending uncompressed data to ProtecTIER. Figure B-1 shows some example deduplication ratios in backup environments:
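As one hedged illustration of what "send uncompressed data" can mean when an IBM Tivoli Storage Manager client backs up to a ProtecTIER virtual tape library, client compression can be left disabled in the client options file, or forced off from the server; the node name below is a hypothetical example rather than a value from our lab:

   In the client options file (dsm.opt):
      COMPRESSION NO

   Or, enforced from the IBM Tivoli Storage Manager server for a given node:
      UPDATE NODE winsrv01 COMPRESSION=NO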
If you would like to see the same data in terms of space saved, see Figure B-2:
Figure B-2 Space saved by environment: Healthcare 95%, Media and Entertainment 94%, Finance 92%, Retail distributor 90%
Any performance data contained in this document was determined in a controlled environment; therefore, the results obtained in other operating environments may vary significantly. The measurements quoted in this document were made on development-level systems, and there is no guarantee that the same measurements will be obtained on generally available systems. Users of this document should verify the applicable data for their specific environment.
Appendix C.
Introduction
We performed several backups of different types of data and obtained different data deduplication ratios. The ratio you achieve depends on the type of data you back up (regular user files, compressed files, music files, image files, and so on) and on the options used (compression enabled or disabled on the ITSM client).
Environment
In our environment we used the following components:
- IBM Tivoli Storage Manager server and client version 6.2.1.0 running on an AIX 6.1 server with maintenance level 05
- IBM Tivoli Storage Manager client 6.2.1.0 running on a Windows 2008 server
- Client-side data deduplication enabled on the ITSM client
- Backups running over the LAN (TCP/IP) to a deduplication-enabled storage pool
On the ITSM server, we configured a FILE type device class and an ITSM storage pool with deduplication enabled. A standard backup retention policy was created, pointing to the storage pool with deduplication enabled. Finally, we created one ITSM node name to back up our Windows 2008 server data and configured that node for client-side data deduplication. Incremental backups were performed. A sketch of the corresponding commands is shown below.
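The following IBM Tivoli Storage Manager administrative commands and client option are a minimal, hypothetical sketch of a configuration like the one described above; the names (FILEDEV, DEDUPPOOL, WIN2008, the directory, and the password) are illustrative placeholders, not the values used in our lab.

   On the ITSM server:
      DEFINE DEVCLASS FILEDEV DEVTYPE=FILE MAXCAPACITY=2G MOUNTLIMIT=20 DIRECTORY=/tsm/filepool
      DEFINE STGPOOL DEDUPPOOL FILEDEV MAXSCRATCH=100 DEDUPLICATE=YES
      REGISTER NODE WIN2008 secretpw DOMAIN=STANDARD DEDUPLICATION=CLIENTORSERVER

   In the client options file (dsm.opt) on the Windows 2008 server:
      DEDUPLICATION YES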
Results
In our tests, ITSM client-side data deduplication and server-side data deduplication produced the same deduplication ratios; the only difference is where the deduplication processing occurs. Because client-side data deduplication deduplicates the files before sending them to the ITSM server, the overall backup duration is longer than with server-side data deduplication. On the other hand, with client-side data deduplication you do not need to run the identify duplicates process on the ITSM server after the backup finishes.

We obtained better results with compression disabled on the ITSM client. With compression enabled, data was still deduplicated, but at a lower rate than without compression. For example, we achieved a 26% deduplication ratio backing up common user files (including many .mp3 files) on a Windows file server with ITSM client compression disabled, and a 23% deduplication ratio backing up the same data with ITSM client compression enabled. Unless you have duplicate .mp3 or .zip files, which is rare, you will find that these files do not offer any deduplication savings; you get better deduplication ratios with flat files than with compressed files (.mp3, .zip, and so on). We achieved a 34% data deduplication ratio when backing up common user files (presentations, documents, executables, pictures, and so on) plus a few compressed files (.mp3) on a Windows file server without ITSM client compression. The best result achieved in our lab was a 72% data deduplication ratio, obtained by backing up all the presentation files on the Windows file server (see Figure C-1) without ITSM client compression.
Figure C-1 Deduplication results by data type: common user files with ITSM compression on 23%, common user files with ITSM compression off 26%, common user files plus a few compressed files 34%, presentation files 72%
Note: These results are compared to the incremental backup data stored in the ITSM storage pool, not to the full amount of data stored on the client. Incremental backups already reduce the amount of data stored on the ITSM server, because only files that have changed since the last backup are copied.
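For completeness, when server-side data deduplication is used instead of client-side deduplication, duplicate identification must be started (or scheduled) on the ITSM server after the data is stored. The following commands are a hedged sketch that reuses the hypothetical storage pool name from the earlier example:

   IDENTIFY DUPLICATES DEDUPPOOL DURATION=60 NUMPROCESS=2
   QUERY STGPOOL DEDUPPOOL FORMAT=DETAILED

The detailed storage pool query reports how much duplicate data did not need to be stored, which is one way to check the deduplication savings on the server.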
Summary
As you can see, the data deduplication ratio varies depending on the type of data you are backing up. These examples indicate the possible storage savings and the differences in deduplication percentages based on file type: the more similar files you have, the better your deduplication ratio will be.

Any performance data contained in this document was determined in a controlled environment; therefore, the results obtained in other operating environments may vary significantly. The measurements quoted in this document were made on development-level systems, and there is no guarantee that the same measurements will be obtained on generally available systems. Users of this document should verify the applicable data for their specific environment.
Related publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this book.
IBM Redbooks
For information about ordering these publications, see "How to get Redbooks." Note that some of the documents referenced here may be available in softcopy only.
- TS7680 Deduplication ProtecTIER Gateway for System z, SG24-7796-00
- IBM System Storage N series Software Guide, SG24-7129-04
- Tivoli Storage Manager V6.1 Technical Guide, SG24-7718-00
Other publications
These publications are also relevant as further information sources:
- IBM System Storage TS7610 ProtecTIER Deduplication Appliance Express - ProtecTIER User's and Maintenance Guide, GA32-0779-01
- IBM System Storage TS7610 ProtecTIER Deduplication Appliance Express Introduction and Planning Guide, GA32-0776-02
- IBM System Storage TS7610 ProtecTIER Deduplication Appliance Express Installation and Setup Guide for VTL Systems, GA32-0777-01
Online resources
These Web sites are also relevant as further information sources:
- IBM ProtecTIER Deduplication Solutions
  http://www-03.ibm.com/systems/storage/tape/protectier/
- IBM System Storage TS7650G ProtecTIER Deduplication Gateway
  http://www-03.ibm.com/systems/storage/tape/ts7650g/
- TS7610 ProtecTIER Deduplication Appliance Express
  http://www-03.ibm.com/systems/storage/tape/ts7610/
Back cover
Where deduplication fits in your environment

Deduplication in the storage hierarchy

Deduplication benefits
Until now, the only way to capture, store, and effectively retain constantly growing amounts of enterprise data was to add more disk space to the storage infrastructure, an approach that can quickly become cost-prohibitive as information volumes continue to grow while capital budgets for infrastructure do not. Data deduplication has emerged as a key technology in the effort to dramatically reduce the amount of data stored and the cost associated with storing it. Deduplication is the art of intelligently reducing storage needs, an order of magnitude better than common data compression techniques, through the elimination of redundant data so that only one instance of a data set is actually stored. IBM has the broadest portfolio of deduplication solutions in the industry, giving it the freedom to solve customer issues with the most effective technology. Whether it is source or target, inline or post-process, hardware or software, disk or tape, IBM has a solution with the technology that best solves the problem. This IBM Redbooks publication covers the current deduplication solutions that IBM has to offer:
- IBM ProtecTIER Gateway and Appliance
- IBM Tivoli Storage Manager
- IBM System Storage N series Deduplication