Crunching NetWorker Deduplication Stats
October 31, 2021 Preston de Guise 1 Comment
If you use NetWorker with Data Domain, you’ve probably sometimes wanted to know
which of your clients have the best deduplication – or perhaps more correctly, you’ve
probably wanted to occasionally drill into clients delivering lower deduplication levels.
There are NMC and DPA reports that’ll pull this data for you, but you can also get deduplication details for Data Domain Boost devices directly from mminfo.
You’ll find the information you’re after within the output of mminfo -S. The -S option in
mminfo provides a veritable treasure trove of extended details about individual
savesets. To give you an example, let’s look at just one saveset in -S format. First,
below, you’ll see the mminfo command to identify a saveset ID, then the -S option
invoked against a single saveset.
$ mminfo -q "ssid=3665074337" -S
ssid=3665074337 savetime=24/10/21 10:20:00 (1635031200) orilla.turbamentis.int:/nsr/scripts
clientid=c6fb4ece-00000004-5fdabaf1-5fdabaf0-00019ed8-a41317f3
group: NAS_PMdG;
There’s a support article that explains how to review this information, here. The line
you’re looking for in particular though is the “*ss data domain dedup statistics” – outlined
in that aforementioned support article.
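To make those statistics concrete: each dedup statistics value the script below works with is a colon-separated record of the form `v1:<cloneid>:<original>:<precomp>:<postcomp>`. Here’s a minimal Python sketch of pulling a reduction ratio out of one such value — the field meanings are assumptions based on how the script uses them, not authoritative documentation:

```python
# Hedged sketch: parse a "v1:<cloneid>:<original>:<precomp>:<postcomp>"
# dedup statistics value (the format matched by the script's /^v1:.../
# regex) and compute the overall reduction ratio as original:postcomp.
def parse_dedupe_stat(stat: str):
    parts = stat.split(":")
    if len(parts) != 5 or parts[0] != "v1":
        raise ValueError(f"unexpected dedup statistics format: {stat!r}")
    clone_id, original, precomp, postcomp = (int(x) for x in parts[1:])
    # Reduction expressed as N:1, i.e. pre-dedupe bytes per stored byte.
    reduction = original / postcomp if postcomp else 0.0
    return {"clone_id": clone_id, "original": original,
            "precomp": precomp, "postcomp": postcomp,
            "reduction": reduction}
```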
I recently wanted to review some deduplication details, and while DPA and NMC are
options, I needed to analyse some data over the weekend 1 and as such didn’t have
access to the live DPA or NMC host for a particular environment. So, since I’m not
doing anything social while Melbourne’s COVID case numbers remain so high, I tasked
myself with writing a script to analyse deduplication statistics from raw mminfo
-S output.
The script I wrote (in Perl, of course), can either be run on a NetWorker server via a
query option, or against the saved output of mminfo -S. The usage syntax is this:
Usage Options for dedupe-analysis
The script pulls the data and outputs at least three key data files, all in CSV format: a per-saveset “all” file, a by-client rollup, and a by-backup-type rollup.
Since a saveset, via cloning, can live on more than one Data Domain, the volume IDs
are included in the output – and the latter two files provide their breakdowns
first by volume ID, so you can see stats on a per-Data Domain basis. There’s also an
option to anonymise the host names in the output; if you invoke it, a CSV will
also be written containing the original-hostname to anonymised-hostname conversion 2.
I.e., this would allow you to send the anonymised version to someone for discussion,
but privately look up the real host in any subsequent discussion.
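The anonymisation approach the script takes is simple: hand out stable pseudonyms in first-seen order (the script uses a `host-%08d` format), and keep the forward map so the real names can be looked up privately later. A minimal Python sketch of the same idea:

```python
# Hedged sketch of the anonymisation mapping described above: stable
# "host-XXXXXXXX" pseudonyms assigned in first-seen order, with the
# forward map retained so real names can be recovered privately.
def make_anonymiser(prefix: str = "host", width: int = 8):
    mapping = {}
    def anonymise(name: str) -> str:
        # Reuse an existing pseudonym, or mint the next one in sequence.
        if name not in mapping:
            mapping[name] = f"{prefix}-{len(mapping):0{width}d}"
        return mapping[name]
    return anonymise, mapping
```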
There’s an additional output option too, which can be handy if you’re analysing millions
of savesets: it’s an option to create one output file per client. You still get the rollup data,
but you’ll also get a per-client CSV file so you can deep-drill into individual client results
with a better chance of avoiding Excel’s 1,048,576-row limit.
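The per-client option is essentially a fan-out: each output row lands in a CSV named for its client, so no single file approaches the Excel row ceiling. A hedged Python sketch of that pattern (column names and the output directory here are illustrative, not the script’s own):

```python
import csv
import os

# Hedged sketch of the per-client output option: fan rows out into one
# CSV per client. Column names come from each row's keys; the output
# directory is created if needed.
def write_per_client(rows, outdir):
    os.makedirs(outdir, exist_ok=True)
    handles = {}
    try:
        for row in rows:
            client = row["client"]
            if client not in handles:
                fh = open(os.path.join(outdir, f"{client}.csv"),
                          "w", newline="")
                writer = csv.DictWriter(fh, fieldnames=list(row))
                writer.writeheader()
                handles[client] = (fh, writer)
            handles[client][1].writerow(row)
    finally:
        for fh, _ in handles.values():
            fh.close()
```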
...processed 0 savesets
Writing PrestonLab-all.csv
Writing PrestonLab-type.csv
So what sort of output does it generate? For some output examples, I’ve imported the
generated CSV files into Excel and converted them to a table. Here’s an example of the
“per-saveset” data:
Now, they’re just lab environments there — and while accurate, they’re hardly edifying.
What I was working towards was the analysis of a production environment. With host
anonymisation turned on, here’s an example of that output:
Deduplication data by Volume ID, Client and Backup Type from a Production
Environment
The only additional thing I’ve done in the example output there is to set number
formatting on the Original/Post-Comp/Average Reduction columns.
If you’re interested in being able to run this against your own environment (or mminfo -S
output, in general), here’s the script:
#!/usr/bin/perl -w
###########################################################################
# Modules
###########################################################################
use strict;
use File::Basename;
use Getopt::Std;
use Sys::Hostname;
###########################################################################
# Subroutines
###########################################################################
sub in_list {
	my $element = $_[0];
	return 0 if (!defined($element));
	shift @_;
	my @list = @_;
	my $foundCount = 0;
	my $e = quotemeta($element);
	# Loop reconstructed: count occurrences of the element in the list.
	foreach my $item (@list) {
		my $i = quotemeta($item);
		$foundCount++ if ($i eq $e);
	}
	return $foundCount;
}
sub show_help {
	my $self = basename($0);
	# Usage text partially reconstructed from the option handling below.
	print <<EOF;
Usage: $self [-v] [-a] [-d] [-i] (-q query | -f file) -o file

Where:
	-q query	mminfo query to run against the NetWorker server.
	-f file		Previously saved "mminfo -S" output to analyse.
	-o file		File to write results data to. Do NOT include a file extension.
	-a		Anonymise hostnames (and VM names) in the output.
	-i		Additionally write one CSV file per client.
	-d		Enable debug output.
	-v		Print the version.

Results are written on an overall, per-client and workload type-basis.
EOF
	if (@_+0 > 0) {
		# Loop reconstructed: print any supplied error messages.
		foreach my $message (@_) {
			my $tmp = $message;
			chomp $tmp;
			print "$tmp\n";
		}
	}
	die "\n";
}
sub get_backup_type {
	my $saveset = $_[0];
	my $backupType = "";
	# Several match conditions were lost in transcription; patterns marked
	# "assumed" are reconstructions you should adjust to your environment.
	if ($saveset =~ /^RMAN/) {
		$backupType = "Oracle";
	} elsif ($saveset =~ /^backint/) {	# assumed
		$backupType = "SAP";
	} elsif ($saveset =~ /^MSSQL/) {	# assumed
		$backupType = "MSSQL";
	} elsif ($saveset =~ /^\// ||
		$saveset =~ /^\<\d+\>\//) {
		$backupType = "Unix Filesystem";
	} elsif ($saveset =~ /^APPLICATIONS/) {	# assumed
		$backupType = "Exchange";
	} elsif ($saveset =~ /SYSTEM/ ||
		$saveset =~ /DISASTER/ ||
		$saveset =~ /\\VOLUME\{/) {
		$backupType = "NetWorker";
	} else {
		$backupType = "Other";
	}
	return $backupType;
}
###########################################################################
my %opts = ();
my $version = "1.1";
my $method = "query"; # Default method is to seek a query from the command line.
my $query = "";
my $file = "";
my $outFile = "";
my $baseOut = "";
my $byClientOut = "";
my $byTypeOut = "";
my $anonOut = "";
my $anonHosts = 0;
my $individualFile = 0;
my $debug = 0;
my %hostMap = ();	# Original hostname to anonymised hostname map.
my $hostCount = 0;
my %vmMap = ();		# Original VM name to anonymised VM name map.
my $vmCount = 0;
my %clientElems = (); # If we're doing individual file out, use this to speedily iterate through gathered data.
# Capture command line arguments.
# Option handling reconstructed where transcription dropped lines; error
# texts and the derived output filename suffixes for the by-client and
# anonymisation-map files are assumed ("-all" and "-type" match the sample output).
if (getopts('h?vq:f:o:adi',\%opts)) {
	show_help() if (defined($opts{'h'}) || defined($opts{'?'}));
	if (defined($opts{'v'})) {
		print basename($0) . " v$version\n";
		exit 0;
	}
	if (defined($opts{'q'})) {
		$query = $opts{'q'};
	} elsif (defined($opts{'f'})) {
		$method = "file";
		$file = $opts{'f'};
		if (! -f $file) {
			show_help("File $file does not exist.");
		}
	} else {
		show_help("You must supply either a query (-q) or a file (-f).");
	}
	$anonHosts = 1 if (defined($opts{'a'}));
	$individualFile = 1 if (defined($opts{'i'}));
	$debug = 1 if (defined($opts{'d'}));
	if (defined($opts{'o'})) {
		$outFile = $opts{'o'};
		if ($outFile =~ /\./) {
			show_help("Do not include a file extension in the output file name.");
		}
	} else {
		show_help("You must supply an output file name (-o).");
	}
} else {
	show_help();
}
$baseOut = $outFile . "-all.csv";
$byClientOut = $outFile . "-client.csv";
$byTypeOut = $outFile . "-type.csv";
$anonOut = $outFile . "-anonmap.csv";
my %dataSets = ();
my @rawData = ();
my $ssidCount = 0;
my $ddSavesets = 0;
my %volumeIDsbyCloneID = ();
if ($method eq "query") {
	# Pipe-open reconstructed: run mminfo directly with the supplied query.
	open(MMI,"mminfo -q \"$query\" -S |") || die "Unable to run mminfo: $!\n";
	while (<MMI>) {
		my $line = $_;
		chomp $line;
		push(@rawData,$line);
		# Data Domain saveset counting reconstructed: count savesets that
		# carry the dedup statistics attribute.
		$ddSavesets++ if ($line =~ /data domain dedup statistics/);
		if ($line =~ /^ssid=\d+/) {
			$ssidCount++;
			if ($ssidCount % 100000 == 0) {
				print "...read $ssidCount saveset details ($ddSavesets on Data Domain)\n";
			}
		}
	}
	close(MMI);
} else {
	if (open(FILE,$file)) {
		while (<FILE>) {
			my $line = $_;
			chomp $line;
			push(@rawData,$line);
			$ddSavesets++ if ($line =~ /data domain dedup statistics/);
			if ($line =~ /^ssid=\d+/) {
				$ssidCount++;
				if ($ssidCount % 100000 == 0) {
					print "...read $ssidCount saveset details ($ddSavesets on Data Domain)\n";
				}
			}
		}
		close(FILE);
	} else {
		die "Unable to open $file for reading: $!\n";
	}
}
if ($ssidCount == 0) {
	die ("Did not find any savesets in an expected format in the $method.\n");
}
if ($ddSavesets == 0) {
	die ("Did not find any savesets on a Data Domain in the $method.\n");
}
# Now step through and discard all savesets that aren't on Data Domain devices.
# Segmentation loop reconstructed around the surviving per-line logic.
my $count = 0;
my @dataSeg = ();
my $foundADDBoostSS = 0;
my $index = 0;
foreach my $line (@rawData) {
	if ($line =~ /^ssid/) {
		if ($index != 0) {
			if ($foundADDBoostSS) {
				$dataSets{$count} = join("\n",@dataSeg);
				$count++;
			}
			@dataSeg = ();
			$foundADDBoostSS = 0;
		}
		@dataSeg = ($line);
	} else {
		push (@dataSeg,$line);
		$foundADDBoostSS = 1 if ($line =~ /data domain dedup statistics/);
	}
	$index++;
}
# Capture the final saveset segment.
if ($foundADDBoostSS && @dataSeg) {
	$dataSets{$count} = join("\n",@dataSeg);
	$count++;
}
# Explicitly free up the initially read data. This is useful if you have >1M savesets
# in reducing the runtime footprint. (E.g., on a sample of 1.7M savesets it uses 1/3 less
# memory.)
undef @rawData;
my %procData = ();
my $datum = 0;
my $procCount = 0;
# Per-saveset processing loop reconstructed around the surviving variable setup.
foreach my $key (sort {$a <=> $b} keys %dataSets) {
	my $block = $dataSets{$key};
	my @block = split(/\n/,$block);
	my $ssid = "";
	my $savetime = "";
	my $nsavetime = "";
	my $client = "";
	my $saveset = "";
	my $level = "";
	my $ssflags = "";
	my $totalsize = 0;
	my $retention = "";
	my $dedupeStats = "";
	my $vmname = "";
	if ($procCount % 100000 == 0) {
		print "...processed $procCount savesets\n";
	}
	$procCount++;
	my $vm = 0;
	my $blockIndex = 0;
	foreach my $item (@block) {
		# Header pattern reconstructed from the sample
		# "ssid=... savetime=... (nsavetime) client:saveset" line.
		if ($item =~ /^ssid=(\d+)\s+savetime=(.*?)\s+\((\d+)\)\s+(.*?):(.*)$/) {
			$ssid = $1;
			$savetime = $2;
			$nsavetime = $3;
			$client = $4;
			$saveset = $5;
			$procData{$datum}{ssid} = $ssid;
			$procData{$datum}{savetime} = $savetime;
			$procData{$datum}{client} = $client;
			$procData{$datum}{nsavetime} = $nsavetime;
			$procData{$datum}{saveset} = $saveset;
		}
		if ($anonHosts) {
			if (defined($hostMap{$client})) {
				# Nothing to do here.
			} else {
				$hostMap{$client} = sprintf("host-%08d",$hostCount);
				$hostCount++;
			}
		}
		# Clone-ID pattern reconstructed; the volume ID is found on the
		# following line, and is recorded as 0 if the clone instead went
		# to tape.
		if ($item =~ /clone=\s*(\d+)/) {
			my $cloneID = $1;
			my $volID = 0;
			my $nextLine = $block[$blockIndex+1];
			if (!defined($nextLine)) {
				$volID = 0;
			} else {
				if ($nextLine =~ /.*volid=\s*(\d+) /) {
					$volID = $1;
				} else {
					$volID = 0;
				}
			}
			$volumeIDsbyCloneID{$ssid}{$cloneID} = $volID;
		}
		# VM-name capture reconstructed from the vcenter_hostname
		# extended attribute.
		if ($item =~ /vcenter_hostname/ && $item =~ /"([^"]+)"/) {
			$vm = 1;
			$vmname = $1;
			if ($anonHosts) {
				if (defined($vmMap{$vmname})) {
					$procData{$datum}{vmname} = $vmMap{$vmname};
				} else {
					$vmMap{$vmname} = sprintf("vm-%09d",$vmCount);
					$vmCount++;
					$procData{$datum}{vmname} = $vmMap{$vmname};
				}
			} else {
				$procData{$datum}{vmname} = $vmname;
			}
		}
		if ($item =~ /^\s+level=([^\s]*)\s+sflags=([^\s]*)\s+size=(\d+).*/) {
			$level = $1;
			$ssflags = $2;
			$totalsize = $3;
			$procData{$datum}{level} = $level;
			$procData{$datum}{ssflags} = $ssflags;
			$procData{$datum}{totalsize} = $totalsize;
		}
		if ($item =~ /create=.* complete=.* browse=.* retent=(.*)$/) {
			$retention = $1;
			$procData{$datum}{retention} = $retention;
		}
		# Dedup-statistics pattern reconstructed from the
		# "*ss data domain dedup statistics" attribute; the lookahead
		# gathers continuation lines (ending in ',') until the
		# terminating ';'. The transcription had duplicated this block.
		if ($item =~ /data domain dedup statistics:\s*"(.*)"(\,|;)\s*$/) {
			$dedupeStats = $1;
			my $eol = $2;
			if ($eol eq ",") {
				my $stop = 0;
				my $lookahead = $blockIndex + 1;
				while (!$stop) {
					my $tmpLine = $block[$lookahead];
					if (defined($tmpLine) && $tmpLine =~ /^\s*"(.*)"(\,|;)\s*$/) {
						my $tmpstat = $1;
						my $tmpsep = $2;
						$dedupeStats .= "\n" . $tmpstat;
						$stop = 1 if ($tmpsep eq ";");
					} else {
						$stop = 1;
					}
					$lookahead++;
				}
			}
			$dedupeStats =~ s/^\n(.*)/$1/s;
			$procData{$datum}{dedupe_stats} = $dedupeStats;
		}
		$blockIndex++;
	}
	$datum++;
}
undef %dataSets;
# Anonymise index saveset names too (outer loop reconstructed).
if ($anonHosts) {
	foreach my $elem (keys %procData) {
		if ($procData{$elem}{saveset} =~ /^index:(.*)/) {
			my $indexHostname = $1;
			if (defined($hostMap{$indexHostname})) {
				$procData{$elem}{saveset} = "index:" . $hostMap{$indexHostname};
			} else {
				$hostMap{$indexHostname} = sprintf("host-%08d",$hostCount);
				$hostCount++;
				$procData{$elem}{saveset} = "index:" . $hostMap{$indexHostname};
			}
		}
	}
}
my %clientData = ();
# As we write out the base file, assemble the client rollup data.
my @clients = ();
print ("\n\nWriting $baseOut\n");
my $countOut = 0;
if (open(OUTP,">$baseOut")) {
	print OUTP ("SSID,CloneID,VolumeID,Client,Savetime,Level,Name,OriginalMB,PreLCompMB,PostLCompMB,Reduction(:1)\n");
	# Per-saveset loop reconstructed.
	foreach my $elem (sort {$a <=> $b} keys %procData) {
		my $dedupeStats = $procData{$elem}{dedupe_stats};
		my @dedupeStats = split("\n",$dedupeStats);
		if ($countOut % 100000 == 0) {
			print "...written $countOut savesets\n";
		}
		$countOut++;
		# Per-statistic loop reconstructed.
		foreach my $stat (@dedupeStats) {
			if ($stat =~ /^v1:(\d+):(\d+):(\d+):(\d+)/) {
				my $clID = $1;
				my $original = $2;
				my $precomp = $3;
				my $postcomp = $4;
				my $client = $procData{$elem}{client};
				my $saveset = $procData{$elem}{saveset};
				my $savetime = $procData{$elem}{savetime};
				my $ssid = $procData{$elem}{ssid};
				my $level = $procData{$elem}{level};
				my $vmName = (defined($procData{$elem}{vmname})) ? $procData{$elem}{vmname} : "";
				my $volumeID = $volumeIDsbyCloneID{$ssid}{$clID};
				# If we're writing individual files, remember which elements belong to
				# individual clients.
				if ($individualFile) {
					if (in_list($client,@clients)) {
						push(@{$clientElems{$client}},$elem);
					} else {
						push (@clients,$client);
						@{$clientElems{$client}} = ($elem);
					}
				}
				if ($vmName ne "") {
					$saveset = "VM:$vmName";
				}
				my $backupType = get_backup_type($saveset);
				$procData{$elem}{backuptype} = $backupType; # Store this in case we need it for individuals.
				if (%clientData &&
				    defined($clientData{by_client}) &&
				    defined($clientData{by_client}{$volumeID}) &&
				    defined($clientData{by_client}{$volumeID}{$client})) {
					$clientData{by_client}{$volumeID}{$client}{original} += $original / 1024;
					$clientData{by_client}{$volumeID}{$client}{postcomp} += $postcomp / 1024;
					$clientData{by_client}{$volumeID}{$client}{count}++;
				} else {
					$clientData{by_client}{$volumeID}{$client}{original} = $original / 1024; # Store in GB
					$clientData{by_client}{$volumeID}{$client}{postcomp} = $postcomp / 1024; # Store in GB
					$clientData{by_client}{$volumeID}{$client}{count} = 1;
				}
				if (%clientData &&
				    defined($clientData{by_type}) &&
				    defined($clientData{by_type}{$volumeID}) &&
				    defined($clientData{by_type}{$volumeID}{$client}) &&
				    defined($clientData{by_type}{$volumeID}{$client}{$backupType})) {
					$clientData{by_type}{$volumeID}{$client}{$backupType}{original} += $original / 1024; # Store in GB
					$clientData{by_type}{$volumeID}{$client}{$backupType}{postcomp} += $postcomp / 1024; # Store in GB
					$clientData{by_type}{$volumeID}{$client}{$backupType}{count}++;
				} else {
					$clientData{by_type}{$volumeID}{$client}{$backupType}{original} = $original / 1024; # Store in GB
					$clientData{by_type}{$volumeID}{$client}{$backupType}{postcomp} = $postcomp / 1024; # Store in GB
					$clientData{by_type}{$volumeID}{$client}{$backupType}{count} = 1;
				}
				my $finalClientName = ($anonHosts == 1) ? $hostMap{$client} : $client;
				# Row output reconstructed: sizes converted to MB.
				printf OUTP ("%s,%s,%s,%s,%s,%s,\"%s\",%.3f,%.3f,%.3f,%.3f\n",
					$ssid, $clID, $volumeID, $finalClientName, $savetime, $level, $saveset,
					$original / 1024 / 1024, $precomp / 1024 / 1024, $postcomp / 1024 / 1024,
					($postcomp > 0 ? $original / $postcomp : 0));
				if ($debug) {
					print "Processed ssid $ssid for $finalClientName\n";
				}
			}
		}
	}
	close(OUTP);
}
if ($individualFile) {
	mkdir("$outFile");
	if (-d $outFile) {
		# Per-client output loop reconstructed.
		foreach my $client (@clients) {
			my $filename = $client;
			# Override here if we're anonymising.
			if ($anonHosts) {
				$filename = $hostMap{$client};
			}
			$filename =~ s/\./_/g;
			next if (!open(OUTP,">$outFile/$filename.csv"));
			print OUTP ("SSID,CloneID,VolumeID,Client,VMName,Savetime,Level,Name,OriginalMB,PreLCompMB,PostLCompMB,Reduction(:1)\n");
			my @elemList = @{$clientElems{$client}};
			foreach my $elem (@elemList) {
				next if ($procData{$elem}{client} ne $client);
				# Else...
				my $dedupeStats = $procData{$elem}{dedupe_stats};
				my @dedupeStats = split("\n",$dedupeStats);
				foreach my $stat (@dedupeStats) {
					if ($stat =~ /^v1:(\d+):(\d+):(\d+):(\d+)/) {
						my $clID = $1;
						my $original = $2;
						my $precomp = $3;
						my $postcomp = $4;
						my $saveset = $procData{$elem}{saveset};
						my $savetime = $procData{$elem}{savetime};
						my $ssid = $procData{$elem}{ssid};
						my $level = $procData{$elem}{level};
						my $vmName = (defined($procData{$elem}{vmname})) ? $procData{$elem}{vmname} : "";
						my $volumeID = $volumeIDsbyCloneID{$ssid}{$clID};
						push (@clients,$client) if (!in_list($client,@clients));
						if ($vmName ne "") {
							$saveset = "VM:$vmName";
						}
						$original = $original / 1024 / 1024; #MB
						$precomp = $precomp / 1024 / 1024; #MB
						$postcomp = $postcomp / 1024 / 1024; #MB
						my $reduction = $original / $postcomp; #:1
						my $backupType = $procData{$elem}{backuptype};
						my $finalClient = ($anonHosts == 1) ? $hostMap{$client} : $client;
						my $finalSaveset = "";
						if ($anonHosts) {
							if ($vmName ne "") {
								$finalSaveset = $vmName;
							} else {
								$finalSaveset = $saveset;
							}
						} else {
							$finalSaveset = $saveset;
						}
						# Row output reconstructed.
						printf OUTP ("%s,%s,%s,%s,\"%s\",%s,%s,\"%s\",%.3f,%.3f,%.3f,%.3f\n",
							$ssid, $clID, $volumeID, $finalClient, $vmName, $savetime,
							$level, $finalSaveset, $original, $precomp, $postcomp, $reduction);
					}
				}
			}
			close(OUTP);
		}
	} else {
		print "Unable to create directory $outFile for individual client files.\n";
	}
}
print("\n\nWriting $byClientOut\n");
if (open(OUTP,">$byClientOut")) {
	# Rollup loops and header reconstructed.
	print OUTP ("VolumeID,Client,OriginalGB,PostCompGB,Reduction(:1)\n");
	foreach my $volumeID (sort keys %{$clientData{by_client}}) {
		foreach my $client (sort keys %{$clientData{by_client}{$volumeID}}) {
			my $finalClientName = ($anonHosts == 1) ? $hostMap{$client} : $client;
			printf OUTP ("%s,%s,%.3f,%.3f,%.3f\n",
				$volumeID,
				$finalClientName,
				$clientData{by_client}{$volumeID}{$client}{original},
				$clientData{by_client}{$volumeID}{$client}{postcomp},
				$clientData{by_client}{$volumeID}{$client}{original} / $clientData{by_client}{$volumeID}{$client}{postcomp});
		}
	}
	close(OUTP);
}
print("\n\nWriting $byTypeOut\n");
if (open(OUTP,">$byTypeOut")) {
	# Rollup loops and header reconstructed.
	print OUTP ("VolumeID,Client,Type,OriginalGB,PostCompGB,Reduction(:1)\n");
	foreach my $volumeID (sort keys %{$clientData{by_type}}) {
		foreach my $client (sort keys %{$clientData{by_type}{$volumeID}}) {
			my $finalClientName = ($anonHosts == 1) ? $hostMap{$client} : $client;
			foreach my $type (sort keys %{$clientData{by_type}{$volumeID}{$client}}) {
				printf OUTP ("%s,%s,%s,%.3f,%.3f,%.3f\n",
					$volumeID,
					$finalClientName,
					$type,
					$clientData{by_type}{$volumeID}{$client}{$type}{original},
					$clientData{by_type}{$volumeID}{$client}{$type}{postcomp},
					$clientData{by_type}{$volumeID}{$client}{$type}{original} / $clientData{by_type}{$volumeID}{$client}{$type}{postcomp});
			}
		}
	}
	close(OUTP);
}
# If we're anonymising hosts, write data that can be kept private to map
# anonymised names back to the originals. (Body reconstructed.)
if ($anonHosts) {
	if (open(ANONMAP,">$anonOut")) {
		print ANONMAP ("OriginalHostname,AnonymisedHostname\n");
		foreach my $host (sort keys %hostMap) {
			print ANONMAP ("$host,$hostMap{$host}\n");
		}
		close (ANONMAP);
	}
}
One thing to note in the script — the breakdown of backup types depends on the
saveset information I had available to me. So while it covers things like Windows and
Unix filesystems, Oracle, SAP and MSSQL, it doesn’t include coverage for identifying
Lotus Notes, DB2, and so on. (There’s a subroutine, get_backup_type, that interprets
the backup type from the saveset name, which you’d need to modify if you wanted
additional types.)
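If you’re more comfortable prototyping outside Perl, the same classification idea can be sketched as an ordered rule table, so adding a type (say, DB2) is one line. This is a hedged Python sketch: the first three patterns mirror ones visible in the script (RMAN, `<N>/` Unix paths, SYSTEM/DISASTER/\VOLUME{); anything else you add is a guess you’d adapt to your own saveset names.

```python
import re

# Ordered rules: first match wins, mirroring an if/elsif chain. Patterns
# beyond the three shown are yours to supply for additional workloads.
BACKUP_TYPE_RULES = [
    (re.compile(r"^RMAN"), "Oracle"),
    (re.compile(r"^<\d+>/"), "Unix Filesystem"),
    (re.compile(r"SYSTEM|DISASTER|\\VOLUME\{"), "NetWorker"),
    # e.g. (re.compile(r"^DB2"), "DB2"),  # illustrative addition
]

def get_backup_type(saveset: str) -> str:
    for pattern, backup_type in BACKUP_TYPE_RULES:
        if pattern.search(saveset):
            return backup_type
    return "Other"
```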
Footnotes
1. There are, unfortunately, only so many hours in a busy work-day.
2. The “all” output does not attempt to anonymise saveset names – though it will anonymise virtual machine names.