
POWERSCALE

ADVANCED
ADMINISTRATION

COURSE GUIDE

PARTICIPANT GUIDE

Table of Contents

PowerScale Advanced Administration..................................................................... 2


Dell EMC PowerScale Advanced Administration Introduction .............................................. 3
Prerequisite Skills ................................................................................................................ 4
Rebranding - Isilon is now PowerScale ................................................................................ 5
Other Introductory Information ............................................................................................. 6
PowerScale Solutions Certification Journey Map ................................................................. 7
Module 1-4 Topics ............................................................................................................... 8
Module 5-8 Topics ............................................................................................................... 9

Concepts................................................................................................................... 10
Module Objectives ............................................................................................................. 11
A Day in Life of a Storage Administrator ............................................................................ 12
Disaster Recovery Introduction .......................................................................................... 22
PowerScale Disaster Resilience ........................................................................................ 39
OneFS Domains ................................................................................................................ 59
Troubleshooting PowerScale Cluster ................................................................................. 70

Advanced Access .................................................................................................. 105


Module Objectives ........................................................................................................... 106
Networking Architecture ................................................................................................... 107
Access Zones .................................................................................................................. 126
SmartConnect Advanced and DNS .................................................................................. 137
Authentication Providers .................................................................................................. 163
Protocols .......................................................................................................................... 177

Advanced Authorization ........................................................................................ 207


Module Objectives ........................................................................................................... 208
Multiprotocol Permissions ................................................................................................ 209
User Mapping .................................................................................................................. 231

Reporting ................................................................................................................ 245

Module Objectives ........................................................................................................... 246
Events & Alerts ................................................................................................................ 247
Log Files .......................................................................................................................... 268
Notifications ..................................................................................................................... 281
Protocol Auditing .............................................................................................................. 293
SNMP .............................................................................................................................. 308

OneFS Job Engine ................................................................................................. 313


Module Objectives ........................................................................................................... 314
Job Engine Architecture ................................................................................................... 315
Job Types, Priority, and Impact ........................................................................................ 332
Job Engine Management ................................................................................................. 349

OneFS Services ...................................................................................................... 368


OneFS Services ............................................................................................................... 369
Module Objectives ........................................................................................................... 370
SFSE – Small Files Storage Efficiencies .......................................................................... 371
Migration .......................................................................................................................... 400
SmartQuotas Advanced ................................................................................................... 416
SnapshotIQ Advanced ..................................................................................................... 430
SyncIQ Advanced ............................................................................................................ 447

Disaster Recovery .................................................................................................. 471


Disaster Recovery............................................................................................................ 472
Module Objectives ........................................................................................................... 473
Data Protection and Disaster Recovery ........................................................................... 474
SyncIQ DR ....................................................................................................................... 504
NDMP .............................................................................................................................. 540
Cloud and Virtual Storage Strategies ............................................................................... 561

Performance and Monitoring ................................................................................ 579


Performance and Monitoring ............................................................................................ 580
Module Objectives ........................................................................................................... 581
DataIQ Deep Dive ............................................................................................................ 582

HealthCheck .................................................................................................................... 617
Performance Foundation.................................................................................................. 627
Performance Analysis ...................................................................................................... 649

Appendix ............................................................................................... 675

Glossary ................................................................................................ 805

PowerScale Advanced Administration

Dell EMC PowerScale Advanced Administration Introduction

This course provides an experienced Dell EMC PowerScale administrator with advanced information about PowerScale cluster operations. The course provides the expertise necessary for an administrator to consider and plan for future growth, the understanding to implement best practices to avoid common issues, and the ability to quickly diagnose the source of issues and work efficiently with PowerScale support resources. It also provides advanced details about PowerScale disaster recovery and the expertise necessary for an administrator to consider and plan for disasters. Upon completing this course, you should be able to:
→ Configure cluster monitoring and reporting features
→ Identify and resolve networking and SmartConnect issues
→ Understand multiprotocol permissions
→ Understand the OneFS Job Engine
→ Perform performance analysis
→ Discuss the data protection and disaster recovery features of OneFS
→ Optimize and troubleshoot identified issues in the cluster

Important: If you plan to take the Proven Professional exam, use the participant guide and lab guide as the exam reference or study guide.


Prerequisite Skills

To understand the content and successfully complete this course, a student must have a suitable knowledge base or skill set. The student must have an understanding of:
• Networking fundamentals such as TCP/IP, DNS, and routing
• PowerScale hardware troubleshooting and maintenance procedures
• The PowerScale Concepts course
• The PowerScale Administration course


Rebranding - Isilon is now PowerScale

Important: In mid-2020, Isilon launched a new hardware platform, the F200 and F600, branded as Dell EMC PowerScale. Over time, the Isilon brand will convert to the new platform's PowerScale branding. In the meantime, you will continue to see Isilon and PowerScale used interchangeably, including within this course and any lab activities.
OneFS CLI isi commands, command syntax, and man pages may have instances of "Isilon".
Videos associated with the course may still use the "Isilon" brand.
Resources such as white papers, troubleshooting guides, other technical documentation, community pages, and blog posts will continue to use the "Isilon" brand.
The rebranding initiative is an iterative process, and rebranding all instances of "Isilon" to "PowerScale" may take some time.


Other Introductory Information

• Throughout the course, the "Dell EMC PowerScale" brand is used interchangeably with "PowerScale".
• Generation 6 hardware continues to use the "Isilon" brand.
• The content of this course is based on OneFS 9.0.
• The course content is based on the latest generation of PowerScale hardware as of the course release date. Although the focus is on the latest hardware, references and comparisons may be made to older hardware such as Generation 4 and Generation 5 nodes.
• The DataIQ content in the course is based on DataIQ v2.
• The lab cluster is installed with OneFS 9.0.
• The lab exercises build upon one another; in other words, you will not be able to complete lab 5 if you skip lab 1, 2, 3, or 4. Perform the labs in sequential order.


PowerScale Solutions Certification Journey Map

The graphic shows the PowerScale Solutions Expert certification track. You can leverage the Dell Technologies Proven Professional program to realize your full potential. The program combines technology-focused and role-based training and exams covering concepts and principles as well as the full range of Dell Technologies hardware, software, and solutions. You can accelerate your career and your organization's capabilities.

PowerScale Solutions (Expert):
• A. PowerScale Advanced Administration (C, VC)
• B. PowerScale Advanced Disaster Recovery (C, VC)
• (Knowledge and Experience based Exam)

Specialist tracks:
• Implementation Specialist, PowerScale: A. PowerScale Concepts (ODC); B. PowerScale Administration (C, VC, ODC)
• Technology Architect Specialist, PowerScale: A. PowerScale Concepts (ODC); B. PowerScale Solution Design (ODC)
• Platform Engineer, PowerScale: A. PowerScale Concepts (ODC); B. PowerScale Hardware Concepts (ODC); C. PowerScale Hardware Installation (ODC); D. PowerScale Hardware Maintenance (ODC); E. PowerScale Implementation (ODC)

Information Storage and Management (C, VC, ODC)

(C) - Classroom
(VC) - Virtual Classroom
(ODC) - On Demand Course

For more information, visit: http://dell.com/certification


Module 1-4 Topics

The course consists of eight modules. The total time required to complete this
course content and lab exercises is approximately 5 days.

1 Concepts: A Day in Life of a Storage Administrator; Disaster Recovery Introduction; PowerScale Disaster Resilience; OneFS Domains; Troubleshooting PowerScale Clusters

2 Advanced Access: Networking Architecture; Access Zones; SmartConnect Advanced and DNS; Authentication Providers; Protocols

3 Advanced Authorization: Multiprotocol Permissions; User Mapping

4 Reporting: Events & Alerts; Log Files; Notifications; Protocol Auditing; SNMP


Module 5-8 Topics

5 OneFS Job Engine: Job Engine Architecture; Job Types, Priority, and Impact; Job Engine Management

6 OneFS Services: SFSE (Small Files Storage Efficiencies); Migration; SmartQuotas Advanced; SnapshotIQ Advanced; SyncIQ Advanced

7 Disaster Recovery: Data Protection and Disaster Recovery; SyncIQ DR; NDMP; Cloud and Virtual Storage Strategies

8 Performance and Monitoring: DataIQ Deep Dive; HealthCheck; Performance Foundation; Performance Analysis

Concepts

Module Objectives

After completing this module, you will be able to:

• Identify the administrative tasks performed by a PowerScale administrator.
• Understand the importance of disaster recovery.
• Describe various disaster resilience options.
• Describe OneFS domains.
• Describe the troubleshooting steps and process.


A Day in Life of a Storage Administrator

General Administrative Tasks

There is no "one size fits all" when defining the daily tasks of a storage
administrator. Some of the general administrative tasks may include monitoring the
cluster, addressing notifications, analyzing problems, formulating courses of action,
escalating issues, contacting technical support, and making configuration changes.


1: The storage administrator is responsible for monitoring the performance,


capacity, and hardware components of a PowerScale cluster. This is to ensure that
the system and workflows are operating as intended. Monitoring also helps to
proactively detect and resolve potential problems.

2: Storage administrators perform several configuration tasks on the PowerScale system, including storage provisioning, licensing, user account creation and management, integration with authentication providers, data protection, and disaster recovery.

3: Storage administrators receive notifications in the form of cluster alerts or SMTP


email for cluster warnings and errors. Administrators may also enable end users to
receive notifications for events such as quota violations.

4: Storage administrators plan and analyze for future implications such as storage
growth, incorporation of new applications, revamping of old applications and
workflows, which may impact the ecosystem of the storage solution.


5: Storage administrators may propose additions or modifications to the PowerScale system. The justifications could include keeping up with SLAs, handling growth, resolving existing problems, and so on.

The first priority for an administrator may be to verify that processes that run overnight, such as backups or remote synchronization tasks, have completed. The first view may be the dashboard of the PowerScale cluster or clusters to get the status of the system. The dashboard provides a glance at areas such as performance, capacity, and alerts. If any area appears to have or indicates issues, the administrator goes into action.

A few questions to ponder: How are your storage administration tasks done? Are there other tasks you do daily? Do you provide daily status reports? Other tasks for discussion are postmortems, help desk tickets and requests, patch planning, upgrade planning, hardware refresh rollouts, proofs of concept, and so on.

Setting the Stage

To give context to the concepts discussed, a hypothetical corporation called Diverse Genomics is used throughout the course. Diverse Genomics, or Div-Gen for short, is a cutting-edge organization at the forefront of genomics research and development. The core and DR sites each have a 3-node PowerScale cluster.

Div-Gen provides discrete clinical data to university hospitals for specific patient demographic information. Within the organization, about 6000 individuals access data on the cluster. Some of the users are from academic institutions, while most are internal employees. Internally, the cluster is used for home directories and file sharing.


Proactive and Reactive Management

A storage administrator may manage cluster issues


in either a proactive or reactive manner.

• Event notifications alert administrators with


cluster warnings and issues.
• Reactive Management1 involves storage
administrators responding to problems that have
already occurred and need resolution.
• Proactive Management2 involves storage
administrators anticipating and eliminating
problems before they can occur.

Proactive Management Example

1 An example for reactive management is when a user quota is reached and the
user is unable to further write data to the cluster. The administrator must now
immediately increase the quota limit or notify the user to delete unneeded data.
The emergency could have been avoided if the impending problem was
investigated and acted upon in the early stages.

2 An example for proactive management is when a storage administrator configures


the system so that they are notified when the user is approaching quota limits.
Proactive steps taken by administrators help reduce the occurrence of reactive type
events.


A variation of proactive management is preparing for disaster recovery. A storage administrator may prepare for an impending disaster proactively using different data protection mechanisms.

• Snapshots3
• Backup and Recovery4
• Replication5
• Data Tiering6

3 The administrator can create a snapshot schedule using SnapshotIQ to maintain point-in-time copies of the production data.

4 Using NDMP, data can be backed up to a tape or disk.

5The administrator can configure asynchronous replication to a target cluster using


SyncIQ with failover and failback capabilities.


Thus, when a disaster occurs, the administrator is prepared to recover from it


rather than having to investigate and then come to a solution.

Notification Example

Custom notification sent to Hayden and John Doe.

Let us take the example of a hard limit exceeded on a user quota to demonstrate
the combination of notifications and reactive management.

• User John Doe reaches the hard limit of 10 GB.
• An event is generated and notifications are sent to Hayden and John Doe.
• John Doe is unable to write further to his home directory.
• Hayden explores resolution possibilities to solve the problem. Some examples include:

6 Data can be proactively moved to the cloud by configuring CloudPools.


− Increase user quota7


− Snapshots8
− Tier data using SmartPools9
− Request user to delete unnecessary files10

Monitoring and Analysis

Monitoring is key to proactive management. Listed below are the most common interfaces that a storage administrator may use on a daily basis.

WebUI

The WebUI dashboard is a graphical tool that can be used to monitor different
cluster variables such as cluster size, storage efficiency, throughput and CPU,
active client connections, node status and so on. Administrators can also monitor
cluster alerts, job reports and status of different configurations such as Quotas
using the WebUI.

7 If capacity is available, a simple and temporary fix is increasing the quota limit.

8Older snapshots may be deleted to free up space. Another alternative is to


exclude snapshot overhead in the quota calculation by changing the quota settings.

9Data matching a certain pattern such as least accessed files or files older than
100 days can be tiered off to a different physical layer of storage to free up space.

10Administrators may request users to delete unneeded files and personal files in
order to free up space.


CLI

You can monitor the PowerScale cluster using different OneFS commands. The
CLI command outputs provide information on a particular configuration at a more
granular level. For example, to view information about a directory quota, the isi
quota command is used.

PAPI

Another advanced area to consider is taking advantage of the API calls. A chief
benefit of PAPI is its scripting simplicity, enabling administrators to automate their
storage administration. Hayden can develop and tailor specific areas to monitor
using APIs.
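For illustration only, here is a minimal Python sketch of a PAPI call. The cluster address, credentials, and response field names are assumptions for this example, and the exact endpoint versions vary by OneFS release; verify resource paths against the OneFS API reference before using them.

import requests

# Cluster address and credentials are placeholders for this illustration.
CLUSTER = "https://cluster.divgen.local:8080"   # OneFS API listens on port 8080
AUTH = ("admin", "password")

# List SmartQuotas quotas through PAPI. The /platform/1/quota/quotas path is
# typical, but the available endpoint version depends on the OneFS release.
response = requests.get(
    CLUSTER + "/platform/1/quota/quotas",
    auth=AUTH,
    verify=False,   # lab clusters commonly use self-signed certificates
)
response.raise_for_status()

# Print the path, quota type, and current logical usage for each quota.
for quota in response.json().get("quotas", []):
    usage = quota.get("usage", {})
    print(quota.get("path"), quota.get("type"), usage.get("logical"))

A script like this can be scheduled to collect quota usage regularly, which is the kind of tailored monitoring the PAPI approach enables.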


DataIQ

DataIQ provides an Analyze function that enables organizations to view volumes from a business context. The Analyze page enables administrators to focus on capacity metrics for a given context. You can view multidimensional, project-oriented data. DataIQ Analyze can reveal the true cost of project data to enable business users to manage their costs and workflows.


InsightIQ

InsightIQ focuses on PowerScale data and performance. It provides tools to


monitor and analyze cluster performance and file systems. Cluster monitoring
includes performance, capacity, activity, trending, and analysis. InsightIQ runs on
separate hardware from the clusters that it monitors, and provides a graphical
output for trend observation and analysis.


Resources - PowerScale Information Hub

The Dell EMC PowerScale Information Hub connects users to a central hub of
information and experts to help maximize their current storage solution.

All the guides pertaining to PowerScale installation, administration, and maintenance are available in the Information Hub.

See the Dell support portal for all the technical white papers related to PowerScale.

Challenge

Lab Assignment: Discuss various administrative tasks and ask qualifying questions about what is required to be a PowerScale administrator.


Disaster Recovery Introduction

What Is a Disaster?

Data is the most valuable asset for an organization. Any event that can cause data unavailability or data loss constitutes a disaster. A disaster can range from a single file loss to complete data center loss. Based on the cause, disasters can be classified into three broad categories: natural, technological, and man-made.

1: The data center may get physically damaged due to natural calamities such as
earthquakes, hurricanes, floods, and tornadoes occurring at the site. These events
are usually rare. Organizations plan and locate their data center in a place where
the probability of such events happening is minimal.

2: Technological causes may range from software errors to hardware component failures that render data unavailable or lost. Some examples include network failures such as switch or port failures, power failures, data corruption, file system errors, and so on. Planned or simulated maintenance windows, such as monthly DR tests, can also render data unavailable.

3: Human negligence is the most likely cause of a disaster. Examples include a user accidentally overwriting or deleting a file, ignoring cluster warnings such as quota advisory limits, or locking shared resources. Man-made causes also include intentional attacks such as cyberattacks, acts of terrorism, data theft, compromising data integrity, and so on.


Disaster Recovery Concepts

Consider a simple two-site replication scenario to describe some of the basic


concepts in disaster recovery.

• Primary Site is the site that is actively hosting read/write data.


• Secondary Site is the site where the same cluster data is inactive or read-only.
• The primary site includes the source cluster and the source data11.
• The secondary site includes the target cluster and the target data.
• RPO12 (Recovery Point Objective) is the acceptable amount of data not
recovered following an incident.
• RTO13 (Recovery Time Objective) is the acceptable amount of time to restore
access to the data following an incident.

11Source data can also be described as production data, primary data, active data,
read/write data, and others.

12 Based on the RPO, organizations plan for the frequency with which a backup or
a replica must be made. For example, if the RPO of a particular business
application is 24 hours, then backups are created every midnight.

13Based on the criticality of data, the RTO for an organization may vary. The
organization must be well prepared or equipped to overcome a disaster within the
RTO to avoid any business impact. For example, if an organization has an RTO of
two hours, data access must be restored within two hours. Both RPO and RTO are
counted in minutes, hours, or days.
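To make the relationship between RPO and protection frequency concrete, here is a minimal sketch in plain Python (not an OneFS tool), using the simplified model from the note above in which worst-case data loss equals the time between two successive backup or replication jobs.

def meets_rpo(protection_interval_hours: float, rpo_hours: float) -> bool:
    # Simplified model: the worst-case data loss is the time between two
    # successive backup or replication jobs.
    return protection_interval_hours <= rpo_hours

print(meets_rpo(24, 24))   # True: nightly backups can satisfy a 24-hour RPO
print(meets_rpo(24, 8))    # False: an 8-hour RPO needs more frequent protection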


Business Continuity versus Disaster Recovery

Business Continuity (BC) and Disaster Recovery (DR) are often used interchangeably, but they are two entirely different strategies, each of which plays a significant role in safeguarding business operations.


• Business Continuity14 is a business-centric approach to ensure that normal


business operations can continue during and after a disaster.
• Disaster Recovery is a data-centric approach to restore data loss and data
access after a disaster occurs.
• Disaster Recovery is a subset of Business Continuity.
• BC and DR strategies are important15 for an organization to act proactively and
reactively in the event of a disaster to ensure minimal damage to the business
and data.

Fact: According to the Data Breach 2019 Report by IBM®, the global
average cost for data breach in the 2019 study is $3.92 million. The
study included 507 organizations in 16 countries and regions and
across 17 industry sectors.

14Business continuity accounts for non-data centric factors. Factors include human
resources, staffing, transportation, skill needs, infrastructure, hardware, software,
and other related CAPEX and OPEX items.

15Loss of data and access to it may have an adverse effect on a business. An


organization seeks to reduce the risk of sensitive data loss to operate its business
successfully. These sensitive data, if lost, may lead to significant financial, legal,
and business loss apart from serious damage to the reputation of an organization.


Hot, Warm and Cold Sites

When a disaster occurs at the primary site, data and operations are shifted to the
secondary site until the primary site is restored. Based on the RTO, an organization
can implement the secondary site in 3 different ways:


• Hot Site16 - Fully equipped and configured to enable failover in a relatively short
RTO.
• Warm Site17 - Equipped and setup, but not configured to resume operations
immediately.
• Cold Site18 - Unprepared and needs to be set up and configured to resume
operations.

Business Continuity Plan

Organizations can fail after a disaster when they have no recovery plan, no communication plan, unrealistic recovery goals, an inaccurate disaster recovery plan, or unclear roles.

From the conceptualization to the realization of the BC plan, a life cycle of activities
can be defined for the BC process.

16Hot site is a relatively more expensive solution. A hot site provides an


asynchronous mirror image of the source data. Asynchronous replication can incur
some data loss. Standby latencies for hot sites are typically only milliseconds in
length, resulting in little to no downtime during failover.

17 A warm site has the infrastructure unboxed and installed but it is not configured
as a disaster recovery solution. The RTO for a warm site is the time it takes to
restore data and configure the system for access. Because your data is not being
consistently replicated between production and target, there is greater latency for
failover, ranging from seconds to hours.

18 A cold site is the cheapest recovery option, but also the least effective one. Cold
site recovery ranges from powering on dormant systems to a full deployment of
hardware and software. The RTO drives the level of preparedness of the cold site.


The BC planning life cycle includes five stages.

1:

• Determine BC requirements, scope, and budget to meet requirement.


• Select a BC team that includes subject matter experts from all areas of
business, whether internal or external.
• Create BC policies.

2:

• Identify essential services and functions to assess the operation disruption in an


emergency.
• Identify required skill sets and staff reallocation to perform and maintain critical
functions.
• Identify potential issues and what impact the loss of critical functions have on
the business.

3:

• Create an action plan for each critical function.


• Design data protection strategies and develop infrastructure.
• Develop contingency solution and emergency response procedures.
• Detail the recovery and restart procedures.

4:


• Implement risk management and mitigation procedures that include backup,


replication, and management of resources.
• Prepare the DR sites that can be used if a disaster affects the primary data
center. The DR site could be a data center of the organization or could be a
cloud.
• Implement redundancy for every resource in a data center to avoid single points
of failure.

5:

• Train the recovery team on recovery procedures when a disaster is declared.


• Test the BC plan regularly to evaluate its performance and identify its
limitations.
• Review the plan, and identify gaps, identify areas that require clarification or
detail.
• Update the BC plans and recovery procedures to reflect regular changes within
the data center.

Plan for short, medium, and long-term emergencies. Define what a short, medium, and long-term emergency is. The definition can be in terms of dollars, business lost, time, or fatigue. The plan should address the long-term effects on the business if any of the services or functions cannot be restored.

Disaster Recovery Plan

A subset of the business continuity plan is the disaster recovery plan. The plan
should define the short, medium, and long-term contingencies.

1: Determine the disaster recovery team, their roles and responsibilities, and their
contact information. The team may include the DR lead, management team, facility
team, and the network, server, application, and storage teams.


2: The plan has a description and location of the recovery facilities, transportation
and accommodations details, and the location of data and backups.

3: The communication procedures and contacts should include authorities,


employees, clients, vendors, and partners.

4: The process for activating the plan needs to be clearly defined. The minimum
information is the who, what, when, where, and how. Who decides and who needs
to be contacted? What kind of disaster and what is the scope? What is the timeline
and what needs to happen when? Where is the data? How is access going to be
cutover?

5: Perform trial runs to identify weaknesses or failures and then revise and test the
plan again.

Minimal Function of Recovery

To demonstrate RTO and RPO in a disaster recovery plan, consider the following
scenario for the Div-Gen organization:

• Technical recovery requirements, RPO and RTO: 8 Hours


• Business recovery requirements, RPO and RTO: 10 Hours


• The severity19 of the event drives the method of data retrieval.


• Hot, Warm, or Cold Site?20
• Business recovery requirements have a greater RTO than that of disaster
recovery.

Disaster Management Overview

In order to manage a disaster efficiently, it must be handled at different stages: before the disaster takes place, during the disaster, and after the disaster. The preparation and response for disasters can be classified into three categories:

• Disaster Resistance is the integrated protection to withstand a disaster.


• Disaster Resilience is the ability to overcome a disaster quickly.
• Disaster Recovery is the process of recovering data and operations after the
disaster occurs.

19If a catastrophic event causes data unavailability, then the eight-hour RTO
requires secondary site data access within eight hours.

20A hot or warm site may simply require failover steps. A cold site may need
powering up systems and restoring data from tapes, probably not realistic given an
eight-hour RTO.


1: Resistance is business continuity preparation, building in protection that


prevents damage when an event occurs. It consists of the measures that are taken
to limit foreseeable problems in the data center. Some of these measures include:

• Modern building structures for the data center to withstand natural calamities
such as earthquakes and fires.
• Security measures such as CCTV, entry badges, and guards.
• Employing redundant power supply for the data center.

2: Resilience involves measures taken to proactively and reactively respond to


disaster quickly. An example for resilience is an administrator detecting an
impending component failure using indicator lights and acting on it. Some of the
resilience measures include:

• Ensuring high availability of software and hardware. For example, taking data
backups and replicas, stock hot swap components and so on.
• Warm and Hot sites to redirect operations when a disaster occurs at the primary
site.
• Ability to perform maintenance task while keeping the data available.
• Identify gaps, risks, vulnerabilities by creating and performing resilience tests.

3: Recovery addresses the question "The disaster has occurred, now what?".
Recovery typically is not immediate and access to the data may be delayed until it
is restored. Organizations with no plan or a poor plan are likely to fail after a
disaster. Some examples of the disaster recovery measures include:

• Restoring data and operations from a cold site.


• Rebuilding data center environment.
• Shifting resources such as manpower to disaster recovery site.

PowerScale and Disaster Management

Resistance

Resistance involves measures that protect against disasters at the facilities level, much of which goes beyond the scope of the PowerScale system. Resistant hardware means


having inherent protection that is built in to the storage array to prevent damage.
Some examples of PowerScale Resistance include:

• Battery backup for Journals21


• Compliant with ASHRAE A3 data center environment guidelines22

* Dry-bulb temperature, 5.5⁰ C dew point to 60% relative humidity ** Dry-bulb temperature, -12⁰ C
dew point and 8% to 85% relative humidity

21 Gen 6 uses an M.2 drive as the journal, whereas Gen 6.5 hardware uses an NVDIMM drive. In both cases, a dedicated battery backup of the vault drive helps prevent data loss. When the data center loses power, writes to the PowerScale are preserved.

22PowerScale is compliant with the American Society of Heating, Refrigerating,


and Air-conditioning Engineers or ASHRAE guidelines for temperature and
humidity operating ranges. The A3 class is intended to remove obstacles to new
data center cooling strategies such as free-cooling methods. Free-cooling takes
advantage of the local climate of a facility. It uses outside air to cool IT equipment
without the use of mechanical refrigeration such as chillers or air conditioners
whenever possible.


Resilience

Rear view of a 3-F600 node cluster.

PowerScale has inherent high availability features at both the hardware and
software level to reduce the impact of disaster. Some of the PowerScale Resilience
measures include:

• Redundant power supply units23, back-end ports, and front-end ports.

23 Each Gen 6.5 node has dual redundant power supply units. Gen 6 nodes are added as node pairs, which provide power redundancy.


• Reed-Solomon for data protection with tunable protection levels.


• OneFS Services:

− SyncIQ24
− SnapshotIQ25
− SmartConnect26
− FlexProtect27
− CloudPools28

24 Configures replication to a target cluster with failover and failback capabilities.

25 Proactively create snapshots for files and directories.

26SmartConnect acts as a load balancing mechanism should the external network


card of a node fail.

27 The FlexProtect job copies data to other drives across multiple nodes.

28 Configures data to be stored outside the cluster on a target cloud solution such as ECS.


Recovery

With the many OneFS services available and configured, it is possible to recover
from a disaster using snapshots, replicas, backups, or cloud tiered data. For
example, once a disaster occurs, SyncIQ can be used to resume operations at a
target site.

• SyncIQ is configured to replicate the data to the target site.


• Superna Eyeglass29 is configured to orchestrate the failover and failback.

29 The value of Eyeglass is its ability to replicate configuration data and orchestrate failover and failbacks. It acts as a witness between the source and target clusters.


• When a disaster occurs and the primary site loses functionality, the data and operations are failed over to the target site.
• Once the primary site is functional again, data and operations are failed back to
it.

Disaster Considerations

Some of the disaster recovery considerations for business continuity include:


• Multiple utility sources30


• Segregate mission critical systems31
• Network redundancy32
• Formulate a BC plan
• Avoid Single Point of Failure

30Ideally, the facility has two power sources. Systems should have different and
redundant sources of power and should they fail, generators, and then battery
power. When the data center fails over to generator power, the refueling should be
planned and facilitated.

31Mission critical systems can be given higher protection by segregating them from
the other enterprise system. The segregation can be both physical as well as
logical.

32 Communication failures account for many issues, and connectivity redundancy should be considered for all key systems.


PowerScale Disaster Resilience

PowerScale Resilience Overview

Shown in the graphic are PowerScale's hardware and software resilience features. PowerScale resilience is the ability to maintain data availability in the event a hardware component fails.

The graphic calls out the following resilience features:

• Clustered nodes for high availability
• Node pairs provide power redundancy and journal protection
• Tunable protection levels
• Features: SmartConnect, SmartFail, and the FlexProtect job
• Power redundancy: node pairs use different power sources, and a node pair can run with a single power supply
• Reed-Solomon for data protection
• Dual back-end ports and dual front-end ports

The graphic shows an 8-node Gen 6 cluster.

OneFS Fault Tolerance

Listed here are the OneFS fault tolerant functions with a brief definition.


Feature Description

File system journal A per-node journal, either battery-backed33 NVRAM or NVMe SSD and mirrored to a peer node, that guards write transactions against sudden power loss.

Proactive device failure Remove or SmartFail any drive that


reaches an error detected threshold.

PowerScale data integrity Protects file system structures against


corruption.

Protocol checksums Checksum verification for the Remote


Block Management protocol data.

Dynamic sector repair The file system forces bad disk sectors to
be rewritten elsewhere.

MediaScan Check disk sectors and deploy dynamic


sector repair to fix any sector errors
encountered.

33 Gen 6 uses an M.2 drive as the journal, whereas Gen 6.5 hardware uses an NVDIMM drive. In both cases, a dedicated battery backup of the vault drive helps prevent data loss. When the data center loses power, writes to the PowerScale are preserved. Large journals offer flexibility in determining when data should be moved to disk. A node mirrors its journal to its peer node. A backup battery helps maintain power while data is stored in the vault.


IntegrityScan Examines the entire file system for


inconsistencies.

Fault isolation Isolates inconsistencies or data loss to the


unavailable or failing device.

Accelerated drive rebuilds Use CPU, memory, and spindles from


multiple nodes to reconstruct data from
failed drives.

Automatic drive firmware updates Automatic drive firmware updates for new
and replacement drives.

Rolling upgrade Upgrades and restarts each node in the


cluster sequentially.

Nondisruptive upgrades Upgrade the operating system while users


access data without error or interruption.

Roll back capable Return a cluster with an uncommitted


upgrade to its previous version of OneFS.

Performing the upgrade Automatically runs a pre-install verification


check before starting an upgrade.

Data protection Allows differing levels of protection to be


applied in real time down to a per-file
granularity.

Link: High Availability and Data Protection with DELL EMC


PowerScale Scale-Out NAS

OneFS fault tolerance is not within the scope of this training, but is part of
PowerScale Disaster resiliency. For more in-depth details, go to


https://www.emc.com/collateral/hardware/white-papers/h10588-isilon-data-availability-protection-wp.pdf and review the white paper.

Protection Review

Protection Level Review

The following table (shown in the graphic) provides an easy reference for all the protection levels. * The minimum number of nodes reflects when FEC is calculated versus mirroring.

Decision point: What is most important? Increased node failure protection,


increased drive failure protection, or both?

Data Layout Review

Shown in the graphic is a representation of drive sleds with three drives, typical in
the F800, H500, and H400 nodes.


Example: a 768 KB file written with N+2d:1n protection. Each stripe unit is 128 KB, so the 768 KB file equals six stripe units, and N+2d:1n makes two FEC stripe units. A stripe can start on any disk in the failure domain, and FEC placement in the stripe width is random. The entire file remains in a single disk pool; each disk pool (blue, purple, and green in the graphic) is a unique failure domain.

* Maximum stripe width is 2 MB, or 16 x 128 KB stripe units.
* +3d:1n1d protection has a maximum of 15 data stripe units.

Decision point: Do I employ different protection levels for different workflows?

Neighborhood Review

In the illustration, the cluster goes from three disk pools to six; each color represents a disk pool. The callouts note that a single neighborhood has three disk pools, that each neighborhood has its own disk pools after the split, and that at 40 nodes the cluster has protection against chassis failure.


Gen 6 Chassis Failure

Shown in the graphic is a 40-node cluster to illustrate a chassis failure. The addition of the 40th node splits the cluster into four neighborhoods, labeled NH 1 through NH 4. At 40 nodes, no disks in a node are in the same disk pool as any other node in the chassis.

Quorum

Quorum is important for anticipating failure scenarios. OneFS uses quorum to prevent split-brain conditions that might result from a temporary cluster division. A cluster not meeting quorum is underprotected.


The image shows the rear view of a 6-node Gen 6 cluster. When fewer than 50% of equivalent nodes are down or removed, the cluster remains read and write. When 50% or more of equivalent nodes are down or removed, the cluster is read-only and underprotected.

Protection Level Review: The cluster provides data availability even when one or multiple devices fail. Reed-Solomon erasure coding is used for data protection. Protection is applied at the file level, enabling quick and efficient data recovery. OneFS has +1n through +4n protection levels, providing protection for up to four simultaneous component failures. A single failure can be an individual disk, a node, or an entire Gen 6 chassis. A higher protection level than the minimum can be set and controlled at the cluster or directory level. Remember that Gen 6 requires a minimum of four nodes of the same type. With mirroring, the cluster can recover from N - 1 drive or node failures without sustaining data loss. For example, 4x protection means that the cluster can recover from three drive or three node failures. Consider the trade-offs when making choices. Capacity requirements increase as the protection level is increased. Data can be split into tiers and node pools with different protection levels. Does the size of your node pools meet the minimum required size for the protection level? Will the small performance penalty impact your workflow? These are the considerations when determining how much to increase the requested protection level to meet your objectives.

Data Layout Review: Drive failures represent the largest risk of data loss, especially as node pool and drive sizes increase. To illustrate the data protection resiliency, let us begin with a refresher on the OneFS data layout using Gen 6 nodes. Gen 6 nodes have drive sleds with three, four, or six drives. As shown in the graphic, each of the four nodes has five drive sleds and the first disk in each sled


belongs to the same disk pool. Likewise, the second disk in each sled belongs to a
different disk pool, as does the third disk in each sled. The different colors
represent the 3 different disk pools. All data, metadata, and FEC blocks are striped
across multiple nodes in the disk pool. The striping protects against single points of
failure and bottlenecks. All parts of a file are contained in a single disk pool. The
graphic shows an example file of 768 KB written to the disk pool noted in blue.
Containing files into a disk pool creates isolated drive failure zones. Disk pool
configuration is automatically done as part of the auto provisioning process and
cannot be configured manually. File data is broken into 128 KB data stripe units
consisting of up to 16 x 8 KB blocks per data stripe unit. A single file stripe width
can contain up to 16 x 128 KB data stripe units for a maximum size of 2 MB. A
large file has thousands of file stripes per file distributed across the disk pool. The
protection is calculated based on the requested protection level for each file stripe
using the data stripe units assigned to that file stripe. The higher the desired protection level, the more FEC stripe units are calculated. This example showcases the +2d:1n protection level on a four-node cluster. The data can be rebuilt if two disks fail, or if a node fails. Typically, higher-value data is configured with a higher protection level.
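The stripe-unit arithmetic for the 768 KB example can be verified with a short sketch in plain Python (illustrative only; OneFS performs this layout internally):

import math

STRIPE_UNIT_KB = 128              # OneFS data stripe unit size
MAX_DATA_UNITS_PER_STRIPE = 16    # maximum stripe width is 16 x 128 KB = 2 MB

def small_file_stripe(file_size_kb, fec_units_per_stripe):
    # Count the data stripe units a small file needs and the resulting
    # protection overhead for one stripe.
    data_units = math.ceil(file_size_kb / STRIPE_UNIT_KB)
    assert data_units <= MAX_DATA_UNITS_PER_STRIPE, "file spans multiple stripes"
    total_units = data_units + fec_units_per_stripe
    overhead = fec_units_per_stripe / total_units
    return data_units, total_units, overhead

# 768 KB file protected at +2d:1n (two FEC stripe units per stripe)
data, total, overhead = small_file_stripe(768, 2)
print(f"{data} data stripe units + 2 FEC units = {total} units, "
      f"{overhead:.0%} protection overhead")
# Output: 6 data stripe units + 2 FEC units = 8 units, 25% protection overhead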

Neighborhood Review: Node pools are made up from groups of like-type nodes.
Gen 6 node pools are divided into neighborhoods for smaller, more resilient fault
domains. A Gen 6 node pool splits into two neighborhoods when adding the 20th
node. One node from each node pair moves into a separate neighborhood. In this
example, the cluster goes from three disk pools to six. In the figure, each color
represents a disk pool. After the 20th node is added and up to the 39th node, no two disks in a given drive sled slot of a node pair share a neighborhood. The neighborhoods split again when the node pool reaches 40 nodes. At 40 nodes, each node within a chassis belongs to a separate neighborhood. The next neighborhood division happens when the 80th node is added, and then again when the 120th node is added. Given a protection of +2d:1n, the loss of a single chassis does not result in a data unavailable or a data loss scenario. In a Gen 6.5 node pool, neighborhood splitting and suggested protection policies are identical to Gen 5.

Gen 6 Chassis Failure: A loss of a node does not automatically start reprotecting
data. Many times a node loss is temporary, such as a reboot. If N+1 data protection
is configured on a cluster, and one node fails, the data is accessible from every
other node in the cluster. If the node comes back online, the node rejoins the
cluster automatically without requiring a rebuild. If the node is physically removed, it
must also be SmartFailed. Only SmartFail a node when it needs to be removed from the


cluster permanently. Once the 40th node is added, the cluster splits into four
neighborhoods, labeled NH 1 through NH 4. The splits place each node in a
chassis into a failure domain different from the other three nodes in the chassis.
Also, every disk in a node is in a separate disk pool from the other node disks.
Having each node in a distinct neighborhood from other nodes in the chassis
allows for chassis failure. SmartFailing a failed node rebuilds the data on the free space of the cluster. Adding a node distributes the data to the new node.

Quorum: For a quorum, more than half the nodes must be available over the internal network. A six-node Gen 6 cluster requires a four-node quorum. Imagine a cluster as a voting parliament where the simple majority wins all votes. If 50% or more of the members are missing, there can be no vote. Without quorum, reads can occur, but no new information is written to the cluster. The OneFS file system becomes read-only when quorum is lost. The quorum also dictates the minimum number of nodes required to support a given data protection level. As seen in the earlier table, each protection level requires a minimum number of nodes. For example, +2n needs a minimum of six Gen 6 nodes. Why? You can lose two nodes and still have four Gen 6 nodes up and running; greater than 50%. Quorum considerations also apply to maintaining redundancy of power sources and networks. In a SyncIQ solution, if snapshots are used on a target cluster, consider the snapshot schedule and maintenance over time. Define a snapshot expiration period on the target; otherwise, the replicated dataset can consume more capacity than intended.

Back-End Network Resilience

A PowerScale cluster separates back-end and front-end network connectivity. The back-end interconnect provides high throughput and low latency for internal communication.


The graphic shows the int-a (active) network and the int-b (passive failover) network on the back end. Migrating from InfiniBand to Ethernet is not supported.

Back-end reliability is important in creating a true scale-out storage system. Clusters consisting of a mix of Gen 5 and Gen 6 nodes use an InfiniBand back-end network. An all-Gen 6 cluster uses a 40 Gb/s Ethernet back-end network, and Gen 6x nodes use a 40 Gb/s or 100 Gb/s Ethernet back-end network. Nodes coordinate work among themselves when writing or reading data. A single front-end operation can generate multiple messages on the back end. Implement dual back-end switches for increased resiliency. If the int-a switch fails, internal communication and data traffic continue on the int-b failover switch, providing the cluster with continued back-end communications.

Sizing for Maximum Resilience

Shown in the graphic are the areas to size a cluster for maximum resilience.


The graphic calls out file protection levels, node protection levels, and forward error correction. Example: +3d:1n1d protection with 3 FEC units and 2 drives per stripe unit. The left image shows the rear view of a Gen 6 node and the right image shows the front view of a Gen 6 node.

Decision point: How many nodes will give the best storage
efficiency?

The protection level is tunable and variable down to the file level. With Gen 6x, for
better resiliency and efficiency, using +2d:1n, +3d:1n1d, or +4d:2n is
recommended. Let us illustrate resiliency and efficiency with an example. Suppose the
workflow requires +3d:1n1d protection, which is recommended for large-capacity
drives. What is the most efficient cluster size that meets the requirement? The
cluster must be large enough to tolerate the loss of three disks and one node. The
graphic shows 15 stripe units (SU) and 3 FEC units for the maximum stripe width
for +3d:1n1d protection. A smaller cluster size does not use a stripe width of 15
data stripe units. 9 Gen 5 nodes or 10 Gen 6 nodes can meet the protection
requirements of the workflow at the maximum efficiency. In the example, if three
disks fail, the data can be rebuilt. If one node fails and one disk fails, the data can
be rebuilt.
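As a rough worked check of the efficiency in this example: each full-width stripe
contains 15 data stripe units plus 3 FEC units, or 18 units in total, so the FEC
overhead is 3/18 (about 17 percent) and the storage efficiency is 15/18 (about 83
percent). A smaller node pool that cannot reach the 15-unit stripe width writes
narrower stripes and therefore carries proportionally more FEC overhead.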

High Availability

The availability and protection of data can be usefully illustrated in terms of a


continuum. At the beginning of the continuum sits high availability.


High availability and resilience are integral to OneFS from the lowest level on up.
For example, OneFS provides mirrored volumes for the root and /var file systems
using the Mirrored Device Driver (IMDD), stored on flash drives. OneFS also
automatically saves last known good boot partitions for further resilience.
SmartConnect software contributes to data availability by supporting dynamic NFS
failover and failback for Linux and UNIX clients and SMB3 continuous availability
for Windows clients. This ensures that when a node failure occurs, or preventative
maintenance is performed, all in-flight reads and writes are handed off to another
node in the cluster to finish the operation without any user or application
interruption.

Safety Margins

Safety margins are not wasted capacity, but a good use of capacity, and they are
important.


Used Space  Action

80%34  Begin planning to purchase more storage

85%35  Receive delivery of new storage

90%36  Install the new storage - slower performance and possible workflow interruptions

95%37  Possibility of data unavailability if capacity is not increased

98%38  Data can become unavailable and file operations fail to execute.

34 Best practice is less than 80% storage capacity used.

35Consider growth rate, protection levels, and how long it takes to order and install
nodes.

36 At 90 percent full, cluster performance can noticeably slow down, high
transaction workflows can suffer interruptions and timeouts, and write speeds for
critical operations can be affected. When drives become full, seek times become
longer, and writes naturally take longer to perform. This can impact other cluster
operations including write cache queues. Leave overhead - creating a file pool
policy can flood the node pool. Dell Technologies recommends 90% as the maximum
capacity utilization. Maintain 10% free space in each disk pool.

37 Compensate for hardware failures by leaving headroom.


Decision point: When should I add storage?


Link: Best practices guide for maintaining enough free space on
PowerScale Clusters and Pools
Dell EMC PowerScale OneFS Best Practices

Resiliency means ensuring that the cluster can continue working even if nodes and
disks malfunction. The key is leaving headroom in the cluster. Data protection can
manage the loss of a node, but reprotecting data requires the space to rebuild. The
loss of a disk or node can push the capacity of the cluster below the demands placed
on it. Examine data delivery variance at 80% and again at 85% capacity utilization.
High CPU levels, pushing the maximum connection counts per node, maximizing
drive IOPS, or creating snapshots as fast as they are deleted, leaves no headroom.
When the space availability of disk pools falls below the low space threshold, the
job engine engages low space mode. New jobs are not started in low space
mode, and jobs that are not space-saving are paused. Once free space returns
above the low-space threshold, jobs that have been paused for space are
resumed.

38 At 98 percent full, more issue symptoms can be experienced. The cluster or


node pool performance is slower. Workflow operations can be disruptive, and file
operations fail to execute. You may have an inability to write or delete data from the
cluster. This can also cause an inability to make configuration changes, or run
commands that are used to free up space on the cluster. Data can become
unavailable, and has an increased potential for data loss. Client operations can fail,
or clients may be unable to authenticate. Clients may not be able to connect to the
cluster or navigate data if connected.


Access Cluster

Before assessing the cluster for a failure, ensure that the failure is within the
cluster.

The graphic shows a simple topology for accessing cluster data, with three numbered
callouts described below.

1: PowerScale accounts for 1/3rd of the solution. Knowing where the breakdown
occurs helps to isolate the failure. If the issue is unique to a single user, then it is
likely a client problem. The customer owns and maintains the client and network
points of the topology, making 2/3rds of the workflow the customer responsibility.

2: If the clients can only access resources outside of the data center, then the
problem may be the network. If a user is denied access to a file share on the cluster,
the problem is likely in the authentication process. If the user cannot open a file,
then the issue could be permissions or ownership on the file.

3: If the user cannot save data to the file share, enforced quota limits may be
reached. Because of PowerScale’s resiliency, disk and node failures may have little
impact to client access. Conversely, the impact can become severe if the cluster
runs near full storage capacity, or other resources such as CPU and RAM are near
peak levels. In severe cases such as storage capacity near maximum and
encountering multiple disk failures, the failure can be near catastrophic.

Important: Failing to respond to, react to, or repair a hardware failure
could develop into a catastrophe.


Check Jobs

Check jobs from the web administration interface on the Cluster Management, Job
Operations page shown.

1: The Job Summary page shows FlexProtect running. The elapsed time and the
phase can be used as a rough indication of the total time the job may take. A job
has up to 6 phases depending on the job. Some phases are skipped and go
straight to phase 6 without going to 4 and 5.

2: The Job Types tab lists all the jobs and allows the administrator to manually start a
job.

3: Job Reports provides a history of the jobs completed.

4: Job Events details each phase of the job.

5: The Impact Policies tab allows administrators to add or modify the impact
policies.

The job information is also provided using the isi job command.
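A minimal sketch of the equivalent checks from the CLI (command forms as found in
OneFS 8.x; verify the options against your release):

• List running, paused, and waiting jobs:

# isi job jobs list

• Show per-phase job events, similar to the Job Events view:

# isi job events list

• Show the history of completed jobs, similar to Job Reports:

# isi job reports list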

Track Activity - DataIQ

DataIQ monitors cluster health independent of the cluster status. DataIQ monitors
multiple clusters with massive node counts. Alerts based on limits and issues can
also be configured and received.

The graphic shows an excerpt of the WebUI page with error logs generated for
plug-ins and the DataIQ system.


The DataIQ server scans the managed storage, saves the results in an index, and
provides access to the index. DataIQ is an optimized data storage scan, index, and
in-memory search database platform that provides visibility to data spanning
multiple platforms.

Track Activity - InsightIQ

InsightIQ is an excellent tool for monitoring and identifying trending information.


Shown in the graphic is a state where all the user capacity is consumed.


Use InsightIQ to generate periodic reports. Detailing cluster events, plotting job
execution, and breaking out job workers are some of the many functions of
InsightIQ.

Under-Protected

Operating underprotected or disabling virtual hot spare, or VHS, can gain disk
space for users, but risks filling the cluster to 100% full.

The graphic compares a node pool without enough VHS reserve, where all free space
is used for data rebuilds and little free space remains, with a node pool that has a VHS
reserve keeping capacity unused. Reserved space characteristics to consider: 1-4
virtual drives per node pool, or 0% to 20% of total capacity.

By default, all available free space on a cluster is used to rebuild data. Without
enough free space, jobs can stop, and a failed disk cannot SmartFail out of the
cluster. VHS allocation reserves space for data rebuilds when a drive fails. VHS
helps assure that rebuild space is always available and protects data integrity if the
cluster space is overused. With VHS enabled, writes may be stopped even though
space appears to be available, because the VHS reserve is not deducted from
capacity views such as the isi status output. For example, setting two virtual
drives or 3% of total storage causes each node pool to reserve virtual drive space.
The space is equivalent to two drives or 3% of the total capacity for virtual hot
spare, whichever is larger. Free-space calculations exclude the space that is
reserved for the virtual hot spare. The reserved virtual hot spare free space is used
for write operations unless you select the option to deny new data writes.
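A minimal sketch of setting the VHS reserve from the CLI; the option names shown
are assumptions based on the OneFS 8.x isi storagepool settings command and
should be verified against your release:

• Reserve the larger of two virtual drives or 3% of total capacity per node pool, and
deny new writes into the reserved space:

# isi storagepool settings modify --virtual-hot-spare-limit-drives 2 --virtual-hot-spare-limit-percent 3 --virtual-hot-spare-deny-writes true

• Confirm the current settings:

# isi storagepool settings view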


Data Recovery

• Can data be recovered from a remote site?
• Can data be recovered from a snapshot?
• Can data be recovered from tape?

Decision Point: How will I recover lost data? Snaps? SyncIQ? Tape?

If the assessment of damage includes data loss, how is the lost data recovered?
Snapshots are a great way to recover files and directories, but if the loss includes
losing the needed snapshots, restoring from tape might be the solution. If the loss
is extensive or the restore time from tape fails the RTO requirement, failing back
from a target cluster may be a good option. Depending on the amount of data to
recover, restoring data over the WAN can be time intensive and exceed the RTO.


Challenge

Lab Assignment:
1) Explore the data protection levels and settings applied at different
levels.
2) Verify PowerScale data resilience when a node is down or smartfailed.


OneFS Domains

OneFS Domains

The graphic shows the SyncIQ domain /ifs/div-gen/engineering/projects.

A OneFS domain is a directory on which, when actions are performed, the actions
apply only to that root directory and its contents. SnapRevert, SmartLock, SyncIQ,
and Snapshot are the four types of OneFS domains.

Click for the SyncIQ domain example description.39

39 The example shows the SyncIQ domain /ifs/div-gen/engineering/projects. The
SyncIQ policy is configured for the /ifs/div-gen/engineering/projects directory. The
images directory and all the files and directories below it are part of the SyncIQ
domain, also called the SyncIQ root directory for the SyncIQ policy. When an action
such as a domainmark runs, OneFS applies the action only to that SyncIQ domain.


Source Content: For in-depth details, see
http://www.unstructureddatatips.com/.

OneFS Domains Overview

View the information in each of the tabs for an overview of the OneFS domains.



SyncIQ Domain

OneFS assigns SyncIQ domains to the source and target directories when creating
the replication policy.

Click for a SyncIQ domain example description.40

OneFS functions only act on data within the scope of the SyncIQ domain. For
example, when a domain mark is initiated, the domain mark only tags LINs in the
SyncIQ domain.

40 The graphic shows that a SyncIQ policy replicates the Boston cluster homedirs
directory to the Seattle cluster homedirs directory. OneFS automatically creates the
Seattle cluster homedirs SyncIQ domain when the SyncIQ policy first runs.


SmartLock Domain

OneFS assigns SmartLock domains to WORM directories to prevent modifying or


deleting committed files. A SmartLock domain is automatically created when
creating a SmartLock directory. You cannot manually create SmartLock domains
and you cannot delete a SmartLock domain. However, if you remove a SmartLock
directory, OneFS automatically deletes the associated SmartLock domain.

Click for the SmartLock domain description.41

41 The example shows a SmartLock domain assigned to the
/ifs/div-gen/engineering/projects directory. The three files in the SmartLock domain
are WORM committed and cannot be modified or moved. The scope of the SmartLock
domain applies only to the /ifs/div-gen/engineering/projects directory.


SnapshotIQ Domain

The graphic shows callouts for snapshot domain 6, snapshot snap ID 6, and snapshot
domain 5.

The SnapshotIQ domains enable governance of scheduled snapshots. Using
snapshot domains increases recurring snapshot efficiency and performance by
limiting the scope of governance to a smaller, well-defined domain boundary.

Click for the SnapshotIQ domain description.42

42The example shows a snapshot taken on the /ifs/div-gen/engineering directory


and another on the /ifs/div-gen/engineering/projects directory. OneFS assigns the
snapshot domain IDs 5 and 6. The snaps of the three files are marked with the
snapshot domain IDs, making it more efficient to determine governance. Creating
two domains of the same type on the same directory causes the second domain to
become an alias of the first domain. Aliases do not require marking since they
share the already existing marks. This benefits both snapshots and snapshot
schedules taken on the same directory, reducing the number of I/O and locking
operations needed to resolve snapshot governance.


SnapRevert Domain

The SnapRevert job restores a snapshot in full to its top level directory. OneFS
assigns SnapRevert domains to directories within snapshots to prevent
modification of files and directories when reverting a snapshot. OneFS does not
automatically create SnapRevert domains. Click on each item for more information:

• Prevents writes.43

43A SnapRevert domain uses a piece of file system metadata and associated
locking to prevent writes to files in the domain while restoring to the last known
good snapshot.


• Creating a SnapRevert domain.44


• Create domain early.45

OneFS Domains Administration

Create Domain using WebUI

You can create SyncIQ domains or SnapRevert domains to facilitate snapshot


revert and failover operations. You cannot create a SmartLock domain. OneFS
automatically creates a SmartLock domain when you create a SmartLock directory.

44 You cannot revert a snapshot until creating a SnapRevert domain on its top level
directory.

45 Since the SnapRevert domain is a metadata attribute, or marker, placed onto a


file or directory, a good practice is to create the domain before it has data. An
empty domain avoids the time needed for DomainMark or DomainTag to walk the
entire tree, setting that attribute on every file and directory within it.


1. Click Cluster Management > Job Operations > Job Types.


2. In the Job Types area, in the DomainMark row, from the Actions column,
select Start Job.
3. In the Domain root path field, type the path of the directory you want to create a
protection domain for.
4. From the Type of domain list, specify the type of domain you want to create.
5. Ensure that the Delete this domain check box is cleared.
6. Click Start job.

Delete Domain using WebUI

You can delete SyncIQ domains or SnapRevert domains if you want to move
directories out of the domain. You cannot delete a SmartLock domain. OneFS
automatically deletes a SmartLock domain when you delete a SmartLock directory.


1. Click Cluster Management > Job Operations > Job Types.


2. In the Job Types area, in the DomainMark row, from the Actions column,
select Start Job.
3. In the Domain root path field, type the path of the directory you want to delete
a protection domain for.
4. From the Type of domain list, specify the type of domain you want to delete.
5. Select Delete this domain.
6. Click Start job.

Create and Delete Domain using CLI

The examples show creating and deleting a SnapRevert Domain.


• Creating a SnapRevert domain:


# isi job jobs start domainmark --root /ifs/eng/sw --dm-type SnapRevert

• Deleting a SnapRevert domain:

# isi job jobs start domainmark --root /ifs/eng/sw --dm-type SnapRevert --delete
If you want to move directories out of the domain, you can delete a SyncIQ or
SnapRevert domain.

isi_pdm command

Use the isi_pdm utility to manage OneFS domains. The isi_pdm command has
the options to create, delete, exclude, include, show_exclusions,
showall, list, read, and write domains.

• list example:
# isi_pdm list /ifs/eng/sw All
[ 5.0100, 6.0100, 6.0300 ]
• showall example:

# isi_pdm showall
...
{
Base DomID = 6.0000
Owner LIN = 1:0033:00be
Ref count = 3
Ready = true
Nested tag count = 0
Rename tag count = 0
Modification time = 2018-11-13T06:34:31-0500
DomIDs = [ 6.0100, 6.0300 ]
}

isi get command

Administrators can view the IFS domain IDs by using the isi get command. The
output is verbose so the example uses grep to show only the IFS Domain IDs


field. The first line of output is for the ./ directory (/ifs/eng/sw). The second line is
for the ../ directory (/ifs/eng), and the third is for a file in the /ifs/eng/sw directory.

• Example:
# isi get -g /ifs/eng/sw | grep "IFS Domain"
* IFS Domain IDs: {5.0100(Snapshot), 6.0100(Snapshot),
6.0300(WORM) }
* IFS Domain IDs: {5.0100(Snapshot) }
* IFS Domain IDs: {5.0100(Snapshot), 6.0100(Snapshot),
6.0300(WORM) }
• The output shows a nested snapshot domain and a SmartLock domain.

OneFS Domain Considerations

The list shows areas to consider when working with OneFS domains. Click each list
item for details.
• Copying many files into a OneFS domain can be a lengthy process. OneFS
must mark each file individually as belonging to the protection domain.
• The best practice is to create protection domains for directories while the
directories are empty, and then add files to the directory.
• The isi sync policies create command contains an --accelerated-failback true
option, which automatically marks the domain. The option can decrease failback
times (see the example after this list).
• If using SyncIQ to create a replication policy for a SmartLock compliance
directory, you must configure the SyncIQ and SmartLock compliance domains
at the same root directory level. A SmartLock compliance domain cannot be
nested inside a SyncIQ domain.
• If a domain prevents the modification or deletion of a file, you cannot create a
OneFS domain on a directory that contains that file. For example, if
/ifs/data/smartlock/file.txt is set to a WORM state by a SmartLock
domain, you cannot create a SnapRevert domain for /ifs/data/.
• You cannot move directories in or out of protection domains. However, you can
move a directory to another location within the same protection domain.
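As a minimal sketch of the --accelerated-failback option called out in the list above,
where the policy name, paths, and target host are hypothetical:

• Create a sync policy that marks the SyncIQ domain up front to decrease failback times:

# isi sync policies create eng-sw sync /ifs/eng/sw target-cluster.example.com /ifs/eng/sw --accelerated-failback true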


Troubleshooting PowerScale Cluster

Troubleshooting Guides

Locating Guides

You can access troubleshooting guides from the OneFS Customer Troubleshooting
Guides Information Hub. Use PowerScale Customer Troubleshooting Guides as a
search term to locate all available guides on the Dell EMC support page.


Using Guides

• Customer accessible troubleshooting guides46


• Technical support47

46 Use the troubleshooting guides when troubleshooting specific issues. Here you can
find step-by-step instructions to help you troubleshoot some common issues that
affect OneFS clusters. Some of these troubleshooting guides reference other
troubleshooting guides or refer to other Dell EMC PowerScale documents, such as
knowledge base articles or white papers.

47More troubleshooting guides are available for Technical Support only. These
guides involve specific process issues, or issues of a less common occurrence, or
may involve risky operations only to be performed under support supervision.


Guides Availability

New guides are added as they become available or as new topics are recognized with
new versions of OneFS and new PowerScale hardware. Visit the Info Hub page
and the support page periodically to check for new guides or existing guide
updates. The guides on the Info Hub page are grouped into related topic areas.
Under each topic area, links are listed for each available troubleshooting guide.

Guides by Topic Area

Each guide is available for download as a PDF file. Guides vary in length and topic
focus. A single guide can contain any number of investigation topic areas. Take a
few moments to review the guides available for each topic area.


Links: OneFS Troubleshooting Guides, Info Hub, Dell EMC Support


Page.

Troubleshooting Process

The troubleshooting process follows a standard methodology that is applied in each


of the guides. Start with the foundation, and build up by layer. Understanding the
approach to this methodology allows administrators to apply the same principles
when troubleshooting other issues.

1: Client

When reaching the top layer of clients, most likely the issue is external to the
cluster, or a client configuration issue. To troubleshoot client connectivity issues, use
the Clients Cannot Connect To A Node guide.

2: Protocols: Next are the protocols, such as SMB, NFS, HDFS, and S3, and the
associated processes.

3: File System


The file system is the third layer. This includes many internal processes and
routines. To troubleshoot file system issues, use the Troubleshoot An
Unresponsive Node guide.

4: Hardware

The hardware is the next layer. Hardware issues are often masked in other
symptoms. To troubleshoot hardware issues, use the PowerScale Troubleshooting
Guide: Hardware - Top Level guide.

5: Network

Start with the network layer as the foundation. Many PowerScale issues are a
result of network issues, both the internal back-end network, and the client-side
front-end. This includes reaching external resources such as DNS servers48, and
internal issues with SmartConnect. Troubleshooting Your SmartConnect
Configuration guide can be used to perform SmartConnect troubleshooting.

48 DNS, or Domain Name System, resolves hostnames to IP addresses.
Troubleshooting DNS is performed with the utilities nslookup or dig.
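For example, a quick check of SmartConnect name resolution from a client or DNS
server (the zone name below is hypothetical):

• Query the SmartConnect zone name several times; the SSIP should answer, and with
a round-robin connection policy, repeated queries should return different node IP
addresses:

# dig +short data.cluster.example.com
# nslookup data.cluster.example.com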


Troubleshooting Steps

Within each guide, there is a Content and Overview page or pages depending on
the length of the guide. It acts as a high-level link to separate flowcharts, and quick
access to associated appendixes.

The page numbers and the name of the guides are positioned the same on every
page for quick reference.

Using Guide Flowchart

Using the guide flowcharts is straightforward if you are familiar with flowcharts.


1: Caution box warns that a particular step must be performed with great care, to
prevent serious consequences. Caution notes must be followed. The symbol is on
the explanation note and the step it is associated with.

2: Each page begins with the page number or a start location. From there, follow
the arrows. Any page-related notes should be read and followed. Directional
arrow indicates the path through the process flow.

3: Follow the decision trees. These may be yes or no decisions, or may contain
multiple-choice questions and answers to find the appropriate direction to follow.

4: If the process continues, a Go to Page # box is reached. Click the Page #


hyperlink to go to the continuation page. Any supporting documentation for the
process is indicated using the flowchart documents symbol.

Document Shape: Calls out supporting documentation for a process step. When
possible, these shapes contain links to the reference document. Sometimes linked
to a process step with a colored dot.

5: Every guide applies the troubleshooting guide process to the start page. Each
flowchart starts with a note respective to the troubleshooting guide. The process
begins at the Start symbol and requires a step to log in to the cluster. To continue
the process, select an appropriate Go to Page #.


For example, the PowerScale Customer Troubleshooting Guide: Troubleshooting


Capacity Alerts on Node Operating System Partitions.


Troubleshoot Hardware Events

Hardware Status Significance

Hardware issues can be the root cause of many issues that appear to be
something else.

• Performance Degradation49
• Intermittent Issues
• Unplanned Reboots50
• Job Pauses51

49 The hardware status directly affects the performance.

50 Unplanned node reboots can often be attributed to boot drive partition issues.


• Intelligent Platform Management Interface52 monitors the hardware


components.
• Maintaining hardware firmware by updating to the most recent approved version
is an important maintenance component.

51 Job pauses are often experienced while a failing drive is SmartFailed from the
cluster.

52The CELOG collects and logs the events that are detected. The log files are
gathered using the isi diagnostics gather command using the CLI or through the
web administration. Many hardware-specific troubleshooting guides and knowledge
base (KB) articles are available to assist with troubleshooting issues.


Hardware Status Indicators

Hardware status indicators take several different forms such as: Cluster status,
individual node status, drive status, the percentage full on boot drive partitions and
firmware mismatch.


Individually Monitored Components53

Methods to verify Hardware Status

Two primary methods are available to check the hardware status: CLI commands
and WebUI pages. The CLI command options that are displayed are also
represented using the web administration interface.

• Cluster status summary - isi status


• Node status - isi status -n <LNN>
• Drive status and information - isi devices drive
• Drive firmware check - isi devices drive firmware
• Node firmware check:
− OneFS 8.0 or later - isi upgrade cluster firmware devices
• Hardware component status - isi_hw_status

53Individually monitored components such as drives, controllers, power supplies,


network interface cards (NICs), fans, and more are monitored and reported on. The
OneFS Event Reference guide provides help in troubleshooting individual alerts.


Node Status Checks

isi status

The isi status command provides an overview of the major areas to
concentrate on when troubleshooting.


Checking Node Status Using CLI

The isi status command with the --node or -n option allows greater detail for
individual node status to be displayed.


Status Check Using WebUI

Dashboard -> Cluster Overview -> Cluster Status.

Like the isi status command, the Cluster Status provides similar information.

Checking Node Status in WebUI

Specific hardware status information is available in the web administration interface


under Hardware Configuration. Gen 6x hardware properties are displayed in
OneFS 8.2 and newer. The chassis information, the paired node, and the location
within the chassis are displayed.


Following are the node status properties:

• Manage Nodes54
• Health Status55
• View Details
• SmartFail State56

54The LNN, health status and OneFS version are displayed along with the node
configuration and chassis model number.

55 The service light status for a healthy node should be off.

56 The status for the node being Smartfailed, if the node is down, and if the node is
in the cluster are displayed for quick issue identification.


isi_hw_status


The isi_hw_status command57 provides a point-in-time view of each of the


hardware components. You can use the isi_for_array command for cluster-
wide output. Several command options exist to limit output to a specific type of
component. Technical Support may have you run this command to troubleshoot an
individual component.
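As a minimal sketch of the cluster-wide form mentioned above:

• Run isi_hw_status on every node, with the output sorted by node:

# isi_for_array -s isi_hw_status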

isi status: You get a high-level view of the cluster health, cluster storage capacity
and usage, individual node health, node storage capacity and usage, and node
throughput. In addition, critical events are listed, the status of any running, paused,
or failed jobs, and a list of most recently run jobs and the status.

Status Check Using WebUI: You can click an individual node to display
information for a specific node, node model, serial number, node size and capacity
usage, throughput, CPU utilization, and network connection information. The root
cause of a node with an attention status can be found using the basic commands.
Often a log file analysis is required to identify the underlying issue.

57 The command only views information from the node you are logged into.


Drive Status Check

Check Drive Status Using CLI

Drives fail as a regular part of ongoing cluster operations. Out-of-date drive
firmware can also be a significant cause of single-drive performance issues. The
isi devices drive list command provides information about drive health
state, serial numbers, bay location, device ID, Lnum, and the node number it is


associated with. You can also specify a specific node or all nodes using the
--node-lnn <LNN | all> option.

Drive Firmware 58

58Beginning with OneFS 8.0, drive firmware can be updated without cluster or
node-wide disruption. Use the isi devices drive firmware list command to view
current drive firmware versions and drive models by node, by drive sled for Gen 6
hardware, or for all nodes in the cluster.


Check Drive Status in WebUI

For additional details on a specific drive, click View Details. Some
additional management options are presented by clicking More.

The drive information is available using the web administration interface. The
information is similar to the isi devices drive CLI command.

You can SmartFail a drive or easily update the drive firmware for a single drive.


Node Firmware Check

Check Node Firmware Using CLI

The firmware version lists the node components.


The associated nodes for each firmware version are listed. An asterisk is used to
indicate any mismatched firmware.

Out of date or mismatched node firmware can often be the root of intermittent
issues. You can specify a specific node or all nodes in the cluster.

• isi upgrade cluster firmware devices command59


• isi firmware status command60

59 To display the node firmware from the CLI, use this command for clusters running
OneFS 8.0 or later.


Mismatched firmware61

Check Node Firmware in WebUI

60 Use this command for clusters with OneFS versions prior to 8.0.

61 Mismatched firmware indicates the firmware version that is installed on the


cluster does not match the component firmware. This could be caused when a
component is replaced and the firmware is either newer or older than the firmware
installed on the cluster.


The same node firmware information available using the CLI is also available using
the web administration interface. All firmware for the node hardware is displayed. A
column indicating mismatched firmware is added for quick management.

Check Battery Status

You can monitor the status of NVRAM batteries and charging systems. This task
can only be performed using the OneFS CLI and on node hardware that supports
the command.

• Open an SSH connection to any node in the cluster.


• Run the isi batterystatus list command to view the status of all
NVRAM batteries and charging systems on the node. The system displays
output similar to as shown in the graphic.

Note: The command is used to monitor the status of Gen 5
hardware; Gen 6 nodes do not support NVRAM batteries.

Chassis and Drive States

In a cluster, the combination of nodes in different degraded states determines


whether read requests, write requests, or both work. A cluster can lose write
quorum but keep read quorum. OneFS provides details about the status of chassis
and drives in your cluster. The following table describes all the possible states that
you may encounter in your cluster.


State: HEALTHY
Description: All drives in the node are functioning correctly.
Interface: CLI and WebUI
Error state: No

State: L3
Description: A solid-state drive (SSD) was deployed as level 3 (L3) cache to increase
the size of cache memory and improve throughput speeds.
Interface: CLI
Error state: No

State: SMARTFAIL or SmartFail or restripe in progress
Description: The drive is in the process of being removed safely from the file system,
either because of an I/O error or by user request. Nodes or drives in a smartfail or
read-only state affect only write quorum.
Interface: CLI and WebUI
Error state: No

State: NOT AVAILABLE
Description: A drive is unavailable for various reasons. You can click the bay to view
detailed information about this condition. NOTE: In the web administration interface,
this state includes the ERASE and SED_ERROR command-line interface states.
Interface: CLI and WebUI
Error state: Yes

State: SUSPENDED
Description: This state indicates that drive activity is temporarily suspended and the
drive is not in use. The state is manually initiated and does not occur during normal
cluster activity.
Interface: CLI and WebUI
Error state: No

State: NOT IN USE
Description: A node in an offline state affects both read and write quorum.
Interface: CLI and WebUI
Error state: No

State: REPLACE
Description: The drive was smartfailed successfully and is ready to be replaced.
Interface: CLI only
Error state: No

State: STALLED
Description: The drive is stalled and undergoing stall evaluation. Stall evaluation is
the process of checking drives that are slow or having other issues. Depending on
the outcome of the evaluation, the drive may return to service or be smartfailed.
This is a transient state.
Interface: CLI only
Error state: No

State: NEW
Description: The drive is new and blank. This is the state that a drive is in when you
run the isi dev command with the -a add option.
Interface: CLI only
Error state: No

State: USED
Description: The drive was added and contained a PowerScale GUID, but the drive
is not from this node. This drive likely will be formatted into the cluster.
Interface: CLI only
Error state: No

State: PREPARING
Description: The drive is undergoing a format operation. The drive state changes to
HEALTHY when the format is successful.
Interface: CLI only
Error state: No

State: EMPTY
Description: No drive is in this bay.
Interface: CLI only
Error state: No

State: WRONG_TYPE
Description: The drive type is wrong for this node. For example, a non-SED drive in a
SED node, or SAS instead of the expected SATA drive type.
Interface: CLI only
Error state: No

State: BOOT_DRIVE
Description: Unique to the A100 drive, which has boot drives in its bays.
Interface: CLI only
Error state: No

State: SED_ERROR
Description: The drive cannot be acknowledged by the OneFS system. NOTE: In the
web administration interface, this state is included in Not available.
Interface: CLI and WebUI
Error state: Yes

State: ERASE
Description: The drive is ready for removal but needs your attention because the data
has not been erased. You can erase the drive manually to guarantee that data is
removed.
Interface: CLI only
Error state: No

State: INSECURE
Description: Data on the self-encrypted drive is accessible by unauthorized personnel.
Self-encrypting drives should never be used for non-encrypted data purposes.
NOTE: In the web administration interface, this state is labeled Unencrypted SED.
Interface: CLI only
Error state: Yes

State: UNENCRYPTED
Description: Data on the self-encrypted drive is accessible by unauthorized personnel.
Self-encrypting drives should never be used for non-encrypted data purposes.
NOTE: In the command-line interface, this state is labeled INSECURE.
Interface: WebUI only
Error state: Yes

Storage Capacity Issues

Avoid Storage Capacity Issues

Running out of available storage capacity can cause severe issues with cluster and
individual node pool behavior and performance.

• Best Practice62

62 The best practice for the amount of free space to maintain can vary based on the
size of the cluster or node pool. The best practice is to pay attention when a node
pool reaches 80 percent full. Doing so allows adequate time for additional space
provisioning and installation and helps to ensure that adequate space remains available.


• Small cluster63
• Larger node pools64
• Nodes are in a provisioned state65
• Join fails66
• Monitoring available capacity on a regular basis67


The use of SmartQuotas can help prevent out-of-control users or applications from
consuming large amounts of space without intervention. You should run the
FSAnalyze job on a schedule, which populates InsightIQ for analysis. The ingest
rates and predictive analysis can assist in capacity consumption estimations.

63 A small cluster/node pool is defined as approximately 3 to 7 nodes.

64 The minimum space for larger node pools is 10 percent.

65Whenever nodes are added to the cluster, you should verify that all nodes are in
a provisioned state, and all nodes are added successfully to a node pool.

66 It is possible that an attempt to add nodes fails and the nodes remain in an
unprovisioned state.

67 Monitoring available capacity regularly is highly recommended. Configure


capacity alerts on the cluster. The first alert is sent when a node pool reaches 95
percent full. Space allocated as VHS is not considered usable space by the cluster
and is removed from free space calculations. A critical alert is sent again at 99
percent full. This is a critical alert and action must be taken to mitigate potential
issues.


Node Size  Recommended Space

• Small node pools of 3–7 nodes: 20 percent
• Medium node pools of 8–15 nodes: 15 percent
• Larger node pools of more than 16 nodes: 10 percent

Error Messages Indicating Capacity Issues

The table shows the error messages that you may get when attempting to write to a
full or nearly full cluster or node pool.

Error: The operation cannot be completed because the disk "<share name>" is full.
Where the error message appears: OneFS web administration interface, or the
command-line interface on a Mac with an NFS mount.

Error: No space left on device.
Where the error message appears: OneFS web administration interface, or the
command-line interface on a Mac client.

Error: No space available.
Where the error message appears: OneFS web administration interface, or the
command-line interface of a Windows client.

Error: ENOSPC (error code)
Where the error message appears: On the cluster in /var/log/messages. This error
code is embedded in another message.

Error: Failed to satisfy layout preference.
Where the error message appears: On the cluster in /var/log/messages.

Error: Disk Quota Exceeded.
Where the error message appears: Cluster command-line interface, or an NFS client
when you encounter a Snapshot Reserve limitation.

Primary Capacity Resolution Step

Steps to resolve capacity full issues:


To resolve capacity full issues, please follow the steps that are outlined in the
PowerScale customer troubleshooting guide, Troubleshoot a Full Pool or Cluster.
Several steps are available before impacting existing data on the cluster.

• Add any space to the node pool.68


• Enable spillover69
• Temporarily disable VHS70

Other corrective actions:

Other corrective actions have impact on the cluster data. Some corrective actions
may not be appropriate based on the severity of the issues.

• Move data from the full node pool to a node pool with available capacity. 71
• Attempt to deduplicate data72

68These could be unprovisioned nodes requiring node compatibility enablement, or


new nodes. It may require adding a new node pool and adding the node pool to an
existing tier.

69Enable spillover on a full node pool if it has not been enabled to allow writes to
continue on the spillover target node pool.

70Temporarily disable VHS to allow the reserved space to be used by the file
system, while other corrective actions can be taken. Reenable VHS when the
current issue is resolved. VHS provides important required data space safety
margin for the required space when rebuilding data from failed drives, and data
restriping.

71Create and run a SmartPools file pool policy to move data and manually run the
SmartPools job. This does require a SmartPools license. If necessary contact
Technical Support for a temporary SmartPools license key.


• Delete shadow stores 73


• Delete snapshots74
• Delete data manually 75

Primary Capacity Resolution Step Continued

Snapshot deletion can be the most significant action to clear up space on a full
cluster.

• Snapshots remain on the cluster76


• SnapshotDelete job77
• FlexProtect and FlexProtectLin jobs78

72 The SmartDedupe job requires a long run time to sample and deduplicate data.
This may be impractical based on the urgency required. The job may also have
limited benefit based on the type of data stored.

73 Many shadow stores are stale, containing data that has been modified or
deleted.

74 Snapshots retain the data blocks they protect. Depending upon the age and
quantity of the snapshots, a considerable amount of data may be retained on the
cluster.

75 Move the data to an approved local system or to another reachable PowerScale
cluster. To solve severe issues, data deletion may be the only possible action.

76 Any data blocks protected by a snapshot remain on the cluster until the snapshot
is removed either by expiration or manually deleted.

77 The data blocks are freed when the SnapshotDelete job runs.


• All other jobs are paused and queued79


• SnapshotIQ license80
• Delete snapshots in order81

To troubleshoot capacity-full issues, use the Troubleshoot A Full Pool Or Cluster
guide.

Tip: Download the Best Practices Guide for Maintaining Enough Free
Space on PowerScale Clusters and Pools. The guide provides
guidance to avoid cluster or node pool full situations, enable safety
measures to mitigate risk (though the white paper is old, it is still
valid).

78Only FlexProtect and FlexProtectLin jobs can run while a cluster is in a degraded
protection state.

79All other jobs are paused and queued until the cluster is returned to a fully
operational protection state.

80 A SnapshotIQ license is required to view and delete snapshots.

81The recommendation is to delete snapshots in order from oldest to newest.


Newer snapshots mostly contain pointers to older snapshots. The older snapshots
contain more links to data blocks that have been deleted.
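A minimal sketch of acting on this recommendation from the CLI (the snapshot name
is hypothetical, and a SnapshotIQ license is required):

• List snapshots so the oldest can be identified:

# isi snapshot snapshots list

• Delete the oldest snapshot; the blocks are freed when the SnapshotDelete job runs:

# isi snapshot snapshots delete weekly-2020-01-05

• Start the SnapshotDelete job manually if needed:

# isi job jobs start SnapshotDelete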


Challenge

Lab Assignment: Troubleshoot permission-related issues to correct
misconfigured permissions for different users.
1) Resolve permissions for a user who is denied access to a share that
the user should be allowed to access.
2) Correct permissions for a user to deny access to a directory.


Advanced Access


Module Objectives

After completion of this module, you can:

• Describe the networking architecture and AIMA Hierarchy.


• Describe access zones and administrative commands for access zones.
• Describe SmartConnect network hierarchy and configure Multi-SSIP.
• Discuss various authentication providers and their configuration.
• Discuss various OneFS protocols and their configuration.


Networking Architecture

Network and AIMA Hierarchy

Authentication, identity management, and authorization, or AIMA, ties into the


network hierarchy at different levels. The graphic shows how the AIMA hierarchy
ties into the network hierarchy.

1. The user connects to a SmartConnect zone name, which is tied to a subnet,


and SSIP.
2. The SmartConnect zone name is mapped to an access zone. The access zone
contains the authentication providers, directory services, user mapping, ID
mapping, and generates user tokens.
3. The access zone has a base directory where file permissions and user identities
on disk are applied.
4. Windows shares, NFS exports, and S3 buckets are created per access zone.

Link: See PowerScale Authentication and Identity Management


practical recommendations white paper for more information.


Internal Network Settings

Failover configuration involves enabling the int-b interface, specifying a valid


netmask, and adding IP address ranges for the int-b interface and the failover
network. Configuring the int-b and failover networks is typically done using the
Configuration Wizard at install. The int-b interface and failover network should be
Enabled.

If required, the CLI command to change or modify the int-b internal network
netmask is: netmask int-b 255.255.255.0. To add an IP range to the int-b
network, run the iprange int-b 192.168.206.21-192.168.206.30 command.
Run the interface int-b enable command to enable the int-b interface.

Graphic shows the Enabled int-b network.
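The internal network commands above are run from the isi config console; a minimal
sketch of the sequence, using the example values from this section:

# isi config
>>> interface int-b enable
>>> netmask int-b 255.255.255.0
>>> iprange int-b 192.168.206.21-192.168.206.30
>>> commit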


Note: There can be consequences of not planning for a redundant back-end
network; you might have to reboot the cluster to apply modifications to
internal network failover.

Getting Ready to Troubleshoot

Before attempting to debug the network environment around a PowerScale cluster,


understand the environment.

• Node types - expected performance characteristics
• OneFS version - licensed options
• Protocols: NFS / CIFS, Jumbo Frame Enabled, Mount Options used (NFS only), Connectivity
• SmartConnect configuration
• Interface speeds: Using the correct interface (DataIQ can help)

The whole network stack depends upon the foundation being healthy.

Network Troubleshooting

PowerScale clusters are complex, in terms of networking. This is inevitable,


because each node in a cluster is itself present (or at least potentially present) on
the network. Moreover, the cluster nodes interact with each other to create a
combined network environment. Start by developing a clear view of the cluster
configuration, both in terms of nodes and its network context. Try to establish what
the network element configurations are, and those of the clients and network
services.

Network Architecture Considerations

Designing a network is unique to the requirements of each enterprise data center.


There is not a one size fits all design and not a single good network design.


Network design is based on many concepts; the following are considerations and
principles to guide the process:

• Single Points of Failure82
• Application and Protocol Traffic83
• Available Bandwidth84
• Minimizing Latency85

82 Ensure the network design has layers of redundancy. Dependence on a single


device or link relates to a loss of resources or outages. The enterprise
requirements consider risk and budget, guiding the level of redundancy.
Redundancy should be implemented through backup paths and load sharing. If a
primary link fails, traffic uses a backup path. Load sharing creates two or more
paths to the same endpoint and shares the network load. When designing access
to PowerScale nodes, it is important to assume links and hardware will fail,
ensuring access to the nodes survives those failures.

83Understanding the application data flow from clients to the PowerScale cluster
across the network allows for resources to be allocated accordingly while
minimizing latency and hops along this flow.

84As traffic traverses the different layers of the network, the available bandwidth
should not be significantly different. Compare this available bandwidth with the
workflow requirements.

85 Ensuring latency is minimal from the client endpoints to the PowerScale nodes
maximizes performance and efficiency. Several steps can be taken to minimize
latency, but latency should be considered throughout network design.


• Prune VLANs86
• VLAN Hopping87

Advanced Concern: Buffer Sizes

One of the parameters that administrators tend to tweak on networks is network


buffer size88.

• Faster network: increase buffer size
• Higher latency: increase buffer size
• Bufferbloat: decrease buffer size

86It is important to limit VLANs to areas where they are applicable. Pruning
unneeded VLANs is also good practice. If unneeded VLANs are trunked further
down the network, this imposes additional strain on endpoints and switches.
Broadcasts are propagated across the VLAN and impact clients.

87 It is recommended to assign the native VLAN to an ID that is not in use.
Otherwise, tag the native VLAN to avoid VLAN hopping, which allows a device to
access a VLAN it normally would not have access to. Additionally, only allow trunk
ports between trusted devices and assign access VLANs on ports that are different
from the default VLAN.

88In general, a buffer large enough to accommodate a busy network traffic is


desired. This means that faster networks demand larger buffers, and networks with
higher latency demand larger buffers. Remember that buffers may have to
accommodate data while a transmission completes.


On the other hand, sufficiently large buffers can artificially delay dropping packets
on congested networks. This matters, because TCP depends upon dropped
packets to help signal when to reduce transmission rates. If the delay between
congestion taking effect and packets dropping is sufficiently long, then TCP
transmission rates can oscillate wildly between unrealistically high speeds, and
unnecessarily modest speeds. This is known as bufferbloat. Again, Wireshark is a
great tool for diagnosing bufferbloat. Capture the packets of a connection
suspected of bufferbloat, graph transmission, and drop and retransmission rates to
successfully diagnose bufferbloat.

Link Aggregation

Link aggregation provides methods to combine multiple Ethernet interfaces,


forming a single link layer interface, specific to a switch or server. Therefore, link
aggregation is implemented between a single switch and a PowerScale node, not
across PowerScale nodes.

• Implementing link aggregation is neither mandatory nor necessary89
• Link aggregation assumptions90
• Not a substitute for a higher bandwidth link91

89 It is based on workload requirements and is recommended if a transparent


failover or switch port redundancy is required.

90Per the IEEE specification, gigabit speed is available only in full-duplex and all
types of aggregation are only point to point. Link aggregation provides graceful
recovery from link failures. If a link fails, traffic is automatically sent to the next
available link without disruption.

91 Link aggregation combines multiple interfaces; applying it to multiply bandwidth
by the number of interfaces for a single session is incorrect. Link aggregation
distributes traffic across links. However, a single session only uses a single
physical link to ensure packets are delivered in order without duplication of frames.


• Protocols may or may not benefit from link aggregation92
• Multi-chassis link aggregation93

Link aggregation does not provide an increase in performance; it provides only high
availability in some scenarios. It is important to recognize that regarding bandwidth,


92 Stateful protocols, such as NFSv4 and SMBv2 benefit from link aggregation as a
failover mechanism. On the contrary, SMBv3 Multichannel automatically detects
multiple links, using each for maximum throughput and link resilience.

93Multiple switches are connected with an Inter-Switch link or other proprietary


cable and communicate via a proprietary protocol forming a virtual switch. A virtual
switch is perceived as a single switch to a PowerScale node, with links terminating
on a single switch. The ability to have link aggregation split with multiple chassis
provides network redundancy if a single chassis were to fail.


the concepts discussed for single switch Link Aggregation still apply to Multi-
Chassis Link Aggregation. Additionally, as the multiple switches form a single
virtual switch, it is important to understand what happens if the switch hosting the
control plane fails. Those effects vary by the vendor’s implementation but will
impact the network redundancy gained through Multi-Chassis Link Aggregation.

Configuring Link Aggregation

You can configure which network interfaces are assigned to an IP address pool. If
you add an aggregated interface to the pool, you cannot individually add any
interfaces that are part of the aggregated interface.

The CLI command isi network pools view <id> is used to display the
configuration details of a specific IP address pool on the cluster.

The isi network pools modify groupnet0.subnet0.sales
--aggregation-mode failover command modifies sales under
groupnet0.subnet0 to specify failover as the aggregation mode for all
aggregated interfaces in the pool.

--revert-aggregation-mode sets the value of --aggregation-mode to the
system default value.


The --aggregation-mode {roundrobin | failover | lacp | fec}


specifies how outgoing traffic is distributed across aggregated network interfaces.
The aggregation mode is applied only if at least one aggregated network interface
is a member of the IP address pool.

The following values are valid:

Values Description

roundrobin Rotates connections through the nodes in a


first-in, first-out sequence, handling all
processes without priority. Balances
outbound traffic across all active ports in
the aggregated link and accepts inbound
traffic on any port.


failover Switches to the next active interface when


the primary interface becomes unavailable.
Manages traffic only through a primary
interface. The second interface takes over
the work of the first as soon as it detects an
interruption in communication.

lacp Supports the IEEE 802.3ad Link


Aggregation Control Protocol (LACP).
Balances outgoing traffic across the
interfaces based on hashed protocol
header information that includes the source
and destination address and the VLAN tag,
if available. Also assembles interfaces of
the same speed into groups called Link
Aggregated Groups (LAGs) and balances
traffic across the fastest LAGs. This option
is the default mode for new pools.

fec Provides static balancing on aggregated


interfaces through the Cisco Fast
EtherChannel (FEC) driver, which is found
on older Cisco switches. Capable of load-
balancing traffic across Fast Ethernet links.
Enables multiple physical Fast Ethernet
links to combine into one logical channel.
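A minimal CLI sketch tying the commands in this section together; the pool name
groupnet0.subnet0.sales is the hypothetical example used earlier in this section:

# View the pool configuration, including the current aggregation mode:
isi network pools view groupnet0.subnet0.sales

# Set LACP (the default mode for new pools) for the pool's aggregated interfaces:
isi network pools modify groupnet0.subnet0.sales --aggregation-mode lacp

# Revert the pool to the system default aggregation mode:
isi network pools modify groupnet0.subnet0.sales --revert-aggregation-mode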

Flow Control

Flow control is the management of data flow between computers or devices or


between nodes in a network so that the data can be handled at an efficient pace.
Too much data arriving before a device can handle it causes data overflow,
meaning the data is either lost or must be retransmitted. This equates to
performance issue.


A CLI command can be used to see which interfaces in a cluster have received
pause frames; a value greater than 0 indicates that pause frames were received.

Pause frames are a signaling device for flow control. What they mean is roughly:
"Slow down! You are transmitting too much, too quickly!" The frames might be sent
or received by the cluster, but in either case they mean that there is a network
performance imbalance. The cluster might be flooding clients that cannot keep up
with it, or the cluster might be overloaded. In either case, network connections are
throttled until all participants can keep pace.

OneFS VLAN Configuration

Applying a VLAN trunk/tag to a subnet-connected port that does not require it
causes issues with network traffic94 to/from the PowerScale cluster. You can
partition the external network into Virtual Local Area Networks, or VLANs.

• The OneFS network stack was designed to comply with the IEEE 802.1Q standard.
• Maximum number of VLANs: 4094.95

Run the isi network subnets list command to identify the name of the
external subnet you want to modify for VLAN tagging. Then run the isi network
subnets modify <id> --vlan-enabled=true command to enable VLAN
tagging on your subnet.

94 Not applying a tag to a subnet on a port that is not trunked/tagged will cause
issues with network traffic to/from the PowerScale cluster.

95 VLAN tagging requires a VLAN ID that corresponds to the ID number for the
VLAN set on the switch.
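A minimal CLI sketch based on the commands above; the subnet name and VLAN ID are
hypothetical, and the --vlan-id flag is an assumption based on footnote 95 (tagging
requires a VLAN ID that matches the switch):

# Find the external subnet to modify:
isi network subnets list

# Enable VLAN tagging on the subnet (the --vlan-id flag is assumed):
isi network subnets modify groupnet0.subnet0 --vlan-enabled=true --vlan-id=22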


Routing

Normal OneFS Routing

If the only systems that the cluster ever has to talk to are on networks to which the
cluster is directly connected, there is not a problem. The issue occurs when there


are multiple foreign networks96. Because there can only be one default gateway on
any node at a given time, the default gateway mechanism is insufficient to allow the
cluster to correctly route packets.

UNIX, IPv4, and Routing

Remember that OneFS is a derivative of FreeBSD. By default, OneFS uses the
FreeBSD routing system for choosing a route. Given a series of alternative routes
to a particular destination, the cluster nodes select the most precisely defined route.

The routing table lists destinations and their gateways, ordered from less specific
to more specific routes. For example, for destination IP addresses in the
137.69.0.0/16 subnet, packets go to 10.13.52.237, except for the address
137.69.122.6/32, where they are sent to 10.104.5.1.

96Networks to which the cluster does not have a direct connection, and that are
only reachable through a gateway/router. If the number of “foreign network” is small
and relatively static, then defining static routes provides an effective workaround.
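A quick way to inspect routing on a node, using one of the standard, non-PowerScale
tools listed later in this section:

# Print the node's kernel routing table; per the FreeBSD behavior described
# above, the most specific matching route wins:
netstat -rn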


Source - Based Routing

OneFS SBR effectively implements per-subnet default routes, but does so by


creating ipfw rules of the form97 shown.

IPv6 is a special case because of how IPv6 is supposed to handle complex routing,
and currently OneFS does not do SBR for IPv6. If working with a complex IPv6
network, plan to put more of the burden on the network infrastructure, and ask the
network administrator which flexible routing options are available.

• To enable source-based routing: 7.2.x: isi networks sbr enable and 8.x
and later: isi network external modify --sbr=true
• To disable source-based routing: 7.2.x: isi networks sbr disable and
8.x and later: isi network external modify --sbr=false
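A minimal sketch for OneFS 8.x and later, using the command quoted above; the
isi network external view command is an assumption for confirming the setting:

# Enable source-based routing cluster-wide:
isi network external modify --sbr=true

# Review the external network settings (assumed to include the SBR state):
isi network external view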

Resource: Reference material to ordinary networking on PowerScale


clusters

UNIX, IPv4, and Routing: One might have expected the cluster to run down the
list of available routes, in so-called waterfall fashion, and use the first one that
matches, but instead the cluster orders all matching routes in order of specificity
and uses the most specific one. This information is important if trying to reconstruct
what the cluster is actually doing in routing terms. It is also important to remember

97The process of adding ipfw rules is stateless and essentially translates to per-
subnet default routes without manual intervention.


that some UNIX hosts evaluate routes in waterfall fashion, and administrators
should understand the difference between the ways that the cluster operates and
other hosts operate.

Source - Based Routing: When SBR was originally implemented, it overrode any
existing static routes. This turned out to be a problem, because in some
environments people wanted to have particular static routes with flexible SBR. In
recent versions of OneFS, this has been corrected. Because of the way that similar
features have been implemented on other well-known platforms (e.g. NetApp,
CLARiiON, VNX), there is widespread misunderstanding of what was implemented
on OneFS. Most of the other platforms use stateful packet inspection, so that the
network stack tracks connected flows, which enables the network stack to send a
reply to a packet back to the same gateway that sent the packet to the PowerScale
node. This is NOT what SBR does. SBR enables sending a packet through the
same interface on which it arrived. Instead of relying on the destination IP, SBR
creates dynamic forwarding rules using the IP address of sender and the subnet
that the packet arrives on. It then creates a reverse rule so that packets going to
that IP address will always be forwarded to the default gateway for that subnet.

Source - Based Routing Example

SBR is routing packets based on a source IP address. SBR is a mechanism to


dynamically create per-subnet default routes. The router used as this gateway is
derived from the subnet configuration. Gateways must be defined for each subnet.

Consider a cluster with subnets A, B, and C, as illustrated in the graphic: each


gateway has a defined priority.



1: If SBR is not configured, the highest priority gateway, that is, gateway with the
lowest value which is reachable, is used as the default route. Once SBR is
enabled, when traffic arrives from a subnet that is not reachable via the default
gateway, firewall rules are added. As OneFS is FreeBSD based, these are added
through ipfw.

2: If src-ip is in subnetA and dst-ip is not in (subnetA,B,C)


set next-hop to gatewayA

3: If src-ip is in subnetB and dst-ip is not in (subnetA,B,C)


set next-hop to gatewayB

4: If src-ip is in subnetC and dst-ip is not in (subnetA,B,C)


set next-hop to gatewayC


SBR is entirely dependent on the source IP address98 that is sending traffic to the
cluster. SBR creates per-subnet default routes in the following steps:

• A subnet setting of 0.0.0.0 is not supported and is severely problematic, as
OneFS does not support RIP, RARP, or CDP.
• Path for all traffic99
• Static routes are an option100

Non-PowerScale Tools for Network Troubleshooting

The table shows the common tools that are used for network troubleshooting.

iperf Network throughput testing with various parameters.

netstat Reports various host network activity and configurations.

98 If a session is initiated from the source subnet, the ipfw rule is created. The
session must be initiated from the source subnet, otherwise the ipfw rule is not
created. If the cluster has not received traffic that originated from a subnet that is
not reachable via the default gateway, OneFS will transmit traffic it originates
through the default gateway.

99The default gateway is the path for all traffic intended for clients that are not on
the local subnet and not covered by a routing table entry. Utilizing SBR does not
negate the requirement for a default gateway, as SBR in effect overrides the
default gateway, but not static routes.

100 Static routes are an option when the cluster originates the traffic, and the route
is not accessible via the default gateway. As mentioned above, static routes are
prioritized over source-based routing rules.


ss Newer equivalent of netstat (many Linux distributions).

arp Report ARP tables to check devices on Ethernet.

traceroute Check network paths.

ping Check ICMP connectivity.

ifconfig Displays configured interfaces.

Wireshark Analysis of packet captures.

Caution: Be aware of the security implications of using packet analyzers.


Access Zones

Access zones

Access zones allow the cluster to be divided securely to accomplish various tasks
required for multitenancy.

• Administrators can control who connects to the cluster and what nodes they
connect to based on the hostname when using access zones with the
SmartConnect pools.
• It allows for separating authentication providers or for multiple authentication
providers of the same type that is used in the same cluster.
• Administrators can define user-mapping rules to manipulate identities provided
for various user tokens.
• A base directory must be defined when configuring an access zone. Defining a
base directory allows administrators to isolate directories for security and
compliance requirements.

Client Access Checks

When a client connects to the cluster, they must undergo several checks to ensure
that they are allowed to access wherever they are trying to go.

• What hostname are they connecting to?


• Which authentication providers are in the access zone associated with that
SmartConnect zone name?
• Do they have an account in at least one of those authentication providers?
• What other identifiers are needed for their access token?
• Do they have permission to connect to the share or export?

Base Directories and Access Zones

OneFS allows administrators to share base directories between different access


zones. There are some considerations to be aware of before choosing to share
base directories.

• If sharing base directories between access zones, ensure multiple LDAP or NIS
providers use unique UID or GID ranges.
• Do not nest base directories inside the base directory of another zone. Avoid
sharing base directories between access zones.
• Only assign one authentication provider of each type to each access zone.
• Avoid overlapping UID or GID ranges for authentication providers in the same
access zone.

If combining multiple authentication providers with UID or GID values that are
assigned to users and groups, ensure that the various authentication providers do
not use overlapping ranges. Two users in different authentication providers with the
same UID, sharing a base directory, are treated as the same user when accessing


files. If possible, avoid putting multiple LDAP or NIS providers in different access
zones that share a base directory.

Another consideration is that you should not nest base directories inside the base
directory of another zone. There is an exception for the System zone as its base
directory is /ifs and all other access zones are nested within that base directory.

If both LDAP and NIS are configured within the same zone, be certain that the two
providers do not use overlapping UID or GID ranges. The consequence of this
occurring is that the users are treated the same within the base directory that is
defined for that access zone. Users treated the same can lead to unintended
permissions being granted or denied to each user. The best practice is to only use
one LDAP or one NIS provider within each access zone, or between access zones
that share the base directory. There is no guarantee that the identifiers each
provider use are globally unique.

Overlapping Access Zones Paths

By default each zone has unique paths.

Unique paths keep different user groups organized, with each path unique to its
zone. Overlapping paths are supported, which is useful in use cases where data is
passed between groups.

The ability for access zones to have overlapping paths has changed over OneFS
releases.


• In 7.0, when access zones were created, it worked but it was not intended to, so
it was never supported.
• Starting in 7.1.1, the “loophole” was closed and it stopped working and
upgrading forced people to fix their access zones to comply with the new rules.
• Starting in 8.0, the functionality was restored and is officially supported.
• Example101

Decision point: Will authentication providers on both access zones


give compatible results?

Access Zones on a SyncIQ Secondary Cluster

• You should create access zones on a SyncIQ secondary cluster that is used for
backup and disaster recovery, with some limitations.
• System configuration settings, such as access zones, are not replicated to the
secondary server.

101There are valid use cases for overlapping paths between access zones but care
must be taken due to permission clashes. For example if the two zones
authenticate to two distinct untrusted domains, file permissions may be
troublesome. A warning is issued when a storage admin creates an access zone
that has a path that overlaps with another zone.


• In a failover scenario, the configuration settings of the primary and secondary


clusters should be similar, if not identical.
• Usually it is recommended that you configure system settings before you run a
SyncIQ replication job. The reason is that a replication job places target
directories in read-only mode. If you attempt to create an access zone where
the base directory is already in read-only mode, OneFS generates an error
message.

Administrative Commands for Access Zones

On the CLI, it is assumed that commands are run in the system zone. For isi
commands where access zones matter (example: isi nfs exports), a --zone
flag allows the admin to specify a different zone. But the traditional FreeBSD
commands do not have the --zone flag and, therefore, execute in the System
zone. This can lead to less clear output such as raw SIDs in the ls -led
command. This is because the AD domain of the System zone may not be able to
resolve the SIDs.


• BSD CLI commands are not zone aware but are often useful.
− They run in the System zone.
− They return numeric-only permission values, so results are harder to interpret.
− OneFS is modified from the parent BSD codebase.
• isi_run can context switch these commands to give zone-aware responses.

Example: Getting the Zone ID

Using isi_run to add zone sensitivity to a BSD CLI command requires the zone
ID which can be found by running isi zone. This displays a listing of zones and
their zone IDs. With the correct zone ID, use isi_run to derive detailed
information from the CLI.

(Example: first get the zone ID, then use the zone ID with isi_run.)
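A minimal sketch of this pattern; the zone ID (2) and path are hypothetical, and the
-z flag for isi_run is an assumption:

# List zones and their zone IDs (as described above):
isi zone

# Run a BSD command in the context of zone ID 2 so identities resolve
# against that zone's authentication providers:
isi_run -z 2 ls -le /ifs/clustername/az2/data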

This process can be valuable in the context of debugging your configuration,


because it allows administrators to directly test file access and related information


from the CLI. Administrators can tell, for example, that a user's SID in one access
zone is equivalent to a different user's SID in another access zone, and thus avoid
overlapping the zones to avoid access conflicts. isi_run is also useful for creating
a zone context for a series of commands. This and other options are detailed
together in the man page.

Quality of Service

You can set upper bounds on quality of service102 by assigning specific physical
resources to each access zone.

Access zones do not provide logical quality-of-service guarantees to these


resources, but you can partition these resources between access zones on a single
cluster.

The list describes a few ways to partition resources to improve quality of service:
• NICs103

102 Quality-of-Service addresses physical hardware performance characteristics


that can be measured, improved, and sometimes guaranteed. Characteristics that
are measured for quality of service include but are not limited to throughput rates,
CPU usage, and disk capacity. When you share physical hardware in a
PowerScale cluster across multiple virtual instances, competition exists for the
following services: CPU, Memory, Network Bandwidth, Disk I/O, Disk capacity.

103You can assign specific NICs on specific nodes to an IP address pool that is
associated with an access zone. By assigning these NICs, you can determine the
nodes and interfaces that are associated with an access zone. This enables the
separation of CPU, memory, and network bandwidth.


• SmartPools104
• SmartQuotas105

Access Zones Best Practices

Access zones carve out access to a PowerScale cluster creating boundaries for
multitenancy or multiprotocol. They permit or deny access to areas of the cluster.
At the access zone level, authentication providers are also provisioned.

System Zone

When a PowerScale cluster is first configured, the System Zone is created by


default. The System Zone should only be used for management as a best practice.
In certain special cases, some protocols require the system zone, but generally
speaking, all protocol traffic should be moved to an Access Zone.

Moving client traffic to Access Zones ensures that the System Zone is only used for
management and accessed by administrators. Access Zones provide greater
security as administration, and file access is limited to a subset of the cluster,
rather than the entire cluster.

104SmartPools are separated into multiple tiers of high, medium, and low
performance. The data written to a SmartPool is written only to the disks in the
nodes of that pool.
Associating an IP address pool with only the nodes of a single SmartPool enables
partitioning of disk I/O resources.

105 Through SmartQuotas, you can limit disk capacity by a user or a group or in a
directory. By applying a quota to the base directory of an access zone, you can
limit disk capacity that is used in that access zone.


Root-Based Path

• When an Access Zone is defined, a root-based path must be defined to segment
data into the appropriate Access Zone and enable the data to be
compartmentalized.
• Best practice is to use the cluster name, a numerical Access Zone number, and
a directory. Example106 (see the CLI sketch below).
• In the graphic, as Cluster 1 fails over to Cluster 2, the directory structure
remains consistent, easily identifying where the files originated from. This
delineation also ensures clients have the same directory structure after a
failover. Once the IP address is updated in DNS, the failover is transparent to
clients. As more clusters are brought together with SyncIQ, this makes it easier
to manage data, understanding where it originated from, and provides seamless
disaster recovery.

106 Access Zone 1 maps to /ifs/clustername/az1/, Access Zone 2 maps to


/ifs/clustername/az2/. A Root Based Path with this delineation, provides data
separation, Multi-Tenancy, maintains the Unified Permission model and makes
SyncIQ failover and failbacks easier.


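A minimal CLI sketch of creating access zones that follow this convention; the zone
names and paths come from footnote 106, and the isi zone zones create syntax
with the --path flag is an assumption (the base directories are assumed to exist):

# Create access zones whose base directories follow the
# /ifs/<cluster name>/<zone> convention:
isi zone zones create az1 --path=/ifs/clustername/az1
isi zone zones create az2 --path=/ifs/clustername/az2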

Access Zones Limitations

The table shows a few important limitations when using access zones. We do not
support different access zones using the same IP ranges. Some service providers
give out the same private subnet to multiple customers. OneFS does not support
this. It would be possible to work around this at the networking layer with a NAT
facility, but the cluster itself does not implement a NAT107 facility. Instead, the IP
ranges themselves differentiate access zones from each other, from the cluster's
point of view.

• Functions only available in the System zone: FTP, HTTP/RAN, and system-wide
administration.
• 50 access zones and 50 AD domains per cluster (20 zones with 5 AD domains
before OneFS 8.0).
• No overlapping IP ranges: IP ranges differentiate access zones.

While the major protocols are supported in nonsystem zones, some of the minor
ones, such as FTP, HTTP, and RAN (Restful Access to Namespace) are not. This
means that if the scenario relies upon different groups having differentiated FTP
access, use FTP's authentication and control techniques to preserve customer

107Network Address Translation(NAT) is a major translation technology used to


translate IP addresses.


security. The same applies to HTTP, which may be important if using the cluster to
be the back end of a website. All administrative functions including the CLI, WebUI,
and PAPI, only work in the System zone. This implies that RBAC administrative
users only work in the System zone as well. The only exception is that there is
limited MMC support, that is zone aware. If MMC is a particular part of the Microsoft
networking context, plan to limit access to that facility on a per-domain basis. While
there are no hard limits in the code, support recommends to not exceed 50 zones
with 50 AD domains per cluster starting with OneFS 8.0. In prior versions, the limits
are 20 zones and five AD domains.


SmartConnect Advanced and DNS

DNS and SmartConnect

DNS

DNS on a PowerScale cluster serves two functions: DNS client and DNS server

Client: retrieves remote addresses108 and supports authentication.

Server: provides addresses to clients109 and is central to SmartConnect.

SmartConnect is a DNS Server

SmartConnect acts as a DNS delegation server to return IP addresses for


SmartConnect zones, generally for load-balancing connections to the cluster.

• OneFS uses a custom-written DNS server to support SmartConnect features

108DNS serves the cluster with names and numbers for various reasons (most
notably authentication), and this means that the cluster is acting as a DNS client.

109
The cluster itself serves DNS information to inbound queries (in the service of
SmartConnect) and as such acts as a DNS server.


• Does not forward requests110


• Does not cache responses111

DNS serves the cluster for name resolution, and the cluster serves DNS to clients
for load balancing and access purposes, so misconfiguration on either side can
disrupt cluster functions. Always double-check DNS function, and if something
seems to be misbehaving, get the network team involved in your solution.

SmartConnect Network Hierarchy

The graphic shows the SmartConnect hierarchy.

110This is not a full-fledged DNS server and cannot be used to forward events to
another DNS server in the event it does not recognize a DNS request. So, it is best
used as simply an authoritative source of the DNS zone that is assigned to the
SmartConnect Zone, and should be configured as a delegate for the DNS zone in
the customer DNS server.

111Occasionally, a security or network administrator refuses to allow OneFS to act


as a DNS server because of security concerns. However, this is founded on a
misconception concerning the role of SmartConnect. Once the limitations of
SmartConnect are explained, the argument usually resolves itself.


Heyden wants you to configure SmartConnect subnets such


that client connections are to the same physical pool of nodes
on which the data resides.

For example, if a workload’s data lives on a pool of F-series


nodes for performance reasons, the clients that work with
that data should mount the cluster through a pool that
includes the same F-series nodes that host the data.

(Diagram: a Groupnet contains DNS settings; each Subnet has one or more SSIPs;
under each subnet are Pools, each with a SmartConnect zone name/FQDN.)

Under each subnet, pools are defined, and each pool will have a unique
SmartConnect Zone Name. It is important to recognize that multiple pools lead to
multiple SmartConnect Zones using a single SSIP.

OneFS DNS: Server Side

DNS lookups of SmartConnect zone names involve four separate DNS operations.


1: A client makes a DNS request for example.domain.com by sending a DNS


request packet to the site DNS server.

2: The site DNS server has a delegation record for example.domain.com. It sends
a DNS request to the defined nameserver address in the delegation record, the
SmartConnect service SSIP (SmartConnect Service IP Address).

3: The cluster node hosting the SmartConnect Service IP (SSIP) for this zone
receives the request. Cluster determines the IP address to assign based on the
configured connection policy for the pool in question (such as round robin). Cluster
then sends a DNS response packet to the site DNS server.

4: The site DNS server sends the response back to the client.

5: The client accesses the cluster.

SmartConnect Features

SmartConnect Feature Explanation

Directs clients to nodes By answering DNS queries

Balances load By querying cluster nodes

Does not forward DNS queries No cache nor forwarding roles

Answers on IPv4 and IPv6 A and AAAA records


Static or dynamic pool handling By respecting cluster configuration

Responsive to cluster status By using 0 TTL cache

Does not change broader DNS By only answering on assigned IPs


and names

Answers DNS on specific IPs Called the SSIP

Following are the description for SmartConnect features mentioned in the table:

• SmartConnect directs clients to nodes in a load balanced way, by controlling IP


address returns to DNS queries.
• SmartConnect does not change anything in DNS, but only answers for its own
subdomains. SmartConnect monitors the node loads for load-balancing
purposes.
• SmartConnect does not cache or forward DNS query results.
• SmartConnect can work in IPv4 and IPv6 environments.
• SmartConnect can deal with static or dynamic pools transparently on the
cluster.
• SmartConnect sets a 0 TTL to keep results responsive to changing load levels
on cluster nodes.
• SmartConnect does not change anything in DNS, but only answers for its own
subdomains.
• SSIPs can answer for all zones; they are not differentiated per zone. The
address pool differentiates SSIPs, and each SSIP answers for all associated pools.

SmartConnect CLI Cheat Sheet

The table shows command examples relating to SmartConnect. There is a wide


range of possible commands and diagnostics, but these examples are some of the
most common ones.

Create New Pool: isi network pools create groupnet0.subnet0.pool1
--ranges=172.16.189.120-172.16.189.130 --sc-dns-zone=a.example.com
--ifaces=1:ext-1

Check Configuration: isi network pools view -id groupnet0.subnet0.pool1

Add or Update SSIP: isi network subnets modify groupnet0.subnet0
--sc-service-addr=172.16.189.60

DNS Lookup: nslookup or dig

Change TTL: isi network pools modify <pool name> --sc-ttl <non zero value>
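A minimal lookup sketch using the values from the cheat sheet above; each query
should return a node IP address chosen by the configured connection policy:

# Query the SmartConnect service directly for the zone name:
dig @172.16.189.60 a.example.com

# Or resolve through the site DNS server, which follows the delegation record:
nslookup a.example.com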

Protocols and Network Allocation Policies

Client access protocols on PowerScale can be divided into the following categories.

• Stateful: The client/server relationship usually has a session state112 for each
open file.
• Stateless: Stateless protocols are generally accepting of failover without
session state information being maintained (except for locks).

112Failing over IP addresses to other nodes for these types of workflows means
that the client assumes that the session state information was carried over. Session
state information for each file is not shared among PowerScale cluster nodes.


Following are the recommended IP allocation strategies for SmartConnect


Advanced for each supported protocol.

Protocol    Protocol Category    Recommended Allocation Strategy

NFSv2 (not supported in OneFS 7.2 and above)    Stateless    Dynamic

NFSv3 Stateless Dynamic

NFSv4 Stateful Static


Note: Though NFSv4
is supported with
dynamic IPs, there
could be a potential
performance impact.

SMBv1 Stateful Static

SMBv2/2.1 Stateful Static

SMBv3 Multi-Channel Stateful Dynamic or Static

FTP/FTPs Stateful Static

SFTP/SSH Stateful Static

HDFS    Stateful (but the protocol is tolerant of failures)    Static

HTTP/HTTPS/RAN Stateful Static

SyncIQ Stateful Static


Resource: See the PowerScale: Network Design Considerations


white paper for more information.

SmartConnect SSIP

An SSIP is assigned to the node in the node pool with the lowest device ID 113 in the
subnet with the SSIP. The node has an active interface that is in any IP pools that
SSIP services.

• The SmartConnect DNS service must be active on only one node at any
time114, per subnet.
• For example, the SmartConnect service continues to run throughout the
process as the existing nodes are refreshed115.

113 The device ID is the Node ID given to node and it is different from LNN.

114The SmartConnect Service IP resides on the node with the lowest node ID that
has an interface in the given subnet. SSIP does not necessarily reside on the node
with the lowest Logical Node Number (LNN) in the cluster.

115Suppose that an existing four-node cluster is refreshed with four new nodes.
Assume that the cluster has only one configured subnet, all the nodes are on the
network, and that there are sufficient IP addresses to handle the refresh. The first
step in the cluster refresh is to add the new nodes with the existing nodes,
temporarily creating an eight-node cluster. Next, the original four nodes are
SmartFailed. The cluster is then composed of the four new nodes with the original
dataset.


The SmartConnect service always runs on the node with the lowest node ID;
NodeID 1 maps to LNN 1.

LNN NodeID NodeName New or Original


Node

1 1 Clustername-1 Original

2 2 Clustername-2 Original

3 3 Clustername-3 Original

4 4 Clustername-4 Original

5 5 Clustername-5 New

6 6 Clustername-6 New

7 7 Clustername-7 New

8 8 Clustername-8 New

The original nodes are removed using SmartFail. At this point, NodeID 5 maps to
LNN 1.

LNN Node ID Node Name New or Original


Node

1 5 Clustername-5 New

2 6 Clustername-6 New

3 7 Clustername-7 New

4 8 Clustername-8 New

The updated Node IDs and LNNs remain the same, but map to different Node
Names. At this point, NodeID 5 maps to LNN 1.

LNN Node ID Node Name New or Original


Node

1 5 Clustername-1 New

2 6 Clustername-2 New

3 7 Clustername-3 New

4 8 Clustername-4 New

The isi_dnsiq_d daemon on that node is considered the primary for the IP pools
that SSIP services. The primary node is responsible for responding to DNS
requests on that SSIP and for deciding about moving IP addresses in any dynamic
IP pools that the SSIP services. Multiple subnets in a given groupnet can be
assigned a specific SSIP to respond for all queries into the SmartConnect Zone for
host resolution. This step would require you to select the wanted --sc-subnet
when creating the pool for the --sc-dns-zone wanted. This can come in handy
when a client has a large network. Due to restrictions, not all the network segments
to the PowerScale cluster have connectivity to the DNS servers. Should the node
servicing SSIP go down, or its interfaces become inactive, it stops servicing the
SSIP. The isi_dnsiq_d daemon on next lowest node, by device ID, will receive
an update and become the primary for the SSIP and the IP pools that it services.

Configure SSIP Address

Each cluster needs at least one SSIP. Ensure that no firewalls between the
infrastructure DNS servers and the SSIP block TCP and UDP port 53.


SSIP Node Assignment

Multi-SSIP introduces an enhancement to assigning SSIPs.

• Attaching an SSIP to a node is not dependent116 on the Node ID.

116OneFS creates a file containing SSIP information, the SSIP Resource File. To
host an SSIP, a node must hold a lock on this file. All the nodes that are ready to
host an SSIP, attempt to lock the SSIP Resource File. The first nodes to get the
lock, host the SSIP.


• Node assignment is based on a lock to nodes117.
• The SSIP is held through configuration and group changes118.

(Diagram: the SSIP resource file.)

117In certain scenarios, a node may host more than a single SSIP, depending on
the number of nodes and SSIPs in the subnet. The new process ensures that the
node assignment is based on a lock to nodes within the subnet, avoiding the issues
from previous releases. Once the node is offline, or the interface goes down, the
SSIP becomes available for lock again. The next quickest node to capture the lock
hosts the SSIP. OneFS ensures that SSIPs are as evenly distributed as possible
within a subnet, using a feature to limit a single node from hosting multiple SSIPs.

118Prior to OneFS 8.2, any configuration or group change would result in


SmartConnect stopping and unconfiguring DNS as OneFS was unaware if the
same node could host the SSIP. In OneFS 8.2 and later, the SSIP is held through
configuration and group changes with a re-evaluation after the change to confirm if
the SSIP can be held. If it is determined that the node is no longer qualified to own
the SSIP, some other node picks and releases it, minimizing the failover impact.


SmartConnect Multi-SSIP

The addition of more than a single SSIP provides fault tolerance and a failover
mechanism. Multi SSIP ensures the SmartConnect service continues to load
balance clients according to the selected policy.

• The number of SSIPs available per subnet depends on the SmartConnect
license119.
• More SSIPs provide redundancy and reduce failure points in the client
connection sequence; multiple SSIPs are not an additional load-balancing
feature120.
• Each node hosts an SSIP independent121 of the other SSIP-hosting nodes.

119 SmartConnect Basic allows 2 SSIPs per subnet while SmartConnect Advanced
allows 6 SSIPs per subnet.

120Although the additional SSIPs are in place for failover, the SSIPs configured are
active and respond to DNS server requests. Multi-SSIP configuration is Active-
Passive, where each node hosting an SSIP is independent and ready to respond to
DNS server requests, irrespective of the previous SSIP failing. SmartConnect
continues to function correctly if the DNS server contacted the other SSIPs,
providing SSIP fault tolerance.

121It is unaware of the status of the load-balancing policy and starts the load-
balancing policy back to the first option.


At step 2, the site DNS server sends a DNS request to the SSIP. The server then
awaits a response in step 3: a node IP address that is based on the client
connection policy. If, for any reason, the response in step 3 is not received within
the timeout window, the connection times out. The DNS server tries the second
SSIP and awaits a response in step 3. After another timeout window, the DNS
server continues cycling through subsequent SSIPs, up to the sixth SSIP with
SmartConnect Advanced, if a response is not received after a request is sent to
each SSIP.

Note: Do not configure the site DNS server to load balance the
SSIPs. Each additional SSIP is only a failover mechanism, providing
fault tolerance and SSIP failover. Allow OneFS to perform load
balancing through the selected SmartConnect policy, ensuring
effective load balancing.

Configure Multi-SSIP

You can configure SmartConnect Multi-SSIP from CLI or WebUI.

• To identify the name of the external subnet you want to configure with Multi-
SSIP, run the isi network subnets list command.
• Run the isi network subnets modify command with the --sc-
service-addrs option, specifying an IP address range, in the following
format:


− isi network subnets modify <groupnet_name>.<subnet_name>
--sc-service-addrs=<ip_address_range>

• The following command specifies the SmartConnect service Multi-SSIP
addresses on subnet0:

− isi network subnets modify subnet0
--sc-service-addrs=192.168.25.10-192.168.25.11

Multiple SSIP from DNS Server Records

You can assign DNS servers to a groupnet and modify DNS settings that specify
DNS server behavior.


Configuring multiple SSIP from the DNS server records.

Settings Description

DNS Servers Sets a list of DNS IP addresses. Nodes issue


DNS requests to these IP addresses.
You cannot specify more than three DNS
servers.

DNS Search Suffixes Sets the list of DNS search suffixes. Suffixes are
appended to domain names that are not fully
qualified.
You cannot specify more than six suffixes.

Enable DNS resolver rotate Sets the DNS resolver to rotate or round-robin
across DNS servers.


Enable DNS server-side search Specifies whether server-side DNS searching is


enabled, which appends DNS search lists to
client DNS inquiries handled by a SmartConnect
service IP address.

Enable DNS cache Specifies whether DNS caching for the groupnet
is enabled.

DNS Zone Name

You can set a DNS zone name in SmartConnect to be a short name, for example,
isicl1 instead of isicl1.example.com. In order to set a zone name, you must
ensure that server-side DNS search is enabled (default configuration) and DNS
search list is specified. This allows for a name that is not an FQDN to see the
SmartConnect zone and answer any of the assigned search domains.

Enabling short names122:


• Enable server-side DNS search - isi network groupnet modify
groupnet0 --server-side-dns-search=yes
• Specify the DNS search list - isi network groupnet modify groupnet0
--dns-search=example.com,another.example.com
• Enable a short name for any of the assigned search domains - isi network
pools modify groupnet0.subnet0.pool1 --sc-dns-zone=isicl1

122Short names have no big technical purpose other than user convenience. They
function by a defaulting process that is a normal function in DNS, and these
instructions set it up correctly.


OneFS DNS: Common Issues

Mentioned below are the OneFS DNS common issues.

1-2: Multiple clients connect to the same node despite the SmartConnect
configuration.

• TTL = 0 (default)
• DNS servers (for example, Microsoft DNS) may cache for one second
• Nearly simultaneous DNS requests, or clients starting simultaneously, hit the
same cluster interface
• To change the TTL - isi network pools modify <pool name> --sc-ttl
<non zero value>

3-4: SmartConnect returns "refused" when looking up a zone name.

• Reason 1 - a discrepancy between the name requested and the sc-dns-zone
configured. Check for a discrepancy123.

5-6: SmartConnect returns "no error" and no response.

• Not all protocols are enabled on a node (as checked with isi_group_info).

7-8: SmartConnect does not return "no error" for type ANY queries.

• SmartConnect does not currently support ANY queries.

123 Check for a discrepancy: observe the actual response from SmartConnect using
dig. If the Status is Refused, that usually means that there is a discrepancy
between the name you are asking for and the sc-dns-zone you configured on the
cluster.

The SmartConnect TTL is 0 by default. This is intended to prevent caching and so
improve load balancing. DNS servers (for example, Microsoft DNS) may cache for
one second, so nearly simultaneous DNS requests may give the same result. Also,
HPC or multiuser lab clients all starting simultaneously may hit the same interface
on the cluster, causing an issue.

There can be a number of possible reasons why SmartConnect returns "refused"
when looking up a zone name. One of the reasons is a discrepancy between the
name that is requested and the sc-dns-zone configured on the cluster. To check
for a discrepancy, observe the actual response from SmartConnect using dig
<your DNS Service Name> <Your Zone Name>. If the Status is Refused, that
usually means that there is a discrepancy between the name you are asking for
and the sc-dns-zone you configured on the cluster.

There are multiple reasons why SmartConnect returns "no error" and no response.
To configure an interface in the event the pool has none, use the isi network
pools modify <pool id> --add-ifaces <ifaces> command. If
SmartConnect does not return "noerror" for type ANY queries, it is because
SmartConnect does not currently support ANY queries.

SmartConnect Basic and Advanced

Round Robin

SmartConnect Basic uses a round robin strategy.

• Each node with an IP address in the pool is chosen in sequential order.


• Weights are not used for round robin IP delivery.
• Weights are assigned to addresses even though not used.
• The only option for SmartConnect Basic.


Load Balancing

SmartConnect Advanced uses an optional throughput or CPU usage or connection


count strategy. SmartConnect provides the ability to load balance incoming
connections – via DNS.

The load-balancing policies are Round Robin (default), Connection Count, CPU
Utilization, and Network Throughput.

Strategy: Throughput
Weighting source: Traffic on external interfaces
Weight: Weight = maxthrough / node throughput, weight <= 10. Throughput uses
an algorithm to compute the rate of network traffic going into/out of a node's
external interface.

Strategy: CPU Usage
Weighting source: CPU usage
Weight: If a node has usage 0, it gets maximum weight and nodes with higher
usage get minimum weight. Otherwise, Weight = (maxweight) * (least node CPU
usage / current node CPU usage), rounded down.

Strategy: Connection Count
Weighting source: Connected clients
Weight: Weight = (Max Client Count) - (Current Node Client Count), Weight <= 10.
IPs moved onto a node increase the node's client count by 1 randomly.

Round Robin: A weighted round-robin method is used to determine which IPs to


give out to DNS requests. Weights are computed from the metrics that the
connection policy provides (for example throughput, CPU usage, connection
count). The maximum weight is limited to be at most ten times the minimum weight.
Weights are reset every 5 seconds, when new statistic information is received.


Thus, if new connections occur slowly (for example, every 10 seconds), all policies
tend to look like round-robin.

A weighted round-robin method is used to choose an available node:

• The node with maximum throughput (maxthrough) has weight one.
• Nodes with less than maximum throughput (t) have weight maxthrough / t, with
a maximum weight of ten.

CPU Usage uses an algorithm to compute the raw CPU usage of a node. A
weighted round-robin method is used to choose an available node:

• Nodes with CPU usage of 0 get maximum weight.
• If any node has CPU usage of 0, nodes with nonzero CPU usage get minimum
weight.
• If a node has CPU usage of N * (min cpu usage), it gets a weight of maxweight/N
(rounded down).
• If a node has a CPU usage > maxweight * (min cpu usage), it gets weight 0.

Connection Count uses the number of clients that are connected through each
node. A weighted round-robin method is used to choose an available node (see
the worked example after this list):

• The node with the highest number of connected clients has weight 0.
• Every other node has the weight of the highest number of connections, minus
the connections that it has.
• The highest weight value is 10.
• If an IP address is moved to a node, as part of a dynamic address pool move,
the client count of the node randomly increases by 1 to quickly reflect
developments.
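As an illustration with hypothetical numbers: if three nodes currently hold 25, 20,
and 18 client connections, the busiest node gets weight 0, the second node gets
25 - 20 = 5, and the third gets 25 - 18 = 7 (any computed weight above 10 is
capped at 10); new DNS responses then favor the higher-weighted nodes.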

Cbind: PowerScale Client-Side Cache

Cbind handles OneFS client-side DNS. Cbind is the distributed bind cache
daemon on OneFS. The primary purpose of cbind is to speed up DNS lookups on
the cluster, in particular for NFS workloads which can involve large number of DNS
lookups - especially with netgroups.


1: Handles client-side DNS of OneFS

• Distributed bind cache daemon


• Speed up DNS lookups on the cluster

2: The design of the cache is to distribute the cache and DNS workload among
each node of the cluster. Cbind supports caching AAAA queries; previously it
cached only A records and silently dropped AAAA records, which resulted in a
five-second delay as the client fell back to the next DNS server in the resolver
config.

3: To support the concept of tenancy, the cbind daemon supports multiple DNS
caches. Each tenant that refers to the cache has its own cache within cbind that is
independent of other tenants.

4: To support different DNS caches for multiple groupnets, the cbind interface is
changed to have multiple client interfaces to separate DNS requests from different
groupnets.

5: Previously, client applications reached cbind by using the loopback address
(127.0.0.1), but now cbind has the entire 127.42.x.x address range. The client's
groupnet ID sets the lower 16 bits of the address for the query. For example, if the
client is trying to query DNS servers on a groupnet with an ID of 4, it sends the
DNS query to 127.42.0.4.

6: Post OneFS 8.0 the service command to enable/disable the cbind daemon has
been removed. The only way to enable/disable the dns cache option on OneFS 8.0
and later is through the isi network command. The cache can be


enabled/disabled per groupnet. Disabling the cache can reduce performance,
because the cluster will not have the opportunity to speed up DNS lookups.
However, in an unstable environment, caching can be a greater hindrance than
assistance, because incorrect and out-of-date cache contents result in
authentication failures and delays.

Flushing Cluster Client DNS Cache

Each groupnet has its own cbind cache instance, there are two commands that
flush the cache.

• isi network dnscache flush command - flushes the dnscache for all
groupnets.
• isi_cbind flush groupnet <groupnet-name> command - flushes the
dnscache for a specific groupnet.

The isi_cbind show cluster command is used to check metrics. Note: this is
for OneFS DNS caching; any IP addresses referenced are internal interfaces, not
used for external access.

Cache flushing is not something that should be done regularly for no reason. It
interferes with performance. It can be necessary under certain conditions, to flush
stale cache entries. Typical cases are after large network changes; whether they
are internal to normal corporate functions, or a result of disaster recovery activities
or other big moves. Alternatively, cache flushing can happen with the purpose of
debugging the DNS environment, and the PowerScale cluster relationship to the
environment. If you suspect that there is a problem with name resolution, flushing
the cache for a certain groupnet, and then reexamining its normal operations as it
reestablishes the cache can demonstrate how the DNS infrastructure is operating.
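A minimal sketch combining the commands quoted above; the groupnet name in the
last line is hypothetical:

# Show client-side DNS cache metrics (internal interfaces only):
isi_cbind show cluster

# Flush the DNS cache for all groupnets:
isi network dnscache flush

# Flush the DNS cache for a single groupnet:
isi_cbind flush groupnet groupnet4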


Do not alter cbind settings without instructions or knowledge of the potential


implications. If in doubt, call support.

Managing DNS Cache Settings

You can set DNS cache settings for the external network.

• To flush DNS cache, from the Actions area, click Flush DNS Cache and
Confirm.
• To modify the DNS cache settings, enter the required limits and click Save
Changes.


Challenge

Lab Assignment:
1) Analyze DNS packets between a source and destination.
2) View and analyze SmartConnect configuration.
3) Troubleshoot DNS issues to redirect clients to the right access zone.


Authentication Providers

Authentication Provider Recap

Authentication settings for the clusters are managed using an authentication


provider. OneFS supports several authentication providers. The external
authentication providers include Active Directory, LDAP, and NIS. Internal
authentication providers include the Local provider and File provider.

The PowerScale Administration course covers Active Directory and LDAP


configuration.

1: Active Directory is a Microsoft implementation of Lightweight Directory Access


Protocol (LDAP), Kerberos, and DNS technologies that can store information about
network resources. Active Directory can serve many functions, but the primary
reason for joining the cluster to an Active Directory domain is to perform user and
group authentication.

2: The Lightweight Directory Access Protocol (LDAP) is a networking protocol that


enables you to define, query, and modify directory services and resources. OneFS
can authenticate users and groups against an LDAP repository to grant them
access to the cluster.

3: The Network Information Service (NIS) provides authentication and identity


uniformity across local area networks. OneFS includes a NIS authentication
provider that enables you to integrate the cluster with the NIS infrastructure. NIS,
can authenticate users and groups when they access the cluster.


4: Kerberos is a network authentication provider that negotiates encryption tickets


for securing a connection. OneFS supports Microsoft Kerberos and MIT Kerberos
authentication providers on a cluster. If you configure an Active Directory provider,
support for Microsoft Kerberos authentication is provided automatically. MIT
Kerberos works independently of Active Directory.

5: The local provider provides authentication, and lookup facilities for user accounts
added by an administrator. Local authentication is useful when Active Directory,
LDAP, or NIS directory services are not configured or when a specific user or
application needs access to the cluster.

6: A file provider enables you to supply an authoritative third-party source of user


and group information to a PowerScale cluster. A third-party source is useful in
UNIX and Linux environments that synchronize the /etc/passwd, /etc/group, and
etc/netgroup files across multiple servers.

Note: The MIT Kerberos authentication provider is used with NFS,


HTTP, and HDFS.

Most providers use UIDs (user IDs), GIDs (group IDs), and SIDs (security IDs). A
major consideration in a multiprotocol environment is ensuring that users can
access their files regardless of the protocol they use. There are several ways to
address multiprotocol access. The first is that Active Directory supports RFC 2307,
which allows adding UNIX attributes to domain accounts. Other ways to map the
IDs together are discussed later in this topic.

Authentication Provider Features

Authentication providers support a mix of the following features.

• Authentication: All authentication providers support cleartext authentication. You can configure some providers to also support NTLM or Kerberos authentication.

• Users and groups: OneFS provides the ability to manage users and groups directly on the cluster.

• Netgroups: Specific to NFS, netgroups restrict access to NFS exports.

• UNIX-centric user and group properties: Login shell, home directory, UID, and GID. Missing information is supplemented by configuration templates or additional authentication providers.

• Windows-centric user and group properties: NetBIOS domain and SID. Missing information is supplemented by configuration templates.


Trusts and Pass-Through Authentication

AD uses trusts to authenticate users from a trusted domain. The cluster is a resource in the trusting domain, so users from trusted domains can access the cluster across the trust. Because domains in a forest automatically trust each other, a cluster should belong to only one AD domain within a forest. A cluster can join more than one AD domain only if the additional domain is untrusted.

The Active Directory authentication provider in a PowerScale cluster supports


domain trusts and NTLM or Kerberos pass-through125 authentication.

Users must have permission to the cluster resources, but pass-through


authentication grants trusted users access126 to the cluster resources.

125This means that a user authenticated to an AD domain can access resources that belong to any other trusted AD domain. Because the cluster is a domain resource, any authenticated user that belongs to a trusted domain can access cluster resources just as members of the cluster’s domain can access its resources.


OneFS uses access zones127 to partition a cluster into multiple virtual containers.

Verifying Authentication Providers

Verify the authentication providers by using the command isi auth status. A
status of online means that the cluster and providers can reach each other. Use
the isi auth refresh command to refresh the status of authentication
providers.
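For example, a quick provider health check from the CLI might look like the following sketch (all commands shown exist in current OneFS releases; the output format varies by version):

# Show the status of all authentication providers on the node
isi auth status

# Refresh the status after a provider has been repaired
isi auth refresh

# List the configured Active Directory and LDAP providers
isi auth ads list
isi auth ldap list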

126For this reason, a cluster needs to belong to only one Active Directory domain within a forest or among any trusted domains. A cluster should belong to more than one AD domain only to grant cluster access to users from multiple untrusted domains.

127
Access zones support configuration settings for authentication and identity
management services. Access zones are discussed shortly.


Network Information Service Overview

Definition

The Network Information Service (NIS) provides authentication and identity uniformity across local area networks.

OneFS includes an NIS authentication provider that enables you to integrate the cluster with the NIS infrastructure.

• NIS can authenticate users and groups when they access the cluster.
• The NIS provider exposes the passwd, group, and netgroup maps from an NIS
server.
• Hostname lookups are also supported.
• You can specify multiple servers for redundancy and load balancing.

Decision point: Are NIS and NIS+ the same? Does OneFS support NIS+?

Each NIS provider must be associated with a groupnet. The groupnet is a top-level
networking container that manages hostname resolution against DNS nameservers
and contains subnets and IP address pools. The groupnet specifies which
networking properties the NIS provider will use when communicating with external
servers. The groupnet associated with the NIS provider cannot be changed.
Instead you must delete the NIS provider and create it again with the new groupnet
association.

You can add an NIS provider to an access zone as an authentication method for
clients connecting through the access zone. An access zone may include at most
one NIS provider. The access zone and the NIS provider must reference the same
groupnet. You can discontinue authentication through an NIS provider by removing
the provider from associated access zones. NIS is different from NIS+, which
OneFS does not support.


NIS Configuration

You can view, configure, and modify NIS providers or delete providers that are no
longer needed. You can discontinue authentication through an NIS provider by
removing it from all access zones that are using it. By default, when you configure
an NIS provider it is automatically added to the System access zone.
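A minimal CLI sketch of creating an NIS provider and adding it to an access zone follows. The provider name, server, NIS domain, groupnet, and zone are hypothetical, and the exact flag names should be verified with isi auth nis create --help:

# Create an NIS provider associated with a groupnet
isi auth nis create nis-lab --servers=192.168.0.50 --nis-domain=lab.local --groupnet=groupnet0

# Add the provider to an access zone (providers are referenced as <type>:<name>)
isi zone zones modify zone-eng --add-auth-providers=lsa-nis-provider:nis-lab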

Configuration pages for the NIS provider.

Decision point: How do you have the cluster resolve the error message that indicates that the client is unable to reach the NIS servers?

Local Provider Overview

Definition

The local provider provides authentication and lookup facilities for user accounts added by an administrator.


Local authentication is useful when Active Directory, LDAP, or NIS directory services are not configured or when a specific user or application needs access to the cluster. Local groups can include built-in groups and Active Directory groups as members.

In addition to configuring network-based authentication sources, local users and groups can be managed by configuring a local password policy for each node in the cluster. OneFS settings specify password complexity, password age and reuse, and password-attempt lockout policies.

A use case for having local providers may be an organization with no networked providers (dark sites) that needs separate authentication and access to the cluster. For example, one group requires access to high-performance nodes while another group accesses the utility nodes.

Use Case:

In a PowerScale environment, there are 5000 Active Directory users that access shares in the access zone, and 10 Linux users that also access data in the access zone.

You don’t want the administrator adding LDAP to the access zone only to
authenticate 10 users.

What if the LDAP provider has 5000 users and not all of them are the same people as the 5000 AD users? Adding the LDAP provider to the access zone can become a serious issue because OneFS automatically maps the users. A user “John” in LDAP might not be the same “John” in AD, but OneFS sees them as the same “John,” and the token shows this. One day the LDAP John discovers there are some useful files in a directory. He mounts the directory, finds that he owns all kinds of files and subdirectories, does not know what any of it is, and deletes it all. AD John logs in to find all his files gone. Now there is confusion between the two Johns: one keeps recovering files while the other keeps deleting them. IT tickets are generated, and neither the Linux admin nor the AD admin knows the reason for the strange behavior.

The solution can be to have the storage admin remove LDAP from Access Zone,
flush and refresh the tokens. Add only the needed Linux users to the AZ local
provider.


Local Provider Configuration

When you create an access zone, each zone includes a local provider that allows
you to create and manage local users and groups.

Although you can view the users and groups of any authentication provider, you
can create, modify, and delete users and groups in the local provider only.

You can change the groups that are added automatically to the local provider of a zone in order to grant or deny access if the default configuration does not meet the company security profile.
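As a sketch, the ten Linux users from the use case described earlier could be added to the local provider of the access zone from the CLI. The user, group, zone, and password values are hypothetical; verify the options with isi auth users create --help:

# Create a local user in the access zone (the account lands in that zone's local provider)
isi auth users create linuxuser01 --zone=zone-eng --password=Secret123! --enabled=yes

# Create a local group and add the user to it
isi auth groups create linux-users --zone=zone-eng
isi auth groups modify linux-users --zone=zone-eng --add-users=linuxuser01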

Configuration pages for the Local provider.

Decision point: Can I create a local provider without creating an


access zone?
Answer: Local provider is created automatically when a new access
zone is created. Without creating an access zone you cannot create a
local provider.


File Provider Overview

Definition

A file provider enables you to supply an authoritative third-party source of user and group information to a PowerScale cluster.

A third-party source is useful in UNIX and Linux environments that synchronize the /etc/passwd, /etc/group, and /etc/netgroup files across multiple servers. Standard BSD /etc/spwd.db and /etc/group database files serve as the file provider backing store on a cluster.

On a PowerScale cluster, a file provider hashes passwords with "libcrypt". It is


recommended to use the Modular Crypt Format in the source /etc/passwd file to
determine the hashing algorithm.

Note: The integrated System file provider includes services to list,


manage, and authenticate against system accounts such as root,
admin, and nobody. It is recommended that you do not modify the
System file provider.

File Provider Configuration

You can configure one or more file providers, each with its own combination of
replacement files, for each access zone.

Each file provider pulls directly from up to three replacement database files: a
group file that has the same format as /etc/group; a netgroups file; and a binary
password file, spwd.db, which provides fast access to the data in a file that has
the /etc/master.passwd format.

• Password database files, which are also called user database files, must be in
binary format.
• You must copy the replacement files to the cluster and reference them by their
directory path.
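A minimal sketch of preparing the replacement files and creating a file provider follows. The paths and provider name are hypothetical, and the flag names should be checked with isi auth file create --help:

# Build the binary spwd.db from a copied master.passwd file (pwd_mkdb is the BSD tool available on OneFS)
pwd_mkdb -d /ifs/auth-files /ifs/auth-files/master.passwd

# Create the file provider that references the replacement databases
isi auth file create corp-files --password-file=/ifs/auth-files/spwd.db --group-file=/ifs/auth-files/group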


Configuration pages for the File provider.

Kerberos Overview

Definition

Kerberos is a network authentication provider that negotiates encryption tickets for securing a connection. OneFS supports
Microsoft Kerberos and MIT Kerberos authentication providers on a cluster.

MIT Kerberos supports certain standard network communication protocols such as


HTTP, HDFS, and NFS only. MIT Kerberos does not support SMB, SSH, and FTP
protocols.

 MIT Kerberos works independently of Active Directory128.


 Within a realm129, an authentication server has the authority to authenticate a
user, host, or service; the server can resolve to either IPv4 or IPv6 addresses.

128Support for Microsoft Kerberos authentication is provided automatically when an


Active Directory provider is configured.

129For MIT Kerberos authentication, you define an administrative domain that is


known as a realm.


 Key Distribution Center (KDC)130 and Service Principal Name (SPN)131.


 MIT Kerberos provider can be added to an access zone132.

Kerberos Configuration

130The authentication server in a Kerberos environment is called the Key


Distribution Center and distributes encrypted tickets.

131 When a user authenticates with an MIT Kerberos provider within a realm, an
encrypted ticket with the user service principal name is created. The ticket is
validated to securely pass the identification of user for the requested service. Each
MIT Kerberos provider must be associated with a groupnet.

132 MIT Kerberos provider can be added to an access zone as an authentication


method for clients connecting through the access zone. An access zone may
include at most one MIT Kerberos provider. The access zone and the Kerberos
provider must reference the same groupnet.


An MIT Kerberos realm is an administrative domain that defines the boundaries within which an authentication server has the authority to authenticate a user or service.

You can optionally define MIT Kerberos domains to allow additional domain extensions to be associated with an MIT Kerberos realm. You can create, modify, delete, and view an MIT Kerberos domain. A Kerberos domain name is a DNS suffix that you specify, typically using lowercase characters.

You can configure the settings of a Kerberos provider to allow the DNS records to locate the Key Distribution Center (KDC), Kerberos realms, and the authentication servers associated with a Kerberos realm.

Configuration pages for the Kerberos provider.

Decision point: How do I resolve issues with the error, Failed to join
realm: (LW_ERROR_DOMAIN_IS_OFFLINE) The domain is offline.
Link: See Troubleshoot Kerberos Issues on your Isilon Cluster guide
to resolve the error.

You can configure an MIT Kerberos provider for authentication without Active
Directory. Configuring an MIT Kerberos provider involves creating an MIT Kerberos
realm, creating a provider, and joining a predefined realm. Optionally, you can
configure an MIT Kerberos domain for the provider. You can also update the
encryption keys if there are any configuration changes to the Kerberos provider.
You can include the provider in one or more access zones.
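The configuration workflow just described can also be done from the CLI. A minimal sketch follows; the realm, KDC, administrative principal, and zone names are hypothetical, and the exact flags should be confirmed with the isi auth krb5 subcommand help for your OneFS release:

# Create the MIT Kerberos realm and point it at its KDC
isi auth krb5 realm create LAB.LOCAL --kdc=kdc1.lab.local

# Create the Kerberos provider and join the predefined realm
isi auth krb5 create --realm=LAB.LOCAL --user=admin/admin --password=Secret123!

# Include the provider in an access zone (the provider type prefix shown is an assumption; check isi zone zones view for the exact form)
isi zone zones modify zone-eng --add-auth-providers=lsa-krb5-provider:LAB.LOCAL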

To resolve the error from the decision point above, determine which domain is reporting as offline by running the isi auth status command. Determine which nodes are reporting the domain as offline by running the isi_for_array -s "isi auth status | grep -i <domain>" command. Certain ports must be open in order for the nodes to contact the DCs. Test whether these ports are open by running the following commands, where <fqdn> is the FQDN of the domain controller. Run these commands for any of the DCs that are reporting as offline:

• nc -z <fqdn> 88
• nc -z <fqdn> 389

• nc -z <fqdn> 445
• nc -z <fqdn> 464

If the port is open, the output looks similar to: Connection to


dc.domain.isilon.com 389 port [tcp/ldap] succeeded!

If the port is not open, no output is returned.

See the OneFS: Service Principal Names for Kerberos Authentication document to learn more about SPNs.


Protocols

OneFS File Sharing

• Multiprotocol support in OneFS enables


accessing files and directories on the
PowerScale cluster through:
− Server Message Block (SMB)133
− Network File System (NFS)134
− S3 Protocol135
− HTTP and HTTPS136
− FTP137

133Allows Microsoft Windows and macOS X clients to access files that are stored
on the cluster.

134
Allows Linux and UNIX clients that adhere to the RFC1813 (NFSv3) and
RFC3530 (NFSv4) specifications to access files that are stored on the cluster.

135The S3-on-OneFS technology enables the usage of Amazon Web Services


Simple Storage Service (AWS S3) protocol to store data in the form of objects on
top of the OneFS file system storage.

136Allows clients to access files that are stored on the cluster through a web
browser.

137Allows any client that is equipped with an FTP client program to access files that
are stored on the cluster through the FTP protocol.


• You can set Windows-based and UNIX-based permissions on OneFS files and
directories.
• With the required permissions and administrative privileges, you can create,
modify, and read data on the cluster through one or more of the supported file
sharing protocols.
• By default, all file sharing protocols are disabled.

SMB Share Permissions

Three settings for users or groups.

Configuring access to the cluster through SMB shares involves setting share
permissions.

Share permissions have only three settings for users or groups: Full control, Read-
write, or Read-only.

Share permissions need to be explicitly granted to a user or group. Otherwise, the


user or group members are given an implicit deny.

• Modified through WebUI, CLI, or MMC


• Members in multiple group entries get cumulative share permissions138


• Deny overrides allow139
• Order matters140
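As a sketch, share permissions can be inspected and granted from the CLI. The share name, path, zone, and group are hypothetical; confirm the flag names with isi smb shares permission create --help:

# Create a share in an access zone
isi smb shares create eng-data --path=/ifs/div-gen/engineering --zone=zone-eng

# View the current share permissions
isi smb shares permission list eng-data --zone=zone-eng

# Grant a domain group full control on the share
isi smb shares permission create eng-data --zone=zone-eng --group=DEES\\domain-eng --permission=full --permission-type=allow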

SMBv3 Encryption

• OneFS 8.1.1 and above supports SMBv3 encryption to secure access to data
over untrusted networks by providing on-wire encryption141 between the client
and PowerScale cluster.
• SMB encryption can be used by any clients142 which support SMBv3.

138 If a user is a member of multiple groups with different levels of permissions, the
permissions are added to give the user more permission. For example, user JaneD
is a member of Domain Admins and Domain Users. Domain Admins is given
permission of Full Control in the share, and Domain Users, is given Read-write
permission in the share. JaneD is granted Full Control in the share.

139 Another rule to remember is that a deny overrides an allow if ordered correctly
at the top of the permissions list.

140
If the deny is ordered after an allow permission, the cluster does not enforce the
deny.

141
Prevents an attacker from tampering with any data packet in transit without
needing any extra infrastructure.

142
Eligible clients include Windows Server 2012, 2012R2, 2016, Windows Client 8,
and Windows 10.


• You can configure SMBv3 encryption on a per-share, per-zone, or cluster-wide


basis.
• Configure PowerScale to allow or reject access to older clients that lack SMB
encryption support.

Windows 7 client connection is rejected because it lacks the SMB encryption support.
Windows 10 client data access will be encrypted as it supports SMBv3 encryption.

On the PowerScale side, encryption and decryption happen at the kernel level, using Intel CPU extensions for hardware acceleration to gain a performance benefit on next-generation PowerScale clusters. Encryption and decryption can be managed at the global, access zone, and individual share level on PowerScale:

• At the global level, on-wire data between clients and the PowerScale cluster is encrypted after authentication.
• At the access zone level, on-wire data between clients connecting to the access zone and the cluster is encrypted after authentication.
• At the share level, on-wire data between clients and the share is encrypted once clients have access to the share.


SMBv3 Encryption - Administration

• Enable and enforce cluster-wide SMBv3 encryption:


− WebUI: Navigate to Protocols > Windows Sharing (SMB) > Server
Settings.
− CLI: isi smb settings global modify --support-smb3-
encryption=yes --reject-unencrypted-access=yes
• Per-zone and per-share encryption settings can only be configured through the
OneFS CLI.
− Zone: isi smb settings zone modify --zone=<zone>

− Share: isi smb settings shares modify <share>


• To disable SMBv3 encryption, use the --revert-support-smb3-
encryption option.
• SMB encryption configuration changes require refreshing the SMB server.

Resource: Dell EMC PowerScale: SMB 3 Encryption in Healthcare


SMB Multichannel

• SMB Multichannel143 supports establishing a single SMB session over multiple


network connections.
− Increased throughput144
− Connection failure tolerance145
− Automatic discovery146
• You must enable SMB Multichannel on both the cluster147 and the Windows
client computer148.
• SMB Multichannel requires at least one of the following NIC configurations on
the client computer: Single RSS-capable NIC, Multiple NICs or Aggregated
NICs.

143 SMB Multichannel is a feature of the SMB 3.0 protocol.

144OneFS can transmit more data to a client through multiple connections over
high-speed network adapters or over multiple network adapters.

145When an SMB Multichannel session is established over multiple network


connections, the session is not lost if one of the connections has a network fault,
which enables the client to continue to work.

146SMB Multichannel automatically discovers the available network interfaces and their capabilities on the client and establishes the additional connections without manual configuration.

147 Enabled on the cluster by default.

148 Supports clients are Windows Server 2012, 2012 R2 or Windows 8, 8.1 clients.


• SMB Multichannel only works between a client and a single PowerScale


node149.
• SMB can be enabled or disabled using the OneFS CLI only: isi smb
settings global modify --support-multichannel={yes | no}

Important: When SMB Multichannel is enabled, avoid using LACP on the PowerScale cluster. SMB Multichannel automatically detects the IP addresses of both 10GbE/40GbE interfaces on the client and load balances across each of the two interfaces on the dual-ported NIC.

149 SMB Multichannel cannot share the load between PowerScale nodes.


SMB Continuous Availability Administration

You can enable continuous availability only when creating a share.

• SMB CA can be enabled for SMB 3.0 capable Windows clients150 in OneFS 8.0
and later.

150 CA is supported with Microsoft Windows 8, Windows 10, and Windows 2012 R2
clients.


• Enable CA while creating a share. The --continuously-available option


is used in the CLI equivalent command.
• Configure the CA Timeout Value151 while creating or modifying a share. The --ca-timeout option is used.
• Configure Strict CA Lockout152 while creating or modifying a share. The --strict-ca-lockout option is used.
• You can configure write integrity settings to control the stability of writes to the
share using the --ca-write-integrity option.

− None153
− Write-read coherent154
− Full155
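Putting these options together, a hedged CLI sketch of creating a continuously available share might look like the following. The share name and path are hypothetical, and the value spelling for --ca-write-integrity should be checked against isi smb shares create --help:

isi smb shares create ca-share --path=/ifs/data/ca-share --continuously-available=yes --ca-timeout=120 --strict-ca-lockout=yes --ca-write-integrity=write-read-coherent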

151 SMB3 uses persistent handles to provide CA by mirroring the file state across
all nodes. CA timeout value specifies the amount of time you want a persistent
handle to be retained after a client is disconnected or a server fails. The default is 2
minutes.

152 When enabled, prevents a client from opening a file if another client has an
open but disconnected persistent handle for that file. When disabled, OneFS issues
persistent handles, but discards them if any client other than the original opener
tries to access the file. Strict timeout is enabled by default.

153Continuously available writes are not handled differently than other writes to the
cluster. If you specify none and a node fails, you may experience data loss without
notification. This setting is not recommended.

154 Writes to the share are moved to persistent storage before a success message
is returned to the SMB client that sent the data. This is the default setting.


SMB Shares Advanced Settings

SMB share settings can be configured specific to a share or an access zone.

It is recommended that you configure advanced SMB share settings156 only if you
have a solid understanding of the SMB protocol.

For the complete list of advanced options, view the PowerScale OneFS CLI
Command Reference guide.

• Create Permission: Sets the default source permissions to apply when a file or directory is created. The default value is Default acl.

• Directory Create Mask: Specifies UNIX mode bits that are removed when a directory is created, restricting permissions. Mask bits are applied before mode bits are applied. The default value is that the user has Read, Write, and Execute permissions.

• Directory Create Mode: Specifies UNIX mode bits that are added when a directory is created, enabling permissions. Mode bits are applied after mask bits are applied. The default value is None.

155 Writes to the share are moved to persistent storage before a success message
is returned to the SMB client that sent the data, and prevents OneFS from granting
SMB clients write-caching and handle-caching leases.

156The advanced settings affect the behavior of the SMB service. Changes to
these settings can affect all current and future SMB shares.


• File Create Mask: Specifies UNIX mode bits that are removed when a file is created, restricting permissions. Mask bits are applied before mode bits are applied. The default value is that the user has Read, Write, and Execute permissions.

• File Create Mode: Specifies UNIX mode bits that are added when a file is created, enabling permissions. Mode bits are applied after mask bits are applied. The default value is that the user has Execute permissions.

• Impersonate Guest: Determines guest access to a share. The default value is Never.

• Impersonate User: Allows all file access to be performed as a specific user. This must be a fully qualified username. The default value is No value.

NFS Aliases

Create NFS alias using WebUI.


• NFS aliases157 provide shortcuts for directory path names in OneFS. If those
path names are defined as NFS exports, NFS clients can specify the aliases as
NFS mount points.
• Each alias must point to a valid path on the file system158.
• Aliases and exports are completely independent159.
• NFS aliases are zone-aware160.
• WebUI: Navigate to Protocols > UNIX Sharing (NFS) > NFS Aliases.
• CLI command: isi nfs aliases create/modify/delete
• Example: An alias named /engineering-gen maps to /ifs/div-gen/engineering/general-purpose. An NFS client could mount that directory through either of:

157 NFS aliases are designed to give functional parity with SMB share names within
the context of NFS. Each alias maps a unique name to a path on the file system. It
is useful for long path names.

158While this path is absolute, it must point to a location beneath the zone root (/ifs
on the System zone). If the alias points to a path that does not exist on the file
system, any client trying to mount the alias would be denied in the same way as
attempting to mount an invalid full pathname.

159
You can create an alias without associating it with an NFS export. Similarly, an
NFS export does not require an alias. As a best practice, it is recommended to use
NFS aliases for long directory path names.

160 By default, an alias applies to the client's current access zone. To change this,
you can specify an alternative access zone as part of creating or modifying an
alias. Each alias can only be used by clients on that zone, and can only apply to
paths below the zone root. Alias names are unique per zone, but the same name
can be used in different zones—for example, /home.


− [root@centos ~]# mount cluster_ip:/engineering-gen
− [root@centos ~]# mount cluster_ip:/ifs/div-gen/engineering/general-purpose
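The alias in this example could be created from the CLI with something like the following sketch (the access zone name is hypothetical, and the argument order should be verified with isi nfs aliases create --help):

# Create the alias and bind it to a zone
isi nfs aliases create /engineering-gen /ifs/div-gen/engineering/general-purpose --zone=zone-eng

# Confirm the alias and its health
isi nfs aliases list --zone=zone-eng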

NFS Root Squash

Maps all users with UID 0 to


another UID

Default mapping is to nobody,


UID 65534

Configure root squash while creating an export using the WebUI.

• The root squash allows all users with a UID of 0 to be given a different UID
while connected to that export.
• The default UID given is 65534, which has a name of nobody or nfsnobody.
• The root-squashing rule prevents root users on NFS clients from exercising root
privileges on the NFS server.
• The exact user that a root UID is mapped to can be changed per export.
• CLI: isi nfs exports modify 1 --map-root-enabled true --map-
root nobody

Best Practice: It is a best practice to use root squash with every


export. If the root-squashing rule is not in effect, you can implement it
for the default NFS export.


NFS Security Considerations

• Set up an external firewall with appropriate rules and policies to allow only
trusted clients and servers to access the cluster.
• Allow restricted access only to ports that are required for communication161 and
block access to all other ports on the cluster.
• Configure one or more security types: UNIX (system)162, Kerberos5, Kerberos5
Integrity, Kerberos5 Privacy163.
• Limit root access to the cluster to trusted host IP addresses.
• Ensure all new devices added to the network are trusted164.

161
Ports: 2049 for NFS, 300 for NFSv3 mount service, 302 for NFSv3 NSM, 304 for
NFSv3 NLM, and 111 for ONC RPC portmapper.

162 The default security flavor (UNIX) relies upon having a trusted network.

163 If you do not completely trust everything on your network, then the best practice
is to choose a Kerberos option. If the system does not support Kerberos, it will not
be fully protected because NFS without Kerberos trusts everything on the network
and sends all packets in cleartext.

164Use an IPsec tunnel. This option is very secure because it authenticates the
devices using secure keys. Alternatively, configure all of the switch ports to go
inactive if they are physically disconnected. In addition, ensure that the switch ports
are MAC limited.


Protect PowerScale system with an external


firewall.

Configure security flavors using WebUI or CLI.

NFS Exports Advanced Settings

NFS export settings can be configured globally for all exports or specific to an
export.

It is recommended that you configure advanced NFS export settings165 only if you
have a solid understanding of the NFS protocol.

For the complete list of advanced options, view the PowerScale OneFS CLI
Command Reference guide.

165Changes to default export settings affect all current and future NFS exports that
use default settings.


• Block Size: The block size used to calculate block counts for NFSv3 FSSTAT and NFSv4 GETATTR requests. The default value is 8192 bytes.

• Directory Transfer Size: The preferred directory read transfer size reported to NFSv3 and NFSv4 clients. The default value is 131072 bytes.

• Read Transfer Max Size: The maximum read transfer size reported to NFSv3 and NFSv4 clients. The default value is 1048576 bytes.

• Write Transfer Max Size: The maximum write transfer size reported to NFSv3 and NFSv4 clients. The default value is 1048576 bytes.

• Commit Asynchronous: If set to yes, allows NFSv3 and NFSv4 commit operations to be asynchronous. The default value is No.

• Max File Size: Specifies the maximum file size to allow. This setting is advisory in nature and is returned to the client in a reply to an NFSv3 FSINFO or NFSv4 GETATTR request. The default value is 9223372036854776000 bytes.

• Encoding: Overrides the general encoding settings the cluster has for the export. The default value is DEFAULT.


OneFS S3 Key Management

Users have only one access key ID. However, users may have at most two secret keys when the old key has an expiry date set.

WebUI: Protocols > Objects Storage (S3) > Key Management.

• S3 uses its own method of authentication by generating access key166 for a


user.
• The access key for a user has two components:
− Access ID167

166
Verify signatures using AWS Signature Version 4 or AWS Signature Version 2
and validate it against the S3 request.

167The access key ID can be a 16 to 128-byte string. The access ID indicates who
the user is. OneFS generates one access ID per user. For example, OneFS may
generate access ID 1_joe_accid for user joe. The prefix number represents the
user’s access zone ID. Each access ID would have a latest secret key without an
expiry time set and an old secret key that has an expiry time set.


− Secret Key168
• OneFS treats unauthenticated requests as anonymous requests made by the
nobody user (UID 65534).
• Only users in the Administrator role are authorized to generate access keys.169
• CLI command: isi s3 keys create
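A short CLI sketch of generating a key for a user follows. The username and zone are hypothetical, and the expiry flag name is an assumption to verify with isi s3 keys create --help:

# Generate a new access ID and secret key for the user in an access zone
isi s3 keys create joe --zone=zone-eng

# Generate a new key while keeping the old secret key valid for 30 more minutes (flag name assumed)
isi s3 keys create joe --zone=zone-eng --existing-key-expiry-time=30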

OneFS S3 Multipart Upload

• The multipart upload allows users to upload new large files or make a copy of
an existing file in parts for better uploading performance.
• Parts are uploaded to the temporary directory .isi_s3_parts_UploadId,
and the temporary directory is created under the target directory.
• A part has a maximum size of 5 GB; each part except the last must be at least 5 MB.
• After all the parts are uploaded successfully, multipart upload is completed by
concatenating the temporary files to the target file.

168The secret key is used to generate the signature value along with several
request-header values. After receiving the signed request, OneFS uses the access
ID to retrieve a copy of the secret key internally, recompute the signature value of
the request, and compare it against the received signature. If they match, the
requester is authenticated, and any header value that was used in the signature is
now verified to be untampered as well.

169If an administrator creates a new secret key for a user and forgets to set the
expiry time, the administrator cannot go back and set the expiry time again. The
new key is created and the old key is set to expire after 10 minutes, by default.


1: The client initiates the multipart upload.

2: The data of each part is written to a dedicated file.

3: After receiving the client's complete multipart upload request, OneFS finishes the multipart upload operation by concatenating the temporary files to the target file.
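Because the S3 API is standard, a client-side tool such as the AWS CLI performs the multipart exchange automatically for large objects. A hedged sketch against a cluster follows; the bucket, hostname, credentials, and the OneFS S3 HTTPS port (assumed here to be the default 9021) are illustrative:

# Configure the access ID and secret key generated by OneFS
aws configure set aws_access_key_id 1_joe_accid
aws configure set aws_secret_access_key <secret_key>

# Copy a large file; the CLI splits it into parts and completes the multipart upload
aws --endpoint-url https://cluster.dees.lab:9021 s3 cp ./bigfile.iso s3://eng-bucket/bigfile.iso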

OneFS S3 Bucket and Object Operations

Applications developed using the S3 API can access OneFS files and directories
as objects using the OneFS S3 protocol.

Listed in the table are the common OneFS S3 bucket and object operations.

For the complete list and description, view the Dell EMC PowerScale: OneFS S3
API Guide.


Bucket Operations Object Operations

CreateBucket170 GetObject171

ListObjects172 DeleteObject173

GetBucketLocation174 HeadObject175

DeleteBucket176 PutObject177

170 The PUT operation is used to create a bucket. Anonymous requests are never
allowed to create buckets. By creating the bucket, the authenticated user becomes
the bucket owner.

171Retrieves objects from OneFS through the S3 protocol. If read permission is


granted to the nobody user in OneFS, a client can retrieve the object without using
an authorization header.

172 The API returns some or all (up to 1,000) of the objects in a bucket.

173Delete a single object from a bucket. Deleting multiple objects from a bucket
using a single request is not supported.

174 Returns the location as an empty string.

175 HEAD operation retrieves metadata from an object without returning the object
itself. This operation is useful if you are only interested in an object's metadata. The
operation returns a 200 OK if the object exists and if you have permission to
access it. Otherwise, the operation might return responses such as 404 Not Found
and 403 Forbidden.

176 Delete a bucket. When a bucket is deleted, OneFS only removes the bucket
information while preserving the data under the bucket.


HeadBucket178 CopyObject179

ListBuckets180 CreateMultipartUpload181

ListMultipartUploads182 UploadPart183

177 Add an object to a bucket.

178Determines if a bucket exists and if permission is granted to access it. The


operation returns a 200 OK if the bucket exists and if you have permissions to
access it. Otherwise, the operation might return responses such as 404 Not Found
and 403 Forbidden.

179Create a copy of an object that is already stored in OneFS. You can treat it as
server-side-copy which reduces the network traffic between the clients and OneFS.

180 Get a list of all buckets owned by the authenticated user of the request.

181Initiate a multipart upload and return an upload ID. This upload ID is used to
associate with all the parts in the specific multipart upload. You can specify this
upload ID in each of your subsequent upload part requests. You also include this
upload ID in the final request to either complete or cancel the multipart upload
request.

182List in-progress multipart uploads. An in-progress multipart upload is a multipart


upload that has been initiated using the Initiate Multipart Upload request but has
not yet been completed or aborted.

183Upload a part in a multipart upload. Each part must be at least 5 MB, except the
last part. The maximum size of each part is 5 GB.


HTTP and HTTPS

• OneFS includes a configurable Hypertext Transfer Protocol (HTTP) service. 184


• HTTP is used to request files that are stored on the cluster and to interact with
the web administration interface.
• OneFS supports a form of the web-based DAV (WebDAV)185 protocol that
enables users to modify and manage files on remote web servers.

184Each node in the cluster runs an instance of the Apache HTTP Server to
provide HTTP access. You can configure the HTTP service to run in different
modes.

185
OneFS performs distributed authoring, but does not support versioning and
does not perform security checks.


• OneFS supports both HTTP and its secure variant, HTTPS186.


• Both HTTP and HTTPS are supported for file transfer, but only HTTPS is
supported for API calls.

Important: HTTP and FTP only work for the System access zone.

HTTP Administration

You can configure HTTP and DAV to enable users to edit and manage files
collaboratively across remote web servers.

186
HTTP Secure (HTTPS) encrypts information and then exchanges it. With
HTTPS the message is only understood by the sender and the recipient. Anyone
who opens the message in between cannot understand it.


Step 1

Click Protocols and then go to HTTP settings. In the Service area, select one of
the following settings:

• Enable HTTP187
• Disable HTTP and redirect to the OneFS Web Administration interface 188

187Allows HTTP access for cluster administration and browsing content on the
cluster.

188Allows only administrative access to the web administration interface. This is the
default setting.


• Disable HTTP189
• CLI: isi http settings modify --service= {enabled | disabled
| redirect}
• Enable or disable access to a PowerScale cluster through the Apache service
over HTTPS: isi_gconfig -t http-config https_enabled={true |
false}

189Closes the HTTP port that is used for file access. Users can continue to access
the web administration interface by specifying the port number in the URL. The
default port is 8080.


Step 2

Type or choose a path within /ifs as the document root directory. Then, select the
HTTP authentication method:

• Off190
• Basic Authentication Only191
• Integrated Authentication Only192

190 Disables HTTP authentication.

191 Enables HTTP basic authentication. User credentials are sent in clear text.

192 Enables HTTP authentication via NTLM, Kerberos, or both.


• Integrated and Basic Authentication193


• Basic Authentication with Access Controls194
• Integrated Authentication with Access Controls195
• Integrated and Basic Authentication with Access Controls196
• CLI: isi http settings modify --server-root=/ifs --basic-
authentication={yes | no} --integrated-authentication={yes
| no}

193 Enables both basic and integrated authentication.

194Enables HTTP basic authentication and enables the Apache web server to
perform access checks.

195Enables HTTP integrated authentication via NTLM and Kerberos, and enables
the Apache web server to perform access checks.

196Enables HTTP basic authentication and integrated authentication, and enables


the Apache web server to perform access checks.


Step 3

• To allow multiple users to manage and modify files collaboratively across


remote web servers, select Enable WebDAV.
• Select Enable access logging.
• Click Save Changes.
• CLI: isi http settings modify --dav=yes --enable-access-
log=yes

FTP File Sharing

• File Transfer Protocol (FTP) allows systems with an FTP client to connect to the
cluster and exchange files.
• OneFS includes a secure FTP service called vsftpd, which stands for Very
Secure FTP Daemon, that you can configure for standard FTP and FTPS file
transfers.
• You can set the FTP service to allow any node in the cluster to respond to FTP
requests through a standard user account.


• When configuring FTP access, ensure that the specified FTP root is the home
directory of the user who logs in to the cluster197.
• Administration:

− WebUI: Navigate to Protocols > FTP settings.


− CLI: isi ftp settings modify


1: FTP is disabled by default. You also need to enable the vsftpd service by
running the isi services vsftpd enable command.

2: Allow users with "anonymous" or "ftp" as the username to access files and
directories without requiring authentication. This setting is disabled by default.

3: Allow local users to access files and directories with their local username and
password, allowing them to upload files directly through the file system. This setting
is enabled by default.

197 For example, the FTP root for local user jsmith should be /ifs/home/jsmith.


4: Allow files to be transferred between two remote FTP servers. This setting is
disabled by default.

Challenge

Lab Assignment: Demonstrate the multitenant, multiprotocol use case for


a PowerScale cluster.
1) Verify access to the same directory using SMB, NFS, S3, HTTP and
FTP.


Advanced Authorization


Module Objectives

After completion of this module, you can:

• Describe the types of protocol access.


• Describe share and file permissions and their viewing commands.
• Compare various ways to modify permissions.
• Describe adding and removing ACEs on the cluster.
• Discuss the OneFS relationship with different authentication providers.
• Verify user mapping rules and access tokens.


Multiprotocol Permissions

Multiprotocol Overview

Multiprotocol support ensures the consistency of secured data access, regardless


of protocol. Different users, operating systems, and implementations can write and
read to the same files on the cluster. There are two methods for data access:
Single protocol198 and Multiprotocol199 access.

198 Single data access protocols are self-contained. Windows users access
Windows file servers through the Server Message Block (SMB) protocol. UNIX
users access file servers through the Network File System (NFS) protocol. When a
user connects to a cluster to read and write files, the protocol assesses the security
of file against a set of permissions. The protocol assesses to determine whether
access will be allowed. Each protocol has its own type of file permissions to the
user and to the file(s), which prevents a UNIX user from accessing Windows file
servers, and conversely. Each protocol is a closed system.

199 Multiprotocol access puts the NAS platform in the middle, creating a system
where different users can connect to the same file server (or cluster) through
different protocols. The multiprotocol NAS platform handles and stores the
permissions for each protocol and user.


In OneFS, multiprotocol means that users who connect through NFS, SMB, and other protocols can access the same files and directories. If necessary, you can create a file or a directory that a Windows or UNIX client200 accesses. However, unlike other file systems or NAS systems, which might maintain protocol permissions separately or rely on user mapping, OneFS uses a single unified permission model201.

200 OneFS supports the standard UNIX tools for viewing and changing permissions,
"ls, chmod, and chown". For more information, run the "man ls", "man chmod", and
"man chown" commands.

201The unified permission model is implemented by creating a common access


token. The access token is generated when a user connects to the cluster. In
OneFS, your identity (or multiple identities from different directory services) is
encapsulated into a single token that represents you to OneFS. The access token
contains your user identifier (UID), user security identifier (SID), Windows group
memberships (SIDs), group identification number (GIDs) from LDAP group
memberships, and more. All those identities are rolled into one, contained in the
token. This token is then presented directly against the file permissions stored on
the OneFS file system.


File Access Checking

The actual file permissions of a user are entirely defined by comparing the access token against the permissions on the file.

OneFS files have two distinct aspects: the permissions state202 and access permissions203.

202 When a file is accessed over SMB, OneFS generates a synthetic ACL that is
based directly on the POSIX permissions of file. The synthetic ACL is a correlation
of the POSIX permissions to an ACL. The synthetic ACL is not persistent: it is not
stored on disk. OneFS only creates the synthetic ACL at the time the file is
accessed using the SMB protocol.


* When the NFS client issues an "ls", the approximated POSIX permissions are seen, but actual file access is evaluated against the ACL. The plus sign (+) indicates that the file has a real ACL.

1:

When you access a file that is in the real POSIX/synthetic ACL state using NFS,
OneFS checks the standard POSIX permissions. When a Windows user checks
the permissions of a file that has the POSIX authoritative state by using the
Windows Explorer Security tab, that user expects to see the file ACLs 204.

203Access permissions determine whether a user can access a file or directory.


Access permissions are based on the comparison of the user access token and the
actual permissions.

204When a file is accessed over SMB, OneFS generates a synthetic ACL that is
based directly on the POSIX permissions of file. The synthetic ACL is a direct, one-
to-one correlation of the POSIX permissions to an ACL. The synthetic ACL is not
persistent: it is not stored on disk. OneFS only creates the synthetic ACL at the
time the file is accessed using the SMB protocol.


At the NFS side, File1 is in the real POSIX/synthetic ACL state. If you are a
Windows user looking at permissions over SMB of File1 using a Windows Explorer
Security tab, you do not want to see POSIX bits205. The resulting ACLs emulate the
POSIX permissions and look like normal ACLs. This action does not change the
actual permissions. It also does not affect actions that you can take on File1.

2:

If a file on the PowerScale cluster is in the real ACL/approximated POSIX state,


that real ACL is authoritative206 for file permissions regardless of protocol access. If
you are a UNIX user looking at the same file permissions over NFS, the real ACL207
of that file defines your access.

An ACL contains multiple separate entries208. The returned POSIX is an approximation of the real ACL, to the best of UNIX's ability. OneFS must check file access directly against the ACL, as the ACL is likely a lot more granular.

205For one thing, POSIX bits are not as rich as SMB ACLs, and for another, you
expect to see SMB-style ACLs for File1. OneFS accommodates Windows users
expectations by automatically generating a set of synthetic ACLs on the fly based
on the POSIX bits.

206Accessing that file from Windows/SMB means that OneFS performs the file
access check directly against the ACL as usual.

207 The POSIX bits do not matter from an access check perspective, but OneFS will
still need to show the POSIX mode bits. That is because when you issue an "ls"
command on a file across NFS, OneFS has to return an NFS view of file
permissions. However, those POSIX mode bits do not represent the file
permissions: the REAL ACL does.

208For example, you can add many UIDs, GIDs, and SIDs to define permissions if
you need to. But POSIX has only three bits to work with: read/write/execute for the


Discussion Point: Can I have both, POSIX and ACL permission


states over files and directories?

The permissions state and the access permissions on a file or directory do not
affect each other. Access permissions must be consistent (identical) regardless of
the file permissions state. In OneFS, files and directories can have only one set of
permissions and can exist in only one of two states: POSIX, ACL.

Share Permissions and File Permissions

The share permissions and file permissions are added up and compared to each
other when determining what the effective permissions are on any object.

• Whichever permission is more restrictive is the


effective permission.
− For example, if a share has permission giving
a user read/write control, but the file
permission restricts that user to read-only, the
effective permission is read-only.
• Check the permissions set on the parent directory. The permissions on the parent directory could be the reason that the user is getting permission-denied messages.

three permissions classes: user/group/everyone. Because there is no exact match


of ACLs to POSIX bits, OneFS cannot do a one-to-one mapping of permissions
settings.


File Permissions Viewing Commands

In order to check PowerScale OneFS file permissions, from the PowerScale CLI,
the extended ls commands are used.

Command Applicable to Description


Filename or Directory

ls –le <filename> Filename Displays the file


permission state, ACLs,
owner, and group
information.

ls –len <filename> Filename Displays the file


permission state, ACLs,
owner, and group
information numerically.

ls –led <directory> Directory Displays the directory


permission state, ACLs,
owner, and group
information.

ls –lend <directory> Directory Displays the directory


permission state, ACLs,
owner, and group
information numerically.

Note: The ls command lists the directory contents. The -l option is to list files in the
long format. The -e option prints the Access Control List (ACL). The -d option lists
on the directory and not its contents. The n option displays user and group IDs
numerically.
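For example, checking a directory and a file inside it might look like the following sketch (the path is hypothetical):

# Show the permission state, ACL, owner, and group of the directory itself
ls -led /ifs/div-gen/engineering

# Show the same information for a file, with numeric UIDs and GIDs
ls -len /ifs/div-gen/engineering/file1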

Changing Owner and Groups Using CLI

Use -s before the username or group name in the command if the user or group is
located in an Active Directory authentication provider.


For example, the command for changing the owner of the file to the student
account in the dees.lab AD domain is chown -s DEES\\student file1.

The owner root and group owner wheel have read, write, and execute permissions over file1; all other users are allowed only to read the file. Users hayden and domain admins cannot write to file1 because hayden and domain admins are not in the group wheel.

The owner and group owner have all the permissions; all other users are allowed only to read the file.

Without using the -s flag, users hayden and domain admins cannot change the owners of file1.

After using the -s flag, hayden and domain admins are the owner and group owner of file1; they now have read, write, and execute permissions over file1.


Modifying File Permissions over SMB

File permissions can be modified over SMB.

• The account that is used to log in to that share must have the appropriate permissions209.
• Inheritance is automatically added to the objects when adding or modifying permissions210 over SMB.

209 File permissions are applied to users or groups.

210Administrators can disable this behavior per directory through SMB. The ability
to modify permissions over SMB can also be disabled through a global cluster
setting.



Export Permissions for NFS

Export Permissions

Can I have nesting access to an Export?

For example, an export that has read-only access for a subnet, and also has read-write access for an IP within that subnet range.

Configuring access to the cluster using NFS requires configuring NFS exports and
associated permissions. With exports, permissions are not granted to users and
groups, but rather to hosts by hostname or IP, entire subnets, or netgroups. Hosts
should only be in one field per export.

Nesting Access Example

Yes, you can create an export with nesting access. In the following example, all the clients within the IP range 192.168.3.3/24 have read-only permission. The client with IP 192.168.3.3 has read-write permissions.


The IP 192.168.3.3 is configured as both a read-only client and a read-write client. Since the read-write client entry takes precedence, the IP 192.168.3.3 is able to write to the NFS export.


Because the IP 192.168.3.60 is configured as read-only, that client is not able to write to the NFS export.
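A hedged CLI sketch of this export configuration follows. The path and zone are hypothetical, and whether the export path is positional or passed through a flag should be verified with isi nfs exports create --help:

isi nfs exports create /ifs/div-gen/engineering --read-only-clients=192.168.3.0/24 --read-write-clients=192.168.3.3 --zone=zone-eng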

You can enter a client by hostname, IPv4 or IPv6 address, subnet, netgroup, or
CIDR range. Client fields:

• Clients: Clients that are specified in the generic Clients field are given read/write permission, unless the Restrict access to read-only checkbox is also selected. Selecting it treats all hosts in the Clients field as read-only but does not affect the hosts in the Always Read-Write Clients or Root Clients fields.
• Always Read-Write Clients: Clients in the Always Read-Write Clients field are
given read/write permissions. The Restrict access to read-only checkbox does
not apply to hosts in this field.
• Always Read-Only Clients: Clients in the Always Read-Only Clients field are
given read-only permissions.
• Root Clients: Clients in the Root Clients field are mapped as root if the user
logged in to the local host is logged in as root. This option gives users
significant privileges within the export directories and should be avoided where
possible. Avoid this issue with the other permission fields by setting the cluster
to automatically perform root squash for all root users when connecting to the
cluster.
• Map Users: The Map Users options allow the administrator to specify users to
be mapped to other UIDs and thus be treated as if they are that other user
when connected to the cluster. This can also be done per access zone with
user-mapping rules. It also allows the cluster to squash root.


Adding and Removing ACE

Add ACE to ACL using CLI.

The chmod command in OneFS enables administrators to modify ACLs directly on
the cluster.

Command: chmod +a user|group allow|deny

If the user or group for this ACE is in an AD domain, the domain must be specified
as part of the username or group name as shown using either
'[user|group]@domain_name' or 'domain_name\[user|group]'

For the AD user student1 and AD group domain admins, the domain dees.lab is specified as part
of the user or group name.

Remove ACE from ACL using CLI.

As ACEs can be added individually, they can also be removed. The command uses
chmod -a# <ACE number> <filename>. It is important to verify that the right
ACE is being removed from the ACL when using this option.
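As a hedged sketch (the file, account, and permission keywords are illustrative; see
the ACL documents referenced in the note below for the full list of ACE permissions):

# Show the current ACL so the ACE index numbers can be checked
ls -le file1

# Add an allow ACE for an AD user (the domain must be part of the name)
chmod +a user DEES\\student1 allow file_gen_read,file_gen_write file1

# Remove the ACE at index 2, after verifying it is the intended entry
chmod -a# 2 file1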


Note: You can see the document Dell EMC Isilon: Access Control
Lists on HDFS and Isilon OneFS for a list of PowerScale ACEs.
Advanced options can be used to set up ACLs. See the document
Access Control Lists on Dell EMC PowerScale OneFS.

Use the command chmod +a user|group allow|deny to add an ACE. Enter
the word user or group, then the username or group name that the ACE applies to,
and then the word allow or deny. The remainder of the command includes the
permissions and inheritance options to apply to the file or directory. The command
ends with the file or directory for this ACE.

Permission Repair Job

The OneFS PermissionRepair job, as the name suggests, provides an automated
way to fix access controls across a dataset. Run the PermissionRepair job against
a target which can be a file or directory under /ifs. Depending on the job mode,
Permission Repair enables:

• Permissions to be copied from a template to target


• The on-disk identity type and permission settings to be converted

• The target to acquire inheritable permissions from a template

The job contains three different execution options, or modes, depending on the
resolution required.

Mode      Usage

Clone     Used when a directory tree with a large file count requires a new set of
          permissions, such as switching from POSIX mode bits to Windows ACLs.

Convert   For modifying the on-disk identity type and permission settings, such as
          converting a directory path to UNIX identity type.

Inherit   Typically used whenever an inherited access control entry (ACE) is added
          to an existing directory tree.

Caution: Using Clone or Inherit changes the ownership and group of the
new directory.


Configuring Permission Repair Job

The PermissionRepair job consists of a single phase that is composed of several
tasks211. Successful execution of a work item produces an item result, which might
contain a count of the number of retries that are required to repair a file, plus any
errors that occurred during processing.

211Tasks are multiple individual work items that are divided up and load balanced
across the nodes within the cluster.


A quick way to apply permissions to a new directory is to create an empty directory
and then apply the desired default permissions on it. Next, use the PermissionRepair
job to clone the permissions to new directories.

This method saves time if many new directories are needed, as the basic
permissions do not need to be applied manually on every directory. An
administrator can also change the impact configuration of a running job without
changing the default for all instantiations of that job: rather than updating the basic
configuration of the job, start the job and then change the parameters of that one
running job, which avoids changing the default configuration.
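A hedged sketch of starting the job in clone mode from the CLI (the template and
target paths are placeholders, and the exact option names for the mode, template,
and target vary between OneFS releases, so confirm them with
isi job jobs start --help and isi job types view):

# Clone permissions from a pre-configured template directory onto a new tree
# (option names assumed; verify against your OneFS release)
isi job jobs start PermissionRepair --mode=clone \
    --template=/ifs/data/perm_template --paths=/ifs/data/new_project

# Watch the job progress
isi job jobs list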

Access Denied Examples

Review the question and answer provided in each tab along with the directory and
share permissions.


Example 1

Will the AD user named “Administrator” be able to write to the share?

Answer: No. POSIX is authoritative and directory permission only allows root to
write to the test directory.

Example 2

Will the AD user named “Administrator” be able to write to the share?


Answer: No. The share permission gives Everyone read-only permission, including
the “Administrator” account.

Example 3

Will the AD user named “hayden” be able to write to the share?


Answer: No. The user “hayden” is a member of the group "Users”, which has a
deny write permission set on the directory.

Note: The SMB ACL is implemented as a canonical ACL.212

212 During the translation to SMB ACL, OneFS translates the internal ACL to a
canonical ACL and sends it to SMB clients. A canonical ACL always places an
explicit ACE before an inherited ACE, and always places a deny ACE before an
allow ACE. Since SMB clients are always presented with a reordered canonical
ACL rather than the actual ACL in OneFS, users need to be careful when editing
ACLs through SMB.


Challenge

Lab Assignment: Understand the impact of multiple users, using different
protocols, accessing the same files.
1) Implement NFS root squash.
2) Change permissions for a file accessed by both a Windows and Linux
user.
3) Execute the Permissions Repair job.


User Mapping

User Mapping Overview

User mapping provides a way to control access by specifying a user’s complete list
of the security identifiers, user identifiers, and group identifiers. OneFS uses the
identifiers— which are commonly called SIDs, UIDs, and GIDs respectively—to
determine ownership and check access.

As multiple authentication providers from different environments are added, the
challenges of a multiprotocol environment start with identifying users across
platforms.

User Mapping213 vs No User Mapping214

User Mapping Example

213With the user mapping service, rules are configured to manipulate a user’s
access token by modifying which identity OneFS uses, adding supplemental user
identities, and changing a user’s group membership. OneFS maps users only
during login or protocol access.

214 When there is no user mapping, OneFS authenticates the user from Active
Directory and builds an access token that prioritizes the account information from
Active Directory. If rules are not configured, a user authenticating with one directory
service receives full access to the identity information in other directory services
when the account names are the same.


1: The default mapping provides a user with a UID from LDAP and a SID from the
default group in Active Directory. The user’s groups come from Active Directory
and LDAP, with the LDAP groups added to the list.

2: OneFS is connected to two directory services, Active Directory and LDAP.

ID-Mapping vs User Mapping

A token has both ID mapping215 and User-mapping216 services.

215The ID-mapping service maps the user’s SIDs to UIDs and GIDs if the user
connects over SMB. If the user connects to the cluster over NFS, the ID-mapping
service does not map the UID and GIDs to SIDs by default. There is no mapping
since the default on-disk identity is in the form of a UID and GID.

216 The user-mapping service is responsible for combining access tokens from
different directory services into a single token.


The graphic shows the user-mapping service combining the directory service
tokens into a single access token, whether the user connects over SMB or over
NFS. Source: PowerScale OneFS User Mapping.

Building Access Tokens

1. When the cluster builds an access token, it must begin by looking up users in
external directory services.
• Over SMB: AD preferred, LDAP can be appended.
• Over NFS: LDAP or NIS only
2. By default, the cluster matches users with the same name in different
authentication providers and treats them as the same user.
3. The ID-mapping service populates the access token with the appropriate
identifiers. Accounts are matched to combine access tokens from different
directory services.
4. Finally, the on-disk identity is determined.


User-Mapping Options

There are a few options that the administrator has when considering how to
configure multiprotocol access to the cluster. If multiprotocol access cannot be
avoided, the best practice is to keep the naming schema for users in different
authentication providers the same. However, if the usernames are not the same in
the different authentication providers, the admin must choose how they want to
map.

Example217

Scenario 1: Usernames are the same in AD and LDAP

• No additional action is required.
• Usernames will be mapped automatically by the cluster.

Scenario 2: Usernames are not the same in AD and LDAP

• Add LDAP information in AD using Microsoft SFU and RFC 2307.
• Export AD groups, convert using LDIF, ingest to LDAP.
• Create manual mapping rules on the cluster.

User-Mapping Rules for Manual Mapping

The user-mapping rules have five operators that can be applied to each rule, and
each rule can be applied to a specific access zone.

• Append (++)218

217 If the username in the active directory is "jsmith", then the username in the
LDAP should also be the same "jsmith".


• Insert (+=)219
• Replace (=>)220
• Remove groups (--)221
• Join (&=)222

Configuring User Mapping Rules

• User-mapping rules can be configured through the web admin interface or
through the CLI.

218Append rule adds fields to an access token, but it does not displace a primary
user or group. The mapping service appends the fields that are specified in the list
of options (user, group, groups) to the first identity in the rule.

219 Insert also adds fields to an access token, but it displaces a primary group into
the additional identifiers list. When the rule inserts a primary user or primary group,
it becomes the new primary user or primary group in the token. The previous
primary user or primary group moves to the additional identifiers list.

220 Removes a token and replaces it with a specified user. If the second username
is left blank, the mapping service removes the first username in the token, leaving
no username, and then login fails with a no such user error.

221Remove groups removes group identifiers from the access token. Modifies a
token by removing the supplemental groups.

222 Join merges two access tokens together. While the operation is bi-directional,
meaning the cluster could perform mapping using either username specified in the
rule, the order does matter to determine file ownership. If one of the usernames
should always be the owner of a file, make it the first name in the rule.


• The mapping rules you create are configured per access zone, so ensure that you
select the correct access zone before configuring.
• Once the operation is selected, the page updates to reflect the options to
configure for that operation.

Access -> Membership and Roles -> User Mapping

In this example, two users are being joined. The first user is in Active Directory, and
the second user is in LDAP. If the cluster is unable to perform the lookup, the user
is mapped to the “Guest” account. No other rules are checked for the users who
are specified in this rule.

Configuring User Mapping Rules - CLI

• This example shows joining the AD account student with the AD account
administrator, but only in the System zone.


− Other access zones do not map these two accounts together unless another
rule is created specifically for those access zones.
− There was no option to map a default user or to stop processing rules in this
command, but those options are available through the CLI as well.
• isi zones modify <Access Zone> --add-user-mapping-
rules=<username and action>
• isi zone zones modify System --add-user-mapping-
rules=<username and action>
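Putting the syntax above together, a minimal sketch with a hypothetical rule (the
accounts and zone come from the lab environment; the rule string follows the
operator notation described earlier and is quoted so the shell does not consume the
backslashes):

# Join DEES\student with DEES\administrator in the System zone only
isi zone zones modify System \
    --add-user-mapping-rules='DEES\student &= DEES\administrator'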

Verify User Mapping Rules

You can view any rules that you create through CLI using the following command
isi zone zones view <Access Zone>. Once the rules are configured, verify
that the IDs are correctly mapped by running isi auth mapping token
<username> and checking that the right identifiers are displayed.

isi zone zones view <Access Zone>
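For example, a quick verification pass after adding a rule might look like the
following (the username is a placeholder and, for an AD account, must include the
domain):

# Confirm the mapping rules attached to the zone
isi zone zones view System

# Check that the token now contains the expected identifiers and groups
isi auth mapping token DEES\\student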


User Mapping Rules Example

The examples combine the username formats with operators to form rules. Several
of the rules include an option to specify how OneFS processes the rule.

• CORP\username => username [break]223


• DESKTOP\* &= * [break]224
• *\* => ""225
• *\Administrator => nobody226

Verify Access Tokens

• The access token for any user can be viewed using the isi auth mapping
token command.

223 The break option forces OneFS to stop applying rules and to generate the token
at the point of the break.

224This rule uses wildcards to join users from the DESKTOP domain with UNIX
users who have the same name in LDAP, NIS, or the local provider.

225This rule tightly restricts access by removing the identities of everybody other
than those permitted by preceding rules.

226This rule maps the administrator account from any Active Directory domain to
the nobody account on OneFS. The rule exemplifies how to turn a powerful
account into an innocuous account.


• Use this to determine correct mapping of the identifiers for the user and
showing the appropriate groups for that user.
• No zone is specified if the access tokens are being checked in the System
access zone. If the access token is being checked for another access zone, the
--zone option needs to be specified.

In the graphic, the Active Directory username must be specified as
domain\\username, while for the other user only the username needs to be
specified.

Troubleshooting Providers and User Mapping

Troubleshoot Commands

Check user token:
isi auth mapping token <username>

Check if usernames are normalized on lookup:
isi auth <provider type> list -v | grep Normalize

If usernames are not being normalized on lookup, the option can be enabled:
isi auth <provider type> modify <provider name> --normalize-users=yes

Flush and refresh user mapping and providers to update the information held by
the cluster. Flushing is especially important to run after modifying user-mapping
rules to verify that the token displayed by the cluster contains all the latest
information:
• isi auth users flush
• isi auth groups flush
• isi auth mapping flush --
• isi auth refresh

RFC 2307

RFC 2307 allows you to implement unified authentication for UNIX and Windows Active
Directory accounts by associating a user ID (UID), group ID (GID), home directory,
and shell with an Active Directory object.

Integrating RFC 2307 with AD simplifies the management of users in a
multiprotocol environment, as only a single authentication provider is required to
collect the SID and UID with associated GIDs.

OneFS does not require the NIS authentication component, as only the UID/GIDs
are used. AD with RFC 2307 maps SIDs with UID/GIDs, eliminating the need for
mapping in OneFS, simplifying management further.

To enable kerberized Hadoop authentication operations where Active Directory is
the authentication authority, a few advanced options are required on the Active
Directory provider.


• Navigate to Access > Authentication Providers > Active Directory.
• For an Active Directory provider, select rfc2307 from Services for UNIX.
• To enable rfc2307 for SFU support using the CLI:

− isi auth ads modify --sfu-support=rfc2307 FOO.COM

− isi auth ads view --provider-name=FOO.COM -v

Source: OneFS Active Directory Settings,
https://www.dell.com/support/article/en-id/sln319144/powerscale-onefs-considerations-for-active-directory-based-kerberos-with-hadoop?lang=en

Configure New Cluster for RFC 2307

OneFS contains advancements and user mapping rules that make it easier to
converge LDAP, NIS, and Local Users with Active Directory users and groups.

Instead of converting an entire authentication infrastructure to RFC 2307, you can
use a user mapping rule to achieve the same result.

• Configure Active Directory


• Add UIDs to accounts in Active Directory


• Add GIDs to accounts in Active Directory


• Join the cluster to Active Directory
• Disable UID and GID allocation on cluster
• Enable RFC 2307 on the cluster
• Verify mapping token from command line interface.

Source: OneFS: How to configure OneFS and Active Directory
for RFC2307 compliance
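A minimal CLI sketch of the enable-and-verify steps listed above (FOO.COM and the
username are placeholders; the UID/GID allocation step is configured separately and
is not shown):

# Enable RFC 2307 (Services for UNIX) support on the AD provider
isi auth ads modify --sfu-support=rfc2307 FOO.COM

# Confirm the provider settings
isi auth ads view --provider-name=FOO.COM -v

# Verify that the mapping token now carries the UID and GIDs from AD
isi auth mapping token FOO\\jsmith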

Best Practices

Dell EMC PowerScale recommends the following best practices to simplify user
mapping.

a. Use Microsoft Active Directory with Windows Services for UNIX and RFC 2307
attributes to manage Linux, UNIX, and Windows systems.
b. Follow the naming convention and name the users consistently so that each
UNIX user corresponds to a similarly named Windows user.
c. Ensure that UID and GID ranges do not overlap in networks with multiple
identity sources.
d. You should not use well-known UIDs and GIDs in your ID ranges because they
are reserved for system accounts.


e. It is recommended that the rules be grouped by type and organized in a
particular order227.
f. You cannot use a user principal name in a user mapping rule.
g. When a PowerScale cluster is connected to Active Directory and LDAP, add
the LDAP primary group to the list of supplemental groups.

A. Integrating UNIX and Linux systems with Active Directory centralizes identity
management and eases interoperability, reducing the need for user mapping rules.
Ensure your domain controllers are running Windows Server 2003 or later.

B. The simplest configurations name users consistently so that each UNIX user
corresponds to a similarly named Windows user. Such a convention allows rules
with wildcards to match names and map them without explicitly specifying each pair
of accounts.

C. It is also important that the range from which OneFS automatically allocates
UIDs and GIDs does not overlap with any other ID range. The range from which
OneFS automatically allocates a UID and GID is 1,000,000 to 2,000,000. If UIDs
and GIDs overlap across two or more directory services, some users might gain
access to other users’ directories and files.

D. UIDs and GIDs below 1000 are reserved for system accounts; do not assign
them to users or groups.

E. OneFS processes every mapping rule by default. Processing every rule, though,
can present problems when you apply a rule to deny all unknown users access. In
addition, replacement rules may interact with rules that contain wildcard characters.

227 - Place the rules that replace an identity first.
- Set join, add, and insert rules second.
- Set rules that allow or deny access last.
- Put explicit rules before rules with wildcards.


F. A user principal name (UPN) is an Active Directory domain and username that
are combined into an Internet-style name with an @ sign, like an email address:
[email protected]. If you include a UPN in a rule, the mapping service ignores it
and might return an error.

G. This practice lets OneFS honor group permissions on files created over NFS or
migrated from other UNIX storage systems.

Challenge

Lab Assignment:
1) View and verify the access token for an unmapped user existing on
both Windows and Linux.
2) Create and test a user mapping rule for a user on both Windows and
Linux.


Reporting


Module Objectives

After completion of this module, you can:

• Describe CELOG, events and event groups.


• Describe alert and alert channels.
• Identify log file locations and log file details.
• Describe quota notifications, email mapping and its configuration.
• Describe audit capabilities and tracking protocol events.
• Describe SNMP architecture and configure its settings.


Events & Alerts

System Events

• OneFS uses events and event notifications228 to alert administrators to potential
problems with cluster health and performance.
• System events serve229 two purposes:
• Track management task activities
• Report error and threshold incidents

228OneFS continuously monitors the health and performance of the cluster and
generates events when situations occur that might require attention. Events and
event notifications information includes drives, nodes, snapshots, network traffic,
and hardware.

229The main goal of the system events feature is to provide a mechanism for
customers and support to view the status of the cluster.


• Events provide notifications for any ongoing issues and display the history of an
issue230.
• The Cluster Events Log (CELOG) process monitors, logs, and reports the
important activities and error conditions on the nodes and cluster.

CELOG

• CELOG supports the task-management systems231, such as the Job Engine.


• CELOG receives communications from processes that monitor cluster
conditions and must log important events.
• CELOG provides a single location for logging events and for alert
notifications232.
• CELOG provides a single point from which notifications are generated, including
sending alert email messages and SNMP traps233.

230 Event information can be sorted and filtered by date, type/module, and criticality
of the event.

231The task-management systems notify CELOG of major task changes, such as


starting and stopping a job. However, the task-management system does not notify
CELOG of internal sub states, such as what files are being worked on and what
percentage of completion the job has reached.

232
The administrator can configure conditions for alert delivery, to best reflect the
needs of the organization.

233SNMP Version 3 (SNMPv3) is supported, providing authentication, adding


greater security than previous versions.


1: Monitor is responsible for system monitoring and event creation; it sends the
event to the kernel queue.

2: Capture is responsible for reading event occurrences from the kernel queue,
storing them safely on persistent local storage, generating attachments, and
queuing them in priority buckets for analysis. Event capture continues to operate on
isolated nodes until the local storage is full.

3: The main analysis process runs on only one node in the cluster. The analysis
process collects related event occurrences together as event group occurrences,
which can be reported upon by the Reporter, ignored (either automatically for
things like Job Engine events or manually) and resolved (either automatically by
event occurrences or manually).

4: Similar to the analysis, the event reporter runs on only one node in the cluster.
The event reporter periodically queries Event Analysis for event group occurrences
that have changed and for each of these evaluates any relevant alert conditions,
generating alert requests for any which are satisfied.

5: Alerting is the final stage in the CELOG workflow. It is responsible for
delivering the alerts requested by the reporter. There is a single sender on the
cluster for each enabled channel.


CELOG Architecture

• Coalesces events into event groups and provides conditional alerting to prevent
over-notification.
• CELOG system processes raw events and stores them in log databases.
• Events themselves are not reported, but CELOG reports on event groups234.

234 Reporting on event groups is not uniform, but depends on conditions, and
defined reporting channels. Networking issues would be reported to a channel that
includes network administrators. However, database administrators would probably
not benefit much from the information, so their reporting channel need not be on
the list for networking related issues.


Event and Event Groups

• Events can be related to file system integrity, network connections, jobs,
hardware, and other vital operations and components of your cluster.
• Each event has two identifiers that help to establish the context of the event:
Event Type ID235 and Event Instance ID236.
• Events with similar root causes are organized into Event groups.
• Event groups provide a single point of management for multiple event instances
that are generated in response to a situation on your cluster.

235 Identifies the type of event that has been generated on the cluster.

236The event instance ID is a unique number that is specific to a particular


occurrence of an event type. When an event is submitted to the kernel queue, an
event instance ID is assigned. You can reference the instance ID to determine the
exact time that an event occurred.


• Event group example - Chassis Fan Failure237


• You can view individual events. However, you manage events and alerts at the
event group level.

Event Groups Administration

The Events page callouts show where to ignore or resolve multiple event groups,
filter event groups, and view details and take action for a single event group.

• The event groups are listed in the Cluster Management > Events and alerts >
Events page.

237For example, if a chassis fan fails in a node, OneFS might capture multiple
events related both to the failed fan itself, and to exceeded temperature thresholds
within the node. All events related to the fan will be represented in a single event
group. Because there is a single point of contact, you do not need to manage
numerous individual events. You can handle the situation as a single, coherent
issue.


• Two operations can be performed on an event group:


− Mark Resolved238
− Ignored239
• The list of event groups can be filtered and sorted.
• Like events, each defined event group has a unique ID240.
• CLI command: isi event groups list/view/modify
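A hedged sketch of managing event groups from the CLI (the event group ID is a
placeholder, and the flag names for resolving and ignoring should be confirmed with
isi event groups modify --help):

# List event groups and note the ID of the one to act on
isi event groups list

# Inspect a single event group (ID is a placeholder)
isi event groups view 65686

# Mark it resolved, or ignore it (flag names assumed)
isi event groups modify 65686 --resolved=true
isi event groups modify 65686 --ignored=true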

238When an event group is marked as resolved, no more event occurrences are


assigned to it.

239
When an event group is marked as ignored, it is not reported upon and does not
appear in lists by default.

240Some event groups can collect multiple event types and have IDs that do not
correspond to event types.


Event Analysis

The event group ID shown in the event details can be looked up in the OneFS
Event Reference Guide.

• Understanding event information is important to decipher and take appropriate


action.
• The key details displayed about the event group are:
− Event group causes241
− Event group ID242

241 The 'Event group causes' section provides the short version of the event.


− Severity243
− Alert Channels244
− Event Count245
− Time noticed246
− Resolver Information247
− Ignored248
• Events within the event group are displayed below the summary information of
the event group.

242The event group instance is assigned a unique identifier within the cluster to
distinguish it from other instances of the same event group type.

243 The level of the event group's severity

244 The alert channel the alert was reported.

245The number of events that were generated for the given event group. The event
count and event time provide key information to determine the root cause.

246 The time logged by the initiating event of the group.

247When the event group is marked as resolved, additional information such as the
resolver name and resolver time is displayed.

248 The flag indicates whether the event group is marked as ignored or not.


Event Group Categories

• Event group categories are higher-level constructs that contain a subset of
event groups.
• For each event group category, there are several events defined.
• An event definition, also known as an event type, corresponds to a specific type
of event.
• CLI command to list event group categories: isi event categories list

OneFS Event Reference Guide


• Topics include managing event groups, alerts, alert channels, alert maintenance
and testing, and a full list of event IDs or codes.
• For each event group, the description and administrative action are specified.

Resource: OneFS Event Reference Guide

Alert and Alert Channels

• An alert is a message that describes a change that has occurred in an event
group249.

249 At any point in time, you can view event groups to track situations occurring on
your cluster. However, you can also create alerts that will proactively notify you if
there is a change in an event group. You can control how alerts related to an event
group are distributed.


• Alerts are distributed through alert channels250.


• You can configure a channel to deliver alerts with one of the following
mechanisms: SMTP, SNMP, or Connect Home.
• Each channel is one destination, but an alert can travel to multiple destinations.
• You can configure your cluster to only generate alerts for specific event groups,
conditions, severity, or during limited time periods.
• Administrators can assign different alerts to different alert groups251 within the
organization.
• Example: You can generate an alert when a new event is added to an event
group, when an event group is resolved, or when the severity of an event group
changes.


250Channels are pathways by which event groups send alerts. You can create and
configure a channel to send alerts to a specific audience, control the content the
channel distributes, and limit frequency of the alerts. The channel is a convenient
way of managing alerting configurations, such as SNMP hosts, and lists of email
addresses.

251Alerts are definable to meet the needs of the organization. Different alerts are
defined to provide separate event group alerting. For example, an organization can
create an alert for hardware events only, and route the alerts to the hardware
support team within the organization. Also, administrators can create alerts to
provide management notification when an event severity increases.


Alert Channel Administration

• Administrators can manually create and manage channels using the OneFS
WebUI and CLI.
• To create a new channel, a channel name and type is required.
• There are three primary channel types: SMTP252, ConnectEmc253, SNMP254
• You can specify255 one or more nodes that are allowed or denied to send alerts
through the channel.

252Alerts are sent as emails through an SMTP server. With an SMTP channel type,
email messages are sent to a distribution list. SMTP, authorization, and security
settings can be set.

253 ConnectEmc enables Dell EMC support to receive alerts from SRS.

254Configuring SNMP enables sending SNMP traps to one or more network


monitoring stations. Administrators can download the management information
base, or MIB, files for SNMP at /usr/local/share/snmp/mibs/. OneFS supports
SNMP Version 3, which provides authentication and adds greater security than
previous versions. OneFS 8.0 and above uses FreeBSD SNMP software (bsnmpd).
This is a faster, more stable solution than the net-snmpd that OneFS had used in
previous versions. This means better scalability, and better stability.

255
The node number is specified as an integer. If you do not specify any allowed
nodes, all nodes in the cluster will be allowed to send alerts.


The Alerts page lists the system-created channels and the type of each channel.

Hayden creates an SMTP alert channel for all the marketing administrators.
WebUI: Cluster Management > Events and alerts > Alerts | CLI: isi event channels
create
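A hedged sketch of the CLI equivalent (the channel name and addresses are
placeholders; the positional type argument and the flag names are assumptions, so
check isi event channels create --help on your release):

# Create an SMTP channel that emails the marketing administrators
isi event channels create MarketingAdmins smtp \
    --address=marketing-admins@dees.lab \
    --send-as=cluster-alerts@dees.lab

# Confirm the channel was created
isi event channels list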

Alert Channel Types - Settings

The alert channel types have different setup requirements. When the type is
selected, the setup options change appropriately in the WebUI.

SMTP

• SMTP must be configured in the Cluster Management > General settings >
Email settings page.
• One or more email addresses.

ConnectEmc

• Secure Remote Services (SRS) must be enabled using the Cluster Management
> General settings > Remote support page.
• The primary SRS gateway address, subnet, and IP address pool are required to
configure SRS.

SNMP

• SNMP must be enabled and configured using the Cluster Management >
General settings > SNMP monitoring page.
• Community string and Host settings are required to configure SNMP.

Alert Administration

• Configure alerts to associate the alert channel for sending alert notifications,
and to determine event criteria.


• Event criteria includes: Event Group IDs, condition to send, frequency, event
duration before sending.
• Administrators can create any number of alerts the organization requires.
• Administrators can select any or all event group categories256 to include in the
alert.
• CLI command: isi event alerts create
• Using the CLI, administrators can specify the severity level257.

Hayden creates an alert to email all marketing administrators on new event groups for
SmartQuotas, Snapshots and Software-related events.
WebUI: Cluster Management > Events and alerts > Alerts > Create an alert

It is recommended but not required to create the alert channels before creating
alerts. The different alert conditions available are:

256Also, individual specific event group IDs can be added to the alert. Alert
conditions provide additional refinement for when and how the alert is sent.

257 Severity levels: emergency, critical, warning, or information, or a combination of


different severity levels. To specify multiple alert channels or severity conditions,
use single quote bracketing containing a comma-separated list, for example
'emergency, critical'. If modifying using the WebUI, the severity restores to all.


• New event groups - Reports on event group occurrences that have never before
reported.
• New events - Reports on event group occurrences that are new since the event
group was last reported on.
• Interval - Provides periodic reports on event group occurrences that have not
been resolved.
• Severity increase - Reports on event group occurrences whose severity has
increased since the event group was last reported on.
• Severity decrease - Reports on event group occurrences whose severity has
decreased since the event group was last reported on.
• Resolved event group - Reports on event group occurrences that have been
resolved since the event group was last reported on.

The Maximum Alert Limit restricts the number of alerts sent. Some events can
generate tens, hundreds, or thousands of alerts. Maximum alert limits do not apply
to Interval conditions. The event longevity condition provides for a time delay
before sending an alert. Some events are self-correcting or may last a few seconds
based on certain cluster conditions. For example, a node CPU at 100 percent
utilization may only last for a short duration. Although the condition may be critical if
the event occurs over a prolonged period, events over a short period may not be
important.


Heartbeat Alert

• Heartbeat events are informational messages generated for testing purposes.


• The event logging and alert notification system is tested258 daily by default using
the event group Heartbeat (400050004)259.
• You can test a manually created alert channel by adding it to the default
Heartbeat alert.

258In order to confirm that the system is operating correctly, test events are
automatically sent every day, one event from each node in your cluster.

259By default, heartbeat test alerts are not sent to any other alert channel. To
monitor their success, administrators can configure an alert channel and add the
channel to the Heartbeat alert. Administrators can change the interval using the
WebUI or CLI.


• The test alert can be created and sent with a custom test message.
• CLI: isi event test create "Test message"

Events and Alerts: Settings

WebUI: Cluster management > Event and alerts > Settings


CLI: isi event settings modify/view

• You can modify settings to determine how event data is handled on your
cluster:
− Resolved event group data retention260
− Event log storage limit261

260By default, data related to resolved event groups is retained indefinitely. You
can set a retention limit to make the system automatically delete resolved event
group data after a certain number of days.


• System maintenance activities often create events based on the maintenance
activity262.
• OneFS provides the capability to suspend alerting during system maintenance
windows263.
− Start Date
− Start Time
− Duration
• You can change the frequency that a heartbeat event is generated using only
the OneFS CLI.

261You can also limit the amount of memory that event data can occupy on your
cluster. By default, the limit is 1 megabyte of memory for every 1 terabyte of total
memory on the cluster. You can adjust this limit to be between 1 and 100
megabytes of memory. When your cluster reaches a storage limit, the system will
begin deleting the oldest event group data to accommodate new data.

262These events are known and considered benign to the normal operations of the
cluster.

263Suspending alerts during the maintenance window helps to prevent flooding


alert channels and reduces notification clutter. OneFS continues to log events
during the maintenance window. Only alert generation is suspended. Alert
generation and reporting continue automatically once the set maintenance window
expires.


Challenge

Lab Assignment:
1) View and analyze events and event groups.
2) Create an SMTP alert channel and alert administrators for different
software events.


Log Files

System Log Files

Definition

Log files are a collection of informational files from multiple sources within the cluster.

The log entries provide detailed information about the operating system, file
system, entire cluster and on a node level including health, status, events, and
error conditions. Certain log files, like /var/log/messages, contain multiple
types of data while others are specialized and only contain one type.

Log Files Overview

• Log files are the primary source of information for troubleshooting issues on the
cluster.
• Multiple different log files are created for each type of cluster issue category and
provide the raw captured details.
• Different logs provide cluster-wide and node level details. Each log contains
their own set of information captured.
• The log file information provides the detailed information of activity and status of
the cluster at any given point in time. Use log file information to troubleshoot
issues with the cluster.


The graphic shows an example with a portion of the /var/log/messages log file from node 1.
The file reflects information from multiple processes.

NFS Log Files

OneFS writes log messages that are associated with NFS events to a set of files in
/var/log.

With the log level option, you can specify the detail at which log messages are
output to log files.

Command: isi nfs log-level modify

The table describes the log files that are associated with NFS.

Log File              Description

nfs.log               Primary NFS server functionality (v3, v4, mount).

rpc_lockd.log         NFS v3 locking events through the NLM protocol.

rpc_statd.log         NFS v3 reboot detection through the NSM protocol.

isi_netgroup_d.log    Netgroup resolution and caching.

Valid logging levels are:

Log Level    Description

always       Specifies that all NFS events are logged in NFS log files.

error        Specifies that only NFS error conditions are logged in NFS log files.

warning      Specifies that only NFS warning conditions are logged in NFS log files.

info         Specifies that only NFS information conditions are logged in NFS log files.

verbose      Specifies verbose logging.

debug        Adds information that we can use to troubleshoot issues.

trace        Adds tracing information that we can use to pinpoint issues.


Example

Contents of nfs.log file

Example of setting the log-level option to always.
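A hedged CLI sketch of checking and raising the NFS log level (whether the level is
passed positionally or through a flag can vary by release, so confirm with
isi nfs log-level modify --help):

# Show the current NFS logging level
isi nfs log-level view

# Raise the level while troubleshooting, then set it back when finished
# so that verbose logging does not fill /var/log
isi nfs log-level modify verbose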

Log File Facts

OneFS maintains both cluster-wide and node-specific logs on each node. Each
node has its own set of log files.

• Information contained in logs264

• Rotated automatically on a predefined basis

264Some logs are general log files such as the messages log file, and some are
process-specific, such as logs for SMB, CELOG, alerts or events, and hardware
logs such as drive evaluation and drive history.


• If a node is rebooting every 30 seconds265


• Can be sent to Dell EMC for troubleshooting266

Log File Locations

While there are ways to see cluster information remotely, the raw log files on each
node are located under the /var/log directory. Under the /var/log directory, a
series of different log file subdirectories exist that contain the individual log files.

Locate and identify the files using standard UNIX commands:


• ls –F267
• find –name268
• -F269

265 The replaced log files remain in the /var/log directory for a time. Administrators
may need to delete these old log files when troubleshooting, but should not do so
without senior technical support guidance. The size of the /var/log partition is either
500 MB or 2 GB for generation 4 and 5 nodes, and varies by node type for
generation 6 nodes.

266 You can send log files to technical support. Technical support will request log
files when troubleshooting issues. Upload the logs using Secure Remote Services,
HTTP, or FTP. Log files can be large depending on the cluster activity and can take
some time to collect and upload.

267 The ls –F command is helpful to see all files and directories.

268 The find –name command allows administrators to identify the specific files.

269 The -F option for ls appends a / to directory names.


The graphic shows the directory listing of all the CELOG files, including the rotated
older log files alongside the current file being logged to, located using the find
command.

Look for the masterfile.txt to determine which node is the primary CELOG for
the whole cluster of nodes. This is located in the following file:
/ifs/.ifsvar/db/celog/masterfile.txt.

Looking in the masterfile.txt, the number in the file is the devid of the primary
CELOG node. In a six-node cluster, a value of 1 signifies the node that has devid 1.

Log Detail Level

The amount of detail that is captured in the log files can vary based on the level of
detail set either as a default or changed to help troubleshoot an issue.


The three verbosity levels are typically used by Technical Support:


• Logging or Warning—default270
• Error271
• Verbose272

Trace and Debug are additional levels that engineering and development use. These
levels are so verbose that they should not be left on; the amount of data they
gather can affect the cluster’s performance substantially. They can generate
enough data to fill /var/log if run for an extended period.

One consideration is that increasing the log detail level increases the number of
entries in the log file, and therefore the amount of information and the time needed
to sort through it to find the issue. The best practice is to use the lowest detail level
required to identify, isolate, and troubleshoot an issue. Another potential issue is
that the /var/log directory is limited in size; too much information can risk filling up
the /var directory. If changing the log detail level, it is important to reset it when
finished. Generally, a reset should only be done at the direction of technical support
or engineering.

270 Logging has the lowest performance effect.

271 Error has a more noticeable performance effect.

272 Verbose can have a significant effect on cluster performance and affect
workflows.


Log File Gather

Tarball File Creation

The graphic shows how the isi diagnostics gather command, whether run
through the CLI or using the web administration interface, collects similar log files,
packages them into several files in .tar file format. The tar files are then
packaged together and compressed using gzip into a single large .tgz file known
as a tarball for transport to Technical Support.

The workflow: log files are collected, similar log files are placed in tar files, and the
tar files are consolidated and compressed into a single tarball.

• Extract the contents of a tarball with the following command at the command
line:
• tar xzvf filename.tgz
• To examine a list of the files in a tarball without opening it, use the command:
• tar tzvf filename.tgz


Web Administration Interface

Cluster Management > Diagnostics > Gather

• Gather in the web administration interface runs the isi diagnostics
gather command.
• The available options start a new gather, stop a gather in progress, download a
gather tar file, or delete a gather tar file.
• Complete log file gathers are stored on the cluster in the
/ifs/data/Isilon_Support/pkg directory.


Gather Settings

• The settings change the global defaults for gather info and offer limited options
for modification.
• If having difficulty configuring, contact the network administration team.
• The preference is to gather full log sets. The tool used by support to analyze log
sets requires a full log gather.
• Check the default FTP settings if logs are not being uploaded to Technical
Support’s FTP server during a log gather process.
• HTTP and FTP proxy server settings point to internal FTP or HTTP proxy server
that is used as the gateway to reach external IP network.

isi diagnostics gather

Run the isi diagnostics gather command as an alternative to using the web
administration interface.

The gather process collects information about different system utilities and collects
groups of information. You can specify multiple upload locations and upload
methods. You can modify the default setting.


To view all the available options and option descriptions, use the following
commands:
• isi diagnostics gather -h
• isi diagnostics gather settings view -h
• isi diagnostics gather settings modify -h

The graphic shows several of the most commonly used command options. Use these options to
view the help related to running log gather, or viewing and modifying log gather settings.
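As a short hedged sketch of a typical run (the start and status sub-commands are the
commonly used ones; confirm them with isi diagnostics gather -h as noted above):

# Start a full log gather; the package lands under /ifs/data/Isilon_Support/pkg
isi diagnostics gather start

# Check on the running gather (sub-command assumed; see -h)
isi diagnostics gather status

# Review the upload settings the gather will use
isi diagnostics gather settings view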

isi diagnostics gather Output

The isi diagnostics gather command collects log files from the cluster, runs various
commands, and then captures the output from those commands. Logs are copied
from the cluster during the log gathering process and included in the output.

In a log set there are many different logs that can be classified as they relate to a
single node at a time, or the whole cluster. Logs can also be classified based on
whether they only describe a functioning aspect of the cluster, or whether a lot of
generic information is placed in them.
• Node-specific logs
• Logs relating to the whole cluster
• Messages log

In a log set, node-specific logs are placed in a separate directory for each node.
This means that administrators can use these logs to understand what was
happening from the perspective of each node by going into that sub directory of the
log set.


Logs relating to the whole cluster are stored in the shared /ifs directory. These are
not placed in node-specific sub directories. CELOG output is generic because it
relates to all cluster events, without being limited to a single node or a single
application. The CELOG files exist on each node. The
isi_celog_analysis.log, isi_celog_capture.log,
isi_celog_alerting.log, isi_celog_events.log, and
isi_celog_monitor.log logs make up the CELOG log file set. The logs exist
on every node, however, the set on the primary coalescer node contains the
primary log set for the cluster-wide CELOG events. So a log can contain specific
node and cluster-wide information in the same log file.

Each node has its own messages log, because the messages log is built by the
individual OneFS instance running on each node separately, but it is not linked to
any one service or application on the node. The vsftpd.log is specific to the FTP
daemon running on each node individually, and therefore only tells about that one
service on each node individually. Here is paths to some of the logs on the live
system as described above:/var/log/messages/var/log/lsassd.log

Examples of some of the various commands that are run during the log gather are
isi_status, isi_quota, and isi_hw_status. The output from each command
is in the log gather output file.

Log Scope and Variability

• Difficulties in a PowerScale cluster can arise for many reasons, and the same
symptom may appear different in the logs.
• For example, administrators may have a network configuration where routing
errors prevent the cluster from reaching an LDAP server. This could mask the
fact that LDAP is also misconfigured, or this could be the result of a cluster
configuration error.
• Failed reporting systems can completely prevent alerts from reaching
administrative staff. SPAM filters may prevent internal staff from receiving email
alerts.
• Log files are guides to help identify problems. Log files are not answers to the
root cause.


Challenge

Lab Assignment: Log in to the cluster and observe detailed information
about the operating system, file system, entire cluster and on a node level
including health, status, events, and error conditions.
1) Gather log files for PowerScale support.


Notifications

Quota Notifications

Quota notifications are generated for enforcement quotas, providing users with
information when a quota violation occurs. Reminders are sent periodically while
the condition persists.

You can configure notifications globally and apply them to all quota domains, or
configure them for specific quota domains. Enforcement quotas support the
following notification settings.

A given quota can use only one of these settings.

Limit Notifications Settings and Description:

• Disable quota notifications: Disables all notifications for the quota.
• Use the system settings for quota notification: Uses the global default
notification for the specified type of quota.
• Create custom notification rules: Enables the creation of advanced, custom
notifications that apply to the specific quota.

Each notification rule defines the condition that is to be enforced and the action that
is to be executed when the condition is true. An enforcement quota can define
multiple notification rules. When thresholds are exceeded, automatic email
notifications can be sent to specified users, or you can monitor notifications as
system alerts or receive emails for these events.

Notification Rules

Quota notification rules can be written to generate alerts that are triggered by event
thresholds. When an event occurs, a notification triggers according to your
notification rule.


A notification trigger may execute one or more actions, such as sending an email or
sending a cluster alert to the interface.
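
As a sketch of creating such a rule from the CLI (the path, threshold type, and
option names are illustrative assumptions; confirm the exact syntax with
isi quota quotas notifications create --help for your OneFS release):

# Email the quota owner and raise a cluster alert when an advisory threshold is exceeded
isi quota quotas notifications create /ifs/data/marketing directory advisory exceeded \
    --action-email-owner=true --action-alert=true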

Notifications triggered for events are grouped by the following categories:


• Instant Notifications273
• Ongoing Notifications274

SMTP Email Settings

OneFS can send event notifications through an SMTP mail server. OneFS supports
SMTP authentication.

273Includes the write-denied notification, triggered when a hard threshold denies a


write, and the threshold-exceeded notification, triggered at the moment a hard, soft,
or advisory threshold is exceeded.

274Generated on a scheduled basis to indicate a persisting condition, such as a


hard, soft, or advisory threshold being over a limit or a soft threshold's grace period
being expired for a prolonged period.


1: The IPv4 or IPv6 address or the fully qualified domain name of the SMTP relay
is entered here.

2: The port number is entered here. The default port number is 25.

3: If SMTP authentication is required, check this box, then enter the
authentication username and password, and confirm the password.

4: In the Send email as field, type the originating email address that will be
displayed in the From line of the email.

5: Type the Subject line of the email.

6: Select an option from the Notification Batch Mode drop-down menu to batch
event notification emails.

7: In the Default Email Template drop-down menu, select whether to use the
default template provided with OneFS or a custom template. If you select a
custom template, the Custom Template Location field appears. Enter a path name
for the template.


Managing Quota Notifications

OneFS configures and applies global quota notification to all quotas. You can
continue to use the global quota notification settings, modify the global notification
settings, or disable or set a custom notification for a quota.

Enforcement quotas support four types of notifications and reminders:

• Threshold exceeded
• Over-quota reminder
• Grace period expired
• Write access denied

If a directory service is used to authenticate users, you can configure notification


mappings that control how email addresses are resolved when the cluster sends a
quota notification. If necessary, you can remap the domain that is used for quota
email notifications and you can remap Active Directory domains, local UNIX
domains, or both.

Quota Notifications Settings

You can configure default global quota notification settings that apply to all quotas
of a specified threshold type.


• Click File System > SmartQuotas > Settings.
• Configure the reporting options in the Scheduled Reporting and Manual
  Reporting areas.

1: In the Archive Directory field, type or browse to the directory where you want to
archive the scheduled quota reports.

2: In the Number of Scheduled Reports Retained field, type the number of reports
that you want to archive.

3: Select Scheduled to enable scheduled reporting, or select Manual to disable
scheduled reporting.

4: In the Archive Directory field, type or browse to the directory where you want to
archive the manually-generated quota reports.

5: In the Number of Live Reports Retained field, type the number of reports that
you want to archive.

Quota Notifications Email Mapping

In the Email Mapping area, define the mapping rule or rules that you want to use.
To add an email mapping rule, click Add a Mapping Rule, and then specify the
settings for the rule.


1: From the Type list, select the authentication provider type for this
notification rule. From the Current domain list, select the domain that you want
to use for the mapping rule. In the Map to domain field, type the name of the
domain that you want to map email notifications to.

2: Select the rule type to use from the Rule type list. Select the 'notify owner'
option to use.

Tip: Before using quota data for analysis or other purposes, verify that
no QuotaScan jobs are in progress by checking Cluster
Management > Job Operations > Job Summary.

Email Quota Notification Messages

If email notifications for exceeded quotas are enabled, you can customize
PowerScale templates for email notifications or create your own. There are four
email notification templates provided with OneFS. The templates are located in
/etc/ifs and are described in the following table:


• quota_email_template.txt

− A notification that disk quota has been exceeded.


• quota_email_grace_template.txt

− A notification that disk quota has been exceeded (also includes a parameter
to define a grace period in number of days).
• quota_email_test_template.txt

− A notification test message you can use to verify that a user is receiving
email notifications.
• quota_email_advisory_template.txt

− A notification that disk quota has been exceeded.

Tip: If the default email notification templates do not meet your needs,
you can configure your own custom email notification templates by
using a combination of text and SmartQuotas variables.


Custom Email Notification Template

An email template contains text, and optionally, variables that represent values.
You can use any of the SmartQuotas variables in your templates.

Variable               Description                                Example

ISI_QUOTA_DOMAIN_TYPE  Quota type. Valid values are: directory,   default directory
                       user, group, default-directory,
                       default-user, default-group

ISI_QUOTA_EXPIRATION   Expiration date of grace period            Fri May 22 14:23:19 PST 2015

ISI_QUOTA_GRACE        Grace period, in days                      5 days

ISI_QUOTA_HARD_LIMIT   Includes the hard limit information of     You have 30 MB left until you
                       the quota to make advisory/soft email      reach the hard quota limit of
                       notifications more informational           50 MB.

ISI_QUOTA_NODE         Hostname of the node on which the          someHost-prod-wf-1
                       quota event occurred

ISI_QUOTA_OWNER        Name of quota domain owner                 jsmith

ISI_QUOTA_PATH         Path of quota domain                       /ifs/data

ISI_QUOTA_THRESHOLD    Threshold value                            20 GB

ISI_QUOTA_TYPE         Threshold type                             Advisory

ISI_QUOTA_USAGE        Disk space in use                          10.5 GB

Customize Email Quota Notification Templates

PowerScale templates can be customized for email notifications. Customizing
templates can be performed only from the OneFS command-line interface.

This procedure assumes that you are using the PowerScale templates, which are
located in the /etc/ifs directory.

The following steps create a custom quota email notification template, for example
for Marketing administrators:

• Open a secure shell (SSH) connection to any node in the cluster and log in.
• Copy one of the default templates to a directory in which you can edit the file
  and later access it through the OneFS web administration interface.
• Open the template file in a text editor.
• Edit the template. Ensure that the template has a Subject: line if a
  customized template is being used or created.
• Save the changes. Template files must be saved as .txt files.
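
A minimal sketch of a custom template follows (illustrative only; the variable
names come from the table above, and you should compare against the default
templates in /etc/ifs to confirm the exact substitution syntax for your OneFS
release):

Subject: Disk quota exceeded for Marketing

The ISI_QUOTA_TYPE threshold on the quota at ISI_QUOTA_PATH, owned by
ISI_QUOTA_OWNER, was exceeded on node ISI_QUOTA_NODE.
Current usage is ISI_QUOTA_USAGE against a threshold of ISI_QUOTA_THRESHOLD.
Remove unneeded data or contact the storage team.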

SMTP Email Settings Use Case

The following callouts describe the configuration of SMTP email settings.



1: If your SMTP server is configured to support authentication, you can specify a


username and password. You can also specify whether to apply encryption to the
connection.

2: You can specify an origination email and subject line for all event notification
email messages sent from the cluster.

3: SMTP settings include the SMTP relay address and port number that email is
routed through.


Configure SMTP Email settings Example


1: The first example configures SMTP email settings.

2: The second example displays the current email settings.
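
The referenced screenshots are not reproduced here. A representative CLI sketch
follows (the relay address and credentials are placeholders, and option names may
vary slightly by OneFS release; check isi email settings modify --help):

# Configure the SMTP relay, sender address, and authentication
isi email settings modify --mail-relay=smtp.example.com --smtp-port=25 \
    --mail-sender=cluster-alerts@example.com --use-smtp-auth=yes \
    --smtp-auth-username=relayuser --smtp-auth-passwd=RelayPassw0rd
# View the current email settings
isi email settings view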


Protocol Auditing

Auditing Overview

Auditing is the ability to log specific activities on the cluster. Auditing provides
the capability to track whether data was accessed, modified, created, or deleted.

The auditing capabilities in OneFS include two areas: monitoring preaccess275 and
postaccess276 activity on the cluster. These areas cover the ability to audit any
configuration changes and to audit client protocol activity.

• Postaccess: Log configuration changes
• Preaccess: Log protocol activity - NFS, SMB, HDFS

Audit capabilities are required to meet regulatory and organizational compliance
mandates, including HIPAA, SOX, and governmental agency requirements. Only the
configuration changes made through the API are logged. The audit system also
provides the capability to make the audit logs available to third-party audit
applications for review and reporting.

275 Preaccess configuration changes are cluster login failures and successes.

276 Postaccess are changes to protocols and configurations.

Audit Review

OneFS stores all audit data in audit topic277 files, which collect log information that
can be further processed by auditing tools.

1: OneFS 7.1 introduced an input/output (LWIO) filter manager. The filter manager
provides a plug-in framework for pre- and post-input/output request packet278 (IRP)
processing.

277System configuration auditing is either enabled or disabled; no additional


configuration is required. If configuration auditing is enabled, all configuration
events that are handled by the application programming interface (API) are tracked
and recorded in the configuration audit topic.


• The audit events are logged on the individual nodes279


• Logs automatically roll over to a new file280

2:

In OneFS 7.1.1, audit logs are automatically compressed. Audit logs are
compressed on file roll-over281.

278The IRP provides the mechanism to encode a protocol request handled by


LWIO and encodes the request handled by the file system drivers. Audit events are
processed after the kernel has serviced the IRP. If the IRP involves a configured
audit event for an Access Zone where auditing is enabled, an audit payload is
created.

279 The audit events are logged on the individual nodes where the SMB/NFS client
initiated the activity. The events are then stored in a binary file under
/ifs/.ifsvar/audit/logs.

280 The logs automatically roll over to a new file once the size reaches 1 GB. The
default protection for the audit log files is +3. There are various regulatory
requirements, such as HIPAA, which require two years of audit logs, the audit log
files are not deleted from the cluster.

281 As part of the audit log roll-over, a new audit log file is actively written to, while
the previous log file is compressed. The estimated space savings for the audit logs
is 90%.


• CEE forwarder handles forwarding events282


• OneFS 7.1.1 added the ability to forward283 config and protocol auditing events
to a syslog server.

3: OneFS 8.0.1 adds support for concurrent delivery to multiple CEE servers.
Each node initiates 20 HTTP 1.1 connections across a subset of CEE servers.
Each node can choose up to 5 CEE servers for delivery. The HTTP connections
are evenly balanced across the CEE servers from each node. The change results
in increased audit performance.

4:

Starting from OneFS 8.2.0, OneFS protocol audit events have been improved284 to
allow for more control of what protocol activity should be audited. The changes
allow increased performance and efficiency by allowing customers to configure
OneFS to no longer collect audit events that are not registered by their auditing
application.

282Once the auditing event has been logged, a CEE forwarder service handles
forwarding the event to CEE. The event is forwarded using an HTTP PUT
operation. At this point, CEE will forward the audit event to a defined endpoint,
such as Varonis DatAdvantage. The audit events are coalesced by the third-party
audit application.

283 By default, syslog forwarding will write the events to /var/log/audit_protocol.log


for protocol auditing events and /var/log/audit_config for configuration auditing
events.

284
It provides a granular way to select protocol audit events to stop collecting
unneeded audit events that third-party applications do not register for.


Audit Capabilities

• In OneFS, all data regardless of the zone is logged in the


/var/log/audit_config.log by default.
• Syslog is a protocol that is used to convey certain event notification messages.
• You can configure a PowerScale cluster to log audit events and forward 285 them
to syslog by using the syslog forwarder286.

Cluster management > Auditing page

285 By default, all protocol events that occur on a particular node are forwarded to
the /var/log/audit_protocol.log file, regardless of the access zone the event
originated from.

286The syslog forwarder is a daemon that retrieves configuration changes and


protocol audit events in an access zone and forwards the events to syslog. Only
user-defined audit success and failure events are eligible for being forwarded to
syslog.


1: Users can specify which events to log in each access zone. For example, a user
might want to audit the default set of protocol events in the System access zone,
but audit only successful attempts to delete files in a different access zone. The
audit events are logged on the individual nodes where the SMB, NFS, or HDFS
client initiated the activity. Then the events are stored in a binary file under
/ifs/.ifsvar/audit/logs. The logs automatically roll over to a new file after
the size reaches 1 GB. Settings are disabled by default.

2: Can view on a per zone basis.

Protocol auditing tracks and stores activity that is performed through SMB, NFS,
and HDFS protocol connections. Users can enable and configure protocol auditing
for one or more access zones in a cluster. Enabling protocol auditing for an access
zone records file access events through the SMB, NFS, and HDFS protocols in the
protocol audit topic directories.

CLI commands:

• isi audit settings modify --config-auditing-enabled {true | false}: enables or
  disables configuration auditing.
• isi audit settings modify --config-syslog-enabled {true | false}: enables or
  disables syslog forwarding.
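
As a sketch of enabling protocol auditing for a specific access zone (the zone name
and event list are assumptions; confirm the available options with
isi audit settings global modify --help and isi audit settings modify --help):

# Enable protocol auditing globally and add an access zone to the audited list
isi audit settings global modify --protocol-auditing-enabled=yes --add-audited-zones=zone1
# In that zone, record only successful create and delete operations
isi audit settings modify --zone=zone1 --audit-success=create,delete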


Event Forwarding

Users can configure OneFS to send protocol auditing logs to servers that support
the Common Event Enabler, or CEE. The CEE enables third-party auditing
applications to collect and analyze protocol auditing logs. The CEE has been tested
and verified to work with several third-party software vendors.

• CLI command: isi audit settings global modify --cee-server-uris=<uris>
• WebUI: Cluster Management > Auditing

Best Practice: It is recommended that you install and configure third-


party auditing applications before you enable the OneFS auditing
feature. Otherwise, all the events that are logged are forwarded to the
auditing application, and a large backlog causes a delay in receiving
the most current events.
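
A configuration sketch follows (the CEE URI and hostname are placeholders; port
12228 is the default CEE listening port noted later in this lesson):

# Point the cluster at a CEE server and set the hostname reported in audit events
isi audit settings global modify --cee-server-uris=http://cee1.example.com:12228/cee \
    --hostname=mycluster.example.com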

Track Protocol Events


Hi, I want to know how I can track protocol events when the audit log has been
stopped?

Whether the audit log is enabled or disabled, you can query all the SMB protocol
access information through /var/log/lwiod.log.

• For NFS, you can query /var/log/nfs.log for the information.


• For HDFS, you can query /var/log/hdfs.log for the information.

For example, you can disable the protocol access audit log and then create a new
folder called audit test in an SMB file share. In /var/log/lwiod.log you can
find the following entries.


Audit Event Types

OneFS enables control over what protocol activity is audited.

• Auditing stops the audit event collection287
• Use the CLI command isi audit settings view to list the events.
• Direct mapping to CEE288

Shown are the detail_type events.

Modifying Event

The first command sets a create_file audit event upon success. The second
example logs all audit failures. To view the configured events for the access
zone, use the command that is highlighted.
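
The highlighted commands themselves are not reproduced here; a sketch of
equivalent commands follows (the zone name is an assumption, and the event
values, including the all shorthand, should be checked against
isi audit settings modify --help for your release):

# Audit successful file creations in an access zone
isi audit settings modify --zone=zone1 --audit-success=create_file
# Audit all failed operations in the same zone ("all" assumed as shorthand)
isi audit settings modify --zone=zone1 --audit-failure=all
# View the events configured for the access zone
isi audit settings view --zone=zone1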

287 In OneFS, auditing stops the collection of audit events that third-party
applications do not register for or need.

288The events are a direct mapping to CEE audit events - create, close, delete,
rename set_security, get_security, write, read. The CEE servers listen, by default,
on port 12228.


Audit Log Viewer

OneFS provides a tool to view the binary audit logs stored on the cluster. Errors
while processing audit events when delivering them to an external CEE server are
shown in the /var/log/isi_audit_cee.log. Protocol-specific logs show
issues that the audit filter has encountered:

• /var/log/lwiod.log –SMB

• /var/log/nfs.log –NFS
• /var/log/hdfs.log -HDFS

The isi_audit_viewer command lists protocol audit events. Shown is the


OneFS audit event type in the detailType field.
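
For example (the timestamps are illustrative; run isi_audit_viewer -h for the full
option list):

# List protocol audit events captured between two points in time
isi_audit_viewer -t protocol -s "2020-08-01 00:00:00" -e "2020-08-02 00:00:00"
# List configuration audit events
isi_audit_viewer -t config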


Elevation of Privileges

Decision Point

Hayden wants to know, does PowerScale have the capability to track the elevation
of privileges?

Yes, the audit log will show elevated users in three ways:

• SMB – elevation of privileges through run as root


• NFS – elevation of privileges through map users to root
• CLI – elevation of privileges through sudo

SMB - Elevation of Privileges

Create a test SMB share and allow the account Dante to run as root. The following
is the protocol access audit log entry which is generated by isi_audit_viewer
-t protocol:


Note that both "userSID":"S-1-22-1-0" and "userID":0 indicate it is the root
account.

NFS - Elevation of Privileges

Create a test NFS export and map all the non-root users to root:

Note: Map Non Root User mapping is disabled by default. We recommend that
you specify this setting on a per-export basis, when appropriate.


CLI - Elevation of Privileges

The following example shows a user logging in as Dante to run a sudo command.


The "UID":0 in graphic represents the root account. However, unlike the example
for SMB/NFS permission elevation, the SID here represents the one behind the
sudo command. In this case, it is Dante:

Note: Root can grant enhanced permissions to users so they may perform
privileged tasks. This way users are provided access to a OneFS to perform their
specific tasks beyond normal end user permissions.

System Level Objects

Will PowerScale capture the creation or deletion of system-level objects?

The audit log can track system-level object creation or deletion operations only
when they are performed through the WebUI or CLI.

The graphic shows an example of the audit log for creation and deletion of a
storage tier in the PowerScale SmartPools:


View the audit log through the CLI command: isi_audit_viewer -t config

Challenge

Lab Assignment:
1) Configure protocol auditing for an access zone.
2) View, add and verify different events to audit.
3) Track changes to an individual user account.


SNMP

SNMP Overview

SNMP (Simple Network Management Protocol) is responsible for collecting and
organizing information about managed devices on IP networks and for modifying
that information to change device behavior.

• SNMP is used to remotely monitor the PowerScale cluster hardware
  components, such as fans, hardware sensors, power supplies, and disks.
• The default Linux SNMP tools or a GUI-based SNMP tool of choice can be used
  for this purpose.
• SNMP is enabled or disabled cluster-wide; nodes are not configured
  individually.
• Cluster information can be monitored from any node in the cluster. Generated
  SNMP traps correspond to CELOG events.
• SNMP notifications can also be sent using:
  • isi event channels create snmpchannel snmp --use-snmp-trap false

SNMP Monitoring Supportability

The OneFS SNMP monitoring feature supports only SNMPv2c and SNMPv3.
Compared with SNMPv2c, SNMPv3 adds both authentication and encryption
features.

Protocol Version  Configuration  Configuration Description       Supportability              Note

SNMPv2c           N/A            N/A                             Supported                   By default, SNMPv2c
                                                                                              is enabled

SNMPv3            AuthPriv       Both authentication and         Not supported               By default, SNMPv3
                                 encryption are enabled                                       is disabled

                  AuthNoPriv     Authentication is enabled       Supported (default and
                                 but encryption is disabled      recommended if you
                                                                 enable SNMPv3)

                  noAuthNoPriv   Both authentication and         Supported
                                 encryption are disabled

The table lists the supportability for OneFS SNMP monitoring.

SNMP Architecture

• SNMP applications run in a network management system (NMS) and issue


queries (SNMP GET/GET NEXT) to the SNMP service on PowerScale to gather
information.
• snmpd – the SNMP daemon on the cluster responds to the queries and sends the
corresponding statistics to the SNMP applications.
• An SNMP community is a logical relationship between the SNMP service on the
OneFS side and the NMS on the client side.


The graphic shows the SNMP architecture: a Network Management System (NMS)
issues SNMP GET/GET NEXT queries, within an SNMP community, to the SNMP service
(snmpd) on OneFS. The service uses the PowerScale Management Information Bases
(MIBs) and returns SNMP responses to the NMS.

Management Information Base (MIB)

• Management Information Base (MIB) documents define human-readable names
  for managed objects and specify their datatype and other properties.
• You can download MIBs from the OneFS WebUI for SNMP monitoring of a
  PowerScale cluster. MIBs can also be managed by using the command-line
  interface (CLI).
• MIBs are stored in /usr/share/snmp/mibs/ on a OneFS node.

ISILON-MIB defines a group of SNMP agents, called OneFS Statistics Snapshot
agents, that respond to queries from a network monitoring system (NMS). As the
name implies, these agents snapshot the state of the OneFS file system at the time
a request is received and report this information back to the NMS.

OneFS ISILON-MIBs serve two purposes:

• augment the information available in standard MIBs
• provide OneFS-specific information that is unavailable in standard MIBs.


ISILON-TRAP-MIB generates SNMP traps to send to an SNMP monitoring station
when the circumstances defined in the trap protocol data units (PDUs) occur.
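
For example, from a Linux NMS host that has the downloaded MIB files in place,
queries might look like this (the node address and MIB directory are placeholders,
the community string is the OneFS default shown below, and the ISILON-MIB object
name is used only as an illustration; browse the MIB file for the full object list):

# Confirm basic reachability using the standard system MIB
snmpget -v2c -c 'I$ilonpublic' 192.168.0.10 sysName.0
# Query an Isilon-specific object using the downloaded MIB definitions
snmpwalk -v2c -c 'I$ilonpublic' -m ALL -M +/usr/share/snmp/mibs 192.168.0.10 ISILON-MIB::clusterName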

Configure SNMP Settings

• If using SNMP for CELOG alerts, the SNMP reporting settings are the default
  settings used.
• The default SNMP v3 username289 (general) and password can be changed
  using the CLI or the WebUI.
• Configure an NMS to query each node directly through a static IPv4
  address.
• To enable SNMP v3 access:
  • isi snmp settings modify --snmp-v3-access=yes
• To configure the security level, the authentication password and protocol, and
  the privacy password and protocol:
  • isi snmp settings modify --help
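
A sketch follows (the option names for the user and password are assumptions
based on the settings described above; confirm them with
isi snmp settings modify --help):

# Enable SNMPv3 access
isi snmp settings modify --snmp-v3-access=yes
# Set the SNMPv3 read-only user and its authentication password (option names assumed)
isi snmp settings modify --snmp-v3-read-only-user=general --snmp-v3-password='NewPassw0rd!'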

Configure the Cluster for SNMP Monitoring

You configure a PowerScale cluster for SNMP monitoring so that hardware
components can be monitored remotely.

289 The username is only required when SNMP v3 is enabled and when making
SNMP v3 queries.


In the WebUI, navigate to Cluster Management > General Settings > SNMP
Monitoring.

1: Download the MIB file you want to use and copy the MIB files to a directory
where the SNMP tool can find them.

2: If your protocol is SNMPv2, ensure that the Allow SNMPv2 Access check box is
selected.

3: In the SNMPv2 Read-Only Community Name field, enter the appropriate
community name. The default is I$ilonpublic.

4: In the SNMPv3 Read-Only User Name field, type the SNMPv3 security name to
change the name of the user with read-only privileges. The default read-only user
is general. In the SNMPv3 Read-Only Password field, type the new password for
the read-only user to set a new SNMPv3 authentication password.

5: In the SNMP Reporting area, enter a cluster description in the Cluster
Description field, and in the System Contact Email field, enter the contact email
address.

Review SNMP Settings

To review SNMP monitoring settings, run: isi snmp settings view

In the example output, SNMP is enabled cluster-wide and the I$ilonpublic community
has read-only access.


OneFS Job Engine


Module Objectives

After completion of this module, you can:

• Describe the architecture and workflow of the OneFS Job Engine.


• Describe the different Job Engine job types.
• Describe the job priority and impact policy.
• Describe the management capabilities provided by the OneFS Job Engine.


Job Engine Architecture

Job Engine Overview

• The OneFS Job Engine performs cluster-wide automation of tasks. It includes:


− Jobs290
− isi_job_d daemon291
− Work units292
• It runs across the entire cluster and is responsible for dividing and conquering 293
large storage management and protection tasks.

290A Job Engine job is a specific task, or family of tasks, intended to accomplish a
specific purpose. Jobs play a key role in data reprotection and balancing data
across the cluster, especially if the hardware fails or the cluster is reconfigured.

291 The parent process which runs on each node of the cluster.

292 Each job is broken down into work units which are handed off to nodes based
on node speed and workload. Every unit of work is tracked. That way, if you pause
a job, it can be restarted from where it last stopped.


• Includes a comprehensive checkpointing system294.


• Framework includes an adaptive impact management system and drive-
sensitive impact control.
• Has the ability to run up to three jobs at a time.
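
To see the Job Engine at work from the CLI, the standard isi job commands can be
used, for example:

# List jobs that are currently running, paused, or queued
isi job jobs list
# Show overall Job Engine status, including the coordinator node
isi job status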

Job Engine: Job

Individual jobs are scheduled to run at certain times, are started by an event such
as a drive failure295, or are started manually by the administrator.

All jobs have priorities. The most important jobs have the highest job priority, and
you should not modify them.

Jobs are given impact policies that define the maximum amount of usable cluster
resources.

Jobs run until completion. One job that holds up other jobs can affect job
operations296.

293To achieve this, it reduces a task into smaller work items and then allocates, or
maps, these portions of the overall job to multiple worker threads on each node.
Progress is tracked and reported on throughout job execution and a detailed report
and status is presented upon completion or termination.

294 This allows jobs to be paused and resumed, in addition to stopped and started.

295 For example, the FlexProtect job runs to reprotect the data when a hard drive
fails.

296
If contention occurs, examine which jobs are running, which jobs are queued,
when the jobs started, and the job priority and impact policies for the jobs.


Important: OneFS does not enable administrators to define custom jobs.

Job Engine: isi_job_d Daemon

Each isi_job_d daemon manages the separate jobs that run on the cluster. The
daemons spawn off processes to perform jobs as necessary.

The isi_job_d daemons on each node communicate with each other to confirm
that actions are coordinated across the cluster. This communication ensures that
jobs are shared between nodes to keep the workload as evenly distributed as
possible.


Job Categories

The Job Engine typically executes jobs as background tasks across the cluster,
using spare or especially reserved capacity and resources.

Job Engine jobs are categorized into three primary classes:

– File System Maintenance Jobs297

297 These jobs perform background file system maintenance, and typically require
access to all nodes. These jobs are required to run in default configurations, and
often in degraded cluster conditions. Examples include file system protection and
drive rebuilds. Although the file system maintenance jobs are run by default, either
on a schedule or in reaction to a particular file system event, any Job Engine Job
can be managed by configuring both its priority-level (in relation to other jobs) and
its impact policy.


– Feature Support Jobs298


– User Action Jobs299

Job Types based on access method

Job Engine Hierarchy

• Jobs - Job Engine jobs often consist of several phases, each of which are
executed in a pre-defined sequence.
• Phase - A phase is one complete stage of a job. Jobs may have one or more
phases.
• Task - Each job phase is composed of a number of work chunks, or tasks
distributed around the cluster to be performed by each node individually.
• Item - A task produces an individual work item that run as parallel thread on a
node.
• Item Results - Successful execution of a work item produces an item result.
• Checkpoints - Tasks and task results are written to disk, along with some
details about the job and phase, to provide a restart point.

298
The feature support jobs perform work that facilitates some extended storage
management function, and typically only run when the feature has been configured.
Examples include deduplication and anti-virus scanning.

299These jobs are run directly by the storage administrator to accomplish some
data management goal. Examples include parallel tree deletes and permissions
maintenance.



1: Jobs are cluster-wide maintenance and feature-related processes. Quite often it


takes multiple steps to complete these processes. An example would be in the
case of deduplication, where one step relates to identifying duplicate information,
while a different step relates to rearranging the data on the drives for greater
efficiency. These steps are called phases. Jobs can have several phases, simple
jobs may only have one phase whereas complex jobs can have multiple phases.

2: Phases enable better insight into the progress of a job at a high level, by
examining the job engine logs. Phases also enable better efficiency since multiple
parallel phase functions do not contend with each other for cluster resources. If an
error occurs in a phase, the job is marked as failed at the end of the phase and
does not progress. Each phase of a job must complete successfully before
advancing to the next stage or being marked as complete, returning a job state
Succeeded message. Each phase is run in turn, but the job is not finished until all
the phases are complete. Each phase is broken down into tasks.

3: A phase is started with one or more tasks that are created during job startup. All
remaining tasks are derived from those original tasks similar to the way a cell
divides. A single task does not split if one of the halves reduces to a unit less than
whatever makes up an item for the job. For example, if a task derived from a
restripe job has the configuration setting to a minimum of 100 logical inode number
(LINS), then that task does not split further if it derives two tasks, one of which
produces an item with fewer than 100 LINs. A LIN is the indexed information that is
associated with specific data.

The tasks are logically alike within each phase, since they address different parts of
the same phase’s role. An example would be checking the integrity of files on the
cluster-wide file system. Each task would cover a series of files, all performing the
same checks, and different cluster nodes would check different files. The results of
these parallel tasks are collated and amount to the total result of the phase.


4: A task which is given to a particular node for execution is not monolithic, but
consists of many work items. If the job is for file deduplication, and the phase is for
block comparisons, and the task is a series of blocks to traverse and compare, then
a single item would be a single block to examine, calculate and compare with other
known block values. This level is the bottom of the job management hierarchy.
Items are not further decomposed into any smaller components. The result of each
item execution is logged, so that if there is an interruption, the job can restart from
where it stopped.

5: Task status from the constituent nodes are consolidated and periodically written
to checkpoint files. These checkpoint files allow jobs to be paused and resumed,
either proactively, or in the event of a cluster outage. Job engine checkpoint files
are stored in results and tasks subdirectories under the path
/ifs/.ifsvar/modules/jobengine/cp/<job_id>/ for a given job.

6: Item results are an accumulated accounting of work on a single item. For


instance, the result might contain a count of the number of retries that are required
to repair a file, plus any error found during processing.

Job Engine Functional Components

Coordinator

The orchestration of the Job Engine is handled by the Coordinator, which is a


process that runs on one of the nodes in a cluster.

• Principle responsibilities include:


− Monitoring work load and the status of the constituent nodes
− Controlling the number of worker threads per-node and cluster-wide
− Managing and enforcing job synchronization and checkpoints


− Starting and stopping jobs


− Processing work results as they are returned during job execution
• Get Coordinator node array ID: isi_job_d status

Director

The Director process is responsible for monitoring, governing and overseeing all
job engine activity on a particular node, constantly waiting for instruction from the
coordinator to start a new job.

• Each node in the cluster has a job engine director process, which runs
continuously and independently in the background.
• Principle responsibilities include:

− Create the Manager process


− Delegate to and request work from other peers
− Send and receive status messages

Manager


Each Manager process manages a single job at a time on the node. It is


responsible for managing the flow of tasks and task results throughout the node.

• Principle responsibilities include:


− Control and assign work items to multiple Worker threads working on items
for the designated job
− Maintain the number of active Worker threads under direction from
coordinator and director.
− Send status updates to node Director on the various Worker threads.
• When a job completes, the Manager processes associated for that job across
all nodes are terminated.

Worker

• If any task is available, each Worker is given a task. The Worker then
processes the task, item by item, until the task is complete or until the Manager
removes the task from the worker.


• Towards the end of a job phase, the number of active threads decreases300 as
Workers finish up their allotted work and become idle.
• Check status301 of nodes' Worker threads: isi job statistics view
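
For example (the job ID is illustrative):

# Per-node worker thread counts and activity for running jobs
isi job statistics view
# Detailed information for a specific job instance
isi job jobs view 273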

Job Engine Process Model

The Job Engine consists of multiple interoperating software components. When a


Job is launched, the Job Engine spawns a child process from the isi_job_d
daemon running on each node.

• Delegation Hierarchy302
• Shared Work Distribution303

300Nodes which have completed their work items just remain idle, waiting for the
last remaining node to finish its work allocation. When all tasks are done, the job
phase is considered to be complete and the worker threads are terminated.

301In addition to the number of current worker threads per node, a sleep to work
(STW) ratio average is also provided, giving an indication of the worker thread
activity level on the node.

302The Job Engine is based on a delegation hierarchy that is made up of


coordinator, director, manager, and worker processes. The entire engine is run by
the coordinator, which is a daemon instance on one of the cluster nodes. If that
daemon fails for any reason, another node’s director instance will take over
coordination duties.

303Once the work is initially allocated, the job engine uses a shared work
distribution model in order to execute the work, and each job is identified by a
unique job identification number.


• Central Coordination304
• Other Threads305

1: The job daemons elect a Coordinator by racing to lock a file. The node that first
locks the file becomes the Coordinator. Racing is an approximate way of choosing
the least busy node as the Coordinator. If the Coordinator node goes offline and
the lock is released, the next node in line becomes the new Coordinator.

While the actual work item allocation is managed by the individual nodes, the
Coordinator node takes control, divides up the job, and evenly distributes the
resulting tasks across the nodes in the cluster. For example, if the Coordinator

304A job’s workload is delegated from a central coordinator to spread it out across
the cluster, thus avoiding choking any one node.

305 There are other threads which are not displayed in the graphic. They relate to
internal functions, such as communication between daemons, and collection of
statistics. Here we focus on the operational components which perform the jobs.


needs to communicate with a Manager process running on node 2, it first sends a


message to the Director of node 2, which then passes it on down to the appropriate
Manager process under its control. The Coordinator also periodically sends
messages, via the Director processes, instructing the Managers to increment or
decrement the number of Worker threads.

2: The Director runs on each node, communicates with the job Coordinator, and
coordinates tasks with the Managers. When three jobs are running simultaneously,
each node has three Manager processes, each with its own number of Worker
threads. The Director process serves as a central point of contact for all the
Manager processes running on a node, and as a liaison with the Coordinator
process across nodes.

3: The Managers on each node coordinate and manage the tasks with the Workers
on their respective node. If three jobs run simultaneously, each node would have
three Manager processes, each with its own number of Worker threads. Managers
request and exchange work with each other and supervise the Worker processes
they assign. Under direction from the Coordinator and Director, a Manager process
maintains the appropriate number of active threads for a configured impact level,
and for the current activity level of a node.

4: The job daemon uses Worker threads to enable it to run multiple tasks
simultaneously. A thread is the processing of a single command by the CPU. The
Coordinator tells each node job daemon what the impact policy of the job is, and
how many threads should be started to complete the job. Each thread handles its
task one item at a time, and the threads operate in parallel. The number of threads
determines the number of items being processed. The maximum number of
assigned threads manages the defined impact level and the load that is placed on
any one node. It is possible to run enough threads on a node that they can conflict
with each other. An example would be five threads all trying to read data off the
same hard drive. Since serving each thread at once cannot be done, threads are
queued and wait for each other to complete.


Job Engine Exclusion Sets

• For multiple concurrent job execution306, exclusion sets, or classes of similar


jobs, determine which jobs can run simultaneously.
• There are two exclusion sets for job phase activity that modify the core data and
metadata: Restripe307 and Marking308.

306
Concurrent job execution is governed by job priority, exclusion sets and cluster
health.

307 OneFS protects data by writing file blocks across multiple drives on different
nodes. This process is known as ‘restriping’ in the OneFS lexicon. The Job Engine
defines a restripe exclusion set that contains these jobs that involve file system
management, protection and on-disk layout. The restriping exclusion set is per-
phase instead of per job. This helps to more efficiently parallelize restripe jobs
when they don't need to lock down resources. The jobs with restriping phases often
have other no restriping phases as part of the job. For these jobs, when the
restriping phases are not running, other jobs with restriping phases can run. If two
jobs happen to reach their restriping phases simultaneously and the jobs have
different priorities, the higher priority job will continue to run, and the other will
pause. If the two jobs have the same priority, the one already in its restriping phase
will continue to run, and the one newly entering its restriping phase will pause.

308 OneFS marks blocks that are actually in use by the file system. IntegrityScan,
for example, traverses the live file system, marking every block of every LIN in the
cluster to proactively detect and resolve any issues with the structure of data in a
cluster. Multiple jobs from the same exclusion set will not run at the same time. For
example, Collect and IntegrityScan cannot be executed simultaneously, as they are
both members of the marking jobs exclusion set. Similarly, MediaScan and
SetProtectPlus won’t run concurrently, as they are both part of the restripe
exclusion set.


• A job is not required to be part of any exclusion set309, and jobs may also
belong to multiple exclusion sets310.
• Multiple restripe or mark job phases cannot safely and securely run
simultaneously311 without interfering with each other or risking data corruption.
• Job Engine exclusion sets are predefined and cannot be modified or
reconfigured.

309The majority of the jobs do not belong to an exclusion set. These are typically
the feature support jobs and coexist and contend with any of the other jobs.

310
MultiScan is both a restripe job and a mark job. When MultiScan runs, no
additional restripe or mark job phases are permitted to run.

311Up to three jobs can run simultaneously. The Job Engine restricts the
simultaneous jobs to include only one restripe category job phase and one mark
category job phase simultaneously.


Job Engine Exclusion Sets Example

Restriping jobs only block each other when the current phase may perform
restriping.


• Two restripe jobs, MediaScan and AutoBalanceLin, are both running their
respective first job phases.
• AutoBalanceLin restripes in the first phase causing ShadowStoreProtect,
also a restriping job, to be in waiting state.
• MediaScan restripes in phases 3 and 5 of the job, only if there are disk errors
(ECCs) which require data reprotection.
• If MediaScan reaches phase 3 with ECCs, it will pause until AutoBalanceLin is
no longer running. However, if MediaScan's priority were in the range 1-3, it
would cause AutoBalanceLin to pause instead.

Running and queued jobs:

ID     Type                State    Impact  Pri  Phase  Running Time
---------------------------------------------------------------------
26850  AutoBalanceLin      Running  Low     4    1/3    20d 18h 19m
26910  ShadowStoreProtect  Waiting  Low     6    1/1    -
28133  MediaScan           Running  Low     8    1/8    1d 15h 37m
---------------------------------------------------------------------

The listing shows that ShadowStoreProtect is in the waiting state because
AutoBalanceLin restripes in its first phase.

Job Engine Low Space Mode

• Job Engine enters a low space mode when it sees the available space on one
or more disk pools below a low space threshold, it regards the cluster as
running out of space.
• Low space mode enables jobs that free space (space saving jobs) to run before
the Job Engine or even the cluster become unusable.
• When available space reaches the high threshold, the Job Engine exits the low
space mode and resumes the jobs that were paused.


The graphic shows disk pools that have varying amounts of free capacity. One of the disk pools
crosses the low space threshold.


Job Types, Priority, and Impact

Job Priority

• Job priorities determine which job takes precedence when more than three jobs
of different exclusion sets attempt to run simultaneously.
• The Job Engine assigns a priority value between 1 and 10 to every job, with 1
being the most important and 10 being the least important.
• Job priorities are configurable312 by the cluster administrator. The default priority
settings are recommended.
• Higher priority jobs always cause lower-priority jobs of the same exclusion set to
be paused.313

312 Job priority can be changed either permanently or during a manual execution of
a job. A new job does not interrupt a running job when the jobs have the same
priority. It is possible to have a low impact, high priority job, or a high impact, low
priority job. In the Job Engine, jobs from similar exclusion sets are queued when
conflicting phases run. Changing the priority of a job can have negative effects on
the cluster. Jobs priority is a trade-off of importance. Historically, many issues have
been created by changing job priorities. Job priorities should remain at their default
unless instructed to change by a senior level support engineer.

313 The maximum number of jobs that can run simultaneously is three. If a fourth
job with a higher priority is started, either manually or through a system event, the
Job Engine pauses one of the lower-priority jobs that is currently running. The Job
Engine places the paused job into a priority queue, and automatically resumes the
paused job when one of the other jobs is completed. If two jobs of the same priority
level are scheduled to run simultaneously, and two other higher priority jobs are
already running, the job that is placed into the queue first is run first.


The graphic shows the timeline for a scenario where different jobs are run. The number on the job
name indicates the job priority for the respective job.

Job Impact Policy

• Job Engine uses impact policies that you can manage to control when a job
runs and the system resources that it consumes314.
• Job Impact Policy is a reflection of the impact caused by jobs on available CPU
and I/O resources315.
• Job Engine has four default impact policies that you can use but not modify.

Impact Policy Allowed to run Resource Consumption

314The Job Engine service monitors system performance to ensure that


maintenance jobs do not significantly interfere with regular cluster I/O activity and
other system administration tasks.

315If you want to specify other than a default impact policy for a job, you can create
a custom policy with new settings. Jobs with a low impact policy have the least
impact on available CPU and disk I/O resources. Jobs with a high impact policy
have a significantly higher impact. In all cases, however, the Job Engine uses CPU
and disk throttling algorithms to ensure that tasks that you initiate manually, and
other I/O tasks not related to the Job Engine, receive a higher priority.


LOW316        Any time of day                             Low

MEDIUM317     Any time of day                             Medium

HIGH318       Any time of day                             High

OFF_HOURS319  Outside of business hours. Business hours   Low
              are defined as 9 AM to 5 PM, Monday
              through Friday. OFF_HOURS is paused
              during business hours.

316By default, most jobs have the LOW impact policy, which has a minimum impact
on the cluster resources.

317 More time-sensitive jobs have a MEDIUM impact policy. These jobs have a
higher urgency of completion that is typically related to data protection or data
integrity concerns.

318The use of the HIGH impact policy is discouraged because it can affect cluster
stability. HIGH impact policy use can cause contention for cluster resources and
locks that can result in higher error rates and negatively impact job performance.

319The OFF_HOURS impact policy enables greater control of when jobs run,
minimizing the impact on the cluster and providing the resources to handle
workflows.


Important: Increasing job impact policy does not always make a job
complete faster because it may be constrained by disk I/O or other
running cluster processes.
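
As a sketch, a job instance can be started manually with a non-default priority and
impact policy (the job type and values are illustrative; leave the defaults in place
unless support advises otherwise):

# Start a MultiScan job at priority 4 using the OFF_HOURS impact policy
isi job jobs start MultiScan --priority 4 --policy OFF_HOURS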

Job Impact Policy: Levels and Intervals

• An impact policy can consist of one or many impact intervals, which are blocks
of time within a given week.
• Each impact interval can be configured to use a single impact level320.
• The available impact levels are: Paused, Low, Medium, and High.

320Impact levels are predefined and they specify the amount of cluster resources to
use for a particular cluster operation. A job's priority and impact level are
independent of each other.


• Impact intervals and levels can be configured per job321.


• A mix of jobs with different impact levels result in resource sharing322.

321Increasing or lowering an impact level from its default results in increasing or


lowering the number of workers that are assigned to the job. The number of
workers assigned to the job impacts the time that is required to complete the job
and the impact on cluster resources. Impact policies in the Job Engine are based
on the highest impact policy for any running job. Impact policies are not cumulative
between jobs but set the resource levels and number of workers that are shared
between the jobs. Significant issues are caused when cluster resources are
modified in the job impact settings. Lowering the number of workers for a job can
cause jobs to never complete. Raising the impact level can generate errors or
disrupt production workflows. Do not change the default impact policy for any given
job. The default settings are optimized for cluster maintenance without severe
workflow interruptions. The exception to this is the OFF_Hours policy. Depending
upon your workflow, use the OFF_HOURS policy to create a custom job to avoid
workflow peak periods. Do not use the HIGH impact policy unless you have
received senior technical support guidance to do so.

322Each job cannot exceed the impact levels set for it, and the aggregate impact
level cannot exceed the highest level of the individual jobs.


Scenarios showing the overall cluster impact when jobs with different impact levels run at the same
time on the cluster.

Job Impact Policy: Throttling Resources

• Job Engine throttling limits the resources and thereby limiting the rate at which
jobs can run.
• Certain jobs, if left unchecked, could consume vast quantities of a cluster’s
resources, contending with and impacting client I/O.
• Throttling is employed at a per-manager process level, so job impact can be
managed both granularly and gracefully.
• The coordinator process gathers cluster CPU and individual disk I/O load data
every 20 seconds from all cluster nodes:
− Decide number of threads323 running on each node to service each job.

323This can be a fractional number, and fractional thread counts are achieved by
having a thread sleep for a given percentage of each second.


− Coordinator makes a throttling decision.324


• The Job Engine allocates a specific number of threads to each node by default
controlling the impact of a workload on the cluster. 325

Job Types: Data Distribution

The most common Job Engine jobs can be broken into different types of use.

324Using this CPU and disk I/O load data, every sixty seconds the coordinator
evaluates how busy the various nodes are and makes a job throttling decision,
instructing the various job engine processes as to the action they must take. This
enables throttling to be sensitive to workloads in which CPU and disk I/O load
metrics yield different results. Also, there are separate load thresholds tailored to
the different classes of drives used in OneFS powered clusters, including high
speed SAS drives, lower performance SATA disks and SSDs.

325 If little client activity is occurring, more worker threads are spun up to allow more
work, up to a predefined worker limit. For example, the worker limit for a low-impact
job might allow one or two threads per node to be allocated, a medium-impact job
from four to six threads, and a high-impact job a dozen or more. When this worker
limit is reached (or before, if client load triggers impact management thresholds
first), worker threads are throttled back or terminated. For example, a node has four
active threads, and the coordinator instructs it to cut back to three. The fourth
thread is allowed to finish the individual work item it is currently processing, but
then quietly exit, even though the task as a whole might not be finished. A restart
checkpoint is taken for the exiting worker thread’s remaining work, and this task is
returned to a pool of tasks requiring completion. This unassigned task is then
allocated to the next worker thread that requests a work assignment, and
processing continues from the restart check-point. This same mechanism applies
when multiple jobs are running simultaneously on a cluster.


Jobs are not exclusive to themselves and often work by calling other jobs to
complete their task.

Jobs that are related to the distribution of the data on the cluster include:

• AutoBalance: Balances free space in a cluster, and is most efficient in clusters
that contain only HDDs. (Access method: Drive + LIN; Impact policy: Low;
Priority: 4; Operation: Manual)

• AutoBalanceLin: Balances free space in a cluster, and is most efficient in clusters
when file system metadata is stored on SSDs. (Access method: LIN; Impact
policy: Low; Priority: 4; Operation: Manual)

• Collect: Reclaims disk space that could not be freed due to a node or drive being
unavailable while they suffer from various failure conditions. (Access method:
Drive + LIN; Impact policy: Low; Priority: 4; Operation: Manual)

• MultiScan: Runs the Collect and AutoBalance jobs concurrently. (Access method:
LIN; Impact policy: Low; Priority: 4; Operation: Manual)


Job Types: Data Integrity and Protection

Data integrity and protection jobs326 run regularly on the cluster.

The reprotection jobs327 focus on returning data to a fully protected state.

Jobs that are related to testing the data integrity and protection include:

• FlexProtect: Scans the file system after a device failure to ensure that all files
remain protected. It is most efficient on clusters that contain only HDDs. (Access
method: Drive + LIN; Impact policy: Medium; Priority: 1; Operation: Manual)

• FlexProtectLin: Scans the file system after a device failure to ensure that all files
remain protected. It is most efficient when file system metadata is stored on
SSDs. (Access method: LIN; Impact policy: Medium; Priority: 1; Operation:
Manual)

• IntegrityScan: Performs online verification and correction of any file system
inconsistencies. (Access method: LIN; Impact policy: Medium; Priority: 1;
Operation: Manual)

• MediaScan: Locates and clears media-level errors from disks to ensure that all
data remains protected. (Access method: Drive + LIN; Impact policy: Low;
Priority: 8; Operation: Scheduled)

• ShadowStoreProtect: Protects shadow stores that are referenced by a LIN with a
higher level of protection. (Access method: LIN; Impact policy: Low; Priority: 6;
Operation: Scheduled)

326 Data integrity and data protection jobs can be further broken down into active
error detection and reprotection of the data. The active error detection includes jobs
that are often found running for long periods of time. The jobs run when no other
jobs are active and look primarily for errors on the drives or within the files.

327 Events such as a drive failure trigger reprotection jobs.

Job Types: Feature-Related

Feature-related jobs run as a part of specific features scheduled in OneFS.


• ChangelistCreate: Creates a list of changes between two snapshots with
matching root paths. You can specify these snapshots from the CLI. (Access
method: Changelist; Impact policy: Low; Priority: 5; Operation: Manual)

• Dedupe: Scans a directory for redundant data blocks and deduplicates all
redundant data stored in the directory. (Access method: Tree; Impact policy:
Low; Priority: 4; Operation: Manual)

• DedupeAssessment: Scans a directory for redundant data blocks and reports an
estimate of the amount of space that could be saved by deduplicating the
directory. (Access method: Tree; Impact policy: Low; Priority: 6; Operation:
Manual)

• FSAnalyze: Gathers and reports information about all files and directories
beneath the /ifs path. Reports from this job are used by InsightIQ users for
system analysis purposes. (Access method: Changelist; Impact policy: Low;
Priority: 1; Operation: Scheduled)

• SmartPools: Enforces SmartPools file pool policies. This job runs on a regularly
scheduled basis, and can also be started by the system when a change is made
(for example, creating a compatibility that merges node pools). (Access method:
LIN; Impact policy: Low; Priority: 6; Operation: Scheduled)

• SmartPoolsTree: Enforces SmartPools file pool policies on a subtree. (Access
method: Tree; Impact policy: Medium; Priority: 5; Operation: Manual)

• AVScan: Performs an antivirus scan on all files. (Access method: Tree; Impact
policy: Low; Priority: 6; Operation: Manual)

• QuotaScan: Updates quota accounting for domains created on an existing file
tree. This job should be run manually in off-hours after setting up all quotas, and
whenever setting up new quotas. (Access method: Tree; Impact policy: Low;
Priority: 6; Operation: Manual)

• SetProtectPlus: Applies a default file policy across the cluster. Runs only if a
SmartPools license is not active. (Access method: LIN; Impact policy: Low;
Priority: 6; Operation: Manual)

• SnapshotDelete: Creates free space associated with deleted snapshots.
Triggered by the system when you mark snapshots for deletion. (Access method:
LIN; Impact policy: Medium; Priority: 2; Operation: Manual)

• SnapRevert: Reverts an entire snapshot back to head. (Access method: LIN;
Impact policy: Low; Priority: 5; Operation: Manual)

• WormQueue: Processes the WORM queue, which tracks the commit times for
WORM files. After a file is committed to WORM state, it is removed from the
queue. (Access method: LIN; Impact policy: Low; Priority: 6; Operation:
Scheduled)

Job Types: Selective Use

The last category of jobs contains the jobs that are selectively run for specific
purposes.

These jobs may be scheduled; however, the administrator typically runs them only
when required.

• DomainMark: Associates a path, and the contents of that path, with a domain.
(Access method: Tree; Impact policy: Low; Priority: 5; Operation: Manual)

• PermissionRepair: Uses a template file or directory as the basis for permissions
to set on a target file or directory. The target directory must always be
subordinate to the /ifs path. (Access method: Tree; Impact policy: Low;
Priority: 5; Operation: Manual)

• ShadowStoreDelete: Frees up space that is associated with shadow stores.
Shadow stores are hidden files that are referenced by cloned and deduplicated
files. (Access method: LIN; Impact policy: Low; Priority: 2; Operation: Scheduled)

• TreeDelete: Deletes a specified file path in the /ifs directory. (Access method:
Tree; Impact policy: Medium; Priority: 4; Operation: Manual)

• Upgrade: Upgrades the file system after a software version upgrade. (Access
method: Tree; Impact policy: Medium; Priority: 3; Operation: Manual)


Job Performance

The graphic shows the jobs that support result merging.

• Maintenance functions use system resources and can take hours or days to
run.
• Not all OneFS Job Engine jobs run equally fast.328
• The time that it takes for a job to run can vary depending on several factors,
including:
− Other system jobs that are running

328 Some jobs can take a long time to complete. However, those jobs should get
paused so jobs of higher immediate importance can finish. Pausing and restarting
is an example of the balance for job priorities that are considered when the default
settings were determined. A job that runs through files progresses slower on a
cluster with many small files than on a cluster with a few large files. Jobs that
compare data across nodes (such as Dedupe) run slower when making many
comparisons. Many factors play into this, and linear scaling is not always possible.
If a job runs slowly, the first questions should be directed to discover what is the
context of the job.


− Other processes that are taking up CPU and I/O cycles while the job is
running
− Cluster configuration
− Dataset size
− Job access method329
− Time since the last iteration of the job
• Job results are merged and delivered in batches330 when multiple jobs are
running simultaneously.

329The specific access method influences the run time of a job. For instance, some
jobs are unaffected by cluster size, others slow down or accelerate with the more
nodes a cluster has, and some are highly influenced by file counts and directory
depths.

330 On large clusters with multiple jobs running at high impact, the job coordinator
can become bombarded by the volume of task results being sent directly from the
worker threads. In OneFS 8.2 and later, this is mitigated by certain jobs performing
intermediate merging of results on individual nodes and batching delivery of their
results to the coordinator.


Job Engine Management

Job Engine Management Overview

• The cluster health depends on the Job Engine and the configuration of jobs in
relationship to each other.
• Administrators can manage the Job Engine using the WebUI or the CLI.
− Manual Job Execution331
− Scheduled Job Execution332
− Proactive Job Execution333

331 The majority of the Job Engine jobs have no default schedule and can be
manually started by a cluster administrator.

332 Jobs such as FSAnalyze, MediaScan, ShadowStoreDelete, and SmartPools are
normally started via a schedule.


− Reactive Job Execution334


− Job Control335
• As maintenance jobs run, the Job Engine constantly monitors and mitigates
their impact on the overall performance of the cluster.

Job Operations

All job operations are managed by navigating to the Cluster management > Job
operations page of the OneFS WebUI.

Job Summary

• View the status of running jobs.

333The Job Engine can also initiate certain jobs on its own. For example, if the
SnapshotIQ process detects that a snapshot has been marked for deletion, it will
automatically queue a SnapshotDelete job.

334The Job Engine executes jobs in response to certain system event triggers. In
the case of a cluster group change, for example the addition or subtraction of a
node or drive, OneFS automatically informs the job engine, which responds by
starting a FlexProtect job. The coordinator notices that the group change includes a
newly-smart-failed device and then initiates a FlexProtect job in response.

335Job administration and execution can be controlled via the WebUI, the CLI, or
the OneFS RESTful platform API. For each of these control methods, additional
administrative security can be configured using RBAC. By restricting access via the
ISI_PRIV_JOB_ENGINE privilege, it is possible to allow only a sub-set of cluster
administrators to configure, manage and execute job engine functionality, as
desirable for the security requirements of a particular environment.


• Pause or stop a running job.


• Update a job336.

Job Types

• View a list of all jobs.


• Start a job manually.
• Edit the priority, impact policy, and schedule for a job

336You can change the priority and impact policy of a running, waiting, or paused
job. When you update a job, only the current instance of the job runs with the
updated settings. The next instance of the job returns to the default settings for that
job.


Job Reports

• View the report details and events associated with a completed job or job phase.
• View job history.

Job Events

• View the details of job events for each job or job phase.
• Jobs can be filtered based on job ID or job type.


Job Impact Policies

• Create a custom impact policy from scratch or by copying and editing an existing
system impact policy.
• Modify and delete custom impact policies.


Manual and Scheduled Job Execution

Scenario

✓ Schedule Dedupe job once every Sunday.
✓ Manually start Dedupe job.
✓ View the active job details and status.
✓ View job statistics.
✓ Check the job report once job completes.

Schedule Job

CLI: isi job types modify Dedupe --schedule "Every Sunday at 12:00 PM"


Start Job Manually

CLI: isi job jobs start Dedupe


Active Job Details

Job Progress gives live details about the job activities.

• While a job is running, an Active Job Details report is available.


• The details provide contextual information, including elapsed time, current job
phase, job progress status, and more.
• For inode (LIN) based jobs, progress is also displayed as an estimated
percentage of completion, based on processed LIN counts.
• CLI: isi job jobs view


isi job status

The graphic shows the isi job status output with callouts for the Job Engine
status, the coordinator node, running and queued jobs, and recent jobs that either
completed or failed.

• The isi job status command displays the running, paused, or queued jobs,
and the status of the most recent jobs.
• The --verbose option adds failed and completed jobs to the list and gives
greater detail in the job summary.
• The output provides job-related cluster information, including identifying the
coordinator node and if any nodes are disconnected from the cluster.
• Running isi job status is a simple way to detect whether jobs are creating
the main performance bottleneck on the cluster.

isi job statistics

The Job Engine provides detailed monitoring and statistics gathering, with insight
into jobs and job engine.

Various job engine-specific metrics are available via the OneFS CLI, including per
job disk usage.

For example, worker statistics and job level resource usage can be viewed using
the isi job statistics list command.


Also, the status of the Job Engine workers is available using the isi job
statistics view command.

isi job statistics list

The Coordinator assigns a Job ID for the entire cluster and the PID for the individual nodes.

isi job statistics view


Job Reports

• A comprehensive job report is provided for each job phase.


• Detailed job performance information and statistics are available in a job report.
• The statistics for a job phase include:
− CPU and memory utilization (including minimum, maximum, and average).
− Total read and write IOPS and throughput.
• CLI: isi job reports view


Checking Jobs and Job Queue

• If you suspect that the combination of client workload and jobs is overworking the
cluster, examine the performance of the disks.
• The sysctl command shows the number of operations queued per disk.
• When interpreting the results, keep in mind the type of connections337.

The example shows no latency from the sysctl hw.iosched command output.

Caution: Storage administrators may use the sysctl commands to regularly poll
the cluster to understand the baseline and identify anomalous behavior. Do not use
sysctl to change cluster parameters without consulting Dell Services.

337 For example, if you are looking at a single-stream SMB-based workflow, a
queue of four could indicate an issue. If the connections represent high-concurrency
NFS namespace operations, four is fine, and even a higher number could still
indicate no problems. So, the type of connections matters.


Jobs Impact on System Load

Assess the activity of the Job Engine, including assessments of which jobs put
which kinds of loads on your cluster.

After assessment, establish whether the Job Engine is a meaningful source of load
for the cluster at all.

Sometimes it is; in other cases it is entirely negligible, but you cannot know with
certainty unless you look at the job run times.

Here are some items to consider:


• Track job system run times
• See which jobs run in parallel
• Measure the load by comparing loads with no jobs running and examining the
isi_job_d process activity
• Use trending and baseline comparisons to determine if jobs are pushing the
cluster beyond its normal usage levels
• Make informed tuning decisions


Long Running Jobs

• Sometimes jobs take a long time to complete, or they appear to be stuck.338
• Bring up the job engine log file at /var/log/isi_job_d.log and look for job engine
worker assignments.
• The log indicates when a worker completes a task and is assigned new tasks. If it
is still unclear, the administrator should call support before taking action.
• Restarting a job may not be the best path as it has to start from the beginning of
the task and may not quickly return to the previous point.
• Another way to resolve the issue is by adjusting the impact policies.

Differentiating Stuck Jobs

• One common source of frustration is trying to determine whether a long running
job is actually in operation, or whether it has stopped running for some reason.
• /var/log/isi_job_d.log is the key log file which contains all job engine activity. It
records the jobs started, stopped, paused, restarted, or modified. It also records
job engine worker thread activities, and which workers are active on which jobs.
This means that even when jobs are running silently, without apparently visible
progress, the administrator can see that the job is active by watching the activity
of worker threads in isi_job_d.log.

338 Job system monitoring can reveal cases where a long running job is repeatedly
interrupted by higher priority jobs, making it appear to stay in the queue indefinitely
when in fact it is rarely getting a chance to run. If a job appears stuck, there may
still be activity going on that is difficult to see.


The graphic shows the /var/log/isi_job_d.log output, with callouts for manager
thread reporting, worker reporting, and the job succeeded report.

Troubleshooting Job Related Issues

Listed are the common areas to consider when addressing Job Engine related
issues:

• Misconfigured Jobs339
• Job History340

339Misconfigured jobs can affect cluster operations. Examine how the jobs have
been configured to run, how they have been running, and whether jobs are failing.
Failed jobs can also be an indicator of other cluster issues. For example, many
starts and restarts of the MultiScan or Collect jobs indicate group changes.
Group changes occur when drives or nodes leave or join the cluster.


• Job Engine Misconfigurations341


• Impact Level Changes342
• Long Running Jobs343

340The job events and operations summary either from the WebUI or the CLI is
useful for immediate history. Often an issue is recurring over time and can be more
easily spotted from the job history or job reports. For example, a high priority job
constantly pushes other jobs aside, but a less consistent queue backup can still
prevent features from properly operating. The issue can require deeper dives into
the job history to see what is not running, or is running only infrequently.

341 Job Engine misconfigurations are a common way to affect performance.
Changing the priority of a job, and when a job is scheduled to run, can prevent the
job from running on schedule. As an example, an administrator changes the priority of
the SmartPools job to a 2 and the SnapshotDelete job to an 8. The administrator
schedules both jobs simultaneously. Almost all other jobs take priority and the
SnapshotDelete job will only run about twice a month. The result is the snapshots
frequently fill the available space on the cluster. Also, when the job runs, it runs
during peak workflow hours, impacting the cluster performance. If the administrator
changes a job priority, investigate the reason for the change. Look for alternative
configuration options to achieve the goal.

342 Impact level changes directly affect the job completion time and the cluster
resources. For example, an administrator modified the LOW impact policy to have
0.1 maximum workers or threads per storage unit. The result was that no low
impact job ever completed. The customer then changed the jobs with LOW impact
policies to a MEDIUM impact policy. When the jobs ran it negatively impacted
cluster performance. After investigation, the customer made the changes to limit
the impact during peak workflow hours. Restoring all settings to the system defaults
fixed the issue. A custom schedule was then implemented by modifying the
OFF_HOURS policy, achieving the intended goal.


Job Engine Observations

Limits

The Job Engine has certain limitations, some by design and others by the nature of
its role:
• Only three concurrent jobs can run on the Job Engine.
• High priority jobs are prioritized in the same exclusion set.
• Job scalability varies:

− Some fast regardless of cluster size


− Some slow down with more nodes344
− Some depend on file/directory count
− Disk performance and /ifs file system size

343 Some jobs naturally have a long lifespan. FSA, deduplication, and Autobalance
can all have a long active period. Administrators should carefully evaluate the
circumstances before trying to stop the jobs on the assumption that they have
somehow stopped responding. There are some jobs that should not be interfered
with unless support directs you; FlexProtect is the primary one, as this job
reprotects data. The cluster monitors this job closely and alerts if there is a problem
during this job. Adjust between low and medium impact, but consult technical
support before any additional action is taken.

344 For example, the Dedupe job slows in proportion to the product of storage pool
size and the number of storage pools. A small cluster may run dedupe daily,
medium clusters on the weekends, and larger clusters monthly or quarterly.


Considerations

• When configuring priority, schedule, and impact:


− What resources are impacted?
− Gain or lose by reprioritizing a job?
− Impact options benefits and drawbacks?
− Job duration and contention?
• Jobs with the LIN suffix are faster if SSDs are present in the node pool or cluster,
or if L3 cache is used.
• When more than three jobs with the same priority and exclusion set contend to
run, the three jobs with the lowest job IDs are run (see the sketch below).
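The selection rules above can be illustrated with a short Python sketch. It is not
OneFS code; the job names, IDs, priorities, and exclusion-set labels are assumed
values used only to show the behavior: at most three concurrent jobs, one job per
exclusion set, a lower priority number wins, and the lowest job IDs break ties.

MAX_CONCURRENT = 3

def pick_jobs(queued):
    """queued: list of dicts with 'id', 'name', 'priority', 'exclusion_set'."""
    running, used_sets = [], set()
    # Lower priority value is more important; lowest job ID breaks ties.
    for job in sorted(queued, key=lambda j: (j["priority"], j["id"])):
        if len(running) == MAX_CONCURRENT:
            break
        if job["exclusion_set"] and job["exclusion_set"] in used_sets:
            continue  # another job from the same exclusion set is already chosen
        running.append(job["name"])
        used_sets.add(job["exclusion_set"])
    return running

queue = [
    {"id": 12, "name": "MediaScan",   "priority": 8, "exclusion_set": "marking"},
    {"id": 10, "name": "FlexProtect", "priority": 1, "exclusion_set": "restripe"},
    {"id": 11, "name": "AutoBalance", "priority": 4, "exclusion_set": "restripe"},
    {"id": 13, "name": "QuotaScan",   "priority": 6, "exclusion_set": None},
]
print(pick_jobs(queue))  # ['FlexProtect', 'QuotaScan', 'MediaScan']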

Best Practices

• Schedule jobs to run during cluster’s low usage hours.


• Use default priority, impact, and schedule where possible.
• Allow jobs to complete before running OneFS upgrade.
• Allow FlexProtect to complete before powering down one or more nodes.
• It is recommended not to disable the snapshot delete job.
• In a heterogeneous cluster, tune job priorities and impact policies to the level of
the lowest performance tier.

Resource: PowerScale OneFS Job Engine Whitepaper


Challenge

Lab Assignment:
1) Perform the different job operations.
2) Understand Job Engine behavior due to factors such as exclusion sets
and job priority.


OneFS Services


Module Objectives

After completion of this module, you can:

• Discuss Small File Storage Efficiencies (SFSE) and SFSE defragmentation.


• Discuss various migration phases and tools.
• Describe the SmartQuotas architecture and understand various scenarios.
• Describe SnapshotIQ architecture and managing and restoring Snapshots.
• Discuss the SyncIQ components and the performance rules.


SFSE – Small Files Storage Efficiencies


Small File Definitions

Before moving into the case study, Hayden first wants to know what exactly defines
a small file.

The graphic shows a 3x mirror for a small file (<=128 KB).

A small file is a file less than one stripe unit in length, or 128 KB or less. OneFS
does not break small files into smaller logical chunks.

OneFS uses forward error correction (FEC) to parity protect a file, resulting in high
levels of storage efficiency.

Small files are mirrored, so they have a larger on-disk footprint. With mirroring,
OneFS makes copies of each file and distributes multiple instances of the entire
protection unit across the cluster. The loss protection requirements of the requested
protection determine the number of mirrored copies. If the workflow has millions of
small files, the efficiency can become a significant issue. When FEC protection is
calculated, it is calculated at the 8 KB block level. If there is only one 8 KB to use in
the calculation, the result is a mirror of the original data block. The requested
protection level determines the number of mirrored blocks.

Many archive datasets are moving away from large file formats such as tar and .zip
files to storing smaller files individually, allowing in-place analytics. To address the
trend, OneFS uses Small File Storage Efficiency or SFSE. SFSE maximizes the
cluster capacity by decreasing the storage that is required for small file archive
data.

File Sizes

Small Files Under 128 KB

64 KB file:

• FEC calculates as a mirror for 8 KB blocks.
• Only one block to calculate against = same block.
• 16 x 8 KB stripe unit per file.


The table shows a 64 KB file with the protection level set at N+2d:1n. The
protection level results in a 3x mirror. The result is that the 64 KB file consumes
192 KB of storage.

 There are no minimum read or write cache benefits345.


 Random reads occur frequently346.
 Setting mirrored protection347

345Since small files are a single stripe unit and not related to other stripe units,
there is no minimum read or write cache benefits. The use of L3 cache can improve
chances of gaining a cache benefit for repeat random reads. In other words, the
same small read multiple times could benefit from L3 cache.

346If the workflow is predominantly small files, setting the access pattern to random
can reduce using unnecessary cluster resource when predicting cache data.

347If the workflow data is going to be all small files, CPU resources can be saved
by setting the requested protection level as mirrored protection.


File Sizes Not Evenly Divisible by 128 KB

176 KB file

FEC and mirrored protection

Files not evenly divisible by 128 KB result in some blocks being mirrored rather than FEC protected348.

The table shows a 176 KB file.

348Not all 8 KB blocks have a corresponding block in the second data stripe to
calculate FEC against.


 The file has one 128 KB stripe unit and one 48 KB stripe unit.
 The first six 8 KB blocks of each stripe unit calculate FEC.
 The remaining ten 8 KB blocks have mirrored protection.
 The ten unused blocks of data stripe 2 are free for use in the next stripe unit349.
 The 176 KB file has little caching benefits350.
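A simplified Python model (an assumption for illustration, not the exact OneFS
layout) shows how block positions that exist in more than one stripe unit can be FEC
protected against each other, while positions present in only one stripe unit fall back
to mirroring, reproducing the 176 KB example above.

BLOCK_KB, BLOCKS_PER_STRIPE_UNIT = 8, 16

def protection_layout(file_kb):
    blocks = -(-file_kb // BLOCK_KB)                  # ceiling division: 8 KB blocks used
    full_units, tail = divmod(blocks, BLOCKS_PER_STRIPE_UNIT)
    units = full_units + (1 if tail else 0)
    # Count how many stripe units hold a block at each position 0..15.
    per_position = [full_units + (1 if tail and pos < tail else 0)
                    for pos in range(BLOCKS_PER_STRIPE_UNIT)]
    fec_blocks = sum(c for c in per_position if c > 1)       # positions shared across units
    mirrored_blocks = sum(c for c in per_position if c == 1) # positions in only one unit
    return units, fec_blocks, mirrored_blocks

print(protection_layout(176))   # (2, 12, 10): six FEC positions in each unit, ten mirrored
print(protection_layout(64))    # (1, 0, 8): a small file is entirely mirrored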

Calculating Space for Small File

Consume 8 KB block

OneFS uses only the required 8 KB blocks to save the file.

For more precise consumption, the metadata is calculated per file.

Assuming GNA is not enabled, there are three 512 B metadata blocks per file for
this example, or 1.5 KB. So the total space is 25.5 KB for the file on disk.

Example: Small Files

All files 128 KB or less are mirrored. For a protection strategy of N+1 the 128 KB
file is mirrored, the original data and a copy.

349 The stripe unit is not padded, and the capacity is not wasted.

350L3 cache recognizes this file size and enables repeat random read caching.
Setting a random access pattern may be appropriate depending on the workflow.


The example demonstrates how all files 128 KB or less are mirrored. Shown is a
four node cluster. For a protection strategy of N+1 the 128 KB file has a 2X mirror,
the original data and a mirrored copy. FEC is still calculated on files less than or
equal to 128 KB, but the result is a copy.

Warning: Setting mirrored protection mirrors all files regardless of size; use it only
when and where appropriate.

Calculating Space for Small File: 8 KB is the minimum block size used. 8 KB was
chosen for storage efficiencies and is the optimal size for most PowerScale
workflows. Any file or portion of a file less than 8 KB consumes an 8 KB block. So a
4 KB file consumes one 8 KB block. A 12 KB file consumes two 8 KB blocks. A 24
KB file consumes three 8 KB blocks. A 4 KB file with N+2d:1n requested protection
level has 8 KB for the data, and two 8 KB mirrors, totaling 24 KB.
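A minimal worked calculation in Python, matching the figures above; the three 512 B
metadata blocks per file (assuming GNA is not enabled) are taken from the earlier
example and should be treated as an approximation rather than an exact formula.

BLOCK_KB = 8

def small_file_on_disk_kb(file_kb, mirror_copies, metadata_blocks=3):
    data_blocks = -(-file_kb // BLOCK_KB)             # each partial block still uses a full 8 KB
    data_kb = data_blocks * BLOCK_KB * mirror_copies  # original plus mirrored copies
    metadata_kb = metadata_blocks * 0.5               # 512 B metadata blocks
    return data_kb + metadata_kb

# 4 KB file at N+2d:1n (3x mirror for small files): 8 KB x 3 + 1.5 KB = 25.5 KB
print(small_file_on_disk_kb(4, mirror_copies=3))      # 25.5
# 128 KB file at N+1 (2x mirror): 128 KB x 2 = 256 KB plus metadata
print(small_file_on_disk_kb(128, mirror_copies=2))    # 257.5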

Small Files vs. Large Files


Mirroring small files can be a large space concern.

• Even with the protection and metadata overhead, the consumed space is little.
• SFSE can minimize the potential impact351.

With one million 24 KB files at a requested protection of N+2d:1n, the file data,
protection overhead, and metadata is about 70.09 GB.

In the given example, in the small files scenario, it takes 70 GB to hold 1 million files
of 24 KB each, so the efficiency ratio is 2.92:1 (that is, it takes 2.92 storage units to
store 1 real storage unit). Compared to the large files scenario, where it takes
1.35 GB to hold 1.2 GB, the efficiency ratio is 1.13:1.

A 1.5 hour YouTube video at 1080p averages approximately 1.2 GB per file before
protection and metadata. With protection and metadata overhead, the file is about
1.35 GB. So, one million small files are about the same as 52 YouTube videos. It
takes only a few large files to equal the consumption of many small files.
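The ratios quoted above can be reproduced with a few lines of Python. The on-disk
figures (70.09 GB for one million 24 KB files, about 1.35 GB for the 1.2 GB video)
come from the text; the 24 GB logical figure assumes decimal units (1 GB =
1,000,000 KB), which is how the 2.92:1 ratio was derived.

def efficiency_ratio(consumed_gb, logical_gb):
    """Storage units consumed per unit of logical data."""
    return consumed_gb / logical_gb

small_files_logical_gb = 1_000_000 * 24 / 1_000_000     # one million 24 KB files = 24 GB (decimal)
print(efficiency_ratio(70.09, small_files_logical_gb))  # ~2.92 -> the 2.92:1 ratio above
print(efficiency_ratio(1.35, 1.2))                      # 1.125 -> quoted as ~1.13:1 above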

Inline Data Reduction Efficiency

You can improve the storage efficiency for small files by using inline data reduction.
OneFS inline data reduction combines both real-time compression and
deduplication. Data compression352 and deduplication353 are specialized data
reduction techniques that allow for the reduction in physical size of data.

351 OneFS small file usage may not be highly efficient, but there is not a large
impact. One method is to analyze data in three categories: the number of small
files and the average file size, the number of large files and average file size, and
the number of all other or medium files and average file size. The idea is to look at
all workflows and not just the workflow with many small files.

• The primary purpose of OneFS inline data reduction is to reduce the storage
requirements for data, resulting in a smaller storage footprint.
• Inline data reduction also helps to shrink the total amount of physical data that
is written to storage devices.


1: The inline data reduction zero block removal phase detects blocks that contain
only zeros and prevents them from being written to disk. This phase reduces the
disk space requirements and avoids unnecessary writes to SSD, resulting in
increased drive longevity.

Zero block removal occurs first in the OneFS inline data reduction process. As such,
it has the potential to reduce the amount of work that both inline deduplication and
compression must perform. The check for zero data does incur some overhead.
However, for blocks that contain nonzero data the check is terminated on the first
nonzero data that is found, which helps to minimize the impact.

352 Compression uses a lossless algorithm to reduce the physical size of data when
it is written to disk and decompresses the data when it is read back. More
specifically, lossless compression reduces the number of bits in each file by
identifying and reducing or eliminating statistical redundancy. No information is lost
in lossless compression, and a file can easily be decompressed to its original form.

353 Deduplication differs from data compression in that it eliminates duplicate copies
of repeating data. Whereas compression algorithms identify redundant data inside
individual files and encode the redundant data more efficiently, deduplication
inspects data and identifies sections, or even entire files, that are identical, and
replaces them with a shared copy.

The following characteristics are required for zero block removal to occur:

• A full 8 KB block of zeroes.


• A partial block of zeroes being written to a sparse block.

The write converts the block to sparse if not already. A partial block of zeroes being
written to a non-sparse, non pre-allocated block does not zero eliminate.

2: Storage efficiency is achieved by scanning the data for identical blocks as it is


received and then eliminating the duplicates. When a duplicate block is discovered,
inline deduplication moves a single copy of the block to shadow stores.

When a client writes a file to a node pool configured for inline deduplication on a
cluster, the write operation is divided up into whole 8 KB blocks. Each of these
blocks is then hashed and its cryptographic fingerprint compared against an in-
memory index for a match. One of the following operations occurs:

• If a match is discovered with an existing shadow store block, a byte-by-byte


comparison is performed. If the comparison is successful, the data is removed
from the current write operation and replaced with a shadow reference.
• When a match is found with another LIN, the data is written to a shadow store
instead and replaced with a shadow reference. Next, a work request is
generated and queued that includes the location for the new shadow store
block, the matching LIN and block, and the data hash. A byte-by-byte data
comparison is performed to verify the match, and the request is then processed.
• If no match is found, the data is written to the file natively and the hash for the
block is added to the in-memory index.

3: When a file is written to OneFS using inline data compression, the logical space
of file is divided up into equal sized chunks that are called compression chunks.

• Compaction is used to create 128 KB compression chunks, with each chunk
consisting of sixteen 8 KB data blocks.


• This is optimal since 128 KB is the same chunk size that OneFS uses for its
data protection stripe units.
• It provides simplicity and efficiency, by avoiding the overhead of additional
chunk packing.

If OneFS SmartDedupe is also licensed and running on the cluster, this data
reduction savings value reflects a combination of compression, inline deduplication,
and postprocess deduplication savings. If both inline compression and
deduplication are enabled on a cluster, zero block removal is performed first,
followed by deduplication, and then compression. This order allows each phase to
reduce the scope of work for each subsequent phase.

In-line compression example:

Consider the following 128 KB chunk:

After compression, this chunk is reduced from sixteen to six 8KB blocks in size.
This means that this chunk is now physically 48 KB in size. OneFS provides a
transparent logical overlay to the physical attributes.

This overlay describes whether the backing data is compressed or not and which
blocks in the chunk are physical or sparse, such that file system consumers are
unaffected by compression.

The compressed chunk is logically represented as 128 KB in size, regardless of its
actual physical size. The orange sector in the graphic represents the trailing,
partially filled 8 KB block in the chunk. Depending on how each 128 KB chunk
compresses, the last block is underutilized up to 7 KB after compression.


Efficiency savings must be at least 8 KB (one block) for compression to occur;
otherwise, that chunk or file is passed over and remains in its original,
uncompressed state.

For example, a file of 16 KB that yields 8 KB (one block) of savings would be
compressed. Once a file has been compressed, it is then protected with FEC parity
blocks, reducing the number of FEC blocks and therefore providing further overall
storage savings.

Compression chunks never cross node pools. This avoids the need to decompress
or recompress data to change protection levels, perform recovered writes, or shift
protection-group boundaries.
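The Python sketch below is purely illustrative and is not OneFS code; it only ties the
pieces together in the order described above for incoming data: zero-block removal,
then inline deduplication against a fingerprint index, then chunk compression that is
kept only when it saves at least one 8 KB block. The index structure, the SHA-256
fingerprint, and zlib as the compressor are stand-ins; OneFS's actual internals differ
(for example, it also performs a byte-by-byte verify after a fingerprint match).

import hashlib
import zlib

BLOCK = 8192
CHUNK = 16 * BLOCK                                    # 128 KB compression chunk

shadow_index = {}                                     # fingerprint -> shadow location (assumed)

def dedupe_block(block, location):
    """Zero-block removal, then inline dedupe, for a single 8 KB block."""
    if block == b"\x00" * BLOCK:
        return "zero-eliminated"
    digest = hashlib.sha256(block).hexdigest()        # stand-in fingerprint
    if digest in shadow_index:
        return "deduplicated (shadow reference)"
    shadow_index[digest] = location
    return "written natively"

def compress_chunk(chunk):
    """Compress a 128 KB chunk only if at least one 8 KB block is saved."""
    compressed = zlib.compress(chunk)
    physical_blocks = -(-len(compressed) // BLOCK)    # ceiling: 8 KB blocks after compression
    if (16 - physical_blocks) >= 1:
        return physical_blocks                        # for example 16 -> 6 in the text above
    return 16                                         # not worth it; chunk stays uncompressed

print(dedupe_block(b"\x00" * BLOCK, "ss::0"))         # zero-eliminated
print(dedupe_block(b"A" * BLOCK, "ss::1"))            # written natively
print(dedupe_block(b"A" * BLOCK, "ss::2"))            # deduplicated (shadow reference)
print(compress_chunk(b"ABCD" * (CHUNK // 4)))         # highly repetitive data compresses to 1 block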

Inline Data Reduction Efficiency Reporting

OneFS provides six principal reporting methods for obtaining efficiency information
with inline data reduction.

isi statistics data-reduction

The most comprehensive of the data reduction reporting CLI utilities is the isi
statistics data-reduction command.

The recent writes data to the left of the output provides precise statistics for the
five-minute period before running the command.


isi compression

The isi compression stats command provides the option to view compression
statistics.

The isi compression stats command also accepts the list argument, which
consolidates a series of recent reports into a list of the compression activity across
the file system.

When run in view mode, the command returns the compression ratio for both
compressed and all writes, also the percentage of incompressible writes, for a prior
five-minute (300 seconds) interval.

The isi compression stats output also provides a count of logical and physical
blocks and compression ratios, plus the percentage metrics for incompressible and
skipped blocks.

isi dedupe

The isi dedupe stats command provides cluster deduplication data usage and
savings statistics, in both logical and physical terms.

The isi dedupe stats output reflects the sum of both in-line dedupe and
SmartDedupe efficiency.


isi get -O

A -O logical overlay flag has been added to isi get for viewing the compression
details of a file.

The logical overlay information is described under the protection groups output.
The example in the graphic shows a compressed file where the sixteen-block chunk
is compressed down to six physical blocks (#6) and ten sparse blocks (#10).

Under the Metatree logical blocks section, a breakdown of the block types and
their respective quantities in the file is displayed - including a count of compressed
blocks.


Configuring SmartQuotas reporting

In OneFS 8.2.1 and later, OneFS SmartQuotas has been enhanced to report the
capacity saving from in-line data reduction as a storage efficiency ratio354.

On a cluster with licensed and configured SmartQuotas, the efficiency ratio can be
easily viewed from the WebUI or using the isi quota quotas list CLI
command.

Dashboard Storage Efficiency Summary

In OneFS 8.2.1 and later, the OneFS WebUI cluster dashboard now displays a
storage efficiency tile, which shows physical and logical space utilization
histograms and reports the capacity saving from in-line data reduction as a storage
efficiency ratio.

354 SmartQuotas reports efficiency as a ratio across the desired data set as
specified in the quota path field. The efficiency ratio is for the full quota directory
and its contents, including any overhead, and reflects the net efficiency of
compression and deduplication.


The graphic shows the OneFS WebUI Cluster Status Dashboard – Storage Efficiency Summary.

The cluster data reduction metrics on the right of the output are slightly less real
time but reflect the overall data and efficiencies across the cluster. This metric is
designated by the Est. prefix, denoting an estimated value.

The ratio data in each column is calculated from the values above it. For example,
to calculate the data reduction ratio, the logical data (effective) is divided by the
preprotected physical (usable) value. The calculated data Reduction ratio is
1.76:1 (339.50 / 192.87 = 1.76).

Considerations: Mixed Datasets

Areas to consider when discussing mixed datasets.


• Different file sizes incur different protection overhead depending on the size and
the protection level set.
• Most datasets include mix of small and large files.


• Storage consolidation creates datasets with mixed file sizes, reducing total
storage overhead.
• Analyze full distribution of small and large files, not average – average file size
calculates to higher storage overhead.

Link: See the Dell EMC PowerScale OneFS Storage Efficiency white paper for
more information.

SFSE is a storage efficiency product, not a performance product.

The list shows areas to consider.


• If a file is fragmented, the time to retrieve the file may be greater than the time
to retrieve an unpacked file.
• You cannot containerize CloudPools stubbed files.
• Since OneFS unpacks SyncIQ data, the target cluster needs SmartPools licensed
and packing configured.
• The alternate data streams of a file are not containerized by default.
• Packing and unpacking are logically preserving actions and do not trigger a
snapshot.
• Since deduped data is in a ShadowStore, SFSE has little benefit.
• SmartDedupe skips data already in the shadow store.


SFSE Defragmentation Overview

SFSE is designed for infrequently modified, archive-type datasets.

• SFSE Estimation tool355


• Containers356
• Shadow stores are similar to regular files.
• The shadow store defragmenter integrates into the ShadowStoreDelete job,
runs on a schedule, and reduces the fragmentation.

355The SFSE estimation tool can anticipate the expected savings from the SFSE
feature. SFSE targets workflows with files less than 1 MB in size.

356 Improvements in storage efficiency are achieved by packing multiple small files
into shadow stores called containers.


Tip: See Small File Storage Efficiency for Archive to learn more about SFSE for
archive.

When files with shadow references are deleted, truncated, or overwritten, it can
leave unreferenced blocks in the shadow stores. These blocks are later freed and
result in holes which causes fragmentation and reduces the storage efficiency.
Reclaiming the space is the problem the defragmentation tool, or defragmenter,
solves.

The shadow store defragmenter helps expand the SFSE feature for archive-type
workloads. To improve storage efficiency, the defragmenter reduces fragmentation
that overwrites and deletes cause. Limit overwrites and deletes to containerized
files, which cause fragmentation and impact both file read performance and storage
efficiency. In OneFS 8.2, the ShadowStoreDelete Job runs on a daily schedule
instead of a weekly schedule. The defragmenter divides each shadow store up into
logical chunks and assesses each chunk for fragmentation. If the current storage
efficiency of each chunk is below a target efficiency, then OneFS moves the chunk
to another shadow store location. The default target efficiency is 90% of the
maximum storage efficiency available with the protection level on the shadow store.
Larger protection group sizes can tolerate a higher level of fragmentation before
the target efficiency drops below this threshold.

SFSE Administration and Support

The tabs show the SFSE administration tasks.


Prerequisites

Before considering to enable SFSE, ensure that the following prerequisites are
met:
• Archive dataset357
• Small files358
• SmartPools license359
• OneFS 9.0.0 or later360

Enable SFSE

Packing is enabled using the CLI. If needed, use the isi_packing command to
configure the maximum file size value instead of defining it in the file pool policy.
Also, consider the minimum age for packing (--min-age <seconds>) when
configuring.

Example:

# isi_packing --enabled=true

357 SFSE is strictly an archive solution. An active dataset can generate
fragmentation which impacts performance and storage efficiency.

358Most of the archive consists of small files. By default, the threshold target file
size is from 0 MB to 1 MB.

359 Ensure SmartPools is licensed and active on the cluster.

360 SFSE is supported in OneFS 9.0.0 and later.


File Pool Policy

Configure using a file pool policy. Use a path-based file pool policy, where possible,
rather than complex file pool filtering logic. The default minimum age for packing is
one day. Due to the work to pack the files, the first SmartPools job may take a
relatively long time, but subsequent runs should be much faster.

Example create:
# isi filepool policies create fin_arc --enable-packing=true --begin-filter --path=/ifs/finance/archive --end-filter
Example verify:

# isi filepool policies view fin_arc

SmartPools Job

The SmartPools job containerizes the small files in the background. You can run
the job on-demand or using the nightly schedule. Data in a snapshot is not packed;
SFSE only containerizes HEAD file data. A threshold prevents very recently
modified files from being containerized. The SmartPoolsTree job, isi filepool
apply, and isi set can pack files.

Example:

# isi job jobs start SmartPools

Unpacking

Use the isi filepool command to unpack containerized files.

Example:

# isi filepool policies modify fin_arc --enable-packing=false


Job Report

The commands that report on the status and effect of small file efficiency are isi
job and isi_packing --fsa.

The file packed field indicates number of successfully containerized files.

Monitoring

The isi_packing --fsa command provides the storage efficiency percentage.


The command requires an InsightIQ license and a successful run of the FSA job.
Since the isi_packing --fsa command reports on the entire /ifs file system,
the overall usage percentage can be misleading if non-containerized data is
present on the cluster.

Shadow Store Administration and Support

Select each tab for administrative tasks.


View shadow store

Before ShadowStoreDelete job:

After ShadowStoreDelete job:

The isi_sstore command is used to check the level of shadow store
fragmentation361.

Higher efficiency ratios362 are better.

The example in the graphic uses a 4+2 protection level, where the maximum
efficiency is 0.66, or 66%.

361The fragmentation score is the ratio of holes in the data where FEC is required.
Fully sparse stripes do not need FEC so are not included. Lower fragmentation
scores are better.

362Efficiency is the ratio of logical data blocks to total physical blocks, including
protection overhead. The protection group layout limits the maximum efficiency.
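A short Python calculation (illustrative only; the formulas are inferred from the
definitions in the footnotes above, not taken from OneFS source) shows why a 4+2
protection level caps efficiency at about 0.66, and how the default 90% target
translates into the efficiency below which a chunk becomes a defragmentation
candidate.

def max_efficiency(data_stripes, fec_stripes):
    """Logical data blocks divided by total physical blocks, including protection."""
    return data_stripes / (data_stripes + fec_stripes)

def defrag_threshold(data_stripes, fec_stripes, target_pct=0.90):
    """Chunks below this efficiency are candidates for defragmentation (assumed 90% default)."""
    return target_pct * max_efficiency(data_stripes, fec_stripes)

print(round(max_efficiency(4, 2), 2))     # 0.67 -> quoted as 0.66 (66%) above
print(round(defrag_threshold(4, 2), 2))   # 0.6  -> chunks below ~60% efficiency get moved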


Estimation tool

Use the isi_sfse_assess <assess mode> [process options] [sysctl
options] command to launch the storage efficiency estimation tool. The tool
walks through a directory to estimate how many files OneFS can pack and the raw
space saving resulting from packing.

The example shows that running the command on a newly installed cluster does
not reflect a production system.

Enable defragmenter

Defragmentation is disabled by default. No license is required to enable it. The
ShadowStoreDelete job performs the defragmentation process.


Enable or disable defragmentation globally on the cluster.

Output terminology:
• BSINs 363
• CSINs364
• Chunk size 365
• Target efficiency 366

363 Enables defragmentation of shadow stores used by clone and dedupe. Can be
slow due to multiple references from files. Disabled by default.

364 Enables defragmentation of SFSE containers. Enabled by default.

365 Chunk size is the logical independent unit of data considered for
defragmentation. Can be set up to 2 GB to force defragmentation of entire shadow
stores in a single pass. Default is 32 MB.


• PG efficiency 367
• Snapshots368

Defrag tool

The ShadowStoreDelete job distributes the work across the cluster; isi_sstore
defrag runs on a single node with a single process. The command defaults
the target efficiency and chunk size to the values in the gconfig. You must explicitly
specify the rest of the options on the command line regardless of the gconfig
settings.

366 Target efficiency is the minimum storage efficiency relative to the protection
level in use by the shadow store. Higher values are more aggressive. Default is
90%.

367PG efficiency causes more aggressive defragmentation if it can reduce the total
number of protection groups needed to store the shadow store data. Enabled by
default.

368 Enables processing of data referenced by files in snapshots. Can be slow.
Disabled by default.


The defragmenter can be run outside the ShadowStoreDelete job using the
isi_sstore defrag command.

The primary use of isi_sstore defrag is for assessments. The
ShadowStoreDelete job has no assessment mode.

Healthcare PACS

Hayden has asked you to migrate an archive for a PACS workflow to a PowerScale
cluster.


• Picture Archiving and Communication Systems (PACS).369


• Most of the logical space that the cluster uses must be in small files. The
threshold size is 0 MB to 1 MB.
• There must be an active SmartPools license and a SmartPools policy that is
enabled on the cluster. Having a File System Analytics (FSA) license is highly
recommended, enabling the FSAnalyze job and isi_packing utility to monitor
storage efficiency.

In 2017, regulations no longer allow larger containers of small files. This forces
Hayden to use solutions that can handle smaller files with greater storage
efficiency. Hayden sees that the use case for the storage efficiency for PACS
feature is an archive scenario in the Healthcare PACS workflow. Physicians
who require access to diagnostic imaging are causing a shift in the healthcare
market. That shift is causing PACS and vendor-neutral archive (VNA) vendors to
transition to noncontainerized studies.

There is a trade-off between storage efficiency and performance. The goal of small
file storage efficiency is to improve storage efficiency, which can affect
performance.

Packing

OneFS achieves efficiency by packing small files into a shadow store.

369In the Healthcare vertical, one of the regulations changes how PACS
applications must store their data. PACS applications can create and store larger
containers of small files.


• The shadow store for PACS isolates fragmentation and supports tiering and is
different from OneFS shadow stores.
• The shadow store for PACS is parity protected, which typically provides a
greater storage efficiency than mirroring.

The graphic shows traditional small files with 3x mirroring, illustrating inefficiency.
The "After packing" graphic shows the packing process scanning for small files and
packing them into a FEC-protected shadow store for PACS.

Interoperability

Storage efficiency for PACS interoperability consists of:


• SyncIQ
• File clones and Deduplication
• InsightIQ
• CloudPools
• SmartLock

Packed files are treated as normal files during failover and failback operations. If
the PACS feature is enabled on the target and the correct file pools policy can be
applied, files can be packed on the target cluster.

• SyncIQ: SyncIQ does not synchronize the file pools policies. Manually create
the correct file pools policies on the target cluster. Best practice is to enable
Storage Efficiency for PACS on the source cluster and on the target cluster.
Enabling both sides retains the benefits of storage efficiency on both clusters.
The Storage Efficiency for PACS feature enables you to store more data for
small files: plan your data replication accordingly. If data is synced between two
equal-size clusters and the PACS feature is enabled on only one side, the side
without the feature could potentially run out of space first. Running out of space
blocks data replication.
• File clones and Deduplication: Interoperability is limited. Clones are not
optimized. Deduplication skips packed files. Cloned and deduplicated files
already have good storage efficiency without packing.
• InsightIQ: Packing, unpacking, and deduplication operations do not update the
disk usage fields. InsightIQ cluster summary figures accurately display
the used and free space. However, per directory and per file usage may not be
accurate.
• CloudPools: Stubbed files are not packed. Packed files are stubbed.
• SmartLock: Packing processes write once or read many (WORM) files as
regular files. WORM files are good candidates for packing. WORM files are
unlikely to cause fragmentation due to writing.


Migration


Overview

• Migrating data
• Migrating authorization - ACLs, POSIX
• Considering pipe - WAN, LAN, bandwidth
• Considering features - quotas, deduplication, snapshots, filtering
• Migrating client and application connectivity

A data migration strategy ensures that the data is migrated with no loss of data
integrity.

A typical migration from a storage system to PowerScale can be complex. Source
environments vary, making each data migration strategy unique.

An analysis of the source environment must consider many aspects from data
integrity to data protection to the impact of feature functionality to environmental
factors.

A thorough analysis should reduce the data migration risk or identify barriers to the
migration.


Migration Phases

Migration is categorized into four phases: planning, testing, migrating, and
validating.

1: Design a migration strategy that considers minimum risk and downtime. Perform
a detailed review of the migration source environment. Key aspects of the planning
phase are discovery and analysis of source infrastructure, the target data, and the
PowerScale cluster. Areas to consider are the mapping on the cluster, and features
such as access zone, quotas, snapshots, backups, and deduplication.

2: Review, validate, and test the strategy. Testing the migration is typically run on a
subset of data. The test should provide insight into the performance, timing, and
validity of data. Validity of data includes accessibility, permission models, and
workflow function. The test results should meet the requirements of the migration
strategy.

3: Running the migration typically requires an initial full copy and then incremental
updates. The "first full, then incremental" approach eases the migration cutover.
The cutover may involve halting writes on the source data, a final incremental copy,
and moving client and application connectivity.

4: Validate the migration once the cutover is complete. Validate the access and
data before enabling writes to the data. Once clients and applications write and


modify data, a rollback becomes difficult. In the validation phase, monitor the
cluster, client access, performance, and feature functionality. Ideally, the post-
migration phase has no issues.

Planning Phase

The graphic poses the planning questions: What are you migrating? How are you
migrating the data, security, and workflows? When and how are you implementing
the cutover?

In the planning phase, evaluate the infrastructure of the source system, network
architecture, and the network paths between the source data and the PowerScale
cluster.

Determine how the source data maps to the target end state on the cluster.

Components of the planning phase:


• Qualify the project.
• Define the project scope.
• Communicate expectations.
• Identify risks.
• Define a timeline.
• Outline migration requirements.


Testing Phase

The graphic shows the testing activities: validate the migration tool, migrate a
subset of data, validate the performance, validate the data transfer, and test the
migrated data.

The testing phase validates the migration strategy. The test should meet the
requirements of the strategy. The test enables you to tune and explore the use of
alternate migration tools, settings, and methods.

Testing phase goals:


• Migrate the data attributes and confirm access.
• Validate the data integrity.
• Confirm that data transfer benchmarks meet expectations.
• Get user acceptance.
• Test the rollback plan.


Migrating Phase

The graphic shows the migrating activities: a full (baseline) copy followed by
incremental copies, all data migrated, cutover, and rollback if needed.

The core components of the migrating phase are:


• Transfer of all targeted data
• Connections and clients moved to the PowerScale cluster
• Migrated data accepted
• If needed, rollback and check for successful rollback


Validating Phase

The graphic shows the validating activities: monitor for access issues, monitor for
connection issues, monitor load, performance, connections, user movement, and
security, and plan the transition for the old system.

Monitor the PowerScale cluster and OneFS to ensure that all expectations are met.
Re-implement features such as quotas and snapshots if needed.

Migration Tools

The table lists the common tools used to migrate data to a PowerScale cluster,
with notes on each.

Datadobi DobiMigrate:
• Simple, quick, and efficient migrations
• Lessens many of the migration pain points
• Handles very large PowerScale migrations

isi_vol_copy370:
• OneFS integrated tool
• For NetApp migrations
• SMB and NFS
• User and group permissions
• Uses NDMP

isi_vol_copy_vnx:
• For Celerra and VNX migrations

EMCOPY371:
• SMB migrations
• Preferred tool

Robocopy372:
• Microsoft tool
• SMB migrations

370isi_vol_copy supports data migration using NDMP. The tool enables the cluster
to mimic the behavior of the source system. The tool copies data from the source
system to the cluster.

371 EMCOPY copies files, directories, and subdirectories from SMB shares to other
SMB shares with the security and attributes intact.

372 Robocopy is a Microsoft file and directory copy tool.


rsync373:
• NFS migrations
• Designed for synchronizing directories
• Migrates only the differences when files change
• Open source

tar, cpio:
• NFS migrations
• Good for one full copy of data
• Designed for backup and restore, not copying
• No incremental copies

SecureCopy:
• SMB migrations
• Scheduling and logging
• Not free

Resource: https://www.dellemc.com/resources/en-us/asset/white-
papers/products/storage/h15756-netapp-to-onefs-migration-tools.pdf

373 The rsync tool copies files, directories, and subdirectories from one NFS export
to another. You can run rsync natively on a PowerScale node against locally
mounted NFS source exports that are mounted directly on a node. Data is migrated
directly from the source system to the PowerScale.


Example: SMB Migration

Warning: This example shows a simple migration for familiarization with the
process and tools. Migrations in production environments can be much more
extensive.

Challenge: An administrator plans to migrate 3 TB of data from a Windows-based
file server to the PowerScale cluster. The target cluster is a three-node A200, and
the network is 10 GbE. EMCOPY is the tool used for the migration.

Map PowerScale shares to the Windows server and migrate.

Planning

Best practices:
• Use the correct version of the tool.
• Understand the EMCOPY switches374.
• Use mapped drives375.

374 Different migrations use different switches to meet the migration requirements.


• Use a private migration network376.


• Use a specific migration account377.

Create Migration Shares

Each directory has 1 TB of data. EMCOPY is the migration tool.

• You can create a dedicated SMB migration share. Using a hidden ($) share will
hide the share from users browsing the SMB shares on the cluster.
• For extensive migrations, set restrictive share permissions to limit the migration
share access.
• This example creates new user shares. Configure the share permissions prior
  to the data migration.
• When creating the share, use the Do not change existing permissions
option.

EMCOPY Baseline Switches

Know the function of the command options. Get a full list of switches using the
emcopy.exe /? command.

375 Mapped drives provide a consistent connection to the storage system, simplify
connection strings, and enable the use of alternate credentials.

376Separates the migration traffic from the production traffic, allowing for maximum
throughput and reduces the potential production impact.

377 The tool must have access to the source and target data.


Running the Test Migration

First run a test before performing a full migration:

The example migrates data from the R: directory to the mapped X: directory. The
command copies owner security, copies directories and subdirectories, and retries
failed file copies. The switches are summarized after the command.

• emcopy64.exe R:\eng X:\eng /o /s /sd /purge /c /r:1 /w:2 /log:C:\log
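As a rough guide to the switches in this example (a hedged summary only; behavior
can vary by EMCOPY version, so confirm against emcopy.exe /? before use):

• /o copies owner security information
• /s includes subdirectories
• /sd copies security descriptors (ACLs)
• /purge removes files and directories from the target that no longer exist on the source
• /c continues copying after errors
• /r:1 retries a failed file copy once, and /w:2 waits 2 seconds between retries
• /log:C:\log writes the operation log to C:\log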

Validate

After the migration, validate the data and the file attributes. Ensure file data is
copied and intact. Verify file security, ownership, and attributes. Check that the
timestamps on the files are correct.

During the test migration, monitor and benchmark performance.


Migration

Once the test migration is validated, run the migration. First run a full copy and
then incremental copies. For large and complex migrations, you may run many
incremental copies before performing a cutover.

Cutover

• Plan cutover window


• Make source data read-only
• Run a final incremental
• Update connection and name resolution protocols
• Make the data on the new system read-write
• Test and monitor as production traffic moves over
• Redirect clients to new production shares
• Rollback if needed

Example: NFS Migration

Warning: This example shows a simple migration for familiarization with the
process and tools. Migrations in production environments can be much more
extensive.

Challenge: An administrator plans to migrate 3 TB of data from a Linux-based file
server to the PowerScale cluster. The target cluster is a three-node A200, and the
network is 10 GbE. rsync is the tool used for the migration.


Mount NFS exports to the PowerScale and migrate.

Planning

Best practices:
• Use the correct version of the tool.
• Understand the rsync switches378.
• Use local paths379.
• Restrict access380.

378Understand the rsync switches and when and how to use them. Each migration
may require the use of different switches. Start with the baseline switches when
testing the migration.

379The rsync tool can operate in both a local and remote mode, and can push or
pull data.

380Restricting access prevents users from changing data on the source server
instead of the migration target.


• Check for root_squash381 (a source-export sketch follows this list).


• Create the OneFS NFS exports before migrating data. Creating before the
migration enables you to have the permissions set before the cutover.
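As a hedged illustration only (host names and source paths are hypothetical, and
export syntax varies by Linux distribution), a source-side /etc/exports entry that
disables root squashing for the PowerScale node performing the migration might
look like:

• /srv/eng    powerscale-node1(ro,no_root_squash)

After re-exporting on the source (for example, exportfs -ra), the export can be
mounted on the node, for example mount sourceserver:/srv/eng /mnt/eng, to match
the locally mounted path used in the rsync examples below.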

rsync Baseline Switches

Know the function of the command options. Get a full list of switches using the man
rsync command.

Running the Test Migration

First run a test before performing a full migration:

The example migrates data from the locally mounted file system /mnt/eng, to the
OneFS target /ifs/divgen/eng.

• rsync -a /mnt/eng /ifs/divgen/eng

381 root_squash prevents a remotely connected user from having root privileges.
Root access is needed to migrate all the files and directories, so use the
"no_root_squash" option on the source export.


Validate

After the migration, validate the data and the file attributes. Ensure file data is
copied and intact. Verify file security, ownership, and attributes. Check that the
timestamps on the files are correct.

During the test migration, monitor and benchmark performance. Knowing how long
an incremental copy takes will help determine the time needed to perform the
cutover and the outage window.

Migration

Once the test migration is validated, run the migration. First run a full copy and
then incremental copies. For large and complex migrations, you may run many
incremental copies before performing a cutover.

This example mounts the source data on the PowerScale node. For a large
migration, using multiple nodes scales bandwidth. Data moves from the source
directly to the cluster.
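A hedged sketch of the incremental passes, reusing the paths from the test run
(the -a, -n, and --delete switches are standard rsync options, but verify them
against man rsync for your version):

• rsync -an --delete /mnt/eng /ifs/divgen/eng    (dry run: previews what an incremental pass would transfer and delete without changing anything)
• rsync -a --delete /mnt/eng /ifs/divgen/eng    (incremental pass: transfers only changed files and removes files that were deleted on the source)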

Cutover

• Plan cutover window and outage estimate


• Make source data read-only
• Run a final incremental
• Update connection and name resolution protocols
• Make the data on the new system read-write
• Test and monitor as production traffic moves over
• Redirect clients to new production shares
• Rollback if needed


Challenge

Lab Assignment: Migrate user data from an NFS file server to a PowerScale
cluster.
1) Perform an NFS migration test.
2) Validate the migrated test data.


SmartQuotas Advanced

SmartQuotas Advanced

SmartQuotas Recap

SmartQuotas enables administrators to understand, predict, control, and limit


storage usage across their organization and provision a cluster to best meet their
storage needs.

SmartQuotas also facilitates thin provisioning, or the ability to present more storage
capacity to applications and users than is physically present (overprovisioning).


Quota Types

Quota Types

SmartQuotas consists of two types of capacity quota:

• Accounting Quotas382
• Enforcement Quotas383

A SmartQuota can have one of three enforcement type settings: hard, soft, or
advisory.

There are three SmartQuotas enforcement states: Under (U), Over (O), Expired
(E).

QuotaScan Job

The QuotaScan job updates quota accounting for domains created on an existing
directory path.

• The administrator has the option of manual control if necessary384 or desirable.

382Accounting Quotas simply monitor and report on the amount of storage


consumed, but do not take any limiting action or intervention. Instead, they are
primarily used for auditing, planning, or billing purposes.

383Enforcement Quotas on the other hand include all of the functionality of the
accounting option plus the ability to limit disk storage and send notifications. Using
enforcement limits, you can logically partition a cluster to control or restrict how
much storage that a user, group, or directory can use.


• Governance will instantaneously propagate from parent to child
  incrementally385.
• A domain created on a non-empty directory will not be marked386 as ready.

The QuotaScan job is the cluster maintenance process responsible for scanning
the cluster and performing the accounting activities that bring the desired
governance to each inode.
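As a minimal sketch (job and option names as in OneFS 8.x; verify with isi job
types list), an administrator can start and monitor a QuotaScan manually:

• isi job jobs start QuotaScan    (queues a QuotaScan job, which runs with its default low-impact policy)
• isi job jobs list    (confirms that the job is queued or running)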

Quotas Daemons

There are three main processes or daemons that are associated with
SmartQuotas:

• isi_quota_notify_d: A daemon that generates notifications for events of


type limit exceeded and link denied. It also responds to configuration change
events and instructs the QDB to generate expired and violated overthreshold
notifications.

384 Although it is typically run without any intervention, the administrator has the
option of manual control if necessary or desirable. By default, QuotaScan runs
with a ‘low’ impact policy and a low priority value of ‘6’.

385If quotas are created on empty directories, governance will instantaneously


propagate from parent to child incrementally. If the directory is not empty, the
QuotaScan job is used to update the governance.

386This triggers a QuotaScan job to be started. QuotaScan is executed by the


OneFS job engine and is subject to the general scheduling and prioritization of
jobs. The QuotaScan performs a tree walk to traverse the directory tree under the
domain root.


• isi_quota_sweeper_d: A daemon that is responsible for several quota


housekeeping tasks. Tasks may include propagating default changes, domain,
and notification rule garbage collection and kicking off QuotaScan jobs when
necessary.
• isi_quota_report_d: A daemon that is responsible for generating quota
reports. Since the QDB only produces real-time resource usage, reports are
necessary for providing point-in-time views of a quota domain usage. These
historical reports are useful for trend analysis of quota resource usage.

OneFS 8.2 and later also include the rpc.quotad service to facilitate client-side
quota reporting on UNIX and Linux clients using native quota tools. The service,
which runs on TCP/UDP port 762, is enabled by default, and its control is under
NFS global settings.

Also, in OneFS 8.2 and later, users can view their available user capacity set by
soft and/or hard user and group quotas rather than the entire cluster capacity or
parent directory-quotas. This feature avoids the illusion of seeing available space
that may not be associated with their quotas.
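As a hedged example (the SmartConnect name and export path are hypothetical,
and the client's quota tools must be able to reach rpc.quotad on the cluster), a
Linux NFS client can query its own usage and limits with the native quota tools:

• mount -t nfs cluster.example.com:/ifs/home/student1 /mnt/home
• quota -s    (reports the invoking user's usage and limits, including the NFS mount, in human-readable units)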

Thin Provisioning and Scale-Out

A company in the Media and Entertainment industry wants to overprovision
storage and only add capacity when needed. The organization has 200 TB of
capacity for the home directories of 1,000 users. Each user is to be allocated
500 GB, effectively thin provisioning 500 TB. What should Hayden do to enable
this?

To enable thin provisioning, Hayden can perform the following actions:

• Set directory quotas with a hard quota of 500 GB each.
• Set up a system alert387 to notify the storage admin to add capacity (nodes)
  when the 200 TB is 75% full.
• Scale out by adding additional capacity only when needed.

Quotas Report

Quota reports and summaries are stored in the


/ifs/.isilon/smartquotas/reports directory by default, but this location is
configurable.

Each generated report includes the quota domain definition, state, usage, and
global configuration settings388.

Quota Report Format

A quota report is a timestamped XML file that starts off with global configuration
settings and global notification rules.

387 Hayden may need to investigate. Chances are the 200 TB limit is a segment of
the cluster capacity and not the entire cluster capacity. Hayden can also add nodes
if users are reaching their limits. Chances are Hayden will notify the users to
cleanup their directories before making a big purchase.

388 Quota Notification Rules are read and inserted into a domain entry only if the
domain is not inherited. These rules are inserted to avoid any performance impact
of reading the Quota Notification Rules with each domain.


When listing domains, both inode & path and name & ID are stored with each
domain389.

Quota Report Management

Quota reports are managed by configuring settings that control when reports are
scheduled, how they are generated, where and how many are stored, and how they
are viewed.

The maximum number of scheduled reports that are available for viewing in the
web-administration interface can be configured for each report type390.

389 Quota Notification Rules are read and inserted into a domain entry only if the
domain is not inherited. These rules are inserted to avoid any performance impact
of reading the Quota Notification Rules with each domain.

390When the maximum number of reports is stored, the system automatically


deletes the oldest reports to make space for new reports as they are generated.


You can create manual reports at any time to view the current state of the storage
quotas system, and these live reports can be saved manually. The OneFS CLI
export functionality uses the same data generation and storage format as quota
reporting and does not impose any requirements beyond the three report types.
After the raw reporting data is collected, data summaries can be produced for a
given set of filtering parameters and sorting type. Reports can be viewed from
historical sampled data or from a live system; in either case, the reports are views
of usage data at a given time. SmartQuotas does not provide reports on data
aggregated over time (that is, trending reports); however, a quota administrator can
use the raw data to answer trending questions.

Quota Nesting

Nested quotas apply multiple quotas within the same directory structure. The quota
limits shown in this example do not reflect a likely deployment.

The isi quota quotas list command is used to compare the size of a quota
to the amount of data it holds.


A user quota of 50 MB is set for user Dante on the
/ifs/sales/sales-gen/MySales/test1 directory. Any other user can write up to the
parent directory limit of 500 MB. In the given example, user John is able to write in
the test1 directory; however, because the quota limit on /ifs/sales/sales-gen/MySales
is 500 MB, John cannot write beyond this limit.

Warning: If you set a threshold higher than the parent quota's hard threshold, the
current threshold may be ignored.


At the top of the hierarchy, the /ifs/sales folder has a directory quota of 1 TB. Any
user can write data into this directory, up to a combined total of 1 TB. The
/ifs/sales/sales-gen directory has a group quota assigned that restricts the total
amount written into this directory to 1 GB, even though the parent directory (sales)
is below its quota restriction. The /ifs/sales/sales-gen/MySales directory has a
default user quota of 500 MB that restricts the capacity of this directory to 500 MB.
The /ifs/sales/sales-gen/MySales/test1 directory has a user quota of 50 MB.

The /ifs/sales/sales-gen/Example directory has a default user quota of 250 MB. The
/ifs/sales/sales-gen/Example/test3 directory has a user quota of 100 MB. However,
if users place 500 GB of data in the /ifs/sales/MySales directory, they can place only
500 GB in the other directories, because the parent directory cannot exceed 1 TB.
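A hedged CLI sketch of how the quotas above might be created (paths are from the
example, the group name is hypothetical, and the flag names follow the OneFS 8.x
isi quota quotas create syntax, so verify with --help before use):

• isi quota quotas create /ifs/sales directory --hard-threshold 1T --enforced true
• isi quota quotas create /ifs/sales/sales-gen group --group sales --hard-threshold 1G --enforced true
• isi quota quotas create /ifs/sales/sales-gen/MySales default-user --hard-threshold 500M --enforced true
• isi quota quotas create /ifs/sales/sales-gen/MySales/test1 user --user dante --hard-threshold 50M --enforced true
• isi quota quotas list    (verifies the resulting quota hierarchy and usage)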

SmartQuotas and OneFS Features Integration

Snapshots and SmartQuotas

The quota configuration provides the option to include or exclude snapshot data
and data-protection overhead in the usage calculation of a quota.


Quota snapshot reporting.

SmartQuotas reports only on snapshots that are created after the quota domain
was created391.

Deduplication and SmartQuotas

Deduplicated files appear no differently than regular files to standard quota policies.

However, if the quota is configured to include data-protection overhead, the


additional space used by the shadow store will not be accounted for by the quota.

391 Determining quota governance (including QuotaScan job) for existing snapshots
is a time and resource consuming operation. However, as snapshots age out,
SmartQuotas gradually accrues accounting information for the entire set of relevant
snapshots.


In-line Compression and SmartQuotas

SmartQuotas reports efficiency as a ratio across the desired data set, as specified
in the quota path field.

The compression efficiency ratio is for the full quota directory and its contents,
including any overhead, and reflects the net efficiency of compression.

SyncIQ and SmartQuotas

Quotas are matched one-to-one across the replication set392.

SyncIQ failover and failback do not replicate cluster configuration, such as SMB
shares, NFS exports, quotas, snapshots, and networking settings, from the source
cluster.

Multiple quotas are supported within a source directory or domain structure.

SyncIQ will never automatically delete393 an existing target quota.

392During replication SyncIQ ignores quota limits. However, if a quota is over limit,
quotas still prevent users from writing additional data. Ideally, whatever quotas are
set in the SyncIQ domain, the administrator should configure a quota domain on
the target directory with the same quota settings.

393 Instead, the SyncIQ operation fails rather than deleting an existing quota. This
may occur during an initial sync where the target directory has an existing quota
under it, or if a source directory that has a quota on it on the target is deleted. The
quota remains and requires administrative removal if desired.


CloudPools and SmartQuotas

Application logical quotas, available in OneFS 8.2 and later, provide a quota
accounting metric that reports and enforces on the actual space consumed.

SmartQuotas Use Cases

Quota Management

A university wants to give their students and groups a fixed amount of storage to
control and keep storage growth in check. Hayden wants to know who the highest
consumers are and limit them. How can he accomplish this?

To manage the quotas, Hayden can take following actions:

• Set default user hard or soft quotas.


• Use InsightIQ reporting tool to monitor.
• Run the isi quota quotas list command to view the highest
consumption.
• Configure email alerts to students to encourage self-cleanup of file usage.

HPC Compute Farm Constraining

Scenario: A semiconductor company uses a large HPC compute cluster for parts
of their EDA workflow, and wants to guard against runaway jobs from consuming
massive amounts of storage. The company runs heavy computation jobs from a
large compute farm against a scratch space directory, housed on an F200 tier on
their cluster, and garbage collection is run at midnight.


Throughout the workday, it is hard for the storage admins to keep track of storage
utilization. Occasionally, jobs from the compute farm get out of control, tying up
large swathes of fast, expensive storage resources and capacity. What should be
done to help prevent this?

Solution: The following actions can help prevent this (a hedged CLI sketch follows
the list):

• Set an advisory directory quota on the scratch space at 80% utilization for
  advanced warning of an issue.
• Configure a hard directory quota to prevent writes at 90% utilization.
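A minimal sketch of both actions in a single enforcement quota, assuming a
hypothetical 100 TB scratch tier at /ifs/scratch (the path and thresholds are
illustrative; verify the flags with isi quota quotas create --help):

• isi quota quotas create /ifs/scratch directory --advisory-threshold 80T --hard-threshold 90T --enforced true

The advisory threshold warns at roughly 80% of the hypothetical tier capacity, and
the hard threshold blocks writes at roughly 90%.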

Considerations

Listed are best practices to consider when discussing SmartQuotas:

• Do not enforce quotas on file system root (/ifs).


• Enforcement quotas are not recommended for snapshot-tracking quota
domains.
• Governing a single directory with overlapping quotas can also degrade
performance.
− Too many nested quotas can limit performance.
• If quota reports are not in the default directory, you can run the isi quota
settings reports view command to find the directory where they are
stored.
• If two quotas are created on the same directory – for example an accounting
quota without Snapshots and a hard quota with Snapshots - the quota without
Snapshot data overrules the limit from the quota with Snapshot data.
• Disabling all quota notifications also disables all system notification behavior.
  Use the --clear options to remove specific quota notification rules and fall back
  to the system default.
• Thin provisioning can exceed cluster capacity.
• OneFS 8.2 and later:

− Increased from 20,000 quota limits per cluster to 500,000 quota limits per
cluster.


− Quota notification daemon optimized to handle about 20 email alerts per


second.
− Supports multiple email recipients for notifications and alerts, the maximum
size of the email address list supported is 1024 characters.

Link: For more information see the Storage Quota Management And
Provisioning With Dell EMC PowerScale SmartQuotas white paper.

Challenge

Lab Assignment:
1) Investigate and troubleshoot misconfigured quotas.
2) Add a notification rule using custom email templates.
3) Configure and monitor quota reports.


SnapshotIQ Advanced

SnapshotIQ Advanced

SnapshotIQ Overview


1: Only the changed blocks of a file are stored in a snapshot thereby ensuring
highly-efficient storage capacity utilization. User access to the available snapshots
is via a special hidden ‘snapshot directory' under each file system directory.

2: OneFS snapshots create little performance overhead, regardless of the level of


activity of the file system, the size of the file system or the size of the directory
being snapped.


• Snapshots394 are logical pointers to data stored on a cluster at a specific point in


time.
• If you modify a file and determine that the changes are unwanted, you can copy
or restore the file from the earlier file version.
• You can use snapshots to stage content to export, and ensure that a consistent
point-in-time copy of the data is replicated or backed up.
• The recovery time objective (RTO) of a snapshot can be very small and the
recovery point objective (RPO) is also highly flexible with the use of rich policies
and granular schedules.
• PowerScale recommends no more than 1024 snapshots for a single directory.

Important: A SnapshotIQ license395 is not required for system-initiated snapshots
to function.

394OneFS snapshots are used to protect data against accidental deletion and
modification. Because snapshots are available locally, users can restore their data
without administrative intervention.

395 Some OneFS operations generate snapshots for internal system use without
requiring a SnapshotIQ license. If an application generates a snapshot, and a
SnapshotIQ license is not configured, the snapshot can be still accessed. However,
all snapshots that OneFS operations generate are automatically deleted when no
longer needed. You can disable or enable SnapshotIQ at any time. Note that you
can create clones on the cluster using the "cp" command, which does not require a
SnapshotIQ license.


Data Protection with SnapshotIQ

• Snapshots can be identified and located either by a unique name or a system


generated snapshot ID396.
• SnapshotIQ can also create up to twenty thousand snapshots on a cluster.
• This large quantity provides a substantial benefit over the majority of other
  snapshot implementations because it allows snapshot intervals to be far shorter,
  and hence offers more granular recovery point objectives (RPOs).

396A snapshot ID is a numerical identifier that OneFS automatically assigns to a


snapshot.


Architecture

Snapshot Read Chain

SnapshotIQ has several fundamental differences as compared to most snapshot


implementations. The most significant of these are:

• Directory based397
• Logical snapshot process398
• Snapshot space allocation399

397
OneFS snapshots are per-directory based. This is in contrast to the traditional
approach where snapshots are taken at a file system or volume boundary.

398 Since OneFS manages and protects data at the file-level, there is no inherent
block-level indirection layer for snapshots to use. Instead, OneFS takes copies of
files or pieces of files (logical blocks and inodes) in what’s termed a logical
snapshot process.

399There is no requirement for reserved space for snapshots in OneFS. Snapshots


can use as much or little of the available file system space as desirable. A
snapshot reserve can be configured if preferred, although this will be an accounting
reservation rather than a hard limit


In this example, to reconstruct a file with blocks 1-2-3-4 as it appeared at 08:00,
SnapshotIQ would need to read the 12:00, 16:00 (4 PM), and 20:00 (8 PM)
snapshots. The changes that took place during that span of time are recorded and
accounted for if necessary.

Snapshot Tracking Files (STF)

Snapshot Change Tracking.


Snapshot Tracking Files (STF)400 are the main data structures associated with a
snapshot. A snapshot tracking file has three major purposes:

• Indicating which snapshots are active.


• Storing snapshot attributes, such as usage data, creation time, and root
directory paths.
• Recording a list of LINs modified in the snapshot, which can be freed when the
snapshot is deleted.

In the given example, the snapshot 0900_snap is tracked, and the accounting
information for the changed blocks is shown in the box below.

Snapshot Management

Reading from a Snapshot

When the data is not in the snapshot, the block tree of the inode on the snapshot
doesn’t point to a real data block. Instead it has a flag marking it as a Ditto Block401.

400 STFs are a special file type with several unique characteristics, and are involved
in the full snapshot life cycle, including the creation, storing any changes, and
deletion of snapshots.

401 A Ditto-block means that the data is the same as the next newer version of the
file, so OneFS will automatically look ahead to find the newer version of the block.


In this case, blocks 3 and 4 were changed after the first snapshot (Snap_ID 98) was taken and
before the second (Snap_ID 100), and blocks 0 and 4 were changed after the second snapshot
was taken.

Painting Algorithm


When a file is written to, the system needs to do a small amount of work to
determine if the file is part of a snapshot. If so, a copy of the old data needs to be
kept. This is done via a process known as the painting algorithm402.

• When a file is modified, OneFS looks first at the file’s last_snap_id.


• If the last_snap_id is not the most recent snap_id, there is a likelihood that the
governing_snaps information in the file is out of date.
• In this case, OneFS recursively searches the parent directories until it finds up-
to-date information, and then uses the correct directory’s governing-snaps
information.

Snapshot Domains

SnapshotIQ adopts a domains model for governance of scheduled snapshots. By
utilizing the OneFS IFS domains infrastructure, recurring snapshot efficiency and
performance are increased by limiting the scope of governance to a smaller,
well-defined domain boundary.

By leveraging IFS Domains, creating a new snapshot on a domain that is fully


marked will not cause further painting operations, so a significant portion of the
performance impact caused by taking a new snapshot is avoided.

402Snapshot “painting” is an expensive operation that has to be performed for


every file in the system whenever a new snapshot is taken – even on files which
are not part of the snapshot.


User Driven File Recovery

The graphic shows user-driven file recovery with SnapshotIQ: the end user
recovers /ifs/data/foo/bar.txt from /ifs/data/foo/.snapshot/0900_snap/bar.txt.

NFS and SMB users can view and recover data from OneFS snapshots, with the
appropriate access credentials and permissions.

Example - Accidental Deletion403

Snapshot Monitoring and Reporting

A report is available once a snapshot is created. Navigate to Data Protection >


Snapshots > Saved Snapshots and then select View Details. OneFS provides a
variety of information about snapshots, including the total amount of space
consumed by all snapshots.

403 A user accidentally deletes the file ‘/ifs/data/foo/bar.txt’ at 9:10 AM and notices
it is gone a couple of minutes later. By accessing the 9:00 AM snapshot, the user is
able to recover the deleted file at 9:14 AM by copying it directly from the snapshot
directory ‘/ifs/data/foo/.snapshot/0900_snap/bar.txt’ back to its original location at
‘/ifs/data/foo/bar.txt’.
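On the cluster CLI, the recovery in that example amounts to a single copy out of the
hidden snapshot directory (a sketch using the paths from the footnote):

• cp /ifs/data/foo/.snapshot/0900_snap/bar.txt /ifs/data/foo/bar.txt

Given appropriate permissions, the same copy can be performed from an NFS or
SMB client through the .snapshot directory or the Windows Previous Versions tab.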



1: Indicates the total number of snapshots that exist on the cluster.

2: Indicates the total number of snapshots that were deleted on the cluster since
the last snapshot delete job was run. The space consumed by the deleted
snapshots is not freed until the snapshot delete job is run again.

3: Indicates the total number of snapshot aliases that exist on the cluster.

4: Indicates the total amount of space consumed by all snapshots.

Managing Snapshots

Delete Snapshot

A snapshot can be deleted if required. OneFS frees the disk space occupied by
deleted snapshots when the SnapshotDelete job is run. Also, if you delete a
snapshot that contains clones or cloned files, data in a shadow store might no
longer be referenced by files on the cluster.

CLI command to delete a snapshot: isi snapshot snapshots delete {--all |
--snapshot <snapshot> | --schedule <schedule> | --type <type>}

Modify Snapshot Attributes

The name and expiration date of a snapshot can be modified by running the isi
snapshot snapshots modify <snapshot> {--name <name> | --
expires {<timestamp> | <duration>} | --clear-expires | --
alias <name>}... [--verbose] command.

Modify Snapshot Alias


A snapshot alias can be reassigned, to redirect clients from a snapshot to the live
file system.

CLI command to modify aliases: isi snapshot aliases modify <alias>
{--name <name> | --target <snapshot>} [--verbose]

Restoring Snapshots

Revert a Snapshot

To restore the snapshot, identify the ID of the snapshot by running: isi snapshot
snapshots view

CLI command to restore a snapshot: isi job jobs start <type> [--dm-type
{snaprevert | synciq}]
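A hedged sketch of the full revert sequence (the snapshot name and ID are
hypothetical; the --root, --dm-type, and --snapid options follow the OneFS 8.x job
engine syntax, so verify with isi job jobs start --help):

• isi job jobs start DomainMark --root /ifs/data/foo --dm-type SnapRevert    (creates the SnapRevert domain for the directory, ideally while it is empty)
• isi snapshot snapshots view 0900_snap    (shows the snapshot details, including its numeric ID)
• isi job jobs start SnapRevert --snapid 46    (reverts the domain to the snapshot with the hypothetical ID 46)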


Restore Using Windows Previous Versions

The graphic shows a share mapped from the PowerScale cluster: right-click the
folder and select Properties, and the Previous Versions tab lists the associated
snapshots with modification times and snapshot options.

Navigate to the directory that you want to restore or the directory that contains the
file that you want to restore. If the directory has been deleted, you must recreate
the directory.

• Right-click the folder, and then click Properties.


• In the Properties window, click the Previous Versions tab.
• Select the version of the folder that you want to restore or the version of the
folder that contains the version of the file that you want to restore.
• Restore the version of the file or directory.

Restore via UNIX Command

A file or directory can be restored from a snapshot through the CLI command.


Example to create a copy of file1: cp -a
/ifs/.snapshot/Snapshot2014June04/archive/file1 /ifs/archive/file1_copy

Clone from Snapshot

Clone a file from the snapshot by running the cp command with the -c option.

Example command to clone test.txt from Snapshot2014June04: cp -c
/ifs/.snapshot/Snapshot2014June04/archive/test.txt /ifs/archive/test_clone.txt

File Clones

• OneFS also provides the ability to create writable clones of files. OneFS File
Clones provides a rapid, efficient method for provisioning multiple writable
copies of files.
• Common blocks are shared between the original file and clone, providing space
efficiency and offering similar performance and protection levels across both.
• This mechanism is ideal for the rapid provisioning and protection of virtual
machine files and is integrated with VMware's linked cloning and block and file
storage APIs.


File Clones

Snapshot Reserve

• There is also no requirement for reserved space for snapshots in OneFS.


Snapshots can use as much or little of the available file system space as
desirable and necessary.
• A snapshot reserve can be configured if preferred, although this will be an
  accounting reservation rather than a hard limit and is not a recommended best
  practice.
• If desired, you can set the snapshot reserve using the isi snapshot settings
  modify --reserve command.

Best Practices

Dell EMC PowerScale recommends observing the following SnapshotIQ best


practices.

a. Configure the cluster to take fewer snapshots, and for the snapshots to expire
more quickly, so that less space will be consumed by old snapshots.
b. Using SmartPools, snapshots can physically reside on a different disk tier than
the original data.
c. Avoid creating snapshots of directories that are already referenced by other
snapshots.
d. It is recommended that you do not create more than 1000 hard links per file in a
snapshot to avoid performance degradation.
e. Always attempt to keep directory paths as shallow as possible. The deeper the
depth of directories referenced by snapshots, the greater the performance
degradation.


f. It is recommended to not create snapshots of /ifs. Avoid taking nested


snapshots, redundant snapshots, or overly scoped snapshots. Consider taking
snapshots of only the intermediate or most granularly scoped part.
g. It is recommended to always enable the SnapshotDelete job; disabling it prevents
   unused disk space from being freed and can also cause performance
   degradation.
h. If you need to delete snapshots and there are down or smartfailed components,
or the cluster is in an otherwise degraded state, contact Dell EMC Technical
Support for assistance.
i. If you intend to revert snapshot for a directory, create a SnapRevert domain for
the directory while the directory is empty.
j. Create several snapshot schedules for a single directory, and then assign
different snapshot duration periods for each schedule. Ensure that all snapshots
are created at the same time when possible.
k. Avoid storing all snapshot data on lowest tier node pools.

Considerations

• Snapshots are created at the directory-level instead of the volume-level, thereby


providing improved granularity
• There is no requirement for reserved space for snapshots in OneFS. Snapshots
can use as much or little of the available file system space as desirable.
• Quotas are used to calculate a file and directory count that includes snapshot
revisions, provided the quota is configured to include snaps in its accounting via
the --snaps=true configuration option.
• Files with alternate data streams or resource forks are fully supported by
SnapshotIQ.
• The SmartDedupe job automatically ignores file system snapshots
• Snapshots of file clones, and shadow stores in general, are not allowed, since
shadow stores have no hard links.
• Snapshot data is not containerized by the OneFS SFSE feature.
• If a directory is moved, you cannot revert snapshots of that directory that were
taken prior to its move.


Challenge

Lab Assignment:
1) Create a SnapRevert domain and create snapshots.
2) Create and view a changelist.
3) Restore data using a snapshot.


SyncIQ Advanced

SyncIQ Advanced

SyncIQ Overview

• SyncIQ is the OneFS data replication module that provides consistent replicas
of data between two clusters.
• SyncIQ performance increases as cluster scales out.
• SyncIQ provides automated failover and failback capabilities404.

404 Failover and failback only include the cluster preparation activities and do not
include DNS changes, client redirection or any required networking changes.



1: The SyncIQ domain is the root of the replication, such as /ifs/div-gen.


Replication is from the source SyncIQ domain to the target SyncIQ domain.
Metadata, such as ACLs and alternate data streams are replicated along with data.

2: SyncIQ uses a policy-driven engine to execute replication jobs across all nodes.
The policy includes information to replicate and the replication schedule. The
administrator then starts the replication policy to launch a SyncIQ job. A policy is
like an invoice list of what should get replicated and how.

3: SyncIQ uses snapshot technology, taking a point in time copy of the source, or
SyncIQ domain, when the SyncIQ job starts. The first time the policy runs, an initial
or full replication of the data occurs. Subsequently, changes are tracked as they
occur and then a snapshot is taken for the change tracking. The new change list
begins when a snapshot is taken to begin the synchronization. On the source,
when a SyncIQ job completes successfully, the older source snapshot is deleted.
With SnapshotIQ licensed, administrators can choose to retain the snapshots for
historical purposes.

4: SyncIQ replicates asynchronously and is not a high availability disaster recovery


strategy. The target system passively acknowledges receipt of the data and returns
an ACK once the target receives the entire file or update. Then the data is
passively written to the target. There are risks and lag, such as missing data, time
to failover and time to failback. SyncIQ requires activating the SyncIQ licenses on
the primary and the secondary clusters before replicating.


5: The data on the target cluster is read-only. When a SyncIQ job completes
successfully, a snapshot is taken on the target cluster. This snapshot replaces the
previous last known good snapshot. If a sync job fails, the last known good
snapshot is used to reverse any target cluster modifications. Policies cannot point
to the same target path.

SyncIQ Components

SyncIQ has four processes: Scheduler, Coordinator, Workers and Target


Monitor.

One way replication from source to target

• Scheduler: Each PowerScale node has a Scheduler process running. It is


responsible for the creation and launch of SyncIQ data replication jobs and
creating the initial job directory. Based on the current SyncIQ configuration, the
Scheduler starts a new job and updates jobs based on any configuration
changes.
• Coordinator: The Scheduler launches the Coordinator process. The
Coordinators create and oversee the worker processes as a data replication job
runs. The Coordinator is responsible for snapshot management, report
generation, bandwidth throttling, managing target monitoring, and work
distribution.
• Workers: Primary workers and secondary workers run on the source and target
clusters, respectively. They are responsible for the actual data replication piece
during a SyncIQ job. Replication workers on the source cluster are paired with


workers on the target cluster to accrue the benefits of parallel and distributed
data transfer.
• Target monitor: The target monitor provides critical information about the
target cluster and does not participate in the data transfer. It reports back with
IP addresses for target nodes including any changes on the target cluster.
Additionally, the target monitor takes target snapshots as they are required.

SyncIQ Use Cases


1: Disaster recovery requires replication of critical business data to a secondary


site. SyncIQ delivers high performance, asynchronous replication of data, providing
protection from both local site and regional disasters, to satisfy a range of recovery
objectives. SyncIQ has a policy-driven engine that allows customization of
replication datasets to minimize system impact while still meeting data protection
requirements. The SyncIQ automated data failover and failback reduces the time,
complexity, and risks that are involved with transferring operations between a
primary and secondary site, to meet an organization’s recovery objectives. This
functionality is crucial to the success of a disaster recovery plan.

2: A business continuance solution must meet the most aggressive recovery


objectives for the most timely, critical data. SyncIQ provides performance that
scales to maximize usage of the available network bandwidth and provides
administrators replication time for aggressive Recovery Point Objectives (RPO).
Use SyncIQ in concert with the SnapshotIQ module, which allows the storage of
point-in-time snapshots to support secondary activities like the backup to tape.


3: SyncIQ provides a disk-to-disk backup and restore solution with scalable


performance. This enables IT organizations to reduce backup and restore times
and costs, eliminate complexity, and minimize risk. Petabytes of backup storage is
managed within a single system, as one volume, and one file system and can be
the disk backup target for multiple PowerScale clusters.

4: With the copy policy in SyncIQ, you can delete data on the source without
affecting the target, leaving a remote archive for disk-based tertiary storage
applications or staging data before it moves to offline storage. Remote archiving is
ideal for intellectual property preservation, long-term records retention, or project
archiving.

SyncIQ Performance Management

One of the simplest ways to manage resource consumption on the source and
target clusters is with proper planning of job scheduling.

Proactive Scheduling

• If the business has certain periods when response time for clients is critical,
then schedule replication around these times.
• If a cluster is a target for multiple source clusters, then modifying schedules to
evenly distribute jobs throughout the day is also possible.

Directory Selection

• Improve directory selection to help maintain performance at source or target.


• Improve the process speed by excluding unnecessary data from replication.

However, when required RTOs and RPOs dictate that replication schedules be
more aggressive or data sets be more complete, there are other features of SyncIQ
that help address this.

Worker Control

SyncIQ offers administrators the ability to control the number of workers that are
created when a SyncIQ job is run. This can improve performance when required or
limit resource load if necessary.


Administrators can also specify which source and target nodes are used for
replication jobs on a per policy basis. This allows for the distribution of workload
across specific nodes to avoid using resources on other nodes that are performing
more critical functions.

• OneFS 8.0+ limits:
  − Defined policies per cluster = 1,000
  − Concurrent running policies per cluster = 50
  − Maximum workers per cluster node = 100
• Maximum workers (pworkers) per cluster - determined by the number of CPU cores:
  − Default is 4 * [# of CPU cores].
  − Example: 20 nodes with 1 x 4-core CPU per node = (20 * 1 * 4 * 4) = 320 pworkers
  − Example: 15 nodes with 2 x 4-core CPU per node = (15 * 2 * 4 * 4) = 480 pworkers
• Maximum workers (pworkers) per SyncIQ policy - determined by the number of nodes:
  − Default is 8 * [# of nodes].
  − Example: 20 nodes * 8 = 160 maximum pworkers per policy
  − Workers are dynamically allocated to policies based on the size of the cluster
    and the number of running policies.405

405
Workers from the pool are assigned to a policy when it starts, and the number of
workers on a policy will change over time as individual policies start and stop. The
goal is that each running policy always has an equal number (+/- 1) of the available
workers assigned.


Policy Bandwidth

• Limits the replication bandwidth between the source and target cluster to
preserve network performance.
• Useful when the link between the clusters has limited bandwidth or to maintain
performance on the local network.
• Administrators can limit the number of files that are processed in a given period
to limit node resource load

− Practical only if the majority of the files were close in size.

Performance Rules

• You can manage the effect of replication on cluster performance by creating


performance rules406.
• You can configure multiple network rules to allow for different bandwidth limits
at different times.
• You can modify system resource load by using file operation rules.
• You can schedule when the limits are in effect.

406 Using performance rules, you can set network and file processing threshold
limits to limit resource usage. You can configure network-usage rules that limit the
bandwidth that is used by SyncIQ replication processes. This may be useful during
peak usage times to preserve the network bandwidth for client response. Limits are
also applied to minimize network consumption on a low bandwidth WAN link that
exists between source and target.


Limit Concurrent Jobs

• You can limit the number of concurrent SyncIQ jobs running during peak cluster
usage and client activity.
• Consider all factors prior to limiting the number of concurrent SyncIQ jobs, as
policies may take more time to complete, impacting RPO and RTO times.
• Configuration Steps:

− Modify /ifs/.ifsvar/modules/tsm/config/siq-conf.gc using a text editor.


− Change the scheduler.max_concurrent_jobs line to represent the
maximum number of concurrent jobs for the cluster.
− Restart SyncIQ services by executing the isi sync settings modify
  --service off; sleep 5; isi sync settings modify --service on command.

SyncIQ Policies - Encryption Support

OneFS 8.2 and later provides over-the-wire, end-to-end encryption for SyncIQ data
replication, protecting and securing in-flight data between clusters.

• SyncIQ encryption uses X.509 certificates407.


• A global setting408 is available enforcing encryption on all incoming and
outgoing SyncIQ policies.

407The certificates are stored and managed in the certificate stores of the source
and target clusters. Encryption between clusters takes place with each cluster
storing its own certificate and the certificate of its peer. Storing the certificate of the
peer essentially creates an approved list of clusters for data replication.
Certification revocation is supported through an external Online Certificate Status
Protocol (OCSP) responder.


• SyncIQ encryption supports protocol version: TLS 1.2 and OpenSSL 1.0.2
• A TLS authentication failure causes the corresponding SyncIQ job to
immediately fail.
• SyncIQ peers must store the end entity certificates of each other.
• Customers are responsible for creating, managing, and safeguarding their own
X.509 certificates.

SyncIQ Encryption - Configuration

OneFS recommends configuring SyncIQ encryption using certificates signed by a


Certificate Authority. Alternatively, a self-signed certificate can be used for SyncIQ
encryption.

408The clusters require all incoming and outgoing SyncIQ policies to be encrypted
through a simple change in the SyncIQ global settings.


Scenario

✓ Create X.509 certificates signed by a certificate authority for source and target
clusters.
✓ Add the certificates to the appropriate source cluster stores.
✓ Set the SyncIQ cluster certificate on the source cluster.
✓ Add the certificates to the appropriate target cluster stores.
✓ Set the SyncIQ cluster certificate on the target cluster.
✓ Create encrypted SyncIQ policy on the source cluster.

Step 1

Create the source and target X.509 certificates, signed by a certificate authority (a hedged openssl sketch follows this list):

• <ca_cert_id>
• <src_cert_id>
• <tgt_cert_id>
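A hedged openssl sketch for producing these certificates; the file names and subject names are placeholders, and a production deployment would normally obtain certificates from an established Certificate Authority rather than a lab CA created this way:

    # Create a lab CA key and self-signed CA certificate (<ca_cert_id>)
    openssl genrsa -out ca.key 4096
    openssl req -x509 -new -key ca.key -days 3650 -subj "/CN=SyncIQ-CA" -out ca.crt
    # Create the source cluster key and a CA-signed certificate (<src_cert_id>)
    openssl genrsa -out source.key 4096
    openssl req -new -key source.key -subj "/CN=source-cluster" -out source.csr
    openssl x509 -req -in source.csr -CA ca.crt -CAkey ca.key -CAcreateserial -days 365 -out source.crt
    # Repeat the last three commands with target file names for the target cluster certificate (<tgt_cert_id>)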

Step 2

Add the certificates to the appropriate source cluster stores:

• isi sync cert server import <src_cert_id> <src_key>


• isi sync cert peer import <tgt_cert_id>


• isi cert authority import <ca_cert_id>

Step 3

On the source cluster, set the SyncIQ cluster certificate:

• isi sync settings modify --cluster-certificate-


id=<src_cert_id>

Step 4

Add the certificates to the appropriate target cluster stores:

• isi sync cert server import <tgt_cert_id> <tgt_key>


• isi sync cert peer import <src_cert_id>
• isi cert authority import <ca_cert_id>

Step 5

On the target cluster, set the SyncIQ cluster certificate:

• isi sync settings modify --cluster-certificate-


id=<tgt_cert_id>

Step 6

On the source cluster, create an encrypted SyncIQ policy:

• isi sync pol create <pol_name> sync <src_dir> <target_ip>


<tgt_dir> --target-certificate-id=<tgt_cert_id>

Optional Configuration steps

• Update the policy to use a specified SSL cipher suite


• isi sync policy modify <pol_name> --encryption-cipher-list=<suite>
• Update the target cluster to check the revocation status of incoming certificates


• isi sync settings modify --ocsp-address=<address> --


ocsp-issuer-certificate-id=<ca_cert_id>
• Update how frequently encrypted connections are renegotiated on a cluster
• isi sync settings modify --renegotiation-period=24H
• Require that all incoming and outgoing SyncIQ policies are encrypted

• isi sync settings modify --encryption-required=True

Resource: For the description for each command, view the CLI
Command Reference Guide. For more information about SyncIQ
encryption, view the Dell EMC PowerScale SyncIQ: Architecture,
Configuration, and Considerations white paper.

SyncIQ Encryption - Troubleshooting

1. Check the report of the SyncIQ policy in question.


• The reason for the failure should be in the report.
• If the failure was due to a TLS authentication failure, then the error message
from the TLS library will also be provided in the report.
2. If it is determined that it is a TLS authentication failure, then more detailed
information can be found in /var/log/messages on the source and target
clusters (a hedged CLI sketch follows the list below).

Detailed information includes:


• The ID of the certificate that caused the failure.
• The subject name of the certificate that caused the failure.
• The depth at which the failure occurred in the certificate chain.
• The error code and reason for the failure.
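A minimal CLI sketch for gathering this information, assuming the policy name and job ID are known; the report command options and the grep pattern shown are illustrative:

    # Review the most recent reports for the policy and open the failed one
    isi sync reports list --policy-name=<pol_name>
    isi sync reports view <pol_name> <job_id>
    # Search the messages log on the source and target clusters for certificate details
    grep -i certificate /var/log/messages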

Performance Rules

• Performance rules allow you to define limits on resource consumption by


SyncIQ policies continuously or during a specific time.


• Setting performance rules enables control over resources for different types of
workflows.409
• The performance rules apply to all policies running during the specified time
interval.
• CLI command: isi sync rules create

WebUI Navigation: Cluster management > Data Protection > SyncIQ >
Performance rules

409SyncIQ uses aggregate resources across the cluster to maximize replication


performance, thus potentially affecting other cluster operations and client response.
The default performance configurations, number of workers, network use, and CPU
consumption may not be optimal for certain datasets or the processing needs of the
business.


1: You can disable a performance rule to temporarily prevent the rule from being
enforced. You can also enable a performance rule after it has been disabled.

2:

• Bandwidth: maximum amount of network bandwidth a SyncIQ policy can


consume.
• File Count: maximum number of files that replication jobs can send per second.
• CPU: limits the CPU consumption to a percentage of the total available CPU.
• Workers: limits the number of workers available to a percentage of the
maximum possible workers.

3: Based on the rule type, you can set the limit in terms of kb/s for bandwidth,
files/s for file count, and % limit for CPU and workers.


4: The rule is enforced during the specified time interval for the selected days.

SyncIQ Per-Policy Bandwidth Reservation Overview

• You can configure bandwidth reservations on a per-policy basis410, providing


granularity for each policy.
• The global bandwidth reservation is applied as a combined limit of the
policies411.
• Bandwidth calculation is based on the bandwidth rule, not on the network
bandwidth or throughput available.
• When a policy does not have a specified reservation, bandwidth for it is allocated
from the reserve that is specified in the global configuration settings412.

410SyncIQ meets these reservations based on currently running and existing


bandwidth rules and schedule. SyncIQ per-policy bandwidth reservation allows
customers to specify the amount of bandwidth desired for a particular policy.

411As bandwidth reservations are configured, consider the global bandwidth policy
which may have an associated schedule. The global reservation is split amongst
the running policies.

412The bandwidth reserve is specified as a global configuration parameter, as a


percentage of the global configured bandwidth or an absolute limit in bits per
second. When a bandwidth reservation is not configured for a specific policy, the
default bandwidth reserve is 1% of the global configured bandwidth. The default is
set at this level to encourage administrators to configure the bandwidth reservation
per-policy.


SyncIQ Per-Policy Bandwidth Reservation - Scenarios

• Scenario 1: More bandwidth available than reservations.413


• Scenario 2: Not enough bandwidth available for all policies to get requested
amount414

413
Even split of bandwidth across all running policies. This is current behavior of
bandwidth rules.

414Even split of bandwidth across all running policies, until they reach their
requested reservation. This effectively ensures that the policies with the lowest
reservation amounts reach their reservation before policies with larger
reservations, preventing starvation.


Example 1 - Insufficient bandwidth for all policies

• Global bandwidth rule of 30 Mb.


• Policy 1 has a bandwidth reservation of 20 Mb.
• Policy 2 has a bandwidth reservation of 40 Mb.
• Policy 3 has a bandwidth reservation of 60 Mb.
• Result: Each policy is allocated 10 Mb.

Example 2 - Reservation met for some policies

• Global bandwidth rule of 80 Mb.


• Policy 1 has a bandwidth reservation of 20 Mb.


• Policy 2 has a bandwidth reservation of 40 Mb.
• Policy 3 has a bandwidth reservation of 60 Mb.
• Result:

− Reservation met for policy 1.


− Remaining bandwidth split between policy 2 and 3.

Example 3 - Extra bandwidth available

• Global bandwidth rule of 80 Mb.


• Policy 1 has a bandwidth reservation of 10 Mb.
• Policy 2 has a bandwidth reservation of 20 Mb.
• Policy 3 has a bandwidth reservation of 30 Mb.
• Result:

− Policy 3 gets the required 30 Mb.


− Policy 1 and 2 each get 25 Mb of the remaining bandwidth.


SyncIQ Per-Policy Bandwidth Reservation - Configuration

Step 1

The first step in configuring a per-policy bandwidth reservation is to configure a
global bandwidth performance rule:

• isi sync rules create

Step 2

For each policy, configure desired bandwidth amount to reserve:

• isi sync policy <create | modify> --bandwidth-reservation=#

Optional Steps

These settings relate to bandwidth allocation for policies that do not have a
reservation. By default, the reserve is 1% of the global configured bandwidth.

• To configure a bandwidth reservation percentage:


− isi sync settings modify --bandwidth-reservation-
reserve-percentage=[% of global bandwidth reservation]
• To configure a bandwidth reservation in bits per second rather than a
percentage:
− isi sync settings modify --bandwidth-reservation-
reserve-absolute=[bits per second]
• To clear a configured bandwidth reserve:

− isi sync settings modify --clear-bandwidth-reservation-


reserve


Use Case

✓ Create a performance rule to limit the bandwidth on Monday.


✓ Modify the policy for the Customer Base directory to allocate 50% of the
available bandwidth.
✓ Modify the policy for the Trend Analysis and Revenue directories to each
allocate 25% of the available bandwidth (a hedged CLI sketch follows this list).
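A hedged CLI sketch of this use case; the rule limit, time window, policy names, and reservation values are assumptions (the reservations express 50% and 25% of an assumed 100,000 kb/s Monday limit), and the exact argument order for isi sync rules create should be verified in the CLI Command Reference Guide:

    # Limit SyncIQ bandwidth to 100,000 kb/s on Mondays during business hours (values assumed)
    isi sync rules create bandwidth 08:00-18:00 M 100000
    # Reserve half of the limit for the Customer Base policy
    isi sync policy modify CustomerBase --bandwidth-reservation=50000
    # Reserve a quarter of the limit for each of the other two policies
    isi sync policy modify TrendAnalysis --bandwidth-reservation=25000
    isi sync policy modify Revenue --bandwidth-reservation=25000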

SyncIQ Per-Policy Bandwidth Reservation - Troubleshooting

Compare assigned bandwidth between bandwidth daemon and all currently


running coordinators with the currently specified bandwidth rules and policies to
determine the current state and if it is expected.
• Policy configuration
• isi sync policy view <name>
• Bandwidth Rules
• isi sync rules list
• USR2 output from isi_migr_bandwidth
• On node 1 (or the specified node): killall -USR2 isi_migr_bandwidth
• /var/tmp/isi_migr_bandwidth_[pid].txt
• USR2 output from isi_migrate


• Find the coordinator node using the run directory.


• kill -USR2 <pid of isi_migrate>
• /var/tmp/migrate_status.txt
• Global SyncIQ unallocated reserve settings

• isi sync settings list

SyncIQ and SmartLock Compatibility

• For SyncIQ and SmartLock environments it is essential to ensure all node


clocks are synchronized415.
• During replication, metadata related to retention date and commit status:
− Persists when replicating data from a source SmartLock directory to a target
SmartLock directory.
− Lost when replicating from a SmartLock directory to a non-SmartLock
directory.
• If a SyncIQ job fails with Compliance SmartLock directories, do not break or reset
the policy.416

415 It is recommended to have all nodes on the source and target clusters
configured with Network Time Protocol (NTP) Peer Mode. If Compliance SmartLock
is required, all source and target nodes must be configured in NTP Peer Mode prior
to configuring the compliance clock.

416 Contact support for help to recover the policy. Breaking or resetting the policy
results in duplicate data consuming space because the users are forced to create a
new policy to a new empty target path. The old target path will have to remain with
its data since our SOX compliant code does not allow deleting or overwriting.


Configurations supported (Source Site to Target Site):

• Create the target SmartLock compliance directory before replicating.
• Create the target SmartLock enterprise directory before replicating.
• Retention dates and commit status are lost.
• Only if files are not committed to a WORM state.

Configurations not supported:

Source directory type      Target directory type     Failback Allowed
Non-SmartLock              Compliance SmartLock      No
Enterprise SmartLock       Compliance SmartLock
Compliance SmartLock       Non-WORM
Compliance SmartLock       Enterprise SmartLock

Best Practice: Keep the source and target directories of the same
type.

SyncIQ Features Compatibility

Data Reduction

• SmartDedupe:


− Deduplicated files on source are rehydrated to their original size when


replicated to target.
− Run SmartDedupe on target once replication is complete.
− Shadow stores are not transferred to target clusters or backup devices.
• When the source cluster consists of F810, F600, F200, or H5600 nodes:

− Source data is rehydrated, decompressed and transferred uncompressed to


the target cluster.
− When target cluster consists of F810, F600, F200, or H5600, the replication
data goes through the same inline compression and deduplication as any
other data that is written to these platforms.

Small File Storage Efficiency (SFSE)

• SFSE dataset is unpacked on the source prior to replication.


• When target has SFSE enabled, the dataset is packed when the next
SmartPools job runs on the target.
• When target cluster has SFSE disabled, the dataset remains unpacked.

Large File Support

• Source cluster enabled with 16 TiB file support can only connect to targets that
are also enabled for 16 TiB file support.


− SyncIQ policies fail when establishing a connection with a cluster without


large file support.
• Source cluster without large file support (max file size of 4 TiB) can connect to
targets enabled with 16 TiB file support.
• Workflow impacts for existing SyncIQ policies are possible when the target
cluster does not have resources for the 16 TiB feature.

OneFS Version Compatibility

It is recommended to have the same OneFS version and patches on both the
source and target cluster.

• A 7.2.x source cluster can replicate to 7.2.x targets and to 8.x and 9.0 targets, but not to targets enabled with the 16 TiB feature.
• An 8.x or 9.0 source cluster can replicate to 7.2.x targets and to 8.x and 9.0 targets, but not to targets enabled with the 16 TiB feature.
• A source cluster running 8.2.2 or 9.0 with the 16 TiB feature can replicate only to targets running 8.2.2 or 9.0 with the 16 TiB feature.

Disaster Recovery

Module Objectives

After completion of this module, you can:

• Describe data protection and disaster recovery approaches.


• Describe initial synchronization and incremental synchronization.
• Identify SyncIQ policy modification and failover and failback phases.
• Describe NDMP backup options and backup enhancement.
• Identify cloud and virtual storage strategies and cloud policies.


Data Protection and Disaster Recovery

Data Protection and Disaster Recovery

OneFS Data Protection

Hayden wants to know about the different data protection strategies provided by
OneFS to recover from a disaster, and about the tools that are best used with
different workflows.

• Data protection and backup are broad terms that are used to describe a host of
tools.
• File-level data protection uses FEC417.
• Other protection tools such as snapshots, data replication, and NDMP backup
can be employed to protect data.
• Typically, there is no one size fits all solution.

417 Reed-Solomon Forward Error Correction


Disaster Recovery Approaches

The graphic shows the data recovery approaches in order of decreasing timeliness.

• Over the past decade, several technologies like replication, synchronization418


and snapshots419, in addition to disk based backup have become mainstream
and established their place within the data protection realm.
• These approaches include a form of point-in-time snapshots for fast recovery,
followed by synchronous, and asynchronous replication.
• Backup to tape420 or a virtual tape library sits at the end of the continuum,
providing insurance against large-scale data loss, natural disasters, and other
catastrophic events.

418
Synchronization and replication provide valuable tools for business continuance
and offsite disaster recovery.

419Snapshots offer rapid, user-driven restores without the need for administrative
assistance.

420 Data protection was always synonymous with tape backup.


Data Protection Continuum

At the beginning of the continuum sits high availability. Redundancy and fault
tolerant designs satisfy this requirement. The goal here is continuous availability
and the avoidance of downtime by the use of redundant components and services.

• Snapshots421
• Replication422
• NDMP Backup423

OneFS Data Protection technology alignment with protection continuum.

421 Snapshots are frequently used to back up the data for short-term retention and
to satisfy low recovery objective SLAs.

422Replication of data from the primary cluster to a target DR cluster, ideally


located at a geographically separate location, is recommended.

423NDMP backup to tape or VTL (virtual tape library) typically satisfies longer term
high recovery objective SLAs and any regulatory compliance requirements.


1:

• Backup to Tape
• Higher RTO and RPO

2:

• Offsite disaster recovery


• Medium to high RTO and RPO

3:

• Disk-based backup and business continuity


• Medium RTO and RPO

4:

• Very fast file recovery


• Low RTO and RPO

Higher RTO corresponds with longer recovery time.

NDMP Deployment Methods

Is restoring petabytes of data from tape going to take too long? Is it too
unreliable? Is it feasible?


• Organizations may use a full backup424 to tape425 as an offsite disaster recovery


solution.
• Another NDMP solution may be using a remote copy426 to do backups.

424An example is taking a full backup from a snapshot for DR, and letting it run
while using snapshots for any daily restores.

425Generally, organizations use tape for long-term storage. Management for tape
can be complex and recovery times unpredictable, unreliable, and long at petabyte
scale. Recovery from a disaster can take weeks at petabyte scale. Many
organizations still use backup to tape as their recovery solution. The cost,
maintenance, and resources can be less than a site to site solution, especially if
backing up to tape instead of disk.

426If a disaster occurs, the tapes can be stored offsite to maintain SLAs until the
source is brought back online.


The graphic shows two NDMP recovery methods which are solutions for disaster recovery.

These examples are not all-or-nothing situations. Organizations can have recovery
data on a cluster in a remote site while the less critical, archival data can restore
from tape for months if necessary.


Data Protection Using SyncIQ

SyncIQ data replication over LAN and WAN.

To prepare for disaster scenarios, such as an unplanned outage that can


potentially compromise your data, you need a solution that creates and safely
stores copies of your data.

• SyncIQ is a data replication software of PowerScale and a key element of a


robust data protection and disaster recovery solution.
• SyncIQ works with the snapshot technology to copy data blocks that change
between replications427.
• SyncIQ allows creating user-defined policies that can be scheduled to replicate
data to meet data recovery point objectives.
• One-to-One428

427 The replication process is completed quickly and with minimal disruption.

428SyncIQ is typically set up to replicate data from a single source cluster to a


single target cluster in a one-to-one relationship.


One-to-Many429

Data Protection Using Snapshots

A snapshot is a view of the file system or directory at a point-in-time. The snapshot


preserves the data at the time it is executed.

Functions

There are different ways of taking snapshots. OneFS uses,

• Copy on Write (CoW)430


• Redirect on Write (RoW)431

429SyncIQ is capable of replicating from a single source cluster to multiple target


clusters in a one-to-many relationship.

430User snapshots for data protection use CoW. CoW allows users and
administrators to retrieve lost, corrupted, malicious files and more. Before a new
write is written, the original block is copied to the snapshot area. CoW incurs a
double write penalty, but it results in less file fragmentation.

431 RoW are system defined snapshots and not a data protection use case.


The graphic compares CoW and RoW snapshots for changes made to block D of a file. With CoW, changes incur a double write penalty, but there is less fragmentation of the HEAD file.

Grandfather-Father-Son Scheduling

A typical data protection use case for snapshots is grandfather-father-son
scheduling and preservation. A grandfather-father-son setup may look like the
following (a hedged scheduling sketch follows the table):

• Child: daily scheduled snapshots, saved for a week before being deleted, maintaining no more than seven snapshots at any given time.
• Parent: weekly scheduled snapshots, saved for a month, maintaining no more than four or five snapshots at any given time.
• Grandparent: monthly scheduled snapshots, saved for a year, maintaining no more than twelve snapshots at any given time.
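A hedged SnapshotIQ sketch of such a setup; the schedule names, directory path, naming patterns, schedule strings, and retention durations are assumptions and should be checked against the CLI Command Reference Guide:

    # Child: daily snapshot retained for one week
    isi snapshot schedules create gfs-daily /ifs/div-gen/engineering daily_%Y-%m-%d "every day at 00:00" --duration 7D
    # Parent: weekly snapshot retained for one month
    isi snapshot schedules create gfs-weekly /ifs/div-gen/engineering weekly_%Y-%m-%d "every Sunday at 00:30" --duration 1M
    # Grandparent: monthly snapshot retained for one year
    isi snapshot schedules create gfs-monthly /ifs/div-gen/engineering monthly_%Y-%m-%d "the 1st of every month at 01:00" --duration 1Y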

Snapshots for Backup and Replication

Backup

• Using snapshots for incremental backups shortens backup times.


• In PowerScale, OneFS checks the previous snapshot for the NDMP backup
operation, snapshot_T1, and compares it to a new snapshot, snapshot_T2.
• OneFS then backs up all files that are modified since the last snapshot was
made.

The graphic shows snapshots that are used to back up data. Callouts: a lower change rate equals faster incremental backups; PowerScale NDMP backup supports a progressive incremental forever solution; OneFS identifies data changes between the two snapshots.

Replication

Replication tools, such as PowerScale’s SyncIQ, generate snapshots to facilitate


replication, failover, and failback between storage platforms. The snapshot’s point-
in-time data is synchronized to the target file system. Any modifications to the
source file system during the synchronization period are not reflected on the target
file system.

The graphic shows a snapshot of the source directory being served to clients; SyncIQ compares snapshots to perform incremental updates.

Most storage platforms use snapshots as a mechanism to enhance replication,


increase backup efficiency, and ensure consistency of the backed-up data. The
difference between the snapshots is the changed data between backups. Without


the use of snapshots, the file system directories would have to be scanned to find
the modified data, which is unrealistic at petabyte scale.

In OneFS, incremental synchronizations are performed similar to the mechanism


used for incremental backups. A second snapshot is compared to the previous, and
only modified data is synchronized to the target. Snapshots of the target file system
can be created for archival purposes. Administrators have two options if a
synchronization fails: fix the failure and complete the job successfully, or start over
with an initial baseline synchronization.

Initial Synchronization and Incremental Synchronization

Different platforms use different mechanisms to achieve the replication.

• The first job, the initial synchronization, for a SyncIQ policy sets a point-in-time
baseline of the production data. For the initial synchronization, the source
cluster creates a snapshot432.
• The incremental synchronization updates the target data.
• Once the initial synchronization is done, the first incremental update is ready433.
A snapshot is created before each replication job.

432The snapshot is used to ensure data that is modified after the snapshot point-in-
time is not replicated.

433The first update may take some time depending on the data change rate and
duration of the initial synchronization.


The graphic illustrates a SyncIQ synchronize policy as opposed to a copy policy.

In the example, data E’ and F’ are not replicated in the initial synchronization. The
goal is to get a point-in-time baseline copy to the target cluster. The initial
synchronization can consume large amounts of network bandwidth and take a long
time to complete. SyncIQ uses the difference between two snapshots, the previous
snap and the new snap. The graphic shows the new snapshot checking what
blocks are different. Only the changed blocks are updated to the target.
Subsequent updates should replicate less data until the replication achieves a
steady state where replication time and the amount of data are predictable.

Initial Synchronize Challenge

• One of the challenges to establishing replication between two clusters is the


amount of time it takes for the initial replication. The initial replication copies the
source data to the target cluster.


• There are several key variables that impact the time that the initialization takes,
such as link speed and the amount of data434.
• For example, if the initial synchronization takes three weeks during which 80%
of the data is modified, the first update may take several days.
• In this example, it can take some time before the policy reaches a predictable
and steady rate.

In this example, only the /ifs/west/sales directory is replicated.

434With a slow link and massive amount of data, the initial copy could take weeks
to months. Furthermore, depending on the change rate of data, the first
synchronization could also take a long time to finish.


Initial Synchronize – Local

Performing the initial synchronization locally can address the challenge of


completing the initial synchronization when confronting high change rates, large
datasets, and low link speeds.

• The SyncIQ policy is configured, and 1 PB of data is replicated over the LAN.
• Once the initial synchronization and subsequent updates are complete, the
SyncIQ policy is disabled, the cluster shipped to the remote facility, and the
policy re-enabled.
• A cookie is retained on the target cluster in order for the policy to continue
incremental syncs and avoid a full retransmission of the data via initial
synchronization.

The graphic shows an example where the target cluster is co-located with the source cluster.

Hub and Spoke Topology

One to Many

A challenge organizations face is the need to have parallel synchronization to


multiple recovery clusters. Replicating to multiple sites protects against multiple
cluster and multiple site failures.


The graphic shows an organization with three clusters that has a typical hub and spoke topology.

Many to One

Another hub and spoke topology is a many to one solution.

The graphic shows an organization that replicates data from multiple locations to a core data center.

Student Guide: Another challenge organizations may face is the need to have
parallel synchronization to multiple recovery clusters. Replicating to multiple sites
protects against multiple cluster and multiple site failures. In the scenario, an


organization has three clusters. The layout shown is typical of a hub and spoke topology. In
this scenario, the policy replicating to the North data center uses the same
snapshot as the policy replicating to the East data center. A challenge to this model
is the resources SyncIQ uses, especially if there are multiple SyncIQ policies and
many remote clusters. By default, SyncIQ uses resources across the cluster and its
network and CPU consumption is unlimited. SyncIQ may require throttling to
constrain resources that are used for SyncIQ so as not to interfere with cluster
operations, production workflows, and client responses. OneFS 8.0 and later
changes the worker limitations and updates performance rules.

Another hub and spoke topology is a many to one solution. The graphic shows an
organization that replicates data from multiple locations to a core data center. The
organization may use a many to one solution to consolidate production data to
central, disaster recovery cluster. At the core data center, data is backed up to tape
or can be archived to the cloud, providing added protection. Each remote facility
should follow the naming practices. The remote data centers replicating to data
center West each have unique base directories. The failover target path should be
the same from source to target. Having the same path naming enables smooth DNS
failover, maintains scripts, and keeps a consistent mountpoint for NFS connections.


Data Protection Use Case

What replication deployment is best to protect my workflows?

In PowerScale, the basic replication methods include two types, local
replication435 and remote replication436. Although the example uses the general
terms source and target, the SyncIQ policies are based on SyncIQ domains.

• Local intra-cluster (data loss protection): point-in-time data protection of a directory in the same cluster, over internal communication.
• Local cluster to cluster (cluster loss protection): point-in-time data protection of a directory in a different cluster.
• Remote (site loss protection): point-in-time data protection of a directory at a different site.

What is the goal or the requirement for replication? Is a mirrored copy of the source
the goal? Or is the goal to have all source data copied and retain deleted file copies
in case they are required later? Many other platforms use similar methods. Some
environments may distinguish local and remote by the type of network connecting
the source and the target platforms.

435 If data is replicated over the LAN, the method is local.

436 If data is replicated over the WAN, it is remote replication.


Local intra-cluster replication is when the production and replicated data reside in
the same storage platform. SyncIQ uses the internal network to replicate, creating
an extra copy of high value data. Intra-cluster replication provides protection in the
event the source directory is lost.

A local cluster to cluster solution replicates data over the LAN to another platform
typically in the same data center. Cluster to cluster replication is typically done
between platforms in the same facility to protect against cluster failure. Remote
replication over the WAN protects against cluster and site failure. PowerScale
supports only asynchronous replication over IP. A subdirectory of /ifs is always
used as the source and target, never /ifs.

Two-Way Remote Data Protection

Two-way data replication is a common use case in many remote disaster recovery
implementations.

• The graphic shows two data centers that are named West and East.
• Each data center hosts read/write production data and synchronizes the
production data to a remote target.
• In the event one of the data centers has a catastrophic failure, both West’s
production data and East’s production data are protected.

The graphic shows data center West and data center East; each provides cluster and site loss protection for the other.


Cascading

Cascading437 is not recommended for PowerScale because the synchronization


from East to South may occur while the updates are sent from the North to East.

Cascading may cause inconsistent views of the data between North and South. If
deploying a cascading, PowerScale solution, care must be taken to prevent the
possible inconsistencies438.

437
In a cascading model, the data is replicated from cluster West to cluster East
and then from cluster East to cluster South.

438One method of preventing inconsistencies is to create a copy of the target data


on East and synchronize the copy to South.


Note: Cascading is specifically for backup purposes, and should be


used carefully – OneFS does not allow failback when the source is
the target of another policy.

Bandwidth

How can I estimate how long the initial sync takes?

The replication in the example is over the WAN with a link speed of 1024 Mb/s.

• To begin, convert the speed and capacity into the same units of measurement: megabits to megabytes and terabytes to megabytes.
• The source dataset is 15 TB. Factor in link speed inefficiency; the example uses 80% efficiency.
• The capacity is divided by the link speed to get the amount of time in seconds. The time in seconds is converted to hours and then days.

The graphic shows an example of estimating the amount of time the initial copy can
take.

1:

• Bits to bytes: 1024 Mbps/8 = 128 MBps


• TB to GB: 15 TB * 1024 = 15,360 GB
• GB to MB: 15,360 GB * 1024 = 15,728,640 MB


• Line rate at 80% maximum efficiency: 0.8 * 128 = 102.4 MBps


• Seconds: 15,728,640/102.4 = 153,600
• Minutes: 153,600/60 = 2560
• Hours: 2560/60 = 42.67
• Days: 42.67/24 = 1.78 days to complete the initial sync

The example shows that an initial synchronization of 15 TB over a 1024 Mb per


second link is estimated to take 1.78 days. Knowing the duration can help
organizations decide whether to seed the cluster.
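A small shell sketch of the same arithmetic, useful for trying other link speeds or dataset sizes; the variable values reproduce the example above:

    # Estimate initial synchronization time from link speed, efficiency, and dataset size
    awk 'BEGIN {
        link_mbps = 1024; efficiency = 0.8; dataset_tb = 15;
        rate_MBps  = (link_mbps / 8) * efficiency;   # 102.4 MB/s usable
        dataset_MB = dataset_tb * 1024 * 1024;       # 15,728,640 MB
        seconds    = dataset_MB / rate_MBps;
        printf "Estimated initial sync: %.2f days\n", seconds / 86400;
    }'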

Recovery Point Objective

Can the amount of data I need to synchronize meet my RPO?

• In the example, the change rate is consistent at 10 GB per hour.
• The policy runs every hour and takes 30 minutes to complete, making the current RPO two hours.
• The needed RPO for the dataset is one hour. Changing the policy to run every 30 minutes makes the RPO one hour.
• The change means that 5 GB of data is updated, taking about 15 minutes.
• Now that the initial copy and the first and second incremental updates are done, the replication settles in to a steady pace.


Data Reprotection Overview

• A PowerScale cluster is designed to continuously serve data, even when one or


more components simultaneously fail.
• OneFS ensures data availability by striping or mirroring data across the cluster.
• If a cluster component fails, data that is stored on the failed component is
available on another component.
• After a component failure, lost data is restored on healthy components by the
FlexProtect proprietary system.
• Data protection is specified at the file level, not the block level, enabling the
system to recover data quickly. Because all data, metadata, and parity
information is distributed across all nodes, the cluster does not require a
dedicated parity node or drive. This ensures that no single node limits the speed
of the rebuild process.

FlexProtect Overview

• FlexProtect performs data reprotection when a disk or node is smartfailed. It


locates incomplete protection levels, missing data or parity blocks and fixes
them.
• FlexProtect allows differing levels of protection to be applied to the file system.
• The FlexProtect job is started automatically after smartfailing a drive or node (a hedged monitoring sketch follows).
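A minimal sketch for checking whether a FlexProtect or FlexProtectLin job is running and how far along it is; the job ID is a placeholder:

    # List running jobs and look for FlexProtect or FlexProtectLin
    isi job jobs list
    # View progress details for a specific job
    isi job jobs view <job_id>
    # Confirm overall cluster and drive health
    isi status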


Important: Only SmartFail a node with the guidance of the Dell EMC
support team.

The data is always rebuilt from FEC. FlexProtect has a corresponding job,
FlexProtectLin. The suffix indicates that the job automatically uses an SSD-based
copy of metadata to scan the LIN tree, rather than the drives themselves.
Depending on the workflow, using FlexProtectLin often significantly improves job
runtime performance. FlexProtect uses a drive scan and an inode scan, whereas
FlexProtectLIN accesses using an inode scan.

FlexProtect Data Recovery

• OneFS protects data in the cluster based on the configured protection policy.
• It distributes all data and error-correction information across the cluster and
ensures that all data remains intact and accessible even in the event of
simultaneous component failures.


• It rebuilds failed disks, uses free storage space across the entire cluster to
further prevent data loss, monitors data, and migrates data off of at-risk
components.
• Under normal operating conditions, all data on the cluster is protected against
one or more failures of a node or drive.
• OneFS reprotects data by rebuilding data in the free space of the cluster. While
the protection status is in a degraded state, data is more vulnerable to data
loss.

FlexProtect Impact

The time that it takes to rebuild data following a drive failure depends on many
variables. The best way to gauge a FlexProtect runtime is to use a previous rebuild
runtime. Without a history, the runtime becomes a guess. The graphic shows the
runtime to rebuild the data following smartfailing the disk is about 6.6 hours on a
1.2 TB drive.

The graphic shows an example runtime on an idle cluster with no load and a mix of small and large files, along with the variables that affect the data rebuild time.

The OneFS release determines the job engine version and how efficiently it
operates. The system hardware dictates the drive types, amount of CPU, and
RAM. The amount of file system data, the makeup of data, and the protection
levels have an impact. The load on the cluster also determines the amount of time
a rebuild takes. SmartFail runtimes range from minutes for empty, idle nodes to
days for nodes with large SATA drives and a high capacity utilization.


SmartFail Overview

SmartFail is the mechanism OneFS uses to protect data on failing drives or nodes.
OneFS smartfails drives when any potential data integrity issue exists. Smartfailing a
drive can be anticipated439 or unanticipated. FlexProtect is the process that handles
the reprotection and verification. After the process is complete, the failed device is
logically removed from the cluster and can then be replaced.

The graphic shows disk 1 of sled C in node 1 failing.

Note:
Consult Dell EMC support before manually smartfailing a node or
drive.
A node issue or failure does not automatically start rebuilding data.
Do not remove a drive without understanding the latest procedure –
consult Dell EMC support for the procedure.

439When SmartFail anticipates a drive failure, it quarantines the drive while


reprotecting its data across other drives.


Shown is a simplistic example of the process. Data is rebuilt using FEC and rebuilt
in free space within the same disk pool. Typically, smartfailing a node is done for
migration purposes or with assistance from the Dell EMC support. Large capacity
disks, such as 6 TB, 8 TB, and 10 TB SATA drives, may require longer data
reconstruction times. For example, a 6 TB drive at 90% capacity takes longer than
a 2 TB drive at 90% capacity. Conversely, a 2 TB drive at 90% capacity takes
longer than 6 TB drive at 10% capacity. Disk evolution has produced greater disk
density, but the disk mechanics remain constant, meaning the number of heads
and actuators remains the same. Large capacity disks raise the probability of a
multiple drive failure scenario.

Node Failures

If a node reboots, the file system does not need to be rebuilt because it remains
intact during the temporary failure.

• OneFS does not automatically start reprotecting data when a node fails or goes
offline.
• If N+1 data protection is configured on a cluster, and one node fails, all the data
is still accessible from every other node in the cluster.
• If the node comes back online, the node rejoins the cluster automatically without
requiring a full rebuild.
• If a node is physically removed from the cluster, it should be removed logically
also.
• After that the node automatically reformats its own drives, and resets440 itself to
the factory default settings.
• Use SmartFail441 to logically remove a node.

440The reset occurs only after OneFS has confirmed that all data has been
reprotected.


• After the new node is added, OneFS distributes the data to the new node.
• It is more efficient to add a replacement node to the cluster before failing the old
node because OneFS can immediately use the replacement node to rebuild the
data stored on the failed node.

Recommended Folder Structure

The failover target path needs to be the same as the source. A consistent path 442
enables switching between clusters by changing the DNS direction.

441It is important that you smartfail nodes only when you want to permanently
remove a node from the cluster. If you remove a failed node before adding a new
node, data that are stored on the failed node must be rebuilt in the free space in the
cluster.

442Consistent path names keep any scripts using the shares and exports from the
source cluster from breaking. Also, the mount entries for NFS connections must
have a consistent mountpoint to avoid having to manually edit client fstab or
automount entries.


The graphic shows a two-way replication where both sites are a source and target.

Scenario
• Two-way replication, each data center hosts production read/write data
• Each data center acts as a DR target
• Data center East has two SyncIQ domains with different RPO requirements

Note: Though it is a two-way replication, each cluster replicates


different production datasets.

Cluster East has two source directories each having a different RPO requirement.
The data in /ifs/east/engineering is critical data and therefore has a shorter
RPO. Exclude and include statements should be used only with a good
understanding of their function. By default, SyncIQ includes all files and folders
under the specified root directory, such as all subdirectories under
/ifs/west/sales. Explicitly including one path, such as
/ifs/west/sales/sales-gen, excludes all other paths such as
/ifs/west/sales/sales-media.


Large-Scale Disaster Recovery Challenges

The larger and more disparate the organization, the more complex a disaster
recovery solution.

What is the best way to replicate multiple clusters to remote sites?

• Some of the challenges organizations face is how to replicate configuration


data.
• How is the data going to be rebuilt after a catastrophic disaster?

− Several methods can be used depending on how the data is protected. A


hub and spoke or multi-hop solution can be used when organizations with
two data centers require added protection.
If a failure at the production site occurs, the configuration should be identical to
avoid excessive data unavailability. Establishing the replication link and replicating
the data for the first time can take a long time. Factors such as link speed, the
amount of data to replicate, and the data change rate impact the replication time.

Data Protection and Disaster Recovery Considerations

• An explicitly defined and routinely tested procedure is key to minimizing the


potential impact to the workflow when a failure occurs or in the event of a
natural disaster.
• Among the primary approaches to data protection at scale are fault tolerance,
redundancy, snapshots, and replication.


• Some of these methods are biased towards cost efficiency but have a higher
risk that is associated with them, and others represent a higher cost but also
offer an increased level of protection.
• Despite support for parallel NDMP and native two-way NDMP over Fibre
Channel, traditional backup to VTL or tape is often not a feasible DR strategy at
the large or extra-large cluster scale. Instead, replication is usually preferred.
• For large clusters, snapshots typically provide the first line of defense in a data
protection strategy with low recovery objectives.


SyncIQ DR

SyncIQ DR

Scenario

Use Case

DR is performed by syncing data, syncing configurations and redirecting clients


between a production and DR site.

Hayden is looking to familiarize himself with the following tasks to provide a DR


solution:

✓ Create and assess a SyncIQ policy


✓ Run and verify policy
✓ Perform failover and failback
✓ Automate failover and failback
✓ Ensure access to data in cloud


Task - Create SyncIQ policy

Create Policy - Settings

• To meet the RPO and RTO of 6 hours, the policy is configured to synchronize
data between source and target every 3 hours during the weekdays.


• The scheduled job runs only if the source contents are modified since the last
run.
• Administrators are made aware via events if the RPO of 6 hours is exceeded (a hedged CLI equivalent is sketched at the end of this topic).

• Policy name and description: As a best practice, the policy name field should
be descriptive for administrators to easily gather the policy workflow, as several
policies could be configured on a cluster. A unique name makes it easy to
recognize and manage.
• Enable/Disable policy: Temporarily disabling a policy allows for a less intrusive
option to deleting a policy when it may not be required. Additionally, after
completing the configuration for a policy, it can be reviewed for a final check,
prior to enabling.
• Policy type:
− Copy: A copy policy maintains a duplicate copy of the source data on the
target. Files deleted on the source are retained on the target. A copy policy
offers file deletion protection, but not file change protection. Copy policies
are most commonly used for archival purposes.
− Synchronize: A synchronization policy maintains an exact point in time copy
of the source directory on the target cluster. If a file or sub-directory is
deleted from the source directory, when the job runs, the file or directory is
removed from the target. The synchronization policy does not provide
protection from file deletion, unless the synchronization has not yet taken
place. It provides source cluster protection.
• Job options:

− Manually: Run the replication job on an ad hoc basis. This limits cluster
overhead and saves bandwidth. Manual SyncIQ jobs still maintain a source
snapshot that accumulates changed blocks. Therefore, it is recommended to
run the manual job frequently, ensuring the source snapshot growth is
limited.
− On a schedule: Provides a time-based schedule for the SyncIQ policy
execution. When selected the time schedule options change to match the
selected interval. An option is available to not run the policy if no changes to
the data have occurred since the last time the policy was run. This option
saves system resources when replication is not required. Administrators can
specify an RPO (recovery point objective) for a scheduled SyncIQ policy and
trigger an event to be sent if the RPO is exceeded. The RPO calculation is


the interval between the current time and the start of the last successful sync
job.
− When source is modified: The SyncIQ domain is checked every 10
seconds for changes. If a change is detected, the policy runs automatically.
Events that trigger replication include file additions, modifications and
deletions, directory path, and metadata changes. The option is only
recommended for datasets with a very low archival change rate. Constantly
forcing SyncIQ to run the policy on high change rate datasets could severely
impact both source and target cluster performance. An option to delay the
start of the replication is available to allow new writes to the source to
complete prior to running the job. Delaying the policy allows fewer, larger
replication runs rather than many short replication runs. Content distribution and
Electronic Design Automation, or EDA, are the primary use cases.
− Whenever a snapshot of the source directory is taken: the policy initiates when a
snapshot matching the specified pattern is taken. The option is useful in a one-
to-many solution where only one user generated snapshot is required for all
replications. The job has the option to replicate data based on historic
snapshots of the source SyncIQ domain the first time the policy is run. This
creates a mirrored image of the snapshots on the target from the source and
is particularly useful for snapshot protection for file deletions. The Enable
capture of snapshots on the target cluster must also be set for mirrored
image of the snapshots.
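A hedged CLI equivalent of the settings described above; the policy name, the target host placeholder, the schedule string, and the option spellings (--schedule, --skip-when-source-unmodified, --rpo-alert) are assumptions to be verified against the CLI Command Reference Guide:

    # Synchronize the engineering directory to the target cluster every 3 hours,
    # skip runs when nothing has changed, and raise an event if the 6-hour RPO is exceeded
    isi sync pol create eng-dr sync /ifs/div-gen/engineering <target_host> /ifs/div-gen/engineering \
        --schedule "every day every 3 hours" \
        --skip-when-source-unmodified true \
        --rpo-alert 6H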


Create Policy - Source

Configuration

• The engineering zone base directory is replicated.


• SyncIQ data replication is only supported through the System access zone.
• Excluding nodes from a SyncIQ policy is beneficial for larger clusters where
data replication jobs can be assigned to certain nodes.

Included Directories vs Excluded Directories Scenario

Rule - If both include and exclude directories are specified, any excluded
directories must be contained in one of the included directories. Otherwise, the
excluded directory setting has no effect.

Consider the following example:

• Root Directory: /ifs/div-gen/engineering


• Included Directories:


− /ifs/div-gen/engineering/media/graphs
− /ifs/div-gen/engineering/media/sample_demo
• Excluded Directories:
− /ifs/div-gen/engineering/media/graphs/trails
− /ifs/div-gen/engineering/media
• Result:

− All directories under /ifs/div-gen/engineering/media are excluded except


those specified in the included list: /ifs/div-gen/engineering/media/graphs
and /ifs/div-gen/engineering/media/sample_demo
− The directory /ifs/div-gen/engineering/media/graphs/trails is excluded; the
exclusion takes effect because it is a subdirectory of the included
/ifs/div-gen/engineering/media/graphs directory.
• Source Root Directory: The source root directory is the SyncIQ domain. The
path has the data that you want to protect by replicating it to the target directory
on the secondary cluster. Unless otherwise filtered, everything in the directory
structure from the source root directory and below replicates to the target
directory on the secondary cluster. Do not use /ifs as the source root.
• Included Directories: The included directories field permits adding one or more
directory paths below the root to include in the replication. Once an include path
is listed that means that only paths listed in the include path replicate to the
target. Without include paths all directories below the root are included.
• Excluded Directories: A way to manage performance at either the source or
target cluster is to use a more specific directory selection in the SyncIQ policy.
This can be useful in excluding unnecessary data from replication and making
the entire process run faster, but it does add to the administrative overhead of
maintaining policies. Exclude Directories lists directories below the root to
explicitly exclude from the replication policy. You cannot fail back replication
policies that specify includes or exclude settings. The DomainMark job does not
work for policies with subdirectories mentioned in Include or Exclude. Using
includes or excludes for directory paths does not affect performance.
• File Matching Criteria: The file matching criteria enables the creation of one or
more rules to filter which files do and do not get replicated. Creating multiple
rules connect them together with Boolean AND or OR statements. When adding
a new filter rule, click either the Add an “And” condition or Add an “Or”


condition links. File matching criteria says that if the file matches these rules
then replicate it. If the criteria does not match the rules, do not replicate the file.
• Restrict Source Nodes: Restrict Source Nodes allows the cluster to use any of
its external interfaces to synchronize data to the target. Selecting run on only
the nodes in the specified subnet and pool directs the policy to the specific pool
for replication. This option effectively selects a SmartConnect zone over which
the replication traffic transfers. SyncIQ only supports static IP address pools. If
using dynamically allocated IP address pools, SmartConnect might reassign the
address while a replication job is running. Reassigning the address will
disconnect the job and cause it to fail. You can specify a source pool globally on
the Settings tab.

Create Policy - Target

• The policy synchronizes data to the phoenix cluster.


• A similar directory structure is configured at the target site.
• Between the same source and target, multiple policies cannot use the same
root source or root target directory (SyncIQ domain).


• Target Host: Specify the target host using the target SmartConnect zone IP
address, the fully qualified domain name, or local host. Local host is used for
replication to the same cluster. You also specify the target SyncIQ domain root
path.
• Target Directory: As a best practice, ensure that the source cluster name and the
access zone name are in the target directory path.
• Restrict Target Nodes: SyncIQ will use only the node connected within the
SmartConnect zone. Once a connection with the target cluster is established,
the target cluster replies with a set of target IP addresses assigned to nodes
restricted to that SmartConnect zone. SyncIQ on the source cluster will use this
list of target cluster IP addresses to connect local replication workers with
remote workers on the target cluster.
• Enable Target Snapshots: SyncIQ always retains one snapshot of the most
recently replicated delta set on the secondary cluster to facilitate failover,
regardless of this setting. Enabling capture snapshots retains snapshots beyond
the time period that is needed for SyncIQ. The snapshots provide more recovery
points on the secondary cluster. The snapshot alias name is the default alias for
the most recently taken snapshot. The alias name pattern is
SIQ_%(SrcCluster)_%(PolicyName). For example, a cluster called cluster1
for a policy called policy2 would have the alias SIQ_cluster1_policy2. You can
specify the alias name as a Snapshot naming pattern. For example, the pattern
%{PolicyName}-on-%{SrcCluster}-latest produces names similar to
newPolicy-on-Cluster1-latest.
• Snapshot Expiration: The expire options are days, weeks, months, and years.
It is recommended to always select a snapshot expiration period.


Create Policy - Advanced

• Due to high replication frequency, reports are set to be deleted 2 weeks after
creation.
• Cloud data is retrieved by the source and replicated to the target.

• Priority: Enables policies to be prioritized. If more than 50 concurrent SyncIQ


policies are running at a time, policies with a higher priority take precedence over
normal policies.
• Log Level: SyncIQ logs provide detailed job information. Access the logs using
the /var/log/isi_migrate.log file. The output detail depends on the log level,
with the minimal option being Fatal and the maximum logging option being
Trace.
• Validate File Integrity: Provides an option for OneFS to compare checksums
on SyncIQ file data packets pertaining to the policy. In the event a checksum
value does not match, OneFS attempts to transmit the data packet again.
• Prepare Policy for Accelerated Failback: If the SyncIQ replication is intended
for failover and failback disaster recovery scenarios, selecting Prepare policy for
accelerated failback performance prepares the DomainMark for the failback
performance. The original source SyncIQ domain requires a DomainMark.


Running a DomainMark during the failback process can take a long time to
complete.
• Report Retention: Defines how long replication reports are retained in OneFS.
Once the defined time has exceeded, reports are deleted.
• Record Sync Deletions: Track file and directory deletions that are performed
during synchronization on the target.
• Deep Copy: Applies to those policies that have files in a CloudPools target.
Deny is the default. Deny enables only stub file replication. The source and
target clusters must be at least OneFS 8.0 to support Deny. The Allow option
enables the SyncIQ policy to determine if a deep copy should be performed.
Force automatically enforces a deep copy for all CloudPools data that are
contained within the SyncIQ domain. Allow or Force are required for target
clusters that are not CloudPools aware.
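A minimal CLI sketch of creating a policy like the one described above (the policy
name, paths, and target host are illustrative; verify the exact options for your
OneFS version with isi sync policies create --help):

• Create a synchronization policy from the local source directory to the phoenix
cluster:
− isi sync policies create dr-policy sync /ifs/data/finance phoenix.dees.lab /ifs/data/finance
• Review the resulting configuration, including the schedule, snapshot, and deep
copy settings:
− isi sync policies view dr-policy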

Assess Policy

SyncIQ can run an assessment on a policy without actually transferring file data
between the primary and secondary cluster.
• SyncIQ scans the dataset and reports443.
• Assessment for performance tuning444.

443SyncIQ provides a detailed report of how many files and directories were
scanned. This is useful if you want to preview the size of the dataset that is
transferred if you run the policy.

444Running a policy assessment is also useful for performance tuning, allowing you
to understand how changing worker loads affects the file scanning process so you
can reduce latency or control CPU resource consumption.


• Verify communication445.
• How much data446 to replicate.

Assess Sync

445The assessment also verifies that communication between the primary, and
secondary clusters is functioning properly.

446The assessment can tell you whether your policy works and how much data will
be transferred if you run the policy. This can be useful when the policy will initially
replicate a large amount of data.


Report

Best Practice: Run a policy assessment to confirm the policy


configuration and resource commitment prior to the replication
requirement of the policy.
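A minimal CLI sketch of assessing a policy, assuming a policy named dr-policy
(the --test option runs the job in assessment mode, so no file data is
transferred):

• Run the policy in assessment mode:
− isi sync jobs start dr-policy --test
• List the replication reports and review the assessment results:
− isi sync reports list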

SyncIQ RPO and RTO

Description

• The RPO is the amount of time that has passed since the last completed
replication job started.


• The RPO is never greater than the time it takes for two consecutive replication
jobs to run and complete.
• RTO is the maximum amount of time required to make data on the target
available to clients after a disaster.
• The RTO is always less than or approximately equal to the RPO.

− Replication job runs continuously: RTO approximately equal to RPO


− Replication job runs on an interval:
o Disaster occurs when job is running: RTO approximately equal to RPO
o Disaster occurs when no job is running: RTO is negligible

Scenario

Consider a policy with the following setting:

• Schedule: Every 2 hours


• RPO alerts: 3 hours

In the example, a disaster occurs at 6:50 before the update is complete. The data on the target
cluster reverts to the state it was in when the last replication job completed, which is 4:00.

Decision point: What do I set my interval at to meet my RPO?
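As a rough sizing sketch (the job duration used here is an assumption for
illustration): if a replication job typically takes about 50 minutes, the worst-case
RPO is approximately the schedule interval plus the job duration. With a 2-hour
interval, that is about 2 hours 50 minutes, which stays under the 3-hour RPO alert.
For a tighter RPO target, shorten the interval so that the interval plus the typical
job duration remains below the target.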

SyncIQ Policy Modification

• The impact of the change is dependent upon how the policy is modified.


• When a policy that has already run is modified, SyncIQ may run either the initial
replication or a differential replication again.
• When a policy is deleted:
− Replication jobs are not created for the policy.
− Snapshots and reports associated with the policy are deleted.
− Target cluster breaks the policy association with the source cluster.
− The target directory allows writes.
• Rather than modifying or deleting a policy when a suspension is required, you
can disable a policy and re-enable when required.

Modifying the fields shown in the graphic can trigger a replication to run.
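A minimal CLI sketch of suspending a policy rather than deleting it (the policy
name is illustrative):

• Disable the policy so that no new replication jobs are created:
− isi sync policies disable dr-policy
• Re-enable the policy when replication should resume:
− isi sync policies enable dr-policy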

Failover and Failback Scenario

Use Case


✓ Prepare DR site for failover.


✓ Perform failover and redirect clients to DR site.
✓ Perform failback and redirect clients to production site.

Considerations

• SyncIQ does not replicate:


− Configuration data such as SMB shares, NFS exports, quotas, aliases.
− Local or file provider users and groups.
− Snapshots
• SyncIQ does not include failover and failback for network settings, DNS, and
client redirection.
• Failback can be performed for a policy that meets the following criteria:
− The policy is a synchronization policy.
− The policy does not exclude any files or directories from replication.
• Failback takes a longer time as compared to a failover.


Failover and Failback Overview

Failover

Failover is the process of changing the role of the target replication directories into
the role of the source directories for assuming client read, write, and modify data
activities.

Failback

A failback is the process of restoring the source-to-target cluster relationship to the


original operations where client activity is again on the source cluster.


Failover Revert

A failover revert undoes a failover job in process. Use revert before writes occur on
the target.

Failover: Failovers can happen when the primary cluster is unavailable for client
activities. The reason could be from any number of circumstances including natural
disasters, site communication outages, power outages or planned events such as
testing a disaster recovery plan or as a result of upgrade or other scheduled
maintenance activities. Failover changes the target directory from read-only to a
read/write status. Failover is managed per SyncIQ policy. Only policies that are
failed over are modified. SyncIQ only changes the directory status and does not
change other required operations for client access to the data. Network routing and
DNS must be redirected to the target cluster. Any authentication resources such as
AD or LDAP must be available to the target cluster. All shares and exports must be
available on the target cluster or be created as part of the failover process.

Failback: A failback can happen when the primary cluster is available once again
for client activities. The reason could be from any number of circumstances
including that natural disasters are no longer impacting operations, or site
communication or power outages have been restored to normal. Each SyncIQ
policy must be failed back. Like failover, failback must be selected for each policy.
The same network changes must be made to restore access to direct clients to the
source cluster.


Failover Revert: A failover revert undoes a failover job in process. Use revert if the
primary cluster once again becomes available before any writes happen to the
target. A temporary communications outage or if doing a failover test scenario are
typical use cases for a revert. Failover revert stops the failover job and restores the
cluster to a sync ready state. Failover revert enables replication to the target cluster
to once again continue without performing a failback. Revert may occur even if data
modifications have happened to the target directories. If data has been modified on
the original target cluster, perform a failback operation to preserve those changes.
Not doing a failback loses the changes made to the target cluster. Using revert will
cause all changes written to the source and target cluster since the last SyncIQ
snapshot to be permanently lost. Before a revert can take place, a failover of a
replication policy must have occurred. A revert is not supported for SmartLock
directories.

Configuration Preparations


SyncIQ only manages the failover and failback for the file data. For a smooth
failover, the target must be configured similar to the source:
• The file system directory structure, shares and exports should be the same on
the source and target.
• Similar SmartConnect configurations such as DNS, access zones and SPNs.
• Quotas need to be considered447.
• The source and target must share the same authentication providers448.
• The UIDs, GIDs and SIDs must resolve to be the same user or group. 449

For SyncIQ Domain preparation at the target site, there are two possible scenarios
to prepare for:
1. The first scenario is the last sync job has completed successfully.
2. The second scenario is the last sync job did not complete successfully or failed
mid job.

447 Having a larger, or no, quota on the target can cause problems when failing
back. For example, quotas placed on the source can be exceeded when the target
is read/write and then failing back may deny users or groups write ability.

448All AD domains and forests, LDAP or NIS authentication servers should be


available to both clusters. Any file providers must be available on both clusters.
Local user and groups need to be added to both clusters and be the same.

449 The POSIX and AD mappings for a user need to map to avoid file permission
issues. During the setup process, you may be required to make ID mapping
corrections, especially if the clusters were not sharing LDAP or AD domains
originally. The best practice is to join to the authentication providers before the first
use.


The first part of the site preparation stage is to set the SyncIQ directories for the
sync job to no longer accept incoming sync requests. The system then takes a
snapshot of the directories for the sync job, labeled “-new.” The system then
compares the “-new” snapshot to the “-latest” or last-known-good snapshot. If they
are the same and no differences are found, the sync directories have the read-only
bit removed and are placed into a read/write state and ready to accept write
activity.

In the case where a sync job has not completed, failed, or was interrupted in
progress, the “-new” snapshot is taken as before and compared to the “-latest” last-
known-good snapshot. The differences in directories, files, and blocks are then
reverted to the last-known-good state. This process is also called snapshot revert.
This restores the files to the last known consistent state. All synchronized data in the
difference between the snapshots is deleted. Be aware, some data might be lost or
unavailable on the target. After this has been accomplished, the sync directories
have the read-only bit removed and are placed into a read, write state and ready to
accept client write activity.

Failover and Failback Phases

A failback consists of four distinct phases, the preparation, running the mirror
policy, restoring the source, and restoring the SyncIQ policy. A failback without
executing each phase will undo all changes that occurred on target while failed
over. To begin the failover, synchronize the source, shown as boston, dataset to
the target, shown as phoenix. Next, Allow Writes on the target to prepare for the
full failover. The preparation phase runs the resync-prep on the source cluster. This


phase readies the source to receive the updates that occurred on the target cluster.
It creates a mirror policy on the target. The next phase runs the mirror policy to
failback the dataset. Next make the source active by enabling Allow Writes on the
source SyncIQ local targets mirror policy. The final phase runs the resync-prep on
the target mirror policy to set the target cluster to read-only and ensures that the
datasets are consistent. Run the replication report to verify the failback completed
and redirect clients to the source.

Failover Steps

Steps

Administration


Target Write Options

When a policy runs for the first time, it creates an association between the source
and target.

The association is formed by placing a cookie450 on the source cluster.

You can make the target data writable by either using the Allow Writes option or by
breaking the association between source and target.

Allow Writes:

• Does not result in a full or differential replication after the policy is active
again, as the policy is not reset.
• Used in failover and failback operations.

Break Association:

• The policy must be reset before the policy can run again. A full or differential
replication will occur the next time the policy runs. During this full
resynchronization, SyncIQ creates a new association between the source and its
specified target.
• Used in temporary test scenarios, data migrations, or obsolete SyncIQ policies.

Failover and failback processes are initiated by the administrator using the CLI or
the web administration interface. Each SyncIQ policy must be initiated for failover
or failback separately on the target cluster. There is no global failover or failback
selection. Mirror policies and SyncIQ snapshots are baseline elements used by
SyncIQ in normal operations. Do not delete the mirror SyncIQ policies used for
failback. SyncIQ snapshots begin with SIQ- and should never be manually deleted.

450The cookie allows the association to persist, even if the target cluster’s name or
IP address is modified.


Historically kept SyncIQ snapshots should be deleted according to the policy


settings. Performing a failover makes no changes on the source cluster. The
initiation of the failover prevents synchronizations to the target on that specific
policy. The data under that policy is restored to the last-known-good snapshot.
Then the read-only restriction on the target is removed from the SyncIQ domain for
that policy. If the source cluster is online, client access should be stopped and the
replication policy set to manual. In a scenario where the network is lost between
the source and target clusters, clients may write data to the source that will be lost
on a failback. Setting the replication policy to manual prevents the source from
attempting to synchronize on schedule and prevents a failed state and needing to
resolve. If possible during a controlled failover, run a final incremental update. Next
allow writes on the target cluster and then redirect the clients to the target. There
are two scenarios as part of the target site preparation, one when the last update
completes successfully and the other when the last update is unsuccessful.

In the site preparation stage, SyncIQ domains no longer accept incoming


synchronization requests. The system snapshots the directories and compares the new snapshot to
the last-known-good snapshot. If no differences are found, the target directories
have the read-only bit removed and are placed into a read/write state. In the case
where a sync job has not completed, failed or was interrupted in progress, the new
snapshot is compared to the last-known-good snapshot and the differences in
directories, files and blocks are reverted to the last-known-good state. This process
is also called snapshot revert. All synchronized data in the difference between the
snapshots is deleted, thus unavailable on the target.
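A minimal CLI sketch of a controlled failover for a single policy named dr-policy
(run each command on the cluster indicated; setting an empty schedule to make the
policy manual is an assumption that may vary by OneFS version):

• On the source, if it is still online, set the policy to manual so it no longer runs
on schedule:
− isi sync policies modify dr-policy --schedule ""
• On the target, allow writes on the SyncIQ domain for the policy:
− isi sync recovery allow-write dr-policy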

Failover Revert Steps

Steps


Administration

Reverting a failover operation does not synchronize data modified on the target
back to the source cluster. It undoes a failover job. The table shows the steps
following the failover. If testing a SyncIQ domain for disaster recovery and the
directory is only temporarily read/write on the target, the changes are discarded
with a failover revert. Discard the changes by clicking disallow writes in the web
administration interface for each policy.
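A minimal CLI sketch of reverting a failover for the same policy (the CLI
equivalent of the Disallow Writes action):

• On the target, revert the allow-writes operation and discard changes made during
the test:
− isi sync recovery allow-write dr-policy --revert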

Failback Steps

Steps


(Diagram: the resync-prep step creates a mirror policy between the target and the source.)

Shown are the failback steps. The target cluster is failed over and accepting writes.
The target is not accepting updates from the source cluster. The source directory is
read-only.

• Prepare re-sync on the source cluster. For each SyncIQ policy on the source, a
mirror policy is created on the target. The source cluster is set to a read-only


state, is rolled back to the last known good state, and can accept updates from
the target mirror policies.
• Stop client access to the read/write directory on the target cluster to prevent
new data that may not be synchronized back to the source.
• From the target, resynchronize the target to the source using the mirror policies.
• Allow writes for each sync policy on the source cluster. This resets the sync
direction from the original source to the target. The target is not accepting
updates from the source and is still in a read/write state.
• Run the prepare resync on the target to reset the target sync relationship with
the source. The target is set to accept new syncs from the source and is
restored to a read-only status.
• Redirect clients to the source cluster from the target cluster at this point.
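A minimal CLI sketch of the failback sequence, assuming a policy named dr-policy
and the default <policy>_mirror naming for the mirror policy (run each command
on the cluster indicated):

• On the source (boston), prepare the resync; this creates the dr-policy_mirror
policy:
− isi sync recovery resync-prep dr-policy
• On the target (phoenix), run the mirror policy to copy changes back to the
source:
− isi sync jobs start dr-policy_mirror
• On the source, allow writes for the mirror policy, making the source active
again:
− isi sync recovery allow-write dr-policy_mirror
• On the target, run resync-prep for the mirror policy to return the target to
read-only:
− isi sync recovery resync-prep dr-policy_mirror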

PowerScale and Eyeglass

Scenario

✓ Understand integration of PowerScale with Superna Eyeglass DR Edition.


✓ Configure Eyeglass to provide automated failover.


Topology

• Eyeglass replicates cluster configuration data and orchestrates failover and


failbacks.
• Eyeglass features one button failover automation to include DNS and active
directory.
• Eyeglass’ readiness monitoring tracks disaster recovery related changes.
• The runbook robot offers continuous and automatic disaster recovery testing,
writing test data, failing over, and failing back on a nightly schedule.
• Eyeglass application resides only in the target for one-way replication. It must
reside on both sites to support two-way replication.


Configuration

High-level steps to configure Eyeglass include:


• Add PowerScale source and target clusters.
• Enable share, export, alias configuration replication.

Advantages and Disadvantages

Advantages:

• Eliminates complex and manual failovers through automation.
• Replicates cluster configuration data, eliminating the manual or scripted tasks.
• Runbook robot provides continued disaster recovery testing between pairs of
clusters.
• Runbook robot tests application failover logic by creating configuration data
and copying data into the robot access zone.
• Automated readiness monitoring tracks all infrastructure changes that impact
disaster capabilities.

Disadvantages:

• Added layer of management.
• Third-party application and another layer of support.

Resource:
1) Superna Eyeglass with SyncIQ - PowerScale Info Hub
2) Eyeglass PowerScale Edition Quick Start Guide

Once DR Edition is configured it continually monitors the PowerScale cluster for


DR readiness through auditing, SyncIQ configuration, and several other cluster
metrics. The monitoring process includes alerts and steps to rectify discovered
issues. In addition to alerts, DR edition also provides options for DR testing, which
is highly recommended, ensuring IT administrators are prepared for DR events.
The DR testing can be configured to run on a schedule. For example, depending
on the IT requirements, DR testing can be configured to run on a nightly basis,
ensuring DR readiness. As DR Edition collects data, it provides continuous reports
on RPO compliance, ensuring data on the target cluster is current and relevant.

Configuration: Adding the PowerScale clusters requires entering the


SmartConnect service IP, the login credentials, and the RPO. Once the cluster is
added, Eyeglass automatically runs an inventory task to discover the PowerScale


components. When the task completes, the discovered inventory can be seen in the
inventory view. Once the inventory task completes, Eyeglass Jobs are
automatically created to replicate between the SyncIQ policy-defined source and
target. Enabling the configuration replication can be done on a job-by-job basis
from the Jobs tool.

SyncIQ and CloudPools

Scenario

✓ Modify SyncIQ policy to support cloud access.


✓ Ensure cloud account, pool, and policy are present both in source and target.

Overview

• When data is tiered to the cloud, a SmartLink file is created on the cluster,
containing the relevant metadata to retrieve the file at a later point.
• A file that is tiered to the cloud cannot be retrieved without the SmartLink file.
• During replication, the SmartLink files are also replicated to the target cluster.
• Both source and target can have read access, but only a single cluster can
have write access.


During normal operation, the source cluster has read-write access to the cloud provider, while the
target cluster is read-only.

Failover and Failback Implications

When failover is performed, clients are only allowed read access to the cloud data
using the SmartLink file.

Changes to the cloud data are propagated only via the source after failback is
performed.

For extended or permanent failover, write access can be granted by using the isi
cloud access command.


Policy Configuration

To enable CloudPools integration on the target, Deep Copy must be set to either
Allow or Deny.

• Allow451
• Deny452
• Force453

451Replicates the SmartLinks from the source to the target cluster, but also checks
the SmartLinks versions on both clusters. If a mismatch is found between the
versions, the complete file is retrieved from the cloud on the source, and then
replicated to the target cluster.

452Deny is the default setting, allowing only the SmartLinks to be replicated from
the source to the target cluster, assuming the target cluster has the same
CloudPools configuration.
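A minimal CLI sketch of checking and changing the deep copy behavior for an
existing policy (the --cloud-deep-copy option name is an assumption; confirm it
with isi sync policies modify --help for your OneFS version):

• View the current policy settings, including the CloudPools deep copy value:
− isi sync policies view dr-policy
• Change the deep copy behavior; valid values are deny, allow, or force:
− isi sync policies modify dr-policy --cloud-deep-copy=allow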


Target Configuration

SyncIQ automatically configures the cloud provider account information,


CloudPools, and the filepool policy on the target cluster.

• When CloudPools is configured prior to the SyncIQ policy: First SyncIQ job
checks for SmartLink files on the source directory.
• When CloudPools is configured after the SyncIQ policy: Next SyncIQ job after
CloudPool configuration checks for SmartLink files on the source directory.
• In both cases, if SmartLink files are found, the target cluster SyncIQ performs
the following:
− Configures the cloud storage account and CloudPools matching the source
cluster configuration.
− Configures the file pool policy matching the source cluster configuration.
• As a best practice, temporarily disable the associated SyncIQ policy prior to
configuring CloudPools.

Managing SyncIQ Jobs

Managing Replication to Remote Cluster

You can manually run, view, assess, pause, resume, cancel, resolve, and reset
replication jobs that target other clusters.

No more than five running and paused replication jobs can exist on a cluster at a
time. However, an unlimited number of canceled replication jobs can exist on a
cluster.

453Requires CloudPools to retrieve the complete file from the cloud provider on to
the source cluster and replicates the complete file to the target cluster.


If a replication job remains paused for more than a week, SyncIQ automatically
cancels the job.

• Start Job Manually454


• Pause Running Job455
• Cancel Job456

454You can manually start a replication job for a replication policy at any time. To
replicate data according to an existing snapshot, run the isi sync jobs start
command with the --source-snapshot option.

455You can pause a running replication job and then resume the job later. Pausing
a replication job temporarily stops data from being replicated, but does not free the
cluster resources replicating the data. A paused job reserves cluster resources
whether or not the resources are in use.

456You can cancel a running or paused replication job. Canceling a replication job
stops data from being replicated and frees the cluster resources that were
replicating data. You cannot resume a canceled replication job. To restart
replication, you must start the replication policy again.
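A minimal CLI sketch of the job controls described above, assuming a policy named
dr-policy:

• Start a job manually (add --source-snapshot to replicate according to an
existing snapshot):
− isi sync jobs start dr-policy
• Pause and later resume a running job:
− isi sync jobs pause dr-policy
− isi sync jobs resume dr-policy
• Cancel a running or paused job; start the policy again to resume replication:
− isi sync jobs cancel dr-policy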


Managing Failed Jobs

When a replication job fails due to an error, SyncIQ might disable the
corresponding replication policy.

To fix the policy, you can either resolve457 or reset458 the policy.

It is recommended that you attempt to fix the issue rather than reset the policy.

457If SyncIQ disables a replication policy due to a replication error, and you fix the
issue that caused the error, you can resolve the replication policy. Resolving a
replication policy enables you to run the policy again. If you cannot resolve the
issue that caused the error, you can reset the replication policy.

458 If a replication job encounters an error that you cannot resolve, you can reset
the corresponding replication policy. Resetting a policy causes OneFS to perform a
full or differential replication the next time the policy is run. Resetting a replication
policy deletes the latest snapshot generated for the policy on the source cluster.
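A minimal CLI sketch of the two options, assuming a policy named dr-policy:

• After fixing the underlying error, resolve the policy so it can run again:
− isi sync policies resolve dr-policy
• If the error cannot be fixed, reset the policy; the next run performs a full or
differential replication:
− isi sync policies reset dr-policy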


Challenge

Lab Assignment:
1) Create and assess a SyncIQ policy.
2) Execute a failover to a remote cluster.
3) Execute a failback.


NDMP

NDMP

NDMP Overview

Network Data Management Protocol (NDMP) is an open standard protocol for


network-based backup for network-attached storage. The predominant use of
NDMP is backup, restore, and replication of NAS file systems. It is also possible to
copy files and directories between two NDMP-capable systems using the Fibre
Channel (FC) combo card.

NDMP over Fibre Channel on Gen 6 nodes.

NDMP provides the following benefits:

• Reduces complexity
• Provides interoperability
• Allows faster backups


• Enables NAS and DMA459 vendors to focus on core competency and


compatibility
• It is a cooperative open standard initiative.

459 Data Management Application, or DMA uses an industry standard protocol that
facilitates a network-based method of controlling data backup and recovery
between NAS devices and data management applications. The DMA is responsible
for initiating the NDMP connection, provides authentication information, and passes
backup and recovery parameters to the NAS. The DMA also maintains the NDMP
client and device configuration and manages the backup catalog and NDMP client
file index.


NDMP Fibre Channel Combo Card

The Fibre Channel combo card is an optional front-end hybrid NIC/HBA.

• Gen 6 only
• 2xFC ports and 2x10GbE
• Two-way NDMP connection simultaneous with client connections
• Sold and deployed with node pairs.
• OneFS has no inherent PowerScale tools for troubleshooting the card -
common tools are camcontrol, mt, chio, and various ocs_fc ioctls
and sysctls


NDMP Backup Options

Two-way or Direct Backup


Shown is the topology of a direct NDMP model, or two-way backup.

The NDMP two-way backup is also known as the local or direct NDMP backup.

• Backup application manages the backup.460


• Data traverses the back-end network.

460The DMA (Such as NetWorker) controls the NDMP connection and manages
the metadata. The NAS backs up the data directly, over Fibre Channel, to a locally
attached NDMP TAPE device.


If a cluster detects tape devices, the cluster creates an entry for the path461 to each
detected device.

Three-way or Remote Backup


The NDMP three-way backup is also known as the remote NDMP backup.

461 If you connect a device through a Fibre Channel switch, multiple paths can exist
for a single device. For example, if you connect a tape device to a Fibre Channel
switch and then connect the switch to two Fibre Channel ports, OneFS creates two
entries for the device, one for each path.


• Backup application controls the backup.462


• Data traverses the front-end network (LAN).

Two-way or Direct Backup: The accelerators connect over Fibre Channel to the
backup tape library or virtual tape library system and gain greater backup
efficiencies.

Backups to virtual tape library systems are recommended for greater performance.
If possible use applications such as Data Domain with inline deduplication
capabilities to improve remote backup bandwidth efficiencies and storage
efficiencies. Understanding the bandwidth, the data change rate, the number and
average size of files can help determine the backup window.

Large datasets require either longer backup windows, more bandwidth, or both to
meet the backup SLAs. Two-way backups are the most efficient model and result
in the fastest transfer rates. The data management application uses NDMP over
the Ethernet to communicate with the Backup Accelerator node. The Backup
Accelerator node, which is also the NDMP tape server, backs up data to one or
more tape devices over Fibre Channel. File History, the information about files and
directories is transferred from the Backup Accelerator node to the data
management application, where it is maintained in a catalog.

NDMP Multistream Backup and Recovery

You can use the NDMP multistream backup feature, with certain DMAs, to speed
up backups.

462During a three-way NDMP backup operation, a DMA on a backup server


instructs the cluster to start backing up data to a tape media server that is either
attached to the LAN or directly attached to the DMA. The NDMP service runs on
one NDMP Server and the NDMP tape service runs on a separate server. Both the
servers are connected to each other across the network boundary.


• Back up concurrently.
• Same backup context.
• Backup context is retained.
• Recover that data in multiple streams.
• Data is recovered one stream at a time using CommVault Simpana.

Note: The NDMP restartable backup feature does not support OneFS


multistream backups.

• Back up concurrently: With multistream backup, you can use your DMA to
specify multiple streams of data to back up concurrently.
• Same backup context: OneFS considers all streams in a specific multistream
backup operation to be part of the same backup context.
• Backup context is retained: A multistream backup context is retained for five
minutes after a backup operation completes.
• Recover that data in multiple streams: If you use the NDMP multistream
backup feature to back data up to tape drives, you can also recover that data in
multiple streams, depending on the DMA.
• CommVault Simpana: If you back up data using CommVault Simpana, a
multistream context is created, but data is recovered one stream at a time.

Snapshot-Based Incremental Backups

You can implement snapshot-based incremental backups to increase the speed at


which these backups are performed.

• Perform incremental backups without activating a SnapshotIQ license on the


cluster.
• Set the BACKUP_MODE environment variable to SNAPSHOT to enable
snapshot-based incremental backups.

DMAs that snapshot-based incremental backup works with:

• Symantec NetBackup: Enabled only through an environment variable.
• NetWorker: Enabled only through an environment variable.
• Avamar: Yes.
• CommVault Simpana: Enabled only through a cluster-based environment variable.
• Tivoli Storage Manager: Enabled only through a cluster-based environment variable.
• Symantec Backup Exec: Enabled only through a cluster-based environment variable.
• NetVault: Enabled only through a cluster-based environment variable.
• ASG-Time Navigator: Enabled only through a cluster-based environment variable.

Snapshot-based incremental backups: During a snapshot-based incremental


backup, OneFS checks the snapshot that is taken for the previous NDMP backup
operation and compares it to a new snapshot. OneFS then backs up all files that
were modified since the last snapshot was made. If the incremental backup does
not involve snapshots, OneFS must scan the directory to discover which files were
modified. If the change rate is low, OneFS can perform incremental backups faster.

Set the BACKUP_MODE environment variable to SNAPSHOT to enable snapshot-


based incremental backups. After setting the BACKUP_MODE environment
variable, snapshot-based incremental backup works with certain DMAs as listed in


the table. If you enable snapshot-based incremental backups, OneFS retains each
snapshot that is taken for NDMP backups until a new backup of the same or lower
level is performed. However, if you do not enable snapshot-based incremental
backups, OneFS automatically deletes each snapshot that is generated after the
corresponding backup is completed or canceled.
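Following the same command pattern this guide shows later for BACKUP_OPTIONS, the
variable can be set from the CLI roughly as follows (the exact subcommand and path
argument depend on the OneFS version and the DMA, so treat this as a sketch):

isi ndmp settings variables /BACKUP BACKUP_MODE SNAPSHOT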

SmartLink Backup Options

NDMP ComboCopy

Using DeepCopy, the recall backs up the files, not the SmartLinks.

ComboCopy provides options to back up SmartLink files with data.

ComboCopy backup environment variable is 0x400 for BACKUP_OPTIONS.

Set NDMP backup environment variable:

isi ndmp settings variables /BACKUP BACKUP_OPTIONS 0x400


NDMP DeepCopy

DeepCopy backs up files as regular files.

Set NDMP restore environment variable:

isi ndmp settings variables /RESTORE RESTORE_OPTIONS 0x100


NDMP ShallowCopy

By default, ShallowCopy backs up the SmartLinks as SmartLink files.

• No data is recalled from the cloud.


• Files can only be restored as SmartLink files.

ShallowCopy backup environment variable is 0x200 for BACKUP_OPTIONS and


RESTORE_OPTIONS.

The ShallowCopy restore option is available for SmartLink files backed up with
the ComboCopy option.

Important: DeepCopy and ComboCopy backups recall file data from


the Cloud. The data is not stored on disks. Recall of file data may
incur charges from Cloud vendors.

ComboCopy: ComboCopy provides options to back up SmartLink files with data


so that you can recover the files as regular files or SmartLink files during a restore.


When using ComboCopy, the recall backs up both the file data and the SmartLink.
Backup and restore options can be set through NDMP environment variables in the
DMA or with the isi ndmp settings variables CLI.

DeepCopy: The file is recalled from the cloud and then backed up. Files can only
be restored as regular files. DeepCopy backup environment variable is 0x100 for
BACKUP_OPTIONS and RESTORE_OPTIONS. The DeepCopy restore option is
available for SmartLinked files that are backed up with ComboCopy option. If
DeepCopy or ShallowCopy restore option is not specified during restore, the
default action is to restore using ShallowCopy, but switch to DeepCopy if version
check fails.

NDMP Redirector and Throttler

NDMP Redirector distributes NDMP loads automatically over nodes with the FC
combo card.

NDMP Throttler manages the CPU usage during NDMP two-way sessions on
cluster nodes.

1: NDMP redirector and throttler features are enabled only using the CLI:

• isi ndmp settings global modify --enable-redirector true


• isi ndmp settings global modify --enable-throttler true


2: NDMP daemon checks each node CPU usage, number of NDMP operations
already running, and availability of tape target used for the operation. If no suitable
node is available, the operation runs on the node receiving the request.

When enabled, the NDMP throttle manages CPU usage of NDMP backup and
restore sessions running on the nodes.

3: An internal NDMP session runs between the agent node and the redirected
node. The DMA does not notice any difference when the session is redirected.

4: Agent session is proxy - redirects messages between DMA and redirected


session. The redirected session runs on the redirected node until DMA closes the
session.

You can enable the NDMP Redirector to automatically distribute NDMP two-way
sessions to nodes with lesser loads. NDMP Redirector checks the CPU usage,
number of NDMP operations already running, and the availability of tape devices
for the operation on each node before redirecting the NDMP operation. The load-
distribution capability results in improved cluster performance when multiple NDMP
operations are initiated. NDMP traffic can overwhelm the Gen 6 nodes that are
deployed with the FC combo card.

If no suitable node is available, the operation runs on the node receiving the
request. When enabled, the NDMP throttle manages CPU usage of NDMP backup
and restore sessions running on the nodes.

NDMP Redirector and Throttler Considerations

Following are the considerations for NDMP Redirector and Throttler:

• Three-way NDMP operations are not supported - redirection only happens with
FC connectivity.
• A redirected session fails if DMA connection breaks or DMA changes the
session to run a 3-way operation.
• The default CPU threshold value is 50, which means that the throttler limits
NDMP to use less than 50% of node CPU resources.
• The throttler threshold value can be changed using CLI - example: isi ndmp
settings global modify --throttler-cpu-threshold 80
• Throttler settings are global to all cluster nodes.


NDMP Version Check - CloudPools 2.0

• NDMP backup automatically includes the information that a SmartLink file


requires.
• Version check feature is not configurable, and cannot be disabled.
• Pre OneFS 8.2 - if no version check is performed during restore, SmartLink files
may be recovered, but may not be usable.

1: It is a limitation of performing restore operations. Restore target is a cluster


running pre OneFS 8.2.0 and CloudPools version 1.0. Since the files are backed up
using CloudPools 2.0, the stubs cannot be recovered to the cluster using
CloudPools 1.0. Also, the SmartLink file requires S3 version 4. The restore skips
files if the feature is unsupported by the target cluster.

NDMP Statistics

OneFS 8.2 and later has some minor changes to backup and restore statistics. In
the example, the Stub Files field is new in OneFS 8.2 and later. Stub files are
SmartLink files that track the file data. The field shows the backup and the restore
statistics output.

Example output:

Objects (scanned/included):
----------------------------
Regular Files(scan/incl(reg/worm/sparse)): (0/0(0/0/0))
Stub Files(scan/incl(stub/reg/combo)): (4/4(4/0/0))


Directories : (1/1)
ADS Entries : (0/0)
Soft Links(scan/incl(slink/worm)) : (0/0(0/0))
Hard Links : (2/2)
Block Devices : (0/0)
Char Devices : (0/0)
FIFO : (0/0)
Sockets : (0/0)
Whiteout : (0/0)
Unknown : (0/0)

NDMP for Disaster Recovery

NDMP backup and restore operations can be done on data that is archived to the
cloud.

(Diagram callouts: cloud data is backed up only when using deep copy; stub-associated data is backed up; cluster data and stub files can be restored to a replacement cluster.)

The NDMP backup can back up PowerScale CloudPools stub files. Data that is
associated with the stub file such as account information, local cache state, and
unsynchronized cache is also backed up. Data that is archived in the cloud is not
backed up unless using DeepCopy. When the data is restored, the stub file and its
attributes are restored. In the event the source cluster cannot be recovered,
restoring the data to the disaster recovery cluster maintains the stub files. Clients
accessing the new cluster can access the data that is stored in the cloud.


PowerScale: Data Protection with NetWorker using NDMP

PowerScale NAS storage integration with NetWorker software provides robust


data protection and recovery capabilities for enterprises of all sizes in a secure
way.

• NetWorker supports wide range of data protection options including NDMP


Support for NAS storage devices.
• Three main components that support NDMP data operations with the
NetWorker software are:

− NDMP Data Server (PowerScale)


− NDMP Tape Server that is the host with the backup device to which
NetWorker writes the NDMP data.
− Data Management Agent (DMA) in which the NetWorker server is the
DMA.

The NDMP Data Server (NAS) sends data to a locally attached tape device or library.

NetWorker software uses Network Data Management Protocol (NDMP)


functionality to enable access to storage in a heterogeneous network environment.
NDMP uses TCP/IP to control the movement of the data and specifies various
device drivers to store the data on devices.


Avamar/PowerScale Integration

You can backup up a PowerScale cluster with Avamar (source-based deduplication


backup solution).

Dell EMC Avamar provides fast, reliable NAS system backup and recovery through
the Avamar NDMP Accelerator. Avamar reduces backup timings and the impact on
NAS resources, allowing an easier and faster recovery.

• The NDMP accelerator is available as both a virtual and hardware appliance.


• The fast-incremental software architecture provides an efficient solution to
protect High-Density File Systems (HDFS).

Accelerator deployment diagram.

To back up and restore data residing on NAS systems, Avamar uses a device
called an Avamar NDMP Accelerator (accelerator). The accelerator is a dedicated
Avamar server node that functions as an Avamar client. The accelerator uses
NDMP to interface with and access NDMP-based NAS systems. Avamar does daily
incremental backups, which can be used with the initial full backup to create daily
synthetic full back ups. Because Avamar uses a hashing strategy on the source
track changes, incremental backups are fast.

Data from the NAS system is not stored on the accelerator. The accelerator
performs NDMP processing and real-time data deduplication and then sends the
data directly to the Avamar server. The accelerator can be connected to either a
Local Area Network (LAN) or Wide Area Network (WAN) with respect to the
Avamar server. However, to ensure acceptable performance, the accelerator must
be located on the same LAN as the NAS systems.


Backup Considerations

The NDMP backup considerations are:


• While full backups of high capacity clusters may or may not be feasible, do not
forget that it is the restore that really matters.
• LAN-based backup performance is limited by the network, particularly when
using GigE ports on a direct connection.

Link: See PowerScale OneFS Backup and Recovery Guide for more
information.

Although it is not exactly a backup, due to the high capacities and large file counts,
using snapshots and replicating to a DR site is a common “backup” strategy. One
drawback of the snap and replicate approach is the lack of a catalog. You should
know what you want to restore, or search through a snapshot. Using a snap and
replicate strategy on PowerScale with OneFS protects against accidental deletions,
as well as data corruption and provides a DR copy of the data and file system.

Backup Enhancement

Sparse Punch - CommVault


OneFS 8.2 introduces support for sparse files in a CommVault backup solution.

• CommVault eliminates the need to overprovision PowerScale storage.


• In previous OneFS versions, when CommVault deletes files from the backup
catalog, that capacity was not deleted from the PowerScale cluster.

Note: This functionality is only supported with OneFS 8.2 and later,
and CommVault v11 SP10 and higher.

Sparse Punch Function

In previous OneFS versions, sparse files are supported by extending a file out or
truncating a file in, but not by "punching" holes in the middle of a file. The tabs
show the function of punching holes in the middle of a file.


Logical and Physical Size

Shown is a simple depiction of a file whose logical and physical size is three
blocks.

Sparse Punch

With Sparse Punch, blocks within the middle of a file can be selected. The logical
size remains the same whereas the physical capacity is freed. Reading Block 1
results in zeros, and writing to Block 1 once again consumes physical capacity.

Sparse Punch Support

Sparse Punch is disabled by default.


• Use the isi smb settings command to enable.


• Sparse Punch is enabled on a per share basis.
• Check /var/log/messages for errors and capture a Pcap to verify protocol
correctness.

Command: isi smb settings share view | grep Sparse
Output: Sparse File: No

Command: isi smb settings share modify --sparse-file=True
Output: NA

Command: isi smb settings share view | grep Sparse
Output: Sparse File: Yes


Cloud and Virtual Storage Strategies

Cloud and Virtual Storage Strategies

PowerScale CloudPools Overview

• CloudPools offers the flexibility of another tier of storage that is off-premise and
off-cluster.
• CloudPools provides a lower TCO463.
• CloudPools expands the SmartPools framework by treating a cloud repository
as an additional storage tier.
• CloudPools eliminates management complexity and enables a flexible choice of
cloud providers for archival-type data.

463CloudPools optimize primary storage with intelligent data placement.


CloudPools eliminates management complexity and enables a flexible choice of
cloud providers.


CloudPools Concepts

CloudPools moves file data from PowerScale to the cloud. The files are easily
accessible to the users as needed. Below are key CloudPools concepts that affect
the end users:

• Archive464

464The CloudPools process of moving file data to the cloud. This process extracts
the data from the file and places it in one or more cloud objects. CloudPools then
moves these objects to cloud storage, and leaves in place on the local cluster a
representative file. This file is called a SmartLink file.


• SmartLink File465
• Inline Access466

The key configuration components for CloudPools:

• Cloud Provider Accounts467


• Cloud Storage Accounts468
• File pool policies469

465 SmartLink file contains metadata and map information which allows the data in
the cloud to be accessed or fully recalled. Smartlinks are also called stub or stub
files. If the SmartLink file archiving policy permits it, users can automatically
retrieve and cache data from the cloud by accessing the SmartLink file.

466 CloudPools enables users connecting to a cluster through supported protocols


to access cloud data by opening associated SmartLink files. This process is called
inline access. CloudPools offers inline access as a user convenience. It is designed
mainly as an archival solution, and is not intended for storing data that is frequently
updated.

467CloudPools requires you to set up one or more accounts with a cloud provider.
You use the account information from the cloud provider to configure cloud
accounts on the PowerScale cluster.

468A cloud storage account is a OneFS entity that defines access to a specific
cloud provider account. These accounts are used to enable and track local use of a
cloud provider account. The cloud storage account configuration includes the cloud
provider account credentials.

469File pool policies are the essential control mechanism for both SmartPools and
CloudPools. OneFS runs all file pool policies regularly. Each file pool policy


CloudPools Setup

Running CloudPools requires the activation of the two software module licenses:
SmartPools and CloudPools. To activate the licenses, upload the signed license file
to OneFS.

• Upload updated license file (Web UI)


− Navigate to Cluster Management > Licensing.
− In the Upload and activate a signed license file area, click Browse and
select the signed license file. Click Upload and Activate.
• Upload updated license file (CLI)

− Once a signed license is received from Dell EMC Software Licensing Central
(SLC), upload the file to your cluster. To add run:
isi license add --path <file-path-on-your-local-machine>

specifies the files to manage, actions to take on the files, protection levels, and I/O
optimization settings.


Configure CloudPools

CloudPools default settings can be managed, including snapshot archival,


encryption, compression, cache settings, data retention settings, and the ability to
regenerate an encryption key.

• To view the top-level configuration settings for CloudPools, run the following CLI
command:
− isi cloud settings view
• To modify CloudPools settings, run the following CLI command:
− isi cloud settings modify
• At times when primary encryption key gets compromised, generate a new
primary encryption key, by running the following command:

− isi cloud settings regenerate-encryption-key

Content source: PowerScale OneFS 9.0.0.0 CloudPools Administration Guide;
Isilon OneFS 8.2.x CloudPools Administration Guide

Cloud Providers and Storage


CloudPools supports the following cloud providers and associated storage types:

• Dell EMC PowerScale470


• Dell EMC ECS Appliance471
• Amazon S3472
• Amazon C2S S3473

470A secondary PowerScale cluster provides a private cloud solution. The primary
cluster archives files to the secondary cluster. Both clusters are managed in your
corporate data center. The secondary cluster must be running a compatible version
of OneFS. To act as a cloud storage provider, the PowerScale cluster uses APIs
that configure CloudPools policies, define cloud storage accounts, and retrieve
cloud storage usage reports. These APIs are known collectively as the PowerScale
Platform API.

471CloudPools supports ECS appliance as a cloud provider. ECS is a complete


software-defined cloud storage platform deployed on a turn-key appliance from Dell
EMC. It supports the storage, manipulation, and analysis of unstructured data on a
massive scale. The ECS appliance is specifically designed to support mobile,
cloud, big data, and next-generation applications.

472CloudPools can be configured to store data on Amazon Simple Storage Service


(Amazon S3), a public cloud provider. CloudPools supports only S3 Standard
storage classes on Amazon S3. When you first establish an account with Amazon
S3, the cloud provider gives you an account ID and allows you to choose a storage
region. Amazon S3 offers multiple storage regions in the U.S. and other regions of
the world.

473CloudPools can be configured to store data on Amazon C2S (Commercial


Cloud Services) S3 (Simple Storage System). When you configure CloudPools to
use Amazon C2S S3 for cloud storage, in addition to URI, username, and passkey,


• Microsoft Azure474
• Google Cloud Platform475
• Alibaba Cloud476

Cloud Storage Accounts

• You can create and edit one or more cloud storage accounts in OneFS.

you must specify the S3 Storage Region in the connection settings. When you first
establish an account with Amazon C2S, the cloud provider gives you an account ID
and allows you to choose a storage region. Amazon C2S offers multiple storage
regions in the U.S. and other regions of the world.

474You can configure CloudPools to store data on Microsoft Azure, a public cloud
provider. CloudPools supports Blob storage, Hot access tiers on Microsoft Azure.
Cold blobs are not supported. When you establish an account with Microsoft Azure,
you create a username. Microsoft provides you with a URI and a passkey. When
you configure CloudPools to use Azure, you must specify the same URI,
username, and passkey.

475 CloudPools can store data on Google Cloud Platform, a public cloud provider.
CloudPools supports Standard, Nearline, and Coldline storage types on Google
Cloud Platform. Google Cloud Platform must be set to interoperability mode. Once
it is done, you can now configure Google Cloud Platform as the provider in OneFS
CloudPools.

476 CloudPools can store data on Alibaba Cloud, a public cloud provider.
CloudPools supports Standard OSS storage on Alibaba Cloud. When configuring
Alibaba Cloud as the provider, you must provide the Alibaba URI, username, and
passkey. Alibaba offers multiple sites in the U.S. and other areas of the world. The
URI indicates your chosen connection site.


• Before creating a cloud storage account, establish an account with one of the
supported cloud providers.
• OneFS attempts to connect to the cloud provider using the credentials you
provide in the cloud storage account.
• To create an Amazon C2S S3 account, perform the following steps using the
OneFS CLI:

− Import the CA certificate.


o isi certificate authority import [--name certificate_name]
[--description certificate_description]

− Import the CAP Client Certificate and Private Key


o isi cloud certificate import [--name certificate_name]
[--certificate-key-password]
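
− To confirm that the CA certificate was imported, list the certificate
authorities (a minimal sketch, assuming the listing subcommand available in
OneFS 8.2 and later):

o isi certificate authority list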
A cloud storage account provides OneFS with the information it requires to connect
to a remote cloud storage provider. OneFS attempts to connect to the cloud
provider using the credentials you provide in the cloud storage account. To specify
a proxy server with the cloud storage account, first create the proxy server with the
isi cloud proxies create command.
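
A cloud storage account can also be created directly from the CLI. The following is
a minimal sketch only; the account name, type, URI, username, and key are
placeholder values, and the exact argument order and options may vary by OneFS
version (check isi cloud accounts create --help):

isi cloud accounts create demo-ecs ecs https://ecs.example.com:9021 ecsadmin <secret-key>
isi cloud accounts list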

Create Cloud Storage Account (Web UI)


• Navigate to File System > Storage Pools > CloudPools


• Click + Create a Cloud Storage Account.

• In the Type drop-down menu, select a type of cloud account.


• In the Name or Alias field, enter a name for the account.
• Click the Connect Account button.

Create Cloud Storage Account (Web UI)

Once the Create a Cloud Storage Account dialog box closes, a new cloud
account appears in the Cloud Storage Accounts list. The Name, Type, State,
Username, and URI associated with the account is displayed.

1: Enter the fully qualified URI for the account. The URI must use the HTTPS
protocol, and match the URI used to set up the account with your cloud provider

2: Enter the cloud provider account username. This username should have been
set up with the cloud provider.

3: Enter the password or secret key that is associated with the cloud provider
account username.


4: If you have defined one or more network proxies and want to use one for this
cloud account, select the proxy name from the drop-down list.

5: Disable or prevent certificate validation.

Cloud Policies

• CloudPools takes advantage of the SmartPools infrastructure, and applies file


pool policies477 to determine the files to archive to the cloud.
• By defining file pool policies, you can have OneFS automatically archive files to
the cloud when they match certain characteristics, such as age, size, type, or
location.
• To create a file pool policy, navigate to File System > Storage Pools > File
Policies. Select +Create a File Pool Policy.
• In the Create a File Pool Policy dialog box, enter a policy name and
description. In the Select Files to Manage area, use the drop-down menus to
specify the file selection criteria for cloud storage.
• In the Apply CloudPools Actions to Selected Files area, select Move to cloud
storage.
• In the CloudPool Storage Target drop-down menu, select an existing
CloudPool, and specify whether to encrypt and compress data before it is
archived to the cloud.
• Once all this is complete, select Create Policy. The file pool policy appears
under File Pool Policies in the File Pool Policies window.

477A file pool policy can specify a local storage target, a cloud storage target, or
both.
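
After creating the file pool policy in the WebUI, you can confirm it from the CLI.
The policy name in the second command is only an example:

− isi filepool policies list
− isi filepool policies view CloudArchivePolicy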


SyncIQ with CloudPools

Key points shown in the SyncIQ with CloudPools workflow:

• Seamless disaster recovery failover to target
• Stub files synchronized
• Source and target can transfer stub files, or full data
• Target access to CloudPools data
• Intact failback to source
• CloudPools relationship restored during failback

• The source cluster employs CloudPools to tier files to the cloud.


• SyncIQ can replicate and understand the CloudPools data natively.
• Both the source cluster and target cluster are CloudPools aware

Supporting the synchronization of stub files does not change the SyncIQ
capabilities during the process including failover and failback for disaster recovery.
Both the source cluster and target cluster are CloudPools aware, meaning the
target cluster supports direct access to CloudPools data.

Deep Copy Configuration


• You can create a SyncIQ policy that replicates full files rather than SmartLink
files when copying data from the primary source cluster to a secondary target
cluster.
• When you create a SyncIQ policy, you can modify the Deep Copy for
CloudPools setting.
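
The deep copy behavior can also be set from the CLI when creating or modifying a
SyncIQ policy. The following is a sketch only; the policy name, paths, and target
host are placeholders, and the --cloud-deep-copy option and its values should be
verified against your OneFS version:

− isi sync policies create cloud-dr sync /ifs/data/projects 192.168.3.11
/ifs/data/projects --cloud-deep-copy=force
− isi sync policies modify cloud-dr --cloud-deep-copy=deny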

ECS Integration with PowerScale


• The load balancer presents a single IP address.
• It balances connectivity to the individual public node IP addresses.

• Connectivity to each ECS node in a VDC is established using the public node IP
addresses.
• In this configuration, the load balancers communicate with each other to
balance access across the VDCs in addition to between the ECS nodes.
• In a disaster recovery configuration for PowerScale CloudPools, the secondary
PowerScale cluster is also configured for CloudPools access to the ECS
namespace.

CloudPools and ECS provide lower-cost archival storage, expand PowerScale


cluster capacities, and provide multisite disaster recovery protection.


Advantages of ECS with PowerScale

• Lower operation cost
• Multisite solution for data loss and site loss protection
• Reclaim capacity
• Remove static data
• Reduce management overhead

An ECS478 solution offers a single site, a two site, or a three site configuration to
meet organizational requirements and growth. Other benefits of combining ECS
with CloudPools are:

• Reclaim space on existing PowerScale primary storage systems


• Reduce on-going primary and backup storage acquisition costs
• Remove static data out of the recurring backup process
• Reduce management and operation costs
• Colocation benefits such as lower data residency risks, lower networking costs
and lower latency

478ECS provides lower-cost active/active geo-distributed cloud storage to meet the


organization needs. ECS provides the flexibility of single or multi-site data loss
protection for archival data.


Advantages and Disadvantages of Public Cloud Storage

Advantages:

• Accessibility
  − Files in the cloud can be accessed from anywhere.
  − With a PowerScale solution, files in the cloud can be accessed when a
    cluster fails over.
  − Access to files remains through the cluster, maintaining file attributes.
• Cost Savings
  − Reduced CAPEX.
• Disaster Recovery
  − Provides storage for more copies of data.
  − With PowerScale, archive type data is not replicated or synchronized to a
    remote PowerScale cluster.
  − PowerScale files are accessed from the remote cluster during a disaster.
• Unlimited Storage
  − Grow capacity without growing CAPEX.
• Deployment
  − After connected and deployed, access to PowerScale files is transparent to
    the user.

Disadvantages:

• Security and Privacy
  − Data is outside organizational control.
• Bandwidth
  − Costly surcharges.
  − Network connectivity issues.
• Lock-in
  − Cost of accessing data.
  − Migrating massive amounts of data out of the cloud can be very costly.
• Cost of Growth
  − As data grows, so does the cost of storing the data.
• Outages and attacks
  − Network outages can impact business.
  − Routine maintenance may not conform to organization windows.


Configure Write Access to CloudPool Data

• Obtain the GUID that is associated with the cloud data by running the following
command, on the cluster that originally archived to the cloud:
− isi cloud access list
• On the primary cluster, remove write access to the cloud data:
− isi cloud access remove <GUID>
• On the secondary cluster, give write access to the cloud data.
− isi cloud access add
• Now the secondary cluster can write modifications to the cloud, rather than
storing the modifications in cache.
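
A worked sequence might look like the following. The GUID is illustrative and must
be replaced with the value that isi cloud access list reports, and it is assumed
here that isi cloud access add takes the same GUID argument as the remove
command:

− On the primary cluster: isi cloud access remove
000556c5a111f33b6d4f0a1b2c3d4e5f
− On the secondary cluster: isi cloud access add
000556c5a111f33b6d4f0a1b2c3d4e5f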

Caution: If the primary cluster is not operational and cannot be made
operational, ensure that you remove write access from the
secondary cluster before attempting to restart the primary cluster.
Data corruption could result if two clusters have write access to the
cloud data.

Best Practices

• For better performance, use timestamps for cloud data archival and recall.
• You can gain the most benefit from CloudPools, in terms of freeing up storage
space on your cluster, by archiving larger files.
• Create exclusive accounts for CloudPools purposes; this prevents conflicts that
might lead to data corruption or loss.
• Use entirely separate accounts for other cloud applications with your cloud
provider.
• If you are performing a rolling upgrade to a OneFS 8.2.x version, and intend to
use CloudPools for the first time, wait until the upgrade and commit complete.

− By waiting for the upgrade to complete, you start using CloudPools with the
most recent CloudPools upgrades.


Challenge

Lab Assignment:
1) Configure a cloud storage account and create a CloudPool.
2) Demonstrate SyncIQ and CloudPools integration.

Performance and Monitoring

Module Objectives

After completion of this module, you can:

• Discuss performance analysis, identify performance issues, establish


performance baseline.
• Describe DataIQ architecture and understand dashboard navigator.
• Describe HealthCheck features and configurations.

DataIQ Deep Dive

Scenario

Hayden has asked you to help resolve the organization's storage consumption
issue. Based on the complaint that the storage manager received, identify the
issue.

Storage Consumption Report:

• The organization wants an analysis of its projects to see which volumes the
projects consume storage on.
• The storage manager received a complaint that the priority project,
CloudProject, has slower than normal times when rendering video.


Solution

Identifying the issue:

Callouts in the report:

• Show by Tag
• Tag category
• Legend shows the associated volumes
• Project across multiple volumes
• Show the size
• Show by volume
• Volume and capacity metrics for that volume

The graphic (DataIQ V1 interface) shows the report on a configured tag using different options.

• The manager can drill into the file and may discover the video rendering files
were moved to a lower tier of storage.
• The report enables the manager to quickly discover the potential issue.
• The chart shows the storage capacity that each project consumes and on which
volume or volumes host the project.
• In a large environment, elements of a project are likely to be located across
several or many different volumes.
• In this example, auto-tagging is configured. You can configure auto-tagging on
the Settings, Data management configuration page.


DataIQ Overview

• DataIQ 1.0: Data Management
• DataIQ 2.0: Data Management and Storage Monitoring

Dataset management functionality was released in DataIQ 1.0 and is called Data
Management479.

DataIQ 2.0 introduces new storage monitoring capabilities for PowerScale clusters.

• It provides tools to monitor and analyze480 a cluster.

479Data Management delivers a unique method for managing unstructured data


that is stored across multiple, heterogenous file and object storage platforms, either
on-premises or in the cloud. It provides file system scanning, indexing,
classification, and fast searching, and enables single-pane-of glass visibility into all
unstructured data assets under management.

480It provides tools to monitor and analyze a cluster’s performance and file
systems, and simplifies administration for tasks such as cluster health monitoring,
cluster diagnostics, and capacity planning.


• Enables in-depth analytics and cluster troubleshooting.


• It also provides monitoring, reporting, and analysis for OneFS functionality481.

Key values of storage monitoring:

• Achieve single-pane-of-glass visibility into multiple PowerScale clusters.


• On-premise tool enables all sites, including dark sites, under management.
• Track performance metrics across one or multiple PowerScale clusters to assist
with monitoring and diagnosis.
• Analyze based on performance trends over time, and use this information to
proactively diagnose bottlenecks and tune performance to improve resource
efficiency and optimize end-user experience.
• Analyze based on storage usage trends over time, and use this information to
forecast future capacity needs.
• Identify key warnings and critical cluster events, and leverage PowerScale
cluster health insights to meet strict SLA requirements.
• Monitor the PowerScale storage environment cross-cluster at enterprise-
scale—up to 70 clusters and 2000 nodes.

Architecture

DataIQ is designed using a microservices architecture. Users can access DataIQ


through the WebUI, or through HTTP, Python, or Java.

481 It also provides additional OneFS functionality such as quota analysis, tiering
analysis, and file-system analytics. It is also capable of providing a graphical output
for easy trend observation and analysis. It uses OneFS PAPI to collect data for
storage monitoring and does not use cluster resources beyond the data-collection
process.


DataIQ 2.0 consists of the two main parts:

• Data management482
• Storage monitoring483 (PowerScale only)

482 Data management uses the metadata scanner to scan and index unstructured
file or object metadata and stores it in an on-disk database (RocksDB) for data
management. RocksDB is used for data management. The default backup path is
/mnt/ssd/data/claritynow/ssd/backup.

483 After adding a PowerScale cluster to DataIQ and establishing a connection with
DataIQ, a collector is created automatically. It can collect monitoring data and store
it into TimescaleDB for storage monitoring. TimescaleDB is used for storage
monitoring. The default backup path is /opt/dataiq/backup/timescale.


Note: See Dell EMC DataIQ: Best Practices Guide for more
information.

Sizing Guidelines

Sizing estimates depend on the complexity of the DataIQ solution and


requirements484.

Sizing Challenges

Disk-space requirements depend on various factors that are used for data
management or storage monitoring.

Data management:

• The data management portion of the software writes to an IndexDB which is


installed and hosted on a specific mount point485, such as /mnt/ssd.
• The underlying disk platform486 is required to be SSD.

484There is no one size that can fit all hardware resources that are planned for
DataIQ because every environment is different. The CPU, memory, and network
are shared resources for dataset management and storage monitoring.

485 This mount point (or subdirectory or folder) must exist before installation either
in the form of a simple folder or as a mounted partition.


Storage monitoring: The general considerations for storage monitoring disk space
are as follows:

• Number of clusters: DataIQ supports up to 70 PowerScale clusters for storage


monitoring.
• Number of nodes: DataIQ supports up to 2,000 PowerScale nodes for storage
monitoring.
• Load rate on the cluster: This is beyond the scope of DataIQ.
• Data collection: DataIQ uses different data collection strategies for different
dashboards.
• Data retention: DataIQ provides a data retention feature and can only delete
older monitoring data.
• Data backup: DataIQ provides a data backup feature for storage monitoring.

General disk sizing

The general rule for sizing is to add additional disk resources to the planned
capacity487.

486This can be in the form of a single large SSD VMDK for the entire operating
system and DataIQ application (such as the OVA by itself), or an SSD partition or
VMDK mounted to /mnt/ssd.

487 It is a best practice to assign designated disk partitions or VMDKs for the
separate functions of storage monitoring and dataset management. These can be
either static primary partitions (for solutions not expected to exceed assigned disk
resources) or by use of logical-volume-managed partitions so the solution may be
extended if needed.


Data management:

• When sizing a DataIQ solution, consider the complexity of the customer storage
environment and the total number of files, folders, and objects per volume.
• Workloads can be distributed across the external data mover nodes in a scale-
out fashion.

Storage monitoring: Consider the following general sizing rules for storage
monitoring:

• Always add additional disk resources to the planned capacity to allow for
monitoring-data growth.
• Never undersize the estimated size of the disk requirements for monitoring.
• The disk-space requirement depends on various factors488.

Note: DataIQ does not support IPv6. See the Dell EMC DataIQ: Best
Practices Guide for more information about sizing guidelines.

488The disk-space requirement depends on various factors, such as active OneFS


features, load rate, data-backup strategy, and data-retention settings. DataIQ
regularly backs up data automatically, but the backup can be disabled or defined as
needed. The data-retention policy is disabled by default, but it can be enabled and
configured.

Challenge Example

The graphic shows a representation of the challenge that businesses face when
mining their data.

Example of a project, there could be thousands, millions, or even billions of paths for project x-ray.

• Here the business has three clusters that may be geographically dispersed.
• Each cluster hosts different business units489 and each business unit has work
that is associated with a product feature called x-ray.

489
Furthermore, each business unit has different structures, varying path names,
and path lengths.


The challenge is, how can a business find, analyze, and report on a project that
has elements in different locations, with different names, and potentially millions of
different paths at different depths.

Solution Example

The goal is to extract the organization's data490 by applying business
knowledge.


Geography, department, project, and quarter are the categories of the custom data.

• DataIQ can address the business challenge by applying tags to key categories.

490You can begin identifying or discovering the custom data and then designating
distinguishable categories for the paths. The key is establishing the custom data.
Having the custom data enables you to mine the data.


• Once the key categories are identified, the auto-tagging configuration file is built
using the categories to create the tags and rules.
• You can then generate reports that are based on the tags, and then act on the
information.

Security Roles and Groups

DataIQ starts with predefined security groups and roles. You can create your own
groups or import groups from tools like Active Directory, and then grant those
groups access to security roles.

• Group: DataIQ Administrators (added locally)
  − Roles: DataIQ Administrator, Data Manager, Data User
  − Notes: Configure all settings and access all features of DataIQ.
• Group: DataIQ Administrators (inherited through Active Directory)
  − Roles: DataIQ Administrator, Data Manager, Data User
  − Notes: Configure most settings and features of DataIQ. Cannot update users
    through the Access and Permissions menus.
• Group: Data Managers
  − Roles: Data Manager, Data User
  − Notes: Manage and access all data management content.
• Group: Data Users
  − Roles: Data User
  − Notes: Access data management content.
• Group: Storage Managers
  − Roles: Storage Manager, Storage User
  − Notes: Manage and access all storage monitoring content.
• Group: Storage Users
  − Roles: Storage User
  − Notes: Access all storage monitoring content.

Initial values of the predefined security groups and roles.


Note: See the Dell EMC DataIQ Admin Guide 2.0.0.0 for more
information about sizing guidelines.

Cluster Summary

The STORAGE MONITORING > Cluster Summary page shows a high-level view
of the monitored clusters. It provides the overall health of clusters at a glance.

There are four common elements at the top of the Cluster Summary page. These
four common elements also apply to other dashboards.


1: Clusters: You can choose which specific clusters are shown or filtered through
the Cluster dropdown list. The drop-down list of clusters is always sorted
alphabetically.


2: Reports: You can share reports by using the Share icon. This option helps you
create a link, and then you can share the link or use it to manually create a
bookmark. Click Copy to copy the URL to the clipboard. There are three options:

• Current time range: Choose whether the report shows the time range that is
selected, or whether the time shows relative to when the reader views the
report.
• Template variables: Choose whether the report shows using the system
default variables, or the variables you selected when generating the report.
• Theme: Choose whether the report displays using a light background, a dark
background, or the selected background.


3: Time range: You can change the time range on reports by selecting the Clock
drop-down list. The UTC time zone is used no matter which time zone the user is
in. The time zone is not configurable. Also, you can set a specific time range or a
future time when using the absolute time range. DataIQ expands the graph to cover
that specific time range in the analyze view.

4: Reports refresh: You can refresh reports to get updated data by clicking the
Refresh button, or change the refresh frequency by selecting the Refresh drop-down
list. You can also set the refresh frequency to off to disable report refreshes.

Note: See Dell EMC DataIQ: Storage Monitoring Solution Guide for
more information.


Dashboard Navigator

Capacity Dashboard

The Capacity Dashboard page provides a summary of capacity usage from the
tiers, node pools, and directories perspective.

There are five new elements at the top of the Capacity Dashboard, including the
following:

• You can choose one of predefined values including 5, 10, and 15.
• You can choose one of predefined values including growth_rate and
  time_to_full.
• You can choose one of predefined values including hard, soft, and advisory.
• You can choose one of predefined values including max_physical, max_logical,
  max_app_logical, max_file_count, capacity_growth_rate, and
  file_count_growth_rate.
• You can choose one of predefined values including All, not exceeded, and
  exceeded.

Client and user dashboard

The Client and User Dashboard page provides detailed information about
protocol operations, protocol latency, and network throughput for clients and users
on the network.


Cluster performance report

The Cluster Performance Report provides protocol-based operation summaries


from a cluster point of view. The summaries may be selected from across all
clusters, or a single cluster.

Dedupe dashboard

The Dedupe Dashboard page provides the overview on the dedupe and
compression saved and storage efficiency on all clusters.


DR/Data protection dashboard

The DR/Data protection dashboard page provides summary information pertaining
to OneFS SyncIQ policies, NDMP session events, ICAP, and snapshot statuses.

Filesystem dashboard

The Filesystem dashboard page provides shared directory details which pertain
to file access deferment rates (deadlocked, contended, locked, blocking, and so
forth).


Hardware dashboard

The Hardware dashboard page provides a view of the OneFS cluster state from a
hardware perspective. It includes details of operations and activities for each of the
nodes.

Network dashboard

The Network dashboard page provides a network view for the system
administrator from a protocols and network throughput perspective.


System dashboard

The System dashboard page provides a top N jobs list that may need attention,
and the SRS connectivity status.

Capacity Dashboard: The Capacity Dashboard page includes a Show capacity


forecast section that gives you access to the Capacity forecast sub dashboard.

Graphs and charts are provided which compare physical usage and logical usage
about the top N directories showing the greatest growth rate. The top N directories


are sorted by quota-delta to show which directories are pushing assigned quota
limits.

The Capacity Dashboard includes the following three sections:

• Top N tiers and node pools (by time to full or growth rate).
− The Capacity details of tier page can be accessed for a specific tier using
the Capacity dashboard. It shows the capacity that is used for a specific tier
and details of the node pool for the tier.
− The Capacity details of node pool page can be accessed for a specific
node pool using the Capacity dashboard. It shows the capacity that is used
for a specific node pool. You can quickly view significant changes in the
node pool. This line graph helps you to verify when users are writing
significant amounts of data and track large deletes in the form of negative
capacity changes.
• Top N directories (by capacity, file count, and their growth rate).
− The Capacity details of directory page can be accessed for a specific
directory using the Capacity dashboard. It shows the capacity details of a
specific directory and capacity details by user on the directory.
• Top N directories (by quota proximity or overrun).

− The Capacity and Quota details of Directory page can be accessed for a
specific directory using the Capacity dashboard. It shows the capacity and
quota details of a specific directory and capacity details by user on the
directory.
Client and User Dashboard: The Client and User Dashboard includes the
following three sections:

• Top N client IPs and user IDs by protocol ops, protocol latency, and network
throughput.
− This section shows the protocol operations rate, protocol latencies, and
network throughputs on the client over time. You can hover anywhere over
the line graphs to view the point-in-time values of each metric. The line
graphs help you focus on the protocol performance at the client level so that
you can identify abnormal performance on the client.
• Top N user IDs by quota proximity or overrun (quotas include hard, soft, and
advisory).


− The Performance details of user and group page can be accessed for a
specific user or group using the Client and user dashboard. It shows the
performance details of a specific user or group, and performance by node
and performance by protocol on the user and group.
− The User/Group Quota page can be accessed for a specific user or group
using the Client and User Dashboard. It shows quota details of a specific
user or group.
• Top N user IDs by capacity, file count, and their growth rates.

− The Capacity details of user/group page can be accessed for a specific user
or group using the Client and user dashboard. It shows the capacity details
of a specific user or group.
Cluster performance report: The Cluster Performance Report includes the
following three sections:

• Protocol operations: This section provides the details that are needed to
monitor protocol operations rate by cluster over time for front-end protocol. It
also provides clusters with operations rate for all protocol.
• Protocol latency: This section provides the details that are needed to monitor
protocol latency by cluster over time for front-end protocol. It also provides
clusters with latency for all protocol.
• Protocol throughput: This section provides the details that are needed to
monitor protocol throughput by cluster over time for front-end protocol. It also
provides clusters with throughput for all protocols.

Dedupe Dashboard: By default, all monitored clusters are displayed. To view a


single cluster, choose the specific cluster of interest from the top-level selection
bar. This information helps you monitor the deduplication and compression at the
cluster level.

• Total capacity used: The average percent of capacity used for all selected
clusters.
• Total dedupe saved: The capacity saved for all selected clusters, including
SmartDedupe (offline deduplication) and inline-dedupe.
• Compression saved: Compression capacity saved for all selected clusters.
• Storage efficiency by clusters: This chart shows the storage efficiency for the
selected clusters.


DR/Data protection dashboard: The DR/Data Protection Dashboard includes the


following four sections:

• SyncIQ: This section provides the details that are needed to monitor all SyncIQ
policies that are associated with failed and Recovery point objective (RPO) jobs.
• Network Data Management Protocol (NDMP): This section provides details to
help you monitor NDMP halted- and invalid-session-state information and
session details and see if the data is not protected.
• Internet Content Adaptation Protocol (ICAP): This section provides the
details that are needed to monitor all ICAP policies associated with failed jobs.
• Snapshots: A custom view of this section is defined by selecting values for the
following three elements. You can select different values to define a different
view of this section.

Filesystem dashboard: The Filesystem dashboard includes the following two


sections:

• Top N directories by file issued operation activity.


− The Issued events and subpaths page can be accessed for a specific L3
path using the Filesystem Dashboard. It shows the issued events and
subpaths details of a specific L3 path. You can choose one of predefined
values in the Top N sub folders drop-down list including 5, 10 and 15.
• Top N directories by file read/write (RW) operation activity.

− The CRUD events and subpaths page can be accessed for a specific L3
path using the Filesystem Dashboard. It shows the Create, Read, Update,
Delete (CRUD) events and subpaths details of a specific L3 path. You can
choose one of predefined values in the Top N sub folders drop-down list
including 5, 10 and 15.
Hardware dashboard: The Hardware dashboard includes the following four
sections:

• Top N disks
• Top N nodes by performance
• Top N nodes by activity
• Node events


Network dashboard: The Network dashboard includes the following two sections:

• Top N protocols by latency and throughput.


• List top N network interfaces by throughput.

There are two new elements at the top of the Network dashboard:

• Protocols sorted by: You can choose one of predefined values including
throughput and latency.
• Interfaces sorted by: You can choose one of predefined values including
throughput, packet, and error.

System dashboard: The System dashboard includes the following two sections:

• Top N jobs
• Cluster SRS connectivity issues

There is one new element at the top of the System dashboard:

• Sorted by: Choose one of predefined values including cpu_utilization,


run_time, and disk_IOPS.

Cluster Details

The Cluster details page is accessed for a specific cluster using the Cluster
summary. It provides capacity, event, and node details, and node pool and
CloudPools information about a specific cluster.


• Choose one of predefined values including 50, 100, 500, 1000, 2000, and
  5000.
• Choose which specific tiers are shown or filtered through the Tiers drop-down
  list.
• Choose one of predefined values including All, critical, information, and
  warning.
• Choose one of predefined values including All, unresolved, and resolved.
• Choose which specific CloudPools are shown or filtered through the
  CloudPools drop-down list.

There are five elements at the top of the Cluster details.

The Cluster details report includes:

• Capacity details (cluster level): The percent of capacity usage for the
selected cluster is displayed in the left side of the chart. The right side of the
chart displays the capacity by datatype.
• Event details:
− Total critical events: The number of critical events on the cluster.
− Total warning events: The number of warning events on the cluster.
− Total information events: The number of information events on the cluster.
− Event details: This chart shows the detailed information of the events on
the cluster; it includes Issue, Message, Event Group ID, Last event,
Severity, and Status.
• Capacity details by tiers: The capacity details by tiers in the cluster include
Tier, Used, Total capacity, Capacity used %, Growth rate/week, and Time to
full. You can analyze the Tier details report by clicking a specific tier link.

Protocol Cluster Details

The Protocol Cluster Details page is accessed for a specific cluster using the
cluster performance report. It provides the protocol operation, protocol latency, and
protocol throughput for a specific cluster in the selected time range.


The Protocol cluster details page includes three sections:

• Protocol details (All protocols)491


• Drill down by protocol (SMB, NFS, S3 only)492
• Analyze by protocol (exclude SMB, NFS, S3)493

491 This section shows the protocol operations rate, protocol latencies and protocol
throughputs for all protocols on the cluster over time. You can hover anywhere over
the line graphs to view the point-in-time values of each metric on the left side of
figures. Also, the maximum and average values of each metric are displayed on the
right side of figures. This chart helps you first focus on the protocol performance at
the cluster level so that you can identify abnormal performance on the cluster.

492This section shows the protocol operations rate, protocol latencies and protocol
throughputs for SMB, NFS, and S3 on the cluster over time.


DataIQ Analyze Overview

The goal of the DataIQ Analyze function is to provide organizations the ability to
view volumes from a business context.

Callouts in the Analyze page view: Show/hide options, Analyze page, Options
panel, Rollover values.

The graphic (DataIQ V1 interface) shows the Data Management, Analyze page default view with
the options panel open.

• The Analyze page enables administrators to focus on capacity metrics for a


given context.
• You can view multidimensional project-oriented data.
• DataIQ Analyze can reveal the true cost of project data to enable business
users to manage their cost and workflows.

Analyze Options Settings

The graphic shows the default options panel on the Analyze page.

493This section shows the protocol operations rate, protocol latencies, and protocol
throughputs for protocols (exclude SMB, NFS, and S3) on the cluster over time.



1: You can view All volumes, Volumes, or Tags on the vertical axis. When selecting
the Tags option, the Tag categories option appears.

2: You can sort by Name or Size:

3: The horizontal axis is where you can change most of the elements. You can
show Size or Cost:


4: You can breakout the horizontal bar chart by different elements. Shown are the
Breakout options:

5: Switch how the Analyze window displays the data.

6: Refresh analyze data updates the Analyze window. Reset analyze options
puts all the options in the options panel back to the default setting and hides the
panel.

Data Mover Plug-In

DataIQ Data Mover is a plug-in that enables the movement of files and folders.


Use Data Mover to transfer files and folders between file systems, to S3 storage, or
from S3 storage. The S3 endpoint must be added and mounted to DataIQ.

The Data Mover has three components.

• UI plug-in494
• Service plug-in495
• Workers plug-in496

The Data Mover plug-in contrasts with other data migration tools such as Isilon
CloudPools in that it is not an automated time-based data-stubbing engine. All data
movement is out-of-band in relation to normal file and object access. Data Mover
enables administrators to choose a single file, or an entire folder, for a Copy-Move
operation from the folder tree UI.

Data mover may be installed on the DataIQ server for test purposes. However, the
best practice is to install data mover on a separate VM or host worker node.
Installing Data Mover separately isolates data I/O traffic from the CPU-intensive
database work that the DataIQ server performs. It is still necessary to install the
data mover plug-in on the DataIQ server first so that the service is running and
available for external worker node call-in.

494 The user interface that is installed on the DataIQ host.

495The service plug-in is the job manager that is installed on the DataIQ host. This
service accepts requests from the Data Mover UI Plugin and services the work
assignment requests from Data Mover Workers.

496 The workers plug-in is the transfer agent service, responsible for transferring
files. Install the Data Mover Worker plug-in on dedicated transfer hosts or the
DataIQ host. Administrators can deploy multiple Data Mover Worker hosts. If on a
dedicated host, requires RHEL/CentOS 7.6 or 7.7.


Data Mover Process Anatomy

The graphic shows an overview of the end-to-end process for moving data from a
PowerScale cluster to target. The target can be an S3 object target, ECS,
PowerScale, or supported third-party platform.

1: The DataIQ server scans and indexes a mounted volume. The Data Mover UI
plug-in and the Data Mover service plug-in are installed on the DataIQ host.

2: The network share is also mounted to the DataIQ Data Mover worker. The
Data Mover service and workers must be logged into the DataIQ server.

3: A file or folder is manually selected using the DataIQ WebUI.

4: The Data Mover service receives requests from the Data Mover UI and sends
instructions to the Data Mover worker.

5: After validating the file or folder path, the Data Mover heavy-worker begins the
copy job to the target. Only the data, not ACL and DACL information, is sent. No
file-stubs are used. The full file is either copied to the target or copied with the
source deleted. Light workers perform pre-transfer job validation and path
preparation. Heavy workers perform pre-allocations and data transfer.
Administrators can configure the number of workers in the
/usr/local/data_mover_workers/etc/workers.cfg file.

Allocating too many heavy workers may negatively impact the stability of the host.

6: If the target is S3, the Data Mover uses embedded credentials and the Data
Mover worker logs in to the cloud subscription service. For ECS, the service opens


an SSL tunnel to the recommended load balancer. Data copy-move operation is


complete.

DataIQ Vs InsightIQ

The table shows differences between DataIQ and InsightIQ.

Report: Performance by Client and User

Report: Tagging
• DataIQ: Used to tag tracked items during a scan.
• InsightIQ: Not Available.

Report: Downsampling resolution and retention
• DataIQ: 30s (raw)
• InsightIQ: 30s (raw)

Report: Cluster Performance by Protocol

Report: Detailed information about files and directories
• DataIQ: Storage Management
• InsightIQ: File System Analytics

Report: Cluster Performance by Node

Report: Cluster Performance

Report: Aggregation
• DataIQ: protocol → client → node → cluster (op_rate/time_avg/in/out)
• InsightIQ: protocol → node → cluster (op_rate/time_avg/in/out)


Challenge

Lab Assignment:
1) Analyze cluster storage using Data Management page.
2) Gather metrics using Storage Monitoring page.

HealthCheck

HealthCheck Overview

• The OneFS HealthCheck tool is a service that helps evaluate the cluster health
status and provides alerts to potential issues.
• Use HealthCheck to verify the cluster configuration and operation, proactively
manage risk, reduce support cycles and resolution times, and improve uptime.
• CLI command for HealthCheck:
• isi healthcheck
• CLI example to view the checklist items:

• isi healthcheck checklists list


Checklists and Checklist Items

The graphic shows the checklist items for the cluster_capacity check. The
HealthCheck terms and their definitions are:

• Checklist497
• Checklist item498

497 A list of one or more items to evaluate

498 An evaluated article such as node capacity


Running HealthCheck

The example shows selecting the Run option for the cluster_capacity checklist.

• By default, a HealthCheck evaluation runs once a day at 11:00 AM.


• You can run a HealthCheck using the WebUI.
• Navigate to Cluster Management > Healthcheck > HealthChecks tab.
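
The same checklist can be run and reviewed from the CLI, using the evaluation
commands shown later in this lesson. The evaluation ID in the view command is
illustrative; use the ID returned by the list command:

• isi healthcheck evaluations run cluster_capacity
• isi healthcheck evaluations list
• isi healthcheck evaluations view cluster_capacity20200427T1100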

Viewing an Evaluation

Evaluation showing
failures

Viewing the evaluation from the WebUI HealthChecks tab.


You can view the evaluation from the HealthChecks tab or the Evaluations tab. For
a failed evaluation, the file will show the checklist items that failed.

CLI example of viewing a failed evaluation:


isi healthcheck evaluation view basic20200427T0400

HealthCheck Results

Whenever an evaluation of a cluster takes place, the system provides a result,
either Pass or Fail.

Scenario: View the percentage of service life remaining for the boot flash drives in
each node of the cluster.

Use the isi healthcheck evaluation run boot_drive_wear command to run
the evaluation, then view the results for the evaluation. The results can also have
other meanings such as:

• Emergency499
• Critical500
• Warning501
• Ok502

499The SSD boot drive has reached its smartfail threshold (100% used or 0% left)
as defined by the manufacturer.

500
The SSD boot drive has reached its end-of-life threshold as defined by the
manufacturer. Contact PowerScale Technical Support for assistance.

501
The SSD boot drive is approaching its end-of-life threshold as defined by the
manufacturer. Contact PowerScale Technical Support for assistance


Example: Run a SyncIQ HealthCheck

• Check the SyncIQ checklist by listing all the available checklists. To do so, run:
isi healthcheck checklists list

502
The SSD boot drive has sufficient wear life remaining, as defined by the
manufacturer.


• Run isi healthcheck checklists view synciq to see which items the
SyncIQ checklist evaluates.
• Initiate SyncIQ healthcheck by running: isi healthcheck evaluations
run synciq
• Once completed view the checklist and verify that the evaluation is a success.

• To review the evaluation results run: isi healthcheck evaluations


view synciq20201114T1052

• To view details check the log file /ifs/.ifsvar/modules/health-


check/results/evaluations/synciq20201114T1052

Troubleshooting isi Healthcheck Warnings

If running an isi healthcheck item returns a warning or failure, the first
troubleshooting step is to view the item details. The item details provide further
instructions, including whether the item requires further action and what that action
should be.


Run the isi healthcheck items view <name> command to view details about a
particular item, where <name> is the name of the item.
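
For example, to see which items a checklist contains and then drill into one of
them (the item name node_capacity is only an example):

• isi healthcheck checklists view cluster_capacity
• isi healthcheck items view node_capacity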

Email Notifications

The configuration of the email notifications is set through HealthCheck parameters.

The HealthCheck parameters can be verified by running a new HealthCheck.

The healthcheck_delivery check considers the following three parameters:

• delivery_enabled
• delivery_email
• delivery_email_fail

Configure Email Notifications

Click on each tab to learn more.

Email Settings

To configure email settings for the cluster:

isi email modify --mail-relay mail.test.com --mail-sender


[email protected]

If you update the delivery_email parameter, the results of all evaluations are sent to
the specified address(es), regardless of pass, or fail status.


HealthCheck Delivery

To configure the HealthCheck delivery settings:

isi healthcheck parameters set delivery_email [email protected]

If you update the delivery_email_fail parameter, the results of failed evaluations are
sent to the specified address(es). You can use either of these parameters
independently.
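
For example, delivery can be enabled and failed-evaluation results routed to a
separate mailbox. This assumes the same parameters set syntax shown above, and
the address is a placeholder:

isi healthcheck parameters set delivery_enabled true
isi healthcheck parameters set delivery_email_fail [email protected]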

Verify Settings

Once the settings are configured the results are sent to [email protected]

• To verify the settings run: isi email view and isi cluster contact
view

Resources

Below is the resource page and link.


Link to the Info Hub

Best Practices

• Verify the cluster is running the latest Roll Up Patch for HCF (HealthCheck
Framework)
• Verify the latest version of IOCA (Isilon On-Cluster Analysis) has been updated
to HCF.
• A complete run of HCF check would be initiated by running: isi
healthcheck run all.
• For other run options see the isi healthcheck command reference at:

− isi healthcheck Command Reference


• Review the Evaluation log results using isi healthcheck evaluations
list and address any failures/errors.


Challenge

Lab Assignment:
1) Run and view a HealthCheck evaluation.
2) Configure email notification for a HealthCheck evaluation.

Performance Foundation

PowerScale Generation Comparison

The performance discussion requires an understanding of the make up of the


nodes that build the node pools within the cluster. Understanding performance
helps to define a baseline for the performance discussion later in the topic.

• F600, F800, F810
  − I/O Profile: High Performance, Low Latency
  − Drive Media: Flash
  − Tier: Extreme Performance
• H600
  − I/O Profile: Transactional I/O
  − Drive Media: SAS and SSD
  − Tier: Performance
• F200, H5600, H500, H400
  − I/O Profile: Concurrency and Streaming throughput
  − Drive Media: SATA/SAS and SSD
  − Tier: Hybrid/Utility
• A200 and A2000
  − I/O Profile: Nearline and Deep Archive
  − Drive Media: SATA
  − Tier: Archive
The table shows the various generations of the PowerScale platforms to better understand the
positioning of the Gen 6, and Gen 6.5 nodes.

F600, F800, and F810 Extreme Performance


• The F800 and F810 are an all flash solution that caters to high performance and
high capacity solution needs.
• The F600 provides larger capacity with massive performance in a cost-effective
compact form factor to power the most demanding workloads.
• Target workflows for F800 and F810 are in the digital media503, electronic
design automation504, and life sciences areas505.
• Target Workflows for F600 are M&E studios, hospitals, and financials that need
performance and capacity for demanding workloads.
• The F800/F810 competes against the other all-flash vendor solutions for
workflows that depend on high performance.

− It can accomplish 250-300k IOPS per chassis.


− It gets 15 GB/s aggregate read throughput from the chassis, and with
predictable latency, even when scaling the cluster, the latency remains
predictable.

WinWire: PowerScale for Electronic Design Automation

503 4K, broadcast, real-time streaming, and post production.

504Design, simulation, verification, and analysis of electronic and mechanical


systems, design for manufacturability

505 Genomics DNA and RNA sequencing


H600 Performance

• The H600, high performance, SAS-based node is geared toward cost optimized
work environments, but it still produces high-performance numbers.
• It targets verticals such as digital media506 and life sciences507 that do not need
the extreme performance of the F800.
• It is a standard four RU solution with predictable performance even as it
scales.
• The H600 provides high-density performance that supports 120 drives per
chassis.

506 Broadcast, real-time streaming, rendering, and post production.

507 DNA and RNA sequencing and large-scale microscopy


WinWire: PowerScale for Life Sciences

H500, H400, and F200 Utility

• The H500 and H400 hybrid nodes are built for high performance and high
capacity, ideal for utility workflows such as enterprise file services508,
analytics509, ROBO510, and home directories.
• The H500 gives you predictable performance even as it scales.
• The H400 is a capacity optimized solution with an element of performance.

508 Home directories, file shares, group, and project data

509 Big data analytics, Hadoop, and Splunk log analytics

510 The ideal use cases for Gen 6.5 (F200 and F600) is ROBO (remote office/back
office), factory floors, IoT, and retail. Gen 6.5 also targets smaller companies in the
core verticals, and partner solutions, including OEM. The key advantages are low
entry price points and the flexibility to add nodes individually, as opposed to adding
node pairs in Gen 6.


• The F200 provides the performance of flash storage in a cost-effective form


factor to address the needs of a wide variety of workloads.
• Target Workflows for F600 are remote offices, small M&E workloads, small
hospitals, retail outlets, IoT, factory floor and other similar deployment
scenarios.

WinWire: PowerScale for Data Analytics

A200 and A2000 Nearline and Deep Archive


• The A200 is an active archive box that is optimized for a low cost per TB
solution.
• The PowerScale A2000 is a deep archive solution with the lowest cost per TB.
• Typical workflows are large-scale archives511, disaster recovery targets512, and
general-purpose file archiving513.

WinWire: PowerScale Archive Solutions

511For large-scale, archiving data storage that offers unmatched efficiency to lower
costs.

512Disaster recovery target for organizations requiring an economical, large-


capacity storage solution.

513For economical storage and disk-based access to reference data to meet


business, regulatory and legal requirements.


Common PowerScale Workflows

The graphic shows the more prevalent profiles or verticals.

• Knowing the primary workflow that the cluster is meant to handle is a luxury when it comes to predicting the incoming requests.
• Understanding the workflows and aligning the workflows with known profiles
can help administrators prepare accordingly.

Resource: Click the link to explore these and more markets.

Complex Workflows

Once the baseline is established in terms of real metrics, start describing in greater
depth the actual situation on the cluster.


• Baseline Information514
− Example - Video Surveillance 515
• Multiple Purposes516
− Example - Home Directories and Video Streaming517
• Management518

− Example - Human Resource and Engineering Access Zone519

514 If a cluster serves precisely one need, and nothing ever changes, your baseline
information correlates directly with your activities.

515A typical case may be a cluster which is used as storage for a security camera
installation. As long as the models and usage of the cameras remains unchanged,
the storage needs are likely to be highly predictable. As old information is deleted
or archived, and new information is brought in at a nearly constant rate, the net
cluster usage may be constant.

516PowerScale clusters are frequently used for multiple purposes in parallel. Every
case is different, and the flexibility of PowerScale clusters means that there are
often multiple functions that are implemented on a single cluster.

517 A single cluster might contain home directories for a wide variety of users. The
cluster may also host streaming videos of corporate events and object storage for
internal applications. The well-prepared storage administrator understands what
those load elements are, and differentiates them to understand what factors drive
changes on the cluster.

518 Often workflows are separated into access zones for easy management. Such
cases justify monitoring different access zones separately to establish what their
different baselines are. It can guide delivery of resources where they would be best
applied.


Performance Monitoring and Planning

Administrators520, application stakeholders521, and management522 teams may want to understand how to measure their current PowerScale workloads. Performance measurements help in understanding how adding capacity or workloads can modify a performance profile. Customers who apply this knowledge to their environments can be assured that their "go-live" dates are more successful in meeting operational needs.

519 Each access zone has different baseline characteristics. An HR access zone typically has predominantly general-purpose file access, whereas an engineering group may have more intensive workloads, such as testing applications for deep learning projects.

520 Administrators need data to: understand existing performance and capacity envelopes, review existing or prior performance-impacting events, and provide a qualitative roll-up of needs and requirements to management.

521 Application stakeholders need data to: plan for future growth of existing applications, assertively query software vendors when there are workload changes due to upgrades or replacements, provide quantitative requests, and set performance expectations for storage administration and management.

522 Management needs data to: produce a concise summary of what storage workloads exist, when more are needed, and why; shorten the funding approval process through confidence in the performance metrics; and understand their storage workloads when working with software application vendors.


Administrators:
• What is the system doing?
• How much capacity is it using?
• What does the system need?

Application Stakeholders:
• What is the trending data for the apps?
• What and when is app software upgraded?
• How will this impact the workload?

Management:
• What storage workloads exist?
• When do I need more workloads and why?
• What data do I need funding approval for?

Communicate with customers.

Workload Analysis Overview

Workload analysis consists of reviewing the ecosystem of an application, and the storage it lives on. You should understand the configuration of the cluster, how clients see it, where data lives within it, and the application use cases.

Let us examine the key areas.


• Determine how an application works.523
• Determine how users interact with the application.524

523 If an application has a unique dataset, determine if it relies on a database such as Oracle®, or flat files such as VMware® VMDKs. Determine if it has broad and deeply nested directory trees with few files per directory, or shallowly nested directories with large numbers of files per directory. Determine how the application uses the stored data.


• Determine the network topology.525
• Determine the Storage Stack Performance.526

Analyze your workload performance when disruption occurs or anytime performance changes. For example, analyze performance when an application upgrade is performed, a new functionality enabled, or a migration to or from a new pool. For help with analyzing your workload results, contact your Dell EMC or Partner account team.

Determine How an Application Works: For example, does it rely heavily on metadata reads and writes to the dataset? Does it have moderate metadata writes and intense metadata reads, as is the case with electronic design automation (EDA)? Are heavy or light data reads and/or writes required? Are the data reads

524 Though more difficult to profile, understand what performance numbers users are accustomed to and what they are expecting. Determine whether users interact with the application through direct requests and responses from a flat data structure, whether there are efficient parallelized database or flat file requests to derive a result, or whether there are inefficient serialized requests.

525Diagram the network topology completely. Leave nothing out. Pictures can
resolve many issues. For a LAN, itemize gear models, speeds, feeds, maximum
transmission units (MTUs) per link, Layer 2 and 3 routing, and expected latencies.
Perform a performance study using the iperf tool for network performance
measurement. Iperf is distributed with OneFS. For a WAN, itemize providers,
topologies, rate guarantees, direct versus indirect pathways, and perform an iperf
study.
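
As a hedged illustration of the iperf study mentioned above (the node address, test duration, and stream count are placeholder assumptions), a basic throughput test between a client and a node might look like this:

# On a cluster node (iperf ships with OneFS) - start the listener:
iperf -s

# On the client - run a 30-second test with four parallel streams, reporting every 5 seconds:
iperf -c 192.0.2.10 -t 30 -P 4 -i 5

Running the test in both directions, and between several client/node pairs, helps separate a single bad link from a general network problem.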

526You can determine storage stack performance by learning over time what your
normal performance is, and how to recognize when it is not normal. All clusters
have a unique configuration with a unique dataset and workload, and therefore you
are observing a unique result from your ecosystem.


and writes more or less random or sequential? If the application is latency-sensitive, are there expected timeouts for data requests built into it? Can they be changed? Are there other external applications from this application that might cause latency problems? FindFirstFile or FindNextFile crawls do repetitive work that is not well suited to NAS use and must be investigated. If an application can benefit from caching, how much of the unique dataset is read once and then re-read? Over what periods of time: hourly, daily, weekly, and so on. Knowing the frequencies can help in cluster sizing, regarding L2 cache benefits, and more L3 cache opportunities within OneFS.

Determine How Users Interact with the Application: An example of inefficient serialized requests is a CAD application needing to load 10,000 objects from storage before rendering a drawing on the user's display.

Data Collection Methods

Real-time data seen from the WebUI dashboard.

OneFS offers various live data sources for monitoring.

• Real-time methods of data collection

OneFS also makes recorded information available.

• Historic methods of data collection


Cluster data is made available through SNMP, through the InsightIQ application,
and through the isi statistics command line. In all these cases, information
is made available when it is produced. The cases allow for alerting based on
current thresholds, accumulation in monitoring infrastructure, or active monitoring in
real time. These tools can answer the question: “What is the situation right now?”
Events such as a drive going bad, a configuration element being changed, an
advisory quota being reached all produce such immediate information. In addition
to a stream of immediate information, OneFS also makes recorded information
available.

The OneFS statistics system gathers information in a database which is made available through PAPI by isi_stats_d. InsightIQ maintains its own database of historic monitored information. The CELOG database maintains a history of events which is available through the OneFS web administration interface and through command lines. Each node maintains activity logs and there is a cluster-wide set of log files which describe the cluster's computing activities. The logs and databases together answer the question: "What has happened?"
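
A minimal sketch of pulling from these sources at the CLI; the subcommand spellings and the key name are assumptions that vary by OneFS release, so verify with the built-in help before relying on them:

# Statistics keys collected by the statistics system (surfaced through PAPI):
isi statistics list keys | head

# Query one key for its current value (key name below is illustrative only):
isi statistics query current --keys=cluster.cpu.user.avg

# Review recent CELOG event history from the command line (OneFS 8.x style syntax):
isi event events list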

Performance and Optimization

• Performance is predictable data delivery within variance limits that avoids lost production or added production costs.
• Optimizations are actions that improve or restore data delivery.
• Performance and optimizations are building blocks to understanding the cluster as a whole. Metrics include latency, throughput, and duration.


Protocol Operation Rate

Establishing Cluster Baseline

A trend exists because of changes between the past and the present. To measure
these changes, a starting point or baseline is needed. An easy way of establishing
a baseline of information for most metrics is with InsightIQ. Use InsightIQ with file
system analytics (FSA), and monitor activity closely to see the cluster’s behavior.

• Levels of usage can vary substantially between clusters.
• Each cluster needs careful evaluation of various metrics to see which aspects of the cluster are most heavily loaded.
• Sources for key metrics: isi statistics, SNMP infrastructure, and other tools (see the sketch below).
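
A minimal baseline-capture sketch, assuming output is archived under /ifs for later comparison; the paths are placeholders and the flag spellings should be confirmed with each command's --help on your OneFS release:

# Cluster-wide protocol, CPU, network, and disk summary in one view:
isi statistics pstat >> /ifs/data/baselines/pstat_$(date +%Y%m%d).txt

# Per-drive and per-protocol detail captured for the same interval:
isi statistics drive --nodes=all >> /ifs/data/baselines/drive_$(date +%Y%m%d).txt
isi statistics protocol --nodes=all >> /ifs/data/baselines/protocol_$(date +%Y%m%d).txt

Capturing the same outputs at quiet and peak times, over several weeks, provides the baseline that the trend discussion below depends on.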


The graph shows InsightIQ throughput rate that shows a spike in activity.

The baseline measurement of a cluster's performance should include the general levels of:

• Usage - how much space is taken? How busy are the CPUs? How much RAM
is used for caches?
• Performance - what are typical throughput and latency figures?
• Cycles - when is peak time? How long does it last? How much slack time is
there?

Various metrics have varying degrees of importance based on the storage environment, but key metrics are consistent:

• Storage - used and free.
• Bandwidth - how it is used across the connections.
• Connections - which links are alive or dead, and their configurations.
• RAM - how much is in use? How much is for cache? How much for other operations?
• CPU - FEC calculations are only one example of all the computation a cluster does. The job engine, for example, can demand a lot.

Differentiating Load Sources

It can be difficult to precisely determine which functions are placing which loads on
a cluster, but fortunately the PowerScale monitoring tools offer sound options.

• There are various criteria by which loads may be described.


• When examining the environment, establish what the key properties of the
workloads are.
• There are tool options available in PowerScale to monitor the workloads.

Report data seen from the InsightIQ (Breakout by node pool over 6 hours).

Some loads originate from particular clients. Some operate at particular times, such
as end-of-year bookkeeping or scheduled virus scans every evening. Some loads
originate from particular applications and therefore have particular access patterns.
Some loads interact with the cluster through particular protocols, or use particular
features of those protocols, such as OpLocks, in particular ways. Some loads are
aggressive users of particular datasets.

The easiest part is to determine which are the largest data directories, or if the cluster roles are split among pools, which pools are busiest. The InsightIQ capacity estimation tool affords a good way of seeing how much capacity a cluster offers as a whole. If quotas are enabled, another good source of information is quota reporting. Quota reports are accessed through either the OneFS web administration interface or the InsightIQ web interface. SmartConnect zones offer a good way of differentiating separate data flows to and from the cluster. Client activity measures can help you differentiate the quiet and intermittent workload of a home directory scenario from the heavy activity of an active Hadoop installation. Even without SmartConnect, client activity reports are available as long as client addresses are differentiated in some respect. Client activity reports can determine which functions are placing the greatest load on the cluster.


Storage administrators should expect to work with network and system administrators to develop a truly well-rounded view of the functions which affect the cluster. However, the different ways in which loads are generated generally show up clearly in the monitoring data with isi statistics, CELOG, and FSA.

Trends and Timeframes

• Events can move rapidly527
• Most operational issues develop gradually.528
• Catching trends early allows decisions ahead of time.
• Monitor by stretches of time: day, week, month, year529

527 Occasionally the storage administrator has to fight an immediate fire, such as a hardware failure or a power cut. Usually issues that arise are gradual and can be foreseen and managed ahead of time. The key to managing issues is identifying a trend and taking the appropriate action.

528 Comparing short and long-term events allows for differentiating the loads that are created by different applications, or usage profiles of different applications. Comparisons can distinguish between a short-term blip in the numbers, and an actual trend over the longer term.

529 To understand trends fully, examine metrics on different timescales. Look at developments over a day, or a week, a month, and a year, if possible. If the life cycle of a given installation is 10 years, then trends which are imperceptible on the scale of a week, and are minor in a month, can be critically important over a year. The flexible nature of PowerScale clusters means that their lifespan in a computer lab is often long. Predicting and allowing for long-term developments is a crucial factor in planning.


Cluster capacity report seen from the InsightIQ trend period over one month.

Trend Analysis

• Administrators cannot rely upon generic ideas about what data is important or
which trends are significant. Skilled administrators examine their environments.
• They then determine which data and trends are more important in their
particular context, based on an understanding of the workflows prevalent in their
environments.
• Trend analysis helps answer the question: “What will happen?”

InsightIQ FSA graph showing a breakdown of the file sizes.

It is easy to monitor immediate signals and see when they cross thresholds.
Monitoring is important, but insufficient for the best storage administration
practices. Good practice includes being able to predict future activities and
performance. An example is monitoring the usage level of a cluster to predict when
it needs upgrading to meet the user base’s needs. The storage administrator must
be able to see the trend to anticipate the future needs. Most SNMP management systems accumulate and display data exported via SNMP to provide for trend analysis, but InsightIQ offers substantial trending capabilities as well.


Safety Margins

• PowerScale storage administration is not only about improving performance, but also about improving uptime by allowing the cluster to continue working even if nodes malfunction.
• Without any headroom built into the cluster, any loss within the cluster pushes the cluster's capacity to operate below the demands placed on it.
• Safety margins are not wasted capacity, but a good use of capacity.
• Running at high CPU levels, pushing the maximum connection counts per node, maximizing drive IOPS, or creating snapshots as fast as they are deleted, leaves no headroom.

Used Space - Action

80% - Begin planning to purchase more storage.
85% - Receive delivery of new storage.
90% - Install the new storage.
95% - There is a possibility of data unavailability if capacity is not increased.

PowerScale recommends that storage capacity be maintained below 80%. Data protection can manage the loss of a node, but reprotecting data would require the space to rebuild. Once over the 80% capacity mark, more PowerScale nodes are available, but how long is your organization's purchase order process? How long does it take to ship a node to your data center and get it installed?

Resource: Look on the EMC Community Network for the Isilon Guidelines for Large Workloads for a list of limits to consider. Read and understand the noted document for best practices on maintaining free space in the cluster.


Filling Cluster

• It is possible to fill a cluster to 100%. OneFS does not prevent this from
happening.
• A full cluster is a bad scenario. The cluster completely locks up and refuses
logins except from the console.
• A prevention measure to consider is having the right protection for the workload.
− Example530
• Another measure is to enable automatic deletion of snapshots.
• Features such as SmartQuotas, deduplication, and file filtering help prevent trouble by limiting the damage done by abusive clients and notifying you when thresholds are approached.

The best administrative practice in response to these facts is to maintain adequate headroom. Trending and planning should enable you to foresee your data needs

530 Do you want all your data protected at 5x, consuming much more capacity than 2d:1n protection? Home directories may need less protection than a vital repository where customer information is stored. You can survive the loss of home directories, but losing vital customer information, such as in-progress engagements, can affect the company's bottom line.


and purchase sufficient storage for those needs. Maintaining VHS for emergencies
related to data loss is a best practice, and generous headroom is better.

Warning: Scenario 1 - Manually deleting B-trees531

531 Manually deleting B-trees in the file system can temporarily alleviate the
situation. Deleting B-trees is not a safe or desirable option. Deleting B-trees
involves manually and directly editing the literal file system’s internal data
structures. This scenario involves data loss, Severity 1 support calls to get help in
identifying, editing, and deleting items in the data structures, conference calls with
executives and possible weeks of downtime. Such scenarios should be avoided
whenever possible.

PowerScale Advanced Administration

© Copyright 2020 Dell Inc. Page 647


[email protected]
Performance and Monitoring

Caution: Scenario 2 - Virtual Hot Spares (VHS) can be disabled.532

532 Virtual Hot Spares (VHS) can be disabled as a temporary measure to get the cluster to operate again. When disabling VHS, its reserved space can be recovered. Disabling VHS also means that there is less capacity available to resist any kind of hardware failure, so it is not a good situation either. Even clusters that are not completely full start to suffer once capacity usage is over 80%. The issue is a consequence of many confounding factors, including hard drive physics and the calculations that are required to maintain an optimized data layout. The fuller the cluster, the higher the impact of a hardware failure, increasing the potential of filling the cluster to capacity.


Performance Analysis

End to End Model

Workload analysis consists of reviewing the ecosystem of an application, and the storage it lives on. Storage administrators need to understand the cluster configuration, how clients see and get to it, where data lives within it, and the application use cases.

• Questions to keep in the forefront of the analysis533
• Data Flow Model534
• More Information535

533 How does the application work? What are the user interactions with the application? What is the network model? What are the workload-specific metrics for networking protocols, disk I/O, and CPU usage?

534Data flow models are important to show the processes that are used for
transferring data.

535 There are fewer outliers that are spread around the chart. Most of the results
are contained at the bottom of the graph in a specific area. The results indicate that
the requests were responded to in an appropriate time period. The results that are
shown are generally under 25 milliseconds. The graph depicts a cluster with plenty
of resources to support its workflows.


Administrators can adapt a data flow model for their environment. Models can help understand the flows of information within the system and the processes that act upon the information. This example shows the simplest model of accessing a file share. A more thorough model would include the networking processes. Knowing the processes can not only help isolate a problem, but also identify the processes or areas on which to focus an analysis. If a user is denied access to a cluster's file share, a model may show that the problem is likely in the authentication process. If the user cannot open a file, then the issue could be permissions or ownership on the file. If the user cannot save data to the file share, enforced quota limits may have been reached.

Performance Baseline

• Establishing the baseline gives storage administrators a starting point, helping them to identify trends and isolate issues.
• Review the white paper for comprehensive performance data gathering.

Resource: Dell EMC Isilon OneFS Cluster Performance Metrics Hints and Tips white paper to form a baseline, and Isilon Uptime Info Hub for more information.

Drive Metrics

When examining metrics, it is important to have an understanding of the terminology. The table lists the well-defined standards.


Term Definition

Utilization/Capacity Measure of in-use versus free disk capacity

Utilization/Busy Average of time the disk was busy over the sample
interval

Disk Percent Busy Average of time the disk was busy over an interval,
expressed as a percentage

Disk Latency Sum of seek time, rotation time, and transfer time

Service Time Time from the device controller request to the end of
transfer, including delays due to queuing and latency

Response Time Disk service time plus all other delays such as
network, until data is at the host

Throughput Average amount of data that is transferred within a period, such as MB/s

Queue Depth Average number of requests awaiting processing by the disk

Time in Queue Average time requests in queue await processing by the disk

Cluster Metrics

The first two metrics to highlight for the PowerScale cluster portion of the end-to-end model are isi statistics drive and isi statistics heat. Many of the metrics are guidelines only, not laws. The only definitive answer is: what values work for the workload?

Disk

• To monitor drive performance, the best indicators are how long an I/O operation is waiting (TimeInQ) and how many are waiting (Queued).


• To test the impact of different levels of disk I/O, see "How can I generate different levels of disk load on a Linux system."

Graphic callouts: different drives in the same cluster; TimeInQ should be < 40 ms (SATA < 7, SAS < 3, SSD < 1); Queued should be near zero, with best performance when < 2; an indicator of a spindle bound cluster.
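
A hedged example of checking those indicators; sorting and filtering options differ between OneFS releases, so treat anything beyond the base command as an assumption to confirm with --help:

# List per-drive statistics for every node and watch the TimeInQ and Queued columns:
isi statistics drive --nodes=all

Drives whose TimeInQ and Queued values sit persistently above the thresholds noted above point toward a spindle bound cluster.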

Heat

With the heat map, storage administrators can tell what is causing so many IOPS to reach the drives.

Graphic callouts: operations per second; UNKNOWN operations are OneFS file system ops, such as SyncIQ or Job Engine activity; excessive locks can cause poor access to the affected directories.

Disk: Key areas that trigger an investigation are protocol and operational latency.
Remember to see the baseline outputs to compare metrics. Review how the drives
are performing. If the cluster is unable to serve data to clients fast enough, other
performance parameters do not matter as we are waiting on the disks. The graphic
shows an example of using the output that is shown and comparing it to the
baseline data. These drives are overstressed by more than 120%. A 3.5-inch SATA drive can provide 100 IOPS sequentially. A workflow that is mostly reads or mostly writes reaches the 100 IOPS mark, or even exceeds it, under a best case scenario. As shown,


the drive IOPS are above 220, meaning these drives are being pushed beyond
their limits and may indicate a disk bound cluster.

Heat: Use heat map to review if there are any applications or files that are resource
hungry and consuming a great deal of IOPS. Review these applications to see if
they need such a high level of performance, or if it is OK to throttle them. If the drives are stressed, then there is a need to find out where the load is coming from.
Isolating the issue is best accomplished with the use of the isi statistics
heat command. This command provides a list of the most used files and paths
from an IOPS perspective. For example, while the metadata intensive application
was running, the command was run. The graphic shows almost 10,000 locks
occurring between the globalcache directory and the IssueCollector
directory. In this scenario, it would be wise to review what is causing IOPS on those
folders if they continue being repeat offenders. Things such as snapshots and
SyncIQ can leave extensive block remapping on the directories. Remapping can
cause increased overhead in terms of latency when accessing those directories. If
the directory itself is not under snapshots or if it needs to be, then the next step is
to use the metadata write acceleration strategy. Change the strategy on the
directory with isi set –-strategy=metadata-write. The cluster directs any
incoming metadata writes to the SSDs.

Latency Metrics

For protocol performance, review whether the entire protocol is experiencing latency or not.


Graphic callouts: latency in milliseconds - Good: < 10 ms; Normal: 10 ms - 20 ms; Bad: 20 ms +; Investigate: 50 ms +. Poor latency can impact data access and response times. (* applies to all nodes, all protocols, all operations)

The graphic shows the command execution of which class of operation is taking the longest.

The output TimeAvg is in microseconds and must be converted to milliseconds to compare to the standard expectations in the table. The output of this command is only meaningful with active traffic. In this example, 1826.2 microseconds is good. If experiencing latency problems, the output may show TimeAvg numbers of more than 50 milliseconds.
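
As a hedged illustration (the protocol value and flag spellings are assumptions to verify with isi statistics protocol --help), the per-operation latency described above can be watched with:

# TimeAvg is reported in microseconds; divide by 1,000 to compare with the
# millisecond guidance above. Output is only meaningful while traffic is active.
isi statistics protocol --nodes=all --protocols=nfs3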

Client Metrics: Users by Protocol

And finally, client performance. Once you determine that only a handful of clients consume the IOPS, which is typical, focus on what exactly they are doing.

Graphic callouts: check for dominant users of the same protocol; UNKNOWN user names are not included in the 1024 records (* for protocols that do not supply a user name).

The output shows the top 20 users and their external protocols.


Use the isi statistics client command to determine if any users are dominant or out of balance with other users of the same protocol. The isi statistics client command indicates if a single user or group of users is dominating the cluster resources. In the example, the command outputs only the Ops(1), Proto(6), and UserName(8) columns. The Ops rate for each user can identify out-of-balance usage. This baseline can also be used to identify a user who has an abnormal use of the workflow. To determine the difference between a busy user and a non-busy user, increase the command output to where the user operation counts begin decreasing. PowerScale stores 1024 username records for each 15-second window. UNKNOWN usernames are not in the 1024 records.
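
A minimal sketch of the check described above, reusing the option set referenced later in this module; the protocol value is a placeholder assumption:

# Show live per-user operation rates so a dominant or out-of-balance user stands out:
isi statistics client --protocols=smb2 --numeric --top --long

Increasing the number of rows displayed until the per-user operation counts begin to tail off makes the boundary between busy and non-busy users visible.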

Normal vs Abnormal

• Establishing a baseline of normal, day-to-day operations and fluctuations is possible only by observing and recording your workflows.
• Use scheduled reports to view and compare past behavior.
• View any report on a basis of 3 months, 6 months, or 1-year data ranges to identify trends.


Interpreting Monitoring Data

Listed are guidelines to use when faced with monitored data and trying to make
sense of it in the context of monitoring and alerting configurations:

• Understand the scale536
• Identify typical case537
• Identify extremes538
• Workflow contexts539
• System limits540

536Understand the data’s scale. What is the time frame? Does the metric reach
zero? What is the maximum value? Is it dimensionless or are there units of
measurement?

537What is the typical running level of the data? Does the running level reflect the
normal level of activity?

538What is the absolute peak reached? Depending on the dataset, the lowest
trough may also be a concern. Make a note of the troughs.

539What is the context around the peak and/or trough? Is it often approached? How
closely is it approached? Do peaks or troughs cluster near each other? Do the
peaks or troughs relate to a known workflow? Is there a run up to a peak or is it a
sudden spike?

540Are the figures seen near the ultimate limits of the system as it is configured?
Do these numbers require immediate action, or are they read in a report at some
point?


The point of verification is to tell administrators which levels would create an alert requiring attention. By understanding how the levels relate to the workflows, administrators are positioned to determine abnormal levels and whether the cluster requires more resources.

Graphic callouts: maximum reached; 2nd highest; typical level; y-axis 0-10 Gb/s; timescale of one week.

The graph shows a maximum of under 9 GB/s total throughput for an entire cluster.

9 GB/s is well within the limits of operational performance, and should cause no
anxiety whatsoever. On the other hand, if this is the baseline, it may be prudent to
set an alert in case the number goes over 10 or 12 GB/s. The alert signals
abnormal activity levels.

Workflow Analysis Case Study 1

The workflow analysis topic uses a simple case study.

Hayden has been tasked with migrating a workflow from one PowerScale cluster to another. The application experiences a slowdown, the network shows latency, and the cluster shows a bottleneck.


Workload and workflow performance should be analyzed when encountering disruption, or anytime performance changes have occurred.

• For example, when performing an application upgrade that has new functionality enabled, or a migration to or from a new pool.
• Before moving a workflow to the cluster.
• Wireshark is best to benchmark cross-platform performance.

To migrate a workflow to the PowerScale cluster, Hayden analyzes an existing workflow's performance on the current storage solution model: application - network - storage. If the performance of the to-be-migrated application is acceptable on the current storage solution, Hayden gathers a packet capture. The purpose is to benchmark the needs of the workflow and compare them to what the cluster can provide. If the cluster is unable to provide as much performance as the application expects, then the user or application experience is degraded.

To benchmark a workflow, the most useful cross-platform tool is Wireshark. This tool allows packet capture analysis at all layers of the OSI model. Hayden can focus on the latency of the network and protocol. Another way to gather a packet capture is using the tcpdump utility on the cluster. In addition, a packet capture should be gathered on the client as well. Once Hayden has the captures, they are analyzed in detail. The packet capture should be for a similar workflow and protocol on both storage solutions, or at least the same protocols.
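
A hedged sketch of gathering the capture on the cluster side; the interface name, client address, and output path are placeholders:

# Capture the client's conversation on a node for later analysis in Wireshark:
tcpdump -i mce0 -s 0 host 192.0.2.50 -w /ifs/data/capture_before_migration.pcap

The same capture, taken on the client side, lets both ends of each exchange be compared in Wireshark.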

Networking Processing Delays

Latency is the processing delay in network data. Average latency is not network
latency. Latency is expected. Remember to see the baseline latency metrics when
addressing latency issues.


The general flow of analysis is divided into two parts:

• Network latency541
• Protocol latency542

How to Compare Network Latency

Once acquiring a packet capture on the two storage solutions, one way to review
the overall latency is by looking at the round-trip time (RTT).

• Round-trip time543
• Measuring with Wireshark544
• Unhealthy network545

541 Network latency allows Hayden to determine if the network change between previous storage and the cluster adds latency. If adding workflows, compare the baseline metrics to analyze the possible impact.
542Protocol latency is added latency due to the speed of the protocol operations. If
there is added latency at the protocol level, isolate to separate operations of the
protocol, such as READDIRPLUS for NFS.

543The round-trip time only measures the time that is taken for sending a packet
and for receiving the acknowledgment. Thus, round-trip time does not differentiate
network delays from computational delays.

544 Wireshark can measure round-trip time by navigating to Statistics > TCP Stream Graph > Round-Trip Time Graph. Any added latency at the network level compounds into added latency at the protocol level.

545If the network is not healthy or the storage solution is unable to respond in a
timely fashion, the end-user experience suffers.


Overloaded Cluster

• Cluster A is where the application currently resides.
• The graphic shows the Wireshark RTT graph546 when the metadata intensive application is pointed to the cluster and the performance issues are experienced.
• In Hayden's case, the application has degraded the performance of the cluster.

546 The graph is not consistent, and contains a great number of outliers. The
outliers indicate performance issues on the storage solution, the cluster in this
case. The cluster is overloaded to such a degree that it is unable to reply to all calls
in a timely fashion. As a rule of thumb, customers experiencing a 0.25 second
response is an inconvenience, whereas a 0.5 second response becomes a
problem. It takes a uniform and highly optimized workload to produce no outliers
whatsoever.


Graphic callouts: outliers spread across the graph; results (dots) not in a consistent area.

The graphic shows an example of what to expect when a cluster starts to get too much load for consistent good performance.

Cluster with Available Resources

• Seeing that the application impacted the performance on Cluster A, Hayden migrates the application to Cluster B.


• Here Hayden gets a normal Wireshark RTT graph547, painting a much cleaner
picture.
• Some outliers generally occur even in healthy environments because some
protocol operations are naturally more time consuming than others.

Graphic callouts: only a few outliers; results are in a consistent area.

Graph shows metadata intensive application on cluster with available resources.

547 There are fewer outliers spread around the chart. Most of the results are contained at the bottom of the graph in a specific area. The results indicate that the requests were responded to in an appropriate time period. The results that are shown are generally under 25 milliseconds. The graph depicts a cluster with plenty of resources to support its workflows.


Cluster with General Workflow

• For comparison, Hayden returns to Cluster A, showing the Wireshark RTT graph after the intensive application was migrated to Cluster B.
• The performance is much better.

Graphic callouts: almost no outliers; results are in a consistent area.

The cluster can respond to other network calls in a timely fashion. There is no added latency due to the network layer in this packet capture. Thus, we can summarize that the application was resource hungry to the point that it affected all other applications. It saw available resources and consumed as much as it could without regard to other workflows.


Viewing Protocol Latency

• Once the network is eliminated as a source of latency via the RTT chart, the next step is to determine if the protocol latency is acceptable548.
• The network packets may have acceptable RTT, but if the new storage solution adds 200 milliseconds of latency to protocol responses, the clients feel the effects.
• The easiest way to review protocol latency is by using the Service Response Time (SRT) analytics of Wireshark.
• A packet capture should be taken on both solutions, the current storage and the cluster, and their SRT values compared.
• To view the SRT times, Hayden can go into Wireshark > Statistics > Service Response Time > <select the appropriate protocol>.
• Examples:

• Wireshark > Statistics > Service Response Time > ONC-RPC > NFS > NFSv3
• Wireshark > Statistics > Service Response Time > SMB
• Wireshark > Statistics > Service Response Time > SMB2

548 Latency acceptability is a matter of context, relating to the nature of the network load, the applications, the client's needs, and more. There is no standard answer here. Even if the network packets are transmitted with a low latency, the end-user performance suffers if the protocol operations' replies are slow.


Service Response Times (SRT)

In Hayden's case with the metadata intensive application, there is no capture of an NFS workflow to compare the SRT values with. However, Hayden can see what the values were on the PowerScale cluster, and where the latency came from.

Preworkflow

Before the NFS workflow is migrated to the cluster, Hayden ensures that the cluster
SRT values are similar or better when compared to the previous storage.

Graphic callouts:
• Avg SRT dictates overall performance of the client-to-storage communication stream.
• Before: metadata intensive workflow pointed toward the cluster, sorted by average SRT.
• Highest SRT is 0.015 seconds or 15 ms.
• Most operations should be less than 50 ms.


Postworkflow

• READDIRPLUS calls549
• Metadata reads550
• Performance of the client or application551
• Latency552

549 Here the average times of the READDIRPLUS calls are taking 0.126 seconds
on average to return from client to cluster and back to client. The time is an
extreme delay, especially when considering that READDIRPLUS is a type of
metadata call that is expected to be fast. Based on the SRT times, Hayden can
infer that this workflow would highly benefit from metadata acceleration and a
larger L1 cache on the nodes.

550 If the application requires fast metadata reads, Hayden can view the current
live-performance of the protocol response times to compare with. Viewing is done
using the isi statistics protocol command. Using the isi statistics protocol, Hayden
views live statistics on protocol operations, and compare to what is needed. For
example, Hayden consistently sees the cluster’s NFS READDIRPLUS response
times of less than 30 ms. The application is working properly with 50 ms on the
existing solution. With these metrics, Hayden knows that the application’s response
time needs will be met after migrating.

551 Performance of the client or application should be continuously evaluated after the migration to ensure proper response times to the protocol. Use the isi statistics client command to evaluate. Using the metadata intensive application as an example, we can filter the command's output to only show us the application servers. The command to use is isi statistics client --classes= --remote_addrs= --numeric --top --protocols= --long.


Graphic callouts: READDIRPLUS calls with 0.126 Avg SRT indicate an extreme delay; this workflow would highly benefit from metadata acceleration and a larger L1 cache on the nodes.

The average response time is what dictates the overall performance of the client to
storage communication stream. In the example, the highest SRT is 0.015 seconds,
or 15 milliseconds. For most operations, we want it to be less than 50 milliseconds
for the best experience possible. SRT numbers break out by the communication
type with the cluster, not merely the port, or protocol. The numbers allow for
differentiation of the types of activities and how they affect the general performance
profile.

Workflow Analysis Case Study 2

In this example problem statement, users complain about performance:


• 30% to 40% excessive namespace read values
• Disk average time in queue of 35 to 40 milliseconds

552If the latency continues to match the previous storage solution or exceed it,
Hayden knows that the application should work as expected. If not, there are tuning
options to increase the performance of the application. For example, enabling
metadata-write acceleration or pointing the workflow to A100 accelerator nodes are
some of the options available.


• VMware RTT spikes to ~30%
• Executing an ls on large directories takes too long - 30 to 60 seconds

Using the end-to-end model, the namespace, RTT spike, and long ls return
symptoms can indicate a problem in the network or storage. Disk time in queue
points to a probable storage issue. This case study uses InsightIQ to get a clear
picture of the problem.

Identify Busy Times

The first step is to identify the busy times. The InsightIQ output shows that the environment is busy during a typical workday, about 8:00 am to 5:00 pm. This isolates the timeframe when performance issues are seen.

Graphic callouts: 8 AM to 5 PM daily was their busy time; SMB dominates the load, so focus on SMB.

Examine Workload Mix

Namespace reads are about 30% of the work on this cluster. It makes sense that any bottleneck or constraint affects such a dominant operation.


Client Distribution

Here is an examination of client distribution. The distribution appears evenly spread. However, the chart does not show distribution of client load.

Graphic callout: with 41 nodes, the average connections per node would be 2.4% - fairly distributed.


Load by Client - Heavy Hitters

• Are heavy hitters expected? If so, are they showing unusual load?
• If not, ensure that the clients are doing what they think they are doing and
adjust resources as necessary.

Graphic callouts: clicking the client shows breakouts; one client does 76% of all traffic on this cluster (hover for details).

The graphic shows an example of a breakout of External Network Throughput by Client.

Note: This page is for illustration – the example case study has a
fairly even client distribution.

Load by Node

Look for nodes that might be a bottleneck. PowerScale applies more cache and
CPU if access is distributed.


Graphic callout - Issue: 61% of all cluster traffic is traversing the top 4 nodes.

Isolating Nodes

The graphic shows the Protocol Operations rate. Three of the hardest hit nodes for
namespace_read are also among the busiest nodes.

CPU Utilization

• Does CPU utilization correspond with workload?
− Yes, but it is not high enough itself to be a problem. Here the nodes were doing the HPC workload.


• CPU was not an issue so no need for more nodes. namespace_read was the
dominating operation.
• Namespace reads can be dramatically accelerated using SSD for namespace
acceleration.

Graphic callout: four of the busiest nodes are hit hardest for namespace_read, and all are doing the HPC workload.

Note: Typically, CPU utilization should not run > 80% for sustained
periods, short spikes are ok.

Considerations

• Model workflows to understand the process used to read and write data to the
cluster.
• Establish baseline metrics or good picture of the model.
• When performing a workflow or workload analysis, the areas to look at are
storage metrics, network metrics, and client metrics.
• Monitor network and protocol latency before and after configuration changes or
added workflows.


Challenge

Lab Task:
1) Establish cluster baseline.
2) Simulate increased workload for the cluster.

Appendix


Network Stack

Physical Connection

Check the bottom layer of the ISO7498 (OSI) stack: is it physically healthy?

Ensure network plugs are fully inserted and not loose or askew

Look for broken, kinked, crimped or compressed cables

It may seem like a trivial concern, but improperly seated or damaged physical
network interfaces and cables can result in problems ranging from a complete loss
of function, to intermittent reductions in performance. It is cheap and practical to
start by ensuring that the basic hardware is in good shape.

Transport

Diagram labels: media errors, simplex/duplex, basic configurations, line speeds, frame sizes.


• The next level above raw hardware is the transport protocol level553.
• Checking all these factors lies in the realm of network administration, and it should be a collaborative effort.
• Network administrators also have access to more detailed, low-level logs and
diagnostic tools554.

Protocol

Protocol diagram: IP, with UDP, TCP, and ICMP.

• At the encapsulation and upper protocol level, PowerScale administrators555 are a lot closer to the action556.

553 Some of these configurations have bearing on PowerScale cluster configurations, but others are outside the scope of the storage administrator.

554 These tools are available to storage administrators, so it makes sense to collaborate to cross-check diagnostic results.

555Administrators can use packet captures to look for network problems such as
fragmented packets and symptoms of congestion.

556There are logging and reference tools that can help to identify a long list of
issues - Wireshark being one of the most useful tools.


• Assuming that the lower-level network systems are in good shape, most of the
problems at this level reflect either a misconfiguration, or performance overload.
Example557.
• On the other hand, retransmissions tend to happen as a result of timeouts558.

Name Resolution

The systems that are shown in the table are ways that names and addresses are translated to each other. Some, such as DNS, can get sophisticated. Some, such as host files, are trivial. Check as many systems in the chain as you can identify, so that you are sure that hostname identification is really happening the way that you intend.

Client DNS SmartConnect DNS servers

NIS Netgroups Host files

Active Directory LDAP Workgroups
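
A minimal sketch of checking the chain from a client; cluster.example.com stands in for a SmartConnect zone name and 192.0.2.20 for the SmartConnect service IP, both placeholders:

# Ask the site DNS servers, then query the SmartConnect service IP directly, and compare:
nslookup cluster.example.com
nslookup cluster.example.com 192.0.2.20

If the two answers differ, or if repeated queries never rotate through the expected node IPs, the delegation between the site DNS and SmartConnect deserves a closer look.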

An example of this problem is where users bypass host resolution because of a slow or unreliable DNS infrastructure, and save IP addresses into their host files. These users can then be left behind by back-office reconfigurations that invalidate their host files. Rooting this sort of thing out can take a while, and require some

557 For example, packet fragmentation is often a result of misconfigured MTU settings.

558 Timeouts in turn generally result from oversubscribed facilities. Sometimes the right answer is to buy more or faster hardware.


user education. It does happen, and one user who does this may actually persuade other users to follow suit, resulting in a whole group of users whose storage operations are not being load balanced at all.

Routing

Another facet of networking that takes a lot of work is routing. This is addressed later in a bit more detail, but briefly there are three topics559 in routing that deserve our attention. Whenever dealing with a routing issue, the key command to help examine the routing table is netstat -r.

However, routing is never a single device issue. There are routers, switches,
firewalls and virtual networks in practically any modern enterprise, whether
commercial or otherwise, and so it would be beneficial to engage the network
administrator, who is best placed to help navigate the network configuration of the
environment.
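
A minimal example of the command mentioned above, run from any node:

# Review the node's kernel routing table; -n skips name resolution so the output
# prints quickly and unambiguously:
netstat -rn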

Firewalls

Firewalls are a peculiar hybrid of traffic router and traffic blocker. Their mission is to
increase information security, and they do this by letting approved forms of traffic
pass while blocking other forms of traffic.

559 Cluster-wide routing, Static routes and Source-based routing (SBR).


Diagram:
• Block (some) traffic - obviously a potential problem.
• Pass (some) traffic - undesirable traffic can clog networks and interfere with operations.
• Route traffic - people forget, but firewalls can introduce routing issues!

Most firewalls also route between multiple network segments, including at least one
so-called demilitarized zone (DMZ), which is a network segment that is separated
from the internal network, but still is protected from the Internet at large. This
means that a misconfigured firewall can introduce, not merely traffic access issues,
but routing issues as well. In general, firewalls have limited application to storage
administrators, but if a storage system is in its own DMZ then every interface is
behind a firewall, and every change in the workflow will prompt a reexamination of
the firewall.


Network Troubleshooting
Basic network troubleshooting is fairly simple, conceptually. Example560. To understand more complex troubleshooting scenarios, look for more subtle signs of trouble. Example561. To see these problems clearly, perform an analysis on actual network transmissions; a good tool for doing that is Wireshark.

Signs of trouble

Retransmissions Fragmented packets Packets received out of order

560 Example - Data either flows, or it does not flow. Hostnames either resolve, or do not. Packets are either routed or dropped. This is the easy part.

561 Retransmitted or fragmented packets, packets received out of order, or even corrupted or duplicated packets are signs of something more complex going on, on an otherwise functional network. These are important as well, because they cause poor network performance and tend to snowball, getting progressively worse until the network is effectively unusable.
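
A hedged sketch of surfacing those signs in a capture with tshark, the command-line companion to Wireshark, run on the analysis workstation; the capture file name is a placeholder, and newer Wireshark releases use -Y for display filters:

# Count retransmissions, out-of-order segments, and duplicate ACKs in a capture:
tshark -r capture.pcap -Y "tcp.analysis.retransmission" | wc -l
tshark -r capture.pcap -Y "tcp.analysis.out_of_order" | wc -l
tshark -r capture.pcap -Y "tcp.analysis.duplicate_ack" | wc -l

# Flag fragmented IP packets:
tshark -r capture.pcap -Y "ip.flags.mf == 1 || ip.frag_offset > 0" | wc -l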


DNS Cache Settings


You can configure settings for the DNS cache.

Settings - Description

TTL No Error Minimum - Specifies the lower boundary on time-to-live for cache hits. The default value is 30 seconds.

TTL No Error Maximum - Specifies the upper boundary on time-to-live for cache hits. The default value is 3600 seconds.

TTL Non-existent Domain Minimum - Specifies the lower boundary on time-to-live for nxdomain. The default value is 15 seconds.

TTL Non-existent Domain Maximum - Specifies the upper boundary on time-to-live for nxdomain. The default value is 3600 seconds.

TTL Other Failures Minimum - Specifies the lower boundary on time-to-live for non-nxdomain failures. The default value is 0 seconds.

TTL Other Failures Maximum - Specifies the upper boundary on time-to-live for non-nxdomain failures. The default value is 60 seconds.

TTL Lower Limit For Server Failures - Specifies the lower boundary on time-to-live for DNS server failures. The default value is 300 seconds.

TTL Upper Limit For Server Failures - Specifies the upper boundary on time-to-live for DNS server failures. The default value is 3600 seconds.


Eager Refresh - Specifies the lead time to refresh cache entries that are nearing expiration. The default value is 0 seconds.

Cache Entry Limit - Specifies the maximum number of entries that the DNS cache can contain. The default value is 65536 entries.

Test Ping Delta - Specifies the delta for checking the cbind cluster health. The default value is 30 seconds.
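
These settings can also be reviewed from the CLI; a minimal sketch assuming the OneFS 8.x isi network dnscache subcommands (availability and option names vary by release, so confirm with --help):

# Show the current DNS cache settings listed in the table above:
isi network dnscache view

# The corresponding modify subcommand exposes an option for each setting:
isi network dnscache modify --help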


Log File Rotation


Logs can increase in size at a rapid pace and some log files grow faster than
others. To manage the size of log files, they undergo a rotation process. Rotation
renames the log file to a version reference, and a new log file is started. Log files
are rotated on a predefined basis that differs based on the log file rate of growth.
The details of log rotation are in /var/log/vmlog.

Contents of /var/log/vmlog file.
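
A minimal illustration using the path given above; the messages log is just one example of a file that rotates:

# Review the rotation details referenced in the text:
cat /var/log/vmlog

# Rotated copies of a busy log carry numeric (and often compressed) suffixes:
ls -lh /var/log/messages*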


Job Types - Access Methods


When a Job Engine job needs to work on a large portion of the file system, there
are three main access methods available to accomplish this:

• Inode (LIN) scan - The most straightforward access method is via metadata,
using a Logical Inode (LIN) Scan. In addition to being simple to access in
parallel, LINs also provide a useful way of accurately determining the amount of
work required.
• Tree walk - A directory tree walk is the traditional access method since it works
similarly to common UNIX utilities, such as find - albeit in a far more distributed
way. For parallel execution, the various job tasks are each assigned a separate
subdirectory tree. Unlike LIN scans, tree walks may prove to be heavily
unbalanced, due to varying sub-directory depths and file counts.
• Drive scan - Disk drives provide excellent linear read access, so a drive scan
can deliver orders of magnitude better performance than a directory tree walk or
LIN scan for jobs that don’t require insight into file system structure. As such,
drive scans are ideal for jobs like MediaScan, which linearly traverses each
node’s disks looking for bad disk sectors.
• Changelist - Some Job Engine jobs utilize a changelist, rather than LIN-based
scanning. The changelist approach analyzes two snapshots to find the LINs
which changed (delta) between the snapshots, and then dives in to determine
the exact changes.


Space Saving Jobs


A space saving job is identified with a flag in the job-config output:
isi_gconfig -t job-config jobs.types.<job type>.pi_alloc_reserved

If the flag is true, the job is a space saving job. The jobs with the flag set by default
are:

• MultiScan
• AutoBalance
• Collect
• AutoBalanceLin
• ShadowStoreDelete
• SnapshotDelete
• TreeDelete
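
For example, the flag for a single job type can be checked directly. The lowercase key name below is an assumption about how the job type appears in gconfig; dump the job-config tree first with isi_gconfig -t job-config if the name differs:

# Show whether TreeDelete is flagged as a space saving job (key name assumed)
isi_gconfig -t job-config jobs.types.treedelete.pi_alloc_reserved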


System Maintenance Jobs: Deep Dive


Key system maintenance jobs include those jobs responsible for data distribution,
integrity, protection, and inode maintenance.

The fundamental responsibility of the system maintenance jobs is to ensure that the
data on /ifs is:

• Protected at the desired level


• Balanced across nodes
• Correctly accounted

• AutoBalance - The goal of the AutoBalance job is to ensure that each node has
the same amount of data on it, in order to balance data evenly across the
cluster. AutoBalance, along with the Collect job, is run after any cluster group
change, unless there are any storage nodes in a “down” state. Upon visiting
each file, AutoBalance performs the following two operations:
− File level rebalancing - evenly spreads data across the cluster nodes in
order to achieve balance within a particular file.
− Full array rebalancing - moves data between nodes to achieve an overall
cluster balance within a 5% delta across nodes.
• AutoBalanceLin - There is also an AutoBalanceLin job available, which is
automatically run in place of AutoBalance when the cluster has a metadata copy
available on SSD. AutoBalanceLin provides an expedited job runtime.


• Collect - The Collect job is responsible for locating unused inodes and data
blocks across the file system. Collect runs by default after a cluster group
change, with AutoBalance, as part of the MultiScan job. In its first phase, Collect
performs a marking job, scanning all the inodes (LINs) and identifying their
associated blocks. Collect marks all the blocks which are currently allocated and
in use, and any unmarked blocks are identified as candidates to be freed for
reuse, so that the disk space they occupy can be reclaimed and reallocated. All
metadata must be read in this phase in order to mark every reference, and must
be done completely, to avoid sweeping in-use blocks and introducing allocation
corruption. Collect’s second phase scans all the cluster’s drives and performs
the freeing up, or sweeping, of any unmarked blocks so that they can be
reused.
• MultiScan - The MultiScan job, which combines the functionality of
AutoBalance and Collect, is automatically run after a group change which adds
a device to the cluster. AutoBalance(Lin) and Collect are only run manually if
MultiScan has been disabled. MultiScan is started when:

− Data is unbalanced within one or more disk pools, which triggers MultiScan
to start the AutoBalance phase only.
− Drives have been unavailable for long enough to warrant a Collect job, which
triggers MultiScan to start both its AutoBalance and Collect phases.
• FlexProtect - responsible for maintaining the appropriate protection level of
data across the cluster. For example, it ensures that a file which is supposed to
be protected at 2x, is protected at that level. Run automatically after a drive or
node removal or failure, FlexProtect locates any unprotected files on the cluster,
and repairs them as quickly as possible. The FlexProtect job includes the
following distinct phases:
− Drive Scan: FlexProtect scans the cluster’s drives, looking for files and
inodes in need of repair. When one is found, the job opens the LIN and
repairs it and the corresponding data blocks using the restripe process.
− LIN Verification: Once the drive scan is complete, the LIN verification phase
scans the inode (LIN) tree and verifies, reverifies and resolves any
outstanding reprotection tasks.
− Device Removal: In this final phase, FlexProtect removes the successfully
repaired drives or nodes from the cluster.


In OneFS 8.2 and later, FlexProtect does not pause when there is only one
temporarily unavailable device in a disk pool, when a device is smartfailed, or
for dead devices.
• FlexProtectLin - is run by default when there is a copy of file system metadata
available on SSD storage. FlexProtectLin typically offers significant runtime
improvements over its conventional disk-based counterpart.
• IntegrityScan - The IntegrityScan job is responsible for examining the entire
live file system for inconsistencies. It does this by systematically reading every
block and verifying its associated checksum. Unlike traditional ‘fsck’ style file
system integrity checking tools, IntegrityScan is designed to run while the
cluster is fully operational, thereby removing the need for any downtime. When
IntegrityScan detects a checksum mismatch, it generates an alert, logs the
error to the IDI logs and provides a full report upon job completion. IntegrityScan
is typically run manually if the integrity of the file system is ever in doubt.
Although the job itself may take several days or more to complete, the file
system is online and available during this time. Also, like all phases of the
OneFS job engine, IntegrityScan can be prioritized, paused, or stopped,
depending on the impact to cluster operations.
• MediaScan - The role of MediaScan within the file system protection framework
is to periodically check for and resolve drive bit errors across the cluster. This
proactive data integrity approach helps guard against a phenomenon known as
‘bit rot’, and the resulting specter of hardware induced silent data corruption.
MediaScan is run as a low-impact, low-priority background process, based on a
predefined schedule (monthly, by default). First, MediaScan’s search and repair
phase checks the disk sectors across all the drives in a cluster and, where
necessary, uses OneFS’ dynamic sector repair (DSR) process to resolve any
ECC sector errors that it encounters. For any ECC errors which cannot
immediately be repaired, MediaScan will first try to read the disk sector again
several times in the hopes that the issue is transient, and the drive can recover.
Failing that, MediaScan attempts to restripe files away from irreparable ECCs.
Finally, the MediaScan summary phase generates a report of the ECC errors
found and corrected.
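
As a quick reference, these maintenance jobs can be inspected and started from the CLI. The impact policy value shown is only an example; confirm the available options with the command help:

# List the available job types and any currently running jobs
isi job types list
isi job jobs list

# Manually start a MediaScan run with a low impact policy (policy name is an example)
isi job jobs start MediaScan --policy LOW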


Small File Storage Efficiency for Archive

The graphic shows mirrored small files that are packed into the shadow store with a more efficient
protection.

SFSE for archive involves:


• SmartPools license
• Shadow store
• Packs files less than 1 MB and places in shadow store
• Infrequently modified, archive-type datasets
• Free space when unpacking

Listed are the points explaining SFSE for archive:

• SmartPools license: SFSE uses a SmartPools file-pool policy to move the


files.
• Shadow store: Shadow stores are similar to regular files, but do not contain the
metadata that is typically associated with regular file inodes. In particular, time-
based attributes such as creation time and modification time are explicitly not
maintained. Shadow stores for storage efficiency differ from existing shadow
stores in order to isolate fragmentation, support tiering, and support future
optimizations.
They are invisible to users and contain data that other files share. Shadow
stores are parity protected using FEC, providing better protection than the
protection small files use. Shadow stores are not designed to handle deletes,
truncates, or overwrites.
• Packs files less than 1 MB and places in shadow store: Efficiency is
achieved by scanning the on-disk data for small files and packing them into


larger containers, or shadow stores. Shadow stores are parity protected using
erasure coding, and typically provide storage efficiency of 80% or greater.
• Infrequently modified, archive-type datasets: Use SFSE to archive static
small file workloads, or workloads with only moderate overwrites and deletes.
• Free space when unpacking: Ensure that the cluster has sufficient free space
before unpacking any containerized data.


Quotas Types
A SmartQuota can have one of four enforcement types:

Enforcement Type Description

Hard A limit that cannot be exceeded. If an


operation such as a file write causes a
quota target to exceed a hard quota, the
operation fails, an alert is logged to the
cluster and a notification is sent to any
specified recipients. Writes resume
when the usage falls below the
threshold.

Soft A limit that can be exceeded until a


grace period has expired. When a soft
quota is exceeded, an alert is logged to
the cluster and a notification is issued to
any specified recipients. However, data
writes are permitted during the grace
period. If the soft threshold is still
exceeded when the period expires,
writes will be blocked, and a hard-limit
notification issued to any specified
recipients.

Advisory An informal limit that can be exceeded.


When an advisory quota threshold is
exceeded, an alert is logged to the
cluster and a notification is issued to any
specified recipients. Reaching an
advisory quota threshold does not
prevent data writes.
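
A sketch of setting these thresholds on a directory quota from the CLI follows. The path, threshold sizes, and grace period are examples, and the option names are assumptions to verify with isi quota quotas create --help:

# One directory quota combining advisory, soft (7-day grace), and hard thresholds
isi quota quotas create /ifs/data/projects directory --advisory-threshold=800G --soft-threshold=900G --soft-grace=7D --hard-threshold=1T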


Enforcement States
There are three SmartQuotas enforcement states:

Enforcement State Description

Under (U) If the usage is less than the


enforcement threshold, the
enforcement is in state U.

Over (O) If the usage is greater than the


enforcement threshold, the
enforcement is in state O.

Expired (E) If the usage is greater than the soft


threshold, and the usage has remained
over the enforcement threshold past
the grace period expiration, the soft
threshold is in state E. If an
administrator modifies the soft
threshold but not the grace period and
the usage still exceeds the threshold,
the enforcement is in state E.

There are a few exceptions to the enforcement of quotas, including the following
scenarios:

• If a domain has an accounting only quota, enforcements for the domain are not
applied.
• Any administrator action may push a domain over quota. Examples include
changing protection, taking a snapshot, removing a snapshot, etc. The
administrator may write into any domain without obeying enforcements.
• Any system action may push a domain over quota, including repair etc. OneFS
maintenance processes are as powerful as the administrator.


Cluster Configuration Data

How do I keep configuration data updated?

Configuration data such as SMB shares, NFS exports, and quota settings are not
replicated with SyncIQ. In a failover, the cluster configuration information must be
configured manually on the remote cluster.

There are alternatives to configuring manually, such as the following:


• Professional services562

562 Professional services can create and install a script that ensures configuration
data on the source cluster is maintained on the target cluster. Without the
professional services script, the best practice is to make configuration changes on
both clusters simultaneously. Use the exact same names for SMB shares and
same aliases for NFS exports. The same naming allows users to connect
seamlessly to the same shares or exports on their system during a failover. Quotas
should be managed on both clusters simultaneously. Best practices would have
quotas on both clusters so there are no potential over quota situations when the
failback occurs.


• Customer created563
• Another option is to use a third-party solution such as Superna Eyeglass.

563
Customers can create custom scripts using ZSH, Bash, or the Platform API. The
downside is the level of complication and limited support.


DataIQ Solution Steps


Click each step to learn more about developing a solution for the problem.

1: Preliminaries:

DataIQ should be installed and have established connections to storage systems


before the meeting. Authentication and group rules should be configured. DataIQ
scans should be done. An initial scan may complete in as little as an hour for a
small cluster. For a large customer with billions of objects, the initial scan may take
several days. Iterative scans complete quicker.

2: Meeting Planning:

Meet after the first scan completes. This could be the same day as the install or, for large
clusters, a day or two later. The meeting must be in front of a DataIQ client browser,
preferably with a large screen. If there are multiple data managers, consider
meeting with one initially. The results of this first meeting help guide the others in
their discussions with you.

3: Rules Guidelines:

If you are experienced with auto-tagging, you can write the rules during the
meeting, in the DataIQ auto-tagging configuration file. Rules can be tested in real
time, and then applied to the entire file system in minutes. Auto-tagging is
reversible. If the results do not meet expectations, tune the rules and retest. The
old tags are automatically removed. Alternately, you can write rules offline, test
against the path examples, and then apply in a follow-up meeting with the
customer.


4: Rules Investigation:

Determine the key customer file system structures. Key structures follow business
rules and represent value to the business. Make a note of file systems where these
rules are followed, the depth of rules, and the exceptions to the rules. For example,
an object at a depth of eight levels is likely a copy and as such, should not be in the
rule. Get path examples and make notes of how the path applies. Use the DataIQ
flagging feature to aggregate paths that need attention.

Ask about key file system policies such as naming conventions. Ask about common
file system errors or violations such as obsolete naming conventions, common
typos, and so on. Be on the lookout for junk names like "old backup," "delete me,"
and "landfill," that could represent unused data. You can create a tag to identify the
junk data.

5: Rules Configuration Knowledge Transfer:

After the rules are written, familiarize the managers with the DataIQ Analyze
functionality. Guide them through an analysis to emphasize the simplicity of
generating the reports that they want. Ensure that they know how to act on the results,
perform an analysis, and recognize the need for additional refinement.

6: Meeting Follow Up:

Customers frequently pick up on the rules patterns and write their own. Regardless,
a scheduled routine to check or update the rules is a good opportunity to ensure
that reports continue to meet the business needs. Also, check ups provide the
opportunity to engage with the customer and better understand their pain points
and future needs.


Multi-factor Authentication

Multi-Factor Authentication (MFA) Overview

Definition

Multi-factor authentication (MFA) is a method of computer access control in which the user is only granted access after
successfully presenting several separate pieces of evidence to an authentication mechanism.

• MFA increases the security564 of a cluster.


• MFA enables the lsass daemon to require and accept multiple forms of
credentials other than a username or password combination for some forms of
authentication.
• You can implement MFA in many ways, the most common being public or
private key authentication.
• The MFA feature adds PAPI support for SSH configurations using public keys
that are stored in LDAP.

564
Increasing the security of privileged account access (for example,
administrators) to a cluster is the best way to prevent unauthorized access.


Multi-factor Authentication in OneFS

The Duo security platform handles MFA support for SSH with PowerScale.

The Duo service offers flexibility by including support for the Duo App565, SMS566,
voice567, bypass codes568 and USB keys569.

565
Approve login requests via smartphone and smartwatch using the Duo Mobile
app.

566 Receive passcodes by text message to quickly authenticate.

567 Receive a call via cell phone, landline, or car phone to quickly authenticate.


The SSH implementation in OneFS 8.2 and later includes:

• Support for MFA with the Duo Service in conjunction with passwords, public
keys, or both.
• The public keys for users are stored in the LDAP server.
• SSH is configured by using the OneFS CLI.

568Duo enables the creation of permanent, one-time, or date/time limited bypass


keys for a specific user or group to bypass MFA.

569 Physical USB devices can be used exclusively to verify logins.


SSH Multi-Factor Authentication with Duo

• Duo requires an account with the Duo service (duo.com).


• Duo provides the host, integration key (ikey), and secret key (skey) needed for
configuration570.
• Duo cannot be configured if the SSH authentication type is set to any or
custom.

570 Duo can be disabled and re-enabled without reentering the host, ikey, and skey.


• Specific users or groups can bypass571 MFA if specified on the Duo server.
• Duo uses a simple name match572 and is not AD aware.
• Duo has two fail modes specifying what to do if the Duo service is
unavailable: Safe573 and Secure574.

SSH Multi-Factor Authentication with Duo Procedure

Click the steps below to know more about the process of SSH Multi-factor
authentication with Duo.

571A bypass key does not work if auto push is set to true as no prompt option is
shown to the user.

572The AD user ‘DOMAIN\john’ and the LDAP user ‘john’ are the same user to Duo.

573 In safe mode SSH will allow normal authentication if Duo cannot be reached.

574In secure mode SSH will fail if Duo cannot be reached. This includes ‘bypass’
users, since the bypass state is determined by the Duo service.


1:

• Configure on Duo.
• Go to Dashboard > Application > Protect an Application.
• PowerScale cluster is represented as a UNIX application.
• Three components that are generated are: Integration Key, Secret Key, and API
Hostname.

2:

• Use the same usernames as are available (from the authentication providers) on the
PowerScale cluster, for example root and admin.
• Choose how the user gets Duo notifications (phone, app).
• Duo allows bypass codes.
• Users can be imported using a .csv file.

3:

• Modify SSH auth requirements.


• By default, the authentication setting template is set for "any." With Duo, the
template must be set to "password", "public key", or "both."
• Use Duo service API-key and integration-key.

Inclusion and Bypass Groups

Specify a group option for use with the Duo service or for exclusion from the Duo
service. One or more groups can be associated. You can configure three types of
groups.
• Local groups (local authentication provider)
• Remote authentication provider groups (for example LDAP) - can add users
without a Duo account to the group.
• Duo groups that are created and managed though the Duo Service.

• Add users to the group and specify as Bypass - users of this group can SSH
in without MFA.


• The Duo service must be contacted to determine if the user is in the bypass
group or not.
Administrators can create a local or remote provider group as an exclusion group
using the CLI575. Users in this group are not prompted for a Duo key.

Important: OneFS checks the exclusion before contacting Duo. This


is a method for creating users that can SSH into the cluster when the
Duo Service is not available and the fail mode is set to secure.

575Zsh may require escaping the ‘!’. If using such an exclusion group, precede it
with an asterisk to ensure that all other groups require the Duo one-time key (“--
groups=“*!”).


MFA Administration - isi ssh Command

List all the configuration options and syntax. The output shows ciphers, algorithms, ignore rhosts, and so on.

SSH has CLI support to view and configure exposed settings using the isi ssh
settings view and isi ssh settings modify commands.

• The --user-auth-method option is used to configure the authentication


method.576
• The --match option creates block of rules applied to subsets of users.577
• The log level for SSH578 can be modified with the --log-level option.

576This option ensures that the correct sets of settings are placed in the required
configuration files. The settings are password, public key, both or any.

577 Match blocks usually span multiple lines. If the option starts with --match=“, it
allows line returns and spaces until reaching the end quote (“).

578 It takes the default values allowed by SSH.


An upgrade imports the existing SSH configuration579 into gconfig.

Settings580 not exposed by the CLI can still be set in the


/etc/mcp/templates/sshd.conf file.

Important: The current SSH session stays connected after


configuration changes are made. Keep the session open until the
configuration changes are tested. Closing the current session with a
bad configuration may prevent SSH login.

MFA Administration - isi auth duo Command

List all the configuration options and syntax.

The isi auth duo modify command is used to configure the Duo provider
settings.

579 The upgrade includes settings that are exposed and not exposed by the CLI.

580These settings will be propagated to the actual /etc/ssh/sshd_config file by mcp


as well as imported into gconfig.


• The --autopush option specifies if Duo automatically sends a push login


request to the user’s phone.
• The --failmode option specifies if Duo will fail safe or secure on configuration
or service errors.
• The --http-proxy option specifies the HTTP proxy to use in case one is
configured.

The isi auth duo view command is used to view the configured settings.
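
A minimal configuration sketch using only the options described above (the values are examples, and the Duo host, ikey, and skey obtained from the Duo service must also be configured; those options are not shown here):

# Require password authentication for SSH so Duo can prompt for a one-time key
isi ssh settings modify --user-auth-method=password

# Let users choose a device rather than auto-pushing, and fail safe if Duo is unreachable
isi auth duo modify --autopush=false --failmode=safe

# Confirm the resulting settings
isi ssh settings view
isi auth duo view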

SSH Authentication Process

The SSH authentication consists of five steps.

1: The administrator sets the user authentication method to either public key,
password or both using the isi ssh settings modify command.

2:


• When the user authentication method is set to public key or both, the private
key of the user is provided at start of session. This is verified against user's
public key (from home directory or LDAP).
• When the user authentication method is set to password or both, the SSH
server requests the user’s password, which is sent to PAM and verified against
the password file or LSASS.

3: If Duo is enabled, the user's name is sent to the Duo service.

• If autopush is set to yes, a one-time key is sent to the user on the configured device.
• If autopush is set to no, the user chooses from the list of devices that are linked to
the account, and a one-time key is sent to that device.
• The user enters the key at the prompt, and the key is sent to Duo to verify that it is correct.

4: The user is checked for the appropriate RBAC SSH privilege.

5: If all of the above steps succeed, the user is granted SSH access.

LDAP Public Keys

OneFS 8.2 and later enables the use of public SSH keys from LDAP rather than
from a user's home directory on the cluster.

• The LDAP create and modify commands support the --ssh-public-key-


attribute option.
• The most common attribute for the --ssh-public-key-attribute option
is the sshPublicKey attribute from the ldapPublicKey object class.
• The public key for a user may be viewed by adding --show-ssh-key option.
• You can specify multiple keys in the LDAP configuration. Where there is a match,
the key that corresponds to the private key in the SSH session is used.
• The user needs a home directory on the cluster; without a home directory the
user gets an error when logging in.
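
For example, an existing LDAP provider can be pointed at the public key attribute; the provider name below is a placeholder:

# Tell the LDAP provider which attribute holds users' public SSH keys
isi auth ldap modify <provider-name> --ssh-public-key-attribute=sshPublicKey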


Antivirus

Antivirus Overview

OneFS allows:

• File system scanning for viruses, malware, and other security threats on a
PowerScale cluster
• Integration with third-party scanning services through the Internet Content
Adaptation Protocol (ICAP)
• Sending files through ICAP to a server running third-party antivirus scanning
software

These ICAP servers run the antivirus software and files are scanned for threats on
ICAP servers, not the cluster itself.

Note: Antivirus scanning is available only on nodes in the cluster that


are connected to the external network.


Antivirus Process

When an ICAP server scans a file, it informs OneFS of whether the file is a threat. If
a threat is detected:

• OneFS informs system administrators by creating an event.


• Displays near real-time summary information
• Documents the threat in an anti-virus scan report

OneFS can request ICAP servers attempt to:

• Repair infected files.


• Configure OneFS to protect users against potentially dangerous files by
truncating or quarantining infected files.

Before OneFS sends a file to be scanned, it ensures that the scan is not redundant.
Scanned files, unmodified from a previous scan, are sent for rescanning only if the
virus database on the ICAP server was updated since the last scan.

Antivirus Process Steps

1: The end user requests a file from the cluster.

2: OneFS verifies the file scan requirements, for example, file modification since
the last scan or a recent update of the antivirus definition file. In such cases, the file
is placed on the scan queue.


3: The requested file is assigned a worker thread, which sends it to an ICAP server.
If the requested file is excluded from scanning by path or glob filter, it is skipped.

4: The ICAP server determines whether the file is clean or needs repair or quarantine.
If clean, the server responds to isi_avscan_d and the file is marked as safe with
metadata such as the ISTag, the last scan date, and other attributes.

5: The PowerScale cluster serves the file to the end user.

Scan Types

OneFS supports three types of scans.

On-Access Scanning

You can configure OneFS to send files for scanning581 before they are opened,
after they are closed, or both instances.

Sending files to be scanned:

• After the files are closed is faster but less secure.


• Before they are opened is slower but more secure.

Antivirus Policy Scanning

Using the OneFS Job Engine, you can create antivirus scanning policies, which
sends the files from a specified directory to be scanned.

581
Scanning can be done through file access protocols such as SMB, NFS, and
SSH.


• Antivirus policies can run manually at any time, or configured to run according to
a schedule.
• Antivirus policies target a specific directory on the cluster.

Individual file scanning

Individual file scanning sends a specific file to an ICAP server for scanning at any
time.

• If a virus is detected in a file but the ICAP server is unable to repair it, OneFS
can send the file to ICAP server582.
• To perform an individual file scan, run the following CLI command: isi
antivirus scan <file path>

On-Access Scanning: If configuring OneFS to scan files before they are opened,
also configure OneFS to scan files after they are closed. Scanning files as they are
both opened and closed will not necessarily improve security, but usually improves
data availability. When a file is scanned after it is closed and a user later accesses
that file, it does not need to be scanned again unless the antivirus server database
has been updated. As a result, most of the time a scan before open is not needed
when a file is accessed multiple times.

Antivirus Policy Scanning: OneFS can prevent an antivirus policy from sending
certain files for scanning within the specified root directory based on the size,
name, or extension of the file.

582This file is sent after the virus database has been updated. The ICAP server
might then be able to repair the file. Scanning files individually tests the connection
between the cluster and ICAP servers.


WORM Files and Antivirus

The SmartLock software module identifies a directory in OneFS as a WORM


domain.

• OneFS commits all files in a WORM domain to a WORM state.


• You cannot overwrite, modify, or delete files in a WORM domain.
• ICAP can scan WORM files for viruses and other security threats.
• ICAP cannot repair or delete WORM files during an antivirus scan.
• If a WORM file is found to be a threat, ICAP quarantines the file.

Best Practice: Administrators can initiate an antivirus scan on files


before the files are committed to a WORM state.

Antivirus software can scan and quarantine the Write-Once, Read-Many (WORM)
files, but cannot repair or delete WORM files until their retention period expires.

ICAP Servers

The number of ICAP servers that are required to support a PowerScale cluster
depends on how you configure583 virus scanning.

Measuring sizing results from ICAP servers:

583The amount of data a cluster processes, and the processing power of the ICAP
servers.


In general, the workload for ICAP servers is CPU intensive.

If the CPU utilization of the ICAP servers is over 95%, it is recommended to add
more CPU to the ICAP servers or add more ICAP servers to the OneFS antivirus
solution.

Measuring sizing results from PowerScale cluster:

OneFS provides two metrics to indicate whether the ICAP service can sustain the
workload584.

• too_busy status
• fail to scan ratio

See Dell EMC PowerScale: Antivirus Solutions for additional information.

Caution: When files are sent from the cluster to an ICAP server, they
are sent across the network in cleartext. Ensure that the path from the
cluster to the ICAP server is on a trusted network. Authentication is
not supported. If authentication is required between an ICAP client
and ICAP server, hop-by-hop Proxy Authentication must be used.

If the user intends to:

• Scan files through antivirus scan policies, it is recommended to have a minimum


of two ICAP servers per cluster.

584If either of these errors occur, it usually indicates that there are not enough ICAP
servers to catch up with the speed of the workload. Add more ICAP servers until
the workload is manageable.


• Scan files on-access, it is recommended to have at least one ICAP server for
each node in the cluster.
• Configure more than one ICAP server for a cluster, ensure that the processing
power of each ICAP server is relatively equal. OneFS distributes files to the
ICAP server on a rotating basis, regardless of the processing power of the ICAP
servers.

too_busy status: PowerScale internally keeps a list of the status of ICAP servers
that are connected to isi_avscan_d. All state information of isi_avscan_d
including the status for the ICAP servers is recorded in the file
/var/log/isi_avscan_d.log.

If the too_busy state is set to true, this state means that an ICAP server is busy and
not able to respond with the expected reply. The too_busy state indicates that there
are not enough ICAP servers for the workload. Add more ICAP servers until the
too_busy state is false for all ICAP servers.
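
To check this quickly from a node, the state can be searched for directly in the log mentioned above (the grep pattern is only an example):

# Look for busy ICAP server entries in the isi_avscan_d log
grep too_busy /var/log/isi_avscan_d.log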

fail to scan ratio: The failed-to-scan ratio is calculated as follows:

Failed to scan ratio = (Failed numbers / Scanned numbers) × 100%

A high failed-to-scan ratio can occur for various reasons, such as an ICAP socket
timeout or a poor network condition. A higher ratio value means that there are not
enough ICAP servers to keep up with the speed of the workload, especially when
using scan on close. In this case, add more ICAP servers and check whether the
failed-to-scan ratio is reduced.

Performance Factors

The performance of ICAP servers and its performance impact on the PowerScale
cluster depend on many factors. Click the factors to learn more.

Not All Nodes on Network (NANON) Configuration

The OneFS policy scan does not work with a cluster node that is not on the network
(NANON). A node not connected to the network cannot connect to the ICAP server,
which causes the antivirus job engine to fail. See the KB article for more information.


Updating the Virus Definition File

Most antivirus vendors update their virus definition file at least once a day to
eliminate the potential threats of a new virus.

• Through a scheduled job or a manual update, the updated virus definition files
are pushed to all the ICAP servers.
• At the same time, the ICAP service tag (ISTag) is updated at the ICAP server
level.
• OneFS maintains a timer job to synchronize the ISTags from ICAP servers
every hour.
• The timer job can result in a maximum wait of one hour for ISTags to be
updated on the cluster.

Dell EMC recommends setting an interval for updating the virus definition file for
ICAP servers to align with your scan policy. This alignment avoids unnecessary
scans that could negatively impact overall performance. For the details of the ISTag,
refer to IETF RFC 3507.

File Size

File size is a key performance factor for ICAP servers integrated with PowerScale.

• Less than 1 MB file size: This typically results in a very steady and gentle trend
of the value for the scanned files per second.
• Greater than 1MB file size: This typically results in the value of the scanned files
per second decreasing quickly.

Note: Scanned files per second could vary depending on the PowerScale node
type, node number, ICAP server vendors, ICAP server number, network bandwidth,
and other factors. This number should remain steady for small files less than 1 MB.

Network Bandwidth

For optimal performance, Dell EMC recommends the following best practices for
the network bandwidth of ICAP servers, depending on the average file size.

• Less than 1 MB average file size: 1 Gbps for ICAP servers
• More than 1 MB average file size: 10 Gbps for ICAP servers


ICAP Server Threads

The number of ICAP server threads is one of the most important configurations
regarding the ICAP server. Vendors have different recommendations for numbers
for threads, and within the same vendor there can be different versions with
different thread recommendations.

The following is a general recommendation to use as a starting point. Test different


thread numbers to determine the best value for your environment.

• McAfee: 50 to 100
• Symantec: ~20

Number of Files in a Directory

The CPU utilization on the PowerScale node can be high when there is a large
number of files in a directory to be scanned. At the same time, the overall scanning
performance is degraded.

For detailed recommendations per disk type, refer to the following:

• HDD: 20,000 files per directory


• SSD: 1,000,000 files per directory

For a detailed explanation, refer to the KB article isi_avscan_d process utilizes a lot
of processor when accessing directories with a high number of files.

Antivirus Administration

To go to the antivirus option in the WebUI, click the Data Protection tab, and then click
Antivirus.

CLI command: isi antivirus policies list



1: Antivirus policies are created that cause specific files to be scanned for viruses
each time the policy is run. Users can modify and delete antivirus policies. Antivirus
policies can be temporarily disabled should users want to retain the policy but do
not want to scan the files.

Multiple files can be scanned for viruses by manually running an antivirus policy, or
scan an individual file without an anti-virus policy. These scans can also be
stopped.

2: Antivirus reports can be viewed through the web administration interface. Events
that are related to anti-virus activity are also viewable.

3: Files can be repaired, quarantined, or truncated where threats are detected. A file
can be quarantined to prevent user access to it. If a quarantined file is no longer
believed to be a threat, the file can be rescanned or removed from quarantine. Files
that have been identified as threats by an ICAP server can also be viewed.

4: Before the user can send files to be scanned on an ICAP server, they must
configure OneFS to connect to the server. The user can test, modify, and remove an
ICAP server connection, and can temporarily disconnect and reconnect to an ICAP server.

A user can add and connect to an ICAP server. After a server is added, OneFS can
send files to the server for virus scanning. If the user prevents OneFS from sending


files to an ICAP server, yet wants to retain the ICAP server connection settings, the
user can temporarily disconnect from the ICAP server.

5: A user can configure global antivirus settings that are applied to all antivirus
scans by default in the Settings tab.

Managing ICAP Servers

Before you can send files to be scanned on an ICAP server, you must configure
OneFS to connect to the server. You can test, modify, and remove an ICAP server
connection. You can temporarily disconnect and reconnect to an ICAP server.

Path: Data protection > Antivirus > ICAP servers

Follow the steps given below to manage ICAP servers:

• Click Data protection > Antivirus > ICAP Server


• In the ICAP server area, click Add an ICAP server
• Optional: To enable the ICAP server, click Enable ICAP server.
• In the ICAP server URL field, type the IPv4 or IPv6 address of an ICAP server.
• Optional: To add a description of the server, type it into the Description field.
• Click Add Server
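
The same task can be scripted with the isi antivirus servers commands referenced later in this appendix. The URL and option shown below are assumptions to check against the command help:

# Register an ICAP server and confirm it is listed (URL and flag are examples)
isi antivirus servers create icap://192.0.2.10 --enabled=true
isi antivirus servers list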


Creating an Antivirus Policy

Users can create an antivirus policy that causes specific files to be scanned for
viruses each time the policy is run.


The policy options include the following:

• Recursion depth dictates how much of the specified directories you want to scan.
• Select Enable force run of policy regardless of impact policy to scan all files regardless of a previous scan, or if global settings specify that certain files should not be scanned.
• To modify the default impact policy of the antivirus scans, select a new impact policy from the Impact policy list.
• Finally, click to create the antivirus policy.

Path: Data protection > Antivirus > Policies
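
A corresponding CLI sketch follows, with a hypothetical policy name and path; the option names are assumptions to verify with isi antivirus policies create --help:

# Create and then list an antivirus policy targeting a specific directory (names and flags are examples)
isi antivirus policies create MediaPolicy --paths=/ifs/data/media --enabled=yes
isi antivirus policies list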


Antivirus Scan Reports

OneFS generates reports about antivirus scans. Each time that an antivirus policy
is run, OneFS generates a report for that policy. OneFS also generates a report
every 24 hours that includes all on-access scans that occurred during the day.

Antivirus scan reports contain the following information:


• The time that the scan started and ended.
• The total number and size of the files scanned.
• The total network traffic sent, and the network throughput virus scanning
consumes.
• The total number and names of the infected files detected.
• The threats that are associated with infected files and how OneFS responded to
detected threats.
• Finally, whether the scan succeeded or not.

Antivirus Threat Responses

You can configure how OneFS responds when the ICAP server detects a virus in a
file: send an alert, repair, quarantine, or truncate the file.

Threat Response Description

Alert OneFS generates an alert at the warning level for all


threat detection, regardless of the threat response
configuration.

Repair The ICAP server attempts to repair the infected file


before returning the file to OneFS.

Quarantine OneFS quarantines the infected file. A quarantined file
is not accessible by any user. However, the root user
can remove a quarantined file while connected to the
cluster through secure shell (SSH). Quarantine
operates independently of access control lists (ACLs).


Truncate OneFS truncates the infected file. When a file is


truncated, OneFS reduces the size of the file to zero
bytes to render the file harmless.

Tip: For more information about antivirus threat responses, see the
OneFS Web Administration Guide.

Configure Global Antivirus Settings

Users can configure global antivirus settings that are applied to all antivirus scans
by default.

• To exclude files based on file size, in the Maximum file scan size area, specify the largest file size to scan.
• To exclude files based on file name, select Enable filters.
• OneFS can be configured to automatically scan files as they are accessed by users. On-access scans operate independently of antivirus policies. A setting is also available to require that all files be scanned before they are opened.
• Specify whether you want to allow access to files that cannot be scanned by selecting or clearing Enable file access when scanning fails.
• To scan files after they are closed, select Enable scan of files on close.


Tip: See the OneFS Web Administration Guide for wildcard


characters to add to customize a filter.

Antivirus isi Commands

Example 1

To view the status of the recent scans, enter the following command: isi
antivirus reports scans list

Output that shows the information for a scan.

Example 2

For more details about a scan, use the following command: isi antivirus
reports scans view <id>


Output that shows ID and the status of the scan reports.

Other Commands

The table shows some other Antivirus isi commands.

isi commands Usage

isi antivirus settings modify To specify any configuration changes to


the target files.

isi antivirus servers create To add and connect to an ICAP server.

isi antivirus policies list To view the antivirus policies.

Tip: For more information regarding antivirus isi commands, see


OneFS CLI Administration Guide and OneFS CLI Command
Reference.


SnapshotIQ and OneFS Feature Integrations

Snapshot and OneFS

• SnapshotIQ allows an administrator to create a frozen, point-in-time view of


OneFS, while allowing normal file system modifications to continue without
interruption.
• Efforts are made to ensure that Snapshots functionality consumes minimal
system resources (disk space, CPU, etc.) while providing maximum flexibility to
the system administrator and users.
• Snapshots can be used on their own to provide functionality like user-initiated
file restoration and staging of exported content, or in conjunction with other
OneFS features such as Backup and SyncIQ to enhance the power and
flexibility of those applications.
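
As a quick point of reference, snapshots themselves are managed with the isi snapshot snapshots commands. The path, name, and expiry shown below are examples, and the exact argument order is an assumption to verify with the command help:

# Create a snapshot of a directory that expires after two days (syntax assumed)
isi snapshot snapshots create /ifs/data --name=DataSnap --expires=2D

# List existing snapshots
isi snapshot snapshots list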

CloudPools and Snapshots

CloudPools can archive datasets that have associated snapshots. However,


archiving snapshot files to the cloud does not result in space savings on the cluster
until all the snapshots taken prior to archiving have either expired or been deleted.
CloudPools 2.0 delivers increased snapshot efficiency for files with older, unexpired
snapshots:

• Eliminates snapshots data CoW on archive


• Data consumed by snapshots pre-archive remains on-premise
• Cache invalidation and write-back in snapshots
• Faster recall performance
• Caching is enabled on snapshots and RO/DR file systems
• Fast I/O to stubs in snapshots

Changelist Job and Snapshots

In order to discover its scope of work, one class of Job Engine jobs
utilizes a 'changelist' rather than a full LIN-based scan.


• The changelist approach analyzes two snapshots to find the LINs which
changed (delta) between the snapshots, and from there determines the exact
changes.
• SyncIQ replication and the File System Analyze (FSAnalyze) cluster analytics
are good examples of a job that leverages snapshot deltas and the
ChangelistCreate mechanism.
• The FilePolicy and FSAnalyze jobs in OneFS 8.2 and later automatically share
the same snapshots and index, created and managed by the IndexUpdate job.

− The new index stores considerably more file and snapshot attributes than
the old FSA index. Until the IndexUpdate job effects this change, FSA keeps
running on the old index and snapshots.

SmartPools Tiering and Snapshots

OneFS uses the SmartPools jobs to apply its file pool policies. To accomplish this,
the SmartPools job visits every file, and the SmartPoolsTree job visits a tree of
files. However, the scanning portion of these jobs can result in significant random
impact to the cluster and lengthy execution times, particularly in the case of
SmartPools job.

• To address this, the FilePolicy job, included with OneFS 8.2, and later provides
a faster, lower impact method for applying file pool policies than the full-blown
SmartPools job.
• In conjunction with the IndexUpdate job, FilePolicy improves job scan
performance by using a file system index or changelist, to find files needing
policy changes, rather than a full tree scan.
• This dramatically decreases the amount of locking and metadata scanning work
the job is required to perform, reducing impact on CPU and disk - albeit at the
expense of not doing everything that SmartPools does.
• The FilePolicy job enforces just the SmartPools file pool policies, as opposed
to the storage pool settings.

However, the vast majority of the time SmartPools and FilePolicy perform the same
work. Disabled by default, FilePolicy supports the full range of file pool policy
features, reports the same information, and provides the same configuration
options as the SmartPools job.


NDMP and Snapshots

• The NDMP Snapshot Management Extension Interface leverages the


extensibility of NDMP v4 to define a mechanism and protocol for controlling
primary storage file system images commonly referred to as snapshots.
• Specifically, this interface supports the management of automated and manual
snapshot creation, snapshot deletion, and general directory browsing as well as
full snapshot recovery and selective file recovery.
• This interface provides functionality allowing snapshots to be used to implement
near-line data protection solutions that offer faster backup and recovery times
compared to traditional tape based secondary storage.

In-line Data Reduction and Snapshots

• In-line data reduction, introduced in OneFS 8.1.3 for the F810 platform, will not
affect the data stored in a snapshot.
• However, snapshots can be created on compressed data. If a compression tier
is added to a cluster that already has a significant amount of data stored in
snapshots, it will take time before the snapshot data is affected by compression.
• Newly created snapshots will contain compressed data, but older snapshots will
not.

SmartPools and Snapshots

The snapshot storage target setting is applied to each file version by SmartPools.
When a snapshot is taken, the storage pool setting is simply preserved, which
means that the snapshot will initially be written to the default data pool and only
moved later.

The SmartPools job subsequently finds the snapshot version and moves it to the
intended pool during the next scheduled SmartPools job run.


When using SmartPools, snapshots can be stored on a different disk tier than the one the original data
resides on.

Saving Snapshots to a different storage tier.

SyncIQ Replication and Snapshots

SyncIQ also leverages snapshots for the consistency points required to facilitate
replication, failover, and failback between PowerScale clusters. This means that
only the changes between the source and target datasets need to be replicated
between the two clusters. This helps for efficient replication and granular recovery
objectives. The snapshots generated by SyncIQ can also be used for archival
purposes on the target cluster.

Source Cluster Snapshots

SyncIQ creates snapshots on the source cluster to ensure that a consistent point-
in-time image is replicated, and that unaltered data is not sent to the target cluster.

• SyncIQ replicates data according to the snapshot rather than the current state
of the cluster, allowing users to modify source-directory files while ensuring that
an exact point-in-time image of the source directory is replicated.
• SyncIQ can also replicate data according to either an on-demand or scheduled
snapshot generated directly by SnapshotIQ. If data is replicated using a
SnapshotIQ snapshot, SyncIQ does not generate another snapshot of the
source directory.
• SyncIQ generates source snapshots to ensure that replication jobs do not
transfer unmodified data. When a job is created for a replication policy, SyncIQ
checks whether it is the first job created for the policy.


− If not, SyncIQ compares the snapshot generated for the earlier job with the
snapshot generated for the new job.
• SyncIQ replicates only data that has changed since the last time a snapshot
was generated for the replication policy. When a replication job is completed,
SyncIQ deletes the previous source-cluster snapshot and retains the most
recent snapshot until the next job is run.

Target Cluster Snapshots

When a replication job is run, SyncIQ generates a snapshot on the target cluster to
facilitate failover operations. When the next replication job is created for the
replication policy, the job creates a new snapshot and deletes the old one.

• If a SnapshotIQ license has been activated on the target cluster, you can
configure a replication policy to generate additional snapshots that remain on
the target cluster even as subsequent replication jobs run.
• SyncIQ generates target snapshots to enable failover on the target cluster
regardless of whether a SnapshotIQ license has been configured on the target
cluster.
• Failover snapshots are generated when a replication job completes. SyncIQ
retains only one failover snapshot per replication policy and deletes the old
snapshot after the new snapshot is created.


Disaster Recovery Plans

Designing a Disaster Recovery Plan

Does my plan consider PowerScale resiliency and address recovery steps?

The disaster recovery plan documents the information needed for an organization
to react and act during a disaster scenario.

There is no single plan, as each organization has its own priorities and recovery
objectives.

Consider impact outside of PowerScale:


• Authentication and Authorization
• DNS
• VIPs
• Mount points and shares


• DFS
• Client impact

1: A starting point is to consider what effect a disaster has on the organization.
Can the organization recover and remain viable if its site and all the data go
away? The answer is probably “no”, and therefore a plan should be in place if a
disaster occurs.

2: What workflows are critical for the business and how long can they be offline?
The plan design ranks the organization’s workflows and applications, determining
recovery objectives for each.

3: Each workflow should have a runbook that outlines a step-by-step process for
recovery.

4: The plan is holistic, meaning all the organization’s functional areas in all affected
facilities are considered.

5: Multiple recovery options should be considered to include cloud storage, tape,


disk backups, and asynchronous replication.

6: Test and validate the strategy. Do not just design and set it up; teams must test and
ensure it works. Include call trees, response actions, and what-if scenarios in the
test plan. Testing can be done as a walkthrough or a full interruption. Failback can
take an unexpected time to complete, exceeding testing windows. Whereas a
failover takes 30 seconds, a failback could take a week, depending on the size and
the amount of changed data. The test plan should be dynamic and reviewed and
updated periodically.

7: Maintenance can include contacts, teams, workflows, SOPs, and runbook


updates.


Site Disaster Recovery Scenarios

In my organization, what
scenarios apply to my
workflows? Why?

Several factors can help determine what type of scenario applies to a workflow.
First is analyzing the criticality of the workflow.

• A workgroup collaborating on a media editing project may not be affected as


much as files that are needed to meet a deadline.
• An example of an organization that has no tolerance to data downtime is a
financial institution that deals in online stock orders and banking transactions.

Asynchronous Replication:

• Restricted downtime workflows
• Must get the data back, but not an emergency


Synchronous Replication:

• No downtime workflows
• Every minute of downtime costs the business money

Media Recovery:

• Lightweight disaster recovery


• Loss of source data painful, but not a barrier to doing the job

What kind of data protection must the organization maintain to meet an acceptable
level of disruption? Is the data or access to the data important enough to warrant
the cost of a hot site? Is any amount of data loss detrimental to the business? How
important is it to business continuity if the workflow is down? Will the business lose
clients, jobs, money, or reputation if the data is inaccessible for extended periods?
SyncIQ replicates asynchronously and is not a high availability disaster recovery
strategy.

Lightweight Disaster Recovery

Scenario

The scenario highlights a lightweight disaster recovery plan for a media and
entertainment organization.


Workflow analysis:
• Workflow - media directory for file sharing and protection
• Business continuity risk is low
• Disaster recovery solution in line with business continuity
• ROI appropriate for workflow
• Dependencies
• Administrator makes recommendation, management decides

Solution

The remote disaster recovery solution shows the workflow that is replicated to a
remote office.

The graphic shows the high-level architecture of the workflow: users access the source cluster, data replicates across the network to the target cluster, and past media projects are archived to tape with a long RTO.

Business continuity criteria:


• The criticality of the users’ media share is not high, hence the files are
synchronized to the target nightly.
• Past media projects are archived from the target cluster to tape.
• If a recovery of archive data is needed, resources may need to be implemented
at the remote facility to recover archive from tape.
• The analysis shows that the workgroup must have at least all the current projects, with an RTO and RPO of 1 day.

Plan to Support

The disaster recovery plan for the lightweight workflow should include or link to the disaster recovery runbook for the workflow.


Workflow SOP585:
• Disaster declared
• PowerScale administrator follows procedures to fail over
• Assessment made for non-critical data

Scenario: The users edit and update media files, audio and video editing and then
save their work on the /ifs/core/data/media directory on the PowerScale
cluster. Analysis on the workflow concludes that the downtime for the share has a
low risk to business continuity. The workgroup can do their work during extended
downtime of the core cluster or facility. If the core cluster is lost, the users can still
do their job, but may have to recreate some of the work. The PowerScale
administrator makes the recommendation for the workflow’s disaster recovery plan,
management decides. Keep in mind that decisions are made holistically,
considering other areas such as client, network, and critical workflows.

585 Standard Operating Procedure


Solution: When a failure happens and access is switched to the remote office, users can access the previous day's work. Losing some work is painful for the users, but does not impact the business. Users may need to re-create some of the work. This diagram may be found in the disaster recovery plan, but without granular details such as switches, operating systems, and versions.

The disaster recovery plan for the lightweight workflow should include or link to the disaster recovery runbook for the workflow. The RTO for the media workflow in the
disaster recovery plan is 24 hours. The runbook details the communications,
actions, and checks that the PowerScale administrator must perform. Prolonged
downtime to the source cluster may warrant implementing systems and restoring
from tape at the target site.

Restricted Downtime

Scenario

The restricted downtime scenario features a hospital that has a central repository
for patient records stored on the PowerScale cluster.

Workflow analysis:
• Workflow – patient data records
• Business continuity – delayed access is acceptable
• Disaster recovery solution in line with business continuity


• ROI appropriate for workflow


• Dependencies

Solution

The remote disaster recovery solution shows the workflow that is replicated to a
remote office.

The graphic shows the high-level architecture of the workflow: users access the source cluster, data replicates across the network to the target cluster, and archive data is tiered to the cloud.

Business continuity criteria:


• The disaster recovery solution for this scenario uses SyncIQ to replicate the
/ifs/source/data/records directory to a remote facility.
• The synchronize policy replicates on a two-hour schedule.
• The workflow runbook may include greater detail such as switches, operating
systems and versions, CLI commands to check and verify settings and
configuration.
• Though doctors can still see patients without access to the records, patient records are critical and need continual updates.
• The analysis shows that the staff must have access to all patient records, and
an RTO and RPO of 4 hours.

Plan to Support

The graphic shows how the high-level milestones for the plan may look.


Workflow SOP:
• The disaster recovery teams for the hosting IT company are notified and
mobilized.
• The workflow is failed over, verified and access is confirmed.
• PowerScale administrator ensures that the target directory is accessible.
• The runbook details the communications, actions, and checks that the
PowerScale administrator must perform.
• Because archive-type data is tiered to the cloud, restores from tape at the target are not needed.

Scenario: The restricted downtime scenario features a hospital that has a central
repository for patient records that are stored on the PowerScale cluster. The cluster
is shared with physicians and their staff. The staff accesses the patient records to
get information and medical history. The patient record data is kept in the
/ifs/source/data/records directory. In a disaster, the inability to immediately access
records is not an emergency that hurts the business. The staff and physicians can
still see the patient and meet their needs.

Solution: This diagram may be found in the disaster recovery plan.


No and Very Low Downtime

Scenario

This scenario highlights a financial trading organization.

Financial trading:
• Workflow - ticker data analysis
• Delayed access to the data places the business at risk


Solution

Business continuity criteria:


• The SyncIQ policy to support the low downtime disaster recovery workflow uses
a synchronize SyncIQ policy on the /ifs/source/data/tick_near folder to
the target cluster.
• The job runs when the source is modified, but with a sync delay of two minutes.
• The archive data in the /ifs/source/data/tick_hist folder is replicated daily.
• This scenario uses Superna Eyeglass to automate the failover.
• The analysis determines the organization must have access to all tick data with
the lowest possible RTO and RPO.

Plan to Support

PowerScale configuration to support requirement


Workflow SOP:
• The disaster recovery teams are notified and mobilized.
• The failover is automated, with HA servers across sites.
• The recovery teams verify and test for access.
• The organization’s critical systems such as the analytic tick servers that process
real-time data have automated failover with no downtime.

The graphic shows how the high-level milestones for the tick data workflow may look.

The organization collects, stores, and analyzes the data of all their current
investments to quickly react to changing markets and to maximize profits. The tick
data analytics act on real-time data, near real-time data, and historical data.
Ticker servers process and analyze the real-time data and then write data to the
PowerScale cluster on a schedule. Near real-time and historical data is accessed
on the PowerScale cluster. In a disaster situation, downtime on the ticker servers
can mean lost opportunities and lost clients for the business. For disaster
protection, the ticker servers form a cluster from both sites, providing no downtime
for real-time processing. Near real-time and historical data is replicated from the
source PowerScale cluster to the cluster at the target site.

Key Players – Teams and Roles

Listed are the teams and roles that should be part of a disaster recovery plan.


Team: Role

Disaster Recovery Lead: Decision maker; guides the recovery

Disaster Management: Oversees the entire recovery process

Facilities: Maintains the target facility

Network: Provides baseline network functionality

Server and Storage: Provides baseline server and storage functionality

Applications: Provides the tools needed

Operations: Provides all communication during a disaster - employees, clients, vendors, and suppliers

The PowerScale administrator on the disaster recovery team performs the recovery that gives users and applications access to data on the cluster. The PowerScale administrator follows the steps that are defined in the SOP for each workflow. A key to meeting the recovery milestones is communication and understanding the dependencies between the technologies. For example, the PowerScale administrator cannot verify user access to a failed-over SMB share if the network team has missed its milestone. If the organization is small, a single individual can handle multiple roles.

Example DR Plan Activation

Cold Site

The scenario shows an organization with an RTO of 24 hours.


Warm Site

This scenario required a tighter RTO solution. Thus, the organization opted for a warm recovery site.

Cold Site: The plan details the actions and personnel that are needed from the
moment the incident is detected to the time the incident is resolved. In this
scenario, the secondary site is a cold site and given the 24-hour RTO, it is likely the
secondary site also stores the tape backups. The one hour between incident
detection and team activation may be due to the need to analyze the extent of the
problem. The incident may be a pervasive virus that could not be quarantined or
isolated. A disaster is declared at the incident plus 3 hours. This time may be built
into the plan, stating that a decision must be made at or before this milestone.
Once the disaster is declared, the respective teams are mobilized and the data
recovery begins. After access is restored, the teams provide reports on what
worked, what did not work, and how to improve.


Warm Site: Like the previous scenario, the plan is activated at about three hours
from the time the incident is detected. A four-hour RTO is not enough time to
recover data from tape on the secondary site. Here the teams switch over the functions to the warm site, and the warm site then hosts the data for the business.

Scenario – Resistance, Resilience, Recovery Plan

Here we examine how a hypothetical company, Diverse Genomics586, planned their


IT infrastructure to be resistant, resilient, and recoverable.


1:

• Earthquake and explosion proofed facility


• Human error proofed - access
• Redundant and backup power
• Fire fighting and emergency services
• Risk Assessment

586 Diverse Genomics is a cutting-edge organization at the forefront of genomics


research and development. The data center is located in California. They provide
discrete clinical data for university hospitals for specific patient demographic
information. Their business critical information is the instrumentation data that is
stored on the PowerScale cluster.


• Earthquake - likely
• Heat wave - likely
• Drought - likely
• Diverse Genomics:

• Life sciences vertical


• 6,000 employees and contributors
• 8 node PowerScale cluster
• Instrumentation data, home directories, and file shares
• Primary - California
• Secondary - Tennessee
2:

• PowerScale high availability features


• PowerScale hot swappable components
• Ease of serviceability
• Replication functionality to hot site
• Archive data to the cloud

3:

• Risks at the secondary site were identified


• Remote team or accommodations

Resistance: For natural disaster proofing, the Diverse Genomics facility is architected and hardened for earthquakes and explosions. Flooding and hurricanes
are not a concern. For human error prevention, Diverse Genomics institutes a strict
policy for accessing the data center and systems within the data center. Different
levels of access are granted to users based on skills and needs. There is zero
tolerance for food and drink. The data center is also badge access only, and the
building is staffed with physical security. The data center has redundant power
sources for all racks and backup power. Diverse Genomics has procedures with the
local emergency teams and tests the procedures once a quarter.


Resilience: One of the deciding factors for Diverse Genomics implementing the
PowerScale cluster is the rich set of high availability features. The PowerScale also
can hot swap components, keeping data available while servicing the cluster. The
SyncIQ feature replicates data to the secondary Diverse Genomics data center
located in Tennessee. If a disk or node fails, the components can be replaced with
no disruption. If a node’s network card fails, the card can be replaced with no
downtime. Data replicates to the secondary site, and a failover is tested every 60
days. When data reaches a specific age, it is automatically moved to cloud storage.

Recovery: The Diverse Genomics recovery plan is to bring down the primary site if
possible and fail over to the secondary site. The key individuals are identified in the
disaster plan. The risk at the recovery site in Tennessee is severe storms and
though not likely, tornados. To prevent a perfect storm scenario, the secondary site
has generators and fuel for backup power. Most management and monitoring can
be handled remotely, but if a disaster strikes, key responsibilities are shifted to
designated personnel at the secondary site.


InsightIQ

InsightIQ Overview

InsightIQ is an off-cluster performance and analytics tool available free of charge to


PowerScale customers.

Key features of InsightIQ are:

• Integrates seamlessly with OneFS.587


• Supports a wide range of OneFS versions and monitors multiple clusters in a
single instance588.
• Offers both CLI and WebUI configuration and query interfaces.589

Resource: InsightIQ runs either as a VMware virtual machine, or on a


physical Linux system. For additional information, click the links to
download the InsightIQ installation guide and administration guide.
• Isilon InsightIQ v4.1.3 Installation Guide
• Isilon InsightIQ v4.1.3 Administration Guide

587It integrates with PowerScale OneFS operating system to collect and store
performance and file system analytics data.

588
It is compatible with multiple versions of OneFS and can be configured to
monitor one or more PowerScale clusters at a time.

589It uses a web browser interface. Some command-line interface commands are
used for InsightIQ configuration changes and troubleshooting.


InsightIQ provides tools to monitor and analyze historical data from PowerScale
clusters. Using the InsightIQ web application, you can view standard or customized
Performance reports and File System reports, to monitor and analyze Isilon cluster
activity. You can create customized reports to view information about storage
cluster hardware, software, and protocol operations. You can publish, schedule,
and share reports, and you can export the data to a third-party application.

OneFS Version Compatibility

Each new non-maintenance release of the PowerScale OneFS operating system


requires an update to InsightIQ to support the feature changes. The newer release
maintains backwards compatibility to older still supported versions of OneFS.

The compatibility matrix covers InsightIQ 3.0x, 3.1x, 3.2x, 4.0x, and 4.1x against OneFS 7.1x, 7.2x, 8.0x, 8.1x, and 9.0x. The specific supported combinations for each InsightIQ release are listed in the guide referenced below.


Resource: PowerScale Supportability and Compatibility Guide

Older OneFS versions may not contain some functionality that is required for all
current InsightIQ features. A significant improvement was made in file system
analytics (FSA) capabilities in OneFS 8.0. The FSA statistics require OneFS 8.0 or
higher, and InsightIQ 4.0 or higher. The InsightIQ and OneFS compatibility matrix is
available in the PowerScale Supportability and Compatibility Reference Guide.


Troubleshooting Datastore Issue

Hayden is not able to access the InsightIQ datastore. The


InsightIQ datastore is on a PowerScale cluster and an NFS
datastore permissions error message appears. The NFS
export is configured to grant write access to the root user.
Help Hayden to resolve this issue.

Verify that a valid NFS export is configured on that cluster.

• The configured NFS export must grant the root user write access590 for the
specified InsightIQ virtual machine IP address.

590This configuration enables InsightIQ to mount the cluster or server and create
the necessary directories and files on the cluster or server. InsightIQ connects to
the NFS host as the root user.


• If InsightIQ is configured to use a local datastore and a permissions error


message appears, connect to the virtual machine CLI591.
• If InsightIQ cannot write to the datastore, review the permissions settings592 for
the datastore directory and for all the files contained in the directory.

591Verify that the parent directory of the datastore is configured with a permission
setting of 755 (read/write/execute for root or the owner and read/execute for group
and others) or higher.

592All the files in the datastore directory must be configured with a permission
setting of 744 or higher. If the issue persists, verify that the directory's owner and
group settings are correctly configured. For an NFS datastore, the owner:group
setting must be nobody:nobody. For a local datastore, the owner:group setting
must be root:root.
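A quick way to verify the settings described in the notes above is from the virtual machine CLI. This is a minimal sketch; the /datastore path is only an example and should be replaced with the actual datastore location:

ls -ld /datastore                            # parent directory should be 755 or higher
ls -l /datastore                             # files should be 744 or higher
chmod 755 /datastore                         # correct the directory permissions if needed
chmod 744 /datastore/*                       # correct the file permissions if needed
chown root:root /datastore /datastore/*      # for a local datastore, owner:group must be root:root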


InsightIQ User Interface

Dashboard

The InsightIQ software provides powerful performance monitoring and reporting


tools to help you maximize the performance of your PowerScale cluster.

The user interface is separated into four major sections, the Dashboard,
Performance Reporting, File System Reporting, and Settings.


Cluster Overview

The Dashboard is an at-a-glance view of real-time cluster health and vital cluster
statistics.

You can quickly view capacity and performance for all connected clusters.

View the status of all the monitored clusters. InsightIQ Dashboard, available
through the InsightIQ web application, shows an overview of the status of all the
monitored clusters. The Cluster Status summary includes information about cluster
capacity, clients, throughput, and CPU usage. Current information about the
monitored clusters appears alongside graphs that show the relative changes in the
statistics over the past 12 hours.

The Aggregated Cluster Overview section displays the total or average values of
the status information for the monitored clusters. The aggregate view supports multiple clusters, and a cluster-by-cluster view is available for each individual cluster. Each view
displays a capacity snapshot, key trends for connected and active clients, network
and file system throughput, and CPU usage.


This information can help you decide what to include in a Performance report. For
example, if the total amount of network traffic for all the monitored clusters is higher
than anticipated, a customized Performance report can show you the data about
network traffic. The report can show you the network throughput by direction, by
using breakouts, to help you determine whether one direction of throughput is
contributing to the total more than the other.

CPU usage provides the most interesting performance statistic relative to cluster
activity. When an anomaly is identified on the Dashboard, use the performance
reporting, or file system reporting to analyze the specific details.

View Performance Reporting

Performance reporting enables viewing live activity with graphic representations, using on-demand generation of reports. Data continues to plot while viewing the output.

Link: See Isilon InsightIQ 4.1.3 User Guide for more information.

From the Report Type drop-down list, you can select the desired report template. Live
reporting uses any saved standard or custom created report template to display
performance data. The Date Range option provides a mechanism to generate


scheduled reports. The report can be based on the current data, or data from a
specified time period. The Zoom Level of the report determines the granularity of
the data displayed.

Performance reporting can be used to examine cluster activities and investigate potential issues at a deeper level. Scheduled reporting enables point-in-time historical reporting based on preset criteria. Scheduled reports are available for online viewing, or you can receive them by email as a PDF for distribution. Create/manage data filters allows further specific criteria to be set: specific clients, protocols, nodes, disks, paths, and/or events can be included or excluded to display only the wanted data. PERMALINK allows administrators to set a bookmark to this report in the browser client.

Network Performance Report

The Network Performance report displays the health and status of the cluster's network.

Viewing Report Output

Scroll down in the report to display the available data series generated by the
selected report. Then select a breakout category to view the desired metrics.

Breakout categories593

593 It includes client, protocol, op class, direction, interface, node, node pool, or tier.
In the example, data is spread across the node and is displayed below the chart.
Individual lines are displayed for each node. The higher the activity the darker the
time segment displayed.


Viewing Data Charts

Chart data displays the averages, and the breakout areas display the details by the
breakout that is selected from high to low.


• Hover over a data point to see the data point details.


• The breakouts are displayed using the plus symbol594 for the specific breakout.

Saving as CSV

To perform more detailed analytics, the report is available to download as a CSV


file.

Open the CSV file in a spreadsheet application or database to perform more


detailed inspection of the data.

594 The "+" enables breaking out one more level down or regrouping data by an additional breakout category. An element of the breakout can be broken down into another breakout category.


Video: See Viewing protocol operations latency demo video for more
information.

Breakout: Modules in reports can provide a breakout of data by category to refine


the scope of information. You can apply breakouts to modules to view the individual
contributions of various performance characteristics. You can apply only one
breakout to a module at a time.

Breakouts provide heat maps that display variations of color to represent each
component's contribution to overall performance. The darker the color on a heat
map, the greater the activity for that component. Heat maps help you to visualize
performance trends and to identify periods of constrained performance.

If you hover the mouse pointer over any location on a heat map, InsightIQ shows
data for the specified component at that moment in time. Breakouts are sorted by
components that are based on level of activity, with the most active elements at the
top of the list.

Performance Reports

Creating Performance Report Templates

To create a custom report template, click Create a New Performance Report.

Create performance report templates if a standard template does not meet the
desired requirements.

Many organizations create specific reports for groups monitoring specific functions.


Creating Custom Report Templates

Custom reports are created by starting a new template from a blank template, or by taking an existing template and modifying it to meet the organization's requirements.

Scheduled Reports

Live and Scheduled Report

Choose the report as a live report or a scheduled report.


• Live595
• Scheduled596

Scheduling a Performance Report

Scheduled reports provide a snapshot or point-in-time representation of the data.

Configure up to ten email recipients to receive the report as a PDF file, or access
the reports online from the Manage Performance Reporting page.

595 Live reports are then displayed and available in the Live Reporting window.

596Scheduled reports are generated at a specific time and these reports can be
sent as a PDF by email on a scheduled cadence. If starting with a blank template,
choose the modules to be included.


Scheduled Reports

Scheduled reports provide a history or baseline, allowing storage administrators to compare the current cluster status with the standard workload shown in a previously scheduled report.

File System Reporting

The FILE SYSTEM REPORTING tab is used to examine the data capacity, data
distribution, deduplication, and quotas on the cluster.

Capacity

Provides an overview of cluster usage by storage location.

Capacity reporting provides an overview of the data by storage location. Many


options are available to inspect the data at a detailed level.


Trend


File system analytics597 requires the FSAnalyze job to be run on the cluster.

• The FSA Report field is updated each time the job is run.
• The job is a regularly scheduled job in OneFS.

Quotas

SmartQuotas usage is viewable with InsightIQ. Select a date range to view, or view the last 10 reports.

If SmartQuotas are licensed on the cluster, FSA captures the quota status when
the FSAnalyze job is run. Quota reports can be viewed with InsightIQ.

597 The File System Analytics feature allows you to view File System reports. When
File System Analytics (FSA) is enabled on a monitored cluster, a File System
Analytics job runs on the cluster and collects data that InsightIQ uses to populate
file system reports. You can modify how much information is collected by the FSA
job through OneFS. You can also configure the level of detail displayed in file
system reports through InsightIQ.


Note: Do not apply a quota to the InsightIQ datastore


(/ifs/.ifsvar/modules/fsa directory) through the SmartQuotas module598.

Video: See Capacity usage through FSA demo video for more
information.

Capacity - File Size Details: The file counts by file size provide details for examining physical file sizes and logical file sizes on the cluster. Downloading the CSV files can be useful when performing a data protection level analysis.

Trend Default Graph: The date range, time, and zoom level can be tailored to
isolate a particular period. The default graph displays existing trends in total usage.
Select other data to plot, such as total capacity, provisioned capacity, and writable
capacity. Note that writable capacity is calculated based on existing file size
distribution and the cluster's data protection level for the node pools.

Select the report that best meets the requirements similar to live or scheduled
performance reporting. Next select the FSA Report to use for the analysis. FSA
reports can be compared to one another.

Trend - Forecast Data Usage: InsightIQ 4.0 includes the capability to forecast data usage to help plan for cluster expansion or data cleanup. The projection uses algorithm-driven estimations to forecast future capacity utilization. The time period to use is selectable. FSA uses the selected range in the forecast calculations. To

598If you limit the size of the InsightIQ datastore through a quota, InsightIQ cannot
detect the available space. The datastore might become full before InsightIQ can
delete older data to make space available for newer data.


assist with charting normalization, select to eliminate outlier data points, and select
to show the standard deviation as part of the plot.

File System Reporting - Quotas: InsightIQ enables quotas to be viewed at the


directory level. Each directory can be investigated to any level within the directory
tree.

Quota Reports: Quota reports display information about quotas created through
the SmartQuotas software module. Quota reports can be useful if you want to
compare the data usage of a directory to the quota limits for that directory over
time. This information can help you predict when a directory is likely to reach its
quota limit.

User Management and Authentication

InsightIQ offers three ways to authenticate users:

/etc/passwd based authentication

All users defined in the /etc/passwd file have administrative access into InsightIQ. The operating system user does not need to be in the "wheel" group to be an administrator and does not require sudo access.

Local Users

Administrative users can create local read-only


users.


• Local read-only users can be provisioned within InsightIQ by an administrative


user.
• Read-only users can be provisioned under Settings -> Users -> Local Users.
• Read-only users do not have access to the SETTINGS tab.


LDAP

LDAP server configuration can be found under: Settings -> Users -> Configure LDAP tab

InsightIQ can be configured to authenticate users through an LDAP service.

Once LDAP is enabled, InsightIQ checks the configured LDAP server's users and groups for authentication.


• The LDAP integration feature allows authentication of LDAP users.


• Multiple LDAP groups599 of users can be configured as read-only or
administrator users.

Datastore Move Issue

Hayden is facing a problem while moving data; help him resolve the issue. The data is being migrated from an NFS datastore, and InsightIQ stops migrating data.

If there is not enough free space on the target datastore or if an NFS connection
gets interrupted, the datastore move operations can fail.

599
Once a connection is made between InsightIQ and the LDAP server, you can
add LDAP groups and users to InsightIQ.


• If the connection is permanently severed, you can recover the data only if you
have created a backup of the datastore by exporting it to a .tar file.
• Back up the datastore600 before moving an InsightIQ datastore.
• Set a quota to the target location601.

600If you have created a backup, you can import the backup datastore to a new
instance of InsightIQ to recover the data.

601If a quota is applied to the target location and the quota is configured to report
the size of the entire file system instead of the quota limit, there is less space on
the target location than InsightIQ requires. The migration might fail. If this failure
occurs, InsightIQ automatically transfers the datastore back to the original
datastore.


Performance Tuning

Protocol Balance

• The isi statistics command is used to get the balance of the protocol traffic.
• Notice that the most significant number of operations are using SMB.
• A cluster with a predominant external protocol such as SMB or NFS can influence the cluster settings for on-disk identity, SMB server settings, or UNIX settings.
• A later output showing the balance shifting significantly could indicate inefficiencies if the cluster was tuned for a given protocol.

The graphic shows the external and internal protocols, with columns for the number of operations, rate of operations, protocol, and type of operation.

The output shows the busiest protocols as returned by NumOps.
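For reference, this kind of query can be reproduced from the CLI. The command below uses the same form listed under Other Protocol Metrics later in this appendix and lists protocol operations with the busiest first:

isi statistics protocol list --sort Ops --degraded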

Connection Distribution

• The first isi statistics command queries the current NFS statistics and
the second the current Windows statistics.
• This example shows a nearly idle environment.


• As workflows are added and hosts are accessing the cluster, the output trends
accordingly. This output may also show unbalanced connections across nodes.
• If slow access becomes an issue and the output shows zero connections on node 3, it can help isolate an issue with node 3 or the network coming into node 3.

The graphic shows establishing the baseline by documenting the distribution of connections between the nodes, using the keys for currently connected SMB clients and currently active SMB clients to see how many clients are connected and active by protocol.

Note: There are thousands of keys. Use isi statistics list


keys to view all the keys.
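A minimal sketch of the two queries described above; the key names are typical examples and should be confirmed with isi statistics list keys on your cluster:

isi statistics query current --nodes all --stats node.clientstats.connected.nfs
isi statistics query current --nodes all --stats node.clientstats.active.smb2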

Protocol Read, Write, and Meta Data Mix

• The protocol traffic is predominantly SMB, hence use smb2 with isi statistics pstat to approximate the mix of read, write, and metadata components.
• If slow access becomes an issue, this output indicates if the read or write or
metadata ratio has shifted, which could be a possible reason for the issue.
• If an application was migrated to the cluster shifting the read/write ratio from
70:30 to 40:60, it may cause unforeseen latencies.

The graphic shows the top section of the output, indicating the protocol command
rates.


1:

Calculate:
• Total (1255.99) - Write (333.22) - Read (829.23) = Metadata (93.54)
• Read (829/1256) * 100 = 66%
• Write (333/1256) * 100 = 26.5%
• Metadata (94/1256) * 100 = 7.5%
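A minimal sketch of the underlying summary command; the --protocol flag syntax is an assumption and may differ by OneFS version. The metadata rate is derived by subtracting the read and write rates from the total, as in the calculation above:

isi statistics pstat --protocol=smb2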

Other Protocol Metrics

List of commands that show additional metrics.

• Most used protocol operation


• isi statistics protocol list --sort Ops --degraded
• Protocol taking the most time
• isi statistics protocol list --sort TimeAvg --degraded
• Most demanding to least demanding


• isi statistics client list --sort Ops --degraded


• Slow or timed out Windows DC

• isi statistics protocol list --protocols lsass_out --


degraded

Time in Queue

• Disk time in queue602 indicates how long an operation is queued on a drive.


• This indicator is key for spindle bound clusters.
• A time in queue value of 10 to 50 milliseconds equals the Yellow zone, a time in
queue value of 50 to 100 milliseconds equals Red.
• To capture an overview profile of disk drive activity, run the isi statistics
drive command.

The graphic callouts note that $8 is the "TimeInQ" field in the output, that the command shows time in queue for 30 drives sorted highest-to-lowest or gets the average across all disks, and that metrics above 50 ms need attention. Excessive queuing time indicates spindle-bound clusters.

The output shows insignificant time in queue.

602
Examining the max, min, and average values for the disk time in queue,
administrators can get a disk drive activity baseline.


Note: For information about SAS drives, include SAS instead of


SATA.

Number in Queue

• Queue depth indicates how many operations are queued on drives.


• A queue depth of 5 to 10 is considered heavy queuing.
• Run isi statistics drive to examine the max, min, and average values
for the disk time in queue and the number in queue.

The graphic callouts note that $9 is the "Queued" field in the output, that the command outputs the queue depth for 30 drives sorted highest-to-lowest or gets the average across all disks, and that metrics above 5 indicate excessive queue depth and need attention.

The output shows insignificant queue depth.

Note: For information about SAS drives, include SAS instead of


SATA.

Percent Busy

• Disk percent busy is helpful to determine if the drive is 100% busy.


• Disk percent busy does not indicate how much extra work might be in the
queue.
• To obtain the maximum, minimum, and average disk busy values for SATA
drives, run the isi statistics drive command.


The graphic callouts note that $10 is the "Busy" field in the output, that the command shows disk percent busy for 30 drives sorted highest-to-lowest or gets the average across all disks, and that running continuously at or near 100% may impact performance.

Extremely high disk activity impacts performance.

Note: For information about SAS drives, include SAS instead of


SATA.
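The three disk metrics above (time in queue, queue depth, and percent busy) can be captured with one command pattern. This is a minimal sketch, assuming the TimeInQ, Queued, and Busy columns land in awk fields $8, $9, and $10 as the callouts indicate; verify the column positions and the --type flag on your OneFS version:

isi statistics drive --nodes=all --type=sata | awk 'NR>2 {print $8}' | sort -rn | head -30    # time in queue
isi statistics drive --nodes=all --type=sata | awk 'NR>2 {print $9}' | sort -rn | head -30    # queue depth
isi statistics drive --nodes=all --type=sata | awk 'NR>2 {print $10}' | sort -rn | head -30   # percent busy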

Balance Across Nodes

• View and document the balance of disk operations across the nodes.
• In the example output, nodes 5, 6, 7, 8 are doing most of the work while the
other nodes are nearly idle. After investigation, an application is only using
nodes 5 through 8.
• Though the metrics are small and have no impact on production, continued trending in this imbalance may need to be addressed.
• As a follow-up, use the isi statistics drive --nodes all --sort OpsIn,Drive command to analyze the node with the most disk operations.

An unbalanced distribution of disk operations requires further investigation: why is this node doing more work? Maintain balanced work across the nodes.


Busiest Files

• Identify the top 15 files in use and their use rate using isi statistics.
• Each entry for the same path is a different event.
• For example, a read, getattr, lookup, or other operation.
• Multiple instances of the same path aggregate to indicate the total operation
rate for that path.
• Instead of using all to show output for all nodes, display the busiest files per node using the switch --nodes 1, which is useful when isolating the node with issues.

Read/write activity and data transfer identify busy files, and identifying the busiest files can isolate a problem node. UNKNOWN files are: a system file, a file with a path name that is too long, a nonexistent snapshot, or an unlinked file that is still referenced.

The output shows entries for identical paths with different operation rates.
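A minimal sketch of the query described above; the flags mirror those used elsewhere in this appendix, and the exact subcommand form may vary by OneFS version:

isi statistics heat --limit 15 --nodes all     # busiest files across the cluster
isi statistics heat --limit 15 --nodes 1       # busiest files on a single node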


Misaligned Writes

• Misalignment603 is at the file level, not at the file system level. Use the
isi_for_array command to view the misaligned writes.
• The output is shown in the upper box. After one minute, execute the command
as seen in the output of the lower box.
• Then calculate the difference between the misaligned write request counts from
each command.
• Divide this difference by the time between samples, 60 seconds.
• The result604 is the rate-per-second of misaligned write requests.

Misaligned writes add overhead.

1:

603Misalignment results from a storage abstraction layer, such as the virtual


machine storage not matching the OneFS storage blocking. The cost of each
misaligned write request depends on many variables and causes more I/O load,
ranging from 10% to 20%.

604The 0.17 rate for node one is insignificant to performance. For the 8.2 rate, the
administrator may monitor the cluster and if misaligned writes begin to impact
performance, take action.


Rate/sec of misaligned write requests:


• Node 1: (6997 - 6987)/ 60 = 0.17
• Node 2: (11812 - 11320) / 60 = 8.2

Blocked, Contended, and Deadlocked Events

• In a clustered array, you can expect some resource sharing and locking events.
• Use the isi statistics heat --totalby event,lin,path --limit
50 command to record the most recent 50 locking event counts605 when no
performance issues occur to establish the baseline workload.
• Locking Events are classified as Blocked, Contended606, and Deadlocked607.
• Excessive locking events can degrade performance.

Locking Events

605If an administrator finds subsequent locking event counts to be much higher


than the baseline, investigate locking events as a contributing factor to performance
degradation.

606Blocked and Contended events tend to be correlated together. The new lock
requester is blocked, and the current lock holder gets the contended callback.
Blocked and contended locking events are expected and a storage administrator
may see hundreds or more depending on how busy the cluster is.

607 Deadlock events are different, with no timeout, and deadlock events should be
infrequent.


Blocked: Access to the LIN is blocked waiting for another operation to release a resource.

Contended: A LIN is experiencing cross-node contention. A node holds a lock after it finishes an operation because it might need the resource again (lock caching). When another node requests the same LIN lock, the coordinator node instructs the original node to release the cached lock by using the contended callback.

Deadlocked: The attempt to lock the LIN resulted in a deadlock.

Latency and Hops

• Administrators must understand if hops exist between the cluster and a host
because each network hop adds latency.
• To display route and transit delays for packets over IP, run the traceroute or
tracert command.
• Excessive hops may indicate a network issue.
• Increased hops may indicate a breakdown somewhere in the network model.


The example output is issued from a Linux/UNIX client to a node (or from one node to another) with 5 probes; a single line of output means no hops between the client and the node. The Windows example is issued from a Windows client to a node with 3 probes. The more hops, the more latency.
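A minimal sketch (the IP address is an example):

traceroute 10.1.1.21     # from a Linux/UNIX client, or from one node to another
tracert 10.1.1.21        # from a Windows client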

Latency and Packet Loss

• Administrators can record the baseline of packet latency and loss to the target
using the ping command from either Windows or Linux.
• The ping statistics sets a solid baseline to use.
• If encountering problems, compare the statistics to a later execution.
• Check for dropped packets and significant increases in network metrics.

The command is issued from a Linux/UNIX client to a node (or from one node to another) and from a Windows client to a node. Packet loss should not exceed 1%; overloaded links drop packets.
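A minimal sketch (the IP address and probe count are examples):

ping -c 100 10.1.1.21    # Linux/UNIX client or node; -c sets the number of probes
ping -n 100 10.1.1.21    # Windows client; -n sets the number of probes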


Bandwidth

• Measure the bandwidth using the iperf command.


• The target node is set up first. The host used here is another node.
• The default time interval is 10 seconds, which may not provide an accurate
sample, especially on high-bandwidth NICs.
• Knowing the bandwidth can help isolate network bottlenecks.
• If the node uses 40 Gb Ethernet, but a heavy load sees under 1 Gb, then the
network is probably the bandwidth bottleneck. The network model may show a 1
Gb switch.

Issue the command on the target node first, then on the host. Use Ctrl-C to close iperf on the target once the host execution is complete. Is there enough bandwidth for the workflows?
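A minimal sketch (the IP address and duration are examples):

iperf -s                     # issue on the target node first
iperf -c 10.1.1.21 -t 30     # issue on the host; -t extends the run beyond the 10-second default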

Jitter

• Jitter is the difference in packet delay between the target and the source.
• Excessive jitter can be the result of network congestion, improper queuing, or
configuration errors.
• To measure jitter, use iperf, which sends UDP packets between two hosts running iperf.


Excessive jitter can be a source of QoS problems. Issue the command on the target node first, then on the host, and use CTRL+C to close iperf on the target after the host execution is complete. Jitter should be below 30 ms.

The graphic shows that the delay between packets can vary instead of remaining constant.

Note: UDP limits the bandwidth measurement and must not be


considered a valid bandwidth indicator.
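A minimal sketch (the IP address and bandwidth target are examples):

iperf -s -u                       # issue on the target node first; -u selects UDP
iperf -c 10.1.1.21 -u -b 100M     # issue on the host; the UDP report includes jitter and loss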

Retransmission Rate

• Use netstat to view the TCP retransmissions statistics.


• The output608 is used to assess the retransmission activity against the total
packets counted for that statistic.

608 The output shows that the retransmission rate is less than 0.1%, meaning this
retransmission is not significant and not an issue. Retransmission rates above 3%
negatively affect user experience.


• Interpret the retransmission rates as a percentage of the total transmission.


• Less than 0.1% retransmission for total transmitted bytes is acceptable for a
local network.

Retransmission rates above 3% affect user experience.

1:

• Retransmitted * 100 / data packets


• 9525723 * 100 / 35556615947 = .0268%
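A minimal sketch of pulling the counters used in the calculation above from a node; OneFS is FreeBSD-based, and the exact counter wording varies by operating system:

netstat -s -p tcp | grep -i -E 'retransmit|data packet'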

SmartConnect and Load Balance

Use the isi status command and observe the “Out” column of throughput to
assess throughput balance across the nodes in relation to the IP connections.

List of cluster and node commands that are used for the baseline.
• List all the IP addresses that are bound to external interfaces.
• isi network interfaces list
• To view the SMB open files list, run the two commands below. Having many open sessions can impact resources, especially RAM.
• isi smb session list
• isi_for_array -X 'isi smb openfiles list -v --format=csv --no-header --no-footer'
• To get information about NFS locks on the array, run the command below. It applies to NFSv3 only and displays a list of NFS Network Lock Manager advisory locks. If users are unable to access files, the command can help determine or isolate locking issues.


• isi nfs nlm locks list

Clusters and Nodes Foundation Baselines

List of commands for viewing the baseline for cluster and nodes.
• Capacity commands display the free capacity on the cluster and the storage pool. Reaching 100% capacity locks up the cluster; take proactive measures to prevent reaching 100% capacity.
• isi status --quiet
• isi storagepool list
• isi status -p
• Gives 10 processes using the most CPU on each node
• isi_for_array -s -X 'top -n -S 10'
• The two memory commands show the status of memory for each node.
Monitoring node CPU and memory can identify a resource imbalance between nodes.

• isi statistics query current --nodes all --degraded --


stats node.memory.used
• isi statistics query current --nodes all --degraded --
stats node.memory.free


Performance Benchmarking

Benchmark Overview

In the context of benchmarking a NAS system, there are 3 key metrics:

• Protocol Latency609 - measures the time from when a request is issued to the
time when the response is received.
• Data Throughput610 - measures the data transfer rate to and from the storage
system in megabytes/gigabytes per second.
• OPS611 - measures the number of operations performed at a protocol level per
second.

609 Latency can be measured at various points and where you measure can help
identify performance issues. The latency measured at the client side provides a
holistic view which encompasses latency in the client, network and storage.
Latency when measured by the storage system normally includes only the latency
of the storage system and excludes the network and client.

610Throughput can be measured either at the client or storage side. These values
should be identical or very close to each other unlike measuring latency.

611One OPS is not the same as one IOPS. Traditional SAN storage systems measure performance with IOPS. For NAS systems using SMB or NFS, performance is normally measured using OPS. This is important because 1 IOP is not the same as 1 OP. A single SMB operation could cause a lot of actual disk I/Os to occur. For example, a single request for a directory listing operation like READDIRPLUS in NFS will cause a lot of disk I/O to be generated.


A benchmark can be performed either by specifically designed tools or applications


and workloads running on the storage system.

The important criterion is that the tool or method you use is repeatable and produces consistent results.

A good benchmark must help determine the suitability of a storage system to run the application under consideration.

Benchmarking Tools Examples

Benchmarking Software: Some tools are designed to test block-based storage and some measure file-based storage. Ensure that you use the right benchmark.
Using an incorrect benchmark can provide results that do not represent the real
performance of the system for a workload.

SPEC SFS 2008: For file-based storage; monitors server throughput and response time.

SPEC SFS 2014: File-based storage; updated from SPEC SFS 2008 and used in measuring an end-to-end storage solution for specific applications.

IOZone: File-based benchmarking tool to measure a variety of file I/O performance metrics.

FIO: Open-source file-based benchmarking tool.


Vdbench: Free, Java-based, for block-based and file-based systems.

mdtest and IOR: mdtest is used for testing metadata operations while IOR is designed to test streaming data.

Using applications as benchmarks: If you can use real applications, then


generally that is the best. However, be aware that using a real application involves
a lot more than just the storage. You can have interactions with the type of data,
time to process, client memory or CPU load, and so on. Another area to be careful
while using real applications is the actual workload. Frequently, customers will want
to run just one aspect of the workload as they say it is representative of their entire
workflow. In most cases, this one aspect is a bottleneck in their workflow that may
only come up once in a while. Always try to understand the real customer work
flow.

Why Benchmarking?

• To provide customers a quantitative measure to evaluate the performance of a


storage system.
• After meeting the feature set requirement, performance is usually second in
deciding on a storage system.
• Replicate and compare two storage systems to determine which is more suitable and faster.
• Benchmarks can model a workflow and allow you to reproduce it in a lab environment.
• Running benchmarks from time to time helps determine whether a system is
running as intended.


Qualifying Questions

Before you benchmark your storage system, ask the following questions in order to choose the right tool and to reflect your workflows:

• What are you benchmarking? What are you


trying to measure?
• What is the size of the test dataset? Is the test
data being compressed?
• If your workload has lots of clients, lots of files,
what do you need to model that?
• Which metric is the most important for your
workflow?
• Does your workload involve billions of files? Do they have a variety of file sizes?
• Does your workload do a lot of metadata operations?
• How much caching is present at the client side and storage system side?
• What is the read to write ratio for your workload?
• Does your workload have more sequential or random file I/O operations?

Vdbench Overview

• Vdbench is a free, Java-based benchmarking tool used to test both file-based


and block-based storage systems.
• Vdbench can run on any system that supports and runs either 32-bit or 64-bit
Java.
• Vdbench is flexible, easy to configure, easy to interpret results and can run on
multiple platforms.
• Vdbench allows creation of a rich file structure. Most benchmark programs
create a small number of identical files in flat directories which is not a good
representation of real world.


• Vdbench can specify metadata operations as part of a workflow and not just data. Many workflows have more than 50% metadata operations.
• Vdbench is capable of easily modeling complex datasets and I/O patterns. You can model several dataset and I/O pattern sets and then combine them into a larger test.

− Example: You could have one test that does large file sequential access and
another test that does small file random I/O. Then, you can combine them
into a single test that runs both.

Resource: Vdbench Software File and User Guide

PowerScale and Vdbench

To set up Vdbench in a PowerScale environment, the majority of the configuration occurs on the client side. However, there is some minimal configuration on PowerScale.

• For both SMB and NFS, verify that there is an SMB share or NFS export
configured for /ifs on PowerScale with the appropriate read and write
permissions. Starting with OneFS 9.0, there are no default shares or exports
configured.
• For performance benchmarks, enabling the run-as-root option for SMB and enabling map-root-to-root for NFS simplifies configuration by bypassing security. If this security bypass is not acceptable, such as in a production environment, ensure that the benchmark directories are properly configured for read and write access by the clients.
• For best performance, a client should be connected to one node in the
PowerScale cluster. The client connections should ideally be balanced across
the number of available nodes. Each client should mount PowerScale using the
same path. This will simplify the vdbench profile configuration.

− Example: Client 1 mounts node 2 at /mnt/test, client 2 mounts node 3 at /mnt/test, and client 3 mounts node 1 at /mnt/test.
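A minimal sketch of the client mounts for this layout; the node IP addresses and the export path are examples, and the export must exist as noted above:

mount -t nfs 192.168.1.12:/ifs/bench /mnt/test     # client 1 to node 2
mount -t nfs 192.168.1.13:/ifs/bench /mnt/test     # client 2 to node 3
mount -t nfs 192.168.1.11:/ifs/bench /mnt/test     # client 3 to node 1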


Anatomy of a Profile

• A profile or workload parameter file defines how the benchmark tool will run.
• It is defined by combining four sections: Host Definition (HD), File System
Definition (FSD), File System Workload Definition (FWD), and Run
Definition (RD).
• There are also general sections which contain parameters related to
deduplication, journaling, data validation and so on.
• The sections need to be in a specific order as later sections reference prior
sections. The required order is General -> HD -> FSD -> FWD -> RD.

1: Specifies the hosts that will participate in generating load. The HD section
defines which clients will run the benchmark.

2: Specifies the file system structure that the test will run across. This section
defines the dataset that the benchmark will operate over.

3: Specifies the actual I/O operations to perform over the dataset.

4: Specifies the actual test to execute including parameters like duration and target
OPS.

Each section follows the pattern of key-value pairs. There is a key that defines which section the parameter belongs to, followed by a label and then additional options. There is a special label called default that can be used in each section to explicitly set default values so they do not need to be repeated for each following entry.

include is a special parameter used to import contents from an external file. When this parameter is found, the contents of the specified file are copied in place. The include parameter can be placed anywhere in the profile and is not subject to ordering requirements.
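As a rough sketch of the required ordering (all names, paths, and values here are illustrative assumptions, and the parameter spellings should be verified against the Vdbench user guide):

* General parameters and includes may appear first
include=common_settings.txt

* Host Definition (HD)
hd=default,vdbench=/opt/vdbench,user=root,shell=ssh
hd=host_1,system=client1.example.com

* File System Definition (FSD)
fsd=fsd_1,anchor=/mnt/test,depth=2,width=2,files=2000,size=1m

* File System Workload Definition (FWD)
fwd=fwd_1,fsd=fsd_1,host=host_1,operation=read,xfersize=128k,fileio=sequential,threads=8

* Run Definition (RD)
rd=rd_1,fwd=fwd_1,fwdrate=max,format=restart,elapsed=300,interval=10

Lines starting with an asterisk are treated as comments in a Vdbench parameter file.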

Anatomy of a Profile: Host Definition

Host Definition is only needed when running Vdbench in a multi-host environment or if you want to override the number of JVMs used in a single-host environment.

You will normally want one HD parameter for each client on which you want to run the benchmark.

The same physical or virtual client can have more than one host entry, allowing you to assign specific work to that client and potentially do more work.

1:

• Host Label - each host has a host label. The label uniquely identifies a client.
Special labels such as default and localhost can also be used.
• System - when running Vdbench in a multi-host environment, you can specify
the FQDN or IP address of the host.

• Path - specifies where Vdbench can find its installation directory on a remote host. The Vdbench path can be different for every client, but it is good practice to have the Vdbench binaries in the same location on all clients.
• Shell - determines how the Vdbench coordinator communicates with the clients. For UNIX systems, SSH with keyless login works best; RSH is also a viable option. For Windows clients, you must use the vdbench shell option, which requires you to launch Vdbench manually on every client. With the other two methods, Vdbench launches itself automatically when necessary.
• User - used for RSH and SSH. It is the responsibility of the user to properly define the RSH or SSH security settings on the local and the remote hosts.

2: The example shows the use of the default label as well as the individual entries. The default values defined above are applied to each HD entry below. There are two hosts defined for running the benchmark, and each host uses SSH as the access method with the root user. Both hosts have Vdbench installed in the same directory.
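A hedged sketch of HD entries matching that description (the client names and installation path are placeholder assumptions):

hd=default,vdbench=/opt/vdbench,user=root,shell=ssh
hd=host_1,system=client1.example.com
hd=host_2,system=client2.example.com

The default entry supplies the shell, user, and installation path, so each individual host entry only needs to name the system it refers to.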

Anatomy of a Profile: File System Definition

You can define one or more FSDs to customize the size and distribution of the test data.612 Having multiple FSDs is especially useful in replicating a complete directory structure for your testing. When an FSD is not used by any test, Vdbench does not create the dataset. For many benchmarks, the file structure usually ends up being very simple because defining something complicated is difficult.

612 A single test can operate across one or multiple FSDs. You can model a shallow, wide directory structure and a deep, narrow directory structure at the same time. You can also define different file counts and size distributions.


1:

• Label: each FSD requires a unique label to identify this FSD for use.
• Anchor: specifies the parent directory where the test files are created.
• Shared: determines whether the FSD being defined is shared between all the clients or whether each client has its own isolated dataset.
• Width: specifies the number of directories to create in each parent directory.
• Depth: specifies the number of levels deep to create the directory tree.
• Files: specifies the number of test files to be created in the lowest level
directory.
• Sizes: specifies the size of each test file. Sizes can be a single number (Example: 50M) or a size distribution consisting of size and percentage pairs (Example: 1k,30,8k,40,32k,20,1024k,10). The percentages must add up to 100.
• Distribution: specifies whether to create files only in the lowest level directories
or in all directories.

2: The example shows the creation of an FSD named fsd_1. Using a numeric suffix enables the use of globbing such as * when referencing FSDs later in the file.

The FSD creates 2000 files of 1 megabyte each in the lowest level directories of the directory tree shown in the graphic.

In the example, shared is set to yes, which means that all hosts will perform I/O in the same directory structure. This does not mean that two hosts will read and write to the same file. Instead, for performance and simplicity reasons, every host works on a subset of the files in the shared file structure. For example, if you have two hosts and 100 files, one host gets all the odd-numbered files while the second host gets all the even-numbered files. The set of files that a host gets is predetermined
using an algorithm. When shared is set to no, each host running against this FSD creates its own working directory and file set.

3: By default, the distribution parameter is set to bottom. The test files are only created in the bottom-level directories, otherwise known as leaf directories. For the above example, the size of the dataset is 2² * 2000 * 1 = 8000 megabytes (four leaf directories, each with 2000 files of 1 MB).

4: With distribution set to all, the specified number of test files is created in each directory of the directory tree.
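A sketch of an FSD like the one discussed above (the anchor path is an assumed client mount point, and keyword spellings should be checked against the Vdbench user guide):

fsd=fsd_1,anchor=/mnt/test,depth=2,width=2,files=2000,size=1m,shared=yes

With depth=2 and width=2 there are four leaf directories, and with distribution left at its default of bottom, each leaf directory receives 2000 files of 1 MB, giving the 8000-megabyte dataset described above. Adding distribution=all would create files at every directory level instead.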

Anatomy of a Profile: File System Workload Definition

FWD defines the type of test that will be performed. The FWD ties together the
hosts (HD), the file system definition (FSD) and the type of operations you want to
perform. The FWD is also where you tie together a workload with the data that the
workload acts upon.

When doing a test run, you can have a single line FWD, but that only allows very
limited flexibility. Normally, you will have multiple FWD lines to perform different
operations each at a percentage of the total operations.

Basic Format: fwd=<label>,fsd=<fsd>,operation=<operation>

1:

• Label: specifies a unique name for the FWD entry. The default label can be
used to serve as default values for all following FWDs.
• Host: specifies the hosts on which the workload will run.
• File System: specifies the names of the FSDs to use to run the workload.
• Operation: specifies a single file system operation that must be done for this
workload. Example: read, write, access, GETATTR, SETATTR, open, close.
You can also create a sequence of operations.

• Transfer Size: data transfer size used for read or write operations. Example:
1M,64k,(8k,50,128k,50)
• File I/O: specifies the type of I/O that needs to be done on each file, either random or sequential.
• Thread: number of concurrent threads to run for this workload. You must have at least one file for each thread.
• Skew: specifies the percentage of the total work for a particular FWD. The total
skew percentage must add up to 100.

2: The example defines a workload with 50/50 read/write operations performed sequentially on files. The transfer size is set to 128k. The operations are performed by host_1 on the file system defined by fsd_1.

You can define small transfer size I/O on a small file dataset and large-block sequential I/O for a large file dataset. The skew value can be used to easily give an operation a certain percentage of the total workload. If skew is not specified, Vdbench tries to distribute I/O operations evenly across all the FWDs. For most workloads, the file I/O pattern should be sequential. Files are normally written or read as a whole. Random I/O is used only when you have a workflow that reads or writes random segments within a single file, such as a virtual disk image.
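A hedged sketch of FWD entries for the 50/50 sequential read/write workload described above (the thread count is an arbitrary assumption, and the keyword spellings should be verified against the Vdbench user guide):

fwd=default,fsd=fsd_1,host=host_1,xfersize=128k,fileio=sequential,threads=8
fwd=fwd_1,operation=read,skew=50
fwd=fwd_2,operation=write,skew=50

The default entry carries the shared settings, and the two labeled entries split the total operations evenly between reads and writes; an RD can later select both with globbing such as fwd*.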

Anatomy of a Profile: Run Definition

RD defines which FWD to run for a single test run, how much I/O to generate, and how long to run the workload. When you have more than one RD entry, Vdbench runs those tests in sequence. This allows you to set up a very long test cycle with multiple tests and then have it automatically run through each one without user interaction.

1:

• Label: specifies a unique name for the RD.

• Workload: the set of FWDs to execute in the test. To define multiple FWDs, you can use globbing such as fwd*, or specify them individually, such as (fwd1,fwd2,fwd8).
• Elapsed Time: the amount of time that this test will run in seconds. The default
value is 30 seconds.
• Interval: the number of seconds between each status report.
• Forward Rate: the total number of OPS that the system will attempt to generate across all the FWDs. It can be specified as a single value, a set of values, or a range with an increment. The special label max requests that Vdbench run as fast as it can.
• Format: specifies how Vdbench will work with the FSD. For large file systems,
you do not want to recreate the entire file system for every run.

2: The example defines an RD to run the workload for fwd_1 for 20 minutes. The
number of operations per second specified is 5000. With format set to restart,
Vdbench will create only files that have not been created and will also expand files
that have not reached their proper size.
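A sketch of an RD matching that description (the interval value is an assumption; the other values follow the text above, and keyword spellings should be verified against the Vdbench user guide):

rd=rd_1,fwd=fwd_1,fwdrate=5000,format=restart,elapsed=1200,interval=30

elapsed=1200 corresponds to the 20-minute run, fwdrate=5000 sets the target operations per second, and interval=30 would produce a status report every 30 seconds.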

Vdbench First Run

• As Vdbench is a Java program, it can run across multiple platforms, even at the
same time.
• The Vdbench primary node can connect to the load generator hosts through three different methods: RSH, SSH, and the Vdbench internal RSH.
• Each host can be connected through a different method.
• RSH and SSH can be configured to not require a password. Also, no manual startup is required.
• Basic syntax to run Vdbench: ./vdbench -f profile_name.txt

1: The command runs the profile.txt file. The -f option is used to specify one
or more profile files to be executed. The profile file contains all the workload
parameters.

2: The -t option is used to run a demo I/O workload. A small temporary file is
created and a 50/50 read/write test is executed for just five seconds. This is a way
to test that Vdbench has been correctly installed and works for the current
operating system platform without the need to first create a profile file.

3: The -tf option is used to run a demo file system workload without the need to create a profile file.

4: The -o option is used to specify the output directory. When you do not specify this, a default directory called output will be used for every run. You generally want to use the -o option so you do not overwrite previous benchmark runs. In the example, the command executes the test1.txt profile and creates the output directory named test.

5: Adding the + symbol at the end of the output directory name creates directories with increasing numbers appended, starting from 001. In this example, if the output directory test does not exist, the command will create the output directory named test. If test already exists, the output directory created is named test001.

6: Adding .tod at the end of the directory name creates output directories with the timestamp (yyyymmdd.HHMMss) appended. This is useful when you want to know when a particular run was performed. In the example, the output directory name would be something like test20200909.031534.
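Putting the options together, a few sample invocations (the profile file name is illustrative):

./vdbench -t                        (quick raw I/O sanity test, no profile needed)
./vdbench -tf                       (quick file system sanity test, no profile needed)
./vdbench -f test1.txt -o test      (run the profile, write results to the test directory)
./vdbench -f test1.txt -o test+     (number the output directories: test, test001, test002, ...)
./vdbench -f test1.txt -o test.tod  (append a run timestamp to the output directory name)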

Vdbench Output

• Once a run is complete, a large number of files are created in the specified
output directory.
• The tool outputs HTML that can be loaded into a standard web browser for
easier navigation.
• The detailed output of a run is in a column format. All throughput values are in
MB/s.
• The three most important files to analyze after a run are summary.html, totals.html, and histogram.html.

• summary.html - Contains hyperlinks to each individual host, each of the individual FWDs defined in the profile, each RD in the profile, and so on.

• totals.html - Shows a summarized output of all the test runs without the intermediate reporting. It provides just the totals and is a good file to look at when gathering data to create a graph.

• histogram.html - Shows the distribution of latency for reads and writes combined. It can be read directly into Excel as a tab-delimited file for further analysis.
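As a rough convenience sketch (the output directory name comes from the earlier example, and the exact report layout and availability of a browser should be verified on your system), the summarized results can be skimmed directly from the command line:

grep -i avg test001/totals.html     (pull the per-run average rows for graphing)
firefox test001/summary.html &      (open the linked reports in a browser)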

Benchmarking Considerations

The benchmarking process can be easily manipulated because of the large number of variables that influence performance results. To level the playing field, test results need to be categorized by product type, configuration standards need to be defined for each category, and vendors must strictly adhere to the configurations. Some of the considerations include:

• Use the correct benchmark for your workload.613 For example, do not use a block-based benchmark to test a NAS.
• Your dataset should reflect your real workload. A benchmark with 10 large files
is not good if your real dataset has millions or billions of files.
• Your dataset must be large enough that it does not fit into the local client cache or entirely into the storage system cache. As a rule of thumb, the dataset must be at least double the combined memory of the client caches and the storage system cache (a worked sizing example follows this list).
• Do not use a single corner case in a real-world workflow as the only metric for a
benchmark.
• Simple benchmarks generally do not provide good results. Example: dragging and dropping files to copy them as a throughput test.
• When running a benchmark multiple times, take the average value instead of
the highest value.
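A worked sizing example under assumed numbers: with four load-generating clients of 64 GB of RAM each (256 GB of potential client-side cache) and a cluster with roughly 512 GB of memory available for caching, the combined cache is about 768 GB, so the test dataset should be at least 2 x 768 GB, or roughly 1.5 TB, to avoid measuring cache behavior instead of disk and network behavior.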

613 You need to understand how each of the benchmarks works. Often, a customer will use a benchmark because they have been using it for a very long time. That does not mean that the benchmark is still relevant to their workloads today, and most often it is not. There is a lot of inertia to reuse the same benchmark for every storage system. You want to use a benchmark that models the customer's workflow as closely as possible.

Glossary
Event
Events are individual occurrences or conditions related to the data workflow,
maintenance operations, and hardware components of your cluster.

Event Group
Event groups are collections of individual events that are related symptoms of a
single situation on your cluster.

Impact Policy
The relationship between the running jobs and the system resources is complex. A
job running with a high impact policy can use a significant percentage of cluster
resources, resulting in a noticeable reduction in cluster performance. Because jobs
are used to perform cluster maintenance activities and are often running, most jobs
are assigned a low impact policy. Do not assign high impact policies without
understanding the potential risk of generating errors and impacting cluster
performance. Several dependencies exist between the category of the different
jobs and the amount of system resources that are consumed before resource
throttling begins. The default job settings, job priorities, and impact policies are
designed to balance the job requirements to optimize resources. The
recommendation is to not change the default impact policies or job priorities without
consulting qualified Dell Technologies engineers.

Job Engine
The OneFS Job Engine is an execution, scheduling and reporting framework for
cluster-wide management of tasks.

Job Priority
A job can have a priority from 1 (highest priority) to 10 (lowest priority). If a low-priority job is running when a high-priority job is called, the low-priority job pauses, and the high-priority job runs. The job progress is periodically saved by creating checkpoints. When the higher-priority job completes, the checkpoint is used to restart the lower-priority job at the point where the job paused.

SMB Continuous Availability
CA enables SMB clients to transparently and automatically fail over to another node if a network or node fails. CA is supported with Microsoft Windows 8, Windows 10, and Windows 2012 R2 clients.
