PowerScale Administration Course Guide (V3)
Participant Guide
NAS, PowerScale, and OneFS
Scenario
Storage Technologies
DAS
In the early days of computing, corporations1 stored data on hard drives in a
server. To minimize risk, corporations mirrored the data using RAID. This
architecture is called Direct Attached Storage (DAS).
1The intellectual property of the company depended entirely upon that hard drive's
continued functionality.
RAID
DAS
SAN
As applications proliferated, there were soon many servers, each with its own DAS.
This worked, but with some drawbacks2. Due to these limitations of DAS, the SAN
was introduced, which effectively used a volume manager and RAID.
SAN
NAS
2If one server’s DAS was full while another server’s DAS was half empty, the
empty DAS could not share its space with the full DAS.
3PCs worked differently from the storage file server; network communications in
PCs only occur from one file system to another file system.
NAS
CAS
Cloud
Cloud storage stores data over the Internet to a cloud provider. The cloud provider
manages and protects the data. Typically, cloud storage is delivered on demand
with just-in-time capacity and costs.
NAS Overview
NAS provides the advantages of server consolidation by eliminating the need for
multiple file servers.
• Consolidates the storage that is used by the clients onto a single system,
making it easier to manage the storage.
• Uses network and file-sharing protocols to provide access to the file data5.
• Uses its own operating system6 and integrated hardware and software
components to meet specific file-service needs.
PowerScale clusters are a NAS solution. There are two types of NAS architectures:
scale-up and scale-out.
5NAS enables both UNIX and Microsoft Windows users to share the same data
seamlessly.
6 Its operating system is optimized for file I/O and, therefore, performs file I/O better
than a general-purpose server. As a result, a NAS device can serve more clients
than general-purpose servers and provide the benefit of server consolidation.
Scale-Up
The graphic shows a scale-up architecture: a controller with disk shelves serving
clients, with independent systems on the network that are separate points of
management, storing structured or unstructured data.
Scale-Out
• With a clustered NAS solution, or scale-out architecture, all the NAS boxes, or
PowerScale nodes, belong to a unified cluster with a single point of
management.
• In a scale-out solution, the computational throughput, disks, disk protection, and
management are combined and exist for a single cluster.
The graphic shows a PowerScale cluster providing unstructured storage at
petabyte scale (1000+ PBs).
Scale-Out NAS
Scale-out NAS7 is now a mainstay in most data center environments. The next
wave of scale-out NAS innovation has enterprises embracing the value8 of NAS
and adopting it as the core of their infrastructure.
With traditional NAS systems the file system9, volume manager10, and the
implementation of RAID11 are all separate entities.
OneFS is the operating system and the underlying file system that drives and
stores data.
OneFS is a single file system that performs the duties of the volume manager and
applies protection.
OneFS is built on FreeBSD.
• Creates a single file system for the cluster.12
• Volume manager and protection.13
• Data shared across cluster.14
• Scale resources.15
9The file system is responsible for the higher-level functions of authentication and
authorization.
12As nodes are added, the file system grows dynamically and content is
redistributed.
13 OneFS performs the duties of the volume manager and applies protection to the
cluster as a whole. There is no partitioning, and no need for volume creation. All
data is striped across all nodes.
14Because all information is shared among nodes, the entire file system is
accessible by clients connecting to any node in the cluster.
15Each PowerScale storage node contains globally coherent RAM, meaning that,
as a cluster becomes larger, it also becomes faster. When adding a node, the
performance scales linearly.
Challenge
IT Manager:
Open participation question:
Question: What is the difference between scale-up and scale-out
architecture?
PowerScale
Scenario
Gen 6 highlights.
Gen 6.5 highlights.
Gen 6 requires a minimum of four nodes to form a cluster. You must add nodes to
the cluster in pairs.
The chassis holds four compute nodes and 20 drive sled slots.
Both compute modules in a node pair power-on immediately when one of the
nodes is connected to a power source.
Gen 6 chassis
1: The compute module bays of two nodes make up one node pair. Scaling out a
cluster with Gen 6 nodes is done by adding more node pairs.
2: Each Gen 6 node provides two ports for front-end connectivity. The connectivity
options for clients and applications are 10 GbE, 25 GbE, and 40 GbE.
3: Each node can have 1 or 2 SSDs that are used as L3 cache, global namespace
acceleration (GNA), or other SSD strategies.
4: Each Gen 6 node provides two ports for back-end connectivity. A Gen 6 node
supports 10 GbE, 40 GbE, and InfiniBand.
5: Power supply unit - Peer node redundancy: When a compute module power
supply failure takes place, the power supply from the peer node temporarily
provides power to both nodes.
6: Each node has five drive sleds. Depending on the length of the chassis and type
of the drive, each node can handle up to 30 drives or as few as 15.
8: The sled can be either a short sled or a long sled.
9: The chassis comes in two different depths, the normal depth is about 37 inches
and the deep chassis is about 40 inches.
10: Large journals offer flexibility in determining when data should be moved to the
disk. Each node has a dedicated M.2 vault drive for the journal. A node mirrors its
journal to its peer node. The node writes the journal contents to the vault when
a power loss occurs. A backup battery helps maintain power while data is stored in
the vault.
Gen 6.5 requires a minimum of three nodes to form a cluster. You can add single
nodes to the cluster. The F600 and F200 are a 1U form factor and based on the
R640 architecture.
1: Scaling out an F200 or an F600 node pool only requires adding one node.
3: Each F200 and F600 node provides two ports for backend connectivity. PCIe
slot 1 is used.
4: Redundant power supply units - When a power supply fails, the secondary
power supply in the node provides power. Power is supplied to the system equally
from both PSUs when the Hot Spare feature is disabled. Hot Spare is configured
using the iDRAC settings.
5: Disks in a node are all the same type. Each F200 node has four SAS SSDs.
6: The nodes come in two different 1U models, the F200 and F600. You need
nodes of the same type to form a cluster.
7: The F200 front-end connectivity uses the rack network daughter card (rNDC).
PowerScale offers nodes for different performance and capacity workloads. The
table below shows some of the node specifications. To get the latest and complete
list of specifications and to compare the node offerings, browse the product page.
PowerScale Features
The design goal for the PowerScale nodes is to keep the simplicity of NAS, provide
the agility of the cloud, and offer the cost of commodity hardware. Click each tab to
learn more about the features provided by PowerScale. See the student guide for
more information.
16A Media and Entertainment production house needs high single stream
performance at PB scale that is cost optimized. The organization requires cloud
archive in a single namespace, archive optimized density with a low Total Cost of
Ownership (TCO) solution. This environment typically has large capacities and
employs new performance technologies at will.
Data Protection
Sizing
17Financial sectors rely heavily on data protection and availability to operate. Data
loss such as customer transactions or system downtime can negatively affect the
business.
The Gen 6x platforms address the challenges of agility and lower TCO by providing:
• Dedicated cache drives
• Modular architecture
• Non-disruptive upgrades
PowerScale has no dependency on the flash boot drive. Gen 6 nodes boot from
boot partitions on the data drives. These drives are protected using erasure coding
to remove the dependency on dedicated boot drives. Next, PowerScale uses SSD
drives for the journal to remove the NVRAM dependency present on Gen 5 nodes.
There are now multiple distributed copies of the journal.
18A simplicity and agility use case is a small start-up company growing at rapid
pace, who needs to start with limited capacity and then grow on demand for scale
and new workloads.
Creating smaller failure domains with significantly fewer drives in each node pool
and neighborhood increases the reliability of the system by reducing the spindle-
to-CPU ratio. The increased reliability enables the cluster to use larger capacity
drives without the risk of overburdening the system in the event of a drive failure.
PowerScale enables predictable failure handling at petabyte (PB) densities.
Gen 6 platforms have dedicated cache drives. The caching options offered are 1 or
2 SSD configurations in various capacities to maximize front-end performance.
Gen 6 hardware is focused on support and serviceability,
based on a modular architecture with full redundancy. It is possible to increase
performance with data in place, increase cache without disruption, and upgrade
speeds and feeds non-disruptively.
PowerScale Family
The Gen 6x family has different offerings that are based on the need for
performance and capacity. You can scale out compute and capacity separately.
OneFS runs on all nodes. Click each tab to learn more about the different offerings.
F-Series
The F-series nodes sit at the top of both performance and capacity, with the all-
flash arrays. The all-flash platforms can accomplish 250-300k protocol operations
per chassis, and get 15 GB/s aggregate read throughput from the chassis. Even
when the cluster scales, the latency remains predictable.
• F800
• F810
• F600
• F200
H-Series
After F-series nodes, next in terms of computing power are the H-series nodes.
These are hybrid storage platforms that are highly flexible and strike a balance
between large capacity and high-performance storage to provide support for a
broad range of enterprise file workloads.
• H400
• H500
• H5600
• H600
A-Series
The A-series nodes have less compute power than the other nodes and are
designed for data archival purposes. The archive platforms can be combined with
new or existing all-flash and hybrid storage systems into a single cluster that
provides an efficient tiered storage solution.
• A200
• A2000
Node Interconnectivity
1: Backend ports int-a and int-b. The int-b port is the upper port. Gen 6 backend
ports are identical for InfiniBand and Ethernet, and cannot be identified by looking
at the node. If Gen 6 nodes are integrated in a Gen 5 or earlier cluster, the backend
will use InfiniBand. Note that there is a procedure to convert an InfiniBand backend
to Ethernet if the cluster no longer has pre-Gen 6 nodes.
2: PowerScale nodes with different backend speeds can connect to the same
backend switch and not see any performance issues. For example, an environment
has a mixed cluster where A200 nodes have 10 GbE backend ports and H600
nodes have 40 GbE backend ports. Both node types can connect to a 40 GbE
switch without affecting the performance of other nodes on the switch. The 40 GbE
switch provides 40 GbE to the H600 nodes and 10 GbE to the A200 nodes.
4: There are two speeds for the backend Ethernet switches, 10 GbE and 40 GbE.
Some nodes, such as archival nodes, might not need to use all of a 10 GbE port
bandwidth while other workflows might need the full utilization of the 40 GbE port
bandwidth. The Ethernet performance is comparable to InfiniBand so there should
be no performance bottlenecks with mixed performance nodes in a single cluster.
Administrators should not see any performance differences if moving from
InfiniBand to Ethernet.
Gen 6 nodes can use either an InfiniBand or Ethernet switch on the backend.
InfiniBand was designed as a high-speed interconnect for high-performance
computing, and Ethernet provides the flexibility and high speeds that sufficiently
support the PowerScale internal communications.
Gen 6.5 only supports Ethernet. All new PowerScale clusters support Ethernet only.
Network: There are two types of networks that are associated with a cluster:
internal and external.
19In general, keeping the network configuration simple provides the best results
with the lowest amount of administrative overhead. OneFS offers network
provisioning rules to automate the configuration of additional nodes as clusters
grow.
Ethernet
Clients connect to the cluster using Ethernet connections20 that are available on all
nodes.
20Because each node provides its own Ethernet ports, the amount of network
bandwidth available to the cluster scales linearly.
OneFS supports a single cluster21 on the internal network. This back-end network,
which is configured with redundant switches for high availability, acts as the
backplane for the cluster.22
The Gen 6x back-end topology in OneFS 8.2 and later supports scaling a
PowerScale cluster to 252 nodes. See the participant guide for more details.
The graphic shows a Leaf-Spine topology with 27 uplinks per spine switch.
Leaf-Spine is a two-level hierarchy where nodes connect to leaf switches, and leaf
switches connect to spine switches. Leaf switches do not connect to one another,
and spine switches do not connect to one another. Each leaf switch connects with
each spine switch, and all leaf switches have the same number of uplinks to the
spine switches.

22This enables each node to act as a contributor in the cluster and isolates node-
to-node communication to a private, high-speed, low-latency network. This back-
end network uses Internet Protocol (IP) for node-to-node communication.
The new topology uses the maximum internal bandwidth and 32-port count of Dell
Z9100 switches. When planning for growth, F800 and H600 nodes should connect
over 40 GbE ports whereas A200 nodes may connect using 4x1 breakout cables.
Scale planning enables nondisruptive upgrades, meaning that as nodes are added,
no recabling of the backend network is required. Ideally, plan for three years of
growth. The table shows the switch requirements as the cluster scales. In the table,
Max Nodes indicates that each node is connected to a leaf switch using a 40 GbE
port.
Challenge
IT Manager:
Open participation question:
Question: What are the differences between Gen 6 nodes and
Gen 6.5 nodes?
Resources
Scenario
• Serial Console
• Web Administration Interface (WebUI)
• Command Line Interface (CLI)
• Platform Application Programming Interface (PAPI)
• Front Panel Display
This video provides an overview on the serial console. See the student guide for a
transcript of the video.
https://edutube.emc.com/Player.aspx?vno=jHnaLyBuvlzyrARCLAU/jw==&autoplay
=true
Four options are available for managing the cluster. The web administration
interface (WebUI), the command-line interface (CLI), the serial console, or the
platform application programming interface (PAPI), also called the OneFS API. The
first management interface that you may use is a serial console to node 1. A serial
connection using a terminal emulator, such as PuTTY, is used to initially configure
the cluster. The serial console gives you serial access when you cannot or do not
want to use the network. Other reasons for accessing using a serial connection
may be for troubleshooting, site rules, a network outage, and so on. Shown are the
terminal emulator settings.
After the initial configuration using the configuration Wizard, running the isi
config command enables you to change the configuration settings.
isi config

The graphic shows the isi config console callouts: common commands include
shutdown, status, and name; the prompt changes to >>>; and other isi commands
are not available in the configuration console.
The isi config command, pronounced "izzy config," opens the configuration
console. The console contains configured settings from the time the Wizard started
running.
Use the console to change initial configuration settings. When in the isi config
console, other configuration commands are unavailable. The exit command is
used to go back to the default CLI.
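As a brief sketch (the Boston-1 prompt is a hypothetical cluster name), a short isi
config session using the commands named above might look like this:

Boston-1# isi config
>>> status advanced
>>> name
>>> exit

The status advanced command shows the LNNs and device IDs, name displays the
cluster name, and exit returns to the default CLI.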
The graphic highlights the WebUI: it displays the OneFS version, requires users to
have logon privileges, and is reached by connecting to any node in the cluster over
HTTPS on port 8080.
The WebUI requires at least one IP address configured23 on one of the external
Ethernet ports present in one of the nodes.
• Out-of-band24
• In-band25
Both methods are done using any SSH client such as OpenSSH or PuTTY. Access
to the interface changes based on the assigned privileges.
OneFS commands are code that is built on top of the UNIX environment and are
specific to OneFS management. You can use commands together in compound
command structures, combining UNIX commands with customer-facing and
internal commands.

24Accessed using a serial cable connected to the serial port on the back of each
node. As many laptops no longer have a serial port, a USB-serial port adapter may
be needed.
5: The CLI command use includes the capability to customize the base command
with the use of options, also known as switches and flags. A single command with
multiple options results in many different permutations, and each combination
results in different actions performed.
6: The CLI is a scriptable interface. The UNIX shell enables scripting and execution
of many UNIX and OneFS commands.
CLI Usage
The man isi or isi --help command is an important command for a new
administrator. These commands provide an explanation of the available isi
commands and command options. You can also view a basic description of any
command and its available options by typing the -h option after the command.
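For example, a minimal sketch of getting help at the CLI (isi status is used here
only as a sample command):

Boston-1# man isi          # manual page for the isi command set
Boston-1# isi --help       # lists the available isi commands
Boston-1# isi status -h    # help and options for a specific command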
26A chief benefit of PAPI is its scripting simplicity, enabling customers to automate
their storage administration.
3: Some commands are not PAPI aware, meaning that RBAC roles do not apply.
These commands are internal, low-level commands that are available to
administrators through the CLI. Commands not PAPI aware: isi config, isi
get, isi set, and isi services
4: The number indicates the PAPI version. If an upgrade introduces a new version
of PAPI, some backward compatibility ensures that there is a grace period for old
scripts to be rewritten.
The Gen 6 front panel display is an LCD screen with five buttons used for basic
administration tasks27.
The Gen 6.5 front panel has limited functionality28 compared to the Gen 6.
Challenge
Lab Assignment: Launch the lab image and connect to the cluster
using the WebUI and the CLI.
27Some of them include: adding the node to a cluster, checking node or drive
status, events, cluster details, capacity, IP and MAC addresses.
28You can join a node to a cluster, and the panel displays the node name after the
node has joined the cluster.
Scenario
Your Challenge: The new IT manager has given you a task to describe
the OneFS licensing and add the new nodes to the PowerScale cluster.
Licensing
• No individual per-feature keys.
• Evaluation licensing is enabled from the cluster.
• Upgrades translate existing keys into the license file.

WebUI: Cluster management > Licensing > Open Activation File Wizard, or use the
"isi license" command.
In OneFS 8.1 and later a single license file contains all the licensed feature
information in a single location.
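A minimal sketch of the CLI side of licensing; the activation file path is hypothetical
and assumes a signed activation file has already been generated:

Boston-1# isi license list                                    # shows the status of each licensed feature
Boston-1# isi license add --path /ifs/local/activation.xml    # applies the signed activation file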
The device ID cannot be changed. The graphic shows changing LNN 3 to LNN 5 to
maintain the sequential numbering of the nodes.
You should have an understanding of the two different numbers that identify a
node. The numbers are the device ID and the logical node number, or LNN.
The status advanced command from the isi config submenu shows the
LNNs and device IDs.
When a node joins a cluster, it is assigned a unique node ID number. If you remove
a node from the cluster and rejoin it, the node is assigned a new device ID.
You can change an LNN in the configuration console. To change the LNN to
maintain the sequential numbering of the nodes use lnnset <OldNode#>
<NewNode#>.
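A minimal sketch of the LNN change described above, run from the isi config
console (the commit step to save the change is included here as an assumption):

Boston-1# isi config
>>> lnnset 3 5
>>> commit
>>> exit

This renumbers LNN 3 to LNN 5; the device ID of the node does not change.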
When adding new nodes to a cluster, the cluster gains more CPU, memory, and
disk space. The methods for adding a node are:
• Front panel
• Configuration Wizard
• WebUI
• CLI
Join the nodes in the order that the nodes should be numbered.
Adding a node not connected to the external network (NANON) increases the
storage and compute capacity of the cluster.
Nodes are automatically assigned node numbers and IP addresses on the internal
and external networks. A node joining the cluster with a newer or older OneFS
version is automatically reimaged to match the OneFS version of the cluster. A
reimage may take up to 5 minutes.
Compatibility
Hardware compatibility is a concern when combining dissimilar Gen 6.5 nodes. For
example, consider adding a single F200 node with 48 GB of RAM to an F200 node
pool that has nodes with 96 GB of RAM. Without compatibility, a minimum of three
F200 nodes with 48 GB of RAM is required, which creates a separate node pool.
Node series compatibility depends upon the amount of RAM, the SSD size, number
of HDDs, and the OneFS version.
Cluster Shutdown
Administrators can restart or shutdown the cluster using the WebUI29 or the CLI30.
Challenge
Lab Assignment: Launch the lab and add a node using the
Configuration Wizard and add a node using the WebUI.
29The WebUI Hardware page has a tab for Nodes to shut down a specific node, or
the Cluster tab to shut down the cluster.
30Native UNIX commands do not elegantly interact with OneFS, because the
OneFS file system is built as a separate layer on top of UNIX.
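As a sketch, a full cluster shutdown from the CLI uses the isi config console rather
than the native UNIX shutdown command; the assumption here is that the console's
shutdown command accepts all or a single node number:

Boston-1# isi config
>>> shutdown all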
Scenario
IT Manager: Good, looks like you know what the different PowerScale
management tools are. Now I want you to focus on the directory
structure that OneFS uses. This is important as it sets up the directory
structure we will use moving forward.
At the core of OneFS is the single file system across the cluster (/ifs). The single
file system in practice is a common directory structure.
Using or modifying the built-in directory paths is not recommended unless you are
explicitly instructed to do so.
• Using a single file system starting with a newly created directory under /ifs is
recommended.
• For example, in the simplest form, you can create /ifs/engineering where
the engineering department data is the top-level directory for the engineering
organization.
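For example, the engineering base directory could be created from any node; a
minimal sketch using standard UNIX commands:

Boston-1# mkdir /ifs/engineering    # top-level directory for the engineering organization
Boston-1# ls -l /ifs                # verify the new base directory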
The graphic shows the recommended hierarchy: the OneFS root (/ifs), a cluster
root used for authentication and segregation, and directories below it that are the
locations to situate data and create exports and shares as required.
Use case:
• A company that is named X-Attire plans to implement a single cluster for their
engineering team.
• After conversations with the customer, you identify that the customer does not
plan to have another cluster for remote disaster recovery.
• The company name or authentication domain name is used as the access zone
name (x-attire).
Use case:
• X-Attire plans to implement a disaster recovery solution.
• X-Attire wants to replicate the Boston/homedirs directory to the Seattle data
center.
• From Seattle, they plan to replicate the /groupdirs directory to Boston.
• Having the directory structure design up front makes the implementation easier.
On the /ifs directory, do not set inherited ACLs and do not propagate ACL
values.
Permissions on levels 1 through 5 are customer-specific and you should define the
appropriate permissions and inherited permissions starting at the appropriate level.
Challenge
Lab Assignment: Go to the lab and build the base directories. The
base directories are used throughout your implementation of the
PowerScale cluster.
Authentication Providers
Scenario
IT Manager: Now, the next thing to do is get the cluster pointed to the
Active Directory and LDAP servers. Before our clients can access files
that are stored on the cluster, they must be authenticated. Make sure
that you have a good understanding of the authentication providers that
the cluster supports.
The primary reason for joining the cluster to an Active Directory domain is to
perform user and group authentication.

5: The local provider provides authentication and lookup facilities for user accounts
added by an administrator.
The graphic shows the access control architectural components with two
configured access zones. The lsassd daemon sits between the access protocols
and the lower-level service providers: it mediates between the authentication
protocols that clients use and the authentication providers in the third row.
The authentication providers check their data repositories, which are shown on the
bottom row. The process determines user identity and subsequent access to files.
Function
Active Directory can serve many functions, but the primary reason for joining the cluster to an AD domain is to enable domain
users to access cluster data.
To join the cluster to AD, specify the fully qualified domain name, which can be
resolved to an IPv4 or an IPv6 address, and a username with join permission.
Areas to consider:
• Creates a single AD machine account
• Establishes trust relationship
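A hedged sketch of the CLI equivalent shown in the demonstration that follows; the
dees.lab domain and the administrator account are taken from the lab environment
and are assumptions here:

Boston-1# isi auth ads create dees.lab --user administrator   # prompts for the account password

The isi auth ads create -h command lists the full set of join options.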
Link:
https://edutube.emc.com/Player.aspx?vno=Xu/3IyDNSxbuNMOcLHrqBg==&autopl
ay=true
Select the Join a domain button. This demonstration shows the barest configuration
to join a domain. Start by entering the provider name. The NetBIOS requires that
computer names be 15 characters or less. Two to four characters are appended to
the cluster name you specify to generate a unique name for each node. If the
cluster name is more than 11 characters, you can specify a shorter name in the
Machine Name field. Enter the user name of the account that has the right to add
computer accounts to the domain, and then enter the account password. The
Enable Secure NFS checkbox enables users to log in using LDAP credentials, but
to do this, Services for NFS must be configured in the AD environment.
Shown is the CLI equivalent command used to join Active Directory. To display a
list of command options, run the isi auth ads create -h command at the
CLI. Now, before connecting to an LDAP server, you should decide which optional
customizable parameters you want to use. Refer to the Isilon Web Administration
Guide for details on each of the settings.
Click the Join button. While joining the domain, the browser window displays the
status of the process and confirms when the cluster has successfully joined the AD
domain. The join creates a single computer account for the entire cluster.
And that is the most basic configuration. Note that AD and LDAP both use TCP
port 389. Even though both services can be installed on one Microsoft server, the
cluster can only communicate with one of the services if they are both installed on the
same server. This concludes the demonstration.
31 The easiest method is to synchronize the cluster and the authentication servers
all to the same NTP source.
32The cluster time property sets the date and time settings, either manually or by
synchronizing with an NTP server. After an NTP server is established, setting the
date or time manually is not allowed.
33After a cluster is joined to an AD domain, adding an NTP server can cause time
synchronization issues. The NTP server takes precedence over the SMB time
synchronization with AD and overrides the domain time settings on the cluster.
35 Nodes use NTP between themselves to maintain cluster time. When the cluster
is joined to an AD domain, the cluster must stay synchronized with the time on the
domain controller. If the time differential is more than five minutes, authentication
may fail.
NTP Configuration
Chimer nodes can contact the external NTP servers. Non-chimer nodes use the
chimer nodes as their NTP servers.
WebUI > General settings > NTP page to configure NTP and chimer settings.
You can configure specific chimer nodes by excluding other nodes using the
isi_ntp_config {add | exclude} <node#> command. The list excludes
nodes using their node numbers that are separated by a space.
LDAP Overview
Function
OneFS can authenticate users and groups against an LDAP repository in order to grant them access to the cluster. OneFS
supports Kerberos authentication for an LDAP provider.
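A minimal sketch of creating an LDAP provider at the CLI, using the base DN from
the demonstration that follows; the provider name and server URI are hypothetical:

Boston-1# isi auth ldap create lab-ldap \
  --server-uris ldap://ldap.dees.lab \
  --base-dn "dc=dees,dc=lab"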
37Each attribute has a name and one or more values that are associated with it
that is similar to the directory structure in AD.
Link:
https://edutube.emc.com/Player.aspx?vno=JKBFLVJaUoqGz8DJmH4zqg==&autop
lay=true
In this demonstration, we’ll go through the steps needed to configure LDAP for the
PowerScale cluster. Let us navigate to Access and then to Authentication providers
page. Next, select the LDAP tab. Now click the Add an LDAP provider button.
For this demonstration, I am only showing the barest configuration. Let us give our
LDAP a provider name. Next, I will enter the URI to the LDAP server. You must
configure a base distinguished name. Often issues involve either misconfigured
base DNs or connecting to the LDAP server. The top-level names almost always
mimic DNS names; for example, the top-level Isilon domain would be dc=isilon,
dc=com for Isilon.com. Our environment is DEES and lab.
Shown is the CLI equivalent command used to configure LDAP. To display a list of
these commands, run the isi auth ldap create -h command at the CLI.
And that is the most basic configuration.
Now, before connecting to an LDAP server you should decide which optional
customizable parameters you want to use. If there are any issues while configuring
or running the LDAP service, there are a few commands that can be used to help
troubleshoot. The ldapsearch command runs queries against an LDAP server to
verify whether the configured base DN is correct. The tcpdump command verifies
that the cluster is communicating with the assigned LDAP server.
You have the option to enter a netgroup. A netgroup is a set of systems that reside
in a variety of different locations and that are grouped together and used for
permission checking. For example, a UNIX computer on the 5th floor, six UNIX
computers on the 9th floor, and 12 UNIX computers in the building next door can
all be combined into one netgroup.
Select the Add LDAP Provider button. After the LDAP provider is successfully
added, the LDAP providers page displays a green status. This means that the
cluster can communicate with the LDAP server. Note that AD and LDAP both use
TCP port 389. Even though both services can be installed on one Microsoft server,
the cluster can only communicate with one of the services if they are both installed on
the same server. This concludes the demonstration.
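As a sketch of the troubleshooting commands named above, where the server
name and interface are placeholders for this environment:

Boston-1# ldapsearch -x -H ldap://<ldap-server> -b "dc=dees,dc=lab"   # confirms the base DN answers queries
Boston-1# tcpdump -i <interface> host <ldap-server> and port 389      # confirms traffic between the cluster and the LDAP server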
Challenge
Lab Assignment:
• Join the cluster to Active Directory
• Configure the cluster for LDAP
Access Zones
Scenario
IT Manager: Now that you have configured the cluster for Active
Directory and LDAP, it is time to take the next step in implementation.
You are configuring access zones for two organizations, finance and
engineering. Finance is a Microsoft Windows environment and
engineering is a Linux environment. Before you configure the cluster, I
want to ensure you understand access zones and what they do.
This video provides an overview for access zones. See the student guide for a
transcript of the video.
Link: https://edutube.emc.com/Player.aspx?vno=w/pzpXjL6ZCFlcdx0riu5A
Although the default view of a cluster is that of one physical machine, you can
partition a cluster into multiple virtual containers called access zones. Access
zones enable you to isolate data and control who can access data in each zone.
Access zones support configuration settings for authentication and identity
management services on a cluster. Configure authentication providers and
provision protocol directories, such as SMB shares and NFS exports, on a zone-by-
zone basis. Creating an access zone automatically creates a local provider, which
enables you to configure each access zone with a list of local users and groups.
You can also authenticate through a different authentication provider in each
access zone.
The OneFS identity management maps users and groups from separate directory
services to provide a single combined identity. It also provides uniform access
control to files and directories, regardless of the incoming protocol.
External Protocols
Clients use the external access protocols to connect to the PowerScale cluster.
The supported protocols are SMB, NFS, S3, HTTP, FTP, HDFS, and SWIFT.
lsassd Daemon
The lsassd (L-sass-d) daemon mediates between the external protocols and the
authentication providers, with the daemon contacting the external providers for user
lookups.
External Providers
Internal Providers
Internal providers sit within the cluster operating system and are the Local, or File
Providers.
• File provider - authoritative third-party source of user and group information.
• Local provider - provides authentication and lookup facilities for user accounts
added by an administrator.
• Local provider automatically created in access zone.
4: The /ifs/eng base directory partitions data from the /ifs/dvt directory.
5: The base directory of the default System access zone is /ifs and cannot be
modified. Avoid using the OneFS built-in directories as base directories.
A base or root directory defines the tree structure of the access zone.
The access zone cannot grant access to any files outside of the base directory,
essentially creating a unique namespace.
This demonstration provides a look at access zone configuration. See the student
guide for a transcript of the video.
Link: https://edutube.emc.com/Player.aspx?vno=08ieHpVlyvyD+A8mTzHopA
In this demonstration, we will go through the steps to create access zones using
the WebUI and the CLI. First, let’s use the WebUI.
Navigate to Access and then to the Access zones page. Note that the System
access zone is shown in the table. The System zone is created by OneFS. Select
the Create an access zone button. In the window, enter the zone name for the new
access zone. Next enter the zone base directory. This should be unique, and you
should avoid using the OneFS built-in directories such as /ifs/data. Our base
directory is /ifs/sales.
Since we have not created this directory before creating the access zone, select
the checkbox to create the base directory automatically. Notice that we already
configured the authentication providers. This access zone is dedicated for the
Active Directory users. Add the AD provider and then select Create zone.
Next, we will create another access zone using the CLI. We are logged in via SSH
to node 1 and using the isi zone command. The name of this access zone is
engineering. The unique base directory is /ifs/engineering. Since the
/ifs/engineering directory does not exist, use the option to create it. And
finally, we will add the LDAP authentication provider to the zone.
Next verify that the zones are created. Use the list option. Moving back to the
WebUI, check the access zone page to verify the zones display. Instead of waiting
for the refresh, click on another page and then back.
This demonstration showed configuring access zones using the WebUI and the
CLI. This concludes the demonstration.
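A minimal sketch of the CLI portion of the demonstration; the zone name and base
directory follow the engineering example:

Boston-1# isi zone zones create engineering /ifs/engineering --create-path
Boston-1# isi zone zones list

The --create-path option creates the base directory if it does not exist. The LDAP
authentication provider is then added to the zone, as shown in the demonstration.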
Listed are areas to consider when configuring and discussing access zones.
• The number of access zones should not exceed 50.
• As a good practice, configure an access zone for a specific protocol if multi-
protocol access is not needed. For example, an implementation with both NFS
and SMB access should have an access zone for the NFS access and another
access zone for the SMB access.
• Access zones and authentication providers must be in only one groupnet.
• Authentication sources are joined to the cluster and "seen" by access zones -
multiple instances of the same provider in different access zones is not
recommended.
• Authentication providers are not restricted to one specific zone.
• Only join AD providers not in same forest (untrusted forest).
• Shared UIDs in same zone can potentially cause UID/GID conflicts.
• You can overlap data between access zones for cases where workflows require
shared data - however, overlapping adds complexity that may lead to issues
with client access.
You can avoid configuration problems on the cluster when creating access zones
by following best practices guidelines.
• The System zone is for global admin access only. Employ ZRBAC for zone
administration.
• Create zones to isolate data for different clients. Do not isolate if the workflow
requires shared data.
Challenge
Groupnets
Scenario
Configure the SmartConnect IP address, VLAN, and MTU on the subnet.
Groupnets reside at the top tier of the networking hierarchy and are the
configuration level for managing multiple tenants on your external network.
A subnet can also be called the SmartConnect zone and contain one or more pools.
Pools enable more granular network configuration.
Multi-Tenancy Overview
The graphic shows two tenants on one cluster: SmartConnect zone
isilon.xattire.com on the 192.168.0.0/24 network and SmartConnect zone
isilon.gearitup.com on the 192.168.2.0/24 network.
Groupnets are the configuration level for managing multiple tenants39 on the
external network of the cluster.
In the X-Attire scenario, the solution must treat each business unit as a separate
and unique tenant with access to the same cluster. The graphic shows how each
organization has its own groupnet and access zone.
Multi-tenancy Considerations
Groupnets are an option for those clusters that will host multiple companies,
departments, or clients that require their own DNS settings. Some areas to
consider are:
• DNS settings are per groupnet
• Create another groupnet only if separate DNS settings required.
• Follow proper build order:
1. Create groupnet
2. Configure authentication provider
3. Create access zone, and add authentication provider
4. Configure subnet with SmartConnect
5. Create pool, and add access zone
• In a multiple tenant solution, a share can span access zones. Combining
namespaces and overlapping shares is an administrative decision.
This video provides an overview of the groupnet and access zone relationship. See
the student guide for a transcript of the video.
Link:
https://edutube.emc.com/Player.aspx?vno=b4A2l5FzF2na/Txqk2AUTA==&autopla
y=true
Because groupnets are the top networking configuration object, they have a close
relationship with access zones and the authentication providers. Having multiple
groupnets on the cluster means that you are configuring access to separate and
different networks, which are shown as org1 and org2. Different groupnets enable
portions of the cluster to have different networking properties for name resolution.
Configure another groupnet if separate DNS settings are required. If necessary, but
not required, you can have a different groupnet for every access zone. The
limitation of 50 access zones enables the creation of up to 50 groupnets.
When the cluster joins an Active Directory server, the cluster must know which
network to use for external communication to the external AD domain. Because of
this, if you have a groupnet, both the access zone and the authentication provider
must exist within the same groupnet. Access zones and authentication providers
must exist within only one groupnet. Active Directory provider org2 must exist
within the same groupnet as access zone org2.
The graphic shows the Cluster management > Network configuration > external network > Add
a groupnet window.
When creating a groupnet with access zones and providers in the same zone, you
need to create them in the proper order:
1. Create the groupnet.
2. Create the access zone and assign to the groupnet.
3. Create the subnet and pool.
4. Add the authentication providers and associate them with the groupnet.
5. Associate the authentication providers with the access zone.
Challenge
IT Manager:
Because you configure the network components together, you will not
go to the lab until the other topics are discussed. Open participation
question:
Question: When would you create a groupnet?
Scenario
This video provides an overview of SmartConnect. See the student guide for a
transcript of the video.
Link: https://edutube.emc.com/Player.aspx?vno=L7mXSvTcNQl8+LLKzNEzkw
SmartConnect provides name resolution for the cluster. The cluster appears as a
single network element to a client system. Both cluster and client performance can
be enhanced when connections are more evenly distributed.
In Isilon OneFS 8.2, SmartConnect supports connection service for 252 nodes.
SmartConnect Architecture
The example shows two unique groups using the same cluster: X-Attire with
SmartConnect zone isilon.xattire.com and SIPs 192.168.0.100-192.168.0.104 on
the 192.168.0.0/24 network, and GearItUp with SmartConnect zone
isilon.gearitup.com and SIPs 192.168.2.100-192.168.2.104 on the 192.168.2.0/24
network.
You can configure SmartConnect into multiple zones to provide different levels of
service for different groups of clients.
For example, SmartConnect directs X-Attire users to F800 nodes for their needed
performance. GearItUp users access the H500 nodes for general-purpose file
sharing. The zones are transparent to the users.
The SmartConnect Service IPs40 (SSIP or SIP) are addresses that are part of the
subnet.
SmartConnect Licensing
The table shows the differences between the SmartConnect basic and
SmartConnect Advanced.
40Do not put the SIPs in an address pool. A SIP is a virtual IP address within the
PowerScale configuration; it is not bound to any of the external interfaces.
The SIPs, SmartConnect zone, and the DNS entries are the configuration
components for SmartConnect.
This demonstration shows the initial network configuration for the cluster. See the
student guide for a transcript of the video.
Link: https://edutube.emc.com/Player.aspx?vno=4hL0i4iBe2BLqJzlT4dN/Q
In this demonstration, we’ll go through the steps for an initial configuration of the
cluster external network. The demonstration shows configuring SmartConnect and
a dedicated pool for an access zone.
First, login to the WebUI and navigate to the Cluster management, Network
configuration page. The External network tab is selected by default. Note that
groupnet0 and subnet0 are automatically created by OneFS. On the subnet0 line,
select View / Edit. There are no values for SmartConnect. Select Edit. Go to the
SmartConnect service IPs and enter the range of SmartConnect IP addresses.
OneFS versions prior to OneFS 8.2 do not allow you to enter a range of IP
addresses. For this demonstration we will be using a SmartConnect service name.
Select Save changes. The CLI equivalent to add the SmartConnect service
address is the isi network subnet modify command. Now that
SmartConnect is configured, we will configure the IP address pool for the access
zone. On the subnet0 line, click on the More dropdown and select Add pool.
Enter the pool name and then select the access zone. For this implementation the
authentication providers and the access zones are already created.
Next enter the range of IP addresses for this pool. Select the external node interfaces
that will carry the client traffic. The SmartConnect basic fully qualified zone name is
sales.dees.lab. We have the SmartConnect advanced license activated. Here is
where we can configure the advanced functions. For the demonstration, we will
keep the default settings. Select Add pool. The CLI equivalent to create a pool is
the isi network pools create command.
This demonstration showed the initial configuration of the network. This concludes
the demonstration.
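A hedged sketch of the CLI equivalents mentioned in the demonstration; the object
names, IP ranges, interface list, and the sales.dees.lab zone name follow the
example environment and are assumptions here:

Boston-1# isi network subnets modify groupnet0.subnet0 \
  --sc-service-addrs 192.168.0.100-192.168.0.104
Boston-1# isi network pools create groupnet0.subnet0.pool1 \
  --ranges 192.168.0.110-192.168.0.120 \
  --access-zone sales \
  --ifaces 1-4:10gige-1 \
  --sc-dns-zone sales.dees.lab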
SmartConnect Considerations
• Static pools are best used for stateful clients, and dynamic pools are best for
stateless clients.
• Time-to-live value41.
Challenge
IT Manager:
Because you configure the network components together, you will not
go to the lab until the other topics are discussed. Open participation
question:
Question: What are the SmartConnect Advanced benefits?
IP Address Pools
Scenario
OneFS automatically configures groupnet0, subnet0, and pool0. IP address pools
control connectivity to access zones.
Additional subnets are configured as either IPv4 or IPv6 subnets. More IP address
pools are created within subnets and associated with a node, a group of nodes,
NIC ports, or aggregated ports.
The pools of IP address ranges in a subnet enable you to customize42 how users
connect to your cluster.
Use case: Say that X-Attire adds 4 F800 nodes for a video media group. X-Attire
wants the video media team to connect directly to the F800 nodes to use various
high I/O applications. The administrators can separate the X-Attire connections.
Access to the home directories connect to the front end of the H500 nodes while
the video media group accesses the F800 nodes. This segmentation keeps the
home directory users from using bandwidth on the F800 nodes.
Link Aggregation
The graphic shows link aggregation combining two physical NICs into a single
logical NIC.
43The link aggregation mode determines how traffic is balanced and routed among
aggregated network interfaces.
Click each tab to learn more about the link aggregation modes.
LACP
Configure LACP at the switch level and on the node. Enables the node to negotiate
interface aggregation with the switch.
The graphic shows a PowerScale node with two physical NICs aggregated into one
logical NIC that connects to the switch.
Round Robin
Round robin is a static aggregation mode that rotates connections through the
nodes in a first-in, first-out sequence, handling all processes without priority.
Round robin balances outbound traffic across all active ports in the aggregated link
and accepts inbound traffic on any port.
Client requests are served one after the other based on their arrival.
The graphic shows a PowerScale node rotating incoming client requests across
Physical NIC 1 and Physical NIC 2 in a first-in, first-out sequence: client request 2,
client request 3, and so on follow client request 1.
Note: Round robin is not recommended if the cluster is using TCP/IP workloads.
Failover
Active/Passive failover is a static aggregation mode that switches to the next active
interface when the primary interface becomes unavailable. The primary interface
handles traffic until there is an interruption in communication. At that point, one of
the secondary interfaces takes over the work of the primary.
In the graphic, the primary interface (Physical NIC 1) serves the incoming client
requests. If it becomes unavailable or is interrupted due to an issue, the secondary
interface (Physical NIC 2) takes over and serves the subsequent client requests.
FEC
Typically used with older Cisco switches - LACP preferred in new generation
PowerScale nodes.
FEC accepts all incoming traffic and balances outgoing traffic over aggregated
interfaces that is based on hashed protocol header information that includes source
and destination addresses.
The graphic shows the node accepting all incoming client requests and balancing
the outgoing traffic across Physical NIC 1 and Physical NIC 2.
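As a sketch, the aggregation mode of an existing pool could be set from the CLI;
the pool name is hypothetical and the --aggregation-mode option and its values are
assumptions here:

Boston-1# isi network pools modify groupnet0.subnet0.pool1 --aggregation-mode lacp

The other modes described above would be selected with roundrobin, failover, or fec.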
Allocation Method
Static
If there are more IP addresses than nodes, new nodes that are added to the pool
get the additional IP addresses.
Once an IP address is allocated, the node keeps the address indefinitely unless the
member interface is deleted from the pool or the node is removed from the cluster.
Dynamic
Dynamic pools are best used for stateless protocols such as NFSv3. Also configure
for NFSv4 with continuous availability (CA).
The graphic shows two SmartConnect zones, each with a different IP allocation method.
Static pools are best used for SMB clients because of the stateful nature of the
SMB protocol.
Dynamic pools are best used for stateless protocols such as NFSv3. You can
identify a dynamic range by the way the IP addresses present on the interface as a
range, such as .110-.112 or .113-.115, instead of a single IP address.
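A minimal sketch of switching a pool between the two methods from the CLI; the
pool name is hypothetical and the --alloc-method option is an assumption:

Boston-1# isi network pools modify groupnet0.subnet0.pool1 --alloc-method dynamic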
Challenge
Configuring Identity Management and Authorization

Scenario
Overview
A user who is assigned to more than one role has the combined privileges of those
roles.
The root and admin users can assign others to built-in or custom roles that have
login and administrative privileges to perform specific administrative tasks.
The example shows that user Jane is assigned the Backup Administrator role.
Many of the privileges that user Root has are not visible to user Jane.
Role-based access enables you to separate out some administrative privileges and
assign only the privileges that a user needs. Granting privileges makes access to
the configuration of the cluster less restrictive.
Roles
OneFS includes built-in administrator roles with predefined sets of privileges that
you cannot modify. You can also create custom roles and assign privileges. Click
the tabs to learn more about each role.
Built-in Roles
Built-in roles44 are included in OneFS and have been configured with the most
likely privileges necessary to perform common administrative functions.
44You cannot modify the list of privileges that are assigned to each built-in role.
However, you can assign users and groups to built-in roles.
47The AuditAdmin built-in role enables you to view all system configuration
settings.
48 The BackupAdmin built-in role enables backup and restore of files from /ifs.
Custom roles
You can create custom roles50 and assign privileges mapped to administrative
areas in your PowerScale cluster environment.
The following list describes what you can and cannot do through roles:
• You can assign privileges to a role but not directly to users or groups.
• You can create custom roles and assign privileges to those roles.
• You can copy an existing role.
• You can add any user or group of users to one or more roles as long as the
users can authenticate to the cluster.
The video provides an overview of role creation. See the student guide for a
transcript of the video.
Link: https://edutube.emc.com/Player.aspx?vno=tQkWrNubtdORFBHxoRlMAg
Login as admin, a user that can assign privileges. Navigate to Access, Membership
and roles. On the Membership and roles page, note that the access zone selected
is System. Go to the Roles tab. Before moving on to the configuration, note that
OneFS has a number of built-in roles that cover most access needs. There may be
a need to define a custom role. In these instances, you can select the Create a
Role button. I will demonstrate this in a moment. A great place to learn more about
the different privileges is the Isilon OneFS Web Administration Guide.
The next example is to add a Windows administrator, Sai, to the sales access
zone. Adding Sai to a role specific to the access zone prevents him from
accidentally configuring Windows shares in other zones. In fact, Sai will have no
visibility into other zones. On the Roles tab, select the sales access zone. Note the
two built-in roles really do not provide the level of access for Sai. Create a role. The
role name is WinAdmin and add a short description. Shown is the CLI command to
create a zone role. Remember OneFS version 8.2 introduces zone-aware roles.
Previous version CLI commands do not have the --zone option. boston-2# isi
auth roles create --zone sales WinAdmin. Just as in the previous
example, add a member to this role. Select the provider and then the domain. Next
Search and select Sai. Now add privileges to the role. First, add the ability to log in
to the WebUI. Next, add the privilege to configure SMB. Give Read/write access to
this privilege. Now save the role. boston-2# isi auth roles modify WinAdmin --zone sales --add-priv ISI_PRIV_LOGIN_PAPI --add-priv ISI_PRIV_SMB --add-user dees\\sai. Now verify the privileges of
the users.
Log out and then log in as Hayden, the AuditAdmin. The first indication is the
Access menu; notice that options are missing. Navigating to Protocols, Windows
sharing, notice Hayden cannot create a share, only view. Also, since Hayden is added to a
System zone role, Hayden can audit information in other zones. System zone
administrators are global.
Log out of the WebUI and log in as Sai. You must log in at an IP address or NetBIOS name
associated with the sales access zone. Viewing the Access options, Sai does not
have the privileges. Navigating to Protocols, Windows sharing, notice Sai cannot
switch to another access zone, but can configure SMB shares. This demonstration
stepped through configuring RBAC and ZRBAC. This concludes the demonstration.
Role Management
You can view, add, or remove members of any role. Except for built-in roles, whose
privileges you cannot modify, you can add or remove OneFS privileges on a role-
by-role basis.
View Roles
• isi auth roles list - Displays a basic list of all roles on the cluster
• isi auth roles view <role> - Displays detailed information about a single role, where <role> is the name of the role
View Privileges
Viewing user privileges is performed through the CLI. The command below lists the privileges of a user.
• isi auth mapping token <user> - Lists the privileges for a user, where <user> is a placeholder for the user name
You can create an empty custom role and then add users and privileges to the role.
Deleting a role does not affect the privileges or users that are assigned to it. Built-in
roles cannot be deleted.
The table shows the commands used to create, modify and delete a custom role.
• isi auth roles create <name> [--description <string>] - Creates a role, where <name> is the name that you want to assign to the role and <string> specifies an optional description
• isi auth roles modify <role> [--add-user <string>] - Adds a user to the role, where <role> is the name of the role and <string> is the name of the user
• isi auth roles modify <role> [--add-priv <string>] - Adds a privilege with read/write access to the role, where <role> is the name of the role and <string> is the name of the privilege
• isi auth roles modify <role> [--add-priv-ro <string>] - Adds a privilege with read-only access to the role, where <role> is the name of the role and <string> is the name of the privilege
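As a minimal sketch that ties the commands above to the zone-aware role from the earlier demonstration (the role name, zone, user, and privileges are examples, and the --zone option requires OneFS 8.2 or later):
boston-2# isi auth roles create WinAdmin --zone sales --description "Sales zone SMB administrators"
boston-2# isi auth roles modify WinAdmin --zone sales --add-user dees\\sai
boston-2# isi auth roles modify WinAdmin --zone sales --add-priv ISI_PRIV_LOGIN_PAPI --add-priv ISI_PRIV_SMB
boston-2# isi auth roles view WinAdmin --zone sales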
Privileges
List privileges
The graphic shows built-in roles that have a predefined set of privileges. Red outlines are the only
privileges available for ZRBAC.
Note: The WebUI privilege names differ from the names that are
seen in the CLI.
ZRBAC - ISI_PRIV_AUTH_Privilege
1: If an access zone is created by a System zone admin, only System zone admins can
modify and delete it. A local zone admin can only view it and add access zones.
If an access zone is created by a nonsystem zone admin, both the System zone admin and the
nonsystem zone admin can view, modify, and delete it.
4: The IP address in the IP address pool associated with the access zone.
Challenge
Lab Assignment: Go to the lab and create user accounts for RBAC and
ZRBAC.
Scenario
Your Challenge: The IT manager has tasked you to determine the on-
disk identity to configure on the cluster. Before configuring, you should
have an understanding of how identity management works. The
manager expects you to describe identity management, user tokens,
and on-disk identity.
Layers of Access
Identity Assignment
Based on authentication or
mediated in cluster
Cluster connectivity has four layers of interaction. The third layer is identity
assignment. This layer is straightforward and is based on the results of the
authentication layer.
There are some cases that need identity mediation within the cluster, or where
roles are assigned within the cluster that are based on user identity.
Identity Management
OneFS identity management maps users and groups from separate
services. The mapping provides a single, unified identity on a cluster and uniform
access control to files and directories, regardless of the incoming protocol. Click
the "i" icons for high-level information about the process.
2: OneFS uses the authentication providers to first verify a user identity, after which
users are authorized to access cluster resources. The top layers are access
protocols: NFS for UNIX clients, SMB for Windows clients, and FTP and HTTP for
all.
3: Between the protocols and the lower-level services providers and their
associated data repositories is the OneFS lsassd daemon. lsassd mediates
between the authentication protocols that clients use and the authentication providers,
which check their data repositories for user identity and file access.
The video describes the access token generation. See the student guide for a
transcript of the video.
URL:
https://edutube.emc.com/Player.aspx?vno=MmSHIH1OvcP5nHsi0hd51g==&autopl
ay=true
When the cluster receives an authentication request, lsassd searches the
configured authentication sources for matches to the incoming identity. If the
identity is verified, OneFS generates an access token. Access tokens form the basis of
who you are when performing actions on the cluster. Shown is the output of the
user's mapping token. The token supplies the primary owner and group identities to
use during file creation. For most protocols, the access token is generated from the
user name or from the authorization data that is received during authentication.
Access tokens are also compared against permissions on an object during
authorization checks. The access token includes all identity information for the
session. OneFS exclusively uses the information in the token when determining if a
user has access to a particular resource.
Access tokens form the basis of who you are when performing actions on the
cluster. The tokens supply the primary owner and group identities to use during file
creation. When the cluster builds an access token, it must begin by looking up
users in external directory services. By default, the cluster matches users with the
same name in different authentication providers and treats them as the same user.
The ID-mapping service populates the access token with the appropriate identifiers.
Finally, the on-disk identity is determined.
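To inspect the token that OneFS builds, you can run the mapping token command from the earlier table against a specific account; the user name here is the example account used in this module:
boston-2# isi auth mapping token dees\\sera
The output shows the user's UID, SID, primary group, supplemental groups, and the identity selected for on-disk storage.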
Primary Identities
OneFS supports three primary identity types: UIDs, GIDs, and SIDs.
UIDs and GIDs from the Local, NIS, and LDAP providers range from 1 to 65,000 (65K).
OneFS automatically allocates UIDs and GIDs from the range 1,000,000 to 2,000,000.
1: The user identifier, or UID, is a 32-bit string that uniquely identifies users on the
cluster. UNIX-based systems use UIDs for identity management.
2: The security identifier, or SID, is a unique identifier that begins with the domain
identifier and ends with a 32-bit Relative Identifier (RID). Most SIDs take the form
S-1-5-21-<A>-<B>-<C>-<RID>, where <A>, <B>, and <C> are specific to a domain
or system, and <RID> denotes the object inside the domain. SID is the primary
identifier for users and groups in Active Directory.
3: The group identifier, or GID, for UNIX serves the same purpose for groups that
UID does for users.
Secondary Identities
1: Windows provides a single namespace for all objects that is not case-sensitive,
but specifies a prefix that targets the dees Active Directory domain. UNIX assumes
unique case-sensitive namespaces for users and groups. For example, Sera and
sera can represent different objects.
2: Kerberos and NFSv4 define principals that require all names to have a format
similar to an email address. For example, given username sera and the domain
dees.lab, dees\sera and [email protected] are valid names for a single object in
Active Directory. With OneFS, whenever a name is provided as an identifier, the
correct primary identifier of UID, GID, or SID is requested.
Multiple Identities
The graphic shows a user that has both a Windows and Linux account. Multiple
identity, or multiprotocol access, could include configuring mapping to ensure user
IDs correctly map to one another.
OneFS is RFC 2307 compliant. Enable RFC 2307 to simplify user mapping.
See the participant guide for information about mapping challenges and
considerations.
ID Mapper Database
1: The user mapper feature can apply rules to modify the user identity OneFS
uses, add supplemental user identities, and modify the group membership of a
user. The user mapping service combines user identities from different directory
services into a single access token. The mapping service then modifies it according
to the rules that you create.
On-Disk Identity
• Identifies the preferred identity to store on disk
• Determines the identity stored in ACLs - SID or UID/GIDs
The graphic shows the token of Windows user Sera with a UID as the on-disk identity.
OneFS uses an on-disk identity store for a single identity for users and groups.
The available on-disk identity types are Native, UNIX, and SID. The on-disk identity
is a global setting. Because most protocols require some level of mapping to
operate correctly, choose the preferred identity to store on-disk.
The use case for the default Native setting is an environment that has NFS and
SMB client and application access. With the Native on-disk identity set, lsassd
attempts to locate the correct identity to store on disk by running through each ID-
mapping method. The preferred object to store is a real UNIX identifier. OneFS
uses a real UNIX identifier when found. If a user or group does not have a real
UNIX identifier (UID or GID), OneFS stores the real SID. Click on the highlighted
icon to learn more.
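As a sketch of checking and setting the global on-disk identity from the CLI (the option name is an assumption and may differ by OneFS release, so verify with the command help first):
boston-2# isi auth settings global view
boston-2# isi auth settings global modify --on-disk-identity native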
Troubleshooting Resources
Challenge
Authorization
Scenario
Permissions Overview
1: OneFS supports the NFS and SMB protocols, which access the same directories and
files with different clients.
6: OneFS supports two types of authorization data on a file, access control lists, or
ACLs, and UNIX permissions, or POSIX mode bits.
The internal representation, which can contain information from either the POSIX
mode bits or the ACLs, is based on RFC 3530.
POSIX Overview
53A file can only be in one of the states at a time. That state is authoritative. The
actual permissions on the file are the same, regardless of the state.
55 OneFS must store an authoritative version of the original file permissions for the
file sharing protocol and map the authoritative permissions for the other protocol.
OneFS must do so while maintaining the security settings for the file and meeting
user expectations for access. The result of the transformation preserves the
intended security settings on the files. The result also ensures that users and
applications can continue to access the files with the same behavior.
2: Group permissions
4: Configure permission flags to grant read (r), write (w), and execute (x)
permissions to users, groups, and others in the form of permission triplets. The
classes are not cumulative. OneFS uses the first class that matches. Typically,
grant permissions in decreasing order, giving the highest permissions to the file
owner and the lowest to users who are not the owner or the owning group.
5: These permissions are saved in 16 bits, which are called mode bits.
6: The information in the upper 7 bits can also encode what the file can do,
although it has no bearing on file ownership. An example of such a setting would
be the “sticky bit.”
Triplets
9 mode bits
Triplet classes
Modify UNIX permissions in the WebUI on the File system > File system explorer page.
The graphic shows root user who is logged in and the /ifs/boston/hr
directory. Only root user can view and edit the owner and group of the object.
chmod Command
Changing the permissions on a directory so that group members and all others can only read the
directory.
OneFS supports the standard UNIX tools for changing permissions: chmod and
chown. The change mode command, chmod, can change permissions of files and
directories. The man page for chmod documents all options.
Changes that are made using chmod can affect Windows ACLs.
chown Command
The output shows that penni is an LDAP user who is responsible for the content of the
/ifs/boston/hr directory.
The chown command is used to change ownership of a file. Changing the owner of
a file requires root user access. The basic syntax for chown is chown [-R]
newowner filenames. Using the -R option changes the ownership recursively on the subdirectories.
The chgrp command changes the group. View the man pages for command
definitions.
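A short sketch using the directory and LDAP user from the examples above; the group name hr-group is hypothetical:
boston-2# chown -R penni /ifs/boston/hr    (penni becomes the owner of the directory tree)
boston-2# chgrp -R hr-group /ifs/boston/hr (hypothetical group name)
boston-2# chmod 750 /ifs/boston/hr         (owner rwx, group r-x, no access for others)
boston-2# ls -led /ifs/boston/hr           (verify the resulting owner, group, and mode bits)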
Access control elements: no permissions means no access. On a Windows host, the lists of basic and advanced permissions are found under Properties > Security tab > Advanced > Edit window.
While you can apply permissions for individual users, Windows administrators
usually use groups to organize users, and then assign permissions to groups
instead of individual users.
Windows includes many rights that you can assign individually or you can assign
rights that are bundled together as permissions. For example, the Read permission
includes the rights to read and execute a file while the Full Control permission
assigns all user rights. Full Control includes the right to change ownership and
change the assigned permissions of a file or folder.
When working with Windows, note the important rules that dictate the behavior of
Windows permissions. First, if a user has no permission that is assigned in an ACL,
then the user has no access to that file or folder. Second, permissions can be
explicitly assigned to a file or folder and they can be inherited from the parent
folder. By default, when creating a file or folder, it inherits the permissions of the parent folder.
OneFS has configurable ACL policies that manage permissions. You can change
the default ACL settings globally or individually, to best support the environment.
The global permissions policies change the behavior of permissions on the system.
For example, selecting UNIX only changes the individual ACL policies to
correspond with the global setting. The permissions settings of the cluster are
handled uniformly across the entire cluster, rather than by each access zone.
The graphic shows the WebUI > Access > ACL policy settings page and how the policy settings
translate in the CLI command output. You can also use the isi auth settings acls
modify command to configure the ACL settings.
2: Use case: Permissions operate with UNIX semantics - prevents ACL creation.
3: Use case: Permissions operate with Windows semantics - errors for UNIX
chmod.
1: The ls -le command shows actual permissions stored on disk and ACL from
security descriptor.
2: The ls -len command shows numerical (n) owner and group SID or UID/GID.
4: The long format includes file mode, number of links, owner, group, MAC label,
number of bytes, abbreviated month, day file last modified, hour file last modified,
minute file last modified, and the path name.
OneFS takes advantage of standard UNIX commands and has enhanced some
commands for specific use with OneFS.
The list directory contents, ls, command provides file and directory permissions
information, when using an SSH session to the cluster. PowerScale has added
specific options to enable reporting on ACLs and POSIX mode bits.
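For example, the following commands show the same directory with the OneFS-specific options, using the example directory from earlier in this module:
boston-2# ls -le /ifs/boston/hr     (actual permissions stored on disk plus the ACL from the security descriptor)
boston-2# ls -len /ifs/boston/hr    (numerical owner and group, UID/GID or SID)
boston-2# ls -led /ifs/boston/hr    (the -d flag lists the directory entry itself rather than its contents)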
Tip: The ls command options are all designed for long notation
format, which is displayed when the -l option is used. The -l
option also displays the actual permissions that are stored on disk.
Running the ls -le command shows the synthetic ACLs for files and directories (the -d flag lists
directory entries).
A Windows client processes only ACLs; it does not process UNIX permissions.
When viewing the permission of a file from a Windows client, OneFS must translate
the UNIX permissions into an ACL.
If a file has Windows-based ACLs (and not only UNIX permissions), OneFS
considers it to have advanced, or real ACLs56.
56Advanced ACLs display a plus (+) sign when listed using an ls –l, or as shown,
the ls -led command. POSIX mode bits are present when a file has a real ACL,
however these bits are for protocol compatibility and are not used for access
checks.
The video discusses authentication and authorization. See the student guide for a
transcript of the video.
Link:
https://edutube.emc.com/html5/videoPlayer.htm?vno=EN8uMS3WuRwjY4Q0mIUa
Zw
corporate building, thus the user has permission to enter. Share level permissions
work similarly in that users get access to the share before they can gain access to
any of the share directories. A user that has access to a directory (office) can then
access the files within the directory, provided that permission to the file is given.
Two options are available when creating a share, Do not change existing
permissions and Apply Windows default ACLs. Understand the Apply
Windows default ACLs settings. This setting can destroy or at a minimum alter
explicitly defined directory permissions that are created on the share. For example,
carefully migrated permissions can change, creating more work and the potential of
causing data unavailability. Files and directories can be either POSIX authoritative
or ACLs authoritative.
A synthetic ACL does not exist on the file system and is not stored anywhere.
Instead, OneFS generates a synthetic ACL as needed, and then discards it. OneFS
creates the synthetic ACL in memory when a client that only understands ACLs,
such as Windows clients, queries the permissions on a file that only has POSIX
permissions.
With synthetic ACLs, POSIX mode bits are authoritative. POSIX mode bits handle
permissions in UNIX environments and govern the synthetic ACLs. Permissions
are applied to users, groups, and everyone, and allow or deny file and directory
access as needed. The read, write, and execute bits form the permissions triplets
for users, groups, and everyone. The mode bits can be modified using the WebUI
or the CLI standard UNIX tools such as chmod and chown. Since POSIX governs
the synthetic ACLs, changes made using chmod change the synthetic ACLs. For
example, running chmod 775 on the /ifs/dvt directory changes the mode bits to
read-write-execute for group, changing the synthetic ACL for the group. The same
behavior happens when making the access more restrictive, for example, running
chmod 755, changes the synthetic ACL to its corresponding permission. The
chmod behavior is different when ACLs are authoritative.
In the example, the directory /ifs/dvt/win has a real ACL. The POSIX mode bits are
775. Running chmod 755 does not change the POSIX mode bits, since merging
775 with 755 gives the combined value of 775. Shown is an excerpt from the Isilon
cluster WebUI page that shows the different behaviors.
The first example shows that the share permission is everyone read-only although
the POSIX indicates read-write-execute. Windows users can write to the share
based on the synthetic ACLs. The second example shows POSIX at 755. Although
the ACL is set to a user with full control, the user cannot write to the share—POSIX
is authoritative.
The “+” indicates a real or native ACL that comes directly from Windows and is
applied to the file. Access control entries make up Windows ACLs. An administrator
can remove the real ACL permission using the chmod -b command. ACLs are
more complex than mode bits and can express a richer set of access rules.
However, not all POSIX mode bits can represent Windows ACLs any more than
Windows ACLs can represent POSIX mode bits.
Once a file is given an ACL, its previous POSIX mode bits are no longer
enforced—the ACL is authoritative. The first example shows a real ACL used,
POSIX set for 777, and the share permissions for the user set to read-only.
Although the POSIX show read-write-execute for everyone, the user cannot write
because of the ACL. In contrast, the second example shows the case where the
user can write.
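A brief sketch of the behaviors described above, using the example directories from the graphic (check the chmod man page on the cluster for the exact -b syntax):
boston-2# ls -led /ifs/dvt           (no + after the mode bits: the POSIX mode bits are authoritative and the ACL is synthetic)
boston-2# chmod 775 /ifs/dvt         (the synthetic ACL follows the mode bit change)
boston-2# ls -led /ifs/dvt/win       (a + after the mode bits indicates a real, authoritative ACL)
boston-2# chmod -b 755 /ifs/dvt/win  (removes the real ACL so that the mode bits are authoritative again)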
Troubleshooting Resources
Challenge
Lab Assignment:
Log in to the cluster and verify the ACL policy setting.
• Permissions and ownership using the WebUI
• Permissions and ownership using the CLI
• ACL authoritative
• ACL policy setting
OneFS Caching
Scenario
IT Manager: The next thing that I would like to know more about is how
the PowerScale caches data.
2: Accelerate access. The immediacy determines how the cache is refreshed, how
long the data is available, and how the data is emptied or flushed from cache.
3: Different cache levels to account for differing data immediacy. The cache levels
provide guidance to the immediacy of information from a client-side transaction
perspective.
4: Cache is temporary. Because cache is a copy of the metadata and user data,
any data that is contained in cache is temporary and can be discarded when no
longer needed.
Caching maintains a copy of the metadata57 and/or the user data blocks in a
location other than primary storage.
Cache in OneFS is divided into levels. Each level serves a specific purpose in read
and write transactions.
Cache Levels
OneFS caching consists of the client-side level 1, or L1, cache and write coalescer,
and the level 2, or L2, storage-side and node-side cache.
Both L1 cache and L2 cache are managed and maintained in RAM. However,
OneFS is also capable of using SSDs as level 3, or L3 cache.
57
The copy is used to accelerate access to the data by placing the copy on a
medium with faster access than the drives.
Each cache has its own specialized purpose, and the caches work together to provide
performance improvements across the entire cluster.
L1 Cache
Client-side cache.
1: L1 cache holds all blocks for immediate read requests. Read cache is flushed
after a successful read transaction and write cache is flushed after a successful
write transaction. L1 cache collects the requested data from the L2 cache of the
nodes that contain the data.
L1 cache is the client-side cache. It is the buffer on the node to which the client
connects, and it is involved in any immediate client data transaction.
The write coalescer collects the write blocks and performs the additional process of
optimizing the write to disk.
L2 Cache
L2 cache.
1: L2 cache is also contained in the node RAM. It is fast and available to serve L1
cache read requests and take data handoffs from the write coalescer. L2 cache
interacts with the data that is contained on the specific node. The interactions
between the drive subsystem, the HDDs, and the SSDs on the node go through the
L2 cache for all read and write transactions.
L2 cache is the storage side or node-side buffer. L2 cache stores blocks from
previous read and write transactions.
L3 Cache
L3 cache.
1: Extension of L2 cache.
2: SSD access is slower than access to RAM and is relatively slower than L2 cache
but faster than access to data on HDDs. L3 cache is an extension of the L2 read
cache functionality. Because SSDs are larger than RAM, SSDs can store more
cached metadata and user data blocks than RAM. When L3 cache becomes full
and new metadata or user data blocks are loaded into L3 cache, the oldest existing
blocks are flushed from L3 cache. Flushing is based on first in first out, or FIFO. L3
cache should be filled with blocks being rotated as node use requires.
L3 cache provides an additional level of node-side cache, using the SSDs as read cache.
It is good for random, read-heavy workflows accessing the same data sets.
The graphic shows an eight node cluster that is divided into two node pools with a
detailed view of one of the nodes.
1: Clients connect to L1 cache and the write coalescer. The L1 cache is connected
to the L2 cache on the other nodes and within the same node. The connection to
other nodes occurs over the internal network when data that is contained on those
nodes is required for read or write.
2: The L2 cache on the node connects to the disk storage on the same node. The
L3 cache is connected to the L2 cache and serves as a read-only buffer.
5: Backend network.
Anatomy of a Read
When a client requests a file, the client-connected node uses the isi get
command to determine where the blocks that comprise the file are located.
1: The first file inode is loaded, and the file blocks are read from disk on all other
nodes. If the data is not present in the L2 cache, data blocks are copied in the L2.
The blocks are sent from other nodes through the backend network.
2: If the data is already present in L2 cache, it is not loaded from the hard disks.
OneFS waits for the data blocks from the other nodes to arrive. Otherwise, the
node loads the data from the local hard disks, and then the file is reconstructed
in L1 cache and sent to the client.
When a client requests a file write to the cluster, the client-connected node
receives and processes the file.
1: Cache writes until write coalescer is full, time limit is reached, or protocol
requests confirmation of delivery.
2: The client-connected node creates a write plan for the file including calculating
Forward Error Correction, or FEC. Data blocks assigned to the node are written to
the journal of that node. Data blocks assigned to other nodes travel through the
internal network to their L2 cache, and then to their journal.
At the same time, data blocks that are assigned to other nodes go to L2.
3: Once all nodes have all the data and FEC blocks that are journaled, a commit is
returned to the client. Data blocks assigned to client-connected node stay cached
in L2 for future reads, and then data is written onto the HDDs.
4: The Block Allocation Manager, or BAM, on the node that initiated a write
operation makes the layout decisions. The BAM decides on where best to write the
data blocks to ensure that the file is properly protected. Data is copied to journal.
To decide, the BAM Safe Write, or BSW, generates a write plan, which comprises
all the steps that are required to safely write the new data blocks across the
protection group.
5: Once the nodes have the data and FEC blocks journaled, confirmation is sent to the
client-connected node and a commit is sent to the client.
6: Once complete, the BSW runs this write plan and guarantees its successful
completion. OneFS does not write files at less than the desired protection level.
Data is written to disks.
L3 Cache Settings
L3 cache is enabled by default for all new node pools that are added to a cluster.
L3 cache is either on or off and no other visible configuration settings are available.
File system > Storage pools > SmartPools settings. Enabling and disabling L3 at
the global level and at the node pool level.
2: L3 cache cannot be enabled if the node pool has no unprovisioned SSDs, and it cannot
coexist with other SSD strategies.
CLI Commands
The following commands are used to disable L3 cache globally and to enable it at the node pool
level, as shown in the sketch below.
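A sketch of what those commands can look like; the option names and the node pool name are assumptions, so confirm them with the isi storagepool command help on your release:
boston-2# isi storagepool settings modify --ssd-l3-cache-default-enabled false   (global default for new node pools)
boston-2# isi storagepool nodepools modify h500_30tb_1.6tb-ssd_128gb --l3 true   (enable L3 on a specific node pool)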
L3 Cache Considerations
• L3 cache cannot co-exist with other SSD strategies58 on the same node pool.
• SSDs in an L3 cache enabled node pool cannot participate as space used for
GNA.
• L3 acts as an extension of L2 cache regarding reads and writes59 on a node.
• You cannot enable L3 cache in all-flash nodes60.
• You cannot disable L3 cache in archive-type nodes (A200, A2000, NL410,
HD400).
• If changing the L3 cache behavior, migrating data and metadata from the SSDs
to HDDs can take hours.
The example shows the commands to query historical statistics for cache. The first
command lists the keys that are related to cache.
A use case is running the commands to determine the L3 hit and miss statistics, which
indicate whether the node pool needs more SSDs.
60 In the all-flash nodes (F800, F810, F600, and F200), all data drives are SSDs.
1: The command lists the keys that are related to cache. The number and
granularity of available keys is numerous. The keys give administrators insight to
the caching efficiency and can help isolate caching related issues.
2: The command shows the key to list the L1 metadata read hits for node 2, the
node that is connected over SSH.
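A sketch of the workflow; the key name is a placeholder, so use the first command to discover the exact key names on your cluster:
boston-2# isi statistics list keys | grep -i cache                     (discover cache-related key names)
boston-2# isi statistics query history --keys <cache key> --nodes 2    (query one key for node 2)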
Challenge
IT Manager:
Open participation question:
Question: What does L1, L2, and L3 cache provide?
SMB Shares
Scenario
IT Manager: The first thing that I would like you to configure is an SMB
share for the Windows users. I want you to create a single share for
now, and ensure that the Windows users have access.
Your Challenge: The IT manager has tasked you to create a share that
the Windows users can access. Before creating the shares, you must
know a few things. The manager wants to ensure that you can describe
SMB Continuous Availability, enable SMB sharing, and create shares
and home directories.
Protocol Overview
Configure and create SMB shares for Windows users - created at the zone
level
Network or Node
failure
Old behavior: If the node goes down or a network interruption occurs, the client
needs to reconnect to the cluster manually.
SMB shares provide Windows clients network access to file system resources on
the cluster.
Too many disconnections prompt the clients to open help desk tickets with their
local IT department to determine the nature of the data unavailability.
Clients using SMB 1.0 and SMB 2.x use a time-out service.
Server-side copy offloads copy operations to the server when the involvement of
the client is unnecessary.
File data no longer traverses the network for copy operations that the server can
perform.
The server-side copy feature is enabled by default. To disable the feature, use the
CLI.
61Advanced algorithms are used to determine the metadata and user data blocks
that are cached in L3. L3 cached data is durable and survives a node reboot
without requiring repopulating.
The graphic contrasts the two behaviors for the /ifs/finance/ data: with server-side copy disabled, copied data traverses the network; with server-side copy enabled (the default), it does not.
To enable SMB, in the WebUI, go to the Protocols > Windows sharing (SMB) > SMB server
settings tab.
The SMB server settings page contains the global settings that determine how the
SMB file sharing service operates.
These settings include enabling or disabling support for the SMB service.
A case62 for disabling the SMB service is when testing disaster readiness.
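The service can also be toggled from the CLI; a sketch, assuming the SMB service name reported by isi services -l is smb:
boston-2# isi services smb disable
boston-2# isi services smb enable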
This video demonstrates the process of creating an SMB share, mapping the
share, and verifying access. See the student guide for a transcript of the video.
62 The organization fails over the production cluster or directory to a remote site.
When the remote data is available and users write to the remote cluster, all SMB
traffic should be halted on the production site. Preventing writes on the production
site prevents data loss when the remote site is restored back to the production site.
Link:
https://edutube.emc.com/html5/videoPlayer.htm?vno=aMwue+nqUbFdOFoqKa98F
g
This demonstration shows the steps to configure SMB shares. Log in to the WebUI
as admin. The dashboard shows all the cluster nodes are healthy. The cluster is
running OneFS 8.2. Navigate to Protocols, Windows sharing. The SMB share will
be in the marketing access zone. Select Create an SMB share. The share I am
creating is called “general purpose”. I will add a description. The path
/ifs/marketing/GeneralPurpose does not exist so I will ensure it is created. This is a
Windows only share that did not previously exist so I will select Apply Windows
default ACLs. In the Members table I will give Everyone full control and then Create
share. The next step is to access the share from a Windows client. From the
Windows client, I will open Windows Explorer and map the share. Good. Now as a
simple test I am creating a text document. I will write some content and save. And
then I will open the document. This demonstration stepped through configuring,
mapping, and accessing an SMB share.
Share Creation
Settings Section
The CLI equivalents are the isi smb shares create and isi smb shares modify commands.
Type the full path of the share in the path field, beginning with /ifs.
You can also browse to the share path. If the directory does not exist, the Create SMB
share directory if it does not exist option creates the required directory.
Directory ACLs
Use caution when applying the default ACL settings as it may overwrite existing
permissions in cases where the data has been migrated onto the cluster.
When a cluster is set up, the default permissions on /ifs may or may not be
appropriate for the permissions on your directories.
Summary63
OneFS supports the automatic creation of SMB home directory paths for users.
631) If adding a share to an existing directory structure, you likely do not want to
change the ACL, so select the Do not change existing permissions. 2) If creating a
share for a new directory, you will likely be changing permissions to the ACL to
grant Windows users rights to perform operations. Set the Apply Windows default
ACLs and then once the share is created, go into the Windows Security tab and
assign permissions to users as needed.
Variables:
• %L64
• %D65
• %U66
• %Z67
67%Z expands to the access zone name. If multiple zones are activated, this
variable is useful for differentiating users in separate zones.
The graphic shows the permissions that are changed to Full control.
Adjustments made to Advanced settings override the default settings for this
share only.
You can make access zone global changes to the default values in the Default
share settings tab. Changing the default share settings is not recommended.
In the CLI, you can create shares using the isi smb shares create
command. You can also use the isi smb shares modify to edit a share and
isi smb shares list to view the current Windows shares on a cluster.
The share name can contain up to 80 characters, and can only contain
alphanumeric characters, hyphens, and spaces. The description field contains
basic information about the share. There is a 255-character limit. Description is
optional but is helpful when managing multiple shares.
Example for directory ACLs: Say that /ifs/eng is a new directory that was created
using the CLI, and Windows users should be able to create and delete files in the directory. When
creating the share, if Do not change existing permissions is set and users then
attempt to save files to the share, an access denied error occurs because Everyone has only
read access. Even as an administrator, you cannot modify the Security tab of the
directory to add Windows users, because the mode bits limit access to only root. As
another example, /ifs/eng is an NFS export and you explicitly want the /ifs/eng mode bit
rights set based on UNIX client application requirements. Selecting the Apply
Windows default ACLs option, as shown in the graphic, overwrites the original
ACLs, which can break the application. Thus, there is risk associated with
using Apply Windows default ACLs with an existing directory.
Example for home directories: To create a share that automatically redirects users
to their home directories, select the Allow variable expansion box. To automatically
create a directory for the user, check the Auto-create directories box. You may also
set the appropriate flags by using the isi smb command in the command-line
interface. In the graphic, 1) set up user access to their home directory by mapping
to /ifs/finance/home. Users are automatically redirected to their home directory
/ifs/finance/home/. 2) Expansion variables are used to automatically create a path
where the users store the home directory files. After the creation, users connecting
to this share are automatically redirected to their home directory according to the
used path variables. The access zone is implied, because all access for Active
Directory is done per access zone and each access zone has its own home
directory path.
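A CLI sketch of the home directory share described above; the share name, access zone, and option names are assumptions, while the %U expansion variable comes from the list above:
boston-2# isi smb shares create HOME --zone finance --path /ifs/finance/home/%U --allow-variable-expansion yes --auto-create-directory yes
boston-2# isi smb shares list --zone finance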
Challenge
Lab Assignment: Now log in to the cluster and create home directories
and a general purpose share.
NFS Exports
Scenario
IT Manager: Now that the Windows users are able to access the
cluster, you configure access for the Linux users. I want you to create an
export that the Linux users can access. Have a good understanding of
NFS exports before implementing them in the lab.
NFS Overview
1: NFS relies upon remote procedure call (RPC) for client authentication and port
mapping.
2: NFS is native to UNIX clients. You can configure NFS to enable UNIX clients to
access content stored on PowerScale clusters.
Exporting a directory enables accessing the data that is hosted on the cluster.
Node or network
issue
Continuous Availability (CA) is enabled by default.
Clients transparently fail over to another node when a network or node failure occurs.
To enable and disable NFS using the WebUI, click Protocols > UNIX sharing (NFS)
> Global settings tab.
If changing a value in the Export settings, that value changes for all NFS exports in
the access zone. Modifying the access zone default values is not recommended.
You can change the settings for individual NFS exports as you create them, or edit
the settings for individual exports as needed.
2: Enabling NFSv4 requires entering the domain in the Zone settings page.
If NFSv4 is enabled, specify the name for the NFSv4 domain in the NFSv4 domain
field on the Zone setting page.
You can customize the user/group mappings, and the security types (UNIX and/or
Kerberos), and other advanced NFS settings.
The NFS global settings determine how the NFS file sharing service operates. The
settings include enabling or disabling support for different versions of NFS.
Enabling NFSv4 is nondisruptive, and it runs concurrently with NFSv3. Enabling
NFSv4 does not impact any existing NFSv3 clients.
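A sketch of enabling NFSv4 from the CLI; the option names are assumptions and the domain value comes from the earlier examples:
boston-2# isi nfs settings global modify --nfsv4-enabled true
boston-2# isi nfs settings zone modify --zone sales --nfsv4-domain dees.lab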
The UNIX sharing (NFS) page provides the option to reload the cached NFS exports
configuration to ensure that any DNS or NIS changes take effect immediately.
Create and manage NFS exports using either the WebUI or the CLI. For the CLI,
use the isi nfs exports command.
Protocols > UNIX sharing (NFS) > NFS exports page, Create an export option.
Highlighted are the paths to export.
3: Specifying no clients allows all clients on the network access to the export.
4: Rule order of precedence: Root clients, always read/write clients, Always read-
only clients, and then clients.
You can enter a client by host name, IPv4 or IPv6 address, subnet, or netgroup.
Client fields:
• Clients - allowed access to the export
• Always read-write clients - allowed read/write access regardless of export's
access restriction setting
• Always read-only clients - allowed read-only access regardless of export's
access restriction setting
• Root clients - map as root
OneFS can have multiple exports with different rules that apply to the same directory.
A network hostname, an IP address, a subnet, or a netgroup name can be used for
reference. The same export settings and rules that are created here apply to all the
listed directory paths. If no clients are listed in any entries, no client restrictions
apply to attempted mounts.
When multiple exports are created for the same path, the more specific rule takes
precedence. For example, if the 192.168.3 subnet has read-only access and the
192.168.3.3 client has read/write access, the client at 192.168.3.3 gets read/write
access while the rest of the subnet remains read-only.
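A sketch that implements the example above and mounts the export from a Linux client; the export path, zone, and SmartConnect name are hypothetical, and exact argument names can vary by release:
boston-2# isi nfs exports create /ifs/engineering --zone sales --read-only true --clients 192.168.3.0/24 --read-write-clients 192.168.3.3
boston-2# isi nfs exports list --zone sales
client# mount -t nfs sales.dees.lab:/ifs/engineering /mnt/engineering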
Permissions settings can restrict access to read-only and enable mount access to
subdirectories. Other export settings are user mappings.69
NFS Considerations
Challenge
Lab Assignment: Now that you have learned how to create an export,
you are ready to create the NFS directory, export the directory, and
mount it to the Centos client.
69The "root user mapping" default is to map root users to nobody, and group is
none. The default Security type is "UNIX (system)". Scrolling down in the "Create
an export" window shows the "Advanced settings".
S3 Buckets
Scenario
S3 Overview
OneFS namespace
Objects stored in buckets
Amazon Simple Storage Service (S3) is an AWS service that provides object
storage through a web interface. OneFS 9.0.x and later support S3 as a tier 1
protocol. OneFS S3 value:
• Multi-protocol access71
• Multi-tenancy - access zone aware
• Latency and IOPS equivalent to other OneFS protocols
• Evolve the PowerScale data lake story:
• Single namespace and multi-protocol access
• Concurrent access72 to objects and files
• Interoperability with OneFS data services such as snapshots, WORM, quotas,
SyncIQ, and others
Enable S3 Service
71
Support interoperability between all OneFS supported protocols. File system
mapping: Object to file, object to directory, and bucket to base directory.
Default ports
WebUI Protocols > Object storage (S3) page, Global settings tab.
Zone Settings
You can create buckets using the Object storage (S3) page or using the isi s3
buckets create command.
WebUI Protocols > Object storage (S3) page.
Create Bucket
The graphic shows the Create a Bucket fields completed and the command to view
a created bucket.
S3 Bucket Table
Key Management
A key must be created to authenticate access. Key management from the WebUI
facilitates generation of the secret key and access ID. The example shows key creation
using the CLI.
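A sketch of bucket and key creation from the CLI; the bucket name, path, owner, and zone are hypothetical, and the exact argument form may vary by release:
boston-2# isi s3 buckets create mktg-bucket /ifs/marketing/s3 --create-path --owner dees\\sai --zone marketing
boston-2# isi s3 buckets list --zone marketing
boston-2# isi s3 keys create dees\\sai --zone marketing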
Considerations
Services
Challenge
Hadoop Introduction
Requires license
The Hadoop Distributed File System (HDFS) protocol enables a cluster to work
with Apache Hadoop, a framework for data-intensive distributed applications.
Swift Overview
OneFS supports Swift, an object storage interface compatible with the OpenStack
Swift 1.0 API. Swift is a hybrid between the two storage types, file and object, storing Swift
metadata as an alternate data stream. Through Swift, users can access file-
based data that is stored on the cluster as objects. The Swift API is implemented
as Representational State Transfer, or REST, web services over HTTP or HTTPS.
Since the Swift API is considered a protocol, content and metadata can be ingested
as objects and concurrently accessed through protocols that are configured on the
cluster. The cluster must be licensed to support Swift.
through the OneFS HDFS. Swift benefits include secure multitenancy for
applications through access zones while protecting the data with capabilities such
as authentication, access control, and identity management. Manage data through
enterprise storage features such as deduplication, replication, tiering, performance
monitoring, snapshots, and NDMP backups. Swift balances the workload across
the cluster nodes through SmartConnect and stores object data more efficiently
with FEC instead of data replication.
Swift client
access
Foundations of Data Protection and Data Layout
File Striping
Scenario
IT Manager: I am not sure how the cluster does striping. I want you to
do some research and let me know how the operating system stripes a
file.
Your Challenge: The IT manager wants you to describe how files are
broken up for file stripes and diagram the high-level file striping steps.
OneFS protects files as the data is being written. Striping protects the cluster data
and improves performance. To understand OneFS data protection, the first step is
grasping the concept of data and forward error correction or FEC stripes.
• File Stripes - files are logically segmented into 128 KB stripe units to calculate
protection
• FEC stripe unit - FEC stripe unit is the calculated piece of data protection
• Data stripe units + FEC stripe units = Stripe width. In the graphic, the stripe
width is 12 (eight data [1 MB file data] + 4 FEC)
• 16 data stripe units + 4 FEC = Maximum Stripe width of 20.
• 16 data stripe units = 2 MB. Files larger than 2 MB have multiple data stripe
units.
The data stripe units and protection stripe units are calculated for each file stripe by
the Block Allocation Manager (BAM) process73.
73The BAM process calculates 128-KB FEC stripe units to meet the protection
level for each file stripe. The higher the protection level, the more FEC stripes units
are calculated.
16 x 8 KB = 128 KB
3: The protection is calculated based on the requested protection level for each file
stripe using the data stripe units that are assigned to that file stripe.
4: The combined 128-KB stripe units are called the stripe width. A single file stripe
width can contain up to sixteen 128-KB data stripe units, for a maximum of 2 MB as
the file's data portion. A large file has thousands of file stripes that are
distributed across the node pool.
The steps show a simple example of the write process. The client saves a file to
the node it is connected to. The file is divided into data stripe units. The data stripe
units are assembled into the maximum stripe widths for the file. FEC stripe units
are calculated to meet the Requested Protection level. Then the data and FEC
stripe units are striped across nodes.
Step 1
OneFS stripes the data stripe units and FEC stripe units across the node pools.
Some protection schemes74 use more than one drive per node.
74OneFS uses advanced data layout algorithms to determine data layout for
maximum efficiency and performance. Data is evenly distributed across nodes in
the node pool as it is written. The system can continuously reallocate where the
data is stored and make storage space more usable and efficient. Depending on
the file size and the stripe width, as the cluster size increases, the system stores large files more efficiently. Every disk within each node is assigned both a unique GUID (global unique identifier) and a logical drive number. The disks are subdivided into 32-MB cylinder groups that are composed of 8-KB blocks. Each cylinder group is responsible for tracking, using a bitmap, whether its blocks are used for data, inodes, or other metadata constructs. The combination of node number, logical drive number, and block offset makes the block or inode address, which the Block Allocation Manager controls.
The graphic shows a Gen 6 cluster with a simple example of the write process: a client writes a file to the cluster.
Step 2
If the file is greater than 128 KB, then the file is divided into data stripe units.
Step 3
The node that the client connects to is the node that performs the FEC calculation.
Step 4
The data stripe units are assembled to maximum stripe width for the file. Also, here
the protection level that is configured is N+1n75.
Step 5
Depending on the write pattern, the data and FEC stripes might be written to one
drive per node or two drives per node. The important takeaway is that files are
segmented into stripes of data, FEC is calculated, and this data is distributed across the
cluster.
Challenge
IT Manager:
Open participation questions:
Question: What does OneFS consider a small file and how are
small files put on disks for protection?
Data Protection
Scenario
Data protection is one of the variables that are used to determine how data is laid
out. OneFS is designed to withstand multiple simultaneous component failures
while still affording access to the entire file system and dataset.
• OneFS uses the Reed-Solomon algorithm
• The data can be protected up to an N+4n scheme
• In OneFS, protection is calculated per individual files
In Gen 6.5 nodes, the journal is stored on an NVDIMM that is battery protected.
N+Mn
76Smaller neighborhoods improve efficiency by the fact that the fewer devices you
have within a neighborhood, the less chance that multiple devices will
simultaneously fail.
• Mn79
• N+Mn80
• N=M81
• N>M82
The number of sustainable drive failures are per disk pool. Multiple drive failures on
a single node are equivalent to a single node failure. The drive loss protection level
is applied per disk pool.
79 The “Mn” is the number of simultaneous drive or node failures that can be
tolerated without data loss.
80 The available N+Mn Requested Protection levels are plus one, two, three, or four
“n” (+1n, +2n, +3n, and +4n). With N+Mn protection, only one stripe unit is written
to a single drive on the node.
82N must be greater than M to gain efficiency from the data protection. If N is less
than M, the protection results in a level of FEC calculated mirroring.
N+Md:Bn Protection
N + Md : Bn
The “d” is the number of drives and “n” is the number of nodes. So N+3d:1n reads
as N+3 drives or 1 node.
Unlike N+Mn, N+Md:Bn has different values for the number of drive loss and node
losses that are tolerated before data loss may occur. When a node loss occurs,
multiple stripe units are unavailable from each protection stripe and the tolerable
drive loss limit is reached when a node loss occurs.
• M83
• d84
• Colon (:)85
83In this protection level, M is the number of drives per node onto which a stripe
unit is written.
• B86
• n87
With Gen 6x, for better reliability, better efficiency, and simplified protection, using
+2d:1n, +3d:1n1d, or +4d:2n is recommended.
N is replaced in the actual protection with the number of data stripe units for each
protection stripe. If there is no / in the output, it implies a single drive per node.
Mirrored file protection is represented as 2x to 8x in the output.
86 The B value represents the number of tolerated node losses without data loss.
The graphic shows the Actual protection output for a file from the isi get command, for example N+2/2, where the value after the slash is the number of drives per node. The output displays the number of data stripe units plus the number of FEC stripe units, divided by the number of drives per node that the stripe is written to.
The protection overhead for each protection level depends on the file size and the
number of nodes in the cluster. The percentage of protection overhead declines as
the cluster gets larger. In general, N+1n protection has a protection overhead equal
to the capacity of one node, N+2n to the capacity of two nodes, N+3n to the
capacity of three nodes, and so on.
Data mirroring requires significant storage overhead and may not always be the
best data-protection method. Example89
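As a simple illustration of how the overhead scales with the stripe width (the stripe sizes below are examples):
Protection overhead = FEC stripe units / (data stripe units + FEC stripe units)
N+2n with 10 data stripe units: 2 / (10 + 2) = about 17 percent overhead
N+2n with 4 data stripe units: 2 / (4 + 2) = about 33 percent overhead
2x mirroring: 1 copy / 2 total blocks = 50 percent overhead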
The table shows the relative protection overhead associated with each FEC requested protection
level. Indicators include when the FEC protection would result in mirroring.
MTTDL
MTTDL deals with how long you can go without losing data. MTTDL is used to
calculate the OneFS suggested protection.
• Accommodate failures90
• Disk pools91
• MTBF92
91Disk pools improve MTTDL because they create more failure domains, improving
the statistical likelihood of tolerating failures over the lifetime of the equipment.
Quorum
There are six data stripe units to write a 768-KB file. The desired protection
includes the ability to sustain the loss of two hard drives.
93For a quorum, more than half the nodes must be available over the internal,
backend network to allow writes. An eight-node Gen 6 cluster, for example,
requires a five-node quorum.
94 If there is no node quorum, reads may occur, depending upon where the data
lies on the cluster but for the safety of new data, no new information will be written
to the cluster. So, if a cluster loses its quorum, the OneFS file system becomes
read-only and will allow clients to access data but not to write to the cluster.
1: Using N+2n protection, the 768-KB file will be placed into three separate data
stripes, each with two protection stripe units. Six protection stripe units are required
to deliver the requested protection level for the six data stripe units. The protection
overhead is 50 percent.
2: Using N+2d:1n protection the same 768-KB file requires one data stripe, two
drives wide per node and only two protection stripe units. The eight stripe units are
written to two different drives per node. The protection overhead is the same as the
eight node cluster at 25 percent.
3: If there is an eight-node cluster, two FEC stripe units would be calculated on the
six data stripe units using an N+2n protection level. The protection overhead in this
case is 25 percent.
Mirroring is used to protect the file metadata and some system files that exist under
/ifs in hidden directories. Mirroring can be explicitly96 set as the requested
protection level in all available locations.
Use Case97
Mirroring (2x to 8x): the original file plus 1 to 7 copies.
• The protection blocks are copies of the original set of data blocks.
• The protection is explicitly set and the required mirroring level is selected.
• Actual protection is applied for other Requested Protection levels.
96 Mirroring is set as the actual protection on a file even though another requested
protection level is specified under certain conditions. If the files are small, the FEC
protection for the file results in a mirroring. The loss protection requirements of the
requested protection determine the number of mirrored copies. Mirroring is also
used if the node pool is not large enough to support the requested protection level.
For example, with five nodes in a node pool and N+3n Requested Protection, the file is saved at a 4x mirror level, which becomes the actual protection.
97 One particular use case is where the system is used to only store small files. A
file of 128 KB or less is considered a small file. Some workflows store millions of 1
KB to 4-KB files. Explicitly setting the requested protection to mirroring can save
fractions of a second per file and reduce the write ingest time for the files.
Stripe
Some protection schemes use a single drive per node per protection stripe. The graphic shows that only a single data stripe unit or a single FEC stripe unit is written to each node. These protection levels are N+M or N+Mn.
The table shows each requested N+Mn Requested Protection level over the
minimum number of required nodes for each level. The data stripe units and
protection stripe units98 can be placed on any node pool and in any order.
98The number of data stripe units depends on the size of the file and the size of the
node pool up to the maximum stripe width. N+1n has one FEC stripe unit per
protection stripe, N+2n has two, N+3n has three, and N+4n has four. N+2n and
N+3n are the two most widely used Requested Protection levels for larger node
pools, node pools with around 15 nodes or more. The ability to sustain both drive or
node loss drives the use when possible.
N+M:B or N+Md:Bn protection schemes use multiple drives per node. The multiple drives contain parts of the same protection stripe. Multiple data stripe units and FEC stripe units are placed on a separate drive on each node.
Protection:
N+2d:1n
Stripe
The graphic shows an example of a 1-MB file with a Requested Protection of +2d:1n. Four stripe units, either data or protection stripe units, are placed on separate drives in each node. Two drives on different nodes per disk pool, or a single node, can be lost simultaneously without the risk of data loss.
Advanced Protection
Advanced N+Md:Bn protection schemes provide a higher level of node loss protection. Besides the drive loss protection, the node loss protection is increased.
The table shows examples of the advanced N+Md:Bn protection schemes100, which use two drives per node per protection stripe. The number of FEC stripe units does not equal the number of drives that are used for the protection stripe. Even if one node is lost, there is still a greater level of protection available.
100 Like other protection levels, the data stripe units and FEC stripe units are placed
on any node in the node pool and on any drive. N+3d:1n1d is the minimum
protection for node pools containing 6-TB drives. The use of N+4d:2n is expected
to increase especially for smaller to middle sized node pools as larger drives are
introduced.
Protection Overhead
The protection overhead for each protection level depends on the file size and the
number of nodes in the cluster. The percentage of protection overhead declines as
the cluster gets larger.
• N+1n101
• N+2n102
• N+3n103
• Data Mirroring104
For better reliability, better efficiency, and simplified protection, use N+2d:1n,
N+3d:1n1d, or N+4d:2n, as indicated with a red box.
101 N+1n protection has a protection overhead equal to the capacity of one node.
102 N+2n protection has a protection overhead equal to the capacity of two nodes.
103N+3n is equal to the capacity of three nodes, and so on. OneFS also supports
optional data mirroring from 2x-8x, enabling from two to eight mirrors of the
specified content.
104 Data mirroring requires significant storage overhead and may not always be the
best data-protection method. For example, if you enable 3x mirroring, the specified
content is explicitly duplicated three times on the cluster. Depending on the amount
of content being mirrored, the mirrors can require a significant amount of capacity.
The table shows the relative protection overhead that is associated with each FEC requested
protection level available in OneFS. Indicators include when the FEC protection would result in
mirroring.
Considerations
As the cluster scales, the default protection may need adjusting. You may not want
to apply a higher protection to the entire cluster. Although you get better protection,
it is less efficient. Listed are areas to consider.
106Because the system is doing more work to calculate and stripe the protection
data – impact is approximately linear.
Challenge
109The customer may want to protect some repositories at a higher level than the
cluster default.
Protection Management
Scenario
4: Actual is the level of protection OneFS applies to data. It can be more than
requested protection but never less.
Requested Protection
Requested protection can be set at the cluster-wide level, at the node pool level (for example, H600 or A200 node pools), at the directory path level, and at the file level.
The cluster-wide default data protection setting is made using the default file
pool110 policy.
110The View default policy details window displays the current default file pool
policy settings. The current protection is displayed under requested protection. The
default setting is to use the requested protection setting at the node pool level as
highlighted in the Edit default policy details window.
To view or edit the default setting, go to File system > Storage pools > File pool policies, and click View/Edit on the Default policy. The command isi filepool policies modify finance --set-requested-protection +3:1 sets the requested protection for the finance file pool policy to +3d:1n.
The default file pool policy protection setting uses the node pool or tier setting.
When a node pool is created, the default requested protection111 that is applied to
the node pool is +2d:1n.
The current requested protection for each node pool is displayed in the Tiers and
node pools section.
To view and edit the requested protection setting for the node pools in the WebUI, go to the File
system > Storage pools > SmartPools page. isi storagepool nodepools modify
v200_25gb_2gb --protection-policy +2n, sets the requested protection of a node pool to
+2n.
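Before changing a node pool, the current requested protection can be checked from the CLI with commands along these lines (a sketch; the node pool name is only an example and output columns vary by OneFS release):
isi storagepool nodepools list
isi storagepool nodepools view v200_25gb_2gb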
OneFS stores the properties for each file. To view the files and the next level
subdirectories, click the specific directory.
Manual settings112
112 Manual settings can be used to modify the protection on specific directories or
files. The settings can be changed at the directory, subdirectory, and file level. Best
practices recommend against using manual settings, because manual settings can
return unexpected results and create management issues as the data and cluster
age. Once manually set, reset the settings to default to use automated file pool
policy settings, or continue as manually managed settings. Manual settings
override file pool policy automated changes. Manually configuring is only recommended for unique use cases. Manual changes are made using the WebUI File system explorer or the CLI isi set command.
To view directories and files on the cluster, go to File System > File system explorer.
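For example, a manual change of this kind might look like the following (a sketch only; the path is hypothetical and the option letters should be verified against the isi set and isi get help for your OneFS release):
isi set -R -p +2:1 /ifs/data/projectX
isi get -d /ifs/data/projectX
The first command recursively requests +2d:1n on the directory tree, and the second confirms the directory settings.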
The graphic shows a workflow that moves data from an H600 performance tier to an A200 archive tier of storage.
Suggested Protection
Suggested protection refers to the visual status and CELOG event notification
when node pools are set below the calculated suggested protection level.
Not using the suggested protection does not mean that data loss occurs, but it
does indicate that the data is at risk. Avoid anything that puts data at risk. What
commonly occurs is a node pool starts small and then grows beyond the configured
requested protection level. The once adequate +2d:1n requested protection level
becomes no longer appropriate, but is never modified to meet the increased
protection requirements.Not using the suggested protection does not mean that
data loss occurs, but it does indicate that the data is at risk. Avoid anything that
puts data at risk. What commonly occurs is a node pool starts small and then
grows beyond the configured requested protection level.
The Suggested protection feature provides a method to monitor and notify users
when the requested protection setting is different than the suggested protection for
a node pool.
SmartPools module health status - suggested protection is part of the reporting in the tab. To modify the settings, click View/Edit. The status indicates that the v200_24gb_2gb node pool has a requested protection level that is different than the suggested protection. The notification shows the suggested setting; node pools that are within suggested protection levels are not displayed.
Actual Protection
The actual protection114 applied to a file depends on the requested protection level,
the size of the file, and the number of node pool nodes.
114 The actual protection level is the protection level OneFS sets. Actual protection
is not necessarily the same as the requested protection level.
Chart legend: Orange - mirroring, low minimum size for requested protection. Bold - actual protection matches the requested protection. Red - actual protection changes from the requested protection.
The chart indicates the actual protection that is applied to a file according to the number of nodes in the node pool. If the actual protection does not match the requested protection level, it may have been changed to be more efficient given the file size or the number of nodes in the node pool.
115With a requested protection of +2d:1n, a 2-MB file, and a node pool of at least 18 nodes, the file is laid out as +2n.
116A 128-KB file is protected using 3x mirroring, because at that file size the FEC
calculation results in mirroring.
117 In both cases, the actual protection applied to the file exceeds the minimum
drive loss protection of two drives and node loss protection of one node. The
exception to meeting the minimum requested protection is if the node pool is too
small and unable to support the requested protection minimums. For example, a
node pool with four nodes and set to +4n requested protection. The maximum
supported protection is 4x mirroring in this scenario.
isi get
The isi get command provides detailed file or directory information. The primary
options are –d <path> for directory settings and –DD <path>/<filename> for
individual file settings.
The graphic shows the isi get –DD output. The output has three primary
locations containing file protection. The locations are a summary in the header, line
item detail settings in the body, and detailed per stripe layout per drive at the
bottom.
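For example (a sketch; the path and file name are hypothetical):
isi get -d /ifs/data/media
isi get -DD /ifs/data/media/video01.mp4
The header of the -DD output contains the protection summary for the file, and the per-stripe, per-drive layout appears at the bottom.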
Challenge
IT Manager:
Open participation questions:
Question: What is a use case for setting requested protection at
the cluster level? At the node pool level? At the directory level?
Data Layout
Scenario
IT Manager: You are doing a great job. Now, examine how OneFS lays
out the data on disks.
1: The number of nodes in a node pool affects the data layout because data
spreads across all nodes in the pool. The number of nodes in a node pool
determines how wide the stripe can be.
2: The nomenclature for the protection level is N+Mn, where N is the number of
data stripe units and Mn is the protection level. The protection level also affects
data layout. You can change the protection level down to the file level, and the
protection level of that file changes how it stripes across the cluster.
3: The file size also affects data layout because the system employs different
layout options for larger files than for smaller files to maximize efficiency and
performance. Files smaller than 128 KB are treated as small files. Due to the way
that OneFS applies protection, small files are triple mirrored.
4: The access pattern modifies both prefetching and data layout settings that are
associated with the node pool. Disk access pattern can be set at a file or directory
level so you are not restricted to using only one pattern for the whole cluster.
There are four variables that combine to determine how OneFS lays out data.
The variables make the possible outcomes almost unlimited when trying to
understand how the cluster behaves with varying workflow with differing variables.
You can manually define some aspects of how it determines what is best, but the
process is automated.
An administrator can optimize layout decisions that OneFS makes to better suit the
workflow. The data access pattern influences how a file is written to the drives
during the write process.
2: Use Streaming for large streaming workflow data such as movie or audio files.
Streaming prefers to use as many drives as possible, within the given pool, when
writing multiple protection stripes for a file. Each file is written to the same sub pool
within the node pool. Streaming maximizes the number of active drives per node as
the streaming data is retrieved. Streaming also influences the prefetch caching
algorithm to be highly aggressive and gather as much associated data as possible.
The maximum number of drives for streaming is five drives per node across the
node pool for each file.
3: A random access pattern prefers using a single drive per node for all protection
stripes for a file, like a concurrency access pattern. With random however, the
prefetch caching request is minimal. Most random data does not benefit from
prefetching data into cache.
A 1-MB file is divided into eight data stripe units and three FEC units. The data is laid out in three stripes. With a streaming access pattern, more spindles are preferred.
Streaming
N+1n protection, 1024-KB file, 8 x 128-KB chunks: 3 stripes and 3 drives wide.
The graphic is a representation of a Gen 6 chassis with four nodes. Each node has five drive sleds, and each drive sled has three disks. The orange disks represent a neighborhood. The disks that are used are in the same neighborhood (orange) and do not traverse to disks in the other neighborhoods (gray).
A 1-MB file is divided into eight data stripe units and three FEC units. The data is
laid out in three stripes, one drive wide.
Concurrency
N+1n protection, 1024-KB file, 8 x 128-KB chunks: 3 stripes and 1 drive wide.
The graphic is a representation of a Gen 6 chassis with four nodes. Each node has five drive sleds.
Each drive sled has three disks. The orange disk represents a neighborhood.
Configuring the data access pattern is done in the file pool policy, or manually at the directory and file level. Set data access patterns using the WebUI, or use isi set at the directory and file level or isi filepool policies at the file pool policy level.
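For example, a streaming access pattern might be applied with commands along these lines (a sketch only; the path, policy name, and exact option names are assumptions to verify against your OneFS release):
isi set -R -a streaming /ifs/data/video
isi filepool policies modify media_policy --data-access-pattern streaming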
For WebUI Administration, go to File systems > Storage pools > File pool policies.
Challenge
IT Manager:
Open participation questions:
Question: What is the preferred file layout with a streaming
access pattern?
Configuring Storage Pools
Storage Pools
Scenario
IT Manager: Before you configure file policies and tiering data, I want
you to explain the components of storage pools.
Storage pools monitor the health and status at the node pool level. Using storage
pools, multiple tiers of nodes (node pools) can all co-exist within a single file
system, with a single point of management.
Gen 6 drive sleds have three, four, or six drives whereas the F200 has 4 drive bays
and the F600 has 8 drive bays.
The graphic shows a Gen 6 node pool that has two chassis, eight nodes, and each node having five
drive sleds with three disks.
Exploring the building blocks and features of storage pools helps understand the
underlying structure when moving data between tiers. The storage pool
components, SmartPools, File Pools and CloudPools, are covered in detail in other
topics.
Disk Pool
Neighborhood
Tier
Neighborhoods are a group of disk pools and can span from 4 up to 19 nodes for Gen 6 nodes. A node pool has a single neighborhood from 1 to 19 nodes. Neighborhoods are automatically assigned and not configurable.
119Files do not span disk pools, making disk pools the granularity at which files are striped to the cluster. Disk pool configuration is automatic and cannot be configured manually. Removing a sled does not cause data unavailability because only one disk per disk pool is temporarily lost.
Gen 6 Neighborhood
A Gen 6 node pool splits into two neighborhoods when adding the 20th node 120.
One node from each node pair moves into a separate neighborhood.
The graphic contrasts a single neighborhood with 3 disk pools (in a 3-disk-per-drive-sled example) against a split node pool in which each neighborhood has 3 disk pools. At 40 nodes, the split provides protection against chassis failure.
120After the 20th node is added, and up to the 39th node, no two disks in a given drive sled slot of a node pair share a neighborhood. The neighborhoods split again when the node pool reaches 40 nodes.
The graphic shows a 40 node cluster used to illustrate a chassis failure. Once the
40th node is added, the cluster splits into four neighborhoods, labeled NH 1
through NH 4.
SmartPools
SmartPools Basic121
121The basic version of SmartPools supports virtual hot spares, enabling space reservation in a node pool for reprotection of data. OneFS implements SmartPools basic by default. You can create multiple node pools, but only a single tier and only a single file pool. A single tier has only one file pool policy that applies the same protection level and I/O optimization settings to all files and folders in the cluster.
SmartPools Advanced122
File Pools
File pools are the SmartPools logical layer, at which file pool policies are applied.
User created, and defined policies are set on the file pools.
CloudPools
Moving the cold archival data to the cloud, lowers storage cost and optimizes
storage resources.
CloudPools offers the flexibility of another tier of storage that is off-premise and off-
cluster.
122More advanced features are available in SmartPools with a license. With the
advanced features you can create multiple tiers and file pool policies that direct
specific files and directories to a specific node pool or a specific tier. Advanced
features include the ability to create multiple storage tiers, multiple file pool policy
targets, and multiple file pool policies.
Node Loss: A loss of a node does not automatically start reprotecting data. Many
times a node loss is temporary, such as a reboot. If N+1 data protection is
configured on a cluster, and one node fails, the data is accessible from every other
node in the cluster. If the node comes back online, the node rejoins the cluster
automatically without requiring a rebuild. If the node is physically removed, it must
also be smartfailed. Only smartfail a node when it must be removed from the cluster permanently.
The graphic shows the isi storagepool settings view command with user
configured settings highlighted.
Serviceability
Listed are the CLI options that can help get information about storage pools.
• isi status -p
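A few related commands that may also be useful (a sketch; verify availability and output on your OneFS release):
isi storagepool tiers list
isi storagepool nodepools list
isi storagepool settings view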
Challenge
Lab Assignment: Go to the lab and verify the storage pool settings.
File Pools
Scenario
IT Manager: Our media team needs their storage on disks that do not
compete with the other disk.
Your Challenge: The IT manager has tasked you to segregate data into
different node pools.
The graphic shows separate H400 and F200 node pools.
File pool policies automate file movement, enabling users to identify and move
logical groups of files.
• User-defined filters123
• File-based, not hardware-based124
• User-defined or default protection and policy settings125
The example shows that each policy has a different optimization and protection
level. A file that meets the policy criteria for tier 3 is stored in the tier 3 node pool
with +3d:1n1d protection. Also, the file is optimized for streaming access.
The default file pool policy is defined under the default policy.
123Files and directories are selected using filters, and actions are applied to files matching the filter settings. The policies are used to change the storage pool location, requested protection settings, and I/O optimization settings.
124
Each file is managed independent of the hardware, and is controlled through the
OneFS operating system.
125 Settings are based on the user-defined and default storage pool policies. File
pool policies add the capability to modify the settings at any time, for any file or
directory.
1: The individual settings in the default file pool policy apply to files without settings
that are defined in another file pool policy that you create. You cannot reorder or
remove the default file pool policy.
2: To modify the default file pool policy, click File system, click Storage pools,
and then click the File pool policies tab. On the File pool policies page, next to
the Default policy, click View/Edit.
3: You can choose to have the data that applies to the Default policy target a
specific node pool or tier or go anywhere. Without a license, you cannot change
the anywhere target. If existing file pool policies direct data to a specific storage
pool, do not configure other file pool policies with anywhere.
4: You can define the SSD strategy for the Default policy.
5: You can specify a node pool or tier for snapshots. The snapshots can follow the
data, or go to a different storage location.
6: Assign the default requested protection of the storage pool to the policy, or set a
specified requested protection.
8: In the Data access pattern section, you can choose between Random,
Concurrency, or Streaming.
This example is a use case where a media-orientated business unit wants greater
protection and an access pattern that is optimized for streaming.
A tier that is called media_tier with a node pool has been created.
The business unit targets their mp4 marketing segments to the media_tier where
the hosting application can access them.
Create the filters in the File matching criteria section when creating or editing a
file pool policy.
Filter elements:
• Filter type126
126 File pool policies with path-based policy filters and storage pool location actions
are run during the write of a file matching the path criteria. Path-based policies are
first started when the SmartPools job runs, after that they are started during the
matching file write. File pool policies with storage pool location actions, and filters
that are based on other attributes besides path, write to the node pool with the
highest available capacity. The initial write ensures that write performance is not
sacrificed for initial data placement.
• Operators127
• Multiple criteria128
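As an illustration of combining these filter elements with actions, a policy could be created from the CLI along these lines (a sketch only; the policy name, path, tier name, and option names are assumptions to confirm against the isi filepool policies create help for your release):
isi filepool policies create mp4_to_media --begin-filter --path=/ifs/marketing --and --name=*.mp4 --end-filter --data-storage-target=media_tier --data-access-pattern=streaming --set-requested-protection=+2d:1n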
SSD Options
With the exception of F-Series nodes, if a node pool has SSDs, by default the L3
cache is enabled on the node pool. To use the SSDs for other strategies, first
disable L3 cache on the node pool. Manually enabling SSD strategies on specific
files and directories is not recommended.
SSDs for Metadata Read Acceleration is the recommended setting. The setting places one metadata mirror on SSD; the other mirrors and the data are placed on HDDs.
127Operators can vary according to the selected filter. You can configure the
comparison value, which also varies according to the selected filter and
operator. The Ignore case box should be selected for files that are saved to the
cluster by a Windows client.
128The policy requires at least one criterion, and allows multiple criteria. You can
add AND or OR statements to a list of criteria. Using AND adds a criterion to the
selected criteria block. Files must satisfy each criterion to match the filter. You can
configure up to three criteria blocks per file pool policy.
Pros: Helps the Job Engine - all random lookups and treewalks are faster because one copy of metadata is always on SSD.
Cons: Usually shows small SSD utilization; clients may ask "Where is the value" or complain it was over configured.
Metadata read/write acceleration requires more SSD space. Writes all metadata
mirrors to SSDs and can consume up to six times more SSD space.
Pros: Metadata updates hit SSDs - speeds up creates, writes, and deletes, including SnapShot deletes.
Cons: Overfilling SSDs can have significant impact - manage with care. Does not show the full utilization until the file system capacity is high.
The Use SSDs for data and metadata setting requires the most space. It writes all data and metadata for a file on SSDs.
Pros: Use file pool policies to designate a specific path for the data on SSDs.
Cons: Must manage total SSD capacity utilization - it can push metadata from SSD, which has a wide impact.
Avoid SSDs
Using the Avoid SSDs option affects performance. This option writes all file data and all metadata mirrors to HDDs. Typically, use this setting when implementing L3 cache and GNA in the same cluster. You create a path-based file pool policy that targets an L3 cache enabled node pool. The data SSD strategy and snapshot SSD strategy for this L3 cache enabled node pool should be set to Avoid SSD.
The FilePolicy job on the WebUI Cluster management > Job operations > Job types page.
129 The SetProtectPlus job applies the default file pool policy.
130
When SmartPools is licensed, the SmartPools job processes and applies all file
pool policies. By default, the job runs at 22:00 hours every day at a low priority.
Policy Template
Policy templates on the WebUI File system > Storage pools > File pool policies page.
Template settings are preset to the name of the template along with a brief
description. You can change the settings.
Template considerations:
• Opens a partially populated, new file pool policy.
• You must rename the policy.
• You can modify and add criteria and actions.
• Use in web administration interface only.
131 Uses a file system index database on the file system instead of the file system
itself to find files needing policy changes. By default, the job runs at 22:00 hours
every day at a low priority. The FilePolicy job was introduced in OneFS 8.2.0.
132The SmartPoolsTree job is used to apply selective SmartPools file pool policies. The job runs the "isi filepool apply" command. The Job Engine manages the resources that are assigned to the job. The job enables testing file pool policies before applying them to the entire cluster.
Plan to add more node capacity when the cluster reaches 80% so that it does not
reach 90%. The cluster needs the extra capacity for moving around data, and for
the VHS space to rewrite data when a drive fails. Listed are more considerations.
• Avoid overlapping file policies where files may match more than one rule. If data
matches multiple rules, only the first rule is applied.
• File pools should target a tier and not a node pool within a tier.
• You can use the default policy templates as examples.
Serviceability
Example output of the 'isi filepool apply <path/file> -n -v -s' command with truncated output.
Listed here are the CLI options that can help get information about file pools.
• If file pool policy rules are not being applied properly, check the policy order.
• Test file pool policy before applying.
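For example, using the command referenced above with a hypothetical file (a sketch; confirm the option meanings with isi filepool apply --help):
isi filepool apply /ifs/marketing/promo01.mp4 -n -v -s
Here -n is intended as the test (no-change) option, with -v for verbose output and -s for statistics, so the matching policy and resulting actions can be reviewed before the policy is applied for real.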
Challenge
SmartPools
Scenario
SmartPools Overview
SmartPools enables the grouping of nodes into storage units that include node
pools, CloudPools, and tiers.
With SmartPools, you can segregate data based on its business value, putting data
on the appropriate tier of storage with appropriate levels of performance and
protection.
133Node pool membership changes through the addition or removal of nodes to the
cluster. Typically, tiers are formed when adding different node pools on the cluster.
SmartPools Licensing
Because multiple data target locations are available, additional target options are enabled in some global settings.
SmartPool Settings
Cache Statistics
GNA
SmartPools can automatically transfer data among tiers with different performance
and capacity characteristics.
Global namespace acceleration, or GNA, enables the use of SSDs for metadata
acceleration across the entire cluster.
GNA Aspects
Pros: Allows metadata read acceleration for non-SSD nodes - some nodes with SSDs are needed. Helps the Job Engine and random reads.
Cons: Difficult to manage and size the disk. Hard rules and limits. Links expansion of one tier to another tier to adhere to the limits.
L3Cache
L3 cache is enabled by default for all new node pools that are added to a cluster.
L3 cache is either on or off and no other visible configuration settings are available.
Any node pool with L3 cache enabled is excluded from GNA space calculations and does not participate in GNA enablement.
The left graphic shows the global setting. The right graphic shows L3 cache enabled or disabled on each node pool separately.
VHS
Virtual hot spare, or VHS, allocation enables space to rebuild data when a drive
fails.
When selecting the option to reduce the amount of available space, free-space
calculations exclude the VHS reserved space.
OneFS uses the reserved VHS free space for write operations unless you select
the option to deny new data writes.
Command example that reserves 10% capacity for VHS: isi storagepool
settings modify --virtual-hot-spare-limit-percent 10
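A drive-count-based reservation is also possible; for example (a sketch; confirm the option name with isi storagepool settings modify --help):
isi storagepool settings modify --virtual-hot-spare-limit-drives 2
isi storagepool settings view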
Spillover
With the licensed SmartPools module, you can direct data to spillover to a specific
node pool or tier group.
Actions
If you clear the box (disable), SmartPools does not modify or manage settings on
the files.
Protection example: If a +2d:1n protection is set and the disk pool suffers three
drive failures, the data that is not lost can still be accessed. Enabling the option
ensures that intact data is still accessible. If the option is disabled, the intact file
data is not accessible.
GNA can be enabled if 20% or more of the nodes in the cluster contain SSDs and
1.5% or more of the total cluster storage is SSD-based. The recommendation is
that at least 2.0% of the total cluster storage is SSD-based before enabling GNA.
Going below the 1.5% SSD total cluster space capacity requirement automatically
disables GNA metadata. If you SmartFail a node that has SSDs, the SSD total size
percentage or node percentage containing SSDs could drop below the minimum
requirement, disabling GNA. Any node pool with L3 cache enabled is excluded
from GNA space calculations and does not participate in GNA enablement.
GNA also uses SSDs in one part of the cluster to store metadata for nodes that
have no SSDs. The result is that critical SSD resources are maximized to improve
performance across a wide range of workflows.
VHS example: If specifying two virtual drives or 3%, each node pool reserves
virtual drive space that is equivalent to two drives or 3% of their total capacity for
VHS, whichever is larger. You can reserve space in node pools across the cluster
for this purpose, equivalent to a maximum of four full drives. If using a combination
of virtual drives and total disk space, the larger number of the two settings
determines the space allocation, not the sum of the numbers.
SmartPools Considerations
• Disk pools are not user configurable, and a disk drive is only a member of one disk pool or neighborhood.
• Node pools must have at least four nodes for Gen 6 and at least three nodes for
the F200/600. The default is one node pool per node type and configuration.
• The file pool policy default is all files are written anywhere on cluster. To target
more node pools and tiers, activate the SmartPools license.
Challenge
CloudPools
Scenario
IT Manager: Next, take the file pool policies to the CloudPools level. For
some of the long-term archive data, the group is looking at cloud
options.
CloudPools offers the flexibility of another tier of storage that is off-premise and off-
cluster. Essentially what CloudPools do is provide a lower TCO134 for archival-type
data. Customers who want to run their own internal clouds can use a PowerScale
installation as the core of their cloud.
The video provides a CloudPools overview and use case. See the student guide for
a transcript of the video.
Link:
https://edutube.emc.com/html5/videoPlayer.htm?vno=wx4VTLcN32kSlHGFwGLE1
Q
Shown is an Isilon cluster with twelve nodes. A key benefit of CloudPools is the
ability to interact with multiple cloud vendors. Shown in the graphic are the
platforms and vendors that are supported as of OneFS 8.1.1.
Let us look at an example: each chassis in the cluster represents a tier of storage. The topmost chassis is targeted for the production high-performance workflow and may have nodes such as F800s. When data is no longer in high demand, SmartPools moves the data to the second tier of storage. The example shows the policy moves data that is not accessed and that is over thirty days old. Data on the middle tier may be accessed periodically. When files are no longer accessed for more than 90 days, SmartPools archives the files to the lowest chassis or tier, such as A200 nodes.
The next policy moves the archive data off the cluster and into the cloud when data
is not accessed for more than 180 days. Stub files that are also called SmartLinks
are created. Stub files consume approximately 8 KB space on the Isilon cluster.
Files that are accessed or retrieved from the cloud, or files that are not fully moved
to the cloud, have parts that are cached on the cluster and are part of the stub file.
The storing of CloudPools data and user access to data that is stored in the cloud
is transparent to users.
CloudPools files undergo a compression algorithm and then are broken into 2-MB cloud data objects, or CDOs, for storage. The CDOs conserve space on the cloud storage resources. Internal performance testing notes a small performance penalty for applying compression and for decompressing files on read. Encryption is applied to file data transmitting to the cloud service. Each 128-KB file block is encrypted using AES-256 encryption and then transmitted as an object to the cloud. Internal performance testing notes a small performance penalty for encrypting the data stream.
CloudPools Considerations
CloudPools uses the SmartPools framework to move data and state information to
off-cluster storage while retaining the ability to read, modify, and write to data.
CloudPools Administration
Configure and manage CloudPools from the WebUI File system, Storage pools
page, CloudPools tab. Managing CloudPools using the CLI is done with the isi
cloud command.
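For example, the configured cloud storage accounts and CloudPools can be reviewed from the CLI (a sketch; output varies by release):
isi cloud accounts list
isi cloud pools list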
135 In OneFS 8.2, CloudPools compress data before sending it over the wire.
CloudPools Tab
Once the SmartPools and CloudPools licenses are applied, the WebUI shows the
cloud storage account options.
After a cloud storage account is defined and confirmed, the administrator can
define the cloud pool itself.
The file pool policies enable the definition of a policy to move data out to the cloud.
Must be unique
The graphic shows the window for creating a cloud storage account.
After creating a storage account, create a CloudPool and associate or point it to the
account.
CloudPools SmartLink
Run the isi get -D command to see files archived to the cloud using
CloudPools.
The example checks to see if the local version on the cluster is a SmartLink file.
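For example (the path is hypothetical and the exact field names in the -D output vary by release):
isi get -D /ifs/archive/report2018.doc | grep -i smartlink
A stubbed file is expected to show a SmartLink-related flag in the output, while a regular file does not.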
SmartPools file pool policies are used to move data from the cluster to the selected
CloudPools storage target.
When configuring a file pool policy, you can apply CloudPools actions to the
selected files.
CloudPools Settings
You may want to modify the settings for the file pool policy based on your
requirements. Modifications are not necessary for most workflows. You can elect to
encrypt and compress data.
1: The default CloudPools setting allows you to archive files with snapshot
versions, but you can change the default setting.
2: You can encrypt data prior to archiving it to the cloud. Cloud data is decrypted
when accessed or recalled.
3: You can compress data prior to archiving to the cloud. Cloud data is
decompressed when accessed or recalled.
4: Set how long to retain cloud objects after a recalled file replaces the SmartLink
file. After the retention period, the cloud objects garbage collector job cleans up the
local resources allocated for the SmartLink files, and removes the associated cloud
objects.
5: If a SmartLink file has been backed up and the original SmartLink file is
subsequently deleted, associated cloud objects are deleted only after the retention
time of the backed-up SmartLink file has expired.
6: If a SmartLink file has been backed up and the original SmartLink file is
subsequently deleted, associated cloud objects are deleted only after the original
retention time, or a longer incremental or full backup retention period, has expired.
7: Specifies how often SmartLink files modified on the cluster are written to their
associated cloud data objects.
8: Determines whether cloud data is cached when a file is accessed on the local
cluster.
9: Specifies whether cloud data is fully or partially recalled when you access a
SmartLink file on the cluster.
10: Specifies how long the system retains recalled cloud data that is in the cache of
associated SmartLink files.
The graphic shows various default advanced CloudPool options that are configured.
The output of the isi cloud command shows the actions that you can take.
1: Use to grant access to CloudPool accounts and file pool policies. You can add
and remove cloud resource, list cluster identifiers, and view cluster details.
2: Used to manage CloudPool accounts. You can create, delete, modify, and
view a CloudPool account, and list the ClouldPool accounts.
3: Use to archive or recall files from the cloud. Specify files individually, or use a file
matching pattern. Files that are targeted for archive must match the specified file
pool policy, or any file pool policy with a cloud target.
4: Use to manage CloudPools TLS client certificates. You can delete, import,
modify, view, and list certificates.
6: Use to configure and manage a CloudPool pool. You can create, delete,
modify, list, and view pools. OneFS no longer accesses the associated cloud
storage account when it is deleted. If a file pool policy references the CloudPool,
OneFS does not allow the delete.
7: Use to manage network proxies. You can create, delete, modify, list, and
view proxies. CloudPools prevents deletion of a proxy that is attached to a cloud
storage account.
8: Files that are stored in the cloud can be fully recalled using the isi cloud
recall command. Recall can only be done using the CLI. When recalled, the full
file is restored to its original directory. The file may be subject to the same file pool
policy that originally archived it, and rearchive it to the cloud on the next
SmartPools job run. If re-archiving is unintended, the recalled file should be moved
to a different, unaffected, directory. The recalled file overwrites the stub file. You
can start the command for an individual file or recursively for all files in a directory
path.
9: Use to manage CloudPool top-level settings. You can list and modify
CloudPool settings, and regenerate the CloudPool master encryption key.
10: Use to restore the cloud object index (COI) for a cloud storage account on the
cluster. The isi cloud access add command also restores the COI for a cloud
storage account.
• Support137
137C2S support delivers full CloudPools functionality for a target endpoint, and supports the use with C2S Access Portal (CAP), and X.509 client certificate authority. C2S also provides support (from AIMA) to securely store certificates, validate, and refresh if needed.
• Integration138
• No Internet connection139
CloudPools Limitations
In a standard node pool, file pool policies can move data from high-performance
tiers to storage tiers and back as defined by their access policies. However, data
that moves to the cloud remains stored in the cloud unless an administrator
explicitly requests data recall to local storage. If a file pool policy change is made
that rearranges data on a normal node pool, data is not pulled from the cloud.
Public cloud storage often places the largest fees on data removal, thus file pool
policies avoid removal fees by placing this decision in the hands of the
administrator.
The connection between a cluster and a cloud pool has limited statistical features.
The cluster does not track the data storage that is used in the cloud, therefore file
spillover is not supported. Spillover to the cloud would present the potential for file
recall fees. As spillover is designed as a temporary safety net, once the target pool
capacity issues are resolved, data would be recalled back to the target node pool
and incur an unexpected fee.
139This service is 'air gapped' which means it has no direct connection to the
Internet.
Statistic details, such as the number of stub files on a cluster or how much cache data is stored in stub files and would be written to the cloud on a flush of that cache, are not easily available. No historical data is tracked on the network usage
between the cluster and cloud either in writing traffic or in read requests. These
network usage details should be viewed from the cloud service management
system.
Challenge
Configuring Data Services
File Filtering
Scenario
Your Challenge: The IT manager wants you to explain file filtering and
configure the shares to filter unnecessary files.
The graphic shows that .avi files are prevented from writing to
the finance access zone.
File filtering enables administrators to deny or allow file access on the cluster that is
based on the file extension.
• Denies writes for new files.
• Prevents accessing existing files.
• Explicit deny lists.140
• Explicit allow lists.141
• No limit to extension list.
• Per access zone.142
• Configurable for the SMB defaults143.
• No license is required.
140Explicit deny lists are used to block only the extensions in the list. OneFS
permits all other file types to be written. Administrators can create custom
extension lists based on specific needs and requirements.
141
Explicit allow list permits access to files only with the listed file extensions.
OneFS denies writes for all other file types.
142 The top level of file filtering is set up per access zone. When you enable file
filtering in an access zone, OneFS applies file filtering rules only to files in that
access zone.
143OneFS does not take into consideration which file sharing protocol was used to
connect to the access zone when applying file filtering rules. However, you can
apply additional file filtering at the SMB share level.
If enabling file filtering on an access zone with existing shares or exports, the file
extensions determine access to the files.
• User denied access.144
• Administrator access.145
144 Users cannot access any file with a denied extension. The extension can be
denied through the denied extensions list, or because the extension was not
included as part of the allowed extensions list.
145 Administrators can still access existing files. Administrators can read the files or
delete the files. Administrators with direct access to the cluster can manipulate the
files.
146How the file filtering rule is applied to the file determines where the file filtering
occurs. If a user or administrator accesses the cluster through an access zone or
SMB share without applying file filtering, files are fully available.
147 File filters are applied only when accessed over the supported protocols.
149 With the compliance considerations today, organizations struggle to meet many
of the requirements. For example, many organizations are required to make all
emails available for litigation purpose. To help ensure that email is not stored
longer than wanted, deny storing .pst.
150Another use case is to limit the cost of storage. Organizations may not want
typically large files, such as video files, to be stored on the cluster, so they can
deny .mov or .mp4 file extension.
When you enable file filtering in an access zone, OneFS applies file filtering rules
only to files in that access zone.
151
An organizational legal issue is copyright infringement. Many users store their
.mp3 files on the cluster and open a potential issue for copyright infringement.
152 Another use case is to limit an access zone for a specific application with its
unique set of file extensions. File filtering with an explicit allow list of extensions
limits the access zone or SMB share for its singular intended purpose.
Access zone level: WebUI: Access > File filter > File filter settings.
1. Select the access zone.
2. Enable file filters - unchecked by default.
3. Select to allow or deny and add the extensions.
You can configure file filters on the Protocols > Windows sharing (SMB) >
Default share settings page153.
File filtering settings can be modified by changing the filtering method or editing file
extensions.
• Browse to Access > File Filter, and select the access zone that needs to be
modified from the Current Access Zone drop down list.
• Clear Enable file filters check box to disable file filtering in access zone.
153 Configuring file filters on individual SMB shares enables more granular control.
• Select to deny or allow and then enter the extension of the file, and click submit.
• Click the Remove Filter button next to the extension to remove a file name
extension.
CLI: isi smb shares create and isi smb shares modify commands. If
using RBAC, the user must have the ISI_PRIV_FILE_FILTER privilege.
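For example, a deny list might be applied to an individual share with a command along these lines (a sketch only; the share name, zone, and option names are assumptions to verify against isi smb shares modify --help):
isi smb shares modify marketing --zone=finance --file-filtering-enabled=true --file-filter-type=deny --file-filter-extensions=.avi,.mp4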
Challenge
SmartQuotas
Scenario
This video provides an overview for SmartQuotas. See the student guide for a
transcript of the video.
Link:
https://edutube.emc.com/Player.aspx?vno=tCIE1bGAUz6k3W1ic8tZfw==&autoplay
=true
SmartQuotas is a software module that is used to limit, monitor, thin provision, and
report disk storage usage at the user, group, and directory levels. Administrators
commonly use file system quotas for tracking and limiting the storage capacity that
a user, group, or project can consume. SmartQuotas can send automated
notifications when storage limits are exceeded or approached.
Quotas are a useful way to ensure that a user or department uses only their share
of the available space. SmartQuotas are also useful for enforcing an internal
chargeback system. SmartQuotas contain flexible reporting options that can help
administrators analyze data usage statistics for their Isilon cluster. Both
enforcement and accounting quotas are supported, and various notification
methods are available.
Before OneFS 8.2, SmartQuotas reports the quota free space only on directory
quotas with a hard limit. For user and group quotas, SmartQuotas reports the size
of the entire cluster capacity or parent directory quota, not the size of the quota.
OneFS 8.2.0 includes enhancements to report the quota size for users and groups.
The enhancements reflect the true available capacity that is seen by the user.
SmartQuotas Implementation
You can choose to implement accounting quotas or enforcement quotas. The table
below displays the difference between the types.
Enforcement Quotas
Quota Types
2: User and default user quotas: User quotas are applied to individual users, and
track all data that is written to a specific directory. User quotas enable the
administrator to control the capacity any individual user consumes in a particular
directory. Default user quotas are applied to all users, unless a user has an
explicitly defined quota for that directory. Default user quotas enable the
administrator to apply a quota to all users, instead of individual user quotas.
3: Group and default group quotas: Group quotas are applied to groups and limit
the amount of data that the collective users within a group can write to a directory.
Group quotas function in the same way as user quotas, except for a group of
people and instead of individual users. Default group quotas are applied to all
groups, unless a group has an explicitly defined quota for that directory. Default
group quotas operate like default user quotas, except on a group basis.
With default directory quotas, you can apply a template configuration to another
quota domain.
The graphic shows an example of creating a 10-GB hard quota, default directory
quota on the /ifs/sales/promotions directory. The directory default quota is not in
and of itself a quota on the promotions directory. Directories below the promotions
directory, such as the /Q1 and /Q2 directories inherit and apply the 10 GB quota.
The /Q1 domain and the /Q2 domain are independent of each other. Subdirectories such as /storage and /servers do not inherit the 10 GB directory quota. Given this example, if the /Q2 folder reaches 10 GB, that linked quota is independent of the 10 GB default directory quota on the parent directory. Modifications to the default directory quota on promotions propagate to the inherited quotas asynchronously. Inheritance is seen when listing quotas, querying an inheriting quota record, or when I/O happens in the subdirectory tree.
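A default directory quota like the one in this example might be created from the CLI with a command along these lines (a sketch; the default-directory quota type assumes OneFS 8.2 or later, and the syntax should be confirmed with isi quota quotas create --help):
isi quota quotas create /ifs/sales/promotions default-directory --hard-threshold=10G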
You can use the WebUI to view the created quotas and their links. See the student
guide for information about quota links.
The top example shows creating a template on the Features directory. The
directory has a hard limit of 10 GB, an advisory at 6 GB, and a soft limit at 8 GB
with a grace period of 2 days.
The Unlink option makes the quota independent of the parent, meaning
modifications to the default directory quota no longer apply to the sub directory.
This example shows removing the link on the Screen_shots sub directory and then
modifying the default directory quota on the parent, Quota, directory. Remove the
link using the button on the WebUI or isi quota quotas modify --
path=/ifs/training/Features/Quota/Screen_shots --
type=directory --linked=false. Using the --linked=true option re-links
or links to the default directory quota.
154 The 'isi quota' command is used to create the default directory quota.
Quota Accounting
Count all snapshot data in usage limits - the sum of the current directory and any snapshots of that directory.
The quota accounting options are Include snapshots in the storage quota155 and the accounting size, which can be one of the following:
• File system logical size156
155Tracks both the user data and any associated snapshots. A single path can
have two quotas that are applied to it, one without snapshot usage (default) and
one with snapshot usage. If snapshots are in the quota, more files are in the
calculation.
156Enforces the File system logical size quota limits. The default setting is to only
track user data, not accounting for metadata, snapshots, or protection.
• Physical size157
• Application logical size158 (OneFS 8.2 and later)
Overhead Calculations
157Tracks the user data, metadata, and any associated FEC or mirroring overhead.
This option can be changed after the quota is defined.
158 Tracks the usage on the application or user view of each file. Application logical
size is typically equal or less than file system logical size. The view is in terms of
how much capacity is available to store logical data regardless of data reduction,
tiering technology, or sparse blocks. The option enforces quotas limits, and reports
the total logical data across different tiers, such as CloudPools.
For example, with 2x data protection, writing a 10-GB file consumes 20 GB of the quota: 10 GB for the file and 10 GB for the data-protection overhead. The user has reached 50% of the 40 GB quota by writing a 10 GB file to the cluster.
159 With thin provisioning, the cluster can be full even while some users or
directories are well under their quota limit. Configuring quotas that exceed the
cluster capacity enables a smaller initial purchase of capacity/nodes.
160 Thin provisioning lets you add more nodes as needed, promoting a capacity
on-demand model.
PowerScale Administration-SSP1
• Management reduction.161
• Careful monitoring.162
Quota Nesting
Nesting quotas means having multiple quotas within the same directory structure.
The graphic shows nesting: the top-level directory has a 1 TB directory quota, one
subdirectory has a 25 GB user quota (the directory can be any size up to 1 TB, but
each user can store only 25 GB), and another subdirectory has no quota. The
directory structure cannot exceed 1 TB.
At the top of the hierarchy, the /ifs/sales folder has a directory quota of 1 TB. Any
user can write data into this directory, or the /ifs/sales/proposals directory, up to a
combined total of 1 TB. The /ifs/sales/promotions directory has a user quota
assigned that restricts the total amount that any single user can write into this
directory to 25 GB. Even though the parent directory (sales) is below its quota
restriction, a user is restricted within the promotions directory. The
/ifs/sales/customers directory has a directory quota of 800 GB that restricts the
capacity of this directory to 800 GB. However, if users place 500 GB of data in the
/ifs/sales/proposals directory, users can only place a combined 500 GB in the other
directories because the parent directory cannot exceed 1 TB.
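A minimal CLI sketch of the nested quotas in this example; the default-user quota type and the threshold syntax are assumptions for illustration:
isi quota quotas create /ifs/sales directory --hard-threshold=1T
isi quota quotas create /ifs/sales/promotions default-user --hard-threshold=25G
isi quota quotas create /ifs/sales/customers directory --hard-threshold=800G
isi quota quotas list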
Create example
Modify example
View example
In OneFS 8.2.0 and later, you can view advisory and soft quota limits as a percent
of the hard quota limit.
A hard limit must exist to set the advisory and soft percentage.
PowerScale Administration-SSP1
Quota Notifications
Rules
Send notifications by email or through a cluster event. See the student guide for
more information.
The email option sends messages using the default cluster settings. You can send
the email to the owner of the event, or to an alternate contact, or both the owner
and an alternate. You can also use a customized email message template. Use a
distribution list to send the email to multiple users.
If using LDAP or Active Directory to authenticate users, the cluster uses the user
email setting that is stored within the directory. If no email information is stored in
the directory, or if a Local or NIS provider authenticates, you must configure a
mapping rule.
The graphic shows one of the available quota templates that are located in the
/etc/ifs directory.
• PAPI support163.
• OneFS 8.2 enhancements164.
164In OneFS 8.2.0, administrators can configure quota notification for multiple
users. The maximum size of the comma-separated email ID list is 1024 characters.
The --action-email-address option of the isi quota command accepts multiple
comma-separated values.
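A hedged sketch of configuring a notification rule with multiple recipients; the subcommand path and positional arguments are assumptions that vary by release, so verify with isi quota --help before use:
isi quota quotas notifications create /ifs/sales/promotions directory hard exceeded --action-email-address=admin@example.com,storage-team@example.com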
PowerScale Administration-SSP1
Template Variables
An email template contains variables. You can use any of the SmartQuotas
variables in your templates.
Considerations
• Increased from 20,000 quota limits per cluster to 500,000 quota limits per
cluster.
• Quota notification daemon optimized to handle about 20 email alerts per
second.
• Support for the rpc.quotad service in the NFS container with some
statistics.
Best Practice:
• Do not enforce quotas on file system root (/ifs).
• Do not configure quotas on SyncIQ target directories.
Challenge
PowerScale Administration-SSP1
SmartDedupe
Scenario
IT Manager: The cluster is hosting home directories for the users. Much
of the data is shared and has multiple copies. Deduplication should help
address the inefficient use of space.
SmartDedupe Overview
The graphic shows deduplication: multiple instances of identical data are reduced
to a single instance of the data.
SmartDedupe Architecture
1: The SmartDedupe control path consists of PowerScale OneFS WebUI, CLI and
RESTful PAPI, and is responsible for managing the configuration, scheduling, and
control of the deduplication job.
When SmartDedupe runs for the first time, it scans the data set and selectively
samples blocks from it, creating the fingerprint index. This index contains a sorted
list of the digital fingerprints, or hashes, and their associated blocks. Then, if they
are determined to be identical, the block’s pointer is updated to the already existing
data block and the new, duplicate data block is released.
3: Shadow stores are similar to regular files but are hidden from the file system
namespace, so they cannot be accessed via a path name. A shadow store typically
grows to a maximum size of 2 GB, with up to 32,000 files referring to each block. If
the reference count limit is reached, a new block is allocated, which may or may not
be in the same shadow store. Shadow stores do not reference other shadow
stores, and snapshots of shadow stores are not permitted because the data that is
stored in shadow stores cannot be overwritten.
SmartDedupe Considerations
• SmartDedupe License165
166 Deduplication is most effective for static or archived files and directories; the
less frequently files are modified, the smaller the negative effect.
168 In-line data deduplication and in-line data compression are supported on the
F810 and H5600 platforms in OneFS 8.2.1.
169Deduplication does not occur across the length and breadth of the entire cluster,
but only on each disk pool individually.
170 Data that is moved between node pools may change what level of deduplication
is available. An example would be a file pool policy that moves data from a high-
performance node pool to nearline storage. The data would no longer be available
for deduplication for the other data on the high-performance node pool, but would
be newly available for deduplication on nearline storage.
171 Metadata is changed more frequently, sometimes in trivial ways, leading to
poor deduplication.
PowerScale Administration-SSP1
173The default size of a shadow store is 2 GB, and each shadow store can contain
up to 256,000 blocks. Each block in a shadow store can be referenced up to
32,000 times.
174 When deduplicated files are replicated to another PowerScale cluster or backed
up to a tape device, the deduplicated files no longer share blocks on the target
cluster or backup device. Although you can deduplicate data on a target
PowerScale cluster, you cannot deduplicate data on an NDMP backup device.
Shadow stores are not transferred to target clusters or backup devices. Because of
this, deduplicated files do not consume less space than non deduplicated files
when they are replicated or backed up. To avoid running out of space, ensure that
target clusters and tape devices have free space to store deduplicated data.
175SmartDedupe will not deduplicate the data stored in a snapshot. However, you
can create snapshots of deduplicated data. If deduplication is enabled on a cluster
that already has a significant amount of data stored in snapshots, it will take time
before the snapshot data is affected by deduplication. Newly created snapshots will
contain deduplicated data, but older snapshots will not.
SmartDedupe Function
A job in the OneFS Job Engine178 runs through blocks that are saved in every disk
pool, and compares the block hash values.179
176 Only one deduplication job can run at a time. The job uses CPU and memory
resources, so run it during non-peak or off hours.
178 The job first builds an index of blocks, against which comparisons are done in a
later phase, and ultimately confirmations and copies take place. The deduplication
job can be time consuming, but because it runs as a Job Engine job, its load is
throttled, limiting the impact on the cluster. Administrators find that their cluster
space usage has dropped once the job completes.
179If a match is found, and confirmed as a true copy, the block is moved to the
shadow store, and the file block references are updated in the metadata.
PowerScale Administration-SSP1
2: Compare 8 KB blocks.
5: Free blocks.
180A home directory scenario where many users save copies of the same file can
offer excellent opportunities for deduplication.
181 Static, archival data is seldom changing, therefore the storage that is saved may
far outweigh the load dedupe places on a cluster. Deduplication is more justifiable
when the data is relatively static.
182 Workflows that create many copies of uncompressed virtual machine images
can benefit from deduplication. Deduplication does not work well with compressed
data because the compression process tends to rearrange data to the point that
identical files in separate archives are not identified as such. In environments with
many unique files, the files do not duplicate each other, so the chances of finding
identical blocks are low.
PowerScale Administration-SSP1
SmartDedupe Jobs
Because the sharing phase is the slowest deduplication phase, a dry run, or
DedupeAssessment, returns an estimate of capacity savings.
1: The assessment enables a customer to decide if the savings that are offered by
deduplication are worth the effort, load, and cost.
2: Dedupe works on datasets which are configured at the directory level, targeting
all files and directories under each specified root directory. Multiple directory paths
can be specified as part of the overall deduplication job configuration and
scheduling.
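A minimal CLI sketch of assessing and then running deduplication on a directory; the path is hypothetical and the settings flag name is an assumption, so confirm with isi dedupe settings view:
isi dedupe settings modify --paths=/ifs/data/home    # directories that the Dedupe job targets
isi job jobs start DedupeAssessment                  # dry run that estimates capacity savings
isi dedupe reports list                              # review the assessment report
isi job jobs start Dedupe                            # run the deduplication job
isi dedupe stats                                     # cluster-wide deduplication savings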
SmartDedupe Administration
The WebUI SmartDedupe management is under the File system menu options.
Enter the paths for deduplication183 from the Settings tab.
From the Deduplication window, you can start a deduplication job and view any generated reports.
Challenge
183 Selecting specific directories gives the administrator granular control to avoid
attempting to deduplicate data where no duplicate blocks are expected, such as
large collections of compressed data. Deduplicating an entire cluster without
considering the nature of the data is likely to be inefficient.
PowerScale Administration-SSP1
SnapshotIQ
Scenario
SnapshotIQ Overview
If you modify a file and determine that the changes are unwanted, you can copy or
restore the file from the earlier file version.
You can use snapshots to stage content to export, and ensure that a consistent
point-in-time copy of the data is replicated or backed up.
The graphic represents the blocks for production data and the snapshot of that production data. The
snapshot is preserving the original blocks B and E after they have changed (B' and E').
184 Some OneFS operations generate snapshots for internal system use without
requiring a SnapshotIQ license. If an application generates a snapshot, and a
SnapshotIQ license is not configured, the snapshot can still be accessed. However,
all snapshots that OneFS operations generate are automatically deleted when no
longer needed. You can disable or enable SnapshotIQ at any time. Note that you
can create clones on the cluster using the "cp" command, which does not require a
SnapshotIQ license.
PowerScale Administration-SSP1
Snapshot Operations
The graphic shows snapshot usage for a file with Blocks A, B, C, and D. When
Block D is changed to Block D', the original Block D is copied to the snapshot and
the snapshot continues to reference the unchanged blocks.
Snapshot growth: as data is modified, only the changed data blocks are
contained186 in snapshots.
185A snapshot is not a copy of the original data, but only an extra set of pointers to
the original data. At the time it is created, a snapshot consumes a negligible
amount of storage space on the cluster. The original file references the snapshots.
186 If data is modified on the cluster (Block D' in the graphic), only one copy of the
changed data is made. With CoW, the original block (Block D) is copied to the
snapshot. The snapshot maintains a pointer to the data that existed at the time that
the snapshot was created.
OneFS uses both Copy on Write (CoW) and Redirect on Write (RoW). CoW is
typically used for user-generated snapshots, and RoW is typically used for system-
generated snapshots.
Both methods have pros and cons, and OneFS dynamically picks the snapshot
method to use to maximize performance and keep overhead to a minimum.
The graphic compares CoW and RoW. With CoW, the original block is copied into
the snapshot and the new data is written in place in the file system. With RoW, the
new data is redirected to a new block and the snapshot continues to reference the
original block in place.
With CoW, the changes that are made to block D incur a double-write penalty, but there is less
fragmentation of the HEAD file, which is better for cache prefetch and related file-reading functions.
PowerScale Administration-SSP1
An unordered deletion is the deletion of a snapshot that is not the oldest snapshot
of a directory. For more active data, the configuration and monitoring overhead is
slightly higher, but fewer snapshots are retained.
The benefits of unordered deletions that are compared with ordered deletions
depend on how often the snapshots that reference the data are modified. If the
data is modified frequently, unordered deletions save space. However, if data
remains unmodified, unordered deletions are not likely to save space, and it is
recommended that you perform ordered deletions to free cluster resources.
In the graphic, /ifs/org/dir2 has two snapshot schedules. If the retention period
on schedule 1 is longer than the retention period on schedule 2, the snapshots for
the directory are deleted out of order. Unordered deletions can take twice as long
to complete and consume more cluster resources than ordered deletions. However,
unordered deletions can save space by retaining a smaller total number of blocks
in snapshots.
Creating Snapshots
188Use shorter expiration periods for snapshots that are generated more
frequently, and longer expiration periods for snapshots that are generated less
frequently.
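A minimal CLI sketch of a one-off snapshot and a recurring schedule with expirations; the path, names, naming pattern, and argument order are illustrative and may vary by release:
isi snapshot snapshots create /ifs/eng/media --name=media-manual --expires=2D
isi snapshot schedules create media-daily /ifs/eng/media media-%Y-%m-%d "Every day at 01:00" --duration=7D
isi snapshot snapshots list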
PowerScale Administration-SSP1
OneFS tracks snapshots in the .snapshot directory. Click each tab for information
about snapshot structure and access.
Snapshot location
Accessing snapshots
190From /ifs all the .snapshots on the system can be accessed, but users can only
open the .snapshot directories for which they already have permissions. Without
access rights users cannot open or view any .snapshot file for any directory.
Preserving Permissions
Snapshots can be taken at any point in the directory tree. Each department or user
can have their own snapshot schedule.
The snapshot preserves193 the file and directory permissions at that point in time of
the snapshot.
191This is a virtual directory where all the snaps listed for the entire cluster are
stored.
192 To view the snapshots on /ifs/eng/media, a user can change directory (cd) to
/ifs/eng/media and access the .snapshot directory.
193The snapshot owns the changed blocks and the file system owns the new
blocks. If the permissions or owner of the current file is changed, it does not affect
the permissions or owner of the snapshot version.
PowerScale Administration-SSP1
Restoring Snapshots
Restore Theory
The graphic shows the file system (Blocks A through E), a snapshot taken at Time
1, a snapshot taken at Time 2, a restore target, and a client.
QUESTION: What happens when the user wants to recover block A data that was
overwritten in Time 3 with A’?
Snapshot options
Clients with Windows Shadow Copy Client can restore the data from the snapshot.
PowerScale Administration-SSP1
List point-in-time copies of the files in the directory. To recover a file, use the "mv"
or "cp" command.
Clients accessing the export over NFS can navigate using the .snapshot directory.
To recover a deleted file, right-click the folder that previously contained the file,
click Restore Previous Version, and select the required file to recover. To restore a
corrupted or overwritten file, right-click the file itself, instead of the folder that
contains the file, and then click Restore Previous Version.
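For clients navigating the .snapshot directory over NFS, a minimal sketch of recovering a single file with cp; the snapshot name and file are hypothetical:
ls /ifs/eng/media/.snapshot
cp /ifs/eng/media/.snapshot/media-2020-06-01/budget.xlsx /ifs/eng/media/budget.xlsx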
No additional storage is consumed and the restore is instant when restoring the
production file from a snap using RoW. Snapshot Time 2 has preserved A. A
backup snapshot is automatically created before copying A back to the file system.
The backup is a failback or safety mechanism should the restore from the snap be
unacceptable and the user wants to revert to A’.
SnapshotIQ Considerations
Challenge
PowerScale Administration-SSP1
SyncIQ
Scenario
SyncIQ delivers unique, highly parallel replication performance that scales with the
dataset to provide disaster recovery. The video provides an overview of SyncIQ.
See the student guide for a transcript of the video.
Link:
https://edutube.emc.com/Player.aspx?vno=OZC9t92nwmWVLWNjfT/+5w==&autoplay=true
Shown is a cluster with the source directory using SyncIQ to replicate data to a
remote target directory. OneFS SyncIQ uses asynchronous replication, enabling
you to maintain a consistent backup copy of your data on another Isilon cluster.
Asynchronous replication is similar to an asynchronous file write.
The target system passively acknowledges receipt of the data and returns an ACK
once the target receives the entire file or update. Then the data is passively written
to the target. SyncIQ enables you to replicate data from one PowerScale cluster to
another. Activate a SyncIQ license on both the primary and the secondary Isilon
clusters before replicating data between them. You can replicate data at the
directory level while optionally excluding specific files and sub-directories from
being replicated.
PowerScale Administration-SSP1
Under each deployment, the configuration could be for the entire cluster or a
specified source directory. Also, the deployment could have a single policy that is
configured between the clusters or several policies, each with different options
aligning to RPO and RTO requirements.
Click the tabs to learn more about each type of deployment topology.
One-to-one
One-to-many
SyncIQ supports data replication from a single source cluster to many target
clusters, allowing the same dataset to exist in multiple locations, as illustrated in the
graphic below. A one-to-many deployment could also be referenced as a hub-and-
spoke deployment, with a central source cluster as the hub and each remote
location representing a spoke.
Many-to-one
The many-to-one deployment topology is essentially the flipped version of the one-
to-many explained in the previous section. Several source clusters replicate to a
single target cluster as illustrated in the graphic below. The many-to-one topology
may also be referred to as a hub-and-spoke configuration. However, in this case,
the target cluster is the hub, and the spokes are source clusters.
Local Target
A local target deployment allows a single Isilon cluster to replicate within itself,
providing SyncIQ's powerful configuration options within a local cluster, as illustrated
in the graphic below. If a local target deployment is used for disaster readiness or
archiving options, the cluster protection scheme and storage pools must be
considered.
Cascaded
PowerScale Administration-SSP1
Considerations
Capabilities
194The SyncIQ Job Engine is separate from the cluster maintenance activity Job
Engine in OneFS. SyncIQ runs based on SyncIQ policies that you can schedule or
run as required manually.
197 The SyncIQ process uses snapshots on both the source and target clusters.
No SnapshotIQ license is required for basic SyncIQ snapshots on either the source
or target cluster. These snapshots are only used for SyncIQ jobs. SyncIQ
snapshots are single-instance snapshots, and OneFS only retains the latest or last-
known good version.
198 SyncIQ can support larger maximum transmission units, or MTU, over the LAN
or WAN. SyncIQ supports auto-negotiation of MTU sizes over WAN connections;
the MTU across the network is negotiated by the network.
PowerScale Administration-SSP1
• Import snapshots199.
• OneFS 8.2 and above provides over-the-wire encryption200 and bandwidth
reservation201 at a policy level.
199 SyncIQ has the capability to import manually taken snapshots to use as the
point-in-time reference for synchronization consistency. You can add new nodes
while a sync job runs; there is no requirement to stop the sync job before adding
new nodes. The functionality also enables creating a point-in-time report that shows
the SyncIQ worker activity.
200In-flight encryption makes data transfer between OneFS clusters secure. The
function benefits customers who undergo regular security audits and/or
government regulations.
201The SyncIQ bandwidth setting at the global level splits the bandwidth
reservation evenly among all policies. Using the CLI, you can make bandwidth
reservations for individual policies.
Limitations
203 Performing a complete failover and failback test on a monthly or quarterly basis
is discouraged. Perform failover testing only after quiescing writes to the source
(preventing changes to the data) and running all SyncIQ policies a final time to
assure complete synchronization between source and target. Failing to perform a
final synchronization can lead to data loss.
PowerScale Administration-SSP1
Compatibility
The table shows the versions of OneFS you can synchronize using SyncIQ. A
target cluster running OneFS 7.1.x is no longer supported. For
information about the support and service life-cycle dates for hardware and
software products, see the Isilon Product Availability Guide.
204Retrieving a copy of the data from the target cluster does not require a failover.
The target is a read-only copy of the data. Perform a copy operation to make a
copy of the read-only data on the target cluster to a location outside of the SyncIQ
domain on the target, or to a location on the source cluster, or to the client.
205The 'Whenever the source is modified' option is not for continuous replication.
OneFS does not offer a continuous replication option. This option is for specific
workflows that have infrequent updates and require distribution of the information
as soon as possible.
CloudPools
SyncIQ can synchronize CloudPools data from the CloudPools aware source
cluster to a PowerScale target cluster.
SyncIQ provides data protection for CloudPools data and provides failover and
failback capabilities.
The processes and capabilities of SyncIQ are based on the OneFS version
relationship between the source cluster and the target cluster. This relationship
determines the capabilities and behaviors available for SyncIQ policy replication.
Failover
Failover is the process of changing the role of the target replication directories into
the role of the source directories for assuming client read, write, and modify data
activities.
PowerScale Administration-SSP1
Source
Target
The example shows a failover where the client accesses data on the target cluster.
Failback
Like failover, you must select failback for each policy. You must make the same
network changes to restore access to direct clients to the source cluster.
206 A failback can happen when the primary cluster is available once again for
client activities. The reason could be any number of circumstances, including that
natural disasters are no longer impacting operations or that site communication or
power outages have been restored to normal. You must fail back each SyncIQ
policy.
Source
Target
The example shows a failback where the client accesses source data.
Failback Preparation
Source cluster
Resync-prep prepares the source cluster to receive the changes made to the data
on the target cluster.
The mirror policy is placed under Data Protection > SyncIQ > Local Targets on the
primary cluster. On the secondary cluster, the mirror policy is placed under Data
Protection > SyncIQ > Policies.
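A minimal CLI sketch of the failover and failback sequence for a policy named HomeDirSync (a hypothetical name); run each command on the cluster indicated in the comment:
# On the secondary (target) cluster: fail over by allowing writes to the target directories.
isi sync recovery allow-write HomeDirSync
# On the primary (source) cluster, once available again: prepare failback (creates the mirror policy).
isi sync recovery resync-prep HomeDirSync
# On the secondary cluster: run the mirror policy (typically named <policy>_mirror) to copy changes back.
isi sync jobs start HomeDirSync_mirror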
PowerScale Administration-SSP1
Failover Revert
A failover revert undoes a failover job in process207. Use revert before writes
occur208 on the target.
Source
Target
207 Failover revert stops the failover job and restores the cluster to a sync-ready
state. Failover revert enables replication to the target cluster to once again
continue without performing a failback.
208Use revert if the primary cluster once again becomes available before any writes
happen to the target. A temporary communications outage or if doing a failover test
scenario are typical use cases for a revert.
SyncIQ
209You create and start replication policies on the primary cluster. A policy
specifies what data is replicated, where the data is replicated to, and how often the
data is replicated.
210The primary cluster holds the source root directory, and the secondary cluster
holds the target directory. There are some management capabilities for the policy
on both the primary and secondary clusters, though most of the options are on the
primary.
211SyncIQ jobs are the operations that do the work of moving the data from one
PowerScale cluster to another. SyncIQ generates these jobs according to
replication policies.
PowerScale Administration-SSP1
The panels describe the files for creating the SyncIQ policy. Refer to the student
guide for more information.
Settings
Creating a SyncIQ policy is done on the Data protection > SyncIQ > Policies page
or using the isi sync policies create command.
Unique name
The graphic shows the SyncIQ policy Settings fields. Click the image to enlarge.
SyncIQ domain root: the source root directory. Included directories: replicates only
listed paths and ignores unlisted paths - use with caution.
Target Cluster
Advanced
The final segment of the policy creation is the advanced fields.
PowerScale Administration-SSP1
Prioritize policies
Data details written to /var/log/isi_migrate.log
Settings: In the Settings section, assign a unique name to the policy. Optionally you
can add a description of the policy. The Enable this policy box is checked by
default. Clearing the box disables the policy and stops it from running. Next,
designate whether the policy is a Copy policy or a Synchronize policy. The
replication policy can be started using one of four different run job options:
Manually, On a Schedule, Whenever the source is modified, or Whenever a
snapshot of the source directory is taken.
Source cluster directories: In the Source Cluster criteria, the Source root directory
is the SyncIQ domain. The path has the data that you want to protect by replicating
it to the target directory on the secondary cluster. Unless otherwise filtered,
everything in the directory structure from the source root directory and below
replicates to the target directory on the secondary cluster.
Includes and excludes: The Included directories field permits adding one or more
directory paths below the root to include in the replication. Once an include path is
listed, only the paths listed in the include field replicate to the target. Without
include paths, all directories below the root are included. The Excluded directories
field lists directories below the root that you want explicitly excluded from the
replication process. You cannot fail back replication policies that specify include or
exclude settings. The DomainMark job does not work for policies with
subdirectories mentioned in Include or Exclude. Using includes or excludes for
directory paths does not affect performance.
File matching criteria: The File matching criteria enables the creation of one or
more rules to filter which files do and do not get replicated. When creating multiple
rules, connect them together with Boolean AND or OR statements. When adding a new
filter rule, click either the Add an "And" condition or Add an "Or" condition links. File
matching criteria says that if the file matches these rules, then replicate it. If a file
does not match the rules, it is not replicated.
Target: Snapshots are used on the target directory to retain one or more consistent
recover points for the replication data. You can specify if and how these snapshots
generate. To retain the snapshots SyncIQ takes, select Enable capture of
snapshots on the target cluster. SyncIQ always retains one snapshot of the most
recently replicated delta set on the secondary cluster to facilitate failover,
regardless of this setting. Enabling capture snapshots retains snapshots beyond
the time period that is needed for SyncIQ. The snapshots provide more recover
points on the secondary cluster.
Advanced: The Priority field in the Advanced settings section enables policies to be
prioritized. If more than 50 concurrent SyncIQ policies are running at a time,
policies with a higher priority take precedence over normal policies. If the SyncIQ
replication is intended for failover and failback disaster recovery scenarios,
selecting Prepare policy for accelerated failback performance prepares the
DomainMark for the failback performance. The original source SyncIQ domain
requires a DomainMark. Running a DomainMark during the failback process can
take a long time to complete. You can retain SyncIQ job reports for a specified
time. With an increased number of SyncIQ jobs in OneFS 8.0, the report retention
period could be an important consideration. If tracking file and directory deletions
that are performed during synchronization on the target, you can select to Record
deletions on synchronization.
Deep copy: The Deep copy for CloudPools setting applies to those policies that
have files in a CloudPools target. Deny is the default. Deny enables only stub file
replication. The source and target clusters must be at least OneFS 8.0 to support
Deny. Allow lets the SyncIQ policy determine whether a deep copy should be
performed. Force automatically enforces a deep copy for all CloudPools data that is
contained within the SyncIQ domain. Allow or Force is required for target clusters
that are not CloudPools aware.
PowerScale Administration-SSP1
A SyncIQ policy can copy or synchronize source data to meet organizational goals.
When creating a SyncIQ policy, choose a replication type of either sync 212 or
copy213.
212 If a mirrored copy of the source is the goal, create a sync policy.
213If the goal is to have all source data that is copied and to retain deleted file
copies, then create a copy policy.
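A minimal CLI sketch of creating and running a scheduled sync policy; the policy name, paths, target address, and schedule string are illustrative:
isi sync policies create HomeDirSync sync /ifs/data/home 192.168.3.11 /ifs/data/home-dr --schedule="Every day at 22:00"
isi sync jobs start HomeDirSync
isi sync reports list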
Tip: You can always license SnapshotIQ on the target cluster and
retain historic SyncIQ associated snapshots to aid in file deletion
and change protection.
The video details a basic SyncIQ use case, configuring replication between two
clusters. See the student guide for a transcript of the video.
Link:
https://edutube.emc.com/Player.aspx?vno=6cyyA4XvBqkyHJwXs6ltdg==&autoplay=true
PowerScale Administration-SSP1
Challenge
SmartLock
Scenario
SmartLock Overview
PowerScale Administration-SSP1
• SyncIQ integration214
• OneFS data services integration215
SmartLock Concepts
Before configuring SmartLock on a cluster, you must familiarize yourself with a few
concepts to fully understand the SmartLock requirements and capabilities.
• Retention Period
• Compliance
• WORM
There are two SmartLock operation modes available to the cluster: SmartLock
compliance mode216 and SmartLock enterprise mode217.
Compliance Enterprise
216
You can create compliance directories only if the cluster has been upgraded to
SmartLock compliance mode.
Compliance: Only use if SEC 17a-4 must be followed. Configured during the initial
cluster install.
Enterprise: Does not restrict the cluster to follow SEC 17a-4 rules. Data is not
modified until retention dates have passed.
PowerScale Administration-SSP1
2: Enterprise SmartLock directories are data retention directories that do not meet
SEC regulatory compliance requirements. Enterprise directories are the most
commonly used directories in a SmartLock configuration. Enterprise SmartLock
directories enable administrators or RBAC enabled users the ability to delete files,
which are known as privileged deletes. You can enable or turn on, temporarily
disable or turn off, or permanently disable privileged deletes. The Enterprise
directory may be fully populated with data or empty when creating or modifying.
3: Compliance SmartLock directories are data retention directories that meet SEC
regulatory compliance requirements. Set up the cluster in Compliance mode to
support Compliance SmartLock directories.
When using SmartLock, there are two types of directories: enterprise and
compliance. A third type of directory is a standard or non-WORM218 directory.
If using the compliance clock, you must copy data into the Compliance SmartLock
directory structure before committing the data to a WORM state.
SmartLock Configuration
In this use case the administrator wants to create a WORM directory where files
are locked down for a month. Once moved into the folder, the files are committed to
WORM.
Create a WORM domain from the WebUI File system > SmartLock page and select
Create domain, or use the CLI isi worm domains create command.
218
OneFS supports standard non-WORM directories on the same cluster with
SmartLock directories.
219When you upgrade, privileged deletes are disabled permanently and cannot be
changed back.
1: Setting to "On" enables the root user to delete files that are currently committed
to a WORM state.
3: The default retention period is assigned when committing a file to a WORM state
without specifying a day to release the file from the WORM state.
4: The minimum retention period ensures that files are retained in a WORM state
for at least the specified period of time. The maximum retention period ensures that
files are not retained in a WORM state for more than the specified period of time.
5: After a specified period, a file that has not been modified is committed to a
WORM state.
6: Files committed to a WORM state are not released from a WORM state until
after the specified date, regardless of the retention period.
Use case:
• The administrator requires a WORM directory where files are in a WORM state
for at least 30 days and are removed from the WORM state after 60 days.
PowerScale Administration-SSP1
CLI:
For a file to have a file retention date applied, and set to a read-only state, you
must commit the file to WORM.
Until the files are committed to WORM, files that are in a SmartLock directory act
as standard files that you can move, modify, or delete.
Manual commit: First set the retention date on the file, then commit the file to
WORM. Commit files to a WORM state using Windows controls or UNIX
commands. Example: # chmod ugo-w /ifs/finance/worm/JulyPayroll.xls
Autocommit: Set per SmartLock domain, an autocommit time period from when the
file was last modified on a directory. After the time period expires, the file is
automatically committed to WORM.
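A minimal CLI sketch for the use case above; the directory, retention values, and flag names are assumptions that may vary by OneFS release:
isi worm domains create /ifs/finance/worm --default-retention=30D --min-retention=30D --max-retention=60D
# Set the retention date by setting the file access time, then commit by removing write permission.
touch -at 202012310000 /ifs/finance/worm/JulyPayroll.xls
chmod ugo-w /ifs/finance/worm/JulyPayroll.xls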
SmartLock Considerations
Challenge
PowerScale Administration-SSP1
Monitoring Tools
PowerScale Administration-SSP1
PowerScale HealthCheck
Scenario
PowerScale Administration-SSP1
HealthCheck Overview
WebUI, Cluster management > HealthCheck page. Click the image to enlarge.
The OneFS HealthCheck tool is a service that helps evaluate the cluster health
status and provides alerts to potential issues.
You can use HealthCheck to verify the cluster configuration and operation,
proactively manage risk, reduce support cycles and resolution times, and improve
uptime.
CLI example to view the checklist items: isi healthcheck checklists list
PowerScale Administration-SSP1
For the CLI equivalent output use the "isi healthcheck checklists view cluster_capacity" command.
Click on the image to enlarge.
The graphic shows the checklist items for the cluster_capacity check. The
HealthCheck terms and their definitions are:
You can use the CLI to view the parameters of a checklist item. The example
shows viewing the node_capacity item parameters.
PowerScale Administration-SSP1
Running a HealthCheck
By default, a HealthCheck evaluation runs once a day at 11:00 AM. You can run a
HealthCheck using the WebUI.
The example shows selecting the Run option for the cluster_capacity checklist. The
HealthCheck table shows the status of the checklist.
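A short CLI sketch using the checklist from this example; the evaluation subcommand name is an assumption, so verify with isi healthcheck --help:
isi healthcheck checklists list
isi healthcheck checklists view cluster_capacity
isi healthcheck evaluation list    # review the results of completed runs (subcommand name assumed)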
PowerScale Administration-SSP1
HealthCheck Schedule
You can manage the HealthCheck schedules of the checklists. By default, the
basic checklist is scheduled.
PowerScale Administration-SSP1
Viewing an Evaluation
Evaluation showing
failures
You can view the evaluation from the HealthChecks tab or the Evaluations tab. For
a failed evaluation, the file will show the checklist items that failed.
PowerScale Administration-SSP1
HealthCheck Resources
Challenge
PowerScale Administration-SSP1
InsightIQ
Scenario
InsightIQ Overview
The graphic shows the InsightIQ architecture: the InsightIQ host collects cluster
statistics (isi_stat_d) and FSA data over HTTP into its datastore and FSA
datastore, and clients access InsightIQ over HTTP.
InsightIQ focuses on PowerScale data and performance. Listed are key benefits for
using InsightIQ. Refer to the student guide for more information.
• Determine whether a storage cluster is performing optimally.
• Compare changes in performance across multiple metrics, such as CPU usage,
network traffic, protocol operations, and client activity.
• Correlate critical storage cluster events with performance changes.
• Determine the effect of workflows, software, and systems on storage cluster
performance over time.
PowerScale Administration-SSP1
InsightIQ Dashboard
PowerScale Administration-SSP1
The dashboard shows an aggregated view of the monitored clusters (three in this
example), with metrics, cluster health, and a cluster-by-cluster breakout.
You can modify the view to represent any time period where InsightIQ has
collected data. Also, breakouts and filters can be applied to the data. In the
Aggregated Cluster Overview section, you can view the status of all monitored
clusters as a whole. There is a list of all the clusters and nodes that are monitored.
Total capacity, data usage, and remaining capacity are shown. Overall health of the
clusters is displayed. There are graphical and numeral indicators for connected
clients, active clients, network throughput, file system throughput, and average
CPU usage. Depending on the chart type, preset filters enable you to view specific
data. For example, In/Out displays data by inbound traffic compared with outbound
traffic.
You can also view data by file access protocol, individual node, disk, network
interface, and individual file or directory name. If displaying the data by the client
only, the most active clients are represented in the displayed data. Displaying data
by event can include an individual file system event, such as read, write, or lookup.
Filtering by operation class displays data by the type of operation being performed.
PowerScale Administration-SSP1
Capacity Analysis
The capacity analysis pie chart is an estimate of usable capacity based on the
existing ratio of user data to overhead220.
220 There is an assumption that data usage factors remain constant over more use.
If a customer uses the cluster for many small files and then wants to add some
large files, the result is not precisely what the system predicts.
PowerScale Administration-SSP1
Default Reports
You can monitor clusters through customizable reports that display detailed cluster
data over specific periods of time.
• Performance reports
• File system reports
• Live reporting
PowerScale Administration-SSP1
You can drill down to file system reporting to get a capacity reporting interface that
displays more detail about usage, overhead and anticipated capacity.
The graphic shows the Capacity Forecast, displaying the amount data that can be
added to the cluster before the cluster reaches capacity.
The administrator can select cluster information and use that as a typical usage
profile to estimate when the cluster reaches 90% full. The information is useful for
planning node/cluster expansion ahead of time to avoid delays around procurement
and order fulfillment.
The Plot data shows the granularity of the reporting available. The Forecast data
shows the breakout of information that is shown in the forecast chart. Depending
on the frequency and amount of variation, outliers can have a major impact on the
accuracy of the forecast usage data.
Create custom live performance reports by clicking Performance Reporting > Create a New
Performance Report. Click the image to enlarge.
There are three types of reports on the Create a New Performance Report page.
PowerScale Administration-SSP1
221In the Create a New Performance Report area, in the Performance Report
Name field, type a name for the live performance report. Select the Live
Performance Reporting checkbox. In the Select the Data You Want to See area,
specify the performance modules that you want to view in the report. You can add
a performance module or modify an existing one. Repeat this step for each
performance module that you want to include. Save the report.
PowerScale Administration-SSP1
InsightIQ collects the FSA data from the cluster for display to the administrator.
PowerScale Administration-SSP1
Enable FSA
Monitored Clusters page, Settings > Monitored Clusters. Click the image to enlarge.
Before you can view and analyze data usage and properties through InsightIQ, you
must enable the FSA feature.
222 Unlike InsightIQ datasets, which are stored in the InsightIQ datastore, FSA
result sets are stored on the monitored cluster in the /ifs/.ifsvar/modules/fsa
directory.
223The job collects information across the cluster, such as the number of files per
location or path, the file sizes, and the directory activity tracking.
PowerScale Administration-SSP1
To enable FSA, open the Monitored Clusters page by clicking Settings > Monitored
Clusters. In the Actions column for the cluster that you want to enable or disable
FSA for, click Configure. The Configuration page displays. Click the Enable FSA tab.
To enable the FSA job, select Generate FSA reports on the monitored cluster. To
enable InsightIQ for FSA reports, select View FSA reports in InsightIQ.
If there are long time periods between the FSAnalyze job runs, the snapshot can
grow very large, possibly consuming much of the cluster's space. To avoid large
snapshots, you can disable the use of snapshots for FSAnalyze. Disabling snapshot
use means that the jobs may take longer to run.
Considerations
PowerScale Administration-SSP1
Challenge
PowerScale Administration-SSP1
DataIQ v1
Scenario
Your Challenge: The IT manager has asked you to explain DataIQ and
its available monitoring capabilities.
DataIQ Overview
1: DataIQ eliminates the problem of data silos by providing a holistic view into
heterogeneous storage platforms on-premises and in the cloud. A single pane of
glass view gives users a file-centric insight into data and enables intuitive
navigation.
2: DataIQ optimized near real-time scan, and high-speed file indexing deliver
immediate project and user information. Powerful search capabilities across
heterogeneous storage can locate data in seconds, no matter where it resides.
High-speed search and indexing scans and organizes files in "look aside" mode.
PowerScale Administration-SSP1
3: DataIQ can ‘tag’ an attribute and use that tag to query millions of files across any
storage system. Tags enable business users, and IT, to view data in a true
business context. Tags give organizations the ability to see their data in the right
context, and to optimize their storage environment costs.
4: DataIQ enables data mobility with bi-directional movement between file and
object storage. The use of self-service archive capabilities to move files to the most
appropriate storage tier, such as archive or the cloud, empowers business owners.
Self-service enables content owners to move data from high-performance file
storage to an object archive.
6: DataIQ quickly scans file and object storage of all types. It can classify data
according to customer specification and provide instant rollup information. For
example, total tree size, average age of subtree data, 'last modified' date at any
point of folder structure. DataIQ generates fast and granular reports with business-
specific views and metrics, enabling rapid issue isolation. DataIQ integrates with IT
infrastructures to provide rights for AD and LDAP for users and group, as well as
APIs to enhance and extract business data. DataIQ plug-ins enable users to gain
additional insights. Plug-ins extend the GUI and launch internal scripts such as
Data Mover, Previewer, Audited Delete, Send to QA, and other custom scripts.
DataIQ Implementation
The DataIQ server scans the managed storage, saves the results in an index, and
provides access to the index.
Access is available from one or more GUI clients, CLI clients, and through the API
for application integration.
PowerScale Administration-SSP1
The graphic shows the DataIQ server with Windows, Linux, and Mac clients.
After logging in to the DataIQ WebUI, the landing page is the Data Management
page.
Settings - Pages
Use the left and right arrows to view the Settings pages.
PowerScale Administration-SSP1
Local settings
The Local settings page allows you to personalize the theme of the DataIQ WebUI.
• Client maps224
• Viewable files and folders225
General management
You can configure email alerts and SRS on the General management page.
If a volume has the minimum free space threshold configured, an email is sent
when the threshold is triggered.
224 Client maps enable you to map the DataIQ path to the path that the client sees.
225 You can view or hide the hidden-type files and folders. You can also set how the
files and folders are viewed in the tables and lists.
PowerScale Administration-SSP1
The Access and permissions page is where you can configure groups, add roles to
the groups, set authentication providers, and add users.
PowerScale Administration-SSP1
The Other settings include file type class and configuration files.
Licensing
From the Licensing page, you can manage and upload licenses generated from the
Software Licensing Central online portal.
PowerScale Administration-SSP1
Shown is an overview of the actions a user with the role of data owner can
perform. The actions are performed from the Data management settings page.
Use the left and right arrows to view the panels.
Volumes Panel
• Volume defaults: edits apply globally; settings at the volume level have
precedence.
• Add or edit a volume: configure the volume type, scan management, and hard
link handling.
• Scan groups: volumes added to the scan group adopt the scan group settings;
scan group settings have precedence over volume settings.
• Volume actions: change the scan management or delete the volume.
Click to enlarge.
From the Data management configuration page, Volumes panel, you can set
volume defaults, add and edit volumes, and create scan groups.
PowerScale Administration-SSP1
S3 Endpoints Panel
Click to enlarge.
226 DataIQ enables you to set up the endpoint as a volume for scanning. To delete
an endpoint, go to the view breakout for the endpoint.
PowerScale Administration-SSP1
Click to enlarge.
From the Data management settings page, Other settings panel, you can configure
file type classes.227
The example shows the configuration files and a breakout of the clientmap file. Click to enlarge.
The Data management settings page has four configuration files that you can edit.
The files are listed in the Other settings panel:
• The Clientmap configuration file allows you to view file paths as they are seen by
the user.
• The Data management configuration file allows you to change DataIQ settings.
• The Viewfilter configuration file allows you to restrict the view of folders by group.
• The Autotagging configuration file allows you to set up and define tags.
227File type classes allow you to scan the volumes by a file type class. For
example, you can make a class called images and then add .jpeg, .png, and .gif
extensions to the class.
PowerScale Administration-SSP1
Scroll through the carousel to view each of the volume management areas. You
can double click the images to enlarge.
Volume defaults
Set a capacity threshold; when triggered, it flags the volume. Hard link handling
provides more accurate reports on volumes with hard links.
The volume defaults are applied to new volumes and volumes without configured
settings.
The settings on volumes that are configured take precedence over the default
values.
Add Volume
The Add new volume window consists of three panels, the general settings, scan
configuration, and advanced settings.
PowerScale Administration-SSP1
Scan Groups
You can create scan groups and add volumes with the same scan, TCO, and
minimum free space trigger to the group.
Settings in the scan group have precedence to the settings on the volume.
Editing Volumes
If the volume belongs to a scan group and the scan group settings no longer apply,
you can remove the volume from the scan group and edit the volume settings.
PowerScale Administration-SSP1
The Other settings panel lists the settings for each configuration file. Select each
page for an overview and use case for the configurations.
228For example, a class that is called Video and a class that is called Image are
configured. The IT manager requests a report on the cost of video-type files and
the cost of image-type files. You can use the DataIQ Analyze feature to view the
storage consumption and cost of each class.
PowerScale Administration-SSP1
Clientmap Configuration
Format
Example mappings
Use the clientmap file to map virtual DataIQ paths to valid paths on a client.
Convert229 from virtual to client and from client path to virtual path.
229Conversion from virtual paths to client paths occurs when copying paths to the
system clipboard. Conversion from client paths to DataIQ virtual paths occurs when
a client path is entered into a field such as a search field.
PowerScale Administration-SSP1
Data Management
Format
Option definition
230Modifying settings can impact DataIQ functionality. The defaults are typically
used. The file has a description of each setting.
PowerScale Administration-SSP1
Viewfilter Configuration
Example filters
The Viewfilter configuration file231 enables you to create rules to restrict groups
from viewing folders.
231 Viewfilter uses regular expressions (RE). If a volume or folder matches the RE
for the user's group, then that volume and folder are viewable for the user. If a user
is a member of more than one group, the user is only restricted from folders that
are restricted in all their groups.
PowerScale Administration-SSP1
Autotagging Configuration
Use auto-tagging232 to tag and track items. A use case is applying a tag to project
paths for use when determining a work order for a customer.
Use the left and right arrows to view the Data Management pages.
PowerScale Administration-SSP1
Browse
Configure limits and actions on the selected item.
The main functions of the Browse page are searching233, a panel that shows the
volumes in a tree view, a directory breakdown panel, a table that shows the files
within the selected folder, and an article details panel.
Flagged items in the table are reflected in the other data management
components.
233The search bar uses characters similar to Java regular expression (regex) such
as ^ for the beginning of filenames and $ for the ending of filenames.
PowerScale Administration-SSP1
Browse Details
234However, if data changes, updated files may not appear in file searches. Go to
the Actions panel and perform a scan on a volume or path to make sure you are
getting the latest information.
PowerScale Administration-SSP1
Analyze
The Analyze page235 allows you to analyze volumes from a business context.
Flagged items
The Flagged items page lists the items the user marks as flagged.
235
The page enables you to view multi-dimensional project oriented data by cost
and size.
PowerScale Administration-SSP1
Tag management
Business rules configuration, also called auto-tagging, is used to tag tracked items
during a scan.
The Tag management page shows the results of scan when auto-tagging is
configured.
Jobs
The Jobs page shows a table of the jobs and their status as well as a details panel
for the selected job.
PowerScale Administration-SSP1
Logs - Scan
The Logs page has two tabs, the Scan logs and the Error logs. The Scan logs table
show the generated logs from completed scan jobs.
Logs - Error
236 A full scan is done the first time a storage file system is indexed. DataIQ walks
the entire file system, indexing every folder. This initial baseline scan ensures that
everything about the file system is known.
237An optimized scan is an incremental scan that only scans the folders where
there have been changes since the last full scan.
PowerScale Administration-SSP1
Auto-Tagging Example
The installer does not create the autotagging configuration file, but you can use the
sample file /usr/local/dataiq/etc/autotag.cfg.sample as a starting
point. Auto-tagging generally occurs when DataIQ scans a file system.
First make a copy of the existing Autotagging configuration file as a backup. The
graphic shows the location of the Autotagging configuration file on the Settings,
Data management configuration page.
PowerScale Administration-SSP1
2. Reference Path
Enter the path examples on their own line, preceded by comment (#).
3. Auto-Tagging Rule
PowerScale Administration-SSP1
Enter the corresponding rule below each reference path. Having the commented
path makes it easier to understand the rule later and provides a reference for other
administrators.
Tags are automatically removed if the rule that created it no longer matches and
the tag has not been altered.
4. Simulate
Once the auto-tagging rules are configured, Simulate and report, and then view
the results. The results panel lists each rule and the number of times it matched. If
the results look reasonable, Save and run the new rules.
The Simulate and report will indicate rules that are invalid.
PowerScale Administration-SSP1
5. Analyze
Go to the Data Management page and watch the auto-tab job details to see when
it completes. View the counts in the details window. Go to the Analyze page to
verify the generated tag sets and view the report.
PowerScale Administration-SSP1
• Put the RE from an existing rule or rule fragment in the Stanford Analyzer to
understand it (select Java). Modify the RE in the analyzer until it meets your
needs.
• Test in an RE tester (search for "Java regular expression tester"), and then
put into DataIQ and run in the simulator.
Plug-In Overview
Plugins provide functions such as data transfer and audited delete to enable
administrators to manage data resources across storage platforms such as
PowerScale and ECS.
PowerScale Administration-SSP1
The plug-ins DataIQ supports are listed. Click each plug-in for a brief description.
• Data Mover
• Audited Deletes
• Duplicate Finder
• Previewer
Plug-in Examples
The graphics show WebUI excerpts of the plug-ins that are installed on a DataIQ
instance.
PowerScale Administration-SSP1
Challenge
Lab Assignment: Go to the lab and add the PowerScale cluster to the
DataIQ application.
PowerScale Administration-SSP1
isi statistics
Scenario
The three main commands that enable you to view the cluster from the command
line are isi status, isi devices, and isi statistics.
isi statistics
The isi statistics command provides protocol, drive, hardware, and node
statistics238.
238Other services such as InsightIQ, the WebUI, and SNMP gather information
using the "isi statistics" command.
PowerScale Administration-SSP1
The output shows the operations by protocol. The example shows that NFS clients
are connected to node 6 with a 278.5k bytes per second input rate.
The general cluster statistics output is a top-style display, where data is
continuously overwritten in a single table.
isi devices
The isi devices command displays information about devices in the cluster and
changes their status. There are multiple actions available including adding drives
and nodes to the cluster. Use the isi devices command for drive states,
hardware condition, node management, and drive replacement management.
isi status
The isi status command displays information about the current status of the
cluster, alerts, and jobs. The example of the isi status output gives a general node
status, performance metrics, critical alerts, and Job Engine status.
PowerScale Administration-SSP1
The --quiet option omits the alerts and Job Engine status output.
Tip: See the CLI Reference guide for a complete list of the
command options and output definitions.
The isi statistics command dumps all collected statistics, and you can run the
"query" subcommand on a specific statistic.
• You can build a custom isi statistics query that is not in the provided
subcommands (see the sketch after this list)
• Cluster and node statistics from kernel counters
• isi_stats_d
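As a minimal sketch of a custom query, the commands below list the available statistics keys and then query two of them; the key names shown are assumptions for illustration, so list the keys on your cluster first and substitute real ones:

    isi statistics list keys | grep cpu
    isi statistics query current --keys=node.cpu.user.avg,node.cpu.sys.avg --nodes=all

The query current form returns the latest sample for each key; query history, where supported, returns a time series instead.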
PowerScale Administration-SSP1
The isi statistics command within a cron job239 gathers raw statistics over a
specified time period.
239 A cron job can run on UNIX-based systems to schedule periodic jobs.
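For example, a crontab entry similar to the following captures protocol statistics every five minutes for later analysis; the output path and interval are illustrative assumptions, and percent signs must be escaped inside a crontab entry:

    # Append protocol statistics to a dated file under /ifs every 5 minutes
    */5 * * * * isi statistics protocol >> /ifs/data/stats/protocol_$(date +\%Y\%m\%d).log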
PowerScale Administration-SSP1
The example output shows the isi statistics drive command for the SSD
drives on node 6.
The next example shows isi statistics heat, which uses --long to include
more columns.
The head -10 option displays the 10 most accessed files and
directories.
The example node 6 output shows the Timestamp in Epoch timestamp format,
Ops as protocol operations, the Event type and Class (getattr is a namespace
read), and LIN for the file or directory associated with the event.
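The commands behind these examples look similar to the following sketch; the node number is taken from the output described above, and option spellings can be confirmed with isi statistics drive --help and isi statistics heat --help:

    isi statistics drive --nodes=6
    isi statistics heat --long | head -10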
PowerScale Administration-SSP1
Practical Skills
Combining large sets of collected data with log analysis can help identify long-term
trends and sources of trouble.
2: isi statistics can fill the gaps. Skillful use of isi statistics can
produce information equivalent to what InsightIQ offers, and the command has many
performance-related options.
PowerScale Administration-SSP1
Challenge
Lab Assignment: Now that you know which CLI commands are
available for monitoring, go to the lab and run the isi statistics
command.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 389
Appendix
Course Materials
• Participant Guide
• Instructor laptop
• Projector and Speakers
• Internet access
• Whiteboard and markers
PowerScale Administration-SSP1
Course Agenda
Lunch
Depending on course pace and student knowledge, the module and lab exercise
schedule may be altered.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 391
Appendix
Introductions
• Name
• Company
• Job Role
• Experience
• Expectations
PowerScale Administration-SSP1
DNS Primer
1: A FQDN is the DNS name of an object in the DNS hierarchy. A DNS resolver
query must resolve an FQDN to its IP address so that a connection can be made
across the network or the Internet. If a computer cannot resolve a name or FQDN
to an IP address, the computer cannot make a connection, establish a session or
exchange information. An example of an FQDN looks like sales.isilon.xattire.com.
2: A single period (.) represents the root domain, and is the top level of the DNS
architecture.
3: Below the root domain are the top-level domains. Top-level domains represent
companies, educational facilities, nonprofits, and country codes such as *.com,
*.edu, *.org, *.us, *.uk, *.ca, and so on. A name registration authority manages the
top-level domains.
4: The secondary domain represents the unique name of the company or entity,
such as EMC, Isilon, Harvard, MIT.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 393
Appendix
5: The last record in the tree is the host record, which indicates an individual
computer or server.
PowerScale Administration-SSP1
What is an A record?240
For example, a server that is named centos would have an A record that maps
the hostname centos to the IP address assigned to it: centos.dees.lab A
192.168.3.3, where centos is the hostname, dees.lab is the domain name, and
centos.dees.lab is the FQDN.
240
An A-record maps the hostname to a specific IP address to which the user
would be sent for each domain or subdomain. It is simple name-to-IP resolution.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 395
Appendix
Name server records, or NS records, indicate which name servers are
authoritative for the zone or domain.
241Companies that want to divide their domain into subdomains use NS records.
Subdomains indicate a delegation of a portion of the domain name to a different
group of name servers. You create NS records to point the name of this delegated
subdomain to different name servers.
PowerScale Administration-SSP1
You must create an address (A) record in DNS for the SmartConnect service IP.
Delegating to an A record means that if you fail over the entire cluster, you can do
so by changing one DNS A record. All other name server delegations can be left
alone. In many enterprises, it is easier to update an A record than a name server
record, because of the perceived complexity of the process.
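As a hedged illustration of this pattern, the BIND-style records below create an A record for the SmartConnect service IP and delegate the SmartConnect zone to it; the hostnames and IP address are hypothetical:

    ; A record for the SmartConnect service IP (SIP)
    sip.dees.lab.      IN  A   192.168.3.100
    ; Delegate the SmartConnect zone to the SIP
    isilon.dees.lab.   IN  NS  sip.dees.lab.

Failing over the cluster then only requires pointing the sip.dees.lab A record at a different service IP; the NS delegation does not change.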
Delegation recommendation.242
242 The recommendation is to create one delegation for each SmartConnect zone
name or for each SmartConnect zone alias on a cluster. This method permits
failover of only a portion of the workflow—one SmartConnect zone—without
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 397
Appendix
affecting any other zones. This method is useful for scenarios such as testing
disaster recovery failover and moving workflows between data centers.
PowerScale Administration-SSP1
The graphic shows how SmartConnect uses the X-Attire DNS server to provide a
layer of intelligence within the OneFS software application.
3: All clients are configured to make requests from the resident DNS server using a
single DNS hostname. Because all clients reference a single hostname,
isilon.xattire.com, it simplifies the management for large numbers of clients.
4: The resident DNS server forwards the delegated zone lookup request to the
delegated zone server of authority, here the SIP address of the cluster.
6: SmartConnect then returns this information to the DNS server, which, in turn,
returns it to the client.
7: The client then connects to the appropriate cluster node using the desired
protocol.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 399
Appendix
NFS Connectivity
NFS relies upon remote procedure call (RPC) for client authentication and port
mapping. RPC is the NFS method that is used for communication between a client
and server over a network. RPC is on Layer 5 of the OSI model. Because RPC
deals with the authentication functions, it serves as gatekeeper to the cluster.
NFS connectivity
PowerScale Administration-SSP1
Let us look at the flow of a request by a client. When an RPC service starts up on
the cluster, it registers with the portmapper. The service tells the portmapper what port
number it is listening on and what RPC program numbers it is prepared to serve.
244 When the server receives the CALL, it performs the service that is requested
and sends back the REPLY to the client. During a CALL and REPLY, RPC looks for
client credentials, that is, identity and permissions.
245
If the server is not running a compatible version of the RPC protocol, it sends an
RPC_MISMATCH. If the server rejects the identity of the caller, it sends an
AUTH_ERROR.
246It acts as a gatekeeper by mapping RPC ports to IP ports on the cluster so that
the right service is offered.
247Clients calling for an RPC service need two pieces of information: the number of
the RPC program they want to call and the IP port number.
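For example, querying the portmapper on a cluster node from a client shows the registered RPC programs and the ports they map to; the IP address is hypothetical and the output is abbreviated:

    rpcinfo -p 192.168.3.11
    #   program vers proto   port  service
    #    100000    4   tcp    111  portmapper
    #    100003    3   tcp   2049  nfs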
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 401
Appendix
HDFS Topic
• Data Lakes and Analytics
• HDFS Overview Video
• OneFS with Hadoop
• OneFS vs. Hadoop
• HDFS Administration
• Best Practices Resources
• Troubleshooting Resources
PowerScale Administration-SSP1
Swift Topic
• File and Object Storage Differences
• Accounts, Containers, and Objects
• Configuring Isilon Swift Accounts
• Storage URL
• Isilon Swift Considerations and Limitations
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 403
Appendix
When a node boots, it first checks its own vault resources before querying its
paired node. This way if the node can recover its journal from its own resources,
there is no need to query the paired node. But, if the journal is bad, the node can
identify the journal condition from its node state block data, and recovery should be
possible. There is a consequence to the nodes running in pairs. If a node runs
unpaired, it is under-protected.
PowerScale Administration-SSP1
Concurrency Examples
The process of striping spreads all write operations from a client248 across the
nodes of a cluster. Each tab illustrates a file that is broken down into chunks, after
which it is striped across disks249 in the cluster along with the FEC.
Concurrency, N+1n: the graphic shows a 256 KB file divided into two 128 KB
chunks plus one 128 KB FEC unit.
248 A client is connected to only one node at a time. However, when that client
requests a file from the cluster, the client-connected node does not have the entire
file locally on its drives. The client-connected node retrieves and rebuilds the file
using the back-end network.
249 Even though a client is connected to only one node, when that client saves data
to the cluster, the write operation occurs in multiple nodes. The scheme is true for
read operations also.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 405
Appendix
All files 128 KB or less are mirrored. For a protection strategy of N+1, the 128 KB
file has two instances: the original data and one mirrored copy.
Concurrency, N+1n: the graphic shows a 128 KB file stored as the original 128 KB
data block plus one 128 KB mirrored (FEC) copy.
The example shows a file that is not evenly distributed in 128 KB chunks. Blocks in
the chunk that are not used are free for use in the next stripe unit. Unused blocks in
a chunk are not wasted. The graphic shows a 128 KB chunk with only 64 KB used.
The example shows +2d:1n protection of a 1 MB file. The file is divided into eight
data stripe units and three FEC units. The data is laid out in two stripes over two
drives per node to achieve the protection.
PowerScale Administration-SSP1
Concurrency, N+2d:1n: the graphic shows a 1 MB file divided into 8 x 128 KB
chunks plus FEC, with blocks within the same stripe (stripe 1) written to separate
drives on each node.
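As a quick check of the arithmetic in this example: 1 MB of file data divided into 128 KB stripe units yields 8 data units (1024 KB / 128 KB = 8). N+2d:1n adds 3 FEC units, so the file occupies 11 stripe units in total, and the protection overhead for this file is 3/11, or roughly 27 percent.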
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 407
Appendix
A Data Lake is a central data repository that enables organizations to access and
manipulate the data using various clients and protocols. The flexibility keeps IT
from managing and maintaining a separate storage solution (silo) for each type of
data such as SMB, NFS, Hadoop, SQL, and others.
Click the i buttons in the graphic for information about ingest and OneFS storage.
2: Utilizing Isilon to hold the Hadoop data gives you all of the protection benefits of
the OneFS operating system. You can select any of the data protection levels that
OneFS offers, giving you both disk and node fault tolerance.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 409
Appendix
URL:
https://edutube.emc.com/Player.aspx?vno=wZCty171ec2RjiMSRZZe9g==&autopla
y=true
Shown is an Isilon cluster with twelve nodes. A key benefit of CloudPools is the
ability to interact with multiple cloud vendors. Shown in the graphic are the
platforms and vendors that are supported as of OneFS 8.1.1.
PowerScale Administration-SSP1
Let us look at an example: each chassis in the cluster represents a tier of storage.
The topmost chassis is targeted for the production high-performance workflow and
may have nodes such as F800s. When data is no longer in high demand,
SmartPools moves the data to the second tier of storage. The example shows a
policy that moves data that is not accessed and that is over thirty days old. Data on
the middle tier may be accessed periodically. When files are no longer accessed for
more than 90 days, SmartPools archives the files to the lowest chassis, or tier, such
as A200 nodes.
The next policy moves the archive data off the cluster and into the cloud when data
is not accessed for more than 180 days. Stub files, which are also called SmartLinks,
are created. Stub files consume approximately 8 KB of space on the Isilon cluster.
Files that are accessed or retrieved from the cloud, or files that are not fully moved
to the cloud, have parts that are cached on the cluster and are part of the stub file.
The storing of CloudPools data and user access to data that is stored in the cloud
is transparent to users.
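A tiering policy like the 30-day example above could be expressed from the CLI roughly as shown below; the policy name, tier name, and exact option spellings are assumptions for illustration, so verify them against isi filepool policies create --help:

    isi filepool policies create Move30Day \
        --begin-filter --accessed-time=30D --operator=gt --end-filter \
        --data-storage-target=Archive_Tier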
CloudPools files undergo a compression algorithm and then are broken into 2 MB
cloud data objects, or CDOs, for storage. The CDOs conserve space on the cloud
storage resources. Internal performance testing notes a performance penalty for
applying compression and for decompressing files on read. Encryption is applied to
file data transmitted to the cloud service. Each 128 KB file block is encrypted using
AES-256 encryption and is then transmitted as an object to the cloud. Internal
performance testing notes a small performance penalty for encrypting the data
stream.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 411
Appendix
To recap the overview, all production data resides on PowerScale. This removes
the task of exporting it from your production applications and importing it as with a
traditional Hadoop environment. The MapReduce continues to run on dedicated
Hadoop compute nodes. PowerScale requires this Hadoop front end to do the data
analysis. PowerScale holds the data so that Hadoop, applications, or clients can
manipulate it.
PowerScale Administration-SSP1
250 Hadoop requires a landing zone to stage data before using tools to ingest data
to the Hadoop cluster. PowerScale enables cluster data analysis by Hadoop.
Consider the time that it takes to push 100 TB across the WAN and wait for it to
migrate before any analysis can start. PowerScale does in-place analytics, so no
data moves around the network.
251Hadoop assumes that all members of the domain are trusted. PowerScale
supports integrating with AD or LDAP, and gives you the ability to safely segment
access.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 413
Appendix
252Each physical HDFS cluster can only support one distribution of Hadoop.
PowerScale can co-mingle physical and virtual versions of any Apache standards-
based distributions.
253Hadoop pairs the storage with the compute, so adding more space may require
you to pay for more CPU that may go unused. If you need more compute, you end
up with a lot of overhead space. With PowerScale you scale compute as needed or
storage as needed, aligning your costs with your requirements.
PowerScale Administration-SSP1
HDFS Administration
The graphic shows the WebUI Protocols, Hadoop (HDFS), Settings page, and
the corresponding isi hdfs settings command output.
1: The Default block size determines how the HDFS service returns data upon
read requests from a Hadoop compute client. The server-side block size determines
how the OneFS HDFS daemon returns data to read requests. Leave the default
block size at 128 MB. If the customer runs an older version of HDFS, consider a 64
MB block size. If the block size is set too high, many read/write errors and
performance problems occur. Tune on setup.
2: Default checksum type is used for old HDFS workflows. Because OneFS uses
forward error correction, checksums for every transaction are not used, as it can
cause a performance issue.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 415
Appendix
5: Odp version - on updates, the Hortonworks version must match the version that
is seen in Ambari. A version conflict is common when a customer upgrades
Hortonworks and can cause jobs not to run. Installation also fails when the Odp
version does not match.
6: Proxy users for secure impersonation can be created on the Proxy users tab.
For example, create an Apache Oozie proxy user to securely impersonate a user
called HadoopAdmin. Enable the Oozie user to request that the HadoopAdmin user
perform Hadoop jobs. Apache Oozie is an application that can automatically
schedule, manage, and run Hadoop jobs.
7: On the Virtual racks tab, nodes can be preferred along with an associated
group of Hadoop compute clients to optimize access to HDFS data.
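For reference, the settings described above can be viewed and changed per access zone from the CLI; the zone name is hypothetical and option spellings should be confirmed with isi hdfs settings modify --help:

    isi hdfs settings view --zone=System
    isi hdfs settings modify --default-block-size=128MB --zone=System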
PowerScale Administration-SSP1
• Visit the Using Hadoop with Isilon - Isilon Info Hub web page for documentation.
• Use the Isilon Hadoop tools to create users and groups in the local provider.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 417
Appendix
Troubleshooting Resources
There are several guides that are dedicated to troubleshooting an HDFS solution.
PowerScale Administration-SSP1
Object storage combines the data with richly populated metadata to enable
searching for information by file content. Instead of a file that tells you the creation or
modified date, file type, and owner, you can have metadata that tells you the
project name, formula results, personnel assigned, location of test, and next run
date. The rich metadata of an object store enables applications to run analytics
against the data.
Object storage has a flat hierarchy and stores its data within containers as
individual objects. An object storage platform can store billions of objects within its
containers, and you can access each object with a URL. The URL associated with
a file enables the file to be located within the container. Hence, the path to the
physical location of the file on the disk is not required. Object storage is well suited
for workflows with static file data or cloud storage.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 419
Appendix
The graphic contrasts basic file metadata with the rich metadata of an object store.
The basic metadata lists entries such as File Name: Formula 5Xa, File Type: .doc,
Created by: M.Smith, and Created on: 9/9/14. The rich metadata adds entries such
as Object ID: 98765, Level, Test date, Lab facility: Atlanta, Patient trial, Building: 7,
Patent, Lead Scientist: M. Smith, Approval ID, Description, and Risk Assessment.
PowerScale Administration-SSP1
The graphic shows the Swift hierarchy: an account is the administrative control
point, containers contain user data, and individual objects (Object1 through
Object4) are stored within a container such as Container1.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 421
Appendix
Administrators must provision the accounts before users can use the service. The
general steps are: enable the Swift license, decide upon file system user or group
ownership, create accounts using the isi swift command, and then assign
users access to the account. Make any necessary file system permission changes if
you are relocating data into the account.
The example shows creating a Swift account in the sales access zone and using
an Active Directory user and group. The isi swift accounts list shows the
accounts that are created in the access zone. The isi swift accounts view
shows the account details.
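The commands behind this example follow the pattern sketched below; the account name, user, and option spellings are assumptions for illustration and should be checked against isi swift accounts create --help:

    isi swift accounts create sales-acct --zone=sales --users=DEES\\hruser
    isi swift accounts list --zone=sales
    isi swift accounts view sales-acct --zone=sales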
PowerScale Administration-SSP1
Storage URL
Shown is what a Swift Storage URL looks like. URIs identify objects in the form
http://<cluster>/v1/account/container/object. In the example shown,
192.168.0.1 identifies the cluster. HTTP requests are sent to an internal web
service listening on port 28080. This port is not configurable. HTTPS requests are
proxied through the Apache web server listening on port 8083. This port is not
configurable. OpenStack defines the protocol version /v1. The reseller prefix is
/AUTH_bob, where /AUTH is a vestige of the OpenStack implementation's internal
details and the _bob portion of the URL is the account name used. The container /c1
is the container in which an object is stored, and the object /obj1 is the object.
The graphic labels each part of the URL: the cluster, the web service listening port,
the protocol version, the reseller prefix, the account, the container, and the object.
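To illustrate how a client uses this URL, the request below follows the standard OpenStack Swift pattern of passing an authentication token in the X-Auth-Token header; the token and object data are hypothetical:

    # Upload obj1 into container c1 under the AUTH_bob account over HTTP port 28080
    curl -X PUT -H "X-Auth-Token: <token>" \
        --data-binary @obj1.dat \
        http://192.168.0.1:28080/v1/AUTH_bob/c1/obj1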
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 423
Appendix
Pre-OneFS 8.0 Swift accounts are deactivated when upgrading to OneFS 8.0 and
later. After the upgrade, Swift no longer uses home directories for accounts. The
upgrade plan should determine which users are using Swift. Create new accounts
under the new Swift path, and then move the data from the old accounts into the
newly provisioned accounts. Swift is not compatible with the auditing feature.
PowerScale Administration-SSP1
Cache - L2
The storage-side or node-side buffer. L2 cache buffers write transactions to be
written to disk and prefetches anticipated blocks for read requests, sometimes called
read-ahead caching. For write transactions, L2 cache works with the journaling process to
ensure protected committed writes. As L2 cache becomes full, it flushes according
to the age of the data. L2 flushes the least recently used, or LRU, data.
Chimer Nodes
By default, if the cluster has more than three nodes, three of the nodes are
selected as chimers. If the cluster has four nodes or fewer, only one node is selected
as a chimer. If no external NTP server is set, nodes use the local clock. Chimer
nodes are selected by the lowest node number that is not excluded from chimer
duty.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 425
DataIQ Previewer Plug-in
The Preview plug-in shows a preview image of the file in the WebUI for common
file types. The supported graphic file extensions are: ".jpg", ".jpeg", ".tiff", ".tif",
".dpx", ".bmp", ".png", ".gif", ".tga", ".targa", ".exr", ".pcx", ".pict", ".ico". The
supported video file extensions are: ".mov", ".mp4", ".mpeg", ".mpg", ".ts", ".avi",
".mkv", ".wmf", ".wmv", ".mxf", ".ogv". The plug-in does not work with object stores
such as S3, GCP, or ECS.
File Provider
A file provider enables you to supply an authoritative third-party source of user and
group information to a cluster. A third-party source is useful in UNIX and Linux
environments that synchronize /etc/passwd, /etc/group, and /etc/netgroup
files across multiple servers.
Generation 6 Hardware
The Gen 6 platforms reduce the data center rack footprints with support for four
nodes in a single 4U chassis. It enables enterprises to take on new and more
demanding unstructured data applications. The Gen 6 can store, manage, and
protect massively large datasets with ease. With the Gen 6, enterprises can gain
new levels of efficiency and achieve faster business outcomes.
PowerScale Administration-SSP1
Groupnet
The groupnet is a top-level networking container that manages hostname resolution
against DNS nameservers and contains subnets and IP address pools. Every
subnet is assigned to a single groupnet. Each cluster has a default groupnet
named groupnet0. Groupnet0 contains an initial subnet, subnet0, an initial IP
address pool, pool0, and an initial provisioning rule, rule0. Groupnets are how the
cluster communicates with the world. DNS client settings, such as name servers
and a DNS search list, are properties of the groupnet. If the cluster communicates
to another authentication domain, it must find that domain. To find another
authentication domain, you need a DNS setting to route to that domain. With
OneFS 8.0 and later releases, groupnets can contain individual DNS settings,
whereas prior OneFS versions had a single global entry.
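As a brief sketch, the DNS client settings of a groupnet can be adjusted from the CLI roughly as follows; the name server address and search domain are hypothetical:

    isi network groupnets modify groupnet0 \
        --dns-servers=192.168.3.2 --dns-search=dees.lab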
Hadoop
Hadoop is designed to scale up from a single server to thousands of servers.
Hadoop clusters dynamically scale up and down based on the available resources
and the required services levels. Performance varies widely for processing, and
queries can take a few minutes to multiple days depending on how many nodes
and the amount of data requested.
Home Directory
Home directory provisioning creates a single home share that redirects users to
their SMB home directories. If one does not exist, a directory is automatically
created.
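A minimal sketch of this provisioning, assuming an /ifs/home base path and OneFS expansion variables; confirm the option names with isi smb shares create --help:

    isi smb shares create HOME --path=/ifs/home/%U \
        --allow-variable-expansion=yes --auto-create-directory=yes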
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 427
File system reports include data about the files that are stored on a cluster. The
reports are useful if, for example, you want to identify the types of data being stored
and where that data is stored. Before applying a file system report, enable InsightIQ
File System Analytics for that cluster.
isi get
The isi get command, without any options, displays the protection settings on an
entire directory path or, as shown, a specific file. The POLICY, or requested protection
policy, the LEVEL, or actual protection, and the PERFORMANCE, or data access pattern,
are displayed for each file. Using the command with a directory path displays the
properties for every file and subdirectory under the specified directory path. Output can show files
where protection is set manually. Mirrored file protection is represented as 2x to 8x
in the output.
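For example, running the command against a single file (the path is hypothetical) returns one line for the file; the column values shown in the comment are illustrative:

    isi get /ifs/data/media/file1.mp4
    # POLICY    LEVEL    PERFORMANCE    COAL  FILE
    # default   6+2/2    concurrency    on    file1.mp4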
Job - Schedule
With the Schedule options, you can start the job manually or set it to run on a
regularly scheduled basis.
PowerScale Administration-SSP1
Layers of Access
• Protocol Layer - The first layer is the protocol layer. Protocols may be Server
Message Block, or SMB, Network File System, or NFS, File Transfer Protocol,
or FTP, or some other protocol.
• Authentication Layer - The authentication layer identifies a user using a system
such as NIS, local files, or Active Directory.
• Identity Assignment Layer - The third layer is identity assignment. This layer is
straightforward and based on the results of the authentication layer, but there
are some cases that need identity mediation within the cluster, or where roles
are assigned within the cluster that are based on user identity.
• Authorization Layer - Finally, based on the established connection and
authenticated user identity, the file and directory permissions are evaluated. The
evaluation determines whether the user is entitled to perform the requested data
activities.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 429
Leaf-Spine
Leaf-Spine is a two-level hierarchy in which nodes connect to leaf switches, and leaf
switches connect to spine switches. Leaf switches do not connect to one another,
and spine switches do not connect to one another. Each leaf switch connects with
each spine switch and all leaf switches have the same number of uplinks to the
spine switches.
Local Provider
Local authentication is useful when Active Directory, LDAP, or NIS directory
services are not configured or when a specific user or application needs access to
the cluster. Local groups can include built-in groups and Active Directory groups as
members.
MTTDL
MTTDL is a statistical calculation that estimates the likelihood of a hardware failure
resulting in data loss. MTTDL is a system view of reliability and asks the question
“What happens when hardware does fail, and will I lose any data when it does?”
NAS
NAS is an IP-based, dedicated, high-performance file sharing and storage device.
NFS
Network File System, or NFS, is an open standard that UNIX clients use. The NFS
protocol enables a client computer to access files over a network. NFS clients
mount the OneFS export that is accessible under a client mountpoint. The
mountpoint is the directory that displays files from the server. The NFS service
enables you to create as many NFS exports as needed.
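For example, a Linux client could mount an export under a local mountpoint as follows; the SmartConnect zone name and export path are hypothetical:

    sudo mount -t nfs isilon.dees.lab:/ifs/data/projects /mnt/projects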
OneFS CLI
The command-line interface runs "isi" commands to configure, monitor, and
manage the cluster. Access to the command-line interface is through a secure shell
(SSH) connection to any node in the cluster.
PowerScale Administration-SSP1
PaaS
PaaS combined with approaches like continuous integration and deployment can
measure application development cycles in days and weeks rather than months
or years. The combinations can dramatically reduce the time it takes from having
an idea to identifying insight, to action, and creating value.
PAPI
The PAPI is divided into two functional areas: one area enables cluster
configuration, management, and monitoring functionality, and the other area
enables operations on files and directories on the cluster. A chief benefit of PAPI is
its scripting simplicity, enabling customers to automate their storage administration.
PowerScale A200
The A200 is an ideal active archive storage solution that combines near-primary
accessibility, value and ease of use.
PowerScale A2000
The A2000 is an ideal solution for high density, deep archive storage that
safeguards data efficiently for long-term retention.
PowerScale F200
Ideal as a low-cost, all-flash node pool for existing Gen6 clusters. Also ideal for
small, remote clusters.
PowerScale F600
Ideal for small, remote clusters with exceptional system performance for small
office and remote office technical workloads.
PowerScale F800
Use the F800 for workflows that require extreme performance and efficiency.
PowerScale F810
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 431
Use the F810 for workflows that require extreme performance and efficiency. The
F810 also provides high-speed inline data deduplication and in-line data
compression. It delivers up to 3:1 efficiency, depending on your specific dataset
and workload.
PowerScale H400
The H400 provides a balance of performance, capacity and value to support a wide
range of file workloads. It delivers up to 3 GB/s bandwidth per chassis and provides
capacity options ranging from 120 TB to 720 TB per chassis.
PowerScale H500
The H500 is a versatile hybrid platform that delivers up to 5 GB/s bandwidth per
chassis with a capacity ranging from 120 TB to 720 TB per chassis. It is an ideal
choice for organizations looking to consolidate and support a broad range of file
workloads on a single platform.
PowerScale H5600
The H5600 combines massive scalability (960 TB per chassis and up to 8 GB/s
bandwidth) in an efficient, highly dense, deep 4U chassis. The H5600 delivers inline
data compression and deduplication. It is designed to support a wide range of
demanding, large-scale file applications and workloads.
PowerScale H600
The H600 is designed to provide high performance at value, delivering up to 120,000
IOPS and up to 12 GB/s bandwidth per chassis. It is ideal for high performance
computing (HPC) workloads that don’t require the extreme performance of all-flash.
Quotas - Accounting
Accounting quotas monitor, but do not limit, disk storage. With accounting quotas,
you can review and analyze reports to help identify storage usage patterns.
Accounting quotas assist administrators to plan for capacity expansions and future
storage requirements. Accounting quotas can track the amount of disk space that
various users or groups use.
Quotas - Advisory
Advisory quotas do not deny writes to the disk, but they can trigger alerts and
notifications after the threshold is reached.
PowerScale Administration-SSP1
Quotas - Enforcement
Enforcement quotas include the functionality of accounting quotas and enable the
sending of notifications and the limiting of disk storage.
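A hedged example of creating an enforcement quota on a directory; the path and thresholds are hypothetical, and option spellings should be confirmed with isi quota quotas create --help:

    isi quota quotas create /ifs/data/sales directory \
        --advisory-threshold=80G --hard-threshold=100G --container=yes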
Reed-Solomon
OneFS uses the Reed-Solomon algorithm, which is an industry standard method to
create error-correcting codes, or ECC, at the file level.
Scale-out Solution
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 433
Not all clustered NAS solutions are the same. Some vendors overlay a
management interface across multiple independent NAS boxes. This gives a
unified management interface, but does not unify the file system. While this
approach does ease the management overhead of traditional NAS, it still does not
scale well.
With scale-out, a single component (node) of a system or cluster contains the
performance, compute, and capacity. As the need for capacity or compute power
increases, you add more nodes to the cluster. The node is not equivalent to a
scale-up controller as disk capacity is not added to a node. The cluster scales out
as you add nodes, making it a much more scalable solution than a scale-up
implementation.
Scale-up Solution
The two controllers can run active/active or active-passive. For more capacity, add
another disk array. Each of these components is added individually. As more
systems are added, NAS sprawl becomes an issue.
Scale-up Storage
Scale-up storage is the traditional architecture that is dominant in the enterprise
space. High performance, high availability single systems that have a fixed capacity
ceiling characterize scale-up.
Serial Console
The serial console is used for initial cluster configurations by establishing serial
access to the node designated as node 1.
SmartDedupe
OneFS deduplication saves a single instance of data when multiple identical
instances of that data exist, in effect, reducing storage consumption. Deduplication
can be done at various levels: duplicate files, duplicate blocks in files, or identical
extents of data within files. Stored data on the cluster is inspected, block by block,
and one copy of duplicate blocks is saved, thus reducing storage expenses by
reducing storage consumption. File records point to the shared blocks, but file
metadata is not deduplicated.
SmartLock Compliance
PowerScale Administration-SSP1
SmartLock WORM
SmartLock provides WORM (write-once/read-many) status on files. In a WORM
state, files can be read but not modified. "Committing" a file is changing a file from
a read/write state to a WORM state that has a retention expiration date. Files are
committed to a WORM state when using SmartLock.
SmartPools
SmartPools is a software module that enables administrators to define and control
file management policies within a cluster.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 435
A single tier has only one file pool policy that applies the same protection level and
I/O optimization settings to all files and folders in the cluster. The basic version of
SmartPools supports virtual hot spares, enabling space reservation in a node pool
for reprotection of data. OneFS implements SmartPools basic by default.
PowerScale Administration-SSP1
Snapshot Schedule
The most common method is to use schedules to generate the snapshots. A
snapshot schedule generates snapshots of a directory according to a schedule. A
benefit of scheduled snapshots is not having to manually create a snapshot every
time one is wanted. An expiration period should be assigned to the snapshots that are
generated, automating the deletion of snapshots after the expiration period.
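A minimal sketch of a daily schedule with a seven-day expiration; the schedule name, path, naming pattern, and option spellings are assumptions to verify against isi snapshot schedules create --help:

    isi snapshot schedules create ProdDaily /ifs/data/prod \
        ProdDaily_%Y-%m-%d "every day at 00:30" --duration=7D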
SnapshotIQ
OneFS snapshots are used to protect data against accidental deletion and
modification. Because snapshots are available locally, users can restore their data
without administrative intervention.
Stateless Connection
With a stateless connection, the session, or “state,” information is maintained on the
client side. If a node goes down, the IP address that the client is connected to fails
over to another node in the cluster. The client would not know that its original node
had failed.
WebUI
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 437
The browser-based OneFS web administration interface provides secure access
with OneFS-supported browsers. This interface is used to view robust graphical
monitoring displays and to perform cluster-management tasks.
Windows ACL
A Windows ACL is a list of access control entries, or ACEs. Each entry contains a
user or group and a permission that allows or denies access to a file or folder.
PowerScale Administration-SSP1
© Copyright
Internal Use - Confidential 2020 Dell Inc. Page 439