Isilon OneFS 8.1.2 Hortonworks Installation Guide
Abstract
This guide walks you through the process of installing PowerScale OneFS
with Hadoop for use with the Hortonworks Data Platform (HDP) 3.0.1 and
later, and the Apache Ambari manager 2.7.1 and later.
Copyright © 2020 Dell Inc. or its subsidiaries. All rights reserved.
Dell believes the information in this publication is accurate as of its publication date. The information
is subject to change without notice.
Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may
be the property of their respective owners.
EMC Corporation
Hopkinton, Massachusetts 01748-9103
1-508-435-1000 (in North America: 1-866-464-7381)
www.EMC.com
Audience
This guide is intended for systems administrators, IT program managers, IT architects, and IT managers who
are installing OneFS 8.1.2.0 or later with Ambari 2.7.1.0 and later and HDP 3.0.1.0 and later.
Overview
The PowerScale OneFS scale-out network-attached storage (NAS) platform provides Hadoop clients with direct
access to big data through a Hadoop Distributed File System (HDFS) interface. A PowerScale cluster that is
powered by the OneFS operating system delivers a scalable pool of storage with a global namespace.
Hadoop compute clients can access the data that is stored in a PowerScale OneFS cluster by connecting to
any node over the HDFS protocol. All nodes that are configured for HDFS provide NameNode and DataNode
functionality. Each node boosts performance and expands the cluster capacity. For Hadoop analytics, the
PowerScale scale-out distributed architecture minimizes bottlenecks, rapidly serves big data, and optimizes
performance for MapReduce jobs.
In a traditional Hadoop deployment, the Hadoop compute nodes run analytics jobs against large sets of data.
A NameNode directs the nodes to the data stored on a series of DataNodes. The NameNode is a separate
server that holds metadata for every file that is stored on the DataNode. Often data is stored in production
environments and then copied to a landing zone server before it is loaded on to HDFS. This process is network
intensive and exposes the NameNode as a potential single point of failure.
The Hortonworks distribution is stored on the compute cluster, and the clients connect to the PowerScale
OneFS cluster over the HDFS protocol to store and access Hadoop data.
Prerequisites
For supported versions, see Hadoop Distributions and Products Supported by OneFS.
• Hortonworks Data Platform (HDP) 3.0.1 or later with Ambari 2.7.1 or later
• Password-less SSH is configured
o See the Hortonworks documentation for configuring Password-less SSH.
• Familiarity with the Ambari and Hortonworks documentation and the installation instructions
o To view the Ambari and the Hortonworks Data Platform (HDP) documents, go to
http://docs.hortonworks.com/index.html
o Use the following table to record the components that you have installed.
Component            Version
Ambari
PowerScale OneFS
SmartConnect module
HDFS module
1 Preparing OneFS
2 Preparing Ambari
3 Configuring Ambari
Preparing OneFS
Complete the following steps to configure your OneFS cluster for use with Ambari and Hortonworks Data
Platform. Preparing OneFS requires you to configure DNS, SmartConnect, and Access Zones to allow for the
Hadoop cluster to connect to the OneFS cluster. If these preparation steps are not successful, the subsequent
configuration steps might fail.
Review the current Isilon OneFS and Hadoop Known Issues for any changes or updates to OneFS and Hadoop
configuration.
1. From a node in your OneFS cluster, confirm that the cluster is running OneFS 8.1.2 or later by typing the
following command:
isi version
3. Confirm that the HDFS license is operational. If this license is not active and valid, some commands in this
guide might not work.
4. If your modules are not licensed, obtain a license key from your Dell EMC PowerScale sales representative.
Type the following command to activate the license:
isi license add --path <license file path>
6. Install the latest rollup patches for your version of OneFS. See Current Isilon OneFS Patches for the latest
rollup patches and run the following:
isi upgrade patches list
isi upgrade patches install patch-<patch-ID>.pkg --rolling=false
Example:
isi upgrade patches install patch-240163.pkg --rolling=false
Use the following table to record the configuration information for the OneFS cluster with Hortonworks Ambari
integration:
Parameter Value
Ambari NameNode
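The steps that follow assume an access zone for Hadoop already exists. A minimal sketch of creating one on a OneFS node, using the zone name and path from the examples in this guide (verify the options against your OneFS CLI reference):

```shell
# Run on a node in the OneFS cluster. The zone name and path mirror the
# examples used throughout this guide; --create-path creates the base
# directory if it does not already exist.
isi zone zones create --name=zone1-hdp --path=/ifs/data/zone1/hdp --create-path
isi zone zones view zone1-hdp
```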
3. Create the HDFS root directory within the access zone that you created:
mkdir -p /ifs/data/zone1/hdp/hadoop
isi hdfs settings modify --zone=zone1-hdp --root-directory=/ifs/data/zone1/hdp/hadoop
For example:
isi network pools create --id=groupnet0:subnet0:hadoop-pool-hdp --ranges=10.120.130.30-10.120.140.40 --access-zone=zone1-hdp --alloc-method=static --ifaces=1-4:40gige-1 --sc-subnet=subnet0 --sc-dns-zone=hdp.zone1.emc.com --description="hadoop"
Set up DNS records for a SmartConnect zone. Create the required DNS records that are used to access your
OneFS cluster from the Hadoop cluster. All hosts in your Hadoop cluster must be configured for both forward
and reverse DNS lookups. Hadoop relies heavily on DNS and performs many DNS lookups during normal
operation.
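The forward and reverse lookup requirement can be checked with a short script; a minimal sketch using getent (the hostname below is a runnable placeholder; substitute your Hadoop hosts and the SmartConnect zone FQDN):

```shell
#!/bin/sh
# check_dns: verify forward (name -> IP) and reverse (IP -> name) DNS
# resolution for a single host. Prints OK on success.
check_dns() {
  h="$1"
  ip=$(getent hosts "$h" | awk '{print $1; exit}')
  if [ -z "$ip" ]; then
    echo "FWD-FAIL $h"
    return 1
  fi
  name=$(getent hosts "$ip" | awk '{print $2; exit}')
  if [ -z "$name" ]; then
    echo "REV-FAIL $h ($ip)"
    return 1
  fi
  echo "OK $h -> $ip -> $name"
}

# Placeholder list: replace with your Hadoop hosts and SmartConnect FQDN.
for h in localhost; do
  check_dns "$h"
done
```

Run it from each Hadoop node; any FWD-FAIL or REV-FAIL line indicates a DNS record that must be fixed before continuing.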
You can set up a SmartConnect zone for the connections from Hadoop compute clients. SmartConnect is a
module that specifies how the OneFS cluster handles connection requests from clients. For additional
information and best practices for SmartConnect, see the Isilon External Network Connectivity Guide.
Each SmartConnect zone represents a specific pool of IP addresses. When you associate a SmartConnect zone
with an access zone, OneFS allows only clients that connect through the IP addresses in the SmartConnect
zone to reach the HDFS data in the access zone. A root HDFS directory is specified for each access zone. This
configuration isolates data within access zones and allows you to restrict client access to the data.
A SmartConnect zone distributes NameNode requests from Hadoop compute clients across the node
interfaces in the IP pool. Each node's NameNode process replies with the IP address of any OneFS node where
the client can access the data. When a Hadoop compute client makes an initial DNS request to connect to the
SmartConnect zone FQDN, the Hadoop client requests are delegated to the SmartConnect Service IP, which
responds with a valid node to connect to. The client connects to a OneFS node that serves as a NameNode.
When a second Hadoop client makes a DNS request to connect to the SmartConnect zone, the SmartConnect
Service routes the client connection to a different node than the node that is used by the previous Hadoop
compute client.
When you create a SmartConnect zone, you must add a Name Server (NS) record as a delegated domain to the
authoritative DNS zone that contains the OneFS cluster.
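As an illustration, the delegation in a BIND-style zone file for the parent domain might look like the following; the SmartConnect zone name matches the earlier pool example, while the SSIP record name and IP address are assumptions, not values from this guide:

```
; In the authoritative zone for zone1.emc.com (illustrative values)
; The SmartConnect service IP (SSIP) answers DNS queries for the
; delegated SmartConnect zone.
hdp.zone1.emc.com.   IN NS   ssip.zone1.emc.com.
ssip.zone1.emc.com.  IN A    10.120.130.20
```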
When you validate the SmartConnect zone by pinging its FQDN several times, note that a different IP address
can be returned for each ping command. With each DNS response, the IP addresses are returned through
rotating round-robin DNS from the list of potential IP addresses. This validates that the SmartConnect zone
name FQDN is operating correctly.
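The round-robin behavior can also be observed with repeated lookups; a minimal sketch (getent answers may be served from a local cache, so dig or nslookup against the delegated zone shows the rotation more directly; localhost is used here only as a runnable stand-in for the SmartConnect FQDN):

```shell
#!/bin/sh
# resolve_repeat: resolve an FQDN several times and print the address
# returned each time. Against a SmartConnect zone FQDN, successive
# answers rotate round-robin through the IP pool.
resolve_repeat() {
  fqdn="$1"
  count="$2"
  i=1
  while [ "$i" -le "$count" ]; do
    getent hosts "$fqdn" | awk '{print $1; exit}'
    i=$((i + 1))
  done
}

# localhost is a runnable stand-in; for a real check use, for example:
#   resolve_repeat hdp.zone1.emc.com 4
resolve_repeat localhost 3
```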
Important
Dell EMC PowerScale recommends that you maintain consistent names and numeric IDs for all users and
groups on the OneFS cluster and your Hadoop clients. This consistency is important in multiprotocol
environments because the HDFS protocol refers to users and groups by name, and NFS refers to users and
groups by their numeric IDs (UIDs and GIDs). Maintaining this parity is critical in the behavior of OneFS
multiprotocol file access.
During installation, the Hadoop installer creates all the required system accounts. For example, a Hadoop
system account, yarn, is created with the UID 502 and the GID 500 on the Hadoop cluster nodes. Since
the Hadoop installer cannot create the local accounts directly on OneFS, you must create them manually.
Create the OneFS yarn local account user in the OneFS access zone in which yarn accesses data. Create a local
user yarn with the UID of 502 and the GID of 500 to ensure consistency of access and permissions.
For guidance and more information about maintaining parity between OneFS and Hadoop local users and
UIDs, see the following blog post: Isilon and Hadoop Local User UID Parity
There are many methods of achieving UID and GID parity. You can leverage Tools for Using Hadoop with
OneFS, perform manual matching, or create scripts that parse users and create the equivalent users. However
you choose to achieve this, the sequence depends on your deployment methodology and management
practices. It is highly recommended that you maintain consistency between the Hadoop cluster and OneFS, for
example, hdfs=hdfs, yarn=yarn, hbase=hbase, and so on, from a UID and GID consistency perspective.
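As a sketch of creating a matching local account on OneFS for the yarn example above (the UID and GID follow that example; verify the values against your own Hadoop nodes before running):

```shell
# Run on a node in the OneFS cluster. UID 502 and GID 500 mirror the
# yarn account example above; adjust to match your Hadoop nodes.
isi auth groups create yarn --gid=500 --zone=zone1-hdp
isi auth users create yarn --uid=502 --primary-group=yarn --zone=zone1-hdp
```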
Create users and directories on the OneFS cluster using Tools for Using Hadoop with OneFS
Go to Tools for Using Hadoop with OneFS to set up the users and directories on the cluster.
Warning
If you want the users and groups to be defined by your directory service, such as Active Directory or LDAP, do
not run these commands; the steps in this section create local users. This section addresses setting
permissions on the HDFS root files and the membership required to run jobs. If your users are defined by a
directory service, creating local users with these steps will likely cause failures when you run jobs with this
configuration.
3. Assign permissions to the user's home directory on the Hadoop cluster. The ID 2 in the example below is
the zone ID from the output of the isi zone zones view zone1 command that you ran previously.
isi_run -z2 chown hduser1:hduser1 /ifs/data/zone1/hdp/hadoop/user/hduser1
chmod 755 /ifs/data/zone1/hdp/hadoop/user/hduser1
On a node in the OneFS 8.1.2 cluster, create and configure the HDFS root directory.
2. Set the HDFS root directory for the access zone. Note: It is recommended that the directory for the access
zone is not set to the root of /ifs.
isi hdfs settings modify --zone=zone1-hdp --root-directory=/ifs/data/zone1/hdp/hadoop
3. Map the HDFS user to root. Create a user mapping rule to map the HDFS user to the OneFS root account.
This mapping enables the services from the Hadoop cluster to communicate with the OneFS cluster using
the correct credentials.
isi zone modify --add-user-mapping-rules="hdfs=>root[]" --zone=zone1-hdp
isi zone modify --add-user-mapping-rules="yarn-ats-hbase=>yarn-ats" --zone=zone1-hdp
Note: User mapping yarn-ats-hbase to yarn-ats is required only if HDP and OneFS clusters are going to be
secured (Kerberized).
You can skip the yarn-ats-hbase to yarn-ats user mapping in two cases, including the following:
b. You do not need to set the user mapping on OneFS if Timeline Service 2.0 is configured to use an external HBase.
For more details, see: https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.1/data-operating-system/content/dosg_timeline_service_2.0_installation.html
For example:
isi hdfs settings modify --zone=zone1-hdp --ambari-namenode=hdfs.hop-isi-m.solarch.lab.emc.com
For example:
isi hdfs settings modify --zone=zone1-hdp --ambari-server=amb-srv.hop-isi-m.solarch.lab.emc.com
6. Create an indicator file in the Hadoop directory to view your OneFS cluster and access zone through HDFS.
touch /ifs/data/zone1/hdp/hadoop/THIS_IS_ISILON-hdp.txt
7. Modify the ACL setting for OneFS 8.1.2 and earlier only.
Run the following command on a node in the OneFS cluster to modify ACL settings before you create
directories or files in the next section. This creates the correct permission behavior on the cluster for
HDFS.
Note
ACL policies are cluster-wide, so you should understand this change before performing it on production
clusters.
isi auth settings acls modify --group-owner-inheritance=parent
isi auth settings acls view
2. Set the HDFS root directory for the access zone. Note: It is recommended that the directory for the access
zone is not set to the root of /ifs.
isi hdfs settings modify --zone=zone1-hdp --root-directory=/ifs/data/zone1/hdp/hadoop
3. Assign the Ambari NameNode in the access zone and associate the SmartConnect name with it.
isi hdfs settings modify --zone=<zone> --ambari-namenode=<my-smartconnectzone-name>
For example:
isi hdfs settings modify --zone=zone1-hdp --ambari-namenode=hdfs.hop-isi-m.solarch.lab.emc.com
For example:
isi auth roles modify HdfsAccess --add-priv=ISI_PRIV_IFS_RESTORE --zone=zone1-hdp
For example:
isi auth roles modify HdfsAccess --add-priv=ISI_PRIV_IFS_BACKUP --zone=zone1-hdp
For example:
isi auth roles modify HdfsAccess --add-user=hdfs --zone=zone1-hdp
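The HdfsAccess role used in the preceding examples must exist before it can be modified; a minimal sketch of creating it (the description mirrors the role output shown in step 8):

```shell
# Run on a node in the OneFS cluster before the modify commands above.
isi auth roles create HdfsAccess --description="Bypass FS permissions" --zone=zone1-hdp
```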
8. Verify the role setup, backup/restore privileges, and HDFS user setup.
isi auth roles view <role_name> --zone=<access_zone>
For example:
isi auth roles view HdfsAccess --zone=zone1-hdp
Name: HdfsAccess
Description: Bypass FS permissions
Members: hdfs
Privileges
  ID: ISI_PRIV_IFS_BACKUP
  Read Only: True

  ID: ISI_PRIV_IFS_RESTORE
  Read Only: True
9. (Optional) Flush auth mapping and auth cache to make the HDFS user take immediate effect as the
“HdfsAccess” role that you created above.
isi_for_array "isi auth mapping flush --all"
isi_for_array "isi auth cache flush --all"
Note
ACL policies do not need to be modified for OneFS 8.2 and later. The HDFS protocol behaves the same as
non-OneFS HDFS with respect to file system group-owner inheritance.
The Isilon OneFS Ambari Management Pack is a software component that can be installed in Ambari
to define OneFS as a service in a Hadoop cluster. The management pack allows an Ambari
administrator to start, stop, and configure OneFS as an HDFS storage service. This provides native
NameNode and DataNode capabilities similar to traditional HDFS.
Important
Complete the steps in the Hortonworks guide in section 1, "Getting Ready", and section 4, "Installing Ambari."
After you start the Ambari server, do not continue to section 6 of the Hortonworks Guide until after you have
completed the instructions that are described in the Preparing OneFS section of this guide.
Complete the following steps that are described in the Hortonworks Guide:
1. Download the Ambari repository for the operating system that runs your installation host.
3. Download the Isilon OneFS Ambari Management Pack installation bundle from the product download
page and extract the contents on to the Ambari server.
4. Install the management pack on the Ambari server by running the following command:
ambari-server install-mpack --mpack=<tar file_name.tar.gz> --verbose
For example:
ambari-server install-mpack --mpack=isilon-onefs-mpack-1.0.0.0-SNAPSHOT.tar.gz --verbose
Note
The Isilon OneFS Ambari Management Pack includes a setting for Yarn that you may need to change. The
Yarn Timeline Service 2.0 relies on Apache HBase for backend storage. As PowerScale is a single storage
tier from Yarn’s perspective, the storage policy for HBase is set to NONE in the Yarn-HBase-site. If your
Yarn deployment uses an external HBase for Timeline Service 2.0, then the storage policy settings should
be changed to the HBase default, HOT, or whatever is appropriate for your environment.
Important: Do not continue to section 6 of the Hortonworks Guide until the OneFS cluster is prepared as
described in the following steps and is ready to be integrated into Ambari during the installation.
2. In the OneFS Service Settings section, specify your SmartConnect FQDN and any other HDFS configuration
settings that you want to change.
1. After starting the Ambari service, open Ambari Web using a web browser.
5. In the Name your cluster field, type a unique name for the cluster.
Note: As a best practice, Ambari cluster names should be fewer than seven characters. Longer cluster
names require additional configuration for multitenant AD installation due to OneFS-specific
requirements. Use abbreviated cluster names where possible to facilitate integration with OneFS. For
example: hdp1, h-prod1, or similar.
9. Under Host Registration Information section, attach the SSH private key.
The creation of this host and key was performed before running the Ambari wizard.
See the Hortonworks Ambari documentation for additional details.
Important
f. Copy the output to a file and save the file on your desktop.
g. Copy the file to the machine on which you are running the web-based Ambari Install Wizard.
13. Ensure that all the Ranger requirements are met before you click Proceed on the Ranger Requirements
screen.
a. On the CREDENTIALS tabbed page, specify the credential values for your deployment:
You are now configured to use a OneFS cluster with Hortonworks for Apache Hadoop.
HDFS policies that are defined in Ranger are checked before the native file access control is applied. This two-
layered authorization model differs in the way the standard Ranger HDFS policies are checked with Direct
Attached Storage (DAS), but the model is suitable for using OneFS as a multiprotocol data lake with Hadoop.
The OneFS native file system ACLs allow a storage administrator to correctly set up access control for multiple
workloads and with multiprotocol access to the HDFS dataset. A Ranger administrator can apply further
restrictive Hadoop user access controls to the same HDFS dataset, thus providing the administrators the
appropriate span of control within their management domains.
In a OneFS cluster with Hadoop deployment, Ranger authorization policies serve as a filter before applying the
native file access control.
Notes
• The Ranger Audit and Transparent Data Encryption components are not supported.
• When you enable Apache Ranger on a OneFS cluster, OneFS checks for new authorization policies,
receives HDFS requests from clients, and applies the authorization policies to the HDFS requests; the
result can be one of DENY, ALLOW, or UNDETERMINED.
• The Ranger DENY policy takes precedence over the ALLOW policy.
• The Ranger DENY policy prevents user or group access to files and directories in OneFS that the file
system would have otherwise allowed the users or groups to access.
To understand how the Ranger policies are applied, consider this example: a user in the Sales group requires
access to certain files and directories that have specific HDFS file system ownership and permissions, and a
single Ranger policy grants everyone, including the Sales group, access to the root directory.
1. On a per-access zone basis, perform the following steps using the OneFS Web administration interface or
the OneFS command-line administration interface to configure Ranger:
b. Specify the URL of the Apache Ranger Management console; use port 6182 for HTTPS or 6080 for
HTTP to get the policies.
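On OneFS releases that expose the Ranger plug-in from the command line, these settings can be applied as follows; the service name and Ranger hostname are assumptions, and the exact option names should be verified against your OneFS CLI reference:

```shell
# Run on a node in the OneFS cluster. Hostname, port, and service name
# are placeholders; match them to your Ranger deployment and to the
# service instance you create in Ranger.
isi hdfs ranger-plugin settings modify --zone=zone1-hdp --enabled=yes \
    --policy-manager-url=https://ranger.example.com:6182 \
    --service-name=onefs-ranger
isi hdfs ranger-plugin settings view --zone=zone1-hdp
```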
2. Ensure that a Ranger service account user is configured on OneFS within your access zone.
3. Install Ranger using the steps outlined in the Hortonworks Security Guide.
4. Enable the Apache Ranger HDFS plug-in using the steps outlined in the Hortonworks Security Guide.
5. If you have a Kerberos-enabled cluster, follow the instructions in the Hortonworks Security Guide to
enable the Ranger HDFS plug-in on the cluster.
6. Enable the Ranger Deny policy using the instructions in the Apache Ranger deny policies with OneFS
8.0.1.0 article.
Create a service instance for OneFS using the Create Service page in Ranger. See the Hortonworks Security
Guide for details. Specify values in the following fields to create a service instance for OneFS and make
note of the values:
• Specify a value in the Service Name field and make note of it because you must use the same
value in OneFS.
• Specify a username and password in the Config Properties section specific to the service instance. The
Test Connection option continues to fail until you have saved and reopened the service instance.
A service instance is created with a default policy all - path, granting access to all the files for
the user that you included in the Service Details page.
7. Add all your groups and individual users who are associated with an access zone within OneFS to the
default policy in order to grant access to the groups and users. If you create local users in OneFS, or use
Active Directory, you must change the UserSync settings in Ambari or add the users in the Ranger
interface.
Note
OneFS file system permissions take precedence even if the policy indicates that the user or group can
access everything.
8. Using the Edit Policy page in the Ranger interface, specify the group or users who have limited access to
the repository within OneFS and indicate the permissions that must be denied to that path.
9. Create a DENY policy in Ranger using the steps outlined in the Hortonworks Security Guide, if required.
After you have saved the policy, OneFS enforces the policy at the next download. If you attempt to take
action on a path that is denied access by the Ranger policy, this will be reported in the OneFS HDFS log at
/var/log/hdfs.log. For more information, see the Apache Ranger deny policies with OneFS 8.0.1.0
article.
Note: OneFS metrics for specific access zones that contain HDFS datasets are not supported.
1. Access Ambari Web by opening a supported browser and entering the Ambari Web URL.
2. Click Ambari Metrics Service > Metrics Collector to determine the hostname where Ambari Metrics
Collector has been installed.
3. From a node in your OneFS cluster, run the following command to set the access zone and to specify the
name of the external Ambari host where the Ambari Metrics Collector component is installed:
isi hdfs settings modify --zone=ZONE --ambari-metrics-collector=<FQDN of metrics collector>
4. From the Ambari Web home page, select the OneFS service and verify that Ambari can collect metrics
details from the OneFS SmartConnect zone.
NameNode Uptime: The uptime of the OneFS node that has been running the longest.
NameNode Heap Used: The sum of the current memory allocated by the HDFS process (cluster-wide).
5. From the Ambari Web home page, select the OneFS service and then click the Metrics tab to create
widgets to monitor and view OneFS metrics data.
b. Select one of Gauge, Number, or Graph widget types for creating the widget. Alternatively, you can
create a widget using a new template.
ii. Under Expression, click ADD METRIC > OneFS > All OneFS Clients and then select a
metric.
d. On the Name and Description screen, provide the necessary details and click SAVE.
Note:
• You can enable wire encryption per access zone in OneFS.
• Enabling HDFS wire encryption on an access zone could result in HDFS traffic performance
degradation while accessing data in that zone. Measure the performance impact with wire
encryption enabled to determine whether it is acceptable for your workload.
Note
HDFS wire encryption that is supported by Dell EMC PowerScale is different from the Apache HDFS Transparent
Data Encryption technology.
You can configure HDFS wire encryption using the OneFS web administration interface or command-line
administration interface. See the Isilon OneFS HDFS Reference Guide for details.
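As a sketch, wire encryption can be enabled per access zone from the CLI on OneFS releases that support it; the cipher option shown here is an assumption to verify against the Isilon OneFS HDFS Reference Guide:

```shell
# Run on a node in the OneFS cluster; enables wire encryption for HDFS
# data transfer in the access zone. Option availability depends on your
# OneFS version.
isi hdfs settings modify --zone=zone1-hdp --data-transfer-cipher=aes_256_ctr
isi hdfs settings view --zone=zone1-hdp
```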
Telephone Support:
United States: 800-782-4362 (800-SVC-4EMC)
Canada: 800-543-4782
Worldwide: +1-508-497-7901
Other worldwide access numbers