CDH5 Security Guide
Important Notice
(c) 2010-2015 Cloudera, Inc. All rights reserved.
Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service
names or slogans contained in this document are trademarks of Cloudera and its
suppliers or licensors, and may not be copied, imitated or used, in whole or in part,
without the prior written permission of Cloudera or the applicable trademark holder.
Hadoop and the Hadoop elephant logo are trademarks of the Apache Software
Foundation. All other trademarks, registered trademarks, product names and
company names or logos mentioned in this document are the property of their
respective owners. Reference to any products, services, processes or other
information, by trade name, trademark, manufacturer, supplier or otherwise does
not constitute or imply endorsement, sponsorship or recommendation thereof by
us.
Complying with all applicable copyright laws is the responsibility of the user. Without
limiting the rights under copyright, no part of this document may be reproduced,
stored in or introduced into a retrieval system, or transmitted in any form or by any
means (electronic, mechanical, photocopying, recording, or otherwise), or for any
purpose, without the express written permission of Cloudera.
Cloudera may have patents, patent applications, trademarks, copyrights, or other
intellectual property rights covering subject matter in this document. Except as
expressly provided in any written license agreement from Cloudera, the furnishing
of this document does not give you any license to these patents, trademarks,
copyrights, or other intellectual property. For information about patents covering
Cloudera products, see http://tiny.cloudera.com/patents.
The information in this document is subject to change without notice. Cloudera
shall not be liable for any damages resulting from technical errors or omissions
which may be present in this document, or from use of this document.
Cloudera, Inc.
1001 Page Mill Road Bldg 2
Palo Alto, CA 94304
[email protected]
US: 1-888-789-1488
Intl: 1-650-362-0488
www.cloudera.com
Release Information
Version: 5.1.x
Date: April 16, 2014
Table of Contents
About this Guide ......................................................................................................11
Introduction to Hadoop Security............................................................................13
Hadoop Users in CDH 5...........................................................................................15
Configuring Hadoop Security in CDH 5..................................................................19
Step 1: Install CDH 5..................................................................................................................................20
Step 2: Verify User Accounts and Groups in CDH 5 Due to Security.....................................................20
Step 2a (MRv1 only): Verify User Accounts and Groups in MRv1......................................................................20
MRv1: Directory Ownership in the Local File System.........................................................................................21
MRv1: Directory Ownership on HDFS...................................................................................................................21
Step 2b (YARN only): Verify User Accounts and Groups in YARN......................................................................22
YARN: Directory Ownership in the Local File System.........................................................................................22
YARN: Directory Ownership on HDFS...................................................................................................................23
Step 3: If you are Using AES-256 Encryption, install the JCE Policy File..............................................24
Step 4: Create and Deploy the Kerberos Principals and Keytab Files..................................................25
When to Use kadmin.local and kadmin...............................................................................................................25
To create the Kerberos principals.........................................................................................................................26
To create the Kerberos keytab files......................................................................................................................27
To deploy the Kerberos keytab files.....................................................................................................................27
Appendix A Troubleshooting.............................................................................179
Sample Kerberos Configuration files: krb5.conf, kdc.conf, kadm5.acl...............................................179
Problem 1: Running any Hadoop command fails after enabling security. .......................................181
Problem 2: Java is unable to read the Kerberos credentials cache created by versions of MIT
Kerberos 1.8.1 or higher. ..................................................................................................................181
Problem 3: java.io.IOException: Incorrect permission.........................................................................182
Example Rules.........................................................................................................................................194
Default Rule..............................................................................................................................................195
Testing Mapping Rules...........................................................................................................................195
Component | Unix User ID | Primary Group
Apache Avro | No special users.
Apache Flume | flume | flume
Apache HBase | hbase | hbase
HDFS | hdfs | hdfs
Apache Hive | hive | hive
Apache HCatalog | hive | hive
HttpFS | httpfs | httpfs
Hue | hue | hue
Cloudera Impala | impala | impala
Llama | llama | llama
Apache Mahout | No special users.
MapReduce | mapred | mapred
Apache Oozie | oozie | oozie
Parquet | No special users.
Apache Pig | No special users.
Cloudera Search | solr | solr
Apache Spark | spark | spark
Apache Sentry | sentry | sentry
Apache Sqoop | sqoop | sqoop
Apache Sqoop2 | sqoop2 | sqoop
Apache Whirr | No special users.
YARN | yarn | yarn
Apache ZooKeeper | zookeeper | zookeeper
Other | hadoop is a common group shared by the yarn, hdfs, and mapred users
Note:
The Kerberos principal names should be of the format,
username/[email protected], where the term username refers to
the username of an existing UNIX account, such as hdfs or mapred. The table below lists the usernames
to be used for the Kerberos principal names. For example, the Kerberos principal for Apache Flume
would be flume/[email protected].
Table 2: CDH 5 Keytabs and Keytab File Permissions

Project (UNIX ID) | Service | Kerberos Principal Primary | Filename (.keytab) | Keytab File Owner:Group | File Permission (octal)
Flume (flume) | flume-AGENT | flume | flume | flume:flume | 600
HBase (hbase) | hbase-HBASETHRIFTSERVER, hbase-HBASERESTSERVER, hbase-MASTER, hbase-REGIONSERVER | hbase | hbase | hbase:hbase | 600
HDFS (hdfs) | hdfs-NAMENODE, hdfs-DATANODE, hdfs-SECONDARYNAMENODE | hdfs (Secondary: Merge hdfs and HTTP) | hdfs | hdfs:hdfs | 600
Hive (hive) | hive-HIVESERVER2, hive-HIVEMETASTORE | hive | hive | hive:hive | 600
Hive (hive) | hive-WEBHCAT | HTTP | HTTP | hive:hive | 600
HttpFS (httpfs) | - | httpfs | httpfs | httpfs:httpfs | 600
Hue (hue) | hue-KT_RENEWER | hue | hue | hue:hue | 600
Impala (impala) | impala-STATESTORE, impala-CATALOGSERVER, impala-IMPALAD | impala | impala | impala:impala | 600
Llama (llama) | impala-LLAMA | llama (Secondary: Merge llama and HTTP) | llama | llama:llama | 600
MapReduce (mapred) | mapreduce-JOBTRACKER, mapreduce-TASKTRACKER | mapred (Secondary: Merge mapred and HTTP) | mapred | mapred:hadoop | 600
Oozie (oozie) | oozie-OOZIE_SERVER | oozie (Secondary: Merge oozie and HTTP) | oozie | oozie:oozie | 600
Search (solr) | solr-SOLR_SERVER | solr (Secondary: Merge solr and HTTP) | solr | solr:solr | 600
Sentry (sentry) | sentry-SENTRY_SERVER | sentry | sentry | sentry:sentry | 600
Spark (spark) | spark_on_yarn-SPARK_YARN_HISTORY_SERVER | spark | spark | spark:spark | 600
Sqoop (sqoop) | - | - | - | - | -
Sqoop2 (sqoop2) | - | - | - | - | -
YARN (yarn) | yarn-NODEMANAGER | yarn (Secondary: Merge yarn and HTTP) | yarn | yarn:hadoop | 644
YARN (yarn) | yarn-RESOURCEMANAGER | yarn (Secondary: Merge yarn and HTTP) | yarn | yarn:hadoop | 600
YARN (yarn) | yarn-JOBHISTORY | yarn (Secondary: Merge yarn and HTTP) | yarn | yarn:hadoop | 600
ZooKeeper (zookeeper) | zookeeper-server | zookeeper | zookeeper | zookeeper:zookeeper | 600
Here are the general steps to configure secure Hadoop, each of which is described in more detail in the following
sections:
1. Install CDH 5.
2. Verify User Accounts and Groups in CDH 5 Due to Security.
3. If you are Using AES-256 Encryption, install the JCE Policy File.
4. Create and Deploy the Kerberos Principals and Keytab Files.
5. Shut Down the Cluster.
6. Enable Hadoop security.
7. Configure secure HDFS.
8. Optional: Configuring Security for HDFS High Availability.
9. Optional: Configuring secure WebHDFS.
10. Optional: Configuring secure NFS
11. Set Variables for Secure DataNodes.
12. Start up the NameNode.
13. Start up a DataNode.
14. Set the Sticky Bit on HDFS Directories.
15. Start up the Secondary NameNode (if used).
16. Configure Either MRv1 Security or YARN Security.
Note:
Kerberos security in CDH 5 has been tested with the following version of MIT Kerberos 5:
krb5-1.6.1 on Red Hat Enterprise Linux 5 and CentOS 5
Kerberos security in CDH 5 is supported with the following versions of MIT Kerberos 5:
hdfs
mapred
File System | Directory | Owner | Permissions
Local | dfs.namenode.name.dir (dfs.name.dir is deprecated but will also work) | hdfs:hdfs | drwx------
Local | dfs.datanode.data.dir (dfs.data.dir is deprecated but will also work) | hdfs:hdfs | drwx------
Local | mapred.local.dir | mapred:mapred | drwxr-xr-x
File System | Directory | Owner | Permissions
Local | HDFS_LOG_DIR | hdfs:hdfs | drwxrwxr-x
Local | MAPRED_LOG_DIR | mapred:mapred | drwxrwxr-x
Local | userlogs directory in MAPRED_LOG_DIR | mapred:anygroup | permissions are set automatically at daemon startup
File System | Directory | Owner | Permissions
HDFS | mapreduce.jobtracker.system.dir (mapred.system.dir is deprecated but will also work) | mapred:hadoop | drwx------
HDFS | / (root directory) | hdfs:hadoop | drwxr-xr-x
In CDH 5, package installation and the Hadoop daemons will automatically configure the correct permissions
for you if you configure the directory ownership correctly as shown in the table above.
When starting up, MapReduce sets the permissions for the mapreduce.jobtracker.system.dir (or
mapred.system.dir) directory in HDFS, assuming the user mapred owns that directory.
If kinit hdfs does not work initially, run kinit -R after running kinit to obtain credentials. (For more
information, see Problem 2 in Appendix A - Troubleshooting). To change the directory ownership on HDFS, run
the following commands. Replace the example /mapred/system directory in the commands below with the
HDFS directory specified by the mapreduce.jobtracker.system.dir (or mapred.system.dir) property in
the conf/mapred-site.xml file:
$ sudo -u hdfs hadoop fs -chown mapred:hadoop /mapred/system
$ sudo -u hdfs hadoop fs -chown hdfs:hadoop /
$ sudo -u hdfs hadoop fs -chmod -R 700 /mapred/system
$ sudo -u hdfs hadoop fs -chmod 755 /
In addition (whether or not Hadoop security is enabled) create the /tmp directory. For instructions on creating
/tmp and setting its permissions, see these instructions.
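A minimal sketch of the usual /tmp setup, assuming you have hdfs superuser credentials; adjust as needed for your environment:
$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp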
hdfs - HDFS: NameNode, DataNodes, Standby NameNode (if you are using HA)
yarn - YARN: ResourceManager, NodeManager
mapred - MapReduce JobHistory Server
Important:
The HDFS and YARN daemons must run as different Unix users; for example, hdfs and yarn. The
MapReduce Job History server must run as user mapred. Having all of these users share a common
Unix group is recommended; for example, hadoop.
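One quick way to confirm that these accounts and the shared group exist on a host (package installation normally creates them) is a sketch like the following:
$ id hdfs
$ id yarn
$ id mapred
$ getent group hadoop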
File System | Directory | Owner | Permissions
Local | dfs.namenode.name.dir (dfs.name.dir is deprecated but will also work) | hdfs:hdfs | drwx------
Local | dfs.datanode.data.dir (dfs.data.dir is deprecated but will also work) | hdfs:hdfs | drwx------
Local | yarn.nodemanager.local-dirs | yarn:yarn | drwxr-xr-x
Local | yarn.nodemanager.log-dirs | yarn:yarn | drwxr-xr-x
Local | container-executor | root:yarn | --Sr-s---
Local | conf/container-executor.cfg | root:yarn | r--------
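For example, ownership and permissions on the local directories might be applied as follows. This is only a sketch; the /data/1/... paths are placeholders for the values of dfs.namenode.name.dir, dfs.datanode.data.dir, yarn.nodemanager.local-dirs, and yarn.nodemanager.log-dirs on your cluster:
$ sudo chown -R hdfs:hdfs /data/1/dfs/nn /data/1/dfs/dn
$ sudo chmod 700 /data/1/dfs/nn /data/1/dfs/dn
$ sudo chown -R yarn:yarn /data/1/yarn/local /data/1/yarn/logs
$ sudo chmod 755 /data/1/yarn/local /data/1/yarn/logs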
You must also configure the following permissions for the HDFS, YARN and MapReduce log directories (the
default locations in /var/log/hadoop-hdfs, /var/log/hadoop-yarn and /var/log/hadoop-mapreduce):
File System | Directory | Owner | Permissions
Local | HDFS_LOG_DIR | hdfs:hdfs | drwxrwxr-x
Local | $YARN_LOG_DIR | yarn:yarn | drwxrwxr-x
Local | MAPRED_LOG_DIR | mapred:mapred | drwxrwxr-x
File System | Directory | Owner | Permissions
HDFS | / (root directory) | hdfs:hadoop | drwxr-xr-x
HDFS | yarn.nodemanager.remote-app-log-dir | yarn:hadoop | drwxrwxrwxt
HDFS | mapreduce.jobhistory.intermediate-done-dir | mapred:hadoop | drwxrwxrwxt
HDFS | mapreduce.jobhistory.done-dir | mapred:hadoop | drwxr-x---
In CDH 5, package installation and the Hadoop daemons will automatically configure the correct permissions
for you if you configure the directory ownership correctly as shown in the two tables above. See also Deploying
MapReduce v2 (YARN) on a Cluster.
In addition (whether or not Hadoop security is enabled), create the /tmp directory. For instructions on creating
/tmp and setting its permissions, see these instructions.
In addition (whether or not Hadoop security is enabled), change the permissions on the /user/history directory;
see these instructions.
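A sketch of the typical /user/history setup, assuming the default location:
$ sudo -u hdfs hadoop fs -mkdir -p /user/history
$ sudo -u hdfs hadoop fs -chmod -R 1777 /user/history
$ sudo -u hdfs hadoop fs -chown mapred:hadoop /user/history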
Step 3: If you are Using AES-256 Encryption, install the JCE Policy File
If you are using CentOS/Red Hat Enterprise Linux 5.6 or later, or Ubuntu, which use AES-256 encryption by
default for tickets, you must install the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy
File on all cluster and Hadoop user machines. For JCE Policy File installation instructions, see the README.txt
file included in the jce_policy-x.zip file.
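Installation generally amounts to copying the two policy JARs from the zip file into the JRE's security directory on every host; a sketch, in which the extracted directory name and JAVA_HOME value depend on your JDK:
$ unzip jce_policy-x.zip
$ sudo cp UnlimitedJCEPolicy/local_policy.jar UnlimitedJCEPolicy/US_export_policy.jar \
    $JAVA_HOME/jre/lib/security/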
Alternatively, you can configure Kerberos to not use AES-256 by removing aes256-cts:normal from the
supported_enctypes field of the kdc.conf or krb5.conf file. Note that after changing the kdc.conf file, you'll
need to restart both the KDC and the kadmin server for those changes to take effect. You may also need to
recreate or change the password of the relevant principals, including potentially the Ticket Granting Ticket
principal (krbtgt/REALM@REALM). If AES-256 is still used after all of those steps, it's because the
aes256-cts:normal setting existed when the Kerberos database was created. To fix this, create a new Kerberos
database and then restart both the KDC and the kadmin server.
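On a typical MIT KDC host the restart looks something like the following; the service names are assumptions and vary by platform:
$ sudo service krb5kdc restart
$ sudo service kadmin restart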
To verify the type of encryption used in your cluster:
1. On the local KDC host, type this command to create a test principal:
$ kadmin -q "addprinc test"
2. On a cluster host, type this command to start a Kerberos session as the test principal:
$ kinit test
3. On a cluster host, type this command to view the encryption type in use:
$ klist -e
If AES is being used, output like the following is displayed after you type the klist command (note that
AES-256 is included in the output):
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: test@SCM

Valid starting     Expires            Service principal
05/19/11 13:25:04  05/20/11 13:25:04  krbtgt/SCM@SCM
    Etype (skey, tkt): AES-256 CTS mode with 96-bit SHA-1 HMAC, AES-256 CTS mode
    with 96-bit SHA-1 HMAC
Step 4: Create and Deploy the Kerberos Principals and Keytab Files
A Kerberos principal is used in a Kerberos-secured system to represent a unique identity. Kerberos assigns
tickets to Kerberos principals to enable them to access Kerberos-secured Hadoop services. For Hadoop, the
principals should be of the format username/[email protected]. In this guide,
the term username in the username/[email protected] principal refers to
the username of an existing Unix account, such as hdfs or mapred.
A keytab is a file containing pairs of Kerberos principals and an encrypted copy of that principal's key. The keytab
files are unique to each host since their keys include the hostname. This file is used to authenticate a principal
on a host to Kerberos without human interaction or storing a password in a plain text file. Because having access
to the keytab file for a principal allows one to act as that principal, access to the keytab files should be tightly
secured. They should be readable by a minimal set of users, should be stored on local disk, and should not be
included in machine backups, unless access to those backups is as secure as access to the local machine.
Important:
For both MRv1 and YARN deployments: On every machine in your cluster, there must be a keytab file
for the hdfs user and a keytab file for the mapred user. The hdfs keytab file must contain entries for
the hdfs principal and a HTTP principal, and the mapred keytab file must contain entries for the mapred
principal and a HTTP principal. On each respective machine, the HTTP principal will be the same in
both keytab files.
In addition, for YARN deployments only: On every machine in your cluster, there must be a keytab file
for the yarn user. The yarn keytab file must contain entries for the yarn principal and a HTTP principal.
On each respective machine, the HTTP principal in the yarn keytab file will be the same as the HTTP
principal in the hdfs and mapred keytab files.
Note:
The following instructions illustrate an example of creating keytab files for MIT Kerberos. If you are
using another version of Kerberos, refer to your Kerberos documentation for instructions. You may
use either kadmin or kadmin.local to run these commands.
OR:
$ kadmin
Note:
In this guide, kadmin is shown as the prompt for commands in the kadmin shell, but you can type
the same commands at the kadmin.local prompt in the kadmin.local shell.
Note:
If your Kerberos administrator or company has a policy about principal names that does not allow
you to use the format shown above, you can work around that issue by configuring the <kerberos
principal> to <short name> mapping that is built into Hadoop. For more information, see
Appendix C - Configuring the Mapping from Kerberos Principals to Short Names.
2. Create the mapred principal. If you are using MRv1, the mapred principal is used for the JobTracker and
TaskTrackers. If you are using YARN, the mapred principal is used for the MapReduce Job History Server.
kadmin: addprinc -randkey mapred/[email protected]
3. YARN only: Create the yarn principal. This principal is used for the ResourceManager and NodeManager.
kadmin: addprinc -randkey yarn/[email protected]
Important:
The HTTP principal must be in the format
HTTP/[email protected]. The first component of the principal
must be the literal string "HTTP". This format is standard for HTTP principals in SPNEGO and is
hard-coded in Hadoop. It cannot be deviated from.
2. Create the mapred keytab file that will contain the mapred principal and HTTP principal. If you are using MRv1,
the mapred keytab file is used for the JobTracker and TaskTrackers. If you are using YARN, the mapred keytab
file is used for the MapReduce Job History Server.
kadmin: xst -norandkey -k mapred.keytab mapred/fully.qualified.domain.name
HTTP/fully.qualified.domain.name
3. YARN only: Create the yarn keytab file that will contain the yarn principal and HTTP principal. This keytab
file is used for the ResourceManager and NodeManager.
kadmin: xst -norandkey -k yarn.keytab yarn/fully.qualified.domain.name
HTTP/fully.qualified.domain.name
4. Use klist to display the keytab file entries; a correctly-created hdfs keytab file should look something like
this:
$ klist -e -k -t hdfs.keytab
Keytab name: WRFILE:hdfs.keytab
slot KVNO Principal
---- ---- ---------------------------------------------------------------------
   1    7 HTTP/[email protected] (DES cbc mode with CRC-32)
   2    7 HTTP/[email protected] (Triple DES cbc mode with HMAC/sha1)
   3    7 hdfs/[email protected] (DES cbc mode with CRC-32)
   4    7 hdfs/[email protected] (Triple DES cbc mode with HMAC/sha1)
5. Continue with the next section To deploy the Kerberos keytab files.
b. Make sure that the hdfs.keytab file is only readable by the hdfs user, and that the mapred.keytab file
is only readable by the mapred user.
$ sudo chown hdfs:hadoop /etc/hadoop/conf/hdfs.keytab
$ sudo chown mapred:hadoop /etc/hadoop/conf/mapred.keytab
$ sudo chmod 400 /etc/hadoop/conf/*.keytab
Note:
To enable you to use the same configuration files on every host, Cloudera recommends that
you use the same name for the keytab files on every host.
c. YARN only: Make sure that the yarn.keytab file is only readable by the yarn user.
$ sudo chown yarn:hadoop /etc/hadoop/conf/yarn.keytab
$ sudo chmod 400 /etc/hadoop/conf/yarn.keytab
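To confirm that a deployed keytab is usable, you can try to obtain a ticket from it; a sketch, substituting your host's fully qualified domain name and realm:
$ sudo -u hdfs kinit -k -t /etc/hadoop/conf/hdfs.keytab \
    hdfs/[email protected]
$ sudo -u hdfs klist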
Important:
If the NameNode, Secondary NameNode, DataNode, JobTracker, TaskTrackers, HttpFS, or Oozie
services are configured to use Kerberos HTTP SPNEGO authentication, and two or more of these
services are running on the same host, then all of the running services must use the same
HTTP principal and keytab file used for their HTTP endpoints.
If you only want to specify a set of users, add a comma-separated list of users followed by a blank space. Similarly,
to specify only authorized groups, use a blank space at the beginning. A * can be used to give access to all users.
For example, to give users, ann, bob, and groups, group_a, group_b access to Hadoop's DataNodeProtocol service,
modify the security.datanode.protocol.acl property in hadoop-policy.xml. Similarly, to give all users
access to the InterTrackerProtocol service, modify security.inter.tracker.protocol.acl as follows:
<property>
<name>security.datanode.protocol.acl</name>
<value>ann,bob group_a,group_b</value>
<description>ACL for DatanodeProtocol, which is used by datanodes to
communicate with the namenode.</description>
</property>
<property>
<name>security.inter.tracker.protocol.acl</name>
<value>*</value>
<description>ACL for InterTrackerProtocol, which is used by tasktrackers to
communicate with the jobtracker.</description>
</property>
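After editing hadoop-policy.xml, the service-level ACLs can usually be reloaded without restarting the daemons; a sketch (for MRv1, the JobTracker equivalent is hadoop mradmin -refreshServiceAcl):
$ hdfs dfsadmin -refreshServiceAcl
$ yarn rmadmin -refreshServiceAcls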
Note:
If you already have principals and keytabs created for the machines where the JournalNodes are
running, then you should reuse those principals and keytabs in the configuration properties above.
You will likely have these principals and keytabs already created if you are collocating a JournalNode
on a machine with another HDFS daemon.
2. Add the following properties to the hdfs-site.xml file on every machine in the cluster. Replace the example
values shown below with the correct settings for your site.
<property>
<name>dfs.web.authentication.kerberos.principal</name>
<value>HTTP/[email protected]</value>
</property>
<property>
<name>dfs.web.authentication.kerberos.keytab</name>
<value>/etc/hadoop/conf/HTTP.keytab</value> <!-- path to the HTTP keytab -->
</property>
HADOOP_SECURE_DN_USER=hdfs
HADOOP_SECURE_DN_PID_DIR=/var/lib/hadoop-hdfs
HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop-hdfs
JSVC_HOME=/usr/lib/bigtop-utils/
Note:
Depending on the version of Linux you are using, you may not have the /usr/lib/bigtop-utils
directory on your system. If that is the case, set the JSVC_HOME variable to the
/usr/libexec/bigtop-utils directory by using this command:
export JSVC_HOME=/usr/libexec/bigtop-utils
and:
12/05/23 18:18:31 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter to
getDelegationToken
12/05/23 18:18:31 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter to
renewDelegationToken
12/05/23 18:18:31 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter to
cancelDelegationToken
12/05/23 18:18:31 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter to fsck
12/05/23 18:18:31 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter to getimage
12/05/23 18:18:31 INFO http.HttpServer: Jetty bound to port 50070
12/05/23 18:18:31 INFO mortbay.log: jetty-6.1.26
12/05/23 18:18:31 INFO server.KerberosAuthenticationHandler: Login using keytab
/etc/hadoop/conf/hdfs.keytab, for principal
HTTP/[email protected]
12/05/23 18:18:31 INFO server.KerberosAuthenticationHandler: Initialized, principal
[HTTP/[email protected]] from keytab
[/etc/hadoop/conf/hdfs.keytab]
You can verify that the NameNode is working properly by opening a web browser to http://machine:50070/
where machine is the name of the machine where the NameNode is running.
Cloudera also recommends testing that the NameNode is working properly by performing a metadata-only
HDFS operation, which will now require correct Kerberos credentials. For example:
$ hadoop fs -ls
Note:
If you are running MIT Kerberos 1.8.1 or higher, a bug in versions of the Oracle JDK 6 Update 26 and
earlier causes Java to be unable to read the Kerberos credentials cache even after you have successfully
obtained a Kerberos ticket using kinit. To work around this bug, run kinit -R after running kinit
initially to obtain credentials. Doing so will cause the ticket to be renewed, and the credentials cache
rewritten in a format which Java can read. For more information about this problem, see Problem 2
in Appendix A - Troubleshooting.
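In practice the workaround looks like this; substitute your own principal:
$ kinit [email protected]
$ kinit -R
$ hadoop fs -ls /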
If you can get a single DataNode running and you can see it registering with the NameNode in the logs, then
start up all the DataNodes. You should now be able to do all HDFS operations.
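Setting the sticky bit on HDFS directories such as /tmp is typically a single command; a sketch, assuming /tmp already exists and you have hdfs superuser credentials:
$ sudo -u hdfs hadoop fs -chmod 1777 /tmp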
After running this command, the permissions on /tmp will appear as shown below. (Note the "t" instead of the
final "x".)
$ hadoop fs -ls /
Found 2 items
drwxrwxrwt - hdfs supergroup 0 2011-02-14 15:55 /tmp
drwxr-xr-x - hdfs supergroup 0 2011-02-14 14:01 /user
Note:
If you are using HDFS HA, do not use the Secondary NameNode. See Configuring HDFS High Availability
for instructions on configuring and deploying the Standby NameNode.
You'll see some extra information in the logs such as:
10/10/26 12:03:18 INFO security.UserGroupInformation:
Login successful for user hdfs/fully.qualified.domain.name@YOUR-REALM using keytab file
/etc/hadoop/conf/hdfs.keytab
and:
12/05/23 18:33:06 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter to getimage
12/05/23 18:33:06 INFO http.HttpServer: Jetty bound to port 50090
12/05/23 18:33:06 INFO mortbay.log: jetty-6.1.26
12/05/23 18:33:06 INFO server.KerberosAuthenticationHandler: Login using keytab
/etc/hadoop/conf/hdfs.keytab, for principal
HTTP/[email protected]
12/05/23 18:33:06 INFO server.KerberosAuthenticationHandler: Initialized, principal
[HTTP/[email protected]] from keytab
[/etc/hadoop/conf/hdfs.keytab]
You should make sure that the Secondary NameNode not only starts, but that it is successfully checkpointing.
If you're using the service command to start the Secondary NameNode from the /etc/init.d scripts,
Cloudera recommends setting the property fs.checkpoint.period in the hdfs-site.xml file to a very low
value (such as 5), and then monitoring the Secondary NameNode logs for a successful startup and checkpoint.
Once you are satisfied that the Secondary NameNode is checkpointing properly, you should reset the
fs.checkpoint.period to a reasonable value, or return it to the default, and then restart the Secondary
NameNode.
You can make the Secondary NameNode perform a checkpoint by doing the following:
$ sudo -u hdfs hdfs secondarynamenode -checkpoint force
Note that this will not cause a running Secondary NameNode to checkpoint, but rather will start up a Secondary
NameNode that will immediately perform a checkpoint and then shut down. This can be useful for debugging.
Note:
If you encounter errors during Secondary NameNode checkpointing, it may be helpful to enable
Kerberos debugging output. For instructions, see Appendix D - Enabling Debugging Output for the
Sun Kerberos Classes.
Note:
In the taskcontroller.cfg file, the default setting for the banned.users property is mapred,
hdfs, and bin to prevent jobs from being submitted via those user accounts. The default setting
for the min.user.id property is 1000 to prevent jobs from being submitted with a user ID less
than 1000, which are conventionally Unix super users. Note that some operating systems such
as CentOS 5 use a default value of 500 and above for user IDs, not 1000. If this is the case on your
system, change the default setting for the min.user.id property to 500. If there are user accounts
on your cluster that have a user ID less than the value specified for the min.user.id property,
the TaskTracker returns an error code of 255.
3. The path to the taskcontroller.cfg file is determined relative to the location of the task-controller
binary. Specifically, the path is <path of task-controller binary>/../../conf/taskcontroller.cfg.
If you installed the CDH 5 package, this path will always correspond to
/etc/hadoop/conf/taskcontroller.cfg.
Note:
For more information about the task-controller program, see Appendix B - Information about
Other Hadoop Security Programs.
Important:
The same mapred-site.xml file and the same hdfs-site.xml file must both be installed on every
host machine in the cluster so that the NameNode, Secondary NameNode, DataNode, Job Tracker and
Task Tracker can all connect securely with each other.
You can verify that the JobTracker is working properly by opening a web browser to http://machine:50030/
where machine is the name of the machine where the JobTracker is running.
2. Add the following properties to the mapred-site.xml file on every machine in the cluster:
<!-- MapReduce Job History Server security configs -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>host:port</value> <!-- Host and port of the MapReduce Job History Server;
default port is 10020 -->
</property>
<property>
<name>mapreduce.jobhistory.keytab</name>
<value>/etc/hadoop/conf/mapred.keytab</value> <!-- path to the MAPRED keytab for
the Job History Server -->
</property>
<property>
<name>mapreduce.jobhistory.principal</name>
<value>mapred/[email protected]</value>
</property>
<!-- To enable SSL -->
<property>
<name>mapreduce.jobhistory.http.policy</name>
<value>HTTPS_ONLY</value>
</property>
3. Create a file called container-executor.cfg for the Linux Container Executor program that contains the
following information:
yarn.nodemanager.local-dirs=<comma-separated list of paths to local NodeManager
directories. Should be same values specified in yarn-site.xml. Required to validate
paths passed to container-executor in order.>
yarn.nodemanager.linux-container-executor.group=yarn
yarn.nodemanager.log-dirs=<comma-separated list of paths to local NodeManager log
directories. Should be same values specified in yarn-site.xml. Required to set
proper permissions on the log files so that they can be written to by the user's
containers and read by the NodeManager for log aggregation.>
banned.users=hdfs,yarn,mapred,bin
min.user.id=1000
Note:
In the container-executor.cfg file, the default setting for the banned.users property is hdfs,
yarn, mapred, and bin to prevent jobs from being submitted via those user accounts. The default
setting for the min.user.id property is 1000 to prevent jobs from being submitted with a user
ID less than 1000, which are conventionally Unix super users. Note that some operating systems
such as CentOS 5 use a default value of 500 and above for user IDs, not 1000. If this is the case
on your system, change the default setting for the min.user.id property to 500. If there are user
accounts on your cluster that have a user ID less than the value specified for the min.user.id
property, the NodeManager returns an error code of 255.
4. The path to the container-executor.cfg file is determined relative to the location of the container-executor
binary. Specifically, the path is <dirname of container-executor
Note:
The container-executor program requires that the paths including and leading up to the
directories specified in yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs to
be set to 755 permissions as shown in this table on permissions on directories.
5. Verify that the ownership and permissions of the container-executor program corresponds to:
---Sr-s--- 1 root yarn 36264 May 20 15:30 container-executor
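If the ownership or mode is wrong, it can be corrected as follows; a sketch, assuming the binary is installed at /usr/lib/hadoop-yarn/bin/container-executor (the octal mode 6050 corresponds to ---Sr-s---):
$ sudo chown root:yarn /usr/lib/hadoop-yarn/bin/container-executor
$ sudo chmod 6050 /usr/lib/hadoop-yarn/bin/container-executor
$ ls -l /usr/lib/hadoop-yarn/bin/container-executor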
Note:
For more information about the Linux Container Executor program, see Appendix B - Information
about Other Hadoop Security Programs.
You can verify that the ResourceManager is working properly by opening a web browser to http://host:8088/
where host is the name of the machine where the ResourceManager is running.
You can verify that the NodeManager is working properly by opening a web browser to http://host:8042/ where
host is the name of the machine where the NodeManager is running.
Enabling ACLs
By default, ACLs are disabled on a cluster. To enable them, set the dfs.namenode.acls.enabled property to
true in the NameNode's hdfs-site.xml.
<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
</property>
Commands
You can use the File System Shell commands, setfacl and getfacl, to modify and retrieve files' ACLs.
getfacl
hdfs dfs -getfacl [-R] <path>
<!-- COMMAND OPTIONS
<path>: Path to the file or directory for which ACLs should be listed.
-R: Use this option to recursively list ACLs for all files and directories.
-->
Examples:
<!-- To list all ACLs for the file located at /user/hdfs/file -->
hdfs dfs -getfacl /user/hdfs/file
<!-- To recursively list ACLs for /user/hdfs/file -->
hdfs dfs -getfacl -R /user/hdfs/file
Examples:
<!-- To give user ben read & write permission over /user/hdfs/file -->
hdfs dfs -setfacl -m user:ben:rw- /user/hdfs/file
<!-- To remove user alice's ACL entry for /user/hdfs/file -->
hdfs dfs -setfacl -x user:alice /user/hdfs/file
<!-- To give user hadoop read & write access, and group or others read-only access -->
hdfs dfs -setfacl --set user:hadoop:rw-,group::r--,other::r-- /user/hdfs/file
Prerequisites
Sentry depends on an underlying authentication framework to reliably identify the requesting user. It requires:
CDH 5
HiveServer2 with strong authentication (Kerberos or LDAP)
A secure Hadoop cluster
This is to prevent a user bypassing the authorization and gaining direct access to the underlying data.
In addition to the above, make sure that the following are true:
The Hive warehouse directory (/user/hive/warehouse or any path you specify as
hive.metastore.warehouse.dir in your hive-site.xml) must be owned by the Hive user and group.
Permissions on the warehouse directory must be set as follows (see following Note for caveats):
771 on the directory itself (for example, /user/hive/warehouse)
771 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
All files and subdirectories should be owned by hive:hive
For example:
$ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
Note:
If you set hive.warehouse.subdir.inherit.perms to true in hive-site.xml, the
permissions on the subdirectories will be set when you set permissions on the warehouse
directory itself.
If a user has access to any object in the warehouse, that user will be able to execute use
default. This ensures that use default commands issued by legacy applications work
when Sentry is enabled. Note that you can protect objects in the default database (or any
other database) by means of a policy file.
Important: These instructions override the recommendations in the Hive section of the CDH
5 Installation Guide.
HiveServer2 impersonation must be turned off.
The Hive user must be able to submit MapReduce jobs. You can ensure that this is true by setting the
minimum user ID for job submission to 0. Edit the taskcontroller.cfg file and set min.user.id=0.
Important:
You must restart the cluster and HiveServer2 after changing this value, whether you use
Cloudera Manager or not.
These instructions override the instructions under Configuring MRv1 Security on page 36
These instructions override the instructions under Configuring YARN Security on page 38
Each object must be specified as a hierarchy of the containing objects, from server to table, followed by the
privilege granted for that object. A role can contain multiple such rules, separated by commas. For example a
role might contain the Select privilege for the customer and items tables in the sales database, and the
Insert privilege for the sales_insights table in the reports database. You would specify this as follows:
sales_reporting =
\server=server1->db=sales->table=customer->action=Select,
\server=server1->db=sales->table=items->action=Select,
\server=server1->db=reports->table=sales_insights->action=Insert
Privilege Model
With CDH 5.1, the privilege model has changed to accommodate the new grant/revoke syntax used with the Sentry
service. These changes apply to both the new database-backed Sentry service and the earlier policy file approach.
The Sentry privilege model has the following characteristics:
Allows any user to execute show function, desc function, and show locks.
Allows the user to see only those tables and databases for which this user has privileges.
Requires a user to have the necessary privileges on the URI to execute HiveQL operations that take in a
location. Examples of such operations include LOAD, IMPORT, and EXPORT.
Important: When Sentry is enabled, a user with no privileges on a database will not be allowed to
connect to HiveServer2. This is because the use <database> command is now executed as part of
the connection to HiveServer2, which is why the connection fails. See HIVE-4256.
Here the group analyst is granted the roles sales_reporting, data_export, and audit_report. The members
of this group can run the HiveQL statements that are allowed by these roles. If this is an HDFS-backed group,
then all the users belonging to the HDFS group analyst can run such queries.
Important: You can use either Hadoop groups or local groups, but not both at the same time. Use
local groups if you want to do a quick proof-of-concept. For production, use Hadoop groups. Refer to
Appendix I - Configuring LDAP Group Mappings on page 211 for details on configuring LDAP group
mappings in Hadoop.
Policy file
The sections that follow contain notes on creating and maintaining the policy file, and using URIs to load external
data and JARs.
Warning: An invalid policy file is ignored and an exception is logged. Users then lose access to all
Sentry-protected data, because Sentry's default behavior is to deny access unless a user has been explicitly
granted it. (Note that if only the per-DB policy file is invalid, only the policies in that file are invalidated.)
Storing the Policy File
Considerations for storing the policy file(s) in HDFS include:
1. Replication count - Because the file is read for each query in Hive and read once every five minutes by all
Impala daemons, you should increase this value; since it is a small file, setting the replication count equal to
the number of slave nodes in the cluster is reasonable.
2. Updating the file - Updates to the file are reflected immediately, so you should write them to a temporary
copy of the file first, and then replace the existing file with the temporary one after all the updates are
complete. This avoids race conditions caused by reads on an incomplete file.
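A sketch of that update pattern, assuming the policy file lives at hdfs://ha-nn-uri/etc/sentry/sentry-provider.ini and that ten replicas are enough for your cluster:
$ hdfs dfs -put -f sentry-provider.ini /etc/sentry/sentry-provider.ini.tmp
$ hdfs dfs -setrep 10 /etc/sentry/sentry-provider.ini.tmp
$ hdfs dfs -rm -skipTrash /etc/sentry/sentry-provider.ini
$ hdfs dfs -mv /etc/sentry/sentry-provider.ini.tmp /etc/sentry/sentry-provider.ini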
Defining Roles
Keep in mind that role definitions are not cumulative; the definition that appears further down in the file replaces
the earlier one. For example, the following results in role1 having privilege2, not privilege1 and privilege2.
role1 = privilege1
role1 = privilege2
Important: Because the NameNode host and port must be specified, Cloudera strongly recommends
you use High Availability (HA). This ensures that the URI will remain constant even if the NameNode
changes.
To enable URIs in per-DB policy files, add the following string to the Java configuration options for HiveServer2
during startup.
-Dsentry.allow.uri.db.policyfile=true
Important: Enabling URIs in per-DB policy files introduces a security risk by allowing the owner of
the db-level policy file to grant himself/herself load privileges to anything the hive user has read
permissions for in HDFS (including data in other databases controlled by different db-level policy
files).
Loading Data
Data can be loaded using a landing skid, either in HDFS or via a local/NFS directory where HiveServer2/Impala
run. The following privileges can be used to grant a role access to a loading skid:
Load data from a local/NFS directory:
server=server1->uri=file:///path/to/nfs/local/to/nfs
In addition to the privilege in Sentry, the hive or impala user will require the appropriate file permissions to
access the data being loaded. Groups can be used for this purpose. For example, create a group hive-users,
and add the hive and impala users along with the users who will be loading data, to this group.
The example usermod and groupadd commands below are only applicable to locally defined groups on the
NameNode, JobTracker, and ResourceManager. If you use another system for group management, equivalent
changes should be made in your group management system.
$ groupadd hive-users
$ usermod -G someuser,hive-users someuser
$ usermod -G hive,hive-users hive
2. Copy the JAR file (and its dependent libraries) to the host running HiveServer2/Impala.
3. On the HiveServer2/Impala host, open /etc/default/hive-server2 and set the AUX_CLASSPATH variable
to a comma-separated list of the fully-qualified paths to the JAR file and any dependent libraries.
AUX_CLASSPATH=/opt/local/hive/lib/my.jar
4. To access the UDF, you must have URI privilege to the jar where the UDF resides. This privilege prevents
users from creating functions such as the reflect function which is disallowed because it allows users to
execute arbitrary Java code.
udf_r = server=server1->uri=file:///opt/local/hive/lib
5. Restart HiveServer2.
You should now be able to use the UDF:
CREATE TEMPORARY FUNCTION my_udf AS 'MyUDF';
Sample Configuration
This section provides a sample configuration.
Policy Files
The following is an example of a policy file with a per-DB policy file. In this example, the first policy file,
sentry-provider.ini, would exist in HDFS; hdfs://ha-nn-uri/etc/sentry/sentry-provider.ini might be an
appropriate location.
customers.ini
[groups]
manager = customers_insert_role, customers_select_role
analyst = customers_select_role
[roles]
customers_insert_role = server=server1->db=customers->table=*->action=insert
customers_select_role = server=server1->db=customers->table=*->action=select
Important: Sentry does not support using the view keyword in policy files. If you want to define a
role against a view, use the keyword table instead. For example, to define the role analyst_role
against the view col_test_view:
[roles]
analyst_role = server=server1->db=default->table=col_test_view->action=select
groupadd hive-users
usermod -G hive,impala,hive-users hive
usermod -G hive,impala,hive-users impala
usermod -G etl,hive-users etl
Once you have added users to the hive-users group, change directory permissions in the HDFS:
$ hadoop fs -chown -R hive:hive-users /user/hive/warehouse
$ hadoop fs -chmod -R 770 /user/hive/warehouse
which indicate each evaluation Sentry makes. The FilePermission is from the policy file, while
RequestPermission is the privilege required for the query. A RequestPermission will iterate over all appropriate
FilePermission settings until a match is found. If no matching privilege is found, Sentry returns false indicating
Access Denied.
Privilege | Object
INSERT | DB, TABLE
SELECT | DB, TABLE
ALL | SERVER, DB, TABLE, URI

Granular privilege on object | Privilege on container object that implies the granular privilege on the base object
DATABASE: ALL | SERVER: ALL
TABLE: INSERT | DATABASE: ALL
TABLE: SELECT | DATABASE: ALL
VIEW: SELECT | DATABASE: ALL
Operation | Scope | Privileges | URI / Others
CREATE DATABASE | SERVER | ALL
DROP DATABASE | DATABASE | ALL
CREATE TABLE | DATABASE | ALL
DROP TABLE | TABLE | ALL
CREATE VIEW | DATABASE | ALL | SELECT on TABLE
DROP VIEW | VIEW/TABLE | ALL
CREATE INDEX | TABLE | ALL
DROP INDEX | TABLE | ALL
ALTER TABLE (column, partition, property, and rename forms) | TABLE | ALL
ALTER TABLE .. PARTITION SET FILEFORMAT | TABLE | ALL
SHOW CREATE TABLE | TABLE | SELECT/INSERT
SHOW TBLPROPERTIES | TABLE | SELECT/INSERT
SHOW PARTITIONs | TABLE | SELECT/INSERT
DESCRIBE TABLE | TABLE | SELECT/INSERT
DESCRIBE TABLE .. PARTITION | TABLE | SELECT/INSERT
LOAD DATA | TABLE | INSERT | URI
SELECT | TABLE | SELECT
INSERT OVERWRITE TABLE | TABLE | INSERT
CREATE TABLE .. AS SELECT | DATABASE | ALL | SELECT on TABLE
USE <dbName> | Any
ALTER TABLE .. SET SERDEPROPERTIES | TABLE | ALL
ALTER TABLE .. PARTITION SET SERDEPROPERTIES | TABLE | ALL
INSERT OVERWRITE DIRECTORY | TABLE | INSERT | URI
Analyze TABLE | TABLE | SELECT + INSERT
IMPORT TABLE | DATABASE | ALL | URI
EXPORT TABLE | TABLE | SELECT | URI
ALTER TABLE TOUCH | TABLE | ALL
ALTER TABLE TOUCH PARTITION | TABLE | ALL
ALTER TABLE .. CLUSTERED BY SORTED BY | TABLE | ALL
ALTER TABLE .. ENABLE/DISABLE | TABLE | ALL
ALTER TABLE .. PARTITION ENABLE/DISABLE | TABLE | ALL
ALTER TABLE .. PARTITION .. RENAME TO PARTITION | TABLE | ALL

Hive-Only Operations
Operation | Scope | Privileges | URI / Others
ALTER DATABASE | DATABASE | ALL
DESCRIBE DATABASE | DATABASE | SELECT/INSERT
SHOW COLUMNS | TABLE | SELECT/INSERT
SHOW INDEXES | TABLE | SELECT/INSERT
GRANT PRIVILEGE
REVOKE PRIVILEGE
SHOW GRANTS
ADD JAR | Not Allowed
ADD FILE | Not Allowed
DFS | Not Allowed

Impala-Only Operations
Operation | Scope | Privileges | URI / Others
EXPLAIN | TABLE | SELECT
INVALIDATE METADATA | SERVER | ALL
INVALIDATE METADATA <table name> | TABLE | SELECT/INSERT
REFRESH <table name> | TABLE | SELECT/INSERT
CREATE FUNCTION | SERVER | ALL
DROP FUNCTION | SERVER | ALL
COMPUTE STATS | TABLE | ALL
Prerequisites
Sentry depends on an underlying authentication framework to reliably identify the requesting user. It requires:
CDH 5.1.x
HiveServer2 with strong authentication (Kerberos or LDAP)
A secure Hadoop cluster
This is to prevent a user bypassing the authorization and gaining direct access to the underlying data.
In addition to the above, make sure that the following are true:
The Hive warehouse directory (/user/hive/warehouse or any path you specify as
hive.metastore.warehouse.dir in your hive-site.xml) must be owned by the Hive user and group.
Permissions on the warehouse directory must be set as follows (see following Note for caveats):
771 on the directory itself (for example, /user/hive/warehouse)
771 on all subdirectories (for example, /user/hive/warehouse/mysubdir)
All files and subdirectories should be owned by hive:hive
For example:
$ sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
$ sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse
Note:
If you set hive.warehouse.subdir.inherit.perms to true in hive-site.xml, the
permissions on the subdirectories will be set when you set permissions on the warehouse
directory itself.
If a user has access to any object in the warehouse, that user will be able to execute use
default. This ensures that use default commands issued by legacy applications work
when Sentry is enabled. Note that you can protect objects in the default database (or any
other database) by means of a policy file.
Important: These instructions override the recommendations in the Hive section of the CDH
5 Installation Guide.
Important:
You must restart the cluster and HiveServer2 after changing this value, whether you use
Cloudera Manager or not.
These instructions override the instructions under Configuring MRv1 Security on page 36
These instructions override the instructions under Configuring YARN Security on page 38
Privilege Model
With CDH 5.1, the privilege model has changed to accommodate the new grant/revoke syntax used with the Sentry
service. These changes apply to both the new database-backed Sentry service and the earlier policy file approach.
The Sentry privilege model has the following characteristics:
Allows any user to execute show function, desc function, and show locks.
Allows the user to see only those tables and databases for which this user has privileges.
Requires a user to have the necessary privileges on the URI to execute HiveQL operations that take in a
location. Examples of such operations include LOAD, IMPORT, and EXPORT.
Important: When Sentry is enabled, a user with no privileges on a database will not be allowed to
connect to HiveServer2. This is because the use <database> command is now executed as part of
the connection to HiveServer2, which is why the connection fails. See HIVE-4256.
For more information, see Appendix: Authorization Privilege Model for Hive and Impala on page 66.
Important: You can use either Hadoop groups or local groups, but not both at the same time. Use
local groups if you want to do a quick proof-of-concept. For production, use Hadoop groups. Refer to
Appendix I - Configuring LDAP Group Mappings on page 211 for details on configuring LDAP group
mappings in Hadoop.
Configuring Hadoop Groups
Set the hive.sentry.provider property in sentry-site.xml.
<property>
<name>hive.sentry.provider</name>
<value>org.apache.sentry.provider.file.HadoopGroupResourceAuthorizationProvider</value>
</property>
Alternatively, you can set the sentry.verify.schema.version configuration property to false. However,
this is not recommended.
3. Start the Sentry service.
bin/sentry --command service --conffile <sentry-site.xml>
No roles enabled:
SET ROLE NONE;
SHOW Statement
To list all the roles in the system (only for sentry admin users):
SHOW ROLES;
To list all the roles in effect for the current user session:
SHOW CURRENT ROLES;
To list all the roles assigned to the given <groupName> (only for sentry admin users):
SHOW ROLE GRANT GROUP <groupName>;
The SHOW statement can also be used to list the privileges that have been granted to a role or all the grants
given to a role for a particular object.
To list all the grants for the given <roleName> (only for sentry admin users):
SHOW GRANT ROLE <roleName>;
The following sections show how you can use the new GRANT statements to assign privileges to roles (and assign
roles to groups) to match the sample policy file above.
Grant privileges to analyst_role:
CREATE ROLE analyst_role;
GRANT ALL ON DATABASE analyst1 TO ROLE analyst_role;
GRANT SELECT ON DATABASE jranalyst1 TO ROLE analyst_role;
GRANT ALL ON URI 'hdfs://ha-nn-uri/landing/analyst1' \
TO ROLE analyst_role;
To enable authorization based on policy server metadata set the following flag on the impalad.
--server_name=<server name>
To enable authorization based on a file-based policy set the following flags on the impalad.
--server_name=<server name>
--authorization_policy_file=<path to policy file>
If the --authorization_policy_file flag is set, Impala will use the policy file-based approach. Otherwise,
the policy server metadata approach will be used to implement authorization.
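On a package-based installation these flags typically go in the IMPALA_SERVER_ARGS variable in /etc/default/impala; a sketch showing only the authorization-related flags, where the server name and policy file path are assumptions:
IMPALA_SERVER_ARGS=" \
    --server_name=server1 \
    --authorization_policy_file=/user/hive/sentry/sentry-provider.ini"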
The impala user also needs to be added to the list of administrative users of the Sentry Policy Server. For more
details, see SENTRY-191.
Privilege | Object
INSERT | DB, TABLE
SELECT | DB, TABLE
ALL | SERVER, DB, TABLE, URI

Granular privilege on object | Privilege on container object that implies the granular privilege on the base object
DATABASE: ALL | SERVER: ALL
TABLE: INSERT | DATABASE: ALL
TABLE: SELECT | DATABASE: ALL
VIEW: SELECT | DATABASE: ALL
Operation | Scope | Privileges | URI / Others
CREATE DATABASE | SERVER | ALL
DROP DATABASE | DATABASE | ALL
CREATE TABLE | DATABASE | ALL
DROP TABLE | TABLE | ALL
CREATE VIEW | DATABASE | ALL | SELECT on TABLE
DROP VIEW | VIEW/TABLE | ALL
CREATE INDEX | TABLE | ALL
DROP INDEX | TABLE | ALL
ALTER TABLE (column, partition, property, and rename forms) | TABLE | ALL
ALTER TABLE .. PARTITION SET FILEFORMAT | TABLE | ALL
SHOW CREATE TABLE | TABLE | SELECT/INSERT
SHOW TBLPROPERTIES | TABLE | SELECT/INSERT
SHOW PARTITIONs | TABLE | SELECT/INSERT
DESCRIBE TABLE | TABLE | SELECT/INSERT
DESCRIBE TABLE .. PARTITION | TABLE | SELECT/INSERT
LOAD DATA | TABLE | INSERT | URI
SELECT | TABLE | SELECT
INSERT OVERWRITE TABLE | TABLE | INSERT
CREATE TABLE .. AS SELECT | DATABASE | ALL | SELECT on TABLE
USE <dbName> | Any
ALTER TABLE .. SET SERDEPROPERTIES | TABLE | ALL
ALTER TABLE .. PARTITION SET SERDEPROPERTIES | TABLE | ALL
INSERT OVERWRITE DIRECTORY | TABLE | INSERT | URI
Analyze TABLE | TABLE | SELECT + INSERT
IMPORT TABLE | DATABASE | ALL | URI
EXPORT TABLE | TABLE | SELECT | URI
ALTER TABLE TOUCH | TABLE | ALL
ALTER TABLE TOUCH PARTITION | TABLE | ALL
ALTER TABLE .. CLUSTERED BY SORTED BY | TABLE | ALL
ALTER TABLE .. ENABLE/DISABLE | TABLE | ALL
ALTER TABLE .. PARTITION ENABLE/DISABLE | TABLE | ALL
ALTER TABLE .. PARTITION .. RENAME TO PARTITION | TABLE | ALL

Hive-Only Operations
Operation | Scope | Privileges | URI / Others
ALTER DATABASE | DATABASE | ALL
DESCRIBE DATABASE | DATABASE | SELECT/INSERT
SHOW COLUMNS | TABLE | SELECT/INSERT
SHOW INDEXES | TABLE | SELECT/INSERT
GRANT PRIVILEGE
REVOKE PRIVILEGE
SHOW GRANTS
ADD JAR | Not Allowed
ADD FILE | Not Allowed
DFS | Not Allowed

Impala-Only Operations
Operation | Scope | Privileges | URI / Others
EXPLAIN | TABLE | SELECT
INVALIDATE METADATA | SERVER | ALL
INVALIDATE METADATA <table name> | TABLE | SELECT/INSERT
REFRESH <table name> | TABLE | SELECT/INSERT
CREATE FUNCTION | SERVER | ALL
DROP FUNCTION | SERVER | ALL
COMPUTE STATS | TABLE | ALL
Note:
These instructions have been tested with CDH 5 and MIT Kerberos 5 only. The following instructions
describe an example of how to configure a Flume agent to be a client as the user flume to a secure
HDFS service. This section does not describe how to secure the communications between Flume
agents, which is not currently implemented.
where:
agentName is the name of the Flume agent being configured, which in this release defaults to the value "agent".
sinkName is the name of the HDFS sink that is being configured. The respective sink's type must be HDFS.
In the previous example, flume is the first component of the principal name, fully.qualified.domain.name
is the second, and YOUR-REALM.COM is the name of the Kerberos realm your Hadoop cluster is in. The
/etc/flume-ng/conf/flume.keytab file contains the keys necessary for
flume/[email protected] to authenticate with other services.
Writing as different users across multiple HDFS sinks in a single Flume agent
In this release, support has been added for secure impersonation of Hadoop users (similar to "sudo" in UNIX).
This is implemented in a way similar to how Oozie implements secure user impersonation.
The following steps to set up secure impersonation from Flume to HDFS assume your cluster is configured using
Kerberos. (However, impersonation also works on non-Kerberos secured clusters, and Kerberos-specific aspects
should be omitted in that case.)
1. Configure Hadoop to allow impersonation. Add the following configuration properties to your core-site.xml.
<property>
<name>hadoop.proxyuser.flume.groups</name>
<value>group1,group2</value>
<description>Allow the flume user to impersonate any members of group1 and
group2</description>
</property>
<property>
<name>hadoop.proxyuser.flume.hosts</name>
<value>host1,host2</value>
<description>Allow the flume user to connect only from host1 and host2 to
impersonate a user</description>
</property>
You can use the wildcard character * to enable impersonation of any user from any host. For more information,
see Secure Impersonation.
Set up a Kerberos keytab for the Kerberos principal and host Flume is connecting to HDFS from. This user
must match the Hadoop configuration in the preceding step. For instructions, see Configuring Hadoop Security
in CDH 5.
Configure the HDFS sink with the following configuration options:
hdfs.kerberosPrincipal - fully-qualified principal. Note: _HOST will be replaced by the hostname of the
local machine (only in-between the / and @ characters)
hdfs.kerberosKeytab - location on the local machine of the keytab containing the user and host keys for
the above principal
hdfs.proxyUser - the proxy user to impersonate
Example snippet (the majority of the HDFS sink configuration options have been omitted):
agent.sinks.sink-1.type = HDFS
agent.sinks.sink-1.hdfs.kerberosPrincipal = flume/[email protected]
agent.sinks.sink-1.hdfs.kerberosKeytab = /etc/flume-ng/conf/flume.keytab
agent.sinks.sink-1.hdfs.proxyUser = weblogs
agent.sinks.sink-2.type = HDFS
agent.sinks.sink-2.hdfs.kerberosPrincipal = flume/[email protected]
agent.sinks.sink-2.hdfs.kerberosKeytab = /etc/flume-ng/conf/flume.keytab
agent.sinks.sink-2.hdfs.proxyUser = applogs
In the above example, the flume Kerberos principal impersonates the user weblogs in sink-1 and the user
applogs in sink-2. This will only be allowed if the Kerberos KDC authenticates the specified principal (flume
in this case), and if the NameNode authorizes impersonation of the specified proxy user by the specified principal.
Limitations
At this time, Flume does not support using multiple Kerberos principals or keytabs in the same agent. Therefore,
if you want to create files as multiple users on HDFS, then impersonation must be configured, and exactly one
where fully.qualified.domain.name is the fully qualified domain name of the given Flume agent host
machine, and YOUR-REALM.COM is the Kerberos realm.
Each Flume agent machine that writes to HDFS does not need to have a flume Unix user account to write
files owned by the flume Hadoop/Kerberos user. Only the keytab for the flume Hadoop/Kerberos user is
required on the Flume agent machine.
DataNode machines do not need Flume Kerberos keytabs and also do not need the flume Unix user account.
TaskTracker (MRv1) or NodeManager (YARN) machines need a flume Unix user account if and only if
MapReduce jobs are being run as the flume Hadoop/Kerberos user.
The NameNode machine needs to be able to resolve the groups of the flume user. The groups of the flume
user on the NameNode machine are mapped to the Hadoop groups used for authorizing access.
The NameNode machine does not need a Flume Kerberos keytab.
If HBase is running with the AccessController coprocessor, the flume user (or whichever user the agent is
running as) must have permissions to write to the same table and the column family that the sink is configured
to write to. You can grant permissions using the grant command from the HBase shell as explained in HBase
Security Configuration.
The Flume HBase Sink does not currently support impersonation; it will write to HBase as the user the agent
is being run as.
If you want to use HDFS Sink and HBase Sink to write to HDFS and HBase from the same agent respectively,
both sinks have to use the same principal and keytab. If you want to use different credentials, the sinks have
to be on different agents.
Each Flume agent machine that writes to HBase (via a configured HBase sink) needs a Kerberos principal of
the form:
flume/[email protected]
where fully.qualified.domain.name is the fully qualified domain name of the given Flume agent host
machine, and YOUR-REALM.COM is the Kerberos realm.
cacerts
Path to the Certificate Authority certificates.
key
Path to the private key file.
cert
Path to the public certificate file.
validate
Choose whether Hue should validate certificates received from the server.
Default: true
options={"ssl":{"ca":"/tmp/ca-cert.pem"}}
Session Timeout
Session timeouts can be set by specifying the ttl configuration property under the [desktop]>[[session]]
section in hue.ini.
ttl
The cookie containing the users' session ID will expire after this amount
of time in seconds.
Default: 60*60*24*14
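For example, to shorten the session lifetime to one day, a hue.ini fragment similar to the following could be used (the value shown, 86400 seconds, is purely illustrative):
[desktop]
[[session]]
ttl=86400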
Secure Cookies
Secure session cookies can be enabled by specifying the secure configuration property under the
[desktop]>[[session]] section in hue.ini. Additionally, you can set the http-only flag for cookies containing
users' session IDs.
secure
The cookie containing the users' session ID will be secure. Should only be
enabled with HTTPS.
Default: false
http-only
The cookie containing the users' session ID will use the HTTP only flag.
Default: false
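For example, a hue.ini fragment that enables both flags might look similar to the following (the property names follow the descriptions above; verify them against the commented defaults in your own hue.ini):
[desktop]
[[session]]
secure=true
http-only=true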
Default: options,get,head,post,put,delete,connect
Default: !aNULL:!eNULL:!LOW:!EXPORT:!SSLv2
For example, to restrict users to your local domain and FQDN, the following
value can be used:
^\/.*$,^https:\/\/fanyv88.com:443\/http\/www.mydomain.com\/.*$
where: hue is the principal the Hue server is running as, hue.server.fully.qualified.domain.name is
the fully-qualified domain name (FQDN) of your Hue server, and YOUR-REALM.COM is the name of the Kerberos
realm your Hadoop cluster is in.
2. Create a keytab file for the Hue principal using the same procedure that you used to create the keytab for
the hdfs or mapred principal for a specific host. You should name this file hue.keytab and put this keytab
file in the directory /etc/hue on the machine running the Hue server. Like all keytab files, this file should
have the most limited set of permissions possible. It should be owned by the user running the hue server
(usually hue) and should have the permission 400.
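Assuming the keytab has been copied to /etc/hue/hue.keytab and the Hue server runs as the hue user, the ownership and permissions could be set as follows:
$ sudo chown hue:hue /etc/hue/hue.keytab
$ sudo chmod 400 /etc/hue/hue.keytab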
3. To test that the keytab file was created properly, try to obtain Kerberos credentials as the Hue principal using
only the keytab file. Substitute your FQDN and realm in the following command:
$ kinit -k -t /etc/hue/hue.keytab
hue/[email protected]
4. In the /etc/hue/hue.ini configuration file, add the following lines in the sections shown. Replace the
kinit_path value, /usr/kerberos/bin/kinit, shown below with the correct path on the user's system.
[desktop]
[[kerberos]]
# Path to Hue's Kerberos keytab file
hue_keytab=/etc/hue/hue.keytab
# Kerberos principal name for Hue
hue_principal=hue/FQDN@REALM
# add kinit path for non root users
kinit_path=/usr/kerberos/bin/kinit
[beeswax]
# If Kerberos security is enabled, use fully-qualified domain name (FQDN)
## hive_server_host=<FQDN of Hive Server>
# Hive configuration directory, where hive-site.xml is located
## hive_conf_dir=/etc/hive/conf
[impala]
## server_host=localhost
## impala_principal=impala/impalad.hostname.domainname.com
[search]
# URL of the Solr Server
## solr_url=http://localhost:8983/solr/
# Requires FQDN in solr_url if enabled
## security_enabled=false
[hadoop]
[[hdfs_clusters]]
[[[default]]]
# Enter the host and port on which you are running the Hadoop NameNode
namenode_host=FQDN
hdfs_port=8020
http_port=50070
security_enabled=true
# Thrift plugin port for the name node
## thrift_port=10090
# Configuration for YARN (MR2)
# ------------------------------------------------------------------------
[[yarn_clusters]]
[[[default]]]
Important:
In the /etc/hue/hue.ini file, verify the following:
Make sure the jobtracker_host property is set to the fully-qualified domain name of the
host running the JobTracker. The JobTracker host name must be fully-qualified in a secured
environment.
Make sure the fs_defaultfs property under each [[hdfs_clusters]] section contains the
fully-qualified domain name of the file system access point, which is typically the NameNode.
Make sure the hive_conf_dir property under the [beeswax] section points to a directory
containing a valid hive-site.xml (either the original or a synced copy).
Make sure the FQDN specified for HiveServer2 is the same as the FQDN specified for the
hue_principal configuration property. Without this, HiveServer2 will not work with security
enabled.
5. In the /etc/hadoop/conf/core-site.xml configuration file on all of your cluster nodes, add the following
lines:
<!-- Hue security configuration -->
<property>
<name>hue.kerberos.principal.shortname</name>
<value>hue</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value> <!-- A group which all users of Hue belong to, or the wildcard
value "*" -->
</property>
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>hue.server.fully.qualified.domain.name</value>
</property>
Important:
Make sure you change the /etc/hadoop/conf/core-site.xml configuration file on all of your
cluster nodes.
6. If Hue is configured to communicate with Hadoop via HttpFS, then you must add the following properties to
httpfs-site.xml:
<property>
<name>httpfs.proxyuser.hue.hosts</name>
<value>fully.qualified.domain.name</value>
</property>
<property>
<name>httpfs.proxyuser.hue.groups</name>
<value>*</value>
</property>
7. Add the following properties to the Oozie server oozie-site.xml configuration file in the Oozie configuration
directory:
<property>
<name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
<value>*</value>
</property>
8. Restart the JobTracker to load the changes from the core-site.xml file.
$ sudo service hadoop-0.20-mapreduce-jobtracker restart
10. Restart the NameNode, JobTracker, and all DataNodes to load the changes from the core-site.xml file.
$ sudo service hadoop-0.20-(namenode|jobtracker|datanode) restart
The LDAP authentication backend will automatically create users that don't exist in Hue by default. Hue needs
to import users in order to properly perform the authentication. Passwords are never imported when importing
users. If you want to disable automatic import, set the create_users_on_login property under the [desktop]
> [[ldap]] section of hue.ini to false.
[desktop]
[[ldap]]
create_users_on_login=false
The purpose of disabling the automatic import is to allow only a predefined list of manually imported users to
log in.
There are two ways to authenticate with a directory service through Hue:
Search Bind
Direct Bind
Search Bind
The search bind mechanism for authenticating will perform an ldapsearch against the directory service and
bind using the found distinguished name (DN) and password provided. This is the default method of authentication
used by Hue with LDAP.
The following configuration properties under the [desktop] > [[ldap]] > [[[users]]] section in hue.ini
can be set to restrict the search process.
user_filter
user_name_attr
With the above configuration, the LDAP search filter will take on the form:
(&(objectClass=*)(sAMAccountName=<user entered username>))
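For example, a hue.ini fragment that would produce the search filter shown above might look similar to the following (the attribute values are illustrative and depend on your directory service):
[desktop]
[[ldap]]
[[[users]]]
user_filter="objectclass=*"
user_name_attr=sAMAccountName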
nt_domain
The NT domain to connect to (only for use with Active Directory). This
AD-specific property allows Hue to authenticate with AD without having
to follow LDAP references to other partitions. This typically maps to the
email address of the user or the user's ID in conjunction with the domain.
If provided, Hue will use User Principal Names (UPNs) to bind to the LDAP
service.
Default: mycompany.com
ldap_username_pattern
Provides a template for the DN that will ultimately be sent to the directory
service when authenticating. The <username> parameter will be replaced
with the username provided at login.
Default: "uid=<username>,ou=People,dc=mycompany,dc=com"
Groups can also be imported using the User Admin interface, and users can be added to this group. Not only
can groups be discovered via DN and rDN search, but users that are members of the group or members of its
subordinate groups can be imported as well.
LDAPS/StartTLS support
Secure communication with LDAP is provided using the SSL/TLS and StartTLS protocols. They allow Hue to
validate the directory service it is going to converse with. Hence, if a Certificate Authority certificate file is provided,
Hue will communicate using LDAPS. You can specify the path to the CA certificate under:
[desktop]
[[ldap]]
ldap_cert=/etc/hue/ca.crt
Oracle Linux:
For Oracle Linux systems, download the xmlsec1 package from http://www.aleksey.com/xmlsec/ and execute
the following commands:
tar -xvzf xmlsec1-<version>.tar.gz
cd xmlsec1-<version>
./configure && make
sudo make install
Important: The xmlsec1 package must be executable by the user running Hue.
You should now be able to install djangosaml2 and pysaml2 on your machines.
build/env/bin/pip install -e git+https://github.com/abec/pysaml2@HEAD#egg=pysaml2
build/env/bin/pip install -e git+https://github.com/abec/djangosaml2@HEAD#egg=djangosaml2
xmlsec_binary
Path to the xmlsec1 binary, which is used to sign, verify, encrypt, and decrypt SAML requests and assertions.
create_users_on_login
Create Hue users received in assertion response upon successful login. The
value for this parameter can be either "true" or "false".
required_attributes
Attributes Hue asks for from the IdP. This is a comma-separated list of
attributes. For example, uid, email and so on.
optional_attributes
Optional attributes Hue can ask for from the IdP. Also a comma-separated
list of attributes.
metadata_file
This is a path to the IdP metadata copied to a local file. This file should be
readable.
key_file
Path to the private key used to encrypt the metadata. File format .PEM
cert_file
Path to the X.509 certificate to be sent along with the encrypted metadata.
File format .PEM
user_attribute_mapping
Mapping from attributes received from the IdP to Hue's Django user
attributes. For example, {'uid':'username', 'email':'email'}.
logout_requests_signed
Important:
If the NameNode, Secondary NameNode, DataNode, JobTracker, TaskTrackers, ResourceManager,
NodeManagers, HttpFS, or Oozie services are configured to use Kerberos HTTP SPNEGO authentication,
and two or more of these services are running on the same host, then all of the running services must
use the same HTTP principal and keytab file used for their HTTP endpoints.
Important:
The HTTP/ component of the HTTP service user principal must be upper case as shown in the
syntax and example above.
3. Create keytab files with both principals.
$ kadmin
kadmin: xst -k oozie.keytab oozie/fully.qualified.domain.name
kadmin: xst -k http.keytab HTTP/fully.qualified.domain.name
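The merge itself is typically performed with ktutil; a minimal sketch, assuming the keytab file names used in the preceding step, is:
$ ktutil
ktutil: rkt oozie.keytab
ktutil: rkt http.keytab
ktutil: wkt oozie-http.keytab
ktutil: quit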
5. Test that credentials in the merged keytab file work. For example:
$ klist -e -k -t oozie-http.keytab
6. Copy the oozie-http.keytab file to the Oozie configuration directory. The owner of the oozie-http.keytab
file should be the oozie user and the file should have owner-only read permissions.
7. Edit the Oozie server oozie-site.xml configuration file in the Oozie configuration directory by setting the
following properties:
Important: You must restart the Oozie server to have the configuration changes take effect.
Property
Value
oozie.service.HadoopAccessorService.kerberos.enabled
true
local.realm
<REALM>
oozie.service.HadoopAccessorService.keytab.file
/etc/oozie/conf/oozie-http.keytab for a package installation, or <EXPANDED_DIR>/conf/oozie-http.keytab for a tarball installation
oozie.service.HadoopAccessorService.kerberos.principal
oozie/<fully.qualified.domain.name>@<YOUR-REALM.COM>
oozie.authentication.type
kerberos
oozie.authentication.kerberos.principal
HTTP/<fully.qualified.domain.name>@<YOUR-REALM.COM>
oozie.authentication.kerberos.name.rules
Use the value configured for hadoop.security.auth_to_local in core-site.xml
oozie/[email protected]
oozie/[email protected]
oozie/[email protected]
HTTP/[email protected]
HTTP/[email protected]
HTTP/[email protected]
HTTP/[email protected]
The keystore file will be named .keystore and located in the oozie user's home directory.
2. You will now be asked a series of questions in an interactive prompt. Below is a sample of what this looks
like, along with some responses:
$ sudo -u oozie keytool -genkey -alias tomcat -keyalg RSA
Enter keystore password: password
Re-enter new password: password
What is your first and last name?
[Unknown]: oozie.server.hostname
What is the name of your organizational unit?
[Unknown]: Engineering
What is the name of your organization?
[Unknown]: A Great Company
What is the name of your City or Locality?
[Unknown]: Anywhere
What is the name of your State or Province?
[Unknown]: CA
What is the two-letter country code for this unit?
[Unknown]: US
Is CN=oozie.server.hostname, OU=Engineering, O=A Great Company, L=Anywhere, ST=CA,
C=US correct?
[no]: yes
Enter key password for <tomcat>
(RETURN if same as keystore password):
Important:
The password you enter for "keystore password" and "key password for <tomcat>" must be the
same. If you want to use a password other than "password", you will need to make an additional
change later when configuring the Oozie Server.
3. Run the following command to export a certificate file from the keystore file:
sudo -u oozie keytool -exportcert -alias tomcat -file
path/to/where/I/want/my/certificate.cert
The keystore file will be named .keystore and located in the oozie user's home directory.
Configure the Oozie Server to use SSL (HTTPS)
1. Stop Oozie by running
sudo /sbin/service oozie stop
2. To enable SSL, set the MapReduce version that the Oozie server should work with using the alternatives
command.
Note: The alternatives command is only available on RHEL systems. For SLES, Ubuntu and
Debian systems, the command is update-alternatives.
For RHEL systems, to use YARN with SSL:
alternatives --set oozie-tomcat-conf /etc/oozie/tomcat-conf.https
Important:
The OOZIE_HTTPS_KEYSTORE_PASS variable must be the same as the password used when creating
the keystore file. If you used a password other than password, you'll have to change the value of
the OOZIE_HTTPS_KEYSTORE_PASS variable in this file.
3. Start Oozie by running
sudo /sbin/service oozie start
Where ${JRE_cacerts} is the path to the JRE's certs file. Its location may differ depending on the operating
system, but it is typically called cacerts and located at ${JAVA_HOME}/lib/security/cacerts, though it may be
under a different directory in ${JAVA_HOME} (you may want to create a backup copy of this file first). The
default password is changeit.
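The import into the JRE certs file referenced above is typically performed with keytool; a sketch, assuming the certificate exported earlier is at path/to/where/I/want/my/certificate.cert, is:
$ sudo keytool -import -alias tomcat -file path/to/where/I/want/my/certificate.cert -keystore ${JRE_cacerts}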
3. When using the Oozie Client, you will need to use https://oozie.server.hostname:11443/oozie instead of
http://oozie.server.hostname:11000/oozie; Java will not automatically redirect from the http address to
the https address.
Connect to the Oozie Web UI using SSL (HTTPS)
Use https://oozie.server.hostname:11443/oozie, though most browsers should automatically redirect you if
you use http://oozie.server.hostname:11000/oozie.
Important:
If using a Self-Signed Certificate, your browser will warn you that it can't verify the certificate or
something similar. You will probably have to add your certificate as an exception.
Important:
If the NameNode, Secondary NameNode, DataNode, JobTracker, TaskTrackers, ResourceManager,
NodeManagers, HttpFS, or Oozie services are configured to use Kerberos HTTP SPNEGO authentication,
and two or more of these services are running on the same host, then all of the running services must
use the same HTTP principal and keytab file used for their HTTP endpoints.
2. Create an HTTP service user principal that is used to authenticate user requests coming to the HttpFS HTTP
web-services. The syntax of the principal is: HTTP/<fully.qualified.domain.name>@<YOUR-REALM>
where: 'fully.qualified.domain.name' is the host where the HttpFS server is running, and YOUR-REALM is
the name of your Kerberos realm.
kadmin: addprinc -randkey HTTP/[email protected]
Important:
The HTTP/ component of the HTTP service user principal must be upper case as shown in the
syntax and example above.
3. Create keytab files with both principals.
$ kadmin
kadmin: xst -k httpfs.keytab httpfs/fully.qualified.domain.name
kadmin: xst -k http.keytab HTTP/fully.qualified.domain.name
5. Test that credentials in the merged keytab file work. For example:
$ klist -e -k -t httpfs-http.keytab
6. Copy the httpfs-http.keytab file to the HttpFS configuration directory. The owner of the
httpfs-http.keytab file should be the httpfs user and the file should have owner-only read permissions.
7. Edit the HttpFS server httpfs-site.xml configuration file in the HttpFS configuration directory by setting
the following properties:
Property
Value
httpfs.authentication.type
kerberos
httpfs.hadoop.authentication.type
kerberos
httpfs.authentication.kerberos.principal
HTTP/<HTTPFS-HOSTNAME>@<YOUR-REALM.COM>
httpfs.authentication.kerberos.keytab
/etc/hadoop-httpfs/conf/httpfs-http.keytab
httpfs.hadoop.authentication.kerberos.principal
httpfs/<HTTPFS-HOSTNAME>@<YOUR-REALM.COM>
httpfs.hadoop.authentication.kerberos.keytab
/etc/hadoop-httpfs/conf/httpfs-http.keytab
httpfs.authentication.kerberos.name.rules
Use the value configured for hadoop.security.auth_to_local in core-site.xml
Important:
You must restart the HttpFS server to have the configuration changes take effect.
where: The --negotiate option enables SPNEGO in curl. The -u : option is required but the user name is
ignored (the principal that has been specified for kinit is used). The -b and -c options are used to store
and send HTTP cookies.
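A curl invocation following this pattern might look similar to the following (the host name and WebHDFS operation are placeholders):
$ kinit
$ curl --negotiate -u : -b ~/.httpfs_cookies -c ~/.httpfs_cookies \
"http://<httpfs_server_hostname>:14000/webhdfs/v1/?op=GETHOMEDIRECTORY"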
The keystore file will be named .keystore and located in the httpfs user's home directory.
2. You will now be asked a series of questions in an interactive prompt. Below is a sample of what this looks
like, along with some responses:
$ sudo -u httpfs keytool -genkey -alias tomcat -keyalg RSA
Enter keystore password: password
Re-enter new password: password
What is your first and last name?
[Unknown]: httpfs.server.hostname
What is the name of your organizational unit?
[Unknown]: Engineering
What is the name of your organization?
[Unknown]: A Great Company
What is the name of your City or Locality?
[Unknown]: Anywhere
What is the name of your State or Province?
[Unknown]: CA
What is the two-letter country code for this unit?
[Unknown]: US
Is CN=httpfs.server.hostname, OU=Engineering, O=A Great Company, L=Anywhere, ST=CA,
C=US correct?
[no]: yes
Enter key password for <tomcat>
(RETURN if same as keystore password):
Important:
The password you enter for "keystore password" and "key password for <tomcat>" must be the
same. If you want to use a password other than "password", you will need to make an additional
change later when configuring the HttpFS Server.
Important:
The answer to "What is your first and last name?" (i.e. "CN") must be the hostname of the machine
where the HttpFS Server will be running.
The keystore file will be named .keystore and located in the httpfs user's home directory.
Configure the HttpFS Server to use SSL (HTTPS)
1. Stop HttpFS by running
sudo /sbin/service hadoop-httpfs stop
2. To enable SSL, change which configuration the HttpFS server should work with using the alternatives
command.
Note: The alternatives command is only available on RHEL systems. For SLES, Ubuntu and
Debian systems, the command is update-alternatives.
For RHEL systems, to use SSL:
alternatives --set hadoop-httpfs-tomcat-conf /etc/hadoop-httpfs/tomcat-conf.https
Important:
The HTTPFS_SSL_KEYSTORE_PASS variable must be the same as the password used when creating
the keystore file. If you used a password other than password, you'll have to change the value of
the HTTPFS_SSL_KEYSTORE_PASS variable in /etc/hadoop-httpfs/conf/httpfs-env.sh.
3. Start HttpFS by running
sudo /sbin/service hadoop-httpfs start
Where ${JRE_cacerts} is the path to the JRE's certs file. Its location may differ depending on the operating
system, but it is typically called cacerts and located at ${JAVA_HOME}/lib/security/cacerts, though it may be
under a different directory in ${JAVA_HOME} (you may want to create a backup copy of this file first). The
default password is changeit.
3. When using the HttpFS Client, you will need to use
https://<httpfs_server_hostname>:14000/webhdfs/v1/ instead of
http://<httpfs_server_hostname>:14000/webhdfs/v1/; Java will not automatically redirect from the
http address to the https address.
Note:
These instructions have been tested with CDH and MIT Kerberos 5 only.
Important:
Although an HBase Thrift server can connect to a secured Hadoop cluster, access is not secured from
clients to the HBase Thrift server.
2. On every HBase client host, add the same properties to the hbase-site.xml configuration file:
<property>
<name>hbase.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hbase.rpc.engine</name>
<value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
where: fully.qualified.domain.name is the host where the HBase server is running, and YOUR-REALM is the
name of your Kerberos realm.
2. Create a keytab file for the HBase server.
$ kadmin
kadmin: xst -k hbase.keytab hbase/fully.qualified.domain.name
3. Copy the hbase.keytab file to the /etc/hbase/conf directory on the HBase server host. The owner of the
hbase.keytab file should be the hbase user and the file should have owner-only read permissions. That is,
assign the file 0600 permissions and make it owned by hbase:hbase.
-r--------  1 hbase  hbase  hbase.keytab
4. To test that the keytab file was created properly, try to obtain Kerberos credentials as the HBase principal
using only the keytab file. Substitute your fully.qualified.domain.name and realm in the following
command:
$ kinit -k -t /etc/hbase/conf/hbase.keytab
hbase/[email protected]
Important:
Make sure you change the /etc/hbase/conf/hbase-site.xml configuration file on all of your
cluster hosts that are running the HBase daemon.
Step 2: Configure HBase Servers and Clients to Authenticate with a Secure ZooKeeper
In order to run a secure HBase, you must also use a secure ZooKeeper. To use your secure ZooKeeper, each
HBase host machine (Master, Region Server, and client) must have a principal that allows it to authenticate with
your secure ZooKeeper ensemble. Note, this HBase section assumes that your secure ZooKeeper is already
configured according to the instructions in the ZooKeeper Security Configuration section and not managed by
HBase.
This HBase section also assumes that you have successfully completed the previous steps, and already have a
principal and keytab file created and in place for every HBase server and client.
Configure HBase JVMs (all Masters, Region Servers, and clients) to use JAAS
1. On each host, set up a Java Authentication and Authorization Service (JAAS) by creating a
/etc/hbase/conf/zk-jaas.conf file that contains the following:
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
useTicketCache=false
keyTab="/etc/hbase/conf/hbase.keytab"
principal="hbase/fully.qualified.domain.name@<YOUR-REALM>";
};
2. Modify the hbase-env.sh file on HBase server and client hosts to include the following:
export HBASE_OPTS="$HBASE_OPTS
-Djava.security.auth.login.config=/etc/hbase/conf/zk-jaas.conf"
export HBASE_MANAGES_ZK=false
where $ZK_NODES is the comma-separated list of hostnames of the ZooKeeper Quorum hosts that you
configured according to the instructions in ZooKeeper Security Configuration.
2. Add the following lines to the ZooKeeper configuration file zoo.cfg:
kerberos.removeHostFromPrincipal=true
kerberos.removeRealmFromPrincipal=true
Start HBase
If the configuration worked, you should see something similar to the following in the HBase Master and Region
Server logs when you start the cluster:
INFO zookeeper.ZooKeeper: Initiating client connection,
connectString=ZK_QUORUM_SERVER:2181 sessionTimeout=180000 watcher=master:60000
INFO zookeeper.ClientCnxn: Opening socket connection to server /ZK_QUORUM_SERVER:2181
INFO zookeeper.RecoverableZooKeeper: The identifier of this process is
PID@ZK_QUORUM_SERVER
INFO zookeeper.Login: successfully logged in.
INFO client.ZooKeeperSaslClient: Client will use GSSAPI as SASL mechanism.
INFO zookeeper.Login: TGT refresh thread started.
INFO zookeeper.ClientCnxn: Socket connection established to ZK_QUORUM_SERVER:2181,
initiating session
INFO zookeeper.Login: TGT valid starting at:
Sun Apr 08 22:43:59 UTC 2012
INFO zookeeper.Login: TGT expires:
Mon Apr 09 22:43:59 UTC 2012
INFO zookeeper.Login: TGT refresh sleeping until: Mon Apr 09 18:30:37 UTC 2012
INFO zookeeper.ClientCnxn: Session establishment complete on server
ZK_QUORUM_SERVER:2181, sessionid = 0x134106594320000, negotiated timeout = 180000
Job Title              Scope    Permissions
Senior Administrator   Global   Access, Create
Junior Administrator   Global   Create
Table Administrator    Table    Access
Data Analyst           Table    Read
Web Application        Table    Read, Write
Further Reading
Access Control Matrix
Security - Apache HBase Reference Guide
Note:
Once the Access Controller coprocessor is enabled, any user who uses the HBase shell will be subject
to access control. Access control will also be in effect for native (Java API) client access to HBase.
In the above commands, fields enclosed in <> are variables, and fields in [] are optional. The permissions
variable must consist of zero or more characters from the set "RWXCA".
R denotes read permissions, which is required to perform Get, Scan, or Exists calls in a given scope.
W denotes write permissions, which is required to perform Put, Delete, LockRow, UnlockRow,
IncrementColumnValue, CheckAndDelete, CheckAndPut, Flush, or Compact in a given scope.
X denotes execute permissions, which is required to execute coprocessor endpoints.
C denotes create permissions, which is required to perform Create, Alter, or Drop in a given scope.
A denotes admin permissions, which is required to perform Enable, Disable, Snapshot, Restore, Clone,
Split, MajorCompact, Grant, Revoke, and Shutdown in a given scope.
For example:
grant 'user1', 'RWC'
grant 'user2', 'RW', 'tableA'
Be sure to review the information in Understanding HBase Access Levels on page 102 to understand the
implications of the different access levels.
The preceding example adds support for principals from the ONE.COM realm in a cluster that belongs to a different
realm. So, in the case of replication, you must add a rule for the master cluster realm in the slave cluster realm.
DEFAULT defines the default rule.
3. Add rules for creating short names in the Hadoop processes. To do this, add the
hadoop.security.auth_to_local property in the core-site.xml file in the slave cluster. For example,
to add support for the ONE.COM realm:
<property>
<name>hadoop.security.auth_to_local</name>
<value>
RULE:[2:$1@$0](.*@\QONE.COM\E$)s/@\QONE.COM\E$//
DEFAULT
</value>
</property>
For more information about adding rules, see Appendix C - Configuring the Mapping from Kerberos Principals
to Short Names.
The preceding examples set up a symbolic name of server1 to refer to the current instance of Impala. This
symbolic name is used in the following ways:
In an environment managed by Cloudera Manager, the server name is specified through Impala > Service-Wide
> Advanced > Server Name for Sentry Authorization and Hive > Service-Wide > Advanced > Server Name for
Sentry Authorization. The values must be the same for both, so that Impala and Hive can share the privilege
rules. Restart the Impala and Hive services after setting or changing this value.
In an environment not managed by Cloudera Manager, you specify this value for the sentry.hive.server
property in the sentry-site.xml configuration file for Hive, as well as in the -server_name option for
impalad.
When impalad is started with one or both of the -server_name=server1 and -authorization_policy_file
options, Impala authorization is enabled. If Impala detects any errors or inconsistencies in the authorization
settings or the policy file, the daemon refuses to start.
Using Impala with the Sentry Service (CDH 5.1 or higher only)
When you use the Sentry service rather than the policy file, you set up privileges through GRANT and REVOKE
statements in Hive; Impala then inherits those same privileges automatically. (Currently, Impala does not implement
the GRANT and REVOKE statements.)
Hive already had GRANT and REVOKE statements prior to CDH 5.1, but those statements were not production-ready.
CDH 5.1 is the first release where those statements use the Sentry framework and are considered GA level. If
you used the Hive GRANT and REVOKE statements prior to CDH 5.1, you must set up these privileges with the
CDH 5.1 versions of GRANT and REVOKE to take advantage of Sentry authorization.
For information about using the updated Hive GRANT and REVOKE statements, see Sentry service topic in the
CDH 5 Security Guide.
For the server_name value, substitute the same symbolic name you specify with the impalad -server_name
option. You can use * wildcard characters at each level of the privilege specification to allow access to all such
objects. For example:
server=impala-host.example.com->db=default->table=t1->action=SELECT
server=impala-host.example.com->db=*->table=*->action=CREATE
server=impala-host.example.com->db=*->table=audit_log->action=SELECT
server=impala-host.example.com->db=default->table=t1->action=*
When authorization is enabled, Impala uses the policy file as a whitelist, representing every privilege available
to any user on any object. That is, only operations specified for the appropriate combination of object, role, group,
and user are allowed; all other operations are not allowed. If a group or role is defined multiple times in the
policy file, the last definition takes precedence.
To understand the notion of whitelisting, set up a minimal policy file that does not provide any privileges for any
object. When you connect to an Impala node where this policy file is in effect, you get no results for SHOW
DATABASES, and an error when you issue any SHOW TABLES, USE database_name, DESCRIBE table_name,
SELECT, or other statements that expect to access databases or tables, even if the corresponding databases
and tables exist.
The contents of the policy file are cached, to avoid a performance penalty for each query. The policy file is
re-checked by each impalad node every 5 minutes. When you make a non-time-sensitive change such as adding
new privileges or new users, you can let the change take effect automatically a few minutes later. If you remove
or reduce privileges, and want the change to take effect immediately, restart the impalad daemon on all nodes,
again specifying the -server_name and -authorization_policy_file options so that the rules from the
updated policy file are applied.
Then the following policy file specifies read-only privilege for that view, without authorizing access to the
underlying table:
[groups]
cloudera = view_only_privs
[roles]
view_only_privs = server=server1->db=reports->table=name_address_view->action=SELECT
Thus, a user with the view_only_privs role could access through Impala queries the basic information but not
the sensitive information, even if both kinds of information were part of the same data file.
You might define other views to allow users from different groups to query different sets of columns.
Separating Administrator Responsibility from Read and Write Privileges
Remember that to create a database requires full privilege on that database, while day-to-day operations on
tables within that database can be performed with lower levels of privilege on specific tables. Thus, you might
set up separate roles for each database or application: an administrative one that could create or drop the
database, and a user-level one that can access only the relevant tables.
To enable URIs in per-DB policy files, add the following string in the Cloudera Manager field Impala Service
Environment Advanced Configuration Snippet (Safety Valve):
JAVA_TOOL_OPTIONS="-Dsentry.allow.uri.db.policyfile=true"
Important: Enabling URIs in per-DB policy files introduces a security risk by allowing the owner of
the db-level policy file to grant himself/herself load privileges to anything the impala user has read
permissions for in HDFS (including data in other databases controlled by different db-level policy
files).
The server name is specified by the -server_name option when impalad starts. Specify the same name for all
impalad nodes in the cluster.
URIs represent the HDFS paths you specify as part of statements such as CREATE EXTERNAL TABLE and LOAD
DATA. Typically, you specify what look like UNIX paths, but these locations can also be prefixed with hdfs:// to
make clear that they are really URIs. To set privileges for a URI, specify the name of a directory, and the privilege
applies to all the files in that directory and any directories underneath it.
There are no separate privileges for individual table partitions or columns. To specify read privileges at this
level, you create a view that queries specific columns and/or partitions from a base table, and give SELECT
privilege on the view but not the underlying table. See Views for details about views in Impala.
URIs must start with either hdfs:// or file://. If a URI starts with anything else, it will cause an exception
and the policy file will be invalid. When defining URIs for HDFS, you must also specify the NameNode. For example:
data_read = server=server1->uri=file:///path/to/dir, \
server=server1->uri=hdfs://namenode:port/path/to/dir
Warning:
Because the NameNode host and port must be specified, Cloudera strongly recommends you use
High Availability (HA). This ensures that the URI will remain constant even if the namenode changes.
data_read = server=server1->uri=file:///path/to/dir,\
server=server1->uri=hdfs://ha-nn-uri/path/to/dir
Privilege   Object
INSERT      DB, TABLE
SELECT      DB, TABLE
ALL         SERVER, TABLE, DB, URI
Operation
Scope
Privileges
EXPLAIN
TABLE
SELECT
LOAD DATA
TABLE
INSERT
CREATE DATABASE
SERVER
ALL
DROP DATABASE
DATABASE
ALL
CREATE TABLE
DATABASE
ALL
DROP TABLE
TABLE
ALL
DESCRIBE TABLE
TABLE
SELECT/INSERT
TABLE
ALL
ALL
ALL
ALL
TABLE
ALL
TABLE
ALL
TABLE
ALL
TABLE
ALL
TABLE
ALL
TABLE
ALL
ALTER TABLE ..
PARTITION SET
FILEFORMAT
TABLE
ALL
TABLE
ALL
CREATE VIEW
URI
Others
URI
URI
URI
SELECT on TABLE
DROP VIEW
VIEW/TABLE
ALL
ALTER VIEW
TABLE
ALL
CREATE EXTERNAL
TABLE
ALL, SELECT
SELECT
TABLE
SELECT
USE <dbName>
Any
CREATE FUNCTION
SERVER
ALL
DROP FUNCTION
SERVER
ALL
TABLE
SELECT/INSERT
INVALIDATE METADATA
SERVER
ALL
INVALIDATE METADATA
<table name>
TABLE
SELECT/INSERT
COMPUTE STATS
TABLE
ALL
TABLE
SELECT/INSERT
TABLE
SELECT/INSERT
SHOW FUNCTIONS
DATABASE
SELECT
SHOW TABLES
No special privileges needed to issue the statement, but only shows objects you are authorized for
SHOW DATABASES, SHOW SCHEMAS
No special privileges needed to issue the statement, but only shows objects you are authorized for
which indicate each evaluation Sentry makes. The FilePermission is from the policy file, while
RequestPermission is the privilege required for the query. A RequestPermission will iterate over all appropriate
FilePermission settings until a match is found. If no matching privilege is found, Sentry returns false indicating
Access Denied.
Note: Make sure to use single quotes or escape characters to ensure that any * characters do not
undergo wildcard expansion when specified in command-line arguments.
See Modifying Impala Startup Options for details about adding or changing impalad startup options. See this
Cloudera blog post for background information about the impersonation capability in HiveServer2.
Important:
If you plan to use Impala in your cluster, you must configure your KDC to allow tickets to be renewed,
and you must configure krb5.conf to request renewable tickets. Typically, you can do this by adding
the max_renewable_life setting to your realm in kdc.conf, and by adding the renew_lifetime
parameter to the libdefaults section of krb5.conf. For more information about renewable tickets,
see the Kerberos documentation.
Currently, you cannot use the resource management feature in CDH 5 on a cluster that has Kerberos
authentication enabled.
Start all impalad and statestored daemons with the --principal and --keytab-file flags set to the
principal and full path name of the keytab file containing the credentials for the principal.
Blank for the ODBC 2.0 driver or higher, when connecting to a secure cluster.
HS2NoSasl for the ODBC 2.0 driver or higher, when connecting to a non-secure cluster.
To enable Kerberos in the Impala shell, start the impala-shell command using the -k flag.
To enable Impala to work with Kerberos security on your Hadoop cluster, make sure you perform the installation
and configuration steps in the topic on Configuring Hadoop Security in the CDH4 Security Guide or the CDH 5
Security Guide. Also note that when Kerberos security is enabled in Impala, a web browser that supports Kerberos
HTTP SPNEGO is required to access the Impala web console (for example, Firefox, Internet Explorer, or Chrome).
If the NameNode, Secondary NameNode, DataNode, JobTracker, TaskTrackers, ResourceManager, NodeManagers,
HttpFS, Oozie, Impala, or Impala statestore services are configured to use Kerberos HTTP SPNEGO authentication,
and two or more of these services are running on the same host, then all of the running services must use the
same HTTP principal and keytab file used for their HTTP endpoints.
Creating, merging, and distributing keytab files for these principals.
Editing /etc/default/impala (in clusters not managed by Cloudera Manager), or editing the Security settings
in the Cloudera Manager interface, to accommodate Kerberos authentication.
Note: The HTTP component of the service principal must be uppercase as shown in the preceding
example.
3. Create keytab files with both principals. For example:
kadmin: xst -k impala.keytab impala/impala_host.example.com
kadmin: xst -k http.keytab HTTP/impala_host.example.com
kadmin: quit
4. Use ktutil to read the contents of the two keytab files and then write those contents to a new file. For
example:
$ ktutil
ktutil: rkt impala.keytab
ktutil: rkt http.keytab
ktutil: wkt impala-http.keytab
ktutil: quit
6. Copy the impala-http.keytab file to the Impala configuration directory. Change the permissions to be only
read for the file owner and change the file owner to the impala user. By default, the Impala user and group
are both named impala. For example:
$ cp impala-http.keytab /etc/impala/conf
$ cd /etc/impala/conf
$ chmod 400 impala-http.keytab
$ chown impala:impala impala-http.keytab
7. Add Kerberos options to the Impala defaults file, /etc/default/impala. Add the options for both the
impalad and statestored daemons, using the IMPALA_SERVER_ARGS and IMPALA_STATE_STORE_ARGS
variables. For example, you might add:
-kerberos_reinit_interval=60
-principal=impala_1/[email protected]
-keytab_file=/var/run/cloudera-scm-agent/process/3212-impala-IMPALAD/impala.keytab
For more information on changing the Impala defaults specified in /etc/default/impala, see Modifying
Impala Startup Options.
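In /etc/default/impala, these flags are typically appended inside the existing IMPALA_SERVER_ARGS definition (and the corresponding ones inside IMPALA_STATE_STORE_ARGS); a sketch using the example values above:
IMPALA_SERVER_ARGS=" \
<existing flags> \
-kerberos_reinit_interval=60 \
-principal=impala_1/[email protected] \
-keytab_file=/var/run/cloudera-scm-agent/process/3212-impala-IMPALAD/impala.keytab"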
Note: Restart impalad and statestored for these configuration changes to take effect.
Query ID
Statement Type - DML, DDL, and so on
SQL statement text
Execution start time, in local time
Execution Status - Details on any errors that were encountered
Target Catalog Objects:
where:
hive.server2.authentication, in particular, is a client-facing property that controls the type of
authentication HiveServer2 uses for connections to clients. In this case, HiveServer2 uses Kerberos to
authenticate incoming clients.
The [email protected] value in the example above is the Kerberos principal for the host where
HiveServer2 is running. The special string _HOST in the properties is replaced at run-time by the fully-qualified
domain name of the host machine where the daemon is running. This requires that reverse DNS is working
properly on that host.
where hive is the principal configured in hive-site.xml and HiveServer2Host is the host where HiveServer2
is running.
For ODBC Clients, refer to the Cloudera ODBC Driver for Apache Hive documentation.
Using Beeline to Connect to a Secure HiveServer2
Use the following command to start beeline and connect to a secure running HiveServer2 process. In this
example, the HiveServer2 process is running on localhost at port 10000:
$ /usr/lib/hive/bin/beeline
beeline> !connect
jdbc:hive2://localhost:10000/default;principal=hive/[email protected]
0: jdbc:hive2://localhost:10000/default>
For more information about the Beeline CLI, see Using the Beeline CLI.
or: the Trust Store arguments are set using the Java system properties javax.net.ssl.trustStore
and javax.net.ssl.trustStorePassword; for example:
java -Djavax.net.ssl.trustStore=/home/usr1/ssl/trust_store.jks
-Djavax.net.ssl.trustStorePassword=xyz \
MyClass jdbc:hive2://localhost:10000/default;ssl=true
For more information on using self-signed certificates and the Trust Store, see the Oracle Java SE keytool page.
where:
The LDAP_URL value is the access URL for your LDAP server. For example, ldap://[email protected].
Enabling LDAP Authentication with HiveServer2 using OpenLDAP
To enable the LDAP mode of authentication using OpenLDAP, include the following properties in the
hive-site.xml file:
<property>
<name>hive.server2.authentication</name>
<value>LDAP</value>
</property>
<property>
<name>hive.server2.authentication.ldap.url</name>
<value>LDAP_URL</value>
</property>
<property>
<name>hive.server2.authentication.ldap.baseDN</name>
<value>LDAP_BaseDN</value>
</property>
where:
The LDAP_URL value is the access URL for your LDAP server.
The LDAP_BaseDN value is the base LDAP DN for your LDAP server. For example,
ou=People,dc=example,dc=com.
Configuring JDBC Clients for LDAP Authentication with HiveServer2
The JDBC client needs to use a connection URL as shown below. JDBC-based clients must include user=LDAP_Userid;password=LDAP_Password in the JDBC connection string.
For example:
String url = "jdbc:hive2://node1:10000/default;user=LDAP_Userid;password=LDAP_Password";
Connection con = DriverManager.getConnection(url);
where the LDAP_Userid value is the user id and LDAP_Password is the password of the client user.
For ODBC Clients, refer to the Cloudera ODBC Driver for Apache Hive documentation.
For clusters managed by Cloudera Manager, go to the Hive service and select Configuration > View and Edit.
Under the HiveServer2 category, go to the Advanced section and set the HiveServer2 Environment Safety
Valve property.
Restart HiveServer2.
Pluggable Authentication
Pluggable authentication allows you to provide a custom authentication provider for HiveServer2.
To enable pluggable authentication:
1. Set the following properties in /etc/hive/conf/hive-site.xml:
<property>
<name>hive.server2.authentication</name>
<value>CUSTOM</value>
<description>Client authentication types.
NONE: no authentication check
LDAP: LDAP/AD based authentication
KERBEROS: Kerberos/GSSAPI authentication
CUSTOM: Custom authentication provider
(Use with property hive.server2.custom.authentication.class)
</description>
</property>
<property>
<name>hive.server2.custom.authentication.class</name>
<value>pluggable-auth-class-name</value>
<description>
Custom authentication class. Used when property
'hive.server2.authentication' is set to 'CUSTOM'. Provided class
must be a proper implementation of the interface
org.apache.hive.service.auth.PasswdAuthenticationProvider. HiveServer2
will call its Authenticate(user, password) method to authenticate requests.
The implementation may optionally extend the Hadoop's
org.apache.hadoop.conf.Configured class to grab Hive's Configuration object.
</description>
</property>
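A minimal provider implementing this interface could look like the following sketch; the class name and the hard-coded credentials are hypothetical and for illustration only:
import javax.security.sasl.AuthenticationException;
import org.apache.hive.service.auth.PasswdAuthenticationProvider;

// Example only: accepts a single hard-coded account. A real provider would
// validate the credentials against an external system such as a database
// or a directory service.
public class SampleAuthenticationProvider implements PasswdAuthenticationProvider {
  @Override
  public void Authenticate(String user, String password) throws AuthenticationException {
    if (!"testuser".equals(user) || !"testpassword".equals(password)) {
      throw new AuthenticationException("Authentication failed for user: " + user);
    }
    // Returning normally signals a successful authentication.
  }
}
The compiled class would then be placed on the HiveServer2 classpath, with its fully-qualified name supplied as the value of hive.server2.custom.authentication.class.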
HiveServer2 Impersonation
Note: This is not the recommended method to implement HiveServer2 impersonation. Cloudera
recommends you use Sentry to implement this instead.
Impersonation support in HiveServer2 allows users to execute queries and access HDFS files as the connected
user rather than the super user who started the HiveServer2 daemon. Impersonation allows admins to enforce
an access policy at the file level using HDFS file and directory permissions.
To enable impersonation in HiveServer2:
1. Add the following property to the /etc/hive/conf/hive-site.xml file and set the value to true. (The
default value is false.)
<property>
<name>hive.server2.enable.impersonation</name>
<description>Enable user impersonation for HiveServer2</description>
<value>true</value>
</property>
2. In HDFS or MapReduce configurations, add the following property to the core-site.xml file:
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
Note:
The values shown above for the hive.metastore.kerberos.keytab.file and
hive.metastore.kerberos.principal properties are examples which you will need to replace
with the appropriate values for your cluster. Also note that the Hive keytab file should have its
access permissions set to 600 and be owned by the same account that is used to run the Metastore
server, which is the hive user by default.
Requests to access the metadata are fulfilled by the Hive metastore impersonating the requesting user. This
includes read access to the list of databases, tables, properties of each table such as their HDFS location, file
type and so on. You can restrict access to the Hive metastore service by allowing it to impersonate only a
subset of Kerberos users. This can be done by setting the hadoop.proxyuser.hive.groups property in
core-site.xml on the Hive metastore host.
For example, if you want to give the hive user permission to impersonate members of groups hive and
user1:
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>hive,user1</value>
</property>
In this example, the Hive metastore can impersonate users belonging to only the hive and user1 groups.
Connection requests from users not belonging to these groups will be rejected.
where:
You replace YOUR-REALM with the name of your Kerberos realm
You replace zookeeper1,zookeeper2,zookeeper3 with the names of your ZooKeeper servers. The
hbase.zookeeper.quorum property is configured in the hbase-site.xml file.
The special string _HOST is replaced at run-time by the fully-qualified domain name of the host machine
where the HBase Master or Region Server is running. This requires that reverse DNS is properly working on
all the hosts configured this way.
In the following, _HOST is the name of the host where the HBase Master is running:
-hiveconf hbase.master.kerberos.principal=hbase/[email protected]
In the following, _HOST is the host name of the HBase Region Server that the application is connecting to:
-hiveconf hbase.regionserver.kerberos.principal=hbase/[email protected]
Tip:
You can also set the HIVE_OPTS environment variable in your shell profile.
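For example, the same settings could be placed in a shell profile (a sketch reusing the example principals above):
export HIVE_OPTS="-hiveconf hbase.master.kerberos.principal=hbase/[email protected] -hiveconf hbase.regionserver.kerberos.principal=hbase/[email protected]"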
For more information about HCatalog see Installing and Using HCatalog.
2. Move the file into the WebHCat configuration directory and restrict its access exclusively to the hcatalog
user:
$ mv HTTP.keytab /etc/webhcat/conf/
$ chown hcatalog /etc/webhcat/conf/HTTP.keytab
$ chmod 400 /etc/webhcat/conf/HTTP.keytab
Property
Value
templeton.kerberos.secret
Any random value
templeton.kerberos.keytab
/etc/webhcat/conf/HTTP.keytab
templeton.kerberos.principal
HTTP/[email protected]
Example configuration:
<property>
<name>templeton.kerberos.secret</name>
<value>SuPerS3c3tV@lue!</value>
</property>
3. Test that the credentials in the keytab file work. For example:
$ klist -e -k -t llama.keytab
4. Copy the llama.keytab file to the Llama configuration directory. The owner of the llama.keytab file should
be the llama user and the file should have owner-only read permissions.
5. Edit the Llama llama-site.xml configuration file in the Llama configuration directory by setting the following
properties:
Property
Value
llama.am.server.thrift.security
true
llama.am.server.thrift.kerberos.keytab.file
llama/conf.keytab
llama.am.server.thrift.kerberos.server.principal.name llama/<fully.qualified.domain.name>
llama.am.server.thrift.kerberos.notification.principal.name impala
Important:
You must restart Llama to make the configuration changes take effect.
3. Copy the zookeeper.keytab file to the ZooKeeper configuration directory on the ZooKeeper server host.
For a package installation, the ZooKeeper configuration directory is /etc/zookeeper/conf/. For a tar ball
installation, the ZooKeeper configuration directory is <EXPANDED_DIR>/conf. The owner of the
zookeeper.keytab file should be the zookeeper user and the file should have owner-only read permissions.
4. Add the following lines to the ZooKeeper configuration file zoo.cfg:
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
jaasLoginRenew=3600000
5. Set up the Java Authentication and Authorization Service (JAAS) by creating a jaas.conf file in the ZooKeeper
configuration directory containing the following settings. Make sure that you substitute
fully.qualified.domain.name as appropriate.
Server {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/etc/zookeeper/conf/zookeeper.keytab"
storeKey=true
useTicketCache=false
principal="zookeeper/fully.qualified.domain.name@<YOUR-REALM>";
};
7. If you have multiple ZooKeeper servers in the ensemble, repeat steps 1 through 6 above for each ZooKeeper
server. When you create each new Zookeeper Server keytab file in step 2, you can overwrite the previous
keytab file and use the same name (zookeeper.keytab) to maintain consistency across the ZooKeeper
servers in the ensemble. The difference in the keytab files will be the hostname where each server is running.
8. Restart the ZooKeeper server to have the configuration changes take effect. For instructions, see ZooKeeper
Installation.
Note:
Some versions of kadmin do not support the -norandkey option in the command above. If your
version does not, you can omit it from the command. Note that doing so will result in a new
password being generated every time you export a keytab, which will invalidate previously-exported
keytabs.
3. Set up JAAS in the configuration directory on the host where the ZooKeeper client shell is running. For a
package installation, the configuration directory is /etc/zookeeper/conf/. For a tar ball installation, the
configuration directory is <EXPANDED_DIR>/conf. Create a jaas.conf file containing the following settings:
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="/path/to/zkcli.keytab"
storeKey=true
useTicketCache=false
principal="zkcli@<YOUR-REALM>";
};
4. Add the following setting to the java.env file located in the configuration directory. (Create the file if it does
not already exist.)
export JVMFLAGS="-Djava.security.auth.login.config=/etc/zookeeper/conf/jaas.conf"
3. Create a protected znode from within the ZooKeeper CLI. Make sure that you substitute YOUR-REALM as
appropriate.
create /znode1 znode1data sasl:zkcli@{{YOUR-REALM}}:cdwra
The results from getAcl should show that the proper scheme and permissions were applied to the znode.
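For example, from the ZooKeeper CLI, output similar to the following would be expected (exact formatting varies by ZooKeeper version):
getAcl /znode1
'sasl,'zkcli@{{YOUR-REALM}}
: cdrwa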
Note:
The HTTP/ component of the HTTP service user principal must be upper case as shown in the
syntax and example above.
3. Create keytab files with both principals.
kadmin: xst -norandkey -k solr.keytab solr/fully.qualified.domain.name \
HTTP/fully.qualified.domain.name
4. Test that credentials in the merged keytab file work. For example:
$ klist -e -k -t solr.keytab
5. Copy the solr.keytab file to the Solr configuration directory. The owner of the solr.keytab file should be
the solr user and the file should have owner-only read permissions.
To modify default configurations
Repeat this process on all Solr server nodes.
1. Ensure that the following properties appear in /etc/default/solr and that they are uncommented. Modify
these properties to match your environment. The relevant properties to be uncommented and modified are:
SOLR_AUTHENTICATION_TYPE=kerberos
SOLR_AUTHENTICATION_SIMPLE_ALLOW_ANON=true
SOLR_AUTHENTICATION_KERBEROS_KEYTAB=/etc/solr/conf/solr.keytab
SOLR_AUTHENTICATION_KERBEROS_PRINCIPAL=HTTP/localhost@LOCALHOST
SOLR_AUTHENTICATION_KERBEROS_NAME_RULES=DEFAULT
SOLR_AUTHENTICATION_JAAS_CONF=/etc/solr/conf/jaas.conf
Using Kerberos
The process of enabling Solr clients to authenticate with a secure Solr is specific to the client. This section
demonstrates:
Using Kerberos and curl
Using solrctl
Configuring SolrJ Library Usage
This enables technologies including:
Command-line solutions
Java applications
The MapReduceIndexerTool
Configuring Flume Morphline Solr Sink Usage
Secure Solr requires that the CDH components that it interacts with are also secure. Secure Solr interacts with
HDFS, ZooKeeper and optionally HBase, MapReduce, and Flume. See the CDH 5 Security Guide or the CDH 4
Security Guide for more information.
Using Kerberos and curl
You can use Kerberos authentication with clients such as curl. To use curl, begin by acquiring valid Kerberos
credentials and then execute the desired command. For example, you might use commands similar to the
following:
$ kinit -kt username.keytab username
$ curl --negotiate -u: foo:bar http://solrserver:8983/solr/
Java applications
Set the Java system property java.security.auth.login.config. For example, if the JAAS configuration
file is located on the filesystem as /home/user/jaas-client.conf, the Java system property
java.security.auth.login.config must be set to point to this file. Setting a Java system property can
be done programmatically, for example using a call such as:
System.setProperty("java.security.auth.login.config",
"/home/user/jaas-client.conf");
3. Add the flume JAAS configuration to the JAVA_OPTS in /etc/flume-ng/conf/flume-env.sh. For example,
you might change:
JAVA_OPTS="-Xmx500m"
to:
JAVA_OPTS="-Xmx500m
-Djava.security.auth.login.config=/etc/flume-ng/conf/jaas-client.conf"
A role can contain multiple such rules, separated by commas. For example the engineer_role might contain
the Query privilege for hive_logs and hbase_logs collections, and the Update privilege for the current_bugs
collection. You would specify this as follows:
engineer_role = collection=hive_logs->action=Query, collection=hbase_logs->action=Query,
collection=current_bugs->action=Update
Here the group dev_ops is granted the roles dev_role and ops_role. The members of this group can complete
searches that are allowed by these roles.
Note: By default, this uses local shell groups. See the Group Mapping section of the HDFS
Permissions Guide for more information.
OR
To configure local groups:
Policy File
The sections that follow contain notes on creating and maintaining the policy file.
Warning: An invalid configuration disables all authorization while logging an exception.
Defining Roles
Keep in mind that role definitions are not cumulative; the newer definition replaces the older one. For example,
the following results in role1 having privilege2, not privilege1 and privilege2.
role1 = privilege1
role1 = privilege2
Sample Configuration
This section provides a sample configuration.
Note: Sentry with CDH Search does not support multiple policy files. Other implementations of Sentry
such as Sentry for Hive do support different policy files for different databases, but Sentry for CDH
Search has no such support for multiple policies.
Policy File
The following is an example of a CDH Search policy file. The sentry-provider.ini would exist in an HDFS
location such as hdfs://ha-nn-uri/user/solr/sentry/sentry-provider.ini.
If you have an existing collection using a version of solrconfig.xml that you have modified, contact Support
for assistance.
The enabled Boolean determines whether document-level authorization is enabled. To enable document
level security, change this setting to true.
The sentryAuthField string specifies the name of the field that is used for storing authorization information.
You can use the default setting of sentry_auth or you can specify some other string that you will use for
assigning values on ingest.
Note: This field must exist as an explicit or dynamic field in the schema. sentry_auth exists in
the default schema.xml.
The allRolesToken string represents a special token that allows any role access to the document. By
default, this feature is disabled. To enable it, uncomment the specification and specify the token. The token
should be different from the name of any Sentry role to avoid collisions; by default it is "*". This feature is
useful when first configuring document-level security, or for granting all roles access to a document when
the set of roles may change. See the following Best Practices section for additional information.
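As a rough sketch of where these settings live, the document-level authorization search component in the generated solrconfig.xml looks something like the following; the component class name shown here is an assumption, and the exact element names may differ in your generated file:
<searchComponent name="queryDocAuthorization"
    class="org.apache.solr.handler.component.QueryDocAuthorizationComponent">
  <!-- Set to true to enable document-level authorization -->
  <bool name="enabled">false</bool>
  <!-- Field used to store authorization (role) tokens on each document -->
  <str name="sentryAuthField">sentry_auth</str>
  <!-- Uncomment to define a token that grants all roles access to a document -->
  <!-- <str name="allRolesToken">*</str> -->
</searchComponent>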
Best Practices
Using the allRolesToken
You may want to grant every user that belongs to a role access to certain documents. One way to accomplish
this is to specify all known roles in the document, but this requires updating or reindexing the document
whenever you add a new role. Alternatively, an "allUser" role, specified in the Sentry .ini file, could contain all
valid groups, but this role would need to be updated every time a new group was added to the system. Instead,
specifying the allRolesToken allows any user that belongs to a valid role to access the document. This access
requires no updating as the system evolves.
In addition, the allRolesToken may be useful for transitioning a deployment to use document-level security.
Instead of having to define all the roles upfront, all the documents can be tagged with the allRolesToken
and later modified as the roles are defined.
Consequences of Document-Level Authorization Only Affecting Queries
Document-level security does not prevent users from modifying documents or performing other update operations
on the collection. Update operations are only governed by collection-level authorization.
To change the supported number of clauses, edit the maxBooleanClauses setting in solrconfig.xml. For
example, to allow 2048 clauses, you would edit the setting so it appears as follows:
<maxBooleanClauses>2048</maxBooleanClauses>
For maxBooleanClauses to be applied as expected, apply any change to this value to all collections and then
restart the service. You must make the change in every collection because this option modifies a global Lucene
property that affects all SolrCores. If different solrconfig.xml files have different values for this property, the
effective value is determined per node, based on the first SolrCore to be initialized.
Note: Cloudera Manager has its own management of secure impersonation for Hue. To add additional
users for secure impersonation, use the environment safety valve for Solr to set the
environment variables as above. Be sure to include "hue" in SOLR_SECURITY_ALLOWED_PROXYUSERS
if you want to use secure impersonation for Hue.
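For example, the relevant environment variables might look like the following sketch (set through the Solr environment safety valve, or in /etc/default/solr on unmanaged clusters); the per-user _HOSTS and _GROUPS variable names follow the usual Cloudera Search naming pattern and should be verified against your release:
SOLR_SECURITY_ALLOWED_PROXYUSERS=hue
SOLR_SECURITY_PROXYUSER_hue_HOSTS=*
SOLR_SECURITY_PROXYUSER_hue_GROUPS=*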
Each evaluation Sentry makes is indicated by such entries. The FilePermission is from the policy file, while the
RequestPermission is the privilege required for the query. A RequestPermission iterates over all appropriate
FilePermission settings until a match is found. If no matching privilege is found, Sentry returns false, indicating
Access Denied.
Query and update request handlers:

Request Handler      Required Privilege   Collections that Require Privilege
select               QUERY                collection1
query                QUERY                collection1
get                  QUERY                collection1
browse               QUERY                collection1
tvrh                 QUERY                collection1
clustering           QUERY                collection1
terms                QUERY                collection1
elevate              QUERY                collection1
analysis/field       QUERY                collection1
analysis/document    QUERY                collection1
update               UPDATE               collection1
update/json          UPDATE               collection1
update/csv           UPDATE               collection1
Collection Admin actions:

Action         Required Privilege   Collections that Require Privilege
create         UPDATE               admin, collection1
delete         UPDATE               admin, collection1
reload         UPDATE               admin, collection1
createAlias    UPDATE               admin, collection1
                                    Note: "collection1" here refers to the name of the alias, not the
                                    underlying collection(s). For example,
                                    http://YOUR-HOST:8983/solr/admin/collections?action=CREATEALIAS&name=collection1&collections=underlyingCollection
deleteAlias    UPDATE               admin, collection1
                                    Note: "collection1" here refers to the name of the alias, not the
                                    underlying collection(s). For example,
                                    http://YOUR-HOST:8983/solr/admin/collections?action=DELETEALIAS&name=collection1
syncShard      UPDATE               admin, collection1
splitShard     UPDATE               admin, collection1
deleteShard    UPDATE               admin, collection1
Core Admin actions:

Action                 Required Privilege   Collections that Require Privilege
create                 UPDATE               admin, collection1
rename                 UPDATE               admin, collection1
load                   UPDATE               admin, collection1
unload                 UPDATE               admin, collection1
status                 UPDATE               admin, collection1
persist                UPDATE               admin
reload                 UPDATE               admin, collection1
swap                   UPDATE               admin, collection1
mergeIndexes           UPDATE               admin, collection1
split                  UPDATE               admin, collection1
prepRecover            UPDATE               admin, collection1
requestRecover         UPDATE               admin, collection1
requestSyncShard       UPDATE               admin, collection1
requestApplyUpdates    UPDATE               admin, collection1
Info and admin request handlers:

Request Handler             Required Privilege   Collections that Require Privilege
LukeRequestHandler          QUERY                admin
SystemInfoHandler           QUERY                admin
SolrInfoMBeanHandler        QUERY                admin
PluginInfoHandler           QUERY                admin
ThreadDumpHandler           QUERY                admin
PropertiesRequestHandler    QUERY                admin
LoggingHandler              QUERY                admin
ShowFileRequestHandler      QUERY                admin
Configuring Encrypted Shuffle, Encrypted Web UIs, and Encrypted HDFS Transport
core-site.xml Properties
To configure encrypted shuffle, set the following properties in the core-site.xml files of all nodes in the cluster:
Property                              Default Value                                   Explanation
hadoop.ssl.enabled                    false                                           Set to true to enable encrypted shuffle and encrypted Web UIs.
hadoop.ssl.require.client.cert        false                                           Whether client certificates are required.
hadoop.ssl.hostname.verifier          DEFAULT                                         The hostname verifier to provide for HttpsURLConnections.
hadoop.ssl.keystores.factory.class    org.apache.hadoop.security.ssl.                 The KeyStoresFactory implementation to use.
                                      FileBasedKeyStoresFactory
hadoop.ssl.server.conf                ssl-server.xml                                  Resource file from which SSL server keystore information is extracted.
hadoop.ssl.client.conf                ssl-client.xml                                  Resource file from which SSL client truststore information is extracted.
Note:
All these properties should be marked as final in the cluster configuration files.
Example
<configuration>
...
<property>
<name>hadoop.ssl.require.client.cert</name>
<value>false</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.hostname.verifier</name>
<value>DEFAULT</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.keystores.factory.class</name>
<value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.server.conf</name>
<value>ssl-server.xml</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.client.conf</name>
<value>ssl-client.xml</value>
<final>true</final>
</property>
<property>
<name>hadoop.ssl.enabled</name>
<value>true</value>
</property>
...
</configuration>
The cluster should be configured to use the Linux Task Controller in MRv1 and Linux container executor in MRv2
to run job tasks so that they are prevented from reading the server keystore information and gaining access to
the shuffle server certificates. Refer to Appendix B - Information about Other Hadoop Security Programs for
more information.
mapred-site.xml Property (MRv2 only)
To enable Encrypted Shuffle for MRv2, set the following property in the mapred-site.xml file on every node in
the cluster:

Property                        Default Value   Explanation
mapreduce.shuffle.ssl.enabled   false           If this property is set to true, encrypted shuffle is enabled. If this
                                                property is not specified, it defaults to the value of
                                                hadoop.ssl.enabled. This value can be false when
                                                hadoop.ssl.enabled is true, but cannot be true when
                                                hadoop.ssl.enabled is false.
Example
<configuration>
...
<property>
<name>mapreduce.shuffle.ssl.enabled</name>
<value>true</value>
<final>true</final>
</property>
...
</configuration>
ssl-server.xml (Shuffle server) Configuration:

Property                                Default Value   Description
ssl.server.keystore.type                jks             Keystore file type
ssl.server.keystore.location            NONE            Keystore file location
ssl.server.keystore.password            NONE            Keystore file password
ssl.server.keystore.keypassword         NONE            Key password
ssl.server.truststore.type              jks             Truststore file type
ssl.server.truststore.location          NONE            Truststore file location
ssl.server.truststore.password          NONE            Truststore file password
ssl.server.truststore.reload.interval   10000           Truststore reload interval, in milliseconds
Example
<configuration>
<!-- Server Certificate Store -->
<property>
<name>ssl.server.keystore.type</name>
<value>jks</value>
</property>
<property>
<name>ssl.server.keystore.location</name>
<value>${user.home}/keystores/server-keystore.jks</value>
</property>
<property>
<name>ssl.server.keystore.password</name>
<value>serverfoo</value>
</property>
<property>
<name>ssl.server.keystore.keypassword</name>
<value>serverfoo</value>
</property>
<!-- Server Trust Store -->
<property>
<name>ssl.server.truststore.type</name>
<value>jks</value>
</property>
<property>
<name>ssl.server.truststore.location</name>
<value>${user.home}/keystores/truststore.jks</value>
</property>
<property>
<name>ssl.server.truststore.password</name>
<value>clientserverbar</value>
</property>
<property>
<name>ssl.server.truststore.reload.interval</name>
<value>10000</value>
</property>
</configuration>
ssl-client.xml (Reducer/Fetcher) Configuration:

Property                                Default Value   Description
ssl.client.keystore.type                jks             Keystore file type
ssl.client.keystore.location            NONE            Keystore file location
ssl.client.keystore.password            NONE            Keystore file password
ssl.client.keystore.keypassword         NONE            Key password
ssl.client.truststore.type              jks             Truststore file type
ssl.client.truststore.location          NONE            Truststore file location
ssl.client.truststore.password          NONE            Truststore file password
ssl.client.truststore.reload.interval   10000           Truststore reload interval, in milliseconds
Example
<configuration>
<!-- Client certificate Store -->
<property>
<name>ssl.client.keystore.type</name>
<value>jks</value>
</property>
<property>
<name>ssl.client.keystore.location</name>
<value>${user.home}/keystores/client-keystore.jks</value>
</property>
<property>
<name>ssl.client.keystore.password</name>
<value>clientfoo</value>
</property>
<property>
<name>ssl.client.keystore.keypassword</name>
<value>clientfoo</value>
</property>
<!-- Client Trust Store -->
<property>
<name>ssl.client.truststore.type</name>
<value>jks</value>
</property>
<property>
<name>ssl.client.truststore.location</name>
<value>${user.home}/keystores/truststore.jks</value>
</property>
<property>
<name>ssl.client.truststore.password</name>
<value>clientserverbar</value>
</property>
<property>
<name>ssl.client.truststore.reload.interval</name>
<value>10000</value>
</property>
</configuration>
Client Certificates
Client Certificates are supported but they do not guarantee that the client is a reducer task for the job. The Client
Certificate keystore file that contains the private key must be readable by all users who submit jobs to the
cluster, which means that a rogue job could read those keystore files and use the client certificates in them to
establish a secure connection with a Shuffle server. The JobToken mechanism that the Hadoop environment
provides is a better protector of the data; each job uses its own JobToken to retrieve only the shuffle data that
belongs to it. Unless the rogue job has a proper JobToken, it cannot retrieve Shuffle data from the Shuffle server.
Important:
If your certificates are signed by a certificate authority (CA), you must include the complete chain of
CA certificates in the keystore that has the server's key.
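For example, you might import a CA chain into the server keystore with keytool commands similar to the following sketch; the alias names, certificate file names, and password are placeholders:
$ keytool -importcert -alias rootca -file root-ca.pem \
    -keystore ${HOME}/keystores/server-keystore.jks -storepass serverfoo
$ keytool -importcert -alias intermediateca -file intermediate-ca.pem \
    -keystore ${HOME}/keystores/server-keystore.jks -storepass serverfoo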
Reloading Truststores
By default, each truststore reloads its configuration every 10 seconds. If a new truststore file is copied over the
old one, it is re-read, and its certificates replace the old ones. This mechanism is useful for adding or removing
nodes from the cluster, or for adding or removing trusted clients. In these cases, the client, TaskTracker, or
NodeManager certificate is added to (or removed from) all the truststore files in the system, and the new
configuration is picked up without requiring a restart of the TaskTracker (MRv1) or NodeManager (YARN)
daemons.
Note:
The keystores are not automatically reloaded. To change a keystore for a TaskTracker in MRv1 or a
NodeManager in YARN, you must restart the TaskTracker or NodeManager daemon.
The reload interval is controlled by the ssl.client.truststore.reload.interval and
ssl.server.truststore.reload.interval configuration properties in the ssl-client.xml and
ssl-server.xml files described above.
Debugging
Important:
Enable debugging only for troubleshooting, and then only for jobs running on small amounts of data.
Debugging is very verbose and slows jobs down significantly.
To enable SSL debugging in the reducers, set -Djavax.net.debug=all in the mapred.reduce.child.java.opts
property; for example:
<configuration>
...
<property>
<name>mapred.reduce.child.java.opts</name>
<value>-Xmx200m -Djavax.net.debug=all</value>
</property>
...
</configuration>
2. Type the following command to add the local realm trust to Active Directory:
netdom trust YOUR-LOCAL-REALM.COMPANY.COM /Domain:AD-REALM.COMPANY.COM /add /realm
/passwordt:<TrustPassword>
On Windows 2008:
ksetup /SetEncTypeAttr YOUR-LOCAL-REALM.COMPANY.COM <enc_type>
where the <enc_type> parameter specifies AES, DES, or RC4 encryption. Refer to the documentation for your
version of Windows Active Directory to find the <enc_type> parameter string to use.
Important: Make sure the encryption type you specify is supported on both your version of Windows
Active Directory and your version of MIT Kerberos.
where the <enc_type_list> parameter specifies the types of encryption this cross-realm krbtgt principal will
support: AES, DES, or RC4 encryption. You can specify multiple encryption types in the command above; what
is important is that at least one of the encryption types corresponds to the encryption type found in the tickets
granted by the KDC in the remote realm. For example:
kadmin: addprinc -e "rc4-hmac:normal des3-hmac-sha1:normal"
krbtgt/[email protected]
The cross-realm krbtgt principal that you add in this step must have at least one entry that uses the same
encryption type as the tickets that are issued by the remote KDC. If no entries have the same encryption
type, then the problem you will see is that authenticating as a principal in the local realm will allow you to
successfully run Hadoop commands, but authenticating as a principal in the remote realm will not allow you
to run Hadoop commands.
2. To properly translate principal names from the Active Directory realm into local names within Hadoop, you
must configure the hadoop.security.auth_to_local setting in the core-site.xml file on all of the
cluster machines. The following example translates all principal names with the realm
AD-REALM.CORP.FOO.COM into the first component of the principal name only. It also preserves the standard
translation for the default realm (the cluster realm).
<property>
<name>hadoop.security.auth_to_local</name>
<value>
RULE:[1:$1@$0](^.*@AD-REALM\.CORP\.FOO\.COM$)s/^(.*)@AD-REALM\.CORP\.FOO\.COM$/$1/g
RULE:[2:$1@$0](^.*@AD-REALM\.CORP\.FOO\.COM$)s/^(.*)@AD-REALM\.CORP\.FOO\.COM$/$1/g
DEFAULT
</value>
</property>
For more information about name mapping rules, see: Configuring the Mapping from Kerberos Principals to
Short Names
2. When a client sends a request, the authenticate method will be called. For browsers,
AltKerberosAuthenticationHandler will call the alternateAuthenticate method, which is what you
need to implement to interact with the desired authentication mechanism. For non-browsers,
AltKerberosAuthenticationHandler will follow the Kerberos SPNEGO sequence (this is provided for you).
3. The alternateAuthenticate(HttpServletRequest request, HttpServletResponse response)
method in your subclass should follow these rules:
Return null if the authentication is still in progress; the response object can be used to interact with the
client.
Throw an AuthenticationException if the authentication failed.
Return an AuthenticationToken if the authentication completed successfully.
4. (Optional) You can also specify which user-agents you do not want to be considered as browsers by setting
the following property as required (the default value is shown). Note that all Java-based programs (such as
the Hadoop client) use java as their user-agent.
<property>
<name>hadoop.http.authentication.alt-kerberos.non-browser.user-agents</name>
<value>java,curl,wget,perl</value>
</property>
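A rough Java sketch of such a subclass follows; the SSO header and redirect URL are hypothetical stand-ins for whatever alternate authentication mechanism you integrate with:
import java.io.IOException;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.hadoop.security.authentication.client.AuthenticationException;
import org.apache.hadoop.security.authentication.server.AltKerberosAuthenticationHandler;
import org.apache.hadoop.security.authentication.server.AuthenticationToken;

public class MyAltAuthenticationHandler extends AltKerberosAuthenticationHandler {

  @Override
  public AuthenticationToken alternateAuthenticate(HttpServletRequest request,
      HttpServletResponse response) throws IOException, AuthenticationException {
    // Hypothetical alternate credential: a header populated by an SSO front end.
    String user = request.getHeader("X-My-SSO-User");
    if (user == null || user.isEmpty()) {
      // Authentication is still in progress; use the response to prompt the client,
      // then return null as required by the rules above.
      response.sendRedirect("https://sso.example.com/login");
      return null;
    }
    // Authentication completed successfully; getType() identifies this handler.
    return new AuthenticationToken(user, user, getType());
  }
}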
Appendix A Troubleshooting
This Troubleshooting appendix contains sample Kerberos configuration files, krb5.conf and kdc.conf for your
reference. It also has solutions to potential problems you might face when configuring a secure cluster:
Sample Kerberos Configuration files: krb5.conf, kdc.conf, kadm5.acl
Problem 1: Running any Hadoop command fails after enabling security.
Problem 2: Java is unable to read the Kerberos credentials cache created by versions of MIT Kerberos 1.8.1
or higher.
Problem 3: java.io.IOException: Incorrect permission
Problem 4: A cluster fails to run jobs after security is enabled.
Problem 5: The NameNode does not start and KrbException Messages (906) and (31) are displayed.
Problem 6: The NameNode starts but clients cannot connect to it and error message contains enctype code
18.
(MRv1 Only) Problem 7: Jobs won't run and TaskTracker is unable to create a local mapred directory.
(MRv1 Only) Problem 8: Jobs won't run and TaskTracker is unable to create a Hadoop logs directory.
Problem 9: After you enable cross-realm trust, you can run Hadoop commands in the local realm but not in
the remote realm.
(MRv1 Only) Problem 10: Jobs won't run and can't access files in mapred.local.dir.
Problem 11: Users are unable to obtain credentials when running Hadoop jobs or commands.
Problem 12: Request is a replay exceptions in the logs. on page 188
Sample krb5.conf:
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
default_realm = EXAMPLE.COM
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true
# udp_preference_limit = 1
# uncomment the following if AD cross realm auth is ONLY providing DES encrypted tickets
# allow-weak-crypto = true
[realms]
AD-REALM.EXAMPLE.COM = {
kdc = AD1.ad-realm.example.com:88
kdc = AD2.ad-realm.example.com:88
admin_server = AD1.ad-realm.example.com:749
admin_server = AD2.ad-realm.example.com:749
default_domain = ad-realm.example.com
}
EXAMPLE.COM = {
kdc = kdc1.example.com:88
admin_server = kdc1.example.com:749
default_domain = example.com
}
# The domain_realm is critical for mapping your host domain names to the kerberos realms
# that are servicing them. Make sure the lowercase left hand portion indicates any domains or subdomains
# that will be related to the kerberos REALM on the right hand side of the expression. REALMs will
# always be UPPERCASE. For example, if your actual DNS domain was test.com but your kerberos REALM is
# EXAMPLE.COM then you would have,
[domain_realm]
test.com = EXAMPLE.COM
#AD domains and realms are usually the same
ad-domain.example.com = AD-REALM.EXAMPLE.COM
ad-realm.example.com = AD-REALM.EXAMPLE.COM
Sample kadm5.acl:
*/[email protected] *
[email protected] * flume/*@HADOOP.COM
[email protected] * hbase/*@HADOOP.COM
[email protected] * hdfs/*@HADOOP.COM
[email protected] * hive/*@HADOOP.COM
[email protected] * httpfs/*@HADOOP.COM
[email protected] * HTTP/*@HADOOP.COM
[email protected] * hue/*@HADOOP.COM
[email protected] * impala/*@HADOOP.COM
[email protected] * mapred/*@HADOOP.COM
[email protected] * oozie/*@HADOOP.COM
[email protected] * solr/*@HADOOP.COM
[email protected] * sqoop/*@HADOOP.COM
[email protected] * yarn/*@HADOOP.COM
[email protected] * zookeeper/*@HADOOP.COM
Problem 1: Running any Hadoop command fails after enabling security.
Solution:
You can examine the Kerberos tickets currently in your credentials cache by running the klist command. You
can obtain a ticket by running the kinit command and either specifying a keytab file containing credentials, or
entering the password for your principal.
Problem 2: Java is unable to read the Kerberos credentials cache created by versions of MIT Kerberos 1.8.1 or higher.
Description:
Because of a change [1] in the format in which MIT Kerberos writes its credentials cache, there is a bug [2] in
the Oracle JDK 6 Update 26 and earlier that causes Java to be unable to read the Kerberos credentials cache
created by versions of MIT Kerberos 1.8.1 or higher. Kerberos 1.8.1 is the default in Ubuntu Lucid and later
releases and Debian Squeeze and later releases. (On RHEL and CentOS, an older version of MIT Kerberos which
does not have this issue, is the default.)
Footnotes:
[1] MIT Kerberos change: http://krbdev.mit.edu/rt/Ticket/Display.html?id=6206
[2] Report of bug in Oracle JDK 6 Update 26 and earlier:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6979329
Solution:
If you encounter this problem, you can work around it by running kinit -R after running kinit initially to obtain
credentials. Doing so will cause the ticket to be renewed, and the credentials cache rewritten in a format which
Java can read. To illustrate this:
$ klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_1000)
$ hadoop fs -ls
11/01/04 13:15:51 WARN ipc.Client: Exception encountered while connecting to the server
: javax.security.sasl.SaslException:
GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism
level: Failed to find any Kerberos tgt)]
Bad connection to FS. command aborted. exception: Call to nn-host/10.0.0.2:8020 failed
on local exception: java.io.IOException:
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No
valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
$ kinit
Password for [email protected]:
$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: [email protected]
Valid starting     Expires            Service principal
01/04/11 13:19:31  01/04/11 23:19:31  krbtgt/[email protected]
        renew until 01/05/11 13:19:30
$ hadoop fs -ls
11/01/04 13:15:59 WARN ipc.Client: Exception encountered while connecting to the server
: javax.security.sasl.SaslException:
GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism
level: Failed to find any Kerberos tgt)]
Bad connection to FS. command aborted. exception: Call to nn-host/10.0.0.2:8020 failed
on local exception: java.io.IOException:
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No
valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
$ kinit -R
$ hadoop fs -ls
Found 6 items
drwx------ atm atm          0 2011-01-02 16:16 /user/atm/.staging
Note:
This workaround for Problem 2 requires the initial ticket to be renewable. Note that whether or not
you can obtain renewable tickets is dependent upon a KDC-wide setting, as well as a per-principal
setting for both the principal in question and the Ticket Granting Ticket (TGT) service principal for the
realm. A non-renewable ticket will have the same values for its "valid starting" and "renew until"
times. If the initial ticket is not renewable, the following error message is displayed when attempting
to renew the ticket:
kinit: Ticket expired while renewing credentials
Problem 3: java.io.IOException: Incorrect permission
Description:
The error message includes a stack trace similar to the following:
org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:144)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:160)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1484)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1432)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1408)
at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:418)
at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:279)
at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:203)
at
org.apache.hadoop.test.MiniHadoopClusterManager.start(MiniHadoopClusterManager.java:152)
at
org.apache.hadoop.test.MiniHadoopClusterManager.run(MiniHadoopClusterManager.java:129)
at
org.apache.hadoop.test.MiniHadoopClusterManager.main(MiniHadoopClusterManager.java:308)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:83)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Solution:
Make sure that the umask for hdfs and mapred is 0022.
Problem 4: A cluster fails to run jobs after security is enabled.
Description:
If you're encountering this problem, you may see errors in the TaskTracker or NodeManager logs. The following
example is for a TaskTracker on MRv1:
10/11/03 01:29:55 INFO mapred.JobClient: Task Id : attempt_201011021321_0004_m_000011_0,
Status : FAILED
Error initializing attempt_201011021321_0004_m_000011_0:
java.io.IOException: org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.mapred.LinuxTaskController.runCommand(LinuxTaskController.java:212)
at
org.apache.hadoop.mapred.LinuxTaskController.initializeUser(LinuxTaskController.java:442)
at
org.apache.hadoop.mapreduce.server.tasktracker.Localizer.initializeUserDirs(Localizer.java:272)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:963)
at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2209)
at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2174)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:250)
at org.apache.hadoop.util.Shell.run(Shell.java:177)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:370)
at org.apache.hadoop.mapred.LinuxTaskController.runCommand(LinuxTaskController.java:203)
... 5 more
Solution:
Delete the mapred.local.dir or yarn.nodemanager.local-dirs directories for that user across the cluster.
Problem 5: The NameNode does not start and KrbException Messages (906) and (31) are displayed.
Description:
Exceptions similar to the following appear when the NameNode attempts to start:
Caused by: KrbException: Identifier doesn't match expected value (906)
Note:
These KrbException error messages are displayed only if you enable debugging output. See Appendix
D - Enabling Debugging Output for the Sun Kerberos Classes.
Solution:
Although there are several possible problems that can cause these two KrbException error messages to display,
here are some actions you can take to solve the most likely problems:
If you are using CentOS/Red Hat Enterprise Linux 5.6 or later, or Ubuntu, which use AES-256 encryption by
default for tickets, you must install the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction
Policy File on all cluster and Hadoop user machines. For information about how to verify the type of encryption
used in your cluster, see Step 3: If you are Using AES-256 Encryption, install the JCE Policy File on page 24.
Alternatively, you can change your kdc.conf or krb5.conf to not use AES-256 by removing
aes256-cts:normal from the supported_enctypes field of the kdc.conf or krb5.conf file. Note that
after changing the kdc.conf file, you'll need to restart both the KDC and the kadmin server for those changes
to take effect. You may also need to recreate or change the password of the relevant principals, including
potentially the TGT principal (krbtgt/REALM@REALM).
In the [realms] section of your kdc.conf file, in the realm corresponding to HADOOP.LOCALDOMAIN, add (or
replace if it's already present) the following variable:
supported_enctypes = des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal
des-cbc-md5:normal des-cbc-crc:normal des-cbc-crc:v4 des-cbc-crc:afs3
Recreate the hdfs keytab file and mapred keytab file using the -norandkey option in the xst command (for
details, see Step 4: Create and Deploy the Kerberos Principals and Keytab Files on page 25).
kadmin.local: xst -norandkey -k hdfs.keytab hdfs/fully.qualified.domain.name
HTTP/fully.qualified.domain.name
kadmin.local: xst -norandkey -k mapred.keytab mapred/fully.qualified.domain.name
HTTP/fully.qualified.domain.name
Problem 6: The NameNode starts but clients cannot connect to it and error
message contains enctype code 18.
Description:
The NameNode keytab file does not have an AES256 entry, but client tickets do contain an AES256 entry. The
NameNode starts but clients cannot connect to it. The error message doesn't refer to "AES256", but does contain
an enctype code "18".
Solution:
Make sure the "Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy File" is installed or
remove aes256-cts:normal from the supported_enctypes field of the kdc.conf or krb5.conf file. For more
information, see the first suggested solution above for Problem 5.
For more information about the Kerberos encryption types, see
http://www.iana.org/assignments/kerberos-parameters/kerberos-parameters.xml.
(MRv1 Only) Problem 7: Jobs won't run and TaskTracker is unable to create
a local mapred directory.
Description:
The TaskTracker log contains the following error message:
11/08/17 14:44:06 INFO mapred.TaskController: main : user is atm
11/08/17 14:44:06 INFO mapred.TaskController: Failed to create directory
/var/log/hadoop/cache/mapred/mapred/local1/taskTracker/atm - No such file or directory
11/08/17 14:44:06 WARN mapred.TaskTracker: Exception while localization
java.io.IOException: Job initialization failed (20)
at
org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:191)
at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1199)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1174)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1089)
at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2257)
at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2221)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
at org.apache.hadoop.util.Shell.run(Shell.java:182)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
at
org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:184)
... 8 more
Solution:
Make sure the value specified for mapred.local.dir is identical in mapred-site.xml and taskcontroller.cfg.
If the values are different, the error message above is returned.
(MRv1 Only) Problem 8: Jobs won't run and TaskTracker is unable to create
a Hadoop logs directory.
Description:
The TaskTracker log contains an error message similar to the following:
11/08/17 14:48:23 INFO mapred.TaskController: Failed to create directory
/home/atm/src/cloudera/hadoop/build/hadoop-0.23.2-cdh3u1-SNAPSHOT/logs1/userlogs/job_201108171441_0004
- No such file or directory
11/08/17 14:48:23 WARN mapred.TaskTracker: Exception while localization
java.io.IOException: Job initialization failed (255)
at
org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:191)
at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1199)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1174)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1089)
at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2257)
at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2221)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
at org.apache.hadoop.util.Shell.run(Shell.java:182)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
at
org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:184)
... 8 more
Solution:
In MRv1, the default value specified for hadoop.log.dir in mapred-site.xml is
/var/log/hadoop-0.20-mapreduce. The path must be owned and be writable by the mapred user. If you
change the default value specified for hadoop.log.dir, make sure the value is identical in mapred-site.xml
and taskcontroller.cfg. If the values are different, the error message above is returned.
Problem 9: After you enable cross-realm trust, you can run Hadoop
commands in the local realm but not in the remote realm.
Description:
After you enable cross-realm trust, authenticating as a principal in the local realm will allow you to successfully
run Hadoop commands, but authenticating as a principal in the remote realm will not allow you to run Hadoop
commands. The most common cause of this problem is that the principals in the two realms either don't have
the same encryption type, or the cross-realm principals in the two realms don't have the same password. This
issue manifests itself because you are able to get Ticket Granting Tickets (TGTs) from both the local and remote
realms, but you are unable to get a service ticket to allow the principals in the local and remote realms to
communicate with each other.
Solution:
On the local MIT KDC server host, type the following command in the kadmin.local or kadmin shell to add the
cross-realm krbtgt principal:
kadmin: addprinc -e "<enc_type_list>"
krbtgt/[email protected]
where the <enc_type_list> parameter specifies the types of encryption this cross-realm krbtgt principal will
support: AES, DES, or RC4 encryption. You can specify multiple encryption types in the command above; what
is important is that at least one of the encryption types corresponds to the encryption type found in the tickets
granted by the KDC in the remote realm. For example:
kadmin: addprinc -e "aes256-cts:normal rc4-hmac:normal des3-hmac-sha1:normal"
krbtgt/[email protected]
(MRv1 Only) Problem 10: Jobs won't run and can't access files in
mapred.local.dir .
Description:
The TaskTracker log contains the following error message:
WARN org.apache.hadoop.mapred.TaskTracker: Exception while localization
java.io.IOException: Job initialization failed (1)
Solution:
1. Add the mapred user to the mapred and hadoop groups on all hosts.
2. Restart all TaskTrackers.
Problem 11: Users are unable to obtain credentials when running Hadoop
jobs or commands.
Description:
This error occurs because the ticket message is too large for the default UDP protocol. An error message similar
to the following may be displayed:
13/01/15 17:44:48 DEBUG ipc.Client: Exception encountered while connecting to the server
: javax.security.sasl.SaslException:
GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism
level: Fail to create credential.
(63) - No service creds)]
Solution:
Force Kerberos to use TCP instead of UDP by adding the following parameter to libdefaults in the krb5.conf
file on the client(s) where the problem is occurring.
[libdefaults]
udp_preference_limit = 1
Note:
More Info About the udp_preference_limit Property
When sending a message to the KDC, the library will try using TCP before UDP if the size of the ticket
message is larger than the setting specified for the udp_preference_limit property. If the ticket
message is smaller than the udp_preference_limit setting, then UDP will be tried before TCP. Regardless
of the size, both protocols will be tried if the first attempt fails.
Problem 12: Request is a replay exceptions in the logs.
Description:
Symptom: The following exception shows up in the logs for one or more of the Hadoop daemons:
2013-02-28 22:49:03,152 INFO ipc.Server (Server.java:doRead(571)) - IPC Server listener
on 8020: readAndProcess threw exception javax.security.sasl.SaslException: GSS initiate
failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism l
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure
unspecified at GSS-API level (Mechanism level: Request is a replay (34))]
at
com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:159)
at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1040)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1213)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:566)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:363)
Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: Request
is a replay (34))
at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:741)
at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:323)
at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:267)
at
com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:137)
... 4 more
Caused by: KrbException: Request is a replay (34)
at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:300)
at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:134)
at sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:79)
at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:724)
... 7 more
In addition, this problem can manifest itself as performance issues for all clients in the cluster, including dropped
connections, timeouts attempting to make RPC calls, and so on.
Likely causes:
Multiple services in the cluster are using the same Kerberos principal. All secure clients that run on multiple
machines should use unique Kerberos principals for each machine. For example, rather than connecting as
a service principal [email protected], services should have per-host principals such as
myservice/[email protected] (see the kadmin sketch after this list).
Clocks not in sync: all hosts should run NTP so that clocks are kept in sync between clients and servers.
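As a sketch of the first point, per-host service principals can be created in kadmin like this; the service name and host names are placeholders:
kadmin: addprinc -randkey myservice/[email protected]
kadmin: addprinc -randkey myservice/[email protected]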
The task-controller program (MRv1) must:
Be owned by root
Be owned by a group that contains only the user running the MapReduce daemons
Be setuid
Be group readable and executable
The TaskTracker will check for this configuration on start up, and fail to start if the task-controller is not configured
correctly.
The container-executor program (YARN) must:
Be owned by root
Be owned by a group that contains only the user running the YARN daemons
Be setuid
Be group readable and executable
Principal Translation
The first section of a rule, <principal translation>, performs the matching of the principal name to the rule.
If there is a match, the principal translation also does the initial translation of the principal name to a short
name. In the <principal translation> section, you specify the number of components in the principal name
and the pattern you want to use to translate those principal component(s) and realm into a short name. In
Kerberos terminology, a principal name is a set of components separated by slash ("/") characters.
The principal translation is composed of two parts that are both specified within "[ ]" using the following syntax:
[<number of components in principal name>:<initial specification of short name>]
where:
<number of components in principal name> This first part specifies the number of components in the principal
name (not including the realm) and must be 1 or 2. A value of 1 specifies principal names that have a single
component (for example, hdfs), and 2 specifies principal names that have two components (for example,
hdfs/fully.qualified.domain.name). A principal name that has only one component will only match
single-component rules, and a principal name that has two components will only match two-component rules.
<initial specification of short name> This second part specifies a pattern for translating the principal
component(s) and the realm into a short name. The variable $0 translates the realm, $1 translates the first
component, and $2 translates the second component.
Here are some examples of principal translation sections. These examples use [email protected] and
atm/[email protected] as principal name inputs:

Translation    [email protected] translates to   atm/[email protected] translates to
[1:$1]         atm                                Rule does not match (1)
[1:$1.foo]     atm.foo                            Rule does not match (1)
[2:$1/$2@$0]   Rule does not match (2)            atm/[email protected]
[2:$1/$2]      Rule does not match (2)            atm/fully.qualified.domain.name
[2:$1@$0]      Rule does not match (2)            [email protected]
[2:$1]         Rule does not match (2)            atm

Footnotes:
(1) Rule does not match because there are two components in principal name
atm/[email protected]
(2) Rule does not match because there is one component in principal name [email protected]
Acceptance Filter
The second section of a rule, (<acceptance filter>), matches the translated short name from the principal
translation (that is, the output from the first section). The acceptance filter is specified in "( )" characters and is
a standard regular expression. A rule matches only if the specified regular expression matches the entire
translated short name from the principal translation. That is, there's an implied ^ at the beginning of the pattern
and an implied $ at the end.
Example Rules
Suppose all of your service principals are either of the form
App.service-name/[email protected] or
[email protected], and you want to map these to the short name string service-name.
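A sketch of what those two rules might look like is shown below; the exact regular expressions are an assumption and should be adjusted to match your principal naming pattern:
RULE:[2:$1](App\..*)s/App\.(.*)/$1/g
RULE:[1:$1](App\..*)s/App\.(.*)/$1/g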
The first $1 in each rule is a reference to the first component of the full principal name, and the second $1 is a
regular expression back-reference to text that is matched by (.*).
In the following example, suppose your company's naming scheme for user accounts in Active Directory is
FirstnameLastname (for example, JohnDoe), but user home directories in HDFS are /user/firstnamelastname.
The following rule set converts user accounts in the CORP.EXAMPLE.COM domain to lowercase.
<property>
<name>hadoop.security.auth_to_local</name>
<value>RULE:[1:$1@$0](.*@\QCORP.EXAMPLE.COM\E$)s/@\QCORP.EXAMPLE.COM\E$///L
RULE:[2:$1@$0](.*@\QCORP.EXAMPLE.COM\E$)s/@\QCORP.EXAMPLE.COM\E$///L
DEFAULT</value>
</property>
In this example, the [email protected] principal becomes the johndoe HDFS user.
Default Rule
You can specify an optional default rule called DEFAULT (see example above). The default rule reduces a principal
name down to its first component only. For example, the default rule reduces the principal names
[email protected] or atm/[email protected] down to atm, assuming that
the default domain is YOUR-REALM.COM.
The default rule applies only if the principal is in the default realm.
If a principal name does not match any of the specified rules, the mapping for that principal name will fail.
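You can check how a given principal is mapped by running the HadoopKerberosName utility on a host that has your core-site.xml; for example, output similar to the following would confirm the lowercase mapping sketched above:
$ hadoop org.apache.hadoop.security.HadoopKerberosName [email protected]
Name: [email protected] to johndoe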
Task-controller (MRv1) error codes:

Numeric Code   Name
1              INVALID_ARGUMENT_NUMBER
2              INVALID_USER_NAME
3              INVALID_COMMAND_PROVIDED
5              INVALID_TT_ROOT
6              SETUID_OPER_FAILED
7              UNABLE_TO_EXECUTE_TASK_SCRIPT
8              UNABLE_TO_KILL_TASK
9              INVALID_TASK_PID
10             ERROR_RESOLVING_FILE_PATH
11             RELATIVE_PATH_COMPONENTS_IN_FILE_PATH
12             UNABLE_TO_STAT_FILE
13             FILE_NOT_OWNED_BY_TASKTRACKER
14             PREPARE_ATTEMPT_DIRECTORIES_FAILED
15             INITIALIZE_JOB_FAILED
16             PREPARE_TASK_LOGS_FAILED
17             INVALID_TT_LOG_DIR
18             OUT_OF_MEMORY
19             INITIALIZE_DISTCACHEFILE_FAILED
20             INITIALIZE_USER_FAILED
21             UNABLE_TO_BUILD_PATH
22             INVALID_TASKCONTROLLER_PERMISSIONS
23             PREPARE_JOB_LOGS_FAILED
               Jobs won't run and the TaskTracker is unable to create a Hadoop logs directory.
               For more information, see (MRv1 Only) Problem 8: Jobs won't run and TaskTracker
               is unable to create a Hadoop logs directory. on page 186.
24             INVALID_CONFIG_FILE
255            Unknown Error
               This error is often caused by previous errors; look earlier in the log file for
               possible causes.
Container-executor (YARN) error codes:

Numeric Code   Name
1              INVALID_ARGUMENT_NUMBER
2              INVALID_USER_NAME
3              INVALID_COMMAND_PROVIDED
5              INVALID_NM_ROOT
6              SETUID_OPER_FAILED
7              UNABLE_TO_EXECUTE_CONTAINER_SCRIPT
8              UNABLE_TO_SIGNAL_CONTAINER
9              INVALID_CONTAINER_PID
18             OUT_OF_MEMORY
20             INITIALIZE_USER_FAILED
21             UNABLE_TO_BUILD_PATH
22             INVALID_CONTAINER_EXEC_PERMISSIONS
24             INVALID_CONFIG_FILE
25             SETSID_OPER_FAILED
26             WRITE_PIDFILE_FAILED
255            Unknown Error
2. Create the mapred keytab file, which contains an entry for the mapred principal. If you are using MRv1, the
mapred keytab file is used for the JobTracker and TaskTrackers. If you are using YARN, the mapred keytab
file is used for the MapReduce Job History Server.
kadmin: xst -k mapred-unmerged.keytab mapred/fully.qualified.domain.name
3. YARN only: Create the yarn keytab file, which contains an entry for the yarn principal. This keytab file is
used for the ResourceManager and NodeManager.
kadmin: xst -k yarn-unmerged.keytab yarn/fully.qualified.domain.name
4. Create the http keytab file, which contains an entry for the HTTP principal.
kadmin: xst -k http.keytab HTTP/fully.qualified.domain.name
5. Use the ktutil command to merge the previously created keytab files. Start ktutil and run the following
subcommands, which read each service keytab together with the http keytab and write out a merged keytab:
rkt hdfs-unmerged.keytab
rkt http.keytab
wkt hdfs.keytab
clear
rkt mapred-unmerged.keytab
rkt http.keytab
wkt mapred.keytab
clear
rkt yarn-unmerged.keytab
rkt http.keytab
wkt yarn.keytab
This procedure creates three new files: hdfs.keytab, mapred.keytab and yarn.keytab. These files contain
entries for the hdfs and HTTP principals, the mapred and HTTP principals, and the yarn and HTTP principals
respectively.
6. Use klist to display the keytab file entries. For example, a correctly-created hdfs keytab file should look
something like this:
$ klist -e -k -t hdfs.keytab
Keytab name: WRFILE:hdfs.keytab
slot KVNO Principal
---- ---- ---------------------------------------------------------------------
   1    7 HTTP/[email protected] (DES cbc mode with CRC-32)
   2    7 HTTP/[email protected] (Triple DES cbc mode with HMAC/sha1)
   3    7 hdfs/[email protected] (DES cbc mode with CRC-32)
   4    7 hdfs/[email protected] (Triple DES cbc mode with HMAC/sha1)
7. To verify that you have performed the merge procedure correctly, make sure you can obtain credentials as
both the hdfs and HTTP principals using the single merged keytab:
$ kinit -k -t hdfs.keytab hdfs/[email protected]
$ kinit -k -t hdfs.keytab HTTP/[email protected]
If either of these commands fails with an error message such as "kinit: Key table entry not found
while getting initial credentials", then something has gone wrong during the merge procedure. Go
back to step 1 of this document and verify that you performed all the steps correctly.
8. To continue the procedure of configuring Hadoop security in CDH 5, follow the instructions in the section To
deploy the Kerberos keytab files.
6. Configure firewalls.
Block all access from outside the cluster.
The gateway node should have ports 11000 (oozie) and 14000 (hadoop-httpfs) open.
Optionally, to maintain access to the Web UIs for the cluster's JobTrackers, NameNodes, etc., open their
HTTP ports: see Ports Used by Components of CDH 5.
7. Optionally configure authentication in simple mode (default) or using Kerberos. See HttpFS Security
Configuration on page 93 to configure Kerberos for HttpFS and Oozie Security Configuration on page 87 to
configure Kerberos for Oozie.
8. Optionally encrypt communication via HTTPS for Oozie by following these directions.
Accessing HDFS
With the Hadoop client:
All of the standard hadoop fs commands will work; just make sure to specify -fs webhdfs://HOSTNAME:14000.
For example (where GATEWAYHOST is the hostname of the gateway machine):
$ hadoop fs -fs webhdfs://GATEWAYHOST:14000 -cat /user/me/myfile.txt
Hello World!
You can find a full explanation of the commands in the WebHDFS REST API documentation.
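You can also reach the same data directly over the HttpFS REST endpoint; for example, a sketch using SPNEGO authentication after kinit, with a placeholder gateway host and file path:
$ curl --negotiate -u : "http://GATEWAYHOST:14000/webhdfs/v1/user/me/myfile.txt?op=OPEN"
Hello World!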
Appendix H - Using a Web Browser to Access an URL Protected by Kerberos HTTP SPNEGO
1. Open Internet Explorer and click the Settings "gear" icon in the top-right corner. Select Internet options.
2. Select the Security tab.
3. Select the Local Intranet zone and click the Sites button.
4. Make sure that the first two options, Include all local (intranet) sites not listed in other zones and Include
all sites that bypass the proxy server, are checked.
5. Click Advanced and add the names of the domains that are protected by Kerberos HTTP SPNEGO, one at a
time, to the list of websites. For example, myhost.example.com. Click Close.
6. Click OK to save your configuration changes.
You need to perform the following steps only if you have a proxy server already enabled.
1. Click the Settings "gear" icon in the top-right corner. Select Internet options.
2. Select the Connections tab and click LAN Settings.
3. Verify that the proxy server Address and Port number settings are correct.
4. Click Advanced to open the Proxy Settings dialog box.
5. Add the Kerberos-protected domains to the Exceptions field.
6. Click OK to save any changes.
Kerberos Issues
For Kerberos issues, your krb5.conf and kdc.conf files are valuable for support to be able to understand
your configuration.
If you are having trouble with client access to the cluster, provide the output for klist -ef after kiniting as
the user account on the client host in question. Additionally, confirm that your ticket is renewable by running
kinit -R after successfully kiniting.
Specify if you are authenticating (kiniting) with a user outside of the Hadoop cluster's realm (such as Active
Directory, or another MIT Kerberos realm).
If using AES-256 encryption, please ensure you have the Unlimited Strength JCE Policy Files deployed on all
cluster and client nodes.
SSL/TLS Issues
Specify whether you are using a private/commercial CA for your certificates, or if they are self-signed.
Clarify what services you are attempting to setup SSL/TLS for in your description.
When troubleshooting SSL/TLS trust issues, provide the output of the following openssl command:
openssl s_client -connect host.fqdn.name:port
LDAP Issues
Specify the LDAP service in use (Active Directory, OpenLDAP, one of the Oracle Directory Server offerings,
OpenDJ, etc.).
Provide a screenshot of the LDAP configuration screen you are working with if you are troubleshooting setup
issues.
Be prepared to troubleshoot using the ldapsearch command (requires the openldap-clients package)
on the host where LDAP authentication or authorization issues are being seen.
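For example, an ldapsearch invocation similar to the following sketch (with placeholder server, bind DN, and search base) is a good starting point for reproducing the problem outside of Hadoop:
$ ldapsearch -x -H ldap://ldap-server.example.com \
    -D "cn=binduser,ou=people,dc=example,dc=com" -W \
    -b "dc=example,dc=com" "(uid=jdoe)"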