PDI Security
https://help.pentaho.com/Documentation/5.2/Version_5.2 1/41
Updated: Tue, 30 Sep 2014 01:23:00 GMT
Copyright Page
This document supports Pentaho Business Analytics Suite 5.2 GA and Pentaho Data Integration 5.2 GA,
documentation revision October 7, 2014, copyright © 2014 Pentaho Corporation. No part may be reprinted
without written permission from Pentaho Corporation. All trademarks are the property of their respective
owners.
If you do not find answers to your questions here, please contact your Pentaho technical support
representative.
Support-related questions should be submitted through the Pentaho Customer Support Portal at
http://support.pentaho.com.
For information about how to purchase support or enable an additional named support contact, please
contact your sales representative, or send an email to [email protected].
The author(s) of this document have used their best efforts in preparing the content and the programs
contained in it. These efforts include the development, research, and testing of the theories and programs to
determine their effectiveness. The author and publisher make no warranty of any kind, express or implied,
with regard to these programs or the documentation contained in this book.
The author(s) and Pentaho shall not be liable in the event of incidental or consequential damages in
connection with, or arising out of, the furnishing, performance, or use of the programs, associated instructions,
and/or claims.
Trademarks
The trademarks, logos, and service marks ("Marks") displayed on this website are the property of Pentaho
Corporation or third party owners of such Marks. You are not permitted to use, copy, or imitate the Mark, in
whole or in part, without the prior written consent of Pentaho Corporation or such third party. Trademarks of
Pentaho Corporation include, but are not limited to, "Pentaho", its products, services, and the Pentaho logo.
Trademarked names may appear throughout this website. Rather than list the names and entities that own the
trademarks or inserting a trademark symbol with each mention of the trademarked name, Pentaho
Corporation states that it is using the names for editorial purposes only and to the benefit of the trademark
owner, with no intention of infringing upon that trademark.
For a listing of open source software used by each Pentaho component, navigate to the folder that contains
the Pentaho component. Within that folder, locate a folder named licenses. The licenses folder contains
HTML files that list the names of open source software, their licenses, and required attributions.
Contact Us
http://www.pentaho.com
Introduction
Prerequisites
Pentaho Data Integration (PDI) can be configured to use your implementation of LDAP, MSAD, Apache DS, or
Kerberos to authenticate users and authorize data access. You can also configure PDI to use a Single Sign On
(SSO) framework or use a combination of these approaches.
Before you implement advanced security, you should have installed and configured the DI Server and Spoon,
which is the PDI design tool.
You should have administrative-level knowledge of the security provider you want to use, details about your
user community, and a plan for the user roles to be used in PDI. You should also know how to use the
command line to issue commands for Microsoft Windows or Linux.
You will need a text editor to modify text files. You might also need to work on the actual machine that has DI
software installed.
We support two different security options: Pentaho Security or advanced security providers, such as LDAP,
Single Sign-On, or Microsoft Active Directory. This table can help you choose the option that is best for your
environment.
Pentaho Security is the easiest way to configure security quickly. Spoon enables you to define and manage users and roles, and the DI Server controls which users and roles can access resources in the DI repository. Pentaho Security works well if you do not have a security provider, or if you have a user community with fewer than 100 users.

If you are already using a security provider, such as LDAP, Single Sign-On, or Microsoft Active Directory, you can use the users and roles you have already defined with Pentaho. Your security provider controls which users and roles can access the DI repository. Advanced security scales well for production and enterprise user communities.
Related Articles
These articles explain how to administer, fine-tune, and troubleshoot Pentaho systems.
Administer DI Server
Troubleshoot DI Server Issues
Configure LDAP for the DI Server
You must have a working directory server with an established configuration before continuing.
Follow the instructions below to manually switch from Pentaho default security to LDAP security.
contextSource.providerUrl=ldap\://localhost\:10389/ou\=system
contextSource.password=secret
4. Update adminRole and adminUser for your system, replacing adminRole with the administrator role that you have defined in your LDAP server, and replacing adminUser with the user name that has the administrator role assigned to it.
adminRole=cn\=Administrator,ou\=roles
adminUser=uid\=admin,ou\=users
5. Save and close the file, then edit the following files in the /pentaho/server/data-integration-server/pentaho-solutions/system/ directory and change all instances of the Administrator and Authenticated role values to match the appropriate roles in your LDAP configuration:
pentaho.xml
repository.spring.properties
applicationContext-spring-security.xml
LDAP Properties
Users
These options control how the LDAP server is searched for usernames that are entered in the Pentaho login
dialog box.
The {0} token is replaced by the username from the login dialog.
definition, you would not need to repeat that in userSearch.searchBase below because it is appended
automatically to the defined value here.
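As an illustrative sketch only, using the escaped-properties convention from applicationContext-security-ldap.properties shown earlier, a user search configuration might look like the following. The base and filter values here are placeholders, not values taken from this document:

```properties
# Where to start searching for users (placeholder value)
userSearch.searchBase=ou\=users
# {0} is replaced by the username typed into the login dialog
userSearch.searchFilter=(uid\={0})
```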
In MSAD, usernames are represented by sAMAccountName; full names are represented by displayName.
Populator
The populator matches fully distinguished user names from userSearch to the distinguished names of the roles those users belong to.
The {0} token will be replaced with the user DN found during a user search; the {1} token is replaced with
the username entered in the login screen.
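Continuing the same properties convention, a populator section might look like this sketch; the base, filter, and attribute values are assumptions for a generic directory, not values taken from this document:

```properties
# Where to search for role entries (placeholder value)
populator.groupSearchBase=ou\=roles
# {0} is replaced with the user DN found during the user search
populator.groupSearchFilter=(roleOccupant\={0})
# Attribute whose value becomes the role name
populator.groupRoleAttribute=cn
```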
Manual JDBC Connection Configuration
You must have existing security tables in a relational database in order to proceed with this task.
Follow the instructions below to switch from Pentaho default security to JDBC security, which will allow you to
use your own security tables.
Note: If you are using the BA Server and choose to switch to a JDBC security shared object, you will no longer
be able to use the role and user administration settings in the Administration portion of the User Console.
datasource.driver.classname=org.hsqldb.jdbcDriver
datasource.url=jdbc:hsqldb:hsql://localhost:9002/userdb
b. Change the user name and password by editing these two items.
datasource.username=sa
datasource.password=
c. Set the validation query by editing this row. There are examples of different validation queries
in the file.
d. Set the wait timeout, max pool, and max idle by editing these three items to change the
defaults.
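A hedged sketch of what steps c and d might look like; the property names and values below are assumptions modeled on the examples already present in the file, so check the file's comments for the exact keys your version uses:

```properties
# Validation query for an HSQLDB datasource (other examples are in the file)
datasource.validation.query=SELECT 1 FROM INFORMATION_SCHEMA.SYSTEM_USERS
# Pool tuning: wait timeout (ms), maximum active connections, maximum idle
datasource.pool.max.wait=-1
datasource.pool.max.active=20
datasource.max.idle=5
```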
<value>
<![CDATA[SELECT username, authority FROM GRANTED_AUTHORITIES
WHERE username = ? ORDER BY authority]]>
</value>
c. Find this line and change the query that determines the user, password, and whether they can
log in as appropriate.
<value>
<![CDATA[SELECT username, password, enabled FROM USERS WHERE
username = ? ORDER BY username]]>
</value>
6. If you need to, modify these three queries that pull information about users/authorities.
a. Open the /pentaho-solutions/system/applicationContext-pentaho-security-
jdbc.xml file with a text editor.
b. Find this line and change the query that shows the roles for security on objects as appropriate.
<value>
<![CDATA[SELECT distinct(authority) as authority FROM AUTHORITIES
ORDER BY authority]]>
</value>
c. Find this line and change the query that returns all users in a specific role as appropriate.
<value>
<![CDATA[SELECT distinct(username) as username FROM GRANTED_AUTHORITIES
where authority = ? ORDER BY username]]>
</value>
d. Find this line and change the query that returns all users as appropriate.
<value>
<![CDATA[SELECT distinct(username) as username FROM USERS ORDER
BY username]]>
</value>
singleTenantAdminUserName=<Admin User>
8. To fully map the JDBC's admin role to other configuration files, specify the name of the administrator
role for your JDBC authentication database in the applicationContext-pentaho-security-
jdbc.xml file.
a. Open the /pentaho-solutions/system/applicationContext-pentaho-security-
jdbc.xml file with a text editor.
b. Find these lines and change the entry key to the key assigned to the administrator role in your
JDBC authentication database.
Create LDAP/JDBC Hybrid Configuration for the DI
Server
You must have a working directory server with an established configuration, and a database containing your
user roles before continuing.
It is possible to use a directory server for user authentication and a JDBC security table for role definitions. This is common in situations where LDAP roles cannot be redefined for DI Server use. Follow the instructions below to switch the DI Server's authentication backend from the Pentaho data access object to an LDAP/JDBC hybrid.
Note: Replace the pentahoAdmins and pentahoUsers references in the examples below with the appropriate
roles from your LDAP configuration.
<pen:publish as-type="INTERFACES">
<pen:attributes>
<pen:attr key="priority" value="50"/>
</pen:attributes>
</pen:publish>
</pen:bean>
<bean id="dataSource"
class="org.springframework.jdbc.datasource.DriverManagerDataSource">
<property name="driverClassName" value="org.hsqldb.jdbcDriver" />
<property name="url" value="jdbc:hsqldb:hsql://localhost:9002/userdb" />
<property name="username" value="sa" />
<property name="password" value="" />
</bean>
<bean id="userDetailsService"
class="org.springframework.security.userdetails.jdbc.JdbcDaoImpl">
<property name="dataSource">
<ref local="dataSource" />
</property>
<property name="authoritiesByUsernameQuery">
<value> <![CDATA[SELECT username, authority FROM
granted_authorities WHERE username = ?]]></value>
</property>
<property name="usersByUsernameQuery">
<value> <![CDATA[SELECT username,
password, enabled FROM users WHERE username = ?]]>
</value>
</property>
</bean>
6. Close applicationContext-pentaho-security-jdbc.xml.
7. Open the /pentaho-solutions/system/applicationContext-spring-security-ldap.xml file and
replace the populator bean definition with this one.
10. Start the DI Server and Spoon, then log into Spoon.
The DI Server is configured to authenticate users against your directory server.
Configure Microsoft Active Directory for the DI Server
The server does not recognize any difference among LDAP-based directory servers, including Active Directory. However, the way that you modify certain LDAP-specific files will probably differ between Microsoft Active Directory (MSAD) and more traditional LDAP implementations. Below are some MSAD-specific tips that you might find helpful. The file you need to edit is applicationContext-pentaho-security-ldap.xml.
Binding
MSAD allows you to uniquely specify users in two ways, in addition to the standard DN. If the standard DN is
not working, try one of the two below. Each of the following examples is shown in the context of the userDn
property of the Spring Security DefaultSpringSecurityContextSource bean.
Note: The examples in this section use DefaultSpringSecurityContextSource. Be aware that you may need to
use the same notation (Kerberos or Windows domain) in all of your DN patterns.
Kerberos notation example for [email protected]:
File: applicationContext-security-ldap.properties
contextSource.providerUrl=ldap\://mycompany\:389
[email protected]
contextSource.password=omitted
Windows domain notation example for MYCOMPANY\pentahoadmin:
File: applicationContext-security-ldap.properties
contextSource.providerUrl=ldap\://mycompany\:389
contextSource.userDn=MYCOMPANY\pentahoadmin
contextSource.password=omitted
Referrals
If more than one Active Directory instance is serving directory information, it may be necessary to enable
referral following. This is accomplished by modifying the DefaultSpringSecurityContextSource bean.
<property name="userDn" value="${contextSource.userDn}"/>
<property name="password" value="${contextSource.password}"/>
<property name="referral" value="follow" />
</bean>
<bean id="authenticator"
class="org.springframework.security.providers.ldap.authenticator.
BindAuthenticator">
<constructor-arg>
<ref local="contextSource"/>
</constructor-arg>
<property name="userDnPatterns">
<list>
<value>{0}@mycompany.com</value> <!-- and/or -->
<value>domain\{0}</value>
</list>
</property>
</bean>
In user searches, the sAMAccountName attribute should be used as the username. The searchSubtree
property (which influences the SearchControls) should most likely be true. Otherwise, it searches the specified
base plus one level down.
<bean id="userSearch"
class="org.springframework.security.ldap.search.FilterBasedLdapUserSearch">
<constructor-arg index="0" value="DC=mycompany,DC=com" />
<constructor-arg index="1">
<value>(sAMAccountName={0})</value>
</constructor-arg> <constructor-arg index="2">
<ref local="contextSource" />
</constructor-arg>
<property name="searchSubtree" value="true"/>
</bean>
Nested Groups
You can pull nested or transitive groups out of Active Directory. In the LDAP populator's group search filter, enter the following LDAP filter for MSAD nested groups:
(member:1.2.840.113556.1.4.1941:={0})
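If you keep your filters in applicationContext-security-ldap.properties rather than in the XML file, the equivalent entry might look like the sketch below; the key name and the escaping are assumptions based on the other property examples in this document:

```properties
populator.groupSearchFilter=(member\:1.2.840.113556.1.4.1941\:\={0})
```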
Implement Kerberos Authentication
If your Hadoop cluster or MongoDB installation is secured using Kerberos, you can configure the Spoon and
DI Server nodes so that users can access them.
Use Kerberos Authentication to Provide Spoon Users
Access to Hadoop Cluster
If you use Kerberos to authenticate access to your Hadoop cluster, then with a little extra configuration you can also use Kerberos to authenticate Spoon users who attempt to access the cluster through a step in a transformation. When a user attempts to run a transformation that contains a step that connects to a Hadoop cluster to perform a function, the user's account credential is matched against the credentials in the Kerberos administrative database on the Hadoop cluster. If the credentials match, the Kerberos Key Distribution Center (KDC) grants an authorization ticket and access is granted. If not, the user is not authenticated and the step does not run.
To set up Kerberos authentication to provide Spoon users with access to the Hadoop cluster, you will need to
perform four sets of tasks.
Install a Hadoop cluster on one or more Linux servers. The cluster should be running one of the versions of
Hadoop listed in the Configuring Pentaho for your Hadoop Distro and Version section of the Pentaho Big Data
wiki.
Configure the Hadoop cluster with a Kerberos Realm, Kerberos KDC, and Kerberos Administrative Server.
Make sure the Hadoop cluster, including the name node, data nodes, secondary name node, job tracker, and task tracker nodes, has been configured to accept remote connection requests.
Make sure the Kerberos clients have been set up for all data, task tracker, name, and job tracker nodes if you have deployed Hadoop using an enterprise-level program.
Install the current version of Spoon on each client machine.
Make sure each client machine can use a hostname to access the Hadoop cluster. You should also test to ensure
that IP addresses resolve to hostnames using both forward and reverse lookups.
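One way to spot-check the forward and reverse lookups mentioned above from a client machine; the hostname and IP address below are placeholders for your own cluster:

```shell
# Forward lookup: name node hostname to IP address
nslookup namenode.example.com
# Reverse lookup: IP address back to the hostname
nslookup 192.168.1.10
```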
1. Log in as root (or a privileged user), to the server that hosts the Kerberos database.
2. Make sure there is an operating system user account on each node in the Hadoop cluster for each user
that you want to add to the Kerberos database. Add operating system user accounts if necessary. Note
that the user account UIDs must be greater than the minimum user ID value (min.user.id). Usually,
the minimum user ID value is set to 1000.
3. Add user identification to the Kerberos database by completing these steps.
a. Open a Terminal window, then add the account username to the Kerberos database, like this.
The name should match the operating system user account that you verified (or added) in the
previous step. If successful, a message appears indicating that the user has been created.
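With MIT Kerberos, step 3a might look like the following; kadmin.local is run on the KDC host itself, and the principal name is a placeholder that should match the operating system account:

```shell
kadmin.local -q "addprinc <username>"
```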
1. If you have not done so already, log into the server that contains the Kerberos Administrative Server
and the KDC.
2. Set the Kerberos Administrative Server to run as a service when the system starts. By default, the name
of the Kerberos Administrative Server is kadmin. If you do not know how to do this, check the
documentation for your operating system.
3. Set the KDC to run as a service when the system starts. By default, the name of the KDC is krb5kdc.
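On a systemd-based Linux distribution, steps 2 and 3 might look like the commands below; the unit names vary by distribution (for example, kadmin may be packaged as krb5-admin-server on Debian-family systems), so treat these as assumptions:

```shell
systemctl enable krb5kdc
systemctl enable kadmin
```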
Install JCE on Linux and Mac Clients
This step is optional. The KDC configuration includes an AES-256 encryption setting. If you want to use this
encryption strength, you will need to install the Java Cryptographic Extension (JCE) files.
1. Download the Java Cryptographic Extension (JCE) for the currently supported version of Java from the
Oracle site.
2. Read the installation instructions that are included with the download.
3. Copy the JCE jars to the java/lib/security directory where PDI is installed on the Linux client
machine.
Configure PDI for Hadoop Distribution and Version on Linux and Mac Clients
To configure DI to connect to the Hadoop cluster, you'll need to copy Hadoop configuration files from the
cluster's name node to the appropriate place in the hadoop-configurations subdirectory.
1. Back up the core-site.xml, hdfs-site.xml, and mapred-site.xml files that are in the design-
tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-
configurations/<directory of the shim that is in your plugin.properties file>.
2. Copy the core-site.xml, hdfs-site.xml, and mapred-site.xml from the cluster's name node to
this directory on each client: design-tools/data-integration/plugins/pentaho-big-data-
plugin/hadoop-configurations/<directory of the shim that is in your
plugin.properties file>. Note: If you made configuration changes to the core-site.xml, hdfs-
site.xml, or mapred-site.xml files previously, you will need to make those changes again. Reference
your backed up copies of the files if needed.
Modify Kerberos Configuration File to Reflect Realm, KDC, and Admin Server on Linux
and Mac Clients
Modify the Kerberos configuration file to reflect your Realm, KDC, and Admin Server.
1. Open the krb5.conf file. By default this file is located in /etc/krb5.conf, but it might appear
somewhere else on your system.
2. Add your Realm, KDC, and Admin Server information. The text between the angle brackets < >
indicates where you should modify the code to match your specific environment.
[libdefaults]
default_realm = <correct default realm name>
clockskew = 300
v4_instance_resolve = false
v4_name_convert = {
host = {
rcmd = host
ftp = ftp
}
plain = {
something = something-else
}
}
[realms]
<correct default realm name>= {
kdc=<KDC IP Address, or resolvable Hostname>
admin_server=< Admin Server IP Address, or resolvable Hostname>
}
MY.REALM = {
kdc = MY.COMPUTER
}
OTHER.REALM = {
v4_instance_convert = {
kerberos = kerberos
computer = computer.some.other.domain
}
}
[domain_realm]
.my.domain = MY.REALM
Consult your operating system's documentation for information on how to properly set your clock.
Install JCE on Windows Client
Configure PDI for Hadoop Distribution and Version on Windows Client
Download and Install Kerberos on Windows Client
Modify Kerberos Configuration File to Reflect Realm, KDC, and Admin Server
Synchronize Clock on Windows Client
Obtain Kerberos Ticket on Windows Client
1. Download the Java Cryptographic Extension (JCE) for the currently supported version of Java from the
Oracle site.
2. Read the installation instructions that are included with the download.
3. Copy the JCE jars to the java\lib\security directory where PDI is installed.
1. Back up the core-site.xml, hdfs-site.xml, and mapred-site.xml files that are in the design-
tools/data-integration/plugins/pentaho-big-data-plugin/hadoop-
configurations/<directory of the shim that is in your plugin.properties file>.
2. Copy the core-site.xml, hdfs-site.xml, and mapred-site.xml from the cluster's name node to
this directory on each client: design-tools/data-integration/plugins/pentaho-big-data-
plugin/hadoop-configurations/<directory of the shim that is in your
plugin.properties file>. Note: If you made configuration changes to the core-site.xml, hdfs-
site.xml, or mapred-site.xml files previously, you will need to make those changes again. Reference
your backed up copies of the files if necessary.
Modify Kerberos Configuration File to Reflect Realm, KDC, and Admin Server on
Windows Client
You will need to modify the Kerberos configuration file to reflect the appropriate realm, KDC, and Admin
Server.
1. Open the krb5.conf file. By default this file is located in C:\ProgramData\Kerberos. This location
might be different on your system.
2. Add the appropriate realm, KDC, and Admin Server information. An example of where to add the data
appears below.
[libdefaults]
default_realm = <correct default realm name>
clockskew = 300
v4_instance_resolve = false
v4_name_convert = {
host = {
rcmd = host
ftp = ftp
}
plain = {
something = something-else
}
}
[realms]
<correct default realm name>= {
kdc=<KDC IP Address, or resolvable Hostname>
admin_server=< Admin Server IP Address, or resolvable Hostname>
}
MY.REALM = {
kdc = MY.COMPUTER
}
OTHER.REALM = {
v4_instance_convert = {
kerberos = kerberos
computer = computer.some.other.domain
}
}
[domain_realm]
.my.domain = MY.REALM
Consult your operating system's documentation for information on how to properly set your clock.
Obtain Kerberos Ticket on Windows Client
To obtain a Kerberos ticket, complete these steps.
1. Start Spoon.
2. Open an existing transformation that contains a step to connect to the Hadoop cluster. If you don't
have one, consider creating something like this.
a. Create a new transformation.
b. Drag the Generate Rows step to the canvas, open the step, indicate a limit (the number of rows
you want to generate), then put in field information, such as the name of the field, type, and a
value.
c. Click Preview to ensure that data generates, then click the Close button to save the step.
d. Drag a Hadoop File Output step onto the canvas, then draw a hop between the Generate Rows
and Hadoop File Output steps.
e. In the Filename field, indicate the path to the file that will contain the output of the Generate Rows step. The path should be on the Hadoop cluster. Make sure that you specify an extension such as txt, that you choose to create a parent directory, and that you choose to add filenames to the result.
f. Click the OK button then save the transformation.
3. Run the transformation. If there are errors, correct them.
4. When complete, open a Terminal window and view the results of the output file on the Hadoop
filesystem. For example, if you saved your file to a file named test.txt, you could type a command
like this:
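Assuming the transformation wrote its output to a path such as /user/pdi/test.txt (a placeholder path), the check might look like:

```shell
hadoop fs -cat /user/pdi/test.txt
```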
Use Kerberos Authentication to Provide Spoon Users
Access to MongoDB
If you use Kerberos to authenticate access to your installation of MongoDB, then with a little extra configuration you can also use Kerberos to authenticate Spoon users who attempt to access MongoDB through a step in a transformation. When a user attempts to run a transformation that contains a step that connects to a MongoDB cluster to perform a function, the credentials in the step are matched against the credentials in the Kerberos administrative database on MongoDB. If the credentials match, the Kerberos Key Distribution Center (KDC) grants an authorization ticket and access is granted. If not, the user is not authenticated and the step does not run.
To set up Kerberos authentication to provide Spoon users with access to MongoDB you will need to perform
several sets of tasks.
Make sure that you have installed and configured an Enterprise version of MongoDB according to the instructions in the MongoDB installation guide: http://docs.mongodb.org/manual/installation/.
Configure MongoDB to use Kerberos. Instructions for how to do that appear here: http://docs.mongodb.org/
manual/tutorial/control-access-to-mongodb-with-kerberos-authentication/.
Install the current version of Spoon on each client machine.
Make sure each client machine can use a hostname to access MongoDB. You should also test to ensure that IP
addresses resolve to hostnames using both forward and reverse lookups.
1. Log in as root (or a privileged user), to the server that hosts the Kerberos database.
2. Add user identification to the Kerberos database by completing these steps.
a. Open a Terminal window.
b. Add the account username to the Kerberos database, like this. The username should match the
one used to create the user in MongoDB. See the create users part of
http://docs.mongodb.org/manual/tutorial/control-access-to-mongodb-with-kerberos-
authentication/ for more details. If successful, a message appears indicating that the user has
been created.
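As in the Hadoop section, this might be done with MIT Kerberos's kadmin.local on the KDC host; the principal name is a placeholder and should match the username you created in MongoDB:

```shell
kadmin.local -q "addprinc <username>"
```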
1. If you have not done so already, log into the server that contains the Kerberos Administrative Server
and the KDC.
2. Set the Kerberos Administrative Server to run as a service when the system starts. By default, the name
of the Kerberos Administrative Server is kadmin. If you do not know how to do this, check the
documentation for your operating system.
3. Set the KDC to run as a service when the system starts. By default, the name of the KDC is krb5kdc.
1. Download the Java Cryptographic Extension (JCE) for the currently supported version of Java from the
Oracle site.
2. Read the installation instructions that are included with the download.
3. Copy the JCE jars to the java/lib/security directory where PDI is installed on the Linux client
machine.
Modify Kerberos Configuration File to Reflect Realm, KDC, and Admin Server on Linux
and Mac Clients
Modify the Kerberos configuration file to reflect your Realm, KDC, and Admin Server.
1. Open the krb5.conf file. By default this file is located in /etc/krb5.conf, but it might appear
somewhere else on your system.
2. Add your Realm, KDC, and Admin Server information. The text between the angle brackets < >
indicates where you should modify the code to match your specific environment.
[libdefaults]
default_realm = <correct default realm name>
clockskew = 300
v4_instance_resolve = false
v4_name_convert = {
host = {
rcmd = host
ftp = ftp
}
plain = {
something = something-else
}
}
[realms]
<correct default realm name>= {
kdc=<KDC IP Address, or resolvable Hostname>
admin_server=< Admin Server IP Address, or resolvable Hostname>
}
MY.REALM = {
kdc = MY.COMPUTER
}
OTHER.REALM = {
v4_instance_convert = {
kerberos = kerberos
computer = computer.some.other.domain
}
}
[domain_realm]
.my.domain = MY.REALM
Specify the Location of the Kerberos Configuration File on Mac Clients That Run Spoon
If you are configuring Spoon to use Kerberos to authenticate to MongoDB on a Mac client, you might need to manually specify where the Kerberos configuration file can be found. Do this if the version of the JRE that Spoon uses is earlier than Java 1.7.0_40, because the JRE attempts to find the Kerberos configuration file in a different location than the default.
-Djava.security.krb5.realm=<Kerberos Realm>
-Djava.security.krb5.kdc=<Kerberos KDC>
3. If you need to set additional configuration properties for your Kerberos installation, see Locating the
krb5.conf Configuration File section located in http://docs.oracle.com/javase/7/docs/technotes/guides/
security/jgss/tutorials/KerberosReq.html for details.
4. Save and close the launcher.properties file.
Specify the Location of the Kerberos Configuration File on Mac Clients that Run PRD
If you are configuring PRD to use Kerberos to authenticate MongoDB on a Mac, you will need to manually
specify where the Kerberos configuration file can be found. You must do this if the version of the JRE that
PRD uses is earlier than Java 1.7.0_40, because it attempts to find the Kerberos configuration file in a
different location than the default.
1. Use Finder to navigate to the Pentaho Report Designer.app file, which is in the design-tools
directory. Right-click and select Show Package Contents.
2. Navigate to Contents > Java.
3. Open launcher.properties. Do not use the launcher.properties file that is in the root of the
app directory.
4. In the launcher.properties file, add a java parameter that indicates the realm and the KDC that you
specified in the Modify Kerberos Configuration File to Reflect Realm, KDC, and Admin Server step.
Make sure to set both of these properties.
-Djava.security.krb5.realm=<Kerberos Realm>
-Djava.security.krb5.kdc=<Kerberos KDC>
5. If you need to set additional configuration properties for your Kerberos installation, see the Locating the
krb5.conf Configuration File section of http://docs.oracle.com/javase/7/docs/technotes/guides/
security/jgss/tutorials/KerberosReq.html for details.
6. Save and close the launcher.properties file.
Consult your operating system's documentation for information on how to properly set your clock.
1. Download the Java Cryptographic Extension (JCE) for the currently supported version of Java from the
Oracle site.
2. Read the installation instructions that are included with the download.
3. Copy the JCE jars to the java\lib\security directory where PDI is installed.
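Step 3 boils down to two file copies. The sketch below mocks the source and destination directories with temporary folders so the commands run anywhere; the jar names follow Oracle's JCE 7 download, and in a real install the destination is the java\lib\security folder under the PDI directory.

```shell
# Mocked layout so the sketch is runnable; replace with real paths.
JCE_DIR=$(mktemp -d)                 # where the JCE download was unzipped
PDI_SEC=$(mktemp -d)/lib/security    # stands in for <PDI>/java/lib/security
mkdir -p "$PDI_SEC"
touch "$JCE_DIR/local_policy.jar" "$JCE_DIR/US_export_policy.jar"  # stand-in jars

# The actual step: copy both unlimited-strength policy jars into place.
cp "$JCE_DIR/local_policy.jar" "$JCE_DIR/US_export_policy.jar" "$PDI_SEC/"
ls "$PDI_SEC"
```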
Modify Kerberos Configuration File to Reflect Realm, KDC, and Admin Server on
Windows Client
You will need to modify the Kerberos configuration file to reflect the appropriate realm, KDC, and Admin
Server.
1. Open the krb5.conf file. By default this file is located in c:\Program Data\Kerberos. This location
might be different on your system.
2. Add the appropriate realm, KDC, and Admin Server information. An example of where to add the data
appears below.
[libdefaults]
default_realm = <correct default realm name>
clockskew = 300
v4_instance_resolve = false
v4_name_convert = {
host = {
rcmd = host
ftp = ftp
}
plain = {
something = something-else
}
}
[realms]
<correct default realm name>= {
kdc=<KDC IP Address, or resolvable Hostname>
admin_server=< Admin Server IP Address, or resolvable Hostname>
}
MY.REALM = {
kdc = MY.COMPUTER
}
OTHER.REALM = {
v4_instance_convert = {
kerberos = kerberos
computer = computer.some.other.domain
}
}
[domain_realm]
.my.domain = MY.REALM
5. Restart the computer.
Consult your operating system's documentation for information on how to properly set your clock.
1. Start Spoon.
2. Create a new transformation.
3. Drag the MongoDB Input step to the canvas and open the step.
4. Enter the host name of the MongoDB instance and port for MongoDB.
5. In the username field, indicate the Kerberos principal, using this format:
<primary>/<instance>@KERBEROS_REALM. (Be sure to include the forward slash. Also note that the
Kerberos Realm is case sensitive.) Check with your administrator if you do not know your Kerberos
principal.
6. Leave the password field blank.
7. Click the Authenticate using Kerberos checkbox.
8. Click the Input options tab, then enter the name of a database on MongoDB to which you have read
permissions.
9. Click the Get Collections button. You should be able to see the databases you have read access to, as
well as the collections, in the drop-down lists.
10. Click the Preview button. If you see data, then you know that Kerberos is working properly.
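The preview in step 10 assumes the client already holds a valid Kerberos ticket. As a hedged pre-flight check (MIT Kerberos client tools assumed), you can test for a ticket cache before launching Spoon: klist -s is silent and exits non-zero when no usable ticket exists.

```shell
# Pre-flight check: is there a usable Kerberos ticket cache?
# 'klist -s' exits non-zero when there is none (or klist is absent).
if klist -s 2>/dev/null; then
  echo "ticket present - Spoon can authenticate"
else
  echo "no ticket - obtain one first, e.g. kinit primary/instance@KERBEROS_REALM"
fi
```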
Use Impersonation to Access a MapR Cluster
By default, the DI Server admin user executes transformations and jobs. But if your transformation or job
needs to run on a MapR cluster or access its resources, the DI Server admin might not have an account there,
or might not have the right permissions and access.
Using impersonation helps solve this issue. With impersonation, you indicate that a transformation should run
using the permissions and access of a different Hadoop user. Impersonation leverages the Hadoop user's
existing permissions and access to reach components that run on MapR clusters, such as
MapReduce, Pig, Oozie, Sqoop, Hive, or a directory on HDFS.
Instructions for impersonation or spoofing depend on your Spoon client’s operating system.
• Prerequisites for Impersonation and Spoofing for Both Linux and Windows Nodes
• Set Up Impersonation on Linux Client Node
• Set Up Spoofing on Windows Client Node
Make MapR the Active Hadoop Distribution
Make MapR your active Hadoop distribution and configure it. See Set Active Hadoop Distribution and
Additional Configuration for MapR Shims for more detail.
NOTE:
[hadoop distribution] is the name of the Hadoop distribution, such as mapr31.
authentication.kerberos.password=userPassword
NOTE:
Use Kettle encryption to store the password more securely.
• To authenticate with a keytab file, set the authentication.kerberos.keytabLocation property to the keytab
file path.
authentication.kerberos.keytabLocation=/home/Server14/Kerberos/username.keytab
NOTE:
If both the authentication.kerberos.password and authentication.kerberos.keytabLocation properties are set, the
authentication.kerberos.password property takes precedence.
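Taken together, the Kerberos entries in the shim's config.properties might look like the hypothetical sketch below. The principal property is an assumption (this excerpt only shows the password, keytab, id, and provider properties), values are placeholders, and you would normally set either the password or the keytab, not both.

```properties
# Hypothetical config.properties fragment (values are placeholders).
authentication.kerberos.principal=user@EXAMPLE.COM
# Set one of the following two; the password wins if both are present.
authentication.kerberos.password=userPassword
authentication.kerberos.keytabLocation=/home/Server14/Kerberos/username.keytab
# Tie the credentials to an ID and make that ID the superuser provider.
authentication.kerberos.id=mapr-kerberos
authentication.superuser.provider=mapr-kerberos
```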
1. Assign an ID to the authentication credentials that you just specified (Kerberos Principal and password
or keytab), by setting the authentication.kerberos.id property.
authentication.kerberos.id=mapr-kerberos
1. To use authentication credentials you just specified, set the authentication.superuser.provider to the
authentication.kerberos.id.
authentication.superuser.provider=mapr-kerberos
1. Open the mapr.login.conf file on the host. By default, the file is located in /opt/mapr/conf.
2. In the hadoop_hybrid section, set useTicketCache and renewTGT variables to false, like this:
hadoop_hybrid{
org.apache.hadoop.security.login.KerberosBugWorkAroundLoginModule optional
useTicketCache=false
renewTGT=false
1. Open the hive-site.xml file that is on the hive server host. Note the values for the
kerberos.principal and the sasl.qop.
2. Close the hive-site.xml file.
3. Start Spoon.
4. In Spoon, open the Database Connection window.
5. Click Options. Add the following parameters and set them to the values that you noted in the
hive-site.xml file.
• sasl.qop
• principal
NOTE:
The principal typically has a mapr prefix before the name, like this: mapr/mapr31.pentaho@mydomain
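Options set in the Database Connection window are typically appended to the Hive JDBC URL, so the resulting connection string might look like the hypothetical example below. The host, port, database, and auth-conf value are placeholders; sasl.qop accepts auth, auth-int, or auth-conf.

```
jdbc:hive2://hiveserver.example.com:10000/default;principal=mapr/mapr31.pentaho@mydomain;sasl.qop=auth-conf
```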
NOTE:
[hadoop configuration] is the name of the Hadoop distribution, such as mapr31.
1. Set the usernames in the <value> tag for the proxy users as needed. The username you use should be
recognized by every node in the MapR cluster.
HDFS: pentaho.hdfs.proxy.user
MapReduce: pentaho.mapreduce.proxy.user
Pig: pentaho.pig.proxy.user
Sqoop: pentaho.sqoop.proxy.user
Oozie: pentaho.oozie.proxy.user
<configuration>
<property>
<name>pentaho.hdfs.proxy.user</name>
<value>jdoe</value>
</property>
<property>
<name>pentaho.mapreduce.proxy.user</name>
<value>bmichaels</value>
</property>
<property>
<name>pentaho.pig.proxy.user</name>
<value>jdoe</value>
</property>
<property>
<name>pentaho.sqoop.proxy.user</name>
<value>cclarke</value>
</property>
<property>
<name>pentaho.oozie.proxy.user</name>
<value>jdoe</value>
</property>
<property>
<name>hadoop.spoofed.user.uid</name>
<value>{UID}</value>
</property>
<property>
<name>hadoop.spoofed.user.gid</name>
<value>{GID}</value>
</property>
<property>
<name>hadoop.spoofed.user.username</name>
<value>{id of user who has UID}</value>
</property>
</configuration>
• Replace {id of user who has UID} with the username of the principal in the config.properties file.
• Replace {UID} with the UID of the hadoop.spoofed.user.username user.
• Replace {GID} with the GID of the hadoop.spoofed.user.username user.
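The {UID} and {GID} values are the numeric IDs of the spoofed account. On a Linux machine you can look them up with the standard id utility; the sketch below queries the current user purely as an illustration (substitute the actual spoofed user name).

```shell
# Numeric UID to paste into hadoop.spoofed.user.uid (current user as an example).
id -u "$(whoami)"
# Numeric GID to paste into hadoop.spoofed.user.gid.
id -g "$(whoami)"
```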
1. Save and close the file.
2. Repeat these steps for Spoon. In Spoon the core-site.xml file is in data-integration/plugins/
pentaho-big-data-plugin/hadoop-configurations/[hadoop distribution].
Apply AES Password Encryption
There are two ways to secure passwords in PDI: Kettle obfuscation and AES. Kettle obfuscation is applied by
default. To increase security, use the Advanced Encryption Standard (AES) instead. The password security
method you choose is applied to all passwords, including those in database connections, transformation steps,
and job entries. To learn more about AES, see http://en.wikipedia.org/wiki/Advanced_Encryption_Standard.
NOTE:
If you switch password security methods, all existing passwords will also use the new method.
To use the 192-bit or 256-bit encryption strengths, install the Java Cryptography Extension (JCE). You do not
need to install the JCE to use 128-bit encryption. To learn more about the JCE, see the Oracle site.
1. Create a text file that contains a key phrase, such as !@ExampleKey#123. Note that leading and trailing
whitespaces are ignored.
2. Save and close the file.
NOTE:
Safeguard the key file. If the key file becomes corrupted or lost, passwords cannot be decrypted.
1. Open the kettle.properties file for Spoon. By default, the kettle.properties file is in the user’s home
directory.
2. Add the following variables and values.
KETTLE_PASSWORD_ENCODER_PLUGIN = AES
    Required. Indicates the type of plugin used.

KETTLE_AES_KEY_FILE = <path to key file>
    Required. Indicates the location of the key file, NOT the location of the
    kettle.properties file. Example: c:/securearea/keyfile.txt

KETTLE_AES_KETTLE_PASSWORD_HANDLING = DECODE
    Optional. Maintain backwards compatibility by setting this variable to Decode.
    If this is not set, Kettle-encoded passwords are not decoded.
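Put together, a hypothetical kettle.properties fragment enabling AES might look like this (the key-file path is an example, and the key-file variable name is an assumption):

```properties
KETTLE_PASSWORD_ENCODER_PLUGIN=AES
KETTLE_AES_KEY_FILE=c:/securearea/keyfile.txt
KETTLE_AES_KETTLE_PASSWORD_HANDLING=DECODE
```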
1. Start Spoon.
2. Create a blank transformation.
3. Add a database connection that requires a password.
4. Save, then close the transformation.
5. Use a text editor to open the transformation you just saved, then search for the name of the connection
you created.
6. Examine the password. If the password is preceded by the letters AES, the encryption method was
applied properly.
7. Close the transformation.
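Steps 5 and 6 can also be approximated from a terminal. The sketch below fabricates a minimal one-line stand-in for a saved transformation (a real .ktr file is a full XML document, and the password value here is made up) just to demonstrate the check; run the grep against your actual file instead.

```shell
# Stand-in for a saved transformation; a real .ktr is a full XML document.
printf '<connection><password>AES 5mYuKQZGZc0uEeDIN4RY3w==</password></connection>\n' > /tmp/sample.ktr

# An AES-encoded password appears with an "AES" prefix inside <password>.
grep -c 'password>AES' /tmp/sample.ktr   # prints 1 when the prefix is present
```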