Using Zeppelin
http://docs.hortonworks.com
Contents
Introduction
Launch Zeppelin
Working with Zeppelin Notes
Import External Packages
Configuring and Using Zeppelin Interpreters
Introduction
An Apache Zeppelin notebook is a browser-based GUI you can use for interactive data exploration, modeling, and
visualization.
As an Apache Zeppelin notebook author or collaborator, you write code in a browser window. When you run the code
from the browser, Zeppelin sends it to backend processors such as Spark. The processor or service returns
results; you can then use Zeppelin to review and visualize them in the browser.
Note: Zeppelin on HDP does not support sharing a note by sharing its URL, due to lack of proper access control over
how and with whom a note can be shared.
Apache Zeppelin is supported on the following browsers:
• Internet Explorer, latest supported releases. (Zeppelin is not supported on versions 8 or 9, due to lack of native
support for WebSockets.)
• Google Chrome, latest stable release.
• Mozilla Firefox, latest stable release.
• Apple Safari, latest stable release. Note that when SSL is enabled for Zeppelin, Safari requires a Certificate
Authority-signed certificate to access the Zeppelin UI.
Launch Zeppelin
Use the following steps to launch Apache Zeppelin.
To launch Zeppelin in your browser, access the host and port associated with the Zeppelin server. The default port is
9995:
http://<zeppelinhost>:9995.
When you first connect to Zeppelin, you will see the home page.
If Zeppelin is configured for LDAP or Active Directory authentication, log in before using Zeppelin:
1. Click the Login button at the top right corner of the page.
2. In the login dialog box, specify a valid username and password.
If Active Directory is used for the identity store, you might need to fully qualify your account name (unless the
activeDirectoryRealm.principalSuffix property was set during AD configuration); for example:
[email protected]
3. Zeppelin presents its home page.
The following menus are available in the top banner of all Zeppelin pages:
• Notebook
• User settings
Displays your username, or "anonymous" if security is not configured for Zeppelin.
Working with Zeppelin Notes
Zeppelin ships with several sample notes, including tutorials that demonstrate how to run Spark Scala code and
Spark SQL code, and how to create visualizations.
To run a tutorial:
1. Navigate to the tutorial: click one of the Zeppelin tutorial links on the left side of the welcome page, or use the
Notebook pull-down menu.
2. Zeppelin presents the tutorial, a sequence of paragraphs prepopulated with code and text.
3. Starting with the first paragraph, click the triangle button at the upper right of the paragraph. Zeppelin displays the status near the triangle button: PENDING, RUNNING, ERROR, or FINISHED.
4. When the paragraph finishes execution, results appear in the result section below your code. Review the results.
5. Step through each paragraph, running the code and reviewing results.
The settings icon at the upper right of each paragraph offers several additional commands, which allow you to perform note operations such as showing and hiding line numbers, clearing the results section, and deleting the paragraph.
Import a Note
Use the following steps to import an Apache Zeppelin note.
Procedure
1. Click "Import note" on the Zeppelin home page:
2. Zeppelin displays an import dialog with two options: upload a local JSON file, or add a note from a URL.
3. To upload the file or specify the URL, click the associated box.
By default, the name of the imported note is the same as the original note. You can rename it by providing a new
name in the "Import As" field.
Export a Note
Use the following steps to export an Apache Zeppelin note.
To export a note to a local JSON file, use the export note icon in the note toolbar.
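Zeppelin saves the note as a single JSON document. The following is a minimal sketch of the structure; the note name, paragraph title, and paragraph text are illustrative, and the exported file contains additional fields:
{
  "name": "My Note",
  "paragraphs": [
    {
      "title": "List the working directory",
      "text": "%sh\npwd"
    }
  ]
}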
Import External Packages
9
Apache Zeppelin Configuring and Using Zeppelin Interpreters
To use an external package within a note, specify the dependency in the associated interpreter settings, or (for the Spark interpreter) pass it to spark-submit using:
• SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh
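For example, a minimal sketch of the zeppelin-env.sh setting; the package coordinates are illustrative (the same spark-csv package used in the conf interpreter example later in this chapter):
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.11:1.2.0"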
If you want to import a library for a note that uses the Livy interpreter, see "Using the %livy Interpreter to Access
Spark" in the HDP Apache Spark guide.
Configuring and Using Zeppelin Interpreters
3. Under "Settings", make sure that the interpreter you want to use is selected (in blue text). Unselected interpreters
appear in white text:
4. To select an interpreter, click on the interpreter name to select the interpreter. Each click operates as a toggle.
5. You should unselect interpreters that will not be used. This makes your choices clearer. For example, if you plan
to use %livy to access Spark, unselect the %spark interpreter.
Whenever one or more interpreters could be used to access the same underlying service, you can specify the
precedence of interpreters within a note:
• Drag and drop interpreters into the desired positions in the list.
• When finished, click "Save".
Use an interpreter in a paragraph
To use an interpreter, specify the interpreter directive at the beginning of a paragraph, using the format
%[INTERPRETER_NAME]. The directive must appear before any code that uses the interpreter.
The following paragraph uses the %sh interpreter to access the system shell and list the current working directory:
%sh
pwd
/home/zeppelin
Some interpreters support more than one form of the directive. For example, the %livy interpreter supports directives
for PySpark, PySpark3, SparkR, and Spark SQL.
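For example, the following directive forms are available, shown as they appear at the start of a paragraph (the exact set depends on your interpreter configuration):
%livy.spark
%livy.pyspark
%livy.pyspark3
%livy.sparkr
%livy.sql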
To view interpreter directives and settings, navigate to the Interpreter page and scroll through the list of interpreters
or search for the interpreter name. Directives are listed immediately after the name of the interpreter, followed by
options and property settings. For example, the JDBC interpreter supports the %jdbc directive.
Note: The Interpreter page is subject to access control settings. If the Interpreters page does not list settings, check
with your system administrator for more information.
Use interpreter groups
Each interpreter belongs to an interpreter group. Interpreters in the same group can reference each other. For example,
if the Spark SQL interpreter and the Spark interpreter are in the same group, the Spark SQL interpreter can reference
the Spark interpreter to access its SparkContext.
For example, the following two paragraphs use the %spark.conf interpreter to customize the Spark interpreter for a note, then use the customized interpreter.
First paragraph:
%spark.conf
spark.app.name helloworld
master yarn-client
spark.jars.packages com.databricks:spark-csv_2.11:1.2.0
Second paragraph:
%spark
import com.databricks.spark.csv._
In the first paragraph, the conf interpreter is used to create a custom Spark interpreter configuration (set app name,
yarn-client mode, and add spark-csv dependencies). After running the first paragraph, the second paragraph can be
run to use spark-csv in the note.
In order for the conf interpreter to run successfully, it must be configured on an isolated per-note basis. Also, the
paragraph with the conf interpreter customization settings must be run first, before subsequent applicable interpreter
processes are launched.
Use the JDBC interpreter to access Hive
To access Hive through the JDBC interpreter, add the %jdbc(hive) directive at the start of the paragraph; for example:
%jdbc(hive)
SELECT * FROM table_name;
If you receive an error, you might need to complete the following additional steps:
1. Copy Hive jar files to /usr/hdp/current/zeppelin-server/interpreter/jdbc (or create a soft link).
2. In the Zeppelin UI, navigate to the %jdbc section of the Interpreter page.
3. Click edit, then add a hive.proxy.user.property property and set its value to hive.server2.proxy.user.
4. Click Save, then click restart to restart the JDBC interpreter.
Use the JDBC interpreter to access Phoenix
Use the following steps to query Apache Phoenix through the %jdbc interpreter.
Procedure
1. Add the following directive at the start of a paragraph:
%jdbc(phoenix)
2. Run the query that accesses Phoenix.
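For example, a minimal paragraph; the table name is hypothetical:
%jdbc(phoenix)
SELECT * FROM WEB_STAT LIMIT 10;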
Use the Livy Interpreter to Access Spark
The Livy interpreter offers several advantages over the default Spark interpreter (%spark):
• Sharing of Spark context across multiple Zeppelin instances.
• Reduced resource use, by recycling resources after 60 minutes of inactivity (by default). The default Spark
interpreter runs jobs (and retains job resources) indefinitely.
• User impersonation. When the Zeppelin server runs with authentication enabled, the Livy interpreter propagates
user identity to the Spark job so that the job runs as the originating user. This is especially useful when multiple
users are expected to connect to the same set of data repositories within an enterprise. (The default Spark
interpreter runs jobs as the default Zeppelin user.)
• The ability to run Spark in yarn-cluster mode.
Prerequisites:
• Before using SparkR through Livy, R must be installed on all nodes of your cluster. For more information, see
"SparkR Prerequisites" in the HDP Apache Spark guide.
• Before using Livy in a note, check the Interpreter page to ensure that the Livy interpreter is configured properly
for your cluster.
Note: The Interpreter page is subject to access control settings. If the Interpreters page does not list access settings,
check with your system administrator for more information.
To access PySpark using Livy, specify the corresponding interpreter directive before the code that accesses Spark; for
example:
%livy.pyspark
print "1"
Similarly, to access SparkR using Livy, specify the corresponding interpreter directive:
%livy.sparkr
hello <- function( name ) {
sprintf( "Hello, %s", name );
}
hello("livy")
Important:
To use SQLContext with Livy, do not create SQLContext explicitly. Zeppelin creates SQLContext by default.
If necessary, remove lines like the following (the typical explicit SQLContext creation in Scala) from the SparkSQL declaration area of your note:
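// Remove these lines; Zeppelin and Livy provide sqlContext automatically.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._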
Livy sessions are recycled after a specified period of session inactivity. The default is one hour.
For more information about using Livy with Spark, see "Submitting Spark Applications Through Livy" in the HDP
Apache Spark guide.
Importing External Packages
To import an external package for use in a note that runs with Livy:
1. Navigate to the interpreter settings.
2. If you are running the Livy interpreter in local mode (as specified by livy.spark.master), add jar files to the /usr/hdp/<version>/livy/repl-jars directory.
3. If you are running the Livy interpreter in yarn-cluster mode, either complete step 2 or edit the Livy configuration
on the Interpreters page as follows:
a. Add a new key, livy.spark.jars.packages.
b. Set its value to the Maven coordinates of the package, in groupId:artifactId:version format.
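For example, the key/value pair on the Interpreters page might look like the following; the coordinates are illustrative (the spark-csv package used elsewhere in this chapter):
livy.spark.jars.packages    com.databricks:spark-csv_2.11:1.2.0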
Using Spark Hive Warehouse and HBase Connector Client .jar files with Livy
This section describes how to use Spark Hive Warehouse Connector (HWC) and Spark HBase Connector (SHC)
client .jar files with Livy. These steps are required to ensure token acquisition and avoid authentication errors.
Use the following steps to use Spark HWC and SHC client .jar files with Livy:
1. Copy the applicable HWC or SHC .jar files to the Livy server node and add these folders to the livy.file.local-dir-whitelist property in the livy.conf file.
2. Add the required configurations in the /usr/hdp/current/spark2-client/conf folder:
• For Hive, in /usr/hdp/current/spark2-client/conf/hive-site.xml
• For HBase, in /usr/hdp/current/spark2-client/conf/hbase-site.xml
Alternatively, add the required configurations using the conf field in the session creation request, as sketched after
these steps. This is equivalent to using "--conf" in spark-submit.
3. Reference these local .jar files in the session creation request using the file:/// URI format.
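The following is a sketch of a session creation request that sets configuration through the conf field and references a local .jar file with a file:/// URI. The property value and metastore host are illustrative; Livy's REST endpoint listens on port 8998 by default:
curl -X POST -H "Content-Type: application/json" \
  -d '{"kind": "spark", "jars": ["file:///usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.0.0-1634.jar"], "conf": {"spark.hadoop.hive.metastore.uris": "thrift://<metastore-host>:9083"}}' \
  http://<livy-host>:8998/sessions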
HWC Example
1. Add the following folder to the livy.file.local-dir-whitelist property in the livy.conf file.
/usr/hdp/current/hive_warehouse_connector/
2. Add hive-site.xml to /usr/hdp/current/spark2-client/conf on all cluster nodes.
3. When running the Zeppelin Livy interpreter, reference the HWC .jar file as shown below.
%livy2.conf
livy.spark.jars file:///usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.0.0-1634.jar
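A subsequent paragraph can then use the connector. The following is a hedged sketch, assuming the HiveWarehouseSession API shipped with HWC for HDP 3; the table name is hypothetical:
%livy2
import com.hortonworks.hwc.HiveWarehouseSession

// Build a Hive Warehouse session from the active SparkSession.
val hive = HiveWarehouseSession.session(spark).build()

// Run a query through HWC and display the result.
hive.executeQuery("SELECT * FROM web_sales LIMIT 10").show()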
SHC Example
1. Add the following folders to the livy.file.local-dir-whitelist property in the livy.conf file.
/usr/hdp/current/hbase-client/lib, /usr/hdp/current/shc
2. Add hbase-site.xml to /usr/hdp/current/spark2-client/conf on all cluster nodes.
3. When running the Zeppelin Livy interpreter, reference the following HBase .jar files as shown below. Note
that some of these .jar files have 644/root permissions, and therefore may throw an exception. If this happens, you
may need to change the permissions of the applicable .jar files on the Livy node.
%livy2.conf
livy.spark.jars file:///usr/hdp/current/shc/shc-core-1.1.0.3.0.1.0-65.jar,
file:///usr/hdp/current/hbase-client/lib/hbase-shaded-protobuf-2.1.0.jar,
file:///usr/hdp/current/hbase-client/lib/hbase-shaded-miscellaneous-2.1.0.jar,
file:///usr/hdp/current/hbase-client/lib/hbase-protocol-shaded.jar,
file:///usr/hdp/current/hbase-client/lib/hbase-shaded-netty-2.1.0.jar,
file:///usr/hdp/current/hbase-client/lib/hbase-shaded-client.jar,
file:///usr/hdp/current/hbase-client/lib/hbase-shaded-mapreduce.jar,
file:///usr/hdp/current/hbase-client/lib/hbase-common.jar,
file:///usr/hdp/current/hbase-client/lib/hbase-server.jar,
file:///usr/hdp/current/hbase-client/lib/hbase-client.jar,
file:///usr/hdp/current/hbase-client/lib/hbase-protocol.jar,
file:///usr/hdp/current/hbase-client/lib/hbase-mapreduce.jar,
file:///usr/hdp/current/hbase-client/lib/guava-11.0.2.jar
Note: The references to /usr/hdp/current/shc and its associated .jar file are included because SHC was used in
this example. They are not required for token acquisition.
Related Information
Using the Hive Warehouse Connector with Spark
HBase Data on Spark with Connectors