Apache Hue-Cloudera

Hue is a web interface for interacting with Apache Hadoop clusters. It simplifies using Hadoop by letting users analyze data through a browser instead of complex command-line tools. Hue provides applications for SQL querying, browsing HBase and HDFS data, building data pipelines with Oozie, and more. It is open source and has about 4,000 commits from 56 contributors.

BIG DATA WEB APPS

FOR INTERACTIVE
HADOOP
Enrico Berti
Big Data Spain, Nov 17, 2014
GOAL
OF HUE
WEB INTERFACE FOR ANALYZING DATA
WITH APACHE HADOOP

SIMPLIFY AND INTEGRATE

FREE AND OPEN SOURCE

-> OPEN UP BIG DATA


VIEW FROM
30K FEET

Hadoop Web Server


You, your colleagues and even that
friend that uses IE9 ;)
OPEN SOURCE

~4000 COMMITS

56 CONTRIBUTORS

911 STARS

337 FORKS

github.com/cloudera/hue
AROUND
THE WORLD
TALKS

Meetups and events in NYC, Paris, LA, Tokyo, SF, Stockholm, Vienna,
San Jose, Singapore, Budapest, DC, Madrid…

RETREATS

Nov 13 Koh Chang, Thailand
May 14 Curaçao, Netherlands Antilles
Aug 14 Big Island, Hawaii
Nov 14 Tenerife, Spain
Nov 14 Nicaragua and Belize
Jan 15 Philippines
TREND: GROWTH

gethue.com
HISTORY

HUE 1

Desktop-like in a browser. Did its job, but pretty slow, with memory leaks
and not very IE friendly. Definitely advanced for its time (2009-2010).
HISTORY

HUE 2

The first flat structure port, with Twitter Bootstrap all over the place.

HUE 2.5

New apps, improved the UX by adding nice new functionalities
like autocomplete and drag & drop.
HISTORY

HUE 3 ALPHA

Proposed design, didn’t make it.


HISTORY

HUE 3.6+

Where we are now, a brand new way to search and explore your data.
WHICH DISTRIBUTION?

HACKER         ADVANCED USER      NORMAL USER
GITHUB         TARBALL            CDH / CM
Very latest    Advanced preview   The most stable and cross-component checked
WHERE TO PUT HUE? IN ONE MACHINE
WHERE TO PUT HUE? OUTSIDE THE CLUSTER
WHERE TO PUT HUE? INSIDE THE CLUSTER
WHAT DO YOU NEED?

SERVER
Python 2.4 2.6
That’s it if using a packaged version. If building from the
source, here are the extra packages

CLIENT
Web Browser
IE 9+, FF 10+, Chrome, Safari

Hi there, I’m “just” a web server.


WHAT DOES THE HUE SERVICE LOOK LIKE?

1 SERVER
Process serving pages and also static content

1 DB
For cookies, saved queries, workflows, …

Hi there, I’m “just” a web server.


HOW TO CONFIGURE HUE

HUE.INI
Similar to core-site.xml but with .INI syntax

Where?
/etc/hue/conf/hue.ini
or
$HUE_HOME/desktop/conf/pseudo-distributed.ini

[desktop]
  [[database]]
    # Database engine is typically one of:
    # postgresql_psycopg2, mysql, or sqlite3
    engine=sqlite3
    ## host=
    ## port=
    ## user=
    ## password=
    name=desktop/desktop.db
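
The nesting in hue.ini comes from the bracket depth: [desktop] is a top-level section and [[database]] is a subsection of it. The sketch below reads that layout into nested dictionaries. It is only an illustration of the syntax, assuming one extra bracket per level and `#` comments; `parse_nested_ini` and `SAMPLE` are hypothetical names, and this is not the parser Hue itself uses.

```python
def parse_nested_ini(text):
    """Parse INI text whose sections nest by bracket depth: [a], [[b]], ..."""
    root = {}
    stack = [root]  # stack[d] is the dict of the open section at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment (including '## key=' placeholders)
        if line.startswith("["):
            depth = len(line) - len(line.lstrip("["))
            name = line.strip("[]").strip()
            if depth > len(stack):
                raise ValueError("section %r skips a nesting level" % name)
            del stack[depth:]  # close sections at this depth or deeper
            section = stack[-1].setdefault(name, {})
            stack.append(section)
        else:
            key, _, value = line.partition("=")
            stack[-1][key.strip()] = value.strip()
    return root


SAMPLE = """
[desktop]
  [[database]]
    # Database engine is typically one of:
    # postgresql_psycopg2, mysql, or sqlite3
    engine=sqlite3
    ## host=
    name=desktop/desktop.db
"""

config = parse_nested_ini(SAMPLE)
print(config["desktop"]["database"]["engine"])
```

The commented-out `## host=` placeholder is skipped, so only `engine` and `name` end up in the database section.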
AUTHENTICATION

SIMPLE
Login/Password in a Database (SQLite, MySQL, …)

ENTERPRISE
LDAP (most used), OAuth, OpenID, SAML
DB BACKEND
LDAP BACKEND

Integrate your employees: LDAP How to guide


USERS

ADMIN
Can give and revoke permissions to single users or group of users

USER
Regular user + permissions
CONFIGURE APPS
AND PERMISSIONS
LIST OF GROUPS AND PERMISSIONS

A permission can:
- allow access to one app (e.g. Hive Editor)
- modify data from the app (e.g. drop Hive Tables or edit cells in
HBase Browser)

A list of permissions
CONFIGURE APPS
AND PERMISSIONS
PERMISSIONS IN ACTION

User ‘test’ belonging to the group ‘hiveonly’ that has just the ‘hive’
permissions
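
The example above (user 'test', group 'hiveonly', permission 'hive') can be sketched as a lookup from user to groups to app permissions. The dictionaries and the `has_permission` helper are hypothetical and only model the idea; Hue stores users, groups and permissions in its database.

```python
# Hypothetical in-memory model: groups carry app permissions, users belong to groups.
GROUP_PERMISSIONS = {"hiveonly": {"hive"}}
USER_GROUPS = {"test": {"hiveonly"}}


def has_permission(user, app):
    """Return True if any group of the user grants access to the app."""
    groups = USER_GROUPS.get(user, set())
    return any(app in GROUP_PERMISSIONS.get(group, set()) for group in groups)


print(has_permission("test", "hive"))
print(has_permission("test", "hbase"))
```

With this data, 'test' can open the Hive app but not the HBase Browser.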
HOW HUE INTERACTS
WITH HADOOP

[Diagram: Hue and the services it talks to: LDAP, SAML, ZooKeeper, Sqoop2, YARN, JobTracker, Hue Plugins, HBase, Oozie, Pig, Solr, HDFS, HiveServer2, Cloudera Impala, Hive Metastore]
RPC CALLS TO ALL
THE HADOOP COMPONENTS
HDFS EXAMPLE

[Diagram: WebHDFS REST API, NameNode (NN) and DataNodes (DN)]

http://localhost:50070/webhdfs/v1/<PATH>?op=LISTSTATUS
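
A minimal sketch of the LISTSTATUS call shown above: it builds the same URL for a given path and reads file names out of a response. Nothing is sent over the network here. The helper names and the sample response values are made up for illustration; the `FileStatuses` / `FileStatus` / `pathSuffix` keys follow the WebHDFS REST response format.

```python
from urllib.parse import quote, urlencode


def liststatus_url(path, host="localhost", port=50070):
    """Build the WebHDFS LISTSTATUS URL for an absolute HDFS path."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    return "http://%s:%d/webhdfs/v1%s?%s" % (
        host, port, quote(path), urlencode({"op": "LISTSTATUS"}))


def entry_names(response):
    """Extract the entry names from a decoded LISTSTATUS JSON response."""
    return [status["pathSuffix"] for status in response["FileStatuses"]["FileStatus"]]


# Illustrative response body (values invented for the example).
sample_response = {
    "FileStatuses": {
        "FileStatus": [
            {"pathSuffix": "data.csv", "type": "FILE"},
            {"pathSuffix": "logs", "type": "DIRECTORY"},
        ]
    }
}

print(liststatus_url("/user/hue"))
print(entry_names(sample_response))
```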
RPC CALLS TO ALL
THE HADOOP COMPONENTS
HOW

List all the host/port of Hadoop APIs in the hue.ini
For example here HBase and Hive.

[hbase]
# Comma-separated list of HBase Thrift servers for
# clusters in the format of '(name|host:port)'.
hbase_clusters=(Cluster|localhost:9090)

[beeswax]
hive_server_host=host-abc
hive_server_port=10000

Full list
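
To make the '(name|host:port)' format of hbase_clusters concrete, here is a small sketch that splits such a value into its parts. The regular expression and the `parse_hbase_clusters` helper are assumptions written for this example, not Hue's own parsing code.

```python
import re

# One '(name|host:port)' entry; entries are separated by commas.
CLUSTER_RE = re.compile(r"\(([^|()]+)\|([^:()]+):(\d+)\)")


def parse_hbase_clusters(value):
    """Return a list of (name, host, port) tuples from an hbase_clusters value."""
    return [(name.strip(), host.strip(), int(port))
            for name, host, port in CLUSTER_RE.findall(value)]


print(parse_hbase_clusters("(Cluster|localhost:9090)"))
```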
SECURITY
FEATURES

HTTPS
SSL WITH HIVESERVER2
SSL DB
SENTRY
KERBEROS
READ MORE …


HIGH AVAILABILITY

HOW

2 Hue instances
HA proxy
Multi DB
Performances: like a website,
mostly RPC calls
FULL SUITE OF APPS
HBASE BROWSER

WHAT

Simple custom query language
Supports HBase filter language
Supports selection & Copy + Paste,
gracefully degrades in IE

Searchbar Syntax Breakdown: Row Key, Prefix Scan, Scan Length,
Column/Family Filters, Thrift Filterstring, Autocomplete, Help Menu
SQL

WHAT

Impala, Hive integration, Spark
Interactive SQL editor
Integration with MapReduce,
Metastore, HDFS
SENTRY APP
SEARCH

WHAT

Solr & Cloud integration
Custom interactive dashboards
Drag & drop widgets (charts,
timeline…)
JUST A VIEW
ON TOP OF SOLR API

REST
HISTORY
V1 USER
HISTORY
V1 ADMIN
HISTORY
V2 USER
HISTORY
V2 ADMIN
ARCHITECTURE

www….

Templates + JS Model

REST: /select, /admin/collections, /get, /luke...
AJAX: /add_widget, /zoom_in, /select_facet, /select_range...
ARCHITECTURE
UI FOR FACETS

LAYOUT: All the 2D positioning (cell ids), visual, drag&drop

COLLECTION: Dashboard, fields, template, widgets (ids)

QUERY: Search terms, selected facets (q, fqs)


ADDING A WIDGET
LIFECYCLE

Load the initial page
Edit mode and Drag&Drop

REST: /solr/zookeeper/clusterstate.json, /solr/admin/luke…
AJAX: /get_collection
ADDING A WIDGET
LIFECYCLE
Select the field
Guess ranges (number or dates)
Rounding (number or dates)

REST: /solr/select?stats=true
AJAX: /new_facet
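
One way to picture the "guess ranges" and "rounding" steps for a numeric field: take the min and max returned by the stats call and round them to a readable start, end and gap. The rounding rule below is an assumption chosen for illustration (about ten buckets, gap rounded up to one significant digit); `guess_range` is a hypothetical helper, not the code Hue runs.

```python
import math


def guess_range(stats_min, stats_max, slots=10):
    """Return (start, end, gap) for a numeric range facet with a rounded gap."""
    span = stats_max - stats_min
    if span <= 0:
        return stats_min, stats_min + 1, 1  # degenerate field: a single bucket
    raw_gap = span / slots
    magnitude = 10 ** math.floor(math.log10(raw_gap))
    gap = math.ceil(raw_gap / magnitude) * magnitude   # round the gap up to one digit
    start = math.floor(stats_min / gap) * gap          # align start down to the gap
    end = math.ceil(stats_max / gap) * gap             # align end up to the gap
    return start, end, gap


# Made-up stats for a 'bytes' field.
print(guess_range(1234, 8765432))
```

With these made-up stats the helper yields a start of 0, an end of 9000000 and a gap of 900000, the same shape of values used in the range query of this deck.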
ADDING A WIDGET
LIFECYCLE
Query part 1
facet.range={!ex=bytes}bytes&f.bytes.facet.range.start=0&f.bytes.facet.range.end=9000000&
f.bytes.facet.range.gap=900000&f.bytes.facet.mincount=0&f.bytes.facet.limit=10

Query Part 2
q=Chrome&fq={!tag=bytes}bytes:[900000+TO+1800000]
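
The two query parts can be assembled from a field name and a range. The sketch below rebuilds them as a list of parameters and lets `urlencode` do the escaping, so spaces become `+` as in `[900000+TO+1800000]`. The function names are hypothetical; the parameter names are the ones shown in the query above. The `{!tag=bytes}` on the filter and `{!ex=bytes}` on the facet go together: the facet is computed without the filter tagged with the same name.

```python
from urllib.parse import parse_qsl, urlencode


def range_facet_params(field, start, end, gap, mincount=0, limit=10):
    """Query part 1: a range facet that excludes the filter tagged with the field name."""
    prefix = "f.%s.facet." % field
    return [
        ("facet.range", "{!ex=%s}%s" % (field, field)),
        (prefix + "range.start", str(start)),
        (prefix + "range.end", str(end)),
        (prefix + "range.gap", str(gap)),
        (prefix + "mincount", str(mincount)),
        (prefix + "limit", str(limit)),
    ]


def selection_params(query, field, lower, upper):
    """Query part 2: the search terms plus a tagged filter for the selected range."""
    return [
        ("q", query),
        ("fq", "{!tag=%s}%s:[%s TO %s]" % (field, field, lower, upper)),
    ]


params = (range_facet_params("bytes", 0, 9000000, 900000)
          + selection_params("Chrome", "bytes", 900000, 1800000))
query_string = urlencode(params)
print(query_string)
```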

Solr response:

{
  'facet_counts':{
    'facet_ranges':{
      'bytes':{
        'start':10000,
        'counts':[
          '900000',
          3423,
          '1800000',
          339,
          ...
        ]
      }
    }
  }
}

Augment Solr response:

{
  ...,
  'normalized_facets':[
    {
      'extraSeries':[

      ],
      'label':'bytes',
      'field':'bytes',
      'counts':[
        {
          'from':'900000',
          'to':'1800000',
          'selected':True,
          'value':3423,
          'field':'bytes',
          'exclude':False
        }
      ], ...
    }
  ]
}
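
The augmentation step turns Solr's flat counts list (label, count, label, count, …) into one dictionary per bucket. A minimal sketch of that transformation follows. It assumes the range facet also carries its `gap` (Solr returns it next to `counts`), and the `normalize_range_facet` name and `selected_from` argument are hypothetical, not Hue's actual function.

```python
def normalize_range_facet(field, facet, selected_from=None):
    """Convert a Solr range facet into the per-bucket structure a widget can draw."""
    counts = facet["counts"]
    gap = facet["gap"]
    buckets = []
    # Solr alternates bucket lower bounds and counts in one flat list.
    for lower, value in zip(counts[::2], counts[1::2]):
        buckets.append({
            "from": lower,
            "to": str(int(lower) + gap),
            "selected": lower == selected_from,
            "value": value,
            "field": field,
            "exclude": False,
        })
    return {"extraSeries": [], "label": field, "field": field, "counts": buckets}


# Values taken from the response above; 'gap' is added for the example.
solr_facet = {"start": 10000, "gap": 900000, "counts": ["900000", 3423, "1800000", 339]}
normalized = normalize_range_facet("bytes", solr_facet, selected_from="900000")
print(normalized["counts"][0])
```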
JSON TO WIDGET
{
  "field":"rate_code",
  "counts":[
    {
      "count":97797,
      "exclude":true,
      "selected":false,
      "value":"1",
      "cat":"rate_code"
    } ...

{
  "field":"medallion",
  "counts":[
    {
      "count":159,
      "exclude":true,
      "selected":false,
      "value":"6CA28FC49A4C49A9A96",
      "cat":"medallion"
    } ...

{
  "extraSeries":[

  ],
  "label":"trip_time_in_secs",
  "field":"trip_time_in_secs",
  "counts":[
    {
      "from":"0",
      "to":"10",
      "selected":false,
      "value":527,
      "field":"trip_time_in_secs",
      "exclude":true
    } ...

{
  "field":"passenger_count",
  "counts":[
    {
      "count":74766,
      "exclude":true,
      "selected":false,
      "value":"1",
      "cat":"passenger_count"
    } ...
REPEAT UNTIL…
ENTERPRISE FEATURES

- Access to Search App configurable, LDAP/SAML auths
- Share by link
- Solr Cloud (or non Cloud)
- Proxy user
  /solr/jobs_demo/select?user.name=hue&doAs=romain&q=
- Security
  Kerberos
- Sentry
  Collection level, Solr calls like /admin, /query, Solr UI, ZooKeeper
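
The proxy user line above reads as: the request is made as the service user (`user.name=hue`) on behalf of the logged-in user (`doAs=romain`). A small sketch that builds that path for any collection and user; the `proxied_select_path` helper is hypothetical.

```python
from urllib.parse import urlencode


def proxied_select_path(collection, service_user, end_user, query=""):
    """Build a Solr /select path issued by service_user on behalf of end_user."""
    params = urlencode({"user.name": service_user, "doAs": end_user, "q": query})
    return "/solr/%s/select?%s" % (collection, params)


print(proxied_select_path("jobs_demo", "hue", "romain"))
```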
SPARK IGNITER
HISTORY

OCT 2013

Submit through Oozie

Shell like for Java, Scala, Python


HISTORY

JAN 2014
V2 Spark Igniter
Spark 0.8

Java, Scala with Spark Job Server

APR 2014

Spark 0.9

JUN 2014

Ironing + How to deploy


“JUST A VIEW”
ON TOP OF SPARK

Saved script metadata
e.g. name, args, classname, jar name…

Hue -> Job Server: submit, list apps, list jobs, list contexts
HOW TO TALK
TO SPARK?

Hue Spark Job Server

Spark
APP
LIFE CYCLE

Hue Spark Job Server

Spark
APP
LIFE CYCLE

.scala: … extend SparkJob
sbt _/package
JAR
Upload
Context: create context: auto or manual
SPARK JOB SERVER

WHERE
https://github.com/ooyala/spark-jobserver

WHAT
REST job server for Spark

WHEN
Spark Summit talk Monday 5:45pm:
Spark Job Server: Easy Spark Job
Management by Ooyala

curl -d "input.string = a b c a b see" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample'
{
  "status": "STARTED",
  "result": {
    "jobId": "5453779a-f004-45fc-a11d-a39dae0f9bf4",
    "context": "b7ea0eb5-spark.jobserver.WordCountExample"
  }
}
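
The same call as the curl command above, sketched with the Python standard library. The request is only built here, not sent, so no job server is needed to try it; passing it to `urllib.request.urlopen` would submit it. The helper names are hypothetical, and the sample response is the one shown above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_job_request(app_name, class_path, job_config, base_url="http://localhost:8090"):
    """Build the POST /jobs request; the job config travels as the request body."""
    query = urlencode({"appName": app_name, "classPath": class_path})
    return Request("%s/jobs?%s" % (base_url, query), data=job_config.encode("utf-8"))


def job_id(response_text):
    """Read the job id out of the job server's JSON answer."""
    return json.loads(response_text)["result"]["jobId"]


request = build_job_request(
    "test", "spark.jobserver.WordCountExample", "input.string = a b c a b see")

SAMPLE_RESPONSE = """
{
  "status": "STARTED",
  "result": {
    "jobId": "5453779a-f004-45fc-a11d-a39dae0f9bf4",
    "context": "b7ea0eb5-spark.jobserver.WordCountExample"
  }
}
"""

print(request.get_method(), request.full_url)
print(job_id(SAMPLE_RESPONSE))
```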
FOCUS ON UX

curl -d "input.string = a b c a b see" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample'
{
  "status": "STARTED",
  "result": {
    "jobId": "5453779a-f004-45fc-a11d-a39dae0f9bf4",
    "context": "b7ea0eb5-spark.jobserver.WordCountExample"
  }
}

VS
TRAIT SPARKJOB

import com.typesafe.config.Config
import org.apache.spark.SparkContext

/**
 * This trait is the main API for Spark jobs submitted to the Job Server.
 * SparkJobValidation is defined alongside this trait in the job server.
 */
trait SparkJob {
  /**
   * This is the entry point for a Spark Job Server to execute Spark jobs.
   */
  def runJob(sc: SparkContext, jobConfig: Config): Any

  /**
   * This method is called by the job server to allow jobs to validate their input and reject
   * invalid job requests.
   */
  def validate(sc: SparkContext, config: Config): SparkJobValidation
}
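
The contract of the trait is validate first, run only if the input is accepted. Below is a plain-Python analogue of that contract using the word count input from the curl example. It does not use Spark or the Job Server: `WordCountJob`, `submit` and the status strings are all hypothetical and only mirror the shape of `validate` and `runJob`.

```python
from collections import Counter


class WordCountJob:
    """Hypothetical Python stand-in for a job implementing validate + run."""

    def validate(self, config):
        """Return None when the config is usable, else an error message."""
        if config.get("input.string", "").strip():
            return None
        return "No input.string config param"

    def run_job(self, config):
        """Count the words of input.string."""
        return dict(Counter(config["input.string"].split()))


def submit(job, config):
    """Reject invalid requests before running, as the job server does with validate."""
    error = job.validate(config)
    if error is not None:
        return {"status": "REJECTED", "result": error}
    return {"status": "FINISHED", "result": job.run_job(config)}


print(submit(WordCountJob(), {"input.string": "a b c a b see"}))
print(submit(WordCountJob(), {}))
```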
DEMO
TIME
SUM-UP

INSTALL
Install Hue on one machine

ENABLE
Enable Hadoop Service APIs for Hue as a proxy user

CONFIGURE
Configure hue.ini to point to each Service API

LDAP
Use an LDAP backend

HELP
Get help on @gethue or hue-user
ROADMAP
NEXT 6 MONTHS
WHAT

Oozie v2
Spark v2
SQL v2

More dashboards!
Inter-component integrations
(HBase <-> Search, create index
wizards, document permissions),
Hadoop Web apps SDK

Your idea here.


CONFIGURATIONS ARE HARD…

…GIVE CLOUDERA MANAGER A TRY!

vimeo.com/91805055
MISSED
SOMETHING?

learn.gethue.com
GRACIAS!
WEBSITE

http://gethue.com

LEARN

http://learn.gethue.com

TWITTER

@gethue

USER GROUP

hue-user@
