Apache Hue - Cloudera
FOR INTERACTIVE
HADOOP
Enrico Berti
Big Data Spain, Nov 17, 2014
GOAL
OF HUE
WEB INTERFACE FOR ANALYZING DATA
WITH APACHE HADOOP
~4000 COMMITS
56 CONTRIBUTORS
911 STARS
337 FORKS
github.com/cloudera/hue
AROUND
THE WORLD
TALKS
RETREATS
gethue.com
HISTORY
HUE 1
HUE 2
HUE 2.5
HUE 3 ALPHA
HUE 3.6+
SERVER
Python 2.4 – 2.6
That's it if using a packaged version. If building from
source, here are the extra packages.
CLIENT
Web Browser
IE 9+, FF 10+, Chrome, Safari
1 SERVER
Process serving pages and also
static content
1 DB
For cookies, saved queries,
workflows, …
HUE.INI
Similar to core-site.xml but
with .INI syntax

Where?
/etc/hue/conf/hue.ini
or
$HUE_HOME/desktop/conf/
pseudo-distributed.ini

[desktop]
  [[database]]
    # Database engine is typically one of:
    # postgresql_psycopg2, mysql, or sqlite3
    engine=sqlite3
    ## host=
    ## port=
    ## user=
    ## password=
    name=desktop/desktop.db
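For example, pointing Hue at MySQL instead of the default SQLite is just a change to this same section. A minimal sketch; the host, credentials and database name below are placeholder values:

[desktop]
  [[database]]
    engine=mysql
    host=db-host.example.com
    port=3306
    user=hue
    password=secret
    name=hue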
AUTHENTICATION
SIMPLE
Login/Password in a Database
(SQLite, MySQL, …)
ENTERPRISE
LDAP (most used), OAuth,
OpenID, SAML
DB BACKEND
LDAP BACKEND
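Switching to the LDAP backend is also a hue.ini change. A sketch, assuming a typical directory layout; the URL and base DN are placeholders:

[desktop]
  [[auth]]
    backend=desktop.auth.backend.LdapBackend
  [[ldap]]
    ldap_url=ldap://ldap.example.com
    base_dn="dc=example,dc=com"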
ADMIN
Can give and revoke
permissions to single users or
groups of users
USER
Regular user + permissions
CONFIGURE APPS
AND PERMISSIONS
LIST OF GROUPS AND PERMISSIONS
A permission can:
- allow access to one app (e.g.
Hive Editor)
- modify data from the app (e.g.
drop Hive Tables or edit cells in
HBase Browser)
A list of permissions
CONFIGURE APPS
AND PERMISSIONS
PERMISSIONS IN ACTION
[Diagram: Hue talking to the cluster components: Sqoop2, YARN, Oozie, Pig, Solr, HDFS, Cloudera Impala, HiveServer2, Hive Metastore]
RPC CALLS TO ALL
THE HADOOP COMPONENTS
HDFS EXAMPLE
[Diagram: WebHDFS REST calls from Hue to the NameNode (NN) and DataNodes (DN)]
http://localhost:50070/webhdfs/v1/<PATH>?op=LISTSTATUS
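For reference, the same LISTSTATUS call can be made outside Hue. A minimal sketch with Python's standard library; the path and user.name are placeholder values:

import json
import urllib.request

# WebHDFS LISTSTATUS call, like the one Hue's File Browser issues.
url = ("http://localhost:50070/webhdfs/v1/user/demo"
       "?op=LISTSTATUS&user.name=demo")

with urllib.request.urlopen(url) as resp:
    statuses = json.load(resp)["FileStatuses"]["FileStatus"]

for status in statuses:
    print(status["type"], status["pathSuffix"])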
RPC CALLS TO ALL
THE HADOOP COMPONENTS
HOW
[beeswax]
hive_server_host=host-abc
hive_server_port=10000
Full list
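Hue's Hive Editor speaks the HiveServer2 Thrift protocol against that host and port. Purely as an illustration (Hue ships its own Thrift client, not PyHive), the same endpoint can be queried from Python with the third-party PyHive library; host, port and username mirror the config above:

from pyhive import hive  # third-party: pip install pyhive

# Connect to the HiveServer2 endpoint configured in [beeswax].
conn = hive.Connection(host="host-abc", port=10000, username="demo")

cursor = conn.cursor()
cursor.execute("SHOW TABLES")
for table in cursor.fetchall():
    print(table)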
SECURITY
FEATURES
HOW
2 Hue instances
HA proxy
Multi DB
Performance: like a website,
mostly RPC calls
FULL SUITE OF APPS
HBASE BROWSER
WHAT
REST
HISTORY
V1 USER
HISTORY
V1 ADMIN
HISTORY
V2 USER
HISTORY
V2 ADMIN
ARCHITECTURE
www… Templates
REST: /select, /admin/collections,
/get, /luke...
AJAX + JS Model: /add_widget, /zoom_in,
/select_facet, /select_range...
ARCHITECTURE
UI FOR FACETS
REST: /solr/zookeeper/clusterstate.json, /solr/admin/luke…
AJAX: /get_collection
ADDING A WIDGET
LIFECYCLE
Select the field
Guess ranges (number or dates)
Rounding (number or dates)
REST: /solr/select?stats=true
AJAX: /new_facet
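The "guess ranges + rounding" step can be sketched as: fetch the field's min/max from Solr's stats component, then round to a clean bucket size. A hypothetical helper, not Hue's actual code:

import math

def guess_range(stats_min, stats_max, num_buckets=10):
    """Guess a rounded facet range from Solr stats (field min/max)."""
    raw_gap = (stats_max - stats_min) / float(num_buckets)
    # Round the gap down to one significant digit, e.g. 898765 -> 800000.
    magnitude = 10 ** int(math.floor(math.log10(raw_gap)))
    gap = max(int(raw_gap // magnitude) * magnitude, 1)
    start = int(stats_min // gap) * gap
    end = int(math.ceil(stats_max / float(gap))) * gap
    return start, end, gap

# e.g. with stats from /solr/select?stats=true&stats.field=bytes
print(guess_range(12345, 9000000))  # -> (0, 9600000, 800000)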
ADDING A WIDGET
LIFECYCLE
Query part 1
facet.range={!ex=bytes}bytes&f.bytes.facet.range.start=0&f.bytes.facet.range.end=9000000&
f.bytes.facet.range.gap=900000&f.bytes.facet.mincount=0&f.bytes.facet.limit=10
Query part 2
q=Chrome&fq={!tag=bytes}bytes:[900000+TO+1800000]
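Assembled, the widget's query is a plain Solr select where the filter is tagged and the range facet excludes it. A sketch using the third-party requests library; the collection URL is a placeholder:

import requests  # third-party: pip install requests

# Placeholder Solr collection URL.
SOLR = "http://localhost:8983/solr/logs/select"

params = {
    "q": "Chrome",
    "wt": "json",
    # Filter on the selected bar, tagged so the facet can exclude it.
    "fq": "{!tag=bytes}bytes:[900000 TO 1800000]",
    # Range facet over the same field, ignoring its own filter.
    "facet": "true",
    "facet.range": "{!ex=bytes}bytes",
    "f.bytes.facet.range.start": 0,
    "f.bytes.facet.range.end": 9000000,
    "f.bytes.facet.range.gap": 900000,
    "f.bytes.facet.mincount": 0,
    "f.bytes.facet.limit": 10,
}

response = requests.get(SOLR, params=params).json()
print(response["facet_counts"]["facet_ranges"]["bytes"])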
{
  ...,
  'facet_counts': {
    'facet_ranges': {
      'bytes': {
        'start': 10000,
        'counts': [
          '900000', 3423,
          '1800000', 339,
          ...
        ]
      }
    }
  }
}

Augment Solr response

{
  'normalized_facets': [
    {
      'extraSeries': [],
      'label': 'bytes',
      'field': 'bytes',
      'counts': [
        {
          'from': '900000',
          'to': '1800000',
          'selected': True,
          'value': 3423,
          'field': 'bytes',
          'exclude': False
        }
      ],
      ...
    }
  ]
}
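That "Augment Solr response" step boils down to reshaping Solr's flat counts array into self-describing dicts the widgets can bind to. A hypothetical version of the transform, not Hue's actual code:

def augment_facet_ranges(solr_response, selected_range=None):
    """Reshape Solr facet_ranges into normalized_facets dicts (sketch)."""
    normalized = []
    ranges = solr_response.get("facet_counts", {}).get("facet_ranges", {})
    for field, data in ranges.items():
        flat = data["counts"]  # e.g. ['900000', 3423, '1800000', 339]
        pairs = list(zip(flat[::2], flat[1::2]))  # (bucket start, count)
        counts = []
        for i, (start, value) in enumerate(pairs):
            end = pairs[i + 1][0] if i + 1 < len(pairs) else data.get("end")
            counts.append({
                "from": start,
                "to": end,
                "value": value,
                "field": field,
                "selected": (start, end) == selected_range,
                "exclude": False,
            })
        normalized.append({
            "label": field,
            "field": field,
            "extraSeries": [],
            "counts": counts,
        })
    return {"normalized_facets": normalized}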
JSON TO WIDGET
{
  "field": "rate_code",
  "counts": [
    {
      "count": 97797,
      "exclude": true,
      "selected": false,
      "value": "1",
      "cat": "rate_code"
    } ...

{
  "field": "medallion",
  "counts": [
    {
      "count": 159,
      "exclude": true,
      "selected": false,
      "value": "6CA28FC49A4C49A9A96",
      "cat": "medallion"
    } ...

{
  "extraSeries": [
  ],
  "label": "trip_time_in_secs",
  "field": "trip_time_in_secs",
  "counts": [
    {
      "from": "0",
      "to": "10",
      "selected": false,
      "value": 527,
      "field": "trip_time_in_secs",
      "exclude": true
    } ...

{
  "field": "passenger_count",
  "counts": [
    {
      "count": 74766,
      "exclude": true,
      "selected": false,
      "value": "1",
      "cat": "passenger_count"
    } ...
REPEAT UNTIL…
ENTERPRISE FEATURES
OCT 2013 → JAN 2014 → APR 2014 → JUN 2014
V2 Spark Igniter
Spark 0.8 → Spark 0.9
Hue ↔ Job Server:
submit, list apps, list jobs, list contexts
Saved script metadata,
e.g. name, args, classname, jar name…
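Hue drives the Spark Job Server over its REST API. A sketch of those calls with the third-party requests library, assuming a Job Server on localhost:8090 and an already-uploaded jar named "demo":

import requests  # third-party: pip install requests

JOBSERVER = "http://localhost:8090"  # placeholder Job Server address

# List uploaded apps (jars), jobs, and contexts.
print(requests.get(JOBSERVER + "/jars").json())
print(requests.get(JOBSERVER + "/jobs").json())
print(requests.get(JOBSERVER + "/contexts").json())

# Submit a job: app name and class go in the query string,
# the job's Typesafe Config payload goes in the body.
resp = requests.post(
    JOBSERVER + "/jobs",
    params={"appName": "demo",
            "classPath": "spark.jobserver.WordCountExample"},
    data="input.string = a b c a b",
)
print(resp.json())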
HOW TO TALK
TO SPARK?
Spark
APP
LIFE CYCLE
.scala … extend SparkJob
→ sbt package → JAR
→ Upload → Context
WHERE
/**
 * This trait is the main API for Spark jobs submitted to the Job Server.
 */
trait SparkJob {
  /**
   * This is the entry point for a Spark Job Server to execute Spark jobs.
   */
  def runJob(sc: SparkContext, jobConfig: Config): Any

  /**
   * This method is called by the job server to allow jobs to validate their
   * input and reject invalid job requests.
   */
  def validate(sc: SparkContext, config: Config): SparkJobValidation
}
DEMO
TIME
SUM-UP
LDAP
Oozie v2
Spark v2
SQL v2
More dashboards!
Inter-component integrations
(HBase <-> Search, create index
wizards, document permissions),
Hadoop Web apps SDK
HELP
vimeo.com/91805055
MISSED
SOMETHING?
learn.gethue.com
THANK YOU!
WEBSITE
http://gethue.com
LEARN
http://learn.gethue.com
@gethue
USER GROUP
hue-user@