Apache Hue - Cloudera
FOR INTERACTIVE
HADOOP
Enrico Berti
Big Data Spain, Nov 17, 2014
GOAL
OF HUE
WEB INTERFACE FOR ANALYZING DATA
WITH APACHE HADOOP
~4000 COMMITS
56 CONTRIBUTORS
911 STARS
337 FORKS
github.com/cloudera/hue
AROUND
THE WORLD
TALKS
RETREATS
gethue.com
HISTORY
HUE 1
HUE 2
HUE 2.5
HUE 3 ALPHA
HUE 3.6+
SERVER
Python 2.4 – 2.6
That's it if using a packaged version. If building from
source, here are the extra packages.
CLIENT
Web Browser
IE 9+, FF 10+, Chrome, Safari
1 SERVER
Process serving pages and also
static content
1 DB
For cookies, saved queries,
workflows, …
HUE.INI
Similar to core-site.xml but
with .INI syntax

Where?
/etc/hue/conf/hue.ini
or
$HUE_HOME/desktop/conf/
pseudo-distributed.ini

[desktop]
  [[database]]
    # Database engine is typically one of:
    # postgresql_psycopg2, mysql, or sqlite3
    engine=sqlite3
    ## host=
    ## port=
    ## user=
    ## password=
    name=desktop/desktop.db
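For example, pointing Hue at MySQL instead of the default SQLite is just a change to this same section. A minimal sketch; the host, credentials and database name below are placeholder values:

[desktop]
  [[database]]
    engine=mysql
    host=db-host.example.com
    port=3306
    user=hue
    password=secret
    name=hue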
AUTHENTICATION
SIMPLE
Login/Password in a Database
(SQLite, MySQL, …)
ENTERPRISE
LDAP (most used), OAuth,
OpenID, SAML
DB BACKEND
LDAP BACKEND
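Switching to the LDAP backend is also a hue.ini change. A sketch, assuming a typical directory layout; the URL and base DN are placeholders:

[desktop]
  [[auth]]
    backend=desktop.auth.backend.LdapBackend
  [[ldap]]
    ldap_url=ldap://ldap.example.com
    base_dn="dc=example,dc=com"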
ADMIN
Can give and revoke
permissions to single users or
groups of users
USER
Regular user + permissions
CONFIGURE APPS
AND PERMISSIONS
LIST OF GROUPS AND PERMISSIONS
A permission can:
- allow access to one app (e.g.
Hive Editor)
- modify data from the app (e.g.
drop Hive Tables or edit cells in
HBase Browser)
A list of permissions
CONFIGURE APPS
AND PERMISSIONS
PERMISSIONS IN ACTION
[Diagram: Hue talking to the cluster components: Sqoop2, YARN, Oozie, Pig, Solr, HDFS, Cloudera Impala, HiveServer2, Hive Metastore]
RPC CALLS TO ALL
THE HADOOP COMPONENTS
HDFS EXAMPLE
[Diagram: WebHDFS REST calls from Hue to the NameNode (NN) and DataNodes (DN)]
http://localhost:50070/webhdfs/v1/<PATH>?op=LISTSTATUS
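For reference, the same LISTSTATUS call can be made outside Hue. A minimal sketch with Python's standard library; the path and user.name are placeholder values:

import json
import urllib.request

# WebHDFS LISTSTATUS call, like the one Hue's File Browser issues.
url = ("http://localhost:50070/webhdfs/v1/user/demo"
       "?op=LISTSTATUS&user.name=demo")

with urllib.request.urlopen(url) as resp:
    statuses = json.load(resp)["FileStatuses"]["FileStatus"]

for status in statuses:
    print(status["type"], status["pathSuffix"])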
RPC CALLS TO ALL
THE HADOOP COMPONENTS
HOW
[beeswax]
hive_server_host=host-abc
hive_server_port=10000
Full list
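Hue's Hive Editor speaks the HiveServer2 Thrift protocol against that host and port. Purely as an illustration (Hue ships its own Thrift client, not PyHive), the same endpoint can be queried from Python with the third-party PyHive library; host, port and username mirror the config above:

from pyhive import hive  # third-party: pip install pyhive

# Connect to the HiveServer2 endpoint configured in [beeswax].
conn = hive.Connection(host="host-abc", port=10000, username="demo")

cursor = conn.cursor()
cursor.execute("SHOW TABLES")
for table in cursor.fetchall():
    print(table)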
SECURITY
FEATURES
HOW
2 Hue instances
HA proxy
Multi DB
Performance: like a website,
mostly RPC calls
FULL SUITE OF APPS
HBASE BROWSER
WHAT
REST
HISTORY
V1 USER
HISTORY
V1 ADMIN
HISTORY
V2 USER
HISTORY
V2 ADMIN
ARCHITECTURE
www… Templates
REST: /select, /admin/collections,
/get, /luke...
AJAX + JS Model: /add_widget, /zoom_in,
/select_facet, /select_range...
ARCHITECTURE
UI FOR FACETS
REST: /solr/zookeeper/clusterstate.json, /solr/admin/luke…
AJAX: /get_collection
ADDING A WIDGET
LIFECYCLE
Select the field
Guess ranges (number or dates)
Rounding (number or dates)
REST: /solr/select?stats=true
AJAX: /new_facet
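The "guess ranges + rounding" step can be sketched as: fetch the field's min/max from Solr's stats component, then round to a clean bucket size. A hypothetical helper, not Hue's actual code:

import math

def guess_range(stats_min, stats_max, num_buckets=10):
    """Guess a rounded facet range from Solr stats (field min/max)."""
    raw_gap = (stats_max - stats_min) / float(num_buckets)
    # Round the gap down to one significant digit, e.g. 898765 -> 800000.
    magnitude = 10 ** int(math.floor(math.log10(raw_gap)))
    gap = max(int(raw_gap // magnitude) * magnitude, 1)
    start = int(stats_min // gap) * gap
    end = int(math.ceil(stats_max / float(gap))) * gap
    return start, end, gap

# e.g. with stats from /solr/select?stats=true&stats.field=bytes
print(guess_range(12345, 9000000))  # -> (0, 9600000, 800000)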
ADDING A WIDGET
LIFECYCLE
Query part 1
facet.range={!ex=bytes}bytes&f.bytes.facet.range.start=0&f.bytes.facet.range.end=9000000&
f.bytes.facet.range.gap=900000&f.bytes.facet.mincount=0&f.bytes.facet.limit=10
Query part 2
q=Chrome&fq={!tag=bytes}bytes:[900000+TO+1800000]
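Assembled, the widget's query is a plain Solr select where the filter is tagged and the range facet excludes it. A sketch using the third-party requests library; the collection URL is a placeholder:

import requests  # third-party: pip install requests

# Placeholder Solr collection URL.
SOLR = "http://localhost:8983/solr/logs/select"

params = {
    "q": "Chrome",
    "wt": "json",
    # Filter on the selected bar, tagged so the facet can exclude it.
    "fq": "{!tag=bytes}bytes:[900000 TO 1800000]",
    # Range facet over the same field, ignoring its own filter.
    "facet": "true",
    "facet.range": "{!ex=bytes}bytes",
    "f.bytes.facet.range.start": 0,
    "f.bytes.facet.range.end": 9000000,
    "f.bytes.facet.range.gap": 900000,
    "f.bytes.facet.mincount": 0,
    "f.bytes.facet.limit": 10,
}

response = requests.get(SOLR, params=params).json()
print(response["facet_counts"]["facet_ranges"]["bytes"])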
{
  ...,
  'facet_counts': {
    'facet_ranges': {
      'bytes': {
        'start': 10000,
        'counts': [
          '900000', 3423,
          '1800000', 339,
          ...
        ]
      }
    }
  }
}

Augment Solr response

{
  'normalized_facets': [
    {
      'extraSeries': [],
      'label': 'bytes',
      'field': 'bytes',
      'counts': [
        {
          'from': '900000',
          'to': '1800000',
          'selected': True,
          'value': 3423,
          'field': 'bytes',
          'exclude': False
        }
      ],
      ...
    }
  ]
}
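That "Augment Solr response" step boils down to reshaping Solr's flat counts array into self-describing dicts the widgets can bind to. A hypothetical version of the transform, not Hue's actual code:

def augment_facet_ranges(solr_response, selected_range=None):
    """Reshape Solr facet_ranges into normalized_facets dicts (sketch)."""
    normalized = []
    ranges = solr_response.get("facet_counts", {}).get("facet_ranges", {})
    for field, data in ranges.items():
        flat = data["counts"]  # e.g. ['900000', 3423, '1800000', 339]
        pairs = list(zip(flat[::2], flat[1::2]))  # (bucket start, count)
        counts = []
        for i, (start, value) in enumerate(pairs):
            end = pairs[i + 1][0] if i + 1 < len(pairs) else data.get("end")
            counts.append({
                "from": start,
                "to": end,
                "value": value,
                "field": field,
                "selected": (start, end) == selected_range,
                "exclude": False,
            })
        normalized.append({
            "label": field,
            "field": field,
            "extraSeries": [],
            "counts": counts,
        })
    return {"normalized_facets": normalized}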
JSON TO WIDGET
{
  "field": "rate_code",
  "counts": [
    {
      "count": 97797,
      "exclude": true,
      "selected": false,
      "value": "1",
      "cat": "rate_code"
    } ...

{
  "field": "medallion",
  "counts": [
    {
      "count": 159,
      "exclude": true,
      "selected": false,
      "value": "6CA28FC49A4C49A9A96",
      "cat": "medallion"
    } ...

{
  "extraSeries": [
  ],
  "label": "trip_time_in_secs",
  "field": "trip_time_in_secs",
  "counts": [
    {
      "from": "0",
      "to": "10",
      "selected": false,
      "value": 527,
      "field": "trip_time_in_secs",
      "exclude": true
    } ...

{
  "field": "passenger_count",
  "counts": [
    {
      "count": 74766,
      "exclude": true,
      "selected": false,
      "value": "1",
      "cat": "passenger_count"
    } ...
REPEAT UNTIL…
ENTERPRISE FEATURES
OCT 2013 → JAN 2014 → APR 2014 → JUN 2014
V2 Spark Igniter
Spark 0.8 → Spark 0.9
Hue ↔ Job Server:
submit, list apps, list jobs, list contexts
Saved script metadata,
e.g. name, args, classname, jar name…
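Hue drives the Spark Job Server over its REST API. A sketch of those calls with the third-party requests library, assuming a Job Server on localhost:8090 and an already-uploaded jar named "demo":

import requests  # third-party: pip install requests

JOBSERVER = "http://localhost:8090"  # placeholder Job Server address

# List uploaded apps (jars), jobs, and contexts.
print(requests.get(JOBSERVER + "/jars").json())
print(requests.get(JOBSERVER + "/jobs").json())
print(requests.get(JOBSERVER + "/contexts").json())

# Submit a job: app name and class go in the query string,
# the job's Typesafe Config payload goes in the body.
resp = requests.post(
    JOBSERVER + "/jobs",
    params={"appName": "demo",
            "classPath": "spark.jobserver.WordCountExample"},
    data="input.string = a b c a b",
)
print(resp.json())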
HOW TO TALK
TO SPARK?
Spark
APP
LIFE CYCLE
.scala … extend SparkJob
→ sbt package → JAR
→ Upload → Context
WHERE
/**
 * This trait is the main API for Spark jobs submitted to the Job Server.
 */
trait SparkJob {
  /**
   * This is the entry point for a Spark Job Server to execute Spark jobs.
   */
  def runJob(sc: SparkContext, jobConfig: Config): Any

  /**
   * This method is called by the job server to allow jobs to validate their
   * input and reject invalid job requests.
   */
  def validate(sc: SparkContext, config: Config): SparkJobValidation
}
DEMO
TIME
SUM-UP
LDAP
Oozie v2
Spark v2
SQL v2
More dashboards!
Inter-component integrations
(HBase <-> Search, create index
wizards, document permissions),
Hadoop Web apps SDK
HELP
vimeo.com/91805055
MISSED
SOMETHING?
learn.gethue.com
THANK YOU!
WEBSITE
http://gethue.com
LEARN
http://learn.gethue.com
@gethue
USER GROUP
hue-user@