Apache Kylin - Extreme OLAP Engine For Hadoop Presentation
Yang Li | [email protected]
Apache Kylin PMC member & Tech Leader
Sr. Architect of eBay GDI
from Shanghai China
Agenda
About Apache Kylin
Feature Highlights
Tech Highlights
Roadmap
Q&A
What
kylin / ˈkiːˈlɪn / 麒麟 @ApacheKylin
--n. (in Chinese art) a mythical animal of composite form
[Figure: axes labeled size and Latency, with a 10s mark]
Balance Between Space and Time
OLAP Cube
[Figure: cuboid lattice, from the 0-D (apex) cuboid to the 1-D cuboids time, item, location, supplier]
• Cuboid = one combination of dimensions
• Cube = all combinations of dimensions (all cuboids)
• Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells
1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier>
2. (9/15, milk, Urbana, *) - <time, item, location>
3. (*, milk, Urbana, *) - <item, location>
4. (*, milk, Chicago, *) - <item, location>
5. (*, milk, *, *) - <item>
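The cuboid and cell definitions above can be made concrete with a small sketch. It is illustrative only and is not Kylin code: the fact rows, the `sales` measure, and the helper names `all_cuboids` and `build_cuboid` are assumptions made for this example. It enumerates every combination of the four dimensions and aggregates cells, writing `*` for a dimension that has been rolled up.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("time", "item", "location", "supplier")

# Hypothetical fact rows: (time, item, location, supplier, sales)
FACTS = [
    ("9/15", "milk", "Urbana", "Dairy_land", 10),
    ("9/15", "milk", "Urbana", "Farm_fresh", 5),
    ("9/16", "milk", "Chicago", "Dairy_land", 7),
]


def all_cuboids(dimensions):
    """Every combination of dimensions, from the 0-D apex to the base cuboid."""
    return [
        combo
        for k in range(len(dimensions) + 1)
        for combo in combinations(dimensions, k)
    ]


def build_cuboid(facts, dimensions, kept):
    """Sum the measure per cell; dimensions not in `kept` are written as '*'."""
    cells = defaultdict(int)
    for *values, measure in facts:
        cell = tuple(v if d in kept else "*" for d, v in zip(dimensions, values))
        cells[cell] += measure
    return dict(cells)


cuboids = all_cuboids(DIMENSIONS)  # 2^4 combinations for 4 dimensions
item_location = build_cuboid(FACTS, DIMENSIONS, ("item", "location"))
apex = build_cuboid(FACTS, DIMENSIONS, ())
```

In this sketch, `item_location` holds aggregate cells of the same shape as examples 3 and 4 above, and `apex` holds the single 0-D cell.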
How
BI Tools, Web App…
ANSI SQL
Kylin
Map Reduce
Feature Highlights
• Extremely Fast OLAP Engine at scale
• ANSI SQL Interface on Hadoop
• Seamless Integration with BI Tools, like Tableau
• Interactive Query Capability
• MOLAP Cube
• Incremental Build of Cubes
• Approximate Query Capability for Distinct Count (HyperLogLog)
• Leverages HBase Coprocessor to reduce query latency
• Job Management and Monitoring
• User-friendly Web GUI to manage, build, monitor and query cubes
• Security capability to set ACL at Cube/Project Level
• Support LDAP Integration
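The approximate distinct count mentioned in the list above relies on HyperLogLog. The following is a minimal textbook-style sketch of that technique, not Kylin's implementation: the class name, the SHA-1 hash, and the default precision `p=12` are assumptions chosen for illustration. The property that matters for a cube is that two sketches can be merged by taking the register-wise maximum, so aggregated cells can be combined without rescanning raw data.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        digest = hashlib.sha1(str(value).encode()).digest()
        h = int.from_bytes(digest[:8], "big")       # 64-bit hash
        width = 64 - self.p
        idx = h >> width                            # first p bits pick a register
        rest = h & ((1 << width) - 1)               # remaining bits
        rank = width - rest.bit_length() + 1        # position of the leftmost 1 bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        """Union of two sketches: register-wise maximum."""
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant for m >= 128
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting)
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


# Usage: two partial sketches merged into one estimate of the union
part_a, part_b = HyperLogLog(), HyperLogLog()
for user_id in range(6000):
    part_a.add(user_id)
for user_id in range(4000, 10000):
    part_b.add(user_id)
part_a.merge(part_b)
approx_distinct = part_a.count()
```

With 4096 registers the expected relative error is on the order of a few percent, traded against a fixed, small memory footprint per cell.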
Define Data Model
Manage Jobs
Explore the Data
Interact with BI Tool - Tableau
Who is using Kylin?
eBay
- 90% of queries < 5 seconds
Baidu
- Baidu Map internal analysis
REST Server
Query Engine
Routing
• Mid Latency - Minutes
• Low Latency - Seconds
[Diagram labels: Cube: …; Row Key; Column; Dim; Fact Table: …]
Streaming cubing
Analyze real-time data
Build delay down to seconds
Spark
Cube by Layer
- The current algorithm
- Many MRs, as many as the number of dimensions
[Figure: Full Data → MR → 4-D Cuboid → MR → … → 1-D Cuboid → MR → 0-D Cuboid]
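The layer-by-layer idea can be sketched in a few lines. This is a simplified in-memory illustration under stated assumptions, not Kylin's MapReduce job: the function names `aggregate` and `cube_by_layer` and the sample rows are hypothetical. Each pass of the loop stands in for one MR round that derives the next layer of smaller cuboids from the layer above it, which is why the number of rounds grows with the number of dimensions.

```python
from collections import defaultdict


def aggregate(cells, parent_dims, child_dims):
    """One 'MR round' for one cuboid: roll a parent up to a child with fewer dimensions."""
    keep = [parent_dims.index(d) for d in child_dims]
    out = defaultdict(int)
    for key, measure in cells.items():
        # "map": emit the reduced key; "reduce": sum measures per key
        out[tuple(key[i] for i in keep)] += measure
    return dict(out)


def cube_by_layer(rows, dims):
    """Build all cuboids layer by layer, from the N-D base cuboid to the 0-D apex."""
    base = defaultdict(int)
    for *key, measure in rows:
        base[tuple(key)] += measure
    cube = {tuple(dims): dict(base)}
    layer = [tuple(dims)]
    rounds = 1                                  # the base cuboid is the first round
    while layer and layer[0]:
        next_layer = []
        for parent in layer:
            for dropped in parent:
                child = tuple(d for d in parent if d != dropped)
                if child not in cube:           # each cuboid is computed once
                    cube[child] = aggregate(cube[parent], parent, child)
                    next_layer.append(child)
        layer = next_layer
        rounds += 1
    return cube, rounds


# Hypothetical rows: (time, item, location, sales)
ROWS = [
    ("9/15", "milk", "Urbana", 10),
    ("9/15", "milk", "Chicago", 5),
    ("9/16", "bread", "Urbana", 7),
]
cube, rounds = cube_by_layer(ROWS, ("time", "item", "location"))
```

Every round reads the full output of the previous one, so intermediate data is shuffled once per layer.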
Cube by Segments
- Reduced shuffles, map side
[Figure: Cube Segment | Cube Segment | Cube Segment]
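One way to read "reduced shuffles, map side" is that each data split computes all of its cuboids locally and only the resulting cells are merged afterwards. The sketch below illustrates that reading and is an assumption, not the actual Kylin algorithm; `build_segment_cube` and `merge_segment_cubes` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations


def build_segment_cube(rows, dims):
    """'Map side': compute every cuboid cell of one data split in memory."""
    n = len(dims)
    cube = defaultdict(int)
    for *values, measure in rows:
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                cell = tuple(values[i] if i in kept else "*" for i in range(n))
                cube[cell] += measure
    return dict(cube)


def merge_segment_cubes(segment_cubes):
    """Single merge step: sum the cells that share a key across segments."""
    merged = defaultdict(int)
    for segment in segment_cubes:
        for cell, measure in segment.items():
            merged[cell] += measure
    return dict(merged)


# Hypothetical rows: (time, item, location, sales), split into two segments
DIMS = ("time", "item", "location")
SPLIT_1 = [("9/15", "milk", "Urbana", 10), ("9/15", "milk", "Chicago", 5)]
SPLIT_2 = [("9/16", "bread", "Urbana", 7), ("9/16", "milk", "Urbana", 3)]

merged = merge_segment_cubes(
    [build_segment_cube(SPLIT_1, DIMS), build_segment_cube(SPLIT_2, DIMS)]
)
```

Because sums are associative, merging per-segment cubes gives the same cells as cubing the full data in one pass, with one merge instead of one shuffle per layer.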
Streaming cubing
- Build micro cube segments from streaming
- Use inverted index to capture last minute data
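The inverted index for last-minute data can be illustrated with a small sketch. It is a toy model under stated assumptions, not Kylin's storage format: the `InvertedIndex` class, its method names, and the `hybrid_sum` helper that adds a precomputed cube total to the recent rows are all hypothetical.

```python
from collections import defaultdict


class InvertedIndex:
    """Maps (dimension, value) to the ids of the recent rows that contain it."""

    def __init__(self, dims):
        self.dims = dims
        self.measures = []                      # row id -> measure
        self.postings = defaultdict(set)        # (dim, value) -> row ids

    def append(self, values, measure):
        row_id = len(self.measures)
        self.measures.append(measure)
        for dim, value in zip(self.dims, values):
            self.postings[(dim, value)].add(row_id)

    def sum_where(self, **filters):
        """Intersect posting lists for the filters, then sum the matching rows."""
        ids = set(range(len(self.measures)))
        for dim, value in filters.items():
            ids &= self.postings.get((dim, value), set())
        return sum(self.measures[i] for i in ids)


def hybrid_sum(cube_total, index, **filters):
    """Combine an already aggregated cube value with not-yet-cubed recent rows."""
    return cube_total + index.sum_where(**filters)


# Usage: rows that arrived after the last cube segment was built
recent = InvertedIndex(("item", "location"))
recent.append(("milk", "Urbana"), 4)
recent.append(("milk", "Chicago"), 2)
recent.append(("bread", "Urbana"), 1)
milk_total = hybrid_sum(100, recent, item="milk")
```

Appending to the index is cheap, so new rows become queryable right away, while the cube segments catch up in the background.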
Kylin Lambda Architecture
[Figure labels: ANSI SQL; Query Engine; Hybrid Storage Interface; Streaming; Inverted Index (Last Hour, seconds delay); Cube (minutes delay)]
Sep, 2013: Initial (Prototype for MOLAP)
• Basic end to end POC
Jan, 2014: MOLAP
• Incremental Refresh
• ANSI SQL
• ODBC Driver
• Web GUI
• Tableau
• ACL
• Open Source
Oct, 2014: HybridOLAP
• Streaming OLAP
• JDBC Driver
• New UI
• Excel
• SparkSQL
• … more
H1, 2015: StreamingOLAP
• Lambda Arch
• Automation
• Capacity Management
• Spark
• … more
TBD: Next Gen
• Adv OLAP Functions
• In-Memory Analysis (TBD)
• Mobile (TBD)
• … more
Kylin Ecosystem
Kylin Core
• Fundamental framework of Kylin OLAP Engine
Extension
• Plugins to support additional functions and features
• Security, Redis Storage, Spark Engine, Docker
Integration
• Lifecycle Management Support to integrate with other applications
• ODBC Driver, ETL, Drill, SparkSQL
Interface
• Allows third-party users to build more features via user interface atop Kylin core
• Web Console, Customized BI, Ambari/Hue Plugin
If you want to go fast, go alone.
If you want to go far, go together.
[email protected] --African Proverb
http://kylin.io