BDCC
Apache Drill
Apache Drill is an open-source software
framework that supports data-intensive
distributed applications for interactive
analysis of large-scale datasets.
Drill is the open source version of Google's
Dremel system which is available as an
infrastructure service called Google BigQuery.
One explicitly stated design goal is that Drill
is able to scale to 10,000 servers or more and
to be able to process petabytes of data and
trillions of records in seconds.
Drill is an Apache top-level project.
Drill supports a variety of NoSQL databases
and file systems, including HBase, MongoDB,
MapR-DB, HDFS, MapR-FS, Amazon S3,
Azure Blob Storage, Google Cloud Storage,
Swift, NAS and local files.
A single query can join data from multiple
datastores. For example, you can join a user
profile collection in MongoDB with a directory
of event logs in Hadoop.
Drill's datastore-aware optimizer automatically
restructures a query plan to leverage the
datastore's internal processing capabilities.
In addition, Drill supports data locality, so it's
a good idea to co-locate Drill and the
datastore on the same nodes.
Drill gets rid of all that overhead so that
users can just query the raw data in-situ.
There's no need to load the data, create and
maintain schemas, or transform the data
before it can be processed.
Instead, simply include the path to a
Hadoop directory, MongoDB collection or
S3 bucket in the SQL query.
Drill leverages advanced query compilation
and re-compilation techniques to maximize
performance without requiring up-front
schema knowledge.
Drill features a JSON data model that enables
queries on complex/nested data as well as
rapidly evolving structures commonly seen in
modern applications and non-relational
datastores.
Drill also provides intuitive extensions to SQL
so that you can easily query complex data.
Drill is the only columnar query engine that
supports complex data.
It features an in-memory shredded columnar
representation for complex data which allows
Drill to achieve columnar speed with the
flexibility of an internal JSON document model.
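To make the idea of a "shredded columnar representation" more concrete, here is a minimal Python sketch. It is only an illustration of the general technique (one column per leaf path of a nested record), not Drill's actual in-memory format. The function names `flatten_paths` and `shred` and the sample records are hypothetical.

```python
def flatten_paths(record, prefix=""):
    """Flatten nested dicts into {"a.b": value} pairs (arrays are not handled)."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_paths(value, path))
        else:
            flat[path] = value
    return flat


def shred(records):
    """Turn a list of nested records into one column (a list) per leaf path.

    A record that lacks a path contributes None to that column, so all
    columns keep the same length even when the structure varies.
    """
    flat_rows = [flatten_paths(r) for r in records]
    paths = sorted({p for row in flat_rows for p in row})
    return {p: [row.get(p) for row in flat_rows] for p in paths}


# Hypothetical sample data: the second record has no zip code.
records = [
    {"name": "alice", "address": {"state": "CA", "zip": "94105"}},
    {"name": "bob", "address": {"state": "NY"}},
]
columns = shred(records)
```

A query that only needs `address.state` can then scan that single column instead of parsing every full document, which is the intuition behind combining columnar speed with a document model.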
SELECT * FROM dfs.root.`/web/logs`;
SELECT country, count(*)
FROM mongodb.web.users
GROUP BY country;
SELECT timestamp
* BI and analytics tools such as Tableau, Qlik, MicroStrategy,
Spotfire, SAS and Excel can interact with non-relational
datastores by leveraging Drill's JDBC and ODBC drivers.
* Developers can leverage Drill's simple REST API in their
custom applications to create beautiful visualizations.
* Drill's virtual datasets allow even the most complex,
non-relational data to be mapped into BI-friendly structures
which users can explore and visualize using their tool of choice.
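As a rough illustration of the REST API mentioned above, the following Python sketch builds a query request using only the standard library. It assumes a Drill instance whose web server listens on `http://localhost:8047` (an assumption, adjust to your setup) and posts to the `/query.json` endpoint. The helper names `build_query_request` and `run_query` are hypothetical.

```python
import json
import urllib.request

# Assumption: Drill's web server is reachable locally on port 8047.
DRILL_URL = "http://localhost:8047"


def build_query_request(sql, base_url=DRILL_URL):
    """Build (but do not send) a POST request that submits a SQL query."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, base_url=DRILL_URL):
    """Send the query and return the parsed JSON response.

    This needs a running Drill instance, so it is not called below.
    """
    with urllib.request.urlopen(build_query_request(sql, base_url)) as resp:
        return json.load(resp)


request = build_query_request("SELECT * FROM cp.`employee.json` LIMIT 20")
```

With Drill running, `run_query(...)` would return the response as a Python dictionary that a custom application could feed into its own charts.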
Drill isn't the world's first query engine, but it's the first that combines both flexibility and
speed.
To achieve this, Drill features a radically different architecture that enables record-breaking
performance without sacrificing the flexibility offered by the JSON document model.
Drill's design includes:
* Columnar execution engine (the first ever to support complex data!)
* Data-driven compilation and recompilation at execution time
* Specialized memory management that reduces memory footprint and eliminates garbage collections
* Locality-aware execution that reduces network traffic when Drill is co-located with the datastore
* Advanced cost-based optimizer that pushes processing into the datastore when possible
[Architecture diagram: clients (Tableau, Excel, Qlik, Web/Custom) connect to Apache Drill, which queries NoSQL stores (HBase, MongoDB, Kudu), search (Elasticsearch), files (NAS such as NetApp, HDFS), IaaS/PaaS (Amazon S3) and relational databases (Oracle, MySQL, SQL Server).]
INSTALLING AND USING APACHE DRILL
First we download Apache Drill
wget http://apache.mirrors.hoobly.com/drill/drill-1.18.0/apache-drill-1.18.0.tar.gz
Then we extract it
tar -xvzf apache-drill-1.18.0.tar.gz
mv apache-drill-1.18.0 apache-drill
Then we launch it
apache-drill/bin/drill-embedded
hadoop@aaron-hadoop:~$ apache-drill/bin/drill-embedded
Apache Drill 1.18.0
"Data is the new oil. Ready to Drill some?"
apache drill>
[Screenshot: Drill web UI, Plugin Management page listing Enabled Storage Plugins and Disabled Storage Plugins]
From the menu bar, select Query.
[Screenshot: Query page. Sample SQL query: SELECT * FROM cp.`employee.json` LIMIT 20; query type options: SQL, Physical, Logical]
The query returns results that are not useable.
We convert the data from byte arrays to UTF8 types that are
meaningful. We also store this query in a view.
CREATE VIEW dfs.tmp.students AS
SELECT CONVERT_FROM(row_key, 'UTF8') AS studentid,
CONVERT_FROM(students.account.name, 'UTF8') AS name,
CONVERT_FROM(students.address.state, 'UTF8') AS state,
CONVERT_FROM(students.address.street, 'UTF8') AS street,
CONVERT_FROM(students.address.zipcode, 'UTF8') AS zipcode
FROM hbase.students;
SELECT * FROM dfs.tmp.students;
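The conversion from byte arrays to UTF8 strings can be pictured with a small Python sketch. This is only an analogy for what CONVERT_FROM(..., 'UTF8') does to each value, not Drill code. The `decode_row` helper and the sample row are hypothetical.

```python
def decode_row(raw_row, encoding="utf-8"):
    """Decode every byte-array value in a row into a readable string."""
    return {column: value.decode(encoding) for column, value in raw_row.items()}


# Hypothetical row: values arrive as raw bytes, as they would from HBase.
raw_student = {"studentid": b"student1", "name": b"Alice", "state": b"CA"}
decoded = decode_row(raw_student)
```

Before decoding, each value is an opaque byte sequence. After decoding, the row holds ordinary strings that can be filtered, grouped and joined.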
CONVERT_FROM(clicks.clickinfo.url, 'UTF8') AS url
FROM hbase.clicks;
Note: We write time within backquotes (`time`) as it is an SQL keyword.
SELECT * FROM dfs.tmp.clicks;
Join the two tables together using a join
SELECT * FROM
(SELECT * FROM dfs.tmp.students) s
LEFT JOIN
(SELECT * FROM dfs.tmp.clicks) c
ON s.studentid = c.studentid;
[Query result table; columns: studentid, name, state, street, zipcode, clickid, studentid, time, url]
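The behaviour of the LEFT JOIN above can be sketched in plain Python. This is a simplified illustration under stated assumptions: the `left_join` helper and the sample rows are hypothetical, and unlike SELECT * in SQL the sketch keeps a single copy of the join key.

```python
def left_join(left, right, key):
    """Keep every left row; attach matching right rows, or None if there is no match."""
    # Index the right-hand rows by join key for fast lookup.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    right_cols = sorted({c for row in right for c in row if c != key})

    joined = []
    for lrow in left:
        matches = index.get(lrow[key])
        if matches:
            for rrow in matches:
                joined.append({**lrow, **{c: rrow.get(c) for c in right_cols}})
        else:
            # No match: the right-hand columns are filled with None (SQL NULL).
            joined.append({**lrow, **{c: None for c in right_cols}})
    return joined


# Hypothetical sample rows standing in for the two views.
students = [
    {"studentid": "s1", "name": "Alice"},
    {"studentid": "s2", "name": "Bob"},
]
clicks = [
    {"studentid": "s1", "url": "http://example.com/a"},
    {"studentid": "s1", "url": "http://example.com/b"},
]
result = left_join(students, clicks, "studentid")
```

A student with several clicks appears once per click, while a student with no clicks still appears once with empty click columns.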
Compiled by Aaron Stanislaus Johns
https://bdcc.santechz.com/unit-2/6-apache-drill/