Bigdata 11
Sunbeam Infotech
multiple files.
When data is processed using an MR job, the number of reducers will be the same as the number of buckets.
Data must be uploaded into a bucketed table via a staging table.
Usually buckets are created on unique columns.
It provides better sampling and speeds up map-side joins.
It is mandatory for DML operations.
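The bucket assignment described above can be sketched in a few lines of Python. This is only an illustration, not Hive's implementation: it assumes an integer bucketing column whose hash is the value itself, and the function names and sample rows are hypothetical. Each resulting bucket corresponds to one output file, which is why an MR job needs one reducer per bucket.

```python
from collections import defaultdict


def bucket_for(key: int, num_buckets: int) -> int:
    """Return the bucket number for an integer key.

    Assumption: the hash of an integer is the integer itself; real Hive
    hashing depends on the column type and the Hive version.
    """
    return (key & 0x7FFFFFFF) % num_buckets


def bucketize(rows, key_index: int, num_buckets: int) -> dict:
    """Group rows by bucket number, like one output file per reducer."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[key_index], num_buckets)].append(row)
    return dict(buckets)


# Hypothetical sample rows: (id, name), bucketed on the unique id column.
rows = [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E"), (6, "F")]
buckets = bucketize(rows, key_index=0, num_buckets=3)

for bucket_no in sorted(buckets):
    print(bucket_no, buckets[bucket_no])
```

Because the key is unique and evenly spread, every bucket receives the same number of rows, which is what makes sampling a single bucket representative.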
& grouping).
Creating an index is a time-consuming job (for huge data). If indexing is done under load, then MR ...
Compact:
Stores combination of indexed column value & its HDFS block id.
Bitmap:
Stores combination of indexed column value & list of rows as bitmap.
Bitmap indexes work faster than Compact.
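The difference between the two index types can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not Hive's on-disk format: rows are numbered from 0, a "block" is just a fixed group of rows (the hypothetical ROWS_PER_BLOCK), and a bitmap is held as a Python integer with one bit per row. The sample job values are made up.

```python
from collections import defaultdict

ROWS_PER_BLOCK = 2  # hypothetical block size, in rows


def build_compact_index(values, rows_per_block):
    """Map each indexed value to the set of block ids that contain it."""
    index = defaultdict(set)
    for row_no, value in enumerate(values):
        index[value].add(row_no // rows_per_block)
    return dict(index)


def build_bitmap_index(values):
    """Map each indexed value to a bitmap with one bit set per matching row."""
    index = defaultdict(int)
    for row_no, value in enumerate(values):
        index[value] |= 1 << row_no
    return dict(index)


def rows_from_bitmap(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical indexed column (job title per row).
jobs = ["ANALYST", "CLERK", "ANALYST", "PRESIDENT", "CLERK"]

compact = build_compact_index(jobs, ROWS_PER_BLOCK)
bitmap = build_bitmap_index(jobs)

# Combining predicates is a single bitwise operation on bitmaps.
analyst_or_president = bitmap["ANALYST"] | bitmap["PRESIDENT"]
matching_rows = rows_from_bitmap(analyst_or_president)
print(compact["ANALYST"], matching_rows)
```

In this model the compact index only narrows the search to blocks, while the bitmap index identifies the exact rows and lets several conditions be combined with bitwise AND/OR.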
Hive indexes are not supported from Hive 3.x onwards. Use materialized views
instead to improve performance.
Spark is a distributed computing framework that can process huge amounts of data.
Spark can be used as part of the Hadoop ecosystem or as an independent
distributed computing framework.
Spark Philosophy
Unified -> similar API for ... & any language; similar performance (in high level APIs).
Compute Engine -> works with any distributed storage, e.g. HDFS, S3, Azure Blob.
[Figure: Spark stack with SQL, Streaming, GraphX and Spark High Level APIs (DataFrames)]