BIGDATA ANALYTICS

USING MACHINE LEARNING

By
S.Manasha
II MSc Computer Science
HIVE
• Hive is a data warehouse and ETL tool that provides an SQL-like interface
between the user and the Hadoop Distributed File System (HDFS), and it
integrates with Hadoop.
• It facilitates reading, writing and managing large datasets that are stored in
distributed storage and queried using Structured Query Language (SQL) syntax. It
is not built for Online Transaction Processing (OLTP) workloads.
• It is designed to enhance scalability, extensibility, performance, fault tolerance
and loose coupling with its input formats.
HIVE DATA MODELLING

• Tables - Tables in Hive are created in much the same way as in an RDBMS
• Partitions - Tables are organized into partitions that group similar
types of data based on the partition key
• Buckets - Data within a partition can be further divided into buckets for
efficient querying
HIVE INTERNAL TABLES VS EXTERNAL TABLES

Internal:
• Data is stored in the Hive data warehouse. The data warehouse is located at
/hive/warehouse/ on the default storage for the cluster.
• Use internal tables when one of the following conditions applies:
  • The data is temporary.
  • You want Hive to manage the lifecycle of the table and its data.
External:

• Data is stored outside the data warehouse. The data can be stored on any
storage accessible by the cluster.
• Use external tables when one of the following conditions applies:
  • The data is also used outside of Hive. For example, the data files are updated
  by another process (one that doesn't lock the files).
  • The data needs to remain in the underlying location, even after the table is dropped.
  • You need a custom location, such as a non-default storage account.
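
As a minimal sketch of the distinction above, the statements below create one internal (managed) table and one external table, then drop both. The table names, columns and the LOCATION path are hypothetical and chosen only for illustration.

```sql
-- Internal (managed) table: Hive owns both the metadata and the data files.
CREATE TABLE IF NOT EXISTS staging_events (
  event_id INT,
  payload  STRING
);

-- External table: Hive records only the metadata; the files stay at LOCATION.
-- The path below is a hypothetical example.
CREATE EXTERNAL TABLE IF NOT EXISTS shared_events (
  event_id INT,
  payload  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/shared/events';

-- Dropping the internal table removes its data along with the definition.
DROP TABLE IF EXISTS staging_events;

-- Dropping the external table removes only the definition;
-- the files under /data/shared/events are left in place.
DROP TABLE IF EXISTS shared_events;
```

This is why an external table is the usual choice when other processes also read or write the same files.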
PARTITIONS
• A table may be partitioned in multiple dimensions.
• For example, in addition to partitioning logs by date, we might also subpartition each date
partition by country to permit efficient queries by location.
• Partitions are defined at table creation time using the PARTITIONED BY clause, which takes
a list of column definitions.
• Dividing a large dataset into partitions lets a query search only the relevant partitions
instead of the whole dataset.

hive> CREATE TABLE party_table (loaded INT, logerror STRING)
    > PARTITIONED BY (logdt STRING, country STRING);
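
To illustrate how partition columns are used after a table is created, here is a short sketch. The table name `web_logs` and the partition values are hypothetical; the schema mirrors the date-and-country example described above.

```sql
-- Hypothetical table partitioned by date and then by country.
CREATE TABLE IF NOT EXISTS web_logs (loaded INT, logerror STRING)
PARTITIONED BY (logdt STRING, country STRING);

-- Add one partition explicitly. Each partition maps to its own
-- subdirectory, e.g. .../web_logs/logdt=2024-01-01/country=IN
ALTER TABLE web_logs ADD IF NOT EXISTS
  PARTITION (logdt = '2024-01-01', country = 'IN');

-- List the partitions Hive knows about.
SHOW PARTITIONS web_logs;

-- Filtering on the partition columns lets Hive read only the matching
-- partition directories instead of scanning the whole table.
SELECT loaded, logerror
FROM web_logs
WHERE logdt = '2024-01-01' AND country = 'IN';
```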
BUCKETS
• Bucketing a table enables more efficient queries.
• Bucketing also makes sampling more efficient.

hive> CREATE TABLE bucketed_users (id INT, name STRING)
    > CLUSTERED BY (id) INTO 4 BUCKETS;
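
The sampling benefit can be sketched as follows. The table `bucketed_users_demo` and the source table `users` are hypothetical names used only for this illustration, and `users` is assumed to already exist with matching columns.

```sql
-- Hypothetical bucketed table with four buckets on id.
CREATE TABLE IF NOT EXISTS bucketed_users_demo (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;

-- Populate it from an existing (hypothetical) unbucketed table so that
-- Hive distributes the rows across the four buckets.
INSERT OVERWRITE TABLE bucketed_users_demo
SELECT id, name FROM users;

-- Read roughly one quarter of the rows by sampling a single bucket.
SELECT *
FROM bucketed_users_demo
TABLESAMPLE (BUCKET 1 OUT OF 4 ON id);
```

Because the sample is taken on the bucketing column, Hive can read just one bucket rather than scanning the full table.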
HIVE DATA TYPES
Primitive Data Types:

• Numeric data types - integral types (TINYINT, SMALLINT, INT, BIGINT), floating-point types (FLOAT, DOUBLE) and DECIMAL

• String data types - STRING, VARCHAR and CHAR
• Date/time data types - TIMESTAMP, DATE and INTERVAL
• Miscellaneous data types - BOOLEAN and BINARY

Complex Data Types:

• Arrays - An ordered collection of elements of the same type. Syntax: array<data_type>

• Maps - A collection of key-value pairs. Syntax: map<primitive_type, data_type>
• Structs - A collection of named fields, each with its own data type and an optional comment.
Syntax: struct<col_name : data_type [COMMENT col_comment], ...>
• Unions - A collection of heterogeneous data types, where a value holds exactly one of the listed types at a time. Syntax: uniontype<data_type, data_type, ...>
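
A short sketch of how the complex types above appear in practice. The table name, column names, delimiters and the map key `'sql'` are all hypothetical choices for illustration.

```sql
-- Hypothetical table that uses an array, a map and a struct column.
CREATE TABLE IF NOT EXISTS employee_profiles (
  name    STRING,
  skills  ARRAY<STRING>,
  scores  MAP<STRING, INT>,
  address STRUCT<city: STRING, zip: STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':';

-- Arrays are indexed from 0, maps are accessed by key,
-- and struct fields are accessed with dot notation.
SELECT
  name,
  skills[0]     AS first_skill,
  scores['sql'] AS sql_score,
  address.city  AS city
FROM employee_profiles;
```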
MODES OF HIVE

• Local Mode - Used when Hadoop has a single data node and the amount of data is
small. Processing is very fast on small datasets that are stored on the
local machine.
• MapReduce Mode - Used when the data in Hadoop is spread across multiple data
nodes. Large datasets are processed more efficiently in this mode.
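
As a hedged illustration, the session settings below are one way to influence which mode a query runs in. This assumes Hive is executing on MapReduce; clusters configured with a different execution engine may behave differently.

```sql
-- Let Hive decide automatically to run small jobs in local mode
-- (this setting is off by default).
SET hive.exec.mode.local.auto = true;

-- Alternatively, force local execution for the current session.
SET mapreduce.framework.name = local;

-- Switch back to cluster execution on YARN.
SET mapreduce.framework.name = yarn;
```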
ADVANTAGES OF HIVE
• Scalability
• Familiar SQL-like interface
• Supports partitioning and bucketing
• User-defined functions
HIVE QL
• HiveQL is the Hive Query Language.
• DDL and DML are both parts of HiveQL.
• Data Definition Language (DDL) is used for creating, altering and dropping
databases, tables, views, functions and indexes.
• Data Manipulation Language (DML) is used to load data into Hive tables, to
extract data to the file system, and to explore and manipulate data
with queries, grouping, filtering, joining, etc.
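
The split between DDL and DML described above can be sketched with a small example. The table name `sales`, its columns and both file system paths are hypothetical.

```sql
-- DDL: create a table for comma-delimited text data.
CREATE TABLE IF NOT EXISTS sales (region STRING, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- DML: load a file from HDFS into the table (hypothetical path).
LOAD DATA INPATH '/tmp/sales.csv' INTO TABLE sales;

-- Query with filtering and grouping.
SELECT region, SUM(amount) AS total_amount
FROM sales
WHERE amount > 0
GROUP BY region;

-- DML: extract query results to a directory on the file system.
INSERT OVERWRITE DIRECTORY '/tmp/sales_by_region'
SELECT region, SUM(amount) FROM sales GROUP BY region;

-- DDL: alter the table, then drop it.
ALTER TABLE sales ADD COLUMNS (sold_on STRING);
DROP TABLE IF EXISTS sales;
```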
COMMANDS
• CREATE DATABASE db_name; -- To create a database in Hive.
• USE db_name; -- To use the database in Hive.
• DROP DATABASE db_name; -- To delete the database in Hive.
• SHOW DATABASES; -- To see the list of databases.
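
A brief sketch of a session that uses these commands with their optional safety clauses. The database name `demo_db` is hypothetical.

```sql
-- Create the database only if it does not already exist.
CREATE DATABASE IF NOT EXISTS demo_db;

-- Confirm that it appears in the list of databases.
SHOW DATABASES;

-- Make it the current database, then switch back before dropping it.
USE demo_db;
USE default;

-- CASCADE also drops any tables the database still contains.
DROP DATABASE IF EXISTS demo_db CASCADE;
```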
THANK YOU
