Unit 6: Data with Hive (Big Data Analytics, B.Tech. Final Year)


Women Engg. College, Ajmer
Presented by: Monalisa Meena
Assistant Professor
Dept. of Computer Engineering
Big Data Analytics
Credit: 3
Max. Marks: 150 (IA: 30, ETE: 120), 3L+0T+0P, End Term Exam: 3 Hours
 Objective, scope and outcome of the course.
 Big data features and challenges, Problems with
Traditional Large-Scale Systems, Sources of Big Data,
3 V’s of Big Data, Types of Data.
Working with Big Data: Google File System,
Hadoop Distributed File System (HDFS),
Building blocks of Hadoop (NameNode, DataNode,
Secondary NameNode, JobTracker, TaskTracker),
Introducing and Configuring a Hadoop cluster
(Local, Pseudo-distributed, Fully distributed modes),
Configuring XML files.
 A Weather Dataset, Understanding the Hadoop API
for the MapReduce Framework (Old and New),
Basic programs of Hadoop MapReduce: Driver code,
Mapper code, Reducer code, Record Reader,
Combiner, Partitioner.
 The Writable Interface, WritableComparable and
comparators. Writable Classes: Writable wrappers
for Java primitives, Text, BytesWritable, NullWritable,
ObjectWritable and GenericWritable, Writable
collections. Implementing a Custom Writable:
Implementing a RawComparator for speed,
Custom comparators.
 Hadoop Programming Made Easier: Admiring the
Pig Architecture, Going with the Pig Latin Application
Flow, Working through the ABCs of Pig Latin,
Evaluating Local and Distributed Modes of Running
Pig Scripts, Checking out the Pig Script Interfaces,
Scripting with Pig Latin.
 Saying Hello to Hive, Seeing How the Hive Is Put
Together, Getting Started with Apache Hive,
Examining the Hive Clients, Working with Hive
Data Types, Creating and Managing Databases
and Tables, Seeing How the Hive Data Manipulation
Language Works, Querying and Analyzing Data.
 One of the most popular data warehouse tools in
the Hadoop ecosystem.
 The Apache Hive™ data warehouse software
facilitates reading, writing, and managing large
datasets residing in distributed storage using SQL.
 Originally developed by Facebook.
 Now maintained as Apache Hive by the Apache
Software Foundation.
 A command-line tool and a JDBC driver are
provided to connect users to Hive.
 Traditional approach: SQL queries for extracting
data from relational databases.
 Hadoop and big data: Hive provides an SQL-like
interface, so users can write SQL-like queries (HQL)
to extract data from Hadoop (see the sample query
after this list).
 Used for OLAP.
 Scalable, flexible and fast.
 Helpful for users who want to write SQL-like
queries over datasets that reside in HDFS.
 It is not a relational database.
 Not to be used for OLTP.
 Not to be used for real-time updates and queries,
or for applications where low-latency data retrieval
is required.
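A minimal HQL sketch of such a query, assuming a hypothetical page_views table already mapped over files in HDFS (the table and column names are illustrative, not from the slides):

-- Count page views per country for one day, largest first
SELECT country, COUNT(*) AS views
FROM page_views
WHERE view_date = '2024-01-15'
GROUP BY country
ORDER BY views DESC
LIMIT 10;

Hive turns a query like this into MapReduce jobs over the underlying HDFS files, so no Java code has to be written.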
1. Used for data analysis
2. Supports different file formats (see the sketch after this list)
3. Metadata is stored in an RDBMS (the metastore)
4. Compression techniques
5. HQL support
6. UDF support
7. Specialized join operations
8. Simplifies and abstracts the load on Hadoop
9. No need to learn Java and the Hadoop API
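A hedged sketch of items 2, 4 and 6: the statements below create a table stored in the ORC file format with compression enabled, then query it with a built-in function; the table name and columns are hypothetical:

-- Store the table as ORC with ZLIB compression (one of several formats Hive supports)
CREATE TABLE sales_orc (
  order_id BIGINT,
  amount   DOUBLE,
  region   STRING
)
STORED AS ORC
TBLPROPERTIES ('orc.compress' = 'ZLIB');

-- Built-in (and user-defined) functions are invoked directly in HQL
SELECT region, ROUND(SUM(amount), 2) AS total
FROM sales_orc
GROUP BY region;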
 Data Mining
 Document Indexing
 Predictive modelling
 Business Intelligence
 Log processing
 SQL-type queries
 OLAP-based design
 Fast
 Scalable
 A technology closely associated with RDBMS/EDW
systems, and one Hive is well suited for, is extract,
transform, and load (ETL). A typical flow (sketched
after this list) is to:
 extract unstructured text data from an
Internet forum,
 transform the data into a structured format,
 then load the structured data into the EDW.
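A hedged HiveQL sketch of that flow; the HDFS path, table names, and the regular expression used to structure the raw text are hypothetical examples:

-- Extract: expose the raw forum dump already sitting in HDFS as an external table
CREATE EXTERNAL TABLE forum_raw (line STRING)
LOCATION '/data/raw/forum_posts';

-- Transform and load: parse each line into columns and write them into a structured, ORC-backed table
CREATE TABLE forum_posts (post_user STRING, post_text STRING)
STORED AS ORC;

INSERT INTO TABLE forum_posts
SELECT regexp_extract(line, '^([^\\t]+)\\t(.*)$', 1),
       regexp_extract(line, '^([^\\t]+)\\t(.*)$', 2)
FROM forum_raw;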
 Apache Hive gives you powerful analytical
tools, all within the framework of HiveQL.
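As one hedged illustration of those analytical tools, HiveQL supports windowing functions; the employee_salaries table below is hypothetical:

-- Rank employees by salary within each department
SELECT dept, name, salary,
       RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank
FROM employee_salaries;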
1. Hive command-line interface (CLI)
2. Hive Web Interface (HWI) Server
3. Open source SQuirreL client using the JDBC
driver.
 Integers
◦ TINYINT
◦ SMALLINT
◦ INT
◦ BIGINT
 Floating-point numbers
◦ FLOAT
◦ DOUBLE
 STRING
 BOOLEAN
 Creating databases and tables works much the
same as in an RDBMS when default settings are
used (a minimal sketch follows).
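A hedged example, using a hypothetical emp table whose columns use the Hive data types listed above:

CREATE DATABASE IF NOT EXISTS company;
USE company;

CREATE TABLE emp (
  id     INT,
  name   STRING,
  salary DOUBLE,
  active BOOLEAN
);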
 ALTER TABLE emp RENAME TO employee;
 ALTER TABLE employee ADD COLUMNS (address STRING);
 ALTER TABLE employee CHANGE name firstname STRING;
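The schema changes can be verified with DESCRIBE (output layout varies by Hive version):

DESCRIBE employee;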