
Bda Unit-5 PDF

Unit 5 covers topics related to NoSQL databases including key-value stores, document stores, column stores, graph databases, and characteristics of NoSQL databases. It discusses advantages of NoSQL such as flexibility to handle diverse data, ease of distribution and scaling, and relaxed consistency requirements compared to relational databases. Some disadvantages are lack of transactions and weaker security. NewSQL aims to provide scalability of NoSQL with consistency of SQL databases.

Uploaded by

Harry


Unit 5

Topics
• Introduction to NoSQL Databases

• Introduction to Hive
Relational Databases
• A relational database stores data in a structured format, using rows and columns. This makes it easy to locate and access specific values within the database.
• “Relation” is sometimes used to refer to a table in a relational database, but more commonly refers to the relationship between the different elements of a row.

• The relation is defined in a “schema”, which is the logical definition of a table.

• The data is said to be structured.
• The data usually consists of simple types such as integers, strings, and floats.
Relational Database Management System (RDBMS)
• Software that manages a collection (database) of tables.
• Designed to support multi-user access.
• Transactional processing: designed for random access to table elements for update purposes, as opposed to batch processing.
Features of Relational Databases
• Records are organized into tables

• Rows of tables are identified by unique keys

• Data spans multiple tables, which are linked by join operations

• Transactions are ACID-compliant

Structured Query Language (SQL)
• Structured Query Language (SQL) is a standard database language used to create, maintain, and retrieve data from relational databases.

• Much more compact and expressive than programs written in general-purpose programming languages such as C++ or Java, but only for tabular data stores.

Note: SQL has proved to be a very effective language for relational databases, which is why it is so closely associated with RDBMSs. But there is no rule that states that an RDBMS has to use SQL.
Denormalization
• Avoid joins
• Expand the number of columns
• Design the table to include related data
• Query a single table
• Improves read performance
• Introduces the possibility of data anomalies
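The trade-off above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `dept`, `emp`, and `emp_denorm` tables and their contents are hypothetical and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized design: two tables linked by dept_id (hypothetical schema).
cur.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
cur.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)")
cur.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "Sales"), (2, "HR")])
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(10, "Alok", 1), (11, "Geeta", 2), (12, "Susan", 1)])

# Denormalized design: the department name is copied into every employee row.
cur.execute("CREATE TABLE emp_denorm (emp_id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT)")
cur.execute("""INSERT INTO emp_denorm
               SELECT e.emp_id, e.name, d.dept_name
               FROM emp e JOIN dept d ON e.dept_id = d.dept_id""")

# The same answer, once with a join and once from a single table.
joined = cur.execute("""SELECT e.name, d.dept_name
                        FROM emp e JOIN dept d ON e.dept_id = d.dept_id
                        ORDER BY e.emp_id""").fetchall()
single = cur.execute("SELECT name, dept_name FROM emp_denorm ORDER BY emp_id").fetchall()

# Renaming a department in dept leaves stale copies in emp_denorm: a data anomaly.
cur.execute("UPDATE dept SET dept_name = 'Revenue' WHERE dept_id = 1")
stale = cur.execute("SELECT COUNT(*) FROM emp_denorm WHERE dept_name = 'Sales'").fetchone()[0]
```

The single-table read avoids the join, but the rename has to be repeated in every copied row to keep the data consistent.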
Creating shards
• Breaking up a database and storing pieces of the database on different servers.
• Uses multiple database instances
• Each shard stores a subset of the data
• Queries are answered by reading from a subset of the shards
• Improves read and write performance
• Complex to set up and operate
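A minimal sketch of how keys might be routed to shards, assuming hash-based sharding with a fixed number of servers. The shard count, the choice of CRC32, and the helper names are illustrative assumptions, not something the slides prescribe.

```python
import zlib

NUM_SHARDS = 3  # hypothetical cluster size
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one server


def shard_for(key: str) -> int:
    """Map a key to a shard index with a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def put(key: str, value: dict) -> None:
    shards[shard_for(key)][key] = value


def get(key: str):
    # A point lookup touches exactly one shard.
    return shards[shard_for(key)].get(key)


for user in ["alok", "arun", "geeta", "susan", "john"]:
    put(user, {"name": user})
```

Part of the complexity noted above shows up as soon as `NUM_SHARDS` changes: most keys then map to a different shard and have to be moved.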
Replication
• Makes copies of tables and indexes
• Copies are stored on different servers
• Any copy may be used to answer a query
• Improves read performance
• Introduces the possibility of inconsistency between copies
Not Only SQL (NoSQL) Databases: History
• RDBMSs were found unsuitable for handling the unstructured data generated by the proliferation of the internet.
• Unstructured data includes web pages, images, audio clips, videos, and documents (pdf, csv, text).
• There is a need to mine this data, hence a need to store and manipulate it in an efficient and organized manner.
• It is difficult to scale an RDBMS on clusters.
• All of the above gave rise to NoSQL databases in the 2000s.
• Origins in Google’s BigTable and Amazon’s Dynamo and SimpleDB.

• Note: The name “NoSQL” does not imply that this category of databases does not or cannot use SQL as a query language.
What is NoSQL?
• Non-relational data storage systems

• No fixed table schema

• No joins

• No multi-document transactions

• Relaxes one or more ACID properties

NoSQL Database Types
• Key-Value Store.

• Document Store.

• Column Store.

• Graph databases
Key-Value Pair Store
• Key is unique.
• Value can be anything, including a document, an image, etc.
• The DBMS typically does not know anything about the contents of the “value”.
• But the database might allow storage of metadata about the values.
• Application: online shopping information (user, user preferences)
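The points above can be illustrated with a toy in-memory store. The class name, the key format `user:36554`, and the preference fields are all hypothetical; real key-value stores such as Redis or Riak expose their own APIs.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store; values are opaque bytes."""

    def __init__(self):
        self._data = {}  # key -> raw value (never inspected by the store)
        self._meta = {}  # key -> optional metadata about the value

    def put(self, key, value: bytes, **metadata):
        self._data[key] = value
        self._meta[key] = metadata

    def get(self, key):
        return self._data.get(key)

    def metadata(self, key):
        return self._meta.get(key, {})


store = KeyValueStore()
prefs = {"currency": "INR", "wishlist": ["book", "pen"]}
# The application, not the store, decides that the value is JSON.
store.put("user:36554", json.dumps(prefs).encode("utf-8"),
          content_type="application/json")
```

Because the store never looks inside the value, it cannot answer a question such as "which users have a pen on their wishlist" without the application reading every value.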
Document Store
• Pairs each key with a complex data structure known as a document
• Documents can contain many different key-value pairs, key-array pairs, or even nested documents
• Supports embedded documents
• Consumes more space than its counterparts
• MongoDB is an example of this type
• A collection contains many documents
• Each document can contain diverse and heterogeneous fields.
Source: https://beginnersbook.com/2017/09/mapping-relational-databases-to-mongodb/
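As a rough illustration of the document model, the sketch below represents a collection as a list of Python dictionaries. The field names and values are invented for the example, and `find` is a hypothetical helper, not MongoDB's API.

```python
collection = [
    {"_id": 1, "name": "Alok", "gpa": 3.8,
     "address": {"city": "Pune", "country": "India"},  # embedded document
     "courses": ["BDA", "DBMS"]},                       # key-array pair
    {"_id": 2, "name": "Susan", "email": "susan@example.com"},  # different fields
]


def find(docs, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def field_names(docs):
    """Collect every top-level field that appears in at least one document."""
    return sorted({k for d in docs for k in d})
```

Calling `field_names(collection)` shows that the two documents share only `_id` and `name`, which is the heterogeneity the last bullet refers to.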
Graph stores
• Used to store information about networks of data, such as social networking connections
• Graph stores include Neo4j.
• Not well suited to every kind of problem
• Best suited for connected data
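To show why connected data suits a graph model, here is a small sketch that stores a social network as an adjacency list and walks it breadth-first. The names and connections are made up, and the code only mimics the kind of traversal a graph store is built to answer.

```python
from collections import deque

# Adjacency list: each person maps to the set of people they are connected to.
friends = {
    "Alok": {"Arun", "Geeta"},
    "Arun": {"Alok", "Susan"},
    "Geeta": {"Alok"},
    "Susan": {"Arun", "John"},
    "John": {"Susan"},
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: everyone reachable from start in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(person, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return seen - {start}
```

In a relational design the same "friends of friends" question would need one self-join per hop.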
Wide Column Stores
• Store columns of data together instead of rows
• Cassandra and HBase are optimized for queries over large datasets
• Excellent for lookups on a single field (the row key)
• Lookups on other fields are not supported
• Columns are not fixed
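A simplified picture of the wide-column layout is a map from row key to a per-row map of columns. The sketch below assumes that model; the row keys, column names, and the full-scan fallback are illustrative and do not reproduce how Cassandra or HBase are implemented.

```python
from collections import defaultdict

# row key -> {column name -> value}; rows need not share the same columns.
table = defaultdict(dict)


def put(row_key, column, value):
    table[row_key][column] = value


def get_row(row_key):
    # Efficient: a direct lookup on the row key.
    return dict(table.get(row_key, {}))


def scan_by_column(column, value):
    # Inefficient: no index on other fields, so every row must be examined.
    return [k for k, cols in table.items() if cols.get(column) == value]


put("emp:36554", "name", "Alok Nath")
put("emp:36554", "country", "India")
put("emp:71222", "name", "Susan Phillips")
put("emp:71222", "country", "UK")
put("emp:71222", "phone", "555-0100")  # extra column only on this row
```

`get_row` corresponds to the single-field lookup the store is good at, while `scan_by_column` shows what a query on any other field would cost if it were attempted.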
Types of NoSQL

Key-value data store | Column-oriented data store | Document data store | Graph data store
Riak | Cassandra | MongoDB | InfiniteGraph
Redis | HBase | CouchDB | Neo4j
Membase | HyperTable | RavenDB | AllegroGraph
NoSQL Vendors

Company | Product | Most widely used by
Amazon | DynamoDB | LinkedIn, Mozilla
Facebook | Cassandra | Netflix, Twitter, eBay
Google | BigTable | Adobe Photoshop

NoSQL Characteristics
Advantages of NoSQL
• Cheap, easy to implement
• Easy to distribute
• Can easily scale up and down
• Relaxes the data consistency requirement
• Doesn’t require a pre-defined schema
• Data can be replicated to multiple nodes and can be partitioned
BASE Properties

• Basic Availability: The database appears to work most of the time (even if some nodes fail or packets are dropped). This has to do with the “AP” of CAP.
• Soft state: State may change even without input (to provide eventual consistency).
• Eventual consistency: Stores exhibit consistency at some later point.
Soft state and eventual consistency both have to do with the “C” in CAP.

BASE relaxes the consistency guarantee of CAP in favor of availability. NoSQL databases strive to satisfy the BASE properties.

The BASE model is a flexible alternative to the ACID model for databases that don't require strict adherence to a relational model. BASE is found acceptable for data such as customer shopping data, whereas ACID is required for data such as banking data.
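Soft state and eventual consistency can be sketched with a toy replicated store in which a write is acknowledged by one replica and copied to the others later. The class names, the replica count, and the explicit `sync()` step are assumptions made for the illustration; real systems propagate updates in the background.

```python
class Replica:
    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Writes go to one replica and are propagated to the others later."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # (replica_index, key, value) updates not yet applied

    def write(self, key, value):
        self.replicas[0].data[key] = value        # acknowledged immediately
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))  # replicated asynchronously

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # State changes here without any new client input ("soft state").
        for i, key, value in self.pending:
            self.replicas[i].data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("cart:42", ["book"])
stale_read = store.read("cart:42", replica_index=2)  # replica 2 has not caught up yet
store.sync()
fresh_read = store.read("cart:42", replica_index=2)  # now consistent with replica 0
```

A stale shopping cart for a moment is usually tolerable; a stale bank balance is not, which is the distinction the paragraph above draws between BASE and ACID.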
NoSQL Pros and Cons
Pros
• Handles the diverse kinds of data generated by the proliferation of the internet. Flexible.
• Designed to scale.
• Easier to maintain.
Cons
• Not mature.
• Does not provide the same level of guarantees (ACID properties) as RDBMS systems.
• Not transactional.
• Less secure.
• Not designed for typical business intelligence applications.
SQL vs. NoSQL
SQL | NoSQL
Relational database | Non-relational, distributed database
Relational model | Model-less approach
Pre-defined schema | Dynamic schema for unstructured data
Table-based databases | Document-based, graph-based, wide-column store, or key-value pair databases
Vertically scalable (by increasing system resources) | Horizontally scalable (by creating a cluster of commodity machines)
Uses SQL | Uses UnQL (Unstructured Query Language)
Not preferred for large datasets | Largely preferred for large datasets
Not a best fit for hierarchical data | Best fit for hierarchical storage, as it stores data as key-value pairs similar to JSON (JavaScript Object Notation)
Emphasis on ACID properties | Follows Brewer’s CAP theorem
Excellent support from vendors | Relies heavily on community support
Supports complex querying and data keeping needs | Does not have good support for complex querying
Can be configured for strong consistency | A few support strong consistency (e.g., MongoDB); a few others can be configured for eventual consistency (e.g., Cassandra)
Examples: Oracle, DB2, MySQL, MS SQL, PostgreSQL, etc. | Examples: MongoDB, HBase, Cassandra, Redis, Neo4j, CouchDB, Couchbase, Riak, etc.
NewSQL
The goal is to provide the scalability and flexibility of NoSQL databases and the consistency of SQL databases.

Characteristics of NewSQL
• SQL interface for application interaction
• ACID support for transactions
• An architecture that provides higher per-node performance vis-à-vis traditional RDBMS solutions
• Scale-out, shared-nothing architecture
• Non-locking concurrency control mechanism, so that real-time reads will not conflict with writes
SQL vs. NoSQL vs. NewSQL
Property | SQL | NoSQL | NewSQL
Adherence to ACID properties | Yes | No | Yes
OLTP/OLAP | Yes | No | Yes
Schema rigidity | Yes | No | Maybe
Adherence to data model | Adherence to relational model | |
Data format flexibility | No | Yes | Maybe
Scalability | Scale up (vertical scaling) | Scale out (horizontal scaling) | Scale out
Distributed computing | Yes | Yes | Yes
Community support | Huge | Growing | Slowly growing
Introduction to Hive
History of Hive

Hive 0.10
• Batch
• Read-only data
• HiveQL
• MR
Hive 0.13
• Interactive
• Read-only data
• Substantial SQL
• MR, Tez
Hive 0.14
• Transactions with ACID semantics
• Cost-based optimizer
• SQL temporary tables
• MR, Tez, Spark

Enterprise SQL at Hadoop Scale

Hive: a Data Warehousing Tool
When to Use Hive
Metastore in Hive
• The Metastore stores information about the tables, the partitions, and the columns within the tables.
• The Metastore can be deployed in three modes:
• Embedded Mode
• Local Mode
• Remote Mode
Embedded Metastore

• In this mode, the Metastore service runs in the same JVM as the Hive service and uses an embedded Derby database instance backed by local disk. This mode requires the least configuration but supports only one session at a time, so it is not suited for production.
Local Metastore

• In this mode, the Metastore service runs in the same JVM as the Hive service, but the Metastore database runs in a separate process.
Remote Metastore

• In this mode, the Metastore service runs in its own JVM. This brings better manageability and security because the database tier can be completely firewalled off, and the clients no longer need the database credentials. The Metastore service communicates with the database over JDBC. Other Hadoop ecosystem software can communicate with Hive using the Thrift service.
Database: Namespaces that separate tables and other data units.

SQL | HiveQL
Inserts values row by row | Inserts data in bulk (not a single row at a time)
Update command is used | Update command cannot be used
Delete command is used to delete a row or column | Delete command cannot be used

Hive Query Language (HiveQL)
• It is HiveQL and not HQL.
• Based on SQL.
• Does not strictly follow the full SQL-92 standard.
• HiveQL offers extensions not in SQL, including multitable inserts.
• Limited support for various SQL operations such as subqueries.
• Internally, a compiler translates HiveQL statements into a directed acyclic graph of MapReduce, Tez, or Spark jobs, which are executed on a distributed cluster.
Hive Query Language (HiveQL)
1. Create and manage tables and partitions.
2. Support various relational, arithmetic, and logical operators.
3. Evaluate functions.
4. Download the contents of a table to a local directory, or the result of queries to an HDFS directory.
5. A large number of functions are defined in Hive, categorized as mathematical, statistical, string, date, conditional, aggregate, and so on.
We can retrieve the list in the Hive shell with
hive> SHOW FUNCTIONS;
Data Definition Language
• Build and modify the tables & other objects in the database
• Create/Drop/Alter Database
• Create/Drop/Truncate Table
• Alter Table/Partition/Column
• Create/Drop/Alter View
• Create/Drop/Alter Index
• Show
• Describe
Data Manipulation Language
• Used to retrieve, store, modify, delete, and update data in the database
Database
• To create a database named “STUDENTS” with comments and database properties.

CREATE DATABASE IF NOT EXISTS STUDENTS
COMMENT 'STUDENT Details'
WITH DBPROPERTIES ('creator' = 'JOHN');

• To describe a database.

DESCRIBE DATABASE STUDENTS;

• To drop a database.

DROP DATABASE STUDENTS;

Internal versus External Tables
Internal Table (Managed Table) | External Table (Self-Managed Table)
Table data is stored in the Hive-managed HDFS store. | Table data is not managed by Hive and is stored outside the warehouse.
Dropping the table deletes the table metadata and data. | Dropping the table deletes the metadata but not the data.
Default for CREATE TABLE. | The keyword “EXTERNAL” is used; a location needs to be specified.
One file is referred to by one table only. | One file can be referred to by any number of tables; external references are by location.
To create a managed table named ‘STUDENT’

CREATE TABLE IF NOT EXISTS student (rollno INT, name STRING, gpa FLOAT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

To create an external table named ‘EXT_STUDENT’

CREATE EXTERNAL TABLE IF NOT EXISTS ext_student (rollno INT, name STRING, gpa FLOAT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/STUDENT_INFO';

To load data into the table from a file named student.tsv.

LOAD DATA LOCAL INPATH '/root/hivedemos/student.tsv'
OVERWRITE INTO TABLE ext_student;

To retrieve the student details from the “EXT_STUDENT” table.

SELECT * FROM ext_student;

Partitioning - Preliminaries
Big Table
Employee Name | Employee ID | Country
Alok Nath | 36554 | India
Arun Thomas | 36553 | India
Geeta Rao | 36555 | India
Susan Phillips | 71222 | UK
John Chambers | 71225 | UK

• Break the table into smaller parts based on a key (“Country” in this case) and store them in separate units (which can be files).
• If data is required from only one part (say India), access will be faster.
• Breaking the table into too many small parts degrades performance, e.g. partitioning by employee ID.
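The idea above can be sketched in a few lines of Python using the employee rows from the slide. The `partition_by` helper is a hypothetical name, and in-memory lists stand in for the separate files.

```python
from collections import defaultdict

employees = [
    ("Alok Nath", 36554, "India"),
    ("Arun Thomas", 36553, "India"),
    ("Geeta Rao", 36555, "India"),
    ("Susan Phillips", 71222, "UK"),
    ("John Chambers", 71225, "UK"),
]


def partition_by(rows, key_index):
    """Group rows into separate units, one per distinct key value."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key_index]].append(row)
    return dict(parts)


by_country = partition_by(employees, 2)  # one partition per country
by_emp_id = partition_by(employees, 1)   # one partition per employee: too many
india_names = [name for name, _, _ in by_country["India"]]
```

A query for Indian employees only reads `by_country["India"]`, whereas partitioning by employee ID produces as many partitions as there are rows.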
Hashing - Preliminaries
Big Table
Employee Name | Employee ID | Country
Alok Nath | 36554 | India
Arun Thomas | 36553 | India
Geeta Rao | 36555 | India
Susan Phillips | 71222 | UK
John Chambers | 71225 | UK
Liam Neeson | 80162 | Ireland
Milo O’Shea | 80233 | Ireland

We need to “partition” by empid, but do not want one partition per key since that leads to too many partitions.

Solution:
- Map each key to a number, e.g. (empid modulo 2).
- 36554 mod 2 = 0
- 36553 mod 2 = 1
- Even empid: empid mod 2 = 0
- Odd empid: empid mod 2 = 1
- Partition by the above number.
- Two partitions are generated for the above numbers, one with odd empids and the other with even empids.
- Generating a partition number using a function on a key is called hashing.

Partitioning based on hashing is called hashPartitioning in Part 3.

Why “partition” by empid (while wanting a small number of partitions)? It could be for joining two tables by empid (see the example in Part 3).
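The modulo scheme described above can be written out directly. The sketch reuses the slide's employee IDs; `hash_partition` and `NUM_PARTITIONS` are illustrative names.

```python
employees = [
    ("Alok Nath", 36554, "India"),
    ("Arun Thomas", 36553, "India"),
    ("Geeta Rao", 36555, "India"),
    ("Susan Phillips", 71222, "UK"),
    ("John Chambers", 71225, "UK"),
    ("Liam Neeson", 80162, "Ireland"),
    ("Milo O'Shea", 80233, "Ireland"),
]

NUM_PARTITIONS = 2


def hash_partition(emp_id, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition number; here the hash is simply emp_id modulo N."""
    return emp_id % num_partitions


partitions = {p: [] for p in range(NUM_PARTITIONS)}
for name, emp_id, country in employees:
    partitions[hash_partition(emp_id)].append(emp_id)
```

Because the same empid always maps to the same partition number, two tables partitioned this way can be joined partition by partition.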
Partitions
• Partitions split a larger dataset into more meaningful chunks.
• Partitioning improves I/O performance.
• Hive provides two kinds of partitions:
• Static Partition
• Dynamic Partition
Static Partitions

• Static partitioning can be done on columns whose values are known at compile time.
• To create a static partition based on the “gpa” column.
CREATE TABLE IF NOT EXISTS static_part_student (rollno INT, name STRING)
PARTITIONED BY (gpa FLOAT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

• To load data into the partitioned table from another table.

INSERT OVERWRITE TABLE static_part_student PARTITION (gpa = 4.0)
SELECT rollno, name FROM ext_student WHERE gpa = 4.0;
Dynamic Partition

Dynamic partitioning is used on columns whose values are known only at execution time.

• To create a dynamic partition table.

CREATE TABLE IF NOT EXISTS dynamic_part_student (rollno INT, name STRING)
PARTITIONED BY (gpa FLOAT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

• To load data into a dynamic partition table from another table.

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

Note: The dynamic partition strict mode requires at least one static partition column. To turn this off, set hive.exec.dynamic.partition.mode=nonstrict.

INSERT OVERWRITE TABLE dynamic_part_student PARTITION (gpa)
SELECT rollno, name, gpa FROM ext_student;
Hive Partitions and Buckets
• Hive partitions are partitions generated by using keys, which may or may not be column values.

• Hive buckets are “partitions” generated by hashing column values.

Hive Partitions
Example: Partition by the date the data was created and further by country. Note: the date is not part of the table data.

CREATE TABLE logs (ts BIGINT, line STRING)
PARTITIONED BY (dt STRING, country STRING);

LOAD DATA LOCAL INPATH 'input/hive/partitions/file1'
INTO TABLE logs
PARTITION (dt='2001-01-01', country='GB');

• Separate folders / directories are created per partition.
• Partition values may or may not be part of the table data.
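Hive lays partitions out as nested directories named column=value under the table's directory. The sketch below builds such paths and filters them the way a query restricted to one partition value could. The warehouse path and the helper names `partition_path` and `prune` are assumptions for the example, not Hive functions.

```python
TABLE_DIR = "/user/hive/warehouse/logs"  # assumed table location


def partition_path(table_dir, partition_spec):
    """Build the directory for one partition: <table_dir>/<col>=<value>/..."""
    parts = [f"{col}={value}" for col, value in partition_spec.items()]
    return "/".join([table_dir.rstrip("/")] + parts)


def prune(paths, **wanted):
    """Keep only partition directories matching every requested column value."""
    segments = {f"{col}={value}" for col, value in wanted.items()}
    return [p for p in paths if segments <= set(p.split("/"))]


all_paths = [
    partition_path(TABLE_DIR, {"dt": dt, "country": country})
    for dt in ("2001-01-01", "2001-01-02")
    for country in ("GB", "US")
]
gb_only = prune(all_paths, country="GB")
```

Since the partition values live in the directory names, a query that filters on country only needs to open the matching directories.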
Hive Buckets
• Buckets are specified using column names and the number of buckets.

CREATE TABLE bucketed_users (id INT, name STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;

• CLUSTERED BY (id): the column on which to hash.
• INTO 4 BUCKETS: the number of buckets into which column entries should be hashed.
• Hive can use id modulo 4 to hash.
• To create a bucketed table having 3 buckets.

CREATE TABLE IF NOT EXISTS student_bucket (rollno INT, name STRING, grade FLOAT)
CLUSTERED BY (grade) INTO 3 BUCKETS;

• To load data into the bucketed table.

FROM student INSERT OVERWRITE TABLE student_bucket
SELECT rollno, name, gpa;

• To display the content of the first bucket.

SELECT DISTINCT grade FROM student_bucket TABLESAMPLE(BUCKET 1 OUT OF 3 ON grade);

Hive supports aggregation functions such as avg and count.

To use the average and count aggregation functions.

SELECT avg(gpa) FROM student;

SELECT count(*) FROM student;

To use the GROUP BY and HAVING clauses.

SELECT rollno, name, gpa
FROM student
GROUP BY rollno, name, gpa
HAVING gpa > 4.0;
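These aggregate queries can be tried without a Hadoop cluster by using SQLite as a stand-in, since avg, count, GROUP BY, and HAVING behave the same way for this simple case. The rows below are invented, and the HAVING threshold is lowered to 3.5 so that the sample data returns something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (rollno INTEGER, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Alok", 3.5), (2, "Geeta", 4.0), (3, "Susan", 4.5), (4, "John", 3.0)],
)

# Aggregates collapse all rows into a single value.
avg_gpa = conn.execute("SELECT avg(gpa) FROM student").fetchone()[0]
row_count = conn.execute("SELECT count(*) FROM student").fetchone()[0]

# HAVING filters the groups after GROUP BY has formed them.
above = conn.execute(
    """SELECT rollno, name, gpa
       FROM student
       GROUP BY rollno, name, gpa
       HAVING gpa > 3.5
       ORDER BY rollno"""
).fetchall()
```

In Hive the same statements are compiled into distributed jobs rather than executed in a single process.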
SerDe
• SerDe stands for Serializer/Deserializer.
• A serialization function converts complex in-memory data (for example a Java class object or a Hive table row) into a string of bytes to be stored on disk.
• The string is usually in a compressed binary format to save space.
• A deserialization function converts the string back into the complex in-memory data structure.
• Since a string is a “uniform sequential”, or serial, data form as opposed to a complex data structure, this conversion is known as serialization.
• Hive can use SerDe functions to read and write its data from and to HDFS efficiently.
• A custom SerDe can be used.
• Note: RCFile storage uses SerDe to compress column data.
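The round trip described above can be sketched with a tab-delimited text format, similar in spirit to the delimited rows used in the table definitions elsewhere in this unit. The function names and the schema representation are hypothetical; Hive's actual SerDe interface is a Java API and is not shown here.

```python
def serialize(row, delimiter="\t"):
    """Turn an in-memory row (tuple of fields) into one line of bytes."""
    return (delimiter.join(str(field) for field in row) + "\n").encode("utf-8")


def deserialize(line, schema, delimiter="\t"):
    """Rebuild the typed row from the bytes using a schema of (name, type) pairs."""
    fields = line.decode("utf-8").rstrip("\n").split(delimiter)
    return {name: cast(text) for (name, cast), text in zip(schema, fields)}


schema = [("rollno", int), ("name", str), ("gpa", float)]
raw = serialize((1, "Alok", 3.5))  # bytes as they would be written to disk
row = deserialize(raw, schema)     # typed values recovered on read
```

The schema is needed on the read side because the serialized bytes carry no type information of their own.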
User Defined Functions
User Defined Functions allow customization of Hive queries.

1. Create a Java class for the User Defined Function. The class must extend the UDF abstract class.
2. The class must have one or more evaluate() methods. Put in your desired logic.
3. Compile the Java file.
4. Package your Java class into a JAR file.
5. Go to the Hive CLI and add your JAR.
6. CREATE TEMPORARY FUNCTION in Hive, which points to your Java class.
7. Use it in Hive SQL!

import org.apache.hadoop.hive.ql.exec.UDF;

public final class MyUpperCase extends UDF {
    // Hive calls evaluate() once per input row.
    public String evaluate(final String word) {
        return word.toUpperCase();
    }
}

hive> ADD JAR UpperCase.jar;
hive> CREATE TEMPORARY FUNCTION toUpperCase AS 'MyUpperCase';
hive> SELECT toUpperCase(name) FROM STUDENT;

Note: The syntax of the Hive commands above is not meant to be complete and is for illustration purposes only.
End of Unit 5

