08 SQLOperators BigDataNB

The document discusses relational algebra and SQL operators, including selection, projection, union, intersection, and join, as well as their implementation using the MapReduce paradigm. It highlights that MapReduce is efficient for full scans but not for selective queries, and it outlines how preprocessing activities often involve relational operators. Additionally, the document covers the implementation of various operations such as filtering, union, intersection, and difference using mappers and reducers.

Uploaded by

montanavincenzo04

• Relational algebra and the SQL language have many useful operators
  • Selection
  • Projection
  • Union, intersection, and difference
  • Join (see the Join design patterns)
  • Aggregations and Group by (see the Summarization design patterns)

• The MapReduce paradigm can be used to implement relational operators
  • However, a MapReduce implementation is efficient only when a full scan of the input table(s) is needed
    ▪ i.e., when queries are not selective and process all the data
  • Selective queries, which return only a few tuples/records of the input tables, are usually inefficient when implemented with MapReduce

• Most preprocessing activities involve relational operators
  • E.g., ETL processes in the data warehousing application context

• Relations/tables (including large ones) can be stored in the HDFS distributed file system
  • They are split into blocks that are spread across the servers of the Hadoop cluster

• Note
  • In relational algebra, relations/tables do not contain duplicate records by definition
  • This constraint must be satisfied by both the input and the output relations/tables

• Selection: σ_C(R)
  • Applies predicate (condition) C to each record of table R
  • Produces a relation containing only the records that satisfy predicate C
• The selection operator can be implemented by using the filtering pattern

Courses
CCode  CName             Semester  ProfID
M2170  Computer science  1         D102
M4880  Digital systems   2         D104
F1401  Electronics       1         D104
F0410  Databases         2         D102

• Find the courses held in the second semester
  • σ_{Semester=2}(Courses)

Result
CCode  CName            Semester  ProfID
M4880  Digital systems  2         D104
F0410  Databases        2         D102

• Map-only job
• Each mapper
  • Analyzes one record at a time of its split
    ▪ If the record satisfies C, it emits a (key, value) pair with key=record and value=null
    ▪ Otherwise, it discards the record
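
The map-only filtering step can be sketched in plain Python. This is only an in-memory illustration, not Hadoop code: the function name `selection_mapper`, the tuple layout of a record, and the way the table is cut into two splits are assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Assumed record layout: (CCode, CName, Semester, ProfID)
Record = Tuple[str, str, int, str]


def selection_mapper(
    split: Iterable[Record], predicate: Callable[[Record], bool]
) -> Iterator[Tuple[Record, Optional[str]]]:
    """Emit (record, None) for every record of the split that satisfies the predicate."""
    for record in split:
        if predicate(record):
            yield record, None
        # Records that fail the predicate are discarded (nothing is emitted).


courses = [
    ("M2170", "Computer science", 1, "D102"),
    ("M4880", "Digital systems", 2, "D104"),
    ("F1401", "Electronics", 1, "D104"),
    ("F0410", "Databases", 2, "D102"),
]

# Two hypothetical input splits, each processed by its own mapper call.
splits = [courses[:2], courses[2:]]

# No reducer is involved: the job output is simply the concatenated mapper output.
result = [
    key
    for split in splits
    for key, _ in selection_mapper(split, lambda r: r[2] == 2)
]
print(result)
```

Because no reducer runs, each mapper's output is already part of the final result, which is why selection needs no shuffle phase.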

• Projection: π_S(R)
  • For each record of table R, keeps only the attributes in S
  • Produces a relation with a schema equal to S (i.e., a relation containing only the attributes in S)
  • Removes duplicates, if any

Professors
ProfID  PSurname  Department
D102    Smith     Computer engineering
D105    Jones     Computer engineering
D104    Smith     Electronics

• Find the surnames of all professors
  • π_{PSurname}(Professors)

Result
PSurname
Smith
Jones

• Duplicated values are removed

• Each mapper
  • Analyzes one record at a time of its split
    ▪ For each record r in R
    ▪ It selects the values of the attributes in S and constructs a new record r'
    ▪ It emits a (key, value) pair with key=r' and value=null
• Each reducer
  • Emits one (key, value) pair with key=r' and value=null for each input (key, [list of values]) pair
    ▪ Identical records r' share the same key, so duplicates are removed
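
A minimal in-memory sketch of the projection job follows, assuming records are Python tuples and attributes are addressed by position. The names `projection_mapper`, `shuffle`, and `projection_reducer` are hypothetical; `shuffle` is only a stand-in for the grouping that the framework performs between the map and reduce phases.

```python
from collections import defaultdict


def projection_mapper(split, attribute_indexes):
    """Emit (r', None), where r' keeps only the attributes at the given positions."""
    for record in split:
        projected = tuple(record[i] for i in attribute_indexes)
        yield projected, None


def shuffle(pairs):
    """Stand-in for the shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def projection_reducer(key, values):
    """Emit each distinct r' exactly once, whatever the length of its value list."""
    yield key, None


professors = [
    ("D102", "Smith", "Computer engineering"),
    ("D105", "Jones", "Computer engineering"),
    ("D104", "Smith", "Electronics"),
]

# Two hypothetical splits; index 1 is the PSurname attribute.
splits = [professors[:2], professors[2:]]
mapped = [pair for split in splits for pair in projection_mapper(split, [1])]

result = sorted(
    key
    for r_prime, values in shuffle(mapped).items()
    for key, _ in projection_reducer(r_prime, values)
)
print(result)
```

The two "Smith" records reach the same reducer call as a single key with two values, so only one copy is emitted.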

• Union: R ∪ S
  • R and S have the same schema
  • Produces a relation with the same schema as R and S
  • There is a record t in the output of the union operator for each record t appearing in R or S
  • Duplicated records are removed

DegreeCourseProf
ProfID  PSurname  Department
D102    Smith     Computer engineering
D105    Jones     Computer engineering
D104    White     Electronics

MasterCourseProf
ProfID  PSurname  Department
D102    Smith     Computer engineering
D101    Red       Electronics

• Find information about the professors of degree courses or master's courses
  • DegreeCourseProf ∪ MasterCourseProf

Result
ProfID  PSurname  Department
D102    Smith     Computer engineering
D105    Jones     Computer engineering
D104    White     Electronics
D101    Red       Electronics

• Mappers
  • For each input record t in R, emit one (key, value) pair with key=t and value=null
  • For each input record t in S, emit one (key, value) pair with key=t and value=null
• Reducers
  • Emit one (key, value) pair with key=t and value=null for each input (key, [list of values]) pair
    ▪ i.e., a single copy of each input record is emitted
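
The union job can be sketched in the same in-memory style. The function names are hypothetical, `shuffle` only imitates the framework's grouping by key, and each relation is treated as a single split for brevity.

```python
from collections import defaultdict


def union_mapper(split):
    """Emit (t, None) for every record t, regardless of the relation it comes from."""
    for record in split:
        yield record, None


def shuffle(pairs):
    """Stand-in for the shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def union_reducer(key, values):
    """Emit a single copy of t, even if it was emitted from both R and S."""
    yield key, None


degree_course_prof = [
    ("D102", "Smith", "Computer engineering"),
    ("D105", "Jones", "Computer engineering"),
    ("D104", "White", "Electronics"),
]
master_course_prof = [
    ("D102", "Smith", "Computer engineering"),
    ("D101", "Red", "Electronics"),
]

mapped = [
    pair
    for relation in (degree_course_prof, master_course_prof)
    for pair in union_mapper(relation)
]
result = sorted(
    key
    for t, values in shuffle(mapped).items()
    for key, _ in union_reducer(t, values)
)
print(result)
```

The record for D102 is emitted by both relations but appears once in the output, because both copies are grouped under the same key.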

• Intersection: R ∩ S
  • R and S have the same schema
  • Produces a relation with the same schema as R and S
  • There is a record t in the output of the intersection operator if and only if t appears in both relations (R and S)

DegreeCourseProf
ProfID  PSurname  Department
D102    Smith     Computer engineering
D105    Jones     Computer engineering
D104    White     Electronics

MasterCourseProf
ProfID  PSurname  Department
D102    Smith     Computer engineering
D101    Red       Electronics

• Find information about the professors teaching both degree courses and master's courses
  • DegreeCourseProf ∩ MasterCourseProf

Result
ProfID  PSurname  Department
D102    Smith     Computer engineering

• Mappers
  • For each input record t in R, emit one (key, value) pair with key=t and value="R"
  • For each input record t in S, emit one (key, value) pair with key=t and value="S"
• Reducers
  • Emit one (key, value) pair with key=t and value=null for each input (key, [list of values]) pair whose [list of values] contains two values
    ▪ Since R and S contain no duplicates, this happens if and only if both R and S contain t
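
As a sketch of the intersection job, the code below tags each record with the name of its relation and keeps the keys that receive two values. The names `tagging_mapper`, `shuffle`, and `intersection_reducer` are hypothetical, and the two-value test relies on the assumption stated earlier that R and S contain no duplicate records.

```python
from collections import defaultdict


def tagging_mapper(split, relation_name):
    """Emit (t, relation_name) for every record t of the given relation."""
    for record in split:
        yield record, relation_name


def shuffle(pairs):
    """Stand-in for the shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def intersection_reducer(key, values):
    """Emit t only when its value list holds two values (one "R" and one "S").

    This check is valid only because R and S are duplicate-free.
    """
    if len(values) == 2:
        yield key, None


degree_course_prof = [
    ("D102", "Smith", "Computer engineering"),
    ("D105", "Jones", "Computer engineering"),
    ("D104", "White", "Electronics"),
]
master_course_prof = [
    ("D102", "Smith", "Computer engineering"),
    ("D101", "Red", "Electronics"),
]

mapped = list(tagging_mapper(degree_course_prof, "R"))
mapped += list(tagging_mapper(master_course_prof, "S"))

result = sorted(
    key
    for t, values in shuffle(mapped).items()
    for key, _ in intersection_reducer(t, values)
)
print(result)
```

If the inputs could contain duplicates, counting values would no longer be safe; checking that both tags are present in the list would be a more defensive variant.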

• Difference: R - S
  • R and S have the same schema
  • Produces a relation with the same schema as R and S
  • There is a record t in the output of the difference operator if and only if t appears in R but not in S

DegreeCourseProf
ProfID  PSurname  Department
D102    Smith     Computer engineering
D105    Jones     Computer engineering
D104    White     Electronics

MasterCourseProf
ProfID  PSurname  Department
D102    Smith     Computer engineering
D101    Red       Electronics

• Find the professors teaching degree courses but not master's courses
  • DegreeCourseProf - MasterCourseProf

Result
ProfID  PSurname  Department
D105    Jones     Computer engineering
D104    White     Electronics

• Mappers
  • For each input record t in R, emit one (key, value) pair with key=t and value=name of the relation (i.e., R)
  • For each input record t in S, emit one (key, value) pair with key=t and value=name of the relation (i.e., S)
  • Two mapper classes are needed
    ▪ One for each relation
• Reducers
  • Emit one (key, value) pair with key=t and value=null for each input (key, [list of values]) pair whose [list of values] contains only the value R
    ▪ It happens if and only if t appears in R but not in S
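
The difference job can be sketched as follows. In this illustration a single hypothetical function, `tagging_mapper`, parameterized by the relation name stands in for the two mapper classes; `shuffle` and `difference_reducer` are likewise assumed names, and everything runs in memory rather than on Hadoop.

```python
from collections import defaultdict


def tagging_mapper(split, relation_name):
    """Emit (t, relation_name); one call per relation plays the role of one mapper class."""
    for record in split:
        yield record, relation_name


def shuffle(pairs):
    """Stand-in for the shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def difference_reducer(key, values):
    """Emit t only when every value in its list is "R", i.e. t never came from S."""
    if set(values) == {"R"}:
        yield key, None


degree_course_prof = [
    ("D102", "Smith", "Computer engineering"),
    ("D105", "Jones", "Computer engineering"),
    ("D104", "White", "Electronics"),
]
master_course_prof = [
    ("D102", "Smith", "Computer engineering"),
    ("D101", "Red", "Electronics"),
]

mapped = list(tagging_mapper(degree_course_prof, "R"))
mapped += list(tagging_mapper(master_course_prof, "S"))

result = sorted(
    key
    for t, values in shuffle(mapped).items()
    for key, _ in difference_reducer(t, values)
)
print(result)
```

Unlike union and intersection, the operation is not symmetric: swapping the "R" and "S" tags computes S - R instead.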

• The join operators can be implemented by using the Join pattern
  • Either the reduce-side or the map-side pattern is used, depending on the size of the input relations/tables

• Aggregations and Group by are implemented by using the Summarization pattern
