Big Data Analytics, NLP, Game Theory and Deep Learning
Paper / Subject Code: 42172 / BIG DATA ANALYTICS
Time: 3 Hours Marks: 80
Note: 1. Question 1 is compulsory
2. Answer any three out of the remaining five questions.
3. Assume any suitable data wherever required and justify the same.
Q1 a) Distinguish between Name node and Data node. [5]
b) List and explain the core business drivers behind the NoSQL movement. [5]
c) Mention four characteristics of big data. Elaborate these characteristics with respect to social media websites. [5]
d) List and explain the different issues and challenges in data stream query processing. [5]
Q2 a) What is a key-value store? What are the benefits of using a key-value store? [10]
b) Write a map reduce pseudo code to multiply two matrices. Apply map reduce to the following matrices: [10]
   | 1 2 |     | 6 7 |
   | 3 4 |  X  | 8 9 |
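A minimal Python sketch of the one-step MapReduce approach to part b). It assumes the two matrices in the question read as [[1, 2], [3, 4]] and [[6, 7], [8, 9]]; the function names `map_phase`, `shuffle` and `reduce_phase` are illustrative and simulate the framework in memory rather than calling any Hadoop API.

```python
from collections import defaultdict

def map_phase(A, B):
    """Emit ((i, k), (matrix tag, j, value)) for every product cell that needs the element."""
    n_rows, n_cols = len(A), len(B[0])
    for i, row in enumerate(A):
        for j, a_ij in enumerate(row):
            for k in range(n_cols):
                yield (i, k), ("A", j, a_ij)
    for j, row in enumerate(B):
        for k, b_jk in enumerate(row):
            for i in range(n_rows):
                yield (i, k), ("B", j, b_jk)

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """For each cell (i, k), pair A and B values on the shared index j and sum the products."""
    result = {}
    for key, values in groups.items():
        a_vals = {j: v for tag, j, v in values if tag == "A"}
        b_vals = {j: v for tag, j, v in values if tag == "B"}
        result[key] = sum(a_vals[j] * b_vals[j] for j in a_vals if j in b_vals)
    return result

A = [[1, 2], [3, 4]]  # assumed reading of the first matrix
B = [[6, 7], [8, 9]]  # assumed reading of the second matrix
product = reduce_phase(shuffle(map_phase(A, B)))
rows = [[product[(i, k)] for k in range(2)] for i in range(2)]
print(rows)  # [[22, 25], [50, 57]]
```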
Q3 a) Suppose the stream is S = {2, 1, 6, 1, 5, 9, 2, 3, 5}. Let the hash functions be of the form h(x) = (ax + b) mod 16 for some a and b, and treat the result as a 4-bit binary integer. Show how the Flajolet-Martin algorithm will estimate the number of distinct elements for h(x) = (4x + 1) mod 16. [10]
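A minimal Python sketch of the Flajolet-Martin estimate for this stream, assuming the common textbook convention that the estimate is 2^R, where R is the largest number of trailing zero bits seen in any hash value. The helper names and the treatment of a hash value of 0 (counted as `width` trailing zeros) are assumptions for illustration.

```python
def trailing_zeros(value: int, width: int) -> int:
    """Count trailing zero bits of a width-bit value; a value of 0 counts as width (assumed convention)."""
    if value == 0:
        return width
    count = 0
    while value & 1 == 0:
        value >>= 1
        count += 1
    return count

def flajolet_martin(stream, a, b, modulus, width):
    """Return 2**R, where R is the maximum tail length over h(x) = (a*x + b) mod modulus."""
    max_r = 0
    for x in stream:
        h = (a * x + b) % modulus
        max_r = max(max_r, trailing_zeros(h, width))
    return 2 ** max_r

stream = [2, 1, 6, 1, 5, 9, 2, 3, 5]
estimate = flajolet_martin(stream, a=4, b=1, modulus=16, width=4)
print(estimate)  # prints 1
```

Because 4x + 1 is always odd, every hash value ends in a 1 bit, so the tail length is 0 for each element and the estimate is 2^0 = 1, although the stream holds 6 distinct values.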
1  11  1  56
2  12  2  75
3  13  1  48
4  14  2  69
5  15  1  84
6  16  2  53
i.
the output.
ii. Create a subset where the course column is less than 3 or the class equals
Q4 a) Explain natural join and grouping and aggregation relational algebraic operation using MapReduce. [10]
b) With a neat sketch, explain the architecture of the data-stream management system. [10]
30013 Page 1 of 2
Q6 a)
Q5 a)
b)
b)
bars
Milk
users.
Bread
Product
Detergent
Chocolate
Cola Cans
different days:
5
6
10
21
12
8
7
1
3
27
4
5
12
33
18
Monday Tuesday Wednesday
11
20
13
20
Thursday
ii. Name and explain the operators used to form data subsets in R.
23
12
12
15
Friday
[10]
[10]
[10]
Determine communities for the given social network graph using Girvan-Newman algorithm. [10]
4
4
23
Paper / Subject Code: 42172 / BIG DATA ANALYTICS
Time: 03 Hours Marks: 80
Note: 1. Question 1 is compulsory
2. Answer any three out of the remaining five questions.
3. Assume any suitable data wherever required and justify the same.
Q1 a) Explain how big data problems are handled by Hadoop system. [5]
b) Mention four characteristics of big data and explain in detail. [5]
c) List and explain the core business drivers behind the NoSQL movement. [5]
d) Explain the concept of bloom filter with an example. [5]
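A minimal Python sketch of a Bloom filter, offered as one possible illustration for part d). The class name, the bit-array size of 64, the three hash functions derived from seeded SHA-256 digests, and the sample words are all assumptions chosen for the example.

```python
import hashlib

class BloomFilter:
    """Bit array plus several hash functions; answers 'possibly present' or 'definitely absent'."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item: str):
        # Derive num_hashes bit positions by prefixing the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        # False means the item was never added; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))

bloom = BloomFilter(num_bits=64, num_hashes=3)
for word in ["hadoop", "hdfs", "nosql"]:
    bloom.add(word)

print(bloom.might_contain("hadoop"))  # True: an added item is always reported
print(bloom.might_contain("spark"))   # usually False, but a false positive is possible
```

The filter never produces a false negative, which is why a `False` answer can be trusted while a `True` answer only means "possibly present".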
Q2 a) What is a graph store? Give an example where a graph store can be used to effectively solve a particular business problem. [10]
b) Write a map reduce pseudo code for word count problem. Illustrate with an example [10]
Q3 a) Suppose the stream is S = {4, 2, 5, 9, 1, 6, 3, 7}. Let the hash function be h(x) = (3x + 7) mod 32, and treat the result as a 5-bit binary integer. Show how the Flajolet-Martin algorithm will estimate the number of distinct elements in this stream. [10]
b) Describe applications of data visualization. [10]
Q4 a) Explain selection and projection relational algebraic operation using MapReduce. [10]
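A minimal Python sketch of selection and projection expressed as map and reduce functions. The in-memory `run_mapreduce` helper, the `employees` relation and its values are hypothetical and only stand in for a real MapReduce job.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Tiny in-memory stand-in for the map, shuffle and reduce stages."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def selection(records, condition):
    """Map emits (t, t) only for tuples satisfying the condition; reduce is the identity."""
    def mapper(t):
        if condition(t):
            yield t, t
    def reducer(key, values):
        return values
    return run_mapreduce(records, mapper, reducer)

def projection(records, columns):
    """Map emits the projected tuple as the key; reduce outputs each key once to drop duplicates."""
    def mapper(t):
        projected = tuple(t[c] for c in columns)
        yield projected, projected
    def reducer(key, values):
        return [key]
    return run_mapreduce(records, mapper, reducer)

# Hypothetical relation: (name, department, salary)
employees = [("Asha", "Sales", 30000), ("Ravi", "HR", 45000), ("Meena", "Sales", 52000)]

selected = selection(employees, lambda t: t[2] > 40000)
projected = projection(employees, columns=[1])
print(selected)   # [('Ravi', 'HR', 45000), ('Meena', 'Sales', 52000)]
print(projected)  # [('Sales',), ('HR',)]
```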
b) Explain DGIM algorithm for counting ones in a stream with example. [10]
Q5 a) Determine communities for the given social network graph using Girvan-Newman algorithm. [10]
[Figure: social network graph with nodes A, B, C, D, E and F]
57520 Page 1 of 2
Q6 a)
b)
b)
Consider the following data frame given below:
course  id  class  marks
1       11  1
2       12  2
3       13  1
4       14  2
5       15  1
6       16  2
i. Create a subset of course less than 5 by using [ ] brackets and demonstrate the output.
ii. Create a subset where the course column is less than 4 or the class equals to 1 by using subset () function and demonstrate the output.
[10]
Describe collaborative filtering in recommendation system.
[10]
i. Write a script to create a dataset named data1 in R containing the following text. [10]
ii. Explain the various functions provided by R to combine different sets of data.
____________________________________
Paper / Subject Code: 42172 / BIG DATA ANALYTICS
Time: 03 Hours Marks: 80
Note: 1. Question 1 is compulsory
2. Answer any three out of the remaining five questions.
3. Assume any suitable data wherever required and justify the same.
Q1 a) What is the function of Map Tasks in the MapReduce framework? Explain with the help of an example. [5]
b) Demonstrate how business problems have been successfully solved faster, cheaper and more effectively, considering the NoSQL Google's MapReduce case study. Also illustrate the business drivers and the findings in it. [5]
c) Why is HDFS more suited for applications having large datasets and not when there [5]
d) Explain the concept of bloom filter with an example [5]
Q2 a) Name the three ways that resources can be shared between computer systems. Name [10]
b) Write a map reduce pseudo code for word count problem. Apply map reduce to the following text: [10]
“This is an apple. Apple is red in color”.
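A minimal Python sketch of the word count job on the sentence in the question. It assumes words are compared case-insensitively and that punctuation is dropped, which the question does not specify; the function names simulate the map, shuffle and reduce stages in memory.

```python
import re
from collections import defaultdict

def map_phase(line):
    """Emit (word, 1) for every word; lower-casing and dropping punctuation are assumptions."""
    for word in re.findall(r"[a-z]+", line.lower()):
        yield word, 1

def shuffle(pairs):
    """Group the emitted counts by word, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the list of ones collected for each word."""
    return {word: sum(ones) for word, ones in groups.items()}

text = "This is an apple. Apple is red in color"
counts = reduce_phase(shuffle(map_phase(text)))
print(counts)
# {'this': 1, 'is': 2, 'an': 1, 'apple': 2, 'red': 1, 'in': 1, 'color': 1}
```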
Q3 a) Suppose the stream is 1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1. Let h(x) = (6x + 1) mod 5. Show how the Flajolet-Martin algorithm will estimate the number of distinct elements. [10]
1  1  56
2  2  75
3  1  48
4  2  69
5  1  84
6  2  53
i. Create a subset of subject less than 4 by using subset () function and demonstrate the output.
ii. Create a subset where the subject column is less than 3 and the class equals to 2
b) With a neat sketch, explain the architecture of the data-stream management system. [10]
Q5 a) Determine communities for the given social network graph using Girvan-Newman algorithm. [10]
15786 Page 1 of 2
[Figure: social network graph with nodes A, B, C, D, E, F and G]
b) The data analyst of Argon technology, Mr. John, needs to enter the salaries of 10 employees in R. The salaries of the employees are given in the following table: [10]
Sr. No.  Name of employees  Salaries
1        Vivek              21000
2        Karan              55000
3        James              67000
4        Soham              50000
5        Renu               54000
6        Farah              40000
7        Hetal              30000
8        Mary               70000
9        Ganesh             20000
10       Krish              15000
i. Which R command will Mr. John use to enter these values? Demonstrate the output.
ii. Now Mr. John wants to add the salaries of 5 new employees to the existing table. Which command will he use to join datasets with new values in R? Demonstrate the output.
Q6 a) i. Write the script to sort the values contained in the following vector in ascending order and descending order: (23, 45, 10, 34, 89, 20, 67, 99). Demonstrate the output. [10]
ii. Name and explain the operators used to form data subsets in R.
suitable example.
-----------------
Paper / Subject Code: 42372 / Big Data Analytics
14/11/2024 CSE-AIML SEM-VII C SCHEME BIG DATA ANALYTICS QP CODE: 10064705
Note: 1. Question 1 is compulsory
2. Answer any three out of the remaining five questions.
3. Assume any suitable data wherever required and justify the same.
Q.1 (a) Explain 5 V's of big data. [5]
(b) Differentiate between SQL vs NoSQL. [5]
(c) Write the limitations of Hadoop. [5]
(d) Explain how failures are handled in Map Reduce job. [5]
Q.2 (a) Illustrate relational algebra operations with example. [10]
(b) Explain big data enabling technologies. [10]
Q.3 (a) Explain PCY algorithm and its types with neat labeled diagram. [10]
(b) Compare different types of NoSQL architectural pattern. [10]
Q.4 (a) Explain Hadoop Architectural Model with both components in detail. [10]
(b) Write the functions of the components and execution steps in Map Reduce. [10]
Q.5 (a) Write issues in data stream queries. Explain the issues in data streaming. [10]
(b) Explain Page rank using Map reduce, also explain spider traps and dead ends. [10]
Q.6 (a) Explain CURE algorithm with its advantages over traditional clustering algorithm. [10]
(b) Explain Movie recommendation using Collaborative-based filtering. [10]
_________________________
Paper / Subject Code: 42372 / Big Data Analytics
Time: 03 Hours Marks: 80
Note: 1. Question 1 is compulsory
2. Answer any three out of the remaining five questions.
3. Assume any suitable data wherever required and justify the same.
Q1 Write short notes on: [20]
a) Big Data and its characteristics
b) Distance measures for Big Data
c) The Map and Reduce Tasks
d) Bloom filter for stream data mining
Q2 a) Explain HDFS architecture. [10]
b) Explain Column family store and Graph Store NoSQL architectural pattern with example. [10]
Q3 a) Write a Map reduce pseudo code to multiply two matrices. Illustrate with an example showing all the steps. [10]
based system .
Also show how the CPM finds clique for the following graph. Explain with steps.
*****************
55103 Page 1 of 1
Paper / Subject Code: 42372 / Big Data Analytics
28-Dec-2023 10:30 am - 01:30 pm 1T01877 - B.E. Computer Science & Engineering (Artificial Intelligence & Machine Learning) (R-2019-20 C Scheme) (Sem VII) / 42372 - Big Data Analytics QP CODE: 10043750
Time: 03 Hours Marks: 80
Note: 1. Question 1 is compulsory
2. Answer any three out of the remaining five questions.
3. Assume any suitable data wherever required and justify the same.
Q1 a) What is Hadoop and why does it matter? [5]
b) Compare traditional database and big data. [5]
c) Explain CAP theorem. State how it is different from ACID properties. [5]
d) Compare DBMS VS DSMS. [5]
Q2 a) Draw Hadoop Ecosystem and briefly explain its components. [10]
b) Explain the four types of NoSQL database. [10]
b) Explain DGIM algorithm. [10]
b) What is a Social Network? Give Varieties of Social Networks and the [10]
______________
43750 Page 1 of 1