Big Data Analytics, NLP, Game Theory and Deep Learning

The document is an examination paper for a course on Big Data Analytics, with a total of 80 marks and a duration of 3 hours. It contains various questions related to big data concepts, including distinctions between nodes, NoSQL drivers, characteristics of big data, and issues in data stream processing. Students are required to answer one compulsory question and any three out of five additional questions.

Uploaded by

dubeyneha0027

Paper / Subject Code: 42172 / BIG DATA ANALYTICS

Time: 3 Hours Marks: 80

Note: 1. Question 1 is compulsory
2. Answer any three out of the remaining five questions.
3. Assume any suitable data wherever required and justify the same.

Q1 a) Distinguish between Name node and Data node. [5]
b) List and explain the core business drivers behind the NoSQL movement. [5]
c) Mention four characteristics of big data. Elaborate these characteristics with respect to social media websites. [5]
d) List and explain the different issues and challenges in data stream query processing. [5]

Q2 a) What is a key-value store? What are the benefits of using a key-value store? [10]
b) Write a map reduce pseudo code to multiply two matrices. Apply map reduce working to perform following matrix multiplication. [10]

1 2       6 7
     X
3 4       8 9
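
For orientation, the matrix product asked for in Q2 b) can be simulated in plain Python as a single-pass map and reduce. This is only a sketch under stated assumptions, not a model answer: the names `map_phase`, `reduce_phase`, `M` and `N` are illustrative, and no Hadoop API is involved. Each cell is sent to every output position it contributes to, and the reducer pairs entries by the shared index j.

```python
from collections import defaultdict

def map_phase(name, matrix, other_dim):
    """Emit ((i, k), (name, j, value)) for every output cell a value feeds.

    M[i][j] contributes to (i, k) for every column k of N;
    N[j][k] contributes to (i, k) for every row i of M.
    """
    pairs = []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            for t in range(other_dim):
                if name == "M":
                    pairs.append(((r, t), ("M", c, value)))
                else:
                    pairs.append(((t, c), ("N", r, value)))
    return pairs

def reduce_phase(pairs):
    """Group by output cell, then sum M[i][j] * N[j][k] over the shared j."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    result = {}
    for cell, values in grouped.items():
        m_vals = {j: v for tag, j, v in values if tag == "M"}
        n_vals = {j: v for tag, j, v in values if tag == "N"}
        result[cell] = sum(m_vals[j] * n_vals[j] for j in m_vals if j in n_vals)
    return result

# Matrices from the question (hypothetical variable names).
M = [[1, 2], [3, 4]]
N = [[6, 7], [8, 9]]
pairs = map_phase("M", M, len(N[0])) + map_phase("N", N, len(M))
product = reduce_phase(pairs)
print(product)  # {(0, 0): 22, (0, 1): 25, (1, 0): 50, (1, 1): 57}
```

The keys are (row, column) positions of the result, so the product is the matrix with rows 22 25 and 50 57.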

Q3 a) Suppose the stream is S = {2, 1, 6, 1, 5, 9, 2, 3, 5}. Let hash functions h(x) = ax + b mod 16 for some a and b, treat result as a 4-bit binary integer. Show how the Flajolet-Martin algorithm will estimate the number of distinct elements, h(x) = 4x + 1 mod 16. [10]
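
As a rough check on the arithmetic in this question, the Flajolet-Martin estimate can be sketched in a few lines of Python. The helper names `tail_length` and `fm_estimate` are illustrative, and one convention is assumed here that textbooks treat differently: a hash value of 0 is given tail length 0.

```python
def tail_length(value):
    """Number of trailing zero bits; 0 for the value 0 (assumed convention)."""
    if value == 0:
        return 0
    count = 0
    while value % 2 == 0:
        value //= 2
        count += 1
    return count

def fm_estimate(stream, a, b, modulus):
    """Return 2**R, where R is the longest tail seen for h(x) = (a*x + b) mod modulus."""
    max_tail = 0
    for x in stream:
        h = (a * x + b) % modulus
        max_tail = max(max_tail, tail_length(h))
    return 2 ** max_tail

stream = [2, 1, 6, 1, 5, 9, 2, 3, 5]
estimate = fm_estimate(stream, a=4, b=1, modulus=16)
print(estimate)  # 1
```

With h(x) = 4x + 1 mod 16 every hash value is odd, so every tail length is 0 and the estimate is 2^0 = 1, although the stream holds 6 distinct elements. A single hash function of this form gives a poor estimate, which is why several hash functions are normally combined.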

b) Consider the following data frame given below: [10]

course  id  class  marks
1       11  1      56
2       12  2      75
3       13  1      48
4       14  2      69
5       15  1      84
6       16  2      53

i. Create a subset of course less than 3 by using [ ] brackets and demonstrate the output.
ii. Create a subset where the course column is less than 3 or the class equals to 2 by using subset () function and demonstrate the output.

Q4 a) Explain natural join and grouping and aggregation relational algebraic operation using MapReduce. [10]
b) With a neat sketch, explain the architecture of the data-stream management system. [10]

30013 Page 1 of 2

Q5 and Q6 (parts a and b, [10] each):

Determine communities for the given social network graph using Girvan-Newman algorithm. [10]

[Figure: social network graph]

Define collaborative filtering. Using an example of an e-commerce site like flipkart or amazon describe how it can be used to provide recommendation to users. [10]

i. The following table shows the number of units of different products sold on different days: [10]

[Table: units sold of Bread, Milk, Cola Cans, Chocolate bars and Detergent on Monday, Tuesday, Wednesday, Thursday and Friday]

Create five sample numeric vectors from this data.
ii. Name and explain the operators used to form data subsets in R.

List and discuss various types of data structures in R. [10]

_____________________

Paper / Subject Code: 42172 / BIG DATA ANALYTICS

Time: 03 Hours Marks: 80

Note: 1. Question 1 is compulsory
2. Answer any three out of the remaining five questions.
3. Assume any suitable data wherever required and justify the same.

Q1 a) Explain how big data problems are handled by Hadoop system. [5]
b) Mention four characteristics of big data and explain in detail. [5]
c) List and explain the core business drivers behind the NoSQL movement. [5]
d) Explain the concept of bloom filter with an example. [5]

Q2 a) What is graph store? Give an example where a graph store can be used to effectively solve a particular business problem. [10]
b) Write a map reduce pseudo code for word count problem. Illustrate with an example showing all the steps. [10]

Q3 a) Suppose the stream is S = {4, 2, 5, 9, 1, 6, 3, 7}. Let hash functions h(x) = 3x + 7 mod 32 for some a and b, treat result as a 5-bit binary integer. Show how the Flajolet-Martin algorithm will estimate the number of distinct elements in this stream. [10]
b) Describe applications of data visualization. [10]

Q4 a) Explain selection and projection relational algebraic operation using MapReduce. [10]
b) Explain DGIM algorithm for counting ones in a stream with example. [10]

Q5 a) Determine communities for the given social network graph using Girvan-Newman algorithm. [10]

[Figure: social network graph with nodes A, B, C, D, E, F]

57520 Page 1 of 2

Q5 b) and Q6 a), b) ([10] each):

Consider the following data frame given below: [10]

course  id  class  marks
1       11  1      56
2       12  2      75
3       13  1      48
4       14  2      69
5       15  1      84
6       16  2      53

i. Create a subset of course less than 5 by using [ ] brackets and demonstrate the output.
ii. Create a subset where the course column is less than 4 or the class equals to 1 by using subset () function and demonstrate the output.

Describe collaborative filtering in recommendation system. [10]

i. Write a script to create a dataset named data1 in R containing the following text. [10]
Text: 2, 3, 4, 5, 6.7, 7, 8.1, 9
ii. Explain the various functions provided by R to combine different sets of data.

____________________________________

Paper / Subject Code: 42172 / BIG DATA ANALYTICS

Time: 03 Hours Marks: 80

Note: 1. Question 1 is compulsory
2. Answer any three out of the remaining five questions.
3. Assume any suitable data wherever required and justify the same.

Q1 a) What is function of Map Tasks in the Map Reduce framework? Explain with the help of an example. [5]
b) Demonstrate how business problems have been successfully solved faster, cheaper and more effectively considering NoSQL Google’s MapReduce case study. Also illustrate the business drivers and the findings in it. [5]
c) Why is HDFS more suited for applications having large datasets and not when there are small files? Elaborate. [5]
d) Explain the concept of bloom filter with an example [5]

Q2 a) Name the three ways that resources can be shared between computer systems. Name the architecture used in big data solutions and describe it in detail. [10]
b) Write a map reduce pseudo code for word count problem. Apply map reduce working on the following document: [10]

“This is an apple. Apple is red in color”.
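
The map, shuffle and reduce steps for this document can be traced with a small Python simulation. It is a sketch rather than a model answer: the function names and the two input splits are illustrative, and it assumes words are compared case-insensitively with punctuation stripped, which the question does not specify.

```python
import re
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in re.findall(r"[a-z]+", line.lower())]

def shuffle_phase(pairs):
    """Group the emitted 1s by word, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)
    return grouped

def reduce_phase(grouped):
    """Sum the grouped 1s to get the final count per word."""
    return {word: sum(ones) for word, ones in grouped.items()}

# The document from the question, split into two hypothetical input splits.
splits = ["This is an apple.", "Apple is red in color"]
pairs = [pair for line in splits for pair in map_phase(line)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts)
```

Under these assumptions "is" and "apple" each reach the reducer with two 1s, and every other word with a single 1.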

Q3 a) Suppose the stream is 1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1. Let h(x) = 6x + 1 mod 5. Show how the Flajolet-Martin algorithm will estimate the number of distinct elements in this stream. [10]
b) Consider the following data frame given below: [10]

subject  class  marks
1        1      56
2        2      75
3        1      48
4        2      69
5        1      84
6        2      53

i. Create a subset of subject less than 4 by using subset () function and demonstrate the output.
ii. Create a subset where the subject column is less than 3 and the class equals to 2 by using [ ] brackets and demonstrate the output.

Q4 a) What are the Core Hadoop components? Explain in detail. [10]
b) With a neat sketch, explain the architecture of the data-stream management system. [10]

Q5 a) Determine communities for the given social network graph using Girvan-Newman algorithm. [10]

15786 Page 1 of 2

[Figure: social network graph with nodes A, B, C, D, E, F, G]

b) The data analyst of Argon technology Mr. John needs to enter the salaries of 10 employees in R. The salaries of the employees are given in the following table: [10]

Sr. No.  Name of employees  Salaries
1        Vivek              21000
2        Karan              55000
3        James              67000
4        Soham              50000
5        Renu               54000
6        Farah              40000
7        Hetal              30000
8        Mary               70000
9        Ganesh             20000
10       Krish              15000

i. Which R command will Mr. John use to enter these values demonstrate the output.
ii. Now Mr. John wants to add the salaries of 5 new employees in the existing table, which command he will use to join datasets with new values in R. Demonstrate the output.

Q6 a) i. Write the script to sort the values contained in the following vector in ascending order and descending order: (23, 45, 10, 34, 89, 20, 67, 99). Demonstrate the output. [10]
ii. Name and explain the operators used to form data subsets in R.
b) How recommendation is done based on properties of product? Elaborate with a suitable example. [10]

-----------------

Paper / Subject Code: 42372 / Big Data Analytics
14/11/2024 CSE-AIML SEM-VII C SCHEME BIG DATA ANALYTICS QP CODE: 10064705

1. Question 1 is compulsory
2. Answer any three out of the remaining five questions.
3. Assume any suitable data wherever required and justify the same.

Q.1 (a) Explain 5 V's of big data. [5]
(b) Differentiate between SQL vs NoSQL. [5]
(c) Write the limitations of Hadoop. [5]
(d) Explain how failures are handled in Map Reduce job. [5]

Q.2 (a) Illustrate relational algebra operations with example. [10]
(b) Explain big data enabling technologies. [10]

Q.3 (a) Explain PCY algorithm and its types with neat labeled diagram. [10]
(b) Compare different types of NoSQL architectural pattern. [10]

Q.4 (a) Explain Hadoop Architectural Model with both components in detail. [10]
(b) Write the functions of the components and execution steps in Map Reduce. [10]

Q.5 (a) Write issues in data stream queries. Explain the issues in data streaming. [10]
(b) Explain Page rank using Map reduce, also explain spider traps and dead ends. [10]

Q.6 (a) Explain CURE algorithm with its advantages over traditional clustering algorithm. [10]
(b) Explain Movie recommendation using Collaborative based filtering. [10]

Paper / Subject Code: 42372 / Big Data Analytics

Time: 03 Hours Marks: 80

Note: 1. Question 1 is compulsory
2. Answer any three out of the remaining five questions.
3. Assume any suitable data wherever required and justify the same.

Q1 Write short notes on: [20]
a) Big Data and its characteristics
b) Distance measures for Big Data
c) The Map and Reduce Tasks
d) Bloom filter for stream data mining

Q2 a) Explain HDFS architecture. [10]
b) Explain Column family store and Graph Store NoSQL architectural pattern with example. [10]

Q3 a) Write a Map reduce pseudo code to multiply two matrices. Illustrate with an example showing all the steps. [10]
b) Explain Issues in Data stream query processing [10]

Q4 a) List the main components of Map reduce execution pipeline. [10]
b) Explain DGIM algorithm. [10]

Q5 a) Explain Collaborative filtering system. How is it different from content based system. [10]
b) What is clique percolation method Write an algorithm on (CPM). Also show how the CPM finds clique for the following graph. Explain with steps. [10]

[Figure: graph]

Q6 a) Explain PageRank algorithm. [10]
b) Explain CURE algorithm. [10]

*****************

55103 Page 1 of 1

Paper / Subject Code: 42372 / Big Data Analytics

28-Dec-2023 10:30 am - 01:30 pm 1T01877 - B.E. Computer Science & Engineering (Artificial Intelligence & Machine Learning) (R-2019-20 C Scheme) (Sem VII) / 42372 - Big Data Analytics QP CODE: 10043750

Time: 03 Hours Marks: 80

Note: 1. Question 1 is compulsory
2. Answer any three out of the remaining five questions.
3. Assume any suitable data wherever required and justify the same.

Q1 a) What is Hadoop and Why it Matters. [5]
b) Compare traditional database and big data. [5]
c) Explain CAP theorem. State how it is different from ACID properties. [5]
d) Compare DBMS VS DSMS. [5]

Q2 a) Draw Hadoop Ecosystem and briefly explain its components. [10]
b) Explain the four types of NoSQL database. [10]

Q3 a) Explain architecture of Big data and give characteristics of it. [10]
b) Explain DGIM algorithm. [10]

Q4 a) List the main components of Mapreduce execution pipeline. [10]
b) Explain cure algorithm. [10]

Q5 a) What is Recommender System? Explain Types of recommender system. [10]
b) What is a Social Network? Give Varieties of Social Networks and the need for social network graph. [10]

Q6 a) Explain with example two major classes of distance measures. [10]
b) Explain the structure of web with suitable diagram. [10]

______________

43750 Page 1 of 1
