Dsbda Nov2023

The document discusses data analytics and big data. It contains 8 questions related to data analytics concepts like data analytics lifecycle, roles in analytics projects, types of analytics, and clustering algorithms. It also discusses logistic regression, handling missing data, text analysis techniques like POS tagging and lemmatization. Evaluation metrics like accuracy, precision, recall for classification problems are defined. Text preprocessing steps are also explained.

Total No. of Questions : 8] SEAT No.
P-7545 [Total No. of Pages : 3
[6180]-53
T.E. (Computer Engineering)
DATA SCIENCE AND BIG DATA ANALYTICS
(2019 Pattern) (Semester - II) (310251)
Time : 2½ Hours] [Max. Marks : 70

Instructions to the candidates :

1) Answer Q1 or Q2, Q3 or Q4, Q5 or Q6, Q7 or Q8.
2) Neat diagrams must be drawn wherever necessary.
3) Figures to the right side indicate full marks.
4) Assume suitable data if necessary.
5) Use of Scientific calculator is permitted.

Q1) a) Explain Data Analytics Cycle with suitable diagram and its phases. [8]

b) List and explain the various activities involved in identifying potential data resources as a part of the discovery phase in the Data Analytics Life Cycle. [9]

OR

Q2) a) List and explain the key roles for a successful analytics project. [8]

b) Write short note on : [9]

i) Common Tools for the Model Building

ii) Model selection for Data Analytics

Q3) a) List and explain the various types of analytics in Big data. [9]

b) Calculate the support and confidence values for all the possible item sets. [9]

Transaction ID    Items bought
1                 Onion, Potato, Cold Drink
2                 Onion, Burger, Cold Drink
3                 Eggs, Onion, Cold Drink
4                 Potato, Milk, Eggs
5                 Potato, Burger, Cold Drink, Milk, Eggs
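
A minimal Python sketch of how the support and confidence values for the table above could be computed. The helper names `count`, `support` and `confidence` are illustrative assumptions and not part of the question paper. Support is taken as the fraction of transactions containing an item set, and the confidence of a rule X -> Y as count(X and Y together) / count(X).

```python
from itertools import combinations

# Transactions copied from the table in Q3 b).
transactions = [
    {"Onion", "Potato", "Cold Drink"},
    {"Onion", "Burger", "Cold Drink"},
    {"Eggs", "Onion", "Cold Drink"},
    {"Potato", "Milk", "Eggs"},
    {"Potato", "Burger", "Cold Drink", "Milk", "Eggs"},
]


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)


def support(itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    Both arguments are collections of item names, e.g. ["Onion"].
    """
    both = set(antecedent) | set(consequent)
    return count(both) / count(antecedent)


items = sorted(set().union(*transactions))

# Support of every item set that occurs in at least one transaction.
supports = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        s = support(combo)
        if s > 0:
            supports[combo] = s

if __name__ == "__main__":
    for combo, s in supports.items():
        print(f"support({', '.join(combo)}) = {s:.2f}")
    # Confidence in both directions for each pair of single items that co-occur.
    for a, b in combinations(items, 2):
        if count([a, b]) > 0:
            print(f"conf({a} -> {b}) = {confidence([a], [b]):.2f}")
            print(f"conf({b} -> {a}) = {confidence([b], [a]):.2f}")
```

For example, `confidence(["Onion"], ["Cold Drink"])` evaluates to 1.0, because all three transactions that contain Onion also contain Cold Drink.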


OR

Q4) a) Explain the need for logistic regression along with its various types. [9]

b) Explain the following terms with suitable example. [9]

i) Removing Duplicates from dataset.

ii) Handling Missing Data

Q5) a) Suppose that, for the given data, the task is to cluster points (with (x, y) representing location) into three clusters, where the points are A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9). The distance function is Euclidean distance. Suppose initially we assign A1, B1 and C1 as the center of each cluster, respectively. Use the k-means algorithm to show only the first round of execution with the cluster centers. [8]
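
A minimal Python sketch of one k-means round for the points above: each point is assigned to its nearest initial center by Euclidean distance, and each center is then recomputed as the mean of its members. The function name `kmeans_round` is an illustrative assumption, and `math.dist` assumes Python 3.8 or later.

```python
from math import dist

# Points and initial centers as stated in Q5 a).
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4),
    "B1": (5, 8), "B2": (7, 5), "B3": (6, 4),
    "C1": (1, 2), "C2": (4, 9),
}
initial_centers = [points["A1"], points["B1"], points["C1"]]


def kmeans_round(points, centers):
    """Run one assignment step and one update step of k-means."""
    clusters = [[] for _ in centers]
    # Assignment step: each point goes to the nearest center.
    for name, p in points.items():
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(name)
    # Update step: each center becomes the mean of its assigned points.
    new_centers = []
    for members, old in zip(clusters, centers):
        if not members:
            new_centers.append(old)  # keep an empty cluster's center unchanged
            continue
        xs = [points[m][0] for m in members]
        ys = [points[m][1] for m in members]
        new_centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return clusters, new_centers


clusters, new_centers = kmeans_round(points, initial_centers)

if __name__ == "__main__":
    for members, center in zip(clusters, new_centers):
        print(members, "-> new center", center)
```

With the given data, the first round yields the clusters {A1}, {A3, B1, B2, B3, C2} and {A2, C1}, with updated centers (2, 10), (6, 6) and (1.5, 3.5).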

b) Explain the following Text Analysis steps with suitable example [9]

i) Part-of-speech (POS) tagging

ii) Lemmatization

OR

Q6) a) Given the confusion matrix, Calculate Accuracy, Precision, Recall, Error rate with description on Diabetic Risk. [8]

                                        Predicted classes
Classes                                 Diabetic Risk - Yes   Diabetic Risk - No
Actual classes   Diabetic Risk - Yes    90                    210
                 Diabetic Risk - No     140                   9560
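
A minimal Python sketch of the four measures for the matrix above. It assumes rows are actual classes, columns are predicted classes, and "Diabetic Risk - Yes" is the positive class. The function name `classification_metrics` is an illustrative assumption.

```python
# Counts read from the confusion matrix in Q6 a)
# (assumption: rows = actual, columns = predicted, positive class = "Yes").
tp = 90     # actual Yes, predicted Yes
fn = 210    # actual Yes, predicted No
fp = 140    # actual No,  predicted Yes
tn = 9560   # actual No,  predicted No


def classification_metrics(tp, fn, fp, tn):
    """Return accuracy, precision, recall and error rate as fractions."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,    # correct predictions / all cases
        "precision": tp / (tp + fp),      # true Yes among predicted Yes
        "recall": tp / (tp + fn),         # true Yes found among actual Yes
        "error_rate": (fp + fn) / total,  # wrong predictions / all cases
    }


metrics = classification_metrics(tp, fn, fp, tn)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

With these counts the sketch gives accuracy 0.965, precision of about 0.391, recall 0.30 and error rate 0.035. The high accuracy alongside low recall reflects how few "Yes" cases there are (300 out of 10000).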

b) Explain the Text Preprocessing steps with suitable example. [9]

Q7) a) List a few data visualization tools and discuss any four applications of data visualization along with the use of the various plots with Python/R or a suitable tool. [9]

b) List the challenges of Data Visualization. Explain the types of visualization with example. [9]

OR

Q8) a) Explain in detail the Hadoop Ecosystem with suitable diagram along with the various components. [9]

b) Write a short note on the following. [9]

i) MapReduce

ii) Pig
