CEG Assessment II

The document provides information about an internal assessment test for a data science and analytics course. It includes details like course outcomes, exam date and duration, instructions, questions in different parts covering various concepts, and a marking scheme.

Uploaded by

M S Shanmukhaa

Roll No.

DEPARTMENT OF INFORMATION SCIENCE AND TECHNOLOGY, ANNA UNIVERSITY, CHENNAI

INTERNAL ASSESSMENT TEST II

VI Semester – B.TECH. in INFORMATION TECHNOLOGY


(R2019)
IT5602 – DATA SCIENCE AND ANALYTICS

Academic Session: August 2023 – December 2023

Program: B.Tech. IT
Year / SEM: 2/3

Max. Marks: 50
Duration: 90 mins
Date of Exam: 06.05.2024
Faculty names: Dr. S. Sendhilkumar

CO 1 To learn the fundamentals of data science and big data.


CO 2 To gain in-depth knowledge on descriptive data analytical techniques.
CO 3 To gain knowledge to implement simple to complex analytical algorithms in big data
frameworks.
CO 4 To develop programming skills using required libraries and packages to perform data
analysis in Python.
CO 5 To understand and perform data visualization, web scraping, machine learning and
natural language processing using various Data Science tools.
BL – Bloom’s Taxonomy Levels
(L1 - Remembering, L2 - Understanding, L3 - Applying, L4 - Analyzing, L5 - Evaluating, L6 - Creating)

PART- A (7 x 2 = 14 Marks)

Q. No   Questions   Marks   CO   BL
1   Give the significance of Pearson’s coefficient in bivariate analysis. Also interpret its possible values.   2   3   L2
2   Differentiate between bias and variance and state the tradeoff between these parameters in Machine Learning.   2   3   L2
3   State how Hadoop is fault-tolerant.   2   4   L1
4   What is reinforcement learning? How is it different from unsupervised learning?   2   2   L2
5   Use the data given in Question No. 8(b) to create 3 NumPy arrays with 9 data elements each. Write a simple Python program to find the mean of every NumPy array in the given list.   2   5   L4
6   What are the functions of the job tracker and the task tracker in the Hadoop architecture?   2   4   L1
7   What types of OLAP servers can be implemented in a warehouse framework? Describe each type in a sentence or two.   2   3   L1
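
Note on Question 5: the task can be approached along the following lines. This is a minimal sketch, assuming the 27 age values from Question 8(b) are split, in order, into three consecutive groups of nine. The question does not say how to partition the values, and the names `ages`, `arrays` and `means` are illustrative.

```python
import numpy as np

# Age values from Question 8(b), in increasing order (27 values).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

# Assumption: split the sorted values into three consecutive groups of nine.
arrays = [np.array(ages[i:i + 9]) for i in range(0, len(ages), 9)]

# Mean of every NumPy array in the list.
means = [float(arr.mean()) for arr in arrays]

for arr, m in zip(arrays, means):
    print(arr, "mean =", round(m, 2))
```

Any other partition into three arrays of nine elements would work the same way; only the list comprehension that builds `arrays` would change.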

PART- B (2 x 12 = 24 Marks)
Q. No Questions Marks CO BL
8(a) (i) Consider the following variables X and Y:   6+6   3   L3
X = [1, 2, 3, 4]
Y = [1, 4, 9, 15]
Apply polynomial regression to find the coefficients using the matrix approach and hence the polynomial regression equation.
(ii) Let's say you want to know if gender has anything to do with political party preference. You poll 440 voters in a simple random sample to find out which political party they prefer. The results of the survey are shown in the table below:
          Republican   Democrat   Independent   Total
Male      109          59         22            200
Female    120          65         25            220
Total     240          130        50            440
To see if gender is linked to political party preference, perform a Chi-Square test of independence. (For a 5% level of significance and dof = 2, the tabulated Chi-square value = 5.991.)
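
Note on Question 8(a): both computations can be sketched in plain Python. This is an illustrative sketch under two assumptions that the question does not state: the polynomial is taken to be of degree 2, and the chi-square function recomputes the row and column totals from the six observed cell counts instead of reading them from the Total row and column. The function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def polyfit_normal_equations(xs, ys, degree):
    """Least-squares coefficients [a0, a1, ...] from (X^T X) a = X^T y."""
    design = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve_linear(xtx, xty)


def chi_square_statistic(observed):
    """Pearson chi-square statistic and degrees of freedom for a table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (obs - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof


# Part (i): degree 2 is an assumption; the question does not fix the degree.
coeffs = polyfit_normal_equations([1, 2, 3, 4], [1, 4, 9, 15], degree=2)
print("a0, a1, a2 =", [round(c, 4) for c in coeffs])

# Part (ii): observed cell counts only (Male row, then Female row).
stat, dof = chi_square_statistic([[109, 59, 22], [120, 65, 25]])
print("chi-square =", round(stat, 4), "dof =", dof,
      "reject independence:", stat > 5.991)
```

The statistic is compared with the tabulated value 5.991 given in the question; independence is rejected only if the computed value exceeds it.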
(OR)
8(b) Suppose that the data for analysis includes the attribute age.   12   3   L3
The age values for the data tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(i) Use smoothing by bin means to smooth the data, using a bin depth of 3.
(ii) Use min-max normalization to transform the value 35 for age onto the range [0.0, 1.0].
(iii) Use z-score normalization to transform the value 35 for age, where the standard deviation of age is 12.94 years.
(iv) Use normalization by decimal scaling to transform the value 35 for age.
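
Note on Question 8(b): the four transformations can be sketched as small Python functions. This is a minimal sketch, not a prescribed solution. It assumes equal-depth bins taken in sorted order and uses the standard deviation of 12.94 stated in the question; the function names are illustrative.

```python
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def smooth_by_bin_means(values, depth):
    """Equal-depth binning: replace each value with the mean of its bin."""
    smoothed = []
    for start in range(0, len(values), depth):
        bin_values = values[start:start + depth]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


def min_max(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Map value linearly from [old_min, old_max] onto [new_min, new_max]."""
    scale = (value - old_min) / (old_max - old_min)
    return scale * (new_max - new_min) + new_min


def z_score(value, mean, std):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std


def decimal_scaling(value, values):
    """Divide by 10^j, with j the smallest integer giving max(|v|) / 10^j < 1."""
    j = 0
    max_abs = max(abs(v) for v in values)
    while max_abs / 10 ** j >= 1:
        j += 1
    return value / 10 ** j


smoothed = smooth_by_bin_means(ages, depth=3)
mean_age = sum(ages) / len(ages)

print("bin means:", [round(v, 2) for v in smoothed[::3]])
print("min-max:", round(min_max(35, min(ages), max(ages)), 3))
print("z-score:", round(z_score(35, mean_age, 12.94), 3))
print("decimal scaling:", decimal_scaling(35, ages))
```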

9(a) (i) Consider an enterprise that deals with very large amounts of data, such as terabytes or petabytes structured in warehouse schemas, and the data/queries come in at high velocity.   12   4   L4
Also, the enterprise requires high availability of data for answering various ad-hoc queries. Suggest a suitable architecture that enforces effective querying on the warehouse schemas by transforming the queries into suitable map-reduce tasks, and explain how it works with suitable diagrams.
(OR)
9(b) (i) Consider a word counting problem on a large set of web pages (size = 10GB) that is stored in a Hadoop distributed framework.   12   4   L4
Explain how this task, if submitted as a Map-Reduce program, will be executed in HDFS, with a neat diagram and the necessary steps.
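
Note on Question 9(b): the map, shuffle and reduce stages of word counting can be illustrated with a single-process simulation. This sketch does not use the Hadoop API and ignores distribution, block placement and fault tolerance. The input `splits` are hypothetical stand-ins for HDFS blocks, and the function names are illustrative.

```python
from collections import defaultdict


def map_phase(split):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle and sort: group all emitted values by key."""
    grouped = defaultdict(list)
    for word, count in mapped_pairs:
        grouped[word].append(count)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Reducer: sum the list of counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


# Hypothetical input splits; on a real cluster each would be one HDFS block
# processed by its own map task.
splits = ["big data big ideas", "data science and big data"]

mapped = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```

In an actual job, each map task would run on a node holding its block, and the grouped pairs would be partitioned across several reducers rather than collected in one dictionary.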

PART- C (1 x 12 = 12 Marks)

Q. No Questions Marks CO BL
10 (i) Calculate the eigenvalues and eigenvectors for the data given in the table below.   6   2   L5

Feature   Example 1   Example 2   Example 3   Example 4
X1        13          7           4           8
X2        5           14          11          4

Given the table below do the following:   6   3   L5

(ii) Estimate the conditional probabilities for P(A|+), P(B|+), P(C|+), P(A|−), P(B|−), and P(C|−).
(iii) Use the estimate of conditional probabilities given in the previous question to predict the class label for a test sample (A = 0, B = 1, C = 0) using the naïve Bayes approach.
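
Note on Question 10(i): one common reading is to eigen-decompose the sample covariance matrix of X1 and X2, as in principal component analysis. The sketch below follows that reading as an assumption, since the question does not name the matrix. It uses the n − 1 denominator and the closed form for a symmetric 2×2 matrix; all names are illustrative.

```python
import math

x1 = [13, 7, 4, 8]
x2 = [5, 14, 11, 4]


def sample_covariance(a, b):
    """Sample covariance with the n - 1 denominator."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)


# Assumption: work on the covariance matrix S = [[s11, s12], [s12, s22]].
s11 = sample_covariance(x1, x1)
s12 = sample_covariance(x1, x2)
s22 = sample_covariance(x2, x2)

# Eigenvalues are the roots of lambda^2 - trace * lambda + det = 0.
trace = s11 + s22
det = s11 * s22 - s12 ** 2
root = math.sqrt(trace ** 2 - 4 * det)
eigenvalues = [(trace + root) / 2, (trace - root) / 2]


def unit_eigenvector(lam):
    """Solve (S - lam * I) v = 0 for a symmetric 2x2 S with s12 != 0."""
    v = (s12, lam - s11)
    norm = math.hypot(*v)
    return (v[0] / norm, v[1] / norm)


eigenvectors = [unit_eigenvector(lam) for lam in eigenvalues]

for lam, vec in zip(eigenvalues, eigenvectors):
    print("eigenvalue", round(lam, 4),
          "eigenvector", tuple(round(c, 4) for c in vec))
```

An eigenvector is defined only up to sign and scale, so a hand calculation may give the same direction with opposite signs.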
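Mark Distribution:
(Columns CO 1 to CO 5 give marks per CO; columns L1 to L6 give marks per BL.)
Question No   CO 1   CO 2   CO 3   CO 4   CO 5   Total Marks   L1   L2   L3   L4   L5   L6
1             -      -      2      -      -      2             -    2    -    -    -    -
2             -      -      2      -      -      2             -    2    -    -    -    -
3             -      -      -      2      -      2             2    -    -    -    -    -
4             -      2      -      -      -      2             -    2    -    -    -    -
5             -      -      -      -      2      2             -    -    -    2    -    -
6             -      -      -      2      -      2             2    -    -    -    -    -
7             -      -      2      -      -      2             2    -    -    -    -    -
8             -      -      12     -      -      12            -    -    12   -    -    -
9             -      -      -      12     -      12            -    -    -    12   -    -
10            -      6      6      -      -      12            -    -    -    -    12   -
Total         -      8      24     16     2      50            L1+L2=12   L3+L4=26   L5+L6=12
Mark Distribution in (%)   -   16%   48%   32%   4%   100   24%   52%   24%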

Date: 03/05/2024 Course Instructor(s) Signature

Professor In-charge Signature
