Indian Institute of Technology Kharagpur
Department of Artificial Intelligence
Introduction to Data Science (AI11002)
Class Test 2, Date: April 16, 2025
Spring Semester 2024-25    Marks: 25 (5 questions)
Timing: 5:15 PM to 6:15 PM
(All calculations need to be present with the answer)
1. This question pertains to - Programming for Data Science:
(a) Write a Python program by importing standard libraries to perform
outlier rejection on the Iris dataset. The input data contains: (a) A
two-dimensional NumPy array where each element is an array that contains:
sepal length, sepal width, petal length, and petal width; (b) A
one-dimensional NumPy array where each element represents the species of
iris: 0 - Iris setosa, 1 - Iris versicolor, and 2 - Iris virginica. (5)
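Illustrative sketch (not a model answer): the input layout described in this question, together with one possible rejection rule, could look roughly like the code below. The function name `reject_outliers_iqr`, the per-species 1.5 x IQR fence, and the small demo arrays are all assumptions made for illustration; the question does not prescribe a particular method, and the demo rows are made up rather than taken from the real Iris data.

```python
import numpy as np


def reject_outliers_iqr(X, y, k=1.5):
    """Drop rows whose features fall outside per-species IQR fences.

    X: (n_samples, 4) array of sepal length, sepal width,
       petal length, petal width.
    y: (n_samples,) array of species labels (0, 1, 2).
    k: fence multiplier; 1.5 is a common convention, assumed here.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        rows = y == label
        # Quartiles are computed per feature, within one species only.
        q1, q3 = np.percentile(X[rows], [25, 75], axis=0)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        # A row is kept only if every feature lies inside its fence.
        keep[rows] = np.all((X[rows] >= lower) & (X[rows] <= upper), axis=1)
    return X[keep], y[keep]


# Made-up rows in the layout the question describes (not real Iris data).
X_demo = np.array([
    [5.0, 3.4, 1.5, 0.2],
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.3, 1.4, 0.3],
    [5.0, 3.5, 1.5, 0.2],
    [5.1, 3.4, 1.5, 0.3],
    [9.9, 3.4, 1.5, 0.2],  # deliberately implausible sepal length
    [6.2, 2.9, 4.3, 1.3],
    [6.4, 2.8, 4.5, 1.4],
    [6.6, 3.1, 4.7, 1.5],
    [6.8, 3.0, 4.9, 1.6],
])
y_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

X_clean, y_clean = reject_outliers_iqr(X_demo, y_demo)
print(X_clean.shape, y_clean.shape)
```

Fences are computed separately for each species because the three species differ systematically in petal size, so a single global fence could flag ordinary members of one species as outliers. Other rules (for example a z-score threshold) would fit the same input layout.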
2. This question pertains to - Collecting Data:
(a) How would you design a predictive model to estimate the cost/volume of
data storage five years from now? From where would you get such training
data to optimize your model? (5)
(b) Suppose you built the above predictive model to predict data storage.
How would you evaluate whether it is a good predictive model? (5)
3. This question pertains to - Cleaning Data:
(a) What is the effect of outliers on predictive models? What types of
outliers might you expect to occur in the following data sets: (a) Student
grades, (b) Salary data, (c) Lifespans in Wikipedia? Explain a technique to
identify and remove those outliers in individual cases. (5)
(b) What drawbacks do missing values in a dataset have on designing
predictive models? During analysis, how do you address the problem of
missing values? (5)
Best wishes!