CSE1703 - Fundamental of Data Science


Name:

Enrolment No:

End-Term Examination – January 2022

COURSE: CSE1703 – Fundamental of Data Science

Programme: B.Tech: CSE, CSC Semester: Ist

Duration: 1.5 hrs. Max. Marks: 40

Instructions:
 All Questions are compulsory.
 All questions are Objective Type.
 Excel sheet to solve questions will be provided to you just before the exam.
 Exam is of 40 marks and you have 1.5 hours to complete it.

Section A (Total 10 Marks)

1 A. Which among the following is not related to data cleaning? CO1


a) Standardizing the process
b) Removing duplicate data
c) Getting rid of unwanted observations
d) Getting rid of independent variables that are not related to the dependent
variable

B. Which of the following would be most appropriate to replace the question mark in the
following figure? CO1 [1 x 10 = 10]

a) Data Analysis
b) Data Science
c) Descriptive Analytics
d) None of the mentioned
C. Which of the following is not true about Pivot Tables? Select all that apply CO1
a) Pivot tables can be filtered by multiple columns
b) Pivot tables automatically calculate grand totals of rows and columns
c) Editing a pivot table will impact the original data source
d) Dates in a pivot table can be grouped by years, quarters, months, days, hours,
minutes and seconds.

D. Point out the correct statement. CO1
a) Raw data is the original source of data
b) Pre-processed data is the original source of data
c) Raw data is the data obtained after processing steps
d) None of the mentioned

E. What type of association (correlation) does this graph have?

CO2

a. positive linear association


b. negative linear association
c. no association
d. nonlinear association

F. Say we have 200 input features and 1 target variable. You have to select the 20 most
important features based on the relationship between the input features and the target.
Do you think this is an example of dimensionality reduction? CO1
a. Yes
b. No

G. Which of the following techniques would perform better for reducing dimensions of a
feature set? CO1
a. Removing columns with dissimilar data trends
b. Removing columns which have high variance in data.
c. Removing columns which have too many missing values
d. None of these
H. PCA can be used for projecting and visualizing data in lower dimensions.
a. True CO3
b. False

I. The eigenvector corresponding to the largest eigenvalue gives the CO2
a. direction of the lowest variance of the data
b. direction of the highest variance of the data
c. direction of the equal variance of the data
d. do not provide any information about the direction of the variance of the data
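
Note on questions H and I: a minimal sketch of how eigenvalues and eigenvectors of a covariance matrix relate to variance. The data is synthetic and the 30-degree rotation is an arbitrary choice made for illustration; nothing here comes from the exam files.

```python
import numpy as np

# Synthetic 2-D data (an assumption for illustration): wide spread along one axis,
# narrow along the other, then rotated by 30 degrees.
rng = np.random.default_rng(0)
points = rng.normal(size=(500, 2)) * np.array([5.0, 1.0])
theta = np.deg2rad(30)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
data = points @ rotation.T

# Eigen-decomposition of the covariance matrix.
# np.linalg.eigh returns eigenvalues in ascending order, so the last one is the largest.
cov = np.cov(data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
top_direction = eigenvectors[:, -1]

# The variance of the data projected onto each eigenvector equals its eigenvalue.
projected_variance = np.var(data @ eigenvectors, axis=0, ddof=1)

print("eigenvalues:", eigenvalues)
print("variance along eigenvectors:", projected_variance)
print("eigenvector of the largest eigenvalue:", top_direction)
```

Projecting onto the last column alone (`data @ top_direction`) gives a one-dimensional view of the data, which is the idea behind using PCA for lower-dimensional visualization.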

J. How can you handle missing or corrupted data in a dataset? CO1
a. Drop missing rows or columns
b. Assign a unique category to missing values
c. Replace missing values with mean/median/mode
d. All of the above
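
Note on question J: the three strategies listed in the options could look roughly like this in pandas. The toy frame and its column names are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical, not from the exam files) with gaps in both columns.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "city": ["Pune", None, "Delhi", "Pune"],
})

# Strategy 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Strategy 2: give missing categorical values their own category.
with_category = df.assign(city=df["city"].fillna("Unknown"))

# Strategy 3: replace missing numeric values with the column mean.
with_mean = df.assign(age=df["age"].fillna(df["age"].mean()))

print(dropped.shape)
print(with_category["city"].tolist())
print(with_mean["age"].tolist())
```

Which strategy is suitable depends on how much data is missing and why; the sketch only shows the mechanics.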

SECTION B (Total 30 Marks)


2 Problem Statement: Use the following code and extend it to answer the following questions. CO2

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
df = pd.read_csv(url, header=None)
headers = ["symboling", "normalized-losses", "make", "fuel-type", "aspiration",
           "num-of-doors", "body-style", "drive-wheels", "engine-location",
           "wheel-base", "length", "width", "height", "curb-weight", "engine-type",
           "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke",
           "compression-ratio", "horsepower", "peak-rpm", "city-mpg", "highway-mpg",
           "price"]
df.columns = headers

a. What is the dimension of the dataframe df?


I. (205, 26)
II. (26,205) [2]
III. (205,25)
IV. (25,206)

b. The dataframe contains ‘?’ entries. Remove all rows that contain a ‘?’. The dimension
of the resultant dataframe is:
I. (170,26)
II. (159, 26) [3]
III. (201,26)
IV. (205,26)

c. What is the mean city-mpg? [3]


I. 32.08
II. 26.52
III. 98.26
IV. 62.52
d. What is the maximum highway-mpg? [2]
I. 49
II. 54
III. 50
IV. 60
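
Note on question 2: one possible way to extend the given code, sketched on a small hypothetical frame so that it runs without the download. The values are made up; only the column names are borrowed from the header list.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) where "?" marks a missing entry.
df = pd.DataFrame({
    "make": ["audi", "bmw", "?", "volvo"],
    "horsepower": ["102", "?", "110", "114"],
    "city-mpg": [24, 23, 21, 19],
})

# Shape is (rows, columns).
print(df.shape)

# Turn the "?" placeholder into a real missing value, then drop incomplete rows.
cleaned = df.replace("?", np.nan).dropna()

# Columns that held "?" were read as strings; convert them once they are clean.
cleaned = cleaned.astype({"horsepower": int})
print(cleaned.shape)

# Column statistics; the same calls work on the cleaned frame.
mean_mpg = df["city-mpg"].mean()
max_mpg = df["city-mpg"].max()
print(mean_mpg, max_mpg)
```

As an alternative, `pd.read_csv(url, header=None, na_values="?")` marks the placeholders as missing while the file is being read.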

3 Use the Excel sheet Summer-Olympic-medals-1976-to-2008student.xlsx, provided to you, for
the following problem.
Problem Statement: Use the following code and extend it to answer the following
questions.

import pandas as pd
import seaborn as sns
import numpy as np
import csv
import matplotlib.pyplot as plt
from sklearn.impute import SimpleImputer
# Reading an excel file using Python
import xlrd

# Read the Excel data
df = pd.read_excel("…Insert path…/Summer-Olympic-medals-1976-to-2008student.xlsx")
df.head()

a. Find the total number of medals won by city Athens.


i. 1859 CO2 [2]
ii. 1998
iii. 1705
iv. 2042
b. Find the total number of medals won by city Sydney
i. 1305
ii. 2015 [3]
iii. 1546
iv. 1387

c. Find the total number of medals won by city Atlanta in the sport ‘Wrestling’
i. 48
ii. 70 [3]
iii. 60
iv. 54
d. How many medals in total were won by women in the city Sydney?
i. 600
ii. 777
iii. 899 [2]
iv. 889
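
Note on question 3: counting medals by city, sport and gender could be sketched as below. The column names City, Sport, Gender and Medal, and the idea that each row is one medal, are assumptions about the sheet layout; check them against df.columns before reusing this. The rows are made up.

```python
import pandas as pd

# Toy rows (hypothetical); column names are assumed, not taken from the sheet.
medals = pd.DataFrame({
    "City":   ["Sydney", "Sydney", "Atlanta", "Atlanta", "Athens"],
    "Sport":  ["Wrestling", "Aquatics", "Wrestling", "Wrestling", "Aquatics"],
    "Gender": ["Men", "Women", "Men", "Men", "Women"],
    "Medal":  ["Gold", "Silver", "Bronze", "Gold", "Gold"],
})

# Assuming one row per medal, counting rows per city counts medals per city.
per_city = medals["City"].value_counts()

# Combine conditions with & to count medals for one city and one sport.
atlanta_wrestling = ((medals["City"] == "Atlanta") & (medals["Sport"] == "Wrestling")).sum()

# Same idea for one city and one gender.
sydney_women = ((medals["City"] == "Sydney") & (medals["Gender"] == "Women")).sum()

print(per_city.to_dict())
print(atlanta_wrestling, sydney_women)
```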

4 Use the Excel sheet outlier_example2.xlsx, provided to you for the following problem. CO2

Problem Statement: Use the following code and extend it to answer the following
questions.
import csv
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
import seaborn as sns

# Reading an excel file using Python
import xlrd

# Give the location of the file
df = pd.read_excel("…insert path…/outlier_example2.xlsx")
df.head()
df.shape

a. Find the number of outliers in the given data. [2]
i. 1
ii. 2
iii. 3
iv. 4
b. What is the interquartile range of the data? [3]
i. 9750
ii. 16000
iii. 9000
iv. 18750
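
Note on question 4: the question does not say which outlier rule to use. A minimal sketch, assuming the common 1.5 x IQR rule and using made-up values rather than the exam sheet:

```python
import numpy as np

# Toy values (hypothetical, not the exam sheet).
values = np.array([10, 12, 13, 14, 15, 16, 18, 95])

# First and third quartiles, using NumPy's default linear interpolation.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Assumed rule: points more than 1.5 * IQR beyond the quartiles count as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]

print(iqr, outliers)
```

Different quartile conventions can give slightly different IQR values on small samples, so the result depends on the method used. A box plot, for example `sns.boxplot(x=values)`, draws the same fences visually.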

5 Suppose you want to predict the car price based on the parameter highway-mpg.
After building a model, you obtain the following results: CO3 [2]

What will be the final model?


a. price = - 821.73 + 38423.31 x highway-mpg
b. price = 38423.31 - 821.73 x highway-mpg
c. price = 821.73 + 38423.31 x highway-mpg
d. price = 38423.31 + 821.73 x highway-mpg
e. highway-mpg = - 821.73 + 38423.31 x price
f. highway-mpg = 38423.31 - 821.73 x price
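
Note on question 5: a fitted simple linear model has the form target = intercept + slope x predictor. The sketch below shows this on synthetic numbers that are unrelated to the exam results.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y starts at 50 and falls by 2 per unit of x.
x = np.array([10.0, 20.0, 30.0, 40.0])
y = 50.0 - 2.0 * x

# np.polyfit with degree 1 returns [slope, intercept], highest power first.
slope, intercept = np.polyfit(x, y, 1)

# The predicted variable sits on the left; the intercept stands alone and the
# slope multiplies the predictor.
print(f"y = {intercept:.2f} + ({slope:.2f}) x")
```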

6 Consider the following code and the respective solutions. What is the final estimated linear
model that you get? CO3 [3]

a. Price = -15806.62 + 53.49 x horsepower + 4.707 x curb-weight + 81.53 x engine-size + 36.05 x highway-mpg
b. Price = 53.49 x horsepower + 4.707 x curb-weight + 81.53 x engine-size + 36.05 x highway-mpg
c. Price = -15806.62 x horsepower + 53.49 x curb-weight + 4.707 x engine-size + 81.53 x highway-mpg + 36.05
d. highway-mpg = -15806.62 + 53.49 x horsepower + 4.707 x curb-weight + 81.53 x engine-size + 36.05 x Price
e. highway-mpg = 53.49 x horsepower + 4.707 x curb-weight + 81.53 x engine-size + 36.05 x Price
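
Note on question 6: with several predictors, the fitted model is target = intercept + b1 x feature1 + b2 x feature2 + ..., with the coefficients listed in the same order as the feature columns. A minimal sketch using ordinary least squares in NumPy on synthetic, noise-free data; the feature names f1 and f2 are hypothetical.

```python
import numpy as np

# Synthetic, noise-free data (an assumption for illustration) built from known coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1]

# A leading column of ones lets least squares estimate the intercept as well.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, b1, b2 = coef

# The intercept stands alone; each coefficient multiplies its own feature.
print(f"y = {intercept:.2f} + ({b1:.2f}) x f1 + ({b2:.2f}) x f2")
```

scikit-learn's `LinearRegression` reports the same pieces separately, as the `intercept_` and `coef_` attributes of the fitted model.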
