CSE1703 - Fundamental of Data Science

Name:
Enrolment No:
End-Term Examination – January 2022
COURSE: CSE1703 – Fundamental of Data Science
Programme: B.Tech: CSE, CSC Semester: Ist
Duration: 1.5 hrs. Max. Marks: 40
Instructions:
• All questions are compulsory.
• All questions are objective type.
• The Excel sheets needed to solve the questions will be provided to you just before the exam.
• The exam carries 40 marks and you have 1.5 hours to complete it.
Section A (Total 10 Marks)
1 A. Which among the following is not related to data cleaning? CO1
a) Standardizing the process
b) Removing duplicate data
c) Getting rid of unwanted observations
d) Getting rid of independent variables that are not related to the dependent variable
B. Which of the following would be more appropriate to replace the question mark in the following figure? CO1 [1 x 10 = 10]
a) Data Analysis
b) Data Science
c) Descriptive Analytics
d) None of the mentioned
C. Which of the following is not true about Pivot Tables? Select all that apply. CO1
a) Pivot tables can be filtered by multiple columns.
b) Pivot tables automatically calculate grand totals of rows and columns.
c) Editing a pivot table will impact the original data source.
d) Dates in a pivot table can be grouped by years, quarters, months, days, hours, minutes and seconds.
D. Point out the correct statement. CO1
a) Raw data is the original source of data
b) Pre-processed data is the original source of data
c) Raw data is the data obtained after processing steps
d) None of the mentioned
E. What type of association (correlation) does this graph have? CO2
a. positive linear association
b. negative linear association
c. no association
d. nonlinear association
F. Say we have 200 input features and 1 target variable. You have to select the 20 most important features based on the relationship between the input features and the target variable. Do you think this is an example of dimensionality reduction? CO1
a. Yes
b. No
G. Which of the following techniques would perform better for reducing the dimensions of a feature set? CO1
a. Removing columns with dissimilar data trends
b. Removing columns which have high variance in data.
c. Removing columns which have too many missing values
d. None of these
H. PCA can be used for projecting and visualizing data in lower dimensions. CO3
a. True
b. False
I. The eigenvector corresponding to the largest eigenvalue gives the CO2
a. direction of the lowest variance of the data
b. direction of the highest variance of the data
c. direction of the equal variance of the data
d. no information about the direction of the variance of the data
J. How can you handle missing or corrupted data in a dataset? CO1
a. Drop missing rows or columns
b. Assign a unique category to missing values
c. Replace missing values with mean/median/mode
d. All of the above
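Note: the three strategies listed in question J can be sketched in pandas as below. This is a minimal illustration only; the frame `toy`, its column names and its values are hypothetical and are not part of the exam data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names and values are made up for illustration.
toy = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "city": ["Pune", None, "Delhi", "Pune"],
})

# Drop every row that contains at least one missing value.
dropped = toy.dropna()

# Give missing categorical values their own category.
city_filled = toy["city"].fillna("Unknown")

# Replace missing numeric values with the column mean.
age_filled = toy["age"].fillna(toy["age"].mean())
```

Which strategy is suitable depends on how much data is missing and on the column type; the sketch does not recommend one over the others.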
SECTION B (Total 30 Marks)

2 Problem Statement: Use the following code and extend it to answer the following questions. CO2
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
# Location of the raw automobile data (the file has no header row)
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data"
df = pd.read_csv(url, header=None)
# Column names for the 26 attributes, in file order
headers = ["symboling", "normalized-losses", "make", "fuel-type", "aspiration",
           "num-of-doors", "body-style", "drive-wheels", "engine-location",
           "wheel-base", "length", "width", "height", "curb-weight", "engine-type",
           "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke",
           "compression-ratio", "horsepower", "peak-rpm", "city-mpg",
           "highway-mpg", "price"]
df.columns = headers
a. What is the dimension of the data frame df? [2]
I. (205, 26)
II. (26, 205)
III. (205, 25)
IV. (25, 206)
b. You can see ‘?’ entries in the data frame. Remove all the ‘?’ entries; what is the dimension of the resulting data frame? [3]
I. (170, 26)
II. (159, 26)
III. (201, 26)
IV. (205, 26)
c. What is the mean city-mpg? [3]
I. 32.08
II. 26.52
III. 98.26
IV. 62.52
d. What is the maximum highway-mpg? [2]
I. 49
II. 54
III. 50
IV. 60
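Note: one possible way to approach the steps asked for in question 2 is sketched below on a tiny stand-in frame, so that it runs without downloading the data. The frame `cars` and its values are hypothetical. The paper does not state whether the statistics should be taken before or after cleaning; this sketch assumes the cleaned frame.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for the car data; values are made up.
cars = pd.DataFrame({
    "normalized-losses": ["164", "?", "158", "?"],
    "city-mpg": [24, 21, 19, 27],
    "highway-mpg": [30, 27, 25, 34],
})

# shape is a (rows, columns) tuple.
original_shape = cars.shape

# Treat '?' as a missing value, then drop every row that contains one.
cleaned = cars.replace("?", np.nan).dropna()

# Column statistics on the cleaned frame.
mean_city = cleaned["city-mpg"].mean()
max_highway = cleaned["highway-mpg"].max()
```

The same calls applied to `df` from the code above would give the quantities the sub-questions ask about.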
3 Use the Excel sheet Summer-Olympic-medals-1976-to-2008student.xlsx, provided to you, for the following problem.
Problem Statement: Use the following code and extend it to answer the following questions.
import csv
import numpy as np
import pandas as pd
import seaborn as sns
# Reading an Excel file using Python
import xlrd
# Read the Excel data
df = pd.read_excel("…Insert path…/Summer-Olympic-medals-1976-to-2008student.xlsx")
df.head()
a. Find the total number of medals won for the city Athens. CO2 [2]
i. 1859
ii. 1998
iii. 1705
iv. 2042
b. Find the total number of medals won for the city Sydney. [3]
i. 1305
ii. 2015
iii. 1546
iv. 1387
c. Find the total number of medals won for the city Atlanta in the sport ‘Wrestling’. [3]
i. 48
ii. 70
iii. 60
iv. 54
d. How many medals in total were won by Women for the city Sydney? [2]
i. 600
ii. 777
iii. 899
iv. 889
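Note: counting questions of this kind can be sketched with boolean filters and `groupby`. The column names `City`, `Sport` and `Gender` and all rows below are assumptions for illustration; the real sheet may use different names, and the sketch assumes one row per medal.

```python
import pandas as pd

# Hypothetical rows; the real sheet's column names and values may differ.
medals = pd.DataFrame({
    "City": ["Athens", "Athens", "Sydney", "Sydney", "Atlanta"],
    "Sport": ["Wrestling", "Aquatics", "Wrestling", "Aquatics", "Wrestling"],
    "Gender": ["Men", "Women", "Women", "Women", "Men"],
})

# Each row is assumed to be one medal, so counting rows counts medals.
per_city = medals.groupby("City").size()

# Combine conditions with & to count medals for one city and one sport.
atlanta_wrestling = ((medals["City"] == "Atlanta") & (medals["Sport"] == "Wrestling")).sum()

# The same pattern works for a city and a gender.
sydney_women = ((medals["City"] == "Sydney") & (medals["Gender"] == "Women")).sum()
```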
4 Use the Excel sheet outlier_example2.xlsx, provided to you, for the following problem. CO2
Problem Statement: Use the following code and extend it to answer the following questions.
import csv
import pandas as pd
import numpy as np
import seaborn as sns
# Reading an Excel file using Python
import xlrd
# Give the location of the file
df = pd.read_excel("…insert path…/outlier_example2.xlsx")
df.head()
df.shape
a. Find the number of outliers in the given data. [2]
i. 1
ii. 2
iii. 3
iv. 4
b. What is the interquartile range of the data? [3]
i. 9750
ii. 16000
iii. 9000
iv. 18750
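Note: the paper does not say which outlier rule is intended. A common choice, assumed here, is the 1.5 x IQR rule: values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR are flagged. The sketch uses NumPy's default (linear interpolation) quartiles; other quartile conventions can give slightly different numbers. The array `values` is hypothetical and is not taken from outlier_example2.xlsx.

```python
import numpy as np

# Hypothetical values; they are not taken from outlier_example2.xlsx.
values = np.array([10, 12, 14, 16, 18, 20, 22, 100])

# First and third quartiles using NumPy's default linear interpolation.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Fences of the assumed 1.5 x IQR rule.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the values that fall outside the fences.
outliers = values[(values < lower) | (values > upper)]
```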
5 Assume you want to predict the car price based on the parameter highway-mpg. After building a model, you obtain the following results. CO3 [2]
What will be the final model?
a. price = - 821.73 + 38423.31 x highway-mpg
b. price = 38423.31 - 821.73 x highway-mpg
c. price = 821.73 + 38423.31 x highway-mpg
d. price = 38423.31 + 821.73 x highway-mpg
e. highway-mpg = - 821.73 + 38423.31 x price
f. highway-mpg = 38423.31 - 821.73 x price
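Note: how a fitted intercept and slope map onto a model equation of the form target = intercept + slope x predictor can be sketched as follows. The data are hypothetical and noise-free, generated from y = 100 - 2x, and `np.polyfit` is used here only as one convenient fitting routine; the exam's own model may have been built differently.

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 100 - 2x for illustration.
x = np.array([10.0, 20.0, 30.0, 40.0])
y = 100.0 - 2.0 * x

# For degree 1, np.polyfit returns [slope, intercept], highest power first.
slope, intercept = np.polyfit(x, y, 1)

# The predictor multiplies the slope; the intercept stands alone.
print(f"y = {intercept:.2f} + ({slope:.2f}) * x")
```

The variable being predicted always appears on the left-hand side of the equation.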
6 Consider the following code and the respective solutions. What is the final estimated linear model that you get? CO3 [3]
a. Price = -15806.62 + 53.49 x horsepower + 4.707 x curb-weight + 81.53 x engine-size + 36.05 x highway-mpg
b. Price = 53.49 x horsepower + 4.707 x curb-weight + 81.53 x engine-size + 36.05 x highway-mpg
c. Price = -15806.62 x horsepower + 53.49 x curb-weight + 4.707 x engine-size + 81.53 x highway-mpg + 36.05
d. highway-mpg = -15806.62 + 53.49 x horsepower + 4.707 x curb-weight + 81.53 x engine-size + 36.05 x Price
e. highway-mpg = 53.49 x horsepower + 4.707 x curb-weight + 81.53 x engine-size + 36.05 x Price
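Note: with several predictors, each fitted coefficient pairs with its own column, in column order, and the intercept is a separate term. A minimal sketch using ordinary least squares via `np.linalg.lstsq` is given below; the data are hypothetical, generated without noise from y = 5 + 2 x1 - 3 x2, and the code referred to in the question is not reproduced here.

```python
import numpy as np

# Hypothetical data generated from y = 5 + 2*x1 - 3*x2 (no noise).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y = 5.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so the first fitted value is the intercept.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Coefficients come back in the order of the design-matrix columns.
intercept, b1, b2 = coef
```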
