Midterm Test
Note 1: The questions on this test total 150 points, but the
final score is capped at 120 points.
Note 2: You can open books and use AI tools for assistance,
but you CANNOT directly copy the answers they provide. In
other words, please try to describe the answers in your own
words after understanding them.
I. (10%) True or False Questions (Please answer T for True
or F for False):
a) describe()
b) read_csv()
c) plot()
d) fit()
a) Linear Regression
b) K Nearest Neighbors
c) K-means
d) PCA
3. Which function is used for handling missing values in
data analysis?
a) dropna()
b) fillna()
c) isnull()
a) Groupby()
b) Split()
c) Filter()
d) Map()
5. Which chart can be used to visualize the distribution of
data?
a) histogram
b) boxplot
c) scatter plot
import pandas as pd
df = pd.read_csv("question.csv")
data_dict = {}
for i in range(df.shape[0]):
    category = df.loc[i, "Category"]
    if (df.loc[i,'Group'] not in data_dict):
        data_dict[df.loc[i,'Group']] = {}
    data_dict[df.loc[i,'Group']][category] = df.loc[i, 'Value']
group_list = []
categoryA_list = []
categoryB_list = []
categoryC_list = []
categoryD_list = []
categoryE_list = []
for k, v in data_dict.items():
    group_list.append(k)
    if "A" in v:
        categoryA_list.append(1)
    else:
        categoryA_list.append(0)
    if "B" in v:
        categoryB_list.append(1)
    else:
        categoryB_list.append(0)
    if "C" in v:
        categoryC_list.append(1)
    else:
        categoryC_list.append(0)
    if "D" in v:
        categoryD_list.append(1)
    else:
        categoryD_list.append(0)
    if "E" in v:
        categoryE_list.append(1)
    else:
        categoryE_list.append(0)
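# Assumption: the line that builds df_new is missing from this copy;
# the construction below is a plausible reconstruction from the lists above.
df_new = pd.DataFrame({"Group": group_list, "A": categoryA_list,
                       "B": categoryB_list, "C": categoryC_list,
                       "D": categoryD_list, "E": categoryE_list})
df_new["A_Count"] = df_new["A"]
df_new["B_Count"] = df_new["B"]
df_new["C_Count"] = df_new["C"]
df_new["D_Count"] = df_new["D"]
df_new["E_Count"] = df_new["E"]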
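For comparison, the loop-based code above can probably be expressed more compactly with pandas itself. The following is only a sketch: it assumes question.csv has the columns Group, Category, and Value, and it uses a small made-up DataFrame in place of the real file. The use of pd.crosstab is one possible approach, not necessarily the one intended by the question.

```python
import pandas as pd

# Made-up rows standing in for question.csv (column names assumed from the code above)
df = pd.DataFrame({
    "Group": ["G1", "G1", "G2", "G3", "G3", "G3"],
    "Category": ["A", "B", "A", "C", "D", "E"],
    "Value": [10, 20, 30, 40, 50, 60],
})

categories = ["A", "B", "C", "D", "E"]

# Count each (Group, Category) pair, make sure all five category columns
# exist, then clip the counts to 0/1 so they act as presence flags.
presence = (
    pd.crosstab(df["Group"], df["Category"])
    .reindex(columns=categories, fill_value=0)
    .clip(upper=1)
)

# Turn the Group index back into an ordinary column
df_new = presence.reset_index()
df_new.columns.name = None

# Mirror the *_Count columns created in the original code
for c in categories:
    df_new[f"{c}_Count"] = df_new[c]

print(df_new)
```

Note that pd.crosstab sorts the groups, whereas the dictionary-based loop keeps them in order of first appearance, so the row order may differ.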
2. (35%) In class, we used three columns from the
Titanic dataset, "Pclass", "Sex", and "Age", as features /
predictors (X), with the "Survived" column serving as the
target label / variable (y). With data from nearly 900
passengers, we trained eight supervised classification
models (Logistic Regression, K-NN, SVC,
Gaussian Naive Bayes, Multinomial Naive Bayes,
Decision Tree, Random Forest, and XGBoost) to predict
whether Jack and Rose from the movie Titanic would survive.
Now we have obtained an updated passenger list
(more than 1,300 passengers) in titanic.csv, together with
the meanings of additional columns, which are given in the
following data dictionary and variable notes.
Data Dictionary
Variable Definition Key
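As a reminder of the in-class setup described in this question, here is a minimal sketch using scikit-learn. It rests on several assumptions: the training rows are made up and merely stand in for titanic.csv, "Sex" is encoded as 0/1, only two of the eight models are shown, and the feature values used for Jack and Rose are illustrative guesses rather than values given in this test.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Made-up rows standing in for titanic.csv (not real passenger data)
train = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3, 1, 2, 3],
    "Sex": ["female", "male", "female", "male", "female",
            "male", "male", "female", "male", "female"],
    "Age": [29, 40, 24, 35, 18, 22, 30, 50, 28, 4],
    "Survived": [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
})

def make_features(frame):
    """Select the three predictors and encode Sex as 0 (male) / 1 (female)."""
    X = frame[["Pclass", "Sex", "Age"]].copy()
    X["Sex"] = X["Sex"].map({"male": 0, "female": 1})
    return X

X = make_features(train)
y = train["Survived"]

# Two of the eight classifiers named in the question
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-NN": KNeighborsClassifier(n_neighbors=3),
}

# Illustrative (assumed) feature values for the two movie characters
candidates = pd.DataFrame(
    {"Pclass": [3, 1], "Sex": ["male", "female"], "Age": [20, 17]},
    index=["Jack", "Rose"],
)

predictions = {}
for name, model in models.items():
    model.fit(X, y)
    predicted = model.predict(make_features(candidates))
    predictions[name] = dict(zip(candidates.index, predicted))

print(predictions)
```

With the real data, missing "Age" values would need to be handled (for example with dropna() or fillna()) before fitting, and any additional columns from the data dictionary would be encoded in a similar way.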
1 980
2 1050
3 990
4 1100
5 1000
6 980
7 1020
8 990
9 950
10 1020
IV. (50%) Essay Questions: