Assignment - Data Science Concepts
A) k-Means Clustering
B) Logistic Regression
C) Principal Component Analysis (PCA)
D) DBSCAN
Answer: ??
Answer: ??
Answer: ??
Question 4: Which of the following techniques can be used to handle missing values in a
dataset?
A) When a model performs well on the training data but poorly on new, unseen data
B) When a model performs equally well on both training and testing data
C) When a model has too few parameters
D) When a model uses cross-validation for evaluation
Answer: ??
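For the missing-value question above, two common approaches are dropping incomplete rows and imputing a replacement value. The following is a minimal sketch using pandas; the toy DataFrame and its column names are hypothetical and are not part of the assignment data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps; not taken from the assignment.
df = pd.DataFrame({
    "height": [1.0, np.nan, 3.0, 4.0],
    "color": ["red", "blue", None, "blue"],
})

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: impute. Use the mean for the numeric column
# and the most frequent value (mode) for the categorical column.
imputed = df.copy()
imputed["height"] = imputed["height"].fillna(imputed["height"].mean())
imputed["color"] = imputed["color"].fillna(imputed["color"].mode()[0])

print(dropped)
print(imputed)
```

Dropping rows is simple but discards data; imputation keeps every row at the cost of introducing estimated values.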
Load the Iris dataset, perform basic data preprocessing, and conduct exploratory data
analysis.
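python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset and inspect the first rows
data = pd.read_csv('iris.csv')
print(data.head())
# Count missing values per column
print(data.isnull().sum())
# Summary statistics for the numeric columns
print(data.describe())
# Visualize the pairwise relationships between features
sns.pairplot(data, hue='Species')
plt.show()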
Question: How many missing values are there in the Iris dataset?
A) 0
B) 5
C) 10
D) 20
Answer: ??
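The loading step above assumes a local file named iris.csv with a 'Species' column. If that file is not available, one possible substitute is the copy of the Iris data bundled with scikit-learn. This is a sketch under that assumption; the bundled copy uses different column names, so the label column is renamed here to match the code above.

```python
from sklearn.datasets import load_iris

# Load the bundled dataset as a pandas DataFrame (features plus a numeric 'target' column).
iris = load_iris(as_frame=True)
data = iris.frame.rename(columns={"target": "Species"})

# Replace the numeric class codes with the species names.
data["Species"] = data["Species"].map(dict(enumerate(iris.target_names)))

# Count missing values per column, as in the task above.
missing_per_column = data.isnull().sum()
print(data.shape)
print(missing_per_column)
```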
Train a logistic regression model to classify the Iris species and evaluate its performance.
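python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Features are all columns except the label (an 'Id' column, if present, is dropped too)
X = data.drop(columns=['Species', 'Id'], errors='ignore')
y = data['Species']
# Assumed split: 80% train, 20% test, with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
# Predict on the test set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print('Classification Report:')
print(classification_report(y_test, y_pred))
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))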
Question: What is the accuracy of the logistic regression model on the test set?
A) Around 0.70
B) Around 0.80
C) Around 0.90
D) Around 1.00
Answer: ??
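To make the printed metrics easier to interpret, here is a small pure-Python sketch of what accuracy and a confusion matrix compute. The label lists are hypothetical and chosen only for illustration; they are not results from the model above. The row/column layout follows the same convention as scikit-learn's confusion_matrix (rows are true labels, columns are predicted labels).

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def confusion(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows are true labels, columns are predicted."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

# Hypothetical labels for illustration only.
labels = ["setosa", "versicolor", "virginica"]
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

acc = accuracy(y_true, y_pred)
matrix = confusion(y_true, y_pred, labels)
print(acc)
for row in matrix:
    print(row)
```

Off-diagonal entries in the matrix show which classes are confused with each other, which a single accuracy number cannot reveal.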
Perform hyperparameter tuning on a k-NN model to find the optimal value of k using cross-validation.
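python
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

k_values = range(1, 11)  # assumed search range
accuracy_scores = []
# Perform cross-validation for each k value
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5 folds is an assumption; the original fold count is not stated
    scores = cross_val_score(knn, X_train, y_train, cv=5, scoring='accuracy')
    accuracy_scores.append(scores.mean())
plt.plot(k_values, accuracy_scores, marker='o')
plt.xlabel('k')
plt.ylabel('Cross-Validated Accuracy')
plt.show()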
Question: What is the optimal value of k according to the cross-validation results?
A) 1
B) 3
C) 5
D) 10
Answer: ??
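The cross-validation used in the tuning task splits the training data into folds and scores the model once per fold, holding that fold out for validation. The sketch below shows only the splitting step, in pure Python, with contiguous unshuffled folds. It is an illustration, not scikit-learn's implementation; for classifiers, scikit-learn's cross_val_score uses stratified folds by default, which this sketch does not reproduce. The function name kfold_indices is hypothetical.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop

# Example: 10 samples split into 3 folds.
folds = list(kfold_indices(10, 3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each sample appears in exactly one validation fold, so the mean of the per-fold scores uses every training sample for validation once.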
Evaluate the performance of the k-NN model with the optimal value of k.
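python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

# Train the k-NN model with the optimal value of k (assume k=5)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
# Predict on the test set and compute the accuracy
y_pred_knn = knn.predict(X_test)
accuracy_knn = accuracy_score(y_test, y_pred_knn)
print(f'Accuracy: {accuracy_knn}')
print('Classification Report:')
print(classification_report(y_test, y_pred_knn))
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred_knn))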
Question: What is the accuracy of the k-NN model with the optimal value of k on the test set?
A) Around 0.70
B) Around 0.80
C) Around 0.90
D) Around 1.00
Answer: ??
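As background for the k-NN tasks above, the following pure-Python sketch shows the basic prediction rule: find the k training points closest to a query (Euclidean distance here) and take a majority vote over their labels. The points, labels, and the function name knn_predict are hypothetical; this is not how scikit-learn implements KNeighborsClassifier internally.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance only, so the nearest neighbors come first.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-D points forming two well-separated groups.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

pred_near_a = knn_predict(train_points, train_labels, (1.1, 1.0), k=3)
pred_near_b = knn_predict(train_points, train_labels, (5.0, 5.1), k=3)
print(pred_near_a, pred_near_b)
```

A small k follows the training data closely, while a larger k averages over more neighbors; this trade-off is why k is tuned with cross-validation.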