Test2 ML Model Answer
Instructions:
1. All questions are compulsory
2. Illustrate your answers with neat sketches wherever necessary
3. Figures to the right indicate full marks
4. Use of non-programmable calculator is permissible
5. Assume suitable data if necessary
6. Preferably, write the answers in sequential order.
Interpretation of Results:
Regression provides a quantitative prediction (e.g., temperature is
28.5°C).
Classification provides a qualitative label (e.g., "hot" or "cold").
Understanding whether the problem requires predicting a
continuous value or assigning categories helps determine whether
to use regression or classification techniques.
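The contrast above can be illustrated with a small sketch. The linear
coefficients, the 25°C threshold and the function names are hypothetical
and chosen only so that the example reproduces the 28.5°C value mentioned
in the text; they are not fitted to any data.

```python
# Toy illustration: a continuous prediction (regression-style output)
# versus a category (classification-style output). All numbers are made up.

def predict_temperature(humidity_pct: float) -> float:
    """Return a continuous temperature in degrees Celsius.

    The coefficients are hypothetical, not learned from real data.
    """
    return 40.0 - 0.2 * humidity_pct


def classify_temperature(temp_c: float, threshold_c: float = 25.0) -> str:
    """Return a qualitative label for a temperature."""
    return "hot" if temp_c >= threshold_c else "cold"


temp = predict_temperature(57.5)     # quantitative prediction
label = classify_temperature(temp)   # qualitative label
print(f"Predicted temperature: {temp:.1f}°C, label: {label}")
```

A regression model would learn the coefficients from data, and a
classifier would learn the decision boundary; here both are fixed by hand
to keep the difference in output type visible.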
4R2 g) Features CO4
Input variables or attributes used to make predictions.
Serve as the independent variables for training.
Labels
Output variable or target value to be predicted.
Represent the dependent variable (ground truth).
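As a minimal sketch of the split between features and labels, the rows
below are hypothetical weather records. The column name PlayTennis and
the values shown are assumptions made for illustration.

```python
# Hypothetical rows: "PlayTennis" is the label, the other fields are features.
rows = [
    {"Outlook": "Sunny", "Wind": "Weak", "PlayTennis": "No"},
    {"Outlook": "Overcast", "Wind": "Strong", "PlayTennis": "Yes"},
    {"Outlook": "Rain", "Wind": "Weak", "PlayTennis": "Yes"},
]
target_column = "PlayTennis"

# X holds the features (independent variables) for each row.
X = [{k: v for k, v in row.items() if k != target_column} for row in rows]
# y holds the labels (dependent variable, ground truth) for each row.
y = [row[target_column] for row in rows]

print(X[0])
print(y)
```

Keeping X and y aligned row by row is what lets a model learn the mapping
from features to labels during training.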
Recursive Partitioning:
The dataset is split based on the chosen feature, and this process
continues recursively for each subset until a stopping criterion is
met (e.g., maximum depth, minimum samples in a node, or no
further improvement in the split).
Class Assignment:
Once the tree reaches the leaf nodes, a class label is assigned to
each leaf based on the majority class in the data points that fall into
that leaf.
Overfitting: Decision trees can easily overfit the training data if
they grow too deep. Techniques like pruning (cutting back
branches of the tree) can help prevent overfitting.
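The steps above can be sketched in plain Python. This is an illustrative
toy under stated assumptions, not a production implementation: it handles
categorical features only, it chooses the split with the lowest weighted
Gini impurity (the selection criterion is an assumption here), and it
limits overfitting with a max_depth stopping rule rather than pruning.
The data rows are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def majority_class(labels):
    """Class assignment for a leaf: the most common label."""
    return Counter(labels).most_common(1)[0][0]


def best_feature(rows, labels, features):
    """Pick the feature whose split gives the lowest weighted Gini impurity."""
    def weighted_gini(feature):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())
    return min(features, key=weighted_gini)


def build_tree(rows, labels, features, depth=0, max_depth=2):
    """Recursively partition the data until a stopping criterion is met."""
    # Stopping criteria: pure node, no features left, or maximum depth reached.
    if len(set(labels)) == 1 or not features or depth >= max_depth:
        return majority_class(labels)  # leaf node
    feature = best_feature(rows, labels, features)
    remaining = [f for f in features if f != feature]
    children = {}
    for value in sorted(set(row[feature] for row in rows)):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[feature] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        children[value] = build_tree(
            sub_rows, sub_labels, remaining, depth + 1, max_depth
        )
    return {"feature": feature, "children": children}


def predict(tree, row, default=None):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], default)
    return tree


# Hypothetical training data.
rows = [
    {"Outlook": "Sunny", "Wind": "Weak"},
    {"Outlook": "Sunny", "Wind": "Strong"},
    {"Outlook": "Overcast", "Wind": "Weak"},
    {"Outlook": "Overcast", "Wind": "Strong"},
    {"Outlook": "Rain", "Wind": "Weak"},
    {"Outlook": "Rain", "Wind": "Strong"},
]
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]

tree = build_tree(rows, labels, ["Outlook", "Wind"])
print(tree)
print(predict(tree, {"Outlook": "Rain", "Wind": "Strong"}))
```

Lowering max_depth produces a shallower tree that fits the training data
less closely, which is the simplest way to see the overfitting trade-off
described above.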
1. Dataset Understanding
Assume that you are provided with a dataset where each row
contains weather features and a target variable indicating whether
tennis was played. Here's an example of the dataset structure:
Handle missing data: Check for missing values and handle them by
removing the affected rows or filling them in (imputation).
Encode categorical variables: The Outlook and Wind columns are
categorical. You need to convert these into numerical values using
encoding techniques like One-Hot Encoding.
Feature scaling: Random Forest splits on feature thresholds, so it is
not sensitive to feature scaling. Scaling may still be necessary if you
use other models, such as distance-based or gradient-based ones.
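A minimal sketch of the preprocessing steps with pandas follows. The rows
are hypothetical, the target column name PlayTennis is an assumption, and
filling gaps with the column mode is only one possible strategy.

```python
import pandas as pd

# Hypothetical rows with a few missing values (None).
df = pd.DataFrame({
    "Outlook": ["Sunny", "Overcast", None, "Rain"],
    "Wind": ["Weak", "Strong", "Weak", None],
    "PlayTennis": ["No", "Yes", "Yes", "No"],
})

# 1. Handle missing data: count the gaps, then fill each categorical
#    column with its most frequent value (the mode).
print(df.isna().sum())
for column in ["Outlook", "Wind"]:
    df[column] = df[column].fillna(df[column].mode()[0])

# 2. Encode categorical variables with one-hot encoding:
#    one indicator column per category value.
X = pd.get_dummies(df[["Outlook", "Wind"]])
y = df["PlayTennis"]

print(X.columns.tolist())
```

Calling df.dropna() instead of fillna would implement the "removing"
option; with a dataset this small, dropping rows discards a large share
of the data, which is why the sketch fills the gaps.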
4. Model Building
We will use the Random Forest Classifier for this task, which is an
ensemble of decision trees. Here's how you would build the model:
print(f'Accuracy: {accuracy:.2f}')
print('Confusion Matrix:\n', conf_matrix)
print('Classification Report:\n', class_report)
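The print statements above assume that accuracy, conf_matrix and
class_report already exist. One way they might be produced with
scikit-learn is sketched below. The data rows, the PlayTennis column
name, the 70/30 split and the hyperparameters (n_estimators=100,
random_state=42) are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the weather dataset.
df = pd.DataFrame({
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
                "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"],
    "Wind": ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
             "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"],
    "PlayTennis": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                   "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

# One-hot encode the categorical features; keep the target as Yes/No labels.
X = pd.get_dummies(df[["Outlook", "Wind"]])
y = df["PlayTennis"]

# Hold out part of the data for evaluation, preserving the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Random Forest: an ensemble of decision trees trained on bootstrap samples.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation objects with the same names as in the print statements.
class_labels = ["No", "Yes"]
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred, labels=class_labels)
class_report = classification_report(
    y_test, y_pred, labels=class_labels, zero_division=0
)
```

With so few rows the scores are not meaningful; the sketch only shows how
the pieces fit together.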
6. Model Evaluation
Accuracy is the fraction of predictions that are correct and gives the
overall performance of the model.
Confusion Matrix shows the number of true positives, true
negatives, false positives, and false negatives, which is helpful in
understanding model errors.
Classification Report provides precision, recall, and F1-score for
both classes (Yes/No in this case).
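To make the relationship between these metrics concrete, the sketch below
derives accuracy, precision, recall and F1-score from the four confusion
matrix counts. The function name and the counts are hypothetical, and
"Yes" is assumed to be the positive class.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, recall and F1 for the positive class."""
    total = tp + tn + fp + fn
    # Precision: of everything predicted positive, how much was correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts with "Yes" treated as the positive class.
metrics = classification_metrics(tp=3, tn=1, fp=1, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The classification report computes the same quantities once per class,
treating each class in turn as the positive one.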
# Step 4: Define the feature matrix (X) and target vector (y)
# Features: sepal length, sepal width, petal length, petal width
X = df[iris.feature_names]
y = df['IsSetosa'] # Target: 1 if Setosa, 0 otherwise
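Step 4 relies on df, iris and the IsSetosa column, which come from
earlier steps that are not shown here. A minimal sketch of how they might
be created with scikit-learn and pandas follows; the construction of
IsSetosa is an assumption based on the comment "1 if Setosa, 0
otherwise".

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the Iris dataset and put the measurements into a DataFrame.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Binary target: 1 if the sample is Setosa, 0 otherwise.
setosa_index = list(iris.target_names).index("setosa")
df["IsSetosa"] = (iris.target == setosa_index).astype(int)

# Feature matrix (X) and target vector (y), as in Step 4.
X = df[iris.feature_names]
y = df["IsSetosa"]

print(X.shape)
print(y.value_counts().to_dict())
```

Turning the three-class target into IsSetosa makes this a binary
classification problem.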
Course Outcomes
CO4 Apply feature engineering on dataset.
CO5 Apply classification algorithm on dataset.
CO6 Apply regression algorithm on dataset.