0% found this document useful (0 votes)
5 views

FAQ's - Supervised Learning

Uploaded by

shreyasgawade12
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views

FAQ's - Supervised Learning

Uploaded by

shreyasgawade12
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 4

AIML Online

Frequently Asked Questions in Problem Statement


Course: Supervised Learning
PART - A [30 Marks]
* Direct or Self-explanatory questions are not covered in this FAQ.

1. Data Understanding:
1 C. Compare Column names of all the 3 DataFrames and clearly write observations. [1 Mark]
→ Compare the column names of all the three dataframes. As we are going to merge datasets by rows,
checking the column names, order and type is mandatory. Use a simple compare operator to check
whether all 3 dataframes have the same column names. And write your observations from the result.

1 D. Print DataTypes of all the 3 DataFrames. [1 Mark]


→ Print the datatypes of all the 3 dataframes and write your observations.

1 E. Observe and share variation in ‘Class’ feature of all the 3 DaraFrames. [1 Mark]
→ Check the ‘Class’ variable’s distribution and categories.

2. Data Preparation and Exploration:


2 A. Unify all the variations in ‘Class’ feature for all the 3 DataFrames. [1 Marks]
→ Unify the variations reported in the previous step 1.E.
Example - If the ‘Class’ variable of ‘normal’ dataframe has ‘Normal’, ‘normal’ or ‘Nrml’ replace them
with ‘normal’. Similarly, check and unify the ‘class’ for type_s and type_h dataframes.

2 B. Combine all the 3 DataFrames to form a single DataFrame [1 Marks]


→ Combine the 3 datasets into 1. Look at the checkpoint that the final dataframe should have 310 rows
and 7 columns.

1
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited
3. Data Analysis:
3 C. Visualize a pairplot with 3 classes distinguished by colors and share insights. [2 Marks]
→ Create a pairplot for the given variables and the color of the data points in the pairplot should be
distinguished by ‘Class’ categories.

4. Model Building:
4 D. Print all the possible performance metrics for both train and test data. [2 Marks]
→ Print the performance metric of classification models that include accuracy, precision, recall, F1 score
etc.

5. Performance Improvement:
5 A. Experiment with various parameters to improve performance of the base model. [2 Marks]
→ So far you would have run the default model, now you can tune the model by changing the
parameters in KNeighborsClassifier() or svm function. Firstly, self-explore what are the parameters
available in the models and check how you can fine-tune it by changing the options. You have to just
research a bit and do it. (Detailed parameter tuning will be covered in feature engineering course)
Reference link for Hyperparameter tuning for a KNN problem -
https://medium.datadriveninvestor.com/k-nearest-neighbors-in-python-hyperparameters-tuning-71673
4bc557f
You can explore and tune the hyperparameters for other models too. You can learn about Gridsearch,
Random search cross validation techniques and use them.

PART - B [30 Marks]


1. Data Understanding and Preparation:
1 D. Change Datatype of below features to ‘Object’ [1 Marks]
‘CreditCard’, ‘InternetBanking’, ‘FixedDepositAccount’, ‘Security’, ‘Level’, ‘HiddenScore’.
[Reason behind performing this operation: - Values in these features are binary i.e. 1/0. But DataType is
‘int’/’float’ which is not expected.]

2
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited
→ The variables are of object type with Binary or multi-class outputs like 0,1 or 1,2,3 etc. Hence,
convert them to ‘Object’ type

2. Data Exploration and Analysis:


2 A. Visualize distribution of Target variable ‘LoanOnCard’ and clearly share insights. [2 Marks]
→ Plot a suitable plot to display distribution of Target variable.

2 C. Check for unexpected values in each categorical variable and impute them with the best suitable
value. [2 Marks]
→ Unexpected values mean if all values in a feature are 0/1 then ‘?’, ‘a’, 1.5 are unexpected values
which needs treatment

3. Data Preparation and model building:


3 D. Print evaluation metrics for the model and clearly share insights. [1 Marks]
→ Print the performance metric of classification models that include accuracy, precision, recall, F1 score
etc.

3 E. Balance the data using the right balancing technique. [2 Marks]


→ Target balancing can be done by upsampling the minority class or downsampling the majority class
or by using SMOTE as per target distribution. You can research a bit and do this task.

4. Performance Improvement:
4 A. Train a base model each for SVM, KNN. [4 Marks]
→You have to build a base model without tuning any parameters on the balanced data.

4 B. Tune parameters for each of the models wherever required and finalize a model. [3 Marks]
(Optional: Experiment with various Hyperparameters - Research required)
→ Tune the parameters as performed in Part A, Question 5 A.

3
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited
You can tune the model by changing the parameters in KNeighborsClassifier() or svm function. Firstly,
self-explore what are the parameters available in the models and check how you can fine-tune it by
changing the options. You have to just research a bit and do it. (Detailed parameter tuning will be
covered in feature engineering course)
Reference link for Hyperparameter tuning for a KNN problem -
https://medium.datadriveninvestor.com/k-nearest-neighbors-in-python-hyperparameters-tuning-71673
4bc557f
You can explore and tune the hyperparameters for other models too.

4 C. Print evaluation metrics for final model. [1 Marks]


→ Print the performance metric of the final model that includes accuracy, precision, recall, F1 score etc.

4 D. Share improvement achieved from base model to final model. [2 Marks]


→ Show the performance improvement of that model (comparing its base model & final model
performance report).

4
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited

You might also like