FAQ's - FMT Project

The document outlines the objectives and requirements for a project focused on feature engineering, data cleansing, and model training in machine learning. It provides specific tasks for learners, including removing features with excessive null values, conducting univariate and multivariate analyses, and ensuring statistical characteristics of train and test data align with original data. Additionally, it emphasizes the importance of justifying feature selection and engineering steps taken during the project.

Uploaded by

Amit Kumar


AIML Online

Frequently Asked Questions in Problem Statement

Course: Featurization, Model Selection and Tuning
* Direct or Self-explanatory questions are not covered in this FAQ.
Objective: The main objective of the FMT project is feature engineering and feature extraction. In the
end, only the useful variables (or most of them) should be fitted to the model. Once the data is clean and
contains all the relevant variables, the model is more likely to perform well. This project gives learners
the opportunity to explore feature engineering steps. Clean your data as much as possible.
2. Data cleansing:
2A. Write a for loop which will remove all the features with 20%+ Null values and impute rest with
the mean of the feature. [5 Marks]
Query: Should I remove the features with 20%+ null values or remove only those rows?
You have to remove features having 20%+ null values and not the rows.
Query: I know how to remove null values but how to remove variables having 20%+ null values and
how to impute the remaining variables with less than 20% null values?
Here, learners are expected to check the percentage of null values in each feature and remove the
features in which more than 20% of the values are null. To do this, write a ‘for’ loop that calculates the
percentage of null values for each feature.
For the remaining features, which have less than 20% null values, impute the missing values with the
mean of that feature.
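The loop described above can be sketched as follows. This is a minimal illustration, not the official solution: it assumes the data is a pandas DataFrame with numeric features, the function name drop_and_impute and the toy columns f1, f2, f3 are hypothetical, and "20%+" is read as "20% or more".

```python
import numpy as np
import pandas as pd


def drop_and_impute(df: pd.DataFrame, threshold: float = 20.0) -> pd.DataFrame:
    """Drop features with >= threshold % nulls; mean-impute the remaining ones."""
    cleaned = df.copy()
    for col in df.columns:
        # Percentage of null values in this feature
        null_pct = df[col].isnull().mean() * 100
        if null_pct >= threshold:
            # Remove the whole feature (column), not the rows
            cleaned = cleaned.drop(columns=col)
        elif null_pct > 0:
            # Impute the missing values with the feature's own mean
            cleaned[col] = cleaned[col].fillna(cleaned[col].mean())
    return cleaned


# Hypothetical toy data: f1 has 10% nulls, f2 has 30% nulls, f3 has none
demo = pd.DataFrame({
    "f1": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "f2": [np.nan, np.nan, np.nan, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "f3": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0],
})
result = drop_and_impute(demo)
print(result.columns.tolist())
```

In this toy example f2 is dropped and the single missing value in f1 is replaced by the mean of its nine observed values.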
2B. Identify and drop the features which are having the same value for all the rows. [3 Marks]
2C. Drop other features if required using relevant functional knowledge. Clearly justify the same.
Query: I could not understand what to do in Q.2.B and Q.2.C of the FMT project.
Can you please elaborate on what I have to do in these questions?
Feature engineering consists of the creation, transformation, extraction, and selection of features
(also known as variables) that are most conducive to building an accurate ML model.
For Q.2.B and Q.2.C, learners are expected to carry out the feature engineering steps and keep only
those features that are useful for building models.
Here you should drop features that carry the same kind of information. For this you can choose among
different techniques, such as PCA, forward selection, or backward elimination.
You can also check whether any features have zero standard deviation and drop them, check for
highly correlated features, and so on.
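The two checks mentioned above (constant features and highly correlated features) might look like the sketch below. It assumes a numeric pandas DataFrame. The helper names and the 0.95 correlation cutoff are illustrative choices, not values given in the problem statement, so any cutoff you use should be justified in your own answer.

```python
import numpy as np
import pandas as pd


def drop_constant_features(df: pd.DataFrame):
    """Drop features that hold the same value in every row."""
    constant_cols = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    return df.drop(columns=constant_cols), constant_cols


def drop_highly_correlated(df: pd.DataFrame, cutoff: float = 0.95):
    """For each pair with |correlation| > cutoff, drop the later feature."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > cutoff).any()]
    return df.drop(columns=to_drop), to_drop


# Hypothetical toy data: "b" duplicates the information in "a", "c" is constant
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [7.0, 7.0, 7.0, 7.0, 7.0],
    "d": [5.0, 1.0, 4.0, 2.0, 3.0],
})
no_constants, constant_cols = drop_constant_features(demo)
reduced, correlated_cols = drop_highly_correlated(no_constants)
print(constant_cols, correlated_cols, reduced.columns.tolist())
```

Constant features are removed first because their correlation with other features is undefined.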

Note: Doing PCA is not mandatory; it is only a suggestion. You can choose to do PCA in 5.D. But mention

In Q.2.C, please justify why you are choosing an additional feature engineering step to drop features.
Your statement should justify your action. If you feel that all your features are good and there is no
need to drop any of them, justify that as well.
For Q.2 Data Cleansing: clean the data to the best of your knowledge and drop all highly correlated
and less useful columns. The data should be cleaned before building a model.
2E. Make all relevant modifications on the data using both functional/logical reasoning/assumptions.
[2 Marks]
Query: Please elaborate or provide the hint as data cleansing is already done in above all questions
related to this project.
Here, list all the modifications made to the data (2.a, 2.b, 2.c, 2.d) and the assumptions behind
choosing these steps to clean the data. Also state what can be done further, for example whether there
is any scope for PCA or other feature engineering steps. You can also state your assumptions about the
cleaned data. A brief explanation is needed here.
3. Data analysis & visualisation: [5 Marks]
3A. Perform a detailed univariate Analysis with appropriate detailed comments after each analysis.
[2 Marks]
3B. Perform bivariate and multivariate analysis with appropriate detailed comments after each
analysis. [3 Marks]
Query: How easy is it to do Univariate, Bivariate, and Multivariate analyses, when I have more than
500+ features?
🡪 Yes, there is a huge number of variables, which makes the data much more difficult to interpret. But
real-life problems often have even more columns, and this project is designed to help learners
understand the concepts at that scale.
Since we don't have variable names here, it is difficult to tell what information each variable provides.
So, please choose any 3 or 4 variables and perform univariate analysis. Likewise, choose any two
variables and perform bivariate analysis. A pair plot is a challenge with this many features, so please
avoid it. Once you produce a correlation plot, you can mention your observations there.
For the correlation plot or heat map, there is no need to refer to any column by name; just give your
overall interpretation and observations, such as whether or not you observe any correlation.
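As a rough sketch of the approach above, the numeric side of the analysis can be done on a handful of columns without naming every feature. The data here is synthetic and the column labels "0" to "3" are hypothetical stand-ins for the unnamed features. The 0.7 cutoff for a "strong" correlation is an assumption. Plots (histograms, box plots, a heat map) can be layered on top of the same selections.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the project data; column "1" is built from column "0"
rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=n)
df = pd.DataFrame({
    "0": base,
    "1": base * 2 + rng.normal(scale=0.1, size=n),
    "2": rng.normal(size=n),
    "3": rng.exponential(size=n),
})

# Univariate: summarise any 3 or 4 chosen variables
univariate = df[["0", "2", "3"]].agg(["mean", "median", "std", "skew"])

# Bivariate: correlation between any two chosen variables
pair_corr = df["0"].corr(df["1"])

# Overall view of the correlation matrix: list strongly correlated pairs
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack()
strong_pairs = pairs[pairs > 0.7]

print(univariate)
print(pair_corr)
print(strong_pairs)
```

With hundreds of features, the count and size of the strongly correlated pairs gives an overall observation for the heat map without listing column names.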
4D. Check if the train and test data have similar statistical characteristics when compared with
original data. [2 Marks]
For this question, please print the 5-point summary of the original data, the train data, and the test
data separately, for which you can use the 'describe' function, and note down your observations, such as
whether they look the same or whether there are variations between them.

Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited
Statistical characteristics cover many things, such as sampling and errors, statistical measures of the
data, etc.
From the 'describe' function alone (assuming pandas) you get the count, mean, and std, along with the
5-point summary (min, 25%, median, 75%, max). The range and IQR follow from these, while the mode has
to be computed separately. All of these describe how your data is distributed, which is what statistical
characteristics mean in the question.
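One way to lay the three summaries side by side is sketched below. The data is synthetic, the column names f1 and f2 are hypothetical, and the 80/20 split is done with pandas sampling purely to keep the example self-contained. In the project you would compare the splits you already created in Question 4.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the original data
rng = np.random.default_rng(42)
original = pd.DataFrame({
    "f1": rng.normal(50, 10, 1000),
    "f2": rng.uniform(0, 1, 1000),
})

# Assumed 80/20 split, for illustration only
train = original.sample(frac=0.8, random_state=1)
test = original.drop(train.index)

# describe() for each data set, concatenated side by side
summary = pd.concat(
    {
        "original": original.describe(),
        "train": train.describe(),
        "test": test.describe(),
    },
    axis=1,
)
print(summary)

# Shift of the train mean from the original mean, in units of the original std
mean_shift = (train.mean() - original.mean()).abs() / original.std()
print(mean_shift)
```

Small shifts in the mean, std, and quartiles suggest the train and test data still resemble the original data. Large shifts are worth noting in your observations.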
5A. Use any Supervised Learning technique to train a model. [2 Marks]
Query: For questions 5A-5C, can we just use "raw" data (i.e. data that is not balanced or
standardised)? The reason is because 5D already asks for the same. Can we build any Supervised
model of our choice?
🡪 For Questions 5.A to 5.C, you can continue with the same data that was used in Question 4; follow
all the steps as asked in the problem statement.
5D is just a hint for improving your model performance; you are free to explore. E.g., you can choose
to do PCA.
Yes, you can build any Supervised model of your choice.
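As an illustration only, the sketch below trains a baseline supervised model and then a variant with scaling and PCA, in the spirit of 5.A and the 5.D hint. It uses scikit-learn on a synthetic classification data set. The choice of logistic regression, the 80/20 split, and the 95% explained-variance target for PCA are all assumptions, not requirements of the problem statement.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned project data
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 5.A style baseline: any supervised model, here logistic regression
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# 5.D style exploration: standardise, then keep components explaining 95% variance
with_pca = make_pipeline(
    StandardScaler(), PCA(n_components=0.95), LogisticRegression(max_iter=1000)
)
with_pca.fit(X_train, y_train)
pca_acc = with_pca.score(X_test, y_test)
n_components_kept = with_pca.named_steps["pca"].n_components_

print(baseline_acc, pca_acc, n_components_kept)
```

Comparing the two test scores shows whether the extra step helps on your data. PCA does not always improve accuracy, so report whichever result you observe.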
***************************HAPPY LEARNING********************************
