FAQ's - FMT Project
Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited
Note: PCA is not mandatory; it is only a suggestion. You can choose to apply PCA in 5.D, but mention your observations here.
In Q.2.C, please justify why you are applying a further feature engineering step to drop features. Your
statement should justify your action. If instead you feel that all your features are useful and none
needs to be dropped, justify that as well.
For Q.2 (Data Cleansing): clean the data to the best of your knowledge and drop all highly correlated
and not-so-useful columns. The data should be cleaned before building a model.
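One possible way to drop such columns is sketched below. This is only an illustration and not the required method: the helper name drop_unhelpful_columns, the 0.9 correlation threshold, and the synthetic columns f0 to f3 are all assumptions made for the example. Choose and justify your own threshold.

```python
import numpy as np
import pandas as pd


def drop_unhelpful_columns(df, threshold=0.9):
    """Drop constant columns, then one column from each highly correlated pair.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    # Columns with a single unique value carry no information for a model.
    df = df.loc[:, df.nunique() > 1]
    corr = df.select_dtypes(include="number").corr().abs()
    # Look only above the diagonal so that each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Synthetic stand-in data (not the project data).
rng = np.random.default_rng(0)
base = rng.normal(size=200)
demo = pd.DataFrame({
    "f0": base,
    "f1": base * 2 + rng.normal(scale=0.01, size=200),  # near-duplicate of f0
    "f2": rng.normal(size=200),                          # independent feature
    "f3": 1.0,                                           # constant column
})

cleaned, dropped = drop_unhelpful_columns(demo)
print("Dropped for high correlation:", dropped)
print("Remaining columns:", list(cleaned.columns))
```

Whichever approach you take, state in your notebook why each column was dropped.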
2E. Make all relevant modifications on the data using both functional/logical reasoning/assumptions.
[2 Marks]
Query: Please elaborate or provide a hint, as data cleansing has already been done in all the questions
above related to this project.
Here, list all the modifications made to the data (2.a, 2.b, 2.c, 2.d) and the assumptions behind
choosing these data-cleaning steps. Also state what can be done further, for example whether there is
any scope for PCA or other feature engineering steps. You can also state your assumptions about the
cleaned data. A brief explanation is needed here.
3. Data analysis & visualisation: [5 Marks]
3A. Perform a detailed univariate Analysis with appropriate detailed comments after each analysis.
[2 Marks]
3B. Perform bivariate and multivariate analysis with appropriate detailed comments after each
analysis. [3 Marks]
Query: How easy is it to do univariate, bivariate, and multivariate analyses when I have 500+
features?
🡪 Yes, there is a huge number of variables, which makes them much more difficult to interpret. But
real-life problems will have even more columns, and this project is designed to help learners
understand the concepts.
Since we don't have variable names here, it is difficult to understand which variable is giving us what
information. So, please choose any 3 or 4 variables and perform univariate analysis. Likewise, choose
any two variables and perform bivariate analysis. A pair plot is a challenge here, so please avoid it.
Once you have made a correlation plot, you can mention your observations there.
For the correlation plot or heat map, there is no need to specify any column names; just give your
overall interpretation and observations, such as whether or not you observe any correlation.
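The advice above can be sketched roughly as follows. The data here is synthetic, and the column labels, the three chosen columns, and the 0.8 cut-off for a "strong" correlation are assumptions for illustration only. The numeric overview of the correlation matrix is one way to support an overall comment without naming individual columns; it does not replace the heat map itself.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a wide, unnamed feature table (not the project data).
rng = np.random.default_rng(1)
data = pd.DataFrame(rng.normal(size=(300, 50)),
                    columns=[str(i) for i in range(50)])
# Add one column that is strongly related to column "0".
data["50"] = data["0"] * 0.95 + rng.normal(scale=0.1, size=300)

# Univariate analysis on a small hand-picked subset of columns.
chosen = ["0", "1", "2"]
univariate = data[chosen].describe().T
univariate["skew"] = data[chosen].skew()
print(univariate)

# Overall correlation picture: one value per distinct pair of columns.
corr = data.corr().abs().to_numpy()
pair_values = corr[np.triu(np.ones(corr.shape, dtype=bool), k=1)]
strong_pairs = int((pair_values > 0.8).sum())
print(f"{strong_pairs} of {pair_values.size} column pairs have |r| > 0.8")
print(f"Median |r| across all pairs: {np.median(pair_values):.3f}")
```

A comment such as "most pairs show little correlation, but a small number are strongly correlated" is the kind of overall observation expected.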
4D. Check if the train and test data have similar statistical characteristics when compared with
original data. [2 Marks]
For this question, please print the 5-point summary of the original data, train data, and test data
separately, for which you can use the 'describe' function, and note down your observations, such as
whether you feel they are still similar or whether there are variations between them.
Statistical characteristics are many, such as sampling and errors, statistical measures of the data, etc.
From a single 'describe' call you get the count, mean, std, min, quartiles (25%, median, 75%), and max,
which cover the 5-point summary; the range and IQR can be derived from these. These all describe how
your data is distributed, which is what statistical characteristics mean in the question.
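A minimal sketch of such a comparison is shown below. The data is synthetic, and the 70/30 split made with DataFrame.sample is only a stand-in for whatever train and test sets you created earlier; the same describe calls apply to the output of train_test_split.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the original feature table (not the project data).
rng = np.random.default_rng(2)
original = pd.DataFrame(rng.normal(loc=5.0, scale=2.0, size=(1000, 4)),
                        columns=["a", "b", "c", "d"])

# Stand-in 70/30 split; replace with your own train and test frames.
train = original.sample(frac=0.7, random_state=42)
test = original.drop(train.index)

# Print the summary for each set separately.
for name, frame in [("original", original), ("train", train), ("test", test)]:
    print(f"--- {name} ---")
    print(frame.describe())

# Side-by-side means and standard deviations make differences easier to spot.
means = pd.DataFrame({"original": original.mean(),
                      "train": train.mean(),
                      "test": test.mean()})
stds = pd.DataFrame({"original": original.std(),
                     "train": train.std(),
                     "test": test.std()})
print(means)
print(stds)

# Largest gap between a split's mean and the original mean, over all columns.
max_mean_gap = means[["train", "test"]].sub(means["original"], axis=0).abs().max().max()
print(f"Largest difference in means: {max_mean_gap:.3f}")
```

If the means, spreads, and quartiles are close across the three tables, you can note that the split has preserved the statistical characteristics of the original data.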
5A. Use any Supervised Learning technique to train a model. [2 Marks]
Query: For questions 5A-5C, can we just use "raw" data (i.e. data that is not balanced or
standardised)? The reason is that 5D already asks for the same. Can we build any Supervised
model of our choice?
🡪 For Questions 5.A to 5.C, you can continue with the same data that was used in Question 4; follow all
the steps as asked in the problem statement.
In 5D, it's just a hint to improve your model performance; you are free to explore. E.g., you can choose
to do PCA.
Yes, you can build any Supervised model of your choice.
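As one illustration of exploring PCA in 5D, the sketch below compares a baseline model with the same model preceded by PCA. Everything specific here is an assumption for the example: the synthetic data from make_classification, the choice of logistic regression, and the 95% explained-variance setting. Use your own data and any supervised model of your choice.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data (not the project data).
X, y = make_classification(n_samples=600, n_features=40, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Baseline: scale the features, then fit the classifier.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Variant: keep enough principal components to explain 95% of the variance.
with_pca = make_pipeline(StandardScaler(),
                         PCA(n_components=0.95, svd_solver="full"),
                         LogisticRegression(max_iter=1000))

scores = {}
for name, model in [("baseline", baseline), ("with_pca", with_pca)]:
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the test set
    print(f"{name}: test accuracy = {scores[name]:.3f}")

n_kept = with_pca.named_steps["pca"].n_components_
print(f"PCA kept {n_kept} of {X.shape[1]} components")
```

PCA does not always improve performance, so report what you observe either way. On imbalanced data, accuracy alone can be misleading, so also look at the other metrics asked for in the problem statement.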
***************************HAPPY LEARNING********************************