
Day14-PCA - Problem Statement

Uploaded by

Priya kamble

Topic: Dimension Reduction with PCA

Instructions:
Please share your answers inline in the Word document wherever applicable. Submit code
separately wherever applicable.

Please ensure you update all the details:


Name: _____________ Batch ID: ___________
Topic: Principal Component Analysis

Guidelines:
1. An assignment submission is considered complete only when correct, executable code is submitted along
with documentation explaining the method and results. A submission that is missing either of these will be
considered invalid.
2. Ensure that you submit your assignments correctly and in full. Resubmission is not allowed.
3. After submission, you can evaluate your work by referring to the keys provided (available only after
submission).

Hints:
1. Business Problem
1.1. What is the business objective?
1.2. Are there any constraints?

2. Work on each feature of the dataset to create a data dictionary as displayed in the below
image:

2.1 Make a table as shown above and provide information about each feature, such as its data type
and its relevance to model building. If a feature is not relevant, give the reason along with a
description of the feature.

3. Data Pre-processing
3.1 Data Cleaning, Feature Engineering, etc.
4. Exploratory Data Analysis (EDA):
4.1. Summary
4.2. Univariate analysis
4.3. Bivariate analysis

© 2013 - 2021 360DigiTMG. All Rights Reserved.


5. Model Building
5.1 Build the model on the scaled data (try multiple options)
5.2 Perform PCA and identify the components that explain the most variance
5.3 Perform clustering before and after applying PCA to cross-check the number of clusters
formed
5.4 Briefly explain the model output in the documentation

6. Write about the benefits/impact of the solution - in what way does the business (client)
benefit from the solution provided?
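
As a rough illustration of hints 5.1 and 5.2, the sketch below standardizes a dataset, runs PCA through a singular value decomposition, and reports how much variance each component explains. The data here is synthetic and stands in for the assignment dataset, which is not reproduced in this document; the variable names are illustrative assumptions, not part of the assignment.

```python
import numpy as np

# Synthetic stand-in for the assignment dataset (assumption):
# 200 rows, 6 numeric features driven by 2 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 6))

# Hint 5.1: scale every feature to zero mean and unit variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Hint 5.2: PCA via SVD of the scaled (already centered) data.
# Rows of Vt are the principal directions; S holds the singular values.
U, S, Vt = np.linalg.svd(X_scaled, full_matrices=False)
explained_variance = S**2 / (X_scaled.shape[0] - 1)
explained_ratio = explained_variance / explained_variance.sum()
cumulative = np.cumsum(explained_ratio)

# Project the scaled data onto the first 3 principal components.
scores = X_scaled @ Vt[:3].T

for i, (ratio, total) in enumerate(zip(explained_ratio, cumulative), start=1):
    print(f"PC{i}: {ratio:.3f} of variance, cumulative {total:.3f}")
```

The cumulative column is one way to decide how many components to keep before clustering; the threshold to apply is a judgment call for the report.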

Problem Statement: -
1. Perform hierarchical and K-means clustering on the dataset. After that, perform PCA on
the dataset and extract the first 3 principal components and make a new dataset with
these 3 principal components as the columns. Now, on this new dataset, perform
hierarchical and K-means clustering. Compare the results of clustering on the original
dataset and clustering on the principal components dataset (use the scree plot technique
to obtain the optimum number of clusters in K-means clustering and check if you’re
getting similar results with and without PCA).
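
One possible way to set up this comparison is sketched below with scikit-learn. It uses synthetic blob data in place of the real dataset (an assumption), computes the within-cluster sum of squares for a range of k so that a scree (elbow) plot can be drawn, and then compares cluster labels from the scaled data against labels from the first 3 principal components. The choice of k = 3 and the use of the adjusted Rand index as the agreement measure are illustrative assumptions, not requirements of the assignment.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the assignment dataset (assumption).
X, _ = make_blobs(n_samples=300, n_features=8, centers=3, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# New dataset whose columns are the first 3 principal components.
X_pca = PCA(n_components=3).fit_transform(X_scaled)


def wcss_curve(data, k_values):
    """Within-cluster sum of squares for each k (the values an elbow plot shows)."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=42).fit(data).inertia_
        for k in k_values
    ]


k_values = range(1, 9)
wcss_original = wcss_curve(X_scaled, k_values)
wcss_pca = wcss_curve(X_pca, k_values)

# k would normally be read off the elbow plot; 3 is assumed here.
k = 3
km_original = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
km_pca = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_pca)
hc_original = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X_scaled)
hc_pca = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X_pca)

# Adjusted Rand index: 1.0 means identical partitions, values near 0 mean chance-level agreement.
ari_kmeans = adjusted_rand_score(km_original, km_pca)
ari_hier = adjusted_rand_score(hc_original, hc_pca)
print(f"K-means agreement (ARI): {ari_kmeans:.3f}")
print(f"Hierarchical agreement (ARI): {ari_hier:.3f}")
```

Plotting `wcss_original` and `wcss_pca` against `k_values` (for example with matplotlib) gives the two elbow curves to compare in the report.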


2. A pharmaceuticals manufacturing company is conducting a study on a new medicine to treat
heart diseases. The company has gathered data from its secondary sources and would like
you to provide high-level analytical insights on the data. Its aim is to segregate patients
depending on their age group and other factors given in the data. Perform PCA and clustering
algorithms on the dataset, check whether the clusters formed before and after PCA are the same,
and provide a brief report on your model. You can also explore more ways to improve your
model.

Note: This is just a snapshot of the data. The datasets can be downloaded from AiSPRY LMS in
the Hands-On Material section.
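
To check whether clusters formed before and after PCA are the same, one option is a contingency table of the two label sets. The sketch below is only an outline under stated assumptions: the patient data is synthetic, the column names (`age`, `resting_bp`, `cholesterol`, `max_heart_rate`) are hypothetical because the real schema is not shown here, and both the 90% variance threshold and the choice of 3 clusters are arbitrary illustrative values.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic patient data with hypothetical column names (assumption).
rng = np.random.default_rng(7)
n = 150
age = rng.integers(30, 80, size=n).astype(float)
patients = pd.DataFrame({
    "age": age,
    "resting_bp": 100 + 0.5 * age + rng.normal(0, 8, size=n),
    "cholesterol": 180 + 1.0 * age + rng.normal(0, 25, size=n),
    "max_heart_rate": 210 - 0.9 * age + rng.normal(0, 10, size=n),
})

X_scaled = StandardScaler().fit_transform(patients)

# Keep as many components as needed to explain at least 90% of the variance.
pca = PCA(n_components=0.90, svd_solver="full")
X_pca = pca.fit_transform(X_scaled)


def ward_labels(data, n_clusters):
    """Ward-linkage hierarchical clustering, cut into at most n_clusters groups."""
    tree = linkage(data, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


labels_before = ward_labels(X_scaled, 3)
labels_after = ward_labels(X_pca, 3)

# Rows: clusters on the scaled data; columns: clusters on the PCA scores.
# If the partitions match, each row has a single non-zero cell.
table = pd.crosstab(
    pd.Series(labels_before, name="before_pca"),
    pd.Series(labels_after, name="after_pca"),
)
print(f"Components kept: {X_pca.shape[1]}")
print(table)
```

Cluster numbering is arbitrary, so matching partitions may appear with permuted labels; what matters is whether the counts concentrate in one cell per row.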

© 2013 - 2022 360DigiTMG. All Rights Reserved.
