Shaikh Assignment 3

Uploaded by

abhaykumbhar2231

Part 1: Cluster Analysis

1. How many clusters did you use initially? Please explain why you picked that number.

Initially, I created two cluster models: one with three clusters and another with two.
I then chose the model with three clusters for my analysis based on the Silhouette
coefficient, a metric used to assess the quality of clustering. A higher Silhouette
score suggests better separation between clusters. In this case, I observed an
improvement in the score when using three clusters.
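
The comparison described above can be sketched in plain Python. This is only an illustration under stated assumptions: the points and cluster labels below are made-up toy data (not the assignment's customer data), the distance is Euclidean, and the helper name silhouette_score is hypothetical rather than the KNIME node used in the workflow.

```python
from math import dist

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    Assumes at least two clusters are present in `labels`.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so it does not affect the sum)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
labels_k3 = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # three-cluster assignment
labels_k2 = [0, 0, 0, 1, 1, 1, 1, 1, 1]   # two-cluster assignment (last two groups merged)

score_k3 = silhouette_score(points, labels_k3)
score_k2 = silhouette_score(points, labels_k2)
print(f"k=3: {score_k3:.3f}   k=2: {score_k2:.3f}")
```

On this toy data the three-cluster labelling scores higher, which is the same kind of evidence used above to prefer three clusters. The score ranges from -1 to 1, and values closer to 1 indicate tighter, better-separated clusters.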

2. Which cluster has the greatest number of customers?

Cluster_1 is the largest group, with 82 customers.

3. What are the characteristics of that largest cluster (from question 2)? What makes it
different from the other clusters.

Cluster_1 is the largest group, consisting of 82 customers. This cluster likely includes a mix
of CustomerTypes, a range of IndustryTypes, and various FirmSizes. What sets it apart from the
other clusters is its size: it represents a diverse customer base with varied preferences.

4. Which cluster has the highest product quality average? Why do you think it is the highest?
Base your opinion on the cluster’s characteristics.

The cluster with the highest average product quality is Cluster_1, with an average score of
8.58. Cluster_1 is also the largest group (82 customers). One possible explanation is that
these customers are more engaged or loyal, leading to a higher perception of product quality.
It could also indicate that the products provided to this cluster match their preferences and
needs better, which contributes to a more favorable rating of product quality.

5. Which cluster has the lowest product quality average? Why do you think it is the lowest?
Base your opinion on the cluster’s characteristics.

The cluster with the lowest average product quality is Cluster_0, with an average score of
6.83. The lower product quality score for Cluster_0 could indicate that the customers in this
group are less satisfied with the products they are receiving. This might be due to a mismatch
between the products and their expectations or needs. Another possible reason could be that
this cluster represents a customer segment that is either more critical or less engaged, leading to
lower overall satisfaction with product quality.
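
The cluster sizes and product quality averages discussed in questions 2, 4, and 5 amount to a group-by summary. The sketch below shows one way to compute it. The rows are hypothetical values invented for illustration (they are not the assignment's data), and the function name summarize is an assumption, not part of the KNIME workflow.

```python
from collections import defaultdict

# Hypothetical (cluster label, ProductQuality score) pairs, for illustration only.
rows = [
    ("Cluster_0", 6.5), ("Cluster_0", 7.0),
    ("Cluster_1", 8.5), ("Cluster_1", 9.0), ("Cluster_1", 8.0),
    ("Cluster_2", 7.5),
]

def summarize(rows):
    """Return {cluster: (customer count, mean product quality)}."""
    groups = defaultdict(list)
    for cluster, quality in rows:
        groups[cluster].append(quality)
    return {c: (len(v), sum(v) / len(v)) for c, v in groups.items()}

summary = summarize(rows)
largest = max(summary, key=lambda c: summary[c][0])   # most customers
highest = max(summary, key=lambda c: summary[c][1])   # highest mean quality
lowest = min(summary, key=lambda c: summary[c][1])    # lowest mean quality

for cluster, (count, mean_quality) in sorted(summary.items()):
    print(f"{cluster}: n={count}, mean ProductQuality={mean_quality:.2f}")
```

Size and mean are computed independently here, so a cluster can be the largest without having the highest average, and the reverse.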

6. To improve your analysis, what missing dimension would you add to your data set? Why?

7. Run the cluster again but prune your variables to simplify your model.

a. Which variables did you keep? Why?

To simplify the model, I suggest retaining key variables like "CustomerType,"
"IndustryType," and "ProductQuality." These variables capture essential customer
characteristics and perspectives, playing a significant role in clustering and
analyzing customer satisfaction.

b. How many clusters did you choose? Again, why?

I chose two clusters. Reducing the number of clusters to two simplifies the analysis and
creates clearer distinctions between the groups.

c. Describe the cluster with the largest number of customers.

Both Cluster_0 and Cluster_1 have an equal number of customers, with each
cluster containing 100 customers.

8. Provide a screenshot of your KNIME workflow.

Part 2: Principal Component Analysis (PCA)

1. When you first ran the analysis, how many components did the PCA generate?

The PCA generated 18 components. Each component is a linear combination of the original
variables, and together the components retain the data's variability.

2. Calculate the percentage variance and the cumulative percentage variance for each component
in your output file.

3. Include in your Word document a table (screenshot will work) that includes the percentage
variance, cumulative percentage variance, and the weights for each feature.

4. Based on the answers from question 2, how many components would you keep?

a. Why did you choose that number?

I chose 5 components because the first 5 components together explain a large portion of the
variation in the data set.

b. What was the cumulative percentage variance for all those components?

The cumulative percentage variance of the 5 components I chose is 80.80%.
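
The percentage variance and cumulative percentage variance asked for in question 2 follow directly from the component eigenvalues: each percentage is the eigenvalue divided by the sum of all eigenvalues. The sketch below is illustrative only. The eigenvalues are hypothetical (they are not the assignment's 18 values), and the function names and the 80% cut-off parameter are assumptions chosen to mirror the reasoning above.

```python
def variance_table(eigenvalues):
    """Return (component number, % variance, cumulative % variance) rows."""
    ordered = sorted(eigenvalues, reverse=True)  # largest component first
    total = sum(ordered)
    table, running = [], 0.0
    for i, ev in enumerate(ordered, start=1):
        pct = 100.0 * ev / total
        running += pct
        table.append((i, pct, running))
    return table

def components_needed(eigenvalues, threshold=80.0):
    """Smallest number of leading components whose cumulative % reaches threshold."""
    for i, _, cumulative in variance_table(eigenvalues):
        if cumulative >= threshold:
            return i
    return len(eigenvalues)

# Hypothetical eigenvalues, for illustration only.
eigenvalues = [4.0, 3.0, 2.0, 1.0]

for i, pct, cumulative in variance_table(eigenvalues):
    print(f"PC{i}: {pct:5.1f}%  cumulative {cumulative:5.1f}%")
print("components to keep at 80%:", components_needed(eigenvalues))
```

The threshold is a judgment call. Changing the threshold argument shows how many components a stricter or looser cut-off would keep.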

5. Using your answer from question 4 and the table from question 3, which features would
you keep?
a. Why did you choose those features?

I chose to retain the CustomerType, IndustryType, FirmSize, Region, and
DistributionSystem features because they had the highest coefficients among the 18
components produced. This suggests that these features have the most significant
impact on the components, making them crucial for the analysis.
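
One way to back up a selection like this is to rank features by the size of their weights (loadings) on the retained components. The sketch below assumes a simple rule, scoring each feature by its largest absolute weight across the kept components. That rule is an illustrative choice, not necessarily the one used in the assignment. The weights are hypothetical numbers, and rank_features is a made-up helper name.

```python
# Hypothetical weights of each feature on the first two components.
# These values are invented for illustration; they are not the assignment's table.
loadings = {
    "CustomerType":   [0.62, -0.10],
    "IndustryType":   [-0.55, 0.20],
    "ProductQuality": [0.05, 0.12],
    "Region":         [0.08, -0.71],
}

def rank_features(loadings, n_components):
    """Order features by their largest absolute weight on the first n components."""
    scores = {
        feature: max(abs(w) for w in weights[:n_components])
        for feature, weights in loadings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

print(rank_features(loadings, n_components=2))
```

Absolute values are used because a large negative weight influences a component as strongly as a large positive one.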

b. Provide supporting evidence.

6. Run your PCA again but only use the Customer Perception Data.

7. Calculate the percentage variance and the cumulative percentage variance for each
component.

8. Include in your Word document a table (screenshot will work) that includes the
percentage variance, cumulative percentage variance, and the weights for each feature.

9. Based on the answers from question 7, how many components would you keep?