0% found this document useful (0 votes)

57 views1 page

MARKET Segmentation

The document discusses loading and exploring a marketing dataset containing 8950 rows and 18 columns. Some key variables include customer ID, balance, purchases, payments made, and tenure. Descriptive statistics are provided on the numeric variables, showing their counts, means, standard deviations, minimums, and other values. The data is explored through visualizing its distribution and detecting any outliers.

Uploaded by

Abysz

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

57 views1 page

MARKET Segmentation

Uploaded by

Abysz

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 1

Import Libraries

In [1]:
import numpy as np

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

from sklearn.preprocessing import StandardScaler, normalize

from sklearn.cluster import KMeans

from sklearn.decomposition import PCA

import warnings

warnings.filterwarnings("ignore")

Load the Data

In [2]:
df = pd.read_csv('Marketing_data.csv')

CUST_ID : Identification of Credit Card holder (Categorical)

BALANCE : Total amount of money that you owe to your credit card company

BALANCE_FREQUENCY : How frequently the Balance is updated, score between 0 and 1 (1 = frequently updated, 0 = not frequently updated)

PURCHASES : Amount of purchases made from account

ONEOFF_PURCHASES : Maximum purchase amount done in one-go

INSTALLMENTS_PURCHASES : Amount of purchase done in installment

CASH_ADVANCE : Cash in advance given by the user

PURCHASES_FREQUENCY : How frequently the Purchases are being made, score between 0 and 1 (1 = frequently purchased, 0 = not frequently purchased)

ONEOFF_PURCHASES_FREQUENCY : How frequently Purchases are happening in one-go (1 = frequently purchased, 0 = not frequently purchased)

PURCHASES_INSTALLMENTS_FREQUENCY : How frequently purchases in installments are being done (1 = frequently done, 0 = not frequently done)

CASH_ADVANCE_FREQUENCY : How frequently the cash in advance being paid

CASH_ADVANCE_TRX : Number of Transactions made with "Cash in Advanced"

PURCHASES_TRX : Numbe of purchase transactions made

CREDIT_LIMIT : Limit of Credit Card for user

PAYMENTS : Amount of Payment done by user

MINIMUM_PAYMENTS : Minimum amount of payments made by user

PRC_FULL_PAYMENT : Percent of full payment paid by user

TENURE: Tenure of credit card service for user

In [3]:
# Display the dataset
df

Out[3]: CUST_ID BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLM

0 C10001 40.900749 0.818182 95.40 0.00 95.40 0.000000 0.166667 0.000000

1 C10002 3202.467416 0.909091 0.00 0.00 0.00 6442.945483 0.000000 0.000000

2 C10003 2495.148862 1.000000 773.17 773.17 0.00 0.000000 1.000000 1.000000

3 C10004 1666.670542 0.636364 1499.00 1499.00 0.00 205.788017 0.083333 0.083333

4 C10005 817.714335 1.000000 16.00 16.00 0.00 0.000000 0.083333 0.083333

... ... ... ... ... ... ... ... ... ...

8945 C19186 28.493517 1.000000 291.12 0.00 291.12 0.000000 1.000000 0.000000

8946 C19187 19.183215 1.000000 300.00 0.00 300.00 0.000000 1.000000 0.000000

8947 C19188 23.398673 0.833333 144.40 0.00 144.40 0.000000 0.833333 0.000000

8948 C19189 13.457564 0.833333 0.00 0.00 0.00 36.558778 0.000000 0.000000

8949 C19190 372.708075 0.666667 1093.25 1093.25 0.00 127.040008 0.666667 0.666667

8950 rows × 18 columns

In [4]:
# Get some information about the data

df.info()

RangeIndex: 8950 entries, 0 to 8949

Data columns (total 18 columns):

# Column Non-Null Count Dtype

--- ------ -------------- -----

0 CUST_ID 8950 non-null object

1 BALANCE 8950 non-null float64

2 BALANCE_FREQUENCY 8950 non-null float64

3 PURCHASES 8950 non-null float64

4 ONEOFF_PURCHASES 8950 non-null float64

5 INSTALLMENTS_PURCHASES 8950 non-null float64

6 CASH_ADVANCE 8950 non-null float64

7 PURCHASES_FREQUENCY 8950 non-null float64

8 ONEOFF_PURCHASES_FREQUENCY 8950 non-null float64

9 PURCHASES_INSTALLMENTS_FREQUENCY 8950 non-null float64

10 CASH_ADVANCE_FREQUENCY 8950 non-null float64

11 CASH_ADVANCE_TRX 8950 non-null int64

12 PURCHASES_TRX 8950 non-null int64

13 CREDIT_LIMIT 8949 non-null float64

14 PAYMENTS 8950 non-null float64

15 MINIMUM_PAYMENTS 8637 non-null float64

16 PRC_FULL_PAYMENT 8950 non-null float64

17 TENURE 8950 non-null int64

dtypes: float64(14), int64(3), object(1)

memory usage: 1.2+ MB

In [5]:
# Describe the data

df.describe()

Out[5]: BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FR

count 8950.000000 8950.000000 8950.000000 8950.000000 8950.000000 8950.000000 8950.000000 8950.000000 8

mean 1564.474828 0.877271 1003.204834 592.437371 411.067645 978.871112 0.490351 0.202458

std 2081.531879 0.236904 2136.634782 1659.887917 904.338115 2097.163877 0.401371 0.298336

min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000

25% 128.281915 0.888889 39.635000 0.000000 0.000000 0.000000 0.083333 0.000000

50% 873.385231 1.000000 361.280000 38.000000 89.000000 0.000000 0.500000 0.083333

75% 2054.140036 1.000000 1110.130000 577.405000 468.637500 1113.821139 0.916667 0.300000

max 19043.138560 1.000000 49039.570000 40761.250000 22500.000000 47137.211760 1.000000 1.000000

Mean BALANCE is $1,564

BALANCE_FREQUENCY is frequently updated on average ~0.9

PURCHASES average is $1,000

ONEOFF_PURCHASES average is ~$600

Average PURCHASES_FREQUENCY is around 0.5

Average ONEOFF_PURCHASES_FREQUENCY, PURCHASES_INSTALLMENTS_FREQUENCY, and CASH_ADVANCE_FREQUENCY

Average CREDIT_LIMIT is ~$4,500

Average PRC_FULL_PAYMENT is 15%

Average TENURE is 11.5 years

Data Cleaning
Visualize and Explore Dataset
In [6]:
# Check the missing data

print(df.isnull().sum())

print('----------------------------------------\n Percentage of missing data: \n')

print((df[['MINIMUM_PAYMENTS', 'CREDIT_LIMIT']].isnull().sum()/df['CUST_ID'].count())*100)

CUST_ID 0

BALANCE 0

BALANCE_FREQUENCY 0

PURCHASES 0

ONEOFF_PURCHASES 0

INSTALLMENTS_PURCHASES 0

CASH_ADVANCE 0

PURCHASES_FREQUENCY 0

ONEOFF_PURCHASES_FREQUENCY 0

PURCHASES_INSTALLMENTS_FREQUENCY 0

CASH_ADVANCE_FREQUENCY 0

CASH_ADVANCE_TRX 0

PURCHASES_TRX 0

CREDIT_LIMIT 1

PAYMENTS 0

MINIMUM_PAYMENTS 313

PRC_FULL_PAYMENT 0

TENURE 0

dtype: int64

----------------------------------------

Percentage of missing data:

MINIMUM_PAYMENTS 3.497207

CREDIT_LIMIT 0.011173

dtype: float64

Here we have calculated the percentage of data that are missing

There are two variables with missing data, namely CREDIT_LIMIT and MINIMUM_PAYMENTS. The missing values in these columns make up a insignificant percentage of the data set and can be
safely deleted without risking a loss in data. The missing data in CREDIT_LIMIT make up less than 1% of the data and in MINIMUM_PAYMENTS only around 3%.

In [7]:
# Fill up the missing elements with mean of the MINIMUM_PAYMENTS

df.loc[(df['MINIMUM_PAYMENTS'].isnull() == True),

'MINIMUM_PAYMENTS'] = df['MINIMUM_PAYMENTS'].mean()

# Fill up the missing elements with mean of the CREDIT_LIMIT

df.loc[(df['CREDIT_LIMIT'].isnull() == True),

'CREDIT_LIMIT'] = df['CREDIT_LIMIT'].mean()

df.isnull().sum()

CUST_ID 0

Out[7]:
BALANCE 0

BALANCE_FREQUENCY 0

PURCHASES 0

ONEOFF_PURCHASES 0

INSTALLMENTS_PURCHASES 0

CASH_ADVANCE 0

PURCHASES_FREQUENCY 0

ONEOFF_PURCHASES_FREQUENCY 0

PURCHASES_INSTALLMENTS_FREQUENCY 0

CASH_ADVANCE_FREQUENCY 0

CASH_ADVANCE_TRX 0

PURCHASES_TRX 0

CREDIT_LIMIT 0

PAYMENTS 0

MINIMUM_PAYMENTS 0

PRC_FULL_PAYMENT 0

TENURE 0

dtype: int64

In [8]:
# Chek duplicated entries in the data

df.duplicated().sum()

0
Out[8]:

In [9]:
# Drop Customer ID column 'CUST_ID'

df.drop('CUST_ID', axis = 1, inplace = True)

df.head()

Out[9]: BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQU

0 40.900749 0.818182 95.40 0.00 95.4 0.000000 0.166667 0.000000 0.0

1 3202.467416 0.909091 0.00 0.00 0.0 6442.945483 0.000000 0.000000 0.0

2 2495.148862 1.000000 773.17 773.17 0.0 0.000000 1.000000 1.000000 0.0

3 1666.670542 0.636364 1499.00 1499.00 0.0 205.788017 0.083333 0.083333 0.0

4 817.714335 1.000000 16.00 16.00 0.0 0.000000 0.083333 0.083333 0.0

EDA
In [10]:
df.columns

Index(['BALANCE', 'BALANCE_FREQUENCY', 'PURCHASES', 'ONEOFF_PURCHASES',

Out[10]:
'INSTALLMENTS_PURCHASES', 'CASH_ADVANCE', 'PURCHASES_FREQUENCY',

'ONEOFF_PURCHASES_FREQUENCY', 'PURCHASES_INSTALLMENTS_FREQUENCY',

'CASH_ADVANCE_FREQUENCY', 'CASH_ADVANCE_TRX', 'PURCHASES_TRX',

'CREDIT_LIMIT', 'PAYMENTS', 'MINIMUM_PAYMENTS', 'PRC_FULL_PAYMENT',

'TENURE'],

dtype='object')

Distplot combines the matplotlib.hist function with seaborn kdeplot().

KDE Plot represents the Kernel Density Estimate.

KDE is used for visualizing the Probability Density of a continuous variable.

KDE demonstrates the probability density at different values in a continuous variable

In [11]:
plt.rcParams['figure.figsize'] =(20,40)

for num in range(0,17):

ax = plt.subplot(9,2,num+1)

col = df.columns[num]

sns.distplot(df[col], ax=ax,kde_kws={'color':'b', 'lw':3, 'label':'KDE'})

plt.show()

Above are the distribution plots of every variables in the dataframe

Here we have the overview of the whole distribution of the dataframe. We can see right away that these distributions are very left skewed and there are a lot of zero values.

In [12]:
plt.rcParams['figure.figsize'] = (13,4)

sns.distplot(df['BALANCE'],bins=150,color = 'r')

plt.title('Distribution of Balance', size=20)

plt.xlabel('Balance')

Text(0.5, 0, 'Balance')
Out[12]:

By keeping the balance low (in this case zero) but the credit limit high,would increase the credit utilization ratio and in turn increases overall credit score.

In [13]:
plt.rcParams['figure.figsize'] = (13,10)

sns.countplot(y=df['BALANCE_FREQUENCY'],order = df['BALANCE_FREQUENCY'].value_counts().index)

plt.ylabel('Balance Frequncy Score (0-1)')

plt.title('Counts of Balance Frequency Score', fontsize=20)

Text(0.5, 1.0, 'Counts of Balance Frequency Score')

Out[13]:

We can see here most of the accounts have the score of one, the best score,most people do use credit card frequently and only a small number of people keep their cards relatively inactive.

In [14]:
plt.rcParams['figure.figsize'] = (13,4)

sns.distplot(df['PURCHASES'], color='orange', bins=150)

plt.title('Distribution of Purchases', size=20)

plt.xlabel('Purchases')

Text(0.5, 0, 'Purchases')
Out[14]:

Many people have the purchase amounts of 0 since earlier alot of people are holding zero balance cards.

In [15]:
plt.subplot(1,2,1)

sns.distplot(df['ONEOFF_PURCHASES'],color='green')

plt.title('Distribution of One Off Purchase', fontsize = 20)

plt.xlabel('Amount')

plt.subplot(1,2,2)

sns.distplot(df['INSTALLMENTS_PURCHASES'], color='red')

plt.title('Distribution of Installment Purchase', fontsize = 20)

plt.xlabel('Amount')

Text(0.5, 0, 'Amount')
Out[15]:

This still follows that same trend of zeros balance account. One off purchases go up as high as more than 40,000 dollars while the highest installment purchases go up to around 25,000 dollars.

In [16]:
plt.rcParams['figure.figsize'] = (16,15)

plt.subplot(2,2,1)

sns.scatterplot(df['PURCHASES'],df['CREDIT_LIMIT'])

plt.title('Credit Limit And Purchases', fontsize =20)

plt.xlabel('Purchases')

plt.ylabel('Credit limit')

plt.subplot(2,2,2)

sns.scatterplot(df['BALANCE'],df['CREDIT_LIMIT'])

plt.title('Credit Limit And Balance', fontsize =20)

plt.xlabel('Balance')

plt.ylabel('Credit limit')

plt.subplot(2,2,3)

sns.scatterplot(df['ONEOFF_PURCHASES'],df['CREDIT_LIMIT'])

plt.title('Credit Limit And One Off Purchases', fontsize =20)

plt.xlabel('One off purchases')

plt.ylabel('Credit limit')

plt.subplot(2,2,4)

sns.scatterplot(df['INSTALLMENTS_PURCHASES'],df['CREDIT_LIMIT'])

plt.title('Credit Limit And Installments Purchases', fontsize =20)

plt.xlabel('Installment Purchases')

plt.ylabel('Credit limit')

Text(0, 0.5, 'Credit limit')

Out[16]:

There seems to be no strong correlation between the credit limit and these variables.For most people, credit cards are tools for credit utilization rather than spending device.

As for balance, there seems to be a better correlation that as credit limit goes up balance also goes up but it is also clear to see that there are also points where balance stays at zeros but credit
limits do go up.

In [17]:
# Correlation matrix between features

correlations = df.corr()

f, ax = plt.subplots(figsize=(20,10))

sns.heatmap(correlations, annot = True);

Above is the heatmap of the dataset

Here we can take a closer look at the correlation with in the dataset. Purchases and one off purchase have very high correlation as we would expect at 0.92. This is the same for varaibles and their
frequency score counter parts such as cash advance trx and cash advance frequency at 0.8. Not surprisingly things like balance and payment have poor correlation. This tell us that the data do
make sense.

In [18]:
plt.rcParams['figure.figsize'] = (6,4)

sns.countplot(df['TENURE'], palette='rainbow')

plt.title('Counts of Tenures', fontsize = 20)

plt.xlabel('Months')

Text(0.5, 0, 'Months')
Out[18]:

Tenure is the repayment period of the cards, ranging from 6-12 months.Most of the cards are 12 months cards.

Model Building
Find the Optimal Number of Clusters Using Elbow Method

The elbow method is a heuristic method of interpretation and validation of consistency within cluster analysis designed to help find the appropriate number of clusters in a dataset.
If the line chart looks like an arm, then the "elbow" on the arm is the value of k that is the best.
Source:
https://en.wikipedia.org/wiki/Elbow_method_(clustering)
https://www.geeksforgeeks.org/elbow-method-for-optimal-value-of-k-in-kmeans/

In [19]:
# Scale the data first

scaler = StandardScaler()

df_scaled = scaler.fit_transform(df)

df_scaled.shape

(8950, 17)
Out[19]:

In [20]:
scores_1 = []

range_values = range(1,11)

for i in range_values:

kmeans = KMeans(n_clusters = i,init='k-means++', n_init=10, max_iter=300, random_state=0)

kmeans.fit(df_scaled)

scores_1.append(kmeans.inertia_)

plt.figure(figsize = (10,10))

plt.plot(scores_1,marker = 'o')

plt.xlabel('Values of K', fontsize = 10)

plt.title('The Elbow Method', fontsize = 15);

Here is the graph deplicting the elbow method used to find the optimum number clusters using kmean analysis

We tried different number of clusters from 1-10 and then we graph inertia or wcss (within clusters sum square) against the cluster number. Inertia is basically how close the datapoints in the clusters
are to the centers, which means the lower it is the more fitting the points are to their respective clusters. Here, we are trying to find the place where the wcss is as low as possible while still keeping
the number of clusters as low as possible.

Here the optimum number of clusters is 4 cluster since it is the place where the graph starts to flatten out meaning that having higher number of clusters will not yield a much more
fitting machine.

Apply K-Means Method

In [21]:
kmeans = KMeans(n_clusters=4, init='k-means++', n_init=10, max_iter=300, random_state=0)

kmeans.fit(df_scaled)

labels = kmeans.labels_ # Labels (cluster) associated to each data point

kmeans.cluster_centers_.shape

(4, 17)
Out[21]:

In [22]:
cluster_centers = pd.DataFrame(data=kmeans.cluster_centers_,

columns = [df.columns])

cluster_centers

Out[22]: BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQUEN

0 -0.321688 0.242574 0.109044 0.000926 0.255904 -0.366373 0.983721 0.317153 0.874

1 1.459578 0.384753 -0.234638 -0.163914 -0.253747 1.688972 -0.504848 -0.212939 -0.450

2 0.954485 0.462694 3.125845 2.713251 2.406470 -0.155091 1.136338 1.798653 1.065

3 -0.265552 -0.368944 -0.343190 -0.230500 -0.387798 -0.182691 -0.797823 -0.389437 -0.714

First customers cluster (Transactors): Those are customers who pay least amount of intrerest charges and careful with their money. Cluster with lowest balance, lowest cash advance, and percentage of full payment =
23%.
Second customers cluster (Revolvers): who use credit card as a loan (most lucrative sector): highest balance and cash advance, low purchase frequency, high cash advance frequency (0.5), high cash advance
transactions (16) and low percentage of full payment (3%).
Third customers cluster (VIP/Prime): high credit limit $16,000, and highest percentage of full payment, target for increase credit limit and increase spending habits.
Fourth customers cluster (Low tenure): these are customers with low tenure (7 years), low balance.

In [23]:
# Transformation
cluster_centers = scaler.inverse_transform(cluster_centers)

cluster_centers = pd.DataFrame(data = cluster_centers,

columns = [df.columns])

cluster_centers

Out[23]: BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQU

0 894.907458 0.934734 1236.178934 593.974874 642.478274 210.570626 0.885165 0.297070 0.7

1 4602.462714 0.968415 501.896219 320.373681 181.607404 4520.724309 0.287731 0.138934 0.1

2 3551.153761 0.986879 7681.620098 5095.878826 2587.208264 653.638891 0.946418 0.739031 0.7

3 1011.751528 0.789871 269.973466 209.853863 60.386625 595.759339 0.170146 0.086281 0.0

In [24]:
y_kmeans = kmeans.fit_predict(df_scaled)

y_kmeans

array([0, 1, 2, ..., 2, 0, 0])

Out[24]:

In [25]:
# Concatenate the clusters labels to our original dataframe

df_cluster = pd.concat([df, pd.DataFrame({'cluster':labels})], axis = 1)

df_cluster.head()

Out[25]: BALANCE BALANCE_FREQUENCY PURCHASES ONEOFF_PURCHASES INSTALLMENTS_PURCHASES CASH_ADVANCE PURCHASES_FREQUENCY ONEOFF_PURCHASES_FREQUENCY PURCHASES_INSTALLMENTS_FREQU

0 40.900749 0.818182 95.40 0.00 95.4 0.000000 0.166667 0.000000 0.0

1 3202.467416 0.909091 0.00 0.00 0.0 6442.945483 0.000000 0.000000 0.0

2 2495.148862 1.000000 773.17 773.17 0.0 0.000000 1.000000 1.000000 0.0

3 1666.670542 0.636364 1499.00 1499.00 0.0 205.788017 0.083333 0.083333 0.0

4 817.714335 1.000000 16.00 16.00 0.0 0.000000 0.083333 0.083333 0.0

In [26]:
# Plot the histogram of various clusters

for i in df.columns:

plt.figure(figsize=(35,5))

for j in range(7):

plt.subplot(1,7,j+1)

cluster = df_cluster[df_cluster['cluster'] == j]

cluster[i].hist(bins = 20)

plt.title('{} \nCluster {}'.format(i,j))

plt.show()

Apply PCA
Principal Component Analysis (PCA)
PCA is an unsupervised machine learning algorithm.
PCA performs dimensionality reductions while attemting at keeping the original information unchanged.
PCA works by trying to find a new set of features called components.
Components are composites of the uncorrelated given input features.

In [27]:
# Obtain the principal components

pca = PCA(n_components = 2)

principal_comp = pca.fit_transform(df_scaled)

principal_comp

array([[-1.68222054, -1.07645217],

Out[27]:
[-1.13830193, 2.50644962],

[ 0.96968128, -0.38351775],

...,

[-0.9262004 , -1.81077471],

[-2.33654769, -0.65795594],

[-0.55642242, -0.40046607]])

In [28]:
# Create a dataframe with the two components

pca_df = pd.DataFrame(data = principal_comp, columns = ['pca1', 'pca2'])

pca_df.head()

Out[28]: pca1 pca2

0 -1.682221 -1.076452

1 -1.138302 2.506450

2 0.969681 -0.383518

3 -0.873628 0.043159

4 -1.599433 -0.688578

In [29]:
# Concatenate the clusters labels to the dataframe

pca_df = pd.concat([pca_df, pd.DataFrame({'cluster':labels})], axis=1)

pca_df.head()

Out[29]: pca1 pca2 cluster

0 -1.682221 -1.076452 3

1 -1.138302 2.506450 1

2 0.969681 -0.383518 0

3 -0.873628 0.043159 3

4 -1.599433 -0.688578 3

In [30]:
label = kmeans.fit_predict(df_scaled)

df['label'] = label

plt.rcParams['figure.figsize'] = (12,8)

sns.scatterplot(df['BALANCE'],df['PURCHASES'], hue=df['label'], palette=['red','green','blue','yellow'])

plt.title('Clusters of Balance vs Purchases')

plt.xlabel('Balance')

plt.ylabel('Purchases')

Text(0, 0.5, 'Purchases')

Out[30]:

Here is the scatter plot of balance and purchases seperated by clusters

Here we can see that cluster 0 are high spenders with the highest balance while cluster 1 are people with higher balance but not as big of spenders. Cluster 2,3 are people who do not spend as
much and have relatively lower balance (down to zero).

In [31]:
plt.figure(figsize = (12,8))

ax = sns.scatterplot(x='pca1', y='pca2', hue='cluster',

data=pca_df,

palette = ['red','green','blue','yellow'])

plt.show()

CONCLUSION
In conclusion, with this information from our cluster analysis, as a credit card company, we could spend more time on marketing campaigns on the right people. People in cluster 0 and 1 clearly have the capacity to spend and
since they are already spending we could use their spending habits to optimize the strategies to get them to spend even more. The analysis also tells us that there is untapped potential in people from cluster 2 and 3. These
people already some balance but are not purchasing as much, with the right push we might be able to get them to use the card for spending and become important sources of revenue.

H17picoursetext PDF
No ratings yet
H17picoursetext PDF
301 pages
Apex Financial Services Loan Data Automation
No ratings yet
Apex Financial Services Loan Data Automation
18 pages
Staffing Plan Excel Template
No ratings yet
Staffing Plan Excel Template
10 pages
Domain Name System: Window Server 2012 R2
No ratings yet
Domain Name System: Window Server 2012 R2
46 pages
Operation Research Project: Scotsville Textile Mill
100% (1)
Operation Research Project: Scotsville Textile Mill
7 pages
Staffing Plan Excel Template
100% (2)
Staffing Plan Excel Template
10 pages
Observation: As We Can See We Have Threwe Types of Datatypes I.E. (Int, Float, Object) That Means We Have Both Categorical and Numerical Data
No ratings yet
Observation: As We Can See We Have Threwe Types of Datatypes I.E. (Int, Float, Object) That Means We Have Both Categorical and Numerical Data
2 pages
Company-List - Import Injection Machine From 2012 - 5001-6000
No ratings yet
Company-List - Import Injection Machine From 2012 - 5001-6000
63 pages
Violin 2020 Grade 1 PDF
No ratings yet
Violin 2020 Grade 1 PDF
18 pages
Liturgy of St. John (Eliz. English) - Staff Notation
100% (2)
Liturgy of St. John (Eliz. English) - Staff Notation
99 pages
Biography Text of BJ
100% (2)
Biography Text of BJ
3 pages
Week 2 Nursery
No ratings yet
Week 2 Nursery
12 pages
d04634 41 Value Sheet Nortrol Mu
100% (1)
d04634 41 Value Sheet Nortrol Mu
8 pages
Template 01 FIFO Inventory
No ratings yet
Template 01 FIFO Inventory
1,366 pages
Character When Relevant
No ratings yet
Character When Relevant
4 pages
UPES-CCE - MBA - SEM4 - Dissertation Topics
No ratings yet
UPES-CCE - MBA - SEM4 - Dissertation Topics
33 pages
Letter To Governor
No ratings yet
Letter To Governor
4 pages
O.R Case Study
100% (1)
O.R Case Study
7 pages
MRA Project Milestone 1 PDF
No ratings yet
MRA Project Milestone 1 PDF
1 page
Notis Georgiou, Portfolio
No ratings yet
Notis Georgiou, Portfolio
75 pages
Week3 Logistic Regression Post PDF
No ratings yet
Week3 Logistic Regression Post PDF
110 pages
PCL Cordinates
No ratings yet
PCL Cordinates
122 pages
Music Assignment 1
No ratings yet
Music Assignment 1
3 pages
Building Successful Internships - Lessons From The Research For Interns, Schools, and Employers
No ratings yet
Building Successful Internships - Lessons From The Research For Interns, Schools, and Employers
22 pages
Biol 224 Lab Manual
No ratings yet
Biol 224 Lab Manual
92 pages
3rd Attempt Revised
No ratings yet
3rd Attempt Revised
137 pages
Arabic Dynamic Farmland Objects Generator 1
No ratings yet
Arabic Dynamic Farmland Objects Generator 1
288 pages
Data Pre Processing 1
No ratings yet
Data Pre Processing 1
35 pages
ML Lab Manual
No ratings yet
ML Lab Manual
53 pages
Bank Loan
No ratings yet
Bank Loan
85 pages
Dự báo và phát triển kinh doanh
No ratings yet
Dự báo và phát triển kinh doanh
43 pages
Customer Churn Syntax
No ratings yet
Customer Churn Syntax
66 pages
Diapositivas Grupo Ingles
No ratings yet
Diapositivas Grupo Ingles
23 pages
ML LAB Prob 1 5
No ratings yet
ML LAB Prob 1 5
22 pages
AIML Lab Ex 3-5 - 1
No ratings yet
AIML Lab Ex 3-5 - 1
31 pages
Machine Tool Industry in India
No ratings yet
Machine Tool Industry in India
26 pages
Analytics Functions Demo
No ratings yet
Analytics Functions Demo
36 pages
Rowid Value Count Area MIN MAX Range: #Area MAX MIN Promedio Area Acumulad A Area Sobre La Curva
No ratings yet
Rowid Value Count Area MIN MAX Range: #Area MAX MIN Promedio Area Acumulad A Area Sobre La Curva
17 pages
Jurisprudence Syllabus - NAAC - New
No ratings yet
Jurisprudence Syllabus - NAAC - New
8 pages
Art Market
No ratings yet
Art Market
23 pages
ML Lab Manual 1-10
No ratings yet
ML Lab Manual 1-10
58 pages
Parental Involvement and The Reading Ability Skills of Grade Three Learners
No ratings yet
Parental Involvement and The Reading Ability Skills of Grade Three Learners
15 pages
Istanbul Aydin University Faculty of Engineering Department of Industrial Engineering
No ratings yet
Istanbul Aydin University Faculty of Engineering Department of Industrial Engineering
11 pages
56 Supreme Court Reports Annotated: Velarde vs. Court of Appeals
No ratings yet
56 Supreme Court Reports Annotated: Velarde vs. Court of Appeals
15 pages
Assignment2 VidulGarg
No ratings yet
Assignment2 VidulGarg
11 pages
T R O D P I: Echnical Eport ON Racle Atabase Erformance Ssues
No ratings yet
T R O D P I: Echnical Eport ON Racle Atabase Erformance Ssues
18 pages
Employee Turnover Analytics
No ratings yet
Employee Turnover Analytics
32 pages
ML Cops
No ratings yet
ML Cops
17 pages
Nikitha
No ratings yet
Nikitha
15 pages
Untitled 91
No ratings yet
Untitled 91
6 pages
Sumanca 1485 Cap
No ratings yet
Sumanca 1485 Cap
12 pages
Eny Eg LH5 PP LWRJQ AJCb 8 S65 HT0 Ty 8 Q
No ratings yet
Eny Eg LH5 PP LWRJQ AJCb 8 S65 HT0 Ty 8 Q
9 pages
Practical 3
No ratings yet
Practical 3
8 pages
Ref Res Pit Diamond
No ratings yet
Ref Res Pit Diamond
7 pages
Receivable Record (6-5-24)
No ratings yet
Receivable Record (6-5-24)
36 pages
Lab3.ipynb - Colaboratory
No ratings yet
Lab3.ipynb - Colaboratory
7 pages
MCP Sparse Sdpa
No ratings yet
MCP Sparse Sdpa
9 pages
Ficha TUT5528
No ratings yet
Ficha TUT5528
8 pages
Unit 2 - Assignment Brief 1
No ratings yet
Unit 2 - Assignment Brief 1
5 pages
Aosdijfpqoiew
No ratings yet
Aosdijfpqoiew
6 pages
Aligarh Slip 28 Jan 25
No ratings yet
Aligarh Slip 28 Jan 25
8 pages
20MIS1025 - Comparative Analysis - Ipynb - Colaboratory
No ratings yet
20MIS1025 - Comparative Analysis - Ipynb - Colaboratory
6 pages
Data Cleaning
No ratings yet
Data Cleaning
10 pages
Churn V2
No ratings yet
Churn V2
15 pages
Access Chamber
No ratings yet
Access Chamber
5 pages
Solving Linear Programming Problem Using LINGO Software: ABC Transistor Radio Company Code For Execution
No ratings yet
Solving Linear Programming Problem Using LINGO Software: ABC Transistor Radio Company Code For Execution
4 pages
10 Civics Ch-1 Notes
No ratings yet
10 Civics Ch-1 Notes
4 pages
Insurance Dataset Description2
No ratings yet
Insurance Dataset Description2
4 pages
Here Are 40 Common Accounting Interview Questions and Answers For Freshers
No ratings yet
Here Are 40 Common Accounting Interview Questions and Answers For Freshers
4 pages
Supplychain Businessmodel
No ratings yet
Supplychain Businessmodel
27 pages
Pat B.ing Kls 3
No ratings yet
Pat B.ing Kls 3
5 pages
Supply Chain Error Analysis Problems
No ratings yet
Supply Chain Error Analysis Problems
4 pages
Credit Card Default
No ratings yet
Credit Card Default
5 pages
IEEE 30 Bus
No ratings yet
IEEE 30 Bus
5 pages
Activins in Adipogenesis and Obesity: Review
No ratings yet
Activins in Adipogenesis and Obesity: Review
4 pages
Energy Use ANOVA Table
No ratings yet
Energy Use ANOVA Table
3 pages
Q 1
No ratings yet
Q 1
2 pages
Exp 10
No ratings yet
Exp 10
1 page
HSN May KB
No ratings yet
HSN May KB
2 pages
Usando El Programa Lindo para La Resolución
No ratings yet
Usando El Programa Lindo para La Resolución
2 pages
DSM V Adhd
No ratings yet
DSM V Adhd
1 page
Heart Diseases EDA
No ratings yet
Heart Diseases EDA
1 page
Gunluk - Plan 7 Ingilizce 33 39924
No ratings yet
Gunluk - Plan 7 Ingilizce 33 39924
1 page
Raya Contoh
No ratings yet
Raya Contoh
1 page
Expand All Collapse All
No ratings yet
Expand All Collapse All
1 page
House Price Prediction
No ratings yet
House Price Prediction
1 page
Project Manager or Project Designer or Project Architect
No ratings yet
Project Manager or Project Designer or Project Architect
4 pages
Millennium City: Death Comes Univited
From Everand
Millennium City: Death Comes Univited
Joygopal Podder
No ratings yet
You Can (Marathi)
From Everand
You Can (Marathi)
George Matthew Adams
No ratings yet
Kayaking with Eric Jackson: Strokes and Concepts
From Everand
Kayaking with Eric Jackson: Strokes and Concepts
Eric Jackson
No ratings yet