Aerofit - Business - Case - JupyterLab
C:\Users\Hp\AppData\Local\Temp\ipykernel_19688\4030016802.py:2: DeprecationWarning:
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
import pandas as pd
Data Cleaning
In [3]: df.head()
Out[3]: Product Age Gender Education MaritalStatus Usage Fitness Income Miles
In [4]: df.info()
localhost:8888/lab/workspaces/auto-a/tree/Python/Probability/Aerofit_business_case.ipynb 1/36
3/19/24, 8:58 PM Aerofit_business_case
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180 entries, 0 to 179
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Product 180 non-null object
1 Age 180 non-null int64
2 Gender 180 non-null object
3 Education 180 non-null int64
4 MaritalStatus 180 non-null object
5 Usage 180 non-null int64
6 Fitness 180 non-null int64
7 Income 180 non-null int64
8 Miles 180 non-null int64
dtypes: int64(6), object(3)
memory usage: 12.8+ KB
In [5]: df.describe(include='all')
Out[5]: Product Age Gender Education MaritalStatus Usage Fitness Income Miles
count 180 180.000000 180 180.000000 180 180.000000 180.000000 180.000000 180.000000
top KP281 NaN Male NaN Partnered NaN NaN NaN NaN
mean NaN 28.788889 NaN 15.572222 NaN 3.455556 3.311111 53719.577778 103.194444
std NaN 6.943498 NaN 1.617055 NaN 1.084797 0.958869 16506.684226 51.863605
min NaN 18.000000 NaN 12.000000 NaN 2.000000 1.000000 29562.000000 21.000000
25% NaN 24.000000 NaN 14.000000 NaN 3.000000 3.000000 44058.750000 66.000000
50% NaN 26.000000 NaN 16.000000 NaN 3.000000 3.000000 50596.500000 94.000000
75% NaN 33.000000 NaN 16.000000 NaN 4.000000 4.000000 58668.000000 114.750000
max NaN 50.000000 NaN 21.000000 NaN 7.000000 5.000000 104581.000000 360.000000
In [6]: df.shape
Out[6]: (180, 9)
In [7]: df.isnull().sum()
Out[7]: Product 0
Age 0
Gender 0
Education 0
MaritalStatus 0
Usage 0
Fitness 0
Income 0
Miles 0
dtype: int64
In [8]: # Clipping the 'Age' column values to be within the 5th and 95th percentiles
clipped_data = np.clip(df['Age'], np.percentile(df['Age'], 5), np.percentile(df['Age'], 95))
# Replacing the original 'Age' column with the clipped values
df["Age"]=clipped_data
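To make the effect of the clipping step concrete, here is a minimal sketch on a made-up series (the `ages` values are invented and are not the Aerofit data). Values below the 5th percentile are raised to that bound and values above the 95th percentile are lowered to it; everything in between is unchanged. `Series.clip` is used here as the pandas counterpart of the `np.clip` call above.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, not taken from the Aerofit dataset
ages = pd.Series([18, 19, 22, 24, 26, 28, 30, 33, 38, 45, 50])

# Percentile bounds used for clipping
lower = np.percentile(ages, 5)
upper = np.percentile(ages, 95)

# Values outside [lower, upper] are replaced by the nearest bound
clipped = ages.clip(lower=lower, upper=upper)

print("bounds:", lower, upper)
print(clipped.tolist())
```

Because the percentile bounds are usually not whole numbers, the clipped column ends up holding non-integer values, which is consistent with `Age` appearing as `float64` later in the notebook.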
else:
return 'High'
df['Price'] = df['Product'].apply(get_price)
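The cells that build the derived columns are only partly visible in this export. As a rough sketch of how such columns can be created, `pd.cut` bins a numeric column into labelled ranges and `Series.map` looks up one value per product. The bin edges, the labels, the name `price_by_product`, and the KP481 and KP781 prices below are placeholders chosen for illustration; only the KP281 price of 1500 appears in the notebook output.

```python
import pandas as pd

# Toy frame with the same column names as the notebook; values are invented
toy = pd.DataFrame({
    "Product": ["KP281", "KP481", "KP781", "KP281"],
    "Age": [20, 31, 27, 44],
})

# Hypothetical bin edges and labels (the notebook's actual cut-offs are not shown)
toy["Age Category"] = pd.cut(
    toy["Age"],
    bins=[0, 30, 100],
    labels=["Young", "Middle_aged"],
)

# Hypothetical price table; only the KP281 value is visible in the notebook output
price_by_product = {"KP281": 1500, "KP481": 1750, "KP781": 2500}
toy["Price"] = toy["Product"].map(price_by_product)

print(toy)
```

A dictionary lookup with `map` keeps the product-to-price relationship in one place, which is easier to audit than a chain of `if`/`else` branches.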
In [11]: df.head()
0 KP281 20.0 Male 14 Single 3 4 29562 112 Young Low 1500 High
In [12]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 180 entries, 0 to 179
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Product 180 non-null object
1 Age 180 non-null float64
2 Gender 180 non-null category
3 Education 180 non-null int64
4 MaritalStatus 180 non-null category
5 Usage 180 non-null category
6 Fitness 180 non-null category
7 Income 180 non-null int64
8 Miles 180 non-null int64
9 Age Category 180 non-null category
10 Income Category 180 non-null category
11 Price 180 non-null int64
12 Miles Category 180 non-null category
dtypes: category(7), float64(1), int64(4), object(1)
memory usage: 10.9+ KB
plt.figure(figsize=(12,8))
plt.subplot(2,3,1)
plt.boxplot(df["Age"])
plt.xlabel('Age')
plt.title('Box Plot of Age')
plt.subplot(2,3,2)
plt.boxplot(df["Education"])
plt.xlabel('Education')
plt.title('Box Plot of Education')
plt.subplot(2,3,3)
plt.boxplot(df["Usage"])
plt.xlabel('Usage')
plt.title('Box Plot of Usage')
plt.subplot(2,3,4)
plt.boxplot(df["Income"])
plt.xlabel('Income')
plt.title('Box Plot of Income')
plt.subplot(2,3,5)
plt.boxplot(df["Miles"])
plt.xlabel('Miles')
plt.title('Box Plot of Miles')
plt.subplot(2,3,6)
plt.boxplot(df["Fitness"])
plt.xlabel('Fitness')
plt.title('Box Plot of Fitness')
plt.subplots_adjust(wspace=0.5)
plt.tight_layout()
plt.show()
After cleaning the data (changing data types, adding derived columns, and checking the overall structure of the data, including outliers), we move on to the
next stage. During data cleaning we added new columns and identified the outliers in each numerical column.
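The box plots flag outliers visually. A minimal sketch of the same idea in numbers, assuming the usual 1.5 x IQR rule (the default whisker setting of matplotlib's `boxplot`), is shown below. The function name `count_iqr_outliers` and the mileage values are made up for this example.

```python
import pandas as pd

def count_iqr_outliers(values: pd.Series) -> int:
    """Count points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return int(((values < lower) | (values > upper)).sum())

# Invented mileage values with one obvious extreme point
miles = pd.Series([40, 55, 60, 66, 75, 85, 94, 100, 115, 360])
n_outliers = count_iqr_outliers(miles)
print(n_outliers)
```

Applied per column (for example to `df['Income']` or `df['Miles']`), this gives a count that can be compared against what the box plots show.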
plt.subplot(2, 3, 1)
sns.boxplot(x='Product', y='Age', data=df)
plt.subplot(2, 3, 2)
sns.boxplot(x='Product', y='Education', data=df)
plt.subplot(2, 3, 3)
sns.boxplot(x='Product', y='Usage', data=df)
plt.subplot(2, 3, 4)
sns.boxplot(x='Product', y='Income', data=df)
plt.subplot(2, 3, 5)
sns.boxplot(x='Product', y='Miles', data=df)
plt.tight_layout()
plt.show()
The plots above show the distribution of each numeric feature by product. For Age, KP281 and KP481 both range from about 20 to 45, while
KP781 ranges from about 25 to 37 with a few outliers around 40. The median age is more or less the same for all three products, but the most
concentrated age group is 23 to 33 for KP281, 24 to 34 for KP481, and 25 to 30 for KP781. For Education, KP281 and KP481 have the same
distribution, whereas KP781 buyers are more educated: their data starts at education level 14, which is the 25th percentile for both KP281 and
KP481, and most of them fall between levels 16 and 18. For Usage, the lower-end model is mostly used 3 to 4 times, the mid-range model falls
in a very narrow band of about 3 to 3.2, and the higher-end model is mostly used 4 to 5 times. For Income, KP781 is preferred by higher-income
customers, while only very few buy KP481; the majority buy KP781, then KP281, and then KP481. For Miles, customers run the most on KP781,
then KP481, and then KP281, with very few outliers.
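The ranges quoted above are read off the box plots by eye. A small sketch of how the same quartiles could be computed directly is given below, using an invented two-product frame rather than the Aerofit data.

```python
import pandas as pd

# Invented ages for two products, for illustration only
toy = pd.DataFrame({
    "Product": ["KP281"] * 5 + ["KP781"] * 5,
    "Age": [20, 23, 26, 30, 33, 25, 26, 27, 29, 30],
})

# 25th, 50th and 75th percentiles of Age within each product,
# i.e. the box edges and the median line of a box plot
quartiles = (
    toy.groupby("Product")["Age"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(quartiles)
```

Running the same `groupby` on `df` for each numeric column would give exact figures to quote in place of the approximate ones.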
Based on the age distribution, target marketing for KP281 and KP481 at individuals in the 20-45 age range. For KP781, focus on
individuals aged 25-37, highlighting the features that suit this age group.
Consider adding features to KP781 that cater to individuals in the 25-30 age range, as this seems to be a concentrated group for this product.
For KP281 and KP481, focus on features that appeal to the 23-33 and 24-34 age groups, respectively.
Since KP781 seems to be preferred by individuals with higher education levels, consider educational campaigns or partnerships to highlight the
benefits of this model to educated consumers.
Understand the reasons behind the observed usage patterns. For example, why is the moderate model (KP481) used less frequently, in the
3-3.2 range? This information can help tailor marketing or product strategies to increase usage.
Given the income distribution, consider adjusting pricing strategies to make KP781 more accessible to moderate-income individuals. This could
involve discounts or financing options.
Since people tend to run more miles with KP781, promote this model as suitable for individuals looking for high-performance equipment for
their fitness needs.
plt.show()
From the plots we can conclude that customers with low income buy only KP281, the lower-end model. The Moderate category is clustered around
KP281 and KP481, with very few buying the top-end model KP781. Surprisingly, customers with higher income are spread equally
across the three models. As for age, both young and middle-aged customers prefer the KP281 and KP481 models.
Focus marketing efforts for the KP281 model towards individuals with lower income, highlighting its affordability and value proposition. For
the KP781 model, emphasize its premium features and benefits to attract higher-income individuals.
Consider introducing more mid-range features in the KP481 model to cater to the moderate income group, potentially increasing its appeal and
market share.
Evaluate the pricing strategy for the KP781 model to make it more attractive to individuals with higher incomes. This could involve offering
discounts or promotions to encourage purchases.
Engage with customers in the moderate income category to understand their preferences and expectations better. This insight can be used to
enhance product features and improve customer satisfaction.
plt.subplot(2,2,3)
sns.countplot(x='Fitness', hue='Product', data=df)
plt.subplot(2,2,4)
sns.countplot(x='Gender', hue='Product', data=df)
plt.show()
It is evident from the graphs that both the partnered and single groups prefer the products in order from lower end to higher end, that is
KP281, KP481, KP781, although singles buy fewer products than partnered customers. By age, young customers buy the KP281 model more
than the other two, with KP481 second and KP781 third, while middle-aged customers prefer KP281 and KP481 equally and KP781 much less.
By fitness level, customers with lower fitness prefer the lower-end and mid-range products. Customers of average fitness strongly favor
KP281, with KP481 a few percent lower, and it is at this level that customers start to buy the higher-end model. Customers who are very fit
mostly buy KP781, and very few prefer KP281. Finally, by gender, males buy the higher-end model more, while only very few females buy it.
Males also prefer KP781 to KP481, while for females it is the opposite. KP281 is the top model for both genders.
Tailor marketing campaigns to highlight the affordability and value proposition of KP281 for singles, as they show a preference for this model.
For partnered individuals, emphasize the benefits and features of KP481, which seems to be their preferred choice.
Consider introducing features in KP281 and KP481 that cater to the preferences of young individuals, as they show a preference for these
models. For middle-aged individuals, focus on improving features in KP281 and KP481 to maintain their interest.
Since fitness level seems to influence product choice, consider incorporating features in KP781 that appeal to very fit individuals, such as
advanced performance tracking or training programs. For less fit individuals, emphasize ease of use and accessibility in KP281 and KP481.
Recognize the gender differences in product preferences and tailor marketing strategies accordingly. For example, target males with messaging
that highlights the premium features of KP781, while focusing on the value and versatility of KP281 for females.
Utilize the insights from age, marital status, fitness level, and gender preferences to segment the market effectively. This can help in
developing targeted marketing campaigns and product offerings that cater to specific segments.
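# Average miles per fitness level (definition not visible in this export; reconstructed from the plot labels below)
MilesRun_fitness = df.groupby('Fitness')['Miles'].mean()
average_miles_ran_product = df.groupby('Product')['Miles'].mean()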
plt.figure(figsize=(8, 6))
plt.subplot(1,2,1)
sns.barplot(x=MilesRun_fitness.index, y=MilesRun_fitness.values)
plt.xlabel('Fitness Level')
plt.ylabel('Average Miles ran')
plt.title('Average miles ran by Fitness Level')
plt.subplots_adjust(wspace=0.5)
plt.subplot(1,2,2)
sns.barplot(x=average_miles_ran_product.index, y=average_miles_ran_product.values)
plt.xlabel('Product')
plt.ylabel('Average miles')
plt.title('Average miles ran in Product')
plt.show()
Product KP281 has an average of approximately 82.79 miles run, and product KP481 is only slightly higher at approximately 87.93 miles; both
correspond to the lower range of the fitness scale. Product KP781 has a significantly higher average of approximately 166.90 miles run,
indicating that it is associated with higher fitness levels. KP281 and KP481 may therefore be more suitable for individuals with lower fitness
levels or those just starting their fitness journey, while KP781 seems to be more popular among individuals with higher fitness levels who run
longer distances.
For individuals looking to improve their fitness or increase their running mileage, products like KP281 or KP481 could be suitable starting
points. Those already at a higher fitness level or aiming to increase their running distance may find product KP781 more suitable.
Correlation Matrix
In [19]: numeric_cols = df.select_dtypes(include=['number'])
numeric_cols = numeric_cols.drop('Price', axis=1)
# Calculate the correlation matrix
correlation_matrix = numeric_cols.corr()
Age and Education: There is a weak positive correlation of approximately 0.28 between age and education, indicating that education tends to be
higher as age increases. This could suggest that older individuals in this dataset tend to have higher levels of education.
Age and Income: There is a moderate positive correlation of approximately 0.51 between age and income. This suggests that older individuals
tend to have higher incomes, which is a common trend in many societies due to career advancement and experience.
Age and Miles: The correlation between age and miles run is very low (0.04), indicating essentially no linear relationship between age and the
number of miles run. This suggests that age does not directly influence the distance people run.
Age and Price: The correlation between age and price is also very low (0.02), indicating essentially no linear relationship between age and the
price of the product.
Education and Income: There is a positive correlation of approximately 0.63 between education and income, indicating that individuals with
higher education levels tend to have higher incomes. This suggests that education may play a role in determining income levels.
Education and Miles: There is a moderate positive correlation of approximately 0.31 between education and miles run, suggesting that
individuals with higher education levels may engage in more running or fitness activities.
Education and Price: There is a moderate positive correlation of approximately 0.56 between education and price, indicating that products
with higher prices may be more popular among individuals with higher education levels.
Income and Miles: There is a moderate positive correlation of approximately 0.54 between income and miles run, suggesting that individuals
with higher incomes may engage in more running or fitness activities.
Income and Price: There is a strong positive correlation of approximately 0.70 between income and price, indicating that products with higher
prices are more likely to be purchased by individuals with higher incomes.
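For readers who want to see what a single entry of the correlation matrix measures, the sketch below computes a Pearson coefficient by hand and compares it with pandas. The numbers are invented and are not the Aerofit data; the point is only that `.corr()` (whose default method is Pearson) returns the same quantity.

```python
import numpy as np
import pandas as pd

# Invented values, not the Aerofit data
toy = pd.DataFrame({
    "Income": [30, 40, 50, 60, 90],
    "Miles": [50, 80, 70, 120, 200],
})

x = toy["Income"].to_numpy(dtype=float)
y = toy["Miles"].to_numpy(dtype=float)

# Pearson r: sum of products of the centred variables,
# divided by the product of their Euclidean norms
xc = x - x.mean()
yc = y - y.mean()
r_manual = (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

# pandas computes the same quantity (method="pearson" is the default)
r_pandas = toy["Income"].corr(toy["Miles"])

print(round(r_manual, 4), round(r_pandas, 4))
```

The coefficient only captures linear association, so a value near zero does not rule out a non-linear relationship, and a high value does not by itself show that one variable drives the other.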
Target marketing efforts towards older individuals, as they tend to have higher levels of education and income, making them potentially more
receptive to higher-priced products. Consider offering discounts or promotions to attract younger individuals, as they may be more price-
sensitive. Develop products or services that cater to individuals with higher education levels, as they may be willing to pay more for premium
offerings. Focus on promoting the health and fitness benefits of products to individuals with higher education and income levels, as they may
be more interested in maintaining a healthy lifestyle.
plt.figure(figsize=(6, 4))
sns.countplot(x='Gender', data=df)
plt.title('Count of Gender')
plt.show()
plt.figure(figsize=(10, 6))
# Bivariate Analysis
plt.figure(figsize=(12, 6))
sns.histplot(x='Age', hue='Product', data=df, multiple='stack')
plt.title('Age vs Product Purchased')
plt.show()
C:\Users\Hp\AppData\Local\Temp\ipykernel_19688\3855968921.py:3: UserWarning:
Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).
For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
sns.distplot(df['Age'])
Age Distribution: The average age of the customers is approximately 28.64 years, with a standard deviation of 6.45 years. Ages range from
20 to 43.05 years (after clipping to the 5th and 95th percentiles), with most customers falling between 24 and 33 years old. The distribution is
slightly positively skewed, as the mean (28.64) is greater than the median (26).
Gender Distribution: There are more male customers (104) than female customers (76).
Income Distribution: The average income of customers is approximately $53,719.58, with a standard deviation of $16,506.68. Incomes range
from $29,562 to $104,581, with most customers earning between $44,058.75 and $58,668.
Count of Each Product Purchased by Age Group: Product KP281 is most popular among younger customers, with the highest counts in the 20-25
age range. Product KP481 is popular among customers aged 25-35, with the highest counts in the 25-30 age range. Product KP781 is also
popular among customers aged 25-35, with the highest counts in the 25-30 age range.
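Comparing the mean with the median is a quick heuristic for skew. A minimal sketch of a more direct check, using the sample skewness from pandas on an invented series (not the Aerofit ages), could look like this:

```python
import pandas as pd

# Invented ages with a longer right tail
ages = pd.Series([20, 22, 23, 24, 25, 26, 27, 30, 38, 43])

mean_age = ages.mean()
median_age = ages.median()
skewness = ages.skew()  # sample skewness; positive means a longer right tail

print(mean_age, median_age, round(skewness, 3))
```

On the notebook's data the equivalent call would be `df['Age'].skew()`; a positive value would support the "slightly positively skewed" reading.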
Targeting Strategies: For Product KP281, focus marketing efforts on younger customers aged 20-25, as they make up a significant portion of
the customer base for this product. For Product KP481 and KP781, target customers aged 25-35, as they are the primary age group for these
products. Product Development: Consider developing new products or modifying existing ones to cater to the preferences and needs of
different age groups. For example, products targeting older customers could focus on advanced features or specialized functionalities. Pricing
Strategies: Since customers with higher education levels and incomes tend to purchase higher-priced products, consider offering premium
features or services for these products to justify the higher prices. Gender-Specific Marketing: Tailor marketing campaigns to appeal to both
male and female customers, considering their different preferences and purchasing behaviors. Customer Engagement: Engage with customers
in the 25-35 age range through targeted promotions, loyalty programs, and personalized recommendations to enhance customer satisfaction
and loyalty.
Marginal Probability
In [21]: # Create a contingency table for Product Purchased with marginal probability
marginal_probability = pd.crosstab(index=df['Product'], columns='count', normalize=True)
print(marginal_probability)
col_0 count
Product
KP281 0.444444
KP481 0.333333
KP781 0.222222
KP281: Approximately 44.44% of the entries correspond to this product. KP481: Approximately 33.33% of the entries correspond to this
product. KP781: Approximately 22.22% of the entries correspond to this product.
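A marginal probability here is simply a product's count divided by the total number of rows. The sketch below reproduces the printed proportions with `value_counts(normalize=True)`, assuming counts of 80, 60 and 40, which are the counts implied by the proportions above for 180 rows; the raw counts themselves are not printed in the notebook.

```python
import pandas as pd

# Counts implied by the printed proportions for 180 rows (an assumption, see text)
products = pd.Series(
    ["KP281"] * 80 + ["KP481"] * 60 + ["KP781"] * 40,
    name="Product",
)

# Marginal probability of each product = its count / total number of rows
marginal = products.value_counts(normalize=True).sort_index()
print(marginal.round(6))
```

`value_counts(normalize=True)` and `pd.crosstab(..., normalize=True)` on a single column express the same calculation; the former returns a Series, the latter a one-column table.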
Age Category: Younger customers (age category "Young") have the highest probability of buying product KP281 (0.469), while middle-aged
customers have similar probabilities for all products. Age category seems to have a varying impact on the probability of buying different
products.
Income Category: Customers with low income have a 100% probability of buying product KP281, while those with high income have a higher
probability of buying KP781. Income category strongly influences the probability of buying different products.
Miles Category: Customers who run fewer miles ("Low" and "Medium" categories) have a higher probability of buying product KP281.
Customers who run more miles ("High" and "Very High" categories) have a higher probability of buying product KP481.
Gender: Females have a higher probability of buying product KP281, while males have a higher probability of buying KP481. Gender plays a
role in product preference.
Marital Status: There is a slight difference in the probability of buying products based on marital status, but it's not as pronounced as other
factors.
Fitness: Customers with higher fitness levels (e.g., fitness level 5) have a higher probability of buying product KP781, which might be more
suitable for advanced fitness enthusiasts.
Usage: Customers who use the product more frequently (e.g., usage level 5) have a higher probability of buying product KP781.
Education: Customers with lower education levels (e.g., education level 12, 13, 14) have a higher probability of buying product KP281, while
those with higher education levels (e.g., education level 20, 21) have a higher probability of buying product KP781.
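The cells that produced these conditional probabilities are not visible in this export. One common way to obtain figures of this kind is `pd.crosstab` with `normalize='index'`, sketched below on a small invented frame (the rows are made up and do not reproduce the Aerofit numbers).

```python
import pandas as pd

# Invented rows; column names follow the notebook
toy = pd.DataFrame({
    "Income Category": ["Low", "Low", "Moderate", "Moderate", "Moderate",
                        "High", "High", "High"],
    "Product": ["KP281", "KP281", "KP281", "KP481", "KP481",
                "KP281", "KP481", "KP781"],
})

# normalize="index" divides each row by its row total, so each row
# is P(Product | Income Category) and sums to 1
conditional = pd.crosstab(
    toy["Income Category"], toy["Product"], normalize="index"
)
print(conditional)
```

Swapping the two arguments, or using `normalize='columns'`, would instead give P(Income Category | Product), which answers a different question, so it is worth stating which direction a quoted probability refers to.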
Marketing Strategy: Target marketing efforts towards younger customers for product KP281, highlighting its appeal to this demographic.
Focus on marketing KP481 to customers with higher incomes and who run more miles, as they are more likely to purchase this product.
Promote product KP781 to customers with higher fitness levels, higher usage, and higher education levels.
Product Development: Consider developing product features that cater to the preferences and needs of customers with different income
levels, fitness levels, and usage patterns. Create marketing campaigns that emphasize the benefits of each product based on the target
customer's profile.
Customer Segmentation: Segment customers based on their age, income, miles run, gender, marital status, fitness level, usage, and education
to tailor marketing strategies and product offerings accordingly. Use customer segmentation to offer personalized recommendations and
promotions, enhancing customer satisfaction and loyalty.
Age Category
Younger customers (age category "Young") have the highest probability of buying product KP281 (0.469), while middle-aged customers have
similar probabilities for all products. Age category seems to have a varying impact on the probability of buying different products.
Focus marketing efforts for product KP281 towards younger customers, highlighting features or benefits that appeal to this demographic.
Consider different marketing strategies for middle-aged customers to increase their interest in all products, as they seem to have a more
balanced preference.
Probability of Female and Male customers who are Married and Single buying a
Product
In [25]: # Calculate the cross-tabulation of 'MaritalStatus' and 'Gender'
pd.crosstab(df['MaritalStatus'], df['Gender'], normalize='all',margins=True)
MaritalStatus
Partnered females have a lower probability of buying products compared to partnered males. Single females have a lower probability of buying
products compared to single males. Overall, males have a higher probability of buying products compared to females, regardless of marital
status.
Tailor marketing strategies to appeal to partnered females, highlighting product benefits that resonate with this demographic. Consider
offering promotions or discounts that specifically target single females to increase their likelihood of purchasing products. Engage with both
partnered and single females to understand their preferences and reasons for lower purchase probabilities, then adjust marketing strategies
accordingly. Offer personalized recommendations or loyalty programs to increase the overall purchase probability of females.
Probability of Usage and Fitness, Fitness and Miles Category using the Product
In [40]: # Calculate the cross-tabulation of 'Usage' and 'Fitness'
a=pd.crosstab(df['Usage'], df['Fitness'],normalize=True,margins=True)
# Calculate the cross-tabulation of 'Miles Category' and 'Fitness'
b=pd.crosstab(df['Miles Category'], df['Fitness'],normalize=True,margins=True)
print(a)
print()
print(b)
Fitness 1 2 3 4 5 All
Usage
2 0.005556 0.077778 0.100000 0.000000 0.000000 0.183333
3 0.005556 0.055556 0.261111 0.055556 0.005556 0.383333
4 0.000000 0.011111 0.166667 0.038889 0.072222 0.288889
5 0.000000 0.000000 0.011111 0.033333 0.050000 0.094444
6 0.000000 0.000000 0.000000 0.005556 0.033333 0.038889
7 0.000000 0.000000 0.000000 0.000000 0.011111 0.011111
All 0.011111 0.144444 0.538889 0.133333 0.172222 1.000000
Fitness 1 2 3 4 5 All
Miles Category
Low 0.011111 0.133333 0.111111 0.000000 0.000000 0.255556
Medium 0.000000 0.011111 0.238889 0.016667 0.005556 0.272222
High 0.000000 0.000000 0.166667 0.044444 0.011111 0.222222
Very High 0.000000 0.000000 0.022222 0.072222 0.155556 0.250000
All 0.011111 0.144444 0.538889 0.133333 0.172222 1.000000
Fitness and Usage: Customers with higher fitness levels (4 and 5) have a higher probability of using the product more frequently (usage 3, 4,
and 5). There is a noticeable increase in the probability of higher fitness levels as the usage level increases. Fitness and Miles Category:
Customers with higher fitness levels (4 and 5) have a higher probability of running more miles (high and very high miles categories). There is a
clear relationship between fitness level and miles run, with higher fitness levels associated with running more miles.
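The tables above hold joint probabilities (each cell is a share of all 180 customers), so statements such as "given usage 4, how likely is fitness 5" need one extra step: divide the joint cell by the marginal of the conditioning event. A short worked sketch using three values copied from the Usage by Fitness table:

```python
# Values copied from the Usage x Fitness table printed above
p_usage4_and_fitness5 = 0.072222   # joint: P(Usage = 4, Fitness = 5)
p_usage4 = 0.288889                # marginal: P(Usage = 4), from the "All" column
p_fitness5 = 0.172222              # marginal: P(Fitness = 5), from the "All" row

# Conditional probability = joint / marginal of the conditioning event
p_fitness5_given_usage4 = p_usage4_and_fitness5 / p_usage4
p_usage4_given_fitness5 = p_usage4_and_fitness5 / p_fitness5

print(round(p_fitness5_given_usage4, 3))
print(round(p_usage4_given_fitness5, 3))
```

The two results differ (roughly 0.25 versus 0.42), which illustrates why the direction of conditioning matters when reading these tables.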
Develop products or features that cater to customers with higher fitness levels, as they are more likely to use the product frequently and run
more miles. Consider offering advanced training programs or personalized coaching for customers with higher fitness levels to enhance their
experience and encourage product loyalty. Target marketing campaigns towards customers with higher fitness levels, highlighting product
benefits that align with their fitness goals and activities. Create promotions or incentives to encourage customers to increase their usage and
mileage, particularly for those with higher fitness levels.
Miles Category
Males have a higher probability of falling into the "High" and "Very High" miles categories compared to females. Females have a higher
probability of falling into the "Low" and "Medium" miles categories compared to males.
Tailor marketing efforts towards males for products or services related to high-mileage activities, highlighting features that appeal to this
demographic. Target females with marketing campaigns that emphasize the benefits of products or services suited for low to medium-mileage
activities. Develop products or services that cater to the specific needs and preferences of each gender based on their mileage categories.
Consider offering different product variations or features that target the different mileage categories to attract a wider range of customers.
Out[30]: Fitness 1 2 3 4 5
Product Gender
Fitness and Product by Gender: For product KP281, both females and males show an increasing probability of higher fitness levels (3, 4, and 5).
For product KP481, females show a higher probability of having fitness levels 3 and 4, while males show a higher probability of having fitness
levels 3, 4, and 5. For product KP781, females show a higher probability of having fitness level 5, while males show a higher probability of
having fitness levels 4 and 5.
Position product KP281 as suitable for individuals looking for a product that supports higher fitness levels, targeting both males and females.
Position product KP481 towards females as it seems to appeal more to those with moderate fitness levels (3 and 4). Position product KP781
towards males as it appears to appeal more to those with higher fitness levels (4 and 5). Tailor marketing messages for each product based on
the fitness preferences of each gender, highlighting how the product aligns with their fitness goals. Consider offering gender-specific
promotions or discounts to further target each demographic effectively.
Gender           Female      Male
Product Usage
KP281   3      0.513514  0.486486
        4      0.318182  0.681818
        5      0.500000  0.500000
KP481   3      0.451613  0.548387
        4      0.416667  0.583333
        5      1.000000  0.000000
KP781   4      0.111111  0.888889
        5      0.250000  0.750000
        6      0.285714  0.714286
        7      0.000000  1.000000
Usage and Product by Gender: For product KP281, females have a higher probability of falling into the "2" and "3" usage categories, while
males have a higher probability for the "4" usage category. For product KP481, there is a more balanced distribution between females and
males across different usage categories, except for the "5" category, where all purchases are from females. For product KP781, females are
more likely to fall into the "5" and "6" usage categories, while males are more likely to fall into the "3" and "4" categories.
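The cell that generated this table is not visible in the export. A table of this shape can be produced by passing two index columns to `pd.crosstab` and normalizing by row, as in the sketch below; the rows are invented and do not reproduce the notebook's numbers.

```python
import pandas as pd

# Invented rows; column names follow the notebook
toy = pd.DataFrame({
    "Product": ["KP281", "KP281", "KP281", "KP481", "KP481",
                "KP781", "KP781", "KP781"],
    "Usage": [3, 3, 4, 3, 5, 4, 4, 4],
    "Gender": ["Female", "Male", "Male", "Female", "Female",
               "Male", "Male", "Female"],
})

# Rows are (Product, Usage) pairs; normalize="index" makes each row
# the gender split within that pair, P(Gender | Product, Usage)
by_gender = pd.crosstab(
    index=[toy["Product"], toy["Usage"]],
    columns=toy["Gender"],
    normalize="index",
)
print(by_gender)
```

Rows built from only a handful of customers can easily show 0 or 1, so it is worth checking the raw counts (the same call without `normalize`) before drawing conclusions from an extreme row.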
Position product KP281 towards females for lower usage categories (2 and 3) and towards males for higher usage categories (4 and 5). Position
product KP481 as a gender-neutral product, as there is a balanced distribution of usage across genders, except for the "5" category, which is
predominantly female. Position product KP781 towards females for higher usage categories (5 and 6) and towards males for moderate usage
categories (3 and 4). Develop marketing campaigns that highlight how each product meets the usage preferences of different genders. Offer
promotions or discounts targeted towards each gender based on their usage preferences to increase product adoption and loyalty.
Age Category: All three products are popular among young customers, indicating a trend towards fitness among younger demographics.
Gender: KP281 is more popular among females, while KP481 and KP781 are more popular among males. This suggests a gender preference for
different treadmill models.
Education: Customers across all products have a similar education level, with the majority having completed 16 years of education.
Marital Status: Partnered individuals are more likely to purchase all three products, indicating a potential market segment that values fitness
as a couple or family activity.
Usage: Customers across all products have similar usage patterns, with an average usage of 3-4 times per week.
Fitness: Customers purchasing KP781 tend to rate their fitness level higher, suggesting that this product may cater to more fitness-conscious
individuals.
Income Category: KP281 is popular among customers with moderate income, while KP481 and KP781 are more popular among customers with
high income, indicating a price sensitivity and premium market segment.
Miles Category: Customers purchasing KP781 expect to run/walk the highest number of miles per week, followed by KP481 customers and then
KP281 customers.
Marketing Strategy: Target young, fitness-conscious individuals for all products but tailor the messaging to appeal to the specific gender
preferences. Product Features: Highlight differentiating features of each product based on customer preferences, such as premium features for
KP481 and advanced fitness tracking for KP781. Price Positioning: Consider pricing strategies that align with the income categories of the
target market for each product. Customer Engagement: Offer personalized training plans, challenges, or rewards programs to encourage
regular usage and engagement. Partnerships: Collaborate with fitness influencers or celebrities who resonate with the target demographics for
each product.