Final Project - Data Analytics Case 1
Team
Omer Rahim
Khawar Malik
13/12/2024
Chapter 1: Background, Problem Definition and Research Questions
1.1 Background
Classic Models is a B2B company dedicated to the sale and distribution of scale replicas
of classic vehicles, including cars, motorcycles, planes, ships, trains, trucks, buses, and vintage
cars. These products are designed for children and collectors of all ages who are passionate about
vehicles and scale models. Its primary customers are specialized wholesalers, such as toy stores,
For this project, we have a database that supports key aspects of commercial operations,
management, orders, offices, deliveries, and returns. This enables the company to understand and
track sales and business processes through data analysis. It will also identify potential problems and
revenue.
performance, key product sales, customer segmentations and strategies to optimize sales,
managing inventory levels effectively, as improper handling can disrupt supply chain operations
and adversely impact sales performance across different markets, product lines, and customer
segments. Due to the large number of products managed by Classic Models, there is a possibility
of inadequate inventory management, which could lead to serious problems both inside and outside
the company.
If demand is not met effectively, the company could be negatively impacted in areas such
as reputation, brand perception, and loss of sales, which in turn could lead to the loss of customers
and, consequently, affect profitability. Additionally, the lack of knowledge about inventory and
market trends presents a significant risk. However, understanding the sales performance of the
top-performing markets, tracking sales trends across product lines using geospatial data such as
regional location, and giving customers ways to increase their purchasing power through credit
lines will not only boost revenue but also help the business understand the market, make
data-driven decisions, and close the gap between the business and its customers.
Figure 1- ER Diagram
1. Customers: Classic Models' information about customers includes names, addresses, and
2. Employees: Contains details about the company's employees, including their names, job titles,
3. Orders: Records the orders placed by customers. With it, it is possible to validate the
4. OrderDetails: Provides detailed data about the products in each order, including
Motorcycles). This table is crucial for analyzing sales trends by product category.
1. What is the relationship in sales performance between France, Spain, and USA?
1.4.1 The Importance of Data Analysis in each research question
With data analysis, strategies can be identified and generated to mitigate difficulties related
to inventory management. For example, by understanding performance by region, the business can
identify which areas demand higher quantities of products and how to manage this to maintain a
controlled supply based on demand. It is also possible to identify sales trends by product line,
which is key to understanding which items have higher turnover in the inventory, helping us
customer groups based on their preferences allows us to adjust inventory levels to meet specific
demands according to segmentation. Finally, identifying credit and spending patterns, and
understanding how they relate, can provide relevant information for future strategies in credit
Hypothesis One: What is the relationship in sales performance between France, Spain, and USA?
This hypothesis aims to identify whether sales performance differs between regions. Doing so
makes it possible to adapt inventory capacity to the needs of each region and to avoid supply
chain disruptions or excess inventory by location. It is also important to understand the
purchasing power of the different markets, knowing how to price products to
Lastly, this insight can help the business identify the marketing and sales strategies that
work best across all markets and regions, and how to improve the low-performing market
territories.
• Null Hypothesis (H₀): There is no significant sales trend in different product lines.
• Alternative Hypothesis (H₁): There is a significant sales trend in different product lines.
Identifying the sales trend in different product lines will help the business understand which
product lines or models are selling and will inform its sales strategies so that resources are
not wasted. For example, the business can focus on the products that are selling well, or it
can choose to invest more marketing in the products that are not selling.
Moreover, understanding the sale pattern aids in data-driven decisions and efficient
management of resources. This leads to better returns on investment and sales efforts.
• Null Hypothesis (H₀): There are no changes in customer trends based on segmentation.
segmentation.
Another key aspect of the business is understanding the background and segmentation of
customers such as the age group. This will help the business to introduce customers and
optimize customer-focused strategies with respect to the customer’s preference, demographic,
Again, the insight gained from this hypothesis is crucial for resource allocation and targeted
marketing efforts. The business can invest in high growth segments and adjust strategies for
Lastly, identifying the trends in customer segmentation supports long-term planning and
competitiveness. The business can stay informed and be proactive to respond to evolving
market dynamics, staying ahead of competitors by addressing new opportunities and mitigating
Hypothesis Four: What is the relationship between credit limit and amount spent?
• Null Hypothesis (H₀): There is no relationship between the credit limit and the amount
spent.
• Alternative Hypothesis (H₁): There is a relationship between the credit limit and the
amount spent.
Testing the hypothesis about the correlation of credit limits and the amount spent could get
close correlation between the credit limit and amount spent, the business can introduce
policies that raise credit limits to increase customers' purchasing power.
The business can also identify the risk level of each customer segment and, as a result,
identify high-value customers who can be targeted with marketing and individual credit
policies to maximize revenue. Some strategic decision-making regarding financing of
Hypothesis One: To answer research question one, it is necessary to query the database to
identify the three countries with the highest total sales. For this, we create a temporary
table using a common table expression (CTE). The CTE joins the customers, orders, and
orderdetails tables, groups the results by country, orders them by total sales, and limits the
output to the top three countries. In addition, orders whose status is cancelled are filtered
out.
Once this temporary table is obtained, we proceed to perform the main query, which
returns the country name, the order number, the product code, the quantity ordered,
the price per unit, the line total, the date of the order, and the order status. To perform this query, it
Figure 2
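The query described for Hypothesis One can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual query: it uses Python's built-in sqlite3 module with an in-memory toy database, the table and column names (customerNumber, quantityOrdered, priceEach, and so on) are assumed from the ER diagram, every row is invented, and joining the main query back to the CTE is one plausible way to restrict it to the top three countries.

```python
import sqlite3

# Toy stand-in for the project database. Table and column names are assumed
# from the report's ER diagram; every row below is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, customerName TEXT, country TEXT);
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, orderDate TEXT, status TEXT, customerNumber INTEGER);
CREATE TABLE orderdetails (orderNumber INTEGER, productCode TEXT, quantityOrdered INTEGER, priceEach REAL);
INSERT INTO customers VALUES (1,'A','USA'),(2,'B','France'),(3,'C','Spain'),(4,'D','Norway');
INSERT INTO orders VALUES
  (10,'2004-01-05','Shipped',1),(11,'2004-02-10','Shipped',2),
  (12,'2004-03-15','Shipped',3),(13,'2004-04-20','Shipped',4),
  (14,'2004-05-25','Cancelled',4);
INSERT INTO orderdetails VALUES
  (10,'S10_1',30,100.0),(11,'S10_2',20,90.0),(12,'S10_3',10,80.0),
  (13,'S10_4',5,50.0),(14,'S10_5',99,200.0);
""")

TOP3_QUERY = """
WITH top_countries AS (
    -- CTE: three countries with the highest non-cancelled sales
    SELECT c.country, SUM(od.quantityOrdered * od.priceEach) AS totalSales
    FROM customers c
    JOIN orders o ON o.customerNumber = c.customerNumber
    JOIN orderdetails od ON od.orderNumber = o.orderNumber
    WHERE o.status <> 'Cancelled'
    GROUP BY c.country
    ORDER BY totalSales DESC
    LIMIT 3
)
SELECT c.country, o.orderNumber, od.productCode, od.quantityOrdered,
       od.priceEach, od.quantityOrdered * od.priceEach AS lineTotal,
       o.orderDate, o.status
FROM customers c
JOIN orders o ON o.customerNumber = c.customerNumber
JOIN orderdetails od ON od.orderNumber = o.orderNumber
JOIN top_countries t ON t.country = c.country
WHERE o.status <> 'Cancelled'
ORDER BY c.country, o.orderNumber
"""

rows = conn.execute(TOP3_QUERY).fetchall()
```

In this toy data the cancelled order is large enough that it would change the ranking if it were not filtered out, which is why the status filter appears both in the CTE and in the main query.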
The next step is to prepare the data for analysis by checking for outliers and then
standardizing the dataset to avoid bias and improve model accuracy.
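The report does not say which standardization was applied. A common choice is the z-score, shown here as a minimal sketch; the function name standardize and the sample values are invented for illustration.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(x - m) / s for x in values]


# Invented line totals, used only to show the transformation.
line_totals = [531.0, 1200.0, 2500.0, 3100.0, 4800.0]
z_scores = standardize(line_totals)
```

After this transformation the values have mean 0 and sample standard deviation 1, so variables measured on different scales (for example quantity and price) become comparable.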
Hypothesis Two: As we did for question one, it is important to analyze the monthly sales
trends by product line, for example to identify whether a given product line shows
consistent sales or seasonal peaks. For this reason, a main query is developed that retrieves
the product lines, sorts the results by date, and returns the total sales value for each
product line and period. In addition, to make the results clearer, orders with a cancelled
status are filtered out.
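A minimal sketch of such a monthly product-line query, again using Python's sqlite3 module and an invented toy dataset. The products table, the column names, and the use of SQLite's strftime to extract the month are assumptions; the project database may use a different SQL dialect and date function.

```python
import sqlite3

# Invented toy data; table and column names are assumed from the ER diagram.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (productCode TEXT PRIMARY KEY, productLine TEXT);
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, orderDate TEXT, status TEXT);
CREATE TABLE orderdetails (orderNumber INTEGER, productCode TEXT, quantityOrdered INTEGER, priceEach REAL);
INSERT INTO products VALUES ('P1','Classic Cars'),('P2','Motorcycles');
INSERT INTO orders VALUES
  (1,'2004-01-10','Shipped'),(2,'2004-01-20','Shipped'),
  (3,'2004-02-05','Shipped'),(4,'2004-02-07','Cancelled');
INSERT INTO orderdetails VALUES
  (1,'P1',10,100.0),(2,'P1',5,100.0),(2,'P2',4,50.0),
  (3,'P2',6,50.0),(4,'P1',50,100.0);
""")

MONTHLY_QUERY = """
SELECT p.productLine,
       strftime('%Y-%m', o.orderDate) AS orderMonth,
       SUM(od.quantityOrdered * od.priceEach) AS totalSales
FROM orderdetails od
JOIN orders o ON o.orderNumber = od.orderNumber
JOIN products p ON p.productCode = od.productCode
WHERE o.status <> 'Cancelled'
GROUP BY p.productLine, orderMonth
ORDER BY p.productLine, orderMonth
"""

monthly_sales = conn.execute(MONTHLY_QUERY).fetchall()
```

Each returned row is one (product line, month, total sales) combination, which is the shape needed for a trend chart or for grouping by product line in a later test.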
With this query, it will be possible to identify patterns and plan inventory strategies with
Figure 3
As we did for the data from research question one, we check for outliers and
that focuses on categorizing customers with respect to the total value of their orders, using specific
criteria and analyzing their behavior through useful metrics such as the number of orders and the
total value of their purchases. For this reason, a main query is carried out that returns the
customer number, customer name, and country. It also performs counting operations, by which the
total number of orders per customer is obtained, and sum operations, by which the total value of the
Finally, a CASE expression categorizes customers by order value as
'High value', 'Mid value', or 'Low value'. To make the results clearer and more reliable,
this query also excludes orders with Cancelled status.
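The segmentation query can be sketched as below. The report does not state the cut-off values, so the thresholds passed as :high and :mid are hypothetical, as are the toy rows; table and column names are assumed from the ER diagram.

```python
import sqlite3

# Invented toy data; thresholds and names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, customerName TEXT, country TEXT);
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, status TEXT, customerNumber INTEGER);
CREATE TABLE orderdetails (orderNumber INTEGER, quantityOrdered INTEGER, priceEach REAL);
INSERT INTO customers VALUES (1,'A','USA'),(2,'B','France'),(3,'C','Spain');
INSERT INTO orders VALUES
  (10,'Shipped',1),(11,'Shipped',1),(12,'Shipped',2),
  (13,'Shipped',3),(14,'Cancelled',3);
INSERT INTO orderdetails VALUES
  (10,100,100.0),(11,50,100.0),(12,60,100.0),(13,10,100.0),(14,500,100.0);
""")

SEGMENT_QUERY = """
SELECT c.customerNumber, c.customerName, c.country,
       COUNT(DISTINCT o.orderNumber) AS totalOrders,
       SUM(od.quantityOrdered * od.priceEach) AS totalOrderValue,
       CASE
           WHEN SUM(od.quantityOrdered * od.priceEach) >= :high THEN 'High value'
           WHEN SUM(od.quantityOrdered * od.priceEach) >= :mid THEN 'Mid value'
           ELSE 'Low value'
       END AS segment
FROM customers c
JOIN orders o ON o.customerNumber = c.customerNumber
JOIN orderdetails od ON od.orderNumber = o.orderNumber
WHERE o.status <> 'Cancelled'
GROUP BY c.customerNumber, c.customerName, c.country
ORDER BY totalOrderValue DESC
"""

# Hypothetical cut-offs; the real ones should come from the business.
segments = conn.execute(SEGMENT_QUERY, {"high": 10000, "mid": 5000}).fetchall()
```

COUNT(DISTINCT o.orderNumber) is used so that an order with several detail lines is counted once. Passing the thresholds as parameters keeps the cut-offs easy to adjust.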
Figure 4
Hypothesis Four: We follow a similar process to retrieve the dataset, running a query that
examines customers' expenses and evaluates whether they are spending within their credit
limit. The query therefore retrieves the customer, the country, and each customer's credit
limit, and sums the quantities of products ordered multiplied by their price per unit to
obtain each customer's total expenses.
To build this query, it is necessary to join the customers, orders, and orderdetails tables.
Finally, orders with cancelled status are excluded, and only the rows where the total expense
is greater than the credit limit are shown. We check for outliers and standardize the dataset
for further analysis.
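A sketch of the credit-limit query under the same assumptions as before (Python's sqlite3 module, invented rows, column names such as creditLimit assumed from the ER diagram). Because the comparison involves an aggregate, the filter on total expense is written in HAVING rather than WHERE.

```python
import sqlite3

# Invented toy data; names are assumed from the ER diagram.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, customerName TEXT,
                        country TEXT, creditLimit REAL);
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, status TEXT, customerNumber INTEGER);
CREATE TABLE orderdetails (orderNumber INTEGER, quantityOrdered INTEGER, priceEach REAL);
INSERT INTO customers VALUES
  (1,'A','USA',5000.0),(2,'B','France',10000.0),(3,'C','Spain',1000.0);
INSERT INTO orders VALUES
  (10,'Shipped',1),(11,'Shipped',2),(12,'Shipped',3),(13,'Cancelled',3);
INSERT INTO orderdetails VALUES
  (10,80,100.0),(11,60,100.0),(12,5,100.0),(13,50,100.0);
""")

CREDIT_QUERY = """
SELECT c.customerNumber, c.customerName, c.country, c.creditLimit,
       SUM(od.quantityOrdered * od.priceEach) AS totalSpent
FROM customers c
JOIN orders o ON o.customerNumber = c.customerNumber
JOIN orderdetails od ON od.orderNumber = o.orderNumber
WHERE o.status <> 'Cancelled'
GROUP BY c.customerNumber, c.customerName, c.country, c.creditLimit
HAVING SUM(od.quantityOrdered * od.priceEach) > c.creditLimit
ORDER BY totalSpent DESC
"""

over_limit = conn.execute(CREDIT_QUERY).fetchall()
```

Customer 3 in the toy data would exceed the limit only if the cancelled order were counted, so the example also shows why the status filter matters.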
Figure 5
Chapter 4: Data Understanding
between the three top-performing countries for the Classic Models business with respect to the quantity
ordered. We started by running descriptive statistics to understand the data. Information on the
mean, minimum and maximum values, variance, and standard deviation helps us understand the
variables and decide which further tests to perform to gain insights into the operation of the models.
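As a minimal sketch of how these descriptive statistics can be computed, the following uses Python's standard statistics module. The helper name describe and the sample values are invented; the report's own figures appear to come from a statistics package, so this is only an equivalent calculation, not the tool used.

```python
from statistics import mean, median, stdev, variance


def describe(values):
    """Summarise one numeric variable with the measures named in the text."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "variance": variance(values),  # sample variance (divides by n - 1)
        "std_dev": stdev(values),      # square root of the sample variance
    }


# Invented order quantities, for illustration only.
quantity_ordered = [6, 20, 35, 41, 97]
summary = describe(quantity_ordered)
```

Running describe once per variable (quantity ordered, price each, line total) yields the kind of summary table discussed in this chapter.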
The table provides a summary of key descriptive statistics for the dataset, specifically
1. N (Valid Data): There are 1,630 valid observations for each variable, with no missing
values. This indicates that the dataset is complete and reliable for analysis.
2. Quantity Ordered:
• Mean: 35.77 units per order, indicating the average order size across the dataset.
• Minimum and Maximum: Orders range from 6 units to 97 units, highlighting a significant
3. Price Each:
• Minimum and Maximum: Prices range from $26.55 to $214.30, suggesting a diverse
• Standard Deviation: $1,669.50, indicating high variability in sales values per order.
• Minimum and Maximum: Sales values range from $531 to $11,170.52, reflecting both
Observations:
The data shows a wide range of order sizes, product prices, and sales values, which is
critical for evaluating sales performance across the three countries (France, Spain, USA).
Identification of Key Metrics, Trends, and Patterns Relevant to the Research Question
Key Metrics:
• Total Sales (Sum of Line Total): $5,291,532.59. This is the overall revenue generated in
the dataset. Comparing this figure across France, Spain, and the USA will reveal regional
performance.
• Average Order Size (Mean Quantity Ordered): Indicates typical buying behavior. For
• Price Variability: The standard deviation and range in priceEach suggest diverse product
• The variability in lineTotal suggests that order value is influenced by both quantityOrdered
purchasing habits.
• The maximum sales value of $11,170.52 suggests high-value transactions, which could be
Clear Interpretation of Findings and Insights Derived from the Data Analysis.
• The dataset exhibits substantial variability in order size, price, and sales value, which
provides an opportunity to analyze differences across France, Spain, and the USA.
• High variability in prices and sales values suggests that customer purchasing behavior and
• France, Spain, and the USA likely contribute differently to the total sales of $5.29M.
performance.
Hypothesis Two: This hypothesis examines the sales trend across the different product lines
(classic cars, motorcycles, planes, ships, trains, trucks and buses, and vintage cars). The goal is to
learn how total sales for these distinct product lines change over time.
Figure 7 – Descriptive Analysis of Hypothesis three
The statistics describe the distribution of sales data for a sample size of 181 observations:
• Mean (Average): The mean sales amount is $51,742.19, which represents the central
• Median: The median sales value is $36,552.33, indicating that half the sales values are
• Mode: Not available (#N/A), meaning no sales amount occurs more frequently than others
in the data.
• Standard Deviation: Sales data show a high variability, with a standard deviation of
$56,889.24.
• Range: The difference between the highest ($415,952.81) and lowest ($1,860.93) sales is
suggesting that most sales values are relatively low, but a few large values pull the
• Kurtosis: The kurtosis value of 16.72 indicates a distribution with extreme outliers, likely
Identification of Key Metrics, Trends, and Patterns Relevant to the Research Question.
• The research question focuses on sales trends in different product lines. Although the
provided statistics summarize the total sales data, here are observations that might guide
further investigation:
• High Variability: The standard deviation and range suggest that sales vary significantly
between product lines or other groupings (e.g., by region, time period, or customer
segment).
• Right-Skewed Distribution: The large skewness and kurtosis values indicate a small
promotional campaigns.
• Median vs. Mean: The mean is much higher than the median, emphasizing that the
Clear Interpretation of Findings and Insights Derived from the Data Analysis
• Key Insight 1: The sales data is unevenly distributed, with some extreme values
significantly influencing the overall metrics. This implies that certain product lines or
• Key Insight 2: The high range and standard deviation suggest that product performance
• Key Insight 3: Outliers and right skewness imply that a small number of sales contribute
disproportionately to the total, which may align with specific customer behaviors or
product-line preferences.
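The skewness and kurtosis measures referred to above can be sketched with the simple moment-based definitions below. This is an assumption about the formula: spreadsheet and statistics packages typically apply small-sample bias corrections, so their values differ slightly from these. The sample data are invented.

```python
def central_moment(values, k):
    """k-th central moment, dividing by n (population form)."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)


def skewness(values):
    """Moment-based skewness: m3 / m2^1.5. Positive means a long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2^2 - 3. Zero for a normal curve."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Invented data: mostly small sales plus one very large sale.
sales = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 200.0]
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
```

For the invented sales list, the single large value produces positive skewness and pulls the mean well above the median, which is the same pattern the report describes for the real data.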
In hypothesis three, we want to understand customer segmentation and spending behavior.
We segmented the customers into groups based on their spending behavior and order value:
high-value, mid-value, and low-value customers. Below are the descriptive statistics
of the data:
• Median: The median value is 3, showing that half of the customers placed three or
fewer orders.
• Standard Deviation: A value of 2.77 indicates some variation in the number of orders,
• Range: The number of orders spans from 1 to 25, highlighting outliers: most
customers placed a low number of orders, while a small group placed many.
• Kurtosis (45.03): The high kurtosis suggests extreme outliers, further reinforcing that
TotalOrderValue:
• Median: The median value is $79,306.57, suggesting a slight skew in the data.
• Standard Deviation: With a standard deviation of $93,286.80, the total order value
• Range: The total order value ranges from $7,918.60 to $773,642.18, indicating
• Skewness (5.59): The highly positive skew reflects a few customers with exceptionally
high spending.
• Kurtosis (36.97): The high kurtosis shows that a small proportion of customers
Identification of Key Metrics, Trends, and Patterns Relevant to the Research Question.
The research question examines trends in customer segmentation. Key trends and patterns
include:
• Order Frequency: Many customers place a small number of orders (median = 3), but
a few customers have very high order counts, possibly representing a loyal or bulk-
buying segment.
• High Variance in Spending: The disparity in total order values suggests distinct
customers.
sales and order frequency, as evidenced by high skewness and kurtosis for both metrics.
Potential Segments:
• Low-Value, Infrequent Buyers: Customers with low total order value and a small
number of orders.
• Moderate Buyers: Customers with spending and frequency near the mean or median.
Clear Interpretation of Findings and Insights Derived from the Data Analysis
• The data suggests a classic "80/20" Pareto distribution, where a small segment of high-
value customers (20%) accounts for a large share of total revenue (80%).
• The high variability and presence of outliers highlight the importance of personalized
Hypothesis Four: The final hypothesis aims to establish a relationship between customers' credit
limit and their spending behavior. It seeks to analyze how the available credit amount affects
customers' purchasing decisions, influencing both the frequency and volume of their purchases.
This behavior can provide key insights for Classic Models in optimizing its inventory, allowing
it to adjust stock levels based on consumption trends and anticipate demand. In this way,
companies can improve inventory management efficiency, reducing costs and ensuring product
Exploration of Data
1. Credit Limit:
• Mean: 88,150.91
2. Total Spent:
• Mean: 122,289.62
• Standard Deviation: 116,478.01
Figure 12 – Box plot Total spent, box plot credit limit, Hypothesis 4
Figure 13 – scatter plot Total spent, scatter plot credit limit, Hypothesis 4
Two outliers were identified in the two analyzed variables. To prevent these data points from
affecting the correlation, and given that the distribution of the data is normal, it was decided to
replace them with the mean. This decision is justified because the outliers represent few observations,
which minimizes their impact on the analysis and preserves the integrity and representativeness of the
results. Additionally, a scatter plot was used to verify that the data followed a linear pattern.
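The outlier treatment can be sketched as follows. Two points are assumptions, because the report does not specify them: outliers are flagged with the usual box-plot rule (more than 1.5 times the interquartile range beyond the quartiles), and the replacement value is the mean of the remaining, non-outlying observations. The function name and sample values are invented.

```python
from statistics import mean, quantiles


def replace_outliers_with_mean(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the inlier mean."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inliers = [x for x in values if low <= x <= high]
    fill = mean(inliers)
    return [x if low <= x <= high else fill for x in values]


# Invented spending figures with one extreme value.
total_spent = [80.0, 95.0, 100.0, 105.0, 110.0, 120.0, 900.0]
cleaned = replace_outliers_with_mean(total_spent)
```

Using the inlier mean rather than the overall mean avoids letting the extreme value influence its own replacement; if the overall mean was used in the project, the fill line would change accordingly.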
This allowed for a visual confirmation of the linear relationship between the variables, validating
Clear Interpretation of Findings and Insights Derived from the Data Analysis
• Total Spent:
• The high standard deviation (116,478 USD) and wide range (22,314.36 - 773,624.2 USD)
• Skewness (4.57) and kurtosis (22.57) suggest a heavily skewed distribution with extreme
outliers.
• Credit Limit:
• A smaller standard deviation (36,881.21 USD) compared to total spent suggests less
• Skewness (1.46) and kurtosis (4.45) indicate a moderate skew towards higher credit limits,
Chapter 5: Data Visualization
Hypothesis One:
The chart shows the "Quantity Ordered" and "Price Each" for three different countries: France,
Spain, and the USA. This type of data visualization allows for an easy comparison of these key
France has the lowest quantity ordered, at 35 units, at a price of $90. Spain and the USA have the same
quantity ordered, even though the USA has the highest price per product. This could mean that USA
customers have more purchasing power, and the business can learn from the marketing and sales
Hypothesis Two:
The figure above shows that 2004 contributes the highest share of sales, 46.39%,
which is almost half of the total across all sales years. The business can investigate the
key factors that led to the success of 2004, re-introduce them, and optimize them for the 2005
Figure 16 - Pie Chart of product line sales from 2003-2005
This pie chart breaks down the sales contributions of different product lines over the 2003-2005
• Motorcycles and Vintage Cars contributed 14.36% and 14.92%, respectively.
• Planes, Ships, and Trains had the lowest sales contributions at 12.71%, 13.26%, and 12.71%
respectively.
So, the data suggests that the Classic Cars product line was the strongest performer in terms of
sales over this 3-year period, while the Planes, Ships, and Trains product lines had the lowest sales
contributions.
Hypothesis Three:
It is always good to learn that most of the customers are categorized in the “High value” bracket.
This can further help the products and marketing team to introduce more customer retention and
credit policies to keep the customers while working on improving the sales performance of the
Figure 18 - Total Orders by Each Territory
Figure 18 above illustrates the distribution of orders across four geographic territories:
APAC, EMEA, Japan, and NA. The largest share of orders, 34%, comes from NA (North
America), followed closely by EMEA (Europe, Middle East, and Africa) with 33%. Japan
accounts for a moderate share of 25%, while APAC (Asia-Pacific) contributes the smallest
portion at 8%. Combined, NA and EMEA dominate the order distribution, comprising 67% of
the total. This chart emphasizes the significant performance disparity between territories, with
NA leading and APAC showing the least activity. Such insights can help businesses strategize
and focus on underperforming regions like APAC to improve their order distribution.
Territory Country
APAC Australia
APAC New Zealand
APAC Singapore
EMEA Austria
EMEA Belgium
EMEA Denmark
EMEA Finland
EMEA France
EMEA Germany
EMEA Ireland
EMEA Italy
EMEA Norway
EMEA Spain
EMEA Sweden
EMEA Switzerland
EMEA UK
Japan Japan
Japan Philippines
Japan Singapore
NA Canada
NA USA
Hypothesis Four:
Figure 19 details the total spending amounts across various countries. The y-axis represents the
total spent in monetary terms, while the x-axis lists the countries. Notably, the USA shows the
highest total spending at $2,334,180.24, followed by Singapore with $954,584.74 and France with
$672,136.39. Other countries, such as Australia, Italy, and Japan, exhibit moderate spending levels,
while smaller totals are observed for countries like Sweden, Belgium, and Norway.
America) held the largest share of orders (34%), and APAC had the lowest (8%). The dominance
of the USA in spending aligns with NA’s high contribution to the order share. Similarly, the
substantial spending by Singapore (in APAC) indicates its pivotal role within a territory that
otherwise accounted for the smallest order share. In contrast, spending in European countries like
France and Germany reflects EMEA’s strong showing in the pie chart, where it contributed 33%
of the orders.
Insight:
The line chart reveals the granular spending dynamics within each territory. While NA and EMEA
dominate order distribution, the spending within these regions is heavily concentrated in a few
countries like the USA, France, and Singapore. This correlation highlights the strategic importance
of these countries to the overall revenue distribution and suggests a need to explore growth
Figure 20
Figure 21
o Null Hypothesis (H₀): The variances across the groups (France, Spain, and USA) are
equal.
o Alternative Hypothesis (H₁): The variances across the groups are not equal.
Results:
o Based on the Levene Statistic, the p-values (Sig.) for all tests (Mean, Median, Median
with adjusted df, and Trimmed Mean) are greater than 0.05:
Conclusion:
Since all p-values are greater than 0.05, we fail to reject the null hypothesis. This means the
Interpretation of ANOVA Results:
Results:
o F-Statistic: 0.160
Conclusion:
Since the p-value = 0.852 is greater than 0.05, we fail to reject the null hypothesis. This
Summary:
We can conclude that, based on this sample, sales performance is consistent across France,
Spain, and the USA, and no country shows a statistically higher or lower performance than the
others.
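To make the reported F-statistic concrete, here is a minimal sketch of how a one-way ANOVA F value is computed from grouped data. The function name and the three small groups are invented. Turning F into a p-value requires the F distribution, which the Python standard library does not provide; a package such as SciPy (scipy.stats.f_oneway) would normally be used for that step.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means)
    )

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented line totals for three countries, for illustration only.
france = [1.0, 2.0, 3.0]
spain = [2.0, 3.0, 4.0]
usa = [3.0, 4.0, 5.0]
f_stat, df_between, df_within = one_way_anova_f([france, spain, usa])
```

An F value close to 0, such as the reported 0.160, means the group means differ very little relative to the spread within each group, which is why the null hypothesis is not rejected.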
Hypothesis Two:
Figure 22
Figure 23
Figure 24
Hypotheses Recap:
• Null Hypothesis (H₀): There is no significant sales trend in different product lines.
• Alternative Hypothesis (H₁): There is a significant sales trend in different product lines.
Step 1: Levene's Test for Homogeneity of Variances
Levene's test, performed earlier, indicates that variances across product lines are
significantly different (p < 0.05), which is why Welch's test was used. Therefore, the
Interpretation:
• This indicates that there is a significant difference in total sales trends between the
Conclusion:
• Based on Welch’s ANOVA, there is a significant difference in total sales trends across
product lines. Therefore, we reject the null hypothesis and conclude that sales
Hypothesis Three:
Figure 25
Figure 26
Figure 27
• Significance (Sig.) = 0.322
Since p > 0.05, we fail to reject the null hypothesis of Levene's Test. This means there is no
significant difference in variances across the groups (the customer segments).
ANOVA Table:
• F-Statistic = 1.295
Key Interpretation:
• The F-statistic (1.295) is calculated by dividing the Mean Square Between Groups
• The p-value (Sig.) for the ANOVA is 0.279, which is greater than 0.05.
• Since the p-value (0.279) is greater than 0.05, we fail to reject the null hypothesis (H₀).
Final Conclusion:
• The results of the One-Way ANOVA indicate that customer segment does not have a
Hypothesis Four:
Figure 28
Correlation Table:
• N (Number of observations) = 55
Hypotheses Recap:
• Null Hypothesis (H₀): There is no relationship between the credit limit and the amount
spent.
• Alternative Hypothesis (H₁): There is a relationship between the credit limit and the
amount spent.
Key Interpretation:
between the credit limit and the amount spent. As the credit limit increases, the
o Since the p-value is less than 0.05, the result is statistically significant. This
means we reject the null hypothesis (H₀) and conclude that there is a relationship
3. Correlation Strength:
o The strong positive correlation (0.839) suggests that the two variables, creditLimit
Conclusion:
Based on the Pearson correlation result, we reject the null hypothesis and accept the
alternative hypothesis. This indicates that there is a significant positive relationship between the
Chapter 7 – Model Evaluation
The analysis presented in hypothesis one provides a rigorous evaluation of the sales
performance data across three countries: France, Spain, and the USA. The evaluation is carried out
using two key statistical tests - Levene's Test for Homogeneity of Variances and One-Way
ANOVA.
Levene's Test is used to assess the assumption of equal variances across the three country
groups. The results indicate that the p-values for all the test variants (Mean, Median, Median with
adjusted df, and Trimmed Mean) are greater than the significance level of 0.05. This means the
null hypothesis of equal variances cannot be rejected, and the assumption of homogeneity of
variances is met.
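Levene's statistic is a one-way ANOVA F computed on each observation's absolute deviation from its group center. The sketch below illustrates that idea; the function name and data are invented, and the center argument switches between the mean-based and median-based variants mentioned above. As with ANOVA, converting the statistic into a p-value needs the F distribution (for example scipy.stats.levene in SciPy).

```python
from statistics import mean, median


def levene_statistic(groups, center=mean):
    """Levene's W: ANOVA F on absolute deviations from each group's center."""
    deviations = [[abs(x - center(g)) for x in g] for g in groups]
    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    grand = sum(sum(d) for d in deviations) / n_total

    between = sum(len(d) * (mean(d) - grand) ** 2 for d in deviations) / (k - 1)
    within = sum(
        sum((z - mean(d)) ** 2 for z in d) for d in deviations
    ) / (n_total - k)
    return between / within


# Invented groups: the second has twice the spread of the first.
narrow = [1.0, 2.0, 3.0, 4.0, 5.0]
wide = [2.0, 4.0, 6.0, 8.0, 10.0]
w_mean = levene_statistic([narrow, wide])
w_median = levene_statistic([narrow, wide], center=median)
```

A small W (and therefore a large p-value) indicates similar spread across groups, which is the condition under which the standard One-Way ANOVA is appropriate.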
The One-Way ANOVA is then employed to evaluate whether there are any significant
differences in sales performance (lineTotal) between the three countries. The analysis reveals an
F-statistic of 0.160 and a corresponding p-value of 0.852, which is greater than the 0.05
significance level. Therefore, the null hypothesis of no significant difference in sales performance
In summary, the model evaluation based on these statistical tests leads to the following
conclusions:
These findings suggest that, based on the given sample, sales performance is consistent across the
three countries, and no country demonstrates a statistically higher or lower performance than the
others.
The strengths of this analysis lie in the rigorous application of well-established statistical
methods, Levene's Test and One-Way ANOVA, to assess the underlying assumptions and draw
conclusions about the sales performance data. The use of these standard benchmarks and criteria
provides a robust evaluation of the model's performance in addressing the research question.
Overall, the model evaluation presented in the document provides a solid foundation for
understanding the sales performance dynamics across the three countries, while also highlighting
the need for ongoing monitoring and analysis to fully address the research question.
In hypothesis two, we employ Levene's Test for Homogeneity of Variances and Welch's
Levene's Test is used to evaluate the assumption of equal variances across the product line groups.
The results indicate that the p-value is less than the significance level of 0.05, rejecting the null
hypothesis of equal variances. This violation of the homogeneity of variances assumption justifies
the use of Welch's ANOVA, a more robust alternative to the standard ANOVA.
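For reference, Welch's ANOVA can be sketched as below. It weights each group by its size divided by its variance, so groups with unequal variances are handled without pooling. The function name and sample data are invented, and the p-value step (an F distribution with the returned degrees of freedom) is left out because it needs a statistics package.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Weight = n_i / s_i^2, so noisy groups count for less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    w_total = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_total

    numerator = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    lam = sum((1 - w / w_total) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


# Invented monthly sales for two product lines with different spreads.
line_a = [1.0, 2.0, 3.0, 4.0, 5.0]
line_b = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
f_stat, df1, df2 = welch_anova([line_a, line_b])
```

With only two groups this statistic equals the square of Welch's t-statistic, which is a convenient sanity check on the implementation.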
Welch's ANOVA is then applied to examine the differences in total sales trends
between the various product lines. The analysis reveals a Welch statistic of 13.575 with a
corresponding p-value reported as 0.000 (that is, p < 0.001), which is less than the 0.05
significance level. This leads to the rejection of the null hypothesis, indicating that there
is a statistically significant difference in sales
The strengths of this model evaluation lie in the rigorous application of established
statistical methods, Levene's Test and Welch's ANOVA, to assess the underlying assumptions and
draw conclusions about the sales trends. The use of Welch's ANOVA, which is more appropriate
when the homogeneity of variances assumption is violated, provides a robust and reliable analysis.
However, it is important to note that the implications of these findings are limited to the specific
dataset and context provided. Further research may be necessary to explore the potential drivers
or factors influencing the observed differences in sales performance across product lines.
Additionally, the discussion could be strengthened by considering the practical significance of the
results and their potential implications for product management, pricing strategies, or resource
allocation decisions.
Overall, the model evaluation provides a solid foundation for understanding the sales trends
across different product lines, while also highlighting the need for ongoing monitoring and analysis
✓ The Levene Statistic of 0.992 and a corresponding p-value of 0.322 (greater than 0.05)
indicate that the assumption of equal variances across the customer segments is met.
✓ This justifies the use of the standard One-Way ANOVA, as the homogeneity of variances
• One-Way ANOVA Results:
✓ The ANOVA table shows an F-statistic of 1.295 and a p-value of 0.279, which is greater
✓ This means the null hypothesis of no significant differences in total orders across customer
The strengths of this model evaluation lie in the rigorous application of Levene's Test and One-
Way ANOVA, which are well-established benchmarks for assessing the equality of variances and
The clear articulation of the hypotheses, the interpretation of the Levene's Test and ANOVA
results, and the final conclusion provide a comprehensive and statistically sound evaluation of the
model's performance. However, the analysis is limited to the specific dataset and customer segment
groupings provided. The implications of the findings may not directly translate to other contexts
Additionally, while the ANOVA results indicate no statistically significant differences in total
orders across the customer segments, there may still be practical or business-relevant differences
that warrant further investigation. The analysis could be strengthened by considering the
magnitude of the differences, even if they do not meet the statistical significance threshold.
In conclusion, the model evaluation demonstrates a robust and rigorous assessment of the
differences in total orders across customer segments. The findings provide valuable insights, but
their practical implications should later be considered within the broader context of the
Classic Models company.
The relationship between credit limit and total amount spent was tested using Pearson
• Pearson Correlation Coefficient: The analysis reveals a strong positive correlation of 0.839
between credit limit and total amount spent. This indicates a significant positive
relationship between the two variables, suggesting that as credit limit increases, the total
• Statistical Significance: The p-value, reported as 0.000 (that is, p < 0.001), is less than
the standard significance level of 0.05, allowing the rejection of the null hypothesis. This means the observed
• Correlation Strength: The strong positive correlation coefficient of 0.839 signifies a robust
relationship between the two variables. This suggests that the credit limit is a strong
The strengths of this model evaluation lie in the application of the well-established Pearson
correlation analysis, which is a widely recognized benchmark for assessing the linear relationship
between two variables. The clear articulation of the hypotheses, the interpretation of the correlation
coefficient, and the assessment of statistical significance provide a comprehensive and rigorous
While the results indicate a strong positive relationship between credit limit and total amount spent,
further research would be needed to infer the underlying drivers and mechanisms behind this
relationship. Additionally, the analysis is limited to the specific dataset provided, and the
implications may not generalize to different contexts or populations without additional validation.
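The Pearson coefficient and its significance test can be sketched as follows. The two function names are invented for illustration. The last line plugs in the values reported in this document (r = 0.839, N = 55) to show how the t-statistic behind the p-value is obtained; the p-value itself would come from a t distribution with N - 2 degrees of freedom.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))


# Invented credit limits and spending, perfectly linear on purpose.
credit_limit = [1.0, 2.0, 3.0, 4.0]
total_spent = [2.0, 4.0, 6.0, 8.0]
r_example = pearson_r(credit_limit, total_spent)

# t-statistic implied by the reported correlation and sample size.
t_reported = t_statistic(0.839, 55)
```

A t-statistic of this size with 53 degrees of freedom lies far in the tail of the distribution, which is consistent with the very small p-value reported for the correlation.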
Overall, the model evaluation presented in the document provides a robust and statistically sound
assessment of the relationship between credit limit and total amount spent. The findings can serve
as a valuable foundation for further research, customer segmentation, credit risk management, or
Conclusion
This comprehensive analysis examined four key hypotheses using rigorous statistical methods to
gain insights into the sales performance and customer behavior patterns within the Classic
Models company.
In Hypothesis one, the One-Way ANOVA analysis revealed no statistically significant differences
in sales performance across the France, Spain, and USA country groups. This suggests that sales
performance is consistent across these regions, with no country demonstrating a higher or lower
Hypothesis two explored differences in sales trends across various product lines. By leveraging
Welch's ANOVA to account for unequal variances, the analysis found a significant difference in
total sales trends between the product lines. This indicates that sales performance varies
The evaluation of Hypothesis three utilized One-Way ANOVA to examine differences in total
orders across customer segments. The results showed no statistically significant differences,
implying that the customer segment does not have a substantial effect on the total number of orders
placed.
Finally, Hypothesis four assessed the relationship between credit limit and total amount spent using
Pearson correlation analysis. The strong positive correlation coefficient of 0.839, along with the
Overall, the statistical analyses presented in this document offer valuable insights into the
company's sales dynamics and customer behaviors. The rigorous application of well-established
benchmarks, such as Levene's Test, One-Way ANOVA, Welch's ANOVA, and Pearson correlation,
has delivered a comprehensive and reliable evaluation of the developed models. These findings
can inform strategic decision-making, product management, credit risk assessment, and targeted