570 Asm 2
Student declaration
I certify that the assignment submission is entirely my own work and I fully understand the consequences of plagiarism.
I understand that making a false declaration is a form of malpractice.
Student Signature
Uyen
Grading grid
P3 P4 P5 M2 M3 M4 D1 D2 D3
Description of activity undertaken
Assessor name:
Summative Feedbacks Resubmission Feedbacks
Table of Contents
Introduction
Differences between qualitative and quantitative raw data analysis
Quantitative data
Advantages
Disadvantages
Qualitative data
Advantages
Disadvantages
Descriptive statistics
Measures of Central Tendency: mean, mode, and median
Mean
Median
Mode
Measures of Variability: range, variance and standard deviation
Range
Variance
Standard Deviation
Inferential statistics
The differences between population and sample based on different sampling techniques and methods
One sample T-test: Estimation and Hypotheses testing
Estimation
Hypotheses testing
Two-tailed testing
One-tailed testing
Two sample t-test
Independent Sample T-test
Estimation
Hypothesis testing
Dependent Sample T-test
Estimation
Hypothesis testing
Measuring the association between two variables (from the dataset)
Correlation analysis
Regression
Simple linear regression
Multiple linear regression
Histogram of the Residual
Normal P-P Plot of Regression
Scatterplot
Apply a range of statistical methods used in business planning for quality, inventory and capacity management
Probability Distribution
Joint Probability
Conditional Probability
Applying a range of statistical methods used in business planning for quality, inventory, and capacity management
Measuring the variability in business processes or quality management
Measuring the probability by using probability distributions to business operations and processes
Normal Distribution
Poisson Distribution and Binomial Distribution
Comparison
Inference
Using appropriate charts and tables to communicate findings of given variables
Frequency table
Bar chart
Pie chart
Histogram
Scatter plot
The strengths and weaknesses of using different types of charts and tables
The most effective way of communicating the results of the analysis
Conclusion
References
Introduction
Statistical research in business enables managers to analyze past performance, predict future business
conditions and lead organizations effectively. Statistics can describe markets, inform advertising, set
prices and respond to changes in customer demand. This assignment presents statistical definitions and
specific applications of them.
Bayerische Motoren Werke AG, commonly known as Bavarian Motor Works, BMW or BMW AG, is a
German automobile, motorcycle and engine manufacturer established in 1916. BMW has its
headquarters in Munich, Bavaria. It also owns and produces Mini vehicles, and is the parent company
of Rolls-Royce Motor Cars. BMW produces motorcycles under BMW Motorrad. In 2012, the BMW
Group produced 1,845,186 automobiles and 117,109 motorcycles across its brands. BMW is one of the
"German Big 3" luxury automakers, alongside Audi and Mercedes-Benz, which are the three
best-selling luxury automakers in the world.
Technology, however, is constantly changing. The company must therefore adopt the latest
technologies that can be applied to its products and services in order to attract customers, raise brand
awareness and compete in the market. The company also faces other difficulties, which are the reason
for this report.
BMW intends to improve its data framework and decision-making process. As a research analyst, I am
assigned to conduct research by applying several statistical methods and to present the outputs to the
Board of Directors (BOD). This research gives the analyst an accurate and complete view of the
numbers, data and business context of the organization. The statistical results can therefore support
the company's managers in making new decisions and strategies more effectively.
A secondary data methodology is used in this report. Specifically, the data on used BMW cars is taken
from a published source via the website https://www.kaggle.com/. The dataset contains 9
variables, including price, transmission, mileage, fuel type, road tax, miles per gallon (mpg) and engine
size, and more than 50 observations.
This report provides definitions related to statistics, three methods of analysis, and a critical evaluation
of the analysis techniques. The specified statistical outputs for the BMW Group are explained in detail
in this Assignment 2.
Differences between qualitative and quantitative raw data analysis
Quantitative data
Quantitative data refers to any data that can be quantified, that is, expressed as numbers. If it can be
counted or measured and given a numerical value, it is quantitative in nature. Think of it as a measuring
stick. Quantitative variables can tell you "how many", "how much", or "how often". (The fullstory
education team, 2021)
Advantages
Can be tested and checked. Quantitative research requires careful experimental design and the ability
for anyone to replicate both the test and the results. This makes the data you gather more reliable and
less open to dispute.
Straightforward analysis. When you collect quantitative data, the type of results tells you which
statistical tests are appropriate to use. As a result, interpreting your data and presenting the findings is
straightforward and less open to error and subjectivity.
Prestige. Research that involves complex statistics and data analysis is considered valuable and
impressive because many people do not understand the mathematics involved. Quantitative research is
associated with technical advances such as computer modeling, stock selection, portfolio evaluation,
and other data-based business decisions. The association of prestige and value with quantitative
research can reflect well on your small business. (Devault, 2020)
Disadvantages
False focus on numbers. Quantitative research can be limited in its pursuit of concrete, statistical
relationships, which can lead researchers to overlook broader themes and relationships. By focusing
solely on numbers, you risk missing surprising or big-picture information that can help your business.
Difficulty setting up a research model. When you conduct quantitative research, you need to carefully
develop a hypothesis and set up a model for collecting and analyzing data. Any errors in your setup,
bias on the part of the researcher, or mistakes in execution can invalidate all of your results. Even
coming up with a hypothesis can be subjective, especially if you have a specific question that you
already know you want to prove or disprove.
Can be misleading. Many people assume that because quantitative research is based on statistics it is
more credible or scientific than observational, qualitative research. However, both kinds of research
can be subjective and misleading. The opinions and biases of a researcher are just as likely to affect
quantitative approaches to information gathering. In fact, the impact of this bias occurs earlier in the
process of quantitative research than it does in qualitative research. (Devault, 2020)
Qualitative data
Quantitative research is all about the numbers. It relies on the collection and interpretation of numeric
data and focuses on measuring (using inferential statistics) and generalizing results. In terms of digital
experience data, it puts everything in terms of numbers (or discrete data), such as the number of users
clicking a button, bounce rates, time on site, and more. (The fullstory education team, 2021)
Qualitative data, by contrast, is descriptive and non-numerical, for example interview answers, opinions
and observations.
Advantages
Qualitative research can capture changing attitudes within a target group, such as consumers of a
product or service, or attitudes in the workplace.
Qualitative approaches to research are not bound by the limitations of quantitative methods. If
responses do not fit the researcher's expectation, that is equally useful qualitative data that adds
context and may explain something that numbers alone cannot reveal.
Qualitative research offers a much more flexible approach. If useful insights are not being captured,
researchers can quickly adapt questions, change the setting or any other variable to improve responses.
Qualitative data capture allows researchers to be far more speculative about what areas they choose to
investigate and how to do so. It allows data capture to be prompted by a researcher's instinctive or
'gut feel' for where good information will be found. (Vaughan, 2021)
Disadvantages
The quality of qualitative data depends on the quality of the researchers. Researchers need industry
experience and good interviewing skills to ask follow-up questions. They also need to build rapport
with the participants to ensure the accuracy of the data. Consequently, if the researchers lack industry
experience or interviewing skills, they may be unable to get good responses from the participants.
(Rahman, 2021)
Collecting qualitative data is time-consuming. If each interview lasts between one and two hours, a
maximum of three or four per day is often all that is possible (BPP Learning Media, 2013).
Some questions may be uncomfortable for participants to answer in a face-to-face interview, and so
they may not give answers that represent their true feelings.
Descriptive statistics
Measures of Central Tendency: mean, mode, and median
A measure of central tendency is a single value that attempts to describe a set of data by identifying
the central position within that set of data. The mean, median and mode are all valid measures of
central tendency, but under different conditions some measures of central tendency become more
appropriate to use than others.
Mean
The mean (or average) is the most popular and well-known measure of central tendency. It can be used
with both discrete and continuous data, although it is most often used with continuous data
(statistics.leard.com, 2021). The mean is equal to the sum of all the values in the data set divided by
the number of values in the data set. So, if we have n values in a data set and they have values
x1, x2, …, xn, the sample mean, usually denoted by x̄ (pronounced "x bar"), is:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
This formula is usually written in a slightly different way using the Greek capital letter ∑,
pronounced "sigma", which means "sum of...":
$$\bar{x} = \frac{\sum x}{n}$$
Statistics
price (€)
N Valid 50
Missing 4793
Mean 17228.00
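The mean formula can be checked with a few lines of Python. This is only a minimal sketch: the five prices below are hypothetical values chosen for illustration and are not taken from the BMW dataset.

```python
from statistics import mean

# Hypothetical prices in euros (illustration only, not the BMW data).
prices = [12500, 18900, 9800, 27400, 17500]

# Mean = sum of all values divided by the number of values.
manual_mean = sum(prices) / len(prices)

print(manual_mean)    # computed directly from the formula
print(mean(prices))   # the same quantity from the standard library
```

Both lines print the same value, which confirms that the library function implements the formula above.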
Median
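The value of the middle observation, obtained after arranging the data in ascending order, is known
as the median of the data. (statistics.leard.com, 2021)
$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation (if } n \text{ is an odd number)}$$
$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} + \left(\frac{n}{2}+1\right)^{\text{th}}}{2} \text{ observation (if } n \text{ is an even number)}$$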
Statistics
engine_power
N Valid 50
Missing 4793
Median 132.50
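Example 1: The following dataset has an odd number of observations that are arranged in ascending
order.
In this case: n = 5
$$\text{Median value} = \left(\frac{5+1}{2}\right)^{\text{th}} = 3^{\text{rd}} \text{ observation} = 17$$
Example 2: The following dataset has an even number of observations that are arranged in ascending
order.
In this case: n = 6
$$\text{Median value} = \frac{\left(\frac{6}{2}\right)^{\text{th}} + \left(\frac{6}{2}+1\right)^{\text{th}}}{2} = \frac{3^{\text{rd}} + 4^{\text{th}}}{2} = \frac{9+17}{2} = 13$$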
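The two median rules can be sketched in Python as follows. The datasets are assumptions: they are hypothetical values constructed only so that the 3rd observation is 17 in the odd case and the 3rd and 4th observations are 9 and 17 in the even case, matching the results of the two examples.

```python
from statistics import median

# Hypothetical datasets (assumed values, consistent with the examples' results).
odd_data = [3, 9, 17, 20, 23]       # n = 5
even_data = [1, 3, 9, 17, 20, 23]   # n = 6

def median_by_position(values):
    """Apply the positional rules for odd and even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # ((n + 1) / 2)th observation, converted to a 0-based index.
        return ordered[(n + 1) // 2 - 1]
    # Average of the (n / 2)th and (n / 2 + 1)th observations.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median_by_position(odd_data), median(odd_data))
print(median_by_position(even_data), median(even_data))
```

The hand-written rule and `statistics.median` agree for both datasets.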
Mode
The mode is the most frequent value in a data set. There is no formula for finding the mode of a
dataset, but it can be found by inspection. (statistics.leard.com, 2021)
Statistics
mileage
N Valid 50
Missing 4793
Mode 13131 (a)
a. Multiple modes exist. The smallest value is shown.
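The footnote in the output above (several modes exist and the smallest one is reported) can be illustrated with a short sketch. The mileage values are hypothetical and chosen only so that two values tie for the highest frequency.

```python
from statistics import multimode

# Hypothetical mileage values (illustration only); two values occur twice.
mileages = [13131, 45000, 13131, 45000, 78000]

# multimode returns every value that shares the highest frequency.
modes = multimode(mileages)

# Reporting the smallest mode mirrors the convention in the footnote.
smallest_mode = min(modes)

print(modes)
print(smallest_mode)
```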
Range
The range of a dataset is the difference between the largest and smallest values in that dataset. While
the range is easy to understand, it is based on only the two most extreme values in the dataset, which
makes it very susceptible to outliers. If one of those numbers is unusually high or low, it affects the
entire range even if it is atypical. (Frost, 2021)
Statistics
price (€)
N Valid 50
Missing 4793
Range 67900
Minimum 1800
Maximum 69700
Example:
In this case:
Minimum value of the dataset = 1
Maximum value of the dataset = 23
Range value of the dataset = Maximum value – Minimum value = 23 – 1 = 22
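The sensitivity of the range to outliers can be shown with a small sketch. The values are hypothetical; only the minimum of 1 and the maximum of 23 come from the example.

```python
# Hypothetical values; only the minimum (1) and maximum (23) match the example.
values = [1, 4, 9, 15, 23]

data_range = max(values) - min(values)

# Adding a single extreme value changes the range drastically.
with_outlier = values + [500]
outlier_range = max(with_outlier) - min(with_outlier)

print(data_range)
print(outlier_range)
```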
Variance
Variance is the average squared difference of the values from the mean. Unlike the previous measures
of variability, the variance includes all values in the calculation by comparing each value to the mean.
To calculate this statistic, you compute a set of squared differences between the data points and the
mean, sum them, and then divide by the number of observations (or by n − 1 for a sample). There are
two formulas for the variance, depending on whether you are calculating the variance for an entire
population or using a sample to estimate the population variance. (Frost, 2021)
Sample variance:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Population variance:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Statistics
mileage
N Valid 50
Missing 4793
Mean 119943.82
Std. Deviation 71202.830
Variance 5069843006.844
Example: Consider the dataset 2, 9, 5, 3, 8 and calculate the variance.
In this case: n = 5
Use the formula for the sample variance:
$$\bar{x} = \frac{\sum x}{n} = \frac{2+9+5+3+8}{5} = 5.4$$
$$s^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{(2-5.4)^2 + (9-5.4)^2 + (5-5.4)^2 + (3-5.4)^2 + (8-5.4)^2}{5-1} = 9.3$$
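The worked example can be reproduced with the Python standard library. The sketch below uses the same five values and also computes the population variance for comparison, to show the effect of dividing by n − 1 instead of n.

```python
from statistics import pvariance, variance

data = [2, 9, 5, 3, 8]

# Sample variance: squared deviations from the mean divided by n - 1.
sample_var = variance(data)

# Population variance: the same sum of squares divided by n.
population_var = pvariance(data)

print(round(sample_var, 2))      # about 9.3, as in the worked example
print(round(population_var, 2))  # smaller, because the divisor is larger
```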
Standard Deviation
The standard deviation is the standard or typical difference between each data point and the mean.
When the values in a dataset are grouped closer together, you have a smaller standard deviation.
Conversely, when the values are spread out more, the standard deviation is larger because the typical
distance is greater.
The standard deviation is simply the square root of the variance. Recall that the variance is in squared
units. Hence, the square root returns the value to the natural units. (Frost, 2021)
Sample standard deviation:
$$s = \sqrt{s^2}$$
Population standard deviation:
$$\sigma = \sqrt{\sigma^2}$$
s²: the sample variance; σ²: the population variance
s: the sample standard deviation; σ: the population standard deviation
Statistics
mileage
N Valid 50
Missing 4793
Mean 119943.82
Std. Deviation 71202.830
Variance 5069843006.844
Example: Consider the dataset 2, 9, 5, 3, 8 and calculate the standard deviation.
In this case: n = 5
Use the formula for the sample standard deviation:
$$\bar{x} = \frac{\sum x}{n} = \frac{2+9+5+3+8}{5} = 5.4$$
$$s^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{(2-5.4)^2 + (9-5.4)^2 + (5-5.4)^2 + (3-5.4)^2 + (8-5.4)^2}{5-1} = 9.3$$
$$s = \sqrt{s^2} = \sqrt{9.3} = 3.05$$
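As a quick check of the relationship between the two measures, the following sketch computes the sample standard deviation of the same dataset and compares it with the square root of the sample variance.

```python
from math import sqrt
from statistics import stdev, variance

data = [2, 9, 5, 3, 8]

# Sample standard deviation from the standard library.
s = stdev(data)

# The same value obtained as the square root of the sample variance.
s_from_variance = sqrt(variance(data))

print(round(s, 2))
print(round(s_from_variance, 2))
```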
Inferential statistics
Descriptive statistics describes data (for example, a chart or graph), while inferential statistics allows
you to make predictions ("inferences") from that data. With inferential statistics, you take data from
samples and make generalizations about a population. (statisticshowto.com, 2021)
There are two main areas of inferential statistics:
Estimating parameters. This means taking a statistic from your sample data (for example the sample
mean) and using it to say something about a population parameter (i.e. the population mean).
Estimation is further divided into point estimates and interval estimates.
Hypothesis tests. This is where you use sample data to answer research questions. Hypothesis testing
in inferential statistics consists of one-tailed tests and two-tailed tests.
In simple terms, population means the aggregate of all elements under study having one or more
common characteristics. The population is not confined to people only; it may also include animals,
events, objects, buildings, and so on, and it can be of any size.
By the term sample, we mean a part of the population chosen at random for participation in the study.
The sample selected should represent the population in all its characteristics and should be free from
bias, so as to produce a miniature cross-section, as the sample observations are used to make
generalizations about the population.
In spite of the above differences, sample and population are related to each other: a sample is drawn
from the population, so without a population, a sample cannot exist. Further, the primary objective of
the sample is to make statistical inferences about the population, and these should be as accurate as
possible. The greater the size of the sample, the higher the level of accuracy of generalization.
(Surbhi, 2017)
Estimation
In statistics, estimation refers to the process by which one makes inferences about a population, based
on information obtained from a sample. There are two types of estimation in statistics: point estimates
and interval estimates.
A point estimate is a value of a sample statistic that is used as a single estimate of a population
parameter. No statements are made about the quality or precision of a point estimate.
Statisticians prefer interval estimates because interval estimates are accompanied by a statement
concerning the degree of confidence that the interval contains the population parameter being
estimated. Interval estimates of population parameters are known as confidence intervals.
(Williams et al, 2020)
When σ is known
$$\mu = \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
µ: Population mean
x̄: Sample mean
α: Significance level
z_{α/2}: The z-value of the standard normal distribution
σ: Population standard deviation
n: Sample size
Example: Find the interval estimate of the population mean at the 10% significance level if a sample of
81 people has a mean of 65 kg and the population standard deviation is 20 kg.
$$\mu = 65 \pm 1.645 \times \frac{20}{\sqrt{81}} = 65 \pm 3.66 = (61.34;\ 68.66)$$
Conclusion: With 90% confidence, the population mean in this case ranges from 61.34 kg to 68.66 kg.
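The interval estimate with a known σ can be sketched in Python using the standard library's normal distribution. The function name `z_interval` is a hypothetical helper introduced here for illustration. Because the critical value is computed exactly instead of being read from a table, the bounds may differ in the second decimal from a hand calculation that rounds intermediate results.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for the population mean when sigma is known."""
    alpha = 1 - confidence
    # z-value that leaves alpha / 2 in the upper tail of the standard normal.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Inputs from the example: mean 65 kg, sigma 20 kg, n = 81, 10% significance.
low, high = z_interval(65, 20, 81, 0.90)
print(round(low, 2), round(high, 2))
```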
When σ is unknown
$$\mu = \bar{x} \pm t_{\alpha/2}^{(n-1)} \frac{s}{\sqrt{n}}$$
µ: Population mean
x̄: Sample mean
α: Significance level
t_{α/2}^{(n−1)}: The value of the Student (t) probability distribution with n − 1 degrees of freedom
s: Sample standard deviation
n: Sample size
Example: Find the interval estimate of the population mean income with 95% confidence if a sample of
25 persons has a mean of $1000 and the sample standard deviation is $30.
In this case: x̄ = 1000; s = 30; n = 25
Confidence level = 95%
α = 100% − 95% = 5% = 0.05
$$\mu = \bar{x} \pm t_{\alpha/2}^{(n-1)} \frac{s}{\sqrt{n}} = 1000 \pm 2.064 \times \frac{30}{\sqrt{25}} = 1000 \pm 12.384 = (987.616;\ 1012.384)$$
Conclusion: The population mean of this case ranges from $987.616 to $1012.384.
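The same calculation can be written as a short Python sketch. The helper name `t_interval` is hypothetical, and the critical value 2.064 is passed in as a constant read from a t-table (as in the example), because the standard library has no Student t distribution.

```python
from math import sqrt

def t_interval(sample_mean, s, n, t_critical):
    """Confidence interval for the mean when sigma is unknown.

    t_critical is the table value t_{alpha/2} with n - 1 degrees of freedom.
    """
    margin = t_critical * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Inputs from the example: mean 1000, s = 30, n = 25, t_{0.025, 24} = 2.064.
low, high = t_interval(1000, 30, 25, 2.064)
print(round(low, 3), round(high, 3))
```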
Example calculated in Excel: Calculate the interval estimate of the population mean of female salaries
in the data with 95% confidence.
After applying the given data to the formula, the conclusion in this case is that the population mean of
female salaries is between $36,088.53 and $38,331.33.
Hypotheses testing
Hypothesis testing is a form of statistical inference that uses data from a sample to draw conclusions
about a population parameter or a population probability distribution. Statistical analysts test a
hypothesis by measuring and examining a random sample of the population being analyzed. All
analysts use a random population sample to test two different hypotheses: the null hypothesis and the
alternative hypothesis.
First, a tentative assumption is made about the parameter or distribution. This assumption is called
the null hypothesis and is denoted by Ho. An alternative hypothesis (denoted Ha), which is the
opposite of what is stated in the null hypothesis, is then defined. The hypothesis-testing procedure
involves using sample data to determine whether or not Ho can be rejected. If Ho is rejected, the
statistical conclusion is that the alternative hypothesis Ha is true. (Williams et al, 2020)
The null hypothesis is usually a hypothesis of equality between population parameters; e.g., a null
hypothesis may state that the population mean return is equal to zero. The alternative hypothesis is
effectively the opposite of a null hypothesis (e.g., the population mean return is not equal to zero).
Thus, they are mutually exclusive, and only one can be true. However, one of the two hypotheses will
always be true. (Majaski, 2021)
The following table shows the different hypotheses in their relevant pairs. In particular, Ho is always
the one that contains the equality (=) sign.
Ho                              Ha
Equal (=)                       Not equal (≠)
Greater than or equal to (≥)    Less than (<)
Less than or equal to (≤)       More than (>)
In general, a hypothesis test about the value of a population mean µ must take one of the following
three forms (where µo is the hypothesized value of the population mean):
Ho: µ ≥ µo                 Ho: µ ≤ µo                 Ho: µ = µo
Ha: µ < µo                 Ha: µ > µo                 Ha: µ ≠ µo
One-tailed (Lower-tail)    One-tailed (Upper-tail)    Two-tailed
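Conditions to reject Ho (one-tailed tests), using the test statistic
$$t = \frac{(\bar{x} - \mu_o)\sqrt{n}}{s}$$
Ho          Ha          Reject Ho
µ ≤ µo      µ > µo      $$t \ge t_{\alpha}^{(n-1)}$$
µ ≥ µo      µ < µo      $$t \le -t_{\alpha}^{(n-1)}$$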
Two-tailed testing
Example 1: A clothing designer claims that the mean height of adult females is 155 cm. The evidence
we have is that a sample of 16 females had an average height of 153 cm and the population standard
deviation is known to be 9 cm. At the 1% significance level, is the clothing designer's claim right or
wrong?
Step 1:
Assuming Ho: the mean height of adult females is 155 cm, µ = µo = 155 cm.
Ha: the mean height of adult females is NOT 155 cm, µ ≠ µo = 155 cm.
Step 2: µo = 155 cm; n = 16; x̄ = 153 cm; σ = 9 cm
$$z = \frac{(\bar{x} - \mu_o)\sqrt{n}}{\sigma} = \frac{(153 - 155)\sqrt{16}}{9} = -0.889$$
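To show how this example can be carried through to a decision, the sketch below computes the test statistic as (sample mean − hypothesized mean)·√n / σ, together with the two-tailed critical value and p-value from the standard normal distribution. The function name `two_tailed_z_test` is a hypothetical helper, and the use of the normal distribution rests on the assumption that the population standard deviation is known, as stated in the example.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha):
    """Two-tailed z-test for a mean with known population sigma."""
    z = (sample_mean - mu0) * sqrt(n) / sigma
    # Critical value leaving alpha / 2 in each tail.
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability of a statistic at least this extreme in either tail.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, z_critical, p_value

z, z_critical, p_value = two_tailed_z_test(153, 155, 9, 16, 0.01)
reject_h0 = abs(z) >= z_critical

print(round(z, 3), round(z_critical, 3), round(p_value, 3))
print("Reject Ho" if reject_h0 else "Do not reject Ho")
```

Under these inputs the statistic is well inside the critical values, so the sample gives no grounds to reject the designer's claim at the 1% level.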
Example 2: In the statistical task for the BMW company, the author compares the mean of the "Price"
variable of the database in the attached Excel file with a test value of 17200 at the 99% level of
confidence.
Step 1: Assuming Ho: the average price of the BMW is €17200: µ = µo = 17200
Ha: the average price of the BMW is NOT €17200: µ ≠ µo = 17200
Step 2: Confidence coefficient is 99%
α = 1 – 99% = 1% = 0.01
Step 3:
One-Sample Statistics
             N     Mean        Std. Deviation    Std. Error Mean
price (€)    50    17228.00    12095.606         1710.577
One-Sample Test (Test Value = 17200)
                                                                  99% Confidence Interval of the Difference
             t       df    Sig. (2-tailed)    Mean Difference     Lower        Upper
price (€)    .016    49    .987               28.000              -4556.26     4612.26
Conclusion: Because Sig. (2-tailed) = 0.987 is greater than α = 0.01, Ho is not rejected, which means
the data are consistent with an average BMW price of €17200: µ = µo = 17200.
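The t statistic in the One-Sample Test table can be reproduced from the summary figures in the One-Sample Statistics table. The sketch below uses only the reported N, mean and standard deviation; the variable names are chosen here for illustration.

```python
from math import sqrt

# Summary statistics reported in the One-Sample Statistics table.
n = 50
sample_mean = 17228.00
std_dev = 12095.606
test_value = 17200

# Standard error of the mean: s / sqrt(n).
std_error = std_dev / sqrt(n)

# t statistic: mean difference divided by the standard error.
mean_difference = sample_mean - test_value
t_stat = mean_difference / std_error

print(round(std_error, 3))
print(round(mean_difference, 3))
print(round(t_stat, 3))
```

The results agree with the reported Std. Error Mean, Mean Difference and t values.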
One-tailed testing
Example 1: A sample of 16 people has an average weight of 55 kg and a standard deviation of 5 kg.
Let's test the claim that the population mean is less than or equal to 52 kg at the 90% level of
confidence.
Step 1: Assuming Ho: the mean weight of people is less than or equal to 52 kg, µ ≤ µo = 52 kg.
Ha: the mean weight of people is greater than 52 kg, µ > µo = 52 kg.
Step 2: µo = 52 kg; n = 16; x̄ = 55 kg; s = 5 kg
$$t = \frac{(\bar{x} - \mu_o)\sqrt{n}}{s} = \frac{(55 - 52)\sqrt{16}}{5} = 2.4$$
Step 3: Confidence = 90%
α = 10%; n = 16
$$t_{\alpha}^{(n-1)} = t_{0.1}^{(15)} = 1.341$$
Step 4: Because t = 2.4 ≥ 1.341, Ho is rejected.
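The upper-tail test can be sketched as follows. Evaluating the test statistic with the example's inputs gives (55 − 52)·√16 / 5 = 2.4. The helper name `one_sample_t` is hypothetical, and the critical value 1.341 is taken from the example as a t-table constant.

```python
from math import sqrt

def one_sample_t(sample_mean, mu0, s, n):
    """t statistic for a one-sample test of the mean."""
    return (sample_mean - mu0) * sqrt(n) / s

# Inputs from the example: mean 55 kg, mu0 = 52 kg, s = 5 kg, n = 16.
t_stat = one_sample_t(55, 52, 5, 16)

# Table value t_{0.10} with 15 degrees of freedom, as given in the example.
T_CRITICAL = 1.341

# Upper-tail rule: reject Ho (mu <= mu0) when t >= t_alpha.
reject_h0 = t_stat >= T_CRITICAL

print(t_stat)
print("Reject Ho" if reject_h0 else "Do not reject Ho")
```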
Example 2: In the statistical task for BMW, the author tested whether the mean of the "Price" variable
of the database in the attached Excel file is less than or equal to €20000 at the 95% level of confidence
(significance level = 0.05) and the 99% level of confidence (significance level = 0.01) by using the
t-value and p-value tests.
Assuming Ho: the mean price of the BMW is less than or equal to €20000: µ ≤ µo = 20000
Ha: the mean price of the BMW is more than €20000: µ > µo = 20000
Findings:
After testing Ho in Excel, the findings are:
t-multiple at 5% level of significance = -1.677
t-multiple at 1% level of significance = -2.405
t-value test = -1.621
p-value test = 0.056
Discussion:
By using t-value
At 5% level of significance: |t-value| = |−1.621| = 1.621 < |t-multiple| = |−1.677| = 1.677
Do not reject Ho
By using p-value
At 5% level of significance: p-value = 0.056 > alpha = 0.05
Do not reject Ho
Conclusion: Ho is not rejected, which means that the data do not contradict the claim that the mean
price of the BMW is less than or equal to €20000: µ ≤ µo = 20000.
• Data values must be independent. Measurements for one observation do not affect
measurements for any other observation.
• Data in each group must be obtained via a random sample from the population.
• Data in each group are normally distributed.
• Data values are continuous.
• The variances for the two independent groups are equal.
Estimation
Example 1:
Step 1: Degrees of freedom when the standard deviations are similar (4 : 3.5 = 1.1428 < 1.5)
$df = n_1 + n_2 - 2 = 22 + 20 - 2 = 40$
Step 2:
Confidence coefficient = 95%
Significance level = 1 – 95% = 5% = 0.05
α/2 = 0.05/2 = 0.025
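$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2, df}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 150 - 146 \pm 2.021\sqrt{14.21875\left(\frac{1}{22} + \frac{1}{20}\right)} = 4 \pm 2.355$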
Step 5: Conclusion: We are 95% confident that the difference between the average Math grades of females and males is 4 ± 2.355 = (1.645; 6.355). In other words, on average, the Math grade of males is between 1.645 and 6.355 points higher than that of females.
Hypothesis testing
Example 2:
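Conditions to reject Ho
Ho: $\mu_1 - \mu_2 = 0$
Ha: $\mu_1 - \mu_2 \ne 0$
Value testing: $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
Reject Ho if: $t \le -t_{\alpha/2}$ or $t \ge t_{\alpha/2}$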
Step 1: Assuming
Ho that there is NO difference between the Math grades of males and females, $\mu_1 - \mu_2 = 0$.
Ha that there is a difference between the Math grades of males and females, $\mu_1 - \mu_2 \ne 0$.
Step 2: Similar variance of the two samples (pooled variance):
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(22 - 1)4^2 + (20 - 1)3.5^2}{22 + 20 - 2} = 14.21875$
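Step 3:
Calculate t
$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(150 - 146) - 0}{\sqrt{14.21875\left(\frac{1}{22} + \frac{1}{20}\right)}} = 3.4334$
Calculate $t_{\alpha/2, df}$
$t_{\alpha/2, df} = t_{0.025, 40} = 2.021$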
Step 4: Comparison
Reject Ho
Looking at the graph given below, we can see the Ho and Ha regions. Because $t = 3.4334 > t_{\alpha/2, df} = 2.021$, the test statistic falls in the right-hand rejection region (the Ha part), so Ho is rejected.
Step 5: Conclusion: there is a difference between the Math grades of males and females, $\mu_1 - \mu_2 \ne 0$.
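The pooled-variance calculation used in the estimation and hypothesis-testing examples above can be reproduced from the summary statistics alone. The sketch below is illustrative only: the variable names are hypothetical, and the critical value 2.021 is the table value quoted in the text rather than something the code derives.

```python
import math

# Summary statistics quoted in the worked examples
n1, mean1, sd1 = 22, 150, 4.0
n2, mean2, sd2 = 20, 146, 3.5
t_critical = 2.021  # t(0.025, df = 40), table value used in the text

# Pooled variance and standard error of the difference in means
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-tailed test of Ho: mu1 - mu2 = 0
t_stat = ((mean1 - mean2) - 0) / std_error
reject_h0 = abs(t_stat) >= t_critical

# 95% confidence interval for mu1 - mu2
margin = t_critical * std_error
ci = (mean1 - mean2 - margin, mean1 - mean2 + margin)

print(f"pooled variance = {pooled_var}, t = {t_stat:.4f}, reject Ho: {reject_h0}")
print(f"95% CI = ({ci[0]:.3f}; {ci[1]:.3f})")
```

The same standard error feeds both the interval and the test, which is why a 95% interval that excludes 0 agrees with rejecting Ho at α = 0.05.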
Example 3:
Assuming
Ho: there is NO difference between the average salary of technical staff and non-technical staff: $\mu_1 - \mu_2 = 0$
Ha: there is a difference between the average salary of technical staff and non-technical staff: $\mu_1 - \mu_2 \ne 0$
Group Statistics
          PC Job   N     Mean       Std. Deviation   Std. Error Mean
Salary    Yes      19    40305.26   6026.467         1382.566
          No       189   39883.39   11662.429        848.317
Notice: "Yes" stands for technical staff; "No" stands for non-technical staff.
Conclusion: Ho is not rejected, which means there is NO statistically significant difference between the average salary of technical staff and non-technical staff. In other words, the data do not show that the average salary of technical staff differs from that of non-technical staff.
Husband                         14   11   8    13   28
Diff (DF) = Wife - Husband      5    -1   4    9    -3
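Step 2:
Calculate SD
Have: $n_d = 5$; $\bar{d} = 2.8$
$s_d^2 = \frac{\sum (d_i - \bar{d})^2}{n_d - 1} = \frac{(5 - 2.8)^2 + (-1 - 2.8)^2 + (4 - 2.8)^2 + (9 - 2.8)^2 + (-3 - 2.8)^2}{5 - 1} = 23.2$
$s_d = \sqrt{23.2} \approx 4.817$
Step 3:
DF = $n_d - 1 = 5 - 1 = 4$
With 95% confidence
Significance level = 1 – 95% = 0.05
α/2 = 0.05/2 = 0.025
$\bar{d} \pm t_{\alpha/2, DF}\frac{s_d}{\sqrt{n_d}} = 2.8 \pm 2.776 \times \frac{4.817}{\sqrt{5}} = 2.8 \pm 5.98 = (-3.18; 8.78)$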
Step 5:
Interpretation: The mean difference could be as high as 8.78 (in favor of the wives) or as low as -3.18 (the "-" sign indicates that it could be in favor of the husbands).
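The paired-sample interval above can be checked from the five differences in the table. This is a minimal sketch using only the Python standard library; the critical value 2.776 is the usual t-table entry for α/2 = 0.025 and 4 degrees of freedom, and the variable names are hypothetical.

```python
import math
import statistics

# Differences (Wife - Husband) listed in the table
differences = [5, -1, 4, 9, -3]

n = len(differences)
mean_diff = statistics.mean(differences)   # d-bar
sd_diff = statistics.stdev(differences)    # sample SD, divisor n - 1

t_critical = 2.776  # t(0.025, df = 4) from a standard t-table

margin = t_critical * sd_diff / math.sqrt(n)
ci_low, ci_high = mean_diff - margin, mean_diff + margin
print(f"mean = {mean_diff}, sd = {sd_diff:.3f}, "
      f"95% CI = ({ci_low:.2f}; {ci_high:.2f})")
```

Since the interval contains 0, these five pairs alone do not show a significant difference between wives and husbands at the 5% level.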
Hypothesis testing
Example 2:
Testing the difference between the mean of the Mile age variable and the mean of the Price variable, based on a data set of the BMW company, at the 95% level of confidence (α = 0.05).
Step 1: Assuming
Ho that there is NO difference between the average of Mile age and the average of Price: $\mu_1 - \mu_2 = 0$
Ha that there is a difference between the average of Mile age and the average of Price: $\mu_1 - \mu_2 \ne 0$
Paired Samples Statistics
                   Mean        N    Std. Deviation   Std. Error Mean
Pair 1   mileage   119943.82   50   71202.830        10069.601
         price     17228.00    50   12095.606        1710.577
In conclusion: There is a difference between the average of Mile age and the average of Price: $\mu_1 - \mu_2 \ne 0$
Measuring the association between two variables (from the dataset)
Correlation analysis
Correlation analysis is a statistical technique used to measure the strength of the linear relationship between two variables and to compute their association. Put simply, correlation analysis calculates the level of change in one variable due to the change in the other. A high correlation points to a strong relationship between the two variables, while a low correlation means that the variables are weakly related. (QuestionPro, 2021)
The possible range of values for the correlation coefficient is –1.0 to 1.0. In other words, the values cannot exceed 1.0 or be less than –1.0. A correlation of –1.0 indicates a perfect negative correlation, and a correlation of 1.0 indicates a perfect positive correlation. If the correlation coefficient is greater than zero, it is a positive relationship. Conversely, if the value is less than zero, it is a negative relationship. A value of zero indicates that there is no linear relationship between the two variables. (Nickolas, 2021)
There are several types of correlation coefficient formulas. One of the most commonly used is Pearson's correlation coefficient. In addition, there are separate formulas for the correlation coefficient of a population and of a sample.
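Correlation coefficient of population:
$\rho = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum (x_i - \mu_x)^2 \sum (y_i - \mu_y)^2}}$
Correlation coefficient of sample:
$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$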
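The sample formula translates almost line by line into code. The sketch below is a hedged illustration only: the function name `pearson_r` and the toy data are hypothetical and are not taken from the BMW dataset.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation: sum of cross-deviations divided by
    the square root of the product of the sums of squared deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = math.sqrt(
        sum((x - x_bar) ** 2 for x in xs) * sum((y - y_bar) ** 2 for y in ys)
    )
    return numerator / denominator

# Toy data (hypothetical): a perfectly increasing and a perfectly decreasing line
r_positive = pearson_r([1, 2, 3], [2, 4, 6])
r_negative = pearson_r([1, 2, 3], [6, 4, 2])
print(r_positive, r_negative)
```

The two toy cases sit at the ends of the allowed range, which is a quick sanity check that the result stays between –1.0 and 1.0.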
The table below shows the result of running a Pearson correlation analysis in SPSS on the BMW data, so that we can understand the parameters in the correlation output and make the statistical analysis more effective. Four variables are included: Price, Engine power, Registration date and Mile age.
Correlations
                                         price     engine_power   registration_date   mile_age
price               Pearson Correlation  1         .355*          .702**              -.603**
                    Sig. (2-tailed)                .011           .000                .000
                    N                    50        50             50                  50
engine_power        Pearson Correlation  .355*     1              .067                -.194
                    Sig. (2-tailed)      .011                     .643                .177
                    N                    50        50             50                  50
registration_date   Pearson Correlation  .702**    .067           1                   -.623**
                    Sig. (2-tailed)      .000      .643                               .000
                    N                    50        50             50                  50
mile_age            Pearson Correlation  -.603**   -.194          -.623**             1
                    Sig. (2-tailed)      .000      .177           .000
                    N                    50        50             50                  50
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
Looking at this table, we can see that Price, Engine power, Registration date and Mile age each have a perfect linear relationship with themselves, so r = 1 on the diagonal. When reading the Correlation table, we are interested in the Sig. and Pearson Correlation values. The Sig. value (highlighted in purple) must be less than α = 0.05 for the correlation r to be statistically significant. If Sig. is smaller than 0.05, we conclude that the two variables are linearly correlated; if it is higher than 0.05, there is no statistically significant linear correlation between them.
Specifically, the Sig. value of Registration date with Engine power is 0.643 (> α = 0.05), so there is no significant linear correlation between these two variables. In contrast, there is a correlation between Engine power and Price because the Sig. of 0.011 is smaller than 0.05. The strength of the correlation is given by the Pearson Correlation value (highlighted in green): the correlation coefficient between Engine power and Price is 0.355.
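The Sig. values in the Correlations table come from a t-test on r with n − 2 degrees of freedom. A minimal sketch of that test follows, assuming the standard formula t = r√(n − 2)/√(1 − r²); the function name is hypothetical and the critical value 2.011 is an approximate t-table entry for α/2 = 0.025 and 48 degrees of freedom.

```python
import math

def correlation_t(r, n):
    """t statistic for testing Ho: no linear correlation (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

n = 50
t_critical = 2.011  # approximate t(0.025, df = 48) from a t-table

# Correlations read from the SPSS table
t_engine_price = correlation_t(0.355, n)         # Engine power vs Price
t_registration_engine = correlation_t(0.067, n)  # Registration date vs Engine power

significant_engine_price = abs(t_engine_price) > t_critical
significant_registration_engine = abs(t_registration_engine) > t_critical
print(f"{t_engine_price:.3f} {significant_engine_price}")
print(f"{t_registration_engine:.3f} {significant_registration_engine}")
```

The two outcomes match the table: the first pair has Sig. = .011 (significant) and the second has Sig. = .643 (not significant).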
Regression
Regression analysis is a set of statistical methods used to estimate the relationships between a dependent variable and one or more independent variables. It can be used to assess the strength of the relationship between variables and to model the future relationship between them. Regression models describe the relationship between variables by fitting a line to the observed data. Linear regression models use a straight line, while logistic and nonlinear regression models use a curved line. Regression allows you to estimate how a dependent variable changes as the independent variable(s) change.
Regression analysis includes several variations, such as linear, multiple linear, and nonlinear. The most common models are simple linear and multiple linear. Nonlinear regression analysis is commonly used for more complicated data sets in which the dependent and independent variables show a nonlinear relationship. (CFI Education, 2021)
$\beta_0$: the intercept, the predicted value of $y$ when $x$ is 0
$\beta_1$: the regression coefficient – how much we expect $y$ to change as $x$ increases
$x$: the independent variable (the variable we expect is influencing $y$)
$\epsilon$: the error of the estimate, or how much variation there is in our estimate of the regression coefficient
Example 1:
A part-time employee is paid a basic salary of 1 million VND. In addition, for each contract he signs, he is paid an additional 200,000 VND. Let S be the total amount of money the employee receives in a month and T the total number of contracts he signed; then we have the equation $S = 1{,}000{,}000 + 200{,}000T$.
So we can easily calculate that, if the employee signs 3 contracts, the total amount he gets is $S = 1{,}000{,}000 + 200{,}000 \times 3 = 1{,}600{,}000$ VND.
Example 2:
Practice on SPSS
I choose the dependent variable Price and the independent variable Engine power to run linear
regression for BMW dataset. The results obtained are the following four tables:
Variables Entered/Removed (a)
Model   Variables Entered   Variables Removed   Method
1       engine_power (b)    .                   Enter
a. Dependent Variable: price
b. All requested variables entered.

Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .355 (a)   .126       .108                11422.903
a. Predictors: (Constant), engine_power

ANOVA (a)
Model           Sum of Squares    df   Mean Square      F       Sig.
1  Regression   905710557.834     1    905710557.834    6.941   .011 (b)
   Residual     6263170242.166    48   130482713.378
   Total        7168880800.000    49
a. Dependent Variable: price
b. Predictors: (Constant), engine_power

Coefficients (a)
                  Unstandardized Coefficients   Standardized Coefficients
Model             B          Std. Error         Beta                        t       Sig.
1  (Constant)     4648.878   5040.431                                       .922    .361
   engine_power   88.176     33.468             .355                        2.635   .011
a. Dependent Variable: price
The ANOVA table shows that the Sig. of the F test is 0.011, which is smaller than 0.05. Therefore, the regression model is statistically significant. Next, based on the Coefficients table, the linear regression model can be determined as: $y = 4648.878 + 88.176x$. More specifically, where $y$ is Price and $x$ is Engine power, the equation is expressed as $Price = 4648.878 + 88.176 \times Engine\ power$.
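The figures in the SPSS output are linked by simple identities, so they can be cross-checked from the ANOVA sums of squares. The sketch below is an illustration under that assumption; `predict_price` is a hypothetical helper and the engine power value of 100 is a made-up input, not a row from the dataset.

```python
# Sums of squares and degrees of freedom read from the ANOVA table
ss_regression = 905710557.834
ss_residual = 6263170242.166
df_regression, df_residual = 1, 48

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total  # share of variance explained
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

def predict_price(engine_power):
    """Fitted line from the Coefficients table: Price = B0 + B1 * Engine power."""
    return 4648.878 + 88.176 * engine_power

example_price = predict_price(100)  # hypothetical engine power of 100
print(f"R^2 = {r_squared:.3f}, F = {f_stat:.3f}, "
      f"predicted price = {example_price:.3f}")
```

With a single predictor, R Square is also the square of the Pearson correlation (0.355² ≈ 0.126), which ties this output back to the Correlations table.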
ordinary least-squares (OLS) regression because it includes more than one explanatory variable. (Hayes, 2021)
Formula and Calculation of Multiple Linear Regression:
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon$
Where, for $i = n$ observations:
$y_i$: dependent variable
$x_i$: explanatory variables
$\beta_0$: y-intercept (constant term)
$\beta_p$: slope coefficients for each explanatory variable
$\epsilon$: the model's error term (also known as the residuals)
When running Multiple linear regression in SPSS, we focus on parameters such as R-squared, Adjusted R-squared and Durbin-Watson in the Model Summary, the Sig. value in the ANOVA table, and VIF and B in the Coefficients table.
The "R" column represents the value of R, the multiple correlation coefficient. R can be viewed as one measure of the quality of the prediction of the dependent variable.
The "R Square" column represents the R2 value (also called the coefficient of determination), which is the proportion of variance in the dependent variable that can be explained by the independent variables (technically, it is the proportion of variation accounted for by the regression model above and beyond the mean model). (Laerd Statistics, 2021)
"Adjusted R Square" (adj. R2) is a modified version of R-squared that has been adjusted for the number of predictors in the model. The adjusted R-squared increases when a new term improves the model more than would be expected by chance. It decreases when a predictor improves the model by less than expected. Typically, the adjusted R-squared is positive, not negative. It is never higher than the R-squared. (The Investopedia Team, 2021)
The Durbin Watson (DW) statistic is a test for autocorrelation in the residuals from a statistical model or regression analysis. The Durbin-Watson statistic always has a value between 0 and 4. (Kenton, 2021)
Values from 1.5 to 2.5 indicate that no autocorrelation is detected in the sample.
A value close to 0 indicates positive autocorrelation.
A value close to 4 indicates negative autocorrelation.
Unstandardized coefficients show how much the dependent variable varies with an independent variable when all other independent variables are held constant.
We can test for the statistical significance of each of the independent variables. This tests whether the unstandardized (or standardized) coefficients are equal to 0 (zero) in the population. If p < .05, we can conclude that the coefficients are statistically significantly different from 0 (zero). The t-value and corresponding p-value are located in the “t” and “Sig.” columns, respectively. (Laerd Statistics, 2021)
Variance inflation factor (VIF) is a measure of the amount of multicollinearity in a set of multiple regression variables. Mathematically, the VIF for a regression model variable is equal to the ratio of the overall model variance to the variance of a model that includes only that single independent variable. This ratio is calculated for each independent variable. A high VIF indicates that the associated independent variable is highly collinear with the other variables in the model. (The Investopedia Team, 2021)
Exactly how large a VIF has to be before it causes issues is a subject of debate. What is known is that the more your VIF increases, the less reliable your regression results are going to be. In general, a VIF above 10 indicates high correlation and is cause for concern. Some authors suggest a more conservative level of 2.5 or above. (Stephanie, 2015)
A standardized beta coefficient compares the strength of the effect of each individual independent variable on the dependent variable. The higher the absolute value of the beta coefficient, the stronger the effect. (Stephanie, 2016)
Example:
Practice on SPSS the BMW’s dataset
Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .650 (a)   .422       .398                9387.581                     1.737
a. Predictors: (Constant), engine_power, mile_age
b. Dependent Variable: price
Assuming Ho: $R^2 = 0$ (the model does not exist).
Ha: $R^2 \ne 0$ (the model does exist).
From the Model Summary above, R2 = 0.422, which is different from 0, so Ho is rejected and the model exists. Next, Adjusted R2 = 0.398 indicates that the independent variables account for 39.8% of the variability of the dependent variable. The remaining 60.2% is accounted for by variables outside the model and random error. The last indicator to mention is Durbin-Watson: the value of 1.737 lies within the range from 1.5 to 2.5, which indicates no autocorrelation in the selected data.
ANOVA (a)
Model           Sum of Squares    df   Mean Square       F        Sig.
1  Regression   3026927112.315    2    1513463556.157    17.174   .000 (b)
   Residual     4141953687.685    47   88126674.206
   Total        7168880800.000    49
a. Dependent Variable: price
b. Predictors: (Constant), engine_power, mile_age
From the ANOVA table above, the Sig. of the F-test is 0.000 < α = 0.05, so we can conclude that the regression model is statistically significant.
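R Square, Adjusted R Square and F in the SPSS output can all be recomputed from the ANOVA sums of squares. The sketch below assumes the standard definitions (adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1)); the variable names are hypothetical.

```python
# Values read from the ANOVA table for the two-predictor model
ss_regression = 3026927112.315
ss_residual = 4141953687.685
n_observations = 50
n_predictors = 2  # mile_age and engine_power

ss_total = ss_regression + ss_residual
df_residual = n_observations - n_predictors - 1

r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * (n_observations - 1) / df_residual
f_stat = (ss_regression / n_predictors) / (ss_residual / df_residual)

print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adjusted_r_squared:.3f}, "
      f"F = {f_stat:.3f}")
```

The adjustment penalizes each extra predictor, which is why the adjusted value (0.398) sits slightly below R Square (0.422).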
Coefficients (a)
                  Unstandardized Coefficients   Standardized Coefficients                     Collinearity Statistics
Model             B           Std. Error        Beta                        t        Sig.     Tolerance   VIF
1  (Constant)     19750.508   5160.786                                      3.827    .000
   mile_age       -.094       .019              -.554                       -4.906   .000     .962        1.039
   engine_power   61.512      28.037            .248                        2.194    .033     .962        1.039
a. Dependent Variable: price
From the Coefficients table above, no independent variable has a VIF greater than 10; in fact all are less than 2 (1.039 < 2), so there is no multicollinearity.
Conversely, if the VIF of any variable were greater than 10, we would look back to find which two independent variables are highly correlated, remove each of those two variables in turn, and compare the adjusted R square of the two resulting models. The model with the higher adjusted R square is then selected.
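The Tolerance and VIF columns are two views of the same quantity: VIF = 1 / Tolerance. With exactly two predictors, Tolerance also equals 1 − r², where r is the correlation between the predictors. The sketch below checks this using r = −0.194 (mile_age vs engine_power) from the Correlations table; the variable names are hypothetical.

```python
# Correlation between the two predictors, read from the Correlations table
r_predictors = -0.194

# With two predictors, regressing one on the other gives R^2 = r^2
tolerance = 1 - r_predictors ** 2
vif = 1 / tolerance

# Rule-of-thumb thresholds mentioned in the text
high_collinearity = vif > 10
conservative_flag = vif >= 2.5
print(f"tolerance = {tolerance:.3f}, VIF = {vif:.3f}, "
      f"high: {high_collinearity}, conservative flag: {conservative_flag}")
```

Both thresholds are cleared comfortably, in line with the conclusion that the two predictors are not collinear.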
Moving on to the Sig. column of the t-test, the Sig. values of all variables are less than 0.05 (0.000 < 0.05 and 0.033 < 0.05), which means that the independent variables “Mile Age” and “Engine Power” both have a statistically significant impact on the dependent variable “Price”.
Let's discuss the Coefficients table again, this time focusing on the Unstandardized Coefficients column.
Using the B values in the Coefficients table and the basic regression equation, with $y$ for “Price”, $x_1$ for “Mile Age”, $x_2$ for “Engine Power”, $\beta_0 = 19750.508$, $\beta_1 = -0.094$ and $\beta_2 = 61.512$, I obtain the regression equation below:
$y = 19750.508 - 0.094x_1 + 61.512x_2$
All other factors unchanged, when Mile Age increases by 1 unit, Price decreases by 0.094 (€).
All other factors unchanged, when Engine Power increases by 1 unit, Price rises by 61.512 (€).
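In a linear model, the effect of a one-unit change in a predictor is the coefficient itself, not the intercept plus the coefficient. The sketch below illustrates this with the fitted equation; `predict_price` is a hypothetical helper and the baseline car (100,000 mileage, engine power 120) is a made-up example, not a row from the dataset.

```python
def predict_price(mile_age, engine_power):
    """Fitted equation built from the B column of the Coefficients table."""
    return 19750.508 - 0.094 * mile_age + 61.512 * engine_power

# Hypothetical baseline car used only for illustration
base = predict_price(mile_age=100_000, engine_power=120)

# Change one predictor by 1 unit while holding the other constant
delta_mile_age = predict_price(100_001, 120) - base
delta_engine_power = predict_price(100_000, 121) - base

print(f"baseline = {base:.3f}")
print(f"+1 mile_age     -> {delta_mile_age:+.3f}")
print(f"+1 engine_power -> {delta_engine_power:+.3f}")
```

The differences do not depend on the chosen baseline, which is what "all other factors unchanged" means for a linear equation.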
Moving to the Standardized Coefficients column, these coefficients are better suited for proposing solutions because they put the variables on a common scale. The standardized regression equation is written in decreasing order of the impact of the independent variables:
$Price = -0.554 \times Mile\ Age + 0.248 \times Engine\ Power$
To evaluate further:
All other factors unchanged, when Mile Age increases by 1 standard deviation, Price decreases by 0.554 standard deviations.
All other factors unchanged, when Engine Power increases by 1 standard deviation, Price rises by 0.248 standard deviations.
From the data above, Mean = $-6.77 \times 10^{-17}$, which is approximately 0, and Standard Deviation = 0.979, which is approximately 1. Therefore, the residuals are approximately normally distributed.
From the plot above: the points lie close to the diagonal line, so the residuals are approximately normally distributed.
Scatterplot
Based on the scatter plot, the points are distributed randomly around the 0 axis, so I can conclude that the assumption of a linear relationship between the independent and dependent variables is not violated.
Probability Distribution
A probability distribution is a statistical function that describes all the possible values and likelihoods that a random variable can take within a given range. This range will be bounded between the minimum and maximum possible values, but precisely where the possible value is likely to be plotted on the probability distribution depends on a number of factors. These factors include the distribution's mean (average), standard deviation, skewness, and kurtosis. (Hayes, 2020)
Joint Probability
Joint probability is a statistical measure that calculates the likelihood of two events occurring together and at the same point in time. Joint probability is the probability of event Y occurring at the same time that event X occurs. (Kenton, 2021)
P(A and B) if A and B are independent events:
$P(A \text{ and } B) = P(A) \times P(B)$
The probability of A or B depends on whether or not the events are mutually exclusive (events that cannot happen at the same time).
P(A or B) if A and B are mutually exclusive:
$P(A \text{ or } B) = P(A) + P(B)$
P(A or B) if A and B are NOT mutually exclusive:
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
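These rules can be verified by counting outcomes. The sketch below uses a single roll of a fair six-sided die as a hypothetical example (it is not part of the BMW data); events are modelled as Python sets and exact fractions avoid rounding issues.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # faces of a fair die

def prob(event):
    """Probability of an event as (favourable outcomes) / (all outcomes)."""
    return Fraction(len(event & outcomes), len(outcomes))

even = {2, 4, 6}
greater_than_4 = {5, 6}
one = {1}
two = {2}

# Independent events: P(A and B) = P(A) * P(B)
independent = prob(even & greater_than_4) == prob(even) * prob(greater_than_4)

# Mutually exclusive events: P(A or B) = P(A) + P(B)
p_one_or_two = prob(one | two)

# Not mutually exclusive: P(A or B) = P(A) + P(B) - P(A and B)
p_even_or_gt4 = prob(even) + prob(greater_than_4) - prob(even & greater_than_4)

print(independent, p_one_or_two, p_even_or_gt4)
```

Subtracting P(A and B) in the last rule prevents the outcome 6, which belongs to both events, from being counted twice.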
Conditional Probability
Conditional probability is defined as the likelihood of an event or outcome occurring, based on the occurrence of a previous event or outcome. Conditional probability is calculated by multiplying the probability of the preceding event by the updated probability of the succeeding, or conditional, event. (Barone, 2021)
This revised probability that an event A has occurred, taking into account the additional information that another event B has definitely occurred on this trial of the experiment, is called the conditional probability of A given B and is denoted by P(A|B).
$P(A|B) = \frac{P(A \text{ and } B)}{P(B)}$
Example: I draw a sample of 50 students in class X and classify them by gender (Male, Female) and by subject (Math, Literature, English).
$P(Female|English) = 0.75 \ne P(Female) = 0.6$
The two events “Female” and “English” are dependent events.
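The dependence check can be expressed in a few lines of code. The counts below are hypothetical: they are invented only so that they total 50 students and reproduce the two probabilities quoted above (0.75 and 0.6); they are not the author's actual table.

```python
import math

# Hypothetical counts (gender, subject) chosen to match the quoted probabilities
counts = {
    ("Female", "Math"): 10, ("Female", "Literature"): 11, ("Female", "English"): 9,
    ("Male", "Math"): 9, ("Male", "Literature"): 8, ("Male", "English"): 3,
}
total = sum(counts.values())

p_female = sum(v for (gender, _), v in counts.items() if gender == "Female") / total
p_english = sum(v for (_, subject), v in counts.items() if subject == "English") / total
p_female_and_english = counts[("Female", "English")] / total

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_female_given_english = p_female_and_english / p_english

# Events are independent only if P(A|B) equals P(A)
independent = math.isclose(p_female_given_english, p_female)
print(p_female, p_female_given_english, independent)
```

If the two events were independent, knowing that a student chose English would leave the probability of "Female" unchanged at 0.6.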
Normal Distribution
Normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. In graph form, the normal distribution appears as a bell curve. (Chen, 2021)
Standard normal density function:
$f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$
The standard normal distribution has two parameters: the mean (of 0) and the standard deviation (of 1).
Formula:
$z = \frac{x - \mu}{\sigma}$
Example 1: The weight of females is normally distributed with a mean of 55kg and a standard deviation of 10kg. Let's calculate the probability that X ≤ 57.3kg.
Step 1: Apply the formula
$z = \frac{x - \mu}{\sigma} = \frac{57.3 - 55}{10} = 0.23$
$P(X \le 57.3) = P(Z < 0.23) = 0.5910 = 59.1\%$
Conclusion: The probability of X ≤ 57.3kg is 59.1%.
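The z-table lookup can be reproduced with `statistics.NormalDist` from the Python standard library. This is a minimal sketch of the same calculation; the variable names are hypothetical.

```python
from statistics import NormalDist

# Weight ~ Normal(mean = 55 kg, standard deviation = 10 kg)
weight = NormalDist(mu=55, sigma=10)

# Standardize, then look up the cumulative probability
z = (57.3 - weight.mean) / weight.stdev
p_via_z = NormalDist().cdf(z)   # standard normal, P(Z < z)
p_direct = weight.cdf(57.3)     # same probability without standardizing

print(f"z = {z:.2f}, P(X <= 57.3) = {p_direct:.4f}")
```

Standardizing first and querying the original distribution directly give the same probability, which is the point of the z-formula.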
Example 2:
Clients' expenditure is normally distributed with a mean of $45 and a standard deviation of $3.
Calculate the probability that a randomly selected client spends less than $36.
Calculate the probability that a randomly selected client spends between $13 and $33.
Calculate the probability that a randomly selected client spends more than $12.
Calculate the $ amount such that 80% of all clients spend no more than this.
By using Excel
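As an alternative to the Excel functions, the same four questions can be sketched with `statistics.NormalDist` from the Python standard library. This is not the author's Excel worksheet; the variable names are hypothetical and the results are computed rather than copied from the report.

```python
from statistics import NormalDist

# Client expenditure ~ Normal(mean = $45, standard deviation = $3)
spending = NormalDist(mu=45, sigma=3)

p_less_than_36 = spending.cdf(36)                    # P(X < 36)
p_between_13_33 = spending.cdf(33) - spending.cdf(13)  # P(13 < X < 33)
p_more_than_12 = 1 - spending.cdf(12)                # P(X > 12)
amount_80_percent = spending.inv_cdf(0.80)           # 80th percentile

print(f"P(X < 36)      = {p_less_than_36:.5f}")
print(f"P(13 < X < 33) = {p_between_13_33:.7f}")
print(f"P(X > 12)      = {p_more_than_12:.5f}")
print(f"80th percentile = ${amount_80_percent:.2f}")
```

Because $36 is three standard deviations below the mean and $33 is four below it, the first two probabilities are very small, while spending more than $12 is practically certain.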
Poisson Distribution and Binomial Distribution
The Poisson distribution describes the probability of observing k events during a fixed time interval. If a random variable X follows a Poisson distribution with mean λ, then the probability that X = k events can be calculated by the following equation:
$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$
The Binomial distribution describes the probability of obtaining k successes in n binomial experiments. If a random variable X follows a binomial distribution with success probability p, then the probability that X = k successes can be found by the following formula:
$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$
Comparison
There are several similarities between these two distributions: both are discrete theoretical probability distributions. Further, depending on the values of the parameters, both can be unimodal or bimodal. Additionally, the Binomial distribution can be approximated by the Poisson distribution if the number of trials (n) tends to infinity and the success probability (p) tends to 0 such that m = np (the Poisson mean) stays constant. (Surbhi, 2017)
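The approximation described above can be seen numerically. The sketch below implements the two standard probability mass functions with the Python standard library; the function names and the example values (n = 1000, p = 0.002, k = 3) are hypothetical choices for illustration.

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson distribution with the given mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def binomial_pmf(k, n, p):
    """P(X = k) successes in n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Many trials with a small success probability: m = n * p = 2
n, p, k = 1000, 0.002, 3
binomial_value = binomial_pmf(k, n, p)
poisson_value = poisson_pmf(k, n * p)

print(f"binomial = {binomial_value:.5f}, Poisson approximation = {poisson_value:.5f}")
```

The two values agree to about three decimal places here, and the agreement improves as n grows while n × p is held fixed.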
The differences between Binomial and Poisson distribution can be drawn clearly on the following chart:
Inference
Inference, in statistics, is the process of drawing conclusions about a parameter one is seeking to measure or estimate. One principal approach of statistical inference is Bayesian estimation, which incorporates reasonable expectations or prior judgments (perhaps based on previous studies), as well as new observations or experimental results. Another method is the likelihood approach, in which “prior probabilities” are eschewed in favor of calculating a value of the parameter that would be most “likely” to produce the observed distribution of experimental outcomes.
In parametric inference, a particular mathematical form of the distribution function is assumed. Nonparametric inference avoids this assumption and is used to estimate parameter values of an unknown distribution having an unknown functional form.
Using appropriate charts and tables to communicate findings of given
variables
Frequency table
Frequency refers to the number of times an event or a value occurs. A frequency table is a table that lists items and shows the number of times the items occur. We represent the frequency by the letter ‘f’. A table that presents the frequency of different outcomes in a sample is called a frequency distribution table. (Mastin, 2020)
Bar chart
A bar chart is a graph that plots data using rectangular bars or columns (called bins) that represent the total amount of observations in the data for that category. Bar charts can be displayed with vertical columns, horizontal bars, comparative bars (multiple bars to show a comparison between values), or stacked bars (bars containing multiple types of data). (Mitchell, 2021)
Pie chart
A pie chart is a type of graph that displays data in a circular graph. The pieces of the graph are proportional to the fraction of the whole in each category. In other words, each slice of the pie is relative to the size of that category in the group as a whole. The entire “pie” represents 100% of a whole, while the pie “slices” represent portions of the whole. (Statistics How To, 2021)
Histogram
A histogram is a graphical representation that organizes a group of data points into user-specified ranges. Similar in appearance to a bar chart, the histogram condenses a data series into an easily interpreted visual by taking many data points and grouping them into logical ranges or bins. (Chen, 2021)
Scatter plot
Scatter plots (also called scatter graphs) are similar to line graphs. A line graph uses a line on an X-Y axis to plot a continuous function, while a scatter plot uses dots to represent individual pieces of data. In statistics, these plots are useful to see whether two variables are related to each other. For example, a scatter chart can suggest a linear relationship (i.e. a straight line). (Statistics How To, 2021)
The strengths and weaknesses of using different types of charts and tables
The most effective way of communicating the results of the analysis
Accounting data is often presented as tables of numbers, sometimes simply as a printout from a spreadsheet or reports from an accounting software package. While this style of presentation gives detailed figures, it may not always be the best way to present and communicate information. It may be that some key information should be highlighted, that relationships between particular figures should be emphasized, or that trends should be identified. Appropriate presentation of data as charts or graphs can be a useful analysis tool, and if the data is then properly interpreted this can help the decision-making process.
First, bar charts are among the most common data visualizations. You can use them to quickly compare data across categories, highlight differences, show trends and outliers, and reveal historical highs and lows at a glance. Bar charts are especially effective when you have data that can be split into multiple categories. Second, pie charts are powerful for adding detail to other visualizations. Alone, a pie chart does not give the viewer a way to quickly and accurately compare information. Since the viewer has to create context on their own, key points from your data are missed. Instead of making a pie chart the focus of your dashboard, try using them to drill down on other visualizations. Finally, scatter plots are an effective way to investigate the relationship between different variables, showing whether one variable is a good predictor of another, or whether they tend to change independently. A scatter plot presents lots of distinct data points on a single chart. The chart can then be enhanced with analytics like cluster analysis or trend lines. (Rodgers, n.d)
Conclusion
To conclude, the report has presented findings and recommendations to support decision-making and business planning processes in BMW. To support those, I have analyzed and evaluated qualitative and quantitative raw business data from a range of examples using appropriate statistical methods. I also applied a range of statistical methods used in business planning for quality, inventory and capacity management. Lastly, I used appropriate charts and tables to communicate findings of given variables. Of the tools which I used, SPSS and Excel are very useful in presenting data clearly.
References
Barone, A., 2021. Conditional Probability. [Online] Investopedia. Available at:
<https://www.investopedia.com/terms/c/conditional_probability.asp#:~:text=Conditional%20probabil
ity%20is%20defined%20as,succeeding%2C%20or%20conditional%2C%20event> [Accessed: 13
May 2022].
Bevans, R., 2020. An introduction to simple linear regression. [Online] Scribbr. Available at:
<https://www.scribbr.com/statistics/simple-linear-regression/> [Accessed: 13 May 2022].
BPP Learning Media (2013). Business Essential Marketing Intelligence and Planning. London: BPP
Learning Media, 3rd Ed. [Accessed: 13 May 2022].
CFI Education, 2021. Regression Analysis – The estimation of relationships between a dependent
variable and one or more independent variables. Available at:
<https://corporatefinanceinstitute.com/resources/knowledge/finance/regression-analysis/> [Accessed:
13 May 2022].
Chen, J., 2021. Histogram. [Online] Investopedia. Available at:
<https://www.investopedia.com/terms/h/histogram.asp> [Accessed: 13 May 2022].
Chen, J., 2021. Normal Distribution. [Online] Investopedia. Available at:
<https://www.investopedia.com/terms/n/normaldistribution.asp> [Accessed: 13 May 2022].
Devault, G., 2021. Advantages and Disadvantages of Quantitative Research. [Online] The Balance Small Business. Available at: <https://www.thebalancesmb.com/quantitative-research-advantages-and-disadvantages-2296728> [Accessed: 13 May 2022].
Frost, J., 2021. Measures of Variability: Range, Interquartile Range, Variance, and Standard Deviation.
[Online] Statistics By Jim – Making statistics intuitive. Available at:
<https://statisticsbyjim.com/basics/variability-range-interquartile-variance-standard-deviation/>
[Accessed: 13 May 2022].
Hayes, A., 2020. Probability Distribution. [Online] Investopedia. Available at:
<https://www.investopedia.com/terms/p/probabilitydistribution.asp#:~:text=A%20probability%20dist
ribution%20is%20a,take%20within%20a%20given%20range.&text=These%20factors%20include%2
0the%20distribution's,deviation%2C%20skewness%2C%20and%20kurtosis> [Accessed: 13 May
2022].
Hayes, A., 2021. Multiple Linear Regression (MLR). [Online] Investopedia. Available at:
<https://www.investopedia.com/terms/m/mlr.asp> [Accessed: 13 May 2022].
Kenton, W., 2021. Durbin Watson Statistic Definition. [Online] Investopedia. Available at:
<https://www.investopedia.com/terms/d/durbin-watson-statistic.asp> [Accessed: 13 May 2022].
Kenton, W., 2021. Joint Probability Definition. [Online] Investopedia. Available at:
<https://www.investopedia.com/terms/j/jointprobability.asp#:~:text=Joint%20probability%20is%20a
%20statistical,time%20that%20event%20X%20occurs> [Accessed: 13 May 2022].
Laerd Statistics, 2021. Multiple Regression Analysis using SPSS Statistics. Available at:
<https://statistics.laerd.com/spss-tutorials/multiple-regression-using-spss-statistics.php> [Accessed: 13 May
2022].
Majaski, C., 2021. Hypothesis Testing. [Online] Investopedia. Available at:
<https://www.investopedia.com/terms/h/hypothesistesting.asp> [Accessed: 13 May 2022].
Mastin, L., 2020. Frequency Statistic – Explanation & Examples. [Online] The Story of Mathematics. Available at: <https://www.storyofmathematics.com/frequency-statistic> [Accessed: 13 May 2022].
Mitchell, C., 2021. Bar chart. [Online] Investopedia. Available at:
<https://www.investopedia.com/terms/b/bar-graph.asp> [Accessed: 13 May 2022].
Nickolas, S., 2021. What Do Correlation Coefficients Positive, Negative, and Zero Mean? [Online]
Investopedia. Available at: <https://www.investopedia.com/ask/answers/032515/what-does-it-mean-if-
correlation-coefficient-positive-negative-or-zero.asp> [Accessed: 13 May 2022].
QuestionPro, 2021. Correlation analysis – Using correlation analysis to identify linear relationships
between two variables. Available at: <https://www.questionpro.com/features/correlation-
analysis.html> [Accessed: 13 May 2022].
Rahman, M., 2021. Advantages and disadvantages of qualitative research. [Online] Howandwhat.
Available at: <https://howandwhat.net/advantages-disadvantages-qualitative-research/> [Accessed: 13
May 2022].
Reid, A., 2018. Advantages & Disadvantages of a Frequency Table. [Online] Sciencing. Available at:
<https://sciencing.com/do-calculate-class-width-8516043.html> [Accessed: 13 May 2022].
Richards, L., n.d. The Role of Probability Distribution in Business Management. [Online] Chron.
Available at: <https://smallbusiness.chron.com/role-probability-distribution-business-management-
26268.html> [Accessed: 13 May 2022].
Rodgers, T., n.d. Which Type of Chart or Graph is Right for You? [Online] Available at:
<https://www.tableau.com/learn/whitepapers/which-chart-or-graph-is-right-for-you> [Accessed: 13 May 2022].
ROM Knowledgeware, 2011. Advantages and disadvantages of different types of graphs. [Online]
Available at: <http://www.kmrom.com/Site-En/Articles/ViewArticle.aspx?ArticleID=416> [Accessed:
13 May 2022].
Statistics How To, 2021. Pie Chart: Definition, Examples, Make one in Excel/SPSS. [Online] Available
at: <https://www.statisticshowto.com/probability-and-statistics/descriptive-statistics/pie-chart/>
[Accessed: 13 May 2022].
Statistics How To, 2021. Scatter Plot / Scatter Chart: Definition, Examples, Excel/TI-83/TI-89/SPSS.
[Online] Available at: <https://www.statisticshowto.com/probability-and-statistics/regression-
analysis/scatter-plot-chart/> [Accessed: 13 May 2022].
Statistics Solutions, 2021. One Sample T-Test. [Online] Complete Dissertation by Statistics Solutions.
Available at: <https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/one-
sample-t-test/> [Accessed: 13 May 2022].
statistics.laerd.com, 2021. Measures of Central Tendency. [Online] Available at:
<https://statistics.laerd.com/statistical-guides/measures-central-tendency-mean-mode-median.php>
[Accessed: 13 May 2022].
statisticshowto.com, 2021. Inferential Statistics: Definition, Uses. [Online] Available at:
<https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/inferential-
statistics/> [Accessed: 13 May 2022].
Stephanie, 2015. Variance Inflation Factor. [Online] Statistics How To. Available at:
<https://www.statisticshowto.com/variance-inflation-factor/> [Accessed: 25 December 2021].
Stephanie, 2016. Standardized Beta Coefficient: Definition & Example. [Online] Statistics How To.
Available at: <https://www.statisticshowto.com/standardized-beta-coefficient/> [Accessed: 25
December 2021].
Surbhi, S., 2017. Difference Between Binomial and Poisson Distribution. [Online] Available at:
<https://keydifferences.com/difference-between-binomial-and-poisson-distribution.html> [Accessed:
12 December 2021].
Surbhi, S., 2017. Difference Between Population and Sample. [Online] Key Differences. Available at:
<https://keydifferences.com/difference-between-population-and-sample.html> [Accessed: 06
December 2021].
The fullstory education team, 2021. Qualitative vs. quantitative data: what's the difference? [Online]
Available at: <https://www.fullstory.com/blog/qualitative-vs-quantitative-data/> [Accessed: 06
December 2021].
The Investopedia Team, 2021. R-Squared vs. Adjusted R-Squared: What's the Difference? Available
at: <https://www.investopedia.com/ask/answers/012615/whats-difference-between-rsquared-and-
adjusted-rsquared.asp> [Accessed: 25 December 2021].
The Investopedia Team, 2021. Variance Inflation Factor (VIF). Available at:
<https://www.investopedia.com/terms/v/variance-inflation-factor.asp> [Accessed: 13 May 2022].
Vaughan, T., 2021. 10 Advantages and Disadvantages of Qualitative Research. [Online] Poppulo.
Available at: <https://www.poppulo.com/blog/10-advantages-and-disadvantages-of-qualitative-
research> [Accessed: 13 May 2022].
weebly, n.d. Pros and Cons of Histograms. [Online] Available at:
<https://histogramsdennard.weebly.com/pros-and-cons-of-histograms.html> [Accessed: 13 May
2022].
Weedmark, D., 2021. Importance of Variation in Total Quality Management. [Online] Chron. Available
at: <https://smallbusiness.chron.com/importance-variation-total-quality-management-52234.html>
[Accessed: 13 May 2022].
Williams, T. A., Anderson, D. R. and Sweeney, D. J., 2020. Statistics. [Online] Encyclopedia
Britannica. Available at: <https://www.britannica.com/science/statistics> [Accessed: 13 May 2022].