Report Stat
of data. Statistics is a discipline that deals with every aspect of data. Statistical knowledge
helps in choosing a proper method of collecting data and in applying the correct analysis to the
samples so that results are produced effectively. In short, statistics is a crucial
tool for making decisions based on data.
The term 'statistic' was introduced by the Italian scholar Girolamo Ghilini in 1589 with reference to
this science.[4][5] The birth of statistics is often dated to 1662, when John Graunt, along with William
Petty, developed early human statistical and census methods that provided a framework for
modern demography.
INFERENTIAL STATISTICS The methods used to estimate a property of a population on the basis of a
sample. For example, a recent survey showed only 46 percent of high school seniors can solve problems
involving fractions, decimals, and percentages.
Variable: Any characteristic which may vary either in magnitude or in quality is called a variable. A
variable may also be called a data item. Age, sex, business income and expenses,
country of birth, capital expenditure, class grades, eye colour and vehicle type are
examples of variables. It is called a variable because the value may vary between
data units in a population, and may change in value over time.
Numeric variables
Numeric variables have values that describe a measurable quantity as a
number, like 'how many' or 'how much'. Therefore numeric variables are
quantitative variables.
Categorical variables
Data presentation is defined as the process of using various graphical formats to visually represent the
relationship between two or more data sets so that an informed decision can be made based on them.
Textual- This is the simplest of the data presentation methods: all the findings are
written out in a coherent manner. The demerit of this method is that one has to read the
whole text to get a clear picture. An introduction, summary, and conclusion can help condense the
information.
Tabular- To avoid the complexity of the textual method, people use tables
to present data. In this method, data is presented in rows and columns. A frequency table is a common example.
Diagrammatic- Data is presented in charts such as bar charts, pie charts, line graphs, and scatter plots.
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes.
The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
Cumulative frequency distribution: shows the number of items with values less than or equal to the upper limit of each class.
Class frequency: The number of observations in each class.
For qualitative data-
BAR CHART A graph that shows qualitative classes on the horizontal axis and the class frequencies on
the vertical axis. The class frequencies are proportional to the heights of the bars.
PIE CHART A chart that shows the proportion or percentage that each class represents of the total
number of frequencies.
Example- SkiLodges.com is test marketing its new website and is interested in how easy its Web page
design is to navigate. It randomly selected 200 regular Internet users and asked them to perform a
search task on the Web page. Each person was asked to rate the relative ease of navigation as poor,
good, excellent, or awesome. The results are shown in the following table:
Rating      Frequency
Awesome     102
Excellent   58
Good        30
Poor        10
1. Draw a bar diagram and a pie chart, together with a frequency table.
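The frequency table behind both charts can be sketched in a few lines of Python. This is only an illustration, assuming the four counts from the SkiLodges.com table; the function name `frequency_table` and the text-based bars are choices made here, not part of the original exercise. The pie-chart angle of each class is its relative frequency times 360 degrees.

```python
# Ratings and counts copied from the SkiLodges.com example.
ratings = {"Awesome": 102, "Excellent": 58, "Good": 30, "Poor": 10}

total = sum(ratings.values())  # number of respondents


def frequency_table(counts):
    """Return rows of (class, frequency, relative frequency, pie angle in degrees)."""
    n = sum(counts.values())
    return [(label, f, f / n, 360 * f / n) for label, f in counts.items()]


table = frequency_table(ratings)

print(f"{'Rating':<10}{'Freq':>5}{'Rel.':>8}{'Angle':>8}")
for label, f, rel, angle in table:
    bar = "#" * (f // 2)  # crude text bar: one '#' per two respondents
    print(f"{label:<10}{f:>5}{rel:>8.2f}{angle:>8.1f}  {bar}")
```

The bar lengths are proportional to the class frequencies, which is the defining property of a bar chart; a plotting library could replace the text bars if one is available.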
HISTOGRAM A graph in which the classes are marked on the horizontal axis and the class frequencies on
the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are
drawn adjacent to each other.
A frequency polygon also shows the shape of a distribution and is similar to a histogram. It consists of
line segments connecting the points formed by the intersections of the class midpoints and the class
frequencies.
Ans:
Measures of central tendency
Types of mean: Arithmetic Mean, Weighted Mean, Geometric Mean, Harmonic Mean
Mean:-
Arithmetic mean is the sum of all observations (positive, negative, or zero) divided by the number of
observations. Formula-
$$\bar{x} = \frac{\sum x}{n}$$
where n = number of observations and ∑x = sum of the given dataset.
Example: for the observations 9, 6, 3, 2, 7, 1,
∑x = 9+6+3+2+7+1 = 28
Now divide the total by 6 to get the mean.
$$\text{Mean} = \frac{\sum x}{n} = \frac{28}{6} = 4.667$$
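The same calculation can be checked in Python. This is a minimal sketch using the six observations from the example; the variable names are illustrative.

```python
# Observations from the arithmetic mean example.
data = [9, 6, 3, 2, 7, 1]

total = sum(data)   # sum of the observations
n = len(data)       # number of observations
x_bar = total / n   # arithmetic mean

print(total, n, round(x_bar, 3))
```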
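The weighted mean is a special case of the arithmetic mean. It occurs when there are several
observations of the same value. Formula-
$$\bar{x} = \frac{\sum w x}{\sum w}$$
where w is the weight of each value x (here, the number of times it occurs).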
Example: During a one hour period on a hot Saturday afternoon cabana boy Chris served fifty drinks. He
sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the
weighted mean of the price of the drinks.
Ans:
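One way to work this example is to treat each price as a value and the number of drinks sold at that price as its weight. The sketch below assumes exactly the figures in the question; the helper name `weighted_mean` is hypothetical.

```python
# Prices (in dollars) and number of drinks sold at each price, from the example.
prices = [0.50, 0.75, 0.90, 1.15]
quantities = [5, 15, 15, 15]


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


wm = weighted_mean(prices, quantities)
print(f"Weighted mean price: ${wm:.2f}")  # 44.50 / 50 = 0.89
```

The numerator is 5(0.50) + 15(0.75) + 15(0.90) + 15(1.15) = 44.50 and the total weight is 50 drinks, so the weighted mean price is $0.89.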
The geometric mean is useful in finding the average change of percentages, ratios, indexes, or growth
rates over time. It has a wide application in business and economics because we are often interested in
finding the percentage changes in sales, salaries, or economic figures, such as the Gross Domestic
Product, which compound or build on each other. Formula-
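The standard definition of the geometric mean of n positive values is the n-th root of their product, GM = (x1 × x2 × ... × xn)^(1/n). The sketch below applies it to growth factors; the two yearly increases (5% and 15%) are hypothetical numbers chosen only for illustration. It uses `statistics.geometric_mean`, available in Python 3.8 and later.

```python
from math import prod
from statistics import geometric_mean

# Hypothetical yearly increases of 5% and 15%, written as growth factors.
growth_factors = [1.05, 1.15]

gm = geometric_mean(growth_factors)

# Same result from the definition: n-th root of the product.
gm_by_definition = prod(growth_factors) ** (1 / len(growth_factors))

average_rate = (gm - 1) * 100  # average percentage increase per year
print(f"Average growth rate: {average_rate:.2f}%")
```

Because the increases compound, the average rate obtained this way is slightly below the arithmetic mean of 10%.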
Harmonic mean is the number of observations divided by the sum of the reciprocals of the observations.
Formula-
$$\text{H.M.} = \frac{n}{\sum \frac{1}{x}}$$
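A small sketch of the harmonic mean follows. The two speeds (40 and 60 km/h over equal distances) are hypothetical values, a common setting in which the harmonic mean is the appropriate average.

```python
from statistics import harmonic_mean

# Hypothetical speeds over two equal distances, in km/h.
speeds = [40, 60]

# Directly from the formula: n divided by the sum of reciprocals.
hm_by_formula = len(speeds) / sum(1 / x for x in speeds)

# The standard library gives the same value.
hm = harmonic_mean(speeds)

print(hm)  # 2 / (1/40 + 1/60) = 48
```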
Advantages and limitations of the mean
The mean can be used for both continuous and discrete numeric data.
The mean cannot be calculated for categorical data, as the values cannot be
summed.
As the mean includes every value in the distribution the mean is influenced
by outliers and skewed distributions.
Median:-
Median is the middle value of the observations after they have been ordered from the smallest to the
largest or from the largest to the smallest.
Example: The age of the members of a weekend poker team has been listed
below. Find the median of the above set.
Solution:
Median = 42
The median is less affected by outliers and skewed data than the mean and
is usually the preferred measure of central tendency when the distribution is
not symmetrical.
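The robustness of the median to outliers can be illustrated with a short sketch. The ages below are hypothetical and are not the poker-team data from the example.

```python
from statistics import mean, median

# Hypothetical ages, already a small odd-sized set.
ages = [23, 25, 29, 31, 35]

# Replace the largest value with an extreme one to create an outlier.
with_outlier = ages[:-1] + [95]

print(median(ages), mean(ages))                   # 29 and 28.6
print(median(with_outlier), mean(with_outlier))   # 29 and 40.6

# With an even number of values, the median is the mean of the two middle values.
print(median([23, 25, 29, 31]))                   # 27.0
```

The outlier moves the mean from 28.6 to 40.6 while the median stays at 29.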
Mode:-
The value which occurs with the highest frequency in the data set is called Mode. Data can have more
than one mode.
Example: The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Calculate the mode.
Solution: Because the score of 81 occurs the most often, it is the mode.
The mode has an advantage over the median and the mean as it can be
found for both numerical and categorical (non-numerical) data.
There are some limitations to using the mode. In some distributions, the mode
may not reflect the centre of the distribution very well.
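The mode of the exam scores can be found by counting how often each value occurs. The sketch below uses `statistics.multimode` (Python 3.8 and later), which returns every value tied for the highest frequency, so it also covers data with more than one mode. The list of eye colours is a hypothetical example of categorical data.

```python
from collections import Counter
from statistics import multimode

# Exam scores from the example.
scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]

counts = Counter(scores)    # frequency of each score
modes = multimode(scores)   # list of the most frequent value(s)

print(counts.most_common(3))
print(modes)  # [81]

# Hypothetical categorical data: the mode works here, the mean would not.
eye_colours = ["brown", "blue", "brown", "green"]
colour_modes = multimode(eye_colours)
```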
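For grouped data:-
$$\text{Mean} = \frac{\sum f x}{n}$$
$$\text{Median} = L + \frac{\frac{n}{2} - pcf}{f} \times i$$
where f = frequency of the class, x = class midpoint, n = number of observations, L = lower class limit of the median/mode class, pcf = cumulative frequency of the class preceding the median/mode class, and i = class width.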
Example- Find the mean, median and mode for the following table:-
Ans:-
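$$\text{Mean} = \frac{291600}{600} = 486$$
The (n/2)th observation = 600/2 = 300th observation, which lies in class 450-525.
$$\text{Median} = 450 + \frac{300 - 236}{207} \times 75 = 473.1884$$
The modal class is 450-525, the class with the highest frequency (207). Let d1 = 207 − 167 = 40 (modal frequency minus the frequency of the preceding class) and d2 = 207 − 65 = 142 (modal frequency minus the frequency of the following class). Then
$$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times i = 450 + \frac{40}{40 + 142} \times 75 = 466.4835$$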
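The grouped-data calculations can be sketched in Python as below. The class limits and frequencies are those of the frequency table given in this section (classes 300-375 through 750-825, 600 observations in total); the function names are hypothetical, and the mode uses the usual interpolation formula L + d1 / (d1 + d2) × i, which is an assumption about the intended method. Note that an interpolated mode always falls inside the modal class, here 450-525.

```python
# (lower limit, upper limit, frequency) for each class of the frequency table.
classes = [
    (300, 375, 69), (375, 450, 167), (450, 525, 207), (525, 600, 65),
    (600, 675, 58), (675, 750, 24), (750, 825, 10),
]


def grouped_mean(classes):
    """Sum of frequency * class midpoint, divided by the number of observations."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n


def grouped_median(classes):
    """L + (n/2 - pcf) / f * i, evaluated in the class containing the n/2-th value."""
    n = sum(f for _, _, f in classes)
    pcf = 0  # cumulative frequency of the classes before the current one
    for lo, hi, f in classes:
        if pcf + f >= n / 2:
            return lo + (n / 2 - pcf) / f * (hi - lo)
        pcf += f


def grouped_mode(classes):
    """L + d1 / (d1 + d2) * i, evaluated in the class with the highest frequency."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))
    lo, hi, f_m = classes[m]
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1, d2 = f_m - f_prev, f_m - f_next
    return lo + d1 / (d1 + d2) * (hi - lo)


mean_value = grouped_mean(classes)
median_value = grouped_median(classes)
mode_value = grouped_mode(classes)
print(round(mean_value, 4), round(median_value, 4), round(mode_value, 4))
```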
Ans:
Class      Frequency   Cumulative frequency
300-375    69          69
375-450    167         236
450-525    207         443
525-600    65          508
600-675    58          566
675-750    24          590
750-825    10          600
Total = 600
Position of D5 = $\frac{5 \times 600}{10}$ = 300th observation, which lies in class 450-525.
$$D_5 = L + \frac{\frac{kn}{10} - pcf}{f} \times i = 450 + \frac{300 - 236}{207} \times 75 = 473.1884$$
Position of P15 = $\frac{15 \times 600}{100}$ = 90th observation, which lies in class 375-450 (frequency 167).
$$P_{15} = L + \frac{\frac{kn}{100} - pcf}{f} \times i = 375 + \frac{90 - 69}{167} \times 75 = 384.4311$$
Position of Q3 = $\frac{3 \times 600}{4}$ = 450th observation, which lies in class 525-600.
$$Q_3 = L + \frac{\frac{kn}{4} - pcf}{f} \times i = 525 + \frac{450 - 443}{65} \times 75 = 533.0769$$
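Deciles, percentiles, and quartiles of grouped data all follow one pattern: locate the class containing the (kn/parts)-th observation, then interpolate within it. The sketch below captures that pattern in a single hypothetical helper, `grouped_quantile`, using the class limits and frequencies from the table in the answer. Each class frequency is read from the table, so class 375-450 uses f = 167.

```python
# (lower limit, upper limit, frequency) for each class of the frequency table.
classes = [
    (300, 375, 69), (375, 450, 167), (450, 525, 207), (525, 600, 65),
    (600, 675, 58), (675, 750, 24), (750, 825, 10),
]


def grouped_quantile(classes, k, parts):
    """Return L + (k*n/parts - pcf) / f * i for the class holding the target observation.

    parts is 4 for quartiles, 10 for deciles, 100 for percentiles.
    """
    n = sum(f for _, _, f in classes)
    target = k * n / parts  # position of the required observation
    pcf = 0                 # cumulative frequency before the current class
    for lo, hi, f in classes:
        if pcf + f >= target:
            return lo + (target - pcf) / f * (hi - lo)
        pcf += f
    raise ValueError("k must not exceed parts")


d5 = grouped_quantile(classes, 5, 10)     # fifth decile (equals the median)
p15 = grouped_quantile(classes, 15, 100)  # fifteenth percentile
q3 = grouped_quantile(classes, 3, 4)      # third quartile
print(round(d5, 4), round(p15, 4), round(q3, 4))
```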
Dispersion measures the spread or variability of a set of observations among themselves or about some
central value. Small dispersion means high uniformity. A measure of dispersion quantifies the variability
of a dataset and indicates how reliable an average is as a summary of it.
(ii) To serve as a basis for the control of the variability. Another purpose of measuring dispersion is to
determine nature and cause of variation in order to control the variation itself.
(iv) To facilitate the use of other statistical measures. Correlation analysis, hypothesis
testing, the analysis of fluctuations, techniques of production control, cost control, and so on are
based on measures of variation of one kind or another.
Measure of dispersion
Quartile deviation
Range:-
Example: Marks of 8 students are 65, 20, 55, 80, 75, 66, 96, 50. Calculate the range.
Ans: Range = largest value − smallest value = 96 − 20 = 76
Limitation: Range cannot tell us anything about the character of the distribution within the two extreme
observations.
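As a quick check of the range example, a minimal sketch using the eight marks from the question:

```python
# Marks of the 8 students from the range example.
marks = [65, 20, 55, 80, 75, 66, 96, 50]

# Range = largest value - smallest value.
data_range = max(marks) - min(marks)
print(data_range)  # 96 - 20 = 76
```

Only the two extreme values enter the calculation, which is why the range says nothing about how the remaining marks are distributed.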
Mean Deviation: The arithmetic mean of the absolute values of the deviations from the
arithmetic mean. Formula-
Example:- Find the mean deviation of the data values 2, 3, 3, 3.5, 4, 5, 6.5, 7, 8, 8.
Ans:-
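Following the definition, the mean deviation is the sum of the absolute deviations from the arithmetic mean divided by the number of observations, MD = ∑|x − x̄| / n. A minimal sketch for the ten values in the example:

```python
# Data values from the mean deviation example.
data = [2, 3, 3, 3.5, 4, 5, 6.5, 7, 8, 8]

n = len(data)
mean = sum(data) / n  # 50 / 10 = 5.0

# Absolute deviations from the mean: 3, 2, 2, 1.5, 1, 0, 1.5, 2, 3, 3
abs_deviations = [abs(x - mean) for x in data]

mean_deviation = sum(abs_deviations) / n  # 19 / 10 = 1.9
print(mean, mean_deviation)
```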
Limitations: The greatest limitation of this method is that algebraic signs are ignored while taking the
deviations of the items.
VARIANCE The arithmetic mean of the squared deviations from the mean.
STANDARD DEVIATION The square root of the variance
Merits of Standard Deviation: Among all measures of dispersion Standard Deviation is considered
superior because it possesses almost all the requisite characteristics of a good measure of dispersion.
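The two definitions translate directly into code. The sketch below reuses the ten data values from the mean deviation example as an assumed dataset and treats them as a whole population, so it divides by n; `statistics.pvariance` and `statistics.pstdev` do the same. A sample variance would divide by n − 1 instead.

```python
from math import sqrt
from statistics import pstdev, pvariance

# Data values borrowed from the mean deviation example (treated as a population).
data = [2, 3, 3, 3.5, 4, 5, 6.5, 7, 8, 8]

n = len(data)
mean = sum(data) / n

# Variance: arithmetic mean of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in data) / n  # 44.5 / 10 = 4.45

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(round(variance, 4), round(std_dev, 4))
print(pvariance(data), pstdev(data))  # standard-library equivalents
```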
Coefficient of Variation expresses the standard deviation as a percentage of the mean. Formula- $\text{c.v.} = \frac{\sigma}{\mu} \times 100$
Example: The mean final exam marks of Section A is 30 out of 40 with standard deviation 4 and the
mean final exam marks of Section B is 25 out of 40 with standard deviation 6. Which Section is more
consistent in getting final exam marks? Relative measure: Coefficient of Variation (c.v.)
Section A:
Mean = 30
Standard deviation = 4
$$\text{Coefficient of variation} = \frac{\sigma}{\mu} \times 100 = \frac{4}{30} \times 100 = 13.33\%$$
Section B:
Mean = 25
Standard deviation = 6
$$\text{Coefficient of variation} = \frac{\sigma}{\mu} \times 100 = \frac{6}{25} \times 100 = 24\%$$
Since the c.v. of Section A is less than the c.v. of Section B, Section A is more consistent in its final
exam marks.
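The comparison can be reproduced with a short sketch. The means and standard deviations are those given for Sections A and B; the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(mean, std_dev):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100


cv_a = coefficient_of_variation(mean=30, std_dev=4)  # Section A
cv_b = coefficient_of_variation(mean=25, std_dev=6)  # Section B

# The section with the smaller c.v. is the more consistent one.
more_consistent = "A" if cv_a < cv_b else "B"

print(f"Section A: {cv_a:.2f}%  Section B: {cv_b:.2f}%")
print(f"More consistent: Section {more_consistent}")
```

Because the c.v. is a relative measure, it allows the two sections to be compared even though their means differ.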