Report Stat

Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data; it deals with every aspect of data. Statistical knowledge helps us choose a proper method of collecting data, draw suitable samples, and apply the correct analysis to those samples so that the results are reliable. In short, statistics is a crucial process that supports decision-making based on data.

The term 'statistic' was introduced by the Italian scholar Girolamo Ghilini in 1589 with reference to
this science.[4][5] The birth of statistics is often dated to 1662, when John Graunt, along with William
Petty, developed early human statistical and census methods that provided a framework for
modern demography.

1. Data Interpretation: Statistics helps make sense of complex data by summarizing, analyzing, and interpreting information.
2. Decision Making: It provides a basis for informed decision-making by
offering insights into trends, patterns, and relationships within data.
3. Research Validity: In scientific research, statistics validate findings and
conclusions, ensuring that results are not merely coincidental.
4. Risk Assessment: Businesses and industries use statistics to assess and
manage risks, aiding in strategic planning and risk mitigation.
5. Predictive Modeling: Statistics enables the creation of models that can predict future trends and outcomes based on historical data.
6. Comparisons: It allows for meaningful comparisons between groups,
populations, or variables, facilitating a deeper understanding of differences
and similarities.
7. Quality Improvement: Industries use statistical methods to monitor and
improve processes, ensuring efficiency and quality control.
8. Public Policy: Governments use statistical data to formulate and evaluate
policies, addressing social, economic, and health issues.
9. Economics: Statistics plays a crucial role in economic analysis, helping to measure and understand economic phenomena such as inflation and unemployment.
10. Education and Research: In academia, statistics is fundamental for
designing experiments, analyzing results, and drawing meaningful
conclusions in various fields.
11. Sports Analytics: Teams and athletes use statistical analysis to
enhance performance, identify strengths and weaknesses, and make
strategic decisions.
12. Medical Research: Statistics is vital in medical studies, from clinical
trials to epidemiological research, providing evidence for healthcare
decisions.
Figure: stages of statistics. Collect (gathering facts and data), Organize (describing data for presentation), Present (arranging data in tabular, graphical or textual form), Analyze (summarizing data using statistical methods), Interpret (making conclusions based on analyzed data).

DESCRIPTIVE STATISTICS Methods of organizing, summarizing, and presenting data in an informative way. For instance, the United States government reports the population of the United States was 179,323,000 in 1960; 203,302,000 in 1970; 226,542,000 in 1980; 248,709,000 in 1990; 281,422,000 in 2000; and 308,746,000 in 2010. These figures are descriptive statistics.

INFERENTIAL STATISTICS The methods used to estimate a property of a population on the basis of a
sample. For example, a recent survey showed only 46 percent of high school seniors can solve problems
involving fractions, decimals, and percentages.

Variable: Any characteristic that may vary either in magnitude or in quality is called a variable. A variable may also be called a data item. Age, sex, business income and expenses, country of birth, capital expenditure, class grades, eye colour and vehicle type are examples of variables. It is called a variable because its value may vary between data units in a population and may change over time.

Numeric variables
Numeric variables have values that describe a measurable quantity as a
number, like 'how many' or 'how much'. Therefore numeric variables are
quantitative variables.

Numeric variables may be further described as either continuous or discrete:

• A continuous variable is a numeric variable. Observations can take any value within an interval of real numbers, and the value recorded for an observation can be as precise as the measuring instrument allows. Examples of continuous variables include height, time, age, and temperature.
• A discrete variable is a numeric variable. Observations take a value based on a count from a set of distinct whole values, so a discrete variable cannot take a fractional value between one value and the next. Examples of discrete variables include the number of registered cars, number of business locations, and number of children in a family, all of which are measured as whole units (e.g. 1, 2, 3 cars).

The data collected for a numeric variable are quantitative data.

Categorical variables

Categorical variables have values that describe a 'quality' or 'characteristic' of a data unit, like 'what type' or 'which category'. Categorical variables fall into mutually exclusive (in one category or in another) and exhaustive (include all possible options) categories. Therefore, categorical variables are qualitative variables and tend to be represented by a non-numeric value.

Categorical variables may be further described as ordinal or nominal:

• An ordinal variable is a categorical variable. Observations can take a value that can be logically ordered or ranked. The categories of an ordinal variable can be ranked higher or lower than one another, but the ranking does not necessarily establish a numeric difference between categories. Examples of ordinal categorical variables include academic grades (e.g. A, B, C), clothing size (e.g. small, medium, large, extra large) and attitudes (e.g. strongly agree, agree, disagree, strongly disagree).
• A nominal variable is a categorical variable. Observations can take a value that cannot be organised in a logical sequence. Examples of nominal categorical variables include sex, business type, eye colour, religion and brand.
A variable has one of four levels of measurement: nominal, ordinal, interval, or ratio.
1. Nominal Level:
• Definition: This is the most basic level, where data is categorized or named without any specific order.
• Example: Colors, gender, or types of fruit. For instance, apples, oranges, and bananas represent different categories, but there is no inherent order among them.
2. Ordinal Level:
• Definition: At this level, data is not only categorized but also has a meaningful order or ranking.
• Example: Educational degrees like high school, bachelor's, master's, and doctorate. While we can order them, the intervals between them are not uniform: the gap between high school and bachelor's is not necessarily the same as between bachelor's and master's.
3. Interval Level:
• Definition: Here, the data has a meaningful order, and the intervals between values are consistent. However, there is no true zero point.
• Example: Temperature measured in Celsius or Fahrenheit. The difference between 20 and 30 degrees is the same as the difference between 30 and 40 degrees, but 0 degrees does not mean the absence of temperature, so 40 degrees is not "twice as hot" as 20 degrees.
4. Ratio Level:
• Definition: This is the highest level, possessing all the characteristics of the previous levels (nominal, ordinal, and interval), plus a true zero point, indicating the absence of the measured attribute.
• Example: Height, weight, income. For instance, a weight of 0 kg means the absence of weight, so ratios are meaningful: a person weighing 80 kg weighs exactly twice as much as a person weighing 40 kg.
The second step of statistics is data presentation. Proper presentation is needed because statistical data in their raw form are almost impossible to comprehend. When data are presented in an easy-to-read form, the reader can absorb the information in a much shorter time.

Data presentation is defined as the process of using various graphical formats to visually represent the
relationship between two or more data sets so that an informed decision can be made based on them.

1. Clarity: A well-presented set of data ensures that information is clear and easily understandable.
2. Engagement: Visual appeal captures the audience's attention, keeping
them interested in the content.
3. Accessibility: A good presentation makes data accessible to a wider
audience, regardless of their expertise in the subject.
4. Decision-Making: Clear visuals aid in quick decision-making, as key
insights are easily discernible.
5. Professionalism: A polished presentation reflects positively on the
presenter and the organization, conveying a sense of professionalism.
6. Memorability: Visual elements enhance information retention, making it
more likely that the audience will remember key points.
7. Impact: Well-presented data has a greater impact, leaving a lasting
impression on the audience.
8. Communication: Effective data presentation is a form of communication,
conveying complex information in a straightforward manner.
9. Persuasion: Visuals can be persuasive, influencing opinions and decisions
based on the compelling representation of data.
10. Time Efficiency: A well-organized presentation saves time for both
the presenter and the audience, focusing on the most crucial information.

Types of Data Presentation

Broadly speaking, there are three methods of data presentation:

Textual: Of the different methods of data presentation, this is the simplest. The findings are written out as coherent text. The drawback of this method is that one has to read the whole text to get a clear picture; an introduction, summary, and conclusion can help condense the information.

Tabular: To avoid the complexity of the textual method, data is presented in tables arranged in rows and columns. A frequency table is a common example.

Diagrammatic: Data is presented visually, for example as bar charts, pie charts, line graphs, or scatter plots.
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes.
The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
Cumulative frequency distribution: shows the number of items with values less than or equal to the upper limit of each class.
Class frequency: the number of observations in each class.
For qualitative data-

BAR CHART A graph that shows qualitative classes on the horizontal axis and the class frequencies on
the vertical axis. The class frequencies are proportional to the heights of the bars.

PIE CHART A chart that shows the proportion or percentage that each class represents of the total
number of frequencies.

Example- SkiLodges.com is test marketing its new website and is interested in how easy its Web page
design is to navigate. It randomly selected 200 regular Internet users and asked them to perform a
search task on the Web page. Each person was asked to rate the relative ease of navigation as poor,
good, excellent, or awesome. The results are shown in the following table:

Awesome 102
Excellent 58
Good 30
Poor 10
1. Draw a bar diagram and a pie chart, together with a frequency table.

Ans: Frequency table-

Navigation   Frequency   Relative frequency   Percentage   Degrees
Awesome      102         0.51                 51%          183.6
Excellent    58          0.29                 29%          104.4
Good         30          0.15                 15%          54
Poor         10          0.05                 5%           18
Total        200         1.00                 100%         360
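The derived columns of the frequency table follow from simple arithmetic: relative frequency = f/n, percentage = relative frequency × 100, and pie-chart angle = relative frequency × 360 degrees. A minimal Python sketch, assuming the ratings are stored in a plain dict (the names `ratings` and `frequency_table` are illustrative, not from the source):

```python
def frequency_table(counts):
    """Return (class, frequency, relative frequency, percentage, degrees) rows."""
    n = sum(counts.values())
    rows = []
    for label, f in counts.items():
        rel = f / n                              # relative frequency = f / n
        rows.append((label, f, rel, rel * 100, rel * 360))
    return rows

ratings = {"Awesome": 102, "Excellent": 58, "Good": 30, "Poor": 10}

for label, f, rel, pct, deg in frequency_table(ratings):
    print(f"{label:<10}{f:>5}{rel:>8.2f}{pct:>7.0f}%{deg:>8.1f}")
```

The degree column is what a pie chart needs: each class gets a slice whose angle is proportional to its share of the total.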

For quantitative data-

HISTOGRAM A graph in which the classes are marked on the horizontal axis and the class frequencies on
the vertical axis. The class frequencies are represented by the heights of the bars, and the bars are
drawn adjacent to each other.

A frequency polygon also shows the shape of a distribution and is similar to a histogram. It consists of
line segments connecting the points formed by the intersections of the class midpoints and the class
frequencies.

An ogive chart is a curve of the cumulative frequency distribution or cumulative relative frequency distribution.
Example- Below is the frequency distribution of the profits on vehicle sales last month at the Applewood
Auto Group.

1. Draw a histogram, a frequency polygon and an ogive curve.

Ans:

Profit ($)   Midpoint ($)   Frequency   Cumulative frequency
200-600      400            8           8
600-1000     800            11          19
1000-1400    1200           23          42
1400-1800    1600           38          80
1800-2200    2000           45          125
2200-2600    2400           32          157
2600-3000    2800           19          176
3000-3400    3200           4           180
Total                       180
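The midpoint and cumulative-frequency columns are what the three charts are plotted from: the frequency polygon joins (midpoint, frequency) points, and the "less than or equal to" ogive joins (upper class limit, cumulative frequency) points. A short Python sketch that computes these point sets from the class limits; the drawing itself is left to any charting tool, and the variable names are illustrative:

```python
from itertools import accumulate

# Class limits and frequencies from the Applewood table
classes = [(200, 600), (600, 1000), (1000, 1400), (1400, 1800),
           (1800, 2200), (2200, 2600), (2600, 3000), (3000, 3400)]
freqs = [8, 11, 23, 38, 45, 32, 19, 4]

midpoints = [(lo + hi) / 2 for lo, hi in classes]   # x-values of the frequency polygon
cum_freqs = list(accumulate(freqs))                  # running totals for the ogive

# Ogive points: (upper class limit, cumulative frequency)
ogive = [(hi, cf) for (_, hi), cf in zip(classes, cum_freqs)]

print(midpoints)
print(cum_freqs)
print(ogive)
```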
Time Series Data:
Time series data is a series of data points ordered in time. Most commonly, a time series is a sequence taken at successive, equally spaced points in time, so it is a sequence of discrete-time data.
Line Graph:
A line graph is a type of chart used to show information that changes over time, so time series data are usually drawn as a line graph.
A measure of central tendency is a summary statistic that attempts to describe a set of data by
calculating the center point or typical value within that data set.

Measures of central tendency:
1. Mean: arithmetic mean, weighted mean, geometric mean, harmonic mean
2. Median
3. Mode

Mean:-

The arithmetic mean is the sum of all observations (positive, negative, or zero) divided by the number of observations. Formula-

$$\bar{x} = \frac{\sum x}{n}$$

where n = number of observations and ∑x = sum of the observations.

Example: Find the mean of 9, 6, 3, 2, 7, 1

Add all the numbers first:

∑x = 9 + 6 + 3 + 2 + 7 + 1 = 28

Now divide the total by n = 6 to get the mean:

$$\bar{x} = \frac{\sum x}{n} = \frac{28}{6} = 4.667$$
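The calculation above is a one-liner in Python. A minimal sketch that computes the mean directly from the formula and cross-checks it against the standard-library `statistics.mean`:

```python
import statistics

data = [9, 6, 3, 2, 7, 1]

# Arithmetic mean: sum of observations divided by their number
mean = sum(data) / len(data)
print(round(mean, 3))                     # 4.667

# The standard library gives the same value
print(round(statistics.mean(data), 3))    # 4.667
```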

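The weighted mean is a special case of the arithmetic mean. It is used when there are several observations of the same value, each value being weighted by how often it occurs. Formula-

$$\bar{x}_w = \frac{\sum w x}{\sum w}$$

where x is a value and w is its weight (for example, the number of times x occurs).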
Example: During a one hour period on a hot Saturday afternoon cabana boy Chris served fifty drinks. He
sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the
weighted mean of the price of the drinks.

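Ans: Here x is the price of a drink and w is the number of drinks sold at that price.

$$\bar{x}_w = \frac{\sum w x}{\sum w} = \frac{5(0.50) + 15(0.75) + 15(0.90) + 15(1.15)}{5 + 15 + 15 + 15} = \frac{44.50}{50} = 0.89$$

The weighted mean price of a drink is $0.89.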
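The weighted-mean formula for the drinks example can be checked with a short Python sketch; `weighted_mean` is an illustrative helper name, not a library function:

```python
prices = [0.50, 0.75, 0.90, 1.15]   # x: price of each drink type ($)
counts = [5, 15, 15, 15]            # w: number of drinks sold at that price

def weighted_mean(values, weights):
    """Sum of w * x divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

print(round(weighted_mean(prices, counts), 2))   # 0.89
```

Note that the unweighted mean of the four prices, (0.50 + 0.75 + 0.90 + 1.15)/4 = 0.825, would understate the typical price because it ignores that most drinks were sold at the higher prices.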

The geometric mean is useful in finding the average change of percentages, ratios, indexes, or growth
rates over time. It has a wide application in business and economics because we are often interested in
finding the percentage changes in sales, salaries, or economic figures, such as the Gross Domestic
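Product, which compound or build on each other. Formula-

$$GM = \sqrt[n]{x_1 x_2 \cdots x_n}$$

where x_1, ..., x_n are n positive values (for growth rates r, use the growth factors 1 + r).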
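The geometric mean of n positive values is the nth root of their product. As a sketch, assuming Python 3.8+ (for `math.prod` and `statistics.geometric_mean`) and hypothetical yearly growth rates of 5%, 10% and 8% (illustrative numbers, not from the source), the average growth factor can be computed directly and cross-checked against the standard library:

```python
import math
from statistics import geometric_mean

def gm(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

# Hypothetical yearly growth of +5%, +10%, +8%, expressed as growth factors
factors = [1.05, 1.10, 1.08]
avg_factor = gm(factors)

print(f"average growth rate: {(avg_factor - 1) * 100:.2f}% per year")
print(math.isclose(avg_factor, geometric_mean(factors)))   # True
```

Working with growth factors rather than the raw percentages is what makes the result compound correctly: applying `avg_factor` three times reproduces the same overall growth as the three actual years.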

The harmonic mean is the number of observations divided by the sum of the reciprocals of the observations. Formula-

$$H.M. = \frac{n}{\sum \frac{1}{x}}$$
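The harmonic mean is the natural average for rates such as speeds. A minimal sketch, assuming a hypothetical trip that covers two equal distances at 40 km/h and 60 km/h (illustrative numbers, not from the source); `hm` is an illustrative helper, and `statistics.harmonic_mean` is the standard-library equivalent:

```python
from statistics import harmonic_mean

def hm(values):
    """Number of observations divided by the sum of their reciprocals."""
    return len(values) / sum(1 / x for x in values)

# Hypothetical: equal distances driven at 40 km/h and 60 km/h
speeds = [40, 60]
print(round(hm(speeds), 6))              # 48.0, the average speed over the whole trip
print(round(harmonic_mean(speeds), 6))   # 48.0
```

The arithmetic mean (50 km/h) would overstate the average speed, because more time is spent driving at the slower speed.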
Advantage of the mean

The mean can be used for both continuous and discrete numeric data.

Limitations of the mean

The mean cannot be calculated for categorical data, as the values cannot be
summed.

Because the mean includes every value in the distribution, it is influenced by outliers and skewed distributions.

Median:-

Median is the middle value of the observations after they have been ordered from the smallest to the
largest or from the largest to the smallest.

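After sorting the data, the formula is: Median = [(n + 1)/2]th term when n is odd. When n is even, the median is the average of the (n/2)th and (n/2 + 1)th terms.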

Example: The ages of the members of a weekend poker team are listed below. Find the median of the set.

{42, 40, 50, 60, 35, 58, 32}

Solution:

Arrange the data items in ascending order.

Ordered Set: {32, 35, 40, 42, 50, 58, 60}

Count the number of observations: n = 7. Since the number of observations is odd, we use the following formula: Median = [(n + 1)/2]th term.

Calculate the median using the formula.

Median = [(n + 1)/2]th term

= [(7 + 1)/2]th term = 4th term = 42

Median = 42
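The median rule (middle term for odd n, average of the two middle terms for even n) can be sketched in Python as follows; `median` here is an illustrative helper, and the standard library's `statistics.median` follows the same convention:

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                       # odd n: the [(n + 1)/2]th term (1-based)
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2     # even n: mean of the (n/2)th and (n/2 + 1)th terms

ages = [42, 40, 50, 60, 35, 58, 32]
print(median(ages))   # 42
```

Python lists are 0-based, so the 4th term of the sorted list is `s[3]`, which is why the index is `n // 2`.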

Advantage of the median

The median is less affected by outliers and skewed data than the mean and
is usually the preferred measure of central tendency when the distribution is
not symmetrical.

Limitation of the median

The median cannot be identified for nominal categorical data, as such data cannot be logically ordered.

Mode:-

The value which occurs with the highest frequency in the data set is called Mode. Data can have more
than one mode.

Example: The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Calculate the mode.
Solution: The score of 81 occurs most often (three times), so it is the mode.
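Because data can have more than one mode, a sketch that returns every most-frequent value is safer than one that returns a single value. A minimal version using `collections.Counter` (the helper name `modes` is illustrative; Python 3.8+ also offers `statistics.multimode`):

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]
print(modes(scores))   # [81]
```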

Advantage of the mode

The mode has an advantage over the median and the mean as it can be
found for both numerical and categorical (non-numerical) data.

Limitations of the mode

There are some limitations to using the mode. In some distributions, the mode may not reflect the centre of the distribution very well.
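For grouped data:-

$$\text{Mean} = \frac{\sum fx}{n}$$

$$\text{Median} = L + \frac{\frac{n}{2} - pcf}{f} \times i$$

$$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

where
f = frequency (in the median formula, the frequency of the median class)
x = class midpoint
n = number of observations
L = lower class limit of the median/mode class
pcf = cumulative frequency of the class preceding the median class
i = class interval (width) of the median/mode class
Δ1 = frequency of the mode class minus the frequency of the previous class
Δ2 = frequency of the mode class minus the frequency of the next class

The (n/2)th observation locates the median class, and the class with the highest frequency is the mode class.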

Example- Find the mean, median and mode for the following table:-

Class     Frequency
300-375   69
375-450   167
450-525   207
525-600   65
600-675   58
675-750   24
750-825   10
Total     600

Ans:-

Class     Midpoint (x)   Frequency (f)   fx          Cumulative frequency
300-375   337.5          69              23287.5     69
375-450   412.5          167             68887.5     236
450-525   487.5          207             100912.5    443
525-600   562.5          65              36562.5     508
600-675   637.5          58              36975       566
675-750   712.5          24              17100       590
750-825   787.5          10              7875        600
Total                    600             291600

$$\text{Mean} = \frac{\sum fx}{n} = \frac{291600}{600} = 486$$

n/2 = 600/2 = 300, so the 300th observation lies in the class 450-525 (the median class).

$$\text{Median} = 450 + \frac{300 - 236}{207} \times 75 = 473.1884$$

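The mode class is 450-525 (highest frequency, 207), so Δ1 = 207 - 167 = 40 and Δ2 = 207 - 65 = 142.

$$\text{Mode} = 450 + \frac{40}{40 + 142} \times 75 = 466.4835$$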
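The three grouped-data formulas can be checked numerically. A minimal Python sketch, assuming equal class widths and treating a missing neighbour of the mode class as frequency 0 (both assumptions of this sketch, not statements from the source); note that the mode must fall inside the mode class 450-525:

```python
from itertools import accumulate

# (lower limit, upper limit, frequency) for the grouped table
table = [(300, 375, 69), (375, 450, 167), (450, 525, 207), (525, 600, 65),
         (600, 675, 58), (675, 750, 24), (750, 825, 10)]

lows = [lo for lo, _, _ in table]
freqs = [f for _, _, f in table]
width = table[0][1] - table[0][0]       # class interval i (equal widths assumed)
n = sum(freqs)
cum = list(accumulate(freqs))

# Mean: sum(f * x) / n, where x is the class midpoint
mean = sum(f * (lo + hi) / 2 for lo, hi, f in table) / n

# Median: first class whose cumulative frequency reaches n/2
k = next(j for j, cf in enumerate(cum) if cf >= n / 2)
pcf = cum[k - 1] if k > 0 else 0
median = lows[k] + (n / 2 - pcf) / freqs[k] * width

# Mode: class with the highest frequency; a missing neighbour counts as 0
m = freqs.index(max(freqs))
d1 = freqs[m] - (freqs[m - 1] if m > 0 else 0)
d2 = freqs[m] - (freqs[m + 1] if m + 1 < len(freqs) else 0)
mode = lows[m] + d1 / (d1 + d2) * width

print(round(mean, 4), round(median, 4), round(mode, 4))   # 486.0 473.1884 466.4835
```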

Quartiles, deciles, and percentiles are partition values: they look at the same data from different perspectives by dividing the same ordered set of observations into equal parts. Quartiles divide it into 4 equal parts, deciles into 10, and percentiles into 100. They generalize the median, which is the middle point of the frequency distribution curve and divides the area under the curve into two equal areas, left and right; hence Median = Q2 = D5 = P50. For grouped data, the kth partition value out of m equal parts is found in the same way as the median:

$$L + \frac{\frac{kn}{m} - pcf}{f} \times i$$

with m = 4 for quartiles, 10 for deciles and 100 for percentiles.
Example:- Find the values below which 50%, 15% and 75% of the observations lie in the following data.

Class     Frequency
300-375   69
375-450   167
450-525   207
525-600   65
600-675   58
675-750   24
750-825   10
Total     600

Ans:

Class Frequency (f) Cumulative Frequency

300-375 69 69
375-450 167 236
450-525 207 443
525-600 65 508
600-675 58 566
675-750 24 590
750-825 10 600
Total=600

For 50% we use the fifth decile, D5.

Position: 5 × 600/10 = 300th observation, which lies in the class 450-525.

$$D_5 = L + \frac{\frac{kn}{10} - pcf}{f} \times i = 450 + \frac{300 - 236}{207} \times 75 = 473.1884$$

For 15% we use the 15th percentile, P15.

Position: 15 × 600/100 = 90th observation, which lies in the class 375-450 (frequency 167).

$$P_{15} = L + \frac{\frac{kn}{100} - pcf}{f} \times i = 375 + \frac{90 - 69}{167} \times 75 = 384.4311$$

For 75% we use the third quartile, Q3.

Position: 3 × 600/4 = 450th observation, which lies in the class 525-600.

$$Q_3 = L + \frac{\frac{kn}{4} - pcf}{f} \times i = 525 + \frac{450 - 443}{65} \times 75 = 533.0769$$
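Quartiles, deciles and percentiles all use one formula that differs only in the number of parts. A minimal Python sketch of that general formula; `partition_value` is an illustrative name, and the lookup uses f = 167 for the class 375-450, taken from the frequency table:

```python
from itertools import accumulate

# (lower limit, upper limit, frequency) for the grouped table
table = [(300, 375, 69), (375, 450, 167), (450, 525, 207), (525, 600, 65),
         (600, 675, 58), (675, 750, 24), (750, 825, 10)]

def partition_value(table, k, parts):
    """k-th of `parts` partition values (parts=4: quartile, 10: decile, 100: percentile)."""
    freqs = [f for _, _, f in table]
    n = sum(freqs)
    cum = list(accumulate(freqs))
    pos = k * n / parts                                   # position of the required observation
    j = next(idx for idx, cf in enumerate(cum) if cf >= pos)
    lo, hi, f = table[j]
    pcf = cum[j - 1] if j > 0 else 0                      # cumulative frequency before class j
    return lo + (pos - pcf) / f * (hi - lo)

print(f"D5  = {partition_value(table, 5, 10):.4f}")    # D5  = 473.1884
print(f"P15 = {partition_value(table, 15, 100):.4f}")  # P15 = 384.4311
print(f"Q3  = {partition_value(table, 3, 4):.4f}")     # Q3  = 533.0769
```

Setting k = 1 and parts = 2 reproduces the median, which is a quick sanity check that the formula generalizes it.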

Dispersion measures the spread or variability of a set of observations among themselves or about some central value. A small dispersion means high uniformity. A measure of dispersion quantifies the variability of a dataset and shows how reliably its average represents the data.

Measures of dispersion are needed for four basic purposes:

(i) To determine the reliability of an average. A small dispersion indicates consistent data, so the average is a reliable summary.

(ii) To serve as a basis for the control of the variability. Another purpose of measuring dispersion is to determine the nature and causes of variation in order to control the variation itself.

(iii) To compare two or more series with regard to their variability.

(iv) To facilitate the use of other statistical measures. Correlation analysis, hypothesis testing, the analysis of fluctuations, and techniques of production control and cost control are all based on measures of variation of one kind or another.

Measures of dispersion:
1. Absolute measures: range, mean deviation, variance, standard deviation, quartile deviation
2. Relative measures: coefficient of range, coefficient of mean deviation, coefficient of variation, coefficient of quartile deviation

Range:-

It is the simplest measure of dispersion. Formula-

Range = Largest value – Smallest value

Example: Marks of 8 students are 65, 20, 55, 80, 75, 66, 96, 50. Calculate the range.
Ans: Range = 96 - 20 = 76

Limitation: The range tells us nothing about how the data are distributed between the two extreme observations.

Mean Deviation: The arithmetic mean of the absolute values of the deviations from the arithmetic mean. Formula-

$$MD = \frac{\sum |x - \bar{x}|}{n}$$

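Example:- Find the mean deviation of the data values 2, 3, 3, 3.5, 4, 5, 6.5, 7, 8, 8.

Ans:- Here n = 10 and ∑x = 50, so the mean is 50/10 = 5. The absolute deviations |x - 5| are 3, 2, 2, 1.5, 1, 0, 1.5, 2, 3, 3, and their sum is 19.

$$MD = \frac{\sum |x - \bar{x}|}{n} = \frac{19}{10} = 1.9$$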

Merits: Less affected by extreme observations.

Limitations: The greatest limitation of this method is that algebraic signs are ignored when taking the deviations of the items.
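The mean deviation is a short computation once the mean is known. A minimal Python sketch for the example data (`mean_deviation` is an illustrative helper; taking `abs()` is exactly where the algebraic signs are dropped):

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

data = [2, 3, 3, 3.5, 4, 5, 6.5, 7, 8, 8]
print(mean_deviation(data))   # 1.9
```

Without `abs()`, the signed deviations from the mean always sum to zero, which is why the signs have to be ignored in the first place.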

VARIANCE The arithmetic mean of the squared deviations from the mean.
STANDARD DEVIATION The square root of the variance

Merits of Standard Deviation: Among all measures of dispersion Standard Deviation is considered
superior because it possesses almost all the requisite characteristics of a good measure of dispersion.

1) It is based on all the observations of the data set.

2) It is amenable to further mathematical calculation.

Limitations: 1) It is more affected by extreme items than the mean deviation, because the deviations are squared.
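The definitions of variance and standard deviation translate directly into code. A minimal Python sketch, reusing the mean-deviation example data purely for illustration. It follows the definition given here (divide by n, i.e. the population variance); the standard library's `statistics.pvariance` and `statistics.pstdev` implement the same thing, while `statistics.variance` and `statistics.stdev` divide by n - 1 for samples:

```python
import math
import statistics

def population_variance(values):
    """Arithmetic mean of the squared deviations from the mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

data = [2, 3, 3, 3.5, 4, 5, 6.5, 7, 8, 8]   # reusing the mean-deviation example data
var = population_variance(data)
sd = math.sqrt(var)                          # standard deviation = square root of variance

print(round(var, 4), round(sd, 4))           # 4.45 2.1095
```

Comparing `sd` (about 2.11) with the mean deviation of the same data (1.9) illustrates that squaring gives extra weight to the observations farthest from the mean.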

Coefficient of Variation expresses the standard deviation as a percentage of the mean. Formula-

$$CV = \frac{\sigma}{\mu} \times 100\%$$
Example: The mean final exam marks of Section A is 30 out of 40 with standard deviation 4, and the mean final exam marks of Section B is 25 out of 40 with standard deviation 6. Which section is more consistent in its final exam marks? Use the relative measure: coefficient of variation (c.v.).

Solution: For the students of section A,

Mean = 30

Standard deviation = 4

$$\text{Coefficient of variation} = \frac{\sigma}{\mu} \times 100 = \frac{4}{30} \times 100 = 13.33\%$$

For the students of section B,

Mean = 25

Standard deviation = 6

$$\text{Coefficient of variation} = \frac{\sigma}{\mu} \times 100 = \frac{6}{25} \times 100 = 24\%$$

Since the c.v. of Section A is less than that of Section B, Section A is more consistent in its final exam marks.
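The comparison can be sketched in Python; `coefficient_of_variation` is an illustrative helper, and the smaller c.v. identifies the more consistent group:

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

cv_a = coefficient_of_variation(4, 30)   # Section A
cv_b = coefficient_of_variation(6, 25)   # Section B
print(f"A: {cv_a:.2f}%  B: {cv_b:.2f}%")   # A: 13.33%  B: 24.00%

more_consistent = "A" if cv_a < cv_b else "B"
print(more_consistent)                     # A
```

A relative measure is used here because the two sections have different means, so comparing the raw standard deviations alone would not be a fair comparison.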
