1st Mid
Basics of statistics
Definition of Statistics:
Statistics is concerned with scientific methods for collecting, organizing, summarizing,
presenting, analyzing and interpreting data as well as with drawing valid conclusions and
making reasonable and effective decisions on the basis of such analysis.
Types of statistics:
Uses of Statistics
● Biology
● Agriculture
● Medical science
● Business studies
● Computer science
Data: The raw material of statistics consists of numbers or observations, usually
obtained by some process of counting or measurement, and is referred to collectively as data.
Sources of Data :
● Primary data
● Secondary data
Variable :
A measurable quantity that varies from one individual or observation to another.
Example: the height of the students, weight of the students, etc.
Random Variable :
A variable whose values are associated with probabilities is called a random variable.
Types of Variable:
1. Quantitative
2. Qualitative
Quantitative Variable
Discrete variable: A variable that can assume only isolated values.
Continuous variable: A variable that can assume any value within a given range or ranges.
Statistic: Any function of the sample values that is used as an estimate of a population
parameter. Its value is known once the sample has been observed.
A Taxonomy of Statistics
Presentation of Data
Raw data are generally huge and unwieldy.
Classification of Data
Frequency Distribution
Definition: A set of classes together with the frequencies of occurrence of values in each
class in a given set of data, presented in a tabular form.
● Exclusive: When one of the end numbers is not considered part of a class, the class
interval is exclusive. Usually, the larger end number is excluded.
● Inclusive: When both the end numbers of a class are considered part of that class.
Class limits: The end numbers of an inclusive class interval are known as class limits.
Class boundaries: The end numbers of an exclusive class interval are known as class
boundaries.
Interval size / Class width: The difference between the upper and lower class boundaries
is known as the width of the class. The common width is denoted by C. The class width need
not be the same for all classes (especially for the terminal/end classes).
Decide on the size of the groups or class intervals. Generally 5 to 25 classes are
suggested.
1. Find the range of the variable by subtracting the lowest value from the highest value.
2. Divide the range by 5 and by 25 to get the largest and smallest suggested class widths,
and round these numbers to the same degree of accuracy as found in the original data.
3. Arrange a sheet with three headings: class interval, tally marks, and frequency. Begin
at the top with the class interval which contains the smallest value, and continue
writing until the interval with the highest value is reached.
4. Read off the items on the original table of raw data and put, for each value, a tally
mark (/) against the appropriate class interval. It is convenient to mark every fifth item
with a diagonal stroke across the previous four tally marks (////).
5. Count the number of tally marks opposite each interval, and write the result in the
frequency column.
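The steps above can be sketched in Python. This is only an illustration under stated assumptions: the data are integers, all classes share one width, the classes are exclusive (the upper end number is excluded), and the function name, the choice of 5 classes, and the marks are hypothetical.

```python
import math

def frequency_distribution(values, num_classes):
    """Tally values into equal-width exclusive classes [lower, upper)."""
    lowest, highest = min(values), max(values)
    data_range = highest - lowest                     # step 1: range
    width = math.ceil(data_range / num_classes) or 1  # step 2: rounded class width
    # Add a class if the highest value would otherwise fall outside the table.
    while lowest + width * num_classes <= highest:
        num_classes += 1
    table = []
    for k in range(num_classes):                      # step 3: list the classes
        lower = lowest + k * width
        upper = lower + width
        # Steps 4 and 5: count the values that fall in this class.
        freq = sum(1 for v in values if lower <= v < upper)
        table.append((lower, upper, freq))
    return table

# Hypothetical raw data: marks of 20 students.
marks = [12, 15, 21, 24, 25, 28, 31, 33, 34, 35,
         36, 38, 41, 42, 44, 47, 49, 52, 55, 58]

for lower, upper, freq in frequency_distribution(marks, 5):
    print(f"{lower}-{upper}: {'/' * freq} ({freq})")
```

The frequencies should add up to the number of observations, which is a quick check that no value was missed.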
Types of diagram
Some types of diagrams are :
● Bar diagram
● Pie diagram
● Line diagram
● Histogram
● Frequency polygon
● Cumulative frequency (C.F.) polygon
● Scatter diagram
Bar diagram
● This diagram is used mainly for portraying qualitative data.
● It is drawn by making a series of blocks of equal widths.
● The vertical blocks are alternatively known as bars.
● Horizontal bars are also used for depicting qualitative data.
● The bars may be arranged in a chronological, numerical, or some other convenient
order.
Pie Chart
● This diagram is intended to compare the distinct components, which together
constitute a whole.
● A circle of arbitrary radius represents the whole and the segments of the circle
represent the component parts.
● To construct such a diagram we use the fact that “the whole” corresponds to the total
number of degrees in the circle, namely 360°.
● This type of diagram should be used for multiple segments.
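As a small illustration of the construction, the angle of each segment is the component's share of the whole multiplied by 360°. The function name and the budget figures below are hypothetical.

```python
def pie_angles(components):
    """Return the central angle (in degrees) for each component of a whole."""
    total = sum(components.values())
    return {name: 360 * value / total for name, value in components.items()}

# Hypothetical monthly budget; the values are illustrative only.
budget = {"food": 400, "rent": 300, "transport": 200, "other": 100}
angles = pie_angles(budget)

for name, angle in angles.items():
    print(f"{name}: {angle} degrees")
```

The angles always add up to 360°, whatever units the components are measured in.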
Line Diagram
● This diagram is alternatively called a line graph or a time series graph.
● If we are given the values of a variable at different points of time, the set of values is
known as a time series and a line diagram is used to represent this type of data.
● In this diagram time is represented along the x-axis and the variable is plotted along
the y-axis.
Histogram
● To construct this diagram the horizontal axis is divided into segments corresponding
to the class boundaries of the frequency distribution.
● On each segment a rectangle with area proportional to the frequency in the class is
erected. The set of adjacent rectangles so constructed constitutes a histogram.
● The histogram is particularly appropriate when the variable is continuous. A discrete
variable is also treated as a continuous variable while constructing a histogram.
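Because the area, not the height, represents the frequency, the height of each rectangle is the frequency divided by the class width. A minimal sketch, assuming hypothetical classes with unequal widths:

```python
def histogram_heights(classes):
    """classes: list of (lower_boundary, upper_boundary, frequency).

    Returns one rectangle height per class, chosen so that
    height * class width equals the class frequency.
    """
    return [freq / (upper - lower) for lower, upper, freq in classes]

# Hypothetical distribution; the last class is twice as wide as the others.
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 8)]
heights = histogram_heights(classes)

for (lower, upper, freq), height in zip(classes, heights):
    print(f"{lower}-{upper}: frequency {freq}, height {height}")
```

When all classes have the same width, the heights are simply proportional to the frequencies.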
Frequency polygon
● It is a diagram used to represent a frequency distribution.
● The mid-values of class intervals are plotted along the x-axis and corresponding
frequencies are plotted along the y-axis.
● The points obtained for the class intervals are then joined by straight lines,
thus forming with the x-axis a polygon called the frequency polygon.
● The frequency polygon should be brought down at each end to the x-axis by joining it
to the mid-value (on the baseline) of the next outlying interval (of zero frequency).
Ogive Curve
A graph of a cumulative frequency distribution or a cumulative relative frequency
distribution is called an ogive.
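The points of such a graph can be computed by accumulating the class frequencies. The sketch below assumes a "less than" ogive, where each cumulative frequency is plotted against the upper class boundary; the function name and the figures are hypothetical.

```python
from itertools import accumulate

def ogive_points(upper_boundaries, frequencies):
    """Return (upper boundary, cumulative frequency, cumulative relative frequency)."""
    cumulative = list(accumulate(frequencies))
    total = cumulative[-1]
    return [(u, cf, cf / total) for u, cf in zip(upper_boundaries, cumulative)]

# Hypothetical frequency distribution with four classes.
points = ogive_points([10, 20, 30, 40], [3, 7, 6, 4])

for boundary, cf, rel in points:
    print(f"less than {boundary}: {cf} ({rel:.2f})")
```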
Scatter Diagram
Scatter diagrams are useful for displaying information on two quantitative variables which
are believed to be interrelated.
Central Tendency
● Mean—Average
● Median—Middle
● Mode—Most frequent
Mean or Average
● The mean or average is obtained by adding up the values for all the observations
and then dividing by the number of observations.
● In general, the mean is the best measure of central tendency to use, but there are
exceptions.
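A short sketch of the calculation, with hypothetical scores. The second call adds one extreme value to show the kind of exception meant here: a single outlier can pull the mean far from most of the data.

```python
def mean(values):
    """Add up all the observations and divide by how many there are."""
    return sum(values) / len(values)

scores = [4, 8, 6, 5, 7]           # hypothetical observations
print(mean(scores))                 # 30 / 5 = 6.0

scores_with_outlier = scores + [90]
print(mean(scores_with_outlier))    # 120 / 6 = 20.0
```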
Weighted Average
Used when a number of averages are combined and each average is based on a different
frequency (number of observations).
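As an illustration, each average is multiplied by its frequency, and the sum of these products is divided by the total frequency. The two class sections below are hypothetical.

```python
def weighted_average(averages, frequencies):
    """Combine several averages, weighting each one by its frequency."""
    total = sum(frequencies)
    return sum(a * f for a, f in zip(averages, frequencies)) / total

# Hypothetical example: two sections with different numbers of students.
section_means = [60, 80]
section_sizes = [30, 10]

combined = weighted_average(section_means, section_sizes)
print(combined)  # (60*30 + 80*10) / 40 = 65.0
```

The simple average of 60 and 80 would be 70, which overstates the combined result because the smaller section has the higher average.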
Median
● The median is the “middle” observation when the complete list of observations is
sorted in order.
● When there is an odd number of observations, the value of the middle one is the
median.
● When there is an even number of observations, the average of the two
“middle” observations is used as the median.
● The median may be a better indication of the center of a group of numbers if there
are some values that are considerably more extreme than others.
● Median income is often used for this reason.
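The odd and even cases can be sketched as follows. The income figures are hypothetical and include one extreme value, to show why the median is often preferred for such data.

```python
def median(values):
    """Middle observation of the sorted list (average of the two middle ones if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: average of the two middle values

incomes = [20, 22, 25, 27, 30, 400]   # hypothetical incomes, one very large
print(median(incomes))                 # (25 + 27) / 2 = 26.0
print(sum(incomes) / len(incomes))     # the mean is pulled up by the value 400
```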
Median
Advantage:
Resistant to outliers
Disadvantage:
May not be very informative. For example, the median of
(1, 1, 2, 2, 2, 2, 5, 6, 9, 9, 10) is 2.
Does the value of 2 really represent this sample as a whole very well?
Mode
The Mode is the value that occurs with the greatest frequency.
It is possible to have no mode in a series of numbers, or to have more than one mode.
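A minimal sketch that returns every mode. It assumes one common convention, which is not the only one in use: when all distinct values occur equally often, the series is treated as having no mode.

```python
from collections import Counter

def modes(values):
    """Return the list of values with the greatest frequency (may be empty)."""
    counts = Counter(values)
    highest = max(counts.values())
    # Assumed convention: if every distinct value is equally frequent, report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == highest]

print(modes([1, 1, 2, 2, 2, 2, 5, 6, 9, 9, 10]))  # one mode
print(modes([1, 2, 2, 3, 3]))                      # two modes
print(modes([1, 2, 3]))                            # no mode under this convention
```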
Mode
Advantages
● Very quick and easy to determine
● Is an actual value of the data
● Not affected by extreme scores
Disadvantages
● Sometimes not very informative (e.g. cigarettes smoked in a day)
● Can change dramatically from sample to sample
● Might be more than one (which is more representative?)
● Because the mean, the median, and the mode are all measuring central tendency,
the three measures are often systematically related to each other.
● In a symmetrical distribution, for example, the mean and median will always be
equal.
● If a symmetrical distribution has only one mode, the mode, mean, and median will all
have the same value.
● In a skewed distribution, the mode will be located at the peak on one side and the
mean usually will be displaced toward the tail on the other side.
● The median is usually located between the mean and the mode.
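The ordering described above can be checked on a small sample using Python's standard `statistics` module. The data are hypothetical and chosen to have a long tail on the right, so the result illustrates the usual pattern and does not prove a general rule.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large.
sample = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

sample_mode = statistics.mode(sample)
sample_median = statistics.median(sample)
sample_mean = statistics.mean(sample)

# The mean is displaced toward the tail; the median lies between mode and mean.
print(sample_mode, sample_median, sample_mean)
```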
Skewed distributions
Harmonic Mean
The inverse of the arithmetic mean of the inverse values of a variable is known as the
harmonic mean of that variable. It is usually denoted by H. Let x_i be the ith
(i = 1, 2, ..., n) value of a variable x; then
H = n / (1/x_1 + 1/x_2 + ... + 1/x_n)
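The definition (the reciprocal of the arithmetic mean of the reciprocals) can be sketched as below. It assumes all values are non-zero; the function name and the speeds are hypothetical.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the values (all assumed non-zero)."""
    return len(values) / sum(1 / x for x in values)

# Hypothetical example: the same distance travelled at 40 km/h and then at 60 km/h.
speeds = [40, 60]
h = harmonic_mean(speeds)
print(round(h, 2))  # about 48, lower than the arithmetic mean of 50
```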
Quartile: There are three values which divide the distribution into four equal parts. These
values are called quartiles. The ith (i = 1, 2, 3) quartile is denoted by Qi.
Decile: There are nine values which divide a distribution into ten equal parts. These values
are called deciles. The jth (j = 1, 2, ..., 9) decile is denoted by Dj.
Percentile: There are ninety-nine values which divide a distribution into one hundred equal
parts. These values are called percentiles. The kth (k = 1, 2, ..., 99) percentile is
denoted by Pk.
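For ungrouped data, the three kinds of dividing values can be computed with `statistics.quantiles` from Python's standard library (Python 3.8 or later). Textbooks use several slightly different interpolation rules, so the values from this sketch may differ a little from a hand calculation. The marks are hypothetical.

```python
import statistics

# Hypothetical raw data: marks of 20 students.
marks = [12, 15, 21, 24, 25, 28, 31, 33, 34, 35,
         36, 38, 41, 42, 44, 47, 49, 52, 55, 58]

quartiles = statistics.quantiles(marks, n=4)      # Q1, Q2, Q3
deciles = statistics.quantiles(marks, n=10)       # D1 ... D9
percentiles = statistics.quantiles(marks, n=100)  # P1 ... P99

print(len(quartiles), len(deciles), len(percentiles))
print("Q2 equals the median:", quartiles[1] == statistics.median(marks))
```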