
1st Mid

The document discusses basics of statistics including data, frequency distribution, diagrams, and central tendency. It defines statistics, describes descriptive and inferential statistics, and discusses population and sample. It also covers frequency distribution, graphical presentation of data including various diagrams, and concepts of parameter and statistic.

Uploaded by

rownokislam2018

STT:141 1st Mid


Basics of statistics,
Data, Frequency Distribution and Diagrams,
Central Tendency

Basics of statistics

The word "statistics" is used in three distinct senses:


By statistics we mean:
● Numerical data relating to any field of enquiry.
● The scientific method of collecting, explaining, analyzing and interpreting numerical data.
● A set of numerical characteristics calculated from a sample.

Sir Ronald A. Fisher (known as the father of Statistics):


“The science of Statistics is essentially a branch of Applied Mathematics and may be
regarded as mathematics applied to observational data”.

Definition of Statistics:
Statistics is concerned with scientific methods for collecting, organizing, summarizing,
presenting, analyzing and interpreting data as well as with drawing valid conclusions and
making reasonable and effective decisions on the basis of such analysis.

Types of statistics:

There are two types of statistics:


Descriptive Statistics:
Methods for organizing, summarizing, and presenting data in an informative way, for example describing the characteristics of a product or process using the information collected on it.
Inferential Statistics (Inductive):
Methods used to estimate a property of a population on the basis of a sample.

Uses of Statistics
● Biology
● Agriculture
● Medical science
● Business studies
● Computer science

Population and Sample


● Population: Set of all items that possess a characteristic of interest
● Sample: Subset of a population

The Raw Material of Statistics

Data: The raw material of statistics consists of numbers or observations, usually
obtained by some process of counting or measurement; these are referred to collectively as data.

Sources of Data :
● Primary data
● Secondary data

Variable:
A measurable quantity which varies from one individual or observation to another.
Example: the height of the students, the weight of the students, etc.

Random Variable:
A variable whose values are associated with probabilities is called a random variable.

Types of Variable:
1. Quantitative
2. Qualitative
Quantitative Variable
Discrete variable: A variable that can assume only isolated values (for example, the number of students in a class).
Continuous variable: A variable that can assume any value within a given range or ranges (for example, height).

Parameter and Statistic

Parameter: A constant that is a function of the population values. It characterizes the
variable in the underlying population to some extent and is usually unknown.

Statistic: Any function of the sample values. It is used as an estimate of the parameter,
and its value is known once the sample has been observed.

A Taxonomy of Statistics

Data, Frequency Distribution and Diagrams

Presentation of Data
Raw data are generally huge and unwieldy.

There are three broad ways of presenting data.


● Textual presentation: data is presented along with the text
● Tabular presentation: arranged in a systematic way in rows and columns
● Graphic or diagrammatic presentation: for quick and ready comprehension.

Classification of Data

The purposes of classification are as follows:


● To allow meaningful comparisons.
● To condense the data.
● To aid in studying relationships.

The bases of classification are:


● Geographical
● Chronological
● Quantitative

Frequency Distribution

Definition: A set of classes, together with the frequencies of occurrence of values in each
class in a given set of data, presented in a tabular form.

Class interval: The interval defining a class is known as a class interval.

● Exclusive: When one of the end numbers is not counted as belonging to the class, the class
interval is exclusive. Usually the larger end number is excluded.
● Inclusive: When both end numbers are counted as belonging to the class, the class interval is inclusive.

Class limits: The end numbers of an inclusive class interval are known as class limits.

Class boundaries: The end numbers of an exclusive class interval are known as class
boundaries.

Interval size / Class width: The difference between the upper and lower class boundaries
is known as the width of the class. The common width is denoted by C. The class width need not
be the same for all the classes (especially for the terminal/end classes).

Construction of a Frequency distribution

Decide on the size of the groups, that is, the class intervals. Generally 5 to 25 classes are
suggested.

The exact number of classes will depend on:


● The range of the data
● The total number of observations.

Construction of a frequency distribution

1. Find the range of the variable by subtracting the lowest value from the highest value.
2. Divide the range by 5 and by 25 to obtain the largest and smallest reasonable class widths, and
round these numbers to the same degree of accuracy as found in the original data. Choose a
convenient class width between the two.
3. Arrange a sheet with three headings: class interval, tally marks, and frequency. Begin
at the top with the class interval which contains the smallest value, and continue
writing until the interval with the highest value is reached.
4. Read off the items in the original table of raw data and put, for each value, a tally
mark (/) against the appropriate class interval. It is convenient to mark each fifth item by a
diagonal tally mark drawn across the preceding four (////).
5. Count the number of tally marks opposite each interval, and write the result in the
frequency column.
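
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `frequency_distribution`, the sample `marks`, and the choice of 4 classes are hypothetical, the data are assumed to be integers, and the classes are taken to be exclusive (the upper end number is excluded).

```python
from collections import Counter

def frequency_distribution(values, num_classes):
    """Group integer values into equal-width exclusive classes [lower, upper)."""
    lowest, highest = min(values), max(values)
    data_range = highest - lowest                 # step 1: range
    # Step 2: pick a width just large enough that the highest value
    # still falls inside the last class (assumption: integer data).
    width = data_range // num_classes + 1
    # Steps 3-5: "tally" each value against the class it falls into.
    counts = Counter((v - lowest) // width for v in values)
    table = []
    for k in range(num_classes):
        lower = lowest + k * width
        table.append((lower, lower + width, counts.get(k, 0)))
    return table

# Hypothetical raw data: marks of 15 students.
marks = [12, 25, 31, 18, 44, 27, 39, 15, 22, 35, 29, 41, 20, 33, 26]
table = frequency_distribution(marks, 4)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {'/' * freq} ({freq})")
```

The printed slashes play the role of the tally column, and the frequencies always add up to the number of observations.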

Graphical Presentation of data

Some of the uses of graphs are:


● They help elucidate the main features of a set of data.
● They help suggest an appropriate method of analysis and explain the results.
● They can pinpoint gross errors in statistical records.

The principal limitations of graphs and diagrams are:


● They may be misleading unless drawn and studied with care.
● The conclusions drawn from graphs should be regarded as tentative and, therefore, are not
suitable for more critical statistical analysis.

Types of diagram
Some types of diagrams are:
● Bar diagram
● Pie diagram
● Line diagram
● Histogram
● Frequency polygon
● Cumulative frequency (C.F.) polygon
● Scatter diagram

Bar diagram
● This diagram is used mainly for portraying qualitative data.
● It is drawn by making a series of blocks of equal widths.
● The vertical blocks are alternatively known as bars.
● Horizontal bars are also used for depicting qualitative data.
● The bars may be arranged in a chronological, numerical, or some other convenient
order.

Pie Chart
● This diagram is intended to compare the distinct components, which together
constitute a whole.
● A circle of arbitrary radius represents the whole and the segments of the circle
represent the component parts.
● To construct such a diagram we use the fact that “the whole” corresponds to the total
number of degrees in the circular arc, namely 360°.
● This type of diagram should be used for multiple segments.
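
The angle of each segment follows directly from the 360° rule: a component's angle is its share of the whole multiplied by 360. A minimal sketch, in which the function name `pie_angles` and the expense figures are hypothetical:

```python
def pie_angles(components):
    """Return the angle in degrees of each pie segment."""
    total = sum(components.values())
    return {name: 360 * value / total for name, value in components.items()}

# Hypothetical monthly expenses (in percent of the total budget).
expenses = {"Food": 50, "Rent": 30, "Transport": 15, "Other": 5}
angles = pie_angles(expenses)
for name, angle in angles.items():
    print(f"{name}: {angle:.0f} degrees")
```

Here "Food" makes up half of the whole and therefore takes half the circle, and the angles of all segments add up to 360 degrees.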

Line Diagram
● This diagram is alternatively called a line graph or a time series graph.
● If we are given the values of a variable at different points of time, the set of values is
known as a time series and a line diagram is used to represent this type of data.
● In this diagram time is represented along the x-axis and the variable is plotted along
the y-axis.

Histogram
● To construct this diagram the horizontal axis is divided into segments corresponding
to the class boundaries of the frequency distribution.
● On each segment a rectangle with area proportional to the frequency in the class is
erected. The set of adjacent rectangles so constructed constitutes a histogram.
● The histogram is particularly appropriate when the variable is continuous. A discrete
variable is also treated as a continuous variable while constructing a histogram.
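
Because it is the area of a rectangle, not its height, that represents the frequency, classes of unequal width need their frequency divided by the class width to obtain the height. A minimal sketch of that calculation; the function name `histogram_heights` and the class data are hypothetical:

```python
def histogram_heights(classes):
    """Height of each rectangle = frequency / class width,
    so that area (height * width) equals the frequency."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

# Hypothetical classes: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 8)]
heights = histogram_heights(classes)
for (lower, upper, freq), height in zip(classes, heights):
    print(f"{lower}-{upper}: frequency {freq}, height {height}")
```

The last class is twice as wide as the others, so its rectangle is drawn lower than its frequency alone would suggest. When all classes have the same width, the heights are simply proportional to the frequencies.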

Frequency polygon
● It is a diagram used to represent a frequency distribution.
● The mid-values of class intervals are plotted along the x-axis and corresponding
frequencies are plotted along the y-axis.
● The points obtained for each of the class intervals are then joined by straight lines,
thus forming with the x-axis a polygon called a frequency polygon.
● The frequency polygon should be brought down at each end to the x-axis by joining it
to the mid-value (on the baseline) of the next outlying interval (of zero frequency).

Cumulative frequency polygon


A graph showing the cumulative frequency less than any upper class boundary (or more
than any lower class boundary) plotted against the upper class boundary (or lower class
boundary) is called a cumulative frequency polygon.

Two types of cumulative frequency polygon:


● less than type
● more than type

Ogive Curve
A graph of a cumulative frequency distribution or a cumulative relative frequency distribution is
called an ogive.
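
The values plotted in the two types of cumulative frequency polygon can be computed from the class frequencies. A minimal sketch, assuming the frequencies are listed in class order; the function name `cumulative_frequencies` and the data are hypothetical:

```python
from itertools import accumulate

def cumulative_frequencies(frequencies):
    """Return (less_than, more_than) cumulative frequencies.

    less_than[k] is plotted against the upper boundary of class k;
    more_than[k] is plotted against the lower boundary of class k.
    """
    less_than = list(accumulate(frequencies))
    total = less_than[-1]
    more_than = [total - below for below in [0] + less_than[:-1]]
    return less_than, more_than

# Hypothetical class frequencies.
frequencies = [4, 5, 3, 3]
less_than, more_than = cumulative_frequencies(frequencies)
print("less than type:", less_than)
print("more than type:", more_than)
```

The less than type rises to the total number of observations, while the more than type starts at the total and falls.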

Scatter Diagram
Scatter diagrams are useful for displaying information on two quantitative variables which
are believed to be interrelated.

Types of Frequency Curve


● Bell-shaped Frequency curve
● Moderately asymmetrical Frequency Curve.
● J-shaped Frequency Curve.
● Frequency Curve for Rectangular Distributions.

Central Tendency

Measures of Central Tendency


● While distributions provide an overall picture of some data set, it is sometimes
desirable to represent some property of the entire data set using a single statistic.
● The first descriptive statistics we will discuss are those used to indicate where the
‘center’ of the distribution lies (compare the expected value).
● A measure of central tendency need not be a value that occurs in the dataset itself.
● There are different measures of central tendency, each with its own advantages
and disadvantages.

Central Tendency
● Mean—Average
● Median—Middle
● Mode—Most frequent

Mean or Average
● The mean or average is obtained by adding up the values for all the observations
and then dividing by the number of observations.
● In general, the mean is the best measure of central tendency to use, but there are
exceptions.
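
In symbols, the definition in the first bullet can be written as follows. The notation is an assumption introduced here for illustration: x_1, ..., x_n stand for the n observations and x̄ for their mean.

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

For example, the five values 4, 8, 6, 5, 7 give (4 + 8 + 6 + 5 + 7) / 5 = 30 / 5 = 6.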

AVERAGE-WEIGHTED AVERAGE
Used when a number of averages are combined with different frequencies
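
A minimal sketch of the calculation: each average is multiplied by its frequency (weight), the products are added, and the sum is divided by the total frequency. The function name `weighted_mean` and the section data are hypothetical.

```python
def weighted_mean(averages, weights):
    """Combine several averages, weighting each by its frequency."""
    if len(averages) != len(weights):
        raise ValueError("averages and weights must have the same length")
    return sum(a * w for a, w in zip(averages, weights)) / sum(weights)

# Hypothetical example: two sections with different numbers of students.
section_means = [70, 80]   # average mark in each section
section_sizes = [30, 10]   # number of students in each section
combined = weighted_mean(section_means, section_sizes)
print(combined)  # (70*30 + 80*10) / 40 = 72.5
```

The combined average lies closer to 70 than to 80 because the first section has three times as many students.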

When the Mean Won’t Work


● When a distribution contains a few extreme scores the mean will be pulled toward the
extremes
● With data from a nominal scale or an ordinal scale it is impossible to compute a
mean

Median
● The median is the “middle” observation when the complete list of observations is
sorted in order.
● When there is an odd number of observations, the value of the middle one is the
median.
● When there is an even number of observations, the average of the two
“middle” observations is used as the median.
● The median may be a better indication of the center of a group of numbers if there
are some values that are considerably more extreme than others.
● Median income is often used for this reason
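
The rules above can be sketched as follows. The function name `median` and the income figures are hypothetical, chosen so that a single extreme value is present.

```python
def median(values):
    """Middle value of the sorted observations (average of the two
    middle values when the number of observations is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical incomes with one extreme value.
incomes = [20, 22, 25, 27, 300]
income_median = median(incomes)
income_mean = sum(incomes) / len(incomes)
print(income_median)  # 25
print(income_mean)    # 78.8
```

The single income of 300 pulls the mean far above four of the five values, while the median stays at 25.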

When to Use the Median:


● Extreme scores or skewed distribution
● Undetermined values
● Open-ended distribution
● Ordinal scale

Median
Advantage:
Resistant to outliers

Disadvantage:
May not be so informative:
(1, 1, 2, 2, 2, 2, 5, 6, 9, 9, 10 )

Does the value of 2 really represent this sample as a whole very well?

Mode

The Mode is the value that occurs with the greatest frequency.
It is possible to have no mode in a series of numbers, or to have more than one mode.

The formula for the mode of grouped data is:
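
One widely used textbook form is sketched here. The symbols are assumed notation, not defined elsewhere in these notes: L is the lower boundary of the modal class (the class with the highest frequency), f_m its frequency, f_{m-1} and f_{m+1} the frequencies of the classes just before and just after it, and C the class width.

```latex
\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times C,
\qquad \Delta_1 = f_m - f_{m-1}, \quad \Delta_2 = f_m - f_{m+1}
```

For instance, with a modal class 20 to 30 of frequency 20, neighbouring frequencies 15 and 10, and C = 10, this gives 20 + (5 / (5 + 10)) × 10 ≈ 23.3.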

Mean is the most commonly used measure of central tendency because:


● It uses every score.
● It’s usually a very good representative value.
● It’s mathematically related to variance and standard deviation (the most common
measures of variability).

Advantages of the mode over the mean:


● Easy to compute
● Can be used with any scale of measurement.
● Especially useful when you can’t calculate a median or a mean

Mode

Advantages
● Very quick and easy to determine
● Is an actual value of the data
● Not affected by extreme scores

Disadvantages
● Sometimes not very informative (e.g. cigarettes smoked in a day)
● Can change dramatically from sample to sample
● Might be more than one (which is more representative?)

Central Tendency and the Shape of the Distribution

● Because the mean, the median, and the mode are all measuring central tendency,
the three measures are often systematically related to each other.
● In a symmetrical distribution, for example, the mean and median will always be
equal.

Central Tendency and the Shape of the Distribution (cont.)

● If a symmetrical distribution has only one mode, the mode, mean, and median will all
have the same value.
● In a skewed distribution, the mode will be located at the peak on one side and the
mean usually will be displaced toward the tail on the other side.
● The median is usually located between the mean and the mode.

Skewed distributions

Empirical Relation between Mean, Median and Mode

For moderately asymmetrical (skewed) distributions, the relation

Mode = Mean – 3(Mean – Median) approximately holds.


For symmetrical distribution, the mean, median and mode coincide.
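
A small numerical illustration of the empirical relation; the function name `empirical_mode` and the figures are hypothetical, and the result is only an approximation.

```python
def empirical_mode(mean, median):
    """Approximate mode from Mode = Mean - 3(Mean - Median)."""
    return mean - 3 * (mean - median)

# Hypothetical skewed distribution: mean 50, median 48.
print(empirical_mode(50, 48))  # 50 - 3*(50 - 48) = 44
# Symmetrical case: mean and median coincide, and so does the mode.
print(empirical_mode(50, 50))  # 50
```

In the first case the median (48) lies between the mode (44) and the mean (50), as expected for a skewed distribution.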

Harmonic Mean

The inverse of the arithmetic mean of the inverse values of a variable is known as the
harmonic mean of that variable. It is usually denoted by H. Let xi be the ith (i = 1, 2, ..., n)
value of a variable x; then

$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
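
As a numerical illustration of this definition, the harmonic mean is the number of values divided by the sum of their inverses. The function name `harmonic_mean` and the speed data are hypothetical; the values are assumed to be positive.

```python
def harmonic_mean(values):
    """Inverse of the arithmetic mean of the inverse values."""
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean is computed here for positive values only")
    return len(values) / sum(1 / v for v in values)

# Hypothetical example: the same distance travelled at 40 and at 60 km/h.
speeds = [40, 60]
h = harmonic_mean(speeds)
print(round(h, 2))  # close to 48, below the arithmetic mean of 50
```

Python's standard library also provides `statistics.harmonic_mean`, which can be used instead of a hand-written function.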

SOME OTHER MEASURES OF FREQUENCY DISTRIBUTION

Quartile: There are three values which divide the distribution into four equal parts. These
values are called quartiles. The ith (i = 1, 2, 3) quartile is denoted by Qi and, for grouped data, is defined as

$$Q_i = L_i + \frac{\frac{i\,n}{4} - F_{-i}}{f_i} \times C_i$$

where
Li : Lower boundary of the ith quartile class
F-i : Cumulative frequency of the pre-ith quartile class
fi : Frequency of the ith quartile class
Ci : Width of the ith quartile class interval
n : Number of observations

Decile: There are nine values which divide a distribution into ten equal parts. These values
are called deciles. The jth (j = 1, 2, ..., 9) decile is denoted by Dj and, for grouped data, is defined as

$$D_j = L_j + \frac{\frac{j\,n}{10} - F_{-j}}{f_j} \times C_j$$

where
Lj : Lower boundary of the jth decile class
F-j : Cumulative frequency of the pre-jth decile class
fj : Frequency of the jth decile class
Cj : Width of the jth decile class interval
n : Number of observations

Percentile: There are ninety-nine values which divide a distribution into one hundred equal parts. These
values are called percentiles. The jth (j = 1, 2, ..., 99) percentile is denoted by Pj and, for grouped data, is defined as

$$P_j = L_j + \frac{\frac{j\,n}{100} - F_{-j}}{f_j} \times C_j$$

where
Lj : Lower boundary of the jth percentile class
F-j : Cumulative frequency of the pre-jth percentile class
fj : Frequency of the jth percentile class
Cj : Width of the jth percentile class interval
n : Number of observations
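
Quartiles, deciles and percentiles of grouped data can all be computed with the same interpolation: lower boundary of the class containing the target position, plus (target position minus the cumulative frequency of the preceding classes) divided by the class frequency, times the class width. A minimal sketch under that assumption; the function name `grouped_quantile` and the class data are hypothetical.

```python
def grouped_quantile(classes, i, parts):
    """i-th value dividing grouped data into `parts` equal parts.

    classes: list of (lower boundary, upper boundary, frequency) in order.
    parts = 4 gives quartiles, 10 deciles, 100 percentiles.
    """
    if not 0 < i < parts:
        raise ValueError("i must satisfy 0 < i < parts")
    n = sum(freq for _, _, freq in classes)
    target = i * n / parts           # position i*n/parts in the ordered data
    cumulative = 0                   # cumulative frequency of preceding classes
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("no class contains the requested position")

# Hypothetical frequency distribution with n = 50 observations.
classes = [(0, 10, 5), (10, 20, 15), (20, 30, 20), (30, 40, 10)]
print(grouped_quantile(classes, 1, 4))     # first quartile: 15.0
print(grouped_quantile(classes, 2, 4))     # second quartile: 22.5
print(grouped_quantile(classes, 9, 10))    # ninth decile: 35.0
print(grouped_quantile(classes, 50, 100))  # 50th percentile: 22.5
```

The second quartile and the 50th percentile coincide, since both mark the point below which half of the observations lie.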