Sta 321
People view statistics in many ways. Generally, it is a subject that deals with percentages, charts,
graphs, averages and tables. Some people think that statistics is a subject consisting of rules,
methods and techniques of collecting and presenting large amounts of numerical information, while
other people think that it is a subject of making inferences about the population based on sample
information.
Definition of Statistics: Statistics is defined as the scientific technique used in the collection,
presentation, analysis and interpretation of numerical data, and in drawing inferences from these data.
Meaning of statistics: The word statistics in the plural sense means the numerical observations
collected for some definite purpose in any field of study. In the singular sense, it means a body of
methods used in the collection, presentation, analysis and interpretation of data. It is also used as the
plural of statistic, which means any numerical measure computed from a sample.
Why Study Statistics?
Statistics is required for many programs. Why is statistics required in so many majors?
The first reason is that numerical information is everywhere. Look in newspapers, news
magazines, business magazines, general interest magazines, or sports magazines, and you will be
bombarded with numerical information.
A second reason for taking a statistics course is that statistical techniques are used to make decisions
that affect our daily lives. That is, they affect our personal welfare.
A third reason for taking a statistics course is that the knowledge of statistical methods will help you
understand how decisions are made and give you a better understanding of how they affect you.
No matter what line of work you select, you will find yourself faced with decisions where an
understanding of data analysis is helpful. In order to make an informed decision, you will need to be
able to:
Determine whether the existing information is adequate or additional information is required.
Gather additional information, if it is needed, in such a way that it does not provide misleading
results.
Summarize the information in a useful and informative manner.
Analyze the available information.
Draw conclusions and make inferences while assessing the risk of an incorrect conclusion.
In summary, there are at least three reasons for studying statistics:
Data are everywhere
Statistical techniques are used to make many decisions that affect our lives
No matter what your career, you will make professional decisions that involve data. An
understanding of statistical methods will help you make these decisions more effectively.
Scope/ Importance of statistics
Statistics plays an important role in almost every field of human life.
Insurance Companies: Insurance companies use statistical analysis to set rates for home, automobile,
life, and health insurance. Tables are available that summarize the probability that a 25-year-old
woman will survive the next year. Based on these probabilities, life insurance premiums can be
established.
Environment: The Environmental Protection Agency is interested in the water quality at a certain
city. They periodically take water samples to establish the level of contamination and maintain the
level of quality.
Medical Field: Medical researchers study the cure rates for diseases using different drugs and
different forms of treatment. For example, what is the effect of treating a certain type of knee injury
surgically or with physical therapy? If you take an aspirin each day, does that reduce your risk of a
heart attack?
Administration: Statistics plays an important role in the field of administration. A modern
administrator depends upon statistical data. Preparing a budget is impossible without statistical
records.
Banking: Statistical methods are helpful to bankers. They can estimate the amount of money that
is required to meet the demands of depositors on different days of the week.
Agriculture: Statistical methods help in comparing different varieties of seed or
fertilizer.
Business & Economics: Statistical tools can be applied to the study of economic problems and
business activity. A businessman depends upon statistical data to study the needs and desires of
consumers according to their tastes.
Moreover, statistics plays a vital role in all sciences like Physics, Chemistry, Biology,
Psychology, Sociology, Zoology and Botany.
Functions of statistics:
The four main functions of statistics are:
1. Collection of Data 2. Presentation of Data 3. Analysis of Data 4. Interpretation of Results
Branches of statistics:
Statistics can be divided into the following branches:
1. Descriptive Statistics
2. Inferential Statistics
Descriptive Statistics: It is the branch of statistics that refers to the methods and principles of
collecting information and presenting such information in the form of tables and graphs.
Inferential Statistics: It is the branch of statistics that deals with procedures for drawing inferences
about the population on the basis of the information obtained from a sample.
Population: A population is an aggregate of individuals, objects or material about which
some sort of information is required. OR The whole from which a sample is drawn is known as the
population. OR The aggregate of objects with which we are concerned is called the population. e.g.
the heights of all students in the Statistics department form a population. The number of observations
in the population is denoted by “N” and is called the population size.
Finite Population: A population is finite if its individuals can be counted. For example, the
population of universities in Vehari.
Infinite Population: A population is infinite if its individuals cannot be counted. For example, the
population of stars in the sky.
Sample: A representative part of a population is known as a sample. OR Any subset of a population is
called a sample. The number of observations in a sample is denoted by “n” and is called the sample
size. e.g. the heights of five students selected from all students in the Statistics department.
Parameter: Any numerical value describing a characteristic of a population is called a parameter. It
is usually denoted by small Greek letters. For example, the population mean μ and the population
standard deviation σ.
Statistic: Any numerical value describing a characteristic of a sample is called a statistic. It is usually
denoted by Latin letters. For example, the sample mean X̄ and the sample standard deviation S.
Constant: A quantity which can assume only one value is called a constant. For example,
e ≈ 2.7182 and π ≈ 3.1415.
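The difference between a parameter and a statistic can be illustrated with a minimal Python sketch. The ten heights below are made-up values used only for illustration, and the names mu, sigma, x_bar and s are chosen here to stand for the population mean, population standard deviation, sample mean and sample standard deviation. Note that Python's statistics.stdev divides by n - 1, which is one common convention for the sample standard deviation and is an assumption here, since these notes do not state the divisor.

```python
import random
import statistics

# Hypothetical population: heights (cm) of N = 10 students in a department.
population = [160, 165, 170, 172, 168, 175, 180, 158, 162, 170]
N = len(population)

# Parameters describe the whole population.
mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population standard deviation

# Draw a sample of n = 5 students without replacement.
rng = random.Random(321)  # fixed seed so the run is repeatable
sample = rng.sample(population, 5)
n = len(sample)

# Statistics describe only the sample.
x_bar = statistics.mean(sample)  # sample mean
s = statistics.stdev(sample)     # sample standard deviation (divisor n - 1)

print(f"N = {N}, population mean = {mu}, population s.d. = {sigma:.2f}")
print(f"n = {n}, sample mean = {x_bar}, sample s.d. = {s:.2f}")
```

The parameters stay fixed for a given population, while the statistics change from one sample to another.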
Variable: A measurable quantity which changes from one individual to another is called a
variable. For example, the speed of a car, the heights of students.
Types of Variable:
1. Quantitative variable 2. Qualitative variable/Attributes
Quantitative variable: A variable which can be measured numerically is called a quantitative
variable. For example, the heights of students, the speed of a car.
Types of Quantitative variable:
1. Discrete variable 2. Continuous variable
Discrete variable: A variable which can assume only some specific values, or values in whole
numbers, within a given range is called a discrete variable. For example, the number of children in a
family, the number of chairs in a classroom.
Continuous variable: A variable which can assume all possible values, including fractional values,
within a given range is called a continuous variable. For example, the speed of a car, the heights of
students.
Qualitative variable/Attributes: A variable which cannot be measured numerically, but whose
presence or absence can only be described, is called a qualitative variable or attribute. For example,
sex, religion, eye colour, beauty.
Data: The set of observations is called data.
Statistical Data: A sequence of observations, made on a set of objects included in the sample drawn
from a population is called statistical data. These observations may be obtained either by counting or
by measurement.
Primary data: Data obtained from the original or direct source that have not undergone any sort of
statistical treatment are called primary data.
Secondary data: Data that have undergone any sort of statistical treatment at least once are
called secondary data.
Collection of primary data:
1. Direct personal investigation 2. Through investigators 3. Through questionnaires
4. Through local sources 5. Through telephone and Internet
Direct personal investigation: In this method, the researcher collects the data personally. Data of
this type are considered more accurate and complete, but the method is feasible only for small-scale
surveys.
Through investigators: In this method, trained investigators are employed to collect the data.
They fill in the questionnaire form after asking for the required information.
Through questionnaires: In this method, the required information is obtained by mailing the
questionnaire form to the selected individuals, who fill in the questionnaire and return it to
the investigator. This method is cheap, but the non-response rate is very high.
Through local sources: In this method, the required information is obtained from local
representatives or agents. This method is quick but gives only rough estimates.
Through telephone and Internet: In this method, the required information is obtained by contacting
the individuals by telephone or over the Internet. This method is quick and gives accurate estimates.
Collection of Secondary data:
1. Government organizations 2. Semi-Government organizations 3. Newspapers
Government organizations: Secondary data may be obtained from Federal Bureau of Statistics,
Ministries of food, Education department, Health department, etc.
Semi-Government organizations: Secondary data may be obtained from municipal committees,
district councils, commercial and financial institutions, banks, etc.
Newspapers: Secondary data may be obtained from newspapers.
REPRESENTATION OF DATA
A major reason for calculating statistics is to describe and summarize a set of data. A mass of numbers
is not usually very informative, so we need to find ways of abstracting the key information that allows
us to present the data in a clear and comprehensible form.
PRESENTATION OF DATA:
The raw data that have been collected are usually very large in quantity. Therefore, we have to
organize and summarize the collected data in such a form that is easy to understand. This is called
presentation of statistical data.
Array: The arrangement of data in ascending or descending order of magnitude is called an array.
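As a small illustration of an array, the following Python sketch arranges a set of made-up heights (hypothetical values, not taken from any real survey) in ascending and in descending order using the built-in sorted function.

```python
# Hypothetical raw data: heights (cm) of eight students, in the order collected.
raw = [170, 158, 165, 180, 162, 175, 168, 172]

# An array is the same data arranged in order of magnitude.
ascending = sorted(raw)
descending = sorted(raw, reverse=True)

print("Ascending: ", ascending)
print("Descending:", descending)
```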
Different methods used in the presentation of statistical data
1. Classification 2. Tabulation 3. Diagram 4. Graph
Classification: The process of arranging the data into relatively homogeneous groups or classes
according to some common characteristics is called classification. For example, the population of a
country is classified according to age, sex, religion and marital status.
Tabulation: The systematic arrangement of the data in the form of rows and columns for the purpose
of comparison and analysis is known as tabulation.
Frequency distribution: A frequency distribution is a tabular arrangement of data in which the various
items are arranged into classes or groups and the number of items falling in each class is stated. The
number of observations falling in a particular class is called class frequency or simply frequency of
that class and is denoted by "f".
Class and Class frequency: When a set of data is divided into non-overlapping homogeneous
groups, each group is called a class or class interval. The number of observations falling in a particular
class is called the frequency of that class or simply frequency and is denoted by "f".
Class limits: The class limits are the values of the variable that describe the ends of each class and so
separate one class from the next. The smaller value is called the lower class limit and the larger value
is called the upper class limit.
Class boundaries: The class boundaries are obtained by taking half of the difference between the
upper limit of one class and the lower limit of the next class, then subtracting it from each lower class
limit and adding it to each upper class limit. They can also be obtained by subtracting h/2 from, and
adding h/2 to, the midpoint of each class.
Class mark or mid points: The class mark or the midpoint is that value which divides a class into
two equal parts. It is obtained by dividing the sum of lower and upper class limits or class boundaries
of a class by 2.
Class interval: Class interval is the length of a class. A class interval is usually denoted by "h". It is
obtained by
(i) The difference between the upper class boundary and the lower class boundary (not the
difference between class limits). OR
(ii) The difference between either two successive lower class limits or two successive upper class
limits. OR
(iii) The difference between two successive midpoints.
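The definitions of class boundaries, class marks and class interval can be checked with a short Python sketch. The three classes 10-19, 20-29 and 30-39 are hypothetical and chosen only for illustration; the sketch computes h in the three ways listed above so that the results can be compared.

```python
# Hypothetical class limits (lower, upper) for three successive classes.
limits = [(10, 19), (20, 29), (30, 39)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = limits[1][0] - limits[0][1]

# Class boundaries: lower limit minus half the gap, upper limit plus half the gap.
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in limits]

# Class marks (midpoints): sum of the two class limits divided by 2.
marks = [(lo + hi) / 2 for lo, hi in limits]

# Class interval h, computed in three equivalent ways.
h_from_boundaries = boundaries[0][1] - boundaries[0][0]  # (i)
h_from_lower_limits = limits[1][0] - limits[0][0]        # (ii)
h_from_marks = marks[1] - marks[0]                       # (iii)

print("Boundaries:", boundaries)
print("Class marks:", marks)
print("h:", h_from_boundaries, h_from_lower_limits, h_from_marks)
```

For these classes all three methods give h = 10, whereas the difference between the class limits of a single class (19 - 10 = 9) does not.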
CONSTRUCTION OF A FREQUENCY DISTRIBUTION:
Decide the number of classes: The number of classes is determined by the formula
K = 1 + 3.3 log(n) OR K = √n (approximately)
where K denotes the number of classes, n denotes the total number of observations, and log is the
logarithm to base 10.
Determine the range of the data: The difference between the largest and smallest values in the data
is called the range of the data. i.e. R = largest observation - smallest observation
where R denotes the range of the data.
Determine the approximate size of class interval: The size of the class interval is determined by
dividing the range of the data by the number of classes i.e. h= R/K
Where h denotes the size of the class interval. In case of fractional results, the next higher whole
number is usually taken as the size of the class interval.
Decide where to locate the class limits: The lower class limit of the first class is placed just below
the smallest value in the data. Add the class interval to it to get the lower class limit of the next class,
and repeat this process until the lower class limit of the last class is reached.
Distribute the data into appropriate classes: Take each observation in turn and mark a vertical bar
"I" (tally) against the class to which it belongs.
Cumulative Frequency: The cumulative frequency of a class is obtained by adding the frequencies of
all preceding classes to the frequency of that class and is denoted by c.f.
Relative Frequency: The frequency of a class divided by the total frequency of all the classes is called
the relative frequency and is denoted by r.f.
Cumulative relative frequency: The cumulative relative frequency of a class is obtained by adding
the relative frequencies of all preceding classes to the relative frequency of that class.
Percentage frequency: The percentage frequency of a class is obtained by multiplying the relative
frequency of that class by 100.
Cumulative percentage frequency: The cumulative percentage frequency of a class is obtained by
adding the percentage frequencies of all preceding classes to the percentage frequency of that class.
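The following Python sketch tallies a data set into classes and then derives the frequencies defined above. The 30 marks, the first lower class limit of 23, the class interval h = 9 and the six classes are all hypothetical values chosen for illustration. The cumulative relative and cumulative percentage frequencies are computed from the cumulative frequency, which gives the same result as adding the individual relative or percentage frequencies but avoids small rounding errors.

```python
from itertools import accumulate

# Hypothetical data: marks of 30 students (whole numbers).
data = [24, 45, 31, 52, 67, 38, 49, 71, 28, 55,
        62, 34, 47, 58, 41, 76, 36, 50, 64, 29,
        43, 57, 69, 33, 48, 60, 39, 53, 74, 46]

first_lower = 23  # lower class limit of the first class (assumed)
h = 9             # class interval (assumed)
K = 6             # number of classes (assumed)

# Tally: each observation adds one to the frequency of its class.
freqs = [0] * K
for x in data:
    freqs[(x - first_lower) // h] += 1

total = sum(freqs)
cf = list(accumulate(freqs))            # cumulative frequency (c.f.)
rf = [f / total for f in freqs]         # relative frequency (r.f.)
crf = [c / total for c in cf]           # cumulative relative frequency
pf = [100 * f / total for f in freqs]   # percentage frequency
cpf = [100 * c / total for c in cf]     # cumulative percentage frequency

print("class    f  c.f.   r.f.  c.r.f.    %f   c.%f")
for i in range(K):
    lo = first_lower + i * h
    hi = lo + h - 1
    print(f"{lo}-{hi}  {freqs[i]:3d}  {cf[i]:4d}  {rf[i]:.3f}  {crf[i]:.3f}"
          f"  {pf[i]:5.1f}  {cpf[i]:5.1f}")
```

The last cumulative frequency equals the total number of observations, and the last cumulative relative and cumulative percentage frequencies equal 1 and 100 respectively, which is a useful check on the table.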