
Introduction of Statistics

Uploaded by

sahasneupane85
Copyright © All Rights Reserved

INTRODUCTION TO STATISTICS

B. S. PARAJULI
History and Development of Statistics

• Human beings have used Statistics, even unknowingly, since early times.
• They kept information about daily-life activities, such as records of cattle, property and crops.
• States used Statistics only to keep records such as the number of police and soldiers.
• It took a long time for the importance and applications of Statistics to be formally understood.
• Britain began to use statistics deliberately after the Napoleonic Wars.
• It realized the need for statistics in formulating special economic policies.
• It has been argued that Statistics originated simultaneously with the development of human society.
• The first comprehensive work on the numerical interpretation of biological and social phenomena is due to John Graunt, who in 1662 published a book based on birth and death records gathered in plague-stricken England.
• Sir William Petty began developing what became known as Political Arithmetic, 'the art of reasoning by figures upon things relating to government'.
• By 1700, the work of Edmund Halley on mortality tables and life expectancy had established statistical inference as a science both viable and eminently worthwhile.
• Nowadays, it is hard to imagine any area or discipline without Statistics.
Meaning of Statistics

• The word Statistics has been derived from
– the Latin word 'Status',
– the Italian word 'Statista',
– the German word 'Statistik',
each of which means a political state.
DEFINITIONS OF STATISTICS
• These are broadly divided into two categories:
1. Definitions in the plural sense
2. Definitions in the singular sense
• In the plural sense:
"Statistics are aggregates of facts, affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other."
- by Horace Secrist
• In a singular sense:
"Statistics may be defined as the collection,
presentation, analysis and interpretation of
numerical data"
- by Croxton and Cowden

• Combining both:
Statistics is the science that comprises both the numerical statements of facts capable of analysis and interpretation, and the study of the principles and methods applied in collecting, presenting, analysing and interpreting numerical data in any field of investigation.
Statistics may also be classified into two parts,
which are as follows:
• Theoretical Statistics or Mathematical Statistics
• Applied Statistics
Theoretical statistics can further be subdivided
into two parts.
• Descriptive Statistics
• Inferential Statistics
• Descriptive Statistics: collecting, summarizing, and describing data
• Inferential Statistics: drawing conclusions and/or making decisions concerning a population based only on sample data
Descriptive Statistics

• Collect data
– e.g., Survey

• Present data
– e.g., Tables and graphs

• Characterize data
– e.g., Sample mean $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
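As a small illustration of characterizing data, the sketch below computes the sample mean as the sum of the observations divided by n. The weights are hypothetical values chosen only for the example, and `sample_mean` is an illustrative helper, not a function from any library.

```python
# Hypothetical sample of weights in pounds (illustrative values only)
weights = [118, 122, 125, 119, 121, 127, 116, 124]

def sample_mean(xs):
    """Return the sample mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)

x_bar = sample_mean(weights)
print(f"n = {len(weights)}, sample mean = {x_bar}")  # n = 8, sample mean = 121.5
```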
Inferential Statistics

• Estimation
– e.g., Estimate the population
mean weight using the sample
mean weight
• Hypothesis testing
– e.g., Test the claim that the
population mean weight is
120 pounds
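The two inferential tasks above can be sketched together: the sample mean serves as a point estimate of the population mean, and a one-sample t statistic tests the claim that the population mean weight is 120 pounds. The data are hypothetical, and the choice of a t-test at the 5% level (critical value 2.365 for 7 degrees of freedom, from standard t tables) is an assumption made for illustration.

```python
import math
import statistics

# Hypothetical sample of weights in pounds (illustrative values only)
weights = [118, 122, 125, 119, 121, 127, 116, 124]
mu_0 = 120          # claimed population mean weight
t_critical = 2.365  # two-sided 5% critical value of t with n - 1 = 7 d.f. (t tables)

n = len(weights)
x_bar = statistics.mean(weights)   # point estimate of the population mean
s = statistics.stdev(weights)      # sample standard deviation (divisor n - 1)

# One-sample t statistic: t = (x_bar - mu_0) / (s / sqrt(n))
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

print(f"Estimated population mean: {x_bar}")
print(f"t = {t_stat:.3f}")
if abs(t_stat) > t_critical:
    print("Reject the claim that the population mean is 120 pounds (5% level)")
else:
    print("Fail to reject the claim that the population mean is 120 pounds (5% level)")
```

Here t is about 1.134, which is smaller than 2.365, so this hypothetical sample gives no reason to reject the claim.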
Applied Statistics
• Statistics which deals with the application of statistical methods to specific problems is called applied statistics. It applies mathematical statistics to real-world problems and prescribes the method of analysis according to the nature of the data or of the problem.
• It is used for decision making and for solving practical problems in diverse fields such as medicine, engineering, agriculture and industry.
Functions of Statistics

• To present facts in a definite form through numerical figures
• To condense unwieldy and voluminous data
• To help in classifying data
• To provide methods for making comparisons
• To help in formulating policies
• To determine relationships between different phenomena
• To help in predicting future trends
• To formulate and test hypotheses
• To have an idea about the occurrence or non-occurrence of certain events
• To draw valid inferences or conclusions, etc.
Scope of Statistics

• Statistics, Business and Industries


• Statistics and Finance
• Statistics and Economics
• Statistics, Computer, and Information
Technology
• Statistics and Science, etc
Statistics in Computer Science and Information Technology
• Statistics has been especially useful in speech recognition software since the advent of Apple's Siri. Statistics also underpins programs such as Google Translate, which uses data to perform online translations. In both cases, the spoken or typed words are converted into a sequence of numbers that is then matched against known dictionaries.

• Data mining uses statistical functions to find irregularities or inconsistencies within data. Data compression uses statistical algorithms to compress data. Statistics is also used in network traffic modeling, which helps make efficient use of the available bandwidth while avoiding network congestion. Artificial intelligence tries to simulate human thought using algorithms similar to those in voice recognition or translation software. Other statistical uses in computer science include quality management, software engineering, storage and retrieval processes, and hardware engineering and manufacturing. Statistical algorithms have become necessary in many facets of computer programming and data mining.
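To make "finding irregularities within data" concrete, here is a minimal sketch that flags values lying far from the mean, measured in standard deviations (a z-score rule). The readings, the helper name `find_irregular` and the threshold of 2 are all hypothetical choices for illustration; real data-mining tools use many other, more robust methods.

```python
import statistics

def find_irregular(values, threshold=2.0):
    """Flag values whose z-score (distance from the mean in standard
    deviations) exceeds `threshold` in absolute value."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return []                   # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / sd > threshold]

# Hypothetical readings, e.g. requests per minute on a server
readings = [10, 11, 9, 10, 12, 10, 11, 45]
print(find_irregular(readings))  # [45]
```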
Limitations of Statistics

• Statistics does not deal with individuals.
• Statistics does not deal with qualitative characteristics directly.
• Statistical laws are not exact.
• Statistics can be misused, etc.
Measurement Scales
• Measurement is the assignment of numbers to objects or events according to certain rules.
• An object, event or observation may be measured in terms of height, weight or any other number.
• Measurement of statistical data is essential for
further statistical analysis
4 Types of Measurement Scales

• Nominal scale
• Ordinal scale
• Interval scale
• Ratio scale
• Measurement can be distinguished on the basis of:
• Level
• Characteristics of classification
• Order
• Distance
• Origin
Scales Properties
• Uniquely classifies (Categories)
• Preserves order (Rank)
• Equal intervals
• Natural zero (True Zero Point)
Nominal Scale
• Lowest measurement scale
• Consists of naming observations or classifying them into various categories
• Attributes such as religion, gender, defective or non-defective items, etc.
• Codes such as 0, 1, 2, etc. are used to identify characteristics like religious affiliation, gender, etc.
• Categorical data are generally measured on a nominal scale.
Nominal Scale Properties
• The number assigned to one set of objects cannot be compared in magnitude with the number assigned to another set.
• Computing the A.M., S.D., product moment correlation, etc. makes no sense.
• No parametric tests can be applied.
• Association between 2 nominal scaled
variables can be measured using chi-square
or Fisher’s exact test
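The chi-square measure of association mentioned above can be sketched for a 2 x 2 table of counts: each expected count is (row total x column total) / grand total, and the statistic sums (observed - expected)^2 / expected. The table is hypothetical, no continuity correction is applied, and the tabulated 5% critical value 3.841 applies only because this table has 1 degree of freedom.

```python
# Hypothetical 2 x 2 contingency table of counts (rows: group A, group B;
# columns: defective, non-defective). Values are illustrative only.
observed = [
    [30, 20],
    [10, 40],
]

def chi_square(table):
    """Pearson chi-square statistic for a contingency table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

stat, df = chi_square(observed)
critical_5pct_df1 = 3.841  # tabulated chi-square value at 5% level, valid for 1 d.f. only
print(f"chi-square = {stat:.3f}, d.f. = {df}")
print("Association" if stat > critical_5pct_df1 else "No significant association")
```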
Ordinal Scale
• When observations can be ranked according to a certain criterion, priority or importance, they are said to be measured on an ordinal scale.
• Example: people's attitude towards a certain fact, such as positive, negative or bad.
• We may assign numbers 1 for positive, 2 for
negative and 3 for bad
• These numbers are known as ranks.
• Such characteristic is said to be measured on ordinal
scale.
Ordinal Scales Properties
• The median is the appropriate measure of central tendency.
• Percentile rank and quartile deviation are used as measures of dispersion.
• Rank correlation is used to measure the correlation between two sets of ordered data.
• Only non-parametric statistical tests can be used.
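Two of these ordinal-scale tools can be sketched briefly: the median of rank-coded responses, and Spearman's rank correlation using the shortcut formula rho = 1 - 6 Σd² / (n(n² - 1)). The responses and judges' ranks are hypothetical, and the shortcut formula assumes there are no tied ranks.

```python
import statistics

# Hypothetical attitude responses coded as ranks: 1 = positive, 2 = negative, 3 = bad
responses = [1, 1, 2, 3, 2, 1, 3]
# median_low always returns an observed code instead of averaging two codes
print(statistics.median_low(responses))  # 2

def spearman_rho(rank_x, rank_y):
    """Spearman's rank correlation, rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)).
    This shortcut formula assumes there are no tied ranks."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical ranks given to five candidates by two judges
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_rho(judge_a, judge_b))  # 0.8
```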
Interval Scale
• Some measurement scales possess a constant
interval size; they are called interval scales.
• Example: two common temperature scales, Celsius (°C) and Fahrenheit (°F).
• The same difference exists between 20°C (68°F) and 25°C (77°F) as between 5°C (41°F) and 10°C (50°F).
• But we cannot say that 40°C (104°F) is twice as hot as 20°C (68°F), i.e., the zero point is arbitrary.
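The temperature example can be checked numerically with the standard conversion F = C x 9/5 + 32. The sketch shows that equal Celsius differences stay equal in Fahrenheit, while the ratio 40/20 = 2 does not carry over, because the zero point of each scale is arbitrary. The helper name `c_to_f` is illustrative.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32

# Equal differences stay equal after conversion (interval property)
diff_1 = c_to_f(25) - c_to_f(20)   # 77 - 68
diff_2 = c_to_f(10) - c_to_f(5)    # 50 - 41
print(diff_1, diff_2)              # 9.0 9.0

# Ratios do not survive, because the zero point is arbitrary
print(40 / 20)                     # 2.0 on the Celsius scale
print(c_to_f(40) / c_to_f(20))     # about 1.53 on the Fahrenheit scale
```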
Interval Scale Properties

• Arithmetic mean and S.D. are the common measures of central tendency and dispersion
• Product moment correlation coefficient
• Regression analysis
• t-test, F-test, etc.
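The product moment correlation coefficient and a simple least-squares regression line can be sketched from the same sums of deviations: r = Sxy / sqrt(Sxx · Syy), slope b = Sxy / Sxx and intercept a = ȳ - b·x̄. The paired data are hypothetical and the helper names are illustrative. Python 3.10+ also offers `statistics.correlation` and `statistics.linear_regression`, but the manual version makes the formulas visible.

```python
import math

def pearson_r(x, y):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(x, y):
    """Return (intercept, slope) of y = a + b*x, with b = Sxy / Sxx and a = y_bar - b * x_bar."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))              # 0.7746
intercept, slope = least_squares_line(x, y)
print(round(intercept, 2), round(slope, 2))   # 2.2 0.6
```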
Ratio Scale
• Ratio scale is an extension of interval scale.
• It includes all the properties of interval scale.
• In addition, it has an absolute (true) zero point.
• It is the highest level of measurement.
• Cost, revenue, market share, length, breadth, weight, etc. are simple examples of measurements on a ratio scale.
• We are able to say, for example:
- A 30 cm (11.8 in) tall plant is half as tall as a 60 cm (23.6 in) plant, etc.
Ratio Scale Properties
• Geometric mean & Harmonic mean can also
be used as measure of central tendency
• Coefficient of variation, skewness and kurtosis can be used as measures of variability and to determine the nature of the data
• All types of tests of significance can be used, as suited to the nature and distribution of the data
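The extra measures available on a ratio scale can be sketched with Python's standard `statistics` module. The data are hypothetical positive measurements (geometric and harmonic means require positive values), and using the population standard deviation in the coefficient of variation is an assumption; some texts use the sample standard deviation instead.

```python
import statistics

# Hypothetical ratio-scale measurements (e.g. lengths in cm)
data = [2, 4, 8]

gm = statistics.geometric_mean(data)   # (2 * 4 * 8) ** (1/3) = 4
hm = statistics.harmonic_mean(data)    # 3 / (1/2 + 1/4 + 1/8), about 3.4286
mean = statistics.mean(data)
cv = statistics.pstdev(data) / mean * 100   # coefficient of variation in %

print(f"GM = {gm:.4f}, HM = {hm:.4f}, CV = {cv:.2f}%")
```

These measures are meaningful only because a true zero exists: a value of 8 cm really is four times 2 cm.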
The following table shows which of the properties of categories, rank, equal intervals and true zero point each of the four scales possesses:

Level of Measurement | Categories | Rank | Equal Intervals | True Zero Point
Nominal | Yes | No | No | No
Ordinal | Yes | Yes | No | No
Interval | Yes | Yes | Yes | No
Ratio | Yes | Yes | Yes | Yes

Variable
• A variable is a characteristic of an item or individual. It is generally denoted by x, y, etc. The value of a variable can vary from one entity to another.
• For example: marks obtained by students in a certain exam, age, height, religion, education level, profit, etc.
• Variables can be broadly classified into two
types: (i) Qualitative Variables
(ii) Quantitative Variables
• Qualitative (categorical) variables have values that can only be placed into categories, e.g., the color of a ball (red, green, blue), gender (male, female).

• Quantitative (numerical) variables have values that represent quantities, e.g., weight, age, temperature, profit, number of people in a city, etc. They can be further classified as:
1. Discrete Variable
2. Continuous Variable
1. Discrete Variable: A variable is said to be discrete if it takes only countable values (whole numbers). For example: number of buses, number of persons, family size, etc.
2. Continuous Variable: A variable is said to be continuous if it can take all possible real values (whole numbers as well as fractional values) within a certain range. For example: heights, weights, temperature records, marks obtained by students in a certain exam, etc.
Data
• Data are the different values associated with a
variable.
• It is a collection of facts and figures to be used for
a specific purpose such as a survey or analysis.
• Data are the raw materials for final statistical conclusions; in statistics, data are the main resource.
On the basis of the collection procedure, there are mainly two types of data:
• Primary data
• Secondary data
Primary Data
• These are the data that are collected for the first time by
an investigator for a specific purpose.
• Primary data are ‘pure’ in the sense that no statistical
operations have been performed on them and they are
original. An example of primary data is the Census of
Nepal.
The source of this type of data is called a primary source.
Following are the methods of collecting primary data:
• Direct personal interview method
• Indirect oral interview method
• Information through correspondence
• Mailed questionnaire method
• Schedule sent through enumerators
Secondary Data
• Secondary data are data obtained from a source that originally collected them. In other words, they have already been collected by some researchers or investigators in the past and are available in either published or unpublished form. They are 'impure' in the sense that statistical operations may already have been performed on them.

• Published sources: e.g., the Nepal census of households and population, agricultural and business data, vital statistics, UN publications, budgets, etc.

• Unpublished sources: e.g., records maintained by schools, colleges, government offices and other institutions.
Differences between Primary data and Secondary Data
Types of Data

Data
• Categorical/Qualitative (defined categories)
– Examples: marital status, political party, eye color
• Numerical/Quantitative
– Discrete (counted items), e.g., number of children, defects per hour
– Continuous (measured characteristics), e.g., weight, voltage
• Cross-Sectional Data
Cross-sectional data refers to data collected by observing many subjects (such as individuals, firms, countries, or regions) at one point or period of time.
E.g. population of children in census year 2068 B.S.

• Time series data
Time series data are collected for the same variable over successive periods, such as months, quarters, or years. Time series data arise wherever the same measurements are recorded repeatedly over time.
E.g. population growth rate of Nepal in census years 1991, 2001 and 2011.
• Failure Time Data
Failure time data is a data set in which the outcome of interest is the time to failure or time to an event. Each subject or unit is followed up until the event occurs or the unit fails. Time-to-event data are mostly found in the biomedical sciences and clinical studies, most often in cancer studies.

• Panel Data
Panel data is a dataset in which the behaviours of entities are observed across
time. These entities could be individuals, states, companies, institutions,
countries etc. Panel data is also known as longitudinal or cross-sectional time
series data. e.g. income of persons X and Y in years 2016, 2017 and 2018
according to age and qualification.
• Spatial Data
Spatial data is any type of data that directly or indirectly references a
specific geographical area or location. Sometimes called geospatial data or
geographic information, spatial data can also numerically represent a physical
object in a geographic coordinate system
Population refers to the totality of cases (items) under investigation.
Sample is the portion of a population selected for
analysis.
Parameter
A parameter is a numerical measure that describes a characteristic of a population.
• Population mean ($\mu$), population variance ($\sigma^2$)

Statistic
A statistic is a numerical measure that describes a characteristic of a sample.
• Sample mean ($\bar{X}$), sample variance ($s^2$)
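The distinction between a parameter and a statistic can be sketched with a small hypothetical population: μ and σ² are computed from every member (variance divisor N), while X̄ and s² are computed from a randomly selected portion (sample variance divisor n - 1). The population values, sample size of 4 and seed are arbitrary choices for illustration.

```python
import random
import statistics

# Hypothetical population of eight values (illustrative only)
population = [2, 4, 4, 4, 5, 5, 7, 9]

# Parameters: computed from the whole population
mu = statistics.mean(population)           # population mean = 5
sigma2 = statistics.pvariance(population)  # population variance (divisor N) = 4

# Statistics: computed from a sample selected from the population
random.seed(1)                             # fixed seed only to make the run repeatable
sample = random.sample(population, 4)
x_bar = statistics.mean(sample)            # sample mean
s2 = statistics.variance(sample)           # sample variance (divisor n - 1)

print(f"mu = {mu}, sigma^2 = {sigma2}")
print(f"sample = {sample}, x_bar = {x_bar}, s^2 = {s2}")
```

Rerunning with a different seed changes X̄ and s² but never μ or σ², which is exactly why statistics are used to estimate parameters.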
