Introduction to Statistics
STATISTICS
B. S. PARAJULI
History and Development of Statistics
• Combining both senses of the word (numerical data and the science of handling it):
Statistics is the science that combines numerical statements of facts capable of
analysis and interpretation with the study of the principles and methods used in
collecting, presenting, analysing and interpreting numerical data in any field of
investigation.
Statistics may also be classified into two parts,
which are as follows:
• Theoretical Statistics or Mathematical Statistics
• Applied Statistics
Theoretical statistics can further be subdivided
into two parts.
• Descriptive Statistics
• Inferential Statistics
Descriptive Statistics
• Collect data
– e.g., Survey
• Present data
– e.g., Tables and graphs
• Characterize data
– e.g., Sample mean $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
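The sample-mean formula can be checked directly with the minimal sketch below. The weights are hypothetical values chosen for illustration, and `sample_mean` is an illustrative helper, not part of any library. The standard-library `statistics.mean` is used only as a cross-check.

```python
import statistics

def sample_mean(xs: list[float]) -> float:
    """Return X-bar = (sum of X_i) / n for a non-empty sample (hypothetical helper)."""
    if not xs:
        raise ValueError("sample must contain at least one observation")
    return sum(xs) / len(xs)

# Hypothetical sample of n = 10 weights in pounds (illustrative, not from the source).
weights = [118, 125, 122, 119, 130, 121, 117, 124, 126, 128]
x_bar = sample_mean(weights)
print(x_bar)  # 123.0

# Cross-check the hand-written formula against the standard library.
assert x_bar == statistics.mean(weights)
```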
Inferential Statistics
• Estimation
– e.g., Estimate the population
mean weight using the sample
mean weight
• Hypothesis testing
– e.g., Test the claim that the
population mean weight is
120 pounds
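Both inferential tasks can be sketched with a one-sample t procedure, assuming the population of weights is roughly normal. The sample values are hypothetical. The critical value 2.262 (two-sided 5% level, 9 degrees of freedom) is hard-coded because Python's standard library has no t distribution.

```python
import math
import statistics

# Hypothetical sample of n = 10 weights in pounds (illustrative, not from the source).
weights = [118, 125, 122, 119, 130, 121, 117, 124, 126, 128]
mu_0 = 120        # claimed population mean weight, H0: mu = 120
t_crit = 2.262    # two-sided 5% critical value of t with n - 1 = 9 df

n = len(weights)
x_bar = statistics.mean(weights)   # point estimate of the population mean
s = statistics.stdev(weights)      # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)              # standard error of the sample mean

# Estimation: 95% confidence interval x_bar +/- t_crit * se
ci = (x_bar - t_crit * se, x_bar + t_crit * se)

# Hypothesis testing: one-sample t statistic against the claimed mean
t_stat = (x_bar - mu_0) / se
reject_h0 = abs(t_stat) > t_crit

print(f"estimate = {x_bar}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

With these made-up numbers, t is about 2.18, which is below 2.262. The claim of 120 pounds is therefore not rejected, and 120 lies inside the interval.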
Applied Statistics
• Applied statistics deals with the application of
statistical methods to specific problems. It applies
mathematical statistics to real-world problems and
prescribes the method of analysis according to the
nature of the data or the nature of the problem.
• It is used for decision making on many practical
problems in diverse fields such as medicine,
engineering, agriculture and industry.
Functions of Statistics
• Data mining uses statistical functions to find irregularities or inconsistencies
within data.
• Data compression uses statistical algorithms to compress data.
• Network traffic modelling uses statistics to make full use of the available
bandwidth while avoiding network congestion.
• Artificial intelligence tries to simulate human thought using statistical
algorithms, such as those behind voice recognition and translation software.
• Other uses of statistics in computer science include quality management,
software engineering, storage and retrieval processes, and hardware engineering
and manufacturing.
• Statistical algorithms have become necessary in many facets of computer
programming and data mining.
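Finding irregularities in data can be illustrated with a simple z-score rule. This is one elementary technique chosen here for illustration, not a method named in the source. The readings are hypothetical, and the cut-off of 2 standard deviations is an assumed tuning choice.

```python
import statistics

# Hypothetical measurements containing one irregular value (illustrative only).
readings = [10, 12, 11, 13, 12, 11, 50]
THRESHOLD = 2.0  # assumed cut-off in standard deviations; not a universal standard

mean_r = statistics.mean(readings)
sd_r = statistics.stdev(readings)

# Flag any value lying more than THRESHOLD standard deviations from the mean.
flagged = [x for x in readings if abs(x - mean_r) / sd_r > THRESHOLD]
print(flagged)  # [50]
```

A single extreme value also inflates the standard deviation. For that reason a stricter cut-off such as 3 would miss the 50 in this small sample.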
Limitations of Statistics
Scales of Measurement
• Nominal scale
• Ordinal scale
• Interval scale
• Ratio scale
• Measurement can be distinguished on the basis of:
– Level
– Characteristics of classification
– Order
– Distance
– Origin
Properties of Scales
• Uniquely classifies (Categories)
• Preserves order (Rank)
• Equal intervals
• Natural zero (True Zero Point)
Nominal Scale
• Lowest level of measurement
• Consists of naming observations or classifying
them into various categories
• Attributes such as religion, gender, defective
or non-defective items, etc.
• Codes such as 0, 1, 2, etc. are used only to identify
categories of characteristics like religious affiliation,
gender, etc.
• Categorical data are generally measured on a
nominal scale.
Nominal Scale Properties
• The numbers assigned to one set of objects cannot be
compared with those assigned to another set.
• Computing the A.M. (arithmetic mean), S.D. (standard
deviation), product moment correlation, etc. makes no sense.
• No parametric tests can be applied.
• Association between two nominal-scale variables
can be tested using the chi-square test
or Fisher’s exact test.
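The sketch below implements the chi-square test of independence for two nominal variables. It uses the usual statistic, the sum of (O − E)² / E, with expected counts E = row total × column total / N. The 2×2 counts are invented for illustration. No Yates continuity correction is applied, and 3.841 is the 5% critical value of chi-square with 1 degree of freedom.

```python
# Hypothetical 2x2 contingency table: rows = gender, columns = preference (yes, no).
observed = [
    [30, 20],  # male:   yes, no
    [15, 35],  # female: yes, no
]
CHI2_CRIT = 3.841  # 5% critical value, (rows - 1) * (cols - 1) = 1 df

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count if the two nominal variables were independent.
        e = row_totals[i] * col_totals[j] / grand_total
        chi2 += (o - e) ** 2 / e

associated = chi2 > CHI2_CRIT
print(f"chi-square = {chi2:.3f}, associated at 5% level: {associated}")
```

Only category counts are used here. The codes themselves are never averaged, which is consistent with the nominal scale properties listed above.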
Ordinal Scale
• When observations can be ranked according to some
criterion, priority or importance, they are
said to be measured on an ordinal scale.
• Example: attitude of people towards a certain fact,
such as positive, negative and bad
• We may assign the numbers 1 for positive, 2 for
negative and 3 for bad.
• These numbers are known as ranks.
• Such a characteristic is said to be measured on an ordinal
scale.
Ordinal Scale Properties
• The median is the appropriate measure of central
tendency.
• Percentile ranks and the quartile deviation are
used as measures of dispersion.
• Rank correlation is used to measure the correlation
between two sets of ordered data.
• Only non-parametric statistical tests can be
used.
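The ordinal-scale measures can be sketched with Python's standard `statistics` module: the median, the quartile deviation QD = (Q3 − Q1) / 2, and Spearman's rank correlation ρ = 1 − 6Σd² / (n(n² − 1)), which assumes no tied ranks. The attitude codes and the judges' rankings are hypothetical. Quartile conventions differ between textbooks, and this sketch uses the module's default method.

```python
import statistics

# Hypothetical attitude codes: 1 = positive, 2 = negative, 3 = bad (ranks, not amounts).
attitudes = [1, 1, 2, 3, 1, 2, 2]
median_rank = statistics.median(attitudes)

# Quartile deviation; statistics.quantiles uses its default ("exclusive") method here.
q1, _, q3 = statistics.quantiles(attitudes, n=4)
quartile_deviation = (q3 - q1) / 2

# Spearman's rank correlation for two judges ranking the same 6 items (no ties).
judge_a = [1, 2, 3, 4, 5, 6]
judge_b = [2, 1, 4, 3, 6, 5]
n = len(judge_a)
sum_d2 = sum((a - b) ** 2 for a, b in zip(judge_a, judge_b))
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(median_rank, quartile_deviation, round(rho, 4))
```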
Interval Scale
• Some measurement scales possess a constant
interval size; they are called interval scales.
• Example: two common temperature scales,
Celsius (C) and Fahrenheit (F).
• The same difference exists between 20°C (68°F) and
25°C (77°F) as between 5°C (41°F) and 10°C (50°F).
• But we cannot say that 40°C (104°F) is twice as hot as
20°C (68°F), because the zero point is arbitrary.
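The temperature example can be verified with the conversion F = 9C/5 + 32. Because this conversion changes both the unit and the origin, equal differences survive it but ratios do not. `celsius_to_fahrenheit` is an illustrative helper name.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """F = 9C/5 + 32 (hypothetical helper name)."""
    return c * 9 / 5 + 32

# Equal intervals are preserved: a 5°C gap is a 9°F gap wherever it occurs.
gap_high = celsius_to_fahrenheit(25) - celsius_to_fahrenheit(20)  # 77 - 68
gap_low = celsius_to_fahrenheit(10) - celsius_to_fahrenheit(5)    # 50 - 41

# Ratios are not preserved, because the zero point of each scale is arbitrary.
ratio_c = 40 / 20
ratio_f = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)   # 104 / 68

print(gap_high, gap_low, ratio_c, round(ratio_f, 3))
```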
Interval Scale Properties
Level of Measurement | Categories | Rank | Equal Intervals | True Zero Point
Nominal | Yes | No | No | No
Data
• Categorical/Qualitative (defined categories)
– e.g., Marital status, political party, eye color
• Numerical/Quantitative
– Discrete (counted items), e.g., number of children, defects per hour
– Continuous (measured characteristics), e.g., weight, voltage
• Cross-Sectional Data
Cross-sectional data refers to data collected by observing
many subjects (such as individuals, firms, countries, or
regions) at one point or period of time.
e.g., the population of children in the census year 2068 B.S.
• Panel Data
Panel data is a dataset in which the behaviours of entities are observed across
time. These entities could be individuals, states, companies, institutions,
countries, etc. Panel data is also known as longitudinal or cross-sectional time
series data, e.g., the income of persons X and Y in the years 2016, 2017 and 2018,
recorded according to age and qualification.
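The relationship between panel data and cross-sectional data can be sketched as a mapping keyed by (entity, year). The income figures are invented for illustration. The age and qualification variables are omitted to keep the sketch short, and the helper names are hypothetical.

```python
# Hypothetical panel: income of persons X and Y in 2016-2018 (values invented).
# Keying each observation by (entity, year) is what makes this panel data.
panel = {
    ("X", 2016): 30000, ("X", 2017): 32000, ("X", 2018): 35000,
    ("Y", 2016): 28000, ("Y", 2017): 29000, ("Y", 2018): 31000,
}

def cross_section(panel, year):
    """Fix the time period: all entities observed in one year (cross-sectional data)."""
    return {entity: v for (entity, t), v in panel.items() if t == year}

def time_series(panel, entity):
    """Fix the entity: one entity observed across years (a time series)."""
    return {t: v for (e, t), v in panel.items() if e == entity}

print(cross_section(panel, 2017))  # {'X': 32000, 'Y': 29000}
print(time_series(panel, "Y"))     # {2016: 28000, 2017: 29000, 2018: 31000}
```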
• Spatial Data
Spatial data is any type of data that directly or indirectly references a
specific geographical area or location. Sometimes called geospatial data or
geographic information, spatial data can also numerically represent a physical
object in a geographic coordinate system.
Population refers to the totality of
cases (items) under investigation.
A sample is the portion of a population selected for
analysis.
Parameter
A parameter is a numerical measure that
describes a characteristic of a population.
e.g., population mean ($\mu$), population variance ($\sigma^2$)
Statistic
A statistic is a numerical measure that describes a
characteristic of a sample.
e.g., sample mean ($\bar{X}$), sample variance ($s^2$)
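The difference between a parameter and a statistic can be sketched with the standard `statistics` module. The population is a small hypothetical list, and the sample is drawn with a fixed seed so the draw is repeatable. `pvariance` divides by N (population), while `variance` divides by n − 1 (sample).

```python
import random
import statistics

# Hypothetical population of N = 8 values (illustrative only).
population = [2, 4, 4, 4, 5, 5, 7, 9]

# Parameters: computed from every member of the population.
mu = statistics.mean(population)           # population mean
sigma2 = statistics.pvariance(population)  # population variance, divisor N

# Statistics: computed from a sample selected from the population.
rng = random.Random(0)                     # fixed seed so the draw is repeatable
sample = rng.sample(population, k=4)       # simple random sample without replacement
x_bar = statistics.mean(sample)            # sample mean, an estimate of mu
s2 = statistics.variance(sample)           # sample variance, divisor n - 1, estimates sigma^2

print(mu, sigma2)
print(sample, x_bar, s2)
```

The parameters stay fixed for a given population. The statistics change from one sample to the next, which is why they are used as estimates.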