Statistical Characteristics of Numerical Data
Mohammad Rahman
Contents
About Statistics
Sampling and Errors
Sampling Types
Sampling Errors
Experimental Techniques
Categories of Data Analysis
Measure of Center
Measure of Position
Measure of Shape
Measure of Spread
Measure of Certainty
Measure of Prediction
Measure of Comparison
Measure of Graph
Statistical Data Representation
About Statistics
Statistics is about collecting, organizing, describing, and analyzing data, where data means any
set of numerical or categorical observations. Statistics is used to make predictions and decisions.
Description summarizes the data as collected, using statistical formulas. Analysis interprets the
data, also through statistical formulas, in order to make a claim (hypothesis) about a population
from a sample. Both concepts are discussed in the section Categories of Data Analysis.
The purpose of describing and analyzing data is to find pattern and meaning in it. This is often
achieved through correlation, regression, and similar pattern-finding methods. The pattern in
the data defines the distribution of a population or sample.
Data is gathered through objective observation or objective experiment. Bias and subjectivity
count as errors and make a study incomplete. Even in an allegedly objective study, sampling
(data gathering) errors occur. Sampling means gathering partial data from a whole group
called the population. A characteristic of a sample is called a statistic, while a characteristic of
a population is called a parameter. Examples of the characteristics measured include sex, height,
age, or weight. These are statistical characteristics. Examples of populations and their
characteristics:
➢ The different ethnicities in a country's population, when collecting sample statistics such as
height, IQ, weight etc.
➢ The pharmacokinetics and pharmacodynamics of a class (population) of drugs, such as
penicillins or, more broadly, antibiotics.
➢ Planetary data of a group of planets from a certain galactic system.
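The universe of data is a fascinating one, with a life and nature of its own. Theorems reveal this
aspect of data. The size of the data also affects which mathematical rules apply. The Central Limit
Theorem is one of many theorems that describe the mathematical behavior of data. It states that
the means of sufficiently large random samples are approximately normally distributed, whatever
the shape of the population's distribution (provided the population has finite variance).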
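A small simulation can make the Central Limit Theorem concrete. The sketch below is illustrative only: the skewed population is synthetic, and the function name sample_means, the sample size of 50, and the 2,000 repetitions are assumptions chosen for the example, not values from this document.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples (with replacement) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]

# A deliberately skewed (exponential) population; the values are synthetic.
rng = random.Random(42)
population = [rng.expovariate(1.0) for _ in range(10_000)]

means = sample_means(population, sample_size=50, n_samples=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# The theorem predicts that the sample means centre on the population mean
# and that their spread is close to pop_sd / sqrt(sample_size).
print("population mean:     ", round(pop_mean, 3))
print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:  ", round(statistics.stdev(means), 3))
print("pop_sd / sqrt(50):   ", round(pop_sd / 50 ** 0.5, 3))
```

Although the population itself is strongly skewed, a histogram of means would look roughly bell shaped, which is the behavior the theorem describes.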
Sampling and Errors
The success of a statistical study depends on unbiased and objective sampling. There are many
ways of sampling. Some of them are:
➢ Random Sampling
➢ Systematic Sampling
➢ Cluster Sampling
➢ Stratified Sampling
➢ Census
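The sampling methods listed above can be sketched in a few lines each. This is a minimal illustration under stated assumptions: the population is a list of hypothetical member IDs, the five groups are invented, and the helper names are not from any library. A census is not shown because it simply uses the whole population.

```python
import random

def simple_random_sample(population, k, rng):
    """Random sampling: every member has the same chance of selection."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Systematic sampling: random start, then every step-th member."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, k_per_stratum, rng):
    """Stratified sampling: a random sample from every subgroup (stratum)."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Cluster sampling: pick whole clusters at random, keep all their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(1)
population = list(range(100))  # hypothetical member IDs 0..99
groups = {f"group_{i}": population[i * 20:(i + 1) * 20] for i in range(5)}

print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, 20, rng))
print(stratified_sample(groups, 2, rng))
print(len(cluster_sample(groups, 2, rng)), "members from 2 whole clusters")
```

The same dictionary of groups serves as strata in one call and as clusters in another; the difference lies in whether a few members are drawn from every group or every member is taken from a few groups.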
Sampling Errors
➢ Sampling Bias
➢ Selection Bias
➢ Self-Selected Samples
➢ Non-Response Errors
➢ Response Errors
➢ Interview Errors/Data Errors
➢ Errors in Analysis
Experimental Techniques
In an observational study, such as sampling a population for voter behavior or food preferences,
a person or group collects the data by going out into the real world as it exists. This differs from
an experimental study, where an experiment is designed to study the sample. There are a few techniques:
Categories of Data Analysis
Measure of Center
➢ Mean
➢ Mode
➢ Median
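As a brief illustration, the three measures of center can be computed with Python's standard statistics module. The heights below are made-up values chosen only so that the three measures differ.

```python
import statistics

# Hypothetical heights in cm; one unusually tall value pulls the mean upward.
heights_cm = [160, 165, 165, 170, 172, 175, 190]

mean = statistics.mean(heights_cm)      # arithmetic average
median = statistics.median(heights_cm)  # middle value of the sorted data
mode = statistics.mode(heights_cm)      # most frequent value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

The median is less affected by the single extreme value than the mean, which is one reason to report more than one measure of center.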
Measure of Position
➢ Percentiles
➢ Quartiles
➢ Z-Scores: Measure how many standard deviations a data value lies from the mean
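The measures of position can be sketched as follows. The scores are invented, the helper z_score is a hypothetical name, and statistics.quantiles is used with its default method; other software may use a slightly different quartile convention and return slightly different cut points.

```python
import statistics

# Hypothetical test scores, already sorted for readability.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95]

# Quartiles: three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# Percentiles: 99 cut points; index 89 is the 90th percentile.
percentiles = statistics.quantiles(scores, n=100)
p90 = percentiles[89]

def z_score(value, data):
    """Number of sample standard deviations that value lies from the mean."""
    return (value - statistics.fmean(data)) / statistics.stdev(data)

print("quartiles:", q1, q2, q3)
print("90th percentile:", p90)
print("z-score of 95:", round(z_score(95, scores), 3))
```

A positive z-score means the value lies above the mean, a negative one below it, and a z-score of 0 means the value equals the mean.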
Measure of Shape
Existence consists of data about God’s creations or human inventions. The data follow a
distribution pattern. A distribution describes how the values of a sample characteristic are spread
across the data. Examples such as the heights of males or females, the pharmacokinetics and
pharmacodynamics of drugs, planetary movements, or human blood data will each show a certain
curve on a graph representing how the data is distributed. Distributions are of the following types:
➢ Normal Distribution
➢ Skewed Distribution
➢ Uniform Distribution
➢ Examples of distributions: z, t, F, chi-square and many more!
Measure of Spread
➢ Range
➢ IQR
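➢ Standard Deviation (SD) & Empirical Rule for Data Spread in Bell Shaped Curve: SD measures
the typical distance of the data values from the mean value. Under the empirical rule, about 68%,
95% and 99.7% of bell-shaped data fall within 1, 2 and 3 SDs of the mean.
➢ Variance & Coefficient of Variation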
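The measures of spread listed above can be computed as in the following sketch. The data values are invented, and the population formulas (pvariance, pstdev) are used here only because they give round numbers for this data; for a sample, statistics.variance and statistics.stdev would be the usual choice.

```python
import statistics

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)          # range: largest minus smallest

q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                               # interquartile range

variance = statistics.pvariance(data)       # mean squared distance from the mean
sd = statistics.pstdev(data)                # square root of the variance
cv = sd / statistics.fmean(data)            # coefficient of variation (unitless)

print("range:", data_range)
print("IQR:", iqr)
print("variance:", variance)
print("SD:", sd)
print("coefficient of variation:", cv)
```

Because the coefficient of variation divides the SD by the mean, it allows the spread of data sets measured in different units to be compared.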
Measure of Certainty
➢ Margin of Error: Largest expected distance between the point estimate (a single-value estimate
of a population parameter) and the population mean, at a given level of confidence
➢ Interval Estimate: Range of values expected to contain the population mean
➢ Level of Confidence & Level of Significance: How certain we are that the interval estimate
contains the mean; the level of significance is 1 minus the level of confidence
➢ Confidence Interval: Interval estimate associated with a given Level of Confidence
(Smith, n.d.)
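The relationship between these measures of certainty can be sketched numerically. The example assumes a reasonably large sample, so a critical value from the normal distribution is used; for small samples a t-distribution would normally replace it. The sample itself is synthetic.

```python
import math
import random
import statistics

# Synthetic sample: 100 hypothetical heights in cm (not real data).
rng = random.Random(7)
sample = [rng.gauss(170, 10) for _ in range(100)]

confidence = 0.95          # level of confidence
alpha = 1 - confidence     # level of significance

# Point estimate of the population mean and its standard error.
point_estimate = statistics.fmean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Critical value from the standard normal distribution (about 1.96 for 95%).
z = statistics.NormalDist().inv_cdf(1 - alpha / 2)

margin_of_error = z * standard_error
interval = (point_estimate - margin_of_error, point_estimate + margin_of_error)

print("point estimate:", round(point_estimate, 2))
print("margin of error:", round(margin_of_error, 2))
print("95% confidence interval:", tuple(round(v, 2) for v in interval))
```

Raising the level of confidence widens the interval, while a larger sample narrows it.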
Measure of Prediction
(Smith, n.d.)
Statistics cannot determine causality, because its predictions rest on probabilistic significance. Causality
always works in the same way for a given category of data, no matter how much data is tested.
Statistical correlation, however, does not hold for all data of the same category: there will be certain data
for which the correlation fails. Hence causality is permanent and repeats in the same way for
any given data, whereas correlation is temporary and repeats only up to a certain point. All correlated
variables need to be identified so that we do not miss any lurking variables. We may also need
measurements beyond the maximum boundaries of the data used to fit the linear regression model,
since predictions outside that range are unreliable.
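A minimal sketch of correlation and linear regression follows, written with plain Python so that each formula is visible. The data points and the helper names fit_line and predict are assumptions for illustration. The predict helper also flags requests that fall outside the observed range of x, where the fitted line has no data to support it.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r

def predict(x_new, xs, slope, intercept):
    """Predict y and report whether x_new lies outside the observed x range."""
    is_extrapolation = not (min(xs) <= x_new <= max(xs))
    return slope * x_new + intercept, is_extrapolation

# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept, r = fit_line(xs, ys)
inside = predict(3, xs, slope, intercept)
outside = predict(10, xs, slope, intercept)

print("slope:", round(slope, 3), "intercept:", round(intercept, 3), "r:", round(r, 3))
print("prediction at x=3:", inside)
print("prediction at x=10 (extrapolated):", outside)
```

A correlation coefficient close to 1 or -1 indicates a strong linear association in this data, but by itself it says nothing about cause.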
Measure of Comparison
➢ Multiple Means
➢ Multiple Proportions
➢ Multiple Variances
➢ Dependent & Independent Samples
(Smith, n.d.)
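One way to illustrate the difference between independent and dependent samples when comparing means is to compute a t statistic for each case. The sketch below is an assumption-laden example: the data are invented, the helper names are hypothetical, and only the test statistic is computed (turning it into a p-value requires a t-distribution, which is outside this sketch).

```python
import math
import statistics

def welch_t(a, b):
    """t statistic for two independent samples (unequal variances allowed)."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se

def paired_t(before, after):
    """t statistic for dependent (paired) samples, based on the differences."""
    diffs = [y - x for x, y in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / se

# Independent samples: two unrelated groups (hypothetical measurements).
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.5, 4.3, 4.8, 4.6]

# Dependent samples: the same five subjects measured twice (hypothetical).
before = [72, 75, 70, 68, 74]
after = [70, 72, 69, 65, 73]

print("independent samples t:", round(welch_t(group_a, group_b), 3))
print("paired samples t:", round(paired_t(before, after), 3))
```

The paired version works on the per-subject differences, so variation between subjects does not inflate the standard error.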
Measure of Graph
These attributes are the measurements for the X and Y axes. You use them to plot a statistical
graph. They are:
➢ Frequency Distributions
➢ Ranged Data Types
➢ Probability Distribution: Mean and SD
➢ Histogram
➢ Pie Chart
➢ Bar Chart
➢ Pareto Chart
➢ Stem and Leaf Diagram
➢ Scatter Plot
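As a small illustration of how a frequency distribution over ranged data feeds a graph, the sketch below groups values into class intervals and prints a text histogram. The scores, the class width of 10, and the function name frequency_distribution are assumptions for the example.

```python
from collections import Counter

def frequency_distribution(values, width):
    """Count how many values fall in each class interval of the given width.

    Keys are the lower bounds of the intervals, in ascending order.
    """
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical test scores.
scores = [52, 55, 61, 64, 67, 68, 70, 71, 73, 75, 78, 82, 85, 91, 97]

freq = frequency_distribution(scores, width=10)

# Text histogram: class interval on one axis, frequency on the other.
for lower, count in freq.items():
    print(f"{lower}-{lower + 9}: {'#' * count} ({count})")
```

The class intervals supply the X axis and the frequencies supply the Y axis, which is the same information a plotted histogram or bar chart would show.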