Statistical Characteristics of Numerical Data

This document provides an introduction to statistical characteristics of data. It discusses key concepts in statistics including sampling, measures of center, position, shape, spread, certainty, prediction, and comparison. Some specific statistical techniques mentioned include random sampling, mean, median, standard deviation, correlation, regression, and hypothesis testing. The goal of statistics is to collect, organize, describe and analyze data to find patterns and make predictions.

Introduction to Statistical Characteristics of Data

Mohammad Rahman

Contents
About Statistics
Sampling and Errors
  Sampling Types
  Sampling Errors
  Experimental Techniques
Categories of Data Analysis
  Measure of Center
  Measure of Position
  Measure of Shape
  Measure of Spread
  Measure of Certainty
  Measure of Prediction
  Measure of Comparison
  Measure of Graph
    Statistical Data Representation

About Statistics
Statistics is about collecting, organizing, describing and analyzing data. Data may be numerical or
categorical. Statistics is used to predict and to make decisions. Description is done through
statistical formulas, and analysis is the interpretation of the data, also through statistical formulas.
These concepts of description and analysis are discussed in the section Categories of Data Analysis.
Description is a static representation of the data as collected, while analysis is making a
claim/hypothesis about a population from a sample.

The purpose of describing and analyzing data is to find pattern and meaning in it. This is often
achieved through correlation and regression, among other pattern-finding mechanisms.
The pattern in the data defines the distribution of a population and of a sample.

Data is gathered through objective observation or objective experiment. Bias and
subjectivity are treated as errors that leave a study incomplete. Even in an allegedly objective
study, sampling (data gathering) error occurs. Sampling is gathering partial data from a whole group
called the population. A characteristic of a sample is called a statistic, while that of a population
is called a parameter. Examples of characteristics that can be measured include sex, height, age or
weight. These are statistical characteristics.

There are several techniques for sampling data, and several types of error and bias that can affect it.

Sampling and Errors


Collecting data from a population through sampling gives us a sample; the distribution of a statistic
(such as the mean) over many such samples is called the sampling distribution. A population can be
made up of many different subgroups from which samples are drawn. Examples include:

➢ Different ethnicities within a country's population when collecting sample statistics such as
height, IQ, weight etc.
➢ Pharmacokinetics and pharmacodynamics of a genus (population) of drugs such as
penicillin or, more broadly, antibiotics.
➢ Planetary data of a group of planets from a certain galactic system.

The universe of data is a fascinating one, with a life and nature of its own. Theorems reveal this
aspect of data. Sample size also affects which mathematical rules apply. The Central Limit Theorem is
one of many theorems that describe the mathematical behavior of data. For the mean of random samples
of size n, it tells us:

➢ The mean of the sample means is equal to the population mean.
➢ The standard deviation of the sample means (the standard error) is the population standard
deviation divided by the square root of n.
➢ If the population is normally distributed, then the sample mean will also be normally distributed.
➢ If the population is NOT normally distributed but the sample size is > 30, then the sample mean
will be approximately normally distributed.

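
The behavior described by the Central Limit Theorem can be checked with a small simulation. The sketch below is only an illustration: the skewed population, the sample size of 40 and the helper name `sample_means` are all assumptions made for this example, and samples are drawn with replacement.

```python
import random
import statistics


def sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated random samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical, deliberately non-normal (right-skewed) population.
rng = random.Random(42)
population = [rng.expovariate(1.0) for _ in range(20_000)]

mu = statistics.fmean(population)      # population mean (parameter)
sigma = statistics.pstdev(population)  # population standard deviation (parameter)

n = 40  # sample size above 30
means = sample_means(population, sample_size=n, num_samples=3_000)

mean_of_means = statistics.fmean(means)  # should be close to mu
observed_se = statistics.stdev(means)    # spread of the sample means
predicted_se = sigma / n ** 0.5          # sigma / sqrt(n)

print(f"population mean      : {mu:.3f}")
print(f"mean of sample means : {mean_of_means:.3f}")
print(f"observed std. error  : {observed_se:.3f}")
print(f"predicted std. error : {predicted_se:.3f}")
```

Even though the population is far from bell shaped, the sample means cluster around the population mean with a spread close to sigma divided by the square root of n.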
Sampling Types

The success of a statistical study depends on unbiased and objective sampling. There are many ways
of sampling. Some of them are:

➢ Random Sampling
➢ Systematic Sampling
➢ Cluster Sampling
➢ Stratified Sampling
➢ Census (every member of the population is measured, so strictly speaking this is not sampling)
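
A minimal sketch of four of these sampling methods, using only the Python standard library. The population of 100 numbered people, the region labels and all function names are hypothetical and exist only for this illustration.

```python
import random


def simple_random_sample(items, k, rng):
    """Random sampling: every member has the same chance of selection."""
    return rng.sample(items, k)


def systematic_sample(items, k, rng):
    """Systematic sampling: random start, then every step-th member."""
    step = len(items) // k
    start = rng.randrange(step)
    return [items[start + i * step] for i in range(k)]


def stratified_sample(items, key, fraction, rng):
    """Stratified sampling: sample the same fraction from every stratum."""
    strata = {}
    for item in items:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


def cluster_sample(items, key, num_clusters, rng):
    """Cluster sampling: pick whole clusters at random and keep all their members."""
    clusters = {}
    for item in items:
        clusters.setdefault(key(item), []).append(item)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [item for c in chosen for item in clusters[c]]


rng = random.Random(1)
# Hypothetical population: (id, region); 25 people in "north", 75 in "south".
people = [(i, "north" if i % 4 == 0 else "south") for i in range(100)]

srs = simple_random_sample(people, 10, rng)
sys_s = systematic_sample(people, 10, rng)
strat = stratified_sample(people, key=lambda p: p[1], fraction=0.2, rng=rng)
clus = cluster_sample(people, key=lambda p: p[0] // 10, num_clusters=2, rng=rng)

print(len(srs), len(sys_s), len(strat), len(clus))
```

Stratified sampling keeps the 25/75 split of the regions in the sample, which a simple random sample only does on average.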

Sampling Errors

Sampling errors can take many forms. Some of them are:

➢ Sampling Bias
➢ Selection Bias
➢ Self-Selected Samples
➢ Non-Response Errors
➢ Response Errors
➢ Interview Errors/Data Errors
➢ Errors in Analysis

Experimental Techniques

In an observational study, such as sampling a population for voter behavior or food preferences, the
data is collected by a person or group going out into the real world as it exists. This is different
from an experimental study, where an experiment is designed to study the sample. There are a few
techniques:

➢ Completely Randomized Design
➢ Randomized Block Design
➢ Matched Pairs
➢ Control Groups vs Placebo
➢ Double Blind

Categories of Data Analysis


Statistics measure certain attributes of data. These attributes can be divided into the following
categories:

Measure of Center

➢ Mean
➢ Mode
➢ Median
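
As a small illustration, the three measures of center can be computed with Python's standard `statistics` module. The list of heights is made-up data for this example.

```python
import statistics

# Hypothetical sample of heights in centimetres.
heights_cm = [160, 165, 165, 170, 172, 175, 190]

mean = statistics.mean(heights_cm)      # arithmetic average
median = statistics.median(heights_cm)  # middle value of the sorted data
mode = statistics.mode(heights_cm)      # most frequent value

print(mean, median, mode)
```

The mean is pulled toward the one tall value of 190, while the median and mode are not; this is why the median is often preferred for skewed data.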

Measure of Position

➢ Percentiles
➢ Quartiles
➢ Z-Scores: Measure how many standard deviations a data value lies from the mean
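
The measures of position can be sketched as follows. The scores are hypothetical, the "inclusive" quantile method is one of several conventions (other textbooks and tools may give slightly different quartiles), and the z-score here assumes the data is the whole population.

```python
import statistics

# Hypothetical test scores, already sorted.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95]

# Quartiles: the three cut points that split the data into four equal parts.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

# Percentiles: 99 cut points; index 89 is the 90th percentile.
percentiles = statistics.quantiles(scores, n=100, method="inclusive")
p90 = percentiles[89]

# Z-score: (value - mean) / standard deviation.
mean = statistics.fmean(scores)
sd = statistics.pstdev(scores)  # assumption: scores are the full population
z_of_95 = (95 - mean) / sd

print(q1, q2, q3, p90, round(z_of_95, 3))
```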

Measure of Shape

Everything that exists, whether God's creations or human inventions, produces data. The data follow
a distribution pattern. A distribution describes how the values of a characteristic are spread across
the data. Examples such as the height of males or females, data about the pharmacokinetics and
pharmacodynamics of drugs, data about planetary movements, or human blood data will each show a
certain curve on a graph representing how the data is distributed. Distributions are of the following
types:

➢ Normal Distribution
➢ Skewed Distribution
➢ Uniform Distribution
➢ Examples of distributions: z, t, F, chi-square and many more!

The shape of a distribution can be determined by its mean, standard deviation or degrees of freedom.
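
One way to put a number on shape is the skewness coefficient, which is near zero for symmetric data and positive for a long right tail. Skewness is not named in the text above; it is brought in here only as an illustrative assumption, and the three simulated data sets are hypothetical.

```python
import random


def skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(3)
normal_data = [rng.gauss(0, 1) for _ in range(20_000)]       # bell shaped
skewed_data = [rng.expovariate(1.0) for _ in range(20_000)]  # long right tail
uniform_data = [rng.uniform(0, 1) for _ in range(20_000)]    # flat

skew_normal = skewness(normal_data)
skew_skewed = skewness(skewed_data)
skew_uniform = skewness(uniform_data)

print(f"normal : {skew_normal:+.2f}")
print(f"skewed : {skew_skewed:+.2f}")
print(f"uniform: {skew_uniform:+.2f}")
```

Note that the normal and uniform samples both come out close to zero: skewness separates symmetric from lopsided data, but a histogram is still needed to tell a bell shape from a flat one.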

Measure of Spread

➢ Range
➢ IQR
➢ Standard Deviation (SD) & Empirical Rule for Data Spread in Bell Shaped Curve: SD measures
the typical distance of the data values from the mean.
➢ Variance & Coefficient of Variation
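
A short sketch of these measures of spread. The small data set and the simulated bell-shaped data are hypothetical; the sample (n - 1) formulas are used for variance and SD, and the "inclusive" quartile convention is an assumption.

```python
import random
import statistics

# Hypothetical small sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                                # interquartile range
variance = statistics.variance(data)         # sample variance (divides by n - 1)
sd = statistics.stdev(data)                  # sample standard deviation
coeff_of_variation = sd / statistics.fmean(data)

# Empirical rule on simulated bell-shaped data: about 68% of values fall
# within 1 SD of the mean and about 95% within 2 SD.
rng = random.Random(5)
bell = [rng.gauss(50, 10) for _ in range(10_000)]
m, s = statistics.fmean(bell), statistics.pstdev(bell)
within_1sd = sum(abs(x - m) <= s for x in bell) / len(bell)
within_2sd = sum(abs(x - m) <= 2 * s for x in bell) / len(bell)

print(data_range, iqr, round(variance, 3), round(sd, 3), round(coeff_of_variation, 3))
print(round(within_1sd, 3), round(within_2sd, 3))
```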

Measure of Certainty

➢ Margin of Error: Largest expected distance, at a given level of confidence, between the point
estimate (a single-value estimate of a population parameter) and the true population mean
➢ Interval Estimate: Range of values that is likely to contain the population mean
➢ Level of Confidence & Level of Significance: Level of confidence is how certain we are that the
interval estimate contains the mean; level of significance is 1 minus the level of confidence
➢ Confidence Interval: Interval estimate that goes with a Level of Confidence

o Sample size < 30 and mean is involved
▪ Use t statistic
o Sample size > 30 and mean is involved
▪ Use normal distribution Z score
o If proportion is involved
▪ Use normal distribution Z score
o If SD and Variance involved
▪ Use chi-square
o A decision can be based either on the P value or on a rejection region; both lead to the same
conclusion

➢ Hypothesis Testing & Statistical Significance
o Null Hypothesis
o Alternative Hypothesis
o Type I and Type II Errors
➢ Test Statistics

(Smith, n.d.)
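
The large-sample case (sample size above 30, mean involved) can be sketched with the standard library's `NormalDist`. This is a minimal illustration under stated assumptions: the sample is simulated, the function names are hypothetical, and the Z score is used with the sample SD standing in for the population SD. For small samples a t critical value from a table or a statistics library would be needed instead.

```python
import random
from statistics import NormalDist, fmean, stdev


def z_confidence_interval(sample, confidence=0.95):
    """Return (low, high, margin_of_error) for the population mean."""
    n = len(sample)
    point_estimate = fmean(sample)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * stdev(sample) / n ** 0.5
    return point_estimate - margin, point_estimate + margin, margin


def z_test(sample, null_mean):
    """Two-sided one-sample Z test; returns (test statistic, P value)."""
    n = len(sample)
    z = (fmean(sample) - null_mean) / (stdev(sample) / n ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


rng = random.Random(7)
sample = [rng.gauss(100, 15) for _ in range(50)]  # hypothetical measurements

low, high, margin = z_confidence_interval(sample, confidence=0.95)

alpha = 0.05                                  # level of significance
z, p_value = z_test(sample, null_mean=100)    # null hypothesis: mean is 100
reject_by_p_value = p_value < alpha
reject_by_region = abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

print(f"95% CI: ({low:.2f}, {high:.2f}), margin of error {margin:.2f}")
print(f"z = {z:.3f}, P value = {p_value:.3f}, reject null: {reject_by_p_value}")
```

Comparing the P value with alpha and checking whether the test statistic falls in the rejection region are two routes to the same decision.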

Measure of Prediction

➢ Correlation and Regression
o Coefficient of Correlation
o Coefficient of Determination
o Residual and Sum of Squared Errors
o Standard Error of Estimate
o Prediction Interval
➢ Markov Chain
o Future State = Transition (Probability) Matrix * Current State
o States are also matrices (vectors) representing probabilities
o Transition Matrix
o Stable (Steady-State) Matrix

(Smith, n.d.)
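
The Markov chain rule listed above, future state = transition matrix * current state, can be sketched in a few lines. The two-state weather model and its probabilities are hypothetical, and the matrix is written so that each column holds the probabilities of leaving one state (a column-stochastic convention, which is an assumption of this example).

```python
def step(transition, state):
    """Future state = transition matrix * current state (matrix-vector product)."""
    return [
        sum(row[j] * state[j] for j in range(len(state)))
        for row in transition
    ]


def steady_state(transition, state, tol=1e-12, max_steps=10_000):
    """Apply the transition repeatedly until the state stops changing."""
    for _ in range(max_steps):
        nxt = step(transition, state)
        if max(abs(a - b) for a, b in zip(nxt, state)) < tol:
            return nxt
        state = nxt
    return state


# Hypothetical weather model. Column 0: from sunny, column 1: from rainy.
transition = [
    [0.9, 0.5],  # probability of moving TO sunny
    [0.1, 0.5],  # probability of moving TO rainy
]
current = [1.0, 0.0]  # today is certainly sunny

after_one = step(transition, current)
stable = steady_state(transition, current)

print(after_one)
print([round(p, 4) for p in stable])
```

The stable state no longer changes when the transition matrix is applied to it; for this matrix it is 5/6 sunny and 1/6 rainy regardless of the starting state.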

Statistics alone cannot determine causality, because its predictions rest on probabilistic
significance. A causal relationship works the same way for every data set of the same category, no
matter how much data is tested. A statistical correlation, however, will not hold for all data of the
same category; there will be some data for which the correlation does not work. Hence causality is
permanent and repeats in the same way for any given data, whereas correlation is temporary and
repeats only up to a certain point. All correlated variables need to be identified so that we do not
miss any lurking variables. We may also need measurements beyond the range of the data used to fit the
linear regression model, because predictions outside that range are unreliable.
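
A minimal least-squares sketch of the regression quantities named in this section, written out by hand so each formula is visible. The study-hours data and the helper names `fit_line` and `predict` are hypothetical; the prediction helper simply flags values outside the observed range of x, where the fitted line is being extrapolated.

```python
def fit_line(xs, ys):
    """Simple linear regression y = intercept + slope * x by least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5                     # coefficient of correlation
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)             # sum of squared errors
    return {
        "slope": slope,
        "intercept": intercept,
        "r": r,
        "r_squared": r ** 2,                         # coefficient of determination
        "sse": sse,
        "std_error": (sse / (n - 2)) ** 0.5,         # standard error of estimate
    }


def predict(model, x, x_min, x_max):
    """Return (prediction, inside_observed_range)."""
    return model["intercept"] + model["slope"] * x, x_min <= x <= x_max


# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

model = fit_line(hours, scores)
inside_pred, inside_ok = predict(model, 3, min(hours), max(hours))
outside_pred, outside_ok = predict(model, 10, min(hours), max(hours))

print({k: round(v, 4) for k, v in model.items()})
print(round(inside_pred, 2), inside_ok)
print(round(outside_pred, 2), outside_ok)
```

A high coefficient of determination on these five points says nothing about what happens at 10 hours; the second prediction is flagged as outside the observed range for that reason.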

Measure of Comparison

➢ Analysis and comparison of multiple sample statistics and population parameters
o ANOVA
▪ Grand Mean
▪ Sum of Squares (Populations)
▪ Sum of Squares of Error
▪ Mean Square (Populations)
▪ Total Variation
o Multiple Means
o Multiple Proportions
o Multiple Variances
o Dependent & Independent Samples

(Smith, n.d.)
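
The ANOVA quantities listed above can be computed directly. This is a sketch of a one-way ANOVA on three small hypothetical groups; the function name and data are assumptions for illustration, and the resulting F statistic would still have to be compared with a critical value from an F table at the chosen level of significance.

```python
def one_way_anova(groups):
    """Compute the basic one-way ANOVA quantities for a dict of samples."""
    all_values = [v for g in groups.values() for v in g]
    grand_mean = sum(all_values) / len(all_values)

    # Variation between the group means (sum of squares, populations).
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation inside each group (sum of squares of error).
    ss_error = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )

    df_between = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_between = ss_between / df_between  # mean square (populations)
    ms_error = ss_error / df_error        # mean square of error

    return {
        "grand_mean": grand_mean,
        "ss_between": ss_between,
        "ss_error": ss_error,
        "total_variation": ss_between + ss_error,
        "ms_between": ms_between,
        "ms_error": ms_error,
        "f_statistic": ms_between / ms_error,
    }


# Hypothetical samples from three populations.
groups = {"A": [4, 5, 6], "B": [6, 7, 8], "C": [8, 9, 10]}
result = one_way_anova(groups)

for name, value in result.items():
    print(f"{name:16s} {value:.2f}")
```

A large F statistic means the group means differ by more than the variation inside the groups would explain.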

Measure of Graph

These attributes are the measurements for the X and Y axes. You use them to plot a statistical
graph. They are:

➢ Frequency Distributions
➢ Ranged Data Types
➢ Probability Distribution: Mean and SD
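
As an illustration, a frequency distribution groups raw values into classes, and a probability distribution has its own mean and SD. Both are sketched below with made-up data; the ages, the class width of 10 and the two-coin example are assumptions for this example only.

```python
from collections import Counter


def frequency_table(values, start, width):
    """Group integer values into classes; return (lower, upper, count) rows."""
    counts = Counter((v - start) // width for v in values)
    return [
        (start + b * width, start + (b + 1) * width - 1, counts[b])
        for b in sorted(counts)
    ]


# Hypothetical ages grouped into classes of width 10 starting at 20.
ages = [21, 25, 29, 31, 34, 35, 38, 42, 47, 53]
table = frequency_table(ages, start=20, width=10)
for lower, upper, count in table:
    print(f"{lower}-{upper}: {count}")

# Probability distribution: number of heads in two fair coin tosses.
distribution = {0: 0.25, 1: 0.50, 2: 0.25}
dist_mean = sum(x * p for x, p in distribution.items())
dist_sd = sum((x - dist_mean) ** 2 * p for x, p in distribution.items()) ** 0.5

print(dist_mean, round(dist_sd, 4))
```

The class limits go on the X axis and the counts (or probabilities) on the Y axis when the table is drawn as a histogram or bar chart.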

Statistical Data Representation

Some major types of graphs:

➢ Histogram
➢ Pie Chart
➢ Bar Chart
➢ Pareto Chart
➢ Stem and Leaf Diagram
➢ Scatter Plot
