
Statistics

The document provides information about research services offered at UC Research & Innovation including research consultation, statistical consultation, and geographic information system (GIS) consultation. It lists the vice president, director, secretary, and research assistants of the UC R&I office and their contact information. It also offers a brief overview of the definition, importance, history, concepts, and types of statistics used in research.

UC Research & Innovation Research Services Office
Ariel Nimo B. Pumecha - UC R&I Vice President
Nathaniel Vincent A. Lubrica - RSO Director
Ruben I. Rubia - RSO Secretary
Judale W. Quianio - RSO Research Assistant
Carlo Jay S. Valdez - RSO Research Assistant

Services Offered
Research Consultation
Statistical Consultation
Geographic Information System (GIS) Consultation
Research Ethics Review
Tel. No. (074) 442-6268 local 195
Email address: [email protected]

Definition
Originally the word “statistics” was used for the collection of data concerning states, both historical and descriptive. Now it has acquired a much wider meaning and is used for all types of data and methods for the analysis of the data.
In general, Statistics is a systematic collection of numerical facts. In a simple and comprehensive meaning, in the singular sense, Statistics is employed for the purpose of data collection, classification, presentation, comparison and interpretation of data.
The purpose is to make the data simple, lucid and easy to understand.
Importance
• It simplifies a mass of data (condensation).
• Helps to get concrete information about any problem.
• Helps in reliable and objective decision making.
• It presents facts in a precise & definite form.
• To establish empirical evidence.
• To understand the behavior of various variables such as enrollment, patients, widows, household consumption expenditure, etc.
History
Early Beginnings
First Ancient Olympics: 400 BC-780 BC
Peloponnesian War: 431 BC
First Recorded Census: AD 2
Earliest known Graph: 10th Century

History
Early Modern Period (1500-1800)
Gambling-Probabilities: 1500's
Invention of the word “Statistik” (in German): 1749 by Gottfried Achenwall
Use of the word “Statistics” (in English): 1791 by Sir John Sinclair
History
19th to Present
WWI: 1916
1st Programmable Electronic Computer: 1940-45
Automobile & Electronics: 1950's
BIG DATA: 1997

Some basic concepts

Variables
When we are measuring height or weight, these can be seen as variables. The reason is that their measurement can vary from time to time.
Constants
When we deal with a quantity or value that does not change, it is referred to as a constant, for example the speed of light.
Some basic concepts
Variables
Important terms regarding variables:

Independent Variable
A variable thought to be the cause of some effect.
Predictor Variable
A variable thought to predict an outcome – another term for independent variable.

Dependent Variable
A variable thought to be affected by changes in the independent variable.
Outcome Variable
A variable thought to change as a function of changes in a predictor variable – synonymous with dependent variable.
Some basic concepts
Variables
Continuous VS Discrete Variables

Continuous Variable
Can take any value in a defined range. Also referred to as Measurable Variables.
Examples are weight or height.

Discrete Variable
These variables can only take certain values. Discrete Variables are also known as Categorical Variables.
Example: in a race, 1st, 2nd and 3rd place can be awarded, not 2.5 or 3.25; or, when coding two victims as 1 for Victim A and 2 for Victim B, there isn't a 1.5 category.
Some basic concepts

Level of Measurement
• Nominal
Indicate that there is a difference between categories of objects, persons or characteristics – numbers are used here as labels.
– Example:
• Gender (1 = Male, 2 = Female)
• Psychopathology (1 = Schizophrenic, 2 = Manic Depressive, 3 = Neurotic)
• Ordinal
Variables indicate categories that are both different from each other and ranked in terms of the attribute.
– Example:
• Race winners 1st, 2nd and 3rd
Some basic concepts

Level of Measurement
• Interval
These variables are true quantitative measures – the difference/distance between any 2 scores is an accurate reflection of the difference in the amount of an attribute that the two objects have.
– Example:
• Temperature, IQ scores, scores of attitude, knowledge tests, calendar years, income
• Ratio
These variables have all the properties of Interval scales, but they also have a true zero value, so ratios between values are meaningful.
– Example:
• Exam marks – 0%-100%
• Age – a 40 year old person is twice the age of a 20 year old
• Time, length and weight are other examples

Some basic concepts

Samples and Populations

Population
An entire collection of elements or individuals.
Sample
A portion of the population.
STATISTICS
The two major branches of statistics are descriptive statistics and inferential statistics. (Colman, 2001)
STATISTICS
Descriptive Statistics
• Summaries of numerical data that make them more easily interpretable, including especially the measures of central tendency (mean, median, mode), frequency, variance, standard deviation, range, standard error of the mean, kurtosis and skewness of a set of scores.
STATISTICS
Inferential Statistics:
• Techniques for inferring conclusions about populations on the basis of data from samples. The major objective is usually to decide whether the results of the research are statistically significant.
• There are 2 routes one can take with regards to inferential statistics: Parametric and Non-Parametric Statistics
STATISTICS
Parametric Statistics
• Most statistical techniques are based on this
• There are also a couple of assumptions that need to be adhered to:
– Normally Distributed Data
– Homogeneity of Variance
– Interval Data
– Independence
STATISTICS
Non-Parametric Statistics
• Do not have stringent requirements and do not make assumptions about the underlying population distribution
• Disadvantage:
– Less sensitive than the parametric statistics and may fail to detect differences between groups that actually do exist
Recommended: Always try to use Parametric Stats – but Non-Parametric Stats can be used for Nominal and Ordinal data and also when you have a small sample.

Descriptive Statistics
There are a number of uses of descriptive stats:
– Describing the characteristics of the sample
– Checking your variables for any violation of the assumptions underlying the statistical techniques
– To address specific research questions (Pallant, 2007)
Descriptive Statistics
• To get descriptive statistics for categorical variables (males – females), frequencies should be used; this will tell you how many gave a response in these categories.
• To get descriptive statistics for continuous variables (age), it is better to use descriptive analysis, which will then provide a summary of the variables (mean, median and the mode)
Descriptive Statistics

Univariate analysis:
– Involves the analysis of one variable across cases, one variable at a time
– 3 major characteristics
• Distribution
• Central Tendency
• Dispersion
Descriptive Statistics – Univariate Analysis
• Distribution
– The first step in turning data into information is to create a distribution.
– The most primitive way to present a distribution is to simply list, in one column, each value that occurs in the population and, in the next column, the number of times it occurs.
– This simple listing is called a frequency distribution. A more elegant way to turn data into information is to draw a graph of the distribution.

[Figure: Frequency Distribution Table; axis: Frequency (count of occurrence)]
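The two-column listing described for a frequency distribution can be sketched in a few lines of Python. The scores below are hypothetical illustration data, not values from the slides, and `collections.Counter` is just one convenient way to do the counting.

```python
from collections import Counter

# Hypothetical scores used only for illustration (not from the slides).
scores = [3, 1, 2, 3, 3, 2, 5, 4, 3, 2]

# Count how many times each distinct value occurs.
frequency = Counter(scores)

# Print the two-column listing: each value, then the number of times it occurs.
print("Value  Frequency")
for value in sorted(frequency):
    print(f"{value:<6} {frequency[value]}")
```

The same counts could then be passed to any plotting tool to draw the graph of the distribution.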

Descriptive Statistics – Univariate Analysis
• Measures of Central Tendency
– Mean (Average): the most commonly used method of describing the central tendency – sum the values and divide by the number of instances.
– Median (middle score): the score found in the exact middle
– 1, 2, 3, 4, 5: median is 3
– Mode (frequently occurring): the most frequently occurring score – with the scores arranged in order, count the scores; the most frequently occurring score is the mode.
– 1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6: mode is 4
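As a quick check of the three measures, the sketch below applies Python's standard `statistics` module to the two score lists used in the examples above. The variable names are assumptions chosen for illustration.

```python
import statistics

# Score lists taken from the median and mode examples above.
median_example = [1, 2, 3, 4, 5]
mode_example = [1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6]

# Mean: sum of the values divided by the number of values.
mean_value = statistics.mean(median_example)

# Median: the middle score once the values are sorted.
median_value = statistics.median(median_example)

# Mode: the most frequently occurring score.
mode_value = statistics.mode(mode_example)

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```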
Descriptive Statistics – Univariate Analysis
• Measures of Spread/Dispersion
Refer to the spread of values around the central tendency.
– Range – take the highest value and subtract the lowest value from it – 42 - 10 = 32
– Standard Deviation – is a more accurate and detailed estimate of dispersion, because an outlier can greatly amplify the range
Descriptive Statistics – Univariate Analysis
• Standard deviation
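A minimal sketch of the two dispersion measures follows. The values are hypothetical, chosen only so that the highest and lowest match the 42 and 10 of the range example. It assumes the sample standard deviation (divisor n - 1), the same convention used in the paired t-test computation later in these slides.

```python
import math
import statistics

# Hypothetical values; only the extremes (10 and 42) mirror the range example.
values = [10, 18, 22, 25, 27, 30, 42]

# Range: highest value minus lowest value (uses only two observations).
value_range = max(values) - min(values)

# Sample standard deviation computed by hand: sqrt(sum((x - mean)^2) / (n - 1)).
mean_value = statistics.mean(values)
squared_deviations = [(x - mean_value) ** 2 for x in values]
manual_sd = math.sqrt(sum(squared_deviations) / (len(values) - 1))

# The standard library gives the same result (statistics.stdev divides by n - 1).
library_sd = statistics.stdev(values)

print("range:", value_range)
print("standard deviation:", round(library_sd, 2))
```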
Inferential Statistics (Parametric)
• Inferential statistics use measurements from the sample of subjects in the experiment to compare the treatment groups and make generalizations about the larger population of subjects.
Inferential Statistics (Parametric)
Commonly Used Parametric Inferential Statistics
• Hypothesis tests (Testing an assumption regarding a population.)
• Confidence intervals
• Regression analysis
Interestingly, these inferential methods can produce similar summary values as descriptive statistics, such as the mean and standard deviation.
Inferential Statistics (Parametric)
Hypothesis Testing
T-tests
– Are used to determine whether a process or a treatment actually has an effect on the population of interest, or whether 2 groups are different from each other.
Types of t-tests
– Paired samples t-test (repeated measures/before and after)
– Independent samples t-test
Inferential Statistics (Parametric)
Hypothesis Testing
TWO Main types of t-tests
– Paired samples t-test (repeated measures), also called dependent samples/groups
– Independent group/samples t-test
Inferential Statistics (Parametric)
Hypothesis Testing
Assumptions (T-Test)
• The data are continuous (not discrete)
• The data follow the normal probability distribution
Inferential Statistics (Parametric)
Hypothesis Testing
ANOVA (Analysis of Variance)
Extension of the independent samples t-test; means are compared among three or more independent groups
Inferential Statistics (Parametric)
Hypothesis Testing
Assumptions (ANOVA)
• The data are continuous (not discrete)
• The data follow the normal probability distribution
• Equal variances between groups
• Independence of Samples
Inferential Statistics (Parametric)
Hypothesis Testing
Correlation (e.g. Pearson)
• Exploring Relationships: Looking at the strength of the relationship between variables
– To explore the strength of the relationship between 2 continuous variables. This gives an indication of the direction (positive/negative) and the strength of the relationship.
– A positive correlation indicates that as one variable increases, the other increases as well. With a negative correlation, as one increases, the other decreases.
Inferential Statistics (Parametric)
Hypothesis Testing
Assumptions (Correlation)
• Related pairs
• Absence of outliers
• Linearity
Sample Computations
• Paired T-test
At 5% level of significance, compare the scores of the students during pre and post exams after taking a 100 pts. exam.

Students  Pre-Score  Post-Score  Difference (d)
1         80         75          5
2         59         51          8
3         86         84          2
4         69         57          12
5         90         85          5
6         76         74          2
7         89         81          8
8         62         63          -1
9         84         78          6
10        82         87          -5

$\sum d_i = 42$, $n = 10$, $df = n - 1 = 9$

$\bar{d} = \dfrac{\sum d_i}{n} = 4.2$

$s_D = \sqrt{\dfrac{\sum d_i^2 - (\sum d_i)^2 / n}{n - 1}} = 4.89$

$t_{(computed)} = \dfrac{\bar{d} - 0}{s_D / \sqrt{n}} = 2.72$

$t_{(tabulated)} = t_{(0.05/2),\, df = 9} = 2.262$

Decision: If t(computed) is greater than t(tabulated), then there is a significant difference in the scores during the pre and post exams.

Sample Computations
• Using SPSS Software
If the p-value or Sig. < 0.05, then there is a significant difference between the two exams
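The paired t-test computation can be reproduced with the standard library alone. The sketch below uses the ten pre and post scores from the sample computation and hard-codes the tabulated value 2.262 from the slides rather than looking it up. Variable names are assumptions. Without intermediate rounding the statistic comes out near 2.71; the 2.72 on the slide results from rounding the standard deviation to 4.89 first.

```python
import math
import statistics

# Pre and post exam scores of the ten students in the sample computation.
pre = [80, 59, 86, 69, 90, 76, 89, 62, 84, 82]
post = [75, 51, 84, 57, 85, 74, 81, 63, 78, 87]

# Difference d for each student (pre minus post).
differences = [a - b for a, b in zip(pre, post)]
n = len(differences)

# Mean and sample standard deviation (divisor n - 1) of the differences.
mean_d = statistics.mean(differences)
s_d = statistics.stdev(differences)

# t statistic for the null hypothesis that the mean difference is 0.
t_computed = (mean_d - 0) / (s_d / math.sqrt(n))

# Two-tailed critical value at the 5% level with df = 9, as given in the slides.
t_tabulated = 2.262
significant = abs(t_computed) > t_tabulated

print(f"mean d = {mean_d:.2f}, s_D = {s_d:.2f}, t = {t_computed:.2f}")
print("significant difference" if significant else "no significant difference")
```

If SciPy is available, `scipy.stats.ttest_rel(pre, post)` should report the same statistic together with a p-value, which corresponds to the Sig. value read from SPSS.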
• Independent Samples
At 5% level of significance, compare the scores of the two groups.

$\bar{x}_{1,2} = \dfrac{\sum x_i}{n_{1,2}}$

$t_{(tabulated)} = t_{0.05/2,\, 8} = 2.306$

Decision: If t(computed) is less than t(tabulated), then there is no significant difference in the scores of the groups.

Using SPSS Software
If the p-value or Sig. > 0.05, then there is no significant difference between the two groups
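The group scores for this example did not survive in the text, so the sketch below uses hypothetical data: two groups of five, which gives df = 8 and matches the tabulated value 2.306 quoted above. It assumes the equal-variance (pooled) form of the independent samples t-test; the slides do not state which form was used.

```python
import math
import statistics

# Hypothetical scores for two independent groups (not from the slides).
group_1 = [85, 90, 78, 92, 88]
group_2 = [80, 83, 79, 85, 81]
n1, n2 = len(group_1), len(group_2)

# Group means and sample variances (divisor n - 1).
mean_1 = statistics.mean(group_1)
mean_2 = statistics.mean(group_2)
var_1 = statistics.variance(group_1)
var_2 = statistics.variance(group_2)

# Pooled variance, assuming both groups share a common variance.
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / df

# t statistic for the difference between the two means.
standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_computed = (mean_1 - mean_2) / standard_error

# Two-tailed critical value at the 5% level with df = 8, as given in the slides.
t_tabulated = 2.306
significant = abs(t_computed) > t_tabulated

print(f"t = {t_computed:.3f}, df = {df}")
print("significant difference" if significant else "no significant difference")
```

With these made-up scores the computed t is below the tabulated value, so the decision rule gives no significant difference.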
• ANOVA
At 5% level of significance, compare the scores of the three groups.
Decision: If F(computed) is less than F(tabulated), then there is no significant difference in the scores of the groups.
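The ANOVA tables from the slides are not recoverable, so the following is a sketch of a one-way ANOVA on hypothetical data: three groups of three scores. The critical value 5.14 is the standard F-table entry for the 5% level with 2 and 6 degrees of freedom; all names are assumptions.

```python
import statistics

# Hypothetical scores for three independent groups (not from the slides).
groups = [[4, 5, 6], [5, 6, 7], [6, 7, 8]]

all_scores = [x for group in groups for x in group]
grand_mean = statistics.mean(all_scores)
k = len(groups)            # number of groups
n_total = len(all_scores)  # total number of observations

# Between-groups sum of squares: spread of the group means around the grand mean.
ss_between = sum(
    len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
)

# Within-groups sum of squares: spread of the scores around their own group mean.
ss_within = sum(
    (x - statistics.mean(group)) ** 2 for group in groups for x in group
)

df_between = k - 1
df_within = n_total - k

# F is the ratio of the between-groups to the within-groups mean square.
f_computed = (ss_between / df_between) / (ss_within / df_within)

# Critical value F(0.05; 2, 6) from a standard F table.
f_tabulated = 5.14
significant = f_computed > f_tabulated

print(f"F = {f_computed:.2f} with df = ({df_between}, {df_within})")
print("significant difference" if significant else "no significant difference")
```

If SciPy is available, `scipy.stats.f_oneway(*groups)` should return the same F statistic along with its p-value.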
• Correlations
• If r = 1, then all the points lie on a straight line and the relationship is said to be a perfect positive relationship.
• If r = -1, then all the points lie on a straight line but in an inverse relation, and this relationship is said to be a perfect negative relationship.
• If r = 0, then there is no linear relationship between the two variables and the variables are said to be uncorrelated.

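• Correlations

$r = \dfrac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{\left(\sum x^2 - (\sum x)^2/n\right)\left(\sum y^2 - (\sum y)^2/n\right)}}$

With $n = 10$, $\sum x = 130$, $\sum y = 100$, $\sum x^2 = 1878$ and $\sum y^2 = 1138$:

$r = \dfrac{\sum xy - (130)(100)/10}{\sqrt{\left(1878 - (130)^2/10\right)\left(1138 - (100)^2/10\right)}} = 0.87$

Since r = 0.87, it suggests a strong, positive correlation between math and stat score.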
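The computational form of Pearson's r (sums of x, y, xy, x squared and y squared) can be written as a small function. The raw math and stat scores are not available in the text, so the data below are hypothetical and the function name `pearson_r` is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )
    return numerator / denominator


# Hypothetical paired scores (not from the slides).
x = [1, 2, 3, 4, 5]
r_mixed = pearson_r(x, [2, 1, 4, 3, 5])      # strong positive, not perfect
r_perfect = pearson_r(x, [2, 4, 6, 8, 10])   # points on a rising straight line
r_inverse = pearson_r(x, [10, 8, 6, 4, 2])   # points on a falling straight line

print(r_mixed, r_perfect, r_inverse)
```

The last two calls illustrate the r = 1 and r = -1 cases described above.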
−−
