
(DSIMGTS) Notes - November 21, 2024, 3 - 28 PM

The document provides an overview of statistics, including its purpose, branches, and key concepts such as population, sample, and variables. It discusses various statistical methods, data presentation techniques, measures of central tendency, and correlation analysis. Additionally, it covers sampling techniques, the importance of sample size, and different types of statistical tests like ANOVA and regression analysis.

Uploaded by

samanthaneolpis

Statistics
- Both an art and a science that deals with the collection, organization, presentation, analysis (extracting important info from the data), and interpretation (being able to explain why scores are good or bad, then offering findings and recommendations) of data; a science because we use it in making decisions

Purpose of statistics
- Provide info
- Provide comparisons
- Help discern relationships
- Aid in decision-making
- Justify claims or assertions
- Estimate unknown quantities
- Predict future outcomes

Branches of statistics
Descriptive
- Consists of methods concerned with the collection, organization, summarization, and presentation of a set of data
Inferential
- Comprises methods concerned with making predictions or inferences about an entire population based on info provided by the sample

Population
- Totality of all the elements or entities from which you want to obtain information
Sample
- Small portion of the population; subset of the population
- If randomly selected, gives more or less the same result as the population, compared to when chosen using a preferential method

Census and survey
- Refer to different data sets
Census
- Process of collecting information from the population
- PSA (Philippine Statistics Authority)
Survey
- Process of collecting information from the sample
Parameter
- A summary or numerical measure used to describe a population
Statistic
- A summary or numerical measure used to describe a sample; from sample data
Constant
- Not numbers but rather characteristics or properties of a population or sample which make the members similar to each other
- There should be diversity and, preferably, no constant
Variable
- Any characteristic or info measurable or observable on every element of the population or sample
- Subdivided into 2 general categories
Qualitative (Categorical)
- Answers wh- questions
- Indicates what kind of a given characteristic an individual, object, or event possesses
Quantitative (Numerical)
- Indicates how much of a given characteristic an individual, object, or event possesses
- Discrete
    - Values are obtained through the process of counting
    - Whole numbers
- Continuous
    - Values are obtained through the process of measuring
    - Using simple tools
    - Can be whole or decimal numbers
Dependent
- Variable that is affected by another variable
Independent
- Affects the dependent variable

Frameworks
- The dependent variable is always on the right and the independent variable on the left
Moderating (indirect)
Mediating (effect of IV can give + / - effect on the DV)

Scales of measurement of variable
Nominal
- Variables whose values are simply labels, names, or categories without any explicit or implicit ordering of the labels
- Lowest level of measurement, known as the categorical scale
Ordinal
- Variables whose values are simply labels, names, or categories with an implied ordering in these labels
- Ranking can be done on the data
- Distance between two labels cannot be determined
Interval
- Variables whose values can be ordered and for which the distance between any two labels is of known size
- Always numeric, with no true zero point
Ratio
- Variables whose values have all the properties of the interval scale and for which the ratio of two values is meaningful
- Has a true zero point
- Highest level of measurement
_____________________

Data presentation
- Focuses on numerical quantities
Textual
- Data are presented in paragraph form
- Enumeration of important characteristics, giving emphasis to significant figures and identifying the important features of the data
- If the data set is not very large, it can simply be presented as text
- Arrange in array form (smallest to largest/least to most)

Tabular
- Sometimes we can hardly grasp info from a textual presentation, thus data are presented using tables

Frequency distribution table
- Tabular summary of data showing the frequency or number of items in each of several non-overlapping classes
- Use bar graph
1. Determine the range (R)
    - Range - difference between the highest and lowest value
2. Decide on the number of classes (K)
    - K - no. of non-overlapping intervals
3. Compute the class size (C)
    - C - quotient of steps 1 and 2
4. Identify the class intervals (CI)
    - If inclusive, the lowest or highest value is included
5. Identify the frequency in each CI by tallying

Class size/width
- Difference between the upper or lower class limits of consecutive classes
- All classes should have the same class width
- Lower class limit (least value that can belong to a class) and upper class limit (greatest value)

Class boundaries
- Numbers that separate classes without forming gaps between them; like intervals but without gaps, so adjacent classes overlap at the boundary
- Should always be a little more rather than a little less
- Use histogram

Class mark/midpoint
- Middle value of each data class
- To find the class midpoint, average the upper and lower class limits

Relative frequency
- Obtained by dividing the frequency of the given class by the total number of observations

Additional information
Less than CF (< cumulative frequency)
- Total no. of observations whose values do not exceed the upper limit of the class
Greater than CF
- Total no. of observations whose values are not less than the lower limit of the class
Cumulative frequency of a data class
- The number of data elements in that class and all previous classes
__________________________________________

Numerical descriptive measure
- Values to describe data
- Point of intersection of the ogives (central tendency)

Measures of central tendency
- Describes the center of a given data set; a single value about which the observations tend to cluster
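The frequency distribution steps in these notes (range, number of classes, class size, class intervals, tallying) can be sketched in Python. This is only an illustration under stated assumptions: the scores are made-up whole numbers, the limits are inclusive, the class size is rounded up, and the function name `frequency_table` is hypothetical.

```python
import math

def frequency_table(data, k):
    """Group whole-number data into k equal-width classes (inclusive limits)."""
    lowest, highest = min(data), max(data)
    data_range = highest - lowest                 # step 1: range R
    width = math.ceil(data_range / k)             # step 3: class size C, rounded up
    if width * k <= data_range:                   # widen if the highest value would not fit
        width += 1
    n = len(data)
    rows, running = [], 0
    for i in range(k):                            # step 4: class intervals
        lower = lowest + i * width
        upper = lower + width - 1                 # inclusive upper class limit
        freq = sum(lower <= x <= upper for x in data)   # step 5: tally
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),   # no gaps between classes
            "midpoint": (lower + upper) / 2,            # class mark
            "freq": freq,
            "rel_freq": freq / n,                       # relative frequency
            "less_than_cf": running + freq,             # values <= upper limit
            "greater_than_cf": n - running,             # values >= lower limit
        })
        running += freq
    return rows

# Hypothetical scores, not taken from the notes
scores = [12, 15, 17, 20, 21, 23, 24, 25, 27, 28,
          30, 31, 33, 35, 36, 38, 40, 41, 44, 47]

for row in frequency_table(scores, 6):
    print(row)
```

The half-unit class boundaries assume whole-number data; measurements with decimals would need a smaller adjustment.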
Mean
- Average
1. It always exists
2. It is unique
3. It is sensitive; it takes into account all the elements; sensitive = good quality

Median
- Middle observation
1. It always exists
2. It is unique
3. Not sensitive; not a very reliable measurement

Mode
- Value that appears more often than the others
1. May not exist
2. May not be unique; can be multiple
3. No computation needed

● For nominal variables, the mode is the only measure that can be used
● For ordinal variables, the mode and the median may be used; the median provides more info (taking into account the ranking of categories)
● For interval-ratio variables, the mode, median, and mean may all be calculated; the mean provides the most info about the distribution, but the median is preferred if the distribution is skewed

Measures of position
- Descriptive statistics that discriminate (categorize) one score from another score within the same data set; fighting for a position
- Quantile
    - Dividing the data set into several equal parts
    - Types of quantiles
        1. Quartile (4)
        2. Decile (10)
        3. Percentile (100)

Interpolation
- Looking for a number between two identified numbers

Measure of Variability (measures of dispersion)
- Far from each other = very dispersed
- More similar scores = lower dispersion, less similar scores = higher dispersion
- Measures how close together or far apart the scores are
- Variability is descriptive statistics that describe how similar a set of scores are to each other
- Types of variability
    1. Range
        - Diff. between max and min values; not that reliable
    2. Variance
        - Mean of the squared differences of the observations from their mean (deviation scores)
    3. Standard deviation
        - Only the positive square root of the variance
    4. Coefficient of variation
        - Ratio of the SD to its mean, expressed in percent; higher = more dispersed, lower = less dispersed

Skewness
- Measures whether the distribution is normal or not; related to the normality of the data; a measure of symmetry in the distribution of scores
- If the data are normal, they should be approximately normally distributed; skewness = 0, and as long as it has not gone above 1 the distribution is approximately normal; all measures of central tendency are in one place
- Mean > median = positively skewed (sk > 0); mean < median = negatively skewed (sk < 0)
- Positively skewed (skewed to the right): the left side is almost normal (more low scores) and the right side is not (fewer high scores)
- Negatively skewed: the right side is almost normal (more high scores) and the left side is not (fewer low scores)

Kurtosis
- Measures the peakedness
- Perfectly normal curve (mesokurtic)
- Somewhat flat (platykurtic)
- Normal curve but with a high peak (leptokurtic) = data are compressed toward the middle, so the frequency there increases
- k = 3 = mesokurtic
- k < 3 = platy
- k > 3 = lepto
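The measures of central tendency, variability, skewness, and kurtosis listed in these notes can be computed with a short Python sketch. The data are hypothetical, and two choices are assumptions rather than something the notes specify: the variance divides by n (the "mean squared difference" reading; many courses divide by n - 1 for samples), and skewness and kurtosis use the moment-based formulas, under which a normal curve has k = 3.

```python
import math
import statistics

def describe(scores):
    """Return the descriptive measures named in the notes for a list of numbers."""
    n = len(scores)
    mean = statistics.mean(scores)
    variance = sum((x - mean) ** 2 for x in scores) / n   # assumption: divide by n
    sd = math.sqrt(variance)                              # positive square root only
    third_moment = sum((x - mean) ** 3 for x in scores) / n
    fourth_moment = sum((x - mean) ** 4 for x in scores) / n
    return {
        "mean": mean,
        "median": statistics.median(scores),
        "modes": statistics.multimode(scores),    # the mode may not be unique
        "range": max(scores) - min(scores),
        "variance": variance,
        "sd": sd,
        "cv_percent": sd / mean * 100,            # coefficient of variation
        "skewness": third_moment / sd ** 3,       # > 0 means positively skewed
        "kurtosis": fourth_moment / sd ** 4,      # 3 = meso, < 3 = platy, > 3 = lepto
    }

# Hypothetical scores, not taken from the notes
result = describe([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in result.items():
    print(name, value)
```

For this data the mean (5) is greater than the median (4.5), and the skewness comes out positive, which agrees with the rule that mean > median indicates a positively skewed distribution.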
__________________________________________
Pearson product moment coefficient of correlation
Pearson (r)
- An index of relationship between two variables
- x = independent variable, y = dependent
- The value of r ranges from -1 through 0 to +1
    - If r = 1 or -1 = perfect correlation
    - If r = 0, x and y are uncorrelated (no linear relationship)
- Ex. test scores

If the trend of the line graph is going upward, the value of r is positive. This indicates that as the value of x increases, the value of y also increases, x and y being positively correlated.

If the trend of the line graph is going downward, the value of r is negative. This indicates that as the value of x increases, the value of y decreases, x and y being negatively correlated.

If the trend of the line cannot be established as either upward or downward, then r = 0, indicating that there is no correlation between the x and y variables.

- If r is in the middle, moderate correlation
- If close to zero (whether + or -), fairly weak correlation
- If close to +1 or -1, fairly strong correlation

Why do we use r?
- To analyze if a relationship exists between two variables
- Coefficient of determination
    - Equal to the square of r multiplied by 100%
    - Explains how much the independent variable influences the dependent variable, or how much y depends on x
    - Degree of relationship between x and y which cannot be seen in other statistical tests of relationship
    - More powerful test of relationship compared with nonparametric tests

When do we use r, the Pearson product moment coefficient of correlation?
- To determine the index of relationship between the IV and the DV
- The value of r ranges from +1 through zero to -1. There is a perfect positive correlation if r = +1; likewise, there is a perfect negative correlation if r = -1. However, if r = 0 then there is no correlation between the two variables x and y
- Positive correlation: as x increases y also increases, or vice versa
- Negative correlation: as x decreases y increases, or vice versa
When to use?
- When there is a relationship between the x and y variables
- Data should be normally distributed, and the level of measurement should be interval or ratio
Why use?
- Interest in predicting the value of y, the dependent variable; used for forecasting and prediction
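As a rough illustration of r, the coefficient of determination, and using x to predict y, here is a minimal Python sketch. The data and the function names `pearson_r` and `fit_line` are hypothetical, and the formulas are the usual deviation-score forms, which the notes themselves do not spell out.

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation between paired lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: x = hours studied (IV), y = test score (DV)
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
determination = r ** 2 * 100          # coefficient of determination, in percent
slope, intercept = fit_line(hours, scores)

print(f"r = {r:.4f}")
print(f"coefficient of determination = {determination:.1f}%")
print(f"predicted y at x = 6: {intercept + slope * 6:.2f}")
```

Here r is positive and fairly close to +1, so the trend line goes upward and the correlation would be read as fairly strong.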

__________________________________________

Simple regression analysis
Regression model (equation)
- Predicts the value of y given the value of x

__________________________________________

F test
- A parametric test used to compare the means of two or more groups of independent samples
- Analysis of variance (ANOVA)
- Kinds of analysis of variance:
    - One-way: only 1 variable
    - Two-way: 2 variables (column and row); used to know if there are significant differences between and among columns and rows

Why use?
- To find out if there is a significant difference between and among the means of two or more independent groups
When to use?
- If there is a normal distribution and the level of measurement is interval or ratio (as with the t-test and z-test)
- Data should be numeric to know if they are the same or different

Formula:
- TSS = the total sum of squares minus CF, the correction factor
- BSS = the between sum of squares minus the CF
- WSS = within sum of squares, or the difference between TSS and BSS
- GT = grand total
- N = total number of observations

B + W = T
W = T - B
- Compare three groups
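The TSS, BSS, and WSS definitions can be traced with a small one-way ANOVA sketch. The three groups are made-up data, and the correction factor is taken as CF = GT² / N, the usual definition, which is an assumption here because the formula itself did not survive in these notes.

```python
def one_way_anova(groups):
    """Sums of squares and F-value for a one-way ANOVA on a list of groups."""
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)                          # N
    grand_total = sum(all_values)                      # GT
    cf = grand_total ** 2 / n_total                    # correction factor (assumed GT^2 / N)
    tss = sum(x ** 2 for x in all_values) - cf         # total sum of squares
    bss = sum(sum(g) ** 2 / len(g) for g in groups) - cf   # between sum of squares
    wss = tss - bss                                    # W = T - B
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (bss / df_between) / (wss / df_within)   # ratio of mean squares
    return {"CF": cf, "TSS": tss, "BSS": bss, "WSS": wss,
            "df": (df_between, df_within), "F": f_value}

# Hypothetical scores for three independent groups
anova = one_way_anova([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(anova)
```

The computed F is then compared with the tabular F for (2, 6) degrees of freedom at the chosen level of significance; the tabular value has to be looked up separately.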
F-test two-way ANOVA with interaction effect
- Used if there is interaction between variables
- If it is two-way there can be more than 2 hypotheses; there are 3 hypotheses (one for each problem)
- The presence of one can affect the other
- Only used to know if they are the same or not the same
Multiple regression
- Several independent variables
- Used to predict the dependent variable y given the independent variables x
- Aside from prediction, we can also see the relationship between the dependent variable and the different independent variables
Sampling techniques
Population
- Set which includes all measurements of interest to the researcher
- Collection of responses, measurements, or counts that are of interest
Sample
- A subset of the population

Why do we do sampling
- Impossible to study the whole population
- Manageability of data
- Economic reasons
- Time and effort

Types of sampling
Probability sampling
- Everybody in the population is given an equal chance of being included
1. You have a complete sampling frame
2. You can select a random sample from the population
3. You can generalize your results from a random sample
4. Can be more expensive and time consuming (there's a process)
- Simple random sampling
    - All members of the population have a chance of being included in the sample
    - Fish bowl method
- Stratified sampling
    - Used when the population can be subdivided into several smaller groups or strata, and then SRS is applied to get samples from each stratum
- Cluster sampling
    - Employs the use of clusters (groups) instead of individuals that are randomly chosen; usually for big data
- Systematic sampling
    - Selects every kth member of the population with the starting point determined at random
Non probability
- Each member does not have an equal chance
1. Used when there isn't an exhaustive population list available (no list)
2. Not random
3. Can be effective when trying to generate ideas and get feedback; not considered representative of the whole population, more like feedback from the sample only
4. More convenient and less costly
- Convenience sampling
    - Uses subjects that are readily available or includes only people who are easy to reach
- Purposive sampling
    - Researcher looks for predefined groups that will serve as samples

Sample size (n)
- Number of respondents
- Most statisticians agree that the minimum sample size to get any kind of meaningful result is 100, but if the population is less than 100, try to get all of them.
- A good maximum sample size is usually 10% as long as it does not exceed 1000
- The more samples, the better; the opinion of 1000 people is always better than 100

Determining sample size
1. Using a census for small populations (everyone is surveyed)
2. Using a sample size which is 10% of N
3. Using published tables
4. Using formulas to determine sample size: Slovin's formula
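Two of the ideas in the sampling notes lend themselves to a quick Python sketch: Slovin's formula for the sample size, and systematic sampling (every kth member with a random starting point). The formula is written in its commonly cited form, n = N / (1 + N e²), which the notes name but do not write out; the function names and the population of numbered IDs are hypothetical.

```python
import math
import random

def slovin(population_size, margin_of_error):
    """Slovin's formula, n = N / (1 + N * e^2), rounded up to a whole respondent."""
    return math.ceil(population_size / (1 + population_size * margin_of_error ** 2))

def systematic_sample(population, sample_size, rng=random):
    """Pick every kth member after a random start within the first k members."""
    if not 0 < sample_size <= len(population):
        raise ValueError("sample_size must be between 1 and the population size")
    k = len(population) // sample_size       # sampling interval
    start = rng.randrange(k)                 # starting point determined at random
    return [population[start + i * k] for i in range(sample_size)]

# Hypothetical population of 1000 numbered respondents, 5% margin of error
population = list(range(1, 1001))
n = slovin(len(population), 0.05)
sample = systematic_sample(population, n)

print("sample size:", n)
print("first five selected:", sample[:5])
```

With N = 1000 and e = 0.05 the formula gives 286 respondents, so the interval is k = 3 and the sample is every 3rd ID from a random start.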
Hypothesis
- A premise or claim that we want to test
- Assumption about the population parameter
- An educated guess
Hypothesis testing
- Process of making an inference or generalization on population parameters based on the results of the study on samples
- Deciding between what is reality and what is a coincidence
Statistical hypothesis
- A guess or prediction made by the researcher regarding the possible outcome of the study

Steps in hypothesis testing
1. Formulate Ho and Ha
2. Set the level of significance; usually it is given in the problem; the level of significance is the same as the margin of error
    - Maximum tolerance of error (95% confidence)
3. Formulate the decision rule (when to reject Ho); find the critical value/P-value
    - Directional and non-directional (left OR right = 1-tailed test, left and right = 2-tailed test)
    - Critical point = divides the rejection and acceptance regions (this is the decision rule)
    - Value is taken from a statistical table; tabular value (z or t-table)
4. Test statistic; do the computation
5. Make a decision
6. Write a conclusion

Statistically significant
- Where do we draw the line to help us decide if we should reject or fail to reject the null hypothesis?

Null hypothesis
- Default/established = “it is believed”; thought to be true unless it is rejected
- Currently accepted value for a parameter
- Always hoped to be rejected
- Always contains “=” sign
- Status quo
- Hypothesis of equality
Alternative hypothesis
- Also called the research hypothesis; involves the claim to be tested
- Used to contradict the null hypothesis
- Uses >, <, or ≠
- Generally represents the idea which the researcher wants to prove

Ho and Ha are mathematical opposites

______________________

Level of significance, alpha and the rejection region
- alpha = 0.05 means we accept a 5% probability of being wrong (rejecting a true Ho) and are 95% confident in the decision

Possible outcomes
- Reject null hypothesis
- Fail to reject null hypothesis

Test statistic
- Calculated from sample data and used to decide (either reject or fail to reject)
- Ex. a sample of 50 bars

* If the decision rule is satisfied, the result is in the rejection region; if not, it is in the acceptance region
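The six steps of hypothesis testing can be walked through with a one-sample z-test in Python. This is a sketch under assumptions not stated in the notes: the population standard deviation is known, the test statistic is z = (x̄ - μ) / (σ / √n), and all numbers are hypothetical. The critical value comes from Python's `statistics.NormalDist` instead of a printed z-table.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, hypothesized_mean, population_sd, n, alpha=0.05, tail="two"):
    """Return (z, critical value, reject Ho?) for a one-sample z-test."""
    # Step 4: test statistic
    z = (sample_mean - hypothesized_mean) / (population_sd / math.sqrt(n))
    std_normal = NormalDist()                          # mean 0, sd 1
    # Step 3: decision rule, depending on the symbol used in Ha
    if tail == "left":                                 # Ha uses <
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    elif tail == "right":                              # Ha uses >
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    else:                                              # Ha uses ≠ (two-tailed)
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    return z, critical, reject                         # Step 5: decision

# Steps 1-2 (hypothetical): Ho: mean = 100, Ha: mean ≠ 100, alpha = 0.05
z, critical, reject = z_test(sample_mean=103, hypothesized_mean=100,
                             population_sd=10, n=50, alpha=0.05, tail="two")

# Step 6: conclusion
decision = "Reject Ho" if reject else "Fail to reject Ho"
print(f"z = {z:.3f}, critical = ±{critical:.3f}: {decision}")
```

When the population standard deviation is unknown and the sample is small, a t-table value would normally replace the z critical value; that case is not covered by this sketch.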
Level of confidence (C)
- How confident are we in our decision?

Level of significance (alpha = 1 - C)
- 1-10%; normally 5%, so the confidence level is 95%

Types of hypothesis tests
1. One-tailed test (left directional)
    - Used if Ha uses “<” symbol
2. One-tailed test (right directional)
    - Used if Ha uses “>” symbol
3. Two-tailed test (non-directional)
    - Used if Ha uses “≠” symbol

Decisions made regarding Ho (Reject Ho/Do not reject Ho)
- If we reject Ho, we conclude that it is wrong because the sample evidence contradicts it
- If we do not reject Ho, it doesn’t mean it is correct; we just don’t have enough evidence to reject it

Errors in hypothesis testing

__________________________________________

Testing the hypothesized value of the mean
Examples:

__________________________________________

Testing the difference between two means

__________________________________________
Source of variation   SS       df   MS      F-value (Computed)   F-value (Tabular)   Interpret.
Between C             178.18    2   89.09   24.82                3.26                S
Between R              14.05    2    7.02    1.95                3.26                NS
Interaction           187.15    4   46.79   13.03                2.63                S
Within                129.20   36    3.59
Total                 508.58   44

        A     B     C     GT
MT1    198   241   197    636
MT2    196   210   224    630
MT3    201   216   199    616
GT     595   667   620   1882
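The between-column, between-row, and interaction sums of squares in the ANOVA table can be re-derived from the table of totals (A, B, C by MT1 to MT3). The sketch below assumes 5 observations per cell, inferred from the total df of 44 (N = 45 spread over 9 cells); the raw observations are not in these notes, so the within sum of squares cannot be recomputed here.

```python
# Cell totals from the notes: rows MT1-MT3, columns A, B, C
cell_totals = [
    [198, 241, 197],
    [196, 210, 224],
    [201, 216, 199],
]
OBS_PER_CELL = 5   # assumption: total df = 44 implies N = 45, i.e. 5 per cell

def two_way_sums_of_squares(cells, obs_per_cell):
    """Between-column, between-row, and interaction SS from cell totals."""
    n_rows, n_cols = len(cells), len(cells[0])
    n_total = n_rows * n_cols * obs_per_cell
    row_totals = [sum(row) for row in cells]
    col_totals = [sum(col) for col in zip(*cells)]
    grand_total = sum(row_totals)                               # GT
    cf = grand_total ** 2 / n_total                             # correction factor
    ss_cols = sum(t ** 2 for t in col_totals) / (n_rows * obs_per_cell) - cf
    ss_rows = sum(t ** 2 for t in row_totals) / (n_cols * obs_per_cell) - cf
    ss_cells = sum(t ** 2 for row in cells for t in row) / obs_per_cell - cf
    ss_interaction = ss_cells - ss_cols - ss_rows
    return {"row_totals": row_totals, "col_totals": col_totals, "GT": grand_total,
            "between_C": ss_cols, "between_R": ss_rows, "interaction": ss_interaction}

ss = two_way_sums_of_squares(cell_totals, OBS_PER_CELL)
for name in ("between_C", "between_R", "interaction"):
    print(f"{name}: {ss[name]:.2f}")
```

Under that assumption the results agree with the SS column of the table to within rounding in the second decimal place.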
