(DSIMGTS) Notes - November 21, 2024, 3 - 28 PM
- Both an art and science that deals with the collection, organization, presentation, analysis (extracting important info from the data), and interpretation (being able to explain why scores are good/bad, then offering findings and recommendations) of data; a science because we use it in making decisions

Purpose of statistics
- Provide info
- Provide comparisons
- Help discern relationships
- Aid in decision-making
- Justify claims or assertions
- Estimate unknown quantities
- Predict future outcomes

Branches of statistics
Descriptive
- Consists of methods concerned with the collection, organization, summarization, and presentation of a set of data
Inferential
- Comprised of those methods concerned with making predictions or inferences about an entire population based on info provided by the sample

Population
- Totality of all the elements or entities from which you want to obtain information
Sample
- Small portion of the population; subset of the population
- If randomly selected, it gives more or less the same result as the population, compared to when it is chosen using a preferential method

Constant
- Property of a population or sample which makes the members similar to each other
- There should be diversity and, preferably, no constant
Variable
- Any characteristic or info measurable or observable on every element of the population or sample
- Subdivided into 2 general categories
Qualitative (Categorical)
- Answers "wh" questions
- Indicates what kind of a given characteristic an individual, object, or event possesses
Quantitative (Numerical)
- Indicates how much of a given characteristic an individual, object, or event possesses
- Discrete
  - Values are obtained through the process of counting
  - Whole numbers
- Continuous
  - Values are obtained through the process of measuring
  - Using simple tools
  - Can be whole or decimal numbers
Dependent
- Variable that is affected by another variable
Independent
- Affects the dependent variable
Frameworks
- The dependent variable is always on the right and the independent variable on the left
Moderating
- Changes the effect of the IV on the DV (it can make the effect + / -)
Mediating (indirect)
- The IV affects the DV through the mediating variable
Tabular
- Sometimes we can hardly grasp info from a textual presentation, so data are presented using tables
Frequency distribution table
- Tabular summary of data showing the frequency, or number of items, in each of several non-overlapping classes
Relative frequency
- Obtained by dividing the frequency of the given class by the total number of observations
Additional information
Cumulative frequency of a data class
- The number of data elements in that class and all previous classes
Less than CF (<cumulative frequency)
- Total no. of observations whose values do not exceed the upper limit of the class
Greater than CF (>cumulative frequency)
- Total no. of observations whose values are not less than the lower limit of the class
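A minimal Python sketch of the frequency distribution table with relative, less-than, and greater-than cumulative frequencies described above. It assumes equal-width classes with class size = range / number of classes, half-open classes [lower, upper) except the last one, which is closed so the maximum value is counted. The function name `frequency_table` and the sample data are hypothetical.

```python
def frequency_table(data, k):
    """Return one dict per class: limits, f, relative f, <CF and >CF.

    Hypothetical helper; assumes k equal-width classes of size C = R / K.
    """
    lowest, highest = min(data), max(data)
    r = highest - lowest          # range (R)
    c = r / k                     # class size (C)
    n = len(data)
    rows = []
    less_cf = 0
    for i in range(k):
        lower = lowest + i * c
        upper = lowest + (i + 1) * c
        if i == k - 1:
            # last class is closed on the right so the maximum is counted
            f = sum(lower <= x <= highest for x in data)
        else:
            f = sum(lower <= x < upper for x in data)
        less_cf += f              # this class and all previous classes
        rows.append({"lower": lower, "upper": upper, "f": f,
                     "rf": f / n, "less_cf": less_cf})
    greater_cf = n
    for row in rows:
        row["greater_cf"] = greater_cf  # observations not below the lower limit
        greater_cf -= row["f"]
    return rows


data = [2, 3, 5, 7, 8, 10, 11, 12, 14, 17]
for row in frequency_table(data, k=3):
    print(row)
```

With this data R = 15 and C = 5, giving class frequencies 3, 4, 3, less-than CFs 3, 7, 10, and greater-than CFs 10, 7, 3; the relative frequencies add up to 1.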
1. Determine the range (R)
- Range: difference between the highest and lowest values
2. Decide on the number of classes (K)
- K: number of non-overlapping intervals
3. Compute the class size (C)
- C: quotient of steps 1 and 2 (C = R / K)
- Use a bar graph (to present the frequency distribution)
- Point of intersection of the less-than and greater-than ogives (central tendency, i.e., the median)
Numerical descriptive measure
- Values to describe data
Measures of central tendency
- Describes the center of a given data set; a single value about which the observations tend to cluster
Mean
- Average
1. It always exists
2. It is unique
3. It is sensitive; it takes into account all the elements (sensitive = good quality)
Median
- Middle observation (of the ordered data)
1. It always exists
2. It is unique
3. Not sensitive; not a very reliable measurement
Mode
- Value that appears more often compared to the others
1. May not exist
2. May not be unique; there can be multiple modes
3. No computation needed
● For nominal variables, the mode is the only measure that can be used
● For ordinal variables, the mode and the median may be used; the median provides more info (it takes into account the ranking of categories)
● For interval-ratio variables, the mode, median, and mean may all be calculated; the mean provides the most info about the distribution, but the median is preferred if the distribution is skewed

Measures of position
- Descriptive statistics that discriminate (categorize) one score from another score within the same data set; scores are "fighting for a position"
- Quantile
  - Divides the data set into several equal parts
- Types of quantiles
1. Quartile (4 parts)
2. Decile (10 parts)
3. Percentile (100 parts)
Interpolation
- Looking for a number between two identified numbers

Variability
- Descriptive statistics that describe how similar a set of scores are to each other
- Types of variability
1. Range
- Difference between the maximum and minimum values; not that reliable
2. Variance
- Mean of the squared differences of the observations from their mean (deviation scores)
3. Standard deviation
- The positive square root of the variance
4. Coefficient of variation
- Ratio of the SD to the mean, expressed in percent; higher = more dispersed, lower = less dispersed

Skewness
- Measure of symmetry in the distribution of scores; related to the normality of the data (whether it is normal or not)
- If data are normal, they should be approximately normally distributed: skewness = 0 and all measures of central tendency are in one place; if the skewness has not gone above 1, the distribution is approximately normal
- Mean > median = positively skewed (sk > 0); mean < median = negatively skewed (sk < 0)
- Positively skewed (skewed to the right): the left side is almost normal (more low scores) and the right side is not (fewer high scores)
- Negatively skewed (skewed to the left): the right side is almost normal (more high scores) and the left side is not (fewer low scores)
Kurtosis
- Measures the peakedness of the distribution
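The central tendency, variability, skewness, kurtosis, and quartile measures above can be sketched with Python's standard `statistics` module. Assumptions: variance and SD are the population versions (dividing by N, matching "mean squared differences"), skewness uses Pearson's second coefficient 3(mean − median) / SD, kurtosis is the moment coefficient (about 3 for a normal curve), and quartiles use linear interpolation (`method="inclusive"`). The helper name `describe` and the data are hypothetical.

```python
import statistics


def describe(data):
    """Summarize a data set with the measures listed above (hypothetical helper)."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    # Mode: may not exist (every value appears once) or may not be unique.
    counts = {x: data.count(x) for x in data}
    modes = [] if max(counts.values()) == 1 else statistics.multimode(data)
    sd = statistics.pstdev(data, mean)  # population SD: sqrt of mean squared deviation
    return {
        "mean": mean,
        "median": median,
        "modes": modes,
        "range": max(data) - min(data),
        "variance": statistics.pvariance(data, mean),
        "sd": sd,
        "cv_percent": sd / mean * 100,
        # Pearson's second coefficient of skewness (assumed formula):
        # > 0 when mean > median (positive skew), < 0 when mean < median.
        "skewness": 3 * (mean - median) / sd,
        # Moment coefficient of kurtosis (assumed formula); about 3 for a normal curve.
        "kurtosis": sum((x - mean) ** 4 for x in data) / len(data) / sd ** 4,
        # Quartiles via linear interpolation between neighbouring ordered values.
        "quartiles": statistics.quantiles(data, n=4, method="inclusive"),
    }


print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```

For this data set the mean (5) is above the median (4.5), so the skewness comes out positive (0.75), consistent with the rule mean > median = positively skewed; the SD is 2 and the CV is 40%.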
Why do we use r?
- To analyze if a relationship exists between
two variables
- Coefficient of determination
  - Equal to the square of r, multiplied by 100%
  - Explains how much the independent variable influences the dependent variable, or how much y depends on x
- Shows the degree of relationship between x and y, which cannot be seen in other statistical tests of relationship
- More powerful test of relationship compared with other nonparametric tests
If the trend of the line graph is going downward, the value of r is negative. This indicates that as the value of x increases, the value of y decreases, x and y being negatively correlated.
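A minimal Python sketch of Pearson's r and the coefficient of determination (r² × 100%) as described above. The function names and the paired data are hypothetical; r is computed from the deviation sums Sxy / sqrt(Sxx · Syy).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r between paired lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coefficient_of_determination(r):
    """r squared, expressed in percent."""
    return r ** 2 * 100


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3), round(coefficient_of_determination(r), 1))  # 0.775 60.0
```

Here about 60% of the variation in y is explained by x. A downward trend such as y = [10, 8, 6, 4, 2] gives r = −1, a perfect negative correlation.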
__________________________________________
Simple regression analysis
Regression model (equation)
- Predicts the value of y given the value of x
__________________________________________
F test
- A parametric test used to compare the means of two or more groups of independent samples
- Analysis of variance (ANOVA)
- Kinds of analysis of variance:
  - One-way: only 1 variable
  - Two-way: 2 variables (column and row); used to know if there are significant differences between and among columns and rows
Why use?
- To find if there is a significant difference between and among the means of two or more independent groups
When to use?
- If the data are normally distributed and the level of measurement is interval or ratio (like the t-test and z-test)
- Data should be numeric to know if they are the same or different
Formula:
B + W = T (between-groups SS + within-groups SS = total SS)
W = T − B
- Compare three groups
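A minimal Python sketch of a one-way ANOVA for three groups, following the notes' formula W = T − B for the sums of squares. The helper name `one_way_anova` and the group data are hypothetical; the computed F still has to be compared with the tabular F value for (k − 1, n − k) degrees of freedom.

```python
def one_way_anova(groups):
    """One-way ANOVA for a list of independent groups (hypothetical helper)."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    sst = sum((x - grand_mean) ** 2 for x in values)                        # T
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)  # B
    ssw = sst - ssb                                                         # W = T - B
    df_b, df_w = k - 1, n - k
    msb, msw = ssb / df_b, ssw / df_w
    return {"SSB": ssb, "SSW": ssw, "SST": sst, "df": (df_b, df_w),
            "MSB": msb, "MSW": msw, "F": msb / msw}


result = one_way_anova([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(result["F"])  # 12.0
```

With 3 groups of 3, df = (2, 6); the tabular F at α = 0.05 for (2, 6) is about 5.14, so F = 12 would lead to rejecting Ho (the group means are not all equal).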
F-test: two-way ANOVA with interaction effect
- Used if there is interaction between variables
- If it's two-way, there can be more than 2 hypotheses; there are 3 (one for each problem: rows, columns, and interaction)
- The presence of one variable can affect the other
- Only used to know if they are the same or not the same
Multiple regression
- Several independent variables
- Used to predict the dependent variable y given the independent variables x1, x2, ..., xk
- Aside from prediction, we can also see
relationship between the dependent variable
and the different independent variables
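A dependency-free Python sketch of least-squares multiple regression, solving the normal equations (XᵀX)b = Xᵀy by Gauss-Jordan elimination. Simple regression is the special case with one predictor. The names `solve`, `fit_regression`, `predict` and the data are hypothetical; in practice a statistics package would usually be used.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(v)] for row, v in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_regression(xs, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    X = [[1.0] + list(row) for row in xs]   # leading 1 gives the intercept b0
    p = len(X[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(b, row):
    """Predicted y for one set of independent-variable values."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], row))


xs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]   # generated from y = 1 + 2*x1 + 3*x2
b = fit_regression(xs, y)
print([round(v, 6) for v in b])  # [1.0, 2.0, 3.0]
```

Each coefficient shows how y changes per unit of its x while the other predictors are held fixed, which is how the relationship between y and each independent variable can be read off.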
Sampling techniques
Population
- Set which includes all measurements of interest to the researcher
- Collection of responses, measurements, or counts that are of interest
Sample
- A subset of the population
Why do we do sampling
- Impossible to study the whole population
- Manageability of data
- Economic reasons
- Time and effort
Types of sampling
Probability sampling
- Everybody in the population is given an equal chance of being included
1. You have a complete sampling frame
2. You can select a random sample from the population
3. You can generalize your results from a random sample
4. Can be more expensive and time-consuming (there's a process)
- Simple random sampling
  - All members of the population have an equal chance of being included in the sample
  - Fishbowl method
- Stratified sampling
  - Used when the population can be subdivided into several smaller groups or strata, and then SRS is applied to get samples from each stratum
- Cluster sampling
  - Employs clusters (groups) instead of individuals, which are randomly chosen; usually for big data
- Systematic sampling
  - Selects every kth member of the population with the starting point determined at random
Non-probability sampling
- Each member does not have an equal chance
1. Used when there isn't an exhaustive population list available (no list)
2. Not random
3. Can be effective when trying to generate ideas and get feedback; not considered representative of the whole population, it is more like feedback only from the sample
4. More convenient and less costly
- Convenience sampling
  - Uses subjects that are readily available or includes only people who are easy to reach
- Purposive sampling
  - Researcher looks for predefined groups that will serve as samples
Sample size (n)
- Number of respondents
- Most statisticians agree that the minimum sample size to get any kind of meaningful result is 100, but if the population is less than 100, try to get all of them
- A good maximum sample size is usually 10% of the population, as long as it does not exceed 1000
- The more samples, the better; the opinion of 1000 people is always better than that of 100
Determining sample size
1. Using a census for a small population (everyone is surveyed)
2. Using a sample size which is 10% of N
3. Using published tables
4. Using formulas to determine sample size: Slovin's formula
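A minimal Python sketch tying sample-size determination to sample selection: Slovin's formula n = N / (1 + N·e²), where e is the margin of error, followed by a simple random (fishbowl-style) draw and a systematic draw with a random start. Rounding n up to the next whole respondent is an assumption, and the function names and the population list are hypothetical.

```python
import math
import random


def slovin(population_size, margin_of_error):
    """Slovin's formula n = N / (1 + N * e^2), rounded up to a whole respondent."""
    return math.ceil(population_size / (1 + population_size * margin_of_error ** 2))


def simple_random_sample(population, n, rng=random):
    """Fishbowl-style draw: every member has an equal chance, no replacement."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng=random):
    """Every kth member, starting from a random position in the first interval."""
    k = len(population) // n          # sampling interval
    start = rng.randrange(k)          # random starting point
    return population[start::k][:n]


population = list(range(1, 1001))     # hypothetical list of N = 1000 members
n = slovin(len(population), 0.05)     # 286
rng = random.Random(42)
print(n, simple_random_sample(population, 5, rng), systematic_sample(population, 5, rng))
```

For N = 1000 and e = 0.05, Slovin's formula gives 1000 / 3.5 ≈ 285.7, so 286 respondents; a systematic sample of that size uses k = 1000 // 286 = 3.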
Hypothesis
- A premise or claim that we want to test
- Assumption about the population parameter
- An educated guess
Hypothesis testing
- Process of making an inference or generalization on population parameters based on the results of the study on samples
- Deciding between what is reality and what is a coincidence
Statistical hypothesis
- A guess or prediction made by the researcher regarding the possible outcome of the study
Null hypothesis (Ho)
- Default/established ("it is believed"); thought to be true unless it is rejected
- Currently accepted value for a parameter
- Always hoped to be rejected
- Always contains the "=" sign
- Status quo
- Hypothesis of equality
Alternative hypothesis (Ha)
- Also called the research hypothesis; involves the claim to be tested
- Used to contradict the null hypothesis
- Uses >, <, or ≠
- Generally represents the idea which the researcher wants to prove
Steps in hypothesis testing
1. Formulate Ho and Ha
2. Set the level of significance (α); it is usually given in the problem. The level of significance is the same as the margin of error
- Maximum tolerance of error (e.g., α = 0.05 goes with 95% confidence)
3. Formulate the decision rule (when to reject Ho); find the critical value/p-value
- Directional and non-directional (left OR right = 1-tailed test; left and right = 2-tailed test)
- Critical point: divides the rejection and acceptance regions (this is the decision rule)
- Value is taken from a statistical table; tabular value (z- or t-table)
4. Test statistic; do the computation
5. Make a decision
6. Write a conclusion
Level of significance, alpha, and the rejection region
- α = 0.05 means we accept a 5% probability of being wrong (rejecting Ho when it is actually true) and have 95% confidence of being right
Possible outcomes
- Reject the null hypothesis
- Fail to reject the null hypothesis
Test statistic
- Calculated from sample data and used to decide (either reject or fail to reject Ho)
- e.g., sample of 50 bars
Statistically significant
- Where do we draw the line to help us decide if we should reject or fail to reject the null hypothesis?
* If the test statistic satisfies the decision rule, it falls in the rejection region; if not, it falls in the acceptance region
Testing the hypothesized value of the mean
Level of confidence (C)
- How confident are we in our decision? (C = 1 − α; e.g., 95% when α = 0.05)
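A minimal Python sketch of testing a hypothesized value of the mean with a one-sample z test, following steps 3 to 6 of the hypothesis-testing procedure. It assumes σ is known (or n is large), takes critical values from the standard normal curve via `statistics.NormalDist` instead of a printed z-table, and uses hypothetical numbers; for a small sample with unknown σ a t test would be used instead.

```python
import math
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z test of Ho: mu = mu0 (sigma known or n large)."""
    std_normal = NormalDist()
    # Step 3: decision rule (critical value from the standard normal curve)
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
    elif tail == "left":
        critical = std_normal.inv_cdf(alpha)
    else:
        raise ValueError("tail must be 'two', 'right', or 'left'")
    # Step 4: test statistic
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Step 5: decision (is z in the rejection region?)
    if tail == "two":
        reject = abs(z) >= critical
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif tail == "right":
        reject = z >= critical
        p_value = 1 - std_normal.cdf(z)
    else:
        reject = z <= critical
        p_value = std_normal.cdf(z)
    return z, critical, p_value, reject


# Hypothetical example: Ho: mu = 100, Ha: mu != 100, alpha = 0.05
z, critical, p, reject = z_test_mean(sample_mean=103, mu0=100, sigma=10, n=50)
# Step 6: conclusion
print(f"z = {z:.3f}, critical = ±{critical:.3f}, p = {p:.4f}")
print("Reject Ho" if reject else "Fail to reject Ho")
```

Here |z| ≈ 2.12 exceeds the two-tailed critical value of about 1.96 (equivalently p < 0.05), so the sketch rejects Ho.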
__________________________________________
Testing the difference between two means
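A minimal Python sketch of testing the difference between two means with a two-tailed, large-sample z test for independent samples. It assumes both samples are large (n ≥ 30), so the sample SDs stand in for σ; for small samples a t test with a t-table critical value would be used instead. The function name and the group summaries are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_two_means(mean1, sd1, n1, mean2, sd2, n2, alpha=0.05):
    """Two-tailed large-sample z test of Ho: mu1 = mu2 for independent samples."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical, abs(z) >= critical


# Hypothetical example: two independent groups of 40 and 45 respondents
z, critical, reject = z_test_two_means(85, 8, 40, 80, 10, 45)
print(f"z = {z:.3f}, critical = ±{critical:.3f}:",
      "reject Ho" if reject else "fail to reject Ho")
```

The standard error combines the variability of both groups, so the same 5-point gap between means can be significant for large, consistent groups but not for small, widely spread ones.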
Source of variation | SS | df | MS | F-value (Computed) | F-value (Tabular) | Interpret.
Total | 508.58 | 44 | | | |
A | B | C | GT