
Presentation #1

Contingency Tables
Contingency tables are tools used in statistics to summarize the relationship between two
categorical variables. Each entry in the table represents the frequency count of the
occurrences of specific combinations of variables.

Testing Dependency
The dependency between variables in a contingency table is tested using the Chi-square (χ2)
test and Fisher’s Exact test, depending on the sample size and expected frequencies:

Chi-square Test: Suitable for larger sample sizes (n > 40, or when 20 < n ≤ 40 and no
expected frequency is < 5). It checks the independence of two variables by comparing
observed frequencies with the frequencies expected under the assumption that the
variables are independent.

Decision rule: If the computed chi-square statistic exceeds the critical value from the
chi-square distribution table, we reject the null hypothesis (H0) and conclude that there is a
dependency between the variables.

Fisher’s Exact Test: Used when sample sizes are smaller (n ≤ 20 or when 20 < n ≤ 40 and any
expected frequency is < 5). This test is more accurate when the sample size is too small for
the χ2 test to be reliable.

Null Hypothesis (H0): Variables are independent.
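As an illustration, a minimal sketch using SciPy on a hypothetical 2x2 table (all counts below are invented for demonstration):

```python
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2x2 contingency table: rows = exposure (yes/no),
# columns = outcome (yes/no). Counts are invented for illustration.
table = [[20, 30],
         [10, 40]]

# Chi-square test of independence (compares observed vs. expected counts)
chi2, p_chi2, dof, expected = chi2_contingency(table)
print(f"Chi-square = {chi2:.3f}, p-value = {p_chi2:.3f}")
print("Expected frequencies:", expected)

# Fisher's exact test (preferred when the sample or expected counts are small)
odds_ratio, p_fisher = fisher_exact(table)
print(f"Fisher's exact test p-value = {p_fisher:.3f}")

# Reject H0 (independence) when the p-value is below the chosen alpha level
alpha = 0.05
print("Reject H0:", p_chi2 < alpha)
```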

Measures of Association
Relative Risk (RR) and Odds Ratio (OR)
Relative Risk (RR): This tells you how much more likely something is to happen in one group
compared to another. For example, if the relative risk of developing a disease among
smokers compared to non-smokers is 2, it means smokers are twice as likely to develop the
disease compared to non-smokers.

Odds Ratio (OR): This compares the odds of something happening in one group to the odds
of it happening in another group. It's commonly used in case-control studies. An odds ratio
of 1 means there's no difference in odds between the groups. If it's greater than 1, it means
the event is more likely to happen in the first group, and if it's less than 1, it's more likely to
happen in the second group.

Attributable Risk (AR)


If smokers develop a disease with a probability of 10% and non-smokers develop it with a
probability of 2%, then AR = 10% − 2% = 8%, meaning that among smokers, 8 percentage
points of the disease risk are attributable to smoking.
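A short sketch of these three measures computed from a hypothetical 2x2 table (exposed vs. unexposed, diseased vs. healthy); all counts are invented and chosen to match the 10% vs. 2% example above:

```python
# Hypothetical counts: [diseased, healthy] per group
exposed = [10, 90]      # e.g., smokers
unexposed = [2, 98]     # e.g., non-smokers

risk_exposed = exposed[0] / sum(exposed)        # 0.10
risk_unexposed = unexposed[0] / sum(unexposed)  # 0.02

# Relative Risk: ratio of the two risks
rr = risk_exposed / risk_unexposed

# Odds Ratio: ratio of the odds of disease in each group
odds_exposed = exposed[0] / exposed[1]
odds_unexposed = unexposed[0] / unexposed[1]
odds_ratio = odds_exposed / odds_unexposed

# Attributable Risk: absolute difference in risk
ar = risk_exposed - risk_unexposed

print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}, AR = {ar:.2%}")
```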
More Tests
McNemar Test: Used when the two samples are paired, e.g., before/after measurements on
the same subjects.
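A minimal sketch using statsmodels on a hypothetical before/after table (the counts are invented; exact=True uses the binomial version of the test, which suits small samples):

```python
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical paired data: rows = before (positive/negative),
# columns = after (positive/negative); the off-diagonal cells drive the test.
table = [[30, 5],
         [15, 50]]

result = mcnemar(table, exact=True)
print(f"statistic = {result.statistic}, p-value = {result.pvalue:.3f}")
```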

Terminology to remember
Nominal Data: Data categorized without an inherent order; distinct categories based on
names or labels only.
Ordinal Data: Data arranged in a specific sequence where the order is meaningful, but
differences between values are not consistent.
Contingency Table: A matrix format that displays the frequency distribution of variables to
study the dependency between them.
Dependency: A situation in statistical terms where the occurrence of one event is
influenced by the presence of another.
Chi-square Test: A statistical method used to assess whether observed frequencies in a
contingency table differ significantly from expected frequencies.
Fisher Test: Also known as Fisher’s Exact Test, used for exact hypothesis testing on
categorical data, especially useful with small sample sizes.
Observed Frequency: The actual number of observations in each category or cell of a
contingency table.
Expected Frequency: The theoretical frequency calculated under the null hypothesis of
independence in a contingency table.
Relative Risk (RR): Measures the ratio of the probability of an event occurring in an exposed
group compared to a non-exposed group.
PRE Measure: Proportional reduction in error; indicates how much better one can predict
the dependent variable by knowing the independent variable.
Chi-square Critical Value: The threshold value against which the calculated chi-square
statistic is compared to decide whether to reject the null hypothesis.
Alpha Level (α): The threshold of probability at which you reject the null hypothesis;
commonly set at 0.05 or 5%.
Independence in Statistics: The scenario where the occurrence of one event does not affect
the probability of occurrence of another event.
Power of the Test: The likelihood that a test will correctly reject the null hypothesis when it
is false, i.e., it will detect an effect if there is one.
Coefficient of Association: A measure used to quantify the strength and direction of the
relationship between two nominal variables.
Fisher’s Exact Probability Test: A statistical significance test used in the analysis of 2x2
contingency tables.
Null Hypothesis (H0): A general statement or default position that there is no relationship
between two measured phenomena.
Alternative Hypothesis (H1): The hypothesis that sample observations are influenced by
some non-random cause.
P-value: The probability of observing test results at least as extreme as the results actually
observed, under the assumption that the null hypothesis is correct.
Cramer’s V: A measure of association between two nominal variables, giving a value
between 0 and 1 where 0 indicates no association.
Presentation #2
Simple Linear Regression Analysis
Linear regression models the relationship between two variables where one variable
(dependent, Y) is considered a function of the other variable (independent, X).

The equation for simple linear regression is typically expressed as Y = a + bX, where:
• a (intercept) is the constant term.
• b (regression coefficient) represents the change in the dependent variable for a one-
unit change in the independent variable.
Parameter Estimation:
The parameters a and b are estimated from a random sample of (X, Y) observations,
typically by ordinary least squares, so that the sample-based estimates stand in for the
unknown population values.
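A minimal sketch using SciPy to estimate a and b from a small, made-up sample:

```python
from scipy.stats import linregress

# Hypothetical sample: X = age, Y = monthly grocery spending (invented data)
x = [25, 32, 40, 47, 55, 61]
y = [180, 210, 240, 255, 290, 310]

result = linregress(x, y)
print(f"intercept a = {result.intercept:.2f}")
print(f"slope b = {result.slope:.2f}")          # change in Y per one-unit change in X
print(f"correlation r = {result.rvalue:.3f}")
print(f"r^2 = {result.rvalue**2:.3f}")          # coefficient of determination
```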

Correlation analysis
Coefficient of Correlation (r):
Measures the strength and direction of the linear relationship between two variables.
Values range from -1 to +1:
• +1 indicates a perfect positive linear relationship,
• -1 indicates a perfect negative linear relationship,
• 0 indicates no linear relationship.
Coefficient of Determination (r2):
Represents the proportion of the variance in the dependent variable that is predictable from
the independent variable.
It measures how well the variability in one variable predicts the variability in the other.

Testing in regression analysis


Significance of Parameters (t-test):
• Helps us determine whether the relationship between an independent variable (like age)
and the dependent variable (like grocery spending) is statistically meaningful.
Significance of the Regression Model (F-test):
• This test shows whether the overall model, which includes all the independent
variables together, is useful in predicting the dependent variable. Continuing with the
grocery spending example, suppose the model includes age, income, and
location as predictors of grocery spending. The F-test determines whether this
whole model, including all these factors, predicts grocery spending better
than simply guessing at random.

In essence, the t-test looks at each individual predictor's significance, while the F-test looks
at the overall usefulness of the entire model in predicting the outcome.
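A hedged sketch using statsmodels (the predictors and data below are invented; the fitted model exposes each coefficient's t-test and the overall F-test):

```python
import numpy as np
import statsmodels.api as sm

# Invented data: predict grocery spending from age and income
rng = np.random.default_rng(0)
age = rng.uniform(20, 65, size=50)
income = rng.uniform(1500, 5000, size=50)
spending = 50 + 1.5 * age + 0.05 * income + rng.normal(0, 20, size=50)

X = sm.add_constant(np.column_stack([age, income]))  # adds the intercept term
model = sm.OLS(spending, X).fit()

print(model.tvalues, model.pvalues)   # t-tests: significance of each coefficient
print(model.fvalue, model.f_pvalue)   # F-test: significance of the whole model
```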

Terminology to remember
Linearity:
The relationship between the independent and dependent variable must be linear.
Verification is through visual inspection of data plots or statistical tests.
Homoscedasticity:
The variance of the errors is constant across all values of the independent variable.
Ordinary Least Squares (OLS) - Method for estimating the parameters in a regression model
by finding the intercept and slope of the line that minimizes the sum of squared residuals.
Intercept (a) - The expected mean value of Y when all X=0.
Regression Coefficient (b) - Represents the change in the dependent variable for a one-unit
change in the independent variable.
Coefficient of Correlation (r) - Measures the strength and direction of a linear relationship
between two variables.
Coefficient of Determination (r2) - Proportion of the variance in the dependent variable that
is predictable from the independent variable.
Testing in Regression - Processes like t-tests and F-tests to assess the significance of
regression models and their coefficients.
Presentation #3
Time Series Fundamentals
A time series is a sequence of data points recorded at consistent time intervals. This data
can be analyzed from several perspectives:

Fixed Moment and Interval: Refers to specific points in time versus continuous intervals.
Periodicity: Divided into short-term and long-term series, reflecting the length of time over
which data are collected.
Variable Types: Original variables (raw data) and derived variables (calculated or processed
data).
Unit of Measure: Natural variables (raw units like liters, meters) and financial variables
(monetary units).

Descriptive Statistics in Time Series


Level of Time Series: This could be simple measures like the arithmetical average of data
points, e.g., average annual consumption of chicken meat.
Dynamics of Development: Measured either in absolute terms (e.g., first and second
differences of the data points) or relative terms (fixed-base and chain-base indexes).
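A minimal sketch of these descriptive measures on a short, made-up annual series:

```python
import numpy as np

# Hypothetical annual consumption values (invented)
y = np.array([10.0, 11.5, 12.0, 13.8, 14.2])

level = y.mean()                        # level of the series (arithmetic average)
first_diff = np.diff(y)                 # absolute year-to-year changes
second_diff = np.diff(y, n=2)           # differences of the first differences
fixed_base_index = y / y[0]             # each year relative to the base year
chain_base_index = y[1:] / y[:-1]       # each year relative to the previous year

print(level, first_diff, fixed_base_index, chain_base_index)
```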
Time Series Models and Decomposition
Trend Analysis: Identifying the underlying trend in data which could be linear, quadratic,
exponential, etc., using different smoothing techniques such as moving averages or trend
functions.
Components of Time Series: Include the trend, periodic (seasonal) fluctuations, cyclic
components, and irregular movements.

Trend Estimation and Forecasting


Ordinary Least Squares (OLS): A method used to estimate the parameters in a linear
regression model, minimizing the sum of squared residuals.
Forecasting: Involves both point estimates and interval estimates to predict future values,
taking into account the confidence interval around the predicted value to account for
uncertainty.
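A brief sketch fitting a linear trend by OLS and producing a point forecast (the data are invented; an interval estimate would add a confidence band around the point forecast):

```python
import numpy as np

# Hypothetical quarterly observations (invented)
y = np.array([102, 105, 109, 111, 116, 118, 123, 125], dtype=float)
t = np.arange(1, len(y) + 1)

# OLS fit of a linear trend y = a + b*t (minimizes the sum of squared residuals)
b, a = np.polyfit(t, y, 1)

# Point forecast for the next period
t_next = len(y) + 1
forecast = a + b * t_next
print(f"trend: y = {a:.2f} + {b:.2f}*t, forecast for t={t_next}: {forecast:.2f}")
```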

Exponential Smoothing
Adaptive Models: These models adjust the parameters over time and do not assume
stability of the trend. Exponential smoothing is a key example where recent observations
are given more weight, decreasing exponentially into the past.
Types of Exponential Smoothing:
Simple Exponential Smoothing: Used when data has no trend or seasonality.
Double and Triple Exponential Smoothing: Handle data with trends (linear or quadratic).
Holt and Winters Methods: For data with trends and seasonal patterns.
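Following the types listed above, a minimal sketch of simple exponential smoothing implemented directly from its recurrence s_t = α·y_t + (1 − α)·s_{t−1} (the series and the smoothing constant are invented):

```python
def simple_exponential_smoothing(y, alpha):
    """Return the smoothed series; the last value serves as a one-step-ahead forecast."""
    smoothed = [y[0]]                     # initialize with the first observation
    for value in y[1:]:
        # recent observations get weight alpha; older ones decay exponentially
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

y = [52, 50, 55, 53, 58, 56, 60]          # hypothetical observations
print(simple_exponential_smoothing(y, alpha=0.3))
```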
Analysis of Seasonality
Seasonal Indices: Measures that quantify the seasonal pattern within a time series, allowing
adjustments to forecasts to account for seasonality.
Calculation of Seasonal Indices: Involves smoothing data points, averaging them over
periods, and comparing these averages to the overall trend to determine seasonal effects.

Forecast Accuracy and Model Fit


Accuracy Measures: Include mean error (ME), mean square error (MSE), mean absolute
error (MAE), and mean absolute percentage error (MAPE).
Model Evaluation: The fit of a model can be assessed using measures such as the index of
determination and error statistics to determine the effectiveness of a model at capturing
the data.
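A short sketch computing these accuracy measures for a hypothetical set of actual and fitted values:

```python
import numpy as np

actual = np.array([100, 105, 98, 110, 107], dtype=float)   # invented data
fitted = np.array([102, 103, 99, 108, 109], dtype=float)

errors = actual - fitted
me = errors.mean()                              # mean error
mse = (errors ** 2).mean()                      # mean square error
mae = np.abs(errors).mean()                     # mean absolute error
mape = np.abs(errors / actual).mean() * 100     # mean absolute percentage error

print(f"ME={me:.2f}, MSE={mse:.2f}, MAE={mae:.2f}, MAPE={mape:.2f}%")
```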

Terminology to remember
Time Series: Data points indexed in time order.
OLS: Method for estimating the unknown parameters in a linear regression model.
Trend: The underlying long-run pattern of a time series, excluding irregular effects.
Stationarity: A statistical characteristic of a time series whose mean and variance are
constant over time.
Seasonality: Repeating patterns or cycles of behavior over time.
Seasonal Index: A measure used to adjust predictions based on seasonal variations.
Forecast: A calculation or estimate of future events.
Exponential Smoothing: A rule of weighted moving averages where weights decrease
exponentially.
Accuracy: The closeness of a measured or calculated value to its true value.
Smoothing Constant (α): The weighting applied to the most recent period's value in
exponential smoothing.
Presentation #4
Introduction to Index Numbers
Index numbers are statistical measures designed for comparing quantities over different
periods or different entities. They are primarily used to measure changes in economic data
such as price levels, quantities, or other financial indicators over time.

Main Aims of Index Numbers


Comparison: Index numbers allow for both absolute (differences) and relative
(proportional) comparisons.
Time Factor: They analyze changes over time, providing insights into economic trends.
Spatial Factor: They compare different geographical locations or sectors.

Determination and Types of Indicators


Indicators can be classified based on:
Finding Method:
• Primary: Directly measured.
• Secondary (Derived): Calculated from primary data.
Expression Method:
• Absolute
• Relative
Time Period:
• Moment: Specific point in time.
• Interval: Span of time.
Variable Type:
• Homogenous: Similar items.
• Non-homogenous: Diverse items.
Nature of Variables:
• Extensity: Extent of activity or usage.
• Intensity: Degree or strength of activity.

Types of Indices
Individual Indices:
• Simple: Measures a single variable.
• Composite: Combines multiple simple indices.
Aggregate Indices:
Combine data from non-homogenous variables to create a unified index.

Examples and Applications


Industry Data: Changes in employment and wages over years in various sectors.
City Branch Data: Employee changes and wage adjustments in different cities.
Product Sales: Monthly sales changes and earnings for products.
Key Indices Explained
Laspeyres Index:
• Uses base period quantities as weights.
• Commonly used to measure price changes.
Paasche Index:
• Uses current period quantities as weights.
• Helps in reflecting the current economic conditions.
Fisher Index:
• Geometric mean of Laspeyres and Paasche indices.
• Considered more robust as it evens out the biases of both indices.
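A minimal sketch of the three price indices computed from their standard formulas, using invented prices p and quantities q for a base period (0) and a current period (1):

```python
import numpy as np

# Hypothetical prices and quantities for three products (invented)
p0 = np.array([2.0, 5.0, 1.5])   # base-period prices
p1 = np.array([2.2, 5.5, 1.4])   # current-period prices
q0 = np.array([100, 40, 250])    # base-period quantities
q1 = np.array([110, 35, 260])    # current-period quantities

laspeyres = (p1 * q0).sum() / (p0 * q0).sum()   # base-period quantities as weights
paasche = (p1 * q1).sum() / (p0 * q1).sum()     # current-period quantities as weights
fisher = np.sqrt(laspeyres * paasche)           # geometric mean of the two

print(f"Laspeyres={laspeyres:.4f}, Paasche={paasche:.4f}, Fisher={fisher:.4f}")
```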

Decomposition of Indices
Indices can be decomposed to understand the influence of different components like price
and quantity on the overall index value. This can be done using:

Index of Constant Composition (ICC): Holds one element constant to measure the effect of
the other.
Index of Structure (ISTR): Measures the structural changes.

Practical Computations
Various exercises are provided to compute changes in:
• Employee numbers and wages in industries and branches.
• Quantity sold and earnings for specific products.

Terminology to remember
Base Period: The time against which all comparisons are made.
Current Period: The time period under analysis for changes.
Simple Index: Measures one variable directly.
Composite Index: Combines several simple indices.
Aggregate Index: Combines different data types into a unified measure.
Laspeyres Index: Uses base period quantities as weights.
Paasche Index: Uses current period quantities for weighting.
Fisher Index: Geometric mean of Laspeyres and Paasche indices.
Extensity: Extent or scope of activity measured.
Intensity: Intensity or level of activity.
Decomposition: Breaking down an index into components to analyze effects separately.
