Unit 5-DVER

UNIT 5

Statistical Modeling & R Markdown


Statistical Modeling Data
https://www.simplilearn.com/tutorials/statistics-tutorial/what-is-statistical-modeling
Statistics
https://www.geeksforgeeks.org/statistics-for-data-science/
https://www.dasca.org/world-of-data-science/article/what-is-statistical-modeling-in-data-science
ANOVA
https://www.youtube.com/watch?v=OypCNBPmGBY&list=PLEIbY8S8u_DIJJ1nZWGDXaG_e2YxwgyiA&ab_channel=Dr.PuspendraClasses
ANOVA (Analysis of Variance) is a statistical method used to compare the means of three or more groups
to see if they are significantly different from each other. It does this by comparing the variability between
the group means with the variability within the groups.
Real-life Example:
Suppose you're a teacher who wants to know if different teaching methods (e.g., traditional lecture, online
lecture, and blended learning) have different effects on students' exam scores. You collect exam scores from
students who were taught using these three different methods. ANOVA can help determine if the average
scores differ significantly between these groups.
R code
# Create data for the example
set.seed(123) # For reproducibility
exam_scores <- data.frame(
  method = rep(c("Traditional", "Online", "Blended"), each = 10),
  score = c(rnorm(10, mean = 70, sd = 5),  # Traditional
            rnorm(10, mean = 75, sd = 5),  # Online
            rnorm(10, mean = 80, sd = 5))  # Blended
)

# View the first few rows of the data
head(exam_scores)

# Perform one-way ANOVA
anova_result <- aov(score ~ method, data = exam_scores)

# Summary of the ANOVA result
summary(anova_result)
Explanation of Code:
1. Data Preparation:
o We create a dataset where method represents the three different teaching methods
(Traditional, Online, Blended).
o The score column contains exam scores generated using random normal distributions with
different means for each group.
2. ANOVA Test:
o We use the aov() function to perform ANOVA, with score as the dependent variable and
method as the independent variable.
3. Result Interpretation:
o summary(anova_result) reports an F-statistic and a p-value. If the p-value is less than 0.05, it
suggests that at least one group mean is significantly different from the others. The test does
not indicate which group differs.
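To make the F-statistic less of a black box, the following is a minimal sketch that recomputes it by hand for the same simulated data and compares it with the value from aov(). The helper names (grand_mean, ss_between, ss_within, f_manual) are illustrative and not part of the original notes.

```r
# Recreate the simulated data from the example above
set.seed(123)
exam_scores <- data.frame(
  method = rep(c("Traditional", "Online", "Blended"), each = 10),
  score = c(rnorm(10, mean = 70, sd = 5),
            rnorm(10, mean = 75, sd = 5),
            rnorm(10, mean = 80, sd = 5))
)

grand_mean  <- mean(exam_scores$score)
group_means <- tapply(exam_scores$score, exam_scores$method, mean)
group_sizes <- tapply(exam_scores$score, exam_scores$method, length)

k <- length(group_means)  # number of groups
n <- nrow(exam_scores)    # total number of observations

# Between-group variability: how far each group mean is from the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)

# Within-group variability: how far each score is from its own group mean
ss_within <- sum((exam_scores$score - group_means[exam_scores$method])^2)

# F = mean square between / mean square within
f_manual <- (ss_between / (k - 1)) / (ss_within / (n - k))

# F-statistic reported by aov() for comparison
f_aov <- summary(aov(score ~ method, data = exam_scores))[[1]][["F value"]][1]

all.equal(f_manual, f_aov)
```

A large F means the group means are spread out much more than the scatter inside each group would explain.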
Output Interpretation:
If the p-value is significant (e.g., p < 0.05), you can conclude that the teaching methods had a statistically
significant effect on exam scores. Otherwise, the data do not provide enough evidence that the mean scores
differ between the groups.
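The decision rule described above can also be applied in code rather than by reading the printed table. This is a sketch that assumes the usual 0.05 threshold; the variable names alpha, f_value and p_value are introduced here for illustration.

```r
# Recreate the simulated data and fit the one-way ANOVA
set.seed(123)
exam_scores <- data.frame(
  method = rep(c("Traditional", "Online", "Blended"), each = 10),
  score = c(rnorm(10, mean = 70, sd = 5),
            rnorm(10, mean = 75, sd = 5),
            rnorm(10, mean = 80, sd = 5))
)
anova_result <- aov(score ~ method, data = exam_scores)

# summary() returns a list; its first element is the ANOVA table
anova_table <- summary(anova_result)[[1]]

# The first row belongs to the factor 'method'
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

alpha <- 0.05  # assumed significance level
if (p_value < alpha) {
  message("Reject H0: at least one teaching method has a different mean score.")
} else {
  message("Fail to reject H0: no evidence that the mean scores differ.")
}
```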
Two-Way ANOVA in R
Two-Way ANOVA is an extension of one-way ANOVA that examines the effect of two independent
variables (factors) on a dependent variable. It also allows for testing interactions between the two factors.
This method helps to determine whether the means of different groups are significantly different when
considering two factors.
Real-Life Example:
Suppose you want to investigate how teaching method (Traditional, Online, and Blended) and student
gender (Male and Female) impact student exam scores. In this case:
• Factor 1: Teaching Method (Traditional, Online, Blended)
• Factor 2: Gender (Male, Female)
• Dependent Variable: Exam Scores
Hypotheses Tested in Two-Way ANOVA:
1. Main Effect of Factor 1 (Teaching Method): Does the teaching method affect student
performance?
2. Main Effect of Factor 2 (Gender): Does gender affect student performance?
3. Interaction Effect: Does the effect of teaching method on exam performance depend on gender?
R Code for Two-Way ANOVA:
# Create a data frame with exam scores, method, and gender
exam_data <- data.frame(
  method = rep(c("Traditional", "Online", "Blended"), each = 10),
  gender = rep(c("Male", "Female"), each = 5, times = 3),
  score = c(65, 70, 68, 72, 67, 69, 71, 66, 70, 68,  # Traditional - Male/Female
            75, 78, 77, 76, 74, 77, 76, 75, 79, 78,  # Online - Male/Female
            80, 82, 81, 84, 83, 85, 80, 82, 81, 83)  # Blended - Male/Female
)

# View the data
print(exam_data)

# Perform two-way ANOVA
anova_result <- aov(score ~ method * gender, data = exam_data)

# Summary of the ANOVA result
summary(anova_result)
Explanation of the Code:
1. Data Creation: We create a data frame exam_data that contains three variables:
o method: The teaching method (Traditional, Online, Blended).
o gender: The gender of the students (Male, Female).
o score: The exam scores corresponding to each group (provided manually).
2. Two-Way ANOVA Test:
o We use aov(score ~ method * gender, data = exam_data) to perform a two-way ANOVA.
The * symbol in the formula tests both the main effects of method and gender, as well as
their interaction.
3. ANOVA Summary: The summary(anova_result) function prints the results, including the F-values
and p-values for the main effects of method, gender, and their interaction.
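As a small illustration of what the * symbol does, the sketch below fits the same model with the formula written out in full and checks that both versions contain the same terms. The object names fit_star, fit_long and fit_additive are illustrative only.

```r
exam_data <- data.frame(
  method = rep(c("Traditional", "Online", "Blended"), each = 10),
  gender = rep(c("Male", "Female"), each = 5, times = 3),
  score = c(65, 70, 68, 72, 67, 69, 71, 66, 70, 68,
            75, 78, 77, 76, 74, 77, 76, 75, 79, 78,
            80, 82, 81, 84, 83, 85, 80, 82, 81, 83)
)

# Shorthand: main effects plus interaction
fit_star <- aov(score ~ method * gender, data = exam_data)

# The same model written out term by term
fit_long <- aov(score ~ method + gender + method:gender, data = exam_data)

# Both formulas expand to the same three terms
attr(terms(fit_star), "term.labels")
attr(terms(fit_long), "term.labels")

# Using + alone fits the main effects only, with no interaction term
fit_additive <- aov(score ~ method + gender, data = exam_data)
attr(terms(fit_additive), "term.labels")
```

Use method + gender when the interaction is not of interest, and method * gender when you want to test whether the effect of one factor depends on the other.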
Output Explanation:
1. Main Effect of Teaching Method (method):
o Df: Degrees of freedom for the method variable is 2 (3 levels of method - 1).
o Sum Sq: Sum of squares for the method.
o F value: The F-statistic for the teaching method.
o Pr(>F): The p-value is extremely small, indicating that teaching method has a significant
effect on exam scores.
2. Main Effect of Gender (gender):
o Df: Degrees of freedom for gender is 1 (2 levels of gender - 1).
o Sum Sq: Sum of squares for gender.
o F value: The F-statistic for gender.
o Pr(>F): The p-value is larger than 0.05, indicating that gender does not have a significant
effect on exam scores.
3. Interaction Effect (method:gender):
o Df: Degrees of freedom for the interaction between method and gender is 2.
o Sum Sq: Sum of squares for the interaction effect.
o F value: The F-statistic for the interaction.
o Pr(>F): The p-value is larger than 0.05, indicating that there is no significant interaction
between teaching method and gender in terms of their impact on exam scores.
4. Residuals:
o The residual sum of squares represents the unexplained variability after considering both
factors and their interaction.
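The columns described above can be pulled out of the summary object directly, which is handy when the values are needed in a report. This is a minimal sketch; trimming the row names is an optional convenience, because the printed table pads them with spaces.

```r
exam_data <- data.frame(
  method = rep(c("Traditional", "Online", "Blended"), each = 10),
  gender = rep(c("Male", "Female"), each = 5, times = 3),
  score = c(65, 70, 68, 72, 67, 69, 71, 66, 70, 68,
            75, 78, 77, 76, 74, 77, 76, 75, 79, 78,
            80, 82, 81, 84, 83, 85, 80, 82, 81, 83)
)
anova_result <- aov(score ~ method * gender, data = exam_data)

# The ANOVA table is the first element of the summary
anova_table <- summary(anova_result)[[1]]
rownames(anova_table) <- trimws(rownames(anova_table))

# Rows: method, gender, method:gender, Residuals
anova_table[, c("Df", "Sum Sq", "F value", "Pr(>F)")]

# Residual degrees of freedom = observations - number of method/gender cells
n_obs   <- nrow(exam_data)  # 30 scores
n_cells <- 3 * 2            # 3 methods x 2 genders
n_obs - n_cells             # 24, matching the Residuals row
```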
Conclusion:
• Teaching Method has a significant effect on student exam scores.
• Gender does not significantly affect exam scores.
• Interaction between Teaching Method and Gender is not significant, meaning the effect of
teaching method on exam performance does not depend on gender.
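These conclusions can be cross-checked against the group means themselves. The sketch below uses tapply() to compute the mean score per method, per gender, and per method-gender cell; the object names are illustrative.

```r
exam_data <- data.frame(
  method = rep(c("Traditional", "Online", "Blended"), each = 10),
  gender = rep(c("Male", "Female"), each = 5, times = 3),
  score = c(65, 70, 68, 72, 67, 69, 71, 66, 70, 68,
            75, 78, 77, 76, 74, 77, 76, 75, 79, 78,
            80, 82, 81, 84, 83, 85, 80, 82, 81, 83)
)

# Mean score per teaching method: the means are far apart
method_means <- tapply(exam_data$score, exam_data$method, mean)
method_means

# Mean score per gender: the means are close together
gender_means <- tapply(exam_data$score, exam_data$gender, mean)
gender_means

# Mean score for every method-gender combination
cell_means <- tapply(exam_data$score,
                     list(exam_data$method, exam_data$gender), mean)
cell_means

# Female-minus-male gap within each method; similar small gaps across
# methods are consistent with a non-significant interaction
gender_gap <- cell_means[, "Female"] - cell_means[, "Male"]
gender_gap
```

The method means differ by several points, while the gender means and the per-method gender gaps differ by about one point or less, which matches the pattern in the ANOVA table.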
Correlation Plots
Simple Linear Regression
Multiple Linear Regressions
Logistic Regression
Clustering model
R Markdown: Introduction
Code Chunks
Markdown Basics
R Notebooks
Output Formats
Implement Regression techniques and Clustering Model using R
