UNIT 5
Statistical Modeling & R Markdown
Statistical Modeling Data
https://www.simplilearn.com/tutorials/statistics-tutorial/what-is-statistical-modeling
Statistics
https://www.geeksforgeeks.org/statistics-for-data-science/
https://www.dasca.org/world-of-data-science/article/what-is-statistical-modeling-in-data-science
ANOVA
https://www.youtube.com/watch?v=OypCNBPmGBY&list=PLEIbY8S8u_DIJJ1nZWGDXaG_e2YxwgyiA&ab_channel=Dr.PuspendraClasses
ANOVA (Analysis of Variance) is a statistical method used to compare the means of three or more groups. It tests the null hypothesis that all group means are equal; a significant result indicates that at least one group mean differs from the others.
Real-life Example: Suppose you're a teacher who wants to know if different teaching methods (e.g., traditional lecture, online lecture, and blended learning) have different effects on students' exam scores. You collect exam scores from students who were taught using these three different methods. ANOVA can help determine if the average scores differ significantly between these groups.
R code
# Create data for the example
set.seed(123) # For reproducibility
exam_scores <- data.frame(
  method = rep(c("Traditional", "Online", "Blended"), each = 10),
  score = c(rnorm(10, mean = 70, sd = 5),  # Traditional
            rnorm(10, mean = 75, sd = 5),  # Online
            rnorm(10, mean = 80, sd = 5))  # Blended
)
# View the first few rows of the data
head(exam_scores)
# Perform one-way ANOVA
anova_result <- aov(score ~ method, data = exam_scores)
# Summary of the ANOVA result
summary(anova_result)
Explanation of Code:
1. Data Preparation:
   o We create a dataset where method represents the three different teaching methods (Traditional, Online, Blended).
   o The score column contains exam scores generated using random normal distributions with different means for each group.
2. ANOVA Test:
   o We use the aov() function to perform ANOVA, with score as the dependent variable and method as the independent variable.
3. Result Interpretation:
   o summary(anova_result) reports an F-statistic and a p-value. If the p-value is less than 0.05, it suggests that at least one group mean is significantly different from the others.
Output Interpretation: If the p-value is significant (e.g., p < 0.05), you can conclude that the teaching method had a statistically significant effect on exam scores. Otherwise, the data do not provide sufficient evidence that the mean scores differ between the groups.
Two-Way ANOVA in R
Two-Way ANOVA is an extension of one-way ANOVA that examines the effect of two independent variables (factors) on a dependent variable. It also tests whether the two factors interact, that is, whether the effect of one factor changes across the levels of the other.
Real-Life Example: Suppose you want to investigate how teaching method (Traditional, Online, and Blended) and student gender (Male and Female) impact student exam scores. In this case:
• Factor 1: Teaching Method (Traditional, Online, Blended)
• Factor 2: Gender (Male, Female)
• Dependent Variable: Exam Scores
Hypotheses Tested in Two-Way ANOVA:
1. Main Effect of Factor 1 (Teaching Method): Does the teaching method affect student performance?
2. Main Effect of Factor 2 (Gender): Does gender affect student performance?
3. Interaction Effect: Does the effect of teaching method on exam performance depend on gender?
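The three hypotheses above correspond to three terms in an R model formula. The following is a minimal sketch, assuming the response and factor columns are named score, method, and gender (the variable names two_way, explicit, and term_labels are illustrative only). It shows that the * shorthand expands to both main effects plus the method:gender interaction term.

```r
# Illustrative formula: score is the response, method and gender are
# the two factors. No data is needed to inspect the formula's terms.
two_way <- score ~ method * gender

# terms() expands the * shorthand; "term.labels" lists the resulting terms.
term_labels <- attr(terms(two_way), "term.labels")
print(term_labels)

# Writing the main effects and the interaction out explicitly
# yields the same set of terms.
explicit <- score ~ method + gender + method:gender
same_terms <- identical(term_labels, attr(terms(explicit), "term.labels"))
print(same_terms)
```

Each term gets its own row in the ANOVA table, so hypotheses 1 to 3 are tested by the method, gender, and method:gender rows respectively.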
R Code for Two-Way ANOVA:
# Create a data frame with exam scores, method, and gender
exam_data <- data.frame(
  method = rep(c("Traditional", "Online", "Blended"), each = 10),
  gender = rep(c("Male", "Female"), each = 5, times = 3),
  score = c(65, 70, 68, 72, 67, 69, 71, 66, 70, 68,  # Traditional - Male/Female
            75, 78, 77, 76, 74, 77, 76, 75, 79, 78,  # Online - Male/Female
            80, 82, 81, 84, 83, 85, 80, 82, 81, 83)  # Blended - Male/Female
)
# View the data
print(exam_data)
# Perform two-way ANOVA
anova_result <- aov(score ~ method * gender, data = exam_data)
# Summary of the ANOVA result
summary(anova_result)
Explanation of the Code:
1. Data Creation: We create a data frame exam_data that contains three variables:
   o method: The teaching method (Traditional, Online, Blended).
   o gender: The gender of the students (Male, Female).
   o score: The exam scores corresponding to each group (provided manually).
2. Two-Way ANOVA Test:
   o We use aov(score ~ method * gender, data = exam_data) to perform a two-way ANOVA. The * symbol in the formula tests both the main effects of method and gender, as well as their interaction.
3. ANOVA Summary: The summary(anova_result) function prints the results, including the F-values and p-values for the main effects of method, gender, and their interaction.
Output Explanation:
1. Main Effect of Teaching Method (method):
   o Df: Degrees of freedom for the method variable is 2 (3 levels of method - 1).
   o Sum Sq: Sum of squares for the method.
   o F value: The F-statistic for the teaching method.
   o Pr(>F): The p-value is extremely small, indicating that teaching method has a significant effect on exam scores.
2. Main Effect of Gender (gender):
   o Df: Degrees of freedom for gender is 1 (2 levels of gender - 1).
   o Sum Sq: Sum of squares for gender.
   o F value: The F-statistic for gender.
   o Pr(>F): The p-value is above 0.05, indicating that gender does not have a significant effect on exam scores.
3. Interaction Effect (method:gender):
   o Df: Degrees of freedom for the interaction between method and gender is 2, i.e. (3 - 1) x (2 - 1).
   o Sum Sq: Sum of squares for the interaction effect.
   o F value: The F-statistic for the interaction.
   o Pr(>F): The p-value is above 0.05, indicating that there is no significant interaction between teaching method and gender in terms of their impact on exam scores.
4. Residuals:
   o The residual sum of squares represents the unexplained variability after considering both factors and their interaction. With 30 observations and 6 method-by-gender groups, the residual degrees of freedom are 30 - 6 = 24.
Conclusion:
• Teaching Method has a significant effect on student exam scores.
• Gender does not significantly affect exam scores.
• Interaction between Teaching Method and Gender is not significant, meaning the effect of teaching method on exam performance does not depend on gender.
Correlation Plots
Simple Linear Regression
Multiple Linear Regressions
Logistic Regression
Clustering model
R Markdown: Introduction
Code Chunks
Markdown Basics
R Notebooks
Output Formats
Implement Regression techniques and Clustering Model using R