R Cac1

CAC – 1 :

ASSIGNMENT COMPONENT

Name : Aaditya Kumar Dhaka

Registration no : 2448001
Course Code : MDS 272
Course Name : Inferential Statistics Using R
R-Codes:

# Load the 'mtcars' dataset

data("mtcars")

# Example 1: Population vs. Sample

Objective: Visualize the difference between the entire population's distribution of mpg (miles per
gallon) in the mtcars dataset and a random sample of that data.

# Set a seed for reproducibility and draw a random sample of 10 rows

set.seed(123)
sample_data <- mtcars[sample(1:nrow(mtcars), 10), ]

# Plot population vs. sample for 'mpg' (Miles per Gallon)

pop_hist <- hist(mtcars$mpg, breaks = 10, col = rgb(0, 0, 1, 0.5),
                 main = "Population vs Sample Distribution of 'mpg'",
                 xlab = "Miles per Gallon (mpg)", ylab = "Frequency")

# Reuse the population's bin edges so the two histograms line up
hist(sample_data$mpg, breaks = pop_hist$breaks, col = rgb(1, 0, 0, 0.5), add = TRUE)

# Add a legend
legend("topright", legend = c("Population", "Sample"),
       fill = c(rgb(0, 0, 1, 0.5), rgb(1, 0, 0, 0.5)))
Interpretation: The blue bars show the mpg distribution for all 32 cars (treated here as the population), while the red bars
show the mpg distribution for a random sample of 10 cars. Because the sample is drawn at random, it tends to follow the
population's overall pattern, but with only 10 observations its histogram can differ noticeably from the population's:
some bins may be empty and others over-represented. The sample still provides a snapshot that is useful for
estimation or preliminary analysis.

# Example 2: Sampling Distribution of the Mean

Objective: Visualize the sampling distribution of the sample means of mpg to understand how sample
means approximate the population mean when sampling repeatedly.

set.seed(123)
sample_means <- replicate(1000, mean(sample(mtcars$mpg, size = 10, replace = TRUE)))

# Plot the sampling distribution of the sample means

hist(sample_means, breaks = 20, col = "lightblue",
main = "Sampling Distribution of Sample Means (mpg)",
xlab = "Sample Mean of mpg", ylab = "Frequency")

# Add a red line representing the population mean of mpg

abline(v = mean(mtcars$mpg), col = "red", lwd = 2, lty = 2)

# Annotate the population mean for clarity

text(mean(mtcars$mpg), max(table(round(sample_means, 1))) - 5,
paste("Population Mean =", round(mean(mtcars$mpg), 2)),
col = "red", pos = 4)
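Interpretation: The histogram represents the sampling distribution of the mean mpg across 1,000 samples,
each of 10 values drawn with replacement from mtcars$mpg. The red dashed line is the population mean of mpg. The
distribution is centered on the population mean, which reflects the fact that the sample mean is an unbiased
estimator, and its roughly bell-shaped form is what the Central Limit Theorem predicts. Note that drawing more
samples only makes the histogram smoother; it is increasing the size of each sample that makes the sample means
cluster more tightly around the population mean (the Law of Large Numbers). This concept is fundamental in inferential
statistics, as it shows how sample statistics (like the mean) can estimate population parameters.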
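
The effect of sample size on the sampling distribution can be checked with a small simulation. The sketch below is not part of the original assignment: the sample sizes in `sizes` and the name `spread_by_size` are illustrative assumptions. It repeats the resampling above for several sample sizes and reports the standard deviation of the resulting sample means.

```r
# Sketch: spread of the sampling distribution for several sample sizes.
# The values in `sizes` are illustrative choices, not taken from the assignment.
data("mtcars")
set.seed(123)

sizes <- c(5, 10, 20, 30)

# For each sample size n, draw 1000 samples with replacement and
# record the standard deviation of the 1000 sample means
spread_by_size <- sapply(sizes, function(n) {
  means <- replicate(1000, mean(sample(mtcars$mpg, size = n, replace = TRUE)))
  sd(means)
})
names(spread_by_size) <- paste0("n=", sizes)

print(round(spread_by_size, 3))
```

The spread should shrink as n grows, roughly in proportion to 1 / sqrt(n).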

# Example 3: Standard Error

Objective: Calculate the standard error of the sample means to quantify the variability of the sampling
distribution and understand the precision of sample means as estimators.

# Calculate the standard error of sample means

standard_error <- sd(sample_means)
cat("Standard Error of Sample Means:", standard_error, "\n")

## Standard Error of Sample Means: 1.85098

Interpretation: The standard error (SE) is the standard deviation of the sampling distribution, i.e. the typical
distance of a sample mean from the population mean. A smaller SE indicates that the sample means are clustered
closely around the population mean, implying greater precision in estimating it. In our example, the SE of about
1.85 means that the mean mpg of a random sample of 10 cars typically differs from the true mpg mean of all cars in
the mtcars dataset by roughly 1.85 mpg.
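
As a cross-check, the simulated SE can be compared with the textbook formula SE = sigma / sqrt(n). The following is a minimal sketch, assuming mtcars is treated as the whole population (so sigma is computed with divisor N rather than N - 1) and that samples are drawn with replacement, as in Example 2. The names `pop_sd`, `theoretical_se`, and `empirical_se` are introduced here for illustration.

```r
# Sketch: empirical vs. theoretical standard error of the mean (n = 10)
data("mtcars")
set.seed(123)

n <- 10
sample_means <- replicate(1000, mean(sample(mtcars$mpg, size = n, replace = TRUE)))
empirical_se <- sd(sample_means)

# Population SD with divisor N, because mtcars is treated as the full population
pop_sd <- sqrt(mean((mtcars$mpg - mean(mtcars$mpg))^2))
theoretical_se <- pop_sd / sqrt(n)

cat("Empirical SE:  ", round(empirical_se, 3), "\n")
cat("Theoretical SE:", round(theoretical_se, 3), "\n")
```

The two values should agree closely; the remaining gap is simulation noise from using 1,000 samples.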

# Example 4: Hypothesis Testing (Null and Alternative Hypotheses)

Objective: Perform a hypothesis test to check if the population mean of mpg significantly differs from a
hypothesized mean (e.g., 20).

# Perform a t-test to test if the population mean is equal to 20

t_test_result <- t.test(mtcars$mpg, mu = 20)
print(t_test_result)

##
## One Sample t-test
##
## data: mtcars$mpg
## t = 0.08506, df = 31, p-value = 0.9328
## alternative hypothesis: true mean is not equal to 20
## 95 percent confidence interval:
## 17.91768 22.26357
## sample estimates:
## mean of x
## 20.09062
Interpretation: This test treats the 32 cars as a sample and examines whether the mean mpg of the population they
represent differs from 20. The null hypothesis H0 : mean mpg = 20 is tested against the alternative hypothesis
H1 : mean mpg ≠ 20. A low p-value (below the chosen significance level, typically 0.05) would mean rejecting the
null hypothesis and concluding that the population mean likely differs from 20. Here the p-value is 0.9328, far
above 0.05, so we fail to reject H0: the data provide no evidence that the mean mpg differs from 20. Consistent
with this, the 95% confidence interval (17.92, 22.26) contains 20.
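
To see where the reported t and p-value come from, the test statistic can be recomputed by hand from t = (x̄ - μ0) / (s / sqrt(n)). This is a sketch for illustration only; the names `mu0`, `t_stat`, and `p_value` are not from the original code.

```r
# Sketch: one-sample t statistic and two-sided p-value computed manually
data("mtcars")

x <- mtcars$mpg
mu0 <- 20                    # hypothesized mean under H0
n <- length(x)

# t = (sample mean - hypothesized mean) / (sample SD / sqrt(n))
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

cat("t =", round(t_stat, 5), " df =", n - 1, " p-value =", round(p_value, 4), "\n")
```

These values should match the `t`, `df`, and `p-value` lines of the `t.test()` output.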

# Example 5: Critical Region and Level of Significance

Objective: Illustrate the critical regions in a t-distribution for a significance level of 0.05, demonstrating
the rejection areas for a two-tailed hypothesis test.

alpha <- 0.05

df <- length(mtcars$mpg) - 1 # Degrees of freedom
critical_value <- qt(1 - alpha / 2, df = df)

# Generate t-distribution data

x <- seq(-4, 4, length = 100) # Range of t-values
y <- dt(x, df = df) # t-distribution density values

# Plot the t-distribution curve

plot(x, y, type = "l", col = "blue", lwd = 2,
main = "Critical Regions in t-Distribution (alpha = 0.05)",
xlab = "t-value", ylab = "Density")

# Add vertical lines for the critical regions (two-tailed)

abline(v = c(-critical_value, critical_value), col = "red", lty = 2, lwd = 2)

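# Shade critical regions to show rejection areas
# (each polygon runs along the axis, up to the curve, and back down at the critical value)

polygon(c(min(x), x[x <= -critical_value], -critical_value, -critical_value),
        c(0, y[x <= -critical_value], dt(-critical_value, df = df), 0),
        col = rgb(1, 0, 0, 0.2), border = NA)
polygon(c(critical_value, critical_value, x[x >= critical_value], max(x)),
        c(0, dt(critical_value, df = df), y[x >= critical_value], 0),
        col = rgb(1, 0, 0, 0.2), border = NA)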

# Add text labels for critical regions

text(-critical_value, 0.05, paste("Critical t =", round(-critical_value, 2)),
col = "red", pos = 4)
text(critical_value, 0.05, paste("Critical t =", round(critical_value, 2)),
col = "red", pos = 2)

# Add a legend to indicate the critical and acceptance regions

legend("topright",
       legend = c("Acceptance Region", "Critical Regions (Reject H0)"),
       fill = c("white", rgb(1, 0, 0, 0.2)), border = c("blue", "red"),
       lty = c(1, 2), col = c("blue", "red"), lwd = c(2, 2))
Interpretation: The blue curve represents the t-distribution with 31 degrees of freedom, with red dashed lines
marking the critical values. The shaded areas in the tails show the critical regions (where |t| > 2.04 at α = 0.05),
indicating where we would reject H0 in a two-tailed test. If the test statistic falls within these shaded regions, we
reject the null hypothesis, concluding that our sample mean is significantly different from the
hypothesized mean. The statistic from Example 4 (t = 0.085) lies well inside the acceptance region.
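
The decision rule can also be applied numerically rather than read off the plot. The sketch below assumes the same test as Example 4 (H0: mean mpg = 20); the name `reject_h0` is introduced here for illustration.

```r
# Sketch: compare the observed t statistic with the two-tailed critical value
data("mtcars")

alpha <- 0.05
n <- length(mtcars$mpg)
critical_value <- qt(1 - alpha / 2, df = n - 1)

# Observed test statistic for H0: mean mpg = 20
t_stat <- unname(t.test(mtcars$mpg, mu = 20)$statistic)

# Reject H0 only if |t| falls in the critical region
reject_h0 <- abs(t_stat) > critical_value

cat("Critical value:", round(critical_value, 3), "\n")
cat("Observed |t|:  ", round(abs(t_stat), 3), "\n")
cat("Reject H0?     ", reject_h0, "\n")
```

This rule is equivalent to comparing the p-value with alpha: |t| exceeds the critical value exactly when the two-sided p-value is below alpha.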

# Example 6: Characteristics of a Good Estimator: Unbiasedness

Objective: Demonstrate the concept of unbiasedness by comparing the sample mean to the true
population mean.

# True mean of 'mpg' in population

true_mean <- mean(mtcars$mpg)
set.seed(123)
sample_data <- mtcars[sample(1:nrow(mtcars), 10), ]
sample_mean <- mean(sample_data$mpg)

# Display the true mean and sample mean

cat("True Mean of mpg:", true_mean, "\n")

## True Mean of mpg: 20.09062

cat("Sample Mean of mpg (unbiased):", sample_mean, "\n")

## Sample Mean of mpg (unbiased): 19.74

Interpretation: The true mean is the average mpg of all cars in the mtcars dataset, which serves as the
population parameter. The sample mean is calculated from a random sample of 10 cars. Here the sample mean (19.74)
is close to, but not equal to, the true mean (20.09), and a single sample cannot by itself demonstrate
unbiasedness. Unbiasedness is a property of the estimator rather than of any one sample: the sample mean is
unbiased because its expected value, i.e. its average over all possible random samples, equals the population
mean. On average, it neither overestimates nor underestimates the parameter it estimates.
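
Since unbiasedness concerns the average behaviour of the estimator, it is better illustrated by repeating the sampling many times. The following is a minimal sketch, not part of the original assignment: the number of repetitions (5000) and the name `bias_estimate` are illustrative assumptions.

```r
# Sketch: estimate the bias of the sample mean by repeated sampling
data("mtcars")
set.seed(123)

true_mean <- mean(mtcars$mpg)

# Draw many samples of 10 cars (without replacement, as in Example 6)
# and record the mean mpg of each sample
sample_means <- replicate(5000, mean(sample(mtcars$mpg, size = 10)))

# Estimated bias = average of the sample means minus the true mean
bias_estimate <- mean(sample_means) - true_mean

cat("True mean:              ", round(true_mean, 3), "\n")
cat("Average of sample means:", round(mean(sample_means), 3), "\n")
cat("Estimated bias:         ", round(bias_estimate, 3), "\n")
```

Individual sample means scatter above and below the true mean, but their average should sit very close to it, so the estimated bias should be near zero.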

# Example 7: Scatter Plot for Correlation

Objective: Visualize the relationship between two continuous variables, horsepower and mpg, to assess
correlation.

# Create a scatter plot for horsepower vs. mpg

plot(mtcars$hp, mtcars$mpg, main = "Scatter Plot of Horsepower vs. mpg",
     xlab = "Horsepower (hp)", ylab = "Miles per Gallon (mpg)",
     pch = 19, col = "blue")
abline(lm(mpg ~ hp, data = mtcars), col = "red", lwd = 2)

Interpretation: This scatter plot displays each car's horsepower (x-axis) against its miles per gallon (y-axis).
The blue points represent individual cars. The red line is the least-squares regression line, showing the
best-fit linear relationship between horsepower and mpg. Here the line slopes downward, indicating a negative
correlation: higher horsepower tends to be associated with lower fuel efficiency (lower
mpg). This visual helps identify trends, patterns, and potential outliers in the data.
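
The visual impression can be backed by numbers. The sketch below computes the Pearson correlation coefficient, the slope of the fitted line, and a significance test for the correlation; the variable names are introduced here for illustration.

```r
# Sketch: quantify the hp-mpg relationship shown in the scatter plot
data("mtcars")

# Pearson correlation between horsepower and mpg
r <- cor(mtcars$hp, mtcars$mpg)

# Slope of the least-squares line mpg ~ hp (change in mpg per unit of hp)
fit <- lm(mpg ~ hp, data = mtcars)
slope <- unname(coef(fit)["hp"])

# Test of H0: the true correlation is zero
cor_test <- cor.test(mtcars$hp, mtcars$mpg)

cat("Correlation r:", round(r, 3), "\n")
cat("Slope:        ", round(slope, 4), "\n")
cat("p-value:      ", signif(cor_test$p.value, 3), "\n")
```

The sign of `r` always matches the sign of the slope, so a negative value here confirms the downward trend of the red line.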
# Example 8: Histogram with Density Plot

Objective: Illustrate the distribution of the mpg variable and visualize its density for better
understanding.

# Create a histogram for 'mpg' with density overlay

hist(mtcars$mpg, breaks = 10, probability = TRUE, col = "lightblue",
     main = "Histogram with Density Plot",
     xlab = "Miles per Gallon (mpg)", ylab = "Density")
lines(density(mtcars$mpg), col = "red", lwd = 2)

Interpretation: The histogram represents the distribution of mpg values. Because probability = TRUE, it is drawn
on a density scale: the area of each bar equals the proportion of cars within that range of mpg, and the bar areas
sum to 1, which puts the histogram on the same scale as the density curve. The red line shows the
kernel density estimate of mpg, a smoothed curve that helps visualize the overall shape of
the distribution. This allows us to observe the central tendency, spread, and any potential skewness or
outliers in the data.

# Example 9: Cumulative Frequency Plot

Objective: Display the cumulative frequency of the mpg variable to understand how values accumulate
across a range.

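# Calculate cumulative frequency over 10 equal-width mpg bins

bin_edges <- seq(min(mtcars$mpg), max(mtcars$mpg), length.out = 11)
cum_freq <- cumsum(table(cut(mtcars$mpg, breaks = bin_edges, include.lowest = TRUE)))

# Create a cumulative frequency plot against the upper edge of each bin,
# so that the x-axis is in mpg units rather than bin index

plot(bin_edges[-1], cum_freq, type = "b", main = "Cumulative Frequency of mpg",
     xlab = "Miles per Gallon (mpg)", ylab = "Cumulative Frequency", col = "purple")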

Interpretation: This cumulative frequency plot shows how the count of mpg values accumulates as mpg
increases. Each point on the plot represents the total number of cars that have an mpg less than or
equal to a specific value. This type of visualization is helpful for identifying percentiles and
understanding the distribution of the data. For example, if you know the cumulative frequency at a
certain mpg value, you can determine how many cars perform better or worse in terms of fuel
efficiency.
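
For reading off percentiles without choosing bins, base R's empirical cumulative distribution function is a common alternative. The following sketch is an addition, not the assignment's method; the threshold of 20 mpg is an illustrative assumption.

```r
# Sketch: empirical CDF of mpg as a bin-free cumulative view
data("mtcars")

# ecdf() returns a function giving the proportion of values <= its argument
mpg_ecdf <- ecdf(mtcars$mpg)

# Count and proportion of cars at or below an illustrative threshold of 20 mpg
cars_at_or_below_20 <- sum(mtcars$mpg <= 20)
prop_at_or_below_20 <- mpg_ecdf(20)

cat("Cars with mpg <= 20:", cars_at_or_below_20, "of", nrow(mtcars), "\n")
cat("Proportion:         ", round(prop_at_or_below_20, 3), "\n")

# Quartiles of mpg
print(quantile(mtcars$mpg, probs = c(0.25, 0.5, 0.75)))

# Step plot of the cumulative proportion
plot(mpg_ecdf, main = "Empirical CDF of mpg",
     xlab = "Miles per Gallon (mpg)", ylab = "Cumulative Proportion")
```

Multiplying the cumulative proportion by the number of cars recovers the cumulative frequency.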

# Example 10: Boxplot for Comparing Groups

Objective: Compare the distribution of mpg across different groups defined by the number of cylinders
in cars.

# Create a boxplot for 'mpg' grouped by the number of cylinders

boxplot(mpg ~ cyl, data = mtcars,
        main = "Boxplot of mpg by Number of Cylinders",
        xlab = "Number of Cylinders", ylab = "Miles per Gallon (mpg)",
        col = c("lightblue", "lightgreen", "lightpink"))
grid()
Interpretation: The boxplot displays the distribution of mpg for cars grouped by the number of cylinders
(4, 6, and 8). Each box spans the interquartile range (IQR), with the line inside indicating the
median mpg for that group. The whiskers extend to the most extreme values lying within 1.5 × IQR of the box;
points beyond that are drawn individually as outliers. Together these show the variability and central tendency
of each group. Boxplots are useful for comparing distributions between multiple groups and identifying potential
outliers within each group.
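
The quantities drawn in the boxplot can also be computed directly. This is a minimal sketch with illustrative variable names; `boxplot.stats()` applies the same 1.5 × IQR rule that `boxplot()` uses by default.

```r
# Sketch: numeric summaries behind the grouped boxplot
data("mtcars")

# Median and interquartile range of mpg for each cylinder group
group_medians <- tapply(mtcars$mpg, mtcars$cyl, median)
group_iqr <- tapply(mtcars$mpg, mtcars$cyl, IQR)

# Points flagged as outliers in each group (beyond 1.5 * IQR from the box)
group_outliers <- lapply(split(mtcars$mpg, mtcars$cyl),
                         function(v) boxplot.stats(v)$out)

print(group_medians)
print(group_iqr)
print(group_outliers)
```

Comparing the medians across groups shows the same ordering as the boxes in the plot.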

# Example 11: Pie Chart for Proportions of Car Origin

Objective: Visualize the proportion of cars in three categories defined by the number of cylinders. The "origin"
labels assigned below are illustrative only: mtcars does not record where each car was made.

# Create a new variable for car origin based on the number of cylinders
mtcars$origin <- ifelse(mtcars$cyl == 4, "European",
ifelse(mtcars$cyl == 6, "American", "Japanese"))

# Count the number of cars in each origin category

origin_counts <- table(mtcars$origin)

# Create a pie chart for the proportion of car origins

pie(origin_counts,
    main = "Proportion of Cars by Origin",
    col = c("lightblue", "lightgreen", "lightcoral"),
    labels = paste(names(origin_counts), "\n(", origin_counts, " cars)", sep = ""),
    radius = 0.9)
legend("topright", legend = names(origin_counts),
       fill = c("lightblue", "lightgreen", "lightcoral"))
Interpretation: The pie chart represents the proportion of cars in each origin category (European,
American, and Japanese), where the category is a label derived from the cylinder count rather than a true country
of origin. Each slice corresponds to the number of cars in a category, with the size of the slice indicating its
share of the whole dataset: 14 cars are labeled Japanese (8 cylinders), 11 European (4 cylinders), and
7 American (6 cylinders), so the Japanese category is the largest.
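
Because slice sizes are hard to compare by eye, it helps to print the underlying percentages. The sketch below rebuilds the same illustrative labels in a standalone vector named `origin` (so it does not modify mtcars) and uses `prop.table()` to convert counts to shares.

```r
# Sketch: counts and percentages behind the pie chart
data("mtcars")

# Same illustrative labels as above, derived from the cylinder count
origin <- ifelse(mtcars$cyl == 4, "European",
                 ifelse(mtcars$cyl == 6, "American", "Japanese"))

origin_counts <- table(origin)

# Convert counts to percentages of the whole dataset
origin_pct <- round(100 * prop.table(origin_counts), 1)

print(origin_counts)
print(origin_pct)
```

A bar chart of `origin_counts` (for example with `barplot()`) would show the same information with easier comparisons between categories.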
