
Statistical Functions and Regression in R

The document provides an overview of built-in statistical functions in R, including mean(), median(), sd(), and var(), along with their calculations and examples. It also explains simple linear regression using the lm() function to model the relationship between two variables. The document includes manual calculations and R code examples for each statistical function and regression analysis.

Uploaded by

mouryaharsh833
Copyright: © All Rights Reserved

Built-in Statistical Functions and Regression Analysis in R

Built-in Statistical Functions in R

1. mean() function:

The mean (arithmetic average) is calculated by summing all the numbers in a set and dividing by the number of values.

Example:

# R Code Example

data <- c(4, 7, 10, 12, 15)

mean_value <- mean(data)

print(mean_value)

Output:

[1] 9.6

Manual Calculation:

Mean = (4 + 7 + 10 + 12 + 15) / 5 = 48 / 5 = 9.6

In this example, we sum the values (4 + 7 + 10 + 12 + 15 = 48), and since there are 5 numbers, we divide by 5, giving us a mean of 9.6.
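The manual calculation can be reproduced in R as a quick check. This is a minimal sketch; the names manual_mean and with_missing are introduced here only for illustration, and the vector containing NA is an assumed example, not part of the original data.

```r
# Same sample vector as in the example above
data <- c(4, 7, 10, 12, 15)

# Manual mean: total divided by the number of values
manual_mean <- sum(data) / length(data)
print(manual_mean)

# Compare with the built-in, allowing for floating-point rounding
print(isTRUE(all.equal(manual_mean, mean(data))))

# Illustrative vector with a missing value (an assumption for this sketch)
with_missing <- c(4, 7, NA)
print(mean(with_missing))                # NA, because one value is missing
print(mean(with_missing, na.rm = TRUE))  # mean of the remaining values
```

The na.rm = TRUE argument tells mean() to drop missing values before averaging; without it, a single NA makes the result NA.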

2. median() function:

The median is the middle value in a sorted list of numbers. If the list has an odd number of values, the median is the middle number. If the list has an even number of values, the median is the average of the two middle numbers.

Example:

# R Code Example

data <- c(4, 7, 10, 12, 15)

median_value <- median(data)

print(median_value)

Output:

[1] 10

Manual Calculation:

In the sorted list 4, 7, 10, 12, 15, the middle value is 10. Since there are 5 values (an odd number),

the median is the middle one directly.
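The odd and even cases described above can be sketched as a small function. manual_median is a hypothetical helper written only to illustrate the rule (median() already does this), and even_data is an assumed four-value vector used to show the even case.

```r
# Hypothetical helper that spells out the rule; use median() in practice
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                   # odd count: the single middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2    # even count: mean of the two middle values
  }
}

odd_data  <- c(4, 7, 10, 12, 15)     # vector from the example above
even_data <- c(4, 7, 10, 12)         # illustrative vector, not from the original

print(manual_median(odd_data))
print(manual_median(even_data))

# Both results should agree with the built-in function
print(manual_median(odd_data) == median(odd_data))
print(manual_median(even_data) == median(even_data))
```

For the four-value vector the two middle values are 7 and 10, so the median is 8.5.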

3. sd() function (Standard Deviation):

The standard deviation measures the amount of variation or dispersion in a set of values. R's sd() returns the sample standard deviation, which divides by n - 1 rather than n.

Example:

# R Code Example

data <- c(4, 7, 10, 12, 15)

sd_value <- sd(data)

print(sd_value)

Output:

[1] 4.27785

Manual Calculation:

1. Find the mean: Mean = 9.6

2. Subtract the mean from each number and square the result:

(4 - 9.6)^2 = 31.36, (7 - 9.6)^2 = 6.76, (10 - 9.6)^2 = 0.16, (12 - 9.6)^2 = 5.76, (15 - 9.6)^2 = 29.16

3. Sum the squared differences and divide by n - 1 = 4 (sample variance): Variance = 73.2 / 4 = 18.3

4. The standard deviation is the square root of the variance: Standard Deviation = sqrt(18.3) ≈ 4.28
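The four steps above can be traced in R. This is a sketch under one stated fact: sd() computes the sample standard deviation, so the sum of squared differences is divided by n - 1, not n. The intermediate variable names are chosen here for illustration.

```r
data <- c(4, 7, 10, 12, 15)
n <- length(data)

# Step 1: the mean
m <- mean(data)

# Step 2: squared differences from the mean
squared_diff <- (data - m)^2
print(squared_diff)

# Step 3: sample variance, dividing by n - 1
sample_var <- sum(squared_diff) / (n - 1)
print(sample_var)

# Step 4: standard deviation as the square root of the variance
sample_sd <- sqrt(sample_var)
print(sample_sd)

# The manual result should match the built-in function
print(isTRUE(all.equal(sample_sd, sd(data))))
```

The squared differences sum to 73.2, so the sample variance is 73.2 / 4 = 18.3 and the standard deviation is sqrt(18.3), about 4.28.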

4. var() function (Variance):

The variance is the square of the standard deviation, representing the spread of data points.

Example:

# R Code Example

data <- c(4, 7, 10, 12, 15)

var_value <- var(data)

print(var_value)

Output:

[1] 18.3

Manual Calculation:

The sample variance is the sum of the squared differences from the mean divided by n - 1: Variance = (31.36 + 6.76 + 0.16 + 5.76 + 29.16) / 4 = 73.2 / 4 = 18.3
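Because var() uses the n - 1 divisor, it differs from the population variance, which divides by n. The sketch below shows both, along with the relationship between var() and sd(); the name population_var is introduced here for illustration, since base R has no separate population-variance function.

```r
data <- c(4, 7, 10, 12, 15)
n <- length(data)

# Sample variance as returned by var(): divisor is n - 1
sample_var <- var(data)
print(sample_var)

# Population variance: the same sum of squares, rescaled to divide by n
population_var <- sample_var * (n - 1) / n
print(population_var)

# var() is the square of sd(), up to floating-point rounding
print(isTRUE(all.equal(sample_var, sd(data)^2)))
```

For this data the sample variance is 18.3 and the population variance is 73.2 / 5 = 14.64.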
5. Regression Analysis: Simple Linear Regression

Linear regression models the relationship between two variables by fitting a line to the observed data.

In R, the function lm() is used for linear modeling.

Example:

# R Code Example

hours <- c(2, 3, 4, 5, 6)

scores <- c(50, 60, 70, 80, 90)

# Fit a simple linear regression model

model <- lm(scores ~ hours)

print(model)

summary(model)

Output of print(model), abridged:

Call:

lm(formula = scores ~ hours)

Coefficients:

(Intercept) 30.000

hours 10.000

Manual Calculation:

The equation of the regression line is score = 30 + 10 * hours.

- Intercept (30): When hours = 0, the predicted score is 30.

- Slope (10): For every additional hour studied, the predicted score increases by 10 points.

The model predicts that if a student studies for 5 hours, their score will be:

Predicted Score = 30 + 10 * 5 = 80
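The slope and intercept can also be derived by hand from the standard least-squares formulas for a single predictor: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below checks them against lm() and then uses predict() for the 5-hour case; new_data is a hypothetical data frame created for this illustration.

```r
hours  <- c(2, 3, 4, 5, 6)
scores <- c(50, 60, 70, 80, 90)

# Least-squares estimates for one predictor
slope     <- cov(hours, scores) / var(hours)
intercept <- mean(scores) - slope * mean(hours)
print(c(intercept = intercept, slope = slope))

# Fit the same model with lm() and extract its coefficients
model <- lm(scores ~ hours)
print(coef(model))

# Predict the score for 5 hours of study (new_data is hypothetical)
new_data  <- data.frame(hours = 5)
predicted <- predict(model, newdata = new_data)
print(predicted)
```

Here cov(hours, scores) is 25 and var(hours) is 2.5, giving a slope of 10 and an intercept of 70 - 10 * 4 = 30, so the prediction for 5 hours is 80. All five points lie exactly on the line, so summary(model) may warn that the fit is essentially perfect.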
