Compute Summary Statistics In R

Last Updated : 25 Jul, 2025

Summary statistics are values that describe and simplify a dataset. They include measures like mean, median, mode, range, standard deviation and variance. These values help understand the center, spread and shape of the data. In R programming language, they can be calculated using both built-in functions and external packages.
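
Base R has no built-in function for the statistical mode (mode() reports an object's storage type instead). The following is a minimal sketch of one way to compute it, together with the range. The helper name stat_mode is hypothetical and not part of R, and the sketch assumes numeric input.

```r
# Hypothetical helper (not part of base R): returns the most frequent value(s)
stat_mode <- function(x) {
  counts <- table(x)                           # frequency of each distinct value
  top <- names(counts)[counts == max(counts)]  # value(s) with the highest count
  as.numeric(top)                              # table() keeps the values as character names
}

x <- c(2, 3, 3, 5, 7, 3, 5)
stat_mode(x)     # 3 appears most often
range(x)         # smallest and largest value: 2 and 7
diff(range(x))   # range as a single number: 5
```

If several values tie for the highest count, this sketch returns all of them.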

Implementation of Summary Statistics in R

We use different methods in R to compute summary statistics and gain insight into a dataset. The examples use mtcars, a built-in R dataset with 32 cars and 11 variables.

1. Using Base R

We use built-in R functions to compute basic summary statistics like mean, median, standard deviation and variance.

1.1 One Variable

We compute summary statistics on a single variable using various base R functions.

summary: Displays basic statistics like min, max, mean, median and quartiles.
mean: Calculates the arithmetic average.
median: Finds the middle value in the data.
min: Returns the smallest value.
max: Returns the largest value.
quantile: Returns quantile values (e.g., 25th, 50th, 75th percentiles).
sd: Computes standard deviation.
var: Computes variance.
cat: Used to concatenate and print values.

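# Load the built-in dataset
data(mtcars)
cat("Summary statistics for mpg:\n")
# print() makes the summary appear even when the script is run with source()
print(summary(mtcars$mpg))
cat("\nMean:", mean(mtcars$mpg))
cat("\nMedian:", median(mtcars$mpg))
cat("\nMin:", min(mtcars$mpg))
cat("\nMax:", max(mtcars$mpg))
# cat() prints the five quantile values without their percentage labels
cat("\nQuantiles:", quantile(mtcars$mpg))
cat("\nStandard Deviation:", sd(mtcars$mpg))
cat("\nVariance:", var(mtcars$mpg), "\n")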

Output:

Mean: The average mileage across all cars.
Median: The middle value when all mpg values are sorted in order.
Minimum: The lowest mpg value in the dataset.
Maximum: The highest mpg value in the dataset.
Quantiles: Cut points in the sorted data. By default, quantile() returns the 0, 25, 50, 75 and 100 percent points, that is, the minimum, the three quartiles and the maximum.
Standard Deviation: Measures how much the mpg values vary from the mean.
Variance: The average of the squared deviations from the mean (var() divides by n - 1); it equals the square of the standard deviation.
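
To make the relationship between variance and standard deviation concrete, here is a small sketch that recomputes both by hand and compares them with var() and sd(). The variable names manual_var and manual_sd are illustrative only.

```r
data(mtcars)
x <- mtcars$mpg
n <- length(x)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)
manual_sd  <- sqrt(manual_var)

all.equal(manual_var, var(x))  # TRUE
all.equal(manual_sd, sd(x))    # TRUE

# quantile() also accepts custom probabilities, e.g. the 10th, 50th and 90th percentiles
quantile(x, probs = c(0.1, 0.5, 0.9))
```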

1.2 Multiple Variables

We use summary() on multiple variables to get descriptive statistics in one go.

summary: Can also be used on a data frame to get stats for multiple variables.

summary(mtcars[c("mpg", "disp", "hp", "wt")])

Output:

Minimum and Maximum: The smallest and largest values in each column.
Mean and Median: The average and middle values for each variable.
First and Third Quartiles: Indicate the spread of the middle fifty percent of the data for each column.
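
summary() does not report the standard deviation or variance. One possible way to get them for several columns at once in base R is sapply(), sketched below; the object names cols and stats are illustrative.

```r
data(mtcars)
cols <- c("mpg", "disp", "hp", "wt")

# Apply several functions to each selected column; the result is a matrix
# with one row per statistic and one column per variable
stats <- sapply(mtcars[cols], function(x) c(mean = mean(x), sd = sd(x), var = var(x)))
round(stats, 2)
```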

2. Using External Packages

We use external packages to compute advanced summary statistics like skewness, kurtosis and grouped metrics.

2.1 Using e1071 Package

We compute distribution shape statistics like skewness and kurtosis using e1071 package.

skewness: Measures asymmetry in the distribution.
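kurtosis: Measures how heavy the tails of the distribution are (often described as peakedness). e1071 reports excess kurtosis, so a normal distribution scores about 0.
IQR: Returns the interquartile range (Q3 - Q1). This function comes from base R's stats package, not from e1071.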
print: Displays output to the console.

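# Install once; skip this line if e1071 is already installed
install.packages("e1071")
library(e1071)
print("Skewness of mpg:")
skewness(mtcars$mpg)
print("Kurtosis of mpg:")
kurtosis(mtcars$mpg)  # excess kurtosis
print("IQR of mpg:")
IQR(mtcars$mpg)       # IQR() comes from base R's stats package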

Output:

Skewness: Measures how symmetrical the data is. A value greater than zero indicates right skew.
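Kurtosis: Indicates how heavy-tailed or light-tailed the distribution is compared to a normal distribution. Because e1071 reports excess kurtosis, values near zero match a normal distribution and negative values indicate lighter tails.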
Interquartile Range (IQR): The difference between the 75th and 25th percentiles, showing the range of the middle half of the data.
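
For intuition, the shape statistics can be approximated in base R from central moments. This is a sketch of the classical moment-based definitions, not e1071's implementation; e1071 applies a small-sample adjustment by default, so its numbers may differ slightly. The names skew_moment and kurt_excess are illustrative.

```r
data(mtcars)
x <- mtcars$mpg
d <- x - mean(x)   # deviations from the mean

m2 <- mean(d^2)    # second central moment
m3 <- mean(d^3)    # third central moment
m4 <- mean(d^4)    # fourth central moment

skew_moment <- m3 / m2^1.5     # > 0 suggests a longer right tail
kurt_excess <- m4 / m2^2 - 3   # about 0 for a normal distribution

c(skewness = skew_moment, excess_kurtosis = kurt_excess)

# IQR() is the distance between the 25th and 75th percentiles
IQR(x)
diff(quantile(x, c(0.25, 0.75), names = FALSE))
```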

2.2 Using psych Package

We get detailed descriptive stats using the describe() function from psych package.

describe: Returns count, mean, std dev, min, max, skewness, kurtosis and other stats.

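# Install once; skip this line if psych is already installed
install.packages("psych")
library(psych)
# psych:: makes the call unambiguous, because Hmisc also exports describe()
psych::describe(mtcars$hp)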

Output:

Count: Total number of observations.
Mean and Standard Deviation: Central tendency and spread of horsepower.
Minimum and Maximum: Extreme values in horsepower.
Skewness and Kurtosis: Shape of the distribution for horsepower.
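
Several of these figures can be reproduced with base R alone, which helps when checking what each column means. The sketch below assumes a 10 percent trimmed mean and the usual standard error formula; the vector name basic_stats is illustrative.

```r
data(mtcars)
x <- mtcars$hp

basic_stats <- c(
  n       = length(x),
  mean    = mean(x),
  sd      = sd(x),
  median  = median(x),
  trimmed = mean(x, trim = 0.1),       # mean after dropping 10% from each end
  mad     = mad(x),                    # median absolute deviation (scaled)
  min     = min(x),
  max     = max(x),
  range   = diff(range(x)),
  se      = sd(x) / sqrt(length(x))    # standard error of the mean
)
round(basic_stats, 2)
```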

2.3 Using Hmisc Package

We generate a compact and informative summary using describe() from the Hmisc package.

describe: Displays the number of observations, missing and distinct values, the mean, selected quantiles and the lowest and highest values.

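# Install once; skip this line if Hmisc is already installed
install.packages("Hmisc")
library(Hmisc)
# Hmisc:: makes the call unambiguous, because psych also exports describe()
Hmisc::describe(mtcars$mpg)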

Output:

Mean and Quantiles: The mean plus selected percentiles, including the lower quartile, median and upper quartile. The lowest and highest values are listed separately.
Missing Values: Count of any NA values.
Distribution Summary: A general shape of how mpg values are spread.
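
mtcars has no missing values, so the missing count is zero here. The sketch below shows how missing values affect base R summaries, using a copy of mpg with two values artificially set to NA for illustration (mpg_na is a made-up name).

```r
data(mtcars)
mpg_na <- mtcars$mpg
mpg_na[c(3, 10)] <- NA       # artificially blank out two values

sum(is.na(mpg_na))           # number of missing values: 2
mean(mpg_na)                 # NA, because missing values propagate
mean(mpg_na, na.rm = TRUE)   # mean of the 30 remaining values
summary(mpg_na)              # summary() adds an NA's entry with the count
```

Most base R summary functions, including median(), sd(), var() and quantile(), accept the same na.rm = TRUE argument.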

3. Grouped Summary Statistics

We use functions from the dplyr package to compute summary statistics on grouped data.

3.1 Group by One Variable

We group data by one column and compute group-wise statistics.

group_by: Groups data based on one or more variables.
summarise: Applies summary functions to each group.
mean: Calculates group-wise average.
median: Calculates group-wise median.
sd: Calculates group-wise standard deviation.

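# Install once; skip this line if dplyr is already installed
install.packages("dplyr")
library(dplyr)
# One output row per distinct value of cyl (4, 6 and 8 cylinders)
mtcars %>%
  group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),
    median_mpg = median(mpg),
    sd_mpg = sd(mpg)
  )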

Output:

mean_mpg: Average mileage within each cylinder group.
median_mpg: Middle mileage value within each group.
sd_mpg: Shows how much the mileage varies within each group.
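
The same group-wise statistics can also be computed without dplyr. Here is a sketch of a base R alternative using aggregate() and tapply(); the object names are illustrative.

```r
data(mtcars)

# Mean mpg for each number of cylinders, using the formula interface
mean_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
mean_by_cyl

# tapply() returns one value per group as a named vector
sd_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, sd)
sd_by_cyl

# Group sizes: 11, 7 and 14 cars for 4, 6 and 8 cylinders
n_by_cyl <- table(mtcars$cyl)
n_by_cyl
```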

3.2 Group by Multiple Variables

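We can also compute summaries of several columns in a single call. The example below has no group_by() step, so each statistic is calculated over the whole dataset. To group by more than one column, pass them all to group_by(), for example group_by(cyl, am), before calling summarise().

summarise: Can compute summaries of multiple variables at once.
mean: Computes the mean of each named column.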
sd: Computes standard deviation of a column.
var: Computes variance of a column.

mtcars %>%
  summarise(
    mean_mpg = mean(mpg),
    mean_disp = mean(disp),
    sd_hp = sd(hp),
    var_wt = var(wt)
  )

Output:

mean_mpg (20.09): The average fuel efficiency across all cars.
mean_disp (230.72): The average engine displacement (in cubic inches).
sd_hp (68.56): Indicates how much the horsepower values vary around the mean.
var_wt (0.96): Shows the spread (variance) in vehicle weights.
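
The summarise() call above has no group_by() step, so it returns a single row for the whole dataset. A minimal sketch of actually grouping by two columns follows; the choice of cyl and am as grouping columns and the name by_cyl_am are assumptions made for illustration, and the .groups argument requires dplyr 1.0.0 or later.

```r
library(dplyr)

by_cyl_am <- mtcars %>%
  group_by(cyl, am) %>%        # one group per cylinder/transmission pair
  summarise(
    n        = n(),            # number of cars in the group
    mean_mpg = mean(mpg),
    sd_hp    = sd(hp),
    .groups  = "drop"          # return an ungrouped tibble
  )
by_cyl_am
```

Each row of the result describes one combination of cylinder count and transmission type that occurs in the data.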

tanmoymishra
