
Statistics

The document provides an introduction to the "mtcars" data set in R, which contains information on 32 cars. It describes the variables in the data set like miles per gallon, number of cylinders, horsepower, weight, and others. It demonstrates how to access, manipulate, and analyze the data set using R functions like dim(), names(), sort(), summary(), mean(), median(), and quantiles. The goal is to help users understand this built-in data set and how to extract useful information from it.

Uploaded by

noufatcoursera


R Statistics

Statistics Introduction
• Statistics is the science of analyzing, reviewing, and drawing conclusions from data.
• Some basic statistical numbers include:
    • Mean, median and mode
    • Minimum and maximum value
    • Percentiles
    • Variance and standard deviation
    • Covariance and correlation
    • Probability distributions

• The R language was developed by two statisticians. It has many built-in functionalities, in addition to libraries written for the exact purpose of statistical analysis.
Data Set
• A data set is a collection of data, often presented in a table.
• There is a popular built-in data set in R called "mtcars" (Motor Trend Car Road Tests), which is taken from the 1974 Motor Trend US magazine.

Example
# Print the mtcars data set
mtcars

“mtcars” data set:

Motor Trend Car Road Tests
Description
• The data was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
Usage
mtcars
Format
A data frame with 32 observations on 11 (numeric) variables:
[, 1]   mpg    Miles/(US) gallon
[, 2]   cyl    Number of cylinders
[, 3]   disp   Displacement (cu.in.)
[, 4]   hp     Gross horsepower
[, 5]   drat   Rear axle ratio
[, 6]   wt     Weight (1000 lbs)
[, 7]   qsec   1/4 mile time
[, 8]   vs     Engine (0 = V-shaped, 1 = straight)
[, 9]   am     Transmission (0 = automatic, 1 = manual)
[,10]   gear   Number of forward gears
[,11]   carb   Number of carburetors

Information About the Data Set

• We can use the question mark (?) to get information about the mtcars data set.

Example
# Use the question mark to get information about the data set
?mtcars
Get Information
• Use the dim() function to find the dimensions of the data set, and the names() function to view the names of the variables:

Example
# Create a variable of the mtcars data set for better organization
Data_Cars <- mtcars

# Use dim() to find the dimension of the data set

dim(Data_Cars)

# Use names() to find the names of the variables from the data set
names(Data_Cars)
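
As a small complement to dim() and names(), the sketch below reads the two dimensions separately and prints a compact overview of every variable. It uses only base R functions (nrow(), ncol(), str()); treat it as an optional extra, not as part of the original walkthrough.

```r
Data_Cars <- mtcars

# Number of rows (observations) and columns (variables), read separately
n_rows <- nrow(Data_Cars)
n_cols <- ncol(Data_Cars)
n_rows
n_cols

# Compact overview: one line per variable with its type and first few values
str(Data_Cars)
```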

• Use the rownames() function to get the name of each row, which is the name of each car (printed as the first column of the table):

Example
Data_Cars <- mtcars

rownames(Data_Cars)

From the examples above, we have found out that the data set
has 32 observations (Mazda RX4, Mazda RX4 Wag, Datsun 710, etc)
and 11 variables (mpg, cyl, disp, etc).

• A variable is defined as something that can be measured or counted.

Here is a brief explanation of the variables from the mtcars data set:
Variable Name   Description
mpg             Miles/(US) gallon
cyl             Number of cylinders
disp            Displacement
hp              Gross horsepower
drat            Rear axle ratio
wt              Weight (1000 lbs)
qsec            1/4 mile time
vs              Engine (0 = V-shaped, 1 = straight)
am              Transmission (0 = automatic, 1 = manual)
gear            Number of forward gears
carb            Number of carburetors

Print Variable Values

• If you want to print all values that belong to a variable, access the data frame by using the $ sign and the name of the variable (for example cyl, the number of cylinders).

Example
Data_Cars <- mtcars

Data_Cars$cyl

Sort Variable Values

• To sort the values, use the sort() function:

Example
Data_Cars <- mtcars

sort(Data_Cars$cyl)

From the examples above, we see that most cars have either 4 or 8 cylinders.
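
Reading the sorted values by eye works, but counting is more reliable. A minimal sketch using the base R table() function, which tallies how often each value occurs (the variable name cyl_counts is ours, chosen for illustration):

```r
Data_Cars <- mtcars

# Count how many cars have each number of cylinders
cyl_counts <- table(Data_Cars$cyl)
cyl_counts
```

For mtcars this gives 11 cars with 4 cylinders, 7 with 6 cylinders and 14 with 8 cylinders.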
Analyzing the Data
• Now that we have some information about the data set, we can start to analyze it with some statistical numbers.
• For example, we can use the summary() function to get a statistical summary of the data:

Example
Data_Cars <- mtcars

summary(Data_Cars)

The summary() function returns six statistical numbers for each variable:
• Min
• First quartile (25th percentile)
• Median
• Mean
• Third quartile (75th percentile)
• Max
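
The same function also works on a single variable. The sketch below summarizes only wt and then picks one of the six numbers out by name; the variable name wt_summary is ours, chosen for illustration.

```r
Data_Cars <- mtcars

# Summary of one variable instead of the whole data frame
wt_summary <- summary(Data_Cars$wt)
wt_summary

# The six entries are named, so a single one can be extracted
wt_summary[["Median"]]
```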
Max Min
• In the previous chapter, we introduced the mtcars data set. We learned from the ‘R Math’ chapter that R has several built-in math functions.
• For example, the min() and max() functions can be used to find the lowest or highest value in a set:

Example
Find the largest and smallest value of the variable hp (horsepower).

Data_Cars <- mtcars

max(Data_Cars$hp)
min(Data_Cars$hp)
Now we know that the largest horsepower value in the set is 335, and the lowest is 52.

We could take a look at the data set and try to find out which cars these two values belong to.

By observing the table, it looks like the largest hp value belongs to a Maserati Bora, and the lowest belongs to a Honda Civic.

• It is much easier (and safer) to let R find this out for us.
• For example, we can use the which.max() and which.min() functions to find the index position of the max and min value in the table:

Example
Data_Cars <- mtcars
which.max(Data_Cars$hp)
which.min(Data_Cars$hp)

• Or even better, combine which.max() and which.min() with the rownames() function to get the name of the car with the largest and smallest horsepower:

Example
Data_Cars <- mtcars

rownames(Data_Cars)[which.max(Data_Cars$hp)]
rownames(Data_Cars)[which.min(Data_Cars$hp)]

Now we know for sure:

Maserati Bora is the car with the highest horsepower, and Honda
Civic is the car with the lowest horsepower.
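
The index returned by which.max() and which.min() can also be used to pull out the whole row, not only its name. A minimal sketch using standard data frame indexing; the names strongest and weakest are ours, chosen for illustration.

```r
Data_Cars <- mtcars

# Full row of the car with the highest horsepower
strongest <- Data_Cars[which.max(Data_Cars$hp), ]
strongest

# A few selected columns of the car with the lowest horsepower
weakest <- Data_Cars[which.min(Data_Cars$hp), c("hp", "mpg", "wt")]
weakest
```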

Outliers
• Max and min can also be used to detect outliers. An outlier is a data point that differs markedly from the rest of the observations.
• Examples of data points that would have been outliers in the mtcars data set:
    • If the maximum number of forward gears of a car was 11
    • If the minimum horsepower of a car was 0
    • If the maximum weight of a car was 50,000 lbs
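
One simple way to look for such values is to print the range of each variable and flag rows that fall outside what seems plausible. The sketch below does this with base R; the thresholds are illustrative assumptions taken from the examples above, not an established rule, and the name suspicious is ours.

```r
Data_Cars <- mtcars

# range() returns the minimum and maximum in one call
range(Data_Cars$gear)
range(Data_Cars$hp)
range(Data_Cars$wt)   # wt is recorded in units of 1000 lbs

# Flag rows with implausible values (thresholds are assumptions for illustration)
suspicious <- Data_Cars$gear > 10 | Data_Cars$hp <= 0 | Data_Cars$wt > 50

# Names of flagged cars; for mtcars no row is flagged
rownames(Data_Cars)[suspicious]
```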

R Mean, Median, and Mode

In statistics, there are often three values that interest us:
• Mean - The average value
• Median - The middle value
• Mode - The most common value

Mean
• To calculate the average value (mean) of a variable from the mtcars data set, find the sum of all values, and divide the sum by the number of values.
Sorted observations of wt (weight, in 1000 lbs)
1.513 1.615 1.835 1.935 2.140 2.200 2.320 2.465
2.620 2.770 2.780 2.875 3.150 3.170 3.190 3.215
3.435 3.440 3.440 3.440 3.460 3.520 3.570 3.570
3.730 3.780 3.840 3.845 4.070 5.250 5.345 5.424

The mean() function in R can do this for us.

Example
Find the average weight (wt) of a car:

Data_Cars <- mtcars

mean(Data_Cars$wt)
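
To connect the definition to the function, the sketch below computes the mean by hand, as the sum of all values divided by the number of values, and compares it with mean(). The name manual_mean is ours, chosen for illustration.

```r
Data_Cars <- mtcars

# Sum of all weights divided by the number of weights
manual_mean <- sum(Data_Cars$wt) / length(Data_Cars$wt)
manual_mean

# The built-in function gives the same result
mean(Data_Cars$wt)
```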
Median
• The median value is the value in the middle, after you have sorted all the values.
• If we look at the values of the wt variable (from the mtcars data set), we will see that there are two numbers in the middle (3.215 and 3.435), because the number of values is even.
Sorted observations of wt (weight, in 1000 lbs)
1.513 1.615 1.835 1.935 2.140 2.200 2.320 2.465
2.620 2.770 2.780 2.875 3.150 3.170 3.190 3.215
3.435 3.440 3.440 3.440 3.460 3.520 3.570 3.570
3.730 3.780 3.840 3.845 4.070 5.250 5.345 5.424

Note: If there are two numbers in the middle, we must divide the sum of those numbers by two to find the median.
• The median() function in R can find the middle value:

Example
Find the midpoint value of weight (wt):

Data_Cars <- mtcars

median(Data_Cars$wt)
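
The note about two middle numbers can be checked by hand. The sketch below sorts the weights, picks the two middle values, and averages them; the names sorted_wt, n and middle are ours, chosen for illustration.

```r
Data_Cars <- mtcars

sorted_wt <- sort(Data_Cars$wt)
n <- length(sorted_wt)                    # 32 values, an even number

# With an even count, the two middle values sit at positions n/2 and n/2 + 1
middle <- sorted_wt[c(n / 2, n / 2 + 1)]
middle

# Their average is the median
sum(middle) / 2
```

The two middle values are 3.215 and 3.435, so the median is 3.325, which matches the result of median().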

Mode
• The mode is the value that appears the greatest number of times.
• R does not have a built-in function to calculate the mode. However, we can create our own function to find it.
• If we look at the values of the wt variable (from the mtcars data set), we will see that the number 3.440 appears often.
1.513 1.615 1.835 1.935 2.140 2.200 2.320 2.465
2.620 2.770 2.780 2.875 3.150 3.170 3.190 3.215
3.435 3.440 3.440 3.440 3.460 3.520 3.570 3.570
3.730 3.780 3.840 3.845 4.070 5.250 5.345 5.424

Instead of counting it ourselves, we can use R to find the mode:

Example
Data_Cars <- mtcars

names(sort(-table(Data_Cars$wt)))[1]

From the example above, we now know that the value that appears most often in the mtcars wt variable is 3.44, or 3,440 lbs.
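
The text mentions creating our own function, so here is one possible sketch. The helper name stat_mode is hypothetical and not part of base R; note that base R already has a function called mode(), but it reports how an object is stored, not the statistical mode. This version returns every value tied for the highest count.

```r
# Hypothetical helper (not part of base R): statistical mode of a numeric vector
stat_mode <- function(x) {
  counts <- table(x)                                  # how often each value occurs
  as.numeric(names(counts)[counts == max(counts)])    # value(s) with the top count
}

Data_Cars <- mtcars

stat_mode(Data_Cars$wt)    # most common weight
stat_mode(Data_Cars$cyl)   # most common number of cylinders
```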

Percentiles
• A percentile is a value below which a given percentage of the observations fall.
• If we look at the values of the wt (weight) variable from the mtcars data set:

1.513 1.615 1.835 1.935 2.140 2.200 2.320 2.465
2.620 2.770 2.780 2.875 3.150 3.170 3.190 3.215
3.435 3.440 3.440 3.440 3.460 3.520 3.570 3.570
3.730 3.780 3.840 3.845 4.070 5.250 5.345 5.424

• What is the 75th percentile of the weight of the cars? The answer is 3.61, or 3,610 lbs, meaning that 75% of the cars weigh 3,610 lbs or less:

Example
Data_Cars <- mtcars

# c() specifies which percentile you want

quantile(Data_Cars$wt, c(0.75))

• If we run the quantile() function without the second argument, we get the percentiles 0, 25, 50, 75 and 100:

Example
Data_Cars <- mtcars

quantile(Data_Cars$wt)
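
To check what the 75th percentile means in practice, the sketch below computes it and then measures the share of cars at or below that weight. It also requests several percentiles in one call by passing a longer vector. The name p75 is ours, chosen for illustration.

```r
Data_Cars <- mtcars

# 75th percentile of weight
p75 <- quantile(Data_Cars$wt, 0.75)
p75

# Share of cars whose weight is at or below that value
mean(Data_Cars$wt <= p75)

# Several percentiles at once: 10th, 50th and 90th
quantile(Data_Cars$wt, c(0.10, 0.50, 0.90))
```

For mtcars, 24 of the 32 cars are at or below the 75th percentile, which is exactly 75%.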

Quartiles
• Quartiles are the values that divide the data into four parts when it is sorted in ascending order:
    • The value of the first quartile cuts off the first 25% of the data
    • The value of the second quartile cuts off the first 50% of the data (this is the median)
    • The value of the third quartile cuts off the first 75% of the data
    • The value of the fourth quartile cuts off 100% of the data (this is the maximum)

• Use the quantile() function to get the quartiles.
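
A minimal sketch of this for the wt variable, asking quantile() for the four quartile positions explicitly. The name quartiles is ours, chosen for illustration.

```r
Data_Cars <- mtcars

# The four quartiles of weight: 25%, 50%, 75% and 100%
quartiles <- quantile(Data_Cars$wt, c(0.25, 0.50, 0.75, 1.00))
quartiles

# The second quartile is the median, and the fourth is the maximum
median(Data_Cars$wt)
max(Data_Cars$wt)
```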
