Data Import, Export and Analysis using R
Module-4
Data Analysis
data1 ->
• Output:
Data_gfg <- read.xlsx('Data_gfg.xlsx')
Data_gfg
Parameters:
• x: a matrix or a data frame to be written.
• file: a character specifying the name of the result file.
• sep: the field separator string, e.g., sep = "\t" (for tab-separated values).
• dec: the string to be used as decimal separator. Default is "."
• row.names: either a logical value indicating whether the row names of x are to be
written along with x, or a character vector of row names to be written.
• col.names: either a logical value indicating whether the column names of x are to be
written along with x, or a character vector of column names to be written.
• quote: a logical value, TRUE by default, indicating whether character and factor
values are written surrounded by double quotes.
15-03-2025 Dr. V. Srilakshmi 21
Exporting Data from R Scripts
• write.table():
• Example:
# R program to illustrate Exporting data from R
# Creating a dataframe
df = data.frame( "Name" = c("Amiya", "Raj", "Asish"),
"Language" = c("R", "Python", "Java"),
"Age" = c(22, 25, 45))
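The data frame created in the example above can then be exported with write.table(). The following is a minimal sketch: the temporary output path (out_file), the tab separator, and the read-back step are assumptions chosen for illustration and are not part of the original slide.

```r
# Recreate the data frame from the example
df <- data.frame(Name = c("Amiya", "Raj", "Asish"),
                 Language = c("R", "Python", "Java"),
                 Age = c(22, 25, 45))

# Hypothetical output path: a temporary file, so no existing file is overwritten
out_file <- tempfile(fileext = ".txt")

# Write a tab-separated file, without row names and without quotes
write.table(df, file = out_file, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the file back to confirm that the export worked
df_back <- read.table(out_file, header = TRUE, sep = "\t")
print(df_back)
```

Setting row.names = FALSE avoids an extra unnamed column of row numbers in the output file.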
Data cleaning and summarizing with dplyr package
• select(): Select column list either by name or index number
sample <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45))
dplyr::rename(sample,PULSE1=Pulse,DURATION1=Duration)
dplyr::select(sample,Training)
dplyr::select(sample,Pulse,Training)
dplyr::select(sample,1,3)
dplyr::select(sample,1:3)
Data cleaning and summarizing with dplyr package
• select(): Some additional options to select columns based
on a specific criteria include
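Some commonly used options of this kind are the selection helpers that dplyr makes available. The sketch below reuses the sample data frame from the previous slide; the variable names on the left-hand side are assumptions for illustration, and where() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

sample <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45))

# Columns whose names start with "P"
by_prefix <- select(sample, starts_with("P"))
# Columns whose names end with "n"
by_suffix <- select(sample, ends_with("n"))
# Columns whose names contain "ra"
by_substring <- select(sample, contains("ra"))
# Columns that satisfy a condition (here: numeric columns)
numeric_only <- select(sample, where(is.numeric))
# All columns except Training
without_training <- select(sample, -Training)

print(names(by_prefix))
print(names(by_substring))
```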
Data cleaning and summarizing with dplyr package
• mutate(): mutate() function in R Programming Language is used to add new
variables in a data frame which are formed by performing operations on existing
variables.
• Syntax: mutate(x, expr)
• The dplyr package provides five main functions in the mutate family, described below.
We will use the dplyr package for all of them.
• mutate() - adds new variables while retaining old variables to a data frame.
• transmute() - adds new variables and removes old ones from a data frame.
• mutate_all() - changes every variable in a data frame simultaneously.
• mutate_at() - changes certain variables by name.
• mutate_if() - changes all variables that satisfy a specific criterion.
Data cleaning and summarizing with dplyr package
mutate() Example:
library(dplyr)
# Create a data frame
d <- data.frame( name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
age = c(7, 5, 9, 16),
ht = c(46, NA, NA, 69),
school = c("yes", "yes", "no", "no") )
print(d)
# Calculating a variable x3 which is the sum of height and age, printed alongside ht and age
dplyr::mutate(d, x3 = ht + age)
Data cleaning and summarizing with dplyr package
transmute() Example:
# Use transmute to create a new variable 'age_in_months' and drop the 'age' variable
result <- transmute(d,
name = name,
age_in_months = age * 12,
ht,school)
print(result)
Data cleaning and summarizing with dplyr package
mutate_all() Example: The mutate_all() function changes every variable in a data
frame at once by applying a given function to each of them. The funs() helper that was
formerly used for this is deprecated in current dplyr; pass a function or a formula
such as ~ .x * 2 instead.
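A minimal sketch of the three scoped variants is given below. It uses a reduced version of the data frame d from the mutate() example; the result names are assumptions for illustration. The last line shows the same operation written with across(), which current dplyr documentation recommends over the scoped variants.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69))

# mutate_all(): apply a function to every column (here: convert to character)
all_chr <- mutate_all(d, as.character)

# mutate_at(): apply a function only to the named columns
age_ht_doubled <- mutate_at(d, c("age", "ht"), ~ .x * 2)

# mutate_if(): apply a function to the columns that satisfy a condition
numeric_doubled <- mutate_if(d, is.numeric, ~ .x * 2)

# Equivalent of the mutate_if() call written with across()
across_doubled <- mutate(d, across(where(is.numeric), ~ .x * 2))

print(numeric_doubled)
```

Missing values stay missing: doubling NA in the ht column still gives NA.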
Data cleaning and summarizing with dplyr package
summarise_all() Example:
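A minimal sketch of summarise_all() follows. The scores data frame and its values are made up for illustration and are not taken from the original slide.

```r
library(dplyr)

# Hypothetical data: marks of three students in two subjects
scores <- data.frame(math = c(60, 70, 80),
                     science = c(40, 60, 80))

# Apply mean() to every column, giving a one-row summary
col_means <- summarise_all(scores, mean)
print(col_means)

# The same summary written with across()
col_means_across <- summarise(scores, across(everything(), mean))
```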
Data cleaning and summarizing with dplyr package
arrange():
• The arrange() function reorders the rows of a table by the values of one or more
columns passed to the function.
• Syntax: arrange(x, expr)
• Parameters:
• x: data set to be reordered
• expr: column name(s), or expressions of them such as desc(age), to sort by
• Example:
#library(dplyr)
d <- data.frame( name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
age = c(7, 5, 9, 16) )
# Arranging name according to the age
d2<- dplyr::arrange(d, age)
print(d2)
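As a further sketch, rows can also be sorted in descending order with desc(), or by several columns at once. The school column is borrowed from the earlier mutate() example, and the result names are assumptions for illustration.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                school = c("yes", "yes", "no", "no"))

# Sort by age, largest first
by_age_desc <- arrange(d, desc(age))
print(by_age_desc)

# Sort by school first, then by age within each school value
by_school_then_age <- arrange(d, school, age)
print(by_school_then_age)
```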
Data cleaning and summarizing with dplyr package
• The count() function is part of the dplyr package, which is widely used for
data manipulation in R. It provides a convenient way to count the
occurrences of unique combinations of variables in a data frame.
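A minimal sketch of count() follows. The data frame and its grade column are made up for illustration.

```r
library(dplyr)

# Hypothetical data: whether each child attends school, and a grade
d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                school = c("yes", "yes", "no", "no"),
                grade = c("A", "B", "A", "A"))

# Number of rows for each value of school (the count is stored in column n)
by_school <- count(d, school)
print(by_school)

# Number of rows for each combination of school and grade
by_school_grade <- count(d, school, grade)
print(by_school_grade)
```

Combinations that do not occur in the data, such as school = "no" with grade = "B", do not appear in the result.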
Exploratory Data Analysis
• Exploratory Data Analysis (EDA) is a statistical approach for analysing data sets in
order to summarize their main characteristics, generally with the help of visual aids.
• The EDA approach can be used to gather knowledge about the following aspects of data:
• Main characteristics or features of the data.
• The variables and their relationships.
• Finding out the important variables that can be used in our problem.
• Exploratory Data Analysis in R:
• In R, EDA is carried out under two broad classifications:
• Descriptive Statistics, which includes mean, median, mode, inter-quartile range, and so on.
• Graphical Methods, which includes Box plot, Histogram, Pie graph, Line chart, Barplot, Scatter
Plot and so on.
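The descriptive statistics listed above can be sketched with base R functions, using the mpg column of the built-in mtcars dataset. Base R has no function for the statistical mode, so stat_mode() below is a hypothetical helper written for this illustration.

```r
# Use the mpg column of the built-in mtcars dataset
mpg <- mtcars$mpg

mpg_mean <- mean(mpg)      # arithmetic mean
mpg_median <- median(mpg)  # middle value
mpg_iqr <- IQR(mpg)        # inter-quartile range (third quartile minus first quartile)

# Hypothetical helper: returns the most frequent value(s) of a numeric vector
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}
mpg_mode <- stat_mode(mpg)

# summary() prints minimum, quartiles, median, mean and maximum together
print(summary(mpg))
```

When several values are tied for the highest frequency, stat_mode() returns all of them.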
Exploratory Data Analysis
• Diagrammatic representation of data:
• Diagrammatic representation is one of the best and most attractive ways of
presenting data.
• It caters to both educated and uneducated sections of society.
Histograms
• A histogram displays the distribution of a numeric variable as adjacent rectangles:
each rectangle spans one of a series of successive numerical intervals, and its area
is proportional to the frequency of the values in that interval.
• We can create histograms in R using the hist() function.
• Syntax: hist(v, main, xlab, xlim, ylim, breaks, col, border)
• v: This parameter contains the numerical values used in the histogram.
• main: This parameter is the title of the chart.
• col: This parameter is used to set the color of the bars.
• xlab: This parameter is the label for the horizontal axis.
• border: This parameter is used to set the border color of each bar.
• xlim: This parameter specifies the range of values shown on the x-axis.
• ylim: This parameter specifies the range of values shown on the y-axis.
• breaks: This parameter controls how the values are binned: a single number suggests
the number of bins, and a vector gives the break points between bins.
Histograms
• Example:
# Create data for the graph.
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)
# Create the histogram.
hist(v, xlab = "No.of Articles ",col = "green", border = "black")
Histograms
• Example:
# Create data for the graph.
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)
# Create the histogram.
hist(v, xlab = "No. of Articles", col = "green", border = "black", xlim = c(0, 50), ylim = c(0, 5), breaks = 5)
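To see how the values are binned, the histogram can be computed without drawing it. The sketch below passes explicit break points (an assumption for illustration; the slide examples let R choose them) and inspects the resulting counts.

```r
# Same data as in the examples
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)

# plot = FALSE returns the computed histogram instead of drawing it.
# The break points 0, 10, 20, 30, 40 define four bins of width 10.
h <- hist(v, breaks = seq(0, 40, by = 10), plot = FALSE)

print(h$breaks)  # the bin boundaries
print(h$counts)  # number of values in each bin: 1 5 3 2
```

By default each bin includes its right boundary, so a value of exactly 20 would fall in the bin from 10 to 20.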
Pie graph
• A pie chart is a circular statistical graphic, which is divided into slices to illustrate
numerical proportions.
• Each sector ("pie slice") shows the relative size of one data value.
• The circle is cut along radii into segments that describe relative frequencies or
magnitudes; the chart is also known as a circle graph.
• Syntax: pie(x, labels, main, col, clockwise)
• x: This parameter is a vector that contains the numeric values which are used in the pie chart.
• labels: This parameter gives the description to the slices in pie chart.
• main: This parameter is representing title of the pie chart.
• clockwise: This parameter contains the logical value which indicates whether the slices are
drawn clockwise or in anti-clockwise direction.
• col: This parameter gives colours to the slices of the pie chart.
Pie graph
• Example:
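# Open a PNG graphics device (png() is used here because bitmap() requires Ghostscript)
png(file = "out.png")
Temp <- c(23, 36, 50, 43)
Cities <- c("Bangalore", "Pune", "Chennai", "Amaravati")
# Plot the chart.
pie(Temp, Cities)
# Close the device so that the file is written
dev.off()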
Pie graph
• Example:
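# Open a PNG graphics device (png() is used here because bitmap() requires Ghostscript)
png(file = "out.png")
Temp <- c(23, 36, 50, 43)
Cities <- c("Bangalore", "Pune", "Chennai", "Amaravati")
# Plot the chart with a title and one colour per slice.
pie(Temp, Cities, main = "City pie chart",
    col = rainbow(length(Temp)))
# Close the device so that the file is written
dev.off()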
Barplot
• Bar charts are a popular and effective way to visually represent categorical data in a
structured manner.
• R uses the barplot() function to create bar charts. Both vertical and horizontal
bars can be drawn.
• Syntax: barplot(H, xlab, ylab, main, names.arg, col)
• H: This parameter is a vector or matrix containing numeric values which are used in bar chart.
• xlab: This parameter is the label for x axis in bar chart.
• ylab: This parameter is the label for y axis in bar chart.
• main: This parameter is the title of the bar chart.
• names.arg: This parameter is a vector of names appearing under each bar in bar chart.
• col: This parameter is used to give colors to the bars in the graph.
Barplot
• Example:
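# Open a PNG graphics device (png() is used here because bitmap() requires Ghostscript)
png(file = "out.png")
# Create the data for the chart
A <- c(17, 32, 8, 53, 1)
# Plot the bar chart
barplot(A, xlab = "X-axis", ylab = "Y-axis", main = "Bar-Chart")
# Close the device so that the file is written
dev.off()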
Barplot
• Example:
# Create the data for the chart
A <- c(17, 32, 8, 53, 1)
# Plot the bar chart
barplot(A, horiz = TRUE, xlab = "X-axis", ylab = "Y-axis", main ="Horizontal Bar Chart" )
Scatter Plots
• A "scatter plot" is a type of plot used to display the relationship between two
numerical variables, and plots one dot for each observation.
• It needs two vectors of same length, one for the x-axis (horizontal) and one for the
y-axis (vertical).
• Syntax: plot(x, y, main, xlab, ylab, xlim, ylim, axes)
• x: This parameter sets the horizontal coordinates.
• y: This parameter sets the vertical coordinates.
• xlab: This parameter is the label for horizontal axis.
• ylab: This parameter is the label for vertical axis.
• main: This parameter main is the title of the chart.
• xlim: This parameter specifies the limits of the x-axis.
• ylim: This parameter specifies the limits of the y-axis.
• axes: This parameter indicates whether both axes should be drawn on the plot.
Scatter Plots
• Example:
# Get the input values.
input <- mtcars[, c('wt', 'mpg')]
# Plot the chart for cars with weight between 1.5 and 4 and mileage between 10 and 25.
plot(x = input$wt, y = input$mpg,
     xlab = "Weight",
     ylab = "Mileage",
     xlim = c(1.5, 4),
     ylim = c(10, 25),
     main = "Weight vs Mileage"
)
Line Graphs
• A line graph is a chart that is used to display information in the form of a series of
data points.
• It utilizes points and lines to represent change over time.
• Line graphs are drawn by plotting different points on their X coordinates and Y
coordinates, then by joining them together through a line from beginning to end.
• Syntax: plot(v, type, col, xlab, ylab, main)
• v: This parameter is a vector containing only numeric values.
• type: This parameter takes one of the following values:
• "p": This value is used to draw only the points.
• "l": This value is used to draw only the lines.
• "o": This value is used to draw both points and lines.
• xlab: This parameter is the label for the x axis in the chart.
• ylab: This parameter is the label for the y axis in the chart.
• main: This parameter is the title of the chart.
• col: This parameter is used to give colors to both the points and lines.
Line Graphs
• Example:
# Create the data for the chart.
v <- c(17, 25, 38, 13, 41)
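The plotting step for this data could look like the sketch below. The type, colour, axis labels and title are assumptions chosen for illustration; they are not taken from the original slide.

```r
# Create the data for the chart.
v <- c(17, 25, 38, 13, 41)

# Draw both points and lines (type = "o").
# The labels and title below are illustrative assumptions.
plot(v, type = "o", col = "green",
     xlab = "Index", ylab = "Value",
     main = "Line graph of v")
```

Because only v is supplied, R plots the values against their positions 1 to 5 on the x axis.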
Boxplots
• A boxplot (box-and-whisker plot) is a chart that displays the distribution of a data
set; one box is drawn for each group of values.
• The distribution is summarized by five values (minimum, first quartile, median, third
quartile, and maximum).
• Syntax: boxplot(x, data, notch, varwidth, names, main)
• x: This parameter sets as a vector or a formula.
• data: This parameter sets the data frame.
• notch: This parameter is a logical value. Set as TRUE to draw a notch on each side of
the box.
• varwidth: This parameter is a logical value. Set as TRUE to draw the width of each box
proportional to the square root of the sample size.
• main: This parameter is the title of the chart.
• names: This parameter gives the group labels that will be shown under each boxplot.
Boxplots
• Example:
# use head() to display the first six rows of the mtcars dataset
head(mtcars)
Boxplots
• Example:
# add title, label, new color to boxplot
# mpg ~ cyl draws one box per cylinder count, matching the x-axis label
boxplot(mpg ~ cyl, data = mtcars,
        main = "Mileage Data Boxplot",
        ylab = "Miles Per Gallon(mpg)",
        xlab = "No. of Cylinders",
        col = "orange")
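The five values behind a boxplot can also be computed directly. The sketch below uses fivenum() for the whole mpg column and boxplot() with plot = FALSE for the values per cylinder group; the names given to the result are assumptions for illustration.

```r
# Five-number summary of mpg: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(mtcars$mpg)
names(five) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five)

# plot = FALSE returns the statistics of the boxplot instead of drawing it
stats_by_cyl <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)
print(stats_by_cyl$names)  # the groups: "4" "6" "8"
print(stats_by_cyl$stats)  # one column of five values per group
```

Note that the hinges returned by fivenum() can differ slightly from the quartiles given by quantile(), and that the whisker ends in stats_by_cyl$stats exclude any points drawn as outliers.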