R Programming-Chapter 6

Chapter 6 discusses data manipulation in R, highlighting its statistical capabilities and built-in datasets. It covers how to use existing datasets, import and export files, and perform basic statistical analyses using various functions. The chapter also includes an example of analyzing the 'cars' dataset, detailing steps for exploration and visualization.

Uploaded by

memoiremath1

Chapter 6
DATA MANIPULATION IN R
Our goal here is to give some landmarks that convey the features R offers for statistical
and data analysis. R is an environment within which many classical and modern statistical
techniques have been implemented [ref4]. These statistical methods are spread over a large
number of packages. About 25 packages are supplied with a base installation of R (the
“standard” and “recommended” packages) [4], while many other packages are contributed,
must be installed by the user [2], and are available through the CRAN family of Internet
sites (via https://CRAN.R-project.org) and elsewhere.
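As a quick illustration, the sketch below lists the packages that were installed together with R itself, using the priority argument of installed.packages(). Installing and loading a contributed package is shown only as comments, because it needs an Internet connection; the package name "openxlsx" is just an illustrative choice.

```r
# Packages shipped with R: priority "base" (always present) and "recommended"
base_pkgs <- rownames(installed.packages(priority = "base"))
recommended_pkgs <- rownames(installed.packages(priority = "recommended"))

print(base_pkgs)
print(recommended_pkgs)

# A contributed package is installed once from CRAN, then loaded in each session:
# install.packages("openxlsx")   # illustrative package name; needs internet access
# library(openxlsx)
```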

6.1 DataSets in R
The R programming language has many built-in datasets that can be used as sample data
to illustrate how R functions work.

6.1.1 What is a DataSet

A dataset is a collection of data presented as a table, in which each row is an observation
and each column is a variable.
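In R, many built-in datasets are stored as data frames, R's table structure. A minimal check on the built-in cars dataset, which is analyzed later in Section 6.6:

```r
# cars ships with the 'datasets' package, which is attached by default
print(class(cars))   # "data.frame": R's table structure
str(cars)            # 50 observations (rows) of 2 variables (columns)
print(head(cars))    # first six rows of the table
```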

6.1.2 Using Existing DataSets in R

For more details about the datasets package in R, consult the link:
https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html. It lists the
existing datasets in R that can be used and explored with statistical functions. Table 6.1
presents a few of the datasets available in R.

N°  DataSet Name in R   Description
1   cars                Speed and Stopping Distances of Cars.
2   BJsales.lead        Sales Data with Leading Indicator.
3   iris                Edgar Anderson’s Iris Data.
4   airquality          New York Air Quality Measurements.
5   Nile                Flow of the River Nile.

Table 6.1: Some Existing DataSets in R.
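Not every built-in dataset is a data frame: in Table 6.1, Nile and BJsales.lead are time series (class "ts"). The sketch below lists the datasets of the datasets package with data() and prints the class of each entry in the table.

```r
# Names of all datasets shipped in the 'datasets' package
available <- data(package = "datasets")$results[, "Item"]
print(head(available, 10))

# Class of each dataset from Table 6.1
for (name in c("cars", "BJsales.lead", "iris", "airquality", "Nile")) {
  obj <- get(name)   # fetch the dataset object from its name
  cat(name, ":", class(obj), "\n")
}
```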

Example: In this example, we explore the dataset airquality. To display a dataset, we
simply pass its name, without quotes, to print(), as shown in Table 6.2.

N°  Function  Example                  Description                               Execution result
1   print     print(airquality)        Display the data of the dataset.          All data
2   dim       dim(airquality)          Get the dimensions of the dataset.        153 6
3   nrow      nrow(airquality)         Get the number of rows.                   153
4   ncol      ncol(airquality)         Get the number of columns.                6
5   names     names(airquality)        Get the names of the variables.           All names
6   $         print(airquality$Temp)   Display all values of the Temp variable.  All values
7   sort      sort(airquality$Temp)    Sort the values of the Temp variable.     Sorted values

Table 6.2: Functions to Get Information About the Dataset.
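The calls of Table 6.2 can be run together as a short script. Note that print("airquality"), with quotes, would only print the character string, not the data; head() is used here only to keep the output short.

```r
# Explore the built-in airquality dataset (datasets package)
print(head(airquality))       # print(airquality) would show all 153 rows
print(dim(airquality))        # number of rows and columns
print(nrow(airquality))       # number of rows
print(ncol(airquality))       # number of columns
print(names(airquality))      # variable names
print(airquality$Temp)        # all values of the Temp variable
print(sort(airquality$Temp))  # Temp values in increasing order
```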

6.2 Directory functions in R


In R, two important predefined functions let the user query and change the working
directory (see figure 6.1):

• getwd(): this function is used to get the current working directory.



• setwd(): this function is used to change the current working directory.

Example:

Figure 6.1: Directory functions in R.
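A minimal sketch of the two functions. The temporary folder returned by tempdir() stands in for a real project folder (an assumption for illustration), and the original working directory is restored at the end.

```r
old_dir <- getwd()   # remember the current working directory
print(old_dir)

setwd(tempdir())     # change the working directory (here: R's temporary folder)
print(getwd())       # confirm the change

setwd(old_dir)       # restore the original working directory
```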

6.3 Importing Files in R Software


For statistical analysis, it is important to know the R functions that work with system
directories and that import data from files and export data to them. R can read many
types of files, such as CSV files, text files, Excel workbooks, SPSS files and SAS files.
Table 6.3 shows some functions that can be used to read these files.

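File type    Function
Text file    read.table (utils package, available by default)
CSV file     read.csv (utils package, available by default)
Excel file   read.xlsx (add-on package such as xlsx or openxlsx)

Table 6.3: The reading functions of files in R.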

6.3.1 Importing Text Files in R

The function "read.table" reads a text file saved in the current working directory and
imports its data into a data frame, as shown in figure 6.2:

Figure 6.2: Read Text File in R.
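Since figure 6.2 is a screenshot, here is a self-contained sketch: it first writes a small whitespace-separated text file to the temporary folder (the file name students.txt and its contents are hypothetical) and then imports it with read.table(). The argument header = TRUE tells R that the first line holds the column names.

```r
# Create a small text file to import (hypothetical example data)
txt_path <- file.path(tempdir(), "students.txt")
writeLines(c("name age score",
             "Ali 21 14.5",
             "Sara 22 16.0",
             "Omar 20 12.5"), txt_path)

# Import the text file into a data frame
students <- read.table(txt_path, header = TRUE)
print(students)
str(students)
```

If the file sits in the current working directory, read.table("students.txt", header = TRUE) is enough.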

6.3.2 Importing CSV Files in R

The function "read.csv" reads a CSV file saved in the current working directory and
imports its data into a data frame, as shown in figure 6.3:

Figure 6.3: Read CSV File in R.
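The same hypothetical data, this time stored as a comma-separated file and imported with read.csv(), whose defaults already assume a header line and a comma separator:

```r
# Create a small CSV file to import (hypothetical example data)
csv_path <- file.path(tempdir(), "students.csv")
writeLines(c("name,age,score",
             "Ali,21,14.5",
             "Sara,22,16.0",
             "Omar,20,12.5"), csv_path)

# header = TRUE and sep = "," are the defaults of read.csv()
students_csv <- read.csv(csv_path)
print(students_csv)
```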



6.3.3 Importing Excel Files in R

The function "read.xlsx", which is provided by add-on packages such as xlsx or openxlsx
rather than by base R, reads an Excel file saved in the current working directory and
imports its data into a data frame, as shown in figure 6.4:

Figure 6.4: Read Excel File in R.
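A sketch assuming the openxlsx package, one of several CRAN packages that provide a read.xlsx() function (the xlsx package has a function of the same name with different arguments, such as sheetIndex). It writes the first rows of cars to a workbook with a hypothetical file name and reads sheet 1 back; the example is skipped if openxlsx is not installed.

```r
has_openxlsx <- requireNamespace("openxlsx", quietly = TRUE)

if (has_openxlsx) {
  xlsx_path <- file.path(tempdir(), "cars.xlsx")   # hypothetical file name
  # Write the first six rows of 'cars' to an Excel workbook
  openxlsx::write.xlsx(head(cars), xlsx_path)
  # Read sheet 1 back into a data frame
  excel_data <- openxlsx::read.xlsx(xlsx_path, sheet = 1)
  print(excel_data)
} else {
  message("Package 'openxlsx' is not installed; run install.packages(\"openxlsx\") first.")
}
```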

6.4 Exporting files in R Software


To export data to a file, R provides functions that save data into files of different
types.

6.4.1 Exporting Data to Text Files in R

The function "sink" redirects R's console output to a text file in the current working
directory, which makes it possible to save printed data and results, as shown in figure 6.5:

Figure 6.5: Export Data to Text File in R.
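A minimal sketch of sink() (the output file name is hypothetical): everything printed between sink(file) and sink() goes into the file instead of the console.

```r
txt_out <- file.path(tempdir(), "cars_output.txt")   # hypothetical file name

sink(txt_out)          # start sending console output to the file
print(head(cars, 3))   # this printed table is written to the file
sink()                 # stop redirecting; output returns to the console

exported_lines <- readLines(txt_out)
print(exported_lines)
```

To save a data frame itself, rather than its printed form, write.table() is usually more convenient.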

6.4.2 Exporting to CSV Files in R

The function "write.csv" exports a data frame to a CSV file saved in the current working
directory, as shown in figure 6.6:

Figure 6.6: Export Data to CSV File in R.
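A sketch that exports the cars data frame to a CSV file (hypothetical name) and reads it back to check the result. Setting row.names = FALSE avoids writing the row numbers as an extra column.

```r
csv_out <- file.path(tempdir(), "cars.csv")   # hypothetical file name

# Export the data frame without an extra row-number column
write.csv(cars, csv_out, row.names = FALSE)

# Read the file back to check the export
cars_back <- read.csv(csv_out)
print(head(cars_back))
```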

6.5 Basic Statistics


R is a statistical computing language, and many of its built-in functions were developed
for statistical purposes. In this section, we examine some basic statistical functions and
use R to illustrate their application [6].

6.5.1 Statistical Functions in R

Table 6.4 summarizes the most important basic statistical functions in R, giving the
name of each function, how to call it in R, and its role.

N°  Function                   Run in R                          Description
1   Mean                       mean(Vector)                      Calculate the average of a vector.
2   Trimmed Mean               mean(Vector, trim = 0.##)         Calculate the mean after dropping the given fraction of values from each end of the vector.
3   Variance                   var(Vector)                       Measure the spread of a vector.
4   Standard Deviation         sd(Vector)                        Measure the spread of the data in the vector (square root of the variance).
5   Standard Error             sd(Vector)/sqrt(length(Vector))   Estimate the standard error of the mean.
6   Median Absolute Deviation  mad(Vector)                       Calculate the median of the absolute deviations from the median (scaled by 1.4826 by default).
7   Median                     median(Vector)                    Estimate the center of the data in the vector.
8   Minimum                    min(Vector)                       Find the smallest value in the vector.
9   Maximum                    max(Vector)                       Find the largest value in the vector.
10  Range                      max(Vector) - min(Vector)         Calculate the maximum minus the minimum.
11  Quantile                   quantile(Vector, c(##))           Calculate the values below which the given proportions of the data fall.
12  Interquartile Range        IQR(Vector)                       Calculate the spread of the middle 50% of the data (third quartile minus first quartile).

Table 6.4: The Most Important Basic Statistical Functions in R.
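The functions of Table 6.4 applied to the Temp variable of airquality, which has no missing values. The trim fraction 0.1 and the probabilities 0.1 and 0.9 are arbitrary illustrative choices. For variables with missing values, such as Ozone, these functions need na.rm = TRUE.

```r
temp <- airquality$Temp   # 153 daily temperatures, no missing values

basic_stats <- c(
  mean         = mean(temp),
  trimmed_mean = mean(temp, trim = 0.1),         # drop 10% of values at each end
  variance     = var(temp),
  std_dev      = sd(temp),
  std_error    = sd(temp) / sqrt(length(temp)),
  mad          = mad(temp),                      # scaled median absolute deviation
  median       = median(temp),
  minimum      = min(temp),
  maximum      = max(temp),
  range        = max(temp) - min(temp),          # range(temp) itself returns c(min, max)
  iqr          = IQR(temp)
)
print(round(basic_stats, 2))

print(quantile(temp, c(0.1, 0.9)))              # 10th and 90th percentiles
```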

Example: We can use the summary() function to get statistical information about a
variable in the dataset, as shown in figure 6.7. For a numeric variable, this function
returns six statistical summaries: the minimum, first quartile, median, mean, third
quartile and maximum. The example shows the statistical information about the Temp
variable.

Figure 6.7: Get Statistical information Using summary Functions in R.
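The same example as code, followed by summary() applied to the whole data frame, which summarizes every column and also reports the number of missing values (NA's) where there are any, as for Ozone:

```r
# Six-number summary of one variable
temp_summary <- summary(airquality$Temp)
print(temp_summary)

# Summary of every column of the data frame
print(summary(airquality))
```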

6.6 Data analysis of cars


In this section, we analyze the built-in R dataset called cars. Its description can be
found by typing "cars" in the Help pane of RStudio (or by running ?cars in the console),
as shown in figures 6.8 and 6.9.

Figure 6.8: Cars dataset



Figure 6.9: Information about Cars dataset

To analyze this data properly, we follow these steps:

1. Get to know the details of this dataset by using the functions "names()", "colnames()"
and "row.names()".

2. Display the cars data with the function "View()", which invokes a spreadsheet-style
data viewer within RStudio.

3. Find out the type (class) of the cars data.

4. Use the function "summary()" to summarize each variable of the data frame.

5. Separate the summary into two parts, one per variable, by using "summary(cars)[, 1]"
and "summary(cars)[, 2]".

6. Plot the data, naming the x-axis "speed" and the y-axis "stop distance", with the
title "cars data".

7. Select the data of the variable "speed" and of the variable "dist" (stopping distance)
and plot their histograms.

8. Check the ANOVA analysis for the following variables: "cars.1, cars.2, cars.3, cars.4".
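Steps 1 to 7 can be sketched as follows. View() opens a viewer only in an interactive session such as RStudio, so it is guarded with interactive(). The excerpt does not define cars.1 to cars.4 for step 8, so the last lines are only an assumed illustration: an ANOVA table for a linear model of stopping distance on speed, obtained with anova(lm(...)).

```r
# Step 1: variable names and row names
print(names(cars))
print(colnames(cars))
print(row.names(cars))

# Step 2: spreadsheet-style viewer (interactive sessions only)
if (interactive()) View(cars)

# Step 3: type of the data
print(class(cars))
str(cars)

# Step 4: summary of every variable
cars_summary <- summary(cars)
print(cars_summary)

# Step 5: one summary column per variable
print(cars_summary[, 1])   # speed
print(cars_summary[, 2])   # dist

# Step 6: scatter plot with axis labels and a title
plot(cars$speed, cars$dist,
     xlab = "speed", ylab = "stop distance", main = "cars data")

# Step 7: histogram of each variable
hist(cars$speed, main = "Histogram of speed", xlab = "speed")
hist(cars$dist, main = "Histogram of dist", xlab = "stop distance")

# Step 8 (assumption, for illustration): ANOVA table of a linear model dist ~ speed
cars_anova <- anova(lm(dist ~ speed, data = cars))
print(cars_anova)
```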
